T-SQL: Deleting all duplicate rows but keeping one
Possible Duplicate: SQL - How can I remove duplicate rows?
I have a table with a very large number of rows. Duplicates are not allowed, but because of a problem with how the rows were created, I know there are some duplicates in this table. I need to eliminate the extra rows from the perspective of the key columns. Some other columns may have slightly different data, but I do not care about that. I still need to keep one of these rows, however. SELECT DISTINCT won't work because it operates on all columns, and I need to suppress duplicates based on the key columns.
How can I delete the extra rows but still keep one efficiently?
You didn't say what version you were using, but in SQL Server 2005 and above, you can use a common table expression (CTE) with the OVER clause. It goes a little something like this:

    WITH cte AS (
        SELECT [foo], [bar],
               ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz) AS [rn]
        FROM [TABLE]
    )
    DELETE cte WHERE [rn] > 1;

Play around with it and see what you get.
(Edit: In an attempt to be helpful, someone edited the ORDER BY clause within the CTE. To be clear, you can order by anything you want here; it needn't be one of the columns returned by the CTE. In fact, a common use case is that "foo, bar" are the group identifier and "baz" is some sort of time stamp. To keep the latest row, you'd use ORDER BY baz DESC.)
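A minimal runnable sketch of the keep-the-latest case from the edit, assuming a hypothetical temp table #Sample in which (foo, bar) identify a group and baz is a datetime time stamp. The multi-row VALUES list needs SQL Server 2008 or later:

```sql
-- Hypothetical sample data: (foo, bar) identify a group, baz is a time stamp.
IF OBJECT_ID('tempdb..#Sample') IS NOT NULL DROP TABLE #Sample;
CREATE TABLE #Sample (foo int, bar int, baz datetime);

INSERT INTO #Sample (foo, bar, baz) VALUES
    (1, 1, '2024-01-01T08:00:00'),
    (1, 1, '2024-01-02T08:00:00'),  -- newest row for group (1, 1)
    (2, 5, '2024-01-01T09:00:00');  -- only row for group (2, 5)

-- Number the rows in each (foo, bar) group from newest to oldest,
-- then delete every row that is not the newest in its group.
WITH cte AS (
    SELECT foo, bar, baz,
           ROW_NUMBER() OVER (PARTITION BY foo, bar ORDER BY baz DESC) AS rn
    FROM #Sample
)
DELETE FROM cte
WHERE rn > 1;

-- Remaining rows: (1, 1, 2024-01-02 08:00) and (2, 5, 2024-01-01 09:00).
SELECT foo, bar, baz FROM #Sample;
```

Deleting through the CTE works because it reads from a single base table, so SQL Server removes the matching rows from #Sample itself.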

    DELETE FROM [Table]
    WHERE ID NOT IN (
        SELECT MIN(ID)
        FROM [Table]
        GROUP BY Field1, Field2, Field3, ...
    )

Field1, Field2, Field3, ... are the columns on which you want to group the duplicate rows. ID must be unique; the row with the lowest ID in each group is kept.
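A runnable sketch of the MIN(ID) approach, assuming a hypothetical #Items temp table whose ID column is a non-NULL primary key and whose duplicates are defined by Field1 and Field2. ID should be non-NULL: if the subquery ever returned NULL, NOT IN would match no rows and nothing would be deleted.

```sql
-- Hypothetical table: ID is a unique, non-NULL key; Field1/Field2 define a duplicate.
IF OBJECT_ID('tempdb..#Items') IS NOT NULL DROP TABLE #Items;
CREATE TABLE #Items (ID int NOT NULL PRIMARY KEY, Field1 varchar(10), Field2 varchar(10));

INSERT INTO #Items (ID, Field1, Field2) VALUES
    (1, 'a', 'x'),
    (2, 'a', 'x'),   -- duplicate of ID 1
    (3, 'b', 'y'),
    (4, 'a', 'x');   -- duplicate of ID 1

-- Keep the lowest ID in each (Field1, Field2) group; delete every other row.
DELETE FROM #Items
WHERE ID NOT IN (
    SELECT MIN(ID)
    FROM #Items
    GROUP BY Field1, Field2
);

-- IDs 1 and 3 remain.
SELECT ID, Field1, Field2 FROM #Items;
```

Swapping MIN for MAX keeps the highest ID in each group instead.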
To delete duplicate records and keep only one copy:

    ;WITH CTE AS (
        SELECT FirstName, LastName, Age,
               ROW_NUMBER() OVER (
                   PARTITION BY FirstName, LastName, Age
                   ORDER BY (SELECT 1)) AS Rn
        FROM dbo.Customer
    )
    DELETE FROM CTE WHERE Rn > 1

All the duplicate copies are deleted and only unique records are left. Because ORDER BY (SELECT 1) imposes no meaningful order, which copy survives is arbitrary, so use it only when you don't care which row is kept.
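A runnable version of this pattern, assuming a hypothetical #Customer temp table with no key column, in which three rows are exact copies of one another. Since the copies cannot be told apart, the arbitrary ORDER BY (SELECT 1) is harmless here:

```sql
-- Hypothetical table with no key: the three 'Ann' rows are exact copies.
IF OBJECT_ID('tempdb..#Customer') IS NOT NULL DROP TABLE #Customer;
CREATE TABLE #Customer (FirstName varchar(20), LastName varchar(20), Age int);

INSERT INTO #Customer (FirstName, LastName, Age) VALUES
    ('Ann', 'Lee', 30),
    ('Ann', 'Lee', 30),
    ('Ann', 'Lee', 30),
    ('Bob', 'Ray', 41);

-- Partition by every column, so each partition holds identical rows,
-- and delete all but the first row that ROW_NUMBER() happens to pick.
WITH CTE AS (
    SELECT FirstName, LastName, Age,
           ROW_NUMBER() OVER (
               PARTITION BY FirstName, LastName, Age
               ORDER BY (SELECT 1)) AS Rn
    FROM #Customer
)
DELETE FROM CTE
WHERE Rn > 1;

-- One 'Ann' row and one 'Bob' row remain.
SELECT FirstName, LastName, Age FROM #Customer;
```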
Here's my twist on it, with a runnable example. Note this will only work in the situation where Id is unique and you have duplicate values in other columns.

    DECLARE @SampleData AS TABLE (Id int, Duplicate varchar(20))

    INSERT INTO @SampleData
    SELECT 1, 'ABC' UNION ALL
    SELECT 2, 'ABC' UNION ALL
    SELECT 3, 'LMN' UNION ALL
    SELECT 4, 'XYZ' UNION ALL
    SELECT 5, 'XYZ'

    DELETE FROM @SampleData
    WHERE Id IN (
        SELECT Id
        FROM (
            SELECT Id,
                   ROW_NUMBER() OVER (PARTITION BY [Duplicate] ORDER BY Id) AS [ItemNumber]
                   -- Change the partition columns to include the ones that make the row distinct
            FROM @SampleData
        ) a
        WHERE ItemNumber > 1 -- Keep only the first unique item
    )

    SELECT * FROM @SampleData

And the results:

    Id          Duplicate
    ----------- ---------
    1           ABC
    3           LMN
    4           XYZ

Not sure why that's what I thought of first... definitely not the simplest way to go but it works.
Another approach, written for MySQL rather than T-SQL, removes all duplicate rows except one by copying one row per key into a temporary table, emptying the original table, adding a unique constraint, and copying the rows back:

    # Step 1: Copy distinct values to temporary table
    CREATE TEMPORARY TABLE tmp_user (
        SELECT id, name FROM user GROUP BY name
    );
    # Step 2: Remove all rows from original table
    DELETE FROM user;
    # Step 3: Add unique constraint
    ALTER TABLE user ADD UNIQUE(name);
    # Step 4: Copy the rows back into the original table
    INSERT IGNORE INTO user (SELECT * FROM tmp_

Note that id is not aggregated in step 1, so which id survives for each name is not defined, and MySQL rejects the query when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7).
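For comparison, a T-SQL sketch of the same four steps, assuming a hypothetical #Users temp table in place of user. Unlike the MySQL version, step 1 uses ROW_NUMBER() so the row kept for each name is deterministically the one with the highest id. Because the temp table already holds exactly one row per name, a plain INSERT is enough in step 4:

```sql
-- Hypothetical table standing in for the MySQL example's user table.
IF OBJECT_ID('tempdb..#Users') IS NOT NULL DROP TABLE #Users;
IF OBJECT_ID('tempdb..#tmp_user') IS NOT NULL DROP TABLE #tmp_user;
CREATE TABLE #Users (id int NOT NULL, name varchar(50) NOT NULL);

INSERT INTO #Users (id, name) VALUES
    (1, 'alice'), (2, 'alice'), (3, 'bob'), (4, 'carol'), (5, 'carol');

-- Step 1: copy one row per name (the one with the highest id) into a temp table.
SELECT id, name
INTO #tmp_user
FROM (
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
    FROM #Users
) AS numbered
WHERE rn = 1;

-- Step 2: remove all rows from the original table.
DELETE FROM #Users;

-- Step 3: add a unique constraint so new duplicates are rejected.
ALTER TABLE #Users ADD UNIQUE (name);

-- Step 4: copy the surviving rows back: (2, alice), (3, bob), (5, carol).
INSERT INTO #Users (id, name)
SELECT id, name FROM #tmp_user;
```

On a real table you would normally wrap steps 2 to 4 in a transaction so a failure cannot leave the table empty.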
If a table has duplicate data in every column (for example, all four columns match), the copies need to be deleted except for one original row. You can identify them with GROUP BY and HAVING: GROUP BY groups the rows by the listed columns, and COUNT(*) shows how many times each combination occurs, so HAVING COUNT(*) > 1 returns only the duplicated combinations.
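A sketch of the GROUP BY and HAVING check, assuming a hypothetical four-column #Orders temp table. The result is stored in #Dupes so it can be inspected again before any rows are deleted:

```sql
-- Hypothetical table whose duplicates match in all four columns.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
IF OBJECT_ID('tempdb..#Dupes') IS NOT NULL DROP TABLE #Dupes;
CREATE TABLE #Orders (CustomerId int, Product varchar(20), Qty int, Region varchar(10));

INSERT INTO #Orders (CustomerId, Product, Qty, Region) VALUES
    (1, 'pen', 2, 'east'),
    (1, 'pen', 2, 'east'),
    (1, 'pen', 2, 'east'),
    (2, 'ink', 1, 'west');

-- Group on every column and keep only combinations that occur more than once.
SELECT CustomerId, Product, Qty, Region, COUNT(*) AS Occurrences
INTO #Dupes
FROM #Orders
GROUP BY CustomerId, Product, Qty, Region
HAVING COUNT(*) > 1;

-- One combination is duplicated: (1, 'pen', 2, 'east') occurs 3 times.
SELECT CustomerId, Product, Qty, Region, Occurrences FROM #Dupes;
```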
To delete duplicate rows from a table in SQL Server, follow these steps:

1. Find the duplicate rows using a GROUP BY clause or the ROW_NUMBER() function.
2. Use a DELETE statement to remove the duplicate rows.
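These two steps can be sketched by running the same ROW_NUMBER() numbering twice: first as a SELECT to preview which rows would go, then as a DELETE. The #Accounts temp table and its Email duplicate key are hypothetical:

```sql
-- Hypothetical table: Id is unique, Email should be unique but is not.
IF OBJECT_ID('tempdb..#Accounts') IS NOT NULL DROP TABLE #Accounts;
CREATE TABLE #Accounts (Id int NOT NULL PRIMARY KEY, Email varchar(50) NOT NULL);

INSERT INTO #Accounts (Id, Email) VALUES
    (1, 'a@example.com'), (2, 'a@example.com'), (3, 'b@example.com');

-- Step 1: find the rows that would be deleted (every copy after the lowest Id).
SELECT Id, Email
FROM (
    SELECT Id, Email,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) AS rn
    FROM #Accounts
) AS numbered
WHERE rn > 1;

-- Step 2: delete exactly those rows, using the same numbering.
WITH numbered AS (
    SELECT Id, Email,
           ROW_NUMBER() OVER (PARTITION BY Email ORDER BY Id) AS rn
    FROM #Accounts
)
DELETE FROM numbered
WHERE rn > 1;
```

Keeping the PARTITION BY and ORDER BY identical in both steps is what guarantees the preview matches what is actually deleted.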
The same technique works in PostgreSQL, for example to keep only the latest record for each AccountId: like SQL Server, PostgreSQL supports ROW_NUMBER() with PARTITION BY. One difference is that PostgreSQL cannot DELETE through a CTE, so the DELETE must target the table itself and match rows on a key column returned by the CTE.