Delete duplicate rows (but keep one copy of each)

I am using PostgreSQL. I want to delete duplicate rows, with the condition that one copy from each set of duplicate rows must not be deleted.

For example, if there are 5 duplicate records, then 4 of them should be deleted.

Try the steps described in this article: Removing duplicates from a PostgreSQL database.

It describes the situation where you have to deal with a huge amount of data that isn't possible to group by directly.

A simple solution would be this:

DELETE FROM foo
       WHERE id NOT IN (SELECT min(id) -- or max(id)
                        FROM foo
                        GROUP BY hash);

Here, hash is the column (or expression) whose value is duplicated, such as a hash computed over all the columns you want to compare.
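A minimal, self-contained sketch of this approach, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (the `DELETE ... NOT IN` statement itself is portable SQL). Following the article's idea, a hash over all the compared columns is stored in a `hash` column; here it is computed on the Python side with `hashlib.md5`. The table `foo` with columns `name` and `email` is a hypothetical example, and the helper `row_hash` is illustrative only.

```python
import hashlib
import sqlite3


def row_hash(*values):
    # Hypothetical helper: hash all the columns that define a duplicate into one value.
    # The separator makes ("ab", "c") and ("a", "bc") hash differently.
    joined = "\x1f".join(str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def dedupe_by_hash(conn):
    # Keep the row with the smallest id in each hash group and delete the rest.
    conn.execute(
        """
        DELETE FROM foo
         WHERE id NOT IN (SELECT min(id)  -- or max(id)
                            FROM foo
                           GROUP BY hash)
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT, email TEXT, hash TEXT)")
rows = [
    ("alice", "a@example.com"),
    ("bob", "b@example.com"),
    ("alice", "a@example.com"),
    ("alice", "a@example.com"),
    ("bob", "b@example.com"),
    ("carol", "c@example.com"),
]
conn.executemany(
    "INSERT INTO foo (name, email, hash) VALUES (?, ?, ?)",
    [(n, e, row_hash(n, e)) for n, e in rows],
)

dedupe_by_hash(conn)
remaining = conn.execute("SELECT id, name, email FROM foo ORDER BY id").fetchall()
print(remaining)
```

Of the three "alice" rows and two "bob" rows, only the lowest id of each survives. In PostgreSQL you could compute the hash in SQL instead (for example with `md5()` over the concatenated columns) rather than in application code.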

delete from mytable
where not id in
(select max(id) from mytable group by col1, col2)  -- col1, col2: the columns that define a duplicate row

Using max(id) is an arbitrary choice of which row to keep (the one with the highest id). If you need a different rule for choosing the surviving row, please provide more details.
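A small sketch of the `max(id)` variant, run with Python's `sqlite3` module as a stand-in for PostgreSQL. The table `mytable` and the columns `col1` and `col2` are hypothetical placeholders for your table and the columns that define a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: col1 and col2 together define a "duplicate row".
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, col1, col2) VALUES (?, ?, ?)",
    [(1, "x", 10), (2, "x", 10), (3, "y", 20), (4, "x", 10), (5, "y", 20), (6, "z", 30)],
)

# Keep only the highest id of each (col1, col2) group.
conn.execute(
    """
    DELETE FROM mytable
     WHERE id NOT IN (SELECT max(id)
                        FROM mytable
                       GROUP BY col1, col2)
    """
)
conn.commit()

kept = conn.execute("SELECT id, col1, col2 FROM mytable ORDER BY id").fetchall()
print(kept)
```

Two details worth knowing: GROUP BY puts NULLs into the same group, so rows that are equal apart from both having NULL in a column are still treated as duplicates; and id should be NOT NULL (for example a primary key), because a NULL inside a NOT IN list makes the condition never true, so nothing would be deleted.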

The fastest way is to join the table to itself:

mapy=# CREATE TABLE test(id INT, id2 INT);
mapy=# INSERT INTO test VALUES(1,2);
mapy=# INSERT INTO test VALUES(1,3);
mapy=# INSERT INTO test VALUES(1,4);

mapy=# DELETE FROM test t1 USING test t2 WHERE t1.id = t2.id AND t1.id2 < t2.id2;
mapy=# SELECT * FROM test;
 id | id2
----+-----
  1 |   4
(1 row)
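A runnable sketch of the same self-join rule, using Python's `sqlite3` module. SQLite has no `DELETE ... USING`, so this version swaps in an equivalent correlated `EXISTS` subquery: a row is deleted when another row with the same `id` has a strictly larger `id2`. The extra `(2, 5)` rows are added here only to illustrate a limitation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INT, id2 INT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 5), (2, 5)],
)

# Same rule as DELETE ... USING: drop a row if another row with the same id
# has a strictly larger id2. EXISTS is used because SQLite lacks DELETE ... USING.
conn.execute(
    """
    DELETE FROM test
     WHERE EXISTS (SELECT 1
                     FROM test AS t2
                    WHERE t2.id = test.id
                      AND t2.id2 > test.id2)
    """
)
conn.commit()

result = conn.execute("SELECT id, id2 FROM test ORDER BY id, id2").fetchall()
print(result)
```

Note that the two identical `(2, 5)` rows both survive: with a strict `<` comparison, neither row is "smaller" than the other. To remove fully identical rows this way, the comparison needs a column that is unique per row.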

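-- rowid is Oracle's row identifier; col1, col2 are the columns that define a duplicate
delete from mytable t1
where rowid > (select min(t2.rowid) from mytable t2
               where t2.col1 = t1.col1
                 and t2.col2 = t1.col2);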
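A runnable sketch of the rowid approach as a correlated subquery. SQLite also exposes a `rowid`, so Python's `sqlite3` module can run it directly, even on a table with no key column at all. The table `mytable` and the columns `col1` and `col2` are hypothetical. In PostgreSQL the closest equivalent is the system column `ctid`; whether `min()` and comparisons work on it may depend on your PostgreSQL version, so treat that translation as an assumption to verify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key column: the three ("x", 10) rows are fully identical, as are the two ("y", 20) rows.
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("x", 10), ("x", 10), ("y", 20), ("x", 10), ("y", 20)],
)

# Delete every row whose rowid is larger than the smallest rowid among its duplicates.
conn.execute(
    """
    DELETE FROM mytable
     WHERE rowid > (SELECT min(t2.rowid)
                      FROM mytable AS t2
                     WHERE t2.col1 = mytable.col1
                       AND t2.col2 = mytable.col2)
    """
)
conn.commit()

result = conn.execute("SELECT rowid, col1, col2 FROM mytable ORDER BY rowid").fetchall()
print(result)
```

Because the subquery uses `=`, rows with NULL in col1 or col2 never match themselves, so they are never deleted. A null-safe comparison (`IS` in SQLite, `IS NOT DISTINCT FROM` in PostgreSQL) would treat them as duplicates too.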

  • possible duplicate of How to delete duplicate rows with SQL?
  • how ironic! lol 'possible duplicate of how to delete duplicates'...
  • doesn't this delete all the rows that don't have duplicates, too?
  • @pomarc no, because there's this little equals sign (=) before 1 that tells us that we want to take min(id) of all possible groups, even those that contain only one member; so, no worries, you won't delete data that is not duplicated
  • Is the HAVING count(*) >= 1 necessary? I got the same result when I executed: DELETE FROM foo WHERE id NOT IN (SELECT min(id) FROM foo GROUP BY hash)
  • @grteibo you are absolutely right, that's the way the deduplication is usually done; I don't know why I didn't notice that before; the idea of this answer is not so much the idea of deduplication itself but the fact that we calculate a hash for all the columns that we want to group by and then remove duplicates