How to delete duplicate records in a Snowflake database table

How do I delete the duplicate records from a Snowflake table? Thanks.

ID Name
1  Apple
1  Apple
2  Apple
3  Orange
3  Orange

Result should be:

ID Name
1  Apple
2  Apple
3  Orange

If the table has a primary key (the key column below), such as:

CREATE TABLE fruit (key number, id number, name text);

insert into fruit values (1,1, 'Apple'), (2,1,'Apple'),
      (3,2, 'Apple'), (4,3, 'Orange'), (5,3, 'Orange');

then you can delete every row after the first one in each (id, name) group:

DELETE FROM fruit
WHERE key in (
  SELECT key 
  FROM (
      SELECT key
          ,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY key) AS rn
      FROM fruit
  )
  WHERE rn > 1
);
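To see which (id, name) combinations are duplicated before the delete, or to confirm that none remain afterwards, a grouping query along these lines may help. It is only a sketch against the fruit table defined above:

```sql
-- List every (id, name) combination that occurs more than once.
-- After the DELETE above has run, this should return no rows.
SELECT id, name, COUNT(*) AS copies
FROM fruit
GROUP BY id, name
HAVING COUNT(*) > 1;
```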

But if you do not have a unique key, you cannot delete that way. In that case, build a deduplicated copy of the table that keeps one row from each group:

CREATE TABLE new_table_name AS
SELECT id, name FROM (
    SELECT id
        ,name
        ,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id, name) AS rn
    FROM table_name
)
WHERE rn = 1;

and then swap the two tables:

ALTER TABLE table_name SWAP WITH new_table_name;
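After the swap, the name table_name refers to the newly built table and new_table_name refers to the original data. A cautious follow-up, sketched here with the same placeholder table names, is to compare row counts before dropping the old copy:

```sql
-- Compare the size of the new table with the original data.
SELECT (SELECT COUNT(*) FROM table_name)     AS rows_in_new_table,
       (SELECT COUNT(*) FROM new_table_name) AS rows_in_original;

-- Drop the original data only once the counts look right.
DROP TABLE new_table_name;
```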

Snowflake does not enforce primary keys; their use is primarily with ERD tools. Snowflake does not have anything like a ROWID either, so there is no way to single out one of several identical rows for deletion.

It is possible to temporarily add an "is_duplicate" column, e.g. numbering all the duplicates with the ROW_NUMBER() function, then delete all records with "is_duplicate" > 1, and finally drop the utility column.

Another way is to create a duplicate table and swap, as others have suggested. However, constraints and grants must be kept. One way to do this is:

CREATE TABLE new_table LIKE old_table COPY GRANTS;
INSERT INTO new_table SELECT DISTINCT * FROM old_table;
ALTER TABLE old_table SWAP WITH new_table;

The code above removes exact duplicates. If you want to end up with a row for each "PK" you need to include logic to select which copy you want to keep.
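One possible form of that logic is sketched below. It assumes two hypothetical columns that are not part of the example above: pk (the intended key) and updated_at (a last-modified timestamp). It keeps the most recent row for each pk:

```sql
CREATE TABLE new_table LIKE old_table COPY GRANTS;

-- pk and updated_at are assumed column names; adjust them to your schema.
-- For each pk, keep only the row with the latest updated_at.
INSERT INTO new_table
SELECT *
FROM old_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY updated_at DESC) = 1;

ALTER TABLE old_table SWAP WITH new_table;
```

If two rows for the same pk share the same updated_at, which one survives is arbitrary unless you add further columns to the ORDER BY.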

This illustrates the importance of adding update timestamp columns in a Snowflake data warehouse.

This has been bothering me for some time as well. Now that Snowflake supports QUALIFY, you can create a deduplicated table with a single statement and no subselects:

CREATE TABLE fruit (id number, nam text);
insert into fruit values (1, 'Apple'), (1,'Apple'),
      (2, 'Apple'), (3, 'Orange'), (3, 'Orange');


CREATE OR REPLACE TABLE fruit AS 
SELECT * FROM 
fruit 
qualify row_number() OVER (PARTITION BY id, nam ORDER BY id, nam) = 1;
SELECT * FROM fruit;

Of course, you are left with a new table and lose table history, primary keys, foreign keys and such.
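If recreating the table is a concern, one alternative worth testing is INSERT OVERWRITE, which replaces the rows of an existing table instead of replacing the table itself. This is a sketch that assumes INSERT OVERWRITE behaves as described in the Snowflake documentation (truncate and insert in one statement); try it on a copy first:

```sql
-- Replace the contents of fruit with its distinct rows.
-- The table object itself is kept rather than recreated.
INSERT OVERWRITE INTO fruit
SELECT DISTINCT * FROM fruit;
```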

Your question boils down to: how can I delete one of two perfectly identical rows? You can't. You can only run DELETE FROM fruit WHERE ID = 1 AND Name = 'Apple'; and then both rows go away. Or you don't, and keep both.

For some databases, there are workarounds using internal row identifiers, but Snowflake has none; see https://support.snowflake.net/s/question/0D50Z00008FQyGqSAL/is-there-an-internalmetadata-unique-rowid-in-snowflake-that-i-can-reference . You cannot limit the number of rows a DELETE affects, either, so your only option is to create a new table and swap.


Additional note on Hans Henrik Eriksen's remark on the importance of update timestamps: this is a real help when the duplicates were added later. If, for example, you want to keep the newer values, you can then do this:

-- setup
create table fruit (ID Integer, Name VARCHAR(16777216), "UPDATED_AT" TIMESTAMP_NTZ);
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (2, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- wait > 1 nanosecond
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);

-- delete older duplicates (DESC)
DELETE FROM fruit
  WHERE (ID
  , UPDATED_AT) IN (
     SELECT ID
     , UPDATED_AT
     FROM (
         SELECT ID
         , UPDATED_AT
         , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS rn
         FROM fruit
     )
     WHERE rn > 1
  );
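Note that this delete matches rows on (ID, UPDATED_AT), so two duplicates that share the same timestamp would be removed together, or kept together if they are the newest. A check along these lines, run before the delete, can show whether such ties exist in the fruit table above:

```sql
-- Rows that share both ID and UPDATED_AT cannot be told apart by the delete.
-- An empty result means every duplicate has a distinct timestamp.
SELECT ID, UPDATED_AT, COUNT(*) AS copies
FROM fruit
GROUP BY ID, UPDATED_AT
HAVING COUNT(*) > 1;
```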

Comments
  • Combining WITH ... AS and DELETE throws an error for me: SQL compilation error: syntax error line 10 at position 0 unexpected 'DELETE'. I think you can only use SELECT, see docs.snowflake.net/manuals/sql-reference/constructs/…
  • Fair point, I've not tested it. Since the CTE is not reused (it is referenced only once), it can just be pushed into a subselect in the WHERE key IN (SELECT ...) form.
  • Very true, replaced with a subselect.
  • In my experience, duplicate deletion is mostly done manually, so swapping the table and then setting the permissions is the easiest.