Merging duplicated records together with "Merge" syntax

I am using SQL Server 2014. I am currently trying to combine millions of personnel application records into a single personnel record.

The records contain the following columns:

ID, First_Name, Last_Name, DOB, Post_Code, Mobile, Email

A person can enter their details numerous times, but due to typos or fraud they sometimes enter incorrect details.

In my example Christopher has filled in his details 5 times. First_Name, Last_Name and DOB are always correct; Post_Code, Mobile and Email contain several variations.

What I want to do is take the min(id) for the group, in this case 84015283, and put it into a new table. This will be the primary key (NID), and alongside it you will see the other ids (CID) that are associated with it.

Examples

NID       CID
------------------
84015283  84015283
84015283  84069198
84015283  84070263
84015283  84369603
84015283  85061159

Where it gets a little complicated is that 2 different people can have the same First_Name, Last_Name and DOB. So, as in my example, at least one of the other fields (post_code, mobile or email) must also match another record within the group.

first_name, last_name and DoB match across these ids. 84015283 and 84069198 are identical, so they match without an issue. 84070263 matches on the postcode, 84369603 matches on the mobile of a previous record, and 85061159 matches on a previous mobile/email but not post_code.

If putting the NID within the original dataset is easier I can go with this rather than putting it all in a separate table.

After some googling and trying to get my head around this, I believe that using "Merge" might be a good way to achieve what I am after but I am concerned it will take a very long time due to the number of records involved.

Also going forward any routine would have to be run on subsequent new records.

I have listed the code for the example if anyone can help

DROP TABLE customer_dist

CREATE TABLE [dbo].customer_dist
(
    [id] [int] NOT NULL,
    [First_Name] [varchar](50) NULL,
    [Last_Name] [varchar](50) NULL,
    [DoB] [date] NULL,
    [post_code] [varchar](50) NULL,
    [mobile] [varchar](50) NULL,
    [Email] [varchar](100) NULL,
)

INSERT INTO customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
VALUES ('84015283', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559829', 'CH@hotmail.com'),
       ('84069198', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559829', 'CH@hotmail.com'),
       ('84070263', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559822', 'CHigg@AOL.com'),
       ('84369603', 'Christopher', 'Higg', '1956-01-13', 'CH2 3ZA', '07089559829', 'Higg@emailme.com'),
       ('85061159', 'CHRISTOPHER', 'Higg', '1956-01-13', 'CH2 3RA', '07089559829', 'CH@hotmail.com'),
       ('87065122', 'Matthew', 'Davis', '1978-05-10', 'CH5 1TS', '07077084692', 'Matt@gamil.com')

SELECT * FROM customer_dist

Below are the expected results; sorry, I should have made it clearer what I wanted at the end.

Output Table Results

    NID         id          First_Name   Last_Name  DoB         post_code  mobile      Email
    84015283    84015283    Christopher  Higg       1/13/1956   CH2 3AZ    7089559829  CH@hotmail.com
    84015283    84069198    Christopher  Higg       1/13/1956   CH2 3AZ    7089559829  CH@hotmail.com
    84015283    84070263    Christopher  Higg       1/13/1956   CH2 3AZ    7089559822  CHigg@AOL.com
    84015283    84369603    Christopher  Higg       1/13/1956   CH2 3ZA    7089559829  Higg@emailme.com
    84015283    85061159    CHRISTOPHER  Higg       1/13/1956   CH2 3RA    7089559829  CH@hotmail.com
    87065122    87065122    Matthew      Davis      05/10/1978  CH5 1TS    7077084692  Matt@gamil.com

OR                          

NID         id
84015283    84015283
84015283    84069198
84015283    84070263
84015283    84369603
84015283    85061159
87065122    87065122

Apologies for the slow response.

I have updated my required output. I was asked to include an extra record that did not match the other records, but I had not included it in my required output.

HABO's response was the closest to what was needed. Unfortunately, on further testing with other sample data, duplicates were created and the logic broke down. The other sample data is:

declare @customer_dist as Table (
    [id] [int] NOT NULL,
    [First_Name] [varchar](50) NULL,
    [Last_Name] [varchar](50) NULL,
    [DoB] [date] NULL,
    [post_code] [varchar](50) NULL,
    [mobile] [varchar](50) NULL,
    [Email] [varchar](100) NULL );


INSERT INTO @customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
VALUES ('32006455', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07706212920',  'nastie220@yahoo.com'),
       ('35963960', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07484863324',  'nastie@hotmail.com'),
       ('38627975', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07484863478',  'nastie2001@yahoo.com'),
       ('46653041', 'Mary', 'WILSON',   '1983-09-20',   'BT62JA',   '07483888179',  'nastie2010@yahoo.com'),
       ('48023677', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07483888179',  'nastie@hotmail.com'),
       ('49560434', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07849727199',  'nastie@hotmail.com'),
       ('49861032', 'Mary', 'WILSON',   '1983-09-20',   'BT62JA',   '07849727199',  'nastie2001@yahoo.com'),
       ('53130969', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07849727199',  'Nastie@hotmail.cm'),
       ('33843283', 'Mary', 'Wilson',   '1983-09-20',   'BT148HU',  '07484863478',  'nastie2010@yahoo.co.uk'),
       ('38627975', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07484863478',  'nastie2001@yahoo.com')

SELECT * FROM @customer_dist;

This is not an answer but a comment that is too long to fit in the comments section.

Since the "equality" condition is complex, I think I would do it in phases:

  1. Create the "buckets" of similar customers. A bucket holds all rows with identical first_name, last_name, and dob (names compared in lower case). The id is left out of the key, since it is unique per row and would put every row in its own bucket. Add an index on the new "key" column for faster grouping. A bucket may contain one or more real customers.

    select
        -- '|' separators keep e.g. 'ab'+'c' and 'a'+'bc' from producing the same key
        lower(first_name) + '|' +
        lower(last_name) + '|' +
        convert(varchar(10), dob, 23) as k,
        id, post_code, mobile, email
        into bucket
      from customer_dist;
    
    create index ix1 on bucket(k);
    
  2. Work on each bucket and separate the customers within it. Most likely a bucket holds a single customer, but it can hold several.

Here you'll need to run some iterative algorithm that compares rows, marks them as belonging to the same group or to different groups, and eventually consolidates overlapping groups into single ones. All of this is possible, but I'm afraid I don't see how to do it simply in SQL.

You'll need to do some coding here.
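
The iterative grouping described above amounts to finding connected components: two rows are linked when they share a bucket and at least one of post_code, mobile or email, and a group is everything reachable through such links. One way to sketch that outside SQL is a union-find (disjoint-set) structure. The Python below is only an illustration under assumptions: `ROWS` is a hypothetical in-memory subset of the question's sample data, `assign_nids` is a made-up name, and lowercasing stands in for a case-insensitive collation.

```python
# Hypothetical in-memory copy of a few customer_dist rows:
# (id, first_name, last_name, dob, post_code, mobile, email)
ROWS = [
    (32006455, "Mary", "Wilson", "1983-09-20", "BT62JA", "07706212920", "nastie220@yahoo.com"),
    (33843283, "Mary", "Wilson", "1983-09-20", "BT148HU", "07484863478", "nastie2010@yahoo.co.uk"),
    (38627975, "Mary", "Wilson", "1983-09-20", "BT62JA", "07484863478", "nastie2001@yahoo.com"),
    (87065122, "Matthew", "Davis", "1978-05-10", "CH5 1TS", "07077084692", "Matt@gamil.com"),
]


def assign_nids(rows):
    """Return {id: nid}, where nid is the smallest id in the row's group."""
    parent = {row[0]: row[0] for row in rows}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Always keep the smaller id as root, so the root is min(id).
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (bucket, field name, value) -> first id that carried it
    for rid, first, last, dob, post_code, mobile, email in rows:
        bucket = (first.lower(), last.lower(), dob)
        for field, value in (("post_code", post_code), ("mobile", mobile), ("email", email)):
            if value is None:
                continue  # NULLs never match anything
            key = bucket + (field, value.lower())
            if key in first_seen:
                union(rid, first_seen[key])
            else:
                first_seen[key] = rid

    return {rid: find(rid) for rid in parent}


nids = assign_nids(ROWS)
for rid in sorted(nids):
    print(nids[rid], rid)
```

In this subset 33843283 shares only a mobile with 38627975, which in turn shares a post code with 32006455, so all three end up under NID 32006455 while Matthew keeps his own id. Each row is visited once, so the work grows roughly linearly with the number of rows rather than with the number of row pairs.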

Sorry again for the slow reply. With the initial test data the outcome is correct, but on further testing the output contains duplicate records. I have amended my initial question with new test data. With that data this code produces a duplicate with NId 33843283 and Id 38627975: 10 rows are inserted but the output has 12.


Since you mentioned that your "group" is primarily based on three columns (FirstName, LastName, and DOB), you can create a view to keep track of the minimum ID for each group, and use that view whenever you would like to perform additional processing.

You can also create a CTE. It all depends on how you intend to use your result set.

I would not update existing records in the customer_dist table, since it serves as a raw table in case you want to go back and look at the exact data that users entered at different points in time (if you care about statistics or data trends).

Query in either approach:

SELECT 
  MIN(id) AS Min_Id,
  LOWER(First_Name) AS firstName, LOWER(Last_Name) As lastName, DoB
FROM
customer_dist
GROUP BY 
LOWER(First_Name), LOWER(Last_Name), DoB;

View example

CTE example

[dbo].[LEVENSHTEIN]

CREATE FUNCTION [dbo].[LEVENSHTEIN](@left  VARCHAR(100),
                                @right VARCHAR(100))
RETURNS INT
AS
  BEGIN
      DECLARE @difference    INT,
              @lenRight      INT,
              @lenLeft       INT,
              @leftIndex     INT,
              @rightIndex    INT,
              @left_char     CHAR(1),
              @right_char    CHAR(1),
              @compareLength INT

      SET @lenLeft = LEN(@left)
      SET @lenRight = LEN(@right)
      SET @difference = 0

      IF @lenLeft = 0
        BEGIN
            SET @difference = @lenRight

            GOTO done
        END

      IF @lenRight = 0
        BEGIN
            SET @difference = @lenLeft

            GOTO done
        END

      GOTO comparison

      COMPARISON:

      IF ( @lenLeft >= @lenRight )
        SET @compareLength = @lenLeft
      ELSE
        SET @compareLength = @lenRight

      SET @rightIndex = 1
      SET @leftIndex = 1

      WHILE @leftIndex <= @compareLength
        BEGIN
            SET @left_char = SUBSTRING(@left, @leftIndex, 1)
            SET @right_char = SUBSTRING(@right, @rightIndex, 1)

            IF @left_char <> @right_char
              BEGIN -- Would an insertion make them re-align?
                  IF( @left_char = SUBSTRING(@right, @rightIndex + 1, 1) )
                    SET @rightIndex = @rightIndex + 1
                  -- Would an deletion make them re-align?
                  ELSE IF( SUBSTRING(@left, @leftIndex + 1, 1) = @right_char )
                    SET @leftIndex = @leftIndex + 1

                  SET @difference = @difference + 1
              END

            SET @leftIndex = @leftIndex + 1
            SET @rightIndex = @rightIndex + 1
        END

      GOTO done

      DONE:

          RETURN @difference
      END

    GO

[dbo].[GetPercentageOfTwoStringMatching]

CREATE FUNCTION [dbo].[GetPercentageOfTwoStringMatching]
(
    @string1 NVARCHAR(100)
    ,@string2 NVARCHAR(100)
)
RETURNS INT
AS
BEGIN

    DECLARE @levenShteinNumber INT

    DECLARE @string1Length INT = LEN(@string1)
    , @string2Length INT = LEN(@string2)
    DECLARE @maxLengthNumber INT = CASE WHEN @string1Length > @string2Length THEN @string1Length ELSE @string2Length END

    SELECT @levenShteinNumber = [dbo].[LEVENSHTEIN] (   @string1  ,@string2)

    DECLARE @percentageOfBadCharacters INT = @levenShteinNumber * 100 / @maxLengthNumber

    DECLARE @percentageOfGoodCharacters INT = 100 - @percentageOfBadCharacters

    -- Return the result of the function
    RETURN @percentageOfGoodCharacters
END
GO
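
For comparison, the `LEVENSHTEIN` function above appears to make a single greedy pass over both strings rather than fill in the usual dynamic-programming table, so its result may differ from the textbook edit distance on some inputs. A minimal sketch of the standard formulation, with the same integer percentage arithmetic as `GetPercentageOfTwoStringMatching`, might look like this. The function names are hypothetical, and the empty-string guard is an addition of mine to avoid dividing by zero.

```python
def levenshtein(left: str, right: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    # previous[j] is the distance between the processed prefix of `left`
    # and the first j characters of `right`.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution_cost = 0 if left_char == right_char else 1
            current.append(min(
                previous[j] + 1,                      # delete from left
                current[j - 1] + 1,                   # insert into left
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def match_percentage(string1: str, string2: str) -> int:
    """Integer percentage of 'good' characters, as in the T-SQL wrapper."""
    longest = max(len(string1), len(string2))
    if longest == 0:
        return 100  # assumption: two empty strings count as a full match
    bad_percentage = levenshtein(string1, string2) * 100 // longest
    return 100 - bad_percentage


print(match_percentage("CH2 3AZ", "CH2 3ZA"))  # two substitutions out of 7 characters
```

A score like this only ranks how close two values are; a threshold for treating them as the same person would still have to be chosen and checked against real data.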

Query

    DECLARE @customer_dist TABLE
    (
        [id] [INT] NOT NULL ,
        [First_Name] [VARCHAR](50) NULL ,
        [Last_Name] [VARCHAR](50) NULL ,
        [DoB] [DATE] NULL ,
        [post_code] [VARCHAR](50) NULL ,
        [mobile] [VARCHAR](50) NULL ,
        [Email] [VARCHAR](100) NULL
    );

INSERT INTO @customer_dist ( id ,
                             First_Name ,
                             Last_Name ,
                             DoB ,
                             post_code ,
                             mobile ,
                             Email )
VALUES ( '84015283', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ' ,
         '07089559829' , 'CH@hotmail.com' ) ,
       ( '84069198', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ' ,
         '07089559829' , 'CH@hotmail.com' ) ,
       ( '84070263', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ' ,
         '07089559822' , 'CHigg@AOL.com' ) ,
       ( '84369603', 'Christopher', 'Higg', '1956-01-13', 'CH2 3ZA' ,
         '07089559829' , 'Higg@emailme.com' ) ,
       ( '85061159', 'CHRISTOPHER', 'Higg', '1956-01-13', 'CH2 3RA' ,
         '07089559829' , 'CH@hotmail.com' ) ,
       ( '87065122', 'Matthew', 'Davis', '1978-05-10', 'CH5 1TS' ,
         '07077084692' , 'Matt@gamil.com' ) ,
       ( '94015281', 'Christopher', 'Higg', '1956-01-13', 'NN2 1XH' ,
         '08009777337' , 'CHigg@gmail.com' );



SELECT result.* ,
       [dbo].GetPercentageOfTwoStringMatching(result.DoB, d.DoB) [DOB%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.post_code, d.post_code) [post_code%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.mobile, d.mobile) [mobile%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.Email, d.Email) [email%match]
FROM   (   SELECT (   SELECT MIN(id)
                      FROM   @customer_dist AS sq
                      WHERE  sq.First_Name = cd.First_Name
                             AND sq.Last_Name = cd.Last_Name
                             AND (   sq.mobile = cd.mobile
                                     OR sq.Email = cd.Email
                                     OR sq.post_code = cd.post_code )) nid ,
                  *
           FROM   @customer_dist AS cd ) AS result
       INNER JOIN @customer_dist d ON result.nid = d.id;

Result

Second query

    DECLARE @customer_dist TABLE
    (
        [id] [INT] NOT NULL ,
        [First_Name] [VARCHAR](50) NULL ,
        [Last_Name] [VARCHAR](50) NULL ,
        [DoB] [DATE] NULL ,
        [post_code] [VARCHAR](50) NULL ,
        [mobile] [VARCHAR](50) NULL ,
        [Email] [VARCHAR](100) NULL
    );

INSERT INTO @customer_dist ( id ,
                             First_Name ,
                             Last_Name ,
                             DoB ,
                             post_code ,
                             mobile ,
                             Email )
VALUES ( '84015283', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ' ,
         '07089559829' , 'CH@hotmail.com' ) ,
       ( '84069198', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ' ,
         '07089559829' , 'CH@hotmail.com' ) ,
       ( '84070263', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ' ,
         '07089559822' , 'CHigg@AOL.com' ) ,
       ( '84369603', 'Christopher', 'Higg', '1956-01-13', 'CH2 3ZA' ,
         '07089559829' , 'Higg@emailme.com' ) ,
       ( '85061159', 'CHRISTOPHER', 'Higg', '1956-01-13', 'CH2 3RA' ,
         '07089559829' , 'CH@hotmail.com' ) ,
       ( '87065122', 'Matthew', 'Davis', '1978-05-10', 'CH5 1TS' ,
         '07077084692' , 'Matt@gamil.com' ) ,
       ( '94015281', 'Christopher', 'Higg', '1956-01-13', 'NN2 1XH' ,
         '08009777337' , 'CHigg@gmail.com' );



SELECT result.* ,
       [dbo].GetPercentageOfTwoStringMatching(result.DoB, d.DoB) [DOB%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.post_code, d.post_code) [post_code%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.mobile, d.mobile) [mobile%match] ,
       [dbo].GetPercentageOfTwoStringMatching(result.Email, d.Email) [email%match]
FROM   (   SELECT (   SELECT MIN(id)
                      FROM   @customer_dist AS sq
                      WHERE  sq.First_Name = cd.First_Name
                             AND sq.Last_Name = cd.Last_Name
                             AND (  sq.DoB = cd.DoB   
                                     OR sq.mobile = cd.mobile
                                     OR sq.Email = cd.Email
                                     OR sq.post_code = cd.post_code )) nid ,
                  *
           FROM   @customer_dist AS cd ) AS result
       INNER JOIN @customer_dist d ON result.nid = d.id;

Result:

Try this (necessary comments are in code):

;with cte as (
    SELECT 1 n, 84015283 CID, * FROM @tbl
    where id = 84015283
    union all 
    select c.n + 1, 84015283, t.* from cte c
    join @tbl t on
        c.First_Name = t.first_name and
        c.Last_Name = t.Last_name and
        c.DoB = t.DoB and (
        c.post_code = t.post_code or
        c.mobile = t.mobile or
        c.Email = t.Email 
        ) and
        --there is no way of writing a stop condition here,
        --as the join will return some rows every time,
        --so you have to enter a number big enough for the
        --query to join all records; here 1 suffices
        --(if you enter a bigger number, the result stays the same
        --due to distinct in select)
        c.n <= 1
)

select distinct CID, 
                id NID, 
                First_Name, 
                Last_Name, 
                DoB, 
                post_code, 
                mobile, 
                Email 
from cte

An alternative approach is to use a while loop:

declare @tempTable table
(
    [id] [int] NOT NULL,
    [First_Name] [varchar](50) NULL,
    [Last_Name] [varchar](50) NULL,
    [DoB] [date] NULL,
    [post_code] [varchar](50) NULL,
    [mobile] [varchar](50) NULL,
    [Email] [varchar](100) NULL
);
insert into @tempTable
select *
from @customer_dist

declare @inserted int = -1;
while @inserted <> (select count(*) from @tempTable)
begin
    select @inserted = count(*) from @tempTable
    insert into @tempTable
    select c.* from @customer_dist c
    where exists(select 1 from @tempTable t
                 where c.First_Name = t.first_name and
                       c.Last_Name = t.Last_name and
                       c.DoB = t.DoB and (
                       c.post_code = t.post_code or
                       c.mobile = t.mobile or
                       c.Email = t.Email 
                       )
                 )
    except
    select * from @tempTable
end

select MAX(NID) over (partition by first_name,last_name) NID,
       id, First_Name, Last_Name, DoB, post_code, mobile, Email
from (
    select (case when ROW_NUMBER() over (partition by first_name,last_name order by (select null)) = 1 then 1 else 0 end) * id NID,
           *
    from @tempTable
) a

select * from @tempTable

It loops as long as new records are being added to @tempTable. With your sample data it loops just once.

The difference from the previous query is that each iteration of the loop picks up only new records, thanks to except, which cannot be used in the recursive CTE.

It should also perform better, because it uses exists to determine which rows still need to be added. That is not allowed in the CTE, since the recursive member cannot reference the CTE inside a subquery.

Most of all, it guarantees that you won't miss any records. In the CTE you had to constrain the recursion with c.n <= 1, which carries a risk of losing records.
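
The "keep looping until nothing changes" idea can also be used to compute the NID directly. The sketch below is a hedged illustration in Python, not a translation of the T-SQL above: every row starts with its own id as NID, and each pass lowers a row's NID to the smallest NID among the rows it matches, until a pass changes nothing. The rows are taken from the sample data in this thread, `matches` and `propagate_min_ids` are hypothetical names, and lowercasing stands in for a case-insensitive collation.

```python
FIELDS = ("id", "first", "last", "dob", "post_code", "mobile", "email")
ROWS = [dict(zip(FIELDS, values)) for values in [
    (84015283, "Christopher", "Higg", "1956-01-13", "CH2 3AZ", "07089559829", "CH@hotmail.com"),
    (84070263, "Christopher", "Higg", "1956-01-13", "CH2 3AZ", "07089559822", "CHigg@AOL.com"),
    (84369603, "Christopher", "Higg", "1956-01-13", "CH2 3ZA", "07089559829", "Higg@emailme.com"),
    (85061159, "CHRISTOPHER", "Higg", "1956-01-13", "CH2 3RA", "07089559829", "CH@hotmail.com"),
    (94015281, "Christopher", "Higg", "1956-01-13", "NN2 1XH", "08009777337", "CHigg@gmail.com"),
]]


def matches(a, b):
    """Name and DoB must agree, plus at least one of post_code, mobile, email."""
    same_person_key = (
        a["first"].lower() == b["first"].lower()
        and a["last"].lower() == b["last"].lower()
        and a["dob"] == b["dob"]
    )
    shares_contact = any(
        a[field].lower() == b[field].lower()
        for field in ("post_code", "mobile", "email")
    )
    return same_person_key and shares_contact


def propagate_min_ids(rows):
    """Return {id: nid} by repeating passes until no NID changes."""
    nid = {row["id"]: row["id"] for row in rows}
    changed = True
    while changed:
        changed = False
        for a in rows:
            for b in rows:
                if matches(a, b) and nid[b["id"]] < nid[a["id"]]:
                    nid[a["id"]] = nid[b["id"]]
                    changed = True
    return nid


for row_id, group_id in sorted(propagate_min_ids(ROWS).items()):
    print(group_id, row_id)
```

The nested loops compare every pair of rows on every pass, so this form is only suitable for small samples. For millions of rows the comparison would need to be restricted to rows sharing the same name and date of birth.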

The following example uses a CTE to pair rows (by joining the table with itself) that have matching column values as per the requirements. In each pair the "left" row precedes the "right" in Id order, hence avoiding duplicate results which differ only in having swapped Id values.

The results of the CTE are then combined with an extra row for each group of matching rows to provide the curious extra row that matches itself, i.e. NId = Id.

-- Sample data.
declare @customer_dist as Table (
    [id] [int] NOT NULL,
    [First_Name] [varchar](50) NULL,
    [Last_Name] [varchar](50) NULL,
    [DoB] [date] NULL,
    [post_code] [varchar](50) NULL,
    [mobile] [varchar](50) NULL,
    [Email] [varchar](100) NULL );

INSERT INTO @customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
VALUES ('84015283', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559829', 'CH@hotmail.com'),
       ('84069198', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559829', 'CH@hotmail.com'),
       ('84070263', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559822', 'CHigg@AOL.com'),
       ('84369603', 'Christopher', 'Higg', '1956-01-13', 'CH2 3ZA', '07089559829', 'Higg@emailme.com'),
       ('85061159', 'CHRISTOPHER', 'Higg', '1956-01-13', 'CH2 3RA', '07089559829', 'CH@hotmail.com'),
       ('87065122', 'Matthew', 'Davis', '1978-05-10', 'CH5 1TS', '07077084692', 'Matt@gamil.com');

SELECT * FROM @customer_dist;

-- Process the data.
with PairedRows as (
  -- Pairs of rows where the "left" row precedes the "right" in   Id   order and the rows match per the stated requirements.
  select CDL.id as NId, CDR.id as Id
    from @customer_dist as CDL inner join
      @customer_dist as CDR on
        -- Pair rows where the "left" row precedes the "right" in   Id   order.
        CDR.Id > CDL.Id and
        -- "Must match" columns.
        CDR.First_Name = CDL.First_Name and CDR.Last_Name = CDL.Last_Name and CDR.DoB = CDL.DoB and
        -- Plus at least one optional match.
        ( CDR.post_code = CDL.post_code or CDR.mobile = CDL.mobile or CDR.Email = CDL.Email )
    -- Where there is not a prior row (in   Id   order) that matches the "left" row.
    where not exists (
      select 42 from @customer_dist as NE where NE.ID < CDL.Id and 
        NE.First_Name = CDL.First_Name and NE.Last_Name = CDL.Last_Name and NE.DoB = CDL.DoB and
        ( NE.post_code = CDL.post_code or NE.mobile = CDL.mobile or NE.Email = CDL.Email ) ) )
  select NId, Id -- The paired rows.
    from PairedRows
  union all
  -- Add the   NId   row as a match to itself for every group of paired rows.
  select Min( NId ) as NID, Min( NId ) as Id
    from PairedRows
    group by NId
  order by NID, Id;

Chasing the dancing question section.

The following adds anyone not in a pair to the output through another union all:

-- Process the data.
with PairedRows as ( -- Pairs of rows where the "left" row precedes the "right" in   Id   order and the rows match per the stated requirements.
  select CDL.id as NId, CDR.id as Id
    from @customer_dist as CDL inner join
      @customer_dist as CDR on CDR.Id > CDL.Id and -- Pair rows where the "left" row precedes the "right" in   Id   order.
        CDR.First_Name = CDL.First_Name and CDR.Last_Name = CDL.Last_Name and CDR.DoB = CDL.DoB and -- "Must match" columns.
        ( CDR.post_code = CDL.post_code or CDR.mobile = CDL.mobile or CDR.Email = CDL.Email ) -- Plus at least one optional match.
    where not exists ( -- Where there is not a ...
      select 42 from @customer_dist as NE where NE.ID < CDL.Id and -- ... prior row (in   Id   order) that matches the "left" row.
        NE.First_Name = CDL.First_Name and NE.Last_Name = CDL.Last_Name and NE.DoB = CDL.DoB and
        ( NE.post_code = CDL.post_code or NE.mobile = CDL.mobile or NE.Email = CDL.Email ) ) )
  select NId, Id -- The paired rows.
    from PairedRows
  union all
  select Min( NId ) as NID, Min( NId ) as Id -- Add the   NId   row as a match to itself for every group of paired rows.
    from PairedRows
    group by NId
  union all
  select id, id -- Toss in anyone we haven't heard of.
    from @customer_dist as CD
    where not exists ( select 42 from PairedRows as PR where PR.NId = CD.id or PR.Id = CD.id )
  order by NID, Id;

And yet one more mashup to display the reason for each output row:

-- Sample data.
declare @customer_dist as Table (
    [id] [int] NOT NULL,
    [First_Name] [varchar](50) NULL,
    [Last_Name] [varchar](50) NULL,
    [DoB] [date] NULL,
    [post_code] [varchar](50) NULL,
    [mobile] [varchar](50) NULL,
    [Email] [varchar](100) NULL );

INSERT INTO @customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
VALUES ('32006455', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07706212920',  'nastie220@yahoo.com'),
       ('35963960', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07484863324',  'nastie@hotmail.com'),
       ('38627975', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07484863478',  'nastie2001@yahoo.com'),
       ('46653041', 'Mary', 'WILSON',   '1983-09-20',   'BT62JA',   '07483888179',  'nastie2010@yahoo.com'),
       ('48023677', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07483888179',  'nastie@hotmail.com'),
       ('49560434', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07849727199',  'nastie@hotmail.com'),
       ('49861032', 'Mary', 'WILSON',   '1983-09-20',   'BT62JA',   '07849727199',  'nastie2001@yahoo.com'),
       ('53130969', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07849727199',  'Nastie@hotmail.cm'),
       ('33843283', 'Mary', 'Wilson',   '1983-09-20',   'BT148HU',  '07484863478',  'nastie2010@yahoo.co.uk'),
       -- NB: Unique   Id   in the following row.
       ('386279750', 'Mary', 'Wilson',   '1983-09-20',   'BT62JA',   '07484863478',  'nastie2001@yahoo.com');

INSERT INTO @customer_dist (id, First_Name, Last_Name, DoB, post_code, mobile, Email)
VALUES ('84015283', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559829', 'CH@hotmail.com'),
       ('84069198', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559829', 'CH@hotmail.com'),
       ('84070263', 'Christopher', 'Higg', '1956-01-13', 'CH2 3AZ', '07089559822', 'CHigg@AOL.com'),
       ('84369603', 'Christopher', 'Higg', '1956-01-13', 'CH2 3ZA', '07089559829', 'Higg@emailme.com'),
       ('85061159', 'CHRISTOPHER', 'Higg', '1956-01-13', 'CH2 3RA', '07089559829', 'CH@hotmail.com'),
       ('87065122', 'Matthew', 'Davis', '1978-05-10', 'CH5 1TS', '07077084692', 'Matt@gamil.com');

SELECT * FROM @customer_dist;
select ( select Count(*) from @customer_dist ) as TotalRows, ( select Count( distinct id ) from @customer_dist ) as DistinctIds;

-- Process the data.
with PairedRows as ( -- Pairs of rows where the "left" row precedes the "right" in   Id   order and the rows match per the stated requirements.
  select CDL.id as NId, CDR.id as Id
    from @customer_dist as CDL inner join
      @customer_dist as CDR on CDR.Id > CDL.Id and -- Pair rows where the "left" row precedes the "right" in   Id   order.
        CDR.First_Name = CDL.First_Name and CDR.Last_Name = CDL.Last_Name and CDR.DoB = CDL.DoB and -- "Must match" columns.
        ( CDR.post_code = CDL.post_code or CDR.mobile = CDL.mobile or CDR.Email = CDL.Email ) -- Plus at least one optional match.
    where not exists ( -- Where there is not a ...
      select 42 from @customer_dist as NE where NE.ID < CDL.Id and -- ... prior row (in   Id   order) that matches the "left" row.
        NE.First_Name = CDL.First_Name and NE.Last_Name = CDL.Last_Name and NE.DoB = CDL.DoB and
        ( NE.post_code = CDL.post_code or NE.mobile = CDL.mobile or NE.Email = CDL.Email ) ) ),
  Results as (
    select NId, Id, 'Paired' as Reason -- The paired rows.
      from PairedRows
    union all
    select Min( NId ) as NID, Min( NId ) as Id, 'Self' -- Add the   NId   row as a match to itself for every group of paired rows.
      from PairedRows
      group by NId
    union all
    select id, id, 'Other' -- Toss in anyone we haven't heard of.
      from @customer_dist as CD
      where not exists ( select 42 from PairedRows as PR where PR.NId = CD.id or PR.Id = CD.id ) )
  select R.NId, R.Id, R.Reason,
    CDL.First_Name, CDL.Last_Name,
    case when CDL.DoB = CDR.DoB then '=' else '' end as MatchDoB, -- Must match.
    case when CDL.post_code = CDR.post_code then '=' else '' end as MatchPostCode,
    case when CDL.mobile = CDR.mobile then '=' else '' end as MatchMobile,
    case when CDL.Email = CDR.Email then '=' else '' end as MatchEmail,
    case when CDL.id = CDR.id then '==' else '' end as MatchSelf,
    case when ( select Count(*) from Results as IR where IR.NId = R.NId and IR.Id = R.Id ) > 1 then '#' else '' end as 'Duplicate'
    from Results as R inner join
      @customer_dist as CDL on CDL.id = R.NId inner join
      @customer_dist as CDR on CDR.id = R.Id
    order by NID, Id;
