SQL Union not including duplicates based on single column?

I'm trying to union two tables but I need to essentially 'prefer' the first table using just one 'id' column. If an 'id' appears in the second table that already exists in the first, I do not want to include that record.

The query looks like this:

            select id, col2, col3
            from table(p_package.getData(param))

            union

            select id, col2, col3 
            from table1         
            where col7 = 'pass'
            and col8 <> 'A' 
            and col9 = to_date(Date, 'mm/dd/yyyy')

p_package.getData(param) is a pipelined function that returns a table. I would like to avoid calling it twice for performance reasons.

You can use the ROW_NUMBER() analytic function to remove the duplicates:

SELECT id, col2, col3
FROM   (
  SELECT id, col2, col3,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
  FROM   (
    select id, col2, col3, 1 AS priority
    from   table(p_package.getData(param))
  UNION ALL
    select id, col2, col3, 2
    from table1         
    where col7 = 'pass'
    and   col8 <> 'A' 
    and   col9 = to_date(Date, 'mm/dd/yyyy')
  )
)
WHERE rn = 1

As a bonus, since the duplicates are filtered out by the outer query, the inner query can use UNION ALL instead of UNION, which skips an unnecessary duplicate-elimination step.
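To see the effect of the ROW_NUMBER() approach on concrete rows, here is a minimal, self-contained sketch in Oracle SQL. The inline data sets "preferred" and "fallback" are hypothetical stand-ins (not from the question) for the pipelined function's output and the filtered rows of table1, and only two columns are used to keep it short.

```sql
-- Hypothetical sample data: "preferred" stands in for the pipelined
-- function's output, "fallback" for the filtered rows of table1.
WITH preferred AS (
  SELECT 1 AS id, 'a' AS col2 FROM dual UNION ALL
  SELECT 2,       'b'         FROM dual
),
fallback AS (
  SELECT 2 AS id, 'x' AS col2 FROM dual UNION ALL
  SELECT 3,       'y'         FROM dual
)
SELECT id, col2
FROM (
  SELECT id, col2,
         -- Number the rows for each id, lowest priority value first
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
  FROM (
    SELECT id, col2, 1 AS priority FROM preferred
    UNION ALL
    SELECT id, col2, 2 AS priority FROM fallback
  )
)
WHERE rn = 1   -- keep only the best-ranked row per id
ORDER BY id;
-- Returns (1, 'a'), (2, 'b'), (3, 'y'):
-- id 2 exists in both sources, so the "preferred" row wins.
```

Each source is read only once here, which is what matters when one of them is an expensive function call.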

If the pipelined function can return duplicate id values and you want to keep all of those rows, while still excluding any row from table1 whose id the function already returned, then:

SELECT id, col2, col3
FROM   (
  SELECT id, col2, col3, priority,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
  FROM   (
    select id, col2, col3, 1 AS priority
    from   table(p_package.getData(param))
  UNION ALL
    select id, col2, col3, 2
    from table1         
    where col7 = 'pass'
    and   col8 <> 'A' 
    and   col9 = to_date(Date, 'mm/dd/yyyy')
  )
)
WHERE priority = 1
OR    rn = 1
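The same filter can be tried on sample rows. This is a sketch with hypothetical inline data ("preferred" for the pipelined function, "fallback" for table1), where the function returns id 1 twice:

```sql
-- Hypothetical sample data; id 1 appears twice in "preferred".
WITH preferred AS (
  SELECT 1 AS id, 'a' AS col2 FROM dual UNION ALL
  SELECT 1,       'b'         FROM dual
),
fallback AS (
  SELECT 1 AS id, 'x' AS col2 FROM dual UNION ALL
  SELECT 3,       'y'         FROM dual
)
SELECT id, col2
FROM (
  SELECT id, col2, priority,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
  FROM (
    SELECT id, col2, 1 AS priority FROM preferred
    UNION ALL
    SELECT id, col2, 2 AS priority FROM fallback
  )
)
WHERE priority = 1   -- every row from the preferred source is kept
OR    rn = 1         -- a fallback row is kept only if it ranks first for its id
ORDER BY id, col2;
-- Returns (1, 'a'), (1, 'b'), (3, 'y'); the fallback row (1, 'x') is dropped.
```

Note that if the fallback source itself held several rows for an id that the preferred source lacks, only one of them would have rn = 1, and which one is not determined by ORDER BY priority alone.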

What happens if the tables we perform UNION on have duplicate rows? When you combine tables with UNION, duplicate rows are excluded. To include duplicates, certain versions of SQL provide the UNION ALL operator. Rows count as duplicates only when every column matches; if even one column differs, both rows are kept. The syntax is: SELECT column1, column2 FROM table1 UNION [ ALL ] SELECT column3, column4 FROM table2; To use the UNION operator, you write the individual SELECT statements and join them with the keyword UNION. The columns returned by the SELECT statements must have the same or convertible data types and sizes, and must be in the same order.
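This whole-row comparison is why a plain UNION does not solve the question's problem. A minimal Oracle SQL sketch, using hypothetical inline data sets "a" and "b":

```sql
-- UNION compares entire rows, not just the id column.
WITH a AS (
  SELECT 1 AS id, 'a' AS col2 FROM dual UNION ALL
  SELECT 2,       'b'         FROM dual
),
b AS (
  SELECT 2 AS id, 'b' AS col2 FROM dual UNION ALL   -- identical to a row in "a"
  SELECT 2,       'c'         FROM dual             -- same id, different col2
)
SELECT id, col2 FROM a
UNION
SELECT id, col2 FROM b
ORDER BY id, col2;
-- Returns (1, 'a'), (2, 'b'), (2, 'c'): the exact duplicate (2, 'b') is
-- removed, but (2, 'c') survives because it differs in col2.
```

Replacing UNION with UNION ALL in the same query would return all four rows, including (2, 'b') twice.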

Assuming you don't want to include any col1 value in the second half of the union which would introduce a value already included in the first half, you could use an exists clause:

select col1, col2, col3
from table(p_package.getData(param))
union
select col1, col2, col3 
from table1 t1
where col7 = 'pass' and col8 <> 'A' and col9 = to_date(Date, 'mm/dd/yyyy') and
      not exists (select 1 from table(p_package.getData(param)) t2
                  where t1.col1 = t2.col1);

The SQL UNION ALL operator combines the result sets of two or more SELECT statements and does not remove duplicate rows. To remove duplicates from a result set, you use the DISTINCT operator in the SELECT clause as follows: SELECT DISTINCT column1, column2 FROM table1; If you use one column after the DISTINCT operator, the database system uses that column to evaluate duplicates. If you use two or more columns, the database system uses the combination of values in those columns for the duplicate check.

The other solutions work, but I opted to use a common table expression, as suggested by xQbert:

        with cte as
        (select id, col2, col3
        from table(p_package.getData(param)))

        select * from cte

        union

        select id, col2, col3 
        from table1         
        where col7 = 'pass'
        and col8 <> 'A' 
        and col9 = to_date(Date, 'mm/dd/yyyy')
        and id not in (select id from cte)

EDIT: I realized that a CTE does not actually store the data returned by a query; it stores the query itself. While this works, it does not avoid calling the pipelined function twice.
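One commonly mentioned way to ask Oracle to evaluate a CTE once and reuse the result is the MATERIALIZE hint. The sketch below rests on assumptions: the hint is undocumented, the optimizer may ignore it, and the inline data is a hypothetical stand-in for the pipelined function and table1. Whether the function is really called only once should be confirmed in the execution plan.

```sql
WITH cte AS (
  -- MATERIALIZE is an undocumented Oracle hint (assumption: honored by
  -- your version). In the real query this subquery would be
  -- table(p_package.getData(param)); stand-in data is used here.
  SELECT /*+ MATERIALIZE */ id, col2
  FROM (
    SELECT 1 AS id, 'a' AS col2 FROM dual UNION ALL
    SELECT 2,       'b'         FROM dual
  )
),
table1_rows AS (
  -- Hypothetical stand-in for the filtered rows of table1
  SELECT 2 AS id, 'x' AS col2 FROM dual UNION ALL
  SELECT 3,       'y'         FROM dual
)
SELECT id, col2 FROM cte
UNION ALL
SELECT t.id, t.col2
FROM   table1_rows t
WHERE  NOT EXISTS (SELECT 1 FROM cte c WHERE c.id = t.id)
ORDER BY id;
-- Returns (1, 'a'), (2, 'b'), (3, 'y')
```

NOT EXISTS is used here instead of NOT IN because NOT IN returns no rows at all if the subquery yields a NULL id. UNION ALL is enough once the overlap is excluded explicitly, though unlike UNION it will not remove exact duplicates within either source.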

Merging two selects and removing duplicates: when using UNION, the selected columns need to be the same in every query. The UNION ALL operator combines the results of two or more queries into a single result set that includes all the rows that belong to all of the queries. In simple terms, it combines two or more row sets and keeps duplicates. For example, if table 'A' has 1, 2, and 3 and table 'B' has 3, 4, and 5, UNION ALL returns 1, 2, 3, 3, 4, 5, whereas UNION returns 1, 2, 3, 4, 5.

How to union two queries without duplicates: the UNION operator between the two queries removes duplicated rows, for example SELECT columnlist FROM table1 UNION SELECT columnlist FROM table2. In order to union two tables there are a couple of requirements: the number of columns must be the same for both SELECT statements, and the columns, in order, must be of the same data type. When the rows are combined, duplicate rows are eliminated.

By default, a SQL UNION returns only distinct rows. If you want duplicates (i.e. all rows from both tables), you need UNION ALL.

The UNION operator combines multiple result sets into a single result set by removing duplicates. To find the duplicate values in a table, follow these steps: first, define the criteria for duplicates (values in a single column or in multiple columns); second, write a query to search for them.

Comments
  • I don't see any columns here named id or anything like it. Can you explain what the id requirement is?
  • @TimBiegeleisen sorry, id should have been col1. Updated the question
  • Consider using a common table expression for the pipelined function and then referencing the CTE as an exclusion in the second query. Since the CTE would be in memory already, the function call shouldn't occur twice.
  • @xQbert thanks for that suggestion! It's exactly what I was looking for, but I had never heard of CTEs, nor did I come across them in any of my searching.
  • I'm assuming you double-checked to ensure the execution plan didn't show the pipelined function getting hit twice. I don't think it would occur twice since the CTE is already in memory, but I'm not POSITIVE about it ;P
  • wouldn't this end up calling the p_package.getData() function twice?
  • Well I don't see any way around this TBH. I mean, you could just do the union and sort things out afterwards, but that would also be a bunch of work.
  • Would using a join possibly be a way around it? I'm open to that but can't think of how
  • How would a join be a way around this? The second half of the union has to "know" what the first half is doing, to avoid bringing in the same col1 value, right?