SQL Union not including duplicates based on single column?
I'm trying to union two tables but I need to essentially 'prefer' the first table using just one 'id' column. If an 'id' appears in the second table that already exists in the first, I do not want to include that record.
Query looks like this:
select id, col2, col3
from table(p_package.getData(param))
union
select id, col2, col3
from table1
where col7 = 'pass'
  and col8 <> 'A'
  and col9 = to_date(Date, 'mm/dd/yyyy')
p_package.getData(param) is a pipelined function which returns a table. I would like to avoid calling it twice, for performance reasons.
You can use the ROW_NUMBER() analytic function to remove the duplicates:
SELECT id, col2, col3
FROM (
  SELECT id, col2, col3,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
  FROM (
    select id, col2, col3, 1 AS priority
    from table(p_package.getData(param))
    UNION ALL
    select id, col2, col3, 2
    from table1
    where col7 = 'pass'
      and col8 <> 'A'
      and col9 = to_date(Date, 'mm/dd/yyyy')
  )
)
WHERE rn = 1
As a bonus, since the duplicates are now filtered by ROW_NUMBER(), you can change UNION to UNION ALL, as the query above already does.
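To see what the ROW_NUMBER() filter does, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table func_rows is a hypothetical stand-in for table(p_package.getData(param)), table1 is assumed to be already filtered, and the sample rows are made up. It assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical stand-ins: "func_rows" plays the role of
# table(p_package.getData(param)); "table1" is assumed already filtered.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE func_rows (id INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE table1    (id INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO func_rows VALUES (1, 'f1', 'x'), (2, 'f2', 'x');
    INSERT INTO table1    VALUES (2, 't2', 'y'), (3, 't3', 'y');
""")

# Tag each source with a priority, number the rows per id by priority,
# and keep only the first row for each id.
query = """
    SELECT id, col2, col3
    FROM (
        SELECT id, col2, col3,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
        FROM (
            SELECT id, col2, col3, 1 AS priority FROM func_rows
            UNION ALL
            SELECT id, col2, col3, 2 AS priority FROM table1
        )
    )
    WHERE rn = 1
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # id 2 comes from func_rows; id 3 only exists in table1
```

Each source is read once here, which is the point of this approach for the pipelined function.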
If the pipelined function can return duplicate id values and you want to keep those, but not any matching rows from table1, then:
SELECT id, col2, col3
FROM (
  SELECT id, col2, col3, priority,
         ROW_NUMBER() OVER ( PARTITION BY id ORDER BY priority ) AS rn
  FROM (
    select id, col2, col3, 1 AS priority
    from table(p_package.getData(param))
    UNION ALL
    select id, col2, col3, 2
    from table1
    where col7 = 'pass'
      and col8 <> 'A'
      and col9 = to_date(Date, 'mm/dd/yyyy')
  )
)
WHERE priority = 1 OR rn = 1
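The effect of the priority = 1 OR rn = 1 filter can be checked with a small sketch, again using Python's sqlite3 instead of Oracle (window functions need SQLite 3.25 or later). The func_rows table is a hypothetical stand-in for the pipelined function and the rows are invented for illustration.

```python
import sqlite3

# Hypothetical stand-ins: "func_rows" replaces the pipelined function
# and deliberately contains a repeated id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE func_rows (id INTEGER, col2 TEXT);
    CREATE TABLE table1    (id INTEGER, col2 TEXT);
    INSERT INTO func_rows VALUES (1, 'f1a'), (1, 'f1b'), (2, 'f2');
    INSERT INTO table1    VALUES (1, 't1'), (3, 't3');
""")

# Keep every priority-1 row, plus a table1 row only when it is the
# first row for its id (that is, the id is absent from func_rows).
query = """
    SELECT id, col2
    FROM (
        SELECT id, col2, priority,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
        FROM (
            SELECT id, col2, 1 AS priority FROM func_rows
            UNION ALL
            SELECT id, col2, 2 AS priority FROM table1
        )
    )
    WHERE priority = 1 OR rn = 1
    ORDER BY id, col2
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Both func_rows entries for id 1 survive, while the table1 row for id 1 is dropped. Note that if table1 itself repeats an id that is missing from func_rows, only one of those rows is kept.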
What happens if the tables we perform UNION on have duplicate rows? When you combine tables with UNION, duplicate rows are excluded. If you want to include duplicates, certain versions of SQL provide the UNION ALL operator. A row that differs from another row in even one column is not a duplicate and is kept. SELECT column1, column2 FROM table1 UNION [ ALL ] SELECT column3, column4 FROM table2; To use the UNION operator, you write the individual SELECT statements and join them with the keyword UNION. The columns returned by the SELECT statements must have the same or convertible data types and be in the same order.
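The whole-row comparison is why a plain UNION does not solve the original question. A minimal sketch with Python's sqlite3 (tables a and b and their rows are hypothetical) shows that rows sharing an id but differing elsewhere both survive a UNION:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col2 TEXT);
    CREATE TABLE b (id INTEGER, col2 TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'x');
    INSERT INTO b VALUES (2, 'x'), (2, 'y');
""")

# UNION drops only rows that match in every column.
union_rows = conn.execute(
    "SELECT id, col2 FROM a UNION SELECT id, col2 FROM b ORDER BY id, col2"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT id, col2 FROM a UNION ALL SELECT id, col2 FROM b ORDER BY id, col2"
).fetchall()

print(union_rows)
print(union_all_rows)
```

The exact duplicate (2, 'x') is collapsed by UNION, but (2, 'y') stays because it differs in col2.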
Assuming you don't want to include any col1 value in the second half of the union which would introduce a value already included in the first half, you could use an exists clause:
select col1, col2, col3
from table(p_package.getData(param))
union
select col1, col2, col3
from table1 t1
where col7 = 'pass'
  and col8 <> 'A'
  and col9 = to_date(Date, 'mm/dd/yyyy')
  and not exists (select 1
                  from table(p_package.getData(param)) t2
                  where t1.col1 = t2.col1);
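A small sketch of the NOT EXISTS pattern, run against SQLite through Python's sqlite3 rather than Oracle. The func_rows table is a hypothetical stand-in for the pipelined function; in the real query that source appears twice, which is the concern about calling the function twice.

```python
import sqlite3

# Hypothetical stand-ins for the pipelined function and table1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE func_rows (col1 INTEGER, col2 TEXT);
    CREATE TABLE table1    (col1 INTEGER, col2 TEXT);
    INSERT INTO func_rows VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO table1    VALUES (2, 't2'), (3, 't3');
""")

# The second half only contributes rows whose col1 is absent
# from the first half.
query = """
    SELECT col1, col2 FROM func_rows
    UNION
    SELECT col1, col2
    FROM table1 t1
    WHERE NOT EXISTS (
        SELECT 1 FROM func_rows t2 WHERE t1.col1 = t2.col1
    )
    ORDER BY col1
"""
rows = conn.execute(query).fetchall()
print(rows)
```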
SQL: UNION ALL operator and examples. The SQL UNION ALL operator is used to combine the result sets of 2 or more SELECT statements (it does not remove duplicate rows). To remove duplicates from a result set, you use the DISTINCT operator in the SELECT clause as follows: SELECT DISTINCT column1, column2 FROM table1; If you use one column after the DISTINCT operator, the database system uses that column to evaluate duplicates. If you use two or more columns, the database system uses the combination of values in these columns for the duplicate check.
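The difference between DISTINCT on one column and on a combination of columns can be illustrated with a short sqlite3 sketch (the table t and its rows are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, col2 TEXT);
    INSERT INTO t VALUES (1, 'a'), (1, 'a'), (1, 'b'), (2, 'a');
""")

# One column: duplicates are judged on id alone.
distinct_ids = [r[0] for r in conn.execute(
    "SELECT DISTINCT id FROM t ORDER BY id")]

# Two columns: duplicates are judged on the (id, col2) combination.
distinct_pairs = conn.execute(
    "SELECT DISTINCT id, col2 FROM t ORDER BY id, col2").fetchall()

print(distinct_ids)
print(distinct_pairs)
```

This is also why DISTINCT alone cannot return one row per id while still selecting the other columns.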
The other solutions work, but I opted to use a common table expression as suggested by xQbert:
with cte as (
  select id, col2, col3
  from table(p_package.getData(param))
)
select * from cte
union
select id, col2, col3
from table1
where col7 = 'pass'
  and col8 <> 'A'
  and col9 = to_date(Date, 'mm/dd/yyyy')
  and id not in (select id from cte)
EDIT: I realized that a CTE does not actually store the data returned by a query but stores the query itself instead. While this works, it does not avoid calling the pipelined function twice.
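For reference, the shape of the CTE query can be tried out in SQLite through Python's sqlite3. This sketch only checks the result set; it says nothing about how many times Oracle would invoke the pipelined function. The func_rows table is a hypothetical stand-in for that function.

```python
import sqlite3

# Hypothetical stand-ins for the pipelined function and table1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE func_rows (id INTEGER, col2 TEXT);
    CREATE TABLE table1    (id INTEGER, col2 TEXT);
    INSERT INTO func_rows VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO table1    VALUES (2, 't2'), (3, 't3');
""")

# The CTE is referenced twice: once as the first half of the union
# and once as the exclusion list for the second half.
query = """
    WITH cte AS (SELECT id, col2 FROM func_rows)
    SELECT * FROM cte
    UNION
    SELECT id, col2 FROM table1
    WHERE id NOT IN (SELECT id FROM cte)
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

One caveat with NOT IN: if the CTE can return a NULL id, the NOT IN condition is never true and the second half contributes no rows, so NOT EXISTS may be the safer form in that case.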
Merging two selects, then sorting and removing duplicates: when using the UNION command, all selected columns need to be the same in each query. The UNION operator combines the results of two or more queries into a single result set that includes the rows that belong to all queries in the union. UNION removes duplicate rows, whereas UNION ALL keeps them. For example, if table 'A' has 1, 2 and 3 and table 'B' has 3, 4 and 5, UNION returns 1, 2, 3, 4, 5 and UNION ALL returns 1, 2, 3, 3, 4, 5.
How to union two queries without duplicates: I have a SQL query that returns 4 columns, including CustName, CustId and CustZip. You can put the UNION operator between the 2 queries, since the UNION operator removes duplicated rows, or you can use the DISTINCT operator to get the unique rows. SELECT columnlist FROM table1 UNION SELECT columnlist FROM table2. In order to union two tables there are a couple of requirements: the number of columns must be the same for both SELECT statements, and the columns, in order, must be of the same data type. When rows are combined, duplicate rows are eliminated.
SQL DISTINCT: removing duplicates in a result set. To remove duplicates from a result set, you use the DISTINCT operator in the SELECT clause. By default, an SQL UNION only selects distinct values. If you want duplicates (i.e. all rows from both tables), you need a UNION ALL.
SQL Server: an earlier post, SQL SERVER – Delete Duplicate Rows, showed a method of removing duplicate rows using the traditional UNION operator, which combines multiple result sets into a single result set by removing duplicates. To find the duplicate values in a table, you follow these steps: first, define the criteria for duplicates (values in a single column or in multiple columns); second, write a query to search for duplicates. If you also want to delete the duplicate rows, you can go to the deleting duplicates from a table tutorial.
- I don't see any columns here named id or anything like it. Can you explain what the
- @TimBiegeleisen sorry, id should have been col1. Updated the question
- Consider using a common table expression (CTE) for the pipelined function and then referencing the CTE as an exclusion in the second query. Since the CTE would already be in memory, the function call shouldn't occur twice.
- @xQbert thanks for that suggestion! It's exactly what I was looking for, but I had never heard of CTEs, nor did I come across them in any of my searching.
- I'm assuming you double-checked that the execution plan didn't show the pipelined function getting hit twice. I don't think it would occur twice since the CTE is already in memory, but I'm not POSITIVE about it ;P
- Wouldn't this end up calling the p_package.getData() function twice?
- Well I don't see any way around this TBH. I mean, you could just do the union and sort things out afterwards, but that would also be a bunch of work.
- Would using a join possibly be a way around it? I'm open to that but can't think of how.
- How would a join be a way around this? The second half of the union has to "know" what the first half is doing, to avoid bringing in the same