print values that occur in one column but not another

I have a file with four columns of data that looks like this:

cluster-9  cluster-12   cluster-40  cluster-62
cluster-10 cluster-12   cluster-42  cluster-60
cluster-12 cluster-12   cluster-43  cluster-61
cluster-12 cluster-12   cluster-28  cluster-20
cluster-12 cluster-12   cluster-29  cluster-21
cluster-16 cluster-12   cluster-41  cluster-63
cluster-16 cluster-12   cluster-2   cluster-4
cluster-16 cluster-12   cluster-8   cluster-5
cluster-16 cluster-9    cluster-9   cluster-6
cluster-16 cluster-12   cluster-45  cluster-39  

I would like to extract the unique values that are in column 1 but not in a specific other column (a pairwise comparison). For example, I'd like to compare columns 1 and 2 and output only the values that are in column 1 but not in column 2:

cluster-10
cluster-16

Because cluster-12 and cluster-9 are also found in column 2, they are not printed.

Could you please try the following:

awk '{a[$1];b[$2]} END{for(i in a){if(i in b){continue};print i}}' Input_file
cluster-10
cluster-16

If you want to pass the numbers of the columns to compare as awk variables, try the following:

awk -v col1="1" -v col2="2" '{a[$col1];b[$col2]} END{for(i in a){if(i in b){continue};print i}}'  Input_file
cluster-10
cluster-16

Change the values of -v col1 and -v col2 to the column numbers you want to compare; the command then prints the unique values of column col1 that do not appear in column col2.
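
One caveat with the commands above: awk does not guarantee any particular order for `for (i in a)`, so the results may come out in a different order on another awk implementation. If the order of first appearance matters, a two-pass variant is one option. This is a minimal sketch, not part of the original answer; the function name `only_in_first` and the array names are hypothetical, and it assumes the input is a regular file that can be read twice.

```shell
# only_in_first COL1 COL2 FILE  (hypothetical helper name)
# Prints values of column COL1 that never occur in column COL2,
# in order of first appearance.
only_in_first() {
    awk -v c1="$1" -v c2="$2" '
        # First pass over the file: remember every value seen in column c2.
        NR == FNR { in_other[$c2]; next }
        # Second pass: print each column-c1 value once if column c2 never had it.
        !(($c1) in in_other) && !printed[$c1]++ { print $c1 }
    ' "$3" "$3"
}
```

For the sample data, `only_in_first 1 2 Input_file` should report cluster-10 before cluster-16, because that is the order in which they first appear in column 1.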

There are of course multiple ways to accomplish this, but here's one using sed, sort, and uniq. The key is to reduce each of the two columns you care about to its unique set of values, combine the sets, and then use the -u option of uniq, which prints only lines that occur exactly once, to keep the items found only in the first set. The code below looks at columns 1 and 2, but you could easily adjust it to look at any other pair of columns.

#!/bin/sh
# Define a separator pattern and a column format; adjust to fit your data.
# The separator matches one or more spaces or tabs.
sep="[[:space:]][[:space:]]*"
col="\([a-zA-Z0-9_-]*\)$sep"

# Get all values in column 1 and reduce them to a unique set
col1=`sed "s/^$col.*/\\1/" file | sort -u`
# Get all values in column 2 and reduce them to a unique set. Adjust for a
# different column as necessary
col2=`sed "s/^$col$col.*/\\2/" file | sort -u`
# Combine the results, one value per line, and print only the unique items.
# Column 2 is included twice so that items found only in column 2 are dropped.
# printf puts a newline after each set, so the sets cannot run together.
printf '%s\n' "$col1" "$col2" "$col2" | sort | uniq -u
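
The reason for listing the second set twice may be easier to see on a tiny example. The sketch below uses made-up values and a hypothetical function name `only_in_a`; it assumes each set is already unique and has one value per line.

```shell
# only_in_a SET_A SET_B  (hypothetical helper name)
# Every value of SET_B is fed in twice, so it occurs at least twice in the
# sorted stream and uniq -u drops it. A value found only in SET_A occurs
# exactly once and survives.
only_in_a() {
    printf '%s\n' "$1" "$2" "$2" | sort | uniq -u
}

set_a='x
y
z'
set_b='y
w'
only_in_a "$set_a" "$set_b"   # prints x and z, each on its own line
```

Here y is dropped because it is in both sets, and w is dropped because it appears twice through the duplicated second set.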

You can also try Perl:

$ perl -lane ' $kv{$F[0]}++; $kv2{$F[1]}++; END { for(keys %kv) { unless ($kv2{$_}) { print "$_" } }}' greg.txt
cluster-10
cluster-16
$ cat greg.txt
cluster-9  cluster-12   cluster-40  cluster-62
cluster-10 cluster-12   cluster-42  cluster-60
cluster-12 cluster-12   cluster-43  cluster-61
cluster-12 cluster-12   cluster-28  cluster-20
cluster-12 cluster-12   cluster-29  cluster-21
cluster-16 cluster-12   cluster-41  cluster-63
cluster-16 cluster-12   cluster-2   cluster-4
cluster-16 cluster-12   cluster-8   cluster-5
cluster-16 cluster-9    cluster-9   cluster-6
cluster-16 cluster-12   cluster-45  cluster-39
$

Or, slightly more concisely:

$ perl -lane ' $kv{$F[0]}++; $kv2{$F[1]}++; END { for(keys %kv) { print unless $kv2{$_} }} ' greg.txt
cluster-10
cluster-16
$

Comments
  • I've managed to extract the columns of interest and find the unique values in each column, but I'm a bit lost from there: awk 'BEGIN { FS = "\t" } ; { print $1, $2 }' FILENAME | uniq
  • Separate the columns and sort-unique them. Then see the comm command, which is specifically designed for this.
  • ...and maybe a little sed to clean the results.
  • Yah, I've been beaten.
  • @tshiono, only this time buddy :)
  • '{a[$1]; b[$2]} END{for (i in a) if (!(i in b)) print i}'
  • Thanks! This mostly works, but the last echo command puts the last value of col1 on the same line as the first item of col2, so both of those items are treated as unique. Is there any way to add a line break after echoing col1?
  • Interesting, I'm not seeing the same behavior. With the -e option to echo, it will accept backslash escape sequences (such as for an extra newline). I'll try to take a more detailed look later. Ravinder Singh's answer below is much more succinct though.
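
The comm-based approach suggested in the comments can be sketched as follows. This is an illustration rather than code from the thread: the function name `comm_only_in_first` is hypothetical, columns are assumed to be whitespace-separated, and mktemp is assumed to be available (it is common but not required by POSIX).

```shell
# comm_only_in_first FILE  (hypothetical helper name)
# Prints values that occur in column 1 of FILE but not in column 2.
comm_only_in_first() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # comm needs sorted input, so sort and de-duplicate each column first.
    awk '{ print $1 }' "$1" | sort -u > "$tmp1"
    awk '{ print $2 }' "$1" | sort -u > "$tmp2"
    # -2 hides lines unique to the second list, -3 hides lines common to both,
    # leaving only the lines unique to the first list.
    comm -23 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

Unlike the awk one-liner, the output here is always in sorted order, since comm works on sorted lists.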