Comparing two pandas DataFrames with different column names and finding a match

I have two dataframes:

df1:

A    B    C
1    ss   123
2    sv   234
3    sc   333

df2:

A    dd   xc
1    ss   123

df2 will always have a single row. How can I check whether that row of df2 matches any row in df1?

Use a NumPy comparison, with np.all and axis=1 to require a match in every column of a row:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333]})
df2 = pd.DataFrame({'A': [1], 'dd': ['ss'], 'xc': [123]})

df3 = df1.loc[np.all(df1.values == df2.values, axis=1),:]

Or, to compare only specific columns:

df3 = df1.loc[np.all(df1[['B','C']].values == df2[['dd','xc']].values, axis=1),:]

print(df3)
   A   B    C
0  1  ss  123
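
To see why this works, here is a minimal sketch that prints the intermediate arrays, assuming the same df1 and df2 as above. The names eq and mask are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333]})
df2 = pd.DataFrame({'A': [1], 'dd': ['ss'], 'xc': [123]})

# df2.values has shape (1, 3), so NumPy broadcasts its single row against
# every row of df1.values (shape (3, 3)). Column names play no part here:
# values are compared purely by position.
eq = df1.values == df2.values
print(eq)

# A row of df1 matches only if every column in that row compared equal.
mask = eq.all(axis=1)
print(mask)
```

Because the comparison is positional, both frames need the same number of columns in the same order, which is why the second variant selects the columns explicitly.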

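In addition to Sandeep's answer, you can reduce the row mask to a single boolean:

np.all(df1.values == df2.values, axis=1).any()

Or, equivalently, with the array methods:

(df2.values == df1.values).all(axis=1).any()

Both reduce the mask directly rather than calling .any().any() on the subset of df1, which would return False if the matching row held only falsy values such as 0 or ''.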

Or:

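not pd.merge(df1, df2.rename(columns={'dd': 'B', 'xc': 'C'})).empty

Note: all of these return True for the example data. The merge needs df2's columns renamed to match df1's; otherwise it joins on A only.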

Check a specific column (same as Sandeep's); col must name a column present in both frames, such as 'A':

df1[col].isin(df2[col]).any()

How to check whether there is a match for that row of df2, in df1?

You can align columns and then check equality of df1 with the only row of df2:

df2.columns = df1.columns

res = (df1 == df2.iloc[0]).all(1).any()  # True

The benefit of this solution is that you aren't subsetting df1 (expensive). Instead, you construct a Boolean dataframe / array (cheap) and check whether all values in at least one row are True.

This is still not particularly efficient as you are considering every row in df1 rather than stopping when a condition is satisfied. With numeric data, in particular, there are more efficient solutions.
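
One way to stop at the first match is a plain Python scan. This is only a sketch, assuming the same df1 and df2 as in the question; target and found are illustrative names. Each row is handled in Python rather than in vectorized code, so it is likely to help only when a match tends to appear early in a large df1.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333]})
df2 = pd.DataFrame({'A': [1], 'dd': ['ss'], 'xc': [123]})

# The single row of df2 as a tuple; values are compared by position,
# so the differing column names do not matter.
target = tuple(df2.iloc[0])

# any() consumes the generator lazily and stops at the first matching row.
found = any(row == target for row in df1.itertuples(index=False, name=None))
print(found)
```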

Comments
  • What if I need to check only specific columns of df2 against df1? For example, only dd and xc from df2 against B and C of df1?
  • ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
  • @qwww Can you paste the code snippet that gives you the error? Also, check that the statements are exactly the same.
  • Don't use isin, because it also matches on index labels. Try changing the index in the second dataframe and it fails...
  • @jezrael Yes, Thanks for the suggestion.
  • What if the number of columns is not the same?
  • @qwww, Then you have a new question :). Seriously, I'm assuming you have df1 and df2 as in your question. You can't automate alignment of columns. How is Pandas meant to know xc maps to C?
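
Following up on the comments about checking specific columns: since pandas cannot guess the mapping, one option is to state it explicitly. A minimal sketch, assuming the df1 and df2 from the question; col_map is a hypothetical dictionary written by hand, not something pandas provides.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['ss', 'sv', 'sc'], 'C': [123, 234, 333]})
df2 = pd.DataFrame({'A': [1], 'dd': ['ss'], 'xc': [123]})

# Hypothetical, hand-written mapping: df2 column name -> df1 column name.
col_map = {'dd': 'B', 'xc': 'C'}

# Rename df2's columns to df1's names and take its only row as a Series.
row = df2.rename(columns=col_map).iloc[0]
cols = list(col_map.values())

# Compare only the mapped columns; the Series index lines up with df1's
# column labels, so each column is checked against its own target value.
matches = df1[(df1[cols] == row[cols]).all(axis=1)]
print(matches)
```

Columns left out of col_map (here A) are ignored, so this also covers the case where the two frames have different numbers of columns.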