pandas return columns in dataframe that are not in other dataframe

I have two dataframes that look like this:

import pandas as pd

df_1 = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [100, 200, 300, 400],
    'C': [2, 3, 4, 5],
})

df_2 = pd.DataFrame({
    'B': [1.0, 2.0, 3.0, 4.0],
    'C': [100, 200, 300, 400],
    'D': [2, 3, 4, 5],
})

If I use the pandas .isin method, I can do something nifty like this:

>>> df_2.columns.isin(df_1.columns)
array([ True,  True, False])

Columns B and C from df_2 exist in df_1, while D doesn't.

My question is: does anyone know of a way to return the labels of the columns that exist in df_2 but not in df_1?

something like this

array([u'D'], dtype=string)

Thank you in advance!

Pandas Index objects have set-like methods, so you can do this directly:

df_2.columns.difference(df_1.columns)
Index([u'D'], dtype='object')
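
As a self-contained sketch using the frames from the question (the names only_in_df_2 and extra are only illustrative), the result is an Index that can be converted to a plain list or used to select those columns:

```python
import pandas as pd

df_1 = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                     'B': [100, 200, 300, 400],
                     'C': [2, 3, 4, 5]})
df_2 = pd.DataFrame({'B': [1.0, 2.0, 3.0, 4.0],
                     'C': [100, 200, 300, 400],
                     'D': [2, 3, 4, 5]})

# Labels present in df_2 but absent from df_1
only_in_df_2 = df_2.columns.difference(df_1.columns)
print(only_in_df_2.tolist())   # ['D']

# The resulting Index can be used directly to pull out those columns
extra = df_2[only_in_df_2]
print(extra.shape)             # (4, 1)
```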

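In older pandas versions you could also use the operators &, | and ^ to compute the intersection, union and symmetric difference. That behaviour was later deprecated, and in pandas 2.0 and newer these operators act element-wise instead, so prefer the named methods intersection, union and symmetric_difference in current code. The outputs below are from an old release: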

df_1.columns & df_2.columns
Index([u'B', u'C'], dtype='object')

df_1.columns | df_2.columns
Index([u'A', u'B', u'C', u'D'], dtype='object')

df_1.columns ^ df_2.columns
Index([u'A', u'D'], dtype='object')

There used to be a - operator for difference as well, which was deprecated in favour of .difference():

df_2.columns - df_1.columns
FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
Index([u'D'], dtype='object')
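
One detail worth knowing: Index.difference sorts its result by default when it can, so the original column order may be lost. A minimal sketch of two order-preserving alternatives, inverting the isin mask from the question or passing sort=False, using made-up frames named left and right (assumptions for illustration, not the frames above):

```python
import pandas as pd

# Hypothetical frames whose columns are deliberately not in alphabetical order
left = pd.DataFrame(columns=['B'])
right = pd.DataFrame(columns=['Z', 'B', 'A'])

# Default behaviour: the result comes back sorted
sorted_diff = right.columns.difference(left.columns)
print(sorted_diff.tolist())      # ['A', 'Z']

# Inverting the isin mask keeps the order in which the columns appear in right
mask = ~right.columns.isin(left.columns)
in_order = right.columns[mask]
print(in_order.tolist())         # ['Z', 'A']

# sort=False on difference also keeps the original order
unsorted_diff = right.columns.difference(left.columns, sort=False)
print(unsorted_diff.tolist())    # ['Z', 'A']
```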

Numpy solution with numpy.setdiff1d:

import numpy as np

a = np.setdiff1d(df_2.columns, df_1.columns)
print(a)
['D']

Pandas solution with Index.difference:

a = df_2.columns.difference(df_1.columns)
print (a)
Index(['D'], dtype='object')

Other pandas Index methods are intersection, union and symmetric_difference:

print (df_2.columns.intersection(df_1.columns))
Index(['B', 'C'], dtype='object')

print (df_2.columns.union(df_1.columns))
Index(['A', 'B', 'C', 'D'], dtype='object')

print (df_2.columns.symmetric_difference(df_1.columns))
Index(['A', 'D'], dtype='object')

The corresponding numpy functions are intersect1d, union1d and setxor1d:

print (np.intersect1d(df_2.columns, df_1.columns))
['B' 'C']

print (np.union1d(df_2.columns, df_1.columns))
['A' 'B' 'C' 'D']

print (np.setxor1d(df_2.columns, df_1.columns))
['A' 'D']
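
Note that the numpy functions return a plain ndarray rather than a pandas Index. A small sketch, assuming you want to carry on and select the extra columns (the names missing and subset are illustrative):

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'A': [1.0, 2.0], 'B': [100, 200], 'C': [2, 3]})
df_2 = pd.DataFrame({'B': [1.0, 2.0], 'C': [100, 200], 'D': [2, 3]})

# setdiff1d returns the sorted, unique labels found in the first argument only
missing = np.setdiff1d(df_2.columns, df_1.columns)
print(type(missing).__name__)   # ndarray
print(missing.tolist())         # ['D']

# Convert to a list of labels before using it to select columns
subset = df_2[missing.tolist()]
print(list(subset.columns))     # ['D']
```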

Here it is, using a plain Python set:

set(df_2.columns).difference(df_1.columns)
Out[76]: {'D'}

You can use:

set(df_2.columns.values) - set(df_1.columns.values)

which returns a set containing column labels of columns in df_2 but not in df_1.
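
Because a set has no defined order, it may help to turn the result into a list with a predictable order. A rough sketch with made-up frames (the column names and variable names here are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frames; only the column labels matter here
df_1 = pd.DataFrame(columns=['A', 'B', 'C'])
df_2 = pd.DataFrame(columns=['E', 'B', 'D'])

# Set difference: the members are right, but the iteration order is not guaranteed
diff_set = set(df_2.columns) - set(df_1.columns)

# Option 1: sort the labels for a deterministic result
diff_sorted = sorted(diff_set)
print(diff_sorted)        # ['D', 'E']

# Option 2: keep the order in which the columns appear in df_2
known = set(df_1.columns)
diff_in_order = [col for col in df_2.columns if col not in known]
print(diff_in_order)      # ['E', 'D']
```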

Comments
  • Nice usage of set-like properties on pandas Index objects +1 :)
  • why set? it is not necessary I think.