How to subtract rows of one pandas data frame from another?

The operation that I want to do is similar to a merge. For example, with an inner merge we get a data frame that contains the rows that are present in the first AND the second data frame. With an outer merge we get a data frame that contains the rows that are present EITHER in the first OR in the second data frame.

What I need is a data frame that contains the rows that are present in the first data frame AND NOT present in the second one. Is there a fast and elegant way to do it?

How about something like the following?

print(df1)

     Team  Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12

print(df2)

     Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6

As long as there is a commonly named non-key column, you can let the added suffixes do the work (if there is no common non-key column, you could create one to use temporarily: df1['common'] = 1 and df2['common'] = 1):

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

     Team  Year  foo_x  foo_y
0  Hawks  2001      5    NaN
1  Hawks  2004      4    NaN
2   Nets  1987      3    NaN
4   Nets  2001      8    NaN
5   Nets  2000     10    NaN
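
For anyone who wants to reproduce this on Python 3, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the printed tables above, and the name only_in_df1 is just an illustrative choice. It relies on merge's default suffixes ('_x', '_y'), which are applied to the overlapping non-key column foo.

```python
import pandas as pd

# Frames rebuilt from the printed tables above.
df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

# Left merge on the key columns; the shared non-key column "foo"
# comes back as "foo_x" (from df1) and "foo_y" (from df2).
new = df1.merge(df2, on=["Team", "Year"], how="left")

# Rows with no match in df2 have NaN in "foo_y".
only_in_df1 = new[new["foo_y"].isnull()]
print(only_in_df1)
```

This only works when foo itself never contains NaN in df2, since a genuine NaN there is indistinguishable from "no match".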

Or you can use isin but you would have to create a single key:

df1['key'] = df1['Team'] + df1['Year'].astype(str)
df2['key'] = df2['Team'] + df2['Year'].astype(str)
print(df1[~df1.key.isin(df2.key)])

    Team  Year  foo        key
0  Hawks  2001    5  Hawks2001
1  Hawks  2004    4  Hawks2004
2   Nets  1987    3   Nets1987
4   Nets  2001    8   Nets2001
5   Nets  2000   10   Nets2000

Consider the following:

  1. df_one is first DataFrame
  2. df_two is second DataFrame

Present in First DataFrame and Not in Second DataFrame

Solution, by index:

df = df_one[~df_one.index.isin(df_two.index)]

index can be replaced by whichever column you want to base the exclusion on. In the example above, the index is used as the reference between the two DataFrames.

You can also build a more complex condition from boolean pandas.Series objects to solve the same problem.
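
When the exclusion depends on more than one column, one way to keep the index-based approach is to move those columns into a MultiIndex first. The sketch below is only an illustration: the sample data and the keys list are assumptions borrowed from the Team/Year example earlier on this page, not part of this answer.

```python
import pandas as pd

# Sample data (assumed for illustration).
df_one = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df_two = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

keys = ["Team", "Year"]  # columns that identify a row

# Move the key columns into the index so that isin compares (Team, Year) pairs.
one = df_one.set_index(keys)
two = df_two.set_index(keys)

# Keep the rows of df_one whose key pair does not occur in df_two,
# then turn the keys back into ordinary columns.
df = one[~one.index.isin(two.index)].reset_index()
print(df)
```

Comparing key pairs this way avoids building a concatenated string key, so pairs such as ("Nets1", 987) and ("Nets", 1987) cannot collide.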

You could get wrong results from the merge approach if your non-key column has cells with NaN.

print(df1)

      Team  Year  foo
0    Hawks  2001    5
1    Hawks  2004    4
2     Nets  1987    3
3     Nets  1988    6
4     Nets  2001    8
5     Nets  2000   10
6     Heat  2004    6
7   Pacers  2003   12
8  Problem  2112  NaN

print(df2)

      Team  Year  foo
0   Pacers  2003   12
1     Heat  2004    6
2     Nets  1988    6
3  Problem  2112  NaN

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

      Team  Year  foo_x  foo_y
0    Hawks  2001      5    NaN
1    Hawks  2004      4    NaN
2     Nets  1987      3    NaN
4     Nets  2001      8    NaN
5     Nets  2000     10    NaN
8  Problem  2112    NaN    NaN

The problem team in 2112 has no value for foo in either table. So, the left join here will falsely return that row, which matches in both DataFrames, as not being present in the right DataFrame.

Solution:

What I do is add a marker column to the right DataFrame (df2) and set a value for all of its rows. Then, after the left join, you can check whether that column is NaN to find the rows of the left DataFrame (df1) that have no match in df2.

df2['in_df2'] = 'yes'

print(df2)

      Team  Year  foo in_df2
0   Pacers  2003   12    yes
1     Heat  2004    6    yes
2     Nets  1988    6    yes
3  Problem  2112  NaN    yes

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.in_df2.isnull()])

    Team  Year  foo_x  foo_y in_df2
0  Hawks  2001      5    NaN    NaN
1  Hawks  2004      4    NaN    NaN
2   Nets  1987      3    NaN    NaN
4   Nets  2001      8    NaN    NaN
5   Nets  2000     10    NaN    NaN

NB. The problem row is now correctly filtered out, because it has a value for in_df2:

8  Problem  2112    NaN    NaN    yes

I suggest using the 'indicator' parameter of merge. Also, if 'on' is None, it defaults to the intersection of the columns in both DataFrames.

new = df1.merge(df2, how='left', indicator=True)  # adds a new column '_merge'
new = new[new['_merge'] == 'left_only']  # rows only in df1 and not df2
new = new.drop(columns='_merge')

    Team    Year    foo
0   Hawks   2001    5
1   Hawks   2004    4
2   Nets    1987    3
4   Nets    2001    8
5   Nets    2000    10

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

indicator : boolean or string, default False

If True, adds a column to output DataFrame called "_merge" with information on the source of each row. 
Information column is Categorical-type and takes on a value of 
"left_only" for observations whose merge key only appears in ‘left’ DataFrame,
"right_only" for observations whose merge key only appears in ‘right’ DataFrame, 
and "both" if the observation’s merge key is found in both.
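
As a runnable illustration of the indicator approach, here is a minimal sketch using the sample data from this page, including the NaN "Problem" row from the earlier answer. Restricting the right-hand side to its key columns (the keys list and the names right_keys and only_in_df1 are assumptions made for this example) is a variation on the code above: it keeps df1's columns unsuffixed and makes the match depend only on Team and Year.

```python
import numpy as np
import pandas as pd

# Sample data from this page, including a row whose "foo" is NaN in both frames.
df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers", "Problem"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003, 2112],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets", "Problem"],
    "Year": [2003, 2004, 1988, 2112],
    "foo": [12, 6, 6, np.nan],
})

keys = ["Team", "Year"]  # columns that decide whether two rows are "the same"

# Only the distinct key combinations of df2 are needed on the right side;
# dropping duplicates keeps the left merge from multiplying rows of df1.
right_keys = df2[keys].drop_duplicates()

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = df1.merge(right_keys, on=keys, how="left", indicator=True)

# Keep the rows found only in df1, then drop the helper column.
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Because the filter looks at _merge rather than at a data column, NaN values in foo do not affect which rows are kept.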

Comments
  • how = 'left'? surely that's not what you want (given your SO score it must be more complex than that)
  • Left or right merge gives me a data frame that contains rows that are present in one of the data frames. But I need a data frame that contains rows that are present in one data frame AND NOT present in another one.
  • If it is just one merge key then you could do it with isin and ~
  • I'm laughing to myself actually trying to understand how to move something to object1 from object2 on the condition that the thing is in object one, and NOT in object2. To me, that just sounds like object1 - no operation necessary! I don't think I get the point. Ignore me, sorry, it just made me smile...
  • @KarlD., I have more than one merge keys.