Merging two TRUE/FALSE dataframe columns keeping only TRUE

I have two columns in a pandas dataframe, like below:

df[1]   df[2]

From these two columns, how do I make the following new column:


It looks like you need the any method, like this:

df['result_col'] = df.any(axis=1)
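
As a minimal sketch of that approach: the question's sample data is not shown, so the column names col1 and col2 and the extra column below are assumptions made purely for illustration. Selecting the two boolean columns explicitly before calling any keeps unrelated columns from influencing the result:

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions, not from the question.
df = pd.DataFrame({
    "col1": [True, False, True, False],
    "col2": [True, True, False, False],
    "other": [1, 2, 3, 4],  # unrelated column that should not affect the result
})

# Row-wise "any": True when at least one of the two selected columns is True.
df["result_col"] = df[["col1", "col2"]].any(axis=1)
print(df)
```

Calling df.any(axis=1) on the whole frame would also take the non-boolean column into account, which is why the sketch restricts it to the two columns of interest.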

You can just use the "or" (|) operator.

For example:

import pandas as pd

df = pd.DataFrame({'a': [True, False, True, False, True, False],
                   'b': [True, True, False, False, False, False]})

df['c'] = df.a | df.b

With result:

       a      b      c
0   True   True   True
1  False   True   True
2   True  False   True
3  False  False  False
4   True  False   True
5  False  False  False

For better performance you could use the underlying numpy arrays and compute the np.logical_or of the two columns:

df.loc[:,'logical_or'] = np.logical_or(*df.values.T)

    col1   col2    logical_or
0   True   True        True
1  False   True        True
2   True  False        True
3  False  False       False
4   True  False        True
5  False  False       False
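
Note that unpacking *df.values.T relies on the frame holding exactly the two boolean columns. A slightly more defensive sketch, assuming the columns are named col1 and col2 as in the output above, passes the two arrays explicitly:

```python
import numpy as np
import pandas as pd

# Same data as in the output above; the column names are taken from that table.
df = pd.DataFrame({
    "col1": [True, False, True, False, True, False],
    "col2": [True, True, False, False, False, False],
})

# Element-wise OR on the underlying NumPy arrays, one array per column.
df["logical_or"] = np.logical_or(df["col1"].to_numpy(), df["col2"].to_numpy())
print(df)
```

This keeps working if the frame later gains additional columns, since only the two named columns are passed to np.logical_or.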

Some timing comparisons:

df = pd.DataFrame(np.random.randint(0,2,(10**6,2)).astype(bool))

%timeit np.logical_or(*df.values.T)
4.98 ms ± 33.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.any(axis=1)
50 ms ± 292 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df[0] | df[1]
6.57 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
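
The %timeit lines above need IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison can be run with the standard timeit module. The smaller frame size, the seed, and the names candidates and timings are choices made here for illustration, and the numbers will differ by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the timings above so the script finishes quickly.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(10**5, 2)).astype(bool))

# The three approaches discussed in this thread.
candidates = {
    "np.logical_or": lambda: np.logical_or(*df.values.T),
    "df.any(axis=1)": lambda: df.any(axis=1),
    "df[0] | df[1]": lambda: df[0] | df[1],
}

# Sanity check: all three approaches must produce the same values.
expected = np.asarray(df[0] | df[1])
for name, fn in candidates.items():
    assert np.array_equal(np.asarray(fn()), expected), name

# Best-of-3 average time per call, in seconds.
timings = {}
for name, fn in candidates.items():
    timings[name] = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>16}: {timings[name] * 1e3:.3f} ms per call")
```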
