Check if a value is in a list of columns in a pandas.DataFrame

I am working with some data that looks like this (simplified) in a pandas.DataFrame:

|-----------|-----------|-----------|
| Feature 1 | Feature 2 | Feature 3 |
|-----------|-----------|-----------|
|     A     |     B     |     D     |
|     A     |     A     |     B     |
|     A     |     D     |     A     |
|     A     |     B     |     A     |
|     A     |     A     |     A     |
|     A     |     A     |     D     |
|-----------|-----------|-----------|

I want to create a new column that answers the question "Is the value 'D' present in any of the columns?"

So the final data would look like:

|-----------|-----------|-----------|-----------|
| Feature 1 | Feature 2 | Feature 3 | Feature 4 |
|-----------|-----------|-----------|-----------|
|     A     |     B     |     D     |    True   |
|     A     |     A     |     B     |   False   |
|     A     |     D     |     A     |    True   |
|     A     |     B     |     A     |   False   |
|     A     |     A     |     A     |   False   |
|     A     |     A     |     D     |    True   |
|-----------|-----------|-----------|-----------|

I've tried using the df.isin() method, but I still can't get this to work.

Do you guys know how to do this?

Try this approach (axis must be passed by keyword in pandas 2.0 and later):

df[df == 'D'].any(axis=1)

Simply compare df with 'D', then use any with axis=1 to check whether each row contains at least one True:

df['Feature 4'] = (df == 'D').any(axis=1)
print(df)
  Feature 1 Feature 2 Feature 3 Feature 4
0         A         B         D      True
1         A         A         B     False
2         A         D         A      True
3         A         B         A     False
4         A         A         A     False
5         A         A         D      True

Or use eq for the comparison:

df['Feature 4'] = df.eq('D').any(axis=1)
print(df)
  Feature 1 Feature 2 Feature 3 Feature 4
0         A         B         D      True
1         A         A         B     False
2         A         D         A      True
3         A         B         A     False
4         A         A         A     False
5         A         A         D      True

The intermediate boolean mask, computed before Feature 4 is added, looks like this:

print(df.eq('D'))
  Feature 1 Feature 2 Feature 3
0     False     False      True
1     False     False     False
2     False      True     False
3     False     False     False
4     False     False     False
5     False     False      True
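
If only some of the columns should be searched, or if several values should count as a match, the same idea can be combined with column selection and isin. The following is a minimal sketch, not part of the original answer; the column list `cols` and the extra value 'B' are assumptions chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "B", "D"],
     ["A", "A", "B"],
     ["A", "D", "A"],
     ["A", "B", "A"],
     ["A", "A", "A"],
     ["A", "A", "D"]],
    columns=["Feature 1", "Feature 2", "Feature 3"],
)

# Hypothetical choice: only search these two columns.
cols = ["Feature 2", "Feature 3"]

# True where 'D' appears in any of the selected columns of a row.
has_d = df[cols].eq("D").any(axis=1)

# isin takes a list, so several values can be tested at once.
has_b_or_d = df[cols].isin(["B", "D"]).any(axis=1)

print(has_d.tolist())
print(has_b_or_d.tolist())
```

Note that isin on its own only returns an element-wise boolean DataFrame; it is the any(axis=1) step that collapses each row into a single True or False.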

To help anyone facing the same challenge, here is another option: use NumPy's where function with the element-wise or operator (|) to check each column.

See a mockup below:

import numpy as np
import pandas as pd

a = [
    ['A', 'B', 'D'],
    ['A', 'A', 'B'],
    ['A', 'D', 'A'],
    ['A', 'B', 'A'],
    ['A', 'A', 'A'],
    ['A', 'A', 'D'],
]
df = pd.DataFrame(a, columns=['Feature 1', 'Feature 2', 'Feature 3'])
df['Feature 4'] = np.where(
    (df['Feature 1'] == 'D') | (df['Feature 2'] == 'D') | (df['Feature 3'] == 'D'),
    True,
    False,
)
df

Results below:

+---+-----------+-----------+-----------+-----------+
|   | Feature 1 | Feature 2 | Feature 3 | Feature 4 |
+---+-----------+-----------+-----------+-----------+
| 0 | A         | B         | D         | True      |
+---+-----------+-----------+-----------+-----------+
| 1 | A         | A         | B         | False     |
+---+-----------+-----------+-----------+-----------+
| 2 | A         | D         | A         | True      |
+---+-----------+-----------+-----------+-----------+
| 3 | A         | B         | A         | False     |
+---+-----------+-----------+-----------+-----------+
| 4 | A         | A         | A         | False     |
+---+-----------+-----------+-----------+-----------+
| 5 | A         | A         | D         | True      |
+---+-----------+-----------+-----------+-----------+
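
Writing out one comparison per column gets unwieldy as the number of columns grows. One possible generalization, sketched here as an assumption rather than as part of the original answer, builds the per-column masks in a loop and combines them with np.logical_or.reduce; the list `cols` is a hypothetical name for the columns to check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", "B", "D"],
     ["A", "A", "B"],
     ["A", "D", "A"],
     ["A", "B", "A"],
     ["A", "A", "A"],
     ["A", "A", "D"]],
    columns=["Feature 1", "Feature 2", "Feature 3"],
)

# Hypothetical list of columns to check.
cols = ["Feature 1", "Feature 2", "Feature 3"]

# One boolean NumPy array per column: True where the column equals 'D'.
masks = [(df[c] == "D").to_numpy() for c in cols]

# OR all the masks together element-wise, giving one boolean per row.
df["Feature 4"] = np.logical_or.reduce(masks)

print(df)
```

The combined mask is already boolean, so wrapping it in np.where(..., True, False) is not needed in this variant.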

Comments
  • @jezrael, yes, df.eq('D').any(1) would be enough, but it's already covered by your answer... ;)