Select data when specific columns have null value in pandas

I have a dataframe with two date fields. I want to filter it and see the rows where at least one of the date fields is null.

ID          Date1       Date2
58844880    04/11/16    NaN
59745846    04/12/16    04/14/16
59743311    04/13/16    NaN
59745848    04/14/16    04/11/16
59598413    NaN         NaN
59745921    04/14/16    04/14/16
59561199    04/15/16    04/15/16
NaN         04/16/16    04/16/16
59561198    NaN         04/17/16

The result should look like this:

ID          Date1       Date2
58844880    04/11/16    NaN
59743311    04/13/16    NaN
59598413    NaN         NaN
59561198    NaN         04/17/16

I tried this code:

df = (df['Date1'].isnull() | df['Date1'].isnull())

Use boolean indexing:

mask = df['Date1'].isnull() | df['Date2'].isnull()
print(df[mask])
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16
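A self-contained version of the boolean-indexing answer. It rebuilds the question's sample data with np.nan for the missing values and keeps the dates as strings; both are assumptions about how the data was loaded, and neither affects null detection.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question; missing values are np.nan.
df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# True for each row where Date1 or Date2 (or both) is null.
mask = df['Date1'].isnull() | df['Date2'].isnull()
result = df[mask]
print(result)
```

The IDs print as floats (58844880.0 and so on) because the NaN in row 7 forces the whole ID column to float64.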

Timings:

#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop

In [13]: %timeit (df[df.filter(like='Date').isnull().any(axis=1)])
10 loops, best of 3: 146 ms per loop
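A sketch for repeating this comparison with the standard-library timeit module instead of the IPython %timeit magic. The function names explicit_or and filter_any are illustrative. It first asserts that both expressions select the same rows, then prints the best per-call time; no numbers are shown here because they depend on the machine and pandas version. Passing axis=1 by keyword works across pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})
# Scale up to 900000 rows, as in the timings above.
df = pd.concat([df] * 100000).reset_index(drop=True)

def explicit_or():
    # Name each date column and combine the two masks with |.
    return df[df['Date1'].isnull() | df['Date2'].isnull()]

def filter_any():
    # Select every column whose label contains 'Date', then reduce per row.
    return df[df.filter(like='Date').isnull().any(axis=1)]

# Both approaches must select exactly the same rows.
assert explicit_or().equals(filter_any())

for fn in (explicit_or, filter_any):
    # Best of 3 repeats, each timing 3 calls; report the per-call time.
    best = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{fn.__name__}: {best * 1000:.1f} ms per call")
```

The explicit version avoids building the intermediate two-column frame that filter() returns, which is a plausible reason it came out faster in the timings above.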


Quickly see if either column has any null values

df.isnull().any()

Count the null values in each column

df.isnull().sum()
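Run on the question's sample data, both one-liners give per-column results: any() reports whether each column contains a null, and sum() counts the nulls in each column rather than rows. Counting the rows that contain at least one null takes an extra reduction across the columns, as the last step of this sketch shows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# One boolean per column: does the column contain at least one null?
has_nulls = df.isnull().any()
# One integer per column: how many nulls the column contains.
null_counts = df.isnull().sum()
# One integer overall: how many rows contain at least one null.
rows_with_nulls = df.isnull().any(axis='columns').sum()

print(has_nulls)
print(null_counts)
print(rows_with_nulls)
```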

Get rows with null values

(1) Create a truth table of null values (i.e. a DataFrame with True/False in each cell, according to whether that cell is null)

truth_table = df.isnull()

(2) Create a conclusive truth table that shows which rows have any null values (one True/False per row)

conclusive_truth_table = truth_table.any(axis='columns')

(3) Isolate/show the rows that have any null values

df[conclusive_truth_table]

Putting (1)-(3) together:

df[df.isnull().any(axis='columns')]
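On the question's sample data, steps (1)-(3) return five rows, not the four the question expects: row 7 has a null ID, and the whole-frame check counts it. Restricting the truth table to the date columns gives the expected output. A sketch on the rebuilt sample data (using np.nan for the missing values is an assumption about how the data was loaded):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# (1) True/False for every cell, depending on whether it is null.
truth_table = df.isnull()
# (2) Collapse each row to True if any of its cells is null.
conclusive_truth_table = truth_table.any(axis='columns')
# (3) Keep only the rows flagged in step (2).
all_columns = df[conclusive_truth_table]

# Checking only the date columns excludes row 7, whose only null is its ID.
date_columns = df[df[['Date1', 'Date2']].isnull().any(axis='columns')]

print(all_columns)
print(date_columns)
```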

Alternatively:

Isolate rows that have null values in any of the specified columns

df[df.loc[:, ['Date1', 'Date2']].isnull().any(axis='columns')]

Isolate rows that have null values in BOTH specified columns

df[df.loc[:, ['Date1', 'Date2']].isnull().sum(axis=1) == 2]
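On the question's data only row 4 has both dates missing. Counting the nulls per row and comparing the count with 2 works; all(axis='columns') expresses the same condition without hard-coding the number of columns, which helps if more columns are added to the list later. A sketch showing that the two forms agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

dates_null = df[['Date1', 'Date2']].isnull()
# Count the nulls in each row and require both columns to be null.
by_sum = df[dates_null.sum(axis=1) == 2]
# Equivalent: every selected column in the row must be null.
by_all = df[dates_null.all(axis='columns')]

print(by_all)
```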

To filter out the rows that have missing values in a particular column (for example a Last_Name column), use the pandas notnull() function. It returns a boolean Series that is True for non-null values and False for null or missing values.
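The question's data has no Last_Name column, so this sketch applies the same notnull() idea to the date columns: first keeping the rows where Date1 has data, then the rows where both dates do. The second selection is exactly the complement of the question's filter. notna() is an alias of notnull() and can be used interchangeably.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# Rows where Date1 has a value (notnull() is True for non-null values).
date1_present = df[df['Date1'].notnull()]
# Rows where both dates have values: the complement of the question's filter.
both_present = df[df[['Date1', 'Date2']].notnull().all(axis='columns')]

print(date1_present)
print(both_present)
```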

Try this:

In [7]: df[df.filter(like='Date').isnull().any(axis=1)]
Out[7]:
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16
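filter(like='Date') keeps the columns whose labels contain the substring 'Date', so this version picks up additional date columns without listing them by name. The trade-off is that it also matches any unrelated label containing 'Date' (a hypothetical DateFormat column, for instance). A sketch showing which columns it selects on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# Columns whose label contains the substring 'Date'.
date_cols = df.filter(like='Date')
print(list(date_cols.columns))  # ['Date1', 'Date2']

# Rows where any of those columns is null.
result = df[date_cols.isnull().any(axis=1)]
print(result)
```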

The notna() conditional function returns True for each value that is not null. When a DataFrame is created from a CSV file, blank fields are imported as null values, which can cause problems later when operating on the DataFrame. The pandas isnull() and notnull() methods are used to check for and manage null values in a DataFrame.
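A small sketch of the CSV point, using an inline string in place of a real file (the two rows are taken from the question's data). With its default settings, read_csv turns the empty Date2 field into NaN, which isnull() and notna() then detect.

```python
import io

import pandas as pd

# Inline CSV text standing in for a file; the first row's Date2 field is blank.
csv_text = """ID,Date1,Date2
58844880,04/11/16,
59745846,04/12/16,04/14/16
"""

df = pd.read_csv(io.StringIO(csv_text))

# The blank field was read as NaN, so isnull() flags it.
print(df.isnull())
# notna() keeps only the rows where Date2 has a value.
print(df[df['Date2'].notna()])
```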

In DataFrame.any, axis=1 (or 'columns') reduces across the columns and returns a Series whose index is the original index; by default (skipna=True) NA/null values are excluded. You can also automatically remove columns with many null values and then any rows that still contain nulls:

df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
df = df.dropna(axis=0).reset_index(drop=True)

The first line drops every column whose null count exceeds the number of columns; the second drops every remaining row that contains a null. Note: the code above removes all of your null values, so if you need them, process them first.
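Running that two-step cleanup on the question's sample data shows what it does in practice. Each column's null count (1, 2 and 3) is compared with the number of columns (3), so no column is dropped here, and dropna then removes every row with any null, leaving four rows. Whether len(df.columns) is a sensible threshold depends on the data; it is the answer's choice, not a pandas rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# Drop columns whose null count exceeds the number of columns (none do here).
df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
# Drop every remaining row that contains a null, then renumber the rows.
df = df.dropna(axis=0).reset_index(drop=True)

print(df)
```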

DataFrame.empty is True if the DataFrame is entirely empty (no items), meaning any of the axes has length 0:

>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

DataFrame.dropna returns a DataFrame with labels on the given axis omitted where all or any of the data are missing.
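Two ways to apply these to the question's data: checking .empty on the filtered frame tells you whether any row has a missing date at all, and dropna with subset and how removes rows instead of selecting them. With how='any' a row is dropped if either date is missing; with how='all' only if both are.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

missing = df[df[['Date1', 'Date2']].isnull().any(axis='columns')]
print(missing.empty)  # False: four rows have a missing date

# Drop rows where either date is missing (the complement of `missing`).
any_dropped = df.dropna(subset=['Date1', 'Date2'], how='any')
# Drop rows only where both dates are missing.
all_dropped = df.dropna(subset=['Date1', 'Date2'], how='all')

print(any_dropped)
print(all_dropped)
```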

DataFrame.all returns whether all elements are True, potentially over an axis; specify axis='columns' to check whether the values in each row are all True. To count the non-null values in each column, use the DataFrame count() function with axis=0:

df.count(axis=0)
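On the question's sample data, count(axis=0) gives the number of non-null values per column, and combining notna() with all(axis='columns') flags the rows where every value is present. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [58844880, 59745846, 59743311, 59745848, 59598413,
           59745921, 59561199, np.nan, 59561198],
    'Date1': ['04/11/16', '04/12/16', '04/13/16', '04/14/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', np.nan],
    'Date2': [np.nan, '04/14/16', np.nan, '04/11/16', np.nan,
              '04/14/16', '04/15/16', '04/16/16', '04/17/16'],
})

# Non-null values per column: ID 8, Date1 7, Date2 6.
counts = df.count(axis=0)
print(counts)

# Row-wise check: True only where every value in the row is present.
complete = df.notna().all(axis='columns')
print(df[complete])
```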

Comments
  • Thank you, the solution worked. Thanks also for sharing the timeit results for both solutions for comparison.