Pandas dropna with multilevel column index

I have a pandas DataFrame with a multilevel index for the columns:

   group1    group2    group3
   1    2    1    2    1    2
0  ...  ...  NaN  ...  ...  ...
1  NaN  ...  ...  ...  ...  ...
2  ...  ...  ...  ...  NaN  ...

Now I want to drop the rows where any column under group2 or group3 contains NaN, which means rows 0 and 2 in this instance.
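For anyone who wants to reproduce this, a frame of that shape can be built roughly as follows. This is only a sketch: the NaN positions come from the table above, while the numeric values and the float dtype are assumptions chosen to line up with the output shown in one of the answers.

```python
import numpy as np
import pandas as pd

# Two-level column index: three groups, each with sub-columns 1 and 2.
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])

# Assumed sample values 0..17; the question elides the real ones.
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)

# Put the NaNs where the question shows them.
df.loc[0, ("group2", 1)] = np.nan
df.loc[1, ("group1", 1)] = np.nan
df.loc[2, ("group3", 1)] = np.nan

print(df)
```

Note that a single column is addressed with a tuple such as ("group2", 1), one entry per column level.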

According to my understanding of the documentation this should work:

df.dropna(axis = 'rows', subset = ['group2', 'group3'])

But it does not. Instead I get the error:

KeyError: ['group2', 'group3']

Could someone please point out to me how to properly specify the subset?

Kind regards, Rasmus


Update

So it seems that .dropna() cannot take the top-level labels of a multilevel column index as its subset. In the end I went with the less elegant but workable method suggested in the answers, slightly rewritten:

mask_nan = df[['group2', 'group3']].isna().any(axis = 'columns')
df[~mask_nan]    # ~ to negate / flip the boolean values
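A possible alternative, as far as I can tell: subset appears to be matched against complete column keys, so expanding the top-level labels into full (level 0, level 1) tuples lets dropna be used directly. The sketch below assumes a frame shaped like the one in the question; the sample values and the names subset and cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the frame in the question.
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.loc[0, ("group2", 1)] = np.nan
df.loc[1, ("group1", 1)] = np.nan
df.loc[2, ("group3", 1)] = np.nan

# Expand the top-level labels into full column tuples.
subset = [col for col in df.columns if col[0] in ("group2", "group3")]

# Drop rows that have a NaN in any of those columns.
cleaned = df.dropna(axis="index", subset=subset)
print(cleaned)
```

Row 1 survives even though it has a NaN under group1, because group1 is not part of the subset.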

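It seems we cannot pass just an index level to dropna, so we can build a mask of the rows that contain NaN in those groups:

df.loc[:, ['group2', 'group3']].isna().any(axis=1)

Then keep only the rows where that mask is False (note the ~):

df = df[~df.loc[:, ['group2', 'group3']].isna().any(axis=1)]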

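I think this is a similar question to yours.

import numpy as np

# A list of labels selects both groups; a bare 'group2', 'group3' would be read as one tuple key.
df = df[np.isfinite(df[['group2', 'group3']]).all(axis=1)]

Only the rows where every value in those columns is finite are kept.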
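One thing to be aware of with this approach: np.isfinite is stricter than a plain NaN check, since infinite values fail it too. A small sketch of the difference, using made-up data with an inf in place of one NaN (the frame and variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["group2", "group3"], [1, 2]])
df = pd.DataFrame(
    [
        [np.nan, 3.0, 4.0, 5.0],
        [8.0, 9.0, 10.0, 11.0],
        [14.0, 15.0, np.inf, 17.0],
    ],
    columns=columns,
)

cols = df[["group2", "group3"]]

# False for rows containing NaN or +/-inf.
finite_rows = np.isfinite(cols).all(axis=1)

# False only for rows containing NaN (inf counts as a value by default).
notna_rows = cols.notna().all(axis=1)

print(df[finite_rows])
print(df[notna_rows])
```

If the data cannot contain inf, the two masks select the same rows.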

Let's start with the details. When you run:

idx = pd.IndexSlice
df.loc[:, idx['group2':'group3']]

You will get columns for group2 and group3:

  group2     group3    
       1   2      1   2
0    NaN   3    4.0   5
1    8.0   9   10.0  11
2   14.0  15    NaN  17

Now a more complicated expression:

df.loc[:, idx['group2':'group3']].notnull().all(axis=1)

returns a boolean Series that is True for the rows where all of these columns are non-null:

0    False
1     True
2    False
dtype: bool

So what you need is to use the expression above for boolean indexing:

df[df.loc[:, idx['group2':'group3']].notnull().all(axis=1)]

(with idx = pd.IndexSlice defined beforehand).
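Put together, the steps above can be sketched as one runnable script. The frame is an assumed reconstruction of the one in the question, and selected, keep and result are illustrative names. Two caveats as I understand them: label slicing on a MultiIndex generally needs the columns to be sorted (df.sort_index(axis=1) may be required first), and the slice also picks up any group that sorts between 'group2' and 'group3'.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the frame in the question.
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.loc[0, ("group2", 1)] = np.nan
df.loc[1, ("group1", 1)] = np.nan
df.loc[2, ("group3", 1)] = np.nan

idx = pd.IndexSlice

# Label slice over the first column level; both endpoints are included.
selected = df.loc[:, idx["group2":"group3"]]

# True for rows with no missing value in the selected columns.
keep = selected.notnull().all(axis=1)

result = df[keep]
print(result)
```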

Comments
  • Well, df.dropna(subset = ['column_name']) should do the job. Can you try dropping one at a time?
  • No luck, just a shorter KeyError.
  • I don't think you need loc; you can just use df[['group2', 'group3']]
  • Since I pass a total of four columns (two column groups of two apiece), would I not get a two-dimensional boolean array? I would need a pandas Series to pass back as an index mask.
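On the last comment: isna() on the four selected columns does give a two-dimensional boolean DataFrame, but reducing it with any(axis=1) collapses it to one boolean per row, which is the Series needed for the mask. A short sketch, assuming a frame like the one in the question (sample values and the names cells and rows are made up):

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the frame in the question.
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.loc[0, ("group2", 1)] = np.nan
df.loc[1, ("group1", 1)] = np.nan
df.loc[2, ("group3", 1)] = np.nan

# One boolean per cell: a DataFrame with 3 rows and 4 columns.
cells = df[["group2", "group3"]].isna()

# One boolean per row: a Series, True where the row has any NaN.
rows = cells.any(axis=1)

print(type(cells).__name__, cells.shape)
print(type(rows).__name__, rows.shape)
print(df[~rows])
```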