Pandas dropna with multilevel column index


So I have this pandas DataFrame with a multilevel index for the columns:

   group1    group2    group3
   1    2    1    2    1    2
0  ...  ...  NaN  ...  ...  ...
1  NaN  ...  ...  ...  ...  ...
2  ...  ...  ...  ...  NaN  ...

Now I want to drop the rows where the group2 or group3 columns contain NaN values, which in this instance are rows 0 and 2.

According to my understanding of the documentation this should work:

df.dropna(axis = 'rows', subset = ['group2', 'group3'])

But it does not. Instead I get the error:

KeyError: ['group2', 'group3']

Could someone please point out to me how to properly specify the subset?

Kind regards, Rasmus
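A minimal reproduction of the error, as a sketch. The data is hypothetical: the cells shown as `...` are filled with the numbers 0 to 17, and the NaNs sit in the same cells as in the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 rows x 6 columns numbered 0..17, with NaN in the same
# cells as the question's table
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.iloc[0, 2] = np.nan  # ('group2', 1)
df.iloc[1, 0] = np.nan  # ('group1', 1)
df.iloc[2, 4] = np.nan  # ('group3', 1)

# subset is matched against complete column labels, which are tuples here,
# so the bare top-level names are not found
try:
    df.dropna(axis="rows", subset=["group2", "group3"])
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(error_message)
```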


So it seems .dropna() cannot work with multilevel column indexes. In the end I went with the less elegant but workable method that was suggested, slightly rewritten:

mask_nan = df[['group2', 'group3']].isna().any(axis = 'columns')
df[~mask_nan]    # ~ to negate / flip the boolean values
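A possible alternative that keeps dropna: `subset` is matched against complete column labels, and with a MultiIndex those labels are tuples such as ('group2', 1). Expanding the top-level names into their tuples first should therefore let dropna do the filtering. A sketch on hypothetical data shaped like the question's table:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's table (one NaN in each of rows 0, 1, 2)
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.iloc[0, 2] = np.nan  # ('group2', 1)
df.iloc[1, 0] = np.nan  # ('group1', 1)
df.iloc[2, 4] = np.nan  # ('group3', 1)

# Expand the top-level names into full (level0, level1) tuples
subset = list(df[["group2", "group3"]].columns)

# Rows 0 and 2 have a NaN under group2/group3 and are dropped; row 1 has its
# NaN under group1, which is not in subset, so it is kept
result = df.dropna(axis="index", subset=subset)
print(result)
```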

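It seems we cannot pass the index level to dropna, so we could build a mask that is True for every row with a NaN in group2 or group3:

df.loc[:, ['group2', 'group3']].isna().any(axis=1)

and then keep only the rows where that mask is False (note the ~):

df = df[~df.loc[:, ['group2', 'group3']].isna().any(axis=1)]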
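The leading `~` is what turns "rows with a NaN" into "rows to keep". A small sketch on a hypothetical frame checks that negating `isna().any(axis=1)` gives the same mask as `notna().all(axis=1)`, the form used in another answer on this page, and shows what happens without the `~`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the question's NaN pattern (tuple keys -> MultiIndex columns)
df = pd.DataFrame(
    {
        ("group1", 1): [0.0, np.nan, 12.0],
        ("group2", 1): [np.nan, 8.0, 14.0],
        ("group3", 1): [4.0, 10.0, np.nan],
    }
)
cols = df.loc[:, ["group2", "group3"]]

has_nan = cols.isna().any(axis=1)       # True where a row has at least one NaN
all_present = cols.notna().all(axis=1)  # True where a row has no NaN at all

# De Morgan: "not (any missing)" is the same as "all present"
print((~has_nan).equals(all_present))

kept = df[~has_nan]  # drops rows 0 and 2
wrong = df[has_nan]  # without ~, only the rows WITH a NaN survive
```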

From the pandas.DataFrame.dropna documentation (0.23.1): axis 0 or 'index' drops rows which contain missing values; 1 or 'columns' drops columns which contain missing values. (Passing a tuple or list to drop on multiple axes is deprecated since 0.23.0.) With a MultiIndex we have to specify a column using a tuple in order to drop that specific column, or specify the level to drop all columns with that key on that index level. Instead of saying drop column 'c', say drop ('a', 'c'):

df.drop(('a', 'c'), axis=1, inplace=True)

Or specify the level instead.
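Both ways of dropping, sketched on a hypothetical two-level frame. The tuple form removes exactly one column; the level form uses the `level` argument of `DataFrame.drop` to remove every column whose second-level label is 'c'. The sketch returns new frames instead of using `inplace=True`:

```python
import pandas as pd

# Hypothetical frame with two column levels
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.MultiIndex.from_tuples([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]),
)

# Drop one specific column by its full tuple label
one_dropped = df.drop(("a", "c"), axis=1)

# Drop every column whose label on level 1 is 'c'
level_dropped = df.drop("c", axis=1, level=1)

print(list(one_dropped.columns))
print(list(level_dropped.columns))
```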

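I think this is a similar question to yours.

import numpy as np

# np.isfinite returns a boolean DataFrame; .all(axis=1) reduces it to one value per row
df = df[np.isfinite(df[['group2', 'group3']]).all(axis=1)]

Only the rows where every value in group2 and group3 is finite (neither NaN nor ±inf) are kept here.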
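Note that the isfinite filter is stricter than an isna filter: infinities are dropped too, while pandas' isna() treats them as ordinary values by default. A sketch on a hypothetical frame where row 0 holds a NaN and row 2 holds +inf:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has NaN, row 2 has +inf, row 1 is clean
df = pd.DataFrame(
    {
        ("group2", 1): [np.nan, 8.0, 14.0],
        ("group3", 1): [4.0, 10.0, np.inf],
    }
)
cols = df[["group2", "group3"]]

# isna() does not count inf as missing, so only row 0 is dropped
by_isna = df[~cols.isna().any(axis=1)]

# np.isfinite() is False for NaN and for +/-inf, so rows 0 and 2 are dropped
by_isfinite = df[np.isfinite(cols).all(axis=1)]

print(list(by_isna.index), list(by_isfinite.index))
```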

From the pandas.DataFrame.dropna documentation (1.1.1): DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False) removes missing values. See the User Guide for more on which values are considered missing, and how to work with missing data. Parameters: axis {0 or 'index', 1 or 'columns'}, default 0.
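As that signature suggests, the same call can also drop columns rather than rows. A sketch on hypothetical data shaped like the question's table: with axis='columns' and the default how='any', every column holding at least one NaN is removed, which here leaves only the second sub-column of each group.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's table
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.iloc[0, 2] = np.nan  # ('group2', 1)
df.iloc[1, 0] = np.nan  # ('group1', 1)
df.iloc[2, 4] = np.nan  # ('group3', 1)

# axis="columns": drop each column that contains at least one NaN
remaining = df.dropna(axis="columns", how="any")
print(list(remaining.columns))
```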

Start with the details. When you run:

idx = pd.IndexSlice
df.loc[:, idx['group2':'group3']]

You will get columns for group2 and group3:

  group2     group3    
       1   2      1   2
0    NaN   3    4.0   5
1    8.0   9   10.0  11
2   14.0  15    NaN  17

Now a more complicated expression:

df.loc[:, idx['group2':'group3']].notnull().all(axis=1)

returns a boolean Series that is True for the rows where none of those columns is null:

0    False
1     True
2    False
dtype: bool

So what you need is to use that expression in boolean indexing:

df[df.loc[:, idx['group2':'group3']].notnull().all(axis=1)]

(with idx = pd.IndexSlice defined beforehand).
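The steps above put together as one runnable sketch. The data is hypothetical: the group2/group3 values match the ones printed above, the group1 values are made up. Label slicing such as idx['group2':'group3'] on a MultiIndex generally expects sorted columns, so the sketch calls sort_index(axis=1) first; that is a no-op here but may matter for frames built in another order.

```python
import numpy as np
import pandas as pd

# Hypothetical data; group2/group3 match the printout above
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)
df.iloc[0, 2] = np.nan  # ('group2', 1)
df.iloc[1, 0] = np.nan  # ('group1', 1)
df.iloc[2, 4] = np.nan  # ('group3', 1)

idx = pd.IndexSlice

# Make sure the columns are sorted before slicing by label
df = df.sort_index(axis=1)

# True for rows where every group2/group3 value is present
mask = df.loc[:, idx["group2":"group3"]].notnull().all(axis=1)
result = df[mask]
print(result)
```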

From the pandas 0.22.0 documentation: it is important to note that tuples and lists are not treated identically in pandas when it comes to indexing. Whereas a tuple is interpreted as one multi-level key, a list is used to specify several keys. Or in other words, tuples go horizontally (traversing levels), lists go vertically (scanning levels).
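The tuple/list distinction on hypothetical data shaped like the question's table. It also explains why df['group2', 'group3'] fails: the comma makes it the single tuple key ('group2', 'group3'), not two keys.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's table
columns = pd.MultiIndex.from_product([["group1", "group2", "group3"], [1, 2]])
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6), columns=columns)

# A tuple is ONE key that walks across the levels: a single column (a Series)
single = df[("group2", 1)]

# A list holds SEVERAL keys, here top-level ones: all sub-columns of each
several = df[["group2", "group3"]]

# A list of tuples selects several fully specified columns
picked = df.loc[:, [("group2", 1), ("group3", 2)]]

print(type(single).__name__, several.shape, list(picked.columns))
```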

One option is to use a MultiIndex (via pd.MultiIndex.from_product()) to construct the column levels for A and B and then concatenate them:

import pandas as pd

A.columns = pd.MultiIndex.from_product([['A'], A.columns])
B.columns = pd.MultiIndex.from_product([['B'], B.columns])
pd.concat([A, B], axis=1)

#             A     B
#             a  b  a  b
# 2016-11-21  2  1  3  0
# 2016-11-22  3  4  1  0
# 2016-11-23  5  2  1  6
# 2016-11-24  6  3  1  5
# 2016-11-25  6  3  0  2
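A self-contained version of that answer, assuming A and B are two frames with columns a and b on a shared date index; the numbers are taken from the output shown above.

```python
import pandas as pd

# Hypothetical inputs reconstructed from the printed result
dates = pd.to_datetime(["2016-11-21", "2016-11-22", "2016-11-23", "2016-11-24", "2016-11-25"])
A = pd.DataFrame({"a": [2, 3, 5, 6, 6], "b": [1, 4, 2, 3, 3]}, index=dates)
B = pd.DataFrame({"a": [3, 1, 1, 1, 0], "b": [0, 0, 6, 5, 2]}, index=dates)

# Put each frame's columns under a top level ('A' or 'B')
A.columns = pd.MultiIndex.from_product([["A"], A.columns])
B.columns = pd.MultiIndex.from_product([["B"], B.columns])

# Side by side on the shared index
combined = pd.concat([A, B], axis=1)
print(combined)
```

Passing a dict such as pd.concat({"A": A_flat, "B": B_flat}, axis=1) to frames with plain columns should produce the same outer level without touching .columns first.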

  • Well, df.dropna(subset = ['column_name']) should do the job. Can you try dropping one at a time?
  • No luck, just a shorter KeyError.
  • I don't think you need loc; you can just use df[['group2', 'group3']]
  • Since I pass a total of four columns (two column groups of two apiece), would I not get a boolean array? I would need a pandas Series to pass back as an index mask.
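On the last question: isna() on the four selected columns does return a boolean DataFrame, and it is the .any(axis='columns') step that collapses it into one boolean Series with a value per row, which then works as the mask. A quick type check on a hypothetical frame, selecting without loc as the previous comment suggests:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the question's NaN pattern
df = pd.DataFrame(
    {
        ("group1", 1): [0.0, np.nan, 12.0],
        ("group2", 1): [np.nan, 8.0, 14.0],
        ("group2", 2): [3.0, 9.0, 15.0],
        ("group3", 1): [4.0, 10.0, np.nan],
        ("group3", 2): [5.0, 11.0, 17.0],
    }
)

flags = df[["group2", "group3"]].isna()  # boolean DataFrame, 4 columns
mask = flags.any(axis="columns")         # boolean Series, one value per row
filtered = df[~mask]

print(type(flags).__name__, type(mask).__name__, list(filtered.index))
```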