Reshape and filter pandas dataframe


I would like to keep only the cells of the dataframe below (df1) that are equal to 1 and build a new dataframe in which each row holds the row label and column label of one such cell (as in df2 below):

import pandas as pd

dict1 = [{'12/21/18': 0, '12/22/18': 0, '12/23/18': 1, '12/24/18': 1},
         {'12/21/18': 1, '12/22/18': 1, '12/23/18': 0, '12/24/18': 1},
         {'12/21/18': 0, '12/22/18': 1, '12/23/18': 0, '12/24/18': 0},
         {'12/21/18': 1, '12/22/18': 0, '12/23/18': 1, '12/24/18': 1}]


df1 = pd.DataFrame(dict1, index=['AAPL', 'CSCO', 'GE', 'MSFT'])

dict2 = [{'Ticker': 'AAPL','Date': '12/23/18'},
     {'Ticker': 'AAPL','Date': '12/24/18'},
     {'Ticker': 'CSCO','Date': '12/22/18'},
     {'Ticker': 'CSCO','Date': '12/24/18'},
     {'Ticker': 'GE',  'Date': '12/22/18'},
     {'Ticker': 'MSFT','Date': '12/24/18'}]


df2 = pd.DataFrame(dict2)

Can anyone suggest an approach of how to do so?

Here's a performance comparison of the methods given by @slayer and @Lucas H, plus a third approach that I've added.

@slayer's method
%%timeit
1.12 ms ± 61.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

@Lucas H's method
%%timeit
5.16 ms ± 735 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Third method
%%timeit
4.4 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


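# Third method
wide = df1.T  # dates become the rows, tickers the columns (df1 itself is left unchanged)
# Keep the 0s and replace every other cell with its row label (the date)
df2 = pd.melt(wide.where(wide == 0, wide.index))
# Drop the cells that were 0, leaving one (ticker, date) row per original 1
df2 = df2[df2.value != 0]
df2.columns = ['Ticker', 'Date']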

On this small 4x4 example, @slayer's method is clearly the fastest.
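For anyone who wants to rerun a comparison like this outside a notebook, here is a minimal sketch using the standard-library timeit module. The function names argwhere_method and melt_method and the repeat count n are made up for illustration, the two functions are simplified stand-ins for the answers rather than the exact code that was timed above, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[0, 0, 1, 1], [1, 1, 0, 1], [0, 1, 0, 0], [1, 0, 1, 1]],
    index=['AAPL', 'CSCO', 'GE', 'MSFT'],
    columns=['12/21/18', '12/22/18', '12/23/18', '12/24/18'],
)


def argwhere_method(df):
    # Positional lookup of the non-zero cells, then translation to labels.
    idx = np.argwhere(df.values > 0)
    return pd.DataFrame({
        'Ticker': [df.index[r] for r, _ in idx],
        'Date': [df.columns[c] for _, c in idx],
    })


def melt_method(df):
    # Wide-to-long reshape, then keep only the rows whose value is 1.
    long_df = (df.rename_axis('Ticker')
                 .reset_index()
                 .melt(id_vars='Ticker', var_name='Date'))
    return long_df.loc[long_df['value'] == 1, ['Ticker', 'Date']]


# Hypothetical repeat count; raise it for more stable numbers.
n = 200
timings = {
    'argwhere': timeit.timeit(lambda: argwhere_method(df1), number=n) / n,
    'melt': timeit.timeit(lambda: melt_method(df1), number=n) / n,
}
for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e3:.3f} ms per call')
```

Both functions return the same (Ticker, Date) pairs, only in a different row order, so it is worth checking that equivalence before comparing speed.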

You can look at the underlying values of the dataframe and get an array of (row, column) positions where the value is greater than 0. Then you can use those positions to look up the matching entries in the index and column-name lists and build a new dataframe from them.

import numpy as np
idx = np.argwhere(df1.values > 0)

# Get a list of the ticker index and column names
ticker_list = df1.index.tolist()
date_list = df1.columns.tolist()
ticker = []
date = []

for value in idx:
    ticker.append(ticker_list[value[0]])
    date.append(date_list[value[1]])

df2 = pd.DataFrame({'Ticker': ticker, 'Date': date})
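The same idea can be written without the Python loop. This is only a sketch of a vectorized variant, not the code that was timed in this thread: np.nonzero returns the row and column positions as two separate arrays, which can index df1.index and df1.columns directly. The variable names rows and cols are chosen here for illustration.

```python
import numpy as np
import pandas as pd

data = [
    {'12/21/18': 0, '12/22/18': 0, '12/23/18': 1, '12/24/18': 1},
    {'12/21/18': 1, '12/22/18': 1, '12/23/18': 0, '12/24/18': 1},
    {'12/21/18': 0, '12/22/18': 1, '12/23/18': 0, '12/24/18': 0},
    {'12/21/18': 1, '12/22/18': 0, '12/23/18': 1, '12/24/18': 1},
]
df1 = pd.DataFrame(data, index=['AAPL', 'CSCO', 'GE', 'MSFT'])

# Row and column positions of every cell greater than 0, in row-major order.
rows, cols = np.nonzero(df1.to_numpy() > 0)

# Translate the positions into labels in one step each.
df2 = pd.DataFrame({'Ticker': df1.index[rows], 'Date': df1.columns[cols]})
print(df2)
```

The rows come out grouped by ticker, in the order the tickers appear in df1.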

I think that the simplest way to do this is as follows:

df1.index.name = 'Ticker'  # Name the index so reset_index() yields a 'Ticker' column (the default is 'index')
df2 = df1.reset_index().melt(id_vars='Ticker', var_name='Date')  # Melt from wide to long: one row per (Ticker, Date) cell

Finally, to get it into your desired form, we keep only the rows whose value is 1, set Ticker as the index, keep the Date column, and sort by the index:

df2 = df2[df2.value == 1].set_index('Ticker').filter(['Date','Ticker']).sort_index()
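Note that the chain above leaves Ticker as the index. If you want the layout of df2 from the question instead (Ticker and Date as ordinary columns with a default integer index), one possible sketch is the chain below. The use of rename_axis, query and the final reset_index(drop=True) is my own choice here, not part of the original answer.

```python
import pandas as pd

data = [
    {'12/21/18': 0, '12/22/18': 0, '12/23/18': 1, '12/24/18': 1},
    {'12/21/18': 1, '12/22/18': 1, '12/23/18': 0, '12/24/18': 1},
    {'12/21/18': 0, '12/22/18': 1, '12/23/18': 0, '12/24/18': 0},
    {'12/21/18': 1, '12/22/18': 0, '12/23/18': 1, '12/24/18': 1},
]
df1 = pd.DataFrame(data, index=['AAPL', 'CSCO', 'GE', 'MSFT'])

df2 = (
    df1.rename_axis('Ticker')                    # name the index without mutating df1
       .reset_index()                            # turn the tickers into a column
       .melt(id_vars='Ticker', var_name='Date')  # wide to long: one row per cell
       .query('value == 1')                      # keep only the cells equal to 1
       .drop(columns='value')                    # the flag column is no longer needed
       .sort_values(['Ticker', 'Date'])          # dates sort as strings here
       .reset_index(drop=True)                   # default 0..n-1 index
)
print(df2)
```

The dates are sorted as plain strings, which is fine for this sample but would need pd.to_datetime for dates spanning several months or years.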

Comments
  • You just need melt and filter it
  • Thank you for the comparison and thanks also for the third approach, really appreciate the help. I might still go with Lucas' or your method because speed is no concern and because I really want to familiarize myself with melt!!
  • Nice comparison. Sometimes the efficient route is not the one most sought after.
  • Really good to know. In case I have to apply this to larger datasets, I will keep your approach in my cheatsheet, slayer, just in case. Thanks to all 3 of you!!! This has been very helpful!!!
  • Thanks meW, this is going to be very helpful!!