Pandas long to wide reshape, by two variables


I have data in long format and am trying to reshape to wide, but there doesn't seem to be a straightforward way to do this using melt/stack/unstack:

Salesman  Height   product      price
  Knut      6        bat          5
  Knut      6        ball         1
  Knut      6        wand         3
  Steve     5        pen          2

Becomes:

Salesman  Height   product_1  price_1  product_2  price_2  product_3  price_3
  Knut      6        bat         5       ball        1       wand        3
  Steve     5        pen         2       NA          NA      NA          NA

I think Stata can do something like this with the reshape command.


A simple pivot might be sufficient for your needs but this is what I did to reproduce your desired output:

df['idx'] = df.groupby('Salesman').cumcount()

Just adding a within-group counter/index will get you most of the way there, but the column labels will not be as you desired:

print(df.pivot(index='Salesman', columns='idx')[['product', 'price']])

        product              price        
idx            0     1     2      0   1   2
Salesman                                   
Knut         bat  ball  wand      5   1   3
Steve        pen   NaN   NaN      2 NaN NaN
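
For anyone who wants to run the step above end to end, here is a minimal sketch. The DataFrame is typed in by hand from the question, and passing a list to values= assumes a reasonably recent pandas; treat it as an illustration rather than the only way to do it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})

# cumcount() numbers the rows inside each Salesman group, starting at 0
df["idx"] = df.groupby("Salesman").cumcount()
idx_values = df["idx"].tolist()

# Naming the value columns explicitly keeps Height out of the result
wide = df.pivot(index="Salesman", columns="idx", values=["product", "price"])
print(wide)
```

The counter is what makes each (Salesman, idx) pair unique, which pivot needs in order to place every row in exactly one cell.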

To get closer to your desired output I added the following:

df['prod_idx'] = 'product_' + df.idx.astype(str)
df['prc_idx'] = 'price_' + df.idx.astype(str)

product = df.pivot(index='Salesman',columns='prod_idx',values='product')
prc = df.pivot(index='Salesman',columns='prc_idx',values='price')

reshape = pd.concat([product, prc], axis=1)
# first() per Salesman; drop_duplicates() would lose a salesman who shares a Height with another
reshape['Height'] = df.groupby('Salesman')['Height'].first()
print(reshape)

         product_0 product_1 product_2  price_0  price_1  price_2  Height
Salesman                                                                 
Knut           bat      ball      wand        5        1        3       6
Steve          pen       NaN       NaN        2      NaN      NaN       5

Edit: if you want to generalize the procedure to more variables I think you could do something like the following (although it might not be efficient enough):

df['idx'] = df.groupby('Salesman').cumcount()

tmp = []
for var in ['product','price']:
    df['tmp_idx'] = var + '_' + df.idx.astype(str)
    tmp.append(df.pivot(index='Salesman',columns='tmp_idx',values=var))

reshape = pd.concat(tmp,axis=1)
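
The loop above can also be wrapped in a small function. This is only a sketch: the name long_to_wide and its parameters are made up for illustration, and it renames the pivoted columns directly instead of building a temporary label column.

```python
import pandas as pd

def long_to_wide(df, group, value_vars, start=0):
    """Hypothetical helper: spread each variable into numbered columns per group."""
    out = df.copy()
    # Within-group counter, optionally shifted so numbering starts at `start`
    out["_idx"] = out.groupby(group).cumcount() + start
    pieces = []
    for var in value_vars:
        piece = out.pivot(index=group, columns="_idx", values=var)
        # Turn the counter values into labels such as product_0, product_1, ...
        piece.columns = [f"{var}_{i}" for i in piece.columns]
        pieces.append(piece)
    return pd.concat(pieces, axis=1)

# Sample data copied from the question
df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})
wide = long_to_wide(df, "Salesman", ["product", "price"])
print(wide)
```

Passing start=1 gives product_1, price_1 and so on, which is closer to the labels in the question.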

@Luke said:

I think Stata can do something like this with the reshape command.

You can, but I think you also need a within-group counter for Stata's reshape to produce your desired output:

     +-------------------------------------------+
     | salesman   idx   height   product   price |
     |-------------------------------------------|
  1. |     Knut     0        6       bat       5 |
  2. |     Knut     1        6      ball       1 |
  3. |     Knut     2        6      wand       3 |
  4. |    Steve     0        5       pen       2 |
     +-------------------------------------------+

Once idx is added, you can do the reshape in Stata:

reshape wide product price, i(salesman) j(idx)



A bit old but I will post this for other people.

What you want can be achieved, but you probably shouldn't want it ;) Pandas supports hierarchical indexes for both rows and columns. In Python 3:

import pandas as pd
from io import StringIO

raw = '''Salesman  Height   product      price
  Knut      6        bat          5
  Knut      6        ball         1
  Knut      6        wand         3
  Steve     5        pen          2'''
# Columns are separated by runs of whitespace
dff = pd.read_csv(StringIO(raw), sep=r'\s+')

print(dff.set_index(['Salesman', 'Height', 'product']).unstack('product'))

This produces a representation that is probably more convenient than the one you were looking for:

                price             
product          ball bat pen wand
Salesman Height                   
Knut     6          1   5 NaN    3
Steve    5        NaN NaN   2  NaN

The advantage of set_index followed by unstack over a single call such as pivot is that you can break the operation into small, clear steps, which simplifies debugging.
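
To make the "small steps" point concrete, here is a sketch that keeps each intermediate result in its own variable. It unstacks a within-group counter (idx, an added column that is not in the original data) rather than product, so the layout ends up closer to what the question asked for.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})

# Step 1: number the rows within each salesman, starting at 1
df["idx"] = df.groupby("Salesman").cumcount() + 1

# Step 2: move the identifiers and the counter into the row index
indexed = df.set_index(["Salesman", "Height", "idx"])

# Step 3: rotate the counter level into the columns
wide = indexed.unstack("idx")
print(wide)
```

Each of indexed and wide can be printed on its own, so it is easy to see at which step something goes wrong.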



Here's another, more fleshed-out solution, taken from Chris Albon's site.

import pandas as pd

# Create a "long" dataframe
raw_data = {'patient': [1, 1, 1, 2, 2],
                'obs': [1, 2, 3, 1, 2],
          'treatment': [0, 1, 0, 1, 0],
              'score': [6252, 24243, 2345, 2342, 23525]}

df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])

# Make a "wide" dataframe: one row per patient, one column per observation
df.pivot(index='patient', columns='obs', values='score')
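
For reference, a short sketch of what that pivot gives back, using the same numbers. The duplicate check is an addition of mine: pivot raises a ValueError when an (index, columns) pair occurs more than once, so it can be worth checking first.

```python
import pandas as pd

df = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2],
    "obs": [1, 2, 3, 1, 2],
    "treatment": [0, 1, 0, 1, 0],
    "score": [6252, 24243, 2345, 2342, 23525],
})

# pivot needs every (patient, obs) pair to be unique
has_duplicates = df.duplicated(["patient", "obs"]).any()

# One row per patient, one column per observation number
wide = df.pivot(index="patient", columns="obs", values="score")
print(wide)
```

Patient 2 has no third observation, so that cell comes out as NaN and the score columns become floats.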



pivoted = df.pivot(index='Salesman', columns='product', values='price')

See Python for Data Analysis, p. 192.



Karl D's solution gets at the heart of the problem. But I find it's far easier to pivot everything (with .pivot_table because of the two index columns) and then sort and assign the columns to collapse the MultiIndex:

df['idx'] = df.groupby('Salesman').cumcount()+1
df = df.pivot_table(index=['Salesman', 'Height'], columns='idx', 
                    values=['product', 'price'], aggfunc='first')

df = df.sort_index(axis=1, level=1)
df.columns = [f'{x}_{y}' for x,y in df.columns]
df = df.reset_index()
Output:
  Salesman  Height  price_1 product_1  price_2 product_2  price_3 product_3
0     Knut       6      5.0       bat      1.0      ball      3.0      wand
1    Steve       5      2.0       pen      NaN       NaN      NaN       NaN
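
If you want product_N to come before price_N, as in the question, one option is to build the column order explicitly instead of sorting. A sketch of that variation, with the sample data typed in from the question:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})

df["idx"] = df.groupby("Salesman").cumcount() + 1
n = int(df["idx"].max())  # largest number of rows for any salesman

wide = df.pivot_table(index=["Salesman", "Height"], columns="idx",
                      values=["product", "price"], aggfunc="first")

# Collapse the two column levels: ("product", 1) becomes "product_1"
wide.columns = [f"{var}_{i}" for var, i in wide.columns]

# Spell out the order: product_1, price_1, product_2, price_2, ...
ordered = [f"{var}_{i}" for i in range(1, n + 1) for var in ("product", "price")]
wide = wide[ordered].reset_index()
print(wide)
```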
