## Pandas long to wide reshape, by two variables

I have data in long format and am trying to reshape to wide, but there doesn't seem to be a straightforward way to do this using melt/stack/unstack:

    Salesman  Height  product  price
    Knut      6       bat      5
    Knut      6       ball     1
    Knut      6       wand     3
    Steve     5       pen      2

Becomes:

    Salesman  Height  product_1  price_1  product_2  price_2  product_3  price_3
    Knut      6       bat        5        ball       1        wand       3
    Steve     5       pen        2        NA         NA       NA         NA

I think Stata can do something like this with the reshape command.

A simple pivot might be sufficient for your needs but this is what I did to reproduce your desired output:

    df['idx'] = df.groupby('Salesman').cumcount()

Just adding a within-group counter/index will get you most of the way there, but the column labels will not be as you desired:

    print(df.pivot(index='Salesman', columns='idx')[['product', 'price']])

             product              price
    idx            0     1     2      0    1    2
    Salesman
    Knut         bat  ball  wand      5    1    3
    Steve        pen   NaN   NaN      2  NaN  NaN

To get closer to your desired output I added the following:

    df['prod_idx'] = 'product_' + df.idx.astype(str)
    df['prc_idx'] = 'price_' + df.idx.astype(str)

    product = df.pivot(index='Salesman', columns='prod_idx', values='product')
    prc = df.pivot(index='Salesman', columns='prc_idx', values='price')

    reshape = pd.concat([product, prc], axis=1)
    # Note: drop_duplicates() drops repeated values, so this assumes no two salesmen share a height.
    reshape['Height'] = df.set_index('Salesman')['Height'].drop_duplicates()
    print(reshape)

             product_0 product_1 product_2  price_0  price_1  price_2  Height
    Salesman
    Knut           bat      ball      wand        5        1        3       6
    Steve          pen       NaN       NaN        2      NaN      NaN       5

Edit: if you want to generalize the procedure to more variables I think you could do something like the following (although it might not be efficient enough):

    df['idx'] = df.groupby('Salesman').cumcount()

    tmp = []
    for var in ['product', 'price']:
        df['tmp_idx'] = var + '_' + df.idx.astype(str)
        tmp.append(df.pivot(index='Salesman', columns='tmp_idx', values=var))

    reshape = pd.concat(tmp, axis=1)
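The loop above can be wrapped in a small helper. What follows is a minimal sketch rather than part of the original answer: the function name `long_to_wide` and its parameters are hypothetical, and it assumes that any other column (such as `Height`) is joined back separately.

```python
import pandas as pd


def long_to_wide(df, group, value_cols):
    """Pivot each column in value_cols into numbered columns, one row per group.

    Hypothetical helper for illustration; this is not a pandas API.
    """
    # Within-group counter: 0, 1, 2, ... for the rows of each group.
    idx = df.groupby(group).cumcount().astype(str)
    pieces = []
    for var in value_cols:
        # Build labels such as 'product_0', then pivot that one variable.
        tmp = df.assign(tmp_idx=var + "_" + idx)
        pieces.append(tmp.pivot(index=group, columns="tmp_idx", values=var))
    return pd.concat(pieces, axis=1)


df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})
wide = long_to_wide(df, "Salesman", ["product", "price"])
print(wide)
```

Because the labels are strings, a group with ten or more rows would sort `product_10` before `product_2`; zero-padding the counter is one way around that.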

@Luke said:

I think Stata can do something like this with the reshape command.

You can, but I think you also need a within-group counter for Stata's reshape to produce your desired output:

         +-------------------------------------------+
         | salesman   idx   height   product   price |
         |-------------------------------------------|
      1. |     Knut     0        6       bat       5 |
      2. |     Knut     1        6      ball       1 |
      3. |     Knut     2        6      wand       3 |
      4. |    Steve     0        5       pen       2 |
         +-------------------------------------------+

If you add `idx`, then you could do the reshape in `stata`:

    reshape wide product price, i(salesman) j(idx)


A bit old but I will post this for other people.

What you want can be achieved, but you probably shouldn't want it ;) Pandas supports hierarchical indexes for both rows and columns. In Python 2.7.x ...

    import pandas as pd

    try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3

    raw = '''Salesman  Height  product  price
    Knut      6       bat      5
    Knut      6       ball     1
    Knut      6       wand     3
    Steve     5       pen      2'''
    dff = pd.read_csv(StringIO(raw), sep=r'\s+')
    print(dff.set_index(['Salesman', 'Height', 'product']).unstack('product'))

This produces a representation that is probably more convenient than the one you were looking for:

                    price
    product          ball  bat  pen  wand
    Salesman Height
    Knut     6          1    5  NaN     3
    Steve    5        NaN  NaN    2   NaN

The advantage of using `set_index` and unstacking rather than a single function such as `pivot` is that you can break the operation down into clear, small steps, which simplifies debugging.
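The same step-by-step style can also reach the layout asked for in the question. This is a minimal sketch, assuming a within-group counter (called `idx` here, as in the earlier answer) is unstacked instead of `product`; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})

# Step 1: number the rows within each salesman (1, 2, 3, ...).
df["idx"] = df.groupby("Salesman").cumcount() + 1

# Step 2: move the identifying columns and the counter into the index.
indexed = df.set_index(["Salesman", "Height", "idx"])

# Step 3: pivot the counter level into the columns (two-level column index).
wide = indexed.unstack("idx")

# Step 4: flatten ('product', 1) into 'product_1' and restore the id columns.
wide.columns = [f"{name}_{i}" for name, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Each intermediate (`indexed`, then the unstacked frame) can be printed and inspected on its own, which is the debugging benefit described above.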


Here's another solution more fleshed out, taken from Chris Albon's site.

##### Create "long" dataframe

    import pandas as pd

    raw_data = {'patient': [1, 1, 1, 2, 2],
                'obs': [1, 2, 3, 1, 2],
                'treatment': [0, 1, 0, 1, 0],
                'score': [6252, 24243, 2345, 2342, 23525]}
    df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])

##### Make a "wide" dataframe

    df.pivot(index='patient', columns='obs', values='score')
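The call above spreads only `score`. As a sketch of how the same data could carry both measured columns, `values` can be given a list, which yields two-level columns that can then be flattened. The flattened names such as `score_1` are a naming choice made here, not something taken from the source.

```python
import pandas as pd

raw_data = {"patient": [1, 1, 1, 2, 2],
            "obs": [1, 2, 3, 1, 2],
            "treatment": [0, 1, 0, 1, 0],
            "score": [6252, 24243, 2345, 2342, 23525]}
df = pd.DataFrame(raw_data)

# One row per patient, one column per (variable, obs) pair.
wide = df.pivot(index="patient", columns="obs", values=["treatment", "score"])

# Flatten ('score', 1) into 'score_1'.
wide.columns = [f"{var}_{obs}" for var, obs in wide.columns]
print(wide)
```

Patient 2 has no third observation, so the corresponding cells come out as `NaN`.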


    pivoted = df.pivot(index='Salesman', columns='product', values='price')

pg. 192 Python for Data Analysis


Karl D's solution gets at the heart of the problem. But I find it's far easier to pivot everything (with `.pivot_table` because of the two index columns) and then `sort` and assign the columns to collapse the `MultiIndex`:

    df['idx'] = df.groupby('Salesman').cumcount() + 1
    df = df.pivot_table(index=['Salesman', 'Height'], columns='idx',
                        values=['product', 'price'], aggfunc='first')

    df = df.sort_index(axis=1, level=1)
    df.columns = [f'{x}_{y}' for x, y in df.columns]
    df = df.reset_index()

##### Output:

      Salesman  Height  price_1 product_1  price_2 product_2  price_3 product_3
    0     Knut       6      5.0       bat      1.0      ball      3.0      wand
    1    Steve       5      2.0       pen      NaN       NaN      NaN       NaN
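The output above lists `price_n` before `product_n`, whereas the question asked for the opposite order. Here is a minimal sketch of one way to control that order, by spelling out the column sequence before flattening. The names `value_cols`, `ordered` and `wide` are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Salesman": ["Knut", "Knut", "Knut", "Steve"],
    "Height": [6, 6, 6, 5],
    "product": ["bat", "ball", "wand", "pen"],
    "price": [5, 1, 3, 2],
})
value_cols = ["product", "price"]  # desired order inside each numbered group

df["idx"] = df.groupby("Salesman").cumcount() + 1
wide = df.pivot_table(index=["Salesman", "Height"], columns="idx",
                      values=value_cols, aggfunc="first")

# Spell out the column order: (product, 1), (price, 1), (product, 2), ...
ordered = pd.MultiIndex.from_tuples(
    [(col, i) for i in sorted(df["idx"].unique()) for col in value_cols]
)
wide = wide.reindex(columns=ordered)

# Collapse the two column levels into names such as 'product_1'.
wide.columns = [f"{col}_{i}" for col, i in wide.columns]
wide = wide.reset_index()
print(wide)
```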
