## pandas pivot and join in two dataframes

I have two dataFrames :

df1 mag cat 0 101 A1 1 256 A2 2 760 A2 3 888 A3 ... df2 A1 A2 A3 ... 0 E50R AZ33 REZ3 1 T605 YYU6 YHG5 2 IR50 P0O9 BF53 3 NaN YY9I NaN

And I would like to create a final DataFrame which looks like :

df 101 256 760 888 ... 0 E50R AZ33 AZ33 REZ3 1 T605 YYU6 YYU6 YHG5 2 IR50 P0O9 P0O9 BF53 3 NaN YY9I YY9I NaN

I tried something with pivot, but it doesn't seem to do the job Could you help me ?

You can use a combination of `GroupBy`

, `numpy.repeat`

, `itertools.chain`

:

from itertools import chain # map cat to list of mag s = df1.groupby('cat')['mag'].apply(list) # calculate indices for columns, including repeats cols_idx = np.repeat(range(len(df2.columns)), s.map(len)) # apply indexing res = df2.iloc[:, cols_idx] # rename columns res.columns = list(chain.from_iterable(df2.columns.map(s.get))) print(res) 101 256 760 888 0 E50R AZ33 AZ33 REZ3 1 T605 YYU6 YYU6 YHG5 2 IR50 P0O9 P0O9 BF53 3 NaN YY9I YY9I NaN

**Performance benchmarking**

Some good and different solutions here, so you may be interested in performance. Wen's `reindex`

solution is the clear winner.

%timeit wen(df1, df2) # 632 µs per loop %timeit jpp(df1, df2) # 2.55 ms per loop %timeit scb(df1, df2) # 7.98 ms per loop %timeit abhi(df1, df2) # 4.52 ms per loop

Code:

def jpp(df1, df2): s = df1.groupby('cat')['mag'].apply(list) cols_idx = np.repeat(range(len(df2.columns)), s.map(len)) res = df2.iloc[:, cols_idx] res.columns = list(chain.from_iterable(df2.columns.map(s.get))) return res def scb(df1, df2): df_out = (df2.stack().reset_index() .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]]) return df_out.pivot('level_0','mag',0).reset_index(drop=True) def abhi(df1, df2): return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T def wen(df1, df2): newdf=df2.reindex(columns=df1.cat) newdf.columns=df1.mag return newdf

**Merge, join, and concatenate,** Merge, join, and concatenate; Reshaping and pivot tables · Working with text data When gluing together multiple DataFrames, you have a choice of how to pandas.DataFrame.join¶ DataFrame.join (self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False) → 'DataFrame' [source] ¶ Join columns of another DataFrame. Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list. Parameters

IIUC `reindex`

+re-name

newdf=df2.reindex(columns=df1.cat) newdf.columns=df1.mag newdf Out[519]: mag 101 256 760 888 0 E50R AZ33 AZ33 REZ3 1 T605 YYU6 YYU6 YHG5 2 IR50 P0O9 P0O9 BF53 3 NaN YY9I YY9I NaN

**pandas.DataFrame.join,** Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list. Parameters. Considering two similar pandas pivot tables, how to join these two tables on their indexes e.g. country.For example: df1.pivot_table(index='country', columns='year

Another way you can do it using `stack`

, `merge`

, and `pivot`

:

df_out = (df2.stack().reset_index() .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]]) df_out.pivot('level_0','mag',0).reset_index(drop=True)

Output:

mag 101 256 760 888 0 E50R AZ33 AZ33 REZ3 1 T605 YYU6 YYU6 YHG5 2 IR50 P0O9 P0O9 BF53 3 NaN YY9I YY9I NaN

**pandas.merge,** Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection Pandas.join() : Combining Data on a Column or Index While merge() is a module function,.join() is an object function that lives on your DataFrame. This enables you to specify only one DataFrame, which will join the DataFrame you call.join() on.

You can do this by transpose and merge and then set column 'mag' as index. Then again transposing it.

df2_transposed = df2.T res = df2_transposed.merge(df1,how = "left",left_index=True,right_on='cat') del res['cat'] res.set_index('mag', inplace=True) res.T mag 101 256 760 888 0 E50R AZ33 AZ33 REZ3 1 T605 YYU6 YYU6 YHG5 2 IR50 P0O9 P0O9 BF53 3 YY9I YY9I

**Reshaping and Pivot Tables,** For the curious here is how the above DataFrame was created: B -0.706771 two A -1.039575 B 0.271860 baz one A -0.424972 B 0.567020 two A 0 3 0 0 1 4 1 0 0 5 0 1 0 In [73]: df[['data1']].join(dummies) Out[73]: data1 key_a key_b key_c To join these DataFrames, pandas provides multiple functions like concat(), merge() , join(), etc. In this section, you will practice using merge() function of pandas. You can join DataFrames df_row (which you created by concatenating df1 and df2 along the row) and df3 on the common column (or key) id.

This is slower than Wen's solution, but I think conceptually simpler:

newdf = df2[df1.cat] newdf.columns = df1.mag

The first line says "the ith column of `newdf`

should be the column of `df2`

that corresponds to ith element of `df1.cat`

". The second line says "The columns of `newdf`

should be named according to `df1.mag`

"

**pandas.DataFrame.combine,** Function that takes two series as inputs and return a Series or a scalar. Used to merge the two dataframes column by columns. fill_valuescalar value, default None. pandas provides various facilities for easily combining together Series or DataFrame with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.

**Merge, join, and concatenate,** When gluing together multiple DataFrames (or Panels or), for example, you have a choice of how to handle the other axes (other than the one being Inner Merge / Inner join – The default Pandas behaviour, only keep rows where the merge “on” value exists in both the left and right dataframes. Left Merge / Left outer join – (aka left merge or left join) Keep every row in the left dataframe.

**Reshaping and pivot tables,** To reshape the data into this form, we use the DataFrame.pivot() method In [16]: stacked.unstack(1) Out[16]: second one two first bar A 0.721555 -1.039575 B 0 0 1 0 1 0 1 0 2 1 0 0 3 0 0 1 4 1 0 0 5 0 1 0 In [88]: df[['data1']].join(dummies) In Pandas data reshaping means the transformation of the structure of a table or vector (i.e. DataFrame or Series) to make it suitable for further analysis. Some of Pandas reshaping capabilities do not readily exist in other environments (e.g. SQL or bare bone R) and can be tricky for a beginner. In this post,

**Reshaping Pandas DataFrames,** A guide to DataFrame manipulation using groupby, melt, pivot tables, pivot, Dropping Rows of Data Using Pandas · Merge Sets of Data in Python this in action when applying basic math functions to multiple DataFrames Pandas have options for high-performance in-memory merging and joining. When we need to combine very large DataFrames, joins serve as a powerful way to perform these operations swiftly. Joins can only be done on two DataFrames at a time, denoted as left and right tables. The key is the common column that the two DataFrames will be joined on.