pandas pivot and join in two dataframes

pandas pivot and join in two dataframes

pandas merge
pandas merge on multiple columns
python pandas concatenate multiple columns
pandas inner join
pandas join vs merge
pandas pivot table to dataframe
pandas merge list of dataframes
pandas combine rows

I have two dataFrames :

df1
   mag   cat
0  101   A1
1  256   A2  
2  760   A2
3  888   A3  
...

df2
   A1    A2    A3    ...
0  E50R  AZ33  REZ3 
1  T605  YYU6  YHG5
2  IR50  P0O9  BF53
3  NaN   YY9I  NaN

And I would like to create a final DataFrame which looks like :

df
   101   256   760   888  ...
0  E50R  AZ33  AZ33  REZ3
1  T605  YYU6  YYU6  YHG5
2  IR50  P0O9  P0O9  BF53
3  NaN   YY9I  YY9I  NaN

I tried something with pivot, but it doesn't seem to do the job Could you help me ?


You can use a combination of GroupBy, numpy.repeat, itertools.chain:

from itertools import chain

# map cat to list of mag
s = df1.groupby('cat')['mag'].apply(list)

# calculate indices for columns, including repeats
cols_idx = np.repeat(range(len(df2.columns)), s.map(len))

# apply indexing
res = df2.iloc[:, cols_idx]

# rename columns
res.columns = list(chain.from_iterable(df2.columns.map(s.get)))

print(res)

    101   256   760   888
0  E50R  AZ33  AZ33  REZ3
1  T605  YYU6  YYU6  YHG5
2  IR50  P0O9  P0O9  BF53
3   NaN  YY9I  YY9I   NaN

Performance benchmarking

Some good and different solutions here, so you may be interested in performance. Wen's reindex solution is the clear winner.

%timeit wen(df1, df2)   # 632 µs per loop
%timeit jpp(df1, df2)   # 2.55 ms per loop
%timeit scb(df1, df2)   # 7.98 ms per loop
%timeit abhi(df1, df2)  # 4.52 ms per loop

Code:

def jpp(df1, df2):
    s = df1.groupby('cat')['mag'].apply(list)
    cols_idx = np.repeat(range(len(df2.columns)), s.map(len))
    res = df2.iloc[:, cols_idx]
    res.columns = list(chain.from_iterable(df2.columns.map(s.get)))    
    return res

def scb(df1, df2):
    df_out = (df2.stack().reset_index()
                 .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])
    return df_out.pivot('level_0','mag',0).reset_index(drop=True)    

def abhi(df1, df2):
    return df2.T.merge(df1, left_index=True, right_on='cat').drop('cat', axis=1).set_index('mag').T

def wen(df1, df2):
    newdf=df2.reindex(columns=df1.cat)
    newdf.columns=df1.mag
    return newdf

Merge, join, and concatenate, Merge, join, and concatenate; Reshaping and pivot tables · Working with text data When gluing together multiple DataFrames, you have a choice of how to  pandas.DataFrame.join¶ DataFrame.join (self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False) → 'DataFrame' [source] ¶ Join columns of another DataFrame. Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list. Parameters


IIUC reindex +re-name

newdf=df2.reindex(columns=df1.cat)
newdf.columns=df1.mag
newdf
Out[519]: 
mag   101   256   760   888
0    E50R  AZ33  AZ33  REZ3
1    T605  YYU6  YYU6  YHG5
2    IR50  P0O9  P0O9  BF53
3     NaN  YY9I  YY9I   NaN

pandas.DataFrame.join, Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list. Parameters. Considering two similar pandas pivot tables, how to join these two tables on their indexes e.g. country.For example: df1.pivot_table(index='country', columns='year


Another way you can do it using stack, merge, and pivot:

df_out = (df2.stack().reset_index()
             .merge(df1, left_on='level_1', right_on='cat')[['level_0','mag',0]])

df_out.pivot('level_0','mag',0).reset_index(drop=True)

Output:

mag   101   256   760   888
0    E50R  AZ33  AZ33  REZ3
1    T605  YYU6  YYU6  YHG5
2    IR50  P0O9  P0O9  BF53
3     NaN  YY9I  YY9I   NaN

pandas.merge, Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection  Pandas.join() : Combining Data on a Column or Index While merge() is a module function,.join() is an object function that lives on your DataFrame. This enables you to specify only one DataFrame, which will join the DataFrame you call.join() on.


You can do this by transpose and merge and then set column 'mag' as index. Then again transposing it.

df2_transposed = df2.T
res = df2_transposed.merge(df1,how = "left",left_index=True,right_on='cat')
del res['cat']
res.set_index('mag', inplace=True)
res.T

mag 101      256    760     888
0   E50R    AZ33    AZ33    REZ3
1   T605    YYU6    YYU6    YHG5
2   IR50    P0O9    P0O9    BF53
3           YY9I    YY9I    

Reshaping and Pivot Tables, For the curious here is how the above DataFrame was created: B -0.706771 two A -1.039575 B 0.271860 baz one A -0.424972 B 0.567020 two A 0 3 0 0 1 4 1 0 0 5 0 1 0 In [73]: df[['data1']].join(dummies) Out[73]: data1 key_a key_b key_c  To join these DataFrames, pandas provides multiple functions like concat(), merge() , join(), etc. In this section, you will practice using merge() function of pandas. You can join DataFrames df_row (which you created by concatenating df1 and df2 along the row) and df3 on the common column (or key) id.


This is slower than Wen's solution, but I think conceptually simpler:

newdf = df2[df1.cat]
newdf.columns = df1.mag

The first line says "the ith column of newdf should be the column of df2 that corresponds to ith element of df1.cat". The second line says "The columns of newdf should be named according to df1.mag"

pandas.DataFrame.combine, Function that takes two series as inputs and return a Series or a scalar. Used to merge the two dataframes column by columns. fill_valuescalar value, default None. pandas provides various facilities for easily combining together Series or DataFrame with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.


Merge, join, and concatenate, When gluing together multiple DataFrames (or Panels or), for example, you have a choice of how to handle the other axes (other than the one being  Inner Merge / Inner join – The default Pandas behaviour, only keep rows where the merge “on” value exists in both the left and right dataframes. Left Merge / Left outer join – (aka left merge or left join) Keep every row in the left dataframe.


Reshaping and pivot tables, To reshape the data into this form, we use the DataFrame.pivot() method In [16​]: stacked.unstack(1) Out[16]: second one two first bar A 0.721555 -1.039575 B 0 0 1 0 1 0 1 0 2 1 0 0 3 0 0 1 4 1 0 0 5 0 1 0 In [88]: df[['data1']].join(dummies)  In Pandas data reshaping means the transformation of the structure of a table or vector (i.e. DataFrame or Series) to make it suitable for further analysis. Some of Pandas reshaping capabilities do not readily exist in other environments (e.g. SQL or bare bone R) and can be tricky for a beginner. In this post,


Reshaping Pandas DataFrames, A guide to DataFrame manipulation using groupby, melt, pivot tables, pivot, Dropping Rows of Data Using Pandas · Merge Sets of Data in Python this in action when applying basic math functions to multiple DataFrames  Pandas have options for high-performance in-memory merging and joining. When we need to combine very large DataFrames, joins serve as a powerful way to perform these operations swiftly. Joins can only be done on two DataFrames at a time, denoted as left and right tables. The key is the common column that the two DataFrames will be joined on.