Pandas expand rows from list data available in column

I have a data frame like this in pandas:

 column1      column2
 [a,b,c]        1
 [d,e,f]        2
 [g,h,i]        3
Expected output:
column1      column2
  a              1
  b              1
  c              1
  d              2
  e              2
  f              2
  g              3
  h              3
  i              3

How can I transform this data to get the expected output?


You can build a new DataFrame with the constructor and then use stack:

# The outer parentheses let the method chain span several lines
df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='column1')[['column1','column2']])
print (df2)

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
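
To see what each step of the chain contributes, the same pipeline can be split into named intermediates. This is only a sketch on the question's sample data; the names wide, stacked and long_form are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

# One column per list position (0, 1, 2), indexed by the column2 values
wide = pd.DataFrame(df['column1'].tolist(), index=df['column2'])

# stack() pivots the positional columns into a second index level,
# giving a Series indexed by (column2, position)
stacked = wide.stack()

# Drop the position level, then turn column2 back into a regular column
long_form = (stacked
             .reset_index(level=1, drop=True)
             .reset_index(name='column1')[['column1', 'column2']])

print(long_form)
```

Note that this relies on the column2 values being usable as an index; they do not have to be unique.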

Because the final selection [['column1','column2']] both reorders the columns and discards the extra level column, you can also omit the first reset_index:

df2 = (pd.DataFrame(df.column1.tolist(), index=df.column2)
         .stack()
         .reset_index(name='column1')[['column1','column2']])
print (df2)
  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3

Another solution is to use DataFrame.from_records to create a DataFrame from the first column, convert it to a Series with stack, and join it back to the original DataFrame:

df = pd.DataFrame({'column1': [['a','b','c'],['d','e','f'],['g','h','i']],
                   'column2':[1,2,3]})


# The outer parentheses let the method chain span several lines
a = (pd.DataFrame.from_records(df.column1.tolist())
       .stack()
       .reset_index(level=1, drop=True)
       .rename('column1'))

print (a)
0    a
0    b
0    c
1    d
1    e
1    f
2    g
2    h
2    i
Name: column1, dtype: object

print (df.drop('column1', axis=1)
         .join(a)
         .reset_index(drop=True)[['column1','column2']])

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
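
The join step works because the Series a keeps the original row labels (0, 0, 0, 1, ...), so each remaining row of df is repeated once per matching label. A minimal sketch wrapping the idea in a function, assuming every cell in the column is a list of the same length. The helper name explode_via_join is hypothetical, not a pandas API, and it uses the plain DataFrame constructor with index=df.index in place of from_records so that non-default indexes still align.

```python
import pandas as pd

def explode_via_join(df, column):
    """Hypothetical helper: one output row per element of df[column]."""
    # Build the long Series; index=df.index keeps the original row labels
    long_values = (pd.DataFrame(df[column].tolist(), index=df.index)
                   .stack()
                   .reset_index(level=1, drop=True)
                   .rename(column))
    # join aligns on the index, so each remaining row repeats per list element
    return (df.drop(columns=column)
              .join(long_values)
              .reset_index(drop=True)[list(df.columns)])

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
})

result = explode_via_join(df, 'column1')
print(result)
```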


2019 updated answer

Since pandas 0.25.0 there is the explode method for this, which expands each list into one row per element and repeats the values of the other columns:

df.explode('column1').reset_index(drop=True)

Output

  column1  column2
0       a        1
1       b        1
2       c        1
3       d        2
4       e        2
5       f        2
6       g        3
7       h        3
8       i        3
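
A short sketch of how explode treats lists of different lengths, including an empty one. The sample frame is made up for illustration and differs from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d'], []],
    'column2': [1, 2, 3],
})

exploded = df.explode('column1')

# Each output row keeps the label of the row it came from
print(exploded.index.tolist())  # [0, 0, 0, 1, 2]

# The empty list becomes a single row holding NaN in column1
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```

In pandas 1.1.0 and later, explode also accepts ignore_index=True, which has the same effect as the reset_index(drop=True) call.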

From the pandas.DataFrame.explode documentation: the result dtype of the subset rows will be object. Scalars will be returned unchanged. Empty list-likes will result in a np.nan for that row.


Another solution is to use the result_type='expand' argument of DataFrame.apply, available since pandas 0.23, together with pd.melt. In answer to @splinter's question, this method can also be generalized to frames with more columns (see below):

import pandas as pd
from numpy import arange

df = pd.DataFrame(
    {'column1' : [['a','b','c'],['d','e','f'],['g','h','i']],
    'column2': [1,2,3]}
)

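# Expand each list into positional columns 0, 1, 2, ...
expanded = df.apply(lambda row: row['column1'], axis=1, result_type='expand')

# Melt the positional columns back into rows, keeping column2 as the identifier
(pd.melt(
    df.drop(columns='column1').join(expanded),
    id_vars=['column2'], value_vars=arange(expanded.shape[1]), value_name='column1')
 .sort_values(['column2', 'variable'])
 .reset_index(drop=True)[['column1','column2']])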

# can be generalized 

df = pd.DataFrame(
    {'column1' : [['a','b','c'],['d','e','f'],['g','h','i']],
    'column2': [1,2,3],
    'column3': [[1,2],[2,3],[3,4]],
    'column4': [42,23,321],
    'column5': ['a','b','c']}
)

expanded = df.apply(lambda row: row['column1'], axis=1, result_type='expand')

# Drop the list column before melting so value_name='column1' does not clash with it
(pd.melt(
    df.drop(columns='column1').join(expanded),
    id_vars=list(df.columns[1:]),
    value_vars=arange(expanded.shape[1]),
    value_name='column1')
 .drop(columns=['variable'])[list(df.columns)]
 .sort_values(by=['column1']))
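
As a cross-check for the generalized case, the sketch below runs DataFrame.explode (pandas 0.25.0 or later) on the same five-column frame. It is offered as a comparison point rather than as part of the melt approach: only column1 is exploded, and the other columns, including the list-valued column3, are repeated unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']],
    'column2': [1, 2, 3],
    'column3': [[1, 2], [2, 3], [3, 4]],
    'column4': [42, 23, 321],
    'column5': ['a', 'b', 'c'],
})

# One row per element of column1; every other column is carried along as-is
result = df.explode('column1').reset_index(drop=True)
print(result)
```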

UPDATE (for Jwely's comment): if you have lists with varying length, you can do:

df = pd.DataFrame(
    {'column1' : [['a','b','c'],['d','f'],['g','h','i']],
    'column2': [1,2,3]}
)

longest = max(len(x) for x in df['column1'])

# Pad shorter lists with None so every row expands to the same number of columns
expanded = df.apply(
    lambda row: row['column1'] + [None] * (longest - len(row['column1'])),
    axis=1, result_type='expand')

(pd.melt(
    df.drop(columns='column1').join(expanded),
    id_vars=['column2'], value_vars=arange(longest), value_name='column1')
 .dropna(subset=['column1'])
 .sort_values(['column2', 'variable'])
 .reset_index(drop=True)[['column1','column2']])
