Concatenate all columns in a pandas dataframe

I have multiple pandas DataFrames that may have different numbers of columns, typically between 50 and 100. I need to create a final column that is simply all of the columns concatenated: the string in the first row of the new column should be the concatenation of the strings in the first row of every other column, and likewise for each row. I wrote the loop below, but I feel there must be a more efficient way to do this. Any ideas?

num_columns = df.columns.shape[0]
col_names = df.columns.values.tolist()
df.loc[:, 'merged'] = ""
for each_col_ind in range(num_columns):
    print('Concatenating', col_names[each_col_ind])
    df.loc[:, 'merged'] = df.loc[:, 'merged'] + df[col_names[each_col_ind]]

One solution uses sum, but the output is a float, so it has to be converted to int and then to str (this only works when the values are strings of digits):

df['new'] = df.sum(axis=1).astype(int).astype(str)

Another solution uses apply with join, but it is the slowest:

df['new'] = df.apply(''.join, axis=1)

The last solution is very fast and uses NumPy: convert to a NumPy array and then sum along the rows:

df['new'] = df.values.sum(axis=1)

Timings:

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9']})
#[30000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
#print (df)

cols = list('ABC')

#not_a_robot solution
In [259]: %timeit df['concat'] = pd.Series(df[cols].fillna('').values.tolist()).str.join('')
100 loops, best of 3: 17.4 ms per loop

In [260]: %timeit df['new'] = df[cols].astype(str).apply(''.join, axis=1)
1 loop, best of 3: 386 ms per loop

In [261]: %timeit df['new1'] = df[cols].values.sum(axis=1)
100 loops, best of 3: 6.5 ms per loop

In [262]: %timeit df['new2'] = df[cols].astype(str).sum(axis=1).astype(int).astype(str)
10 loops, best of 3: 68.6 ms per loop

EDIT: If the dtypes of some columns are not object (i.e. they do not hold strings), cast them with DataFrame.astype first:

df['new'] = df.astype(str).values.sum(axis=1)
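
As a minimal sketch of that cast, assuming a small illustrative frame (not from the original post) with one string, one integer and one float column:

```python
import pandas as pd

# Illustrative data (an assumption): mixed dtypes across columns.
df = pd.DataFrame({'A': ['1', '2', '3'], 'B': [4, 5, 6], 'C': [7.5, 8.5, 9.5]})

# Cast every cell to text, then sum the object array row-wise,
# which concatenates the strings in each row.
df['new'] = df.astype(str).values.sum(axis=1)

print(df['new'].tolist())  # ['147.5', '258.5', '369.5']
```

Without the astype(str) call, adding a string to a number in the same row would raise a TypeError.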

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9']})

df['concat'] = pd.Series(df.fillna('').values.tolist()).str.join('')

Gives us:

df
Out[6]: 
   A  B  C concat
0  1  4  7    147
1  2  5  8    258
2  3  6  9    369

To select a given set of columns:

df['concat'] = pd.Series(df[['A', 'B']].fillna('').values.tolist()).str.join('')

df
Out[8]: 
   A  B  C concat
0  1  4  7     14
1  2  5  8     25
2  3  6  9     36

However, I've noticed that approach can sometimes result in NaNs being populated where they shouldn't, so here's another way:

>>> from functools import reduce
>>> cols = ['A', 'B', 'C']  # columns to concatenate
>>> df['concat'] = df[cols].apply(lambda x: reduce(lambda a, b: a + b, x), axis=1)
>>> df
   A  B  C concat
0  1  4  7    147
1  2  5  8    258
2  3  6  9    369

Although it should be noted that this approach is a lot slower:

$ python3 -m timeit 'import pandas as pd;from functools import reduce; df=pd.DataFrame({"a": ["this", "is", "a", "string"] * 5000, "b": ["this", "is", "a", "string"] * 5000});[df[["a", "b"]].apply(lambda x: reduce(lambda a, b: a + b, x)) for _ in range(10)]'
10 loops, best of 3: 451 msec per loop

Versus

$ python3 -m timeit 'import pandas as pd;from functools import reduce; df=pd.DataFrame({"a": ["this", "is", "a", "string"] * 5000, "b": ["this", "is", "a", "string"] * 5000});[pd.Series(df[["a", "b"]].fillna("").values.tolist()).str.join(" ") for _ in range(10)]'
10 loops, best of 3: 98.5 msec per loop

I don't have enough reputation to comment, so I'm building my answer off of blacksite's response.

For clarity, LunchBox commented that it failed for Python 3.7.0. It also failed for me on Python 3.6.3. Here is the original answer by blacksite:

df['concat'] = pd.Series(df.fillna('').values.tolist()).str.join('')

Here is my modification for Python 3.6.3:

df['concat'] = pd.Series(df.fillna('').values.tolist()).map(lambda x: ''.join(map(str,x)))
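
One possible explanation for the failure is non-string values rather than the Python version. This is an assumption, since the comments do not say what data was used. The pandas documentation for Series.str.join notes that if any list item is not a string, the result for that row is NaN. The sketch below uses an illustrative frame with an integer column to show how casting each item with str avoids that:

```python
import pandas as pd

# Illustrative data (an assumption): column B holds integers, not strings.
df = pd.DataFrame({'A': ['1', '2'], 'B': [4, 5]})

# One list of cell values per row.
rows = pd.Series(df.fillna('').values.tolist(), index=df.index)

# str.join is documented to yield NaN for lists containing non-strings.
joined_raw = rows.str.join('')

# Casting each item to str before joining handles mixed types.
joined_cast = rows.map(lambda x: ''.join(map(str, x)))

print(joined_cast.tolist())  # ['14', '25']
```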

The solutions given above that use numpy arrays have worked great for me.

However, one thing to be careful about is the indexing when you get the numpy.ndarray from df.values, since the axis labels are removed from df.values.

So to take one of the solutions offered above (the one that I use most often) as an example:

df['concat'] = pd.Series(df.fillna('').values.tolist()).str.join('')

This portion:

df.fillna('').values

does not preserve the index of the original DataFrame. That is not a problem when the DataFrame has the default 0, 1, 2, ... row index, but the solution will not work when the DataFrame is indexed in any other way, because assigning the new Series back to the DataFrame aligns on index labels. You can fix this by adding an index= argument to pd.Series():

df['concat'] = pd.Series(df.fillna('').values.tolist(), 
                         index=df.index).str.join('')

I always add the index= argument just to be safe, even when I'm sure the DataFrame is row-indexed as 0, 1, 2, ...
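
A small sketch of the difference, assuming an illustrative frame whose row labels are 10, 20, 30 rather than 0, 1, 2:

```python
import pandas as pd

# Illustrative data (an assumption): a non-default row index.
df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6']},
                  index=[10, 20, 30])
cols = ['A', 'B']

# Without index=, the Series is labelled 0, 1, 2, so nothing lines up
# with the labels 10, 20, 30 and the new column is filled with NaN.
df['bad'] = pd.Series(df[cols].fillna('').values.tolist()).str.join('')

# With index=df.index, the labels match and the values land in the right rows.
df['good'] = pd.Series(df[cols].fillna('').values.tolist(),
                       index=df.index).str.join('')

print(df['bad'].isna().all())  # True
print(df['good'].tolist())     # ['14', '25', '36']
```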

Comments
  • Is there a way to concatenate only the last 20 columns, and to include a particular column only if it has data? Also, I would want a delimiter in place so that when you look at the combined column you can see how it's broken out.
  • Maybe it's because I'm using a more recent version of Python, but I copied exactly what you had and it did not work. DataFrame and all. I'm using version 3.7.0.
  • Thanks, @bodily. I ended up in the same scenario and your answer helped me.
  • How can NaN values be avoided here?
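
For the questions above about joining only the last few columns, skipping missing values, and using a delimiter, one possible sketch follows. The frame, the '|' delimiter and the last_n value are illustrative assumptions, not part of the original answers:

```python
import pandas as pd

# Illustrative data (an assumption): some cells are missing.
df = pd.DataFrame({'A': ['a', 'b', None],
                   'B': ['c', None, None],
                   'C': ['d', 'e', 'f']})

last_n = 2  # hypothetical: number of trailing columns to join

# Select the trailing columns by position before adding the new column.
subset = df.iloc[:, -last_n:]

# Drop missing cells in each row, then join what is left with a delimiter.
df['joined'] = subset.apply(lambda row: '|'.join(row.dropna().astype(str)), axis=1)

print(df['joined'].tolist())  # ['c|d', 'e', 'f']
```

This uses a row-wise apply, so on large frames it will likely be slower than the NumPy-based approaches timed above.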