Pandas Sum DataFrame Columns of various types

I am trying to concatenate two columns of a Pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [2, 1, 3, 4], 'B': ['a', 'b', 'c', 'd']})

(Formatted):

   A  B
0  2  a
1  1  b
2  3  c
3  4  d

Trying sum([df[column] for column in df]) doesn't work, obviously, because you can't add integers (column A) to strings (column B).

So I added the lines:

for column in df:
    df[column] = df[column].apply(str)

And just to make sure the string conversions were working properly, I added the following statement:

print([df[column].apply(type) for column in df])

Which produces:

[0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
Name: A, dtype: object, 0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
Name: B, dtype: object]

But still, when I run sum([df[column] for column in df]) I get the error TypeError: unsupported operand type(s) for +: 'int' and 'str'.

What is going on?

IIUC, you can concatenate your columns like this:

df.astype(str).sum(axis=1)

0    2a
1    1b
2    3c
3    4d
dtype: object

This converts all columns to type str (df.astype(str)) and then uses sum to concatenate row-wise (axis=1).
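For comparison, here is a minimal, self-contained sketch of a closely related pattern. It swaps sum for agg with ''.join, which is an alternative and not the method used in the answer above. It assumes every column converts cleanly to str.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 3, 4], 'B': ['a', 'b', 'c', 'd']})

# Convert every column to str, then join the values of each row.
joined = df.astype(str).agg(''.join, axis=1)
print(joined.tolist())  # ['2a', '1b', '3c', '4d']

# The same pattern accepts a separator between the values.
dashed = df.astype(str).agg('-'.join, axis=1)
print(dashed.tolist())  # ['2-a', '1-b', '3-c', '4-d']
```

The join variant is handy when a separator is needed, which sum cannot provide.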

Use

In [99]: df.A.astype(str) + df.B
Out[99]:
0    2a
1    1b
2    3c
3    4d
dtype: object
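The + approach can be extended to more than two columns. The sketch below is one way to do that with functools.reduce; the extra column C is a hypothetical addition for illustration and is not part of the original question.

```python
from functools import reduce
import operator

import pandas as pd

# Column C is hypothetical, added only to show more than two columns.
df = pd.DataFrame({
    'A': [2, 1, 3, 4],
    'B': ['a', 'b', 'c', 'd'],
    'C': [10, 20, 30, 40],
})

# Convert each column to str, then chain them together with +.
pieces = (df[col].astype(str) for col in df)
combined = reduce(operator.add, pieces)
print(combined.tolist())  # ['2a10', '1b20', '3c30', '4d40']
```

Unlike the built-in sum, reduce has no implicit integer start value, so it never tries to add 0 to a string column.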

Alternatively, use apply, which can be slow.

In [106]: df.apply(lambda x: '{A}{B}'.format(**x), axis=1)
Out[106]:
0    2a
1    1b
2    3c
3    4d
dtype: object

@JonClements has a nice alternative with format_map

In [124]: df.apply('{A}{B}'.format_map, axis=1)
Out[124]:
0    2a
1    1b
2    3c
3    4d
dtype: object

For reference, DataFrame.sum() returns the sum of the values along the requested axis. With the index axis (axis=0, the default), it adds up all the values in each column and returns a Series containing one total per column.
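As a minimal illustration of how sum() behaves along each axis on numeric data, consider the sketch below. The frame scores and its columns x and y are made up for this example.

```python
import pandas as pd

# Hypothetical numeric frame, used only to show the two axes.
scores = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# axis=0 (the default): one total per column.
col_totals = scores.sum(axis=0)
print(col_totals.to_dict())  # {'x': 6, 'y': 60}

# axis=1: one total per row.
row_totals = scores.sum(axis=1)
print(row_totals.tolist())  # [11, 22, 33]

# Summing the per-column totals gives the total of the whole frame.
grand_total = scores.sum().sum()
print(grand_total)  # 66
```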

If you're interested in performance, use f-strings and a list comprehension.

pd.Series([f'{i}{j}' for i,j in zip(df.A, df.B)])

0    2a
1    1b
2    3c
3    4d
dtype: object

Because pandas handles strings inefficiently, this is a comparatively fast option.
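To check the performance claim on your own machine, a rough timing sketch like the one below may help. The function names and the frame size are arbitrary choices for this example, and the timings will vary with the pandas version and hardware, so no results are shown here.

```python
import timeit

import pandas as pd

# A larger frame built by repeating the question's data (size is arbitrary).
df = pd.DataFrame({'A': [2, 1, 3, 4] * 2500, 'B': ['a', 'b', 'c', 'd'] * 2500})


def with_astype():
    # Vectorized conversion followed by Series addition.
    return df['A'].astype(str) + df['B']


def with_fstrings():
    # List comprehension over the two columns.
    return pd.Series([f'{i}{j}' for i, j in zip(df['A'], df['B'])])


def with_values():
    # Same comprehension, iterating over the underlying arrays.
    return pd.Series([f'{i}{j}' for i, j in zip(df['A'].values, df['B'].values)])


for fn in (with_astype, with_fstrings, with_values):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds:.4f} s for 20 runs')
```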

Comments
  • By "Concatenate" I mean string concatenate the columns to generate a Series of strings
  • The expected output should be a Pandas Series with elements '2a', '1b', '3c', '4d'. I'm not worried about the column header.
  • This is quick and easy, ++1
  • I like this, but when I generalize your solution to s = sum([df[column].astype(str) for column in df]), why am I getting the unsupported type exception again?
  • @Zero you can be a little cheeky on lambda x: '{A}{B}'.format(**x) and make it '{A}{B}'.format_map there... (seems about 10% faster avoiding the lambda and dict-unpacking)
  • @JonClements -- oh my, wasn't aware of format_map, pretty cool, thanks.
  • Seriously, I've written that lambda before. lambda kw: '{A}{B}'.format(**kw)
  • Or even more time saved using df.A.values and df.B.values
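
On the question raised in the comments about sum([df[column].astype(str) for column in df]) still failing: a likely explanation is that the built-in sum starts from the integer 0, so its first step is 0 + (a Series of str). The sketch below shows one way around that by passing the first column as the start value. It is an illustration and not something proposed in the thread.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 3, 4], 'B': ['a', 'b', 'c', 'd']})
columns = [df[col].astype(str) for col in df]

# sum(columns) is evaluated as 0 + columns[0] + columns[1] + ...,
# so the first step adds the int 0 to a Series of str and fails.
# Seeding sum() with the first column skips that implicit 0.
result = sum(columns[1:], columns[0])
print(result.tolist())  # ['2a', '1b', '3c', '4d']
```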