Aggregating column values of dataframe to a new dataframe

I have a dataframe that contains the Vendor, Product and Price of various listings on a market, among other columns.

I need a new dataframe with the unique vendors, the number of products per vendor, the sum of their listing prices, the average price per product, and (average price * number of sales) as separate columns.

Something like this:

What's the best way to make this new dataframe?

Thanks!

You can do this by using pandas pivot_table. Here is an example based on your data.

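>>> import pandas as pd
>>> import numpy as np
>>> # d is the input DataFrame from the question (Vendor, Product, Price, Sales columns).
>>> # Sales is constant per vendor, so also grouping on it still gives one row per vendor;
>>> # np.ma.count counts the (unmasked) entries in each group, i.e. the number of product rows.
>>> f = pd.pivot_table(d, index=['Vendor', 'Sales'], values=['Price', 'Product'],
...                    aggfunc={'Price': 'sum', 'Product': np.ma.count}).reset_index()
>>> f['Avg Price/Product'] = f['Price'] / f['Product']
>>> f['H Factor'] = f['Sales'] * f['Avg Price/Product']
>>> f.drop('Sales', axis=1)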

  Vendor  Price  Product  Avg Price/Product  H Factor
0      A    121        4              30.25    6050.0
1      B     12        1              12.00    1440.0
2      C     47        2              23.50     587.5
3      H     45        1              45.00    9000.0
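To try the pivot_table approach end to end, here is a self-contained sketch. The input rows are hypothetical: the column names follow the answer's code, and the individual prices are made up so that the per-vendor totals reproduce the output table above. It uses pandas' string aggregations 'sum' and 'count' instead of NumPy callables; 'count' skips missing values, which makes no difference for this data.

```python
import pandas as pd

# Hypothetical input rows: made-up prices chosen so the aggregates match the table above.
d = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'H'],
    'Product': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8'],
    'Price': [10, 20, 40, 51, 12, 20, 27, 45],
    'Sales': [200, 200, 200, 200, 120, 25, 25, 200],  # constant per vendor
})

# Sales is constant within a vendor, so keeping it in the index still yields one row per vendor
f = pd.pivot_table(d, index=['Vendor', 'Sales'], values=['Price', 'Product'],
                   aggfunc={'Price': 'sum', 'Product': 'count'}).reset_index()
f['Avg Price/Product'] = f['Price'] / f['Product']
f['H Factor'] = f['Sales'] * f['Avg Price/Product']
f = f.drop(columns='Sales')
print(f)
```

Putting Sales in the index is what carries it through the pivot; if Sales ever varied within a vendor, that vendor would be split across several rows.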

First multiply the Number of Sales column by Price, then use DataFrameGroupBy.agg with a dictionary mapping column names to aggregate functions, and finally flatten the MultiIndex in the columns with map and rename them:

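# Multiply in place: after the groupby, the per-vendor mean of Number of Sales * Price
# equals Number of Sales * mean price, because Number of Sales is constant per vendor
df['Number of Sales'] *= df['Price']

d1 = {'Product': 'size', 'Price': ['sum', 'mean'], 'Number of Sales': 'mean'}
df = df.groupby('Vendor').agg(d1)
# flatten the MultiIndex columns, e.g. ('Price', 'sum') -> 'Price_sum'
df.columns = df.columns.map('_'.join)
d = {'Product_size': 'No. of Product',
     'Price_sum': 'Sum of Prices',
     'Price_mean': 'Mean of Prices',
     'Number of Sales_mean': 'H Factor'}
df = df.rename(columns=d).reset_index()
print(df)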
  Vendor  No. of Product  Sum of Prices  Mean of Prices  H Factor
0      A               4            121           30.25    6050.0
1      B               1             12           12.00    1440.0
2      C               2             47           23.50     587.5
3      H               1             45           45.00    9000.0
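On pandas 0.25 or later, named aggregation can produce the final column names directly, which avoids flattening the MultiIndex and renaming afterwards. This is a swapped-in alternative, not the answer's own method. The sample rows are hypothetical (made-up prices chosen to reproduce the table above), and 'Sales x Price' is an illustrative helper column name; unlike the answer, this leaves the original Number of Sales column unchanged.

```python
import pandas as pd

# Hypothetical input rows: made-up prices chosen so the aggregates match the table above.
df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'H'],
    'Product': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8'],
    'Price': [10, 20, 40, 51, 12, 20, 27, 45],
    'Number of Sales': [200, 200, 200, 200, 120, 25, 25, 200],  # constant per vendor
})

# Hypothetical helper column; its per-vendor mean equals Number of Sales * mean price
# because Number of Sales is constant within each vendor.
df['Sales x Price'] = df['Number of Sales'] * df['Price']

# Named aggregation: output column name -> (input column, aggregation)
out = (
    df.groupby('Vendor')
      .agg(**{
          'No. of Product': ('Product', 'size'),
          'Sum of Prices': ('Price', 'sum'),
          'Mean of Prices': ('Price', 'mean'),
          'H Factor': ('Sales x Price', 'mean'),
      })
      .reset_index()
)
print(out)
```

The dictionary is unpacked with ** because the desired names contain spaces and dots, which cannot be written as plain keyword arguments.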

You can do it using groupby(), like this:

df.groupby('Vendor').agg({'Product': 'count', 'Price': ['sum', 'mean']})

That's just three columns, but you can work out the rest.
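One way to work out the rest from this groupby, as a sketch under assumptions: take Number of Sales with 'first' (it is constant per vendor, as noted in the comments) and compute H Factor from the aggregated result. The column names Product, Price and Number of Sales and the sample rows are hypothetical, with prices made up to match the tables in the answers above.

```python
import pandas as pd

# Hypothetical input rows: made-up prices chosen so the aggregates match the answers' tables.
df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'C', 'C', 'H'],
    'Product': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8'],
    'Price': [10, 20, 40, 51, 12, 20, 27, 45],
    'Number of Sales': [200, 200, 200, 200, 120, 25, 25, 200],
})

res = df.groupby('Vendor').agg({
    'Product': 'count',
    'Price': ['sum', 'mean'],
    'Number of Sales': 'first',  # constant per vendor, so any row's value works
})
# flatten the MultiIndex columns, e.g. ('Price', 'mean') -> 'Price_mean'
res.columns = res.columns.map('_'.join)
# H Factor = average price per product * number of sales
res['H Factor'] = res['Price_mean'] * res['Number of Sales_first']
res = res.reset_index()
print(res)
```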

Comments
  • Can you please explain np.ma.count @Rahul? Any ideas how I can get the column for H Factor?
  • @harry04 what is the formula for H factor?
  • HF = Avg price/product * Number of Sales (for a vendor) from above table.
  • @harry04 Updated the answer, values of HF differ from the ones you provided in question, hope I've got it right.
  • I don't think 'Sales' should be np.sum @Rahul. It's a constant for each vendor from the original dataframe.
  • H Factor = Mean of Prices * No. of sales (original dataframe). How can I do that?
  • @harry04 - It is df['H Factor'] *= df['Mean of Prices']
  • Your calculation and result for H Factor are different from mine (see my results table above)...
  • @harry04 - There is a problem: the last value of the last row is 200, not 55; with that you get the expected output.
  • @harry04 - Thanks, glad to help. Don't forget to accept the answer, if it suits you! :)
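On the np.ma.count question in the comments: np.ma.count counts the non-masked elements of an array, which for ordinary (unmasked) data is every row in the group. In pandas the string aggregations 'count' (non-null values) and 'size' (all rows, NaN included) are the more common choices; they only differ when the column has missing values. A small sketch with hypothetical data where vendor A has one listing without a product name:

```python
import numpy as np
import pandas as pd

# Hypothetical data: vendor A has one listing with a missing product name
df = pd.DataFrame({
    'Vendor': ['A', 'A', 'B'],
    'Product': ['p1', np.nan, 'p3'],
})

g = df.groupby('Vendor')['Product']
counts = pd.DataFrame({
    'count': g.count(),  # non-null values per vendor
    'size': g.size(),    # all rows per vendor, NaN included
})
print(counts)
```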