Generating a weighted average value with data from pandas and a dictionary?

I have a dataframe:

             SALES 
Date                       
2018-03-31  123090     
2018-04-30  116591      
2018-05-31  119581      
2018-06-30  117544      
2018-07-31  129574      
2018-08-31  118876      
2018-09-30  129467      
2018-10-31  126062     
2018-11-30  128552     
2018-12-31  104994     
2019-01-31  149188      
2019-02-28  118204      

And a dictionary, price:

{Oct: 11, Nov: 23, Dec: 34, Jan: 20, Feb: 30, Mar: 31, Apr: 22, May: 
23, Jun: 34, Jul: 20, Aug: 30, Sep: 31}

I want to calculate a weighted average price by multiplying each sales figure in the DataFrame by the price for the corresponding month in the dictionary, and then dividing by the total sales. For example, take the October sales of 126062 from the dataframe and multiply it by 11 (Oct) from the dictionary.
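
Written out, with s_m for the sales in month m and p_m for that month's price from the dictionary (both symbols are introduced here only for illustration), the quantity described above would be:

```latex
\bar{p} = \frac{\sum_{m} s_m \, p_m}{\sum_{m} s_m}
```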

I have tried adding a month column, re-ordering the dataframe and then using an ordered dictionary, but I feel like I am using the proverbial sledgehammer for this problem.

             SUM  MONTH
Date                       
2019-01-31  129188.1      1
2019-02-28  118304.5      2
2018-03-31  123090.6      3
2018-04-30  116591.2      4
2018-05-31  119581.5      5
2018-06-30  117544.0      6
2018-07-31  129574.9      7
2018-08-31  118876.2      8
2018-09-30  109467.5      9
2018-10-31  126062.0     10
2018-11-30  128552.9     11
2018-12-31  104994.2     12

I have also tried to look at zip and iterating over both the dataframe and dictionary but I'm struggling to find the best way to map the two datasets together.

I am happy to convert the dictionary to another dataframe if that makes it easier?

Any help would be appreciated.

You can use map with a DatetimeIndex method strftime:

Where df (the dataframe) and dd (the dictionary of weights) are defined as:

import pandas as pd

d = {'SALES': {pd.Timestamp('2018-03-31 00:00:00'): 123090,
  pd.Timestamp('2018-04-30 00:00:00'): 116591,
  pd.Timestamp('2018-05-31 00:00:00'): 119581,
  pd.Timestamp('2018-06-30 00:00:00'): 117544,
  pd.Timestamp('2018-07-31 00:00:00'): 129574,
  pd.Timestamp('2018-08-31 00:00:00'): 118876,
  pd.Timestamp('2018-09-30 00:00:00'): 129467,
  pd.Timestamp('2018-10-31 00:00:00'): 126062,
  pd.Timestamp('2018-11-30 00:00:00'): 128552,
  pd.Timestamp('2018-12-31 00:00:00'): 104994,
  pd.Timestamp('2019-01-31 00:00:00'): 149188,
  pd.Timestamp('2019-02-28 00:00:00'): 118204}}

df = pd.DataFrame(d)

dd = {'Oct': 11, 'Nov': 23, 'Dec': 34, 'Jan': 20, 'Feb': 30, 'Mar': 31, 'Apr': 22,'May': 
23, 'Jun': 34, 'Jul': 20, 'Aug': 30,'Sep': 31}

Use

df['Adj Sales'] = df.index.strftime('%b').map(dd) * df['SALES']

Output:

             SALES  Adj Sales
2018-03-31  123090    3815790
2018-04-30  116591    2565002
2018-05-31  119581    2750363
2018-06-30  117544    3996496
2018-07-31  129574    2591480
2018-08-31  118876    3566280
2018-09-30  129467    4013477
2018-10-31  126062    1386682
2018-11-30  128552    2956696
2018-12-31  104994    3569796
2019-01-31  149188    2983760
2019-02-28  118204    3546120
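
The Adj Sales column is only the numerator of the weighted average; the sum still has to be divided by total sales. A minimal sketch of that remaining step, assuming the same SALES figures and dictionary as above and an English locale so that '%b' yields 'Mar', 'Apr', and so on. The column name price and the variable weighted_avg are introduced here for illustration:

```python
import pandas as pd

# Same month-end dates and sales figures as in the question
dates = pd.to_datetime([
    "2018-03-31", "2018-04-30", "2018-05-31", "2018-06-30",
    "2018-07-31", "2018-08-31", "2018-09-30", "2018-10-31",
    "2018-11-30", "2018-12-31", "2019-01-31", "2019-02-28",
])
df = pd.DataFrame(
    {"SALES": [123090, 116591, 119581, 117544, 129574, 118876,
               129467, 126062, 128552, 104994, 149188, 118204]},
    index=dates,
)

dd = {"Oct": 11, "Nov": 23, "Dec": 34, "Jan": 20, "Feb": 30, "Mar": 31,
      "Apr": 22, "May": 23, "Jun": 34, "Jul": 20, "Aug": 30, "Sep": 31}

# Look up each row's price from its month abbreviation (assumes an English locale)
df["price"] = df.index.strftime("%b").map(dd)

# Weighted average price = sum(price * sales) / sum(sales)
weighted_avg = (df["price"] * df["SALES"]).sum() / df["SALES"].sum()
print(weighted_avg)  # about 25.47
```

Any month missing from the dictionary would map to NaN and be skipped by sum() in the numerator only, so it may be worth checking df["price"].isna().any() first.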

Try this to get the weights column:

my_dict = {'Oct': 11, 'Nov': 23, 'Dec': 34, 
           'Jan': 20, 'Feb': 30, 'Mar': 31, 
           'Apr': 22, 'May': 23, 'Jun': 34, 
           'Jul': 20, 'Aug': 30, 'Sep': 31}
weights = pd.Series(my_dict)

df.Date = pd.to_datetime(df.Date)
df.set_index(df.Date.dt.strftime("%b"),
             inplace=True)

df['Weights'] = weights

df.reset_index(drop=True, inplace=True)

then df is:

    Date        SALES   Weights
0   2018-03-31  123090  31
1   2018-04-30  116591  22
2   2018-05-31  119581  23
3   2018-06-30  117544  34
4   2018-07-31  129574  20
5   2018-08-31  118876  30
6   2018-09-30  129467  31
7   2018-10-31  126062  11
8   2018-11-30  128552  23
9   2018-12-31  104994  34
10  2019-01-31  149188  20
11  2019-02-28  118204  30

I would do it like this: First create the 'weight' column:

df['weight'] = [month[ind_month] for ind_month in df.index.month_name().str[:3].values]

Out[48]:
            Sales  weight
2018-03-31    100      31
2018-04-30    101      22
2018-05-31    102      23
2018-06-30    103      34
2018-07-31    104      20
2018-08-31    105      30
2018-09-30    106      31
2018-10-31    107      11
2018-11-30    108      23
2018-12-31    109      34
2019-01-31    110      20
2019-02-28    111      30
2019-03-31    112      31
2019-04-30    113      22

where:

 month = {'Oct': 11, 'Nov': 23, 'Dec': 34, 'Jan': 20, 'Feb': 30, 'Mar': 31, 'Apr': 22, 'May': 23,
          'Jun': 34, 'Jul': 20, 'Aug': 30, 'Sep': 31}

and then multiply the columns:

df['weighted_Sales'] = df.weight * df.Sales

which produces:

    Out[50]:
             Sales  weight  weighted_Sales
2018-03-31    100      31            3100
2018-04-30    101      22            2222
2018-05-31    102      23            2346
2018-06-30    103      34            3502
2018-07-31    104      20            2080
2018-08-31    105      30            3150
2018-09-30    106      31            3286
2018-10-31    107      11            1177
2018-11-30    108      23            2484
2018-12-31    109      34            3706
2019-01-31    110      20            2200
2019-02-28    111      30            3330
2019-03-31    112      31            3472
2019-04-30    113      22            2486

Step 1. Create a price dataframe out of dictionary

dict_p = {"Oct": 11, "Nov": 23, "Dec": 34, "Jan": 20, "Feb": 30, "Mar": 31, "Apr": 22, "May": 23, "Jun": 34, "Jul": 20, "Aug": 30, "Sep": 31}
dict_m = {"Oct": 10, "Nov": 11, "Dec": 12, "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6, "Jul": 7, "Aug": 8, "Sep": 9}

import pandas as pd

price = pd.DataFrame.from_dict(dict_p, orient = "index", columns = ["price"])
month = pd.DataFrame.from_dict(dict_m, orient = "index", columns = ["month"])

df_price = pd.concat([price, month],axis = 1)
print(df_price)

Produces:

     price  month
Oct     11     10
Nov     23     11
Dec     34     12
Jan     20      1
Feb     30      2
Mar     31      3
Apr     22      4
May     23      5
Jun     34      6
Jul     20      7
Aug     30      8
Sep     31      9

Step 2. Merge price and sales data

df_sales = pd.DataFrame(d)
df_sales["month"] = df_sales.index.month

df = df_sales.merge(df_price)
print(df)

Produces:

     SALES  month  price
0   123090      3     31
1   116591      4     22
2   119581      5     23
3   117544      6     34
4   129574      7     20
5   118876      8     30
6   129467      9     31
7   126062     10     11
8   128552     11     23
9   104994     12     34
10  149188      1     20
11  118204      2     30

Step 3. Calculate weights and compute weighted average price

df["weight"] = df.SALES/df.SALES.sum()
price_weighted_ave = sum(df.price*df.weight)
print(price_weighted_ave)

Produces:

25.471658332900283
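
As a cross-check, numpy's average function takes a weights argument and computes the same ratio in one call. A sketch assuming the price and SALES columns printed in Step 2, retyped here as plain lists so the snippet runs on its own (the names prices, sales and result are illustrative):

```python
import numpy as np

# price and SALES columns from the merged frame in Step 2, in the same row order
prices = [31, 22, 23, 34, 20, 30, 31, 11, 23, 34, 20, 30]
sales = [123090, 116591, 119581, 117544, 129574, 118876,
         129467, 126062, 128552, 104994, 149188, 118204]

# np.average with weights computes sum(prices * sales) / sum(sales)
result = np.average(prices, weights=sales)
print(result)  # about 25.47, in line with the value above
```

With the merged dataframe in hand, the equivalent call would be np.average(df.price, weights=df.SALES).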

Comments
  • Thanks Scott, that works great. Out of interest, if I wanted to do the same for future prices so my price dictionary contained the month and future year {Oct-19: 12} etc how could I use map but extract the month from the dictionary as i don't think I could strip it or use strftime?
  • You can still use strftime. strftime('%b-%y') should work.
  • dd = {'Oct-19': 11, 'Nov-19': 23, ... ,'Sep-20': 31}. If I try and use: df['Adj Sales'] = df.index.strftime('%b').map(dd.strftime('%b') * df['SALES'] it throws a 'dict' object has no attribute 'strftime'?
  • df.index.strftime('%b-%y').map(dd)... you don't need to do anything to the dictionary. Just modify the index side.
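
A small sketch of the month-year variant discussed in these comments. The dates, sales figures and dictionary values below are made up for illustration, and an English locale is assumed so that '%b-%y' produces keys such as 'Oct-19':

```python
import pandas as pd

# Hypothetical future months and sales figures, for illustration only
dates = pd.to_datetime(["2019-10-31", "2019-11-30", "2019-12-31"])
df = pd.DataFrame({"SALES": [100, 200, 300]}, index=dates)

# Price dictionary keyed by month and two-digit year (hypothetical values)
dd = {"Oct-19": 11, "Nov-19": 23, "Dec-19": 34}

# Format the index the same way as the dictionary keys, then look the keys up
df["price"] = df.index.strftime("%b-%y").map(dd)
df["Adj Sales"] = df["price"] * df["SALES"]
print(df)
```

Only the index side is formatted; the dictionary is used as-is, which is the point made in the last comment.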