Finding the mean and standard deviation of a timedelta object in pandas df

I would like to calculate the mean and standard deviation of a timedelta column, grouped by bank, from the two-column dataframe shown below. When I run my code (also shown below), I get the following error:

pandas.core.base.DataError: No numeric types to aggregate

My dataframe:

   bank                          diff
   Bank of Japan                 0 days 00:00:57.416000
   Reserve Bank of Australia     0 days 00:00:21.452000
   Reserve Bank of New Zealand  55 days 12:39:32.269000
   U.S. Federal Reserve          8 days 13:27:11.387000

My code:

means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()

You need to convert the timedelta to a numeric value first. Converting to int64 through .values is the most accurate option, because nanoseconds are the underlying numeric representation of a timedelta:

import numpy as np
import pandas as pd

# store each timedelta as its integer number of nanoseconds
dropped['new'] = dropped['diff'].values.astype(np.int64)

means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])
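As a self-contained sketch of the same round trip, the snippet below builds a small stand-in for the dropped DataFrame (the bank names "A" and "B" and the values are made up for illustration), aggregates only the nanosecond column, and converts the results back to timedeltas. The column name "ns" is an assumption, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the `dropped` DataFrame
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta([10, 30, 86400, 259200], unit="s"),
})

# Timedelta -> integer nanoseconds (the underlying representation)
dropped["ns"] = dropped["diff"].to_numpy().astype(np.int64)

# Aggregate the numeric column only, then convert back to timedelta
grouped = dropped.groupby("bank")["ns"]
means = pd.to_timedelta(grouped.mean())  # values are interpreted as nanoseconds
stds = pd.to_timedelta(grouped.std())    # sample standard deviation (ddof=1)

print(means)
print(stds)
```

Selecting the numeric column before aggregating keeps the original timedelta column out of the aggregation, which is what triggered the error in the question.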

Another solution is to convert the values to seconds with total_seconds, but that is less accurate because the result is a float:

dropped['new'] = dropped['diff'].dt.total_seconds()

means = dropped.groupby('bank').mean()
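If timedelta results are wanted from the seconds-based approach, the aggregated floats can be converted back with pd.to_timedelta and unit="s". The sketch below uses made-up data and a hypothetical "seconds" column name.

```python
import pandas as pd

# Hypothetical sample data standing in for the `dropped` DataFrame
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta([10, 30, 86400, 259200], unit="s"),
})

# Timedelta -> float seconds
dropped["seconds"] = dropped["diff"].dt.total_seconds()

# Mean and sample standard deviation per bank, still in seconds
stats = dropped.groupby("bank")["seconds"].agg(["mean", "std"])

# Float seconds -> timedelta
mean_td = pd.to_timedelta(stats["mean"], unit="s")
std_td = pd.to_timedelta(stats["std"], unit="s")

print(mean_td)
print(std_td)
```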

If you are estimating the parameters of a distribution from a sample, the standard deviation uses n-1 in the denominator (Bessel's correction, which makes the variance estimate unbiased). If your data is the entire population, the standard deviation uses n in the denominator. The mean is the same in both situations.
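The choice of denominator matters when mixing libraries, because pandas std() defaults to ddof=1 (n-1) while NumPy's np.std defaults to ddof=0 (n). A minimal sketch with three made-up durations, worked in seconds to keep the arithmetic easy to check:

```python
import numpy as np
import pandas as pd

# Three hypothetical durations: 10 s, 20 s and 30 s
diffs = pd.Series(pd.to_timedelta([10, 20, 30], unit="s"))
seconds = diffs.dt.total_seconds()

sample_std = seconds.std()              # pandas default, ddof=1 (divide by n-1)
population_std = seconds.std(ddof=0)    # divide by n
numpy_std = np.std(seconds.to_numpy())  # NumPy default, ddof=0 (divide by n)

# Convert the results back to timedeltas
sample_td = pd.to_timedelta(sample_std, unit="s")
population_td = pd.to_timedelta(population_std, unit="s")

print(sample_td, population_td)
```

Here the sample standard deviation is 10 seconds and the population value is about 8.165 seconds, so it is worth passing ddof explicitly whenever results from the two libraries are compared.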

There is no need to convert the timedelta back and forth. NumPy and pandas can handle it seamlessly, with a faster run time. Using your dropped DataFrame:

import numpy as np

grouped = dropped.groupby('bank')['diff']

mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))

The pandas mean() and other aggregation methods accept a numeric_only=False parameter:

dropped.groupby('bank').mean(numeric_only=False)

Found here: Aggregations for Timedelta values in the Python DataFrame

I would suggest passing the numeric_only=False argument to mean as mentioned by Alexander Usikov - this works for pandas version 0.20+.

If you have an older version, the following works:

import pandas as pd

df = pd.DataFrame({
    'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
    'group': ['a', 'a', 'a', 'b', 'b']
})

(
    df
    .astype({'td': 'int64'})     # convert timedelta to 64-bit integer (nanoseconds)
    .groupby('group')
    .mean()
    .astype({'td': 'timedelta64[ns]'})
)

Comments
  • How do you want to aggregate the timedelta object? Access the .days or .seconds attributes if you're looking to aggregate.
  • Thank you, this worked like a charm -- (I used the first solution)!