How can I calculate the duration of a column, grouping by another column (Python or R)

I have a dataframe, df, with the following data:

           ID            DateTime        

           A             12/13/2019 6:35:48PM
           A             12/13/2019 6:35:49PM
           A             12/13/2019 6:35:50PM
           B             12/13/2019 7:00:00PM
           B             12/13/2019 7:00:05PM
           C             12/13/2019 8:00:05PM
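The DateTime strings above use a 12-hour clock with the AM/PM marker written directly after the seconds, so they need to be parsed before any time arithmetic. Here is a minimal pandas sketch, assuming the table is typed in by hand (the values are copied from above):

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": [
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ],
})

# %I = hour on a 12-hour clock, %p = AM/PM marker (no space before it here)
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%m/%d/%Y %I:%M:%S%p")
print(df.dtypes)
```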

Desired outcome:

          ID              Duration

          A                  3 sec
          B                  5 sec
          C                  1 sec

Code I am running in Python:


How can I calculate the duration of a column, grouping by another column?

Any suggestions will help.

You can do this in R with the dplyr and magrittr packages:

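library(dplyr)  # dplyr re-exports magrittr's %>% pipe

x <- data.frame(ID = c("A","A","A","B","B","C"),
                 DateTime =  c("12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM",
                               "12/13/2019 6:35:50PM","12/13/2019 7:00:00PM", 
                               "12/13/2019 7:00:05PM","12/13/2019 8:00:05PM"))
# %I (12-hour clock) with %p (AM/PM) reads "6:35:48PM" as 18:35:48;
# with %H the trailing "PM" would simply be ignored
x$DateTime <- as.POSIXct(x$DateTime, format = "%m/%d/%Y %I:%M:%S%p")
x %>% 
   group_by(ID) %>%
   # latest minus earliest timestamp per ID, always expressed in seconds
   mutate(dif = difftime(max(DateTime), min(DateTime), units = "secs")) %>% 
   select(ID, dif) %>% distinct()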
# A tibble: 3 x 2
# Groups:   ID [3]
 ID    dif   
 <fct> <drtn>
1 A     2 secs
2 B     5 secs
3 C     0 secs
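For the Python side of the question, the R pipeline above maps fairly directly onto pandas: a grouped `mutate()` roughly corresponds to `groupby(...).transform()`, and `select() %>% distinct()` to column selection plus `drop_duplicates()`. A sketch of that translation, rebuilding the sample data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": pd.to_datetime([
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ], format="%m/%d/%Y %I:%M:%S%p"),
})

# group_by(ID) %>% mutate(dif = max(DateTime) - min(DateTime))
grouped = df.groupby("ID")["DateTime"]
df["dif"] = grouped.transform("max") - grouped.transform("min")

# select(ID, dif) %>% distinct()
result = df[["ID", "dif"]].drop_duplicates().reset_index(drop=True)
print(result)
```

Like the R output, this yields 2, 5 and 0 seconds for A, B and C (as pandas Timedelta values).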

Are the timestamps already sorted? You probably want to find the earliest and latest timestamp and subtract them for each ID.
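A minimal pandas sketch of that suggestion, assuming DateTime has been parsed with `pd.to_datetime`. Because it uses `min()` and `max()`, it does not depend on the rows being sorted:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": pd.to_datetime([
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ], format="%m/%d/%Y %I:%M:%S%p"),
})

# Earliest and latest timestamp per ID, then their difference in seconds
g = df.groupby("ID")["DateTime"]
duration = (g.max() - g.min()).dt.total_seconds()
print(duration)
```

If the rows are known to be sorted by time, `g.last() - g.first()` should give the same values.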

You can write a custom function and aggregate each 'DateTime' series grouped by 'ID':

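import pandas as pd

def duration(series):
    # elapsed time between the earliest and latest timestamp, in seconds
    return (series.max() - series.min()).total_seconds()

# DateTime must be parsed first; %I and %p handle the 12-hour clock with AM/PM
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%m/%d/%Y %I:%M:%S%p')

# groupby is a method, so it needs parentheses: df.groupby['ID'] raises
# "TypeError: 'method' object is not subscriptable"
df.groupby('ID').agg({'DateTime': duration})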
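The same custom function also works with named aggregation, which lets you call the output column `Duration` as in the desired outcome. A sketch; the " sec" string formatting at the end is optional and purely cosmetic. Note that latest-minus-earliest gives 2, 5 and 0 seconds for A, B and C, the same as the R answer, rather than the 3 and 1 seconds shown for A and C in the desired outcome:

```python
import pandas as pd

def duration(series):
    # elapsed time between the earliest and latest timestamp, in seconds
    return (series.max() - series.min()).total_seconds()

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "C"],
    "DateTime": pd.to_datetime([
        "12/13/2019 6:35:48PM", "12/13/2019 6:35:49PM", "12/13/2019 6:35:50PM",
        "12/13/2019 7:00:00PM", "12/13/2019 7:00:05PM", "12/13/2019 8:00:05PM",
    ], format="%m/%d/%Y %I:%M:%S%p"),
})

# Named aggregation: new column "Duration" computed from "DateTime"
out = df.groupby("ID").agg(Duration=("DateTime", duration)).reset_index()

# Optional: render the seconds like the desired outcome ("5 sec")
out["Duration"] = out["Duration"].astype(int).astype(str) + " sec"
print(out)
```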

  • Does this answer your question? Calculate cumulative duration of a pandas datetime column
  • Thank you, but it doesn't include the 'group_by' condition
  • Yes, I will work in R on this particular problem, as I see the outcome will work. Thank you
  • Hi, thank you, yes, they are already sorted. I am essentially wanting to take the sum of each ID. (Find the duration spent for each ID, creating a new 'duration' column that reflects this)
  • I think what you want is not exactly the sum, I see that it is the difference of the last timestamp and the earliest for a given ID
  • I see, the reason I am saying 'sum' is because sometimes, there may be 'breaks' within the group, giving a false duration time
  • So you can have the following series of messages: A, A, B, A, A? What would be the duration that you want in this case? Is it between the first A and last A, or you want to extract 2 durations, the ones before B and the ones after B?
  • I would like to first group_by the ID, so that these are all together. Then I would like to take the sum of each of these groups, adding a new duration column
  • Thank you. Let me try this
  • Hello, I seem to be getting a "TypeError: 'method' object is not subscriptable" error when I run this
  • There's a chance that you're missing brackets somewhere. Maybe you're trying to index something that's not a list. This might help.
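If the "breaks" mentioned in the comments mean that an ID's rows can be interrupted by another ID (the A, A, B, A, A case), one option is to measure each uninterrupted run separately and then add the runs up per ID. A sketch, assuming the rows are already sorted by time; the timestamps below are made up purely for illustration, and the `run` column is a hypothetical helper:

```python
import pandas as pd

# Hypothetical data: A's rows are interrupted by a B row
df = pd.DataFrame({
    "ID": ["A", "A", "B", "A", "A"],
    "DateTime": pd.to_datetime([
        "12/13/2019 6:00:00PM", "12/13/2019 6:00:10PM",
        "12/13/2019 6:05:00PM",
        "12/13/2019 6:10:00PM", "12/13/2019 6:10:30PM",
    ], format="%m/%d/%Y %I:%M:%S%p"),
})

# Assumes time order. A new run starts whenever ID differs from the previous row.
df["run"] = (df["ID"] != df["ID"].shift()).cumsum()

# Duration of each uninterrupted run: latest minus earliest timestamp
g = df.groupby(["ID", "run"])["DateTime"]
runs = g.max() - g.min()

# Total per ID = sum of its run durations, in seconds
total = runs.groupby(level="ID").sum().dt.total_seconds()
print(total)
```

Here A totals 40 seconds (10 + 30), whereas latest-minus-earliest over all of A's rows would give 630 seconds, because it counts the time spent on B.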