Grow pandas dataframe group by group

I have a multi-index Pandas dataframe. In my example there are two index levels: vehicle (with values A and B) and reference_day (with values 1 and 2). For each vehicle and each day there is a set of moments in time in an HHMM-style format, such that e.g. '2330' corresponds to 11.30pm and '30' to 0.30am (the example below stores them as integers). These moments are ordered chronologically, but within one reference_day they may cross the "midnight" line. That is, a moment at 02.00am may be counted toward the PREVIOUS day. I want a new column that takes the value 1 if the time in that row actually belongs to a "new" day (i.e. the midnight line has been crossed) and 0 otherwise. This example corresponds to a train timetable where trips between midnight and (approximately) 4am are registered under the preceding day.

Example:

import pandas as pd

# "data" rather than "dict", so the built-in dict type is not shadowed
data = {"vehicle": ["A"]*8 + ["B"]*8,
        "reference_day": [1, 1, 1, 1, 2, 2, 2, 2]*2,
        "time": [1830, 2200, 30, 115, 1700, 1800, 2300, 100,
                 1900, 2300, 15, 200, 1500, 2000, 2330, 120]}
df = pd.DataFrame(data).set_index(["vehicle", "reference_day"])

DataFrame looks like this:

                       time
vehicle reference_day      
A       1              1830
        1              2200
        1                30
        1               115
        2              1700
        2              1800
        2              2300
        2               100
B       1              1900
        1              2300
        1                15
        1               200
        2              1500
        2              2000
        2              2330
        2               120

I want to have an extra column like this:

                       time   next_day
vehicle reference_day      
A       1              1830   0
        1              2200   0
        1                30   1
        1               115   1
        2              1700   0
        2              1800   0
        2              2300   0
        2               100   1
B       1              1900   0
        1              2300   0
        1                15   1
        1               200   1
        2              1500   0
        2              2000   0
        2              2330   0
        2               120   1

How should I achieve this in an elegant way? Hope anyone can help, thanks!

Let's try:

df['next_day'] = df.groupby(level=[0,1])['time']\
                   .transform(lambda x: x.diff().lt(0).cumsum())

Output:

                       time  next_day
vehicle reference_day                
A       1              1830         0
        1              2200         0
        1                30         1
        1               115         1
        2              1700         0
        2              1800         0
        2              2300         0
        2               100         1
B       1              1900         0
        1              2300         0
        1                15         1
        1               200         1
        2              1500         0
        2              2000         0
        2              2330         0
        2               120         1

We could also use:

df['next_day']= (df.groupby(level = [0,1])[['time']].diff()
                   .lt(0)
                   .groupby(level = [0,1])['time']
                   .cumsum()
                   .astype(int)
                )
print(df)
                       time  next_day
vehicle reference_day                
A       1              1830         0
        1              2200         0
        1                30         1
        1               115         1
        2              1700         0
        2              1800         0
        2              2300         0
        2               100         1
B       1              1900         0
        1              2300         0
        1                15         1
        1               200         1
        2              1500         0
        2              2000         0
        2              2330         0
        2               120         1

Keep in mind that, performance-wise, this is similar to the groupby.transform approach: here we group twice, but apply or transform with a lambda function that chains several methods is usually slow as well.

The following might help?

df['next_day']=(df['time']<400).astype(int)
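A fixed cut-off only works if no trip registered under the preceding day runs past it. The sketch below uses made-up data (a hypothetical last trip at 04:30 that still belongs to the previous evening's service day) to compare the threshold with the per-group diff approach; the names `by_threshold` and `by_diff` are illustrative only.

```python
import pandas as pd

# Hypothetical group: the last trip (04:30) is still registered under the
# service day that started the previous evening. Values are made up.
df = pd.DataFrame(
    {
        "vehicle": ["A"] * 4,
        "reference_day": [1] * 4,
        "time": [2200, 2330, 130, 430],
    }
).set_index(["vehicle", "reference_day"])

# Fixed cut-off at 04:00.
by_threshold = (df["time"] < 400).astype(int)

# Per-group detection of the clock jumping back.
by_diff = (
    df.groupby(level=[0, 1])["time"]
    .transform(lambda x: x.diff().lt(0).cumsum())
)

print(by_threshold.tolist())  # [0, 0, 1, 0]
print(by_diff.tolist())       # [0, 0, 1, 1]
```

On the question's own sample data both approaches give the same result; they only diverge when a post-midnight time exceeds the chosen cut-off.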

Comments
  • Shouldn't the last value for A be 0, as it's a journey that started at 23:00?
  • This is such a neat solution! Thanks a lot.
  • What if the day starts at 2:00 am?
  • Do you mean what if the day starts at 2:00 am? In that case, df['next_day']=((df['time']>=200) & (df['time']<400)).astype(int)
  • Unfortunately, this does not always work, since in my example the new day starts at "approximately" 4am. This moment is not the guaranteed 'start of the new day'.