## Pandas - Cumulative sum of consecutive ones

I have a dataframe like this:

    Name_A ¦ date1 ¦ 1
    Name_A ¦ date2 ¦ 0
    Name_A ¦ date3 ¦ 1
    Name_A ¦ date4 ¦ 1
    Name_A ¦ date5 ¦ 1
    Name_B ¦ date6 ¦ 1
    Name_B ¦ date7 ¦ 1
    Name_B ¦ date8 ¦ 0
    Name_B ¦ date9 ¦ 1

And I would like to get this:

    Name_A ¦ date1 ¦ 1
    Name_A ¦ date2 ¦ 0
    Name_A ¦ date3 ¦ 1
    Name_A ¦ date4 ¦ 2
    Name_A ¦ date5 ¦ 3
    Name_B ¦ date6 ¦ 1
    Name_B ¦ date7 ¦ 2
    Name_B ¦ date8 ¦ 0
    Name_B ¦ date9 ¦ 1

Basically I want to get the cumulative sum of consecutive 1s. If the name changes or there's a 0, it should start the counting from 0 again.

Any ideas/suggestions? Thanks.

I rebuilt your data like this:

    import pandas as pd

    df = pd.DataFrame(
        {'col1': ['Name_A'] * 5 + ['Name_B'] * 4,
         'col2': ['date{}'.format(x) for x in list(range(1,10,1))],
         'col3': [1,0,1,1,1,1,1,0,1]})

For the kind of grouping you're suggesting, I like using `itertools.groupby` rather than the pandas `groupby`. That way I can explicitly state the two conditions that you specified (a name change and a 0 in the value column):

    from itertools import groupby

    groups = []
    uniquekeys = []
    for k, g in groupby(df.iterrows(), lambda row: (row[1]['col1'], row[1]['col3'] == 0)):
        groups.append(list(g))
        uniquekeys.append(k)

Now that the correct groups exist, all that remains is to iterate over them and calculate the cumulative sum:

    cumsum = pd.concat([pd.Series([y[1]['col3'] for y in x]).cumsum() for x in groups])
    df['cumsum'] = list(cumsum)

Result:

         col1   col2  col3  cumsum
    0  Name_A  date1     1       1
    1  Name_A  date2     0       0
    2  Name_A  date3     1       1
    3  Name_A  date4     1       2
    4  Name_A  date5     1       3
    5  Name_B  date6     1       1
    6  Name_B  date7     1       2
    7  Name_B  date8     0       0
    8  Name_B  date9     1       1

For reference, see a nice explanation of `itertools.groupby` here.

Here's my own take:

    In [145]: group_ids = df[2].diff().ne(0).cumsum()

    In [146]: df["count"] = df[2].groupby([df[0], group_ids]).cumsum()

    In [147]: df
    Out[147]:
            0      1  2  count
    0  Name_A  date1  1      1
    1  Name_A  date2  0      0
    2  Name_A  date3  1      1
    3  Name_A  date4  1      2
    4  Name_A  date5  1      3
    5  Name_B  date6  1      1
    6  Name_B  date7  1      2
    7  Name_B  date8  0      0
    8  Name_B  date9  1      1

This uses the compare-cumsum-groupby pattern to find the contiguous groups: `df[2].diff().ne(0)` gives us a True whenever a value isn't the same as the previous one, and the cumulative sum of those gives us a new number whenever a new run of equal values (1s or 0s) starts.

This will mean that we have the same group_id for binary values crossing different names, of course, but since we're grouping on *both* df[0] (the names) and group_ids, we're okay.
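To see the intermediate values, here is a minimal self-contained sketch of the same pattern. The column names `name`, `date` and `flag` and the `run_id` label are assumptions made for illustration; the session above uses the integer column labels 0, 1 and 2.

```python
import pandas as pd

# Same data as in the question, with illustrative column names.
df = pd.DataFrame({
    "name": ["Name_A"] * 5 + ["Name_B"] * 4,
    "date": [f"date{i}" for i in range(1, 10)],
    "flag": [1, 0, 1, 1, 1, 1, 1, 0, 1],
})

# True wherever the flag differs from the previous row. The first row is
# compared against NaN, so it counts as a change too.
changed = df["flag"].diff().ne(0)

# Running count of changes: each run of equal values gets its own id.
group_ids = changed.cumsum().rename("run_id")

# Grouping on the name as well splits runs that cross a name boundary.
df["count"] = df["flag"].groupby([df["name"], group_ids]).cumsum()

print(group_ids.tolist())    # [1, 2, 3, 3, 3, 3, 3, 4, 5]
print(df["count"].tolist())  # [1, 0, 1, 2, 3, 1, 2, 0, 1]
```

Rows 2 to 6 share run id 3 even though the name changes at row 5, which is why the name has to be part of the grouping key.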

Here is a vectorized solution requiring no explicit loops:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame.from_dict({'name': list('AAAAABBBB'), 'bit': (1,0,1,1,1,1,1,0,1)})
    >>> df
       bit name
    0    1    A
    1    0    A
    2    1    A
    3    1    A
    4    1    A
    5    1    B
    6    1    B
    7    0    B
    8    1    B
    >>> reset = (df['bit'] == 0) | (df['name'] != df['name'].shift(1))
    >>> reset, = np.where(np.concatenate([reset, [True]]))
    >>> df['count'] = np.arange(reset[-1]) + (df['bit'].values[reset[:-1]]-reset[:-1]).repeat(np.diff(reset))
    >>> df
       bit name  count
    0    1    A      1
    1    0    A      0
    2    1    A      1
    3    1    A      2
    4    1    A      3
    5    1    B      1
    6    1    B      2
    7    0    B      0
    8    1    B      1
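The one-liner above is dense, so here is a sketch that spells out each step. The function name `consecutive_ones` and its parameters are hypothetical, and it assumes the value column contains only 0s and 1s. The idea is that within each run the answer is the row position plus a constant offset, where the offset is the run's first value minus the run's start position.

```python
import numpy as np
import pandas as pd

def consecutive_ones(names, bits):
    """Count consecutive 1s, restarting on a 0 or a name change.

    Hypothetical helper; assumes `bits` holds only 0s and 1s.
    """
    names = pd.Series(names).reset_index(drop=True)
    bits = pd.Series(bits).reset_index(drop=True)

    # A run restarts on a 0 or when the name differs from the previous row.
    reset = (bits == 0) | (names != names.shift(1))

    # Positions where runs start, plus a sentinel at len(bits) so that
    # np.diff yields the length of every run, including the last one.
    starts = np.flatnonzero(np.append(reset.to_numpy(), True))

    # Per-run offset: first value of the run minus the run's start position.
    offsets = bits.to_numpy()[starts[:-1]] - starts[:-1]

    # Row position plus the offset of the run that row belongs to.
    return np.arange(len(bits)) + offsets.repeat(np.diff(starts))

result = consecutive_ones(list("AAAAABBBB"), [1, 0, 1, 1, 1, 1, 1, 0, 1])
print(result.tolist())  # [1, 0, 1, 2, 3, 1, 2, 0, 1]
```

A run that starts with a 0 gets an offset one lower than a run that starts with a 1, so the 0 itself maps to 0 and the 1s after it count up from 1.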

##### Comments

- Can you share what you've tried so far? In addition, can you provide some data in a usable format? See minimal reproducible example.
- For the example in your question you could print DataFrame.head(n) in your shell then copy and paste it then format it as code.