Pandas groupby: apply a custom function to every N rows of a column within each group

I have a pandas DataFrame, and I want to group by one column and apply a custom function to another column. The function has to be applied to every two consecutive entries of that column within each group.

df = pd.DataFrame({'id':[1,1,2,2,2,3,3,3,3,3], 'vals':['ANZ', 'ABC', 'SAT', 'SATYA', 'SQL', 'WER', 'DEA', 'KIP', 'FTY', 'TCZ'] })
id  vals
1   ANZ
1   ABC
2   SAT
2   SATYA
2   SQL
3   WER
3   DEA
3   KIP
3   FTY
3   TCZ
# I need a column 'res', computed by applying a function to each pair of consecutive rows of column 'vals' within each group of column 'id'.
# myfunc takes two arguments and returns one value.
df['res'] = df.groupby('id')['vals'].apply(myfunc)
id  vals   res
1   ANZ    myfunc(None, 'ANZ')
1   ABC    myfunc('ANZ', 'ABC')
2   SAT    myfunc(None, 'SAT')
2   SATYA  myfunc('SAT', 'SATYA')
2   SQL    myfunc('SATYA', 'SQL')
3   WER    myfunc(None, 'WER')
3   DEA    myfunc('WER', 'DEA')
3   KIP    myfunc('DEA', 'KIP')
3   FTY    myfunc('KIP', 'FTY')
3   TCZ    myfunc('FTY', 'TCZ')

But I am currently unable to write the expression for apply(): in groupby(...).apply(f), the argument passed to f is a Series, and I cannot find a way to access its elements by position (it is a pandas groupby Series object).

Please guide me on how to achieve this. Thanks in advance.
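
For reference, positional access on the per-group Series is possible through .iloc. The following is only a minimal sketch of that idea: pair_label is a hypothetical stand-in for the two-argument myfunc, and per_group is an illustrative helper name, neither of which comes from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'vals': ['ANZ', 'ABC', 'SAT', 'SATYA', 'SQL', 'WER', 'DEA', 'KIP', 'FTY', 'TCZ'],
})

def pair_label(prev, cur):
    # Hypothetical stand-in for the two-argument myfunc from the question.
    return f'{prev}|{cur}'

def per_group(s):
    # s is the 'vals' Series of a single group; .iloc reads it by position.
    out = [pair_label(None, s.iloc[0])]
    out += [pair_label(s.iloc[i - 1], s.iloc[i]) for i in range(1, len(s))]
    # Return a Series on the group's own index so the result lines up with df.
    return pd.Series(out, index=s.index)

df['res'] = df.groupby('id')['vals'].transform(per_group)
```

The first row of every group is paired with None, matching the expected output in the question.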

I would suggest doing this task in a slightly different way.

Start by generating a column holding vals from the previous row in the current group. I named it prev.

Then call your function on each row of df using apply, storing the result in the res column. myfunc receives the current row and has to extract prev and vals from it, then return the result.

The only remaining step is to drop the prev column.

The whole script can look like this:

import pandas as pd

def myfunc(x):
    pr = x.prev
    t1 = pr if pd.notnull(pr) else None
    t2 = x.vals
    return f'myfunc({repr(t1)}, {repr(t2)})'

df = pd.DataFrame({'id':[1,1,2,2,2,3,3,3,3,3], 'vals':
    ['ANZ', 'ABC', 'SAT', 'SATYA', 'SQL', 'WER', 'DEA', 'KIP', 'FTY', 'TCZ'] })
df['prev'] = df.groupby('id')['vals'].shift()  # previous vals within the same id group
df['res'] = df.apply(myfunc, axis=1)
df.drop('prev', axis=1, inplace=True)

When you print(df), you will get:

   id   vals                     res
0   1    ANZ     myfunc(None, 'ANZ')
1   1    ABC    myfunc('ANZ', 'ABC')
2   2    SAT     myfunc(None, 'SAT')
3   2  SATYA  myfunc('SAT', 'SATYA')
4   2    SQL  myfunc('SATYA', 'SQL')
5   3    WER     myfunc(None, 'WER')
6   3    DEA    myfunc('WER', 'DEA')
7   3    KIP    myfunc('DEA', 'KIP')
8   3    FTY    myfunc('KIP', 'FTY')
9   3    TCZ    myfunc('FTY', 'TCZ')
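
If myfunc should keep the two-argument signature from the question, the helper column and the row-wise apply can be skipped. This is a sketch of that variant rather than a definitive implementation; the body of myfunc here is only a placeholder that prints the call it received.

```python
import pandas as pd

def myfunc(prev, cur):
    # Placeholder body: the real two-argument function would go here.
    return f'myfunc({prev!r}, {cur!r})'

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'vals': ['ANZ', 'ABC', 'SAT', 'SATYA', 'SQL', 'WER', 'DEA', 'KIP', 'FTY', 'TCZ'],
})

# Previous value within each id group; the first row of a group has no predecessor.
prev = df.groupby('id')['vals'].shift()

# Turn the missing predecessor into None, then call myfunc once per (previous, current) pair.
df['res'] = [
    myfunc(p if pd.notna(p) else None, v)
    for p, v in zip(prev, df['vals'])
]
```

Because prev is kept as a separate Series, there is no temporary column to drop afterwards.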

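If I understand correctly, you can try the following:

# new_value is not defined in this snippet; it is assumed to hold df['vals'].shift()
df.groupby(df.index//2)[['vals','new_value']].apply(lambda x: pd.Series(list(zip(x.new_value,x.vals))))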

0      (nan, ANZ)
1      (ANZ, ABC)
2      (ABC, SAT)
3    (SAT, SATYA)
4    (SATYA, SQL)
5      (SQL, WER)
6      (WER, DEA)
7      (DEA, KIP)
8      (KIP, FTY)
9      (FTY, TCZ)

EDIT: modifying the code a little to match the expected output:

df['new'] = list(zip(df.groupby('id')['vals'].shift(), df['vals']))

   id   vals           new
0   1    ANZ    (nan, ANZ)
1   1    ABC    (ANZ, ABC)
2   2    SAT    (nan, SAT)
3   2  SATYA  (SAT, SATYA)
4   2    SQL  (SATYA, SQL)
5   3    WER    (nan, WER)
6   3    DEA    (WER, DEA)
7   3    KIP    (DEA, KIP)
8   3    FTY    (KIP, FTY)
9   3    TCZ    (FTY, TCZ)

So I tried something like the code below.

myfunc computes the string similarity between two strings; I used the awesome fuzzywuzzy library for that.

from fuzzywuzzy import fuzz

def myfunc(x):
    x = x.tolist()  # convert the group's Series to a list
    y = []
    for i in range(len(x)):
        if i == 0:
            # first row of the group: there is no previous row to compare with
            y.append(float('nan'))
        else:
            # ratio between the previous row's value and the current row's value
            y.append(fuzz.token_set_ratio(x[i - 1], x[i]) / 10)
    return y

# Now the groupby with transform, which keeps one result per original row
df['res'] = df.groupby('id')['vals'].transform(myfunc)

But I am not sure whether this is the Pythonic way to do it. Please let me know if there is a more Pythonic way. Thanks.
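
One possibly more compact form pairs a grouped shift with a list comprehension instead of an index loop. The sketch below makes an assumption worth stating loudly: similarity is a hypothetical stand-in built on the standard library's difflib.SequenceMatcher, used only so the example runs without fuzzywuzzy. Its scores are not the same as those of fuzz.token_set_ratio, which could be swapped in at the marked line.

```python
from difflib import SequenceMatcher

import pandas as pd

def similarity(a, b):
    # Stand-in scorer (assumption): replace with fuzz.token_set_ratio(a, b) if fuzzywuzzy is available.
    return round(100 * SequenceMatcher(None, a, b).ratio())

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'vals': ['ANZ', 'ABC', 'SAT', 'SATYA', 'SQL', 'WER', 'DEA', 'KIP', 'FTY', 'TCZ'],
})

# Previous value within each id group; missing for the first row of every group.
prev = df.groupby('id')['vals'].shift()

# Score each row against its predecessor; rows without a predecessor get NaN.
df['res'] = [
    similarity(p, v) if pd.notna(p) else float('nan')
    for p, v in zip(prev, df['vals'])
]
```

The grouped shift handles the group boundaries, so the scoring function never needs to know which group a row belongs to.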
