pandas - how to create multiple columns in groupby with conditional?


I need to group a DataFrame and create two columns: one with a simple count and another with a conditional count, as in the example:

The qtd_ok column counts only the rows whose status is 'OK'.

I tried this, but I do not know how to add the total count in the same groupby:

df.groupby(['column1', 'column2', 'column3']).apply(lambda x: (x['status'] == 'OK').sum())

First create a helper boolean column A with assign, then aggregate with agg: sum counts only the OK values (each True counts as 1) and size counts all rows in each group:

df = (df.assign(A=(df['status']== 'OK'))
        .groupby(['column1', 'column2', 'column3'])['A']
        .agg([('qtd_ok','sum'),('qtd','size')])
        .astype(int)
        .reset_index())
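On pandas 0.25 or later, the same helper-column idea can also be written with named aggregation, which names the output columns directly and avoids the list of tuples. A minimal sketch, assuming the column names from the question and the values of the unit1/section1 frame from the last answer; is_ok and out are hypothetical names:

```python
import pandas as pd

# Toy data: same values as the unit1/section1 frame in the last answer
df = pd.DataFrame({
    'column1': ['unit1'] * 9,
    'column2': ['section1'] * 4 + ['section2'] * 5,
    'column3': ['content1'] * 3 + ['content2'] + ['content1'] * 2 + ['content2'] * 3,
    'status': ['OK', 'OK', 'error', 'OK', 'OK', 'OK', 'error', 'error', 'OK'],
})

out = (df.assign(is_ok=df['status'].eq('OK'))   # hypothetical helper bool column
         .groupby(['column1', 'column2', 'column3'])
         # named aggregation: new_name=(source_column, function)
         .agg(qtd_ok=('is_ok', 'sum'), qtd=('is_ok', 'size'))
         .reset_index())
print(out)
```

sum on a boolean column counts the True values, while size counts every row in the group regardless of its value.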

Sample:

import pandas as pd

df = pd.DataFrame({
        'column1':['a'] * 9,
        'column2':['a'] * 4 + ['b'] * 5,
        'column3':list('aaabaabbb'),
        'status':list('aabaaabba'),
})

print (df)
  column1 column2 column3 status
0       a       a       a      a
1       a       a       a      a
2       a       a       a      b
3       a       a       b      a
4       a       b       a      a
5       a       b       a      a
6       a       b       b      b
7       a       b       b      b
8       a       b       b      a

# in this sample, the value 'a' plays the role of 'OK'
df = (df.assign(A=(df['status'] == 'a'))
        .groupby(['column1', 'column2', 'column3'])['A']
        .agg([('qtd_ok','sum'),('qtd','size')])
        .astype(int)
        .reset_index())
print (df)
  column1 column2 column3  qtd_ok  qtd
0       a       a       a       2    3
1       a       a       b       1    1
2       a       b       a       2    2
3       a       b       b       1    3


pd.crosstab

You can use pd.crosstab with margins=True:

# data from @jezrael

# pass the grouping columns as Series so their names become the index names
index_cols = [df['column1'], df['column2'], df['column3']]
condition = df['status'].eq('a')

res = (pd.crosstab(index_cols, condition, margins=True)
         .drop('All', level=0)
         .reset_index())

print(res)

status column1 column2 column3  False  True  All
0            a       a       a      1     2    3
1            a       a       b      0     1    1
2            a       b       a      0     2    2
3            a       b       b      2     1    3
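If you want the question's column names instead of False/True/All, one option (a sketch, not part of the original answer) is to map the condition to string labels and set crosstab's margins_name parameter, then drop the total row and the unwanted column. The label 'not_ok' and the names flag and res are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['a'] * 9,
    'column2': ['a'] * 4 + ['b'] * 5,
    'column3': list('aaabaabbb'),
    'status': list('aabaaabba'),
})

# Label each row instead of using raw booleans ('a' plays the role of 'OK')
flag = df['status'].eq('a').map({True: 'qtd_ok', False: 'not_ok'})

res = (pd.crosstab([df['column1'], df['column2'], df['column3']], flag,
                   margins=True, margins_name='qtd')  # margin column = row total
         .drop('qtd', level=0)        # drop the grand-total row
         .drop(columns='not_ok')      # keep only the conditional and total counts
         .reset_index())
print(res)
```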


Just an idea: count with groupby and a lambda, which can be enhanced further.

>>> df
  colum1    colum2    colum3 status
0  unit1  section1  content1     OK
1  unit1  section1  content1     OK
2  unit1  section1  content1  error
3  unit1  section1  content2     OK
4  unit1  section2  content1     OK
5  unit1  section2  content1     OK
6  unit1  section2  content2  error
7  unit1  section2  content2  error
8  unit1  section2  content2     OK

Using groupby with a lambda:

>>> df.groupby(['colum1', 'colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
  colum1    colum2    colum3  status
0  unit1  section1  content1       2
1  unit1  section1  content2       1
2  unit1  section2  content1       2
3  unit1  section2  content2       1

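case=False makes the match case-insensitive, so values such as 'ok' or 'Ok' are counted too. Note that str.contains matches substrings, so a value like 'NOT OK' would also be counted.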
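To return both counts the OP asked for, the lambda can be combined with a plain size in one SeriesGroupBy.agg call using named aggregation (pandas 0.25+). A sketch, assuming the frame above; qtd_ok and qtd are the output names used in the first answer, and out is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    'colum1': ['unit1'] * 9,
    'colum2': ['section1'] * 4 + ['section2'] * 5,
    'colum3': ['content1'] * 3 + ['content2'] + ['content1'] * 2 + ['content2'] * 3,
    'status': ['OK', 'OK', 'error', 'OK', 'OK', 'OK', 'error', 'error', 'OK'],
})

out = (df.groupby(['colum1', 'colum2', 'colum3'])['status']
         .agg(
             # conditional count: rows whose status contains 'OK' (any case)
             qtd_ok=lambda s: s.str.contains('OK', case=False).sum(),
             # total count: all rows in the group
             qtd='size',
         )
         .reset_index())
print(out)
```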


Comments
  • Good one to know, +1
  • Another nice solution +1
  • The OP expects two output columns from the GroupBy.