Pandas: How to find the percentage of each group member type per subgroup?

(Data sample and attempts at the end of the question)

With a dataframe such as this:

    Type    Class   Area    Decision
0   A       1       North   Yes
1   B       1       North   Yes
2   C       2       South   No
3   A       3       South   No
4   B       3       South   No
5   C       1       South   No
6   A       2       North   Yes
7   B       3       South   Yes
8   B       1       North   No
9   C       1       East    No
10  C       2       West    Yes 

How can I find what percentage of each type [A, B, C] belongs to each area [North, South, East, West]?

Desired output:

    North   South   East    West
A   0.66    0.33    0       0
B   0.5     0.5     0       0
C   0       0.5     0.25    0.25

My best attempt so far is:

df_attempt1 = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T

Which returns:

Area  East  North  South  West
Type                          
A      NaN    2.0    1.0   NaN
B      NaN    2.0    2.0   NaN
C      1.0    NaN    2.0   1.0

And I guess I could build on that by calculating row sums in the margins and filling in 0 for missing observations, but I'd really appreciate suggestions for more elegant approaches.
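The margin-based idea described above can be sketched as follows. This is a minimal sketch that reuses the df_attempt1 expression on the question's sample data; the variable names counts and shares are illustrative. It fills the NaN cells with 0 and then divides each row by its own total.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "B", "C", "A", "B", "C", "A", "B", "B", "C", "C"],
    "Class": [1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 2],
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
    "Decision": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes"],
})

# Counts per (Type, Area), as in the first attempt: rows are Types, columns are Areas
counts = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T

# Area/Type combinations that never occur come out as NaN; treat them as 0 observations
counts = counts.fillna(0)

# Divide every row by its row total (the margin), so each Type's shares sum to 1
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```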

Thank you for any suggestions!

Code:

import pandas as pd

df = pd.DataFrame(
    {
        "Type": {0: "A", 1: "B", 2: "C", 3: "A", 4: "B", 5: "C", 6: "A", 7: "B", 8: "B", 9: "C", 10: "C"},
        "Class": {0: 1, 1: 1, 2: 2, 3: 3, 4: 3, 5: 1, 6: 2, 7: 3, 8: 1, 9: 1, 10: 2},
        "Area": {0: "North", 1: "North", 2: "South", 3: "South", 4: "South", 5: "South", 6: "North", 7: "South", 8: "North", 9: "East", 10: "West"},
        "Decision": {0: "Yes", 1: "Yes", 2: "No", 3: "No", 4: "No", 5: "No", 6: "Yes", 7: "Yes", 8: "No", 9: "No", 10: "Yes"},
    }
)

dfg = df[['Area', 'Type']].groupby(['Area']).agg('count').unstack()

df_attempt1 = df.groupby(['Area', 'Type'])['Type'].aggregate('count').unstack().T

You can use the function crosstab:

pd.crosstab(df['Type'], df['Area'], normalize='index')

Output:

Area  East     North     South  West
Type                                
A     0.00  0.666667  0.333333  0.00
B     0.00  0.500000  0.500000  0.00
C     0.25  0.000000  0.500000  0.25
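Note that normalize='index' returns fractions of each row total, not percentages. If values on a 0 to 100 scale are wanted, a minimal sketch is to scale and round the crosstab result; the two-decimal rounding is an arbitrary display choice, and the unnormalized crosstab is included only to show that normalize='index' is the same as dividing the raw counts by their row totals.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "B", "C", "A", "B", "C", "A", "B", "B", "C", "C"],
    "Class": [1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 2],
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
    "Decision": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes"],
})

# normalize='index' divides each cell by its row (Type) total
shares = pd.crosstab(df['Type'], df['Area'], normalize='index')

# Same table without normalization: raw counts per Type and Area
counts = pd.crosstab(df['Type'], df['Area'])

# Scale fractions to percentages and round for display
percent = (shares * 100).round(2)
print(percent)
```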

You were quite close already. The following should do the trick:

df.groupby('Type')['Area'].value_counts(normalize=True).unstack(fill_value=0)

Output:

Area    East    North       South       West
Type                
A       0.00    0.666667    0.333333    0.00
B       0.00    0.500000    0.500000    0.00
C       0.25    0.000000    0.500000    0.25

If order matters, you can reorder the DataFrame's columns, e.g. by indexing it with a list of the areas in the desired order.
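For example, the columns can be put in the order used in the question's desired output. This is a minimal sketch; the list area_order is an assumption about the wanted order, and reindex is used instead of plain column selection so that an area with no rows at all would appear as a column of zeros rather than raising a KeyError.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "B", "C", "A", "B", "C", "A", "B", "B", "C", "C"],
    "Class": [1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 2],
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
    "Decision": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes"],
})

shares = df.groupby('Type')['Area'].value_counts(normalize=True).unstack(fill_value=0)

# Hypothetical target order, taken from the question's desired output
area_order = ['North', 'South', 'East', 'West']

# reindex reorders existing columns and fills any absent area with 0
shares = shares.reindex(columns=area_order, fill_value=0)
print(shares)
```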

I think you can go for value_counts(normalize=True):

>>> df.groupby('Type')['Area'].value_counts(normalize=True).unstack().fillna(0)
Area  East     North     South  West
Type                                
A     0.00  0.666667  0.333333  0.00
B     0.00  0.500000  0.500000  0.00
C     0.25  0.000000  0.500000  0.25
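The same proportions can also be computed from raw group sizes, which makes the normalization step explicit: count the rows per (Type, Area) pair and divide by each Type's total. This is a minimal sketch using groupby(...).size() and transform('sum'); it is an alternative technique, not what value_counts does internally, and the names counts and type_totals are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "B", "C", "A", "B", "C", "A", "B", "B", "C", "C"],
    "Class": [1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 2],
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
    "Decision": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes"],
})

# Number of rows for every (Type, Area) pair that actually occurs
counts = df.groupby(['Type', 'Area']).size()

# Broadcast each Type's total row count back onto its (Type, Area) entries
type_totals = counts.groupby(level='Type').transform('sum')

# Divide, then pivot Areas into columns; missing pairs become 0
shares = (counts / type_totals).unstack(fill_value=0)
print(shares)
```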

You can do it this way:

import pandas as pd
df = pd.DataFrame([r.split() for r in '''Index Type    Class   Area    Decision
0   A       1       North   Yes
1   B       1       North   Yes
2   C       2       South   No
3   A       3       South   No
4   B       3       South   No
5   C       1       South   No
6   A       2       North   Yes
7   B       3       South   Yes
8   B       1       North   No
9   C       1       East    No
10  C       2       West    Yes'''.split('\n')])
df.columns = df.iloc[0]
df = df.iloc[1:]

table = pd.pivot_table(df, values='Class', index=['Type'], columns=['Area'], aggfunc='count').fillna(0)
table = table.div(table.sum(axis=1), axis=0)

We divide each row by its row total (table.sum(axis=1)); axis=0 aligns those totals with the row index, so every row sums to 1.

It gives:

Area  East     North     South  West
Type                                
A     0.00  0.666667  0.333333  0.00
B     0.00  0.500000  0.500000  0.00
C     0.25  0.000000  0.500000  0.25 
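The axis=0 argument in the division step matters. A minimal sketch on the question's sample data (using the numeric Class column rather than the string-parsed table above; the names row_totals, wrong and right are illustrative) shows that plain / aligns the row totals with the columns instead of the rows:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "B", "C", "A", "B", "C", "A", "B", "B", "C", "C"],
    "Class": [1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 2],
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
    "Decision": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes"],
})

table = pd.pivot_table(df, values='Class', index=['Type'], columns=['Area'],
                       aggfunc='count').fillna(0)
row_totals = table.sum(axis=1)  # Series indexed by Type: A=3, B=4, C=4

# Plain '/' aligns row_totals' index (A, B, C) with the columns (East, North, ...),
# so no labels match and every cell is NaN
wrong = table / row_totals

# div(..., axis=0) aligns row_totals with the row index instead
right = table.div(row_totals, axis=0)
print(right)
```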

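(
    df.groupby('Type')
    # count rows per Area inside each Type group; absent Areas become 0 after unstack
    .apply(lambda x: x.groupby('Area').Class.count()).unstack(fill_value=0)
    # divide each row by its total so every Type's shares sum to 1
    .transform(lambda x: x / x.sum(), axis=1)
)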
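The nested apply above counts Areas inside each Type group. As a minimal sketch of an equivalent that may be easier to read, grouping by both columns at once produces the same count table in one step, and the row-wise transform is unchanged:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["A", "B", "C", "A", "B", "C", "A", "B", "B", "C", "C"],
    "Class": [1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 2],
    "Area": ["North", "North", "South", "South", "South", "South",
             "North", "South", "North", "East", "West"],
    "Decision": ["Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "Yes"],
})

shares = (
    # one groupby over both keys instead of an apply inside a groupby
    df.groupby(['Type', 'Area'])['Class'].count()
    .unstack(fill_value=0)
    # same row-wise normalization as in the answer above
    .transform(lambda x: x / x.sum(), axis=1)
)
print(shares)
```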

Comments
  • @MykolaZotko neat :)
  • I definitely did :)
  • Neat, didn't know about the normalize argument :-)
  • @LukasThaler there's a saying, Pandas has everything you will ever need ... ;)