pandas dataframe group by particular values

Suppose I have a pandas dataframe of transactions that looks like this:

+----------+----------+----------+---------+
|  Owner   |  Seller  | Mediator |  Buyer  |
+----------+----------+----------+---------+
| 'andrew' | 'bob'    | 'tom'    | 'john'  |
| 'andrew' | 'andrew' | 'bill'   | 'jason' |
| 'andrew' | 'bill'   |  'bill'  | 'tom'   |
+----------+----------+----------+---------+

I want to perform a weird groupby: I want to group by people's names based on any involvement in a transaction. So the output would be:

+----------+-------+
|   Name   | Count |
+----------+-------+
| 'andrew' |     3 |
| 'bob'    |     1 |
| 'tom'    |     2 |
| 'john'   |     1 |
| 'bill'   |     2 |
| 'jason'  |     1 |
+----------+-------+

I.e., 'andrew' has a count of 3 because his name appears in 3 transactions, 'john' has a count of 1 because he only appears in 1, etc.

Any tips on how to go about doing this? Thanks in advance.

You can use unstack() to:

  1. Put all names into one column.
  2. Group by Name and count the unique original index values, which end up in the level_1 column after unstack() and reset_index():
    (df.unstack()
       .reset_index(name='Name')
       .groupby('Name') 
       .level_1 
       .nunique() 
       .rename('Count') 
       .reset_index())

    #Out[xx]:
    #     Name  Count
    #0  andrew      3
    #1    bill      2
    #2     bob      1
    #3   jason      1
    #4    john      1
    #5     tom      2
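
The two steps above can be sketched separately to show the intermediate long-form table. This is only an illustrative breakdown of the same chain, using the question's sample data; the variable names long_form and counts are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Owner': ['andrew', 'andrew', 'andrew'],
    'Seller': ['bob', 'andrew', 'bill'],
    'Mediator': ['tom', 'bill', 'bill'],
    'Buyer': ['john', 'jason', 'tom'],
})

# Step 1: unstack() turns the frame into a Series indexed by
# (column name, original row index); reset_index() then exposes those
# two index levels as the columns level_0 and level_1.
long_form = df.unstack().reset_index(name='Name')

# Step 2: count the distinct original rows (level_1) for each name, so a
# name appearing twice in the same transaction is counted only once.
counts = (long_form.groupby('Name')['level_1']
                   .nunique()
                   .rename('Count')
                   .reset_index())
print(counts)
```

Counting with nunique() on level_1 rather than with size() is what keeps 'andrew' at 3 instead of 4.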

You can create a set from each row, then reshape to a vertical data stack and get the value counts.

import pandas as pd

df = pd.DataFrame({'Owner': ['andrew', 'andrew', 'andrew'],
 'Seller': ['bob', 'andrew', 'bill'],
 'Mediator': ['tom', 'bill', 'bill'],
 'Buyer': ['john', 'jason', 'tom']}
)

cnt = (
    df.apply(lambda r: pd.Series(list(set(r))), axis=1)
      .stack()
      .value_counts()
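      # name the index and the counts explicitly; the default column
      # names produced by value_counts() differ between pandas versions
      .rename_axis('Name').reset_index(name='Count')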
)
cnt
# returns:
     Name  Count
0  andrew      3
1    bill      2
2     tom      2
3   jason      1
4    john      1
5     bob      1
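
To see why the per-row set matters, here is a small comparison, again assuming the question's sample data. The names naive and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Owner': ['andrew', 'andrew', 'andrew'],
    'Seller': ['bob', 'andrew', 'bill'],
    'Mediator': ['tom', 'bill', 'bill'],
    'Buyer': ['john', 'jason', 'tom'],
})

# Counting every cell over-counts names that fill two roles in one row.
naive = df.stack().value_counts()

# Deduplicating within each row first counts transactions, not cells.
per_row = (df.apply(lambda r: pd.Series(list(set(r))), axis=1)
             .stack()
             .value_counts())

print(naive['andrew'], per_row['andrew'])
print(naive['bill'], per_row['bill'])
```

In the sample data 'andrew' is both Owner and Seller in the second row, and 'bill' is both Seller and Mediator in the third, so only the deduplicated version matches the requested output.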

A solution with unique() and explode() (Series.explode() requires pandas 0.25 or later):

df.apply(lambda row: row.unique(),axis=1) \
  .explode().value_counts() \
  .to_frame(name="Count")  \
  .rename_axis(["Name"])      

        Count
Name         
andrew      3
bill        2
tom         2
john        1
bob         1
jason       1

Comments
  • This works amazingly, thanks! Not only does it produce the right solution, but the runtime is significantly better than the other responses and my own solution.