Select One of each in a pandas grouping

pandas groupby
pandas groupby agg
pandas groupby multiple columns
pandas groupby apply
pandas groupby count
pandas groupby aggregate multiple columns
pandas aggregate
pandas groupby transform

I'm trying to create all of the possible combinations of pairings of players to assign into 4 person golf teams based on handicap type A, B, C, or D.

I've tried various itertools methods such as combinations and permutations but can't figure out the right approach.

from itertools import combinations, product, permutations
g = player_df.groupby(by = 'hcp_ABCD')
teams_listoflists = [group[1].index for group in g]
teams_combo_ndx = [player for player in permutations(teams_listoflists, 4)]

Here is my pandas table:

        handicap      name hcp_ABCD
0         24   Player1        D
1         21   Player2        D
2          8   Player3        B
3         14   Player4        C
4         20   Player5        D
5         13   Player6        C
6         -1   Player7        A
7          5   Player8        A
8          8   Player9        B
9          6  Player10        B
10        20  Player11        D
11        15  Player12        C
12         0  Player13        A
13        12  Player14        C
14         0  Player15        A
15        10  Player16        B

i would like the output to be all combinations (without duplicates) of player combinations (teams) such that each team has a type A, B, C, and D on each. This output can be a similar table as above grouped by "options."

Edit: Am adding this output example for clarity.

                       A Player     B Player     C Player   D Player
    option 1  team1    Player7      Player3      Player4    Player1
              team2    Player8      Player9      Player6    Player2
              team3    Player13     Player10     Player12   Player5
              team4    Player15     Player16     Player14   Player11

    option 2  team1    Player7      Player16     Player4    Player1
              team2    Player8      Player3      Player6    Player2
              team3    Player13     Player9      Player12   Player5
              team4    Player15     Player10     Player14   Player11

    ...


                       A Player     B Player     C Player   D Player
    option n  team1    Player7      Player3      Player4    Player11
              team2    Player8      Player9      Player6    Player1
              team3    Player13     Player10     Player12   Player2
              team4    Player15     Player16     Player14   Player5

The point of the above is that I'm trying to find a generator that cycles through all combinations of player in each handicap group so that the combination of options of teams is clear.

Edit #2 I've determined that this code produces a combination of all of the potential team combinations:

g = df.groupby(by = 'hcp_ABCD')
combinations = [list(group[1].index) for group in g]

This creates a list of lists with the A Players in list[0], B Players in list[1], etc.

And this gets an indexer for all possible combinations of teams:

from itertools import product
options = [option for option in product(*combinations)]

But, how to assign these out into the "options" (see above example) and ensure no duplication is what I'm stuck on.

Edit #3 A simpler version (way to think about this problems) is to use the following sets:

A = ['A1', 'A2', 'A3', 'A4']
B = ['B1', 'B2', 'B3', 'B4']
C = ['C1', 'C2', 'C3', 'C4']
D=  ['D1', 'D2', 'D3', 'D4']

This essentially does what the groupby does above (grouping by hcp_ABCD) but names each "A Player", "B Player", etc.

possible_combinations of teams:

team_combinations = [team for team in product(A, B, C, D)]

then the next trick is to assign these onto combinations of 4 teams with no duplication of players.

Thanks for clarifying about the expected result. Here is the my answer which I tested. It may not be the exact format of your expected result but I leave it to you to fix it.

import pandas as pd
def is_duplicate_team(team, group):
    '''check if an option already exists'''
    return any(group == t for t in team)
def is_player_exists(group, arr):
    '''check if a player exists in a group'''
    return any(x in g for g in group for x in arr)

df = [         (24   ,'Player1','D'),
         (21   ,'Player2','D'),
          (8   ,'Player3','B'),
         (14   ,'Player4','C'),
         (20   ,'Player5','D'),
         (13   ,'Player6','C'),
         (-1   ,'Player7','A'),
          (5   ,'Player8','A'),
          (8   ,'Player9','B'),
          (6  ,'Player10','B'),
        (20  ,'Player11','D'),
        (15  ,'Player12','C'),
         (0  ,'Player13','A'),
        (12  ,'Player14','C'),
         (0  ,'Player15','A'),
        (10  ,'Player16','B')]
df = pd.DataFrame(df, columns=['handicap', 'name', 'hcp_ABCD'])
from itertools import product
grouped = df.groupby('hcp_ABCD')['name'].apply(list).reset_index()
df_name = [n for n in grouped.name]
df_comb = [p for p in product(*df_name)]

# below code will get all combinations of groups and for a team having all players
teams=[]
for i in df_comb[:-1]:
    group=[i] 
    for j in df_comb[1:]: 
        if not is_player_exists(group, j):
            group.append(j)
        if len(group) == 4:
            if not is_duplicate_team(teams, group):
                teams.append(group)
            continue

# below code will print the output similar to what you expected
i=0
for t in teams:
    i+=1
    print('option: ', str(i) )
    for p in t:
        print(p)

Group by: split-apply-combine — pandas 1.1.0 documentation, DataFrame column selection in GroupBy�. Once you have created the GroupBy object from a DataFrame, you might want to do something different for each of the � Note: There’s one more tiny difference in the Pandas GroupBy vs SQL comparison here: in the Pandas version, some states only display one gender. As we developed this tutorial, we encountered a small but tricky bug in the Pandas source that doesn’t handle the observed parameter well with certain types of data.

I made a suggestion in the comments. Here is an implementation:

import pandas as pd
from functools import reduce

data = [
    (24,'Player1','D'),
    (21,'Player2','D'),
    (8,'Player3','B'),
    (8,'Player4','B'),
    (14,'Player5','C'),
    (13,'Player6','C'),
    (-1,'Player7','A'),
    (5,'Player8','A')
]
df = pd.DataFrame(
    data,
    columns=['handicap', 'name', 'hcp_ABCD']
)

dfs = [
    grp_df.drop(columns="hcp_ABCD")
          .rename(columns={"name": f"player_{hndcp}",
                           "handicap": f"handicap_{hndcp}"})
    for hndcp, grp_df in df.assign(key=1)
                           .groupby("hcp_ABCD")
]
result = reduce(
    lambda left, right: left.merge(right, how="outer", on="key"),
    dfs
).drop(columns="key")
print(result)

Output:

    handicap_A player_A  handicap_B player_B  handicap_C player_C  handicap_D player_D
0           -1  Player7           8  Player3          14  Player5          24  Player1
1           -1  Player7           8  Player3          14  Player5          21  Player2
2           -1  Player7           8  Player3          13  Player6          24  Player1
3           -1  Player7           8  Player3          13  Player6          21  Player2
4           -1  Player7           8  Player4          14  Player5          24  Player1
5           -1  Player7           8  Player4          14  Player5          21  Player2
6           -1  Player7           8  Player4          13  Player6          24  Player1
7           -1  Player7           8  Player4          13  Player6          21  Player2
8            5  Player8           8  Player3          14  Player5          24  Player1
9            5  Player8           8  Player3          14  Player5          21  Player2
10           5  Player8           8  Player3          13  Player6          24  Player1
11           5  Player8           8  Player3          13  Player6          21  Player2
12           5  Player8           8  Player4          14  Player5          24  Player1
13           5  Player8           8  Player4          14  Player5          21  Player2
14           5  Player8           8  Player4          13  Player6          24  Player1
15           5  Player8           8  Player4          13  Player6          21  Player2

How to Get Top N Rows with in Each Group in Pandas?, Note this is not the same as top N rows according to one variable in the Now we are ready to select N rows from each group, in this example� Any groupby operation involves one of the following operations on the original object. They are − Splitting the Object. Applying a function. Combining the results. In many situations, we split the data into sets and we apply some functionality on each subset.

The following approach is uses a cartesian product and then groups twice to distribute the players into teams with a set of unique handicaps.

import pandas as pd
from pandas.compat import StringIO

print(pd.__version__)
pd.options.display.max_rows = 664

csvdata = StringIO("""handicap,name,hcp_ABCD
24,Player1,D
21,Player2,D
8,Player3,B
14,Player4,C
20,Player5,D
13,Player6,C
-1,Player7,A
5,Player8,A
8,Player9,B
6,Player10,B
20,Player11,D
15,Player12,C
0,Player13,A
12,Player14,C
0,Player15,A
10,Player16,B""")

df=pd.read_csv(csvdata)

# Generate all possible groups
# https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas
def cartesian_product(left, right):
    return (left.assign(key=1).merge(right.assign(key=1), on='key').drop('key', 1))

def distribute_players(x):
    x['distribute'] = range(0, 4)
    return x

df = cartesian_product(df, df.copy())
df = df.groupby(['name_x', 'hcp_ABCD_y']).apply(distribute_players)
df['team'] = df.groupby(['name_x', 'distribute']).ngroup()
print(df[['handicap_y','name_y','hcp_ABCD_y','team']].sort_values(['team']))

     handicap_y    name_y hcp_ABCD_y  team
0            24   Player1          D     0
2             8   Player3          B     0
3            14   Player4          C     0
6            -1   Player7          A     0
1            21   Player2          D     1
5            13   Player6          C     1
7             5   Player8          A     1
8             8   Player9          B     1

A Guide on Using Pandas Groupby to Group Data for Easier , The basic Python groupby syntax typically consists of clauses such as from, group by, select, and aggregation. To understand what each of� Pandas Data Aggregation #1: .count() Counting the number of the animals is as easy as applying a count function on the zoo dataframe: zoo.count() Oh, hey, what are all these lines? Actually, the .count() function counts the number of values in each column. In the case of the zoo dataset, there were 3 columns, and each of them had 22 values in it.

Pandas GroupBy: Your Guide to Grouping Data in Python – Real , You can add these to a startup file to set them automatically each SELECT state, count(name) FROM df GROUP BY state ORDER BY state;. By “group by” we are referring to a process involving one or more of the following steps: Splitting the data into groups based on some criteria. Applying a function to each group independently. Combining the results into a data structure. Out of these, the split step is the most straightforward.

Pandas GroupBy, Applying : It is a process in which we apply a function to each group independently; Combining : It is a process Pandas datasets can be split into any of their objects. Now we select an object grouped on multiple columns. One aspect that I’ve recently been exploring is the task of grouping large data frames by different variables, and applying summary functions on each group. This is accomplished in Pandas using the “groupby()” and “agg()” functions of Panda’s DataFrame objects.

“Group By” in SQL and Python: a Comparison, Pandas will return a grouped Series when you select a single column, and a We can repeat this process for each of the aggregations we are looking to� Possible duplicate of Pandas dataframe get first row of each group – ssoler Sep 26 '16 at 15:51 1 "top-n" doesn't mean "the n topmost/first/head rows", like you're looking for!

Comments
  • Am now thinking that itertools.product(g) may be the key to this. this creates an exhaustive list of ABCD teams. But, it doesn't assign them into options without player duplicates like the above.
  • @PMende, no it isn't same. I'm not trying to sample. I'm trying to get exhuaustive list. The question you point to is getting 30% sampling.
  • Ah. I see what you mean. In this case, break each of the handicap types into sub dataframes (by handicap type), then do a cross join of each of those tables onto each other (you'll have to add a dummy column of the same value in each to do this).
  • @Pmende, -- thinking about your idea here. Maybe iterate through itertools.product(*g), add the dummy column for "option x" and then cross join the tables? Then, I can do a multi-index from there?
  • I've provided an implementation of my suggestion below.
  • I think this does include an exhaustive option for all team configurations. But it doesn't assign the teams into 4 teams such that each combination of team options is shown. Am going to update my question to be more clear on this point.
  • Well dang, you are right. This does produce a result. Was hoping to get this into a multi-index dataframe but I agree, this does seem to do it.
  • I do however, think that @Pmende is getting to maybe a slightly more elegant (ie. less code) way to do this.
  • I had to think about this for a while using a simplified approach using 4 sets of groups to select from A = (A1, A2, A3, A4), B = (B1, B2, B3, B4) ... D = (D1, D2, D3, D4). I used the [p for p in product(A, B, C, D)] to get the team combos and then run it through your combinations for loop. I wonder if there is a simpler way using itertools.combinations([A,B,C,D],4) but it does seem to work. am checking this as correct answer.
  • Thanks for clarifying @PMende. But, if I get this right, your dfs dataframe includes 4 dataframes in it of 4 players. It would seem my options list above example would be as long as 4! (factorial). If each team has 4 players and each team has 1 from each handicap group, you would have a potential for a lot more combinations that what your dfs shows. am not sure total combos but somthing like 4 option for B player, 4 options for C player, and 4 options of D player for each of A player.
  • @leeprevost I only showed the first 5 rows, because there are 256 of them.
  • @leeprevost I've edited the sample data and shown a full result of 16 combinations for each of the handicap types having 2 players.
  • I've checked this and it works. But, I'm not sure I could play back exactly HOW it works. I think I like the anonymous better as I can follow it. but this does work.
  • Am not looking to sample. Looking for an exhaustive combination of options of team configurations such that each team is 'balanced' meaning 1 A, 1B, 1C, and 1D player on each.