## Select One of each in a pandas grouping

pandas groupby agg

pandas groupby multiple columns

pandas groupby apply

pandas groupby count

pandas groupby aggregate multiple columns

pandas aggregate

pandas groupby transform

I'm trying to create all of the possible combinations of pairings of players to assign into 4 person golf teams based on handicap type A, B, C, or D.

I've tried various itertools methods such as combinations and permutations but can't figure out the right approach.

from itertools import combinations, product, permutations g = player_df.groupby(by = 'hcp_ABCD') teams_listoflists = [group[1].index for group in g] teams_combo_ndx = [player for player in permutations(teams_listoflists, 4)]

Here is my pandas table:

handicap name hcp_ABCD 0 24 Player1 D 1 21 Player2 D 2 8 Player3 B 3 14 Player4 C 4 20 Player5 D 5 13 Player6 C 6 -1 Player7 A 7 5 Player8 A 8 8 Player9 B 9 6 Player10 B 10 20 Player11 D 11 15 Player12 C 12 0 Player13 A 13 12 Player14 C 14 0 Player15 A 15 10 Player16 B

i would like the output to be all combinations (without duplicates) of player combinations (teams) such that each team has a type A, B, C, and D on each. This output can be a similar table as above grouped by "options."

Edit: Am adding this output example for clarity.

A Player B Player C Player D Player option 1 team1 Player7 Player3 Player4 Player1 team2 Player8 Player9 Player6 Player2 team3 Player13 Player10 Player12 Player5 team4 Player15 Player16 Player14 Player11 option 2 team1 Player7 Player16 Player4 Player1 team2 Player8 Player3 Player6 Player2 team3 Player13 Player9 Player12 Player5 team4 Player15 Player10 Player14 Player11 ... A Player B Player C Player D Player option n team1 Player7 Player3 Player4 Player11 team2 Player8 Player9 Player6 Player1 team3 Player13 Player10 Player12 Player2 team4 Player15 Player16 Player14 Player5

The point of the above is that I'm trying to find a generator that cycles through all combinations of player in each handicap group so that the combination of options of teams is clear.

**Edit #2**
I've determined that this code produces a combination of all of the potential team combinations:

g = df.groupby(by = 'hcp_ABCD') combinations = [list(group[1].index) for group in g]

This creates a list of lists with the A Players in list[0], B Players in list[1], etc.

And this gets an indexer for all possible combinations of teams:

from itertools import product options = [option for option in product(*combinations)]

But, how to assign these out into the "options" (see above example) and ensure no duplication is what I'm stuck on.

**Edit #3** A simpler version (way to think about this problems) is to use the following sets:

A = ['A1', 'A2', 'A3', 'A4'] B = ['B1', 'B2', 'B3', 'B4'] C = ['C1', 'C2', 'C3', 'C4'] D= ['D1', 'D2', 'D3', 'D4']

This essentially does what the groupby does above (grouping by hcp_ABCD) but names each "A Player", "B Player", etc.

possible_combinations of teams:

team_combinations = [team for team in product(A, B, C, D)]

then the next trick is to assign these onto combinations of 4 teams with no duplication of players.

Thanks for clarifying about the expected result. Here is the my answer which I tested. It may not be the exact format of your expected result but I leave it to you to fix it.

import pandas as pd def is_duplicate_team(team, group): '''check if an option already exists''' return any(group == t for t in team) def is_player_exists(group, arr): '''check if a player exists in a group''' return any(x in g for g in group for x in arr) df = [ (24 ,'Player1','D'), (21 ,'Player2','D'), (8 ,'Player3','B'), (14 ,'Player4','C'), (20 ,'Player5','D'), (13 ,'Player6','C'), (-1 ,'Player7','A'), (5 ,'Player8','A'), (8 ,'Player9','B'), (6 ,'Player10','B'), (20 ,'Player11','D'), (15 ,'Player12','C'), (0 ,'Player13','A'), (12 ,'Player14','C'), (0 ,'Player15','A'), (10 ,'Player16','B')] df = pd.DataFrame(df, columns=['handicap', 'name', 'hcp_ABCD']) from itertools import product grouped = df.groupby('hcp_ABCD')['name'].apply(list).reset_index() df_name = [n for n in grouped.name] df_comb = [p for p in product(*df_name)] # below code will get all combinations of groups and for a team having all players teams=[] for i in df_comb[:-1]: group=[i] for j in df_comb[1:]: if not is_player_exists(group, j): group.append(j) if len(group) == 4: if not is_duplicate_team(teams, group): teams.append(group) continue # below code will print the output similar to what you expected i=0 for t in teams: i+=1 print('option: ', str(i) ) for p in t: print(p)

**Group by: split-apply-combine — pandas 1.1.0 documentation,** DataFrame column selection in GroupBy�. Once you have created the GroupBy object from a DataFrame, you might want to do something different for each of the � Note: There’s one more tiny difference in the Pandas GroupBy vs SQL comparison here: in the Pandas version, some states only display one gender. As we developed this tutorial, we encountered a small but tricky bug in the Pandas source that doesn’t handle the observed parameter well with certain types of data.

I made a suggestion in the comments. Here is an implementation:

import pandas as pd from functools import reduce data = [ (24,'Player1','D'), (21,'Player2','D'), (8,'Player3','B'), (8,'Player4','B'), (14,'Player5','C'), (13,'Player6','C'), (-1,'Player7','A'), (5,'Player8','A') ] df = pd.DataFrame( data, columns=['handicap', 'name', 'hcp_ABCD'] ) dfs = [ grp_df.drop(columns="hcp_ABCD") .rename(columns={"name": f"player_{hndcp}", "handicap": f"handicap_{hndcp}"}) for hndcp, grp_df in df.assign(key=1) .groupby("hcp_ABCD") ] result = reduce( lambda left, right: left.merge(right, how="outer", on="key"), dfs ).drop(columns="key") print(result)

Output:

handicap_A player_A handicap_B player_B handicap_C player_C handicap_D player_D 0 -1 Player7 8 Player3 14 Player5 24 Player1 1 -1 Player7 8 Player3 14 Player5 21 Player2 2 -1 Player7 8 Player3 13 Player6 24 Player1 3 -1 Player7 8 Player3 13 Player6 21 Player2 4 -1 Player7 8 Player4 14 Player5 24 Player1 5 -1 Player7 8 Player4 14 Player5 21 Player2 6 -1 Player7 8 Player4 13 Player6 24 Player1 7 -1 Player7 8 Player4 13 Player6 21 Player2 8 5 Player8 8 Player3 14 Player5 24 Player1 9 5 Player8 8 Player3 14 Player5 21 Player2 10 5 Player8 8 Player3 13 Player6 24 Player1 11 5 Player8 8 Player3 13 Player6 21 Player2 12 5 Player8 8 Player4 14 Player5 24 Player1 13 5 Player8 8 Player4 14 Player5 21 Player2 14 5 Player8 8 Player4 13 Player6 24 Player1 15 5 Player8 8 Player4 13 Player6 21 Player2

**How to Get Top N Rows with in Each Group in Pandas?,** Note this is not the same as top N rows according to one variable in the Now we are ready to select N rows from each group, in this example� Any groupby operation involves one of the following operations on the original object. They are − Splitting the Object. Applying a function. Combining the results. In many situations, we split the data into sets and we apply some functionality on each subset.

The following approach is uses a cartesian product and then groups twice to distribute the players into teams with a set of unique handicaps.

import pandas as pd from pandas.compat import StringIO print(pd.__version__) pd.options.display.max_rows = 664 csvdata = StringIO("""handicap,name,hcp_ABCD 24,Player1,D 21,Player2,D 8,Player3,B 14,Player4,C 20,Player5,D 13,Player6,C -1,Player7,A 5,Player8,A 8,Player9,B 6,Player10,B 20,Player11,D 15,Player12,C 0,Player13,A 12,Player14,C 0,Player15,A 10,Player16,B""") df=pd.read_csv(csvdata) # Generate all possible groups # https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas def cartesian_product(left, right): return (left.assign(key=1).merge(right.assign(key=1), on='key').drop('key', 1)) def distribute_players(x): x['distribute'] = range(0, 4) return x df = cartesian_product(df, df.copy()) df = df.groupby(['name_x', 'hcp_ABCD_y']).apply(distribute_players) df['team'] = df.groupby(['name_x', 'distribute']).ngroup() print(df[['handicap_y','name_y','hcp_ABCD_y','team']].sort_values(['team']))

handicap_y name_y hcp_ABCD_y team 0 24 Player1 D 0 2 8 Player3 B 0 3 14 Player4 C 0 6 -1 Player7 A 0 1 21 Player2 D 1 5 13 Player6 C 1 7 5 Player8 A 1 8 8 Player9 B 1

**A Guide on Using Pandas Groupby to Group Data for Easier ,** The basic Python groupby syntax typically consists of clauses such as from, group by, select, and aggregation. To understand what each of� Pandas Data Aggregation #1: .count() Counting the number of the animals is as easy as applying a count function on the zoo dataframe: zoo.count() Oh, hey, what are all these lines? Actually, the .count() function counts the number of values in each column. In the case of the zoo dataset, there were 3 columns, and each of them had 22 values in it.

**Pandas GroupBy: Your Guide to Grouping Data in Python – Real ,** You can add these to a startup file to set them automatically each SELECT state, count(name) FROM df GROUP BY state ORDER BY state;. By “group by” we are referring to a process involving one or more of the following steps: Splitting the data into groups based on some criteria. Applying a function to each group independently. Combining the results into a data structure. Out of these, the split step is the most straightforward.

**Pandas GroupBy,** Applying : It is a process in which we apply a function to each group independently; Combining : It is a process Pandas datasets can be split into any of their objects. Now we select an object grouped on multiple columns. One aspect that I’ve recently been exploring is the task of grouping large data frames by different variables, and applying summary functions on each group. This is accomplished in Pandas using the “groupby()” and “agg()” functions of Panda’s DataFrame objects.

**“Group By” in SQL and Python: a Comparison,** Pandas will return a grouped Series when you select a single column, and a We can repeat this process for each of the aggregations we are looking to� Possible duplicate of Pandas dataframe get first row of each group – ssoler Sep 26 '16 at 15:51 1 "top-n" doesn't mean "the n topmost/first/head rows", like you're looking for!

##### Comments

- Am now thinking that itertools.product(g) may be the key to this. this creates an exhaustive list of ABCD teams. But, it doesn't assign them into options without player duplicates like the above.
- @PMende, no it isn't same. I'm not trying to sample. I'm trying to get exhuaustive list. The question you point to is getting 30% sampling.
- Ah. I see what you mean. In this case, break each of the handicap types into sub dataframes (by handicap type), then do a cross join of each of those tables onto each other (you'll have to add a dummy column of the same value in each to do this).
- @Pmende, -- thinking about your idea here. Maybe iterate through itertools.product(*g), add the dummy column for "option x" and then cross join the tables? Then, I can do a multi-index from there?
- I've provided an implementation of my suggestion below.
- I think this does include an exhaustive option for all team configurations. But it doesn't assign the teams into 4 teams such that each combination of team options is shown. Am going to update my question to be more clear on this point.
- Well dang, you are right. This does produce a result. Was hoping to get this into a multi-index dataframe but I agree, this does seem to do it.
- I do however, think that @Pmende is getting to maybe a slightly more elegant (ie. less code) way to do this.
- I had to think about this for a while using a simplified approach using 4 sets of groups to select from A = (A1, A2, A3, A4), B = (B1, B2, B3, B4) ... D = (D1, D2, D3, D4). I used the [p for p in product(A, B, C, D)] to get the team combos and then run it through your combinations for loop. I wonder if there is a simpler way using itertools.combinations([A,B,C,D],4) but it does seem to work. am checking this as correct answer.
- Thanks for clarifying @PMende. But, if I get this right, your dfs dataframe includes 4 dataframes in it of 4 players. It would seem my options list above example would be as long as 4! (factorial). If each team has 4 players and each team has 1 from each handicap group, you would have a potential for a lot more combinations that what your dfs shows. am not sure total combos but somthing like 4 option for B player, 4 options for C player, and 4 options of D player for each of A player.
- @leeprevost I only showed the first 5 rows, because there are 256 of them.
- @leeprevost I've edited the sample data and shown a full result of 16 combinations for each of the handicap types having 2 players.
- I've checked this and it works. But, I'm not sure I could play back exactly HOW it works. I think I like the anonymous better as I can follow it. but this does work.
- Am not looking to sample. Looking for an exhaustive combination of options of team configurations such that each team is 'balanced' meaning 1 A, 1B, 1C, and 1D player on each.