Assign a group to values
I have a pandas dataframe with 20 columns and 16404 rows. One of the columns is ['age']. I would like to be able to plot other metrics such as ['Income'] over a category of age. For example: what is the income for all males under 20 years old, or for females aged between 20 and 40?
I tried this type of condition:
for i in range(len(df['age'])):
    if df['age'][i]<25 and df['Gender'][i]==1:
        df['group'][i]=1
But I get the following error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Could you please show me how to assign a group to a row depending on these conditions?
All the series are int64.
- The ambiguity error can be solved by
  (df['age'] < 25) & (df['Gender'] == 1)
  Note that I used & rather than and.
- If you did that inside your loop, you would be evaluating an entire column and assigning an entire column for every row, which is very wasteful.
Do this instead to get booleans:
df['group'] = df['age'].lt(25) & df['Gender'].eq(1)
You can convert that to 0/1 integers in many ways, for example:
df['group'] = df['group'].astype(int)
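As a minimal sketch of the difference between the two operators, the snippet below builds a small made-up dataframe (the values are hypothetical; only the column names come from the question), shows that `and` raises the ambiguity error on whole columns, and then builds the same group column with `&`.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "age": [18, 30, 22, 45],
    "Gender": [1, 1, 2, 1],
})

# `and` asks for the truth value of a whole Series, which pandas refuses to give.
try:
    (df["age"] < 25) and (df["Gender"] == 1)
    raised = False
except ValueError:
    raised = True

# `&` combines the two boolean masks element by element.
mask = (df["age"] < 25) & (df["Gender"] == 1)
df["group"] = mask.astype(int)
```

The parentheses around each comparison matter, because `&` binds more tightly than `<` and `==`.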
You should use the apply method instead (see the documentation):
def your_function(row):
    if row['age']<25 and row['Gender']==1:
        return 1
    else:
        return 0

df['group'] = df.apply(your_function,axis=1)
You can use np.where:
import numpy as np

cond_1 = df['age'] < 25
cond_2 = df['Gender'] == 1
df['group'] = np.where(cond_1 & cond_2, 1, 0)
It will assign 1 where both conditions are satisfied and 0 everywhere else.
Taking into account your comments, this method doesn't have to be binary. You can include as many conditions as you need, and you can substitute the 1 for any int or str you want. Moreover, you can change the
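One way to extend this to several groups is np.select, which takes a list of conditions and a matching list of labels. The sketch below is only an illustration under stated assumptions: the data is made up, the label names are hypothetical, and it assumes Gender == 1 means male and Gender == 2 means female, which the question does not state.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the Income values are invented for illustration.
df = pd.DataFrame({
    "age": [18, 30, 22, 45, 35],
    "Gender": [1, 2, 2, 1, 2],
    "Income": [1000, 3000, 1500, 5000, 4000],
})

# Assumption: Gender 1 = male, Gender 2 = female.
conditions = [
    (df["age"] < 20) & (df["Gender"] == 1),
    (df["age"] >= 20) & (df["age"] <= 40) & (df["Gender"] == 2),
]
labels = ["male_under_20", "female_20_to_40"]

# The first matching condition wins; rows matching none get the default label.
df["group"] = np.select(conditions, labels, default="other")

# Average income per group, ready to inspect or plot.
mean_income = df.groupby("group")["Income"].mean()
```

The resulting Series could then be plotted, for instance with mean_income.plot(kind="bar") if matplotlib is installed.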