Group index values based on other index values in a pandas DataFrame

I have a DataFrame with the following structure (A and B are the two levels of the index):

                  Sentence             Label
A             B
"unique ID1"  0   "Sample sentence 1"  jt
"unique ID1"  1   "Sample sentence 2"  jt
"unique ID3"  2   "Sample sentence 3"  edu
"unique ID3"  3   "Sample sentence 4"  edu

I want to get all values of index level B grouped by the value of index level A where Label == jt, and then repeat that for every unique Label value. The preferred return type is key-value pairs, but any other appropriate format would also work.

Valid Example for label == jt:

("unique ID1" : [0,1] )

Valid Example for label == edu:

("unique ID3" : [2,3] )

I already tried many SO questions, but haven't found what I'm looking for precisely.

I also tried this:

sorted_index_df = df.sort_index(inplace = False)

multi_index = sorted_index_df.loc[sorted_index_df["Label"] == "jt"].index

That returns each value of index A together with its corresponding value of index B as a separate tuple.

Ex: ('Labor_&_Delivery_Nurse-APRN__Lidia_Lambert__', 17)

But I want to be able to group all values of index B by values from index A.

Any help is appreciated.

Try this:

To get 'jt' only


unique ID1    [0, 1]
Name: B, dtype: object

To get 'edu' only


unique ID3    [2, 3]
Name: B, dtype: object
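
The code for this answer is missing from the page. The following is only a minimal sketch that produces output of the shape shown above, assuming the frame has index levels A and B and a Label column as in the question. The helper name `b_by_a` is hypothetical and not part of the original answer.

```python
import pandas as pd

# Rebuild the question's frame: MultiIndex (A, B) with columns Sentence and Label.
df = pd.DataFrame(
    {
        "A": ["unique ID1", "unique ID1", "unique ID3", "unique ID3"],
        "B": [0, 1, 2, 3],
        "Sentence": [f"Sample sentence {i}" for i in range(1, 5)],
        "Label": ["jt", "jt", "edu", "edu"],
    }
).set_index(["A", "B"])


def b_by_a(frame, label):
    """Hypothetical helper: map each A value to the list of its B values for one label."""
    # Keep only the rows with the requested label, then turn A and B into columns.
    subset = frame[frame["Label"] == label].reset_index()
    # Collect the B values of each A group into a Python list.
    return subset.groupby("A")["B"].apply(list)


jt = b_by_a(df, "jt")
edu = b_by_a(df, "edu")
print(jt)
print(edu)
```

Calling `.to_dict()` on either result gives the key-value form asked for in the question.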


You can achieve this by using groupby, like below:

import pandas as pd

df = pd.DataFrame(
    [['unique ID1', '0', 'Sample sentence 1', 'jt'],
     ['unique ID1', '1', 'Sample sentence 2', 'jt'],
     ['unique ID3', '2', 'Sample sentence 3', 'edu'],
     ['unique ID3', '3', 'Sample sentence 4', 'edu']],
    columns=('A', 'B', 'Sentence', 'Label'))
# Collect the B values of each (A, Label) pair into a list, then move A back
# into a column so that Label alone is the index.
result = df.groupby(["A", "Label"]).agg({"B": list}).reset_index(level=0)

## you can get the result for jt like
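
The snippet for this step did not survive on the page. One possible way to pull out the jt rows, assuming `result` is built exactly as in this answer (Label as the index, A and B as columns), is sketched here. The names `jt_rows` and `jt_pairs` are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["unique ID1", "0", "Sample sentence 1", "jt"],
        ["unique ID1", "1", "Sample sentence 2", "jt"],
        ["unique ID3", "2", "Sample sentence 3", "edu"],
        ["unique ID3", "3", "Sample sentence 4", "edu"],
    ],
    columns=["A", "B", "Sentence", "Label"],
)
result = df.groupby(["A", "Label"]).agg({"B": list}).reset_index(level=0)

# A boolean mask on the Label index always yields a DataFrame,
# even when only one row matches.
jt_rows = result[result.index == "jt"]

# Pair each A value with its list of B values.
jt_pairs = dict(zip(jt_rows["A"], jt_rows["B"]))
print(jt_pairs)
```

Note that B holds strings in this answer's sample frame, so the lists contain '0' and '1' rather than integers.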



To make the example clearer, I expanded your data sample a little:

                Sentence Label
A   B                         
ID1 0  Sample sentence 1    jt
    1  Sample sentence 2    jt
ID3 2  Sample sentence 3   edu
    3  Sample sentence 4   edu
ID4 4  Sample sentence 5    jt
    5  Sample sentence 6    jt
ID5 6  Sample sentence 7   edu
    7  Sample sentence 8   edu

The aim was to have at least 2 different IDs for each Label.

To compute the result for all Labels and IDs it is enough to run a single instruction:

df.reset_index().groupby(['Label', 'A']).B.apply(list)

For my data, the result is:

Label  A  
edu    ID3    [2, 3]
       ID5    [6, 7]
jt     ID1    [0, 1]
       ID4    [4, 5]
