How to use argmin with groupby in pandas

Suppose I have a pandas dataframe like this:

  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22

For each category (each value of 'cat'), I want to know the position where val is closest to a given target, say 5.5. I can subtract the target and take the absolute value, giving me something like this:

  cat  val  val_delt
0   a    1       4.5
1   a    6       0.5
2   a   12       6.5
3   b    2       3.5
4   b    5       0.5
5   b   11       5.5
6   c    4       1.5
7   c   22      16.5
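
For reference, the frame above can be reproduced with a short sketch like the one below. The variable name target is only illustrative and is not part of the original code.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "cat": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "val": [1, 6, 12, 2, 5, 11, 4, 22],
})

target = 5.5  # illustrative name for the value we want to be close to
# Absolute distance of every val from the target.
df["val_delt"] = (df["val"] - target).abs()
print(df)
```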

But I'm stuck on where to go next. My first thought was to use argmin() with groupby(), but this gives an error:

In [375]: df.groupby('cat').val_delt.argmin()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-375-a2c3dbc43c50> in <module>()
----> 1 df.groupby('cat').val_delt.argmin()

TypeError: 'Series' object is not callable

I could, of course, come up with some horrible hacky thing in standard python where I iterate over all values of cat, then select the subset of my data corresponding to that value, perform the argmin operation then figure out where in the original dataframe that row was. But there's got to be a more elegant way to do this.

What I want as an output is either something like this:

  cat  val
1   a    6      
4   b    5       
6   c    4  

or at least some structure that contains that relevant information (eg - {'a':1, 'b':4, 'c':6} ). I don't care if I get back the index value or the index position, but I need one of the two. I don't care about getting back the value - I can always get that later once I have the index subset.

argmin() is not an agg function, but you can use apply to get the index of the closest row in every group:

txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""

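import io
import pandas as pd

# read the whitespace-separated table; the first column is the index
df = pd.read_csv(io.StringIO(txt), sep=r"\s+", index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
# argmin() gives a position within each group, so map it back to a row label
idx = df.groupby("cat").apply(lambda g: g.index[g.val_delt.argmin()])
df.loc[idx, :]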

output:

  cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5


Just adding to HYRY's answer: you can use idxmin. Example:

import io
import pandas as pd
txt = """  cat  val
0   a    1
1   a    6
2   a   12
3   b    2
4   b    5
5   b   11
6   c    4
7   c   22"""
df = pd.read_csv(io.StringIO(txt), sep=r"\s+", index_col=0)
df["val_delt"] = (df.val - 5.5).abs()
# idxmin() returns the row label of the minimum in each group
idx = df.groupby("cat").apply(lambda g: g.val_delt.idxmin())
df.loc[idx, :]
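
To make the difference between the two methods concrete, here is a minimal sketch on a single group, assuming a reasonably recent pandas (1.0 or later), where Series.argmin returns an integer position and Series.idxmin returns an index label. The Series s is a hand-built copy of group 'b' from the example.

```python
import pandas as pd

# val_delt for group 'b', keeping the original row labels 3, 4, 5.
s = pd.Series([3.5, 0.5, 5.5], index=[3, 4, 5])

pos = s.argmin()    # integer position of the minimum within the group
label = s.idxmin()  # row label of the minimum in the original frame

# The position can always be converted back to a label through the index.
label_from_pos = s.index[pos]
print(pos, label, label_from_pos)
```

On this group the position is 1 while the label is 4, which is why the label is what you want when selecting rows from the full frame with .loc.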


You don't need the apply.

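idxmin is sufficient. You just need to make sure the index holds whatever you want returned for each minimum. Here the index is set to val, so the result is the closest val per category rather than the row label.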

>>> df['val_delt'] = (df.val - 5.5).abs()
>>> df.set_index('val').groupby('cat').idxmin()
     val_delt
cat          
a           6
b           5
c           4


All answers here are somewhat correct, but none of them does it in a concise, beautiful and pythonic way. I leave here a clear way to do this.

>>> indx = df.groupby('cat')['val_delt'].idxmin()
>>> df.loc[indx]

  cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5
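
If you want the dict form mentioned in the question, or integer positions instead of labels, something along these lines should work. This is a sketch, not the only way to do it. The names delta, labels, label_by_cat and positions are illustrative, and Index.get_indexer assumes the frame's index is unique.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": ["a", "a", "a", "b", "b", "b", "c", "c"],
    "val": [1, 6, 12, 2, 5, 11, 4, 22],
})

# Distance to the target, kept as a standalone Series (no helper column).
delta = (df["val"] - 5.5).abs()

# Row label of the smallest distance within each category.
labels = delta.groupby(df["cat"]).idxmin()

# Plain dict of category -> row label.
label_by_cat = labels.to_dict()

# Integer positions of those labels in the original frame.
positions = df.index.get_indexer(labels)

print(label_by_cat)
print(df.loc[labels])
```

With the default 0..n-1 index the labels and positions coincide. They differ as soon as the frame has been filtered or re-indexed.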


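You can replace df.groupby('cat').val_delt.argmin() with df.sort_values(['cat', 'val_delt']).groupby('cat').head(1). Essentially, this sorts the DataFrame by two columns (cat, followed by val_delt), so the row with the smallest val_delt comes first within each category, and head(1) then keeps only that first row per group.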

Code

import pandas as pd

df = pd.DataFrame([['a', 1], ['a', 6], ['a', 12], ['b', 2], ['b', 5], ['b', 11], ['c', 4], ['c', 22]], columns=['cat', 'val'])
df['val_delt'] = (df.val - 5.5).abs()
df.sort_values(['cat', 'val_delt']).groupby('cat').head(1)

Result

  cat  val  val_delt
1   a    6       0.5
4   b    5       0.5
6   c    4       1.5
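
As a quick sanity check, the sort-based approach can be compared against the idxmin approach on the same data. This is only a sketch on the example frame. It says nothing about how ties within a group would be resolved by either method.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1], ["a", 6], ["a", 12], ["b", 2], ["b", 5], ["b", 11], ["c", 4], ["c", 22]],
    columns=["cat", "val"],
)
df["val_delt"] = (df["val"] - 5.5).abs()

# Sort, then keep the first row of every category.
by_sort = df.sort_values(["cat", "val_delt"]).groupby("cat").head(1)

# Look up the row label of the per-category minimum directly.
by_idxmin = df.loc[df.groupby("cat")["val_delt"].idxmin()]

# Both approaches should select the same rows for this data.
same_rows = by_sort.index.tolist() == by_idxmin.index.tolist()
print(same_rows)
```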


Comments
  • this question is very useful, it has many use cases. Thanks!