pandas unique values multiple columns

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
                   'Col3': np.random.random(5)})

What is the best way to return the unique values of 'Col1' and 'Col2'?

The desired output is

'Bob', 'Joe', 'Bill', 'Mary', 'Steve'

pd.unique returns the unique values from an input array, or DataFrame column or index.

The input to this function needs to be one-dimensional, so multiple columns will need to be combined. The simplest way is to select the columns you want and then view the values in a flattened NumPy array. The whole operation looks like this:

>>> pd.unique(df[['Col1', 'Col2']].values.ravel('K'))
array(['Bob', 'Joe', 'Bill', 'Mary', 'Steve'], dtype=object)

Note that ravel() is an array method that returns a view (if possible) of a multidimensional array. The argument 'K' tells the method to flatten the array in the order the elements are stored in memory (pandas typically stores underlying arrays in Fortran-contiguous order, that is, columns before rows). This can be significantly faster than using the method's default 'C' order.
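To make the effect of the order argument concrete, here is a minimal NumPy-only sketch. It assumes a small integer array that is explicitly converted to Fortran order with np.asfortranarray; whether the array behind a given DataFrame selection is actually Fortran-contiguous depends on the pandas version and on how the frame was built, so treat this as an illustration rather than a guarantee.

```python
import numpy as np

# A 3x2 array stored column by column (Fortran order), built explicitly here.
arr = np.asfortranarray(np.array([[1, 2], [3, 4], [5, 6]]))

by_memory = arr.ravel('K')  # follows the memory layout: column 0, then column 1
by_rows = arr.ravel('C')    # row by row, which forces a copy for this layout

print(by_memory)                         # [1 3 5 2 4 6]
print(by_rows)                           # [1 2 3 4 5 6]
print(np.shares_memory(arr, by_memory))  # True: 'K' could return a view here
```

The order of the flattened values also determines the order in which pd.unique reports them, since it keeps values in order of first appearance.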


An alternative way is to select the columns and pass them to np.unique:

>>> np.unique(df[['Col1', 'Col2']].values)
array(['Bill', 'Bob', 'Joe', 'Mary', 'Steve'], dtype=object)

There is no need to use ravel() here because np.unique flattens multidimensional arrays itself. Note that the result comes back sorted rather than in order of appearance. Even so, this is likely to be slower than pd.unique, as it uses a sort-based algorithm rather than a hashtable to identify unique values.
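The two functions also differ in the order of their results, which the following sketch shows on a small hand-made object array (the values are taken from the question; the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

values = np.array(['Bob', 'Joe', 'Bill', 'Mary', 'Joe', 'Steve'], dtype=object)

by_hash = pd.unique(values)  # hashtable-based: keeps order of first appearance
by_sort = np.unique(values)  # sort-based: returns the values sorted

print(list(by_hash))  # ['Bob', 'Joe', 'Bill', 'Mary', 'Steve']
print(list(by_sort))  # ['Bill', 'Bob', 'Joe', 'Mary', 'Steve']
```

If the original order matters, pd.unique is therefore the safer choice.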

The difference in speed is significant for larger DataFrames (especially if there are only a handful of unique values):

>>> df1 = pd.concat([df]*100000, ignore_index=True) # DataFrame with 500000 rows
>>> %timeit np.unique(df1[['Col1', 'Col2']].values)
1 loop, best of 3: 1.12 s per loop

>>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel('K'))
10 loops, best of 3: 38.9 ms per loop

>>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel()) # ravel using C order
10 loops, best of 3: 49.9 ms per loop
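The %timeit lines above only work inside IPython. A rough equivalent as a plain script, using the standard timeit module, might look like the sketch below. It assumes a smaller frame (50,000 rows) so it runs quickly, and it extracts the NumPy array once outside the timed calls, so the numbers are not directly comparable with the ones quoted above. Absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve']})
# 50,000 rows: fewer than the 500,000 used above so the script finishes fast
df1 = pd.concat([df] * 10_000, ignore_index=True)
values = df1[['Col1', 'Col2']].to_numpy()

candidates = {
    "np.unique": lambda: np.unique(values),
    "pd.unique + ravel('K')": lambda: pd.unique(values.ravel('K')),
    "pd.unique + ravel('C')": lambda: pd.unique(values.ravel('C')),
}

timings = {}
for label, func in candidates.items():
    # best of 3 repeats, 3 calls per repeat, reported as time per call
    timings[label] = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f"{label}: {timings[label] * 1e3:.2f} ms per call")
```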


I have set up a DataFrame with a few simple strings in its columns:

>>> df
   a  b
0  a  g
1  b  h
2  d  a
3  e  e

You can concatenate the columns you are interested in and call the unique method on the result:

>>> pandas.concat([df['a'], df['b']]).unique()
array(['a', 'b', 'd', 'e', 'g', 'h'], dtype=object)
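The same idea extends to any number of columns. The sketch below rebuilds the small frame shown above and uses a hypothetical list named columns to pick the columns; ignore_index=True is optional and only avoids duplicate index labels in the intermediate Series.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'd', 'e'],
                   'b': ['g', 'h', 'a', 'e']})

columns = ['a', 'b']  # hypothetical: whichever columns you want to combine

# Stack the selected columns end to end, then deduplicate.
stacked = pd.concat([df[c] for c in columns], ignore_index=True)
unique_values = stacked.unique()

print(list(unique_values))  # ['a', 'b', 'd', 'e', 'g', 'h']
```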


In [5]: set(df.Col1).union(set(df.Col2))
Out[5]: {'Bill', 'Bob', 'Joe', 'Mary', 'Steve'}

Or:

set(df.Col1) | set(df.Col2)
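Sets are unordered, so the order of the printed names is not guaranteed. A small sketch of the same approach for an arbitrary list of columns, sorted at the end for a stable result (the names columns and ordered are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve']})

columns = ['Col1', 'Col2']  # hypothetical: any list of column names

# Union of the values of every selected column.
unique_names = set().union(*(df[c] for c in columns))

# Sort only if a deterministic order is needed.
ordered = sorted(unique_names)
print(ordered)  # ['Bill', 'Bob', 'Joe', 'Mary', 'Steve']
```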


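With NumPy 1.13+, np.unique accepts an axis argument. Without it, a multi-column input is implicitly flattened and the unique individual values are returned. With axis=0, the unique rows are returned instead, that is, the unique combinations of the selected columns.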

import numpy as np

np.unique(df[['col1', 'col2']], axis=0)

This change was introduced Nov 2016: https://github.com/numpy/numpy/commit/1f764dbff7c496d6636dc0430f083ada9ff4e4be
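The difference between the flattened and the row-wise result is easiest to see side by side. The sketch below uses a small numeric frame with assumed column names col1 and col2. Numeric columns are chosen on purpose, because as far as I know axis is not supported for object-dtype (string) arrays in at least some NumPy releases. DataFrame.drop_duplicates is shown as the pandas-native way to get unique rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1, 2],
                   'col2': [3, 4, 3, 5]})
arr = df[['col1', 'col2']].to_numpy()

flat = np.unique(arr)          # unique individual values across both columns
rows = np.unique(arr, axis=0)  # unique rows (combinations), sorted

# pandas equivalent for unique rows; keeps first occurrences in original order
pairs = df[['col1', 'col2']].drop_duplicates()

print(flat)  # [1 2 3 4 5]
print(rows)  # rows [1 3], [2 4], [2 5]
print(pairs)
```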


Non-pandas solution: using set().

import pandas as pd
import numpy as np

df = pd.DataFrame({'Col1' : ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
              'Col2' : ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
               'Col3' : np.random.random(5)})

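print(df)

# Series.append was removed in pandas 2.0; pd.concat does the same job
print(set(pd.concat([df.Col1, df.Col2]).values))

Output (Col3 is random, and the order of the set may differ):

   Col1   Col2      Col3
0   Bob    Joe  0.201079
1   Joe  Steve  0.703279
2  Bill    Bob  0.722724
3  Mary    Bob  0.093912
4   Joe  Steve  0.766027
{'Steve', 'Bob', 'Bill', 'Joe', 'Mary'}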


Comments
  • See also unique combinations of values in selected columns in pandas data frame and count for a different but related question. The selected answer there uses df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
  • How do you get a dataframe back instead of an array?
  • @Lisle: both methods return a NumPy array, so you'll have to construct it manually, e.g., pd.DataFrame(unique_values). There's no good way to get back a DataFrame directly.
  • @Lisle since he has used pd.unique it returns a numpy.ndarray as a final output. Is this what you were asking?
  • This does not work. Throws unorderable types: float() < str()
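
Two of the comments above can be addressed with a short sketch. It assumes a frame in which one column contains a missing value (NaN) next to strings; in that situation the sort-based np.unique can fail with a type-comparison error like the one quoted in the last comment, so the sketch drops missing values first and uses the pandas unique method. The result is then wrapped in a DataFrame with a hypothetical column name, name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', np.nan],
                   'Col2': ['Joe', 'Steve', 'Bob']})

# Stack the columns, drop missing values, then deduplicate.
stacked = pd.concat([df['Col1'], df['Col2']], ignore_index=True)
unique_names = stacked.dropna().unique()

# Wrap the result to get a DataFrame back instead of an array.
result = pd.DataFrame({'name': unique_names})
print(result)
```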