## pandas unique values multiple columns

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                       'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
                       'Col3': np.random.random(5)})

What is the best way to return the unique values of 'Col1' and 'Col2'?

The desired output is

'Bob', 'Joe', 'Bill', 'Mary', 'Steve'

`pd.unique` returns the unique values from an input array, or from a DataFrame column or index.

The input to this function needs to be one-dimensional, so multiple columns will need to be combined. The simplest way is to select the columns you want and then view the values in a flattened NumPy array. The whole operation looks like this:

    >>> pd.unique(df[['Col1', 'Col2']].values.ravel('K'))
    array(['Bob', 'Joe', 'Bill', 'Mary', 'Steve'], dtype=object)

Note that `ravel()` is an array method that returns a view (if possible) of a multidimensional array. The argument `'K'` tells the method to flatten the array in the order the elements are stored in memory (pandas typically stores underlying arrays in Fortran-contiguous order; columns before rows). This can be significantly faster than using the method's default `'C'` order.
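The effect of the order argument can be seen on a small array. The following is a minimal sketch using a plain NumPy array forced into Fortran order with `np.asfortranarray`. This is an assumption standing in for the array pandas returns; the actual memory layout of `.values` can vary with the pandas version and the column dtypes.

```python
import numpy as np

# A 3x2 array stored column by column (Fortran-contiguous),
# used here as a stand-in for a two-column DataFrame's values.
arr = np.asfortranarray([[1, 2], [3, 4], [5, 6]])

flat_k = arr.ravel('K')  # memory order: first column, then second column
flat_c = arr.ravel('C')  # row-major order: needs a copy for this array

print(flat_k)                          # [1 3 5 2 4 6]
print(flat_c)                          # [1 2 3 4 5 6]
print(np.shares_memory(arr, flat_k))   # True: a view, no data copied
print(np.shares_memory(arr, flat_c))   # False: a new array was allocated
```

Because `'K'` follows the memory layout, the order of the flattened values (and therefore the order in which `pd.unique` reports them) can differ from the `'C'` result.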

An alternative way is to select the columns and pass them to `np.unique`:

    >>> np.unique(df[['Col1', 'Col2']].values)
    array(['Bill', 'Bob', 'Joe', 'Mary', 'Steve'], dtype=object)

There is no need to use `ravel()` here as the function handles multidimensional arrays. Even so, this is likely to be slower than `pd.unique` as it uses a sort-based algorithm rather than a hashtable to identify unique values.
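A visible side effect of the two algorithms is the order of the result: `pd.unique` keeps values in order of first appearance, while `np.unique` returns them sorted. A small sketch with made-up sample data (the `names` array is an assumption for illustration only):

```python
import numpy as np
import pandas as pd

# Made-up sample data, not taken from the question's DataFrame.
names = np.array(['Joe', 'Bob', 'Joe', 'Bill', 'Bob'], dtype=object)

by_appearance = pd.unique(names)  # hashtable-based, keeps first-seen order
sorted_values = np.unique(names)  # sort-based, returns sorted values

print(list(by_appearance))  # ['Joe', 'Bob', 'Bill']
print(list(sorted_values))  # ['Bill', 'Bob', 'Joe']
```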

The difference in speed is significant for larger DataFrames (especially if there are only a handful of unique values):

    >>> df1 = pd.concat([df]*100000, ignore_index=True)  # DataFrame with 500000 rows

    >>> %timeit np.unique(df1[['Col1', 'Col2']].values)
    1 loop, best of 3: 1.12 s per loop

    >>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel('K'))
    10 loops, best of 3: 38.9 ms per loop

    >>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel())  # ravel using C order
    10 loops, best of 3: 49.9 ms per loop

I have set up a `DataFrame` with a few simple strings in its columns:

    >>> df
       a  b
    0  a  g
    1  b  h
    2  d  a
    3  e  e

You can concatenate the columns you are interested in and call the `unique` method:

    >>> pandas.concat([df['a'], df['b']]).unique()
    array(['a', 'b', 'd', 'e', 'g', 'h'], dtype=object)
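Applied to the DataFrame from the question, the concatenation approach could look like the sketch below. It also shows one way to keep the result as a DataFrame rather than an array; the column name `'name'` is a hypothetical choice made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve']})

# Stack the two columns into one Series, then collect the distinct values.
combined = pd.concat([df['Col1'], df['Col2']], ignore_index=True)
unique_values = combined.unique()

# To stay in pandas, drop duplicates instead and wrap the Series in a frame.
# 'name' is a hypothetical column name chosen only for this example.
unique_frame = (combined.drop_duplicates()
                        .reset_index(drop=True)
                        .to_frame(name='name'))

print(list(unique_values))  # ['Bob', 'Joe', 'Bill', 'Mary', 'Steve']
print(unique_frame)
```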

    In [5]: set(df.Col1).union(set(df.Col2))
    Out[5]: {'Bill', 'Bob', 'Joe', 'Mary', 'Steve'}

Or:

    set(df.Col1) | set(df.Col2)

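In NumPy v1.13+, `np.unique` accepts an `axis` argument. Without it, a multi-column input is implicitly flattened and the result is the unique values across all selected columns. With `axis=0`, the result is instead the unique rows, i.e. the distinct combinations of values in those columns: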

    import numpy as np
    np.unique(df[['col1', 'col2']], axis=0)

This change was introduced Nov 2016: https://github.com/numpy/numpy/commit/1f764dbff7c496d6636dc0430f083ada9ff4e4be
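The difference between the flattened call and the `axis=0` call can be sketched on a small numeric frame. The data and the column names `'x'` and `'y'` are assumptions made for this example; numeric data is used because `np.unique` with `axis` may raise a `TypeError` on object-dtype arrays such as string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; column names 'x' and 'y' are placeholders.
df_num = pd.DataFrame({'x': [1, 1, 2], 'y': [2, 2, 1]})
values = df_num[['x', 'y']].to_numpy()

flat_unique = np.unique(values)         # flattened: unique scalar values
row_unique = np.unique(values, axis=0)  # unique rows (value combinations)

print(flat_unique)  # [1 2]
print(row_unique)   # [[1 2]
                    #  [2 1]]

# The pandas counterpart for unique rows keeps the result as a DataFrame.
print(df_num[['x', 'y']].drop_duplicates())
```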

Non-`pandas` solution: using `set()`.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                       'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
                       'Col3': np.random.random(5)})
    print(df)
    # Series.append was removed in pandas 2.0; pd.concat does the same job.
    print(set(pd.concat([df.Col1, df.Col2]).values))

Output (the `Col3` values are random, and the order of elements in the set may vary):

       Col1   Col2      Col3
    0   Bob    Joe  0.201079
    1   Joe  Steve  0.703279
    2  Bill    Bob  0.722724
    3  Mary    Bob  0.093912
    4   Joe  Steve  0.766027
    {'Steve', 'Bob', 'Bill', 'Joe', 'Mary'}

##### Comments

- See also unique combinations of values in selected columns in pandas data frame and count for a different but related question. The selected answer there uses
`df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})`

- How do you get a dataframe back instead of an array?
- @Lisle: both methods return a NumPy array, so you'll have to construct it manually, e.g., `pd.DataFrame(unique_values)`. There's no good way to get back a DataFrame directly.
- @Lisle since he has used pd.unique it returns a numpy.ndarray as a final output. Is this what you were asking?
- This does not work. Throws unorderable types: float() < str()
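
The `unorderable types: float() < str()` error in the last comment is what a sort-based approach typically produces when a string column also contains missing values, since `NaN` is a float. The comment does not say which answer was run, so the following is only a sketch of that likely cause, using made-up data:

```python
import numpy as np
import pandas as pd

# Made-up data: strings mixed with a missing value (NaN is a float).
mixed = np.array(['Bob', np.nan, 'Joe', 'Bob'], dtype=object)

try:
    print(np.unique(mixed))  # sorting may fail when comparing float and str
except TypeError as exc:
    print('np.unique failed:', exc)

# pd.unique does not sort, so the mixed types are never compared with `<`.
with_nan = pd.unique(mixed)

# Drop the missing value first if it should not count as a unique entry.
without_nan = pd.Series(mixed).dropna().unique()

print(list(without_nan))  # ['Bob', 'Joe']
```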