Decode one-hot dataframe in Pandas


I have 2 dataframes with the data as below:

df1:
====
id   name   age   likes
---  -----  ----  -----
0     A      21    rose
1     B      22    apple
2     C      30    grapes
4     D      21    lily

df2:
====
category    Fruit   Flower 
---------  -------  -------
orange      1        0
apple       1        0       
rose        0        1
lily        0        1
grapes      1        0

What I am trying to do is add another column to df1 which would contain the word 'Fruit' or 'Flower' depending on the one-hot encoding in df2 for that entry. I am looking for a purely pandas/numpy implementation.

Any help would be appreciated.

Thanks!

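IIUC, you can merge the two dataframes and then use .apply with axis=1 (or axis="columns"), which applies the function to each row.

# how='left' keeps every row of df1, in its original order
df3 = df1.merge(df2, how='left', left_on='likes', right_on='category')

# list your one-hot columns here
categories_col = ['Fruit', 'Flower']

def get_category(x):
    # return the name of the first one-hot column set to 1 (None if there is none)
    for category in categories_col:
        if x[category] == 1:
            return category

# .values assigns by position, so the result does not depend on df1's index labels
df1["new"] = df3.apply(get_category, axis=1).values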

print(df1)
    id  name    age likes   new
0   0   A   21  rose    Flower
1   1   B   22  apple   Fruit
2   2   C   30  grapes  Fruit  
3   4   D   21  lily    Flower

Make sure the columns listed in categories_col really are one-hot encoded, i.e. exactly one of them is 1 in each row.
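If the number of one-hot columns is large, the row-wise loop can probably be avoided. The following is only a sketch, assuming that every column of df2 other than 'category' is a one-hot column and that each row contains exactly one 1; the variable names onehot and labels are made up for illustration. It checks the one-hot assumption and then uses idxmax(axis=1) to pick the column holding the 1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2, 4],
    "name": ["A", "B", "C", "D"],
    "age": [21, 22, 30, 21],
    "likes": ["rose", "apple", "grapes", "lily"],
})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

# Assumption: every column except the key 'category' is a one-hot column.
onehot = df2.set_index("category")

# Sanity check: exactly one 1 per row, otherwise idxmax would pick a column silently.
if not (onehot.sum(axis=1) == 1).all():
    raise ValueError("df2 is not strictly one-hot encoded")

# For each row, idxmax(axis=1) returns the label of the column holding the maximum (the 1).
labels = onehot.idxmax(axis=1)

# Look up each value of df1['likes']; values missing from df2 become NaN.
df1["new"] = df1["likes"].map(labels)
print(df1)
```

Because the lookup goes through the 'category' key, it does not matter that df2 has more rows than df1 or lists them in a different order.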


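You can use apply() to build the labels from df2 and map() to look them up by the values in df1['likes']:

df1['type_string'] = df1['likes'].map(df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1))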

Here is a running example:

import pandas as pd
from io import StringIO

df1 = pd.read_csv(StringIO(
"""
0     A      21    rose
1     B      22    apple
2     C      30    grapes
4     D      21    lily
"""), sep=r'\s+', header=None)

df2 = pd.read_csv(StringIO(
"""
orange      1        0
apple       1        0       
rose        0        1
lily        0        1
grapes      1        0
"""), sep=r'\s+', header=None)

df1.columns = ['id', 'name', 'age', 'likes']
df2.columns = ['category', 'Fruit', 'Flower']

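# Build a category -> label lookup from df2, then match it against df1['likes'].
# Applying the lambda to df2 and assigning the result directly would align by row
# position instead of by the 'likes' value and give wrong labels.
labels = df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1)
df1['category'] = df1['likes'].map(labels)

Input

   id name  age   likes
0   0    A   21    rose
1   1    B   22   apple
2   2    C   30  grapes
3   4    D   21    lily

Output

   id name  age   likes category
0   0    A   21    rose   Flower
1   1    B   22   apple    Fruit
2   2    C   30  grapes    Fruit
3   4    D   21    lily   Flower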
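Since the question also allows numpy, here is a rough numpy-flavoured sketch of the same decoding. It assumes the one-hot columns are exactly 'Fruit' and 'Flower' and that each row of df2 contains a single 1; the names cols, decoded and lookup are illustrative only. argmax along axis 1 gives the position of the 1 in each row, and that position indexes into the array of column names.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2, 4],
    "name": ["A", "B", "C", "D"],
    "age": [21, 22, 30, 21],
    "likes": ["rose", "apple", "grapes", "lily"],
})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

# Assumption: these are all of the one-hot columns in df2.
cols = ["Fruit", "Flower"]

# Position of the 1 in each row, turned back into the matching column name.
positions = df2[cols].to_numpy().argmax(axis=1)
decoded = np.array(cols, dtype=object)[positions]

# Pair each category with its decoded label and look it up by df1['likes'].
lookup = dict(zip(df2["category"], decoded))
df1["category"] = df1["likes"].map(lookup)
print(df1)
```

The dictionary lookup keeps the matching keyed on the 'likes' value, so the differing row counts and row orders of df1 and df2 do not affect the result.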


Keep in mind that the two tables have a different number of rows, and df2 may contain more categories than appear in df1, so the labels have to be matched on the key rather than by row position.

Here's a working example:

import pandas as pd

df1 = pd.DataFrame([['orange',12],['rose',3],['apple',44],['grapes',1]], columns = ['name', 'age'])


df1
    name    age
0   orange  12
1   rose    3
2   apple   44
3   grapes  1

df2 = pd.DataFrame([['orange',1],['rose',0],['apple',1],['grapes',1],['daffodils',0],['berries',1]], columns = ['cat', 'Fruit'])

df2
    cat         Fruit
0   orange      1
1   rose        0
2   apple       1
3   grapes      1
4   daffodils   0
5   berries     1

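This can be done in a single line: run a list comprehension with a conditional expression over the Fruit column of df1 merged with df2 on the fly, using df1.name = df2.cat as the key. With how='left' every row of df1 is kept in its original order; note that a name with no match in df2 gets NaN in Fruit and would therefore be labelled 'Flower'.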

df1['flag'] = ['Fruit' if i == 1 else 'Flower' for i in df1.merge(df2,how='left',left_on='name', right_on='cat').Fruit]
df1

Output:

    name    age flag
0   orange  12  Fruit
1   rose    3   Flower
2   apple   44  Fruit
3   grapes  1   Fruit
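One thing to watch with the left merge: a name that does not occur in df2 ends up with NaN in the Fruit column, and a plain "== 1 else 'Flower'" test would label it 'Flower'. A possible way to make that case explicit is sketched below with np.select; the 'tulip' row and the 'unknown' label are invented for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# 'tulip' is a made-up entry that has no counterpart in df2.
df1 = pd.DataFrame({"name": ["orange", "rose", "tulip"], "age": [12, 3, 7]})
df2 = pd.DataFrame({"cat": ["orange", "rose", "apple"], "Fruit": [1, 0, 1]})

# The left merge keeps all rows of df1 in order; unmatched rows get NaN in 'Fruit'.
merged = df1.merge(df2, how="left", left_on="name", right_on="cat")
fruit = merged["Fruit"]

# np.select picks the first matching condition per row and falls back to the
# default when neither holds (here: the NaN produced by an unmatched name).
df1["flag"] = np.select([fruit == 1, fruit == 0], ["Fruit", "Flower"], default="unknown")
print(df1)
```

Whether unmatched names should be labelled, dropped, or raise an error depends on the data, so treat the 'unknown' default as a placeholder.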


Comments
  • Hi, thanks for the quick response. This works for a couple of categories. What if the number of categories in df2 (Fruit, Flower, etc.) is large? Is there an easier way to achieve this?
  • @bchain Done, please check the updated answer. I hope this helps.
  • The for loop/if statement is cumbersome. It can be reduced further with a list comprehension: the entire loop and if statement, including the merge, fit in a single line.