How do I one-hot encode an array of strings with Numpy?

I know there are sub-optimal solutions out there, but I'm trying to optimise my code. So far, the shortest way I found is this:

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

target = np.array(['dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat'])

oe = OrdinalEncoder()
target = oe.fit_transform(target.reshape(-1, 1)).ravel()
target = np.eye(np.unique(target).shape[0])[np.array(target, dtype=np.int32)]
print(target)

[[0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 ...

This is ugly, long code, and removing any part of it breaks it. I'm looking for a simpler way that doesn't involve half a dozen function calls from two different libraries.

Got it. This will work with arrays of any number of unique values.

import numpy as np

target = np.array(['dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog',
                   'cat', 'cat', 'hamster', 'hamster'])

def one_hot(array):
    # unique: sorted unique labels; inverse: index of each element's label in `unique`
    unique, inverse = np.unique(array, return_inverse=True)
    # pick the matching row of the identity matrix for each element
    onehot = np.eye(unique.shape[0])[inverse]
    return onehot

print(one_hot(target))

[[0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 0. 1.]]
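
If you also need to go back from the one-hot rows to the original labels, the same unique array works as the decoder: the column index of the 1 in each row is that label's position in it. A quick sketch, not part of the original answer:

unique = np.unique(target)
# the argmax of each row is the label's index in `unique`
decoded = unique[one_hot(target).argmax(axis=1)]
print(decoded)  # ['dog' 'dog' 'cat' 'cat' 'cat' 'dog' 'dog' 'cat' 'cat' 'hamster' 'hamster']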

Why not use OneHotEncoder?

>>> from sklearn.preprocessing import OneHotEncoder
>>> ohe = OneHotEncoder(categories='auto', sparse=False)
>>> arr = ohe.fit_transform(target[:, np.newaxis])
>>> arr
array([[0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [0., 1.],
       [0., 1.],
       [1., 0.],
       [1., 0.]])

It stores nice metadata about the transformation:

>>> ohe.categories_
[array(['cat', 'dog'], dtype='<U3')]

Plus you can easily convert back:

>>> ohe.inverse_transform(arr).ravel()
array(['dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat'],
      dtype='<U3')

You can use Keras's to_categorical together with scikit-learn's LabelEncoder for it:

import numpy as np
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

# define example
data = np.array(['dog', 'dog', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat'])

label_encoder = LabelEncoder()
data = label_encoder.fit_transform(data)
# one hot encode
encoded = to_categorical(data)
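
To go back from the one-hot matrix to the original strings, you can combine argmax with the label encoder's inverse_transform. A small sketch, not part of the original answer:

# argmax recovers the integer class of each row; inverse_transform maps it back to the string
decoded = label_encoder.inverse_transform(np.argmax(encoded, axis=1))
print(decoded)  # ['dog' 'dog' 'cat' 'cat' 'cat' 'dog' 'dog' 'cat' 'cat']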

Comments
  • What is the target produced by oe?
  • preprocessing.OneHotEncoder also does this, though yours is faster.
  • Any particular reason you didn't use the return_inverse argument to numpy.unique instead of numpy.searchsorted? Also, you're calling numpy.array on something that's already an array in np.array(array).
  • Also, using return_counts=True is unnecessary. You're only using the result for its length, but that length is the same length as words.
  • This is not NumPy