Pandas: Replace a string with 'other' if it is not present in a list of strings


I have the following data frame, df, with column 'Class'

    Class
0   Individual
1   Group
2   A
3   B
4   C
5   D
6   Group

I would like to replace everything apart from Group and Individual with 'Other', so the final data frame is

    Class
0   Individual
1   Group
2   Other
3   Other
4   Other
5   Other
6   Group

The data frame is large, with over 600K rows. What is the most efficient way to find values other than 'Group' and 'Individual' and replace them with 'Other'?

I have seen examples for replace, such as:

df['Class'] = df['Class'].replace({'A':'Other', 'B':'Other'})

but there are far too many unique values to list them individually. I would rather specify only the values to keep, 'Group' and 'Individual', and replace everything else.


I think you need np.where with isin:

df['Class'] = np.where(df['Class'].isin(['Individual','Group']), df['Class'], 'Other')
print (df)
        Class
0  Individual
1       Group
2       Other
3       Other
4       Other
5       Other
6       Group
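
For anyone who wants to run this end to end, here is a minimal self-contained sketch with the imports and the sample frame from the question. The variable name `keep` is only an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Class": ["Individual", "Group", "A", "B", "C", "D", "Group"]})

# Values to leave untouched (hypothetical name, chosen for this sketch).
keep = ["Individual", "Group"]

# Keep the original value where it is in `keep`, otherwise use 'Other'.
df["Class"] = np.where(df["Class"].isin(keep), df["Class"], "Other")
print(df)
```

Listing only the values to keep means the code does not change when new unexpected categories appear in the data.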

Another solution (slower):

m = (df['Class'] == 'Individual') | (df['Class'] == 'Group')
df['Class'] = np.where(m, df['Class'], 'Other')

Another solution:

df['Class'] = df['Class'].map({'Individual':'Individual', 'Group':'Group'}).fillna('Other')

Performance (on real data it depends on the number of replacements):

#[700000 rows x 1 columns]
df = pd.concat([df] * 100000, ignore_index=True)
#print (df)

In [208]: %timeit df['Class1'] = np.where(df['Class'].isin(['Individual','Group']), df['Class'], 'Other')
25.9 ms ± 485 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [209]: %timeit df['Class2'] = np.where((df['Class'] == 'Individual') | (df['Class'] == 'Group'), df['Class'], 'Other')
120 ms ± 6.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [210]: %timeit df['Class3'] = df['Class'].map({'Individual':'Individual', 'Group':'Group'}).fillna('Other')
95.7 ms ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [211]: %timeit df.loc[~df['Class'].isin(['Individual', 'Group']), 'Class'] = 'Other'
97.8 ms ± 6.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
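
Before comparing timings it can be worth confirming that the four variants really produce the same column. The following is a small sanity-check sketch on the sample data; the names `via_isin`, `via_eq`, `via_map` and `via_loc` are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Class": ["Individual", "Group", "A", "B", "C", "D", "Group"]})
keep = ["Individual", "Group"]
mask = df["Class"].isin(keep)

# 1. np.where + isin
via_isin = pd.Series(np.where(mask, df["Class"], "Other"), index=df.index)

# 2. np.where + chained equality comparisons
eq_mask = (df["Class"] == "Individual") | (df["Class"] == "Group")
via_eq = pd.Series(np.where(eq_mask, df["Class"], "Other"), index=df.index)

# 3. map + fillna: values missing from the dict become NaN, then 'Other'
via_map = df["Class"].map({k: k for k in keep}).fillna("Other")

# 4. boolean-mask assignment with .loc on a copy of the column
via_loc = df["Class"].copy()
via_loc.loc[~mask] = "Other"

# Compare plain Python lists so that dtype differences do not matter.
all_equal = all(
    s.tolist() == via_isin.tolist() for s in (via_eq, via_map, via_loc)
)
print(all_equal)
```

The timings above were measured on one machine and one data shape, so treat them as relative guidance and re-measure on your own data.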



Another approach could be:

df.loc[~df['Class'].isin(['Individual', 'Group']), 'Class'] = 'Other'



You can do it this way, for example:

  1. get the list of unique items: values = list(df['Class'].unique())
  2. remove the classes you want to keep: values.remove('Individual'), values.remove('Group')
  3. select all the remaining rows: df[df['Class'].isin(values)]
  4. replace their class values: df.loc[df['Class'].isin(values), 'Class'] = 'Other'

This is only a rough outline, but the principle is the same.
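
The steps above can be written out as a short runnable sketch. The names `known`, `other_values` and `is_other` are illustrative, and a list comprehension is used instead of repeated `remove` calls so that a missing class does not raise an error.

```python
import pandas as pd

df = pd.DataFrame({"Class": ["Individual", "Group", "A", "B", "C", "D", "Group"]})

# Classes to keep as they are.
known = {"Individual", "Group"}

# Steps 1 and 2: collect the unique values and drop the known classes.
other_values = [v for v in df["Class"].unique() if v not in known]

# Step 3: boolean mask of the rows whose class is one of the remaining values.
is_other = df["Class"].isin(other_values)

# Step 4: overwrite those rows in place.
df.loc[is_other, "Class"] = "Other"
print(df)
```

This does an extra pass over the column to compute the unique values, so negating `isin` on the two known classes directly is usually the simpler route.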



You can use pd.Series.where:

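# Assign the result back; calling where(..., inplace=True) on df['Class'] may not update df in recent pandas versions
df['Class'] = df['Class'].where(df['Class'].isin(['Individual', 'Group']), 'Other')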

print(df)

        Class
0  Individual
1       Group
2       Other
3       Other
4       Other
5       Other
6       Group

This should be more efficient than map + fillna:

df = pd.concat([df] * 100000, ignore_index=True)

%timeit df['Class'].where(df['Class'].isin(['Individual', 'Group']), 'Other')
# 60.3 ms per loop

%timeit df['Class'].map({'Individual':'Individual', 'Group':'Group'}).fillna('Other')
# 133 ms per loop
