Pandas conditional creation of a series/dataframe column


I have a dataframe along the lines of the below:

    Type       Set
1    A          Z
2    B          Z           
3    B          X
4    C          Y

I want to add another column to the dataframe (or generate a series) with the same number of rows as the dataframe, holding the colour 'green' if Set == 'Z' and 'red' otherwise.

What's the best way to do this?


A list comprehension is another way to create a column conditionally. If you are working with object dtypes in columns, like in your example, list comprehensions typically outperform most other methods.

Example list comprehension:

df['color'] = ['green' if x == 'Z' else 'red' for x in df['Set']]

%timeit tests:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})
%timeit df['color'] = ['green' if x == 'Z' else 'red' for x in df['Set']]
%timeit df['color'] = np.where(df['Set'] == 'Z', 'green', 'red')
%timeit df['color'] = df['Set'].map(lambda x: 'green' if x == 'Z' else 'red')

1000 loops, best of 3: 239 µs per loop
1000 loops, best of 3: 523 µs per loop
1000 loops, best of 3: 263 µs per loop
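
A four-row frame mostly measures fixed overhead, so it can be worth repeating the comparison on something larger. The following is a rough sketch using the standard-library timeit module instead of the IPython %timeit magic; the frame size and the helper names (with_listcomp, with_where, with_map) are arbitrary choices for illustration, and the numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same data as above, repeated to 400,000 rows (the size is an arbitrary choice).
df = pd.DataFrame({'Type': list('ABBC') * 100_000, 'Set': list('ZZXY') * 100_000})

def with_listcomp():
    return ['green' if x == 'Z' else 'red' for x in df['Set']]

def with_where():
    return np.where(df['Set'] == 'Z', 'green', 'red')

def with_map():
    return df['Set'].map(lambda x: 'green' if x == 'Z' else 'red')

# Take the best of three single runs for each approach.
for func in (with_listcomp, with_where, with_map):
    seconds = min(timeit.repeat(func, number=1, repeat=3))
    print(f'{func.__name__}: {seconds * 1000:.1f} ms')
```

All three functions compute the same colours, so whichever is fastest on your data can be assigned to df['color'] directly.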


Here's yet another way to skin this cat: use a dictionary to map each value in the column to its replacement:

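import pandas as pd

def map_values(value, values_dict):
    # Called once per cell of the Series; look up its replacement.
    return values_dict[value]

values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

df = pd.DataFrame({'INDICATOR': ['A', 'B', 'C', 'D'], 'VALUE': [10, 9, 8, 7]})

# Extra positional arguments for map_values are passed through args.
df['NEW_VALUE'] = df['INDICATOR'].apply(map_values, args=(values_dict,))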

Here's what it looks like:

df
Out[2]: 
  INDICATOR  VALUE  NEW_VALUE
0         A     10          1
1         B      9          2
2         C      8          3
3         D      7          4

This approach can be very powerful when you would otherwise need many if/else branches (i.e. many unique values to replace).

And of course you could always do this:

df['NEW_VALUE'] = df['INDICATOR'].map(values_dict)

On my machine, though, that was more than three times as slow as the apply approach above. With a four-row frame the difference is mostly fixed overhead, so time it on your own data before choosing.

And you could also do this, using dict.get:

df['NEW_VALUE'] = [values_dict.get(v, None) for v in df['INDICATOR']]
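
The three variants above differ in what happens when a value has no entry in the dictionary. A minimal sketch of that difference, assuming a hypothetical indicator 'E' that is missing from values_dict and an arbitrary fallback of -1:

```python
import pandas as pd

values_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# 'E' is a hypothetical value with no entry in values_dict.
df = pd.DataFrame({'INDICATOR': ['A', 'B', 'E'], 'VALUE': [10, 9, 8]})

# Series.map with a dict leaves NaN where no key matches.
mapped = df['INDICATOR'].map(values_dict)

# dict.get lets you pick the fallback explicitly (-1 is an arbitrary sentinel).
with_default = [values_dict.get(v, -1) for v in df['INDICATOR']]

print(mapped.isna().tolist())  # [False, False, True]
print(with_default)            # [1, 2, -1]
```

The plain dictionary lookup inside map_values would raise a KeyError for 'E', which may be what you want if an unmapped value indicates bad data.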


Another way to achieve this is with map and a lambda:

df['color'] = df['Set'].map(lambda x: 'green' if x == 'Z' else 'red')


The following is slower than the approaches timed above, but it lets the new column depend on more than one existing column and take more than two distinct values.

Simple example using just the "Set" column:

def set_color(row):
    if row["Set"] == "Z":
        return "red"
    else:
        return "green"

df = df.assign(color=df.apply(set_color, axis=1))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B  green
3   Y    C  green

Example with more colours and more columns taken into account:

def set_color(row):
    if row["Set"] == "Z":
        return "red"
    elif row["Type"] == "C":
        return "blue"
    else:
        return "green"

df = df.assign(color=df.apply(set_color, axis=1))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B  green
3   Y    C   blue
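
If the row-wise apply turns out to be too slow, the same if / elif / else chain can usually be written with numpy.select. This is a sketch of an alternative technique, not part of the answer above; it reproduces the mapping of the second set_color (red for Set == "Z", blue for Type == "C", green otherwise).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Conditions are checked in order and the first True one wins,
# which mirrors the if / elif chain in set_color.
conditions = [df['Set'] == 'Z', df['Type'] == 'C']
choices = ['red', 'blue']

# Rows matching no condition get the default.
df['color'] = np.select(conditions, choices, default='green')
print(df)
```

Each extra branch is one more entry in conditions and choices.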

Edit (21/06/2019): Using plydata

It is also possible to use plydata for this kind of thing (though it seems even slower than using assign and apply).

from plydata import define, if_else

Simple if_else:

df = define(df, color=if_else('Set=="Z"', '"red"', '"green"'))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B  green
3   Y    C  green

Nested if_else:

df = define(df, color=if_else(
    'Set=="Z"',
    '"red"',
    if_else('Type=="C"', '"green"', '"blue"')))

print(df)
  Set Type  color
0   Z    A    red
1   Z    B    red
2   X    B   blue
3   Y    C  green


Comments
  • doesn't work if i put two conditions inside where clause with and
  • @AmolSharma: Use & instead of and. See stackoverflow.com/q/13589390/190597
  • df['color'] = list(np.where(df['Set']=='Z', 'green', 'red')) will suppress the pandas warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
  • 'green' and 'red' can also be replaced with column arithmetic. e.g., df['foo'] = np.where(df['Set']=='Z', df['Set'], df['Type'].shift(1))
  • It's a shame i can't upvote this multiple times. One upvote doesn't seem enough.
  • Note that, with much larger dataframes (think pd.DataFrame({'Type':list('ABBC')*100000, 'Set':list('ZZXY')*100000})-size), numpy.where outpaces map, but the list comprehension is king (about 50% faster than numpy.where).
  • Can the list comprehension method be used if the condition needs information from multiple columns? I am looking for something like this (this does not work): df['color'] = ['red' if (x['Set'] == 'Z') & (x['Type'] == 'B') else 'green' for x in df]
  • Add iterrows to the dataframe, then you can access multiple columns via row: ['red' if (row['Set'] == 'Z') & (row['Type'] == 'B') else 'green' for index, row in df.iterrows()]
  • Note this nice solution will not work if you need to take replacement values from another series in the data frame, such as df['color_type'] = np.where(df['Set']=='Z', 'green', df['Type'])
  • I like this answer because it shows how to do multiple replacements of values
  • Good approach, this can be memoized for faster efficiency (in larger datasets), though would require an additional step.
  • The best one so far. You could probably add for more conditions that would be the code df.loc[(df['Set']=="Z") & (df['Type']=="A"), 'Color'] = "green"
  • This should be the accepted answer. Actually idiomatic and extensible.
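
Several of the comments above ask about conditions that span more than one column. A minimal sketch pulling those suggestions together, using the question's data and the "Set is Z and Type is B" condition from the comments; the column names color_where, color_loc and color_zip are made up for illustration, and zip is used here as a substitute for the iterrows suggestion.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Element-wise "and" needs & (not `and`), with parentheses around each comparison.
both = (df['Set'] == 'Z') & (df['Type'] == 'B')

# 1. numpy.where with the combined mask.
df['color_where'] = np.where(both, 'red', 'green')

# 2. df.loc: fill in a default first, then overwrite the matching rows.
df['color_loc'] = 'green'
df.loc[both, 'color_loc'] = 'red'

# 3. List comprehension over several columns, pairing them up with zip.
df['color_zip'] = ['red' if s == 'Z' and t == 'B' else 'green'
                   for s, t in zip(df['Set'], df['Type'])]

print(df)
```

Inside the list comprehension the values are plain Python strings, so the ordinary `and` is fine there; `&` is only needed when combining whole Series.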