Pandas DataFrame - Creating a new column from a comparison

I'm trying to create a column called 'city_code' with values from the 'code' column. To do this, I need to check whether the 'ds_city' and 'city' values are equal.

Here is a table sample:

https://i.imgur.com/093GJF1.png

I've tried this:

def find_code(data):
    if data['ds_city'] == data['city'] :
        return data['code']
    else:
        return 'UNKNOWN'

df['code_city'] = df.apply(find_code, axis=1)

But since there are duplicates in the 'ds_city' column, this is the result:

https://i.imgur.com/geHyVUA.png

Here is an image of the expected result:

https://i.imgur.com/HqxMJ5z.png

How can I work around this?

You can use pandas merge:

import pandas as pd

df = pd.merge(df, df[['code', 'city']], how='left',
              left_on='ds_city', right_on='city',
              suffixes=('', '_right')).drop(columns='city_right')

# output:
#   code    city        ds_city     code_right
# 0 1500107 ABAETETUBA  ABAETETUBA  1500107
# 1 2900207 ABARE       ABAETETUBA  1500107
# 2 2100055 ACAILANDIA  ABAETETUBA  1500107
# 3 2300309 ACOPIARA    ABAETETUBA  1500107
# 4 5200134 ACREUNA     ABARE       2900207

See the pandas.merge documentation. This left-joins the DataFrame with its own code and city columns, matching each row's ds_city against the city column.

Where a ds_city value has no matching city, the above code fills code_right with NaN. You can then replace those values with 'UNKNOWN':

df['code_right'] = df['code_right'].fillna('UNKNOWN')
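A self-contained sketch of the merge answer, using the five rows shown in its output as sample data. One hypothetical row (code 1234567, city 'HYPOTHETICAL', ds_city 'NOT_A_CITY') is added so that an unmatched ds_city, and therefore the 'UNKNOWN' fill, is visible.

```python
import pandas as pd

# Sample rows from the output above, plus one hypothetical row whose
# ds_city ('NOT_A_CITY') matches no value in the city column.
df = pd.DataFrame({
    'code': [1500107, 2900207, 2100055, 2300309, 5200134, 1234567],
    'city': ['ABAETETUBA', 'ABARE', 'ACAILANDIA', 'ACOPIARA', 'ACREUNA', 'HYPOTHETICAL'],
    'ds_city': ['ABAETETUBA', 'ABAETETUBA', 'ABAETETUBA', 'ABAETETUBA', 'ABARE', 'NOT_A_CITY'],
})

# Left-join the frame with its own (code, city) pairs, matching ds_city to city.
# Overlapping names keep their original name on the left and get '_right' on the right.
df = pd.merge(df, df[['code', 'city']], how='left',
              left_on='ds_city', right_on='city',
              suffixes=('', '_right')).drop(columns='city_right')

# Rows whose ds_city found no match have NaN in code_right; label them instead.
df['code_right'] = df['code_right'].fillna('UNKNOWN')
print(df)
```

Because the right-hand side is the frame's own (code, city) pairs, a city listed more than once would duplicate every row that matches it; passing validate='many_to_one' to pd.merge raises an error in that case instead.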

Creating a new column in a pandas DataFrame based on existing columns: while working with data in pandas, we perform a vast array of operations to get the data into the desired form. One of these is creating new columns in the DataFrame from the result of operations on existing columns.

This is more like a job for np.where:

import numpy as np

df['code_city'] = np.where(df['ds_city'] == df['city'], df['code'], 'UNKNOWN')
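np.where evaluates its condition element by element, so each row's ds_city is compared only with that same row's city. A minimal sketch on two rows adapted from the sample shows that it reproduces the row-wise apply from the question rather than looking the code up from another row. The codes are stored as strings here, an assumption that keeps np.where's output a single string dtype instead of mixing integers with 'UNKNOWN'.

```python
import numpy as np
import pandas as pd

# Two rows adapted from the sample; row 1's ds_city names the city in row 0.
# Codes are strings (an assumption) so both branches of np.where share a dtype.
df = pd.DataFrame({
    'code': ['1500107', '2900207'],
    'city': ['ABAETETUBA', 'ABARE'],
    'ds_city': ['ABAETETUBA', 'ABAETETUBA'],
})

# Same-row comparison: row 1 gets 'UNKNOWN' even though ABAETETUBA has a code.
df['code_city'] = np.where(df['ds_city'] == df['city'], df['code'], 'UNKNOWN')
print(df['code_city'].tolist())
```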

You could try this out:

# Plain lists make positional lookups simple.
cities = data['city'].tolist()
codes = data['code'].tolist()
# Collect one result per row: either a code or 'UNKNOWN'.
code_city = []
# Iterate through the cities in the ds_city column.
for lookup_city in data['ds_city']:
    if lookup_city in cities:
        # Take the code from the first row whose city column matches.
        code_city.append(codes[cities.index(lookup_city)])
    else:
        # No row has this city (list.index would raise), so use the placeholder.
        code_city.append("UNKNOWN")
# Assign the whole column at once instead of chained item assignment.
data['code_city'] = code_city

Comparison with SAS (pandas 1.1.0 documentation): a DataFrame in pandas is analogous to a SAS data set, a two-dimensional data source with labeled columns. In SAS, if/then logic can be used to create new columns. Method #4: by using a dictionary. You can use a Python dictionary to add a new column to a pandas DataFrame: use an existing column as the keys, and their respective values become the values of the new column.
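The dictionary approach fits the lookup in the question directly: build a dict from city to code and pass it to Series.map on ds_city. A minimal sketch using rows from the sample, with the third row's ds_city replaced by a hypothetical unmatched value 'XYZ'. Unmatched cities come back as NaN and are then filled with 'UNKNOWN'.

```python
import pandas as pd

# Rows from the sample; 'XYZ' is a hypothetical ds_city with no matching city.
df = pd.DataFrame({
    'code': [1500107, 2900207, 5200134],
    'city': ['ABAETETUBA', 'ABARE', 'ACREUNA'],
    'ds_city': ['ABAETETUBA', 'ABAETETUBA', 'XYZ'],
})

# Dictionary with an existing column (city) as keys and the codes as values.
city_to_code = dict(zip(df['city'], df['code']))

# Look up each ds_city; cities missing from the dict become NaN, then 'UNKNOWN'.
df['code_city'] = df['ds_city'].map(city_to_code).fillna('UNKNOWN')
print(df)
```

If a city appeared twice, dict(zip(...)) would keep the last code seen for it, whereas the loop answer above, which uses list.index, keeps the first.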

How to Compare Values in two Pandas DataFrames: np.where(df1['Price1'] == df2['Price2'], 'True', 'False') creates a new column in df1 that checks whether the prices match. More generally, you can create DataFrame columns based on a given condition using a list comprehension, NumPy methods, the apply() method, or the map() method.
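The four options listed above can be compared side by side. A minimal sketch, assuming hypothetical Price1 and Price2 columns held in one DataFrame so that rows are already aligned; all four produce the same 'True'/'False' labels.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; both columns live in one frame, so rows line up.
df = pd.DataFrame({'Price1': [10, 20, 30], 'Price2': [10, 25, 30]})

# 1. List comprehension over paired values.
df['match_lc'] = ['True' if a == b else 'False' for a, b in zip(df['Price1'], df['Price2'])]
# 2. NumPy: vectorized choice between two values.
df['match_np'] = np.where(df['Price1'] == df['Price2'], 'True', 'False')
# 3. DataFrame.apply, row by row (flexible but slowest).
df['match_apply'] = df.apply(lambda r: 'True' if r['Price1'] == r['Price2'] else 'False', axis=1)
# 4. Series.map on the boolean mask, translating True/False into labels.
df['match_map'] = (df['Price1'] == df['Price2']).map({True: 'True', False: 'False'})
print(df)
```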

Deriving New Columns & Defining Python Functions: make new columns from existing data and build custom functions. Before that, you need a few tools for comparing values. You can also apply an IF condition under an existing DataFrame column: so far you have seen how to apply an IF condition by creating a new column, but you may instead store the results under an existing column. For example, suppose you created a DataFrame that has 12 numbers, where the last two numbers are zeros.
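Storing the result of an IF condition under an existing column can be sketched with DataFrame.loc. The column name set_of_numbers and the replacement value 999 are hypothetical choices for illustration; only the shape of the data (12 numbers, the last two zeros) comes from the text.

```python
import pandas as pd

# Hypothetical data: 12 numbers, the last two of which are zeros.
df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]})

# IF the value is 0, overwrite it in the same column (999 is a hypothetical value);
# rows that fail the condition are left unchanged.
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
print(df)
```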

How do I compare columns in different data frames? If you want to check for equal values on a certain column, say Name, you can merge both DataFrames into a new one: mergedStuff = pd.merge(df1, df2, ... Alternatively, use df1['new column that will contain the comparison results'] = np.where(condition, 'value if true', 'value if false'). For our example, here is the syntax to compare the prices (i.e., Price1 vs. Price2) across the two DataFrames: df1['pricesMatch?'] = np.where(df1['Price1'] == df2['Price2'], 'True', 'False'). This direct comparison only works when both DataFrames have identically labeled rows; otherwise pandas raises a ValueError.
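When the two DataFrames differ in length or row order, merging on a shared key first lines the rows up. A minimal sketch, assuming hypothetical frames df1 and df2 that both have a Name column; it is an inner join followed by np.where, not a reconstruction of the truncated mergedStuff call above.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: shared Name key, different lengths and row order.
df1 = pd.DataFrame({'Name': ['a', 'b', 'c'], 'Price1': [10, 20, 30]})
df2 = pd.DataFrame({'Name': ['c', 'a'], 'Price2': [30, 15]})

# Inner join keeps names present in both frames, in the order of df1's keys.
merged = pd.merge(df1, df2, on='Name', how='inner')

# Each row now refers to a single Name, so the row-wise comparison is meaningful.
merged['pricesMatch?'] = np.where(merged['Price1'] == merged['Price2'], 'True', 'False')
print(merged)
```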

Comments
  • @Shadownmonster, I don't really understand the purpose of your comparison. I mean, what happens if it is repeated?
  • Could you add a table with the expected output?
  • @lmiguelvargasf The code is associated with the city. I need the code to repeat when 'ds_city' equals 'city'.
  • @Shadowmonster, I added an answer, let me know if this solves your problem. If not, please detail why not.
  • I don't know how the code is even possible. You are not passing any data argument when calling "find_code"
  • Thanks for your answer! But the output was the same :(