Map only first occurrence of key/value match in dataframe


Is it possible to map only the first occurrence of key in a dataframe?

Ex:

testDict = {'A': 1, 'B': 2}

df

Name   Num
 A
 A
 B
 B

Expected output

Name   Num
 A      1
 A      
 B      2
 B 

Use duplicated to keep only the first occurrence of each name, then map; index alignment leaves the repeated rows as NaN:

df['Num'] = df.Name[~df.Name.duplicated()].map(testDict)
print(df)

Output

  Name  Num
0    A  1.0
1    A  NaN
2    B  2.0
3    B  NaN

To replace the NaN values with empty strings, if you wish, do (note this makes the column object dtype):

df = df.fillna('')
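Putting the pieces together, here is a self-contained sketch of this approach; the DataFrame is reconstructed from the question's example:

```python
import pandas as pd

testDict = {'A': 1, 'B': 2}
df = pd.DataFrame({'Name': ['A', 'A', 'B', 'B']})

# Map only rows where the name appears for the first time;
# all other rows receive NaN through index alignment.
df['Num'] = df['Name'][~df['Name'].duplicated()].map(testDict)
print(df)
#   Name  Num
# 0    A  1.0
# 1    A  NaN
# 2    B  2.0
# 3    B  NaN
```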

You can use duplicated with np.where (this requires import numpy as np):

df['Num'] = np.where(~df['Name'].duplicated(), df['Name'].map(testDict), '')

Output:

  Name Num
0    A   1
1    A    
2    B   2
3    B    
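A runnable sketch of the np.where variant; note that np.where promotes the mixed integer and empty-string inputs to a common string type, so the Num column ends up holding strings rather than numbers:

```python
import numpy as np
import pandas as pd

testDict = {'A': 1, 'B': 2}
df = pd.DataFrame({'Name': ['A', 'A', 'B', 'B']})

# First occurrences get the mapped value, repeats get an empty string.
# np.where coerces the int results and '' to strings.
df['Num'] = np.where(~df['Name'].duplicated(), df['Name'].map(testDict), '')
print(df)
```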


Map after drop_duplicates, assuming you have a unique index for alignment. (It is probably best to keep the NaN values so the column remains numeric.)

df['Num'] = df['Name'].drop_duplicates().map(testDict)

  Name  Num
0    A  1.0
1    A  NaN
2    B  2.0
3    B  NaN
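If you want the column to stay integer-valued despite the missing entries, one option (assuming a pandas version with the nullable Int64 dtype, 0.24 or later) is:

```python
import pandas as pd

testDict = {'A': 1, 'B': 2}
df = pd.DataFrame({'Name': ['A', 'A', 'B', 'B']})

# Map only the first occurrence of each name via drop_duplicates.
df['Num'] = df['Name'].drop_duplicates().map(testDict)

# Convert to the nullable integer dtype so NaN does not force floats.
df['Num'] = df['Num'].astype('Int64')
print(df)
```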
