Remove certain characters if on end of string in Pandas
I have a list of names that I have uppercased and stripped of spaces and non-alphabetic characters so that it merges more easily with another list. Both lists are in pandas DataFrames.
Some names in one DataFrame have JR attached to the end, while their counterparts in the other DataFrame do not contain this suffix. How can I strip the trailing JR from both?
I tried something like the following:
df['NAME'] = df['NAME'].str.replace('JR','')
but I think this would remove every instance of JR, not only those where it is the last two characters. Any help would be appreciated.
You could use replace with a regex:
    import pandas as pd

    df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])
    df['name'] = df.name.str.replace(r'\bJR$', '', regex=True).str.strip()
    print(df)

                 name
    0            Name
    1  Name JR Middle
    2         JR Name
'\bJR$' matches the word JR only at the end of the string.
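One caveat worth checking against the question: the names there are described as uppercased with spaces already removed. In a value such as JOHNSMITHJR (a made-up example) there is no word boundary before JR, so `\bJR$` would leave it untouched, while `JR$` alone still anchors the match to the end of the string. A minimal sketch with hypothetical sample names:

```python
import pandas as pd

# Hypothetical sample data: uppercased names with spaces already removed,
# as described in the question.
names = pd.Series(["JOHNSMITHJR", "JRSMITH", "MARYJONES", "JRJR"])

# \b needs a transition between a word and a non-word character;
# "H" followed by "J" is not one, so the suffix in "JOHNSMITHJR" stays.
with_boundary = names.str.replace(r"\bJR$", "", regex=True)

# Anchoring with $ alone removes JR only when it is the last two characters.
without_boundary = names.str.replace(r"JR$", "", regex=True)

print(with_boundary.tolist())
print(without_boundary.tolist())
```

If the names still contain spaces, as in the sample frame above, keeping `\b` avoids trimming a name that merely happens to end in the letters JR.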
Related: Remove ends of string entries in pandas DataFrame column. Note that rstrip is not a safe way to drop a suffix: it removes any trailing characters that belong to the given set (in this case ., t, x), so it can remove more than intended. Example:

    print(df)
      filename  A  B  C
    0  txt.txt  2  4  5
    1    x.txt  1  2  1

    df['filename'] = df['filename'].str.rstrip('.txt')
    print(df)
      filename  A  B  C
    0           2  4  5
    1           1  2  1
    import re

    def jr_replace(x):
        # Remove JR only when it is the last two characters of the string.
        return re.sub(r'JR$', '', x)

    df['NAME'] = df['NAME'].apply(jr_replace)
    print(df)
One option is to check which rows end with JR using str.endswith, and remove it from those rows by slicing the string:

    m = s.str.endswith('JR')
    s.loc[m] = s.loc[m].str[:-2]
Using @danielmesejo's dataframe:
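    df = pd.DataFrame(data=['Name JR', 'Name JR Middle', 'JR Name'], columns=['name'])
    m = df['name'].str.endswith('JR')
    # Assign through a single .loc call to avoid chained assignment.
    # 'Name JR' becomes 'Name ' (the space before JR remains).
    df.loc[m, 'name'] = df.loc[m, 'name'].str[:-2]
    print(df)

                 name
    0           Name 
    1  Name JR Middle
    2         JR Name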
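The endswith-then-slice pattern can also be written as a single call on newer pandas. The sketch below assumes `Series.str.removesuffix` is available (pandas 1.4 or later) and adds an `rstrip` to drop the space left behind. It is an alternative spelling, not the method from the answer above.

```python
import pandas as pd

# Same sample frame as used in the answers.
df = pd.DataFrame(data=["Name JR", "Name JR Middle", "JR Name"], columns=["name"])

# removesuffix (assumed: pandas >= 1.4) drops "JR" only where the string
# ends with it; rows without the suffix are returned unchanged.
# rstrip then removes the space that preceded the suffix.
df["name"] = df["name"].str.removesuffix("JR").str.rstrip()

print(df["name"].tolist())
```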
- How about:
df['NAME'] = df['NAME'].apply(lambda x: x[:-2] if x.endswith('JR') else x)
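One thing to watch with the element-wise version: if the column contains missing values, `x.endswith` is called on a float NaN and would typically raise an AttributeError. A minimal sketch of two ways around that, using a hypothetical column with one missing name and a hypothetical helper `strip_jr`:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing name.
df = pd.DataFrame({"NAME": ["JOHNSMITHJR", np.nan, "JRSMITH"]})

def strip_jr(value):
    # Guard the element-wise version so non-string values pass through.
    if isinstance(value, str) and value.endswith("JR"):
        return value[:-2]
    return value

via_apply = df["NAME"].apply(strip_jr)

# The vectorized .str accessor leaves missing values as missing
# without an explicit guard.
via_str = df["NAME"].str.replace(r"JR$", "", regex=True)

print(via_apply.tolist())
print(via_str.tolist())
```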