Pandas - Strip white space

I am using python csvkit to compare 2 files like this:

import pandas as pd

# sep and delimiter are aliases, so only one of them is passed here
df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8")
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8")
df3 = pd.merge(df1, df2, on='employee_id', how='right')
df3.to_csv('output.csv', encoding='utf-8', index=False)

Currently I run the files through a script beforehand that strips spaces from the employee_id column.

An example of employee_ids:

37 78973 3
2 22 3

Is there a way to get csvkit to do it and save me a step?

You can strip() an entire Series in Pandas using .str.strip():

df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()

This removes leading and trailing whitespace from the employee_id column in both df1 and df2.
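As a minimal sketch of what .str.strip() does, assuming a made-up Series of padded IDs (the values are hypothetical, not taken from the asker's files):

```python
import pandas as pd

# Hypothetical IDs with padding on the outside and spaces on the inside.
ids = pd.Series(["  37 78973 3", "2 22 3  "])

# Only the leading and trailing whitespace is removed.
stripped = ids.str.strip()
print(stripped.tolist())  # ['37 78973 3', '2 22 3']
```

Note that the spaces inside each ID survive, which matters for the IDs shown in the question.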

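Alternatively, you can modify your read_csv lines to also use skipinitialspace=True, which skips spaces that directly follow a delimiter. Note that sep and delimiter are aliases, so pass only one of them; recent pandas versions raise an error when both are given.

df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8", skipinitialspace=True)
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8", skipinitialspace=True)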
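To see the limits of skipinitialspace, here is a small sketch that reads hypothetical CSV text from memory instead of a file (the column names and rows are invented for illustration):

```python
import io

import pandas as pd

# Hypothetical data: spaces after the comma, inside the ID, and at the end.
csv_text = "name,employee_id\nAnn,  37 78973 3\nBob,2 22 3  \n"

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Spaces right after the delimiter are skipped; inner and trailing ones stay.
print(df["employee_id"].tolist())  # ['37 78973 3', '2 22 3  ']
```

So this option helps with padding after the delimiter, but it does not remove trailing spaces or spaces inside the value.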

It looks like you are attempting to remove spaces in a string containing numbers. You can do this by:

df1['employee_id'] = df1['employee_id'].str.replace(" ","")
df2['employee_id'] = df2['employee_id'].str.replace(" ","")
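A quick sketch of the same call on the example IDs from the question, with hypothetical padding added. Passing regex=False is my addition; it makes the literal match explicit, since the default for that parameter has differed between pandas versions:

```python
import pandas as pd

# The example IDs from the question, with made-up outer padding.
ids = pd.Series([" 37 78973 3", "2 22 3 "])

# Replace every space character with nothing, treating " " literally.
no_spaces = ids.str.replace(" ", "", regex=False)
print(no_spaces.tolist())  # ['37789733', '2223']
```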

Stumbled onto this question while looking for a quick and minimalistic snippet I could use. Had to assemble one myself from posts above.

For reference, Series.str.strip(to_strip=None) removes leading and trailing characters: it strips whitespace (including newlines) or a set of specified characters from the left and right sides of each string in the Series/Index.

You can do the strip() in pandas.read_csv() as:

pandas.read_csv(..., converters={'employee_id': str.strip})

And if you need to only strip leading whitespace:

pandas.read_csv(..., converters={'employee_id': str.lstrip})

And to remove all spaces:

def strip_spaces(a_str_with_spaces):
    return a_str_with_spaces.replace(' ', '')

pandas.read_csv(..., converters={'employee_id': strip_spaces})
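The converter approach can be tried end to end with in-memory CSV text. This is only a sketch: the data is hypothetical, and strip_spaces is the helper defined above with a shorter argument name.

```python
import io

import pandas as pd


def strip_spaces(value):
    # Converters receive each raw cell of the column as a string.
    return value.replace(" ", "")


# Hypothetical file contents with spaces around and inside the IDs.
csv_text = "employee_id,name\n 37 78973 3 ,Ann\n2 22 3,Bob\n"

df = pd.read_csv(io.StringIO(csv_text), converters={"employee_id": strip_spaces})
print(df["employee_id"].tolist())
```

Because the cleanup happens while the file is parsed, no separate pre-processing pass is needed.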

Pandas provides three methods for handling whitespace (including newlines) in text data. As the names suggest, str.lstrip() removes whitespace from the left side of each string, str.rstrip() removes it from the right side, and str.strip() removes it from both sides.
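The difference between the three methods can be sketched on a single hypothetical value:

```python
import pandas as pd

# One made-up string with spaces on the left and a space plus newline on the right.
s = pd.Series(["  abc \n"])

left = s.str.lstrip()   # whitespace removed on the left only
right = s.str.rstrip()  # whitespace removed on the right only
both = s.str.strip()    # whitespace removed on both sides

print(repr(left[0]), repr(right[0]), repr(both[0]))
# 'abc \n' '  abc' 'abc'
```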


Stripping the leading and trailing spaces of a column in a pandas DataFrame can be achieved with the str.strip() function, which removes whitespace (including newlines) or a specified set of characters from both ends of each string.

To strip whitespace from every string-like column of a DataFrame, you could use pandas' Series.str.strip() method on each such column. The str.strip() function works really well on a Series, so take the DataFrame column that contains the whitespace, strip it with str.strip(), and assign the result back to the DataFrame.
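One way this could look, as a sketch on a hypothetical two-column DataFrame. Using pd.api.types.is_string_dtype to pick the string-like columns is my choice here, not something the posts above prescribe:

```python
import pandas as pd

# Hypothetical frame: one text column with padding, one numeric column.
df = pd.DataFrame({"name": [" ABC ", " DEF"], "count": [1, 2]})

for col in df.columns:
    # Only touch columns that hold strings; numeric columns have no .str accessor.
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.strip()

print(df["name"].tolist())  # ['ABC', 'DEF']
```

Columns that mix strings with other objects may need extra handling, which this sketch does not attempt.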

To strip the whitespace from pandas DataFrame headers, pass a function to the rename method that applies str.strip() to each column name.

To remove spaces at the end of a plain Python string, use sentence = sentence.rstrip(). All three string functions (strip, lstrip, and rstrip) can take a parameter naming the characters to strip, with the default being all whitespace. This can be helpful when you are working with something particular; for example, you could remove only spaces but not newlines.
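Both ideas can be sketched briefly. The header names echo the Year/Month/Value columns mentioned above, while the padding and the cell values are invented:

```python
import pandas as pd

# Hypothetical frame whose headers carry stray spaces.
df = pd.DataFrame({" Year ": [2020], "Month  ": [1], " Value": [3.5]})

# rename accepts a function and calls it on every column label.
df = df.rename(columns=str.strip)
print(list(df.columns))  # ['Year', 'Month', 'Value']

# Plain Python strings: strip only spaces, leaving the newline in place.
sentence = "hello\n  "
only_spaces = sentence.rstrip(" ")  # 'hello\n'
all_whitespace = sentence.rstrip()  # 'hello'
```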

  • df1.employee_id = df1.employee_id.str.strip()
  • What about skipinitialspace=True in read_csv?
  • Would this approach still work if the space was not either trailing or leading? ie '23 4883 2'?
  • No. strip() only works on leading and trailing white space.
  • Can I use regex or similar instead?
  • @fightstarr20, See my latest edit. That replaces spaces with nothing. Does that accomplish what you are looking for? Your column will still be a string, but you can solve that by using astype(int) after the spaces have been removed.
  • That is perfect, thank you for the examples. I am sure the split() solution will come in handy at some point as well.
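The points raised in the comments (using a regex, then astype(int)) can be combined into one sketch. The two DataFrames stand in for the CSV files and hold hypothetical rows; the pattern \s+ is my choice for matching any run of whitespace:

```python
import pandas as pd

# Hypothetical stand-ins for input1.csv and input2.csv.
df1 = pd.DataFrame({"employee_id": ["37 78973 3", " 2 22 3"], "name": ["Ann", "Bob"]})
df2 = pd.DataFrame({"employee_id": ["377 897 33", "2223 "], "dept": ["Ops", "IT"]})

for df in (df1, df2):
    # Remove every run of whitespace, then convert the cleaned IDs to integers.
    cleaned = df["employee_id"].str.replace(r"\s+", "", regex=True)
    df["employee_id"] = cleaned.astype(int)

# Same merge as in the question, now on matching integer keys.
df3 = pd.merge(df1, df2, on="employee_id", how="right")
print(df3["employee_id"].tolist())  # [37789733, 2223]
```

Converting to int assumes every ID consists only of digits once the whitespace is gone; astype(int) raises an error otherwise.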