Pandas pd.read_csv does not work with a simple sep=','
Good afternoon, everybody.
I know this is quite a basic question, but I simply do not understand why it does not work the way I expected.
The task is as follows:
I have a file data.csv in this format:

id,"feature_1","feature_2","feature_3"
00100429,"PROTO","Proprietary","Phone"
00100429,"PROTO","Proprietary","Phone"
The task is to import this data using pandas. I know that read_csv uses the comma separator by default, so I just imported it as follows:
data = pd.read_csv('data.csv')
The result I got was unchanged from the raw file shown at the beginning: a single column containing everything.
I tried many other separators using regex, and the only one that made some sort of improvement was:
data = pd.read_csv('data.csv',sep="\,",engine='python')
On the one hand, it finally separated the columns; on the other hand, the resulting data is inconvenient to use. In particular:

"id ""feature_1"" ""feature_2"" ""feature_3"""
"00100429 ""PROTO"" ""Proprietary"" ""Phone"""
Therefore, I think there must be a mistake somewhere, because the data itself seems fine.
So the question is: how do I import this CSV file so that the columns are separated and there are no stray quote characters?
Here's my quick solution for your problem -
import numpy as np
import pandas as pd

### Reading the file, treating the header as the first row, and later removing all the double quotes
### (sep='\,' is a regex, so the python engine is needed)
df = pd.read_csv('file.csv', sep='\,', header=None, engine='python').apply(lambda x: x.str.replace(r'\"', '', regex=True))
df
          0          1            2          3
0        id  feature_1    feature_2  feature_3
1  00100429      PROTO  Proprietary      Phone
2  00100429      PROTO  Proprietary      Phone

### Putting the column names back and dropping the first row
df.columns = df.iloc[0]
df.drop(index=0, inplace=True)
df  ## you can also reset the index
         id feature_1    feature_2 feature_3
1  00100429     PROTO  Proprietary     Phone
2  00100429     PROTO  Proprietary     Phone

### Converting the `id` column dtype back to int (change according to your needs)
df.id = df.id.astype('int64')
np.result_type(df.id)
dtype('int64')
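As a side note: the doubled quotes in the question's output (e.g. ""PROTO"") suggest that each whole line of the file may be wrapped in an extra outer pair of quotes, which would also explain why the default sep=',' sees only one column. Assuming that file shape (an assumption, not confirmed by the question), a sketch that unescapes the lines first and then lets the default comma parsing do the rest:

```python
import io
import pandas as pd

# Hypothetical file content: every record wrapped in an outer pair of
# quotes, with the inner quotes doubled ("" instead of ").
raw = '\n'.join([
    '"id,""feature_1"",""feature_2"",""feature_3"""',
    '"00100429,""PROTO"",""Proprietary"",""Phone"""',
])

# Drop the outer quote on each line and un-double the inner ones,
# then parse the cleaned text normally.
cleaned = '\n'.join(
    line[1:-1].replace('""', '"') for line in raw.splitlines()
)
df = pd.read_csv(io.StringIO(cleaned))
print(df)
```

With the escaping undone, read_csv needs no custom separator at all and the quotes disappear on their own.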
Here's just an alternative way to dataLeo's answer -
import pandas as pd
import numpy as np
Reading the file into a dataframe, and removing all the double quotes from the row values
df = pd.read_csv("file.csv", sep="\,", engine="python").apply(lambda x: x.str.replace(r'\"', '', regex=True))
df
       "id" "feature_1"  "feature_2" "feature_3"
0  00100429       PROTO  Proprietary       Phone
1  00100429       PROTO  Proprietary       Phone
Removing all the double quotes from the column names
df.columns = df.columns.str.replace('\"', '')
df
         id feature_1    feature_2 feature_3
0  00100429     PROTO  Proprietary     Phone
1  00100429     PROTO  Proprietary     Phone
Converting the `id` column dtype back to int (change according to your needs)
df.id = df.id.astype('int')
np.result_type(df.id)
dtype('int32')
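For comparison: when the file really is a plain comma-separated file with quoted fields (the format the question's sample appears to describe), the default parser already strips the quotes on its own, since quotechar defaults to '"' — no post-processing needed. A minimal check using an in-memory file; the dtype={'id': str} part is optional and only there to preserve the leading zeros in id:

```python
import io
import pandas as pd

# A well-formed CSV: comma separator, quoted fields.
text = (
    'id,"feature_1","feature_2","feature_3"\n'
    '00100429,"PROTO","Proprietary","Phone"\n'
)

# Default sep=',' and quotechar='"' handle the quotes automatically;
# dtype keeps id as a string so the leading zeros survive.
df = pd.read_csv(io.StringIO(text), dtype={'id': str})
print(df)
```

If this clean input parses fine but the real file does not, the file on disk is probably not in exactly this shape (extra quoting, a different delimiter, or an unexpected encoding).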