Hive load CSV with commas in quoted fields

I am trying to load a CSV file into a Hive table like so:

CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;    

The CSV is delimited by a comma (,) and looks like this:

1, "some text, with comma in it", 123, "more text"

This returns corrupt data because there is a ',' inside the first string. Is there a way to set a text delimiter or make Hive ignore the ',' inside strings?

I can't change the delimiter of the CSV since it gets pulled from an external source.

The problem is that Hive's delimited row format doesn't handle quoted text. You either need to pre-process the data by changing the delimiter between the fields (e.g. with a Hadoop streaming job), or you can try a custom CSV SerDe that uses OpenCSV to parse the files.

If you can re-create or parse your input data, you can specify an escape character in the CREATE TABLE statement:

ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';

Hive will then read this line as 4 fields:

1,some text\, with comma in it,123,more text
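
For context, here is a minimal sketch of the whole table definition with the escape clause in place. The table name mytable_escaped is made up for illustration, and it assumes the file has already been rewritten so that embedded commas are preceded by a backslash and the surrounding double quotes are gone:

```sql
-- Sketch only: mytable_escaped is a hypothetical name.
-- Assumes embedded commas in the file are already written as "\,".
CREATE TABLE mytable_escaped
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\';

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable_escaped;
```

This keeps the typed INT columns, at the cost of having to rewrite the source file first.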

As of Hive 0.14, the CSV SerDe is a standard part of the Hive install:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

(See: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde)
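
As a rough sketch of how that line fits into a full table definition for the file in the question: the names mytable_csv and mytable_typed are invented here, and the SERDEPROPERTIES shown simply spell out the SerDe's documented defaults. Note that OpenCSVSerde treats every column as STRING, so the numeric columns are cast in a separate view; the TRIM is there on the assumption that the space after each comma in the sample row ends up in the value.

```sql
-- Sketch only: mytable_csv and mytable_typed are hypothetical names.
-- OpenCSVSerde exposes every column as STRING, whatever type is declared.
CREATE TABLE mytable_csv
(
num1 STRING,
text1 STRING,
num2 STRING,
text2 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "\"",
   "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable_csv;

-- Recover the numeric types; TRIM guards against stray spaces around values.
CREATE VIEW mytable_typed AS
SELECT CAST(TRIM(num1) AS INT) AS num1,
       text1,
       CAST(TRIM(num2) AS INT) AS num2,
       text2
FROM mytable_csv;
```

Querying the view instead of the base table gives back the INT columns from the original definition, while the quoted commas are handled by the SerDe.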

Keep the delimiter in single quotes and it will work:

ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

This will work.

Add a backslash before the delimiter in FIELDS TERMINATED BY '\;'.

For example:

CREATE TABLE demo_table_1_csv
COMMENT 'my_csv_table 1'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\;'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'your_hdfs_path'
AS
SELECT a.tran_uuid, a.cust_id, a.risk_flag,
       a.lookback_start_date, a.lookback_end_date,
       b.scn_name, b.alerted_risk_category,
       CASE WHEN (b.activity_id IS NOT NULL) THEN 1 ELSE 0 END AS Alert_Flag
FROM scn1_rcc1_agg AS a
LEFT OUTER JOIN scenario_activity_alert AS b
  ON a.tran_uuid = b.activity_id;

I have tested it, and it worked.

Comments
  • sed -i 's/"//g' your_file_name pre-processes the file in place by removing the double-quote characters. However, you NEED to be certain that this does not inadvertently remove other, intended quote (") characters.
  • That handles embedded commas, but not embedded newlines, which are the other gotcha in CSV data. Or can the newlines be escaped too? The spec at cwiki.apache.org/confluence/display/Hive/… doesn't seem to allow escaping newlines.
  • If your HIVE is up-to-date, this is the best answer :)
  • This helped me too!
  • When you use OpenCSVSerde, is there a way to specify how NULL is defined? Using "ROW FORMAT DELIMITED" I could add the option "NULL DEFINED AS ' '" to recognize null values in the data.
  • This is not working for me; Hive shows the quoted value as NULL.
  • @wrschneider, where can I download this SerDe?
  • It works because '\;' is the same thing as ';'. There is no need to escape the semicolon.