ORC fileformat with Impala

Can the ORC file format be used in Impala? Also, how do I access an ORC table stored in the Hive metastore from Impala? I found the documentation link below, but it doesn't contain any list of restricted file formats or any mention of ORC not being supported with Impala: http://www.cloudera.com/documentation/enterprise/latest/topics/impala_file_formats.html

ORC is not supported in Impala. Rather, Apache Parquet is the recommended format for best performance.

Using the ORC File Format with Impala Tables | 6.3.x: The ORC format defines a set of data types whose names differ from the names of the corresponding Impala data types. If you are preparing ORC files using other Hadoop components such as Pig or MapReduce, you might need to work with the type names defined by ORC. The Cloudera documentation lists the ORC-defined types and the equivalent types in Impala.

Impala cannot read the ORC file format. If you have the option, I would suggest migrating your ORC files to Parquet with Hive. The advantage is that you pay the cost of setting up the MapReduce tasks only once.

If your ORC table is nameoforctable, a very basic query looks like:

CREATE TABLE nameoforctable_parquet
LIKE nameoforctable
STORED AS PARQUET
LOCATION '/your/hdfs/location';

INSERT INTO nameoforctable_parquet 
SELECT * FROM nameoforctable;
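
Since Impala cannot read the ORC source table, the INSERT above has to run in Hive, and Impala will not pick up the new table automatically. A minimal sketch of the follow-up step in impala-shell, reusing the table name from above:

-- In impala-shell: load the new table's metadata from the Hive metastore, then sanity-check it
INVALIDATE METADATA nameoforctable_parquet;
SELECT COUNT(*) FROM nameoforctable_parquet;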

By default, ORC reads are enabled in Impala 3.4.0 and higher. To disable support for ORC data files: in Cloudera Manager, navigate to Clusters > Impala; in the Configuration tab, set --enable_orc_scanner=false in the Impala Command Line Argument Advanced Configuration Snippet (Safety Valve) field.

Even though ORC is the only format that supports the ACID feature in Hive and has demonstrated better query performance and compression ratios in some benchmarking studies, Impala doesn't support the ORC file format because it was created by Hortonworks, one of Cloudera's major competitors. Vice versa, the Hive version shipped with the Hortonworks Data Platform (HDP) does not support Parquet for the same reason.

We used the ORC file format in Hive and the Parquet file format in Impala, which are the popular columnar formats that each system advertises.

Use the following command to create an ORC-format table in Impala:

CREATE TABLE orc_table_name_1 (x INT, y STRING) STORED AS ORC;
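
To query an ORC table that already exists in the Hive metastore (assuming ORC reads are enabled, as described above), the usual metadata step applies. A minimal sketch with a hypothetical table name:

-- In impala-shell: pick up the Hive-created table definition, then query it
INVALIDATE METADATA your_hive_orc_table;
SELECT * FROM your_hive_orc_table LIMIT 10;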

[PDF] SQL-on-Hadoop, Impala File Format Support: Impala can only query the file formats listed in the file formats table of its documentation; in particular, Impala does not support ORC. Impala supports RCFile, but as of CDH 5.7, which is the latest release at the time of writing, there is no support for ORC.

Impala File Format Support - Z² Little: The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was added in Hive 0.11; Impala doesn't support ORCFile. Examples Using AVRO and ORC with Hive and Impala: Building off our first post on TEXTFILE and PARQUET, we decided to show examples with AVRO and ORC. AVRO is a row-oriented format, while Optimized Row Columnar (ORC) is a format tailored to perform well in Hive. These examples were executed on CDH 5.2.0 running Hive 0.13.1 + Cloudera back ports.

Hive ORCFile: Hive supports different file formats such as TextFile, SequenceFile, RCFile, AVRO, ORC, and Parquet; Cloudera Impala also supports several of these file formats. Does Impala support the ORC file format? If not, when will support be available?

Apache Hive Different File Formats: TextFile, SequenceFile, RCFile: Apache Hive supports several familiar file formats; let us examine them one by one. An ORC file stores row data in groups called stripes. At present, Hive and Impala are able to query newly added columns. As noted above, ORC reads are enabled by default in Impala 3.4.0 and higher and can be disabled with --enable_orc_scanner=false at startup. Impala cannot write ORC data files; import data by using LOAD DATA on data files already in the right format, or use INSERT in Hive followed by REFRESH table_name in Impala.
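
That last workflow might look roughly like this; a sketch with a hypothetical table name my_orc_table, since Impala cannot write ORC files itself:

-- In Hive: populate the ORC table (Impala cannot write ORC)
INSERT INTO TABLE my_orc_table SELECT * FROM some_source_table;

-- In impala-shell: make the new data files visible to Impala
REFRESH my_orc_table;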

Comments
  • "the Hive version on Hortonworks ... does not support Parquet" >> WHAT? can you prove that claim??
  • From my discussion with an HDP support engineer, HDP doesn't officially support Parquet in their platform; i.e., you can still use Parquet, but if you have any problems with it, you are on your own.
  • OK, so it's a bit different: Impala works with only one columnar format, i.e. Apache Parquet, because it uses Impala-specific C++ libraries; Apache Hive works with lots of formats that provide standard Hive "SerDe" Java libraries, but Hortonworks paid support covers only one columnar format, i.e. Apache ORC (and not Apache Parquet nor Apache CarbonData). That makes sense.
  • That answer is ambiguous. Please specify which version of Impala introduced ORC support (and also which version of the CDH distro introduced that version of Impala).