Is it possible to change partition metadata in HIVE?

This is an extension of a previous question I asked: How to compare two columns with different data type groups

We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But is it possible to have that metadata change applied to the partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical data is STRING.

Question: Is it possible to change partition metadata in Hive? If yes, how?
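For context, the per-query CAST workaround we are trying to avoid looks roughly like this (the table and column names are purely illustrative):

    -- Illustrative only: casting the historical STRING column to BIGINT at query time
    SELECT n.id, o.id
    FROM   new_data n
    JOIN   old_data o
      ON   n.id = CAST(o.id AS BIGINT);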


You cannot change the partition column in Hive; in fact, Hive does not support altering partition columns.

Refer to: altering partition column type in Hive

You can think of it this way: Hive stores the data by creating a folder in HDFS for each partition column value. If you try to alter a Hive partition column, you are effectively trying to change the whole directory structure and data of the Hive table, which is not possible. For example, if you have partitioned on year, this is how the directory structure looks:

tab1/clientdata/2009/file2
tab1/clientdata/2010/file3
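As a hypothetical illustration of that layout (note that with Hive's default managed-table naming the partition directories come out as year=2009, year=2010, and so on):

    -- Hypothetical table: each distinct year value gets its own directory under the table location
    CREATE TABLE clientdata (
        id   INT,
        name STRING
    )
    PARTITIONED BY (year INT);

    -- Writing into year=2009 creates a .../clientdata/year=2009/ directory holding the data files
    INSERT INTO TABLE clientdata PARTITION (year = 2009) VALUES (1, 'abc');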

If you want to change the partition column, you can perform the steps below (a fuller sketch follows the list):

  1. Create another Hive table with the required change in the partition column

    CREATE TABLE new_table (A INT, ...) PARTITIONED BY (B STRING);

  2. Load data from the previous table

    INSERT INTO TABLE new_table PARTITION (B) SELECT A, B FROM Prev_table;
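A fuller sketch of this two-step migration (all table and column names here are hypothetical; the SET statements enable the dynamic-partition insert used in step 2):

    -- Hypothetical sketch: rebuild the table with the partition column typed as BIGINT
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    CREATE TABLE new_table (
        a INT
    )
    PARTITIONED BY (b BIGINT);

    -- The dynamic partition column must be the last column in the SELECT list
    INSERT INTO TABLE new_table PARTITION (b)
    SELECT a, CAST(b AS BIGINT) AS b
    FROM prev_table;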


You can change the partition column type using this statement:

alter table {table_name} partition column ({column_name} {column_type});
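For example (the table and column names are hypothetical), changing a daily partition column declared as STRING to BIGINT would look like this:

    -- Hypothetical: change the declared type of the partition column ds from STRING to BIGINT
    ALTER TABLE my_table PARTITION COLUMN (ds BIGINT);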

Alternatively, you can re-create the table definition and change all column types using these steps (an end-to-end sketch follows the note after this list):

  1. Make your table external, so it can be dropped without dropping the data

    ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');

  2. Drop the table (only the metadata will be removed).

  3. Create an EXTERNAL table using the updated DDL with the types changed and the same LOCATION.
  4. Recover the partitions:

    MSCK [REPAIR] TABLE tablename;

The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:

ALTER TABLE tablename RECOVER PARTITIONS;

This will add the partition metadata to Hive. See the manual here: RECOVER PARTITIONS

  5. Finally, you can make your table MANAGED again if necessary:

    ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');

Note: All of the commands above should be run in Hue (i.e., against Hive), not in MySQL.
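Putting the whole procedure together, a minimal end-to-end sketch (assuming a hypothetical table abc with a daily partition column ds that should become BIGINT; the column list and LOCATION are illustrative):

    -- 1. Protect the data by marking the table external
    ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');

    -- 2. Drop only the metadata; the files stay in place
    DROP TABLE abc;

    -- 3. Re-create the table with the updated types and the same LOCATION
    CREATE EXTERNAL TABLE abc (
        id INT,
        payload STRING
    )
    PARTITIONED BY (ds BIGINT)
    LOCATION '/user/hive/warehouse/abc';

    -- 4. Rebuild the partition metadata from the directories on HDFS
    MSCK REPAIR TABLE abc;

    -- 5. Optionally switch the table back to managed
    ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');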


After I changed the Avro (.avsc) schema (see below), I was able to "fix" the already-existing partition by doing "ADD PARTITION", as per this site:

http://hadooptutorial.info/partitioning-in-hive/

ALTER TABLE partitioned_user ADD PARTITION (country = 'US', state = 'CA')
LOCATION '/hive/external/tables/user/country=us/state=ca'
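To confirm that the partition is now registered in the metastore, a quick check (using the table name from the tutorial above) would be:

    -- Should now list the country=US/state=CA partition
    SHOW PARTITIONS partitioned_user;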

I changed the Avro schema by doing a Sqoop import from MySQL (either by altering the field in MySQL or using CAST() in the SELECT); this modified the .avsc file.

I had done multiple things before the ADD PARTITION (DROP/CREATE/MSCK TABLE), so I'm not sure whether they are or aren't needed, but on their own they had not fixed the partition.

Simple.
