How to get combined values from a table in Hive
I have a table in Hive with the following structure:
col1 col2 col3 col4 col5 col6
-----------------------------
AA   NM   ER   NER  NER  NER
AA   NM   NER  ERR  NER  NER
AA   NM   NER  NER  TER  NER
AA   NM   NER  NER  NER  ERY
I wrote a query to fetch the records from the table:
Select distinct(col1),col2, array(concat(
  CASE WHEN col3=='ER' THEN 'ER'
       WHEN col4=='ERR' THEN 'ERR'
       WHEN col5=='TER' THEN 'TER'
       WHEN col6=='ERY' THEN 'ERY'
       ELSE 'NER' END
but it's not working, and I'm not sure how to go about it. The output I need is:
col1 col2 col3
--------------
AA   NM   ['ER','ERR','TER','ERY']
Any suggestion/hint will be really helpful.
Please try the query below:
select col1, col2,
       array(max(CASE WHEN col3=='ER'  THEN 'ER'  else '' end),
             max(CASE WHEN col4=='ERR' THEN 'ERR' else '' end),
             max(CASE WHEN col5=='TER' THEN 'TER' else '' end),
             max(CASE WHEN col6=='ERY' THEN 'ERY' else '' end))
from my_table   -- "table" is a reserved word in Hive, so use your real table name
group by col1, col2
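Because of the else '' branches, a marker that never occurs in a (col1, col2) group still appears in the array as an empty string. The sketch below avoids this. It assumes that concat_ws skips NULL arguments, which Hive's implementation does, but check your version. It also assumes the marker values never contain commas. my_table is a placeholder name.

```sql
-- Sketch only: my_table is a placeholder for the real table name.
-- CASE without ELSE yields NULL when the value does not match, and max()
-- ignores NULLs, so a marker that never appears in a group stays NULL.
-- concat_ws drops those NULLs, and split() turns the joined string back
-- into an array, keeping the fixed order ER, ERR, TER, ERY.
SELECT col1, col2,
       split(concat_ws(',',
                       max(CASE WHEN col3 = 'ER'  THEN 'ER'  END),
                       max(CASE WHEN col4 = 'ERR' THEN 'ERR' END),
                       max(CASE WHEN col5 = 'TER' THEN 'TER' END),
                       max(CASE WHEN col6 = 'ERY' THEN 'ERY' END)),
             ',') AS col3
FROM my_table
GROUP BY col1, col2;
```

If none of the four markers occurs in a group, concat_ws returns an empty string. split should then return an array holding a single empty string, not an empty array.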
You can obtain a string that looks like an array using concat_ws:
Select distinct(col1),col2,
       concat_ws('', '[',
                 concat_ws('', "'",col3,"',", "'",col4,"',", "'",col5,"',", "'",col6,"'"),
                 ']')
from my_table
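The query above formats each row separately from all four columns, 'NER' included. DISTINCT applies to the whole select list, so with the sample data it returns four strings rather than one per (col1, col2). The sketch below aggregates first and formats afterwards. It again assumes that concat_ws skips NULL arguments, and my_table is still a placeholder.

```sql
-- Sketch only: my_table is a placeholder for the real table name.
-- Each max(CASE ...) is the marker if it appears anywhere in the group,
-- otherwise NULL (and concat_ws skips NULLs).
-- Joining with the separator "','" and wrapping in "['" ... "']"
-- yields a string such as ['ER','ERR','TER','ERY'].
SELECT col1, col2,
       concat("['",
              concat_ws("','",
                        max(CASE WHEN col3 = 'ER'  THEN 'ER'  END),
                        max(CASE WHEN col4 = 'ERR' THEN 'ERR' END),
                        max(CASE WHEN col5 = 'TER' THEN 'TER' END),
                        max(CASE WHEN col6 = 'ERY' THEN 'ERY' END)),
              "']") AS col3
FROM my_table
GROUP BY col1, col2;
```

This returns a string, not a Hive array. For a group with none of the markers, it would produce [''].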
This is a bit complicated. I think that simply unpivoting is the simplest solution:
select col1, col2, collect_set(col)
from (select col1, col2, col3 as col from t
      union  -- intentional, to remove duplicates
      select col1, col2, col4 as col from t
      union  -- intentional, to remove duplicates
      select col1, col2, col5 as col from t
      union  -- intentional, to remove duplicates
      select col1, col2, col6 as col from t
     ) u
where col is not null and col <> 'NER'  -- 'NER' is not part of the expected output
group by col1, col2;
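The same unpivot can be written with LATERAL VIEW explode(), which reads the table once instead of four times. The sketch below uses the same placeholder table t. The alias v and the column name val are illustrative names, not anything from the question.

```sql
-- Sketch only: t is the placeholder table name used in the answer above.
-- explode(array(...)) turns each input row into four rows, one per column,
-- exposed as v.val; collect_set then gathers the distinct values per group.
SELECT col1, col2, collect_set(v.val) AS col3
FROM t
LATERAL VIEW explode(array(col3, col4, col5, col6)) v AS val
WHERE v.val <> 'NER'   -- also drops NULLs, since NULL <> 'NER' is not true
GROUP BY col1, col2;
```

collect_set does not guarantee element order, so the array may not come back exactly as ['ER','ERR','TER','ERY']. If alphabetical order is acceptable, wrapping the aggregate in sort_array() makes the result deterministic.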
What does WHEN col3=='ER' THEN 'ER' mean? That at least one row has the value 'ER' in col3? Or something else?