Integration of HBase and Hive – An intro to insert JSON data into HBase from Hive

Integration of HBase and Hive – An intro to insert JSON data into HBase from Hive

Here is how JSON data  is inserted into HBase table using Hive.

Use the HBaseStorageHandler to register HBase tables with the Hive metastore. You can optionally specify the HBase table as EXTERNAL, in which case , Hive can not drop that table directly . You will have to use the HBase shell command  to drop  such a table.

Registering the table is the first step. As part of the registration, you also need to specify a column mapping. This is how you will have to link Hive column names to the HBase table’s rowkey and columns. Do so using the hbase.columns.mapping SerDe property.

Step 1: Create a new HBase table which is to be managed by Hive

hive>CREATE TABLE hive_table_emp(id int, name string, role string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "hbase_table_emp");

Step 2: data Input file looks like:

{"id": 123, "name": "Ram", "role":"TeamLead"}
{"id": 456, "name": "Silva", "role":"Member"}
{"id": 789, "name": "Krishna", "role":"Member"}

Step 3: You can use the get_json_object function to parse the data as a JSON object. For instance, if you create a table – staging with your JSON data:

DROP TABLE IF EXISTS staging;
CREATE TABLE staging (json STRING);
LOAD DATA LOCAL INPATH '<file-location>' INTO TABLE staging;

Step 4: Then use get_json_object to extract the attributes you want to load into the table:

INSERT OVERWRITE TABLE hive_table_emp SELECT
  get_json_object(json, "$.id") AS id,
  get_json_object(json, "$.name") AS name,
  get_json_object(json, "$.role") AS role
FROM staging;

Step 5: Let us scan HBase table to validate data is loaded or not:

hbase(main):012:0> scan "hbase_table_emp"
ROW                                         COLUMN+CELL
 123                                        column=cf1:name, timestamp=1462258474614, value=Ram
 123                                        column=cf1:role, timestamp=1462258474614, value=TeamLead
 456                                        column=cf1:name, timestamp=1462258474614, value=Silva
 456                                        column=cf1:role, timestamp=1462258474614, value=Member
 789                                        column=cf1:name, timestamp=1462258474614, value=Krishna
 789                                        column=cf1:role, timestamp=1462258474614, value=Member

 

Leave a Reply

QUOTE TO CASH

VIRTUAL CONFERENCE

6 April 2020 | 11:00 AM PST

Join the Standav Team & Frank Sohn from Novus CPQ, with Special Guests from Bayer, Box, Extreme Networks, Miro, Mobileiron, and Ross Video as we discuss CPQ, Billing, and Pricing Solutions for the Enterprise.

You have successfully registered! You will receive the details to join the conference.