



These notes explain how to create HBase tables from Hive, convert existing HBase tables into Hive HBase tables, and query both types of tables using Hive. They include instructions for installing and configuring Hive and HBase, creating and populating Hive tables, and performing various queries. The integration lets users run Hive queries over data stored in HBase.
Typology: Study notes
You can create HBase tables from Hive that can be accessed by both Hive and HBase. This allows you to run Hive queries on HBase tables. You can also convert existing HBase tables into Hive HBase tables and run Hive queries on those tables as well.
In this section:
Install and Configure Hive and HBase
Getting Started with Hive HBase Integration
Example:
$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain
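Assuming the jps listing above is meant to confirm that the HBase daemons are running before Hive is configured, a check of that kind could be sketched as follows. The function names and the REQUIRED list are illustrative assumptions, not part of Hive, HBase, or the original notes; adjust the list to match your own cluster.

```python
# Sample jps output copied from the listing above.
JPS_OUTPUT = """\
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain
"""

# Assumption: these three process names are the ones worth checking for HBase.
REQUIRED = ["HMaster", "HRegionServer", "QuorumPeerMain"]


def running_processes(jps_output):
    """Return the set of process names found in jps output ("<pid> <name>" per line)."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # split into pid and name
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_processes(jps_output, required):
    """Return the required process names that do not appear in the jps output."""
    return sorted(set(required) - running_processes(jps_output))


print(missing_processes(JPS_OUTPUT, REQUIRED))  # prints []
```

An empty list means every name in REQUIRED appeared in the listing.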
$ cd $HIVE_HOME
$ vi conf/hive-site.xml
Getting Started with Hive HBase Integration
In this tutorial you will:
Create a Hive table
Populate the Hive table with data from a text file
Query the Hive table
Create a Hive HBase table
Introspect the Hive HBase table from HBase
Populate the Hive HBase table with data from the Hive table
Query the Hive HBase table from Hive
Convert an existing HBase table into a Hive HBase table
Be sure that you have successfully completed all the steps in the Install and Configure Hive and HBase section before beginning this Getting Started tutorial. This tutorial closely parallels the Hive HBase Integration section of the Apache Hive Wiki; thanks go to Samuel Guo and the other contributors to that effort.
Create a Hive table with two columns:
Change to your Hive installation directory if you're not already there and start Hive:
$ cd $HIVE_HOME
$ bin/hive
Two rows in the Hive pokes table have a key of 98.
Note: This is a good illustration of the concept that Hive tables can have multiple identical keys. As we will see shortly, HBase tables cannot have multiple identical keys, only unique keys.
To create a Hive HBase table, enter these four lines of code at the Hive prompt:
After a brief delay, a message appears confirming that the table was created successfully:
Note: The TBLPROPERTIES clause is not required, but those new to Hive HBase integration may find it easier to understand what is going on if Hive and HBase use different names for the same table.
In this example, Hive will recognize this table as "hbase_table_1" and HBase will recognize this table as "xyz".
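To make the two table names and the column mapping concrete, the sketch below pairs each Hive column with its HBase target, using the mapping string ":key,cf1:val" and the names hbase_table_1 and xyz from this tutorial. The function pair_columns is a hypothetical helper written for illustration only; it is not how Hive parses the property internally.

```python
def pair_columns(hive_columns, mapping):
    """Pair Hive column names with entries of an hbase.columns.mapping string.

    Illustrative only: ":key" marks the HBase row key, and any other entry
    is read as "<column family>:<qualifier>".
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")
    pairs = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            pairs[column] = ("row key", None, None)
        else:
            family, _, qualifier = entry.partition(":")
            pairs[column] = ("column", family, qualifier)
    return pairs


# The same table as seen from each side, using the names from this tutorial.
table = {
    "hive_name": "hbase_table_1",
    "hbase_name": "xyz",
    "columns": pair_columns(["key", "value"], ":key,cf1:val"),
}

for hive_column, target in table["columns"].items():
    print(hive_column, "->", target)
```

Read this way, the Hive column key becomes the HBase row key, and the Hive column value is stored under the qualifier val in the column family cf1.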
Start the HBase shell:
Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:
Execute the list command to see a list of HBase tables:
HBase recognizes the Hive HBase table named xyz. This is the same table known to Hive as hbase_table_1.
hive> CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
Time taken: 5.195 seconds
$ cd $HBASE_HOME
$ bin/hbase shell
HBase Shell; enter 'help
hbase(main):001:0>
hbase(main):001:0> list
TABLE
xyz
1 row(s) in 0.8260 seconds
Display the description of the xyz table in the HBase shell:
From the Hive prompt, insert data from the Hive table pokes into the Hive HBase table hbase_table_1:
Query hbase_table_1 to see the data we have inserted into the Hive HBase table:
Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into hbase_table_1. This is because hbase_table_1 is an HBase table: Hive tables support duplicate keys, but an HBase table holds at most one row per key. When several incoming rows share a key, only one of them remains visible, and the data from the others is silently lost.
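The difference can be pictured with a small model: a Hive table behaves like a list of rows, while an HBase table behaves like a map keyed by row key. This is a simplified sketch, not HBase code. The row values "first" and "second" are made up for illustration, cell versions and timestamps are ignored, and which duplicate survives here follows list order, which a real load job does not guarantee.

```python
# A Hive-like table: a plain list, so duplicate keys are allowed.
# The values "first" and "second" are hypothetical placeholders.
hive_rows = [
    (98, "first"),
    (98, "second"),
]


def load_into_hbase_like(rows):
    """Model an HBase table as a dict keyed by row key.

    Writing a row whose key already exists replaces the stored value,
    so the table never holds two rows with the same key.
    """
    table = {}
    for key, value in rows:
        table[key] = {"cf1:val": value}
    return table


hbase_like = load_into_hbase_like(hive_rows)

print(len(hive_rows))   # prints 2
print(len(hbase_like))  # prints 1
```

Two rows go in and one comes out, which matches the single row returned by the SELECT against hbase_table_1.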
Convert a pre-existing HBase table to a Hive HBase table
To convert a pre-existing HBase table to a Hive HBase table, enter the following four lines of code at the Hive prompt.
Note that in this example the existing HBase table is my_hbase_table.
Now we can run a Hive query against the pre-existing HBase table my_hbase_table that Hive sees
hbase(main):004:0> describe "xyz"
DESCRIPTION                                                          ENABLED
{NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE',  true
REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}
1 row(s) in 0.0190 seconds
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
...
2 Rows loaded to hbase_table_
OK
Time taken: 13.384 seconds
hive> SELECT * FROM hbase_table_1;
OK
98 val_
Time taken: 0.56 seconds
hive> CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
TBLPROPERTIES("hbase.table.name" = "my_hbase_table");