Docsity

Hive and HBase Integration: Creating and Querying Hive-HBase Tables, Study notes of Data Analysis & Statistical Methods

How to create Hive tables from HBase, convert existing HBase tables into Hive-HBase tables, and query both types of tables using Hive. It includes instructions for installing and configuring Hive and HBase, creating and populating Hive tables, and performing various queries. This integration allows users to leverage the strengths of both systems, making data analysis more efficient.

Typology: Study notes

2020/2021

Uploaded on 05/16/2021

ashwini-khandre

Hive and HBase Integration

You can create HBase tables from Hive that can be accessed by both Hive and HBase. This allows you to run Hive queries on HBase tables. You can also convert existing HBase tables into Hive HBase tables and run Hive queries on those tables as well.

In this section:

  - Install and Configure Hive and HBase
  - Getting Started with Hive HBase Integration

Install and Configure Hive and HBase

  1. Install and configure Hive if it is not already installed.
  2. Install and configure HBase if it is not already installed.
  3. Execute the jps command and ensure that all relevant Hadoop, HBase and ZooKeeper processes are running.
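The check in step 3 can be scripted. The following is a minimal Python sketch, not part of the original tutorial: the helper names (`running_processes`, `missing_processes`, `check_local_node`) are hypothetical, and the `REQUIRED` set is an assumption taken from the sample `jps` output in this section. Hadoop daemon names vary between cluster types, so adjust the set for your environment.

```python
# Sketch of the step-3 check. REQUIRED is an assumption based on the sample
# `jps` output in this section; Hadoop daemon names differ between clusters
# (JobTracker/TaskTracker here, other names on other distributions).
import subprocess

REQUIRED = {"HMaster", "HRegionServer", "QuorumPeerMain", "JobTracker", "TaskTracker"}


def running_processes(jps_output: str) -> set:
    """Collect process names from `jps` output, one "<pid> <name>" per line."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1].strip())
    return names


def missing_processes(jps_output: str, required=REQUIRED) -> set:
    """Return the required daemons that do not appear in the `jps` output."""
    return set(required) - running_processes(jps_output)


def check_local_node(required=REQUIRED) -> set:
    """Run `jps` on this machine (it must be on PATH) and report missing daemons."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return missing_processes(result.stdout, required)
```

On a cluster node, `check_local_node()` returns an empty set when every listed daemon is running; parsing is kept separate from running `jps` so the logic can be checked against saved output.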

Example:

```
$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain
```

Configure the hive-site.xml File

  1. Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:

```
$ cd $HIVE_HOME
$ vi conf/hive-site.xml
```

  2. Copy the following XML code and paste it into the configuration element block of the hive-site.xml file.

```
<configuration>
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///opt/mapr/hive/hive-<version>/lib/hive-hbase-handler-<version>-mapr.jar,file:///opt/mapr/hbase/hbase-<version>/lib/hbase-client-<version>-mapr.jar,file:///opt/m…</value>
    <description>A comma separated list (with no spaces) of the jar files required for …</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>xx.xx.x.xxx,xx.xx.x.xxx,xx.xx.x.xxx</value>
    <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeeper…</description>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>5181</value>
    <description>The ZooKeeper client port. The MapR default clientPort is 5181.</description>
  </property>
</configuration>
```

     Replace each `<version>` placeholder with the Hive or HBase version installed on your node, and each `xx.xx.x.xxx` with the IP address of one of your ZooKeeper nodes.

  3. Save and close the hive-site.xml file.

Getting Started with Hive HBase Integration

In this tutorial you will:

  - Create a Hive table
  - Populate the Hive table with data from a text file
  - Query the Hive table
  - Create a Hive HBase table
  - Introspect the Hive HBase table from HBase
  - Populate the Hive HBase table with data from the Hive table
  - Query the Hive HBase table from Hive
  - Convert an existing HBase table into a Hive HBase table

Be sure that you have successfully completed all the steps in the Install and Configure Hive and HBase section before beginning this Getting Started tutorial. This tutorial closely parallels the Hive HBase Integration section of the Apache Hive Wiki, with thanks to Samuel Guo and the other contributors to that effort.

Create a Hive table with two columns:

Change to your Hive installation directory if you're not already there and start Hive:

```
$ cd $HIVE_HOME
$ bin/hive
```

the Hive pokes table with a key of 98.

Note: This is a good illustration of the concept that Hive tables can have multiple identical keys. As we will see shortly, HBase tables cannot have multiple identical keys, only unique keys.
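To make the note above concrete, here is a toy Python model of writing Hive rows into an HBase-backed table. It is a simplified illustration, not HBase's implementation. It assumes every write receives a strictly increasing timestamp and that a column family keeps at most three versions per cell (the `describe "xyz"` output later in this tutorial reports `VERSIONS => '3'` for `cf1`). The class name `ToyHBaseTable` and the row values `"first"` and `"second"` are hypothetical.

```python
# Toy model (NOT HBase's implementation) of loading Hive rows into an
# HBase-backed table. Assumptions: each put gets a strictly increasing
# timestamp, and a column family keeps at most MAX_VERSIONS versions per cell.
from collections import defaultdict
from itertools import count

MAX_VERSIONS = 3


class ToyHBaseTable:
    """Maps row key -> column -> list of (timestamp, value), newest first."""

    def __init__(self, max_versions=MAX_VERSIONS):
        self.max_versions = max_versions
        self._rows = defaultdict(dict)
        self._clock = count(1)  # stand-in for HBase write timestamps

    def put(self, row_key, column, value):
        versions = self._rows[row_key].setdefault(column, [])
        versions.insert(0, (next(self._clock), value))  # newest version first
        del versions[self.max_versions:]  # keep only the newest max_versions

    def scan(self, column):
        """Return one (row_key, newest value) pair per row, in sorted key order."""
        return [
            (key, cells[column][0][1])
            for key, cells in sorted(self._rows.items())
            if column in cells
        ]

    def versions(self, row_key, column):
        """All retained values for one cell, newest first."""
        return [value for _, value in self._rows.get(row_key, {}).get(column, [])]


if __name__ == "__main__":
    # Two hypothetical Hive rows that share the key 98.
    hive_rows = [("98", "first"), ("98", "second")]
    table = ToyHBaseTable()
    for key, value in hive_rows:
        table.put(key, "cf1:val", value)
    print(len(hive_rows), "rows written;", table.scan("cf1:val"))
```

A Hive table can store both rows, but in this model the two writes land on the same row key, so a scan returns one row holding the most recent value. In this model the older value survives only as an earlier cell version, which a scan does not return.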

To create a Hive HBase table, enter these four lines of code at the Hive prompt:

```
hive> CREATE TABLE hbase_table_1(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz");
```

After a brief delay, a message appears confirming that the table was created successfully:

```
OK
Time taken: 5.195 seconds
```

Note: The TBLPROPERTIES clause is not required; without it, the HBase table gets the same name as the Hive table. Those new to Hive HBase integration may find it easier to understand what's going on if Hive and HBase use different names for the same table.

In this example, Hive will recognize this table as "hbase_table_1" and HBase will recognize this table as "xyz".

Start the HBase shell:

Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:

```
$ cd $HBASE_HOME
$ bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4, rUnknown, Wed Nov 9 17:35:00 PST 2011

hbase(main):001:0>
```

Execute the list command to see a list of HBase tables:

```
hbase(main):001:0> list
TABLE
xyz
1 row(s) in 0.8260 seconds
```

HBase recognizes the Hive HBase table named xyz. This is the same table known to Hive as hbase_table_1.
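The `hbase.columns.mapping` property in the CREATE TABLE statement pairs each Hive column, by position, with an HBase column: `:key` marks the row key, and `cf1:val` names column family `cf1`, qualifier `val`. The Python sketch below illustrates that pairing for hbase_table_1. The names `ColumnMapping` and `parse_mapping` are hypothetical, and the sketch handles only the `:key` and `family:qualifier` forms, requiring one entry per Hive column. It does not model mappings that omit `:key`, such as the `cf1:val` mapping used when converting an existing table, and the real storage handler accepts more forms than this.

```python
# Sketch of how "hbase.columns.mapping" lines up with Hive columns.
# Only ":key" and "family:qualifier" entries are handled, and this sketch
# (not Hive) requires exactly one mapping entry per Hive column.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnMapping:
    hive_column: str
    is_row_key: bool
    family: Optional[str] = None
    qualifier: Optional[str] = None


def parse_mapping(hive_columns: List[str], mapping: str) -> List[ColumnMapping]:
    """Pair each Hive column, by position, with its mapping entry."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    result = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            result.append(ColumnMapping(column, is_row_key=True))
            continue
        family, sep, qualifier = entry.partition(":")
        if not (sep and family and qualifier):
            raise ValueError(f"entry not handled by this sketch: {entry!r}")
        result.append(ColumnMapping(column, False, family, qualifier))
    return result


if __name__ == "__main__":
    for m in parse_mapping(["key", "value"], ":key,cf1:val"):
        target = "row key" if m.is_row_key else f"{m.family}:{m.qualifier}"
        print(f"Hive column {m.hive_column!r} -> HBase {target}")
```

For hbase_table_1, the Hive column `key` becomes the HBase row key and `value` is stored in `cf1:val`, which is why the `describe "xyz"` output lists a single column family, `cf1`.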

Display the description of the xyz table in the HBase shell:

```
hbase(main):004:0> describe "xyz"
DESCRIPTION                                                          ENABLED
 {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE',  true
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0190 seconds
```

From the Hive prompt, insert data from the Hive table pokes into the Hive HBase table hbase_table_1:

```
hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
...
2 Rows loaded to hbase_table_
OK
Time taken: 13.384 seconds
```

Query hbase_table_1 to see the data we have inserted into the Hive HBase table:

```
hive> SELECT * FROM hbase_table_1;
OK
98      val_
Time taken: 0.56 seconds
```

Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row appears in hbase_table_1. This is because hbase_table_1 is an HBase table: Hive tables allow duplicate keys, but each HBase row key is unique, so the second write to row 98 overwrites the first. Which of the two rows remains visible depends on the order in which they happen to be written, and no error is reported (Hive still says "2 Rows loaded").

Convert a pre-existing HBase table to a Hive HBase table

To convert a pre-existing HBase table to a Hive HBase table, enter the following four lines at the Hive prompt. Note that in this example the existing HBase table is my_hbase_table. The EXTERNAL keyword tells Hive that the HBase table already exists, so Hive does not create it, and dropping hbase_table_2 from Hive leaves my_hbase_table in place.

```
hive> CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");
```

Now we can run a Hive query against the pre-existing HBase table my_hbase_table that Hive sees