Hive and HBase Integration: Creating and Querying Hive-HBase Tables, Study notes of Data Analysis & Statistical Methods

Dr. Babasaheb Ambedkar Marathwada University Data Analysis & Statistical Methods

These notes explain how to create HBase tables from Hive, convert existing HBase tables into Hive-HBase tables, and query both types of tables using Hive. They include instructions for installing and configuring Hive and HBase, creating and populating Hive tables, and performing various queries. The integration lets users run Hive queries over data stored in HBase.

Typology: Study notes

2020/2021

Uploaded on 05/16/2021

ashwini-khandre 🇮🇳

5 documents


HiveandHBaseIntegration

YoucancreateHBasetablesfromHivethatcanbeaccessedbybothHiveandHBase.Thisallowsyou

torunHivequeriesonHBasetables.YoucanalsoconvertexistingHBasetablesintoHiveHBasetables

andrunHivequeriesonthosetablesaswell.

Inthissection:

InstallandConfigureHiveandHBase

GettingStartedwithHiveHBaseIntegration

InstallandConfigureHiveandHBase

1.InstallandconfigureHiveifitisnotalreadyinstalled.

2.InstallandconfigureHBaseifitisnotalreadyinstalled.

3.ExecutethejpscommandandensurethatallrelevantHadoop,HBaseandZookeeperprocesses

arerunning.

Example:

Configurethehive‐site.xmlFile

1.Openthehive‐site.xmlfilewithyourfavoriteeditor,orcreateahive‐site.xmlfileifitdoesn't

alreadyexist:

2.CopythefollowingXMLcodeandpasteitintotheconfigurationelementblockofthehive‐

site.xmlfile.

$jps

21985HRegionServer

1549jenkins.war

15051QuorumPeerMain

30935Jps

15551CommandServer

15698HMaster

15293JobTracker

15328TaskTracker

15131WardenMain

$cd$HIVE_HOME

$viconf/hive‐site.xml



<name>hive.aux.jars.path</name>

Partial preview of the text

Download Hive and HBase Integration: Creating and Querying Hive-HBase Tables and more Study notes Data Analysis & Statistical Methods in PDF only on Docsity!

Hive and HBase Integration

You can create HBase tables from Hive that can be accessed by both Hive and HBase. This allows you to run Hive queries on HBase tables. You can also convert existing HBase tables into Hive HBase tables and run Hive queries on those tables as well.

In this section:
Install and Configure Hive and HBase
Getting Started with Hive HBase Integration

Install and Configure Hive and HBase

Install and configure Hive if it is not already installed.
Install and configure HBase if it is not already installed.
Execute the jps command and ensure that all relevant Hadoop, HBase and ZooKeeper processes are running.
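The jps check can also be scripted. The following is a minimal sketch, not part of the original notes: it parses jps output text and reports which expected process names are absent. The set of "required" names (HMaster, HRegionServer, QuorumPeerMain) is an assumption taken from the example listing in this section, and the function names are hypothetical.

```python
# Process names that the jps listing in this section shows for HBase and
# ZooKeeper. Treating exactly these three as "required" is an assumption
# made for this sketch; adjust the set for your own cluster.
REQUIRED = {"HMaster", "HRegionServer", "QuorumPeerMain"}


def running_processes(jps_output):
    """Return the set of process names found in jps output text."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        # Each jps line is "<pid> <name>"; skip blanks and the "$ jps" prompt.
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1].strip())
    return names


def missing_processes(jps_output, required=REQUIRED):
    """Return the required process names absent from the jps output, sorted."""
    return sorted(set(required) - running_processes(jps_output))


# Sample text copied from the example jps listing in this section.
SAMPLE = """\
$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain
"""

if __name__ == "__main__":
    missing = missing_processes(SAMPLE)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all required processes are running")
```

In practice the text would come from running jps on the node rather than from a hard-coded string.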

Example jps output:

$ jps
21985 HRegionServer
1549 jenkins.war
15051 QuorumPeerMain
30935 Jps
15551 CommandServer
15698 HMaster
15293 JobTracker
15328 TaskTracker
15131 WardenMain

Configure the hive-site.xml File

Open the hive-site.xml file with your favorite editor, or create a hive-site.xml file if it doesn't already exist:

$ cd $HIVE_HOME
$ vi conf/hive-site.xml

Copy the following XML code and paste it into the configuration element block of the hive-site.xml file.

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/mapr/hive/hive-/lib/hive-hbase-handler--mapr.jar,file:///opt/mapr/hbase/hbase-/lib/hbase-client--mapr.jar,file:///opt/m[...]</value>
  <description>A comma separated list (with no spaces) of the jar files required for [...]</description>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>xx.xx.x.xxx,xx.xx.x.xxx,xx.xx.x.xxx</value>
  <description>A comma separated list (with no spaces) of the IP addresses of all ZooKeep[...]</description>
</property>

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>5181</value>
  <description>The Zookeeper client port. The MapR default clientPort is 5181.</description>
</property>

Save and close the hive-site.xml file.

Getting Started with Hive HBase Integration

In this tutorial you will:

Create a Hive table
Populate the Hive table with data from a text file
Query the Hive table
Create a Hive HBase table
Introspect the Hive HBase table from HBase
Populate the Hive HBase table with data from the Hive table
Query the Hive HBase table from Hive
Convert an existing HBase table into a Hive HBase table

Be sure that you have successfully completed all the steps in the Install and Configure Hive and HBase section before beginning this Getting Started tutorial. This tutorial closely parallels the Hive HBase Integration section of the Apache Hive Wiki; thanks go to Samuel Guo and the other contributors to that effort.

Create a Hive table with two columns:

Change to your Hive installation directory if you're not already there and start Hive:

$ cd $HIVE_HOME
$ bin/hive
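If Hive starts but the HBase integration misbehaves, one thing worth checking is hive-site.xml, since the configuration step requires the jar list and the ZooKeeper quorum to be comma separated with no spaces. The following is a minimal sketch of such a check, not part of the original notes. The function names are hypothetical, and the jar paths and IP addresses in the sample are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Properties from the configuration step whose values are comma separated
# lists that must not contain spaces.
LIST_PROPERTIES = ("hive.aux.jars.path", "hbase.zookeeper.quorum")


def read_properties(xml_text):
    """Map each <property> name to its value in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = prop.findtext("value") or ""
    return props


def find_problems(xml_text):
    """Return readable problems found in the list-valued properties."""
    props = read_properties(xml_text)
    problems = []
    for name in LIST_PROPERTIES:
        if name not in props:
            problems.append(f"{name} is not set")
        elif any(ch.isspace() for ch in props[name]):
            # Strict on purpose: any whitespace in the value is reported.
            problems.append(f"{name} contains whitespace")
    return problems


# Hypothetical file contents: jar paths and IP addresses are placeholders.
SAMPLE = """<configuration>
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///tmp/example/a.jar, file:///tmp/example/b.jar</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.0.0.1,10.0.0.2,10.0.0.3</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for problem in find_problems(SAMPLE):
        print(problem)
```

The sample deliberately puts a space after the comma in the jar list, so the check reports that property and accepts the quorum value.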

[...] the Hive pokes table with a key of 98.

Note: This is a good illustration of the concept that Hive tables can have multiple identical keys. As we will see shortly, HBase tables cannot have multiple identical keys, only unique keys.
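The difference can be pictured with ordinary Python containers. This is an analogy only, not how either system is implemented: a list stands in for a Hive table and a dictionary keyed by row key stands in for an HBase table. The two rows with key 98 follow the text; the values are placeholders invented for this sketch.

```python
# Rows as (key, value) pairs. Both rows share the key 98, as in the text.
rows = [(98, "first value"), (98, "second value")]

# A Hive table behaves like a list of rows here: both rows are kept.
hive_like = list(rows)

# An HBase table behaves like a mapping from row key to row: writing a
# second row with the same key replaces the first instead of adding to it.
hbase_like = {}
for key, value in rows:
    hbase_like[key] = value

print(len(hive_like))   # 2
print(len(hbase_like))  # 1
```

In this single-threaded sketch the last write wins. On a cluster the order of writes is not something to rely on, so which of the duplicate rows survives should be treated as unspecified.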

To create a Hive HBase table, enter these four lines of code at the Hive prompt:

hive> CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");

After a brief delay, a message appears confirming that the table was created successfully:

OK
Time taken: 5.195 seconds

Note: The TBLPROPERTIES clause is not required, but those new to Hive HBase integration may find it easier to understand what's going on if Hive and HBase use different names for the same table.

In this example, Hive will recognize this table as "hbase_table_1" and HBase will recognize this table as "xyz".

Start the HBase shell:

Keeping the Hive terminal session open, start a new terminal session for HBase, then start the HBase shell:

$ cd $HBASE_HOME
$ bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4, rUnknown, Wed Nov 9 17:35:00 PST 2011

hbase(main):001:0>

Execute the list command to see a list of HBase tables:

hbase(main):001:0> list
TABLE
xyz
1 row(s) in 0.8260 seconds

HBase recognizes the Hive HBase table named xyz. This is the same table known to Hive as hbase_table_1.
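The "hbase.columns.mapping" string in the CREATE TABLE statement for hbase_table_1 pairs Hive columns with HBase columns by position: the first entry goes with the first Hive column, and so on. The following minimal sketch spells out that pairing for the mapping ":key,cf1:val". It is an illustration with a hypothetical function name, not Hive's own parser, and it only handles mappings that have one entry per Hive column.

```python
def pair_columns(hive_columns, mapping):
    """Pair Hive column names with hbase.columns.mapping entries by position."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        # Other mapping forms are outside the scope of this sketch.
        raise ValueError("this sketch needs one mapping entry per Hive column")
    pairs = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            # The special entry :key maps the column to the HBase row key.
            pairs[column] = "row key"
        else:
            # Every other entry has the form <column family>:<qualifier>.
            family, _, qualifier = entry.partition(":")
            pairs[column] = f"column family {family!r}, qualifier {qualifier!r}"
    return pairs


# Column names and mapping string taken from the hbase_table_1 statement.
result = pair_columns(["key", "value"], ":key,cf1:val")
for column, target in result.items():
    print(f"{column} -> {target}")
```

For hbase_table_1 this shows the Hive column key stored as the HBase row key and the Hive column value stored in column family cf1 under the qualifier val, which matches the single family cf1 that HBase reports for table xyz.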

Display the description of the xyz table in the HBase shell:

hbase(main):004:0> describe "xyz"
DESCRIPTION                                                            ENABLED
 {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}    true
1 row(s) in 0.0190 seconds

From the Hive prompt, insert data from the Hive table pokes into the Hive HBase table hbase_table_1:

hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
...
2 Rows loaded to hbase_table_
OK
Time taken: 13.384 seconds

Query hbase_table_1 to see the data we have inserted into the Hive HBase table:

hive> SELECT * FROM hbase_table_1;
OK
98 val_
Time taken: 0.56 seconds

Even though we loaded two rows from the Hive pokes table that had the same key of 98, only one row was actually inserted into hbase_table_1. This is because hbase_table_1 is an HBase table, and although Hive tables support duplicate keys, HBase tables only support unique keys. When several rows share a key, HBase retains only one of them, with no guarantee as to which, and silently discards the data from the others.

Convert a pre-existing HBase table to a Hive HBase table

To convert a pre-existing HBase table to a Hive HBase table, enter the following four-line statement at the Hive prompt.

Note that in this example the existing HBase table is my_hbase_table.

hive> CREATE EXTERNAL TABLE hbase_table_2(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
TBLPROPERTIES("hbase.table.name" = "my_hbase_table");

Now we can run a Hive query against the pre-existing HBase table my_hbase_table that Hive sees