Clogeny's Hadoop Training Series - Apache Hive
This Hive hands-on training is part of Clogeny's Hadoop Training Series. It gives an overview of Apache Hive, including its architecture, data models, installation, configuration, and important Hive commands and scripts.
Clogeny’s Hadoop Developer Training Series
An Introduction to Hive
Madhur
Cloud Computing, Private & Public Clouds, Big Data, Storage, DevOps
Clogeny Technologies http://www.clogeny.com (US) 408-556-9645 (India) +91 20 661 43 482
What is Hive?
• A data warehousing infrastructure built on Hadoop
• Provides easy data summarization
• Provides ad-hoc querying and analysis of large volumes of data
• Comes with HiveQL, a query language based on SQL
• Allows you to plug in custom mappers and reducers
What Hive is NOT
• Not suitable for small datasets, because query latency is high (queries are compiled into MapReduce jobs)
• Not a substitute for transactional databases such as Oracle
• Does not offer real-time queries or row-level updates
Hive Architecture
Data Model Types - Tables
Tables
• Made up of the actual data and the associated metadata
• The actual data is stored in a Hadoop filesystem (typically HDFS)
• The metadata is stored in the metastore, a relational database such as Derby or MySQL
• Managed tables: Hive physically moves the data into its warehouse directory
hive> CREATE TABLE managed_table (dummy STRING);
hive> LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE managed_table;
• External tables: Hive refers to data at a location you specify, typically an existing HDFS directory outside its warehouse
hive> CREATE EXTERNAL TABLE external_table (dummy STRING) LOCATION '/user/tom/external_table';
hive> LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE external_table;
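The following is a minimal Python model, not Hive code, of where a data file ends up for each kind of table. It makes three assumptions: the tables are in the default database, Hive's default warehouse directory /user/hive/warehouse is in use, and no file-name collisions occur. The helper names table_location and loaded_path are hypothetical.

```python
import posixpath

# Assumption: Hive's default hive.metastore.warehouse.dir; your cluster's
# hive-site.xml may point somewhere else.
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_location(table, external_location=None):
    """Directory holding a table's data files (default database only).

    Managed tables live under the warehouse directory; external tables
    live wherever their LOCATION clause points.
    """
    if external_location is not None:
        return external_location
    return posixpath.join(WAREHOUSE_DIR, table)


def loaded_path(src_file, table, external_location=None):
    """Where LOAD DATA INPATH moves a file: into the table's directory."""
    return posixpath.join(table_location(table, external_location),
                          posixpath.basename(src_file))


print(loaded_path("/user/tom/data.txt", "managed_table"))
# /user/hive/warehouse/managed_table/data.txt
print(loaded_path("/user/tom/data.txt", "external_table",
                  "/user/tom/external_table"))
# /user/tom/external_table/data.txt
```

In both cases a non-LOCAL LOAD moves the file. The difference is only which directory the table owns.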
Data Model Types - Partitions
Partitions
• A way to divide a table into coarse-grained parts
• Data is partitioned based on the value of a partition column
• Supports multiple dimensions (more than one partition column)
• Defined at table creation time using the PARTITIONED BY clause
• At the filesystem level, partitions are simply nested subdirectories of the table directory
Data Model Types - Partitions
CREATE TABLE logs (ts BIGINT, line STRING) PARTITIONED BY (dt STRING, country STRING);
LOAD DATA LOCAL INPATH 'input/hive/partitions/file1' INTO TABLE logs PARTITION (dt='2001-01-01', country='GB');
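This Python sketch models how a partition spec maps onto nested subdirectories of the table directory. It also shows why a filter on a partition column means fewer directories to read. The table path assumes the default warehouse location, and the functions and sample specs are hypothetical. Real Hive also escapes special characters in partition values, which the sketch skips.

```python
import posixpath


def partition_name(spec):
    """Render a partition spec the way SHOW PARTITIONS lists it.

    The dict's insertion order stands in for the PARTITIONED BY column order.
    """
    return "/".join(f"{col}={val}" for col, val in spec.items())


def partition_dir(table_dir, spec):
    """Nested subdirectory of the table directory that holds one partition."""
    return posixpath.join(table_dir, partition_name(spec))


def prune(specs, **filters):
    """Partitions a query with equality filters on partition columns must read."""
    return [s for s in specs if all(s.get(k) == v for k, v in filters.items())]


logs_dir = "/user/hive/warehouse/logs"  # assumed default warehouse location
specs = [
    {"dt": "2001-01-01", "country": "GB"},
    {"dt": "2001-01-01", "country": "US"},
    {"dt": "2001-01-02", "country": "US"},
]
print(partition_dir(logs_dir, specs[0]))
# /user/hive/warehouse/logs/dt=2001-01-01/country=GB
print([partition_name(s) for s in prune(specs, country="US")])
# ['dt=2001-01-01/country=US', 'dt=2001-01-02/country=US']
```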
Data Model Types - Buckets
Buckets
• Divide a table (or a partition) into a fixed number of buckets by hashing the value of a chosen column
• Enable more efficient queries, because Hive can work with a smaller bucket of data rather than an entire partition
• Make sampling more efficient
hive> CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED BY (id) INTO 4 BUCKETS;
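Here is a Python sketch of how rows of bucketed_users could be spread over its 4 buckets. It assumes that Hive hashes an INT column to its own value and takes (hash & Integer.MAX_VALUE) % number_of_buckets. The sample rows and function names are hypothetical.

```python
from collections import defaultdict

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def bucket_index(value, num_buckets):
    """0-based bucket for an INT key.

    Assumption: for INT columns Hive's hash is the value itself, and the
    bucket is (hash & Integer.MAX_VALUE) % num_buckets.
    """
    return (value & INT_MAX) % num_buckets


def assign_buckets(rows, num_buckets):
    """Group (id, name) rows by the bucket their id hashes to."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_index(row[0], num_buckets)].append(row)
    return dict(buckets)


users = [(0, "Nat"), (2, "Joe"), (3, "Kay"), (4, "Ann")]  # hypothetical rows
print(assign_buckets(users, 4))
# {0: [(0, 'Nat'), (4, 'Ann')], 2: [(2, 'Joe')], 3: [(3, 'Kay')]}
```

Masking with INT_MAX keeps negative hashes from producing a negative bucket index. Bucket 1 stays empty here simply because no sample id hashes to it.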
Column Data Types
Primitives
| TYPE | DESCRIPTION | EXAMPLE |
|------|-------------|---------|
| TINYINT | 8-bit signed integer | 1 |
| SMALLINT | 16-bit signed integer | 1 |
| INT | 32-bit signed integer | 1 |
| BIGINT | 64-bit signed integer | 1 |
| FLOAT | 32-bit single-precision floating-point number | 1.0 |
| DOUBLE | 64-bit double-precision floating-point number | 1.0 |
| BOOLEAN | true/false value | TRUE |
| STRING | Character string | 'a', "a" |
| TIMESTAMP | Timestamp with nanosecond precision | '2012-01-02 03:04:05.123456789' |
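The bit widths in the table fix each integer type's range. The arithmetic for an n-bit two's-complement integer, from -2^(n-1) to 2^(n-1) - 1, is shown below as a short Python helper (the function name is illustrative).

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


INTEGER_TYPES = {"TINYINT": 8, "SMALLINT": 16, "INT": 32, "BIGINT": 64}

for type_name, bits in INTEGER_TYPES.items():
    low, high = signed_range(bits)
    print(f"{type_name:<8} {low} .. {high}")
# TINYINT  -128 .. 127
# SMALLINT -32768 .. 32767
# INT      -2147483648 .. 2147483647
# BIGINT   -9223372036854775808 .. 9223372036854775807
```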
Column Data Types
Complex Data Types
| TYPE | DESCRIPTION | EXAMPLE |
|------|-------------|---------|
| ARRAY | An ordered collection of fields. The fields must all be of the same type | array(1, 2) |
| MAP | An unordered collection of key-value pairs. Keys must be primitives; values may be any type. For a particular map, the keys must all be the same type, and the values must all be the same type | map('a', 1, 'b', 2) |
| STRUCT | A collection of named fields. The fields may be of different types | struct('a', 1, 1.0) |
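To make the typing rules in the table concrete, here is a Python model of the three constructor functions. It is not Hive's implementation. The col1, col2, ... field names follow Hive's documented naming for struct(). The type checks are stricter than Hive, which may apply implicit conversions between compatible types. The function names are hypothetical.

```python
PRIMITIVES = (bool, int, float, str)


def hive_array(*items):
    """Model of array(v1, v2, ...): every element must have the same type."""
    if len({type(item) for item in items}) > 1:
        raise TypeError("ARRAY elements must all be of the same type")
    return list(items)


def hive_map(*args):
    """Model of map(k1, v1, k2, v2, ...): primitive keys of one type, values of one type."""
    if len(args) % 2:
        raise ValueError("map() takes key/value pairs")
    keys, values = args[0::2], args[1::2]
    if not all(isinstance(k, PRIMITIVES) for k in keys):
        raise TypeError("MAP keys must be primitives")
    if len({type(k) for k in keys}) > 1 or len({type(v) for v in values}) > 1:
        raise TypeError("MAP keys, and MAP values, must each share one type")
    return dict(zip(keys, values))


def hive_struct(*fields):
    """Model of struct(v1, v2, ...): mixed types allowed; fields named col1, col2, ..."""
    return {f"col{i}": value for i, value in enumerate(fields, start=1)}


print(hive_array(1, 2))          # [1, 2]
print(hive_map("a", 1, "b", 2))  # {'a': 1, 'b': 2}
print(hive_struct("a", 1, 1.0))  # {'col1': 'a', 'col2': 1, 'col3': 1.0}
```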
Metastore
A central repository of Hive metadata. It consists of two parts:
• The metastore service
• The backing store (a database) for the metadata
Metastore deployment modes
1: Embedded Mode
This is the default metastore deployment mode for CDH. In this mode the metastore uses a Derby database.
Both the database and the metastore service run embedded in the main HiveServer process, and both are started for you when you start the HiveServer process.
This mode requires the least configuration effort.
However, it supports only one active user at a time and is not certified for production use.
Metastore deployment modes
2: Local Mode
In this mode the Hive metastore service runs in the same process as the main HiveServer process, but the metastore database runs in a separate process and can be on a separate host.
The embedded metastore service communicates with the metastore database over JDBC.
Metastore deployment modes
3: Remote Mode
In this mode the Hive metastore service runs in its own JVM process; other processes communicate with it via the Thrift network API (configured via the hive.metastore.uris property). The metastore service communicates with the metastore database over JDBC (configured via the javax.jdo.option.ConnectionURL property).
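To tie the three modes to the two properties named above, here is a simplified Python classifier. Its rules are an assumption, not Hive's source code. A non-empty hive.metastore.uris means remote mode. Otherwise, an embedded Derby JDBC URL (Hive's default) means embedded mode, and any other JDBC URL means local mode. Host names in the examples are hypothetical.

```python
DEFAULT_DERBY_URL = "jdbc:derby:;databaseName=metastore_db;create=true"


def metastore_mode(props):
    """Classify hive-site.xml properties as "embedded", "local" or "remote".

    Simplified rules (an assumption, not Hive's source code):
      * a non-empty hive.metastore.uris means clients talk to a separate
        metastore service over Thrift (remote mode);
      * otherwise the metastore service runs in-process, and an embedded
        Derby JDBC URL means embedded mode;
      * any other JDBC URL (e.g. MySQL) means local mode.
    """
    if props.get("hive.metastore.uris", "").strip():
        return "remote"
    url = props.get("javax.jdo.option.ConnectionURL", DEFAULT_DERBY_URL)
    # jdbc:derby://host/... is Derby's network server, not the embedded driver
    if url.startswith("jdbc:derby:") and not url.startswith("jdbc:derby://"):
        return "embedded"
    return "local"


print(metastore_mode({}))  # embedded
print(metastore_mode({"javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host/metastore"}))  # local
print(metastore_mode({"hive.metastore.uris": "thrift://metastore-host:9083"}))  # remote
```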
Metastore Properties
| Property Name | Type | Description |
|---------------|------|-------------|
| hive.metastore.warehouse.dir | URI | The directory in HDFS where managed tables are stored |
| hive.metastore.local | Boolean | Whether to use an embedded or local metastore (true) or connect to a remote one (false) |
| hive.metastore.uris | Comma-separated URIs | List of remote metastore URIs |
| javax.jdo.option.ConnectionURL | URI | The JDBC URL of the metastore database |
| javax.jdo.option.ConnectionDriverName | String | The JDBC driver class name |
| javax.jdo.option.ConnectionUserName | String | The JDBC username |
| javax.jdo.option.ConnectionPassword | String | The JDBC password |
Hive Packages
The following packages are needed by Hive:
• hive: base package that provides the complete language and runtime (required)
• hive-metastore: provides scripts for running the metastore as a standalone service (optional)
• hive-server: provides scripts for running the original HiveServer as a standalone service (optional)
• hive-server2: provides scripts for running the new HiveServer2 as a standalone service (optional)
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
• In a traditional database, a table's schema is enforced at data load time
• If the data being loaded doesn't conform to the schema, it is rejected
• Hive, on the other hand, doesn't verify the data when it is loaded, but rather when a query is issued
Updates, Transactions, and Indexes
• Updates, transactions, and indexes are mainstays of traditional databases
• Until recently, these features have not been considered part of Hive's feature set
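Below is a Python sketch of schema on read. A stored text line is checked against the table schema only when it is read. A value that does not fit its declared type comes back as NULL (None here) rather than being rejected, and so does a missing trailing field. This mirrors the usual behaviour of Hive's text format, but the parsing rules are simplified. The type subset, tab delimiter and function name are assumptions, and the sketch ignores details such as INT overflow.

```python
# Assumed subset of Hive primitive types and their Python parsers.
PARSERS = {"INT": int, "BIGINT": int, "DOUBLE": float, "STRING": str}


def read_row(line, schema, delimiter="\t"):
    """Apply a table schema to one stored text line at query time.

    Nothing was checked at load time; here, a field that cannot be parsed
    as its declared type, or a missing trailing field, becomes None (NULL)
    instead of causing the row to be rejected.
    """
    fields = line.split(delimiter)
    row = []
    for i, (_name, col_type) in enumerate(schema):
        if i >= len(fields):
            row.append(None)
            continue
        try:
            row.append(PARSERS[col_type](fields[i]))
        except ValueError:
            row.append(None)
    return row


schema = [("user", "STRING"), ("id", "INT")]
print(read_row("Joe\t2", schema))    # ['Joe', 2]
print(read_row("Joe\ttwo", schema))  # ['Joe', None]
print(read_row("Joe", schema))       # ['Joe', None]
```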
Installing Hive
We will install Hive with the metastore as a standalone service. To do this, install the hive and hive-metastore packages:
$ yum -y install hive hive-metastore
Hive Configuration
• Default configuration is in /etc/hive/conf/hive-default.xml
• (Re)define properties in /etc/hive/conf/hive-site.xml
• Use $HIVE_CONF_DIR to specify an alternate configuration directory
• You can override Hadoop configuration properties in Hive's configuration, e.g. mapred.reduce.tasks=1
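Several sources can set the same property. The following is a minimal Python sketch of how they are commonly described as layering, from lowest to highest precedence: the default XML files, the site XML files (hive-site.xml and Hadoop's *-site.xml), `hive -hiveconf` options, and finally SET commands in the session. Treat this order as an assumption to verify against your Hive version. The property values are purely illustrative.

```python
def effective_conf(default_files, site_files, hiveconf_args, set_commands):
    """Merge configuration layers; later layers override earlier ones.

    Assumed precedence, lowest to highest: default XML files, site XML
    files, `hive -hiveconf` options, then SET commands in the session.
    """
    merged = {}
    for layer in (default_files, site_files, hiveconf_args, set_commands):
        merged.update(layer)
    return merged


# Illustrative values only, not real defaults.
conf = effective_conf(
    default_files={"mapred.reduce.tasks": "-1", "hive.cli.print.header": "false"},
    site_files={"mapred.reduce.tasks": "4"},
    hiveconf_args={},
    set_commands={"mapred.reduce.tasks": "1"},
)
print(conf["mapred.reduce.tasks"])    # 1
print(conf["hive.cli.print.header"])  # false
```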
Configure Metastore database
Step 1: Install and start MySQL if you have not already done so
$ yum install mysql-server
$ service mysqld start
Step 2: Configure the MySQL service and connector
$ yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java-5.1.17.jar /usr/lib/hive/lib/mysql-connector-java-5.1.17.jar
Step 3: Set the MySQL root password:
Configure Metastore database
Configure Metastore database cont…
Step 4: Make sure the MySQL server starts at boot
$ /sbin/chkconfig mysqld on
Step 5: Create the database and user
• Create the initial database schema using the hive-schema-0.10.0.mysql.sql file located in the /usr/lib/hive/scripts/metastore/upgrade/mysql directory.
• Create a user for Hive with the hostname of the metastore.
• Grant the user the proper privileges.
Configure Metastore database cont…
Step 6: Configure the metastore service to communicate with the MySQL database
• Set the configuration properties in hive-site.xml that the metastore service needs to communicate with the MySQL database.
• Although you can use the same hive-site.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.
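For illustration, here is a Python sketch that writes the properties from the metastore property table into a hive-site.xml-style `<configuration>` document. The host names, database name and credentials are hypothetical. com.mysql.jdbc.Driver is the driver class shipped with MySQL Connector/J 5.1, and 9083 is the port commonly used by the metastore Thrift service.

```python
import xml.etree.ElementTree as ET


def hive_site_xml(props):
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical host names, database name and credentials.
metastore_props = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://metastore-db.example.com/metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hive",
    "javax.jdo.option.ConnectionPassword": "change-me",
    "hive.metastore.uris": "thrift://metastore-host.example.com:9083",
}
print(hive_site_xml(metastore_props))
```

On client and HiveServer hosts, only the hive.metastore.uris entry would be required.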
Configure Metastore database cont…
Step 7: Create the Hive warehouse directory in HDFS
$ sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
$ sudo -u hdfs hadoop fs -chmod og+rw /user/hive/warehouse
$ sudo -u hdfs hadoop fs -chown -R hive /user/hive
Step 8: Set environment variables
• Add the following lines to ~/.bashrc (for example with $ vim ~/.bashrc):
export HADOOP_HOME="/usr/lib/hadoop"
export PATH="$PATH:/usr/lib/hadoop/bin"
• Run bash at the command prompt to start a shell that picks up the changes:
$ bash
Starting the Metastore
You can run the metastore from the command line:
$ hive --service metastore
Make sure the command above reports no errors. Use Ctrl-C to stop a metastore process running from the command line.
To run the metastore as a daemon, the command is:
$ service hive-metastore start
Starting the Hive Console
To start the Hive console:
$ hive
To confirm that Hive is working, issue the SHOW TABLES command to list the Hive tables; be sure to end the command with a semicolon:
hive> SHOW TABLES;
Hive CLI Commands
Set a Hive or Hadoop configuration property:
hive> set propkey=value;
List all properties and values:
hive> set -v;
Hive CLI Commands
Creating a managed table
$ cat input/hive/tables/data.txt
$ hive
hive> CREATE TABLE managed_table (dummy STRING);
hive> LOAD DATA LOCAL INPATH 'input/hive/tables/data.txt' INTO TABLE managed_table;
hive> SELECT * FROM managed_table;
$ hadoop fs -cat /user/hive/warehouse/managed_table/data.txt
Hive CLI Commands
Creating an external table
• Select a location in HDFS for the table
• Make sure other users have write access to it
$ sudo -u hdfs hadoop fs -mkdir /user/joe/table
$ sudo -u hdfs hadoop fs -chmod a+w /user/joe/table
• Create the external table and load data into it:
hive> CREATE EXTERNAL TABLE external_table (dummy STRING) LOCATION '/user/joe/table';
hive> LOAD DATA LOCAL INPATH 'input/hive/tables/data.txt' INTO TABLE external_table;
hive> SELECT * FROM external_table;
• Check that the data file was written to the external directory:
$ sudo -u hdfs hadoop fs -cat /user/joe/table/data.txt
Hive CLI Commands
Create a partitioned table
hive> CREATE TABLE logs (ts BIGINT, line STRING) PARTITIONED BY (dt STRING, country STRING);
Load data into the table, specifying the partitions
hive> LOAD DATA LOCAL INPATH 'input/hive/partitions/file1' INTO TABLE logs PARTITION (dt='2001-01-01', country='GB');
hive> LOAD DATA LOCAL INPATH 'input/hive/partitions/file2' INTO TABLE logs PARTITION (dt='2001-01-01', country='US');
hive> LOAD DATA LOCAL INPATH 'input/hive/partitions/file3' INTO TABLE logs PARTITION (dt='2001-01-02', country='US');
See the table contents
hive> SELECT * FROM logs;
List all the partitions
hive> SHOW PARTITIONS logs;
Hive CLI Commands
Create a bucketed table
• Create a normal table, users, and then create a bucketed table named bucketed_users from it
hive> set hive.enforce.bucketing=true;
hive> CREATE TABLE users (id INT, name STRING);
hive> LOAD DATA LOCAL INPATH 'input/hive/tables/users.txt' INTO TABLE users;
hive> CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
hive> INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;
• Check the contents of the whole table, then of a single bucket
hive> SELECT * FROM bucketed_users;
hive> SELECT * FROM bucketed_users TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
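Which rows does TABLESAMPLE(BUCKET 1 OUT OF 4 ON id) return? The Python sketch below models it, assuming Hive hashes the column into y buckets numbered 1..y and returns bucket x, with an INT hashing to its own value. The sample rows and function name are hypothetical. When y matches the table's bucket count, Hive can read just that bucket's file instead of scanning every row.

```python
INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def tablesample(rows, x, y, key=lambda row: row[0]):
    """Rows returned by TABLESAMPLE(BUCKET x OUT OF y ON col).

    Assumption: rows are hashed into y buckets numbered 1..y using
    (hash & Integer.MAX_VALUE) % y, with an INT column hashing to its own
    value; bucket x is returned.
    """
    if not 1 <= x <= y:
        raise ValueError("expected 1 <= x <= y")
    return [row for row in rows if (key(row) & INT_MAX) % y == x - 1]


users = [(0, "Nat"), (2, "Joe"), (3, "Kay"), (4, "Ann")]  # hypothetical rows
print(tablesample(users, 1, 4))  # [(0, 'Nat'), (4, 'Ann')]
print(tablesample(users, 1, 2))  # [(0, 'Nat'), (2, 'Joe'), (4, 'Ann')]
```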
Joins
Prerequisites
• Create two tables, sales and things, and load data into them from files
hive> CREATE TABLE sales (user STRING, id INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH 'input/hive/joins/sales.txt' INTO TABLE sales;
hive> SELECT * FROM sales;
hive> CREATE TABLE things (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH 'input/hive/joins/things.txt' INTO TABLE things;
hive> SELECT * FROM things;
Joins
Inner Join
hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
Joins
Left Outer Join
hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
Joins
Right Outer Join
hive> SELECT sales.*, things.* FROM sales RIGHT OUTER JOIN things ON (sales.id = things.id);
Joins
Full Outer Join
hive> SELECT sales.*, things.* FROM sales FULL OUTER JOIN things ON (sales.id = things.id);
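To compare the four join types on the sales and things tables, here is a small nested-loop model in Python. It is not how Hive executes joins. The sample rows are hypothetical, sales rows are (user, id), and things rows are (id, name). None plays the role of NULL, both as padding for unmatched rows and as a key that never matches. Hive may return rows in a different order.

```python
# Hypothetical sample rows: sales are (user, id), things are (id, name).
sales = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]
things = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]


def join(left, right, how="inner"):
    """Nested-loop model of sales JOIN things ON (sales.id = things.id).

    how is "inner", "left", "right" or "full"; None stands for NULL.
    """
    out, matched_right = [], set()
    for l_row in left:
        hit = False
        for j, r_row in enumerate(right):
            if l_row[1] is not None and l_row[1] == r_row[0]:
                out.append(l_row + r_row)
                matched_right.add(j)
                hit = True
        if not hit and how in ("left", "full"):
            out.append(l_row + (None, None))  # NULL-padded things columns
    if how in ("right", "full"):
        for j, r_row in enumerate(right):
            if j not in matched_right:
                out.append((None, None) + r_row)  # NULL-padded sales columns
    return out


for how in ("inner", "left", "right", "full"):
    print(how, len(join(sales, things, how)))
# inner 4
# left 5
# right 5
# full 6
```

Ali's id 0 has no match in things, and Scarf's id 1 has no match in sales. Each outer join adds back exactly the unmatched rows from its preserved side.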
Joins
Semi Joins
• Older Hive releases (before 0.13) do not support IN subqueries, so this query fails there:
hive> SELECT * FROM things WHERE things.id IN (SELECT id FROM sales);
• The solution is a semi join:
hive> SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);
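The Python model below shows what a left semi join of things against sales returns. Only things columns come back, and each things row appears at most once, no matter how many sales rows match it. That is the key difference from an inner join. The sample rows and function name are hypothetical.

```python
# Hypothetical sample rows: things are (id, name), sales are (user, id).
things = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]
sales = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]


def left_semi_join(left, right, left_key, right_key):
    """Rows of `left` with at least one match in `right`.

    Only left-side columns are returned, and each left row appears at most
    once however many right rows match it.
    """
    right_keys = {right_key(row) for row in right}
    return [row for row in left if left_key(row) in right_keys]


print(left_semi_join(things, sales, lambda t: t[0], lambda s: s[1]))
# [(2, 'Tie'), (4, 'Coat'), (3, 'Hat')]
```

Id 2 appears twice in sales, yet Tie is returned once, just as the IN-subquery form would return it.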
Joins
Map Joins
• Used when one table is small enough to fit in memory: the join is done in the mappers, so no reducers are used
hive> SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
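A map join is essentially an in-memory hash join. The Python sketch below shows the idea under a simplifying assumption: the small table (things) fits in memory as a hash table, and each row of the large table (sales) is streamed past it with one lookup. Because no grouping by key across machines is needed, there is no reduce phase. The sample rows and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: sales are (user, id), things are (id, name).
sales = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]
things = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]


def map_join(big, small, big_key, small_key):
    """Hash-join model of a map join (inner join semantics)."""
    # Build phase: load the small table into an in-memory hash table.
    table = defaultdict(list)
    for row in small:
        table[small_key(row)].append(row)
    # Probe phase: stream the big table, one lookup per row.
    out = []
    for row in big:
        for match in table.get(big_key(row), []):
            out.append(row + match)
    return out


print(map_join(sales, things, lambda s: s[1], lambda t: t[0]))
# [('Joe', 2, 2, 'Tie'), ('Hank', 4, 4, 'Coat'), ('Eve', 3, 3, 'Hat'), ('Hank', 2, 2, 'Tie')]
```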
Other Commands
CREATE TABLE…AS SELECT
hive> CREATE TABLE target AS SELECT id FROM things;
Altering Tables
hive> ALTER TABLE target RENAME TO source;
hive> ALTER TABLE source ADD COLUMNS (col2 STRING);
Other Commands
Dropping Tables
• For managed tables, both the data and the metadata are deleted
• For external tables, only the metadata is deleted
hive> drop table <table_name>;
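This Python model shows how DROP TABLE treats the two kinds of table. A dict stands in for the metastore and a set of file paths stands in for HDFS. Both are simplifying assumptions, and the function name is hypothetical. Dropping either table removes its metadata entry, but only the managed table's data files are deleted.

```python
def drop_table(metastore, files, name):
    """Model of DROP TABLE.

    metastore maps table name -> {"location": str, "external": bool};
    files is a set of data-file paths standing in for HDFS.
    """
    table = metastore.pop(name)  # metadata is always removed
    if not table["external"]:
        # Managed table: its data directory is deleted too.
        prefix = table["location"].rstrip("/") + "/"
        files.difference_update({f for f in files if f.startswith(prefix)})


metastore = {
    "managed_table": {"location": "/user/hive/warehouse/managed_table", "external": False},
    "external_table": {"location": "/user/joe/table", "external": True},
}
files = {"/user/hive/warehouse/managed_table/data.txt", "/user/joe/table/data.txt"}
drop_table(metastore, files, "managed_table")
drop_table(metastore, files, "external_table")
print(metastore)  # {}
print(files)      # {'/user/joe/table/data.txt'}
```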