Introduction to Apache Hive

Post on 28-May-2015

DESCRIPTION

Slides explaining Apache Hive, used at あしたのオープンソース研 on January 23, 2014.

TRANSCRIPT

  • 1. Apache Hive Copyright Infoscience Corporation. All rights reserved.

  • 2. Hive, SQL, HiveQL, map/reduce, Hadoop, DBMS

  • 3. Apache Hive: runs Hadoop map/reduce through the SQL-like language HiveQL

  • 4. Apache Hive: RCFile, HBase; RDBMS; Hadoop; UDF; SQL (HiveQL); Map/Reduce

  • 5. Hive ("Hadoop", 3rd ed., p.453, 12-1, Hive): map/reduce; map/reduce job; Hadoop

  • 6. Hive: MySQL, Derby; Java, SQL, Derby ("Hadoop", 3rd ed., p.454, 12-2)

  • 7. Hive data types ("Hadoop", 3rd ed., p.459, 12-3, Hive)
      Primitive types: INT, DOUBLE, ...
      Complex types: ARRAY, MAP, STRUCT (RDB)

  • 8. (1) Delimiters: TAB, or the defaults below ("Programming Hive", p.47, 3-3, Hive)
      \n: one record per line
      ^A (Control-A): separates fields; written as octal \001 in CREATE TABLE
      ^B: separates the elements of an ARRAY or STRUCT, or the key/value pairs of a MAP; written as octal \002 in CREATE TABLE
      ^C: separates a MAP key from its value; written as octal \003 in CREATE TABLE

  • 9. (2) Data on HDFS
      John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600
      Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601
      Todd Jones^A70000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700
      Bill King^A60000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100

  • 10. RDBMS and Hive

  • 11. HiveQL: CREATE TABLE
      CREATE TABLE records (year STRING, temperature INT, quality INT)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\t';
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t': each row of the data file is tab-delimited text.
      CREATE TABLE, RDB, DB

  • 12. Hive: PARTITIONED BY ("Hadoop", 3rd ed., p.464)
      CREATE TABLE logs (ts BIGINT, line STRING)
      PARTITIONED BY (dt STRING, country STRING);

  • 13. HiveQL: inspecting tables
      hive> SHOW TABLES;
      hive> SHOW TABLES '.*s';
      Lists the tables whose names end in 's' (the pattern is a regular expression).
      hive> DESCRIBE invites;
      hive> describe invites;
      OK
      foo          int          None
      bar          string       None
      ds           string       None

      # Partition Information
      # col_name   data_type    comment

      ds           string       None
      Time taken: 0.265 seconds, Fetched: 8 row(s)

  • 14. HiveQL: LOAD DATA
      LOAD DATA LOCAL INPATH 'input/ncdc/micro-tab/sample.txt'
      OVERWRITE INTO TABLE records;
      Hive, HDFS, OVERWRITE

  • 15. HiveQL: INSERT (Hive: UPDATE, DELETE)
      hive> INSERT OVERWRITE TABLE events
          > SELECT a.* FROM profiles a WHERE a.key < 100;
      Rows of profiles whose key is less than 100 are written to events with INSERT OVERWRITE.

  • 16. HiveQL: SELECT
      SELECT weekday, COUNT(*) FROM u_data_new GROUP BY weekday;
      SQL

  • 17. HiveQL (1): SELECT, FROM; Hive: AND, JOIN...ON...; 3
      SELECT sales.*, things.*
      FROM sales JOIN things ON (sales.id = things.id);

  • 18. HiveQL (2): subquery in FROM
      SELECT station, year, AVG(max_temperature)
      FROM (
        SELECT station, year, MAX(temperature) AS max_temperature
        FROM records2
        WHERE temperature != 9999
          AND (quality = 0 OR quality = 1 OR quality = 4 OR quality = 5 OR quality = 9)
        GROUP BY station, year
      ) mt
      GROUP BY station, year;

  • 19. HiveQL: built-in functions
      Mathematical: round, floor, ceil, rand, exp, ln, pow, sqrt, ...
      Aggregate: count, sum, avg, min, max, variance, ...
      Table-generating: json_tuple, parse_url_tuple
      String: length, reverse, concat, substr, upper, lower, ...

  • 20. HiveQL: views (SELECT)
      CREATE VIEW max_temperatures (station, year, max_temperature) AS
      SELECT station, year, MAX(temperature)
      FROM valid_records
      GROUP BY station, year;

  • 21. HiveQL: indexes. Two types: compact and bitmap. compact: HDFS.
      CREATE TABLE t(i int, j int);
      CREATE INDEX x ON TABLE t(j)
      AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
      bitmap

  • 22. HiveQL: user-defined functions in Hive
      User-Defined Function (UDF): takes one row and returns one row.
      User-Defined Aggregate Function (UDAF): takes multiple rows and returns one row, as COUNT and MAX do.
      User-Defined Table-generating Function (UDTF): takes one row and returns multiple rows (a table).

  • 23. Hive: KIXEYE, Hive
      https://cwiki.apache.org/confluence/download/attachments/27362054/Hive-kixeyeanalytics.pdf?version=1&modificationDate=1360856744000&api=v2
      NASA ("Programming Hive", pp.317-321)

  • 24. Impala, Presto: Hive; OSS
      Impala: Hive, map/reduce, Cloudera
      Presto: map/reduce, Facebook

  • 25. Hadoop, map/reduce, SQL, HiveQL; Hadoop MapReduce; Hadoop, RDBMS; Hive, Hadoop; cf. RDBMS, NoSQL

  • 26. References
      Hive (http://www.slideshare.net/Cloudera_jp/hive-20130724)
      "Hadoop", 3rd ed. (ch. 12), Tom White, Sky
      "Programming Hive", Edward Capriolo, Dean Wampler, Jason Rutherglen, Sky
      Apache Hive, Wikipedia (http://ja.wikipedia.org/wiki/Apache_Hive, http://en.wikipedia.org/wiki/Apache_Hive)
      Apache Hive Wiki: GettingStarted (https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStartedInstallingHivefromaStableRelease)
      Hadoop (http://metasearch.sourceforge.jp/wiki/index.php?Hadoop%A5%AF%A5%A4%A5%C3%A5%AF%A5%B9%A5%BF%A1%BC%A5%C8%A5%AC%A5%A4%A5%C9)
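
The delimiter layout on slides 8 and 9 can be sketched outside Hive. The following is only an illustration in Python, not how Hive itself reads files: it splits the first sample record from slide 9 on ^A, ^B and ^C. The function name `parse_employee` and the column names (`name`, `salary`, `subordinates`, `deductions`, `address`) are assumptions made for this sketch; the slides show only the raw data.

```python
# Hive's default text-file delimiters, written as Python escapes.
FIELD_SEP = "\x01"       # ^A (octal \001): separates fields
COLLECTION_SEP = "\x02"  # ^B (octal \002): separates ARRAY/STRUCT elements and MAP entries
MAP_KEY_SEP = "\x03"     # ^C (octal \003): separates a MAP key from its value


def parse_employee(line):
    """Split one delimited record into Python values.

    The five column names are hypothetical; they are not given on the slides.
    """
    name, salary, subordinates, deductions, address = line.split(FIELD_SEP)
    return {
        "name": name,
        "salary": float(salary),
        # Treating an empty field as an empty list is a choice made for this sketch.
        "subordinates": subordinates.split(COLLECTION_SEP) if subordinates else [],
        "deductions": {
            key: float(value)
            for key, value in (
                entry.split(MAP_KEY_SEP) for entry in deductions.split(COLLECTION_SEP)
            )
        },
        "address": address.split(COLLECTION_SEP),
    }


# First record from slide 9, in the caret notation used on the slide.
caret_line = (
    "John Doe^A100000.0^AMary Smith^BTodd Jones^A"
    "Federal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A"
    "1 Michigan Ave.^BChicago^BIL^B60600"
)
# Replace the caret notation with the real control characters.
raw_line = (
    caret_line.replace("^A", FIELD_SEP)
    .replace("^B", COLLECTION_SEP)
    .replace("^C", MAP_KEY_SEP)
)

record = parse_employee(raw_line)
print(record["subordinates"])  # ['Mary Smith', 'Todd Jones']
print(record["deductions"])
```

The three separators nest: ^A splits the row into columns, ^B splits a column into collection items, and ^C splits a MAP item into key and value. This nesting is what lets one line of text carry ARRAY, MAP and STRUCT values.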