SQL on Hadoop Comparative Evaluation (2014/11)

Upload: ntt-data-oss-professional-services

Posted on 02-Jul-2015

Category: Technology


DESCRIPTION

Impala Meetup 2014/10/31 @Tokyo

TRANSCRIPT

  • 1. NTT, SQL on Hadoop, Cloudera
       Copyright 2014 NTT DATA Corporation

  • 2. email: [email protected]
       - OSS
       - NTT, OSS
       - NTT, Hadoop; 2010, Hadoop
       - http://oss.nttdata.co.jp/hadoop/bxshs.html
       - Hadoop, HADOOP HACKS

  • 3. SQL on Hadoop

  • 4. SQL on Hadoop
       Impala
       - Cloudera
       - 2012/10, SQL on Hadoop
       Presto
       - Facebook
       - 2013/11, SQL on Hadoop
       Hive on Tez
       - Hortonworks
       - Tez, YARN, FW
       - SQL on Hadoop, Hive
       LinkedIn Tajo, MapR Drill
       Hadoop, SQL, HDFS

  • 5. Impala: TPC-DS, Hive, 24 (*1)
       Presto: Hive, 10 (*2)
       Hive on Tez: TPC-DS, Hive, 66 (*3)
       *1 http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed
       *2 https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
       *3 http://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop
       ?

  • 6. - GB, TB
       - CPU, HDD
       Impala, Presto, Hive on Tez
       - WHERE, JOIN, 100GB
       - Cloudera Manager, Ambari
       - Impala: Parquet (Snappy); Presto, Hive on Tez: ORC (zlib)
       Tips

  • 7. (1/2)
       TPC-DS, Cloudera, Impala (*1)
       small (GB), medium (10GB), large (100GB), xlarge (TB)
       x1, x3, x1
       - Dell PowerEdge R520
       - CPU: Xeon(R) CPU E5-2407 @ 2.20GHz x 2
       - DIMM DDR3 Synchronous 1333 MHz 8GB x 8
       - Western Digital WD2000FYYX 2TB x 4
       - Intel Ethernet Controller 10-Gigabit X540-AT2
       *1 http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/

  • 8. (2/2)
       Impala, Presto, Hive on Tez
       Hadoop: Impala: CDH 5.0; Presto: CDH; Hive on Tez: HDP 2.1; Tez, Hive 0.13

       Software   Version   File format   Compression
       CDH        5.0.2     -             -
       Hive ()    0.12      ORC           zlib
       Impala     1.3.1     Parquet       Snappy
       Presto     0.69      ORC           zlib
       HDP        2.1.4     -             -
       Hive       0.13      ORC           zlib
       Tez        0.4       -             -

  • 10. TPC-DS
       ETL
       HW: Impala, Hadoop, 64GB, TB
       SW (...): CDH 5.2, Impala 2.0; Presto 0.79; HDP 2.2, Hive 0.14, Tez 0.60
       [] Impala, 384GB

  • 11. (1/4) (small)
       - Parquet+Snappy: 5.1GB; ORC+zlib: 3.4GB
       - (Hive) Impala: 34.0, Presto: 11.7, Tez: 2.1
       [Figure: bar chart for Hive(ORC), Impala(Parquet), Presto(ORC), Hive on Tez(ORC)]

  • 12. (2/4) (medium)
       - Parquet+Snappy: 47.9GB; ORC+zlib: 33.6GB
       - (Hive) Impala: 21.9, Presto: 3.3, Tez: 2.9
       - small
       [Figure: bar chart for Hive(ORC), Impala(Parquet), Presto(ORC), Hive on Tez(ORC)]
       - () small

  • 13. (3/4) (large)
       - Parquet+Snappy: 433.1GB; ORC+zlib: 335.9GB
       - (Hive) Impala: 12.7, Presto: 2.0 (), Tez: 2.6
       - medium, small
       [Figure: bar chart for Hive(ORC), Impala(Parquet), Presto(ORC), Hive on Tez(ORC)]
       - q3, q19, q43, q53, q63, q65, q89

  • 14. (4/4) (xlarge)
       - Parquet+Snappy: 1.2TB; ORC+zlib: 1TB
       - (Hive) Impala: 9.3, Presto: 0.9 (), Tez: 2.3
       - large
       [Figure: bar chart for Hive(ORC), Impala(Parquet), Presto(ORC), Hive on Tez(ORC)]
       - q3, q19, q42, q43, q52, q53, q55, q63, q89
       - Hive

  • 15. ()
       - Hive ()
       - Impala, Presto ... ()
       - Presto
       - Hive on Tez
       [Figure: Hive, Impala, Presto, Tez plotted for small, medium, large, xlarge; vertical axis 0.0 to 35.0]

  • 17. Impala (large, xlarge, q59)
       - CPU: 100%, Impala, large, xlarge ()
       - HDD: xlarge, large
       - xlarge, CPU
       - xlarge

  • 18. Presto (medium, q65)
       - CPU: 10%, 100%
       - 35MB/
       - HDD
       - CPU

  • 19. Hive on Tez (xlarge, q65)
       - small, xlarge, CPU, HDD
       - CPU: 100%
       - HDD: 20 to 40%
       - xlarge, CPU
       - xlarge
       - xlarge, HDD
       - xlarge

  • 20. Impala, Presto, Hive on Tez
       - Impala: Impala 2.0, spill to disk
       - Presto: (21)
       - Hive on Tez: Hive CBO ()

  • 21. Copyright 2011 NTT DATA Corporation
       OSS
       URL: http://oss.nttdata.co.jp/hadoop
       [email protected]
       TEL 050-5546-2496
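
The per-scale figures on slides 11 to 14 can be tabulated to see how the gap between engines changes with data size. The sketch below is only illustrative: it assumes each figure is a speed-up multiple relative to the Hive baseline (Hive = 1.0), which the 0.0 to 35.0 axis on slide 15 suggests but the surviving text does not state outright. The helper names (`fastest_engine`, `relative_to`, `slower_than_hive`) are hypothetical and are not part of any benchmark tooling. The slides attach a parenthetical note to the Presto figures at large and xlarge whose text is lost, so those two values may not be directly comparable with the others.

```python
# Figures transcribed from slides 11-14 of the deck.
# ASSUMPTION: each value is a speed-up multiple relative to Hive (Hive = 1.0).
SPEEDUP_VS_HIVE = {
    "small":  {"Impala": 34.0, "Presto": 11.7, "Hive on Tez": 2.1},
    "medium": {"Impala": 21.9, "Presto": 3.3,  "Hive on Tez": 2.9},
    "large":  {"Impala": 12.7, "Presto": 2.0,  "Hive on Tez": 2.6},
    "xlarge": {"Impala": 9.3,  "Presto": 0.9,  "Hive on Tez": 2.3},
}


def fastest_engine(scale):
    """Return the engine with the largest multiple at the given scale."""
    figures = SPEEDUP_VS_HIVE[scale]
    return max(figures, key=figures.get)


def relative_to(scale, engine, baseline):
    """Return how many times faster `engine` is than `baseline` at `scale`."""
    figures = SPEEDUP_VS_HIVE[scale]
    return figures[engine] / figures[baseline]


def slower_than_hive(scale):
    """Return the engines whose multiple falls below 1.0 at `scale`."""
    return sorted(e for e, v in SPEEDUP_VS_HIVE[scale].items() if v < 1.0)


if __name__ == "__main__":
    for scale in SPEEDUP_VS_HIVE:
        ratio = relative_to(scale, "Impala", "Hive on Tez")
        print(f"{scale:>6}: fastest={fastest_engine(scale)}, "
              f"Impala vs Hive on Tez={ratio:.1f}x, "
              f"below Hive={slower_than_hive(scale)}")
```

Under that assumption, Impala keeps the largest multiple at every scale, but its lead over Hive on Tez narrows as the data grows, and Presto drops below the Hive baseline at xlarge.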