Hadoop - Past, Present and Future - v2.0
DESCRIPTION
A session focused on ramping you up on what Hadoop is, how it works and what it's capable of. We will also look at what Hadoop 2.x and YARN bring to the table and some future projects in the Hadoop space to keep an eye on.
TRANSCRIPT
![Page 1: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/1.jpg)
© 2014 Trace3, All rights reserved.
BIG DATA INTELLIGENCE PRACTICE
HADOOP: PAST, PRESENT AND FUTURE
![Page 2: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/2.jpg)
Roadmap
~1 hour
1 - What Makes Up Hadoop 1.x?
2 - What's New In Hadoop 2.x?
3 - The Future Of Hadoop …
![Page 3: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/3.jpg)
WHAT MAKES UP HADOOP 1.0?
![Page 4: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/4.jpg)
What’s a “Node”?
Node aka Server
Compute
Storage
Operating System
Memory
![Page 5: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/5.jpg)
Hadoop 1.0: HDFS + MapReduce
NameNode
DataNode / TaskTracker DataNode / TaskTracker
DataNode / TaskTracker DataNode / TaskTracker
JobTracker
Client 1-1
1-2 1-3
![Page 6: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/6.jpg)
Hadoop 1.0: HDFS + MapReduce
NameNode
DataNode / TaskTracker DataNode / TaskTracker
DataNode / TaskTracker DataNode / TaskTracker
JobTracker
Client 1-1 1-2
1-3
Reduce Map
2-1 3-2 3-3 4-1
2-3 4-2 2-2 3-1 4-3
Reduce Map
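The diagram above shows Map and Reduce tasks running on the DataNode / TaskTracker nodes. As a rough illustration of what those tasks do, here is a minimal single-process sketch of the map, shuffle and reduce phases. It assumes a word-count job, which the deck does not mention, and the function names are illustrative only, not Hadoop APIs.

```python
from collections import defaultdict


def map_phase(block):
    """Map task: emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key so each key reaches one reducer."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(blocks):
    """Run map over every block, then shuffle, then reduce each key."""
    mapped = []
    for block in blocks:  # on a real cluster, each block is mapped on a different node
        mapped.extend(map_phase(block))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Two hypothetical input blocks standing in for blocks stored in HDFS.
blocks = ["hadoop stores data", "hadoop processes data"]
counts = run_job(blocks)
print(counts)
```

On a cluster the map calls run in parallel next to the data, and the shuffle moves data across the network. Here everything runs in one process to keep the data flow visible.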
![Page 7: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/7.jpg)
MapReduce v1 Limitations
• Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000
• Availability: JobTracker failure kills all queued and running jobs
• Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization
• No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else
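To make the slot-partitioning limitation concrete, here is a toy calculation. It assumes every task needs exactly one slot and uses made-up slot and task counts; it is an illustration of the idea, not a model of a real scheduler.

```python
def fixed_slot_utilization(map_slots, reduce_slots, map_tasks, reduce_tasks):
    """Utilization when Map tasks may only use Map slots and Reduce tasks only Reduce slots."""
    used = min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)
    return used / (map_slots + reduce_slots)


def pooled_utilization(total_slots, map_tasks, reduce_tasks):
    """Utilization when any task may use any free slot (one shared pool)."""
    used = min(total_slots, map_tasks + reduce_tasks)
    return used / total_slots


# Hypothetical node: 8 Map slots and 4 Reduce slots, with a map-heavy workload
# of 12 pending Map tasks and no Reduce tasks yet.
fixed = fixed_slot_utilization(map_slots=8, reduce_slots=4, map_tasks=12, reduce_tasks=0)
pooled = pooled_utilization(total_slots=12, map_tasks=12, reduce_tasks=0)

print(f"fixed slots:  {fixed:.0%}")
print(f"shared pool:  {pooled:.0%}")
```

With fixed slots the four Reduce slots sit idle while Map tasks wait; a shared pool lets the same workload use the whole node.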
![Page 8: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/8.jpg)
Hadoop 1.0: Single Use System
HADOOP 1.0
Single Use System Batch Apps
HDFS (redundant, reliable storage)
MapReduce (cluster resource management and data processing)
Pig Hive
![Page 9: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/9.jpg)
WHAT’S NEW IN HADOOP 2.0?
![Page 10: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/10.jpg)
YARN
YARN Takes Over Cluster Resource Management From MapReduce
Yet Another Resource Negotiator
YARN will be the de facto distributed operating system for Big Data
![Page 11: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/11.jpg)
Store DATA in one place
Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service
Applications Run Natively IN Hadoop
HDFS2 (redundant, reliable storage)
YARN (cluster resource management)
BATCH (MapReduce)
INTERACTIVE (Tez)
ONLINE (HBase)
STREAMING (DataTorrent)
GRAPH (Giraph)
YARN: No Longer Just Batch Apps
![Page 12: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/12.jpg)
YARN: Applications
Running all on the same Hadoop cluster to give applications access to all the same source data!
MapReduce v2
Stream Processing
Master-Worker Online
In-Memory
Apache Storm
![Page 13: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/13.jpg)
YARN: Quickly Maturing
2010
2011
2012
2013
2014
Today
Conceived at Yahoo!
Alpha Releases – 2.0
Beta Releases – 2.1
GA Released – 2.2
100,000+ nodes, 400,000+ jobs daily
10 million+ hours of compute daily
Version 2.3
Version 2.4
![Page 14: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/14.jpg)
YARN: Dr. Evil Approved
![Page 15: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/15.jpg)
YARN: What Has Changed?
YARN | MRv1
RM ResourceManager, AM ApplicationMaster | JT JobTracker
Scheduler | Scheduler
NM NodeManager | TT TaskTracker
Container | Map & Reduce Slot
ResourceManager
Scheduler
JobTracker
Scheduler
NodeManager
ApplicationMaster
TaskTracker
Map Reduce
NodeManager
Container Container
TaskTracker
Map Reduce
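The comparison above replaces fixed Map and Reduce slots with generic Containers handed out by the ResourceManager. The following toy model sketches that idea under strong simplifying assumptions: containers are sized by memory only, placement is first-fit, and the class and method names are invented for illustration. None of this is the real YARN API.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Toy stand-in for a worker node that reports its free memory."""
    name: str
    free_mb: int


@dataclass
class Container:
    """A generic resource grant: it can host a map task, a reduce task or anything else."""
    node: str
    memory_mb: int


class ResourceManager:
    """Toy scheduler: grant a container on the first node with enough free memory."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, memory_mb):
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return Container(node.name, memory_mb)
        return None  # no node can satisfy the request right now


# Hypothetical two-node cluster.
rm = ResourceManager([NodeManager("node1", 4096), NodeManager("node2", 2048)])

app_master = rm.allocate(1024)  # container for the per-application ApplicationMaster
task_a = rm.allocate(2048)      # the ApplicationMaster then requests task containers
task_b = rm.allocate(2048)
task_c = rm.allocate(2048)      # cluster is full for this request size

print(app_master, task_a, task_b, task_c, sep="\n")
```

Because a Container is not typed as Map or Reduce, the same grant mechanism can serve other kinds of applications, which is the point of the slide.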
![Page 16: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/16.jpg)
The 6 Benefits Of YARN
• Scale
• New programming models and services
• Improved cluster utilization
• Agility
• Backwards compatible with MapReduce v1
• Mixed workloads on the same source of data
![Page 17: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/17.jpg)
THE FUTURE OF HADOOP
![Page 18: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/18.jpg)
SQL on Hadoop
• Speed: Deliver interactive query performance.
• SQL: Support an array of SQL semantics for analytic applications running against Hadoop.
• Scale: SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.
![Page 19: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/19.jpg)
SQL on Hadoop
• Hive on Apache Tez (Hortonworks HDP2)
• Hive on Apache Spark (Cloudera CDH5)
• Apache Drill (MapR M7)
• Cloudera Impala (Cloudera CDH5)
• Pivotal HAWQ (Pivotal Big Data Suite)
![Page 20: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/20.jpg)
HOYA: HBase (NoSQL) on YARN
• Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load.
• Easier Deployment: APIs to create, start, stop and delete HBase clusters.
• Availability: Recover from Region Server loss with a new container.
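The dynamic scaling point above can be pictured as a sizing rule that is re-evaluated as load changes. This is a hypothetical sketch: the function name, the requests-per-second load metric, the per-server capacity and the bounds are all assumptions made for illustration, and none of it is a HOYA API.

```python
import math


def desired_region_servers(requests_per_sec, capacity_per_server,
                           min_servers=1, max_servers=10):
    """Return how many Region Server containers the current load calls for.

    The result is clamped to [min_servers, max_servers] so the cluster
    never shrinks to nothing or grows without bound.
    """
    needed = math.ceil(requests_per_sec / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


# Hypothetical capacity: one Region Server handles about 1,000 requests per second.
quiet = desired_region_servers(0, 1000)
normal = desired_region_servers(2500, 1000)
spike = desired_region_servers(50000, 1000)

print(quiet, normal, spike)
```

The difference between the desired and the running count would then translate into containers to request from, or release back to, YARN.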
![Page 21: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/21.jpg)
Microsoft REEF
Retainable Evaluator Execution Framework
• Machine Learning: Framework well suited for building machine learning jobs.
• Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.
• Maintain State: Users can build jobs that utilize data from where it's needed and also maintain state after jobs are done.
![Page 22: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/22.jpg)
Heterogeneous Storage
THEN: NameNode, Storage
NOW: NameNode, SATA, SSD, Fusion IO
![Page 23: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/23.jpg)
Hadoop Roadmap
• Apache Hadoop 2.5 (Q3 2014)
– NodeManager restart without disruption
– Dynamic resource configuration
• Apache Hadoop 2.6 (Q4 2014)
– Memory as storage tier
– Support for Docker containers
![Page 24: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/24.jpg)
HADOOP: PAST, PRESENT & FUTURE
I KNOW YOU HAVE QUESTIONS
NO SUCH THING AS A STUPID QUESTION.
![Page 25: Hadoop - Past, Present and Future - v2.0](https://reader034.vdocuments.site/reader034/viewer/2022051400/54c649b44a7959df5f8b4576/html5/thumbnails/25.jpg)
ONE LAST THING …
SD Big Data Meetup
meetup.com/sdbigdata
2nd Wednesday Of The Month
Next: August 13th @ 5:45 PM