Hadoop: Today and Tomorrow

Post on 10-May-2015





Presentation on where Hadoop is today, and where it is going, given at the London Hadoop Users Group, April 2012.


1. Hadoop: Today and Tomorrow
    Steve Loughran, Hortonworks
    stevel at hortonworks.com, @steveloughran
    London, April 2012
    Hortonworks Inc. 2012

2. About me
    HP Labs: deployment, cloud infrastructure, Hadoop-in-Cloud
    Apache member and committer
        Ant (author, Ant in Action), Axis 2
        Hadoop: dynamic deployments, diagnostics on failures, cloud infrastructure integration
    Joined Hortonworks in 2012
        UK based: R&D + customer engagement

3. About Hortonworks
    From developing and running the world's largest Hadoop clusters to advancing open source Apache Hadoop for the broader market
    Hadoop at Yahoo!
        40K+ servers
        170 PB storage
        5M+ monthly jobs
        1000+ active users
    2011
    HDP, training & support

4. Where is Hadoop?
    Today: Hadoop 1.x
    Status & roadmap
    Tomorrow: Hadoop 2.x
        YARN
        HDFS HA
    Enterprise integration

5. Releases slowed with Hadoop take-up
    [Chart labels: 0.20.0, 0.20.1, 0.20.2, 0.21.0, 0.20.20{3,4,5}.0; 64 Releases]
    Branches from the last 2.5 years:
        0.20.{0,1,2}: stable release without security
        0.20.2xx.y: stable release with security
        0.21.0: released, unstable, deprecated
        0.22.0: orphan, unstable, lack of community

6. Now: two release branches, one dev
    Hadoop 1.x
        Stable, used in production systems
        The one to use today
    Hadoop 2.0
        The successor
        Not quite ready for use
    Hadoop 2.x "trunk"
        Where features & fixes first go in
        If you want to help, start here

7. Today: Hadoop 1.x
    A stable Hadoop release from the ASF
    Merges various Hadoop 0.20.* branches (security, HBase support, ...)
    A stable branch for patching and back-porting
    Highlights:
        Security
        HBase support (append operation)
        WebHDFS
        New MapReduce APIs complete & usable
        Distribution packaging includes RPM files

8. WebHDFS: fast direct HTTP access
    ~:$ GET http://nnode:50070/webhdfs/v1/results/part-r-00000.csv?op=open
    GATE4,eb8bd736445f415e18886ba037f84829,55000,2007-01-14,14:01:54,
    GATE4,ec58edcce1049fa665446dc1fa690638,8030803000,2007-01-14,13:52:31,
    GATE4,b6f07ce00f09035a6683c5e93e3c04b8,30000,2007-01-28,12:41:11,
    GATE4,a1bc345b756090854e9dd0011087c6c0,30000,2007-01-28,12:59:33,
    ...
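The GET request above can also be issued from a script. The following is a minimal Python sketch, not part of the original slides: it assumes WebHDFS is enabled on the cluster and that security is switched off. The helper names `webhdfs_url` and `read_hdfs_file` are hypothetical; the host `nnode`, port 50070 and the file path are the ones shown on the slide.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(namenode, path, op="OPEN", port=50070, **params):
    """Build a WebHDFS REST URL of the form http://<namenode>:<port>/webhdfs/v1<path>?op=<OP>.

    Hypothetical helper for illustration; extra keyword arguments become
    additional query parameters.
    """
    query = urlencode({"op": op, **params})
    # quote() leaves "/" untouched, so the HDFS path maps directly onto the URL path.
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


def read_hdfs_file(namenode, path, **kwargs):
    """Fetch a file's bytes over WebHDFS (assumes an unsecured cluster).

    The namenode answers an OPEN request with a redirect to a datanode;
    urlopen follows that redirect automatically for GET requests.
    """
    with urlopen(webhdfs_url(namenode, path, **kwargs)) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the URL, so it runs without a cluster.
    print(webhdfs_url("nnode", "/results/part-r-00000.csv"))
```

Separating URL construction from the network call keeps the first part testable without a running cluster; only `read_hdfs_file` needs a reachable namenode.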
    Potential uses:
        Out-of-cluster access to HDFS
        Cross-cluster, cross-version HDFS access
        Native filesystem clients
    dfs.webhdfs.enabled=true

9. Hortonworks Data Platform HDP1
    Based on Hadoop 1.0, adds HCatalog for table and schema management
    Open APIs for metadata, data movement, app & job management
    Consumable standard Hadoop stack:
        Hadoop 1.0.x: core (HDFS, MapReduce)
        Pig 0.9.x: data flow programming language
        Hive 0.8.x: SQL-like language
        HBase 0.92.x: column table datastore
        HCatalog 0.3.x: table and schema management
        ZooKeeper 3.4.x: coordinator

10. Post-SQL KVS & Column Tables
    Project Voldemort

11. Analysis tooling maturing
    Pig
    DataFu

12. Ingress
    Kafka
    Fluentd
    facebook / scribe

13. Keep an eye on the graph layer
    Apache Giraph
    Hama
    Workshop: Beyond MapReduce

14. Tomorrow: Hadoop 2.0
    HDFS Federation
        Clear separation of namespace and block storage
        Snapshots
        Improved scalability and isolation
    HDFS HA
        Active/Standby failover of Namenodes
    Next Generation MapReduce architecture (aka YARN)
        New architecture enables other application types to plug in
        Resource Manager: a foundation for HA and fault tolerance
    Performance!
    In beta 2012

15. HDFS HA
    [Diagram labels: ZK, ZK, ZK; Heartbeat; FailoverController (Active), FailoverController (Standby); Cmds; Monitor health of NN, OS, HW; NN (Active), NN (Standby); DN, DN, DN]
    Block reports to Active & Standby
    DN fencing: update cmds from one

16. YARN: foundation of a datacentre OS
    [Diagram labels: Client, Resource Manager, NodeManager, App Mstr, Container; arrows: MapReduce Status, Job Submission, Node Status, Resource Request]
    Multiple topology-aware applications in a single cluster

17. Microsoft embraces Hadoop
    Good for enterprises & developers
    Great for end users!

18. Oracle accepts NoSQL
    May 2011: "Don't be risking your data on NoSQL databases."
    Sept 2011: "Oracle NoSQL Database provides network-accessible multi-terabyte distributed key/value pair storage with predictable latency."
    Oracle need compatible SQL & NoSQL business plans & to justify high-end servers over commodity x86 boxes
    Could drive Hadoop-centric JVM development

19. Open Source Enterprise Tooling
    Application layer
        Spring Data for Hadoop: in beta
        Cascading: Apache 2.0 License
    OS layer
        Red Hat building Hadoop story
        Canonical assisting Hadoop packaging

20. What does all this mean?

21. facebook: 45 PB, Yahoo!: 180+ PB

22. Hadoop has the momentum
    Platform: stable version & evolving version
    Tooling & layers: ecosystem
    Commercial training and support
    Adoption by enterprise vendors

23. Hadoop is the Big Data Platform

24. Get involved with the Apache project!
    Join the -user mailing lists
        common-user@hadoop.apache.org
        hdfs-user@hadoop.apache.org
        mapreduce-user@hadoop.apache.org
    File bug reports in JIRA
    Contribute to the documentation
    Add: patches, tests, features, ...

25. Questions?
    hortonworks.com

26. hortonworks.com