hadoop: today and tomorrow

© Hortonworks Inc. 2012 Hadoop: Today and Tomorrow Steve Loughran– Hortonworks stevel at hortonworks.com @steveloughran London, April 2012

Upload: steve-loughran

Post on 10-May-2015


DESCRIPTION

Presentation on where Hadoop is today and where it is going, given at the London Hadoop Users Group, April 2012

TRANSCRIPT

Page 1: Hadoop: today and tomorrow

© Hortonworks Inc. 2012

Hadoop: Today and Tomorrow

Steve Loughran, Hortonworks
stevel at hortonworks.com
@steveloughran

London, April 2012

Page 2: Hadoop: today and tomorrow

About me:

• HP Labs:
  – Deployment, cloud infrastructure, Hadoop-in-Cloud
• Apache: member and committer
  – Ant (author, Ant in Action), Axis 2
  – Hadoop
  – Dynamic deployments
  – Diagnostics on failures
  – Cloud infrastructure integration
• Joined Hortonworks in 2012
  – UK based: R&D + customer engagement

Page 3: Hadoop: today and tomorrow

About Hortonworks

Hadoop at Yahoo!

40K+ Servers

170PB Storage

5M+ Monthly Jobs

1000+ Active Users

From developing and running the world's largest Hadoop clusters to advancing open source Apache Hadoop for the broader market

HDP, training & support

2011

Page 4: Hadoop: today and tomorrow


Where is Hadoop?

• Today: Hadoop 1.x
  – Status & Roadmap
• Tomorrow: Hadoop 2.x
  – YARN
  – HDFS HA
• Enterprise integration

Page 5: Hadoop: today and tomorrow

Releases slowed with Hadoop take up

• 64 Releases
• Branches from the last 2.5 years:
  – 0.20.{0,1,2}: stable release without security
  – 0.20.2xx.y: stable release with security
  – 0.21.0: released, unstable, deprecated
  – 0.22.0: orphan, unstable, lack of community

Page 6: Hadoop: today and tomorrow

Now: two release branches, one dev

Hadoop 1.x
• Stable, used in production systems
• The one to use today

Hadoop 2.0
• The successor
• Not quite ready for use

Hadoop 2.x "trunk"
• Where features & fixes first go in
• If you want to help, start here

Page 7: Hadoop: today and tomorrow


Today: Hadoop 1.x

• A stable Hadoop release from the ASF
  – Merges various Hadoop 0.20.* branches (security, HBase support, …)
  – A stable branch for patching and back-porting
• Highlights:
  – Security
  – HBase support (“append” operation)
  – WebHDFS
  – “new” MapReduce APIs complete & usable
  – Distribution packaging includes RPM files

Page 8: Hadoop: today and tomorrow


WebHDFS: fast direct HTTP access

~:$ GET http://nnode:50070/webhdfs/v1/results/part-r-00000.csv?op=open

GATE4,eb8bd736445f415e18886ba037f84829,55000,2007-01-14,14:01:54,
GATE4,ec58edcce1049fa665446dc1fa690638,8030803000,2007-01-14,13:52:31,
GATE4,b6f07ce00f09035a6683c5e93e3c04b8,30000,2007-01-28,12:41:11,
GATE4,a1bc345b756090854e9dd0011087c6c0,30000,2007-01-28,12:59:33,
...

Potential Uses:
• Out of cluster access to HDFS
• Cross-cluster, cross version HDFS access
• Native filesystem clients

dfs.webhdfs.enabled=true
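The request on this slide can be issued from any language with an HTTP client. The following is a minimal Python sketch, not code from the talk: the helper names `webhdfs_url` and `read_file` are invented for illustration, the host `nnode` and port 50070 are copied from the slide, and the operation is spelled `OPEN` as in the WebHDFS documentation where the slide writes `open`. It assumes an unsecured cluster with `dfs.webhdfs.enabled=true`.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, path, op="OPEN", port=50070, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation and any extra parameters travel in the query string.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def read_file(host, path, port=50070):
    """Fetch a file's bytes over HTTP (hypothetical helper, needs a live cluster)."""
    # The namenode answers OPEN with a redirect to a datanode;
    # urlopen follows that redirect for GET requests.
    with urlopen(webhdfs_url(host, path, port=port)) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the URL, so it runs without a cluster.
    print(webhdfs_url("nnode", "/results/part-r-00000.csv"))
```

Keeping URL construction separate from the network call means the same builder can serve other operations by passing a different `op` value, and it can be checked without a running namenode.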

Page 9: Hadoop: today and tomorrow


Hortonworks Data Platform HDP1

Based on Hadoop 1.0, adds
– HCatalog for table and schema management
– Open APIs for metadata, data movement, app & job management
– Consumable “standard Hadoop” stack:

• Hadoop 1.0.x: core (HDFS, MapReduce)
• Pig 0.9.x: data flow programming language
• Hive 0.8.x: SQL-like language
• HBase 0.92.x: column table datastore
• HCatalog 0.3.x: table and schema management
• ZooKeeper 3.4.x: coordinator

Page 10: Hadoop: today and tomorrow

Post-SQL KVS & Column Tables

Project Voldemort

Page 11: Hadoop: today and tomorrow

Analysis tooling maturing

DataFu

Pig

Page 12: Hadoop: today and tomorrow

Ingress

facebook / scribe

Fluentd

Kafka

Page 13: Hadoop: today and tomorrow

Keep an eye on the graph layer

Apache Giraph

Hama

Workshop: Beyond MapReduce

Page 14: Hadoop: today and tomorrow

Tomorrow: Hadoop 2.0

• HDFS Federation
  – Clear separation of Namespace and Block Storage
  – Snapshots
  – Improved scalability and isolation
• HDFS HA
  – Active/Standby failover of Namenodes
• Next Generation MapReduce architecture (aka YARN)
  – New architecture enables other application types to plug in
  – Resource Manager a foundation for HA and fault tolerance
• Performance!

In beta 2012

Page 15: Hadoop: today and tomorrow


HDFS HA

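[Diagram: HDFS HA architecture]
• Active NameNode (NN) and Standby NameNode, each paired with its own FailoverController
• Each FailoverController monitors the health of its NN, OS and HW, and sends commands to it
• The FailoverControllers heartbeat to three ZooKeeper (ZK) nodes
• DataNodes (DN) send block reports to both Active and Standby
• DN fencing: DataNodes take update commands from only one NameNode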
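The roles in this diagram can be made concrete with a toy model. The sketch below is an illustration under heavy assumptions, not Hadoop's implementation: every class and function name is invented, an in-memory `LockService` stands in for the ZooKeeper ensemble, and a controller gives up the lock explicitly where the real system would depend on heartbeats and session expiry.

```python
class LockService:
    """In-memory stand-in for ZooKeeper: at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name):
        if self.holder == name:
            self.holder = None


class NameNode:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.state = "standby"


class FailoverController:
    """Watches one NameNode and competes for the active lock on its behalf."""

    def __init__(self, namenode, lock):
        self.nn = namenode
        self.lock = lock

    def tick(self):
        if not self.nn.healthy:
            # Unhealthy NameNode: give up the lock so the peer can take over.
            self.lock.release(self.nn.name)
            self.nn.state = "standby"
        elif self.lock.try_acquire(self.nn.name):
            self.nn.state = "active"
        else:
            self.nn.state = "standby"


class DataNode:
    """Reports blocks to both NameNodes, but obeys commands from one only."""

    def __init__(self):
        self.reports = {}

    def send_block_reports(self, namenodes):
        for nn in namenodes:
            self.reports[nn.name] = self.reports.get(nn.name, 0) + 1

    def accepts_command_from(self, nn):
        return nn.state == "active"


def simulate_failover():
    """Run one healthy round, fail nn1, then run a second round."""
    lock = LockService()
    nn1, nn2 = NameNode("nn1"), NameNode("nn2")
    controllers = [FailoverController(nn1, lock), FailoverController(nn2, lock)]
    dn = DataNode()

    def step():
        for fc in controllers:
            fc.tick()
        dn.send_block_reports([nn1, nn2])
        obeyed = [nn.name for nn in (nn1, nn2) if dn.accepts_command_from(nn)]
        return {"nn1": nn1.state, "nn2": nn2.state, "obeyed": obeyed}

    before = step()
    nn1.healthy = False  # simulate a NameNode, OS or hardware failure
    after = step()
    return before, after, dn.reports
```

Calling `simulate_failover()` returns the state before and after the simulated failure, together with the block-report counts. Because the standby receives block reports throughout, it already knows block locations when it takes over the lock, which is the point of reporting to both NameNodes.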

Page 16: Hadoop: today and tomorrow


YARN: foundation of a datacentre OS

Multiple topology-aware applications in a single cluster

Page 17: Hadoop: today and tomorrow

Microsoft embraces Hadoop

Good for enterprises & developers

Great for end users!

Page 18: Hadoop: today and tomorrow


Oracle accepts NoSQL

May 2011: “Don't be risking your data on NoSQL databases.”

Sept 2011: “Oracle NoSQL Database provides network-accessible multi-terabyte distributed key/value pair storage with predictable latency.”

• Oracle need compatible SQL & NoSQL business plans
• & to justify high-end servers over “commodity” x86 boxes
• Could drive Hadoop-centric JVM development

Page 19: Hadoop: today and tomorrow


Open Source “Enterprise” Tooling

Application Layer

• Spring Data for Hadoop in Beta

• Cascading → Apache 2.0 License

OS Layer

• RedHat building Hadoop story

• Canonical assisting Hadoop packaging


Page 20: Hadoop: today and tomorrow

What does all this mean?

Page 21: Hadoop: today and tomorrow


facebook: 45 PB, Yahoo! 180+PB

Page 22: Hadoop: today and tomorrow


Hadoop has the momentum

• Platform: stable version & evolving version
• Tooling & layers: ecosystem
• Commercial training and support
• Adoption by enterprise vendors

Page 23: Hadoop: today and tomorrow

Hadoop is the Big Data Platform

Page 24: Hadoop: today and tomorrow


Get involved with the Apache project!

• Join the -user mailing lists
  – [email protected]
  – [email protected]
  – [email protected]
• File bug reports in JIRA
• Contribute to the documentation
• Add: patches, tests, features, …

Page 25: Hadoop: today and tomorrow

Questions?

hortonworks.com

Page 26: Hadoop: today and tomorrow

hortonworks.com