TRANSCRIPT
2012 SNIA Analytics & Big Data Summit © Hortonworks Inc. 2012 – All Rights Reserved
The Evolving Apache Hadoop Ecosystem – What It Means for the Storage Industry
Sanjay Radia, Architect/Founder, Hortonworks Inc.
Outline
• Hadoop (HDFS) and Storage
  – Data platform drivers and how Hadoop changes the game
  – File systems leading to HDFS
  – HDFS architecture
  – RAID disk or not?
  – HDFS's generic storage layer
  – Archival
  – Upcoming HDFS features
Next Generation Data Platform Drivers

Business Drivers
• Need for new business models & faster growth (20%+)
• Find insights for competitive advantage & optimal returns

Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source

Technical Drivers
• Data continues to grow exponentially
• Data is increasingly everywhere and in many formats
• Legacy solutions unfit for new requirements and growth
Big Data Changes the Game

[Figure: data volume grows from megabytes through gigabytes and terabytes to petabytes as sources expand from ERP (purchase detail, purchase record, payment record) to CRM (offer details, support contacts, customer touches, segmentation) to WEB (web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels) to BIG DATA (user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video/audio/images, speech to text, product/service logs, social interactions & feeds, business data feeds, user click stream, sensors/RFID/devices, spatial & GPS coordinates) – increasing data variety and complexity.]

Transactions + Interactions + Observations = BIG DATA
How Hadoop Changes the Game

• Cost
  – Commodity servers, JBOD disks
  – Horizontal scaling – add new servers as needed
  – Ride the technology curve
• Scale from small to very large
  – Size of data or size of computation
• The Data Refinery
  – Traditional data marts optimize for time and space
  – Don't need to pre-select a subset of the "schema"
  – Can experiment to gain insights into your whole data
  – Your insights can change over time
  – Don't need to throw your old data away
• Open and growing ecosystem
  – You aren't locked into a vendor
  – Ever-growing choice of tools and components
Distributed File Systems and HDFS
File Systems Background (1): Scaling

(1) Mainframe FS – vertically scale namespace, IO, & storage
(2) Single-system-image distributed FS – vertically scale namespace; horizontally scale IO and storage
(3) Modern FS (Hadoop, GFS, pNFS) – horizontally scale namespace ops and IO
File Systems Background (3): Leading to Google FS and HDFS

• Separation of metadata from data – 1978, 1980
  – "Separating Data from Function in a Distributed File System" (1978) by J. E. Israel, J. G. Mitchell, H. E. Sturgis
  – "A Universal File Server" (1980) by A. D. Birrell, R. M. Needham
• Horizontal scaling of storage nodes and IO bandwidth (1999–2003)
  – Several startups building scalable NFS – late 1990s
  – Lustre (1999–2002)
  – Google FS (2003)
  – (Hadoop's HDFS, pNFS)
• Commodity HW with JBODs, replication, non-POSIX semantics
  – Google FS (2003) – not using RAID was a very important decision
• Computation close to the data
  – Parallel DBs
  – Google FS/MapReduce
HDFS: Scalable, Reliable, Manageable

[Figure: racks of commodity servers connected through rack switches to a core switch.]

• Scale IO, storage, CPU
  – Add commodity servers & JBODs
  – 6K nodes in a cluster, 120 PB
• Fault tolerant & easy management
  – Built-in redundancy
  – Tolerate disk and node failures
  – Automatically manage addition/removal of nodes
  – One operator per 3K nodes!!
• Storage server used for computation
  – Move computation to data
• Not a SAN
  – But high-bandwidth network access to data via Ethernet
• Scalable file system
  – Read, write, rename, append
  – No random writes

Simplicity of design is why a small team could build such a large system in the first place.
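The restricted file semantics above (read, write, rename, append, but no random writes) can be sketched as a toy model. The class below is purely illustrative – it is not the real org.apache.hadoop.fs API – but it shows the essence: bytes are only ever added at the end of a file.

```java
import java.io.ByteArrayOutputStream;

// Toy model of HDFS's write-once/append-only file semantics.
// Illustrative class, not the real Hadoop FileSystem API.
public class AppendOnlyFile {
    private final ByteArrayOutputStream contents = new ByteArrayOutputStream();
    private boolean closed = false;

    // create + write: only ever adds bytes at the end of the file
    public void append(byte[] data) {
        if (closed) throw new IllegalStateException("file is closed");
        contents.write(data, 0, data.length);
    }

    public void close() { closed = true; }

    public long length() { return contents.size(); }

    // Note what is *absent*: no seek-and-overwrite, no in-place update.
    public static void main(String[] args) {
        AppendOnlyFile f = new AppendOnlyFile();
        f.append("log line 1\n".getBytes());
        f.append("log line 2\n".getBytes());
        f.close();
        System.out.println(f.length()); // 22
    }
}
```

Giving up random writes is part of what keeps the design simple enough for a small team to build: replication and recovery never have to reconcile conflicting in-place updates.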
HDFS Architecture: Two Main Layers

• Namespace layer
  – Multiple independent namespaces
    • Can be mounted using client-side mount tables
  – Consists of dirs, files and blocks
  – Operations: create, delete, modify and list dirs/files
• Block Storage – generic block service
  – DataNodes: cluster membership
  – Block Pool – set of blocks for a Namespace Volume
    • Shared storage across all volumes – no partitioning
    • Namespace Volume = Namespace + Block Pool
  – Block operations
    • Create/delete/modify/getBlockLocation operations
    • Read and write access to blocks
  – Detect and tolerate faults
    • Monitor node/disk health, periodic block verification
    • Recover from failures – replication and placement

[Figure: namespace layer (NS …) sitting on top of the block storage layer (block management over DataNodes).]
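The block-management responsibilities above (tracking which DataNodes hold each block's replicas and detecting under-replication) can be sketched as a toy model. The BlockMap class and its method names are hypothetical illustrations, not Hadoop's actual internals.

```java
import java.util.*;

// Toy model of the block map: blockId -> set of DataNodes holding a
// replica. Illustrative names, not Hadoop's actual internals.
public class BlockMap {
    private final Map<Long, Set<String>> replicas = new HashMap<>();
    private final int replicationFactor;

    public BlockMap(int replicationFactor) {
        this.replicationFactor = replicationFactor;
    }

    // A DataNode's block report: "I hold these blocks."
    public void blockReport(String dataNode, List<Long> blockIds) {
        for (long id : blockIds)
            replicas.computeIfAbsent(id, k -> new HashSet<>()).add(dataNode);
    }

    // When a node is declared dead, drop all of its replicas.
    public void nodeFailed(String dataNode) {
        for (Set<String> nodes : replicas.values()) nodes.remove(dataNode);
    }

    // Blocks with fewer live replicas than the target need re-replication.
    public Set<Long> underReplicated() {
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Set<String>> e : replicas.entrySet())
            if (e.getValue().size() < replicationFactor) result.add(e.getKey());
        return result;
    }

    public static void main(String[] args) {
        BlockMap map = new BlockMap(3);
        map.blockReport("dn1", Arrays.asList(1L, 2L));
        map.blockReport("dn2", Arrays.asList(1L, 2L));
        map.blockReport("dn3", Arrays.asList(1L, 2L));
        map.nodeFailed("dn3");                     // both blocks drop to 2 replicas
        System.out.println(map.underReplicated()); // [1, 2]
    }
}
```

The key design point is that this map is built entirely from DataNode reports: the namespace layer never needs to know where blocks physically live.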
HDFS Architecture

[Figure: the NameNode holds the namespace state (a hierarchical namespace mapping file name → block IDs, backed by persistent namespace metadata & a journal) and the block map (block ID → block locations), kept up to date via heartbeats & block reports from the DataNodes. DataNodes map block ID → data, storing block replicas (b1, b2, b3, b5) on JBOD disks. IO and storage scale horizontally.]
Client Read & Write Directly from Closest Server

[Figure: for a read, the client (1) opens the file at the NameNode, which consults the namespace state and block map, then (2) reads block data directly from the closest DataNode. For a write, the client (1) creates the file at the NameNode, then (2) writes blocks directly to DataNodes. The data path is protected by an end-to-end checksum.]
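The end-to-end checksum idea can be illustrated with a minimal sketch using CRC32 from the Java standard library. This is a simplification – real HDFS checksums every 512-byte chunk of a block, and the class here is hypothetical – but the principle is the same: the checksum is computed where data is produced and verified where it is consumed, so corruption anywhere in between is caught.

```java
import java.util.zip.CRC32;

// Toy end-to-end checksum: the writer computes a CRC32 when data is
// stored; the reader recomputes it and rejects corrupted data.
// Simplified sketch -- real HDFS checksums every 512-byte chunk.
public class ChecksummedBlock {
    private final byte[] data;
    private final long checksum;

    public ChecksummedBlock(byte[] data) {
        this.data = data.clone();
        this.checksum = crc(this.data);   // writer-side checksum
    }

    private static long crc(byte[] bytes) {
        CRC32 c = new CRC32();
        c.update(bytes);
        return c.getValue();
    }

    // Reader-side verification: recompute and compare.
    public boolean verify() { return crc(data) == checksum; }

    // Simulate media corruption flipping one byte.
    public void corrupt(int offset) { data[offset] ^= 0xFF; }

    public static void main(String[] args) {
        ChecksummedBlock b = new ChecksummedBlock("hello hdfs".getBytes());
        System.out.println(b.verify());  // true
        b.corrupt(0);
        System.out.println(b.verify());  // false -- in HDFS the client
                                         // would fall back to another replica
    }
}
```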
HDFS Actively Maintains Data Reliability

[Figure: block checksums are periodically checked; when a bad/lost block replica is detected, the NameNode tells a DataNode holding a good replica to (1) replicate, that node (2) copies the block to another DataNode, and the receiver reports (3) blockReceived back to the NameNode.]
Significance of Not Using Disk RAID

• Key new idea was to not use RAID on the local disks and instead replicate the data
  – Uniformly deal with device failures, media failures, node failures, etc.
  – Nodes and clusters continue to run, with slightly lower capacity
  – Nodes and disks can be fixed when convenient
  – Recovery from failures is parallel
    • A RAID-5 disk takes over half a day to recover from a 1 TB failure
    • HDFS recovers 12 TB in minutes – recovery is in parallel, and faster for larger clusters
  – Operational advantage: the system recovers automatically from node and disk failures very rapidly
    • 1 operator for 4K servers at mature Hadoop installations
  – Yes, there is a storage overhead
    • File-RAID is available (1.4–1.6 efficiency)
    • Practical to use for a portion of the data; otherwise node recovery takes long
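Rough arithmetic makes the parallel-recovery claim concrete. The bandwidth and cluster-size figures below are illustrative assumptions, not measurements from the talk: a RAID rebuild is serialized onto one replacement disk, while HDFS re-replication spreads the lost blocks across every surviving node.

```java
// Back-of-the-envelope comparison of RAID rebuild vs. HDFS parallel
// re-replication. Bandwidth and cluster-size numbers are illustrative
// assumptions.
public class RecoveryTime {
    // Seconds for a RAID array to rebuild one disk: the rebuild is
    // bottlenecked on the single replacement disk.
    static double raidRebuildSec(double diskTB, double writeMBps) {
        return diskTB * 1e6 / writeMBps;
    }

    // Seconds for HDFS to re-replicate a failed node: every surviving
    // node copies a small share of the lost blocks in parallel.
    static double hdfsRecoverySec(double nodeTB, double perNodeMBps, int nodes) {
        return nodeTB * 1e6 / (perNodeMBps * nodes);
    }

    public static void main(String[] args) {
        // 1 TB disk at ~25 MB/s effective rebuild rate: ~11 hours.
        System.out.printf("RAID: %.1f hours%n", raidRebuildSec(1, 25) / 3600);
        // 12 TB node, 1000 surviving nodes at ~50 MB/s each: ~4 minutes.
        System.out.printf("HDFS: %.1f minutes%n", hdfsRecoverySec(12, 50, 1000) / 60);
    }
}
```

Note how the HDFS figure improves as the cluster grows: more surviving nodes means a smaller share of lost data per node.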
Current HDFS Availability & Data Integrity

• Simple design, storage fault tolerance
  – Storage: rely on the OS's file system rather than use raw disks
  – Storage fault tolerance: multiple replicas, active monitoring
  – Single NameNode master
    • Persistent state: multiple copies + checkpoints
    • Restart on failure
• How well did it work?
  – Lost 19 out of 329 million blocks on 10 clusters with 20K nodes in 2009
    • 7–9's of reliability – fixed in releases 20 and 21
  – 18-month study: 22 failures on 25 clusters – 0.58 failures per year per cluster
    • Only 8 would have benefited from HA failover!! (0.23 failures per cluster-year)
  – NN is very robust and can take a lot of abuse
    • NN is resilient against overload caused by misbehaving apps

Automatic failover is available in Hadoop 1 and in Hadoop 2 (alpha).
HDFS's Generic Storage Service: Opportunities for Innovation

• Federation – distributed (partitioned) namespace
  – Simple and robust due to independent masters
  – Scalability, isolation, availability
• New services – independent block pools
  – New FS – partial namespace in memory
  – MR tmp storage, HBase directly on block storage
  – Shadow file system – caches HDFS, NFS, S3
• Future: move block management into the DataNodes
  – Simplifies namespace/application implementation
  – A distributed namenode becomes significantly simpler

[Figure: the HDFS Namespace, alternate NN implementations, HBase, and MR tmp all sitting on the shared Storage Service.]
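The federation idea above – independent namespace volumes, each owning its own block pool, over shared un-partitioned storage – can be sketched as a toy model. Class and method names are illustrative assumptions, not Hadoop's actual federation code.

```java
import java.util.*;

// Toy sketch of HDFS federation: each namespace volume owns a block
// pool, while all pools share the same physical DataNodes (the storage
// layer is not partitioned). Illustrative names, not Hadoop internals.
public class FederatedStorage {
    // blockPoolId -> block ids belonging to that namespace volume
    private final Map<String, Set<Long>> blockPools = new HashMap<>();

    public void addNamespaceVolume(String poolId) {
        blockPools.put(poolId, new HashSet<>());
    }

    // A namespace allocates blocks only inside its own pool (isolation)...
    public void allocateBlock(String poolId, long blockId) {
        blockPools.get(poolId).add(blockId);
    }

    public int poolSize(String poolId) { return blockPools.get(poolId).size(); }

    // ...but capacity is pooled: all blocks land on the shared DataNodes.
    public int totalBlocks() {
        int n = 0;
        for (Set<Long> pool : blockPools.values()) n += pool.size();
        return n;
    }

    public static void main(String[] args) {
        FederatedStorage storage = new FederatedStorage();
        storage.addNamespaceVolume("ns1");
        storage.addNamespaceVolume("ns2");   // e.g. a volume serving HBase
        storage.allocateBlock("ns1", 101L);
        storage.allocateBlock("ns2", 201L);
        System.out.println(storage.totalBlocks()); // 2
    }
}
```

Because the masters are independent, a failure or overload in one namespace volume does not affect the others, while storage capacity stays shared.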
Archival Data – Where Should It Sit?

• Hadoop encourages keeping old data for future analysis
  – Should it sit in a separate cluster with lower computing power?
  – Spindles are one of the bottlenecks in big data systems; can't waste precious spindles on another cluster where they are underutilized
  – Better to keep archival data in the main cluster where the spindles can be actively used
  – As data grows over time, a big data cluster may have a smaller percentage of hot data
  – Should we arrange data on the disk platter based on access patterns?
• Challenge – growing data
  – Do tapes play a role?
HDFS in Hadoop 1 and Hadoop 2
Hadoop 1 (GA)
• Security
• Append/fsync (HBase)
• WebHDFS + SPNEGO
• Write pipeline improvements
• Local write optimization
• Performance improvements
• Disk-fail-in-place
• HA NameNode
• Full Stack HA

Hadoop 2 (alpha)
• New append
• Federation
• Wire compatibility
• Edit logs rewrite
• Faster startup
• HA NameNode (hot)
• Full Stack HA – in progress
Hadoop Full Stack HA Architecture

[Figure: an HA cluster of servers hosts the master daemons (NN, JT) with N+K failover; on failover, the JT goes into safemode while jobs continue against the slave nodes of the Hadoop cluster; apps run outside the HA cluster.]
Upcoming HDFS Features

• Continued improvements
  – Performance
  – Monitoring and management
  – Rolling upgrades
  – Disaster recovery
• Full Stack HA in Hadoop 2
• Snapshots (prototype already published)
• Support for heterogeneous storage on DataNodes
  – Will allow flash disks to be used for specific use cases
• Block grouping – allow large numbers of smaller blocks
• Working set of namespace in memory – large # of files
• Other protocols – NFS
• …
Which Apache Hadoop Distro?
• Stable
• Reliable
• Well supported
• But do not want to lock into a vendor
Hortonworks Data Platform

Delivers enterprise-grade functionality on a proven Apache Hadoop distribution to ease management, simplify use and ease integration into the enterprise. The only 100% open source data platform for Apache Hadoop.

• Simplify deployment to get started quickly and easily
• Monitor, manage any size cluster with familiar console and tools
• Only platform to include data integration services to interact with any data source
• Metadata services open the platform for integration with existing applications
• Dependable high availability architecture
Hortonworks Distribution
• Hadoop 1.0.x = stable "kernel"
• Integrates and tests all necessary Apache component projects
• The most stable and compatible versions of all components are chosen
• Closely aligned with each Apache component project's code line, with a closedown process that provides the necessary buffer
• QE/beta process that produced every single stable Apache Hadoop release
• Current Apache Hadoop 2 is being QE'ed and alpha/beta tested through the same process
Hortonworks Data Platform component versions:

  Hadoop     1.0.3
  HCatalog   0.4.0
  Pig        0.9.2
  Hive       0.9.0+
  HBase      0.92.1+
  Sqoop      0.9.0+
  Oozie      3.1.3
  Zookeeper  3.3.4
  Ambari     beta
  Talend     5.1.1

Tested, hardened & proven distribution reduces risk.
Summary

• Hadoop changes the Big Data game in fundamental ways
  – Cost, storage and compute capacity
  – JBODs rather than RAID arrays or SANs
  – Block-storage layer is being further generalized
  – Horizontally scale from small to very, very large
  – Data Refinery – keep all your data, gain new business insights
  – "Archival" data is merely cold/warm – where should it sit?
  – Exponential growth – new data + old data, and cost is low
  – Open, growing ecosystem – no lock-in
• Hortonworks and the HDP distribution
  – The team that originally created Apache Hadoop at Yahoo
  – The team that is driving key developments in Apache Hadoop
  – QE'ed and alpha/beta tested every stable Hadoop release, including Hadoop 1 and Hadoop 2 (alpha)
  – HDP is fully open source to Apache – nothing held back
  – Close/identical to Apache releases