
Page 1: HDFS What’s New and Future


HDFS What’s New and Future

Suresh Srinivas

@suresh_m_s

Sanjay Radia

@srr


Page 2: HDFS What’s New and Future


About Me

• Architect & Founder at Hortonworks

• Long time Apache Hadoop committer and PMC member

• Designed and developed many key Hadoop features

• Experience in supporting many clusters

– Including some of the world’s largest Hadoop clusters


Page 3: HDFS What’s New and Future


Agenda

• HDFS – What’s new
  – New features in release 2.0
• Future
  – Short-term and long-term features
  – Major architectural directions


Page 4: HDFS What’s New and Future


We have been hard at work…

• Progress is being made in many areas
  – Scalability
  – Performance
  – Enterprise features
  – Ongoing operability improvements
  – Expand the Hadoop ecosystem to more platforms and use cases
• 2192 commits in Hadoop in the last year
  – Almost a million lines of changes
  – ~150 contributors
  – Many new contributors – ~80 with fewer than 3 patches
• 350K lines of changes in HDFS and Common


Page 5: HDFS What’s New and Future


Building on Rock-solid Foundation

• Original design choices – simple and robust
  – Single Namenode metadata server – all state in memory
  – Fault tolerance: multiple replicas, active monitoring
  – Storage: rely on the OS’s file system, not raw disk
• Reliability
  – Over seven 9s of data reliability; less than 0.38 failures across 25 clusters
• Operability
  – Small teams can manage large clusters
    • One operator per 3K-node cluster
  – Fast time to repair on node or disk failure
    • Minutes to an hour vs. RAID array repairs taking many long hours
• Scalable – proven by large-scale deployments
  – > 100 PB storage, > 400 million files, > 4500 nodes in a single cluster
  – ~100K nodes of HDFS in deployment and use


Page 6: HDFS What’s New and Future


Current Hadoop Releases

• Release 1.x stream
  – Current stable release: 1.2.0
  – Only critical bug fixes, improvements, and features ported back
• Release 2.x stream
  – Close to trunk
  – Current release: 2.1.0-beta – almost ready
  – Has the new features
    • YARN
    • MapReduce 2
    • New HDFS features
  – Expected to be stable and GA by September


Page 7: HDFS What’s New and Future


Federation – Many Namespaces

• Block storage as a generic storage service
  – Datanodes store blocks in block pools for all the namespace volumes
• Multiple independent Namenodes and namespace volumes in a cluster
  – Scalability by adding more Namenodes/namespaces
  – Isolation – separating applications into their own namespaces
  – Client-side mount tables/ViewFS for integrated views (see the sketch after the figure below)


[Figure: Federation architecture – Namenodes NN-1 … NN-k … NN-n each serve an independent namespace volume (NS1 … NSk … NSn, including foreign namespaces); Datanodes DN 1 … DN m provide the common block storage, with one block pool per namespace volume.]
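
A minimal sketch of how a client stitches several federated namespaces into one view with a ViewFS client-side mount table; the link paths and the nn1/nn2 authorities below are placeholders for illustration, not values from the slides.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ViewFsExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The default file system is the client-side mount table, not a single Namenode.
        conf.set("fs.defaultFS", "viewfs:///");
        // Each link maps a client-visible path to a directory in one namespace volume.
        // "nn1" and "nn2" are placeholder Namenode authorities.
        conf.set("fs.viewfs.mounttable.default.link./user", "hdfs://nn1:8020/user");
        conf.set("fs.viewfs.mounttable.default.link./data", "hdfs://nn2:8020/data");

        FileSystem viewFs = FileSystem.get(URI.create("viewfs:///"), conf);
        // Listing "/" shows the mount points; reads and writes are routed per link.
        for (FileStatus stat : viewFs.listStatus(new Path("/"))) {
          System.out.println(stat.getPath());
        }
      }
    }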

Page 8: HDFS What’s New and Future


High Availability – No SPOF

• Support standby Namenode and failover
  – Planned downtime
  – Unplanned downtime
• Release 1.1
  – Cold standby
  – Uses NFS as shared storage
  – Standard HA frameworks as the failover controller
    • Linux HA and VMware vSphere
  – Suitable for small clusters of up to 500 nodes


Page 9: HDFS What’s New and Future


High Availability – Release 2.0

• Support for Hot Standby

• Manual and automatic failover

• Automatic failover with a Failover Controller
  – Active NN election and failure detection using ZooKeeper
  – Periodic NN health check
  – Failover on NN failure
• Removed shared storage dependency – Quorum Journal Manager (a configuration sketch follows below)
  – 3 to 5 JournalNodes store the edit journal
  – An edit must be written to a quorum of the JournalNodes
• Replay cache for correctness and transparent failovers
  – In progress – will be in 2.1.0-beta
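
A minimal sketch of the settings behind an HA deployment with the Quorum Journal Manager and ZooKeeper-based failover; the nameservice name "mycluster", the host names, and the ports are assumptions for illustration.

    import org.apache.hadoop.conf.Configuration;

    public class HaQjmConfigSketch {
      public static Configuration haConf() {
        Configuration conf = new Configuration();
        // Logical nameservice that clients use instead of a single Namenode host.
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
        // Edits are written to a quorum of JournalNodes instead of shared NFS storage.
        conf.set("dfs.namenode.shared.edits.dir",
            "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster");
        // Client-side failover proxy picks whichever Namenode is currently active.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        // ZooKeeper-based automatic failover (ZKFC).
        conf.setBoolean("dfs.ha.automatic-failover.enabled", true);
        conf.set("ha.zookeeper.quorum",
            "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");
        conf.set("fs.defaultFS", "hdfs://mycluster");
        return conf;
      }
    }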


Page 10: HDFS What’s New and Future


[Figure: HA architecture – Active and Standby Namenodes share state through a quorum of JournalNodes (JN). ZooKeeper-based Failover Controllers, one per Namenode, monitor the health of the NN, OS, and hardware, hold ZooKeeper heartbeats, and issue failover commands. Datanodes send block reports to both the Active and the Standby; DN fencing ensures they obey commands only from the Active. Namenode HA has no external dependency.]

Page 11: HDFS What’s New and Future


Snapshots (HDFS-2802)

• Support for read-only COW snapshots
  – Design allows read-write snapshots
• Namenode-only operation – no data copy is made
  – Metadata in the Namenode – no complicated distributed mechanism
  – Datanodes have no knowledge of snapshots
• Snapshot the entire namespace or subdirectories
  – Nested snapshots allowed
  – Managed by the admin
    • Users can take snapshots of directories they own
• Efficient
  – Instantaneous creation
  – Memory used is highly optimized
  – Does not affect regular HDFS operations


Page 12: HDFS What’s New and Future


Snapshot Design

• A large number of snapshots supported
  – State is proportional to the changes between snapshots
  – Supports millions of snapshots
• Based on persistent data structures
  – Maintains changes in a diff list at the inodes
    • Tracks creation, deletion, and modification
  – Snapshot state: Sn = current − ∆n


[Figure: snapshot diff chain – applying ∆n to the current state yields Sn, applying ∆n−1 yields Sn−1, and so on back to S0.]

Page 13: HDFS What’s New and Future


Snapshot – APIs and CLIs

• All commands and APIs can use the snapshot path
  – /<path>/.snapshot/<snapshot_name>/file.txt
  – cp /from/.snapshot/snap1/file.txt /to/file.txt
• CLIs (see the sketch below for the equivalent Java APIs)
  – Admin allows snapshots
    • Snapshottable directories
  – Users can create/delete/rename snapshots
  – Tool to print the diff between snapshots
  – Admin tool to print all snapshottable directories and snapshots
• Status
  – Work is complete – available in 2.1.0-beta
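
A minimal sketch of the snapshot workflow through the Java FileSystem API; the directory /data/sales and the snapshot names are placeholders, and the allowSnapshot step assumes admin privileges.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class SnapshotSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);

        // Admin step: mark the directory as snapshottable (DistributedFileSystem only).
        ((DistributedFileSystem) fs).allowSnapshot(new Path("/data/sales"));

        // User step: take a snapshot; this is a Namenode-only metadata operation.
        Path snapRoot = fs.createSnapshot(new Path("/data/sales"), "snap1");

        // Snapshot contents are read back through the .snapshot path,
        // e.g. /data/sales/.snapshot/snap1/file.txt
        System.out.println("Snapshot created at: " + snapRoot);

        // Snapshots can later be renamed or removed.
        fs.renameSnapshot(new Path("/data/sales"), "snap1", "snap1-archived");
        fs.deleteSnapshot(new Path("/data/sales"), "snap1-archived");
      }
    }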


Page 14: HDFS What’s New and Future


Performance Improvements

• Many improvements
  – SSE4.2 CRC32C – ~3x less CPU on the read path
  – Read path improvements – fewer memory copies
  – Short-circuit read for 2-3x faster random reads (local reads)
  – Unix domain socket based local reads (a configuration sketch follows below)
    • Simpler to configure and generic for many applications
  – I/O improvements using posix_fadvise()
  – libhdfs improvements for zero-copy reads
• Significant improvements – I/O 2.5x to 5x faster
  – Many improvements backported to release 1.x
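
A minimal sketch of the client settings that enable short-circuit local reads over a Unix domain socket; the socket path shown is a conventional placeholder and must match the Datanode's own configuration.

    import org.apache.hadoop.conf.Configuration;

    public class ShortCircuitReadConfigSketch {
      public static Configuration localReadConf() {
        Configuration conf = new Configuration();
        // Let the DFS client read local block files directly instead of
        // streaming them through the Datanode.
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        // Unix domain socket shared by the Datanode and local clients for
        // passing file descriptors (placeholder path).
        conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
        return conf;
      }
    }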


Page 15: HDFS What’s New and Future


NFS Support (HDFS-4750)

• NFS Gateway provides NFS access to HDFS
  – File browsing, data download/upload, data streaming
  – No client-side library
  – Better alternative to the Hadoop + FUSE based solution
    • Better consistency guarantees
• Supports NFSv3
• Stateless gateway
  – Simpler design, easier to handle failures
• Future work
  – High availability for the NFS Gateway
  – NFSv4 support?


Page 16: HDFS What’s New and Future


Stronger Compatibility

• Hadoop RPC on-the-wire encoding uses protobuf
• Post 2.1.0-beta, stronger compatibility
  – Java API
  – Wire protocol
  – Both forward and backward compatibility
  – Simplifies migrating to newer releases
• Rolling upgrades
  – Enabled by strong wire protocol compatibility
• Security improvements
  – Negotiation of authentication
  – Multiplexing secure sessions in a connection


Page 17: HDFS What’s New and Future


Other Features

• New append pipeline

• Improvements for other projects
  – Stale-node detection to improve HBase MTTR
• Block placement enhancements
  – Better support for other topologies such as VMs and cloud
• Expanding the ecosystem, platforms, and applicability
  – Native support for Windows
• File and block IDs are unique
  – Major architectural step
  – Allows caches and layered file systems
  – A key requirement for archives and disaster recovery


Page 18: HDFS What’s New and Future


Enterprise Readiness

• Storage fault tolerance – built into HDFS – 100% data reliability
• High Availability
• Standard interfaces – WebHDFS (REST), FUSE, NFS, HttpFS, libwebhdfs, and libhdfs (a WebHDFS sketch follows below)
• Wire protocol compatibility – Protocol Buffers
• Rolling upgrades
• Snapshots
• Disaster recovery – DistCp for parallel and incremental copies across clusters
  – Apache Ambari and HDP for automated management
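
A minimal sketch of reaching HDFS over the WebHDFS REST interface from Java via the webhdfs:// scheme; the host name is a placeholder, and 50070 is the default Namenode HTTP port in this release line.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WebHdfsSketch {
      public static void main(String[] args) throws Exception {
        // The webhdfs scheme talks to the Namenode's HTTP server using the REST
        // API, so no RPC-version match between client and cluster is required.
        FileSystem fs = FileSystem.get(
            URI.create("webhdfs://namenode.example.com:50070"), new Configuration());
        for (FileStatus stat : fs.listStatus(new Path("/"))) {
          System.out.println(stat.getPath() + " " + stat.getLen());
        }
        fs.close();
      }
    }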


Page 19: HDFS What’s New and Future


HDFS Futures


Page 20: HDFS What’s New and Future


From Batch to Real-time

• New latency-sensitive, real-time use cases
  – Interactive queries – Stinger/Tez
  – HBase
• Heterogeneous storage (HDFS-2832)
  – Storage properties were hidden
  – Datanode abstraction changes from a single storage to a collection of storages
  – Memory, SSDs, and disks as storage types
    • Hierarchical storage tiers
    • Enables memory cache
  – Work in progress
• Datanode cache (HDFS-4949) – see the sketch below
  – Enables fast zero-copy access to data cached/pinned in Datanode memory
  – Coordinated cache management
• Future – more sophisticated caching policies
  – Cache partial blocks based on access pattern
  – LRU, LRU-2
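
The centralized Datanode cache was still in progress at the time of this talk, so the following is only a hypothetical sketch of how an application might pin a working set into Datanode memory through a cache-directive style API; the pool name, path, and the CacheDirectiveInfo/CachePoolInfo classes reflect how the feature later took shape and should be treated as assumptions here.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
    import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

    public class DatanodeCacheSketch {
      public static void main(String[] args) throws Exception {
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());

        // Admin step: create a cache pool that groups directives and enforces quotas.
        dfs.addCachePool(new CachePoolInfo("hot-tables"));

        // Ask the Namenode to have Datanodes pin this path's blocks in memory,
        // enabling zero-copy reads of the cached replicas.
        long directiveId = dfs.addCacheDirective(
            new CacheDirectiveInfo.Builder()
                .setPath(new Path("/warehouse/dim_dates"))
                .setPool("hot-tables")
                .build());
        System.out.println("Added cache directive " + directiveId);
      }
    }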


Page 21: HDFS What’s New and Future


Storage Abstraction

• Fundamental storage abstraction improvements

• Short term (post 2.x GA)
  – Heterogeneous storage
  – Block-level APIs for direct access to fault-tolerant storage
    • Apps and services can bypass the file system interface
  – Granular block placement policies
    • Co-locate related data/blocks on specific nodes
    • Policy/tools to migrate related data together
• Long term
  – Explore support for object/key-value stores and APIs
  – Serving from Datanodes optimized based on file structure


Page 22: HDFS What’s New and Future


Higher Scalability

• Even higher scalability of the namespace
  – Only the working set in Namenode memory
  – Namenode as a container of namespaces
    • Support a large number of namespaces
  – Explore new types of namespaces
• Further scale the block storage
  – Move block management to Datanodes
  – Block collection/mega block group abstraction


Page 23: HDFS What’s New and Future


High Availability

• Further enhancements to HA
  – Expand full-stack HA to include other dependent services
  – Support multiple standby nodes
  – Use standby for reads
  – Simplify management – eliminate special daemons for journals
    • Move Namenode metadata to HDFS


Page 24: HDFS What’s New and Future


Q & A

• Myths and misinformation
  – Not reliable (was never true)
  – If the Namenode dies, all state is lost (was never true)
  – Hard to operate
  – Slow and not performant
  – Namenode is a single point of failure
  – Needs shared NFS storage
  – Does not have point-in-time recovery
  – Does not support disaster recovery

Thank You!