Design for a Distributed NameNode
DESCRIPTION
A proposed design for a distributed HDFS NameNode.
TRANSCRIPT
Reaching 10,000
Aaron Cordova | Booz Allen Hamilton | Hadoop Meetup DC | Sep 7, 2010
Lots of Applications Require Scalability
Intelligence
Bio-Metrics
Bio-Informatics
Defense
Video
Images
Text
Structured Data
Graph Analytics
Machine Learning
Network Security
Hadoop Scales
[Chart: Cost vs. Data Size, comparing Shared Nothing and Shared Disk architectures]
Linear Scalability
Massive Parallelism
MapReduce
Simplified Distributed Programming Model
Fault Tolerant
Designed to Scale to Thousands of Servers
Many Algorithms Easily Expressed as Map and Reduce
HDFS
Distributed File System
Optimized for High-Throughput
Fault Tolerant Through Replication, Checksumming
Designed to Scale to 10,000 servers
Hadoop is a Platform
MapReduce
HDFS
HBase
Mahout
Hive
Pig
Flume
Cascading
Nutch
HBase
Scalable Structured store
Fast Lookups
Durable, Consistent Writes
Automatic Partitioning
Mahout
Scalable Machine Learning Algorithms
Clustering
Classification
Fuzzy Table
Low-Latency Parallel Search
Generalized Fuzzy Matching
Images, Biometrics, Audio
One Major Problem
HDFS Single NameNode
Single NameSpace - easy to serialize operations
NameSpace stored entirely in memory
Changes written to transaction log first
Single Point of Failure
Performance Bottleneck?
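The "transaction log first, then memory" rule above can be sketched in a few lines of Python (a toy illustration, not actual NameNode code; all names are invented):

```python
# Toy sketch of the NameNode write path: every namespace mutation is
# appended to an edit log before the in-memory map changes, so the
# namespace can be rebuilt by replaying the log after a crash.
class TinyNameNode:
    def __init__(self):
        self.edit_log = []    # stands in for the on-disk transaction log
        self.namespace = {}   # path -> block list, held entirely in memory

    def mkdir(self, path):
        self.edit_log.append(("mkdir", path))  # durable record first ...
        self.namespace[path] = []              # ... then the in-memory change

    @classmethod
    def replay(cls, log):
        # crash recovery: rebuild the in-memory namespace from the log
        nn = cls()
        for op, path in log:
            if op == "mkdir":
                nn.namespace[path] = []
        return nn
```

Because every operation funnels through one such server, operations are easy to serialize, which is exactly why the single NameNode is both simple and a bottleneck.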
NameNode Scalability
By software evolution standards Hadoop is a young project. In 2005, inspired by two Google papers, Doug Cutting and Mike Cafarella implemented the core of Hadoop. Its wide acceptance and growth started in 2006 when Yahoo! began investing in its development and committed to use Hadoop as its internal distributed platform. During the past several years Hadoop installations have grown from a handful of nodes to thousands. It is now used in many organizations around the world.
In 2006, when the buzzword for storage was Exabyte, the Hadoop group at Yahoo! formulated long-term target requirements [7] for the Hadoop Distributed File System and outlined a list of projects intended to bring the requirements to life. What was clear then has now become a reality: the need for large distributed storage systems backed by distributed computational frameworks like Hadoop MapReduce is imminent.
Today, when we are on the verge of the Zettabyte Era, it is time to take a retrospective view of the targets and analyze what has been achieved, how aggressive our views on the evolution and needs of the storage world have been, how the achievements compare to competing systems, and what our limits to growth may be.
The main four-dimensional scale requirement targets for HDFS were formulated [7] as follows:
10PB capacity x 10,000 nodes x 100,000,000 files x 100,000 clients
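A quick back-of-envelope on what those four targets imply (my arithmetic, not the paper's):

```python
# What the four-dimensional HDFS targets imply per node and per file.
PB = 10**15
capacity = 10 * PB       # 10 PB total capacity
nodes = 10_000           # 10,000 nodes
files = 100_000_000      # 100,000,000 files

per_node_space = capacity // nodes   # 1 TB of raw space per node
avg_file_size = capacity // files    # 100 MB average file size
```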
The biggest Hadoop clusters [8, 5], such as the one recently used at Yahoo! to set sorting records, consist of 4000 nodes and have a total space capacity …
“100,000 HDFS clients on a 10,000-node HDFS cluster will exceed the throughput capacity of a single name-node.
... any solution intended for single namespace server optimization lacks scalability.
... the most promising solutions seem to be based on distributing the namespace server ...”
Konstantin Shvachko, ;login:, Apr 2010
Goal
[Chart: writes/second (thousands), 0 to 50, Single NN vs. Target]
HDFS Single NameNode
Server grade machine
Lots of memory
Reliable components
RAID
Hot-Failover
Needs Parallelism
Scaling NameNode
Grow memory
Read-only Replicas of NameNode
Multiple static namespace partitions
Distributed name server, partition namespace dynamically
Distributed NameNode Features
Fast Lookups
Durable, Consistent writes
Automatic Partitioning
Can we use HBase?
Mappings as HBase Tables
NameSpace: filename : blocks
DataNodes: node : blocks
Blocks: block : nodes
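A sketch of the three mappings as plain key-to-value tables (Python dicts stand in for the HBase tables here; the data is made up):

```python
# The three NameNode mappings, modeled as simple key -> value tables.
namespace = {"/dir1/file": ["blk_1", "blk_2"]}                  # filename : blocks
datanodes = {"dn-01": ["blk_1"], "dn-02": ["blk_1", "blk_2"]}   # node : blocks
blocks    = {"blk_1": ["dn-01", "dn-02"], "blk_2": ["dn-02"]}   # block : nodes

def locate(path):
    """Resolve a path to (block, replica locations) pairs: two table lookups."""
    return [(b, blocks[b]) for b in namespace[path]]
```

Each mapping is a natural fit for a sorted key-value store, which is what motivates the "Can we use HBase?" question.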
How to order namespace?
Depth First Search Order
/
/dir1
/dir1/subdir
/dir1/subdir/file
/dir2/file1
/dir2/file2
Depth First Operations
Delete (Recursive)
Move / Rename
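The depth-first ordering is just a lexicographic sort of full paths, which keeps every directory's subtree contiguous; that is what turns recursive delete and move/rename into range operations. A Python sketch (illustrative only, not the talk's code):

```python
def dfs_order(paths):
    # plain lexicographic sort of full paths yields depth-first order:
    # a directory's entire subtree occupies one contiguous key range
    return sorted(paths)

def subtree(paths, d):
    # everything under directory d shares d as a path prefix, so a
    # recursive delete or rename is a single range scan over these keys
    return [p for p in dfs_order(paths) if p == d or p.startswith(d + "/")]
```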
Breadth First Search Order
0/
1/dir1
2/dir1/subdir
2/dir2/file1
2/dir2/file2
3/dir1/subdir/file
Breadth First Operations
List
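The breadth-first layout prefixes each key with its depth, so all children of a directory sort next to each other and listing a directory becomes one prefix scan. A Python sketch (my illustration; it handles single-digit depths only, so real keys would need a fixed-width depth field):

```python
def bfs_key(path):
    # prefix each key with its depth; lexicographic order then groups
    # all entries at the same depth together (breadth-first order)
    depth = 0 if path == "/" else path.count("/")
    return f"{depth}{path}"

def list_dir(keys, parent):
    # listing a directory is one prefix scan at depth(parent) + 1
    depth = 1 if parent == "/" else parent.count("/") + 1
    prefix = f"{depth}/" if parent == "/" else f"{depth}{parent}/"
    return sorted(k for k in keys if k.startswith(prefix))
```

This is the trade-off between the two orderings: depth-first keys make recursive operations cheap, while depth-prefixed keys make List cheap.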
Current Architecture
[Diagram: DFSClients and DataNodes all communicating with a single NameNode]
Proposed Architecture
[Diagram: each DFSClient and DataNode embeds a DNNProxy, which talks to a set of RServers hosting the partitioned namespace]
100k clients -> 41k writes/s
Anticipated Performance
[Chart: writes/second (thousands, 0 to 50) vs. number of machines hosting the namespace (100 to 250), comparing Single NN, Distributed NN, and Target]
Issues
Synchronization: multiple writers making concurrent changes
Name distribution hotspots
Current Status
Working code exists that uses HBase, with slightly modified DFSClient and DataNode, supporting create, write, close, open, read, mkdirs, and delete.
New component: a HealthServer that monitors DataNodes and performs garbage collection. It acts more like the BigTable master: it can die and restart without affecting clients.
Code
Will be at http://code.google.com/p/hdfs-dnn
Available under the Apache license (whichever version is compatible with Hadoop)
Doesn’t HBase run on HDFS?
Self-Hosted HBase
May be possible to have HBase use the same HDFS instance it’s supporting
Some recursion and self-reference already exists: the HBase metadata table is itself a table stored in HBase
Have to work out bootstrapping and failure recovery to resolve any potential circular dependencies