THE HADOOP DISTRIBUTED FILE SYSTEM
Authors: Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler
Presented by Suhua Wei, Yong Yu
OUTLINE
• Architecture of the Hadoop Distributed File System (HDFS)
• Report on the experience using HDFS at Yahoo!
Courtesy of http://revieweasyhomemadecookies.com/what-is-hadoopecosystem/
HDFS OVERVIEW
• HDFS stores file system metadata and application data separately
• HDFS stores metadata on a dedicated server, called the NameNode
• Application data are stored on other servers called DataNodes
• All servers are fully connected and communicate with each other using TCP-based protocols.
ARCHITECTURE
• An HDFS client creates a new file by giving its path to the NameNode
• For each block of the file, the NameNode returns a list of DataNodes to host its replicas
• The client then pipelines data to the chosen DataNodes, which eventually confirm the creation of the block replicas to the NameNode
ARCHITECTURE
• NameNode
• The HDFS namespace is a hierarchy of files and directories. Files and directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, and namespace and disk space quotas.
• The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes
• HDFS keeps the entire namespace in RAM.
• Image: the inode data and the list of blocks belonging to each file
• Checkpoint: the persistent record of the image stored in the local host’s native file system.
• Journal: the modification log of the image which is stored in the local host’s native file system
• To improve durability, redundant copies of the checkpoint and journal can be made at other servers.
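The NameNode structures listed above can be pictured with a small in-memory model. This is only an illustrative Python sketch: the names `INode`, `Namespace`, `block_locations`, and `journal` are assumptions made for this example and do not mirror the actual HDFS classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class INode:
    """One file or directory in the namespace (hypothetical layout)."""
    name: str
    is_dir: bool
    permissions: int = 0o644
    children: Dict[str, "INode"] = field(default_factory=dict)  # directories only
    blocks: List[int] = field(default_factory=list)             # files only


class Namespace:
    """Namespace tree plus block-to-DataNode mapping, all held in memory."""

    def __init__(self) -> None:
        self.root = INode(name="", is_dir=True, permissions=0o755)
        self.block_locations: Dict[int, Set[str]] = {}
        self.journal: List[tuple] = []  # modification log, oldest entry first

    def _walk(self, parts: List[str], create_dirs: bool = False) -> INode:
        node = self.root
        for part in parts:
            if part not in node.children:
                if not create_dirs:
                    raise FileNotFoundError("/" + "/".join(parts))
                node.children[part] = INode(part, is_dir=True, permissions=0o755)
            node = node.children[part]
        return node

    def create_file(self, path: str) -> INode:
        parts = path.strip("/").split("/")
        parent = self._walk(parts[:-1], create_dirs=True)
        inode = INode(parts[-1], is_dir=False)
        parent.children[parts[-1]] = inode
        self.journal.append(("create", path))  # record the change in the log
        return inode

    def add_block(self, path: str, block_id: int, datanodes: Set[str]) -> None:
        inode = self._walk(path.strip("/").split("/"))
        inode.blocks.append(block_id)
        self.block_locations[block_id] = set(datanodes)
        # Only the block list is logged here; replica locations stay in memory.
        self.journal.append(("add_block", path, block_id))
```

In this model the image corresponds to the inode tree with its block lists, and the journal is the list of logged modifications.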
ARCHITECTURE
• DataNode
• During startup each DataNode connects to the NameNode and performs a handshake, which verifies the namespace ID and the software version of the DataNode
• After the handshake, the DataNode registers with the NameNode.
• During normal operation, each DataNode sends a heartbeat to the NameNode every 3 seconds to confirm that it is operating and that the block replicas it hosts are available
• The NameNode uses its replies to heartbeats to send instructions to the DataNode, including commands to:
• replicate blocks to other nodes
• remove local block replicas
• re-register or to shut down the node
• send an immediate block report
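The heartbeat exchange can be sketched as follows. This is a simplified, in-process illustration: `HeartbeatMonitor`, its methods, and the `timeout_s` parameter are hypothetical names chosen for this example, and the timeout value is left to the caller because the slides do not state one.

```python
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 3.0  # heartbeat interval stated in the slides


class HeartbeatMonitor:
    """Tracks last-heartbeat times and queues commands for heartbeat replies."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s                # assumed, chosen by the caller
        self.last_seen: Dict[str, float] = {}
        self.pending: Dict[str, List[str]] = {}

    def queue_command(self, datanode: str, command: str) -> None:
        """Remember a command to deliver with the next heartbeat reply."""
        self.pending.setdefault(datanode, []).append(command)

    def heartbeat(self, datanode: str, now: float) -> List[str]:
        """Record a heartbeat and return the commands carried by the reply."""
        self.last_seen[datanode] = now
        return self.pending.pop(datanode, [])

    def dead_nodes(self, now: float) -> List[str]:
        """DataNodes whose last heartbeat is older than the timeout."""
        return sorted(dn for dn, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

The point of the design is that the NameNode never calls a DataNode directly: commands wait in `pending` until the DataNode's next heartbeat arrives.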
ARCHITECTURE
• HDFS Clients
• The HDFS client is a code library that exports the HDFS file system interface
• Reading data: the client transfers data directly from a DataNode
• Writing data
• The client first asks the NameNode to choose DataNodes to host replicas of the first block of the file.
• The client sets up a node-to-node pipeline and sends data to the first DataNode
• When the first block is filled, the client requests new DataNodes to be chosen to host replicas of the next block.
• Unlike conventional file systems, HDFS provides an API that exposes the locations of a file's blocks, which allows applications like the MapReduce framework to schedule a task where the data are located.
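To show why exposed block locations matter, here is a minimal sketch of locality-aware task placement. The function `schedule_tasks` and its inputs are hypothetical: `block_locations` stands in for whatever the client API returns, and `free_slots` is an assumed per-node task capacity. It is not the MapReduce scheduler.

```python
from typing import Dict, List


def schedule_tasks(block_locations: Dict[int, List[str]],
                   free_slots: Dict[str, int]) -> Dict[int, str]:
    """Assign one task per block, preferring a node that stores a replica."""
    slots = dict(free_slots)  # copy so the caller's dict is left untouched
    assignment: Dict[int, str] = {}
    for block_id, hosts in block_locations.items():
        local = [h for h in hosts if slots.get(h, 0) > 0]
        if local:
            chosen = local[0]  # data-local task: no network transfer needed
        else:
            # No replica host has capacity: fall back to any free node (remote read).
            remote = [n for n, s in slots.items() if s > 0]
            if not remote:
                raise RuntimeError("no free task slots")
            chosen = remote[0]
        slots[chosen] -= 1
        assignment[block_id] = chosen
    return assignment
```

With blocks `{1: ["dn1", "dn2"], 2: ["dn1", "dn3"], 3: ["dn1"]}` and one free slot on each of `dn1` to `dn4`, blocks 1 and 2 run on nodes that hold their data, and block 3 falls back to a remote node because `dn1` is already taken.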
REDUNDANCY MECHANISMS
• Image and Journal
• Image: the inode data and the list of blocks belonging to each file (file system metadata)
• Checkpoint: the persistent record of the image written to disk.
• Journal: the modification log of the image, which ensures that changes are persistent
• CheckpointNode and BackupNode
• A NameNode can alternatively be run as a CheckpointNode or BackupNode
• The CheckpointNode periodically combines the existing checkpoint and journal to create a new checkpoint and empty journal
• Creating periodic checkpoints is one way to protect the file system metadata. Good practice is to create a daily checkpoint
• A BackupNode acts like a shadow of the NameNode and keeps an up-to-date copy of the image in memory
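The checkpoint step performed by the CheckpointNode can be sketched as replaying the journal over the previous checkpoint. The data layout (a dict from path to block IDs) and the three journal operations are assumptions made for this example, not the HDFS on-disk formats.

```python
from typing import Dict, List, Tuple

Image = Dict[str, List[int]]   # path -> block ids (assumed layout)
JournalEntry = Tuple           # ("create", path) | ("add_block", path, id) | ("delete", path)


def apply_entry(image: Image, entry: JournalEntry) -> None:
    """Apply one logged modification to the in-memory image."""
    op = entry[0]
    if op == "create":
        image[entry[1]] = []
    elif op == "add_block":
        image[entry[1]].append(entry[2])
    elif op == "delete":
        image.pop(entry[1], None)
    else:
        raise ValueError(f"unknown journal op: {op}")


def create_checkpoint(checkpoint: Image,
                      journal: List[JournalEntry]) -> Tuple[Image, List[JournalEntry]]:
    """Replay the journal over a copy of the old checkpoint."""
    image = {path: list(blocks) for path, blocks in checkpoint.items()}
    for entry in journal:
        apply_entry(image, entry)
    return image, []  # new checkpoint and an empty journal
```

Because the old checkpoint is copied rather than modified, a failure in the middle of the replay leaves the previous checkpoint and journal usable.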
FILE I/O OPERATIONS AND REPLICA MANAGEMENT
• File Read and Write
• HDFS implements a single-writer, multiple-reader model
• The figure at right shows the data pipeline during block construction, including the acknowledgement messages.
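The data pipeline and its acknowledgements can be illustrated with a small in-process simulation. `PipelineNode` and `write_block` are hypothetical names, there is no networking, and each packet is acknowledged before the next one is sent, which is a simplification of a real streaming pipeline.

```python
from typing import List, Optional


class PipelineNode:
    """One DataNode in a write pipeline (in-process stand-in, no networking)."""

    def __init__(self, name: str, downstream: Optional["PipelineNode"] = None) -> None:
        self.name = name
        self.downstream = downstream
        self.stored: List[bytes] = []

    def receive(self, seq: int, packet: bytes) -> List[str]:
        """Store the packet, forward it, and return the acks from this node onward."""
        self.stored.append(packet)
        acks = self.downstream.receive(seq, packet) if self.downstream else []
        # Acks travel back upstream, so the downstream acks come first.
        return acks + [f"{self.name}:ack{seq}"]


def write_block(data: bytes, node_names: List[str], packet_size: int) -> List[List[str]]:
    """Split data into packets and push each one through the pipeline."""
    head: Optional[PipelineNode] = None
    for name in reversed(node_names):  # build the chain starting from the tail
        head = PipelineNode(name, head)
    if head is None:
        raise ValueError("pipeline needs at least one DataNode")
    return [head.receive(seq, data[i:i + packet_size])
            for seq, i in enumerate(range(0, len(data), packet_size))]
```

Each inner list returned by `write_block` holds the acknowledgements for one packet in the order the client would see them confirmed, last DataNode first.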
FILE I/O OPERATIONS AND REPLICA MANAGEMENT
• Block Placement
• For a large cluster, a common practice is to spread the nodes across multiple racks.
• Nodes of a rack share a switch, and rack switches are connected by one or more core switches.
• The default HDFS replica placement policy:
• No DataNode contains more than one replica of any block
• No rack contains more than two replicas of the same block, provided there are sufficient racks on the cluster
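The two placement rules can be expressed as a small checker. This is a sketch under stated assumptions: `placement_violations` is a hypothetical helper, the per-rack limit is passed in as `max_per_rack` rather than hard-coded, and "sufficient racks" is interpreted here as the cluster having enough racks to hold all replicas within that limit.

```python
from collections import Counter
from typing import Dict, List


def placement_violations(replicas: List[str], rack_of: Dict[str, str],
                         max_per_rack: int) -> List[str]:
    """Return human-readable violations of the two placement rules."""
    problems: List[str] = []
    # Rule 1: no DataNode holds more than one replica of the block.
    for node, count in Counter(replicas).items():
        if count > 1:
            problems.append(f"{node} holds {count} replicas")
    # Rule 2 applies only when the cluster has enough racks (assumed reading).
    racks_in_cluster = len(set(rack_of.values()))
    enough_racks = racks_in_cluster * max_per_rack >= len(replicas)
    if enough_racks:
        for rack, count in Counter(rack_of[n] for n in replicas).items():
            if count > max_per_rack:
                problems.append(f"{rack} holds {count} replicas")
    return problems
```

An empty result means the given set of replica locations satisfies both rules for the chosen limit.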
FILE I/O OPERATIONS AND REPLICA MANAGEMENT
• Replication management
• The NameNode endeavors to ensure that each block always has the intended number of replicas
• The necessity for re-replication may arise due to many reasons:
• a DataNode may become unavailable
• a replica may become corrupted
• a hard disk on a DataNode may fail
• or the replication factor of a file may be increased
• When a block becomes over-replicated, the NameNode chooses a replica to remove; when a block becomes under-replicated, it is put in the replication priority queue. A background thread periodically scans the head of the replication queue to decide where to place new replicas.
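The replication priority queue can be sketched with a binary heap. The function names are hypothetical, and the ordering rule (blocks with the fewest live replicas first) is an assumption for this example; the slides do not spell out how priorities are assigned.

```python
import heapq
from typing import Dict, List, Tuple


def classify_blocks(live: Dict[int, int],
                    target: Dict[int, int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split blocks into over-replicated ids and an under-replicated heap."""
    over: List[int] = []
    under: List[Tuple[int, int]] = []  # heap of (live replica count, block id)
    for block_id, wanted in target.items():
        have = live.get(block_id, 0)
        if have > wanted:
            over.append(block_id)
        elif have < wanted:
            # Assumed ordering: the fewest live replicas get the highest priority.
            heapq.heappush(under, (have, block_id))
    return over, under


def next_to_replicate(under: List[Tuple[int, int]]) -> int:
    """Pop the block at the head of the replication queue."""
    return heapq.heappop(under)[1]
```

A background thread in this model would call `next_to_replicate` repeatedly and pick target DataNodes for each returned block.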
HDFS AT YAHOO!
• Large HDFS clusters at Yahoo! include about 3500 nodes and store about 60 million files. A typical cluster node has:
• 2 quad-core Xeon processors @ 2.5 GHz
• Red Hat Enterprise Linux Server Release 5.1
• Sun Java JDK
• 4 directly attached SATA drives (one terabyte each)
• 1-gigabit Ethernet
PERFORMANCE BENCHMARKS
• The DFSIO benchmark measures average throughput for read, write, and
append operations.
DFSIO                      Production cluster           Sort
Read: 66 MB/s per node     Read: 1.02 MB/s per node     1 TB sort: 22.1 MB/s per node
Write: 40 MB/s per node    Write: 1.09 MB/s per node    1000 TB sort: 9.35 MB/s per node
PERFORMANCE BENCHMARKS
• The NNThroughput benchmark is a single-node process which starts the NameNode application and runs a series of client threads on the same node
• The benchmark measures the number of operations per second performed by the NameNode
SUMMARY
• The Hadoop Distributed File System is designed to store
very large data sets reliably and to stream these datasets
to user applications at high bandwidth
• The HDFS architecture consists of a single NameNode,
many DataNodes and the HDFS client