Google File System
Post on 20-Nov-2014
Google File System
A distributed file system
GFS
GFS is a scalable distributed file system for large, distributed, data-intensive applications.
Motive to build
Google decided to build its own distributed file system (DFS) based on several key observations about its hardware and workloads.
Cost Effective:
› The system is built from inexpensive commodity components, where component failure is the norm and not the exception.
› So the system must detect, tolerate, and recover from failures on a routine basis.
Motive to build
File Size:
› Multi-GB files are the common case, so the system must be optimized for managing large files.
› Small files are also supported, but there is no need to optimize for them.
Motive to build
Read Operations:
› Large streaming reads: an operation reads hundreds of KBs, or 1 MB or more. Successive operations from the same client usually read from the same region of a file.
› Small random reads: an operation reads a few KBs starting from an arbitrary offset. Performance-conscious applications usually batch and sort their small reads to advance steadily through the file instead of going back and forth.
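The batch-and-sort pattern for small reads can be sketched as follows. This is a minimal illustration on an ordinary file-like object, not GFS client code; the function name `batched_reads` and the `(offset, length)` request format are assumptions made for the example.

```python
import io
from typing import BinaryIO, Dict, List, Tuple

Request = Tuple[int, int]  # (offset, length); hypothetical request format


def batched_reads(f: BinaryIO, requests: List[Request]) -> Dict[Request, bytes]:
    """Serve a batch of small reads in one forward pass over the file."""
    results: Dict[Request, bytes] = {}
    # Deduplicate, then sort by offset so the file position only moves
    # forward instead of jumping back and forth.
    for offset, length in sorted(set(requests)):
        f.seek(offset)
        results[(offset, length)] = f.read(length)
    return results


# Usage: three small reads issued out of order, served in offset order.
demo_file = io.BytesIO(bytes(range(100)))
demo = batched_reads(demo_file, [(50, 4), (10, 2), (90, 3)])
```

The caller looks up each result by its original request, so the order in which requests were issued does not matter.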
Motive to build
Write Operations:
› Are similar in size to the read operations.
› Once written, files are seldom modified.
› Writes mostly take the form of sequential appends.
› Random writes are supported but are not efficient.
Motive to build
Transaction Management:
› Applications usually use GFS in a producer-consumer model.
› Many producers can be writing to the same file concurrently.
› Atomic writes with low synchronization overhead between producers are therefore essential.
Motive to build
Latency vs. High Sustained Bandwidth:
› Clients don't have a tight SLA on read and write response times; instead, they care more about processing and moving data in bulk at a high rate.
System Interface
GFS provides an interface to:
› Create
› Delete
› Open
› Close
› Read
› Write
› Snapshot (Copy)
› Record Append
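To give a feel for what Record Append offers concurrent producers, here is a toy, single-process sketch. It is not the GFS API: `ToyAppendFile`, `record_append`, and `run_producers` are hypothetical names, and a single lock stands in for the coordination a real distributed system would need. The point it illustrates is that the system, not the caller, picks the offset, and each record lands as one contiguous unit. Real GFS semantics (for example retries and possible duplicate records) are not modeled here.

```python
import threading
from typing import List, Tuple


class ToyAppendFile:
    """In-memory stand-in for a file that supports atomic record append."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the record atomically and return the offset that was chosen."""
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def run_producers(f: ToyAppendFile, producers: int, records: int) -> List[Tuple[int, bytes]]:
    """Start several producer threads; return (offset, record) for every append."""
    placed: List[Tuple[int, bytes]] = []
    placed_lock = threading.Lock()

    def produce(pid: int) -> None:
        for i in range(records):
            rec = f"p{pid}-r{i};".encode()
            off = f.record_append(rec)
            with placed_lock:
                placed.append((off, rec))

    threads = [threading.Thread(target=produce, args=(p,)) for p in range(producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return placed
```

A consumer can then read each record back intact at the offset returned to its producer, regardless of how the producers interleaved.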
System Components
The system is organized into clusters. Each cluster has the following components:
› A single cluster master
› Multiple chunk servers
› Multiple clients (system environment)
System Components
Files are divided into fixed-size chunks. The chunk size is 64 MB.
The master assigns each chunk a 64-bit identifier, called a chunk handle, upon creation.
Chunk servers store chunks on local disk.
For reliability, each chunk is replicated across multiple chunk servers.
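Because chunks have a fixed size, mapping a byte offset in a file to a chunk is simple integer arithmetic. The sketch below shows that arithmetic; the function names `locate` and `chunks_for_range` are hypothetical and chosen only for illustration.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed chunk size


def locate(offset: int) -> Tuple[int, int]:
    """Return (chunk index, offset within that chunk) for a file byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset: int, length: int) -> range:
    """Return the chunk indices touched by reading `length` bytes at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)
```

For example, a 2-byte read starting at the last byte of chunk 0 spans chunks 0 and 1, so it involves two chunks even though it is tiny.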
System Components
The master maintains the file system metadata and is responsible for:
› Operations on the file and chunk namespaces.
› The mapping between files and chunks.
› The current locations of chunks.
› Chunk lease management.
› Garbage collection.
› Chunk migration between chunk servers.
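The core metadata described above (namespace, file-to-chunk mapping, chunk locations) could be modeled roughly as below. This is an in-memory sketch under stated assumptions: `ToyMaster` and its methods are hypothetical, and handing out chunk handles from a counter is an assumption, since the slides only say that handles are 64-bit identifiers assigned by the master. Leases, garbage collection, and migration are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

HANDLE_MASK = (1 << 64) - 1  # keep handles within 64 bits


@dataclass
class ToyMaster:
    # File namespace and file-to-chunk mapping: path -> ordered chunk handles.
    files: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk locations: chunk handle -> ids of chunk servers holding a replica.
    locations: Dict[int, Set[str]] = field(default_factory=dict)
    # Assumption: handles come from a simple counter.
    next_handle: int = 1

    def create(self, path: str) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_chunk(self, path: str, servers: Iterable[str]) -> int:
        """Assign a new chunk handle to the file and record its replica servers."""
        handle = self.next_handle & HANDLE_MASK
        self.next_handle += 1
        self.files[path].append(handle)
        self.locations[handle] = set(servers)
        return handle

    def lookup(self, path: str, chunk_index: int) -> Tuple[int, Set[str]]:
        """Return the chunk handle and replica locations for one chunk of a file."""
        handle = self.files[path][chunk_index]
        return handle, set(self.locations[handle])
```

A client-side lookup in this model is then `master.lookup(path, chunk_index)`, which returns the handle and the servers to contact for that chunk.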
System Architecture
System Interaction (Reads)
System Interaction (Mutation and Leases with Data/Control separation)
Master Operations
› Namespace management and locking.
› Replica placement.
› Replica creation, re-replication, and rebalancing.
› Garbage collection.
› Stale replica detection.
Questions
Q?