google file system

Google File System A distributed file system

Upload: amgad-muhammad

Post on 20-Nov-2014


TRANSCRIPT

Page 1: Google File System

Google File System: A distributed file system

Page 2: Google File System

GFS

GFS is a scalable distributed file system for large, distributed, data-intensive applications.

Page 3: Google File System

Motive to build

Google made several key observations that led them to build their own DFS.

Cost Effective:
› The system is built from inexpensive commodity components, where component failure is the norm and not the exception.
› So the system must detect, tolerate, and recover from failures on a routine basis.

Page 4: Google File System

Motive to build

File Size:
› Multi-GB files are the common case, so the system must be optimized for managing large files.
› Small files are also supported, but there is no need to optimize for them.

Page 5: Google File System

Motive to build

Read Operation:
› Large Data Streams
  An operation reads hundreds of KBs, or 1 MB or more.
  Successive operations from the same client usually read from the same file region.
› Random Reads
  An operation reads a few KBs starting from an arbitrary offset.
  Performance-conscious applications usually batch and sort their small reads to advance steadily through the file instead of going back and forth.
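
The batching and sorting of small reads can be illustrated with a minimal sketch. It is not taken from the slides: the `(offset, length)` request format and the `batch_and_sort` helper are hypothetical, and merging adjacent or overlapping reads is an extra assumption on top of the plain sort.

```python
from typing import List, Tuple

# Hypothetical request format: (byte offset, length in bytes).
ReadRequest = Tuple[int, int]


def batch_and_sort(requests: List[ReadRequest]) -> List[ReadRequest]:
    """Sort small reads by offset and merge reads that overlap or touch."""
    merged: List[ReadRequest] = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This read starts inside (or right at the end of) the previous one.
            prev_offset, prev_length = merged[-1]
            end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


reads = [(9000, 100), (100, 50), (150, 50), (4000, 10)]
print(batch_and_sort(reads))  # [(100, 100), (4000, 10), (9000, 100)]
```

Issuing the reads in this order lets the application move forward through the file in one pass.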

Page 6: Google File System

Motive to build

Write Operations:
› Are similar in size to the read operations.
› Once written, files are seldom modified.
› Mostly take the form of sequential appends.
› Random writes are supported but are not efficient.

Page 7: Google File System

Motive to build

Transaction Management:
› Applications usually use GFS in a producer-consumer model.
› Many producers can write to the same file concurrently.
› Atomic writes and synchronization between different producers must be optimized.
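
As a rough illustration of many producers appending to one file, the sketch below uses a toy in-memory file guarded by a lock. The `AppendOnlyFile` class and `produce` function are hypothetical names, not GFS APIs, and the sketch does not model replication or the at-least-once behavior of the real record append. The point it shows is that each producer hands over only its data and the file picks the offset.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class AppendOnlyFile:
    """Toy in-memory stand-in for a file with atomic record append."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        """Append the record as one unit and return the offset that was chosen."""
        with self._lock:
            offset = len(self._data)
            self._data += record
            return offset

    def read(self, offset: int, length: int) -> bytes:
        with self._lock:
            return bytes(self._data[offset:offset + length])


def produce(f: AppendOnlyFile, name: str, count: int) -> List[Tuple[int, bytes]]:
    """Append `count` records and remember where each one landed."""
    written = []
    for i in range(count):
        record = f"{name}:{i};".encode()
        written.append((f.record_append(record), record))
    return written


log = AppendOnlyFile()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(produce, log, f"p{n}", 100) for n in range(4)]
    written = [item for fut in futures for item in fut.result()]

# Every record is intact at the offset returned to its producer.
print(all(log.read(off, len(rec)) == rec for off, rec in written))
```

Because producers never choose offsets themselves, they need no coordination with each other beyond the append call.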

Page 8: Google File System

Motive to build

Latency vs. High Sustained Bandwidth:
› Clients don’t have a tight SLA on read and write response times; instead they care more about processing and moving bulk data at a high rate.

Page 9: Google File System

System Interface

GFS provides an interface to:
› Create
› Delete
› Open
› Close
› Read
› Write
› Snapshot (Copy)
› Record Append

Page 10: Google File System

System Components

The system is organized into clusters. Each cluster has the following components:
› A single cluster master
› Multiple chunk servers
› Multiple clients (system environment)

Page 11: Google File System

System Components

Files are divided into fixed-size chunks. The chunk size is 64 MB.

The master assigns a 64-bit identifier, called a chunk handle, to each chunk upon creation.

Chunk servers store chunks on local disk.

For reliability, each chunk is replicated across multiple chunk servers.
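
With a fixed chunk size, mapping a byte offset in a file to a chunk is simple arithmetic. The sketch below shows it; the `locate` and `chunks_needed` helper names are illustrative assumptions, and only the 64 MB chunk size comes from the slide.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size


def locate(byte_offset: int) -> Tuple[int, int]:
    """Return (chunk index, offset inside that chunk) for a file byte offset."""
    return divmod(byte_offset, CHUNK_SIZE)


def chunks_needed(file_size: int) -> int:
    """Number of fixed-size chunks required to hold file_size bytes."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


print(locate(CHUNK_SIZE + 5))          # (1, 5)
print(chunks_needed(3 * 1024 ** 3))    # 48 chunks for a 3 GB file
```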

Page 12: Google File System

System Components

The master maintains the file system metadata and controls system-wide activities:
› Operations on file and chunk namespaces.
› Mapping between files and chunks.
› Current location of chunks.
› Chunk lease management.
› Garbage collection.
› Chunk migration between chunk servers.
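
The two mappings listed above (files to chunks, chunks to locations) can be pictured as a pair of dictionaries. This is only a sketch under stated assumptions: `MasterMetadata` and its methods are hypothetical, the server names are placeholders, and handing out handles from a counter is an assumption, since the slides do not say how the master generates them.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Set, Tuple


@dataclass
class MasterMetadata:
    # File path -> ordered list of chunk handles.
    file_to_chunks: Dict[str, List[int]] = field(default_factory=dict)
    # Chunk handle -> chunk servers currently holding a replica.
    chunk_locations: Dict[int, Set[str]] = field(default_factory=dict)
    # Assumption: handles come from a simple counter.
    _next_handle: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def create_file(self, path: str) -> None:
        if path in self.file_to_chunks:
            raise FileExistsError(path)
        self.file_to_chunks[path] = []

    def add_chunk(self, path: str, servers: Iterable[str]) -> int:
        """Create a new chunk for the file and record where its replicas live."""
        handle = next(self._next_handle) & (2 ** 64 - 1)  # keep within 64 bits
        self.file_to_chunks[path].append(handle)
        self.chunk_locations[handle] = set(servers)
        return handle

    def lookup(self, path: str, chunk_index: int) -> Tuple[int, Set[str]]:
        """Return the chunk handle and replica locations for one chunk of a file."""
        handle = self.file_to_chunks[path][chunk_index]
        return handle, self.chunk_locations[handle]


master = MasterMetadata()
master.create_file("/logs/a")
handle = master.add_chunk("/logs/a", ["cs1", "cs2", "cs3"])
print(master.lookup("/logs/a", 0))
```

A client that knows the file name and the chunk index needs only this lookup from the master before it talks to a chunk server directly.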

Page 13: Google File System

System Architecture

Page 14: Google File System

System Interaction (Reads)

Page 15: Google File System

System Interaction (Mutation and Leases with Data/Control separation)

Page 16: Google File System

Master Operations

› Namespace management and locking.
› Replica placement.
› Replica creation, re-replication, and rebalancing.
› Garbage collection.
› Stale replica detection.

Page 17: Google File System

Questions

Q?