hota hadoop
File Systems for Cloud Computing
Chittaranjan Hota, PhD
Faculty Incharge, Information Processing Division
Birla Institute of Technology & Science-Pilani, Hyderabad Campus
Jawahar Nagar, Shameerpet, Ranga Reddy District, Hyderabad, AP, India
16th March 2013
Computer Sc Dept., Utkal University, Vani Vihar, Bhubaneswar
Growth of the Internet
Source: Cisco VNI Global Forecast, 2011-2016
Source: Internet world stats
Golden era in Computing
Cloud Futures 2011, Redmond
Cloud computing: Is it a hype?
from $41 billion in 2011 to $241 billion in 2020
Scaling up…
SETI
What is Cloud Computing?
Files
• Permanent storage
• Information sharing
• Files have data and attributes
What Distributed File System Provides
• Provides access to data stored at servers using file system interfaces
• What are the file system interfaces?
  o Open a file, check status on a file, close a file
  o Read data from a file
  o Write data to a file
  o Lock a file or part of a file
  o List files in a directory, delete a directory
  o Delete a file, rename a file, add a symbolic link to a file, etc.
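The interface operations listed above can be tried on a local file system with Python's standard `os` module. This is only a sketch of the interface a distributed file system has to reproduce, not of any particular DFS; the function name `demo_file_interfaces` and the file names are made up for illustration, and locking is left out because the calls for it are platform-specific.

```python
import os


def demo_file_interfaces(root):
    """Run the listed file operations inside a scratch directory under `root`."""
    workdir = os.path.join(root, "data")
    os.mkdir(workdir)
    path = os.path.join(workdir, "notes.txt")

    # Open a file, write data to it, close it.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    os.write(fd, b"hello dfs")
    os.close(fd)

    # Check status on a file.
    size = os.stat(path).st_size

    # Read data from a file.
    with open(path, "rb") as f:
        data = f.read()

    # Rename a file, then add a symbolic link to it.
    renamed = os.path.join(workdir, "renamed.txt")
    os.rename(path, renamed)
    link = os.path.join(workdir, "link.txt")
    os.symlink(renamed, link)

    # List files in a directory.
    listing = sorted(os.listdir(workdir))

    # Delete the link and the file, then delete the (now empty) directory.
    os.remove(link)
    os.remove(renamed)
    os.rmdir(workdir)
    return size, data, listing
```

Calling `demo_file_interfaces(tempfile.mkdtemp())` runs the whole sequence in a throwaway directory. A DFS client offers the same calls, but each one may turn into a request to a remote server.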
DFS Design Issues
• Mounting
• Caching
• Hints
• Bulk Data Transfer
• Replica management
• Writing policies
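NFS architecture
[Figure: NFS architecture. Client computer: application programs make UNIX system calls into the UNIX kernel, where the virtual file system sends operations on local files to the UNIX file system and operations on remote files to the NFS client. The NFS client reaches the NFS server over the network using RPC (for remote operations). Server computer: inside the UNIX kernel, the NFS server goes through the virtual file system to the UNIX file system. The figure also carries a "PC DOS" label.]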
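The routing shown in the NFS architecture figure, where the virtual file system sends local operations to the UNIX file system and remote operations to the NFS client, can be sketched in a few lines. Every class name here is hypothetical, the "file systems" are plain dictionaries, and the RPC is replaced by a direct method call, so this only illustrates the dispatch idea and not the NFS protocol.

```python
class UnixFileSystem:
    """Stand-in for a local UNIX file system: a dict of path -> bytes."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def read(self, path):
        return self.files[path]


class NfsServer:
    """Server side: answers requests from its own local file system."""

    def __init__(self, fs):
        self.fs = fs

    def handle(self, op, path):
        if op == "read":
            return self.fs.read(path)
        raise ValueError(f"unsupported operation: {op}")


class NfsClient:
    """Client side: forwards operations to the server (an RPC in real NFS)."""

    def __init__(self, server):
        self.server = server

    def read(self, path):
        return self.server.handle("read", path)


class VirtualFileSystem:
    """Routes each path to the local file system or to a mounted NFS client."""

    def __init__(self, local, mounts):
        self.local = local
        self.mounts = mounts  # mount point -> NfsClient

    def read(self, path):
        for mount_point, client in self.mounts.items():
            if path.startswith(mount_point + "/"):
                # Strip the mount point to get the path as the server sees it.
                return client.read(path[len(mount_point):])
        return self.local.read(path)
```

With a local file `/etc/hosts` and a server exporting `/paper.txt` mounted at `/mnt/nfs`, reading `/mnt/nfs/paper.txt` goes through the NFS client while `/etc/hosts` stays local; the application sees one namespace either way.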
Google File System
Metadata: namespace, access control, mapping of files to chunks, and current location of chunks
[Figure: Google File System architecture, with interaction steps numbered 1 to 4]
HDFS Design
• Files stored as blocks
  o Default 64MB
• Reliability through replication
  o Replicated across 3+ DataNodes
• Single NameNode coordinates access, metadata
  o Centralized management
• No data caching
  o Little benefit due to large data sets, streaming reads
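To make the block and replication figures above concrete, the arithmetic can be sketched as follows. The 64MB block size comes from the slide; the replication factor of 3 is taken as the lower end of the slide's "3+", and the function names are invented for this example.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 64 * MIB  # default block size quoted on the slide
REPLICATION = 3        # assumption: lower end of the slide's "3+"


def block_sizes(file_size, block_size=BLOCK_SIZE):
    """Split a file of `file_size` bytes into block lengths; the last may be short."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across the cluster when every block is replicated."""
    return file_size * replication
```

Under these assumptions a 200 MiB file becomes three full 64 MiB blocks plus one 8 MiB block, and occupies 600 MiB of raw cluster storage.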
Commodity Hardware
HDFS Architecture
[Figure: an HDFS-aware application uses both the POSIX API and the HDFS API. The POSIX API leads to the regular VFS with local and NFS-supported files and its specific drivers. The HDFS API leads to a separate HDFS view, which goes through the network stack to the HDFS NameNode and the HDFS DataNodes.]
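HDFS Architecture
[Figure: clients send metadata ops to the Namenode, which holds the metadata (name, replicas, …). Clients read and write blocks directly on the Datanodes in Rack1 and Rack2. The Namenode sends block ops to the Datanodes, and blocks are replicated between Datanodes.]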
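HDFS File Read
[Figure: on the client node, the HDFS client uses DistributedFileSystem and FSDataInputStream to talk to the NameNode and three DataNodes.]
1: open (HDFS client to DistributedFileSystem)
2: get block location (DistributedFileSystem to NameNode)
3: read (HDFS client to FSDataInputStream)
4: read (FSDataInputStream to first DataNode)
5: read (FSDataInputStream to next DataNode)
6: close (HDFS client to FSDataInputStream)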
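The read sequence above (open, get block location, read from DataNodes, close) can be mimicked with a small in-memory model. This is a sketch and not the Hadoop client API: the classes `NameNode` and `DataNode` and the function `read_file` are hypothetical, and picking the first listed replica is an assumption standing in for "closest replica".

```python
class DataNode:
    """Holds block contents keyed by block id."""

    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self, file_blocks, locations):
        self.file_blocks = file_blocks  # path -> ordered list of block ids
        self.locations = locations      # block id -> list of DataNode replicas

    def get_block_locations(self, path):
        return [(b, self.locations[b]) for b in self.file_blocks[path]]


def read_file(namenode, path):
    """Follow the numbered steps and return the file bytes plus a trace."""
    trace = ["1: open", "2: get block location"]
    located = namenode.get_block_locations(path)
    trace.append("3: read")
    data = b""
    for block_id, replicas in located:
        datanode = replicas[0]  # assumption: first replica listed is the closest
        data += datanode.read_block(block_id)
        trace.append(f"read {block_id} from {datanode.name}")
    trace.append("6: close")
    return data, trace
```

The point of the model is that file data never passes through the NameNode: it only returns block locations, and the bytes come straight from the DataNodes.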
Hadoop Clusters
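Rack Awareness
[Figure: network distance in a tree of data centers (d1, d2), racks (r1, r2) and nodes (n1, n2). Distances shown from node n1: d=0 (same node), d=2 (another node on the same rack), d=4 (a node on another rack in the same data center), d=6 (a node in another data center).]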
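The distances in the rack awareness figure (d=0, 2, 4, 6) follow from treating the cluster as a tree and counting the hops from each node up to the closest common ancestor. A minimal sketch, assuming nodes are named by paths of the form `/datacenter/rack/node` (the path format and the function name are choices made for this example):

```python
def distance(a, b):
    """Tree distance between two nodes named like '/d1/r1/n1'."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    # Count how many leading path components the two nodes share.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops from each node up to the closest common ancestor.
    return (len(pa) - common) + (len(pb) - common)
```

For example, `distance("/d1/r1/n1", "/d1/r1/n2")` is 2 because both nodes are one hop below the same rack, while two nodes in different data centers are three hops each from the root, giving 6.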
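HDFS Write
[Figure: on the client node, the HDFS client uses DistributedFileSystem and FSDataOutputStream to talk to the NameNode and a pipeline of three DataNodes.]
1: create (HDFS client to DistributedFileSystem)
2: create (DistributedFileSystem to NameNode)
3: write (HDFS client to FSDataOutputStream)
4: write packet (FSDataOutputStream to the first DataNode, then forwarded along the pipeline)
5: ack packet (returned back along the pipeline to FSDataOutputStream)
6: close (HDFS client to FSDataOutputStream)
7: complete (DistributedFileSystem to NameNode)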
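The "write packet" and "ack packet" steps of the write pipeline can be sketched with a toy model in which each DataNode stores a packet, forwards it downstream, and returns an acknowledgement listing every node that stored it. The names `DataNode.write_packet` and `write_file` are hypothetical, the packet size is arbitrary, and the model is synchronous for simplicity, whereas a real client keeps several packets in flight.

```python
class DataNode:
    """Stores packets and forwards them to the rest of the pipeline."""

    def __init__(self, name):
        self.name = name
        self.packets = []

    def write_packet(self, packet, downstream):
        # Step 4: store the packet locally, then pass it on.
        self.packets.append(packet)
        acks = []
        if downstream:
            acks = downstream[0].write_packet(packet, downstream[1:])
        # Step 5: the ack travels back, listing every node that stored it.
        return [self.name] + acks


def write_file(data, pipeline, packet_size=4):
    """Send `data` through the pipeline packet by packet; return packet count."""
    expected = [dn.name for dn in pipeline]
    packets = [data[i:i + packet_size] for i in range(0, len(data), packet_size)]
    for packet in packets:
        acked = pipeline[0].write_packet(packet, pipeline[1:])
        if acked != expected:
            raise IOError("packet was not acknowledged by the whole pipeline")
    return len(packets)
```

The client only sends each packet once, to the first DataNode; replication to the other nodes happens inside the pipeline, which keeps the client's outgoing bandwidth independent of the replication factor.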
Replica Placement
[Figure: nodes grouped into racks within a data center]
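The slide gives only the figure, so the following is a sketch of the commonly described Hadoop default for three replicas rather than something stated in the deck: the first replica goes on the writer's node, the second on a node in a different rack, and the third on a different node in the same rack as the second. The function name and the topology format (rack name mapped to node names) are assumptions for this example, and it assumes at least one other rack with two or more nodes exists.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three (rack, node) targets for a block written from `writer`.

    writer:   (rack, node) of the client, assumed to be a DataNode itself
    topology: dict mapping rack name -> list of node names
    """
    writer_rack, writer_node = writer
    first = (writer_rack, writer_node)

    # Second and third replicas share one remote rack but sit on different nodes.
    remote_racks = sorted(
        rack for rack, nodes in topology.items()
        if rack != writer_rack and len(nodes) >= 2
    )
    remote_rack = rng.choice(remote_racks)
    second_node, third_node = rng.sample(topology[remote_rack], 2)
    return [first, (remote_rack, second_node), (remote_rack, third_node)]
```

Placing two of the three replicas on one remote rack trades a little fault tolerance for less cross-rack write traffic, while still surviving the loss of an entire rack.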
Computational Grids
[Source: IBM TJ Watson Research Center]
Load Distribution
Map/Reduce
SLURM
Crowd Sourcing
Foxtrot: Associating audio with locations
Allen Telescope Array
Search for Extra Terrestrial Intelligence
Thank You!