Scaling HDFS with a Strongly Consistent Relational Model for Metadata
Kamal Hakimzadeh, Hooman Peiro Sajjad, Jim Dowling
(mahh, shps, jdowling)@kth.se
DAIS 2014
I-node File Systems
Kamal Hakimzadeh, DAIS 2014
Figure: a file set of i-nodes, each holding file info and pointers to the file's blocks.
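To make the i-node layout concrete, here is a minimal Python sketch. All class and field names are hypothetical and are not taken from any real file system. Each i-node stores file info plus pointers (block ids) to the blocks that hold the data, and reading a file means following those pointers.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: int
    data: bytes


@dataclass
class Inode:
    # File info plus pointers (block ids) to the blocks holding the data.
    inode_id: int
    name: str
    size: int = 0
    block_ids: list = field(default_factory=list)


class FileSet:
    """Toy file set: i-nodes and blocks are kept in separate tables."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # tiny on purpose, to force several blocks
        self.inodes = {}  # inode_id -> Inode
        self.blocks = {}  # block_id -> Block
        self._next_block_id = 0

    def write(self, inode, data):
        """Split `data` into blocks and record their ids in the i-node."""
        self.inodes[inode.inode_id] = inode
        for off in range(0, len(data), self.block_size):
            block = Block(self._next_block_id, data[off:off + self.block_size])
            self._next_block_id += 1
            self.blocks[block.block_id] = block
            inode.block_ids.append(block.block_id)
        inode.size += len(data)

    def read(self, inode_id):
        """Follow the i-node's block pointers and concatenate the blocks."""
        inode = self.inodes[inode_id]
        return b"".join(self.blocks[b].data for b in inode.block_ids)
```

For example, writing the 11 bytes `b"hello world"` with 4-byte blocks yields three blocks. The i-node then points to block ids 0, 1 and 2.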
Hadoop Distributed File System (HDFS)
Figure: the NameNode (NN) holds the i-nodes (file info and pointers); the blocks are stored on DataNodes (DN) running on commodity machines.
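The split between the NameNode and the DataNodes can be sketched as follows. This is a toy model under stated assumptions, not HDFS's actual code. The classes and method names are hypothetical, and replicas are placed round-robin, whereas real HDFS uses rack-aware placement and a write pipeline. The sketch shows that the NN holds only metadata (the namespace and the block-to-DataNode map), while file bytes move directly between the client and the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


class NameNode:
    """Toy NN: keeps only metadata, never file contents."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.namespace = {}  # path -> list of block ids (the i-node's pointers)
        self.block_map = {}  # block id -> names of the DNs holding a replica
        self._next_block_id = 0

    def add_block(self, path):
        """Allocate a block for `path` and choose DNs for its replicas."""
        block_id = self._next_block_id
        self._next_block_id += 1
        n = len(self.datanodes)
        # Hypothetical round-robin placement, for illustration only.
        targets = [self.datanodes[(block_id + i) % n]
                   for i in range(min(self.replication, n))]
        self.namespace.setdefault(path, []).append(block_id)
        self.block_map[block_id] = [dn.name for dn in targets]
        return block_id, targets

    def get_block_locations(self, path):
        return [(b, self.block_map[b]) for b in self.namespace[path]]


def write_file(nn, path, chunks):
    for chunk in chunks:
        block_id, targets = nn.add_block(path)
        for dn in targets:  # the client sends the bytes to the DNs, not the NN
            dn.blocks[block_id] = chunk


def read_file(nn, path):
    by_name = {dn.name: dn for dn in nn.datanodes}
    data = b""
    for block_id, locations in nn.get_block_locations(path):
        data += by_name[locations[0]].blocks[block_id]  # read the first replica
    return data
```

With four DataNodes and three replicas, the first block lands on dn0, dn1 and dn2, and the second lands on dn1, dn2 and dn3. The NN's `block_map` is all it needs to direct a read.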
High Availability in HDFS 2.0
Figure: DataNodes (DN); an active NN, a standby NN and a checkpoint NN; journal nodes (JN); ZooKeeper (ZK) nodes. The shared NN log is stored in a quorum of journal nodes.
• Master-slave replication of NN state
• Agreement on the active master
• Faster recovery, cut journal log
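The quorum rule behind the shared NN log can be sketched in a few lines. This simplified sketch assumes that an edit counts as durable once a majority of journal nodes has acknowledged it. The class and function names are hypothetical, and epochs, fencing and the standby's log tailing are left out.

```python
class JournalNode:
    """Hypothetical JN that appends edits to a local log while it is up."""

    def __init__(self, up=True):
        self.up = up
        self.log = []  # list of (txid, edit)

    def append(self, txid, edit):
        if not self.up:
            return False  # a crashed JN sends no acknowledgement
        self.log.append((txid, edit))
        return True


def quorum_write(journal_nodes, txid, edit):
    """Send the edit to every JN; it is durable once a majority has acked."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, edit))
    return acks > len(journal_nodes) // 2
```

With three JNs, as in the figure, the log remains writable while any one JN is down. This is why journal nodes are normally deployed in odd numbers.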
NameNode Limitations and Tradeoffs
1. 60 GB JVM heap for the NN (workarounds: compression, larger blocks)
   • 100 M files ≈ 10 PB
   • 65 M files ≈ 21 PB
2. Operation reordering on failures: eventually consistent
3. Single-writer concurrency model: poor throughput
4. HA consensus overhead
Move Metadata into a Distributed Database
Figure: DataNodes (DN) and stateless NNs backed by NDB (MySQL Cluster, up to 48 nodes).
• Distributed, replicated, in-memory database
• Transaction support
• Read-committed isolation level
• Row-level locks
• 17.6 M tx/sec
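With the metadata in a database, a stateless NN runs each metadata operation as one transaction. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for NDB; the schema and the function names are hypothetical. SQLite locks the whole database rather than individual rows, so the sketch shows only the transaction-per-operation structure, not NDB's row-level locking.

```python
import sqlite3

ROOT_ID = 1


def open_db():
    # Autocommit mode, so every transaction is delimited explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE inodes ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " is_dir INTEGER NOT NULL,"
        " UNIQUE (parent_id, name))"
    )
    conn.execute("INSERT INTO inodes VALUES (?, 0, '', 1)", (ROOT_ID,))  # root
    return conn


def resolve(conn, path):
    """Walk the path one component at a time; return the inode id or None."""
    inode_id = ROOT_ID
    for part in (p for p in path.split("/") if p):
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, part),
        ).fetchone()
        if row is None:
            return None
        inode_id = row[0]
    return inode_id


def create(conn, path, is_dir=False):
    """One metadata operation = one transaction; the NN itself keeps no state."""
    parent_path, _, name = path.rstrip("/").rpartition("/")
    conn.execute("BEGIN IMMEDIATE")
    try:
        parent_id = resolve(conn, parent_path)
        if parent_id is None:
            raise FileNotFoundError(parent_path)
        cur = conn.execute(
            "INSERT INTO inodes (parent_id, name, is_dir) VALUES (?, ?, ?)",
            (parent_id, name, int(is_dir)),
        )
        conn.execute("COMMIT")
        return cur.lastrowid
    except Exception:
        conn.execute("ROLLBACK")  # a failed operation leaves no partial update
        raise
```

Because the NN holds no state between calls, any NN can serve any operation once the state lives in the database. A failed operation, such as creating a duplicate name, rolls back and leaves the metadata unchanged.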
Metadata Consistency
Objective: Strongly Consistent Metadata
1. One transaction per metadata operation
2. Read-committed isolation level
3. Row-level locking
Serializable isolation level ≈ strongly consistent model
HDFS uses a system-level lock, i.e., a single-writer concurrency model
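The difference between a system-level lock and row-level locks can be expressed as a conflict check. In this sketch, an operation is a map from metadata rows to lock modes; the function names and the rows each example operation locks are illustrative assumptions. Under a single namespace-wide read/write lock, any writer blocks every other operation. With row-level locks, two operations conflict only when they touch a common row and at least one of them holds that row exclusively.

```python
SHARED, EXCLUSIVE = "S", "X"


def compatible(mode_a, mode_b):
    """Two locks on the same item can be held together only if both are shared."""
    return mode_a == SHARED and mode_b == SHARED


def strongest(op):
    return EXCLUSIVE if EXCLUSIVE in op.values() else SHARED


def conflicts_system_lock(op_a, op_b):
    """One namespace-wide read/write lock: any writer excludes every other op."""
    return not compatible(strongest(op_a), strongest(op_b))


def conflicts_row_locks(op_a, op_b):
    """Row-level locks: ops conflict only on a common row one of them writes."""
    common = op_a.keys() & op_b.keys()
    return any(not compatible(op_a[row], op_b[row]) for row in common)


# Hypothetical lock sets: creating files in two different directories,
# and reading a file in one of them.
create_x = {"/": SHARED, "/a": EXCLUSIVE, "/a/x": EXCLUSIVE}
create_y = {"/": SHARED, "/b": EXCLUSIVE, "/b/y": EXCLUSIVE}
read_x = {"/": SHARED, "/a": SHARED, "/a/x": SHARED}
```

Here the two creates serialize under the system-level lock but can run in parallel with row-level locks, because they share only the root row, in shared mode.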
HDFS Metadata
Order of Locks in the DAG of Metadata
Metadata operations:
1. Path operations
2. Block operations
3. Lease operations
Conflicting lock order → total-order locking
Locking issues:
1. Range queries
2. Semantically related objects
3. Lock upgrade
Solutions: implicit sub-tree lock; strongest required lock
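Total-order locking and taking the strongest required lock up front can be sketched as a lock-planning step. The table ranks, row keys and function names below are hypothetical. Every operation first merges duplicate requests for a row into the strongest mode, so no S-to-X upgrade is needed later. It then acquires its locks in one global order, with ancestors before descendants in the inode tree, so two operations can never wait on each other in a cycle. `implicitly_locked` illustrates the sub-tree idea: a lock on a directory covers everything beneath it. The sketch only computes the plan; acquiring the locks is left to the database.

```python
SHARED, EXCLUSIVE = "S", "X"
STRENGTH = {SHARED: 0, EXCLUSIVE: 1}
# Hypothetical global order over tables; any fixed order avoids deadlock
# as long as every operation follows it.
TABLE_RANK = {"inode": 0, "block": 1, "lease": 2}


def path_key(path):
    """'/a/f' -> ('a', 'f'); tuple order puts ancestors before descendants."""
    return tuple(p for p in path.split("/") if p)


def lock_plan(requests):
    """Merge duplicate requests to the strongest mode, then sort totally."""
    mode_for = {}
    for table, key, mode in requests:
        row = (table, key)
        # Take the strongest lock up front instead of upgrading S -> X later.
        if row not in mode_for or STRENGTH[mode] > STRENGTH[mode_for[row]]:
            mode_for[row] = mode
    ordered = sorted(mode_for, key=lambda row: (TABLE_RANK[row[0]], row[1]))
    return [(table, key, mode_for[(table, key)]) for table, key in ordered]


def implicitly_locked(path, locked_subtrees):
    """A lock on a directory implicitly covers every path beneath it."""
    parts = path_key(path)
    return any(parts[:len(root)] == root for root in map(path_key, locked_subtrees))


# Hypothetical requests for appending to /a/f, in arbitrary order.
requests = [
    ("inode", path_key("/a/f"), SHARED),
    ("block", 7, EXCLUSIVE),
    ("inode", path_key("/"), SHARED),
    ("inode", path_key("/a"), SHARED),
    ("inode", path_key("/a/f"), EXCLUSIVE),
    ("lease", "client-1", EXCLUSIVE),
]
```

Whatever order the requests arrive in, `lock_plan(requests)` locks the root, `/a` and then `/a/f` (exclusively, since one request needs X), followed by the block and the lease.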
Scale of Capacity
48-node NDB cluster: 12 TB
• NDB: 3 TB, replication factor 2
• File: 2 blocks, 3 replicas
HDFS: 100 M files; our solution: 4.1 B files
Factor of 40
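The capacity figures can be checked with a little arithmetic. The file counts and the 2-block, 3-replica file come from the slide. Treating the 3 TB as the space available for metadata, in decimal units, is an assumption made only for this estimate.

```python
HDFS_MAX_FILES = 100_000_000     # "HDFS: 100M files"
OUR_MAX_FILES = 4_100_000_000    # "Our Solution: 4.1B files"
BLOCKS_PER_FILE = 2              # "File: 2 blocks"
REPLICAS_PER_BLOCK = 3           # "3 replicas"
# Assumption: the "NDB: 3 TB" figure is the metadata budget, in decimal TB.
NDB_METADATA_BYTES = 3 * 10**12

scale_factor = OUR_MAX_FILES / HDFS_MAX_FILES             # 41.0, i.e. "factor of 40"
replicas_per_file = BLOCKS_PER_FILE * REPLICAS_PER_BLOCK  # 6 block replicas to track
bytes_per_file = NDB_METADATA_BYTES / OUR_MAX_FILES       # about 732 bytes per file

print(f"scale factor: {scale_factor:.0f}x")
print(f"block replicas per file: {replicas_per_file}")
print(f"metadata budget per file: {bytes_per_file:.0f} bytes")
```

Under that assumption, each file, including the metadata for its six block replicas, would have to fit in roughly 730 bytes.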
Row-level lock throughput impact
Figure panels: open operation (shared lock); create operation (exclusive lock)
Improvement: Snapshotting
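One possible reading of "snapshotting" is the following: a transaction reads every row it needs in one batch at the start, works on that in-memory copy, and writes its changes back in one batch at commit. Assuming that reading, the idea can be sketched as below. The store and class names are hypothetical, and the store does nothing except count round trips to the database.

```python
class CountingStore:
    """Hypothetical metadata store that counts round trips to the database."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def batch_get(self, keys):
        self.round_trips += 1
        return {k: self.rows[k] for k in keys}

    def batch_put(self, updates):
        self.round_trips += 1
        self.rows.update(updates)


class SnapshotTx:
    """Reads all needed rows once, works on the copy, writes back at commit."""

    def __init__(self, store, keys):
        self.store = store
        self.snapshot = store.batch_get(keys)  # one round trip for every row
        self.dirty = {}

    def read(self, key):
        # Rows not declared up front raise KeyError: the set must be known in advance.
        return self.dirty[key] if key in self.dirty else self.snapshot[key]

    def write(self, key, value):
        self.dirty[key] = value  # buffered in memory until commit

    def commit(self):
        if self.dirty:
            self.store.batch_put(self.dirty)  # one round trip for all updates
```

Reading three rows and updating one costs two round trips here, instead of four if each access went to the database separately.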