ramcloud: a low-latency datacenter storage system ankita kejriwal stanford university (joint work...
TRANSCRIPT
![Page 1: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/1.jpg)
RAMCloud: A Low-Latency Datacenter Storage System
Ankita Kejriwal
Stanford University
(Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble,
Mendel Rosenblum and John Ousterhout)
![Page 2: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/2.jpg)
… a Storage System that provides:
● Scale Data size: 10 PB Accessible by 100,000 nodes (10 Million cores)
● Uniform fast random access time to all data 100 B read: 2 µs RPC 100 B write: 5 µs RPC
● Durable and available
What if you had…
February 07, 2013 RAMCloud Slide 2
![Page 3: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/3.jpg)
● General-purpose storage system
● All data always in DRAM
● Scale: 1000 – 10000 servers, 1 PB data
● Performance goals: High throughput: 1M ops/sec/server Low-latency access: 5-10µs RPC
● Durable and available
● Potential impact: enable new class of applications Primary motivation: Web sphere Maybe HPC?
RAMCloud
February 07, 2013 RAMCloud Slide 3
![Page 4: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/4.jpg)
DRAM in Storage Systems
1970 1980 1990 2000 2010
UNIX buffercache
Main-memorydatabases
Large filecaches
Web indexesentirely in DRAM
memcached
Facebook:200 TB total data
150 TB cache!
Main-memoryDBs, again
February 07, 2013 RAMCloud Slide 4
![Page 5: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/5.jpg)
DRAM in Storage Systems
● DRAM usage specialized/limited● Clumsy (consistency with
backing store)● Lost performance (backing
store, cache misses)
1970 1980 1990 2000 2010
UNIX buffercache
Main-memorydatabases
Large filecaches
Web indexesentirely in DRAM
memcached
Facebook:200 TB total data
150 TB cache!
Main-memoryDBs, again
February 07, 2013 RAMCloud Slide 5
![Page 6: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/6.jpg)
RAMCloud Slide 6
DRAM is cheaper!from "Andersen et al., "FAWN: A Fast Array of Wimpy Nodes",Proc. 22nd Symposium on Operating System Principles, 2009, pp. 1-14.
0.1 1 10 100 10000.1
1
10
100
1000
10000
Query Rate (millions/sec)
Da
tase
t S
ize
(T
B)
Disk
Flash
DRAM
February 07, 2013
Lowest TCO
![Page 7: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/7.jpg)
Why Does Latency Matter?
● Large-scale apps struggle with high latency Random access data rate has not scaled! Facebook: can only make 100-150 internal requests per page
UI
App.Logic
DataStructures
Traditional Application
<< 1µs latency 0.5-10ms latency
Single machine
February 07, 2013 RAMCloud Slide 7
UI
App.Logic
App
licat
ion
Ser
vers S
torage Servers
Web Application
Datacenter
![Page 8: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/8.jpg)
RAMCloud Goal: Scale and Latency
● Enable new class of applications
Traditional Application
<< 1µs latency 0.5-10ms latency5-10µs
UI
App.Logic
DataStructures
Single machine
February 07, 2013 RAMCloud Slide 8
UI
App.Logic
App
licat
ion
Ser
vers S
torage Servers
Web Application
Datacenter
![Page 9: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/9.jpg)
RAMCloud Architecture
Master
Backup
Master
Backup
Master
Backup
Master
Backup…
Appl.
Library
Appl.
Library
Appl.
Library
Appl.
Library…
DatacenterNetwork Coordinator
1000 – 10,000 Storage Servers
1000 – 100,000 Application Servers
CommodityServers
DRAM32-256 GBper server
High-speed networking:● 5 µs round-trip● Full bisection
bandwidth
February 07, 2013 RAMCloud Slide 9
![Page 10: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/10.jpg)
read(tableId, key) => blob, version
write(tableId, key, blob) => version
cwrite(tableId, key, blob, version) => version
delete(tableId, key)
enumerate(tableId)
Data Model: Key-Value StoreTables
Key (≤ 64KB)Version (64b)
Blob (≤ 1MB)
Object
February 07, 2013 RAMCloud Slide 10
Richer model in the future:• Indexes?• Transactions?• Graphs?
key | value
key | value
key | value
key | value
key | value
key | value
key | value
key | value
key | value
key | value
key | value
![Page 11: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/11.jpg)
● Goals: No impact on performance Minimum cost, energy
● Keep replicas in DRAM of other servers? 3x system cost, energy Still have to handle power failures
● RAMCloud approach: 1 copy in DRAM Backup copies on disk/flash: durability ~ free!
● Issues to resolve: Synchronous disk I/O’s during writes?? Data unavailable after crashes??
Durability and Availability
February 07, 2013 RAMCloud Slide 11
![Page 12: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/12.jpg)
● No disk I/O during write requests
● Log-structured: backup disks and master’s memory
● Log cleaning
Buffered Logging
Disk
Backup
Buffered Segment
Disk
Backup
Buffered Segment
Master
Disk
Backup
Buffered Segment
In-Memory Log
HashTable
Write request
February 07, 2013 RAMCloud Slide 12
![Page 13: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/13.jpg)
● Server crashes: Must replay log to reconstruct data
● Crash recovery: Choose recovery master Backup reads log info from disk Transfers logs to recovery master Recovery master replays log
● Meanwhile, data is unavailable
● RAMCloud approach: fast crash recovery 1-2 seconds for 100 GB of data Use system scale to get around bottlenecks
Crash Recovery
February 07, 2013 RAMCloud Slide 13
RecoveryMaster
Backups
DeadMaster
![Page 14: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/14.jpg)
● Scatter backup data across backups
● Divide each master’s data into partitions Recover each partition on a separate recovery master Each backup divides its log data among recovery masters
Fast Crash Recovery
RecoveryMasters
Backups
DeadMaster
February 07, 2013 RAMCloud Slide 14
![Page 15: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/15.jpg)
● Goal: build production-quality implementation
● Nearing 1.0-level release
● Current test cluster: 80 servers, 2 TB data High speed Infiniband networking Performance:
● 100 B read: 5.3 µs RPC● 100 B write: 15 µs RPC
● Interested in finding applications for RAMCloud
RAMCloud Project Status
February 07, 2013 RAMCloud Slide 15
![Page 16: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/16.jpg)
Properties of RAMCloud relevant to application developers:
● Durability and availability
● Key-value store
● Commodity hardware
● Read / write access latency
● Random access to small objects
Is RAMCloud right for HPC apps?
February 07, 2013 RAMCloud Slide 16
![Page 17: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/17.jpg)
● General-purpose storage system
● All data always in DRAM
● Designed for: Scale: 1000 – 10000 servers, 1 PB data Performance: 5-10µs RPC
● Durable and available
Conclusion
February 07, 2013 RAMCloud Slide 17
![Page 18: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/18.jpg)
● Is RAMCloud appropriate for HPC Applications? Durability and availability Key-value store Commodity hardware Read / write access latency Random access to small objects
● One thing that we could change to make RAMCloud interesting to you!
Questions
February 07, 2013 RAMCloud Slide 18
![Page 19: RAMCloud: A Low-Latency Datacenter Storage System Ankita Kejriwal Stanford University (Joint work with Diego Ongaro, Ryan Stutsman, Steve Rumble, Mendel](https://reader036.vdocuments.site/reader036/viewer/2022083004/56649e5e5503460f94b57913/html5/thumbnails/19.jpg)
Thank you!