larger-than-memory data management on modern storage ... · •apache geode – overflow tables...
TRANSCRIPT
![Page 1: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/1.jpg)
Larger-than-Memory Data Managementon Modern Storage Hardware
for In-Memory OLTP Database SystemsLin Ma, Joy Arulraj, Sam Zhao, Andrew Pavlo, Subramanya R. Dulloor,
Michael J. Giardino, Jeff Parkhurst, Jason L. Gardner, Kshitij Doshi,Col. Stanley Zdonik
![Page 2: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/2.jpg)
2
MOTIVATION
•Allow an in-memory DBMS to store/access data on disk without bringing back all the slow parts of a disk-oriented DBMS.
•Different properties of storage devices may affect important design decisions.
![Page 3: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/3.jpg)
3
STORAGE TECHNOLOGIES
• 10m Tuples – 1KB each• Synchronization Enabled
HDD SMR SSD 3D XPoint NVRAM DRAM0.01
1100
100001000000
1KB Read 1KB Write 64KB Read 64KB Write
Late
ncy
(nan
osec
)
random sequential
![Page 4: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/4.jpg)
4
DESIGN DECISIONS
• Hardware independent policies–Cold Tuple Identification–Evicted Tuple Meta-data
• Hardware dependent policies–Cold Tuple Retrieval–Merging Threshold–Access Methods
![Page 5: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/5.jpg)
5
HARDWARE INDEPENDENT POLICIES
![Page 6: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/6.jpg)
6
INDEPENDENT POLICIES
•Cold Tuple Identification–Option #1: On-line identification–Option #2: Off-line identification
● Evicted Tuple Meta-data–Option #1: Marker to represent the on-disk position–Option #2: Bloom filter–Option #3: Rely on virtual paging
![Page 7: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/7.jpg)
7
EVICTED TUPLE META-DATAIn-Memory Table Heap
Tuple #01
Tuple #03
Tuple #04
Tuple #00
Tuple #02
Cold-Data Storage
header
Tuple #01
Tuple #03
Tuple #04
In-Memory Index
Access FrequencyTuple #00Tuple #01Tuple #02Tuple #03Tuple #04Tuple #05
<Block,Offset>
<Block,Offset>
<Block,Offset>
IndexBloom Filter
![Page 8: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/8.jpg)
8
HARDWARE DEPENDENT POLICIES
![Page 9: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/9.jpg)
9
Transaction
COLD TUPLE RETRIEVALIn-Memory Table Heap
Tuple #00
Tuple #02
Cold-Data Storage
header
Tuple #03
Tuple #04
● Option #1:Abort-and-Restart
Tuple #01Tuple #01
Tuple #00Read:
?Read: Tuple #01Read:
Tuple #02Read:
Abort
![Page 10: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/10.jpg)
10
Transaction
COLD TUPLE RETRIEVALIn-Memory Table Heap
Tuple #00
Tuple #02
Cold-Data Storage
header
Tuple #03
Tuple #04
● Option #2:Synchronous Retrieval
Tuple #01Tuple #01
Tuple #00Read:
?Read: Tuple #01Read:
Tuple #02Read:
Stall
![Page 11: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/11.jpg)
11
• Option #1: Always Merge
• Option #2:Merge Only on Update
• Option #3: Selective Merge
Transaction
MERGING THRESHOLDIn-Memory Table Heap
Tuple #00
Tuple #02
Cold-Data Storage
header
Tuple #03
Tuple #04
Tuple #01
Tuple #00Read:
Tuple #01Read:
Tuple #02Read:
Sketch
![Page 12: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/12.jpg)
12
ACCESS METHODS
• Option #1: Block-addressable–Block-level access through file system
• Option #2: Byte-addressable (NVRAM)–Use mmap through a filesystem designed for byte-addressable
NVRAM (PMFS)–Directly operate on NVRAM-resident data as if it existed in
DRAM
![Page 13: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/13.jpg)
13
EVALUATION
• Compare design decisions in H-Store with anti-caching.
• Storage Devices:–Hard-Disk Drive (HDD)–Shingled Magnetic Recording Drive (SMR)–Solid-State Drive (SSD)–3D XPoint (3DX)–Non-volatile Memory (NVRAM)
![Page 14: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/14.jpg)
14
COLD TUPLE RETRIEVAL
• YCSB Workload – 90% Reads / 10% Writes• 10GB Database using 1.25GB Memory
HDD SMR SDD 3DX NVM0
50000
100000
150000
200000
Abort and Restart Synchronous Retrieval
Thro
ughp
ut (t
xn/s
ec)
large blocks small blocks
![Page 15: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/15.jpg)
15
MERGING THRESHOLD
• YCSB Workload – 90% Reads / 10% Writes• 10GB Database using 1.25GB Memory
HDD (AR) HDD (SR)0
50000
100000
150000
200000
Merge (Update-Only) Merge (Top-5%)Merge (Top-20%) Merge (All)
Thro
ughp
ut (t
xn/s
ec)
Improvement:● SMR: 30%● SSD: 17%● 3DX: 21%● NVRAM: 10%
![Page 16: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/16.jpg)
17
CONFIGURATION COMPARISON
• Generic Configuration (2013 Anti-caching)–Abort-and-Restart Retrieval–Merge (All) Threshold–1024 KB Block Size
• Optimized Configuration–Synchronous Retrieval–Top-5% Merge Threshold–Block Sizes (HDD/SMR-1024 KB) (SSD/3DX-16 KB)–Byte-addressable access for NVRAM
![Page 17: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/17.jpg)
18
GENERIC VS OPTIMIZED
HDD SMR SSD 3D XPoint NVRAM0
100000
200000
Generic Optimized
VOTER
Txn
/ se
c
HDD SMR SSD 3D XPoint NVRAM0
200000
400000
TATP
Txn
/ se
c
DRAM
![Page 18: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/18.jpg)
19
CONCLUSION
• Low-latency storage devices: Smaller block sizes and synchronous retrieval
• Constraints on merge frequency improve performance
• The performance of NVRAM is as good as pure DRAM if treated correctly
![Page 20: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/20.jpg)
21
REAL-WORLD IMPLEMENTATIONS
•H-Store – Anti-Caching•Microsoft Hekaton –Project Siberia• EPFL’s VoltDB Prototype•Apache Geode – Overflow Tables•MemSQL – Columnar Tables• SolidDB•P*TIME
![Page 21: Larger-than-Memory Data Management on Modern Storage ... · •Apache Geode – Overflow Tables •MemSQL – Columnar Tables •SolidDB •P*TIME. 22 MERGING THRESHOLD • YCSB Workload](https://reader033.vdocuments.site/reader033/viewer/2022042711/5f752692c192c53aea7e0a64/html5/thumbnails/21.jpg)
22
MERGING THRESHOLD
• YCSB Workload – 90% Reads / 10% Writes• 10GB Database using 1.25GB Memory