Presentation file for SYSTOR 2013
An Empirical Study of Hot/Cold Data Separation Policies in Solid State Drives (SSDs)
Jongsung Lee and Jin-Soo Kim
Sungkyunkwan University
South Korea
Flash memory
• Flash characteristics
  • No overwrite
• Flash Translation Layer
  • Out-of-place update
  • Garbage collection
• Write amplification factor (WAF)
  • Performance metric of SSDs
  • $\mathrm{WAF} = \dfrac{\text{Actual amount of data written to flash}}{\text{Amount of data written by the host}}$
  • A lower WAF means better SSD performance
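To make the WAF formula concrete, here is a small Python sketch. The page counts are hypothetical numbers chosen for illustration, not measurements from this study; `write_amplification_factor` is a hypothetical helper name.

```python
def write_amplification_factor(flash_bytes: int, host_bytes: int) -> float:
    """WAF = data actually written to flash / data written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


# Hypothetical example: the host writes 100 pages and garbage collection
# copies 25 valid pages, so flash receives 125 page writes in total.
host_pages = 100
gc_copied_pages = 25
waf = write_amplification_factor(host_pages + gc_copied_pages, host_pages)
print(waf)  # 1.25
```

Every page copied during GC adds to the numerator only, which is why reducing GC copies directly lowers WAF.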
[Figure: Garbage Collection Procedure. A victim block (labels: Block, Page, Victim) is reclaimed in two steps: (1) valid page copy, (2) erase.]
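The two-step GC procedure (valid page copy, then erase) can be sketched in Python as follows. This assumes a greedy victim-selection rule (the block with the fewest valid pages), which the slide does not specify; `Block` and `garbage_collect` are hypothetical names, and capacity checks on the destination block are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    # Each entry is the LPN stored in that page; None marks an invalid page.
    lpns: List[Optional[int]] = field(default_factory=list)

    def valid_lpns(self) -> List[int]:
        return [lpn for lpn in self.lpns if lpn is not None]


def garbage_collect(blocks: List[Block], free: Block) -> int:
    """Reclaim one block and return the number of valid pages copied."""
    # Victim selection (assumed greedy): the block with the fewest valid pages.
    victim = min(blocks, key=lambda b: len(b.valid_lpns()))
    # Step 1, valid page copy: move still-live pages into a free block.
    moved = victim.valid_lpns()
    free.lpns.extend(moved)
    # Step 2, erase: the victim becomes a clean block again.
    victim.lpns.clear()
    return len(moved)


blocks = [Block([1, None, None, 2]), Block([3, 4, None, 5])]
free = Block()
copied = garbage_collect(blocks, free)
print(copied, free.lpns)  # 2 [1, 2]
```

Each copied page is an extra flash write that the host never issued, so the fewer valid pages a victim holds, the lower the resulting WAF.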
Hot/Cold Data Separation
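[Figure: Page layout with no separation vs. with separation. Legend: clean page, cold page, hot page, invalid page.]
• No separation: valid (cold) pages in a victim block must be copied before erasing, so cold pages are frequently copied during GC
• With separation: reduces the number of pages copied during GC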
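The no-separation vs. with-separation comparison can be reproduced with a toy count in Python. The block layouts and LPN names below are hypothetical: each hot LPN is overwritten once (invalidating its old page), and the sketch reports how many valid pages the cheapest victim block would still need to copy.

```python
from typing import List, Set


def valid_pages(block: List[str], invalidated: Set[str]) -> int:
    """Pages in the block that still hold live data and must be copied before erase."""
    return sum(1 for lpn in block if lpn not in invalidated)


def cheapest_victim_cost(blocks: List[List[str]], invalidated: Set[str]) -> int:
    """Greedy GC: pick the block with the fewest valid pages; return copies needed."""
    return min(valid_pages(b, invalidated) for b in blocks)


hot = {"h0", "h1", "h2", "h3"}
# Hypothetical layout without separation: hot and cold LPNs are interleaved.
mixed = [["h0", "c0", "h1", "c1"], ["h2", "c2", "h3", "c3"]]
# Hypothetical layout with separation: hot and cold LPNs go to different blocks.
separated = [["h0", "h1", "h2", "h3"], ["c0", "c1", "c2", "c3"]]

# Every hot LPN has been overwritten once, so its old page is invalid.
print(cheapest_victim_cost(mixed, hot))      # 2
print(cheapest_victim_cost(separated, hot))  # 0
```

Without separation every block still carries live cold pages, so any victim costs copies; with separation the hot block becomes fully invalid and can be erased for free.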
Motivation
• List of target policies
  • 2-Level LRU
    • L.-P. Chang and T.-W. Kuo. An adaptive striping architecture for flash memory storage systems of embedded systems. (RTAS '02)
  • Multiple Bloom Filter
    • D. Park. Hot data identification for flash-based storage systems using multiple bloom filters. (MSST '11)
  • Dynamic dAta Clustering
    • M.-L. Chiang, P. C. H. Lee, and R.-C. Chang. Using data clustering to improve cleaning performance for flash memory. (Software: Practice & Experience '99)
• Evaluate hot/cold separation policies
  • Under fair conditions on a real SSD platform
Related Works
• 2-Level LRU (LRU)
  • Advantages: simple design
  • Disadvantages: fixed size of each list; long latency for list searching
[Figure: Two lists, a hot list and a candidate list, each ordered from MRU to LRU; transitions are labeled "Write miss", "Write hit", and "Full".]
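The slide does not spell out the exact list transitions, so the following Python sketch assumes one plausible reading of the figure: a write miss inserts the LPN at the MRU end of the candidate list; a write hit in the candidate list promotes it to the hot list; a hit in the hot list refreshes it; when a list is full, its LRU entry is evicted (the hot list demotes into the candidate list). `TwoLevelLRU` and its parameters are hypothetical names.

```python
from collections import OrderedDict


class TwoLevelLRU:
    """Hot-data identifier with a hot list and a candidate list (sketch).

    In both OrderedDicts the first key is the LRU end and the last key is MRU.
    """

    def __init__(self, hot_size: int, cand_size: int) -> None:
        self.hot_size = hot_size
        self.cand_size = cand_size
        self.hot: "OrderedDict[int, None]" = OrderedDict()
        self.cand: "OrderedDict[int, None]" = OrderedDict()

    def _add_candidate(self, lpn: int) -> None:
        self.cand[lpn] = None                  # insert at the MRU end
        if len(self.cand) > self.cand_size:    # candidate list full
            self.cand.popitem(last=False)      # drop its LRU entry

    def write(self, lpn: int) -> bool:
        """Record a write to lpn; return True if lpn is now classified as hot."""
        if lpn in self.hot:                    # write hit in the hot list
            self.hot.move_to_end(lpn)
            return True
        if lpn in self.cand:                   # write hit in the candidate list
            del self.cand[lpn]
            self.hot[lpn] = None               # promote to the hot list MRU
            if len(self.hot) > self.hot_size:  # hot list full
                demoted, _ = self.hot.popitem(last=False)
                self._add_candidate(demoted)   # demote its LRU entry
            return True
        self._add_candidate(lpn)               # write miss
        return False


lru = TwoLevelLRU(hot_size=2, cand_size=2)
print([lru.write(lpn) for lpn in [1, 1, 2, 3, 1]])  # [False, True, False, False, True]
```

The dictionary lookups here are O(1); the "long latency for list searching" disadvantage applies to implementations that scan the lists entry by entry, which this sketch does not model.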
Related Works
• Multiple Bloom Filter (MBF)
  • Advantages: small memory consumption
  • Disadvantages: fixed parameters (e.g., number of filters, filter size, decaying period)
[Figure: (1) Calculate the hash values with Hash 1 and Hash 2; (2) set the corresponding bits in the current filter.]
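The slide only shows two steps (hash, then set bits in the current filter). The Python sketch below fills in the rest with our own simplifying assumptions, not necessarily the MBF paper's exact rules: writes set bits only in the current filter; every `decay_period` writes, the next filter in round-robin order is cleared and becomes current; an LPN counts as hot if at least `hot_threshold` filters contain it. The class name, SHA-256-based hashing, and all parameter values are hypothetical.

```python
import hashlib
from typing import Iterator, List


class MultipleBloomFilter:
    """Simplified multiple-Bloom-filter hot-data identifier (sketch)."""

    def __init__(self, num_filters: int = 4, filter_bits: int = 1024,
                 num_hashes: int = 2, decay_period: int = 512,
                 hot_threshold: int = 2) -> None:
        self.filter_bits = filter_bits
        self.num_hashes = num_hashes
        self.decay_period = decay_period
        self.hot_threshold = hot_threshold
        # One byte per bit for readability; real firmware would pack bits.
        self.filters: List[bytearray] = [bytearray(filter_bits) for _ in range(num_filters)]
        self.current = 0
        self.writes = 0

    def _positions(self, lpn: int) -> Iterator[int]:
        # Step 1: calculate one bit position per hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{lpn}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.filter_bits

    def write(self, lpn: int) -> None:
        # Step 2: set the corresponding bits in the current filter.
        for pos in self._positions(lpn):
            self.filters[self.current][pos] = 1
        self.writes += 1
        if self.writes % self.decay_period == 0:
            # Decay: advance round-robin and clear the oldest filter for reuse.
            self.current = (self.current + 1) % len(self.filters)
            self.filters[self.current] = bytearray(self.filter_bits)

    def is_hot(self, lpn: int) -> bool:
        positions = list(self._positions(lpn))
        hits = sum(all(f[p] for p in positions) for f in self.filters)
        return hits >= self.hot_threshold


mbf = MultipleBloomFilter(decay_period=2)
for _ in range(4):
    mbf.write(7)
print(mbf.is_hot(7))  # True: LPN 7 is recorded in two filters
```

Because `num_filters`, `filter_bits`, `decay_period`, and `hot_threshold` are fixed at construction, this sketch also illustrates the "fixed parameters" disadvantage listed above.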
Related Works
• Dynamic dAta Clustering (DAC)
  • Advantages: small memory consumption; low calculation overhead
  • Disadvantages: the optimal number of regions depends on the workload pattern
[Figure: Four regions (Region 0 to Region 3); writes move data from one region to the next, and garbage collection moves data in the opposite direction.]
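Following the figure, in which writes and garbage collection move data between regions in opposite directions, this Python sketch tracks a region per LPN. The direction and starting point are assumptions: regions are numbered from 0 (coldest) to N-1 (hottest), each overwrite promotes an LPN one region hotter, each GC copy demotes it one region colder, and new LPNs start in region 0. `DynamicDataClustering` is a hypothetical name.

```python
from typing import Dict


class DynamicDataClustering:
    """DAC-style region tracking (sketch under the assumptions stated above)."""

    def __init__(self, num_regions: int = 4) -> None:
        self.num_regions = num_regions
        self.region: Dict[int, int] = {}

    def on_write(self, lpn: int) -> int:
        """Host write: promote an existing LPN one region hotter (capped)."""
        if lpn in self.region:
            self.region[lpn] = min(self.region[lpn] + 1, self.num_regions - 1)
        else:
            self.region[lpn] = 0  # first write lands in the coldest region
        return self.region[lpn]

    def on_gc_copy(self, lpn: int) -> int:
        """Valid page copied during GC: demote it one region colder (floored)."""
        self.region[lpn] = max(self.region[lpn] - 1, 0)
        return self.region[lpn]


dac = DynamicDataClustering(num_regions=4)
print([dac.on_write(5) for _ in range(5)])  # [0, 1, 2, 3, 3]
print(dac.on_gc_copy(5))                    # 2
```

The returned region would select which region's active block receives the page. Only one small integer per LPN is kept, matching the low memory and calculation overhead listed above, while `num_regions` is exactly the workload-dependent knob named as the disadvantage.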
Evaluations
Device Information
• Jasmine OpenSSD Platform
  • Runs as a normal SATA drive
  • Programmable
• Specification
  • ARM7TDMI-S core
  • 64 MB SDRAM
  • 8 NAND module slots
• Configuration
  • Total capacity: 32 GB
  • Clustered block size: 4 MB
  • Clustered page size: 32 KB
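A quick Python check of the geometry this configuration implies, assuming binary units (the slide does not say which units are meant):

```python
# Assumption: binary units (1 KB = 2**10 bytes, 1 MB = 2**20, 1 GB = 2**30).
KB, MB, GB = 2**10, 2**20, 2**30

total_capacity = 32 * GB
block_size = 4 * MB    # clustered block
page_size = 32 * KB    # clustered page

num_blocks = total_capacity // block_size    # 8192 clustered blocks
pages_per_block = block_size // page_size    # 128 clustered pages per block
total_pages = num_blocks * pages_per_block   # 1,048,576 clustered pages
print(num_blocks, pages_per_block, total_pages)
```

So each GC victim holds up to 128 clustered pages, and every valid one among them must be copied before the block can be erased.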
Synthetic Workloads
• SkewX (Skew70, Skew90, Skew95, Skew99)
  • X% of writes are concentrated on (100-X)% of the area
• SkewInc
  • Skew rate changes: 70% → 90% → 95% → 99%
• SkewDec
  • Skew rate changes: 99% → 95% → 90% → 70%
[Figure: Write pattern, with X% of writes directed to (100-X)% of the area]
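The exact workload generator is not described here; the Python sketch below is one hypothetical way to produce such patterns. It assumes the hot area sits at the start of the LPN space and that LPNs are drawn uniformly within each area; `skew_writes` and `skew_phases` are hypothetical names.

```python
import random
from typing import Iterator, Sequence


def skew_writes(x: int, num_lpns: int, n: int, rng: random.Random) -> Iterator[int]:
    """Yield n LPNs where x% of writes hit the first (100 - x)% of the LPN space."""
    # Size of the hot area; kept within [1, num_lpns - 1] so both areas are non-empty.
    hot_lpns = min(max(1, num_lpns * (100 - x) // 100), num_lpns - 1)
    for _ in range(n):
        if rng.random() < x / 100:
            yield rng.randrange(hot_lpns)            # hot area, chosen uniformly
        else:
            yield rng.randrange(hot_lpns, num_lpns)  # cold area, chosen uniformly


def skew_phases(rates: Sequence[int], num_lpns: int, n_per_phase: int,
                seed: int = 0) -> Iterator[int]:
    """SkewInc / SkewDec: concatenate phases whose skew rate changes over time."""
    rng = random.Random(seed)
    for x in rates:
        yield from skew_writes(x, num_lpns, n_per_phase, rng)


skew_inc = list(skew_phases([70, 90, 95, 99], num_lpns=1000, n_per_phase=10_000))
skew_dec = list(skew_phases([99, 95, 90, 70], num_lpns=1000, n_per_phase=10_000))
```

A fixed seed keeps runs reproducible, so every policy under test can be fed the identical write sequence.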
Real Workloads
• Financial
  • Collected from OLTP (On-Line Transaction Processing) applications running at a financial institution
• Web search
  • Collected while surfing the web during one day
• General
  • Running an office suite, downloading and playing MP3 files, and playing movies over five days
• TPC-C
  • Gathered from a commercial DBMS while running the TPC-C benchmark for three hours
Performance – Synthetic workloads
• Hot/cold separation policies are quite effective
• Oracle shows the best performance in most cases
• Oracle correctly separates hot and cold data, but this does not mean that it minimizes WAF
• DAC reduces WAF by up to 73% (46% on average)
Performance – Real workloads
• DAC also works well in most cases
• Hot/cold separation does not work well for the TPC-C workload, because the LPNs written in TPC-C are uniformly distributed
Conclusion & Future Work
• Hot/cold data separation is effective in most cases
• DAC reduces WAF by up to 74% in synthetic workloads and up to 58% in real workloads
• Run more diverse workloads
• Develop a brand-new hot/cold separation policy
Thank you!