1
The Five-Minute Rule 20 Years Later (And How Flash Memory Changes the Rules)
Goetz Graefe
Presented by Abhinav Parate
2
Storage Hierarchy
FLASH
3
Comparing Flash with Disks
4
When should we increase main memory?
• Metrics to decide:
  – Cost of infrastructure
  – Cost of maintenance
  – Mean time to failure
  – Performance improvement
• Simplest answer: increase RAM size if it is insufficient to hold the frequently accessed data items
• What access interval counts as "frequent"?
5
Cost of accessing a data item
• A disc provides N accesses per second and costs $D
• DA = D/N: cost of one disc access per second
• M: cost of 1 byte of main memory
• I: expected interval, in seconds, before the same data item is accessed again
• B: size of the data item in bytes
6
Cost of accessing a data item
• Number of accesses per second for data item = 1/I
• Cost if item is accessed from disc = DA/I
• Cost if item is available in memory = M * B
• Keep the data item in memory if the main memory cost is less than the disc access cost
• M * B < DA / I
• I < DA / (M * B)
• For a 1 KB data item, I < 400 s, roughly 5 minutes, at 1987 costs
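The break-even inequality can be checked numerically. The sketch below is a minimal illustration, not the authors' code; the function names are hypothetical, and the price figures ($2000 per disc access per second, $5 per KB of memory) are assumed values chosen to reproduce the 400 s figure on the slide.

```python
def break_even_interval(disk_access_cost, memory_cost_per_byte, item_bytes):
    """Return I = DA / (M * B), the break-even re-access interval in seconds."""
    return disk_access_cost / (memory_cost_per_byte * item_bytes)


def keep_in_memory(interval_seconds, disk_access_cost, memory_cost_per_byte, item_bytes):
    """True if an item re-accessed every `interval_seconds` is cheaper to hold in RAM."""
    limit = break_even_interval(disk_access_cost, memory_cost_per_byte, item_bytes)
    return interval_seconds < limit


# Assumed, illustrative 1987-style prices (not taken from the slides).
DA_1987 = 2000.0      # dollars per disc access per second
M_1987 = 5.0 / 1024   # dollars per byte of main memory

print(break_even_interval(DA_1987, M_1987, 1024))  # break-even for a 1 KB item
print(keep_in_memory(60, DA_1987, M_1987, 1024))   # re-accessed every minute
```

Because B sits in the denominator, larger items have a shorter break-even interval: a bigger page must be accessed more often to justify its place in memory.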
7
The Five-Minute Rule
• In 1987: keep a 1 KB data item in main memory if it is accessed again within 5 minutes
• In 1967, the frequent period was 0.5 s
• For 2007, the authors predicted a five-hour rule
• At actual 2007 prices, the period turned out to be a little under 6 hours
8
Sample Case
• A database consists of 500,000 records of 1000 bytes each
• Peak load is 600 transactions per second
• Only 6% of the data receives 96% of the accesses and is re-accessed within 5 minutes
• This 6% of the data resides in main memory
• The remaining data is accessed via two hard disks to support a 1-second access time
• The design saved $3.5 million at 1987 costs compared with an entirely main-memory design
9
Back to Present
• Technology changed
• Multiple cores
• Virtualization
• Size of data increased tremendously
• Gap between RAM and disk performance increased
• Flash memory comes into the picture!
10
Flash Memory Characteristics
• Purchase cost
• Access latency
• Bandwidth
• Density
• Power consumption
• Cooling costs
• Every one of these lies between RAM and rotating hard disks!
11
Comparison: Flash and Disks
12
Desirability of Flash Memory
• Disk I/O is increasingly the bottleneck: the number of CPU instructions that can execute during one disk I/O is steadily increasing
• A faster intermediate memory in the storage hierarchy is highly desirable
13
Limitations of Flash Memory
• Write bandwidth is lower than read bandwidth
• Rewriting a block requires erasing the entire block
• Reliability: 100,000 to 1M erase-and-write cycles
• Requires a wear-levelling mechanism
• Requires an agent to erase blocks as soon as they are written to hard disk
14
The presentation ahead ...
• Key challenges in using flash memory
• Addressing the challenges
• Lots of open questions
• Implications for greening the computing infrastructure
15
#1: Which hardware interface to use?
• Use DIMM?
• Use Serial ATA?
• Use a new hardware interface?
• Defining and developing a new hardware interface is a time-consuming exercise
• Use one of the existing interfaces
16
#2: Use as Buffer or Persistent Storage?
• Database systems are concerned with providing consistency
• Databases have a large number of small updates and must maintain recovery logs
• Logs must be written to persistent storage quickly
• Use flash as persistent storage!
17
#2: Use as Buffer or Persistent Storage?
• File systems manipulate file contents in memory and write each file to disk in its entirety
• Consistency is achieved via careful write ordering, quick write-back and expensive file-system checks
• Page movement between flash and disks is expensive if flash is treated as persistent storage
• Use flash memory as a buffer pool!
18
#3: How to track Frequent Pages?
• In current systems, frequent pages are estimated and administered through LRU
• Maintain two LRU chains in RAM
19
Least Recently Used Chain
• LRU for RAM
• LRU for flash memory
T(N) T(N-1) T(1)
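The two LRU chains can be sketched as below. This is one possible reading of the slide, not the paper's design: the class name, the capacities, and the policy that a page evicted from the RAM chain moves to the head of the flash chain are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoLevelLRU:
    """Hypothetical sketch: one LRU chain for RAM pages, one for flash pages."""

    def __init__(self, ram_pages, flash_pages):
        self.ram_capacity = ram_pages
        self.flash_capacity = flash_pages
        self.ram = OrderedDict()    # least recently used entry comes first
        self.flash = OrderedDict()  # same ordering for the flash chain

    def access(self, page_id):
        """Touch a page and report where it was found: 'ram', 'flash' or 'disk'."""
        if page_id in self.ram:
            self.ram.move_to_end(page_id)  # now the most recently used
            return "ram"
        found = "flash" if page_id in self.flash else "disk"
        self.flash.pop(page_id, None)      # page is promoted to RAM
        self.ram[page_id] = True
        if len(self.ram) > self.ram_capacity:
            victim, _ = self.ram.popitem(last=False)  # RAM's LRU page
            self.flash[victim] = True                 # demote it to flash
            if len(self.flash) > self.flash_capacity:
                self.flash.popitem(last=False)        # falls back to disk only
        return found


cache = TwoLevelLRU(ram_pages=2, flash_pages=2)
for page in (1, 2, 3, 1):
    print(page, cache.access(page))
```

Both chains hold only page identifiers, so they fit in RAM even when the flash layer they describe is much larger.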
20
#4: How to decide size of RAM and Flash?
• Use Five-Minute Rule!
21
#5: How to move pages among layers in hierarchy?
• RAM and flash
  – DMA transfer
• Flash and disk
  – DMA (hardware)
  – Transfer buffer in RAM (software)
22
#6: How to track Page Locations?
• File systems
  – Maintain pointer pages
  – A pointer points to a data page or a run of contiguous data pages
  – Moving an individual page may require breaking up a run and updating pointer pages
23
#6: How to track Page Locations?
• Database systems
  – Use B-tree indexes
  – Other kinds of indexes have been implemented efficiently on B-trees
  – Page movement requires updating pointers in the parent node and neighbors
24
Benefits to Database Systems
• Checkpoint processing
  – Provides consistency in databases
  – Writes dirty pages to persistent storage
  – Persistent flash storage is faster
  – Pages need to be written to disk only if the page-replacement policy requires it
• Recovery logs
  – Quick writes
25
Benefits to Database Systems
• Query processing
  – Index-based selection is faster
  – Need to consider index-based query plans
  – Index joins and intersections
• Example:
• Table scan of 100M rows: 100 s
• A disk-based index fetches 10K rows in 100 s
• A table scan is more efficient if the result has more than 10K rows
• A flash-based index scan fetches 500K rows in the same time!
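The break-even point in this example follows from simple arithmetic. The sketch below assumes that index fetch time grows linearly with the number of rows and that the 500K figure refers to the same 100 s window; the function names are hypothetical.

```python
def break_even_rows(scan_seconds, index_rows_per_second):
    """Result size at which an index plan takes as long as a full table scan."""
    return scan_seconds * index_rows_per_second


def cheaper_plan(result_rows, scan_seconds, index_rows_per_second):
    """Pick 'index' or 'scan' under a linear cost model for index fetches."""
    index_seconds = result_rows / index_rows_per_second
    return "index" if index_seconds < scan_seconds else "scan"


SCAN_SECONDS = 100              # table scan of 100M rows, from the slide
DISK_INDEX_RATE = 10_000 / 100  # 10K rows in 100 s
FLASH_INDEX_RATE = 500_000 / 100  # 500K rows in 100 s (assumed window)

for rate in (DISK_INDEX_RATE, FLASH_INDEX_RATE):
    print(break_even_rows(SCAN_SECONDS, rate),
          cheaper_plan(50_000, SCAN_SECONDS, rate))
```

Under these assumptions a query returning 50,000 rows switches from a table scan to an index plan once the index lives on flash, which is why the optimizer needs to consider index-based plans more often.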
26
Problem of Optimal B-tree Page Size
• Two different optimal page sizes
27
Implications for Green Computing
• This work's focus is infrastructure cost
• Energy optimization may lead to different optimal page sizes for B-trees
• Infrastructure cost optimization can lead to a significant reduction in RAM size and hence lower energy consumption
• Introduces a large flash memory into the system
28
Implications for Green Computing
• Let P_flash be the power consumption with flash memory
• Let P_noflash be the power consumption without flash
• Let T_flash and T_noflash denote system throughput with and without flash
• The system is green if
  – P_flash / P_noflash < 1
  – T_flash / T_noflash > 1
29
Implications for Green Computing
• What if P_flash / P_noflash > 1?
• In this case, the system is green if
  – T_flash / T_noflash > P_flash / P_noflash
  – That is, the gain in throughput is higher than the extra power spent
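The two cases can be combined into one check. This is a sketch of the criterion as stated on the slides; the function name is hypothetical, and treating a power ratio of exactly 1 like the "more power" case is an assumption, since the slides do not cover it.

```python
def is_green(p_flash, p_noflash, t_flash, t_noflash):
    """Apply the slides' two-case test for whether adding flash is 'green'."""
    power_ratio = p_flash / p_noflash
    throughput_ratio = t_flash / t_noflash
    if power_ratio < 1:
        # Less power with flash: green as long as throughput also improves.
        return throughput_ratio > 1
    # Same or more power with flash: throughput must grow faster than power.
    return throughput_ratio > power_ratio


print(is_green(p_flash=80, p_noflash=100, t_flash=120, t_noflash=100))
print(is_green(p_flash=120, p_noflash=100, t_flash=110, t_noflash=100))
```

The second case amounts to requiring that throughput per unit of power improves when flash is added.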
30
Some calculations
• Assume a linear relation between the number of frequently accessed pages and the frequent period
• If M is the RAM used in the no-flash system
  – M/15 is the RAM in the flash-based system
  – 4M is the flash memory
• P_flash = (M/15) × p_ram + 4M × p_flash
• P_noflash = M × p_ram
• P_flash < P_noflash if p_flash < (14/60) × p_ram
• The relationship holds true.
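The 14/60 threshold follows from dividing both sides of P_flash < P_noflash by M and solving for the flash term. The sketch below checks this with exact arithmetic; the function names are hypothetical, the lowercase arguments are assumed to be power per unit of RAM and of flash, and the sample values are arbitrary illustrations rather than measured figures.

```python
from fractions import Fraction


def power_with_flash(m, p_ram, p_flash):
    """Total power with M/15 units of RAM plus 4M units of flash."""
    return Fraction(m) / 15 * p_ram + 4 * Fraction(m) * p_flash


def power_without_flash(m, p_ram):
    """Total power with M units of RAM and no flash."""
    return Fraction(m) * p_ram


# (M/15) p_ram + 4M p_flash < M p_ram  <=>  p_flash < (14/60) p_ram
THRESHOLD = Fraction(14, 60)

m, p_ram = 15, Fraction(60)                        # arbitrary illustrative units
at_threshold = THRESHOLD * p_ram                   # flash power exactly at the limit
print(power_with_flash(m, p_ram, at_threshold) == power_without_flash(m, p_ram))
print(power_with_flash(m, p_ram, at_threshold - 1) < power_without_flash(m, p_ram))
```

At exactly 14/60 of the RAM figure the two configurations draw equal power, so any lower per-unit flash power makes the flash-based system the cheaper one to run.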
31
Conclusions
• A faster intermediate memory in the storage hierarchy is desirable
• Database systems are likely to benefit a lot
• The benefits for file systems are less clear
• Flash can improve system throughput and reduce power consumption
• Reduction in RAM usage can lead to significant power savings
32
Thank You!