Flikker: Saving DRAM Refresh-power through Critical Data Partitioning
Song Liu, Karthik Pattabiraman, Thomas Moscibroda, Benjamin G. Zorn
Motivation: Smartphones
Smartphones becoming ubiquitous
Using more DRAM
Responsiveness is important
Refreshing DRAM can drain the battery even when idle
Motivation: DRAM Refresh
[Figure: error rate and power vs. refresh cycle [s]. Marked points: "Where we are today" (64 ms) and "Where we want to be" (X s). Annotations: "The opportunity", "The cost".]
If software is able to tolerate errors, we can lower DRAM refresh rates to achieve considerable power savings
Flikker: Approach
Critical / non-critical data partitioning
Critical data: important for application correctness, e.g., meta-data, key data structures
◦ Kept in the high-refresh part of Flikker DRAM (no errors)
Non-critical data: does not substantially impact application correctness, e.g., multimedia data, soft state
◦ Kept in the low-refresh part of Flikker DRAM (some errors)
Mobile applications have substantial amounts of non-critical data that can be easily identified by application developers
Contributions of Flikker
Flikker is the first software technique to intentionally lower memory reliability for energy savings (with minimal hardware modification)
Flikker exposes errors in the DRAM to the application, and handles these errors by leveraging the inherent error resilience of the software
Flikker allows the programmer to specify the reliability of different data based on software requirements
Flikker achieves over 20% overall DRAM power reduction with negligible loss of performance and reliability
Outline
Flikker DRAM and software framework
Experimental results
Future work
Conclusions
Partial Array Self Refresh (PASR)
Self-refresh: low power, keep the data
PASR: only refresh part of the memory array, configured among discrete levels [Samsung], [Micron]
Cons: less DRAM available in idle periods
Flikker Hardware
Divide memory bank into a high refresh part and a low refresh part
Size of high-refresh portion can be configured at runtime
Small modification of the Partial Array Self-Refresh (PASR) mode
[Figure: Flikker DRAM bank split into a high refresh part and a low refresh part, with the boundary configurable at ⅛, ¼, ½, ¾, or 1 of the bank]
DRAM Error Rate
[Figure from [Bhalodia, Master Thesis, 2005]: error rate vs. refresh cycle [s]. Marked point: 1 s refresh cycle, error rate 4×10⁻⁸.]
Flikker Software
[Figure: three layers labeled Programmer, Allocator, and Operating System. Critical and non-critical objects are placed on critical and non-critical virtual pages, which map to physical pages in the high refresh rows and low refresh rows of Flikker DRAM.]
Minor changes to the memory allocator and the Operating System (OS)
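The allocator side of this partitioning can be illustrated with a toy model. This is only a sketch under stated assumptions, not the allocator used in Flikker: the class names, the `critical` flag, and the 4096-byte page size are all hypothetical. It shows the one property the slide implies, namely that critical and non-critical objects never share a page, so that each page can later be mapped to high refresh or low refresh rows as a whole.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


@dataclass
class PagePool:
    """Bump allocator over pages that all share one criticality class."""
    critical: bool
    used: list = field(default_factory=list)  # bytes used on each page

    def alloc(self, size: int) -> tuple:
        """Return (page_index, offset) for an object of `size` bytes."""
        if size > PAGE_SIZE:
            raise ValueError("sketch only handles objects up to one page")
        if not self.used or self.used[-1] + size > PAGE_SIZE:
            self.used.append(0)  # open a fresh page of this class
        offset = self.used[-1]
        self.used[-1] += size
        return len(self.used) - 1, offset


class PartitioningAllocator:
    """Keeps critical and non-critical objects on disjoint pages."""

    def __init__(self):
        self.pools = {True: PagePool(True), False: PagePool(False)}

    def alloc(self, size: int, critical: bool = True) -> tuple:
        # Defaulting to critical is a conservative choice made for this sketch.
        page, offset = self.pools[critical].alloc(size)
        return critical, page, offset


heap = PartitioningAllocator()
meta = heap.alloc(64, critical=True)       # e.g., a key data structure
frame = heap.alloc(3000, critical=False)   # e.g., multimedia data
frame2 = heap.alloc(3000, critical=False)  # does not fit, opens a new page
```

Each returned tuple is (critical, page index within that class, offset), so the two classes have independent page numbering in this model.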
Applications
mpeg2 decoder
c4 (connect 4, four-in-a-row)
rayshade (ray-traced images)
vpr (SA based optimization)
parser
Experiment Setup
Performance (architectural simulator)
◦ Impact of data partitioning
Overall DRAM power (simulator, model)
◦ Active power, Idle power
◦ Usage profile (95% idle, 5% active) [Karlson et al., Pervasive’09]
Fault injection simulation (Pin)
◦ Simulate a self-refresh period, and inject error afterward
<0.5%
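The idea of injecting errors after a simulated self-refresh period can be sketched as follows. This is not the Pin-based tool from the experiments; it is a minimal illustration that assumes errors are independent per-bit flips at a given rate and that only the non-critical buffer is exposed to them. The function name, the buffers, and the deliberately exaggerated rate of 1e-3 are all hypothetical (real DRAM cells may also decay in one direction only, which this sketch ignores).

```python
import random


def inject_bit_flips(buf: bytearray, bit_error_rate: float, rng: random.Random) -> int:
    """Flip each bit of `buf` independently with probability `bit_error_rate`.

    Returns the number of bits that were flipped.
    """
    flipped = 0
    for i in range(len(buf)):
        for bit in range(8):
            if rng.random() < bit_error_rate:
                buf[i] ^= 1 << bit
                flipped += 1
    return flipped


rng = random.Random(0)           # fixed seed so runs are repeatable
critical = bytearray(b"header")  # stays in high refresh rows: never touched
non_critical = bytearray(1024)   # all-zero buffer in low refresh rows

# Simulate the end of one self-refresh period: corrupt non-critical data only.
flips = inject_bit_flips(non_critical, 1e-3, rng)
```

After the injection, the application would run on the corrupted buffer and its output would be classified by comparing against a fault-free run.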
Fault-injection Simulations
[Figure: memory segments (code, stack, global, heap) marked critical or non-critical under five configurations: baseline, ideal, aggressive, conservative, crazy. Annotations: "custom allocator", "compiler support".]
Power Reduction
Estimate the portion of high refresh part based on the percentage of the critical pages
◦ 24% critical pages: ¼ high refresh rows
Overall power savings: up to 25%
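The mapping from a critical-page percentage to a high refresh portion can be sketched as picking the smallest configurable level that covers the critical pages. The level set below (⅛, ¼, ½, ¾, 1) is read off the Flikker DRAM bank figure on the hardware slide; the rounding-up rule and the function name are assumptions, chosen because they reproduce the slide's example of 24% critical pages mapping to ¼.

```python
from fractions import Fraction

# High refresh fractions as read from the Flikker DRAM bank figure (assumed complete).
LEVELS = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]


def high_refresh_level(critical_fraction: float) -> Fraction:
    """Smallest configurable level that is at least `critical_fraction`."""
    if not 0 <= critical_fraction <= 1:
        raise ValueError("critical_fraction must be in [0, 1]")
    for level in LEVELS:
        if critical_fraction <= level:
            return level
    return LEVELS[-1]  # not reached: the last level is 1


level = high_refresh_level(0.24)  # 24% critical pages, as on the slide
```

Because the levels are discrete, an application with slightly more than 25% critical pages would fall back to the ½ level in this model.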
[Figure: Overall DRAM Power Reduction (axis 0% to 30%) for mpeg2, c4, rayshade, vpr, parser under the conservative, aggressive, and crazy configurations]
[Figure: Standby DRAM Power Reduction (axis 0% to 35%) for mpeg2, c4, rayshade, vpr, parser under the conservative, aggressive, and crazy configurations]
Fault-injection Result
Output stats (1000 executions): perfect, degraded, failed (hang, crash)
c4: always perfect
mpeg2, rayshade: some degraded output
vpr, parser: some failed in aggressive and crazy
[Figure: Fault Injection Results for 1 s Refresh Cycle: percentage of 1000 executions (0% to 100%) that were perfect, degraded, or failed, for mpeg2, c4, rayshade, vpr, parser under cons, aggr, crazy]
Fault-injection Result: SNR
Signal-to-Noise Ratio (SNR): the ratio of signal energy to noise energy
SNR in logarithmic scale: 3 dB means double the ratio
mpeg2 encoder -> decoder: 35 dB
Flikker yields very high SNR

Configuration   mpeg2   rayshade
conservative    95.48   101.1
aggressive      88.34   72.84
crazy           88.04   73.63
Average SNR of degraded output of mpeg2 and rayshade [dB]. The impact of Flikker is
negligible.
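As a reminder of how a decibel SNR figure is obtained from an original and a degraded output, here is a minimal sketch. It assumes the standard definition, 10·log10 of signal energy over noise energy, with the noise taken as the sample-wise difference between the two outputs; whether the experiments computed SNR over exactly these quantities is not stated in the slides, and the function name is hypothetical.

```python
import math


def snr_db(original, degraded):
    """SNR in dB: 10 * log10(signal energy / noise energy)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, degraded))
    if noise == 0:
        return math.inf  # identical outputs: no noise at all
    return 10 * math.log10(signal / noise)


# One sample off by 1 out of a signal with energy 100: ratio 100, i.e. 20 dB.
example = snr_db([10, 0], [9, 0])

# Doubling the energy ratio adds 10 * log10(2), about 3.01 dB.
step = 10 * math.log10(2)
```

On this scale, the reported values of 70 dB and above correspond to noise energy that is at least seven orders of magnitude below the signal energy.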
Rayshade: Degraded SNR
[Figure: original image vs. degraded image (52.0 dB)]
Outline
Background
◦ Error resilience of applications
◦ Partial Array Self Refresh
Flikker DRAM and software framework
Experimental results
Future work
Conclusions
DRAM in Data Centers
Data center applications contain soft states, e.g. index
Typical utilization of data centers is less than 30%
Reduce Refresh Penalty
Refresh operation incurs performance penalty in active state
◦ No R/W during refresh operations
◦ Larger DRAM → more rows to refresh → higher refresh penalty [Stuecheli, MICRO’10]
Flikker reduces the number of refresh operations, and thus reduces refresh penalty
Conclusions
Handles DRAM errors with error resilience of software
Specify reliability of different data based on software requirement
Over 20% overall DRAM power reduction
THANK YOU!
QUESTIONS?
DRAM Refresh is Expensive
Refresh power consumption
Performance penalty
◦ Refresh penalty increases with capacity [Stuecheli, MICRO’10]
Variation in retention time [Venkatesan, HPCA’06]
Figure from [Venkatesan, HPCA’06]
Memory Footprint Breakdown
Global data is not partitioned
[Figure: Application Footprint Breakdown: share of the memory footprint (0% to 100%) taken by noncrit-heap, global, crit-heap, stack, and code for mpeg2, c4, rayshade, vpr, parser]
Self-refresh Power Model
Self-refresh power is not just power spent on refresh
$P_{\text{self-refresh}} = P_{\text{refresh}} + P_{\text{other}}$
Assume $P_{\text{refresh}}$ is proportional to refresh rate
[Figure: error rate and power vs. refresh cycle [s]]
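The model above can be turned into a small calculation. This is a sketch under stated assumptions rather than the model used for the reported numbers: it assumes refresh power scales with both the refresh rate and the fraction of rows refreshed at that rate, takes 64 ms as the baseline refresh cycle (the "where we are today" point in the motivation), and treats `refresh_share`, the baseline ratio of refresh power to total self-refresh power, as a free parameter because the slides do not give it. The value 0.5 below is a placeholder, not a measured figure.

```python
def self_refresh_saving(high_fraction, low_cycle_s, refresh_share, base_cycle_s=0.064):
    """Fractional self-refresh power saving versus refreshing every row
    at `base_cycle_s`.

    high_fraction: share of rows kept at the baseline refresh cycle
    low_cycle_s:   refresh cycle (seconds) of the remaining rows
    refresh_share: P_refresh / P_self-refresh at baseline (hypothetical input)
    """
    if low_cycle_s < base_cycle_s:
        raise ValueError("low refresh rows must not refresh faster than baseline")
    # Refresh power relative to baseline, assuming proportionality to rate.
    relative_refresh = high_fraction + (1 - high_fraction) * (base_cycle_s / low_cycle_s)
    # P_other is unchanged, so only the refresh share of the power shrinks.
    return refresh_share * (1 - relative_refresh)


# 1/4 of the array at high refresh, the rest at a 1 s cycle, placeholder share 0.5.
saving = self_refresh_saving(0.25, 1.0, 0.5)
```

Under these assumptions the saving is bounded above by `refresh_share * (1 - high_fraction)` no matter how long the low refresh cycle becomes, since the unchanged part of self-refresh power and the high refresh rows remain.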
Power Saving vs. Error Rate
[Figure: Power Saving and Error Rate for Different Refresh Rates: self-refresh power saving (0% to 30%) and error rate (1e-11 to 1e-1, log scale) vs. refresh cycle [s] (0.1 to 20), with ¼ of the array at high refresh. The 1 s refresh cycle is marked.]
Power vs. Output Quality
conservative: parser
aggressive: mpeg2, c4, rayshade
crazy: vpr
[Figure: Standby DRAM Power Reduction (axis 0% to 35%) for mpeg2, c4, rayshade, vpr, parser under the conservative, aggressive, and crazy configurations]
[Figure: Fault Injection Results for 1 s Refresh Cycle: percentage of 1000 executions (0% to 100%) that were perfect, degraded, or failed, for mpeg2, c4, rayshade, vpr, parser under cons, aggr, crazy]