1
MemScale: Active Low-Power Modes for Main Memory
Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini
Rutgers University *University of Michigan
2
Server memory power challenges
[Figure: Power consumption of a Google server [Barroso & Hoelzle’07]; x-axis: Compute load (%); y-axis: Power (% of peak)]
• DRAM power varies little with load
• Memory power represents 30-40% of total power for typical loads
• The actual fraction is larger, since memory controller power is not included
3
Improving memory energy efficiency
• Observation: Memory bandwidth is rarely fully utilized [Meisner’11];
we can save energy during periods of light and moderate load
• Previous approaches
  • Leveraging DRAM idle low-power state [Lebeck’00][Delaluz’01][Li’04][Diniz’07]…
  • Rank sub-setting and DRAM reorganization [Ahn’09][Udipi’10][Zheng’10]…
  • Memory controller power is typically not considered
• Need active low-power modes to save energy when underutilized
  • Frequency has greater impact on bandwidth than latency
4
MemScale: Active low-power modes for memory
• Goal: Dynamically scale memory frequency to conserve energy
• Hardware mechanism:
  • Frequency scaling (DFS) of the channels, DIMMs, DRAM devices
  • Voltage & frequency scaling (DVFS) of the memory controller
• Key challenge:
  • Conserving significant energy while meeting performance constraints
• Approach:
  • Online profiling to estimate performance and bandwidth demand
  • Epoch-based modeling and control to meet performance constraints
• Main result:
  • System energy savings of 18% with average performance loss of 4%
5
Outline
• Motivation and overview
• Background on memory systems
• MemScale: DVFS for the memory system
• Results
• Conclusions
6
Impact of frequency scaling on memory latency
[Figure: request timelines at 800 MHz and at 400 MHz along a Time axis, showing Req, MC, ACT, CL, Burst, PRE, MC, Reply]
• For DDR3 DRAM, scaling frequency from 800MHz to 400MHz: bandwidth down by 50%, latency up by only 10%
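The asymmetry between bandwidth and latency can be illustrated with a small calculation. This is only a sketch under stated assumptions: the 45 ns fixed portion is a hypothetical value chosen for illustration (it does not come from the slides), and it treats the ACT, CL, PRE, and controller time as constant in nanoseconds, so only the data burst stretches when the clock slows.

```python
# Hypothetical timing split, for illustration only (not taken from the slides).
FIXED_NS = 45.0    # assumed: ACT + CL + PRE + controller time, held constant in ns
BURST_CYCLES = 4   # a DDR3 burst of 8 transfers at 2 transfers per clock

def access_latency_ns(freq_mhz: float) -> float:
    """Latency of one access: fixed portion plus burst time at this bus clock."""
    cycle_ns = 1000.0 / freq_mhz
    return FIXED_NS + BURST_CYCLES * cycle_ns

def peak_bandwidth_gbps(freq_mhz: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s of one 64-bit double-data-rate channel."""
    return 2 * freq_mhz * 1e6 * bus_bytes / 1e9

latency_increase = access_latency_ns(400) / access_latency_ns(800) - 1
bandwidth_ratio = peak_bandwidth_gbps(400) / peak_bandwidth_gbps(800)
print(f"latency +{latency_increase:.0%}, bandwidth x{bandwidth_ratio:.2f}")
```

With these assumed numbers, halving the clock halves peak bandwidth while the access latency grows by about a tenth, because most of the latency does not depend on the bus clock.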
7
Opportunity for MemScale
[Figure: normalized power (%) for MEM INTENSIVE, INTERMEDIATE, and COMPUTE INTENSIVE workloads, broken down into Background, Dynamic, and MC]
Background: clock tree, I/O driver, register, PLL, DLL, refresh, others
Dynamic: read, write, termination
MC: memory controller
• Effects of lower frequency on power:
  • Lowers background power linearly (~f)
  • Lowers MC power by cubic factor (~f^3)
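The two scaling rules on this slide can be written out directly. The sketch below assumes idealized scaling: background power proportional to f, and MC power proportional to f^3 on the assumption that supply voltage scales with frequency so that P ~ V^2 f. The function names are hypothetical, and real parts have voltage floors and static power that this ignores.

```python
def relative_background_power(f_ratio: float) -> float:
    """Background power relative to full frequency, assuming P ~ f."""
    return f_ratio

def relative_mc_power(f_ratio: float) -> float:
    """MC power relative to full frequency, assuming V ~ f and P ~ V^2 * f."""
    return f_ratio ** 3

# Relative power at a few frequencies, normalized to 800 MHz.
for mhz in (800, 600, 400, 200):
    r = mhz / 800
    print(mhz, round(relative_background_power(r), 3), round(relative_mc_power(r), 3))
```

Under these assumptions, running at half frequency leaves half of the background power but only one eighth of the MC power.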
9
MemScale design
• Goal: Minimize energy under user-specified slowdown bound
• Approach: OS-managed, epoch-based memory frequency tuning
• Each epoch (e.g., an OS quantum):
  1. Profile performance & bandwidth demand
     • New performance counters track mem latency, queue occupancies
  2. Estimate performance & energy at each frequency
     • Models estimate queuing delays & system energy
  3. Re-lock to best frequency; continue tracking performance
     • Slack: delta between estimated & observed performance
  4. Carry slack forward to performance target for next epoch
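Steps 2 and 3 above amount to a constrained selection, which can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Prediction` record, the `choose_frequency` function, and all numbers are hypothetical, and slack is expressed here as a fraction of allowed slowdown.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Prediction:
    freq_mhz: int
    slowdown: float  # predicted fractional CPI increase vs. the highest frequency
    energy: float    # predicted full-system energy for the epoch (arbitrary units)

def choose_frequency(preds: Sequence[Prediction], bound: float, slack: float) -> Prediction:
    """Pick the lowest-energy frequency whose predicted slowdown fits bound + slack."""
    allowed = bound + slack
    feasible = [p for p in preds if p.slowdown <= allowed]
    if not feasible:
        # Nothing meets the target: run at the highest frequency to recover slack.
        return max(preds, key=lambda p: p.freq_mhz)
    return min(feasible, key=lambda p: p.energy)

# Hypothetical model outputs for one epoch.
preds = [
    Prediction(800, 0.00, 100.0),
    Prediction(600, 0.03, 92.0),
    Prediction(400, 0.08, 85.0),
    Prediction(200, 0.25, 83.0),
]
print(choose_frequency(preds, bound=0.10, slack=0.0).freq_mhz)
```

Negative slack tightens the effective bound for the next epoch, so an epoch that ran slower than predicted pushes the following choice toward a higher frequency.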
10
Frequency and slack management
[Figure: timeline across Epoch 1 to Epoch 4 with Profiling phases marked; one row for MC, Bus + DRAM (High Freq. / Low Freq.) and one row for CPU (Target vs. Actual, with Pos. Slack, Neg. Slack, Pos. Slack)]
Calculate slack vs. target
Estimate performance/energy via models
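The slack bookkeeping in the figure can be sketched as a running sum. This assumes one plausible definition, in which the target for an epoch is the estimated time at the highest frequency stretched by the slowdown bound, and slack is the target minus the time actually taken; the function name and the epoch timings are hypothetical.

```python
def update_slack(slack_s: float, est_max_freq_time_s: float,
                 actual_time_s: float, bound: float) -> float:
    """Add this epoch's (target - actual) time to the accumulated slack."""
    target_time_s = est_max_freq_time_s * (1.0 + bound)
    return slack_s + (target_time_s - actual_time_s)

# (estimated epoch time at the highest frequency, observed epoch time), in seconds.
epochs = [(0.0050, 0.0052), (0.0050, 0.0060), (0.0050, 0.0050)]

slack = 0.0
history = []
for est, actual in epochs:
    slack = update_slack(slack, est, actual, bound=0.10)
    history.append(slack)
print(history)
```

In this made-up trace the accumulated slack goes positive, then negative, then positive again, which is the pattern the figure depicts: an overshoot in one epoch is paid back in the next.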
11
Modeling of performance and energy
• New performance counters enable estimates of
  • Level of contention (bank and bus)
  • Energy consumption
• CPI of each application
• Avg memory latency
• Performance slack
• Estimate full system energy
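To show how average memory latency can feed a per-application CPI estimate, here is a minimal sketch assuming a simple linear model in which every last-level cache miss stalls the core for the full memory latency. The function and all parameter values are hypothetical, and the models in the paper may differ (for example, by accounting for overlapped misses).

```python
def predict_cpi(cpi_core: float, misses_per_instr: float,
                mem_latency_ns: float, cpu_freq_ghz: float) -> float:
    """CPI = core CPI + misses per instruction * memory latency in CPU cycles."""
    latency_cycles = mem_latency_ns * cpu_freq_ghz
    return cpi_core + misses_per_instr * latency_cycles

# Hypothetical application: 5 misses per 1000 instructions on a 2 GHz core.
base_cpi = predict_cpi(1.0, 0.005, mem_latency_ns=50.0, cpu_freq_ghz=2.0)
slow_cpi = predict_cpi(1.0, 0.005, mem_latency_ns=55.0, cpu_freq_ghz=2.0)
cpi_increase = slow_cpi / base_cpi - 1
print(f"CPI {base_cpi:.2f} -> {slow_cpi:.2f} (+{cpi_increase:.1%})")
```

The point of the sketch is that the CPI penalty of a lower memory frequency depends on how memory-bound the application is: the same latency increase costs less for a program with fewer misses per instruction.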
12
MemScale adjusts frequency dynamically
[Figure: Timeline of workload mix MID3]
14
Methodology
• Detailed simulation
  • 16 cores, 16MB LLC, 4 DDR3 channels, 8 DIMMs
  • Multi-programmed workloads from SPEC suites
• Power modes
  • 10 frequencies between 200 and 800 MHz
• Power consumption
  • Micron’s DRAM power model
  • Memory system power = 40% of total server power
15
Results – energy savings and performance
[Figure, left panel: Average energy savings. Energy savings (%) for ILP, MID, MEM, AVG; series: Full system energy, Memory system energy]
[Figure, right panel: Performance overhead. CPI increase (%) for ILP, MID, MEM, AVG; series: Multiprogram average, Worst program in mix; reference line: CPI degradation bound]
Memory energy savings of 44%
System energy savings of 18%
Always within performance bound
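As a rough consistency check, the memory and system figures can be related through the memory share of server power given on the methodology slide. This is a back-of-the-envelope sketch only: it assumes the 40% power share carries over to energy, the `rest_growth` parameter is a hypothetical knob for extra non-memory energy caused by longer runtime, and the reported numbers are averages over workloads, so an exact match is not expected.

```python
def system_energy_savings(mem_share: float, mem_savings: float,
                          rest_growth: float = 0.0) -> float:
    """Fractional system energy savings.

    mem_share:   memory fraction of baseline system energy
    mem_savings: fractional reduction in memory energy
    rest_growth: fractional increase in non-memory energy (assumed, default 0)
    """
    new_energy = mem_share * (1 - mem_savings) + (1 - mem_share) * (1 + rest_growth)
    return 1 - new_energy

estimate = system_energy_savings(mem_share=0.40, mem_savings=0.44)
print(f"{estimate:.1%}")
```

With no growth in non-memory energy, the estimate comes out slightly below the reported 18% system savings; any positive `rest_growth` lowers it further.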
16
Alternative approaches
• Fast power-down
  • Transition ranks into fast power-down mode when idle
• Decoupled-DIMM [Zheng’09]
  • Low frequency DRAM + high frequency DIMMs & channels
• Static
  • Pre-selected active low-power mode w/o dynamic scaling
  • Unrealistic: needs a priori knowledge of workload behavior
17
Results – comparison to alternative approaches
[Figure, left panel: Full system energy savings (MID). Energy savings (%) for Fast-PD, Decoupled-DIMM, Static, MemScale, MemScale+Fast-PD]
[Figure, right panel: Performance overhead (MID). CPI increase (%) for Fast-PD, Decoupled-DIMM, Static, MemScale, MemScale+Fast-PD; series: Multiprogram average, Worst program in mix]
18
Conclusions
• MemScale contributions:
  • Active low-power modes for the memory subsystem
  • New perf. counters to capture energy and contention
  • OS policy to choose best power mode dynamically
• Avg 18% system energy savings, avg 4% performance loss
• In the paper
  • Performance and energy models
  • Sensitivity analyses (including lower performance bounds)
  • Energy break-down comparison
19
THANKS!