EECS 388: Embedded Systems
12. Power and Energy
Heechul Yun
1
Agenda
• Background
• How to measure?
• How to save energy/power?
2
3
H. Sutter, “The Free Lunch Is Over”, Dr. Dobb's Journal, 2005 (updated in 2009)
4
Power Consumption (Server)
• Memory consumes significant power
– E.g., Intel Haswell-ULT: 15 W; 2 x 4 GB DDR3 DRAM: 10 W
Figure source: Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, 2009
Power Consumption (Smart Phone)
• Audio playback with backlight off on a smartphone
5
DVFS and DPM
• Dynamic Voltage/Frequency Scaling (DVFS)
– $\text{Power} \propto f V^2$
– Reduce frequency & voltage
• Dynamic Power Management (DPM)
– Multiple power states
• CPU C-states (standby, sleep, deep sleep, …)
• DDR3 power states (standby, powerdown, self-refresh, …)
• Goal: Making a “Good” Tradeoff
– Minimize performance hit, maximize power reduction
6
Background
$\text{Power} = P_{dynamic} + P_{static} = \frac{1}{2} C f V^2 + P_{static}$
- f: clock frequency
- V: voltage
7
Background
$\text{Power} \propto f V^2$ (Let’s ignore $P_{static}$ for now.)
$\text{Time} \propto 1/f$
$\text{Energy} = \text{Power} \times \text{Time}$
• Frequency doesn’t matter. Is that right?
8
Background
$\text{Power} \propto f V^2$
$\text{Time} \propto 1/f$
$\text{Energy} = \text{Power} \times \text{Time}$
• If you reduce frequency, you can also reduce voltage: $V \propto f$
9
Background
$\text{Power} \propto f^3$
$\text{Time} \propto 1/f$
$\text{Energy} = \text{Power} \times \text{Time}$
• Is reducing frequency always good?
10
Background
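$\text{Energy} = P_{dynamic} \times \text{Time} + P_{static} \times \text{Time}$
$P_{dynamic} \times \text{Time} \propto f^2$
$P_{static} \times \text{Time} \propto 1/f$
• Is reducing frequency always good?
11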
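The trade-off on this slide can be explored numerically. The following is a minimal sketch, assuming normalized units and made-up constants (`a`, `p_static`, and `work` are illustrative values, not measurements): dynamic power scales as f^3 (voltage scaled with frequency), static power is constant, and run time scales as 1/f.

```python
def energy(f, a=1.0, p_static=0.5, work=1.0):
    """Energy to finish `work` at normalized frequency f (all constants hypothetical)."""
    time = work / f            # Time ~ 1/f
    p_dynamic = a * f ** 3     # Power ~ f^3 when V ~ f
    return (p_dynamic + p_static) * time   # = a*f^2 + p_static/f

freqs = [0.2, 0.4, 0.6, 0.8, 1.0]
for f in freqs:
    print(f"f={f:.1f}  energy={energy(f):.3f}")

# Frequency with the lowest energy among the candidates
best = min(freqs, key=energy)
print("lowest-energy frequency:", best)
```

With these constants the lowest-energy candidate is neither the slowest nor the fastest frequency: below it the static term dominates because the task runs longer, above it the dynamic term dominates.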
PowerTop
12
Intel’s Recent Processors
• RAPL (Running Average Power Limit)
13
Source: http://web.eece.maine.edu/~vweaver/projects/rapl/
Source: http://software.intel.com/en-us/articles/intel-power-governor
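On Linux, RAPL energy counters are typically exposed through the powercap sysfs interface. The sketch below is one way to turn two counter readings into average power; the directory `/sys/class/powercap/intel-rapl:0` is an assumption about the machine (it may differ or require root), and the helper names are hypothetical.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain; check your own system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def read_uj(name, rapl_dir=RAPL_DIR):
    """Read an integer microjoule value such as energy_uj."""
    return int((rapl_dir / name).read_text())

def energy_delta_uj(before, after, max_range):
    """Energy between two readings, tolerating a single counter wraparound."""
    if after >= before:
        return after - before
    return after + max_range - before

def average_power_w(before, after, seconds, max_range):
    """Average power in watts over the measurement interval."""
    return energy_delta_uj(before, after, max_range) / 1e6 / seconds

def measure(seconds=1.0, rapl_dir=RAPL_DIR):
    """Sample the counter twice and return average power (needs RAPL hardware)."""
    max_range = read_uj("max_energy_range_uj", rapl_dir)
    e0 = read_uj("energy_uj", rapl_dir)
    time.sleep(seconds)
    e1 = read_uj("energy_uj", rapl_dir)
    return average_power_w(e0, e1, seconds, max_range)
```

Calling `measure()` only works on a machine that exposes these files; the pure helper functions can be used on any recorded readings.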
14
Platform level monitoring
Odroid-XU-E board
Processor: Exynos 5 Octa
Source: http://hardkernel.com/main/products/prdt_info.php?g_code=G137463363079
External Measurement
15
Source: http://www.hardkernel.com/main/products/prdt_info.php?g_code=G137361754360
Source: http://www.rakuten.com/prod/p3-kill-a-watt-ps-10-10-outlets-power-strip-receptacle-10/220012603.html?listingId=284206025&scid=pla_google_3KingsAudio&adid=18172&gclid=CIvs97jTq70CFa5DMgodcEkAHg
http://www.amazon.com/P3-International-P4460-Electricity-Monitor/dp/B000RGF29Q/ref=sr_1_3?ie=UTF8&qid=1395680823&sr=8-3&keywords=power+meter
How to save Power/Energy?
• Techniques for perf/energy tradeoffs
– DVFS
– Turbo boost
– Power gating
– Core heterogeneity
• Considerations
– Sensitive to time (performance)
– Sensitive to energy consumption
16
A Measurement Study
• An Analysis of Power Consumption in a Smartphone, USENIX ATC’10
17
Impact
A Smartphone
• (very old) 2.5G GPRS phone
– Battery: 1200 mAh, 3.7 V Li-ion (4.4 Wh)
18
What to Know?
• Where does the energy go?
– Detailed component-level power breakdown
– On various usage scenarios
• How to save energy?
– The efficacy of DVFS (dynamic voltage-frequency scaling) schemes
19
Methodology
• Hardware
– A development board, configured to measure individual component (CPU, memory, Modem, …) power consumption
– Using a DAQ (data acquisition) system
• Read the paper. You can find very detailed descriptions
• Software
– On Android 1.5, using a set of micro-benchmarks as well as real applications
20
Idle
• System is awake, but no applications are active
• CPU and RAM are not top power consumers
21
Audio Playback
• Backlight off
• Comparable to idle state
22
Video Playback
• Backlight is a dominant factor
23
Backlight
• User controllable (~255 levels)
24
CPU and Memory
• 100MHz (low perf) → 400MHz (max perf)
• equake’s power consumption increases significantly
• mcf’s power consumption doesn’t increase much
25
Internal Flash and SD Card
• Benchmark: flash read/write (dd)
• Why are they (internal and SD) different?
26
Findings
• Where does the energy go?
– GSM, display, backlight
– Not CPU and DRAM
• Is DVFS useful?
– Reduce power but not necessarily energy
– Only memory bound applications get energy savings
27
Two Additional Smartphones
28
Quiz
• On which phone would you want to use DVFS?
29
Is DVFS useful?
• Yes: Nexus One, Freerunner (weak)
• No: G1
30
Further reading: E. Le Sueur and G. Heiser, “Dynamic voltage and frequency scaling: the laws of diminishing returns,” HotPower’10
Challenge: How To Configure?
• Too many possible configurations
– Low or high freq?
– More cores or fewer cores?
– Little core vs. big core?
• Platform variation
– A policy that works well on one platform does not necessarily work well on another
31
Energy Saving Strategies
• Model-based approach
– Offline: build an energy/performance model
– Online: compute an “optimal” assignment
• Heuristic approach
– Race to idle
– Never idle
– Adaptive control
33
System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks
Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha
University of Illinois at Urbana-Champaign
Euromicro Conference on Real-Time Systems (ECRTS), 2010
34
CPU-only DVFS
• “DVFS is increasingly ineffective” [Le Sueur, HotPower’10]
– Increased importance of static power
– Small voltage margin for DVFS to be effective
– Reduced freq. → increased runtime → often increased energy
35
$P = P_{dynamic} + P_{static} = k f V^2 + P_{static}$
- f: clock frequency
- V: voltage
- k: constant
CPU-only DVFS
[Figure: Energy (mJ) vs. CPU frequency fc (MHz), x-axis from 40 to 400 MHz, y-axis from 0 to 700 mJ; task cache stall ratio = 0%. Annotation: valid range (~200 MHz).]
Not effective, but…
36
Motivation
37
CPU (MHz)  Mem (MHz)  Time (s)  Energy (mJ)
200        100        3.46      1690
100        100        3.55      1182
Memxfer5b : memory benchmark program
Half of CPU clock
Energy saved 30%
Exec. time increased only 3%
Motivation
38
CPU (MHz)  Mem (MHz)  Time (s)  Energy (mJ)
200        100        4.26      2364
200        50         4.28      2106
Dhrystone: CPU benchmark program
Half of Mem clock
Energy saved about 11%
Exec. time increased only about 0.5%
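The percentages on the two Motivation slides can be recomputed directly from the table rows. A small sketch (the helper name `percent_change` is illustrative; the numbers are the table values):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Rows copied from the two tables: (cpu_mhz, mem_mhz, time_s, energy_mj)
memxfer5b = [(200, 100, 3.46, 1690), (100, 100, 3.55, 1182)]
dhrystone = [(200, 100, 4.26, 2364), (200, 50, 4.28, 2106)]

for name, (base, scaled) in (("memxfer5b", memxfer5b), ("dhrystone", dhrystone)):
    dt = percent_change(base[2], scaled[2])
    de = percent_change(base[3], scaled[3])
    print(f"{name}: time {dt:+.1f}%, energy {de:+.1f}%")
# memxfer5b: time +2.6%, energy -30.1%
# dhrystone: time +0.5%, energy -10.9%
```

Halving the clock of the component that is not the bottleneck costs little time in both cases, which is the motivation for scaling CPU and memory frequencies jointly.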
Task Model
• Task = Computation + Memory fetch
[Figure: two power vs. time diagrams of a task, with segments labeled computation and memory fetch (cache stall)]
39
Task Model (2)
C : computation
M : off-chip memory fetch (cache-stall cycles)
[Figure: three power vs. time diagrams of the C and M blocks: baseline, lower MEM freq, and lower CPU freq]
40
Task Model (3)
• Execution time of a task
$e = \frac{C}{f_c} + \frac{M}{f_m}$
– C : CPU cycles of a given task (excluding memory stalls)
– M : memory cycles of a given task (memory stall cycles)
– $f_c$ : CPU clock frequency
– $f_m$ : Memory clock frequency
41
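The execution-time model (CPU cycles divided by CPU frequency plus memory cycles divided by memory frequency) is easy to sketch in code. The cycle counts below are hypothetical values chosen only to make the arithmetic obvious.

```python
def exec_time(c_cycles, m_cycles, f_cpu_hz, f_mem_hz):
    """Execution time e = C / f_c + M / f_m, in seconds."""
    return c_cycles / f_cpu_hz + m_cycles / f_mem_hz

# Hypothetical task: 100M CPU cycles and 50M memory-stall cycles
C, M = 100e6, 50e6

base = exec_time(C, M, 200e6, 100e6)      # 0.5 s + 0.5 s
slow_cpu = exec_time(C, M, 100e6, 100e6)  # 1.0 s + 0.5 s
slow_mem = exec_time(C, M, 200e6, 50e6)   # 0.5 s + 1.0 s

print(base, slow_cpu, slow_mem)
```

Each frequency only stretches its own term, so a task with a small C is barely slowed by a lower CPU clock, and a task with a small M is barely slowed by a lower memory clock.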
Power Model
• Power of a component (e.g., CPU)
$P = k f V^2 + R$
– k : capacitance constant
– f : frequency of the component
– V : supply voltage
– R : leakage power
42
Different k for different modes:
– $k_{active}$ : active mode capacitance
– $k_{standby}$ : standby mode capacitance
Energy Model
[Figure: power vs. time over one period P, with execution time e. Dynamic power is stacked on top of system static power in three blocks: a pure exec block ($E_{cpu}$: CPU active; bus, mem standby), a MEM fetch block ($E_{mem}$: CPU standby; bus, mem active), and an idle block ($E_{idle}$: CPU, bus, mem idle).]
• System wide energy model
– Considers CPU, bus, and memory power consumption
– Considers active, standby and idle modes
– Other components are assumed to be static (included in R)
43
Energy Equation and Validation
$E = \frac{C}{f_c}\left(k_{ca} V_c^2 f_c + k_{ms}^{*} V_m^2 f_m + R\right) + \frac{M}{f_m}\left(k_{cs} V_c^2 f_c + k_{ma}^{*} V_m^2 f_m + R\right) + (I + R)(P - e)$

Obtained coefficients in the energy equation:
Capacitance (nF)              | Power (mW)
Kca    Kcs    Kma*   Kms*     | I      R
0.505  0.224  0.540  0.210    | 6.570  67.434
44
• Validated on an ARM926EJ-S based platform via regression analysis
Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha. “System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks,” ECRTS, 2010
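As a rough sketch of how the energy model could be evaluated, the function below assumes the equation has the form: (pure exec time) × (CPU active + memory standby + R) + (memory fetch time) × (CPU standby + memory active + R) + (idle time) × (I + R). It uses the coefficients from the table converted to SI units. The voltages and the treatment of the starred memory coefficients as plain capacitances are assumptions, so the absolute numbers are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Coeffs:
    # Table values converted to farads and watts (assumed unit conversion)
    k_ca: float = 0.505e-9    # CPU active capacitance
    k_cs: float = 0.224e-9    # CPU standby capacitance
    k_ma: float = 0.540e-9    # memory active capacitance (starred in the table)
    k_ms: float = 0.210e-9    # memory standby capacitance (starred in the table)
    idle_w: float = 6.570e-3  # I
    static_w: float = 67.434e-3  # R

def energy_j(C, M, f_c, f_m, v_c, v_m, period, k=Coeffs()):
    """Energy in joules for one task invocation within one period."""
    e = C / f_c + M / f_m  # execution time
    p_exec = k.k_ca * v_c**2 * f_c + k.k_ms * v_m**2 * f_m + k.static_w
    p_fetch = k.k_cs * v_c**2 * f_c + k.k_ma * v_m**2 * f_m + k.static_w
    p_idle = k.idle_w + k.static_w
    return (C / f_c) * p_exec + (M / f_m) * p_fetch + (period - e) * p_idle

# Hypothetical task and voltages, for illustration only
print(energy_j(C=200e6, M=50e6, f_c=200e6, f_m=100e6, v_c=1.2, v_m=1.8, period=2.0))
```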
Static MultiDVFS Problem
• Given a set of periodic real-time tasks ($T_1, \ldots, T_n$), where each task invocation requires at most $C_i$ CPU cycles and at most $M_i$ memory cycles in the worst case.
• Find the energy-optimal static frequencies for multiple DVFS-capable components (CPU and memory)
45
Problem Formulation
Minimize
$\sum_{i=1}^{n} \frac{H}{P_i}\left(E_{comp,i} + E_{mem,i}\right) + E_{idle}$
Subject to
$\sum_{i=1}^{n} \frac{e_i}{P_i} \le 1$
where
– H : hyper period
– $e_i$ : execution time of task i
– $E_{comp,i}$ : computation block energy of task i
– $E_{mem,i}$ : cache stall block energy of task i
– $E_{idle}$ : idle block energy
46
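One simple way to picture the static problem is an exhaustive search over discrete frequency pairs. The sketch below is not the authors' solver: the power functions and their coefficients are hypothetical, frequencies are normalized to 1.0, and it minimizes average power over the hyper period (equivalent to minimizing energy over H) subject to total utilization of at most 1.

```python
from itertools import product

# Hypothetical normalized power models (not fitted to any platform)
def p_exec(f_c, f_m):   # CPU active, memory standby
    return 1.0 * f_c**3 + 0.1 * f_m + 0.2

def p_fetch(f_c, f_m):  # CPU standby, memory active
    return 0.3 * f_c**3 + 0.5 * f_m + 0.2

P_IDLE = 0.25           # power while everything is idle

def static_multidvfs(tasks, cpu_freqs, mem_freqs):
    """tasks: list of (C_i, M_i, P_i). Returns (avg_power, f_c, f_m) or None."""
    best = None
    for f_c, f_m in product(cpu_freqs, mem_freqs):
        comp = sum(C / f_c / P for C, M, P in tasks)  # time share of exec blocks
        mem = sum(M / f_m / P for C, M, P in tasks)   # time share of fetch blocks
        util = comp + mem
        if util > 1.0:   # schedulability constraint: sum of e_i / P_i <= 1
            continue
        avg_power = (comp * p_exec(f_c, f_m) + mem * p_fetch(f_c, f_m)
                     + (1.0 - util) * P_IDLE)
        if best is None or avg_power < best[0]:
            best = (avg_power, f_c, f_m)
    return best

tasks = [(2.0, 1.0, 10.0), (1.0, 1.0, 5.0)]  # (C_i, M_i, P_i), normalized units
best = static_multidvfs(tasks, [0.5, 0.75, 1.0], [0.5, 1.0])
print(best)
```

For this made-up task set the search picks a reduced CPU frequency with full memory frequency; combinations that push utilization above 1 are rejected as unschedulable.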
Energy vs. Utilization
[Figure: normalized average power consumption (0.5 to 1) vs. utilization (0.1 to 0.9); plot labels: MAX, CPU-only, Static, MultiDVFS. Task set cache stall ratio (MH/(CH+MH)): 0.3]
47
Summary
• Memory-aware time/energy model
– Consider CPU and memory frequencies/voltages
– Validated on a real hardware platform
• MultiDVFS
– Joint optimization of CPU and memory frequencies/voltages
– Minimize energy consumption of periodic real-time tasks
48
Recap: First Attempt
• 1000 samples (minus the first sample. Why?)
49
CFS (nice=0)
Mean 23.8
Max 47.9
99pct 47.4
Min 20.7
Median 20.9
Stdev. 7.7
Why?
Recap: DVFS
• Dynamic voltage and frequency scaling (DVFS)
• Lower frequency/voltage saves power
• Varies clock speed depending on the load
• Causes timing variations
• Disabling DVFS
50
# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
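The same step can be scripted so that it covers however many cores the board has. This is a minimal sketch: the function name `set_governor` and its `cpu_root` parameter are hypothetical, and writing to the real sysfs files requires root privileges.

```python
from pathlib import Path

def set_governor(governor="performance",
                 cpu_root=Path("/sys/devices/system/cpu")):
    """Write `governor` to every cpuN/cpufreq/scaling_governor under cpu_root."""
    paths = sorted(cpu_root.glob("cpu[0-9]*/cpufreq/scaling_governor"))
    for path in paths:
        path.write_text(governor + "\n")
    return paths  # the files that were updated

if __name__ == "__main__":
    for p in set_governor("performance"):
        print("updated", p)
```

The `cpu_root` parameter exists only so the function can be pointed at a scratch directory for a dry run.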
Recap: Energy Saving Strategies
• Model-based approach
– Offline: build an energy/performance model
– Online: compute an “optimal” assignment
• Heuristic approach
– Race to idle
– Never idle
– Adaptive control
51
POET: A Portable Approach to Minimizing Energy Under Soft Real-time Constraints
Connor Imes, David H. K. Kim, Martina Maggio, and Henry Hoffmann
University of Illinois at Urbana-Champaign
IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2015
52
Systems
53
Configurations
• Per-application/per-platform, off-line profiling
54
Platform Variation
55
Connor Imes, David H. K. Kim, Martina Maggio, and Henry Hoffmann, “POET: A Portable Approach to Minimizing Energy Under Soft Real-time Constraints,” RTAS, 2015
POET Approach
• Control theory based
– (1) observe error (2) compute control (3) apply control
56
Controller
• Goal: meet the speed target
• Observe error
• Compute control signal
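The observe/compute loop can be illustrated with a generic integral controller. This is a sketch of the general idea only, not POET's actual control law: `control_step`, the base-speed estimate, and the simulated system are all hypothetical.

```python
def control_step(prev_speedup, target_speed, measured_speed, base_speed_estimate):
    """One controller iteration: observe error, then compute the new speedup."""
    error = target_speed - measured_speed              # observe error
    speedup = prev_speedup + error / base_speed_estimate  # integrate the error
    return max(speedup, 0.0)

# Simulated system: true speed = TRUE_BASE * speedup (jobs per second).
TRUE_BASE = 10.0   # unknown to the controller
ESTIMATE = 12.5    # the controller's (imperfect) model of the base speed
TARGET = 30.0

speedup = 1.0
for step in range(10):
    measured = TRUE_BASE * speedup
    speedup = control_step(speedup, TARGET, measured, ESTIMATE)
    print(step, round(speedup, 4))
```

Even with a wrong base-speed estimate, the feedback loop in this toy model converges to the speedup that meets the target (3.0 here), which is the appeal of a control-based approach over an open-loop model.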
57
Optimizer
• Given
– C configurations,
– measured speed s(t),
– time window tau
• Goal: minimize energy
– Subject to
• Meeting performance (# of jobs in a given time window tau)
• Sum of time spent on each setting = tau
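The optimization above is a linear program with two constraints, so an optimal solution needs at most two configurations. The sketch below exploits that by checking every pair of configurations that brackets the target speed; it is an illustration under assumed inputs (each configuration given as a hypothetical `(speed, power)` pair, and the performance constraint treated as an equality), not POET's implementation.

```python
def min_energy_schedule(configs, target_speed, tau):
    """configs: list of (speed, power). Returns (energy, [(index, seconds), ...]) or None."""
    best = None
    for i, (s_lo, p_lo) in enumerate(configs):
        for j, (s_hi, p_hi) in enumerate(configs):
            if not (s_lo <= target_speed <= s_hi):
                continue  # this pair cannot average out to the target speed
            if s_hi == s_lo:
                t_hi = 0.0
            else:
                # Split tau so that s_lo*t_lo + s_hi*t_hi == target_speed*tau
                t_hi = tau * (target_speed - s_lo) / (s_hi - s_lo)
            t_lo = tau - t_hi
            energy = p_lo * t_lo + p_hi * t_hi
            if best is None or energy < best[0]:
                best = (energy, [(i, t_lo), (j, t_hi)])
    return best

# Hypothetical configurations: (jobs per second, watts); index 0 is "idle"
configs = [(0.0, 0.5), (1.0, 2.0), (2.0, 5.0)]
result = min_energy_schedule(configs, target_speed=1.5, tau=10.0)
print(result)  # (35.0, [(1, 5.0), (2, 5.0)])
```

With these made-up numbers, alternating between the two middle configurations (35 J) beats running the fastest configuration and then idling (38.75 J), so race-to-idle is not always the best heuristic.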
58
Example Usage
• Apply to periodic tasks
– One control per job
• Heartbeat API
– Measure rate
59
Results
60
Summary
• Power/energy/speed relationship
– Model vs. practice
• Control options
– DVFS
– On/off
– Core heterogeneity
• Management approaches
– Model based
– Heuristic based
– Control theory based
61