Posted on 03-Dec-2018
NRG-Loops: Adjusting Power from within Applications
Melanie Kambadur*+, Martha Kim*
*Columbia University, New York, NY USA
+Oscar Health Insurance, New York, NY USA
Once, power/performance tradeoffs were set at HW design time…
Power efficiency evolution
• Low freq.: - Less power, - Slower runtime
• High freq.: + More power, + Faster runtime
• + Med. power, + Faster runtime
• Specialized HW: -- Power, ++ Runtime
The next big thing was tunable “knobs”
Power efficiency evolution
• Dynamic Frequency Tuning (DFS/DVFS)
• CPU Idle Modes
• Asymmetric Multicores
How do we use these HW knobs for SW power & energy efficiency?
Moving power efficiency up the stack
• Dynamic Frequency Tuning (DFS/DVFS)
• CPU Idle Modes
• Asymmetric Multicores
Using HW knobs for SW energy efficiency
Most SW energy efficiency solutions expose “hints” to the OS, which then tunes HW knobs. Hints can be attached to functions:
    func foo _high_power_ { /* some code */ }   → High freq.
    func bar _low_power_ { /* some code */ }    → Low freq.
    func baz _high_power_ { /* some code */ }   → High freq.
or to classes:
    class Foo _high_power_ { /* some code */ }  → High freq.
    class Bar _low_power_ { /* some code */ }   → Low freq.
    class Baz _high_power_ { /* some code */ }  → High freq.
Besides changing frequency, the OS can act on hints by idling some cores.
STOP using HW knobs for SW energy efficiency
Most SW energy efficiency solutions expose “hints” to the OS, which then tunes HW knobs. But:
• It is hard to manage HW power when multiple programs give hints simultaneously.
• HW can predict idle periods better → sub-cycle DVFS tuning?
• In practice, most HW tuning increases runtime to save power, so it can't save energy during SW active periods.
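The third bullet follows from energy being average power multiplied by time. Suppose a hardware knob scales average power by a factor $a$ and runtime by a factor $b$. These symbols are introduced here only for illustration. Energy then falls only when their product is below one:

```latex
E = \bar{P}\,t, \qquad
E' = (a\bar{P})(b\,t) = ab\,E, \qquad
E' < E \iff ab < 1 .
```

For instance, with hypothetical numbers, a frequency setting that cuts power by 20% ($a = 0.8$) but stretches runtime by 25% ($b = 1.25$) gives $ab = 1$. The power cap is met, yet no energy is saved.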
Mobile app example
Segment of a mobile game that takes 10 s.
[Figure: power (% of total) vs. time (s), 0 to 14 s]
Mobile app example
Want the app to consume <= 80% of total power.
[Figure: power (% of total) vs. time (s); the peak is marked "Too much power!"]
Option 1: Let HW handle it with DVFS
Power is OK now, but you get a slowdown (25% increase in runtime).
[Figure: power (% of total) vs. time (s)]
Option 2: Compiler/Language Smart DVFS
Power is OK now, but you still get a slowdown (10% increase in runtime).
Moreover, all apps on the same core must be slowed down.
[Figure: power (% of total) vs. time (s)]
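Reading the 25% and 10% annotations as runtime increases over the 10 s baseline segment, the two DVFS options finish at:

```latex
t_{\text{DVFS}} = 10\,\text{s} \times 1.25 = 12.5\,\text{s}, \qquad
t_{\text{smart DVFS}} = 10\,\text{s} \times 1.10 = 11\,\text{s}.
```

In both cases the power cap is met only by giving up time.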
Option 3: Trade functionality for power
Banner ad: the ad is responsible for the power spike.
Pause the ad to maintain the power budget with no time delay.
[Figure: power (% of total) vs. time (s), series: Total, Advertisement]
NRG-Loops: SW-Only Power Management
• Instead of having software manage power via hardware knobs, have software manage power via software knobs.
  – HW knobs: DVFS, idle cores, asymmetric multicore
  – SW knobs: adjust caching strategy, reduce thread count, estimate mathematical function, stop computation early and dump memory
• C++ language extension to tune SW power through SW knobs.
• Measures hardware power and energy, and enables programs to trade functionality or accuracy ONLY when runtime power or energy budgets are exceeded.
• Can work concurrently with HW power solutions.
NRG-Loops ADAPT
Concise syntax adds only a few lines of code to existing programs:
    NRG_ADAPT_FOR ( int i=0; i<MAX_ADS; ++i && NRG_AVG_P<=POWER_LIMIT ) {
        // run ad normally
    } NRG_ALTERNATE {
        usleep ( PAUSE_TIME );
    }
• NRG_ADAPT_FOR: loop pragma
• int i=0; i<MAX_ADS; ++i: original loop bounds
• && NRG_AVG_P<=POWER_LIMIT: concatenated power/energy goals
• // run ad normally: original loop body
• NRG_ALTERNATE: entered if the power budget is exceeded
• usleep ( PAUSE_TIME ): alternate, low-power loop body
Other types of NRG-Loops
    NRG_TRUNCATE_FOR (int i=0; i<N; ++i && NRG_TOT_E <= FOO_ENERGY) {
        // original loop body
    }
Do work until the NRG condition is met (the energy budget is reached), then stop.
    NRG_PROB_PERF_FOR (int i=0; i<N; ++i && NRG_TOT_E <= FOO_ENERGY; PROB_SKIP=0.1) {
        // original loop body
    }
Once the condition is met, do the work 9/10 times (PROB_SKIP=0.1).
    NRG_AUTO_PERF_FOR (int i=0; i<N; ++i && NRG_TOT_E <= FOO_ENERGY) {
        // original loop body
    }
Do work, skipping an estimated number of iterations to exactly match FOO_ENERGY.
NRG-Loop Helpers
Capture the energy, power, and runtime use of any code:
    NRG_AUDIT {
        foo(); // any code here
    } NRG_USAGE (NRG_USAGE_INFO* foo_usage);
    float foo_energy = foo_usage->energy;
    float foo_average_power = foo_usage->average_power;
    float foo_wall_time = foo_usage->wall_time;
Different options to set energy/power budgets:
    NRG_AVG_P <= 50.0               // Watts
    NRG_TOT_E <= foo_energy         // Relative to foo()
    NRG_AVG_P <= 0.5*SYS_MAX_POWER  // Relative to TDP
NRG-Loops Implementation
• No adjustments to the OS; used commodity HW with built-in power meters (Intel RAPL)
  – Measure CPU + cache power + estimate DRAM power
• Implementation *should have* been trivial, but unfortunately wasn't:
  – Socket-level power meters make attributing power to processes tricky
  – Small, overflowing counters
  – To minimize monitoring overhead, we have one monitoring thread even for multithreaded programs
• Profiling power & energy goals adds <1% overhead.
Results: NRG_ADAPT Minesweeper
When advertisements are consuming too much energy, force them to occasionally pause, decreasing net game plus ad energy.
Results: NRG_ADAPT Parallel Programs
Reduce software thread count to keep within a power budget.
[Figure: energy relative to uncapped]
Questions?
• Melanie Kambadur (melaniekambadur@gmail.com)
• Martha Kim (martha@cs.columbia.edu)