4/16/16
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
When Is Energy ≠ Time?: Understanding Energy Scaling with eAudit
Sudhakar Yalamanchili and Eric Anger
School of Electrical and Computer Engineering Georgia Institute of Technology
Scaling Performance and Power
$\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)$
W. J. Dally, Keynote IITC 2012
• Increasing performance requires increasing system scale → parallelism

         Current    Exascale    Growth
Cores    3.1M       1B          ~300x
Power    17.8 MW    20-40 MW    ~1.5-2.5x

  • Scaling parallelism does not affect energy and time in the same way!
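As a rough worked example of Perf = Power × Efficiency, the sketch below computes the efficiency a machine would need in order to sustain a target rate within a power budget. The 10^18 ops/s target is an assumption (the usual reading of "exascale"), the 20 MW and 40 MW budgets are the ends of the power range in the table, and the function name is illustrative only.

```python
def required_efficiency(ops_per_second: float, power_watts: float) -> float:
    """Efficiency (ops per joule) needed to reach a performance target
    within a power budget, rearranged from Perf = Power x Efficiency."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return ops_per_second / power_watts


# Assumed target: 1e18 ops/s within the 20-40 MW range from the table.
EXASCALE_OPS_PER_SECOND = 1e18

for budget_mw in (20, 40):
    eff = required_efficiency(EXASCALE_OPS_PER_SECOND, budget_mw * 1e6)
    print(f"{budget_mw} MW budget -> {eff / 1e9:.1f} Gops/joule")
```

Halving the power budget doubles the efficiency the hardware and software must deliver together, which is why power, not core count, is the binding constraint in the table.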
Energy Scaling vs. Time Scaling
• Developers should understand how both energy and time scale with parallelism
• Relationship between energy scaling and time scaling?
• When are they not the same? Tradeoffs?
• Drive the development of diagnostic tools
HPCCG Mini-app on 32nm Sandy Bridge EP
What causes this gap?
$\text{EnergySpeedup} = \frac{E_1}{E_p}$
$\text{TimeSpeedup} = \frac{T_1}{T_p}$
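A minimal sketch of how the two speedups are computed from a single-core run and a p-core run. The measurement values are hypothetical and only illustrate the kind of gap the slide asks about; they are not taken from the HPCCG experiment.

```python
def speedup(baseline: float, scaled: float) -> float:
    """Ratio of the single-core value to the p-core value (E1/Ep or T1/Tp)."""
    if scaled <= 0:
        raise ValueError("scaled measurement must be positive")
    return baseline / scaled


# Hypothetical measurements: (seconds, joules) on 1 core and on p cores.
t1, e1 = 10.0, 500.0
tp, ep = 2.5, 250.0

time_speedup = speedup(t1, tp)
energy_speedup = speedup(e1, ep)

print(f"TimeSpeedup   = {time_speedup:.2f}")
print(f"EnergySpeedup = {energy_speedup:.2f}")
print(f"Gap (time / energy) = {time_speedup / energy_speedup:.2f}")
```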
Why Is This an Application/Algorithm Problem?
Managing Thermal Capacity: Thermal Coupling
• Significant rise in temperature of the idle component due to thermal coupling and pollution
• CPU cores consume thermal headroom more rapidly (4X faster)
• GPUs sustain power boosts longer!
• Better management can yield 10%-40% gains in measured energy efficiency
• Power management ≠ thermal management
[Figure: Temperature on Core 2 when Core 3 is busy and remaining cores are idle; cores labeled 0 to 3]
I. Paul, S. Manne, M. Arora, W.L. Bircher, S. Yalamanchili, “Cooperative boosting: needy versus greedy power management”, ISCA 2013.
Managing Power Capacity
Dynamic Demand Power Sensitivity → Energy Efficiency [1]
[Figure: % increase in run-time (axis 0% to 50%) for Total, Force, Neighbor, Comm, and Other under CPU DVFS per kernel in miniMD, P-states P0 to P4]
Design Space (BFS) [2]
[1] I. Paul et al., “Coordinated Energy Management in Heterogeneous Processors,” SC13.
[2] A. McLaughlin et al., “A Power Characterization and Management of GPU Graph Traversal,” ADMS 2014.
Need more DVFS states?
Shift in the Balance Point
[Figure, panel 1: axes are Core Frequency (MHz), 200 to 1000; Bandwidth (GB/s), 38, 123, 207, 292; Normalized Performance, 0 to 8]
[Figure, panel 2: axes are # Active CUs, 4 to 44; Bandwidth (GB/s), 38, 151, 264; Normalized Performance, 0 to 12]
Balance plane for performance and energy
I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing Compute and Memory Power in High Performance GPUs,” IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2015.
Relative energy costs of compute and memory access
Relative ops/byte demand of application
Hardware Balance
Up to 36% power savings with a maximum performance loss of 3.6%
Understanding Application-Level Energy Scaling
[Figure: Relative Energy (normalized to the minimum energy), axis 0 to 10 with off-scale values 21.7 and 29, for input data sets eu-2005, italy, and rgg_n_2_18_so; implementations HIPC, LS, SHOC1, SHOC2]
• Challenge: How do we understand the energy implications of our decisions (algorithms, data structures, etc.)?
• Note the variance of energy dissipation across different implementations of the same function
Different implementations of BFS on different input data sets
Courtesy H. Kim
When Is Energy ≠ Time?
You can hide latency but not energy!
Energy Components
• Static Energy
  • Scales with execution time
  • E.g., leakage energy
    • Technology dependent
  • Idle energy consumption (OS, I/O, etc.)
• Dynamic Energy
  • Scales with work
  • Independent of time (strong scaling)
  • Energy required to solve problem
    • Constant under ideal strong scaling
    • Grows under weak scaling
Energy overhead
Energy scaling is limited by the fraction corresponding to static energy
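The split into static and dynamic energy can be sketched as a two-term model: static energy is static power multiplied by run time, and dynamic energy depends only on the work done. The numbers below are hypothetical, and the sketch assumes static power stays constant while the run time shrinks.

```python
def total_energy(static_power_w: float, runtime_s: float,
                 dynamic_energy_j: float) -> float:
    """Total energy = static power x run time + dynamic energy of the work."""
    return static_power_w * runtime_s + dynamic_energy_j


# Hypothetical application: 20 W of static power, 300 J of dynamic energy.
STATIC_POWER_W = 20.0
DYNAMIC_ENERGY_J = 300.0

baseline = total_energy(STATIC_POWER_W, 10.0, DYNAMIC_ENERGY_J)  # 10 s run
faster = total_energy(STATIC_POWER_W, 5.0, DYNAMIC_ENERGY_J)     # 5 s run

print(f"10 s run: {baseline:.0f} J, 5 s run: {faster:.0f} J")
print(f"time speedup 2.00, energy speedup {baseline / faster:.2f}")
```

Halving the run time halves only the static term, so the energy speedup stays below the time speedup whenever dynamic energy is non-zero.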
Amdahl’s View of Energy Scaling
$s = \frac{E_s}{E_s + E_d}$, where $E_s$ is the static energy and $E_d$ is the dynamic energy
Base case is a single core executing a serial algorithm
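[Diagram: four cores, each with its own cache and register file (RF), sharing DRAM]

$S_e = \frac{1}{(1 - s) + \frac{s}{S_t}}$, where $s$ is the static fraction

$S_t = \frac{1}{\frac{1 - f}{p} + f}$, where $f$ is the serial fraction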
• Ideally energy speedup tracks time speedup (no idle energy)

Time efficiency: $\eta_t = \frac{S_t}{p}$
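The time speedup and time efficiency on this slide can be evaluated with a few lines of Python. This is only a sketch of Amdahl's law with the slide's symbols (f for the serial fraction, p for the core count); the function names and the sample values of f and p are illustrative.

```python
def amdahl_time_speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: S_t = 1 / (f + (1 - f) / p)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("need at least one core")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def time_efficiency(serial_fraction: float, cores: int) -> float:
    """Time efficiency: eta_t = S_t / p."""
    return amdahl_time_speedup(serial_fraction, cores) / cores


# Illustrative sweep with a 10% serial fraction.
for p in (1, 2, 4, 8, 16):
    s_t = amdahl_time_speedup(0.1, p)
    print(f"p={p:2d}  S_t={s_t:5.2f}  eta_t={time_efficiency(0.1, p):4.2f}")
```

Time efficiency falls as cores are added because the serial fraction leaves a growing share of the machine idle.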
Impact of Concurrency
• In time, strong scaling is limited by the serial fraction
  • When it is small, large benefit from strong scaling
• In energy, strong scaling is limited by the static fraction
  • Static fraction is a multiplicative penalty in addition to the serial fraction
$S_e = \frac{1}{(1 - f \times s) + p \times f \times s}$
Energy Scaling vs. Time Scaling
• Impact of time on energy scaling is a function of the static fraction
Time_speedup ≥ Energy_speedup ≥ 1.0
Runtime vs. Design Time?
Energy Auditing: eAudit
[Diagram: App to Profile → eAudit (HPCs and process info from the CPUs, Eiger model of the HW) → Energy Profile]
• Application energy auditing tool
  • Function-level attribution
• Diagnose application energy consumption behavior
• Provide actionable information to steer energy optimization
Example output
eAudit Implementation
• Sampling-based measurements, similar to gprof
• RAPL measurements cover all cores on a package together; future versions should expose per-core counters
Martin Dimitrov. https://software.intel.com/en-us/articles/intel-power-governor
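A minimal sketch of package-level energy sampling, assuming a Linux machine where the powercap driver exposes the RAPL package counter at /sys/class/powercap/intel-rapl:0. This is not eAudit's implementation, which may read RAPL through a different interface; the path, the helper names, and the sampling interval are assumptions, and reading the counter may require elevated permissions.

```python
import time
from pathlib import Path

# Assumed sysfs location of the package-0 RAPL domain (Linux powercap).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_energy_uj(rapl_dir: Path = RAPL_DIR) -> int:
    """Read the cumulative package energy counter, in microjoules."""
    return int((rapl_dir / "energy_uj").read_text())


def read_max_range_uj(rapl_dir: Path = RAPL_DIR) -> int:
    """Read the value at which the energy counter wraps around."""
    return int((rapl_dir / "max_energy_range_uj").read_text())


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy used between two samples, allowing for one counter wrap."""
    if after >= before:
        return after - before
    return after + max_range - before


if __name__ == "__main__":
    try:
        max_range = read_max_range_uj()
        previous = read_energy_uj()
        for _ in range(5):                 # five samples, 100 ms apart
            time.sleep(0.1)
            current = read_energy_uj()
            joules = energy_delta_uj(previous, current, max_range) / 1e6
            print(f"package energy in last interval: {joules:.3f} J")
            previous = current
    except OSError as err:
        print(f"RAPL counter not readable on this machine: {err}")
```

Because the counter covers the whole package, each sample has to be attributed to whatever code was running during the interval, which is the reason function-level attribution relies on sampling.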
Measurements: HPCCG
• Model does not include variations to energy due to growth in work
  • Sharing, locking, barriers, etc.
  • Typically a function of # threads
[Figure: HPCCG measurements on 32nm Sandy Bridge EP, with the socket boundary and SMT boundary marked]
eAudit In Action: Effect of Prefetching
• Prefetching decreases time and energy, but not by the same degree
  • Reduction in time → reduction in static energy
  • Speculative memory traffic → increase in dynamic energy
56% Reduction 66% Reduction
32nm Sandy Bridge EP
Need Better Energy Instrumentation!
Scaling Across Sockets
• eAudit demonstrated at board-level
• Next steps:
  • Add network energy models → system-level application energy audit
HPCCG on 32nm Sandy Bridge EP Scaling from 1 socket (6 cores) to 2 sockets (12 cores)
eAudit now
Extension to full system
Name              Package Energy Speedup    Package + Memory Energy Speedup    Time Speedup
HPC_sparsemv      1.03                      1.10                               1.94
waxpby            1.32                      1.39                               2.41
ddot              1.46                      1.48                               2.30
generate_matrix   0.59                      0.62                               1.00
sys               0.14                      0.15                               0.24
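One way to read the table is to ask how much of each function's time speedup shows up as energy speedup. The sketch below computes that ratio from the numbers in the table; the ratio itself and the function name are illustrative choices, not a metric defined in the slides.

```python
# (package energy speedup, package + memory energy speedup, time speedup)
# copied from the table for scaling from 1 socket to 2 sockets.
SPEEDUPS = {
    "HPC_sparsemv": (1.03, 1.10, 1.94),
    "waxpby": (1.32, 1.39, 2.41),
    "ddot": (1.46, 1.48, 2.30),
    "generate_matrix": (0.59, 0.62, 1.00),
    "sys": (0.14, 0.15, 0.24),
}


def energy_to_time_ratio(energy_speedup: float, time_speedup: float) -> float:
    """Fraction of the time speedup that is realized as energy speedup."""
    return energy_speedup / time_speedup


for name, (pkg, pkg_mem, t) in SPEEDUPS.items():
    print(f"{name:16s} package: {energy_to_time_ratio(pkg, t):.2f}  "
          f"package+memory: {energy_to_time_ratio(pkg_mem, t):.2f}")
```

A ratio below 1.0 means the function gained less in energy than in time when the second socket was added.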
Extending eAudit: The lwperf Library
• Enables measurements across regions of interest
• Integration with Eiger
  • For further analysis/modeling
• One measurement per log/stop pair
function foo() {
    lwperf_log("region1");
    // Computation of interest
    lwperf_stop("region1");

    // Other work…

    lwperf_log("region2");
    // Another computation
    lwperf_stop("region2");
    // …
}
Name       Energy (J)    Time (s)
region1    133.2         1.1
region2    552.7         3.3
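The log/stop pairing can be mimicked in Python with a context manager that records one measurement per region. This is an analogy only and does not use the lwperf API: the `region` helper and the `measurements` dictionary are hypothetical, and the sketch records wall-clock time alone, since energy would need a hardware counter.

```python
import time
from contextlib import contextmanager

# Hypothetical store: region name -> list of elapsed times in seconds.
measurements = {}


@contextmanager
def region(name):
    """Time the enclosed block; entering is 'log', leaving is 'stop'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        measurements.setdefault(name, []).append(elapsed)


def foo():
    with region("region1"):
        sum(i * i for i in range(100_000))      # computation of interest
    # other work outside any region is not measured
    with region("region2"):
        sorted(range(100_000), reverse=True)    # another computation


foo()
for name, samples in measurements.items():
    print(f"{name}: {len(samples)} measurement(s), {samples[0]:.6f} s")
```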
Summary
• Application design should take energy behavior into consideration to reach performance goals
• Characterize energy scaling as a function of static and dynamic energy
• Time scaling only improves static energy
• Basis for eAudit, an energy measurement and analysis tool
$S_e = \frac{1}{(1 - s) + \frac{s}{\eta_t}}$, where $s$ is the static fraction
eAudit available: github.com/gtcasl/eaudit
Energy Scaling Model
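Energy with p cores (scaling static power with time):
$E_p = (E_s \times p) \times \frac{T_p}{T_1} + E_d$
$E_p = \frac{E_s}{\eta_t} + E_d$

Time and Energy Efficiency:
$\eta_t = \frac{T_1}{T_p \times p}$
$\eta_e = \frac{E_d}{E_s + E_d}$

Energy Speedup:
$S_e = \frac{1}{\frac{1 - \eta_e}{\eta_t} + \eta_e}$
$s = 1 - \eta_e$

Base case is a single core executing a serial algorithm
[Diagram: four cores, each with its own cache and register file (RF), sharing DRAM]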
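The model on this slide can be checked numerically: computing the energy speedup directly as E_1 / E_p should agree with the closed form in terms of η_e and η_t. The sketch below uses the slide's symbols; the function names and the sample energies and times are hypothetical.

```python
def time_efficiency(t1: float, tp: float, p: int) -> float:
    """eta_t = T_1 / (T_p * p)."""
    return t1 / (tp * p)


def energy_efficiency(e_static: float, e_dynamic: float) -> float:
    """eta_e = E_d / (E_s + E_d), measured on the single-core base case."""
    return e_dynamic / (e_static + e_dynamic)


def energy_with_p_cores(e_static: float, e_dynamic: float, eta_t: float) -> float:
    """E_p = E_s / eta_t + E_d."""
    return e_static / eta_t + e_dynamic


def energy_speedup(eta_e: float, eta_t: float) -> float:
    """S_e = 1 / ((1 - eta_e) / eta_t + eta_e)."""
    return 1.0 / ((1.0 - eta_e) / eta_t + eta_e)


# Hypothetical base case: 40 J static, 60 J dynamic, 100 s on one core,
# and 30 s on 4 cores.
e_s, e_d = 40.0, 60.0
eta_t = time_efficiency(100.0, 30.0, 4)
eta_e = energy_efficiency(e_s, e_d)

direct = (e_s + e_d) / energy_with_p_cores(e_s, e_d, eta_t)
closed_form = energy_speedup(eta_e, eta_t)
print(f"eta_t={eta_t:.3f}  eta_e={eta_e:.3f}  "
      f"S_e direct={direct:.4f}  S_e closed form={closed_form:.4f}")
```

In this model the energy speedup reaches 1.0 only when the time efficiency is 1.0, because static power is assumed to grow with the number of cores.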