Instruction Level Power Analysis
1
2
Layout
Introduction
Components of Power Consumption
Power Characterization
Instruction Level Power Analysis for RISC processors
Extensions for VLIW/EPIC processors
Register Files
Caches
3
Introduction
Why has the power consumption of nano-electronics become so important?
  Moore's law still holds, driven by increasingly complex applications
  Mobile systems: the battery is the "bottleneck"
  High-performance computation: heat extraction
  Operating cost and reliability
    The data warehouse of an ISP with 8000 servers needs 2 MW
4
Introduction
Power or energy? Don't they go hand in hand?
  Power varies significantly with time!
  A given battery holds a fixed amount of energy
Average power consumption = Energy / Execution time
  Determines the average chip and junction temperature
  Determines battery life (if peak current < rated current)
Peak power and current
  Voltage drops, hot spots, rate of battery discharge
Power-efficient, energy-efficient, and battery-efficient design paradigms do exist!
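As a rough illustration of the power/energy distinction on this slide, the sketch below summarizes a sampled power trace. The sample values, the fixed sampling interval, and the function name are invented for illustration.

```python
def summarize_power(samples_w, dt_s):
    """Return (energy_J, average_power_W, peak_power_W) for a power trace.

    samples_w: power samples in watts, one per interval (hypothetical data).
    dt_s: duration of each sampling interval in seconds.
    """
    energy_j = sum(p * dt_s for p in samples_w)   # rectangle-rule integration
    exec_time_s = dt_s * len(samples_w)
    avg_power_w = energy_j / exec_time_s          # average power = energy / execution time
    return energy_j, avg_power_w, max(samples_w)


# Two traces with the same energy and average power but different peaks.
flat = [1.0, 1.0, 1.0, 1.0]
bursty = [0.5, 0.5, 2.5, 0.5]
print(summarize_power(flat, 0.5))    # (2.0, 1.0, 1.0)
print(summarize_power(bursty, 0.5))  # (2.0, 1.0, 2.5)
```

Both traces would drain the same energy from a battery, yet only the bursty one stresses peak current, which is why the two quantities are tracked separately.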
5
Components of Power Consumption
System = hardware platform + software (system and application)
  Software impacts hardware power consumption
Static power
  Sub-threshold leakage and reverse-biased junction leakage
  Quiescent biasing power (in the case of non-CMOS circuits)
Dynamic power
  Charging and discharging of capacitance (switching activity)
  Short-circuit power during transitions (rate of change, delay)
Alternative grouping (used at component/cell level)
  Switching power at the boundaries of cells
  Internal cell power
    Short-circuit power
    Switching power at internal nodes
6
System Abstractions - Power
Functional Specifications and Constraints
System Level Netlist
Register Transfer Level (RTL) Netlist
Component/Cell Level Netlist
Layout or Configuration-bits
Chip
[Figure: the abstraction levels above, annotated with three axes: "Time complexity", "Accuracy of power characterization", "Opportunities for optimization"]
7
Power Characterization
Measurement (chip/board level)
  Most accurate
  Perhaps the fastest, if setup and tools exist
  Too late to change hardware details
  Software/load control is still possible
  Typically used for software optimizations
8
Power Characterization (cont…)
Transistor level (estimation)
  SPICE simulation of the transistor-level netlist
  Most accurate in the simulation world
  Requires complete implementation details
  Unmanageable time complexity even for simpler designs
  Typically used for cell/component characterization
  Synopsys PowerMill (said to provide SPICE-like accuracy)
9
Power Characterization (cont…)
Cell level (estimation)
  Performed after logic synthesis
  Requires an RTL implementation
  Simulation to capture switching activity
    Requires delay simulation if glitches need to be accounted for
  Characterized cells: empirical formulas or table look-up
  Interconnect power
    Either unaccounted for, or
    Estimated with wire load models (typically based on experience), or
    Taken from the extracted layout (if done after physical synthesis)
  Still unmanageable time complexity, especially for use in design space exploration
  Synopsys PrimePower
    Netlist, interconnect capacitance, VCD traces, cell power library
10
Power Characterization (cont…)
Register transfer level (estimation)
  Requires a conceptual RTL description (detailed micro-architecture)
  Data-path is modeled as a netlist of macro cells, which are characterized offline
  Control path and glue logic
    Either unaccounted for or estimated based on I/O
  Simulation to capture switching activity
    Glitches are typically not considered, but methods do exist
  Interconnect power
    Typically unaccounted for, but can be estimated through floor-planning
  Typically used in design space exploration (DSE), mostly with in-house tools
11
System Level Power Estimation
For design space exploration
  Least accurate, but the uncertainty of exploration results can be reduced if the models have good fidelity
Purpose, target architecture, and available system details govern the system-level estimation models
  Selecting an algorithm, or designing hardware for a given algorithm?
  ASIC based or processor based?
  Is the ISA fixed or extensible?
System-level power estimation models are typically specific to a macro-architecture template
Major constituents of power consumption
  Computation, communication, storage units, and peripherals
12
Power Estimation Models
Activity Based Models
Instruction Level Energy Models
13
Activity Based Models
Fixed Activity Model
N-Transition Model
Dual Bit Type Model
14
Fixed Activity Model
$P = \sum_i k_i G_i f_i$
Where:
  $k_i$ = PFA (Power Factor Approximation) proportionality constant, extracted empirically from past designs
  $G_i$ = measure of hardware complexity
  $f_i$ = activation frequency
Disadvantage: does not model the influence of data activity on power consumption
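A minimal sketch of the fixed activity sum, assuming each hardware block is described by a (k, G, f) tuple. The block descriptions and numbers are invented; real constants would come from characterizing past designs.

```python
def fixed_activity_power(blocks):
    """Sum k_i * G_i * f_i over hardware blocks.

    Each block is a (k, G, f) tuple: empirical constant, complexity
    measure, and activation frequency. Units follow whatever the
    characterization used; the values below are made up.
    """
    return sum(k * g * f for k, g, f in blocks)


blocks = [
    (2.0, 16, 10.0),   # hypothetical 16-bit adder
    (0.5, 256, 4.0),   # hypothetical 256-word memory
]
print(fixed_activity_power(blocks))  # 832.0
```

Note that nothing in the inputs describes the data being processed, which is exactly the disadvantage stated on the slide.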
15
N-Transition Model
$P = P_{const} + n \cdot P_{change}$, where n is the number of input transitions
Disadvantage:
  It does not differentiate between transitions on different inputs.
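The sketch below evaluates the N-transition formula, assuming n is counted as the number of input bits that toggle between two consecutive input words. That counting rule, the function name, and the constants are assumptions for illustration.

```python
def n_transition_power(prev_inputs, cur_inputs, p_const, p_change):
    """P = P_const + n * P_change, with n = number of toggled input bits."""
    n = bin(prev_inputs ^ cur_inputs).count("1")
    return p_const + n * p_change


# Two different input changes that each toggle exactly two bits.
low_bits = n_transition_power(0b0000, 0b0011, p_const=1.0, p_change=0.5)
high_bits = n_transition_power(0b0000, 0b1100, p_const=1.0, p_change=0.5)
print(low_bits, high_bits)  # 2.0 2.0
```

Both cases get the same estimate even though different inputs toggled, which illustrates the stated disadvantage.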
16
Dual Bit Type Model
Drawbacks of previous approaches:
  Less accurate
  Characterize the module on the basis of Uniform White Noise (UWN) inputs
  Lead to high error if the input dynamic range does not fully occupy the word length
17
Dual Bit Type Model: The Approach
Combines the reduced complexity of the architecture level with the accuracy of the gate and circuit levels
Black-box model of the capacitance switched in each module for various types of inputs
Easy to parameterize capacitance models to take into account size, etc.
18
Dual Bit Type Model: Modeling Complexity
Power consumed by a module is a function of its complexity, since larger modules contain more circuitry
Example: capacitance of an N-bit ripple-carry subtracter
  $C_T = C_{eff} \cdot N$
Not restricted to linear models; more complex models can also be specified
19
Dual Bit Type Model: Capacitive Data Coefficients
Describe the average amount of capacitance switched within a module during an input transition
The LSB region sees random transitions and hence can be characterized by a single capacitive coefficient $C_{UU}$
The MSB region experiences sign transitions and so is characterized by capacitive sign coefficients $C_{+-}$, $C_{++}$, etc.
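A heavily simplified sketch of how the two kinds of coefficients might be combined for a bit-sliced module. It assumes the boundary between the LSB and sign regions is given, that capacitance scales linearly with the number of bits in each region, and that all coefficient values are invented; it is not the full published model.

```python
def sign(word, width):
    """Return '+' or '-' for a two's-complement word of the given width."""
    return "-" if (word >> (width - 1)) & 1 else "+"


def dbt_switched_capacitance(words, width, n_lsb, c_uu, c_sign):
    """Estimate total capacitance switched by a stream of input words.

    n_lsb bits form the random (UWN) region; the remaining bits form the
    sign region. c_uu is the per-bit coefficient for the LSB region and
    c_sign maps a sign transition such as "+-" to a per-bit coefficient.
    """
    n_msb = width - n_lsb
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        transition = sign(prev, width) + sign(cur, width)
        total += n_lsb * c_uu + n_msb * c_sign[transition]
    return total


# Hypothetical coefficients: sign bits only cost energy when the sign flips.
c_sign = {"++": 0.0, "--": 0.0, "+-": 2.0, "-+": 2.0}
stream = [0x03, 0xFD, 0xFE]   # +3, -3, -2 as 8-bit two's complement
print(dbt_switched_capacitance(stream, 8, 4, 1.0, c_sign))  # 16.0
```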
20
Instruction Level Power Estimation
First introduced to characterize processor power consumption to drive software optimizations
Each instruction is associated with a characteristic current draw
Inter-instruction effects are modeled for better accuracy
21
Instruction Level Power Estimation
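$E = \sum_i (B_i \times N_i) + \sum_{i,j} (O_{i,j} \times N_{i,j}) + \sum_k E_k$
  $B_i$: base energy cost of instruction i, executed $N_i$ times
  $O_{i,j}$: inter-instruction effect energy cost for the pair (i, j), which occurs $N_{i,j}$ times
  $E_k$: additional energy penalties due to resource constraints
Requires a cost for every pair of instructions: $O(N^2)$ entries, where N = number of instructions in the ISA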
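A minimal sketch of evaluating this energy sum over an instruction trace. The instruction names, cost tables, and the choice to treat missing pairs as zero overhead are assumptions for illustration; real tables come from current measurements.

```python
from collections import Counter


def program_energy(trace, base, overhead, penalties=()):
    """E = sum(B_i * N_i) + sum(O_ij * N_ij) + sum(E_k)."""
    n_i = Counter(trace)                       # N_i: executions per instruction
    n_ij = Counter(zip(trace, trace[1:]))      # N_ij: adjacent-pair counts
    e_base = sum(base[i] * n for i, n in n_i.items())
    # Pairs without a table entry are assumed to add no overhead.
    e_inter = sum(overhead.get(pair, 0.0) * n for pair, n in n_ij.items())
    return e_base + e_inter + sum(penalties)


base = {"add": 1.0, "ld": 2.0, "mul": 3.0}            # hypothetical B_i
overhead = {("add", "ld"): 0.5, ("ld", "mul"): 0.25}  # hypothetical O_ij
trace = ["add", "ld", "mul", "add", "ld"]
print(program_energy(trace, base, overhead, penalties=[1.5]))  # 11.75
```

The `overhead` dictionary is where the quadratic cost shows up: a complete table needs one entry per ordered pair of instructions.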
22
JouleTrack
Experiments on StrongARM by Amit Sinha & A. P. Chandrakasan
  Current per instruction ~ 0.2 A (averaged over all instructions)
  Min-max variation is 38% of the average current
  Addressing-mode and data-dependent variation is smaller
  But the maximum current variation across benchmarks is < 8%!
Concluded that the first-order energy model of a given processor is $E = V \cdot I(V, f) \cdot T$
Second-order effects can be significant for data-path dominated processors such as DSPs and VLIWs
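The first-order model can be sketched as a table lookup followed by a product. The operating points, current values, and function name below are invented placeholders, not measured JouleTrack data.

```python
# Hypothetical average currents in amperes, keyed by (V_dd in volts, f in MHz).
AVG_CURRENT_A = {(1.5, 206): 0.2, (1.1, 133): 0.1}


def first_order_energy(v_dd, f_mhz, cycles):
    """E = V * I(V, f) * T, with T derived from the cycle count."""
    current_a = AVG_CURRENT_A[(v_dd, f_mhz)]   # I(V, f), program independent
    exec_time_s = cycles / (f_mhz * 1e6)       # T
    return v_dd * current_a * exec_time_s


# Energy in joules for a program that runs for 206 million cycles.
print(first_order_energy(1.5, 206, 206_000_000))
```

The program enters only through its execution time, which reflects the observation that current varies little across benchmarks.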
23
Instruction Level Power Estimation
Impractical for CISC processors with very large instruction sets
  Higher average instruction energy
  Low energy-per-instruction variance
  Do not consider inter-instruction effects
  Cluster similar instructions as a single class
Exponential storage problem for VLIW architectures
  Pairs of long instructions formed from N operations in a K-wide VLIW: on the order of $N^{2K}$
24
Modified Energy Model for VLIW
Assume independent energy dissipation for the different execution slots
Consider NOP as the base energy
  $E(W) = \sum_n U(w_n \mid w_{n-1}) + m \times p \times S + l \times q \times M$
  $U(w_n \mid w_{n-1}) = U(0 \mid 0) + \sum_k v(w_n^k \mid w_{n-1}^k)$
  $w_n^k$ = operation issued on lane k by instruction $w_n$
Example
  $w_n$ = [ALU NOP NOP NOP], $w_{n-1}$ = [LS NOP ALU NOP]
  $U(w_n \mid w_{n-1}) = U(0 \mid 0) + v(ALU \mid LS) + v(NOP \mid ALU)$
Memory requirement: $O(K \times N^2)$
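The per-lane decomposition can be sketched as follows, reproducing the slide's example. The energy values are invented, and treating a NOP that follows a NOP as adding nothing beyond U(0|0) is an assumption consistent with that example.

```python
NOP = "NOP"


def bundle_energy(cur, prev, u_base, v):
    """U(w_n | w_{n-1}) = U(0|0) + sum over lanes k of v(w_n^k | w_{n-1}^k).

    v maps (current_op, previous_op) to the extra energy over the NOP
    baseline. A NOP following a NOP on a lane contributes nothing.
    """
    total = u_base
    for cur_op, prev_op in zip(cur, prev):
        if cur_op == NOP and prev_op == NOP:
            continue
        total += v[(cur_op, prev_op)]
    return total


u_base = 10.0                                    # hypothetical U(0|0)
v = {("ALU", "LS"): 3.0, ("NOP", "ALU"): 1.0}    # hypothetical v(.|.)
w_prev = ["LS", "NOP", "ALU", "NOP"]
w_cur = ["ALU", "NOP", "NOP", "NOP"]
print(bundle_energy(w_cur, w_prev, u_base, v))   # 14.0
```

Because each lane is looked up separately, the table `v` only needs entries per operation pair rather than per pair of whole long instructions.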
25
Modified Energy Model for VLIW
Cluster similar instructions based on cost
  $\Theta = \{e_1, e_2, \ldots, e_t\}$, where $e_t$ = energy consumption of instruction t
  Partition $\Theta$ into K clusters $(C_1, C_2, \ldots, C_K)$ such that $\sum_j \sum_i (x_{i,j} - c_j)^2$ is minimum
Large number of clusters
  Good accuracy
  Huge number of experiments
Small number of clusters
  Small number of experiments
  High variance between clusters
  Reduced accuracy
Memory requirement: $O(C \times N^2)$
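One common way to approximate the minimum sum of squared distances is Lloyd's k-means iteration, sketched here for one-dimensional energy values. The slides do not say which clustering algorithm was used, so this choice, the seeding rule, and the sample energies are assumptions; $x_{i,j}$ is read as the i-th energy in cluster j and $c_j$ as that cluster's mean.

```python
def cluster_energies(energies, k, iterations=20):
    """Partition 1-D energy values into k clusters with Lloyd's algorithm."""
    values = sorted(energies)
    # Seed centroids with evenly spaced values from the sorted list.
    centroids = [values[(2 * j + 1) * len(values) // (2 * k)] for j in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for e in values:
            nearest = min(range(k), key=lambda j: (e - centroids[j]) ** 2)
            clusters[nearest].append(e)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical per-instruction energies with two obvious groups.
centroids, clusters = cluster_energies([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], k=2)
print(clusters)  # [[0.8, 1.0, 1.2], [4.8, 5.0, 5.2]]
```

Each cluster is then characterized once, using its centroid as the representative cost, which is what reduces the number of experiments.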
26
Limitations of ILPA
Does not provide any insight on the causes of power consumption within the processor core
Does not account for the power consumed in the memory system, which is often dominant
To address the second limitation, power estimation frameworks which integrate processor and memory models are built around instruction set simulators
27
MicroArchitecture ILPA
Pipeline-aware instruction-level energy model
Divide the design into smaller architectural blocks
  Usually the processor's pipeline stages: Fetch, Decode, RF, Execute, WB
$E(w_n \mid w_{n-1}) = \sum_s A_s(w_n \mid w_{n-1}) + I(w_n \mid w_{n-1})$
  $A_s$ = energy consumed per stage s when executing $w_n$ after $w_{n-1}$
  $I(w_n \mid w_{n-1})$ = inter-stage connection energy (pipeline registers + buses)
Provides better insight into power bottlenecks
Smoother energy behaviour than the black-box model
Requires a pipeline-structure-aware ISS
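A small sketch of the per-stage sum, assuming the per-stage energies for one instruction pair have already been obtained from a pipeline-aware simulator. The stage energies and helper names are invented for illustration.

```python
STAGES = ("Fetch", "Decode", "RF", "Execute", "WB")


def pipeline_energy(stage_energy, interconnect_energy):
    """E(w_n | w_{n-1}) = sum_s A_s(w_n | w_{n-1}) + I(w_n | w_{n-1})."""
    return sum(stage_energy[s] for s in STAGES) + interconnect_energy


def hottest_stage(stage_energy):
    """Return the stage with the largest energy contribution."""
    return max(STAGES, key=lambda s: stage_energy[s])


# Hypothetical A_s values for one (w_n, w_{n-1}) pair.
a = {"Fetch": 2.0, "Decode": 1.0, "RF": 1.5, "Execute": 4.0, "WB": 0.5}
print(pipeline_energy(a, interconnect_energy=1.0))  # 10.0
print(hottest_stage(a))                             # Execute
```

Keeping the stage terms separate is what gives the bottleneck insight a black-box per-instruction number cannot.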
28
Energy Models for Register File
Assume linear power behaviour for accesses across different ports
$P_{RF} = P_i + \frac{1}{T} \sum_n (E_{r,n} + E_{w,n})$
$E_{r,n} = \sum_i H(RR_{i,n}, RR_{i,n-1}) \times E_{rb}$
$E_{w,n} = \sum_i H(RW_{i,n}, old_{i,n}) \times E_{wb}$
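The sketch below evaluates the register file formulas under several assumptions not spelled out on the slide: H is taken to be the Hamming distance, $RR_{i,n}$ the value on read port i in cycle n, $RW_{i,n}$ and $old_{i,n}$ the new and overwritten values on write port i, and $P_i$ a data-independent power term. All numbers are invented.

```python
def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def register_file_power(reads, writes, e_rb, e_wb, p_idle, window_s):
    """P_RF = P_i + (1/T) * sum_n (E_r,n + E_w,n).

    reads:  per cycle, the list of values seen on each read port.
    writes: per cycle, a list of (new_value, old_value) per write port.
    e_rb, e_wb: energy per toggled bit on a read or write.
    """
    energy = 0.0
    for n in range(1, len(reads)):
        toggles = sum(hamming(cur, prev) for cur, prev in zip(reads[n], reads[n - 1]))
        energy += toggles * e_rb
    for cycle in writes:
        energy += sum(hamming(new, old) for new, old in cycle) * e_wb
    return p_idle + energy / window_s


reads = [[0b0000, 0b1111], [0b0011, 0b1111], [0b0011, 0b0000]]
writes = [[], [(0b1010, 0b0000)], []]
print(register_file_power(reads, writes, 1.0, 2.0, p_idle=0.5, window_s=4.0))  # 3.0
```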
29
Energy Model for Caches
Power consumption depends on the mode of operation (read, write, idle)
Energy consumed in a given clock cycle is a function of the mode transition between the previous and current cycle
Characterize energy as a function of state transitions (read-read, read-write, etc.)
For a given transition, energy also depends on the transitions on the address lines
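A minimal sketch of a transition-based cache energy model. The transition table, the linear per-bit cost for address-line toggles, and all values are assumptions for illustration.

```python
def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def cache_energy(accesses, e_transition, e_addr_bit):
    """Sum per-cycle energy from mode transitions and address-line toggles.

    accesses: list of (mode, address) per cycle, mode in {"read", "write", "idle"}.
    e_transition: maps (previous_mode, current_mode) to an energy cost.
    e_addr_bit: energy per toggled address bit (a linear assumption).
    """
    total = 0.0
    for (prev_mode, prev_addr), (mode, addr) in zip(accesses, accesses[1:]):
        total += e_transition[(prev_mode, mode)]
        total += hamming(prev_addr, addr) * e_addr_bit
    return total


e_transition = {("idle", "read"): 3.0, ("read", "read"): 2.0, ("read", "write"): 4.0}
accesses = [("idle", 0x00), ("read", 0x0F), ("read", 0x0E), ("write", 0x0E)]
print(cache_energy(accesses, e_transition, e_addr_bit=0.5))  # 11.5
```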
30
Thank You