Power estimation in the algorithmic and register-transfer level
September 25, 2006
Chong-Min Kyung
Software power analysis
• Objective:
  – Compare different programs
  – Select processors
  – Optimize software
• Three levels of granularity (according to execution speed, availability, and accuracy):
  – Source code level
  – Instruction level
  – BFM (Bus Functional Model) level
• Execution performed on
  – 1) Target processor: compile the source code and run it
    • Measure the heat generated to estimate the power?
    • Or monitor execution (with inserted monitoring instructions or extra hardware, both ideally with negligible overhead on power and speed) to count the occurrences of each instruction and compute the total estimated power?
    • Dynamic code can also be handled.
    • Keeping the disturbance caused by the overhead code minimal is the key to accuracy.
• Execution performed on
  – 2) Another processor: run a program that estimates the power consumption, taking the target-compiled code as input data
    • Only the power consumption of static code can be estimated.
  – 3) Simulator:
    • Either at the source code level,
    • Or at the instruction code level (same as ‘Another processor’)
Power estimation of Software
• Simplest approach: energy consumption is proportional to the program execution time.
• Instruction set approach: energy consumption differs for each instruction class (a class of instructions with similar power behavior) and for each class of instruction pair (inter-instruction dependency).
  – Measurement is done by running long loops of the same instruction
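The instruction set approach can be sketched as a table lookup over an instruction trace. This is a minimal illustration, not the model from any particular tool: the class names and all energy values below are made up, and pairs that are not listed are assumed to add no overhead.

```python
from collections import Counter

# Hypothetical base energy per instruction class (nJ); illustrative values only.
BASE_ENERGY_NJ = {"alu": 1.0, "load": 2.5, "store": 2.0, "branch": 1.5}

# Hypothetical extra energy (nJ) when one class directly follows another
# (inter-instruction effect). Unlisted pairs are assumed to cost nothing extra.
PAIR_OVERHEAD_NJ = {
    ("alu", "load"): 0.3,
    ("load", "alu"): 0.4,
    ("store", "branch"): 0.2,
}

def estimate_energy_nj(trace):
    """Sum the base energy of each instruction plus the overhead of each adjacent pair."""
    base = sum(BASE_ENERGY_NJ[cls] for cls in trace)
    pair_counts = Counter(zip(trace, trace[1:]))
    overhead = sum(PAIR_OVERHEAD_NJ.get(pair, 0.0) * count
                   for pair, count in pair_counts.items())
    return base + overhead

# Made-up trace of instruction classes in execution order.
trace = ["load", "alu", "alu", "store", "branch"]
print(estimate_energy_nj(trace))
```

In practice the two tables would be filled from measurements such as the long single-instruction loops mentioned above, with alternating-instruction loops as one plausible way to obtain the pair overheads.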
Power estimation of Software
• Becomes more difficult with more complex processors (multi-threading, out-of-order execution, ...) and memory system architectures (caches, ...)
• Accurate estimation requires software profiling on an ISS (instruction set simulator) with the bus access pattern.
• An estimation model accurate to within 5% was developed for an ARM processor [DAC 99, Simunic, T.; Cycle-accurate simulation of energy consumption in embedded systems]
Algorithmic-level power estimation
• Algorithmic-level power estimation consists of
  – Architecture estimation
  – Activation estimation
  – Power model evaluation
• Architecture estimation by High-Level Synthesis (HLS)
  – Allocation, Scheduling, and Binding (allocation in the narrow sense is ‘unit selection’, where each operation can be performed by more than one unit.)
  – Allocation and Scheduling affect each other.
• HLS considering communication (interconnect)
  – ASB (Allocation, Scheduling, Binding) + floorplanning
  – Cycle time violation check based on wire delay (which is based on wire length estimation)
• (HLS considering interconnect) and power
Target architecture of HLS
• Target architecture of HLS
  – Datapath <- dataflow of CDFG (control/data flow graph)
  – Controller <- dataflow and control flow
  – Clock tree
Target architecture of HLS
• Architecture synthesis =
  – Schedule the operations under timing & resource constraints, and
  – Allocate the required resources (operation units)
    • An operation unit can be an arithmetic module, a logic module, or a memory module.
  – Output of architecture synthesis is
    • A set of operation units
    • Registers
    • Steering logic to transfer data between operation units and registers, and
    • A controller with control signals that steer the MUXes, the operation units (OUs), and the enable signals of the registers
• How to integrate power optimization into HLS?
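To make the interplay of scheduling and allocation concrete, here is a minimal sketch of resource-constrained list scheduling. It is one common textbook technique, not necessarily what a given HLS tool uses. The dataflow graph, the unit types, and the assumption that every operation takes exactly one control step are all hypothetical.

```python
# Hypothetical dataflow graph: operation -> (unit type, predecessor operations).
DFG = {
    "m1": ("mul", []),
    "m2": ("mul", []),
    "m3": ("mul", []),
    "a1": ("add", ["m1", "m2"]),
    "a2": ("add", ["a1", "m3"]),
}

def list_schedule(dfg, units):
    """Assign each operation a control step under the given unit allocation.

    Assumes every operation takes one control step and that `units` provides
    at least one unit of every type used in `dfg`.
    """
    step_of = {}
    step = 0
    while len(step_of) < len(dfg):
        free = dict(units)  # units still available in this control step
        for op, (kind, preds) in dfg.items():
            # An operation is ready once all predecessors finished in earlier steps.
            ready = all(p in step_of and step_of[p] < step for p in preds)
            if op not in step_of and ready and free[kind] > 0:
                step_of[op] = step
                free[kind] -= 1
        step += 1
    return step_of

for allocation in ({"mul": 1, "add": 1}, {"mul": 3, "add": 1}):
    schedule = list_schedule(DFG, allocation)
    print(allocation, "->", max(schedule.values()) + 1, "control steps")
```

Allocating more multipliers shortens the schedule at the cost of more hardware, which is the kind of trade-off a power-aware HLS flow would have to weigh.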
RTL Power Modeling
• RTL power modeling = constructing a model Power = P(X1, X2, ..., Xn) from n model parameters
Issues of RTL power modeling
• Granularity
• Choice of model parameters:
  – Activity model, complexity model, or both?
• Semantics of the model:
  – Cumulative or cycle-accurate?
• How to build and store the model:
  – Top-down or bottom-up?
  – Table or equation?
Model granularity
• Model granularity:
  – Should not be too coarse
    • E.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible
  – Nor too fine
  – FSMD (FSM with datapath) is a reasonable choice, as an RTL design is an interaction of datapath and controller
• Five main components:
  – Controller
  – Register file
  – Bus
  – Memory
  – Functional blocks
Activity model or Complexity model, or both?
• Model parameters:
  – What parameters are to be included in the model?
  – Model parameters must be observable at the RTL
    • $P_{total} = k \sum_i A_i C_i$: the power model is decoupled into two separate models, i.e., an activity model and a capacitance model
• Activity model or complexity model, or both?
  – The complexity model can be just a capacitance model, or it can also include the transistor count to account for the leakage current.
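The decoupled model can be evaluated in a few lines. This sketch assumes the usual dynamic-power form in which k = ½·Vdd²·f, A_i is the average number of transitions per clock cycle, and C_i is the switched capacitance; the slide itself does not define k, and the block values are made up.

```python
def dynamic_power_w(components, vdd, f_clk):
    """P_total = k * sum(A_i * C_i), assuming k = 0.5 * vdd**2 * f_clk.

    components: list of (A_i, C_i) pairs, with A_i the average number of
    transitions per clock cycle and C_i the switched capacitance in farads.
    """
    k = 0.5 * vdd ** 2 * f_clk
    return k * sum(activity * cap for activity, cap in components)

# Hypothetical (activity, capacitance) pairs for three RTL blocks.
blocks = [(0.20, 2e-12), (0.50, 1e-12), (0.05, 8e-12)]
print(dynamic_power_w(blocks, vdd=1.0, f_clk=100e6))
```

Because the two factors are separate, the activity values can come from RTL simulation while the capacitance values come from a complexity model.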
Activity parameters
• RTL activity: an approximation of all intra-clock-cycle activities projected to the relevant clock transition point.
• Main parameters are static and transition probabilities
  – Choose between bit-wise and word-wise probability according to the desired accuracy and speed
    • An n-input, m-output component has (n+m) bit-wise parameters, but only two word-wise parameters
• Additional parameters:
  – Transition density: average switching rate per second
    • Includes non-periodic signals
  – Correlation measures: useful for computing switching power
    • Spatial correlation
    • Temporal correlation = transition probability
  – Entropy: somewhat similar to the transition probability $2p(1-p)$
    • $H(p) = p \log_2(1/p) + (1-p) \log_2(1/(1-p))$
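The bit-wise parameters above can be extracted from a simulated signal trace. The following is a minimal sketch for a single bit; the sample stream is made up, and the function names are illustrative.

```python
import math

def static_probability(bits):
    """Fraction of clock cycles in which the signal is 1."""
    return sum(bits) / len(bits)

def transition_probability(bits):
    """Fraction of consecutive cycle pairs in which the signal toggles."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)

def entropy(p):
    """Binary entropy p*log2(1/p) + (1-p)*log2(1/(1-p)), taking H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Made-up samples of one signal bit, one per clock cycle.
stream = [0, 1, 1, 0, 1, 0, 0, 1]
p = static_probability(stream)
print(p, transition_probability(stream), 2 * p * (1 - p), entropy(p))
```

The expression 2p(1-p) equals the transition probability only when consecutive samples are independent, so a gap between the measured value and 2p(1-p) is a sign of temporal correlation in the signal.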
Complexity parameters
• Capacitance ~ gate count, transistor count, etc.
• The only complexity parameters available at RTL are
  – Width of a component: # of inputs, outputs
  – # of states: applicable to the controller
• Architecture-specific model
  – $k_1 N^2$ for an N×N multiplier
  – $k_2 N$ for a ripple-carry adder
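As a small illustration of the architecture-specific models, the sketch below evaluates both expressions for a few word widths. The coefficient values are hypothetical placeholders; real values would come from characterizing a cell library.

```python
# Hypothetical fitted coefficients in farads; real values come from characterization.
K1_MULT = 4e-15  # capacitance per cell of an N x N multiplier array
K2_ADD = 6e-15   # capacitance per bit slice of a ripple-carry adder

def multiplier_capacitance(n_bits):
    """Complexity model k1 * N^2 for an N x N multiplier."""
    return K1_MULT * n_bits ** 2

def ripple_adder_capacitance(n_bits):
    """Complexity model k2 * N for a ripple-carry adder."""
    return K2_ADD * n_bits

for n in (8, 16, 32):
    print(n, multiplier_capacitance(n), ripple_adder_capacitance(n))
```

Doubling the width doubles the adder estimate but quadruples the multiplier estimate, which is why width is such a useful RTL-visible parameter.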
Model semantics
• Cumulative (average) vs. cycle-accurate:
  – Cumulative power = summation of average (cumulative) power over modules
  – Cycle-accurate power = summation of power over modules for each clock cycle
• Cumulative power is only good enough for tracking battery time, average heat dissipation, etc.
• Cycle-accurate power is needed for IR drop, noise, and reliability (electromigration) analysis.
• Pseudo-cycle-accurate power estimation may be adequate for dynamic power management.
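The difference between the two semantics shows up clearly on a small trace. In this sketch the module names and the per-cycle power values are invented for illustration.

```python
# Hypothetical per-cycle power (mW) of three modules over four clock cycles.
module_power_mw = {
    "controller": [1.0, 1.0, 1.0, 1.0],
    "datapath":   [2.0, 8.0, 2.0, 4.0],
    "memory":     [0.0, 6.0, 0.0, 2.0],
}

def cycle_accurate_power(traces):
    """Total power in each clock cycle: sum over modules, cycle by cycle."""
    return [sum(cycle) for cycle in zip(*traces.values())]

def cumulative_power(traces):
    """Average total power: sum over modules of each module's average power."""
    return sum(sum(trace) / len(trace) for trace in traces.values())

per_cycle = cycle_accurate_power(module_power_mw)
print(per_cycle, max(per_cycle), cumulative_power(module_power_mw))
```

Here the peak cycle draws roughly twice the average, which a cumulative model cannot reveal; that peak is what matters for IR drop and electromigration checks.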
How to build and store the model
• Model construction
  – Top-down: good
    • When the implementation follows some predictable template, e.g., memory
    • When dealing with a new circuit having no measured data available
  – Bottom-up:
    • Can be equation-based
      – A template for the power model is given first,
      – Statistical techniques are used to fit the measured values to the model by adjusting coefficients
• Model storage
  – Equation-based
  – Table-based
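The two storage options can be compared side by side. This is a minimal sketch assuming a one-parameter model (average switching activity between 0 and 1) and a linear template; the coefficients, bin edges, and table entries are all hypothetical.

```python
import bisect

# Equation-based storage: only the coefficients of an assumed template
# P = c0 + c1 * activity are kept (hypothetical values, mW).
COEFFS = (0.5, 4.0)

def power_from_equation(activity):
    c0, c1 = COEFFS
    return c0 + c1 * activity

# Table-based storage: one characterized power value (mW) per activity bin.
BIN_UPPER_EDGES = [0.25, 0.5, 0.75, 1.0]  # hypothetical bin boundaries
BIN_POWER_MW = [1.1, 2.0, 3.1, 4.2]       # hypothetical characterized values

def power_from_table(activity):
    """Return the entry of the first bin whose upper edge is >= activity."""
    index = bisect.bisect_left(BIN_UPPER_EDGES, activity)
    return BIN_POWER_MW[min(index, len(BIN_POWER_MW) - 1)]

print(power_from_equation(0.3), power_from_table(0.3))
```

The equation is compact but only as good as its template, while the table can follow any shape at the cost of storage that grows quickly with the number of parameters.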
Accuracy issue
• Metric: $E = |P_e - P| / \max(P_e, P)$
• Average error
• Standard deviation
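A short sketch of how the metric and its two summary statistics could be computed. It assumes that Pe is the estimated power and P the reference (measured or lower-level simulated) power, which the slide does not spell out; the sample pairs are made up.

```python
from statistics import mean, pstdev

def relative_error(p_est, p_ref):
    """E = |Pe - P| / max(Pe, P), for positive power values."""
    return abs(p_est - p_ref) / max(p_est, p_ref)

# Hypothetical (estimated, reference) power pairs in mW.
pairs = [(9.0, 10.0), (10.0, 8.0), (5.0, 5.0)]
errors = [relative_error(p_est, p_ref) for p_est, p_ref in pairs]

# Average error and (population) standard deviation over the test set.
print(mean(errors), pstdev(errors))
```

Dividing by the larger of the two values keeps E between 0 and 1 and treats over- and under-estimation symmetrically.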
Macro modeling flow
1. Choose model parameters
  – E.g., average switching activity of inputs and/or outputs
2. Design the training set
  – Good coverage, unbiased, resembling actual operating conditions
3. Characterization
  – Run the power-accurate lower-level simulator
    • For example, for RTL training, run a gate-level simulator with good coverage of input/output switching activities
4. Model extraction
  • For equation-based models, run an LMS regression engine
  • For table-based models, merge entries according to the available table space
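The four steps can be walked through end to end for an equation-based macro model. This is only a sketch: `reference_power_mw` is a made-up stand-in for a gate-level power simulation, the model template P = c0 + c1 · activity is an assumption, and ordinary least squares in closed form stands in for the regression engine.

```python
def reference_power_mw(activity):
    """Stand-in for a gate-level power simulation (hypothetical, made-up response)."""
    return 0.4 + 3.0 * activity + 0.5 * activity ** 2

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = c0 + c1 * x, in closed form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1

# Steps 1-2: the model parameter is the average input switching activity,
# and the training set covers its range evenly.
training_activity = [i / 10 for i in range(11)]
# Step 3: characterization with the lower-level (here: stand-in) simulator.
measured = [reference_power_mw(a) for a in training_activity]
# Step 4: model extraction for an equation-based macro model.
c0, c1 = fit_line(training_activity, measured)
print(c0, c1)
```

Because the stand-in response is slightly nonlinear, the fitted line cannot match it exactly, which is the residual error that the accuracy metrics are meant to quantify.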
C. Piguet, ‘Low Power CMOS: technology, logic design and CAD tools’, CRC, until 10/25
• 9/4: introduction
• 9/18: physics and limits of power dissipation in CMOS (2,3)
• 9/20: system-level power analysis and estimation (18)
• 9/25: algorithmic and RTL power estimation (18,19)
• 9/27, 10/2, 10/4, 10/9, 10/11: synthesis for low power circuits and logic blocks (7,8,9,10,13)
• 10/16: driving interconnects for low power (14)
• 10/18: new device candidates (4,5)
• 10/23: ultimate low power logic (16)
• 10/25: robustness of low power logic (17)
• 10/30: low power memory
• 11/1: software for low power
• 11/6: energy recovery circuits
• 11/8: adaptive power supply systems
• 11/13, 15: student presentation
• 11/20, 22: low power DSP
• 11/27: low power design methodology
• 11/29, 12/4, 6, 11, 13: student presentation