Dynamic Power Analysis of Custom Macros
Stephen Bijansky
Bassam Mohd
Baker Mohammad
Outline
• Motivation
• HSIM Power Analysis
• ESP-CV Power Analysis
• ESP-CV Flow
• Results
• Conclusions
Motivation
• Power characterization is an important part of low power design
• Custom macros designed at the transistor level make active power hard to model
– Spice-level simulation is slow
– Characterizing all custom cells is a big task
– A detailed gate-level model is needed to use the ASIC design flow
– Changes at the top level affect macro power
• Power must be re-modeled on an ongoing basis as the stimulus changes
Overview
• Power estimation for custom macros
– Transistor-level schematics
– Post-layout capacitance extraction
• Reduce analysis time
• Improve accuracy for long test cases
• This work is used extensively in Qualcomm’s 45nm low power DSPs
Traditional Approach (.lib)
• Fast SPICE simulator: HSIM
• Assume certain activities on data
• Append power into .lib files
– Conditional statements based on control signals
• Limitation of conditional statements
– Must be mutually exclusive
• Power depends on internal state nodes
– Has the macro just come out of reset, or has it been running for a while?
– Potentially 2^(M+N) entries in the .lib file
Cont. Traditional Approach (HSIM)
• Fast SPICE simulator
• Accuracy within 2% to 3% of HSPICE
• Use HSIM to run the entire power benchmark
• Power benchmark might be thousands of cycles
– Potential for long run time
– Large macros could take days or weeks
• Reduce benchmark to only 100 cycles
– Which 100-cycle window should be used?
– Power estimate could be too large or too small
• Setting initial conditions can be time consuming and error prone
1st Order Power Equation
Power = Activity Factor * Cap * Voltage^2 * Freq
• Capacitance – LPE
• Voltage – Fixed
• Frequency – Fixed
• Activity Factor – Unknown
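As a worked instance of the first-order equation above, the following minimal sketch evaluates it for a single node. The function name and the numeric values (10 fF, 1.0 V, 500 MHz, activity factor 0.2) are illustrative assumptions, not figures from the presentation.

```python
def dynamic_power(activity_factor, cap_farads, vdd_volts, freq_hz):
    """First-order dynamic power in watts: AF * C * V^2 * f."""
    return activity_factor * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical node: 10 fF of capacitance, 1.0 V supply, 500 MHz clock,
# and an assumed activity factor of 0.2.
p = dynamic_power(0.2, 10e-15, 1.0, 500e6)
print(f"{p * 1e6:.3f} uW")
```

With capacitance, voltage, and frequency fixed, the activity factor is the only input that depends on the stimulus.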
ESP-CV Simulation
• Symbolic equivalence checking of schematics vs RTL
• Input to ESP-CV is a standard Verilog testbench
• Use ESP-CV as a Verilog simulator for schematics
• Verilog simulation is orders of magnitude faster than Spice
– Functional simulation
– Only need to determine the activity factor
RC Verilog switch-level simulator
• HSPICE: “Gold standard” for accuracy
• HSIM: “High performance” for accuracy
• ESP-CV: “Functional accuracy”, automated modeling
• VCS: “Extremely fast”, no timing
ESP-CV Simulation
• ESP-CV converts the schematic to switch-level Verilog
– Special directives for transistor strengths
– Internal node names in a custom macro are not in the RTL
– ESP-CV uses the internal nodes in the schematic
• Run entire benchmarks using thousands of cycles
– Same benchmarks used in PT-PX for power estimation of synthesized logic
– Includes reset and initialization
• Fast run time allows running many more benchmarks
Flow Steps
• Inputs to the flow
– Spice netlist for the macro (custom macro design), e.g. xmp d g s b qc_pch l=40e-9 w=120e-9
– VCD on the macro boundaries, taken from the fsdb of top-level VCS sims (full-chip simulation)
– Cap file from Nanotime
• Steps
– Vtran converts the fsdb into a Verilog test bench for the macro interface, e.g. T=0 010101001, T=1 101010101
– ESPCV simulates the Verilog test bench using the *GV file from ESPCV
– VCD dump of all nodes
– Vcd2saif produces the node activity factors (SAIF)
– Power_calc_script combines the node AFs with the cap file
• Output: power value in W (avg, peak, static)
• Integrate the flow with the PTPX chip-level run, e.g. set_annotated_power -internal_power 2.452e-02
RTL Simulation and Testbench Creation
• Entire benchmark is simulated for the top-level design
– Verilog VCS simulation
– Starts from reset, performs initialization, then runs the benchmark
– Single fsdb dump file for each benchmark
• Vtran converts the fsdb dump of the benchmark to a Verilog testbench
– Macro testbench has all of the same inputs as the top-level simulation
Calculate Activity Factor
• Process ESP-CV VCD dump file and calculate an activity factor for each node
• Vcd2saif produces a Switching Activity Interchange Format (SAIF) file
– Time spent at 0/1/Z, number of transitions, …
– Computed for only the window of interest
• Process the SAIF file to get the activity factor for each node
– Activity factor = Transitions / Number of cycles
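The activity-factor step can be sketched as follows. This is a simplified illustration, not the authors' script: the regular expression handles only flat net records of the form `(name (T0 n) (T1 n) (TX n) (TC n) ...)`, the sample text and net names are made up, and the `transitions_per_unit_af` parameter is an assumed convenience. Its default of 1 follows the "Transitions / Number of cycles" formula literally; a value of 2 would make a clock that toggles twice per cycle come out at 100%.

```python
import re

# Matches one simplified SAIF net record and captures the name and counts.
NET_RE = re.compile(
    r"\(\s*(\S+)\s*\(T0\s+(\d+)\)\s*\(T1\s+(\d+)\)\s*\(TX\s+(\d+)\)\s*\(TC\s+(\d+)\)"
)


def activity_factors(saif_text, num_cycles, transitions_per_unit_af=1):
    """Return {net: activity factor} computed from the toggle count (TC)."""
    afs = {}
    for match in NET_RE.finditer(saif_text):
        name = match.group(1)
        toggle_count = int(match.group(5))
        afs[name] = toggle_count / (num_cycles * transitions_per_unit_af)
    return afs


# Hypothetical SAIF fragment covering a 100-cycle window.
SAMPLE = """
(NET
  (clk  (T0 500) (T1 500) (TX 0) (TC 200) (IG 0))
  (bl_0 (T0 900) (T1 100) (TX 0) (TC 30) (IG 0))
)
"""

afs = activity_factors(SAMPLE, num_cycles=100)
for net, af in afs.items():
    print(net, af)
```

A production flow would need a real SAIF parser that tracks instance hierarchy and escaped names; the regular expression here is only enough to show the arithmetic.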
Node Capacitances
• Calibre layout parasitic extraction (LPE)
• Nanotime calculates the total cap of every node
– Reads the Calibre SPEF file
– Adds gate, diffusion, and wire caps
• Qcs_process_cap_rpt.pl
– Converts the Nanotime report to an easy-to-use, column-based text file format
• For nodes, such as bitlines, that do not have a full rail swing, the caps can be scaled
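The presentation does not state how the caps are scaled. One plausible choice, sketched here purely as an assumption, is to scale linearly by the ratio of the node's swing to the supply voltage, on the reasoning that the charge drawn from the supply per charging event is C times the swing. The function name and values are hypothetical.

```python
def effective_cap(cap_farads, swing_volts, vdd_volts):
    """Scale a node cap by its voltage swing relative to the supply.

    Assumes linear scaling (swing / Vdd); the actual scale factor used in
    the original flow is not given in the slides.
    """
    if not 0.0 <= swing_volts <= vdd_volts:
        raise ValueError("swing must be between 0 and Vdd")
    return cap_farads * swing_volts / vdd_volts


# Hypothetical bitline: 50 fF extracted cap, 0.2 V swing on a 1.0 V supply.
c_eff = effective_cap(50e-15, 0.2, 1.0)
print(f"{c_eff * 1e15:.1f} fF")
```

The scaled value can then be used in place of the extracted cap in the first-order power equation.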
Calculate Power
• Qcs_calc_power.pl
– Combines switching activities with the capacitances to compute the power
– Voltage and frequency are fixed
• Output is a text file with the power, activity factor, capacitance, and name for each node
– Easy to sort to determine which nodes use the most power
– Retains hierarchy, so it is easy to filter
– Can be partitioned to determine power on multiple supply nets
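The combination step described above might look like the following sketch. It is not Qcs_calc_power.pl itself; the function names, node names, and numeric values are hypothetical, and filtering by a name prefix is just one simple way to exploit the retained hierarchy.

```python
def node_power_table(afs, caps, vdd_volts, freq_hz):
    """Return [(power_w, af, cap_f, node)] sorted by descending power."""
    rows = []
    for node, af in afs.items():
        cap = caps.get(node)
        if cap is None:
            continue  # no extracted cap for this node, so skip it
        power = af * cap * vdd_volts ** 2 * freq_hz
        rows.append((power, af, cap, node))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def total_power(rows, prefix=""):
    """Sum power over nodes whose hierarchical name starts with prefix."""
    return sum(power for power, _, _, node in rows if node.startswith(prefix))


# Hypothetical activity factors and capacitances (farads).
afs = {"core.dec.wl_0": 0.1, "core.arr.bl_0": 0.3, "core.clk": 1.0}
caps = {"core.dec.wl_0": 20e-15, "core.arr.bl_0": 50e-15, "core.clk": 5e-15}

rows = node_power_table(afs, caps, vdd_volts=1.0, freq_hz=500e6)
for power, af, cap, node in rows:
    print(f"{power:.3e} {af:.2f} {cap:.3e} {node}")
print("array only:", total_power(rows, "core.arr."))
```

Partitioning by supply net would follow the same pattern, with a node-to-supply mapping in place of the name prefix.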
100 Cycle Validation
• Run ESP-CV with the same 100 cycle window that is used for HSIM
• For tests that use more than 1 mW of power, ESP-CV is within 3% of HSIM
• For tests that use less than 1 mW, ESP-CV is within 0.08 mW of HSIM
• ESP-CV has good correlation to HSIM
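The two correlation criteria above can be expressed as a small check. This sketch assumes HSIM is the reference used both for the 1 mW threshold and as the denominator of the relative error; the slides do not say which value is used, nor how a test at exactly 1 mW is treated.

```python
def correlates(esp_mw, hsim_mw, rel_tol=0.03, abs_tol_mw=0.08, threshold_mw=1.0):
    """Return True if an ESP-CV result meets the stated criterion against HSIM.

    Above the threshold the error is judged relative to HSIM (3%);
    at or below it the error is judged in absolute terms (0.08 mW).
    """
    error_mw = abs(esp_mw - hsim_mw)
    if hsim_mw > threshold_mw:
        return error_mw / hsim_mw <= rel_tol
    return error_mw <= abs_tol_mw


# Hypothetical test results in mW.
print(correlates(2.05, 2.0))  # 2.5% relative error
print(correlates(0.55, 0.5))  # 0.05 mW absolute error
```

Switching to an absolute bound for small powers avoids reporting large percentage errors on tests where the macro is nearly idle.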
ESP-CV for Entire Test versus HSIM for 100 Cycles
Results
• 100 cycles do not accurately model an entire test
• Test3 reported 4.7X more power using 100 cycles compared to the entire test
• Test4 reported 55% less power using 100 cycles compared to the entire test
• Difficult to choose a good 100 cycle window
Run Time Comparison
• ESP-CV full test simulations
– Test3 with 49,101 cycles took 406 seconds
– Test4 with 240,510 cycles took 3,267 seconds
– Event-based simulation scales with the number of cycles
• ESP-CV 100-cycle simulations needed 21 seconds
– Not many events in 100 cycles
• HSIM needed between 1,950 seconds (Test5) and 9,468 seconds (Test2) to run 100 cycles
– Large differences in run time with a fixed number of cycles
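To put the run times above on a common footing, the following sketch converts each reported figure into simulated cycles per second. The cycle counts and times are the ones reported above; the dictionary labels are just descriptive names chosen here.

```python
# (cycles simulated, wall-clock seconds) as reported in the run time comparison.
runs = {
    "ESP-CV Test3 (full)": (49_101, 406),
    "ESP-CV Test4 (full)": (240_510, 3_267),
    "ESP-CV 100 cycles": (100, 21),
    "HSIM Test5 (100 cycles)": (100, 1_950),
    "HSIM Test2 (100 cycles)": (100, 9_468),
}

rates = {name: cycles / secs for name, (cycles, secs) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} cycles/s")
```

The full ESP-CV runs come out at roughly 74 to 121 cycles per second, against roughly 0.01 to 0.05 cycles per second for HSIM. The 100-cycle ESP-CV figure is much lower than the full-test rate, which suggests that fixed start-up cost rather than event processing dominates such a short run.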
IR Drop Analysis
• Compute fixed activity factor power for use in Redhawk IR drop analysis
• Every clock node is assigned an activity factor of 100%
• Every non-clock node is assigned an activity factor of 15%, which is 3 transitions per 10 clock cycles
• This is a worst-case analysis used to stress the power grid and find its weak points
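The fixed-activity-factor calculation can be sketched as below, reusing the first-order power equation with 100% for clock nodes and 15% for everything else. The function name, node names, and capacitance values are hypothetical, and how clock nodes are identified in the real flow is not described in the slides; here they are simply passed in as a set.

```python
def fixed_af_power(caps, clock_nodes, vdd_volts, freq_hz,
                   clock_af=1.0, data_af=0.15):
    """Total power in watts with fixed activity factors per node class."""
    total = 0.0
    for node, cap in caps.items():
        af = clock_af if node in clock_nodes else data_af
        total += af * cap * vdd_volts ** 2 * freq_hz
    return total


# Hypothetical node capacitances in farads.
caps = {"clk": 5e-15, "bl_0": 50e-15, "wl_0": 20e-15}
p = fixed_af_power(caps, {"clk"}, vdd_volts=1.0, freq_hz=500e6)
print(f"{p * 1e6:.2f} uW")
```

Because no simulation is needed, this estimate depends only on the extracted capacitances, which makes it cheap to regenerate whenever the layout changes.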
Conclusion
• Simulate an entire benchmark instead of trying to guess at a subset of the benchmark
– The wrong subset led to a 4.7X overestimation of power
– Includes reset and initialization
• Fast simulation enables running more benchmarks
• ESP-CV is being used to generate power estimations of longer benchmarks
Future Work
• Short-circuit power modeling
– The current flow does not address it
• Leakage power modeling
– Active leakage power is not accurately modeled
• Enable other methods to calculate node capacitances
• More calibration on different circuit families
Thank You!
Questions
Backup Slides
Nanotime Capacitance Report
Process Capacitance Report
use List::Util qw(max);

# Assumes CAPFILE is already open on the Nanotime capacitance report.
# Keep the largest C_total reported for each node.
my %nodeCap = ();
while (my $line = <CAPFILE>) {
    if ($line =~ /^NODE : (\S+)/) {
        my $node = $1;
        # The C_total line is the sixth line after the NODE line.
        $line = <CAPFILE> for 1 .. 6;
        if (defined $line && $line =~ /^C_total\s*:\s*(\S+)/) {
            my $ctotal = $1;
            $nodeCap{$node} = max($ctotal, $nodeCap{$node} // 0);
        }
    }
}