![Page 1: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/1.jpg)
Safe RTL Annotations for Low Power Microprocessor Design
Vinod Viswanath
Department of Electrical and Computer Engineering
University of Texas at Austin
Talk at Tata Institute of Fundamental Research, Mumbai, India.
![Page 2: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/2.jpg)
Outline
• Power Dissipation in Hardware Circuits
• Instruction-driven Slicing to attain lower power dissipation
  – Automatically annotates microprocessor description at the Register Transfer Level and Architectural level
• Correctness of the introduced annotations
• Case studies
![Page 3: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/3.jpg)
Power Dissipation
• Switching activity power dissipation
  – To charge and discharge nodes
• Short circuit power dissipation
  – High only for output drivers, clock buffers
• Static power dissipation
  – Due to leakage current

$$P = \tfrac{1}{2} \cdot C \cdot V_{DD}^2 \cdot f \cdot N + Q_{SC} \cdot V_{DD} \cdot f \cdot N + I_{leak} \cdot V_{DD}$$
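The three terms of the power equation can be evaluated separately to see which one dominates at a given operating point. The sketch below is illustrative only: the function name, the reading of N as a switching-activity factor, and all numeric values are assumptions, not figures from the talk.

```python
def power_terms(c, vdd, f, n, q_sc, i_leak):
    """Return (switching, short_circuit, static) power in watts.

    c      : switched capacitance in farads
    vdd    : supply voltage in volts
    f      : clock frequency in hertz
    n      : switching activity factor (assumed meaning of N)
    q_sc   : short-circuit charge per transition in coulombs
    i_leak : leakage current in amperes
    """
    switching = 0.5 * c * vdd ** 2 * f * n
    short_circuit = q_sc * vdd * f * n
    static = i_leak * vdd
    return switching, short_circuit, static


# Hypothetical operating points, chosen only to keep the arithmetic simple.
base = power_terms(c=1e-9, vdd=1.0, f=1e9, n=0.1, q_sc=1e-11, i_leak=1e-3)
half = power_terms(c=1e-9, vdd=0.5, f=1e9, n=0.1, q_sc=1e-11, i_leak=1e-3)
```

Halving the supply voltage quarters the switching term but only halves the other two. The sketch holds `i_leak` fixed, so it does not model any change in leakage current that accompanies a lower supply voltage.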
![Page 4: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/4.jpg)
Switching Activity Power Dissipation
• Reduce the squared term, $V_{DD}$
  – Leads to an exponential increase in $I_{leak}$
• A host of techniques exists to reduce switching power at the gate level
  – Clock gating
• Far fewer techniques exist at the RTL
  – Use the program structure and dataflow information available at that level of abstraction
![Page 5: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/5.jpg)
Instruction-driven Slice
• An instruction-driven slice of a microprocessor design is
  – All the relevant circuitry of the design required to completely execute a specific instruction
  – Parts of the decode, execute, writeback etc. blocks
• Cone of influence of the semantics of the instruction
![Page 6: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/6.jpg)
Instruction-driven Slicing
• Given a microprocessor design and an instruction
  – Identify the instruction-driven slice
  – Shut off the rest of the circuitry
• This might include
  – Gating out parts of different blocks
  – Gating out floating point units during integer ALU execution
  – Turning off certain FSMs in different control blocks, since exact constraints on their inputs are available due to instruction-driven slicing
![Page 7: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/7.jpg)
Algorithm (High Level)
• Algorithm instruction-driven-slicing. Begin
• Inputs: vRTL (Verilog RTL), insts (instructions)
• Output: aRTL (Annotated RTL)
  – Parse vRTL to obtain the Abstract Syntax Program Graph (ASPG)
  – For each instruction I in insts repeat
    • Slice the ASPG for instruction I
    • Traverse the ASPG
    • Add annotation variables if such a block is found
    • If a particular flop is already gated, then add the current annotation in an optimal fashion
    • Return the annotated ASPG
  – Generate Verilog code (aRTL) for the annotated ASPG
End.
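The slice-then-annotate loop can be sketched on a toy dependency graph. This is not the tool from the talk: the real flow parses Verilog into an ASPG and emits Verilog. Here the design is a hypothetical fan-in dictionary, the slice is computed as plain backward reachability, and "annotation" just records which instruction flags may gate each flop. All signal and instruction names are made up for illustration.

```python
from collections import deque


def cone_of_influence(fanin, roots):
    """Backward reachability: every signal that can affect any root."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        sig = queue.popleft()
        for src in fanin.get(sig, ()):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


def annotate(fanin, flops, slices):
    """Map each flop to the instructions during which it may be gated off.

    slices maps an instruction name to the signals that instruction must update.
    A flop already gated for one instruction simply gains another entry.
    """
    gating = {flop: [] for flop in flops}
    for inst, roots in slices.items():
        keep = cone_of_influence(fanin, roots)
        for flop in flops:
            if flop not in keep:
                gating[flop].append(inst)
    return gating


# Hypothetical design: an integer path and a floating point path.
fanin = {
    "alu_result": ["operand_a", "operand_b"],
    "fpu_result": ["fp_a", "fp_b"],
}
flops = ["operand_a", "operand_b", "alu_result", "fp_a", "fp_b", "fpu_result"]
gating = annotate(fanin, flops, {"ADD": ["alu_result"], "FADD": ["fpu_result"]})
```

In this toy design the floating point flops are marked as gateable during `ADD` and the integer flops during `FADD`. The sketch appends flags in order and makes no attempt at the optimal merging of annotations that the algorithm mentions.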
![Page 8: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/8.jpg)
Instructions as LTL Properties
• Let $I = i_1 \wedge X\,i_2 \wedge XX\,i_3 \wedge \dots \wedge X^{n-1} i_n$ be an instruction written as an LTL property, such that $i_r$ represents the conditions for the instruction $I$ on clock cycle $r$.
• $i_1$ represents the instruction word.
![Page 9: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/9.jpg)
RISC Pipeline (OR1200)
• 5 stage RISC pipeline implementation
• Condition for slicing on ADDC instruction
  – $i_1$: ((icpu_dat_i[31:26] == 6'b111000) ∧ (!rst) ∧ (!flushpipe) ∧ (!if_freeze))
  – $i_2$: (!id_freeze)
  – $i_3$: (!ex_freeze)
  – $i_4$: (!mem_freeze)
  – $i_5$: (!wb_freeze)
• $I = i_1 \wedge X\,i_2 \wedge X^2 i_3 \wedge X^3 i_4 \wedge X^4 i_5$
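One way to read the ADDC property is as a check over a recorded trace: the first condition must hold at some cycle t, the second at t+1, and so on. The sketch below assumes a trace is a list of per-cycle signal dictionaries and that the property is false if the trace ends too early. The helpers `field`, `cycle` and `holds_at` are hypothetical names, and only the signal names and the opcode come from the slide.

```python
def field(word, hi, lo):
    """Extract bits word[hi:lo], Verilog style."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


# Per-cycle conditions i1..i5 for ADDC, transcribed from the slide.
ADDC = [
    lambda s: field(s["icpu_dat_i"], 31, 26) == 0b111000
              and not s["rst"] and not s["flushpipe"] and not s["if_freeze"],
    lambda s: not s["id_freeze"],
    lambda s: not s["ex_freeze"],
    lambda s: not s["mem_freeze"],
    lambda s: not s["wb_freeze"],
]

SIGNALS = ("rst", "flushpipe", "if_freeze", "id_freeze",
           "ex_freeze", "mem_freeze", "wb_freeze")


def cycle(word=0, **asserted):
    """One clock cycle of signal values; control signals are low unless named."""
    state = dict.fromkeys(SIGNALS, False)
    state["icpu_dat_i"] = word
    state.update(asserted)
    return state


def holds_at(conditions, trace, t):
    """True if conditions[k] holds at cycle t + k for every k (finite-trace reading)."""
    if t + len(conditions) > len(trace):
        return False
    return all(cond(trace[t + k]) for k, cond in enumerate(conditions))


addc_word = 0b111000 << 26
clean = [cycle(addc_word)] + [cycle() for _ in range(4)]
stalled = [cycle(addc_word), cycle(), cycle(ex_freeze=True), cycle(), cycle()]
```

With these traces, `holds_at(ADDC, clean, 0)` is true, while the execute-stage freeze in `stalled` makes the property false.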
![Page 10: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/10.jpg)
OR1200 ADDC Instruction
• Introduces five variables:
  – iADDC_if = $i_1$
  – iADDC_id = #1 iADDC_if ∧ $i_2$
  – iADDC_ex = #1 iADDC_id ∧ $i_3$
  – iADDC_mem = #1 iADDC_ex ∧ $i_4$
  – iADDC_wb = #1 iADDC_mem ∧ $i_5$
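The five variables behave like a one-hot token that follows the ADDC instruction down the pipeline. The sketch below simulates them cycle by cycle, assuming that `#1 x` means the value of `x` on the previous clock cycle and that each cycle's conditions are already available as booleans. The function name and the input format are illustrative assumptions.

```python
STAGES = ("if", "id", "ex", "mem", "wb")


def stage_flags(conds_per_cycle):
    """Compute iADDC_<stage> for every cycle.

    conds_per_cycle: iterable of 5-tuples (i1, i2, i3, i4, i5), one per clock cycle.
    Returns one dict per cycle mapping stage name to its flag.
    """
    prev = dict.fromkeys(STAGES, False)
    history = []
    for conds in conds_per_cycle:
        cur = {"if": conds[0]}
        for k in range(1, len(STAGES)):
            # Flag of the previous stage, delayed by one cycle, AND this stage's condition.
            cur[STAGES[k]] = prev[STAGES[k - 1]] and conds[k]
        history.append(cur)
        prev = cur
    return history


# ADDC fetched in cycle 0, no freezes afterwards.
smooth = [(True, True, True, True, True)] + [(False, True, True, True, True)] * 4
# Same, but the execute stage is frozen in cycle 2, so i3 is false there.
frozen = list(smooth)
frozen[2] = (False, True, False, True, True)

active_smooth = [[s for s in STAGES if h[s]] for h in stage_flags(smooth)]
active_frozen = [[s for s in STAGES if h[s]] for h in stage_flags(frozen)]
```

In the `smooth` trace exactly one flag is high per cycle, moving from `if` to `wb`. In the `frozen` trace the token is dropped at the execute stage, which matches the LTL property failing for that trace.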
![Page 11: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/11.jpg)
or1200_ctrl.lsu_op
![Page 12: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/12.jpg)
or1200_ctrl.pre_branch_op
![Page 13: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/13.jpg)
Correct Annotations
• Notion of correctness
  – Original RTL and the annotated RTL should be functionally equivalent under all conditions
• Correctness theorem
(defthm or1200_slicing_correct
  (equal (or1200_cpu n) (or1200_cpu_sliced n)))
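To make the notion of functional equivalence concrete, the sketch below compares a toy two-stage datapath with and without gating by simulating every input sequence up to a small depth. This is bounded exhaustive simulation, not a theorem-prover proof, and it compares observable outputs only, which is an assumption about what equivalence means for this toy. The datapath, its operations and all names are hypothetical.

```python
from itertools import product


def run(inputs, gated):
    """Simulate the toy datapath and return the output seen on each cycle.

    inputs: sequence of (is_fp, int_in, fp_in) tuples, one per clock cycle.
    """
    int_reg = fp_reg = 0
    was_fp = False
    outputs = []
    for is_fp, int_in, fp_in in inputs:
        # Stage 2: the output reads only the register selected on the previous cycle.
        outputs.append(fp_reg * 2 if was_fp else int_reg + 1)
        # Stage 1: the gated design clocks a register only when its unit is selected.
        if not gated or not is_fp:
            int_reg = int_in
        if not gated or is_fp:
            fp_reg = fp_in
        was_fp = is_fp
    return outputs


def equivalent_up_to(depth):
    """True if gated and ungated outputs agree on all input sequences of length <= depth."""
    cycle_inputs = list(product((False, True), range(2), range(2)))
    return all(
        run(seq, gated=False) == run(seq, gated=True)
        for n in range(depth + 1)
        for seq in product(cycle_inputs, repeat=n)
    )
```

Calling `equivalent_up_to(4)` checks 4681 sequences. The gated design skips register updates whose values are never read, so the outputs agree even though the internal register contents can differ.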
![Page 14: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/14.jpg)
ACL2 Theorem Prover
• General-purpose first-order logic theorem prover
• Breaks the theorem down into sub-goals
• Many engines work on the sub-goals and will either prove them or break them down further and add them to the central pool of goals to be proved
• Success story in hardware
  – Verified FDIV in the AMD processors
![Page 15: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/15.jpg)
Proof Methodology
• The RTL is a shallow embedding in ACL2
• Convert Verilog RTL into ACL2RTL
• We have created a large RTL library to recognize as well as analyze ACL2RTL
• Slicing is done on the Verilog code
• Both original and annotated Verilog are converted into ACL2 and we construct the functional equivalence proof in ACL2
![Page 16: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/16.jpg)
Verilog to ACL2
![Page 17: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/17.jpg)
Methodology
• In order to demonstrate our technique
  – We have incorporated instruction-driven slicing as part of the traditional design flow
  – The vRTL model is annotated to obtain the aRTL model
  – Synopsys Design Environment has been modified to accept the aRTL, SPEC2000 benchmarks and power process parameters, and to estimate the power dissipation due to switching activity
  – The annotated Architectural model is fed to the SimpleScalar simulator with the Wattch power estimator to estimate the power dissipation
![Page 18: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/18.jpg)
Methodology
![Page 19: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/19.jpg)
Experiment: OR1200
• We have used our tool-chain to test our methodology on OR1200
  – OR1200 is a pipelined microprocessor implementing the OpenRISC ISA.
  – 5-stage integer pipeline with single instruction issue per cycle
  – We have annotated both the RTL and the architectural models of OR1200
![Page 20: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/20.jpg)
OR1200: single instruction issue pipelined microprocessor
![Page 21: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/21.jpg)
OR1200 Power Gain Results
• Results are shown after annotating the
  – RTL (left) and Architectural (right) models
  – For un-sliced and sliced on 1, 4, 10 instructions
  – For SPECINT2000 benchmarks
• Power dissipation decreases consistently
![Page 22: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/22.jpg)
OR1200 Results (contd.)
• Power gains are consistently good (Fig. 1)
• Power gains far outperform area losses (Fig. 1)
• Flop distribution shown before slicing (Fig. 2a), after slicing on add (Fig. 2b) and after slicing on load (Fig. 2c)
[Figures: Fig. 1, Fig. 2a, Fig. 2b, Fig. 2c]
![Page 23: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/23.jpg)
Experiment: PUMA
• We have used our tool-chain to test our methodology on PUMA
  – PUMA is a dual-issue, out-of-order superscalar, fixed-point PowerPC core
  – We have annotated both the RTL and the architectural models of PUMA
![Page 24: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/24.jpg)
PUMA: a fixed point PowerPC core
![Page 25: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/25.jpg)
PUMA Power Gain Results
• Results are shown after annotating the
  – RTL (left) and Architectural (right) models
  – For un-sliced and sliced on 1, 4, 10 instructions
  – For SPECINT2000 benchmarks
• Power dissipation decreases consistently
![Page 26: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/26.jpg)
PUMA Results (contd.)
• Power gains are good upon slicing for a few instructions (~7) before delay losses start dominating (Fig. 1)
• Power gains far outperform area losses (Fig. 2)
• Flop distribution shown before slicing (Fig. 3a), after slicing on add (Fig. 3b) and after slicing on load (Fig. 3c)
[Fig. 1: PUMA-RTL Power vs. Delay (x-axis: instruction-driven slicing; y-axis: %-age power gain, area loss; series: Power, Delay)]
[Fig. 2: PUMA-RTL Power vs. Area (x-axis: instruction-driven slicing; y-axis: %-age power gain, area loss; series: Power, Area)]
[Fig. 3a, Fig. 3b, Fig. 3c: flop distribution]
![Page 27: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/27.jpg)
Comparing OR1200 and PUMA
![Page 28: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/28.jpg)
Conclusions
• Proposed Instruction-driven Slicing as a new technique to automatically reduce power dissipation
• Implemented the methodology of incorporating instruction-driven slicing into the design flow tool-chain
• Inserting these annotations preserves the functionality of the circuit
![Page 29: Safe RTL Annotations for Low Power Microprocessor Design](https://reader035.vdocuments.site/reader035/viewer/2022062409/56815018550346895dbe002a/html5/thumbnails/29.jpg)
Conclusions (continued)
• This technique seems most applicable to single-issue multi-staged pipelined machines.
• When there are multiple instructions in-flight in the same pipeline stage, the gains of a single-instruction-abstraction are lost.
• Graphics processors and various embedded applications are often better suited to this technique than general-purpose out-of-order superscalars.