Power Reduction for FPGA using Multiple Vdd/Vth
Cecille Freeman
Monday, April 3, 2006
References
Fei Li; Yan Lin; Lei He. “Vdd programmability to reduce FPGA interconnect power” in ICCAD 2004. International Conference on Computer Aided Design, 2004, p 760-5.
Fei Li; Yan Lin; Lei He. “FPGA power reduction using configurable dual-Vdd” in Proceedings 2004. Design Automation Conference, 2004, p 735-40.
Fei Li; Yan Lin; Lei He; Jason Cong. “Low-power FPGA using pre-defined dual-Vdd/dual-Vt fabrics” in ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA, v 12, 2004, p 42-50.
Outline
- Introduction
- Pre-defined dual Vdd
  - Dual Vt and dual Vdd structures
  - CAD tool flow
  - Results
- Configurable dual Vdd
  - Structure
  - CAD
  - Results
- Interconnect dual Vdd
  - Structure
  - CAD
  - Results
Introduction
Power consumption
- FPGAs are less power-efficient than ASICs
- Reducing power consumption is important if FPGAs are going to be used in embedded systems
- Previous approaches mostly focus on changing the design implementation
- This is the first “in-depth study” of dual Vdd/Vt techniques for FPGAs
- The technique is fairly common in ASICs
Introduction
Power consumption
- Power loss comes from switching and leakage
- Leakage is dominant in deep submicron (<100 nm)
- Both leakage and switching power are reduced by reducing Vdd
- Leakage is reduced by increasing Vt
- Programmable Vdd/Vt: 40-45% power reduction in ASICs
Introduction
Dynamic Power
- f = clock frequency
- E = effective transition density
- C = load capacitance
- Vdd = supply voltage
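The formula from this slide is not in the transcript. A commonly used form that is consistent with the symbols listed here is given below as an assumption, not as the slide's exact expression; in particular, the factor of 1/2 depends on how the transition density E is defined.

```latex
P_{dyn} = \tfrac{1}{2} \, f \, E \, C \, V_{dd}^{2}
```

The point that matters for the rest of the talk is the quadratic dependence on Vdd.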
Introduction
Leakage Power
- Ilkg = leakage current
- Vdd = supply voltage
- Ilkg increases as Vt decreases
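The formula itself is missing from the transcript. With the two symbols defined on this slide, the usual expression for leakage power would be the following (an assumption about what the slide showed):

```latex
P_{lkg} = I_{lkg} \cdot V_{dd}
```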
Introduction
Dual Vdd theory
- A lower supply voltage is slower, but results in less power loss
- Not all paths in the circuit need to be equally fast
  - Critical paths get high Vdd for speed
  - Non-critical paths get low Vdd to save power
- Makes use of timing slack
Predefined Dual Vdd/Vt
Design in 3 stages
- Determine a good Vdd/Vt scaling from a normal LUT design
- Dual Vt within each LUT
- Dual Vdd across the chip
Predefined Dual Vdd/Vt
Single Vdd/Vt LUT (normal)
- SRAM cell, MUX tree
Predefined Dual Vdd/Vt
Single Vdd/Vt scaling
- Scaling across all LUTs
- Reduction in switching power (quadratic as the supply voltage is reduced)
- Large delay penalties as the supply is reduced
- Examined 3 scaling schemes
  - Constant Vt
  - Fixed Vdd/Vt ratio
  - Constant leakage power
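To make the quadratic dependence concrete, here is a minimal sketch that computes the ratio of switching power at two supply voltages, assuming frequency, transition density, and load capacitance stay fixed. The voltages 1.3 V and 1.0 V are illustrative values chosen for the example, not figures from the talk.

```python
def switching_power_ratio(vdd_new: float, vdd_ref: float) -> float:
    """Switching power at vdd_new relative to vdd_ref (f, E, C held constant)."""
    if vdd_new <= 0 or vdd_ref <= 0:
        raise ValueError("supply voltages must be positive")
    # Switching power scales with the square of the supply voltage.
    return (vdd_new / vdd_ref) ** 2


# Hypothetical example: scaling the supply from 1.3 V down to 1.0 V.
ratio = switching_power_ratio(1.0, 1.3)
print(f"switching power reduced by {(1 - ratio) * 100:.1f}%")  # 40.8%
```

The sketch ignores the delay penalty, which is the other half of the trade-off on this slide.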
Predefined Dual Vdd/Vt
Scaling Vdd to constant leakage is best
Predefined Dual Vdd/Vt
Dual Vt within a single LUT
- SRAM cells can have a high Vt because they are configured at the start and are only read during operation (i.e., no switching delay)
- Increasing Vt increases the time taken to program the FPGA
Predefined Dual Vdd/Vt
- Vt of SRAM set to get a 15X SRAM leakage reduction
  - Increases configuration time by 13%
- MUX (region II): Vdd set using constant-leakage scaling
- Vdd of SRAM set to be the same as the MUX (constant within the LUT)
Predefined Dual Vdd/Vt
High and low Vdd LUTs
- Need a level converter
- Need to determine how the high- and low-voltage LUTs will be placed on the chip
- Need a tool to determine
  - What should be in low Vdd and what should be in high Vdd
  - How the placement and routing should be done
Predefined Dual Vdd/Vt
Level converter
- Basically 2 inverters with a level restore
Predefined Dual Vdd/Vt
FPGA Fabric – 2 choices
Predefined Dual Vdd/Vt
CAD tool
- Assignment of high/low LUTs based on “power sensitivity”
  - The LUT that will give the most power reduction when moved to low Vdd is changed
  - If timing constraints are still met, keep the change; otherwise change it back
- Placement done using simulated annealing, with an extra cost term for matching the high and low LUT assignment
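The assignment loop on this slide can be sketched as a greedy pass over a toy netlist. This is a simplified illustration, not the authors' tool: the `Lut` record, the delay and power numbers, and the fixed (not recomputed) sensitivity ordering are all assumptions, and level-converter delay is ignored.

```python
from dataclasses import dataclass


@dataclass
class Lut:
    name: str
    fanin: list          # names of the LUTs driving this one (empty = primary input)
    delay_high: float    # delay at VddH (hypothetical units)
    delay_low: float     # delay at VddL (slower)
    power_saving: float  # power saved if moved to VddL (stand-in for sensitivity)


def critical_delay(luts, low):
    """Longest path delay; `luts` must be listed in topological order."""
    arrival = {}
    for lut in luts:
        start = max((arrival[f] for f in lut.fanin), default=0.0)
        delay = lut.delay_low if lut.name in low else lut.delay_high
        arrival[lut.name] = start + delay
    return max(arrival.values())


def assign_low_vdd(luts, max_delay):
    """Greedily move LUTs to low Vdd, keeping a move only if timing still holds."""
    low = set()
    for lut in sorted(luts, key=lambda l: l.power_saving, reverse=True):
        low.add(lut.name)                      # tentatively move to VddL
        if critical_delay(luts, low) > max_delay:
            low.discard(lut.name)              # timing violated: change back
    return low


# Toy netlist: a -> b -> d is the critical path, c -> d has slack.
luts = [
    Lut("a", [], 1.0, 1.5, 3.0),
    Lut("b", ["a"], 1.0, 1.5, 2.0),
    Lut("c", [], 1.0, 1.5, 1.0),
    Lut("d", ["b", "c"], 1.0, 1.5, 4.0),
]
print(assign_low_vdd(luts, max_delay=3.0))  # {'c'}
```

With no delay increase allowed, only the LUT on the non-critical path ends up at low Vdd, which is the timing-slack idea from the introduction.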
Predefined Dual Vdd/Vt
Tested on 20 MCNC benchmarks
Dual Vt
- 11.6% power reduction for combinational circuits
- 14.6% power reduction for sequential circuits
Dual Vdd/Vt
- 13.6% combinational, 14.1% sequential
- Not as much as expected: routing and placement issues because the fabric is predefined
Layout
- On average, 75% assigned to low-Vdd LUTs
- No significant difference between fabric layouts
Configurable Dual Vdd/Vt
- Pre-defined did not get good power reduction from dual Vdd because of routing and placement issues
- Solution: make each LUT able to be either a high- or a low-Vdd LUT, so the extra constraint goes away
Configurable Dual Vdd/Vt
Configurable LUT
- Attached by PMOS transistors to both rails
- SRAM configuration bits determine which rail supplies power
- 3 possible configurations
  - VddL, VddH, power gated (both off)
- Configuration bits also determine whether the output goes through a level converter
Configurable Dual Vdd/Vt
Problem: area
- Normally sleep transistors have high Vt, but this means they are larger
- Instead use normal-Vt transistors for the switches
  - Normal Vt gives higher leakage
- Gate boosting
  - When a switch is off, apply a gate voltage one Vt higher than the Vdd at the source
  - Gate boosting is already used in Xilinx devices
Configurable Dual Vdd/Vt
Problem: area
- Apply switches at a coarser granularity
  - One switch configuration per cluster of 10 logic blocks
Problem: leakage from extra SRAM
- SRAM can have high Vt because it is not written during operation
- Vt set to give a 15X leakage reduction over normal, with a 13% increase in configuration time
Configurable Dual Vdd/Vt
FPGA fabric
- Compared a fabric of all programmable blocks to one with a mix of VddH, VddL, and programmable blocks
Configurable Dual Vdd/Vt
CAD tools
- Same as for predefined, except that the matching cost in the placement algorithm now treats programmable blocks as assignable to either high or low LUTs
Configurable Dual Vdd/Vt
Results
- Compared to single-Vdd FPGAs with Vdd optimized for the same target clock frequency
- Full supply programmability
  - Logic power reduction of 35.5%
  - Logic block area increased by 24%
- Partial supply programmability (1/1/3 H/L/P)
  - Logic power reduction of 28.62%
  - Logic block area increased by 14%
- Logic area increase is not very significant when compared to the area of routing
Configurable Interconnect
- Global interconnect power is very high
- Becomes more dominant as power reduction is applied to the logic blocks
- Solution: make the interconnect programmable as well
Configurable Interconnect
- Only a small portion of the interconnect is ever used (avg. 11.9% in their tests)
  - Would be good to power gate the unused portion
- 1 configuration bit
  - VddH, VddL
- 2 configuration bits
  - VddH, VddL, power gated
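One way to picture the two options is as a decoder from configuration bits to a supply state. The bit encodings below are hypothetical (the slides do not give them): the two-bit version assumes one enable bit per power switch, with both switches off meaning power gated.

```python
from enum import Enum


class Supply(Enum):
    VDD_H = "VddH"
    VDD_L = "VddL"
    GATED = "power gated"


def decode_one_bit(select_high: int) -> Supply:
    """1 configuration bit: choose between the two rails, no power gating."""
    return Supply.VDD_H if select_high else Supply.VDD_L


def decode_two_bits(enable_high: int, enable_low: int) -> Supply:
    """2 configuration bits (assumed: one enable per rail switch)."""
    if enable_high and enable_low:
        # Turning both switches on would short the two rails together.
        raise ValueError("both supply switches enabled")
    if enable_high:
        return Supply.VDD_H
    if enable_low:
        return Supply.VDD_L
    return Supply.GATED  # both switches off


print(decode_two_bits(0, 0).value)  # power gated
```

The extra bit costs one more SRAM cell per switch but lets unused routing be turned off entirely.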
Configurable Interconnect
Configuration for routing switches and connection to logic block
Configurable Interconnect
Power considerations for SRAM
- Additional SRAM means additional leakage power
- SRAM is only programmed once before use
- Use the same high-Vt SRAM as for the configurable logic blocks
Delay considerations
- Longer delay through the routing switch
- Delay increase bounded to 6% by properly sizing the tri-state buffer
Configurable Interconnect
CAD tools
- Similar to the tools for the configurable Vdd/Vt
- Use only the fully programmable block fabric
- No placement and routing constraints
Configurable Interconnect
Results
- One-bit configuration (no power gating) = 22.21% power reduction
- Two-bit configuration (power gating) = 50.55% power reduction
  - 56.1% reduction in interconnect power
- Power gating reduces FPGA interconnect power by 32%: many unused routing resources can be gated
Summary
- Using a dual-Vt LUT decreases power by ~13%
- Predefined dual Vdd has very little effect on power because of routing
- Programmable-Vdd logic cells reduce logic power by 28.6% (partial programmability) to 35.5% (full programmability)
- Fully configurable Vdd logic cells and interconnect with power gating reduce power by 50.55%
- Tradeoffs: increase in area, increase in delay, increase in configuration time
Future Work
- Reduction of the SRAM cells required for programmability
- Design of a good power supply network for the chip
Conclusions
- Excellent power reduction overall
- Excellent design if power reduction is a concern: no changes required to the design itself
- Might introduce some timing issues because of extra delay through the chip
- Might be expensive due to the extra area required on the chip
Thanks, Questions?