LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and
Optimization
Harikrishnan K.C.
University of Massachusetts Amherst
1
Overview
• Motivation
• Introduction
• FPGA Architecture
• LOPASS Synthesis Flow
• High-Level Power Estimation
• Power Optimization Engine
• Multiplexer Optimization for Interconnect Reduction
• Experimental Results
• Conclusion
2
Motivation
• Power consumption
  • A critical constraining factor in the IC design flow
• Field Programmable Gate Arrays (FPGAs)
  • Power inefficient due to the large number of transistors needed for programmability
  • Fixed logic and routing resources
  • Difficult to optimize during the physical design stage
3
Introduction
• Behavioral-level optimization
  • Scheduling, allocation, binding
• Techniques for power reduction
  • High-level power estimation
  • Simultaneous scheduling, allocation, and binding for power optimization
  • Interconnect optimization
4
Previous Work
• Most previous high level synthesis techniques for FPGAs optimized objectives other than power reduction
• Dynamic reconfiguration at run time to save area [M. Vasilko, Int. Workshop Logic Architecture Synthesis, 1995]
• Trade-off between power and circuit speed by selecting different implementations of components
  • Power consumption in steering logic and interconnects was not considered [F. G. Wolff, Proc. IEEE Nat. Aerospace Conf., 2000]
• Newer studies have looked into simultaneous resource allocation and binding algorithms for power reduction [D. Chen, Proc. Asia South Pacific Des. Autom. Conf., Jan. 2007]
5
Techniques for Power Reduction
• High-level power estimation
  • Needed for effective power optimization
  • Accounts for wire capacitance, wire length, and FPGA characteristics
• Power optimization engine
  • Combined solution space
  • Simulated-annealing-based algorithm
• Interconnect optimization
  • Reduce the multiplexer (MUX) requirement
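The simulated-annealing search named on this slide can be illustrated generically. This is a minimal sketch, not the LOPASS engine: the cost function (a load-imbalance stand-in rather than estimated power), the move (rebinding one operation to another functional unit), the cooling schedule, and every name below are assumptions made for illustration.

```python
import math
import random

def anneal(initial, cost, neighbor, t_start=10.0, t_end=0.01,
           alpha=0.95, moves_per_t=50, seed=0):
    """Generic simulated annealing; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost

# Hypothetical toy problem: bind 6 operations to 3 functional units.
N_FUS = 3

def toy_cost(binding):
    # Stand-in cost: sum of squared FU loads (lower means better balanced).
    return sum(binding.count(fu) ** 2 for fu in range(N_FUS))

def toy_neighbor(binding, rng):
    # Move: rebind one randomly chosen operation to a random FU.
    new = list(binding)
    new[rng.randrange(len(new))] = rng.randrange(N_FUS)
    return tuple(new)

start = (0, 0, 0, 0, 0, 0)
best, best_cost = anneal(start, toy_cost, toy_neighbor)
print(best, best_cost)
```

In a real flow the cost would come from the high-level power estimator and a move would perturb scheduling, allocation, or binding; the accept/reject rule is the only part this sketch is meant to show.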
6
FPGA Architecture
• SRAM based technology
• Configurable Logic Block (CLB)
  • Basic Logic Element (BLE)
  • Look-Up Table (LUT)
• Routing architecture parameters
  • Channel width (W)
  • Switch box flexibility (Fs)
  • Connection box flexibility (Fc)
7
LOPASS Synthesis Flow
• Design in HDL is converted to a CDFG
• Estimated power values are obtained from the power estimator
• Power is optimized by the low-power optimization engine
• RTL synthesis using Design Compiler
• FPGA evaluation tool fpgaEva_LP2 reports delay, power, and area
8
High Level power Estimation
• Wire length estimation
  • Rent's rule: $T = k N^{p}$
  • Interconnect density function $i(l)$
  • $p$ is Rent's exponent, $\alpha$ is the fraction of sink terminals
  • f.o is the average fan-out, $k$ is the average number of inputs/outputs per CLB
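Rent's rule relates the number of external terminals T of a region to the number of logic blocks N it contains. A minimal sketch of the formula on this slide follows; the function name and the numeric values are hypothetical and chosen only to keep the arithmetic easy to check.

```python
def rent_terminals(n_blocks, k, p):
    """Rent's rule T = k * N**p.

    n_blocks: number of logic blocks N in the region
    k:        average number of inputs/outputs per CLB
    p:        Rent's exponent
    """
    return k * n_blocks ** p

# Hypothetical values: 16 CLBs, 4 I/Os per CLB, Rent exponent 0.5.
t = rent_terminals(16, k=4, p=0.5)
print(t)  # 4 * 16**0.5 = 16.0
```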
9
High Level power Estimation cont.
• Switching activity estimation
  • CDFG simulation
  • $C_{in}(O, O')$: input transitions when an FU switches from $O$ to $O'$
  • The switching activity $S_{in}$ is given by
  • The total switching activity of the overall design
10
High Level power Estimation cont.
• Resource library characterization
  • DesignWare libraries from Synopsys
  • Different resource versions for implementing the same operation type
[Figure: Resource characterization flow]
11
High Level power estimator
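• Static and dynamic power both need to be considered
• Dynamic power is given by
  • $P_{dynamic} = P_{LUT} + P_{REG} + P_{LW} + P_{GW}$
• Static power is given by
  • $P_{static} = P_{s\_LUT} + P_{s\_FF} + P_{s\_LB} + P_{s\_GB}$
• $P_{LUT} = N_{LUT} \cdot S \cdot E_{LUT} \cdot f$
• $P_{REG} = N_{REG} \cdot S \cdot E_{REG} \cdot f$
• $P_{LW}, P_{GW} = 0.5 \cdot f \cdot S \cdot V_{dd}^{2} \cdot C_{wire}$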
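The dynamic power terms on this slide can be evaluated directly. The sketch below assumes that S is a switching activity, E an energy per switching event, f the clock frequency, and Cwire the capacitance of the local or global wires; the slides do not define these symbols, so that reading, the function names, and all numeric values are assumptions.

```python
import math

def lut_power(n_lut, s, e_lut, f):
    # P_LUT = N_LUT * S * E_LUT * f
    return n_lut * s * e_lut * f

def reg_power(n_reg, s, e_reg, f):
    # P_REG = N_REG * S * E_REG * f
    return n_reg * s * e_reg * f

def wire_power(s, vdd, c_wire, f):
    # P_LW or P_GW = 0.5 * f * S * Vdd^2 * C_wire
    return 0.5 * f * s * vdd ** 2 * c_wire

def dynamic_power(n_lut, n_reg, s, e_lut, e_reg, vdd, c_local, c_global, f):
    # P_dynamic = P_LUT + P_REG + P_LW + P_GW
    return (lut_power(n_lut, s, e_lut, f)
            + reg_power(n_reg, s, e_reg, f)
            + wire_power(s, vdd, c_local, f)
            + wire_power(s, vdd, c_global, f))

# Hypothetical design point (not taken from the slides).
p_dyn = dynamic_power(n_lut=100, n_reg=50, s=0.5,
                      e_lut=2e-12, e_reg=1e-12,
                      vdd=1.0, c_local=4e-11, c_global=8e-11, f=1e8)
print(p_dyn)  # watts, under the assumed units (J, V, F, Hz)
```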
12
Power Optimization Engine
13
Multiplexer Optimization for Interconnect Reduction
• Register binding
  • Cofamily-based algorithm
• Port assignment
  • Port assignment algorithm
• Definitions
  • DFG, $G = (V, A)$
  • Compatibility graph $G_c = (V_c, A_c)$
14
Register Binding
• Given a compatibility graph Gc = (Vc,Ac)
• find a subset of Ac that covers all vertices in Vc
• total sum of weights of all edges is minimum
• Calculate minimum weighted cofamilies of a partially ordered set (POSET)
• POSET
  • chain, antichain, k-family, k-cofamily
• Theorem: Register binding on a compatibility graph Gc into k registers is equivalent to finding k disjoint chains in the POSET.
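The link between register binding and chains can be illustrated with a simplified, unweighted version of the problem. The sketch below is not the weighted k-cofamily algorithm: it ignores edge weights and uses maximum bipartite matching on a split graph to cover the POSET with the fewest disjoint chains, where each chain is a set of variables that can share one register. The lifetime-interval model, the compatibility rule, and all names are assumptions.

```python
def min_chain_cover(lifetimes):
    """Cover variables with the fewest chains of mutually compatible variables.

    lifetimes: list of (birth, death) pairs with birth < death.
    Variable i precedes j (they can share a register) if i dies
    no later than j is born.
    """
    n = len(lifetimes)
    succ = [[j for j in range(n)
             if j != i and lifetimes[i][1] <= lifetimes[j][0]]
            for i in range(n)]
    match_right = [-1] * n  # match_right[j] = i means i directly precedes j

    def try_augment(i, seen):
        # Standard augmenting-path step of bipartite matching.
        for j in succ[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_right[j] == -1 or try_augment(match_right[j], seen):
                match_right[j] = i
                return True
        return False

    for i in range(n):
        try_augment(i, set())

    # Rebuild chains: unmatched right-side nodes are chain heads.
    next_of = {i: j for j, i in enumerate(match_right) if i != -1}
    chains = []
    for head in (j for j in range(n) if match_right[j] == -1):
        chain = [head]
        while chain[-1] in next_of:
            chain.append(next_of[chain[-1]])
        chains.append(chain)
    return chains

# Hypothetical variable lifetimes in control steps.
chains = min_chain_cover([(0, 2), (1, 3), (2, 4), (3, 5), (4, 6)])
print(chains)  # each inner list is one register's variables
```

Because the precedence relation is transitive, the number of chains equals the number of variables minus the size of the maximum matching. Minimizing MUX cost as well, as the slides describe, would require weights on the edges and a minimum-cost flow instead of plain matching.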
15
Register Binding cont.
• Find the minimum weighted k-cofamily in the POSET
• Convert the POSET to a network flow graph, the split graph
• Find the minimum-cost flow for this split graph
• Cost of each edge is given by
16
Cost Function Formulation
• A MUX is needed in two situations
  • when more than one register feeds data to the same port
  • when more than one FU produces results and stores them into the same register
• The cost function is defined as
$N_{mux}$ = number of MUXes saved/wasted
$T_{r-f}$ = total connections between registers and fan-out FUs
$T_{fu}$ = total fan-out FUs involved
$\alpha$ and $\beta$ are positive scaling constants
17
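The two MUX situations on this slide can be checked mechanically for a given binding. The sketch below assumes a MUX is required whenever at least two distinct sources drive the same sink (an FU input port or a register input); it only counts MUX inputs and does not implement the slide's cost function. The connection list and all names are hypothetical.

```python
from collections import defaultdict

def mux_inputs(connections):
    """Return {sink: number of distinct sources} for sinks that need a MUX.

    connections: iterable of (source, sink) pairs, e.g. a register
    feeding an FU port or an FU writing a register.
    """
    sources = defaultdict(set)
    for src, sink in connections:
        sources[sink].add(src)
    # Assumption: a sink driven by two or more distinct sources needs a MUX.
    return {sink: len(s) for sink, s in sources.items() if len(s) >= 2}

# Hypothetical datapath connections after binding.
connections = [
    ("r1", "add0.a"), ("r2", "add0.a"),  # two registers feed one FU port
    ("r3", "add0.b"),                    # single source: no MUX
    ("add0", "r1"), ("mul0", "r1"),      # two FUs write one register
    ("mul0", "r2"),                      # single source: no MUX
]
muxes = mux_inputs(connections)
print(muxes)
```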
Port Assignment
• Technique for reducing MUX connections
  • Case 1
  • Case 2
18
Experimental Results
• Power estimation
  • Comparison between estimated power and the values reported by fpgaEva_LP2
  • Estimated wire length deviates from the reported value by 13.7%
  • Estimated total power deviates from the reported value by 14.1%
• Multiplexer optimization
  • Comparison of the k-cofamily algorithm against the bipartite and left-edge algorithms
  • 24.7% better than the bipartite algorithm
  • 29.6% better than the left-edge algorithm
19
Experimental Results
• LOPASS compared to SPARK
  • 9.1% better in terms of latency optimization
• LOPASS compared to Synopsys Behavioral Compiler
  • 57.3% reduction in CLBs
  • 61.6% reduction in total power consumption
  • 10.6% reduction in critical delay
• LOPASS compared to Impulse C
  • On average, 77.1% reduction in multipliers and 27.9% in LEs
  • 44.1% and 31.1% reduction in dynamic and total power, respectively
20
Conclusion
• LOPASS, a low-power architectural synthesis system for FPGA designs, is presented
• It includes three major components
  • a flexible high-level power estimator
  • a simulated-annealing-based optimization engine
  • a k-cofamily-based register binding algorithm
• LOPASS is 61.6% better on power consumption and 10.6% better on clock period compared to Synopsys BC
• LOPASS is 31.1% better on power consumption, with an 11.8% penalty on clock period, compared to Impulse C
21
Thank You!
22