June 6, 2007
Using Negative Edge Triggered FFs to Reduce Glitching Power in FPGA Circuits
Tomasz S. Czajkowski and Stephen D. Brown
Department of Electrical and Computer Engineering
University of Toronto, Ontario, Canada
Motivation
Glitches: undesirable logic transitions that occur due to delay imbalance in the logic circuit
Waste power and provide no useful functionality
Can increase the average toggle rate of a net by as much as a factor of 2
Not known precisely until after placement and routing
Can be filtered out by strategically inserting negative edge triggered FFs
Glitches in FPGAs
Caused by unequal arrival times of signals at the inputs of LUTs
Glitches can be propagated through LUTs
[Figure: two cascaded 4-LUTs; a glitch generated at the first LUT is propagated through the second]
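A minimal event-driven sketch of glitch generation, assuming a zero-delay 2-input XOR LUT and hypothetical arrival times (nothing here is taken from the slide's figure). When both inputs rise in the same cycle but arrive at different times, the output rises and then falls again; with balanced arrival times it never moves.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple

Event = Tuple[float, int, int]  # (arrival time, input index, new value)

def simulate_lut(lut: Callable[..., int], init: Sequence[int],
                 events: List[Event]) -> List[Tuple[float, int]]:
    """Return the (time, value) transitions seen at a LUT output.

    The LUT is modeled with zero internal delay (an assumption). All input
    changes arriving at the same instant are applied together before the
    LUT is re-evaluated.
    """
    inputs = list(init)
    out = lut(*inputs)
    transitions = []
    for t, group in groupby(sorted(events), key=lambda e: e[0]):
        for _, idx, value in group:
            inputs[idx] = value
        new_out = lut(*inputs)
        if new_out != out:
            transitions.append((t, new_out))
            out = new_out
    return transitions

def xor(a: int, b: int) -> int:
    return a ^ b

# Both XOR inputs rise in the same cycle, but input 1 arrives 1.5 time units later.
skewed = simulate_lut(xor, (0, 0), [(1.0, 0, 1), (2.5, 1, 1)])
# The same change with balanced arrival times.
balanced = simulate_lut(xor, (0, 0), [(1.0, 0, 1), (1.0, 1, 1)])
print(skewed)    # [(1.0, 1), (2.5, 0)]: two useless transitions (a glitch)
print(balanced)  # []: the output stays at 0
```

The final output value is the same in both cases; only the delay imbalance creates the extra transitions.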
Reducing Glitches
Insert a negative edge triggered FF after a LUT that produces or propagates glitches
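[Figure: a negative edge triggered FF, driven by the clock, inserted between two 4-LUTs; the glitch generated by the first LUT does not reach the second (no glitches)]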
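The filtering effect can be sketched by sampling a glitchy LUT output only at the falling clock edge. This assumes a 50% duty cycle clock with rising edges at multiples of the period and a LUT output that settles before the falling edge; the waveform values are hypothetical. Because the FF samples once per cycle, at most one transition per cycle reaches its fanout.

```python
from typing import List, Tuple

Waveform = List[Tuple[float, int]]  # (time, new value), sorted by time

def value_at(init: int, transitions: Waveform, t: float) -> int:
    """Logic value of the waveform at time t."""
    value = init
    for when, new_value in transitions:
        if when > t:
            break
        value = new_value
    return value

def neg_edge_ff(init: int, transitions: Waveform, period: float,
                n_cycles: int) -> Waveform:
    """Output transitions of a negative edge triggered FF fed by `transitions`.

    Rising edges are assumed at k * period, so falling edges are at
    k * period + period / 2. The FF output is assumed to start at `init`.
    """
    out, result = init, []
    for k in range(n_cycles):
        edge = k * period + period / 2
        sampled = value_at(init, transitions, edge)
        if sampled != out:
            result.append((edge, sampled))
            out = sampled
    return result

# Cycle 0: a glitch (0 -> 1 -> 0) that settles at t = 2.5, before the edge at t = 5.
# Cycle 1: a real 0 -> 1 change with a glitch, settled by t = 13.
lut_out = [(1.0, 1), (2.5, 0), (11.0, 1), (12.0, 0), (13.0, 1)]
ff_out = neg_edge_ff(0, lut_out, period=10.0, n_cycles=2)
print(len(lut_out), "transitions at the LUT output")  # 5
print(ff_out)  # [(15.0, 1)]: one transition, glitches filtered
```

The sampled value only reaches the fanout half a period after the rising edge, so the path from the negative edge triggered FF to the next positive edge triggered FF has just half a period to settle; this is one way the insertion can lengthen the minimum clock period.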
Alternatives
Gated D-latch
Implement a gated D-latch in a LUT
The latch is transparent to its input signal during the latter half of the clock period
Gated LUT
Gate the output of a LUT with the clock input using an AND or an OR gate
Similar effect to a gated D-latch
Can generate glitches too
When implemented, the gated D-latch consumes 50% more power than a FF and twice as much as a gated LUT
Neither alternative is very effective
Background on Dynamic Power
Average Net Dynamic Power Dissipation
$$P_{avg} = \frac{1}{2} V^2 f_{clock} \sum_{i=0}^{\#nets-1} s_i C_i$$
where P_avg is the average power, V is the supply voltage, f_clock is the clock frequency, s_i is the average per-cycle toggle rate of net i, and C_i is the capacitance of net i
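The average power equation, P_avg = (1/2) V^2 f_clock Σ s_i C_i, translates directly into code. A minimal sketch in SI units; the voltage, frequency, toggle rates, and capacitances in the usage line are hypothetical values chosen only to exercise the formula.

```python
from typing import Sequence

def average_dynamic_power(v: float, f_clock: float,
                          toggle_rates: Sequence[float],
                          capacitances: Sequence[float]) -> float:
    """P_avg = 1/2 * V^2 * f_clock * sum_i(s_i * C_i).

    v in volts, f_clock in hertz, capacitances in farads; returns watts.
    toggle_rates[i] is the average number of transitions per cycle of net i.
    """
    if len(toggle_rates) != len(capacitances):
        raise ValueError("need one toggle rate per net capacitance")
    switched = sum(s * c for s, c in zip(toggle_rates, capacitances))
    return 0.5 * v * v * f_clock * switched

# Hypothetical example: 1.2 V, 200 MHz, two nets of 10 fF and 20 fF.
p = average_dynamic_power(1.2, 200e6, [0.5, 1.0], [10e-15, 20e-15])
print(f"{p * 1e6:.2f} uW")  # about 3.6 uW
```

Because P_avg is linear in each s_i, removing glitches from net i saves (1/2) V^2 f_clock C_i times the reduction in that net's toggle rate, so the change in power can be accounted for net by net.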
Power Model
Goal: compute the change in dynamic power dissipation in the logic elements affected by the insertion of a negative edge triggered FF
This requires models of:
Power dissipated by a LUT and a FF
Toggle rate of logic signals (s_i)
Net capacitance (C_i)
LUT Power
The LUT itself dissipates a non-trivial amount of power when its inputs toggle
We look at how the power dissipated by a LUT relates to the frequency of its output transitions
LUT Power Model
FF Power
How much power would it cost to insert a FF into a circuit?
What about the power cost of the alternatives to FFs, namely a gated LUT and a gated D-latch?
Clocked Element Power Comparison
Toggle Rate of Logic Signals
The topic is covered extensively in the literature
Our toggle rate model is based on the concept of transition density [Najm’94] and the work of Anderson and Najm [AN’03]; the latter decomposes transition density into transitions generated by a LUT and transitions propagated through a LUT
The model is modified to include delay information in order to account for glitches
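For background, Najm's original zero-delay transition density propagation states that, for a LUT output y = f(x_1, ..., x_n) with spatially independent inputs, D(y) = Σ_i P(∂y/∂x_i) D(x_i), where the Boolean difference ∂y/∂x_i is 1 exactly when flipping x_i flips y. The sketch below evaluates that formula by enumerating the truth table. It is the delay-free baseline, not the delay-aware model used in this work, and input independence is an assumption.

```python
from itertools import product
from typing import Callable, Sequence

def transition_density(f: Callable[..., int], probs: Sequence[float],
                       densities: Sequence[float]) -> float:
    """Zero-delay transition density of y = f(x_1, ..., x_n).

    probs[i]     -- static probability P[x_i = 1]
    densities[i] -- transition density D(x_i)
    Inputs are assumed to be spatially independent.
    """
    n = len(probs)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        p_sensitized = 0.0  # P(Boolean difference of f w.r.t. x_i is 1)
        for bits in product((0, 1), repeat=n - 1):
            x = [0] * n
            weight = 1.0
            for j, b in zip(others, bits):
                x[j] = b
                weight *= probs[j] if b else 1.0 - probs[j]
            x[i] = 0
            f0 = f(*x)
            x[i] = 1
            f1 = f(*x)
            if f0 != f1:  # flipping x_i flips the output
                p_sensitized += weight
        total += p_sensitized * densities[i]
    return total

def xor2(a: int, b: int) -> int:
    return a ^ b

def and2(a: int, b: int) -> int:
    return a & b

print(transition_density(xor2, [0.5, 0.5], [0.2, 0.3]))   # about 0.5
print(transition_density(and2, [0.5, 0.25], [0.2, 0.4]))  # about 0.25
```

For XOR every input transition reaches the output, which is why XOR-heavy logic propagates glitches so readily.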
Examples of Wires
[Figure: waveforms of the clock and of wires A, B, C, and D]
Wire  P[y]  Pt(y)  P[y'=1 | y=0]  P[y'=0 | y=1]  D(y)  D(y) - Pt(y)
A     ½     1      1              1              1     0
B     ½     ½      ≈0.4           ≈0.4           ½     0
C     1/8   ¼      1/8            1              ¼     0
D     1/8   ¼      1/8            1              ½     ¼
Wire Properties
Static probability, P[y]: probability that a wire assumes the logic value 1 in any given clock cycle
Transition probability, Pt(y): the average number of state transitions per clock cycle, excluding glitches
Low to high transition probability, P[y'=1 | y=0]: probability that a wire will change state to logic value 1, given that it is currently at logic value 0
High to low transition probability, P[y'=0 | y=1]: probability that a wire will change state to logic value 0, given that it is currently at logic value 1
Transition density, D(y): the average number of logic value transitions per clock cycle, including glitches
Average number of glitches per cycle, D(y) - Pt(y): the average number of useless transitions per clock cycle
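These properties can be estimated from a simulated waveform. A sketch, assuming the waveform is summarized per clock cycle as the settled logic value plus the total number of transitions observed in that cycle (glitches included); the function name and dictionary keys are illustrative.

```python
import math
from typing import Dict, Sequence

def wire_properties(values: Sequence[int],
                    toggles: Sequence[int]) -> Dict[str, float]:
    """Estimate the wire properties from per-cycle simulation data.

    values[k]  -- settled logic value of the wire in clock cycle k
    toggles[k] -- transitions observed during cycle k, glitches included
                  (toggles[0] is ignored because there is no previous cycle)
    Conditional probabilities with no conditioning samples are NaN.
    """
    n = len(values)
    if n < 2 or len(toggles) != n:
        raise ValueError("need >= 2 cycles and one toggle count per cycle")
    pairs = list(zip(values[:-1], values[1:]))
    from0 = [nxt for prev, nxt in pairs if prev == 0]
    from1 = [nxt for prev, nxt in pairs if prev == 1]
    pt = sum(prev != nxt for prev, nxt in pairs) / len(pairs)
    d = sum(toggles[1:]) / len(pairs)
    return {
        "P[y]": sum(values) / n,
        "Pt(y)": pt,
        "P[y'=1|y=0]": sum(from0) / len(from0) if from0 else math.nan,
        "P[y'=0|y=1]": (len(from1) - sum(from1)) / len(from1) if from1 else math.nan,
        "D(y)": d,
        "glitches": d - pt,
    }

# A wire that toggles exactly once per cycle, like wire A in the table above.
print(wire_properties([0, 1] * 4, [0] + [1] * 7))
# A wire that stays at 0 but glitches twice in one cycle.
print(wire_properties([0, 0, 0, 0], [0, 2, 0, 0]))
```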
Propagating Glitches Through a LUT
Increase D(z) to account for the glitches that occur on wire y, D(y) - Pt(y). Do so only when x remains at a constant 1 for the duration of the clock cycle.
$$\Delta D(z) = P[x=1] \cdot P[x'=1 \mid x=1] \cdot \left(D(y) - P_t(y)\right)$$
[Figure: a LUT with inputs x and y and output z]
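The propagation rule, an increase in D(z) of P[x=1] · P[x'=1 | x=1] · (D(y) - Pt(y)), can be sketched as a function. The generalization here is an assumption, not taken from the slides: for a 2-input LUT z = f(x, y), glitches on y are counted for every value v of x that makes z sensitive to y (f(v, 0) != f(v, 1)), weighted by the probability that x holds v through the cycle. For an AND gate only x = 1 qualifies, which recovers the rule stated above. Inputs are assumed independent and the numbers are hypothetical.

```python
from typing import Callable

def propagated_glitches(f: Callable[[int, int], int], p_x1: float,
                        p_x_stay1: float, p_x_stay0: float,
                        d_y: float, pt_y: float) -> float:
    """Increase in D(z) for z = f(x, y) caused by glitches on input y.

    p_x1      -- P[x = 1]
    p_x_stay1 -- P[x' = 1 | x = 1], used as "x holds 1 through the cycle"
    p_x_stay0 -- P[x' = 0 | x = 0], used as "x holds 0 through the cycle"
    d_y, pt_y -- D(y) and Pt(y); their difference is the glitch rate of y
    """
    glitches_y = d_y - pt_y
    increase = 0.0
    if f(1, 0) != f(1, 1):  # x = 1 sensitizes z to y (e.g. AND, XOR)
        increase += p_x1 * p_x_stay1 * glitches_y
    if f(0, 0) != f(0, 1):  # x = 0 sensitizes z to y (e.g. OR, XOR)
        increase += (1.0 - p_x1) * p_x_stay0 * glitches_y
    return increase

and2 = lambda x, y: x & y
xor2 = lambda x, y: x ^ y
# Hypothetical statistics: P[x=1] = 0.5, x holds 1 w.p. 0.8, holds 0 w.p. 0.6,
# D(y) = 0.75 and Pt(y) = 0.25, so y carries 0.5 glitches per cycle.
print(propagated_glitches(and2, 0.5, 0.8, 0.6, 0.75, 0.25))  # about 0.2
print(propagated_glitches(xor2, 0.5, 0.8, 0.6, 0.75, 0.25))  # about 0.35
```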
Estimate Error
Net Capacitance
To determine the change in dynamic power dissipation caused by a change in the transition density of a net, we need an estimate of that net's capacitance
Relate net capacitance (not directly available) to net delay (available from the timing report)
Distinguish between nets of different fanout
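The fitted relations shown in the per-fanout slides are not reproduced in this transcript. As a sketch of the idea only, the code below assumes a linear model C ≈ a · delay + b fitted by ordinary least squares separately for each fanout class, and assumes a set of nets with known capacitance is available to fit against; the sample points are synthetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]  # (net delay, net capacitance)

def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of capacitance = a * delay + b."""
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    sdd = sum((d - mean_d) ** 2 for d, _ in points)
    if sdd == 0:
        raise ValueError("need at least two distinct delays per fanout")
    sdc = sum((d - mean_d) * (c - mean_c) for d, c in points)
    a = sdc / sdd
    return a, mean_c - a * mean_d

def fit_capacitance_models(
        samples: Iterable[Tuple[int, float, float]]) -> Dict[int, Tuple[float, float]]:
    """samples: (fanout, delay, capacitance). Returns {fanout: (a, b)}."""
    by_fanout: Dict[int, List[Point]] = defaultdict(list)
    for fanout, delay, cap in samples:
        by_fanout[fanout].append((delay, cap))
    return {fo: fit_line(pts) for fo, pts in by_fanout.items()}

def estimate_capacitance(models: Dict[int, Tuple[float, float]],
                         fanout: int, delay: float) -> float:
    a, b = models[fanout]
    return a * delay + b

# Synthetic samples lying exactly on C = 2d + 1 (fanout 1) and C = 3d + 0.5 (fanout 2).
models = fit_capacitance_models([(1, 1.0, 3.0), (1, 2.0, 5.0), (1, 3.0, 7.0),
                                 (2, 1.0, 3.5), (2, 2.0, 6.5), (2, 4.0, 12.5)])
print(estimate_capacitance(models, 2, 3.0))  # about 9.5
```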
Fanout 1 Net Capacitance
Fanout 2 Net Capacitance
Fanout 3 Net Capacitance
Fanout 4 Net Capacitance
Higher Fanout Net Capacitance
In our benchmark set, fewer than 5% of the nets had a fanout greater than 4 (the clock net is excluded from this calculation)
Approximate the capacitance of a net with fanout n > 4 as:
$$C_n = \left\lfloor \frac{n}{4} \right\rfloor C_4 + C_{n \bmod 4}, \qquad C_0 = 0$$
Not exact, but it reflects the fact that glitches on high-fanout nets are costly
Average estimate error of +22%
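The high-fanout approximation, C_n = ⌊n/4⌋ · C_4 + C_(n mod 4), treats a net of fanout n as a bundle of fanout-4 nets plus one smaller remainder net. A minimal sketch, assuming C_0 = 0 and per-fanout estimates C_1 to C_4 supplied by the caller; the capacitance values in the usage lines are hypothetical.

```python
from typing import Mapping

def high_fanout_capacitance(n: int, c: Mapping[int, float]) -> float:
    """Approximate capacitance of a net with fanout n.

    c maps fanouts 1..4 to their estimated capacitance. For n <= 4 the
    fanout-specific estimate is returned directly; otherwise
    C_n = floor(n / 4) * C_4 + C_(n mod 4), with C_0 = 0.
    """
    if n < 1:
        raise ValueError("fanout must be positive")
    if n <= 4:
        return c[n]
    remainder = n % 4
    return (n // 4) * c[4] + (c[remainder] if remainder else 0.0)

# Hypothetical per-fanout capacitance estimates (arbitrary units).
caps = {1: 10.0, 2: 18.0, 3: 25.0, 4: 31.0}
print(high_fanout_capacitance(6, caps))   # 1 * 31 + 18 = 49.0
print(high_fanout_capacitance(11, caps))  # 2 * 31 + 25 = 87.0
```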
Negative Edge Triggered FF Insertion Algorithm
1. Scan all nets in a logic circuit to determine where negative edge triggered FF insertion can be applied
2. Analyze the resulting set of nets to determine the benefit of applying the optimization to each net (as determined by the cost function)
3. Apply the optimization to the net on which the most power can be saved
4. Repeat until no beneficial choices are found
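The four steps form a greedy loop. A sketch, assuming a hypothetical `cost(net, applied)` callback that returns the cost ∆C of inserting a negative edge triggered FF on `net`, given the nets already modified, with a change accepted only when ∆C < 0. The names are illustrative, and ties between equal costs are broken arbitrarily.

```python
from typing import Callable, Hashable, Iterable, List, Set

def greedy_neg_ff_insertion(
        candidates: Iterable[Hashable],
        cost: Callable[[Hashable, Set[Hashable]], float]) -> List[Hashable]:
    """Return the nets chosen for insertion, in the order they were modified."""
    remaining = set(candidates)  # step 1 output: nets where insertion applies
    applied: Set[Hashable] = set()
    chosen: List[Hashable] = []
    while remaining:
        # Step 2: re-evaluate every remaining candidate, since earlier
        # insertions change the glitches seen in their transitive fanout.
        costs = {net: cost(net, applied) for net in remaining}
        best = min(costs, key=lambda net: costs[net])
        # Step 4: stop once no candidate has a negative (beneficial) cost.
        if costs[best] >= 0:
            break
        # Step 3: apply the optimization where the most power is saved.
        chosen.append(best)
        applied.add(best)
        remaining.discard(best)
    return chosen

# Hypothetical costs; once "a" is modified, "b" no longer pays off.
BASE = {"a": -5.0, "b": -3.0, "c": 2.0, "d": -1.0}

def toy_cost(net: Hashable, applied: Set[Hashable]) -> float:
    if net == "b" and "a" in applied:
        return 1.0
    return BASE[net]

print(greedy_neg_ff_insertion(BASE, toy_cost))  # ['a', 'd']
```

Recomputing all costs after each insertion is the simplest way to capture interactions between nets that share a transitive fanout.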
Cost Function
Compute the change in power: ∆P = (cost of adding a FF) - (power saved on the modified net) - (power saved on nets and LUTs in the transitive fanout of the added FF)
Compute the change in the minimum clock period (∆T) and specify the allowed change (∆Ta)
$$\Delta C = \Delta P \cdot \left(1 - u(\Delta T - \Delta T_a)\right)$$
where u(x) is the unit step function
Accept the change when ∆C < 0
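Read as ∆C = ∆P · (1 - u(∆T - ∆Ta)), the cost function is zero whenever the minimum clock period grows by more than the allowed amount, so such a change is never accepted; otherwise ∆C equals ∆P and the change is accepted only if it saves power. A sketch of that reading, assuming u(0) = 0 and consistent units for all terms; the function names and values are illustrative.

```python
def step(x: float) -> float:
    """Unit step u(x): 1 for x > 0, else 0 (the value at 0 is an assumption)."""
    return 1.0 if x > 0 else 0.0

def delta_power(ff_cost: float, saved_on_net: float,
                saved_in_fanout: float) -> float:
    """dP = cost of the added FF - savings on the net - savings in its fanout."""
    return ff_cost - saved_on_net - saved_in_fanout

def cost(d_power: float, d_period: float, d_period_allowed: float) -> float:
    """dC = dP * (1 - u(dT - dT_allowed))."""
    return d_power * (1.0 - step(d_period - d_period_allowed))

def accept(d_power: float, d_period: float, d_period_allowed: float) -> bool:
    return cost(d_power, d_period, d_period_allowed) < 0

dp = delta_power(ff_cost=0.5, saved_on_net=2.0, saved_in_fanout=1.5)  # -3.0
print(accept(dp, d_period=0.1, d_period_allowed=0.2))  # True: saves power, timing OK
print(accept(dp, d_period=0.3, d_period_allowed=0.2))  # False: period grows too much
```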
Example
[Figure: example circuit in which LUTs fed by some logic network drive three FFs]
Example: Inserted FF
[Figure: the same circuit with a negative edge triggered FF (NegFF) inserted]
Example: Compute change in the # of glitches
[Figure: example circuit with the NegFF inserted]
Example: Compute change in LUT power dissipation
[Figure: example circuit with the NegFF inserted]
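The bookkeeping in this example can be sketched with a deliberately simplified glitch model (an assumption, not the delay-aware model described earlier): each LUT output carries the glitches its own LUT generates plus, for each input, a sensitization factor times that input's glitch rate. Signals coming from FFs are glitch-free, and a net followed by the NegFF presents a glitch-free signal to its fanout. The network, rates, and factors are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# net -> (glitches generated by its LUT, [(input net, sensitization factor)])
# Nets must be listed in topological order; unknown inputs are FF outputs.
Network = Dict[str, Tuple[float, List[Tuple[str, float]]]]

def glitch_rates(network: Network, neg_ff: Set[str]) -> Dict[str, float]:
    """Glitches per cycle on each LUT output net.

    A net in neg_ff reaches its fanout through a negative edge triggered FF,
    so the fanout sees a glitch-free signal (glitch rate 0).
    """
    rate: Dict[str, float] = {}
    for name, (generated, inputs) in network.items():
        propagated = sum(
            sens * (0.0 if src in neg_ff else rate.get(src, 0.0))
            for src, sens in inputs
        )
        rate[name] = generated + propagated
    return rate

network: Network = {
    "n1": (0.4, []),                          # fed by FFs, generates glitches
    "n2": (0.1, [("n1", 1.0)]),               # XOR-like: passes all of n1's glitches
    "n3": (0.0, [("n1", 0.5), ("n2", 0.5)]),
}
before = glitch_rates(network, set())
after = glitch_rates(network, {"n1"})  # NegFF inserted after n1
print(sum(before.values()), sum(after.values()))  # about 1.35 and 0.55
```

Weighting each net's change in glitch rate by its capacitance in the average power equation would give the power saved on nets; the change in LUT power additionally needs the LUT power model, whose fitted curve is not reproduced in this transcript.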
Experimental Results
8 benchmark circuits taken from the QUIP package
Synthesize, place, route, and analyze the timing of each circuit using Quartus II 5.1
Apply the algorithm to reduce glitches in each circuit, aiming to increase the minimum clock period by no more than 5%
Perform timing analysis once the circuit has been modified
Use ModelSIM-Altera 6.0c for simulation, simulating each circuit both pre- and post-modification at the same clock frequency
Use the PowerPlay Power Analyzer to estimate the average dynamic power dissipation of each circuit
Experimental Results
Circuit name | Simulation clock frequency (MHz) | Min. clock period, initial (ns) | Min. clock period, final (ns) | Min. clock period, change (%) | Dynamic power, initial (mW) | Dynamic power, final (mW) | Dynamic power, change (%)
Barrel64* | 200 | 4.386 | 4.806 | 8.74 | 229.94 | 189.7 | -17.50
mux64_16bit | 275 | 3.052 | 3.052 | 0 | 389.24 | 389.24 | 0.00
fip_cordic_rca | 125 | 7.551 | 7.851 | 3.82 | 43.28 | 39.49 | -8.76
oc_des_perf_opt | 290 | 2.989 | 3.07 | 2.64 | 1058.8 | 796.7 | -24.75
oc_video_compression_systems_huffman_enc | 260 | 3.626 | 3.626 | 0 | 94.88 | 95.19 | 0.33
cf_fir_24_8_8 | 170 | 5.375 | 5.71 | 5.87 | 290.41 | 292.9 | 0.84
aes128_fast | 140 | 6.251 | 6.569 | 4.84 | 879.24 | 870.6 | -0.99
rsacypher | 140 | 6.376 | 6.563 | 2.85 | 50.73 | 48.22 | -4.95
Average | | | | +3.6 | | | -7.0
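The averages in the last row can be checked from the per-circuit change columns. The values below are copied from the results table; only the arithmetic is added.

```python
from statistics import mean

# (circuit, min. clock period change %, dynamic power change %) from the table
RESULTS = [
    ("Barrel64*", 8.74, -17.50),
    ("mux64_16bit", 0.0, 0.00),
    ("fip_cordic_rca", 3.82, -8.76),
    ("oc_des_perf_opt", 2.64, -24.75),
    ("oc_video_compression_systems_huffman_enc", 0.0, 0.33),
    ("cf_fir_24_8_8", 5.87, 0.84),
    ("aes128_fast", 4.84, -0.99),
    ("rsacypher", 2.85, -4.95),
]

avg_period = mean(row[1] for row in RESULTS)
avg_power = mean(row[2] for row in RESULTS)
print(f"average period change: {avg_period:+.1f}%")  # +3.6%
print(f"average power change:  {avg_power:+.1f}%")   # -7.0%
```

Recomputing individual rows suggests the power change is taken relative to the initial power, while the clock period change matches (final - initial) / final; for Barrel64, 0.42 / 4.806 ≈ 8.74%.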
Observations (1)
oc_des_perf_opt
A large number of XOR gates is present
Removing glitches from one node removes many glitches on the nodes in its transitive fanout (up to the next FF)
mux64_16bit
The cost function determined that no net was a good candidate for optimization
Very few glitches were present in the circuit, and the power they dissipated was not large enough to warrant the insertion of FFs
Observations (2)
cf_fir_24_8_8
An overestimated toggle rate caused the algorithm to apply negative edge triggered FF insertion excessively
Spatial correlation needs to be included in the toggle rate model
aes128_fast
The toggle rate is 50% higher than in oc_des_perf_opt
Most nets use local LAB connections, which dissipate little power
Insertion of 173 FFs achieved only a 1% power reduction
Saved 35.14 mW in routing alone, because the toggle rate on all affected wires was reduced by 50-70%
Added 24.6 mW due to FF insertion
Added 1.86 mW to the power dissipated by the clock network, because new LABs were connected to the clock network
Net win of 8.68 mW
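The aes128_fast power budget can be checked directly: the routing savings minus the power added by the new FFs and by the clock network give the net win, which is about 1% of the circuit's initial 879.24 mW and consistent with the -0.99% reported in the results table.

$$35.14\ \text{mW} - 24.6\ \text{mW} - 1.86\ \text{mW} = 8.68\ \text{mW}, \qquad \frac{8.68\ \text{mW}}{879.24\ \text{mW}} \approx 0.99\%$$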
Conclusion
Negative edge triggered FF insertion can work well to reduce glitches in a circuit
Computing the glitches propagated to the transitive fanout of a net is important, especially when XOR gates are present
When inserting many negative edge triggered FFs, be mindful of where they go: do the target LABs already have a clock signal routed to them?
Unlike retiming, our approach only needs to ensure that exactly one negative edge triggered FF is on any given combinational path
Retiming may require moving more than a single FF to remain valid
Future Work
A better toggle rate prediction algorithm that includes spatial correlation
FFs that can be negative edge triggered without using an additional LAB clock line would lower the cost of this optimization
Trade-off: silicon area cost vs. frequency of use
Acknowledgement
We’d like to express our gratitude to Altera for funding this research
We’d like to thank Altera Toronto in particular for dedicating some of their time to answer our questions and provide insight throughout the course of this work
Questions?