![Page 1: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/1.jpg)
Resource Awareness FPGA Design Practices for
Reconfigurable Computing: Principles and Examples
Wu, Jinyuan
Fermilab, PPD/EED
April 2007
![Page 2: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/2.jpg)
Introduction
• Short Course (1/2 day):
– “How to Design Compact FPGA Functions: Resource awareness design practices.”
– http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/CompactFPGAdesign.pdf
• Refresher Course (45 min):
– “Resource Saving in Micro-Computer Software & FPGA Firmware Designs”
– http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/ResourceSaving.ppt
• This Document:
– Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples
What can be done with an FPGA?
![Page 3: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/3.jpg)
Example: ADC Using FPGA
[Figure: block diagram. AMP & Shaper channels feed either ADC chips followed by an FPGA, or an FPGA directly, where TDC blocks digitize them. FPGA outputs drive a passive RC network (R1, R1, C, R2) that generates VREF.]
• Analog signals from AMP & Shapers are directly fed to FPGA pins.
• FPGA outputs and passive RC network are used to generate ramping reference voltage VREF.
• The input voltages and VREF are compared using FPGA differential input receivers.
• The times of transitions representing input voltage values are digitized by TDC blocks in FPGA.
[Figure: timing diagram in which the input voltages V1 to V4 appear as the transition times T1 to T4.]
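The conversion principle described in the bullets (compare the input with a ramping VREF, then digitize the transition time) can be sketched in a few lines of Python. This is only an illustration, not the firmware behind the slide: it assumes an idealized linear ramp (a real RC network gives a curved ramp that would need calibration), and the names `ramp_voltage`, `crossing_bin`, `bin_to_voltage` and all ramp parameters are hypothetical. Only the TDC bin width is taken from the slides.

```python
# Hypothetical ramp parameters, chosen only for illustration.
V_START = 1.0              # VREF at the start of the ramp (V)
V_END = 2.5                # VREF at the end of the ramp (V)
RAMP_TIME_NS = 1000.0      # duration of the ramp (ns)
TDC_LSB_NS = 1000.0 / 1440.0   # TDC bin width: 1 / 1.44 GHz, about 0.69 ns


def ramp_voltage(t_ns):
    """Idealized linear VREF as a function of time since the ramp started."""
    return V_START + (V_END - V_START) * t_ns / RAMP_TIME_NS


def crossing_bin(v_in):
    """Emulate comparator plus TDC: first time bin in which VREF >= v_in."""
    n_bins = int(RAMP_TIME_NS / TDC_LSB_NS)
    for k in range(n_bins + 1):
        if ramp_voltage(k * TDC_LSB_NS) >= v_in:
            return k
    raise ValueError("input voltage is outside the ramp range")


def bin_to_voltage(k):
    """Convert a digitized transition time (in TDC bins) back to a voltage."""
    return ramp_voltage(k * TDC_LSB_NS)


if __name__ == "__main__":
    for v in (1.2, 1.75, 2.0):
        k = crossing_bin(v)
        print(v, k, round(bin_to_voltage(k), 4))
```

With these assumed numbers one TDC bin corresponds to roughly 1 mV of ramp, so the reconstructed voltage is never more than about 1 mV above the input.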
![Page 4: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/4.jpg)
TDC Inside FPGA
[Figure: TDC block diagram. Stages: multiple sampling with four clock phases (c0, c90, c180, c270), clock domain changing, transition detection & encode, and a coarse time counter. Signal labels: Q0 to Q3, QD, QE, QF, DV, T0, T1, TS.]
• Sampling rate: 360 MHz x4 phases = 1.44 GHz.
• LSB = 0.69 ns.
• Logic elements with critical timing are assigned as shown.
• Logic elements with non-critical timing are freely placed by the fitter of the compiler.
[Figure: placement of the logic elements on the device, labeled "4Ch".]
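The numbers on this slide follow from simple arithmetic: four samples per 360 MHz clock period give a 1.44 GHz sampling rate and a bin width of about 0.69 ns. The sketch below reproduces that arithmetic and shows one plausible way to combine a fine phase index with a coarse counter. The encoding details (`detect_transition`, `hit_time_ns`, and the choice of a 0 to 1 transition) are assumptions made for illustration; the slide only names the blocks.

```python
CLOCK_MHZ = 360                          # from the slide
PHASES = 4                               # c0, c90, c180, c270
LSB_NS = 1000.0 / (CLOCK_MHZ * PHASES)   # 1 / 1.44 GHz, about 0.69 ns


def detect_transition(prev_last, q):
    """Return the phase index (0..3) of the first 0 -> 1 transition.

    prev_last is the last sample of the previous clock cycle and q holds
    the four samples (Q0..Q3) taken in the current cycle. Returns None
    when no rising transition is found.
    """
    previous = prev_last
    for phase, bit in enumerate(q):
        if previous == 0 and bit == 1:
            return phase
        previous = bit
    return None


def hit_time_ns(coarse_count, fine_phase):
    """Combine the coarse counter (clock cycles) with the fine phase index."""
    return (coarse_count * PHASES + fine_phase) * LSB_NS


if __name__ == "__main__":
    fine = detect_transition(0, (0, 0, 1, 1))
    print(round(LSB_NS, 3), fine, round(hit_time_ns(10, fine), 2))
```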
![Page 5: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/5.jpg)
ADC Test: Waveform Digitization on BD3_19
[Figure: waveform digitization plots. Panel titles: Raw Data; Converted; Input Waveform, Overlap Trigger & Reference Voltage. Axes: V versus t (ns). Curves: Leading Ramp, Trailing Ramp. Test circuit: FPGA with two TDC inputs and an RC network (component values 50, 50, 100 and 1000 pF) generating VREF.]
A lot can be done with an FPGA if one can imagine it.
![Page 6: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/6.jpg)
Micro-computing vs. Reconfigurable Computing
• In a microprocessor, users specify a program that runs on fixed logic circuits.
• In an FPGA, users specify the logic circuits (as well as the program).
• FPGA computing need not follow microprocessor architectures, although useful experience can be borrowed from them.
• The usefulness of FPGA reconfigurable computing has yet to be fully appreciated.
Example: (100+3-4)*5+7 = ?
[Figure: the expression evaluated with the data 100, 3, 4, 5, 7 and the control operations LD, (+), (-), (*), (+). Box labels: CPU, FPGA, Data, Program, Configuration.]
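The contrast on this slide can be mimicked in software. In the sketch below, `run_cpu_style` plays the role of fixed hardware (a single accumulator) steered by a program, while `fpga_style` plays the role of a circuit whose structure is the expression itself, so only data flows through it. Both function names and the opcode table are illustrative assumptions, not anything taken from the slide beyond the expression and its operands.

```python
import operator

DATA = [100, 3, 4, 5, 7]
PROGRAM = ["LD", "+", "-", "*", "+"]   # one operation per data word


def run_cpu_style(program, data):
    """Fixed 'hardware' (one accumulator); the program picks each operation."""
    ops = {
        "LD": lambda acc, x: x,   # load the operand into the accumulator
        "+": operator.add,
        "-": operator.sub,
        "*": operator.mul,
    }
    acc = 0
    for opcode, operand in zip(program, data):
        acc = ops[opcode](acc, operand)
    return acc


def fpga_style(a, b, c, d, e):
    """The 'circuit' encodes the expression; there is no instruction stream."""
    return (a + b - c) * d + e


if __name__ == "__main__":
    print(run_cpu_style(PROGRAM, DATA))   # 502
    print(fpga_style(*DATA))              # 502
```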
![Page 7: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/7.jpg)
Example: Track Fitting
[Figure: a track sampled at z = z0 and at (z-z0) = -4, -2, +2, +4. Annotations that survive extraction: "4h", "y0-4".]
$$y = y_0 + h\,(z - z_0) + \eta\,(z - z_0)^2$$
![Page 8: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/8.jpg)
Relative Errors of Several Track Fitter Schemes
[Figure: relative errors versus track half length (0 to 18) for four schemes: 3-point, next planes; 3-point, full length; FPGA fitter; Least Square. Annotations: Least Square Fitter; Multiplier-less FPGA LS Fitter.]
$$y = y_0 + h\,(z - z_0) + \eta\,(z - z_0)^2$$
![Page 9: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/9.jpg)
Least Square Fitter
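$$y = y_0 + h\,(z - z_0) + \eta\,(z - z_0)^2$$
$$y_0 = \sum_i c_i y_i, \qquad h = \sum_i d_i y_i, \qquad \eta = \sum_i e_i y_i$$
[Figure: the hit coordinates y1 to y7 and the coefficients c1 to c7, d1 to d7 and e1 to e7 are fed into three multiplier (X) accumulator units, one per parameter.]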
• The parameters can be described as inner-products.
• Hit coordinates and coefficients are fed simultaneously.
• The inner-products can be calculated with multiplier-accumulator structures.
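The inner-product form of the fit can be made concrete with a small sketch. It assumes seven equally spaced measurement points at (z - z0) = -3 ... +3, which the slide does not state; for that geometry the least-squares coefficients of a quadratic have the closed form used below. The names `Z`, `C`, `D`, `E`, `inner_product` and `fit_track` are illustrative.

```python
from fractions import Fraction

# Assumed geometry: seven points at (z - z0) = -3 ... +3.
Z = [-3, -2, -1, 0, 1, 2, 3]

# Closed-form least-squares coefficients for a quadratic on these points.
C = [Fraction(7 - z * z, 21) for z in Z]   # y0  = sum(c_i * y_i)
D = [Fraction(z, 28) for z in Z]           # h   = sum(d_i * y_i)
E = [Fraction(z * z - 4, 84) for z in Z]   # eta = sum(e_i * y_i)


def inner_product(coeffs, ys):
    """Multiply-accumulate: one coefficient and one hit per step."""
    acc = Fraction(0)
    for k, y in zip(coeffs, ys):
        acc += k * y
    return acc


def fit_track(ys):
    """Return (y0, h, eta) for the seven hit coordinates ys."""
    return inner_product(C, ys), inner_product(D, ys), inner_product(E, ys)


if __name__ == "__main__":
    # Hits generated from y = 2 + 3*(z - z0) + 0.5*(z - z0)**2.
    hits = [2 + 3 * z + Fraction(1, 2) * z * z for z in Z]
    print(fit_track(hits))
```

In hardware the three inner products would run in parallel, each with its own multiplier-accumulator, which is what the three "X" blocks in the figure suggest.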
![Page 10: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/10.jpg)
Multiplier-less (ML) Quasi-Least Square Fitter
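$$y_0 = \sum_i c_i y_i, \qquad h = \sum_i d_i y_i, \qquad \eta = \sum_i e_i y_i$$
[Figure: the inputs y1 to y7 and x1 to x7 are fed into three accumulator units, each built from shifters (<<) and an adder/subtractor (+/-) instead of a multiplier.]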
• The coefficients are described as “two-bit” numbers, i.e. sums or differences of two powers of two, e.g.:
– 5 = 4+1; 7 = 8-1; 112 = 128-16.
• Each multiplication is replaced with two shift & add/sub operations.
• Two clock cycles are available to fetch each measurement point (y1, y2, etc.), which allows the two shift & add/sub operations.
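The "two-bit" coefficient idea can be sketched as follows. `two_term` searches for a representation of a positive coefficient as a power of two plus or minus a smaller power of two, and `shift_add_multiply` then forms the product with two shifts and one add/subtract. Both names and the `max_shift` limit are assumptions for illustration. Coefficients with no such representation (11, for example) would have to be approximated by a nearby two-term value, which is presumably why the fitter is called "quasi" least square.

```python
def two_term(coef, max_shift=12):
    """Find (a, sign, b) with coef == 2**a + sign * 2**b and b < a.

    Returns None when the positive integer coef has no such representation.
    """
    for a in range(max_shift + 1):
        for b in range(a):
            for sign in (1, -1):
                if (1 << a) + sign * (1 << b) == coef:
                    return (a, sign, b)
    return None


def shift_add_multiply(y, coef):
    """Compute y * coef with two shifts and one add/subtract."""
    terms = two_term(coef)
    if terms is None:
        raise ValueError("coefficient is not a two-term number")
    a, sign, b = terms
    return (y << a) + sign * (y << b)


if __name__ == "__main__":
    for coef in (5, 7, 112):
        print(coef, two_term(coef), shift_add_multiply(13, coef))
```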
![Page 11: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/11.jpg)
Inaccuracy Doesn’t Matter, Much of the Time
[Figure: relative error versus half-length of the track (0 to 18), comparing the Least Square fitter with the multiplier-less quasi-least-square FPGA fitter for three quantities: eta4096, hh512 and yy32.]
$$y = y_0 + h\,(z - z_0) + \eta\,(z - z_0)^2$$
![Page 12: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/12.jpg)
Fitting is easy. Matching hits is harder.

| Software | Typical FPGA | FPGA Resource Saving Approaches |
|---|---|---|
| O(n²): `for(){ for(){…} }` | O(n)*O(N): Comparator Array | Hash Sorter, O(n)*O(N): in RAM |
| O(n³): `for(){ for(){ for(){…} } }` | O(n)*O(N²): CAM, Hough Trans. | Tiny Triplet Finder, O(n)*O(N*logN) |
| O(n⁴): `for(){ for(){ for(){ for(){…} } } }` | | |
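A software analogy may help to see why binning hits in memory avoids the nested loop. The sketch below matches integer hit coordinates from two planes in two ways: a doubly nested loop, and a hash-sorter-style approach that first writes one plane's hits into bins addressed by coordinate and then looks only at neighboring bins. It is an illustration under assumed names (`match_nested`, `match_hash_sorter`, `window`), not the hardware structure referred to on the slide.

```python
from collections import defaultdict


def match_nested(hits_a, hits_b, window):
    """O(n^2) software style: compare every hit in A with every hit in B."""
    return sorted((a, b) for a in hits_a for b in hits_b
                  if abs(a - b) <= window)


def match_hash_sorter(hits_a, hits_b, window):
    """Bin the B hits by coordinate, then probe only the nearby bins."""
    bin_width = max(window, 1)
    bins = defaultdict(list)
    for b in hits_b:
        bins[b // bin_width].append(b)      # "RAM write": address = bin number
    pairs = []
    for a in hits_a:
        key = a // bin_width
        for k in (key - 1, key, key + 1):   # a match can only be in these bins
            for b in bins.get(k, []):
                if abs(a - b) <= window:
                    pairs.append((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    plane_a = [3, 40, 77, 120]
    plane_b = [5, 41, 90, 118, 121]
    print(match_nested(plane_a, plane_b, 3))
    print(match_hash_sorter(plane_a, plane_b, 3))
```

Both functions return the same pairs; the second touches each hit a small, fixed number of times as long as the bins stay sparsely populated.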
![Page 13: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/13.jpg)
Resource Saving Tricks
Loop Reduction Tricks: The number of computations in a given task is reduced by (1) using fewer iterations in loops and/or (2) using fewer operations in each iteration.
Non-Loop Reduction Tricks: The number of computations in a given task is unchanged. FPGA resources are saved by (1) reusing the same resources multiple times via sequencing and/or (2) using transistor-saving resources such as RAM.
![Page 14: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/14.jpg)
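Resource Saving Tricks: Loop-Reduction
• Multiplier-less (ML) Approaches
• Recursive Implementation of FIR Filter
• FFT: O(n)*O(log(N))
• Tiny Triplet Finder: O(n)*O(N*log(N))
[Figure: one diagram per approach. Labels that survive extraction: a multiplier (X) next to shift (<<) and add/subtract (+/-) blocks; an FIR filter from x[n] to y[n] with taps *h1, *h2, ..., *h[K] next to a recursive form with the terms s[n], -x[n-K], y[n] and -s[n-K]; scale factors *R1/R3 and *R2/R3; bit arrays, shifters and bit-wise coincident logic.]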
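The recursive FIR idea can be illustrated with the simplest case, a moving sum over K samples. The labels s[n], x[n] and -x[n-K] in the figure suggest a running-sum form, but the exact filter on the slide is not recoverable, so the sketch below assumes all tap weights equal to 1. The direct form needs K additions per output sample; the recursive form needs one addition and one subtraction regardless of K.

```python
def moving_sum_direct(x, K):
    """Direct form: y[n] = x[n] + x[n-1] + ... + x[n-K+1]."""
    return [sum(x[max(0, n - K + 1): n + 1]) for n in range(len(x))]


def moving_sum_recursive(x, K):
    """Recursive form: s[n] = s[n-1] + x[n] - x[n-K]."""
    out = []
    s = 0
    for n, value in enumerate(x):
        s += value              # add the newest sample
        if n >= K:
            s -= x[n - K]       # drop the sample that left the window
        out.append(s)
    return out


if __name__ == "__main__":
    samples = [1, 4, 2, 8, 5, 7, 3, 6]
    print(moving_sum_direct(samples, 3))
    print(moving_sum_recursive(samples, 3))
```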
![Page 15: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/15.jpg)
Resource Saving Tricks: Non-Loop-Reduction
• Sequencing
• Using RAM: Hash Sorter/Histogram
[Figure: operations OP1 to OP4 and their initialization steps (Initialization; Initialization 1 to 3), drawn once as repeated rows of OP1 OP2 OP3 OP4 and once as sequenced blocks.]
![Page 16: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/16.jpg)
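An Example of Inexplicit Computing & Hidden Resource
[Figure: block diagram. Hits and their BCO pass through input control and de-serialization and are stored in RAM. RAM port labels: D, W/R, WA, RA; bus widths 16 and 32.]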
• Data with random time stamps are re-ordered according to beam crossing (BCO).
• Data with the same BCO are output together, so the required bandwidth becomes smaller.
• Inexplicit computing (sorting) is performed with a hidden resource: RAM. It should be static RAM, not dynamic RAM.
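The re-ordering described above amounts to using the time stamp as a memory address. The sketch below models that with a Python list standing in for the RAM: each arriving hit is written to the region selected by its BCO, and reading the regions in address order yields the hits grouped and ordered by BCO without any explicit comparison. The function name `reorder_by_bco` and the parameter `n_bco` (number of BCO values kept in memory) are assumptions for illustration.

```python
def reorder_by_bco(stream, n_bco):
    """Group hits by beam crossing number.

    stream is an iterable of (bco, hit) pairs in arrival order.
    Returns a list of (bco, [hits]) in increasing BCO order.
    """
    ram = [[] for _ in range(n_bco)]      # one memory region per BCO value
    for bco, hit in stream:
        ram[bco % n_bco].append(hit)      # write address comes from the BCO
    # Reading in address order delivers the data already sorted.
    return [(bco, hits) for bco, hits in enumerate(ram) if hits]


if __name__ == "__main__":
    arrivals = [(2, "a"), (0, "b"), (2, "c"), (1, "d")]
    print(reorder_by_bco(arrivals, 4))
```

No comparator appears anywhere in the code, which is the sense in which the sorting is "inexplicit" and the RAM is a "hidden" resource.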
![Page 17: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/17.jpg)
Why Save Resources?
Why not?
![Page 18: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/18.jpg)
The Fever of Moore’s Law vs. Maxwell’s Equations
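$$\nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}, \qquad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$
$$\nabla \cdot \mathbf{B} = 0, \qquad \nabla \cdot \mathbf{D} = \rho$$
[Figure: Op/sec versus year, 1998 to 2010, with the annotations "MIT, 2002" and "WRW".]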
• During the hot days of Moore’s Law, the rules of thumb are:
– BRB – Buy Rather than Build
– URU – Use Rather than Understand
– WRW – Wait Rather than Work
• Fundamental principles such as Maxwell’s Equations tell us that Moore’s Law has limits. Technological advances should come from:
– The I3 Law: Imagination, Innovation & Implementation.
![Page 19: Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples Wu, Jinyuan Fermilab, PPD/EED April 2007](https://reader030.vdocuments.site/reader030/viewer/2022032709/56649ed95503460f94be7932/html5/thumbnails/19.jpg)
Total Useful Work = (Clock Frequency) x (Silicon Size) x (Efficiency)
• There is considerable room for improvement in computational efficiency in both micro-computer software and FPGA firmware.
• Resource awareness saves not only direct cost but also indirect costs such as power consumption, PC board layout and cooling.
• Unnecessary artificial complexities confuse people, often including the designer.
• Resource saving helps today, when technology stalls.
• Resource saving helps in the future, as technology progresses.
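[Figure: two diagrams with blocks labeled E, F and S (presumably Efficiency, Clock Frequency and Silicon Size), with the annotation "Primarily Users’ Responsibility".]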