DIPAC 2005. Lyon, France. 7 June 2005.
FPGA technology in beam instrumentation and related tools
Javier Serrano
CERN, Geneva, Switzerland
Plan of the presentation
FPGA architecture basics
FPGA design flow
Performance boosting techniques
Doing arithmetic with FPGAs
Example: RF cavity control in CERN’s Linac 3
A preamble: basic digital design
[Figure: two schematics of a 32-bit datapath with inputs DataInA[31:0], DataInB[31:0], DataSelect and Clk, and output DataOut[31:0]. Registered inputs feed an adder (sum) and a two-input multiplexer that drives the output register. One of the two schematics has an extra register stage (sum_1, dataACd1, dataSelectCD1).]
High clock rate: 144.9 MHz (6.90 ns period) on a Xilinx Spartan-IIE.
Higher clock rate: 151.5 MHz (6.60 ns period) on the same chip.
FPGA internal architecture 1/3
Example: Xilinx Spartan-IIE family architecture
CLB: Configurable Logic Block
DLL: Delay Locked Loop
FPGA internal architecture 2/3
Simplified view of Spartan-IIE CLB Slice (two identical slices inside each CLB)
Members of the Spartan-IIE family range from the XC2S50E (16*24=384 CLBs) to the XC2S600E (48*72=3456 CLBs).
FPGA internal architecture 3/3
Other design resources in modern FPGAs:
Clock control blocks (DLL or PLL).
Fast differential signaling support (LVDS, LVPECL,…).
Fast hard-wired DSP blocks made of multipliers and accumulators.
High speed external RAM interfacing, plus lots of internal RAM.
Multi gigabit transceivers (useful for global orbit feedback).
Embedded CPU cores (PowerPC, ARM,…).
Digitally Controlled Impedance active I/O termination.
FPGA vs. DSP chips
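Virtex-4 SX55: 512 MAC units @ 500 MHz = 256 GMAC/s!
DSP: [Figure: Data In -> Register -> one MAC unit (multiplier X and adder +) -> Data Out.] Loop 256 times per Data In sample for a 256 tap FIR filter.
FPGA: [Figure: Data In -> registers Reg0, Reg1, ..., Reg255; each register output is multiplied (X) by its coefficient C0, C1, ..., C255 and the products are summed (+) into Data Out.]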
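The contrast between the two structures can be sketched in a few lines of Python. This is only a behavioral model: the function names are made up for illustration, and running the single MAC at the same 500 MHz clock as the FPGA is an assumption used to make the numbers comparable, not a figure from the slides.

```python
def fir_serial_mac(samples, coeffs):
    """DSP style: a single MAC unit, looped once per tap for every input sample."""
    delay_line = [0] * len(coeffs)
    out = []
    for s in samples:
        delay_line = [s] + delay_line[:-1]      # shift the new sample in
        acc = 0
        for c, d in zip(coeffs, delay_line):    # len(coeffs) MAC operations
            acc += c * d
        out.append(acc)
    return out


def fir_parallel(samples, coeffs):
    """FPGA style: one multiplier per tap, all products summed in the same tick."""
    delay_line = [0] * len(coeffs)
    out = []
    for s in samples:
        delay_line = [s] + delay_line[:-1]
        products = [c * d for c, d in zip(coeffs, delay_line)]  # concurrent in hardware
        out.append(sum(products))               # adder tree
    return out


def max_sample_rate(clock_hz, taps, mac_units):
    """Sample rate if each MAC unit does one multiply-accumulate per clock tick."""
    cycles_per_sample = -(-taps // mac_units)   # ceiling division
    return clock_hz / cycles_per_sample


# Assumed clock of 500 MHz for both cases (illustrative only).
print(max_sample_rate(500e6, 256, 1))    # one MAC shared by 256 taps
print(max_sample_rate(500e6, 256, 256))  # one MAC per tap
```

Both functions return the same numbers; what differs in hardware is how many clock ticks each output sample costs.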
FPGA design flow
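Main flow: Design Entry -> Synthesis -> Place and Route.
Verification: Behavioral simulation (after design entry) and Post P&R simulation (after place and route).
Design entry example (VHDL):
DummyOut <= DummyInA when Selector='1' else DummyInB;
RTL View: [Figure: a two-input multiplexer with data inputs DummyInA and DummyInB, select input Selector and output DummyOut.]
Technology view: [Figure: the same multiplexer mapped to a single three-input look-up table (LUT3_AC), with IBUF input buffers on Selector, DummyInA and DummyInB (nets Selector_ibuf, DummyInA_ibuf, DummyInB_ibuf) and an OBUF output buffer driving DummyOut (net DummyOut_obuf).]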
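To make the step from the RTL view to the technology view more concrete, here is a small Python sketch that packs the truth table of the multiplexer into the contents of a three-input LUT. The pin mapping (I0 = DummyInA, I1 = DummyInB, I2 = Selector) is an assumption made for this example, as is reading the "AC" in LUT3_AC as the hexadecimal LUT contents.

```python
def lut_init(func, n_inputs):
    """Pack the truth table of func into an integer: bit i is the output for address i."""
    init = 0
    for address in range(2 ** n_inputs):
        bits = [(address >> k) & 1 for k in range(n_inputs)]  # bits[0] is input I0
        if func(*bits):
            init |= 1 << address
    return init


def mux(i0, i1, i2):
    # Assumed pin mapping: I0 = DummyInA, I1 = DummyInB, I2 = Selector.
    return i0 if i2 else i1


MUX_INIT = lut_init(mux, 3)
print(hex(MUX_INIT))  # 0xac
```

With this pin order the eight-entry table comes out as AC in hexadecimal, which is consistent with the assumed reading of the LUT3_AC label.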
FPGA flow: P&R results
FPGA flow: floorplanning
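myCounter0: process(Reset(0), Clk)
begin
  if Reset(0)='1' then
    counter0 <= (others=>'0');   -- asynchronous reset
  elsif Clk'event and Clk='1' then
    counter0 <= counter0 + 1;    -- rising edge; "+" needs an arithmetic type such as unsigned (ieee.numeric_std)
  end if;
end process myCounter0;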
Increasing performance 1/5: Buffering
Delay in modern designs can be as much as 90% routing, 10% logic. Routing delay is due to long nets + capacitive input loading.
Buffering is done automatically by most synthesis tools and reduces the fan out on affected nets:
[Figure: the same circuit before buffering (nets net1 and net2) and after buffering (nets net1, net2 and net3).]
Increasing performance 2/5: Replicating registers (and associated logic if necessary)
[Figure: Before and After views of a Producer register driving Consumer 1, Consumer 2, Consumer 3 and Consumer 4.]
Increasing performance 3/5: Retiming (a.k.a. register balancing)
[Figure: Before, register stages are separated by a large combinatorial logic delay and by a small delay. After, the registers are moved so that every stage has a balanced delay.]
Increasing performance 4/5: Pipelining
[Figure: Before, a stage with a large combinatorial logic delay sits next to a stage with a small delay. After, extra registers split the large delay into several stages, each with a small delay.]
Increasing performance 5/5: Time multiplexing
[Figure: Data In at 100 MHz enters a De-multiplexer that distributes the samples over parallel branches of 50 MHz logic; a Multiplexer recombines the branch outputs into Data Out. The figure also carries a 50 MHz label next to Data Out.]
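An example 1/2: Boosting performance of an IIR filter
Simple first order IIR: y[n+1] = a y[n] + b x[n]
[Figure: block diagram of the filter. The input x is multiplied by b and added to the feedback term a y; the sum goes through a one-sample delay (z^-1) to give y.]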
Performance bottleneck in the feedback path
Problem found in the phase filter of a PLL used to track bunch frequency in CERN’s PS
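An example 2/2: Boosting performance of an IIR filter
Look ahead scheme: from y[n+1] = a y[n] + b x[n] we get
y[n+2] = a y[n+1] + b x[n+1] = a^2 y[n] + a b x[n] + b x[n+1]
[Figure: look-ahead block diagram. An FIR filter (can be pipelined to increase throughput) forms a b x[n] + b x[n+1] from the input x, using gains ab and b and a one-sample delay z^-1. The feedback path multiplies y by a^2 and contains a two-sample delay z^-2.]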
Now we have two clock ticks for the feedback!
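A quick numerical check of the look-ahead idea, written as a Python sketch (function names and test values are chosen only for illustration). The first function iterates the original recursion; the second uses the two-step recursion, where each output depends on the output two samples earlier plus a feed-forward (FIR) term.

```python
def iir_direct(x, a, b, y0=0.0):
    """Original recursion y[n+1] = a*y[n] + b*x[n]; returns y[0] .. y[len(x)]."""
    y = [y0]
    for xn in x:
        y.append(a * y[-1] + b * xn)
    return y


def iir_lookahead(x, a, b, y0=0.0):
    """Look-ahead recursion y[n+2] = a^2*y[n] + a*b*x[n] + b*x[n+1]."""
    if not x:
        return [y0]
    y = [y0, a * y0 + b * x[0]]            # y[1] still needs one direct step
    a2, ab = a * a, a * b                  # constants computed once
    for n in range(len(x) - 1):
        fir = ab * x[n] + b * x[n + 1]     # feed-forward part, no feedback involved
        y.append(a2 * y[n] + fir)          # feedback now spans two samples
    return y


x = [1.0, 0.0, 2.0, -1.0, 0.5, 3.0]
direct = iir_direct(x, a=0.9, b=0.1)
ahead = iir_lookahead(x, a=0.9, b=0.1)
print(max(abs(p - q) for p, q in zip(direct, ahead)))  # tiny rounding difference at most
```

The two outputs agree up to floating-point rounding. In a fixed-point implementation the rounding of a^2 and ab would have to be examined separately.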
Performing arithmetic in FPGAs 1/2
Binary adders: made of N full adders, each implementing:
s_k = x_k XOR y_k XOR c_k
c_{k+1} = (x_k AND y_k) OR (x_k AND c_k) OR (y_k AND c_k)
Easy to pipeline.
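The two full-adder equations can be chained into an N-bit ripple-carry adder. The Python sketch below is a bit-level model with illustrative names, not a description of how any particular FPGA carry chain is built.

```python
def full_adder(x, y, c):
    """One bit slice: sum bit and carry out, exactly as in the two equations."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_add(a, b, n_bits):
    """Add two unsigned n_bits-wide integers; returns (sum modulo 2**n_bits, carry out)."""
    carry = 0
    total = 0
    for k in range(n_bits):
        s, carry = full_adder((a >> k) & 1, (b >> k) & 1, carry)
        total |= s << k
    return total, carry


print(ripple_add(200, 100, 8))  # (44, 1): 300 = 256 + 44
```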
Multipliers: hardwired (if your chip has them) or “pencil and paper”:
\[ P = A \cdot X = \sum_{k=0}^{N-1} a_k 2^k X \]
X is successively shifted by k positions. Then, whenever a_k = 1, X·2^k is accumulated. These multipliers can be pipelined, as opposed to the hardwired variety.
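The shift-and-accumulate procedure translates directly into a short loop. This Python sketch assumes unsigned operands and uses an illustrative function name.

```python
def shift_add_multiply(a, x, n_bits):
    """Pencil-and-paper product of unsigned a (n_bits wide) and x."""
    product = 0
    for k in range(n_bits):
        if (a >> k) & 1:          # bit a_k of the multiplier
            product += x << k     # accumulate X * 2^k
    return product


print(shift_add_multiply(13, 11, 4))  # 143
```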
Performing arithmetic in FPGAs 2/2
Dividers: pencil and paper method.
\[ Q = \frac{A}{X} = \frac{\sum_{k=0}^{N-1} a_k 2^k}{X} \]
Start with an empty auxiliary register B and start shifting bits from A into it, most significant bit first (B shifts right to left). Whenever B-X is non-negative, replace B with B-X. After every shift we get a bit of the quotient: 0 if B-X is negative, 1 otherwise.
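The division procedure can be modeled bit by bit as follows. This is a Python sketch for unsigned operands; the function name and the choice to return the remainder alongside the quotient are illustrative.

```python
def shift_subtract_divide(a, x, n_bits):
    """Pencil-and-paper division of unsigned a (n_bits wide) by x; returns (quotient, remainder)."""
    if x == 0:
        raise ZeroDivisionError("division by zero")
    b = 0                                   # auxiliary register B
    q = 0
    for k in range(n_bits - 1, -1, -1):     # most significant bit of A first
        b = (b << 1) | ((a >> k) & 1)       # shift the next bit of A into B
        q <<= 1
        if b - x >= 0:                      # subtraction succeeds: quotient bit is 1
            b -= x
            q |= 1
    return q, b                             # B ends up holding the remainder


print(shift_subtract_divide(143, 11, 8))  # (13, 0)
```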
Keep in mind that these are good solutions when both operands are variable. Example with one fixed operand: 0.5625a = 9a/16 = a/2 + a/16. Used at CERN to get the baseline from a BPM signal through a lossy integrator.
Sin, cos, sinh, cosh, atan, atanh, square root and vector rotation: CORDIC.
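The fixed-operand example needs no multiplier at all, only two shifts and one addition. A minimal Python sketch for non-negative integers follows (the function name is illustrative, and the shifts truncate, so the result can fall slightly below the exact value).

```python
def times_0_5625(a):
    """Approximate 0.5625 * a as a/2 + a/16 using shifts (non-negative integers only)."""
    return (a >> 1) + (a >> 4)


print(times_0_5625(160))  # 90, exact because 160 is a multiple of 16
print(times_0_5625(15))   # 7, while 0.5625 * 15 = 8.4375 (truncation error)
```

Truncating each term separately keeps the error below two units. Keeping a few extra fractional bits before shifting is one way to reduce it.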
Distributed Arithmetic (DA) 1/2
Digital Signal Processing is about sums of products:
\[ y = \sum_{n=0}^{N-1} c[n] \cdot x[n] \]
Let’s assume:
c[n] constant (prerequisite to use DA)
x[n] input signal B bits wide
Then:
\[ y = \sum_{n=0}^{N-1} c[n] \cdot \left( \sum_{b=0}^{B-1} x_b[n] \cdot 2^b \right) \]
x_b[n] is bit number b of x[n] (either 0 or 1)
And after some rearrangement of terms:
\[ y = \sum_{b=0}^{B-1} 2^b \cdot \left( \sum_{n=0}^{N-1} c[n] \cdot x_b[n] \right) \]
This can be implemented with an N-input LUT
Distributed Arithmetic (DA) 2/2
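\[ y = \sum_{b=0}^{B-1} 2^b \cdot \left( \sum_{n=0}^{N-1} c[n] \cdot x_b[n] \right) \]
[Figure: the words x[0], x[1], ..., x[N-1] sit in shift registers and are read out one bit at a time (x_0[n], x_1[n], ..., x_{B-1}[n]). The N bits of equal weight address the LUT. The LUT output goes to an adder and a register with a 2^-1 scaling in the accumulation loop, which produces y.]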
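A behavioral Python sketch of distributed arithmetic, assuming unsigned B-bit samples and integer coefficients (names are illustrative). The LUT holds, for every combination of N input bits, the sum of the coefficients whose bit is set. Note that this model weights each LUT output by 2^b directly, which gives the same result as the hardware trick of scaling the accumulator by 2^-1 at every step.

```python
def build_da_lut(coeffs):
    """LUT entry at address addr = sum of c[n] for every n whose address bit is 1."""
    n_taps = len(coeffs)
    return [
        sum(c for n, c in enumerate(coeffs) if (addr >> n) & 1)
        for addr in range(2 ** n_taps)
    ]


def da_dot_product(coeffs, samples, n_bits):
    """Compute sum(c[n] * x[n]) one bit plane at a time, with no multiplier."""
    lut = build_da_lut(coeffs)
    y = 0
    for b in range(n_bits):                      # bit plane b of all samples
        address = 0
        for n, x in enumerate(samples):
            address |= ((x >> b) & 1) << n       # x_b[n] becomes address bit n
        y += lut[address] << b                   # weight the inner sum by 2^b
    return y


coeffs = [3, -1, 4, 2]
samples = [5, 7, 0, 15]                          # unsigned 4-bit values
print(da_dot_product(coeffs, samples, 4))        # 38
```

The LUT has 2^N entries, so in practice long filters are usually split into several smaller LUTs whose outputs are added.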
COordinate Rotation DIgital Computer (CORDIC) 1/2
General vector rotation:
\[ x' = x\cos\phi - y\sin\phi, \qquad y' = y\cos\phi + x\sin\phi \]
Rearranging:
\[ x' = \cos\phi\,[x - y\tan\phi], \qquad y' = \cos\phi\,[y + x\tan\phi] \]
We restrict rotation angles to be:
\[ \tan\phi = \pm 2^{-i} \]
The cosine can be treated as a constant since:
\[ \cos(\delta) = \cos(-\delta) \]
Giving the CORDIC equations:
\[ x_{i+1} = K_i\,[x_i - y_i\,d_i\,2^{-i}], \qquad y_{i+1} = K_i\,[y_i + x_i\,d_i\,2^{-i}] \]
With:
\[ K_i = \cos(\arctan 2^{-i}) = \frac{1}{\sqrt{1 + 2^{-2i}}}, \qquad d_i = \pm 1 \]
CORDIC 2/2
Two working modes:
Rotation mode: rotates the input vector by a specified angle given as an argument.
Vectoring mode: rotates the vector until it aligns with the x axis while recording the angle required to make that rotation.
Usage examples:
To compute (ρ,φ) from (x,y) (cartesian to polar transformation), feed (x,y) to the CORDIC rotator in vectoring mode, then find the results in x and the phase accumulator.
To compute sin φ, feed (x=1,y=0) to the CORDIC in rotation mode, then find the result in y.
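Both modes can be tried out with a floating-point Python sketch of the CORDIC iteration. The function name, the phase-accumulator variable z and the iteration count are choices made for this example. The scale factor K_i is applied at every step here for clarity, whereas hardware implementations normally apply the product of all K_i once, as a single constant. The iteration only converges for angles within roughly ±1.74 rad and, in vectoring mode, for x > 0.

```python
import math


def cordic(x, y, z, mode, n_iter=32):
    """Iterate the CORDIC equations; z plays the role of the phase accumulator.

    mode "rotation":  rotate (x, y) by the angle z, driving z towards 0.
    mode "vectoring": rotate (x, y) onto the x axis, accumulating the angle in z.
    """
    for i in range(n_iter):
        if mode == "rotation":
            d = 1.0 if z >= 0 else -1.0
        else:
            d = -1.0 if y >= 0 else 1.0
        shift = 2.0 ** -i
        k = 1.0 / math.sqrt(1.0 + shift * shift)       # K_i = cos(arctan 2^-i)
        x, y = k * (x - y * d * shift), k * (y + x * d * shift)
        z -= d * math.atan(shift)                      # angle still to be rotated
    return x, y, z


# sin and cos of 0.5 rad: feed (x=1, y=0) in rotation mode.
cos_phi, sin_phi, _ = cordic(1.0, 0.0, 0.5, "rotation")
print(sin_phi, math.sin(0.5))

# Cartesian to polar for (3, 4): magnitude ends up in x, angle in z.
rho, _, phi = cordic(3.0, 4.0, 0.0, "vectoring")
print(rho, phi)
```

A fixed-point version would replace the multiplications by 2^-i with shifts and read the arctangent values from a small table.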
Case study: low level RF cavity control in CERN’s Linac 3 1/4
[Figure: control loop around the cavity. Blocks: Cavity, Pickup, Klystron amplifier, LRFSC card. Signals: Forward, Reflected, Cavity, I, Q, and Set Points from Control Room.]
Case study: low level RF cavity control in CERN’s Linac 3 2/4
[Figure: the 100 MHz signal from the cavity enters a Mixer (X) together with an 80 MHz LO; the mixer output goes through a block labelled 20 MHz LPF. Below, two plots with a horizontal axis from 0 to 20 and a vertical axis from -1 to 1.]
Sampling the 20 MHz at exactly 4 times its frequency produces I, Q, -I, -Q, I, Q…
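The sampling pattern is easy to reproduce numerically. The Python sketch below assumes the sign convention s(t) = I cos(ωt) + Q sin(ωt) with the first sample taken at t = 0; the function names and test amplitudes are illustrative.

```python
import math


def sample_if(i_amp, q_amp, n_samples, f_if=20e6):
    """Sample s(t) = I*cos(wt) + Q*sin(wt) at exactly four times f_if."""
    fs = 4 * f_if
    out = []
    for n in range(n_samples):
        phase = 2 * math.pi * f_if * n / fs      # advances by 90 degrees per sample
        out.append(i_amp * math.cos(phase) + q_amp * math.sin(phase))
    return out


def demodulate(samples):
    """Undo the I, Q, -I, -Q pattern; returns the recovered (I values, Q values)."""
    i_vals = [s if n % 4 == 0 else -s for n, s in enumerate(samples) if n % 2 == 0]
    q_vals = [s if n % 4 == 1 else -s for n, s in enumerate(samples) if n % 2 == 1]
    return i_vals, q_vals


samples = sample_if(0.6, -0.3, 8)
print([round(s, 3) for s in samples])   # I, Q, -I, -Q, I, Q, -I, -Q
print(demodulate(samples))
```

Each of I and Q is recovered at half the sampling rate, using only sign changes and no multipliers.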
Case study: low level RF cavity control in CERN’s Linac 3 3/4
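[Figure: block diagram of the FPGA. 80 Ms/s 14 bit data from the ADC enters an IQ Demod block, which outputs I (40 Ms/s) and Q (40 Ms/s). Each channel has its own set points (I Set Points, Q Set Points), a PI block and an adder (+) with a Feed forward input. The two channels feed an IQ modulator, which sends 80 Ms/s 14 bit data to the DAC.]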
Case study: low level RF cavity control in CERN’s Linac 3 4/4
Thanks!
Many thanks to Uli Raich and Tony Rohlev for help in preparing this talk.
Snapshots in some slides courtesy of Xilinx and Synplicity.
Some references for further study:
“Digital Signal Processing With Field Programmable Gate Arrays”, 2nd edition, by U. Meyer-Baese
Andraka Consulting Group: http://www.andraka.com/
comp.arch.fpga newsgroup