Status of GTK ASIC - TDCpix
22 Nov 2011
G. Aglieri, M. Fiorini, P. Jarron, J. Kaplon, A. Kluge, E. Martin, M. Noy, L. Perktold, K. Poltorak
TDCpix ASIC block diagram (60 bit serial/5 LVDS pairs parallel), 2011.10.24
[Figure: block diagram. Labels recovered from the drawing are listed below.]
• Pixel column: pixel cell x 45 with config pixel, 5 bit trimDAC, driver & line & receiver
• End of column: 8 bit thresholdDAC, column & 3 bit bias DAC, ConfigDoubleCol, bandgap (with bandgap override), global DACs, test pulse, analogMonitor mux (9+1 x temp)
• TDC: DLL 0 (CP & PD, state machine), clkdll = 320 MHz; hitArbiter 0 & edge detector, hA 1, hA 2, ... hA 8; fineHitRegister, coarseHitRegister, syncRegister, fineTimeStampEncoder, coarseTimeStampEncoder, coarseTimeStampServer; register widths shown: 2 x 32 and 2 x (13 + 5)
• pixelGroupFifo (depth = 3), group EOC 0 to group EOC 8
• Hit word width: 5 fineRise + 5 fineTrail + 12+1 coarseRise + 6+1 coarseTrail + 5 add + 5 pil + 2 group collision = 42
• columnMux 9 to 1, columnFifo (depth = 6), columnFifoController: 42 + 4 add = 46
• quarterChipMux 10 to 1 (5 x doubleCol), quarterchipFifo & frameInserter controller: 46 + 4 add = 50
• data formatter & multipleHit & comma & frame inserter (48), 8b10b encoder (60), serializer, serializer controller
• Outputs: 2.4/3.2 Gbit/s CML driver; parallelOut 4 x LVDS 480/640 Mbit/s ((enable)/mode); clkmultiserial or clktest
• serialTimeMux 90 to 48, serialTimeController, 9 columns
• Quarter chip RO 0 to 3; double column 0 to 19
• Clocks: clkdll (LVDS 320 MHz), clkDigital = 20/26.7 MHz (CMOS DC), LVDS 320/480 MHz, PLL with PLL override, clkSerial = 2.4/3.2 GHz, clkserial/2, clkmultiserial, clksync, clkFIFOread, clkserialTime, muxmode (3 bit)
• Resets: reset_dll (CMOS), reset_global (CMOS), reset_corsecnt (LVDS)
• FIFO overflow status: min. 40 FIFOs, 1 FIFO overflow bit, optional overflow count
• Flip-flops: 648 FF @ 2 depth, 152 FF @ 4 depth; 23 cell units * (0.40 µm x 4.8 µm) * (648 + 152 + 373/10) FF = 37000 µm² = 124 µm * 300 µm
• Avg. nominal rate (750 MHz beam, 104 Mhit/s per chip) / rate with 2.4 Gbit/s serializer [Mhit/s]: 0.3/0.44, 2.7/4, 27/40
• Legend: marked blocks = SEU protected
• Note: the clock divider is located in synchronous logic; it needs a synchronous reset with respect to the receiving clock domain (clkmultiserial)
• Note: path d is doubled so as to have one direct link from clkserial/2 to clkfiforead
Clock modes (values in the order serialPLL2.4 / serialPLL3.2 / ext320 / ext480 / PLLoverride), 2011.10.24
• abc: 0000 / 0000 / 111*1 / 110*1 / 100*1; 8 modes = 3 bits
• clkInDigital = 20 / 26.66 / 320 / 480 / 320 MHz
• clkPLL = 2.4 / 3.2 / - / - / 0.32 GHz
• clksync = 240 (10) / 320 (10) / 320* (1) / 240* (2) / 32 (1) MHz
• clkFIFOread = 40 (60) / 53 (60) / 27 (12) / 40 (12) / 5.3 (60) MHz
• clkmultiserial = 480 / 640 / 320 / 480 / 64 MHz
• clkserialtime = clksync
() = division factor; * can also be 0 or 1 to change clksync in TDC
[Figure: PLL & clock divider & clk distribution. Labels recovered from the drawing: clkDigital = 20/26.7 MHz (CMOS DC) or LVDS 320/480 MHz input; PLL with PLL override; clkSerial = 2.4/3.2 GHz; dividers /2, /5, /6, /10, /60 or ext /12, ext /2; mux selects a, b, c, d (muxmode, 3 bit); outputs clkserial/2, clkmultiserial, clksync, clksyncReg, clkserialTime, enable/clkFIFOread.]
Path d is doubled, but mux d and div 6 will be sitting in the serializer to keep routing short.
Pixel numbering (starting at row 0, column 0)
• Pixel = column * 45 + row
• Pixel group = column * 9 + group
• Group 0 contains pixel 0
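The two numbering formulas can be sketched in a few lines. The function names are hypothetical, and the mapping from row to group (five contiguous rows per group, so that 45 rows give 9 groups) is an assumption; the slide only states that group 0 contains pixel 0.

```python
PIXELS_PER_COLUMN = 45  # from the slide: Pixel = column * 45 + row
GROUPS_PER_COLUMN = 9   # from the slide: Pixel group = column * 9 + group
# Assumption: the 45 rows split evenly into 9 contiguous groups of 5 rows.
ROWS_PER_GROUP = PIXELS_PER_COLUMN // GROUPS_PER_COLUMN


def pixel_number(column: int, row: int) -> int:
    """Pixel = column * 45 + row."""
    if column < 0 or not 0 <= row < PIXELS_PER_COLUMN:
        raise ValueError("column must be >= 0 and row must be in 0..44")
    return column * PIXELS_PER_COLUMN + row


def pixel_group_number(column: int, row: int) -> int:
    """Pixel group = column * 9 + group, with group = row // 5 (assumed)."""
    pixel_number(column, row)  # reuse the range check
    group = row // ROWS_PER_GROUP
    return column * GROUPS_PER_COLUMN + group
```

With this reading, column 1 starts at pixel 45 and pixel group 9, which matches the group ranges listed for the address field.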
Floor plan, vertical stack:
• Pixel matrix: 13500 µm
• EoColumn bias: 1800 µm
• TL rx: 70 µm
• hitArbiter: 175 µm
• Coarse units, pixel group FIFOs, column FIFO: 1075 µm
• DLL, SM, fine registers: 1000 µm
• Quarter chip read-out & global configuration: ~1000 µm
• Serializer & PLL & clock distributor: ~500 µm
• Pad ring: ~700 µm
• Corners: 125 µm
• Total: 19945 µm
Chip width: 12000 µm
Block sizes (µm):
• qchipRo0 to qchipRo3: 2500 x 1000 each
• chipConfig: 1000 x 600
• Serializer0 to Serializer3: 2000 x 500 each
• PLL & clock: 1000 x 500
• Routing adaptor: 1000 x 200
• Aux. components: 500 x 250
• Band gap: 250 x 1000
• Test pads: 250 x 1500
• Pad ring: 12000 x 700
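A quick arithmetic check of the floor plan heights, as a sketch. The dictionary keys are shortened labels, and counting the 125 µm corner figure once in the total is an assumption; it is the reading under which the stated 19945 µm is reproduced.

```python
# Heights in µm, as listed on the floor plan slide.
heights_um = {
    "pixel matrix": 13500,
    "EoColumn bias": 1800,
    "TL rx": 70,
    "hitArbiter": 175,
    "coarse units, pixel group FIFOs, column FIFO": 1075,
    "DLL, SM, fine registers": 1000,
    "quarter chip read-out & global configuration": 1000,
    "serializer & PLL & clock distributor": 500,
    "pad ring": 700,
}
CORNERS_UM = 125  # assumption: counted once in the total

stack_um = sum(heights_um.values())
total_um = stack_um + CORNERS_UM
print(f"stack without corners: {stack_um} µm, with corners: {total_um} µm")
```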
Reset scheme
[Figure: chains of D flip-flops (D, Q, Q_) forming reset synchronizers in the clk_sync, clk_dll and clk_config domains; signals shown: reset_synchronizer_sync, cmd_reset_sync, cmd_reset_dll, cmd_config, cmd_reset_bandgap, reset_bandgap_n.]
• Timing: min: clk_prop + hold; max: clk_prop + clk_cycle - setup
• pin reset_all_n -> reset_sync, reset_dll, reset_config, reset_bandgap_n
• cmd_reset_all -> reset_sync, reset_dll, reset_config, reset_bandgap_n
• cmd_reset_sync -> reset_sync
• cmd_reset_dll -> reset_dll (to dll_state_machine)
• cmd_reset_config -> reset_config
• cmd_reset_bandgap -> reset_bandgap_n
• Digital logic: high active reset; from outside and analog blocks: low active reset
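The reset fan-out listed on the slide can be written down as a small lookup table, which makes it easy to ask which sources drive a given reset net. This is only a sketch; the table and the helper `sources_of` are hypothetical names, and the content is copied from the slide's list.

```python
ALL_RESETS = ("reset_sync", "reset_dll", "reset_config", "reset_bandgap_n")

# Source (pin or command) -> reset nets it asserts, as listed on the slide.
RESET_FANOUT = {
    "reset_all_n (pin)": ALL_RESETS,
    "cmd_reset_all": ALL_RESETS,
    "cmd_reset_sync": ("reset_sync",),
    "cmd_reset_dll": ("reset_dll",),  # goes to dll_state_machine
    "cmd_reset_config": ("reset_config",),
    "cmd_reset_bandgap": ("reset_bandgap_n",),
}


def sources_of(reset_net: str) -> list:
    """Return every pin or command that asserts the given reset net."""
    return sorted(src for src, nets in RESET_FANOUT.items() if reset_net in nets)
```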
Data format
• Nominal transmission: 2.4 Gbit/s
• High speed: 3.2 Gbit/s
• All words: 48 bits (6 bytes) long
• 8b10b encoded bit stream: 60 bits
  – data word
  – frame word
  – idle (comma) word: no hits available to transmit, 6 * comma character (i.e. K28.5)
  – sync word: after reset and after each force_sync command (can be sent repeatedly) for 4 * 10^6 cycles, 100 ms @ 2.4 Gbit/s, 6 * comma character (i.e. K27.7)
  – link checking sequence, known pattern (i.e. counter), sent upon request
• Header contains frame counter every 6.4 µs
• Data contains dynamic range up to 6.4 µs + 1 rollover counter bit
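The word rate and the sync duration quoted above follow from the 48-to-60 bit expansion of 8b10b. A minimal sketch of that arithmetic, with hypothetical function names:

```python
WORD_BITS = 48
ENCODED_BITS = WORD_BITS * 10 // 8  # 8b10b: every 8 bits become 10, so 60 bits per word


def words_per_second(line_rate_bps: int) -> float:
    """Encoded words per second on one serial link."""
    return line_rate_bps / ENCODED_BITS


def sync_duration_s(n_words: int, line_rate_bps: int) -> float:
    """Time needed to send n_words encoded words."""
    return n_words * ENCODED_BITS / line_rate_bps


print(words_per_second(2_400_000_000))            # nominal link
print(sync_duration_s(4_000_000, 2_400_000_000))  # sync sequence length
```

At 2.4 Gbit/s this gives 40 Mwords/s, and 4 * 10^6 sync words take 100 ms, in line with the figures on the slide.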
Data format: hit word, normal mode (48 bit)
qchip_word -> data_out
• (47) Status/data selector: 1 bit
• (46..40) Address: 7 bit (90 pixel groups)
• (39..35) Address-hit arbiter: 5 bit
• (34..30) Address pileup: 5 bit
• (29) Leading coarse time selector: 1 bit
• (28..17) Leading coarse time: 12 bit; 1 bit rollover indicator + 2048 (11 bit) * 3.125 ns = 6.4 µs
• (16..12) Leading fine time: 5 bit; 98 ps -> 3.125 ns
• (11) Trailing coarse time selector: 1 bit
• (10..5) Trailing coarse time: 6 bit; 64 * 3.125 ns = 200 ns
• (4..0) Trailing fine time: 5 bit; 98 ps -> 3.125 ns
• Total: 48 bit
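The bit layout of the normal-mode hit word can be exercised with a small pack/unpack sketch. The field names are hypothetical labels chosen for this example; only the bit ranges come from the slide.

```python
# (name, msb, lsb) for each field of the 48 bit hit word; names are illustrative.
HIT_WORD_FIELDS = [
    ("status_data_selector", 47, 47),
    ("pixel_group_address", 46, 40),
    ("hit_arbiter_address", 39, 35),
    ("pileup_address", 34, 30),
    ("leading_coarse_selector", 29, 29),
    ("leading_coarse_time", 28, 17),
    ("leading_fine_time", 16, 12),
    ("trailing_coarse_selector", 11, 11),
    ("trailing_coarse_time", 10, 5),
    ("trailing_fine_time", 4, 0),
]


def decode_hit_word(word: int) -> dict:
    """Split a 48 bit hit word into its fields."""
    if not 0 <= word < 1 << 48:
        raise ValueError("hit word must fit in 48 bits")
    return {
        name: (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
        for name, msb, lsb in HIT_WORD_FIELDS
    }


def encode_hit_word(**fields: int) -> int:
    """Pack field values into a 48 bit hit word; missing fields default to 0."""
    word = 0
    for name, msb, lsb in HIT_WORD_FIELDS:
        value = fields.get(name, 0)
        if not 0 <= value < 1 << (msb - lsb + 1):
            raise ValueError(f"{name} does not fit in its field")
        word |= value << lsb
    return word
```

A round trip such as `decode_hit_word(encode_hit_word(pixel_group_address=89))` is a convenient way to confirm that the ten fields tile the 48 bits without gaps or overlaps.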
(46..40) Address 7 bit (90 pixel groups)
• 10 columns, each with 9 pixel groups, to be addressed:
  – Column 0: pixel groups 0, 1, 2, 3, …, 7, 8
  – Column 1: pixel groups 9, 10, 11, 12, 13, …, 17
  – Column 2: pixel groups 18, 19, 20, 21, …, 26
  – …
• Pixels in a pixel group are one-hot encoded
  – example pixel 2: “00010”
Data format: status words
word_frame0
• (47) status bit: 1 bit
• (46..41) # of SEU in previous frame: 6 bit; 2**6 = 64, 64/6.4 µs = 10^7/s
• (40..28) # of hits in previous frame: 13 bit; 2**13 = 8192; hits per qchip and frame = 130 Mhits/s / 4 * 6.4 µs = 208 -> factor 40 -> 2048 -> 13 bit
• (27..0) frame counter: 28 bit; 2**28 * 6.4 µs = 1718 s
• Total: 48 bit
word_frame1
• (47) status bit: 1 bit
• (46..31) checksum: 16 bit
• (30..6) empty: 25 bit
• (5..0) group collision count: 6 bit
• Total: 48 bit
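The counter ranges quoted for word_frame0 can be reproduced directly. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
FRAME_PERIOD_S = 6.4e-6  # one frame every 6.4 µs


def counter_capacity(bits: int) -> int:
    """Number of distinct values an unsigned counter of the given width can hold."""
    return 2 ** bits


# Time until the 28 bit frame counter wraps around.
frame_counter_span_s = counter_capacity(28) * FRAME_PERIOD_S

# Average hits per quarter chip and frame at 130 Mhits/s per chip.
hits_per_qchip_per_frame = 130e6 / 4 * FRAME_PERIOD_S

print(counter_capacity(6), counter_capacity(13))
print(round(frame_counter_span_s), round(hits_per_qchip_per_frame))
```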
sync link word (48 bit), sent after reset for 1024 clk cycles
• 6 * comma K28.5
• Total 6 * 48 bit
sync slot word (48 bit), sent after sync link word for 1024 clk cycles
• 5 * comma K27.7 + 1 D0.0; D0.0 is sent after the 5 commas
• Total 6 * 48 bit
idle word (48 bit)
• 6 * comma K27.7
• Total 6 * 48 bit
Do we need these values in the frame?
• Seu_counter
• FIFO_overflow_counter
• Error_info
• Status_info
• Checksum
Configuration: qChip
• (0) 1 bit: send_k_sync_requ
• (1) 1 bit: send_k_word_requ
• (5..2) 4 bit: k_word_type
• (6) 1 bit: send_testpattern_requ
• (14..7) 8 bit: data_testpattern
  – rotating FIFO, 48 bits * 8 words
  – subsequent writing moves the write pointer of the FIFO so that all FIFO cells can be written
  – when the test pattern FIFO is used, all 8 FIFO cells are read and pushed into the data stream, thus the data stream consists of a multiple of 8 data words
• (15) 1 bit: new_data_testpattern
• (..16) serial read-out control

--send_k_sync_requ <= configuration_data_in(0);
--send_k_word_requ <= configuration_data_in(1);
--k_word_type <= configuration_data_in(5 downto 2);
--send_testpattern_requ <= configuration_data_in(6);
--data_testpattern <= configuration_data_in(14 downto 7);
--new_data_testpattern <= configuration_data_in(15);
--serial read-out control <= ….
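As an illustration of the bit assignments above, the following sketch splits the low 16 bits of a configuration value into the named fields. The function `parse_qchip_config` is hypothetical; it follows the `downto` ranges of the commented assignments, so `k_word_type` is taken as bits 5 downto 2, and the serial read-out control bits from 16 upwards are left out because their layout is not given.

```python
def parse_qchip_config(configuration_data_in: int) -> dict:
    """Extract the documented qChip configuration fields (bits 15..0)."""

    def bits(msb: int, lsb: int) -> int:
        # Equivalent of configuration_data_in(msb downto lsb).
        return (configuration_data_in >> lsb) & ((1 << (msb - lsb + 1)) - 1)

    return {
        "send_k_sync_requ": bits(0, 0),
        "send_k_word_requ": bits(1, 1),
        "k_word_type": bits(5, 2),
        "send_testpattern_requ": bits(6, 6),
        "data_testpattern": bits(14, 7),
        "new_data_testpattern": bits(15, 15),
    }


# Example value, written MSB first: new=1, data=0xA5, testpattern=1, type=3, word=0, sync=1.
example = parse_qchip_config(0b1_10100101_1_0011_0_1)
```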
Configuration: TDC
Configuration: DLL
Configuration: EOC bias
Configuration: pixel
Configuration: config
Data format: hit word, extended mode (not implemented)
• Status/data selector: 1 bit
• Leading coarse time: 12 bit; 2048 * 3.125 ns = 6.4 µs
• Leading fine time: 5 bit; 98 ps -> 3.125 ns
• 2x Trailing coarse time: 2 x 5 bit; 32 * 3.125 ns = 100 ns
• 2x Trailing fine time: 2 x 5 bit; 98 ps -> 3.125 ns
• Coarse time selector: 2 bit
• Address: 12 bit
  – Address-hit arbiter: 5 bit (3 bit possible, but loss of double address bit info)
  – Address-pixel group: 7 bit (9 x 10 pixel groups in quarter chip -> encoding required)
• Address pileup: 5 bit (can be encoded if only one pileup info is sufficient, or can be sent as second word)
• Error bit (SEU, overflow): 2 bit (can be sent afterwards as status word)
• Total: 59 bit, sent in two 48 bit words
G. Aglieri
Status
• schematic or hdl
• simulation pre-layout / pre-synthesis
• layout & extraction
• simulation post-layout / parasitics back annotated
• DRC & LVS
• schematic integrated in top
• layout integrated in top
• simulation integrated in top
• SEU simulation
Clock tree & divider, 60 bit, 5 pads
Implementation data transmission 60b
• Using GBT running at 20 MHz, but modifying data shift length to 60
• Problem: GBT has 3 parallel multiplexed shift registers, 60/3 = 20; GBT needs to be modified to 2 SR of 30 bits each, first clock divider from 3 to 2, additional high speed dividers
• 20 MHz in, 2.4 Gbit/s -> 40 Mwords/s (+21 % (132 Mhits/s); +54 % (104 Mhits/s))
• 2400/320 = 7.5 (!); 2400/8 = 300 MHz
• Programmable divider: 10 (240) / 5! (480) / 60 (40) for synchronous read logic
• Programmable divider: 8 (300), 6 (400) for FIFO write and state machines
[Figure: clock tree. 20 MHz -> PLL -> 2.4 GHz -> clock divider; 1.2 GHz serial mux & shift; 40 MHz parallel_load (/60); Fifo read: 40 MHz (60) / 240 MHz (10) / 480 MHz (5!); Fifo write: 240 MHz (10) / 300 MHz (8) / 400 MHz (6)]
• Synchronous parallel read-FIFO frequency: serialFrequ * n / 50 [MHz] = 48 (1) / 96 (2) / 144 (3) / 192 / 240 (10) / 288 / 336 / 384 / 432 / 480 (5!)
• Fast counter:
  – /2 = 1.2 GHz serial mux & shift
  – /5 /2 = 240 MHz fifo read
  – /5 /2 = 240 MHz; /2 /4 = 300 MHz; /3 /2 = 400 MHz: state machines, all FIFOs & chipFIFOwrite
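The divider values on this slide can be tabulated with a few lines of arithmetic. The sketch below assumes a 2400 MHz serial clock and uses hypothetical names; it also shows why 320 MHz cannot be derived from 2.4 GHz with an integer divider.

```python
SERIAL_CLOCK_MHZ = 2400

READ_DIVIDERS = (10, 5, 60)  # synchronous read logic
WRITE_DIVIDERS = (8, 6)      # FIFO write and state machines


def divided_clock_mhz(serial_clock_mhz: float, divider: int) -> float:
    """Frequency obtained by dividing the serial clock."""
    return serial_clock_mhz / divider


def has_integer_divider(serial_clock_mhz: int, target_mhz: int) -> bool:
    """True if target_mhz can be reached from the serial clock by integer division."""
    return serial_clock_mhz % target_mhz == 0


for d in READ_DIVIDERS + WRITE_DIVIDERS:
    print(f"/{d}: {divided_clock_mhz(SERIAL_CLOCK_MHZ, d):.0f} MHz")
print("320 MHz reachable from 2400 MHz:", has_integer_divider(2400, 320))
print("320 MHz reachable from 3200 MHz:", has_integer_divider(3200, 320))
```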
Implementation data transmission; 60 bit / 5 IO
• Multi serial 60 bit:
  – 60 bits (8b10b); 5 I/O pairs
  – FIFO read frequency for 50 % contingency on 132 Mhits/s: 50 MHz / quarter chip * 60 bit / 5 pairs (10 bits serializer); 3000/5 = 600 MHz per LVDS pair
  – Input frequency comes from PLL or from outside, either 2.4 Gbit/s on pad or 480 MHz for all pads & synchronous logic
  – If synchronous logic works with 480 MHz only: 480 MHz * 5 = 2400 Mbit/s / 60 -> 40 Mhits/s (+21 % (132 Mhits/s), +54 % (104 Mhit/s))
  – Worst case:
    • synchronous logic works with 320 MHz only: 320 MHz * 5 = 1600 Mbit/s / 60 -> 26.7 Mhits/s (-19 % (132 Mhits/s), +3 % (104 Mhit/s))
    • synchronous logic works with 240 MHz only: 240 MHz * 5 = 1200 Mbit/s / 60 -> 20 Mhits/s (-39 % (132 Mhits/s), -23 % (104 Mhit/s))
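The margins quoted for the multi-serial output can be recomputed as follows. This is a sketch with hypothetical function names; it assumes, as the percentages on the slide imply, that the chip hit rate is shared equally by the four quarter chips and rounded down to whole Mhits/s (33 and 26).

```python
QUARTER_CHIPS = 4


def word_rate_mwords(sync_mhz: float, pairs: int, bits_per_word: int) -> float:
    """Words per microsecond that the LVDS pairs of one quarter chip can carry."""
    return sync_mhz * pairs / bits_per_word


def margin_percent(rate_mwords: float, chip_hit_rate_mhits: int) -> float:
    """Headroom of the link over the per-quarter-chip hit rate, in percent."""
    per_qchip = chip_hit_rate_mhits // QUARTER_CHIPS  # 132 -> 33, 104 -> 26
    return (rate_mwords / per_qchip - 1) * 100


for f_sync in (480, 320, 240):
    rate = word_rate_mwords(f_sync, pairs=5, bits_per_word=60)
    print(f_sync, round(rate, 1),
          round(margin_percent(rate, 132)), round(margin_percent(rate, 104)))
```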
Implementation data transmission 60b, high speed
• Using GBT running at 26.66 MHz
• 26.66 MHz in, 3.2 Gbit/s -> 53 Mwords/s (+61 % (132 Mhits/s); +105 % (104 Mhits/s))
• 3200/320 = 10
• Programmable divider: 10 (320)
[Figure: clock tree. 26.66 MHz -> PLL -> 3.2 GHz -> clock divider; 53 MHz parallel_load (/60); Fifo read: 53 MHz (60) / 320 MHz (10) / 640 MHz (5!); Fifo write: 320 MHz (10) / 400 MHz (8) / 533.33 MHz (6)]
IOs
• South end of chip:
  – (12 mm - 2 corners * 0.215 mm) / 0.073 mm pitch = 158
  – if possible only one row; optionally two rows, with power pins in the 2nd row (longer bond wires)
  – bond pads 200 µm long x ~70 µm wide
• East and west end:
  – area accessible when sensor bonded: x mm pads
  – area not accessible when sensor bonded: x mm pads, available for test pads in the EOC area
I/O pad list (signal, type, number of pads)
Operation
  clk_dig                        lvds_in          2
  clk_dll                        lvds_in          2
  serial_conf_in                 lvds_in          2
  reset_coarse_frame_count       lvds_in          2
  reset_global                   cmos_in #        1
  reset_dll                      cmos_in #        1
  serial_conf_out                lvds_out         2
  reset_bandgap                  cmos_in          1
  serial_out<3 downto 0>         CML_out          8
  temp<1 downto 0>               analog_out       2
  test_pulse_in                  diff analog_in   2
  multiSerial_out<19 downto 0>   lvds_out         40
  clk_multiserial or clk_test    lvds_out         2
Mode
  bandgap_override               analogInOut      1
  mode_parallel_out              cmos_in          1
  clockMuxMode                   cmos_in          3
Test
  Test_out<37 downto 0>          cmos or analog or lvds   38
  Test_in<39 downto 0>           cmos or analog or lvds   40
  seu                            lvds_out         2
Optional
  address<3 downto 0>            cmos             (4)
  Jtag_trst                      cmos             (1)
  Jtag_tck                       cmos             (1)
  Jtag_tms                       cmos             (1)
  Jtag_tdi                       cmos             (1)
  Jtag_tdo                       cmos             (1)
  C_chan<7 downto 0>             cmos             ?
Power
  VDDanalog1.2                   power            13
  VDDtdc1.2                      power            6
  VDDdigital1.2                  power            7
  VDDserializer (min. 3 pairs/serializer)   power   12
  VDDlvds2.5                     power            1
  VDDlvdsMultiSerial2.5          power            2
  GNDanalog1.2                   power            13
  GNDtdc1.2                      power            6
  GNDdigital1.2                  power            7
  GNDserializer (min. 3 pairs/serializer)   power   12
  GNDlvds2.5                     power            1
  GNDlvdsMultiSerial2.5          power            2
(12 mm - 2 * 0.215 mm) / 0.073 mm = 158 pads available
156 pads without ()
# possibly LVDS
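The pad budget can be checked with a short sketch. The number of available pads follows directly from the edge length, corner size and pitch given on the slide. The grouping of the listed pads is an assumption: summing the operation, mode, power and seu pads reproduces the 156 printed on the slide, which suggests that the Test_in/Test_out pads and the parenthesised optional pads are counted separately.

```python
import math


def pads_available(edge_mm: float = 12.0, corner_mm: float = 0.215,
                   pitch_mm: float = 0.073) -> int:
    """Whole pads that fit along one edge between the two corners."""
    return math.floor((edge_mm - 2 * corner_mm) / pitch_mm)


# Pad counts as listed on the slide (assumed grouping, see lead-in).
operation_and_mode = [2, 2, 2, 2, 1, 1, 2, 1, 8, 2, 2, 40, 2, 1, 1, 3]
supply_per_polarity = [13, 6, 7, 12, 1, 2]  # same counts for VDD and GND
seu = 2

pads_listed = sum(operation_and_mode) + 2 * sum(supply_per_polarity) + seu
print(pads_available(), pads_listed)
```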
I/O
• Which test pads for building blocks?
  – TDC inputs
• Can they be put in the 2nd row, or on the side?
• How much space for EOC? 4.5 mm + pad row = 5 mm
• How much of the ASIC is not under the sensor? (Length minus corner) / 73 µm * 2 is the number of test pads
Test pads
• divided PLL output on test pad
Chip assembly
• Global floor planning
• Placement of pixel matrix, TDC, EOC, pad ring, configuration, auxiliary blocks
• Power routing
• Global functionality simulation
• DRC, LVS
• Top level schematic
• Chip size compatibility with sensor, dicing, bump bonding
Block assembly
• Pixel matrix (Virtuoso)
  – Pixel cell, in-pixel configuration, in-pixel DACs
• EOC blocks (Encounter)
  – TDC, hitArbiter, FIFOreadout, quadConfiguration, chipConfiguration
• Global blocks (Virtuoso/Encounter, depending on competency)
  – Serializer, IO ring, band gap, temperature
Verification sequence
• Test patterns
  – From hit generator or
  – From configuration pattern
• Individual blocks
  – Behavioral/functional
  – Layout DRC/LVS
  – Timing back annotated, worst/best case (libraries)
• Local top level (i.e. TDC, FIFO read-out, full configuration)
  – Full functional back annotated with test patterns
• Global top level (pixel matrix & digital & serializer)
  – Full functional back annotated (digital) with test patterns & simulated configuration & HDL modeled analog front-end & HDL modeled DLL
    • Functional simulation
    • SEU simulation
  – Mixed mode simulation on interface: transmission line & receiver & hitArbiter
  – DRC/LVS (if possible full chip)
• Global system test bench (pattern generator, verification of data output, assertions)
Pixel cell & matrix
• Pixel cell
  – Pre-amplifier, discriminator, transmission line driver
  – In pixel DAC
  – In pixel configuration
  – Qualification
    • analog: extraction, connectivity, crosstalk sensitivity
    • config: functionality, connectivity
• Pixel matrix
  – Top level schematic
  – column layout
  – transmission lines
  – Transmission line receiver
    • placement
    • Translation to 1.7 OA
    • Qualification: extraction, simulation
  – power routing
  – test pulse routing
  – biasing DACs
  – bias routing
  – configuration routing
  – Bias monitoring & mux
  – Qualification
    • analog: extraction, connectivity, crosstalk sensitivity, power drop
    • config: functionality, connectivity
Pixel cell & matrix
• Analog End-of-column
  – Column DAC
  – Column DAC control
• Temperature/radiation diodes
  – ADC
  – direct output
TDC
• Delay line
  – Delay line, charge pump, loop filter
  – State machine
  – Qualification
    • DLL, operation margins, startup, extraction
    • Top level, including state machine
TDC
• TDC
  – Floorplanning
  – Delay line
  – 32-5 encoder
    • synthesis, layout, simulation
  – fine hit registers
    • Layout, simulation, qualification with routing effects
  – coarse counter
    • concept
    • synthesis
    • qualification
  – hit arbiters & edge detector
    • schematic, simulation, layout
    • Qualification
  – State machine
  – placement, routing, interconnection bus
  – Verification of power consumption
  – power routing TDC & compatibility with pixel matrix/global power routing
  – Qualification
    • extraction, functionality, crosstalk, power routing, top level, mixed mode
  – Top level schematics
  – Functional simulation (startup & time tag)
  – Timing simulation with hitArbiterController & FIFO controller & serial read-out controller
HitArbiter
• Test bench
• Remove demonstrator problems
  – Double hits, varying delays, pileUp address
• Move to OA, 1.7
• Simulate back annotation with test bench, define efficiency
• Place/route compatible with space and power routing
Configuration
• Global configuration master
• QuadConfiguration
• PixelConfiguration
  – SEU simulation
  – DLL & pixel cell functional verification with real configuration data
  – Place & route (Encounter)
FIFO read-out
• read-out
  – VHDL system level simulation, occupancy, definition of FIFO depths
  – FIFO controller (SEU hard)
  – FIFO
Task
• PLL & Serializer & driver
• Band Gap
• LVDS 500 Mbit/s driver / receiver, rad tolerant
• 200 µm pad opening on all pads
Pad library
• Pad modification required for all pads to have large bond pads
• Special 70 µm LVDS pads?
LVDS pads
• Have never been tested or simulated in detail at higher than 200 MHz
• Pads in the demonstrator have a known radiation issue; for us, with 100 krad, this should not be a problem
• New pads are going to be tested, but they are not faster; they have been optimized for below 200 MHz!
PLL & Serializer
• Use GBT as template
  – 4 * serializer + 1 PLL @ 4.8 Gbit/s = 750 mW
• Use GBT only with 2.4 Gbit/s nominal
• Redesign clock divider
• Move from LM to DM
  – Only power and capacitors on top 5 layers
• Change aspect ratio from 1 mm x 1 mm to 0.5 mm x 2 mm
• Separate PLL from serializer
• Implement 4 clock dividers (10/8/6/2 (Mux))
• Change SR length to 2 * 25
• Use only 2 Mux inputs
• Outputs are CML; are the optical components compatible with CML? If not, find converters.
Pad ring
• Definition of power domains
• Break pad ring
• Connect to power stripes
• Implement elongated pads
Power domains
• VDDanalog1.2
  – pixel matrix only
  – consumption 50 %: 1.6 W, 1.3 A, ≥ 13 pins
• VDDtdc1.2
  – DLL, fine time registers
• VDDdigital1.2
  – synthesized logic
  – VDDtdc & VDDdigital consumption 50 %: 1.6 W, 1.3 A, ≥ 13 pins
• VDDserializer1.2 (?)
  – 4 * 150 mA, min. 6 pads; Paulo: min. 3 pairs per serializer, min. 12 pairs
• VDDlvds2.5
  – clkdll, serialConfigIn/Out, resetCoarseCnt
  – 1 pin
• VDDlvdsmultiserial2.5
  – 4 groups of 5 pads (should be physically grouped together)
  – min. 2 pins
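The pin counts in the power domain list are consistent with a budget of about 100 mA per supply pin. That per-pin figure is not stated on the slide; it is an assumption inferred from "1.3 A ≥ 13 pins" and "4 * 150 mA, min. 6 pads". A minimal sketch under that assumption:

```python
MA_PER_PIN = 100  # assumption inferred from the slide, not a stated design rule


def min_supply_pins(current_ma: int) -> int:
    """Smallest number of pins so that no pin carries more than MA_PER_PIN."""
    return -(-current_ma // MA_PER_PIN)  # integer ceiling division


print(min_supply_pins(1300))     # analog domain: 1.6 W at 1.2 V, rounded to 1.3 A
print(min_supply_pins(4 * 150))  # four serializers at 150 mA each
```

Integer milliamps are used on purpose so that the ceiling is not disturbed by floating-point rounding.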
Notes
• From here on: notes and old block diagrams
Implementation data transmission 50b
• Using GBT running at 20 MHz, but modifying data shift length to 50
• Problem: GBT has 3 parallel multiplexed shift registers, 50/3 = 16.7; GBT needs to be modified to 2 SR of 25 bits each, first clock divider from 3 to 2, additional high speed dividers
• 20 MHz in, 2.4 Gbit/s -> 48 Mwords/s (+45 % (132 Mhits/s); +84 % (104 Mhits/s))
• 2400/320 = 7.5 (!); 2400/8 = 300 MHz
• Programmable divider: 10 (240) / 5! (480) / 50 (48) for synchronous read logic
• Programmable divider: 8 (300), 6 (400) for FIFO write and state machines
[Figure: clock tree. 20 MHz -> PLL -> 2.4 GHz -> clock divider; 1.2 GHz serial mux & shift; 48 MHz parallel_load (/50); Fifo read: 48 MHz (50) / 240 MHz (10) / 480 MHz (5!); Fifo write: 240 MHz (10) / 300 MHz (8) / 400 MHz (6)]
• Synchronous parallel read-FIFO frequency: serialFrequ * n / 50 [MHz] = 48 (1) / 96 (2) / 144 (3) / 192 / 240 (10) / 288 / 336 / 384 / 432 / 480 (5!)
• Fast counter:
  – /2 = 1.2 GHz serial mux & shift
  – /5 /2 = 240 MHz fifo read
  – /5 /2 = 240 MHz; /2 /4 = 300 MHz; /3 /2 = 400 MHz: state machines, all FIFOs & chipFIFOwrite
Implementation data transmission 50b; 5 pairs
• Multi serial 50 bit:
  – 50 bits (8b10b); 5 I/O pairs
  – FIFO read frequency for 50 % contingency on 132 Mhits/s: 50 MHz / quarter chip * 50 bit / 5 pairs (10 bits serializer) -> 500 MHz per LVDS pair; 2400/5 = 480 MHz
  – Input frequency comes from PLL or from outside, either 2.4 Gbit/s on pad or 480 MHz for all pads & synchronous logic
  – Worst case:
    • synchronous logic works with 320 MHz only: 320 MHz * 5 = 1600 Mbit/s / 50 -> 32 Mhits/s (-4 % (132 Mhits/s), +23 % (104 Mhit/s))
    • synchronous logic works with 240 MHz only: 240 MHz * 5 = 1200 Mbit/s / 50 -> 24 Mhits/s (-27 % (132 Mhits/s), -8 % (104 Mhit/s))
Implementation data transmission; 60 bit / 4 IO
• Multi serial 60 bit:
  – 60 bits (8b10b); 4 I/O pairs
  – FIFO read frequency for 50 % contingency on 132 Mhits/s: 50 MHz / quarter chip * 60 bit / 4 pairs (10 bits serializer) -> 750 MHz per LVDS pair; 2400/4 = 600 MHz
  – Input frequency comes from PLL or from outside, either 2.4 Gbit/s on pad or 480 MHz for all pads & synchronous logic
  – Synchronous logic works with 600 MHz: 600 MHz * 4 = 2400 Mbit/s / 60 -> 40 Mhits/s (+21 % (132 Mhits/s), +54 % (104 Mhit/s))
  – Worst case:
    • synchronous logic works with 320 MHz only: 320 MHz * 4 = 1280 Mbit/s / 60 -> 21.3 Mhits/s (-35 % (132 Mhits/s), -18 % (104 Mhit/s))
    • synchronous logic works with 240 MHz only: 240 MHz * 4 = 960 Mbit/s / 60 -> 16 Mhits/s (-52 % (132 Mhits/s), -38 % (104 Mhit/s))
Implementation data transmission 50b, high speed
• Using GBT running at 26.66 MHz
• 26.66 MHz in, 3.2 Gbit/s -> 64 Mwords/s (+93 % (132 Mhits/s); +145 % (104 Mhits/s))
• 3200/320 = 10
• Programmable divider: 10 (320)
[Figure: clock tree. 26.66 MHz -> PLL -> 3.2 GHz -> clock divider; 64 MHz parallel_load (/50); Fifo read: 320 MHz (10) / 640 MHz (5!); Fifo write: 320 MHz (10) / 400 MHz (8) / 533.33 MHz (6)]
Implementation data transmission 50b, 4.8 Gbit/s
• If only 2 SR in serializer it will not run at 40 MHz
• Using GBT running at 40 MHz
• 40 MHz in, 4.8 Gbit/s -> 96 Mwords/s (+190 % (132 Mhits/s); +270 % (104 Mhits/s))
• 4800/320 = 15; 2400/8 = 300 MHz
• Programmable divider: 10 (480) / 8 (600) / 6 (800) / 16 (300) / 12 (400) / [15 (320)]
[Figure: clock tree. 40 MHz -> PLL -> 4.8 GHz -> clock divider; 96 MHz parallel_load (/50); Fifo read: 480 MHz (10) / 640 MHz (5!); Fifo write: 480 MHz (10) / 600 MHz (8) / 800 MHz (6) / 400 MHz (12) / 300 MHz (16)]
Clock tree & divider
Notes on data transmission
• 1 GHz beam: 132 Mhits/s per chip
• 750 MHz: 105 Mhits/s per chip
• 132 Mhits/s * 40 bits = 5.28 Gbit/s
• 4 serializers: 5.28/4 = 1.32 Gbit/s; 132/4 = 33 Mwords/s
• 8b10b: 1.32 * 10/8 = 1.65 Gbit/s; 132/4 = 33 Mwords/s
• +20 % contingency: 1.65 * 1.2 = 1.98 Gbit/s; 132 * 1.2/4 = 39.6 Mwords/s
• = 51 % contingency for 750 MHz & 105 Mhits/s
• Approach with two clock domains for last FIFO stage
• 320 MHz * 8 = 2.56 Gbit/s
• FIFO read frequency: 2560/50 = 51.2 MHz
• 320/51.2 = 6.25 (no integer): FIFO read cannot run on 320 MHz clock
• 2nd clock needed to read last FIFO; if so, then serial frequency = read_frequency * 50
• 2.56 Gbit/s is arbitrarily chosen
• Clock_dll = 320 MHz, clock_digital = 320 MHz, clock_serial = 2.56 GHz with division by 50
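The bandwidth budget and the divider argument in these notes can be reproduced with a short sketch. The function names and default parameters are illustrative; the numbers are those of the notes (40 bit hit words, 4 serializers, 8b10b overhead, 20 % contingency, 50 bit encoded words).

```python
def required_line_rate_gbps(chip_hit_rate_mhits: float, bits_per_hit: int = 40,
                            serializers: int = 4, contingency: float = 0.20) -> float:
    """Line rate per serializer after 8b10b encoding and contingency."""
    raw_gbps = chip_hit_rate_mhits * bits_per_hit / 1000 / serializers
    encoded_gbps = raw_gbps * 10 / 8
    return encoded_gbps * (1 + contingency)


def fifo_read_khz(serial_khz: int, bits_per_word: int = 50) -> int:
    """FIFO read frequency for one encoded word per read, in kHz."""
    return serial_khz // bits_per_word


def runs_on_clock(clock_khz: int, read_khz: int) -> bool:
    """True if read_khz can be derived from clock_khz by an integer divider."""
    return clock_khz % read_khz == 0


rate = required_line_rate_gbps(132)
read = fifo_read_khz(2_560_000)  # 2.56 GHz serial clock
print(round(rate, 2), read / 1000, runs_on_clock(320_000, read))
```

Frequencies are kept as integer kHz so that the divisibility test is exact; 320 MHz / 51.2 MHz = 6.25 is not an integer, which is the reason the notes call for a second clock domain at the last FIFO.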
Notes on data transmission
• Last FIFO read & write clock different
[Figure: clock tree. 320 MHz or any -> PLL -> 2.56 GHz (1.28 GHz) -> clock divider; 51.2 MHz parallel_load; if possible 320 MHz but not required]
Notes on data transmission
• All blocks on 320 MHz
• 3.2 Gbit/s -> 64 Mwords/s (+93 % (132 Mhits/s); +150 % (104 Mhits/s))
[Figure: clock tree. 320 MHz -> PLL -> 3.2 GHz -> clock divider; 64 MHz parallel_load; if possible 320 MHz but not required]
Notes on data transmission
• Parallel out: max 4 x 2 pins per quarter chip (40/4 = 10)
• Data without 8b10b encoding:
  – 320M/40*4 = 32 Mwords/s (+23 %; 104 Mhit/s)
  – 450M/40*4 = 45 Mwords/s (+73 %; 104 Mhit/s)
  – 480M/40*4 = 48 Mwords/s (+84 %; 104 Mhit/s; +45 %; 132 Mhits/s)
• 320 MHz clock domain compatible with 320M/355M/400/457/533; otherwise the read frequency of the last FIFO is different from 320 MHz -> two clock domains
• With 8b10b encoding: 4 IO is inconvenient; either 5, same data rate as above, or
• Unbalanced transmission (50/4 = 12.5)
• @ 320 MHz, 1.28 Gbit/s: 320M/50*4 = 25.6 Mwords/s (-2 %; 104 Mhits/s)
• @ 450 MHz, 1.8 Gbit/s: 450M/50*4 = 36 Mwords/s (+38 %; 104 Mhits/s)
• 2 clock domains at last FIFO required
Notes on data transmission
• GBT:
  – 40 MHz in; 4.8 Gbit/s out, stream 120 bits
  – Block is 1 mm x 1 mm; aspect ratio not good for us
  – 4 serializers + 1 PLL = 750 mW @ 40 MHz
• If used as it is:
  – Running at 20 MHz gives 2.4 Gbit/s
  – Our 50 bit data stream needs to be reformatted to 120 bits
  – Top level metals contain power and capacitors; move to LM seems possible
Data transmission
• Using GBT running at 20 MHz with 120 bit serializer word length
• Needs a demultiplexer 5 * 40 bits to get from 40 bit words to 100, before or after FIFO, and then 8b10b encoding to 120 bit; additional control needed
• 20 MHz in, 2.4 Gbit/s -> 48 Mwords/s (+45 % (132 Mhits/s); +84 % (104 Mhits/s))
• 2400/320 = 7.5 (!); 2400/8 = 300 MHz
• Programmable divider: 10 (240) / 5! (480) for synchronous logic
[Figure: clock tree. 20 MHz -> PLL -> 2.4 GHz -> clock divider; 20 MHz parallel_load (/120); 240 MHz / 480 MHz]
• Synchronous parallel read-FIFO frequency: serialFrequ * n / 120 [MHz] = 20 (1) / 40 (2) / 60 (3) / 80 / 100 / 120 / 140 / 160 / 180 / 200 / 220 / 240 / … 300 / 400 / 480 (5!) / ..
• Verification sequence for each sub block
• update/verify block diagram
• which test pads?
• which IO signals to blocks
• min. power supply pads per domain
I have another question about the clocks sent to the GTK: in an earlier talk on this subject we had agreed on sending, by means of optical links, the "high quality" clock for the GTK TDCs and the "digital clock" for the serializers. I found PLLs from IDT which have ps-level jitter and would do the job of redriving the 40 MHz clock, multiplied when needed, to the GTK ASIC very well. But the peak jitter figures of the optical transceivers, for instance of the Finisar 4.2 Gbps which I was thinking of using, are in the range of tens of ps; even 120 ps for the Zarlink 2.5 Gbps. It is not clear how many sigmas they use to define the maximum. Should we worry?
TDCpix ASIC block diagram (50 bit serial), 2011.02.25
[Figure: earlier block diagram. Labels recovered from the drawing are listed below.]
• Pixel column: pixel cell x 45 with config pixel, trimDAC pixel, driver & line & receiver
• End of column: thresholdDAC column, ConfigQuad, bandgap (with bandgap override), Global DACs, test pulse, analogMonitor mux (9+1 x temp)
• TDC: DLL 0 (CP & PD, state machine), clkdll = 320 MHz; hitArbiter 0 & edge detector, hA 1, hA 2, ... hA 8; fineHitRegister, coarseHitRegister, syncRegister, fineTimeStampEncoder, coarseTimeStampEncoder, coarseTimeStampServer; register widths shown: 2 x 32 and 2 x (13 + 5?)
• pixelGroupFifo (depth = 2 to n), group EOC 0 to group EOC 8
• Hit word width: 5 fineRise + 5 fineTrail + 12 coarseRise + 4 coarseTrail + 5 add + 5 pil = 36
• columnMux 9 to 1, columnFifo (depth = 0 to 4), columnFifoController: 36 + 4 add = 40
• quarterChipMux 10 to 1, quarterChipFifo (depth = ~8), quarterchipFifo & frameInserter controller: 40 + 4 add = 44
• data formatter & multipleHit & comma & frame inserter, 8b10b encoder (50), serializer
• Outputs: 2.4/3.2 Gbit/s CML driver; parallelOut 5 x LVDS 480/640 Mbit/s ((enable)/mode)
• serialTimeMux 20 to 1 (9 columns + 9 dummies), serialTimeController; items marked # are optional
• Column quad 0 to 9; quarter chip RO 0 to 3
• PLL & clock divider & clk distribution: clkDigital = 20/26.7 MHz, LVDS 320/480 MHz, PLL override, clkSerial = 2.4/3.2 GHz, muxmode; clksyncReg = clksync or clkserialTime; the clk divider needs a synchronous reset with respect to the receiving clock domain (clkmultiserial)
• Resets: reset_dll (CMOS), reset_global (CMOS), reset_corsecnt (LVDS)
• Flip-flops: 648 FF @ 2 depth, 152 FF @ 4 depth, 373 FF @ 8 depth; 23 cell units * (0.40 µm x 4.8 µm) * (648 + 152 + 373/9) FF = 37000 µm² = 124 µm * 300 µm
• Nominal rate (750 MHz beam, 104 Mhit/s per chip) / rate with 2.4 Gbit/s serializer [Mhit/s]: 0.3/0.5, 2.7/4.8, 27/48
• Legend: marked blocks = SEU protected
Clock modes (values in the order nominal / high / extern320 / extern480 / externserial):
• clkInDigital = 20 / 26.66 / 320 / 480 / 320 MHz
• clkPLL = 2.4 / 3.2 / - / - / 0.32 GHz
• clksync = 240 (10) & 300 (8) & 400 (6) / 320 (10) & 400 (8) & 533 (6) / 320 / 480 / 32 (10) & 40 (8) & 53.3 (6) MHz
• clkFIFOread = 48 / 64 (50) / 32 / 48 (10) / 6.4 MHz
• clkmultiserial = 480 / 640 / 320 / 480 / 64 MHz
• clkserialtime = 240 / 320 / 160 / 240 / 32 MHz
TDCpix ASIC block diagram (60 bit serial/4 pads), 2011.02.25
[Figure: earlier block diagram. Labels recovered from the drawing are listed below.]
• Pixel column: pixel cell x 45 with config pixel, trimDAC pixel, driver & line & receiver
• End of column: thresholdDAC column, ConfigQuad, bandgap (with bandgap override), Global DACs, test pulse, analogMonitor mux (9+1 x temp)
• TDC: DLL 0 (CP & PD, state machine), clkdll = 320 MHz; hitArbiter 0 & edge detector, hA 1, hA 2, ... hA 8; fineHitRegister, coarseHitRegister, syncRegister, fineTimeStampEncoder, coarseTimeStampEncoder, coarseTimeStampServer; register widths shown: 2 x 32 and 2 x (13 + 5?)
• pixelGroupFifo (depth = 2 to n), group EOC 0 to group EOC 8
• Hit word width: 5 fineRise + 5 fineTrail + 12 coarseRise + 4 coarseTrail + 5 add + 5 pil = 36
• columnMux 9 to 1, columnFifo (depth = 0 to 4), columnFifoController: 36 + 4 add = 40
• quarterChipMux 10 to 1, quarterChipFifo (depth = ~8), quarterchipFifo & frameInserter controller: 40 + 4 add = 44
• data formatter & multipleHit & comma & frame inserter, 8b10b encoder (60), serializer (48)
• Outputs: 2.4/3.2 Gbit/s CML driver; parallelOut 5 x LVDS 480/640 Mbit/s ((enable)/mode)
• serialTimeMux 20 to 1 (9 columns + 9 dummies), serialTimeController; items marked # are optional
• Column quad 0 to 9; quarter chip RO 0 to 3
• PLL & clock divider & clk distribution: clkDigital = 20/26.7 MHz, LVDS 320/480 MHz, PLL override, clkSerial = 2.4/3.2 GHz, muxmode; clksyncReg = clksync or clkserialTime; the clk divider needs a synchronous reset with respect to the receiving clock domain (clkmultiserial)
• Resets: reset_dll (CMOS), reset_global (CMOS), reset_corsecnt (LVDS)
• Flip-flops: 648 FF @ 2 depth, 152 FF @ 4 depth, 373 FF @ 8 depth; 23 cell units * (0.40 µm x 4.8 µm) * (648 + 152 + 373/9) FF = 37000 µm² = 124 µm * 300 µm
• Avg. nominal rate (750 MHz beam, 104 Mhit/s per chip) / rate with 2.4 Gbit/s serializer [Mhit/s]: 0.3/0.5, 2.7/4.8, 27/48
• Legend: marked blocks = SEU protected
Clock modes (values in the order nominal / high / extern320 / extern480 / externserial):
• clkInDigital = 20 / 26.66 / 320 / 480 / 320 MHz
• clkPLL = 2.4 / 3.2 / - / - / 0.32 GHz
• clksync = 240 (10) & 300 (8) & 400 (6) / 320 (10) & 400 (8) & 533 (6) / 320 / 480 / 32 (10) & 40 (8) & 53.3 (6) MHz
• clkFIFOread = 48 / 64 (50) / 32 / 48 (10) / 6.4 MHz
• clkmultiserial = 480 / 640 / 320 / 480 / 64 MHz
• clkserialtime = 240 / 320 / 160 / 240 / 32 MHz
Data format: hit word, normal mode, 40 bit
• Status/data selector: 1 bit
• Leading coarse time: 12 bit; 1 bit rollover indicator + 2048 (11 bit) * 3.125 ns = 6.4 µs
• Leading fine time: 5 bit; 98 ps -> 3.125 ns
• Trailing coarse time: 4 bit; 16 * 3.125 ns = 50 ns
• Trailing fine time: 5 bit; 98 ps -> 3.125 ns
• Coarse time selector: 0/1 bit
• Address: 12/9 bit
  – Address-hit arbiter: 5 bit (3 bit possible, but loss of double address bit info)
  – Address-pixel group: 7 bit (9 x 10 pixel groups in quarter chip -> encoding required)
  – or Address global quarter chip: 9 bit (if all are encoded; if double address bit, send two hits)
• Address pileup: 5/2/0 bit (can be encoded if only one pileup info is sufficient, or can be sent as second word)
• Error bit (SEU, overflow): 2 bit/0 bit (can be sent afterwards as status word)
• Total: 45/40/36 bit
I/O not in Michel's symbol (signal, type, number of pads)
• Test_pulse_in: lvds, 2
• PLL override: cmos, 1
• Bandgap: analog, 1
• Parallel_out<39 downto 0>: lvds, 40
• mode_parallel_out: cmos, 1
• Reset_dll: cmos, 1
• Clock muxmode: cmos, 3
138 pads without ()
(12 mm - 2 * 0.215 mm) / 0.073 mm = 158