ECE 679 Lecture 1
-
ECE 679: Digital Systems Engineering
Patrick Chiang
Office Hours: 1-2PM Mon-Thurs, GLSN 100
-
Class Introductions
Who am I?
Who are you?
-
Class Basics
4 Homeworks (20%) (groups of 2)
Midterm (40%)
Final Project (40%): 4-page IEEE report, 10-minute presentation (groups of 2)
Guest lecture: Dr. Frank O'Mahony, Intel Research Labs (May 4th)
Intel Field Trip (June 7th, TBD)
Presentations of 1-2 best project reports
-
Class Homework
Skim Dally/Poulton, Digital Systems Engineering, Chapter 3
Skim overview paper: http://mos.stanford.edu/papers/mh_micro_98.pdf
Includes running Stat Eye
Oregon State Matlab (eecs.oregonstate.edu/it)
www.stateye.org
Problem Set #1
rlc files -- ~pchiang/hspice (rlc_spice_deck; rlc.rlc)
Spice models -- ~pchiang/hspice/process_files/130nm to 22nm
Simulator lang = spice
Spectre models
DEFINE gpdk090 /nfs/guille/analog/c/cdsmgr/process/gpdk090_v3.8/libs.cdb/gpdk090
-
What does this mean for analog designers?
Ever build an ADC?
Ever wonder what to do with the digital bits?
[Figure: analog in, Fs = 600MHz; 8-16 bits @ 100MHz, 200MHz, 400MHz; goes to vector analyzer]
Why does this clock rate not increase?
What really is this output doing? Where is it going?
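One way to see why the output clock matters: the aggregate bit rate leaving a converter is resolution times word rate. The Python sketch below assumes one N-bit word per output clock and a hypothetical 400Mb/s per pin for a parallel CMOS output bus (that per-pin figure is an illustrative assumption, not a number from this lecture).

```python
import math


def adc_output_rate(bits: int, sample_rate_hz: float) -> float:
    """Aggregate output bit rate in b/s for an ADC producing `bits` per word."""
    return bits * sample_rate_hz


def pins_needed(total_rate_bps: float, per_pin_rate_bps: float) -> int:
    """Single-ended data pins needed to carry total_rate_bps at per_pin_rate_bps each."""
    return math.ceil(total_rate_bps / per_pin_rate_bps)


if __name__ == "__main__":
    PER_PIN_BPS = 400e6  # hypothetical per-pin rate of a parallel CMOS output bus
    for bits in (8, 16):
        for fs_hz in (100e6, 200e6, 400e6, 600e6):
            rate = adc_output_rate(bits, fs_hz)
            print(f"{bits:2d} bits @ {fs_hz / 1e6:3.0f}MHz -> {rate / 1e9:4.1f} Gb/s, "
                  f"{pins_needed(rate, PER_PIN_BPS)} pins")
```

At 16 bits and 600MHz the output is already 9.6Gb/s, which is the serial-link regime covered in the rest of the course.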
-
Brief Summary
Introduction to the area
Why serial links are important
What are the current technology trends/limitations
-
4Gb/s Low Power, Area Efficient Serial Links
Interconnection between different chips
Transmitter Equalization
Receiver Offset Cancellation
[Figures: 2000 0.25um test chip; 2001 0.25um test chip; 4Gb/s transmitter output; 4Gb/s transmitter output, 1m; 4Gb/s transmitter output, equalized]
Ming-Ju E. Lee, William J. Dally, John W. Poulton, Patrick Chiang, Stephen F. Greenwood. An 84-mW 4Gb/s Clock and Data Recovery Circuit for Serial Link Applications. VLSI Circuits Symposium, Kyoto, Japan, June 2001, pp. 149-152.
Ming-Ju E. Lee, William Dally, Patrick Chiang. Low-Power Area-Efficient High-Speed I/O Circuit Techniques. IEEE Journal of Solid-State Circuits, November 2000, Vol. 35, No. 11, pp. 1591-1599.
Organization of the channel, arrows from channel, plots; change image layout
Reall what you want to say on the slides.
-
Scaling Serial Links: From 4Gb/s -> 20Gb/s
Thesis: Develop 20Gb/s Serial Link
Area: 500um x 500um
Power: 200mW/link
1 bit time = 1 FO4
Timing uncertainty becomes KEY issue
Focus on timing uncertainty, not channel independent vector
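Why timing uncertainty becomes the key issue can be made concrete by expressing a fixed jitter budget as a fraction of the bit time (unit interval, UI). This small Python sketch reuses the 15.6ps pk-pk jitter reported for the 19.2Gb/s eye diagram later in this lecture; the function names are illustrative.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit time (one unit interval, UI) in ps for a data rate in Gb/s."""
    return 1000.0 / data_rate_gbps


def jitter_in_ui(jitter_ps: float, data_rate_gbps: float) -> float:
    """Express an absolute jitter value (ps) as a fraction of one UI."""
    return jitter_ps / unit_interval_ps(data_rate_gbps)


if __name__ == "__main__":
    for rate in (4.0, 19.2, 20.0):
        print(f"{rate:5.1f} Gb/s: UI = {unit_interval_ps(rate):6.2f} ps, "
              f"15.6ps pk-pk jitter = {jitter_in_ui(15.6, rate):.3f} UI")
```

At 20Gb/s the bit time is 50ps, so the same 15.6ps of pk-pk jitter that costs about 0.06 UI at 4Gb/s consumes roughly 0.3 UI.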
-
Transmitter Block Diagram
Dotted lines around different circuit components, PLL, muxing, etc. Clocks are differential clocks.
Get rid of everything else, use red. Or change images (lose people on the insight; carry through). Simpler is better.
-
Test Chip
Clock Recovery, RX, PRBS Check, PRBS Gen, TX, DLL, Test Interface
UMC 1.2V, 0.13um CMOS (single Vt)
Die size 700um x 1.15mm
50 Ohm Pad Termination using Wafer Probes
[Die photo: 700um x 1.1mm; 10GHz PLL, Transmitter, Muxing, Phase Interpolators, Test Structures]
Our test chip was fabricated in National Semiconductor's quarter-micron CMOS technology. The die is 2.6mm by 1.4mm and uses a 52-pin impedance-controlled package donated by Vitesse Corporation. The active area of the transceiver circuits is 0.31mm^2.
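The test chip includes PRBS generation and checking blocks. The slide does not say which pattern they use, so as an assumption this Python sketch uses the common PRBS7 pattern (x^7 + x^6 + 1) from a 7-bit LFSR, plus a self-synchronizing checker that predicts each received bit from earlier received bits.

```python
def prbs7(n=127, seed=0x7F):
    """Generate n bits of PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("LFSR seed must be nonzero")
    bits = []
    for _ in range(n):
        # Feedback is the XOR of the two oldest stored bits (taps 7 and 6).
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        bits.append(bit)
    return bits


def prbs7_errors(bits):
    """Self-synchronizing check: each bit should equal bits[i-7] XOR bits[i-6]."""
    errors = 0
    for i in range(7, len(bits)):
        if bits[i] != bits[i - 7] ^ bits[i - 6]:
            errors += 1
    return errors


if __name__ == "__main__":
    rx = prbs7(127)
    print("errors, clean stream:", prbs7_errors(rx))
    rx[50] ^= 1  # inject one bit error
    print("errors, one flipped bit:", prbs7_errors(rx))
```

Because the checker predicts from received bits, a single channel error is counted three times (at the flipped bit and at the two later bits that use it as a tap), a known property of this style of checker.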
-
PLL Measurements
Jitter limited by 1.25GHz input reference clock
HP 8133A input clock (1.2ps RMS, 8.9ps pk-pk)
[Figure panels (a), (b), (c); labels: (c) Power Spectrum, Q=10 Jitter, Q=5 Jitter]
Open-loop VCO phase noise @ 1MHz: -97dBc/Hz
10GHz jitter (RMS): 0.97ps
10GHz jitter (pk-pk): 8.0ps
PLL power: 38.6mW
VCO power: 6mW
Tuning range: 1.14-1.31
Change the cadence of talking: these are the important points. Too much stuff in slides; too heavy. Line width is 2-3 points.
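The Q=5 and Q=10 labels refer to reading jitter at a given number of standard deviations. Assuming the jitter is purely Gaussian (random), the pk-pk width at Q is 2Q times the RMS value, and the probability of an edge landing beyond Q sigma on one side is 0.5*erfc(Q/sqrt(2)). This Python sketch applies that to the 0.97ps RMS PLL jitter above; it ignores deterministic jitter, so treat it as an illustration only.

```python
import math


def tail_probability(q: float) -> float:
    """One-sided Gaussian tail beyond q standard deviations: 0.5 * erfc(q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def pk_pk_from_rms(rms_ps: float, q: float) -> float:
    """Peak-to-peak width of purely Gaussian jitter read at +/- q sigma."""
    return 2.0 * q * rms_ps


if __name__ == "__main__":
    RMS_PS = 0.97  # measured 10GHz PLL RMS jitter from the table above
    for q in (5.0, 7.0, 10.0):
        print(f"Q={q:4.1f}: tail prob = {tail_probability(q):.2e}, "
              f"pk-pk = {pk_pk_from_rms(RMS_PS, q):5.2f} ps")
```

Q of about 7 corresponds to a tail probability near 1e-12, which is why link jitter budgets are often quoted at roughly 14 times the RMS value.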
-
Eye Diagram
Data Rate = 19.2Gb/s
Voltage ripple caused by lack of current source at differential pair tail node
Jitter: 2.2ps RMS, 15.6ps pk-pk
Don't spend too much time on 19.2.
Shown here are the phase step values across the entire range. The average phase resolution should be 15.6ps, so the interpolation steps shown are very accurate. Note that every 9th phase has a phase interpolation step smaller than the 15.6ps average, which is expected, since these are the redundant steps. However, not every 9th phase value is consistently small: phases 18 and 36 don't show as small a phase step as phases 9 and 27. This comes from a layout error: asymmetric clock loading causes different capacitive coupling for different transitions.
(Different phase differences are due to different delay amounts in the DLL itself.)
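The phase-step discussion above amounts to a differential nonlinearity (DNL) check: take consecutive measured phase positions, divide each step by the average step, and flag steps far below average. The Python sketch below assumes a list of measured phase positions in ps; the data in the example is synthetic (made up to mimic a small step every 9th phase), not the chip's measurements.

```python
from statistics import mean


def phase_steps(phases_ps):
    """Step size between each pair of consecutive measured phase positions (ps)."""
    return [b - a for a, b in zip(phases_ps, phases_ps[1:])]


def step_dnl(phases_ps):
    """DNL of each step in units of the average step: step / mean(step) - 1."""
    steps = phase_steps(phases_ps)
    avg = mean(steps)
    return [s / avg - 1.0 for s in steps]


def small_step_phases(phases_ps, threshold=-0.25):
    """Phase indices k whose step from phase k-1 has DNL below threshold."""
    return [k + 1 for k, d in enumerate(step_dnl(phases_ps)) if d < threshold]


if __name__ == "__main__":
    # Synthetic (made-up) data: nominal 16ps steps, every 9th step halved.
    phases = [0.0]
    for k in range(1, 37):
        phases.append(phases[-1] + (8.0 if k % 9 == 0 else 16.0))
    print(small_step_phases(phases))  # expected: [9, 18, 27, 36]
```

With real measurements, the layout asymmetry described above would show up as phases 18 and 36 missing from (or sitting near the edge of) the flagged list.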
-
High Speed Transmitter Comparisons
A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm CMOS using a Tri-State Binary PD with 100ps Gated Digital Output
T. Masuda et al., ISSCC 2007.
A full-rate 10Gb/s transceiver core employing a tri-state binary PD with 100ps gated digital output is implemented in a 90nm CMOS process. Direct drive from the VCO is utilized to eliminate the 10GHz clock buffer current. The RX exhibits a recovered jitter of 906fs (rms) and an input sensitivity of 5.9mV. The TX generates a jitter of 5mUI (rms). The chip consumes 250mW.
[Chart: FOM of each transmitter in the comparison table]
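Serial Links: 10Gb/s and Beyond
Figure of merit, normalized to a 40Gb/s data rate:
$$\mathrm{FOM} = \frac{R \cdot L}{P \cdot A \cdot J_{pp}} \cdot \frac{R}{40}$$
where R is the data rate (Gb/s), L the technology feature size (um), P the power (W), A the area (mm^2), and J_pp the peak-to-peak jitter (ps).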
Author | Title | Venue | Technology (value used, um) | Data Rate (Gb/s) | Power (reported; value used, W) | Area (reported; value used, mm^2) | Jitter (reported RMS, pk-pk; pk-pk used, ps) | FOM
Patrick Chiang* | A 20Gb/s 0.13um CMOS Serial Link Transmitter Using an LC-PLL to Directly Drive the Output Multiplexer | VLSI 2004 | 0.13 | 19.2 | 165mW (trans); 0.165 | 0.65mm x 0.35mm (trans); 0.2275 | 2.37, 15ps; 15 | 2.1277922078
Kannan Krishna, Weinlader | A 0.6-9.6Gb/s Binary Backplane Transceiver Core in 0.13um CMOS | ISSCC 2005, p. 66 | 0.13 | 9.6 | 275mW; 0.14 | 0.58mm x 0.97mm, 0.2716 + 690*200; 0.41 | n/a; 12 | 0.4348432056
Jaeha Kim | Circuit Techniques for a 40Gb/s Transmitter in 0.13um CMOS | ISSCC 2005, p. 150 | 0.13 | 40 | 2.7W (trans); 2.7 | 2.5mm x 3.6mm (trans); 9.18 | 1.53, 8.11; 8.11 | 0.0258687858
Stefanos Sidiropoulos | An 800mW 10Gb/s Ethernet Transceiver in 0.13um CMOS | ISSCC 2004, p. 168 | 0.13 | 10 | 0.8W; 0.4 | 2.5mm x 5mm; 1.25 | 6.2ps pk-pk; 6.2 | 0.1048387097
Ullass Singh, Michael Green | 34 Gb/s, 0.18um CMOS transmitter | VLSI 2005, p. 132 | 0.18 | 40 | 1.335W (trans); 1.4 | 1.6mm x 2.6mm (trans); 4.16 | 1.44, 9.44; 9.44 | 0.1309601416
Hyung-Rok Lee | A Fully Integrated 0.13um CMOS 10Gb Ethernet Transceiver with XAUI Interface | ISSCC 2004, p. 170 | 0.13 | 10 | 900mW; 0.45 | 5mm x 5mm; 1 | 2.2, 15; 15 | 0.0481481481
H. Werker | A 10Gb/s SONET-Compliant CMOS Transceiver with Low Cross-Talk and Intrinsic Jitter | ISSCC 2004, p. 172 | 0.13 | 10 | 980mW; 0.49 | 3mm x 5mm, scaled by 0.25; 4 | 0.74, 5ps p-p; 5 | 0.0331632653
Hideki Takauchi | A CMOS Multi-Channel 10Gb/s Transceiver | ISSCC 2003, Sec. 4.2 | 0.11 | 10 | 176mW (rec) + 102mW (PLL) + 188mW (tx); 0.26 | 5mm x 10mm, scaled by 1/24; 2.08 | 25ps; 25 | 0.0203402367
Mounir Meghelli | A 0.18um SiGe BiCMOS Receiver and Transmitter Chipset for SONET OC-768 Transmission Systems | ISSCC 2003, 13.1 | 0.18 SiGe (0.09) | 40 | 5.1W; 2.5 | 2.17x2.17 + 1.7x1.7; 2.9 | 18ps (eye opening); 7 | 0.0709359606
Akio Koyama | 43 Gb/s full-rate-clock 16:1 multiplexer and 1:16 demultiplexer with SFI-5 interface in SiGe BiCMOS technology | ISSCC 2003, 13.2 | 0.18um SiGe, Ft=140GHz (0.09) | 40 | 22W; 11 | 6.5x6.5, 4.35x4.35, 4.35x4.35; 4 | 1.07, 8ps; 8 | 0.0102272727
Derek Shaeffer | A 40/43 Gb/s SONET OC-768 SiGe 4:1 MUX/CMU | ISSCC 2003, p. 236 | SiGe, ft=120GHz (0.09) | 40 | 4.9W; 2.9 | 8.25mm^2; 36 | 880fs, 5.1ps; 5.1 | 0.0067613252
Michael Green | OC-192 Transmitter in Standard 0.18um CMOS | ISSCC 2002, p. 248 | 0.18 | 10 | 450mW (trans); 0.45 | 2.5mm x 2.5mm; 5 | 15ps; 15 | 0.0133333333
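As a sanity check on how the FOM column is computed, this Python sketch evaluates the slide's formula with the "value used" numbers from the table (data rate in Gb/s, feature size in um, power in W, area in mm^2, pk-pk jitter in ps). The function name is illustrative; the 40Gb/s normalization is the one stated on the slide.

```python
def link_fom(rate_gbps, tech_um, power_w, area_mm2, jitter_pp_ps, ref_rate_gbps=40.0):
    """Slide FOM: (R * L) / (P * A * Jpp) * (R / ref_rate)."""
    return ((rate_gbps * tech_um) / (power_w * area_mm2 * jitter_pp_ps)
            * (rate_gbps / ref_rate_gbps))


if __name__ == "__main__":
    # (label, R [Gb/s], L [um], P [W], A [mm^2], Jpp [ps]) from the table's "value used" entries
    rows = [
        ("Chiang, VLSI 2004", 19.2, 0.13, 0.165, 0.2275, 15.0),
        ("Kim, ISSCC 2005", 40.0, 0.13, 2.7, 9.18, 8.11),
        ("Green, ISSCC 2002", 10.0, 0.18, 0.45, 5.0, 15.0),
    ]
    for label, *args in rows:
        print(f"{label:20s} FOM = {link_fom(*args):.4f}")
```

Multiplying by the feature size credits designs in older (larger) processes, and the R/40 factor penalizes links running below 40Gb/s.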
 | P. Chiang | J. Kim | U. Singh | D. Shaeffer
Venue | VLSI 2004 | ISSCC 2005 | VLSI 2005 | ISSCC 2003
Data Rate (Gb/s) | 20Gb/s | 40Gb/s | 34Gb/s | 40Gb/s
Power | 165mW | 2.7W | 1.335W | 4.9W
Area | 0.2275mm^2 | 9.18mm^2 | 4.16mm^2 | 8.25mm^2
Jitter (RMS, pk-pk) | 2.37ps, 15ps | 1.53ps, 8.11ps | 1.44ps, 9.44ps | 880fs, 5.1ps
Technology | 0.13um CMOS | 0.13um CMOS | 0.18um CMOS | 0.09um SiGe
-
Conventional Serial Link Receivers
Conventional architectures also use multi-phase PLL
Well, guess what: we have the same problem at the receiver
-
2nd Generation Transmitter
2-Tap Equalizer implemented for compensating for channel losses
Achieve 50ps analog delay with CML buffers
[Block diagram: 10GHz oscillator (10GHz CLK/CLKB); 10GHz->5GHz and 5GHz->2.5GHz dividers and a divider to 1.25GHz; phase comparator against an off-chip 1.25GHz reference, charge pump, varactor control; low-high buffers; 8 phases @ 5GHz and 4 phases @ 5GHz; main path and equalizing path with 50ps delay; 2:1 MUX tree combining 5Gb/s streams into 10Gb/s and 20Gb/s; data retiming; PRBS/BER checker]
Analog delay, but replica bias
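A 2-tap transmit equalizer sends each bit minus a scaled copy of the previous bit, so the tap delay must equal one bit time: at 20Gb/s that is the 50ps delay above. This Python sketch assumes the convention y[n] = x[n] - alpha*x[n-1] with NRZ symbols of +/-1; other normalizations (for example scaling the main tap by 1 - alpha to hold peak swing) are equally common. The 0.37 value is only an example; the label of that parameter on the results slides was lost.

```python
def two_tap_fir(symbols, alpha):
    """2-tap transmit FIR: y[n] = x[n] - alpha * x[n-1], with x[-1] = 0.

    The one-symbol tap spacing corresponds to a 1-UI delay (50ps at 20Gb/s).
    """
    out = []
    prev = 0.0
    for x in symbols:
        out.append(x - alpha * prev)
        prev = x
    return out


if __name__ == "__main__":
    # NRZ symbols as +/-1; alpha = 0.37 is only an example value.
    print(two_tap_fir([1, 1, -1, -1, 1, -1], 0.37))
```

Transitions come out boosted to +/-1.37 while repeated bits are de-emphasized to +/-0.63, pre-compensating a channel that attenuates high frequencies.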
-
Fabrication: Test Chip
ST Microelectronics 0.13um test chip
307mW / transceiver
0.46mm^2
20mV input sensitivity
[Die photo: 2006 0.13um test chip; transmitter and receiver; dimension labels 500um, 600um, 350um, 450um]
[Schematics: 10GHz VCO driving normal-sized inverters, with control from the charge pump; 10GHz->5GHz, 5GHz->2.5GHz, and 2.5GHz->1.25GHz dividers, 1.25GHz to the phase comparator; low-high converters (L-H) from low swing to digital swing, WP/WN = 4/1; CML divider stage with R=60; MINA1, MINA2 = 20.48u/0.26u, all other M = 40.96u/0.13u]
First 0.13um
-
Results
20Gb/s, ideal channel
20Gb/s, -6.5dB @ 10GHz
[Eye measurements: 43ps, 80mV, 33mV, 37ps]
All results single-ended
-
Results (cont'd)
62mV, 35ps
20Gb/s, ideal channel, with =0.37
20Gb/s, -6.5dB @ 10GHz, with =0.37
36.4ps, 72mV
-
Rationale for Multi-cores
Next-generation computing: multi-core processing, i.e., multiple parallel DSPs (i.e., MACs)
Why can't we achieve faster frequencies?
Wire delays don't scale like transistors
Power increases exponentially (when pushing process technology)
Timing margins degraded by variability, power-supply noise, and digital crosstalk
NOTE: More independent threads require more memory bandwidth
Intel, 80 Cores, ISSCC 2007
-
Research: Explore Parallel Serial Links
Serial links also exhibit the same characteristics:
Channel losses get worse
Power consumption increases significantly with bandwidth
Timing precision limited by static phase offset (process variation), power-supply-induced jitter, and interchannel crosstalk
Serial links need to also push for high amounts of parallelism. How is this different from conventional link design?
Channel equalization becomes more difficult
Adjacent-channel crosstalk
Difficult channel estimation problem (power, flexibility, data rate, equalizer design, channel, distance)
Amortize clock power for multiple links
Distributed resonant clocking of analog/mixed-signal front-ends
-
Problem of I/O
2500 pins / 2 = 1250 (~1200) differential pins
Assume 10Gb/s per link: 1200 x 10Gb/s = 12Tb/s bandwidth
At 100mW/(Gb/s): 12Tb/s x 100mW/(Gb/s) = 1200W (120W at 10mW/(Gb/s))
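The pin-budget arithmetic above can be written as a small Python function so different per-link rates and power efficiencies are easy to compare. The 2400-pin input below is chosen only so the link count matches the slide's rounded 1200; the function name is illustrative.

```python
def io_budget(pins, gbps_per_link, mw_per_gbps):
    """Return (differential links, aggregate bandwidth in Gb/s, I/O power in W)."""
    links = pins // 2  # two pins per differential link
    bandwidth_gbps = links * gbps_per_link
    power_w = bandwidth_gbps * mw_per_gbps / 1000.0  # mW -> W
    return links, bandwidth_gbps, power_w


if __name__ == "__main__":
    for mw in (100, 10):
        print(f"{mw} mW/(Gb/s):", io_budget(2400, 10, mw))
```

The power scales linearly with the energy per bit, so a 10x improvement in mW/(Gb/s) is what separates a kilowatt I/O budget from about 120W.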
-
Stat-Eye Playing
Fun with Stat-Eye
5Gb/s -> 10Gb/s
Worse channels
Worse timing jitter
Homework examples
-
Next Time
Telegrapher's equation
Reflection coefficients
Channel models
Skin effect
Dielectric constant
Vias