ECE 679 Lecture 1

Upload: wearole

Post on 05-Oct-2015


  • ECE 679: Digital Systems Engineering
    Patrick Chiang
    Office Hours: 1-2PM Mon-Thurs
    GLSN 100

  • Class Introductions

    Who am I

    Who are you

  • Class Basics
    4 Homeworks (20%) (groups of 2)
    Midterm (40%)
    Final Project (40%)
    4-page IEEE report
    10 minute presentation (groups of 2)
    Guest lecture (Dr. Frank O'Mahony), Intel Research Labs (May 4th)
    Intel Field Trip (June 7th) TBD
    Presentations of 1-2 best project reports

  • Class Homework
    Skim Dally/Poulton, Digital Systems Engineering, Chapter 3
    Skim Overview Paper: http://mos.stanford.edu/papers/mh_micro_98.pdf
    Includes running Stat Eye
    Oregon State Matlab (eecs.oregonstate.edu/it)
    www.stateye.org
    Problem Set #1
    rlc files -- ~pchiang/hspice (rlc_spice_deck; rlc.rlc)
    Spice models -- ~pchiang/hspice/process_files/130nm to 22nm
    Simulator lang = spice
    Spectre models

    DEFINE gpdk090 /nfs/guille/analog/c/cdsmgr/process/gpdk090_v3.8/libs.cdb/gpdk090

  • What does this mean for analog designers?
    Ever build an ADC?
    Ever wonder what to do with the digital bits?

    Analog: Fs = 600MHz
    8-16 bits @ 100MHz, 200MHz, 400MHz
    Goes to Vector analyzer
    Why does this clock rate not increase?

    What really is this output doing? Where is it going?

  • Brief Summary
    Introduction to the area
    Why serial links are important
    What are the current technology trends/limitations

  • 4Gb/s Low Power, Area Efficient Serial Links

    Interconnection between different chips

    Transmitter Equalization

    Receiver Offset Cancellation

    Figures: 2000 0.25um Testchip; 2001 0.25um Testchip; 4Gb/s Transmitter Output; 4Gb/s Transmitter Output, 1m; 4Gb/s Transmitter Output, Equalized

    Ming-Ju E. Lee, William J. Dally, John W. Poulton, Patrick Chiang, Stephen F. Greenwood. An 84-mW 4Gb/s Clock and Data Recovery Circuit for Serial Link Applications. VLSI Circuits Symposium, Kyoto, Japan, June 2001, pp. 149-152.

    Ming-Ju E. Lee, William Dally, Patrick Chiang. Low-Power Area-Efficient High-Speed I/O Circuit Techniques. IEEE Journal of Solid-State Circuits, November 2000, Vol. 35, No. 11, pp. 1591-1599.

    Organization of the channel, arrows from channel, plots; change image layout

    Recall what you want to say on the slides.

  • Scaling Serial Links: From 4Gb/s -> 20Gb/s
    Thesis: Develop 20Gb/s Serial Link
    Area: 500um x 500um
    Power: 200mW/link

    1 bit time = 1 FO4

    Timing uncertainty becomes KEY issue
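    The "1 bit time = 1 FO4" point can be checked with a quick calculation. The sketch below assumes the common rule of thumb that a fanout-of-4 inverter delay is roughly 360 ps per micron of gate length (estimates up to about 500 ps per micron are also quoted); that constant and the function names are illustrative assumptions, not values from the lecture.

```python
def bit_time_ps(data_rate_gbps):
    """Duration of one bit (one unit interval) in picoseconds."""
    return 1000.0 / data_rate_gbps


def fo4_delay_ps(gate_length_um, ps_per_um=360.0):
    """Rule-of-thumb FO4 inverter delay; ps_per_um is an assumed constant."""
    return ps_per_um * gate_length_um


# (data rate in Gb/s, process gate length in um) for the two link generations
for rate, length in ((4.0, 0.25), (20.0, 0.13)):
    bit = bit_time_ps(rate)
    fo4 = fo4_delay_ps(length)
    print(f"{rate:g} Gb/s in {length}um: bit time {bit:.1f} ps, "
          f"FO4 ~{fo4:.1f} ps, ratio {bit / fo4:.2f} FO4")
```

    Under this assumption a 20Gb/s bit in 0.13um lasts about one FO4 delay, while a 4Gb/s bit in 0.25um spans a few, which is why timing uncertainty dominates at the higher rate.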

    Focus on timing uncertainty, not channel; independent vector

  • Transmitter Block Diagram

    Dotted lines around different circuit components, PLL, muxing, etc. Clocks are differential clocks.

    Get rid of everything else, use red. Or change images; lose people on the insight, carry through. Simpler is better.

  • Test Chip
    Blocks: Clock Recovery, RX, PRBS Check, PRBS Gen, TX, DLL, Test Interface
    UMC 1.2V, 0.13um CMOS (single Vt)
    Die size 700um x 1.15mm
    50 Ohm Pad Termination using Wafer Probes

    Die photo labels: 700um, 1.1mm, 10GHz PLL, Transmitter Muxing, Phase Interpolators, Test Structures

    Our test chip was fabricated in National Semiconductor's quarter-micron CMOS technology. The die is 2.6 mm by 1.4 mm and uses a 52-pin impedance-controlled package donated by Vitesse Corporation. The active area of the transceiver circuits is 0.31 mm^2.

  • PLL Measurements
    Jitter limited by 1.25GHz input reference clock
    HP 8133A input clock (1.2ps RMS, 8.9ps pk-pk)

    Figure panels (a), (b), (c); labels: Power Spectrum, Q=10 Jitter, Q=5 Jitter

    Open Loop VCO Phase Noise @ 1MHz: -97dBc/Hz
    10GHz Jitter (RMS): 0.97ps
    10GHz Jitter (pk-pk): 8.0ps
    PLL Power: 38.6mW
    VCO Power: 6mW
    Tuning Range: 1.14-1.31

    Change the cadence of talking; these are the important points. Too much stuff in slides, too heavy; line width is 2-3 points.

  • Eye Diagram
    Data Rate = 19.2Gb/s
    Voltage ripple caused by lack of current source at differential pair tail node

    Jitter: 2.2ps RMS, 15.6ps pk-pk
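    To put the measured eye-diagram jitter in context, it helps to express it as a fraction of the bit time (unit interval, UI). The sketch below does that conversion for the 19.2Gb/s numbers on this slide; the function names are illustrative.

```python
def unit_interval_ps(data_rate_gbps):
    """Length of one bit period in picoseconds."""
    return 1000.0 / data_rate_gbps


def jitter_in_ui(jitter_ps, data_rate_gbps):
    """Jitter expressed as a fraction of one unit interval."""
    return jitter_ps / unit_interval_ps(data_rate_gbps)


rate = 19.2  # Gb/s, from the eye diagram slide
print(f"UI = {unit_interval_ps(rate):.2f} ps")
print(f"RMS jitter   = {jitter_in_ui(2.2, rate):.3f} UI")
print(f"pk-pk jitter = {jitter_in_ui(15.6, rate):.3f} UI")
```

    At this data rate the peak-to-peak jitter consumes roughly 30% of the bit time.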

    Don't spend too much time on 19.2

    Seen here are the phase step values across the entire range. The average phase resolution should be 15.6ps, so the interpolation steps shown are very accurate. Note that every 9th phase has a phase interpolation value lower than the average of 15.6ps, which is expected, since these are the redundant steps. You can also see that not every 9th phase value is consistently small. For example, phases 18 and 36 don't show as small a phase step as phases 9 and 27. This discrepancy is due to a layout error: asymmetric clock loading causes different capacitive coupling for different transitions.

    (Different phase differences due to different delay amounts in the DLL itself)

  • High Speed Transmitter Comparisons

    A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm CMOS using a Tri-State Binary PD with 100ps Gated Digital Output. T. Masuda, et al., ISSCC 2007.

    A full-rate 10Gb/s transceiver core employing a tri-state binary PD with 100ps gated digital output is implemented in a 90nm CMOS process. Direct drive from the VCO is utilized to eliminate the 10GHz clock buffer current. The RX exhibits a recovered jitter of 906fs (rms) and an input sensitivity of 5.9mV. The TX generates a jitter of 5mUI (rms). The chip consumes 250mW.

    Chart 1: FOM for each transmitter in the comparison

    Serial links, 10Gb/s and beyond
    FOM = (Gb/s * technology) / (power * area * jitter) (given 40 Gb/s bandwidth)

    FOM = (Gb/s * technology) / (power * area * jitter) * (Gb/s / 40)

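    Transmitter comparison. Each entry gives the figure as reported, followed by the numeric value entered in the spreadsheet for the FOM (technology in um, power in W, area in mm^2, jitter in ps pk-pk).

    Chiang, VLSI 2004*
      Author: Patrick Chiang
      Title: A 20Gb/s 0.13um CMOS Serial Link Transmitter Using an LC-PLL to Directly Drive the Output Multiplexer
      Journal: VLSI 2004
      Technology: 0.13 (FOM value 0.13)
      Data Rate (Gb/s): 19.2
      Power: 165mW (trans) (FOM value 0.165)
      Area: 0.65mm x 0.35mm (trans) (FOM value 0.2275)
      Jitter (rms, pk-pk): 2.37, 15ps (FOM value 15)
      FOM: 2.1277922078

    Krishna, ISSCC 2005
      Author: Kannan Krishna, Weinlader
      Title: A 0.6-9.6Gb/s Binary Backplane Transceiver Core in 0.13um CMOS
      Journal: ISSCC 2005, p. 66
      Technology: 0.13 (FOM value 0.13)
      Data Rate (Gb/s): 9.6
      Power: 275mW (FOM value 0.14)
      Area: 0.58mm x 0.97mm; 0.2716+690*200 (FOM value 0.41)
      Jitter (pk-pk): FOM value 12
      FOM: 0.4348432056

    Kim, ISSCC 2005
      Author: Jaeha Kim
      Title: Circuit Techniques for a 40Gb/s Transmitter in 0.13um CMOS
      Journal: ISSCC 2005, p. 150
      Technology: 0.13 (FOM value 0.13)
      Data Rate (Gb/s): 40
      Power: 2.7W (trans) (FOM value 2.7)
      Area: 2.5mm x 3.6mm (trans) (FOM value 9.18)
      Jitter (rms, pk-pk): 1.53, 8.11 (FOM value 8.11)
      FOM: 0.0258687858

    Sidiropoulos, ISSCC 2004
      Author: Stefanos Sidiropoulos
      Title: An 800mW 10Gb/s Ethernet Transceiver in 0.13um CMOS
      Journal: ISSCC 2004, p. 168
      Technology: 0.13 (FOM value 0.13)
      Data Rate (Gb/s): 10
      Power: 0.8W (FOM value 0.4)
      Area: 2.5mm x 5mm (FOM value 1.25)
      Jitter (rms, pk-pk): 6.2ps pk-pk (FOM value 6.2)
      FOM: 0.1048387097

    Singh, VLSI 2005
      Author: Ullass Singh, Michael Green
      Title: 34 Gb/s, 0.18um CMOS transmitter
      Journal: VLSI 2005, p. 132
      Technology: 0.18 (FOM value 0.18)
      Data Rate (Gb/s): 40
      Power: 1.335W (trans) (FOM value 1.4)
      Area: 1.6mm x 2.6mm (trans) (FOM value 4.16)
      Jitter (rms, pk-pk): 1.44/9.44 (FOM value 9.44)
      FOM: 0.1309601416

    Lee, ISSCC 2004
      Author: Hyung-Rok Lee
      Title: A Fully Integrated 0.13um CMOS 10Gb Ethernet Transceiver with XAUI Interface
      Journal: ISSCC 2004, p. 170
      Technology: 0.13 (FOM value 0.13)
      Data Rate (Gb/s): 10
      Power: 900mW (FOM value 0.45)
      Area: 5mm x 5mm (FOM value 1)
      Jitter (rms, pk-pk): 2.2, 15 (FOM value 15)
      FOM: 0.0481481481

    Werker, ISSCC 2004
      Author: H. Werker
      Title: A 10Gb/s SONET-Compliant CMOS Transceiver with Low Cross-Talk and Intrinsic Jitter
      Journal: ISSCC 2004, p. 172
      Technology: 0.13 (FOM value 0.13)
      Data Rate (Gb/s): 10
      Power: 980mW (FOM value 0.49)
      Area: 3mm x 5mm; 3mm x 5mm * 0.25 (FOM value 4)
      Jitter (rms, pk-pk): 0.74, 5ps p-p (FOM value 5)
      FOM: 0.0331632653

    Takauchi, ISSCC 2003
      Author: Hideki Takauchi
      Title: A CMOS Multi-Channel 10Gb/s Transceiver
      Journal: ISSCC 2003, Sec. 4.2
      Technology: 0.11 (FOM value 0.11)
      Data Rate (Gb/s): 10
      Power: 176mW (rec) + 102mW (PLL) + 188mW (tx) (FOM value 0.26)
      Area: 5mm x 10mm; 5mm x 10mm * 1/24 (FOM value 2.08)
      Jitter (rms, pk-pk): 25ps (FOM value 25)
      FOM: 0.0203402367

    Meghelli, ISSCC 2003
      Author: Mounir Meghelli
      Title: A 0.18um SiGe BiCMOS Receiver and Transmitter Chipset for SONET OC-768 Transmission Systems
      Journal: ISSCC 2003, 13.1
      Technology: 0.18 SiGe (FOM value 0.09)
      Data Rate (Gb/s): 40
      Power: 5.1W (FOM value 2.5)
      Area: 2.17x2.17 + 1.7x1.7 (FOM value 2.9)
      Jitter (rms, pk-pk): 18ps (eye opening) (FOM value 7)
      FOM: 0.0709359606

    Koyama, ISSCC 2003
      Author: Akio Koyama
      Title: 43 Gb/s full-rate-clock 16:1 multiplexer and 1:16 demultiplexer with SFI-5 interface in SiGe BiCMOS technology
      Journal: ISSCC 2003, 13.2
      Technology: 0.18um SiGe (Ft=140GHz) (FOM value 0.09)
      Data Rate (Gb/s): 40
      Power: 22W (FOM value 11)
      Area: 6.5x6.5, 4.35x4.35, 4.35x4.35 (FOM value 4)
      Jitter (rms, pk-pk): 1.07, 8ps (FOM value 8)
      FOM: 0.0102272727

    Shaeffer, ISSCC 2003
      Author: Derek Shaeffer
      Title: A 40/43 Gb/s SONET OC-768 SiGe 4:1 MUX/CMU
      Journal: ISSCC 2003, p. 236
      Technology: SiGe (ft=120GHz) (FOM value 0.09)
      Data Rate (Gb/s): 40
      Power: 4.9W (FOM value 2.9)
      Area: 8.25mm^2 (FOM value 36)
      Jitter (rms, pk-pk): 880fs, 5.1ps (FOM value 5.1)
      FOM: 0.0067613252

    Green, ISSCC 2002
      Author: Michael Green
      Title: OC-192 Transmitter in Standard 0.18um CMOS
      Journal: ISSCC 2002, p. 248
      Technology: 0.18um (FOM value 0.18)
      Data Rate (Gb/s): 10
      Power: 450mW (trans) (FOM value 0.45)
      Area: 2.5mm x 2.5mm (FOM value 5)
      Jitter (rms, pk-pk): 15ps (FOM value 15)
      FOM: 0.0133333333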
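    The figure of merit used in this comparison can be reproduced directly from the spreadsheet values. The sketch below is one reading of the formula: power, area, and jitter are all taken to be in the denominator, and the result is scaled by data rate over 40 Gb/s. That grouping is inferred because it reproduces the FOM values listed for the designs; the function and variable names are illustrative.

```python
def figure_of_merit(rate_gbps, technology_um, power_w, area_mm2,
                    jitter_pkpk_ps, reference_rate_gbps=40.0):
    """FOM = (rate * technology) / (power * area * jitter) * (rate / 40)."""
    base = (rate_gbps * technology_um) / (power_w * area_mm2 * jitter_pkpk_ps)
    return base * (rate_gbps / reference_rate_gbps)


# (data rate Gb/s, technology um, power W, area mm^2, pk-pk jitter ps),
# copied from three rows of the comparison
designs = {
    "Chiang, VLSI 2004": (19.2, 0.13, 0.165, 0.2275, 15.0),
    "Kim, ISSCC 2005": (40.0, 0.13, 2.7, 9.18, 8.11),
    "Green, ISSCC 2002": (10.0, 0.18, 0.45, 5.0, 15.0),
}

for name, row in designs.items():
    print(f"{name}: FOM = {figure_of_merit(*row):.4f}")
```

    Larger is better: the metric rewards data rate (twice, through the 40 Gb/s scaling) and older process nodes, and penalizes power, area, and jitter.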


                           P. Chiang       J. Kim          U. Singh        D. Shaeffer
    Venue                  VLSI 2004       ISSCC 2005      VLSI 2005       ISSCC 2003
    Data Rate (Gb/s)       20Gb/s          40Gb/s          34Gb/s          40Gb/s
    Power                  165mW           2.7W            1.335W          4.9W
    Area                   0.2275mm^2      9.18mm^2        4.16mm^2        8.25mm^2
    Jitter (RMS, pk-pk)    2.37ps, 15ps    1.53ps, 8.11ps  1.44ps, 9.44ps  880fs, 5.1ps
    Technology             0.13um CMOS     0.13um CMOS     0.18um CMOS     0.09um SiGe


  • Conventional Serial Link Receivers
    Conventional architectures also use multi-phase PLL

    Well, guess what: we have the same problem at the receiver

  • 2nd Generation Transmitter
    2-Tap Equalizer implemented for compensating for channel losses
    Achieve 50ps analog delay with CML buffers

    Block diagram labels: 10GHz Oscillator; Varactor Control; Charge Pump; Phase Comparator; Off Chip @ 1.25GHz; Divider (2.5GHz, 1.25GHz); 10GHz->5GHz Divider; 5GHz->2.5GHz Divider; Low-High Buffers; 8 phases @ 5GHz; 4 phases @ 5GHz; Main Path; Equalizing Path; 50ps Delay; 2:1 MUX stages (5Gb/s, 10Gb/s, 20Gb/s); 10GHz CLK; 10GHz CLKB; Data Retiming; PRBS/BER Checker

    Analog delay, but replica bias
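    At 20Gb/s one bit lasts 50ps, so the 50ps analog delay corresponds to a one-bit-delayed second tap. The sketch below shows the general idea of a 2-tap transmit equalizer as a discrete-time filter. It is a simplified model, not the circuit on the slide: the subtractive tap (de-emphasis) and the function name are assumptions, and the weight 0.37 is borrowed from the value quoted on the results slide on the assumption that it refers to the equalizer tap.

```python
def two_tap_equalize(symbols, tap_weight):
    """Apply y[n] = x[n] - tap_weight * x[n-1], taking x[-1] as 0."""
    equalized = []
    previous = 0.0
    for current in symbols:
        equalized.append(current - tap_weight * previous)
        previous = current
    return equalized


bits = [1, 1, 1, 0, 1, 0, 0]
symbols = [1.0 if bit else -1.0 for bit in bits]  # NRZ levels +1 / -1

for bit, level in zip(bits, two_tap_equalize(symbols, 0.37)):
    print(bit, f"{level:+.2f}")
```

    Bits that repeat the previous value are sent at reduced amplitude, while transitions are boosted, which pre-compensates for the high-frequency loss of the channel.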

  • Fabrication: Test Chip
    ST Microelectronics 0.13um test chip
    307mW / transceiver
    0.46mm^2
    20mV input sensitivity

    Die photo (2006 0.13um Test Chip): Transmitter, Receiver; dimension labels 500um, 600um, 350um, 450um

    Schematic labels: 10GHz VCO (from charge pump); Normal-Sized Inverters; 10GHz->5GHz Divider; Low-High Converter [L-H] (Low Swing to Digital Swing; In0, In1, In1[a], In1[b], Out0, Out1, Out1[a], Out1[b]; WP / WN = 4 / 1); 5GHz; 2.5GHz; 2.5GHz->1.25GHz, To Phase Comparator; CML Divider Stage (CLK, CLKb, VCUR, D0, D0b, R=60, MCLK, MCLKB, M0, MINA0, MINB0, MINA1, MINB1)

    Device sizes: MINA1, MINA2 = 20.48u/0.26u; all other M = 40.96u/0.13u

    First 0.13um

  • Results

    20Gb/s, Ideal Channel

    20Gb/s, -6.5dB @ 10GHz

    Eye measurements: 43ps, 80mV, 33mV, 37ps
    All Results Single-Ended

  • Results (cont'd)

    20Gb/s, Ideal Channel, with =0.37

    20Gb/s, -6.5dB @ 10GHz, with =0.37

    Eye measurements: 62mV, 35ps, 36.4ps, 72mV

  • Rationale for Multi-cores
    Next generation computing: Multi-core Processing
    i.e. multiple, parallel DSPs (i.e. MACs)

    Why can't we achieve faster frequencies?
    Wire delays don't scale like transistors
    Power increases exponentially (when pushing process technology)
    Timing margins degraded by:
      Variability
      Power supply noise
      Digital crosstalk

    NOTE: More independent threads require more memory bandwidth

    Intel, 80 Cores, ISSCC 2007

  • Research: Explore Parallel Serial Links
    Serial links also exhibit the same characteristics
    Channel losses get worse
    Power consumption increases significantly with bandwidth
    Timing precision limited by:
      Static Phase Offset (process variation)
      Power-supply Induced Jitter
      Interchannel Crosstalk

    Serial links also need to push for high amounts of parallelism
    How is this different from conventional link design?
    Channel equalization becomes more difficult
    Adjacent channel crosstalk
    Difficult channel estimation problem (power, flexibility, data-rate, equalizer design, channel, distance)
    Amortize Clock Power for Multiple Links
    Distributed resonant clocking of analog/mixed-signal front-ends

  • Problem of IO
    2500 pins / 2 = ~1200 differential pins
    Assume 10Gb/s per link = 12 Tb/s bandwidth
    100mW per 10Gb/s link (10mW per Gb/s of bandwidth) = 120W
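    The I/O power estimate on this slide is simple arithmetic and can be re-run for other assumptions. In the sketch below, the default of 10 mW per Gb/s (equivalent to 100mW per 10Gb/s link) is an assumption chosen because it reproduces the roughly 120W total; the function name and parameters are illustrative.

```python
def io_budget(total_pins, rate_per_link_gbps, mw_per_gbps=10.0):
    """Return (differential links, aggregate Gb/s, total I/O power in W)."""
    links = total_pins // 2  # two pins per differential link
    bandwidth_gbps = links * rate_per_link_gbps
    power_w = bandwidth_gbps * mw_per_gbps / 1000.0
    return links, bandwidth_gbps, power_w


links, bandwidth, power = io_budget(2500, 10.0)
print(f"{links} links, {bandwidth / 1000:.1f} Tb/s, {power:.0f} W")
```

    The exact figures come out slightly above the slide's rounded numbers (1250 links rather than 1200), but the conclusion is the same: at this energy per bit, I/O alone would consume more than a hundred watts.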

  • Stateye Playing
    Fun with Stat-Eye
    5Gb/s -> 10Gb/s
    Worse channels
    Worse timing jitter
    Homework examples

  • Next Time
    Telegrapher's Equation
    Reflection coefficients
    Channel Models
    Skin Effect
    Dielectric constant
    Vias
