Energy-Efficient VLSI Interconnects. Patrick Chiang, Oregon State University, Oregon State VLSI Research Group, http://eecs.oregonstate.edu/research/vlsi. Santa Fe, Oct. 13, 2010.


Page 1: Energy-Efficient VLSI Interconnects

Patrick Chiang, Oregon State VLSI Research Group

Santa Fe, Oct. 13, 2010

Page 2: Fighting words

•  What means more to you: performance or energy?
•  Dark silicon: more silicon than you have energy
   –  Why is wider, parallel, and slow a problem?
•  Multicore, lower CLK, deep parallelism
   –  Low VDD at the extreme

Page 3: ITRS Roadmap 2007

•  "at 0.13 um approximately 51% of microprocessor power was consumed by interconnect, with a projection that without changes in design philosophy, in the next five years up to 80% of microprocessor power will be consumed by interconnect"

Page 4: Interconnect Energy Wall

•  "The Energy and Power Challenge is the most pervasive of the four, and has its roots in the inability of the group to project any combination of currently mature technologies that will deliver sufficiently powerful systems in any class at the desired power levels."
•  A key observation of the study is that it may be easier to solve the power problem associated with base computation than it will be to reduce the problem of transporting data from one site to another - on the same chip, between closely coupled chips in a common package, or between different racks on opposite sides of a large machine room…

Source: DARPA Exascale Study, Sep. 2008

•  This is a "solved" problem if we are WILLING to deal with the implications

Page 5: Overview

•  Motivation: Why high-speed links?
•  Tutorial on channel loss
   –  Channel loss limits:
      •  Data rate
      •  Energy consumption
   –  Optical interconnect
   –  3D integration
•  Energy-efficiency
   –  Assume channel is benign
   –  1 mW/Gbps off-chip serial links
   –  Fundamental limits to low-power on-chip links

Page 6: Why serial links

•  Wired communications from one chip to another
   –  Memory -> CPU (DDR3)
   –  CPU -> CPU (Intel QuickPath; AMD HyperTransport)
   –  Router backplane (line-card to backplane)
•  Typically a major bottleneck for:
   –  System performance
   –  Power consumption
   –  Chip complexity
•  Why does off-chip bandwidth scale slowly?
   –  CPU transistor density increasing
   –  Memory capacity increasing
   –  Interconnect does not improve
   –  # of pads does not increase
•  Use optics??

Page 7: I/O Bandwidth is Limiting Factor

[Figure labels: on-chip BW; off-chip BW]

•  GOAL:
   –  High BW (Gbps/pad)
   –  Energy-efficient (mW/Gbps or pJ/bit)
   –  Low-area (100's to 1000's on a die)

Page 8: The Energy Wall -- Computation is free; Communication is expensive

•  Future processor: 256 cores
•  Total on-chip bandwidth
   –  If P_WIRE = 0.25 mW/Gbps/mm, 150 W
   –  Need >10x reduction
•  Total off-chip bandwidth
   –  Predicted off-die bandwidth: 5.12 Tbps
   –  512 serial links @ 10 Gbps (10 mW/Gbps) = 50 W
   –  Need >10x reduction (1 mW/Gbps)

Source: IEEE Micro, Dec. 2007
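The off-chip numbers on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function and variable names are assumptions, not taken from the talk. It also back-solves the bandwidth-distance product that the 150 W on-chip figure would imply at 0.25 mW/Gbps/mm, because the slide does not state that product directly.

```python
# Back-of-the-envelope check of the slide's power numbers.
# All names are illustrative; only the numeric inputs come from the slide.

def link_power_w(num_links: int, gbps_per_link: float, mw_per_gbps: float) -> float:
    """Total power in watts for a set of identical serial links."""
    return num_links * gbps_per_link * mw_per_gbps / 1000.0

# Off-chip: 512 links at 10 Gbps each.
off_chip_tbps = 512 * 10 / 1000.0               # aggregate bandwidth in Tbps
off_chip_w = link_power_w(512, 10.0, 10.0)      # at 10 mW/Gbps; the slide rounds to 50 W
off_chip_goal_w = link_power_w(512, 10.0, 1.0)  # at the 1 mW/Gbps target

# On-chip: the slide gives 0.25 mW/Gbps/mm and 150 W. The implied
# bandwidth-distance product (Gbps*mm) is derived here, not quoted.
implied_gbps_mm = 150.0 * 1000.0 / 0.25

print(f"off-chip bandwidth: {off_chip_tbps:.2f} Tbps")
print(f"off-chip power at 10 mW/Gbps: {off_chip_w:.1f} W")
print(f"off-chip power at 1 mW/Gbps: {off_chip_goal_w:.2f} W")
print(f"implied on-chip traffic: {implied_gbps_mm:.0f} Gbps*mm")
```

The exact product for the off-chip case is 51.2 W, which the slide quotes as 50 W; at the 1 mW/Gbps target the same links would draw about 5 W.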

Page 9: Energy Barrier

Source: Exascale Roadmap Meeting, Dec. 2009

Page 10: Overview

•  Motivation: Why high-speed links?
•  Tutorial on channel loss
   –  Channel loss limits:
      •  Data rate
      •  Energy consumption
   –  Optical interconnect
   –  3D integration
•  Energy-efficiency
   –  Assume channel is benign
   –  1 mW/Gbps off-chip serial links
   –  Fundamental limits to low-power on-chip links

Page 11: What makes it more challenging

•  Now, the bandwidth limit is in wires
•  Take-home note: tell vendor to throw kitchen sink at PCB

[Figure labels: high-speed link chip; >2 GHz signals]

Page 12: Backplane channel

•  Loss is variable
   –  Same backplane
   –  Different lengths
   –  Different stubs
      •  Top vs. Bot
•  Attenuation is large
   –  >30 dB @ 3 GHz
   –  But is that bad?
•  Required signal amplitude set by noise
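To put the 30 dB figure in terms of signal amplitude, the short sketch below converts attenuation in dB to a voltage ratio. The function name and the 1 V transmit swing are assumptions chosen for illustration; only the 30 dB value comes from the slide.

```python
# Convert channel attenuation in dB to a voltage (amplitude) ratio.

def db_to_amplitude_ratio(loss_db: float) -> float:
    """Fraction of the transmitted amplitude that survives loss_db of attenuation."""
    return 10.0 ** (-loss_db / 20.0)

ratio = db_to_amplitude_ratio(30.0)   # slide: >30 dB @ 3 GHz
tx_swing_mv = 1000.0                  # assumed 1 V transmit swing, not from the slide
rx_swing_mv = tx_swing_mv * ratio

print(f"amplitude ratio: {ratio:.4f}")
print(f"received swing: {rx_swing_mv:.1f} mV")
```

Under that assumption roughly 3% of the transmitted amplitude arrives at the receiver, which is why the slide ties the required signal amplitude to the noise floor rather than to the loss alone.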

Page 13: Channel Responses

Source: Palermo, Texas A&M

Page 14: 10 Gbps Equalization

•  NOTE: Tell vendor to throw kitchen sink at PCB

Source: Palermo, Texas A&M

Page 15: Optics solves the channel

•  Interconnect is now optical
   –  On-chip wires are no longer RC
   –  Off-chip backplane channels are no longer T-lines
•  Multimode loss = 2.5 dB (@ 850 nm)

Page 16: Optics doesn't solve the electronics

If transmit power = 0, 5 mW/Gbps

Palermo, JSSC 08

Page 17: 3D solves the channel

•  Memory -> CPU interface is closer
   –  Distances are shorter
•  Electronics at the interfaces still required

[Figure label: channel distance]

Page 18: Overview

•  Motivation: Why high-speed links?
•  Tutorial on channel loss
   –  Channel loss limits:
      •  Data rate
      •  Energy consumption
   –  Optical interconnect
   –  3D integration
•  Energy-efficiency
   –  Assume channel is benign
   –  <1 mW/Gbps off-chip serial links
   –  Fundamental limits to low-swing on-chip links

Page 19: Problem 1: Off-chip I/O power is not scaling

Source: Exascale Roadmap Meeting, Dec. 2009

OFF-CHIP: 1-10 pJ/bit (1-10 mW/Gbps)

CONVENTIONAL

OUR GOAL: <0.1 pJ/bit
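The slide treats pJ/bit and mW/Gbps as interchangeable. The minimal sketch below shows why the two units are numerically equal and how far the stated goal sits from the conventional range; the function name is an assumption made for illustration.

```python
# mW/Gbps and pJ/bit are the same quantity: (1e-3 W) / (1e9 bit/s) = 1e-12 J/bit.

def mw_per_gbps_to_pj_per_bit(mw_per_gbps: float) -> float:
    """Convert link efficiency from mW/Gbps to pJ/bit."""
    watts = mw_per_gbps * 1e-3
    bits_per_second = 1e9
    joules_per_bit = watts / bits_per_second
    return joules_per_bit * 1e12

goal_pj = 0.1  # slide: OUR GOAL < 0.1 pJ/bit
for conventional_mw_per_gbps in (1.0, 10.0):
    pj = mw_per_gbps_to_pj_per_bit(conventional_mw_per_gbps)
    print(f"{conventional_mw_per_gbps} mW/Gbps = {pj:.1f} pJ/bit, "
          f"{pj / goal_pj:.0f}x above the goal")
```

Read this way, the goal is a 10x to 100x reduction relative to the conventional 1-10 pJ/bit range.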

Page 20: Parallelism: Reducing Clock Power Through Global Clock Distribution Optimization (with Intel)

•  Goal: Reduce power to <1 mW/Gbps
•  Assume channel loss is moderate

[Figure annotations: Clk/CDR is 54% of total power (10 Gbps); Clk/CDR is 71% of total power (15 Gbps); Intel (VLSI, 2007); Rambus, 2.4 mW/Gbps]

•  Clock dominates power in serial links
•  Is there a way to share clock power?
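One way to see why sharing clock power is attractive is a simple amortization model. The sketch below is not the receiver architecture from the talk; it assumes ideal sharing, where one clock/CDR path serves several lanes at no extra distribution or deskew cost. The function name and lane counts are hypothetical; the 54% and 71% clock fractions are the figures quoted on the slide.

```python
# Idealized amortization of clock power across lanes (an assumption,
# not the talk's measured result).

def per_lane_power(clock_fraction: float, lanes_sharing: int) -> float:
    """Normalized per-lane power when one clock path is shared by several lanes.

    A lane with its own private clock is defined as 1.0. The shared clock is
    assumed to cost the same as one private clock.
    """
    if lanes_sharing < 1:
        raise ValueError("lanes_sharing must be at least 1")
    datapath = 1.0 - clock_fraction
    return datapath + clock_fraction / lanes_sharing

for rate_gbps, clock_fraction in [(10, 0.54), (15, 0.71)]:
    for lanes in (1, 4, 8):
        p = per_lane_power(clock_fraction, lanes)
        print(f"{rate_gbps} Gbps, {lanes} lanes sharing: {p:.3f} of baseline")
```

Even under this optimistic model the savings saturate at the datapath share of the power, so a real design would still have to account for the cost of distributing and deskewing the shared clock.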

Page 21: Proposed Receiver Architecture

Page 22: ILRO: Extension of Adler's Equation

Page 23: Further: Near-Threshold, 0.13 mW/Gbps, 8 Gbps Serial Link Receiver

•  Two innovations:
   –  Exploits constrained deskew (Jaussi, Casper @ Intel)
   –  "Kitchen Sink"
   –  Operates in near-threshold (Vdd < 0.6 V)

Page 24: Measured Deskew Range and BER

•  Deskew range: 0.37 UI (max)
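A deskew range in unit intervals (UI) can be turned into time once a data rate is fixed. The sketch below assumes the 8 Gbps rate from the receiver's title slide applies to this measurement, which this slide does not state; the function name is illustrative.

```python
# Convert a deskew range from unit intervals (UI) to picoseconds.

def ui_to_ps(ui_fraction: float, data_rate_gbps: float) -> float:
    """One UI is one bit period: 1000 / data_rate_gbps picoseconds."""
    ui_ps = 1000.0 / data_rate_gbps
    return ui_fraction * ui_ps

# Assumed operating point: 8 Gbps (taken from the receiver's title, not this slide).
deskew_ps = ui_to_ps(0.37, 8.0)
print(f"0.37 UI at 8 Gbps is {deskew_ps:.2f} ps")
```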

Page 25: Comparison Table

Page 26: Problem 2: On-Chip Wires Consume Energy

•  On-chip wire power does not scale
   –  Dominated by interconnect capacitance (C·VDD²)

ON-CHIP (status quo): 100-300 fJ/bit/mm

NOTE: Sub/near-threshold doesn't help this problem!

OUR GOAL: <5 fJ/bit/mm

[DOE, Exascale Workshop]

Page 27: Energy-Efficient On-Chip Links

4-Core Network-on-a-Chip (with Li-Shiuan Peh, MIT), ICCD, 2010

[Figure: 8x8 grid of tiles marked R and G, rows and columns numbered 0-7; labels: XBAR, LINKS]

Page 28: Low-Swing Bitcell Based Crossbar

•  Two supplies
   –  Low-voltage swing = 200 mV

Page 29: Low Swing Link Over Logic

•  1 mm long, 64-bit bus
   –  Low swing diff. signaling (0.2 V - 0.4 V)
•  Routed over noisy digital logic
•  Inter-pair differential shielding
   –  Reduces differential mode noise
   –  Less capacitance than full plane shielding

[Figure: Post-layout simulation of 1 mm parallel coupled digital aggressor. Y-axis: diff. mode crosstalk (mV), 0 to 140. X-axis: aggressor distance from center of diff. pair (um), 0 to 1.25. Series: shielded, unshielded.]

Page 30: Chip Simulations

Page 31: Performance Summary

Page 32: Fundamental Limits to On-Chip Links

Page 33: Bit Error Rate

65nm-CMOS Prototype

Page 34: Measured Energy/b/mm

Page 35: On-Chip Link Comparison

                    Conventional   Schinkel       Stojanovic     1st Chip    2nd Chip
                    Full Swing     (JSSCC '09)    (ISSCC '09)
Wire length         1 mm           2 mm           10 mm          1 mm        1 mm-5 mm
Supply              1.2 V          1.2 V          -              1.2 V       0.2-1.0 V
Transceiver area    21 um^2        TX: 20 um      2880 um^2      23 um^2     20-30 um^2
Signal swing        1.2 V          120 mV         200 mV         250 mV      34 mV
Energy/bit/mm       305 fJ         105 fJ         356 fJ         28-60 fJ    8 fJ/b/mm

•  Approaching the fundamental limits to energy-efficient, on-chip links
•  Lowest energy/b/mm of on-chip link
   –  Energy scalability
   –  Low area
   –  Robust
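The energy row of the comparison can be reduced to improvement factors. The sketch below takes the conventional full-swing, 1st chip, and 2nd chip entries at face value as listed on the slide, and uses the <5 fJ/bit/mm goal stated on the "Problem 2" slide; the dictionary layout and function name are illustrative assumptions.

```python
# Improvement factors computed from the slide's Energy/bit/mm row.
# Each entry is a (low, high) range in fJ/bit/mm, taken as listed.

BASELINE = "Conventional full swing"
energy_fj = {
    BASELINE: (305.0, 305.0),
    "1st chip": (28.0, 60.0),
    "2nd chip": (8.0, 8.0),
}
GOAL_FJ = 5.0  # goal from the "Problem 2" slide: < 5 fJ/bit/mm

def improvement(design: str) -> tuple[float, float]:
    """Return (worst-case, best-case) energy reduction versus the baseline."""
    low, high = energy_fj[design]
    base = energy_fj[BASELINE][0]
    return base / high, base / low

for design in ("1st chip", "2nd chip"):
    worst, best = improvement(design)
    print(f"{design}: {worst:.1f}x to {best:.1f}x lower energy than full swing")

gap_to_goal = energy_fj["2nd chip"][0] / GOAL_FJ
print(f"2nd chip is {gap_to_goal:.1f}x above the 5 fJ/bit/mm goal")
```

On these numbers the 2nd chip is about 38x better than conventional full-swing signaling and within a factor of two of the stated goal.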

Page 36: Co-Design

•  Take home: tell vendor to throw kitchen sink at PCB
•  Energy can be saved by:
   –  OFF: Turning things off when not being used
   –  ON: Going parallel (low Vdd)
•  Software needs to predict:
   –  OFF: Things are not being utilized, turn things OFF
      •  Coarse, fine grain clock/power gating
   –  ON: Things are utilized, go wide parallel
      •  Lower Vdd dynamically, coarse/fine grain DVFS
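The "go parallel at low Vdd" argument can be sketched with a first-order model. Everything below is an assumption made for illustration: dynamic energy per operation is taken to scale with Vdd², clock frequency follows an alpha-power-law approximation with a hypothetical threshold voltage of 0.3 V and exponent 1.3, and leakage (which matters near threshold) is ignored. None of these parameters come from the talk.

```python
import math

# First-order model of trading voltage for parallelism (all parameters hypothetical).

def rel_frequency(vdd: float, vt: float = 0.3, alpha: float = 1.3) -> float:
    """Relative clock frequency under an alpha-power-law approximation."""
    if vdd <= vt:
        raise ValueError("model only applies above the threshold voltage")
    return (vdd - vt) ** alpha / vdd

def parallel_tradeoff(vdd_nom: float, vdd_low: float,
                      vt: float = 0.3, alpha: float = 1.3):
    """Return (energy ratio per op, slowdown, lanes needed for equal throughput)."""
    energy_ratio = (vdd_low / vdd_nom) ** 2          # dynamic energy ~ C * Vdd^2
    slowdown = rel_frequency(vdd_nom, vt, alpha) / rel_frequency(vdd_low, vt, alpha)
    lanes = math.ceil(slowdown)                      # parallel lanes to recover throughput
    return energy_ratio, slowdown, lanes

energy_ratio, slowdown, lanes = parallel_tradeoff(vdd_nom=1.0, vdd_low=0.5)
print(f"energy per op: {energy_ratio:.2f} of nominal")
print(f"each lane runs {slowdown:.2f}x slower, so about {lanes} lanes are needed")
```

In this model halving Vdd cuts dynamic energy per operation to a quarter, at the cost of a few extra parallel lanes to hold throughput. That is the trade the slide asks software to anticipate when it chooses between gating idle units and spreading active work wide.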

Page 37: Multicore: near-threshold parallel processor, Synctium in 45nm-CMOS

-  Ten parallel lanes, running in near-threshold operation (Vdd = 0.4 V-1.0 V)
-  Incorporates Razor-like detection/recovery in every lane (variation tolerance)
-  Throughput of SIMD; energy-efficiency of near-threshold operation
-  GOAL: 1 GOPS / 1 mW power (eight parallel 16b multiply/adds)

E. Krimer, R. Pawlowski, M. Erez, P. Chiang, "Synctium: a Near-Threshold Stream Processor for Energy-Constrained Parallel Applications", IEEE Computer Architecture Letters, 2010.

Page 38: Heterogeneous Interconnect

Micro, 09.