1

OFC 2011, Dec 10, 2010

© 2010 IBM Corporation

Optical Interconnects for High Performance Computing

Marc A. Taubenblatt
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598

M. Taubenblatt, OFC 2011, Dec 10, 2010. Slide 2. © 2010 IBM Corporation

Acknowledgements

IBM Colleagues:

– C. Schow, J. Kash, P. Pepeljugoski, D. Kuchta, L. Schares, C. Baks, M. Ritter, L. Shan, K. Gu, D. Kam, Y. Kwark, R. Budd, F. Libsch, C. Tsang, J. Knickerbocker, P. Coteus, A. Gara, Y. Vlasov, S. Assefa, W. Green, B. Offrein, R. Dangel, F. Horst, Y. Taira, Y. Katayama, B. Lee, J. Van Campenhout*, A. V. Rylyakov, M. Yang, J. Rosenberg, S. Nakagawa, A. Benner, D. Stigliani, C. DeCusatis, H. Bagheri, K. Akasofu and many others at IBM.

Government Support:

– IBM’s Terabus, TELL, PERCS and Silicon Photonics programs are partially supported by DARPA.

– Osmosis: This research is supported in part by the University of California under subcontract number B527064.

*now with Interuniversity Micro-Electronics Center (IMEC)


OSA/OFC/NFOEC 2011 OThH3.pdf

©Optical Society of America

Page 2: Optical Interconnects for High Performance Computing · Optical Interconnects for High Performance Computing Marc A. Taubenblatt IBM T.J. Watson Research Center Yorktown Heights,

2

Slide 3

Outline

Why we need Optical Interconnects for High Performance Computing

Optical Link Basics

Requirements and Trade-offs

– Cost, Power, Density

Technologies of Interest

– VCSELs/fiber

– VCSELs/Optical PCB

– Si Photonics

Architectures: Optical Switching

Conclusions

Slide 4

Evolution of Optical Interconnects

Time of commercial deployment (copper displacement):

  Market         Segment                        Distance    Deployed
  Telecom        WAN, MAN (metro, long-haul)    Multi-km    1980's
  Datacom        LAN (campus, enterprise)       100's m     1990's
  Computer-com   System (intra/inter-rack)      10's m      2000's (cables, card edge)
  Computer-com   Board (module-module)          < 1 m       > 2012 (card edge/on card)
  Computer-com   Module (chip-chip)             < 10 cm     > 2012 (module/Si carrier integration)
  Computer-com   Chip (on-chip buses)           < 20 mm     > 2012 (on chip)

Increasing integration of optics, with decreasing cost, decreasing power, increasing density.

BW * Distance: Optics >> Copper


Slide 5

Why High Performance Computing?

- Materials Science
- Geophysical Data Processing
- Environment and Climate Modeling
- Life Sciences / Drug Discovery
- Fluid Dynamics / Energy
- Industrial Modeling
- Financial Modeling
- Transportation

Larger scale, more complex, higher resolution, multiscale physics; shorter time to solution, real-time response.

More than 50% of Top500 systems are for industry*; a growing number of flops are industry (~15% in 1990s to ~30% today)*

* www.top500.org
Image courtesy of the National Center for Computational Sciences, Oak Ridge National Laboratory. Courtesy L. Treinish, IBM

Slide 6

Maintaining the HPC Performance Trend

[Chart: Top500 performance (Gigaflops, log scale, 1e-1 to 1e11) vs. time, Nov-92 through Nov-20, showing the 1st, 500th, and total system performance; 1 PF and 1 EF milestones marked. 10x / 3.5-4 yrs = 85-90% CAGR.]

- Chip: ~50% to 30% CAGR*
- Semiconductors & Pkg: ~15-20% CAGR, slowing
- Systems: 85-90% CAGR, continuing** (increasingly via accelerators)
- Increasing parallelism: system BW at all levels of assembly must scale exponentially, ~(0.1-1 Byte/Flop), AND/OR new architectures, topologies and algorithms required

* http://www.bigncomputing.org
** chart data from www.top500.org
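The "10x every 3.5-4 years" rule from the chart can be sanity-checked against the quoted 85-90% CAGR; a minimal sketch (the function name is mine):

```python
def implied_cagr(factor: float, years: float) -> float:
    """Annual growth rate implied by `factor` total growth over `years` years."""
    return factor ** (1.0 / years) - 1.0

# 10x over 4 years and over 3.5 years brackets the 85-90% CAGR on the chart.
slow = implied_cagr(10.0, 4.0)   # ~0.78, i.e. ~78%/yr
fast = implied_cagr(10.0, 3.5)   # ~0.93, i.e. ~93%/yr
```

The quoted 85-90% band sits between these two endpoints, i.e. closer to "10x per ~3.6-3.8 years".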


Slide 7

Interconnects in HPC Systems

- Core to core (on chip)
- CPU to Memory (on card, backplane)
- CPU to CPU (on 1st-level package), e.g. SMP
- Storage (rack to rack)
- Cluster fabric (card, backplane, rack to rack)
- LAN/WAN (building and beyond)

A. Benner et al., IBM J. Res. & Dev. Vol. 49 No. 4/5, 2005

Slide 8

System BW must scale w/ performance → BW bottlenecks

- Increasing chip performance → off-chip BW bottleneck
- Increasing module performance → off-module bottleneck
- Increasing card performance → on- and off-card bottleneck
- Increasing rack performance → off-rack bottleneck


Slide 9

Why Optics?

- Electrical buses become increasingly difficult at high data rates (physics): increasing losses & cross-talk; frequency-resonance effects
- Optical data transmission is easier: much lower loss, especially at higher data rates
- Additional advantages: cable bulk, connector size, EMI…; potential power savings
- KEY ADVANTAGE: BW * Distance > electrical
- Optics trends for large servers and data centers are following the same trends as the telecom network: longer links first
- Short-link optics requires tighter package integration: higher BW closer to the signal source; changes the supplier/manufacturing paradigm

[Diagram: CPU on card with LGA socket, card-to-card link across a backplane; the copper path is annotated with cross-talk, frequency-dependent losses, and resonance effects, vs. an optical path.]

Slide 10

Rack to Rack: Already using optics

- 2002, NEC Earth Simulator: no optics. (Courtesy of ESC, www.jamstec.go.jp)
- 2005, IBM Federation Switch for ASCI Purple (LLNL): combination of electrical & optical cabling; copper for short-distance links (≤10 m), optical for longer links (20-40 m); ~3000 parallel links, 12+12 @ 2 Gb/s/channel
- 2008/9, IBM Roadrunner (LANL), 1 PF/s: 5 Gb/s/ch transceivers, 55 miles of Active Optical Cables; fiber to the rack, 40,000 optical links
- Cray Jaguar (ORNL): was #2 in June 2009; Infiniband; 3 miles of optical cables, longest = 60 m. (Image courtesy of the National Center for Computational Sciences, Oak Ridge National Laboratory)


Slide 11

Bandwidth: the Bane of the Multicore Paradigm

Logic flops continue to scale faster than interconnect BW:
- A constant Byte/Flop ratio with N cores means: Bandwidth(N-core) = N x Bandwidth(single core)
- 3Di (3D integration) will only exacerbate bottlenecks

Assumptions: 3 GHz clock; ~3 IPC; 10 Gb/s I/O; 1 B/Flop mem; 0.1 B/Flop data; 0.05 B/Flop I/O
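A sketch of how those assumptions translate into off-chip bandwidth and lane count. The 9 GFlop/s-per-core figure (clock x IPC, assuming one flop per instruction) and the one-lane-per-10-Gb/s mapping are my assumptions, not the slide's; the chart's pin counts additionally include reference pins and escape overheads not modeled here.

```python
import math

GFLOPS_PER_CORE = 3.0 * 3          # 3 GHz clock x ~3 IPC (assumed 1 flop/instruction)
BYTES_PER_FLOP = 1.0 + 0.1 + 0.05  # memory + data + I/O traffic per flop
IO_RATE_GBPS = 10.0                # per-lane signaling rate from the slide

def offchip_bandwidth_gbps(cores: int) -> float:
    """Total off-chip bandwidth (Gb/s); BW scales linearly with core count."""
    return cores * GFLOPS_PER_CORE * BYTES_PER_FLOP * 8  # bytes -> bits

def signal_lanes(cores: int) -> int:
    """10 Gb/s lanes needed; each lane costs signal pins plus references."""
    return math.ceil(offchip_bandwidth_gbps(cores) / IO_RATE_GBPS)

# Doubling cores doubles required lanes, which is why the pins-per-chip
# curve on the chart rises steeply toward the escape limits.
```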

[Chart: signal + reference pins per chip vs. number of cores (1-128), rising toward ~16,000 pins.]

M. Ritter, Topical Workshop on Electronics for Particle Physics, Sept 2010

Slide 12

Implications of BW Scaling: module escape and card escape bottlenecks

[Chart: signal + reference pins per chip vs. number of cores (1-128), plotted against limit lines for chip escape (200 µm pitch), module escape (1 mm pitch), and card escape (8 pair/mm; QCM w/ 8 cores…).]

M. Ritter, Topical Workshop on Electronics for Particle Physics, Sept 2010


Slide 13

High aggregate BW with electrical getting more difficult

[Diagram: chip-to-chip, module-to-module, and card-to-card electrical links (driver module → connector → backplane → connector → receiver module), annotated with losses, crosstalk, reflections, and through-via (stub) effects.]

[Plot: frequency response |SDD21| of a 30" line, 0-10 GHz, with and without a via stub.]

[M. Ritter, et al., "The viability of 25 Gb/s on-board signaling", ECTC 2008]

- Lower losses w/ higher cross-section lines? ► BW density drops, cannot route
- Improved boards (no vias, advanced materials)? ► Increased cost
- Signal processing, higher swing? ► Power balloons
- Multi-level signaling? ► NRZ still works better

Slide 14

HPC driving volume optics: the Computer-com market

[Chart: number of optical channels per machine (log scale, 1e3 to 1e8) vs. year (2004-2020), with MareNostrum, ASCI Purple, Roadrunner, and Blue Waters* marked and channel rates progressing 2.5 Gbps, 5 Gbps, 10 Gbps, ?; WW parallel-optics volume in 2008 shown for comparison.]

Single-machine volumes are similar to today's WW parallel optics.

* Expected Summer 2011. T. Dunning, NCSA, https://hub.vscse.org/resources/86/download/VSCSE_FutureofHPC_Jul10-2.ppt#265,1,Future of High Performance Computing


Slide 15

Network topology determines number of links and link lengths required

- PERCS: all-to-all w/ distributed switch, O(n²) (B. Arimilli et al., 2010 18th IEEE Symposium on High Performance Interconnects)
- BG/L: 3D torus, O(n) (A. Benner et al., IBM J. Res. & Dev. Vol. 49 No. 4/5, 2005)
- ASCI Purple: centralized switch (fat tree)
- And many others, e.g. Clos, hypercube, flattened butterfly, dragonfly…

TRADEOFFS:
- Number of links (cost)
- Bisection BW (performance)
- Hops/latency (performance)
- Link length (cost)
- Switch size (cost)

[Histogram: distribution of active cable lengths in Roadrunner; percentage of links vs. length (0-110 m).]
85% of the links are < 20 m; 98% of the links are < 50 m.

Optics, if cheap enough, enables more connectivity, more BW, more spread-out systems.

Slide 16

Cost and power of a supercomputer

  Year   Peak Performance   Machine Cost   Total Power Consumption
  2008   1 PF               $150M          2.5 MW
  2012   10 PF              $225M          5 MW
  2016   100 PF             $340M          10 MW
  2020   1000 PF (1 EF)     $500M          20 MW

Assumptions, based on typical industry trends (see, e.g., top500.org and green500.org):
- 10X performance / 4 yrs (from the top500 chart)
- 10X performance costs 1.5X more
- 10X performance consumes 2X more power

J. Kash, Photonics Society Annual Meeting, Nov 2010
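The table rows follow mechanically from the three trend assumptions; a minimal sketch from the 2008 baseline (the function name is mine, and the table rounds costs to its own precision):

```python
import math

def project(perf_pf: float, base_cost_m=150.0, base_power_mw=2.5, base_perf_pf=1.0):
    """Machine cost ($M) and power (MW): each 10x in performance
    costs 1.5x more and consumes 2x more power."""
    decades = math.log10(perf_pf / base_perf_pf)  # number of 10x steps
    return base_cost_m * 1.5 ** decades, base_power_mw * 2.0 ** decades

cost_2020, power_2020 = project(1000.0)  # 1 EF: ~$506M (table: $500M), 20 MW
```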


Slide 17

Total bandwidth, cost and power for optics in a machine

  Year   Peak Performance   (Bidi) Optical Bandwidth   Optics Power   Optics Cost
  2008   1 PF               0.012 PB/s (1.2e5 Gb/s)    0.012 MW       $2.4M
  2012   10 PF              1 PB/s (1e7 Gb/s)          0.5 MW         $22M
  2016   100 PF             20 PB/s (2e8 Gb/s)         2 MW           $68M
  2020   1000 PF (1 EF)     400 PB/s (4e9 Gb/s)        8 MW           $200M

Require >0.2 Byte/FLOP I/O bandwidth, >0.2 Byte/FLOP memory bandwidth:
- 2008: optics replaces electrical cables (0.012 Byte/FLOP, 40 mW/Gb/s)
- 2012: optics replaces electrical backplane (0.1 Byte/FLOP, 10% of power/cost)
- 2016: optics replaces electrical PCB (0.2 Byte/FLOP, 20% of power/cost)
- 2020: optics on-chip (or to memory) (0.4 Byte/FLOP, 40% of power/cost)

J. Kash, Photonics Society Annual Meeting, Nov 2010

Slide 18

Cost and Power per optically-transmitted bit

  Year   Peak Performance   # Unidirectional optical channels   Optics Power (unidir.)   Optics Cost (unidir.)
  2008   1 PF               48,000 (@ 5 Gb/s)                   50 mW/Gb/s (50 pJ/bit)   $10/Gb/s
  2012   10 PF              2e6 (@ 10 Gb/s)                     25 mW/Gb/s               $1.1/Gb/s
  2016   100 PF             4e7 (@ 10 Gb/s)                     5 mW/Gb/s                $0.17/Gb/s
  2020   1000 PF (1 EF)     8e8 (@ 10 Gb/s)                     1 mW/Gb/s                $0.025/Gb/s

- To get optics to millions of units in 2012, need ~$1/Gb/s unidirectional; cost targets continue to decrease with time below that
- Power is OK for 2012, then sharp reductions will be needed; <5 mW/Gb/s is likely difficult with VCSEL-based links

J. Kash, Photonics Society Annual Meeting, Nov 2010
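The per-channel numbers above are internally consistent with the machine-level table on the previous slide; a quick check of the 2008 row (and a reminder that mW/(Gb/s) and pJ/bit are numerically the same unit):

```python
# 2008 row: 48,000 unidirectional channels at 5 Gb/s each.
channels, rate_gbps = 48_000, 5.0
total_gbps = channels * rate_gbps          # 2.4e5 Gb/s unidirectional
                                           # (= 1.2e5 Gb/s bidirectional, as on slide 17)
cost_per_gbps = 2.4e6 / total_gbps         # $2.4M optics cost  -> $10/Gb/s
power_per_gbps_mw = 12_000e3 / total_gbps  # 0.012 MW = 12,000,000 mW -> 50 mW/Gb/s

# Unit identity: 1 mW/(Gb/s) = 1e-3 J/s per 1e9 b/s = 1e-12 J/bit = 1 pJ/bit,
# so "50 mW/Gb/s" and "50 pJ/bit" in the table are the same statement.
```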


Slide 19

Outline

Why we need Optical Interconnects for High Performance Computing

Optical Link Basics

Requirements and Trade-offs

– Cost, Power, Density

Technologies of Interest

– VCSELs/fiber

– VCSELs/Optical PCB

– Si Photonics

Architectures: Optical Switching

Conclusions

Slide 20

What does an optics link consist of?

Logical view: CPU or switch chip → serializer, coding & clock → laser driver → III-V VCSEL → optical fiber and/or waveguides, optical connectors… → PD → pre-amp, TIA & LA → output driver → deserializer, decoding & CDR → CPU or switch chip (an OE module at each end).

Physical view: on-MCM optics vs. on-card optics; e.g. CPU, OE module, PCB wiring & connector, fiber & connector.


Slide 21

Signal Integrity: Eye Diagrams

Eye diagrams = a snapshot of performance: amplitude, speed, noise, jitter, distortion… (examples: 5 Gb/s, excellent eye; 15 Gb/s, marginal eye)

Key parameters:
- Extinction ratio, ER = P1/P0 (P0 = "0" level; 0 = no light); want ER as large as possible to maximize modulated power
- Optical Modulation Amplitude: OMA = P1 − P0 = 2·Pavg·(ER−1)/(ER+1)
- Rise/fall times (tr, tf), jitter, amplitude noise
- ISI: inter-symbol interference; the eye depends on bit history
- RIN: relative intensity noise

Courtesy Clint Schow, IBM
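The two eye parameters are tied together by the OMA formula above; a minimal sketch with assumed example values (ER = 5 and Pavg = 1 mW are illustrative, not figures from the slide):

```python
def oma_from_er(p_avg_mw: float, er: float) -> float:
    """Optical Modulation Amplitude (mW) from average power and extinction ratio.
    OMA = P1 - P0 = 2 * Pavg * (ER - 1) / (ER + 1), with ER = P1/P0, Pavg = (P1+P0)/2."""
    return 2.0 * p_avg_mw * (er - 1.0) / (er + 1.0)

# Cross-check against the definition OMA = P1 - P0 for ER = 5, Pavg = 1 mW:
p0, p1 = 1.0 / 3.0, 5.0 / 3.0        # these levels give ER = 5 and Pavg = 1 mW
oma = oma_from_er(1.0, 5.0)          # ~1.333 mW, equal to p1 - p0
```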

Slide 22

Signal Impairment in Multimode Fiber (MMF)

- A multimode fiber (or waveguide) can be represented as propagating each of the modes in a separate "tube"
- Which modes are excited, and the power distribution between them, depends on the launch conditions
- The impulse response of the MMF is a weighted superposition of the modal pulses from each "tube"
- Net: MM optics distorts the signal, leading to BW*distance limitations

[Diagram: multimode fiber with a graded (parabolic) refractive-index profile; input pulse, launch conditions (beam NA), core and cladding diameters, broadened output pulse.]

Courtesy Petar Pepeljugoski, IBM
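The "separate tubes" picture can be sketched numerically: each excited mode contributes a delayed copy of the input pulse, and the output is their power-weighted sum, which broadens the pulse and caps BW*distance. The mode delays and weights below are illustrative values, not measurements:

```python
import math

# Toy impulse-response model of an MMF: each mode is a "tube" with its own
# group delay; the output is the weighted sum of delayed input pulses.
def mmf_output(times_ns, pulse, mode_delays_ns, mode_weights):
    """Sampled output: sum_k w_k * pulse(t - tau_k) at each time t."""
    return [sum(w * pulse(t - tau) for tau, w in zip(mode_delays_ns, mode_weights))
            for t in times_ns]

def gaussian(t_ns):
    return math.exp(-(t_ns / 0.05) ** 2)   # ~50 ps wide input pulse

t = [i * 0.01 for i in range(60)]          # sample 0 to 0.6 ns
out = mmf_output(t, gaussian,
                 mode_delays_ns=[0.0, 0.1, 0.25],   # assumed modal delays
                 mode_weights=[0.5, 0.3, 0.2])      # set by launch conditions
# Unequal delays smear the single input pulse into a broadened output pulse.
```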


Slide 23

Multimode Fiber & Standards

Fiber types (P. Pepeljugoski, OFC 2005):

  Fiber   Core diam.   BW*D @ 850 nm (MHz·km)   BW*D @ 1300 nm (MHz·km)   Introduced
  OM1     62.5 µm      200                       500                       <1995
  OM2     50 µm        500                       500                       <1995
  OM3     50 µm        2000                      500                       2002
  OM4     50 µm        4700                      500                       2009

Ethernet standards (data courtesy of P. Pepeljugoski & A. Benner, IBM):

  Standard   Rate (Gbps)   Year    Reach                                       Comment
  802.3z     1             1997    500 m (OM2), 550 m (OM3)                    8B/10B, 850 nm
  802.3ae    10            2002    33 m (OM1), 82 m (OM2), 300 m (OM3)         64B/66B, 850 nm
  802.3aq    10            2006    220 m (OM1/OM2/OM3)                         1300 nm w/ equalization
  802.3ba    10x10         2010    100 m (OM3)                                 850 nm
  802.3ba    10x10         2010    10 km                                       single mode
  802.3bg    40            2011    10 km                                       single mode
  100GbE     4x25          >2013   ~80-100 m (OM3), 150 m (OM4), 40 km (SMF)   64B/66B

Fibre Channel: 4.25G (2005), 8.5G (2008), 14.025G (2011), 28.05G (2012). Infiniband: DDR 5G (2006), QDR 10G (2006, product ~2009), FDR 14G (mid-2011), EDR 20G (mid-2011).

Slide 24

Receiver Sensitivity Measurement

- Measure bit error ratio (BER) as a function of input optical power, eye sampled at center
- Receiver sensitivity = minimum optical power required to achieve a specified BER (often 10^-12)
- Degrades at higher data rates due to bandwidth limitations
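Under the standard Gaussian-noise approximation (not stated on the slide, but the usual model behind BER-vs-power curves), BER relates to the receiver Q-factor as BER = 0.5·erfc(Q/√2), so the common 10^-12 spec point corresponds to Q ≈ 7:

```python
import math

def ber_from_q(q: float) -> float:
    """Gaussian-noise approximation: BER = 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

# Q ~= 7.03 hits BER = 1e-12; each extra dB of received OMA raises Q and
# drives BER down steeply, which is why the curves above are near-vertical.
```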

[Plot: log10(BER) vs. OMA (dBm), −18 to 0 dBm, for 985-nm Terabus transceivers (16 receivers, R1A-R4D) at 5, 10, 12.5, and 15 Gb/s; sensitivity degrades at the higher data rates.]

Courtesy Clint Schow, IBM


Slide 25

Example Link Power Budget (optical waveguides)

                                        10 Gb/s   <15 Gb/s
  TX power OMA (dBm)                    2         2
  Power penalties, incl. Xtalk (dB)     2.75      3.5
  Signal losses, 0.3 m att. + CE (dB)   8.25      8.25
  RX sensitivity (dBm)                  −9.8      −9.8
  Margin (dB)                           0.8       0.05

Assumptions: Xtalk penalty = 2 dB; CE = 2 dB TX, 2 dB RX; ISI penalty = 1 dB.

Courtesy Petar Pepeljugoski, IBM
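The margin row is just the budget arithmetic; a minimal sketch reproducing both columns (the function name is mine):

```python
def link_margin_db(tx_oma_dbm, penalties_db, losses_db, rx_sens_dbm):
    """Optical link margin: transmit OMA minus penalties and losses,
    compared against the receiver sensitivity (all in dB / dBm)."""
    return tx_oma_dbm - penalties_db - losses_db - rx_sens_dbm

m10 = link_margin_db(2.0, 2.75, 8.25, -9.8)  # 10 Gb/s column  -> 0.8 dB
m15 = link_margin_db(2.0, 3.50, 8.25, -9.8)  # <15 Gb/s column -> 0.05 dB
```

The extra 0.75 dB of penalties at the higher rate consumes nearly all of the margin.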

Slide 26

Outline

Why we need Optical Interconnects for High Performance Computing

Optical Link Basics

Requirements and Trade-offs

– Cost, Power, Density

Technologies of Interest

– VCSELs/fiber

– VCSELs/Optical PCB

– Si Photonics

Architectures: Optical Switching

Conclusions


Slide 27

Optimized solutions will require detailed analysis of trade-offs

Power, Cost, Density trade-off triangle, with factors including:
- Margin, packaging integration, data-rate…
- Packaging integration, channel integration, margin, cooling, data-rate…
- Base manufacturing cost, yield, channel integration, data-rate, reliability…

Slide 28

Density of optical links

[Figure, courtesy of Avago Technologies: optical-link density comparison.
- Active cable, electrical at card edge: ~18 x 41 mm, 1.27 mm pitch; 8 @ 5 Gbps/cable; ~60K fibers/rack (Roadrunner, ~1 PF)
- Optics module and electrical connector at card edge: ~8 x 8 mm, ~0.75 mm pitch; 48 @ 10 Gbps/cable; ~40K fibers/system (PERCS, >10 PF)
- Optical at card edge, with 4x4 VCSEL array + 4x4 PD array: ~5 x 3 mm, 0.2 mm pitch; > 40x denser.
Other module footprints shown: ~10 x 25 mm, ~5 x 15 mm.]


Slide 29

Packaging of optical interconnects is critical

Better to put optics close to logic rather than at the card edge:
- Avoids the power, distortion & cost of an electrical link on each end of the optical link
- Breaks through the pin-count limitation of multi-chip modules (MCMs)

Good, optics on-card (implemented, P7-IH): the NIC on a ceramic carrier/organic card drives 1.7 cm of traces plus 1 cm of flex to the opto module (laser + driver IC), then fiber to an optical bulkhead connector; electrical runs up to ~1 m, 50 Ω. Operation at 10 Gb/s: equalization required. Bandwidth limited by # of pins.

Optics on-MCM: the opto module sits on the module with the NIC, with ~2 cm of 50 Ω traces. Operation to >15 Gb/s: no equalization required. Improved electrical power and off-module BW; no internal jumper/bulkhead connector; simpler standardized packaging.

Slide 30

Integrated packaging is more complex, requires close relationship with suppliers (IBM PERCS/Blue Waters)

- Optical transmitter/receiver devices, 12 channels x 10 Gb/s; 28 pairs per hub, giving (2,800 + 2,800) Gb/s of optical I/O BW
- Total of 672 fiber I/Os per hub, 10 Gb/s each, with strain relief for the optical ribbons
- Heat spreader over the hub ASIC; cooling / load saddle and heat spreader for the optical devices

A. Benner, Future Directions in Packaging (FDIP) Workshop, EPEP, Oct 2010


Slide 31

Power consumption of an optical link

[Diagram: VCSEL-based link; pre-amp, laser driver and III-V VCSEL at the TX, PD, TIA & LA and output driver at the RX.]

See C. Lai et al., OFC 2011.

Power distribution for a nominal 10 mW/Gbps link (at 10-20 Gbps):
- TX pre-amp: 2.0-2.5 mW/Gbps
- TX laser driver: 1.0-2.0 mW/Gbps
- TX VCSEL: 1.0 mW/Gbps
- TX total: 4.0-5.5 mW/Gbps
- RX TIA: 1.0-1.5 mW/Gbps
- RX LA: 3.0-3.5 mW/Gbps
- RX output driver: 0.5-2.5 mW/Gbps
- RX total: 4.5-7.5 mW/Gbps

Notes:
- VCSEL pre-amp/laser-driver power depends on VCSEL quality
- RX output-driver power depends on electrical channel length and quality
- There is a low-power data-rate sweet spot for a given CMOS technology generation and design (power vs. speed in a given technology)
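The component figures above sum to the quoted totals and bracket the nominal link power; a minimal sketch:

```python
# Per-component power efficiency (mW/Gbps) ranges, taken from the slide.
TX = {"pre-amp": (2.0, 2.5), "laser driver": (1.0, 2.0), "VCSEL": (1.0, 1.0)}
RX = {"TIA": (1.0, 1.5), "LA": (3.0, 3.5), "output driver": (0.5, 2.5)}

def total_range(parts):
    """Sum the (low, high) ranges of all components."""
    return (sum(lo for lo, _ in parts.values()),
            sum(hi for _, hi in parts.values()))

tx_range = total_range(TX)                        # (4.0, 5.5) mW/Gbps
rx_range = total_range(RX)                        # (4.5, 7.5) mW/Gbps
link = (tx_range[0] + rx_range[0],
        tx_range[1] + rx_range[1])                # (8.5, 13.0) mW/Gbps,
                                                  # bracketing the nominal 10 mW/Gbps
```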

Slide 32

Costs associated with an optics link

Bill of materials, e.g.:
- Substrates, lenses
- Laser & PD arrays
- Microcontroller
- Driver/receiver chips
- Connectors, optical and electrical
- Fiber cabling
- Heat sinks…

Test & yield:
- Tester cost, tester time
- Built-in self-test
- Bit error rate requirements…

Assembly & fabrication:
- Assembly rate & equipment costs; output rate
- Active vs. passive alignment; manual vs. automated
- Tolerances vs. yield
- Assembly (w/ server)…

Courtesy of Avago Technologies


Slide 33

What will Help Optics Compete with shorter Copper links for Computer-com?

Increased Channel Rate

– Key challenges: VCSELs, electrical part of link

More parallelism

– Key challenges: packaging density, standardization

Move closer to the signal source (e.g. processor, switch)

– Key challenges: packaging integration, standardization

Links from standardized sub-components (no more modules)

– Key challenges: technology, supporting infrastructure (design tools, components, manufacturing and test)

IC-like cost structure using Integrated Photonics

– Key challenges: integration vs yield, packaging, volumes

Slide 34

Outline

Why we need Optical Interconnects for High Performance Computing

Optical Link Basics

Requirements and Trade-offs

– Cost, Power, Density

Technologies of Interest

– VCSELs/fiber

– VCSELs/Optical PCB

– Si Photonics

Architectures: Optical Switching

Conclusions


Slide 35

Optics Technologies of Interest

- VCSELs / multi-mode fiber
- VCSELs / multi-mode optical waveguides on PCB or flex
- Si photonics / single-mode WDM

[Photo: Terabus 160 Gb/s TRx (bottom view), 3.9 mm x 3.9 mm; 2D waveguide array, 32 parallel channels, 35 x 35 µm cores on 62.5 µm pitch.]

Slide 36

VCSELs & Multi-mode Fiber

Incumbent technology with well established infrastructure

Current low cost technology

Can be more highly integrated, but will always be heterogeneous packaging

Cost can be improved with:

– Higher speeds

– More automated, designed for low cost optics modules

– More BW/fiber

Power can be improved with higher speeds, faster CMOS or well designed SiGe and signal processing


Slide 37

Development of VCSELs for >25 Gb/s links: collaborations with OE vendors

- Joint work with Finisar AOC: 26 Gb/s and 30 Gb/s eyes*
- Joint work with Emcore Corp: 20 Gb/s and 25 Gb/s eyes*
- Drive conditions shown: 8 mA / 700 mVpp and 6 mA / 375 mVpp

* R. Johnson and D. M. Kuchta, “30Gb/s directly modulated VCSELs,” CLEO 2008
* N. Y. Li et al., “High-Performance 850 nm VCSELs and Photodetector Arrays for 25 Gb/s Parallel Optical Interconnects,” OFC 2010
* N. Y. Li et al., “Development of High-Speed VCSELs Beyond 10 Gb/s at Emcore,” Photonics West 2010

Slide 38

Progress in VCSEL-based CMOS transmitters and receivers: 90-nm IBM CMOS-driven VCSEL transmitters and compatible receivers (power and speed records)

- VCSEL transmitters: eyes at 20, 29, and 32 Gb/s; energies of 0.7, 1.75, 1.9, and 2.8 pJ/bit reported
- Compatible receivers: 2.9 pJ/bit at 15 Gb/s, 3.5 pJ/bit at 17.5 Gb/s, 6.7 pJ/bit at 20 Gb/s


Slide 39

High volumes can take advantage of mass manufacturing (Mitch Fields, OFC 2010; courtesy of Avago Technologies)

Manufacture of MicroPOD: a paradigm shift
- Production requirement: 30,000 pairs per month
- Solution: simple vertical stack design
- Invest in manufacturing technology for 100% automation
- Manufacture parallel optics in panel form: 100% automated process, assemble in panel form, then singulate

Slide 40

Higher density: 24-channel fiber-coupled optical transceiver [A. Rylyakov et al., OFC 2010 post-deadline; F. Doany et al., ECTC 2010]

[Eye diagrams: TX and RX at 10 Gb/s; RX sensitivity curves at 7.5 and 10 Gb/s.]

Merits afforded by the flip-chip integrated “holey” IC package:
- Channel count: 24 TX + 24 RX
- Bandwidth: 300 Gb/s
- Power: 8.2 pJ/bit at 12.5 Gb/s
- Density: 10 Gb/s/mm2


Slide 41

More BW/fiber: 4-fiber 24-core optical transceiver

[Photos: silicon carrier with TX IC, RX IC, VCSEL array and PD array (front and backside views); assembly on PCB with multicore fibers (MCF).]

- Custom VCSEL/PD arrays matched to 4 multicore fibers; fabricated by Emcore Corporation, fiber fabricated by OFS Laboratories
- Custom OE chips designed to fit into the existing configuration of the Terabus project: they match the silicon carrier designed for the 24-channel polymer optical waveguide transmitter

B. G. Lee, OFC 2009

Slide 42

Single-channel MCF TX/RX performance:
- TX at 15 Gb/s (~6 dB ER): MCF-TX aligned to a 100-m MCF core; output coupled to a reference RX
- RX at 12.5 Gb/s: reference TX coupled to a 100-m MCF core; output aligned to the MCF-RX

[B. G. Lee et al., IEEE Phot. Soc. Mtg. 2010]


Slide 43

VCSELs & multi-mode fiber: challenges for the future

- VCSELs:
  - ~30 Gbps likely
  - Continued focus on reliability; lower temperatures can help
  - Continued focus on high-performance VCSELs and TRx circuits for lower power
  - Change of wavelength? 900-1100 nm band or 850 nm? No reliability data could slow acceptance
- Receiver performance at higher data rates:
  - Higher data rates will require smaller photodiode size to achieve higher bandwidth
  - Smaller photodiode size will impact the ability to collect all the light from the fiber, which may lead to a modal noise penalty
  - Smaller-core (e.g. 30 µm) multi-mode fiber may be a good trade-off
- Receiver sensitivity will be one of the biggest challenges: it becomes worse with increased data rate and impacts the jitter budget
- Reducing receiver power consumption is becoming increasingly difficult

Slide 44

Optical Printed Circuit Boards, based on polymer waveguide technology

Vision:
- Low-cost PCB card for control signals, power, ground; all off-MCM links are optical (optical MCMs and optical DIMMs connected by waveguides)

Technology progression: optical flex → waveguides on PCB → waveguides in PCB (processor socket, optical connector, OSA/LGA, polymer waveguides, OE elements)

Technology elements:
- 8 waveguide flex sheets, 192 waveguides, 8 connectors
- 4x12 MT waveguide connector
- TRX1 and TRX2: 16 TX + 16 RX each (4x4 VCSEL array + 4x4 PD array); 16+16 WG channels on PCB
- 48-channel waveguide mirror array, waveguide cores on 62.5 µm pitch
- Waveguide processing on large panels, e.g. 305 mm x 460 mm; ink-jet / UV laser writing system; TIR mirrors made by laser ablation
- 16+16 OSA, incl. circuits, >12.5 Gbps/ch

Advantages:
- Low-cost pick-and-place assembly
- Addresses fiber management, including shuffles, splits
- Optics close to chips to alleviate the electrical-power-to-optics cost


M. Taubenblatt OFC 2011Dec 10, 2010 45© 2010 IBM Corporation

[Module cross-section figure labels: TX LDD, RX, VCSEL, PD, lens arrays, Si carrier, organic carrier, polymer waveguide, PCB]

Optical Printed Circuit Boards (Terabus)

Optochip: based on a Si carrier platform for hybrid integration of multiple chips
– Dense high-speed wiring layers for inter-chip connections
– Dense through-silicon vias (TSV) for electrical I/O
– Optical vias for optical I/O
– Allows use of industry-standard 850 nm VCSELs (also lower WG loss than 980 nm: 0.03 dB/cm)

850-nm Optochip demonstrated: 300 Gb/s bidirectional aggregate data rate [F. Doany, ECTC 2010]
– Optochip: 6.4 mm x 10.4 mm
– 2x12 LDD IC (3.9 mm), 2x12 RX IC (1.6 mm)
– 24-channel VCSEL array and 24-channel PD array, each 0.9 x 3.5 mm

Full link is operational, to be presented at ECTC 2011 (F. Doany): TX: 20 Gb/s, RX: 15 Gb/s

M. Taubenblatt OFC 2011Dec 10, 2010 46© 2010 IBM Corporation

[Figure: measured coupling efficiency (dB) vs. lateral offset (µm). Tx: ±35 µm; Rx: > ±65 µm]

Design Study: One-lens vs. two-lens coupling

Single-lens coupling design:
– Beam focused to a point at the WG cores
– Tight OE-module-to-WG-board alignment tolerances

Dual-lens coupling design (lens arrays on the Si-carrier OE module and on the SLC base substrate):
– Collimated beam between module and board
– Relaxed OE module alignment tolerances
– Nearly constant coupling efficiency for VCSEL: ±12 µm; photodiode: ±10 µm

Measured Alignment Tolerances (simulation vs. measurement):
– VCSEL: flat coupling efficiency within measurement noise for ±12 µm
– Photodiode: flat relative response over a comparable range
– Relaxed module-to-board alignment tolerance of > ±35 µm realized

after R. Budd, IBM. Result: lower BOM cost or higher yield.
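The single-lens vs. dual-lens trade-off follows from a textbook Gaussian mode-overlap model (a sketch with assumed mode radii, not the actual Terabus lens design): two identical Gaussian spots of 1/e² radius w offset laterally by d couple with efficiency exp(-d²/w²), so the collimated (large-w) beam of the dual-lens design directly buys offset tolerance.

```python
# Gaussian mode-overlap model of lateral misalignment loss.
# The mode radii are illustrative assumptions, not measured values.
import math

def offset_loss_db(offset_um: float, mode_radius_um: float) -> float:
    """Coupling loss (dB) for two identical Gaussian beams offset laterally."""
    eta = math.exp(-(offset_um / mode_radius_um) ** 2)
    return -10 * math.log10(eta)

def tolerance_um(max_loss_db: float, mode_radius_um: float) -> float:
    """Largest lateral offset keeping loss below max_loss_db."""
    return mode_radius_um * math.sqrt(max_loss_db * math.log(10) / 10)

# A focused ~10-um spot vs. a collimated ~80-um beam at a 1-dB budget:
for w in (10, 80):
    print(f"w = {w} um -> +/-{tolerance_um(1.0, w):.0f} um for < 1 dB")
```

With an assumed 80 µm collimated beam the 1 dB tolerance comes out near the ±35 µm class of numbers quoted on the slide, while a tightly focused spot tolerates only a few µm.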


M. Taubenblatt OFC 2011Dec 10, 2010 47© 2010 IBM Corporation

Advantages of Polymer Waveguide Technology compared to Parallel Fiber Optics

Integrated mass manufacturing:
– Board, sheet, film level processing of optical interconnects
– Lower assembly and waveguide jumper costs
– Costs for wide busses should scale better

Simple assembly:
– Avoid fiber handling
– Similar assembly procedures as for electrical components and boards
– Establish electrical and optical connections simultaneously
– Avoid separate optical layer (if integrated with board)

New or compact functionality supporting new architectures:
– Shuffles, crossings, splitters, …
– Enabler for multi-drop splitting and complex re-routing that is expensive in fiber

Higher density, waveguide pitch < 125 µm (best future fiber pitch):
– Higher bandwidth density, fewer signal layers

Cost:
– Reduced optics module cost AND jumper/connector costs both important
– Possible lower maintenance costs

But:
– Fiber / waveguide counts remain high for single optics
– Still need lots of rack-to-rack fiber interconnects
– Technology ecosystem needs to mature

M. Taubenblatt OFC 2011Dec 10, 2010 48© 2010 IBM Corporation

Si Photonics Vision/Goal

– Ultra low cost CMOS fabrication

– WDM/single mode fiber for high BW/fiber and distance at high data rate

– Nanophotonics for high BW density and integration

– Wafer level testing

– Ultra low power with low load devices and tight integration

[Figure: 3D integrated chip with logic plane, memory plane, and photonic plane; optical I/O carries off-chip optical signals, the photonic plane carries on-chip optical traffic]

F. Horst et al., OFC 2010; F. Horst et al., PTL 21 (2009)
http://www.research.ibm.com/photonics
Also see Y. Vlasov, OFC 2011

[Chip micrographs: Si waveguide with Ge PD; MZ modulator; Ge WG detector]
W. Green et al., Optics Express, 2007; J. Van Campenhout et al., Optics Express, Dec 2010; S. Assefa et al., Nature, March 2010 & Optics Express, April 2010; Y. Vlasov, SEMICON Dec 2010


M. Taubenblatt OFC 2011Dec 10, 2010 49© 2010 IBM Corporation

Goal: Integrate ultra-dense photonic circuits with electronics
Applications: High-bandwidth on-chip optical network

[Figure: 3D integrated chip with logic, memory, and photonic planes, optical I/O for off-chip optical signals, and on-chip optical traffic. WDM switch fabric: each core serializes a message onto N modulators at different wavelengths (electrical-to-optical); WDM bit-parallel messages with 1 Tb/s aggregate BW traverse the switch fabric; at the destination, a detector converts N parallel channels back (optical-to-electrical) and deserializes them for the receiving core]

Ultimate Vision, circa 2020: Optically connected 3-D Supercomputer Chip

Y. Vlasov, ECOC 2008, and Semicon, 2010. Also, J. Kash, Photonics in Switching, SF, CA, pp. 55-56, 2007

M. Taubenblatt OFC 2011Dec 10, 2010 50© 2010 IBM Corporation

Example: Performance Trade-Offs of Two Fabricated Photonic Switches

                        Ring Resonator (2RRS)   Wavelength-Insensitive Mach-Zehnder (WIMZ)1
port configuration      1×2                     2×2
digital response        yes                     no
ON power (mW)           < 3                     4
optical bandwidth (nm)  < 1                     > 100
no. stages              2                       1
diode length (µm)       90                      200
footprint (mm²)         0.001                   0.02
series resistance (Ω)   18                      10

[Device micrographs with 50-µm scale bars]

• Mach-Zehnder: less T sensitivity, more power, larger area
• Ring resonator: high Q for lower power but T sensitivity; also much smaller

1) J. Van Campenhout, Optics Express, 17 (26) 24020. B. Lee, IEEE Photonics Society, Nov 2010


M. Taubenblatt OFC 2011Dec 10, 2010 51© 2010 IBM Corporation

What is the optimal level of integration for Si Photonics?

A. On-board: No; we have already given up power and density (the long electrical link costs power, and board packaging is not dense enough)

B. On multi-chip carrier: Yes; could provide all high-speed off-chip I/O at low power

C. On-chip: Yes; even better than B, provided the integration comes for "free": no added power, minimal added cost in processing or packaging

D. CMOS integrated: No; digital real estate in the substrate is too valuable, back-end metal is too crowded, too hard to optimize

[Figure: the four options on a circuit board, from CMOS TX+RX with a separate optical layer, to a photonic layer on the carrier, to a CPU with integrated photonics, to CPU + photonics in one CMOS die, with memory attached via integrated photonics]

after C. Schow OFC 2009 Workshop

M. Taubenblatt OFC 2011Dec 10, 2010 52© 2010 IBM Corporation

Robust packaging of Si Photonics is needed

Tapered glass waveguides match pitch, cross section, and NA of chip. Standard 250-µm pitch SM/PM fiber array aligned in V-groove, butt-coupled to glass waveguides.

B. G. Lee et al., Proceedings of OFC 2010, paper PDPA4. F. Doany et al., J. Lightwave Techn., Oct 2010.

– Multichannel tapered coupler allows interfacing a 250-µm-pitch PM fiber array with a 20-µm-pitch silicon waveguide array
– 8-channel coupling demonstrated, < 1 dB optical loss at interface
– Uniform insertion loss and crosstalk
– …but TBD: reliability, cost…

In collaboration with Chiral Photonics


M. Taubenblatt OFC 2011Dec 10, 2010 53© 2010 IBM Corporation

Si Photonics

High potential for future HPC systems

– Ultra low cost and power

– Highly integrated

Focus for further progress

– Device choices and design for best power/speed/temperature tradeoffs

– Low cost packaging, including light source

M. Taubenblatt OFC 2011Dec 10, 2010 54© 2010 IBM Corporation

Optical Link Reliability for PetaScale+

• Effective FIT rate for an 11-channel 'link' with typical VCSEL wearout and random FIT = 10/device
• Sparing (a 12th device, 11+1 spare) effectively reduces FIT to low levels

# of Links | No Spare VCSEL + 50 FIT unspared | Spare VCSEL + 50 FIT unspared [time to 1st fail] | Spare VCSEL only [time to 1st fail]
1M         | 3.8 fails/day                    | 20 hrs                                           | 12 Khrs
100K       | 2.68 fails/week                  | 200 hrs                                          | 39 Khrs
10K        | 1 fail/month                     | 2 Khrs                                           | 174 Khrs
1K         | 1.5 fails/year                   | 20 Khrs                                          | 252 Khrs

[Accompanying plot: failures vs. time with wearout and random curves, no-sparing case]

ExaScale will need sparing + ultra-reliable components

D. Kuchta IEEE Systems Packaging Japan Workshop, Jan 2010
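The unspared column of the table follows directly from the FIT definition (1 FIT = one failure per 10⁹ device-hours); a small sketch under the slide's stated assumptions (11 VCSELs per link at 10 FIT each, plus 50 FIT of unspared components per link):

```python
# Expected fleet failure rate without sparing, from FIT arithmetic.
# FIT = failures per 1e9 device-hours; parameters follow the slide.
FIT_VCSEL = 10        # per VCSEL
VCSELS_PER_LINK = 11
FIT_UNSPARED = 50     # other unspared components, per link

def fails_per_hour(n_links: int) -> float:
    fit_per_link = VCSELS_PER_LINK * FIT_VCSEL + FIT_UNSPARED  # 160 FIT/link
    return n_links * fit_per_link / 1e9

print(f"1M links:   {fails_per_hour(1_000_000) * 24:.1f} fails/day")
print(f"100K links: {fails_per_hour(100_000) * 24 * 7:.2f} fails/week")
print(f"10K links:  one fail every {1 / fails_per_hour(10_000) / 24:.0f} days")
```

This reproduces the 3.8 fails/day (1M links) and ~2.7 fails/week (100K links) entries, which is why sparing plus ultra-reliable components becomes mandatory at ExaScale link counts.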


M. Taubenblatt OFC 2011Dec 10, 2010 55© 2010 IBM Corporation

Outline Why we need Optical Interconnects for High Performance Computing

Optical Link Basics

Requirements and Trade-offs

– Cost, Power, Density

Technologies of Interest

– VCSELs/fiber

– VCSELS/Optical PCB

– Si Photonics

Architectures: Optical Switching

Conclusions

M. Taubenblatt OFC 2011Dec 10, 2010 56© 2010 IBM Corporation

What about new Architectures?

Rule #1: Cannot simultaneously introduce a new architecture AND a new optics technology in the same generation…too much risk


M. Taubenblatt OFC 2011Dec 10, 2010 57© 2010 IBM Corporation

Typical architecture: Electronic Packet Switching

Typical architecture (electronic switch chips, interconnected by electrical or optical links, in multi-stage networks) works well now:
– Scalable BW & application-optimized cost
  • Multiple switches in parallel
– Modular building blocks
  • Many identical switch chips & links

…but challenging in the future:
– Switch chip throughput stresses the hardest aspects of chip design
  • I/O & packaging
– Multi-stage networks will require multiple E-O-E conversions
  • An N-stage Exabyte/s network = N × Exabytes/s of cost and N × Exabytes/s of power

[Photo: central switch racks, by courtesy of Barcelona Supercomputing Center - www.bsc.es]
J. Kash OFC tutorial 2008

M. Taubenblatt OFC 2011Dec 10, 2010 58© 2010 IBM Corporation

Possible new architecture: Optical Circuit Switching

Optical packet switches (all-optical network, one E-O-E conversion from original source to final destination, <10 ns switching time) are hard:
– Not commercially available now
– Probably not cost-competitive against electronic packet switches, even in 2015-2020

But Optical Circuit Switches (>1 millisecond switching time) are available:
– Several technologies (MEMS, piezo-, thermo-, …)
– Low power, no extra O-E-O
– But require single-mode optics

Scalable Optical Circuit Switch (OCS): MEMS-based OCS HW is commercially available (Calient, Glimmerglass, …):
– 20 ms switching time
– <100 Watts

If multiple circuit-switch planes are provisioned, performance for parallel apps can be good.

[OCS concept: input fiber (one channel shown) steered by a 2-axis MEMS mirror to output fibers]
J. Kash OFC tutorial 2008; L. Schares OFC 2009


M. Taubenblatt OFC 2011Dec 10, 2010 59© 2010 IBM Corporation

OSMOSIS Demonstrator System

Centralized optical switch with electrical scheduler:
– 64 ports at 40 Gb/s port speed
– Broadcast-and-select architecture
– Combination of wavelength and space division multiplexing
– Fast switching based on SOAs

[Architecture figure: 64 ingress adapters (VOQs, TX) feed 8 broadcast units (8x1 combiner, WDM mux, optical amplifier, star coupler), which fan out to 128 select units (fast SOA 1x8 wavelength-selector gates and fast SOA 1x8 fiber-selector gates, 8x1) and on to 64 egress adapters (2 RX, EQ); a central scheduler (bipartite graph matching algorithm) coordinates via control links]

R. Luijten, OFC 2009. In collaboration w/ Corning Inc.
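The central scheduler computes a bipartite matching between ingress adapters and egress ports every timeslot. The OSMOSIS hardware uses a pipelined matching algorithm; the greedy software sketch below is only an illustration of the problem being solved, with a hypothetical request format:

```python
# Greedy maximal bipartite matching: each ingress requests the egresses
# for which it has queued cells (VOQs); at most one grant per side.
# Illustrative stand-in, not the actual OSMOSIS scheduler.
def schedule(requests: dict[int, set[int]]) -> dict[int, int]:
    """Map ingress -> granted egress for one timeslot."""
    taken: set[int] = set()          # egresses already granted
    grants: dict[int, int] = {}
    for ingress, egresses in sorted(requests.items()):
        for egress in sorted(egresses):
            if egress not in taken:  # first free egress wins
                taken.add(egress)
                grants[ingress] = egress
                break
    return grants

# Ingress 0 and 1 both want egress 2; one wins, the other falls back.
print(schedule({0: {2}, 1: {2, 3}, 2: {0}}))  # {0: 2, 1: 3, 2: 0}
```

Practical schedulers use iterative request-grant-accept rounds to improve on a single greedy pass in match size and fairness, which is what makes the centralized scheduler a demanding piece of the design at 64 ports and 40 Gb/s.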

M. Taubenblatt OFC 2011Dec 10, 2010 60© 2010 IBM Corporation

Example of wavelength-based packet routing

[Figure: a wavelength-parallel multi-λ packet is routed transparently; header wavelengths λF and λH are tapped via 70:30 and 50:50 couplers to O/E converters and control logic, which drives SOA gates to steer the packet between North, West, East, and South ports]

O. Liboiron-Ladouceur, JLT, Vol. 26, No. 13, July 1, 2008
Courtesy of K. Bergman, Columbia University, Lightwave Research Laboratory


M. Taubenblatt OFC 2011Dec 10, 2010 61© 2010 IBM Corporation

Silicon Photonics for Chip-Scale Photonic Communication: Key to low-cost switching

[3D chip stack: (top) optical NoC, (center) memory layer, (bottom) processing cores, with optical I/O]

o PIN-diode MZI modulator [Green et al., Optics Express 2007]
o Coupled ring resonator switch [Vlasov et al., Nature Photonics 2008]
o Ge MSM photodetector [Assefa et al., Optics Express 2010]

Passive elements: waveguides, crossings, WDMs, splitters
Active elements: modulators, switches, detectors

Critical switch metrics: speed, insertion loss, optical bandwidth, optical crosstalk, footprint, static power consumption, dynamic switching energy, thermal insensitivity, fabrication tolerance

after B. Lee CLEO 2010
Add'l refs: A. Rylyakov, ISSCC, Jan 2011; W. Green et al., Proc. Photonics Society Annual Meeting, 2010, pp. 512-513

M. Taubenblatt OFC 2011Dec 10, 2010 62© 2010 IBM Corporation

Outline Why we need Optical Interconnects for High Performance

Computing

Optical Link Basics

Requirements and Trade-offs

– Cost, Power, Density

Technologies of Interest

– VCSELs/fiber

– VCSELS/Optical PCB

– Si Photonics

Architectures: Optical Switching

Conclusions


M. Taubenblatt OFC 2011Dec 10, 2010 63© 2010 IBM Corporation

Characteristics of “Computer-com”

Ultra-short distance

– Some ~100 m, many <10 m; backplanes and cards eventually

Very wide busses

– 10’s to 100’s of channels

Very high aggregate BW

– Multi-Tbps busses, multi-Tbps off chip modules

High density

– > 100Gbps/cm2 (unidirectional T+R)

Highly integrated packaging

– Close to electrical signal source

Low power

– 10's of mW/Gbps now, ~1 mW/Gbps in the future

Very high volume/low cost per Gbps

– $1's/Gbps now, << $1/Gbps in the future

Reliability

– Low BER, ECC, redundancy
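The density target can be sanity-checked from numbers appearing earlier in the deck (62.5 µm waveguide pitch, 12.5 Gb/s per channel); the two-row layout and 1 cm² footprint below are illustrative assumptions:

```python
# Back-of-envelope areal bandwidth density for a parallel-optics module.
# 62.5-um pitch and 12.5 Gb/s/ch appear earlier in the deck; the 2-row
# layout and 1 cm^2 footprint are assumptions for illustration.
PITCH_CM = 62.5e-4        # 62.5 um expressed in cm
RATE_GBPS = 12.5          # per channel
ROWS = 2                  # assumed TX row + RX row

channels_per_cm = 1 / PITCH_CM                 # 160 channels along 1 cm
density = channels_per_cm * ROWS * RATE_GBPS   # Gb/s per cm^2
print(f"{density:.0f} Gb/s/cm^2")
```

Even this modest configuration clears the >100 Gb/s/cm² requirement by more than an order of magnitude, which is why waveguide pitch (rather than per-lane rate alone) dominates the density argument.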

M. Taubenblatt OFC 2011Dec 10, 2010 64© 2010 IBM Corporation

Conclusions

Computer-com optics is leaving the “plateau of productivity” and rounding the corner of the hockey stick!

..but there’s a lot of trade-offs to be considered.


M. Taubenblatt OFC 2011Dec 10, 2010 65© 2010 IBM Corporation

Selected Reading
In addition to talks and papers cited in the presentation, the following may be useful:

Computer Architecture and Optics Requirements
– A. F. Benner et al., "Exploitation of optical interconnects in future server architectures," IBM J. Res. & Dev., vol. 49, no. 4/5, July/Sept. 2005
– J. Hennessy & D. Patterson, Computer Architecture, Morgan-Kaufmann, 2003
– P. Kogge et al., "Exascale Computing Study: Technology Challenges in Achieving Exascale Systems," Sept. 2008, http://www.er.doe.gov/ascr/Research/CS/DARPA%20exascale%20-%20hardware%20(2008).pdf

High-Speed Electrical Interconnects
– D. G. Kam et al., "Is 25 Gb/s On-Board Signaling Viable?," IEEE Transactions on Advanced Packaging, vol. 32, no. 2, pp. 328-344, May 2009

Optical Interconnects
– P. Pepeljugoski et al., "Modeling and simulation of the next generation multimode fiber," IEEE J. Lightwave Technol., vol. 21, no. 5, pp. 1242-, 2003
– P. Pepeljugoski et al., "Development of System Specification for Laser-Optimized 50-µm Multimode Fiber for Multigigabit Short-Wavelength LANs," IEEE J. Lightwave Technol., vol. 21, no. 5, pp. 1256-1275, May 2003
– D. Kuchta, "Progress in VCSEL based Parallel Links," in VCSELs: Fundamentals, Technology and Applications of Vertical-Cavity Surface-Emitting Lasers, eds. R. Michalzik and F. Koyama, Springer-Verlag, Berlin-Heidelberg-New York, to be published 2011
– D. A. B. Miller, "Optical Interconnects," OFC Tutorial 2010, http://ee.stanford.edu/~dabm

Optical PCB
– F. E. Doany et al., "Terabit/s-Class 24-Channel Bidirectional Optical Transceiver Module Based on TSV Si Carrier for Board-Level Interconnects," ECTC 2010, pp. 58-65, Jun 2010
– D. Jubin et al., "Polymer waveguide-based multilayer optical connector," Proc. SPIE, vol. 7607, 76070K, 2010, doi:10.1117/12.841904
– F. E. Doany et al., "160 Gb/s Bidirectional Polymer-Waveguide Board-Level Optical Interconnects Using CMOS-Based Transceivers," IEEE Trans. Adv. Packag., vol. 32, no. 2, pp. 345-359, May 2009
– R. Dangel et al., "Polymer-Waveguide-Based Board-Level Optical Interconnect Technology for Datacom Applications," IEEE Transactions on Advanced Packaging, vol. 31, pp. 759-767, Nov 2008

Si Photonics
– Y. Vlasov et al., multiple papers can be downloaded at http://www.ibm.research/photonics
– Dries Van Thourhout, "Si Photonics," OFC Tutorial 2010, http://photonics.intec.ugent.be/download/
– D. A. B. Miller, "Optical Interconnects," OFC Tutorial 2010, http://ee.stanford.edu/~dabm
