Integrated Systems Group Massachusetts Institute of Technology Monolithic Integration of Energy-efficient CMOS Silicon Photonic Interconnects Vladimir Stojanović

Page 1: Monolithic Integration of Energy-efficient CMOS Silicon

Integrated Systems Group

Massachusetts Institute of Technology

Monolithic Integration of Energy-efficient CMOS Silicon Photonic Interconnects

Vladimir Stojanović

Page 2: Monolithic Integration of Energy-efficient CMOS Silicon

Manycore SOC roadmap fuels bandwidth demand

64-tile system (64-256 cores)

- 4-way SIMD FMACs @ 2.5-5 GHz

- 5-10 TFlops on one chip

- Need 5-10 TB/s of off-chip I/O

- Even higher on-chip bandwidth
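The 5-10 TFlops figure can be checked with a back-of-the-envelope calculation. The sketch below assumes the 256-core end of the range and counts one FMAC as two floating-point operations; the 1 Byte/Flop ratio is taken from the pin-count slide later in the deck. The function names are illustrative only.

```python
# Rough peak-throughput estimate for the 64-tile system.
# Assumption (not on the slide): one FMAC counts as 2 floating-point operations.
FLOPS_PER_FMAC = 2


def peak_tflops(cores: int, simd_width: int, clock_ghz: float) -> float:
    """Peak TFlop/s for `cores` cores, each with a `simd_width`-way FMAC unit."""
    return cores * simd_width * FLOPS_PER_FMAC * clock_ghz / 1000.0


def io_tbytes_per_s(tflops: float, bytes_per_flop: float = 1.0) -> float:
    """Off-chip bandwidth in TB/s implied by a byte-per-flop ratio."""
    return tflops * bytes_per_flop


if __name__ == "__main__":
    for ghz in (2.5, 5.0):
        t = peak_tflops(cores=256, simd_width=4, clock_ghz=ghz)
        print(f"{ghz} GHz: {t:.2f} TFlop/s, {io_tbytes_per_s(t):.2f} TB/s off-chip")
```

With these assumptions, 256 cores at 2.5 GHz and 5 GHz land close to the 5 and 10 TFlops endpoints quoted above.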

[Figure: die photos with 2 cm scale markers; labels: Intel 48 core, Xeon]

Page 3: Monolithic Integration of Energy-efficient CMOS Silicon

System Bottlenecks

[Figure: block diagrams of a CPU-based system and a manycore system. Labels: CPU, cores, Cache/MC, DRAM DIMM, Interconnect Network]

Bottlenecks due to energy and bandwidth density limitations

Page 4: Monolithic Integration of Energy-efficient CMOS Silicon

Wire and I/O scaling

Increased wire resistivity makes wire caps scale very slowly

Can’t get both energy efficiency and a high data rate in I/O

[Figure, on-chip wires: copper resistivity]

[Figure, I/O: energy-cost [pJ/b] vs. data-rate [Gb/s] for the best electrical links; series: Chip2Chip, Backplane; annotations: Loss ~10dB, Loss ~20-25dB]

Page 5: Monolithic Integration of Energy-efficient CMOS Silicon

Bandwidth, pin count and power scaling

Need 16k pins in 2017 for HPC*

1 Byte/Flop

256 cores

2 TFlop/s signal pins @ 20 Gb/s/link

[Figure: package pin count over time; data label: 2,4 cores]

* More than half of the pins are for power supply
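One way to arrive at a pin count of this order is sketched below. It assumes differential signaling (two pins per 20 Gb/s link) and that exactly half of the package pins carry signals; neither assumption is stated on the slide, and this may not be the calculation behind the 16k figure. The 10 TB/s input is the upper end of the off-chip I/O requirement given earlier in the deck.

```python
def signal_links(bandwidth_tbytes_per_s: int, link_gbps: int = 20) -> int:
    """Number of links needed to carry the bandwidth (ceiling division)."""
    total_gbps = bandwidth_tbytes_per_s * 8 * 1000  # TB/s -> Gb/s
    return -(-total_gbps // link_gbps)


def package_pins(bandwidth_tbytes_per_s: int, link_gbps: int = 20,
                 pins_per_link: int = 2, total_pins_per_signal_pin: int = 2) -> int:
    """Total package pins.

    pins_per_link=2 assumes differential signaling (hypothetical).
    total_pins_per_signal_pin=2 assumes half of all pins are power/ground.
    """
    links = signal_links(bandwidth_tbytes_per_s, link_gbps)
    return links * pins_per_link * total_pins_per_signal_pin


if __name__ == "__main__":
    for tb in (2, 10):
        print(f"{tb} TB/s -> {signal_links(tb)} links, {package_pins(tb)} pins")
```

Under these assumptions, 10 TB/s needs 4,000 links and 16,000 pins, while 2 TB/s (2 TFlop/s at 1 Byte/Flop) needs 3,200 pins.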


Page 6: Monolithic Integration of Energy-efficient CMOS Silicon

Monolithic CMOS-Photonics in Computer Systems

- Supercomputers

- Embedded apps

Si-photonics in advanced CMOS and DRAM process

NO costly process changes

Page 7: Monolithic Integration of Energy-efficient CMOS Silicon

Many architectural studies show promise

[Shacham’07]

[Petracca’08]

[Vantrease’08]

[Psota’07]

[Kirman’06]

[Joshi’09]

[Pan’09]

[Batten’08] [Kurian’10] [Koka’08-10]


Page 8: Monolithic Integration of Energy-efficient CMOS Silicon

Optimization requires full system insight

Developed cross-layer modeling framework (Kurian, Chen 2011)

[Figure labels: Cache & Core; Energy & Area]

DSENT: electrical and optical link and network models

Page 9: Monolithic Integration of Energy-efficient CMOS Silicon

Start at the link level:

Jointly optimize circuits and photonic devices

[Figure: DWDM link block diagram, drawn twice. Transmit side: register, mux, pre-driver, modulator driver. Receive side: receiver front-end, samplers and monitoring, demux, register. Clocking: PLL or optical clock, phase adjust. Channels labeled 1 to 4.]

Dense WDM: 128 wavelengths/waveguide, >1 Tb/s per waveguide

Need 1000’s of transceivers on die with <100 fJ/bit cost at >10 Gb/s!

- Optimized modulator circuits/devices

- Optimized receiver circuits/photo-detector

- Optimized thermal tuning
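The targets above can be related with simple arithmetic. The sketch below plugs in the boundary values from the slide (10 Gb/s per wavelength, 100 fJ/bit, 1000 transceivers) as if they were exact; the function names are illustrative.

```python
def waveguide_tbps(wavelengths: int, gbps_per_wavelength: float) -> float:
    """Aggregate throughput of one waveguide in Tb/s."""
    return wavelengths * gbps_per_wavelength / 1000.0


def transceiver_power_mw(fj_per_bit: float, gbps: float) -> float:
    """Power of one transceiver in mW: fJ/bit * Gb/s = uW, then /1000."""
    return fj_per_bit * gbps / 1000.0


if __name__ == "__main__":
    print(f"{waveguide_tbps(128, 10):.2f} Tb/s per waveguide")
    per_link_mw = transceiver_power_mw(100, 10)
    print(f"{per_link_mw:.1f} mW per transceiver, "
          f"{1000 * per_link_mw / 1000:.1f} W for 1000 transceivers")
```

At these boundary values, 128 wavelengths give 1.28 Tb/s per waveguide, and 1000 transceivers stay at about 1 W in total, which suggests why the per-bit energy target matters at this transceiver count.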

Page 10: Monolithic Integration of Energy-efficient CMOS Silicon

Laser energy increases with data-rate

Limited Rx sensitivity

Modulation more expensive -> extinction ratio / insertion loss trade-off

Tuning costs decrease with data-rate

Moderate data rates most energy-efficient

[Figure: DWDM link block diagram, repeated from Page 9]

512 Gb/s aggregate throughput, assuming 32nm CMOS

Georgas CICC 2011

Need to optimize carefully
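The trade-off on this page can be illustrated with a toy model: a fixed tuning power is amortized over more bits as the per-wavelength data rate rises, while laser energy per bit grows with data rate. The functional form and every coefficient below are hypothetical, chosen only to show the shape of the curve; this is not the model from Georgas CICC 2011.

```python
# Toy model only: coefficients are hypothetical placeholders, not measured values.
def energy_per_bit_fj(rate_gbps: float,
                      tuning_power_uw: float = 100.0,
                      laser_fj_per_gbps: float = 2.0) -> float:
    """Energy per bit in fJ at a given per-wavelength data rate."""
    tuning_fj = tuning_power_uw / rate_gbps   # uW / (Gb/s) = fJ/bit, falls with rate
    laser_fj = laser_fj_per_gbps * rate_gbps  # assumed to rise linearly with rate
    return tuning_fj + laser_fj


def best_rate_gbps(candidates):
    """Candidate data rate with the lowest modeled energy per bit."""
    return min(candidates, key=energy_per_bit_fj)


def wavelengths_needed(aggregate_gbps: int, rate_gbps: int) -> int:
    """Wavelengths required to reach the aggregate throughput (ceiling)."""
    return -(-aggregate_gbps // rate_gbps)


if __name__ == "__main__":
    rates = [1, 2, 4, 8, 16, 32]
    for r in rates:
        print(f"{r:>2} Gb/s: {energy_per_bit_fj(r):.2f} fJ/bit")
    best = best_rate_gbps(rates)
    print(f"best: {best} Gb/s, {wavelengths_needed(512, best)} wavelengths for 512 Gb/s")
```

With these placeholder coefficients the minimum falls at a moderate rate rather than at either end of the sweep, which is the qualitative point of the slide.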


Page 11: Monolithic Integration of Energy-efficient CMOS Silicon

DWDM link efficiency optimization

Optimize for min energy-cost

Bandwidth density dominated by circuit and photonics area (not coupler pitch)

- >10x better than the electrical bump-pitch limit (<1 Tb/s/mm2)

- 200x better than the electrical package pin limit (0.05 Tb/s/mm2)
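The two comparisons on this page can be cross-checked against each other. The sketch below treats the bump-pitch bound and the 10x factor as if they were exact values, although the slide gives them as inequalities, so the result is an order-of-magnitude reading only.

```python
# Values from the slide, in Gb/s/mm2 to keep the arithmetic in integers.
PACKAGE_PIN_LIMIT = 50    # 0.05 Tb/s/mm2
BUMP_PITCH_LIMIT = 1000   # slide says "<1 Tb/s/mm2"; upper bound used here


def implied_density(baseline_gbps_mm2: int, improvement: int) -> int:
    """Photonic bandwidth density implied by a baseline and an improvement factor."""
    return baseline_gbps_mm2 * improvement


if __name__ == "__main__":
    from_pins = implied_density(PACKAGE_PIN_LIMIT, 200)
    from_bumps = implied_density(BUMP_PITCH_LIMIT, 10)
    print(f"from pin limit:  {from_pins / 1000} Tb/s/mm2")
    print(f"from bump limit: {from_bumps / 1000} Tb/s/mm2")
```

Both routes point to a photonic bandwidth density on the order of 10 Tb/s/mm2.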


Page 12: Monolithic Integration of Energy-efficient CMOS Silicon

Photonic DRAM Network Organization

Important Concepts

- Power/message switching (only to active DRAM chip in DRAM cube/super DIMM)

- Vertical die-to-die coupling (minimizes cabling; 8 dies per DRAM cube)

- Command distributed electrically (broadcast)

- Data photonic (single writer, multiple readers)

[Figure: photonic DRAM network organization (Beamer ISCA 2010). Processor die: CPU, memory scheduler, MC 1 to MC 16 (MC K). Super DIMM K: DRAM cube 1 to DRAM cube 4, with cmd, Dwr, and Drd links shown for (cube 1, die 1) through (cube 1, die 8). Legend: die-die switch, laser in, modulator bank, receiver/PD bank, tunable filterbank, through silicon via, through silicon via hole.]

Page 13: Monolithic Integration of Energy-efficient CMOS Silicon

Optimizing DRAM with photonics

Floorplan

Beamer ISCA 2010

[Figure labels: P1, P4]

Page 14: Monolithic Integration of Energy-efficient CMOS Silicon

Laser Power Guiding Effectiveness

Enables capacity scaling per channel and significant savings in laser energy

Beamer ISCA 2010
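A toy comparison suggests why guiding laser power only to the active die saves energy as capacity grows. The model below is an assumption for illustration, not the analysis in Beamer ISCA 2010: without guiding, power is split evenly across all dies on a channel; with guiding, the light passes a chain of die-to-die switches, each with a hypothetical insertion loss.

```python
def split_laser_power_mw(required_mw: float, n_dies: int) -> float:
    """Laser power if light is split evenly and every die must get `required_mw`."""
    return required_mw * n_dies


def guided_laser_power_mw(required_mw: float, n_switches: int,
                          switch_loss_db: float) -> float:
    """Laser power if light is steered to one die through lossy switches."""
    total_loss_db = n_switches * switch_loss_db
    return required_mw * 10 ** (total_loss_db / 10)


if __name__ == "__main__":
    n = 8            # dies per DRAM cube, from the slide deck
    need = 1.0       # hypothetical power needed at the active die, in mW
    loss = 0.5       # hypothetical loss per die-die switch, in dB
    print(f"split:  {split_laser_power_mw(need, n):.2f} mW")
    print(f"guided: {guided_laser_power_mw(need, n, loss):.2f} mW")
```

In this model the split case grows linearly with the number of dies, while the guided case grows only with accumulated switch loss, so the benefit depends on keeping per-switch loss low.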

Page 15: Monolithic Integration of Energy-efficient CMOS Silicon

ATAC – On-Chip network Example

1000 core die

64 clusters connected via optical broadcast

Page 16: Monolithic Integration of Energy-efficient CMOS Silicon

Average Energy over Splash2 benchmarks

Ring tuning very expensive

Non-gated laser very expensive
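The cost of a non-gated laser can be sketched as a fixed power drawn regardless of traffic. The model below assumes ideal gating (no wake-up time or energy overhead) and uses hypothetical numbers; the same amortization argument applies to any always-on power such as ring tuning.

```python
def laser_energy_per_bit_fj(laser_power_uw: float, rate_gbps: float,
                            utilization: float, gated: bool) -> float:
    """Laser energy per delivered bit in fJ (uW / (Gb/s) = fJ/bit).

    utilization: fraction of time the link carries data, in (0, 1].
    gated: if True, the laser is assumed to be on only while data is sent.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    on_fraction = utilization if gated else 1.0
    return laser_power_uw * on_fraction / (rate_gbps * utilization)


if __name__ == "__main__":
    for util in (1.0, 0.5, 0.1):
        always_on = laser_energy_per_bit_fj(100, 10, util, gated=False)
        gated = laser_energy_per_bit_fj(100, 10, util, gated=True)
        print(f"utilization {util:.1f}: non-gated {always_on:.1f} fJ/b, gated {gated:.1f} fJ/b")
```

At low utilization the non-gated energy per bit scales as 1/utilization, while the ideally gated case stays flat.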

Page 17: Monolithic Integration of Energy-efficient CMOS Silicon

Including the cores gives the full picture

Energy dominated by cores/caches

Faster network saves overall energy (leakage and clock)

Need aggressive clock-gating and supply/retention scaling
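The claim that a faster network saves overall energy follows from the time-dependent part of the energy budget. The sketch below splits total energy into dynamic energy plus leakage and clock power multiplied by execution time; all numbers are hypothetical and only illustrate the direction of the effect.

```python
def total_energy_j(core_dynamic_j: float, network_dynamic_j: float,
                   static_power_w: float, exec_time_s: float) -> float:
    """Total energy: dynamic parts plus (leakage + clock) power times run time."""
    return core_dynamic_j + network_dynamic_j + static_power_w * exec_time_s


if __name__ == "__main__":
    # Hypothetical comparison: the faster network costs more network energy
    # but shortens execution time, so less leakage and clock energy is spent.
    slow = total_energy_j(core_dynamic_j=10.0, network_dynamic_j=1.0,
                          static_power_w=20.0, exec_time_s=1.0)
    fast = total_energy_j(core_dynamic_j=10.0, network_dynamic_j=2.0,
                          static_power_w=20.0, exec_time_s=0.8)
    print(f"slow network: {slow:.1f} J, fast network: {fast:.1f} J")
```

The static term is also what clock-gating and supply/retention scaling attack: lowering `static_power_w` reduces the penalty of a long run time.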

Page 18: Monolithic Integration of Energy-efficient CMOS Silicon

Execution time also matters


Page 19: Monolithic Integration of Energy-efficient CMOS Silicon

Feedback to device designers

Waveguide losses of up to 2 dB/cm are o.k.
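To give a feel for what a 2 dB/cm budget means, the sketch below converts a per-centimeter loss into the fraction of optical power that survives a path. The path lengths are illustrative; 2 cm matches the die edge shown earlier in the deck.

```python
def remaining_fraction(loss_db_per_cm: float, length_cm: float) -> float:
    """Fraction of optical power left after a waveguide of the given length."""
    total_loss_db = loss_db_per_cm * length_cm
    return 10 ** (-total_loss_db / 10)


if __name__ == "__main__":
    for length_cm in (1, 2, 4):
        frac = remaining_fraction(2.0, length_cm)
        print(f"{length_cm} cm at 2 dB/cm: {frac * 100:.1f}% of power remains")
```

Because loss in dB adds linearly with length, the laser power needed to compensate grows exponentially with path length, which is why this budget is useful feedback for device designers.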


Page 20: Monolithic Integration of Energy-efficient CMOS Silicon

Conclusions

- Biggest gains if photonics is used both on-chip and off-chip

  - Core-to-MC network

  - MC-to-DRAM bank network: immediate 10x gains

- Need comprehensive modeling framework to see the full picture

  - Link-level: tight interaction of circuits and photonics through good models

  - System-level: include all system components (cores, network, caches, memory)

Page 21: Monolithic Integration of Energy-efficient CMOS Silicon

Acknowledgments

Krste Asanović, Rajeev Ram, Miloš Popović, Christopher Batten, Ajay Joshi

Anant Agarwal, Li-Shiuan Peh, Lionel Kimerling, Jurgen Michel, Dimitri Antoniadis

Jason Miller, Jeff Shainline

Jason Orcutt, Chen Sun, Ben Moss, Jonathan Leu, Michael Georgas, Stevan Urosević, Owen Chen, George Kurian, Yong-Jin Kwon, Scott Beamer

Dr. Jag Shah and Dr. Charles Holland, DARPA

FCRP IFC, NSF

Trusted Foundry, Intel Corporation, APIC