

Firefly: Illuminating Future Network-on-Chip with Nanophotonics

Yan Pan, Prabhat Kumar, John Kim†, Gokhan Memik, Yu Zhang, Alok Choudhary

EECS Department, Northwestern University
Evanston, IL, USA
{panyan,prabhat-kumar,g-memik,yu-zhang,a-choudhary}@northwestern.edu

† CS Department, KAIST
Daejeon, Korea
[email protected]

Motivation Architecture of Firefly Evaluation Conclusion

ISCA 2009, Yan Pan

On-Chip Network Topologies

Mesh: [MIT RAW], [TILE64], [Teraflops]
C-Mesh: [Balfour’06], [Cianchetti’09]
Crossbar: [Vantrease’08], [Kirman’06]
Others: Torus [Shacham’07], Flattened Butterfly [Kim’07], Dragonfly [Kim’08], Hierarchical (Bus & Mesh) [Das’08], Clos [Joshi’09], Ring [Larrabee], ...

► Network-on-chip is critical for performance.

Signaling technologies

► Electrical signaling
– Repeater insertion needed
– Bandwidth density (up to 8 Gbps/μm) [Chang HPCA’08]
► Nanophotonics
– Bandwidth density ~100 Gbps/μm!!! [Batten HOTI’08]
– Generally distance-independent power consumption
– Speed-of-light low latency
  • Propagation
  • Switching [Cianchetti ISCA’09]

Nanophotonic components

► Basic components

[Figure: basic nanophotonic link. Labels: off-chip laser source, coupler, resonant modulators, resonant detectors, Ge-doped, waveguide]

Resonant Rings

► Selective
– Couple optical energy of a specific wavelength

Radius r: baseline wavelength
Temperature t: manufacturing error correction
Carrier density d: fast tuning by charge injection

Putting it together

► Modulation & detection
– 64 wavelengths DWDM
– 3 ~ 5 μm waveguide pitch
– 10 Gbps per link
– ~100 Gbps/μm bandwidth density [Batten HOTI’08]

[Figure: modulation and detection example with bit streams 11010101 and 10001011]
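The bandwidth-density figure can be sanity-checked from the numbers on this slide. The sketch below is only a back-of-the-envelope calculation: it assumes "10 Gbps per link" means 10 Gbps per wavelength and that density is simply aggregate bandwidth divided by waveguide pitch. The function name `bandwidth_density` is made up for illustration.

```python
def bandwidth_density(wavelengths: int, gbps_per_wavelength: float, pitch_um: float) -> float:
    """Aggregate bandwidth per micrometre of waveguide pitch, in Gbps/um.

    Assumption: every wavelength carries gbps_per_wavelength and one
    waveguide occupies pitch_um of width.
    """
    return wavelengths * gbps_per_wavelength / pitch_um


# Slide values: 64 wavelengths, 10 Gbps each, 3 to 5 um pitch.
for pitch in (3.0, 5.0):
    density = bandwidth_density(64, 10.0, pitch)
    print(f"pitch {pitch} um: {density:.0f} Gbps/um")
```

Under these assumptions the raw product lands between roughly 128 and 213 Gbps/μm, the same order of magnitude as the quoted ~100 Gbps/μm. The slide does not say what accounts for the difference.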

What’s the catch?

► Power cost
– Ring heating
– Laser power
– E/O & O/E conversions
– Distance insensitive
► For short links (2.5 mm)
– Nanophotonics
– Electrical
  • RC lines with repeater insertion
[Batten HOTI’08] [Cheng ISCA’06]

[Figure: per-bit energy (fJ/b), y-axis 0 to 700, for Nanophotonics vs. RC Line; components: Optical Components, Ring Heating, Laser, Electrical]

► For long links
– Nanophotonics
  • Cost stays the same
– Electrical
  • Cost increases
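The short-link versus long-link argument amounts to a crossover: a roughly constant optical cost against an electrical cost that grows with distance. The sketch below assumes a linear electrical model, which the slide does not state (it only says the cost increases). Both energy constants are placeholders chosen for illustration and are not read from the chart.

```python
# Placeholder values, NOT taken from the slide's chart.
OPTICAL_FJ_PER_BIT = 500.0            # assumed distance-independent optical cost
ELECTRICAL_FJ_PER_BIT_PER_MM = 50.0   # assumed linear cost of a repeated RC line


def electrical_energy_fj(length_mm: float, fj_per_bit_per_mm: float) -> float:
    """Per-bit energy of an electrical link under the assumed linear model."""
    return length_mm * fj_per_bit_per_mm


def crossover_length_mm(optical_fj_per_bit: float, fj_per_bit_per_mm: float) -> float:
    """Link length at which electrical and optical per-bit energy are equal."""
    return optical_fj_per_bit / fj_per_bit_per_mm


def cheaper_signaling(length_mm: float) -> str:
    """Pick the lower-energy option for a link of the given length."""
    electrical = electrical_energy_fj(length_mm, ELECTRICAL_FJ_PER_BIT_PER_MM)
    return "electrical" if electrical <= OPTICAL_FJ_PER_BIT else "nanophotonic"


print(crossover_length_mm(OPTICAL_FJ_PER_BIT, ELECTRICAL_FJ_PER_BIT_PER_MM))
print(cheaper_signaling(2.5), cheaper_signaling(20.0))
```

With real per-bit energies substituted for the placeholders, the same comparison indicates which links are worth making optical.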

Here is the idea ...

► Design an architecture that differentiates traffic.
– Use electrical signaling for short links.
– Use nanophotonics only for long-range traffic.
► What do we gain?
– Low latency
– High bandwidth density
– High power efficiency
– Localized arbitration
– Scalability

Outline

► Motivation
► Architecture of Firefly
► Evaluation
► Conclusion

Layout View of 64-core Firefly

► Concentration
– 4 cores share a router
– 16 routers

[Figure: 16 tiles, each with four cores (P0 to P3) attached to one router (R)]

Layout View of 64-core Firefly

► Concentration
► Clusters
– Electrically connected
– Mesh topology
– 4 routers per cluster
– 4 clusters

[Figure: the 16 routers grouped into Cluster 0 (C0) to Cluster 3 (C3); routers labeled C0R0 to C0R3, C1R0 to C1R3, C2R0 to C2R3, C3R0 to C3R3]

Layout View of 64-core Firefly

► Concentration
► Clusters
► Assemblies
– Routers from different clusters
– Optically connected
– Logical crossbars

[Figure: routers with the same index in different clusters are grouped together: {C0R0, C1R0, C2R0, C3R0}, {C0R1, C1R1, C2R1, C3R1}, {C0R2, C1R2, C2R2, C3R2}, {C0R3, C1R3, C2R3, C3R3}; assemblies A0 and A1 are labeled]
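The concentration, cluster, and assembly structure of the 64-core layout can be written down as a small mapping. This is a minimal sketch: the consecutive core numbering and the helper names `router_of_core` and `assembly_of` are assumptions for illustration. The rule that assembly index equals router index is read off the figure's router groupings.

```python
from collections import defaultdict

CORES_PER_ROUTER = 4     # concentration: 4 cores share a router
ROUTERS_PER_CLUSTER = 4  # 4 routers per cluster
NUM_CLUSTERS = 4         # 4 clusters, 16 routers, 64 cores in total


def router_of_core(core: int) -> tuple[int, int]:
    """Return (cluster, router index) for a core id in 0..63.

    Assumption: cores are numbered consecutively, router by router.
    """
    router = core // CORES_PER_ROUTER
    return divmod(router, ROUTERS_PER_CLUSTER)


def assembly_of(cluster: int, router: int) -> int:
    """Routers with the same index in different clusters share an assembly."""
    return router


# Build the assemblies: each one is a logical optical crossbar.
assemblies = defaultdict(list)
for c in range(NUM_CLUSTERS):
    for r in range(ROUTERS_PER_CLUSTER):
        assemblies[assembly_of(c, r)].append(f"C{c}R{r}")

for a in sorted(assemblies):
    print(f"A{a}: {assemblies[a]}")
```

Each assembly ends up with exactly one router from every cluster, so each optical crossbar has only four ports.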

Layout View of 64-core Firefly

► Clusters
– Electrical CMESH
► Assemblies
– Nanophotonic crossbars

[Figure: clusters C0 to C3 with routers R0 to R3, joined by the nanophotonic crossbars of assemblies A0 to A3]

Efficient nanophotonic crossbars needed!

Nanophotonic crossbars

► Single-Write-Multiple-Read (SWMR) [Kirman’06] (CMXbar††)
– Dedicated sending channel
– Multicast in nature
– Receiver compare & discard
– High fan-out laser power

[Figure: SWMR crossbar. Routers R0, R1, ..., RN-1; data channels CH0, CH1, ..., CH(N-1), each labeled w]

†† [Joshi NOCS’09]

Nanophotonic crossbars

► Multiple-Write-Single-Read (MWSR) [Vantrease’08] (DMXbar††)
– Dedicated receiving channel
– Demux to channel
– Global arbitration needed!

[Figure: MWSR crossbar. Routers R0, R1, ..., RN-1; data channels CH0, CH1, ..., CH(N-1), each labeled w]

†† [Joshi NOCS’09]
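The difference between the two crossbar styles comes down to who owns each channel. The sketch below is an illustrative model only, with hypothetical names `channel_writers` and `contended`. In SWMR a channel belongs to its sender, so no two routers write the same channel. In MWSR a channel belongs to its receiver, so simultaneous senders to one destination must be arbitrated.

```python
from collections import defaultdict


def channel_writers(requests, scheme):
    """Map each data channel to the set of routers that want to write it.

    requests: list of (src, dst) router pairs issued in the same cycle.
    scheme:   "SWMR" (channel i is written only by router i) or
              "MWSR" (channel i is read only by router i).
    """
    if scheme not in ("SWMR", "MWSR"):
        raise ValueError(f"unknown scheme: {scheme}")
    writers = defaultdict(set)
    for src, dst in requests:
        channel = src if scheme == "SWMR" else dst
        writers[channel].add(src)
    return dict(writers)


def contended(requests, scheme):
    """Channels with more than one would-be writer, which need arbitration."""
    return sorted(ch for ch, w in channel_writers(requests, scheme).items() if len(w) > 1)


requests = [(0, 3), (1, 3), (2, 0)]   # routers 0 and 1 both send to router 3
print(contended(requests, "SWMR"))    # []
print(contended(requests, "MWSR"))    # [3]
```

The model leaves out the SWMR cost listed on the earlier slide: every receiver has to listen on every sender's channel, which is where the high fan-out laser power comes from.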

Reservation-assisted SWMR

► Goal
– Avoid global arbitration
– Reduce power
► Proposed design
– Reservation channels
  • Narrow
– Multicast to reserve
  • Destination ID
  • Packet length
– Uni-cast data packet

[Figure: R-SWMR crossbar. Routers R0, R1, ..., RN-1; reservation channels CH0a, CH1a, ..., CH(N-1)a, each labeled log (Ns); data channels CH0, CH1, ..., CH(N-1), each labeled w]
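A rough model of the reservation step shows why it saves power: only the narrow reservation flit is multicast, and the wide data packet is then received by a single router. The sketch below reads the figure label "log (Ns)" as log2(N) bits of destination ID plus log2(s) bits of packet length. That reading is an interpretation, and both function names are hypothetical.

```python
import math


def reservation_bits(num_routers: int, max_packet_flits: int) -> int:
    """Width of one reservation flit: destination ID plus packet length.

    Assumption: both fields are plain binary encodings.
    """
    dest_id_bits = math.ceil(math.log2(num_routers))
    length_bits = math.ceil(math.log2(max_packet_flits))
    return dest_id_bits + length_bits


def active_data_receivers(num_routers: int, sender: int, dest: int,
                          reservation_assisted: bool) -> list[int]:
    """Routers that must receive the data packet on the sender's channel.

    Plain SWMR: every other router receives it, compares, and discards.
    R-SWMR:     only the router named in the reservation flit receives it.
    """
    if reservation_assisted:
        return [dest]
    return [r for r in range(num_routers) if r != sender]


print(reservation_bits(4, 8))                    # 2 + 3 = 5 bits
print(active_data_receivers(4, 0, 2, False))     # [1, 2, 3]
print(active_data_receivers(4, 0, 2, True))      # [2]
```

In this model the multicast cost is paid only on a channel a few bits wide, while the w-bit data channel behaves like a point-to-point link.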

Router Microarchitecture

► Virtual-channel router
– Added optical link ports and extra buffer.

[Figure: virtual-channel router. Labels: Routing computation, VC Allocator, Switch Allocator, Crossbar switch, Inject (Input 1), Input k, Eject (Output 1), Output k, VC 1 to VC v, Arbiter, global output (E/O), global input 1 (O/E), global input g (O/E), input buffer]

Dedicated sending channel for all traffic.
Separate receiving channels from other clusters.

Routing

► Routing
– Intra-cluster routing
– Traversing optical link

[Figure: example route starting at C0R0 toward routers C5R0 to C5R3, overlaid on the router microarchitecture and R-SWMR crossbar diagrams; source and destination routers labeled FIREFLY_src and FIREFLY_dest]

[Figure: pipeline timing for head, body, and tail flits. Each of the three rows reads RT LT LT LT LT LT OA RT LT RT LT RT LT RT; a column reading RB, --, -- appears beside the head, body, and tail labels. Stage legend: RT, RB, LT, OA]
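The two routing steps can be sketched as a function over (cluster, router) pairs. The evaluation setup table later in the deck describes Firefly's global routing as intra-cluster routing in the source cluster before traversing nanophotonics. This sketch additionally assumes that the optical hop lands directly on the destination router, since assemblies link same-index routers. The function name and hop format are hypothetical, and an "electrical" segment may span several mesh hops.

```python
Router = tuple[int, int]  # (cluster, router index within the cluster)


def firefly_route(src: Router, dst: Router) -> list[tuple[str, Router, Router]]:
    """Return the route as a list of (link type, from, to) segments."""
    (cs, rs), (cd, rd) = src, dst
    segments = []
    if cs == cd:
        # Same cluster: stay on the electrical concentrated mesh.
        if rs != rd:
            segments.append(("electrical", (cs, rs), (cd, rd)))
        return segments
    if rs != rd:
        # Step 1: move inside the source cluster to the router that shares
        # an assembly with the destination router.
        segments.append(("electrical", (cs, rs), (cs, rd)))
    # Step 2: one traversal of that assembly's optical crossbar.
    segments.append(("optical", (cs, rd), (cd, rd)))
    return segments


print(firefly_route((0, 0), (2, 3)))
print(firefly_route((1, 0), (1, 2)))
```

Under this scheme every packet crosses the optical crossbar at most once, and local traffic never touches the nanophotonic links.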

Firefly – another look

► Clusters
– Short electrical links
– Concentrated mesh
► Assemblies
– Long nanophotonic links
– Partitioned crossbars
► Benefits
– Traffic locality
– Reduced hardware
– Localized arbitration
– Distributed inter-cluster bandwidth

[Figure: logical view. Clusters C0 to C3, each with routers such as C0R0 to C0R3 serving four processors (P) apiece; assemblies A0 to A3 connect routers across clusters]

Outline

► Motivation
► Architecture of Firefly
► Evaluation
► Conclusion

Evaluation Setup

► Cycle-accurate simulator (Booksim)
► Firefly vs. CMESH, Dragonfly† and OP_XBAR
► Synthetic traffic patterns and traces

Code Name | Signaling | Topology | Global Routing | Min #VC
CMESH | Electrical | Concentrated mesh | Dimension-ordered routing | 1
DFLY_MIN | Hybrid | Dragonfly topology mapped to on-chip network | Minimal routing, traversing nanophotonics at most once | 2
DFLY_VAL | Hybrid | Dragonfly topology mapped to on-chip network | Nonminimal routing, traversing nanophotonics up to twice | 3
OP_XBAR | Optical | All-optical crossbar using token-based global arbitration | Destination-based routing | 1
FIREFLY | Hybrid | Proposed hybrid architecture with multiple logical optical inter-cluster crossbars | Intra-cluster routing in the source cluster before traversing nanophotonics | 1

† [Kim et al., ISCA’08]

Load / Latency Curve

► Throughput
– Up to 4.8x over OP_XBAR
– At least +70% over Dragonfly

[Figure: latency (# cycles) vs. injection rate, panels (a) to (d); visible panel titles: Bitcomp, 1-cycle and Uniform, 1-cycle; annotations: 4.8x and 70%]

Energy Breakdown

► Reduced hardware by partitioning
– Reduced heating
► Throughput impact
► Locality
– 34% energy reduction over OP_XBAR with locality

[Figure: average per-packet energy (nJ), axis 0 to 1.2, for CMESH, DFLY_MIN, DFLY_VAL, OP_XBAR, and FIREFLY under Taper_L0.7D7 and Bitcomp traffic; components: Router / DEMUX, Electrical Link, Optical Link, Laser, Ring Heating]

Technology Sensitivity

► α is heating ratio and β is laser ratio.
► Firefly favors traffic locality.

[Figure: sensitivity plots for bitcomp and taper_L0.7D7 traffic]

Conclusion

► Technology impacts architecture
– New opportunities in nanophotonics
  • Low latency, high bandwidth density
– Tailored architectures needed
► Firefly benefits from nanophotonics by providing
– Power efficiency
  • Hybrid signaling
  • Partitioned R-SWMR crossbars: reduced hardware/power
– Scalability
  • Scalable inter-cluster bandwidth
  • Low-radix routers/crossbars