Augmenting FPGAs with Embedded Networks-on-Chip
Mohamed ABDELFATTAH, Vaughn BETZ

Post on 29-Mar-2015

Page 1: Mohamed ABDELFATTAH Vaughn BETZ. 2 Why NoCs on FPGAs? Embedded NoCs 1 1 2 2 Comparison Against Buses 3 3

Augmenting FPGAs with Embedded Networks-on-Chip

Mohamed ABDELFATTAH, Vaughn BETZ

Outline

1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Comparison Against Buses

Motivation (1. Why NoCs on FPGAs?)

[Figure: FPGA fabric. Labels: Interconnect, Logic Blocks, Switch Blocks, Wires]

Hard Blocks:
• Memory
• Multiplier
• Processor

Hard Interfaces: DDR/PCIe ..

Interconnect still the same

[Figure labels: 1600 MHz, 200 MHz, 800 MHz]

Motivation (1. Why NoCs on FPGAs?)

[Figure: FPGA with hard interfaces. Labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet, 1600 MHz, 200 MHz, 800 MHz]

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
 – Huge CAD Problem
 – Slow compilation
 – Power/area utilization
4. Wire speed not scaling:
 – Delay is interconnect-dominated

Keep the “roads”, but add “freeways”.

[Figure: Barcelona and Los Angeles. Labels: Hard Blocks, Logic Cluster. Source: Google Earth]

FPGA with NoC (1. Why NoCs on FPGAs?)

[Figure: FPGA with NoC. Labels: NoC, Routers, Links, DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet]
• Router forwards data packet
• Router moves data to local interconnect

Problems:
1. Bandwidth requirements for hard logic/interfaces
2. Timing closure
3. High interconnect utilization:
 – Huge CAD Problem
 – Slow compilation
 – Power/area utilization
4. Wire speed not scaling:
 – Delay is interconnect-dominated
5. Abstraction favours modularity:
 – Parallel compilation
 – Partial reconfiguration
 – Multi-chip interconnect

FPGA with NoC:
• High bandwidth endpoints known
• Pre-design NoC to requirements
• NoC links are “re-usable”
• NoC is heavily “pipelined”
• NoC abstraction favors modularity
• Latency-tolerant communication
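The two router actions named on these slides (forward a data packet, or move data to the local interconnect) can be illustrated with a small sketch. The slides do not say which routing policy the routers use; dimension-ordered (XY) routing on a mesh is assumed here purely for illustration, and `next_hop`, `trace` and the port names are hypothetical.

```python
# Hypothetical sketch: XY routing is an assumed policy, not taken from the slides.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def next_hop(current, dest):
    """Return the output port a router at `current` picks for a packet to `dest`."""
    (cx, cy), (dx, dy) = current, dest
    if cx != dx:  # first correct the X coordinate
        return "EAST" if dx > cx else "WEST"
    if cy != dy:  # then correct the Y coordinate
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"  # arrived: hand the data to the local interconnect


def trace(src, dest):
    """List every router a packet visits, source and destination included."""
    path = [src]
    x, y = src
    while (hop := next_hop((x, y), dest)) != "LOCAL":
        sx, sy = STEP[hop]
        x, y = x + sx, y + sy
        path.append((x, y))
    return path


print(trace((0, 0), (2, 1)))
```

Every intermediate router only forwards; the destination router is the only one that returns `LOCAL` and delivers the data to the fabric.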

Compute Acceleration (1. Why NoCs on FPGAs?)

• Maxeler
 – Geoscience (14x, 70x)
 – Financial analysis (5x, 163x)
• Altera OpenCL
 – Video compression (3x, 114x)
 – Information filtering (5.5x)

[Figure: compute acceleration system. Legend: GPU, CPU. Final build label: NoC]

Outline

1. Why NoCs on FPGAs?
2. Embedded NoCs
 – Mixed NoCs
 – Hard NoCs
3. Comparison Against Buses

Embedded NoCs (2. Embedded NoCs)

[Figure: FPGA with an embedded NoC. Labels: DDRx Interface, PCIe Interface, Router, Compute Module, Links (Hard or Soft), Fabric Port (Hard or Soft)]

• “Soft” NoC = Soft Links + Soft Routers
• “Mixed” NoC = Soft Links + Hard Routers
• “Hard” NoC = Hard Links + Hard Routers

Methodology

• Soft: FPGA CAD Tools
• Hard: ASIC CAD Tools (Design Compiler)
• Mixed: HSPICE
• Metrics: Area, Speed, Power
• Power: toggle rates from gate-level simulation (listed for both the soft and hard flows)

Mixed NoCs (2. Embedded NoCs)

“Mixed” NoC = Soft Links + Hard Routers

[Figure: FPGA with hard router logic connected to logic blocks through the programmable “soft” interconnect]

Baseline Router

Width   VCs   Ports   Buffer
32      2     5       10/VC
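As a rough illustration of what the baseline router parameters imply, the sketch below estimates total buffer storage. It assumes that every one of the 5 ports has its own buffers, that "10/VC" means 10 entries per virtual channel, and that each entry is one 32-bit word; none of these is stated on the slide, and `RouterConfig` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    """Baseline router parameters from the slide (units are assumptions)."""
    width_bits: int = 32           # link width, assumed to be in bits
    num_vcs: int = 2               # virtual channels per port
    num_ports: int = 5             # router ports
    buffer_depth_per_vc: int = 10  # assumed: entries of width_bits each

    def buffer_bits(self) -> int:
        """Total buffer storage if every port buffers every VC."""
        return (self.num_ports * self.num_vcs
                * self.buffer_depth_per_vc * self.width_bits)


baseline = RouterConfig()
print(baseline.buffer_bits(), "bits of buffering per router")
```

Under these assumptions a router holds 5 × 2 × 10 × 32 bits of buffering; changing any field shows how quickly storage grows with width or VC count.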

Mixed NoCs (2. Embedded NoCs)

[Figure: FPGA with hard router logic and programmable interconnect]

Special Feature: Configurable topology
• Assumed a mesh
• Can form any topology

Hard NoCs (2. Embedded NoCs)

“Hard” NoC = Hard Links + Hard Routers

[Figure: FPGA with hard router logic. Labels: Dedicated “hard” interconnect, Programmable “soft” interconnect, Logic blocks]

Special Feature: Low-V mode
• 1.1 V → 0.9 V
• Save 33% Dynamic Power
• ~15% slower
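The 33% figure is consistent with the standard first-order model in which dynamic power scales with the square of the supply voltage. The sketch below checks that arithmetic. It assumes switched capacitance, activity and clock frequency are unchanged, which the slide does not state, and `dynamic_power_ratio` is a hypothetical helper.

```python
def dynamic_power_ratio(v_new: float, v_old: float, f_ratio: float = 1.0) -> float:
    """Ratio of new to old dynamic power under the P ~ C * V^2 * f model.

    f_ratio is new_frequency / old_frequency (1.0 means unchanged).
    """
    return (v_new / v_old) ** 2 * f_ratio


# Voltage drop from the slide: 1.1 V down to 0.9 V, same frequency assumed.
saving = 1.0 - dynamic_power_ratio(0.9, 1.1)
print(f"Dynamic power saving: {saving:.1%}")
```

Passing a value below 1.0 for `f_ratio` models a clock that is also slowed down, which would increase the saving further.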

Soft, Mixed and Hard (3. Area/Power Analysis)

64-node NoC on Stratix III [65 nm]

• Area: Soft ~12,500 LBs (33% of FPGA); Mixed 576 LBs, Hard 448 LBs (~1.5% of FPGA)
• Speed: Soft 166 MHz; Mixed and Hard 730 – 940 MHz
• Bisection BW: Soft ~10 GB/s; Mixed and Hard ~50 GB/s

(The second build of this slide labels the Hard column “Hard (Low-V)”.)

• Provides ~50 GB/s peak bisection bandwidth
• Very Cheap! Less than cost of 3 soft nodes
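Two of the claims on this slide can be checked with simple arithmetic. The sketch below assumes that the 64 nodes form an 8×8 mesh with one 32-bit link in each direction between neighbouring routers, and that 576 LBs is the mixed NoC figure; the slide gives the numbers but not this layout, so the bisection estimate is only a ballpark. All names are hypothetical.

```python
SOFT_NOC_LBS = 12_500  # whole soft NoC, from the slide
MIXED_NOC_LBS = 576    # assumed to be the mixed NoC figure
NODES = 64

# "Less than cost of 3 soft nodes": compare against three soft routers.
soft_lbs_per_node = SOFT_NOC_LBS / NODES
three_soft_nodes = 3 * soft_lbs_per_node
print(f"3 soft nodes: {three_soft_nodes:.0f} LBs vs embedded NoC: {MIXED_NOC_LBS} LBs")


def mesh_bisection_gbps(side: int, width_bits: int, freq_mhz: float) -> float:
    """Bisection bandwidth in GB/s of a side x side mesh.

    Cutting the mesh in half severs `side` links in each direction.
    """
    links_cut = 2 * side
    bytes_per_cycle = links_cut * width_bits / 8
    return bytes_per_cycle * freq_mhz * 1e6 / 1e9


for label, mhz in [("soft", 166), ("mixed", 730), ("hard", 940)]:
    print(f"{label:5s} {mesh_bisection_gbps(8, 32, mhz):5.1f} GB/s")
```

With these assumptions the soft NoC lands near the slide's ~10 GB/s and the embedded NoCs bracket the ~50 GB/s figure.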

NoC Power Budget (3. Area/Power Analysis)

How much is used for system-level communication?

• Typical FPGA Dynamic Power: 17.4 W (Largest Stratix-III device)
• NoC load: 250 GB/s total bandwidth
• NoC power as a share of typical FPGA dynamic power:
 – Soft NoC: 123%
 – Mixed NoC: 15%
 – Hard NoC: 11%
 – Hard NoC (Low-V): 7%
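To put the percentages in absolute terms, the sketch below converts them to watts and to energy per byte moved. It assumes each percentage is a fraction of the 17.4 W typical dynamic power and that the full 250 GB/s is being transferred; the function names are hypothetical.

```python
TYPICAL_DYNAMIC_POWER_W = 17.4  # largest Stratix III device, from the slide
TOTAL_BANDWIDTH_GBPS = 250.0    # total NoC bandwidth, from the slide

# NoC power as a fraction of typical FPGA dynamic power, from the slide.
NOC_SHARE = {
    "Soft NoC": 1.23,
    "Mixed NoC": 0.15,
    "Hard NoC": 0.11,
    "Hard NoC (Low-V)": 0.07,
}


def noc_power_watts(share: float) -> float:
    """Absolute NoC power, assuming `share` is relative to 17.4 W."""
    return share * TYPICAL_DYNAMIC_POWER_W


def energy_per_byte_pj(share: float) -> float:
    """Energy per transferred byte in picojoules at the full 250 GB/s."""
    return noc_power_watts(share) / (TOTAL_BANDWIDTH_GBPS * 1e9) * 1e12


for name, share in NOC_SHARE.items():
    print(f"{name:17s} {noc_power_watts(share):6.2f} W "
          f"{energy_per_byte_pj(share):6.1f} pJ/byte")
```

The low-voltage figure is also roughly what a 33% dynamic power saving applied to the 11% hard NoC would give (about 7.4%).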

Bandwidth in Perspective (3. Area/Power Analysis)

[Figure: chip with DDR3, PCIe, Module 1 and Module 2. Link labels: 4 × 14.6 GB/s and 4 × 17 GB/s]

• Full theoretical BW
• Aggregate Bandwidth: 126 GB/s
• NoC Power Budget: 3.5%
• Cross whole chip!
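The aggregate figure follows from summing the eight link labels in the figure. The sketch below does that sum and then scales the power budget linearly with bandwidth from the 7% quoted for the low-voltage hard NoC at 250 GB/s. Linear scaling and the choice of the 7% reference point are assumptions, not something the slide states.

```python
# Link labels from the figure: four at 14.6 GB/s and four at 17 GB/s.
LINK_BANDWIDTHS_GBPS = [14.6] * 4 + [17.0] * 4

aggregate_gbps = sum(LINK_BANDWIDTHS_GBPS)
print(f"Aggregate bandwidth: {aggregate_gbps:.1f} GB/s")

# Assumed reference point: Hard NoC (Low-V) uses 7% of dynamic power at 250 GB/s.
REFERENCE_SHARE = 0.07
REFERENCE_BANDWIDTH_GBPS = 250.0

# Assumption: NoC power scales linearly with the bandwidth carried.
scaled_share = REFERENCE_SHARE * aggregate_gbps / REFERENCE_BANDWIDTH_GBPS
print(f"Scaled NoC power budget: {scaled_share:.1%}")
```

Under these assumptions the result rounds to the 126 GB/s and 3.5% shown on the slide.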

Outline

1. Why NoCs on FPGAs?
2. Embedded NoCs
3. Comparison Against Buses
 – Design Effort
 – Area/Power Efficiency

DDR3: Qsys Bus vs. NoC (4. Comparison)

• Qsys bus: Build logical bus from fabric
• Embedded NoC: 16 Nodes, hard routers & links

“The Case for Embedded Networks-on-Chip on FPGAs”, to appear in IEEE Micro Magazine (February)

Design Effort (4. Comparison)

• Steps to close timing using Qsys

[Figure: FPGA. Labels: close, far]

Timing closure can be simplified with an embedded NoC

Area Comparison (4. Comparison)

[Figure: area comparison plots]

• Entire NoC smaller than bus for 3 modules!
• Even with only 1/8 of the hard NoC bandwidth used, the NoC already takes less area for most systems

Power Comparison (4. Comparison)

Hard NoC saves power for even the simplest systems

1. Why NoCs on FPGAs?
• Big city needs freeways to handle traffic

2. Embedded NoCs: Mixed & Hard
• Area: 20-23X, Speed: 5-6X, Power: 9-15X
• Area Budget for 64 nodes: ~1%
• Power Budget for 100 GB/s: 3-7%

3. Comparison Against P2P/Buses
• Raw efficiency close to simplest P2P links
• NoC more efficient & lower design effort.


Thank You!

www.eecg.utoronto.ca/~mohamed
