
High-Level Interconnect Architectures for FPGAs

Nick Barrow-Williams


Introduction

Semiconductor industry has grown rapidly for several decades

Continued shrinking of device dimensions introduces new design challenges

Moving data around a chip can now be the limiting factor of performance

Existing solutions do not scale well


Why do existing solutions not scale?

Global connections are longer

Wire depth increased to counter width decrease

Parasitic capacitive effects increase and cause slow signal propagation


Why do existing solutions not scale?

Existing system-level connection uses buses

Buses increase resource efficiency and decrease wiring congestion

Not suitable for a large number of modules

A network-based alternative would offer higher aggregate bandwidth


Why design for FPGA systems?

FPGA silicon area already dominated by wiring

Global wires are limited in number

Increasing gate count only increases wiring congestion


The Solution: Network-on-Chip

Use technologies from network systems

Replace inefficient global wiring with high-level interconnection network

Create scalable systems to handle large numbers of modules


Existing Solutions

Most existing systems are for ASIC designs: Stanford Interconnect, RAW, SCALE, SPIN

PNoC: a solution for FPGAs, but complex and with high hardware cost

Other simulated solutions exist but few are implemented


Proposal: Two network systems

Existing solutions use either packet switching or circuit switching techniques

Design, implement, test and synthesise one of each to compare performance and hardware cost

Map solutions to an FPGA platform to evaluate hardware cost in current generation systems


Network Architecture Design

Topology requirements: simple, scalable, two-dimensional

Solution: 2D mesh topology
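As a rough illustration of why a 2D mesh counts as simple and scalable, the sketch below lists the neighbours of a node and the minimum hop count between two nodes. The function names and the 4x4 mesh size are assumptions made for this example and are not taken from the presented design.

```python
def mesh_neighbours(x, y, width, height):
    """Return the coordinates of the nodes directly linked to (x, y)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # Keep only the candidates that fall inside the mesh boundary.
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < width and 0 <= cy < height]


def min_hops(src, dst):
    """Minimum number of links between two mesh nodes (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


# Assumed 4x4 mesh: a corner node has two links, an interior node has four.
print(mesh_neighbours(0, 0, 4, 4))
print(mesh_neighbours(1, 2, 4, 4))
print(min_hops((0, 3), (1, 0)))
```

Every node needs at most four network links regardless of mesh size, which is what keeps the per-router hardware constant as the system grows.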


Network Architecture Design

Routing Algorithm

Deterministic: data always follows the same path through the network; simple hardware; sensitive to congestion

Adaptive: paths through the network can change according to load; complex hardware; avoids congestion
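One common deterministic scheme for a 2D mesh is dimension-order (XY) routing, sketched below. The slides do not say which deterministic algorithm was used, so XY routing, the port names and the coordinate convention here are all assumptions chosen for illustration.

```python
# Hypothetical port names; the slides do not name the router ports.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_next_port(current, dest):
    """Choose the output port for one hop: correct X first, then Y."""
    cx, cy = current
    dx, dy = dest
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"  # packet has arrived at its destination


def xy_route(src, dest):
    """Return the full list of nodes visited from src to dest."""
    path = [src]
    node = src
    while (port := xy_next_port(node, dest)) != "LOCAL":
        step_x, step_y = STEP[port]
        node = (node[0] + step_x, node[1] + step_y)
        path.append(node)
    return path


print(xy_route((0, 3), (1, 0)))
# [(0, 3), (1, 3), (1, 2), (1, 1), (1, 0)]
```

Because the decision depends only on the current and destination coordinates, the same source and destination always yield the same path, which is the property the deterministic option trades against congestion sensitivity.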


Network Architecture Design

When choosing a routing algorithm, two hazards must be avoided:

Deadlock. Solution: use unidirectional wiring and allow each node to make two connections

Livelock. Solution: use deterministic routing


Network Architecture Design

Flow control methods

Circuit switched: circuit request propagates through the network; path is reserved to the destination; grant signal propagates back; data is sent, then the circuit is deallocated

Packet switched: packets use a header, body and tail; wormhole routing forwards the header and body without waiting for the tail; buffers are needed to store stalled packets
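The packet switched option can be sketched in software as a packet split into flits plus a small input queue that stalls the sender when full. This is only a behavioural illustration: the flit tuple layout, the choice to carry the last data word in the tail flit, the class name `InputQueue` and the queue depth of 2 are assumptions, not details from the slides.

```python
from collections import deque


def make_flits(dest, payload):
    """Split a non-empty payload into header, body and tail flits."""
    flits = [("HEAD", dest)]
    flits += [("BODY", word) for word in payload[:-1]]
    flits.append(("TAIL", payload[-1]))  # assumed: tail carries the last word
    return flits


class InputQueue:
    """Bounded FIFO that buffers flits while the downstream link is stalled."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def full(self):
        return len(self.fifo) >= self.depth

    def write(self, flit):
        """Accept a flit, or return False so the upstream sender stalls."""
        if self.full():
            return False
        self.fifo.append(flit)
        return True

    def forward(self, downstream_ready):
        """Pass on the oldest flit if the next hop can take it."""
        if downstream_ready and self.fifo:
            return self.fifo.popleft()
        return None


flits = make_flits((1, 0), [10, 11, 12])
queue = InputQueue(depth=2)
queue.write(flits[0])
print(queue.forward(downstream_ready=True))  # header leaves before the tail arrives
```

The header is forwarded as soon as the next hop is ready, without waiting for the rest of the packet, and the bounded queue is where stalled flits wait when it is not.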


Router Design

Each router contains a number of modules

FIFOs (only present in packet switched router)

Address to port-request decoder

Arbiter

Control finite state machines

Crossbar
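To make the arbiter's role concrete, the sketch below models a round-robin arbiter that grants one of several competing port requests per call. The arbitration policy is not stated in the slides, so round-robin is an assumption, as are the class name and the use of five ports (chosen to match the five queue modules shown for the packet switched router).

```python
class RoundRobinArbiter:
    """Grant one requesting port at a time, rotating priority after each grant."""

    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.last = n_ports - 1  # so that port 0 has priority on the first call

    def grant(self, requests):
        """requests: list of booleans, one per port. Returns a port index or None."""
        for offset in range(1, self.n_ports + 1):
            port = (self.last + offset) % self.n_ports
            if requests[port]:
                self.last = port  # the winner gets lowest priority next time
                return port
        return None


arbiter = RoundRobinArbiter(5)
requests = [True, False, True, False, False]
print(arbiter.grant(requests))  # 0
print(arbiter.grant(requests))  # 2
print(arbiter.grant(requests))  # 0
```

Rotating the priority prevents one busy input from starving the others, which is one reason this policy is a frequent choice for on-chip routers.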


Circuit Switched Router Structure

[Figure: block diagram of the circuit switched router. Blocks: In & Out Ports, Crossbar, FSM, Arbiter, Address to Port Decoder. Signals: Request In, Request Out, Grant In, Grant Out, Data In, Data Out.]


Packet Switched Router Structure

[Figure: block diagram of the packet switched router. Blocks: In & Out Ports, Crossbar, Control, Arbiter, Address to Port Decoder, FIFO, FSM, 5 Queue Modules. Signals: Request In, Request From FIFOs, Grant Out, Write Out, Full In, Data In, Data From FIFOs, Data Out, Full, Write, Grant, Req, Data.]


Router Implementation and Testing

Both routers were coded using VHDL

Simulation and testing used a combination of ModelSim and Xilinx ISE 9.1

Ad-hoc tests used for individual modules

VHDL testbench used for system verification


Testbench Structure

[Figure: testbench block diagram. Blocks: Mesh Network, Read Input, Input Tables, Test Table, Source, Output Table, Sink, Compare, Clock Gen, Reset Gen, Cycle Count. Files: Command File, Output File.]

Success: ID: 1 Source : (0,3) Dest : (1,0) Hops : 4 Latency: 34
Success: ID: 2 Source : (0,2) Dest : (1,0) Hops : 3 Latency: 27
Success: ID: 3 Source : (3,2) Dest : (1,1) Hops : 3 Latency: 22
Success: ID: 4 Source : (1,3) Dest : (0,1) Hops : 3 Latency: 22
Success: ID: 5 Source : (3,0) Dest : (3,1) Hops : 1 Latency: 12

#START  SOURCE  DEST  SIZE  ID
# ------------------------------------------------------
2       3 0     0 1   8     1
3       2 0     0 1   2     2
3       2 3     1 1   2     3
4       3 1     1 0   8     4
5       0 3     1 3   7     5
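The "Success" lines shown on this slide are regular enough to post-process in software. The sketch below parses them, checks each reported hop count against the Manhattan distance between source and destination, and averages the latency. The regular expression and function name are assumptions written for this example, and the latency unit is not stated on the slide (clock cycles seems likely given the Cycle Count block).

```python
import re

# The five result lines exactly as reported on the slide.
LOG = """\
Success: ID: 1 Source : (0,3) Dest : (1,0) Hops : 4 Latency: 34
Success: ID: 2 Source : (0,2) Dest : (1,0) Hops : 3 Latency: 27
Success: ID: 3 Source : (3,2) Dest : (1,1) Hops : 3 Latency: 22
Success: ID: 4 Source : (1,3) Dest : (0,1) Hops : 3 Latency: 22
Success: ID: 5 Source : (3,0) Dest : (3,1) Hops : 1 Latency: 12
"""

PATTERN = re.compile(
    r"Success: ID: (\d+) Source : \((\d+),(\d+)\) Dest : \((\d+),(\d+)\) "
    r"Hops : (\d+) Latency: (\d+)"
)


def parse_results(text):
    """Turn each 'Success' record into a dictionary of integers and tuples."""
    results = []
    for match in PATTERN.finditer(text):
        ident, sx, sy, dx, dy, hops, latency = map(int, match.groups())
        results.append({"id": ident, "src": (sx, sy), "dst": (dx, dy),
                        "hops": hops, "latency": latency})
    return results


results = parse_results(LOG)
for r in results:
    manhattan = abs(r["src"][0] - r["dst"][0]) + abs(r["src"][1] - r["dst"][1])
    print(r["id"], "hops match minimal route:", r["hops"] == manhattan)

mean_latency = sum(r["latency"] for r in results) / len(results)
print("mean latency:", mean_latency)
```

All five reported hop counts equal the Manhattan distance, which is consistent with minimal routing on the mesh.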


Synthesis

Each router was synthesised for a Virtex-4 LX platform

Post-synthesis verification

Resource usage

Timing


Circuit Switched Resource Usage

[Figure: two pie charts, LUTs and Flip-Flops, broken down by module: Crossbar, Address to Port Decoder, Arbiter, FSM. Slice labels as extracted, not matched to modules: 223, 60, 203, 62; 16080, 23, 24.]

Total of 586 4-input LUTs

~0.1% of a Virtex 5

Total of 202 flip-flops


Packet Switched Resource Usage

[Figure: two pie charts, LUTs and Flip-Flops, broken down by module: Crossbar, Address to Port Decoder, Arbiter, Control, Queue. Slice labels as extracted, not matched to modules: 211, 34, 203, 40, 240; 90, 15185, 105.]

Total of 786 4-input LUTs

+34% compared to circuit switched

Total of 237 flip-flops
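The quoted +34% can be reproduced from the totals on the two resource usage slides, and the same calculation gives the flip-flop overhead, which the slides do not state. The dictionary and function names below are made up for this example.

```python
# Totals quoted on the circuit switched and packet switched resource slides.
CIRCUIT = {"luts": 586, "flip_flops": 202}
PACKET = {"luts": 786, "flip_flops": 237}


def overhead_percent(base, other):
    """Relative increase of `other` over `base`, in percent."""
    return 100.0 * (other - base) / base


for resource in ("luts", "flip_flops"):
    extra = overhead_percent(CIRCUIT[resource], PACKET[resource])
    print(f"{resource}: +{extra:.1f}%")
# luts: +34.1%
# flip_flops: +17.3%
```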


Timing Results

Parameter     Circuit Switched   Packet Switched
Max Freq      126.330 MHz        144.533 MHz
Setup time    5.308 ns           6.125 ns
Hold time     0.272 ns           0.272 ns

Critical path is through the Arbiter in both designs
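For comparison with the setup and hold times, each maximum frequency can be converted to a minimum clock period using period = 1 / frequency. The helper name below is an assumption for this sketch; the two frequencies are the values reported on the slide.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


# The two maximum frequencies reported in the timing results.
for freq in (126.330, 144.533):
    print(f"{freq:.3f} MHz -> {period_ns(freq):.3f} ns")
```

The two periods come out at roughly 7.9 ns and 6.9 ns respectively.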


Project Appraisal

Maintaining an accurate software simulation proved difficult

A great deal was learnt during the implementation of the circuit switched network

HDL implementations are only prototypes

Testbench provides a good framework but more time is needed to gather performance data


Conclusions

Possible to make low complexity network-on-chip systems suitable for FPGAs

Latency has to be traded for throughput

Hard to collect performance data without application driven benchmarks

Both networks are viable so why not use both?


Future Work

Cycle accurate software simulations

Application driven benchmarking

Serial transmission

Power efficiency

Industry standard solution