Hardwired networks on chip for FPGAs and their applications Kees Goossens (TU Delft, NXP) Muhammad Aqeel Wahlah (TU Delft)


Kees Goossens, 2009-08-06, MPSOC

overview

applications
network on chip
FPGA

key ideas
– hardwired NOC
– unified interconnect
– data coercion / type casting

application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)

example
conclusions

applications

[figure: example applications and their tasks; labels: BAC, BA, T1 T2 T3, A1 A2, C1 C2 C3]

task / function mapped on IP
– includes local storage / buffering

application: set of communicating IPs / tasks / ...
– data, control, code
– communication via connections

use case: set of concurrent applications

network on chip (NOC)

connects ports on hardware blocks (IP)
– data, control

connections: virtual wires
– real-time / quality of service

programmable at run-time
– set up & remove connections by programming control registers in the NOC

styles of communication
– address-based / memory-mapped
– streaming

[figure: NOC built from routers (R) and network interfaces (NI) connecting IPs; labels: T1, T2, T3, BAC, A1 A2, BA]
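A minimal sketch of run-time connection programming, assuming a toy register map keyed by (network interface, channel). The class `NocModel` and its method names are hypothetical and only illustrate the idea that connections are set up and removed by register writes, not the register layout of any real NOC.

```python
class NocModel:
    """Toy model: connections exist only as values in control registers."""

    def __init__(self):
        # Hypothetical register map: (source NI, channel) -> destination NI.
        self.registers = {}

    def write_register(self, ni, channel, value):
        # Stand-in for a memory-mapped write to a control register in the NOC.
        if value is None:
            self.registers.pop((ni, channel), None)
        else:
            self.registers[(ni, channel)] = value

    def set_up(self, src_ni, channel, dst_ni):
        # Setting up a connection is just programming a register.
        self.write_register(src_ni, channel, dst_ni)

    def remove(self, src_ni, channel):
        # Removing a connection clears the same register.
        self.write_register(src_ni, channel, None)

    def route(self, src_ni, channel):
        # Returns the destination NI, or None if no connection is programmed.
        return self.registers.get((src_ni, channel))
```

In this sketch no hardware is reconfigured when a connection changes; only register contents change, which is the property the later slides rely on.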

FPGA fabric

[figure: FPGA fabric with LUTs, IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator, ICAP]

soft IP are configured in
– configurable elements (LUT)
– and switch boxes (not shown)
with a given configuration granularity (frame) using the configuration interconnect (ICAP)

hard IP
– CPU
– on-chip memories (BRAM, ...)
– off-chip memory interfaces
– decryption IP
– etc.

configuration: bitstream loading
programming / control: set MMIO registers
Xilinx terminology (frames, ICAP, etc.)

application on FPGA

[figure: application mapped on frames of LUTs, with a soft data interconnect and a soft control interconnect; hard IP: IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator, ICAP; labels: BAC, BA, A1 A2]

design an application as for ASIC
– IPs, interconnect, storage, sw

but map on soft & hard IP resources

traditionally have separate soft data and control interconnects
could also use soft NOC for both

multiple applications on FPGA

[figure: two applications mapped on the same LUT fabric, with a soft data interconnect and a soft control interconnect; hard IP: IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator, ICAP; labels: T1 T2 T3, BAC, BA, A1 A2]

interconnects and IPs of different applications share reconfiguration regions (frames)
dynamic reconfiguration is global, not partial
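The frame-sharing problem can be illustrated with a small check, assuming each application is described by the set of frames it occupies. The placements and the function name `frames_disturbed` are hypothetical; frame numbers are made up for illustration.

```python
def frames_disturbed(placement, app):
    """Return the other applications that share at least one frame with `app`.

    Reconfiguring `app` rewrites whole frames, so every application listed
    here would be disturbed by that reconfiguration.
    """
    mine = placement[app]
    return sorted(
        other
        for other, frames in placement.items()
        if other != app and frames & mine
    )


# Hypothetical placements: application name -> set of frame numbers it uses.
shared = {"T": {0, 1, 2}, "BAC": {2, 3}}    # frame 2 holds logic of both
separate = {"T": {0, 1}, "BAC": {2, 3}}     # no frame is shared

print(frames_disturbed(shared, "T"))    # BAC is affected
print(frames_disturbed(separate, "T"))  # nothing else is affected
```

Only the second placement allows one application to be reconfigured while the other keeps running, which is what "partial" requires.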

overview

application
network on chip
FPGA

key ideas
– hardwired NOC → improved performance : cost
– unified interconnect → flexibility
– data coercion / type casting → cool (and useful) applications

application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)

example
conclusions

1. hardwired interconnect

replace soft interconnect(s) by hard interconnect(s)
connect reconfigurable regions of LUTs (CFR)

bit-level reconfigurability (CFR)
– switch boxes

transaction-level reconfigurability (NOC)
– routers, NIs
– memory mapped / streaming

[Hecht FPL’05]

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator, ICAP) connected by hard interconnect(s); labels: T1, T2, T3, BAC, BA, A1, A2]

1. hardwired interconnect

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator, ICAP) connected by hard interconnect(s); labels: BAC, T1, T2, T3, C1, C2, C3]

~35 X smaller area
~3.5 X higher speed

~150 X better perf:cost ratio (bits/sec/area)
~200 X smaller configuration footprint (program MMIO, no bitstream)
~200 X faster soft IP load & boot
dynamic partial reconfiguration
– no constraints on soft IP placement due to communication
loss of flexibility
– fewer LUTs
– CFR = frame
7% hard NOC

[based on Virtex4 & Aethereal NOC, Goossens NOCS’08]

performance & cost

essentially, it all depends on
– area soft:hard ≈ 35:1
– speed soft:hard ≈ 1:3.5
– configuration footprint of soft NOC (bitstream) : programming footprint of hard NOC (MMIO registers) ≈ 214:1

resulting in
– boot time soft:hard ≈ 200:1
– functional performance:cost (bit/sec:area) soft:hard ≈ 1:147
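A rough worked example of how a performance:cost gain of this order can arise, assuming the gain is simply the speed ratio multiplied by the area ratio. The slides do not state the formula, so this is an assumption, and the function name is hypothetical. The 118 MHz and 500 MHz clock figures are the soft and hard NOC speeds quoted elsewhere in this deck.

```python
def perf_per_cost_gain(speed_ratio, area_ratio):
    # Assumption: a NOC that is `speed_ratio` times faster and `area_ratio`
    # times smaller delivers speed_ratio * area_ratio more bits/sec per area.
    return speed_ratio * area_ratio


# Using the rounded ratios quoted on the slide (3.5x speed, 35x area).
rounded = perf_per_cost_gain(3.5, 35)

# Using the quoted clock frequencies (500 MHz hard, 118 MHz soft) instead.
from_clocks = perf_per_cost_gain(500 / 118, 35)

print(rounded)                 # 122.5
print(round(from_clocks, 1))   # 148.3
```

Both results are of the same order as the roughly 147 to 150 times improvement quoted on the slides; the exact derivation behind the quoted figure is not given in the deck.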

performance & cost

configuration speed
– 1.9 Gb/s for dedicated configuration interconnect (ICAP)
– 8 Gb/s for hard NOC

programming speed
– 118 MHz soft NOC
– 500 MHz hard NOC

configuration footprint for soft NOC
– 1.8 Mb (8300 LUTs per router+NI)

programming footprint for hard NOC
– 2100 bit per connection

thus to configure & program an NI
– 1 msec for soft NOC
– 10.6 μsec for hard NOC
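The soft NOC figure can be checked with a one-line calculation, assuming 1 Mb = 10^6 bits and 1 Gb/s = 10^9 bits/s and that configuration time is dominated by streaming the bitstream through the ICAP. The helper name is hypothetical. The hard NOC figure is not reproduced here, because the slides do not give the register write rate behind it.

```python
def transfer_time_s(bits, rate_bits_per_s):
    # Time to move `bits` over a link of the given rate, ignoring any setup overhead.
    return bits / rate_bits_per_s


# Soft NOC: 1.8 Mb bitstream loaded through the 1.9 Gb/s ICAP.
soft_config_s = transfer_time_s(1.8e6, 1.9e9)

print(f"{soft_config_s * 1e3:.2f} ms")  # 0.95 ms, consistent with the quoted 1 msec
```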

2. unified interconnect

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator, ICAP) connected by a single hard interconnect; labels: T1, T2, T3, BAC, BA, A1, A2]

one interconnect (e.g. NOC) for
– data for functional mode
– control for programming
– bitstreams for configuration

dynamic partitioning of different interconnects

3. data coercion

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect; connection labels: bitstream, data]

data = control = bitstream = test = …

connect a data port to a configuration port
– decrypt bitstreams

3. data coercion

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect; labels: PH, IP, bitstream]

data = control = bitstream = test = …

connect a data port to a configuration port
– decrypt bitstreams
– relocate bitstreams
– run-time compute / optimise bitstreams
  • JIT, peephole

3. data coercion

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect; labels: PH, IP, bitstream]

data = control = bitstream = test = …

connect a data port to a configuration port
– decrypt bitstreams
– relocate bitstreams
– run-time compute / optimise bitstreams
  • JIT, peephole

data port to test port (NOC as TAM)
– on-line (structural) testing
– on-chip test-vector generation
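A minimal sketch of the "decrypt bitstreams" case, assuming words simply stream from a memory through an accelerator into a configuration port over the same interconnect. All names are hypothetical, and XOR with a fixed key is only a stand-in for whatever cipher a real de/encrypt accelerator would implement.

```python
def read_memory(words):
    # Data port of a memory: streams out whatever words it holds.
    yield from words


def decrypt(stream, key):
    # Stand-in for the de/encrypt accelerator (XOR is NOT a real cipher).
    for word in stream:
        yield word ^ key


class ConfigPort:
    """Configuration port of a CFR: treats every incoming word as bitstream."""

    def __init__(self):
        self.frames = []

    def write(self, stream):
        for word in stream:
            self.frames.append(word)


key = 0x5A
plain_bitstream = [0x01, 0x02, 0x03]
encrypted = [w ^ key for w in plain_bitstream]   # what is stored in memory

port = ConfigPort()
# "Data" leaving the accelerator is consumed as "bitstream" by the port.
port.write(decrypt(read_memory(encrypted), key))
print(port.frames == plain_bitstream)  # True
```

The point of the sketch is that the consumer decides how the words are interpreted; the interconnect carries the same traffic either way.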

overview

applications
network on chip
FPGA

key ideas
– hardwired NOC
– unified interconnect
– data coercion / type casting

application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)

example
conclusions

dynamic partial reconfiguration: idea

“hardware operating system” implements run-time scheduling of

1. multiple concurrent applications
– independent applications on own virtual platform
  • no communication, no interference
  • “performance virtualisation”
– activation given by user, environment, etc.

[figure: applications scheduled over time; labels: app T, a second app with parts A and C, T1 T2 T3, BAC, C1 C2 C3, A1 A2, BA]

dynamic partial reconfiguration: idea

“hardware operating system” implements run-time scheduling of

1. multiple concurrent applications
2. parts of single applications (soft IP, “hardware tasks”)
– multiplex parts of a single application on same resources

[figure: timeline with app T and a second app whose parts A and C run one after the other, or sub-app A and sub-app C; labels: C1 C2 C3, A1 A2, BA]

dynamic partial reconfiguration: idea

“hardware operating system” implements run-time scheduling of

1. multiple concurrent applications
2. parts of single applications (soft IP, “hardware tasks”)
– multiplex parts of a single application on same resources
– internal state

[figure: timeline with app T and a second app whose parts A and C run one after the other, with state carried between them; labels: BAC, C1 C2 C3, A1 A2, BA, state]
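A minimal sketch of multiplexing sub-applications on one region while keeping their internal state, assuming the state can be saved outside the region between activations. The function name, the trace format, and the use of an activation counter as "state" are all hypothetical.

```python
def run_schedule(schedule):
    """Time-multiplex sub-applications on a single reconfigurable region.

    `schedule` is the order in which sub-applications are activated.
    Returns the trace of actions and the persistent state per sub-application.
    """
    saved_state = {}   # state that survives while a sub-application is swapped out
    loaded = None      # sub-application currently configured in the region
    trace = []
    for name in schedule:
        if loaded != name:
            # The region holds only one sub-application, so swap it in.
            trace.append(f"configure {name}")
            loaded = name
        # Restore the previous state, do one unit of work, save it again.
        count = saved_state.get(name, 0) + 1
        saved_state[name] = count
        trace.append(f"run {name} (activation {count})")
    return trace, saved_state


trace, state = run_schedule(["A", "C", "A"])
for line in trace:
    print(line)
```

In this run sub-application A is configured twice, but its second activation continues from the saved state rather than starting from scratch.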

dynamic partial reconfiguration: implementation

1. system manager
– resource management (CFR, NOC, memory, …)
  • inter-application virtual platforms

[figure: timeline with the system manager and one application manager per application; labels: A, C, BAC, T]

dynamic partial reconfiguration: implementation

1. system manager
– resource management (CFR, NOC, memory, …)
  • inter-application virtual platforms
  • intra-application phases
– NOC programming
– soft IP / (sub)-application configuration (incl. clock, reset)
– bottleneck?

[figure: timeline with the system manager and an application manager; labels: A, C, BAC]

dynamic partial reconfiguration: implementation

1. system manager
2. application manager
– application programming

[figure: timeline with the system manager and one application manager per application; labels: A, C, BAC, T]

dynamic partial reconfiguration: implementation

1. system manager
2. application manager
– application programming
– intra-application persistent data management

[figure: timeline with the system manager and an application manager, with state carried between sub-applications; labels: A, C, BAC, C1 C2 C3, A1 A2, BA, state]
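A minimal sketch of the split between the two managers, assuming the system manager hands out disjoint sets of CFRs (one virtual platform per application) and each application manager only works inside its own set. Class names, method names, and CFR identifiers are hypothetical.

```python
class ApplicationManager:
    """Manages one application inside the resources it was granted."""

    def __init__(self, app, cfrs):
        self.app = app
        self.cfrs = list(cfrs)
        self.state = {}  # intra-application persistent data


class SystemManager:
    """Owns all CFRs and gives each application its own virtual platform."""

    def __init__(self, cfrs):
        self.free = sorted(cfrs)
        self.platforms = {}  # application name -> set of CFRs it holds

    def start(self, app, n_cfrs):
        if n_cfrs > len(self.free):
            raise RuntimeError("not enough free CFRs")
        # Take CFRs from the free list so no two applications share one.
        granted, self.free = self.free[:n_cfrs], self.free[n_cfrs:]
        self.platforms[app] = set(granted)
        return ApplicationManager(app, granted)

    def stop(self, app):
        # Return the application's CFRs to the free pool.
        self.free = sorted(self.free + list(self.platforms.pop(app)))


system = SystemManager(["CFR0", "CFR1", "CFR2", "CFR3"])
bac = system.start("BAC", 2)
t = system.start("T", 1)
print(bac.cfrs, t.cfrs, system.free)
```

Keeping the allocation in one place is what gives "no communication, no interference" between applications in this sketch; a real system manager would also have to allocate NOC connections and memory.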

overview

applications
FPGA
network on chip

key ideas
– hardwired NOC
– unified interconnect
– data coercion / type casting

application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)

example
conclusions

modelling

SystemC
– bit & cycle accurate NOC model
– behavioural CFR models
– accurate bitstream structure
– behavioural hard IP models

model
– starting / stopping of applications
  • dynamic, based on user input
– starting / stopping of sub-applications
  • dynamic, based on flow of data
– configuration: loading of bitstreams for soft IP; clock & reset
– programming: of NOC, system & sub-application managers
– management of persistent state

example

system manager
– program NOC for configuration

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect, with system manager and application manager; labels: BAC, BA, A1, A2]

example

system manager
– program NOC for configuration
– configure: load bitstreams
  • including bitstream syntax, etc.

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect, with system manager and application manager; labels: BAC, BA, A1, A2; connection types: bitstream, programming, data]

example

system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect, with system manager and application manager; labels: BAC, BA, A1, A2; connection types: bitstream, programming, data]

example

system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
– program & start application manager
  • including clocking & reset

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect, with system manager and application manager; labels: BAC, BA, A1, A2; connection types: bitstream, programming, data]

example

system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
– program & start application manager

application manager
– programs & starts sub-app A
  • soft IP function is modelled by CFR

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect, with system manager and application manager; labels: BAC, BA, A1, A2; connection types: bitstream, programming, data]

example

system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
– program & start application manager

application manager
– programs & starts sub-app A

sub-application A runs

[figure: CFRs and hard IP (IO processor, CPU, on-chip memories, off-chip memory, de/encrypt accelerator) on a single hard interconnect, with system manager and application manager; labels: BAC, BA, A1, A2; connection types: bitstream, programming, data]
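The order of steps in the example can be written down as a short trace, assuming each step completes before the next one starts. The function name and the log format are hypothetical; the step descriptions are taken from the slides.

```python
def boot_sequence(sub_app):
    """Return the ordered steps needed to bring up one sub-application."""
    log = []
    # Steps carried out by the system manager.
    log.append("system manager: program NOC for configuration")
    log.append("system manager: configure, load bitstreams")
    log.append(f"system manager: program NOC for (sub)-application {sub_app}")
    log.append("system manager: program & start application manager")
    # Step carried out by the application manager once it is running.
    log.append(f"application manager: program & start sub-app {sub_app}")
    # From here on the sub-application exchanges data over the same NOC.
    log.append(f"sub-application {sub_app} runs")
    return log


for step in boot_sequence("A"):
    print(step)
```

The same interconnect is used in every step, first for bitstreams, then for programming, and finally for data.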

conclusions

ideas:
– hardwired NOC → performance:cost
– unified interconnects → hardware multi-tasking
– data coercion / type casting → cool & useful

very detailed model
many simplifications & restrictions

many open issues
– design flow: soft IP placement, binding, relocation, etc. [Madsen?]
– application model:
  • extend use-case model with intra-application dynamism
  • more general notions of persistent state
– implementation: separation of system & application managers
