Hardwired networks on chip for FPGAs and their applications
Kees Goossens (TU Delft, NXP)
Muhammad Aqeel Wahlah (TU Delft)
Kees Goossens, 2009-08-06, MPSOC
overview
applications
network on chip
FPGA
key ideas
– hardwired NOC
– unified interconnect
– data coercion / type casting
application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)
example
conclusions
applications
[figure: example applications drawn as graphs of tasks; labels include T1 T2 T3, C1 C2 C3, A1 A2]
task / function mapped on IP
– includes local storage / buffering
application: set of communicating IPs / tasks / ...
– data, control, code
– communication via connections
use case: set of concurrent applications
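The three terms above can be written down as a small data model. This is only a sketch: the class names, the IP identifiers (`ip0` ...) and the `kind` field are assumptions made for illustration; only the task names T1..T3 and A1, A2 come from the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    """A directed communication channel between two tasks."""
    source: str
    sink: str
    kind: str = "data"  # the slide lists data, control, code


@dataclass
class Application:
    """A set of communicating tasks, each mapped on one IP."""
    name: str
    task_to_ip: dict[str, str] = field(default_factory=dict)
    connections: list[Connection] = field(default_factory=list)

    def connect(self, source: str, sink: str, kind: str = "data") -> None:
        # Both endpoints must be tasks of this application.
        for task in (source, sink):
            if task not in self.task_to_ip:
                raise KeyError(f"unknown task {task!r} in {self.name}")
        self.connections.append(Connection(source, sink, kind))


@dataclass
class UseCase:
    """A set of applications that run concurrently."""
    applications: list[Application] = field(default_factory=list)

    def ips_in_use(self) -> set[str]:
        return {ip for app in self.applications for ip in app.task_to_ip.values()}


# Hypothetical mapping of two applications onto five IPs.
app_t = Application("T", {"T1": "ip0", "T2": "ip1", "T3": "ip2"})
app_t.connect("T1", "T2")
app_t.connect("T2", "T3")
app_a = Application("A", {"A1": "ip3", "A2": "ip4"})
app_a.connect("A1", "A2")
use_case = UseCase([app_t, app_a])
```

Connecting a task of one application to a task of another raises `KeyError` here, which matches the idea that connections belong to a single application.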
network on chip (NOC)
connects ports on hardware blocks (IP)
– data, control
connections: virtual wires
– real-time / quality of service
programmable at run-time
– set up & remove connections by programming control registers in the NOC
styles of communication
– address-based / memory-mapped
– streaming
[figure: NOC of routers (R) and network interfaces (NI) connecting IPs; task labels T1 T2 T3, A1 A2]
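A toy model of "set up & remove connections by programming control registers" might look as follows. It is a sketch under assumptions: `ToyNoc`, its register layout and the port names are invented for illustration and do not describe the registers of any real NOC.

```python
class ToyNoc:
    """Toy model: a connection exists only while its control registers are set."""

    STYLES = ("memory-mapped", "streaming")

    def __init__(self, ports):
        self.ports = set(ports)
        self.registers = {}  # (source port, sink port) -> settings

    def set_up(self, source, sink, style="streaming", guaranteed=True):
        if style not in self.STYLES:
            raise ValueError(f"unknown communication style {style!r}")
        if source not in self.ports or sink not in self.ports:
            raise KeyError("both endpoints must be ports on the NOC")
        # "Programming" the NOC is modelled as writing one register entry.
        self.registers[(source, sink)] = {"style": style, "guaranteed": guaranteed}

    def remove(self, source, sink):
        del self.registers[(source, sink)]

    def send(self, source, sink, word):
        # Data only flows over connections that have been programmed.
        if (source, sink) not in self.registers:
            raise RuntimeError("no connection programmed between these ports")
        return word


noc = ToyNoc(["cpu", "mem", "accel"])
noc.set_up("cpu", "mem", style="memory-mapped")
delivered = noc.send("cpu", "mem", 0x2A)
noc.remove("cpu", "mem")
```

After `remove`, a further `send` on the same pair of ports raises `RuntimeError`, which is the run-time programmability the slide refers to.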
FPGA fabric
[figure: FPGA block diagram; labels: LUT, IO, processor, CPU, on-chip memory, off-chip memory, de/encrypt accelerator, ICAP]
soft IP are configured in
– configurable elements (LUT)
– and switch boxes (not shown)
with a given configuration granularity (frame) using the configuration interconnect (ICAP)
hard IP
– CPU
– on-chip memories (BRAM, ...)
– off-chip memory interfaces
– decryption IP
– etc.
configuration: bitstream loading
programming / control: set MMIO registers
Xilinx terminology (frames, ICAP, etc.)
application on FPGA
[figure: application tasks (A1, A2, ...) mapped onto frames of LUTs and onto hard IP, with ICAP]
design an application as for ASIC
– IPs, interconnect, storage, sw
but map on soft & hard IP resources
traditionally have separate soft data and control interconnects
could also use soft NOC for both
multiple applications on FPGA
[figure: tasks of several applications (A1, A2, T1, T3, ...) mapped onto LUTs and hard IP, with ICAP]
interconnects and IPs of different applications share reconfiguration regions (frames)
dynamic reconfiguration is global, not partial
overview
applications
network on chip
FPGA
key ideas
– hardwired NOC → improved performance:cost
– unified interconnect → flexibility
– data coercion / type casting → cool (and useful) applications
application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)
example
conclusions
1. hardwired interconnect
replace soft interconnect(s) by hard interconnect(s)
connect reconfigurable regions of LUTs (CFR)
bit-level reconfigurability (CFR)
– switch boxes
transaction-level reconfigurability (NOC)
– routers, NIs
– memory mapped / streaming
[Hecht FPL’05]
[figure: CFRs and hard IP (IO, processor, CPU, memories, de/encrypt accelerator, ICAP) attached to hard interconnect(s)]
1. hardwired interconnect
[figure: CFRs and hard IP attached to hard interconnect(s)]
~35× smaller area
~3.5× higher speed
~150× better perf:cost ratio (bits/sec/area)
~200× smaller configuration footprint (program MMIO, no bitstream)
~200× faster soft IP load & boot
dynamic partial reconfiguration
– no constraints on soft IP placement due to communication
loss of flexibility
– fewer LUTs
– CFR = frame
7% hard NOC
[based on Virtex4 & Aethereal NOC, Goossens NOCS’08]
performance & cost
essentially, it all depends on
– area soft:hard ≈ 35:1
– speed soft:hard ≈ 1:3.5
– configuration footprint of soft NOC (bitstream) : programming footprint of hard NOC (MMIO registers) ≈ 214:1
resulting in
– boot time soft:hard ≈ 200:1
– functional performance:cost (bit/sec:area) soft:hard ≈ 1:147
performance & cost
configuration speed
– 1.9 Gb/s for dedicated configuration interconnect (ICAP)
– 8 Gb/s for hard NOC
programming speed
– 118 MHz soft NOC
– 500 MHz hard NOC
configuration footprint for soft NOC
– 1.8 Mb (8300 LUTs per router+NI)
programming footprint for hard NOC
– 2100 bit per connection
thus to configure & program an NI
– 1 msec for soft NOC
– 10.6 μsec for hard NOC
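Two of the quoted figures can be cross-checked with simple arithmetic. The sketch below uses only numbers from the slides; the way they are combined is an assumption, not something the slides state. In particular, taking the perf:cost gain as area ratio times the 500:118 MHz clock ratio is one possible reading. The 10.6 μsec hard NOC figure is not reproduced, because the slides do not give enough inputs for it.

```python
# Figures quoted on the slides.
soft_noc_bitstream_bits = 1.8e6   # 1.8 Mb configuration footprint
icap_bits_per_second = 1.9e9      # 1.9 Gb/s over ICAP
soft_clock_hz = 118e6             # soft NOC
hard_clock_hz = 500e6             # hard NOC
area_ratio_soft_to_hard = 35      # soft NOC is ~35x larger

# Time to load the soft NOC bitstream through ICAP.
soft_config_seconds = soft_noc_bitstream_bits / icap_bits_per_second

# Assumed reading: perf:cost gain = area ratio x clock ratio.
clock_ratio = hard_clock_hz / soft_clock_hz
perf_cost_gain = area_ratio_soft_to_hard * clock_ratio

print(f"soft NOC configuration time: {soft_config_seconds * 1e3:.2f} ms")
print(f"clock ratio hard:soft: {clock_ratio:.2f}")
print(f"perf:cost gain under this reading: {perf_cost_gain:.0f}")
```

The first result, about 0.95 ms, agrees with the quoted 1 msec. The last, about 148, is close to the quoted 1:147, which suggests (but does not prove) that the clock ratio rather than the rounded 3.5 was used.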
2. unified interconnect
[figure: CFRs and hard IP attached to a single hard interconnect]
one interconnect (e.g. NOC) for
– data for functional mode
– control for programming
– bitstreams for configuration
dynamic partitioning of different interconnects
3. data coercion
[figure: CFRs and hard IP on a single hard interconnect; traffic marked as bitstream or data]
data = control = bitstream = test = …
connect a data port to a configuration port
– decrypt bitstreams
3. data coercion
[figure: CFRs and hard IP on a single hard interconnect; traffic marked as bitstream]
data = control = bitstream = test = …
connect a data port to a configuration port
– decrypt bitstreams
– relocate bitstreams
– run-time compute / optimise bitstreams
• JIT, peephole
3. data coercion
[figure: CFRs and hard IP on a single hard interconnect; traffic marked as bitstream]
data = control = bitstream = test = …
connect a data port to a configuration port
– decrypt bitstreams
– relocate bitstreams
– run-time compute / optimise bitstreams
• JIT, peephole
data port to test port (NOC as TAM)
– on-line (structural) testing
– on-chip test-vector generation
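The "decrypt bitstreams" case can be sketched as a chain of ports that all carry plain words, where only the consumer decides what the words mean. Everything here is illustrative: the function and class names are invented, and XOR with a fixed key is a placeholder for whatever the de/encrypt accelerator really does.

```python
from typing import Iterable, Iterator

Word = int  # every port carries plain words; the consumer gives them meaning


def memory_port(words: Iterable[Word]) -> Iterator[Word]:
    """Data port of a memory: streams the stored words."""
    yield from words


def decrypt_port(words: Iterable[Word], key: Word) -> Iterator[Word]:
    """Stand-in for the de/encrypt accelerator (XOR is a placeholder cipher)."""
    for word in words:
        yield word ^ key


class ConfigurationPort:
    """Configuration port of a CFR: treats incoming words as a bitstream."""

    def __init__(self) -> None:
        self.frames: list[Word] = []

    def load(self, words: Iterable[Word]) -> None:
        self.frames.extend(words)


KEY = 0x5A5A5A5A
plain_bitstream = [0x00000001, 0xDEADBEEF, 0x12345678]
encrypted_in_memory = [word ^ KEY for word in plain_bitstream]

# The accelerator's data output is connected straight to a configuration port.
cfr = ConfigurationPort()
cfr.load(decrypt_port(memory_port(encrypted_in_memory), KEY))
```

The same chaining would apply to relocation or run-time optimisation: replace `decrypt_port` by another word-to-word transformation.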
overview
applications
network on chip
FPGA
key ideas
– hardwired NOC
– unified interconnect
– data coercion / type casting
application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)
example
conclusions
dynamic partial reconfiguration: idea
“hardware operating system” implements run-time scheduling of
1. multiple concurrent applications
– independent applications on own virtual platform
• no communication, no interference
• “performance virtualisation”
– activation given by user, environment, etc.
[figure: timeline showing which applications (e.g. app T) are active over time]
dynamic partial reconfiguration: idea
“hardware operating system” implements run-time scheduling of
1. multiple concurrent applications
2. parts of single applications (soft IP, “hardware tasks”)
– multiplex parts of a single application on same resources
[figure: timeline with app T, and with sub-app A and sub-app C taking turns]
dynamic partial reconfiguration: idea
“hardware operating system” implements run-time scheduling of
1. multiple concurrent applications
2. parts of single applications (soft IP, “hardware tasks”)
– multiplex parts of a single application on same resources
– internal state
[figure: timeline of multiplexed sub-applications, with a label marking state]
dynamic partial reconfiguration: implementation
1. system manager
– resource management (CFR, NOC, memory, …)
• inter-application virtual platforms
[figure: timeline with a system manager and an application manager per application]
dynamic partial reconfiguration: implementation
1. system manager
– resource management (CFR, NOC, memory, …)
• inter-application virtual platforms
• intra-application phases
– NOC programming
– soft IP / (sub)-application configuration (incl. clock, reset)
– bottleneck?
[figure: timeline with system manager and application manager]
dynamic partial reconfiguration: implementation
1. system manager
2. application manager
– application programming
[figure: timeline with a system manager and an application manager per application]
dynamic partial reconfiguration: implementation
1. system manager
2. application manager
– application programming
– intra-application persistent data management
[figure: timeline with system manager and application manager, with a label marking state]
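The split between the two managers could be sketched roughly as below. This is a simplification under assumptions: the class and method names, the CFR identifiers and the way state is handed over are all invented for illustration, and NOC programming and bitstream loading are left out.

```python
class ApplicationManager:
    """Tracks the active sub-application and keeps persistent state across switches."""

    def __init__(self, name, cfrs):
        self.name = name
        self.cfrs = cfrs
        self.active = None
        self.saved_state = {}  # sub-application -> persistent state

    def switch_to(self, sub_app, current_state=None):
        # Save the state of the outgoing sub-application before it is replaced.
        if self.active is not None:
            self.saved_state[self.active] = current_state
        self.active = sub_app
        # Hand back the state saved for the incoming sub-application, if any.
        return self.saved_state.get(sub_app)


class SystemManager:
    """Owns the CFRs and gives each application a disjoint set (virtual platform)."""

    def __init__(self, cfrs):
        self.free_cfrs = set(cfrs)
        self.allocated = {}  # application name -> set of CFRs

    def start(self, app_name, cfr_count):
        if cfr_count > len(self.free_cfrs):
            raise RuntimeError(f"not enough free CFRs for {app_name}")
        granted = {self.free_cfrs.pop() for _ in range(cfr_count)}
        self.allocated[app_name] = granted
        return ApplicationManager(app_name, granted)

    def stop(self, app_name):
        self.free_cfrs |= self.allocated.pop(app_name)


system = SystemManager(["cfr0", "cfr1", "cfr2", "cfr3"])
first = system.start("app1", 2)    # hypothetical application names
second = system.start("app2", 2)

first.switch_to("A")
first.switch_to("C", current_state={"count": 3})    # saves A's state
restored = first.switch_to("A", current_state={})   # saves C's, restores A's
```

Because the two applications receive disjoint CFR sets, neither can disturb the other, which is the "no communication, no interference" property of the virtual platforms.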
overview
applications
FPGA
network on chip
key ideas
– hardwired NOC
– unified interconnect
– data coercion / type casting
application: dynamic partial reconfiguration
– multiple concurrent applications
– multiplex sub-applications (“hardware tasks”)
example
conclusions
modelling
SystemC
– bit & cycle accurate NOC model
– behavioural CFR models
– accurate bitstream structure
– behavioural hard IP models
model
– starting / stopping of applications
• dynamic, based on user input
– starting / stopping of sub-applications
• dynamic, based on flow of data
– configuration: loading of bitstreams for soft IP; clock & reset
– programming: of NOC, system & sub-application managers
– management of persistent state
example
system manager
– program NOC for configuration
[figure: CFRs, hard IP, system manager and application manager on a single hard interconnect]
example
system manager
– program NOC for configuration
– configure: load bitstreams
• including bitstream syntax, etc.
[figure: CFRs, hard IP, system manager and application manager on a single hard interconnect; traffic marked as bitstream, programming or data]
example
system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
[figure: CFRs, hard IP, system manager and application manager on a single hard interconnect; traffic marked as bitstream, programming or data]
example
system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
– program & start application manager
• including clocking & reset
[figure: CFRs, hard IP, system manager and application manager on a single hard interconnect; traffic marked as bitstream, programming or data]
example
system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
– program & start application manager
application manager
– programs & starts sub-app A
• soft IP function is modelled by CFR
[figure: CFRs, hard IP, system manager and application manager on a single hard interconnect; traffic marked as bitstream, programming or data]
example
system manager
– program NOC for configuration
– configure: load bitstreams
– program NOC for (sub)-application A
– program & start application manager
application manager
– programs & starts sub-app A
sub-application A runs
[figure: CFRs, hard IP, system manager and application manager on a single hard interconnect; traffic marked as bitstream, programming or data]
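The steps of the example can be collected into a small sequencer that rejects steps taken out of order. The step names follow the slides; the class `BringUp` and the assumption that the order is strictly enforced are illustrative only.

```python
class BringUp:
    """Walks through the bring-up steps of the example in slide order."""

    STEPS = (
        "program NOC for configuration",         # system manager
        "configure: load bitstreams",            # system manager
        "program NOC for (sub)-application A",   # system manager
        "program & start application manager",   # system manager
        "program & start sub-app A",             # application manager
        "sub-application A runs",
    )

    def __init__(self):
        self.done = []

    def do(self, step):
        if len(self.done) == len(self.STEPS):
            raise RuntimeError("bring-up already complete")
        expected = self.STEPS[len(self.done)]
        if step != expected:
            raise RuntimeError(f"expected {expected!r}, got {step!r}")
        self.done.append(step)

    @property
    def running(self):
        return len(self.done) == len(self.STEPS)


bring_up = BringUp()
for step in BringUp.STEPS:
    bring_up.do(step)
```

Loading bitstreams before the NOC has been programmed for configuration raises `RuntimeError` in this sketch, reflecting that with a unified interconnect the bitstreams themselves travel over the NOC.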
conclusions
ideas:
– hardwired NOC → performance:cost
– unified interconnects → hardware multi-tasking
– data coercion / type casting → cool & useful
very detailed model
many simplifications & restrictions
many open issues
– design flow: soft IP placement, binding, relocation, etc. [Madsen?]
– application model:
• extend use-case model with intra-application dynamism
• more general notions of persistent state
– implementation: separation of system & application managers