Enabling Technologies for System-on-Chip Development,
November 19-20, 2001, Tampere, Finland
http://www.cs.tut.fi/soc/
Reconfigurable Computing Architectures and Methodologies for System-on-Chip;
Reiner Hartenstein, Monday, November 19, 10:15 - 11:00 hrs.
Reiner Hartenstein, University of Kaiserslautern, Germany http://hartenstein.de
Enabling Technologies for
Reconfigurable Computing
Reiner Hartenstein
University of Kaiserslautern
November 21, 2001, Tampere, Finland
Enabling Technologies for Reconfigurable Computing part 1: Reconfigurable Computing (RC) Wednesday, November 21, 8.30 – 10.00 hrs. © 2001, [email protected] http://www.fpl.uni-kl.de
University of Kaiserslautern
Xputer Lab
2
Schedule
time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Compilation Techniques for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for Stream-based RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments
Reconfigurable: why?
• Exploding design cost and shrinking product life cycles of ASICs create a demand for RA usage to extend product longevity.
• Performance is only one part of the story. The time has come to fully exploit their flexibility to support turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades.
• A new “soft machine” paradigm and language framework is available for novel compilation techniques to cope with the new market structures, which transfer synthesis from vendor to customer.
SOC Alternatives… not including C/C++ CAD Tools [Gordon Bell]
• The blank sheet of paper: FPGA
• Auto design of a basic system: Tensilica
• Standardized, committee designed components*, cells, and custom IP
• Standard components including more application specific processors *, IP add-ons and custom
• One chip does it all: SMOP **
*) Processors, Memory, Communication & Memory Links, **) SMOP ??
SoC Alternatives [Gordon Bell]
product | strategy | vendor
FPGA | “sea of uncommitted gate arrays” | Xilinx, Altera
compile a system | unique processor for every application | Tensilica
systolic array | many pipelined or parallel processors + custom |
DSP, VLIW | special purpose processor cores + custom | TI
processor + RAM + ASICs | general purpose cores, specialized by I/O, etc. | IBM, Intel
universal micro | multiprocessor array, programmable I/O | Cradle
A Decade of Research in Reconfigurable Computing
• Due to the achievements of numerous research projects throughout the 1990s, the breakthrough in commercialization has started, and a quite comprehensive methodology is already available.
• Dear colleague, the RC scene welcomes your contributions to improve it and to push for inclusion in contemporary CS&E curricula.
• One of the goals of this talk is to stimulate you with highlights and by introducing some key issues.
no more a strange niche area
• was “Hardware” design for a strange platform
– CAD, but no Compilation
• Emerging awareness:
– New mind set
– New curricular embedding
• coming dichotomy of CS
– SW <-> CW
– HW <-> FW
– computing in time <-> computing in space
flexibility / universality trade-off
[Figure: trade-off between flexibility and efficiency; labels: FPGA, KressArray, Xplorer, hardwired]
RAs are heading for Mainstream
ASPP, application-specific programmable product, is:
• an application-specific standard product, and
• embedded programmable logic
Soap Chip: System on a programmable Chip
[Figure: chip floorplan with blocks Logic, Analog, DRAM/Flash/SRAM, Programmable Logic, Microprocessor]
CSoC, configurable SoC, is:
• an industry standard µProcessor,
• embedded reconfigurable array,
• memory, dedicated system bus ...
[Figure: chip floorplan with blocks Logic, Flash / RAM, memory banks, Reconfigurable Accelerator Array]
... become indispensable for SoC products ?
Reconfigurable Logic going Mainstream
• Please, Lobby for New Curricula.
• Comprehensive Methodology
• One of the goals of this talk: to motivate You by Key Issues and Visionary Highlights.
• Fine grain: FPGAs killing the ASIC market
• Coarse grain: several startups
• Substantially improved design flow and libraries
• Fastest growing segment of semiconductor market
Designer-oriented Innovation stalled ?
• EDA industry: about 7 billion $
• leverages > 200 billion $ semiconductor industry
• FPGAs (7 billion $) fastest growing segment
• EDA industry constantly redefining itself
• “except logic synthesis, no really significant innovation in the past decade”
• CAD developers can’t deliver their ideas effectively
• CAD developers personally don’t appreciate the real problems facing designers
EDA the main bottleneck
Biggest Mistake of EDA guess it !
>> History
• History
• Paradigm Shift
• Coarse Grain: why ?
• Coarse Grain Architectures
• Reconfiguration Architecture http://www.uni-kl.de
Logic Gate Price Trend
[Figure: price per logic element, normalized to Q1 1993, plotted from Q1 '93 to Q1 '00 on an axis from 0 to 1.2; trend: 40% lower per year; labeled data points: 0.261, 0.086, 0.042, 0.029. Source: Altera]
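The "40% lower per year" trend compounds quickly. The following is a minimal sketch of that arithmetic, assuming a constant 40% annual decline from the Q1 1993 baseline. The function name `normalized_price` is illustrative, and the chart's own data points need not follow this idealized curve exactly.

```python
def normalized_price(years_since_1993, annual_decline=0.40):
    """Price per logic element relative to Q1 1993 under a constant yearly decline."""
    return (1.0 - annual_decline) ** years_since_1993


# Print the idealized curve for Q1 '93 through Q1 '00.
for year in range(1993, 2001):
    print(f"Q1 '{year % 100:02d}: {normalized_price(year - 1993):.3f}")
```

Under this assumption the price after seven years is below 3% of the starting value, which is the same order of magnitude as the smallest data point labeled in the chart.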
The History of Paradigm Shifts
“Mainstream Silicon Application is switching every 10 Years”
“The Programmable System-on-a-Chip is the next wave”
[Figure: Makimoto's wave alternating between standard and custom over the time axis 1957, 1967, 1977, 1987, 1997, 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; 1st Design Crisis; 2nd Design Crisis; ?]
Makimoto’s 3rd Wave
• Fine Grain Subsystems (FPGAs):
– 1st half of 3rd wave
– universal (but less efficient)
• Coarse Grain Subsystems:
– 2nd half of 3rd wave
– domain-specific
– much more flexible than 2nd half of 2nd wave
How’s next Wave ?
[Figure: Makimoto's wave (standard / custom; 1957, 1967, 1977, 1987, 1997, 2007) annotated with Tredennick’s Paradigm Shifts and Hartenstein’s Curve]
Tredennick’s Paradigm Shifts:
• hardwired: algorithm: fixed, resources: fixed
• procedural programming: algorithm: variable, resources: fixed
• structural programming: algorithm: variable, resources: variable
Labels around 2007: FPGAs; Coarse grain RAs; ? 4th wave ?; no further wave !
The Impact of Makimoto’s Paradigm Shifts
[Figure: Makimoto's wave (standard / custom; 1957, 1967, 1977, 1987, 1997, 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s)]
Procedural personalization via RAM-based Machine Paradigm
Personalization (CAD) before fabrication
structural personalization: RAM-based, before run time
Dr. Makimoto: FPL 2000 keynote
Software Industry’s Secret of Success
Repeat Success Story by new Machine Paradigm !
>> Paradigm Shift
• History
• Paradigm Shift
• Coarse Grain: why ?
• Coarse Grain Architectures
• Reconfiguration Architecture http://www.uni-kl.de
Sequential vs. structural RAM
[Figure: two machines side by side. “von Neumann”: data path, instruction sequencer and I / O, with (procedural) software downloaded into sequential RAM. FPGA: reconf. accelerator(s), with the results of Logic Synthesis and Route and Place downloaded into structural RAM]
Changing Models of Computing
[Figure: three models of computing.
“von Neumann”: host with data path, instruction sequencer and I / O; (procedural) software downloaded into RAM.
contemporary: host plus hardwired accelerator(s) designed by CAD; software downloaded into RAM; hardware.
reconfigurable computing: host plus reconf. accelerator(s); software and (structural) configware each downloaded into RAM; flexware.]
occupies most silicon
the tail wagging the dog
The Microprocessor is a Methuselah
• 1st 4004
• 2nd 8008
• 3rd 8086
• 4th 80286
• 5th 80386
• 6th 80486
• 7th P5 (Pentium)
• 8th P6 (Pentium Pro / Pentium II)
• 9th Pentium III
9 technology generations ...
... the steam engine
of the silicon age
… Decline of Wintel Business Model
[Figure: charts over 1997 to 2002: billion subscribers worldwide (axis marks 0.5 and 1 billion); billion US-$, US market (source: Forrester; axis marks 10, 15, 20); million devices delivered in the U.S. (source: IDC); price marks 1000 $ and 1500 $]
Basics of Binding Time
time of “Instruction Fetch”:
• run time: microprocessor, parallel computer
• loading time, compile time: Reconfigurable Computing
Binding Time vs. Computing Domain
Binding time (set-up of communication channels):
• before fabrication: full custom ICs
• later fabrication step: ASICs
• at compile time
• at loading time
• at run time: microprocessor, parallel computer
programming domain:
• space domain (structural)
• time & space (hybrid)
• time domain (procedural)
Further labels in the chart: systolic arrays, array processor, Reconfigurable Computing
The KressArray is a generalization of the systolic array
Dataquest Predicts Programmability to be Predominant in SOC
• With programmability as a standard feature, ASPPs will be predominant system-on-a-chip products in five years
Dataquest Semiconductors ‘98 conference
EETimes 10/21/98
Jordan Selburn, principal analyst, ASICs and system-level integration, Dataquest Inc.’s Semiconductors Group
• Application-specific programmable products (ASPPs) will be the next best thing in semiconductor technology
Applications
The 10th International Conference on Field-programmable Logic and Applications
The Roadmap to Reconfigurable Systems
*) keynotes and papers at FPL 2000 Villach, Austria, August 27 - 30, 2000
http://www.fpl.uni-kl.de/FPL/
• next generations’ wireless*
• network processors*
• many other areas*
Applications (2)
• Image Processing:
– for smart car (collision avoidance, others ...),
– Smart traffic pilots, robotics, fast material inspection,
– smart stub finders, motion detection (MPEG-4, ...)
• Signal Processing, Speech Processing, Software Radio,
• Correlation, Encryption, Comm. Switching / Protocols,
• Innovative consumer electronics:
– super smart cards, smart mobile phones, wearable,
– portable, set-top, laptop, desktop, embedded, ...
• many others, ...
Applications
• new cellular standard: up to 2 Mbit/sec; new CDMA standard: > 500 MIPS needed just for the RF receiver part
• wide variety of end-user devices: smart mobile phones, palm pilots, laptops, games, camcorder-likes, ... the internet car, many new types of devices to come ...
• increasingly wide variety of services available from the network provider: download just what a particular customer is subscribed to
• expert group [Vissers]: > 20% of it will be accelerator code*
Why coarse grain ?
[Figure: log-scale trends over 1960 to 2010. Left axis: transistors/chip and normalized processor speed (1 to 100 000 000), with curves for microprocessor / DSP, memory, battery performance, and algorithmic complexity (Shannon’s Law) driven by the wireless generations 1G, 2G, 3G, 4G. Right axis: computational efficiency in mA/MIP (100 down to 0.001), with StrongARM and SH7752 marked. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]
Shannon‘s Law
• In a number of application areas throughput requirements are growing faster than Moore's law
• Fundamental flaws in software processor solutions
• 32 soft ARM cores fit onto contemporary FPGA
• Stream-based distributed processing is the way to go
It’s a Paradigm Shift !
• Using FPGAs (fine grain reconfigurable) is mainly just classical Logic Synthesis on a “strange hardware” platform
• Coarse Grain Reconfigurable Arrays (Reconfigurable Computing), however, mean a really fundamental Paradigm Shift
• This is still ignored by CS and EE Curricula and almost all R&D scenes
>> Coarse Grain: why ?
• History
• Paradigm Shift
• Coarse Grain: why ?
• Coarse Grain Architectures
• Reconfiguration Architecture http://www.uni-kl.de
It’s a General Paradigm Shift !
• Using FPGAs (fine grain reconfigurable): just Logic Synthesis on a strange platform
• Coarse Grain Reconfigurable Arrays (Reconfigurable Computing): a fundamental Paradigm Shift
• ignored by Curricula & most R&D scenes
• Replacing Concurrent Processes by much more efficient parallelism: Stream-based Computing Arrays
Fine-grained vs. coarse-grained
• Fine-grained reconfiguration versus coarse-grained reconfiguration.
• fine grain is general purpose
• slow and area-inefficient, but high parallelism
• coarse grain is application domain-specific
• coarse grain is highly area-efficient
• extremely high performance
Reconfigurability Overhead
[Figure: array of logic blocks (L) and switch boxes (S), contrasting the area used by the application with the resources needed for reconfigurability, partly for configuration code storage; “hidden RAM” not shown]
Principle of a Typical FPGA
[Figure: array of CLBs with routing lines; labels: Connection-Point, Tap, CLB; FF = flipflop of hidden RAM]
Routing Overhead in FPGAs
[Figure: FPGA routing detail; FF = part of the hidden RAM]
• > 1000 transistors at each cross bar
• > 40 transistors at each switching point
• > 15 transistors at each tap
• most FPGA vendors’ gate count: 1 flipflop of configuration RAM = 4 gates
• Routing Congestion [DeHon]: often 50% or less of CLBs used
Why Coarse Grain instead of FPGA ?
[Figure: transistors / chip (log scale, 1000 to 100 000 000 000) over 1980 to 2010, with curves labeled physical, logical, FPGA physical, FPGA logical and FPGA routed; gaps marked ~ 10 and ~ 10 000. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]
• reconfigurability overhead reduced by up to ~ 1000
• drastically smaller configuration memory
• much faster loading
• many more benefits
>>> extremely high efficiency
1. avoiding address computation overhead
2. avoiding instruction fetch and interpretation overhead
3. high parallelism, massively multiple deep pipelines
4. much less configuration memory
5. no routing areas to configure functions from CLBs
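Points 1 to 3 can be illustrated loosely in software. The sketch below assumes a hypothetical three-stage datapath (scale, offset, clip) that does not come from the talk. Each stage is configured once before any data flows, and the data stream then passes through the chain without per-item instruction selection. Python generators only mimic the structure; they do not model hardware timing or true parallelism.

```python
def stage(op, upstream):
    """One configured operator: applies op to every item of the incoming stream."""
    for item in upstream:
        yield op(item)


def configure_pipeline(ops, source):
    """Chain the operators once ("configuration"), before any data flows."""
    stream = iter(source)
    for op in ops:
        stream = stage(op, stream)
    return stream


# Hypothetical datapath: scale by 3, add offset 1, clip to the 8-bit range.
ops = [lambda x: x * 3, lambda x: x + 1, lambda x: min(x, 255)]
result = list(configure_pipeline(ops, range(0, 120, 20)))
print(result)
```

In an actual datapath array every stage would be a separate physical unit working concurrently on successive stream items, which is where the deep-pipeline parallelism of point 3 comes from.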
Configurable Computing Systems
• combine a programmable sequential processor with Flexware (structurally programmable “hard”ware):
• capitalize on the strengths of both, flexware and software.
• early 1960s: Estrin (UCLA): enabling technology not available
• 1990s: significant increase of research activities (DARPA ...)
• FPGAs: not the enabling technology: hardware skills needed
• Verilog or VHDL based systems often result in poor performance
Platforms available
• Soft Data Path Arrays
– KressArray
– Xtreme (PACT)
– ACM (Quicksilver Tech)
– CHESS Array (Elixent)
– others
• Compilation techniques feasibility studies:
– Partitioning Co-Compiler
– Design Space Explorer
– others
Also as an autonomous Machine
• New Machine Paradigm (Xputer)
• is the counterpart of the so-called von Neumann paradigm
– CONS: confuses customers (paradigm switch: the brain hurts)
– PROS: strong guidance of EDA tool development
– more effective hardware/software APIs
– compilation techniques similar to traditional compilation
– better Application Development Tools accepting C or Java
• easy to teach: simple machine principles
– scan patterns (data counter) similar to control flow (program counter)
– general model of hardware / software co-design
– fascination for freak effect: opening up a new R&D discipline
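The analogy between scan patterns and control flow can be made concrete with a small sketch. It assumes a hypothetical data counter that produces a row-by-row scan over a 2-D data memory. The names `video_scan` and `run_scan`, the memory contents, and the squaring operator are illustrative and are not taken from the Xputer design.

```python
def video_scan(rows, cols):
    """Data counter: yields (row, col) addresses in a row-by-row scan pattern."""
    for r in range(rows):
        for c in range(cols):
            yield r, c


def run_scan(memory, scan, operator):
    """Apply one fixed (configured) operator at every address the scan visits."""
    return [operator(memory[r][c]) for r, c in scan]


memory = [[1, 2, 3], [4, 5, 6]]
out = run_scan(memory, video_scan(2, 3), lambda v: v * v)
print(out)
```

Here the sequence of data addresses plays the role that the sequence of instruction addresses plays in a program-counter machine, while the operator itself stays fixed for the whole scan.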
>> Coarse Grain Architectures
• History
• Paradigm Shift
• Coarse Grain: why ?
• Coarse Grain Architectures
• Reconfiguration Architecture http://www.uni-kl.de
Some Players in Silicon Valley and ….
Company | Architecture | Business Model | Markets
Adaptive Silicon | Not disclosed | Sell Cores | Embedded DSP
Chameleon Systems | 32 bit datapath array | Sell Chips | Networking
Malleable | Not disclosed | Sell Chips | Voice over IP
Silicon Spice | Not disclosed | Sell Solutions | Networking
Systolix | Bit Serial Systolic Array | Sell Cores | Signal Conditioning
MorphICs | Not disclosed | Sell Cores | Wireless Commun.
Triscend | System on Chip | Sell Chips | Embedded Systems
Network Processors: > 20 Players
Commercial rDPAs
• XPU family (IP cores), XPU128: PACT Corp., Munich
• flexible array: MorphICs
• CALISTO: Silicon Spice
• CS2000 family: Chameleon Systems
• MECA family: Malleable
• FIPSOC: SIDSA
• ACM: Quicksilver Tech
• CHESS array: Elixent
• MorphoSys: Morpho Tech
*) here at SoC
**) bought
PACT Corp
• Xtreme Processor Platform (XPP) family of IP cores: high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports.
• Application development support software featuring a flow-graph-style algorithm mapping language, to minimize training requirements.
• XPP fabrics feature automatic data-flow synchronization and a flagged event network to dynamically configure the execution flow.
• Supports dynamic RTR: hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order.
• Automatic event-based task swapping along with data streams: released resources are reconfigured automatically and immediately.
Reconfigurable Interconnect Fabric
rDPA (Reconfigurable Datapath Array)
[Figure: two 4 x 4 arrays of rDPUs. One uses a separate routing area; in the other the RIF is laid out over the rDPUs, so that the rDPA is wired by abutment]
Generically defined Fabrics: KressArray Family
[Figure: panels a) to i) showing rDPU variants; labels: rDPU: routing only; rDPU: routing and function]
Some Application Areas, e.g. Wireless Communication, need extraordinarily powerful Communication Resources
Universal RAs are not always feasible
... often Functional Resources are not the Throughput Bottleneck
Some Application Areas, such as Wireless Communication, need extremely rich Communication Resources
Use Domain-specific Platform Generators !
The General Purpose (coarse grain) Reconfigurable Array may appear to be an Illusion ...
KressArray Family Example
[Figure: tailored KressArray rDPU example, external view with NNports labeled 2, 4, 8, 16, 24 and 32; only the NNport Abutment Architecture is shown]
http://kressarray.de
KressArray Family generic Fabrics: a few examples
Examples of 2nd Level Interconnect: laid out over the rDPU cell, no separate routing areas !
Select Function Repertory:
• route-through and function
• route-through only
• more NNports: rich Routing Resources
Select Nearest Neighbour (NN) Interconnect: an example
[Figure: rDPU with NNports labeled 2, 4, 8, 16, 24 and 32]
Select mode, number, width of NNports
http://kressarray.de
CMOS interconnect resources
Foundries offer up to 8 metal layers and up to 3 poly layers
reconfigurable interconnect fabric laid out over the rDPU cell
Super Pipe Networks
systolic array:
• applications: regular data dependencies only
• pipeline properties, shape: linear only
• pipeline properties, resources: uniform only
• mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic rDPA *:
• applications and pipeline properties: no restrictions
• mapping: simulated annealing or P&R algorithm
• scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [1995]
Communication Resource Requirements
... often Functional Resources are not the Throughput Bottleneck
In some Application Areas, such as Wireless Communication, Reconfigurable Computing Arrays need extraordinarily rich and powerful Communication Resources
The Solution: Generators for Domain-specific RA Platforms
SNN filter KressArray Mapping Example
array size: 10 x 16 = 160 rDPUs
Legend: rDPU not used; rDPU used for routing only (route-through only); operator and routing; port location marker; backbus connect
http://kressarray.de
Xplorer Plot: SNN Filter Example
Legend: rDPU not used; rDPU used for routing only (route-thru-only rDPU); operator and routing; port location marker; backbus connect
[Figure: rDPU detail with 3 vert. NNports, 32 bit, and 2 hor. NNports, 32 bit; labels: operator + [13], operand, operand, result, route thru, backbus connect]
http://kressarray.de
Super Pipe Networks
systolic array:
• applications: regular data dependencies only
• pipeline properties, shape: linear only
• pipeline properties, resources: uniform only
• mapping and scheduling (data stream formation): linear projection or algebraic synthesis
super-systolic RA *:
• applications and pipeline properties: no restrictions
• mapping: simulated annealing or P&R algorithm
• scheduling (data stream formation): (e.g. force-directed) scheduling algorithm
*) KressArray [ASP-DAC-1995]
KressArray: try it out yourself !
• You may experiment yourself
• You may use it over the internet
• Map an application onto a KressArray
• Start with a simple example
• Visit http://kressarray.de
• Click the link to Xplorer
• ... does not run on Internet Explorer ....
• ... since Bill Gates does not like Java
try Netscape 4.7x
Michael Herz
Dissertation Michael Herz:
• ... on mapping parallel memory architectures for stream-based arrays onto KressArrays
• ... also transformation of storage schemes to optimize memory bandwidth
• (MoM scan pattern transformations)
Agilent, Sindelfingen
Ulrich Nageldinger
Dissertation
Ulrich Nageldinger:
• ... on mapping applications onto KressArrays
• ... simultaneous routing and placement by simulated annealing
• Supporting a huge family of KressArrays
• fuzzy logic improvement proposal generator
• profiling
• design space exploration
infineon technologies, Munich
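To give a feel for mapping by simulated annealing, here is a minimal sketch under strong simplifications. It only places operators on a mesh and uses total Manhattan wire length as a stand-in for routing cost, so it is not the simultaneous routing and placement of the dissertation. The cost function, cooling schedule, parameter values and all names (`wirelength`, `anneal_placement`) are assumptions made for illustration.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over all two-point nets."""
    total = 0
    for a, b in nets:
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += abs(ra - rb) + abs(ca - cb)
    return total


def anneal_placement(ops, nets, rows, cols, steps=5000, t0=5.0, seed=0):
    """Place operators on a rows x cols mesh (needs len(ops) <= rows * cols)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    grid = list(ops) + [None] * (len(cells) - len(ops))  # None = unused cell
    rng.shuffle(grid)

    def placement_of(g):
        return {op: cells[i] for i, op in enumerate(g) if op is not None}

    cost = wirelength(placement_of(grid), nets)
    initial_cost = cost
    best, best_cost = placement_of(grid), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        i, j = rng.randrange(len(grid)), rng.randrange(len(grid))
        grid[i], grid[j] = grid[j], grid[i]  # propose: swap two cells
        new_cost = wirelength(placement_of(grid), nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept, possibly uphill while temp is high
            if cost < best_cost:
                best, best_cost = placement_of(grid), cost
        else:
            grid[i], grid[j] = grid[j], grid[i]  # reject: undo the swap
    return best, best_cost, initial_cost


# Hypothetical application: a chain of six operators on a 4 x 4 mesh.
ops = ["a", "b", "c", "d", "e", "f"]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
best, best_cost, initial_cost = anneal_placement(ops, nets, rows=4, cols=4)
print(initial_cost, "->", best_cost)
```

Accepting some cost-increasing swaps at high temperature is what lets the search leave poor local arrangements; the best placement seen so far is kept separately so the returned result is never worse than the random start.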
Rainer Kress
Dissertation
Rainer Kress:
• ... on mapping applications onto his* KressArray
• DPSS datapath synthesis system
• Including a data scheduler
• (data stream scheduler)
• Generalization of the Systolic Array
• (KressArray is a super systolic array)
• 32 bit design via Eurochip support
infineon technologies, Munich
Jürgen Becker
Dissertation
Jürgen Becker:
• ... Automatically partitioning Co-compiler
• (configware / software co-compilation)
• Resource-parameter-driven, retargetable
• Profiler-driven optimization
• Accepts HLL “ALE-X” (extended C subset)
• (subset: pointers not supported)
Professor at Univ. Karlsruhe
Karin Schmidt
Dissertation
Karin Schmidt:
• Compilation Techniques for Xputers
• modified loop transformations
• Modified parts of implementation used for Jürgen Becker‘s Ph. D. thesis
DaimlerChrysler Research
CHESS Array w. embedded RAM (Elixent)
[Figure: CHESS array of ALUs with embedded RAM blocks; further labels: User Registers, Clock Control, Memory Interface, Sequencer]
multi-granular, e. g. 16 * 4 Bits = 64 Bits
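The "multi-granular" note (16 * 4 bits = 64 bits) suggests that narrow ALUs can be ganged into wider datapaths. The sketch below illustrates that idea with a plain ripple-carry chain of 4-bit adder slices. This is an assumption made for illustration only and does not describe the actual carry logic of the CHESS array; `alu4_add` and `wide_add` are hypothetical names.

```python
def alu4_add(a, b, carry_in):
    """One 4-bit adder slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def wide_add(x, y, slices=16):
    """Add two (4 * slices)-bit numbers by chaining 4-bit slices via the carry."""
    result, carry = 0, 0
    for i in range(slices):
        shift = 4 * i
        nibble, carry = alu4_add((x >> shift) & 0xF, (y >> shift) & 0xF, carry)
        result |= nibble << shift
    return result, carry


# 16 slices of 4 bits give a 64-bit adder; the final carry signals overflow.
print(wide_add(0x1234, 0x0FFF))
print(wide_add((1 << 64) - 1, 1))
```

Choosing a different `slices` value gives other datapath widths from the same 4-bit building block, which is the point of a multi-granular fabric.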
Chameleon Systems
• A RISC processor and an array of 108 arithmetic processing units. Each of those 32-bit processing cores runs at 125 MHz.
• The CS2112 is the industry's first Reconfigurable Communications Processor (RCP), a streaming data processor.
• The vendor claims a performance of 20 billion 16-bit operations per second and 2.4 billion 16-bit multiply-accumulates per second, plus 1.6 GBytes / sec for its programmable I/O (PIO) banks.
• It also has a PCI interface.
• Tool suite C~SIDE for developing, verifying and optimizing.
Coarse Grain Architectures
project | first publ. | source | architecture | granularity | fabrics | mapping | intended target application
style: mesh
DP-FPGA | 1994 | [4] | 2-D array | 1 & 4 bit, multi-granular | inhomog. routing channels | switchbox routing | regular datapaths
KressArray | 1995 | [5,11] | 2-D mesh | family: sel. pathwidth | multiple NN & bus segments | (co-)compilation | (adaptable)
Colt | 1996 | [12] | 2-D array | 1 & 16 bit | inhomogeneous | run time reconfiguration | highly dynamic reconfig.
Matrix | 1996 | [15] | 2-D mesh | 8 bit, multi-granular | 8NN, length 4 & global lines | multi-length | general purpose
RAW | 1997 | [17] | 2-D mesh | 8 bit, multi-granular | 8NN switched connections | switchbox rout | experimental
Garp | 1997 | [16] | 2-D mesh | 2 bit | global & semi-global lines | heuristic routing | loop acceleration
REMARC | 1998 | [18] | 2-D mesh | 16 bit | NN & full length buses | (info not available) | multimedia
MorphoSys | 1999 | [19] | 2-D mesh | 16 bit | NN, length 2 & 3 global lines | manual P&R | (not disclosed)
CHESS | 1999 | [20] | hexagon | 4 bit, multi-granular | 8NN and buses | JHDL compilation | multimedia
DReAM | 2000 | [21] | 2-D array | 8 & 16 bit | NN, segmented buses | co-compilation | next generation wireless
CS2000 family | 2000 | [23] | 2-D array | 16 & 32 bit | inhomogeneous array | (not disclosed) | communication
MECA family | 2000 | [24] | 2-D array | multi-granular | (not disclosed) | (not disclosed) | tele- & datacommunication
CALISTO | 2000 | [25] | 2-D array | 16 bit, multi-granular | (not disclosed) | (not disclosed) | tele- & datacommunication
FIPSOC | 2000 | [26] | 2-D array | 4 bit, multi-granular | (not disclosed) | (not disclosed) | tele- & datacommunication
style: linear
RaPID | 1996 | [27] | 1-D array | 16 bit | segmented buses | channel routing | pipelining
PipeRench | 1998 | [29] | 1-D array | 128 bit | (sophisticated) | scheduling | pipelining
style: crossbar
PADDI | 1990 | [30] | crossbar | 16 bit | central crossbar | routing | DSP
PADDI-2 | 1993 | [32] | crossbar | 16 bit | multiple crossbar | routing | DSP and others
Pleiades | 1997 | [33] | mesh+crossbar | multi-granular | multiple segmented crossbar | switchbox routing | multimedia
69
Primarily Mesh-based ….
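market | project | granularity (bits) | source
research | KressArray | variable | U. Kaiserslautern
research | Garp | 2 | UC Berkeley
research | CHESS | 4 | Hewlett Packard
research | Matrix | 8 | M.I.T.
research | RAW | 8 | M.I.T.
research | Colt | 1 & 16 | Virginia Tech
research | DReAM | 8 & 16 | TU Darmstadt
research | REMARC | 16 | Stanford
research | MorphoSys | 16 | UC Irvine
commercial | CALISTO | 16 | Silicon Spice
commercial | MECA family | 16 | Malleable
commercial | CS2000 family | 16 & 32 | Chameleon Systems
commercial | FIPSOC | 16 & analog | SIDSA
commercial | XPP XPU128 | 32 | PACT Corp.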
70
UC Berkeley (Jan Rabaey)
market | project | granularity (bits) | source
research | PADDI, PADDI-2, Pleiades | 16 | UC Berkeley
71
Crossbar-based Architectures
1993: PADDI-II (Jan Rabaey)
[Figure: eight EXU/CTL pairs, crossbar switch, I/O]
1990: UC Berkeley (Jan Rabaey), 16 bit
1997: Pleiades (mesh & crossbar), 32 bit
72
PADDI-II Architecture
[Figure: processing elements P1 to P48 grouped into 4-PE clusters; labels: Network, break-switch, I/O, 6 x 16b, 16 x 6 switch matrix, Level-2, 16 x 16b, Level-1 Network, 4-PE Cluster]
73
MorphoSys
74
PipeRench Architecture (CMU 1998)
highly dynamic reconfiguration
alternating data/instruction stream
75
M.I.T.
RAW (M.I.T. 1997): Reconfigurable Architecture Workstation
MATRIX (1996): Multiple ALU archiTecture with Reconfigurable Interconnect eXperiment
0.5 µm CMOS, 8 bit, 10 x 10, 1.8 mm², 100 MHz
[Figure labels: MIPS-like processor core, cross bar, global lines]
[Figure: BFU with 8-bit ALU, 256 x 8 bit Mem, WE mode, Network Port A, Network Port B, Mem Func Port, ALU Func Port, compare / reduce 1, compare / reduce 2, C / R Network, Level-1 Network]
[Table: BFU opc (0 to 15) and operation; operations listed: ×, +, const, insh, nsh, dsh, csh, +0, +1, :=, nand, nor, xor]
76
MATRIX Interconnect Fabrics
[Figure: a BFU, its neighbours, and further BFUs]
Communication Resources are often the bottleneck
77
More Research Projects
Garp (UC Berkeley)
RaPiD (U. Washington)
REMARC (Stanford)
DReAM (U. Karlsruhe)
.... and others
published between 1996 - 2000
Asia / Pacific: also see embedded tutorials by Prof. Amano (ASP-DAC '99, FPL-2000)
78
RaPiD Architecture
[Figure: linear datapath with ALU, RAM, and MULT units; labels: Bus Connectors, Input Multiplexers, Output Drivers, Datapath Registers]
79
REMARC
80
Future Coarse Grain RA Development
• It is indispensable to operate within the convergence area of compilers, co-compilers, architecture, and full-custom-style VLSI design (array cells).
• Products must come with a development platform that encourages users, especially those with a limited hardware background.
81
>> Reconfiguration Architecture
• History
• Paradigm Shift
• Coarse Grain: why ?
• Coarse Grain Architectures
• Reconfiguration Architecture http://www.uni-kl.de
82
Dimensions of Reconfigurability
Extremes: ASIPs* vs. Network Processors
*) Application-Specific Instruction set Processors
Class of processor | product | vendor
ASIP | Tensilica | Tensilica
Network Processor | MECA family | Malleable
Network Processor | CALISTO | Silicon Spice
| many others | many others
[Figure: configuration time axis with fabrication time, design time, compile time, and run time, spanning statically reconfigurable (ASIP) to dynamically reconfigurable (Network Processor)]
83
Configuration Architectures
(dynamic vs. static)
straight forward: [Figure: host (Compiler, Mapper, RTOS etc.), RAM, Soft Data Path]
multi-context: [Figure: host (Compiler, Mapper, RTOS etc.), four RAMs, Soft Data Path]
Configuration caching*: [Figure: host (Compiler, Mapper, RTOS etc.), Config. Cache, four RAMs, Soft Data Path with RAM]
[Figure label: dynamic]
*) no cache as usual !
Configuration Loading Resources:
• separate configuration fabrics (e.g. FPGA)
• wormhole routing (KressArray, Colt, PipeRench)
• RA part computes code for other RA part (self reconfiguration)
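The difference between the straightforward and the multi-context arrangement can be made concrete with a toy cost model. This is a minimal sketch under stated assumptions: the class `SoftDataPath`, the helper `run`, the round-robin replacement, and the cycle costs (1000 cycles per host download, 1 cycle per context switch) are all hypothetical and chosen only to show the effect.

```python
class SoftDataPath:
    """Toy model of a reconfigurable datapath with on-chip configuration RAMs."""

    def __init__(self, contexts=1, load_cycles=1000, switch_cycles=1):
        self.slots = [None] * contexts      # one entry per configuration RAM
        self.active = None                  # index of the active RAM
        self.load_cycles = load_cycles      # assumed cost of a host download
        self.switch_cycles = switch_cycles  # assumed cost of a context switch
        self.cycles = 0
        self._next_victim = 0

    def activate(self, name):
        """Make configuration `name` active and return the cycles this took."""
        before = self.cycles
        if name not in self.slots:
            # Not on chip: the host downloads it into a RAM (round robin).
            slot = self._next_victim
            self._next_victim = (slot + 1) % len(self.slots)
            self.slots[slot] = name
            self.cycles += self.load_cycles
        slot = self.slots.index(name)
        if slot != self.active:
            # Already on chip: only the active context has to change.
            self.active = slot
            self.cycles += self.switch_cycles
        return self.cycles - before


def run(trace, contexts):
    """Total cycles spent on reconfiguration for a sequence of configurations."""
    datapath = SoftDataPath(contexts=contexts)
    for name in trace:
        datapath.activate(name)
    return datapath.cycles


trace = ["fir", "fft", "fir", "fft"]
single_context_cycles = run(trace, contexts=1)
multi_context_cycles = run(trace, contexts=4)
```

With these assumed costs, the alternating trace takes 4001 cycles with a single configuration RAM and 2004 cycles with four, because the second visit to each configuration is a context switch rather than a new download.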
84
Colt Architecture (P. Athanas 1996)
[Figure: 4 x 4 array of IFUs, Smart Crossbar, Multiplier, six DPs, I/O Pins]
Studying highly dynamic reconfiguration
wormhole routing
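Wormhole routing as a configuration mechanism can be pictured as a stream whose header steers itself through the array and configures each cell it passes. The sketch below is a simplified illustration of that idea only: the function `route_stream`, the header format of (operation, direction) pairs, and the 4 x 4 grid are assumptions, not the actual Colt stream format.

```python
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def route_stream(rows, cols, entry, header):
    """Walk a self-steering stream through a rows x cols array.

    `header` is a list of (operation, direction) pairs. Each visited cell
    strips the first pair, configures itself with the operation, and forwards
    the rest of the stream in the given direction (None ends the path).
    Returns a dict mapping cell coordinates to the configured operation.
    """
    config = {}
    r, c = entry
    for op, direction in header:
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"stream left the array at {(r, c)}")
        if (r, c) in config:
            raise ValueError(f"cell {(r, c)} is already configured")
        config[(r, c)] = op
        if direction is None:
            break
        dr, dc = MOVES[direction]
        r, c = r + dr, c + dc
    return config


# A three-stage pipeline laid out from the top-left corner of a 4 x 4 array.
pipeline = route_stream(4, 4, (0, 0), [("mul", "E"), ("add", "S"), ("acc", None)])
```

No central controller assigns the cells here: the path and the per-cell operations both travel in the stream, which is what allows such a scheme to reconfigure parts of the array while other parts keep running.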
85
Schedule
time slot
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Compilation Techniques for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for Stream-based RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments
86
- END -