DSD2001
Reconfigurable Computing: the Roadmap to a New Business Model – and its Impact on SoC Design
TS4: Tuesday, 14.00 hrs
Reiner Hartenstein
University of Kaiserslautern
Pirenópolis, GO, Brazil, Sept. 10-15, 2001
© 2001, [email protected] http://www.fpl.uni-kl.de
University of Kaiserslautern
Xputer Lab
Conferences on Reconfigurable Logic
• topic adoption by congresses: ASP-DAC, DAC, DATE, ISCAS, SPIE ...
• FCCM, FPGA (founded 1992), and FPL (founded 1991 at Oxford, UK):
• FPL 2002, La Grande Motte (Montpellier, France), Sept. 2 – 4, http://www.lirmm.fr/fpl2002/
Paper submission deadline: 15th March 2002
Notification of acceptance: 20th May 2002
The International Conference on Field-programmable Logic and Applications
Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
>> Introduction
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future developments expected
• Conclusions
(margin labels: fine grain; coarse grain; fundamental issues)
http://www.uni-kl.de
Logic Gate Price Trend
[Figure: Price per Logic Element, normalized to Q1/1993, plotted from Q1'93 to Q1'00 on a scale of 0 to 1.2; 40% lower per year. Values labelled in the chart: 0.261, 0.086, 0.042, 0.029. Source: Altera]
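The "40% lower per year" trend compounds quickly. As a rough check (a sketch only: the function name and the assumption of a perfectly constant yearly decline are illustrative and not taken from the Altera data), seven years of decline from Q1'93 to Q1'00 leave roughly 2.8% of the original price, which is close to the smallest value labelled in the chart (0.029):

```python
def normalized_price(years, annual_decline=0.40):
    """Price relative to the starting quarter, assuming a constant yearly decline.

    The constant 40% rate is the figure quoted on the slide; treating it as
    exact for every year is an assumption made for this sketch.
    """
    return (1.0 - annual_decline) ** years


if __name__ == "__main__":
    for years in range(8):
        quarter = (93 + years) % 100  # 93, 94, ..., 99, 00
        print(f"Q1'{quarter:02d}: {normalized_price(years):.3f}")
```

The same function can be used to try other decline rates against the labelled values.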
The Impact of Reconfigurable Logic
• Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design.
• A rapidly growing large user base of HDL-savvy designers with FPGA experience.
• Flexibility supports turn-around times of minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field upgrades
• However, completely ignored by CS & CSE Curricula
The History of Paradigm Shifts
“Mainstream Silicon Application is switching every 10 Years”
“The Programmable System-on-a-Chip is the next wave”
[Figure: Makimoto's Wave (published in 1989), swinging between "standard" and "custom" in ten-year steps: 1957, 1967, 1977, 1987, 1997, 2007. Wave labels: TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable. Annotations: 1st Design Crisis; 2nd Design Crisis; What's coming next ?]
What's the next Wave ?
[Figure: the standard / custom wave from 1957 to 2007, annotated with Tredennick's Paradigm Shifts and Hartenstein's Curve. Labels: FPGAs (2007); Coarse grain RAs; no further wave !; 4th wave ?]
Tredennick's Paradigm Shifts:
• hardwired: algorithm: fixed, resources: fixed
• procedural programming: algorithm: variable, resources: fixed
• structural programming: algorithm: variable, resources: variable
The Impact of Makimoto's Paradigm Shifts
[Figure: Makimoto's wave between "standard" and "custom", 1957 to 2007. Wave labels: TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable]
• Procedural personalization via RAM-based Machine Paradigm
• Personalization (CAD) before fabrication
• Structural personalization: RAM-based, before run time
• Software Industry's Secret of Success
• Repeat Success Story by new Machine Paradigm !
Dr. Makimoto: FPL 2000 keynote
Terminology
Paradigm | Platform | Programming source
“von Neumann” | Hardware | Software
Soft Machine (w. soft datapaths) | Coarse grain Flexware | high level Configware
RL (FPGA etc.) | fine grain Flexware | netlist level Configware
Reconfigurable Logic going Mainstream
• Please, Lobby for New Curricula.
• Comprehensive Methodology
• One of the goals of this talk: to motivate You by Key Issues and Visionary Highlights.
• Fine grain: FPGAs killing the ASIC market
• Coarse grain: several startups
• Substantially improved design flow and libraries
• Fastest growing segment of semiconductor market
>> FPGA boom
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain RAs
• Principles of Soft Computing Machines
• Future development expected
• Conclusions
http://www.uni-kl.de
What is an FPGA ?
[Figure: Xilinx XC4000E fabric: an array of logic blocks (L = Logic Block) and switch boxes (S = Switch Box), connected by single-length lines, double-length lines, and long lines]
Top 4 FPGA Manufacturers 2000
[Pie chart: Top 4 PLD Manufacturers 2000, $3.7 Bio]
Xilinx 42%
Altera 37%
Lattice 15%
Actel 6%
FPGA market 1998 / 1999
1999 rank | vendor | global sales 1998 (mio $) | global sales 1999 (mio $)
1 | Xilinx | 629 | 899
2 | Altera | 654 | 837
3 | Lattice | 206 | 410
4 | Actel | 154 | 172
5 | Lucent | 100 | 120
6 | Cypress | 41 | 43
7 | Quicklogic | 30 | 40
8 | Atmel | 32 | 38
Source: IC Insights Inc.
Meanwhile, Xilinx acquired Philips' MOS PLD business, Lattice purchased Vantis.
FPGAs going Mainstream
• [Dataquest] PLD market > $7 billion by 2003.
• IP reuse and "pre-fabricated" components for the efficiency of design and use for PLDs
• FPGAs are going into every type of application.
• FPGA, from an IP standpoint, starting to look like an ASIC.
• PLD vendors provide libraries to support their products.
• today Altera and Xilinx own >65% of PLD business.
• FPGAs soon reach 50 million system gates
Away from complex design flow
EDA trends .... [S. Guccione]
[Diagram: traditional flow Schematics/HDL → Netlister → Netlist → Place and Route → Bitstream, contrasted with HLL → Compiler]
Drop traditional separate design flow
EDA trends .... [S. Guccione]
[Diagram: software flow User Code → Compiler → Executable alongside hardware flow Schematics/HDL → Netlister → Netlist → Place and Route → Bitstream; both to be replaced by a single HLL Compiler]
embedded hardw. CPU & memory cores
embedded CPU and memory available [S. Guccione]
[Diagram: HLL Compiler targeting a device with CPU core, FPGA core, and Memory core, plus external memory]
CPU for configuration management
• on-board microprocessor CPU is available anyhow - even along with a little RTOS
EDA trends .... [S. Guccione]
[Diagram: HLL → Compiler]
Configuration Architectures
(dynamic vs. static configuration)
[Diagrams: three configuration architectures, each with a host (Compiler, Mapper, RTOS etc.), configuration RAM, and a Soft Data Path]
• straight forward: a single configuration RAM
• multi-context: several configuration RAMs
• configuration caching*: a Config. Cache of several RAMs in front of the configuration RAM
Dynamic (RTR)
*) no cache as usual !
Configuration Loading Resources:
• separate configuration fabrics (e.g. FPGA)
• wormhole routing (KressArray, Colt, PipeRench)
• RA part computes code for other RA part (self reconfiguration)
Converging factors for RTR [S. Guccione]
• million gate FPGAs and co-processing with standard microprocessor are commonplace
• direct implementation of complex algorithms
• new tools like Xilinx JBits tool suite
• directly support coprocessing and Run Time Reconfiguration (RTR)
[Diagram, S. Guccione: User Java Code → Java Compiler with JBits API → Executable, running on a device with CPU core, FPGA core, and Memory core]
(5) static vs. dynamic reconfiguration
15 min
• supports ASAT, adaptable devices
• requires disciplined implementation to avoid a testing nightmare
• supported by on-board / on-chip CPU core
• supports in-field debugging and upgrading (new business model)
[Figure, Kean, page 109: Revenue / month over Time / months (1 to 30) for an ASIC Product versus a reconfigurable Product with download; Update 1 and Update 2 are marked on the reconfigurable product curve]
Configware as the Key Enabler
• Configware market is taking off for mainstream
• FPGA-based designs more complex, even SoC
• No design productivity and quality without good configware libraries (soft IP cores) from various application areas.
• Growing no. of independent configware houses (soft IP core vendors) and design services
• Xilinx AllianceCORE & Reference Design Alliance et al.
• Currently the top FPGA vendors are key innovators and meet most configware demand.
„Driver“ & „OS“ for FPGAs
• separate EDA software market, comparable to the compiler / OS market in computers,
• Cadence, Mentor, Synopsys just jumped in.
• Xilinx and Altera are fabless FPGA vendors
• < 5% Xilinx / Altera income from EDA software
• > 50% Xilinx people work on support, EDA & Configware
>> Coarse Grain Architectures
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future developments expected
• Conclusions
for detailed overview see proceedings
http://www.uni-kl.de
Why coarse grain ?
[Figure: log-scale trends from 1960 to 2010: Transistors/chip (memory; microprocessor / DSP), normalized processor speed, Algorithmic Complexity (Shannon's Law) for wireless generations 1G, 2G, 3G, 4G, battery performance, and computational efficiency in mA / MIP (0.001 to 100; StrongARM, SH7752 marked). Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]
Fine-grained vs. coarse-grained
• Fine-grained reconfiguration versus coarse-grained reconfiguration.
• fine grain is general purpose
• slow and area-inefficient, but high parallelism
• coarse grain is application domain-specific
• coarse grain is highly area-efficient
• extremely high performance
Reconfigurability Overhead
[Figure: FPGA fabric of logic blocks (L) and switch boxes (S), distinguishing the area used by the application from the resources needed for reconfigurability, partly for configuration code storage; “hidden RAM” not shown]
Why Coarse Grain instead of FPGA ?
[Figure: Transistors / chip (log scale, 1000 to 100 000 000 000) from 1980 to 2010. Curves: memory; microprocessor; FPGA physical; FPGA logical; FPGA routed; (super)systolic physical and logical. Gap annotations: ~ 10 and ~ 10 000. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]
• reduced reconfigurability overhead by up to ~ 1000
• drastically smaller configuration memory
• much faster loading
• many more benefits
Commercial RAs
XPU family (IP cores): PACT corp., Munich
XPU128
flexible array: MorphICs
CALISTO: Silicon Spice*
CS2000 family: Chameleon Systems
MECA family: Malleable*
FIPSOC: SIDSA
ACM: Quicksilver Tech
CHESS array: Elixent
*) bought
Universal RAs are not feasible
... often Functional Resources are not the Throughput Bottleneck
Some Application Areas, e.g. Wireless Communication, need extremely rich Communication Resources
Use Domain-specific Platform Generators !
The General Purpose (coarse grain) Reconfigurable Array appears to be an Illusion ...
KressArray Family generic Fabrics: a few examples
[Figure: rDPU fabric options. Numeric labels: 16, 32, 8, 24, 4, 2]
• Select Function Repertory
• select Nearest Neighbour (NN) Interconnect: an example
• Select mode, number, width of NNports
• route-through and function; route-through only; more NNports: rich Route Resources
• Examples of 2nd Level Interconnect: laid out over rDPU cell - no separate routing areas !
http://kressarray.de
SNN filter KressArray Mapping Example
array size: 10 x 16 = 160 rDPUs
Legend: rDPU not used; used for routing only (route thru only); operator and routing; port location marker; backbus connect
http://kressarray.de
Xplorer Plot: SNN Filter Example
[Figure labels: route-thru-only rDPU; operator +[13] with operand, operand, and result; route thru; backbus connect; 3 vert. NNports, 32 bit; 2 hor. NNports, 32 bit]
http://kressarray.de
>> Fascinating Paradigm Shift
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future development expected
• Conclusions
http://www.uni-kl.de
Paradigm Shift
Mainstream
Tornado
Development of Hypergrowth Markets
Harper Business 1995
The next EDA Industry Revolution
EDA industry paradigm switching every 7 years [Keutzer / Newton]:
• 1978: Transistor entry: Applicon, Calma, CV ...
• 1985: Schematics entry: Daisy, Mentor, Valid ...
• 1992: Synthesis: Cadence, Synopsys ...
• 1999: (Co-) Compilation, Stream-based DPU arrays [Hartenstein]
• 2006
Makimoto's 3rd wave
It's a General Paradigm Shift !
• Using FPGAs (fine grain reconfigurable): just Logic Synthesis on a strange platform
• Coarse Grain Reconfigurable Arrays (Reconfigurable Computing): a fundamental Paradigm Shift
• ignored by Curricula & most R&D scenes
• Replacing Concurrent Processes by much more efficient parallelism: Stream-based Computing
Arrays:
systolic array* [1980]
KressArray** [1995]
chip-on-a-day* [2000]
*) hardwired
**) reconfigurable
Stream-based Computing (2)
terms:
• DPU: datapath unit
• DPA: datapath array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
• stream-based computing: using complex pipe network (super-systolic: Kress et al.)
Converging Design Flows
the same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995], and DPA [Broderson, 2000]
this synthesis method is a generalization of systolic array synthesis: super systolic synthesis
terms:
DPU: datapath unit
DPA: datapath array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA
Concurrent Computing
[Figure: several CPUs, each consisting of a DPU with its own instruction sequencer, connected by Bus(es) or switch box]
extremely inefficient
Stream-based Computing
[Figure: a pipeline of DPUs]
driven by data stream from / to memory or from / to peripheral interface
transport-triggered execution
no instruction sequencer inside !
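The idea of a DPU pipeline that is driven purely by arriving data can be sketched in software with Python generators. This is only an illustration under stated assumptions: the names `dpu` and `pipe_network` and the two example operations are hypothetical, and a real DPU array works in parallel hardware rather than by lazy evaluation.

```python
def dpu(operation, stream):
    """A datapath unit: fires whenever a data item arrives, no instruction fetch."""
    for item in stream:
        yield operation(item)


def pipe_network(stream, operations):
    """Chain DPUs into a linear pipe; the data stream alone drives execution."""
    for operation in operations:
        stream = dpu(operation, stream)
    return stream


if __name__ == "__main__":
    # Hypothetical two-stage pipe: add an offset, then scale.
    stages = [lambda v: v + 1, lambda v: v * 2]
    print(list(pipe_network(iter([1, 2, 3]), stages)))
```

Each stage only reacts to the item handed to it, which mirrors the "transport-triggered" point of the slide: there is no program counter stepping through instructions.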
Stream-based Computing: (r)DPU array
[Figure: a 3 x 3 array of DPUs driven by data streams]
for both, reconfigurable and hardwired
>>> extremely high efficiency
• avoiding address computation overhead
• avoiding instruction fetch and interpretation overhead
• high parallelism, massively multiple deep pipelines
• much less configuration memory
• no routing areas to configure functions from CLBs
>> Programming Coarse Grain RAs
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future development expected
• Conclusions
http://www.uni-kl.de
Systolic Stream-based Computing System
Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units)
[Figure: data streams x1, x2, x3 and y10, y20, y30 enter an array holding the coefficients a11 ... a33; the results y1, y2, y3 leave the array. DPU architecture: a multiply-add cell with inputs x, a, and y]
The Mathematician's Synthesis Method: equations → placement by linear projection or algebraic mapping
linear pipelines and uniform arrays only
no routing !
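The multiply-add DPU and the streams in the figure suggest a matrix-vector computation of the form y_i = y_i0 + Σ_j a_ij · x_j. The following is a minimal software simulation of a linear systolic pipeline for that computation. It is a sketch under assumptions: the function name, the choice of keeping x_j stationary in DPU j, and the clocking scheme are illustrative and are not taken from the slide.

```python
def systolic_matvec(a, x, y0):
    """Simulate a linear systolic array computing y = y0 + A x.

    Assumed layout: DPU j holds x[j]; the partial sum of row i enters DPU 0
    at clock i and moves one DPU to the right per clock, so all DPUs work
    on different rows at the same time.
    """
    n_rows, n_cols = len(a), len(x)
    pipe = [None] * n_cols            # pipe[j]: (row, partial sum) leaving DPU j
    y = list(y0)
    for clock in range(n_rows + n_cols):
        next_pipe = [None] * n_cols
        for j in range(n_cols):
            if j == 0:
                incoming = (clock, y0[clock]) if clock < n_rows else None
            else:
                incoming = pipe[j - 1]
            if incoming is not None:
                row, partial = incoming
                next_pipe[j] = (row, partial + a[row][j] * x[j])  # DPU: y + a * x
        pipe = next_pipe
        if pipe[-1] is not None:      # a finished row leaves the last DPU
            row, total = pipe[-1]
            y[row] = total
    return y


if __name__ == "__main__":
    print(systolic_matvec([[1, 2], [3, 4]], [1, 1], [0, 10]))
```

No routing step is involved: each DPU only talks to its right-hand neighbour, which is the restriction to linear pipelines and uniform arrays that the slide mentions.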
Computing in space and time
[Figure: the same data streams (x1, x2, x3; y10, y20, y30; y1, y2, y3; a11 ... a33) shown as computing in time and as computing in space (systolic arrays etc.); migration by placement, re-timing, and other transformations]
this dichotomy is completely ignored by our CS curricula
General Stream-based Computing System
heterogeneous Array of DPUs (data path units)
[Figure, Kress DPSS [1995], steps numbered 1 to 4. Labels: expression tree; DPU architectures (+, *, sh, xf, -); Mapper: simultaneous placement & routing by simulated annealing; Scheduler: data streams; free form pipe network]
The same mapper for both: reconfigurable or hardwired
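To give a feel for mapping by simulated annealing, here is a small placement-only sketch. It is not the DPSS algorithm: the cost function (total Manhattan wire length), the swap move, the cooling schedule, and all names are assumptions chosen for illustration, and routing is left out entirely.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all producer/consumer pairs."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in edges)


def anneal_placement(nodes, edges, rows, cols, steps=2000, seed=0):
    """Place operator nodes on a rows x cols DPU grid by simulated annealing."""
    if len(nodes) > rows * cols:
        raise ValueError("more operators than DPUs")
    rng = random.Random(seed)
    slots = list(nodes) + [None] * (rows * cols - len(nodes))
    rng.shuffle(slots)

    def to_placement(s):
        return {n: divmod(i, cols) for i, n in enumerate(s) if n is not None}

    cost = wire_length(to_placement(slots), edges)
    best, best_cost = list(slots), cost
    temperature = float(rows + cols)
    for _ in range(steps):
        i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
        slots[i], slots[j] = slots[j], slots[i]          # trial move: swap two cells
        new_cost = wire_length(to_placement(slots), edges)
        accept = new_cost <= cost or \
            rng.random() < math.exp((cost - new_cost) / temperature)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]      # undo the move
        temperature = max(0.01, temperature * 0.995)     # cool down
    return to_placement(best), best_cost


if __name__ == "__main__":
    # Expression tree for y + a * x, as in the DPU figure (node names are hypothetical).
    nodes = ["x", "a", "mul", "y", "add"]
    edges = [("x", "mul"), ("a", "mul"), ("mul", "add"), ("y", "add")]
    print(anneal_placement(nodes, edges, rows=3, cols=3))
```

Because the array is only a grid of cells with a cost function, the same loop applies whether the target is reconfigurable or hardwired, which is the point made on the slide.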
Super Pipe Networks
Table columns: array; applications; pipeline properties (shape, resources); mapping; scheduling (data stream formation)
• systolic array: applications: regular data dependencies only; shape: linear only; resources: uniform only; mapping and scheduling: linear projection or algebraic synthesis
• super-systolic rDPA**: no restrictions; mapping: simulated annealing or P&R algorithm; scheduling: (e.g. force-directed) scheduling algorithm
The key is mapping, rather than architecture
**) KressArray [1995]
Processor Memory Performance Gap
[Figure: Performance (log scale, 1 to 1000) from 1980 to 2000. Curves: µProc (CPU) 60%/yr; DRAM 7%/yr. Processor-Memory Performance Gap: grows 50% / year]
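The 50% per year figure follows from the two growth rates in the chart: the ratio of processor to DRAM performance grows by 1.60 / 1.07 ≈ 1.495 each year. A short check (the function names are illustrative, and constant growth rates are an assumption):

```python
def yearly_gap_growth(cpu_growth=0.60, dram_growth=0.07):
    """Relative yearly growth of the CPU/DRAM performance ratio."""
    return (1.0 + cpu_growth) / (1.0 + dram_growth) - 1.0


def gap_after(years, cpu_growth=0.60, dram_growth=0.07):
    """Performance ratio CPU/DRAM after the given number of years, starting at 1."""
    return (1.0 + yearly_gap_growth(cpu_growth, dram_growth)) ** years


if __name__ == "__main__":
    print(f"gap grows {yearly_gap_growth():.1%} per year")
    print(f"gap after 10 years: {gap_after(10):.0f}x")
```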
Synthesizable Memory Communication
Efficient Memory Communication should be directly supported by the Mapper Tools
[Figure: an example by Nageldinger's KressArray Xplorer: Optimized Parallel Memory Controller. Legend: sequencers; memory ports; application; not used]
http://kressarray.de
Memory Communication Architecture
• hot research topic in embedded systems
• storage context transformations [Herz, others]
• for low power
• for high performance
• startups provide memory IP or generators
Stream-based Soft Machine
[Figure blocks: Compiler; Scheduler; Memory (data memory) consisting of several memory banks; Sequencers (data stream generator); rDPA; “instructions”]
Hot Research Topic: Memory Architectures
• High Performance Embedded Memory Architectures
• High Performance Memory Communication Architectures [Herz]
• Custom Memory Management Methodology [Catthoor]
• Data Reuse Transformations [Kougia et al.]
• Data Reuse Exploration [Soudris, Wuytack]
>> Principles of Soft Computing Machines
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future development expected
• Conclusions
http://www.uni-kl.de
KressArray DPSS
published at ASP-DAC 1995
[Figure: block diagram of the KressArray Xplorer (Platform Design Space Explorer) built around the DPSS. Blocks: User; User Interface; Application Set; ALE-X Compiler (ALE-X Code, expr. tree, interm. form 2, interm. form 3); Mapper; Scheduler (data stream Schedule); Architecture Editor; Mapping Editor; Architecture Estimator; Analyzer (statist. Data); Delay Estim.; Power Estimator (Power Data); Improvement Proposal Generator with Inference Engine (FOX), Suggestion, Suggestion Selection; HDL Generator (VHDL, Verilog); Simulator; Datapath Generator Generator (Design Rules, Kress rDPU Layout); KressArray family parameters]
KressArray DPSS
[Figure: simplified block diagram of the KressArray (Design Space) Platform Space Explorer (Xplorer). Blocks: User; Application Set; Source Input; DPSS; Architecture & Mapping Editor; Statistics; Delay & Power Estimator; Improvement Proposal Generator; Datastream Generator; HDL Generator; Simulator; Datapath Generator Generator]
http://kressarray.de
Design Flow of Domain-specific Architecture Optimization
[Flow chart blocks: Application Set; Application Selection; Initial Arch. Estimation or benchmark; Application Compilation; Application Mapping; Mapping Analysis; Modification Suggestion; Architecture Modification; Architecture Verification; Optimized Architecture]
Nageldinger's KressArray Design Space Xplorer: including a Fuzzy Logic Improvement Proposal Generator
accessible by internet: http://kressarray.de
runs best with Netscape 4.6.1
Computer: the wrong Machine Paradigm (“von Neumann”)
[Figure blocks: Compiler; Memory; instructions; Sequencer with program counter as state register; hardwired Datapath]
• tightly coupled by compact instruction code
• “von Neumann” does not support soft data paths
Xputer: The Soft Machine Paradigm
[Figure blocks: Compiler; Scheduler; Memory; multiple sequencer with data counter; “instructions”; reconfigurable Datapath Array]
• loosely coupled by decision data bits only
• reconfigurable; also for hardwired
Machine Paradigms
machine category | Computer (“v. Neumann”) | Xputer (no transputer!)
driven by: | control flow | data streams (no “dataflow”)
engine principles | instruction sequencing | data sequencing
state register | program counter | (multiple) data counter(s)
communication path set-up | at run time | at load time
resource | single ALU | array of ALUs & other rDPUs
datapath operation | sequential | parallel pipe network
Fundamental Ideas available
• Data Sequencer Methodology
• Data-procedural Languages (Duality w. v. N.)
• ... supporting memory bandwidth optimization
• Soft Data Path Synthesis Algorithms
• Parallelizing Loop Transformation Methods
• Compilers supporting Soft Machines
• SW / CW Partitioning Co-Compilers
JPEG zigzag scan pattern
[Figure: zigzag path over a pixel map with axes x and y; path segments numbered 1 to 4 and labelled HalfZigZag; data counter]

*> Declarations
EastScan is
  step by [1,0]
end EastScan;

SouthScan is
  step by [0,1]
end SouthScan;

NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;

SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;

HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (HalfZigZag)
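The scan declarations above can be mimicked in Python to see which data addresses the data counter visits. This is a sketch, not a MoPL interpreter: the helper names are hypothetical, and `uturn (HalfZigZag)` is interpreted here as replaying the step vectors of HalfZigZag in reverse order, which is an assumption about its semantics.

```python
def jpeg_zigzag(n=8):
    """Return the [x, y] addresses (1-based) visited by the zigzag scan."""
    x, y = 1, 1                       # goto PixMap[1,1]
    path = [(x, y)]
    taken = []                        # step vectors, recorded for the u-turn

    def step(dx, dy):                 # "step by [dx,dy]"
        nonlocal x, y
        x, y = x + dx, y + dy
        path.append((x, y))
        taken.append((dx, dy))

    def scan(dx, dy, escape):         # "loop 8 times until [...]"
        for _ in range(n):
            step(dx, dy)
            if escape():
                break

    def half_zigzag():
        step(1, 0)                                # EastScan
        for _ in range(n // 2 - 1):               # "loop 3 times" for n = 8
            scan(-1, 1, lambda: x == 1)           # SouthWestScan, until [1,*]
            step(0, 1)                            # SouthScan
            scan(1, -1, lambda: y == 1)           # NorthEastScan, until [*,1]
            step(1, 0)                            # EastScan

    half_zigzag()
    half_steps = list(taken)
    scan(-1, 1, lambda: x == 1)                   # SouthWestScan along the main diagonal
    for dx, dy in reversed(half_steps):           # uturn (HalfZigZag), as assumed above
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    addresses = jpeg_zigzag()
    print(len(addresses), addresses[:6], addresses[-1])
```

The point of the data-procedural style is visible here: the program describes how the address moves through the data, not which instruction runs next.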
Similar Programming Language Paradigms
language category | Computer Languages | Xputer Languages
both | deterministic procedural sequencing: traceable, checkpointable | (same)
sequencing driven by: | read next instruction, goto (instruction addr.), jump (to instruction addr.), instruction loop, instruction loop nesting, no parallel loops, instruction loop escapes, instruction stream branching | read next data object, goto (data addr.), jump (to data addr.), data loop, data loop nesting, parallel data loops, data loop escapes, data stream branching
very easy to learn
GAG Scheme
GAG = Generic Address Generator
[Figure: the GAG consists of Limit Stepper, Base Stepper, and Address Stepper. Signal labels: B0, L0, B, L, A, limit]
GAG: Address Stepper
GAG = Generic Address Generator
[Figure: Address Stepper with Step Counter, + / – unit, Escape Clause, and End Detect. Labels: init; tag; Limit (L); Base (B0); stepVector; maxStepCount; Address (A); endExec; limit]
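A one-dimensional address stepper can be sketched from the labels in the figure (Base, Limit, stepVector, maxStepCount, endExec). The behaviour below is an assumption made for illustration, not a description of the actual GAG hardware: the stepper starts at the base, adds the step on every tick, and ends execution when the address leaves the range between base and limit or when the step count is exhausted.

```python
def address_stepper(base, limit, step, max_step_count):
    """Yield addresses base, base + step, ... (hypothetical software model).

    End detect (assumed): stop when the address leaves the closed range
    spanned by base and limit, or after max_step_count steps.
    """
    low, high = min(base, limit), max(base, limit)
    address = base
    steps_done = 0
    while low <= address <= high:
        yield address
        if steps_done == max_step_count:   # step counter exhausted
            return
        address += step                    # the "+ / –" unit
        steps_done += 1


if __name__ == "__main__":
    print(list(address_stepper(base=0, limit=10, step=3, max_step_count=100)))
    print(list(address_stepper(base=8, limit=0, step=-2, max_step_count=2)))
```

A two-dimensional scan, as in the zigzag example, would use one such stepper per dimension.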
Generic Sequence Examples
[Figure: example scan patterns a) to g), generated by a GAG consisting of Limit Slider, Base Slider, and Address Stepper. Signal labels: B0, L0, A]
Slider Operation Demo Example
[Figure: address range between floor (F) and ceiling (C) over the axes x and y. Labels: address (A); B0; L0; B; L]
Changing Models of Computation
contemporary:
• host: Compiler, Software, RAM (Machine paradigm)
• hardwired accelerator(s): CAD, ASICs; EDA tools needed*; done at vendor site
reconfigurable computing:
• host and reconf. accelerator(s): Co-Compiler, Software and Configware, RAM (Machine paradigm)
• no hardware experts needed; both done at customer site
*) even 80% hardware people hate their tools
Co-Compilation
Hardware / Software Co-Design turns to Configware / Software Co-Design
We introduce: Co-Compilation
[Figure: a partitioning compiler takes a high level programming language source and produces Software running on the Processor (Computer Machine Paradigm) and Configware running on the Reconfigurable Accelerators (Xputer “Soft” Machine Paradigm), connected by an interface]
Reconfigurable Architecture (RA) instead of hardwired: no CAD ! Compilation instead !
Jürgen Becker's Co-DE-X Co-Compiler
[Figure blocks: X-C source; Analyzer / Profiler; Partitioner with Resource Parameters; Loop Transformations; host: GNU C compiler (Computer machine paradigm); X-C compiler and DPSS for the KressArray (Xputer machine paradigm)]
X-C is C language extended by MoPL
supporting different platforms
supporting platform-based design
Loop Transformation Examples
sequential processes:
  loop 1-16 body endloop
loop unrolling:
  loop 1-8 body body endloop
strip mining:
  fork
    loop 1-8 body endloop
    loop 9-16 body endloop
  join
resource parameter driven Co-Compilation (host: / reconf. array:):
  loop 1-8 trigger endloop
  loop 1-4 trigger endloop
  loop 1-2 trigger endloop
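The first three variants on this slide compute the same results in a different order of execution. A minimal Python sketch of that equivalence follows; the function names are illustrative, the thread pool merely stands in for fork / join, and an even trip count divisible by the number of strips is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(body, n=16):
    """loop 1-n: body"""
    return [body(i) for i in range(1, n + 1)]


def unrolled(body, n=16):
    """Loop unrolling by 2: loop 1-n/2: body body (n assumed even)."""
    results = []
    for k in range(1, n // 2 + 1):
        results.append(body(2 * k - 1))
        results.append(body(2 * k))
    return results


def strip_mined(body, n=16, strips=2):
    """Strip mining: fork one loop per strip, then join (n divisible by strips)."""
    size = n // strips

    def run_strip(s):
        first = s * size + 1
        return [body(i) for i in range(first, first + size)]

    with ThreadPoolExecutor(max_workers=strips) as pool:   # fork
        parts = list(pool.map(run_strip, range(strips)))   # join
    return [r for part in parts for r in part]


if __name__ == "__main__":
    square = lambda i: i * i
    print(sequential(square) == unrolled(square) == strip_mined(square))
```

In the co-compilation case on the slide, the number of strips or the unrolling factor would be chosen from the resource parameters of the reconfigurable array rather than fixed by hand.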
History of Loop Transformations
70ies - 80ies: at Process Level [David Loveman, 1977, Allen and Kennedy, et al.]:
• Sequential to Parallel Processes, incl. Vectorization
• Loop Unrolling, Loop Fusion, Strip Mining ....
1995/97 [Karin Schmidt / Jürgen Becker]: down to Datapath Level:
• (Parameter-driven) Time to Time/Space Partitioning
• e. g.: Transformation from Sequential Process to Super-systolic
2000 [Michael Herz]: optimized RA to Memory Communication Bandwidth:
• Multi-dimensional Loop Unrolling / Storage Scheme Optimization supporting burst-mode & parallel Memory Banks
>> Future developments expected
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future developments expected
• Conclusions
http://www.uni-kl.de
EH conferences
• "Evolvable Hardware" (EH), "Evolutionary Methods" (EM), "Darwinistic Methods", and biologically inspired electronics
• new FPGA application [genetic FPGA]: the „DNA“ metaphor
• EH (NASA/DoD Workshop on Evolvable Hardware)
• ICES (Evolvable Systems)
• EuroGP and GP (Genetic Programming)
• CEC (Congress on Evolutionary Computation)
• GECCO (Genetic and Evolutionary Computation)
• EvoWorkshops 2002 (Evolutionary Computing Workshops)
• MAPLD (Military and Aerospace Applications of Programmable Logic Devices and Technologies)
• ICGA (Genetic Algorithms)
EH - What is it?
• What is the relation between Reconfigurable Computing and Evolvable Computing/Hardware?
• Evolvable Hardware and Computing - What is it?
- yet another FPGA application (YAFA)
Currently: research on darwinistic methods to generate or optimize IT systems by electronic sex*.
"chromosome": a synonym for "configuration code".
*) by crossing chromosomes
How important is evolvable computing ?
• new conferences in their visionary phase
• some NASA / DoD expectations look unrealistic
• Coming shake-out: future is hard to guess
• reminds me of past AI daze
• partly a revival of cybernetics, bionics, etc.
• genetic algorithms people dominate the scene (who do not talk to EDA people) GA suck
Embedded Soft IP Cores
[Figure: HLL → Compiler, targeting an FPGA that contains a soft CPU and a Memory core]
Some soft CPU core examples
core | architecture | platform
MicroBlaze, 125 MHz, 70 D-MIPS | 32 bit standard RISC, 32 reg. by 32 LUT RAM-based reg. | Xilinx, up to 100 on one FPGA
Nios | 16-bit instr. set | Altera Mercury
Nios, 50 MHz | 32-bit instr. set | Altera, 22 D-MIPS
Nios | 8 bit | Altera – Mercury
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
DSPuva16 | 16 bit DSP | Spartan-II
Leon, 25 MHz | SPARC |
ARM7 clone | ARM |
uP1232 | 8-bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits Instr. + ext. ROM | 2 XILINX 3020 LCA
Reliance-1 | 12 bit DSP | Lattice 4 isp30256, 4 isp1016
Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
YARD-1A | 16-bit RISC, 2 opd. Instr. | old Xilinx FPGA Board
xr16 | RISC integer C | SpartanXL
FPGA CPUs in teaching and academic research
• UCSC: 1990!
• Mälardalen University, Eskilstuna, Sweden
• Chalmers University, Göteborg, Sweden
• Cornell University
• Gray Research
• Georgia Tech
• Hiroshima City University, Japan
• Michigan State
• Universidad de Valladolid, Spain
• Virginia Tech
• Washington University, St. Louis
• New Mexico Tech
• UC Riverside
• Tokai University, Japan
Soft rDPA
[Figure: an FPGA holding a soft CPU, memory, soft DPU arrays, and miscellaneous logic; labels: Hardware Design, HLL Compiler]
Area efficiency: still relevant today
• Rapid technology progress
• 50 million system gates soon
• FPGAs for relocatable configware code?
• Compatibility at configuration code level?
• Slower clock: compensated by more parallelism
• Even large rDPAs as soft IP become feasible
• By >2005: don't care about area efficiency?
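The "slower clock, more parallelism" trade-off can be sketched as back-of-the-envelope arithmetic. This is an idealized model: it assumes every parallel unit completes one operation per cycle and ignores memory bandwidth and routing limits. The clock rates and unit counts are hypothetical, not figures from the talk.

```python
import math

def throughput_mops(clock_mhz, parallel_units, ops_per_unit_per_cycle=1.0):
    """Idealized throughput in million operations per second."""
    return clock_mhz * parallel_units * ops_per_unit_per_cycle

def units_to_match(reference_mops, clock_mhz):
    """Parallel units a slower-clocked array needs to match a reference throughput."""
    return math.ceil(reference_mops / clock_mhz)

# Hypothetical comparison: one 1000 MHz sequential unit vs. a 50 MHz soft array.
cpu_mops = throughput_mops(clock_mhz=1000, parallel_units=1)
break_even_units = units_to_match(cpu_mops, clock_mhz=50)
array_mops = throughput_mops(clock_mhz=50, parallel_units=64)
```

Under these assumptions the 50 MHz array breaks even at 20 units, and any units beyond that are net gain, which is why growing gate counts make area efficiency less pressing.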
>> Conclusions
• Introduction
• FPGA boom
• Coarse Grain Architectures
• Fascinating Paradigm Shift
• Programming Coarse Grain rDPAs
• Principles of Soft Computing Machines
• Future development expected
• Conclusions
Main problems to be solved (1)
• Main EDA tools required:
• De facto standard soft IP core libraries
• Tools for much better designer productivity
• Configuration code compatibility by a de facto standard RC platform family
• Compilers accepting high-level programming languages
• Scalable FPGA architectures supporting relocatable configuration code
Main problems to be solved (2)
Compare the most successful microprocessor:
• accepted OS, compilers, development tools available
• most software written for it: many application areas
• object code compatibility for new µP products
Needed to become the dominant FPGA vendor:
• most configware (soft IP cores) written for it
• object code compatibility for new FPGA products
• widely accepted "OS", compilers, development tools
Main problems to be solved (3)
[Figure: computing in time vs. computing in space; migration by re-timing and other transformations (systolic arrays etc.)]
This dichotomy is completely ignored by our CS curricula.
• Easy-to-use C- or Java-based compilers needed
• Each programmer and each MBA should have a qualified awareness of the dichotomy and of FPGAs
• Curricular innovations are urgently needed
• Needing HDL-savvy users is a severe limitation
Lobbying urgently needed
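The time/space dichotomy named on this slide can be illustrated with a small sketch, assuming an FIR filter as the workload (an illustrative choice, not from the talk). `fir_in_time` reuses one multiply-accumulate unit step by step, as a procedural program would. `fir_in_space` models a transposed-form pipeline in which every tap has its own multiplier and all stages fire in the same cycle, loosely in the spirit of a systolic array. Both function names and the step counting are assumptions made for this example.

```python
def fir_in_time(samples, taps):
    """Computing in time: one MAC unit, reused len(taps) times per sample."""
    history = [0] * len(taps)
    outputs, steps = [], 0
    for x in samples:
        history = [x] + history[:-1]      # shift the newest sample in
        acc = 0
        for h, c in zip(history, taps):   # sequential multiply-accumulate
            acc += h * c
            steps += 1
        outputs.append(acc)
    return outputs, steps

def fir_in_space(samples, taps):
    """Computing in space: one multiplier per tap, all active in each cycle."""
    n = len(taps)
    partial = [0] * n                     # pipeline registers between stages
    outputs, cycles = [], 0
    for x in samples:
        products = [c * x for c in taps]  # n multipliers working in parallel
        outputs.append(products[0] + partial[0])
        # Each stage adds its product to the partial sum from the next stage.
        partial = [products[i + 1] + partial[i + 1] for i in range(n - 1)] + [0]
        cycles += 1
    return outputs, cycles

samples, taps = [1, 2, 3, 4, 5], [1, 2, 3]
time_out, time_steps = fir_in_time(samples, taps)
space_out, space_cycles = fir_in_space(samples, taps)
```

Both versions compute the same outputs. The sequential version needs `len(taps)` steps per sample, while the spatial version needs one cycle per sample at the cost of `len(taps)` multipliers.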
However, current CS Education ... is based on the Submarine Model
Hardware invisible: under the surface
Brain usage: procedural-only
Software Faculty Colleagues shy away from the Paradigm Shift: their Brain hurts? Can't be: this Half has been amputated
[Figure: the Submarine Model; labels: Algorithm, procedural high level Programming Language, Assembly Language, Software, Hardware]
This model disables ...
Hardware and Software as Alternatives
[Figure: an Algorithm is partitioned into one of three implementations: Software only, Software & Hardware/Configware, or Hardware/Configware only. Software is procedural; Hardware and Configware are structural.]
Brain Usage: both Hemispheres
The Dominance of the Submarine Model
Hardware
... indicates that our CS Education System produces Zillions of Mentally Disabled Persons
(procedural) structurally disabled
... completely unable to cope with Solutions other than Software only
It’s time to crush the Submarine Model
Co-Compilation
structural programming: Xputer machine paradigm
procedural programming: “von Neumann” Computer machine paradigm
Computing in Space: von Neumann book already in the 1950s
Now Fundamentals and Technology are available
It’s time to innovate CS&E Curricula ... toward a Dichotomy of Computing Science
[Figure: computing in time vs. computing in space; migration by re-timing and other transformations (systolic arrays etc.)]
Thank you for listening
END