CSE 291 Winter 2009
The FPGA Ecosystem
Rajesh Gupta
University of California, San Diego
Moore’s Law
[Figure: four log-scale plots for Intel lead microprocessors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6) versus year, 1970 to 2010.]
• Die size (mm): ~7% growth per year, ~2X growth in 10 years. Die size grows by 14% to satisfy Moore's Law.
• Transistors (MT): 2X growth in 1.96 years! Transistors on lead microprocessors double every 2 years.
• Frequency (MHz): 2X every 2 years. Lead microprocessor frequency doubles every 2 years.
• Power (Watts), 1971 to 2000.
Courtesy, Intel
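The growth rates on these plots are compound rates, so they can be cross-checked with a few lines of arithmetic. The Python sketch below (the function names are illustrative, not from the slides) checks that ~7% per year compounds to roughly 2X in 10 years, and that doubling every 2 years corresponds to about 41% growth per year.

```python
import math


def growth_over(years: float, annual_rate: float) -> float:
    """Total growth factor after `years` at a compound annual rate (0.07 = 7%)."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


def annual_rate_for_doubling(years_to_double: float) -> float:
    """Compound annual rate that doubles a quantity in `years_to_double` years."""
    return 2.0 ** (1.0 / years_to_double) - 1.0


print(f"{growth_over(10, 0.07):.2f}")          # die size, 7%/yr for 10 years: 1.97
print(f"{doubling_time(0.07):.1f}")            # years to double at 7%/yr: 10.2
print(f"{annual_rate_for_doubling(2.0):.3f}")  # 2X every 2 years: 0.414 per year
```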
The ITRS: Tao of Scaling (http://public.itrs.net)
Source: Ken Yang, UCLA
2007, 0.065 micron:
• 6.7 GHz on-chip clock
• 9 wiring levels
• 600-3000 pins
• Vdd = 0.7-1.1 V
• Power: 3.5 W / 104 W / 190 W
• DRAM: 4.29 Gb/chip, 183 mm^2, 2.35 Gb/cm^2
• MPU: 386 Mtrans/chip, 140 mm^2, 276.1 Mtrans/cm^2
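The DRAM and MPU densities listed above are per-chip capacity divided by die area. A quick Python check (the helper name is illustrative) reproduces them to within about 0.5%; the small residue presumably comes from rounding in the published figures.

```python
def density_per_cm2(per_chip: float, area_mm2: float) -> float:
    """Per-chip quantity divided by die area, converting mm^2 to cm^2 (1 cm^2 = 100 mm^2)."""
    return per_chip / (area_mm2 / 100.0)


dram = density_per_cm2(4.29, 183)  # Gb/cm^2
mpu = density_per_cm2(386, 140)    # Mtrans/cm^2
print(f"DRAM: {dram:.2f} Gb/cm^2, MPU: {mpu:.1f} Mtrans/cm^2")  # DRAM: 2.34 Gb/cm^2, MPU: 275.7 Mtrans/cm^2
```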
Design Abstraction Levels
[Figure: design abstraction levels, from SYSTEM through MODULE, GATE, and CIRCUIT (Vin to Vout) down to DEVICE (a transistor with n+ source and drain and gate G).]
Adapted from Irwin & Narayanan's slides from PSU. Copyright 2002 J. Rabaey et al.
Design Process
• Conceptualization: function & structure
– HLM, behavioral modeling
• Architecture: structure and organization
– Microarchitectural implementation
• Logical implementation: gates, modules
– Logic synthesis, logic verification, static timing analysis
• Circuit implementation: transistors
– Circuit simulations
• Physical design, verification
– Floorplanning, placement, routing, dynamic timing analysis
Many Implementation Choices
• Microprocessors
• Domain-specific processors
– DSP
– Network processors
– Microcontrollers
• ASIPs
• Reconfigurable SoC
• FPGA
• Gate-array
• ASIC
[Figure: the choices above placed along axes labeled Speed, Power, Cost, and Volume, ranging from High to Low.]
E.g. Degree of Customization of Processor Architecture
• The architecture of the computation engine used to implement desired functionality
• Processor does not have to be programmable
– "Processor" not equal to general-purpose processor
[Figure: three processors executing the same computation. General-purpose ("software"): controller (control logic, state register, IR, PC), datapath with a register file and a general ALU, program memory holding assembly code, and data memory. Application-specific: the same organization with registers and a custom ALU. Single-purpose ("hardware"): controller (control logic, state register) and a datapath with only index and total registers and an adder, plus data memory; no program memory.]
Computation:
total = 0
for i = 1 to N loop
    total += M[i]
end loop
[Adapted from Embedded Systems Design: A Unified Hardware/Software Introduction. Copyright 2000 Vahid & Givargis]
General-purpose Microprocessors
• Programmable device used in a variety of applications
– Also known as "microprocessor"
• Features
– Program memory
– General datapath with large register file and general ALU
• User benefits
– Low time-to-market and NRE costs
– High flexibility
• "Pentium" the most well-known, but there are hundreds of others
[Figure: general-purpose processor organization: controller (control logic, state register, IR, PC), datapath with a register file and a general ALU, program memory holding assembly code for "total = 0 for i =1 to …", and data memory.]
[Adapted from Embedded Systems Design: A Unified Hardware/Software Introduction. Copyright 2000 Vahid & Givargis]
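To make the "program memory + general datapath" organization concrete, here is a minimal Python sketch of a fetch-decode-execute loop running the summation loop from the slide. The small instruction set (li, ld, add, addi, bge, blt) and the helper names are hypothetical, invented only for illustration, and the loop index is 0-based rather than 1-based.

```python
from typing import List, Tuple

# Hypothetical toy ISA, for illustration only: (opcode, a, b, c).
Instr = Tuple[str, int, int, int]


def run(program: List[Instr], data: List[int], n_regs: int = 4) -> List[int]:
    """Fetch-decode-execute loop of a toy general-purpose processor."""
    regs = [0] * n_regs             # general register file
    pc = 0                          # program counter
    while 0 <= pc < len(program):
        op, a, b, c = program[pc]   # fetch the instruction (the "IR")
        pc += 1
        if op == "li":              # load immediate: r[a] = b
            regs[a] = b
        elif op == "ld":            # load from data memory: r[a] = M[r[b]]
            regs[a] = data[regs[b]]
        elif op == "add":           # general ALU: r[a] = r[b] + r[c]
            regs[a] = regs[b] + regs[c]
        elif op == "addi":          # r[a] = r[b] + immediate c
            regs[a] = regs[b] + c
        elif op == "bge":           # if r[a] >= r[b] then pc = c
            if regs[a] >= regs[b]:
                pc = c
        elif op == "blt":           # if r[a] < r[b] then pc = c
            if regs[a] < regs[b]:
                pc = c
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


def sum_program(n: int) -> List[Instr]:
    """Program memory for: total = 0; for each i < N: total += M[i]."""
    return [
        ("li", 0, 0, 0),    # 0: r0 = total = 0
        ("li", 1, 0, 0),    # 1: r1 = i = 0
        ("li", 2, n, 0),    # 2: r2 = N
        ("bge", 1, 2, 8),   # 3: if i >= N, exit (pc 8 is past the end)
        ("ld", 3, 1, 0),    # 4: r3 = M[i]
        ("add", 0, 0, 3),   # 5: total += r3
        ("addi", 1, 1, 1),  # 6: i += 1
        ("blt", 1, 2, 4),   # 7: if i < N, loop back to 4
    ]


print(run(sum_program(5), [3, 1, 4, 1, 5])[0])  # 14
```

Changing the computation means changing only `sum_program`, not `run`: that is the flexibility the slide attributes to general-purpose processors.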
Application-specific Instruction Processors, ASIP
• Programmable processor optimized for a particular class of applications having common characteristics
– Compromise between general-purpose and single-purpose processors
• Features
– Program memory
– Optimized datapath
– Special functional units
• Benefits
– Some flexibility, good performance, size and power
[Figure: ASIP organization: controller (control logic, state register, IR, PC), datapath with registers and a custom ALU, program memory holding assembly code for "total = 0 for i =1 to …", and data memory.]
[Adapted from Embedded Systems Design: A Unified Hardware/Software Introduction. Copyright 2000 Vahid & Givargis]
Single-purpose ‘Processors,’ or ASIC
• Digital circuit designed to execute exactly one program
– a.k.a. coprocessor, accelerator or peripheral
• Features
– Contains only the components needed to execute a single program
– No program memory
• Benefits
– Fast
– Low power
– Small size
[Figure: single-purpose processor: controller (control logic, state register) and a datapath containing only index and total registers and an adder, connected to data memory.]
[Adapted from Embedded Systems Design: A Unified Hardware/Software Introduction. Copyright 2000 Vahid & Givargis]
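The single-purpose version of the same computation can be sketched as a cycle-level model of a controller plus datapath (an FSMD). This is a minimal Python sketch under stated assumptions: the state names INIT/LOOP/DONE and the one-iteration-per-clock timing are hypothetical choices for illustration, not taken from the slide. Note there is no program memory; the control sequence is fixed in the controller.

```python
from typing import List, Tuple


def sum_fsmd(mem: List[int]) -> Tuple[int, int]:
    """Cycle-level model of a hypothetical single-purpose summing circuit.

    Controller states: INIT, LOOP, DONE. Datapath: registers `index` and
    `total` plus one adder. Returns (total, clock_cycles).
    """
    n = len(mem)
    state, index, total, cycles = "INIT", 0, 0, 0
    while state != "DONE":
        cycles += 1
        if state == "INIT":
            index, total = 0, 0
            state = "LOOP" if n > 0 else "DONE"
        elif state == "LOOP":
            # One clock edge: both registers load their new values together.
            total, index = total + mem[index], index + 1
            state = "LOOP" if index < n else "DONE"
    return total, cycles


print(sum_fsmd([3, 1, 4, 1, 5]))  # (14, 6): one INIT cycle plus one cycle per element
```

Compared with the instruction-driven version, each loop iteration here costs a single clock cycle because the dedicated adder and registers are wired for exactly this job.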
E.g. ASIC
• A direct sequence spread spectrum (DSSS) radio receiver ASIC (UCLA)
ASIC features:
– Area: 4.6 mm x 5.1 mm
– Speed: 20 MHz @ 10 Mcps
– Technology: HP 0.5 µm
– Power: 16 mW - 120 mW (mode dependent) @ 20 MHz, 3.3 V
– Avg. acquisition time: 10 µs to 300 µs
The Implementation Choice is Important
The Co-design Ladder
• In the past:
– Hardware and software design technologies were very different
– Recent maturation of synthesis enables a unified view of hardware and software
• Hardware/software "codesign"
[Figure: the co-design ladder. Both paths start from sequential program code (e.g., C, VHDL) and end in an implementation. Software path: compilers (1960's, 1970's) produce assembly instructions; assemblers, linkers (1950's, 1960's) produce machine instructions; result is a microprocessor plus program bits ("software"). Hardware path: behavioral synthesis (1990's) produces register transfers; RT synthesis (1980's, 1990's) produces logic equations / FSM's; logic synthesis (1970's, 1980's) produces logic gates; result is a VLSI, ASIC, or PLD implementation ("hardware").]
The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.
[Adapted from Embedded Systems Design: A Unified Hardware/Software Introduction. Copyright 2000 Vahid & Givargis]
Map from Behavior to Architecture
[Sangiovanni-Vincentelli]
Four Phases in Creating a Chip
Implementation Choices
Digital Circuit Implementation Approaches
• Custom
• Semicustom
– Cell-based: Standard Cells, Compiled Cells, Macro Cells
– Array-based: Pre-diffused (Gate Arrays), Pre-wired (FPGAs)
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Transition to Automation and Regular Structures
[Figure: die photos of the Intel 4004 ('71), Intel 8080, Intel 8085, Intel 80286, and Intel 80486. Courtesy Intel.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Cell-based Design (or standard cells)
[Figure: standard-cell layout with rows of cells (logic cells and feedthrough cells) separated by routing channels, plus a functional module (RAM, multiplier, …).]
Routing channel requirements are reduced by the presence of more interconnect layers.
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Standard Cell - Example
3-input NAND cell (from ST Microelectronics): C = load capacitance, T = input rise/fall time
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Automatic Cell Generation
[Figure: automatic cell generation steps: initial transistor geometries, placed transistors, routed cell, compacted cell, finished cell. Courtesy Cadabra.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
MacroModules
256×32 (or 8192 bit) SRAM generated by hard-macro module generator
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
“Soft” MacroModules
Synopsys Design Compiler
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
“Intellectual Property”
A Protocol Processor for Wireless
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Semicustom Design Flow
[Figure: semicustom design flow. Design capture in an HDL (behavioral) is followed by logic synthesis (structural), then floorplanning, placement, and routing (physical), and finally tape-out. Pre-layout simulation follows logic synthesis; circuit extraction after routing feeds post-layout simulation. Design iteration loops back from the simulation steps.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Late-Binding Implementation
[Figure: the digital circuit implementation approaches (custom; semicustom cell-based: standard cells, compiled cells, macro cells; semicustom array-based: pre-diffused gate arrays, pre-wired FPGAs), with the array-based approaches called out.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Gate Array — Sea-of-gates
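[Figure: sea-of-gates array with rows of uncommitted cells separated by routing channels. An uncommitted cell (VDD and GND rails, polysilicon, metal, possible contacts) becomes a committed cell, here a 4-input NOR with inputs In1 to In4 and output Out, once its metal connections are defined.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.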
Sea-of-gate Primitive Cells
[Figure: sea-of-gate primitive cells built from NMOS and PMOS rows, using oxide-isolation and using gate-isolation.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Prewired Arrays
Classification of prewired arrays (or field-programmable devices):
• Based on Programming Technique
– Fuse-based (program-once)
– Non-volatile EPROM based
– RAM based
• Programmable Logic Style
– Array-Based
– Look-up Table
• Programmable Interconnect Style
– Channel-routing
– Mesh networks
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Antifuse
• Normally high resistance (> 100 MΩ)
– On application of an appropriate voltage, the antifuse is changed permanently to a low-resistance structure (200-500 Ω)
Array-Based Programmable Logic
[Figure: three array-based programmable logic structures with outputs O0 to O3: a PLA (programmable AND array, programmable OR array; inputs I0 to I5), a PAL (programmable AND array, fixed OR array; inputs I0 to I5), and a PROM (fixed AND array, programmable OR array; inputs I0 to I3). The legend distinguishes programmable connections from fixed connections.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
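The AND-plane/OR-plane structure can be sketched in a few lines of Python. This is an illustrative model, not a device-programming format: a product term is a dictionary of required input values (unlisted inputs are don't-cares), and each output lists the product terms it ORs together. The same evaluator covers a PLA (both planes programmed) and a PROM (AND plane fixed as a full decoder of all minterms); a PAL would correspond to a fixed `or_plane` with a programmable `and_plane`. The example functions (XOR and AND) are arbitrary choices.

```python
from itertools import product
from typing import Dict, List, Tuple

# A product term maps input index -> required value; unlisted inputs are don't-cares.
Term = Dict[int, int]


def pla_eval(and_plane: List[Term], or_plane: List[List[int]],
             x: Tuple[int, ...]) -> List[int]:
    """AND plane forms product terms; OR plane ORs selected terms into each output."""
    products = [int(all(x[i] == v for i, v in term.items())) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]


# PLA: both planes programmed. O0 = I0 XOR I1, O1 = I0 AND I1.
PLA_AND = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]
PLA_OR = [[0, 1], [2]]


def prom_and_plane(n: int) -> List[Term]:
    """PROM: the AND plane is fixed as a full decoder, one term per minterm (I0 is the MSB)."""
    return [dict(enumerate(bits)) for bits in product((0, 1), repeat=n)]


# PROM: only the OR plane is programmed, by selecting minterm indices.
PROM_OR = [[1, 2], [3]]  # same two functions: XOR = minterms 1, 2; AND = minterm 3

for x in product((0, 1), repeat=2):
    print(x, pla_eval(PLA_AND, PLA_OR, x), pla_eval(prom_and_plane(2), PROM_OR, x))
```

The PROM's fixed AND plane grows as 2^n terms, whereas the PLA only needs the product terms actually used, which is the usual area argument for programming both planes.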
Programming a PROM
[Figure: PROM with inputs X0, X1, X2 and outputs f0, f1; marked dots indicate programmed nodes.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
2-input mux as programmable logic block
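[Figure: 2-input multiplexer with data inputs A (selected when S = 0) and B (selected when S = 1), select input S, and output F.]
Configuration (' denotes complement):
A  B  S  F =
0  0  0  0
0  X  1  X
0  Y  1  Y
0  Y  X  XY
X  0  Y  XY'
Y  0  X  X'Y
Y  1  X  X + Y
X  1  Y  X + Y
1  0  X  X'
1  0  Y  Y'
1  1  1  1
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.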
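The configuration table can be generated exhaustively: tie each of A, B, and S to one of 0, 1, X, or Y and record the resulting function of X and Y. This Python sketch (names are illustrative) encodes each signal as a 4-entry truth table and collects the distinct functions a single multiplexer can realize.

```python
from itertools import product
from typing import Dict, Tuple

TruthTable = Tuple[int, ...]  # F at (X, Y) = (0,0), (0,1), (1,0), (1,1)

# Signals the configuration can tie A, B, or S to.
SOURCES: Dict[str, TruthTable] = {
    "0": (0, 0, 0, 0),
    "1": (1, 1, 1, 1),
    "X": (0, 0, 1, 1),
    "Y": (0, 1, 0, 1),
}


def mux(a: int, b: int, s: int) -> int:
    """2-input multiplexer: F = A when S = 0, F = B when S = 1."""
    return b if s else a


def realizable() -> Dict[TruthTable, Tuple[str, str, str]]:
    """Map each realizable function of X, Y to one (A, B, S) configuration."""
    found: Dict[TruthTable, Tuple[str, str, str]] = {}
    for a, b, s in product(SOURCES, repeat=3):
        table = tuple(mux(SOURCES[a][k], SOURCES[b][k], SOURCES[s][k]) for k in range(4))
        found.setdefault(table, (a, b, s))
    return found


funcs = realizable()
print(len(funcs))             # 12 of the 16 two-input functions
print((0, 1, 1, 0) in funcs)  # XOR: False
```

The four functions a single multiplexer cannot produce are XOR, XNOR, NAND, and NOR; combining several multiplexers, as the Actel cell does, widens the realizable set.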
Logic Cell of Actel Fuse-Based FPGA
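[Figure: Actel logic cell built from three 2-input multiplexers: one selects between A and B under SA, one selects between C and D under SB, and an output multiplexer controlled by S0 and S1 selects between them to produce Y.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.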
Look-up Table Based Logic Cell
[Figure: look-up-table-based logic cell: inputs In1 and In2 address a memory whose contents drive Out.]
Memory contents:
In  Out
00  0
01  1
10  1
11  0
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
LUT-Based Logic Cell
[Figure: Xilinx 4000 Series LUT-based logic cell (courtesy Xilinx): inputs D1 to D4 and F1 to F4 feed two logic-function generators whose outputs feed a third logic function; control inputs C1 to C4; outputs pass through multiplexers controlled by the configuration program bits.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Array-Based Programmable Wiring
[Figure: array-based programmable wiring: horizontal and vertical tracks around cells, with programmed interconnections at interconnect points and cell input/output pins.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Mesh-based Interconnect Network
[Figure: mesh-based interconnect network with switch boxes, connect boxes, and interconnect points.]
Courtesy DeHon and Wawrzynek. Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Transistor Implementation of Mesh
Courtesy DeHon and Wawrzynek. Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Hierarchical Mesh Network
Use an overlaid mesh to support longer connections
Reduced fanout and reduced resistance
Courtesy DeHon and Wawrzynek. Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
EPLD Block Diagram
[Figure: EPLD block diagram showing macrocells and primary inputs. Courtesy Altera.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Altera MAX
From Smith97.
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Altera MAX Interconnect Architecture
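[Figure: Altera MAX interconnect architectures (courtesy Altera). Array-based (MAX 3000-7000): logic array blocks (LAB1, LAB2, …, LAB6) connected through the programmable interconnect array (PIA), with delay tPIA. Mesh-based (MAX 9000): LABs connected by row and column channels.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.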
Field-Programmable Gate Arrays: Fuse-based
[Figure: fuse-based FPGA with a standard-cell-like floorplan: rows of logic modules separated by routing channels, vertical routes, I/O buffers on all four sides, and program/test/diagnostics circuitry.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Xilinx 4000 Interconnect Architecture
[Figure: Xilinx 4000 interconnect around a CLB: single, double, quad, and long lines, direct connects, global clock lines, and a carry chain, annotated with per-type track counts. Courtesy Xilinx.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
RAM-based FPGA
[Figure: Xilinx XC4000ex. Courtesy Xilinx.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
Heterogeneous Programmable Platforms
[Figure: Xilinx Virtex-II Pro (courtesy Xilinx): FPGA fabric with high-speed I/O, embedded PowerPC, embedded memories, and hardwired multipliers.]
Adapted from Digital Integrated Circuits (2nd Edition). Copyright 2002 J. Rabaey et al.
SOC as a heterogeneous computing substrate
[Figure: SOC combining ASIC blocks; a programmable processor core and a programmable DSP core, each with a memory interface, a host/bus interface, and its own code; a microprocessor core running a real-time operating system and a controller process; multi-ported memory; a CODEC with analog interface; user interface; bus control (BUSCNTL); and serial I/O, all connected by a system bus.]
Experimental Side of Putting Things Together
Design
• The goal of design is to take an ‘idea’ and build something that performs a certain function.
• Such an ‘idea’ never becomes an ‘implementation’ directly.
– We go through ‘models’ that allow us to reason about properties.
– Models may also be used by implementers to explore alternatives for cost and performance.
• MODELS are key to formalizing the design and its process.
Model of Computation
• A ‘model’ is an abstraction of a ‘description’ (sometimes a model is also used as a replica of a ‘description’).
• This abstraction is defined using some ‘terms’:
– If the terms are graphical: graphical model
– If the terms are mathematical: formal model
• Generally, terms and their relationships are devised to allow syntactical support for expressing important concepts.
• If done right, a MOC:
– supports important concepts of an application domain through use of the right terms
– is clear and unambiguous, allowing anyone to replicate/simulate the intended behavior
– is compositional: compositions can be validated with less effort than an ab initio description
Compositional View of SOCs: Model of Computation
• A system consists of components.
• Important questions to ask when dealing with components:
– What is a component? (Component ontology) States? Processes? Threads? Differential equations? Constraints? Objects? …
– What knowledge do components share? (Epistemology) Time? Name spaces? Signals? State?
– How do components communicate? (Protocols) Events? Rendezvous? Message passing? CT signals? Streams? Method calls? …
– What do components communicate? (Lexicon) Objects? Transfer of control? Data structures? Strings? …
• A MOC makes it easier to reason through these questions.
– Start with a model of a machine, then define its behavior (as operational semantics).
Characteristics of Common MOCs
• Finite State Machines
– State is a summary of the past; finite number of states
– No concurrency, no explicit time specification
• Data-Flow
– Partial order of actions/events
– Concurrency, determinate, supports streams (data, computation)
• Discrete-event models
– Global notion of time, causality
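To illustrate the "determinate" property of data-flow, the Python sketch below connects two actors through a FIFO channel and fires them in a random order; the output stream is the same for every schedule. The actors, their token rates (`pair_sum` consumes two tokens per firing, in the style of synchronous dataflow), and the function names are invented for illustration.

```python
import random
from collections import deque
from typing import Callable, Deque, List


def run_dataflow(samples: List[int], seed: int) -> List[int]:
    """Fire enabled actors in a random order until no actor can fire."""
    rng = random.Random(seed)
    src: Deque[int] = deque(samples)  # tokens waiting at the source
    edge: Deque[int] = deque()        # FIFO channel: scale -> pair_sum
    out: List[int] = []               # output stream

    def scale() -> bool:              # consumes 1 token, produces 1 token
        if not src:
            return False
        edge.append(2 * src.popleft())
        return True

    def pair_sum() -> bool:           # consumes 2 tokens, produces 1 token
        if len(edge) < 2:
            return False
        a, b = edge.popleft(), edge.popleft()
        out.append(a + b)
        return True

    actors: List[Callable[[], bool]] = [scale, pair_sum]
    while True:
        rng.shuffle(actors)
        # any() stops at the first actor that fires, so each pass fires at most one.
        if not any(actor() for actor in actors):
            break
    return out


print(run_dataflow([1, 2, 3, 4, 5], seed=0))  # [6, 14]; the fifth token waits for a partner
```

Determinacy here comes from the FIFO channel and from actors that only block on missing input tokens, never on which actor happened to run first.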
Finite State Machines (FSMs)
• Functional decomposition into states of operation
– Useful for control functions, protocols
• Properties of FSMs
– Good for specifying sequential control
– Not Turing complete
– More amenable to formal analysis
• Typical domains of application
– Control-intensive tasks
– Protocols (Telecom, cache-coherency, bus, ...)
• Many variants of the formulation
– Differ in communication, determinism, ...
FSM Example: Seat Belt Alarm Control
State diagram (states OFF, WAIT, ALARM; transitions labeled "input => output"):
• OFF to WAIT: KEY_ON => START_TIMER
• WAIT to OFF: KEY_OFF or BELT_ON
• WAIT to ALARM: 5_SECONDS_UP => ALARM_ON
• ALARM to OFF: 10_SECONDS_UP or BELT_ON or KEY_OFF => ALARM_OFF
Informal specification: If the driver turns on the key and does not fasten the seat belt within 5 seconds, then sound the alarm for 5 seconds, or until the driver fastens the seat belt, or until the driver turns off the key.
No explicit condition => implicit self-loop in the current state.
Finite State Machine: Example + Definition
FSM = (Inputs, Outputs, States, InitialState, NextState, Outs)
• Inputs = {KEY_ON, KEY_OFF, BELT_ON, BELT_OFF, 5_SECONDS_UP, 10_SECONDS_UP}
• Outputs = {START_TIMER, ALARM_ON, ALARM_OFF}
• States = {OFF, WAIT, ALARM}
• InitialState = OFF
• NextState: CurrentState, Inputs -> NextState
– e.g., NextState(WAIT, {KEY_OFF}) = OFF; all inputs other than KEY_OFF are implicitly absent
• Outs (function): CurrentState, Inputs -> Outputs
– e.g., Outs(OFF, {KEY_ON}) = START_TIMER
Formally, with S the set of states and $2^{\mathit{Inputs}}$ the set of all subsets of Inputs:
$\mathrm{NextState}: 2^{\mathit{Inputs}} \times S \to S$
$\mathrm{Outs}: 2^{\mathit{Inputs}} \times S \to 2^{\mathit{Outputs}}$
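The definition maps directly onto a step function that takes the current state and the set of present inputs and returns (NextState, Outs). A minimal Python sketch of the seat-belt machine follows. One assumption is not in the slides: when KEY_OFF or BELT_ON arrives in the same reaction as 5_SECONDS_UP, this version gives KEY_OFF/BELT_ON priority, which is what makes it deterministic.

```python
from typing import FrozenSet, Set, Tuple

OFF, WAIT, ALARM = "OFF", "WAIT", "ALARM"


def seat_belt_step(state: str, inputs: FrozenSet[str]) -> Tuple[str, Set[str]]:
    """One reaction: returns (NextState(state, inputs), Outs(state, inputs)).

    Inputs missing from `inputs` are absent. When no transition is enabled the
    machine stays in its state and emits nothing (the implicit self-loop).
    """
    if state == OFF:
        if "KEY_ON" in inputs:
            return WAIT, {"START_TIMER"}
    elif state == WAIT:
        # Assumption: KEY_OFF/BELT_ON take priority over 5_SECONDS_UP.
        if inputs & {"KEY_OFF", "BELT_ON"}:
            return OFF, set()
        if "5_SECONDS_UP" in inputs:
            return ALARM, {"ALARM_ON"}
    elif state == ALARM:
        if inputs & {"10_SECONDS_UP", "BELT_ON", "KEY_OFF"}:
            return OFF, {"ALARM_OFF"}
    return state, set()


state = OFF
for inputs in [{"KEY_ON"}, set(), {"5_SECONDS_UP"}, {"BELT_ON"}]:
    state, outs = seat_belt_step(state, frozenset(inputs))
    print(state, sorted(outs))
```

The trace walks OFF to WAIT (START_TIMER), stays in WAIT on an empty input set, moves to ALARM (ALARM_ON), and returns to OFF (ALARM_OFF) when the belt is fastened.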
Non-deterministic Finite State Machines
• A finite state machine is said to be non-deterministic when the NextState and Output functions may be RELATIONs (instead of functions).
– e.g., NextState(WAIT, {KEY_OFF, 5_SECONDS_UP}) = {{OFF}, {ALARM}}
• Non-determinism can be used to model:
– unspecified behavior: incomplete specification
– unknown behavior: e.g., the environment model. The driver can be modeled as a single-state FSM with outputs {KEY_ON, KEY_OFF, BELT_ON}
– abstraction (the abstraction may result in insufficient detail to identify previously distinguishable situations)
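Treating NextState as a relation means the step function returns a set of possible next states rather than one. The Python sketch below does this for the seat-belt machine (using 5_SECONDS_UP for the 5-second timeout), then models the environment non-deterministically, as a source that may emit any subset of the driver and timer events, and explores every state the machine can reach. Function names and the environment event list are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def nd_next_states(state: str, inputs: FrozenSet[str]) -> Set[str]:
    """NextState as a relation: every state the machine may move to."""
    succ: Set[str] = set()
    if state == "OFF":
        if "KEY_ON" in inputs:
            succ.add("WAIT")
    elif state == "WAIT":
        if inputs & {"KEY_OFF", "BELT_ON"}:
            succ.add("OFF")
        if "5_SECONDS_UP" in inputs:   # both branches may be enabled at once
            succ.add("ALARM")
    elif state == "ALARM":
        if inputs & {"10_SECONDS_UP", "BELT_ON", "KEY_OFF"}:
            succ.add("OFF")
    return succ or {state}             # nothing enabled: implicit self-loop


# Environment model: driver and timer may produce any subset of these events.
ENV_EVENTS = ["KEY_ON", "KEY_OFF", "BELT_ON", "5_SECONDS_UP", "10_SECONDS_UP"]


def all_input_sets(events: List[str]) -> List[FrozenSet[str]]:
    return [frozenset(c) for r in range(len(events) + 1) for c in combinations(events, r)]


def reachable(initial: str = "OFF") -> Set[str]:
    """Explore all states reachable under every possible environment behavior."""
    seen = {initial}
    frontier = [initial]
    while frontier:
        s = frontier.pop()
        for inputs in all_input_sets(ENV_EVENTS):
            for t in nd_next_states(s, inputs):
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen


print(sorted(nd_next_states("WAIT", frozenset({"KEY_OFF", "5_SECONDS_UP"}))))  # ['ALARM', 'OFF']
```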
Concurrency and FSM
• Significant model change: treat the system as a ‘collection’ of FSMs
• Fundamental assumption: all FSMs change states together (synchronicity)
• System state is the Cartesian product of the component states
• State space can be reduced by constrained compositions
– e.g., sequential composition: the output of one machine is the input of another
• A cleaner way to extend the FSM model? Hierarchy
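A small experiment shows how a constrained (sequential) composition reaches fewer states than the full Cartesian product. This Python sketch assumes two toy machines invented for illustration: A is a mod-4 counter that outputs its current count, and B is a 1-bit register that stores the parity of its input. Both step on the same clock, and A's output in a cycle is B's input in that cycle.

```python
from typing import Callable, Set, Tuple

State = Tuple[int, int]  # (state of counter A, state of parity register B)


def counter_step(a: int) -> Tuple[int, int]:
    """Machine A: mod-4 counter. Returns (next_state, output); output = current count."""
    return (a + 1) % 4, a


def parity_step(x: int) -> int:
    """Machine B: 1-bit register whose next state is the parity of its input x."""
    return x % 2


def sync_step(s: State) -> State:
    """Synchronous sequential composition: both machines change state together."""
    a, _b = s
    a_next, a_out = counter_step(a)
    return a_next, parity_step(a_out)


def reachable(init: State, step: Callable[[State], State]) -> Set[State]:
    """States visited from init; valid because the composed system has no free inputs."""
    seen = {init}
    s = step(init)
    while s not in seen:
        seen.add(s)
        s = step(s)
    return seen


full_space = 4 * 2  # Cartesian product of A's and B's state sets
reached = reachable((0, 1), sync_step)
print(len(reached), "of", full_space)  # 4 of 8
```

Starting B at 1 models a register that has already seen the count 3; from there the composition only visits states in which B holds the parity of A's previous count, half of the product space.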
Discrete Event Models
• Actions, events
• Notion of global time
– Though it is not fundamental: time progress can be captured by ‘special’ events
• Events can happen at any time, asynchronously
• A system consists of components with input events and output events
– Also referred to as ‘primary events’
• A component is evaluated in response to input events
– Evaluation leads to events at the output
• A discrete event simulator is a program that specifies how components are evaluated:
– All components at each time step (‘clock-driven’)
– One event at a time (‘event-driven’)
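An event-driven simulator needs little more than a time-ordered event queue and a record of which components are sensitive to which signals. The Python sketch below uses the standard `heapq` module for the queue; the `EventSim` and `Gate` classes, and the NAND example with a 2-unit delay, are hypothetical and meant only to illustrate the idea. Global time jumps from one event to the next, and only components whose inputs changed are re-evaluated.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class EventSim:
    """Minimal event-driven discrete-event simulator (illustrative sketch)."""

    def __init__(self) -> None:
        self.now = 0
        self.values: Dict[str, int] = {}                  # current value of each signal
        self.fanout: Dict[str, List["Gate"]] = {}         # components sensitive to a signal
        self.queue: List[Tuple[int, int, str, int]] = []  # (time, seq, signal, value)
        self.seq = 0                                      # tie-breaker: FIFO at equal times
        self.trace: List[Tuple[int, str, int]] = []

    def schedule(self, time: int, signal: str, value: int) -> None:
        heapq.heappush(self.queue, (time, self.seq, signal, value))
        self.seq += 1

    def run(self, until: int) -> None:
        while self.queue and self.queue[0][0] <= until:
            time, _, signal, value = heapq.heappop(self.queue)
            self.now = time                     # global time jumps to the next event
            if self.values.get(signal) == value:
                continue                        # no change, so no new events
            self.values[signal] = value
            self.trace.append((time, signal, value))
            for gate in self.fanout.get(signal, []):
                gate.evaluate(self)             # evaluate only affected components


class Gate:
    def __init__(self, sim: EventSim, fn: Callable[..., int],
                 inputs: List[str], output: str, delay: int) -> None:
        self.fn, self.inputs, self.output, self.delay = fn, inputs, output, delay
        for s in inputs:
            sim.fanout.setdefault(s, []).append(self)

    def evaluate(self, sim: EventSim) -> None:
        ins = [sim.values.get(s, 0) for s in self.inputs]  # unset signals read as 0
        sim.schedule(sim.now + self.delay, self.output, self.fn(*ins))


sim = EventSim()
Gate(sim, lambda a, b: int(not (a and b)), ["a", "b"], "y", delay=2)  # NAND gate
sim.schedule(0, "a", 1)
sim.schedule(5, "b", 1)
sim.run(until=20)
print(sim.trace)  # [(0, 'a', 1), (2, 'y', 1), (5, 'b', 1), (7, 'y', 0)]
```

A clock-driven simulator would instead evaluate every gate at every time step, which is simpler but wastes work when few signals change.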
Reactive (Real-time) Systems
• Reactive systems “react” to events
  – e.g., in the external environment or from other subsystems
  – Suited for modeling “non-terminating” interactions, e.g., operating systems, interrupt handlers, process control systems
  – Often subject to external timing constraints (“real-time”)
• Synchronous reactive systems
  – Synchrony associates a ‘clock’ with a model: all ‘synchronous events’ happen simultaneously
  – The clock is a ‘simplification’ or abstraction of time in models: between clock ticks, any amount of time can pass
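The synchrony abstraction can be sketched in Python as a pure reaction function applied once per tick. The counter/alarm behavior below is a hypothetical example; what it illustrates is that all events in a tick are treated as simultaneous and that the wall-clock time between ticks has no effect on the result.

```python
from typing import FrozenSet, List, Tuple

Events = FrozenSet[str]


def react(count: int, events: Events) -> Tuple[int, Events]:
    """One synchronous reaction: all events in `events` are simultaneous and the
    outputs are produced in zero logical time. Hypothetical example: emit ALARM on
    every third SECOND event unless RESET is present in the same instant."""
    if "RESET" in events:
        return 0, frozenset()
    if "SECOND" in events:
        count += 1
        if count == 3:
            return 0, frozenset({"ALARM"})
    return count, frozenset()


def run(ticks: List[Tuple[float, Events]]) -> List[Events]:
    """Feed (wall_clock_time, events) pairs to the reactive kernel. The wall-clock
    time is ignored on purpose: between ticks, any amount of time may pass."""
    count, outputs = 0, []
    for _wall_time, events in ticks:
        count, out = react(count, events)
        outputs.append(out)
    return outputs
```

Running the same three SECOND ticks spaced by milliseconds or by minutes yields the same output sequence, with ALARM at the third tick.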
Four useful MOCs
• Discrete Event (DE)
  – Timed models, suitable for modeling digital hardware
  – But can be very general (define what an event is and what happens to it)
• Finite State Machines
  – Variants and extensions: StateCharts, StarCharts
• Synchronous Reactive Models
  – Synchrony assumption (instantaneous reactions) useful for safety-critical embedded systems
    • Converts timing relations to causal ordering
  – A program is logically correct if it is deterministic and reactive
  – Verifying that a program is causal is a challenge
    • Want one and only one solution for each configuration of inputs
  – Assume “constructive causality” to make it work
    • Still a lot better than multi-level time (delta) models
• Dataflow Process Networks
  – Signal processing applications
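A Python sketch of a tiny dataflow network, under the simplifying assumption of fixed token rates (a synchronous-dataflow restriction; general dataflow process networks allow data-dependent rates). Actors communicate only through FIFOs and fire when enough input tokens are available; the network and actor functions are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Actor:
    """Dataflow actor with fixed consumption/production rates."""

    def __init__(self, inp: str, out: str, consume: int,
                 fn: Callable[[List[int]], List[int]]) -> None:
        self.inp, self.out, self.consume, self.fn = inp, out, consume, fn

    def enabled(self, fifos: Dict[str, Deque[int]]) -> bool:
        return len(fifos[self.inp]) >= self.consume

    def fire(self, fifos: Dict[str, Deque[int]]) -> None:
        tokens = [fifos[self.inp].popleft() for _ in range(self.consume)]
        fifos[self.out].extend(self.fn(tokens))


def run(samples: List[int], pick: Callable[[List[Actor]], Actor]) -> List[int]:
    """Build src -> scale(x2) -> pairsum -> out and fire enabled actors until none is
    enabled; `pick` chooses which enabled actor fires next."""
    fifos = {"src": deque(samples), "scaled": deque(), "out": deque()}
    actors = [
        Actor("src", "scaled", 1, lambda t: [2 * t[0]]),     # scale: 1 token in, 1 out
        Actor("scaled", "out", 2, lambda t: [t[0] + t[1]]),  # pairsum: 2 tokens in, 1 out
    ]
    while True:
        ready = [a for a in actors if a.enabled(fifos)]
        if not ready:
            return list(fifos["out"])
        pick(ready).fire(fifos)
```

In this example, firing the upstream actor eagerly (`lambda r: r[0]`) or the downstream one whenever possible (`lambda r: r[-1]`) both yield `[6, 14]`: because each actor only reads its FIFO in order, the result does not depend on the schedule, which is the kind of determinism that makes dataflow attractive for signal processing.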
Compositional Correctness
• Build “complete” system models
  – that include the application and system software
  – to adapt, control and debug applications
  – to explore the full potential of SOC architectural platforms, e.g., by exploring applications, networking and communication subsystems together
• Composition challenges
  – Language support for multiple MOCs is not enough
  – Model composability may not be guaranteed
    • E.g., composition of synchronous models may not be closed
    • Connecting two FSMs, for instance, can lead to combinational cycles
  – Solutions include delta steps (VHDL), acyclic composition (Lustre), and reactions as fixed points (Esterel)
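The "reactions as fixed points" idea can be sketched in Python with three-valued logic, in the spirit of (but much simpler than) Esterel's constructive semantics: start with every signal on the cycle unknown and propagate only what can be derived. The two equations below form a hypothetical combinational cycle between two machines.

```python
from typing import Callable, Dict, Optional

Tri = Optional[int]  # 0, 1, or None meaning "not yet known"


def t_or(a: Tri, b: Tri) -> Tri:
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return None


def t_and(a: Tri, b: Tri) -> Tri:
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return None


Equations = Dict[str, Callable[[Dict[str, Tri]], Tri]]


def solve(inputs: Dict[str, int], eqs: Equations) -> Dict[str, Tri]:
    """Iterate from 'all outputs unknown' until nothing changes. Values only move
    from None to 0/1, so the loop terminates after at most len(eqs) + 1 passes."""
    env: Dict[str, Tri] = {**inputs, **{name: None for name in eqs}}
    changed = True
    while changed:
        changed = False
        for name, f in eqs.items():
            if env[name] is None:
                v = f(env)
                if v is not None:
                    env[name] = v
                    changed = True
    return env


# Hypothetical cycle: machine A computes y1 = x OR y2, machine B computes y2 = y1 AND en.
CYCLE: Equations = {
    "y1": lambda e: t_or(e["x"], e["y2"]),
    "y2": lambda e: t_and(e["y1"], e["en"]),
}


def constructive(env: Dict[str, Tri]) -> bool:
    """Accept the reaction only if every signal was resolved."""
    return all(v is not None for v in env.values())
```

With x = 1 the cycle resolves (y1 = y2 = 1). With x = 0 and en = 1, both 0 and 1 are consistent solutions, so the iteration leaves y1 and y2 unknown and the reaction is rejected; with en = 0 the cycle is broken logically and resolves to 0.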
Going Across MOCs: The Ptolemy Approach
• Encapsulate each description in a MOC in a “domain”
• Inter-domain simulation is achieved through domain encapsulation
  – Define the semantics of every such encapsulation carefully and conservatively (and yet with some efficiency)
• The “event horizon”
  – Couples timed and untimed domains
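A toy Python sketch of domain encapsulation, loosely modeled on the Ptolemy idea rather than on Ptolemy II's actual API (all class names here are hypothetical): an untimed model is wrapped so that a timed, discrete-event director can treat it as an ordinary actor, and the wrapper alone decides how time crosses the boundary.

```python
import heapq
from typing import List, Tuple


class UntimedAccumulator:
    """Inner, untimed model (hypothetical): it has no notion of time, only 'fire'."""

    def __init__(self) -> None:
        self.total = 0

    def fire(self, token: int) -> int:
        self.total += token
        return self.total


class TimedWrapper:
    """Opaque actor as seen by the timed (DE) domain. At the boundary, every timed
    input event triggers exactly one untimed firing, and the result is stamped
    with the input time plus a fixed delay (an assumed, conservative rule)."""

    def __init__(self, inner: UntimedAccumulator, delay: int) -> None:
        self.inner, self.delay = inner, delay

    def handle(self, time: int, token: int) -> Tuple[int, int]:
        return time + self.delay, self.inner.fire(token)


def de_director(events: List[Tuple[int, int]], actor: TimedWrapper) -> List[Tuple[int, int]]:
    """Tiny discrete-event director: deliver events in timestamp order."""
    queue = list(events)
    heapq.heapify(queue)
    outputs = []
    while queue:
        time, token = heapq.heappop(queue)
        outputs.append(actor.handle(time, token))
    return outputs
```

Events given out of order, such as `[(20, 5), (5, 1), (10, 2)]`, are delivered by timestamp, and the untimed accumulator's results come back time-stamped one unit later.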
Network Architecture Modeling: NS2
• Developed under the Virtual InterNetwork Testbed (VINT) project (UCB, LBL, USC/ISI, Xerox PARC)
• Captures network nodes and topology, and provides efficient event-driven simulation with a number of “schedulers”
• Interpreted (OTcl) interface for
  – network configuration and simulation setup
  – using existing simulation kernel objects, such as predefined network links
• Simulation model in C++ for
  – packet processing
  – changing models of existing simulation kernel classes, e.g., using a special queuing discipline
NS2 Simulations
A 4-node system with two “agents” and a traffic generator
[Figure: topology with nodes n0, n1, n2, n3; labels UDP, TCP, ftp and Sink]
```tcl
set ns [new Simulator]
set f [open out.tr w]
$ns trace-all $f

# Four nodes
set n0 [$ns node]
set n1 [$ns node]
set n2 [$ns node]
set n3 [$ns node]

# Duplex links: bandwidth, propagation delay, queue type
$ns duplex-link $n0 $n2 5Mb 2ms DropTail
$ns duplex-link $n1 $n2 5Mb 2ms DropTail
$ns duplex-link $n2 $n3 1.5Mb 10ms DropTail

# UDP agent on n0, driven by a constant-bit-rate (CBR) traffic generator
set udp0 [new Agent/UDP]
$ns attach-agent $n0 $udp0
set cbr0 [new Application/Traffic/CBR]
$cbr0 attach-agent $udp0
# ..
$ns at 3.0 "finish"
proc finish {} {
    # …
}
$ns run
```
• “Agents” are network endpoints where network-layer packets are constructed or consumed.
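As a back-of-the-envelope check on what the simulation should show, the Python sketch below computes the minimum one-way delay from n0 to n3 for the link parameters in the script, ignoring queueing and processing delays. The 1000-byte packet size is an assumption for illustration, not an ns-2 default.

```python
from typing import List, Tuple


def path_latency_ms(packet_bytes: int, links: List[Tuple[float, float]]) -> float:
    """One-way latency of a single packet over store-and-forward links with empty
    queues: sum of transmission time (size / bandwidth) and propagation delay per hop.
    `links` holds (bandwidth in bit/s, propagation delay in ms)."""
    total = 0.0
    for bandwidth_bps, prop_ms in links:
        total += packet_bytes * 8 / bandwidth_bps * 1000.0 + prop_ms
    return total


# Path n0 -> n2 -> n3 from the script: 5Mb/2ms, then the 1.5Mb/10ms bottleneck link.
N0_TO_N3 = [(5e6, 2.0), (1.5e6, 10.0)]
```

For a 1000-byte packet this gives 1.6 + 2 + 5.33 + 10 ≈ 18.93 ms. The 1.5 Mb/s n2-n3 link is the bottleneck, so traffic offered above that rate will build up the DropTail queue at n2.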
NS2 Usage: LAN nodes
• LAN and wireless links are inherently different from point-to-point (PTP) links due to the sharing and contention properties of LANs
  – A network consisting of PTP links alone cannot capture LAN contention properties
  – A special node is provided to specify LANs
• LanNode captures the functionality of the three lowest layers in the protocol stack, namely the link, MAC and physical layers
  – It specifies the objects to be created for the LL, INTF, MAC and physical channels
  – Syntax: $ns make-lan <nodelist> <bw> <delay> <LL> <ifq> <MAC> <channel> <phy>
  – Example: $ns make-lan "$n1 $n2" $bw $delay LL Queue/DropTail Mac/Csma/Cd
  – This creates a LAN with a basic link layer, a drop-tail queue and CSMA/CD medium access control
[Figure: nodes n1, n2, n3 shown twice, the second time attached to a shared LAN node]
The LAN node collects all the objects shared on the LAN.
Network Stack Simulation for LAN Nodes in ns
[Figure: node1, node2 and node3, each with Q, LL and MAC objects, attached to a shared Channel with a MAC classifier; per-node stack: LL, MAC, Phy]
The Channel object simulates the shared medium and supports the medium access mechanisms of the MAC objects on the sending side. On the receiving side, the MAC classifier is responsible for delivering, and optionally replicating, packets to the receiving MAC objects.
Objects used in LAN nodes: each of the underlying classes can be specialized for a given simulation.
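The division of labor between the channel and the MAC classifier can be mimicked in a few lines of Python. This is a hypothetical toy (the names `Channel`, `Mac`, `end_slot` and so on are illustrative, not ns-2's classes), and it assumes slotted time: the channel is the only place where transmissions from different MACs meet, so it is where contention shows up, and a classifier step fans a successful packet out to the matching receivers.

```python
from typing import Dict, List


class Channel:
    """Toy shared medium with slotted time (hypothetical, not ns-2's C++ classes).
    More than one transmission in the same slot is treated as a collision."""

    def __init__(self) -> None:
        self.macs: List["Mac"] = []
        self.pending: List[Dict[str, str]] = []

    def attach(self, mac: "Mac") -> None:
        self.macs.append(mac)

    def transmit(self, packet: Dict[str, str]) -> None:
        self.pending.append(packet)

    def end_slot(self) -> bool:
        """Resolve the current slot; return True if a collision occurred."""
        pending, self.pending = self.pending, []
        if len(pending) > 1:
            return True                      # contention: no packet gets through
        for packet in pending:
            self.classify(packet)
        return False

    def classify(self, packet: Dict[str, str]) -> None:
        """MAC-classifier role: deliver (replicating for broadcast) to receiving MACs."""
        for mac in self.macs:
            if mac.addr != packet["src"] and packet["dst"] in (mac.addr, "broadcast"):
                mac.received.append(packet)


class Mac:
    def __init__(self, addr: str, channel: Channel) -> None:
        self.addr, self.channel = addr, channel
        self.received: List[Dict[str, str]] = []
        channel.attach(self)

    def send(self, dst: str, payload: str) -> None:
        self.channel.transmit({"src": self.addr, "dst": dst, "payload": payload})
```

A unicast reaches only its destination, a broadcast is replicated to every other MAC, and two sends in the same slot collide so that neither is delivered.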
Putting things together…
Source: Virtio Corp.
[Figure: ASIC hardware, network processor(s) and memories; system software: OS, middleware, application software]
Time Granularity in Models
A. "Specification model", "Untimed functional model"
B. "Component-assembly model", "Architecture model", "Timed functional model"
C. "Bus-arbitration model", "Transaction model"
D. "Bus-functional model", "Communication model", "Behavior level model"
E. "Cycle-accurate computation model"
F. "Implementation model", "Register transfer model"
[Figure: models A-F placed on a grid with axes Computation and Communication, each graded Un-timed, Approximate-timed, Cycle-timed]
• Models B, C, D and E could be classified as transaction-level models (TLMs).
Source: Daniel Gajski, UC Irvine.
Hardware-software co-simulation
• Verification of the functionality of a system consisting of both hardware and software (as early as possible in the design cycle)
• Processor model
  – BFM (bus-functional), ISA (instruction-set), CAM (cycle-accurate), TAM (timing-accurate)
• Custom hardware model
  – Functional, behavioral, RTL, gate, transistor
• Communication
  – Tightly coupled or loosely coupled
  – One process or multi-process
Processor Models
• Four types of models
  – Bus-functional models
  – Instruction-set models
  – Cycle-accurate models
  – Timing-accurate models
[Figure: BFM and ISM]
Bus-functional Models
• Can only execute bus transactions
• Can be used to check how peripherals interact with the processor bus
• Available in different degrees of timing accuracy
  – Cycle-accurate
  – Phase-accurate
  – Full timing (nanosecond) accurate
• Very popular in hardware design
[Figure: BFM waveform with signals CLK, ADDRESS, CE, DATA and R/W for a "Read from 0xff00" transaction]
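A cycle-accurate BFM can be sketched in Python as an object whose only operations are bus transactions, each recorded as per-cycle pin values. The three-cycle read protocol, the `Peripheral` class and all other names are hypothetical; a real BFM would follow the processor's bus specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BusCycle:
    """Pin values at one CLK edge of a hypothetical three-cycle read protocol."""
    address: Optional[int]
    ce: int                 # chip enable
    rw: str                 # "R" = read, "W" = write
    data: Optional[int]     # None = bus not driven


class Peripheral:
    """Toy memory-mapped device answering reads from a register map."""

    def __init__(self, regs: Dict[int, int]) -> None:
        self.regs = regs

    def respond(self, address: int) -> int:
        return self.regs.get(address, 0)


class BusFunctionalModel:
    """Executes only bus transactions: no instructions, no register file. Each
    transaction is expanded into the cycle-by-cycle pin activity the peripheral sees."""

    def __init__(self, device: Peripheral) -> None:
        self.device = device
        self.waveform: List[BusCycle] = []

    def read(self, address: int) -> int:
        self.waveform.append(BusCycle(address, 1, "R", None))   # address phase
        value = self.device.respond(address)
        self.waveform.append(BusCycle(address, 1, "R", value))  # data phase
        self.waveform.append(BusCycle(None, 0, "R", None))      # idle, CE deasserted
        return value
```

A read from 0xff00 returns the peripheral's value and leaves a three-entry waveform behind, which a testbench can compare against the expected protocol timing.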
Instruction-set (ISA) Models
• Basic ISA model
  – Models only the effect of instruction execution on registers and memory
  – Does not model the processor pipeline
  – Fast; used in embedded software models
• Cycle-accurate ISA model
  – Models the processor pipeline and instruction execution in a cycle-accurate manner
  – Provides accurate cycle counts for instruction execution
  – 1.2-5X slower than a basic ISA model
[Figure: Fetch, Decode, Execute pipeline with register file and memory, running mov r0, r1; add r0, r2, r3; st r0, (r5)]
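A basic ISA model reduces to an interpreter loop over instruction effects. The Python sketch below executes the three instructions shown on the slide; the operand order (destination first) and the register and memory contents are assumptions for illustration, and there is deliberately no pipeline or cycle count.

```python
from typing import Dict, List


def run_isa(program: List[str], regs: Dict[str, int], mem: Dict[int, int]) -> int:
    """Basic instruction-set model: applies each instruction's effect on registers
    and memory, with no pipeline or timing. Returns the number of instructions executed.
    Assumed semantics (destination first): mov rd, rs | add rd, ra, rb | st rs, (rb)."""
    executed = 0
    for line in program:
        op, args = line.split(None, 1)
        ops = [a.strip() for a in args.split(",")]
        if op == "mov":
            regs[ops[0]] = regs[ops[1]]
        elif op == "add":
            regs[ops[0]] = regs[ops[1]] + regs[ops[2]]
        elif op == "st":
            base = ops[1].strip("()")          # "(r5)" -> "r5"
            mem[regs[base]] = regs[ops[0]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        executed += 1
    return executed
```

A cycle-accurate ISA model would additionally track which pipeline stage each instruction occupies per cycle, which is where the extra simulation cost comes from.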
Processor Models
• ISA processor model
  – ISA model + cycle-accurate BFM
  – Cycle-accurate bus transactions, but not cycle-accurate instruction execution
  – Fastest useful processor model
• Cycle-accurate processor model
  – Cycle-accurate ISA + cycle-accurate BFM
  – Cycle-accurate instruction execution and bus transactions
  – Slower than the ISA processor model, but still popular
[Figure: ISM combined with a BFM]
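Combining the two ideas, an ISA processor model can be sketched in Python as an untimed instruction interpreter that routes every memory access through a cycle-accurate BFM. The two-cycle write protocol and all class names are hypothetical; the point is that register-only instructions cost no modeled time, while each store produces exact bus cycles.

```python
from typing import Dict, List, Tuple


class CycleAccurateBFM:
    """Hypothetical bus with a two-cycle write: address phase, then data phase."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}
        self.bus_cycles: List[Tuple[str, int]] = []   # one entry per bus clock

    def write(self, addr: int, value: int) -> None:
        self.bus_cycles.append(("ADDR", addr))
        self.bus_cycles.append(("DATA", value))
        self.mem[addr] = value


class ISAProcessorModel:
    """ISA model + cycle-accurate BFM: instructions execute untimed, but every
    store is handed to the BFM, so bus traffic is cycle-accurate."""

    def __init__(self, bfm: CycleAccurateBFM) -> None:
        self.regs: Dict[str, int] = {}
        self.bfm = bfm

    def step(self, op: str, *ops: str) -> None:
        if op == "mov":                                   # mov rd, rs
            self.regs[ops[0]] = self.regs[ops[1]]
        elif op == "add":                                 # add rd, ra, rb
            self.regs[ops[0]] = self.regs[ops[1]] + self.regs[ops[2]]
        elif op == "st":                                  # st rs, (rb)
            self.bfm.write(self.regs[ops[1]], self.regs[ops[0]])
        else:
            raise ValueError(f"unknown opcode {op!r}")
```

After `mov` and `add` the bus is still idle; only the `st` appears on the bus, as exactly two cycles. A cycle-accurate processor model would also assign cycles to the `mov` and `add`.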
Timing-accurate Models
• Correctly model the processor behavior with nanosecond accuracy
• Usually generated from a gate-level netlist of the processor
• Slow (could be 3 to 5 orders of magnitude slower than cycle-accurate processor models)
• Seldom used
Typical Usage Models
• System architects looking at hardware/software tradeoffs
• ASIC developers wanting a fast and easy way to test out the hardware running actual code
• Software developers testing H/W drivers and RTOS on hardware (HDL) models
• Software developers testing application code with an RTOS on the “real” hardware (i.e., an evaluation board)
• Distributed application developers
  – SensorSIM, TOSSIM