ECE 551 Digital System Design & Synthesis
Lecture 12: “To synthesis, and beyond…”
So, the thing finally synthesized! So, what have you created so far?
  A list of the required hardware cells
  A netlist describing their interconnections
  A simulation model that hopefully reflects reality more accurately than the pure HDL-level simulation
    Includes semi-accurate logic delays
Now What?
  After synthesis, we have a netlist mapped to our specific tech library
    ROMs
    PLDs
    FPGAs
    Standard cells
    Custom logic
  Choose implementation platform based on cost and performance requirements
ROMs
  Use like a GIANT truth table
  Can be inefficient for simple logic!
    Gates
      Specify just the 1’s
      Specify just the 0’s
    ROM
      Has to specify both!
      All outputs for all possible minterms
[Figure: ROM used as a truth table. The address inputs d, c, b, a select one row; the data word stored in that row drives the outputs x, y, z.]
ROMs
  Use like a GIANT truth table
  64 Kbit ROM: 8K entries x 8 bits (13 addr. lines)
    8 Boolean functions using any of these 13 1-bit variables
[Figure: ROM with 13 address inputs (a through m) and 8 data outputs (s through z).]
ROMs
  Use like a GIANT truth table
  64 Kbit ROM: 8K entries x 8 bits (13 addr. lines)
    2 4-bit functions of 3 4-bit variables (plus flag)
    Other options possible
[Figure: the same ROM with 13 address inputs (a through m) and 8 data outputs (s through z).]
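A minimal sketch of the "2 4-bit functions of 3 4-bit variables (plus flag)" idea. The address layout (flag in the top bit, then a, b, c) and the two functions chosen here are illustrative assumptions, not taken from the slide; the point is only that the ROM contents are a precomputed truth table and "evaluation" is one memory read.

```python
def build_rom(func, addr_bits=13):
    """Return one data byte per address: the ROM's complete truth table."""
    return [func(addr) & 0xFF for addr in range(2 ** addr_bits)]

def two_functions(addr):
    # Hypothetical address layout: flag in bit 12, then a, b, c (4 bits each).
    flag = (addr >> 12) & 1
    a = (addr >> 8) & 0xF
    b = (addr >> 4) & 0xF
    c = addr & 0xF
    # First 4-bit function (illustrative): a+b, or a+b+c when the flag is set.
    f_hi = (a + b + c) & 0xF if flag else (a + b) & 0xF
    # Second 4-bit function (illustrative): bitwise XOR of the three inputs.
    f_lo = a ^ b ^ c
    return (f_hi << 4) | f_lo

rom = build_rom(two_functions)

def read(flag, a, b, c):
    """The ROM 'computes' by a single read at the packed address."""
    return rom[(flag << 12) | (a << 8) | (b << 4) | c]

print(len(rom), "entries,", len(rom) * 8, "bits")
print(hex(read(0, 3, 4, 9)))
```

Any other pair of 4-bit functions fits the same 8K x 8 table; only the stored contents change.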
ROM Logical Structure
[Figure: n inputs addr[0] … addr[n-1] feed a non-programmable address decoder, which forms 2^n minterms (word lines) from the inputs. The word lines drive an OR memory array (2^n x m) that produces the m outputs w[m-1] … w[0].]
ROM Circuit Structure
[Figure: a non-programmable n-to-2^n address decoder (inputs addr[0] … addr[n-1], enable En_bar) drives the word lines (product terms) of a 2^n x m OR-plane. Each memory cell is a mask-programmed pulldown transistor link between a word line and a bit line. Each bit line has a pull-up resistor to VDD and forms one of the outputs D[m-1] … D[0].]
Erasable Programmable ROM (EPROM)
[Figure: a non-programmable n-to-2^n address decoder (inputs addr[0] … addr[n-1], enable En_bar) drives word lines min0 … min(2^n - 1) into a 2^n x m OR-plane. Each memory cell is a programmable floating-gate transistor. The bit lines have pull-up resistors to VDD and form the outputs w[m-1] … w[0].]
Flash Memory
  A flash memory is an electrically erasable PROM configured with additional circuitry to allow in-circuit erasure/programming of blocks of memory (e.g. 16-64 Kbytes).
  Widely used as the program storage memory for computers and embedded systems, as well as data storage memory (audio, video, file systems)
    High endurance: 100k/1M+ erase cycles
  Flash memory (SSDs) is cost-competitive with magnetic disks up to several GB, with no mechanical shock issues, and much better random-access times.
  Some FPGAs use flash memory instead of SRAM to allow instant-on behavior and not expose IP.
Comparison of ROMs
  Device   Programming mode            Erase mode
  ROM*     Mask                        None
  PROM     Custom by user (OTP***)     None
  EPROM    Out-of-circuit              Out-of-circuit; bulk, UV light
  FLASH    In-circuit                  In-circuit; bulk or sector
  EEPROM   In-circuit; byte-by-byte    In-circuit; byte-by-byte
  Examples: TMS47C256 (32K x 8 CMOS); AT27BV400 (256K x 16 or 512K x 8); Intel 2732 (4K x 8 NMOS, 45 ns); Intel 2864 (8K x 8 NMOS); AT49LV1024 (64K x 16 NMOS, 70 ns**)
  Access time: 150 ns
  Complexity and Cost: [column contents not recoverable]
  * Requires high volume to offset NRE
  ** Programming time: 500 ms
  *** One-time programmable
ROMs
  Cheap – couple bucks each
  Reuse EEPROMs with different truth tables
  Non-volatile - keep values when power gone
  Very slow compared to gates (memory read)
  Combinational-only
  Limited to fairly simple designs (e.g., 20 or fewer inputs) due to exponential scaling
  ROMs are good for complex operations that use few variables (trigonometry, matrix inversion, etc.)
  They are often used in combination with other types of logic
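To make the "exponential scaling" point concrete, here is a small sketch that computes how many bits a ROM needs to tabulate m output bits for every pattern of n inputs. The function name and the input counts printed are illustrative choices.

```python
def rom_bits(n_inputs, m_outputs):
    """Storage needed to hold m output bits for each of the 2**n input patterns."""
    return (2 ** n_inputs) * m_outputs

# 13 inputs is the 8K x 8 ROM from the earlier slides; 20 is the rough
# practical limit the slide mentions; 32 shows why wider inputs are hopeless.
for n in (13, 20, 32):
    print(f"{n} inputs x 8 outputs: {rom_bits(n, 8):,} bits")
```

Each additional input doubles the table, regardless of how simple the function is.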
PLDs
  Programmable Logic Devices
    PLA (Programmable Logic Array) – programmable AND and OR arrays
    PAL (Programmable Array Logic) – programmable AND array and fixed OR array
  Programming done at points where wires cross
[Figure: PLA and PAL side by side. In each, the inputs a, b, c, d and their complements (!a, !b, !c, !d) cross the product-term lines, and the product terms feed the outputs x and y.]
PLDs
  Programming points where wires cross
    x = a b c + a d
    y = a b c d + a b d + b c d
[Figure: the PLA and PAL with crosspoints programmed for x and y.]
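A rough sketch of how a PLA evaluates the two equations, taking them exactly as printed (any complement bars lost in extraction are not reconstructed). The dictionary encoding of the AND and OR planes is an assumption made for illustration: each entry stands for a set of programmed crosspoints.

```python
from itertools import product

# AND plane: each product term lists the literal it needs from each connected
# input (1 = true form, 0 = complemented form). Unlisted inputs are unconnected.
AND_PLANE = {
    "abc":  {"a": 1, "b": 1, "c": 1},
    "ad":   {"a": 1, "d": 1},
    "abcd": {"a": 1, "b": 1, "c": 1, "d": 1},
    "abd":  {"a": 1, "b": 1, "d": 1},
    "bcd":  {"b": 1, "c": 1, "d": 1},
}
# OR plane: which product terms are connected to each output.
OR_PLANE = {"x": ["abc", "ad"], "y": ["abcd", "abd", "bcd"]}

def pla(inputs):
    """Evaluate every product term, then OR the selected terms per output."""
    terms = {name: all(inputs[var] == lit for var, lit in lits.items())
             for name, lits in AND_PLANE.items()}
    return {out: int(any(terms[t] for t in names))
            for out, names in OR_PLANE.items()}

for a, b, c, d in product((0, 1), repeat=4):
    print(a, b, c, d, pla({"a": a, "b": b, "c": c, "d": d}))
```

In a PAL the OR plane would be fixed by the device, so only `AND_PLANE` would be programmable.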
PLDs
  Moderate per-unit price – 1s to 10s of $
  Most are re-programmable
  Faster than ROMs
  Relatively slow compared to gates
    Programming points cause delay
  Limited complexity
    “Complex” PLDs have sequential ability, but are still too limited for very complex designs
    Crossbar design scales poorly with number of inputs
  Good when you don’t need the complexity of an FPGA and want to save money.
FPGAs
  Field Programmable Gate Array
    Temporary (Flash/SRAM based)
    Permanent (Anti-fuse), not as common
  Pros
    Allow for very complex implementations
    Generally reusable (upgrades/bug-fixes/prototype)
    Low non-recurring engineering (NRE) costs
  Cons
    Expensive per-unit (10s-100s of $)
    Slower than gates
      Programming points
  MPGA – mask-programmable (one time)
Programming an FPGA
  Most designs based on SRAM
    During configuration, the SRAM bits in the device are written with the desired values
    Note that this means that your IP is being passed into the FPGA in a serial stream for the whole world to see!
  Different circuits implemented based on values set in SRAM bits that form LUTs, control multiplexers, and make routing connections
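Routing Elements
  Programmable connection
  Programmable bypass
[Figure: a programmable connection between Routing Resource #1 and Routing Resource #2, and a programmable bypass of a DFF between SIGNAL and OUT, each controlled by a programming bit P.]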
Logic Elements
  Look-Up Table (LUT)
    Essentially a very small memory
    Most common size is 4-input LUT
[Figure: 3-input LUT. Inputs a, b, c select one of eight programmed bits P1 … P8 (entries 0 through 7) to drive OUT.]
Logic Elements
  Look-Up Table (LUT) Example
    OUT = a XOR b XOR c
[Figure: the 3-input LUT programmed with 0 1 1 0 1 0 0 1 in entries 0 through 7.]
Logic Elements
  Look-Up Table (LUT) Example
    OUT = ab + ac + bc
[Figure: the 3-input LUT programmed with 0 0 0 1 0 1 1 1 in entries 0 through 7.]
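The two LUT examples can be reproduced with a short sketch: programming a LUT is just tabulating the function, and reading it is indexing with the inputs. The helper names are illustrative, and treating the first input as the most significant address bit is an assumption (both example functions are symmetric, so the ordering does not change their contents).

```python
from itertools import product

def program_lut(func, n_inputs=3):
    """Tabulate func over all input patterns; entry i holds the output for pattern i."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=n_inputs)]

def lut_read(lut, *bits):
    """Use the inputs as an address (first input = most significant bit)."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return lut[index]

xor3 = program_lut(lambda a, b, c: a ^ b ^ c)
maj3 = program_lut(lambda a, b, c: (a & b) | (a & c) | (b & c))

print("a XOR b XOR c :", xor3)
print("ab + ac + bc  :", maj3)
print("maj(1, 0, 1)  =", lut_read(maj3, 1, 0, 1))
```

The same 8-bit memory implements either function; only the stored bits differ.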
Logic Elements
  Look-Up Table (LUT)
    Extremely flexible in implementing logic
      Can implement any function of its inputs!
    Larger and slower than just using gates
FPGA Logic Structure
  “Cell” or “logic block”:
    1 or more LUTs (generally 4-input)
    At least one D flip-flop
    Possibly fast carry logic
  Connect several logic blocks to form circuit
[Figure: logic block with a 4-LUT (inputs I1, I2, I3, I4), carry logic (Cin, Cout), and a DFF driving OUT.]
Xilinx 4000 Configurable Logic Block (CLB)
Xilinx 4000 FPGA (# of CLBs not to scale)
[Figure: a grid of CLBs with switch matrices between them, vertical and horizontal long lines, and a ring of I/O blocks (IOBs) around the edge.]
FPGA Summary
  Allow for complex implementations
  Generally reusable (upgrades/bugfixes/prototype)
  Low non-recurring engineering (NRE) costs
  Relatively expensive per-unit (10s-100s of $)
  Slower than pure gates (programming points), but FPGAs are normally first to use the latest process technology
  Newer FPGAs incorporate memories, multipliers, peripherals, and even processors all on the same chip
FPGA Trends
  Hardware specialization
    Memory block hierarchies
    I/O interfaces
      High-speed serial I/O
    Clock management
    Hardware for DSP (MAC units)
  Intellectual Property (IP) cores
    Hard-cores
    Soft-cores
    http://www.altera.com/products/ip/ipm-index.html
  Conversion to mask-programmed devices
    Altera HardCopy, Xilinx EasyPath
  Current Technology Examples...
Xilinx Virtex-5
  Xilinx’s nearly top of the line FPGA
  65nm process technology
  550MHz RAM blocks
  6-input LUTs
  Serial connectivity
    Ethernet MACs
    Rocket I/O serial 3.25Gbps
    PCI Express endpoint
  Enhanced DSP blocks (25x18, 48b accum)
  1760 pin BGA with 1200 I/O
  EasyPath
Xilinx Virtex-5 Applications
Xilinx Virtex-5 Family
Altera Stratix III
Altera NIOS
Stratix III vs. Virtex-5
  http://www.altera.com/literature/wp/wp-01007.pdf
More Current Products
  Actel FPGAs
    Flash-based design eliminates configuration time
    Less susceptible to radiation induced upsets
    Also manufactured in antifuse technology
Mask-Programmable Gate Arrays
  Mask-programmable (MPGAs)
  Fixed logic elements, metal routing added
[Figure: rows of base cells (transistor / gate) at fixed spacing, with metal interconnect placed in channels between cells.]
MPGAs
  Cheap per-unit pricing ($1s-$10s)
  Fast compared to ROMs/PLDs/FPGAs
  Simpler mask than Standard Cell (routing only)
    Fixed gates available
  High non-recurring engineering (NRE) cost - design time, mask fabrication...
    $10Ks-$100Ks
  Best for medium-to-large quantities
  Used for medium-to-high-volume designs, or hardware that must be faster than FPGA
Standard Cells
  Gates and other small structures
  Can also use macroblocks
    Groups of pre-optimized cells
    Larger custom-layout structures
  Better logic density
  From: http://www.zuraleff.com/layout
Standard Cell Layouts
[Figure: rows of cells (gate, flip-flop, 1-bit adder, …) and megacells at adjustable spacing, with metal interconnect placed in channels between cells.]
IC Layout Styles
  Technologies in terms of layout styles:
[Figure: Standard Cell (cells such as gate, flip-flop, 1-bit adder, …, plus megacells, at adjustable spacing) next to Gate Array (base cells of transistor / gate at fixed spacing). In both, metal interconnect is placed in channels between cells.]
Standard Cells
  Cheap per-unit pricing ($1s-$10s)
  Achieve better logic density than MPGA
  Fast compared to ROMs/PLDs/FPGAs
  High NREs (design time, mask fabrication...)
    $100Ks-$10Ms
    More expensive masks than Gate Arrays
  Used for large quantities and/or performance-critical operations
Custom Logic
  Manual layout
  Extremely high NRE
    Huge design time!
    Even longer verification time
  Maximum performance and density
  PLD/FPGA physical hardware is custom logic
    They sell a LOT of them!
    You don’t have to amortize all of their NRE, just part
Hardware Implementations
  Making the right platform choice is one of the most important decisions for a design project’s success
  There is no one “best” method
    Tradeoffs between cost, speed, time-to-market, upgradeability, power efficiency
  Technological changes are shifting traditional design choices. Engineers must be ready.
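The cost side of this tradeoff can be sketched as NRE plus per-unit cost times volume. The dollar figures below are illustrative assumptions picked from inside the ranges quoted on the earlier slides (the slides give no FPGA NRE figure, only "low"); they are not data for any real part.

```python
def total_cost(nre, unit_cost, volume):
    """Total project cost: one-time NRE plus per-unit cost times volume."""
    return nre + unit_cost * volume

def break_even_volume(nre_a, unit_a, nre_b, unit_b):
    """Volume at which platform A (low NRE, high unit cost) costs the same as B."""
    return (nre_b - nre_a) / (unit_a - unit_b)

# Illustrative numbers only (assumptions, not vendor data).
FPGA = {"nre": 10_000, "unit": 50.0}
STD_CELL = {"nre": 1_000_000, "unit": 5.0}

volume = break_even_volume(FPGA["nre"], FPGA["unit"],
                           STD_CELL["nre"], STD_CELL["unit"])
print(f"Break-even at about {volume:,.0f} units")
for n in (1_000, 100_000):
    print(n, total_cost(FPGA["nre"], FPGA["unit"], n),
          total_cost(STD_CELL["nre"], STD_CELL["unit"], n))
```

Below the break-even volume the low-NRE platform wins on cost; above it the cheaper per-unit platform wins. Speed, power, and time-to-market are not captured by this model.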
Hardware Trends
  Standard Cell & Custom getting more expensive
    Validation is getting harder with smaller gates and more complex designs, and is not scaling well w/Moore’s Law.
  Licensing of IP is being used to counteract NRE
    “Hard” (layout) and “Soft” (HDL) IP cores
    ARM architecture a great example
Hardware Trends
  FPGAs are getting faster and bigger
    Big enough to implement a lot of designs that used to require Standard Cells
    Lots of built-in IP for connectivity: Ethernet, USB, SATA
  Power is becoming a significant driver
    Moore’s Law scaling survives for logic density but is dying for total power consumption
    More computing devices are battery powered, and batteries are not keeping pace with Moore’s Law
Technology Mapping
  Generally part of synthesis
  Use different tools / components based on standard cells vs. FPGA target
  Divides your circuit into basic building blocks.
Tech Mapping: Standard Cells
  Need to select your library
    Which cells you’re using
    Which macro-cells / specialized structures
  In this class, we’re using:
    TSMC 65/45/40 nm cell libraries
  Tech mapping then implements your netlist in terms of the available cells
    How do you choose?
Tech Mapping: Standard Cells
  Example Boolean equation:
    z = a b c + c d + e
  Example cell library:
    2-input NAND, INV
  Resulting tech-mapped circuit:
[Figure: network of 2-input NAND gates and inverters with inputs a, b, c, d, e and output z.]
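One possible NAND/INV mapping of the example equation, checked exhaustively against the original expression. This is a sketch of a valid mapping, not necessarily the circuit drawn on the slide; the intermediate signal names are made up for illustration.

```python
from itertools import product

def nand(p, q):
    """2-input NAND cell."""
    return 1 - (p & q)

def inv(p):
    """Inverter cell."""
    return 1 - p

def z_mapped(a, b, c, d, e):
    """z = abc + cd + e built only from 2-input NAND and INV cells."""
    ab = inv(nand(a, b))               # a AND b
    n_abc = nand(ab, c)                # NOT(abc)
    n_cd = nand(c, d)                  # NOT(cd)
    neither = inv(nand(n_abc, n_cd))   # NOT(abc) AND NOT(cd)
    return nand(neither, inv(e))       # De Morgan: abc + cd + e

def z_reference(a, b, c, d, e):
    return (a & b & c) | (c & d) | e

mismatches = [bits for bits in product((0, 1), repeat=5)
              if z_mapped(*bits) != z_reference(*bits)]
print("mismatches:", mismatches)
```

This version uses five NAND cells and three inverters; a real mapper would weigh area and delay when choosing among equivalent structures.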
Tech Mapping: FPGAs
  Need to know building blocks of the FPGA
    LUT size (if it uses LUTs)
    Any special resources (multipliers, RAM blocks)
  Tech mapping then implements your netlist in terms of those building blocks
Tech Mapping: FPGAs
  Example Boolean equation:
    z = a b c + c d + e
  Example basic block:
    4-input LUT
  Resulting tech-mapped circuit:
    LUT #1: y = a b c
    LUT #2: z = y + c d + e
[Figure: the gate-level circuit for z with inputs a, b, c, d, e, divided between LUT #1 and LUT #2.]
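The two-LUT mapping can be checked with a small sketch: z has five inputs, so it cannot fit in one 4-input LUT, but y = abc fits in one and z = y + cd + e fits in a second. The helper names and the choice to tie LUT #1's unused input to 0 are assumptions for illustration.

```python
from itertools import product

def make_lut4(func):
    """Program a 4-input LUT: 16 stored bits, one per input pattern."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=4)]

def read_lut4(lut, i3, i2, i1, i0):
    """Read the stored bit addressed by the four inputs (i3 = MSB)."""
    return lut[(i3 << 3) | (i2 << 2) | (i1 << 1) | i0]

# LUT #1 uses three of its four inputs: y = a b c (fourth input ignored).
LUT1 = make_lut4(lambda a, b, c, unused: a & b & c)
# LUT #2 uses all four inputs: z = y + c d + e.
LUT2 = make_lut4(lambda y, c, d, e: y | (c & d) | e)

def z_mapped(a, b, c, d, e):
    y = read_lut4(LUT1, a, b, c, 0)
    return read_lut4(LUT2, y, c, d, e)

ok = all(z_mapped(a, b, c, d, e) == ((a & b & c) | (c & d) | e)
         for a, b, c, d, e in product((0, 1), repeat=5))
print("two-LUT mapping matches z:", ok)
```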
Now What?
  So you’ve:
    Designed your hardware in Verilog.
    Chosen your hardware implementation (std. cells, FPGA, etc.)
  How do you get from a netlist to silicon?
    VLSI CAD (“Physical Design”)
VLSI CAD Flow
[Figure: flow chart. Verified HDL Description → Translation → Generic Netlist → Technology Mapping (using the Cell Library / FPGA Description) → Post-Synthesis Netlist → Partition & Floorplan → Place → Route. For std. cells: Mask Gen. → ...To Fab! For FPGA: Config. Bits → …Program!]
Partitioning & Floorplanning
  Sometimes you have BIG circuits
    Makes placement take a long time
    Yields poor results (too large a solution space)
  Use partitioning and floorplanning
    Partitioning: Divide netlist into partitions
    Floorplanning: Assign partitions to chip regions
    Place regions separately
    Benefit: Small problems are easier to solve well than large ones
  What’s the disadvantage?
Partitioning Example
[Figure: netlist of twelve interconnected cells, A through L.]
  How might we choose to form 3 partitions?
Partitioning Example - Bad
[Figure: the cells A through L grouped into three partitions, with many connections crossing partition boundaries.]
Partitioning
  We want to try to make our partitions as independent as possible.
    Independent = fewer outside connections
  Why?
    Want to keep wires short
    Try to place partitions adjacent to the partitions they interconnect with
    If we have a lot of interconnections, this may not be easy/possible
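"Fewer outside connections" can be measured as the cut size: the number of connections whose endpoints land in different partitions. The sketch below compares two ways of splitting cells A through L into three partitions. The connection list is hypothetical, since the slide's actual wiring is not recoverable from the text.

```python
def cut_size(edges, partition_of):
    """Count connections whose two endpoints sit in different partitions."""
    return sum(1 for u, v in edges if partition_of[u] != partition_of[v])

def assign(groups):
    """Turn a list of cell groups into a cell -> partition-index map."""
    return {cell: idx for idx, group in enumerate(groups) for cell in group}

# Hypothetical two-pin connections among cells A..L (assumed for illustration).
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"),
         ("E", "F"), ("F", "G"), ("G", "H"), ("E", "H"),
         ("I", "J"), ("J", "K"), ("K", "L"), ("I", "L"),
         ("D", "E"), ("H", "I")]

bad = assign(["AEIC", "BFJG", "DHLK"])      # tightly connected cells split apart
better = assign(["ABCD", "EFGH", "IJKL"])   # tightly connected cells kept together

print("bad cut size:   ", cut_size(EDGES, bad))
print("better cut size:", cut_size(EDGES, better))
```

Real partitioners also balance partition sizes, which this sketch takes as given (four cells each).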
Partitioning Example - Better
[Figure: the cells A through L grouped into three partitions, with fewer connections crossing partition boundaries.]
Floorplanning
  OK, so we’ve divided our problem up into partitions
  Now, figure out where partitions should be placed relative to one another
    Assign partitions to regions of the silicon / FPGA
  Try to avoid long wires between partitions
  Don’t want to have to route wires through too many other partitions
    Wastes area in those partitions
Floorplanning Example
[Figure: nine partitions, numbered 1 through 9, with their interconnections.]
Floorplanning Example
  Try to arrange partitions to minimize cross-partition routing
[Figure: the nine partitions assigned to regions of the chip.]
  Eat your heart out, Sudoku.
Placement
  Need to assign physical locations to cells/LUTs
    If partitioning
      Relative to the partition boundaries
    Otherwise
      Relative to the chip boundaries
  Common goal
    Reduce total wirelength of placed circuit
Placement
  Standard Cells:
    Choosing a row for each cell
    Choosing a location within the row for each cell
  FPGAs:
    Choosing which physical LUT implements each netlist LUT
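Placers need a cheap estimate of "total wirelength" before any routing exists. A common estimate, used here as an assumption since the slides do not name one, is the half-perimeter wirelength (HPWL): for each net, the half-perimeter of the bounding box around its cells. The netlist and coordinates below are hypothetical.

```python
def hpwl(net, location):
    """Half-perimeter of the bounding box around one net's cells."""
    xs = [location[cell][0] for cell in net]
    ys = [location[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(nets, location):
    """Sum of the per-net estimates: the quantity a placer tries to reduce."""
    return sum(hpwl(net, location) for net in nets)

# Hypothetical netlist and two candidate placements as (x, y) grid positions.
NETS = [("A", "B"), ("B", "C", "D"), ("A", "D")]
SPREAD = {"A": (0, 0), "B": (3, 3), "C": (0, 3), "D": (3, 0)}
COMPACT = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}

print("spread: ", total_wirelength(NETS, SPREAD))
print("compact:", total_wirelength(NETS, COMPACT))
```

For standard cells the positions would be a row and an offset within the row; for an FPGA they would be the grid coordinates of the physical LUTs.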
Routing
  Have locations for all the cells/LUTs in the netlist
  Now need to connect them together to actually make the circuit
  Different techniques for std. cell vs. FPGA
  Divided into:
    Global
    Detailed (local)
Global Routing
  Find a rough path for each net
  Figure out what areas a signal passes through
Detailed Routing: Std. Cells
  Connect the cells within the global regions
  Common goal: minimize channel width
[Figure: a routing channel, with its channel width marked, between two rows of pins labeled by net number:
  1 2 2 4 4 0 3 0 4
  2 4 4 3 0 0 3 3 1]
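A sketch of how channel width can be bounded from the pin labels. It reads the two rows of numbers as net labels on the pins along the two sides of the channel, with 0 meaning an unconnected pin (an assumption about the figure). The channel density, the largest number of nets whose horizontal spans overlap at one column, is a lower bound on the number of tracks; vertical constraints, which can force more tracks, are ignored here.

```python
def channel_density(row_a, row_b):
    """Max number of nets whose horizontal spans overlap at any column."""
    spans = {}
    for pins in (row_a, row_b):
        for col, net in enumerate(pins):
            if net == 0:              # assumed: 0 marks an unconnected pin
                continue
            lo, hi = spans.get(net, (col, col))
            spans[net] = (min(lo, col), max(hi, col))
    n_cols = max(len(row_a), len(row_b))
    return max(
        # Nets confined to a single column need no horizontal track.
        sum(lo <= col <= hi for lo, hi in spans.values() if lo < hi)
        for col in range(n_cols)
    )

ROW_A = [1, 2, 2, 4, 4, 0, 3, 0, 4]
ROW_B = [2, 4, 4, 3, 0, 0, 3, 3, 1]
print("channel density:", channel_density(ROW_A, ROW_B))
```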
Detailed Routing: FPGAs
  Assign signals in netlist to:
    Wires
    Switchbox points
  Fixed set of available resources
    Can’t “widen” routing channels like Std Cell
  Common goal: Reduce congestion
    Congestion is the ratio of signals:wires
    By keeping areas “open”, more likely to be able to route later signals
Detailed Routing: FPGAs
  Frequently start with an “idealized” routing
    Signals can share wires
  Repeatedly “rip up” and reroute
    One or more nets (signals)
  Stop when no wires are shared
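A toy sketch of the rip-up-and-reroute loop, under strong simplifying assumptions: each net has a short list of precomputed candidate routes (each a tuple of wire names), and rerouting picks the candidate that currently overlaps least with other nets. The wire names, candidate lists, and tie-breaking are all hypothetical; production routers use more elaborate cost functions.

```python
from collections import Counter

def wire_usage(routes):
    """Count how many nets use each wire."""
    usage = Counter()
    for wires in routes.values():
        usage.update(wires)
    return usage

def rip_up_and_reroute(candidates, max_iters=100):
    # Idealized start: every net takes its preferred route; sharing is allowed.
    routes = {net: options[0] for net, options in candidates.items()}
    for _ in range(max_iters):
        shared = {w for w, count in wire_usage(routes).items() if count > 1}
        if not shared:
            return routes             # legal: no wire carries two signals
        # Rip up one net that uses a shared wire...
        net = next(n for n in sorted(routes) if shared & set(routes[n]))
        del routes[net]
        # ...and reroute it along the candidate that overlaps the least.
        usage = wire_usage(routes)
        routes[net] = min(candidates[net],
                          key=lambda option: sum(usage[w] for w in option))
    return None                       # gave up: still congested

CANDIDATES = {
    "n1": [("w1", "w2"), ("w3", "w4")],
    "n2": [("w2",), ("w5",)],
    "n3": [("w1",), ("w3",)],
}
result = rip_up_and_reroute(CANDIDATES)
print(result)
```

The iteration cap matters: with a fixed set of wires there may be no legal routing at all, in which case the placement has to change.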
Final Steps: Std. Cells
  Generate “masks” for each layer, indicating where the material in that layer goes
    Have cell locations, cell library has cell “design”
    Plus metal layers created during the routing phase
  Send to chip fabrication foundry
Final Steps: FPGAs
  Generate the “configuration bitstream”
    The series of 1’s and 0’s that determine the FPGA’s function
  Tools determine these values based on:
    LUT contents
    Routing resource usage
  Load the configuration onto the FPGA
    Also called “programming” or “configuring”
Conclusion
  Synthesis isn’t the end of the process!
    Many steps after it
  Choose target implementation
    Examine cost/performance tradeoffs
  Use CAD tools to implement synthesized circuit on FPGA or std. cells
    Optionally partition & floorplan
    Place & Route
    Generate bitstream or layout masks
  See ECE 556 for more details on CAD algorithms
3.125 Gb/s Transceiver
Xilinx Digital Clock Manager (DCM)
  Eliminate clock skew using Delay-Locked Loop (DLL)
    Monitors clock skew on output and corrects
  Frequency doubling, multiphase clocks
  Fractional Digital Frequency Synthesizer (DFS) - f_OUT = (M/N) f_IN
Input/Output Block (IOB)
  Slew rate and drive strength control
  Pull-up, pull-down and keeper
  DDR signals
  Controlled-Z input/output
  Boundary scan