Title and topic of the lecture - uni-potsdam.de
ASIC Layout
Overview
Design flow
Back-end process
FPGA design process
Conclusions
ASIC Design flow
Source: http://www.ami.ac.uk
What is Backend?
• Physical Design:
1. Floorplanning: Architect’s job
2. Placement: Builder’s job
3. Routing: Electrician’s job
Input for Layout Tools
Libraries:
• Physical Libraries (LEF/OA): cell boundaries, pins, routing rules
• Timing Libraries (*.lib)
Optional Input Files:
• Floorplan File
• IO File
• Scan Definition File
Input:
• Verilog Gate Level Netlist
• Timing Constraint files, for all modes (*.sdc)
Optional Libraries:
• Technology Files (Cap Tables, QRC Tech file)
• SI Libraries (*.cdb)
Import Design Procedure – Global Definition File
File - Import Design
Verilog Netlist File(s)
OA flow: reference and custom libraries of standard cells, IOs, custom blocks, RAMs …
or LEF files (LEF/DEF flow)
Specify MMMC (Multi Mode Multi Corner) view file: links timing libraries, RC corners, and constraints per view
Power/Ground (special) net definitions
CPF: Common Power Format (low-power design/power islands)
Command: source <myfile>.globals
init_design
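The MMMC view file mentioned above ties every analysis view to one constraint mode and one delay corner. The sketch below models that pairing as a plain Python data structure to show how the number of views grows as modes × corners; it is not the tool's file syntax, and all mode, file and corner names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class AnalysisView:
    """One MMMC analysis view: a constraint mode paired with a delay corner."""
    mode: str    # which *.sdc constraint set applies
    corner: str  # which timing library set + RC corner applies

def build_views(modes, delay_corners):
    """Create one analysis view for every mode/corner combination."""
    return [AnalysisView(m, c) for m, c in product(modes, delay_corners)]

# Hypothetical setup: file and corner names are illustrative only.
modes = {"func": "func.sdc", "scan": "scan.sdc"}
delay_corners = {
    "slow": ("slow.lib", "rc_worst"),  # typically used for setup checks
    "fast": ("fast.lib", "rc_best"),   # typically used for hold checks
}

views = build_views(modes, delay_corners)
for v in views:
    lib, rc = delay_corners[v.corner]
    print(f"{v.mode}_{v.corner}: {modes[v.mode]} + {lib} + {rc}")
```

Two modes and two corners already give four views; in practice not every combination has to be active for every check.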
Structure of a Die
• Silicon die is mounted inside a chip package.
• A die consists of a logic core inside a power ring.
• Special power pads are used for the VDD and VSS (Core and Pad).
The Design Implementation Flow
Floorplanning
• Floorplanning is a very important step in layout design.
• Important objectives:
Chip size
Aspect ratio
Placement of basic building blocks
IO placement
• Definition of chip size and aspect ratio along with the placement of its building blocks (memories, hard macros) strongly affects the chip routability and the final performance
• The pads should be placed in a way to meet minimum pitch requirements defined by the packaging methodology
Placement and Routing
• Placement
Defines the position of each cell from the netlist
Placement is performed in the predefined rows
The target is to place connected cells in neighboring positions to reduce the timing penalty
• Routing
Makes the connections between the cells (and IOs)
Metal lines are used for the routing
The objective is to reduce the interconnection length (reducing line capacitance, i.e. interconnection delay)
Global and local routing
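Both goals above (connected cells close together, short interconnect) need a way to score a placement. A common estimate is the half-perimeter wirelength (HPWL) of each net's bounding box; the slides do not state which cost the tools use, so HPWL is an assumption here, and the netlist and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def manhattan(a: Point, b: Point) -> float:
    """Manhattan distance |dx| + |dy| between two pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box around all pins of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wirelength(placement: Dict[str, Point], nets: List[List[str]]) -> float:
    """Estimated wirelength of a placement: sum of HPWL over all nets."""
    return sum(hpwl([placement[cell] for cell in net]) for net in nets)

# Hypothetical netlist: net 1 connects A-B, net 2 connects B-C-D.
nets = [["A", "B"], ["B", "C", "D"]]
spread = {"A": (0, 0), "B": (10, 0), "C": (0, 10), "D": (10, 10)}
compact = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 0)}

print(total_wirelength(spread, nets))   # 30
print(total_wirelength(compact, nets))  # 3
# For a two-pin net, HPWL equals the Manhattan distance between the pins.
print(hpwl([(0, 0), (3, 4)]), manhattan((0, 0), (3, 4)))  # 7 7
```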
Back-end Design decisions
• Core- and pad-limited design
Design size can be defined either by the core size or by the pad size.
In general, the design complexity is defined by the number of gates (reflected in the core area).
However, the pads are disproportionately large; therefore, when there is a large number of them, they can define the chip area (pad-limited design).
• The opposite case is a core-limited design, where the core area defines the chip size.
• The aspect ratio of the chip has to be chosen such that it does not impair the chip routability and corresponds to the package.
An aspect ratio of 1.0 defines a square chip. This shape is optimal with respect to placement and routing.
• The size of the power rings depends on the estimated power consumption of the chip.
Since the power pads are usually distributed evenly on all four sides of the chip, the maximum current flow through the power rings is ¼ of the total estimated current.
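The ¼ rule above turns directly into a first-cut ring sizing. The sketch assumes a single allowed current per micrometre of ring width (`j_max_ma_per_um`, a hypothetical parameter standing in for the foundry's electromigration rules) and made-up current values.

```python
def ring_current_per_side(total_current_ma: float, sides: int = 4) -> float:
    """Current through one side of the ring, with power pads spread evenly over all sides."""
    return total_current_ma / sides

def min_ring_width_um(total_current_ma: float, j_max_ma_per_um: float) -> float:
    """Narrowest ring that keeps the per-side current within the allowed current density."""
    return ring_current_per_side(total_current_ma) / j_max_ma_per_um

# Hypothetical values: 400 mA estimated core current, 1 mA/um allowed per metal width.
print(ring_current_per_side(400.0))   # 100.0
print(min_ring_width_um(400.0, 1.0))  # 100.0
```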
Placement
• ASIC placement is performed in rows
• Routing can be performed in both
directions – horizontal and vertical
• The chip size strongly depends on the
chosen core (row) utilization. A typical
value of core utilization is 75%.
If the chip contains complex logic requiring
excessive routing, the user should
consider relaxing the core utilization.
If the chip logic is relatively simple, the user may try to increase the utilization value in order to reduce the chip size
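How utilization drives chip size can be shown with a short calculation: core area = cell area / utilization. The sketch assumes aspect ratio = height / width and a hypothetical standard-cell area of 1.2 mm².

```python
import math

def core_dimensions(cell_area_um2: float, utilization: float, aspect_ratio: float = 1.0):
    """Core width/height needed to place the given standard-cell area.

    utilization  = cell area / core (row) area
    aspect_ratio = height / width (assumed convention; 1.0 = square core)
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = width * aspect_ratio
    return width, height

# Hypothetical design with 1.2 mm^2 (1.2e6 um^2) of standard cells.
w75, h75 = core_dimensions(1_200_000, 0.75)
w60, h60 = core_dimensions(1_200_000, 0.60)
print(round(w75), round(h75))  # 1265 1265
print(round(w60), round(h60))  # 1414 1414
```

Relaxing utilization from 75% to 60% grows each side of the square core by about 12%, which is the price paid for extra routing space.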
Objectives of Placement Process
• Performing the placement of each individual cell in the rows
• Reducing the placement distance between connected cells
• Performing high-density placements
• Reducing the timing overhead and power consumption
• Addressing the routing challenges (avoiding routing congestion)
• Timing-driven placement tries to fulfil the timing constraints while performing placement
It is coupled with trial routing and RC extraction to estimate the effects of the placement choices
Placement Algorithms
• Two general types of algorithms:
Constructive placement
Iterative placement improvement
• Constructive placement methods:
Min-cut algorithm, or
eigenvalue method
• Placement typically starts with a constructed solution, followed by iterative improvement
• The min-cut placement method uses successive application of partitioning:
Cut the area into two pieces.
Swap the cells to minimize the cost.
Repeat the process, cutting smaller pieces until all the logic cells are placed.
• The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix
Source: Application-Specific Integrated Circuits - Michael J. S. Smith
(a) Divide the chip into bins using a grid.
(b) Merge all connections to the center of each bin.
(c) Make a cut and swap cells between bins to minimize the cost
(d) Throw out all the edges that are not inside the piece.
(e) Repeat the process, continuing down to the individual bins.
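The min-cut steps above (cut, swap to reduce the cut, discard nets outside the piece, repeat) can be sketched recursively. This is a simplified illustration, not Smith's exact procedure: the swap step is a greedy pairwise improvement rather than a full Kernighan-Lin pass, each final bin holds one cell, and the four-cell netlist is hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Net = Set[str]

def cut_size(left: Set[str], nets: List[Net]) -> int:
    """Number of nets with cells on both sides of the cut line."""
    return sum(1 for n in nets if (n & left) and (n - left))

def bisect(cells: List[str], nets: List[Net]) -> Tuple[List[str], List[str]]:
    """Halve the cells, then swap pairs across the cut while the cut size shrinks."""
    half = len(cells) // 2
    left, right = set(cells[:half]), set(cells[half:])
    improved = True
    while improved:
        improved = False
        best = cut_size(left, nets)
        for a, b in product(sorted(left), sorted(right)):
            trial = (left - {a}) | {b}
            if cut_size(trial, nets) < best:
                left, right = trial, (right - {b}) | {a}
                improved = True
                break  # rescan with the improved partition
    return sorted(left), sorted(right)

def min_cut_place(cells, nets, x0, y0, w, h, vertical=True, out=None):
    """Recursively cut the region (alternating direction) until each bin holds one cell."""
    out = {} if out is None else out
    if not cells:
        return out
    if len(cells) == 1:
        out[cells[0]] = (x0 + w / 2, y0 + h / 2)  # cell sits at the bin centre
        return out

    def inside(piece):
        # Keep only the parts of nets that lie inside this piece (step (d) above).
        s = set(piece)
        return [n & s for n in nets if len(n & s) >= 2]

    a, b = bisect(cells, nets)
    if vertical:  # vertical cut line: left/right halves
        min_cut_place(a, inside(a), x0, y0, w / 2, h, False, out)
        min_cut_place(b, inside(b), x0 + w / 2, y0, w / 2, h, False, out)
    else:         # horizontal cut line: bottom/top halves
        min_cut_place(a, inside(a), x0, y0, w, h / 2, True, out)
        min_cut_place(b, inside(b), x0, y0 + h / 2, w, h / 2, True, out)
    return out

# Hypothetical example: nets A-C and B-D on a 4 x 4 area.
nets = [{"A", "C"}, {"B", "D"}]
print(min_cut_place(["A", "B", "C", "D"], nets, 0, 0, 4, 4))
# {'B': (1.0, 1.0), 'D': (1.0, 3.0), 'A': (3.0, 1.0), 'C': (3.0, 3.0)}
```

The first cut separates {A, B} from {C, D}, which cuts both nets; one swap reduces the cut to zero, so each connected pair ends up in the same half.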
Iterative Placement
• Starting from an initial placement, further improvements are made:
The selection criterion decides which cells should be moved.
The measurement criterion decides whether to move the selected cells.
• Several exchange methods:
pairwise interchange, force-directed interchange, force-directed
relaxation, and force-directed pairwise relaxation.
• All methods are based on selecting a pair of cells to be exchanged.
• First, the cell to be examined is selected; then its exchange with other (e.g. randomly chosen) cells is evaluated based on the cost criterion. The candidates for the pair can be limited to a neighborhood defined by the Manhattan distance
(a) Swapping two cells
(b) Swapping more cells provides better results but is more complex
(c) A one-neighborhood.
(d) A two-neighborhood.
Source: Application-Specific Integrated Circuits - Michael J. S. Smith
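Pairwise interchange with a Manhattan-distance neighborhood, as described above, can be sketched in a few lines. The cost function (sum of half-perimeter wirelengths) and the four-slot example are assumptions for illustration; `radius=1` corresponds to a one-neighborhood.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def hpwl(pins: List[Point]) -> int:
    """Half-perimeter wirelength of one net (bounding box of its pins)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def cost(pos: Dict[str, Point], nets: List[List[str]]) -> int:
    """Total estimated wirelength of the placement."""
    return sum(hpwl([pos[c] for c in net]) for net in nets)

def pairwise_interchange(pos: Dict[str, Point], nets: List[List[str]], radius: int = 1):
    """Swap cells within a Manhattan-distance neighbourhood while the cost decreases."""
    pos = dict(pos)
    improved = True
    while improved:
        improved = False
        for a in sorted(pos):          # selection: examine each cell in turn
            for b in sorted(pos):
                (xa, ya), (xb, yb) = pos[a], pos[b]
                if a == b or abs(xa - xb) + abs(ya - yb) > radius:
                    continue           # same cell or outside the neighbourhood
                before = cost(pos, nets)
                pos[a], pos[b] = pos[b], pos[a]
                if cost(pos, nets) < before:
                    improved = True    # measurement: keep a swap that lowers the cost
                else:
                    pos[a], pos[b] = pos[b], pos[a]  # otherwise undo it
    return pos

# Hypothetical row of four slots; nets connect A-C and B-D.
nets = [["A", "C"], ["B", "D"]]
start = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
result = pairwise_interchange(start, nets, radius=1)
print(cost(start, nets), cost(result, nets))  # 4 2
```

Swapping B and C halves the wirelength; after that no swap within the one-neighborhood helps, so the loop stops in a local optimum, which is the usual limitation of purely greedy interchange.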
Clock synthesis
• A clock network needs to be implemented to drive all sink elements (flip-flops, latches, etc.) from the same source
• The clock network consists of large numbers of buffers, inverters and clock gates
• The objective is to reduce the phase difference between the clock arrivals at the different clock sinks (clock skew)
• An additional goal is to reduce the clock latency (which depends on the clock tree complexity and the interconnection delay)
• The clock is a significant source of power consumption, so reducing it is another objective
In modern designs ~50%
• Many sinks also use the falling edge of the clock
An important objective is balancing the rise and fall times.
• The clock tree is defined in a clock tree definition file
Clock trees
• A clock tree is the network of paths from the clock source to the clock sinks
Figure source: vlsi.pro
Concept of Clock Tree
[Figure: clock pad driving the clock tree and its sub-trees]
Clock Skew
• Clock skew is the maximum difference in the arrival time of a clock signal
at two different sinks (flip-flops, latches etc).
• Clock skew can lead to a performance drop or to the need to fix hold-time violations (by adding buffers), which results in additional power and area
• Clock skew should be minimized
Figure source: vlsi.pro
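The skew definition above (maximum difference in clock arrival time between sinks) is a one-line computation once the arrival times are known. The sink names and arrival times below are hypothetical; latency is taken here as the longest source-to-sink delay.

```python
from typing import Dict

def clock_skew(arrival_ps: Dict[str, float]) -> float:
    """Skew: latest minus earliest clock arrival over all sinks."""
    return max(arrival_ps.values()) - min(arrival_ps.values())

def clock_latency(arrival_ps: Dict[str, float]) -> float:
    """Worst-case latency: the longest source-to-sink delay."""
    return max(arrival_ps.values())

# Hypothetical arrival times (ps) from the clock source to each sink.
arrivals = {"ff1": 412.0, "ff2": 398.0, "ff3": 405.0, "latch1": 420.0}
print(clock_skew(arrivals))     # 22.0
print(clock_latency(arrivals))  # 420.0
```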
Clock Gating and CTS
• Clock gating is a common technique for reducing the power consumption
The clock network uses ~50% of the power budget
By switching off the network when it is not needed, the consumption can be dramatically reduced
• Clock gating needs to be taken into consideration during CTS (clock tree synthesis)
Clock gates are part of the clock tree and contribute to the skew
Clock tree balancing is required between non-gated and gated subtrees
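Why gating saves so much follows from the dynamic power formula P = α · C · V² · f: a gated subtree only toggles while its enable is active, which scales its effective activity. The capacitances, supply, frequency and enable fraction below are hypothetical, the clock is treated as having activity 1 (conventions for α vary), and the power of the clock-gate cells themselves is ignored.

```python
def dynamic_power(c_farad: float, vdd: float, f_hz: float, activity: float = 1.0) -> float:
    """Switching power P = activity * C * Vdd^2 * f."""
    return activity * c_farad * vdd ** 2 * f_hz

def clock_power(c_ungated: float, c_gated: float, vdd: float, f_hz: float,
                enable_fraction: float) -> float:
    """Clock-tree power when the gated part only toggles while its enable is active."""
    return (dynamic_power(c_ungated, vdd, f_hz)
            + dynamic_power(c_gated, vdd, f_hz, activity=enable_fraction))

# Hypothetical tree: 200 pF total, 80 % behind clock gates, enabled 25 % of the time.
vdd, f = 1.0, 500e6
no_gating = clock_power(200e-12, 0.0, vdd, f, 1.0)
with_gating = clock_power(40e-12, 160e-12, vdd, f, 0.25)
print(f"{no_gating:.3f} W -> {with_gating:.3f} W")  # 0.100 W -> 0.040 W
```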
Routing
• The goal of routing is to minimize the interconnect delay
Routing is performed automatically using the different available metal layers
Design rules need to be fulfilled (minimum spacing, etc.)
Different types of routing (trial routing, clock routing, final routing) are used depending on the design phase
Global routing – first phase of the final routing, connecting blocks
Detailed routing – final routing of all interblock connections
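The minimum-spacing rule mentioned above is the simplest kind of design rule. The toy check below compares every pair of same-layer rectangles against one minimum spacing; real DRC decks contain many more rule types, and the shapes, layer names and 100 nm limit are hypothetical. Touching or overlapping shapes are treated as connected, not as a spacing violation.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

@dataclass(frozen=True)
class Rect:
    """Axis-aligned metal shape; coordinates in nm (hypothetical units)."""
    name: str
    layer: str
    x1: int
    y1: int
    x2: int
    y2: int

def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(0, b.x1 - a.x2, a.x1 - b.x2)
    dy = max(0, b.y1 - a.y2, a.y1 - b.y2)
    return math.hypot(dx, dy)

def check_min_spacing(shapes: List[Rect], min_space: float) -> List[Tuple[str, str, float]]:
    """Report same-layer shape pairs that are closer than the minimum spacing."""
    violations = []
    for a, b in combinations(shapes, 2):
        if a.layer != b.layer:
            continue  # this rule applies within one layer only
        d = spacing(a, b)
        if 0 < d < min_space:  # d == 0: shapes touch, i.e. they are connected
            violations.append((a.name, b.name, d))
    return violations

shapes = [
    Rect("w1", "M1", 0, 0, 1000, 100),
    Rect("w2", "M1", 0, 150, 1000, 250),  # only 50 nm above w1
    Rect("w3", "M1", 0, 500, 1000, 600),
    Rect("w4", "M2", 0, 120, 1000, 220),  # different layer, not compared
]
print(check_min_spacing(shapes, min_space=100))  # [('w1', 'w2', 50.0)]
```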
Manhattan Routing Algorithm
• Inspired by the street grid of Manhattan, New York
Straight connections in the horizontal and vertical directions
Each metal layer is used either only for the vertical or only for the horizontal direction
Avoids interconnection problems
Routing channels are defined
• Manhattan distance
Sum of the distances in the X and Y directions: $d = |x_1 - x_2| + |y_1 - y_2|$
• Much more advanced algorithms are available today
[Figure: pins A, B, C and D connected using Metal 1 and Metal 2]
Left-Edge Routing Algorithm
Source: Application-Specific Integrated Circuits - Michael J. S. Smith
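The left-edge algorithm named above assigns the horizontal segments of the nets in a routing channel to tracks. The sketch shows the basic version only, without vertical constraints between nets; the channel and its net spans are hypothetical.

```python
from typing import Dict, List, Tuple

def left_edge(segments: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Assign horizontal net segments (left column, right column) to channel tracks.

    Basic left-edge algorithm without vertical constraints: sort the segments
    by their left edge, then fill one track at a time, taking every segment
    that starts to the right of the previous segment on that track.
    """
    pending = sorted(segments.items(), key=lambda item: item[1][0])
    tracks: List[List[str]] = []
    while pending:
        track, last_right, rest = [], None, []
        for net, (left, right) in pending:
            if last_right is None or left > last_right:
                track.append(net)                   # fits on the current track
                last_right = right
            else:
                rest.append((net, (left, right)))   # try again on the next track
        tracks.append(track)
        pending = rest
    return tracks

# Hypothetical channel: each net spans the columns between its pins.
segments = {"a": (1, 4), "b": (2, 6), "c": (5, 8), "d": (7, 10), "e": (9, 12)}
print(left_edge(segments))  # [['a', 'c', 'e'], ['b', 'd']]
```

At most two segments overlap at any column in this example, so the two tracks found are the minimum possible.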
Verification
• Timing verification
• Power verification
• LVS (layout versus schematic)
• DRC (Design rule check)
------------------------------------------------------------
optDesign Final Non-SI Timing Summary
------------------------------------------------------------
+--------------------+---------+---------+---------+---------+
| Setup mode | all | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
| WNS (ns):| 0.000 | 0.000 | 0.815 | 0.000 |
| TNS (ns):| 0.000 | 0.000 | 0.000 | 0.000 |
| Violating Paths:| 0 | 0 | 0 | 0 |
| All Paths:| 4906 | 3787 | 38 | 1143 |
+--------------------+---------+---------+---------+---------+
+--------------------+---------+---------+---------+---------+
| Hold mode | all | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
| WNS (ns):| 0.003 | 0.003 | 0.009 | 8.622 |
| TNS (ns):| 0.000 | 0.000 | 0.000 | 0.000 |
| Violating Paths:| 0 | 0 | 0 | 0 |
| All Paths:| 4906 | 3787 | 38 | 1143 |
+--------------------+---------+---------+---------+---------+
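The report above summarizes each path group by WNS (worst negative slack), TNS (total negative slack) and the number of violating paths. The sketch below shows one common way to derive these figures from per-path slacks; it is an illustration, not the tool's internal computation, and the slack values are hypothetical.

```python
from typing import List, Tuple

def timing_summary(slacks_ns: List[float]) -> Tuple[float, float, int]:
    """WNS, TNS and violating-path count from per-path slacks (negative = violation)."""
    wns = min(slacks_ns)                         # worst (smallest) slack
    tns = sum(s for s in slacks_ns if s < 0)     # sum of negative slacks only
    violating = sum(1 for s in slacks_ns if s < 0)
    return wns, tns, violating

# Hypothetical setup slacks (ns) of four timing paths.
wns, tns, n_viol = timing_summary([0.12, -0.05, 0.30, -0.20])
print(f"WNS={wns:.3f} TNS={tns:.3f} violating={n_viol}")
# WNS=-0.200 TNS=-0.250 violating=2
```

Under this definition, a group with no violating paths, as in every column of the report above, has TNS = 0.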
Timing Verification in Backend Design
• After synthesis, timing verification is based on the cell delays and an assumed interconnect delay (wireload model)
• After layout, the real interconnect delay can be estimated
• Based on the routing information (length and type of the metal lines between two pins), the parasitics can be calculated
• Two important parameters of the line: R (resistance) and C (capacitance)
• Interconnect delay: $t_d = R \cdot C$
Figure source: Application-Specific Integrated Circuits - Michael J. S. Smith
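Because both R and C of a wire grow linearly with its length, the estimate t_d = R · C grows with the square of the length. The sketch uses hypothetical per-micrometre parasitics for one metal layer; a distributed-RC (Elmore) model gives roughly half the lumped value, but the quadratic dependence on length is the same.

```python
def wire_delay_s(length_um: float, r_ohm_per_um: float, c_ff_per_um: float) -> float:
    """Lumped estimate t_d = R * C, with R and C both proportional to wire length."""
    r_total = r_ohm_per_um * length_um          # resistance in ohms
    c_total = c_ff_per_um * 1e-15 * length_um   # capacitance in farads
    return r_total * c_total                    # delay in seconds

# Hypothetical per-unit parasitics for one metal layer.
r_per_um, c_per_um = 0.5, 0.2
for length in (1000, 2000):
    print(length, "um:", round(wire_delay_s(length, r_per_um, c_per_um) * 1e12, 1), "ps")
# 1000 um: 100.0 ps
# 2000 um: 400.0 ps
```

Doubling the length quadruples the delay, which is why long routes are typically broken up with buffers.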
Power Verification
• Power-related issues are very important in the verification process:
Power consumption
IR drop
Ground bounce
EMI
Substrate noise
Crosstalk
DRC & LVS
• During the Design Rule Check (DRC) verification step, it is verified whether all of the manufacturer’s design rules have been followed
• LVS extracts a schematic (netlist) from the final layout and compares it with the original netlist that was the input for the layout
The expected result is a full match
A mismatch can indicate problems such as shorts, opens, parametric mismatches, etc.
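The short/open detection described above can be illustrated with a toy comparison of connectivity. Real LVS matches devices and nets as graphs without relying on names; this sketch simply assumes the pin names (hypothetical `instance.pin` strings) survive extraction, and compares which pins are connected together.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Netlist = Dict[str, Set[str]]  # net name -> set of "instance.pin" names

def pin_groups(netlist: Netlist) -> Set[FrozenSet[str]]:
    """Connectivity as groups of pins, independent of net names."""
    return {frozenset(pins) for pins in netlist.values()}

def compare(schematic: Netlist, layout: Netlist) -> Tuple[List[FrozenSet[str]], List[FrozenSet[str]]]:
    """Find layout nets that merge several schematic nets (shorts)
    and schematic nets split over several layout nets (opens)."""
    sch, lay = pin_groups(schematic), pin_groups(layout)
    shorts = [g for g in lay - sch if sum(1 for s in sch if s & g) > 1]
    opens = [g for g in sch - lay if sum(1 for lg in lay if lg & g) > 1]
    return shorts, opens

schematic = {"n1": {"U1.Y", "U2.A"}, "n2": {"U2.Y", "U3.A"}, "n3": {"U3.Y", "U4.A"}}
layout = {
    "net_a": {"U1.Y", "U2.A", "U2.Y", "U3.A"},  # n1 and n2 shorted together
    "net_b": {"U3.Y"},                          # n3 broken into two pieces
    "net_c": {"U4.A"},
}
shorts, opens = compare(schematic, layout)
print("shorts:", [sorted(g) for g in shorts])  # shorts: [['U1.Y', 'U2.A', 'U2.Y', 'U3.A']]
print("opens:", [sorted(g) for g in opens])    # opens: [['U3.Y', 'U4.A']]
```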
Full Back-End Flow
Technology and IP setup (libraries, memory/hard macro IP, PDK)
Loading of input data (verilog netlist, constraints)
Floorplanning
Power planning
Placement
Initial verification and IPO
Clock tree insertion
Post-CTS verification and IPO
Routing
Post-Routing Verification and IPO
Timing Closure and ECO (Engineering Change Order)
Power/Voltage verification
DRC
LVS
Design for Manufacturability (metal fill, etc.)
Field-Programmable Gate Arrays (FPGAs)
• FPGAs are prefabricated chips whose functionality can be fully programmed after production
Programming is done by writing into the configuration memory after power-on
Configuration memory: SRAM or Flash
• FPGAs consist of configurable logic blocks (CLBs), which can be individually programmed using programmable LUTs and memory blocks
• The routing (interconnect) between the CLBs is also programmable using configurable routing elements
• FPGAs are in general less power efficient and offer lower performance, but NRE costs are reduced to a minimum
Today’s FPGAs contain specialized blocks (embedded processors, DSPs) that make them more efficient
Basic Architecture
Source figure: Xilinx
Example: Spartan 2
• The basic architecture of an FPGA contains elements that can be fully programmed:
CLBs
Memory
IOs
Interconnect
Clocking
Configurable Logic Block (CLB)
Source figure: Xilinx
Example: Spartan 6
• CLBs enable full functional programmability:
programmable lookup tables (LUTs) for arbitrary combinational functions
selectable/programmable sequential cells and distributed memory functions
multiplexers for interconnecting the required function
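A LUT implements an arbitrary combinational function by storing its truth table: the configuration bits are computed once from the function, and at run time the inputs simply address one of those bits. The sketch uses a hypothetical 4-input function for brevity (Spartan-6 LUTs have six inputs).

```python
from itertools import product
from typing import Callable, List

def configure_lut(func: Callable[..., int], k: int) -> List[int]:
    """Truth table of func over k inputs: the 2**k configuration bits of a k-input LUT."""
    return [int(func(*bits)) for bits in product((0, 1), repeat=k)]

def lut_eval(config: List[int], *inputs: int) -> int:
    """Read out the configuration bit addressed by the inputs (first input = MSB)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config[index]

def f(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical function to map: (a AND b) OR (c XOR d)."""
    return (a & b) | (c ^ d)

config = configure_lut(f, 4)
print(len(config), config)
print(lut_eval(config, 1, 1, 0, 0), lut_eval(config, 0, 0, 1, 1))  # 1 0
```

Changing the function only changes the 16 configuration bits, not any hardware, which is what makes the CLB reprogrammable.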
I/O Block
Source figure: Xilinx
Example: Spartan 6
• IO pads in FPGAs are fully reconfigurable
support different IO directions (I, O, IO)
single ended /differential
different interface standards (CMOS, TTL, LVDS)
different power supplies (3.3V, 2.5V, 1.8V, 1.5V, 1.2V)
pull-ups, pull-downs, with and without registering
FPGA Clocking
Example: Spartan 6
• Clocking in FPGAs is also programmable, based on DCMs (Digital Clock Managers) that can be programmed in frequency/phase and aligned with other clock sources
• The clock drivers are routed to all relevant sinks:
CLBs, memory, IOs
Source figure: Xilinx
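Programming a DCM's output frequency amounts to choosing a multiply/divide pair so that f_out = f_in × M / D. The search below assumes M in 2..32 and D in 1..32 as ranges for the Spartan-6 DCM frequency-synthesis output; these ranges are assumptions to be checked against the device's clocking guide, and input/output frequency limits are ignored.

```python
from fractions import Fraction

def best_m_d(f_in_mhz, f_target_mhz, m_range=range(2, 33), d_range=range(1, 33)):
    """Search multiply/divide factors so that f_in * M / D is closest to the target."""
    best = None
    for m in m_range:
        for d in d_range:
            f_out = Fraction(f_in_mhz) * m / d      # exact arithmetic, no rounding
            err = abs(f_out - Fraction(f_target_mhz))
            if best is None or err < best[0]:      # keep the first (smallest M) best fit
                best = (err, m, d, f_out)
    _, m, d, f_out = best
    return m, d, float(f_out)

print(best_m_d(50, 125))  # (5, 2, 125.0)
```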
FPGA Design Flow
Source figure: eet.com
• The design flow corresponds to the ASIC flow, but with a different implementation:
Synthesis – translation of HDL into components of the FPGA
Place – mapping the netlist onto the CLBs of the FPGA
Route – programming
interconnects to execute the
function
FPGA Pros and Cons
Pros
Reducing NRE costs – no mask costs, reduced design costs
Reducing design time – no need to wait for chip samples
Possibility for easy correction – only reprogramming needed
Cons
High unit costs – a single FPGA can cost as much as ~10k€
Higher power consumption
Reduced performance
Today’s FPGAs are much more efficient
Integrating multiple processors, DSPs, interfaces, etc. on chip
Example: Xilinx Zynq UltraScale+
Example of optimized FPGA platform
Multi-core ARM system
implemented on chip
Large memory resources
Advanced connectivity (USB,
PCIe, CAN, SATA, etc)
Real-time support
Combining with programmable
logic
Support for high-speed serial
interfaces
Source figure: Xilinx
Conclusions
• The ASIC design process was analysed here in detail.
• The main steps include synthesis, back-end and timing verification
• During the practical part, we will analyse these steps using software CAD tools
• The FPGA flow is similar to the ASIC flow