titel und thema des vortrages - uni-potsdam.de...drc & lvs • during the verification step...

ASIC Layout

Overview

Design flow

Back-end process

FPGA design process

Conclusions

ASIC Design flow

Source: http://www.ami.ac.uk

What is Backend?

• Physical Design:

1. FloorPlanning : Architect’s job

2. Placement : Builder’s job

3. Routing : Electrician’s job

Input for Layout Tools

Libraries:

• Physical Libraries (LEF/OA)

• Cell boundaries, pins, routing rules

• Timing Libraries (*.lib)

Optional Input Files:

• Floorplan File

• IO File

• Scan Definition File

Input:

• Verilog Gate Level Netlist

• Timing Constraint files, for all modes (*.sdc)

Optional Libraries:

• Technology Files (Cap Tables, QRC Tech file)

• SI Libraries (*.cdb)

Import Design Procedure – Global Definition File

File - Import Design

Verilog Netlist File(s)

OA-Flow Reference, Custom

Libraries of Standard Cells; IOs,

Custom Blocks, Rams …

or LEF files (LEF/DEF Flow)

Specify MMMC (Multi Mode Multi Corner) view file:

links timing libraries, RC corners, and constraints per

view

Power/Ground (Special) net definitions,

CPF: Common Power Format

(Low-Power Design/Power Islands)

Command: source <myfile>.globals

init_design

Structure of a Die

• Silicon die is mounted inside a chip package.

• A die consists of a logic core inside a power ring.

• Special power pads are used for the VDD and VSS (Core and Pad).

The Design Implementation Flow

Floorplanning

• Floorplanning is a very important step in layout design.

• Important objectives:

Chip size

Aspect ratio

Placement of basic building blocks

IO placement

• Definition of chip size and aspect ratio along with the placement of its building blocks (memories, hard macros) strongly affects the chip routability and the final performance

• The pads should be placed in a way to meet minimum pitch requirements defined by the packaging methodology

Placement and Routing

• Placement

• Defines the position of each cell from the netlist

• Placement is performed in the defined rows

• The target is to place connected cells in neighboring positions to reduce the timing penalty

• Routing

• Creates the connections between the cells (and IOs)

• Metal lines are used for the routing

• The objective is to reduce the interconnect length (reducing line capacitance, i.e. interconnect delay)

• Global and local routing

Back-end Design decisions

• Core- and pad-limited design

The design size can be defined either by the core size or by the pad size.

In general the design complexity is defined by the number of gates (reflected in the core area).

However, the pads are disproportionately large, so when there are many of them they can define the chip area (pad-limited design).

• In the opposite case we have a core-limited design.

• The aspect ratio of the chip has to be chosen such that it does not impair chip routability and matches the packaging.

An aspect ratio of 1.0 defines a square chip. This shape is optimal with respect to placement and routing.

• The size of the power rings depends on the estimated power consumption of the chip.

Since the power pads are usually distributed evenly on all four sides of the chip, the maximum current through the power rings is ¼ of the total estimated current.
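The two decisions above can be sketched numerically. This is a rough illustration, not a sizing method from the slides: the pad pitch, the current-density limit `max_ma_per_um` and all numbers are assumed values, and the core side ignores ring and pad height.

```python
def limiting_factor(core_side_um, pads_per_side, pad_pitch_um):
    """Return which constraint sets the die edge, and the resulting edge length."""
    pad_side_um = pads_per_side * pad_pitch_um  # edge length needed by the pads
    if pad_side_um > core_side_um:
        return "pad-limited", pad_side_um
    return "core-limited", core_side_um


def ring_width_um(total_current_ma, max_ma_per_um):
    """Ring width for 1/4 of the total current (power pads on all four sides)."""
    ring_current_ma = total_current_ma / 4.0
    return ring_current_ma / max_ma_per_um


# Hypothetical chip: 1500 um core edge, 25 pads per side at 80 um pitch.
kind, side_um = limiting_factor(1500.0, pads_per_side=25, pad_pitch_um=80.0)
# Hypothetical 200 mA total current and 1 mA/um allowed current density.
width_um = ring_width_um(total_current_ma=200.0, max_ma_per_um=1.0)
print(kind, side_um, width_um)
```

In this example the pads need a longer edge than the core does, so the design is pad-limited.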

Placement

• ASIC placement is performed in rows

• Routing can be performed in both

directions – horizontal and vertical

• The chip size strongly depends on the

chosen core (row) utilization. A typical

value of core utilization is 75%.

If the chip contains complex logic requiring

excessive routing, the user should

consider relaxing the core utilization.

If the chip logic is relatively simple, the

user may try to tighten up utilization value

in order to reduce the chip size
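The link between utilization and core size can be sketched as follows. The function name, the definition of aspect ratio as height divided by width, and the cell area are assumptions made for illustration.

```python
import math


def core_dimensions(cell_area_um2, utilization=0.75, aspect_ratio=1.0):
    """Return (width_um, height_um) of the core; aspect_ratio is height / width."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area_um2 = cell_area_um2 / utilization  # lower utilization -> larger core
    width_um = math.sqrt(core_area_um2 / aspect_ratio)
    return width_um, aspect_ratio * width_um


# Hypothetical design with 750,000 um^2 of standard cells.
typical = core_dimensions(750_000.0)                    # 75% utilization
relaxed = core_dimensions(750_000.0, utilization=0.60)  # routing-heavy logic
print(typical, relaxed)
```

Relaxing the utilization from 75% to 60% leaves more room for routing at the cost of a larger core.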

Objectives of Placement Process

• Performing the placement of each individual cell in the rows

• Reducing the placement distance between the connected cells

• Performing high density placements

• Reducing the timing overhead and power consumption

• Addressing the routing challenges (avoiding routing congestion)

• Timing driven placement tries to fulfil the timing constraints while performing placement

It is connected with the processes of trial routing and RC extraction to estimate the effects of the placement choices

Placement Algorithms

• Two general types of the algorithms:

Constructive placement

Iterative placement improvement.

• Constructive placement method

Min-cut algorithm, or

eigenvalue method

• Placement starts with a constructed solution, followed by iterative improvement

• The min-cut algorithm placement

method uses successive application of

partitioning

Cut the area into two pieces.

Swap the cells to minimize the cost.

Repeat the process, cutting smaller

pieces until all the logic cells are

placed.

• The eigenvalue placement algorithm

uses the cost matrix or weighted

connectivity matrix

Source: Application-Specific Integrated Circuits - Michael J. S. Smith

(a) Divide the chip into bins using a grid.

(b) Merge all connections to the center of each bin.

(c) Make a cut and swap cells between bins to minimize the cost

(d) Throw out all the edges that are not inside the piece.

(e) Repeat the process and continue with the individual bins.
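A single cut-and-swap step of min-cut placement can be sketched as below. This is a simplified greedy pair swap under assumed data structures (nets as sets of cell names), not the exact procedure from the cited book. A full placer would apply it recursively to each half.

```python
from itertools import product


def cut_size(nets, left):
    """Count nets that have cells on both sides of the cut."""
    return sum(1 for net in nets if 0 < len(net & left) < len(net))


def min_cut_bisect(cells, nets):
    """Split cells into two halves, then swap pairs while the cut keeps shrinking."""
    cells = list(cells)
    half = len(cells) // 2
    left, right = set(cells[:half]), set(cells[half:])
    improved = True
    while improved:
        improved = False
        best = cut_size(nets, left)
        for a, b in product(sorted(left), sorted(right)):
            trial = (left - {a}) | {b}
            if cut_size(nets, trial) < best:
                # Accept the first swap that reduces the number of cut nets.
                left, right = trial, (right - {b}) | {a}
                improved = True
                break
    return left, right


# Hypothetical netlist: a-c and b-d are connected, but start on opposite sides.
example_nets = [{"a", "c"}, {"b", "d"}]
left, right = min_cut_bisect(["a", "b", "c", "d"], example_nets)
print(sorted(left), sorted(right), cut_size(example_nets, left))
```

Each accepted swap strictly reduces the cut, so the loop always terminates, although it may stop in a local minimum.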

Iterative Placement

• Further improvements are made starting from an initial placement

Selection criteria decide which cells should be moved.

Measurement criteria decide whether to move the selected cells.

• Several exchange methods

pairwise interchange, force-directed interchange, force-directed relaxation, and force-directed pairwise relaxation.

• All methods are based on selecting a pair of cells to be exchanged.

• First the cell under examination is selected; then its exchange with each of the other candidate cells is evaluated against the cost criteria. The candidate pairs can be limited by their Manhattan distance.

(a) Swapping two cells

(b) Swapping more cells provides better results but is more complex

(c) A one-neighborhood.

(d) A two-neighborhood.
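Pairwise interchange can be sketched as follows, assuming two-pin nets, a total Manhattan wirelength as the cost, and a `max_distance` neighborhood limit. These are all illustrative choices rather than details taken from the slides.

```python
from itertools import combinations


def wirelength(placement, nets):
    """Total Manhattan length over two-pin nets."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total


def pairwise_interchange(placement, nets, max_distance=2):
    """Swap nearby cell pairs whenever the swap lowers the total wirelength."""
    placement = dict(placement)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(placement), 2):
            (xa, ya), (xb, yb) = placement[a], placement[b]
            if abs(xa - xb) + abs(ya - yb) > max_distance:
                continue  # outside the allowed neighborhood
            before = wirelength(placement, nets)
            placement[a], placement[b] = placement[b], placement[a]
            if wirelength(placement, nets) < before:
                improved = True
            else:
                placement[a], placement[b] = placement[b], placement[a]  # undo
    return placement


# Hypothetical row of four cells with nets a-c and b-d.
initial = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
example_nets = [("a", "c"), ("b", "d")]
final = pairwise_interchange(initial, example_nets)
print(wirelength(initial, example_nets), wirelength(final, example_nets))
```

Only swaps that strictly reduce the cost are kept, so the loop terminates, but the result may be a local optimum.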


Clock synthesis

• A clock network needs to be implemented to drive all sink elements (flip-flops, latches, etc.) from the same source line

• The clock network consists of large numbers of buffers, inverters and clock gates

• The objective is to reduce the phase difference between the clock at the different clock sinks (clock skew)

• An additional goal is to reduce the clock latency (which depends on the clock tree complexity and interconnect delay)

• The clock is a significant source of power consumption, so the objective is to reduce it

In modern designs the clock accounts for ~50% of the power

• Many sinks also use the falling edge of the clock

An important objective is balancing the rise and fall times.

• The clock tree is defined in the clock tree definition file

Clock trees

• A path from the clock source to clock sinks

Figure source: vlsi.pro

Concept of Clock Tree

[Figure: a clock pad driving the clock tree and its sub-trees]

Clock Skew

• Clock skew is the maximum difference in the arrival time of a clock signal

at two different sinks (flip-flops, latches etc).

• Clock skew can lead to a performance drop or to the need to fix hold-time violations (by adding buffers), which costs additional power and area

• Clock skew should be minimized
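Using the definition above, skew and latency follow directly from the clock arrival times at the sinks. The sink names and arrival times below are made-up values.

```python
def clock_skew(arrival_ns):
    """Maximum difference in clock arrival time between any two sinks."""
    times = list(arrival_ns.values())
    return max(times) - min(times)


def clock_latency(arrival_ns):
    """Longest delay from the clock source to any sink."""
    return max(arrival_ns.values())


# Hypothetical arrival times (ns) at three sinks.
arrivals = {"ff1": 1.20, "ff2": 1.35, "latch1": 1.28}
skew_ns = clock_skew(arrivals)
latency_ns = clock_latency(arrivals)
print(skew_ns, latency_ns)
```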

Figure source: vlsi.pro

Clock Gating and CTS

• Clock gating is often used as a methodology for reducing the power consumption

The clock network uses ~50% of the power budget

By switching off the network when it is not needed, the consumption can be dramatically reduced

• Clock gating needs to be taken into consideration during CTS (clock tree synthesis)

Clock gates are part of the clock tree and contribute to the skew

Clock tree balancing is required between gated and non-gated subtrees

Routing

• The goal of routing is to minimize the interconnect delay

Routing is performed automatically using the available metal layers

Design rules need to be fulfilled (minimum spacing etc.)

Different types of routing (trial routing, clock routing, final routing) are used depending on the design phase

Global routing – first phase of the final routing, connecting blocks

Detailed routing – final routing of all interblock connections

Manhattan Routing Algorithm

• Motivated by the streets of New York

Straight connections in the horizontal and vertical directions

Specific metal lines only for vertical or only for horizontal direction

Avoiding interconnection problems

Routing channels defined

• Manhattan distance

Sum of the distances in the X and Y directions

• There are now far more advanced algorithms
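A minimal sketch of the Manhattan distance and a two-segment route between two pins follows. Assigning horizontal segments to Metal 1 and vertical segments to Metal 2 is an assumption for illustration; the actual layer directions depend on the technology.

```python
def manhattan_distance(p, q):
    """Sum of the distances in the X and Y directions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def l_shaped_route(p, q):
    """Route p -> q with one horizontal and one vertical segment."""
    corner = (q[0], p[1])
    return [
        {"layer": "metal1", "start": p, "end": corner},  # horizontal (assumed layer)
        {"layer": "metal2", "start": corner, "end": q},  # vertical (assumed layer)
    ]


pin_a, pin_b = (0, 0), (3, 4)
route = l_shaped_route(pin_a, pin_b)
routed_length = sum(manhattan_distance(s["start"], s["end"]) for s in route)
print(manhattan_distance(pin_a, pin_b), routed_length)
```

The routed length equals the Manhattan distance because the route never moves away from the target pin.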

[Figure: Manhattan routing between pins A, B, C and D using Metal 1 and Metal 2]

Left-Edge Routing Algorithm
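The basic left-edge channel routing idea can be sketched as below: sort the horizontal net segments by their left edge and place each one on the first track where it does not overlap. This sketch ignores vertical constraints between nets, and the interval data is hypothetical.

```python
def left_edge(intervals):
    """Assign net segments {name: (left, right)} to tracks; returns nets per track."""
    tracks = []  # each track remembers its rightmost occupied column
    for name, (left, right) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        for track in tracks:
            if left > track["right"]:  # fits after the last segment on this track
                track["nets"].append(name)
                track["right"] = right
                break
        else:
            tracks.append({"right": right, "nets": [name]})  # open a new track
    return [track["nets"] for track in tracks]


# Hypothetical channel with four net segments (column ranges).
segments = {"A": (1, 4), "B": (2, 6), "C": (5, 8), "D": (7, 9)}
assignment = left_edge(segments)
print(assignment)
```

In this example four segments fit on two tracks, which matches the maximum number of segments overlapping at any single column.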


Verification

• Timing verification

• Power verification

• LVS (layout versus schematic)

• DRC (design rule check)

------------------------------------------------------------
     optDesign Final Non-SI Timing Summary
------------------------------------------------------------

+--------------------+---------+---------+---------+---------+
|     Setup mode     |   all   | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
|           WNS (ns):|  0.000  |  0.000  |  0.815  |  0.000  |
|           TNS (ns):|  0.000  |  0.000  |  0.000  |  0.000  |
|    Violating Paths:|    0    |    0    |    0    |    0    |
|          All Paths:|  4906   |  3787   |   38    |  1143   |
+--------------------+---------+---------+---------+---------+

+--------------------+---------+---------+---------+---------+
|     Hold mode      |   all   | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
|           WNS (ns):|  0.003  |  0.003  |  0.009  |  8.622  |
|           TNS (ns):|  0.000  |  0.000  |  0.000  |  0.000  |
|    Violating Paths:|    0    |    0    |    0    |    0    |
|          All Paths:|  4906   |  3787   |   38    |  1143   |
+--------------------+---------+---------+---------+---------+
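The summary fields in such a report can be reproduced from a list of path slacks. The sketch below takes WNS as the worst (smallest) slack and TNS as the sum of all negative slacks. The slack values are invented and the function is not part of any tool.

```python
def timing_summary(slacks_ns):
    """Summarize path slacks: worst slack, total negative slack, violation count."""
    negative = [s for s in slacks_ns if s < 0]
    return {
        "wns": min(slacks_ns),           # worst slack over all paths
        "tns": sum(negative),            # only violating paths contribute
        "violating_paths": len(negative),
        "all_paths": len(slacks_ns),
    }


# Hypothetical slacks (ns) for four timing paths, two of them violating.
summary = timing_summary([0.12, -0.05, 0.30, -0.02])
print(summary)
```

A design with no violating paths has a TNS of zero, as in the report above.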

Timing Verification in Backend Design

• Timing verification after synthesis was possible based on the cell

delay and assumed interconnect delay (wireload model)

• After layout the real interconnect delay can be estimated

• Based on routing information (length, types of metal lines between

two pins) the parasitics can be calculated

• Two important parameters: R (resistance) and C (capacitance) of the line

• Interconnect delay

td = R * C

Figure source: Application-Specific Integrated Circuits - Michael J. S. Smith
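A worked example of td = R * C for a single wire, using the lumped model from the slide. The sheet resistance and capacitance per unit length are assumed values, not data from a real technology.

```python
def wire_delay_ns(length_um, width_um, sheet_res_ohm_sq, cap_ff_per_um):
    """Lumped interconnect delay td = R * C for one straight wire."""
    r_ohm = sheet_res_ohm_sq * length_um / width_um  # number of squares * sheet R
    c_ff = cap_ff_per_um * length_um                 # capacitance grows with length
    return r_ohm * c_ff * 1e-6                       # ohm * fF = 1e-15 s = 1e-6 ns


# Hypothetical wire: 1000 um long, 0.2 um wide, 0.1 ohm/sq, 0.2 fF/um.
td_ns = wire_delay_ns(1000.0, 0.2, 0.1, 0.2)
td_double_ns = wire_delay_ns(2000.0, 0.2, 0.1, 0.2)
print(td_ns, td_double_ns)
```

Both R and C scale with the wire length, so doubling the length quadruples the delay. This is why shortening interconnect is a central routing objective.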

Power Verification

• Power related issues are very important in verification process

Power consumption

IR drop

Ground bounce

EMI

Substrate noise

Crosstalk

DRC & LVS

• The Design Rule Check (DRC) verifies that all manufacturer rules have been followed

• LVS extracts a schematic from the final layout and compares it with the original netlist that was the input for the layout

The expected result is a full match

A mismatch can indicate problems: shorts, opens, parametric mismatch etc.
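The comparison idea behind LVS can be illustrated with a toy connectivity check. It assumes that pin names match between the schematic and the extracted layout, which real LVS tools do not rely on because they compare circuit graphs. All names are hypothetical.

```python
def compare_nets(schematic, layout):
    """Compare {net: set_of_pins} maps; return (shorted net groups, open nets)."""
    pin_to_sch = {pin: net for net, pins in schematic.items() for pin in pins}
    pin_to_lay = {pin: net for net, pins in layout.items() for pin in pins}
    shorts = []
    for pins in layout.values():
        sch_nets = {pin_to_sch[p] for p in pins if p in pin_to_sch}
        if len(sch_nets) > 1:  # one layout net joins several schematic nets
            shorts.append(sorted(sch_nets))
    opens = []
    for sch_net, pins in schematic.items():
        lay_nets = {pin_to_lay.get(p) for p in pins}
        if len(lay_nets) > 1:  # one schematic net is split in the layout
            opens.append(sch_net)
    return sorted(shorts), sorted(opens)


schematic = {"n1": {"U1/Z", "U2/A"}, "n2": {"U2/Z", "U3/A"}, "n3": {"U3/Z", "U4/A"}}
layout = {"x1": {"U1/Z", "U2/A", "U2/Z", "U3/A"}, "x2": {"U3/Z"}, "x3": {"U4/A"}}
shorts, opens = compare_nets(schematic, layout)
print(shorts, opens)
```

Here the layout shorts n1 and n2 together and leaves n3 open. Comparing a netlist with itself reports neither.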

Full Back-End Flow

Technology and IP setup (libraries, memory/hard macro IP, PDK)

Loading of input data (verilog netlist, constraints)

Floorplanning

Power planning

Placement

Initial verification and IPO

Clock tree insertion

Post-CTS verification and IPO

Routing

Post-Routing Verification and IPO

Timing Closure and ECO (Engineering Change Order)

Power/Voltage verification

DRC

LVS

Design for Manufacturability (Metal fillers etc)

Field-Programmable Gate Arrays (FPGAs)

• FPGAs are prefabricated chips whose function can be fully programmed after production

Programming is done by writing into the configuration memory after power-on

The configuration memory is SRAM or Flash

• FPGAs consist of configurable logic blocks (CLBs) which can be individually programmed using programmable LUTs and memory blocks

• Routing (interconnect) between the CLBs is also programmable using configurable routing elements

• FPGAs are in general less power efficient and offer lower performance, but NRE costs are reduced to a minimum

Today FPGAs contain specialized blocks (embedded processors, DSP) which make them more efficient

Basic Architecture

Source figure: Xilinx

Example: Spartan 2

• Basic architecture of FPGA

contains the elements which

can be fully programmed

CLBs

Memory

IOs

Interconnect

Clocking

Configurable Logic Block (CLB)

Source figure : Xilinx

Example: Spartan 6

• CLBs enable full functional programmability

programmable lookup tables (LUTs) for arbitrary combinational functions

selectable/programmable sequential cell for the targeted distributed memory function

multiplexers for interconnecting the correct function
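A LUT can be modelled as a truth table stored in configuration bits: the inputs form an index and the stored bit at that index is the output. The `make_lut` helper and the bit ordering (first input is the least significant index bit) are illustrative assumptions, not a vendor format.

```python
def make_lut(init, num_inputs):
    """Build a LUT function; bit i of init is the output for input pattern i."""
    def lut(*inputs):
        if len(inputs) != num_inputs:
            raise ValueError("wrong number of inputs")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (init >> index) & 1
    return lut


# The same hardware implements different functions, depending only on init.
xor2 = make_lut(0b0110, 2)
and2 = make_lut(0b1000, 2)
print([xor2(a, b) for a in (0, 1) for b in (0, 1)])
print([and2(a, b) for a in (0, 1) for b in (0, 1)])
```

Reprogramming the FPGA changes only the stored bits, which is why a LUT can realize any combinational function of its inputs.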

I/O Block


Example: Spartan 6

• IO pads in FPGAs are fully reconfigurable

support different IO directions (I, O, IO)

single-ended / differential

different interface standards (CMOS, TTL, LVDS)

different power supplies (3.3V, 2.5V, 1.8V, 1.5V, 1.2V)

pullups, pulldowns, with and without registering

FPGA Clocking

Example: Spartan 6

• Clocking in FPGAs is also programmable

based on DCMs which can be programmed

in frequency/phase and aligned with other

clock sources

• Clock driver is routed to all relevant

sinks

CLBs, memory, IOs


FPGA Design Flow

Source figure: eet.com

• Design flow corresponds to the one

for ASIC, but with different

implementation

Synthesis – translation of HDL into

components of FPGA

Place – placing the netlist into

CLBs of FPGA

Route – programming

interconnects to execute the

function

FPGA Pros and Cons

Pros

Reducing NRE costs – no mask costs, reduced design costs

Reducing design time – no need to wait for chip samples

Possibility for easy correction – only reprogramming needed

Cons

High unit costs – a single FPGA can cost as much as ~10k€

Higher power consumption

Reduced performance

Today’s FPGAs are much better optimized

Integrating multiprocessors on chip, DSPs, interfaces etc.

Example: Xilinx Zynq UltraScale+

Example of optimized FPGA platform

Multi-core ARM system

implemented on chip

Large memory resources

Advanced connectivity (USB,

PCIe, CAN, SATA, etc)

Real-time support

Combining with programmable

logic

Support for high-speed serial

interfaces

Source figure: Xilinx

Conclusions

• The ASIC design process was analysed in detail.

• The main steps include synthesis, back-end design and timing verification

• During the practical part we will analyze the steps using CAD software tools

• FPGA flow is similar to ASIC flow
