Post on 26-Mar-2015


interra confidentialSlide: 1

Synthesis in EDA Flow

by: Saikat Bandyopadhyay

© Interra Systems India Pvt Ltd

Content

• Defining Synthesis

• History

• IC Design Flow

• Synthesis Flow

Analysis and Elaboration

Synthesis

Scheduling and Allocation

Optimization

Technology Mapping

• Synthesis Goals and Constraints

• Synthesizing Big Design

• Variations in Synthesis

• Q and A

Defining Synthesis

• Conversion of a high-level hardware description to a gate-level hardware description

• Levels of Hardware Description

Gate level

Data Flow level

RTL level

Behavioural level

Gate Level

• The hardware is described purely in terms of nets connecting the pins of gate instances and ports

• Example: a 2-input mux implemented with gate-level components

module select(out, s, a, b);
  output out;
  input s, a, b;
  wire s_bar, t1, t2;
  INT_NOT  u1 (s_bar, s);      // s_bar = !s
  INT_AND2 u2 (t1, a, s);      // t1 = a & s
  INT_AND2 u3 (t2, b, s_bar);  // t2 = b & s_bar
  INT_OR2  u4 (out, t1, t2);   // out = t1 | t2
endmodule

Data Flow Level

• Gate level + assign statements

• Normally used to represent combinational circuits

• Can represent sequential circuits if used with instances of latches or flip-flops

• Example: computes the absolute value

module abs (out, in);
  output [7:0] out;
  input [7:0] in;
  wire [7:0] twosCIn;
  assign twosCIn = ~in + 1;            // two's complement of the input
  assign out = in[7] ? twosCIn : in;   // negate only when the sign bit is set
endmodule

RTL Level

• Explicit clock and state machine

• Technology independent

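• Fixed architecture

• Synthesizable

• Example: RTL description of a recognizer for overlapping occurrences of the pattern 101

• State diagram (figure): states S0, S1 and S2, with edges labelled input/output: S0→S0 on 0/0, S0→S1 on 1/0, S1→S1 on 1/0, S1→S2 on 0/0, S2→S1 on 1/1, S2→S0 on 0/0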

RTL Level

module recognize101(match, in, ck);
  input in, ck;
  output match;
  reg match;
  reg [1:0] state;

  // There is no reset: an unknown state falls into the default branch.
  always @(posedge ck) begin
    case (state)
      2'b00: begin              // nothing matched yet
        if (in == 1) begin
          state <= 2'b01;
        end
        match <= 1'b0;
      end
      2'b01: begin              // seen "1"
        if (in == 0) begin
          state <= 2'b10;
        end
        match <= 1'b0;
      end
      2'b10: begin              // seen "10"
        if (in == 1) begin
          state <= 2'b01;       // this 1 also starts the next, overlapping match
          match <= 1'b1;
        end else begin
          state <= 2'b00;
          match <= 1'b0;
        end
      end
      default: begin
        state <= 2'b00;
        match <= 1'b0;
      end
    endcase
  end
endmodule
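The state machine above can be checked quickly in software. The following is a minimal Python sketch of the same transitions, assuming the machine starts in state 00 (the Verilog has no reset, so this starting state is an assumption); the function name is hypothetical.

```python
def recognize101(bits):
    """Return the value of match after each clock edge for a sequence of input bits."""
    state = 0  # 0: nothing matched, 1: seen "1", 2: seen "10" (assumed start state)
    out = []
    for b in bits:
        match = 0
        if state == 0:
            if b == 1:
                state = 1
        elif state == 1:
            if b == 0:
                state = 2
        else:
            if b == 1:
                state, match = 1, 1   # overlapping: the last 1 is reused
            else:
                state = 0
        out.append(match)
    return out

print(recognize101([1, 0, 1, 0, 1]))
```

Feeding 1 0 1 0 1 yields a match on the third and on the fifth bit, which is the overlapping behaviour the slide asks for.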

Behavioural Level

• Implicit clock and scheduling of events

• Architecture independent

• Mostly used for modeling only (not synthesizable)

• Can be synthesized with special behavioural synthesis tools.

• Example: the following module computes an integer square root, using the identity $\sum_{i=0}^{n-1} (2i+1) = n^2$

module sqrt(in, out);
  input [7:0] in;
  output [3:0] out;
  reg [3:0] out;
  reg [7:0] tmp, odd;   // tmp must be 8 bits wide to hold the full input

  always @(in) begin
    tmp = in; out = 0; odd = 1;
    while (tmp > 0) begin
      if (tmp >= odd) begin   // subtract successive odd numbers 1, 3, 5, ...
        out = out + 1;
        tmp = tmp - odd;
        odd = odd + 2;
      end else begin
        tmp = 0;              // remainder is below the next odd number: stop
      end
    end
  end
endmodule
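As a sanity check on the sum-of-odd-numbers idea, here is a small Python sketch of the same loop. The function name is hypothetical, and the loop condition is folded into one test, which gives the same result as the module above.

```python
def isqrt_odd(n):
    """Integer square root by subtracting 1, 3, 5, ... until the remainder is too small."""
    out, odd = 0, 1
    while n >= odd:
        n -= odd
        odd += 2
        out += 1
    return out

# Every 8-bit input gives the floor of its square root.
assert all(isqrt_odd(n) ** 2 <= n < (isqrt_odd(n) + 1) ** 2 for n in range(256))
```

The largest 8-bit input, 255, gives 15, so the 4-bit output of the module is wide enough.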

History of Synthesis

• Initial IC designs were hand-made at mask level

Polygon-pushing tools (for example Calma®) were used for design.

Simulation was done at this level by simulators like HiLo®.

• Next, tools were developed for automatic generation of operators

Generators produced operators from parameters such as input/output width and architecture (e.g. a 16-bit carry look-ahead adder).

The operators were connected by hand.

• Later, schematic entry tools came to market.

Gates or operators could be drawn and connected schematically.

Automatic tools would generate the mask from the schematic.

Mentor Graphics Idea Station® had integrated schematic entry and simulation.

History of Synthesis(cont)

• Next came high-level hardware description languages

Gateway Design Automation came up with the Verilog language.

Verilog was essentially developed to model the behavior of electronic circuits for simulation, not for synthesis.

Gateway developed the Verilog simulator now called Verilog-XL.

• From High Level Description to Gate Level

Synopsys was earlier called Optimal Solutions, Inc. It specialized in gate-level logic optimization.

Synthesis happened as an afterthought. Since this modeling language (Verilog) was available, Synopsys engineers tried to convert various high-level Verilog constructs into gate level wherever possible.

Synthesis as we know it today was born.

IC Design Flow

• Develop and verify the algorithm (C, MATLAB, etc.)

• Hand-convert it to an RTL hardware description

• Verify the RTL design by simulation.

• Power and timing estimation tools can also be used at RTL.

• Synthesis tools are used to convert the description to gate level.

• Simulation or formal verification is done to verify functionality

• Design Flow (figure): Algorithm in C or MATLAB (execute and verify the algorithm) → RTL Description (simulate to verify functionality; estimate timing and power) → Synthesis, which also takes the Tech Library and Constraints → Gate Description (verify timing and power; verify functionality with simulation or formal verification)

IC Design Flow (cont)

• A placement tool is now used to assign a place (x, y coordinates) to each gate

• Timing verification is done with a better estimate of wire delays

• A routing tool assigns locations for the nets that connect the gate instances.

• Timing verification is done again with further refined wire delays

• The mask is used to prepare the IC

• Design Flow (figure): Gate Description → Placement, which also takes the Floor Plan and Physical Library → Placed Gates (verify timing; verify and correct placement rules) → Routing → Mask (GDSII) (verify timing; verify and correct mask rules) → To IC foundry

Synthesis Flow

Translates an RTL design description in HDL into a gate-level netlist

Only a synthesizable subset of the HDL is supported in the description

Steps in the synthesis flow (figure): RTL Description → Analysis → Elaboration → CDFG Generation → CDFG Traversal → DFA → Allocation → Macro Generation → Optimization → Technology Mapping → Writing Netlist → Gate Level Description

Synthesis Flow (analysis)

• Analysis

Input: design description in HDL (Verilog/VHDL file)

Output: analyzed design units in an intermediate form, either in memory or on disk

Functionality:

• Performs syntax and semantic checks on the design description

• Creates a data structure in a language-dependent form (object model)

module my_mod(z, a, b, c);
  input [1:0] a, b, c;
  output [1:0] z;
  reg [1:0] z;
  always @(a or b or c) z = a + b - c;
endmodule

Object model (figure): module my_mod → ports, always → expr

Synthesis Flow (elaboration)

• Elaboration

Input: analyzed design unit list

Output: elaborated design unit list

Functionality:

• Expand the complete design hierarchy

• Generate a design unit list consisting of distinct design units

• Resolve all parameter values

• Compute all constant expressions

Before elaboration:

module top (o, i1, i2);
  input [7:0] i1, i2;
  output [7:0] o;
  my_mod #(1) u1 (o[1:0], i1[1:0], i2[1:0]);
  my_mod #(3) u2 (o[7:2], i1[7:2], i2[7:2]);
endmodule

module my_mod(z, a, b);
  parameter w = 1;
  input [2*w-1:0] a, b;
  output [2*w-1:0] z;
  assign z = a + b;
endmodule

After elaboration:

module top (o, i1, i2);
  input [7:0] i1, i2;
  output [7:0] o;
  my_mod_1 u1 (o[1:0], i1[1:0], i2[1:0]);
  my_mod_3 u2 (o[7:2], i1[7:2], i2[7:2]);
endmodule

module my_mod_1(z, a, b);
  input [1:0] a, b;
  output [1:0] z;
  assign z = a + b;
endmodule

module my_mod_3(z, a, b);
  input [5:0] a, b;
  output [5:0] z;
  assign z = a + b;
endmodule
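The step that turns one parameterized module into distinct design units can be sketched as follows. This is only an illustration in Python: the function name and the dictionary layout are hypothetical, and the naming scheme simply mirrors my_mod_1 and my_mod_3 from the example.

```python
def elaborate(instances):
    """Create one specialised design unit per distinct (module, parameter) pair."""
    units = {}
    for module, w in instances:
        name = f"{module}_{w}"                           # e.g. my_mod_3
        units[name] = {"module": module, "w": w, "msb": 2 * w - 1}
    return units

# Two instances with w=1 share one unit; the w=3 instance gets its own.
units = elaborate([("my_mod", 1), ("my_mod", 3), ("my_mod", 1)])
print(sorted(units))
```

The constant expression 2*w-1 is evaluated once per unit, which is why my_mod_3 ends up with ports declared as [5:0].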

Synthesis Flow (cdfg)

• Generation of Control and Data Flow Graphs

Input: elaborated language-dependent data structure

Output: language-independent Control and Data Flow Graph (CDFG)

module my_mod(z, a, b, c, m, n);
  input [1:0] a, b, c;
  input m, n;
  output [1:0] z;
  reg [1:0] z;
  reg [1:0] t;
  always @(a or b or c or m or n) begin
    if (m) t = a;
    else if (n) t = b;
    z = t + c;
  end
endmodule

CDFG (figure): START → IF (m) → either t = a, or a nested IF (n) with branches t = b and NOP → ENDIF → ENDIF → z = t + c → END

Synthesis Flow (cdfg)

• A distinct component of the synthesis routine: CDFG generation

• Populates a language-independent representation of the input design as a Control and Data Flow Graph

• The functional flow depends on the input language

• Input: in-memory representation of the entire design created by the analyzer

• Output: language-independent representation of the entire design as a directed graph

• A graph is created for each concurrent block and represents the sequential behaviour of the design

• Each node in the graph is either a control node or a data node

• Each edge in the graph represents either control flow or data flow
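To make the graph concrete, here is a small Python sketch of the control-flow part of a CDFG for the my_mod always block (if m, else if n, then z = t + c). The node names and the adjacency-list layout are assumptions made for illustration, not the tool's data structure.

```python
# Control edges only; data operands are kept in the node labels.
cdfg = {
    "START":   ["IF_m"],
    "IF_m":    ["t=a", "IF_n"],   # then-branch, else-branch
    "t=a":     ["ENDIF_m"],
    "IF_n":    ["t=b", "NOP"],    # NOP stands for the missing else
    "t=b":     ["ENDIF_n"],
    "NOP":     ["ENDIF_n"],
    "ENDIF_n": ["ENDIF_m"],
    "ENDIF_m": ["z=t+c"],
    "z=t+c":   ["END"],
    "END":     [],
}

def paths(graph, node="START"):
    """Enumerate every control path from node to END."""
    if not graph[node]:
        return [[node]]
    return [[node] + rest for nxt in graph[node] for rest in paths(graph, nxt)]

for p in paths(cdfg):
    print(" -> ".join(p))
```

The three paths correspond to m true, m false with n true, and both false; the last one assigns nothing to t.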

Synthesis Flow (dfa)

Data Flow Analysis and Creating Logic with Generic Gates

• Traverse the CDFG created for each concurrent block

• Calculate the driving logic for each assigned object in each path and store it as a logic equation

• Both data logic and control logic are evaluated

• Realize an abstract structure of the input design

Figure: the CDFG of my_mod next to the abstract structure inferred from it: a MUX selecting between a and b, a LATCH holding t, controlled by m and n, and an adder computing z from t and c

Synthesis Flow (dfa)

We analyze the CDFG and store the data in intermediate forms called the path variable array (PVA) and the path variable matrix (PVM)

Path Variable Array (PVA)

• one for each path

• array of lhs-rhs pairs

Example: for a path containing the assignments

p = a + b;
q = ~en;

the PVA is

lhs   rhs
p     a+b
q     ~en

Synthesis Flow (dfa)

Path Variable Matrix (PVM)

• Created each time paths join

• Rows represent the lhs (signals getting assigned)

• Columns are paths

• For each column (path) there is an enabling condition

lhs\cond   m == 1   m == 2   m == 3
p          a        b        a+b
q          NULL     b        NULL
r          m        NULL     n

Synthesis Flow (dfa)

Data Flow Analysis

• Each path consists of path segments; for each path segment, the data and control values are evaluated for each assigned object.

• These values are stored in a PVA (Path Variable Array)

• A special construct, the PVM (Path Variable Matrix), is created out of PVAs to hold the values of the objects in different paths.

• Each column in the PVM represents a particular path and each row represents a particular object. Each entry in the matrix represents the logic value of a particular object in a particular path.

Synthesis Flow (dfa)

Data Flow Analysis (Example)

Figure: the CDFG of my_mod annotated with the PVAs (P1, P11, P12, P121) and PVMs (M1, M2, M3) created during traversal

Synthesis Flow (dfa)

Data Flow Analysis (Example)

• For each sequential block, one root PVA and one root PVM are allocated (P1, M1)

• Starting from each branch node, a new PVA is created for each path segment (P11 and P12)

• When a join node is reached, a new PVM (M2) is created out of the PVAs (P11 and P12)

• This PVM is passed to the allocator for allocating the current data and control logic

• Clock, tristate and hold logic is allocated only from the root PVM (M1)

Synthesis Flow (dfa)

Inferring Logic from PVM

• Each row of the PVM is analyzed and logic is inferred.

• For a row in which all columns have values, a one-hot mux is inferred

• For a row in which some columns are empty, a latch is inferred

• Latch, flip-flop and tristate are allocated from the root PVM: M1

lhs\cond   m    ~m
d          a    b

Inferred logic (figure): a MUX selecting between a and b under control of m, driving d

lhs\cond   m    ~m
d          a    NULL

Inferred logic (figure): a LATCH with data input a and enable m, driving d
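The two inference rules can be written down in a few lines. The sketch below is Python and purely illustrative: a PVM row is modelled as a dictionary from path condition to assigned value, with None standing for NULL, and the function name is hypothetical.

```python
def infer(row):
    """Infer generic logic for one PVM row (condition -> value, None = NULL)."""
    if all(value is not None for value in row.values()):
        return ("MUX", dict(row))            # every path assigns: pure selection
    data = {cond: value for cond, value in row.items() if value is not None}
    return ("LATCH", data)                   # some path keeps the old value

print(infer({"m": "a", "~m": "b"}))     # row with all columns filled
print(infer({"m": "a", "~m": None}))    # row with an empty column
```

For the latch case, the conditions that remain in the result are the ones under which the latch is enabled.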

Synthesis Flow(dfa example)

Let's now infer logic for the CDFG that we created

• The initial PVM just has initial values (NULL)

• At the first join node, PVM M2 is created

lhs\cond   n    ~n
t_1        b    NULL

• Since this row infers a latch, allocation waits until the root PVM (M3)

lhs\cond   m    ~m
t          a    t_1

• Since t_1 is not yet allocated, the PVM is divided into a PVM for data logic and a PVM for hold logic

Synthesis Flow(dfa example)

• PVM for data logic

lhs\cond   m      ~m
t_2        a      b

• PVM for hold logic

lhs\cond   m      ~m
t_2        NULL   ~n

• t_data goes to the data pin, t_hold goes to the hold pin, and the output is t

• Finally, the logic for z is inferred from the root PVM

Figure: a MUX producing t_data from a and b, logic on m and n producing t_hold, and an adder computing z from t and c

Synthesis Flow(dfa example)

Inferred netlist for the CDFG (figure): an RTL_MUX and an RTL_LD latch driven by a, b, m and n, feeding an M_RTL_ADD adder together with c

Synthesis Flow (cont.)

• Allocation and Scheduling

Schedule the clock cycle in which each operation is performed

Allocate an actual hardware resource for each logic operation

Bind the allocated resource to its input and output data

Transform the design into netlist form by instantiating cells/macros and connecting them to achieve the functionality

Synthesis Flow (cont.)

• Allocation and Scheduling

Example of Data Flow Path for scheduling

Trivial Scheduling

• Assumes infinite resources

• All operations in 1 clock cycle

• Large clock cycle

• Latency is 0

Figure: data flow graph with six multiplications, two additions, two subtractions and one comparison, all evaluated within a single clock period

Synthesis Flow (cont.)

• Allocation and Scheduling

ASAP Scheduling

• Each operation takes one clock cycle

• Independent operations are done in parallel

• Operations are done as soon as possible (ASAP)

• Smaller clock period

• Latency is the number of levels

Figure: the same data flow graph scheduled into four time steps, T1 to T4
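ASAP scheduling amounts to placing each operation one step after its latest predecessor. The Python sketch below shows this on a hypothetical dependency graph that has the same operation mix as the figure (six multiplications, two additions, two subtractions, one comparison); the exact edges are an assumption, since the figure's wiring is not recoverable from the transcript.

```python
# Hypothetical graph: operation -> (kind, list of predecessors).
OPS = {
    "m1": ("*", []), "m2": ("*", []), "m3": ("*", []), "m4": ("*", []),
    "a1": ("+", []),
    "m5": ("*", ["m1", "m2"]), "m6": ("*", ["m3"]),
    "a2": ("+", ["m4"]), "c1": ("<", ["a1"]),
    "s1": ("-", ["m5"]), "s2": ("-", ["s1", "m6"]),
}

def asap(ops):
    """Give every operation the earliest time step allowed by its predecessors."""
    step = {}
    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in ops[op][1]), default=0)
        return step[op]
    for op in ops:
        visit(op)
    return step

schedule = asap(OPS)
print(max(schedule.values()))   # latency in clock cycles
```

With unlimited resources the latency equals the depth of the graph, here four steps, at the cost of five operators working in parallel in the first step.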

Synthesis Flow (cont.)

• Allocation and Scheduling

Scheduling under resource constraint

• Resource available

– 1 multiplier

– 1 add/sub

• Small clock(same as ASAP)

• Small area

• Large latency

Figure: the same data flow graph scheduled on one multiplier and one adder/subtractor, spread over seven time steps, T1 to T7
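One common way to schedule under a resource constraint is list scheduling: in every step, each unit takes the ready operation with the longest remaining chain. The Python sketch below uses that heuristic as an illustration (the slide does not say which algorithm the tool uses). The dependency graph is hypothetical, and letting the single add/sub unit also perform the comparison is an assumption.

```python
# Hypothetical graph: operation -> (kind, list of predecessors).
OPS = {
    "m1": ("*", []), "m2": ("*", []), "m3": ("*", []), "m4": ("*", []),
    "a1": ("+", []),
    "m5": ("*", ["m1", "m2"]), "m6": ("*", ["m3"]),
    "a2": ("+", ["m4"]), "c1": ("<", ["a1"]),
    "s1": ("-", ["m5"]), "s2": ("-", ["s1", "m6"]),
}
UNITS = {"*": "mul", "+": "alu", "-": "alu", "<": "alu"}  # one unit of each name

def list_schedule(ops, units):
    succs = {op: [] for op in ops}
    for op, (_, preds) in ops.items():
        for p in preds:
            succs[p].append(op)
    prio = {}
    def depth(op):                       # longest chain from op to a sink
        if op not in prio:
            prio[op] = 1 + max((depth(s) for s in succs[op]), default=0)
        return prio[op]
    for op in ops:
        depth(op)

    step, t = {}, 0
    while len(step) < len(ops):
        t += 1
        busy = set()
        # ready: not yet scheduled and all predecessors finished in an earlier step
        ready = [op for op in ops if op not in step
                 and all(step.get(p, t) < t for p in ops[op][1])]
        for op in sorted(ready, key=lambda o: -prio[o]):
            unit = units[ops[op][0]]
            if unit not in busy:
                busy.add(unit)
                step[op] = t
    return step

schedule = list_schedule(OPS, UNITS)
print(max(schedule.values()))   # latency in clock cycles
```

On this graph the latency grows from four steps to seven, while the hardware shrinks to one multiplier and one add/sub unit.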

Synthesis Flow(cont)

• Macro Generation

Operators in data flow paths, such as adders and multipliers, which are allocated as macros, are built in terms of primitive cells

Input: netlist with macro instances

Output: netlist in terms of primitive instances only

Functionality

• Based on the macro (operator type), input width and input type (signed, unsigned), the appropriate operator generator is called.

• The generator replaces the macro with primitive gates like PRIM_AND, PRIM_XOR.
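As an illustration of an operator generator, the Python sketch below expands an unsigned adder macro of a given width into a ripple-carry chain of primitive gates. The ripple-carry architecture, the net names and the PRIM_OR primitive are assumptions; the slides only name PRIM_AND and PRIM_XOR.

```python
def gen_adder(width):
    """Expand an unsigned adder macro into (gate, output net, input nets) tuples."""
    gates, carry = [], None
    for i in range(width):
        a, b, s = f"a[{i}]", f"b[{i}]", f"s[{i}]"
        if carry is None:                                   # bit 0: half adder
            gates += [("PRIM_XOR", s, (a, b)),
                      ("PRIM_AND", f"c{i}", (a, b))]
        else:                                               # other bits: full adder
            gates += [("PRIM_XOR", f"x{i}", (a, b)),
                      ("PRIM_XOR", s, (f"x{i}", carry)),
                      ("PRIM_AND", f"g{i}", (a, b)),
                      ("PRIM_AND", f"p{i}", (f"x{i}", carry)),
                      ("PRIM_OR", f"c{i}", (f"g{i}", f"p{i}"))]
        carry = f"c{i}"
    return gates

def simulate(gates, values):
    """Evaluate the gates in list order; values maps net name -> 0 or 1."""
    fn = {"PRIM_XOR": lambda x, y: x ^ y,
          "PRIM_AND": lambda x, y: x & y,
          "PRIM_OR": lambda x, y: x | y}
    for kind, out, (x, y) in gates:
        values[out] = fn[kind](values[x], values[y])
    return values

print(len(gen_adder(8)))   # number of primitive gates for an 8-bit adder
```

A different generator, for example carry look-ahead, would produce a different netlist for the same macro; the choice trades area against delay.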

Synthesis Flow (cont.)

• Optimization

Circuit cost, whether area or speed, is optimized.

Optimization in Concorde is mainly done by SIS

Removal of hanging logic, of NOT gates connected in series, of parallel instances, etc. is done by traversing the netlist in Concorde code.

Synthesis Flow(cont)

• Logic Optimization

• Let's discuss the algorithm for one such case (expand)

• Function to optimize:

F_ON = ab'c' + a'b'c' + a'bc' + a'b'c

F_DC (don't care) = abc'

F_OFF can be computed as ab'c + a'bc + abc

• Tabular representation

F_ON               F_OFF
         a b c              a b c
ab'c'    1 0 0     ab'c     1 0 1
a'b'c'   0 0 0     a'bc     0 1 1
a'bc'    0 1 0     abc      1 1 1
a'b'c    0 0 1

• Cube representation of the function (figure): the minterms drawn on the corners of a cube with axes a, b and c

Synthesis Flow(cont)

• Expand Algo

foreach row of F_ON
  foreach column of row
    if (F_ON[row][column] != *)
      F = F_ON
      F[row][column] = *
      if (F ∩ F_OFF == ∅)
        foreach row2 of F
          if (row != row2 && F[row] ∪ F[row2] == F[row])
            erase F[row2]
        F_ON = F

Synthesis Flow(cont)

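• Expand Algo: tabular representation (the slide also shows each step on the cube)

Initial tables

F_ON     F_OFF
1 0 0    1 0 1
0 0 0    1 1 1
0 1 0    0 1 1
0 0 1

Expand a in row 1 0 0: * 0 0 does not intersect F_OFF, so it is accepted and the covered row 0 0 0 is erased

* 0 0
0 1 0
0 0 1

Expand b in row * 0 0: * * 0 is accepted and the covered row 0 1 0 is erased

* * 0
0 0 1

Remaining expansions

* * * (c of * * 0): intersects F_OFF, rejected
* 0 1 (a of 0 0 1): intersects F_OFF, rejected
0 * 1 (b of 0 0 1): intersects F_OFF, rejected
0 0 * (c of 0 0 1): accepted

Final F_ON

* * 0
0 0 *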
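The expand step can be tried out with a short Python sketch. Cubes are written as strings over 0, 1 and *; the function names are hypothetical. Unlike the pseudocode on the previous slide, this version erases covered rows once per row, after all of its columns have been tried, which gives the same cover for this example.

```python
from itertools import product

def minterms(cube):
    """All fully specified rows covered by a cube such as '*0*'."""
    return {"".join(bits)
            for bits in product(*[("01" if ch == "*" else ch) for ch in cube])}

def expand(on, off):
    off_set = set().union(*(minterms(c) for c in off))
    cover, i = list(on), 0
    while i < len(cover):
        cube = cover[i]
        for col in range(len(cube)):
            if cube[col] == "*":
                continue
            trial = cube[:col] + "*" + cube[col + 1:]
            if minterms(trial) & off_set:
                continue                  # would touch the OFF set: keep the literal
            cube = trial
        # erase every other row that the expanded cube now covers
        before = [c for c in cover[:i] if not minterms(c) <= minterms(cube)]
        after = [c for c in cover[i + 1:] if not minterms(c) <= minterms(cube)]
        cover = before + [cube] + after
        i = len(before) + 1
    return cover

print(expand(["100", "000", "010", "001"], ["101", "011", "111"]))
```

The don't-care minterm 110 is in neither table, so a cube is free to grow over it; that is what lets 100 expand all the way to **0.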

Synthesis Flow(cont)

• Sequential Optimization

• Several kinds of sequential optimization techniques also exist.

• Let's consider one such optimization (retiming)

• Flip-flop or latch positions are moved along the path to optimize area and speed

Synthesis Flow (cont.)

• Technology Mapping & Optimization

Map the generic synthesized netlist onto the cells of the customer-specific library

Rule Based Mapping

Algorithm Based Mapping

Mapping criteria

• get minimum area

• get minimum delay

Synthesis Flow (cont.)

• Technology Mapping & Optimization

• Let's consider dynamic-programming-based mapping to optimize area

• Library cells are converted to NAND-INV trees based on their logic

• Library cells and their areas (the figure shows the NAND-INV tree of each cell)

Cell    Area
INV     2
NAND    5
AND     6
IOR     5

Synthesis Flow (cont.)

• Technology Mapping & Optimization

• The design is also converted to a NAND-INV tree

• Algorithm

The cost of a cell is its area

The cost of an input pin is 0

The cost of a vertex is the cost of the cell whose pattern matches the pattern at the vertex, plus the vertex costs at the inputs of that match

If multiple cell patterns match the pattern at the vertex, take the cell that results in the minimum vertex cost

Compute the cost for all vertices from inputs to output

Synthesis Flow (cont.)

• Technology Mapping & Optimization

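• Cost of V1 = cost(NAND) = 5

• Cost of V2 = min(cost(INV) + cost(V1), cost(AND)) = min(7, 6) = 6

• Cost of V3 = min(cost(IOR) + cost(V1), cost(NAND) + cost(V2)) = min(10, 11) = 10

• Figure: the input design as a NAND-INV tree with vertices V1, V2 and V3, and its minimum-area implementation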
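The cost recurrence above can be reproduced with a small tree-covering sketch in Python. The tree shape (V1 a NAND, V2 an inverter on V1, V3 a NAND of V2 and another input) is read off the cost formulas, and the IOR pattern, a NAND with one inverted input, is an assumption, since the transcript does not preserve the cell's NAND-INV tree. Input commutativity is ignored for brevity.

```python
AREA = {"INV": 2, "NAND": 5, "AND": 6, "IOR": 5}

# Cell patterns as nested tuples; "?" matches any subtree.
PATTERNS = {
    "INV":  ("INV", "?"),
    "NAND": ("NAND", "?", "?"),
    "AND":  ("INV", ("NAND", "?", "?")),
    "IOR":  ("NAND", ("INV", "?"), "?"),   # assumed shape of the IOR cell
}

def match(pattern, node):
    """Return the subtrees left at the pattern's leaves, or None if no match."""
    if pattern == "?":
        return [node]
    if isinstance(node, str) or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    leaves = []
    for p, n in zip(pattern[1:], node[1:]):
        sub = match(p, n)
        if sub is None:
            return None
        leaves += sub
    return leaves

def cost(node):
    """Minimum area needed to implement the tree rooted at node."""
    if isinstance(node, str):              # primary input pin
        return 0
    best = None
    for cell, pattern in PATTERNS.items():
        leaves = match(pattern, node)
        if leaves is not None:
            total = AREA[cell] + sum(cost(leaf) for leaf in leaves)
            best = total if best is None else min(best, total)
    return best

V1 = ("NAND", "x", "y")
V2 = ("INV", V1)
V3 = ("NAND", V2, "z")
print(cost(V1), cost(V2), cost(V3))
```

Under these assumptions the three costs come out as 5, 6 and 10, matching the slide: at V3 the IOR cell absorbs the inverter, so only V1 remains to be paid for.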

Synthesis Flow (cont.)

• Writing Structural Netlist

Write synthesized netlist in any desired format to output text files

Output netlist is in structural form.

Synthesis Goals and Constraints

• An RTL hardware description can be implemented in many ways, at the macro (architectural) level or at the micro (logic) level

Figure (architectural choices): three alternative adder arrangements that all compute a+b+c

Figure (logic choices): two alternative gate-level circuits over the signals x, y and z

Synthesis Goals and Constraints

• Goals and constraints help the synthesis tool make these choices

• Goals can be to maximize speed or to minimize area or power

• Constraints are more detailed goals

• Constraints at Chip Level

Minimize area for a given clock speed

Maximize speed as long as the design fits into an FPGA of a specific size

• Constraints at Block Level are more complex

Constraints at Block Level

• Input delay specifies the data arrival time at each input separately.

• Output delay specifies the extra delay after the output. The current design must make the output data arrive correspondingly earlier.

• The clock waveform needs to be specified.

• Specific paths can be given specific delays to meet.

Synthesizing Big Design

• Big Designs take too much memory and time to be Synthesized together.

• Divided into blocks(modules) and the blocks are synthesized separately

• Synthesis is done bottom up. Leaf level blocks are synthesized first.

• Constraints need to be computed from the Top, since constraint at each block comes from constraint of the whole chip.

Synthesizing Big Design

• Designers divide the total chip area into area constraint for each block

• The block constraints can be total area or width and height of each block. Pin positions of each block are determined.

• Synthesis tool only takes in the area. The other constraints (width, height, pin positions) are for placement tools

Chip Layout (figure): the chip area divided into blocks B1 to B7

Synthesizing Big Design

• Similarly, designers divide the clock period into timing constraints for each block.

• Say the clock period is 20 ns. For B1, flip-flop to output can be 7 ns; for B2, input to output can be 5 ns; for B3, input to flip-flop can be 8 ns.

Design with Blocks (abstract) (figure): B1 → B2 → B3
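The arithmetic behind such a timing budget is simple enough to check mechanically. The Python sketch below uses a hypothetical helper; it sums the per-block delays along the B1 to B3 path and reports the slack left in the clock period.

```python
def budget_slack(clock_ns, budgets):
    """Slack left after the per-block delays along one path are added up."""
    return clock_ns - sum(budgets.values())

budgets = {"B1": 7, "B2": 5, "B3": 8}   # numbers from the example above
slack = budget_slack(20, budgets)
print(slack)
```

The example budget uses up the whole 20 ns period; a negative result would mean the budgets are infeasible before any block is synthesized.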

Synthesizing Big Designs

• This process of dividing the chip's resources is called budgeting.

• Budgeting is mostly manual, but there are some tools to help with it

• The process is mostly iterative. After synthesis, designers often find blocks that could not meet their constraints. They normally redo the budgeting and synthesize again.

Variations in Synthesis

• Higher Level Synthesis

Input is at higher level than RTL

• Alternate Target Synthesis

Output not at Gate Level

• Timing Driven Synthesis

Higher Level Synthesis

• Behavioural Synthesis

Synthesis is done from the behavioural level

Output is normally RTL

Unlike RTL synthesis (regular synthesis), architecture selection is done by the tool based on constraints

Scheduling is non-trivial. The clock is used to divide the data paths into different time slots

Resources are shared if they are used in different time slots

Higher Level Synthesis

• Protocol Synthesis

Input is in a language specific to describing communication protocols between designs

Output is RTL Description for Synthesis

Sometimes also produces C model for verification

Examples are

• Synopsys’s Protocol Compiler

• Austin Protocol Compiler(APC) of The University of Texas at Austin

• ALFred Protocol Compiler

Higher Level Synthesis

• Example of protocol input in Timed Asynchronous Protocol (TAP)

process pe
const Rp: integer=0; Bq: integer=0; tr: integer=10; qe: address
var sp: integer = 0; sq: array [2] of integer = 0; d, e: integer; initialize: integer = 1
begin
  act sendrqst in 0; initialize := 0
  timeout sendrqst rst.e:=NCR(Bq,2,sq[0],sq[1]); send rqst to qe; act resend in tr;
  rcv rqst from qe d:=DCR(Bq,0,rqst.e); e:= DCR(Bq,1,rqt.e);
  if (sp=d)(sp=e) sp:=e; reply.e:= NCR(Bq,1,sp); log("detected adversary"); fi
  timeout resend if sq[0] = sq[1] rqst.e:=NCR(Bq,2,1,sq[1]); send rqst to qe; act resend in tr; skip; fi
  rcv reply from qe d:= DCR(Rp,0,reply.e); if sq[1] = d sq[0]:=sq[1]; log("detected adversary"); fi
end

Alternate Target Synthesis

• FPGA Synthesis

Special mapping to programmable gates

• e.g. 4-input gates (often called LUTs) that can be programmed to implement any 4-input logic function

Dedicated resources need special care during mapping and cost computation.

• Gates connected through carry-chain wires see different delays from those connected through regular wires that go through switch boxes.

Architecture-specific optimization

Figure: LUTs connected through a switch box

Alternate Target Synthesis

• Physical Synthesis

Directly generates placed gates

Design convergence is guaranteed

• A constraint that is met in synthesis may not be met after placement, and the synthesis then normally has to be redone. Physical synthesis helps to avoid this iteration

Timing Driven Synthesis

• Synthesis is done directly to technology gates.

• Synthesis is done from input towards output (light to dark in the figure)

• Architectures are selected during synthesis based on the delays

Q & A

• Thank you
