A Method for Fast Delay/Area Estimation

EE219b Semester Project, Mike Sheets, May 16, 2000


Page 1: A Method for Fast Delay/Area Estimation

A Method for Fast Delay/Area Estimation

EE219b Semester Project

Mike Sheets

May 16, 2000

Page 2: A Method for Fast Delay/Area Estimation

Overview

Problem statement
Proposed solution
  – Constant delay paradigm
  – Zero-slack algorithm
Implementation
  – Incorporation into SIS
  – Library characterization
  – Results
Conclusions
Future work

Page 3: A Method for Fast Delay/Area Estimation

Problem Statement

Given a boolean network, estimate the area if implemented with particular required time constraints
  – Estimation should be fast and reasonably accurate
Examine how technology independent logic optimization affects the estimation

Page 4: A Method for Fast Delay/Area Estimation

Area/Delay Models

Constant area (traditional) model
  – Composed of discretely sized gates with constant area
  – Mapping involves calculating delay as a function of load
Constant delay model
  – Composed of mathematical functions relating area to size
  – Mapping involves calculating size (area) as a function of load

[Figure: Constant Area Model, gate ND2X1 driving load C_L]
  Area = constant from library
  Size = constant from library
  Delay = d_int + k*C_L

[Figure: Constant Delay Model, gate ND2 driving load C_L]
  Area = A_int + A_slope*size
  Size = k*C_L / (Delay - d_int)
  Delay = constant
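As a minimal sketch of the constant delay model, the size and area of a gate can be computed from its load and its assigned delay. The function name, the load value, and the delay values below are illustrative assumptions; the NAND2 parameters are the ones listed in the estimation library table later in the presentation.

```python
def constant_delay_area(k, d_int, a_int, a_slope, c_load, delay):
    """Return (size, area) for a gate held at a fixed delay.

    Size = k*C_L / (Delay - d_int)
    Area = A_int + A_slope*size
    """
    if delay <= d_int:
        # The gate cannot be faster than its intrinsic delay at any size.
        raise ValueError("assigned delay must exceed the intrinsic delay")
    size = k * c_load / (delay - d_int)
    area = a_int + a_slope * size
    return size, area


# NAND2 parameters from the estimation library table; the load (10.0)
# and the delays are hypothetical values chosen only for illustration.
nand2 = dict(k=0.01220, d_int=0.07256, a_int=9.0, a_slope=18.0)
for delay in (0.30, 0.15, 0.10):
    size, area = constant_delay_area(c_load=10.0, delay=delay, **nand2)
    print(f"delay={delay:.2f}  size={size:.3f}  area={area:.1f}")
```

As the assigned delay approaches d_int, the required size (and so the area) grows without bound, which is why area rises quickly for tight required times.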

Page 5: A Method for Fast Delay/Area Estimation

Zero Slack Algorithm

Given input arrival times {a_i} and output required times {r_k}, assign gate delays as follows:

1. Initialize all internal required/arrival times to "unknown"
2. Select the path(s) with the minimum value of (r_k - a_i)/l_p, where l_p is the length of the path in number of gates
   1. For each node from primary inputs to primary outputs
      1. Calculate all the (a_i, l_i) pairs from all fanin edges
      2. Discard dominated pairs, save the union of the undominated pairs
   2. When all primary outputs are reached, calculate the minimum (r_k - a_i)/l_p
3. Assign the delay of each gate in the selected path(s) to this minimum
4. Update arrival and required times for all fanin and fanout edges of newly assigned delays
5. Repeat steps 2-4 until all gates are assigned delays

[Figure: node n with fanin nodes n1 and n2 (arrival times a1, a2; path lengths l1, l2) and fanout nodes n3 and n4 (required times r3, r4; path lengths l3, l4)]

Pair domination defined:

Pair (a_i, l_i) dominates (a_j, l_j) if a_i ≥ a_j and l_i ≥ l_j

If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is "faster" than necessary.
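The pair-pruning step can be sketched as follows. This is an illustration, not the project's code: it takes dominance to mean a_i >= a_j and l_i >= l_j, an assumption consistent with the dominated path being "faster" than necessary, and it assumes that passing through a gate adds one to the path length. The function names are hypothetical.

```python
def dominates(p, q):
    """True if pair p = (arrival, length) dominates pair q.

    Assumed relation: a_p >= a_q and l_p >= l_q, i.e. p starts no earlier
    and is no shorter, so p leaves no more delay per gate than q.
    """
    return p[0] >= q[0] and p[1] >= q[1]


def prune_dominated(pairs):
    """Keep only the undominated (arrival, length) pairs."""
    unique = set(pairs)
    return sorted(
        p for p in unique
        if not any(q != p and dominates(q, p) for q in unique)
    )


def merge_fanins(fanin_pairs):
    """Union the pairs from all fanin edges of a gate, then prune.

    Each incoming pair has its length increased by one to account for
    the gate itself (an assumption about how lengths are counted).
    """
    merged = {(a, l + 1) for pairs in fanin_pairs for (a, l) in pairs}
    return prune_dominated(merged)


# (1.0, 2) is dominated by (2.0, 3); (3.0, 1) survives because it
# arrives later even though its path is shorter.
print(prune_dominated([(2.0, 3), (1.0, 2), (3.0, 1)]))
```

Pruning keeps the number of pairs stored at each node small, since only pairs that could still define the minimum (r_k - a_i)/l_p need to be carried forward.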

Page 6: A Method for Fast Delay/Area Estimation

Faster Approximation

Select an allowable slack threshold s_thresh (if zero, the algorithm yields the same result as the previous one)

1. Compute the forward level l_j and arrival time a_j of all nodes in the network using a forward trace
2. Compute the reverse level k_j and required time r_j of all nodes in the network using a backward trace
3. Update the delay of every node as d_j = d_j + (r_j - a_j)/(l_j + k_j)
4. While the slack of any node exceeds s_thresh, repeat steps 1-3.
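The iteration above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the project's implementation: levels are taken as the longest gate count from the primary inputs up to and including a node (l_j) and from the node to the primary outputs excluding it (k_j), every gate has at least one fanin, every gate without fanout has a required time, and the gates are given in topological order. The function and argument names are hypothetical.

```python
from collections import defaultdict


def distribute_slack(fanins, arrival, required, s_thresh=1e-6, max_iters=1000):
    """Assign gate delays by repeatedly spreading slack over path levels.

    fanins:   dict gate -> list of fanin names, in topological order
    arrival:  dict primary input -> arrival time
    required: dict primary-output gate -> required time
    """
    order = list(fanins)
    fanouts = defaultdict(list)
    for g in order:
        for f in fanins[g]:
            fanouts[f].append(g)

    # Levels depend only on the structure, so they are computed once here.
    fwd = {}
    for g in order:
        fwd[g] = 1 + max((fwd.get(f, 0) for f in fanins[g]), default=0)
    rev = {}
    for g in reversed(order):
        rev[g] = max((1 + rev[h] for h in fanouts[g]), default=0)

    delay = {g: 0.0 for g in order}
    for _ in range(max_iters):
        # Step 1: forward trace for arrival times.
        a = dict(arrival)
        for g in order:
            a[g] = max(a[f] for f in fanins[g]) + delay[g]
        # Step 2: backward trace for required times.
        r = {}
        for g in reversed(order):
            cands = [r[h] - delay[h] for h in fanouts[g]]
            if g in required:
                cands.append(required[g])
            r[g] = min(cands)
        slack = {g: r[g] - a[g] for g in order}
        # Step 4: stop once no node has slack above the threshold.
        if max(slack.values()) <= s_thresh:
            break
        # Step 3: d_j = d_j + (r_j - a_j) / (l_j + k_j).
        for g in order:
            delay[g] += slack[g] / (fwd[g] + rev[g])
    return delay


# Two gates in series, input arrives at 0, output required at 4.
print(distribute_slack({"g1": ["a"], "g2": ["g1"]}, {"a": 0.0}, {"g2": 4.0}))
```

With these level definitions, l_j + k_j is at least the number of gates on any path through node j, so the increments along a path never add up to more than that path's slack and no negative slack is introduced.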

Page 7: A Method for Fast Delay/Area Estimation

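Incorporation into SIS

[Flow diagram, as far as it can be recovered from the slide]
Inputs
  – Tech. lib., loaded with read_library
  – Est. lib., produced from the Tech. lib. by manual analysis and loaded with read_estim
  – BLIF net., loaded with read_blif
Flow
  – Tech. independent optimization: script.algebraic, script.boolean, etc.
  – Tech. dependent optimization: map (output: Area)
  – Fast delay/area estimation: estimate (output: Area/delay tradeoff curve)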

Page 8: A Method for Fast Delay/Area Estimation

Library Characterization

A commercial standard cell library may have multiple gates that implement the same equation
Each gate in the library has characteristics:
  – Size
  – Delays from all input pins to the output pin for all transitions and several loads
  – Capacitance for all input pins
  – Maximum load
  – Area
We need estimation parameters for each class of gates (i.e. gates with the same equation):
  – Intrinsic gate delay (d_int)
  – Drive factor (k)
  – Area line y-intercept (A_int)
  – Area line slope (A_slope)
  – Input capacitance line y-intercept (c_int)
  – Input capacitance line slope (c_slope)

Page 9: A Method for Fast Delay/Area Estimation

Inverter Characterization (1)

Inverter delay scales linearly with load/size
  – Slope is k
  – Y-intercept is d_int

[Figure: Inverter Delay vs. Load/size. x-axis: Load/size (lunits), 0 to 25; y-axis: Delay (dunits), 0 to 0.4; series: ave(rise, fall), trend]

Page 10: A Method for Fast Delay/Area Estimation

Inverter Characterization (2)

Inverter area scales linearly with size
  – Slope is A_slope
  – Y-intercept is A_int

[Figure: Inverter Area vs. Size. x-axis: Size, 0 to 35; y-axis: Area, 0 to 160; series: Area, Trend; trend line y = 4.0457x + 14.72]
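Both characterization plots amount to fitting a straight line through a handful of library points. A minimal least-squares sketch is shown here; the function name and the sample (size, area) points are hypothetical and were generated from a line, not read from any library. The same routine would apply to (load/size, delay) points to obtain k and d_int.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope*x + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination (nan when all y values are identical).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r2


# Hypothetical inverter points lying on area = 4.2*size + 13.8.
sizes = [1.0, 2.0, 4.0, 8.0]
areas = [18.0, 22.2, 30.6, 47.4]
a_slope, a_int, r2 = fit_line(sizes, areas)
print(f"A_slope={a_slope:.3f}  A_int={a_int:.3f}  R^2={r2:.4f}")
```

A low r2 flags a gate class whose trend line fits poorly, and the two-point minimum reflects that a line cannot be fitted to a class with a single cell.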

Page 11: A Method for Fast Delay/Area Estimation

Characterization Issues

Requires at least two gates per class in the library
Additionally, some gates have poor accuracy (trend lines have poor coefficients of determination)
Further research shows the reason is CMOS implementation (below)
Future work might replace the linear model with a piece-wise linear model for more accuracy

[Figure: two NAND-gate CMOS schematics with inputs A, B and output OUT, one for smaller sizes and one for larger sizes]

Page 12: A Method for Fast Delay/Area Estimation

Estimation Library

These issues are evident in the table
  – OAI31 and OAI32 have A_slope of 0.0, meaning that the two cells in the library had the same area
  – NOR3, NOR4 had poor coefficients of determination
  – Many gates in the library had only one size

Gate     k        d_int    A_slope  A_int
INV      0.01563  0.05932   4.2     13.8
XOR2     0.01535  0.13938  36.0     18.0
XNOR2    0.01505  0.13900  45.0      9.0
NAND2    0.01220  0.07256  18.0      9.0
NAND3    0.01516  0.07105  27.0      9.0
NAND4    0.01466  0.08925  27.0     27.0
NOR2     0.01542  0.06934  18.0      9.0
AOI21    0.01533  0.07637  27.0      9.0
AOI211   0.01627  0.01048  42.0     24.0
AOI221   0.01586  0.11900  36.0     36.0
AOI222   0.01614  0.15788  45.0     45.0
OAI22    0.01524  0.09090  27.0     27.0
OAI211   0.01495  0.10107  27.0     27.0
OAI221   0.01542  0.11121  18.0     45.0
OAI31    0.01594  0.10090   0.0     54.0
OAI32    0.01592  0.10551   0.0     63.0
OAI33    0.01590  0.11499  72.0     27.0

Page 13: A Method for Fast Delay/Area Estimation

Estimation Modes

Sweep mode
  – User specifies a range of required times to sweep (possibly only one) and a step size
  – Estimation starts with the largest required time and steps down until the network fails the zero slack algorithm (i.e. negative slack is encountered)
Binary search mode
  – Used to find the minimum possible required time (period) given infinite area
  – Starts at a user-specified maximum and performs a binary search until a pass limit is reached
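The two modes can be sketched as simple drivers around an estimator. Everything here is illustrative: `estimate_area` stands in for the estimation step and is assumed to return an area, or `None` when negative slack is encountered; the toy estimator and its numbers are made up for the example and do not come from the benchmarks.

```python
def sweep(estimate_area, t_max, t_min, step):
    """Step the required time down from t_max and stop at the first failure.

    Returns a list of (required_time, area) points for a tradeoff curve.
    """
    curve = []
    n = 0
    while True:
        t = t_max - n * step
        if t < t_min - 1e-12:
            break
        area = estimate_area(t)
        if area is None:  # negative slack: the network fails at this time
            break
        curve.append((t, area))
        n += 1
    return curve


def min_required_time(estimate_area, t_max, passes):
    """Binary search for the smallest passing required time.

    Returns None if the user-specified maximum itself fails.
    """
    if estimate_area(t_max) is None:
        return None
    lo, hi = 0.0, t_max  # lo is assumed to fail, hi is known to pass
    for _ in range(passes):
        mid = (lo + hi) / 2.0
        if estimate_area(mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


def toy_estimate(t):
    """Hypothetical estimator: fails at or below 3.0, area grows near it."""
    return 100.0 + 50.0 / (t - 3.0) if t > 3.0 else None


print(sweep(toy_estimate, t_max=6.0, t_min=2.0, step=0.5))
print(min_required_time(toy_estimate, t_max=6.0, passes=30))
```

The pass limit bounds the runtime of the search: each pass halves the interval, so the result is within t_max / 2^passes of the true minimum for a monotone estimator.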

Page 14: A Method for Fast Delay/Area Estimation

Experimentation

Various sized combinational logic benchmarks
  – MCNC c17, c880, c1908, c3540
Various sized sequential logic benchmarks
  – Interpretation of required time is clock period (assuming all flip-flops are clocked synchronously)
  – MCNC s713, s838, s953, s1196, s1238, s1423
Tested four scripts
  – script.none (no optimization), script.algebraic, script.boolean, script.rugged

Page 15: A Method for Fast Delay/Area Estimation

Tradeoff Curves

Sweep mode allows multiple required times (clock periods) to be easily tabulated

[Figure: Area vs. Required Time Tradeoff Curves (c3540). x-axis: Required Time (dunit), 3.5 to 6; y-axis: Area, 22000 to 30000; series: rugged, none, boolean, algebraic]

Page 16: A Method for Fast Delay/Area Estimation

Sensitivity to Optimization Script

When delay is non-critical (i.e. as required time approaches infinity)
  – Area within 20% of no optimization
  – Variation between optimization scripts mostly under 10%

[Figure: Normalized Area vs. Benchmarks. x-axis: Benchmark (s713, s838, s953, s1238, s1196, c1908, c3540, c17, c880); y-axis: Area (normalized to no optimization), 0 to 1.4; series: algebraic, boolean, rugged]

Page 17: A Method for Fast Delay/Area Estimation

Conclusions

Sometimes more optimization yields worse results
As required times become smaller, more paths become critical, requiring larger sizes (area)
  – Area increases quickly before failure
From the benchmarks shown, estimation is relatively insensitive to technology independent optimization with infinite required times

Page 18: A Method for Fast Delay/Area Estimation

Possible Future Work

Accuracy
  – Relate estimated areas to actual areas from a good mapping using the full technology library
  – Use more complex delay equations to handle different rise/fall times
  – Modify the algorithm to handle the case where a primary input cannot drive the required load
Characterization
  – Revise characterization to support piece-wise linear functional forms
  – Automate process so only the actual technology library is required as an input
Mapping
  – Examine how various mapping options affect estimation
  – Use buffered fanout trees (Touati) after sizing gates
Speed
  – Compare speed of total estimation procedure to traditional flow
Power estimation