t3_p2_lou

Upload: arun-kumar-kolavasi

Post on 05-Apr-2018


  • 8/2/2019 t3_p2_lou


    ASP-DAC'01 Lou Scheffer 2

    Timing Closure Today

    • Timing becomes more accurate as the flow progresses
    • Sometimes an earlier stage thinks timing is OK, but it fails a later stage
    • Need to repeat one or more steps with tighter constraints
    • We have a timing closure problem when this process fails. Symptoms include:
        ◦ Non-convergence
        ◦ Too many iterations
        ◦ Solution achievable, but this flow cannot find it

    [Figure: flow diagram. Design Entry → Synthesis → Timing → Place → Timing → Route → Timing]

    The Timing Closure Problem

    [Figure: bar chart "Performance of Circuit Test 7". X-axis: stage (PKS/WLM, P&R, IPO P&R). Y-axis: frequency in MHz (target 100 MHz), scale 75 to 100. Series: pks, regular. Bar labels: 99, 96, 100, 78, 99, 83.]

    Examples of Problems

    Design   Tech      Cycle time   Worst slack / # misses
                                    Synthesis      Placed
    V2       .18 µm    7.5 ns       -0.5 / 500     -11 / 2000
    P1       .25 µm    8 ns         -0.4 / 100     -97 / 43k
    T1       .18 µm    2.5-10 ns    -0.5 / 2000    -48 / 164k
    V1       .18 µm    7.5 ns       0 / 0          -12 / 15k
    C1       .25 µm    7.5 ns       -1 / 2000      -12 / 38k

    Agenda
    • Traditional design flows
    • Summary of DSM Problems
    • Timing Analysis Overview
    • Timing Correction Overview
    • Approaches to Fixing Timing Closure
    • Experimental Results
    • Summary

    Traditional Design Flows

    [Figure: flow diagram. Design Entry → Synthesis → Timing → Place → Timing → Route → Timing]

    Synthesis steps:
    1. Tech independent optimization
    2. Tech mapping
    3. Rudimentary timing correction

    Logic Synthesis Flow
    • Technology independent optimization
        ◦ General goal: reduce connections, literals, redundancies, area
    • Technology mapping
        ◦ Map logic into technology library
    • Timing correction
        ◦ Find and fix critical timing paths
        ◦ Fix electrical violations (load, slew)

    Traditional Design Flows

    [Figure: flow diagram. Design Entry → Synthesis w/Timing → Place w/Timing → Route → Timing]

    Integrate timing with synthesis and placement

    Synthesis steps:
    1. Tech independent optimization
    2. Tech mapping
    3. Timing correction

    Agenda
    • Traditional design flows
    • Summary of DSM Problems
    • Analysis Methods Overview
    • Correction Methods Overview
    • Approaches to Fixing Timing Closure
    • Experimental Results
    • Summary

    The Wall
    • Logic designers concentrate on logic and timing (as understood by synthesis)
    • Design work done in abstract world of gates and wire load models
    • Throw design "over the wall" when complete
    • Physical designers concentrate on layout and ability to route
    • Effective method for many years

    General CMOS Problems
    • Low drive strengths / low power
        ◦ Capacitance (not intrinsic delay) plays a large role in performance
        ◦ Variability: range between slowest possible and fastest possible
    • Noise affects delay
        ◦ IR drop a big percentage of supply
        ◦ Crosstalk can change delay by a factor of 2

    Additional DSM Problems
    • High density / huge designs
    • Very thin and resistive wires
    • Very high frequencies
        ◦ Inductance becomes more important
    • Smaller voltages
        ◦ IR drop a bigger fraction of signal swing
    • Clock skew and latency
    • Electromigration and noise

    Clock Distribution Problems
    • Most common design approach requires close to zero skew
    • CMOS / DSM problems all affect clocks
    • Distribution problem increasing
        ◦ Number of latches/flip-flops growing significantly
    • Power consumed in clock tree significant
        ◦ II and noise also of concern

    Process Designers are trying to help
    • Many metal layers
    • Different metal pitches
        ◦ Small pitch for local interconnect
        ◦ Big pitch for long, fast wires
    • Copper wires, thick metal to lower R
    • SOI: Silicon On Insulator
    • Low k dielectrics
    • These help but are not enough


    Timing Analysis
    • Give accurate time values on each pin/port of the network
    • Has to deal with design changes in optimization toolbox
    • Static Timing Analysis
        ◦ Simulation far too slow in optimization environment
        ◦ Accuracy is more than enough

    Timing Analysis Requirements
    • Choose a combination of timing analyzer and delay calculator that is appropriate for the level of design
        ◦ give the best accuracy
        ◦ for performance that can be tolerated
    • Timing Analysis / Delay calculation must be able to cope with logic design changes
        ◦ Incremental
        ◦ Highest performance possible
        ◦ Non-linear delay equations

    Timing Analysis Requirements
    • Must handle
        ◦ Difference between rising and falling delays
        ◦ Delay dependent on slew rate
        ◦ Slew and delay dependent on output load
        ◦ Non-linear delay equations

    Late Mode Analysis Definitions
    • Constraints: assertions at the boundaries
        Arrival times: $AT_a$, $AT_b$
        Required arrival time: $RAT_x$
    • Delay from a to x is the longest time it takes to propagate a signal from a to x
    • Slack is required arrival time - arrival time.

    [Figure: example circuit with nets a, b, c, y, and x, annotated with $AT_a$, $AT_b$, $RAT_x$, and the delay $d_{ax}$]

    Example

    [Figure: example circuit with nets a, b, c, y, and x; each gate has delay 1]

    $AT_a = 0$, $AT_b = 1$, $AT_c = 0$, $AT_y = 2$, $AT_x = 3$, $RAT_x = 2$
    $SL_x = 2 - 3 = -1$
    $SL_y = 1 - 2 = -1$
    $SL_a = 0 - 0 = 0$
    $SL_b = 0 - 1 = -1$
    $SL_c = 1 - 0 = 1$

    Early mode analysis
    • Definitions change as follows
        "longest" becomes "shortest"
        slack = arrival - required

    [Figure: example circuit with nets a, b, c, y, and x; each gate has delay 1]

    $AT_a = 0$, $AT_b = 1$, $AT_c = 0$, $AT_y = 1$, $AT_x = 1$, $RAT_x = 2$
    $SL_x = 1 - 2 = -1$
    $SL_y = 1 - 1 = 0$
    $SL_a = 0 - 0 = 0$
    $SL_b = 1 - 0 = 1$
    $SL_c = 0 - 1 = -1$
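The late-mode and early-mode definitions can be checked with a short script. This is only a sketch: it assumes the example circuit has a and b driving a gate with output y, and y and c driving a gate with output x, each gate with delay 1 (a topology inferred from the slack values on the slides, not stated in them). The function name `analyze` and its arguments are hypothetical.

```python
def analyze(arcs, input_at, output_rat, mode="late"):
    """Return the slack of every net for late- or early-mode analysis."""
    fanin, fanout = {}, {}
    for src, dst, delay in arcs:
        fanin.setdefault(dst, []).append((src, delay))
        fanout.setdefault(src, []).append((dst, delay))

    late = mode == "late"
    worst_arrival = max if late else min    # longest vs. shortest path
    worst_required = min if late else max   # tightest requirement wins
    at, rat = dict(input_at), dict(output_rat)

    def arrival(net):
        # Propagate arrival times forward from the asserted inputs.
        if net not in at:
            at[net] = worst_arrival(arrival(s) + d for s, d in fanin[net])
        return at[net]

    def required(net):
        # Propagate required arrival times backward from the outputs.
        if net not in rat:
            rat[net] = worst_required(required(t) - d for t, d in fanout[net])
        return rat[net]

    nets = set(fanin) | set(fanout)
    if late:
        return {n: required(n) - arrival(n) for n in nets}
    return {n: arrival(n) - required(n) for n in nets}


# Assumed topology: (a, b) -> y, (y, c) -> x, unit gate delays.
ARCS = [("a", "y", 1), ("b", "y", 1), ("y", "x", 1), ("c", "x", 1)]
INPUT_AT = {"a": 0, "b": 1, "c": 0}
OUTPUT_RAT = {"x": 2}

late_slack = analyze(ARCS, INPUT_AT, OUTPUT_RAT, "late")
early_slack = analyze(ARCS, INPUT_AT, OUTPUT_RAT, "early")
print("late :", sorted(late_slack.items()))
print("early:", sorted(early_slack.items()))
```

Under these assumptions a negative slack marks a violation in either mode: in late mode the signal arrives too late, in early mode too early.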

    Delay modeling

    [Figure: timing model. Propagation arcs $d_{ax}$ and $d_{bx}$ from inputs a and b to output x; a clocked element with pins cl, d, and o, showing a test arc and a propagation arc]


    Timing Correction
    • Fix electrical violations (slew and load). Takes priority since needed for reliability.
        ◦ Resize cells
        ◦ Buffer nets
        ◦ Copy (clone) cells
    • Fix timing problems
        ◦ Local transforms (bag of tricks)
        ◦ Path-based transforms

    Local Transforms
    • Resize cells
    • Buffer or clone to reduce load on critical nets
    • Decompose large cells
    • Swap connections on commutative pins or among equivalent nets
    • Move critical signals forward
    • Pad early paths
    • Area recovery


    Cloning

    [Figure: plot of delay d versus load (x-axis 0 to 1, y-axis 0 to 0.05) for cells A, B, and C; circuit with nets a, b, d, e, f, g, h and sink loads of 0.2 each, shown before and after cloning]

    Buffering

    [Figure: plot of delay d versus load (x-axis 0 to 1, y-axis 0 to 0.05) for cells A, B, and C; circuit with nets a, b, d, e, f, g, h and sink loads of 0.2 each, shown before and after inserting B buffers]

    Redesign Fan-in Tree

    [Figure: fan-in tree for output e built from gates of delay 1, with input arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Before: Arr(e)=6. After redesign: Arr(e)=5.]
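One way to arrive at the improved tree is a greedy, Huffman-style merge: repeatedly combine the two earliest-arriving signals. The sketch below assumes two-input gates of delay 1 that compute an associative, commutative function (so inputs may be regrouped freely); the function names are hypothetical, and the "before" grouping is only one pairing that reproduces Arr(e)=6.

```python
import heapq


def greedy_tree_arrival(arrivals, gate_delay=1):
    """Arrival time at the root when the two earliest signals are merged first."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        # The gate output is ready one gate delay after its later input.
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


def gate(x, y, gate_delay=1):
    """Arrival time at the output of a single two-input gate."""
    return max(x, y) + gate_delay


arr = {"a": 4, "b": 3, "c": 1, "d": 0}

# Balanced grouping (a, b) and (c, d): the late signal a passes through two gates.
before = gate(gate(arr["a"], arr["b"]), gate(arr["c"], arr["d"]))

# Greedy regrouping: late signals end up close to the root.
after = greedy_tree_arrival(arr.values())

print(before, after)
```

The late input a passes through only one gate in the regrouped tree, which is where the one-unit improvement comes from.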

    Redesign Fan-out Tree

    [Figure: fan-out tree before and after redesign, with delays annotated on each stage. Before: longest path = 5. After: longest path = 4, with one buffer annotated "Slowdown of buffer due to load".]


    Decomposition

    Swap Commutative Pins

    [Figure: gate with inputs a, b, and c, annotated with arrival times and pin delays, shown before and after swapping connections]

    Simple sorting on arrival times and delay works
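The sorting rule can be sketched as follows: connect the latest-arriving signal to the fastest pin, the next latest to the next fastest, and so on. The arrival times, pin names, and pin delays below are hypothetical illustration values, not the numbers from the figure, and the brute-force function exists only to check the result on this small case.

```python
from itertools import permutations


def assign_by_sorting(arrivals, pin_delays):
    """Pair the latest signal with the fastest pin, and so on down the list."""
    signals = sorted(arrivals, key=arrivals.get, reverse=True)
    pins = sorted(pin_delays, key=pin_delays.get)
    return dict(zip(signals, pins))


def output_arrival(assignment, arrivals, pin_delays):
    """Late-mode arrival at the gate output for a signal-to-pin assignment."""
    return max(arrivals[s] + pin_delays[p] for s, p in assignment.items())


def best_by_brute_force(arrivals, pin_delays):
    """Smallest output arrival over every possible assignment."""
    signals = list(arrivals)
    return min(
        output_arrival(dict(zip(signals, perm)), arrivals, pin_delays)
        for perm in permutations(pin_delays)
    )


# Hypothetical values for illustration only.
arrivals = {"a": 0, "b": 1, "c": 2}
pin_delays = {"p1": 1, "p2": 2, "p3": 3}

sorted_assignment = assign_by_sorting(arrivals, pin_delays)
unswapped = {"a": "p1", "b": "p2", "c": "p3"}  # latest signal on slowest pin

print(sorted_assignment)
print(output_arrival(unswapped, arrivals, pin_delays),
      output_arrival(sorted_assignment, arrivals, pin_delays))
```

This only applies when the pins are logically interchangeable, as on the commutative inputs the slide refers to.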

    Move Critical Signals Forward
    • Based on ATPG
        ◦ Linear in circuit size
        ◦ Detects redundancies efficiently
    • Efficiently find wires to be added and removed, based on mandatory assignments.

    [Figure: circuit with signals a, b, c, d, and e, shown before and after the transform]

    Path-based Transforms
    • Path-based resizing
    • Unmap/remap a path or cone
    • Slack stealing
    • Retiming

    Slack Stealing
    • Take advantage of timing behavior of level sensitive registers (latches)

    [Figure: timing diagrams for clocks C1 and C2 on a time axis from 0 to 2, annotated with Slack = 0, Slack = +1, and Slack = -1]

    Retiming

    [Figure: retiming example with labels Delay=3, Delay=2, Forward, and Backward]


    Solutions to Timing Closure
    • Hand / Custom design
    • Improved analysis
    • More sophisticated clock design
    • Carry hierarchical logic design into physical
    • Modify existing flows
    • More physically knowledgeable tools
        ◦ Many variations: combined synthesis/place/route, gain based synthesis, etc.

    Hand/Custom Design
    • Mentioned for completeness
        ◦ Hurts productivity
        ◦ Yields highest performance
    • Can only fix a few things, for example:
        ◦ Can realistically fix timing or crosstalk problems on a few nets
        ◦ Cannot realistically change the size of blocks

    Improved Analysis Helps
    • Plot shows slack by net for two designs
    • A 10% timing delta -> many more bad nets
        ◦ Often the difference between success and failure

    [Figure: histogram. X-axis: slack relative to worst net (ns), -5 to 20. Y-axis: number of nets, 0 to 3500. Series: Series1, Series2.]

    More accurate analysis
    • Crosstalk induced delay
        ◦ Old approach: overestimate coupling C
        ◦ Better: compute nominal timing + xtalk delta
    • Customer example from CadMos
        ◦ Ignore crosstalk completely: 400 MHz
            ▪ Not an acceptable alternative
        ◦ Coupling caps overestimated by 60%: 300 MHz
        ◦ Nominal delays + computed crosstalk: 333 MHz
        ◦ More accurate analysis gains 10% margin
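The 10% figure follows directly from the two frequencies quoted for the customer example. A minimal arithmetic sketch (the helper name `cycle_time_ns` is hypothetical):

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


pessimistic_mhz = 300  # coupling caps overestimated by 60%
accurate_mhz = 333     # nominal delays + computed crosstalk

frequency_gain = accurate_mhz / pessimistic_mhz - 1
period_saving = 1 - cycle_time_ns(accurate_mhz) / cycle_time_ns(pessimistic_mhz)

print(f"frequency gain: {frequency_gain:.1%}")
print(f"cycle time reduction: {period_saving:.1%}")
```

Measured as frequency, the gain is about 11%; measured as cycle time, it is just under 10%. Either reading is consistent with the slide's rounded "10% margin".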

    Increased accuracy helps
    • Global/detailed route correlation
        ◦ Any global route is better than Wire Load Models or Steiner trees, since global routes consider congestion
        ◦ But to get that last 10%, need global/detailed router link
            ▪ Knowing some nets must detour is good, but...
            ▪ Which net takes which detour is needed for good correlation

    Modified clock design
    • Zero skew is not necessary, and maybe not even desirable
    • We have the freedom to adjust clock arrival times at memory elements
        ◦ This obtains more margin and thus helps convergence
    • Similar to retiming but less disruptive
    • Improvement very design dependent
        ◦ If worst path is flip-flop to itself, doesn't help
    • May impact scan chains
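A small numeric sketch shows why adjusting clock arrival times can help, and why it cannot help a flip-flop-to-itself path. It assumes a simplified setup constraint, period >= path delay + clock arrival at the launching register - clock arrival at the capturing register, and ignores setup times and early-mode (hold) checks. All register names and delays are hypothetical.

```python
def min_period(paths, clock_arrival):
    """Smallest clock period that satisfies the simplified setup constraint."""
    return max(
        delay + clock_arrival[src] - clock_arrival[dst]
        for src, dst, delay in paths
    )


# Two registers in a loop: a slow path A -> B and a fast path B -> A.
loop = [("A", "B", 6), ("B", "A", 4)]
zero_skew = {"A": 0, "B": 0}
skewed = {"A": 0, "B": 1}  # clock reaches B one time unit later

print(min_period(loop, zero_skew), min_period(loop, skewed))

# A register feeding itself: launch and capture clocks shift together.
self_loop = [("C", "C", 6)]
print(min_period(self_loop, {"C": 0}), min_period(self_loop, {"C": 3}))
```

Delaying the clock at B lends one time unit from the fast path to the slow one; on a self-loop the launch and capture arrival times cancel, so the period is fixed by the path delay.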

    Hierarchy and Physical Design
    • Logical hierarchy can be carried over into physical design
    • Seems natural top-down approach, using floorplanning as a firm guide to physical design

    Hierarchy and Physical Design: Disadvantages
    • Placement solution bounded
    • Ability to find a routable solution hindered
    • Hierarchy usually logically-based, not physically-based
    • Boundary conditions explode and must be managed carefully to avoid surprises
    • Pin assignment problem for all macros


    Hierarchy Example Plots


    Previous attempts to fix closure
    • Modifications/Additions to existing flows
    • Allow placer to do sizing and buffering
    • Do post placement optimization
        ◦ Simple transformations
        ◦ Use existing placement
    • Do post placement re-synthesis
        ◦ Complex transformations allowed
        ◦ Needs incremental placement and extraction

    Post-Placement Optimization
    • In-place (little or no placement impact)
        ◦ Resizing (carefully)
        ◦ Pin swapping, some tree rebuilding
        ◦ Wire sizing / typing
    • Minimally disruptive
        ◦ Resizing
        ◦ Buffering
        ◦ Cloning
        ◦ Tree rebuilding
        ◦ Cell removal

    In-place Optimization
    • Not too difficult
    • Can use extracted electrical data (C, RC) from placement tool
        ◦ Some changes affect pin locations, but may be ignored
        ◦ Tree rebuilding needs incremental extraction
    • Can use timing reports for timing data
        ◦ But, accuracy suffers as changes are made

    In-place Optimization

    [Figure: flow diagram. Placement & extraction → placed netlist and C/RC data → Optimization (resize, swap pins, rebuild trees) → opt'd netlist]

    Place-disruptive Optimization
    • Nets changing implies
        ◦ Must be able to recompute C and RC
        ◦ May need to incrementally place new cells
        ◦ Need incremental timing capability

    Place-disruptive Optimization

    [Figure: flow diagram. Placement & extraction → placed netlist and C/RC data → Optimization with placer, timer, extractor (resize, buffer, clone, cell removal, rebuild trees) → opt'd netlist]

    Post-Placement Example: Buffering long wires

    Post-Placement Challenges
    • Getting the timing right
        ◦ Different timers used at different stages
        ◦ Do the optimizer and placer see the same worst paths as the static timer?
    • Design size / tool capacity
        ◦ Using synthesis technology on flat designs

    Post-Placement Challenges
    • Incompatible tools, formats
        ◦ Placer, synthesizer, timer may all use different file formats, may all be from different vendors
        ◦ Basic interoperability issues
    • Incremental placer needed for new cells
        ◦ Doesn't have to be smart
        ◦ But might produce some infeasible solutions
        ◦ Must be integrated with optimizer

    Post-Placement Challenges
    • Extraction/Estimation of net data
    • Any optimization which significantly alters net topology needs this ability
        ◦ Insert cells
        ◦ Remove cells
        ◦ Move connections from one cell to another
    • Steiner tree estimation
    • Net C and delay (RC) calculator
    • Do results match other extraction tools?

    Sample Optimization Results

    Design   Tech      Cycle time   Worst slack / # misses
                                    Synthesized    Placed         Opt
    V2       .18 µm    7.5 ns       -0.5 / 500     -11 / 2000     -4 / 1000
    P1       .25 µm    8 ns         -0.4 / 100     -97 / 43k      -13 / 20k
    T1       .18 µm    2.5-10 ns    -0.5 / 2000    -48 / 164k     -6 / 62k
    V1       .18 µm    7.5 ns       0 / 0          -12 / 15k      -0.3 / 100
    C1       .25 µm    7.5 ns       -1 / 2000      -12 / 38k      -2 / 1400

    Root Problem is Wire Load Models
    • Main problem: correlation between Pre-P&R estimates and Post-P&R extraction
    • If correlation is good
        ◦ Problems detected and potentially fixed early
    • If correlation is bad
        ◦ Problems detected late
        ◦ Not a good situation! Needing to rewrite RTL is the worst case for timing closure.

    Why are Wire Load Models Used?
    • Can't complete layout until logic design is complete
    • Can't complete logic design without timing
    • Can't time without load and net delay data
    • Can't extract load and net delay data until layout is complete
    • Can't complete layout ...

    WLM solution: use statistics
    • Don't know specific layout data
    • But we know something about statistical properties
    • Average net load, average net delay
    • Further refine using other characteristics
        ◦ Number of sinks
        ◦ Size of design (number of circuits)
        ◦ Physical size

    Correlation Pre/Post-P&R using averages
    • Wire load models give synthesis an estimate of physical design
    • We can correlate averages pre- and post-P&R as accurately as needed
    • If specific design has average behavior, its timing, on average, can be predicted
    • Otherwise, a pass through placement can provide correct WLM for a design

    Timing and averages
    • WLMs OK for area, power (properties that are sums are well handled by statistics)
    • But, timing dictated by the worst specific path
    • That path is built of individual nets
    • One net can determine the speed of an entire design
    • Reality: poor correlation for relatively few nets can cause major headaches
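The contrast between summed properties and worst-case properties can be made concrete with a toy data set. The net names and capacitance values below are hypothetical (arbitrary units), and the "wire load model" here is simply the average over all nets, which is a deliberate simplification.

```python
from statistics import mean

# Hypothetical extracted wire capacitance per net, arbitrary units.
actual = {"n1": 10.0, "n2": 12.0, "n3": 11.0, "n4": 9.0, "n5": 13.0, "n6": 65.0}

# Simplified wire load model: every net is assumed to have the average load.
wlm_estimate = mean(actual.values())

# A summed property (total capacitance, hence power) is predicted well.
total_error = wlm_estimate * len(actual) - sum(actual.values())

# A worst-case property is not: find the net the model underestimates most.
worst_net = max(actual, key=lambda net: actual[net] - wlm_estimate)
worst_error = actual[worst_net] - wlm_estimate

print(f"WLM estimate per net: {wlm_estimate}")
print(f"error in total: {total_error}")
print(f"worst underestimate: {worst_net} by {worst_error}")
```

If the underestimated net happens to lie on the critical path, the whole design misses timing even though the model is accurate in aggregate.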

    Correlation Pre/Post-P&R: Averages and Wire Loads

    [Figure: histogram "Distribution of C / fan-out". X-axis: pF per fan-out, 0 to 110. Y-axis: number of nets, 0 to 30000. The median and mean are marked.]

    Correlation Pre/Post-P&R: Cwire Data by Logic Design

    [Figure: plot of Cwire versus number of fan-outs]

    Better Wire Load Models
    • How can we use information from one pass through physical design?
    • Adjust wire load model coefficients
    • Back annotate specific net load and delay data to the logic design
    • New problem: correlation of logic pre- and post-synthesis
    • But, there are fundamental limits to statistical models: a new approach is needed.
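Adjusting wire load model coefficients after one pass through physical design can be sketched as a least-squares fit. This assumes, purely for illustration, a linear model in which wire capacitance is `intercept + slope * fanout`; the slides do not specify the model form, and the sample data and function names are hypothetical.

```python
def fit_wlm(samples):
    """Least-squares fit of capacitance = intercept + slope * fanout."""
    n = len(samples)
    mean_f = sum(f for f, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    sxx = sum((f - mean_f) ** 2 for f, _ in samples)
    sxy = sum((f - mean_f) * (c - mean_c) for f, c in samples)
    slope = sxy / sxx
    return mean_c - slope * mean_f, slope


def estimate(fanout, intercept, slope):
    """Wire capacitance predicted by the fitted model."""
    return intercept + slope * fanout


# Hypothetical (fanout, extracted capacitance) pairs from a placement pass.
extracted = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]

intercept, slope = fit_wlm(extracted)
print(intercept, slope, estimate(5, intercept, slope))
```

A refit like this improves the average prediction only; nets that deviate strongly from the fitted line are still mispredicted, which is the limit of statistical models that the slide points to.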

  • 8/2/2019 t3_p2_lou

    71/91

    ASP-DAC'01 Lou Scheffer II-71

    A better approach:

    Combine Synthesis, P & R

    - Don't use wire load models at all

    - Synthesis does a trial placement as it runs

      - Loading found from estimated routes

    - Must include global routing

      - Then, feed global route to detailed router

      - Or, do detailed route itself

    - Much better correlation and timing closure

    - No inter-tool data transfer headaches
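
One way to picture loading found from estimated routes: once cells have trial coordinates, each net's wire length can be estimated from its own pin locations rather than from fan-out alone. The sketch below uses the half-perimeter wirelength (HPWL) of the pin bounding box, a common estimate that is swapped in here for illustration. The slides do not say which estimator the tool uses, and the coordinates and capacitance values are invented:

```python
def hpwl(pins):
    """Half-perimeter wirelength of the bounding box around a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def estimated_load(pins, pin_caps, cap_per_unit_length):
    """Estimated net load: wire capacitance from HPWL plus sink pin capacitance."""
    return hpwl(pins) * cap_per_unit_length + sum(pin_caps)

# Hypothetical trial placement of one driver and three sinks (coordinates in um).
pins = [(0, 0), (120, 40), (80, 200), (30, 150)]
load = estimated_load(pins, pin_caps=[3.0, 3.0, 5.0], cap_per_unit_length=0.2)
print(hpwl(pins), load)
```

Two nets with the same fan-out now get different loads if their pins are placed differently, which a wire load model cannot express.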

  • 8/2/2019 t3_p2_lou

    73/91

    ASP-DAC'01 Lou Scheffer II-73

    Conventional Flow

    - More than 20 iterations

    - 89 MHz best result, with manual changes

    [Figure: conventional flow diagram. Step labels: Synthesis, Static Timing, syn2GCF, SE, Placement-based optimization, Global route, Detail route, Extraction, DRC, Delay calc. Data labels: Floorplan (DEF), Func. & Timing (.lib), Func. & Timing (.TLF), Physical (LEF). Tool labels: DC, PT, Pearl.]

  • 8/2/2019 t3_p2_lou

    74/91

    ASP-DAC'01 Lou Scheffer II-74

    Combined SP&R Flow

    [Figure: combined SP&R flow diagram. Step labels: SE-PKS, PKS Optimization, Global Route, Detail route, Extraction, DRC, Delay calc, Static Timing, write_constraints. Data labels: Floorplan (DEF), Func. & Timing (.TLF), Physical (LEF), EDIF netlist, TCL Constraints. Tool labels: Pearl, HE, PT.]

    - 100 MHz final result, met timing

    - Correlation within +/- 2.1%

    - One pass

    - 12 hrs 20 min runtime

  • 8/2/2019 t3_p2_lou

    75/91

    ASP-DAC'01 Lou Scheffer II-75

    Slack Correlation

    [Figure: slack correlation plots. Labels: Wire Load Based, PKS, Routed.]

  • 8/2/2019 t3_p2_lou

    76/91

    ASP-DAC'01 Lou Scheffer II-76

    Enlargement of SP&R slack

  • 8/2/2019 t3_p2_lou

    77/91

    ASP-DAC'01 Lou Scheffer II-77

    Results from combined SP&R

    Case  Size (k instances)  Macros  PKS timing error (%)  Max freq (MHz), conventional  Max freq (MHz), SP&R
    1     350                 56      +/- 3                 140                           140
    2     250                 50      +/- 3                 97                            100
    3     50                  4       +/- 0.96              93                            95
    4     160                 70      +/- 2.1               89                            100
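
The relative frequency gain of SP&R over the conventional flow can be worked out from the four cases on this slide. A small sketch, where `gain_percent` is a helper name introduced here for illustration:

```python
# (case, conventional MHz, SP&R MHz), taken from the results on this slide.
results = [(1, 140, 140), (2, 97, 100), (3, 93, 95), (4, 89, 100)]

def gain_percent(conventional, spr):
    """Relative frequency improvement of SP&R over the conventional flow."""
    return 100.0 * (spr - conventional) / conventional

for case, conv, spr in results:
    print(f"case {case}: {gain_percent(conv, spr):.1f}%")
```

Case 1 shows no change, cases 2 and 3 gain a few percent, and case 4 gains roughly 12% (89 MHz to 100 MHz).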

  • 8/2/2019 t3_p2_lou

    78/91

    ASP-DAC'01 Lou Scheffer II-78

    Agenda

    - Traditional design flows

    - Summary of DSM Problems

    - Analysis Methods Overview

    - Correction Methods Overview

    - Approaches to Fixing Timing Closure

    - Experimental Results

    - Summary

  • 8/2/2019 t3_p2_lou

    79/91

    ASP-DAC'01 Lou Scheffer II-79

    How do the approaches compare?

    - Jay McDougal of Agilent ran many flows on the same design

    - Overconstrain clock by various amounts

    - Accurate or conservative WLMs

      - Tried many levels of conservatism

    - Allow placer to size or not

    - Do post placement optimization or not

    - Physically knowledgeable synthesis

  • 8/2/2019 t3_p2_lou

    81/91

    ASP-DAC'01 Lou Scheffer II-81

    Key to the plot of results

    - Basic flow: Design Compiler & Qplace

    - TDD = timing driven design

      - In addition to minimizing wire length and congestion, placer is given timing constraints and allowed to change gate sizes

    - IPO and PBO are post placement optimizers

      - IPO runs on synthesis DB with back annotation

      - PBO runs on physical DB with synthesis transforms

    - PKS = Physically Knowledgeable Synthesis (combined Synthesis/Place/Route)

  • 8/2/2019 t3_p2_lou

    82/91

    ASP-DAC'01 Lou Scheffer II-82

    Comparison of Approaches

    [Figure: clock cycle achieved (y-axis) versus relative size (x-axis), with the required cycle time marked. Series: No WLM; 90% WLM; 3ns, 50% WL; IPO 5ns No WL; IPO 3ns No WL; TDD/PBO 50% WL; TDD/PBO 90% WL; PKS.]

  • 8/2/2019 t3_p2_lou

    83/91

    ASP-DAC'01 Lou Scheffer II-83

    Comparison of Approaches

    [Figure: clock cycle achieved (y-axis) versus relative size (x-axis). Series: No WLM; 90% WLM; 3ns, 50% WL; IPO 5ns No WL; IPO 3ns No WL; TDD/PBO 50% WL; TDD/PBO 90% WL; PKS. Two callouts annotate the plot:]

    - Good area, but iterates between placement and synthesis, worst TTM, didn't hit timing target

    - One tool, no iteration, better TTM, hit timing target

  • 8/2/2019 t3_p2_lou

    84/91

    ASP-DAC'01 Lou Scheffer II-84

    Agenda

    - Traditional design flows

    - Summary of DSM Problems

    - Analysis Methods Overview

    - Correction Methods Overview

    - Approaches to Fixing Timing Closure

    - Experimental Results

    - Summary

  • 8/2/2019 t3_p2_lou

    85/91

    ASP-DAC'01 Lou Scheffer II-85

    Good News

    - At least we understand the problem

      - Analysis of timing is well understood

      - Transformations that help timing are well understood

      - DSM effects are painful but can be controlled

  • 8/2/2019 t3_p2_lou

    86/91

    ASP-DAC'01 Lou Scheffer II-86

    Bad News

    - Cycle time and technology advances demand more and more sophisticated optimization techniques

    - In previous flows, corrections must be applied in separate tools

    - Disconnects among the various tools involved increase turn-around time and limit optimization

  • 8/2/2019 t3_p2_lou

    87/91

    ASP-DAC'01 Lou Scheffer II-87

    Good News

    - The Bad News is commonly recognized

    - Many tool vendors, academics, and in-house EDA researchers are working to solve these problems

    - A new generation of tools is already available that was designed from the ground up to address timing closure

  • 8/2/2019 t3_p2_lou

    88/91

    ASP-DAC'01 Lou Scheffer II-88

    Bad News

    - These problems won't be the last!

    - Each process generation brings new problems

      - Increased size

      - Weird process rules (antenna)

      - Possible new effects (single event upset)

  • 8/2/2019 t3_p2_lou

    89/91

    ASP-DAC'01 Lou Scheffer II-89

    Summary

    - Timing closure is a very real problem

    - Incremental improvements help somewhat, but limiting factor is

    - If synthesis does not understand placement, it must use wire load models, which have serious limitations

    - Best approach is combined synthesis/P&R

    - Experimental data backs this up

  • 8/2/2019 t3_p2_lou

    90/91

    ASP-DAC'01 Lou Scheffer II-90

    Acknowledgements

    - Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits:

      - Alex Suess

      - Jos Neves

      - Bill Joyner

      - IBM Rochester EDA folks

    - But the conclusions, and any mistakes, are mine
