t3_p2_lou
TRANSCRIPT
-
8/2/2019 t3_p2_lou
1/91
-
ASP-DAC'01 Lou Scheffer 2
Timing Closure Today
• Timing becomes more accurate as the flow progresses
• Sometimes an earlier stage thinks timing is OK, but the design fails at a later stage
• Need to repeat one or more steps with tighter constraints
• We have a timing closure problem when this process fails. Symptoms include:
  ◦ Non-convergence
  ◦ Too many iterations
  ◦ A solution is achievable, but this flow cannot find it
Flow: Design Entry → Synthesis → Timing → Place → Timing → Route → Timing
-
The Timing Closure Problem
[Figure: Performance of circuit Test 7. Frequency (target 100 MHz) at each stage (PKS/WLM, P&R, IPO P&R) for the PKS and regular flows.]
-
Examples of Problems
Design  Tech     Cycle time  Worst slack / # misses
                             Synthesis     Placed
V2      .18 µm   7.5 ns      -0.5 / 500    -11 / 2000
P1      .25 µm   8 ns        -0.4 / 100    -97 / 43k
T1      .18 µm   2.5-10 ns   -0.5 / 2000   -48 / 164k
V1      .18 µm   7.5 ns      0 / 0         -12 / 15k
C1      .25 µm   7.5 ns      -1 / 2000     -12 / 38k
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Timing Analysis Overview
• Timing Correction Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
Traditional Design Flows
Flow: Design Entry → Synthesis → Timing → Place → Timing → Route → Timing
Synthesis:
1. Tech independent optimization
2. Tech mapping
3. Rudimentary timing correction
-
Logic Synthesis Flow
• Technology independent optimization
  ◦ General goal: reduce connections, literals, redundancies, area
• Technology mapping
  ◦ Map logic into technology library
• Timing correction
  ◦ Find and fix critical timing paths
  ◦ Fix electrical violations (load, slew)
-
Traditional Design Flows
Flow: Design Entry → Synthesis w/Timing → Place w/Timing → Route → Timing
Integrate timing with synthesis and placement
Synthesis:
1. Tech independent optimization
2. Tech mapping
3. Timing correction
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
The Wall
• Logic designers concentrate on logic and timing (as understood by synthesis)
• Design work is done in the abstract world of gates and wire load models
• Throw the design "over the wall" when complete
• Physical designers concentrate on layout and the ability to route
• Effective method for many years
-
General CMOS Problems
• Low drive strengths / low power
  ◦ Capacitance (not intrinsic delay) plays a large role in performance
  ◦ Variability (range between slowest possible and fastest possible)
• Noise affects delay
  ◦ IR drop is a big percentage of supply
  ◦ Crosstalk can change delay by a factor of 2
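The factor-of-2 effect of crosstalk can be illustrated with the common Miller-factor approximation, in which a coupling capacitance counts 0, 1, or 2 times depending on whether the neighboring wire switches with the victim, stays quiet, or switches against it. This is a minimal sketch under that assumption; `victim_delay` and all resistance and capacitance values are hypothetical, and the RC product is used only as a delay proxy.

```python
# Hypothetical victim net (arbitrary units); values are illustrative only.
R_DRIVER = 1.0     # driver output resistance
C_GROUND = 1.0     # capacitance to ground
C_COUPLING = 3.0   # capacitance to a neighboring (aggressor) wire


def victim_delay(miller_factor: float) -> float:
    """RC delay proxy with the coupling capacitance scaled by a Miller factor."""
    return R_DRIVER * (C_GROUND + miller_factor * C_COUPLING)


quiet = victim_delay(1.0)     # aggressor quiet: coupling C counts once
same = victim_delay(0.0)      # aggressor switches in the same direction
opposite = victim_delay(2.0)  # aggressor switches in the opposite direction

print(quiet, same, opposite)  # 4.0 1.0 7.0
print(opposite / quiet)       # 1.75
```

As coupling capacitance comes to dominate ground capacitance, the ratio between the opposite-switching and quiet cases approaches 2.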
-
Additional DSM Problems
• High density / huge designs
• Very thin and resistive wires
• Very high frequencies
  ◦ Inductance becomes more important
• Smaller voltages
  ◦ IR drop a bigger fraction of signal swing
• Clock skew and latency
• Electromigration and noise
-
Clock Distribution Problems
• Most common design approach requires close to zero skew
• CMOS / DSM problems all affect clocks
• Distribution problem increasing
  ◦ Number of latches/flip-flops growing significantly
• Power consumed in clock tree is significant
  ◦ Current (I) and noise also of concern
-
Process Designers are trying to help
• Many metal layers
• Different metal pitches
  ◦ Small pitch for local interconnect
  ◦ Big pitch for long, fast wires
• Copper wires, thick metal to lower R
• SOI (Silicon On Insulator)
• Low-k dielectrics
• These help but are not enough
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
Timing Analysis
• Give accurate time values on each pin/port of the network
• Has to deal with design changes made by the optimization toolbox
• Static timing analysis
  ◦ Simulation is far too slow in an optimization environment
  ◦ Accuracy is more than enough
-
Timing Analysis Requirements
• Choose a combination of timing analyzer and delay calculator appropriate for the level of design
  ◦ Give the best accuracy for the tool performance that can be tolerated
• Timing analysis / delay calculation must be able to cope with logic design changes
  ◦ Incremental
  ◦ Highest performance possible
  ◦ Non-linear delay equations
-
Timing Analysis Requirements
• Must handle
  ◦ Difference between rising and falling delays
  ◦ Delay dependent on slew rate
  ◦ Slew and delay dependent on output load
  ◦ Non-linear delay equations
-
Late Mode Analysis Definitions
• Constraints: assertions at the boundaries
  ◦ Arrival times: $AT_a$, $AT_b$
  ◦ Required arrival time: $RAT_x$
• Delay from a to x ($d_{ax}$) is the longest time it takes to propagate a signal from a to x
• Slack is required arrival time minus arrival time: $SL = RAT - AT$
[Figure: circuit with inputs a, b, c, internal net y, and output x, annotated with $AT_a$, $AT_b$, $RAT_x$, and $d_{ax}$]
-
Example
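Late mode, unit gate delays: a and b drive the gate producing y; y and c drive the gate producing x.
$AT_a = 0,\ AT_b = 1,\ AT_c = 0,\ RAT_x = 2$
$AT_y = 2,\ AT_x = 3$
$SL_x = 2 - 3 = -1,\ SL_y = 1 - 2 = -1$
$SL_a = 0 - 0 = 0,\ SL_b = 0 - 1 = -1,\ SL_c = 1 - 0 = 1$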
-
Early mode analysis
• Definitions change as follows:
  ◦ "longest" becomes "shortest"
  ◦ slack = arrival - required
Same circuit, early mode:
$AT_a = 0,\ AT_b = 1,\ AT_c = 0,\ RAT_x = 2$
$AT_y = 1,\ AT_x = 1$
$SL_x = 1 - 2 = -1,\ SL_y = 1 - 1 = 0$
$SL_a = 0 - 0 = 0,\ SL_b = 1 - 0 = 1,\ SL_c = 0 - 1 = -1$
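The late- and early-mode numbers in the two examples can be reproduced with a small static timing sketch. It assumes the topology implied by the figures (a and b feed a gate driving y; y and c feed a gate driving x) and unit gate delays; the `GATES` table and the `analyze` function are illustrative names, not the author's tool.

```python
# Netlist in topological order: output net -> (input nets, gate delay).
GATES = {
    "y": (["a", "b"], 1),
    "x": (["y", "c"], 1),
}
ASSERTED_AT = {"a": 0, "b": 1, "c": 0}  # arrival-time constraints
ASSERTED_RAT = {"x": 2}                 # required-time constraint


def analyze(mode: str):
    """Return (arrival, required, slack) dicts for 'late' or 'early' mode."""
    late = mode == "late"
    pick = max if late else min   # longest vs. shortest path to a net
    limit = min if late else max  # tightest requirement among the fan-outs

    # Forward pass: arrival times.
    at = dict(ASSERTED_AT)
    for out, (ins, delay) in GATES.items():
        at[out] = pick(at[i] for i in ins) + delay

    # Backward pass (reverse topological order): required times.
    rat = dict(ASSERTED_RAT)
    for out, (ins, delay) in reversed(list(GATES.items())):
        for i in ins:
            need = rat[out] - delay
            rat[i] = limit(rat[i], need) if i in rat else need

    # Late mode: slack = required - arrival; early mode: arrival - required.
    slack = {n: (rat[n] - at[n]) if late else (at[n] - rat[n]) for n in at}
    return at, rat, slack


for mode in ("late", "early"):
    print(mode, analyze(mode)[2])
```

Running it gives the slide values, for example a late-mode slack of -1 at x and an early-mode slack of +1 at b. The only differences between the modes are the choice of max or min and the sign convention for slack.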
-
Delay modeling
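[Figure: timing model. Propagation arcs run from inputs a and b to output x ($d_{ax}$, $d_{bx}$) and from a register's clock to its output; a test arc relates the register's data input to its clock.]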
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
Timing Correction
• Fix electrical violations (slew and load). Takes priority since needed for reliability.
  ◦ Resize cells
  ◦ Buffer nets
  ◦ Copy (clone) cells
• Fix timing problems
  ◦ Local transforms (bag of tricks)
  ◦ Path-based transforms
-
Local Transforms
• Resize cells
• Buffer or clone to reduce load on critical nets
• Decompose large cells
• Swap connections on commutative pins or among equivalent nets
• Move critical signals forward
• Pad early paths
• Area recovery
-
Cloning
[Figure: delay vs. load curves for cells A, B, and C. A gate drives five sinks (d, e, f, g, h) of load 0.2 each; cloning splits the sinks between two copies of the gate, A and B.]
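A minimal sketch of the cloning trade-off, assuming a simple linear delay model (delay = intrinsic + drive resistance x load) in place of the curves in the figure. The constants, `INPUT_CAP`, and the choice of which sinks go to each copy are hypothetical.

```python
# Hypothetical linear cell model: delay = INTRINSIC + DRIVE_RES * load.
INTRINSIC = 0.01
DRIVE_RES = 0.04
INPUT_CAP = 0.05  # hypothetical input load of one copy of the gate

SINKS = {"d": 0.2, "e": 0.2, "f": 0.2, "g": 0.2, "h": 0.2}


def delay(load: float) -> float:
    return INTRINSIC + DRIVE_RES * load


def original() -> float:
    """One gate drives all five sinks."""
    return delay(sum(SINKS.values()))


def cloned(group_a: set) -> tuple:
    """Copy A drives the sinks in group_a; copy B drives the rest."""
    load_a = sum(c for s, c in SINKS.items() if s in group_a)
    load_b = sum(c for s, c in SINKS.items() if s not in group_a)
    return delay(load_a), delay(load_b)


def upstream_penalty() -> float:
    """Extra delay on the net feeding the gate, which now sees one more input pin."""
    return DRIVE_RES * INPUT_CAP


print(original())          # about 0.05
print(cloned({"d", "e"}))  # about (0.026, 0.034)
print(upstream_penalty())  # about 0.002
```

Cloning pays off only when the gain on the split net outweighs the extra load it places on the upstream net.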
-
Buffering
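[Figure: the same delay vs. load curves for cells A, B, and C, and the same gate driving five sinks (d, e, f, g, h) of load 0.2 each; after buffering, part of the load is moved behind buffers (B), and the driver sees a buffer input load of 0.1 in place of those sinks.]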
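A companion sketch for buffering, using the same kind of hypothetical linear delay model: the critical sink stays on the driver and the other sinks move behind one buffer. `BUF_INPUT_CAP` and all delay constants are assumptions chosen for illustration.

```python
# Hypothetical linear cell model shared by the driver and the buffer.
INTRINSIC = 0.01
DRIVE_RES = 0.04
BUF_INPUT_CAP = 0.1  # hypothetical input load of the inserted buffer

SINKS = {"d": 0.2, "e": 0.2, "f": 0.2, "g": 0.2, "h": 0.2}


def delay(load: float) -> float:
    return INTRINSIC + DRIVE_RES * load


def unbuffered() -> dict:
    """Every sink sees the driver loaded by all five sinks."""
    t = delay(sum(SINKS.values()))
    return {s: t for s in SINKS}


def buffered(critical: set) -> dict:
    """Critical sinks stay on the driver; the others move behind one buffer."""
    direct = sum(c for s, c in SINKS.items() if s in critical)
    behind = sum(c for s, c in SINKS.items() if s not in critical)
    t_drv = delay(direct + BUF_INPUT_CAP)  # driver now sees far less load
    t_buf = t_drv + delay(behind)          # non-critical sinks pay an extra stage
    return {s: (t_drv if s in critical else t_buf) for s in SINKS}


print(unbuffered()["d"], buffered({"d"}))
```

With these numbers the critical sink improves from about 0.05 to about 0.022, while the sinks behind the buffer slow to about 0.064, so buffering is a win only when those sinks have slack to spare.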
-
Redesign Fan-in Tree
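[Figure: signals with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 combined by 2-input gates of delay 1. Original tree: Arr(e)=6. Redesigned tree, with the latest-arriving signal a entering the last gate: Arr(e)=5.]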
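The redesigned tree can be produced by repeatedly combining the two earliest-arriving signals (Huffman-style merging), a standard rule for identical 2-input gates of equal delay. The sketch below assumes unit gate delays; `balanced_ad_bc` is one hypothetical balanced tree that yields 6 for these arrivals, since the exact original tree is not recoverable from the figure.

```python
import heapq


def greedy_fanin(arrivals, gate_delay=1):
    """Combine the two earliest-arriving signals first, until one remains."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


def balanced_ad_bc(arr, gate_delay=1):
    """A balanced tree pairing (a, d) and (b, c), then combining the results."""
    left = max(arr["a"], arr["d"]) + gate_delay
    right = max(arr["b"], arr["c"]) + gate_delay
    return max(left, right) + gate_delay


ARR = {"a": 4, "b": 3, "c": 1, "d": 0}
print(balanced_ad_bc(ARR))         # 6
print(greedy_fanin(ARR.values()))  # 5
```

The greedy order merges d and c first, then b, then a, which is exactly the chain in the redesigned tree.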
-
Redesign Fan-out Tree
[Figure: two fan-out trees annotated with stage delays. Original: longest path = 5. Restructured: longest path = 4, including the slowdown of a buffer due to its load.]
-
Decomposition
-
Swap Commutative Pins
[Figure: a gate with inputs a, b, c whose pins have different delays, before and after reconnecting the signals to the commutative pins]
Simple sorting on arrival times and delays works
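The sorting rule can be sketched as follows: pairing the latest arrival with the fastest pin, the next latest with the next fastest, and so on, minimizes the maximum of arrival time plus pin delay. The arrival times, pin names, and delays are hypothetical, and a brute-force check confirms the sorted assignment on this small case.

```python
from itertools import permutations


def assign_pins(arrivals, pin_delays):
    """Latest-arriving signal goes to the fastest pin, and so on down both lists."""
    signals = sorted(arrivals, key=arrivals.get, reverse=True)
    pins = sorted(pin_delays, key=pin_delays.get)
    return dict(zip(signals, pins))


def output_arrival(assignment, arrivals, pin_delays):
    return max(arrivals[s] + pin_delays[p] for s, p in assignment.items())


def brute_force_best(arrivals, pin_delays):
    """Check every assignment; only practical for a handful of pins."""
    signals = list(arrivals)
    return min(
        output_arrival(dict(zip(signals, perm)), arrivals, pin_delays)
        for perm in permutations(pin_delays)
    )


# Hypothetical arrival times and pin-to-output delays.
ARRIVALS = {"a": 2, "b": 0, "c": 1}
PIN_DELAYS = {"p1": 1, "p2": 2, "p3": 3}

best = assign_pins(ARRIVALS, PIN_DELAYS)
print(best, output_arrival(best, ARRIVALS, PIN_DELAYS))
```

Here the sorted assignment gives an output arrival of 3, while the worst assignment (a on the slowest pin) gives 5.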
-
Move Critical Signals Forward
• Based on ATPG
  ◦ Linear in circuit size
  ◦ Detects redundancies efficiently
• Efficiently finds wires to be added and removed
  ◦ Based on mandatory assignments
[Figure: circuit with inputs a, b, c, d and output e, before and after restructuring]
-
Path-based Transforms
• Path-based resizing
• Unmap/remap a path or cone
• Slack stealing
• Retiming
-
Slack Stealing
• Take advantage of timing behavior of level-sensitive registers (latches)
[Figure: two-phase clocks C1 and C2 over times 0 to 2. With edge-triggered timing the two stages have Slack = +1 and Slack = -1; with latches the slow stage borrows time from the fast one, giving Slack = 0.]
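A minimal sketch of latch time borrowing under simplifying assumptions: setup times and latch D-to-Q delays are ignored, each latch passes data while its clock phase is open, and an edge-triggered design is modeled as giving each stage a fixed half-period budget. The function names and the delay values are hypothetical; the numbers mirror the +1 / -1 slacks in the figure.

```python
def edge_triggered_slacks(stage_delays, budget):
    """With flip-flops, each stage must fit in its own fixed budget."""
    return [budget - d for d in stage_delays]


def latch_ring_ok(stage_delays, windows, period, cycles=4):
    """
    Propagate data around a ring of latches, ignoring setup and D-to-Q delays.

    stage_delays[i]: logic delay from latch i to latch (i + 1) % n.
    windows[i]: (open, close) time of latch i within one clock period.
    """
    n = len(windows)
    depart = windows[0][0]  # launch from latch 0 when it first opens
    for k in range(1, cycles * n + 1):
        idx = k % n
        offset = (k // n) * period  # shift the window into the right cycle
        open_t = windows[idx][0] + offset
        close_t = windows[idx][1] + offset
        arrive = depart + stage_delays[(k - 1) % n]
        if arrive > close_t:  # latch already closed: timing failure
            return False
        depart = max(arrive, open_t)  # data passes once the latch is open
    return True


# Two-phase clock with period 2: C1 latch open [0, 1], C2 latch open [1, 2].
WINDOWS = [(0, 1), (1, 2)]
DELAYS = [2, 0]  # hypothetical: slow stage followed by a fast stage

print(edge_triggered_slacks(DELAYS, budget=1))   # [-1, 1]
print(latch_ring_ok(DELAYS, WINDOWS, period=2))  # True
```

The slow stage fails on its own in the edge-triggered view, but with latches it arrives exactly as the C2 latch closes and the fast stage absorbs the borrowed time.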
-
Retiming
[Figure: registers moved forward or backward across logic; the critical stage delay drops from Delay=3 to Delay=2]
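A toy illustration of retiming, assuming a linear chain of hypothetical unit-delay gates with one movable pipeline register; general retiming on arbitrary circuit graphs (for example the Leiserson-Saxe formulation) is considerably more involved. Moving the register forward or backward across one gate rebalances the stages.

```python
def stage_delays(gate_delays, registers):
    """
    Combinational delay of each stage in a linear pipeline.

    registers: indices i meaning "a register sits right after gate i".
    Input and output registers at the ends are implied.
    """
    stages, start = [], 0
    for pos in sorted(registers):
        stages.append(sum(gate_delays[start:pos + 1]))
        start = pos + 1
    stages.append(sum(gate_delays[start:]))
    return stages


def clock_period(gate_delays, registers):
    return max(stage_delays(gate_delays, registers))


GATES = [1, 1, 1, 1]  # hypothetical unit-delay gates in a chain

print(clock_period(GATES, [0]))  # 3: stages of 1 and 3
print(clock_period(GATES, [1]))  # 2: register moved forward one gate
print(clock_period(GATES, [2]))  # 3: stages of 3 and 1
# Moving the register at position 2 backward one gate also reaches [1], period 2.
```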
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
Solutions to Timing Closure
• Hand / custom design
• Improved analysis
• More sophisticated clock design
• Carry hierarchical logic design into physical design
• Modify existing flows
• More physically knowledgeable tools
  ◦ Many variations: combined synthesis/place/route, gain-based synthesis, etc.
-
Hand/Custom Design
• Mentioned for completeness
  ◦ Hurts productivity
  ◦ Yields highest performance
• Can only fix a few things, for example:
  ◦ Can realistically fix timing or crosstalk problems on a few nets
  ◦ Cannot realistically change the size of blocks
-
Improved Analysis Helps
• Plot shows slack by net for two designs
• A 10% timing delta → many more bad nets
  ◦ Often the difference between success and failure
[Figure: number of nets vs. slack relative to worst net (ns), for two designs]
-
More accurate analysis
• Crosstalk-induced delay
  ◦ Old approach: overestimate coupling C
  ◦ Better: compute nominal timing + crosstalk delta
• Customer example from CadMos
  ◦ Ignore crosstalk completely: 400 MHz
    ▪ Not an acceptable alternative
  ◦ Coupling caps overestimated by 60%: 300 MHz
  ◦ Nominal delays + computed crosstalk: 333 MHz
  ◦ More accurate analysis gains 10% margin
-
Increased accuracy helps
• Global/detailed route correlation
  ◦ Any global route is better than wire load models or Steiner trees, since global routes consider congestion
  ◦ But to get that last 10%, need a global/detailed router link
    ▪ Knowing some nets must detour is good, but...
    ▪ Which net takes which detour is needed for good correlation
-
Modified clock design
• Zero skew is not necessary, and maybe not even desirable
• We have the freedom to adjust clock arrival times at memory elements
  ◦ This obtains more margin and thus helps convergence
• Similar to retiming but less disruptive
• Improvement very design dependent
  ◦ If the worst path is from a flip-flop to itself, it doesn't help
• May impact scan chains
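A minimal sketch of adjusting clock arrival times ("useful skew"), assuming setup times and clock-to-Q delays are ignored and only setup (late-mode) checks are modeled. The register names, path delays, and clock period are hypothetical. It also shows why a flip-flop-to-itself path cannot benefit: the clock arrival appears on both sides of the check and cancels.

```python
def setup_slack(launch_clk, capture_clk, path_delay, period):
    """Setup slack of a register-to-register path (setup and clock-to-Q ignored)."""
    return (capture_clk + period) - (launch_clk + path_delay)


PERIOD = 10
# Hypothetical paths: (launching flip-flop, capturing flip-flop, max path delay).
PATHS = [("FF1", "FF2", 12), ("FF2", "FF3", 8)]


def worst_slack(clock_arrival):
    return min(
        setup_slack(clock_arrival[src], clock_arrival[dst], d, PERIOD)
        for src, dst, d in PATHS
    )


zero_skew = {"FF1": 0, "FF2": 0, "FF3": 0}
useful_skew = {"FF1": 0, "FF2": 2, "FF3": 0}  # delay FF2's clock by 2

print(worst_slack(zero_skew))    # -2
print(worst_slack(useful_skew))  # 0

# A flip-flop feeding itself: the clock arrival cancels out, so skew cannot help.
print(setup_slack(0, 0, 12, PERIOD), setup_slack(5, 5, 12, PERIOD))  # -2 -2
```

Shifting a clock arrival also moves the hold (early-mode) checks on the same registers, which this sketch does not model.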
-
Hierarchy and Physical Design
• Logical hierarchy can be carried over into physical design
• Seems a natural top-down approach, using floorplanning as a firm guide to physical design
-
Hierarchy and Physical Design
Disadvantages
• Placement solution bounded
• Ability to find a routable solution hindered
• Hierarchy usually logically based, not physically based
• Boundary conditions explode and must be managed carefully to avoid surprises
• Pin assignment problem for all macros
-
Hierarchy Example Plots
-
Hierarchy Example Plots
-
Hierarchy Example Plots
-
Previous attempts to fix closure
• Modifications/additions to existing flows
• Allow placer to do sizing and buffering
• Do post-placement optimization
  ◦ Simple transformations
  ◦ Use existing placement
• Do post-placement re-synthesis
  ◦ Complex transformations allowed
  ◦ Needs incremental placement and extraction
-
Post-Placement Optimization
• In-place (little or no placement impact)
  ◦ Resizing (carefully)
  ◦ Pin swapping, some tree rebuilding
  ◦ Wire sizing / typing
• Minimally disruptive
  ◦ Resizing
  ◦ Buffering
  ◦ Cloning
  ◦ Tree rebuilding
  ◦ Cell removal
-
In-place Optimization
• Not too difficult
• Can use extracted electrical data (C, RC) from placement tool
  ◦ Some changes affect pin locations, but may be ignored
  ◦ Tree rebuilding needs incremental extraction
• Can use timing reports for timing data
  ◦ But accuracy suffers as changes are made
-
In-place Optimization
[Figure: Placement & extraction produces a placed netlist and C/RC data; Optimization (resize, swap pins, rebuild trees) produces the optimized netlist]
-
Place-disruptive Optimization
• Changing nets implies:
  ◦ Must be able to recompute C and RC
  ◦ May need to incrementally place new cells
  ◦ Need incremental timing capability
-
Place-disruptive Optimization
[Figure: Placement & extraction produces a placed netlist and C/RC data; Optimization with placer, timer, and extractor (resize, buffer, clone, cell removal, rebuild trees) produces the optimized netlist]
-
Post-Placement Example: Buffering long wires
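The figure for this example is not recoverable, but the usual reason for buffering long wires is that distributed RC delay grows with the square of wire length, while a chain of buffered segments grows roughly linearly. The sketch assumes the Elmore approximation for a uniform wire and ideal buffers with a fixed delay; the function names and all parameter values are hypothetical.

```python
def wire_delay(length, r=1.0, c=1.0):
    """Elmore delay of a distributed RC wire, 0.5 * r * c * L^2 (driver/load ignored)."""
    return 0.5 * r * c * length ** 2


def buffered_delay(length, segments, buffer_delay=0.5, r=1.0, c=1.0):
    """Split the wire into equal segments, each driven by an ideal buffer."""
    seg = length / segments
    return segments * (buffer_delay + wire_delay(seg, r, c))


LENGTH = 10.0
print(wire_delay(LENGTH))         # 50.0
print(buffered_delay(LENGTH, 5))  # 12.5
best_k = min(range(1, 21), key=lambda k: buffered_delay(LENGTH, k))
print(best_k, buffered_delay(LENGTH, best_k))  # 10 10.0
```

Too few buffers leave the quadratic wire term dominant; too many add more buffer delay than they save.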
-
Post-Placement Challenges
• Getting the timing right
  ◦ Different timers used at different stages
  ◦ Do the optimizer and placer see the same worst paths as the static timer?
• Design size / tool capacity
  ◦ Using synthesis technology on flat designs
-
Post-Placement Challenges
• Incompatible tools, formats
  ◦ Placer, synthesizer, and timer may all use different file formats, and may all come from different vendors
  ◦ Basic interoperability issues
• Incremental placer needed for new cells
  ◦ Doesn't have to be smart
  ◦ But might produce some infeasible solutions
  ◦ Must be integrated with optimizer
-
Post-Placement Challenges
• Extraction/estimation of net data
• Any optimization which significantly alters net topology needs this ability
  ◦ Insert cells
  ◦ Remove cells
  ◦ Move connections from one cell to another
• Steiner tree estimation
• Net C and delay (RC) calculator
• Do results match other extraction tools?
-
Sample Optimization Results
Design  Tech     Cycle time  Worst slack / # misses
                             Synthesized   Placed        Opt
V2      .18 µm   7.5 ns      -0.5 / 500    -11 / 2000    -4 / 1000
P1      .25 µm   8 ns        -0.4 / 100    -97 / 43k     -13 / 20k
T1      .18 µm   2.5-10 ns   -0.5 / 2000   -48 / 164k    -6 / 62k
V1      .18 µm   7.5 ns      0 / 0         -12 / 15k     -0.3 / 100
C1      .25 µm   7.5 ns      -1 / 2000     -12 / 38k     -2 / 1400
-
Root Problem is Wire Load Models
• Main problem: correlation between pre-P&R estimates and post-P&R extraction
• If correlation is good
  ◦ Problems detected and potentially fixed early
• If correlation is bad
  ◦ Problems detected late
  ◦ Not a good situation! Needing to rewrite RTL is the worst case for timing closure.
-
Why are Wire Load Models Used?
• Can't complete layout until logic design is complete
• Can't complete logic design without timing
• Can't time without load and net delay data
• Can't extract load and net delay data until layout is complete
• Can't complete layout ...
-
WLM solution: use statistics
• Don't know specific layout data
• But we know something about statistical properties
• Average net load, average net delay
• Further refine using other characteristics
  ◦ Number of sinks
  ◦ Size of design (number of circuits)
  ◦ Physical size
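A minimal sketch of how a statistical wire load model is applied: estimated wire length is looked up by number of sinks, the table is chosen by design size, and lengths beyond the table are extrapolated linearly. The table contents, the 10,000-cell threshold, `CAP_PER_UNIT`, and the function names are all hypothetical; real libraries describe such tables in their own format.

```python
# Hypothetical wire load models: estimated wire length per fan-out count,
# selected by design size, with linear extrapolation beyond the table.
WLM_SMALL = {"length": {1: 20.0, 2: 35.0, 3: 50.0, 4: 65.0}, "slope": 15.0}
WLM_LARGE = {"length": {1: 40.0, 2: 70.0, 3: 100.0, 4: 130.0}, "slope": 30.0}
CAP_PER_UNIT = 0.0002  # pF per unit length (hypothetical)


def select_wlm(num_cells):
    """Refine the statistics by design size (number of cells)."""
    return WLM_SMALL if num_cells < 10_000 else WLM_LARGE


def estimated_cap(fanout, num_cells):
    """Wire capacitance estimate for a net with fanout >= 1."""
    wlm = select_wlm(num_cells)
    table = wlm["length"]
    top = max(table)
    if fanout <= top:
        length = table[fanout]
    else:  # extrapolate past the last table entry
        length = table[top] + wlm["slope"] * (fanout - top)
    return length * CAP_PER_UNIT


print(estimated_cap(2, 5_000), estimated_cap(6, 5_000), estimated_cap(2, 50_000))
```

Every net with the same fan-out in the same design receives the same estimate, which is exactly the averaging behavior the following slides examine.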
-
Correlation Pre/Post-P&R using averages
• Wire load models give synthesis an estimate of physical design
• We can correlate averages pre- and post-P&R as accurately as needed
• If a specific design has average behavior, its timing, on average, can be predicted
• Otherwise, a pass through placement can provide the correct WLM for a design
-
Timing and averages
• WLMs are OK for area and power (properties that are sums are well handled by statistics)
• But timing is dictated by the worst specific path
• That path is built of individual nets
• One net can determine the speed of an entire design
• Reality: poor correlation for relatively few nets can cause major headaches
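A small numeric illustration of why averages work for sums but not for the worst path. The capacitances are invented for the example: a WLM that assigns every net the average gets the total right (and so quantities that are sums, such as area and power), yet misses the single outlier net by more than a factor of three and overestimates every other net.

```python
import statistics

# Hypothetical post-route wire capacitances (pF) for ten nets with the same fan-out.
ACTUAL = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008, 0.011, 0.010, 0.046]

# A wire load model assigns every such net the same (average) value.
estimate = statistics.mean(ACTUAL)

total_error = sum(ACTUAL) - estimate * len(ACTUAL)  # sums: essentially zero
worst_error = max(ACTUAL) - estimate                 # the one bad net
overestimated = sum(1 for c in ACTUAL if c < estimate)

print(round(estimate, 4), round(worst_error, 4))
print(overestimated, "of", len(ACTUAL), "nets get a pessimistic estimate")
print("median:", statistics.median(ACTUAL))
```

The skew between mean and median here is the same effect shown in the C / fan-out distribution on the next slide.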
-
Correlation Pre/Post-P&R
Averages and Wire Loads
[Figure: distribution of C / fan-out: number of nets vs. pF per fan-out, with the median and mean marked]
-
Correlation Pre/Post-P&R
Cwire Data by Logic Design
[Figure: Cwire vs. number of fan-outs]
-
Better Wire Load Models
• How can we use information from one pass through physical design?
• Adjust wire load model coefficients
• Back-annotate specific net load and delay data to the logic design
• New problem: correlation of logic pre- and post-synthesis
• But there are fundamental limits to statistical models; a new approach is needed.
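Adjusting wire load model coefficients from one pass through physical design can be sketched as an ordinary least-squares fit of wire capacitance against fan-out. The linear form `cap = base + slope * fanout`, the function name, and the extracted data points are hypothetical. Note that the fit only improves the averages; it still cannot say which individual net will be the outlier, which is the fundamental limit this slide points to.

```python
def fit_wlm(samples):
    """Least-squares fit of cap = base + slope * fanout to (fanout, cap) pairs."""
    n = len(samples)
    mean_f = sum(f for f, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    s_ff = sum((f - mean_f) ** 2 for f, _ in samples)
    s_fc = sum((f - mean_f) * (c - mean_c) for f, c in samples)
    slope = s_fc / s_ff
    base = mean_c - slope * mean_f
    return base, slope


# Hypothetical extracted data from one placement pass: (fan-out, wire cap in fF).
EXTRACTED = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0), (1, 3.4), (3, 6.6)]

base, slope = fit_wlm(EXTRACTED)
print(f"cap ~ {base:.2f} + {slope:.2f} * fanout")
```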
-
A better approach: Combine Synthesis, P & R
• Don't use wire load models at all
• Synthesis does a trial placement as it runs
  ◦ Loading found from estimated routes
• Must include global routing
  ◦ Then feed global route to detailed router
  ◦ Or do detailed route itself
• Much better correlation and timing closure
• No inter-tool data transfer headaches
-
Conventional Flow
• More than 20 iterations
• 89 MHz best result, with manual changes
[Figure: conventional flow. Steps: synthesis, static timing, syn2GCF, placement-based optimization, global route, detail route, extraction, delay calc, DRC. Inputs: floorplan (DEF), function & timing (.lib, TLF), physical (LEF). Tools shown: DC, PT, SE, Pearl.]
-
Combined SP&R Flow
[Figure: combined SP&R flow. SE-PKS (PKS optimization, global route, static timing) takes an EDIF netlist, TCL constraints, floorplan (DEF), function & timing (TLF), and physical (LEF) data; constraints are passed on via write_constraints to detail route, extraction, delay calc, DRC, and static timing (Pearl, PT).]
• 100 MHz final result, met timing
• Correlation within ±2.1%
• One pass
• 12 hr 20 min runtime
-
Slack Correlation
[Figure: slack correlation of the wire-load-based and PKS estimates against routed results]
-
Enlargement of SP&R slack
-
Results from combined SP&R
Case  Instances (k)  Macros  PKS timing error  Max freq (MHz)
                                               Conventional  SP&R
1     350            56      ±3%               140           140
2     250            50      ±3%               97            100
3     50             4       ±0.96%            93            95
4     160            70      ±2.1%             89            100
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
How do the approaches compare?
• Jay McDougal of Agilent ran many flows on the same design
• Overconstrain clock by various amounts
• Accurate or conservative WLMs
  ◦ Tried many levels of conservatism
• Allow placer to size or not
• Do post-placement optimization or not
• Physically knowledgeable synthesis
-
Key to the plot of results
• Basic flow: Design Compiler & Qplace
• TDD = timing-driven design
  ◦ In addition to minimizing wire length and congestion, placer is given timing constraints and allowed to change gate sizes
• IPO and PBO are post-placement optimizers
  ◦ IPO runs on synthesis DB with back annotation
  ◦ PBO runs on physical DB with synthesis transforms
• PKS = Physically Knowledgeable Synthesis (combined synthesis/place/route)
-
Comparison of Approaches
[Figure: clock cycle achieved (ns) vs. relative size for the flows No WLM, 90% WLM, 3ns;50%WL, IPO 5ns NoWL, IPO 3ns NoWL, TDD/PBO 50%WL, TDD/PBO 90%WL, and PKS, with the required cycle time marked]
-
Comparison of Approaches
[Figure: the same plot of clock cycle achieved vs. relative size for the flows No WLM, 90% WLM, 3ns;50%WL, IPO 5ns NoWL, IPO 3ns NoWL, TDD/PBO 50%WL, TDD/PBO 90%WL, and PKS, with two callouts]
• Good area, but iterates between placement and synthesis, worst TTM, didn't hit timing target
• One tool, no iteration, better TTM, hit timing target
-
Agenda
• Traditional design flows
• Summary of DSM Problems
• Analysis Methods Overview
• Correction Methods Overview
• Approaches to Fixing Timing Closure
• Experimental Results
• Summary
-
Good News
• At least we understand the problem
  ◦ Analysis of timing is well understood
  ◦ Transformations that help timing are well understood
  ◦ DSM effects are painful but can be controlled
-
Bad News
• Cycle time and technology advances demand more and more sophisticated optimization techniques
• In previous flows, corrections must be applied in separate tools
• Disconnects among the various tools involved increase turn-around time and limit optimization
-
Good News
• The Bad News is commonly recognized
• Many tool vendors, academics, and in-house EDA researchers are working to solve these problems
• A new generation of tools, designed from the ground up to address timing closure, is already available
-
Bad News
• These problems won't be the last!
• Each process generation brings new problems
  ◦ Increased size
  ◦ Weird process rules (antenna)
  ◦ Possible new effects (single event upset)
-
Summary
• Timing closure is a very real problem
• Incremental improvements help somewhat, but the limiting factor is:
• If synthesis does not understand placement, it must use wire load models, which have serious limitations
• Best approach is combined synthesis/P&R
• Experimental data backs this up
-
Acknowledgements
• Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits:
  ◦ Alex Suess
  ◦ José Neves
  ◦ Bill Joyner
  ◦ IBM Rochester EDA folks
• But the conclusions, and any mistakes, are mine