Development and Application of Tree Synthesis Algorithms
John Lillis
University of Illinois Chicago
Overview
Part I: Buffer tree synthesis: formulations; S/P/SP-tree
Part II: Fanin tree embedding/replication: optimization across gate boundaries; interaction with placement
Part I: Buffer Tree Synthesis
Premises of Work
MAIN PREMISE: Powerful buffer tree synthesis is a core component of modern design
Conservation of resources is crucial: estimate of 700-800K buffers/chip in the near future; cost-performance tradeoffs; general cost model
Topology, embedding, and buffering spaces should be explored simultaneously: the 2-phase approach is not robust or predictable, and is particularly troublesome in the presence of blockages
Max Slack Weakness
[Figure: slack vs. cost; maximizing slack yields overoptimized subtrees]
Problem Formulation
Given: location of driver and sinks; technology parameters; timing requirements; buffer library; target routing graph (with blockages)
Find: a topology in the corresponding space, its embedding, and a buffer assignment minimizing cost subject to timing constraints
Philosophy of Constraint Imposition
Goals: predictable behavior; absence of ad-hoc heuristics
Main idea: optimally solve a constrained variant of the problem. Well-designed constraints produce both a large, flexible solution space and tractability.
Constraints: Topology Space
[Figure: full topology space vs. constrained space]
Topology Embedding Flexibility
[Figure: embeddings of a tree from source s to sinks a, b, c; legend: buffer blockage, routing blockage]
Target Routing Graph Construction
[Figure: target routing graph for source s and sinks a, b, c]
Algorithmic Description
Timing-Driven Maze Routing
Topology Embedding
S-Tree, P-Tree
SP-Tree
Core Subroutine: Timing-Driven Maze Routing
Generalization of [Hur et al.; TCAD Feb 2000]:
single target, multiple sources
finds non-dominated paths
simultaneous buffer insertion
handling of blockages in topology synthesis
[Figure: multiple sources routed to a single target]
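The idea of keeping only non-dominated paths can be illustrated with a multi-criteria Dijkstra search. This is a simplified sketch, not the published routine: it assumes each edge carries an additive cost and delay, each source starts with a given (cost, arrival time) label, and it omits buffer insertion, whose load-dependent delay would need richer labels. The graph encoding and the name `nondominated_paths` are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: vertex -> list of (neighbor, edge_cost, edge_delay)
Graph = Dict[str, List[Tuple[str, float, float]]]
Label = Tuple[float, float]  # (accumulated cost, arrival time)


def is_dominated(label: Label, labels: List[Label]) -> bool:
    """True if some stored label is no worse in both cost and arrival time."""
    c, t = label
    return any(c2 <= c and t2 <= t for c2, t2 in labels)


def nondominated_paths(graph: Graph, sources: Dict[str, Label], target: str) -> List[Label]:
    """Multi-source, single-target search keeping only non-dominated labels.

    Simplified: additive cost and delay per edge, no buffer insertion.
    """
    labels: Dict[str, List[Label]] = defaultdict(list)
    heap = [(c, t, v) for v, (c, t) in sources.items()]
    heapq.heapify(heap)
    while heap:
        c, t, v = heapq.heappop(heap)
        # Labels leave the heap in (cost, arrival) order, so only labels
        # already stored at v can dominate this one.
        if is_dominated((c, t), labels[v]):
            continue
        labels[v].append((c, t))
        if v == target:
            continue
        for w, edge_cost, edge_delay in graph.get(v, []):
            nxt = (c + edge_cost, t + edge_delay)
            if not is_dominated(nxt, labels[w]):
                heapq.heappush(heap, (nxt[0], nxt[1], w))
    return sorted(labels[target])
```

Because labels are popped in lexicographic (cost, arrival) order, a single dominance check against the labels already stored at a vertex keeps each vertex's list Pareto-optimal.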
Topology Embedding
Goal: obtain a timing-feasible embedding/buffering of a given topology, minimizing cost
Solution: dynamic programming (bottom-up)
Solution sets: A(u,v) represents the set of solutions in which vertex u of the topology is mapped to vertex v of the target graph
A1b = Join(A1.left, A1.right)
A1 = GenDijkstra(A1b)
[Figure: topology vertex u mapped to target-graph vertex v]
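The Join step can be sketched under the assumption of van Ginneken-style solution signatures (cost, downstream load capacitance, required arrival time); the slides do not state the exact signature, so the `Sol` fields and the pruning rule are assumptions. Joining two child sets at a branch point adds costs and capacitances, takes the minimum required time, and discards dominated combinations. The GenDijkstra propagation through the target graph is not shown.

```python
from itertools import product
from typing import Iterable, List, NamedTuple


class Sol(NamedTuple):
    cost: float  # wire + buffer cost of the subtree
    cap: float   # load capacitance seen at the subtree root
    req: float   # required arrival time at the subtree root


def prune(sols: Iterable[Sol]) -> List[Sol]:
    """Drop solutions that another solution matches or beats on all three criteria."""
    unique = sorted(set(sols))
    return [
        s for s in unique
        if not any(o != s and o.cost <= s.cost and o.cap <= s.cap and o.req >= s.req
                   for o in unique)
    ]


def join(left: Iterable[Sol], right: Iterable[Sol]) -> List[Sol]:
    """Merge two child solution sets at a branch vertex (hypothetical signature)."""
    merged = (
        Sol(a.cost + b.cost, a.cap + b.cap, min(a.req, b.req))
        for a, b in product(left, right)
    )
    return prune(merged)
```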
S-Tree
Notion of localities: spatial, temporal, polarity
Partition sinks into 2 sets based on estimated timing criticality, signal polarity requirements, or some other criteria
Subtrees can break the topology and “stitch” at a different place
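A criticality-based bipartition might look like the following sketch. The rule used here (a sink is critical if its required arrival time is at or below a threshold, by default the lower median) is an assumption for illustration; the slides leave the criterion open.

```python
from typing import Dict, Optional, Set, Tuple


def partition_by_criticality(
    req_times: Dict[str, float], threshold: Optional[float] = None
) -> Tuple[Set[str], Set[str]]:
    """Split sinks into (critical, non_critical) by required arrival time.

    Assumed rule: a sink is critical if its required time is at or below the
    threshold; by default the threshold is the lower median.
    """
    if threshold is None:
        times = sorted(req_times.values())
        threshold = times[(len(times) - 1) // 2]
    critical = {s for s, q in req_times.items() if q <= threshold}
    return critical, set(req_times) - critical
```

A smaller required time means a tighter deadline, which is why lower values count as critical here.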
S-Tree Topology Space
Sink partition: {a,c,d} and {b}
[Figure: example topologies from source s over sinks a, b, c, d consistent with this partition]
S-Tree Recurrence
A1b = Join(A1.left, A1.right)
A1 = GenDijkstra(A1b)
A2b = Join(A2.left, A2.right)
A2 = GenDijkstra(A2b)
A12b = Join(A12.left, A12.right) + Join(A1, A2)
A12 = GenDijkstra(A12b)
S-Tree Topology Space
[Figure: initial topology over source s and sinks a, b, c, d, e, f, with alternative topologies in the S-Tree space]
Incorporating polarity
4 sets: critical & positive signal polarity; critical & negative; non-critical & positive; non-critical & negative
Other partitioning schemes...
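The four-way grouping can be expressed as a classification keyed on (criticality, polarity). This sketch assumes polarity is encoded as '+' or '-' and that the set of critical sinks comes from a prior criticality analysis; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def polarity_classes(
    polarity: Dict[str, str], critical: Set[str]
) -> Dict[Tuple[str, str], Set[str]]:
    """Group sinks by (criticality, polarity); polarity is '+' or '-'."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for sink, pol in polarity.items():
        level = "critical" if sink in critical else "non-critical"
        groups[(level, pol)].add(sink)
    return dict(groups)
```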
P-Tree Topology Space
[Figure: topologies from source s over sinks a, b, c, d, e kept in a fixed order]
All Permutation-Constrained Topologies
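To give a sense of this space, the sketch below enumerates binary topologies in which every subtree covers a contiguous run of a fixed sink order, using an interval recurrence of the same shape as a P-Tree dynamic program. It deliberately ignores the source position, the embedding, and buffering, so it illustrates only the topology constraint; the function names are hypothetical.

```python
from functools import lru_cache
from typing import Sequence, Tuple


def ptree_topologies(order: Sequence[str]) -> Tuple:
    """All binary topologies whose subtrees cover contiguous runs of `order`."""
    leaves = tuple(order)

    @lru_cache(maxsize=None)
    def build(i: int, j: int) -> Tuple:
        if i == j:
            return (leaves[i],)
        trees = []
        for k in range(i, j):  # split interval [i, j] into [i, k] and [k+1, j]
            for left in build(i, k):
                for right in build(k + 1, j):
                    trees.append((left, right))
        return tuple(trees)

    return build(0, len(leaves) - 1)


def count_ptree_topologies(n: int) -> int:
    """Same recurrence, counting instead of enumerating (interval DP)."""
    count = [[0] * n for _ in range(n)]
    for i in range(n):
        count[i][i] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            count[i][j] = sum(count[i][k] * count[k + 1][j] for k in range(i, j))
    return count[0][n - 1]
```

The count for n sinks is the Catalan number C(n-1), e.g. 14 for the five sinks a-e, so the number of topologies grows exponentially while the interval DP has only O(n^2) subproblems.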
Limitations of P-Tree Space
Cannot isolate critical / non-critical subtrees (“temporal locality”)
Minimum wirelength may not produce minimum cost
[Figure: two trees from the driver over critical and non-critical sinks]
SP-Tree
Combine everything said so far...
From P-Tree: spatial locality, robustness
From S-Tree: temporal locality, polarity locality, ability to fix “topology problems” by “stitching”
Solution Space
[Figure: solution spaces of fixed topology, S-Tree, P-Tree, and SP-Tree within the entire space]
Experiments
Randomly generated nets
Non-uniform required arrival time
Non-uniform sink input capacitance
Buffer-biased cost
Interested in: min-cost feasible solution; max-slack solution for verification; runtime
More details in the paper...
Algorithms for Experiments
S-Tree
P-Tree
SP-Tree
RMP [Cong, Yuan; DAC 2000]
RMP-Quick [Cong, Yuan; DAC 2000]
Results: Net2-06
[Chart: # buffers (0-35) for RMP, RMP-Qck, S-Tree, P-Tree, SP-Tree; metrics: wire, buf, cost, slack, max slack, runtime; min-cost feasible vs. max-slack solutions]
Results: Net2-08
[Chart: # buffers (0-50) for RMP, RMP-Qck, S-Tree, P-Tree, SP-Tree; metrics: wire, buf, cost, slack, max slack, runtime; min-cost feasible vs. max-slack solutions]
Results: Net2-12
[Chart: # buffers (0-80) for RMP, RMP-Qck, S-Tree, P-Tree, SP-Tree; metrics: wire, buf, cost, slack, max slack, runtime; min-cost feasible vs. max-slack solutions]
SP-Tree vs. P-Tree
Conclusions
Key concepts: general cost models (routing congestion, buffer congestion); orthogonal separation of spatial and temporal locality; polarity requirements; routing and buffer blockages
Targets: small-to-medium-sized signal nets
Results summary: highly cost-efficient, high-performance solutions; substantially outperforms prior approaches in solution quality and runtime
Part II: Fanin Tree Embedding/Replication
Replication Overview
• Hrkic, Lillis, Beraudo (DAC04, IWLS04)
• Concept: netlist structure limits the potential of timing-driven placement
• Difficult for top-down synthesis to fix
• Main issue: inherently non-monotone paths
• Approach (Hrkic, Lillis; DAC04) touches on placement, synthesis (netlist perturbation), and routing
[Figure: cells A-E before and after replicating C as CR]
Logic Replication
Duplicate a logic cell: preserve functionality, improve timing
Place/move cells; adjust connections
[Figure: cells A-E before and after replicating C as CR]
Early Work
Use replication to straighten I/O paths: local monotonicity [Beraudo, Lillis, DAC 2003]
Sequence of 3 cells on the path; incremental framework
[Figure: path through cells A-F]
Limitations of Local Monotonicity
Even when local monotonicity is satisfied, many non-monotone paths remain
Replication Tree Approach [Hrkic et al., DAC04]
Identify the critical sink; extract the critical fan-in tree (replication tree); optimize the fan-in tree (fan-in tree embedding); legalize the placement
Slowest Paths Tree
Focus on the slowest paths: find the slowest paths tree from the critical sink, including paths within epsilon of the current critical delay
Focus on the most critical portions of the fan-in cone
[Figure: fan-in cone of cells A-F and its tree with replicas AR, BR, CR, DR, FR]
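One way to grow an epsilon-slowest paths tree is a backward traversal from the critical sink that keeps a fan-in edge only if the longest path through it is within epsilon of the critical delay. The sketch assumes arrival times come from a prior static timing analysis and that edge delays are fixed; the data layout and function name are hypothetical. Because it follows paths rather than nodes, a reconvergent cell can appear more than once, consistent with temporarily replicating the tree.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: load -> list of (driver, edge delay driver->load)
Fanins = Dict[str, List[Tuple[str, float]]]
Tree = Tuple[str, list]  # (cell name, list of child subtrees)


def slowest_paths_tree(fanins: Fanins, arrival: Dict[str, float],
                       sink: str, eps: float) -> Tree:
    """Backward traversal from `sink`, keeping paths within eps of the critical delay."""
    critical = arrival[sink]

    def grow(v: str, downstream: float) -> Tree:
        children = []
        for u, d in fanins.get(v, []):
            # Longest path through u -> v -> ... -> sink, using the STA arrival at u.
            if arrival[u] + d + downstream >= critical - eps:
                children.append(grow(u, downstream + d))
        return (v, children)

    return grow(sink, 0.0)
```

Raising `eps` enlarges the extracted portion of the fan-in cone.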
Replication Tree
Most circuits do not contain large fan-in trees due to reconvergence
Given a critical tree, temporarily replicate the entire tree
Assign connections: if (u,v) is a tree edge, connect uR to vR; otherwise connect u to vR
[Figure: cells A-F with replicas AR, BR, CR, DR, FR]
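The connection rule can be written down directly. In this sketch, replica names are formed by appending "R" (a hypothetical naming convention), edges are (driver, load) pairs, and only the input connections of the replicas are produced; how the sink is moved onto the replicated tree is outside its scope.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (driver, load)


def replica(cell: str) -> str:
    """Hypothetical naming convention: the replica of D is DR."""
    return cell + "R"


def replica_inputs(netlist: Iterable[Edge], tree_edges: Set[Edge],
                   tree_nodes: Set[str]) -> List[Edge]:
    """Input connections for the replicas of every node in the replication tree."""
    conns = []
    for u, v in netlist:
        if v not in tree_nodes:
            continue  # load is not replicated; nothing to add
        if (u, v) in tree_edges:
            conns.append((replica(u), replica(v)))  # tree edge: uR drives vR
        else:
            conns.append((u, replica(v)))           # otherwise: original u drives vR
    return conns
```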
Placement Cost
Replication is temporary, so placement cost is crucial
Cost discount for placing a cell over its logical equivalent: low cost for placing DR over D (in that case actual replication never occurs); multiple low-cost locations are possible
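A placement cost with such a discount might look like the sketch below. The constants (base cost, overlap penalty, discount) and the "R" suffix convention are illustrative assumptions, not values from the work.

```python
from typing import Optional


def placement_cost(cell: str, occupant: Optional[str],
                   base: float = 1.0, overlap_penalty: float = 10.0,
                   discount: float = 0.1) -> float:
    """Cost of placing `cell` on a site currently holding `occupant` (None if empty)."""
    if occupant is None:
        return base
    if cell.endswith("R") and occupant == cell[:-1]:
        # Replica over its logical equivalent (e.g. DR over D): the two would be
        # unified, so no actual replication occurs.
        return discount
    return base + overlap_penalty
```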
Fan-in Tree Embedding
Given: fan-in tree; placement of sink and inputs; arrival times at inputs; placement and routing graph
Find: placement of internal tree nodes (gates) minimizing cost subject to timing constraints (cost/delay tradeoff)
[Figure: two embeddings of gates A, B, C feeding the sink; left: lower delay, higher cost; right: higher delay, lower cost]
Fan-in Tree Embedding Example
Fan-out and Fan-in Trees
[Figure: fan-out tree from a source to A, B, C (bottom-up) and fan-in tree from A, B, C to a sink (top-down)]
Fan-in Tree Embedding
Adaptation of the S-Tree algorithm [Hrkic, Lillis, DAC 2002]
Keep: graph model for the embedding target; modified timing-driven maze routing with multiple sources and multiple targets, keeping a list of non-dominated solutions at each vertex [S. Hur, J. Lillis, IEEE TCAD 2000]
Modify: top-down (vs. bottom-up); solution signature (c,t), where c is cost and t is signal arrival time; gate placement cost p(x,y)
[Figure: Join and modified maze routing]
Fan-in Tree Embedding
Non-binary tree: multiple gate inputs
Top-down dynamic programming: maze routing to populate solutions (deferred backtracking), then join solutions:
$c = p(x,y) + c_1 + \dots + c_n$, $t = \max(t_1, \dots, t_n)$
Bottom-up solution extraction: backtrack to extract maze routes and gate placements
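The join at a candidate gate location (x,y) can be sketched as a cross product over the n input solution lists followed by (cost, arrival) Pareto pruning. It follows the slide's formula literally, with the placement cost p(x,y) passed in as a number and no separate gate-delay term; the names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Sequence, Tuple

Sol = Tuple[float, float]  # (cost c, arrival time t)


def pareto(sols: Iterable[Sol]) -> List[Sol]:
    """Keep solutions not beaten on both cost and arrival time."""
    front: List[Sol] = []
    best_t = float("inf")
    for c, t in sorted(set(sols)):  # ascending cost, then arrival
        if t < best_t:
            front.append((c, t))
            best_t = t
    return front


def join(p_xy: float, inputs: Sequence[List[Sol]]) -> List[Sol]:
    """c = p(x,y) + c1 + ... + cn, t = max(t1, ..., tn), then prune."""
    combos = []
    for choice in product(*inputs):  # one solution per gate input
        c = p_xy + sum(ci for ci, _ in choice)
        t = max(ti for _, ti in choice)
        combos.append((c, t))
    return pareto(combos)
```

The full cross product grows exponentially with the number of inputs. Because sum and max are monotone, pruning after each pairwise join yields the same front with less work.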
Aside: Legalization
Use Modified Gain-Graph approach [Hur, Lillis; ICCAD00]
Modified to incorporate timing information
Optimization Flow
Identify the critical sink (static timing analysis)
Extract the fan-in tree: replication tree or epsilon-slowest paths tree
Embed the fan-in tree
Decide which cells to replicate/unify
Legalize the placement
Repeat while there is improvement
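The outer loop can be sketched with the per-iteration work hidden behind a callback. `improve_once` stands in for the extract, embed, replicate/unify, and legalize steps, and `critical_delay` for static timing analysis; both are hypothetical placeholders, as is the iteration cap.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # whatever represents a placed, replicated netlist


def optimize(placement: P,
             critical_delay: Callable[[P], float],
             improve_once: Callable[[P], P],
             max_iters: int = 100) -> Tuple[P, float]:
    """Repeat extract/embed/replicate-unify/legalize while the critical delay improves."""
    best = critical_delay(placement)
    for _ in range(max_iters):
        candidate = improve_once(placement)
        delay = critical_delay(candidate)
        if delay >= best:  # no improvement: stop and keep the previous result
            break
        placement, best = candidate, delay
    return placement, best
```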
Enhancements
Post-process unification: some cells end up placed close to their logical equivalents, but there is no automatic unification; if one of the paths is non-critical, it is possible to unify without degrading performance
Unification in the legalizer: during a ripple-move, a cell may be placed on top of its replica; unify them and stop legalization
epsilon-slowest paths tree: no randomization; dynamically modify the value of epsilon to enlarge the fan-in cone
Experiments
Algorithms: Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html]; Local Replication [Beraudo, Lillis, DAC-03]; RT-Embedding
20 MCNC benchmark circuits
Interested in: critical delay, amount of replication, wire usage
Tests performed in the FPGA domain; promising results
Experimental Setup
Obtain a valid placement with the Timing-Driven VPR placer
Route and evaluate with the Timing-Driven VPR router
Local Replication / Replication Tree Embedding

Average values over all 20 circuits, normalized to VPR:

              Critical path delay      Wire length   Blocks
              W_inf    W_low-stress
Local Repl    0.925    0.927           1.020         1.003
RT-Embed      0.858    0.869           1.084         1.004

Delay improved for all circuits. Best improvement for circuit pdc: 0.641. Runtime penalty under 5% on the VPR flow.
Replication Statistics
Circuit ex1010: 38 replications, 12 unifications
Ongoing Work
• Generalize to ASICs; include simultaneous buffering
• Mitigation of legalization noise: preventing (some) overlaps in embedding; more sophisticated placement cost
• Reconvergence: arborescence approach
• Simultaneous technology (re-)mapping
• Explore multiple tree topologies simultaneously (universal tree solver engine: U-Tree)
Review
Trees are everywhere, even in places where they seem to be absent!
Tree-based algorithms:
• can be very strong in generality of formulation and predictability
• enable connection to general placement/routing targets
• can capture tradeoffs between complex objectives
• can sometimes be applied to drive optimization of graph structures
References: http://cs.uic.edu/~jlillis/pubs.html
S/P/SP-tree executables: http://eda.cs.uic.edu/software.html
Thank you
Timing-Driven Placement Legalization
[Figure: overlapping cells and empty slots]
After embedding, cells could overlap in the placement
Moving cells on the critical path may harm timing
Ripple-move strategy [Hur, Lillis, ICCAD 2000], modified to include both timing and wiring information
Timing-Driven Placement Legalization
[Figure: overlap and empty slots]
Identify the overlap
Identify up to 4 closest empty slots (one in each quadrant)
Construct a gain graph: monotone paths from congested to free slots; edges carry the wire and timing gain of moving a cell to a neighboring slot
Find the max-gain path and perform the ripple-move (the gain could be negative)
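Because the paths are monotone toward one empty slot, the gain graph for that quadrant is a DAG over a grid, so the max-gain path can be found by dynamic programming in grid order. The sketch assumes an abstract `gain(a, b)` callback for moving the cell in slot a to neighboring slot b (in practice combining wire and timing terms) and ignores blocked slots; all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Slot = Tuple[int, int]
Gain = Callable[[Slot, Slot], float]  # gain of moving the cell in slot a to slot b


def max_gain_path(src: Slot, dst: Slot, gain: Gain) -> Tuple[float, List[Slot]]:
    """Max-gain monotone path from the overlapped slot `src` to empty slot `dst`."""
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    nx, ny = abs(dst[0] - src[0]), abs(dst[1] - src[1])

    def slot(i: int, j: int) -> Slot:
        return (src[0] + sx * i, src[1] + sy * j)

    best = [[float("-inf")] * (ny + 1) for _ in range(nx + 1)]
    prev: List[List[Optional[Tuple[int, int]]]] = [[None] * (ny + 1) for _ in range(nx + 1)]
    best[0][0] = 0.0
    # Monotone steps only move toward dst, so row-major order is a topological order.
    for i in range(nx + 1):
        for j in range(ny + 1):
            for pi, pj in ((i - 1, j), (i, j - 1)):
                if pi < 0 or pj < 0:
                    continue
                g = best[pi][pj] + gain(slot(pi, pj), slot(i, j))
                if g > best[i][j]:
                    best[i][j], prev[i][j] = g, (pi, pj)
    path, cur = [], (nx, ny)
    while cur is not None:
        path.append(slot(*cur))
        cur = prev[cur[0]][cur[1]]
    return best[nx][ny], path[::-1]


def best_ripple_move(overlap: Slot, empties: List[Slot], gain: Gain) -> Tuple[float, List[Slot]]:
    """Evaluate each candidate empty slot (up to one per quadrant) and keep the best path."""
    return max((max_gain_path(overlap, e, gain) for e in empties), key=lambda r: r[0])
```

Performing the move then shifts each cell on the path one slot toward the empty slot; the best path may still have negative total gain.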