
Development and Application of Tree Synthesis Algorithms

John Lillis

University of Illinois at Chicago

Overview

Part I: Buffer tree synthesis
• Formulations
• S/P/SP-trees

Part II: Fan-in tree embedding/replication
• Optimization across gate boundaries
• Interaction with placement

Part I: Buffer Tree Synthesis

Premises of Work

Main premise: powerful buffer tree synthesis is a core capability for modern design

• Conservation of resources is crucial: an estimated 700-800K buffers per chip in the near future
• Cost-performance tradeoffs call for a general cost model
• Topology, embedding, and buffering spaces should be explored simultaneously: a 2-phase approach is neither robust nor predictable, which is particularly troublesome in the presence of blockages

Max Slack Weakness

[Figure: slack vs. cost; maximizing slack produces overoptimized subtrees]
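As an illustration of this weakness, the sketch below selects from a hypothetical list of (cost, slack) candidate solutions for one net; the numbers are invented. A max-slack selection pays a large cost for a small slack gain, while a minimum-cost feasible selection stops at the cheapest candidate that meets the timing target.

```python
def max_slack(solutions):
    """Pick the (cost, slack) solution with the largest slack (ties: lower cost)."""
    return max(solutions, key=lambda s: (s[1], -s[0]))


def min_cost_feasible(solutions, slack_target=0.0):
    """Pick the cheapest solution whose slack meets the target, or None."""
    feasible = [s for s in solutions if s[1] >= slack_target]
    return min(feasible, key=lambda s: (s[0], -s[1])) if feasible else None


# Hypothetical (cost, slack) candidates for one net.
candidates = [(10, -0.20), (14, 0.05), (22, 0.40), (35, 0.45)]
print(max_slack(candidates))          # (35, 0.45)
print(min_cost_feasible(candidates))  # (14, 0.05)
```

Going from the feasible candidate to the max-slack one raises cost by 150% for 0.4 units of extra slack the constraints never asked for.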

Problem Formulation

Given: locations of the driver and sinks; technology parameters; timing requirements; buffer library; target routing graph (with blockages)

Find: a topology in the corresponding space, its embedding, and a buffer assignment minimizing cost subject to timing constraints

Philosophy of Constraint Imposition

Goals: predictable behavior; absence of ad-hoc heuristics

Main idea: optimally solve a constrained variant of the problem. Well-designed constraints produce a large, flexible solution space while keeping the problem tractable.

Constraints: Topology Space

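[Figure: the constrained topology space as a subset of the full space]

Topology Embedding Flexibility

[Figure: alternative embeddings over source s and sinks a, b, c that avoid a buffer blockage and a routing blockage]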

Target Routing Graph Construction

[Figure: target routing graph constructed over source s and sinks a, b, c]
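A target routing graph can be sketched as a uniform grid. This is an assumption; the slides do not give the construction. Here routing blockages remove grid vertices entirely, while buffer blockages keep vertices routable but forbid buffers on them. The function name and the rectangle format are hypothetical.

```python
def build_target_graph(width, height, routing_blockages=(), buffer_blockages=()):
    """Build a grid routing graph (hypothetical construction).

    Blockages are inclusive rectangles (x0, y0, x1, y1).
    Returns (adjacency dict, set of vertices where buffers may be inserted).
    """
    def inside(p, rects):
        return any(x0 <= p[0] <= x1 and y0 <= p[1] <= y1 for x0, y0, x1, y1 in rects)

    # Routing blockages remove vertices: no wire may pass through them.
    nodes = {(x, y) for x in range(width) for y in range(height)
             if not inside((x, y), routing_blockages)}
    adj = {}
    for x, y in nodes:
        adj[(x, y)] = [q for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                       if q in nodes]
    # Buffer blockages keep vertices routable but forbid buffer insertion.
    can_buffer = {p for p in nodes if not inside(p, buffer_blockages)}
    return adj, can_buffer


adj, can_buffer = build_target_graph(3, 3, routing_blockages=[(1, 1, 1, 1)],
                                     buffer_blockages=[(0, 0, 2, 0)])
print(len(adj), len(can_buffer))  # 8 5
```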

Algorithmic Description

• Timing-driven maze routing
• Topology embedding: S-Tree, P-Tree, SP-Tree

Core Subroutine: Timing-Driven Maze Routing

• Generalization of [Hur, et al.; TCAD Feb 2000]
• Single target, multiple sources
• Finds non-dominated paths
• Simultaneous buffer insertion
• Handles blockages in topology synthesis

[Figure: maze search from multiple sources to a single target]
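A minimal sketch of timing-driven maze routing with simultaneous buffer insertion, in the spirit of a generalized Dijkstra search. Several assumptions are not taken from the slides: each label is a (cost, load capacitance, required time) triple propagated from the sources toward the target, wires use a per-edge Elmore model, there is a single buffer type, and all parameter values in the hypothetical `Tech` class are invented. Labels are expanded cheapest-first, each vertex keeps only non-dominated labels, and labels whose required time drops below `q_min` are pruned. With positive parameters, every move lowers the required time, so this pruning also guarantees termination.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Tech:
    """Hypothetical per-edge wire and single-buffer parameters."""
    r: float = 1.0          # wire resistance per edge
    c: float = 1.0          # wire capacitance per edge
    wire_cost: float = 1.0  # cost per edge of wire
    buf_r: float = 0.5      # buffer output resistance
    buf_c: float = 0.2      # buffer input capacitance
    buf_d: float = 1.0      # buffer intrinsic delay
    buf_cost: float = 5.0   # cost of one buffer


def dominates(a, b):
    """Label a = (cost, cap, q) dominates b if it is no worse in all three criteria."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]


def insert_label(labels, lab):
    """Add lab to a vertex's label list unless dominated; evict labels it dominates."""
    if any(dominates(old, lab) for old in labels):
        return False
    labels[:] = [old for old in labels if not dominates(lab, old)]
    labels.append(lab)
    return True


def maze_route(graph, sources, target, tech, q_min=0.0, can_buffer=lambda v: True):
    """graph: vertex -> neighbour list; sources: vertex -> list of (cost, cap, q).
    Returns the non-dominated labels that reach target."""
    labels = {v: [] for v in graph}
    heap, tie = [], itertools.count()
    for v, labs in sources.items():
        for lab in labs:
            if insert_label(labels[v], lab):
                heapq.heappush(heap, (lab[0], next(tie), v, lab))
    while heap:  # expand the cheapest label first
        _, _, v, lab = heapq.heappop(heap)
        if lab not in labels[v]:
            continue  # evicted by a dominating label after it was queued
        cost, cap, q = lab
        # Extend by one wire segment (Elmore delay of a pi-modelled edge).
        moves = [(u, (cost + tech.wire_cost, cap + tech.c,
                      q - tech.r * (tech.c / 2 + cap))) for u in graph[v]]
        if can_buffer(v):  # a buffer at v isolates the downstream load
            moves.append((v, (cost + tech.buf_cost, tech.buf_c,
                              q - tech.buf_d - tech.buf_r * cap)))
        for u, new in moves:
            if new[2] >= q_min and insert_label(labels[u], new):
                heapq.heappush(heap, (new[0], next(tie), u, new))
    return labels[target]


# Usage: path 0-1-2-3, one sink at vertex 3 (load 1.0, required time 10.0).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = maze_route(path, {3: [(0.0, 1.0, 10.0)]}, target=0, tech=Tech())
print(len(result))  # 5
```

Extending a label by a wire or a buffer preserves dominance, so evicting dominated labels early never discards a label that could lead to a better result at the target.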




Topology Embedding

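Goal: obtain a timing-feasible embedding and buffering of a given topology, minimizing cost

Solution: dynamic programming (bottom-up)

Solution sets: A(u,v) is the set of solutions corresponding to vertex u in the topology and vertex v in the target graph

A1b = Join(A1.left, A1.right)

A1 = GenDijkstra(A1b)

The children's solution sets are joined at each target-graph vertex; generalized Dijkstra (the timing-driven maze router) then propagates the joined solutions through the target graph.

[Figure: solution set A(u,v) for topology vertex u at target-graph vertex v]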
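The Join step can be sketched under a van Ginneken-style assumption that each solution in A(u,v) is a (cost, load capacitance, required time) triple; the slides do not spell out the signature. Joining two child solution sets at the same target-graph vertex adds costs and loads, takes the smaller required time, and discards dominated results. The function names are hypothetical.

```python
from itertools import product


def prune(solutions):
    """Keep non-dominated (cost, cap, q): lower cost, lower cap, higher q is better."""
    sols = sorted(set(solutions), key=lambda s: (s[0], s[1], -s[2]))
    kept = []
    for s in sols:  # a dominator always sorts earlier, so checking `kept` suffices
        if not any(k[0] <= s[0] and k[1] <= s[1] and k[2] >= s[2] for k in kept):
            kept.append(s)
    return kept


def join(left, right):
    """Merge two child solution sets rooted at the same target-graph vertex."""
    return prune((a[0] + b[0], a[1] + b[1], min(a[2], b[2]))
                 for a, b in product(left, right))


print(join([(0, 1.0, 5.0), (2, 0.5, 6.0)], [(1, 2.0, 4.0)]))
# [(1, 3.0, 4.0), (3, 2.5, 4.0)]
```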




S-Tree

Notion of localities: spatial, temporal, polarity

Partition sinks into 2 sets based on estimated timing criticality, signal polarity requirements, or other criteria

Subtrees can break away from the topology and “stitch” in at a different place

S-Tree Topology Space

Sink partition: {a, c, d} and {b}

[Figure: S-tree topologies over source s and sinks a, b, c, d obtained by stitching]

S-Tree Recurrence


A1 = GenDijkstra(A1b)

A12b = Join(A12.left, A12.right) + Join(A1, A2)


S-Tree Topology Space

[Figure: an initial topology over source s and sinks a through f, and S-tree variants obtained by stitching subtrees]

Incorporating polarity

4 sets: critical & positive polarity; critical & negative polarity; non-critical & positive polarity; non-critical & negative polarity

Other partitioning schemes are possible.
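A sketch of the four-way partition, assuming each sink carries an estimated slack and a required polarity. The slack threshold separating critical from non-critical sinks, the input format, and the function name are hypothetical.

```python
from collections import defaultdict


def partition_sinks(sinks, slack_threshold):
    """sinks: name -> (estimated slack, polarity), polarity '+' or '-'.
    Returns {(criticality, polarity): [sink names]}."""
    groups = defaultdict(list)
    for name, (slack, polarity) in sinks.items():
        crit = "critical" if slack < slack_threshold else "non-critical"
        groups[(crit, polarity)].append(name)
    return dict(groups)


sinks = {"a": (0.1, "+"), "b": (2.0, "+"), "c": (0.3, "-"), "d": (1.5, "-")}
print(partition_sinks(sinks, slack_threshold=0.5))
```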




P-Tree Topology Space

[Figure: P-tree topologies over source s for the sink order a, b, c, d, e]

All permutation-constrained topologies
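For a fixed sink order, the permutation-constrained topologies are the binary topologies in which every subtree covers a contiguous run of that order. The sketch below enumerates this space by splitting intervals. This is the interval structure a P-tree dynamic program can work over; the actual algorithm also embeds and buffers each interval, which is omitted here. For 5 sinks there are 14 such topologies, the Catalan number C4. The function name is hypothetical.

```python
from functools import lru_cache


def p_tree_topologies(order):
    """All binary topologies whose subtrees cover contiguous runs of `order`."""
    order = tuple(order)

    @lru_cache(maxsize=None)
    def build(i, j):
        if i == j:  # a single sink
            return (order[i],)
        trees = []
        for k in range(i, j):  # split [i, j] into [i, k] and [k + 1, j]
            for left in build(i, k):
                for right in build(k + 1, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(order) - 1))


print(p_tree_topologies("abc"))         # [('a', ('b', 'c')), (('a', 'b'), 'c')]
print(len(p_tree_topologies("abcde")))  # 14
```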

Limitations of P-Tree Space

• Isolation of critical and non-critical subtrees (“temporal locality”) is not captured
• Minimum wirelength may not produce minimum cost

[Figure: two topologies connecting the driver to critical and non-critical sinks]




SP-Tree

Combines everything discussed so far:
• From P-Tree: spatial locality, robustness
• From S-Tree: temporal locality, polarity locality, and the ability to fix “topology problems” by “stitching”

Solution Space

[Figure: solution spaces of a fixed topology, S-Tree, P-Tree, and SP-Tree within the entire space]

Experiments

• Randomly generated nets
• Non-uniform required arrival times
• Non-uniform sink input capacitances
• Buffer-biased cost

Interested in: minimum-cost feasible solution; maximum-slack solution (for verification); runtime

More details in the paper.

Algorithms for Experiments

S-Tree

P-Tree

SP-Tree

RMP [Cong, Yuan; DAC 2000]

RMP-Quick [Cong, Yuan; DAC 2000]

Results

[Figure: results for net Net2-06 comparing RMP, RMP-Qck, S-Tree, P-Tree, and SP-Tree; y-axis: # buffers; panels: Min cost feasible, Max slack; metrics: wire, buffers, cost, slack, max slack, runtime]

Results

[Figure: further results; y-axis: # buffers]

Results

[Figure: further results; y-axis: # buffers]

SP-Tree vs. P-Tree

Conclusions

Key concepts:
• General cost models (routing congestion, buffer congestion)
• Orthogonal separation of spatial and temporal locality
• Polarity requirements
• Routing and buffer blockages

Targets: small-to-medium-sized signal nets

Results summary: highly cost-efficient, high-performance solutions; substantially outperforms prior approaches in solution quality and runtime

Part II: Fan-in Tree Embedding/Replication

Replication Overview

• Hrkic, Lillis, Beraudo (DAC04, IWLS04)
• Concept: netlist structure limits the potential of timing-driven placement
• Difficult for top-down synthesis to fix
• Main issue: inherently non-monotone paths
• Approach (Hrkic, Lillis; DAC04) touches on placement, synthesis (netlist perturbation), and routing

[Figure: a non-monotone path through cells A, B, C, D, E, and the same path after replicating C as CR]

Logic Replication

• Duplicate a logic cell, preserving functionality, to improve timing
• Place/move cells and adjust connections

[Figure: cells A through E before and after replicating C as CR]

Early Work

• Use replication to straighten I/O paths
• Local monotonicity [Beraudo, Lillis, DAC 2003]: considers sequences of 3 cells on a path
• Incremental framework

[Figure: a path through cells A through F]

Limitations of Local Monotonicity

Even with local monotonicity satisfied, many non-monotone paths remain.

Replication Tree Approach [Hrkic et al., DAC04]

• Identify the critical sink
• Extract the critical fan-in tree (replication tree)
• Optimize the fan-in tree (fan-in tree embedding)
• Legalize placement

Slowest Paths Tree

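• Focus on the slowest paths: find the slowest-paths tree from the critical sink
• Include paths within epsilon of the current critical delay
• Concentrates on the most critical portions of the fan-in cone

[Figure: fan-in cone of cells A through F and the replicated slowest-paths tree with AR, BR, CR, DR, FR]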
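A sketch of epsilon-slowest-paths tree extraction, assuming arrival times from static timing analysis and per-edge delays are given as dictionaries (a hypothetical input format; `graphlib` needs Python 3.9+). Each cell in the fan-in cone is attached to the fanout that gives its slowest path to the critical sink, and cells whose slowest path through them is within epsilon of the critical delay are kept. A kept cell's parent lies on the same path, so it is kept too and the result is connected.

```python
from graphlib import TopologicalSorter


def slowest_paths_tree(fanin, delay, arrival, sink, eps):
    """fanin: cell -> fan-in cells; delay: (u, v) -> delay of edge u->v;
    arrival: cell -> arrival time. Returns {cell: parent} for the kept tree."""
    cone, stack = {sink}, [sink]  # collect the fan-in cone of the critical sink
    while stack:
        for u in fanin.get(stack.pop(), []):
            if u not in cone:
                cone.add(u)
                stack.append(u)
    fanout = {v: [] for v in cone}
    for v in cone:
        for u in fanin.get(v, []):
            fanout[u].append(v)
    down, parent = {sink: 0.0}, {}  # down[v]: slowest delay from v to the sink
    order = TopologicalSorter({v: fanin.get(v, []) for v in cone}).static_order()
    for v in reversed(list(order)):  # fanouts are handled before their fan-ins
        if v == sink:
            continue
        parent[v] = max(fanout[v], key=lambda w: delay[(v, w)] + down[w])
        down[v] = delay[(v, parent[v])] + down[parent[v]]
    crit = arrival[sink]
    return {v: p for v, p in parent.items() if arrival[v] + down[v] >= crit - eps}


fanin = {"C": ["A", "B"], "D": ["C", "B"]}
delay = {("A", "C"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0, ("B", "D"): 1.0}
arrival = {"A": 0.0, "B": 0.5, "C": 1.5, "D": 2.5}
print(slowest_paths_tree(fanin, delay, arrival, "D", eps=0.0))  # {'C': 'D', 'B': 'C'}
```

Raising `eps` to 0.5 also admits A, whose slowest path (2.0) is exactly 0.5 below the critical delay; this is one way the cone can be enlarged by adjusting epsilon.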

Replication Tree

• Most circuits do not contain large fan-in trees due to reconvergence
• Given a critical tree, temporarily replicate the entire tree
• Assign connections: if (u,v) is a tree edge, connect uR to vR; otherwise connect u to vR

[Figure: original cells A through F with replicas AR, BR, CR, DR, FR]
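The connection rule can be sketched directly. The `R` suffix for replica names follows the slide's labels (CR, DR); the function name and input format are hypothetical.

```python
def replica_connections(tree_nodes, tree_edges, fanin):
    """tree_nodes: cells in the replication tree; tree_edges: set of (u, v) edges;
    fanin: v -> fan-in cells of v. The replica of v is named v + 'R'."""
    conns = []
    for v in tree_nodes:
        for u in fanin.get(v, []):
            if (u, v) in tree_edges:
                conns.append((u + "R", v + "R"))  # tree edge: replica drives replica
            else:
                conns.append((u, v + "R"))        # off-tree input: original drives replica
    return conns


fanin = {"C": ["A", "B"], "D": ["C", "B"]}
print(replica_connections(["D", "C", "B"], {("C", "D"), ("B", "C")}, fanin))
# [('CR', 'DR'), ('B', 'DR'), ('A', 'CR'), ('BR', 'CR')]
```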

Placement cost

• Replication is temporary, so the placement cost is crucial
• Cost discount for placing a cell over its logical equivalent: placing DR over D has low cost, and no actual replication occurs
• Multiple low-cost locations are possible
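A hypothetical placement-cost function p(x,y) with the discount described above. The constants, the occupancy map, and the replica naming are illustrative assumptions; the slides only state that placing a replica over its logical equivalent is cheap.

```python
def placement_cost(cell, slot, occupant, base=1.0, overlap=10.0, discount=0.1):
    """Hypothetical p(x, y) for placing `cell` at `slot`.
    occupant: slot -> name of the cell already there (absent if empty).
    Replicas are named <original> + 'R', e.g. 'DR' for a replica of 'D'."""
    here = occupant.get(slot)
    if here is None:
        return base      # empty slot: a real replica would be created
    if cell == here + "R":
        return discount  # over its logical equivalent: the two are unified
    return overlap       # over an unrelated cell: must be legalized later


occupant = {(1, 1): "D", (2, 1): "E"}
print(placement_cost("DR", (1, 1), occupant))  # 0.1
print(placement_cost("DR", (0, 0), occupant))  # 1.0
print(placement_cost("DR", (2, 1), occupant))  # 10.0
```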

Fan-in Tree Embedding

Given: fan-in tree; placement of sink and inputs; arrival times at inputs; placement and routing graph

Find: placement of internal tree nodes (gates) minimizing cost subject to timing constraints, i.e., a cost/delay tradeoff

[Figure: two embeddings of inputs A, B, C and the sink: lower delay with higher cost vs. higher delay with lower cost]

Fan-in Tree Embedding Example

Fan-out and Fan-in Tree

[Figure: a fan-out tree (source to A, B, C) and a fan-in tree (A, B, C to sink); panels labeled Bottom-up and Top-down]


Adaptation of S-Tree algorithm [Hrkic, Lillis, DAC 2002]

Keep:
• Graph model for the embedding target
• Modified timing-driven maze routing [S. Hur, J. Lillis, IEEE TCAD 2000]: multiple sources, multiple targets; at each vertex, keep a list of non-dominated solutions

Modify:
• Top-down vs. bottom-up
• Solution signature (c, t): c = cost, t = signal arrival time

Gate placement cost: p(x,y)

[Figure: Join step and modified maze routing]


Non-binary tree: multiple gate inputs

Top-down dynamic programming:
• Maze routing to populate solutions (deferred backtracking)
• Join solutions: c = p(x,y) + c1 + ... + cn, t = max(t1, ..., tn)

Bottom-up solution extraction:
• Backtrack to extract the maze route
• Extract the gate placement
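The Join formula translates directly into code. In this sketch (function name hypothetical), each gate input contributes a list of (c, t) solutions at the candidate gate location; every combination is joined with c = p(x,y) + c1 + ... + cn and t = max(t1, ..., tn), and only the non-dominated (cost, arrival) pairs are kept. Like the formula, it adds no gate delay. It also enumerates all combinations, which is exponential in the number of inputs, so it only illustrates the signature arithmetic.

```python
from itertools import product


def join_fanin(p_xy, input_solutions):
    """p_xy: gate placement cost at this location.
    input_solutions: one list of (c, t) solutions per gate input."""
    combined = sorted(
        (p_xy + sum(c for c, _ in combo), max(t for _, t in combo))
        for combo in product(*input_solutions)
    )
    frontier, best_t = [], float("inf")
    for c, t in combined:  # keep (c, t) only if faster than every cheaper solution
        if t < best_t:
            frontier.append((c, t))
            best_t = t
    return frontier


print(join_fanin(1, [[(1, 5), (3, 2)], [(2, 4)]]))  # [(4, 5), (6, 4)]
```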

Aside: Legalization

Use the gain-graph approach [Hur, Lillis; ICCAD00], modified to incorporate timing information

Optimization Flow

• Identify the critical sink (static timing analysis)
• Extract the fan-in tree: replication tree or epsilon-slowest-paths tree
• Embed the fan-in tree
• Decide which cells to replicate/unify
• Legalize placement
• Repeat while there is improvement

Enhancements

• Post-process unification: some cells are placed close to their logical equivalents but are not unified automatically; if one of the paths is non-critical, they can be unified without degrading performance
• Unification in the legalizer: during a ripple move, a cell may be placed on top of its replica; unify them and stop legalization
• Epsilon-slowest-paths tree: no randomization; dynamically modify the value of epsilon to enlarge the fan-in cone

Experiments

Algorithms:
• Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html]
• Local Replication [Beraudo, Lillis, DAC-03]
• RT-Embedding

20 MCNC benchmark circuits. Interested in: critical delay, amount of replication, wire usage

Tests performed in the FPGA domain; promising results

Experimental Setup

• Obtain a valid placement with the Timing-Driven VPR placer
• Route and evaluate with the Timing-Driven VPR router
• Optimizations evaluated: Local Replication, Replication Tree Embedding

Average values over all 20 circuits, normalized to VPR:

                Critical path delay
Algorithm       Winf     Wlow-stress    Wire length    Blocks
Local Repl      0.925    0.927          1.020          1.003
RT-Embed        0.858    0.869          1.084          1.004

• Delay improved for all circuits
• Best improvement: circuit pdc, 0.641
• Runtime penalty under 5% on the VPR flow


Replication Statistics

Circuit ex1010: 38 replications, 12 unifications

Ongoing Work

• Generalize to ASICs: include simultaneous buffering
• Mitigation of legalization noise: preventing (some) overlaps in embedding; more sophisticated placement cost
• Reconvergence: arborescence approach
• Simultaneous technology (re-)mapping: explore multiple tree topologies simultaneously (universal tree solver engine: U-Tree)

Review

Trees are everywhere!

Even in places where they seem to be absent

Tree-based algorithms:
• Can be very strong in generality of formulation and predictability
• Enable connection to a general placement/routing target
• Can capture tradeoffs between complex objectives
• Can sometimes be applied to drive optimization of graph structures

References: http://cs.uic.edu/~jlillis/pubs.html

S/P/SP-tree executables: http://eda.cs.uic.edu/software.html

Thank you

Timing-Driven Placement Legalization

[Figure: overlapping cells and nearby empty slots]

• After embedding, cells could overlap in the placement
• Moving cells on the critical path may harm timing
• Ripple-move strategy [Hur, Lillis, ICCAD 2000], modified to include both timing and wiring information
• Identify an overlap
• Identify up to 4 closest empty slots (one in each quadrant)
• Construct a gain graph: monotone paths from congested to free slots; edges carry the wire and timing gain of moving a cell to a neighboring slot
• Find the max-gain path and perform the ripple move (the gain could be negative)
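The max-gain search can be sketched as dynamic programming over monotone grid paths from the overlapping slot to one chosen empty slot. Every step moves toward the target, so the paths form a DAG and the best path is found in one pass. The `gain(a, b)` callback stands for the combined wire and timing gain of moving the cell in slot a into slot b; it is a hypothetical interface. A legalizer along these lines would repeat the search for each of the up to 4 empty slots and keep the best.

```python
def max_gain_monotone_path(src, dst, gain):
    """Return (total gain, slot path) of the best monotone path src -> dst."""
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    nx, ny = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    best = {(0, 0): (0.0, [src])}  # offset (i, j) -> (gain, path) from src
    for i in range(nx + 1):
        for j in range(ny + 1):
            if (i, j) == (0, 0):
                continue
            cell = (src[0] + sx * i, src[1] + sy * j)
            cands = []
            for prev in ((i - 1, j), (i, j - 1)):  # predecessors one step back
                if prev in best:
                    g, path = best[prev]
                    cands.append((g + gain(path[-1], cell), path + [cell]))
            best[(i, j)] = max(cands, key=lambda cand: cand[0])
    return best[(nx, ny)]


# Hypothetical gain: moves that stay in row 0 help, all others hurt.
row0 = lambda a, b: 1.0 if b[1] == 0 else -1.0
print(max_gain_monotone_path((0, 0), (2, 1), row0))
# (1.0, [(0, 0), (1, 0), (2, 0), (2, 1)])
```

Negative gains are allowed, as on the slide; the search still returns the least harmful ripple move.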
