Development and Application of Tree Synthesis Algorithms John Lillis University of Illinois Chicago

Post on 19-Jan-2016


Page 1: Development and Application of Tree Synthesis Algorithms

Development and Application of Tree Synthesis Algorithms

John Lillis

University of Illinois

Chicago

Page 2: Development and Application of Tree Synthesis Algorithms

Overview

Part I: Buffer tree synthesis
• Formulations
• S/P/SP-tree

Part II: Fanin tree embedding/replication
• Optimization across gate boundaries
• Interaction with placement

Page 3: Development and Application of Tree Synthesis Algorithms

Part I: Buffer Tree Synthesis

Page 4: Development and Application of Tree Synthesis Algorithms

Premises of Work

MAIN PREMISE: Powerful Buffer Tree Synthesis is a Core for Modern Design

• Conservation of Resources Crucial
– Estimate: 700-800K Buffers/Chip in Near Future
– Cost-Performance Tradeoffs
– General Cost Model

• Topology / Embedding / Buffering Spaces Should be Explored Simultaneously
– 2-Phase Approach Not Robust / Predictable
– Particularly Troublesome in Presence of Blockages

Page 5: Development and Application of Tree Synthesis Algorithms

Max Slack Weakness

[Figure: plot of slack vs. cost; annotation: overoptimized subtrees]

Page 6: Development and Application of Tree Synthesis Algorithms

Problem Formulation

Given:
• Location of Driver and Sinks
• Technology Parameters
• Timing Requirements
• Buffer Library
• Target Routing Graph (Blockages)

Find: Topology (in the corresponding space), its Embedding, and Buffer Assignment, Minimizing Cost s.t. Timing Constraints

Page 7: Development and Application of Tree Synthesis Algorithms

Philosophy of Constraint Imposition

Goals:
• Predictable Behavior
• Absence of ad-hoc Heuristics

Main Idea: Optimally Solve Constrained Variant of the Problem
• Well-Designed Constraints Produce
– Large Flexible Solution Space
– Tractability

Constraints: Topology Space

[Figure labels: Full space, Constrained space]

Page 8: Development and Application of Tree Synthesis Algorithms

Topology Embedding Flexibility

[Figure: three diagrams, each with nodes s, a, b, c]

Page 9: Development and Application of Tree Synthesis Algorithms

Target Routing Graph Construction

[Figure: routing graph with nodes s, a, b, c; legend: Buffer blockage, Routing blockage]

Page 10: Development and Application of Tree Synthesis Algorithms

Algorithmic Description

Timing-Driven Maze Routing

Topology Embedding

S-Tree P-Tree

SP-Tree

Page 11: Development and Application of Tree Synthesis Algorithms

Algorithmic Description

Timing-Driven Maze Routing

Topology Embedding

S-Tree P-Tree

SP-Tree

Page 12: Development and Application of Tree Synthesis Algorithms

Core Subroutine: Timing-Driven Maze Routing

• Generalization of [Hur et al.; TCAD Feb 2000]
• Single Target, Multiple Sources
• Finds non-dominated paths
• Simultaneous Buffer Insertion
• Handling of Blockages in Topology Synthesis

[Figure labels: Sources, Target]
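As a rough illustration of "finds non-dominated paths" with multiple sources and a single target, here is a minimal label-setting sketch. It is not the algorithm from the cited paper: buffer insertion is left out, each edge simply adds a fixed cost and a fixed delay, and the function name `pareto_paths` and the graph encoding are assumptions made for this example.

```python
import heapq

def pareto_paths(graph, sources, target):
    """Return the non-dominated (cost, delay) labels reaching `target`.

    graph: {u: [(v, edge_cost, edge_delay), ...]} with non-negative weights
    (hypothetical encoding; blocked vertices are simply absent from the graph).
    """
    labels = {}                                   # vertex -> kept labels
    heap = [(0, 0, s) for s in sources]           # every source starts at (0, 0)
    heapq.heapify(heap)
    while heap:
        cost, delay, u = heapq.heappop(heap)      # lexicographic (cost, delay) order
        kept = labels.setdefault(u, [])
        if any(c <= cost and d <= delay for c, d in kept):
            continue                              # dominated by (or equal to) a kept label
        kept.append((cost, delay))
        for v, edge_cost, edge_delay in graph.get(u, []):
            heapq.heappush(heap, (cost + edge_cost, delay + edge_delay, v))
    return sorted(labels.get(target, []))

if __name__ == "__main__":
    g = {
        "s": [("t", 5, 1), ("a", 1, 3), ("b", 3, 4)],
        "a": [("t", 1, 3)],
        "b": [("t", 3, 3)],
    }
    print(pareto_paths(g, ["s"], "t"))
```

Because labels leave the heap in non-decreasing cost order, a label only needs to be compared against those already kept at the same vertex. In the example the route through `b` is discarded because the direct edge is both cheaper and faster.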

Page 13: Development and Application of Tree Synthesis Algorithms

Algorithmic Description

Timing-Driven Maze Routing

Topology Embedding

S-Tree P-Tree

SP-Tree

Page 14: Development and Application of Tree Synthesis Algorithms

Topology Embedding

Goal: Obtain timing feasible embedding / buffering of given topology, minimizing cost

Solution: Dynamic Programming (bottom-up)

Page 15: Development and Application of Tree Synthesis Algorithms

Solution sets

A(u,v) represents a set of solutions that correspond to
• Vertex u in Topology
• Vertex v in Target Graph

A1b = Join(A1.left, A1.right)

A1 = GenDijkstra(A1b)

[Figure labels: A(u,v), u, v]

Page 16: Development and Application of Tree Synthesis Algorithms

Algorithmic Description

Timing-Driven Maze Routing

Topology Embedding

S-Tree P-Tree

SP-Tree

Page 17: Development and Application of Tree Synthesis Algorithms

S-Tree

Notion of localities:
• Spatial
• Temporal
• Polarity

Partition sinks into 2 sets based on:
• estimated timing criticality
• signal polarity requirements
• some other criteria...

Subtrees can break the topology and “stitch” at a different place

Page 18: Development and Application of Tree Synthesis Algorithms

S-Tree Topology Space

Sink partition: {a,c,d}, {b}

[Figure: three diagrams, each with nodes s, a, b, c, d]

Page 19: Development and Application of Tree Synthesis Algorithms

S-Tree Recurrence

A1b = Join(A1.left, A1.right)

A1 = GenDijkstra(A1b)

A2b = Join(A2.left, A2.right)

A2 = GenDijkstra(A2b)

A12b = Join(A12.left, A12.right) + Join(A1, A2)

A12 = GenDijkstra(A12b)
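The Join step of the recurrence can be sketched as follows. This is only an illustration under stated assumptions: each solution is taken to be a (cost, load, required time) triple, the "+" in the A12b line is read as set union followed by pruning, and the names `prune`, `join` and `s_tree_merge` are hypothetical. The GenDijkstra propagation step is not shown.

```python
def prune(solutions):
    """Keep only non-dominated (cost, load, req) triples.

    A triple dominates another if its cost and load are no larger and its
    required time is no smaller (assumed signature, not taken from the slides).
    """
    unique = sorted(set(solutions))
    return [s for s in unique
            if not any(o != s and o[0] <= s[0] and o[1] <= s[1] and o[2] >= s[2]
                       for o in unique)]

def join(left, right):
    """Merge two child solution sets that meet at the same graph vertex."""
    return prune([(cl + cr, ll + lr, min(ql, qr))
                  for cl, ll, ql in left
                  for cr, lr, qr in right])

def s_tree_merge(a12_left, a12_right, a1, a2):
    """A12b = Join(A12.left, A12.right) + Join(A1, A2), with '+' read as union."""
    return prune(join(a12_left, a12_right) + join(a1, a2))

if __name__ == "__main__":
    left = [(1, 2, 10), (3, 2, 14)]
    right = [(2, 1, 12)]
    print(join(left, right))
    print(s_tree_merge(left, right, [(1, 1, 11)], [(1, 1, 13)]))
```

The quadratic pruning is kept for readability; a sorted sweep would be the usual choice for large solution sets.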

Page 20: Development and Application of Tree Synthesis Algorithms

S-Tree Topology Space

[Figure: an initial topology and several alternative topologies over nodes s, a, b, c, d, e, f]

Page 21: Development and Application of Tree Synthesis Algorithms

Incorporating polarity

4 sets:
• critical & positive signal polarity
• critical & negative
• non-critical & positive
• non-critical & negative

Other partitioning schemes...

Page 22: Development and Application of Tree Synthesis Algorithms

Algorithmic Description

Timing-Driven Maze Routing

Topology Embedding

S-Tree P-Tree

SP-Tree

Page 23: Development and Application of Tree Synthesis Algorithms

P-Tree Topology Space

[Figure: topologies rooted at s over the sink order a b c d e]

All Permutation-Constrained Topologies
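To give a feel for the size of this space, the sketch below counts binary topologies whose leaves, read left to right, follow a fixed sink order. Treating a permutation-constrained topology this way is an assumption made for illustration, and `count_topologies` is a hypothetical name. The same interval structure (split a contiguous range of sinks into two contiguous sub-ranges) is what makes a dynamic program over the space possible.

```python
from functools import lru_cache

def count_topologies(num_sinks):
    """Count binary trees whose leaves appear in one fixed left-to-right order."""
    @lru_cache(maxsize=None)
    def count(i, j):
        if i == j:
            return 1                      # a single sink is one trivial subtree
        # split sinks i..j into a left range i..k and a right range k+1..j
        return sum(count(i, k) * count(k + 1, j) for k in range(i, j))
    return count(0, num_sinks - 1)

if __name__ == "__main__":
    print(count_topologies(5))            # sinks a b c d e
```

Replacing the product-and-sum with a Join over solution sets per interval would turn the counter into an optimizer over the same space.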

Page 24: Development and Application of Tree Synthesis Algorithms

Limitations of P-Tree Space

Isolation of Critical / Non-Critical Subtrees: “Temporal-Locality”

Min WL May Not Produce Min Cost

[Figure: two diagrams, each labeled Driver, Critical, Non-critical]

Page 25: Development and Application of Tree Synthesis Algorithms

Algorithmic Description

Timing-Driven Maze Routing

Topology Embedding

S-Tree P-Tree

SP-Tree

Page 26: Development and Application of Tree Synthesis Algorithms

SP-Tree

Combine everything said so far...

From P-Tree
• Spatial locality
• Robustness

From S-Tree
• Temporal locality
• Polarity locality
• Ability to fix “topology problems” by “stitching”

Page 27: Development and Application of Tree Synthesis Algorithms

Solution Space

[Figure: solution-space diagram; labels: Fixed topo., S-Tree, P-Tree, SP-Tree, Entire space]

Page 28: Development and Application of Tree Synthesis Algorithms

Experiments

Randomly generated nets

Non-uniform required arrival time

Non-uniform sink input capacitance

Buffer-biased cost

Interested in:
• Min cost feasible solution
• Max slack solution for verification
• Runtime

More details in the paper...

Page 29: Development and Application of Tree Synthesis Algorithms

Algorithms for Experiments

S-Tree

P-Tree

SP-Tree

RMP [Cong, Yuan; DAC 2000]

RMP-Quick [Cong, Yuan; DAC 2000]

Page 30: Development and Application of Tree Synthesis Algorithms

Results

[Figure: bar chart for Net2-06 comparing RMP, RMP-Qck, S-Tree, P-Tree, SP-Tree. Metric groups: Wire, Buf, Cost, Slack, MaxSlack; Wire, Buf, Cost, Runtime. Panels: Min cost feasible, Max slack. Axis label: # buffers]

Page 31: Development and Application of Tree Synthesis Algorithms

Results

[Figure: bar chart for Net2-08 comparing RMP, RMP-Qck, S-Tree, P-Tree, SP-Tree. Metric groups: Wire, Buf, Cost, Slack, MaxSlack; Wire, Buf, Cost, Runtime. Panels: Min cost feasible, Max slack. Axis label: # buffers]

Page 32: Development and Application of Tree Synthesis Algorithms

Results

[Figure: bar chart for Net2-12 comparing RMP, RMP-Qck, S-Tree, P-Tree, SP-Tree. Metric groups: Wire, Buf, Cost, Slack, MaxSlack; Wire, Buf, Cost, Runtime. Panels: Min cost feasible, Max slack. Axis label: # buffers]

Page 33: Development and Application of Tree Synthesis Algorithms

SP-Tree vs. P-Tree

Page 34: Development and Application of Tree Synthesis Algorithms

Conclusions

Key Concepts:
• General Cost Models
– Routing Congestion
– Buffer Congestion
• Orthogonal Separation of Spatial and Temporal Locality
• Polarity Requirements
• Routing and Buffer Blockages

Targets: Small-to-Medium Sized Signal Nets

Results Summary
• Highly Cost-Efficient, High Performance Solutions
• Substantially Outperforms Prior Approaches in Solution Quality and Runtime

Page 35: Development and Application of Tree Synthesis Algorithms

Part II: Fanin Tree Embedding/Replication

Page 36: Development and Application of Tree Synthesis Algorithms

Replication Overview

• Hrkic, Lillis, Beraudo (DAC04, IWLS04)
• Concept: Netlist structure limits potential of timing-driven placement
• Difficult for top-down synthesis to fix
• Main issue: inherently non-monotone paths
• Approach (Hrkic, Lillis; DAC04) touches on placement, synthesis (netlist perturbation) and routing.

Page 37: Development and Application of Tree Synthesis Algorithms

Logic Replication

[Figure: cells A, B, C, D, E shown twice, the second time with a replica CR of cell C]

• Duplicate logic cell
• Preserve functionality
• Improve timing
• Place / Move cells
• Adjust connections

Page 38: Development and Application of Tree Synthesis Algorithms

Early Work

[Figure: cells A, B, C, D, E shown twice, the second time with a replica CR of cell C]

• Use replication to straighten I/O paths
• Local monotonicity [Beraudo, Lillis, DAC 2003]
• Sequence of 3 cells on the path
• Incremental framework

Page 39: Development and Application of Tree Synthesis Algorithms

Limitations of Local Monotonicity

[Figure: placement of cells A, B, C, D, E, F]

• Local Monotonicity satisfied
• Still many non-monotone paths

Page 40: Development and Application of Tree Synthesis Algorithms

Replication Tree Approach [Hrkic et al., DAC04]

• Identify critical sink
• Extract critical fan-in tree (Replication Tree)
• Optimize fan-in tree (Fan-in Tree Embedding)
• Legalize placement

Page 41: Development and Application of Tree Synthesis Algorithms

Slowest Paths Tree

• Focus on slowest paths
• Find slowest paths tree from critical sink
• Include paths within epsilon of current critical delay
• Focus on most critical portions of fan-in cone
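One possible reading of the epsilon rule is sketched below: walk backward from the critical sink and keep a fan-in edge when the slowest path through it is within epsilon of the critical delay. The data layout (`fanin`, `arrival`, `delay`) and the function name are assumptions for this example, and giving each cell a single parent on first visit is a simplification chosen here to keep the result a tree.

```python
def slowest_paths_tree(fanin, arrival, delay, sink, eps):
    """Extract tree edges (u, v) lying on near-critical paths into `sink`.

    fanin:   {v: [u, ...]} drivers of each cell (hypothetical encoding)
    arrival: {cell: latest signal arrival time at the cell output}
    delay:   {(u, v): delay from the output of u to the output of v}
    """
    critical = arrival[sink]
    to_sink = {sink: 0}          # delay from each kept cell down to the sink
    tree = []
    stack = [sink]
    while stack:
        v = stack.pop()
        for u in fanin.get(v, []):
            through = arrival[u] + delay[(u, v)] + to_sink[v]
            if through >= critical - eps and u not in to_sink:
                to_sink[u] = delay[(u, v)] + to_sink[v]
                tree.append((u, v))
                stack.append(u)
    return tree

if __name__ == "__main__":
    fanin = {"s": ["a", "b"], "a": ["c", "d"]}
    arrival = {"c": 1, "d": 4, "a": 6, "b": 3, "s": 8}
    delay = {("c", "a"): 2, ("d", "a"): 2, ("a", "s"): 2, ("b", "s"): 2}
    print(slowest_paths_tree(fanin, arrival, delay, "s", eps=1))
```

With a small epsilon only the critical path d, a, s survives; raising epsilon pulls in more of the fan-in cone.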

Page 42: Development and Application of Tree Synthesis Algorithms

Replication Tree

[Figure: circuit with cells A, B, C, D, E, F shown twice, the second time with replicas AR, BR, CR, DR, FR]

• Most circuits do not contain large fan-in trees due to reconvergence
• Given a critical tree, temporarily replicate the entire tree
• Assign connections:
– if (u,v) is a tree edge, connect uR to vR
– else connect u to vR
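The connection rule can be written out directly. In this sketch the netlist is a hypothetical `fanin` dictionary, replicas are named by appending "R" as on the slide, and the extra check that the driver itself has a replica is an assumption added so that unreplicated drivers fall back to the original cell.

```python
def wire_replicas(fanin, tree_edges, replicated):
    """Return the input list of every replica cell.

    fanin:      {v: [u, ...]} drivers of each original cell
    tree_edges: set of (u, v) edges that belong to the critical fan-in tree
    replicated: set of cells that receive a temporary replica
    """
    replica_fanin = {}
    for v in sorted(replicated):
        inputs = []
        for u in fanin.get(v, []):
            if (u, v) in tree_edges and u in replicated:
                inputs.append(u + "R")    # tree edge: connect uR to vR
            else:
                inputs.append(u)          # non-tree edge: connect u to vR
        replica_fanin[v + "R"] = inputs
    return replica_fanin

if __name__ == "__main__":
    fanin = {"D": ["A", "B"], "F": ["D", "C"]}
    tree_edges = {("A", "D"), ("D", "F")}
    print(wire_replicas(fanin, tree_edges, {"A", "D", "F"}))
```

The original cells keep their own connections, so the replicated tree has no reconvergent inputs of its own while the rest of the netlist is untouched.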

Page 43: Development and Application of Tree Synthesis Algorithms

Placement cost

[Figure: cells A, B, C, D, E, F with replicas AR, BR, CR, DR, FR]

• Replication is temporary
• Placement cost is crucial
• Cost discount for placing a cell over its logical equivalent
– low cost for placing DR over D
– actual replication will never occur
– multiple low-cost locations possible

Page 44: Development and Application of Tree Synthesis Algorithms

Fan-in Tree Embedding

Given:
• Fan-in tree
• Placement of sink and inputs
• Arrival times at inputs
• Placement and routing graph

Find: Placement of internal tree nodes (Gates) Minimizing Cost s.t. Timing Constraints
• cost / delay tradeoff

Page 45: Development and Application of Tree Synthesis Algorithms

Fan-in Tree Embedding Example

[Figure: two embeddings with nodes A, B, C and a sink; one labeled "Lower delay, higher cost", the other "Higher delay, lower cost"]

Page 46: Development and Application of Tree Synthesis Algorithms

Fan-out and Fan-in Tree

[Figure: fan-out tree from a source to A, B, C (Bottom-up) and fan-in tree from A, B, C to a sink (Top-down)]

Page 47: Development and Application of Tree Synthesis Algorithms

Fan-in Tree Embedding

Adaptation of S-Tree algorithm [Hrkic, Lillis, DAC 2002]

Keep:
• Graph Model for Embedding Target
• Modified Timing-Driven Maze Routing
– multiple sources, multiple targets
– at each vertex keep a list of non-dominated solutions
– S. Hur, J. Lillis, IEEE TCAD 2000

Modify:
• Top-down vs. Bottom-up
• Solution signature (c,t):
– c - cost
– t - signal arrival time
• Gate placement cost p(x,y)

Page 48: Development and Application of Tree Synthesis Algorithms

Fan-in Tree Embedding

[Figure labels: Join, Modified maze routing]

• Non-binary tree: multiple gate inputs
• Top-Down Dynamic Programming
– Maze Routing to populate solutions (deferred backtracking)
– Join Solutions: $c = p_{x,y} + c_1 + \dots + c_n$, $t = \max(t_1, \dots, t_n)$
• Bottom-Up solution extraction
– backtrack to extract maze route
– extract gate placement
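The join formula for a gate with several inputs can be sketched like this. The sketch enumerates every combination of input solutions at one candidate location and keeps the non-dominated (c, t) pairs; the exhaustive product and the name `join_inputs` are choices made for this example, and gate delay is omitted because the slide formula does not include it.

```python
from itertools import product

def join_inputs(place_cost, input_solutions):
    """Combine per-input (c_i, t_i) lists into gate-output (c, t) solutions.

    place_cost:      p(x, y) for putting the gate at the candidate location
    input_solutions: one list of (cost, arrival time) pairs per gate input
    """
    combos = {
        (place_cost + sum(c for c, _ in pick), max(t for _, t in pick))
        for pick in product(*input_solutions)
    }
    front, best_t = [], float("inf")
    for c, t in sorted(combos):          # increasing cost
        if t < best_t:                   # keep only strictly earlier arrivals
            front.append((c, t))
            best_t = t
    return front

if __name__ == "__main__":
    inputs = [[(1, 5), (3, 2)], [(2, 4), (4, 1)]]
    print(join_inputs(1, inputs))
```

For gates with many inputs a pairwise merge of sorted fronts avoids the combinatorial product; the enumeration here is only meant to mirror the formula.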

Page 49: Development and Application of Tree Synthesis Algorithms

Aside: Legalization

Use Modified Gain-Graph approach [Hur, Lillis; ICCAD00]

Modified to incorporate timing information

Page 50: Development and Application of Tree Synthesis Algorithms

Optimization Flow

• Identify critical sink (static timing analysis)
• Extract Fan-in Tree
– Replication Tree
– epsilon-Slowest Paths Tree
• Embed Fan-in Tree
• Decide which cells to Replicate / Unify
• Legalize placement
• Repeat while there is improvement

Page 51: Development and Application of Tree Synthesis Algorithms

Enhancements

• Post-process unification
– some cells placed close to their logical equivalents
– no automatic unification
– if one of the paths is non-critical it is possible to unify without degrading performance
• Unification in legalizer
– during ripple-move a cell may be placed on top of its replica
– unify them and stop legalization
• epsilon-Slowest Paths Tree
– no randomization
– dynamically modify value of epsilon to enlarge the fan-in cone

Page 52: Development and Application of Tree Synthesis Algorithms

Experiments

Algorithms
• Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html]
• Local Replication [Beraudo, Lillis, DAC-03]
• RT-Embedding

20 MCNC Benchmark Circuits

Interested in:
• Critical delay
• Amount of replication
• Wire usage

Tests performed in FPGA domain

Promising results

Page 53: Development and Application of Tree Synthesis Algorithms

Experimental Setup

Obtain valid placement with Timing-Driven VPR placer

Route and Evaluate with Timing-Driven VPR router

Local Replication Replication Tree Embedding

Page 54: Development and Application of Tree Synthesis Algorithms

Average values over all 20 circuits, normalized to VPR

             critical path delay       wire length   blocks
             Winf      Wlow-stress
Local Repl   0.925     0.927           1.020         1.003
RT-Embed     0.858     0.869           1.084         1.004

• Delay improved for all circuits
• Best improvement for circuit pdc: 0.641
• Runtime penalty under 5% on the VPR flow

Page 55: Development and Application of Tree Synthesis Algorithms

Replication Statistics

Circuit ex1010: 38 replications, 12 unifications

Page 56: Development and Application of Tree Synthesis Algorithms

Ongoing Work

• Generalize to ASICs
– Include simultaneous buffering
• Mitigation of legalization noise
– Preventing (some) overlaps in embedding
– More sophisticated placement cost
• Reconvergence - arborescence approach
• Simultaneous technology (re-)mapping
– Explore multiple Tree Topologies simultaneously (Universal Tree solver engine: U-Tree)

Page 57: Development and Application of Tree Synthesis Algorithms

Review

Trees are everywhere!

Even in places where they seem to be absent

Tree based algorithms can be very strong in generality of formulation and predictability
• Enable connection to general placement/routing target
• Can capture tradeoffs between complex objectives
• Can sometimes be applied to drive optimization of graph structures.

References: http://cs.uic.edu/~jlillis/pubs.html

S/P/SP-tree executables: http://eda.cs.uic.edu/software.html

Page 58: Development and Application of Tree Synthesis Algorithms

Thank you

Page 59: Development and Application of Tree Synthesis Algorithms

Timing-Driven Placement Legalization

[Figure labels: Overlap, Empty]

• After embedding, cells could overlap in the placement
• Moving cells on critical path may harm timing
• Ripple-move strategy [Hur, Lillis, ICCAD 2000]
• Modified to include both timing and wiring information

Page 64: Development and Application of Tree Synthesis Algorithms

Timing-Driven Placement Legalization

[Figure labels: Overlap, Empty]

• Identify overlap
• Identify up to 4 closest empty slots (one in each quadrant)
• Construct gain graph
– monotone paths from congested to free slots
– edges: gain of moving a cell to neighboring slot
– wire and timing gain
• Find max-gain path and perform ripple-move
– gain could be negative
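The max-gain search over monotone paths can be sketched as a small dynamic program on a slot grid. This is an illustration rather than the cited gain-graph method: slots are (x, y) pairs, `gain(a, b)` is a caller-supplied function standing in for the combined wire and timing gain, and only one overlapped slot and one empty slot are considered.

```python
from functools import lru_cache

def max_gain_path(start, empty, gain):
    """Best monotone path from the overlapped slot to the empty slot.

    start, empty: (x, y) grid slots; gain(a, b): gain of moving the cell in
    slot a into neighbouring slot b (may be negative).
    Returns (total gain, tuple of slots along the path).
    """
    step_x = 1 if empty[0] >= start[0] else -1
    step_y = 1 if empty[1] >= start[1] else -1

    @lru_cache(maxsize=None)
    def best(x, y):
        if (x, y) == empty:
            return 0, ((x, y),)
        options = []
        if x != empty[0]:                          # move one slot toward empty in x
            g, path = best(x + step_x, y)
            options.append((g + gain((x, y), (x + step_x, y)), ((x, y),) + path))
        if y != empty[1]:                          # move one slot toward empty in y
            g, path = best(x, y + step_y)
            options.append((g + gain((x, y), (x, y + step_y)), ((x, y),) + path))
        return max(options)

    return best(*start)

if __name__ == "__main__":
    table = {
        ((0, 0), (1, 0)): 2, ((1, 0), (1, 1)): -1,
        ((0, 0), (0, 1)): -3, ((0, 1), (1, 1)): 5,
    }
    print(max_gain_path((0, 0), (1, 1), lambda a, b: table[(a, b)]))
```

Running this once per candidate empty slot (up to four, one per quadrant) and taking the best result gives the path along which cells would be ripple-moved.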

Page 65: Development and Application of Tree Synthesis Algorithms

Review