Chapter 3: Partitioning and Clustering (cadlab.cs.ucla.edu/~cong/cs258f/chap3_06w.pdf)
CS258F 2006W
Prof. J. Cong 1
Jason Cong 1
Outline
Circuit Partitioning formulation
Importance of Circuit Partitioning
Partitioning Algorithms
Circuit Clustering Formulation
Clustering Algorithms
Basic Notations
• Node = cell, logic gate, functional block, ...
• Pin = an input/output port of a node
• Net = set of pins to be connected = hyperedge
• Netlist = set of nodes + nets = hypergraph
Circuit Partitioning Formulation
Bi-partitioning formulation: minimize interconnections between partitions
  Minimum cut: min c(X, X′)
  Minimum bisection: min c(X, X′) subject to |X| = |X′|
  Minimum ratio-cut: min c(X, X′) / (|X|·|X′|)
[Figure: partitions X and X′ with cut c(X, X′) between them]
A Bi-Partitioning Example
Min-cut size = 13; min-bisection size = 300; min-ratio-cut size = 19
[Figure: example graph with clusters a-f and edge weights 4, 9, 10, and 100; the min-cut, min-bisection, and min-ratio-cut lines are shown]
Ratio-cut helps to identify natural clusters
Circuit Partitioning Formulation (Cont’d)
General multi-way partitioning formulation:
Partitioning a network N into N1, N2, …, Nk such that
Each partition has an area constraint: ∑_{v ∈ N_i} a(v) ≤ A_i
Each partition has an I/O constraint: c(N_i, N − N_i) ≤ I_i
Minimize the total interconnection: ∑_i c(N_i, N − N_i)
Importance of Circuit Partitioning
Divide-and-conquer methodology
The most effective way to solve problems of high complexity
E.g.: min-cut based placement, partitioning-based test generation,…
System-level partitioning for multi-chip designs
inter-chip interconnection delay dominates system performance.
Circuit emulation/parallel simulation
partition large circuit into multiple FPGAs (e.g. Quickturn), or multiple special-purpose processors (e.g. Zycad).
Parallel CAD development
Task decomposition and load balancing
In deep-submicron designs, partitioning defines local and global interconnect, and has significant impact on circuit performance
Partitioning Algorithms
Iterative partitioning algorithms
Spectral based partitioning algorithms
Net partitioning vs. module partitioning
Multi-way partitioning
Multi-level partitioning (to be discussed after clustering)
Further study in partitioning techniques
Iterative Partitioning Algorithms
Greedy iterative improvement method
[Kernighan-Lin 1970]
[Fiduccia-Mattheyses 1982]
[Krishnamurthy 1984]
Simulated Annealing
[Kirkpatrick-Gelatt-Vecchi 1983]
[Greene-Supowit 1984] (see Appendix; SA will be formally introduced in the Floorplan chapter)
Kernighan-Lin's Algorithm
Pair-wise exchange of nodes to reduce cut size
Allow cut size to increase temporarily within a pass
  Compute the gain of each candidate swap
  Repeat
    Perform a feasible swap of max gain
    Mark the swapped nodes "locked"
    Update swap gains
  Until no feasible swap
  Find the max prefix partial sum in the gain sequence g1, g2, ..., gm
  Make the corresponding swaps permanent
Start another pass if the current pass reduces the cut size (usually converges after a few passes)
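As a concrete sketch, one pass of the procedure above might look as follows in Python (a didactic O(n³) version on an unweighted graph with equal-size parts; the function and variable names are ours, not from the original paper):

```python
# One Kernighan-Lin pass: tentatively swap all pairs greedily with locking,
# then keep only the prefix of swaps with maximum total gain.
def kl_pass(adj, A, B):
    """Run one K-L pass; return the new parts and the cut-size reduction."""
    A, B = set(A), set(B)
    side = {v: 0 for v in A}
    side.update({v: 1 for v in B})

    def d(v):  # D-value: external cost minus internal cost
        ext = sum(1 for u in adj[v] if side[u] != side[v])
        return ext - (len(adj[v]) - ext)

    locked, gains, swaps = set(), [], []
    for _ in range(min(len(A), len(B))):
        # choose the feasible (unlocked) swap of maximum gain
        g, a, b = max((d(a) + d(b) - 2 * (b in adj[a]), a, b)
                      for a in A - locked for b in B - locked)
        gains.append(g)
        swaps.append((a, b))
        locked |= {a, b}
        side[a], side[b] = 1, 0            # tentatively swap a and b
        A.remove(a); B.remove(b); A.add(b); B.add(a)

    def undo(pairs):
        for a, b in pairs:
            A.remove(b); B.remove(a); A.add(a); B.add(b)

    # keep only the prefix of swaps whose total gain is maximum
    prefix = [sum(gains[:i + 1]) for i in range(len(gains))]
    k = max(range(len(prefix)), key=prefix.__getitem__)
    if prefix[k] <= 0:
        undo(swaps)                        # no improvement: undo everything
        return A, B, 0
    undo(swaps[k + 1:])                    # undo swaps past the best prefix
    return A, B, prefix[k]
```

Calling `kl_pass` repeatedly until it returns a zero gain mirrors the multi-pass behavior described above.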
[Figure: nodes u and v exchanged across the cut; swapped nodes become locked]
Fiduccia-Mattheyses' Improvement
Each pass of the KL algorithm takes O(n³) or O(n² log n) time (n: # modules): choosing the swap with max gain and updating swap gains take O(n²) time
The FM algorithm takes O(p) time per pass (p: # pins)
Key ideas in the FM algorithm:
  Each move affects only a few gains ⇒ constant-time gain updating per move (amortized)
  Maintain a list of gain buckets ⇒ constant-time selection of the move with max gain
Further improvement by Krishnamurthy: look-ahead in gain computation
[Figure: gain buckets ranging from gmax down to −gmax, holding the free cells of parts V1 and V2]
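The gain-bucket idea can be sketched as a small data structure; Python lists stand in for the doubly linked lists of a real implementation, and the class and method names are illustrative, not from the paper:

```python
# One bucket per gain value in [-gmax, gmax], plus a max-gain pointer, so
# selecting the best move and updating a cell's gain are O(1) amortized.
# Gains are assumed to stay within [-gmax, gmax], as in FM.
class GainBuckets:
    def __init__(self, gmax):
        self.gmax = gmax
        self.bucket = {g: [] for g in range(-gmax, gmax + 1)}
        self.where = {}        # cell -> its current gain
        self.maxg = -gmax      # highest possibly non-empty bucket

    def insert(self, cell, gain):
        self.bucket[gain].append(cell)
        self.where[cell] = gain
        self.maxg = max(self.maxg, gain)

    def update(self, cell, delta):
        g = self.where.pop(cell)
        self.bucket[g].remove(cell)   # O(1) with a real linked list
        self.insert(cell, g + delta)

    def pop_max(self):
        while self.maxg > -self.gmax and not self.bucket[self.maxg]:
            self.maxg -= 1            # amortized over all pops
        cell = self.bucket[self.maxg].pop()
        del self.where[cell]
        return cell
```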
Further Improvement on Iterative Improvement-Based Partitioning Algorithms
• Basic idea:
  – Encourage cluster-based moves during the iterative improvement process
  – Example: LSR partitioning: focus on loose and stable net removal
Loose Net Removal (LR)
• Net status transition: free net → loose net → locked net
• Loose net removal
[Figure: legend for free/loose/locked nets and uncut nets; a loose net spans blocks 0-2, and its free cells are moved so the net becomes uncut]
LR Implementation
• Basic Approach
  – increase the gain of free cells of loose nets outside the anchor
  – focus on smaller nets with fewer free cells (Fn) and more locked cells (Ln)
• Cell Gain Increase Function
  – accumulate: until complete removal
  – set threshold: an array-based bucket can be used
  – expanded gain range: no tie-breaking scheme (LIFO, LA3) needed

  incr(n) = ⌊ (max_netsize / size(n)) · (∑_{c ∈ Ln} deg(c)) / (∑_{c ∈ Fn} deg(c)) ⌋
Stable Net Transition (SNT)
• Stable net [Shibuya et al. 1995]
  – remains cut throughout the entire run
  – more than 80% of cut nets are stable
• Implementation
  – detect stable nets by comparing cutsets
  – move all unprocessed cells of a stable net into a single block
  – serve as the initial partition for the next run
    • fast convergence
    • efficiently explores more local minima
• Loose and Stable net Removal (LSR)
  – combination of the two effective net removal schemes
    • LR: remove small loose nets during each pass
    • SNT: remove large stable nets at the end of each pass
Cutsize Reduction by LSR
• Bipartitioning under a 45-55% balance criterion
[Bar chart: cutsize improvement relative to FM (0.0%); SNT −8.0%, LR −45.6%, LSR −46.9%]
Partitioning Algorithms
Iterative partitioning algorithms
Spectral based partitioning algorithms
Net partitioning vs. module partitioning
Multi-way partitioning
Further study in partitioning techniques
Spectral Based Partitioning Algorithms
[Figure: graph on nodes a, b, c, d with weighted edges a–b (1), a–d (3), b–d (4), c–d (3)]

        a  b  c  d                a  b  c  d
  A = a[ 0  1  0  3 ]      D = a[ 4  0  0  0 ]
      b[ 1  0  0  4 ]          b[ 0  5  0  0 ]
      c[ 0  0  0  3 ]          c[ 0  0  3  0 ]
      d[ 3  4  3  0 ]          d[ 0  0  0 10 ]
D: degree matrix; A: adjacency matrix; D-A: Laplacian matrix
Eigenvectors of D-A form the Laplacian spectrum of G
Some Applications of Laplacian Spectrum
Placement and floorplan
[Hall 1970] [Otten 1982] [Frankle-Karp 1986] [Tsay-Kuh 1986]
Bisection lower bound and computation
[Donath-Hoffman 1973] [Barnes 1982] [Boppana 1987]
Ratio-cut lower bound and computation
[Hagen-Kahng 1991] [Cong-Hagen-Kahng 1992]
Eigenvalues and Eigenvectors
If Ax = λx, then λ is an eigenvalue of A, and x is an eigenvector of A w.r.t. λ
(note that Kx is also an eigenvector for any constant K ≠ 0).

      [ a11 a12 ... a1n ] [ x1 ]   [ a11 x1 + a12 x2 + ... + a1n xn ]
 Ax = [ ...             ] [ .. ] = [ ...                             ]
      [ an1 an2 ... ann ] [ xn ]   [ an1 x1 + an2 x2 + ... + ann xn ]
A Basic Property

  x^T A x = ∑_{i=1..n} x_i ( ∑_{j=1..n} a_ij x_j ) = ∑_{i,j} a_ij x_i x_j
Basic Idea of Laplacian Spectrum Based Graph Partitioning
Given a bisection (X, X′), define a partition vector x = (x1, x2, ..., xn) with
  x_i = +1 if i ∈ X,  x_i = −1 if i ∈ X′
Clearly x ⊥ 1 and x ≠ 0, where 1 = (1, 1, ..., 1) and 0 = (0, 0, ..., 0).
For a partition vector x:
  ∑_{(i,j)∈E} (x_i − x_j)² = 4 · C(X, X′)
Let S = {x | x ⊥ 1 and x ≠ 0}. Finding the best partition vector x such that the total edge cut C(X, X′) is minimum is relaxed to finding x ∈ S such that ∑_{(i,j)∈E} (x_i − x_j)² is minimum.
Linear placement interpretation: place module i at position x_i and minimize the total squared wirelength.
[Figure: modules placed at positions x1, x2, x3, ..., xn on a line]
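The identity above is easy to check numerically on a made-up graph: each cut edge contributes (1 − (−1))² = 4 and each uncut edge contributes 0:

```python
# Numeric check of  sum over (i,j) in E of (x_i - x_j)^2  =  4 * C(X, X')
# on a small example graph with a chosen bisection.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
X = {0, 1}                                   # X' = {2, 3}
x = [1 if v in X else -1 for v in range(4)]  # partition vector
lhs = sum((x[i] - x[j]) ** 2 for i, j in edges)
cut = sum(1 for i, j in edges if (i in X) != (j in X))
assert lhs == 4 * cut                        # here: 12 == 4 * 3
```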
Property of Laplacian Matrix

( 1 )  x^T Q x = x^T (D − A) x
              = x^T D x − x^T A x
              = ∑_i d_i x_i² − ∑_{i,j} a_ij x_i x_j
              = ∑_i d_i x_i² − 2 ∑_{(i,j)∈E} a_ij x_i x_j
              = ∑_{(i,j)∈E} a_ij (x_i − x_j)²

If x is a partition vector, this is the squared wirelength = 4 · C(X, X′).
Therefore, we want to minimize x^T Q x.
[Figure: modules placed at positions x1, x2, x3, ..., xn on a line]
Property of Laplacian Matrix (Cont'd)
( 2 ) Q is symmetric and positive semi-definite, i.e.
  (i)  x^T Q x = ∑_{i,j} Q_ij x_i x_j ≥ 0
  (ii) all eigenvalues of Q are ≥ 0
( 3 ) The smallest eigenvalue of Q is 0, with corresponding eigenvector x0 = (1, 1, ..., 1)
  (not interesting: all modules overlap, and x0 ∉ S)
( 4 ) By the Courant-Fischer minimax principle, the 2nd smallest eigenvalue satisfies:
  λ2 = min_{x ∈ S} (x^T Q x) / |x|²
Results on Spectral Based Graph Partitioning
Min bisection cost: c(X, X′) ≥ n·λ2 / 4
Min ratio-cut cost: c(X, X′)/(|X|·|X′|) ≥ λ2 / n (homework)
The eigenvector of the second smallest eigenvalue gives the best linear placement
Compute the best bisection or ratio-cut based on the second smallest eigenvector
Computing the Eigenvector
Only need one eigenvector (the one for the second smallest eigenvalue)
Q is symmetric and sparse
Use block Lanczos Algorithm
Interpreting the Eigenvector
Linear ordering of modules (by eigenvector value)
Try all splitting points, choose the best one using the bisection or ratio-cut metric
  23 10 22 34 6 21 8
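The split-point search on a given linear ordering can be sketched as follows (the graph and ordering are made up; computing the ordering itself, e.g. from the second eigenvector, is assumed done elsewhere):

```python
# Try every prefix/suffix split of a linear ordering and keep the best one
# under the ratio-cut metric cut(X, X') / (|X| * |X'|).
def best_ratio_cut_split(order, adj):
    best = None
    for k in range(1, len(order)):          # split after position k
        X = set(order[:k])
        cut = sum(1 for v in X for u in adj[v] if u not in X)
        ratio = cut / (len(X) * (len(order) - len(X)))
        if best is None or ratio < best[0]:
            best = (ratio, k)
    return best                             # (ratio-cut value, split index)

# two triangles joined by one edge: the natural split is between them
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
ratio, k = best_ratio_cut_split([0, 1, 2, 3, 4, 5], adj)
```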
Experiments on Special Graphs
GBUI(2n, d, b) [Bui-Chaudhuri-Leighton 1987]
  2n nodes
  d-regular
  min-bisection ≈ b
GGAR(m, n, Pint, Pext) [Garbe-Promel-Steger 1990]
  m clusters
  n nodes in each cluster
  Pint: intra-cluster edge probability
  Pext: inter-cluster edge probability
[Figure: m clusters of n nodes each]
GBUI Results
[Plot: eigenvector value x_i vs. module i; the modules separate cleanly into Cluster 1 and Cluster 2]
GGAR Results
[Plot: eigenvector value x_i vs. module i; the modules separate into Clusters 1-4]
Partitioning Algorithms
Iterative partitioning algorithms
Spectral based partitioning algorithms
Net partitioning vs. module partitioning
Multi-way partitioning
Further study in partitioning techniques
Transform Netlists to Graphs
Transform each r-pin net into an r-clique
Weight each edge in the r-clique:
  1/(r−1) (simple scaling), or
  2/r (edge probability), or
  4/r² (bounds each net's contribution to the cut size)
Transforming multi-pin nets to cliques may destroy the sparsity of D − A
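A minimal sketch of the net-to-clique transformation, with the weighting rule passed in as a parameter (the function name and example netlist are ours, for illustration):

```python
# Each r-pin net becomes an r-clique; every clique edge gets weight 1/(r-1)
# by default (the "simple scaling" option; 2/r or 4/r^2 drop in via `weight`).
from itertools import combinations
from collections import defaultdict

def netlist_to_graph(nets, weight=lambda r: 1.0 / (r - 1)):
    w = defaultdict(float)               # (u, v) -> accumulated edge weight
    for net in nets:
        r = len(net)
        if r >= 2:
            for u, v in combinations(sorted(net), 2):
                w[(u, v)] += weight(r)   # edges from different nets add up
    return dict(w)

g = netlist_to_graph([('a', 'b'), ('a', 'b', 'c')])
# ('a','b') lies in both nets: 1/1 from the 2-pin net + 1/2 from the 3-pin net
```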
Net Partitioning vs. Module Partitioning
Dual netlist representation: the intersection graph
  Node: a net
  Edge: two nets share a common gate
The intersection graph also captures circuit connectivity, but does not use hyperedges.
[Figure: netlist N with gates G1-G7 and nets a-j, and its intersection graph I(N)]
Transform Net Partition to Module Partition
[Figure: a net bipartition (X, X′) of I(N) induces a bipartite conflict graph B(Y, Y′) over the nets on the cut]
Every edge in B(Y, Y′) indicates a conflict:
  G3 is wanted by both c and g
  G4 is wanted by both d and e
  G5 is wanted by both d and f
Some Definitions
Maximum Independent Set (MIS): a maximum subset of vertices with no two connected
Minimum Vertex Cover (MVC): a minimum subset of vertices which "covers" all edges
[Figure: a graph with its MIS and MVC highlighted]
Resolving Net Conflicts When Transforming a Net Partition to a Module Partition
Some nets in B(Y, Y′) will lose
Each losing net corresponds to a net cut in the module partition
Objective: maximize the # of winning nets, or equivalently, minimize the # of losing nets
Theorem: the set of winning nets forms an independent set in B(Y, Y′), and the set of losers forms a vertex cover
⇒ Find a maximum independent set in B(Y, Y′)
⇒ Solved optimally in polynomial time for bipartite graphs
Bipartite Graph Properties
|MIS| + |MVC| = |V|
The size of any vertex cover is no smaller than any matching; for bipartite graphs, |MVC| = |MM| (MM: maximum matching)
⇒ Goal: find an MVC of size |MM|, or equivalently, an MIS of size |V| − |MM|
MIS = winners
MVC = losers
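A small self-contained sketch of this step, using naive augmenting-path matching (a Hopcroft-Karp implementation would be used at scale) and the bipartite identity |MIS| = |V| − |MM|; the conflict graph here is made up:

```python
# Maximum bipartite matching via augmenting paths, then the number of
# winning nets is |V| - |MM|.
def max_matching(left, adj):
    match = {}                          # right vertex -> matched left vertex

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                # v is free, or its partner can be rematched elsewhere
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in left)

left, right = ['n1', 'n2', 'n3'], ['m1', 'm2']
adj = {'n1': ['m1'], 'n2': ['m1', 'm2'], 'n3': ['m2']}
mm = max_matching(left, adj)
mis_size = len(left) + len(right) - mm  # number of winning nets
```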
Finding an MIS
[Figures over three slides: a bipartite graph with sides L and R; a maximum matching (MM) leaves unmatched vertex sets UL and UR; alternating paths grown from UL and UR classify the vertices into Even(L), Odd(L), Even(R), and Odd(R)]
Improving the Module Partition
  Net1 = {a, b}   Net2 = {b, c}   Net3 = {a, e}
  Net4 = {b, d}   Net5 = {e, f}   Net6 = {g, e}
Even(L) = {Net1, Net2}, ModSL = {a, b, c}
Even(R) = {Net5, Net6}, ModSR = {e, f, g}
ModSL + {d} cuts 1 net
ModSR + {d} cuts 2 nets
Algorithm IG-Match
Get spectral linear ordering
Construct bipartite graph B
Find MIS of B
Convert MIS into module partition
Theorem: IG-Match computes all |V|-1 module partitions in O(|V|(|V|+|E|)) time
Key idea: incremental updating of the bipartite graph and alternating paths
Experimental Results

  Name    Wei-Cheng (areas/cutsize)  IG-Match (areas/cutsize)  Improvement
  bm1     9:873 / 1                  21:861 / 1                 57%
  19ks    1011:1833 / 109            650:2194 / 85              -1%
  Prim1   152:681 / 14               154:679 / 14               1%
  Prim2   1132:1882 / 123            740:2274 / 77              21%
  Test02  372:1291 / 95              211:1452 / 38              37%
  Test03  147:1460 / 31              803:804 / 58               38%
  Test04  401:1114 / 51              73:1442 / 6                50%
  Test05  1204:1391 / 110            105:1442 / 6               53%
  Test06  145:1607 / 18              141:1611 / 17              3%

28.8% average improvement
Further Study in Partitioning Techniques
Module replication in circuit partitioning [Kring-Newton 1991; Hwang-ElGamal 1992; Liu et al. TCAD'95; Enos et al. TCAD'99]
Generating uni-directional partitioning [Iman-Pedram-Fabian-Cong 1993] or acyclic partitioning [Cong-Li-Bagrodia, DAC94] [Cong-Lim, ASPDAC2000]
Logic restructuring during partitioning [Iman-Pedram-Fabian-Cong 1993]
Communication based partitioning [Hwang-Owens-Irwin 1990; Beardslee-Lin-Sangiovanni 1992]
Multi-level partitioning (will discuss after clustering)
Multi-Way Partitioning
Recursive bi-partitioning [Kernighan-Lin 1970]
Generalization of Fiduccia-Mattheyses' and Krishnamurthy's algorithms [Sanchis 1989] [Cong-Lim, ICCAD'98]
Generalization of ratio-cut and spectral methods to multi-way partitioning [Chan-Schlag-Zien 1993]
  Generalized ratio-cut value = sum of the flux of each partition
  Generalized ratio-cut cost of a k-way partition ≥ sum of the k smallest eigenvalues of the Laplacian matrix
Circuit Clustering Formulation
Motivation:
  Reduce the size of flat netlists
  Identify the natural circuit hierarchy
Objectives:
  Maximize the connectivity within each cluster, or minimize the connectivity between clusters
  Minimize the delay (or simply the depth) of the clustered circuit
Measuring Cluster Connectivity
Simple density measures
  m/n: include a new node when it connects with more than m/n edges
  m/C(n,2): include a new node when it connects with more than 2m/(n−1) edges; biased toward small clusters
(k, l)-connectivity [Garber-Promel-Steger 1990]
  Every two nodes in the cluster are connected by at least k edge-disjoint paths of length no more than l
  A proper choice of k and l may not be easy
D/S-metric [Cong-Hagen-Kahng 1991]
  D/S = Degree/Separation
  Degree = average degree in the cluster
  Separation = average shortest-path length between a pair of nodes in the cluster
  Robust, but expensive to compute
[Figure: nodes u and v connected by k edge-disjoint paths P1, ..., Pk of length ≤ l]
Clustering Algorithms
(k, l)-connectivity algorithms [Garber-Promel-Steger 1990]
Random-walk based algorithms [Cong-Hagen-Kahng 1991; Hagen-Kahng 1992]
Multicommodity-flow based algorithm [Yeh-Cheng-Lin 1992]
Local connectivity based clustering
  Clique based algorithms [Bui 1989; Cong-Smith 1993]
  HEC, MHEC, First-Choice [Karypis-Kumar, DAC97 & DAC98]
Global connectivity based clustering [Cong-Sung, ASPDAC'2000]
Clustering for delay minimization (combinational circuits) [Lawler-Levitt-Turner 1969; Murgai-Brayton-Sangiovanni 1991; Rajaraman-Wong 1993]
Clustering for delay minimization (with consideration of retiming, for sequential circuits) [Pan et al. TCAD'99; Cong et al. DAC'99]
Signal flow based clustering [Cong-Ding, DAC'93; Cong et al. ICCAD'97]
FirstChoice Clustering Algorithm (FC)
• Connectivity-driven clustering algorithm
• First introduced by G. Karypis in the DAC'1998 k-way hMetis partitioner
• Idea:
  – Visit nodes in a random order
  – For each node v:
    • Compute the connectivity weights to its neighboring nodes: Connect(u, v) = sum of w(e)/|e| over the hyperedges e connecting u and v, where |e| is the degree of hyperedge e and w(e) is its weight
    • Choose the node u with the max connectivity weight to merge with. If u is already clustered, v is merged into u's cluster; otherwise, u and v form a new cluster
  – Cluster area control can be applied when merging a node into a cluster
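The connectivity-weight computation at the heart of FC can be sketched as follows (the netlist, unit weights, and function name are made up for illustration):

```python
# Connect(u, v) = sum over hyperedges e containing both u and v of w(e)/|e|.
from collections import defaultdict

def fc_connectivity(v, nets, w):
    """Connectivity weight from node v to each of its neighbors."""
    score = defaultdict(float)
    for e in nets:
        if v in e:
            for u in e:
                if u != v:
                    score[u] += w[frozenset(e)] / len(e)
    return score

nets = [('a', 'b'), ('a', 'b', 'c'), ('c', 'd')]
w = {frozenset(e): 1.0 for e in nets}   # unit hyperedge weights
s = fc_connectivity('a', nets, w)
best = max(s, key=s.get)                # the node 'a' would merge with
```

Here 'a' shares a 2-pin and a 3-pin net with 'b' (score 1/2 + 1/3), so 'b' is chosen over 'c' (score 1/3).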
Lawler's Labeling Algorithm [Lawler-Levitt-Turner 1969]
Assumptions: cluster size ≤ K; intra-cluster delay = 0; inter-cluster delay = 1
Objective: find a clustering of minimum delay
Algorithm:
Phase 1: label all nodes in topological order
  For each PI node v: L(v) = 0
  For each non-PI node v:
    p = maximum label among the predecessors of v
    Xp = set of predecessors of v with label p
    if |Xp| < K then L(v) = p else L(v) = p + 1
Phase 2: form clusters
  Start from the POs and generate the necessary clusters
  Nodes with the same label form a cluster
[Figure: node v with predecessors labeled p−1 and p; the label-p predecessors Xp determine L(v)]
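Phase 1 can be sketched in a few lines; here Xp is taken over the whole input cone of v (reading the slide's "predecessors" as transitive, which is how the algorithm is usually stated), and the example DAG and names are ours:

```python
# Label nodes in topological order: L(v) = p if the label-p nodes in v's cone
# still fit in a size-K cluster together with v, else p + 1.
def lawler_labels(topo_order, preds, K):
    L, cone = {}, {}
    for v in topo_order:
        if not preds[v]:                      # primary input
            L[v], cone[v] = 0, {v}
            continue
        cone[v] = {v}.union(*(cone[u] for u in preds[v]))
        p = max(L[u] for u in preds[v])       # max label among predecessors
        Xp = [u for u in cone[v] - {v} if L[u] == p]
        L[v] = p if len(Xp) < K else p + 1
    return L

preds = {'a': [], 'b': ['a'], 'c': ['b']}     # chain a -> b -> c
labels = lawler_labels(['a', 'b', 'c'], preds, K=2)
# with K = 2, {a, b} fit in one cluster and c starts the next level
```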
Lawler's Labeling Algorithm (Cont'd)
Performance of the algorithm:
  Efficient run-time
  Minimum-delay clustering solution
  Allows node duplication
  No attempt to minimize the number of clusters
Extensions to allow arbitrary gate delays:
  Heuristic solution [Murgai-Brayton-Sangiovanni 1991]
  Optimal solution [Rajaraman-Wong 1993]
Signal Flow and Logic Dependency
• Beyond connectivity
  – identifies logically dependent cells
  – useful in guiding
    • clustering
    • partitioning [CoLB94, CoLL+97]
    • placement [CoXu95]
Maximum Fanout Free Cone (MFFC)
• Definition: for a node v in a combinational circuit,
  – cone of v (Cv): v and some of its predecessors, such that any path connecting a node in Cv and v lies entirely in Cv
  – fanout-free cone at v (FFCv): a cone of v such that for any node u ≠ v in FFCv, output(u) ⊆ FFCv
  – maximum FFC at v (MFFCv): the FFC of v such that for any non-PI node w, if output(w) ⊆ MFFCv then w ∈ MFFCv
Properties of MFFCs
• If w ∈ MFFCv, then MFFCw ⊆ MFFCv [CoDi93]
• Two MFFCs are either disjoint or one contains the other [CoDi93]
• Area-optimal duplication-free LUT mapping can map each MFFC independently to get an optimal solution [CoDi93]
• For acyclic bipartitioning without area constraints, there exists an optimal cut which does not cut through any MFFC [CoLB94]
Maximum Fanout Free Subgraph (MFFS)
• Definition: for a node v in a sequential circuit,
  FFSv = {u | every path from u to some PO passes through v}
  MFFSv: the maximum such fanout-free subgraph at v
• Illustration: [Figure: a sequential circuit decomposed into MFFCs vs. a single MFFS]
MFFS Construction Algorithm
• For a single MFFS at node v:
  – select the root node v and cut all its fanout edges
  – mark all nodes reachable backwards from all POs
  – MFFSv = {unmarked nodes}
  – complexity: O(|N| + |E|)
[Figure: backward marking from the POs with v's fanout edges cut; the unmarked region around v is MFFSv]
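The marking step can be sketched as follows on a DAG given by fanin lists (the example circuit and names are ours):

```python
# Conceptually cut v's fanout edges, mark everything backward-reachable from
# the POs, and return the unmarked nodes; fanin maps node -> predecessors.
def mffs_at(v, fanin, pos):
    marked, stack = set(), [p for p in pos if p != v]
    while stack:
        u = stack.pop()
        if u not in marked:
            marked.add(u)
            # never walk through v: its fanout edges are cut
            stack += [w for w in fanin[u] if w != v]
    return set(fanin) - marked            # unmarked nodes = MFFS_v

fanin = {'a': [], 'b': ['a'], 'c': ['a'], 'd': ['b', 'c'], 'e': ['c']}
cluster = mffs_at('d', fanin, pos=['d', 'e'])
# b reaches the POs only through d, but a and c also reach e,
# so the cluster at d is {b, d}
```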
MFFS Clustering Algorithm
• Clusters the entire netlist:
  – construct the MFFS at a PO and remove it from the netlist
  – include its inputs as new POs
  – repeat until all nodes are clustered
  – complexity: O(|N| · (|N| + |E|))
[Figure: MFFSs peeled off from the POs one at a time until the netlist is fully clustered]
Speedup of MFFS Clustering
• Approximation of MFFS:
  – a cluster size limit is imposed in practice
  – perform MFFS construction on a subcircuit SC(v, h)
    • defined by a search depth h for a given node v
    • internal nodes: all nodes visited by a BFS of depth h from v
    • pseudo PIs/POs: all I/Os from/to nodes not in SC(v, h)
[Figure: subcircuit SC(v, h) of depth h around v inside the full circuit, with pseudo PIs and POs at its boundary]
MFFS Clustering Result
• MCNC benchmarks with signal flow information (AVR = max_area / min_area)

                        |  Optimal          |  Heuristic
  NAME     SIZE   AVR   |  #CLST    TIME    |  #CLST    TIME
  s1423    619    1.0      193      0.9        168      1.9
  sioo     664    4.6      442      2.3        442      2.6
  s1488    686    1.0      187      0.9        187      1.4
  balu     801    12.9     287      2.0        284      3.8
  prim1    833    6.7      394      3.2        379      5.9
  struct   1952   2.5      1183     17.9       1183     12.1
  prim2    3014   6.7      1404     34.8       1243     22.1
  s9234    5866   1.0      3203     36.7       3177     9.7
  biomed   6514   4.5      2142     87.8       1853     30.5
  s13207   8772   1.0      2132     73.9       1994     14.1
  s15850   10470  1.0      2279     84.9       2137     17.6
  s35932   18148  1.0      5562     420.8      2943     32.1
  s38584   20995  1.0      5139     565.1      4242     44.5
  avq.sm   21918  4.5      8477     1287.3     8309     116.1
  s38417   23949  1.0      5906     452.2      5295     45.1
  avq.lg   25178  4.5      9103     1473.2     8658     90.2
  TOTAL                    48033    4543.9     42494    449.7
Two-Level Partitioning: LSR/MFFS Partitioning Algorithm
• Overview of the algorithm:
  – cluster the netlist using the MFFS algorithm
  – partition the clustered netlist with the LSR algorithm
  – decluster the netlist
  – refine the flat netlist with the LSR algorithm
Flow of LSR/MFFS Algorithm
[Flowchart: START → cluster netlist with MFFS → partition netlist with LSR → (gain? yes: repeat partitioning) → decluster netlist → refine netlist with LSR → (gain? yes: repeat refinement) → END]
Cutsize Reduction by MFFS Clustering
• Bipartitioning under a 45-55% balance criterion
[Bar chart: cutsize improvement over flat FM, flat vs. MFFS-clustered: FM 0.0% / −25.8%, SNT −8.0% / −31.7%, LR −45.6% / −48.4%, LSR −46.9% / −49.9%]
Cutsize Reduction by MFFS Clustering

           |       No Clustering        |      MFFS Clustering
  NAME     |  FM    SNT   LR    LSR     |  FM    SNT   LR    LSR
  s1423       12    14    13    13         12    12    12    12
  sioo        25    25    25    25         25    25    25    25
  s1488       48    45    44    42         42    42    43    43
  balu        27    27    27    27         27    27    27    27
  prim1       43    42    42    42         43    42    42    43
  struct      42    45    33    33         39    52    33    33
  prim2       174   167   121   122        133   122   131   119
  s9234       49    54    41    43         42    43    41    40
  biomed      83    83    83    83         100   84    84    84
  s13207      90    74    75    70         91    65    73    61
  s15850      113   73    73    65         72    70    44    43
  s35932      100   128   45    44         48    76    45    44
  s38584      52    52    49    47         52    52    47    47
  avq.sm      321   378   135   139        273   172   127   127
  s38417      176   112   53    55         119   147   51    50
  avq.lg      491   380   145   131        251   168   127   127
  TOTAL       1846  1699  1004  981        1369  1261  952   925
  % IMPR      0.0   8.0   45.6  46.9       25.8  31.7  48.4  49.9
  TIME        121.4 59.4  162.3 138.9      84.2  61.3  113.6 89.4
Cutsize Comparison with Other Methods
[Bar chart: total cutsize (roughly 800-1200) for CDIP, PropF, Straw, hMetis, MLc, and L/M (iterative improvement partitioners, IIPs) and for PANZA, Parab, and FBB (non-IIPs)]
Details on Cutsize Comparison

  NAME     CDIP  PROPf  Straw  hMetis  MLc  L/M  |  PANZA  Parab  FBB
  s1423    12 15 16 13
  sioo     25 25 45
  s1488    43 44 50
  balu     27 27 27 27 27 27 27 41
  prim1    52 51 49 50 47 43
  struct   36 33 33 33 33 33 33 40
  prim2    152 152 143 145 139 119
  s9234    44 42 42 40 40 40 40 74 70
  biomed   83 84 83 83 83 84 83 135
  s13207   70 71 57 55 55 61 66 91 74
  s15850   67 56 44 42 44 43 44 91 67
  s35932   73 42 47 42 41 44 43 62 49
  s38584   47 51 49 47 47 47 47 55 47
  avq.sm   148 144 131 130 128 127
  s38417   79 65 53 51 49 50 49 49 58
  avq.lg   145 143 140 127 128 127
  TOTAL1   1023 961 898 872 861 845
  TOTAL2   509 516 749
  % IMPR   17.4 12.1 5.9 3.1 1.9 1.4  |  32.0 21.4
  TIME     5817 5611 12577 3455 1338 24619

  (CDIP through L/M are iterative improvement partitioners (IIPs); PANZA, Parab, and FBB are non-IIPs)
Multi-level Partitioning
• Multi-level paradigm:
  – From coarse-grain to finer-grain optimization
  – Successfully used in partial differential equations, image processing, combinatorial optimization, etc., and in circuit partitioning
[Figure: V-shaped flow: coarsening down one side, initial partitioning at the bottom, uncoarsening and refinement up the other side]
General Framework of Multi-Level Optimization
• Step 1: Coarsening
  – Generate a hierarchical representation of the netlist
• Step 2: Initial Solution Generation
  – Obtain an initial solution for the top-level clusters
  – Reduced problem size: converges fast
• Step 3: Uncoarsening and Refinement
  – Project the solution to the next lower level (uncoarsening)
  – Perturb the solution to improve quality (refinement)
• Step 4: V-cycle
  – Additional improvement is possible from new clustering
  – Iterate Step 1 (with variation) + Step 3 until no further gain
V-cycle Refinement
• Motivation
  – Post-refinement scheme for multi-level methods
  – A different clustering can give additional improvement
• Restricted Coarsening
  – Requires an initial partitioning
  – Do not merge clusters in different partitions
  – Maintains the cutline: cutsize degradation is not possible
• Two Strategies: V-cycle vs. v-cycle
  – V-cycle: start from the bottom level
  – v-cycle: start from some middle level
  – Tradeoff between quality and runtime
Application in Partitioning
• Multi-level Partitioning
  – Coarsening engine (bottom-up)
    • Unrestricted and restricted coarsening
    • Any bottom-up clustering algorithm can be used
    • Cutsize oriented (MHEC, ESC) vs. delay oriented (PRIME)
  – Initial partitioning engine
    • Move-based methods are commonly used
  – Refinement engine (top-down)
    • Move-based methods are commonly used
    • Cutsize oriented (FM, LR) vs. delay oriented (xLR)
• State-of-the-art Algorithms
  – hMetis [DAC97] and hMetis-Kway [DAC99] for cutsize
  – HPM [DAC00] for delay
hMetis Algorithm
• Multilevel Partitioning Algorithm [DAC97]
  – Contributions: 3 new coarsening schemes for hypergraphs
  – Best bipartitioning results; remains very competitive today
[Figure: original graph vs. its edge-coarsened version]
Edge Coarsening = heavy-edge maximal matching
1. Visit vertices randomly
2. Compute edge weights (= 1/(|n|−1)) for all unmatched neighbors
3. Match with an unmatched neighbor via the max edge weight
hMetis Algorithm (cont'd)
Coarsening schemes for hypergraphs (cont'd)
[Figure: hyperedge coarsening vs. modified hyperedge coarsening]
Hyperedge Coarsening = independent hyperedge merging
1. Sort hyperedges in non-decreasing order of their size
2. Pick a hyperedge with no merged vertices and merge it
Modified Hyperedge Coarsening = Hyperedge Coarsening + post-processing
1. Perform Hyperedge Coarsening
2. Pick a non-merged hyperedge and merge its non-merged vertices
hMetis-Kway Algorithm
• Multiway Partitioning Algorithm [DAC99]
  – New coarsening: First Choice (a variant of Edge Coarsening)
    • Can match with either unmatched or matched neighbors
  – Greedy refinement
    • On-the-fly gain computation
    • No buckets: not necessarily the max-gain cell that moves
    • Saves time and space
[Figure: original graph vs. its First Choice coarsening]
hMetis Results
• Bipartitioning on the ISPD98 Benchmark Suite
[Bar chart: scaled cutsize, normalized to hMetis = 1: FM 1.61, LR 1.21, LR/ESC 1.03, hMetis 1]
hMetis-Kway Results
• Multiway Partitioning on the ISPD98 Benchmark Suite
[Bar chart: scaled cutsize for 2-, 8-, 16-, and 32-way partitioning, comparing hMetis-Kway, KPM/LR, and LR/ESC-PM; values shown include 1.2/1.03 (2-way), 1.19/1.02 (8-way), 1.18/1.01 (16-way), and 1.15/0.97 (32-way)]
Summary
Partitioning is very important for applying the divide-and-conquer methodology (for complexity management)
Partitioning also defines global/local interconnects and greatly impacts circuit performance
The increasing emphasis on interconnect design has introduced many new partitioning formulations
Clustering is effective in reducing circuit size and identifying the natural circuit hierarchy
Multi-level circuit clustering + iterative-improvement based methods produce the best partitioning results (so far)