Large Scale Circuit Partitioning With Loose/Stable Net Removal
And Signal Flow Based Clustering
Support : DARPA/ITO, NSF, Fujitsu MICRO
Jason Cong, Honching Li
Sung-Kyu Lim, Dongmin Xu
UCLA VLSI CAD Lab
Toshiyuki Shibuya
Fujitsu Lab, LTD
Outline
• Loose and Stable net Removal Partitioning Algorithm
• Maximum Fanout Free Subgraph Clustering Algorithm
• Performance of LSR/MFFS
• Conclusion & Ongoing Work
LSR Partitioning
1. Background
2. Motivation
3. Implementation
Circuit Partitioning
• Formulation
– minimize connections
– satisfy area constraint
• Significance
– fundamental for hierarchical layout
– essential for future technology
• Iterative Improvement Partitioning
– flexible, effective, and efficient
Evolution of IIP Algorithm
• Early Development
– KL : Kernighan & Lin [Bell70]
– FM : Fiduccia & Mattheyses [DAC82]
– LA : Krishnamurthy [TCom84]
• Recent Development
– CDIP/PROPf : Dutt & Deng [ICCAD96]
– Strawman : Hauck & Borriello [TCAD96]
– hMetis : Karypis & Kumar [DAC97]
– MLc : Alpert, Huang & Kahng [DAC97]
FM Algorithm Basics
• Basic Operation : Cell Move
– cost : gain (= reduction in cutsize)
– constraint : area balance
– cell status : free or locked
• Structure
while (gain > 0)                  : run
    while (∃ free cell)           : pass
        move cell
    retrieve max-gain moves
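The pass/run structure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes unit cell areas, models the area constraint as a hypothetical `max_size` cap on cells per block, and recomputes gains from scratch instead of using FM's bucket structure. The names `cutsize`, `fm_gain`, `fm_pass`, and `fm_run` are introduced here for illustration only.

```python
from typing import Dict, List, Tuple

def cutsize(nets: List[List[str]], block: Dict[str, int]) -> int:
    """Number of nets that have cells in both blocks."""
    return sum(1 for net in nets if len({block[c] for c in net}) == 2)

def fm_gain(cell: str, nets: List[List[str]], block: Dict[str, int]) -> int:
    """Gain = reduction in cutsize if `cell` moves to the other block."""
    moved = dict(block)
    moved[cell] = 1 - block[cell]
    return cutsize(nets, block) - cutsize(nets, moved)

def fm_pass(nets, block, max_size) -> Tuple[Dict[str, int], int]:
    """One pass: move each free cell at most once, then keep the best prefix."""
    block = dict(block)
    free = set(block)
    best, best_cut = dict(block), cutsize(nets, block)
    while True:
        # A move is legal if the destination block stays within max_size cells
        # (assumed stand-in for the area balance constraint).
        legal = [c for c in sorted(free)
                 if sum(1 for b in block.values() if b != block[c]) < max_size]
        if not legal:
            break
        c = max(legal, key=lambda x: fm_gain(x, nets, block))
        block[c] = 1 - block[c]   # move the max-gain cell
        free.remove(c)            # and lock it
        cut = cutsize(nets, block)
        if cut < best_cut:        # remember the best prefix of moves
            best, best_cut = dict(block), cut
    return best, best_cut

def fm_run(nets, block, max_size) -> Tuple[Dict[str, int], int]:
    """One run: repeat passes while a pass still reduces the cutsize."""
    cut = cutsize(nets, block)
    while True:
        new_block, new_cut = fm_pass(nets, block, max_size)
        if new_cut >= cut:
            return block, cut
        block, cut = new_block, new_cut
```

On a four-cell chain of two-pin nets with an alternating initial partition, one run of this sketch lowers the cutsize from 3 to 1 when at most three cells are allowed per block.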
Loose-net Removal (LR)
• New Gain Formulation
– FREE net : only free cells
– LOOSE net : locked cells in one block
– FM : FREE cells of LOOSE net have gains -3 and -2 in the example
– LR : these cells get IMMEDIATE ATTENTION by adding W > 0 to their gains
– LOOSE net removed, more LOOSE nets formed
Net Pulling Effect
[Figures: example partition, block 0 and block 1]
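The FREE/LOOSE distinction above can be written as a small classifier. This is a sketch under assumptions: `block` maps each cell to 0 or 1, `locked` is the set of locked cells, and the third label `"locked"` (locked cells in both blocks) is a name introduced here, not taken from the slides.

```python
from typing import Dict, Iterable, Set

def classify_net(net: Iterable[str], block: Dict[str, int], locked: Set[str]) -> str:
    """Classify a net by where its locked cells sit."""
    locked_blocks = {block[c] for c in net if c in locked}
    if not locked_blocks:
        return "free"      # only free cells
    if len(locked_blocks) == 1:
        return "loose"     # locked cells in one block only
    return "locked"        # locked cells in both blocks (assumed label)
```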
LR Implementation
• Gain Increase of LR
– incr(n) = k / size(n)
– favor shorter nets
– example with k = 100 : free cells of a 3-cell net get 33 each, free cells of a 4-cell net get 25 each
• Gain Increase of LR : upper bound
– less tie-breaking
[Figure, labeled k]
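The gain increase is a one-line formula. The sketch below reproduces the 33 and 25 values on the slide, assuming k = 100 and integer truncation (both read off the slide's example rather than stated explicitly).

```python
def incr(net_size: int, k: int = 100) -> int:
    """Gain increase for a loose net of `net_size` cells: k / size(n), truncated.

    k = 100 and the truncation are assumptions inferred from the slide's example.
    """
    return k // net_size
```

Because the increase shrinks as the net grows, free cells on shorter loose nets are favored.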
LR Implementation
• FM Enhancement
while (gain > 0)
    while (∃ free cell)
        move max-gain cell c
        for (each loose net n incident on c)
            for (each free cell f of n)
                if (f.gain < T) f.gain += incr(n)
    retrieve max-gain moves
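The gain update added to FM can be sketched as a standalone function. This is an illustration under assumptions, not the authors' code: gains live in a plain dict, `T` is the upper bound from the pseudocode, `k = 100` follows the earlier example, and, like the slide's pseudocode, the sketch does not distinguish which block a free cell sits in. The name `lr_update` is hypothetical.

```python
from typing import Dict, List, Set

def lr_update(c: str, nets: List[List[str]], block: Dict[str, int],
              locked: Set[str], gain: Dict[str, int], T: int, k: int = 100) -> None:
    """After cell c is moved and locked, raise gains on loose nets incident on c."""
    for net in nets:
        if c not in net:
            continue
        locked_blocks = {block[x] for x in net if x in locked}
        if len(locked_blocks) != 1:
            continue                      # not a loose net
        for f in net:
            if f not in locked and gain[f] < T:
                gain[f] += k // len(net)  # incr(n) = k / size(n), truncated
```

With the FM gains -3 and -2 from the slide and a 3-cell loose net, the two free cells end up at 30 and 31, so they are picked ahead of other cells.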
Performance of LR
• Bipartitioning without Clustering (bar chart, axis 500 to 2100)
– LR : 922
– CDIP : 1023
– LA3 : 1956
– FM : 1761
Stable Net Transition (SNT)
• Stable Net [Shibuya et al, FSTJ95]
– remain cut during entire run
– limit FM solution
• Stable Net Removal
– at the end of each run
– detect stable net and isolate
– new initial partition
– fast convergence
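Detecting stable nets amounts to checking which nets stayed cut throughout a run. The sketch below assumes the caller records a snapshot of the partition (for example after every pass) and only covers detection; how the detected nets are isolated to build the new initial partition is not specified on the slide and is left out. The name `stable_nets` is hypothetical.

```python
from typing import Dict, List

def stable_nets(nets: List[List[str]], snapshots: List[Dict[str, int]]) -> List[int]:
    """Indices of nets that are cut in every recorded snapshot of the run."""
    def is_cut(net: List[str], block: Dict[str, int]) -> bool:
        return len({block[c] for c in net}) == 2
    return [i for i, net in enumerate(nets)
            if all(is_cut(net, block) for block in snapshots)]
```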
Enhancement of LR
• Benefit of LR + SNT
– small loose + big stable net
– dynamic + static
– speed up LR
• How?
– initial partition by SNT for next run of LR
– Loose and Stable net Removal (LSR)
MFFS Clustering
1. Motivation
2. Algorithm
3. Speedup
Circuit Clustering
• Definition
– group closely connected components in circuits
• Significance
– reduce problem size
– speed up partitioning
– improve partitioning solution
– refinement through decomposition
Maximum Fanout Free Cone
• Significance [Cong & Ding, DAC93]
– exploit signal flow during clustering
– group logically dependent cells
– linear time complexity
• Benefit
– partitioning [Cong, Li, & Bagrodia DAC94]
– placement [Cong & Xu, ASP-DAC95]
Definition of MFFC
• Cone Rooted at v : Cv
– v and its predecessors s.t. if u in Cv, every path from u to v resides entirely in Cv
• Fanout Free Cone at v : FFCv
– Cv is fanout free if output(Cv) = output(v)
• Maximum FFCv : MFFCv
– fanout free and maximum FFCv
• Find All Single MFFC
– complexity : O(|N|+|E|)
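For a combinational (acyclic) circuit, one MFFC can be grown from its root by absorbing any predecessor whose fanouts all lie inside the cone. The sketch below is a minimal illustration of that idea, assuming the circuit is given as hypothetical `fanin` and `fanout` adjacency dicts; it is not the implementation from the cited papers.

```python
from typing import Dict, List, Set

def mffc(v: str, fanin: Dict[str, List[str]], fanout: Dict[str, List[str]]) -> Set[str]:
    """Maximum fanout free cone rooted at v in an acyclic circuit."""
    cone = {v}
    stack = [v]
    while stack:
        x = stack.pop()
        for u in fanin.get(x, []):
            # u joins the cone only if every fanout of u is already inside it
            if u not in cone and all(w in cone for w in fanout[u]):
                cone.add(u)
                stack.append(u)
    return cone
```

A predecessor that is rejected early is checked again whenever another of its fanouts joins the cone, so the result does not depend on visiting order.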
Limitation of MFFC
• Designed for Combinational Circuit
– can’t handle cycles in sequential circuit
[Figures: apply MFFC algorithm; apply MFFS algorithm]
Definition of MFFS
• For a node v in a sequential circuit:
– Fanout Free Subgraph rooted at v : FFSv = {u | every path from u to some PO passes through v}
– Maximum Fanout Free Subgraph rooted at v : MFFSv = {u | for all FFSv, u ∈ FFSv}
MFFS Construction
• For Single MFFSv
– select root node v and cut its fanout
– mark nodes reachable backwards from PO
– MFFSv = {unmarked nodes}
– complexity : O(|N|+|E|)
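The three construction steps map directly onto a backward breadth-first search. The sketch below assumes the circuit is given as a hypothetical `fanin` dict plus a list of primary outputs; cutting the fanout of v is modeled by never walking back into v, and v is excluded from the starting set when it is itself a PO (an assumption made here so that the root always stays unmarked).

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def mffs(v: str, fanin: Dict[str, List[str]], primary_outputs: Iterable[str]) -> Set[str]:
    """MFFS rooted at v: the nodes left unmarked by a backward search from the POs."""
    pos = list(primary_outputs)
    marked = {p for p in pos if p != v}
    queue = deque(marked)
    while queue:
        x = queue.popleft()
        for u in fanin.get(x, []):
            if u == v:            # fanout of v is cut: do not enter v
                continue
            if u not in marked:   # u reaches a PO without passing through v
                marked.add(u)
                queue.append(u)
    nodes = set(fanin) | {u for us in fanin.values() for u in us} | set(pos)
    return nodes - marked
```

Since the search only ever marks nodes, cycles in a sequential circuit need no special handling.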
MFFS Clustering
• For Clustering Entire Circuit
– find MFFSv and remove
– output to removed nodes is new PO
– repeat until all nodes are clustered
– complexity : O(|N|• (|N|+|E|))
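The clustering loop can be sketched by repeating the single-MFFS construction on the shrinking circuit. This is an illustration under assumptions: the slides do not say how the root v is chosen, so the sketch picks the alphabetically smallest current PO (or any remaining node when no PO is left), and the names `mffs_of` and `mffs_clustering` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def mffs_of(v: str, fanin: Dict[str, List[str]], nodes: Set[str], pos: Set[str]) -> Set[str]:
    """MFFS of v restricted to the nodes that are still unclustered."""
    marked = {p for p in pos if p != v}
    queue = deque(marked)
    while queue:
        x = queue.popleft()
        for u in fanin.get(x, []):
            if u != v and u in nodes and u not in marked:
                marked.add(u)
                queue.append(u)
    return nodes - marked

def mffs_clustering(fanin: Dict[str, List[str]], primary_outputs: Iterable[str]) -> List[Set[str]]:
    """Repeat: find MFFSv, remove it, and turn the nodes driving it into new POs."""
    pos = set(primary_outputs)
    nodes = set(fanin) | {u for us in fanin.values() for u in us} | pos
    clusters = []
    while nodes:
        v = min(pos) if pos else min(nodes)   # root choice is an assumption
        cluster = mffs_of(v, fanin, nodes, pos)
        # remaining nodes that feed the removed cluster become new POs
        drivers = {u for x in cluster for u in fanin.get(x, []) if u not in cluster}
        nodes = nodes - cluster
        pos = (pos - cluster) | (drivers & nodes)
        clusters.append(cluster)
    return clusters
```

Each iteration removes at least the root, so the loop runs at most |N| times with one O(|N|+|E|) search per iteration, matching the complexity on the slide.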
Speedup of MFFS Clustering
• Single MFFSv Construction
– slow : O(|N|+|E|)
• Subset of MFFSv
– search on subcircuit SC (v, h)
– internal node : depth h-BFS at node v
– pseudo PI/PO : I/O to/from subcircuit
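Extracting the subcircuit SC (v, h) can be sketched as a depth-limited breadth-first search. The slide does not say whether the search follows fanin edges, fanout edges, or both; the sketch assumes both directions. The function name and the `fanin`/`fanout` dicts are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def subcircuit(v: str, fanin: Dict[str, List[str]], fanout: Dict[str, List[str]],
               h: int) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (internal nodes, pseudo PIs, pseudo POs) of SC(v, h)."""
    depth = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if depth[x] == h:
            continue                      # do not expand beyond depth h
        # assumption: the BFS ignores edge direction
        for y in list(fanin.get(x, [])) + list(fanout.get(x, [])):
            if y not in depth:
                depth[y] = depth[x] + 1
                queue.append(y)
    internal = set(depth)
    pseudo_pis = {u for x in internal for u in fanin.get(x, []) if u not in internal}
    pseudo_pos = {w for x in internal for w in fanout.get(x, []) if w not in internal}
    return internal, pseudo_pis, pseudo_pos
```

Running the single-MFFS search inside this subcircuit, with the pseudo POs acting as POs, bounds the work per root by the subcircuit size instead of the whole circuit.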
LSR/MFFS Algorithm
• Overview
– cluster circuit with MFFS approximation algorithm
– partition clustered circuit with LSR algorithm
– decompose clusters completely
– refine cutline with LSR algorithm on declustered circuit
Experimental Result
1. Experiment Setting
2. MFFS Clustering
3. LSR/MFFS Partitioning
Experimental Setting
• Benchmark
– 16 MCNC circuits with signal flow info
– SPARC 5-85 with gcc v2.4
– bipartitioning under 45-55% skew
– real cell area
– Area Variation Ratio (AVR) : ratio between min cell area and max cell area
• Metric
– cutsize : min of 20 runs
– runtime : total of 10 runs
MFFS Clustering Result
                               Exact             Approx
ckt         size   AVR   # clst     time   # clst     time
s1423        619   1.0      193      0.9      168      1.9
sioo         664   4.6      442      2.3      442      2.6
………
s35932     18148   1.0     5562    420.8     2943     32.1
s38584     20995   1.0     5139    565.1     4242     44.5
avq.sm     21918   4.5     8477   1287.3     8309    116.1
s38417     23949   1.0     5906    452.2     5295     45.1
avq.lg     25178   4.5     9103   1473.2     8658     90.2
Total     150379          48033   4543.9    42494    449.7
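The Total row allows a quick comparison of the exact and approximate clustering. The snippet below is plain arithmetic on the totals in the table; the variable names are introduced here for illustration.

```python
# Totals copied from the table above
exact_time, approx_time = 4543.9, 449.7
exact_clusters, approx_clusters = 48033, 42494

speedup = exact_time / approx_time                 # runtime ratio, exact / approx
cluster_ratio = approx_clusters / exact_clusters   # cluster count ratio, approx / exact

print(f"speedup: {speedup:.1f}x, cluster ratio: {cluster_ratio:.2f}")
```

On these totals the approximation is roughly ten times faster than the exact construction.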
Cutsize Reduction Trend
[Bar charts, axis 500 to 1900: FM, SNT, LR, LSR; FLAT vs. MFFS]
LR, SNT, MFFS are all effective
Runtime Reduction Trend
[Bar charts, axis 0 to 150: FM, SNT, LR, LSR; FLAT vs. MFFS]
SNT and MFFS are both effective
Cutsizes Among IIPs
• LSR/MFFS : 845
• MLc : 861
• hMetis : 872
• Strawman : 898
• PROPf : 961
• CDIP : 1023
achieved BEST cutsize
Runtimes Among IIPs
[Bar chart, axis 0 to 6000; labels in order: LSR/MFFS, MLc, hMetis, Strawman, PROPf, CDIP; values in order: 1388, 3455, 1388, 12577, 5611, 5817; one entry marked "???"]
achieved BEST runtime
Cutsizes Among Non-IIPs
• LSR/MFFS : 509
• PANZA : 516
• FBB : 648
• Paraboli : 749
achieved BEST cutsize
Runtimes Among Non-IIPs
[Bar chart, axis 0 to 20000; labels in order: LSR/MFFS, PANZA, FBB, Paraboli; values in order: 1388, 16024, 24619; one entry marked "???"]
achieved BEST cutsize
Conclusion & Ongoing Work
• LSR Partitioning
– Loose and Stable net Removal
• MFFS Clustering
– Maximum Fanout Free Subgraph
• Performance of LSR/MFFS
• Ongoing Work
– LSR : multi-way partitioning
– MFFS : multi-level cluster hierarchy
– LSR/MFFS : mincut based placement
Thank You
For Your Attention