System Partitioning
Dr. Tobias Kenter, Prof. Dr. Christian Plessl
SS 2017
High-Performance IT Systems group, Paderborn University
Version 1.6.0 – 2017-06-25
Overview
• graph models for system synthesis
• the partitioning problem
• partitioning methods
Models for System Synthesis
• allocation + binding = partitioning
• problem graph
  – nodes: functional and communication objects
  – edges: dependencies
• architecture graph
  – nodes: functional and communication resources
  – edges: directed communication paths
• specification graph
  – problem graph + architecture graph + possible mappings
Problem Graph
[Figure: a DFG with nodes 1 to 4 and the corresponding problem graph with nodes 1 to 7, which adds communication nodes]
Architecture Graph
[Figure: an architecture (RISC, HWM1, HWM2, a bus, and a point-to-point link) and its architecture graph with nodes vRISC, vHWM1, vHWM2, vbus, vptp]
Specification Graph
[Figure: specification graph combining the problem graph (nodes 1 to 7) with the architecture graph (vRISC, vHWM1, vHWM2, vbus, vptp)]
Allocation, Binding
[Figure: allocation and binding shown on the specification graph (problem graph nodes 1 to 7; architecture graph nodes vRISC, vHWM1, vHWM2, vbus, vptp)]
Example: Homogeneous Multiprocessor
• optimization goals
  – minimize latency
  – meet deadlines
  – …
[Figure: three processing elements PE1, PE2, PE3, each with a memory M, connected by a bus; architecture graph with nodes vPE1, vPE2, vPE3, vbus]
Example: HW/SW Bi-Partitioning
• bi-partitioning means that there are only two blocks: SW and HW (accelerator)
[Figure: a processor and an FPGA connected by a bus; architecture graph with nodes vprocessor, vFPGA, vbus]
Synthesis and Constraints
• system synthesis modeling
  – create problem graph
  – define architectural template
  – identify possible bindings
  – refinements: communication, memory, ...
• system synthesis with constraints Cmax, Lmax
  – allocation: determines cost C (first approximation)
  – binding
  – scheduling: determines latency L
  – feasible schedule: L ≤ Lmax
  – feasible binding: leads to at least one feasible schedule
  – feasible allocation: C ≤ Cmax and leads to at least one feasible binding
Partitioning Problem
• definition: the partitioning problem is to assign n objects O = {o1, ..., on} to k partitions P = {p1, ..., pk}, such that
  – p1 ⋃ p2 ⋃ ... ⋃ pk = O
  – pi ⋂ pj = { } ∀ i,j: i ≠ j, and
  – the cost c(P) is minimized.
the general partitioning problem is NP-complete
• in system synthesis:
  – objects: problem graph nodes
  – partitions: architecture graph nodes
Partitioning – Abstraction Levels
• structural partitioning
  – at the RTL or netlist level
  – e.g. map a digital circuit onto two chips (FPGAs, ASICs)
  – system parameters are relatively well known (area and delay of functional units, registers, gates)
  – no comparison of design alternatives
  – cost: e.g. number of wires between partitions, maximum delay between all partitions, average size of each partition, ...
• functional partitioning
  – at the system level
  – system parameters are not known → estimation required
  – comparison of design alternatives → design space exploration
  – cost: e.g. required communication bandwidth or connections between partitions, hardware implementation cost for all partitions, ...
Cost Functions
• measure quality of a design point
  – may include
    C  system cost in [$]
    L  latency in [sec]
    P  power consumption in [W]
  – requires estimation to find C, L, P
• example: linear cost function with penalty
    f(C, L, P) = k1·hC(C, Cmax) + k2·hL(L, Lmax) + k3·hP(P, Pmax)
  – hC, hL, hP denote how strongly C, L, P violate the design constraints Cmax, Lmax, Pmax
  – k1, k2, k3 are weighting and normalization factors
✎ Example: Cost functions
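The linear cost function with penalty can be sketched in a few lines of Python. The slides do not define hC, hL, hP, so the relative-violation form used here (zero when the constraint holds, otherwise the relative overshoot) is an assumption made only for illustration, as are the function names and the default weights.

```python
def violation(value: float, limit: float) -> float:
    """Assumed penalty h(value, limit): relative overshoot, 0 if the constraint holds."""
    return max(0.0, (value - limit) / limit)


def cost(C, L, P, C_max, L_max, P_max, k=(1.0, 1.0, 1.0)):
    """f(C, L, P) = k1*hC(C, Cmax) + k2*hL(L, Lmax) + k3*hP(P, Pmax)."""
    k1, k2, k3 = k
    return (k1 * violation(C, C_max)
            + k2 * violation(L, L_max)
            + k3 * violation(P, P_max))


# A design point that meets all constraints has cost 0;
# one that doubles the allowed latency is penalized through hL only.
feasible_cost = cost(C=90, L=1.0, P=5.0, C_max=100, L_max=1.0, P_max=10.0)
late_cost = cost(C=100, L=2.0, P=5.0, C_max=100, L_max=1.0, P_max=10.0)
```

With this choice, the weights k1, k2, k3 only matter for design points that violate at least one constraint; other penalty shapes are equally possible.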
Partitioning Methods – Overview
• exact methods
  – enumeration of solutions
  – integer linear programs (ILP)
• heuristic methods
  – constructive methods
    ▪ random mapping
    ▪ hierarchical clustering
  – iterative methods (refinement methods)
    ▪ greedy partitioners
    ▪ Kernighan-Lin
    ▪ simulated annealing
    ▪ evolutionary algorithms (design space exploration)
Enumeration
• Enumeration is not feasible even for small partitioning problems: assigning n objects to k partitions yields k^n candidate assignments
✎ Example: Enumeration
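To get a feeling for the growth: if each of n objects can go to any of k partitions, there are k^n assignments. The following sketch (function names are illustrative, not from the slides) counts and enumerates them.

```python
from itertools import product


def count_assignments(n: int, k: int) -> int:
    """Number of ways to assign n objects to k labeled partitions."""
    return k ** n


def enumerate_assignments(n: int, k: int):
    """Yield every assignment as a tuple: entry i is the partition of object i."""
    return product(range(k), repeat=n)


small = count_assignments(6, 2)      # a 6-task HW/SW bi-partitioning
large = count_assignments(100, 2)    # 100 tasks, still only two partitions
```

Six tasks can still be enumerated directly, whereas 100 tasks already give more than 10^30 candidates.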
ILP – Example
[Figure: problem graph with tasks 1 to 6 (problem); a processor and an ASIC connected by a bus (architecture)]

cost table
task   SW cost   HW cost
1      80        320
2      240       170
3      710       120
4      130       20
5      100       400
6      80        260

→ find optimal allocation & binding (all bindings possible)
ILP – Basic Problem Formulation
• binary variables xi,k = 1: object oi in partition pk
• cost ci,k (constants), if object oi is in partition pk
• integer linear program:

$$\sum_{k=1}^{m} x_{i,k} = 1 \qquad 1 \le i \le n$$

$$x_{i,k} \in \{0,1\} \qquad 1 \le i \le n,\ 1 \le k \le m$$

$$\text{minimize: } \sum_{k=1}^{m} \sum_{i=1}^{n} x_{i,k} \cdot c_{i,k} \qquad 1 \le k \le m,\ 1 \le i \le n$$
ILP – Illustration Binding
[Figure: specification graph (incomplete) for tasks 1 to 6 with architecture nodes vprocessor (partition 1) and vASIC (partition 2); the mapping edges of tasks 1 and 2 are labeled x11, x12, x21, x22]
n = 6, m = 2
n * m = 12 binary variables
ILP – Basic ILP in LP Format
/* objective function */
/* c1s = 80    c1h = 320 */
/* c2s = 240   c2h = 170 */
/* c3s = 710   c3h = 120 */
/* c4s = 130   c4h = 20  */
/* c5s = 100   c5h = 400 */
/* c6s = 80    c6h = 260 */
min: 80 x11 + 320 x12 + 240 x21 + 170 x22 + 710 x31 + 120 x32
   + 130 x41 + 20 x42 + 100 x51 + 400 x52 + 80 x61 + 260 x62;

/* unique mapping constraints */
x11 + x12 = 1;
x21 + x22 = 1;
x31 + x32 = 1;
x41 + x42 = 1;
x51 + x52 = 1;
x61 + x62 = 1;

/* variables in {0,1} */
x11 <= 1; x12 <= 1; x21 <= 1; x22 <= 1; x31 <= 1; x32 <= 1;
x41 <= 1; x42 <= 1; x51 <= 1; x52 <= 1; x61 <= 1; x62 <= 1;

/* integer variables */
int x11; int x12; int x21; int x22; int x31; int x32;
int x41; int x42; int x51; int x52; int x61; int x62;
ILP – Solution of Basic ILP w/ Cost Constraints
[Figure: allocation & binding of tasks 1 to 6 to vprocessor and vASIC]
n = 6, m = 2
n * m = 12 binary variables
total cost = 570 (each task is bound to its cheaper side: tasks 1, 5, 6 in SW; tasks 2, 3, 4 in HW)
ILP – Additional Constraints (1)
• at maximum hk objects in partition pk

$$\sum_{i=1}^{n} x_{i,k} \le h_k \qquad 1 \le k \le m$$

• maximal cost Hk of objects in partition pk

$$\sum_{i=1}^{n} c_{i,k} \cdot x_{i,k} \le H_k \qquad 1 \le k \le m$$
ILP – Additional Constraints (2)
• constraint on the hardware cost
  – cost of all tasks mapped to hardware must not exceed 300

$$\sum_{i=1}^{6} c_{i,2} \cdot x_{i,2} \le 300$$

/* hardware cost constraint */
320 x12 + 170 x22 + 120 x32 + 20 x42 + 400 x52 + 260 x62 <= 300;
ILP – Solution w/ Cost and Capacity Constraints
[Figure: allocation & binding of tasks 1 to 6 to vprocessor and vASIC]
n = 6, m = 2
n * m = 12 binary variables
constraint: HW cost <= 300
total cost = 640 (tasks 3 and 4 in HW with HW cost 140; all other tasks in SW)
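Because the example has only 2^6 = 64 bindings, the two ILP results can be cross-checked by plain enumeration. The sketch below is not an ILP solver; it simply tries every binding using the cost table from the slides. The function name `best_binding` and the `hw_limit` parameter are illustrative.

```python
from itertools import product

# Cost table from the slides: task -> cost in SW (processor) and HW (ASIC).
SW = {1: 80, 2: 240, 3: 710, 4: 130, 5: 100, 6: 80}
HW = {1: 320, 2: 170, 3: 120, 4: 20, 5: 400, 6: 260}


def best_binding(hw_limit=None):
    """Return (total cost, set of tasks bound to HW) of the cheapest binding."""
    tasks = sorted(SW)
    best = None
    for bits in product((0, 1), repeat=len(tasks)):
        hw_tasks = {t for t, b in zip(tasks, bits) if b}
        hw_cost = sum(HW[t] for t in hw_tasks)
        if hw_limit is not None and hw_cost > hw_limit:
            continue  # violates the hardware cost constraint
        total = hw_cost + sum(SW[t] for t in tasks if t not in hw_tasks)
        if best is None or total < best[0]:
            best = (total, hw_tasks)
    return best


unconstrained = best_binding()              # basic ILP
constrained = best_binding(hw_limit=300)    # with HW cost <= 300
```

The enumeration reproduces the totals 570 and 640 stated on the slides.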
ILP – Communication Cost Modeling (1)
• including communication cost
  – assume local communication is for free (SW-SW or HW-HW) but bus communication incurs costs (SW-HW or HW-SW)
  – for simplicity, we assume additive costs

communication cost: comi,j
edge i→j   local communication   bus communication
1→2        0                     20
1→3        0                     250
2→4        0                     200
2→5        0                     10
3→5        0                     100
4→6        0                     5
5→6        0                     50
ILP – Communication Cost Modeling (2)
• introduce additional binary variables
  – gi,j = 1, iff the communication from node i to node j uses the bus
• define gi,j in terms of the primary variables xi,k (non-linear!)

$$x_{i,1} \cdot x_{j,2} + x_{i,2} \cdot x_{j,1} = g_{i,j} \qquad \forall\, \text{edges}\ (v_i, v_j)$$

• linearization

$$x_{i,1} + x_{j,2} - 1 \le g_{i,j}, \qquad x_{i,2} + x_{j,1} - 1 \le g_{i,j} \qquad \forall\, \text{edges}\ (v_i, v_j)$$

• new cost/objective function

$$\min \sum_{i=1}^{n} \left( c_{i,1} \cdot x_{i,1} + c_{i,2} \cdot x_{i,2} \right) + \sum_{\text{edges}\ (v_i, v_j)} com_{i,j} \cdot g_{i,j}$$

✎ Excursus: Linearization of constraints
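The linearized inequalities only bound gi,j from below. Since the objective is minimized and the communication costs are positive, the solver picks the smallest feasible gi,j, which should coincide with the non-linear definition. The following sketch checks this for all four bindings of an edge, assuming the unique mapping constraint xi,1 + xi,2 = 1; the function names are illustrative.

```python
from itertools import product


def bus_used(xi1: int, xj1: int) -> int:
    """Non-linear definition: g = x_i1 * x_j2 + x_i2 * x_j1."""
    xi2, xj2 = 1 - xi1, 1 - xj1  # unique mapping: x_i1 + x_i2 = 1
    return xi1 * xj2 + xi2 * xj1


def smallest_feasible_g(xi1: int, xj1: int) -> int:
    """Smallest g >= 0 that satisfies both linearized inequalities."""
    xi2, xj2 = 1 - xi1, 1 - xj1
    return max(0, xi1 + xj2 - 1, xi2 + xj1 - 1)


agree = all(bus_used(a, b) == smallest_feasible_g(a, b)
            for a, b in product((0, 1), repeat=2))
```

For a binary g the lower bound 0 comes from the variable domain, so no extra constraint is needed for it.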
ILP – Extended ILP in LP Format
/* communication costs */
x12 + x21 - 1 <= g12;   x11 + x22 - 1 <= g12;
x12 + x31 - 1 <= g13;   x11 + x32 - 1 <= g13;
x22 + x41 - 1 <= g24;   x21 + x42 - 1 <= g24;
x22 + x51 - 1 <= g25;   x21 + x52 - 1 <= g25;
x32 + x51 - 1 <= g35;   x31 + x52 - 1 <= g35;
x42 + x61 - 1 <= g46;   x41 + x62 - 1 <= g46;
x52 + x61 - 1 <= g56;   x51 + x62 - 1 <= g56;

/* variables in {0,1} */
g12 <= 1; g13 <= 1; g24 <= 1; g25 <= 1; g35 <= 1; g46 <= 1; g56 <= 1;

/* integer variables */
int g12; int g13; int g24; int g25; int g35; int g46; int g56;

/* objective function */
/* c1s = 80    c1h = 320 */
/* c2s = 240   c2h = 170 */
/* c3s = 710   c3h = 120 */
/* c4s = 130   c4h = 20  */
/* c5s = 100   c5h = 400 */
/* c6s = 80    c6h = 260 */
/* g12 = 20 */
/* g13 = 250 */
/* g24 = 200 */
/* g25 = 10 */
/* g35 = 100 */
/* g46 = 5 */
/* g56 = 50 */
min: 80 x11 + 320 x12 + 240 x21 + 170 x22 + 710 x31 + 120 x32
   + 130 x41 + 20 x42 + 100 x51 + 400 x52 + 80 x61 + 260 x62
   + 20 g12 + 250 g13 + 200 g24 + 10 g25 + 100 g35 + 5 g46 + 50 g56;
ILP – Solution w/ HW and Communication Costs
[Figure: allocation & binding on the specification graph with vprocessor, vASIC and vbus; the problem graph edges carry the bus communication costs 1→2: 20, 1→3: 250, 2→4: 200, 2→5: 10, 3→5: 100, 4→6: 5, 5→6: 50]
n = 6, m = 2
constraint: HW cost <= 300
total cost = 1100 (task 3 in HW, all other tasks in SW: 750 for the tasks plus 350 for the bus edges 1→3 and 3→5)
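The extended model can again be cross-checked by enumeration. The sketch below adds the bus cost of every edge whose endpoints are bound to different sides; it uses the cost and communication tables from the slides, while the function name and data layout are illustrative.

```python
from itertools import product

SW = {1: 80, 2: 240, 3: 710, 4: 130, 5: 100, 6: 80}
HW = {1: 320, 2: 170, 3: 120, 4: 20, 5: 400, 6: 260}
# Bus communication cost per edge (local communication is free).
COM = {(1, 2): 20, (1, 3): 250, (2, 4): 200, (2, 5): 10,
       (3, 5): 100, (4, 6): 5, (5, 6): 50}


def best_binding_with_com(hw_limit=300):
    """Cheapest binding including bus costs, subject to HW cost <= hw_limit."""
    tasks = sorted(SW)
    best = None
    for bits in product((0, 1), repeat=len(tasks)):
        hw_tasks = {t for t, b in zip(tasks, bits) if b}
        hw_cost = sum(HW[t] for t in hw_tasks)
        if hw_cost > hw_limit:
            continue
        sw_cost = sum(SW[t] for t in tasks if t not in hw_tasks)
        # An edge uses the bus iff exactly one endpoint is in HW.
        bus_cost = sum(c for (i, j), c in COM.items()
                       if (i in hw_tasks) != (j in hw_tasks))
        total = hw_cost + sw_cost + bus_cost
        if best is None or total < best[0]:
            best = (total, hw_tasks)
    return best


result = best_binding_with_com()
```

Compared with the solution without communication costs, task 4 moves back to software because its edge 2→4 would otherwise cross the bus.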
Constructive Methods
• examples
  – random mapping
    ▪ each object is randomly assigned to some block
    ▪ disadvantage: the probability of generating a near-optimal partitioning is extremely low
  – hierarchical clustering
    ▪ define a closeness function that determines how desirable the grouping of two objects is
    ▪ perform a stepwise grouping of close objects or object groups
    ▪ disadvantage: no easy way to balance the size of the partitions
• constructive methods …
  – lead to valid, but suboptimal results
  – are often used to generate a start partition for iterative (refinement) methods
  – clustering often shows the difficulty of finding a suitable closeness function
Hierarchical Clustering (1)
[Figure: graph with objects v1, v2, v3, v4 and closeness values 20, 10, 10, 8, 4, 6 on its edges. Step 1 groups the closest pair: v5 = v1 ⋃ v3. The remaining closeness values among v5, v2, v4 are 10, 7, 4.]
closeness between hierarchical objects: arithmetic mean
Hierarchical Clustering (2)
[Figure: step 2 groups v6 = v2 ⋃ v5 (closeness 10); the closeness between v6 and v4 becomes 5.5]
Hierarchical Clustering (3)
[Figure: step 3 groups v7 = v6 ⋃ v4 (closeness 5.5)]
Hierarchical Clustering (4)
[Figure: cluster tree over v1, v2, v3, v4 with step 1: v5 = v1 ⋃ v3, step 2: v6 = v2 ⋃ v5, step 3: v7 = v6 ⋃ v4; cut lines through the tree define the partitions]
✎ Exercise 6.1: Hierarchical Clustering ✎ Exercise 6.2: Closeness Function
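The clustering steps can be sketched as follows. The closeness of a new group to any other object is taken as the arithmetic mean of the closeness values of its two members, as on the slides. The assignment of the individual edge values to node pairs is reconstructed from the intermediate results (20, then 10/7/4, then 5.5); in particular, which of the edges v1-v4 and v3-v4 carries 8 and which 6 is an assumption that does not affect the outcome.

```python
def hierarchical_clustering(closeness):
    """Repeatedly merge the closest pair; return a list of (a, b, new, closeness)."""
    c = {frozenset(pair): value for pair, value in closeness.items()}
    objects = sorted({o for pair in c for o in pair})
    next_id = len(objects) + 1
    merges = []
    while len(objects) > 1:
        pair = max(c, key=c.get)              # closest pair of objects
        a, b = sorted(pair)
        new = f"v{next_id}"
        next_id += 1
        merges.append((a, b, new, c.pop(pair)))
        objects = [o for o in objects if o not in pair]
        for o in objects:
            # Arithmetic mean of the closeness to the two merged objects
            # (a missing edge counts as closeness 0: an assumption).
            c[frozenset((new, o))] = (c.pop(frozenset((a, o)), 0)
                                      + c.pop(frozenset((b, o)), 0)) / 2
        objects.append(new)
    return merges


example = {("v1", "v3"): 20, ("v1", "v2"): 10, ("v2", "v3"): 10,
           ("v1", "v4"): 8, ("v3", "v4"): 6, ("v2", "v4"): 4}
merges = hierarchical_clustering(example)
```

Cutting the resulting tree after step 2, for instance, gives the two partitions {v1, v2, v3} and {v4}.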
Greedy Hw/Sw Partitioning (1)
• bi-partitioning (simplest case): P = {pSW, pHW}
• software oriented approach: P = {O, {}}
  – in SW all functions can be realized
  – performance might be too low → migrate objects to HW
• hardware oriented approach: P = {{}, O}
  – in HW the performance goals are satisfied (assumes HW is always faster than SW)
  – cost might be too high → migrate objects to SW
Greedy Hw/Sw Partitioning (2)
• migration of objects into the other partition (HW/SW), until there is no more improvement
REPEAT {
    Pold = P;
    FOR i = 1 TO n {
        IF (f(Move(P, oi)) < f(P)) {
            P = Move(P, oi);
        }
    }
} UNTIL (P == Pold)

cost function f()
✎ Example: Greedy structural partitioning ✎ Exercise 7.1: Greedy partitioning
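A minimal Python sketch of the greedy loop, applied to the six-task HW/SW example from the ILP slides with the constraint HW cost <= 300. It follows the software oriented approach (start with everything in SW). Treating infeasible bindings as infinitely expensive and ignoring communication cost are assumptions made to keep the example short.

```python
import math

SW = {1: 80, 2: 240, 3: 710, 4: 130, 5: 100, 6: 80}
HW = {1: 320, 2: 170, 3: 120, 4: 20, 5: 400, 6: 260}
HW_LIMIT = 300


def f(hw_tasks):
    """Cost function: total cost, or infinity if the HW cost limit is violated."""
    hw_cost = sum(HW[t] for t in hw_tasks)
    if hw_cost > HW_LIMIT:
        return math.inf
    return hw_cost + sum(SW[t] for t in SW if t not in hw_tasks)


def move(hw_tasks, t):
    """Move task t into the other partition."""
    return hw_tasks ^ {t}


def greedy(start=frozenset()):
    P = frozenset(start)
    while True:
        P_old = P
        for t in sorted(SW):
            candidate = move(P, t)
            if f(candidate) < f(P):
                P = candidate
        if P == P_old:          # no more improvement
            return P


hw_tasks = greedy()
greedy_cost = f(hw_tasks)
```

On this input the greedy partitioner stops with tasks 2 and 3 in hardware at cost 680, while the ILP finds 640: task 2 is accepted early and then blocks the better combination of tasks 3 and 4, a typical local optimum.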
Kernighan-Lin (1)
• iterative heuristic method for generating balanced bi-partitions
• problem definition
  – given: graph G(V,E) with weighted edges (V: set of vertices, E: set of edges)
  – find: partitioning into two balanced disjoint partitions A, B, such that the weight of edges between A and B is minimized
[Figure: example graph with vertices v1 to v9 and weighted edges]
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49:291–307, 1970.
Kernighan-Lin (2)
• definitions
  – Ia: internal cost of vertex a (sum of all edge weights between a in A and all other nodes in A)
  – Ea: external cost of vertex a (sum of all edge weights between a and all nodes b in B)
  – Da = Ea – Ia: difference between external and internal cost
  – ca,b: cost of the edge between a and b
• lemma
  – if two nodes a and b are exchanged, the cost is reduced by
      Told – Tnew = Da + Db – 2ca,b
Kernighan-Lin (3)
• proof
  – let z be the cost due to all connections between A and B that do not involve a or b, then:
  – cost before swapping a and b:
      T = z + Ea + Eb – ca,b   (subtract ca,b, otherwise counted twice)
  – cost after swapping a and b:
      T' = z + Ia + Ib + ca,b   (add ca,b because not considered in internal cost)
  – gain in cost: T – T' = (Ea – Ia) + (Eb – Ib) – 2ca,b = Da + Db – 2ca,b
[Figure: nodes a and b connected by an edge of cost ca,b, with internal costs Ia, Ib and external costs Ea, Eb]
Kernighan-Lin (4)
function Kernighan-Lin(G(V,E)):
    determine a balanced initial partition of the nodes into sets A and B
    do
        compute D values for all a in A and b in B
        let gv, av, and bv be empty lists
        for (n := 1 to |V|/2)
            find a from A and b from B, such that g = D[a] + D[b] - 2*c[a,b] is maximal
            move a to B and b to A
            remove a and b from further consideration in this pass
            add g to gv, a to av and b to bv
            update D values for the elements of A = A \ a and B = B \ b
        end for
        find k which maximizes g_max, the sum of g[1],...,g[k]
        if (g_max > 0) then
            exchange av[1],av[2],...,av[k] with bv[1],bv[2],...,bv[k]
    until (g_max <= 0)
    return G(V,E)

source: Wikipedia (en, 2.7.2015)
Kernighan-Lin Example (1)
• balanced initial partition A and B
• compute D values for all a in A and b in B
[Figure: example graph with vertices v1 to v8 and weighted edges; initial partition A = {v1, v2, v3, v7}, B = {v4, v5, v6, v8}]

      E   I   D = E - I
v1    1   3   -2
v2    1   4   -3
v3    1   5   -4
v7    0   4   -4
v4    1   2   -1
v5    1   2   -1
v6    1   2   -1
v8    0   2   -2
Kernighan-Lin Example (2)
• find a from A and b from B, such that g = D[a] + D[b] - 2*c[a,b] is maximal

g     v4               v5               v6               v8
v1    -2 -1 = -3       -2 -1 -2 = -5    -2 -1 = -3       -2 -2 = -4
v2    -3 -1 = -4       -3 -1 = -4       -3 -1 -2 = -6    -3 -2 = -5
v3    -4 -1 -2 = -7    -4 -1 = -5       -4 -1 = -5       -4 -2 = -6
v7    -4 -1 = -5       -4 -1 = -5       -4 -1 = -5       -4 -2 = -6
Kernighan-Lin Example (3)
• move a to B and b to A
• remove a and b from further consideration in this pass
• add g to gv, a to av and b to bv
  – gv = [-3]
  – av = [v1]
  – bv = [v4]
[Figure: example graph after tentatively exchanging v1 and v4]
Kernighan-Lin Example (4)
• update D values for the elements of A = A \ a and B = B \ b

updated values:
      E   I   D = E - I
v2    4   1   3
v3    0   6   -6
v7    0   4   -4
v5    0   3   -3
v6    1   2   -1
v8    2   0   2

previous values:
      E   I   D = E - I
v2    1   4   -3
v3    1   5   -4
v7    0   4   -4
v5    1   2   -1
v6    1   2   -1
v8    0   2   -2
Kernighan-Lin Example (5)
• iteration 2:
• find a from A and b from B, such that g = D[a] + D[b] - 2*c[a,b] is maximal

g     v5            v6               v8
v2    3 -3 = 0      3 -1 -2 = 0      3 + 2 = 5
v3    -6 -3 = -9    -6 -1 = -7       -6 + 2 = -4
v7    -4 -3 = -7    -4 -1 = -5       -4 + 2 = -2

• gv = [-3, 5]
• av = [v1, v2]
• bv = [v4, v8]
Kernighan-Lin Example (6)
• iteration 3:
• find a from A and b from B, such that g = D[a] + D[b] - 2*c[a,b] is maximal

      E   I   D = E - I
v3    1   5   -4
v7    0   4   -4
v5    0   3   -3
v6    0   3   -3

g     v5            v6
v3    -4 -3 = -7    -4 -3 = -7
v7    -4 -3 = -7    -4 -3 = -7

• gv = [-3, 5, -7]
• av = [v1, v2, v3]
• bv = [v4, v8, v5]
Kernighan-Lin Example (7)
• iteration 4:
• find a from A and b from B, such that g = D[a] + D[b] - 2*c[a,b] is maximal

      E   I   D = E - I
v7    4   0   4
v6    2   1   1

g     v6
v7    4 + 1 = 5

• gv = [-3, 5, -7, 5]
• av = [v1, v2, v3, v7]
• bv = [v4, v8, v5, v6]
• find k which maximizes g_max, the sum of g[1],...,g[k]
  – k=1: g_sum = -3
  – k=2: g_sum = 2
  – k=3: g_sum = -5
  – k=4: g_sum = 0
  – select k=2
[Figure: example graph after all four tentative exchanges; A and B are swapped]
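One pass of the algorithm can be sketched in Python and run on the example. The edge list below is not printed on the slides; it is reconstructed from the E and I tables of the example (cut edges v1-v5, v2-v6, v3-v4 with weight 1, and so on) and should be read as an assumption. For simplicity the sketch recomputes all D values in every iteration instead of updating them incrementally, and ties are broken in favor of the first pair found.

```python
def kl_pass(edges, A, B):
    """One Kernighan-Lin pass; returns gv, av, bv, best k, g_max and the new partition."""
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    side = {v: "A" for v in A}
    side.update({v: "B" for v in B})

    def D(v):  # external minus internal cost w.r.t. the tentative partition
        ext = sum(c for n, c in adj.get(v, {}).items() if side[n] != side[v])
        inn = sum(c for n, c in adj.get(v, {}).items() if side[n] == side[v])
        return ext - inn

    free_a, free_b = list(A), list(B)
    gv, av, bv = [], [], []
    while free_a and free_b:
        g, a, b = max(((D(a) + D(b) - 2 * adj.get(a, {}).get(b, 0), a, b)
                       for a in free_a for b in free_b), key=lambda t: t[0])
        side[a], side[b] = "B", "A"          # tentative exchange
        free_a.remove(a)
        free_b.remove(b)
        gv.append(g); av.append(a); bv.append(b)

    best_k, g_max, running = 0, 0, 0         # best prefix of the gain list
    for k, g in enumerate(gv, start=1):
        running += g
        if running > g_max:
            best_k, g_max = k, running
    new_A = [v for v in A if v not in av[:best_k]] + bv[:best_k]
    new_B = [v for v in B if v not in bv[:best_k]] + av[:best_k]
    return gv, av, bv, best_k, g_max, new_A, new_B


def cut_weight(edges, A):
    return sum(c for (u, v), c in edges.items() if (u in A) != (v in A))


# Reconstructed example graph (assumption, see lead-in).
EDGES = {("v1", "v2"): 3, ("v2", "v3"): 1, ("v3", "v7"): 4, ("v1", "v5"): 1,
         ("v2", "v6"): 1, ("v3", "v4"): 1, ("v4", "v8"): 2, ("v5", "v6"): 2}
A0, B0 = ["v1", "v2", "v3", "v7"], ["v4", "v5", "v6", "v8"]
gv, av, bv, k, g_max, A1, B1 = kl_pass(EDGES, A0, B0)
```

Exchanging the first k = 2 pairs reduces the cut weight by g_max = 2. A full implementation would repeat such passes until g_max <= 0.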
Kernighan-Lin (5)
• algorithmic complexity of one update loop: O(n² log n)
• widely used in partitioning of netlists for digital circuits
• there exist several extensions
  – unequal partition sizes
  – n-way partitioning
  – finding near-optimal partitions
Simulated Annealing (1)
• simulated annealing
  – nature-inspired optimization method (meta-heuristic)
  – metal and glass take on minimal energy states when they are cooled down under certain conditions:
    ▪ for each temperature, thermodynamic equilibrium is reached
    ▪ the temperature is decreased arbitrarily slowly
  – probability for a particle jumping into a higher energy state:

$$P(e_i, e_j, T) = e^{\frac{e_i - e_j}{k_B T}}$$

• application to combinatorial optimization
  – energy = cost of a solution
  – the cost is reduced as the simulated temperature decreases; increases in cost are sometimes accepted
Simulated Annealing (2)
temp = temp_start;
cost = c(P);
WHILE (Frozen() == FALSE) {
    WHILE (Equilibrium() == FALSE) {
        P' = RandomMove(P);
        cost' = c(P');
        deltacost = cost' - cost;
        IF (Accept(deltacost, temp) > random[0,1)) {
            P = P';
            cost = cost';
        }
    }
    temp = DecreaseTemp(temp);
}

deltacost < 0 → improvement

$$\mathrm{Accept}(deltacost, temp) = \min\left(1,\ e^{-\frac{deltacost}{k \cdot temp}}\right)$$
Simulated Annealing (3)
• accept function

$$\mathrm{accept}(deltacost, t) = \min\left(1,\ e^{-\frac{deltacost}{k \cdot t}}\right), \qquad k = 1$$

[Figure: plot of accept(deltacost, t) over deltacost from -2 to 3 (a negative value means improvement) for t = 0.1, t = 0.5, t = 1.0 and t = 1.5]
Simulated Annealing (4)
• annealing schedule: DecreaseTemp(), Frozen()
  – temp_start = 1.0
  – temp = a · temp (typical: 0.8 ≤ a ≤ 0.99)
  – stop at temp < temp_min or if there is no more improvement
• equilibrium: Equilibrium()
  – after a certain number of iterations or if there is no more improvement
• complexity
  – from exponential to constant, depending on the implementation of the functions Equilibrium(), DecreaseTemp(), Frozen()
  – the longer the runtime, the better the results
  – usually functions are constructed to get polynomial runtime
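A compact Python sketch of the annealing loop, applied to the six-task HW/SW example with the constraint HW cost <= 300. The concrete choices are assumptions for illustration only: a geometric schedule temp = a · temp, a fixed number of moves per temperature as Equilibrium(), temp < temp_min as Frozen(), a start temperature scaled to the cost values of the example (which are in the hundreds), and infinite cost for infeasible bindings. Unlike the pseudocode on the slides, the sketch also remembers the best solution seen.

```python
import math
import random

SW = {1: 80, 2: 240, 3: 710, 4: 130, 5: 100, 6: 80}
HW = {1: 320, 2: 170, 3: 120, 4: 20, 5: 400, 6: 260}
HW_LIMIT = 300


def c(hw_tasks):
    """Cost of a binding; infeasible bindings get infinite cost (assumption)."""
    hw_cost = sum(HW[t] for t in hw_tasks)
    if hw_cost > HW_LIMIT:
        return math.inf
    return hw_cost + sum(SW[t] for t in SW if t not in hw_tasks)


def accept(deltacost, temp, k=1.0):
    """min(1, exp(-deltacost / (k * temp))); improvements are always accepted."""
    if deltacost <= 0:
        return 1.0
    return math.exp(-deltacost / (k * temp))


def simulated_annealing(temp_start=500.0, temp_min=1.0, a=0.9,
                        moves_per_temp=20, seed=0):
    rng = random.Random(seed)
    P = frozenset()                      # software oriented start
    cost = c(P)
    best, best_cost = P, cost
    temp = temp_start
    while temp >= temp_min:              # Frozen()
        for _ in range(moves_per_temp):  # Equilibrium()
            P_new = P ^ {rng.choice(sorted(SW))}   # RandomMove: flip one task
            cost_new = c(P_new)
            if accept(cost_new - cost, temp) > rng.random():
                P, cost = P_new, cost_new
                if cost < best_cost:
                    best, best_cost = P, cost
        temp = a * temp                  # DecreaseTemp()
    return best, best_cost


best, best_cost = simulated_annealing()
```

The result depends on the random seed and the schedule; it can never be better than the ILP optimum of 640 for this instance and never worse than the all-software start.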
Partitioning Methods – Recap
• exact methods
  – enumeration of solutions
  – integer linear programs (ILP)
• heuristic methods
  – constructive methods
    ▪ random mapping
    ▪ hierarchical clustering
  – iterative methods (refinement methods)
    ▪ greedy partitioners
    ▪ Kernighan-Lin
    ▪ simulated annealing
    ▪ evolutionary algorithms (design space exploration)
Changes
• 2017-07-10 (v1.6.1)
  – added extensive example on Kernighan-Lin
• 2017-06-25 (v1.6.0)
  – updated for SS2017
• 2016-07-05 (v1.5.2)
  – fixed typo in equation on slide 43
• 2016-06-21 (v1.5.1)
  – fixed many equations that were broken due to PowerPoint update
• 2016-06-05 (v1.5.0)
  – updated for SS2016
• 2015-07-02 (v1.4.1)
  – corrected Kernighan-Lin pseudocode (slide 41)
• 2015-06-18 (v1.4.0)
  – updated for SS2015
• 2014-06-26 (v1.3.1)
  – minor cosmetic changes
• 2014-06-11 (v1.3.0)
  – updated for SS2014 (added detailed description of Kernighan-Lin algorithm)
• 2013-06-12 (v1.2.0)
  – updated for SS2013
• 2012-05-12 (v1.1.0)
  – updated for SS2012
• 2011-06-09 (v1.0.1)
  – cosmetic changes
• 2011-06-07 (v1.0.0)
  – initial version