Algorithms and data structures
Amin Coja-Oghlan
LFCS
Reminder: the minimum spanning tree problem
Reminder: MST
Input: A connected weighted graph G = (V ,E ,W ).
Output: a subgraph H = (V_H, E_H) (i.e., V_H ⊆ V and E_H ⊆ E) such that
1 H is spanning, i.e., V_H = V.
2 H is connected.
3 The weight W(H) = ∑_{e ∈ E_H} W(e) of H is minimum (among all subgraphs satisfying 1. and 2.).
In words: H is a minimum weight subgraph that connects all vertices.
Consequently, we could add the constraint that H must be a tree (because any connected graph contains a spanning tree).
This is called the minimum spanning tree problem (“MST”).
A. Coja-Oghlan (LFCS) Algorithms and data structures 2 / 27
Kruskal’s algorithm
The idea
Remember that a forest is an acyclic graph (and thus all its components are trees).
Starting from a spanning “forest” without any edges, Kruskal keeps merging components of the forest as cheaply as possible (greedy strategy).
We will see that the resulting tree is an MST.
Algorithm Kruskal(G )
Input: a connected weighted graph G = (V, E, W). Output: an MST.
1 Let F = ∅. Sort the edges E = {e_1, . . . , e_m} increasingly by weight.
2 For i = 1, . . . , m do
3 if e_i connects two different components of (V, F), add e_i to F.
4 Return (V ,F ).
Kruskal’s algorithm: correctness
Theorem
Given a connected weighted graph G = (V, E, W), Kruskal outputs an MST.
Proof, part 1: (V , F ) is a spanning forest at all times.
(V ,F ) is clearly spanning.
Kruskal only adds edges that join two different components, and hence does not create any cycles.
Proof, part 2: the output of Kruskal is connected.
Assume for contradiction that (V ,F ) is not connected.
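Then it is a forest with at least two components C_1, C_2.
Since G is connected, there is an edge joining C_1, C_2 in G.
Let j be the minimum index such that e_j joins C_1, C_2.
Then Kruskal should have added e_j to F: components only ever grow, so when e_j was processed its endpoints lay in two different components of (V, F). This is a contradiction.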
Proof, part 3: throughout (V , F ) is contained in an MST.
Similar to the argument for Prim.
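Proof, part 4: conclusion.
By parts 1 and 2 the output (V, F) is a spanning tree, and by part 3 it is contained in an MST. Since an MST is itself a spanning tree, both have exactly |V| − 1 edges, so the output equals that MST.
Combining parts 1–3 completes the proof.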
Implementing Kruskal
Data structure: disjoint sets
To implement Step 3, we need to keep track of the components of (V, F).
The components are disjoint sets of vertices.
They are dynamic, as we occasionally merge two components.
We will associate each component with a representative, which is a vertex in the component.
We will need the following operations:
MakeSet(x): create a new set whose only member (and representative) is x.
Union(x, y): replace the sets containing x and y by their union.
FindSet(x): compute the representative of the set containing x.
Implementing Kruskal (ctd.)
Algorithm Kruskal(G )
Input: a weighted connected graph G = (V, E, W). Output: an MST.
1 Let F = ∅.
2 For all v ∈ V call MakeSet(v).
3 Sort edges E = {e_1, . . . , e_m} so that W(e_i) ≤ W(e_{i+1}) (1 ≤ i < m).
4 For i = 1, . . . , m do
5 If e_i = {x_i, y_i} has the property FindSet(x_i) ≠ FindSet(y_i), then
6 add e_i to F and call Union(x_i, y_i).
7 Return F.
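The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `kruskal`, `find_set` and `union`, the dictionary `parent`, and the (weight, x, y) edge format are not from the slides, and the disjoint sets are kept in the simplest possible way (parent pointers without any heuristic), not in an efficient implementation.

```python
def kruskal(vertices, edges):
    """Return the edge list F of an MST of a connected weighted graph.

    `edges` is an iterable of (weight, x, y) triples (an assumed format).
    """
    # Step 2: MakeSet(v) for every vertex; each vertex is its own representative.
    parent = {v: v for v in vertices}

    def find_set(x):
        # Follow parent pointers up to the representative.
        while parent[x] != x:
            x = parent[x]
        return x

    def union(x, y):
        # Naive union: hang the representative of x below that of y.
        parent[find_set(x)] = find_set(y)

    forest = []  # Step 1: F is empty
    # Steps 3-6: scan the edges by increasing weight.
    for w, x, y in sorted(edges, key=lambda e: e[0]):
        if find_set(x) != find_set(y):
            forest.append((w, x, y))
            union(x, y)
    return forest  # Step 7


edges = [(1, "a", "b"), (4, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
print(kruskal("abcd", edges))
```

On this small example the edges of weight 1, 2 and 3 are accepted and the edges of weight 4 and 5 are rejected, because their endpoints already lie in the same component.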
Implementing Kruskal (ctd.)
The running time
n calls of MakeSet.
Time to sort the m edges: O(m ln m).
m calls of FindSet.
Up to n − 1 calls of Union.
Using an efficient implementation of disjoint sets, the total running time is O(m ln m); the sorting step dominates.
Implementing disjoint sets
Amortized analysis
We need to analyze the total running time of
n calls of MakeSet,
m calls of FindSet, and
n − 1 calls of Union.
“Classical” approach: use individual worst-case bounds:
n × worst case for MakeSet
+ m × worst case for FindSet
+ (n − 1) × worst case for Union.
Amortized analysis:
analyze the cost of the entire sequence directly;
take into account that most operations in the sequence don’t attain the worst-case bound!
Implementing disjoint sets (ctd.)
Linked lists
Use a linked list for each set.
Representative of the set is at the head of the list.
Each element has a pointer directly to the representative (the head of its list).
Implementing disjoint sets (ctd.)
Linked lists: example
Linked list representation of
{ a, f }, { b }, { g , c , e }, { d } :
[Figure: four linked lists, one per set: a → f; b; g → c → e; d.]
The representatives are a, b, g and d .
Implementing disjoint sets (ctd.)
Linked lists (ctd.)
MakeSet: just generate a new linked list; Θ(1) time.
FindSet: follow the pointer to the representative; Θ(1) time.
Union(x, y): the naive approach is to append the list of x onto the end of the list of y.
For Union it may help to have a pointer to the last entry of each linked list.
Snag: we have to update the representative pointer of each entry in the list of x.
Cost for naive Union(x, y): Θ(length of list of x).
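A minimal Python sketch of this linked-list representation, assuming a hypothetical `Node` class whose fields `next`, `rep` and `tail` are names chosen here for illustration. `union` returns the number of representative pointers it had to update, which is exactly the cost discussed above.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None   # next entry in the list
        self.rep = self    # pointer to the representative (head of the list)
        self.tail = self   # last entry of the list; only maintained at the head


def make_set(value):
    # A new one-element list; constant time.
    return Node(value)


def find_set(node):
    # Follow the pointer to the representative; constant time.
    return node.rep


def union(x, y):
    """Naive union: append the list of x onto the end of the list of y."""
    hx, hy = find_set(x), find_set(y)
    if hx is hy:
        return 0
    hy.tail.next = hx      # link the head of x's list behind y's last entry
    hy.tail = hx.tail
    updated = 0
    node = hx
    while node is not None:  # the snag: every entry of x's list is touched
        node.rep = hy
        updated += 1
        node = node.next
    return updated


g, c, e, b = (make_set(v) for v in "gceb")
union(c, g)
union(e, g)          # the list of g is now g -> c -> e
print(union(g, b))   # number of representative pointers updated
```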
Implementing disjoint sets (ctd.)
Example: Union(g , b)
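[Figure: the linked lists after Union(g, b): the lists of g and b have been merged into one list; the lists a → f and d are unchanged.]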
Implementing disjoint sets (ctd.)
Conventions for the further analysis
ν = # of MakeSet operations.
µ = total # of MakeSet, Union, and FindSet operations.
Note that after ν − 1 Unions only one set remains.
Observe that µ ≥ ν.
Implementing disjoint sets (ctd.)
A nasty example
Let ν = ⌈µ/2⌉ and q = µ − ν. Elements: x_1, . . . , x_ν.

Operation              Number of objects updated
MakeSet(x_1)           1
MakeSet(x_2)           1
...                    ...
MakeSet(x_ν)           1
Union(x_1, x_2)        1
Union(x_2, x_3)        2
Union(x_3, x_4)        3
...                    ...
Union(x_{q−1}, x_q)    q − 1
Total                  Θ(µ^2)
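The total in the table can be checked with a small simulation. The sketch below is an illustration only: the function name `naive_union_cost` and the dictionaries `rep` and `members` are assumptions, and Python lists stand in for the linked lists. It replays the sequence of the table with the naive Union and counts the updated objects.

```python
def naive_union_cost(mu):
    """Count objects updated by the nasty sequence with mu operations."""
    nu = -(-mu // 2)          # ceil(mu / 2) MakeSet operations
    q = mu - nu
    rep = {i: i for i in range(1, nu + 1)}        # element -> representative
    members = {i: [i] for i in range(1, nu + 1)}  # representative -> its list
    updates = nu              # each MakeSet touches one object
    for i in range(1, q):     # Union(x_i, x_{i+1}) for i = 1, ..., q - 1
        src, dst = rep[i], rep[i + 1]
        for x in members[src]:        # naive: relabel the whole list of x_i
            rep[x] = dst
            updates += 1
        members[dst].extend(members.pop(src))
    return updates


for mu in (10, 100, 1000):
    print(mu, naive_union_cost(mu))
```

The count equals ν + q(q − 1)/2, which grows quadratically in µ.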
Implementing disjoint sets (ctd.)
Linked lists: fix (“weighted union heuristic”)
Record the length of each list.
Implement Union(x, y) so that it always appends the shorter list to the longer list (break ties arbitrarily).
Theorem
Using linked lists with the above fix, a sequence of
µ MakeSet, Union, and FindSet operations,
among which ν are MakeSet operations,
takes O(µ + ν ln ν) time.
Proof.
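Basic insight: each element “migrates” (i.e., has its representative pointer updated) at most log_2 ν times, because every migration at least doubles the length of the list containing it. Hence all Unions together take O(ν ln ν) time, and each MakeSet and FindSet takes O(1) time.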
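A sketch of the weighted union heuristic in Python. The class name `WeightedSets` and its fields are assumptions made for this illustration, and Python lists again stand in for linked lists with a stored length. The counter `updates` records how often an element migrates.

```python
class WeightedSets:
    def __init__(self):
        self.rep = {}       # element -> representative
        self.members = {}   # representative -> list of the elements of its set
        self.updates = 0    # total number of migrations so far

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        a, b = self.rep[x], self.rep[y]
        if a == b:
            return
        if len(self.members[a]) > len(self.members[b]):
            a, b = b, a     # make a the representative of the shorter list
        for z in self.members[a]:   # only the shorter list migrates
            self.rep[z] = b
            self.updates += 1
        self.members[b].extend(self.members.pop(a))


s = WeightedSets()
n = 1024
for i in range(1, n + 1):
    s.make_set(i)
for i in range(1, n):
    s.union(i, i + 1)       # the sequence that was quadratic for naive Union
print(s.updates)
```

On this sequence every Union moves a single element, so the number of migrations is n − 1 rather than quadratic.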
Implementing disjoint sets (ctd.)
The forest implementation of disjoint sets
Each set is represented by a (rooted) tree.
[Figure: rooted trees on the elements a, . . . , i, one tree per set.]
Implementing disjoint sets (ctd.)
The forest implementation of disjoint sets (ctd.)
MakeSet: time Θ(1); just plant a new tree.
FindSet: follow the parent pointers to the root of the corresponding tree; Θ(height of tree).
Union(x, y): naive idea: make the root of the tree of x a child of the root of the tree of y.
This is no faster than linked lists.
Forests: improving the running time
Keep the trees low!
Union by rank: attach the lower tree to the root of the higher tree.
Path compression: upon performing FindSet, place all vertices on the path directly under the root.
Implementing disjoint sets (ctd.)
Union by rank
For each root x maintain a variable rank[x], which is the height of the tree “below” x.
When performing Union(x, y), make the root with the smaller rank a child of the one with the larger rank.
If a tie occurs, make the root of x a child of the root of y and increase the rank of the root of y.
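The rule above might look as follows in Python. This is a sketch without path compression, so the rank of a root really is the height of its tree. The class name `RankForest` and the dictionaries `parent` and `rank` are names assumed for this example.

```python
class RankForest:
    def __init__(self):
        self.parent = {}   # vertex -> parent; a root is its own parent
        self.rank = {}     # vertex -> rank (the height of the tree below it)

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        # Follow the parent pointers to the root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        r, s = self.find_set(x), self.find_set(y)
        if r == s:
            return
        if self.rank[r] > self.rank[s]:
            r, s = s, r                 # now rank[r] <= rank[s]
        self.parent[r] = s              # smaller rank becomes the child
        if self.rank[r] == self.rank[s]:
            self.rank[s] += 1           # tie: the new root gets one level higher


f = RankForest()
for v in range(8):
    f.make_set(v)
for x, y in [(0, 1), (2, 3), (4, 5), (6, 7), (1, 3), (5, 7), (3, 7)]:
    f.union(x, y)
root = f.find_set(0)
print(root, f.rank[root])
```

In this run every Union is a tie, which is the only way for the height to grow; eight vertices end up in a tree of height 3 = log_2 8.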
Implementing disjoint sets (ctd.)
Union by rank: example
[Figure: union by rank for the operations Union(f, g) and Union(f, d), showing the trees and the ranks of their roots before and after each operation.]
Implementing disjoint sets (ctd.)
Lemma
Using “union by rank” ensures that the height of any tree is at most log_2(#vertices in the tree).
Proof. We proceed by induction on the # of Union operations.
no Union ⇒ all sets are singletons (height 0 = log_2 1).
Suppose Union(x, y) is called. Let r, s be the roots of the trees of x, y.
Case 1: rank(r) < rank(s) ⇒ the height of the s-tree stays the same (but the number of vertices increases).
Case 2: rank(s) < rank(r): analogously.
Case 3: rank(s) = rank(r): the height increases by one, and by induction
#vertices(s) + #vertices(r) ≥ 2^{rank(r)} + 2^{rank(s)} = 2^{rank(s)+1}.
Implementing disjoint sets (ctd.)
Union by rank: running time
MakeSet takes constant time.
The time for FindSet is bounded by the rank and hence O(ln ν) by the lemma.
The time needed for Union is bounded by the rank, too, and hence O(ln ν) by the lemma.
Hence, for µ operations, among which ν are MakeSet, we need O(µ ln ν) time.
No better than linked lists. . . :-(
Implementing disjoint sets (ctd.)
Path compression: idea
When performing FindSet(x), make each vertex on the path from x to the root point directly to the root. Below, π(x) denotes the parent of x.
Algorithm FindSet(x)
1 If x is the root, then return x . Otherwise do the following.
2 Let π(x) = FindSet(π(x)).
3 Return π(x).
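The recursive FindSet translates almost literally into Python. In this sketch the dictionary `parent` plays the role of π, and a root is assumed to be its own parent; both are conventions chosen for the example.

```python
def find_set(parent, x):
    """Return the root of x and make every vertex on the path point to it."""
    if parent[x] == x:                           # step 1: x is the root
        return x
    parent[x] = find_set(parent, parent[x])      # step 2: compress the path
    return parent[x]                             # step 3


# A path d -> c -> b -> a with root a.
parent = {"a": "a", "b": "a", "c": "b", "d": "c"}
print(find_set(parent, "d"))
print(parent)
```

After the call every vertex on the path has the root as its parent. For very long paths the recursion could exceed Python's default recursion limit; an iterative two-pass variant avoids that.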
Implementing disjoint sets (ctd.)
Path compression: example
[Figure: a tree on the vertices a, . . . , f before and after a FindSet operation with path compression.]
Implementing disjoint sets (ctd.)
The Ackermann function. . .
. . . is the function A : N × N → N defined by the recurrences
A(1, j) = 2^j (j ≥ 1),
A(i, 1) = A(i − 1, 2) (i ≥ 2),
A(i, j) = A(i − 1, A(i, j − 1)) (i, j ≥ 2).
A(i, j) grows extremely fast.
We are mainly interested in the so-called inverse Ackermann function
α(m, n) = min{i ≥ 1 : A(i, ⌊m/n⌋) > log_2 n}.
α(m, n) grows really slowly.
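To get a feeling for how slowly α grows, the definitions above can be evaluated directly. The sketch below is an illustration under assumptions: the helper `ackermann_capped` and its parameter `cap` are not from the slides. Since A(i, j) is far too large to compute in general, the helper only reports the value when it is at most `cap`. It relies on the fact that A(i, j) ≥ 2^j > j, so an argument larger than `cap` already forces a result larger than `cap`. The function `alpha` assumes m ≥ n ≥ 1.

```python
def ackermann_capped(i, j, cap):
    """Return A(i, j) if A(i, j) <= cap, otherwise None."""
    if j > cap:                 # A(i, j) >= 2^j > j > cap
        return None
    if i == 1:
        value = 2 ** j
        return value if value <= cap else None
    if j == 1:
        return ackermann_capped(i - 1, 2, cap)
    inner = ackermann_capped(i, j - 1, cap)
    if inner is None:           # A(i - 1, inner) > inner > cap
        return None
    return ackermann_capped(i - 1, inner, cap)


def alpha(m, n):
    """Smallest i >= 1 with A(i, floor(m / n)) > log2(n); assumes m >= n >= 1."""
    cap = n.bit_length() - 1    # floor(log2 n); A is an integer, so this suffices
    j = m // n
    i = 1
    while ackermann_capped(i, j, cap) is not None:
        i += 1
    return i


print(ackermann_capped(2, 3, 10**6))
print(alpha(10, 10), alpha(2**16, 2**16))
```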
Implementing disjoint sets (ctd.)
The Ackermann function: a few numbers
         j = 1       j = 2        j = 3             j = 4
i = 1    2^1         2^2          2^3               2^4
i = 2    2^2         2^(2^2)      2^(2^(2^2))       2^(2^(2^(2^2)))
i = 3    2^(2^2)     A(2, 16)     A(2, A(2, 16))    A(2, A(2, A(2, 16)))

Here A(2, j) is a tower of j + 1 twos, so A(3, 2) = A(2, 16) is a tower of 17 twos.
Historical importance of A(i, j): it showed that the class of primitive recursive functions does not include all computable functions.
A(i , j) grows faster than any primitive recursive function.
Implementing disjoint sets (ctd.)
Theorem
With both the union by rank and the path compression method, the worst-case running time for
µ MakeSet, Union, and FindSet operations,
among which ν are MakeSet operations,
is O(µ · α(µ, ν)).
Remarks
Neither union by rank nor path compression alone suffices to improve on linked lists.
Together they essentially yield linear time.
Reading assignment
Take a look at. . .
[CLRS] chapter 21.
[CLRS] chapter 23.