algorithms and data structures · a. coja-oghlan (lfcs) algorithms and data structures 2 / 27....

Algorithms and data structures

Amin Coja-Oghlan

LFCS

Reminder: the minimum spanning tree problem

Reminder: MST

Input: A connected weighted graph G = (V, E, W).

Output: a subgraph H = (V_H, E_H) (i.e., V_H ⊆ V and E_H ⊆ E) such that

1. H is spanning, i.e., V_H = V.
2. H is connected.
3. The weight W(H) = ∑_{e ∈ E_H} W(e) of H is minimum (among all subgraphs satisfying 1. and 2.).

In words: H is a minimum weight subgraph that connects all vertices.

Consequently, we could add the constraint that H must be a tree (because any connected graph contains a spanning tree).

This is called the minimum spanning tree problem (“MST”).


Kruskal’s algorithm

The idea

Remember that a forest is an acyclic graph (and thus all its components are trees).

Starting from a spanning “forest” without any edges, Kruskal keeps merging components of the forest as cheaply as possible (greedy strategy).

We will see that the resulting tree is an MST.


Algorithm Kruskal(G)

Input: a connected weighted graph G = (V, E, W). Output: an MST.

1. Let F = ∅. Sort the edges E = {e_1, ..., e_m} increasingly by weight.
2. For i = 1, ..., m do
3.     if e_i connects two different components of (V, F), add e_i to F.
4. Return (V, F).
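The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `kruskal_naive` and the edge format `(weight, u, v)` are hypothetical choices, not part of the lecture, and the components of (V, F) are tracked by a plain label per vertex rather than by an efficient data structure.

```python
def kruskal_naive(vertices, edges):
    """Return the edges of an MST of a connected weighted graph.

    Assumed input format: edges is a list of (weight, u, v) triples.
    """
    # comp[v] is a label for the component of (V, F) that contains v.
    comp = {v: v for v in vertices}
    forest = []
    for w, u, v in sorted(edges, key=lambda e: e[0]):  # increasing weight
        if comp[u] != comp[v]:        # the edge joins two different components
            forest.append((w, u, v))
            old, new = comp[u], comp[v]
            for x in vertices:        # relabel one component: Θ(n) per merge
                if comp[x] == old:
                    comp[x] = new
    return forest
```

For example, on the vertices a, b, c, d with edges `(1, "a", "b")`, `(4, "a", "c")`, `(3, "b", "c")`, `(2, "c", "d")`, `(5, "b", "d")`, the sketch selects the three edges of weight 1, 2 and 3. The relabelling loop is the part that a disjoint-set data structure is meant to speed up.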


Kruskal’s algorithm: correctness

Theorem

Given a connected weighted graph G = (V, E, W), Kruskal outputs an MST.





Proof, part 1: (V, F) is a spanning forest at all times.

(V, F) is clearly spanning.

Kruskal only adds edges that join two different components, and hence never creates a cycle.





Proof, part 2: the output of Kruskal is connected.

Assume for contradiction that (V, F) is not connected.

Then it is a forest with at least two components.

Since G is connected, some edge of G joins two different components C_1, C_2 of (V, F).

Let j be the minimum index such that e_j joins C_1, C_2.

Since F only grows, the endpoints of e_j were already in different components when Kruskal examined e_j. Hence Kruskal would have added e_j to F, a contradiction.





Proof, part 3: throughout (V , F ) is contained in an MST.

Similar to the argument for Prim.





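Proof, part 4: conclusion.

By parts 1 and 2 the output is a spanning tree, and by part 3 it is contained in an MST. Since all spanning trees of G have the same number of edges, the output is itself an MST. Thus combining parts 1–3 completes the proof.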


Implementing Kruskal

Data structure: disjoint sets

To implement Step 3, we need to keep track of the components of (V, F).

The components are disjoint sets of vertices.

They are dynamic, as we occasionally merge two components.

We will associate each component with a representative, which is a vertex in the component.

We will need the following operations:

MakeSet(x): create a new set whose only member (and representative) is x.

Union(x, y): replace the sets containing x and y by their union.

FindSet(x): compute the representative of the set containing x.


Implementing Kruskal (ctd.)


Input: a weighted connected graph G = (V, E, W). Output: an MST.

1. Let F = ∅.
2. For all v ∈ V call MakeSet(v).
3. Sort edges E = {e_1, ..., e_m} so that W(e_i) ≤ W(e_{i+1}) (1 ≤ i < m).
4. For i = 1, ..., m do
5.     If e_i = {x_i, y_i} has the property FindSet(x_i) ≠ FindSet(y_i), then
6.         add e_i to F and call Union(x_i, y_i).
7. Return F.
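A minimal Python sketch of steps 1 to 7, assuming edges are given as `(weight, x, y)` triples (a hypothetical format). The three operations are implemented here by a deliberately simple parent-pointer dictionary that serves only as a stand-in; the function and helper names are illustrative.

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm on top of MakeSet / FindSet / Union.

    Assumed input format: edges is a list of (weight, x, y) triples.
    """
    parent = {}

    def make_set(x):
        parent[x] = x                  # x is its own representative

    def find_set(x):
        while parent[x] != x:          # walk up to the representative
            x = parent[x]
        return x

    def union(x, y):
        parent[find_set(x)] = find_set(y)

    forest = []                                          # step 1
    for v in vertices:                                   # step 2
        make_set(v)
    for w, x, y in sorted(edges, key=lambda e: e[0]):    # steps 3 and 4
        if find_set(x) != find_set(y):                   # step 5
            forest.append((w, x, y))                     # step 6
            union(x, y)
    return forest                                        # step 7
```

The sketch makes the cost structure visible: one `make_set` per vertex, two `find_set` calls per edge, and one `union` per edge that ends up in F.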



The running time

n calls of MakeSet.

Time to sort the m edges: O(m ln m).

2m calls of FindSet (two per edge).

Up to n − 1 calls of Union.

Using an efficient implementation of disjoint sets, this is O(m ln m).


Implementing disjoint sets

Amortized analysis

We need to analyze the total running time of

n calls of MakeSet,

2m calls of FindSet, and

n − 1 calls of Union.



“Classical” approach: use individual worst-case bounds:

n × worst case for MakeSet

+ 2m × worst case for FindSet

+ (n − 1) × worst case for Union.

Amortized analysis:

analyze the cost of the entire sequence directly;

take into account that most operations in the sequence don’t attain the worst-case bound!


Implementing disjoint sets (ctd.)

Linked lists

Use a linked list for each set.

Representative of the set is at the head of the list.

Each element has a pointer directly to the representative (head of its list).



Linked lists: example

Linked list representation of

{ a, f }, { b }, { g, c, e }, { d }:

a → f

b

g → c → e

d

The representatives are a, b, g and d .



Linked lists (ctd.)

MakeSet: just generate a new linked list; Θ(1) time.

FindSet: follow the pointer to the representative; Θ(1) time.

Union(x, y): the naive approach is to append the list of x onto the end of the list of y.

For Union it may help to have a pointer to the last entry of each linked list.

Snag: we have to update the representative pointer of each entry in the list of x.

Cost for naive Union(x , y): Θ(length of list of x).
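The linked-list representation with the naive Union can be sketched as follows. The class `Node` and its field names are illustrative assumptions; `union` returns the number of representative pointers it had to update so that the cost Θ(length of list of x) can be observed directly.

```python
class Node:
    """One list entry; the head of a list is the representative."""

    def __init__(self, value):
        self.value = value
        self.next = None     # next entry in the list
        self.rep = self      # pointer to the head (representative)
        self.tail = self     # last entry; only meaningful on a head node


def make_set(value):
    return Node(value)       # Θ(1): a new one-element list


def find_set(node):
    return node.rep          # Θ(1): follow the representative pointer


def union(x, y):
    """Append the list of x onto the end of the list of y (naive rule).

    Returns the number of representative pointers that were updated.
    """
    hx, hy = find_set(x), find_set(y)
    if hx is hy:
        return 0
    hy.tail.next = hx        # splice x's list behind y's list
    hy.tail = hx.tail
    updates = 0
    node = hx
    while node is not None:  # the snag: touch every entry of x's list
        node.rep = hy
        updates += 1
        node = node.next
    return updates
```

With the list g → c → e and the singleton b, calling `union(g, b)` updates three pointers, one for each entry of the list of g.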



Example: Union(g , b)

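[Figure: the lists a → f, g → c → e, d, and b; by the naive rule, Union(g, b) appends g → c → e to the end of the list of b.]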



Conventions for the further analysis

ν = # of MakeSet operations.

µ = total # of MakeSet, Union, and FindSet operations.

Note that after ν − 1 Unions only one set remains.

Observe that µ ≥ ν.



A nasty example

Let ν = ⌈µ/2⌉, q = µ − ν. Elements: x_1, ..., x_ν.

Operation                Number of objects updated
MakeSet(x_1)             1
MakeSet(x_2)             1
...                      ...
MakeSet(x_ν)             1
Union(x_1, x_2)          1
Union(x_2, x_3)          2
Union(x_3, x_4)          3
...                      ...
Union(x_{q−1}, x_q)      q − 1
Total                    Θ(µ²)
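The total in the last row can be checked by summing the two groups of operations, using only the definitions of ν and q given with the example:

```latex
\[
\underbrace{\nu}_{\text{MakeSet}}
\;+\;
\underbrace{\sum_{k=1}^{q-1} k}_{\text{Union}}
\;=\; \nu + \frac{q(q-1)}{2}
\;=\; \Theta(\mu^{2}),
\qquad\text{since } \nu = \lceil \mu/2 \rceil \text{ and } q = \mu - \nu = \lfloor \mu/2 \rfloor .
\]
```

So the average cost per operation in this sequence is Θ(µ), even though MakeSet and FindSet are each Θ(1).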



Linked lists: fix (“weighted union heuristic”)

Record the length of each list.

Implement Union(x, y) so that it always appends the shorter list to the longer list (breaking ties arbitrarily).

Theorem

Using linked lists with the above fix, a sequence of

µ MakeSet, Union, and FindSet operations,

among which ν are MakeSet operations,

takes O(µ + ν ln ν) time.

Proof.

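Basic insight: each element “migrates” at most log₂ ν times, because every time its representative pointer is updated, the list containing it at least doubles in length.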
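The weighted-union heuristic can be sketched as follows. As a simplifying assumption, each set is stored as a Python list keyed by its representative instead of a hand-built linked list; the class name `WeightedLists` and the counter `updates` are hypothetical and exist only to make the number of migrations observable.

```python
class WeightedLists:
    """Disjoint sets with the weighted-union heuristic (a sketch)."""

    def __init__(self):
        self.rep = {}        # element -> representative
        self.members = {}    # representative -> elements of its set
        self.updates = 0     # total number of representative updates

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find_set(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        if len(self.members[rx]) > len(self.members[ry]):
            rx, ry = ry, rx              # make rx the shorter list
        for z in self.members[rx]:       # only the shorter list migrates
            self.rep[z] = ry
            self.updates += 1
        self.members[ry].extend(self.members.pop(rx))
```

On the chain of unions Union(x_1, x_2), Union(x_2, x_3), ... that is quadratic for the naive rule, this sketch moves a single element per Union, because the newly added singleton is always the shorter list.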



The forest implementation of disjoint sets

Each set is represented by a (rooted) tree.

[Figure: a forest of rooted trees on the elements a, ..., i; each tree represents one set.]



The forest implementation of disjoint sets (ctd.)

MakeSet: time Θ(1); just plant a new tree.

FindSet: follow the parent pointers to the root of the corresponding tree; Θ(height of tree).

Union(x, y): naive idea: make the root of the tree of x a child of the root of the tree of y.

This is no faster than linked lists.



Forests: improving the running time

Keep the trees low!

Union by rank: attach the lower tree to the root of the higher tree.

Path compression: upon performing FindSet, place all vertices on the path directly under the root.



Union by rank

For each root x maintain a variable rank[x], which is the height of the tree “below” x.

When performing Union(x, y), make the root with the smaller rank a child of the one with the larger rank.

If a tie occurs, make the root of x a child of the root of y and increase the rank of the root of y by one.
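A minimal sketch of the forest representation with union by rank (without path compression). The class name `RankForest` and the dictionary-based storage are assumptions made for illustration; the tie rule follows the description above.

```python
class RankForest:
    """Disjoint-set forest with union by rank only (a sketch)."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x     # plant a new one-vertex tree
        self.rank[x] = 0

    def find_set(self, x):
        while self.parent[x] != x:   # follow parent pointers to the root
            x = self.parent[x]
        return x

    def union(self, x, y):
        r, s = self.find_set(x), self.find_set(y)
        if r == s:
            return
        if self.rank[r] > self.rank[s]:
            r, s = s, r              # now rank[r] <= rank[s]
        self.parent[r] = s           # the lower-rank root becomes a child
        if self.rank[r] == self.rank[s]:
            self.rank[s] += 1        # tie: the new root's rank grows by one
```

Merging eight singletons pairwise, then the pairs, then the two groups of four yields a root of rank 3 = log₂ 8, which is the largest rank that eight elements can produce.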



Union by rank: example

[Figure: trees on the elements a, ..., g before and after the operations Union(f, g) and Union(f, d), annotated with the ranks of the roots involved.]



Lemma

Using “union by rank” ensures that the height of any tree is at most log₂(#vertices in the tree).

Proof. We proceed by induction on the # of Union operations.

No Union ⇒ all sets are singletons, of height 0 = log₂ 1.

Suppose Union(x, y) is called. Let r, s be the roots of the trees of x, y.

Case 1: rank(r) < rank(s) ⇒ the height of the s-tree stays the same (but the number of vertices increases).

Case 2: rank(s) < rank(r): analogously.

Case 3: rank(s) = rank(r): the height increases by one, and by induction

#vertices(s) + #vertices(r) ≥ 2^rank(r) + 2^rank(s) = 2^(rank(s)+1).



Union by rank: running time

MakeSet takes constant time.

The time for FindSet is bounded by the rank and hence O(ln ν) by the lemma.

The time needed for Union is bounded by the rank, too, and hence O(ln ν) by the lemma.

Hence, for µ operations, among which ν are MakeSet, we need O(µ ln ν).

No better than linked lists... :-(



Path compression: idea

When performing FindSet(x), make each vertex on the path point to the root.

Algorithm FindSet(x)

Here π(x) denotes the parent pointer of x.

1. If x is the root, then return x. Otherwise do the following.
2. Let π(x) = FindSet(π(x)).
3. Return π(x).
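The recursive FindSet translates almost line by line into Python. In this sketch the parent pointers are stored in a dictionary `parent`, and a root is assumed to be its own parent; both are illustrative conventions rather than something the slides prescribe.

```python
def find_set(parent, x):
    """Return the root of x and re-point every vertex on the path to it."""
    if parent[x] == x:                        # step 1: x is the root
        return x
    parent[x] = find_set(parent, parent[x])   # step 2: compress the path
    return parent[x]                          # step 3
```

For the chain d → c → b → a, a call `find_set(parent, "d")` returns `"a"` and leaves b, c and d all pointing directly at a. On very long paths the recursion can exceed Python's default recursion limit, so an iterative two-pass variant may be preferable in practice.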



Path compression: example

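[Figure: a tree on the elements a, ..., f before and after a FindSet operation with path compression; afterwards the vertices on the search path point directly to the root.]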



The Ackermann function ...

... is the function A : N × N → N defined by the recurrences

A(1, j) = 2^j (j ≥ 1),

A(i, 1) = A(i − 1, 2) (i ≥ 2),

A(i, j) = A(i − 1, A(i, j − 1)) (i, j ≥ 2).

A(i, j) grows extremely fast.

We are mainly interested in the so-called inverse Ackermann function

α(m, n) = min{i ≥ 1 : A(i, ⌊m/n⌋) > log₂ n}.

α(m, n) grows extremely slowly.
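To get a feeling for these definitions, the following sketch evaluates the recurrences with a cap, since the true values quickly become too large to store. The helper names `ackermann_capped` and `inverse_ackermann` are hypothetical. The capping relies on the facts that A(i, j) > j and that A is increasing in its second argument, and the test A(i, ⌊m/n⌋) > log₂ n is carried out in the equivalent integer form 2^A > n.

```python
def ackermann_capped(i, j, cap):
    """Return min(A(i, j), cap) for the recurrences above (i, j >= 1)."""
    if j >= cap:
        return cap                    # A(i, j) > j >= cap
    if i == 1:
        return min(2 ** j, cap)       # A(1, j) = 2^j
    if j == 1:
        return ackermann_capped(i - 1, 2, cap)
    inner = ackermann_capped(i, j - 1, cap)
    return ackermann_capped(i - 1, inner, cap)


def inverse_ackermann(m, n):
    """alpha(m, n) = min{ i >= 1 : A(i, floor(m/n)) > log2(n) }, for m >= n >= 1."""
    cap = n.bit_length()              # 2**cap > n, so reaching cap settles the test
    i = 1
    while 2 ** ackermann_capped(i, m // n, cap) <= n:
        i += 1
    return i
```

Under these definitions `inverse_ackermann(1000, 1000)` is 3 and `inverse_ackermann(65536, 65536)` is 4, which illustrates how slowly α grows.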



The Ackermann function: a few numbers

         j = 1       j = 2        j = 3           j = 4
i = 1    2^1         2^2          2^3             2^4
i = 2    2^2         2^(2^2)      2^(2^(2^2))     2^(2^(2^(2^2)))
i = 3    2^(2^2)     T(16)        T(T(16))        T(T(T(16)))

Here T(k) abbreviates a tower 2^(2^(···^2)) that the slide marks with a brace labelled k.

Historical importance of A(i, j): showed that the class of primitive recursive functions does not include all computable functions.

A(i , j) grows faster than any primitive recursive function.



Theorem

With both the union by rank and the path compression method, the worst-case running time for

µ MakeSet, Union, and FindSet operations,

among which ν are MakeSet operations,

is O(µ · α(µ, ν)).

Remarks

Neither union by rank nor path compression alone suffices to improve on linked lists.

Together they essentially yield linear time.
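A compact sketch that combines both heuristics, as the theorem assumes. The class name `DisjointSets` is illustrative, and FindSet is written iteratively in two passes (an assumption made to avoid deep recursion) instead of recursively. Note that with path compression the stored rank is only an upper bound on the height of a tree.

```python
class DisjointSets:
    """Disjoint-set forest with union by rank and path compression (a sketch)."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        root = x
        while self.parent[root] != root:    # first pass: locate the root
            root = self.parent[root]
        while self.parent[x] != root:       # second pass: compress the path
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        r, s = self.find_set(x), self.find_set(y)
        if r == s:
            return
        if self.rank[r] > self.rank[s]:
            r, s = s, r                     # now rank[r] <= rank[s]
        self.parent[r] = s
        if self.rank[r] == self.rank[s]:
            self.rank[s] += 1
```

Such a structure could replace the simple stand-in used for Kruskal's algorithm, in which case the sorting step dominates the running time.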


Reading assignment

Take a look at. . .

[CLRS] chapter 21.

[CLRS] chapter 23.

