Algorithms and data structures Amin Coja-Oghlan LFCS


Post on 18-Jul-2020



Algorithms and data structures

Amin Coja-Oghlan

LFCS


Reminder: the minimum spanning tree problem

Reminder: MST

Input: A connected weighted graph G = (V, E, W).

Output: a subgraph H = (V_H, E_H) (i.e., V_H ⊆ V and E_H ⊆ E) such that

1. H is spanning, i.e., V_H = V.
2. H is connected.
3. The weight $W(H) = \sum_{e \in E_H} W(e)$ of H is minimum (among all subgraphs satisfying 1. and 2.).

In words: H is a minimum weight subgraph that connects all vertices.

Consequently, we could add the constraint that H must be a tree (because any connected graph contains a spanning tree, and with non-negative weights deleting edges never increases the weight).

This is called the minimum spanning tree problem (“MST”).

A. Coja-Oghlan (LFCS) Algorithms and data structures 2 / 27


Kruskal’s algorithm

The idea

Remember that a forest is an acyclic graph (and thus all its components are trees).

Starting from a spanning “forest” without any edges, Kruskal keeps merging components of the forest as cheaply as possible (greedy strategy).

We will see that the resulting tree is an MST.

Algorithm Kruskal(G)

Input: a connected weighted graph G = (V, E, W). Output: an MST.

1 Let F = ∅. Sort the edges E = {e_1, …, e_m} increasingly by weight.

2 For i = 1, …, m do

3     if e_i connects two different components of (V, F), add e_i to F.

4 Return (V, F).


Kruskal’s algorithm: correctness

Theorem

Given a connected weighted graph G = (V, E, W), Kruskal outputs an MST.


Proof, part 1: (V, F) is a spanning forest at all times.

(V, F) is clearly spanning, since its vertex set is V.

Kruskal only adds edges that join two different components, and hence does not create any cycles.


Proof, part 2: the output of Kruskal is connected.

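Assume for contradiction that the output (V, F) is not connected.

Then it is a forest with at least two components.

Since G is connected, some edge of G joins two different components C1, C2 of (V, F).

Let j be the minimum index such that e_j joins C1 and C2.

Components are only ever merged, so the endpoints of e_j were in different components when Kruskal examined e_j. Hence Kruskal would have added e_j to F, a contradiction.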


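Proof, part 3: throughout, (V, F) is contained in an MST.

Similar to the argument for Prim: suppose Kruskal adds e_i and some MST T contains F but not e_i. Then T + e_i contains a cycle, and this cycle has another edge e' joining two different components of (V, F). Kruskal cannot have examined e' yet (otherwise e' would be in F), so W(e') ≥ W(e_i). Swapping e' for e_i yields an MST that contains F together with e_i.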


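Proof, part 4: conclusion.

By parts 1 and 2 the output (V, F) is a spanning tree, and by part 3 it is contained in an MST. A spanning tree contained in another spanning tree equals it, so combining parts 1–3 completes the proof.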


Implementing Kruskal

Data structure: disjoint sets

To implement Step 3, we need to keep track of the components of (V, F).

The components are disjoint sets of vertices.

They are dynamic, as we occasionally merge two components.

We will associate each component with a representative, which is a vertex in the component.

We will need the following operations:

MakeSet(x): create a new set whose only member (and representative) is x.
Union(x, y): replace the sets containing x and y by their union.
FindSet(x): compute the representative of the set containing x.


Implementing Kruskal (ctd.)

Algorithm Kruskal(G)

Input: a weighted connected graph G = (V, E, W). Output: an MST.

1 Let F = ∅.

2 For all v ∈ V call MakeSet(v).

3 Sort edges E = {e_1, …, e_m} so that W(e_i) ≤ W(e_{i+1}) (1 ≤ i < m).

4 For i = 1, …, m do

5     If e_i = {x_i, y_i} has the property FindSet(x_i) ≠ FindSet(y_i), then

6         add e_i to F and call Union(x_i, y_i).

7 Return (V, F).
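The seven steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's reference code: the function name `kruskal`, the `(weight, x, y)` edge format and the dictionary-based disjoint-set forest (with union by rank and path compression, both discussed later in these slides) are assumptions made for the example.

```python
def kruskal(vertices, edges):
    """Return the edge set F chosen by Kruskal.

    vertices: iterable of hashable vertex names (assumed format).
    edges: list of (weight, x, y) triples (assumed format).
    """
    parent = {v: v for v in vertices}  # step 2: MakeSet(v) for every vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        # Walk up to the root, then point every vertex on the path at it.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(x, y):
        rx, ry = find_set(x), find_set(y)
        if rx == ry:
            return
        if rank[rx] > rank[ry]:
            rx, ry = ry, rx            # rx is now the root of smaller rank
        parent[rx] = ry
        if rank[rx] == rank[ry]:
            rank[ry] += 1

    forest = []                                        # step 1: F = empty set
    for w, x, y in sorted(edges, key=lambda e: e[0]):  # step 3: sort by weight
        if find_set(x) != find_set(y):                 # step 5
            forest.append((w, x, y))                   # step 6
            union(x, y)
    return forest                                      # step 7


example_edges = [(1, "a", "b"), (4, "b", "c"), (3, "a", "c"),
                 (2, "c", "d"), (5, "b", "d")]
mst = kruskal("abcd", example_edges)
print(mst, sum(w for w, _, _ in mst))
```

On this small hypothetical graph the edges of weight 1, 2 and 3 are accepted and the edges of weight 4 and 5 are rejected, because their endpoints already lie in the same component.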


Implementing Kruskal (ctd.)

The running time

n calls of MakeSet.

Time to sort the m edges: O(m ln m).

2m calls of FindSet (two per edge).

Up to n − 1 calls of Union.

Using an efficient implementation of disjoint sets, the total running time is O(m ln m).


Implementing disjoint sets

Amortized analysis

We need to analyze the total running time of

n calls of MakeSet,
2m calls of FindSet, and
n − 1 calls of Union.

“Classical” approach: use individual worst-case bounds:

n × worst case for MakeSet

+ 2m × worst case for FindSet

+ (n − 1) × worst case for Union.

Amortized analysis:

analyze the cost of the entire sequence directly;
take into account that most operations in the sequence don’t attain the worst-case bound!


Implementing disjoint sets (ctd.)

Linked lists

Use a linked list for each set.

Representative of the set is at the head of the list.

Each element has a direct pointer to the representative (head of its list).


Implementing disjoint sets (ctd.)

Linked lists: example

Linked list representation of

{ a, f }, { b }, { g , c , e }, { d } :

[Figure: four linked lists, a → f, b, g → c → e, and d; each element also points to the head of its list.]

The representatives are a, b, g and d .


Implementing disjoint sets (ctd.)

Linked lists (ctd.)

MakeSet: just generate a new linked list; Θ(1) time.

FindSet: follow the pointer to the representative; Θ(1) time.

Union(x, y): the naive approach is to append the list of x onto the end of the list of y.

For Union it helps to keep a pointer to the last entry of each linked list.

Snag: we have to update the representative pointer of each entry in the list of x.

Cost for naive Union(x, y): Θ(length of list of x).
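The linked-list representation and the naive Union can be sketched as below. The class name `Node`, its field names and the convention that `union` returns the number of updated representative pointers are assumptions made for this illustration; the slides do not prescribe them.

```python
class Node:
    """One list entry; the head of a list is the representative."""

    def __init__(self, value):
        self.value = value
        self.rep = self    # direct pointer to the head of the list
        self.next = None   # next entry in the list
        self.tail = self   # last entry; only kept up to date at the head


def make_set(value):
    # Theta(1): a new one-element list.
    return Node(value)


def find_set(node):
    # Theta(1): follow the pointer to the representative.
    return node.rep


def union(x, y):
    """Naive union: append the list of x onto the end of the list of y.

    Returns how many representative pointers were updated, which is
    the length of the list of x.
    """
    hx, hy = find_set(x), find_set(y)
    if hx is hy:
        return 0
    hy.tail.next = hx      # splice using the tail pointer
    hy.tail = hx.tail
    updated = 0
    node = hx
    while node is not None:
        node.rep = hy      # the "snag": every entry of x's list changes
        updated += 1
        node = node.next
    return updated
```

For instance, building the list g → c → e and then calling `union(g, b)` touches all three entries of that list, matching the Θ(length of list of x) cost.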


Implementing disjoint sets (ctd.)

Example: Union(g , b)

[Figure: the linked lists from the previous example after Union(g, b).]


Implementing disjoint sets (ctd.)

Conventions for the further analysis

ν = # of MakeSet operations.

µ = total # of MakeSet, Union, and FindSet operations.

Note that after ν − 1 Unions only one set remains.

Observe that µ ≥ ν.


Implementing disjoint sets (ctd.)

A nasty example

Let ν = ⌈µ/2⌉, q = µ − ν. Elements: x_1, …, x_ν.

Operation               Number of objects updated
MakeSet(x_1)            1
MakeSet(x_2)            1
...                     ...
MakeSet(x_ν)            1
Union(x_1, x_2)         1
Union(x_2, x_3)         2
Union(x_3, x_4)         3
...                     ...
Union(x_{q−1}, x_q)     q − 1
Total                   Θ(µ^2)
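The total in the last row can be checked by adding up the table. A short worked sum, using only ν and q as defined above:

```latex
\underbrace{\nu}_{\text{MakeSet}} \;+\; \underbrace{\sum_{i=1}^{q-1} i}_{\text{Union}}
  \;=\; \nu + \frac{q(q-1)}{2}
  \;=\; \Theta(\mu^2),
\qquad \text{since } \nu = \lceil \mu/2 \rceil \text{ and } q = \mu - \nu = \lfloor \mu/2 \rfloor .
```

So the average cost per operation in this sequence is Θ(µ), although MakeSet and FindSet each take only Θ(1).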


Implementing disjoint sets (ctd.)

Linked lists: fix (“weighted union heuristic”)

Record the length of each list.

Implement Union(x, y) so that it always appends the shorter list to the longer list (breaking ties arbitrarily).

Theorem

Using linked lists with the above fix, a sequence of

µ MakeSet, Union, and FindSet operations,

among which ν are MakeSet operations,

takes O(µ + ν ln ν) time.

Proof.

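Basic insight: each element “migrates” at most log2 ν times, because every time its representative pointer is updated, the list containing it at least doubles in length.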
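To see the effect of the heuristic, the sketch below counts representative-pointer updates on the chain of unions Union(x_1, x_2), Union(x_2, x_3), … with and without the fix. Python dictionaries and lists stand in for the linked lists, and the function name `run_unions` and its `weighted` flag are assumptions made for this example.

```python
def run_unions(n, weighted):
    """Perform Union(x_i, x_{i+1}) for i = 0, ..., n - 2 on n singletons.

    Returns the total number of representative updates.
    weighted=False: always append the list of x_i to the list of x_{i+1}.
    weighted=True:  always append the shorter list to the longer one.
    """
    rep = {i: i for i in range(n)}        # representative of each element
    members = {i: [i] for i in range(n)}  # list of each representative
    updates = 0
    for i in range(n - 1):
        src, dst = rep[i], rep[i + 1]
        if src == dst:
            continue
        if weighted and len(members[src]) > len(members[dst]):
            src, dst = dst, src           # move the shorter list instead
        for v in members[src]:
            rep[v] = dst                  # one "migration" per element
        updates += len(members[src])
        members[dst].extend(members.pop(src))
    return updates


print(run_unions(1000, weighted=False), run_unions(1000, weighted=True))
```

On this particular sequence the naive variant performs 1 + 2 + … + 999 updates, while the weighted variant moves a single element per Union.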


Implementing disjoint sets (ctd.)

The forest implementation of disjoint sets

Each set is represented by a (rooted) tree.

[Figure: a forest of rooted trees on the elements a, b, …, i; each tree represents one set.]


Implementing disjoint sets (ctd.)

The forest implementation of disjoint sets (ctd.)

MakeSet: time Θ(1); just plant a new tree.

FindSet: follow the parent pointers to the root of the corresponding tree; Θ(height of tree).

Union(x, y): naive idea: make the root of the tree of x a child of the root of the tree of y.

This is no faster than linked lists.

Forests: improving the running time

Keep the trees low!

Union by rank: attach the lower tree to the root of the higher tree.

Path compression: upon performing FindSet, place all vertices on the path directly under the root.


Implementing disjoint sets (ctd.)

Union by rank

For each root x maintain a variable rank[x], which is the height of the tree “below” x.

When performing Union(x, y), make the root with the smaller rank a child of the one with the larger rank.

If a tie occurs, make the root of x a child of the root of y and increase the rank of the root of y by one.
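The three rules above can be sketched as a small forest class. The class name `Forest` and its dictionary-based parent pointers are assumptions made for this illustration; path compression is deliberately left out so that rank equals height.

```python
class Forest:
    """Disjoint-set forest with union by rank (no path compression)."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x   # a new one-vertex tree; x is its own root
        self.rank[x] = 0

    def find_set(self, x):
        while self.parent[x] != x:   # follow parent pointers to the root
            x = self.parent[x]
        return x

    def union(self, x, y):
        r, s = self.find_set(x), self.find_set(y)
        if r == s:
            return
        if self.rank[r] > self.rank[s]:
            r, s = s, r              # r is now the root of smaller rank
        self.parent[r] = s
        if self.rank[r] == self.rank[s]:
            self.rank[s] += 1        # tie: the tree under s grew one level


forest = Forest()
for v in range(8):
    forest.make_set(v)
for x, y in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    forest.union(x, y)
print(forest.find_set(0), forest.rank[forest.find_set(0)])
```

In this usage example every Union is a tie, which is the only way a rank can grow; eight elements therefore end up in a tree of rank 3.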


Implementing disjoint sets (ctd.)

Union by rank: example

[Figure: the forest on the elements a, …, g before and after Union(f, g) and Union(f, d); the slide annotates rank[f] = 0, rank[d] = 0, rank[d] = 1, rank[g] = 1 and rank[g] = 2.]


Implementing disjoint sets (ctd.)

Lemma

Using “union by rank” ensures that the height of any tree is at most log2(#vertices in the tree).

Proof. We proceed by induction on the # of Union operations, showing that a tree whose root has rank k contains at least 2^k vertices.

No Union ⇒ all sets are singletons.

Suppose Union(x, y) is called. Let r, s be the roots of x, y.

Case 1: rank(r) < rank(s) ⇒ the height of the s-tree stays the same (but the number of vertices increases).

Case 2: rank(s) < rank(r): analogously.

Case 3: rank(s) = rank(r): the height increases by one, and by induction

$\#\mathrm{vertices}(s) + \#\mathrm{vertices}(r) \ge 2^{\mathrm{rank}(r)} + 2^{\mathrm{rank}(s)} = 2^{\mathrm{rank}(s)+1}.$


Implementing disjoint sets (ctd.)

Union by rank: running time

MakeSet takes constant time.

The time for FindSet is bounded by the rank and hence O(ln ν) by the lemma.

The time needed for Union is bounded by the rank, too, and hence O(ln ν) by the lemma.

Hence, for µ operations, among which ν are MakeSet, we need O(µ ln ν).

No better than linked lists... :-(


Implementing disjoint sets (ctd.)

Path compression: idea

When performing FindSet(x), make each vertex on the path point to the root.

Algorithm FindSet(x)

Here π(x) denotes the parent of x.

1 If x is the root, then return x. Otherwise do the following.

2 Let π(x) = FindSet(π(x)).

3 Return π(x).
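The recursive FindSet translates almost line by line into Python. In this sketch the parent function π is stored in a dictionary named `parent`, and a root is its own parent; both conventions are assumptions made for the example.

```python
def find_set(parent, x):
    """Return the root of x and point every vertex on the path at it."""
    if parent[x] == x:                          # step 1: x is the root
        return x
    parent[x] = find_set(parent, parent[x])     # step 2: re-hang x under the root
    return parent[x]                            # step 3


# A path f -> e -> d -> c -> b -> a, with a as the root.
parent = {"a": "a", "b": "a", "c": "b", "d": "c", "e": "d", "f": "e"}
root = find_set(parent, "f")
print(root, parent)
```

After the call, every vertex on the path from f to the root has the root as its parent. For very long paths the recursion could hit Python's recursion limit, so an iterative two-pass variant may be preferable in practice.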


Implementing disjoint sets (ctd.)

Path compression: example

[Figure: a tree on the vertices a, …, f before and after path compression.]


Implementing disjoint sets (ctd.)

The Ackermann function. . .

… is the function A : N × N → N defined by the recurrences

$A(1, j) = 2^j \quad (j \ge 1),$

$A(i, 1) = A(i - 1, 2) \quad (i \ge 2),$

$A(i, j) = A(i - 1, A(i, j - 1)) \quad (i, j \ge 2).$

A(i, j) grows extremely fast.

We are mainly interested in the so-called inverse Ackermann function

$\alpha(m, n) = \min\{ i \ge 1 : A(i, \lfloor m/n \rfloor) > \log_2 n \}.$

α(m, n) grows extremely slowly.
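Both definitions can be tried out directly for small arguments. The sketch below follows the recurrences literally; the helper `capped_ackermann`, which computes min(A(i, j), cap) to avoid building astronomically large numbers, is an assumption introduced for this example and relies on the fact that A(i, j) > j.

```python
def ackermann(i, j):
    """A(i, j) by the three recurrences; only feasible for tiny arguments."""
    if i == 1:
        return 2 ** j
    if j == 1:
        return ackermann(i - 1, 2)
    return ackermann(i - 1, ackermann(i, j - 1))


def capped_ackermann(i, j, cap):
    """min(A(i, j), cap), computed without ever exceeding cap by much."""
    if i == 1:
        # 2**j > cap as soon as j >= cap.bit_length()
        return cap if j >= cap.bit_length() else min(2 ** j, cap)
    if j == 1:
        return capped_ackermann(i - 1, 2, cap)
    inner = capped_ackermann(i, j - 1, cap)
    if inner >= cap:
        return cap   # A(i - 1, k) > k, so the true value is above cap as well
    return capped_ackermann(i - 1, inner, cap)


def inverse_ackermann(m, n):
    """alpha(m, n) = min{ i >= 1 : A(i, floor(m / n)) > log2(n) }, for m >= n >= 1."""
    assert m >= n >= 1
    cap = n.bit_length()   # smallest integer strictly greater than log2(n)
    i = 1
    while capped_ackermann(i, m // n, cap) < cap:
        i += 1
    return i


print(ackermann(2, 3), inverse_ackermann(10**6, 10**6))
```

Already A(3, 2) is far too large to evaluate with `ackermann`, which is why the inverse is computed through the capped helper.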


Implementing disjoint sets (ctd.)

The Ackermann function: a few numbers

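          j = 1      j = 2       j = 3            j = 4
i = 1     2^1        2^2         2^3              2^4
i = 2     2^2        2^{2^2}     2^{2^{2^2}}      2^{2^{2^{2^2}}}
i = 3     2^{2^2}    A(2, 16)    A(2, A(3, 2))    A(2, A(3, 3))

Since A(2, k) is a tower of k + 1 twos, the entries A(3, j) for j ≥ 2 are towers of twos whose heights are determined by the previous entry in the row.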

Historical importance of A(i, j): it showed that the class of primitive recursive functions does not include all computable functions.

A(i, j) grows faster than any primitive recursive function.


Implementing disjoint sets (ctd.)

Theorem

With both the union by rank and the path compression method, the worst-case running time for

µ MakeSet, Union, and FindSet operations,

among which ν are MakeSet operations,

is O(µ · α(µ, ν)).

Remarks

Neither union by rank nor path compression alone suffices to improve on linked lists.

Together they essentially yield linear time, since α(µ, ν) ≤ 4 for all practical input sizes.


Reading assignment

Take a look at. . .

[CLRS] chapter 21.

[CLRS] chapter 23.
