a randomized linear-time algorithm to find minimum spanning trees david r. karger david r. karger...

47
A Randomized Linear-Time A Randomized Linear-Time Algorithm to Find Minimum Algorithm to Find Minimum Spanning Trees Spanning Trees David R. David R. Karger Karger Philip N. Klein Philip N. Klein Robert E. Tarjan Robert E. Tarjan

Upload: kerry-arnold

Post on 18-Dec-2015

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

A Randomized Linear-Time A Randomized Linear-Time Algorithm to Find Minimum Algorithm to Find Minimum

Spanning TreesSpanning Trees

David R. David R. KargerKargerPhilip N. KleinPhilip N. Klein

Robert E. TarjanRobert E. Tarjan

Page 2: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Talk OutlineTalk Outline

Objective & related work from Objective & related work from literaturesliteratures

IntuitionIntuition DefinitionsDefinitions AlgorithmAlgorithm Proof & AnalysisProof & Analysis Conclusion and future workConclusion and future work

Page 3: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

ObjectiveObjective

A minimum spanning tree is a tree A minimum spanning tree is a tree formed from a subset of the edges in a formed from a subset of the edges in a given undirected graph, with two given undirected graph, with two properties:properties:

1. it spans the graph, i.e., it includes 1. it spans the graph, i.e., it includes every every

vertex in the graph, andvertex in the graph, and

2. it is a minimum, i.e., the total weight of 2. it is a minimum, i.e., the total weight of

all the edges is as low as possible.all the edges is as low as possible.

Find a minimum spanning tree for a graph by linear time with very high probability!!

Page 4: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Related WorkRelated Work Boruvka 1926, textbook algorithms – Boruvka 1926, textbook algorithms – Yao 1975 – Yao 1975 – Cheriton and Tarjan 1976 –Cheriton and Tarjan 1976 – Fredman and Tarjan 1987 –Fredman and Tarjan 1987 – Gabow 1986 –Gabow 1986 – Chazelle 1995 – Chazelle 1995 –

)),(( nmmO

)),(( nmmO

)loglog( nmO

)log( nmO

)loglog( nmO

)),(log( nmmO

Deterministic results! How about the randomized one??

Page 5: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

IntuitionIntuition Cycle PropertyCycle Property Cut PropertyCut Property RandomizationRandomization

Page 6: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

IntuitionIntuition

For any cycle For any cycle CC in a graph, the heavies in a graph, the heaviest edge in t edge in CC does not apper in the mini does not apper in the minimum spanning tree.mum spanning tree.

Page 7: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Cycle PropertyCycle PropertyHeaviest edge

Page 8: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Cycle PropertyCycle Property

For any graph, find all possible For any graph, find all possible cycles and remove the heaviest edge cycles and remove the heaviest edge from each cycle. Then we can get a from each cycle. Then we can get a minimum spanning tree??minimum spanning tree??

How about the time complexity?How about the time complexity?

How to detect the cycles in the How to detect the cycles in the graph??graph??

Page 9: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Cut PropertyCut Property

For any proper nonempty subset For any proper nonempty subset XX of the vertices, the lightest edge of the vertices, the lightest edge with exactly one endpoint in with exactly one endpoint in XX belongs to the minimum spanning belongs to the minimum spanning tree.tree.

Page 10: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Cut PropertyCut Property

XV X

Page 11: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Boruvka AlgorithmBoruvka Algorithm For each vertex, select the minimum-For each vertex, select the minimum-

weight edge incident to the vertex. weight edge incident to the vertex. Contract all the selected edges, Contract all the selected edges, replacing by a single vertex each replacing by a single vertex each connected component defined by the connected component defined by the selected edges and deleting all resulting selected edges and deleting all resulting isolated vertices, loops (edges both of isolated vertices, loops (edges both of whose endpoints are the same), and all whose endpoints are the same), and all but the lowest-weight edge among each but the lowest-weight edge among each set of multiple edges.set of multiple edges. O(m log n)

Page 12: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

RandomizationRandomization

How the randomization can help us to How the randomization can help us to achieve our goal?achieve our goal?

Boruvka + Cycle Property + RandomizBoruvka + Cycle Property + Randomizationation= Linear time with very high probabilit= Linear time with very high probabilityy

Page 13: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Definition

Page 14: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

DefinitionDefinition

Let G be a graph with weighted edges. Let G be a graph with weighted edges. ww((xx, , yy) : the weight of edge {) : the weight of edge {xx, , yy} }

If If FF is a forest of a subgraph in is a forest of a subgraph in GG, , FF((xx, , yy) : the path (if any) connecting ) : the path (if any) connecting xx a and nd yy in in FF, , : the maximum weight : the maximum weight of an edge onof an edge on FF((xx, , yy), with the c), with the convention that onvention that if if xx and and yy are not are not connect connected in ed in FF..

),( yxwF

),( yxwF

Page 15: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

DefinitionDefinition

FF-heavy-heavy

Otherwise, {Otherwise, {xx, , yy} is } is FF-light.-light.

),(),( yxwyxw F

Page 16: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

FF-heavy & -heavy & FF-light-light

G

HF

FF-light-light

F

FG

FF-heavy-heavy

Page 17: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

FF-heavy & -heavy & FF-light-light

Note that the edges of Note that the edges of FF are all are all FF-li-light.ght.For any forest For any forest FF, no , no FF-heavy edge -heavy edge can be in the minimum spanning fcan be in the minimum spanning forest of orest of GG..

Cycle Property!!

Page 18: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Recursive function call:

Input: A undirected graphOutput: A minimum spanning forestTime: for the worst case

O(m) with very high probability

Algorithm}) log , (min{ 2 nmnO

Page 19: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

Step 1. Apply two successive BoruStep 1. Apply two successive Boruvka steps to the graph, thereby redvka steps to the graph, thereby reducing the number of vertices by at ucing the number of vertices by at least a factor of four.least a factor of four.

Page 20: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

G

*G

Page 21: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

Step 2. In the contracted graph, choose Step 2. In the contracted graph, choose a subgraph a subgraph HH by selecting each edge in by selecting each edge independently with probability 1/2. Appldependently with probability 1/2. Apply the algorithm recursively to y the algorithm recursively to HH, produ, producing a minimum spanning forest cing a minimum spanning forest FF of of HH. Find all the . Find all the FF-heavy edges (both tho-heavy edges (both those in se in HH and those not in and those not in HH) and delete t) and delete them.hem.

Page 22: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

G

*G

HF

*G

HHF

Back to analysis

Page 23: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

Step 3. Apply the algorithm Step 3. Apply the algorithm recursively to the remaining graph recursively to the remaining graph to compute a spanning forest . to compute a spanning forest . Return those edges contracted in Return those edges contracted in Step 1 together with the edges Step 1 together with the edges of .of .

FF

Page 24: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

G

*G

)()()( * FEHEGE

*G

)()( * heavyFGE

HF

Back to analysis

FReturn

Page 25: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AlgorithmAlgorithm

F

Those not in H

F - light

F - heavy)()( HVFV Edges of H

)()(*)(

)()( *

FEHEGE

heavyFGE

Page 26: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

AnalysisAnalysis

Correctness?Correctness? Worst-case time complexity?Worst-case time complexity? Expected time complexity?Expected time complexity?

Page 27: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

CorrectnessCorrectness

CompletenessCompleteness

By the cut property, every edge By the cut property, every edge contracted during Step 1 is in the contracted during Step 1 is in the minimum spanning forest. Hence minimum spanning forest. Hence the remaining edges of the minimum the remaining edges of the minimum spanning forest of the original graph spanning forest of the original graph form a minimum spanning forest of form a minimum spanning forest of the contracted graph. the contracted graph.

Page 28: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

CorrectnessCorrectness

SoundnessSoundness

By the cycle property, the edges By the cycle property, the edges deleted in Step 2 do not belong to deleted in Step 2 do not belong to minimum spanning forest. By the minimum spanning forest. By the inductive hypothesis, the minimum inductive hypothesis, the minimum spanning forest of the remaining spanning forest of the remaining graph is correctly determined in graph is correctly determined in recursive call of Step 3.recursive call of Step 3.

Page 29: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Worst-case time Worst-case time complexitycomplexity

The worst-case running time of the mini-sThe worst-case running time of the mini-spanning forest algorithm is panning forest algorithm is , the same as the bound for Boruvka’s alg , the same as the bound for Boruvka’s algorithm.orithm.

Count the total number of edges. Step 1 reCount the total number of edges. Step 1 reduces the size to ¼ as its original. A subproduces the size to ¼ as its original. A subproblem at depth blem at depth dd contains at most contains at most e edges. Summing all subproblems gives andges. Summing all subproblems gives an

bound on the total number of edge bound on the total number of edges. s.

}) log , (min{ 2 nmnO

2/)4/( 2dn)( 2nO

Page 30: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Worst-case time Worst-case time complexitycomplexity

Parent: Parent: EE((G)G) Left childLeft child: : EE((H)H) Right childRight child::

Number of edges in next recursion level Number of edges in next recursion level

= E(G*) + E(F)= E(G*) + E(F)= E(G) – V(G)/2 + V(G)/4= E(G) – V(G)/2 + V(G)/4

)(GE

Parent

Left Right

)()()(*)( HEFEHEGE

)()(*)( FEHEGE

Page 31: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Worst-case time Worst-case time complexitycomplexity

m edges

edges m

edges m

edges mnlog

Page 32: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Worst-case time Worst-case time complexitycomplexity

The total time spent in Steps 1-3 is linear in thThe total time spent in Steps 1-3 is linear in the number of edges: e number of edges:

Step 1 is just two steps of Boruvka’s algorithStep 1 is just two steps of Boruvka’s algorithm. m. Step 2 takes linear time using the modified DixStep 2 takes linear time using the modified Dixon-Rauch-Tarjan verification algorithm.on-Rauch-Tarjan verification algorithm.- - FF-heavy edges of -heavy edges of GG can be computed in time can be computed in time

linear in the number of edges of linear in the number of edges of GG..

}) log , (min{ 2 nmnO

Page 33: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis

Given graph G with n vertices and m edges• After one Boruvka step, Boruvka step forms

connected components and replaces each by single vertex. Since each component connects more than 2 edges, there are at most n/2 vertices remained.

• For component with k vertices, exactly k – 1 edges are removed. Thus the edges removed is at least

where is set of connected components. Since there is at most n/2 components, there is at least n/2 edges removed.

ΓnΓv1vΓg

gΓg

g

Page 34: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis

Given F = MST(H), for (x, y) in H

1. If (x, y) is in F, (x, y) is F-light

2. If (x, y) is not in F, assume (x, y) is F-light, the heaviest edge in cycle P (x, y) would be on P, and is belong to no MST according to cycle property. This causes contradiction, thus (x, y) is F-heavy.

3. Thus, each F-light edge in H is also in F, and vice versa.

F

P

X

Y

Page 35: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

F-light

F-heavy

G* not selected

selected bysampling

F

G'{F-light edges in G*}

H

Analysis

According to the distribution of edges used by H and G', edges of F are used twice by calling MST(H)

and MST(G').

Page 36: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

The binary tree represents the recursive invocation of MST:

Analysis

Left child represents invocation of MST(H).Right child represents invocation of MST(G').

Since 2 Borůvka step are performed before invocation of MST(H) and MST(G'), number of vertices is reduced in factor of 4. Thus, the height of invocation is at most log4n.

Page 37: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis - Worst Case

Given graph G with m edges and n vertices

1. After 2 Borůvka steps, at most n/4 vertices and m – n /2 edges remain for G*. This is true also for H and G' which are subgraph of G*.

2. Since F = MST(H), F has at most vH – 1 edges, and thus less than n/4.

3. According to the edge distribution,eH + eG' eG* + eF

m – n/2 + vH

m – n/2 + n/4 mThus, the number of edges in subproblems is less than original’s.

FG'

H

G*

G*

G'

G

H

original problem

Borůvka

subproblem subproblem

sample

F-light

Page 38: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

4. Since total edge number of subproblems at the same depth is bound by m, and the depth is at most log4n, the overall edge number is at most m log4n.

5. Since vertex number for submproblem at depth d is at most n/4d, the edge is at most (n/4d)2. Overall edge number is also bound by

Analysis - Worst Case

0d

22

dd n

78

4n2

6. Since running time of the algorithm is proportional to edge number, we could give time complexity as

O(min{n2, m log n})

Page 39: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis – Average Case

Here, we analyze the average case by partitioning the invocations as “left paths” (red paths above). After reckoning edges of subproblems along each “left path”, sum them up and we will get the overall estimate.

Page 40: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

G*

G'

G

H

original problem

Borůvka

subproblem subproblem

sample

F-light

Analysis – Average Case

1. For G* with k edges, after sampling with 1/2 probability for each edge, E[eH] = k/2.Since G* G, we have E[eG*] E(eG) and

E[eH] = E[eG*]/2 E[eG]/2.

2. Along the left path with starting E[eG] = k, the expected value of total edges is

k22

keEeE0d

d0d

)d(G0d

)d(G

Page 41: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

G*

G'

G

H

original problem

Borůvka

subproblem subproblem

sample

F-light

Analysis – Average Case

1. For each F-light edge, there is 1/2 probability of being sampled into H.

2. Since each F-light edge in H is also in F and F includes no edges not in H, the chance that an F-light is in F is also 1/2.

3. For edge e with weight heavier than the lightest of F is never F-light since there would be cycle with e as heaviest edge.

4. Thus, the heaviest F-light edge is always in F. Given eF=k, eG' is the number trials before k successes (selected into H), and it forms a negative binomial distribution.

Given vG*= n, F = MST(H) where H G*

F

1

23

45

6 78

11

9

10

12 13

Page 42: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

G*

G'

G

H

original problem

Borůvka

subproblem subproblem

sample

F-light

Analysis – Average Case

F

1

23

45

6 78

11

9

10

12 13

5. For eF = k, eG' is of negative binomial distribution with parameter 1/2 and k. Thus E[eG'] = k/(½) = 2k.

6. Summing all cases, we get

1n

0kF'GF'G ke|eE)ke(PeE

n2

n2)ke(P

k2)ke(P

1n

0kF

1n

0kF

Given vG*= n, F = MST(H) where H G*

Page 43: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis – Average Case

1. For all right subproblems, expected sum of edges is at most

E(e) 2(n/4)

E(e) 2(n/42)

E(e) 2(n/43)

E(e) 2(n/44)E(e) 2(n/45)... ...

e=m

2. For each left path, the expected total number of edges is twice of the leading subproblem, which is root or right child. So the overall expected value is at most 2(m + n).

3. Since running time is proportional to overall edge number, so its expected value is O(m) = O(m + n).

n24n2 )1d(

1dd

Page 44: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis – Probability of Linearity

Chernoff Bound:

Given xi as i.d.d. random variables and 0< i n, and X is the sum of all xi, for t > 0, we have

n

1i

tXAt ieEeAXPr

Thus, the probability that less than s successes (each with cance p) within k trail is

21

21)s(Ω

ktst

k

1i

tXst

p and t for ,e

)pe(e

eEesXPr i

Page 45: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

G*

G'

G

H

original problem

Borůvka

subproblem subproblem

sample

F-light

Analysis – Probability of Linearity

Given a path with leading problem G, eG = k

• For each edge in G, it has 1/2 less chance to be kept in next subproblem. and each edge-keep contributes 1 to the total edge number. The path ends when the k-th edge-move occurs.

• The probability there are 3k more total edges is probability there are k less edge-remove in k+3k trail. According to Chernoff bound, the probability is exp(-(k)).

Page 46: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

G*

G'

G

H

original problem

Borůvka

subproblem subproblem

sample

F-light

Analysis – Probability of Linearity

1. Given vG* = n'. For each edge in G', it has 1/2 chance to be in F. Since eF = n' – 1, the probability that eG' > 3n' is probability there are n' - 1 less F edge in 3k trail. According to Chernoff bound, the probability is exp(-(n')).

2. There is at most n/2 total vertices in all G*. If we take all the trail as a whole, the probability that there are more than 3n/2 edges in all right subproblem is exp(-(n)).

Page 47: A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees David R. Karger David R. Karger Philip N. Klein Philip N. Klein Robert E. Tarjan

Analysis – Probability of Linearity

Combined with previous two analysis, there is at least probability as below that total edges never exceeds 3(m+3n/2), where is the set of all right problems:

)m(Ω

Γg

)v(Ω)m(Ω)n(Ω e1e1e1e1 g

Thus, the probability that time complexity is O(m) is1 – exp((m)).