approximating the mst weight in sublinear time

Approximating the MST Weight in Sublinear Time

Bernard Chazelle (Princeton)

Ronitt Rubinfeld (NEC)

Luca Trevisan (U.C. Berkeley)

Sublinear Time Algorithms

• Make sense for problems on very large data sets

• Go contrary to common intuition that “an algorithm must be given at least enough time to read all the input”

• Must be probabilistic• Must be approximate

Approximation

• For decision problems:the output is the correct answer either for the given input, or at least for some other input “close” to it.(Property Testing)

• For optimization problems:the output is a number that is close to the cost of the optimal solution for the given input.(There is not enough time to construct a solution)

Previous Examples

• The cost of the max cut in a graph with n nodes and cn2 edges can be approximated to within a factor in time 2poly(1/c).(Goldreich, Goldwasser, Ron)

• Other results for “dense” instances of optimization problems, for low-rank approximation of matrices, . . .

• No results (that we know of) for problems on bounded-degree graphs.

Our Result

• Given a connected weighted graph G, with maximum degree d and with weights in the range {1, . . . , w},

• we can compute the weight of the minimum spanning tree of G to within a factor of in time O(dw-2log w/);

• we also prove that it is necessary to look at dw-2) entries in the representation of G.

(We assume that G is represented using adjacency lists)

Main Intuition

• Suppose all weights are 1 or 2• Then the MST weight is equal to

n – 2 + # of conn. comp. induced by weight-1 edges

weight 1

weight 2connected componentsInduced by weight-1 edges

MST

Algorithm

Algorithm for weights in {1,2}• To approximate the MST weight to within a

multiplicative factor (1+) it’s enough to approximate c1 to within an additive factor n

(c1:= # of connected components induced by weight-1 edges)

• To approximate c1 we use ideas from Goldreich-Ron (property testing of connectivity)

• The algorithm runs in time O(d-2log-1)

Approximating # of connected components

• Given a graph G of max degree d with n nodes we want to compute c, the number of connected components of G up to an additive error n.

• For every vertex u, definenu := 1 / size of component of u

• Thenc = u nu

• And if we callau:= max {nu, }

• Thenc = u au n

Wrapping up the analysis

• Can estimate summation of au using sampling

• Once we pick a vertex u at random, the value au can be computed in time O(d/)

• We need to pick O(1/) vertices, so we get running time O(d/)

Algorithm

CC-APPROX() Repeat O(1/2) times

pick a random vertex v

do a BFS from v, stopping after 2/ steps

b:= 1 / number of visited vertices

return (average of the values b) * n

Improved Algorithm

CC-APPROX(, W)

Repeat O(1/2) times

pick a random vertex v

do first step of a BSF from v

b:=0; t:=1

(*) flip a coin

If heads, and visited <W nodes so far

t:=2*t

continue BSF until ends or t nodes are visited

if BSF ends, b:= 2#random coins / nodes visited

else go to (*)

return (average of the values b) * n

• Inner procedure takes average O(dlog W) time

Analysis

• Main idea: if v is in a component of size c<W, then b is zero with prob. ~(1 – 1/c) and ~1 with probability ~1/c. The average of b is 1/c.

• Setting W:=2/ we get– each time, the average of b is within /2 from the

average over v of nv

(that is, (# conn. comp.)/n)– Repeating O(1/2) times, the probability of

deviating by another factor /2 is bounded by a constant

– The average running time is O(d-2logW), that is O(d-2log -1).

General Weights

• Generalize argument for weight 1 and 2.• Let

ci = # of connected components induced by edges of weight at most i

• Then the MST weight is

n – w + i=1,. . ., w-1 ci

Final Algorithm

• For j=1,. . ., w-1, call CC-APPROX(,2w/) on the subgraph of G obtained by removing edges of cost >j

• Get ai, an approximation of ci

• Return n – w + i=1,. . ., w-1 ai

• Average answer is within n/2 from cost of MST, and variance is bounded

• Total running time O(dw-2log w/)

Extensions

• Low average degree

• Non-integer weights

Lower Bound

Abstract sampling problem

• Fix p,• Define two binary distributions A,B• Pr[A=1] = p, Pr[A=0]=1-p• Pr[B=1] = p+ p, Pr[B=0]=1-p-p• Distinguishing A from B with constant

probability requires (1/p2) samples

Reduction• Fix p = 1/w• We consider two distributions of weights over

a cycle of length n• In distribution G, for each edge we sample

from A; if A=0 the edge gets weight 1, otherwise it gets weight w

• In distribution H, same with B• H and G are likely to have MST costs that

differ by about n• To distinguish them we need to look at

(w/2) edge weights

Higher Degree

• Sample from G or H as before, – also add d-1 forward edges of weight w+1

from each vertex– randomly permute names of vertices

• Now, on average, reading t edge weights gives us t/d samples from A or B, so t=(dw/2)

Conclusions

• A plausibility result showing that approximation for a standard graph problem in bounded degree (and sparse) graphs can be achieved in time independent of number of vertices

• Use of approximate cost without solution?• More problems?

– The trivial Max SAT approximation algorithms can be implemented in constant time, and give (an implicit representation of) a solution

– Non-trivial Max SAT approximation? (say, 3/4)– Something really useful?

approximating the mst weight in sublinear time

Documents