a threshold of ln(n) for approximating set cover by uriel feige lecturer: ariel procaccia

A threshold of ln(n) for approximating set cover

By Uriel Feige

Lecturer: Ariel Procaccia

Set Cover

We are given an (unweighted!) collection F of subsets of U={1,…,n}.

We must find as few as possible subsets from F such that their union covers U.

Hypergraph vertex cover (lecture of 10/3) is a special case of set cover.

Approximation algorithms

A greedy algorithm: Recall algorithms 1… At each stage, add to the cover the subset which

maximizes the number of new elements. This algorithm gives an approximation of ln(n)-lnln(n)

+O(1).

An approximation of ln(n) can also be achieved using a linear programming relaxation.

A bad example for the greedy algorithm

U={(x,y): 1 ≤ x,y ≤ n}

S1 = {(x,y): x ≤ n/2}

S2 = {(x,y): x ≥ n/2}

T1= {(x,y): 1 ≤ y ≤ n/2}

T2= {(x,y): n/2 ≤ y ≤ 3n/4}

…

32

1688

Ratio: logn/2

Overview (1)

We wish to prove Theorem 8: If a poly time algorithm can approximate set cover within (1-ε)ln(n), then NP is a subset of DTIME(nO(loglog(n))).

The strategy: a reduction from a k-prover proof system for an NP-complete problem.

Familiar with IP?

Overview (2)

Parts of the lecture: MAX 3SAT-5. The k-prover proof system. Partition systems. The reduction to set cover. Thundering applause.

May the force be with us.

Simila

r

to the

lecture

of 17/3.

MAX 3SAT-5

Input: A CNF formula with n variables and 5n/3 clauses, in which every clause contains exactly three literals, every variable appears in exactly 5 clauses, and a variable does not appear in a clause more than once.

Theorem 1: For some >0, it is NP-hard to distinguish between 3CNF-5 formulas where OPT=1 or OPT≤(1-).

Reduction from MAX-

3SAT-B

Two prover proof system for 3SAT-5 Presented as to provide intuition and help in the

analysis of the k-prover system. One round, two prover system for 3SAT-5. Protocol:

V selects an index of a clause, sends it to the first prover, selects a random var in the clause, and sends it to the second prover.

First prover returns 3 bits, second returns 1 bit. V accepts if the following conditions hold:

Clause check: the assignment sent by the first prover satisfies the clause.

Consistency check: the assignment sent by the second prover is identical to the assignment for the same variable sent by the first prover.

Assgn. To clause and

var

Proposition 2: If OPT=1-ε, then under the optimal strategy of the provers, V accepts with probability (1-ε/3).

Proof:The strategy of the second prover defines an

assignment A to the variables. If V selects a clause that is not satisfied by A

(prob. ε), the first prover must set one of the variables differently from A.

The consistency check fails with prob. ≥ 1/3. ■

There is a strategy with acceptance prob ε/3

Parallel repetition

We would like to lower the error. Modified proof system: V sends to the first

prover l clauses, from each chooses a var, and sends these l vars to the second prover.

Theorem 3 (Ran Raz): If a one round two prover system is repeated l times independently in parallel, then the error is 2-cl, where c>0 is a constant that depends only on the original proof system.

The error of the modified two prover system is ≤ 2-cl, for some universal constant c. C for subtle

The k-prover proof system

Binary code with k codewords, each of length l and weight l/2, with Hamming distance at least l/3.

Each prover is associated with a codeword. The Protocol:

The verifier selects l clauses C1,…,Cl, then selects a var from each clause: the distinguished variables x1,…,xl.

Prover Pi receives Cj for those coordinates in its codeword that are 1, and xj for the coordinates that are 0, and replies with 2l bits.

The answer of the prover induces an assignment to the distinguished variables.

Acceptance predicate: Weak: at least one pair of provers is consistent. Strong: every pair of provers is consistent.

P1: 0011 P2: 0101 P3: 1100

V c1 c2 v3 v4100,010,1,0

Lemma 4: If OPT=1 then the provers have a strategy that causes V to always strongly accept. If OPT≤(1-ε), then the verifier weakly accepts with probability at most k22-cl, where c>0 is a constant that depends only on ε.

Proof: If OPT=1, the provers can base their answers on a

satisfying assignment. Assume OPT=(1- ε), and that V weakly accepts with

prob. ≥ δ. Then with respect to Pi and Pj, V accepts with probability δ/k2.

There are ≥ l/6 coordinates on which Pi receives a clause, and Pj a var in this clause. Other coordinates: for free.

The provers have a strategy that succeeds with prob. ≥ δ/k2 on l/6 parallel repetitions of the original proof system.

δ/k2<2-cl. ■

Partition systems

A partition system B(m,L,k,d) has the following properties.1. There exists a ground set B of m distinct points. 2. There is a collection of L distinct partitions p1,…,pL.3. For 1≤i≤L, partition pi is a collection of k disjoint subsets of B

whose union is B.4. Any cover of the m points by subsets that appear in pairwise

different partitions requires at least d subsets. Lemma 6: For every c≥0 and m sufficiently large, there

is a partition system B(m,L,k,d) whose parameters satisfy the following inequalities:1. L ≈ (log(m))c.2. K can be chosen arbitrarily as long as k < ln(m)/3ln(ln(m)).3. d = (1-f(k))k∙ln(m), where f(k)→∞ as k→∞.

Proof: a randomized construction usually works.

Looks familiar?

L=4

k = 4

The reduction R=(5n)l : num of r. r↔Br(m,L,k,d): m=nΘ(l),

L=2l, d=(1-f(k))k∙ln(m). Partition↔dist. vars,

subset↔prover. B(r,j,i) = i’th subset, partition j, partition system r.

Subsets are S(q,a,i): all r s.t. (q,i)@r, extract from a an assignment ar to dist. vars. S(q,a,i) is the union of all subsets B(r,ar,i), for all r s.t. (q,i)@r.

Q=nl/2(5n/3)l/2 (questions to Pi).

r=1

r=2

r=R

L=2l

m points, k subsets

00011011

B(2,4,3)S(q,a,1)

|U?|

Lemma 5: 1. Completeness: OPT=1 there is a set cover

of size kQ.

2. Soundness: OPT=(1-ε) more than (1-2f(k))kQln(m) subsets are required.

Proof of Completeness: If OPT=1, the provers answer consistently

with the satisfying assignment. For any r, consider S(q1,a1,r),…,S(qk,ak,r) s.t.

(qi,i)@r, and ai is the appropriate answer.

Br(m,L,k,d) is covered by these k sets.

Similar for every r. The number of subsets is kQ. ■

ar=01

Proof of soundness: the saga begins

Assume OPT≤1-ε, and that there exists C that covers U, such that |C|=(1-δ)kQln(m), δ=2f(k).

q to prover Pi↔ weight w(q,i)=number of answers a s.t. S(q,a,i) is in C.

Σq,iw(q,i) = |C|. r↔w(r)= Σ(q,i)@rw(q,i). This weight = number of subsets that

participate in covering Br(m,L,k,d). Call r good if w(r)<(1-δ/2)k∙ln(m).

The good, the bad, and the random

Proposition 6: The fraction of good r is at least δ/2.

Proof: Assume otherwise, then:

On the other hand,

Hence |C|>(1- δ)kQln(m) – contradiction. ■

2(1 / 2) ln (1 ) lnrr

w kR m kR m

( , )@ ,

( , ) ( , ) | |rr r q i r i q

R Rw w q i w q i C

Q Q

r is good if:

w(r)= Σ(q,i)@rw(q,i)<(1-δ/2)k∙ln(m)

Proposition 7: C covers U, |C|=(1-δ)kQln(m) for some strategy for the k provers, V weakly accepts with prob. ≥ 2δ/(k∙ln(m))2.

This proves soundness, since 2δ/(k∙ln(m))2>k22-cl, for l=Θ(loglogn).

Proof of proposition 7: Randomized strategy: on q to Pi, select a from the set of a

s.t. S(q,a,i) is in C. For a fixed r: sets B(r,p,i) in the cover of Br(m,L,k,d) ↔ sets

S(q,a,i) in C. For a good r: C used two subsets from p in the cover of

Br(m,L,k,d). Denote B(r,p,i) and B(r,p,j) ↔ S(qi,ai,i) and S(qj,aj,j). Denote: a is in Ar,i iff S(qi,a,i) is in C. r is good |Ar,i|+|Ar,j|<k∙ln(m). The prob. that the provers answer ai and aj ≥ 4/(k∙ln(m))2.

Answers are consistent with p V weakly accepts. The prob. that V chooses a good r ≥ δ/2. ■

Fix coin tosses for provers

Theorem 8: If there is some ε>0 such that a polynomial time algorithm can approximate set cover within (1-ε)ln(n), then NP is a subset of DTIME(nO(loglog(n))).

Proof: Assume there is a poly time algorithm A that

approximates set cover within (1-ε)ln(m). Reduce GAP-3SAT-5 to set cover as described, with

k s.t. f(k)<ε/4, and m=(5n)2l/ε. m,R,Q=nO(loglogn) time to perform the reduction is

nO(loglogn). ln(m)>(1-ε/2)ln(N), where N=mR. By lemma 5, for a YES instance all points can be

covered by kQ subsets, and for a NO instance all points cannot be covered by (1-2f(k))kQln(m).

The ratio is (1-2f(k))ln(m)>(1-ε)lnN. ■

Closing remarks (1)

Max k-cover: Given U and F as before, select k subsets that such

that their union has maximum cardinality. The obvious greedy algorithm approximates max k-

cover within a ratio of at least 1-1/e ≈ 0.632. Theorem 8 can be directly used to prove: if max k-

cover can be approximated in a polynomial time within a ration of (1 - 1/e + ε) for some ε>0, then NP is a subset of DTIME(nO(loglogn)).

Closing remarks (2)

Refinements: In our hardness of approximation result for set cover,

ε is a constant. We may strengthen our assumption so as to improve our result.

ZTIME=class of languages that have a probabilistic algorithm that runs in expected time t (with zero error).

If for some η>0, NP is not a subset of ZTIME(2n^η), then for some constant c’>0 there is no polynomial time algorithm that approximates set cover within ln(n) –c’(lnln(n))2.

Questions and cool animations

?

a threshold of ln(n) for approximating set cover by uriel feige lecturer: ariel procaccia

Documents