
1

Algorithms for Large Data Sets

Ziv Bar-Yossef

Lecture 13

June 22, 2005

http://www.ee.technion.ac.il/courses/049011

2

Data Stream Space Lower Bounds

3

Data Streams: Reminder

[Figure: a data set of size n (n is very large) is streamed through a computer program that uses limited memory and outputs an approximation of f(data).]

How do we prove memory lower bounds for data stream algorithms?

4

Communication Complexity [Yao 79]

[Figure: Alice and Bob exchange messages m1, m2, m3, m4; a referee sees the transcript and announces the output.]

Π(a,b) = (m1, m2, …): the "transcript" of protocol Π on inputs (a,b)

cost(Π) = Σ_i |m_i|

CC(g) = min_{Π: Π computes g} cost(Π)

5

1-Way Communication Complexity

[Figure: Alice sends a single message m1 to Bob; Bob sends a single message m2 to the referee.]

Π(a,b) = (m1, m2): the "transcript"

cost(Π) = |m1| + |m2|

CC^1(g) = min_{Π: Π is 1-way and computes g} cost(Π)

6

Data Stream Space Lower Bounds

f: A^n → B: a function
DS(f): data stream space complexity of f = the minimum amount of memory used by any data stream algorithm computing f

g: A^{n/2} × A^{n/2} → B, g(x,y) = f(x∘y) (x∘y denotes the concatenation of the two halves of the stream)
CC^1(g): 1-way communication complexity of g

Proposition: DS(f) ≥ Ω(CC^1(g))

Corollary: In order to obtain a lower bound on DS(f), it suffices to prove a lower bound on CC^1(g).

7

Reduction: 1-Way CC to DS

Proof of reduction:
S = DS(f)
M: an S-space data stream algorithm for f

Lemma: There is a 1-way CC protocol Π for g whose cost is S + log(|B|).

Proof: Π works as follows:
Alice runs M on x
Alice sends the state of M to Bob (S bits)
Bob continues the execution of M on y
Bob sends the output of M to the referee (log(|B|) bits)
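The simulation is mechanical enough to write down. Below is a minimal Python sketch of it, using a toy streaming algorithm (a running sum) as a stand-in for M; the class and method names are illustrative assumptions, not anything defined in the course.

```python
# One-way protocol for g(x, y) = f(x ∘ y) built from a streaming algorithm for f.
# "SumAlg" is a toy stand-in (it computes the sum of the stream).

class SumAlg:
    def __init__(self):
        self.state = 0              # the S bits of memory

    def feed(self, item):
        self.state += item

    def output(self):
        return self.state

def one_way_protocol(x, y):
    M = SumAlg()
    for item in x:                  # Alice runs M on her half of the stream
        M.feed(item)
    m1 = M.state                    # Alice's message: the memory of M (S bits)

    M_bob = SumAlg()
    M_bob.state = m1                # Bob resumes M from Alice's state
    for item in y:                  # ... and finishes the stream on y
        M_bob.feed(item)
    return M_bob.output()           # Bob's message to the referee: log|B| bits

assert one_way_protocol([1, 2, 3], [4, 5]) == sum([1, 2, 3, 4, 5])
```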

8

Distinct Elements: Reminder

Input: a vector x ∈ {1,2,…,m}^n

DE(x) = number of distinct elements of x
Example: if x = (1,2,3,1,2,3), then DE(x) = 3

Space complexity of DE:
Randomized approximate: O(log m) space
Deterministic approximate: Ω(m) space
Randomized exact: Ω(m) space
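For intuition on the first row, here is a small Python sketch of one classical small-space estimator (the k-minimum-values idea); it is meant only as an illustration, not necessarily the algorithm analyzed earlier in the course, and the hash function and the parameter k are arbitrary choices.

```python
import hashlib

def _hash01(item):
    # Map an item to a pseudo-uniform value in [0, 1); illustrative hash choice.
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64

def approx_distinct(stream, k=64):
    """k-minimum-values estimator: remember only the k smallest hash values."""
    smallest = []                         # at most k floats -> small space
    for item in stream:
        v = _hash01(item)
        if v in smallest:                 # duplicate element, already counted
            continue
        if len(smallest) < k:
            smallest.append(v)
            smallest.sort()
        elif v < smallest[-1]:
            smallest[-1] = v
            smallest.sort()
    if len(smallest) < k:                 # fewer than k distinct elements seen
        return len(smallest)
    return int((k - 1) / smallest[-1])    # standard KMV estimate

x = (1, 2, 3, 1, 2, 3)
print(len(set(x)), approx_distinct(x))    # exact DE(x) = 3, estimate = 3
print(approx_distinct(range(10_000)))     # roughly 10,000
```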

9

The Equality Function

EQ: X × X → {0,1}
EQ(x,y) = 1 iff x = y

Theorem: CC^1(EQ) ≥ Ω(log |X|)

Proof:
Π: any 1-way protocol for EQ
A(x): Alice's message on input x
B(m,y): Bob's message when receiving message m from Alice and holding input y
Π(x,y) = (A(x), B(A(x),y)): the transcript on (x,y)

10

Equality Lower Bound

Proof (cont.):
Suppose |A(x)| < log|X| for all x ∈ X.
Then the number of distinct messages Alice can send is < 2^{log|X|} = |X|.
By the pigeonhole principle, there exist two distinct inputs x, x' ∈ X s.t. A(x) = A(x').
Therefore Π(x,x) = Π(x',x), so the referee gives the same answer on both.
But EQ(x,x) ≠ EQ(x',x). Contradiction.

11

Combinatorial Designs

A family of subsets T1,…,Tk of a universe U s.t.

1. For each i, |Ti| = |U|/4
2. For each i ≠ j, |Ti ∩ Tj| ≤ |U|/8

[Figure: three pairwise almost-disjoint subsets T1, T2, T3 inside U.]

Fact: There exist designs of size k = 2^Ω(|U|).
(They can be obtained from constant-rate, constant relative minimum distance binary error-correcting codes.)
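The existence of fairly large designs can also be seen probabilistically: random subsets of size |U|/4 intersect in about |U|/16 elements on average, comfortably below the |U|/8 bound. The sketch below (with arbitrary illustrative parameters) samples such a family and checks both conditions; note it does not reach the exponential size k = 2^Ω(|U|), which is what the coding-theoretic construction gives.

```python
import itertools
import random

def random_design(universe_size=256, k=20, seed=0):
    """Sample k random subsets of size |U|/4 and check the design conditions."""
    rng = random.Random(seed)
    universe = range(universe_size)
    sets = [frozenset(rng.sample(universe, universe_size // 4)) for _ in range(k)]

    size_ok = all(len(T) == universe_size // 4 for T in sets)
    intersection_ok = all(len(Ti & Tj) <= universe_size // 8
                          for Ti, Tj in itertools.combinations(sets, 2))
    return sets, size_ok and intersection_ok

sets, ok = random_design()
print(ok)   # with these parameters the check succeeds with high probability
```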

12

Reduction from EQ to DE

U = {1,2,…,m}
X = {T1,…,Tk}: a design of size k = 2^Ω(m)

EQ: X × X → {0,1}

Note (Alice streams the elements of her set x, Bob the elements of his set y):
If x = y, then DE(x∘y) = m/4
If x ≠ y, then DE(x∘y) ≥ 2·(m/4) − m/8 = 3m/8

Therefore: a deterministic data stream algorithm that approximates DE to within a sufficiently small constant factor with space S ⟹ a 1-way protocol for EQ with cost S + O(log m).

Conclusion: DS(DE) ≥ Ω(log |X|) = Ω(m) for deterministic constant-factor approximation of DE.
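To make the reduction concrete, the sketch below plays it out in Python with an exact counter standing in for the hypothetical small-space approximation algorithm; the threshold 5m/16 (anything strictly between m/4 and 3m/8 works) and the toy design are illustrative.

```python
def eq_via_de(T_alice, T_bob, m):
    """One-way protocol for EQ over a design, given a distinct-elements
    algorithm (here an exact counter stands in for the S-space algorithm)."""
    state = set()
    for e in T_alice:               # Alice streams her design set
        state.add(e)
    # Alice -> Bob: the algorithm's memory state (S bits in the real reduction).
    for e in T_bob:                 # Bob streams his design set
        state.add(e)
    de = len(state)                 # DE(x ∘ y): m/4 if equal, >= 3m/8 otherwise
    return 1 if de <= 5 * m / 16 else 0

m = 8
T1, T2 = frozenset({1, 2}), frozenset({3, 4})      # toy design: |Ti| = m/4, disjoint
print(eq_via_de(T1, T1, m), eq_via_de(T1, T2, m))  # 1 0
```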

13

Randomized Communication Complexity

Alice & Bob are allowed to use random coin tosses.
For any inputs x,y, the referee needs to find g(x,y) w.p. 1 − δ.

RCC(g) = minimum cost of a randomized protocol that computes g with error ¼.
RCC^1(g) = minimum cost of a randomized 1-way protocol that computes g with error ¼.
What is RCC^1(EQ)?

RDS(f) = minimum amount of memory used by any randomized data stream algorithm computing f with error ¼.

Lemma: RDS(f) ≥ Ω(RCC^1(g))
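On the question above: with randomness, one-way equality becomes far cheaper than the deterministic Ω(log |X|) bound. Below is a hedged Python sketch of the classical fingerprinting idea (Alice reduces her n-bit string modulo a random prime of O(log n) bits); the prime range n² and the bit encoding are illustrative choices, and the error estimate in the comment is only approximate.

```python
import random

def _is_prime(p):
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def eq_fingerprint(x_bits, y_bits, rng=random):
    """Randomized one-way protocol for EQ on n-bit strings.
    Alice sends a random prime p <= n^2 and (x mod p): O(log n) bits total."""
    n = len(x_bits)
    x = int("".join(map(str, x_bits)), 2)
    y = int("".join(map(str, y_bits)), 2)
    p = rng.randrange(2, max(n * n, 4) + 1)
    while not _is_prime(p):
        p = rng.randrange(2, max(n * n, 4) + 1)
    # x - y has few prime factors, so a random prime rarely divides it:
    # the protocol errs only when x != y and p divides (x - y).
    return x % p == y % p

x = [1, 0, 1, 1, 0, 0, 1, 0] * 8        # n = 64
y = list(x); y[17] ^= 1
print(eq_fingerprint(x, x), eq_fingerprint(x, y))   # True, and almost surely False
```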

14

Set Disjointness

U = {1,2,…,m}
X = 2^U = {0,1}^m (a set is identified with its characteristic vector)

Disj: X × X → {0,1}
Disj(x,y) = 1 iff x ∩ y ≠ ∅
Equivalently: Disj(x,y) = ∨_{i=1}^m (x_i ∧ y_i)

Theorem [Kalyanasundaram-Schnitger 88]: RCC(Disj) ≥ Ω(m)

15

Reduction from Disj to DE

U = {1,2,…,m}
X = 2^U

Disj: X × X → {0,1}

Note: DE(x∘y) = |x ∪ y|. Hence:
If x ∩ y = ∅, then DE(x∘y) = |x| + |y|
If x ∩ y ≠ ∅, then DE(x∘y) < |x| + |y|

Therefore: a randomized data stream algorithm that computes DE exactly with space S ⟹ a 1-way randomized protocol for Disj with cost S + O(log m) (Alice also sends |x|, which Bob needs for the comparison).

Conclusion: RDS(DE) ≥ Ω(m)
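A short Python sketch of this reduction, again with a Python set standing in for the (hypothetical) exact randomized streaming algorithm:

```python
def disj_via_de(x, y):
    """One-way protocol for Disj from an exact distinct-elements algorithm."""
    state = set()
    for e in x:                        # Alice streams her set
        state.add(e)
    size_x = len(x)                    # sent alongside the state: O(log m) bits
    for e in y:                        # Bob streams his set
        state.add(e)
    de = len(state)                    # DE(x ∘ y) = |x ∪ y|
    return 1 if de < size_x + len(y) else 0      # 1 iff x ∩ y ≠ ∅

print(disj_via_de({1, 2}, {3, 4}), disj_via_de({1, 2}, {2, 3}))   # 0 1
```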

16

Information Theory Primer

X: a random variable on U
H(X) = entropy of X = the amount of "uncertainty" in X (in bits)
Ex: if X is uniform, then H(X) = log(|U|)
Ex: if X is constant, then H(X) = 0

H(X | Y) = conditional entropy of X given Y = the amount of uncertainty left in X after knowing Y
Ex: H(X | X) = 0
Ex: if X,Y are independent, then H(X | Y) = H(X)

I(X ; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X) = the mutual information between X and Y

[Figure: Venn diagram of H(X) and H(Y); the regions are H(X|Y), H(Y|X), their overlap I(X;Y), and the union H(X,Y).]
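These quantities are easy to check numerically on a small joint distribution; the following Python sketch computes them directly from the definitions for an arbitrary toy example.

```python
from math import log2

def H(p):
    """Entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

# A toy joint distribution of (X, Y) as {(x, y): probability}.
joint = {(0, 0): 0.5, (1, 0): 0.25, (1, 1): 0.25}

pX, pY = {}, {}
for (x, y), q in joint.items():
    pX[x] = pX.get(x, 0) + q           # marginal of X
    pY[y] = pY.get(y, 0) + q           # marginal of Y

HX, HY, HXY = H(pX), H(pY), H(joint)
print("H(X)   =", HX)                  # 1.0, since X is uniform on {0,1}
print("H(X|Y) =", HXY - HY)            # chain rule: H(X|Y) = H(X,Y) - H(Y)
print("I(X;Y) =", HX + HY - HXY)       # = H(X) - H(X|Y) = H(Y) - H(Y|X)
```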

17

Information Theory Primer (cont.)

Sub-additivity of entropy:

H(X,Y) ≤ H(X) + H(Y). Equality iff X,Y independent.

Conditional mutual information:

I(X ; Y | Z) = H(X | Z) – H(X | Y,Z)

18

Information Complexity

g: A × B → C: a function
Π: a communication protocol that computes g
μ: a distribution on A × B
(X,Y): a random variable with distribution μ

Information cost of Π: icost_μ(Π) = I(X,Y ; Π(X,Y))
Information complexity of g: IC_μ(g) = min_{Π: Π computes g} icost_μ(Π)

Lemma: For any μ, RCC(g) ≥ IC_μ(g)

19

Direct Sum for Information Complexity

We want to:
Find a distribution μ on {0,1}^m × {0,1}^m
Show that IC_μ(Disj) ≥ Ω(m)

Recall that Disj(x,y) = ∨_{i=1}^m (x_i ∧ y_i).

We will prove a "direct sum" theorem for IC(Disj): Disj is a "sum" of m independent copies of AND, hence the information complexity of Disj is m times the information complexity of AND.

We will define a distribution ν on {0,1} × {0,1} and then define μ = ν^m.

"Theorem" [Direct Sum]: IC_μ(Disj) ≥ m · IC_ν(AND)

It would then suffice to prove an Ω(1) lower bound on IC_ν(AND).

20

Conditional Information Complexity

We cannot prove the direct sum directly for information complexity.

Recall:
μ: a distribution on {0,1}^m × {0,1}^m
(X,Y): a random variable with distribution μ
(X,Y) is product if X,Y are independent

Z: some auxiliary random variable on a domain S
(X,Y) is product conditioned on Z if for every z ∈ S, X,Y are independent conditioned on the event {Z = z}

Conditional information complexity of g given Z:
CIC_μ(g | Z) = min_{Π: Π computes g} I(X,Y ; Π(X,Y) | Z)

Lemma: For any μ and Z, IC_μ(g) ≥ CIC_μ(g | Z)

21

Input Distributions for Set Disjointness

ν: a distribution on pairs in {0,1} × {0,1}
(U,V): a random variable with distribution ν
(U,V) is generated as follows:
Choose a uniform bit D
If D = 0, choose U uniformly from {0,1} and set V = 0
If D = 1, choose V uniformly from {0,1} and set U = 0

We define μ = ν^m.

Note:
ν is not product, but it is product conditioned on D; likewise, μ is product conditioned on Z = D^m.
For every (x,y) in the support of μ, Disj(x,y) = 0.
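A small Python sketch of the sampling procedure, which also confirms the last point: every sampled pair is disjoint.

```python
import random

def sample_nu(rng):
    """One coordinate: (U, V) together with the auxiliary bit D."""
    d = rng.randint(0, 1)
    if d == 0:
        return rng.randint(0, 1), 0, d    # U uniform, V forced to 0
    return 0, rng.randint(0, 1), d        # V uniform, U forced to 0

def sample_mu(m, rng=random):
    """(X, Y, D^m) with the coordinates (X_i, Y_i, D_i) drawn i.i.d. from nu."""
    coords = [sample_nu(rng) for _ in range(m)]
    X = [u for u, _, _ in coords]
    Y = [v for _, v, _ in coords]
    D = [d for _, _, d in coords]
    return X, Y, D

X, Y, D = sample_mu(10)
# No coordinate has X_i = Y_i = 1, so Disj(X, Y) = 0 on the whole support.
print(any(x and y for x, y in zip(X, Y)))   # always False
```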

22

Direct Sum for IC [Bar-Yossef, Jayram, Kumar, Sivakumar 02]

Theorem: CIC_μ(Disj | D^m) ≥ m · CIC_ν(AND | D)

Proof outline:

Decomposition step:
I(X,Y ; Π(X,Y) | D^m) ≥ Σ_i I(X_i,Y_i ; Π(X,Y) | D^m)

Reduction step:
I(X_i,Y_i ; Π(X,Y) | D^m) ≥ CIC_ν(AND | D)

23

Decomposition Step

I(X,Y ; Π(X,Y) | D^m)
  = H(X,Y | D^m) − H(X,Y | Π(X,Y), D^m)
  ≥ Σ_i H(X_i,Y_i | D^m) − Σ_i H(X_i,Y_i | Π(X,Y), D^m)
    (by independence of (X_1,Y_1),…,(X_m,Y_m) conditioned on D^m, and by sub-additivity of entropy)
  = Σ_i I(X_i,Y_i ; Π(X,Y) | D^m)

24

Reduction Step

Want to show: I(X_i,Y_i ; Π(X,Y) | D^m) ≥ CIC_ν(AND | D)

I(X_i,Y_i ; Π(X,Y) | D^m) = Σ_{d_{-i}} Pr(D_{-i} = d_{-i}) · I(X_i,Y_i ; Π(X,Y) | D_i, D_{-i} = d_{-i})

A protocol for computing AND(x_i,y_i):
For all j ≠ i, Alice and Bob select X_j and Y_j independently with private coins, according to ν conditioned on D_j = d_j
Alice and Bob run Π on X = (X_1,…,X_{i-1},x_i,X_{i+1},…,X_m) and Y = (Y_1,…,Y_{i-1},y_i,Y_{i+1},…,Y_m)
Note that Disj(X,Y) = AND(x_i,y_i)

The icost of this protocol equals I(X_i,Y_i ; Π(X,Y) | D_i, D_{-i} = d_{-i}).
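The embedding in the last step can be written as a protocol template. In the sketch below, `pi` is a placeholder for the given protocol Π for Disj (here the Disj function itself stands in for its output), and the private sampling follows the coordinate distribution ν from slide 21.

```python
import random

def embed_and(pi, m, i, x_i, y_i, d, rng=random):
    """Protocol for AND(x_i, y_i) built from a protocol `pi` for Disj.
    `d` lists the conditioning bits d_j for j != i (entry i is ignored)."""
    X, Y = [0] * m, [0] * m
    for j in range(m):
        if j == i:
            X[j], Y[j] = x_i, y_i          # the real inputs sit in coordinate i
        elif d[j] == 0:
            X[j] = rng.randint(0, 1)       # Alice samples X_j privately; Y_j = 0
        else:
            Y[j] = rng.randint(0, 1)       # Bob samples Y_j privately; X_j = 0
    # For every j != i we have X_j AND Y_j = 0, hence Disj(X, Y) = AND(x_i, y_i).
    return pi(X, Y)

disj = lambda X, Y: int(any(a and b for a, b in zip(X, Y)))
d = [0, 1, 0, 1, 0]
print(embed_and(disj, 5, 2, 1, 1, d), embed_and(disj, 5, 2, 1, 0, d))   # 1 0
```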

25

End of Lecture 13