
COMS 6998: Advanced Complexity Spring 2017

Lecture 14: April 27, 2017
Lecturer: Rocco Servedio                    Scribes: Emmanouil Vlatakis

1 Introduction

Last time

During the last lecture we completed the ABFR lower bound method. We finished the proof of the lower bound for computing PAR using circuits with a MAJ gate at the top and ∨, ∧, ¬ gates everywhere else. In addition, we proved Razborov AC⁰ lower bounds using PAR gates.

Today

Today we discuss linear threshold functions (LTFs). We will prove lower bounds for circuits that use LTF gates. We will present the main facts about LTFs and the basic Randomized Communication Complexity (RCC) model. Finally, we will analyze LTFs in the RCC model.

2 Linear Threshold Functions (LTFs)

Definition 1 (Linear Threshold Function (LTF)). A boolean function f : {0,1}ⁿ → {0,1} is called a weighted majority or (linear) threshold function if it is expressible as f(x) = sgn(w₁x₁ + ··· + wₙxₙ − θ) = sgn(~w · ~x − θ) for some w₁, ..., wₙ, θ ∈ ℝ.

A simple example of an LTF is φ(x) = 1{3x₁ + 4x₂ + ··· + 10xₙ ≥ 71}. LTFs also go by several different names: weighted majorities, weighted threshold functions, perceptrons, halfspaces, linear separators, etc.
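To make the definition concrete, here is a minimal sketch (ours, not from the notes) of evaluating an LTF, using the convention sgn(0) = +1 that the examples below rely on:

```python
# A minimal sketch (illustration only): evaluating an LTF f(x) = sgn(w·x − θ),
# with the convention sgn(t) = +1 if t >= 0 and −1 otherwise.

def sgn(t):
    """Sign function with sgn(0) = +1, the convention used in these notes."""
    return 1 if t >= 0 else -1

def ltf(w, theta, x):
    """Evaluate the LTF sgn(w·x − θ); w and x are equal-length sequences."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) - theta)

# Example: an AND-like threshold over {0,1}^3: outputs +1 iff all bits are 1.
print(ltf([1, 1, 1], 3, [1, 1, 1]))  # +1
print(ltf([1, 1, 1], 3, [1, 0, 1]))  # -1
```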

Definition 2 (Polynomial Threshold Function (PTF)). A function f : {0,1}ⁿ → {0,1} is called a polynomial threshold function (PTF) of degree at most k if it is expressible as f(x) = sgn(p(x)) for some real polynomial p : {0,1}ⁿ → ℝ of degree at most k.


Figure 1: Linear Separator.

Corollary 3. Every LTF is just a degree-1 PTF.

Fact 4. Is every Boolean function an LTF? No: PAR is not linearly separable, as Figure 2 shows.

Figure 2: f(x) = xor(x1, x2)

Fact 5. Any LTF f(x) has many different representations (~w, θ); in fact there are infinitely many different representations over the hypercube {−1,1}ⁿ. Any coefficient wᵢ can be perturbed within the range [wᵢ − 1/(cn), wᵢ + 1/(cn)], for a sufficiently large constant c, without changing the value of f(x) over {−1,1}ⁿ.

Corollary 6. Combining the previous fact with the density of the rational numbers in ℝ, there is always a representation of an LTF with only rational coefficients.


Example 7. Both of the following are equivalent over {−1,1}²:

x₁ + x₂ ≥ 15
0.99x₁ + 1.002x₂ ≥ 15.005

Fact 8. sgn(~w · ~x − θ) = sgn(c(~w · ~x − θ)) for all c > 0.

Fact 9. Any LTF has an integer representation, that is, one with w₁, ..., wₙ, θ ∈ ℤ.

Proof. As we mentioned before, we can choose a very small εₙ such that f(x) = sgn(∑_{i∈[n]} wᵢxᵢ − θ) equals g(x) = sgn(∑_{i∈[n]} w̃ᵢxᵢ − θ̃) for all x ∈ {−1,1}ⁿ, whenever each w̃ᵢ ∈ (wᵢ − εₙ, wᵢ + εₙ). As is well known from real analysis, the rational numbers are dense in the set of real numbers. This implies that for every i ∈ [n] there is at least one rational w̃ᵢ in the range (wᵢ − εₙ, wᵢ + εₙ). Thus we can adopt the following simple strategy: perturb every irrational coefficient wᵢ (and θ) to a rational number in its corresponding range, and then multiply all the coefficients by the least common multiple of their denominators.
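The final scaling step can be illustrated with a short sketch (the helper name is ours):

```python
# A sketch of the scaling step in the proof of Fact 9: given rational LTF
# parameters, clear denominators with their least common multiple.
# Multiplying by a positive constant does not change sgn (Fact 8).

from fractions import Fraction
from math import lcm

def to_integer_representation(w, theta):
    """Scale rational LTF parameters (w, theta) to an integer representation."""
    coeffs = list(w) + [theta]
    m = lcm(*(c.denominator for c in coeffs))  # positive scaling factor
    return [int(wi * m) for wi in w], int(theta * m)

# Example: (1/2)x1 + (3/4)x2 − 1/3 becomes 6x1 + 9x2 − 4.
w, t = to_integer_representation([Fraction(1, 2), Fraction(3, 4)], Fraction(1, 3))
print(w, t)  # [6, 9] 4
```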

Definition 10 (The weight of an integer representation of an LTF). The weight of an integer representation (~w, θ) of an LTF is ∑ᵢ₌₁ⁿ |wᵢ|.

Definition 11 (The weight of an LTF). The weight of an LTF f(x), denoted W(f(x)), is the smallest weight over all integer representations of the function.

Example 12.

or(x₁, ..., xₙ) = sgn(∑ᵢ₌₁ⁿ xᵢ − 1),   ~x ∈ {0,1}ⁿ

Notice that using the above representation we can easily argue that W(or) ≤ n. In addition, we can easily argue that this is optimal. Let g(x) = sgn(∑ᵢ₌₁ⁿ wᵢxᵢ − t) be the weight-optimal representation of the or function. By the definition of or we have that for all i ∈ [n], g(~eᵢ) = g((0, ..., 0, 1, 0, ..., 0)) = 1, with the single 1 in the i-th position. This implies that wᵢ ≥ t for all i ∈ [n]. Additionally, we have g(~0) = −1, which implies that t > 0. Therefore wᵢ > 0 for all i ∈ [n], and hence wᵢ ≥ 1, since the weights are integers. This concludes the proof that the weight of or is at least n.

Example 13. MAJ(x₁, ..., xₙ) has weight Θ(n):

MAJ(x₁, ..., xₙ) = 1{∑ᵢ xᵢ ≥ n/2} for ~x ∈ {0,1}ⁿ,   MAJ(x₁, ..., xₙ) = 1{∑ᵢ xᵢ ≥ 0} for ~x ∈ {−1,1}ⁿ


To prove the lower bound W(MAJ) = Ω(n), i.e., for the {0,1}ⁿ case, instead of {~eᵢ}_{i∈[n]} and ~0 one can use all the (n choose ⌊n/2⌋+1) positive examples of the MAJ function, which contain ⌊n/2⌋ + 1 ones, and all the (n choose ⌈n/2⌉−1) negative examples, which contain ⌈n/2⌉ − 1 ones. This family of vectors is sufficient to prove that t ≥ n/2 and wᵢ ≥ 1.

2.1 LTF weight upper bound

Lemma 14 (Weight upper bound). Any n-variate LTF has weight at most 2^{O(n log n)}.

Proof sketch: Each configuration ((x₁, ..., xₙ), f(~x)) corresponds to a linear inequality.

• For example, if f(1, ..., 1) = 1 then we get w₁ + w₂ + ··· + wₙ − θ ≥ 1.

• On the other hand, if f(−1, 1, ..., 1) = −1 then we get −w₁ + w₂ + ··· + wₙ − θ ≤ −1.

Since f(x) is guaranteed to be an LTF, we know that this system of 2ⁿ inequalities is feasible. To be more precise, using LP arguments we can conclude that there exists a set of n + 1 of these inequalities which, when converted to equalities, gives a solution S = (w₁*, ..., wₙ*, θ*); via LP arguments again, it is not difficult to prove that this solution is guaranteed to satisfy all 2ⁿ linear inequalities. The system consists of n + 1 equations of the form

±w₁ ± w₂ ± ··· ± wₙ − θ = ±1,

so the matrix of the system has only ±1 entries. Solving the system with Cramer's rule gives us a rational solution. In the worst case the determinant of the system is upper bounded by n! ≤ nⁿ = 2^{n log n}. The determinant of the matrix is the denominator of the Cramer's rule solution, and the numerators are determinants of minors of the system. Therefore, using the scaling trick that we discussed to transform all the coefficients to integers, we get a weight of 2^{O(n log n)}.
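As a sanity check of the weight notion (not the LP argument itself), a brute-force search over small integer weights recovers weight-optimal representations of tiny functions; the helper below is a hypothetical illustration:

```python
# Brute-force search (illustration only, feasible only for tiny n): given a
# boolean function as a map from {0,1}^n to ±1, try all integer weight vectors
# with |w_i|, |θ| <= bound and report the smallest total weight that works.

from itertools import product

def min_weight_representation(f, n, bound=4):
    best = None
    for cand in product(range(-bound, bound + 1), repeat=n + 1):
        w, theta = cand[:n], cand[n]
        if all(f(x) == (1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1)
               for x in product((0, 1), repeat=n)):
            weight = sum(abs(wi) for wi in w)
            if best is None or weight < best[0]:
                best = (weight, w, theta)
    return best

OR3 = lambda x: 1 if any(x) else -1
print(min_weight_representation(OR3, 3))  # weight 3, matching W(or) = n
```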

Question: The previous fact upper bounds the weight of an LTF. Is such a huge weight ever really necessary?


Example 15. Consider the following decision list: if x₁ = 1 output +1; else if x₂ = 1 output −1; else if x₃ = 1 output +1; and so on, alternating signs:

L = x₁ → x₂ → ··· → xₙ, with outputs +1, −1, +1, −1, ...

⇒ LTF = sgn(2ⁿx₁ − 2ⁿ⁻¹x₂ + 2ⁿ⁻²x₃ − ···)

Notice that because of the priority ordering of the decision list we have inequality constraints like:

|w₁| ≥ ∑ᵢ≥₁ |w₂ᵢ|
|w₂| ≥ ∑ᵢ≥₁ |w₂ᵢ₊₁|
|w₃| ≥ ∑ᵢ≥₂ |w₂ᵢ|
...

For this particular problem the smallest possible weights are the Fibonacci numbers, which grow exponentially as ((1 + √5)/2)ⁿ.
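The compilation of the alternating decision list into the exponential-weight LTF above can be checked exhaustively; the sketch below assumes a default output of −1 when all bits are 0:

```python
# A sketch (assumed conventions, matching Example 15): compile the alternating
# decision list "if x1: +1, elif x2: −1, elif x3: +1, ..." over {0,1}^n into
# an LTF with weights 2^n, −2^(n−1), 2^(n−2), ..., and verify on all inputs.

from itertools import product

def decision_list(x):
    for i, xi in enumerate(x):
        if xi:
            return 1 if i % 2 == 0 else -1
    return -1  # assumed default when all bits are 0

def as_ltf(n):
    w = [(2 ** (n - i)) * (1 if i % 2 == 0 else -1) for i in range(n)]
    theta = 1  # forces output −1 on the all-zero input
    return w, theta

n = 5
w, theta = as_ltf(n)
for x in product((0, 1), repeat=n):
    val = 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1
    assert val == decision_list(x)  # the first set bit always dominates
print("decision list == LTF on all", 2 ** n, "inputs")
```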

Lemma 16 (Bad news [Has94]). There exist n-variate LTFs with weight 2^{Ω(n log n)}.

Question: Can we approximate any LTF with a lower weight LTF?

Answer: Yes

Fact 17. Let f(x) be any LTF. For any ε > 0 there exists an LTF f′ such that dist(f, f′) = Pr_{x∈{0,1}ⁿ}[f(x) ≠ f′(x)] ≤ ε, where W(f′(x)) is (1/ε)^{log²(1/ε)} · poly(n).

Example 18. For the previous representation of the decision list L, we can simply drop the least significant weights.

3 LTF as gates

First, note that LTF gates can be transformed to equivalent MAJ gates:

f(x) = sgn(w₁x₁ + w₂x₂ + ··· − θ)

⇕

f(x) = sgn(x₁ + ··· + x₁ [w₁ times] + x₂ + ··· + x₂ [w₂ times] + ··· + (−1) − ··· − 1 [θ times])


Figure 3: Suppose that we have an LTF of the form f(x) = sgn(~w · ~x − θ) = 1{~w · ~x > θ}. Then we can convert it to a majority gate by duplicating each input wire wᵢ times.

LTF(3x₁, ..., 2xₙ) ≡ MAJ(x₁, x₁, x₁, ..., xₙ, xₙ)

So constructing circuits of LTF gates is equivalent to constructing circuits of MAJ gates, but at a cost proportional to the weight of the LTF.
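A minimal sketch of this wire-duplication trick, assuming integer weights, a nonnegative θ, and ±1 inputs (so a negative weight simply negates the wire):

```python
# Wire-duplication conversion from Figure 3 (a sketch under the assumptions
# above): each input x_i is repeated |w_i| times (negated if w_i < 0), plus
# θ copies of the constant −1, and a single MAJ gate is applied.

def maj(bits):
    """Majority over ±1 inputs: +1 iff the sum is nonnegative."""
    return 1 if sum(bits) >= 0 else -1

def ltf_as_maj(w, theta, x):
    """Evaluate sgn(w·x − θ) over x in {−1,1}^n via a single MAJ gate."""
    wires = []
    for wi, xi in zip(w, x):
        wires += [xi if wi > 0 else -xi] * abs(wi)  # duplicate each input wire
    wires += [-1] * theta  # θ copies of the constant −1 (assumes θ >= 0)
    return maj(wires)

# sgn(3x1 + 2x2 − 1) becomes MAJ(x1, x1, x1, x2, x2, −1):
print(ltf_as_maj([3, 2], 1, [1, -1]))   # sgn(3 − 2 − 1) = sgn(0) = +1
print(ltf_as_maj([3, 2], 1, [-1, -1]))  # sgn(−3 − 2 − 1) = −1
```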

Some additional interesting remarks:

1. We do not know of any counterexamples to the following claim:

“Every f in NP (e.g. f = Clique) has poly(n)-size LTF circuits of depth 2”

2. Any poly(n)-size depth-d LTF circuit can be computed by a poly(n)-size depth-(d + 1) MAJ circuit.

3. Furthermore, it is not difficult to prove that any LTF circuit can be computed by a depth-O(1) and/or/not circuit with MAJ gates at the very bottom level.

3.1 LTF circuits lower bounds

Fact 19 (Layered LTF circuits). Any depth-2 layered LTF circuit for parity must have Ω(√n) LTF gates.


Note: A circuit is called layered if and only if, after a level decomposition from the input to the output, every wire of a given level goes only to the very next level.

Proof sketch:

• An edge of {−1,1}ⁿ is a pair (~x, ~x^{⊕i}), where ~x^{⊕i} agrees with ~x on every coordinate j ≠ i and has x^{⊕i}ᵢ = −xᵢ.

• It is not difficult to see that if f(x) is an LTF over {−1,1}ⁿ, then f slices the edge (~x, ~x^{⊕i}) if and only if f(~x) ≠ f(~x^{⊕i}).

Definition 20 (Influence of a variable). The influence of a variable xᵢ is the (normalized) number of edges of the form (y, y^{⊕i}) that are sliced by f:

Inf_i(f) = (1/2ⁿ) |{y ∈ {−1,1}ⁿ : f(y) ≠ f(y^{⊕i})}|

• For any boolean function, |f̂({i})| ≤ Inf_i(f), and if f is an LTF then this inequality is tight.

• For any boolean function, Parseval's identity ∑_{S⊆[n]} f̂(S)² = 1 implies ∑_{i∈[n]} f̂({i})² ≤ 1. Therefore, for any LTF f:

∑_{i∈[n]} (Inf_i(f))² ≤ 1.

Using Cauchy–Schwarz we get ∑_{i∈[n]} Inf_i(f) ≤ √n.

• Therefore any LTF f slices at most a 1/√n fraction of all n2ⁿ⁻¹ edges.

• The parity function slices all the edges, and in a layered depth-2 circuit the bottom-level gates must together slice every edge: if no bottom gate slices an edge, the whole circuit outputs the same value on both of its endpoints. Therefore we need at least √n LTF gates to slice them all.
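A small experiment (ours) makes the slicing bound concrete: PAR has total influence exactly n, while an LTF such as MAJ has total influence at most √n:

```python
# Compare total influence of PAR (slices every edge) against MAJ (an LTF,
# so its total influence is at most √n by the argument above).

from itertools import product
from math import sqrt

def influence_sum(f, n):
    """Sum over i of Inf_i(f), the fraction of y with f(y) != f(y^{⊕i})."""
    total = 0.0
    for i in range(n):
        flipped = 0
        for y in product((-1, 1), repeat=n):
            y2 = list(y)
            y2[i] = -y2[i]               # flip the i-th coordinate
            if f(y) != f(tuple(y2)):
                flipped += 1
        total += flipped / 2 ** n
    return total

n = 9
par = lambda y: 1 if y.count(-1) % 2 == 0 else -1
maj = lambda y: 1 if sum(y) >= 0 else -1
print("PAR total influence:", influence_sum(par, n))  # exactly n = 9
print("MAJ total influence:", influence_sum(maj, n))  # about 2.46
print("sqrt(n):", sqrt(n))                            # 3.0
```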


Slicing the cube. Using LTFs raises many interesting questions. For example: how many hyperplanes do we need to slice all n2ⁿ⁻¹ edges of the {−1,1}ⁿ hypercube?

1. n LTFs are enough: we use one for each variable.

2. For the {−1,1}⁶ cube, it is possible to slice all the edges using 5 LTFs.

3. Generalizing the last result, we can slice all the edges of {−1,1}ⁿ with 5n/6 LTFs.

How many different LTFs exist? Using Chow's theorem we can bound the count by 2^{Θ(n²)}. A tighter result gives a count of 2^{n²/2 − n/10} different LTFs.

What is a hard function for LTF circuits?

Generally our preferred hard function is PAR. Here, we will use it through the back door.

Definition 21. The boolean inner product is the 2n-variate function IP(x, y) = ⟨x, y⟩ mod 2, or equivalently IP(x₁, ..., xₙ, y₁, ..., yₙ) = PAR(x₁ ∧ y₁, ..., xₙ ∧ yₙ).

Let's recall the definition of the parity function on a subset S ⊆ [n]:

χ_S(~x) = ⊕_{i∈S} xᵢ

Therefore we have χ_S(~x) = IP(1_S, ~x), where 1_S ∈ {0,1}ⁿ is the indicator vector of S.
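A quick exhaustive check of this identity for a small n and S (illustration only):

```python
# Verify χ_S(x) = IP(1_S, x) on all of {0,1}^n for a small example.

from itertools import product

def ip(x, y):
    """Boolean inner product: <x, y> mod 2, for x, y in {0,1}^n."""
    return sum(xi & yi for xi, yi in zip(x, y)) % 2

n, S = 4, {0, 2}
one_S = tuple(1 if i in S else 0 for i in range(n))  # indicator vector of S
for x in product((0, 1), repeat=n):
    chi_S = sum(x[i] for i in S) % 2  # parity of the coordinates in S
    assert chi_S == ip(one_S, x)
print("χ_S(x) = IP(1_S, x) verified on all x in {0,1}^%d" % n)
```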

3.2 This lecture's lower bounds

Today we will prove the following lower bounds:

1. Any circuit of LTF gates for IP must use at least Ω(n/log n) gates.

2. Any decision tree with LTFs as nodes for inner product must have depth Ω(n/log n).

3. Any circuit of the form MAJ ∘ LTF for IP uses at least 2^{Ω(n)} LTF gates.

Question: Why will we use IP instead of PAR?

Answer: Under randomized communication complexity with public coins, PAR has a very cheap randomized protocol. On the other hand, IP requires very high communication complexity even for randomized protocols.


4 Randomized Communication Complexity

First, let's recall the model of communication complexity. Let A and B be two cooperating parties, say Alice and Bob. There is a known f : X × Y → Z, so f(x, y) = z. However, A only gets x and B only gets y. They must send information to each other to compute f, and at the end both must know the result f(x, y).

Definition 22 (Deterministic communication complexity of a function f(x, y)). For a protocol P, cost(P) is the depth of the protocol tree (this is the number of bits of communication needed in the worst case). The deterministic communication complexity of f, D(f), is the minimum of cost(P) over all protocols P that compute f.

Definition 23 (Randomized communication model with public coins). The setting is as above: there is a known f : X × Y → Z, Alice only gets x, Bob only gets y, and at the end both must know the result f(x, y). Additionally, Alice and Bob have access to a common string of random bits.

Notice that the randomized communication model with public coins is just a probability distribution over deterministic protocols: when Alice and Bob draw random bits, and the drawn coins are public, their sample corresponds to a jointly sampled deterministic protocol P_r.

Thus, we have the following definition of a randomized communication protocol:

Definition 24 (Randomized communication protocol). A randomized protocol P for computing f(x, y) works thus: A and B sample from a public uniform random string, and based on this sample they agree on a deterministic protocol P_r. Then A and B apply this protocol.

Definition 25 (The cost of the protocol). For a randomized protocol P, define cost(P)(x, y) to be the depth of the protocol tree on input (x, y) (the number of bits of communication needed in the worst case over the randomness, given (x, y)). The cost of the protocol is cost(P) = max_{(x,y)∈X×Y} cost(P)(x, y).

Definition 26 (Randomized communication protocol error). A randomized protocol P has error ε if and only if for all (x, y) ∈ X × Y:

Pr_{r ∈ public coins}[P(x, y) ≠ f(x, y)] ≤ ε

Definition 27 (Randomized ε-error communication complexity). The randomized ε-error communication complexity of f, R_ε(f), is the minimum of cost(P) over all protocols P that compute f with error at most ε.


Lemma 28. R_ε(EQ(x, y)) = O(log(1/ε))

Proof. It suffices to prove that there exists a protocol that achieves this bound.

Protocol P:

• Sample log(1/ε) strings r¹, ..., r^{log(1/ε)} of length n, independently, uniformly, and publicly.

• Alice has the string x and Bob has the string y.

• Alice computes the log(1/ε) inner products αⱼ = ⟨x, rʲ⟩ mod 2, for j = 1, ..., log(1/ε).

• Bob computes the log(1/ε) inner products βⱼ = ⟨y, rʲ⟩ mod 2, for j = 1, ..., log(1/ε).

• Alice sends Bob her log(1/ε) bits α₁, ..., α_{log(1/ε)}.

• Bob answers ∧_{j=1}^{log(1/ε)} [αⱼ = βⱼ].

• Correctness:

– Firstly, if x = y then αⱼ = ⟨x, rʲ⟩ = ⟨y, rʲ⟩ = βⱼ for every j. Therefore, if x = y, Bob always answers correctly.

– Secondly, if x ≠ y then there exists an index i such that xᵢ ≠ yᵢ. Consider the equality and disagreement index sets

E = {i | xᵢ = yᵢ},   D = {i | xᵢ ≠ yᵢ}.

Given that x ≠ y, we have |D| ≥ 1. Since the strings rʲ are independent,

Pr[Bob fails] = Pr[⋂_{j=1}^{log(1/ε)} (αⱼ = βⱼ)] = ∏_{j=1}^{log(1/ε)} Pr[αⱼ = βⱼ].

Let's analyze Pr[α_k = β_k]:

Pr[α_k = β_k] = Pr[⟨x, r^k⟩ = ⟨y, r^k⟩]
= Pr[⟨x − y, r^k⟩ ≡ 0 (mod 2)]
= Pr[∑ᵢ₌₁ⁿ (xᵢ − yᵢ) r^kᵢ ≡ 0 (mod 2)]
= Pr[∑_{i∈E} 0 · r^kᵢ + ∑_{i∈D} 1 · r^kᵢ ≡ 0 (mod 2)]
= Pr[∑_{i∈D} r^kᵢ ≡ 0 (mod 2)] = 1/2.

The last step is just the random subset identity: a uniformly random subset of a nonempty set D contains an even number of its elements with probability exactly 1/2. Finally:

Pr[Bob fails] = ∏_{j=1}^{log(1/ε)} Pr[αⱼ = βⱼ] = ∏_{j=1}^{log(1/ε)} (1/2) = 2^{−log(1/ε)} = ε.

• Complexity:

– As the coins are public, the only bits that Alice and Bob exchange are the log(1/ε) inner-product bits.

– So we need O(log(1/ε)) communication bits in order to solve the equality problem with error probability at most ε.
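A simulation sketch of the protocol just analyzed (the function names are ours); with k = log(1/ε) shared strings, the empirical error on unequal inputs should be near 2^{−k}:

```python
# Public-coin equality protocol: k shared random strings, Alice sends her k
# inner-product bits, Bob accepts iff all of them match his own.

import random

def eq_protocol(x, y, k):
    """x, y: equal-length 0/1 tuples. Returns the protocol's answer."""
    n = len(x)
    verdicts = []
    for _ in range(k):  # k public random strings r^1, ..., r^k
        r = [random.randrange(2) for _ in range(n)]
        a = sum(xi * ri for xi, ri in zip(x, r)) % 2  # Alice's bit <x, r>
        b = sum(yi * ri for yi, ri in zip(y, r)) % 2  # Bob's bit <y, r>
        verdicts.append(a == b)
    return all(verdicts)  # "equal" iff all k bits agree

random.seed(0)
x = (0, 1, 1, 0, 1, 0, 1, 1)
y = (0, 1, 1, 0, 1, 0, 1, 0)  # differs from x in the last bit
trials = 10000
errors = sum(eq_protocol(x, y, k=3) for _ in range(trials))
print("empirical error:", errors / trials, "(should be near 2^-3 = 0.125)")
```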

Exercise 29. R_{1/2−ε}(IP) ≥ n − log(1/ε)

5 Communication Complexity of Univariate Functions

Suppose that we want to analyze the communication complexity of a univariate function f(x), i.e., a function of a single input ~x with no prescribed split between the parties. Then, in order to define the communication complexity for this class, we can just compute:

Deterministic communication complexity of f = max over all splits of D(f)

Randomized communication complexity of f = max over all splits of R_ε(f)

In any case:

• If we want to prove an upper bound on this quantity, we should solve the problem for every partition.

• If we want to prove a lower bound, we only have to exhibit one specific partition.

Fact 30. For the class of linear threshold functions, we know that the deterministic communication complexity is D(LTF) ≥ Ω(n).

This lower bound clearly cannot be improved asymptotically, since Alice can always just send Bob all of her input bits. Let's analyze randomized public-coin protocols.

The best-known result is the following:

Theorem 31. R_ε(LTF) ≤ O(log(log(n/ε)))

Here, in class, we will prove a simpler result:

Theorem 32. R_ε(LTF) ≤ O(log n) · O(log(log(n/ε)))


Proof. As we mentioned, in order to upper bound the complexity, we have to design a protocol. Notice that the proposed protocol should be independent of any specific split.

So, let's assume that ~x is split arbitrarily into

Alice's bits := X = (x₁, ..., x_k),   (x_{k+1}, ..., xₙ) = Y =: Bob's bits

At the beginning of the lecture we showed that for any LTF we may assume |wᵢ| ≤ 2^{O(n log n)} for all i ∈ [n]. Therefore each wᵢ is described by O(n log n) bits.

Protocol P:

1. Alice computes α = ∑ᵢ₌₁ᵏ wᵢxᵢ.

2. Bob computes β = θ − ∑ᵢ₌ₖ₊₁ⁿ wᵢxᵢ.
Note: they want to verify whether α > β, and these numbers are O(n log n) bits long.

3. Alice and Bob execute a binary search to find the most significant bit position at which α and β differ; call it i* (with i* = −1 if no such position exists).

4. If i* = −1 then Alice and Bob agree to output 0 (the numbers are equal).

5. If α_{i*} = 1 then Alice and Bob agree to output 1 (α > β).

6. If α_{i*} = 0 then Alice and Bob agree to output 0 (α < β).

Their binary search scheme works by identifying the longest prefix on which their numbers agree.


Binary search (over the bit positions 1, ..., n log n, from the most significant end):

1. ℓ_u ← n log n, ℓ_d ← n log n / 2.

2. Alice looks at a ← the bits of α in positions ℓ_d through ℓ_u.

3. Bob looks at b ← the bits of β in positions ℓ_d through ℓ_u.

4. Alice and Bob run the R_{ε/(2 log n)} equality protocol on a and b.

5. If a = b then they extend the search to the suffix: the numbers agree on the current block, so they recurse on the less significant half.

6. If a ≠ b then they restrict the search to the current block, halving it again.

7. When the interval shrinks to a single bit position x: if the bits of α and β differ at position x, output i* = x; otherwise output i* = −1 (the numbers are equal).
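A sketch of the binary-search skeleton, with deterministic prefix-equality tests standing in for the randomized R_{ε/(2 log n)} equality subprotocol:

```python
# Find the most significant bit position where two L-bit numbers differ,
# using O(log L) equality tests on prefixes (each such test is what the
# randomized equality protocol would perform on Alice's and Bob's shares).

def msb_difference(alpha, beta, L):
    """Most significant differing bit position (0 = least significant),
    or −1 if the numbers are equal."""
    def prefixes_equal(k):  # compare the k most significant of the L bits
        return (alpha >> (L - k)) == (beta >> (L - k))
    if prefixes_equal(L):
        return -1           # the numbers are equal
    lo, hi = 1, L           # smallest k with unequal prefixes lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if prefixes_equal(mid):
            lo = mid + 1    # agree on the top mid bits: try longer prefixes
        else:
            hi = mid        # already differ within the top mid bits
    return L - lo           # bit index of the first disagreement

alpha, beta = 0b10110100, 0b10100111
i = msb_difference(alpha, beta, L=8)
print(i, (alpha >> i) & 1, (beta >> i) & 1)  # 4, and the bits differ: 1 vs 0
```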


• Communication complexity: since W(f) = 2^{O(n log n)}, the numbers α and β have length at most log W(f) = O(n log n) bits, so the binary search has O(log(n log n)) stages. Each stage runs an equality protocol with error ε/(2 log n), costing O(log((2 log n)/ε)) bits. So the total complexity is O(log(n log n) · log((log n)/ε)).

• Correctness: by a union bound, the error probability is at most log(n log n) × ε/(2 log n) ≤ ε.

Lemma 33. Let f be a boolean function that is computable by a circuit C of s gates, each of which computes a function from a family ℛ. Then R_{1/3}(f) ≤ s · R_{1/(3s)}(ℛ).

Proof.

Protocol P:

• Sort the circuit for f topologically from the input (bottom) layer to the output (top) layer. Each layer is made up of some gates.

• Parse each layer from the bottom to the top and apply the R_{1/(3s)}(ℛ) protocol to each gate. (The outputs of earlier gates are known to both parties, so each gate is again a two-party problem on the original inputs.)

• Output the result of the final gate of the top layer.

• Communication complexity: for each gate we apply an R_{1/(3s)}(ℛ) protocol, and there are at most s gates, so we use at most s · R_{1/(3s)}(ℛ) bits of communication.

• Correctness: by a union bound, the error probability is at most (1/(3s)) × #gates = (1/(3s)) × s = 1/3.

5.1 Circuit lower bounds

Corollary 34. Any circuit of LTF gates for IP needs Ω(n/log n) gates.

Proof. As we mentioned, n − log(1/ε) ≤ R_{1/2−ε}(IP). Therefore n − O(1) ≤ R_{1/3}(IP) ≤ s · R_{1/(3s)}(LTF).


Let's assume now, toward a contradiction, that there is such a circuit that uses s = o(n/log n) gates. By Theorem 31, R_{1/(3s)}(LTF) = O(log(log(3sn))) = O(log n), so

s · R_{1/(3s)}(LTF) = o(n/log n) · O(log n) = o(n).

But this means that n − O(1) ≤ o(n), which is obviously a contradiction.

Corollary 35. Any decision tree with LTFs as nodes for inner product must have depth Ω(n/log n).

Proof. The proof is similar to the previous one. The corresponding lemma for this computational model is the following:

Lemma 36. Let f be a boolean function that is computable by a decision tree T of height h whose nodes are gates from a family ℛ. Then R_{1/3}(f) ≤ h · R_{1/(3h)}(ℛ).

Proof sketch: The reason this lemma differs from the previous one, which refers to general circuits, is simple: during the simulation of a decision tree we follow only one specific root-to-leaf branch, so the number of gates actually evaluated is at most the length of the longest branch, which by definition is the depth of the tree. Thus, in the worst case, the communication we need is O(depth(T)) · R_{1/(3h)}(ℛ).

Let's assume now, toward a contradiction, that there is a tree of depth d = o(n/log n). Given the previous lemma, we have

d · R_{1/(3d)}(LTF) = o(n/log n) · O(log n) = o(n).

However, as we have already mentioned, n − O(1) ≤ R_{1/3}(IP). This implies that if there were any decision tree of depth o(n/log n) that uses LTF gates as nodes, then n − O(1) ≤ R_{1/3}(IP) ≤ o(n), which is obviously a contradiction.

Let’s recall the last goal of this lecture:

Theorem 37. Any circuit of the form MAJ ∘ LTF for IP uses at least 2^{Ω(n)} LTF gates.


Figure 4: Any circuit C = MAJ ∘ LTF has two layers: the bottom layer is made up of s LTF gates and the top layer is made up of the final MAJ gate.

Proof. Here we will need a new trick. We will prove the following lemma:

Lemma 38. For any function of the form f = MAJ(C₁, ..., C_s), where each Cᵢ belongs to a family ℛ, we have R_{1/2 − 1/(4s)}(f) ≤ R_{1/(4s)}(ℛ).

Proof.

• Again we have to implement a protocol for f using the best possible randomized protocol for ℛ.

• The reader should first notice that there is an extremely simple O(1)-bit R_{1/2}(f) protocol: just toss a public coin and output it as the value of the majority gate.

• However, given that majority gates are defined over an odd number of inputs, we can describe a protocol that boosts the success probability a little bit.

Question: What is the probability that the majority function agrees with a uniformly randomly chosen one of its s input variables? (Here the probability is over the random choice of the input variable, for any fixed input.)

Answer: At least ((s+1)/2)/s, since the majority value is shared by at least (s+1)/2 of the s inputs.


• Therefore, for every input x,

Pr_{i ∈ [s]}[f(x) = Cᵢ(x)] ≥ ((s+1)/2)/s = 1/2 + 1/(2s).

• So the protocol is pretty simple:

Protocol P:

1. Using the public coins, choose a bottom gate C*(x) uniformly at random and run the R_{1/(4s)}(ℛ) protocol on it.

2. Output its result.

• Indeed, the protocol P outputs the correct result with probability at least

Pr_{i∈[s]}[f(x) = Cᵢ(x)] · Pr[Cᵢ(x) is computed correctly] ≥ (1/2 + 1/(2s)) · (1 − 1/(4s)) ≥ 1/2 + 1/(2s) − 1/(4s) = 1/2 + 1/(4s).
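A toy simulation (ours) of the random-gate protocol: on any fixed input, answering with a uniformly random bottom gate agrees with the majority value with probability at least 1/2 + 1/(2s):

```python
# For a fixed input x, fix the s (odd) bottom-gate outputs C_1(x),...,C_s(x);
# the majority value occurs at least (s+1)/2 times, so a random gate agrees
# with MAJ with probability at least 1/2 + 1/(2s).

import random

def maj(vals):
    return 1 if sum(vals) >= 0 else -1

random.seed(1)
s = 7                                                     # odd gate count
gate_outputs = [random.choice((-1, 1)) for _ in range(s)] # the C_i(x) values
f_value = maj(gate_outputs)

trials = 100000
agree = sum(random.choice(gate_outputs) == f_value for _ in range(trials))
print("empirical agreement:", agree / trials)             # = (majority count)/s
print("guaranteed bound 1/2 + 1/(2s):", 0.5 + 1 / (2 * s))
```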

Now we are ready to prove our last lower bound. Suppose that s = 2^{o(n)}. We know that

n − log(1/ε) ≤ R_{1/2−ε}(IP)   for all ε > 0.

Let ε = 1/(4s); then we have

n − o(n) ≤ R_{1/2 − 1/(4s)}(IP).

Additionally, if a MAJ ∘ LTF circuit with s bottom gates computes IP, then by Lemma 38 and Theorem 31,

R_{1/2 − 1/(4s)}(IP) ≤ R_{1/(4s)}(LTF) ≤ O(log(log(4sn))) = O(log n).

Combining the last two inequalities we get n − o(n) ≤ O(log n), which is a contradiction.

References

[Has94] Johan Håstad. On the size of weights for threshold gates. SIAM Journal on Discrete Mathematics, 7(3):484–492, August 1994.