probabilistic aspects of voting, intransitivity and manipulationelmos/fall19/survey.pdf ·...

Probabilistic Aspects of Voting, Intransitivity andManipulation

Elchanan Mossel∗

Abstract

Marquis de Condorcet, a French philosopher, mathematician, and political scientist,studied mathematical aspects of voting in the eighteenth century. Condorcet wasinterested in studying voting rules as procedures for aggregating noisy signals and inthe paradoxical nature of ranking 3 or more alternatives. These notes will surveysome of the main mathematical models, tools, and results in a theory that studiesprobabilistic aspects of social choice. Our journey will take us through major results inmathematical economics from the second half of the 20th century, through the theory ofboolean functions and their influences and through recent results in Gaussian geometryand functional inequalities.

∗MIT, Cambridge MA, USA. E-mail: [email protected]. Partially supported by NSF Grant CCF 1665252and DOD ONR grant N00014-17-1-2598

1

1 Introduction

Marquis de Condorcet, a French philosopher, mathematician and political scientist, studiedmathematical aspects of voting in the eighteenth century. It is remarkable that alreadyin the 18th century Condorcet was an advocate of equal rights for women and people ofall races and of free and equal public education [79]. His applied interest in democraticprocesses led him to write an influential paper in 1785 [16], where in particular he wasinterested in voting as an aggregation procedure and where he pointed out the paradoxicalnature of voting in the presence of 3 or more alternatives.

1.1 The law of large numbers and Condorcet Jury Theorem

In what is known as Condorcet Jury Theorem, Condorcet considered the following setup.There are n voters and two alternatives denoted + (which stands for +1) and − (whichstands for −1). Each voter obtains a signal which indicates which of the alternatives ispreferable. The assumption is that there is an a priori better alternative and that eachvoter independently obtains the correct information with probability p > 1/2 and incorrectinformation with probability 1 − p. The n voters then take a majority vote to decide thewinner. Without loss of generality we may assume that the correct alternative is + andtherefore the individual signals are i.i.d RV xi where P[xi = +] = p and P[xi = −] = 1− p.Let m denote the Majority function, i.e the function that returns the most popular valuesamong its inputs. Condorcet Jury Theorem is:

Theorem 1.1. For every p > 1/2:

• limn odd →∞ P[m(x1, . . . , xn) = +] = 1.

• If n1 < n2 are odd then P[m(x1, . . . , xn1) = +] < P[m(x1, . . . , xn2) = +].

The first part of the theorem is immediate from the law of large numbers (which wasknown at the time), so the novel contribution was the second part. In the early days ofmodern democracy, Condorcet used his model to argue that the more people participatingin decision making, the more likely that the correct decision is arrived at. We leave theproof of the second part of the theorem as an exercise.

1.2 Condorcet Paradox and Arrow Theorem

As hinted earlier, things are more interesting when there are 3 or more alternatives. Inthe same 1785 paper, Condorcet proposed the following “paradox”. Consider three votersnamed 1, 2 and 3, and three alternatives named a, b and c. Each voter ranks the threealternatives in one of six linear orders. While it is tempting to represent the orders aselements of the permutation group S3, it will be more useful for us to use the followingrepresentation. Voter i preference is given by (xi, yi, zi) where xi = +, if she prefers a

2

to b and −, otherwise, yi = +, if she prefers b to c and −, otherwise and zi = + if sheprefers c to a and −, otherwise. Each of the 6 rankings corresponds to one of the vectorsin −1, 13 \ ±(1, 1, 1).

Condorcet considered 3 voters, with rankings given by: a > b > c, c > a > b, b > c > a,or in our notation by the rows of the following matrix:x1 y1 z1

x2 y2 z2

x3 y3 z3

=

+ + −+ − +− + +

How should we decide how to aggregate the individual rankings? If we use the majority

rule to decide between each pair of preferences, then we apply the majority rule on each ofthe columns of the matrix and conclude that overall preference is (+,+,+). In other wordsthe overall preference is that a > b, the overall preference is that b > c and the overallpreference is that c > a. This does not correspond to an order! This is what is knownas Condorcet Paradox. It is part of a long and interesting discussion in political science,economics, mathematics and computer science on how to aggregate rankings.

Of particular interest to us is to follow Ken Arrow who asked if perhaps the paradoxis the result of using the majority function to decide between every pair of alternatives?Can we avoid paradoxes if we aggregate pairwise preferences using a different function?

One function that never results in paradoxes is the dictator function f(x) = x1 as theaggregate ranking is (x1, y1, z1) 6= ±(1, 1, 1).

Arrow in his famous theorem proved [2, 3]:

Theorem 1.2. Let f : −1, 1n → −1, 1 and suppose that f never results in a paradox,so for all (xi, yi, zi) 6= ±(1, 1, 1) it holds that (f(x), f(y), f(z)) 6= ±(1, 1, 1).Then f is adictator: there exists an i such that f(x) = xi for all x, or there exists an i such thatf(x) = −xi for all x.

With the right notation and formulation the proof of Arrow’s Theorem is very short [4,53]:

Proof. First note that if f is a constant function, then the outcome is always ±(1, 1, 1).Suppose that f is not a dictator and not a constant, then f depends on at least twocoordinates. Without loss of generality, let these coordinates be 1 and 2. Therefore:

∃x2, . . . , xn : f(+, x2, . . . , xn) 6= f(−, x2, . . . , xn)

∃y1, y3, . . . , yn : f(y1,+, y3, . . . , yn) 6= f(y1,−, y3, . . . , yn)

We now choose z1 = −y1 and zi = −xi for i ≥ 2. This guarantees that (xi, yi, zi) 6=±(1, 1, 1) for all i no matter what the values of x1 and y2 are. This can be also verified

3

from the matrix in (1). Now choose x1 and y2 so that f(x) = f(y) = f(z) to arrive at acontradiction.

x1 y1 −y1

x2 y2 −x2...

......

xn yn −xn

(1)

Arrow also considered a more general setting where f, g and h are allowed to be differentfunctions, there are other functions that result in 0 probability of paradox. For exampleif f = 1 and g = −1 for all inputs then h can be arbitrary. This corresponds to the casewhere the second alternative b is ignored (always ranked last) and the choice between aand c is determined by h. Arrow Theorem in this setting can be stated as follows:

Theorem 1.3. Let f, g, h : −1, 1n → −1, 1 and suppose that (f, g, h) never results ina paradox, so for all (xi, yi, zi) 6= ±(1, 1, 1) it holds that (f(x), g(y), h(z)) 6= ±(1, 1, 1).Theneither two of the functions are constant of opposite sign, or there exists an i such that f, gand h are dictator on voter i.

Proof. If two of the functions, say f and g take the same constant value and the thirdfunction h is not constant, then clearly one can find x, y, z such that f(x) = g(y) = h(z)and (x, y, z) 6= ±(1, 1, 1). So WLOG we may assume at least two of the functions, say fand g are not constant. Let A(f) denote the set of non-zero influence variables of f andsimilarly A(g) and A(h). Since f and g are not constant A(f) and A(g) are not empty. Ifthere exists a variable i ∈ A(f) and a variable i 6= j ∈ A(g), then by the same argument asin Theorem 1.2, there exist (x, y, z) resulting in a paradox. Thus, the only case remainingis where A(f) = A(g) = i and A(h) = i or A(h) = ∅. In either case, the functions f, g andh are all functions of variable i only. It is now easy to verify that it must be the case thatf = g = h is a dictator on voter i.

1.3 Manipulation and the GS Theorem

A naturally desirable property of a voting system is strategyproofness (a.k.a. nonmanipu-lability): no voter should benefit from voting strategically, i.e., voting not according to hertrue preferences. However, Gibbard [31] and Satterthwaite [76] showed that no reasonablevoting system can be strategyproof. Before stating their result, let us specify the problemmore formally.

The setting here is different than the setup of Arrow’s Theorem: We consider n voterselecting a winner among k alternatives. The voters specify their opinion by ranking thealternatives, and the winner is determined according to some predefined social choice func-tion (SCF) f : Snk → [k] of all the voters’ rankings, where Sk denotes the set of all possibletotal orderings of the k alternatives. We call a collection of rankings by the voters a ranking

4

profile. We say that an SCF is manipulable if there exists a ranking profile where a votercan achieve a more desirable outcome of the election according to her true preferences byvoting in a way that does not reflect her true preferences.

For example, consider Borda voting, where each candidate receives a score which is thesum of its ranks and the candidate with the lowest score wins. If the individual rankingsare: (abcd), (cadb), then a is the winner, but if the second voter were to vote (cdba) insteadthen c will become the winner, so the second voter will want to vote untruthfully.

Theorem 1.4 (Gibbard-Satterthwaite). Any Social Choice Function which is not a dicta-torship (i.e., not a function of a single voter), and which allows at least three alternativesto be elected, is manipulable.

This theorem has contributed to the realization that it is unlikely to expect truthfulnessin voting. There are many proofs of the Gibbard-Satterthwaite theorem, but all are morecomplex than the proof of Arrow’s theorem given above. We will not provide a proof ofthe theorem in these notes.

2 Modern Perspectives

Work since the 1980s addressed novel aspects of aggregation of votes. Condorcet JuryTheorem assumes a probability distribution over the voters but is restricted to a specificaggregation function (majority) while Arrow theorem considers general aggregation func-tions but involves no probability model. There are many interesting questions that canbe asked by combining the two perspectives. First, it is natural to ask about aggregationproperties of Boolean functions. The study of aggregation properties of Boolean functionswas fundamental to the development of the area of “Analysis of Boolean Functions” sincethe 1980s, starting with [8] and [40]. Second, we can ask questions regarding the proba-bility of manipulation and paradoxes, questions that were analyzed since the early 2000s,starting with the works [41, 42, 28].

In terms of mathematical theory, the main statement of quantitative social theory havethe same form as statements in property testing add citations and some of the mainstatements in additive combinatorics. In terms of the techniques, we will see that the maintechniques have discrete isopetimetric flavor add citations. We will cover the followingtopics:

We will start by studying the question of noise stability of Boolean functions [9, 60].This will lead us to discussing some of the main analytical tools in the area, includingnotably hyper-contraction [11, 68, 33, 6], the invariance principle [60] and Borell’s Gaussiannoise-stability result [13].

We will then devote a considerable effort to proving quantitative versions of ArrowTheorem. We will do so by proving a Gaussian version of Arrow theorem, as well as aquantitive version using reverse hyper-contraction [12]. Combining the two we will provea general quantitive Arrow theorem [55].

5

Different tools were used in proving different quantitive versions of the manipulationtheorem. The first proof, which applies only to 3 alternatives, uses a reduction to aquantitive Arrow Theorem [28, 26]. We will follow later proofs which apply in the case ofa larger number of coordinates and use reverse hyper-contraction as a main tool [36, 64].The classical proofs of manipulation theorems often use long paths of voting profiles. Themost general proof in [64] will quantify such arguments using geometric tools from thetheory of Markov chains.

In the last part of the notes we will review some of the more classical results of theaggregation power of Boolean functions and also discuss some future directions includingstudying the same questions for non-product measures.

We note that much of the interest in quantitative social choice theory comes fromartificial intelligence and computer science, where virtual elections are now an establishedtool in preference aggregation (see the survey by Faliszewski and Procaccia [23]). Manyof the results in the study of social choice are negative: it is impossible to design a votingsystem that satisfies a few desired properties all at once.

Recall that Gibbard-Satterthwaite theorem states that any SCF which is not a dicta-torship (i.e., not a function of a single voter), and which allows at least three alternativesto be elected, is manipulable. This has contributed to the realization that it is unlikely toexpect truthfulness in voting. Consequently, there have been many branches of researchdevoted to understanding the extent of manipulability of voting systems, and to findingways of circumventing the negative results.

One approach, introduced by Bartholdi, Tovey and Trick [5], suggests computationalcomplexity as a barrier against manipulation: if it is computationally hard for a voter tomanipulate, then she would just tell the truth (we refer to the survey by Faliszewski andProcaccia [23] for a detailed history of the surrounding literature). This is a worst-caseapproach, and while worst-case hardness of manipulation is a desirable property for a SCFto have, this does not tell us anything about typical instances of the problem—is it easyor hard to manipulate on average?

The quantitative versions of Arrow and the GS theorem we will see below show thattypically ranking is paradoxical and manipulation is easy on average. This follows a longline of research which proved such results for restricted classes of SCFs, including, e.g.,Kelly [44], Conitzer and Sandholm [17], and Procaccia and Rosenschein [74] (see also thesurvey [23]).

3 Reading

The recommended reading for various topics is as follows:

• Noise Stability: We follow the approach of [19, 20]. The original approach for theproofs of noise stability was developed in [59, 60], see also [72] as a general reference

6

to analysis of Boolean functions from the theoretical computer science perspective.Some of the strongest results on noise stability follow [56].

• Arrow-Kalai Theorem was proved in [41]. It is based on the FKN Theorem whichwas proven in [27]. More elegant and general proofs can be found in [38].

• The proof of the Gaussian and general Arrow Theorem follows [55]. Tighter resultswere later obtained in [43]. The results were later generalized further in [62].

• For the proof of the quantitive versions of the manipulation theorem we will follow [36]and [64], though the first version for the case of 3 alternatives was established in [28,26].

• The work on aggregation of binary functions is based on seminal work in the 1980sand 1990s including [8] and [40].

• We will finally discuss more general voter distributions and other models in votingsuch as [34] and [67].

4 A Preview — Noise Stability and the Probability of Para-dox for the Majority Function

As a preview of what’s to come, we will compute the asymptotic probability of a non-transitive outcome in Condorcet setup with 3 alternatives and where voters vote uniformlyat random. Let us denote the Majority function by m : −1, 1n → −1, 1 and assumethat the number of voters n is odd. Paradoxes seem more likely when there is no biastowards a particular candidate so we will consider voters who vote independently andwhere voter i votes uniformly at random from the 6 possible rankings. Recall that weencode the 6 possible rankings by vectors (x, y, z) ∈ −1,+13 \ ±(1, 1, 1). Here x is+1/− 1 if a voter ranks a above/below b, y is +1/− 1 if voter ranks b above/below c, z is+1/− 1 if voter ranks c above/below a.

How do we analyze the probability of a paradox? The following simple fact was usedin [41]: Since the binary predicate ψ : −1, 13 → 0, 1, ψ(a, b, c) = 1(a = b = c) can beexpressed as

ψ(a, b, c) =1

4(1 + ab+ ac+ bc),

we can write

P[m(x) = m(y) = m(z)] =1

4(1 + E[m(x)m(y)] + E[m(x)m(z)] + E[m(y)m(z)]),

which due to symmetry can be written as

P[m(x) = m(y) = m(z)] =1

4(1 + 3E[m(x)m(y)]).

7

Moreover, the uniform distribution over ±13 \ ±(1, 1, 1) satisfies E[xiyi] = E[yizi] =E[zixi] = −1/3 and the n coordinates are independent. As we will see shortly, we callthe the quantity E[m(x)m(y)] is the noise stability of m with noise parameter −1/3. Itsasymptotic value as n→∞ is easy to compute using a 2-dimensional CLT to obtain:

limn→∞

E[m(x)m(y)] = E[sgn(X) sgn(Y )],

where X,Y ∼ N(0,

(1 −1

3−1

3 1

)) and we can see that

E[sgn(X) sgn(Y )] = 2P[sgn(X) = sgn(Y )]− 1 = 1− 2 arccos(−1/3)

π

and

limn→∞

P[m(x) = m(y) = m(z)] = 1− 3 arccos(−1/3)

2π≈ 0.088 .

5 Noise Stability

5.1 Boolean Noise Stability

Consider the following thought experiment: suppose the voters in binary voting obtainindependent uniform signals: xi = + or xi = − with probability 1/2. This is the samesetting as in Condorcet Jury Theorem except the voters are completely uninformed.

Now consider the following process that produces a vector y as a noisy version of x. Foreach i independently: let yi = xi with probability (1 + θ)/2 and yi = −xi with probability(1− θ)/2, where θ ∈ [−1, 1]. We chose the parametrization so that E[xiyi] = θ.

How should we interpret y? A simple interpretation is as a noise process of votingmachines. Suppose that when each voter votes, there is a small probability, say 0.01 that thevoting machine records the opposite vote (independently for all voters and independentlyof the intended vote). In this case θ = 0.98. Given a voting aggregation function f :−1, 1n → −1, 1, ideally we would like the quantity

P[f(x) = f(y)] =1

2(1 + E[f(x)f(y)])

to be as large as possible if θ > 0 and as small as possible if θ < 0. The quantity E[f(x)f(y)]is called the noise stability of f . More generally, following [9] we define:

Definition 5.1. For two functions f, g : −1, 1n → R the (ρ-)noisy inner product of fand g denoted by 〈f, g〉ρ is defined by E[f(x)g(y)], where ((xi, yi) : 1 ≤ i ≤ n) are i.i.d.mean 0 (E[xi] = E[yi] = 0) and ρ-correlated (E[xiyi] = ρ). The noise stability of f is itsnoisy inner product with itself: 〈f, f〉ρ.

We can also write the noisy inner product in terms of the noise operator Tρ,

8

Definition 5.2. The Markov operator Tρ maps functions f : −1, 1n → R to functionsTρf : −1, 1n → R. It is defined by:

(Tρf)(x) = E[f(y)|x].

The noise operator is also known as the Bonami-Beckner operator and plays a key rolein the theory of hyper-contraction [11, 6]. Note that

〈f, g〉ρ = E[f(x)g(y)] = E[fTρg] = E[gTρf ] = 〈Tf, g〉 = 〈f, Tg〉

and that〈f, g〉 := E[f(x)g(x)] = 〈f, g〉1.

Basic properties of this operator can be revealed using its eigenfunctions, i.e., the Fourierbasis. The following proposition is straightforward to prove:

Proposition 5.3. For S ⊂ [n], write xS =∏i∈S xi, so x∅ ≡ 1. Then:

• (xS : S ⊂ [n]) is an orthonormal basis for the space of all functions f : −1, 1n → R.

• xS is an eigenfunction of Tρ which corresponds to the eigenvalue ρ|S|: TρxS = ρ|S|xS.

The following easy result is folklore (see e.g. [61]).

Theorem 5.4. For every ρ > 0, for every n and for every f, g : −1, 1n → −1, 1 withE[f ] = E[g] = 0, it holds that

〈f, g〉ρ ≤ 〈x1, x1〉ρ = ρ.

〈f, g〉−ρ ≥ 〈x1, x1〉−ρ = −ρ.

Moreover, the only optimizers are dictator functions, i.e., functions of the form f(x) =g(x) = xi or f(x) = g(x) = −xi.

Proof. Note that E[f ] = 〈f, 1〉 = 〈f, x∅〉. So if E[f ] = 0, then: f =∑

S 6=∅ f(S)xS , where

f(S) = 〈f, xS〉. Moreover,

Tρf(x) = E[f(y)|x] =∑S

ρ|S|f(S)xS ,

and similarly for g. Therefore:

〈f, g〉ρ = 〈Tρf, g〉 = 〈∑S 6=∅

ρ|S|f(S)xS ,∑S 6=∅

g(S)xS〉 (2)

=∑S 6=∅

ρ|S|f(S)g(S) ≤ ρ√∑S 6=∅

f2(S)∑S 6=∅

g2(S) = ρ (3)

9

where the last inequality uses Cauchy-Schwarz and Parseval’s identity∑S

f2(S) = 〈f, f〉 = 1.

In the case of equality we conclude that f = g must be a linear function, so f =∑

i aixi.Since f is Boolean,

1 ≡ f2 =∑i

a2i +

∑i 6=j

aiajxixj

and therefore it must be the case that f is a dictator. The case where ρ < 1 is similar andis left as an exercise.

The Theorem above allow a quick proof of a version of Arrow’s Theorem by Kalai [41]:

Corollary 5.5. In the context of Arrow Theorem if E[f ] = E[g] = E[h] = 0 and P[f(x) =g(y) = h(z)] = 0 then f, g and h are all the same dictator.

Proof. Use the previous theorem and:

P[f(x) = g(y) = h(z)] =1

4

(1 + 〈f, g〉−1/3 + 〈g, h〉−1/3 + 〈h, f〉−1/3

)

From the voting perspective, there is something a little unnatural about the resultabove. It says that if we want to maximize robustness, then a dictator should decide.From a mathematical perspective, it is disappointing that there is something special aboutE[f ] = 0. In particular the following is open:

Problem 5.6. For a generic ρ > 0, 0 < µ < 1 what is the value of:

limn→∞

max(〈f, f〉ρ : f : −1, 1n → −1, 1,E[f ] = µ

)Here are some additional examples:

• Similar argument to the theorem shows that if ρ > 0 then 〈f, f〉ρ ≥ ρn for f :−1, 1n → −1, 1. The parity function x[n] achieves equality.

• The asymptotic noise stability of Majority is given by Sheppard formula [77], i.e.,E[sgn(N) sgn(M)], where (N,M) are ρ-correlated random variables:

E[sgn(M) sgn(N)] = 2P[sgn(M) = sgn(N)]− 1 = 1− 2 arccos(ρ)

π:= κ(ρ)

In particular if ρ = 1 − ε, then P[f(x) 6= f(y)] is of order√ε. Compare this to a

dictator where it is of order ε.

10

• If we consider n = r2 where r is odd and the function f implements electoral college,i.e.,

f(x1, . . . , xn) = m(m(x1, . . . , xr), . . . ,m(xn−r+1, xn)

),

then it is easy to see that asymptotically the noise stability is given by κ(κ(ρ)). Inparticular if ρ = 1− ε for small ε then P[f(x) 6= f(y)] is of order ε1/4.

• Let mr be majority function on r voters and define m(1)r = mr and by induction:

m(h)r (x1, . . . , xrh) = m(h−1)

r

(mr(x1, . . . , xr), . . . ,mr(xrh−r+1, . . . , xrh)

).

This function is called the recursive majority function.

Exercise 5.7 ([58]). Show that for every ε < 0.5, if r is large enough and nh = rh

and if ρ = 1− n−εh then

limh→∞

E[m(h)r (x)m(h)

r (y)] = 0.

5.2 Gaussian Noise Stability

We will now take a detour of considering analogous quantities defined in Gaussian space.We will later see that this is quite useful in the Boolean setting.

Definition 5.8. In Gaussian space, the (ρ-)noisy inner product of φ : Rn → R andψ : Rn → R denoted by 〈φ, ψ〉ρ is defined as

E[φ(M)ψ(N)],

where ((Mi, Ni))ni=1 are i.i.d. two-dimensional Gaussian vectors, such that Ni,Mi are stan-

dard (mean 0, variance 1) Gaussian random variables and E[NiMi] = ρ. The noise stabilityof φ is its inner product with itself: 〈φ, φ〉ρ.

We will generally use f, g etc. to denote functions over the Boolean cube and φ, ψetc. for functions in L2(Rn, γ). In particular, for µ ∈ [0, 1] we write χµ for the indicator ofthe interval (−∞,Φ−1(µ)) whose Gaussian measure is µ.

We can now state Borell’s [13] noise stability result:

Theorem 5.9. For all n ≥ 1, ρ > 0 and φ, ψ : Rn → [0, 1], it holds that:

〈1− χ1−Eφ, χEψ〉ρ ≤ 〈φ, ψ〉ρ ≤ 〈χEφ, χEψ〉ρ〈1− χ1−Eφ, χEψ〉−ρ ≥ 〈φ, ψ〉−ρ ≥ 〈χEφ, χEψ〉−ρ

Borell [13] was interested in more general functionals of the heat equations and heshowed that these functionals increase with respect to nonincreasing spherical rearrange-ment. The fact that half spaces are the unique optimizers of ρ-noisy inner product wasproven in [57], where a robust version of the theorem is also proven. Tighter robust versionswere later proven by Eldan [22]. Other alternative proofs and generalization of Borell’sresult include [37, 45].

11

−1 −0.5 0.5 1

−1

−0.5

0.5

1

ρ

Stability

Figure 1: The noise stability of dictator and Gaussian half-space of measure 0.5. i.e.,functions ρ and 1− arccos ρ/2π. Note that for every 0 < ρ < 1, the dictator is more stablethan the corresponding half-spaces and for every −1 < ρ < 0, it is less stable than thecorresponding half-space

5.3 Gaussian and Boolean Noise Stability

By applying the CLT, it is easy to check that Gaussian noise stability provides a boundon the Boolean noise stability.

Proposition 5.10. For every ρ ∈ [−1, 1], µ, ν ∈ [0, 1], and for every

s ∈[〈1− χ1−µ, χν〉ρ, 〈χµ, χν〉ρ

],

there exists sequence of Boolean functions fn, gn : −1, 1n → 0, 1 such that E[fn] →µ,E[gn]→ ν and

〈fn, gn〉ρ → s

Moreover, by Theorem 5.9 and Theorem 5.4 it follows that there is a strict inequalityat µ = 1/2. See Figure 5.3.

The proof of the proposition is standard using approximation of Gaussian randomvariables in terms of sums of independent Bernoullis:

Proof. To show that we can obtain the RHS of the interval, let

fn = χµ(n−1/2n∑i=1

xi), gn = χν(n−1/2n∑i=1

xi),

12

and apply the CLT. The proof of achievability of the LHS is identical. To obtain anintermediate point s take the inputs of fn and gn to be defined on overlapping blocks ofbits, e.g.:

fn = χµ(n−1/2

(1+α)n∑i=αn

xi), gn = χν(n−1/2n∑i=1

xi).

5.4 Smooth Boolean Functions

To better understand the connection between Boolean and Gaussian stability, we definetwo notions of smoothness, termed low influences and resilience for Boolean functions. Webegin with the notion of influence, that was first defined in [8] and [40]. We will do sofor general product probability spaces. To simplify notation we will often omit the sigmaalgebra and probability measure defined over a probability space Ω.

Definition 5.11. Consider a probability space Ω. For a function f : Ωn → R, we definethe i’th influence of f as

Ii(f) = E[

Var[f |x1, . . . , xi−1, xi+1, . . . , xn]],

where the expected value is with respect to the product measure on Ωn. In the Booleancase with the uniform measure f : −1, 1n → R, the influence is equivalently defined as

Ii(f) = E[

Var[f |x1, . . . , xi−1, xi+1, . . . , xn]]

=∑S:i∈S

f2(S).

Or asIi(f) = E[|∂if |2],

where (∂if)(x1, . . . , xn) = 0.5(f(x1, . . . , xi−1,+, xi+1, . . . , xn)−f(x1, . . . , xi−1,−, xi+1, . . . , xn))is the discrete i’th directional derivative.

An easy corollary of the definition is that for |ρ| < 1, it holds that Tρf is small in thesense that the sum of its influences is bounded as a function of ρ only.

Lemma 5.12. Let f : −1, 1n → [−1, 1] and |ρ| < 1 then

n∑i=1

Ii(Tρf) ≤ 1/(1− |ρ|).

13

Proof.

n∑i=1

Ii(Tρf) =n∑i=1

∑S:i∈S

Tρf2(S)

=∑S

|S|Tρf2(S) =

∑S

|S|ρ|S|f2(S)

≤ maxk|ρ|kk

∑S

f2(S) ≤ maxk|ρ|kk ≤ 1/(1− |ρ|)

The proof follows.

For a voting function f : −1, 1n → −1, 1, Ii(f) is the probability that voter i isthe deciding voter, given all other votes. A stronger notion of power of a voter or a smallset of voters is that their vote affects the expected outcome on average. A function whoseexpectation is not affected by any small set of voters is called resilient. More formally,

Definition 5.13. We say that a function f : Ωn → R is (r, α)-resilient if∣∣∣E [f |XS = z]− E[f ]∣∣∣ ≤ α. (4)

for all j, all sets S with |S| ≤ r and all z ∈ ΩS .

Proposition 5.14. If f : −1, 1n → R satisfies

max(|f(S)| : 0 < |S| ≤ r) ≤ 2−rα, (5)

then f is (r, α)-resilient. In particular if f has all influences are bounded 4−rα2 then f is(r, α)-resilient.

Proof. The second statement follows from the first one immediately as for every non-emptyS, we may choose i ∈ S and then

f2(S) ≤ Ii(f) ≤ 4−rα2,

as needed. For the first statement, assume (5). Then:∣∣∣E [f |XS = z]− E[f ]∣∣∣ =

∣∣∣E[∑T 6=∅

f(T )zS∩TxT\S ]∣∣∣ =

∣∣∣ ∑∅6=T⊂S

f(T )zT

∣∣∣ ≤ 2|S|2−rα ≤ α.

Resilient functions have long been studied in the context of pseudo-randomness, seee.g. [14].

14

Thus, the statement that a function has a high influence variable means that thereexists a voter i that can have a noticeable effect on the outcome if voter i has access toall other votes cast. The statement that a function is not resilient implies that there is abounded set of voters who have noticeable effect on the outcome on average, i.e., with noaccess to other votes cast. Consider the following examples:

• Dictator has maximal influence of 1 (and all other 0). It is also not resilient forr ≥ 1, α < 1.

• Majority has all influences of order n−1/2 and is also (r,O(r/√n)) resilient.

• An example of a resilient function with a high influence variable is the function

f(x) = x1 sgn(n∑i=2

xi).

Here coordinate 1 has influence 1 but the function is resilient. In terms of voting,voter 1 has a lot of power if she has access to all other votes cast (or the majority ofthe votes) but without access to this information, she is powerless. Moreover, everysmall set of k voters can change the expected value of f (by conditioning on theirvote) by O(kn−1/2). Another simple example is the parity function

∏ni=1 xi, which

is (r, 0) resilient for every r < n, but where all influences are 1.

The Majority is Stablest Theorem states that the extremal noise stability of low influ-ence / resilient functions on the discrete cube is captured by Gaussian noise stability. Hereare three increasingly stronger statements along this line:

Theorem 5.15 ([59, 60]). For every ε > 0, 0 ≤ ρ < 1, there exists a τ > 0 for which thefollowing holds. Let f, g : −1, 1n → [0, 1] satisfy max(Ii(f), Ii(g)) < τ for all i. Then

〈f, g〉ρ ≤ 〈χEf , χEg〉ρ + ε.

This theorem is called Majority Is Stablest since 〈χEf , χEg〉ρ = limn→∞〈fn, gn〉ρ, wherefn(x) = χEf (n−1/2

∑ni=1 xi) and gn(x) = χEg(n

−1/2∑n

i=1 xi).It turns out that for two functions, it is in fact enough that one of them is low influence

to obtain the same results, i.e.:

Theorem 5.16 ([54], Prop 1.15). For every ε > 0 and 0 ≤ ρ < 1, there exists a τ(ρ, ε) > 0for which the following holds. Let f, g : −1, 1n → [0, 1] be such that min(Ii(f), Ii(g)) < τfor all i. Then

〈f, g〉ρ ≤ 〈χEf , χEg〉ρ + ε, (6)

where one can take

τ = εO(

log(1/ε) log(1/(1−ρ))(1−ρ)ε

). (7)

In particular the statement above holds when maxi Ii(f) < τ and g is any Boolean functionbounded between 0 and 1.

15

Moreover, one can replace the low influence condition by the condition that the functionis resilient:

Theorem 5.17 ([56]). For every ε > 0, 0 ≤ ρ < 1, there exist r, α > 0 for which thefollowing holds. Let f : −1, 1n → [0, 1] be (r, α)-resilient and let g : −1, 1n → [0, 1] bean arbitrary function. Then

〈f, g〉ρ ≤ 〈χEf , χEg〉ρ + ε.

One can take

r = O

(1

ε2(1− ρ)τ

), α = O

(ε2−r

), (8)

where τ is given by (7).

Note in particular that for our current bounds for τ and for fixed ρ, r is exponential in apolynomial in 1/ε and α is doubly exponential in a polynomial in 1/ε. Similar statementsfor one function were proven before by [73] and appeared in [39].

6 Proof of Majority is Stablest

The original proof of the Majority is Stablest Theorem used a non-linear invariance prin-ciple to derive the theorem from Borell’s result [59, 60]. We will follow a different proofthat gives an independent proof of Borell’s result. This proof from [19, 20] is based oninduction on dimension in the discrete cube.

First, it will be useful, for ρ ∈ [−1, 1], to define Jρ : (0, 1)2 → [0, 1] by

Jρ(x, y) := 〈χx, χy〉ρ = P[N ≤ Φ−1(x),M ≤ Φ−1(y)],

where N,M are jointly normally distributed random variables with the covariance matrix

Cov(N,M) =

(1 ρρ 1

).

Instead of just proving Borell’s result, we will prove a functional form of the result, firststated and proved in [57]. In [57] it is proved that in the Gaussian setup where φ, ψ : Rn →[0, 1], and N,M are jointly normal random variables with covariance

( In ρInρIn In

), we have

EJρ(φ(N), ψ(M)) ≤ Jρ(Eφ,Eψ). (9)

Note that Jρ(0, x) = 0 for all x and Jρ(1, 1) = 1. Therefore for any two sets A,B:

EJρ(1A(N), 1B(M)) = P[N ∈ A,M ∈ B].

Thus (9) applied to indicator functions, implies Borell’s inequality [13] (Theorem 5.9).

16

Exercise 6.1. Prove that (9) is in fact, equivalent to Theorem 5.9.

To prove Majority is Stablest, we would like to show an analogous inequality on thecube, with f, g : −1, 1n → [0, 1] and X,Y correlated points on −1, 1n. Our mainobservation is that while inequality (9) is not true in this setup, it is true with some extraerror terms on the right hand side. These error terms will diminish in the low influenceand Gaussian case. First, let us define the error term:

Definition 6.2. Let Ω be a probability space and f : Ωn → R. Consider the martingaledefined as

f0 = f, f1 = E[f |X2, . . . , Xn], . . . , fi = E[f |Xi+1, . . . , Xn], . . . , fn = E[f ],

and let

∆m(f) =m∑i=1

E[|fi − fi−1|3],

for m ≤ n.

Note that by orthogonality of martingale increments we know that

n∑i=1

E[|fi − fi−1|2] = Var[f ],

and by Jensen’s inequality, it follows that

Exercise 6.3.E[|fi − fi−1|2] ≤ Ii(f).

We therefore expect ∆n(f) to be small when the differences |fi − fi−1| are typicallysmall, i.e., f is smooth to changing one coordinate at a time in some average sense.

Definition 6.4. Let Ω1 and Ω2 be two sets and µ be a probability measure on Ω1 × Ω2.We say that µ has Renyi correlation at most ρ if for every measurable (w.r.t µ) f : Ω1 → Rand g : Ω2 → R with Eµf = Eµg = 0,

|Eµ[fg]| ≤ ρ√Eµ[f2]Eµ[g2].

For example, suppose that Ω1 = Ω2 and suppose (X,Y ) are generated by the followingprocedure: first choose X according to some distribution ν. Then, with probability ρ, weset Y = X, and with probability 1 − ρ, Y is chosen to be an independent sample fromν. If µ is the distribution of (X,Y ), then it is easy to check that µ has Renyi correlationρ. In particular, in the definition of noisy inner product, we let (x, y) ∈ −1, 12 haveE[y] = E[y] = 0 and E[xy] = ρ and (x, y) have Renyi correlation ρ.

We prove the following general theorem, which we will later use to derive both Borell’sinequality and the “Majority is Stablest” theorem.

17

Theorem 6.5. For any ε > 0 and 0 < ρ < 1, there is C(ρ) > 0 such that the followingholds. Let µ be a measure on Ω1 ×Ω2 with Renyi correlation at most ρ and let (Xi, Yi)

ni=1

be i.i.d. variables with distribution µ. Then for any measurable functions f : Ωn1 → [ε, 1−ε]

and g : Ωn2 → [ε, 1− ε],

EJρ(f(X), g(Y )) ≤ Jρ(Ef,Eg) + C(ρ)ε−C(ρ)(∆n(f) + ∆n(g)).

To see that there isn’t much difference between the case of [0, 1] valued functions and[ε, 1− ε] valued functions, we note that

Proposition 6.6. Consider the setup of Theorem . Let 0 < ε < 0.5 and let

f(x) =

x, ε ≤ x ≤ 1− ε,ε, x ≤ ε,1− ε, x ≥ 1− ε.

and similarly g(x). Then |Ef − E[f ]| ≤ ε and similarly for g. Moreover,∣∣∣EJρ(f(X), g(Y ))− EJρ(f(X), g(Y ))∣∣∣ ≤ 2ε

and Ii(f) ≤ Ii(f) for all i and similarly for g.

Proof. The proof follows from the fact that Jρ is 1− Lip in each of its coordinates.

6.1 The Base Case

We prove Theorem 6.6 by induction on n. In this section, we will prove the base casen = 1:

Claim 6.7. For any ε > 0 and 0 < ρ < 1, there is a C(ρ) such that for any two randomvariables X,Y ∈ [ε, 1− ε] with correlation in [−ρ, ρ],

EJρ(X,Y ) ≤ Jρ(EX,EY ) + C(ρ)ε−C(ρ)(E|X − EX|3 + E|Y − EY |3).

The proof of Claim 6.7 essentially follows from Taylor’s theorem applied to the functionJρ; the crucial point is that Jρ satisfies a certain differential equation. Define the matrixMρσ(x, y) by

Mρσ(x, y) =

(∂2Jρ(x,y)

∂x2σ∂2Jρ(x,y)∂x∂y

σ∂2Jρ(x,y)∂x∂y

∂2Jρ(x,y)∂y2

).

Claim 6.8. For any (x, y) ∈ (0, 1)2 and 0 ≤ |σ| ≤ ρ, Mρσ(x, y) is a negative semidefinitematrix. Likewise, if |σ| ≥ ρ, then Mρσ(x, y) is a positive semidefinite matrix.

18

We will also use the fact that the third derivatives of Jρ are bounded (at least, awayfrom the boundary of [0, 1]2).

Claim 6.9. For any −1 < ρ < 1, there exists C(ρ) > 0 such that for any i, j ≥ 0, i+j = 3,∣∣∣∣∂3Jρ(x, y)

∂xi∂yj

∣∣∣∣ ≤ C(ρ)(xy(1− x)(1− y))−C(ρ).

Further, the function C(ρ) can be chosen so that it is continuous for ρ ∈ (−1, 1).

Claims 6.8 and 6.9 follow from elementary calculus, and we defer their proofs (we notethat Claim 6.8 is implicit in [57]). Now we will use them with Taylor’s theorem to proveClaim 6.7.

Proof of Claim 6.7. Fix ε > 0 and ρ ∈ (0, 1). Now let C(ρ) be large enough so that all thirdderivatives of Jρ are uniformly bounded by C(ρ)ε−C(ρ) on the square [ε, 1−ε]2 (such a C(ρ)exists by Claim 6.9). Taylor’s theorem then implies that for any a, b, a+x, b+y ∈ [ε, 1− ε],

Jρ(a+ x, b+ y) ≤ Jρ(a, b) + x∂Jρ∂x

(a, b) + y∂Jρ∂y

(a, b)

+1

2(x y)

(∂2Jρ∂x2

(a, b)∂2Jρ∂x∂y (a, b)

∂2Jρ∂x∂y (a, b)

∂2Jρ∂y2

(a, b)

)(xy

)+ C(ρ)ε−C(ρ)(|x|3 + |y|3). (10)

Now suppose that X and Y are random variables taking values in [ε, 1 − ε]. If weapply (10) with a = EX, b = EY , x = X − EX, and y = Y − EY , and then takeexpectations of both sides, we obtain

EJρ(X,Y ) ≤ Jρ(EX,EY ) +1

2E

[(X Y )

(∂2Jρ∂x2

(a, b)∂2Jρ∂x∂y (a, b)


∂2Jρ∂y2

(a, b)

)(X

Y

)]+ C(ρ)ε−C(ρ)(E|X|3 + E|Y |3), (11)

where X = X − EX and Y = Y − EY . Now, if X and Y have correlation σ ∈ [0, ρ] then

EXY = σ√

EX2EY 2, and so

E

[(X Y )

(∂2Jρ∂x2

(a, b)∂2Jρ∂x∂y (a, b)


∂2Jρ∂y2

(a, b)

)(X

Y

)]= (σX σY )

(∂2Jρ∂x2

(a, b) σ∂2Jρ∂x∂y (a, b)

σ∂2Jρ∂x∂y (a, b)

∂2Jρ∂y2

(a, b)

)(σXσY

),

where σX =√

EX2 and σY =√

EY 2. By Claim 6.8,

(σX σY )

(∂2Jρ∂x2

(a, b) σ∂2Jρ∂x∂y (a, b)

σ∂2Jρ∂x∂y (a, b)

∂2Jρ∂y2

(a, b)

)(σXσY

)= (σX σY )Mρσ(a, b)

(σXσY

)≤ 0.

Applying this to (11), we obtain

EJρ(X,Y ) ≤ Jρ(EX,EY ) + C(ρ)ε−C(ρ)(E|X|3 + E|Y |3).

19

6.2 The inductive step:

Next, we prove Theorem 6.6 by induction.

Proof of Theorem 6.6. Note that the base case follows from 6.7 since f(X1), g(Y1) havecorrelation between −ρ and ρ by the assumption on the Renyi correlation between the X’sand Y ’s.

We now prove the inductive claim: Assume that the theorem holds with n replaced byn− 1. Consider f : Ωn

1 → [ε, 1− ε] and g : Ωn2 → [ε, 1− ε].

Conditioning on (Xn, Yn) we have

EJρ(f(X), g(Y )) = E[E[Jρ(f(X), g(Y ))|Xn, Yn]].

Applying the inductive hypothesis for n− 1 conditionally on Xn and Yn,

E[Jρ(f(X), g(Y ))|Xn, Yn] ≤ Jρ(E[f |Xn],E[g|Yn])

+ C(ρ)ε−C(ρ)(∆n−1(f) + ∆n−1(g)). (12)

On the other hand, the base case for n = 1 implies that

E[Jρ(E[f |Xn],E[g|Yn])] ≤ Jρ(Ef,Eg) + C(ρ)ε−C(ρ)(∆1(E[f |Xn]) + ∆1(E[g|Yn])). (13)

Taking the expectation of (12) over Xn and Yn and combining it with (13), we obtain

EJρ(f(X), g(Y )) ≤ Jρ(Ef,Eg) + C(ρ)ε−C(ρ)(E[∆n−1(f) + ∆n−1(g)]

)+ C(ρ)ε−C(ρ)

(∆1(E[f |Xn]) + ∆1(E[g|Yn])

).

Finally, note that the definition of ∆n implies that the right-hand side above is just

Jρ(Ef,Eg) + C(ρ)ε−C(ρ)(∆n(f) + ∆n(g)

).

6.3 Proof of Borell’s Theorem

We will now prove the following functional version of Borell’s result. It is easy to see itimplies Theorem 5.9.

Theorem 6.10. Let ρ ≥ 0 and N and M are Gaussian vectors with joint distribution

(N,M) ∼ N(

0,

(Id ρIdρId Id

)).

For any measurable f, g : Rd → [0, 1],

EJρ(f(N), g(M)) ≤ Jρ(Ef,Eg).

20

Proof of Theorem 6.10. Let n = md and define

G1,n =1√m

m∑i=1

xi,

2m∑i=m+1

xi, . . . ,

md∑i=(d−1)m+1

xi

.

Define G2,n similarly by using y instead of x. In other words, G1,n and G2,n are vectorsobtained by averaging the vectors x and y over consecutive blocks of size m. Define Z as

Z = (x1, xm+1, . . . , x(d−1)m+1, y1, ym+1, . . . , y(d−1)m+1).

Observe that (G1,n, G2,n) is distributed as sum of m independent copies of Z scaled by1/√m. Applying the Lindeberg-Feller central limit theorem [24], we obtain (G1,n, G2,n)→D

(N,M) as m→∞.Suppose first that f and g are L-Lipschitz functions taking values in [ε, 1 − ε]. Define

F,G : −1, 1n → R as

F (z) = f

1√m

m∑i=1

zi,2m∑

i=m+1

zi, . . . ,md∑

i=(d−1)m+1

zi

,

and similarly for G using g. By Theorem 6.6,

EJρ(F (x), G(y)) ≤ Jρ(EF,EG) + C(ρ)ε−C(ρ)(∆n(F ) + ∆n(G)). (14)

Since f is L-Lipschitz,

|F (x)− F (x⊕ ej)| ≤2L√m,

for every j, and similarly for g. Therefore:

max(∆n(F ),∆n(G)) ≤ 8L3n

m3/2=

8L3d√m.

Applying this to (14),

EJρ(F (x), G(y)) ≤ Jρ(EF,EG) + C(ρ)ε−C(ρ) 16L3d√m

,

and so the definition of F,G implies

EJρ(f(G1,n), g(G2,n)) ≤ Jρ(Ef(G1,n),Eg(G2,n)) + C(ρ)ε−C(ρ) 16L3d√m

.

Taking m → ∞, the multivariate central limit theorem and using the fact that Jρ isLipschitz implies that

EJρ(f(N), g(M)) ≤ Jρ(Ef(N),Eg(M)). (15)

21

This establishes the theorem for functions f and g which are Lipschitz and take valuesin [ε, 1−ε]. But any measurable f1, f2 : Rd → [0, 1] can be approximated (say in Lp(Rd, γd))by Lipschitz functions with values in [ε, 1−ε]. Since neither the Lipschitz constant nor ε ap-pears in (15), the general statement of the theorem follows from the dominated convergencetheorem.

6.4 Smoothness and hyper-contraction

It turns out that there is an effective bound on ∆n(Tηf) in the case where f has lowinfluences, where Tη is the noise operator and η is close to 1. In this section we show howthe fact that the noise operator Tη is “hyper-contractive” allows to bound ∆n(Tηf) whenf has low influences. In the next section, we will show that the smoothness of J impliesthat only a small error results from replacing f by Tηf for η close to 1.

Hypercontractivity is a key feature of many of the proofs in the analysis of Boolean func-tions starting with the KKL paper [40]. The proof will use the famous hyper-contractivetheorem of Bonami and Beckner:

Theorem 6.11. [11, 6] Let f : −1, 1n → R and 1 ≤ q ≤ p. Then, if ρ2 ≤ q−1p−1 , then:

‖Tρf‖p ≤ ‖f‖q.

We will use this fact to analyze the differences E[|fi − fi−1|3]. In particular we willprove the following:

Lemma 6.12. Let f : −1, 1n → [0, 1] and let 0 < ρ < 1. Then:

• (Tρf)i = Tρ(fi),

•E[|Tρ(fi − fi−1)|3] ≤ Ii(f)p/2,

where p = min(3, 1 + 1/ρ2),

•∆n(Tρ2f) ≤ 1

1− ρmax Ii(f)p/2−1.

Proof. The fact that (Tρf)i = Tρ(fi) follows immediately from the definitions. Let p =min(3, 1 + 1/ρ2). Then we apply Theorem 6.11 to obtain:

E[|Tρ(fi − fi−1)|3] = E[|Tρ(fi − fi−1)|p|Tρ(fi − fi−1)|3−p]

≤ E[|Tρ(fi − fi−1)|p] = ‖Tρ(fi − fi−1)‖pp ≤ ‖fi − fi−1‖p2 ≤ Ip/2i ,

22

and the proof follows of the second statement follows. To prove the third statement observethat:

∆n(Tρ2f) =

n∑i=1

E[|T 2ρ (fi − fi−1)|3]

≤n∑i=1

Ii(Tρf)p/2 ≤ max Ii(Tρf)p/2−1n∑i=1

Ii(Tρf)

≤ 1

1− ρmax Ii(f)p/2−1.

6.5 Proof of Majority is Stablest (Theorem 5.15)

Proof. Fix ε > 0. Our goal is to prove that:

〈f, g〉ρ ≤ 〈χEf , χEg〉ρ + ε = Jρ(Ef,Eg) + ε, (16)

when max(Ii(f), Ii(g)) ≤ τ for a small enough τ(ε).Let ε′ = ε/10. Let f be the rounding of f to the interval [ε′, 1− ε′] and similarly define

g. We first note that it suffices to prove the desired claim 16 for f and g with error ε′.This is because

|〈f, g〉ρ − 〈f , g〉ρ| ≤ 2ε′, |Ef − Ef ′| ≤ ε′, |Eg − Eg′| ≤ ε

and because Jρ(x, y) is 1-Lip in each of the coordinates x and y. Moreover, truncationreduces variance and therefore Ii(f) ≤ Ii(f) for all i and similarly for g. For simplicity ofnotation we redefine ε = ε′ and our goal is now to prove 16 for functions bounded in theinterval [ε, 1− ε].

Again, define ε′ = ε/10 and let ρ = η2ρ′ for η < 1 be chosen so that

Jρ′(Ef,Eg) ≤ Jρ(Ef,Eg) + ε′.

Then it suffices to prove that

〈f, g〉ρ = 〈Tηf, Tηg〉ρ′ ≤ Jρ′(Ef,Eg) + ε′.

and since Jρ′(x, y) ≥ xy it suffices to prove that

E[Jρ′(Tηf, Tηg)] ≤ Jρ′(Ef,Eg) + ε′. (17)

Again we have that Ii(Tηf) ≤ Ii(f) for all i and similarly for g. Renaming again ρ = ρ′

and ε′ = ε and applying Theorem 6.6 what remain to show is that

C(ρ)ε−C(ρ)(∆n(Tηf) + ∆n(Tηg)) ≤ ε.

23

By Lemma 6.12 this can be bounded by:

C(ρ)ε−C(ρ)C(η) max Ii(f)η

Thus for τ small enough, this is less than ε as needed.

6.6 Proof for resilient functions

We now prove Theorem 5.17. First we note that in the inductive proof in subsection 6.2,if we stop k steps before the end we obtain:

Proposition 6.13. For every 1 ≤ k ≤ n, it holds that:

EJρ(f(X), g(Y )) ≤ E[Jρ(fn−k(X), gn−k(Y ))

]+ C(ρ)ε−C(ρ)E

[∆n−k(f) + ∆n−k(g)

],

where fn−k = E[f |Xn−k, . . . , Xk] and gn−k = E[g|Yn−k, . . . , Yn])].

Proof. The proof will imitate the previous proof up to (17). Using the fact that

n∑i=1

Ii(Tηf) ≤ (1− η)−1,

It follows that there are most k = O(τ−1(1 − η)−1) coordinates of f and g with influencemore than τ . Without loss of generality, assume that these are the the last k coordinates.Then apply Proposition 6.13 to conclude that:

E[Jρ′(Tηf, Tηg)] ≤ E[Jρ(Tηfn−k(X), Tηgn−k(Y ))] + κ,

where the error term κ is bounded as before by

C(ρ)ε−C(ρ)C(η) max I1≤i≤n−k(f)η ≤ ε/2.

Now if the function f is (k, ε/100) resilient then so is fn−k and Tηfn−k. Therefore

|fn−k − E[f ]| ≤ ε/100,

and therefore since J is Lipchitz it holds that

E[Jρ(Tηfn−k(X), Tηgn−k(Y ))] ≤ E[Jρ(Ef, Tηgn−k(Y ))] + ε/100 ≤ Jρ(Ef,Eg) + ε/100,

where the ultimate inequality follows from the fact that J is concave in each of the coor-dinates.

24

Note that the only place where we used the resilience of f in showing that

E[Jρ(Tηfn−k(X), Tηgn−k(Y ))] ≤ Jρ(Ef,Eg) + ε/100

More generally:

Exercise 6.14. • Prove that if the W1 distance between the joint distribution of (Tηfn−k(X), Tηgn−k(Y ))and the product of the marginals is bounded by ε then

E[Jρ(Tηfn−k(X), Tηgn−k(Y ))] ≤ Jρ(Ef,Eg) + ε.

• Show that if the W1 distance between the joint distribution of (Tηfn−k(X), Tηgn−k(Y ))and the product of the marginals is smaller than the corresponding distance betweenthe joint distribution of (fn−k(X), gn−k(Y )) and the product of their marginals.

• Show that if f, g : −1, 1k → 0, 1 and the distance between the joint distributionof (f(X), g(Y )) and the product of their marginals is more than ε then:∑

S 6=∅

|f(S)g(S)| ≥ |∑S 6=∅

ρ|S|f(S)g(S)| ≥ ε.

This exercise implies in particular the following Theorem:

Theorem 6.15. For every ε > 0, 0 < ρ < 1, there exists k and α such that if f, g :−1, 1n → 0, 1 satisfy

〈f, g〉ρ > Jρ(Ef,Eg) + ε,

then there exists a set S with |S| ≤ k such that f(S)g(S) ≥ α.

6.7 Cross Influences

We now sketch an alterantive proof of Theorem 5.16 given Theorem 5.15.

Proof.

• We will use the fact that if Ii(g) ≤ τ ′, then there exists a function g′ that doesnot depend on i such that E[|g − g′|2] ≤ τ ′, i.e., g′ = E[g|x−i] and that if we letf ′ = E[f |x−i] then for all ρ: 〈f ′, g′〉ρ = 〈f, g′〉ρ.

• As before, it suffices to prove the theorem for Tγf and Tγg for some γ close to, butless than 1 incurring an error of 0.1ε.

• We know that if all the influences of both f and g are less than some τ then weobtain the right statement with error 0.1ε.

25

• By Lemma 5.12, ∑i

Ii(Tγf) ≤ maxk

kγk ≤ C(γ),

and thus there are at most C(γ)/τ coordinates of f with influence greater than τ .Let us denote this set of coordinates by Af . Let Ag be the corresponding set for g.

• Let us choose τ ′ so that√τ ′C(γ)/τ ≤ 0.1ε. Suppose that min(Ii(f), Ii(g)) ≤ τ ′ for

all i. Note that this implies in particular that Af and Ag are disjoint.

Let f ′ = E[Tγf |x−Ag ] and g′ = E[Tγf |x−Af ]. Then:

|〈f ′, g′〉 − 〈Tγf, Tγg〉| ≤ 0.2ε.

Moreover, max(Ii(f′), Ii(g

′)) ≤ τ so we can apply Theorem 5.15 to conclude.

6.8 Majority is the most stable resilient function

We now prove Theorem 5.17 using different ideas from additive combinatorics. The proofwill use Decision Trees as well as the following standard estimates, see e.g. [60, AppendixB] and subsection 6.10.

Lemma 6.16. Assume ρ < 1 and ρ1 < ρ2 < 1 then

|〈χµ1 , χµ2〉ρ1 − 〈χµ1 , χµ2〉ρ2 | ≤10(ρ2 − ρ1)

1− ρ2

|〈χµ′1 , χµ′2〉ρ − 〈χµ1 , χµ2〉ρ| ≤ |µ1 − µ′1|+ |µ2 − µ′2|

Decision trees allow to express Boolean function by querying variables on at a time,where the order of queries may depend on the values of previous variables. In the contextof low influence functions, decision trees are useful as they allow to express functions ofthe form Tγh as a bounded size decision tree, where almost all the leaves are low influencefunctions. We will use the following regularity lemma, see e.g. [65, 21].

Lemma 6.17. For a function f : −1, 1n → R, let I(f) =∑Ii(f). Then for any

τ > 0, ε > 0 and any function f there exists a decision tree for f of depth at most

d ≤ 2 +I

τε

such that the probability of reaching a leaf with influence sum τ or more is bounded by ε.

26

Proof. The construction of the decision tree is standard. If a function fxI at a certain nodexI has all influences less than τ or if the node is at level d do nothing. Otherwise, conditionon the variable j with the maximum influence in fxI and create two children fyJ and fzKwhere J = K = I ∪ j, yi = zi = xi for all i ∈ I and yj = 0 and zj = 1. Since

I(fxI ) = Ij(fxI ) +1

2(I(fyJ ) + I(fzK )),

it easily follows that if L is the set of leaves of the tree and if D(`) denotes the depth ofleaf ` then

I(f) ≥ τ∑`∈L

2−D(`)D(`).

Therefore if p is the fraction of paths that reach level d then

I(f) ≥ (d− 1)τp =⇒ p ≤ I/(d− 1)τ,

and taking d − 1 to be the smallest integer that is greater or equal to Iτε we obtain the

desired result.

The proof will be carried out via a sequence of reductions which will give the functionf more and more structure. Fix ε > 0 and 0 ≤ ρ′ < 1. Recall that we want to show thatif f is (d(ε, ρ′), α(ε, ρ′)) resilient then for all g bounded between 0 and 1,

〈f, g〉ρ′ ≤ 〈χµf , χµg〉ρ′ + ε, (18)

where µf = Ef and similarly for g.

Lemma 6.18. In order to prove (18) it suffices to prove that

〈Tηf, g〉ρ ≤ 〈χµf , χµg〉ρ + ε/2. (19)

forρ = (1− 0.01ε)ρ′ + 0.01ε, η = ρ′/ρ = 1− Ω(ε(1− ρ′)). (20)

Proof. Write ρ′ = ρη, where 1 − ρ ≥ (1 − ρ′)/2 and η < 1. Note that 〈Tηf, g〉ρ = 〈f, g〉ρ′ .Moreover, f and Tηf have the same expected value. If we could establish (19) and

|〈χµf , χµg〉ρ − 〈χµf , χµg〉ρ′ | < ε/2, (21)

then (18) would follow. Note that (21) follows from Lemma 6.16 when

10(ρ− ρ′)1− ρ

< ε/2.

We may thus choose ρ and η as in (20).

27

Lemma 6.19. Let τ be chosen so that (6) holds with error 0.01ε for ρ. Then it suffices toprove (19) for a function h = Tηf that has a decision tree of depth d and such that for atmost 0.01ε fraction of the inputs a random path of the decision tree terminates at a nodewith some influence greater than τ . Moreover

d = O

(1

ε2(1− ρ)τ

)Proof. We note that the function h = Tηf satisfies:

I(h) :=∑

Ii(h) =∑S

|S|f2(S)η2|S| ≤ maxssη2s = O

(1

ε(1− ρ)

).

Apply Lemma 6.17 to obtain a decision tree for h where for at most 0.01ε fraction of theinputs, a random path of the decision tree terminates at a node with some influence greaterthan τ . Note that the depth of the tree satisfies

d ≤ C(1 +I

τε) ≤ C(1 +

1

ε2(1− ρ)τ)

as needed.

We now conclude the proof of Theorem 5.17.

Proof. Let h be a function such as in Lemma 6.19. Assume furthermore that f is (d, 0.01ε2−d)resilient. Note that this implies that h = Tηf is also (d, 0.01ε2−d) resilient. This is becausefor every S,

E[Tηf |XS = z] = E[E[f |XS = z′]],

where z′ ∼ ⊗i∈S(ηδzi + 1−η2 (δ1 + δ−1)), and therefore:

sup|S|≤d,z

∣∣∣E [Tηf |XS = z]− E[f ]∣∣∣ ≤ sup

|S|≤d,z

∣∣∣E [f |XS = z]− E[f ]∣∣∣.

Let x, y be two ρ-correlated inputs. Then

E[h(x)g(y)] =∑xI ,yI

P[xI ]P[yI |xI ]E[h(x)g(y)|xI , yI ],

where xI denotes a random leaf of the decision tree and yI is chosen after xI to be aρ-correlated version of xI . Let A denote the set of xI for which fxI has all influences lessthan τ . Then:

E[h(x)g(y)] =∑xI ,yI

P[xI ]P[yI |xI ]E[h(x)g(y)|xI , yI ]

≤ 0.01ε+∑xI∈A

P[xI ]∑yI

P[yI |xI ]E[h(x)g(y)|xI , yI ].

28

Write µ′ = µf + 0.01ε and µ(yI) = E[g(y)|yI ]. Note that since h is (d, 0.01ε2−d)-resilientit follows that for all leaves xI it holds that E[h|xI ] ≤ µ′. Thus for xI ∈ A we can apply(6) to obtain that

E[h(x)g(y)|xI , yI ] ≤ 〈χµ′ , χµ(yI)〉ρ + 0.01ε.

Plugging this back in we obtain the bound

E[h(x)g(y)] ≤ 0.02ε+∑xI∈A

P[xI ]∑yI

P[yI |xI ]〈χµ′ , χµ(yI)〉ρ

≤ 0.02ε+∑xI ,yI

P[xI ]P[yI |xI ]〈χµ′ , χµ(yI)〉ρ

= 0.02ε+ 〈χµ′ , (∑xI ,yI

P[xI ]P[yI |xI ]χµ(yI))〉ρ

Note that ψ =∑

xI ,yIP[xI ]P[yI |xI ]χµ(yI) is a [0, 1]-valued function with E[ψ] = E[g]. Thus

by Theorem 5.9 it follows that

0.02ε+ 〈χµ′ , (∑xI ,yI

P[xI ]P[yI |xI ]χµ(yI))〉ρ ≤ 0.02ε+ 〈χµ′ , χµg〉ρ ≤ 0.04ε+ 〈χµf , χµg〉ρ,

where the last inequality follows from Lemma 6.16.

6.9 Majority is Most Predictable

Suppose n voters are to make a binary decision. Assume that the outcome of the vote isdetermined by a social choice function f : −1, 1n → −1, 1, so that the outcome of thevote is f(x1, . . . , xn) where xi ∈ −1, 1 is the vote of voter i. We assume that the votesare independent, each ±1 with probability 1

2 . It is natural to assume that the function fsatisfies f(−x) = −f(x), i.e., it does not discriminate between the two candidates. Notethat this implies that E[f ] = 0 under the uniform distribution. A natural way to try andpredict the outcome of the vote is to sample a subset of the voters, by sampling each voterindependently with probability ρ. Conditioned on a vector X of votes the distribution ofY , the sampled votes, is i.i.d. where Yi = Xi with probability ρ and Yi = ∗ (for unknown)otherwise.

Conditioned on Y = y, the vector of sampled votes, the optimal prediction of theoutcome of the vote is given by sgn((Tf)(y)) where

(Tf)(y) = E[f(X)|Y = y]. (22)

This implies that the probability of correct prediction (also called predictability) is givenby

P[f = sgn(Tf)] =1

2(1 + E[f sgn(Tf)]).

29

For example, when f(x) = x1 is the dictator function, we have E[f sgn(Tf)] = ρ cor-responding to the trivial fact that the outcome of the election is known when voter 1 issampled and are ±1 with probability 1/2 otherwise. The notion of predictability is naturalin statistical contexts. It was also studied in a more combinatorial context in [71].

Similarly to the Majority is Stablest Theorem, we can prove:

Theorem 6.20 (“Majority Is Most Predictable”). Let 0 ≤ ρ ≤ 1 and ε > 0 be given. Thenthere exists a τ > 0 such that if f : −1, 1n → [−1, 1] satisfies E[f ] = 0 and Infi(f) ≤ τfor all i, then

E[f sgn(Tf)] ≤ 2π arcsin

√ρ+ ε, (23)

where T is defined in (22).Similarly, in the same setup, for every ε > 0, there exists (r, δ) such that if f is (r, δ)-

resilient and satisfies E[f ] = 0 then (23) holds.

We note that from the central limit theorem it follows that if Majn(x1, . . . , xn) =sgn(

∑ni=1 xi), then

limn→∞

E[Majn sgn(T Majn)] = 2π arcsin

√ρ.

Remark 6.21. Note that Theorem 6.20 proves a weaker statement than showing thatMajority is the most predictable function. The statement only asserts that if a functionhas low enough influences than its predictability cannot be more than ε larger than theasymptotic predictability value achieved by the majority function when the number ofvoters n→∞. This slightly inaccurate title of the theorem is inline with previous resultssuch as the ”Majority is Stablest Theorem” (see below). Similar language may be usedlater when informally discussing statements of various theorems.

Remark 6.22. One may wonder if for a finite n, among all functions f : −1, 1n →−1, 1 with E[f ] = 0, majority is the most predictable function. Note that the pre-dictability of the dictator function f(x) = x1 is given by ρ, and 2

π arcsin√ρ > ρ for ρ→ 0.

Therefore when ρ is small and n is large the majority function is more predictable than thedictator function. However, note that when ρ→ 1 we have ρ > 2

π arcsin√ρ and therefore

for values of ρ close to 1 and large n the dictator function is more predictable than themajority function.

We note that the bound obtained in Theorem 6.20 is a reminiscent of the Majority isStablest theorem [59, 60] as both involve the arcsin function. However, the two theoremsare quite different. The Majority is Stablest theorem asserts that under the same conditionas in Theorem 6.20 it holds that

E[f(X)f(Y )] ≤ 2π arcsin ρ+ ε.

where (Xi, Yi) ∈ −1, 12 are i.i.d. with E[Xi] = E[Yi] = 0 and E[XiYi] = ρ. Thus“Majority is Stablest” considers two correlated voting vectors, while “Majority is Most

30

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

ρ

ρ

2 arcsin(√ρ)/π

Predictable” considers a sample of one voting vector. In fact, both results follow from themore general invariance principle presented here. We note a further difference betweenstability and predictability: It is well known that in the context of “Majority is Stablest”,for all 0 < ρ < 1, among all boolean functions with E[f ] = 0 the maximum of E[f(x)f(y)]is obtained for dictator functions of the form f(x) = xi. As discussed above, for ρ close to0 and large n, the dictator is less predictable than the majority function.

The proof of Theorem 6.20 follows the same lines of the Majority is Stablest Theorem.The basic space Ω = Ω1×Ω2 where (x, y) ∈ Ω is distributed as follows. First x is distributeduniformly in ±1. Conditioned on x, y = x with probability ρ and is equal to ∗ withprobability 1− ρ. It is easy to check that the that the Renyi correlation between x and yis√ρ.

6.10 Facts regarding Jρ

Here we collect various facts about the function

Jρ(x, y) := 〈χx, χy〉ρ = Pr[X ≤ Φ−1(x), Y ≤ Φ−1(y)],

where (X,Y ) ∼ N (0, ( 1 ρρ 1 )). As is standard, we will use φ to denote the density of the

standard normal distribution. These calculations all follow from elementary calculus.

Claim 6.8. For any (x, y) ∈ (0, 1)2 and 0 ≤ |σ| ≤ ρ, Mρσ(x, y) is a negative semidefinitematrix. Likewise, if |σ| ≥ ρ, then Mρσ(x, y) is a positive semidefinite matrix.

Proof. Towards proving this, note that we can define Y = ρ · X +√

1− ρ2 · Z whereZ ∼ N (0, 1) is an independent normal. Also, let us define Φ−1(x) = s and Φ−1(y) = t.

31

For s, t ∈ R, define Kρ(s, t) as

Kρ(s, t) = PrX,Y

[X ≤ s, Y ≤ t] = PrX,Z

[X ≤ s, Z ≤ (t− ρ ·X)/√

1− ρ2].

Note that for the aforementioned relations between x, y, s and t, Kρ(s, t) = Jρ(x, y). Notethat

Kρ(s, t) =

∫ s

s′=−∞φ(s′)

∫ (t−ρ·s′)/√

1−ρ2

t′=−∞φ(t′)dt′ds′. (24)

This implies that

∂Kρ(s, t)

∂s= φ(s)

∫ (t−ρ·s)/√

1−ρ2

t′=−∞φ(t′)dt′.

By chain rule, we get that∂Jρ(x, y)

∂x=∂Kρ(s, t)

∂s· ∂s∂x.

By elementary calculus, it follows that

dΦ−1(x)

dx=

1

φ(Φ−1(x))⇒ ∂s

∂x=

1

φ(Φ−1(x))=

1

φ(s).

Thus,

∂Jρ(x, y)

∂x=

∫ (t−ρ·s)/√

1−ρ2

t′=−∞φ(t′)dt′.

Thus, we next get that

∂2Jρ(x, y)

∂x2=∂2Jρ(x, y)

∂x∂s· ∂s∂x

= φ

(t− ρ · s√

1− ρ2

)· −ρ√

1− ρ2· 1

φ(s)= φ

(Φ−1(y)− ρ · Φ−1(x)√

1− ρ2

)· −ρ√

1− ρ2· 1

φ(s).

∂2Jρ(x, y)

∂x∂y=∂2Jρ(x, y)

∂x∂t· ∂t∂y

= φ

(Φ−1(y)− ρ · Φ−1(x)√

1− ρ2

)· 1√

1− ρ2· 1

φ(t).

Because we know that (X,Y ) ∼ (Y,X), by symmetry, we can conclude that

∂2Jρ(x, y)

∂y2= φ

(Φ−1(x)− ρ · Φ−1(y)√

1− ρ2

)· −ρ√

1− ρ2· 1

φ(t).

and likewise,

∂2Jρ(x, y)

∂y∂x= φ

(Φ−1(x)− ρ · Φ−1(y)√

1− ρ2

)· 1√

1− ρ2· 1

φ(s).

32

It is obvious now that

∂2Jρ(x, y)

∂x2· ∂

2Jρ(x, y)

∂y2− ρ2

(∂2Jρ(x, y)

∂x∂y

)2

= 0.

Now, suppose that |σ| ≤ |ρ|. Then

det(Mρσ(x, y)) =∂2Jρ(x, y)

∂x2· ∂

2Jρ(x, y)

∂y2− σ2

(∂2Jρ(x, y)

∂x∂y

)2

≥ 0.

If ρ ≥ 0 then the diagonal of Mρσ(x, y) is non-positive, and it follows that Mρσ(x, y) isnegative semidefinite. If ρ ≤ 0 then the diagonal is non-negative and so Mρσ(x, y) ispositive semidefinite.

Claim 6.9. For any −1 < ρ < 1, there exists C(ρ) > 0 such that for any i, j ≥ 0, i+j = 3,∣∣∣∣∂3Jρ(x, y)

∂xi∂yj

∣∣∣∣ ≤ C(ρ)(xy(1− x)(1− y))−C(ρ).

Further, the function C(ρ) can be chosen so that it is continuous for ρ ∈ (−1, 1).

Proof. As before, we set Φ−1(x) = s and Φ−1(y) = t. From the proof of Claim 6.8, we seethat

∂2Jρ(x, y)

∂x2= φ

(Φ−1(y)− ρ · Φ−1(x)√

1− ρ2

)· −ρ√

1− ρ2· 1

φ(s).

To compute the third derivatives of J , recalling that ∂s∂x = 1

φ(s) and ∂t∂y = 1

φ(t) , we have

∂3Jρ(x, y)

∂x3=

ρ

(1− ρ2)3/2

ρt+ (2ρ2 − 1)s

φ(s)exp

(− t2 − 2ρst+ (2ρ2 − 1)s2

2(1− ρ2)

)=

√2πρ

(1− ρ2)3/2(ρt+ (2ρ2 − 1)s) exp

(− t2 − 2ρst+ (3ρ2 − 2)s2

2(1− ρ2)

). (25)

Now, Φ−1(x) ∼√

2 log x as x→ 0; hence there is a constant C such that Φ−1(x) ≤ C√

log xfor all x ≤ 1

2 . Hence, exp(s2) ≤ x−C for all x ≤ 12 ; by symmetry, exp(s2) ≤ (x(1 − x))−C

for all x ∈ (0, 1). Therefore

exp(− t2 − 2ρst+ (3ρ2 − 2)s2

2(1− ρ2)

)= e− t2

2(1−ρ2) eρst

1−ρ2 e(2−3ρ2)s2

2(1−ρ2

≤ e−t2

2(1−ρ2) eρ(s2+t2)

2(1−ρ2) e(2−3ρ2)s2

2(1−ρ2

≤(x(1− x)y(1− y)

)− ρ

2(1−ρ2)(x(1− x)

)− 2−3ρ2

2(1−ρ2) . (26)

33

Further, using exp(s2) ≤ x−C and exp(t2) ≤ y−C , ρt + (2ρ2 − 1)s ≤ 4(xy)−C . As aconsequence, applying this to (25), we see that there is a constant C(ρ) > 0,∣∣∣∣∂3Jρ(x, y)

∂x3

∣∣∣∣ ≤ C(ρ)(x(1− x)y(1− y)

)−C(ρ).

The other third derivatives are similar:

∂3Jρ(x, y)

∂x2∂y=

√2πρ

(1− ρ2)3/2(t− 2ρs) exp

(− (2ρ2 − 1)t2 − 2ρst+ (2ρ2 − 1)s2

2(1− ρ2)

).

By the same steps that led to (26), we get∣∣∣∣∂3Jρ(x, y)

∂x2∂y

∣∣∣∣ ≤ C(ρ)(x(1− x)y(1− y)

)−C(ρ)

(for a slightly different C(ρ)). The bounds on ∂3J/∂y2∂x and ∂3J/∂x3 then follow becauseJ is symmetric in x and y.

The fact that C(ρ) can be chosen so that it is continuous for ρ ∈ (−1, 1) is obviousfrom the discussion above.

Claim 6.23. For any x, y ∈ (0, 1),∣∣∣∣∂Jρ(x, y)

∂ρ

∣∣∣∣ ≤ (1− ρ2)−3/2.

Proof. We begin from (24), but this time we differentiate with respect to ρ:

∂Kρ(s, t)

∂ρ= − 1

(1− ρ2)3/2

∫ s

s′=−∞φ(s′)φ

(t− ρs′√

1− ρ2

)ds′.

Since Range(φ) ⊂ (0, 1] and∫s′ φ(s′)ds′ = 1, it follows that∣∣∣∣∂Kρ(s, t)

∂ρ

∣∣∣∣ ≤ (1− ρ2)−3/2.

Since ∂Jρ(s,t)∂ρ =

∂Kρ(Φ−1(x),Φ−1(y))∂ρ , the proof is complete.

We also state the following useful claim without a proof. The proof is obvious from thecalculations in the proofs of Claim 6.8 and Claim 6.9.

Claim 6.24. For any ρ ∈ (−1, 1), ε > 0 there exists a continuous function γ(ρ, ε) suchthat for any (x, y) ∈ [ε, 1− ε]2 and 1 ≤ i+ j ≤ 3,∣∣∣∣∂i+jJρ(x, y)

∂ix∂jy

∣∣∣∣ ≤ γ(ρ, ε).

34

7 Paradoxes, Noise Stability and Reverse Hyper-Contraction

7.1 Probability of Paradox

Our next goal is to prove a quantitative version of Arrow Theorem following [55]. We willonly discuss the case of 3 alternatives but will allow different function to determine differentpairwise selections. Recall that we consider voters who vote independently and where voteri votes uniformly at random from the 6 possible rankings. Recall that we encode the 6possible rankings by vectors (x, y, z) ∈ −1,+13 \ ±(1, 1, 1). Here x is +1/ − 1 if avoter ranks a above/below b, y is +1/ − 1 if voter ranks b above/below c, z is +1/ − 1 ifvoter ranks c above/below a. We will assume further that f, g, h : −1, 1n → 0, 1 arethe aggregation functions for the a vs. b, b vs. c and c vs. a preferences. We will again usethe following observation used in [41]: Since the binary predicate ψ(a, b, c) = 1(a = b = c)can be expressed as

ψ(a, b, c) =1

4(1 + ab+ ac+ bc− a− b− c),

we can write

P[f(x) = g(y) = h(z)] =1

4

(1 + E[f(x)g(y)] + E[g(y)h(z)] + E[h(z)f(x)]− E[f ]− E[g]− E[h]

)=

1

4

(1 + 〈f, g〉−1/3 + 〈g, h〉−1/3 + 〈h, f〉−1/3 − E[f ]− E[g]− E[h]

)where the last equality follows from the fact that the uniform distribution over ±13 \±(1, 1, 1), satisfies E[xiyi] = −1/3 and similarly for other pairs of coordinates.

To state a quantitive version, we will say that a function f is ε-close to a function g ifP[f 6= g] ≤ ε. The quantitative version we we wish to prove is the following:

Theorem 7.1. For every ε > 0, there exists δ(ε) > 0 such that the following holds forevery n: If

P[f(x) = g(y) = h(z)] < δ,

then either two of the functions f, g, h are ε-close to constant functions of the opposite sign,or there exists a variable i such that f, g and h are all ε-close to the same dictator on voteri.

The main significance of Theorem 7.1 is that it is dimension independent. We get thesame bound no matter what the dimension is. This shows that one cannot avoid the curseof paradoxes in voting by assuming the probability of a paradox vanishes as the numberof voters grow. Our proof of the quantitative version will use the Majority is StablestTheorem along with reverse hyper-contractive inequality which we discuss next. Beforeproving Theorem 7.1 we give a direct implication of the Majority is Stablest Theorem inthe case where the functions f = g = h are all balanced so E[f ] = E[g] = E[h] = 0. UsingMajority is Stablest Theorem we obtain:

35

Theorem 7.2 ([41, 60]). For every ε > 0, there exists a τ > 0 such that if f, g, h :−1, 1n → 0, 1 satisfy E[f ] = E[g] = E[h] = 1/2 and have all influences bounded aboveby τ then:

P[f(x) = g(y) = h(z)] ≥ 3〈χ 12, χ 1

2〉− 1

3− 1

2− ε. (27)

Again, the right hand side of equation (27) is the asymptotic probability that P[f(x) =

g(y) = h(z)] when f = g = h = χ 12(n−

12∑n

i=1 xi) are all given by the same Majorityfunction. Theorem 7.2 provides a surprising counter argument to Condorcet’ arguments.Condorcet argued that pairwise ranking by Majority is problematic as it results in a paradoxand Theorem 7.2 shows that in fact Majority asymptotically minimizes the probability ofa paradox among low influence functions.

We also have the following strengthening of Theorem 7.2.

Theorem 7.3. For every ε > 0, there exist m,β > 0 such that if f, g, h : −1, 1n → 0, 1satisfy E[f ] = E[g] = E[h] = 1/2 and f, g and h are all (m,β)-resilient then

P[f(x) = g(y) = h(z)] ≥ 3〈χ 12, χ 1

2〉− 1

3− 1

2− ε.

7.2 Reverse Hyper-Contraction

Recall that the hyper-contractive theorem 6.11 states that for f : −1, 1n → R and1 ≤ q ≤ p and for any ρ2 ≤ q−1

p−1 , it holds that

‖Tρf‖p ≤ ‖f‖q.

Borell [12] proved a reverse inequality:

Theorem 7.4. [12] Let f, g : −1, 1n → R+ and 1 > p > q. Then, for any 0 ≤ ρ2 ≤ 1−p1−q ,

‖Tρf‖q ≥ ‖f‖p.

and for any 0 ≤ ρ2 ≤ (1− p)(1− q)

〈f, g〉ρ ≥ ‖f‖p‖‖g‖q

While the inequalities may seem like a curiosity, as p and q norms for p, q < 1 are rarelyused (nor are they norms), the second inequality is quite helpful in some social choice proofs.For a more general discussion of reverse-hyper contraction and its applications see [61, 62].

An immediate computational corollary of Theorem 7.4 is the following lemma (as statedin [61]):

36

Lemma 7.5. Let x, y ∈ −1, 1n be distributed uniformly and (xi, yi) are independent.Assume that E[x(i)] = E[y(i)] = 0 for all i and that E[x(i)y(i)] = ρ ≥ 0. Let B1, B2 ⊂−1, 1n be two sets and assume that

P[B1] ≥ e−α2, P[B2] ≥ e−β2

.

Then:

P[x ∈ B1, y ∈ B2] ≥ exp(−α2 + β2 + 2ραβ

1− ρ2).

In particular, if P[B1] ≥ ε and P[B2] ≥ ε, then

P[x ∈ B1, y ∈ B2] ≥ ε2

1−ρ .

We will need to generalize the result above to negative ρ and further to different ρvalues for different bits.

Lemma 7.6. Let x, y ∈ −1, 1n be distributed uniformly and (xi, yi) are independent.Assume that E[x(i)] = E[y(i)] = 0 for all i and that |E[x(i)y(i)]| ≤ ρ. Let B1, B2 ⊂ −1, 1nbe two sets and assume that

P[B1] ≥ e−α2, P[B2] ≥ e−β2

.

Then:

P[x ∈ B1, y ∈ B2] ≥ exp(−α2 + β2 + 2ραβ

1− ρ2).

In particular if P[B1] ≥ ε and P[B2] ≥ ε, then:

P[x ∈ B1, y ∈ B2] ≥ ε2

1−ρ . (28)

Proof. Take z so that (xi, zi) are independent and E[zi] = 0 and E[xizi] = ρ. It is easy tosee that there exists wi independent of x, z s.t. the joint distribution of (x, y) is the sameas (x, z · w), where z · w = (z1w1, . . . , znwn). Now for each fixed w we have that

P[x ∈ B1, z · w ∈ B2] = P[x ∈ B1, z ∈ w ·B2] ≥ exp(−α2 + β2 + 2ραβ

1− ρ2),

where w ·B2 = w · w′ : w′ ∈ B2. Therefore taking expectation over w we obtain:

P[x ∈ B1, y ∈ B2] = E[P[x ∈ B1, z · w ∈ B2 | w]] ≥ exp(−α2 + β2 + 2ραβ

1− ρ2)

as needed. The conclusion (28) follows by simple substitution (note the difference withCorollary 3.5 in [61] for sets of equal size which is a typo).

37

Applying the CLT and using [13] one obtains the same result for Gaussian randomvariables.

Lemma 7.7. Let N,M be N(0, In) with (N(i),M(i))ni=1 independent. Assume that |E[N(i)M(i)]| ≤ρ. Let B1, B2 ⊂ Rn be two sets and assume that

P[B1] ≥ e−α2, P[B2] ≥ e−β2

,

Then:

P[N ∈ B1,M ∈ B2] ≥ exp(−α2 + β2 + 2ραβ

1− ρ2).

In particular if P[B1] ≥ ε and P[B2] ≥ ε, then:

P[N ∈ B1,M ∈ B2] ≥ ε2

1−ρ . (29)

Proof. Fix the values of α and β and assume without loss of generality that maxi |E[N(i)M(i)]|is obtained for i = 1. Then by [13] (see also [54]), the minimum of the quantity P[N ∈B1,M ∈ B2] under the constraints on the measures given by α and β is obtained in onedimension, where B1 and B2 are intervals I1, I2. Look at random variables x(i), y(i),where E[x(i)] = E[y(i)] = 0 and E[x(i)y(i)] = E[M1N1]. Let Xn = n−1/2

∑i=1 x(i) and

Yn = n−1/2∑

i=1 y(i). Then the CLT implies that

P[Xn ∈ I1]→ P[N1 ∈ B1], P[Yn ∈ I2]→ P[M1 ∈ B2],

andP[Xn ∈ I1, Yn ∈ I2]→ P[N1 ∈ B1,M1 ∈ B2].

The proof now follows from the previous lemma.

7.3 The Gaussian Arrow Theorem

The next step is to consider a Gaussian version of the problem. The Gaussian versioncorresponds to a situation where the functions f, g, h can only “see” averages of largesubsets of the voters. We thus define a 3 dimensional normal vector N . The first coordinateof N is supposed to represent the deviation of the number of voters where a ranks aboveb from the mean. The second coordinate is for b ranking above c and the last coordinatefor c ranking above a.

Since averaging maintains the expected value and covariances, we define:

E[N21 ] = E[N2

2 ] = E[N23 ] = 1, E[N1N2] = E[N2N3] = E[N3N1] = −1/3.

We let N(1), . . . , N(n) be independent copies of N . We write N = (N(1), . . . , N(n)) andfor 1 ≤ i ≤ 3 we write Ni = (N(1)i, . . . , N(n)i). The Gaussian version of Arrow theoremstates:

38

Theorem 7.8. For every ε > 0 there exists a δ = δ(ε) > 0 such that the following hold.Let φ1, φ2, φ3 : Rn → −1, 1. Assume that for all 1 ≤ i 6= j ≤ 3 and all u ∈ −1, 1 itholds that

P[φi(Ni) = u] + P[φj(Nj) = −u] ≥ 2ε. (30)

Then with the setup given in (30) it holds that:

P[φ1(N1) = φ2(N2) = φ3(N3)] ≥ δ.

Moreover, one may take δ = (ε/2)18.

We note that if P[φi(Ni) = u] + P[φj(Ni) = −u] ≤ 2ε, then one of the alternatives willbe ranked the top/bottom with probability at least 1 − 2ε. Therefore the theorem statesthat unless this is the case, the probability of a paradox is at least δ. Since the Gaussiansetup excludes dictator functions in terms of the original vote, this result is to be expectedin this case.

Proof. We will consider two cases: either all the functions φi satisfy |Eφi| ≤ 1− ε, or thereexists at least one function with |Eφi| > 1 − ε. Assume first that there exists a functionφi with |Eφi| > 1 − ε. Without loss of generality assume that P[φ1 = 1] > 1 − ε/2. Notethat by (30) it follows that P[φ1 = −1] + P[φ2 = 1] ≥ 2ε and therefore P[φ2 = 1] > ε andsimilarly P[φ3 = 1] > ε. In particular, P[φ1 = 1, φ2 = 1] > ε/2 We now look at the functionψ = 1(φ1 = 1, φ2 = 1). Let

M1 =

√3

2(N1 +N2), M2 =

√3

2√

2(N1 −N2).

Then it is easy to see that M2(i) is uncorrelated with and therefore independent ofN3(i),M1(i) for all i. Moreover, for all i the covariance between M1(i) and N3(i) is1− 1/

√3 and 1− 1/

√3 > 1/3. We may now apply Lemma 7.7 with the vectors

(N3(1), . . . ,N3(n), Z1, . . . , Zn), (M1(1), . . . ,M1(n),M2(1), . . . ,M2(n)),

where Z = (Z1, . . . , Zn) is a normal Gaussian vector independent of anything else. Weobtain:

P[φ1(N1) = 1, φ2(N2) = 1, φ3(N3) = 1] = P[φ3(N3, Z) = 1, ψ(M1,M2) = 1] ≥ ((ε/2))2

1/3 ≥ (ε/2)6.

We next consider the case where all functions satisfy |Eφi| ≤ 1− ε. Notice that at leasttwo of the functions obtain the same value with probability at least 1/2. Let’s assume thatP[φ1 = 1] ≥ 1/2 and P[φ2 = 1] ≥ 1/2. Then by Lemma 7.7 we obtain that

P[φ1 = 1, φ2 = 1] ≥ 1/8.

Again we define ψ = 1(φ1 = 1, φ2 = 1). Since P[φ3 = 1] > ε/2, we may apply Lemma 7.7and obtain that:

P[φ1 = 1, φ2 = 1, φ3 = 1] = P[φ1 = 1, ψ = 1] ≥ (ε/8)6.

This concludes the proof.

39

7.4 Arrow Theorem for Low Influence Functions

Theorem 7.9. For every ε > 0 there exist a δ(ε) > 0 and a τ(δ) > 0 such that thefollowing holds. Let f1, f2, f3 : −1, 1n → −1, 1. Assume that for all 1 ≤ i 6= j ≤ 3 andall u ∈ −1, 1 it holds that

P[fi = u] + P[fj = −u] ≥ 4ε (31)

and for all j it holds that|1 ≤ i ≤ 3 : Ij(fi) > τ| ≤ 1. (32)

Then it holds thatP[f1(x) = f2(y) = f3(z))] ≥ δ.

Moreover, assuming the uniform distribution, one may take:

δ =1

8(ε/2)20, τ = τ(δ),

whereτ(δ) := δC

log(1/δ)δ ,

for some absolute constant C.

Proof. Let φ1, φ2, φ3 : R → −1, 1 be of the form φi = sgn(x − ti), where ti is chosen sothat E[φi] = E[fi] (where the first expected value is according to the Gaussian measure).Let N1, N2, N3 ∼ N(0, 1) be jointly Gaussian with E[NiNi+1] = −1/3. From Theorem 7.8it follows that:

P ([φ1(N1) = φ2(N2) = φ3(N3)] > 8δ, (33)

and from Theorem 5.16, it follows that by choosing C in the definition of τ large enough,we have:

〈f1, f2〉−1/3 ≥ 〈φ1, φ2〉−1/3−δ, 〈f2, f3〉−1/3 ≥ 〈φ2, φ3〉−1/3−δ, 〈f3, f1〉−1/3 ≥ 〈φ3, φ1〉−1/3−δ.

It now follows that:

P[f1(x) = f2(y) = f3(z)] =1

4

(1 + 〈f1, f2〉−1/3 + 〈f2, f3〉−1/3 + 〈f3, f1〉−1/3

)(34)

≥ 1

4

(1 + 〈φ1, φ2〉−1/3 + 〈φ2, φ3〉−1/3 + 〈φ3, φ1〉−1/3

)− 3δ/4

= P[φ1(N1) = φ(N2) = φ3(N3)]− 3δ/4 > 7δ,

as needed.

40

7.5 Arrow Theorem with at most one influential voter

In addition to the case where all the influences are small, we need to consider the casewhere one voter is influential. This includes the case of the dictator function. The theorembelow extends the case of low influences to the case where there is only one influentialvariable.

Theorem 7.10. For every ε > 0 there exists a δ(ε) > 0 and a τ(δ) > 0 such that thefollowing holds for all n. Let f1, f2, f3 : −1, 1n → −1, 1. Assume that for all 1 ≤ i ≤ 3and j > 1 it holds that

Ij(fi) < 0.5τ. (35)

Then eitherP[f1(x) = f2(y) = f3(z)] ≥ δ/6, (36)

or there exist (f1, f2, f3) which are 9ε-close to either a dictator function ±(xi, xi, xi) or afunction (g1, g2, g3) where two of the gi’s are constant and of different signs. Moreover,one may take:

δ = (ε/2)20, τ = τ(δ).

Proof. Consider the functions f bi for 1 ≤ i ≤ 3 and b ∈ −1, 1 defined by

f bi (x2, . . . , xn) = fi(b, x2, . . . , xn).

Note that for all b ∈ −1, 1, for all 1 ≤ i ≤ 3 and for all j > 1 it holds that Ij(fbii ) < τ

and therefore we may apply Theorem 7.9. We obtain that for every b = (b1, b2, b3) /∈(1, 1, 1), (−1,−1,−1) either:

P[f b11 (x) = f b22 (y) = f b33 (z)] ≥ δ, (37)

or there exist a u(b) ∈ −1, 1 and an i = i(b) such that

min(P[f bii = u(b)],P[fbi+1

i+1 = −u(b)]) ≥ 1− 3ε. (38)

Note that if there exists a vector b = (b1, b2, b3) /∈ (1, 1, 1), (−1,−1,−1) for which (37)holds then (36) follows immediately.

It thus remains to consider the case where (38) holds for all 6 vectors b. In this case wewill define new functions gi as follows. We let gi(b, x2, . . . , xn) = u if P[f bi = u] ≥ 1 − 3εfor u ∈ −1, 1 and gi(b, x2, . . . , xn) = fi(b, x2, . . . , xn) otherwise. We let G be the socialchoice function defined by g1, g2 and g3. From (38) it follows that for every b = (b1, b2, b3) /∈(1, 1, 1), (−1,−1,−1) there exist two functions gi, gi+1 and a value u s.t. gi(bi, x2, . . . , xn)is the constant function u and gi+1(bi+1, x2, . . . , xn) is the constant function −u. So

P (g1, g2, g3) = P[(g1, g2, g3) ∈ (1, 1, 1), (−1,−1,−1)] = 0,

and therefore by Theorem 1.3 (g1, g2, g3) is of the required form. It is further easy to seethat P[fi 6= gi] ≤ 3ε for all i, as needed. The proof follows.

41

7.6 Arrow Theorem with two influential voters

Our second application of reverse hyper-contraction is the following:

Lemma 7.11. Suppose that I1(f) > ε and I2(g) > ε. Let

B = ((x, y, z))ni=3 : 1 is pivotal for f and 2 is pivotal for g).

ThenP[B] ≥ ε3.

Proof. LetB1 = ((x, y, z))ni=3 : 1 is pivotal for f(·, ·, x3, . . . , xn),

B2 = ((x, y, z))ni=3 : 2 is pivotal for g(·, ·, y3, . . . , yn).

Then P[B1] ≥ I1(f) > ε and P[B2] ≥ I2(g) > ε, and our goal is to obtain a bound onP[B1 ∩B2]. Note that the event B1 is determined by x and the event B2 is determined byy. So the proof follows immediately from Lemma 7.6 with ρ = −1/3.

We can now prove the main result of the section.

Theorem 7.12. Suppose that there exist voters i and j such that

Ii(f) > ε, Ij(g) > ε.

then P[f(x) = g(y) = h(z)] > 136ε

3.

Proof. Without loss of generality assume that i = 1 and j = 2. Let B = B1 ∩ B2, whereB1 and B2 are the events from Lemma 7.11. By the lemma we have P[B] ≥ ε3. Notethat conditioned on any ((x, y, z)ni=3) ∈ B, the functions f, g and h on coordinates 1 and2 satisfy the condition of Arrow Theorem 1.3. Thus with probability at least 1/36, theoutcome is not transitive. Therefore:

P[f(x) = g(y) = h(z)] ≥ 1

36P[B] ≥ 1

36ε3.

7.7 Proof of the quantitative Arrow Theorem

We are finally ready to prove Theorem 7.1 which we restate below:

Theorem 7.13. For every ε > 0, there exists δ(ε) > 0 such that the following holds forevery n: If

P[f(x) = g(y) = h(z)] < δ,

42

then either two of the functions f, g, h are ε-close to constant functions of the opposite sign,or there exists a variable i such that f, g and h are all ε-close to the same dictator on voteri. Moreover, one can take

δ = exp

(− C

ε21

). (39)

Remark 7.14. We note that the dependency of δ on ε is very far from optimal. The opti-mal dependency of δ = O(ε3) was obtained by Keller [43] who considered the cases wheref, g, h are close to functions that have 0 probability of paradox and applied contractive andhyper-contractive estimates.

Proof. Let η be a small constant to be determined later. We will consider three cases:

• There exist two voters i 6= j ∈ [n] and two different functions say f and g such that

Ii(f) > η, Ij(g) > η. (40)

• For pair of functions k1 6= k2 ∈ f, g, h and every i ∈ [n], it holds that

min(Ii(k1), Ii(k2)) < η. (41)

• There exists a voter j′ such that for all j 6= j′

max(Ij(f), Ij(g), Ij(h)) < η. (42)

First note that each (f, g, h) satisfies at least one of the three conditions (40), (41) or(42). Thus it suffices to prove the theorem for each of the three cases.

In (40), we have by Theorem 7.12 have that

P[f(x) = g(y) = h(z)] >1

36η3.

We thus obtain that P[f(x) = g(y) = h(z)] > δ where δ is given in (39) by taking largervalues C ′ for C.

In case (41), by Theorem 7.9 it follows that either there exists a function (g1, g2, g3)which always puts a candidate at top / bottom and (f1, f2, f3) is ε-close to (g1, g2, g3)(if (31) holds), or P[f(x) = g(y) = h(z)] > Cε20 >> δ.

Similarly in the remaining case (42), we have by Theorem 7.10 that either (f1, f2, f3)is ε-close to (g1, g2, g3) with P[g1(x) = g2(y) = g3(z)] = 0 or P[f(x) = g(y) = h(z)] >Cε20 >> δ. The proof follows.

43

8 More general statements

In this section we discuss a more general statement of Arrow Theorem closer to his originalformulation and its quantitative counterpart. This requires to introduce a number ofadditional definition. The reduction from the more general statements of Arrow Theoremto the 3 candidate case discussed above will be carried out in this section.

8.1 General Setup

Consider A = a, b, . . . , , a set of k ≥ 3 alternatives. A transitive preference over A is aranking of the alternatives from top to bottom where ties are not allowed. Such a rankingcorresponds to a permutation σ of the elements 1, . . . , k where σi is the rank of alternativei. The set of all rankings will be denoted by Sk.

A constitution is a function F that associates to every n-tuple σ = (σ(1), . . . , σ(n)) oftransitive preferences (also called a profile), and every pair of alternatives a, b a preferencebetween a and b. Some key properties of constitutions include:

• Transitivity. The constitution F is transitive if F (σ) is transitive for all σ. In otherwords, for all σ and for all three alternatives a, b and c, if F (σ) prefers a to b, andprefers b to c, it also prefers a to c. Thus F is transitive if and only if its image is asubset of the permutations on k elements.

• Independence of Irrelevant Alternatives (IIA). The constitution F satisfies the IIAproperty if for every pair of alternatives a and b, the social ranking of a vs. b (higheror lower) depends only on their relative rankings by all voters. The IIA conditionimplies that the pairwise preference between any pair of outcomes depends only onthe individual pairwise preferences. Thus, if F satisfies the IIA property then thereexists functions fa>b for every pair of candidates a and b such that

F (σ) = ((fa>b(xa>b) : a, b ∈(k

2

))

• Unanimity. The constitution F satisfies Unanimity if the social outcome ranks aabove b whenever all individuals rank a above b.

• The constitution F is a dictator on voter i, if F (σ) = σ(i), for all σ, or F (σ) = −σ,for all σ, where −σ(i) is the ranking σk(i) > σk−1(i) . . . σ2(i) > σ1(i) by reversingthe ranking σ(i).

Arrow’s theorem states [2, 3] that:

Theorem 8.1. Any constitution on three or more alternatives which satisfies Transitivity,IIA and Unanimity is a dictatorship.

44

It is possible to give a characterization of all constitutions satisfying IIA and Transi-tivity. Results of Wilson [80] provide a partial characterization for the case where votersare allowed to rank some alternatives as equal. In order to obtain a quantitative versionof Arrow theorem, we give an explicit and complete characterization of all constitutionssatisfying IIA and Transitivity in the case where all voters vote using a strict preferenceorder. Write Fk(n) for the set of all constitutions on k alternatives and n voters satisfyingIIA and Transitivity. For the characterization it is useful write A >F B for the statementthat for all σ it holds that F (σ) ranks all alternatives in A above all alternatives in B. Wewill further write FA for the constitution F restricted to the alternatives in A. The IIAcondition implies that FA depends only on the individual rankings of the alternatives inthe set A. The characterization of Fk(n) we prove is the following.

Theorem 8.2. The class Fk(n) consist exactly of all constitutions F satisfying the fol-lowing: There exist a partition of the set of alternatives into disjoint sets A1, . . . , Ar suchthat:

•A1 >F A2 >F . . . >F Ar,

• For all As s.t. |As| ≥ 3, there exists a voter j such that FAs is a dictator on voter j.

• For all As such that |As| = 2, the constitution FAs is an arbitrary non-constantfunction of the preferences on the alternatives in As.

We note that for all k ≥ 3 all elements of Fk(n) are not desirable as constitutions.Indeed elements of Fk(n) either have dictators whose vote is followed with respect to someof the alternatives, or they always rank some alternatives on top some other. For a relateddiscussion see [80]. The statement above follows easily from Theorem 3 in [80]. The exactformulation is taken from [53].

The main goal of the current section is to provide a quantitative version of Theorem 8.2assuming voters vote independently and uniformly at random. Note that Theorem 8.2above implies that if F 6∈ Fk(n) then P (F ) ≥ (k!)−n. However if n is large and theprobability of a non-transitive outcome is indeed as small as (k!)−n, one may argue that anon-transitive outcome is so unlikely that in practice Arrow’s theorem does not hold.

The main goal of this section is to prove the following statement:

Theorem 8.3. For every number of alternatives k ≥ 1 and 0.01 > ε > 0, there exists aδ = δ(ε), such that for every n ≥ 1, if F is a constitution on n voters and k alternativessatisfying:

• IIA and

• P (F ) < δ,

45

then there exists G ∈ Fk(n) satisfying D(F,G) < k2ε. Moreover, one may take:

δ = exp

(− C

ε21

), (43)

for some absolute constant 0 < C <∞.

We therefore obtain the following:

Corollary 8.4. For any number of alternatives k ≥ 3 and ε > 0, there exists a δ = δ(ε),such that for every n, if F is a constitution on n voters and k alternatives satisfying:

• IIA and

• F is k2ε far from any dictator, so D(F,G) > k2ε for any dictator G,

• For every pair of alternatives a and b, the probability that F ranks a above b is atleast k2ε,

then the probability of a non-transitive outcome, P (F ), is at least δ, where δ(ε) may betaken as in (43).

Proof. Assume by contradiction that P (F ) < δ. Then by Theorem 8.3 there exists afunction G ∈ Fn,k satisfying D(F,G) < k2ε. Note that for every pair of alternatives a andb it holds that:

P[G ranks a above b] ≥ P[F ranks a above b]−D(F,G) > 0.

Therefore for every pair of alternatives there is a positive probability that G ranks a aboveb. Thus by Theorem 8.3 it follows that G is a dictator which is a contradiction.

Remark 8.5. Note that if G ∈ Fk(n) and F is any constitution satisfying D(F,G) < k2εthen P (F ) < k2ε.

Remark 8.6. The bounds stated in Theorem 8.3 and Corollary 8.4 in terms of k and εis clearly not an optimal one. We expect that the true dependency has δ which is somefixed power of ε. Moreover we expect that the bound D(F,G) < k2ε should be improvedto D(F,G) < ε.

8.1.1 Nisan’s argument

N. Nisan argued in his blog [70] that the natural way to study quantitative versions ofArrow’s theorem is to look at functions from Snk to Sk and check to what extent do theysatisfy the IIA property. He defines a function to be η-IIA if for every two alternatives aand b it holds that P[F (σ) 6= F(τ)] ≤ η, where σ is uniformly chosen and τ is uniformlychosen conditioned on the a, b ranking at τ being identical to that of σ for all voters. In his

46

blog Nisan sketches how a quantitative Arrow’s theorem proven for the definition used hereimplies a quantitative Arrow’s theorem for his definition. We briefly repeat the argumentwith some minor modifications and corrections.

Fixing alternatives a, b and writing pa,b : 0, 1n → 0, 1 for the probability that givena vector of n binary preferences between a and b, F ranks a above b. If F satisfies theIIA property then pa,b ∈ 0, 1 a.s. If F is η-IIA then E[2pa,b(1− pa,b)] ≤ η, and thereforeE[min(pa,b, 1− pa,b)] ≤ η.

Assume a quantitative Arrow theorem such as the one proven here with parameters ε, δand suppose by contradiction that F : Snk → Sk is η-IIA and ε far from Fk(n) for somesmall η to be determined later. Define a function G as follows. Let G(σ) rank a above bif for the majority of τ which agree with σ in the a, b orderings it holds that F (τ) ranks aabove b. We note that for every pair of alternatives a, b it holds that

P[F (σ), G(σ) have different order on a, b] = E[min(pa,b, 1− pa,b)] ≤ η.

By taking a union bound on all pairs of alternatives, this implies that D(F,G) ≤(k2

)η ≤

k2η/2. Note further that G satisfies the IIA property by definition. Since F is transitiveand from the quantitative Arrow theorem proven here we conclude that

D(F,G) ≥ P[P (G)] ≥ δ.

and a contradiction is implied unless k2η/2 ≥ δ. Thus the Arrow theorem for the η-IIAdefinition holds with η(ε) = 2δ/k2.(We briefly note that moving from F to G does notpreserve the property of the function being balanced so in the setting of Kalai’s theoreman additional argument is needed)

8.2 Proof of Theorem 8.3

Proof. The proof follows by applying Theorem 7.13 to triplets of alternatives. AssumeP (F ) < δ(ε).

Note that if g1, g2 : −1, 1n → −1, 1 are two different function each of which iseither a dictator or a constant function than D(g1, g2) ≥ 1/2. Therefore for all a, b it holdsthat D(fa>b, g) < ε/10 for at most one function g which is either a dictator or a constantfunction. In case there exists such function we let ga>b = g, otherwise, we let ga>b = fa>b.

Let G be the social choice function defined by the functions ga>b. Clearly:

D(F,G) <

(k

2

)ε < k2ε.

The proof would follow if we could show P (G) = 0 and therefore G ∈ Fk(n).To prove that G ∈ Fk(n) is suffices to show that for every set A of three alternatives,

it holds that GA ∈ F3(n). Since P (F ) < δ implies P (FA) < δ, Theorem 7.13 implies thatthere exists a function HA ∈ F3(n) s.t. D(HA, FA) < ε. There are two cases to consider:

47

• HA is a dictator. This implies that fa>b is ε close to a dictator for each a, b andtherefore fa>b = ga>b for all pairs a, b, so GA = HA ∈ F3(n).

• There exists an alternative (say a) that HA always ranks at the top/bottom. In thiscase we have that fa>b and f c>a are at most ε far from the constant functions 1and −1 (or −1 and 1). The functions ga>b and gc>a have to take the same constantvalues and therefore again we have that GA ∈ F3(n).

The proof follows.

Remark 8.7. Note that this proof is generic in the sense that it takes the quantitativeArrow’s result for 3 alternatives as a black box and produces a quantitative Arrow resultfor any k ≥ 3 alternatives.

8.3 Other probability measures

Given the right analytic tools, it is not hard to generalize the proof of Theorem 7.13 andTheorem 8.3 to other product distribution. This is done in [62] where some of the toolsrelated to reverse-hyper-contraction are developed. [62] obtains the following extension.

Theorem 8.8 (Quantitative Arrow’s theorem for general distribution). Let % be generaldistribution on Sk with % assigning positive probability to each element of Sk. Let P denotethe distribution %⊗n on Snk . Then for any number of alternatives k ≥ 3 and ε > 0, there

exists δ = δ(ε, ρ) > 0, such that for every n, if F : Snk → −1, 1(k2) satisfies

• IIA and

• PF (σ) is transitive ≥ 1− δ.

Then there exists a function G which is transitive and satisfies the IIA property andPF (σ) 6= G(σ) ≤ ε

9 Kalai’s proof for the balanced case

The special case, where all the functions are balanced was the first where a quantitativeArrow Theorem was proved by Kalai [41]. In this short section we provide the statementand the proof of this special case.

Theorem 9.1. There exists a constant C such that if f, g, h : −1, 1n → −1, 1 satisfyE[f ] = E[g] = E[h] = 0 and

P[f(x) = g(y) = h(z)] ≤ ε,then there exists a dictator function d, such that

P[f(x) 6= d(x)] ≤ Cε, P[g(y) 6= d(y)] ≤ Cε, P[h(z) 6= d(z)] ≤ Cε.

48

The proof of the Theorem will use the FKN Theorem [27] which will be proved at theend of the section.

Theorem 9.2. There exists a constant C such that if f : −1, 1n → −1, 1 satisfiesE[f ] = 0 and ∑

S:|S|=1

f2(S) = 1− ε, (44)

Then there exists a dictator a d such that P[f(x) 6= d(x)] ≤ Cε.

We now prove Theorem 9.1.

Proof. Recalling that

P[f(x) = g(y) = h(z)] =1

4

(1 + 〈f, g〉−1/3 + 〈g, h〉−1/3 + 〈h, f〉−1/3

)and Theorem 5.4, it is clear that to prove Theorem 9.1 it suffices to prove that: there existsa constant C such that if f, g : −1, 1n → −1, 1 satisfy E[f ] = E[g] = 0 and

〈f, g〉−1/3 ≤ −1

3+ ε (45)

then there exists a dictator d, such that

P[f(x) 6= d(x)] ≤ Cε, P[g(y) 6= d(y)] ≤ Cε. (46)

Note that

〈f, g〉−1/3 ≥ (−1/3)∑

S:|S|=1

f(S)g(S) + (−1/9)∑

S:|S|>1

|f(S)g(S)| ≥ −1

3γ − 1

9(1− γ),

where γ =∑

S:|S|=1 |f(S)g(S)|. Thus if (45) holds then∑S:|S|=1

|f(S)g(S)| ≥ 1− ε. (47)

It therefore suffices to show that if (47) holds then (46) holds. By CS,∑S:|S|=1

f2(S) ≥ (1− ε)2 ≥ 1− 2ε,

Therefore by Theorem FKN, f is CFKNε close to a dictator d1. Similarly g is is CFKNεclose to a dictator d2. Note that the statement of the theorem is trivial if C ≥ 8 andε ≤ 1/8, so assume ε ≤ 1/8. It remains to show that d1 = d2. Note that if d1 6= d2 then

P[f(x) 6= g(x)] ≥ P[d1(x) 6= d2(x)]− P[d1(x) 6= f(x)]− P[d2(x) 6= g(x)] ≥ 1

2− 2

1

8= 1/4.

49

However, by (47) and using the fact that∑

S |f(S)g(S)| ≤ 1

E[f(x)g(x)] =∑S

f(S)g(S) ≥ 1− 2ε,

and therefore P[f(x) 6= g(x)] ≤ ε. The proof follows.

We will now prove the FKN Theorem. The proof is similar to the proofs in refsO’donnell and K.’s proof We will use the following corollary of hyper-contraction.

Lemma 9.3. Let q(x) =∑

i<j qi,jxixj. Then

E[q4] ≤ 81E[q2]2.

Proof. By hyper-contraction if η2 ≤ 1/3 then,

‖Tηq‖4 ≤ ‖q‖2 =⇒ E[(Tηq)4] ≤ E[q2]2.

On the other hand,E[(Tηq)

4] = E[(∑i<j

η2qi,jxixj)4] = η8E[q4],

So by choosing η = 1/√

3, we get

E[q4] ≤ η−8E[q2]2

The proof will also use the Paley-Zygmund inequality stating that for a positive randomvariable Z and for 0 ≤ θ ≤ 1 it holds that:

P[Z ≥ θE[Z]] ≥ (1− θ)2E[Z]2

E[Z2]

Exercise 9.4. Prove this using CS.

Applying this inequality for Z = q2 and using Lemma 9.3 implies that

Corollary 9.5. For 0 ≤ θ ≤ 1,

P[q2 ≥ θE[q2]] ≥ (1− θ)2

81. (48)

50

Proof. Let

`(x) =∑|S|≤1

f(S)xS , h(x) =∑|S|>1

f(S)xS , q(x) = 2∑i<j

f(i)f(j)xixj ,

so that`2(x) =

∑i

f2(i) + q(x) = 1− ε+ q(x).

Note that f2 = 1 implies that `2 + h(2f − h) = 1. Moreover using the fact that |f | = 1,for all c ≥ 1 and sufficiently small ε,

P[h(2h− f) ≥ 2c√ε] ≤ P[|h|(2|h|+ 1) ≥ 2c

√ε] ≤ P[|h(x)| ≥ c

√ε] ≤ E[h2]

c2ε≤ 1

c2.

Therefore,

P[|q(x)| ≥ (2c+1)√ε] = P[|`2−1+ε| ≥ (2c+1)

√ε] ≤ P[|h(2f−h)|+ε ≥ (2c+1)

√ε] ≤ 1

c2.

In particular for c = 10, we obtain:

P[q2(x) ≥ 500ε] ≤ 1

100

On the other hand, applying (48) with θ = 0.05, implies that

P[q2 ≥ E[q2]/20] >1

100,

and therefore E[q2] ≤ Cε, with C = 10000. Now:

Cε ≥ E[q2] = 4∑i<j

f2(i)f2(j) = 2

(∑i

f2(i)

)2

− 2∑i

f4(i) = 2(1− ε)2 − 2∑i

f4(i)

So we obtain that

maxif2(i) ≥

∑i

f4(i) ≥ (1− ε)2 − Cε = 1−O(ε),

as needed.

51

10 Low Influence Optimality for k ≥ 3 alternatives

When we are considering k ≥ 3 alternatives, we want to define more formally the possibleoutcome in Arrow voting. Since for every two alternatives, a winner is decided, the aggre-gation results in a tournament Gk on the set [k]. Recall that Gk is a tournament on [k] ifit is a directed graph on the vertex set [k] such that for all a, b ∈ [k] either (a > b) ∈ Gk or(b > a) ∈ Gk. Given individual rankings (σi)

ni=1 the tournament Gk is defined as follows.

Let xa>b(i) = 1, if σi(a) > σi(b), and xa>b(i) = −1 if σi(a) < σi(b). Note thatxb>a = −xa>b.

The binary decision between each pair of candidates is performed via a anti-symmetricfunction f : −1, 1n → −1, 1 so that f(−x) = −f(x) for all x ∈ −1, 1. The tourna-ment Gk = Gk(σ; f) is then defined by letting (a > b) ∈ Gk if and only if f(xa>b) = 1.

Note that there are 2(k2)tournaments while there are only k! = 2Θ(k log k) linear rankings.For the purposes of social choice, some tournaments make more sense than others.

Definition 10.1. We say that a tournament Gk is linear if it is acyclic. We will writeAcyc(Gk) for the logical statement that Gk is acyclic. Non-linear tournaments are oftenreferred to as non-rational in economics as they represent an order where there are 3 can-didates a, b and c such that a is preferred to b, b is preferred to c and c is preferred to a.

We say that the tournament Gk is a unique max tournament if there is a candidatea ∈ [k] such that for all b 6= a it holds that (a > b) ∈ Gk. We write UniqueBest(Gk) forthe logical statement that Gk has a unique max. Note that the unique max property isweaker than linearity. It corresponds to the fact that there is a candidate that dominatesall other candidates.

A generalization of Borell’s result along with a general invariance principle [52, 54]allows to prove the following [37].

Theorem 10.2. For any k ≥ 1 and ε > 0 there exists a τ(ε, k) > 0 such that for anyanti-symmetric f : −1, 1n → 0, 1 satisfying maxi Infi f ≤ τ ,

P[UniqueBestk(f)] ≤ limn→∞

P[UniqueBestk(Majn)] + ε (49)

An alternative proof can be derived via multi-dimensional generalization of the induc-tive Majority is Stablest Theorem using a general notion of ρ-concavity which generalizesClaim 6.8 [66, 46].

It is not hard to show that

P[UniqueBestk(Majn)] = k−1+o(1). (50)

52

Exercise 10.3. Prove (50) by using a multi-dimensional CLT and representing the advan-tage of candidate 1 over candidate i as√

1

3X +

√2

3Zi,

where X,Z2, . . . , Zk are i.i.d. N(0, 1) random variables.

Other than the case k = 3, where the notions of unique-max and linear tournamentscoincide, very little is known about which function maximizes the probability of a linearorder. Even computing this probability for Majority provides a surprising result: Interest-ingly, we are not able to derive similar results for Acyc. We do calculate the probabilitythat Acyc holds for majority.

Proposition 10.4. We have

limn→∞

P[Acyc(Gk(σ; Majn))] = exp(−Θ(k5/3)). (51)

We find this asymptotic behavior quite surprising. Indeed, given the previous resultsthat the probability that there is a unique max is k−1+o(1), one may expect that theprobability that the order is linear would be

k−1+o(1)(k − 1)−1+o(1) . . . = (k!)−1+o(1).

However, it turns out that there is a strong negative correlation between the event thatthere is a unique maximum among the k candidates and that among the other candidatesthere is a unique max. We note that results in economics [7] have shown that for majorityvote the probability that the outcome will contain a Hamiltonian cycle when the numberof voters goes to infinity is 1− ok(1).

Proof. We use the multi-dimensional CLT. Let

Xa>b =1√n

(|σ : σ(a) > σ(b)| − |σ : σ(b) > σ(a)|)

By the CLT at the limit the collection of variables (Xa>b)a6=b converges to a joint Gaussianvector (Na>b)a6=b satisfying for all distinct a, b, c, d:

Na>b = −Nb>a, Cov[Na>b, Na>c] =1

3, Cov[Na>b, Nc>d] = 0.

and Na>b ∼ N(0, 1) for all a and b.We are interested in providing bounds on

P [∀a > b : Na>b > 0]

53

as the probability that the resulting tournament is an order is obtained by multiplying bya k! = exp(Θ(k log k)) factor.

We claim that there exist independent N(0, 1) random variables Xa for 1 ≤ a ≤ k andZa>b for 1 ≤ a 6= b ≤ k such that

Na>b =1√3

(Xa −Xb + Za>b)

(where Za>b = −Zb>a). This follows from the fact that the joint distribution of Gaussianrandom variables is determined by the covariance matrix (this is noted in the literaturein [69]).

We now prove the upper bound. Let α be a constant to be chosen later. Note that forall α and large enough k it holds that:

P [|Xa| > kα] ≤ exp(−Ω(k2α)).

Therefore the probability that for at least half of the a’s in the interval [k/2, k] it holdsthat |Xa| > kα is at most

exp(−Θ(k1+2α)).

Let’s assume that at least half of the a’s in the interval [k/2, k] satisfy that |Xa| < kα.We claim that in this case the number Hk/4[−kα, kα] of pairs a > b such that Xa, Xb ∈−[kα, kα] and Xa −Xb < 1 is Ω(k2−α).

For the last claim partition the interval [−kα, kα] into sub-intervals of length 1 andnote that at least Ω(k) of the points belong to sub-intervals which contain at least Ω(k1−α)points. This implies that the number of pairs a > b satisfying |Xa −Xb| < 1 is Ω(k2−α).

Note that for such pair a > b in order that Na>b > 0 we need that Za>b > −1 whichhappens with constant probability.

We conclude that given that half of the X’s fall in [−kα, kα] the probability of a linearorder is bounded by

exp(−Ω(k2−α)).

Thus overall we have bounded the probability by

exp(−Ω(k1+2α)) + exp(−Ω(k2−α)).

The optimal exponent is α = 1/3 giving the desired upper bound.For the lower bound we condition on Xa taking value in (a, a+1)k−2/3. Each probability

is at least exp(−O(k2/3)) and therefore the probability that all Xa take such values is

exp(−O(k5/3)).

Moreover, conditioned on Xa taking such values the probability that

Za>b > Xb −Xa,

54

for all a > b is at least(k−1∏i=0

Φ(i)k2/3

)k≥

( ∞∏i=0

Φ(i)

)k5/3= exp(−O(k5/3)).

This proves the required result.

55

11 Manipulation, Isoperimetry and Concentration of Mea-sure

11.1 Quantitive Manipulation and Isoperimetry

The GS Theorem, Theorem 1.4, states that under natural conditions, there exist profilesof voters such that at least one voter can manipulate. The basic approach for the proofis to view the theorem as an isoperimetric theorem. In classical Isoperimetric theory, thegoal is to find conditions that establish large boundary between sets. In the context ofmanipulation we can consider a voter who can manipulate as a special boundary point andour goal is to prove that there are many boundary points.

It is natural to consider the following graph where the vertex set is Snk — the set ofall voting profiles and there are edges between voting profiles that differ at a single voter.The statement of the GS theorem can be interpreted in terms of this graph: for certainnatural partitions of Snk into k parts there is an edge of the graph between two differentparts that corresponds to a manipulation. We will see that the existence of many edgesbetween different parts of the graph follows from classical Isoperimetric theory.

Thus one may consider quantitative statements of the GS theorem as isoperimetricstatements: It is not only the case that there are many edges between different parts ofthe partition, but it is also the case that many of these edges correspond to manipulationby one of the voters.

In the classical setup, isoperimetry and concentration of measure are closely related.In particular, as we will see below, for any set of fractional size at least ε in Snk , the set ofprofiles at graph distance at most C(ε)

√n contains almost the whole graph. One would

hope that this statement implies that typically a small coalition can manipulate, but thisis not known.

It may be useful to consider the following examples:

• Consider the plurality function with q ≥ 3 alternatives and n voters. For a voter tobe able to manipulate, it has to be that the difference between the top candidate andthe second to top is at most 1. This implies that the probability that there exists avoter who can manipulate is O(1/

√n).

A second question we can ask is what is the minimal size s of a coalition S ⊂ [n]that can manipulate with probability close to 1. Since the difference between the topcandidate and the second candidate is typically of order

√n, it is clear that the size

has to be at least order√n and in fact it is easy to see that if s/

√n → ∞ that for

every set S of size s, with probability 1− o(1) there exists a subset T ⊂ S that canchange the outcome of the elections by manipulating. Indeed if a, b is the secondfrom the top and c is a candidate different than a and b, we may take T to be all thevoters in S that rank c above a above b.

• Consider the case q = 2 and the function g(x) = −f(x) where f is the tribes function.

56

In this case a voter can manipulate if and only if they are influential. Therefore wecannot escape that the probability that an individual voter can manipulate is higherthan O(log n/n).

11.2 A quantitive GS Theorem

Our goal in this section is to prove a quantitative version of the manipulation Theorem.We will mostly follow [63, 64] who proved a pretty general average manipulation theoremfor a single voter. Some special cases of the Theorem were known before, in particular inthe case of 3 alternatives this was proved by Friedgut, Kalai, Keller and Nisan [28, 26].

In this section we will prove that: if k ≥ 3 and the SCF f is ε-far from the family ofnonmanipulable functions, then the probability of a ranking profile being manipulable isbounded from below by a polynomial in 1/n, 1/k, and ε. We continue by first presentingour results, then discussing their implications, and finally we conclude this section bycommenting on the techniques used in the proof.

11.3 Definition and Statement

Recall that our basic setup consists of n voters electing a winner among k alternatives viaAn SCF f : Snk → [k]. We now define manipulability in more detail:

Definition 11.1 (Manipulation points). Let σ ∈ Snk be a ranking profile. Write aσi> b to

denote that alternative a is preferred over b by voter i. A SCF f : Snk → [k] is manipulableat the ranking profile σ ∈ Snk if there exists a σ′ ∈ Snk and an i ∈ [n] such that σ and σ′

only differ in the ith coordinate and

f(σ′)σi> f(σ).

In this case we also say that σ is a manipulation point of f , and that (σ, σ′) is a manipulationpair for f . We say that f is manipulable if it is manipulable at some point σ. We also saythat σ is an r-manipulation point of f if f has a manipulation pair (σ, σ′) such that σ′ isobtained from σ by permuting (at most) r adjacent alternatives in one of the coordinatesof σ. (We allow r > k—any manipulation point is an r-manipulation point for r > k.)

Let M (f) denote the set of manipulation points of the SCF f , and for a given r, letMr (f) denote the set of r-manipulation points of f . When the SCF is obvious from thecontext, we write simply M and Mr.

We first recall Gibbard and Satterthwaite theorem (stated as Theorem 1.4 in the in-troduction):

Theorem 11.2 (Gibbard-Satterthwaite [31, 76]). Any SCF f : Snk → [k] which takes atleast three values and is not a dictator (i.e., not a function of only one voter) is manipulable.

57

This theorem is tight in the sense that monotone SCFs which are dictators or onlyhave two possible outcomes are indeed nonmanipulable (a function is non-monotone, andclearly manipulable, if for some ranking profile a voter can change the outcome from, say,a to b by moving a ahead of b in her preference). It is useful to introduce a refined notionof a dictator before defining the set of nonmanipulable SCFs.

Definition 11.3 (Dictator on a subset). For a subset of alternatives H ⊆ [k], let topH bethe SCF on one voter whose output is always the top ranked alternative among those inH.

Definition 11.4 (Nonmanipulable SCFs). We denote by NONMANIP ≡ NONMANIP (n, k)the set of nonmanipulable SCFs, which is the following:

NONMANIP (n, k) = f : Snk → [k] | f (σ) = topH (σi) for some i ∈ [n] , H ⊆ [k] , H 6= ∅∪ f : Snk → [k] | f is a monotone function taking on exactly two values .

When the parameters n and k are obvious from the context, we omit them.

Another useful class of functions, which is larger than NONMANIP, but which has asimpler description, is the following.

Definition 11.5. Define, for parameters n and k that remain implicit:

NONMANIP = f : Snk → [k] | f only depends on one coordinate or takes at most two values.

The notation should be thought of as “closure” rather than “complement”.As discussed previously, our goal is to study manipulability from a quantitative view-

point, and in order to do so we need to define the distance between SCFs.

Definition 11.6 (Distance between SCFs). The distance D(f, g) between two SCFs f, g : Snk →[k] is defined as the fraction of inputs on which they differ: D (f, g) = P (f (σ) 6= g (σ)),where σ ∈ Snk is uniformly selected. For a classG of SCFs, we write D (f,G) = ming∈GD (f, g).

The concepts of anonymity and neutrality of SCFs will be important to us, so we definethem here.

Definition 11.7 (Anonymity). A SCF is anonymous if it is invariant under changes madeto the names of the voters. More precisely, a SCF f : Snk → [k] is anonymous if for everyσ = (σ1, . . . , σn) ∈ Snk and every π ∈ Sn,

f (σ1, . . . , σn) = f(σπ(1), . . . , σπ(n)

).

Definition 11.8 (Neutrality). A SCF is neutral if it commutes with changes made tothe names of the alternatives. More precisely, a SCF f : Snk → [k] is neutral if for everyσ = (σ1, . . . , σn) ∈ Snk and every π ∈ Sk,

f (π σ1, . . . , π σn) = π (f (σ)) .

58

Our goal is to sketch the proof of the following Theorem:

Theorem 11.9. Suppose we have n ≥ 1 voters, k ≥ 3 alternatives, and a SCF f : Snk → [k]satisfying D (f,NONMANIP) ≥ ε. Then

P (σ ∈M (f)) ≥ P (σ ∈M4 (f)) ≥ p(ε,

1

n,

1

k

)(52)

for some polynomial p, where σ ∈ Snk is selected uniformly.An immediate consequence is that

P((σ, σ′

)is a manipulation pair for f

)≥ q

(ε,

1

n,

1

k

)for some polynomial q, where σ ∈ Snk is uniformly selected, and σ′ is obtained from σby uniformly selecting a coordinate i ∈ 1, . . . , n, uniformly selecting j ∈ 1, . . . , n− 3,and then uniformly randomly permuting the following four adjacent alternatives in σi:σi (j) , σi (j + 1) , σi (j + 2), and σi (j + 3).

11.4 Proof Ideas

We first present our techniques that achieve a lower bound for the probability of manipu-lation that involves factors of 1

k! and then describe how a refined approach leads to a lowerbound which has inverse polynomial dependence on k.

Rankings graph and applying the original Gibbard-Satterthwaite theorem. Con-sider the graph G = (V,E) having vertex set V = Snk , the set of all ranking profiles, and let(σ, σ′) ∈ E if and only if σ and σ′ differ in exactly one coordinate. The SCF f : Snk → [k]naturally partitions V into k subsets. Since every manipulation point must be on theboundary between two such subsets, we are interested in the size of such boundaries.

For two alternatives a and b, and voter i, denote by Ba,bi the boundary between f−1 (a)

and f−1 (b) in voter i. A simple lemma tells us that at least two of the boundaries arelarge; in the following assume that these are Ba,b

1 and Ba,c2 . Now if a ranking profile σ lies

on both of these boundaries, then applying the original Gibbard-Satterthwaite theorem tothe restricted SCF on two voters where we fix all coordinates of σ except the first two, weget that there must exist a manipulation point which agrees with σ in all but the first twocoordinates. Consequently, if we can show that the intersection of the boundaries Ba,b

1 andBa,c

2 is large, then we have many manipulation points.

Fibers and reverse hypercontractivity. In order to have more “control” over what ishappening at the boundaries, we partition the graph further—this idea is due to Friedgutet al. [28, 26]. Given a ranking profile σ and two alternatives a and b, σ induces a vectorof preferences xa,b (σ) ∈ −1, 1n between a and b. For a vector za,b ∈ −1, 1n we definethe fiber with respect to preferences between a and b, denoted by F

(za,b), to be the set of

59

ranking profiles for which the vector of preferences between a and b is za,b. We can thenpartition the vertex set V into such fibers, and work inside each fiber separately. Workinginside a specific fiber is advantageous, because it gives us the extra knowledge of the vectorof preferences between a and b.

We distinguish two types of fibers: large and small. We say that a fiber w.r.t. preferencesbetween a and b is large if almost all of the ranking profiles in this fiber lie on the boundaryBa,b

1 , and small otherwise. Now since the boundary Ba,b1 is large, either there is big mass

on the large fibers w.r.t. preferences between a and b or big mass on the small fibers. Thisholds analogously for the boundary Ba,c

2 and fibers w.r.t. preferences between a and c.Consider the case when there is big mass on the large fibers of both Ba,b

1 and Ba,c2 .

Notice that for a ranking profile σ, being in a fiber w.r.t. preferences between a and bonly depends on the vector of preferences between a and b, xa,b (σ), which is a uniformbit vector. Similarly, being in a fiber w.r.t. preferences between a and c only dependson xa,c (σ). Moreover, we know the exact correlation between the coordinates of xa,b (σ)and xa,c (σ), and it is in exactly this setting where reverse hypercontractivity applies andshows that the intersection of the large fibers of Ba,b

1 and Ba,c2 is also large. Finally, by the

definition of a large fiber it follows that the intersection of the boundaries Ba,b1 and Ba,c

2 islarge as well, and we can finish the argument using the Gibbard-Satterthwaite theorem asabove.

To deal with the case when there is big mass on the small fibers of Ba,b1 we use various

isoperimetric techniques, including the canonical path method. In particular, we use thefact that for a small fiber for Ba,b

1 , the size of the boundary of Ba,b1 in the small fiber is

comparable to the size of Ba,b1 in the small fiber itself, up to polynomial factors.

A refined geometry. Using this approach with the rankings graph above, our boundincludes 1

k! factors. In order to obtain inverse polynomial dependence on k we use a refinedapproach. Instead of the rankings graph outlined above, we use an underlying graph with adifferent edge structure: (σ, σ′) ∈ E if and only if σ and σ′ differ in exactly one coordinate,and in this coordinate they differ by a single adjacent transposition. In order to prove therefined result, we need to show that the geometric and combinatorial quantities such asboundaries and manipulation points are roughly the same in the refined graph as in theoriginal rankings graph. In particular, this is where we need to analyze carefully functionsof one voter, and ultimately prove a quantitative Gibbard-Satterthwaite theorem for onevoter.

11.5 Isoperimetric Results

11.5.1 Boundaries and influences

For a general graph G = (V,E), and a subset of the vertices A ⊆ G, we define the edgeboundary of A as

∂e (A) = (u, v) ∈ E : u ∈ A, v /∈ A .

60

We also define the boundary (or vertex boundary) of a subset of the vertices A ⊆ G to bethe set of vertices in A which have a neighbor that is not in A:

∂ (A) = u ∈ A : there exists v /∈ A such that (u, v) ∈ E .

If u ∈ ∂ (A), we also say that u is on the edge boundary of A.

Definition 11.10 (Boundaries). For a given SCF f and a given alternative a ∈ [k], wedefine

Ha (f) = σ ∈ Snk : f (σ) = a ,the set of ranking profiles where the outcome of the vote is a. The edge boundary of thisset is denoted by Ba (f) : Ba (f) = ∂e (Ha (f)). This boundary can be partitioned: we saythat the edge boundary of Ha (f) in the direction of the ith coordinate is

Bai (f) =

(σ, σ′

)∈ Ba (f) : σi 6= σ′i

.

The boundary Ba (f) can be therefore written as Ba (f) = ∪ni=1Bai (f). We can also define

the boundary between two alternatives a and b in the direction of the ith coordinate:

Ba,bi (f) =

(σ, σ′

)∈ Ba

i (f) : f(σ′)

= b.

We also say that σ ∈ Bai (f) is on the boundary Ba,b

i (f) if there exists σ′ such that(σ, σ′) ∈ Ba,b

i (f).

We will need to generalize the definition of influences as follows:

Definition 11.11 (Influences). We define the influence of the ith coordinate on f as

Infi (f) = P(f (σ) 6= f

(σ(i)))

= P((σ, σ(i)

)∈ ∪ka=1B

ai (f)

),

where σ is uniform on Snk and σ(i) is obtained from σ by rerandomizing the ith coordinate.Similarly, we define the influence of the ith coordinate with respect to a single alternativea ∈ [k] or a pair of alternatives a, b ∈ [k] as

Infai (f) = P(f (σ) = a, f

(σ(i))6= a

)= P

((σ, σ(i)

)∈ Ba

i (f)),

andInfa,bi (f) = P

(f (σ) = a, f

(σ(i))

= b)

= P((σ, σ(i)

)∈ Ba,b

i (f)),

respectively.

Clearly

Infi (f) =

k∑a=1

Infai (f) =∑

a,b∈[k]:a6=b

Infa,bi (f) .

Most of the time the specific SCF f will be clear from the context, in which case weomit the dependence on f , and write simply Ba ≡ Ba (f), Ba

i ≡ Bai (f), etc.

61

11.5.2 Large boundaries

The following standard proposition bounds the total influence with respect to a givencandidate from below by the variance with respect to that candidate.

Proposition 11.12. For any f : Snk → [k] and a ∈ [k],

n∑i=1

Infai (f) ≥ Var[1f(X)=a] (53)

where X ∈ Snk is uniformly selected.

Proof. Create a random walk X = X(0), . . . , X(n) = Y from X by re-randomizing the i:thcoordinate in the i:th step, i.e. for i ∈ [n], X(i) ∈ Snk is obtained by re-randomizing thei:th coordinate of X(i−1). Letting g(x) = 1f(x)=a and using that X,Y are independentand that if g(X) 6= g(Y ) then the value of g has to change at some edge on the path wehave

2 Var[1f(X)=a] = 2 Var g(X) = P(g(X) 6= g(Y )) ≤

≤ P(∪i∈[n]g(X(i−1)) 6= g(X(i))) ≤n∑i=1

2 Infai (f)

Further, if a function is far from all constants not all such variances can be small:

Lemma 11.13. For any f : Snk → [k],

D(f,CONST) ≤ q

2

k∑a=1

Var[1f(X)=a] (54)

Proof. For a ∈ [k], let µa = P(f(X) = a) and assume w.l.o.g. that µ1 ≥ µ2 ≥ . . . ≥ µq.Then,

D(f,CONST) = (1− µ1) ≤ kµ1(1− µ1) =k

2

(1− µ2

1 − (1− µ1)2)≤

≤ k

2

(1−

q∑a=1

µ2a

)=k

2

q∑a=1

µa − µ2a =

k

2

q∑a=1

Var[1f(X)=a]

The following lemma shows that there are some boundaries which are large

62

Lemma 11.14. Fix k ≥ 3 and f : Snk → [k] satisfying D(f,NONMANIP) ≥ ε. Then thereexist distinct i, j ∈ [n] and a, b, c, d ⊆ [k] such that c /∈ a, b and

Infa,bi (f) ≥ 2ε

nk2(k − 1)and Infc,dj (f) ≥ 2ε

nk2(k − 1). (55)

Proof. For a 6= b let Aa,b =i ∈ [n] | Infa,bi ≥

2εnk2(k−1)

.

We first claim that for all a, b there exists c, d such that c, d 6= a, b and Ac,d 6= ∅.Note that f being ε-far from taking two values implies that we can find a c /∈ a, b suchthat 1− ε

q ≥ P(f(X) = c) ≥ εk−2 ≥

εk . But then by Proposition 11.12 it holds that

∑d6=c

n∑i=1

Infc,di (f) =

n∑i=1

Infci (f) ≥ Var[1f(X)=c] ≥ε(1− ε/k)

k≥ ε(k − 1)

k2

hence there must exist some d 6= c and i ∈ [n] such that Infc,di ≥εnk2≥ 2ε

nk2(k−1), and thus

Ac,d 6= ∅.We next claim that

| ∪a,b Aa,b| ≥ 2 (56)

To see this, assume the contrary, i.e. ∪a,bAa,b ⊆ i for some i ∈ [n]. Then for all j 6= i itholds that

Infj(f) =∑c,d

Infc,dj (f) <k(k − 1)

2

2ε

nk2(k − 1)=

ε

nk(57)

For σ ∈ Sn, let fσ(x) = f(x1, . . . , xi−1, σ, xi+1, . . . , xn) and note that for j 6= i,

Infj(f) =1

k!

∑σ∈Sn

Infj(fσ) (58)

while Infi(fσ) = 0. Hence, by (57), we have

ε > k∑j 6=i

Infj(f) =k

k!

n∑j=1

∑σ

Infj(fσ) ≥ 2

k!

∑σ

D(fσ,CONST) = 2D(f,DICTi)

where the second inequality follows from Lemma 11.13 and Proposition 11.12. But thismeans that f is ε/2-close to a dictator, contradicting the assumption that D(f,NONMANIP) ≥ε.

Hence (56) holds. Therefore we can either find i 6= j and a, b 6= c, d such thati ∈ Aa,b and j ∈ Ac,d which proves the theorem, or we must have |Aa,b| ≥ 2 for some a, bwhile Ac,d = ∅ for any c, d 6= a, b. However, this contradicts the first claim in theproof. The result follows.

63

11.5.3 General isoperimetric results

Our rankings graph is the Cartesian product of n complete graphs on k! vertices. Theedge-isoperimetric problem on the product of complete graphs was originally solved byLindsey [48], implying the following result.

Corollary 11.15. If A ⊆ K` × · · · × K` (n copies of the complete graph K`) and |A| ≤(1− 1

`

)`n, then |∂e (A)| ≥ |A|.

11.5.4 Fibers

In our proof we need to partition the graph even further—this idea is due to Friedgut,Kalai, Keller, and Nisan [28, 26].

Definition 11.16. For a ranking profile σ ∈ Snk define the vector

xa,b ≡ xa,b (σ) =(xa,b1 (σ) , . . . , xa,bn (σ)

)of preferences between a and b, where xa,bi (σ) = 1 if a

σi> b and xa,bi (σ) = −1 otherwise.

Definition 11.17 (Fibers). For a pair of alternatives a, b ∈ [k] and a vector za,b ∈ −1, 1n,write

F(za,b)

:=σ : xa,b (σ) = za,b

.

We call the F(za,b)

fibers with respect to preferences between a and b.

So for any pair of alternatives a, b, we can partition the ranking profiles according toits fibers:

Snk =⋃

za,b∈−1,1nF(za,b).

Given a SCF f , for any pair of alternatives a, b ∈ [k] and i ∈ [n], we can also partitionthe boundary Ba,b

i (f) according to its fibers. There are multiple, slightly different ways ofdoing this, but for our purposes the following definition is most useful. Define

Bi

(za,b)

:=σ ∈ F

(za,b)

: f (σ) = a, and there exists σ′ s.t.(σ, σ′

)∈ Ba,b

i

,

where we omit the dependence of Bi(za,b)

on f . So Bi(za,b)⊆ F

(za,b)

is the set of verticeson the given fiber for which the outcome is a and which lies on the boundary between aand b in direction i. We call the sets of the form Bi

(za,b)

fibers for the boundary Ba,bi

(again omitting the dependence on f of both sets).We now distinguish between small and large fibers for the boundary Ba,b

i .

64

Definition 11.18 (Small and large fibers). We say that the fiber Bi(za,b)

is large if

P(σ ∈ Bi

(za,b) ∣∣∣σ ∈ F (za,b)) ≥ 1− ε3

4n3k9, (59)

and small otherwise.We denote by Lg

(Ba,bi

)the union of large fibers for the boundary Ba,b

i , i.e.,

Lg(Ba,bi

):=σ : Bi

(xa,b (σ)

)is a large fiber, and σ ∈ Bi

(xa,b (σ)

)and similarly, we denote by Sm

(Ba,bi

)the union of small fibers.

We remark that what is important is that the fraction appearing on the right hand sideof (59) is a polynomial of 1

n , 1k and ε—the specific polynomial in this definition is the end

result of the computation in the proof.Finally, for a voter i and a pair of alternatives a, b ∈ [k], we define

F a,bi :=σ : Bi

(xa,b (σ)

)is a large fiber

.

So this means that

P(σ ∈ ∪za,bBi

(za,b) ∣∣∣σ ∈ F a,bi

)≥ 1− ε3

4n3k9. (60)

11.5.5 Boundaries of boundaries

Finally, we also look at boundaries of boundaries. In particular, for a given vector za,b ofpreferences between a and b, we can think of the fiber F

(za,b)

as a subgraph of the originalrankings graph. When we write ∂

(Bi(za,b))

, we mean the boundary of Bi(za,b)

in thesubgraph of the rankings graph induced by the fiber F

(za,b). That is,

∂(Bi

(za,b))

= σ ∈ Bi(za,b)

: ∃ π ∈ F(za,b)\Bi

(za,b)

s.t.

σ and π differ in exactly one coordinate.

11.5.6 Dictators and miscellaneous definitions

For a ranking profile σ = (σ1, . . . , σn) we sometimes write σ−i for the collection of allcoordinates except the ith coordinate, i.e., σ = (σi, σ−i). Furthermore, we sometimesdistinguish two coordinates, e.g., we write σ =

(σ1, σi, σ−1,i

).

Definition 11.19 (Induced SCF on one coordinate). Let fσ−i denote the SCF on onevoter induced by f by fixing all voter preferences except the ith one according to σ−i. I.e.,

fσ−i (·) := f (·, σ−i) .

65

Recall Definition 11.3 of a dictator on a subset.

Definition 11.20 (Ranking profiles giving dictators on a subset). For a coordinate i anda subset of alternatives H ⊆ [k], define

DHi :=

σ−i : fσ−i (·) ≡ topH (·)

.

Also, for a pair of alternatives a and b, define

Di (a, b) :=⋃

H:a,b⊆H,|H|≥3

DHi .

12 Inverse polynomial manipulability for a fixed number ofalternatives

We prove here the following theorem (Theorem 12.1 below), which is weaker than our maintheorem, Theorem 11.9, in two aspects: first, the condition D (f,NONMANIP) ≥ ε is re-placed with the stronger condition D

(f,NONMANIP

)≥ ε, and second, we allow factors of

1k! in our lower bounds for P (σ ∈M (f)). The advantage is that the proof of this statementis relatively simpler. We move on to getting a lower bound with polynomial dependenceon k in the following sections, and finally we replace the condition D

(f,NONMANIP

)≥ ε

with D (f,NONMANIP) ≥ ε.

Theorem 12.1. Suppose we have n ≥ 2 voters, k ≥ 3 alternatives, and a SCF f : Snk → [k]satisfying D

(f,NONMANIP

)≥ ε. Then

P (σ ∈M (f)) ≥ p(ε,

1

n,

1

k!

), (61)

for some polynomial p, where σ ∈ Snk is selected uniformly.An immediate consequence is that

P((σ, σ′


)≥ q

(ε,

1

n,

1

k!

),

for some polynomial q, where σ ∈ Snk is selected uniformly, and σ′ is obtained from σ byuniformly selecting a coordinate i ∈ 1, . . . , n and resetting the ith coordinate to a randompreference. In particular, the specific lower bound for P (σ ∈M (f)) implies that we cantake q

(ε, 1

n ,1k

)= ε5

4n8k12(k!)5.

First we provide an overview of the proof of Theorem 12.1 in Section 12.1. In thisoverview we use adjectives such as “big”, and “not too small” to describe probabilities—here these are all synonymous with “has probability at least an inverse polynomial of n,k!, and ε−1”.

66

12.1 Overview of proof

The tactic in proving Theorem 12.1 is roughly the following:

• By Lemma 11.14, we know that there are at least two boundaries which are big.W.l.o.g. we can assume that these are either Ba,b

1 and Ba,c2 , or Ba,b

1 and Bc,d2 with

a, b∩ c, d = ∅. Our proof works in both cases, but we continue the outline of theproof assuming the former case which is more difficult.

• We partition Ba,b1 according to its fibers based on the preferences between a and b of

the n voters. Similarly for Ba,c2 and preferences between a and c.

• We can distinguish small and large fibers for these two boundaries. Now since Ba,b1

is big, either the mass of small fibers, or the mass of large fibers is big. Similarly forBa,c

2 .

• Suppose first that there is big mass on large fibers in both Ba,b1 and Ba,c

2 . In this casethe probability of our random ranking σ being in F a,b1 is big, and similarly for F a,c2 .Being in F a,b1 only depends on the vector xa,b (σ) of preferences between a and b, andsimilarly being in F a,c2 only depends on the vector xa,c (σ) of preferences between aand c. We know the correlation between xa,b (σ) and xa,c (σ) and hence we can applyreverse hypercontractivity (Lemma 7.5), which tells us that the probability that σlies in both F a,b1 and F a,c2 is big as well. If σ ∈ F a,b1 , then voter 1 is pivotal betweenalternatives a and b with big probability, and similarly if σ ∈ F a,c2 , then voter 2is pivotal between alternatives a and c with big probability. So now we have thatthe probability that both voter 1 is pivotal between a and b and voter 2 is pivotalbetween a and c is big, and in this case the Gibbard-Satterthwaite theorem tells usthat there is a manipulation point which agrees with this ranking profile in all exceptfor perhaps the first two coordinates. So there are many manipulation points.

• Now suppose that the mass of small fibers in Ba,b1 is big. By isoperimetric the-

ory, the size of the boundary of every small fiber is comparable (same order up topoly−1

(ε−1, n, k!

)factors) to the size of the small fiber. Consequently, the total size

of the boundaries of small fibers is comparable to the total size of small fibers, whichin this case has to be big.

We then distinguish two cases: either we are on the boundary of a small fiber inthe first coordinate, or some other coordinate. If σ is on the boundary of a smallfiber in some coordinate j 6= 1, then the Gibbard-Satterthwaite theorem tells us thatthere is a manipulation point which agrees with σ in all coordinates except perhapsin coordinates 1 and j. If our ranking profile σ is on the boundary of a small fiber inthe first coordinate, then either there exists a manipulation point which agrees withσ in all coordinates except perhaps the first, or the SCF on one voter that we obtain

67

from f by fixing the votes of voters 2 through n to be σ−1 must be a dictator on somesubset of the alternatives. So either we get sufficiently many manipulation points thisway, or for many votes of voters 2 through n, the restricted SCF obtained from f byfixing these votes is a dictator on coordinate 1 on some subset of the alternatives.

Finally, to deal with dictators on the first coordinate, we look at the boundary of thedictators. Since D

(f,NONMANIP

)≥ ε, the boundary is big, and we can also show

that there is a manipulation point near every boundary point.

• If the mass of small fibers in Ba,c2 is big, then we can do the same thing for this

boundary.

12.2 Division into cases

For the remainder of Section 12, let us fix the number of voters n ≥ 2, the number ofalternatives k ≥ 3, and the SCF f , which satisfies D

(f,NONMANIP

)≥ ε. Accordingly,

we typically omit the dependence of various sets (e.g., boundaries between two alternatives)on f .

Our starting point is Lemma 11.14. W.l.o.g. we may assume that the two boundariesthat the lemma gives us have i = 1 and j = 2, so the lemma tells us that

P((σ, σ(1)

)∈ Ba,b

1

)≥ 2ε

nk3,

where σ is uniform on the ranking profiles, and σ(1) is obtained by rerandomizing the firstcoordinate. This also means that

P(σ ∈ ∪za,bB1

(za,b))≥ 2ε

nk3,

and similar inequalities hold for the boundary Bc,d2 . The following lemma is an immediate

corollary.

Lemma 12.2. EitherP(σ ∈ Sm

(Ba,b

1

))≥ ε

nk3(62)

orP(σ ∈ Lg

(Ba,b

1

))≥ ε

nk3, (63)

and the same can be said for the boundary Bc,d2 .

We distinguish cases based upon this: either (62) holds, or (62) holds for the boundaryBc,d

2 , or (63) holds for both boundaries. We only need one boundary for the small fibercase, and we need both boundaries only in the large fiber case. So in the large fiber casewe must differentiate between two cases: whether d ∈ a, b or d /∈ a, b. We will see thatif d /∈ a, b then the large fiber case cannot occur—so this method of proof works as well.

In the rest of the section we first deal with the large fiber case, and then with the smallfiber case.

68

12.3 Big mass on large fibers

We now deal with the case when

P(σ ∈ Lg

(Ba,b

1

))≥ ε

nk3(64)

and alsoP(σ ∈ Lg

(Bc,d

2

))≥ ε

nk3. (65)

As mentioned before, we must differentiate between two cases: whether d ∈ a, b ord /∈ a, b.

12.3.1 Case 1

Suppose d ∈ a, b, in which case we may assume w.l.o.g. that d = a.

Lemma 12.3. If

P(σ ∈ Lg

(Ba,b

1

))≥ ε

nk3and P (σ ∈ Lg (Ba,c

2 )) ≥ ε

nk3, (66)

then

P (σ ∈M) ≥ ε3

2n3k9 (k!)2 . (67)

Proof. By (66) we have that

P(σ ∈ F a,b1

)≥ ε

nk3and P (σ ∈ F a,c2 ) ≥ ε

nk3.

We know that∣∣∣E(xa,bi (σ)xa,ci (σ)

)∣∣∣ = 1/3, and so by reverse hypercontractivity (Lemma 7.5)we have that

P(σ ∈ F a,b1 ∩ F a,c2

)≥ ε3

n3k9. (68)

Recall that we say that σ is on the boundary Ba,b1 if there exists σ′ such that (σ, σ′) ∈ Ba,b

1 .If σ ∈ F a,b1 , then with big probability σ is on the boundary Ba,b

1 , and if σ ∈ F a,c2 , thenwith big probability σ is on the boundary Ba,c

2 . Using this and (68) we can show thatthe probability of σ lying on both the boundary Ba,b

1 and the boundary Ba,c2 is big. Then

we are done, because if σ lies on both Ba,b1 and Ba,c

2 , then by the Gibbard-Satterthwaitetheorem there is a σ which agrees with σ on the last n − 2 coordinates, and which is amanipulation point, and there can be at most (k!)2 ranking profiles that give the samemanipulation point. Let us do the computation:

P(σ on Ba,b

1 , σ on Ba,c2

)≥ P

(σ on Ba,b

1 , σ on Ba,c2 , σ ∈ F a,b1 ∩ F a,c2

)≥ P

(σ ∈ F a,b1 ∩ F a,c2

)−P(σ ∈ F a,b1 ∩ F a,c2 , σ not on Ba,b

1

)−P(σ ∈ F a,b1 ∩ F a,c2 , σ not on Ba,c

2

).

69

The first term is bounded below via (68), while the other two terms can be bounded using(60):

P(σ ∈ F a,b1 ∩ F a,c2 , σ not on Ba,b

1

)≤ P

(σ ∈ F a,b1 , σ not on Ba,b

1

)≤ P

(σ not on Ba,b

1

∣∣∣σ ∈ F a,b1

)≤ ε3

4n3k9,

and similarly for the other term. Putting everything together gives us

P(σ on Ba,b

1 , σ on Ba,c2

)≥ ε3

2n3k9,

which by the discussion above implies (67).

12.3.2 Case 2

Lemma 12.4. If d /∈ a, b, then (64) and (65) cannot hold simultaneously.

Proof. Suppose on the contrary that (64) and (65) do both hold. Then

P(σ ∈ F a,b1

)≥ ε

nk3and P

(σ ∈ F c,d2

)≥ ε

nk3

as before. Since a, b ∩ c, d = ∅,σ ∈ F a,b1

and

σ ∈ F c,d2

are independent events,

and so

P(σ ∈ F a,b1 ∩ F c,d2

)= P

(σ ∈ F a,b1

)P(σ ∈ F c,d2

)≥ ε2

n2k6.

In the same way as before, by the definition of large fibers this implies that

P(σ on Ba,b

1 , σ on Bc,d2

)≥ ε2

2n2k6> 0,

but it is clear thatP(σ on Ba,b

1 , σ on Bc,d2

)= 0,

since σ on Ba,b1 and on Bc,d

2 requires f (σ) ∈ a, b ∩ c, d = ∅. So we have reached acontradiction.

12.4 Big mass on small fibers

We now deal with the case when (62) holds, i.e., when we have a big mass on the smallfibers for the boundary Ba,b

1 . We formalize the ideas of the outline described in Section 12.1in a series of statements.

First, we want to formalize that the boundaries of the boundaries are big, when we areon a small fiber.

70

Lemma 12.5. Fix coordinate 1 and the pair of alternatives a, b. Let za,b be such thatB1

(za,b)

is a small fiber for Ba,b1 . Then, writing B ≡ B1

(za,b), we have

|∂e (B)| ≥ ε3

4n3k9|B|

and

P (σ ∈ ∂ (B)) ≥ ε3

2n4k9k!P (σ ∈ B) , (69)

where both the edge boundary ∂e (B) and the boundary ∂ (B) are with respect to the inducedsubgraph F

(za,b), which is isomorphic to Kn

k!/2, the Cartesian product of n complete graphsof size k!/2.

Proof. We use Corollary 11.15 with ` = k!/2 and the set A being either B or Bc :=F(za,b)\B. Suppose first that |B| ≤

(1− 2

k!

)(k!/2)n. Then |∂e (B)| ≥ |B|. Suppose now

that |B| >(1− 2

k!

)(k!/2)n. Since we are in the case of a small fiber, we also know that

|B| ≤(

1− ε3

4n3k9

)(k!/2)n. Consequently, we get

|∂e (B)| = |∂e (Bc)| ≥ |Bc| ≥ ε3

4n3k9|B| ,

which proves the first claim.A ranking profile in F

(za,b)

has (k!/2− 1)n ≤ nk!/2 neighbors in F(za,b), which then

implies (69).

Corollary 12.6. If (62) holds, then

P

(σ ∈

⋃za,b

∂(B1

(za,b)))

≥ ε4

2n5k12k!.

Proof. Using the previous lemma and (62) we have

P

(σ ∈

⋃za,b

∂(B1

(za,b)))

=∑za,b

P(σ ∈ ∂

(B1

(za,b)))

≥∑

za,b:B1(za,b)⊆Sm(Ba,b1

)P(σ ∈ ∂

(B1

(za,b)))

≥∑

za,b:B1(za,b)⊆Sm(Ba,b1

)ε3

2n4k9k!P(σ ∈ B1

(za,b))

=ε3

2n4k9k!P(σ ∈ Sm

(Ba,b

1

))

≥ ε4

2n5k12k!.

Next, we want to find manipulation points on the boundaries of boundaries.

71

Lemma 12.7. Suppose the ranking profile σ is on the boundary of a fiber for Ba,b1 , i.e.,

σ ∈⋃za,b

∂(B1

(za,b))

.

Then either σ−1 ∈ D1 (a, b), or there exists a manipulation point σ which differs from σ inat most two coordinates, one of them being the first coordinate.

Proof. First of all, by our assumption that σ is on the boundary of a fiber for Ba,b1 , we

know that σ ∈ B1

(za,b)

for some za,b, which means that there exists a ranking profile

σ′ = (σ′1, σ−1) such that (σ, σ′) ∈ Ba,b1 . We may assume a

σ1> b and b

σ′1> a, or else either σ

or σ′ is a manipulation point.Now since σ ∈ ∂

(B1

(za,b))

we also know that there exists a ranking profile π =(πj , σ−j) ∈ F

(za,b)\B1

(za,b)

for some j ∈ [k]. We distinguish two cases: j 6= 1 and j = 1.Case 1: j 6= 1. What does it mean for π = (πj , σ−j) to be on the same fiber as σ, but

for π to not be in B1

(za,b)? First of all, being on the same fiber means that σj and πj

both rank a and b in the same order. Now π /∈ B1

(za,b)

means that

• either f (π) 6= a;

• or f (π) = a and f (π′1, π−1) 6= b for every π′1 ∈ Sk.

If f (π) = b, then either σ or π is a manipulation point, since the order of a and b is thesame in both σj and πj (since σ and π are on the same fiber).

Suppose f (π) = c /∈ a, b. Then we can define a SCF function on two coordinates byfixing all coordinates except coordinates 1 and j to agree with the respective coordinatesof σ—letting coordinates 1 and j vary we get a SCF function on two coordinates whichtakes on at least three values (a, b, and c), and does not only depend on one coordinate.Now applying the Gibbard-Satterthwaite theorem we get that this SCF on two coordinateshas a manipulation point, which means that our original SCF f has a manipulation pointwhich agrees with σ in all coordinates except perhaps in coordinates 1 and j.

So the final case is that f (π) = a and f (π′1, π−1) 6= b for every π′1 ∈ Sk. In particular

for π := (σ′1, π−1) =(πj , σ

′−j

)we have f (π) 6= b. Now if f (π) = a then either σ′ or π is a

manipulation point, since the order of a and b is the same in both σ′j = σj and πj . Finally,if f (π) = c /∈ a, b, then we can apply the Gibbard-Satterthwaite theorem just like in theprevious paragraph.

Case 2: j = 1. We can again ask: what does it mean for π = (π1, σ−1) to be on thesame fiber as σ, but for π to not be in B1

(za,b)? First of all, being on the same fiber means

that σ1 and π1 both rank a and b in the same order (namely, as discussed at the beginning,ranking a above b, or else we have a manipulation point). Now π /∈ B1

(za,b)

means that

• either f (π) 6= a;

72

• or f (π) = a and f (π′1, π−1) 6= b for every π′1 ∈ Sk.

However, we know that f (σ′) = b and that σ′ is of the form σ′ = (σ′1, σ−1) = (σ′1, π−1),and so the only way we can have π /∈ B1

(za,b)

is if f (π) 6= a.

If f (π) = b, then π is a manipulation point, since aπ1> b and f (σ) = a.

So the remaining case is if f (π) = c /∈ a, b. This means that fσ−1 (see Defini-tion 11.19) takes on at least three values. Denote by H ⊆ [k] the range of fσ−1 . Now eitherσ−1 ∈ DH

1 ⊆ D1 (a, b), or there exists a manipulation point σ which agrees with σ in everycoordinate except perhaps the first.

Finally, we need to deal with dictators on the first coordinate.

Lemma 12.8. Assume that D(f,NONMANIP

)≥ ε. We have that either

P (σ−1 ∈ D1 (a, b)) ≤ ε4

4n5k12k!,

or

P (σ ∈M) ≥ ε5

4n7k12 (k!)4 . (70)

Proof. Suppose P (σ−1 ∈ D1 (a, b)) ≥ ε4

4n5k12k!, which is the same as

∑H:a,b⊆H,|H|≥3

P(σ−1 ∈ DH

1

)≥ ε4

4n5k12k!. (71)

Note that for every H ⊆ [k] we have

ε ≤ D(f,NONMANIP

)≤ P (f (σ) 6= topH (σ1)) ≤ 1− P

(DH

1

),

and soP(DH

1

)≤ 1− ε. (72)

The main idea is that (72) implies that the size of the boundary of DH1 is comparable

to the size of DH1 , and if we are on the boundary of DH

1 , then there is a manipulation pointnearby.

So first let us establish that the size of the boundary of DH1 is comparable to the size

of DH1 . This is done along the same lines as the proof of Lemma 12.5.

Notice that DH1 ⊆ Sn−1

k , where Sn−1k should be thought of as the Cartesian product

of n − 1 copies of the complete graph on Sk. We apply Corollary 11.15 with ` = k! andwith n − 1 copies, and we see that if ε ≥ 1

k! , then∣∣∂e (DH

1

)∣∣ ≥ ∣∣DH1

∣∣. If ε < 1k! and

1− 1k! ≤ P

(DH

1

)≤ 1− ε then∣∣∂e (DH

1

)∣∣ =∣∣∣∂e ((DH

1

)c)∣∣∣ ≥ ∣∣∣(DH1

)c∣∣∣ ≥ ε ∣∣DH1

∣∣ .73

So in any case we have∣∣∂e (DH

1

)∣∣ ≥ ε ∣∣DH1

∣∣. Since σ−1 has (n− 1) (k!− 1) ≤ nk! neighborsin Sn−1

k , we have that

P(σ−1 ∈ ∂

(DH

1

))≥ ε

nk!P(σ−1 ∈ DH

1

).

Consequently, by (71), we have

P

σ−1 ∈⋃

H:a,b⊆H,|H|≥3

∂(DH

1

) =∑

H:a,b⊆H,|H|≥3

P(σ−1 ∈ ∂

(DH

1

))≥

∑H:a,b⊆H,|H|≥3

ε

nk!P(σ−1 ∈ DH

1

)≥ ε5

4n6k12 (k!)2 .

Next, suppose σ−1 ∈ ∂(DH

1

)for some H such that a, b ⊆ H, |H| ≥ 3. We want to

show that then there is a manipulation point “close” to σ−1 in some sense. To be moreprecise: for the manipulation point σ, σ−1 will agree with σ−1 in all except maybe onecoordinate.

If σ−1 ∈ ∂(DH

1

), then there exist j ∈ 2, . . . , n and σ′j such that σ′−1 :=

(σ′j , σ−1,j

)/∈

DH1 . That is, fσ′−1

(·) 6≡ topH (·). There can be two ways that this can happen—the twocases are outlined below. Denote by H ′ ⊆ [k] the range of fσ′−1

.Case 1: H′ = H. In this case we automatically know that there exists a manipula-

tion point σ such that σ−1 = σ′−1, and so σ−1 agrees with σ−1 in all coordinates exceptcoordinate j.

Case 2: H′ 6= H. W.l.o.g. suppose H ′ \ H 6= ∅, and let c ∈ H ′ \ H. (The othercase when H \H ′ 6= ∅ works in exactly the same way.) First of all, we may assume thatfσ′−1

(·) ≡ topH′ (·), because otherwise we have a manipulation point just like in Case 1.We can define a SCF on two coordinates by fixing all coordinates except coordinate

1 and j to agree with σ−1, and varying coordinates 1 and j. We know that the outcometakes on at least three different values, since σ−1 ∈ DH

1 , and |H| ≥ 3.Now let us show that this SCF is not a function of the first coordinate. Let σ1 be a

ranking which puts c first, and then a. Then f (σ1, σ−1) = a, but f(σ1, σ

′−1

)= c, which

shows that this SCF is not a function of the first coordinate (since a change in coordinatej can change the outcome).

Consequently, the Gibbard-Satterthwaite theorem tells us that this SCF on two coor-dinates has a manipulation point, and therefore there exists a manipulation point σ for fsuch that σ−1 agrees with σ−1 in all coordinates except coordinate j.

Putting everything together yields (70).

12.5 Proof of Theorem 12.1 concluded

Proof of Theorem 12.1. If (64) and (65) hold, then we are done by Lemmas 12.3 and 12.4.

74

If not, then either (62) holds, or (62) holds for the boundary Bc,d2 ; w.l.o.g. assume that

(62) holds.By Corollary 12.6, we have

P

(σ ∈

⋃za,b

∂(B1

(za,b)))

≥ ε4

2n5k12k!.

We may assume that P (σ−1 ∈ D1 (a, b)) ≤ ε4

4n5k12k!, since otherwise we are done by

Lemma 12.8. Consequently, we then have

P

(σ ∈

⋃za,b

∂(B1

(za,b))

, σ−1 /∈ D1 (a, b)

)≥ ε4

4n5k12k!.

We can then finish our argument using Lemma 12.7:

P (σ ∈M) ≥ 1

n (k!)2P

(σ ∈

⋃za,b

∂(B1

(za,b))

, σ−1 /∈ D1 (a, b)

)≥ ε4

4n6k12 (k!)3 .

13 An overview of the refined proof

In order to improve on the result of Theorem 12.1—in particular to get rid of the factor of1

(k!)4—we need to refine the methods used in the previous section.

The key to the refined method is to consider the so-called refined rankings graph insteadof the general rankings graph studied in Section 12. The vertices of this graph are againranking profiles (elements of Snk ), and two vertices are connected by an edge if they differin exactly one coordinate, and by an adjacent transposition in that coordinate. Again,the SCF f naturally partitions the vertices of this graph into k subsets, depending onthe value of f at a given vertex. Clearly a 2-manipulation point can only be on the edgeboundary of such a subset in the refined rankings graph, and so it is important to studythese boundaries.

One of the important steps of the proof in Section 12 is creating a configuration where wefix all but two coordinates, and the SCF f takes on at least three values when we vary thesetwo coordinates—then we can define another SCF on two voters and k alternatives whichmust have a manipulation point by the Gibbard-Satterthwaite theorem. The advantageof the refined rankings graph is that we can create a configuration where we fix all buttwo coordinates, and in these two coordinates we also fix all but constantly many adjacentalternatives, and the SCF takes on at least three values when we vary these constantlymany adjacent alternatives in the two coordinates. Then we can define another SCF ontwo voters and r alternatives, where r is a small constant, which must have a manipulationpoint by the Gibbard-Satterthwaite theorem. Since r is a constant, we only lose a constantfactor in our estimates, not factors of 1

k! .

75

We state the refined result in Theorem 17.1, which we also prove in Section 17. Theproof of Theorem 17.1 follows the outline of the proof of Theorem 12.1: we know that thereare at least two refined boundaries which are big

The difficulty is dealing with the case when we are on the boundary of a small fiber inthe first coordinate. Suppose σ = (σ1, σ−1) is on such a boundary. We know that there arek! ranking profiles which agree with σ in coordinates 2 through n. The difficulty comes fromthe fact that—in order to obtain a polynomial bound in k—we are only allowed to lookat a polynomial number (in k) of these ranking profiles when searching for a manipulationpoint. If there is an r-manipulation point among them for some small constant r, then weare done. If this is not the case then σ is what we call a local dictator on some subset of thealternatives in coordinate 1. We say that σ is a local dictator on some subset H ⊆ [k] ofthe alternatives in coordinate 1 if the alternatives in H are adjacent in σ1, and permutingthe alternatives in H in every possible way in the first coordinate, the outcome of the SCFf is always the top-ranked alternative in H.

So instead of dealing with dictators on some subset in coordinate 1, as in Section 12, wehave to deal with local dictators on some subset in coordinate 1. This analysis involves es-sentially only the first coordinate, in essence proving a quantitative Gibbard-Satterthwaitetheorem for one voter. This has not been studied in the literature before, and, more-over, we were not able to utilize previous quantitative Gibbard-Satterthwaite theorems tosolve this problem easily. Hence we separate this argument from the rest of the proof ofTheorem 17.1 and formulate a quantitative Gibbard-Satterthwaite theorem for one voter,Theorem 13.1, which is proven in Section 16. This proof forms the backbone for the proofof Theorem 17.1, which is then proven in Section 17.

Theorem 13.1. Suppose f : Sk → [k] is a SCF on n = 1 voter and k ≥ 3 alternativeswhich satisfies D (f,NONMANIP) ≥ ε. Then

P (σ ∈M (f)) ≥ P (σ ∈M3 (f)) ≥ p(ε,

1

k

), (73)

for some polynomial p, where σ ∈ Sk is selected uniformly. In particular, we show a lowerbound of ε3

105k16.

14 Refined rankings graph—introduction and preliminaries

14.1 Transpositions, boundaries, and influences

Definition 14.1 (Adjacent transpositions). Given two elements a, b ∈ [k], the adjacenttransposition [a : b] between them is defined as follows. If σ ∈ Sk has a and b adjacent,then [a : b]σ is obtained from σ by exchanging a and b. Otherwise [a : b]σ = σ.

We let T denote the set of all k (k − 1) /2 adjacent transpositions.For σ ∈ Snk , we let [a : b]i σ denote the ranking profile obtained by applying [a : b] on

the ith coordinate of σ while leaving all other coordinates unchanged.

76

Definition 14.2 (Boundaries). For a given SCF f and a given alternative a ∈ [k], wedefine

Ha (f) = σ ∈ Snk : f (σ) = a ,

the set of ranking profiles where the outcome of the vote is a. The edge boundary ofthis set (with respect to the underlying refined rankings graph) is denoted by Ba;T (f) :Ba;T (f) = ∂e (Ha (f)). This boundary can be partitioned: we say that the edge boundaryof Ha (f) in the direction of the ith coordinate is

Ba;Ti (f) =

(σ, σ′

)∈ Ba;T (f) : σi 6= σ′i

.

The boundary Ba (f) can be therefore written as Ba;T (f) = ∪ni=1Ba;Ti (f). We can also

define the boundary between two alternatives a and b in the direction of the ith coordinate:

Ba,b;Ti (f) =

(σ, σ′

)∈ Ba;T

i (f) : f(σ′)

= b.

Moreover, we can define the boundary between two alternatives a and b in the direction ofthe ith coordinate with respect to the adjacent transposition z ∈ T :

Ba,b;zi (f) =

(σ, σ′

)∈ Ba;T

i (f) : σ′ = ziσ, f(σ′)

= b.

We also say that σ is on the boundary Ba,b;zi (f) if (σ, ziσ) ∈ Ba,b;z

i (f). Clearly we have

Ba,b;Ti (f) =

⋃z∈T

Ba,b;zi (f) .

Definition 14.3 (Influences). Given z ∈ T , we define

Infa,b;zi (f) = P(f (σ) = a, f

(σ(i))

= b)

Infa;zi (f) = P

(f (σ) = a, f

(σ(i))6= a

)Infa,b;Ti (f) =

∑z∈T

Infa,b;zi (f) ,

where σ is uniformly distributed in Snk and σ(i) is obtained from σ by rerandomizing theith coordinate σi in the following way: with probability 1/2 we keep it as σi, and otherwisewe replace it by zσi.

Note that for a 6= b,

Infa,b;zi (f) =1

2P (f (σ) = a, f (ziσ) = b) =

1

2

∣∣∣Ba,b;zi (f)

∣∣∣(k!)n

.

Again, most of the time the specific SCF f will be clear from the context, in which casewe omit the dependence on f .

77

14.2 Canonical Paths and Group Actions

In order to derive the more refined result, we will need to consider in more detail theproperties of the permutation group Lq with respect to adjacent transpositions. Again weuse canonical paths arguments. We state the arguments in a more general setup.

Definition 14.4. Let L be a graph.

• Let PL(`) denote the set of paths of length at most ` in L and PL = ∪`∈NPL(l) theset of paths of finite length.

• Let L1, L2 ⊆ L. A canonical path map on L from L1 to L2 of length ` is a mapΓ: L1 × L2 → PL(`) which satisfies that Γ(x, y) begins at x and ends at y for all(x, y) ∈ L1 × L2.

• Given a canonical path map Γ: L1×L2 → PL(`) and 0 ≤ i ≤ ` we define the inverseimage mapping of the i’th vertex, Γ−1

i : L→ 2L1×L2 as

Γ−1i (z) = (x, y) | length(Γ(x, y)) ≥ i,Γ(x, y)i = z.

Further, we letΓ−1(z) = ∪`i=0Γ−1

i (z)

• Given a group H acting on L we say that a canonical path map Γ: L1×L2 → PL(`)is H-invariant if HL1 = L1 and HL2 = L2 and

Γ(hx, hy) = hΓ(x, y),

for all h ∈ H and all (x, y) ∈ L1 × L2.

We will use the following proposition. Recall that a group H acting on L is calledfixed-point-free if for all x ∈ L and all h ∈ H different than the identity it holds thathx 6= x.

Proposition 14.5. Let H be a fixed-point-free group acting on L and let Γ: L1 × L2 →PL(`) be a canonical path map that is H-invariant. Then for all z ∈ L and 0 ≤ i ≤ l itholds that

|Γ−1i (z)| ≤ |L1||L2|

|H|(74)

and

|Γ−1(z)| ≤ (`+ 1)|L1||L2||H|

(75)

78

Proof. Note that for all i,

|L1 × L2| ≥∑w

|Γ−1i (w)| ≥

∑h∈H|Γ−1i (hz)| = |H||Γ−1

i (z)|,

where the first inequality follows since the value of the i’th vertex partitions the set ofpaths of length at least i, the second inequality since H is fixed-point-free, and the finalequality from the path being H-invariant. We thus obtain:

|Γ−1(z)| ≤∑i=0

|Γ−1i (z)| ≤ (`+ 1)|L1||L2|

|H|,

as needed.

Two applications of the result above will be given for adjacent transpositions.

Definition 14.6. Given two elements a, b ∈ [k] the adjacent transposition [a : b] betweenthem is defined as follows. If σ ∈ Sk has a and b adjacent, then [a : b]σ is obtained from σbe exchanging a and b. Otherwise, [a : b]σ = σ.

We let T denote the set of all q(q − 1)/2 adjacent transpositions. Given µ ∈ T , wedefine

Infa,b;µi (f) = P(f(X) = a, f(X(i)) = b) (76)

Infa;µi (f) = P(f(X) = a, f(X(i)) 6= a) (77)

Infa,b;Ti (f) =∑µ∈T

Infa,b;µi (f) (78)

where X(i) is obtained from X by re-randomizing the i:th coordinate Xi in the followingway: with probability 1/2 we keep it as Xi and otherwise we replace it by µXi.

Finally for σ ∈ Snk we will let [σ : b]i σ denote the element obtained by applying [a : b]on the i:th coordinate of σ while leaving all other coordinates unchanged.

Proposition 14.7. There exists a canonical path map Γ: Sk × Sk → PSk(`) of length atmost ` = k(k − 1)/2 < k2/2, all of whose edges are adjacent transpositions such that forall µ it holds that:

|Γ−1(µ)| ≤ k2k!

2(79)

Proof. Given σ, π ∈ Sk consider the following canonical path starting at σ and ending atπ. Take the element π(1) ranked at the top for π and bubble it to the top by performingadjacent transpositions. Then take the element π(2) ranked second for π and bubble itto the second position etc. Clearly the length of the path is at most k(k − 1)/2. LetH = σ 7→ τσ | τ ∈ Sk be the group of compositions with all possible permutations of thecandidates. Since H is a fixed-point-free group acting on Sk and the described canonicalpath map is H-invariant the result follows from Proposition 14.5.

79

Corollary 14.8. For any f : Snk → [k], a ∈ [k] and i ∈ [n] it holds that∑µ∈T

Infa;µi (f) ≥ 1

k2Infai (f), (80)

where T is the set of all adjacent transpositions.

Proof. This is a standard canonical path argument. Since both sides of the desired in-equality involve averaging over all coordinates but the i’th coordinate, it follows that itsuffices to prove the claim in the case where i = n = 1. Let B = (u, v) ∈ Sk ×Sk | f(u) =a 6= f(v), ∃µ ∈ T : v = µu and note that∑

µ∈TInfa;µ

1 (f) =|B|2k!

, (81)

Consider the canonical path map Γ constructed in Proposition 14.7. Note that each canon-ical path between an element in A := σ ∈ Lq | f(σ) = a and an element in Ac must passvia one of the edges in B. Define h : A×AC → B by letting h(σ, π) be the first edge in Bwhich Γ(σ, π) passes through. Then by (79), for any (u, v) ∈ B,

|h−1((u, v))| ≤ |Γ−1(u)| ≤ k2k!

2(82)

Thus

|B| ≥ |A||Ac|

k2k!/2(83)

Combining (81) and (83) we obtain:∑µ∈T

Infa;µ1 (f) ≥ 1

2k!

|A||Ac|k2k!/2

=1

k2

|A|k!

|Ac|k!

=1

k2Infa1(f)

A second application of Proposition 14.7 is the following.

Proposition 14.9. Fix two elements a, b ∈ [k] and let B ⊆ Sk denote the set of allpermutations where a is ranked above b. Then there exists a canonical path map Γ :B×B → PB(k2) consisting of adjacent transpositions such that all permutations along thepath satisfy that a is ranked above b. Moreover for all µ it holds that:

|Γ−1(µ)| ≤ k4k!

80

Proof. Γ(σ, π) is defined as follows. We look at all elements different than a, b, startingwith the top one of π, and bubble each of them upwards to its position in π ignoring a, b.After we have done so, we have all elements but a, b ordered as in π, followed by a, followedby b. We now bubble a to its location in yπ and then bubble b. Note that the length ofthe path so defined is at most

k(k − 1)

2+ 2(k − 1) =

(k + 4)(k − 1)

2< k2

The proof now follows from Proposition 14.5 by considering the group H which acts bypermuting arbitrary all elements but those labeled by a and b:

|Γ−1(µ)| ≤ k2|B|2

|H|=k2(k!/2)2

(k − 2)!≤ k4k!

15 Refined Boundaries

Similarly to the previous construction we now define the i:th a-b boundary with respect toan adjacent swap z ∈ T as

Ba,b;zi (f) = (x, y) | f(x) = a, f(y) = b, xi = zyi, ∀j 6= i : xj = yj,

and the boundary with respect to arbitrary adjacent swaps on the i:th coordinate as

Ba,b;Ti (f) =

⋃z∈T

Ba,b;zi (f)

Note that for a 6= b,

Infa,b;zi (f) =1

2P(f(X) = a, f(zX) = b) =

1

2

|Ba,b;zi (f)|(k!)n

(84)

15.1 Manipulation points on refined boundaries

The following three lemmas from Isaksson, Kindler and Mossel [35, 36] identify manipula-tion points on (or close to) these refined boundaries.

Lemma 15.1. Fix f : Snk → [k], distinct a, b ∈ [k] and (σ, π) ∈ Ba,b;Ti . Then either

σi = [a : b]πi, or one of σ and π is a 2-manipulation point for f .

Proof. Suppose σi = [c : d]πi where c, d 6= a, b. Then an adjacent transposition of c

and d will not change the order of a and b. Hence bσi> a iff b

πi> a. But then either

81

1. f(π) = bxi> a = f(σ) and σ is a 2-manipulation point or

2. f(σ) = ayi> b = f(π) and π is a 2-manipulation point.

Lemma 15.2. Fix f : Snk → [k] and points σ, π, µ ∈ Snk such that (σ, π) ∈ Ba,b;Ti , (µ, π) ∈

Bc,b;Tj where a, b, c are distinct and i 6= j. Then there exists a 3-manipulation point ν ∈ Snk

for f such that ν` = π` for ` /∈ i, j and νi is equal to σi or πi except that the position ofc may be shifted arbitrarily and νj is equal to µj or πj except that the position of a may beshifted arbitrarily.

Proof. By Lemma 15.1 we must have σi = [a : b]πi and µj = [c : b]πj , or σ, π or µ is a2-manipulation point in which case we are done.

Now create a new triple (σ′, π′, µ′) by starting from (σ, π, µ) and simultaneously in thei:th coordinate of σ, π and µ, bubbling c towards the pair ab until it becomes adjacentto the pair. Since c is never swapped with a or b during this process Lemma 15.1 impliesthat for any intermediate triple (σ, π, µ) we have f(σ) = a, f(π) = b and f(µ) /∈ a, b, orone of σ, π and µ is a 2-manipulation point. But since we also have µ = [c : b]j π, we mustactually have f(µ) = c, or either π or µ is a 2-manipulation point.

Similarly bubbling a towards the pair bc in coordinate j starting from (σ′, π′, µ′) givesus σ′′, π′′, µ′′ all having a, b, c adjacent in coordinates i and j such that (σ′′, π′′) ∈ Ba,b;[a:b]

i

and (µ′, π′′) ∈ Bc,b;[c:b]j . Note that σ′′, π′′, µ′′ are equal except for a reordering of the blocks

containing a, b, c in coordinates i and j.Now arbitrary adjacent swapping of a, b, c in these coordinates of σ′′, π′′ and µ′′ will

keep the value of f in a, b, c, or give rise to a 2-manipulation point by Lemma 15.1. Thuswe can define a social choice function with 2 voters and 3 candidates f ′ : S2

a,b,c → a, b, cby letting f ′(ν) = f(g(ν)), where g(ν) ∈ Snq is obtained from σ′′ by simply reordering thetwo blocks of elements a, b, c in coordinates i and j to match ν1 and ν2, respectively. Sincef ′ takes three values and is not a dictator, Gibbard-Satterthwaite (Theorem 11.2) impliesthat f ′ has a manipulation point and hence f has a 3-manipulation point satisfying ourrequirements.

15.2 Large refined boundaries

The following lemma, again from Isaksson, Kindler and Mossel [35, 36], shows that thereare large refined boundaries (or else we have a lot of 2-manipulation points automatically).

Lemma 15.3. Fix k ≥ 3 and f : Snk → [k] satisfying D(f,NONMANIP) ≥ ε. Let σ beuniformly selected from Snk . Then either

P (σ ∈M2 (f)) ≥ 4ε

nk7, (85)

82

or there exist distinct i, j ∈ [n] and a, b, c, d ⊆ [k] such that c /∈ a, b and

Infa,b;[a:b]i (f) ≥ 2ε

nk7and Inf

c,d;[c:d]j (f) ≥ 2ε

nk7. (86)

Proof. First, suppose that Infa,b;νi ≥ 2εnk7

for some i, a 6= b and ν 6= [a : b]. Then by

Lemma 15.1 for any point (σ, σ′) ∈ Ba,b;νi (f) at least one of σ or σ′ = νσ is a 2-manipulation

point. Let M be the set of all such 2-manipulation points. Then

|M | ≥ |Ba,b;νi (f)| = 2(k!)n Infa,b;νi (f) ≥ 4ε

nk7(k!)n (87)

Dividing with (k!)n gives (85). Thus, for the remainder of the proof we may assume that

Infa,b;νi <2ε

nk7, ∀i ∈ [n], a, b ⊆ [k], ν 6= [a : b] (88)

Now, for a 6= b let Aa,b =i ∈ [n] | Inf

a,b;[a:b]i ≥ 2ε

nk7

.

We first claim that for all a, b there exists c, d such that c, d 6= a, b and Ac,d 6= ∅.Note that f being ε-far from taking two values asserts that we can find a c /∈ a, b suchthat 1− ε

k ≥ P(f(X) = c) ≥ εq−2 ≥

εk . But then, by Corollary 14.8 and Proposition 11.12,

∑µ∈T

∑d6=c

n∑i=1

Infc,d;µi (f) =

∑µ∈T

n∑i=1

Infc;µi (f) ≥ 1

k2Var[1f(X)=c] ≥

ε(k − 1)

k4

hence there must exist some µ ∈ T , d 6= c and i ∈ [n] such that Infc,d;µi ≥ ε

nk6. But by (88)

we must have w = [c : d], hence Ac,d 6= ∅.We next claim that

| ∪a,b Aa,b| ≥ 2 (89)

To see this, assume the contrary, i.e. ∪a,bAa,b ⊆ i for some i ∈ [n]. Then, by Corol-lary 14.8, for all j 6= i it holds that

Infj(f) ≤ k2∑ν∈T

∑a

Infa;νj (f) = k2

∑ν∈T,a,b>a

Infa,b;νj (f) ≤ k6

2

2ε

nk7=

ε

nk(90)

For σ ∈ Lq, let f ¯σ(x) = f(σ1, . . . , σi−1, σ, σi+1, . . . , σn) and note that for j 6= i,

Infj(f) =1

k!

∑σ∈Sk

Infj(fσ) (91)

while Infi(fσ) = 0. Hence, by (90), we have

ε ≥ k∑j 6=i

Infj(f) =k

k!

n∑j=1

∑σ

Infj(fσ) ≥ 2

k!

∑σ

D(fσ,CONST) = 2D(f,DICTi)

83

where the second inequality follows from Lemma 11.13 and Proposition 11.12. But thismeans that f is ε/2-close to a dictator, contradicting the assumption that D(f,NONMANIP) ≥ε.

Hence (89) holds. Therefore we can either find i 6= j and a, b 6= c, d such thati ∈ Aa,b and j ∈ Ac,d which proves the theorem, or we must have |Aa,b| ≥ 2 for some a, bwhile Ac,d = ∅ for any c, d 6= a, b. However, this contradicts the first claim in theproof. The result follows.

15.3 Fibers

We again use fibers F(za,b)

as defined in Definition 11.17. However, we need more thanthis. We note that the following definitions only apply in Section 17, i.e., when we have atleast two voters; in Section 16, when we have only one voter, things are simpler.

Given the result of Lemma 15.3, our primary interest is in the boundary Ba,b;[a:b]i . For

ranking profiles on this boundary, we know that the alternatives a and b are adjacent incoordinate i—so we know more than just the preference between a and b in coordinatei. Consequently we would like to divide the set of ranking profiles with a and b adjacentin coordinate i according to the preferences between a and b in all coordinates exceptcoordinate i. The following definitions make this precise.

As done in Section 11.5.6 for ranking profiles, we can write xa,b−i ≡ xa,b−i (σ) for the vector

of preferences between a and b for all coordinates except coordinate i, i.e., the whole vectorof preferences between a and b is xa,b (σ) =

(xa,bi (σ) , xa,b−i (σ)

).

We can define F(za,b−i

)analogously to F

(za,b):

F(za,b−i

):=σ : xa,b−i (σ) = za,b−i

.

We also define the subset of F(za,b−i

)where a and b are adjacent in coordinate i, with a

above b:

F(za,b−i

):=σ ∈ F

(za,b−i

): a and b are adjacent in coordinate i, with a above b

.

Given a SCF f , for any pair of alternatives a, b ∈ [k] and coordinate i ∈ [n], we canalso partition the boundary Ba,b

i (f) according to its fibers. There are multiple, slightlydifferent ways of doing this, but for our purposes the following definition is most useful.

DefineBi

(za,b−i

):=σ ∈ F

(za,b−i

): f (σ) = a, f ([a : b]i σ) = b

,

where we omit the dependence of Bi(za,b−i

)on f . We call sets of the form Bi

(za,b−i

)⊆

F(za,b−i

)fibers for the boundary Ba,b;[a:b]

i .

We now distinguish between small and large fibers for the boundary Ba,b;[a:b]i .

84

Definition 15.4 (Small and large fibers). We say that the fiber Bi(za,b−i

)⊆ F

(za,b−i

)is

large ifP(σ ∈ Bi

(za,b−i

) ∣∣∣σ ∈ F (za,b−i )) ≥ 1− γ,

where γ = ε3

103n3k24, and small otherwise.

As before, we denote by Lg(Ba,b;[a:b]i

)the union of large fibers for the boundary

Ba,b;[a:b]i , i.e.,

Lg(Ba,b;[a:b]i

):=

⋃Bi

(za,b−i

)is a large fiber

Bi

(za,b−i

),

and similarly, we denote by Sm(Ba,b;[a:b]i

)the union of small fibers.

As in Definition 11.18, we remark that what is important is that γ is a polynomial of1n , 1

k and ε—the specific polynomial in this definition is the end result of the computationin the proof.

The following definition is used in Section 17.3 in dealing with the large fiber case inthe refined setting.

Definition 15.5. For a coordinate i and a pair of alternatives a and b, define F a,bi to bethe set of ranking profiles σ such that xa,b (σ) satisfies

P(f (σ) = topa,b (σi)

∣∣∣ σ ∈ F (xa,b−i (σ)))≥ 1− 2kγ.

Clearly F a,bi is the union of fibers of the form F(za,b), and also F

((1, xa,b−i

))⊆ F a,bi if

and only if F((−1, xa,b−i

))⊆ F a,bi .

15.4 Boundaries of boundaries

In the refined graph setting, just like in the general rankings graph setting, we also look atboundaries of boundaries.

For a given vector za,b−i of preferences between a and b, we can think of F(za,b−i

)as a

subgraph of the original refined rankings graph Snk , i.e., two ranking profiles in F(za,b−i

)are adjacent if they differ by one adjacent transposition in exactly one coordinate. Sinceboth of the ranking profiles are in F

(za,b−i

), this adjacent transposition keeps the order of

a and b in all coordinates, and moreover it keeps a and b adjacent in coordinate i.

85

We choose to slightly modify this graph: the vertex set is still F(za,b−i

), but we modify

the edge set by adding new edges. Suppose σ ∈ F(za,b−i

)and

σi =

...cabd...

; σ′i =

...abcd...

; σ′′i =

...cdab...

.

Define in this way σ′ = (σ′i, σ−i) and σ′′ = (σ′′i , σ−i), and add (σ, σ′) and (σ, σ′′) to theedge set. So basically, we consider the block of a and b in coordinate i as a single element,and connect two ranking profiles in F

(za,b−i

)if they differ in an adjacent transposition in

a single coordinate, allowing this transposition to move the block of a and b in coordinatei. We call this graph G

(za,b−i

)=(F(za,b−i

), E(za,b−i

)), where E

(za,b−i

)is the edge set.

When we write ∂e(Bi

(za,b−i

)), we mean the edge boundary of Bi

(za,b−i

)in the graph

G(za,b−i

), and similarly when we write ∂

(Bi

(za,b−i

)), we mean the vertex boundary of

Bi

(za,b−i

)in the graph G

(za,b−i

).

15.5 Local dictators, conditioning and miscellaneous definitions

In the general rankings graph setting we defined a dictator on a subset of the alternatives,but in the refined rankings graph setting we need to define so-called local dictators.

Definition 15.6 (Local dictators). For a coordinate i and a subset of alternatives H ⊆ [k],define LDH

i to be the set of ranking profiles σ such that the alternatives in H form anadjacent block in σi, and permuting them among themselves in any order, the outcome ofthe SCF f is always the top ranked alternative among those in H. If σ ∈ LDH

i , then wecall σ a local dictator on H in coordinate i.

Also, for a pair of alternatives a and b, define

LDi (a, b) :=⋃

c/∈a,b

LDa,b,ci ,

the set of local dictators on three alternatives, two of which are a and b, in coordinate i.

In dealing with local dictators, we will condition on the top of a particular coordinatebeing fixed. We therefore introduce the following notation.

86

Definition 15.7 (Conditioning). For any coordinate i ∈ [n] and any vector v of alterna-tives we define

Pvi (·) := P (· | (σi (1) , . . . , σi (|v|)) = v) ,

where |v| denotes the length of the vector v. E.g., P(a)1 (·) = P (· |σ1 (1) = a) and

P(a,b,c)1 = P (· | (σ1 (1) , σ1 (2) , σ1 (3)) = (a, b, c)) .

We use the following notation in the proof of Theorem 15.8:

Theorem 15.8. Suppose f is a SCF on n voters and k ≥ 3 alternatives for whichD(f,NONMANIP

)≤ α. Then either

D (f,NONMANIP) < 100n4k8α1/3 (92)

orP (σ ∈M (f)) ≥ P (σ ∈M3 (f)) ≥ α. (93)

Definition 15.9 (Majority function). For a function f whose domain X is finite and whoserange is the set a, b, define Maj (f) by

Maj (f) =

a if # x ∈ X : f (x) = a ≥ # x ∈ X : f (x) = b ,b if # x ∈ X : f (x) = a < # x ∈ X : f (x) = b .

16 Quantitative Gibbard-Satterthwaite theorem for one voter

In this section we prove our quantitative Gibbard-Satterthwaite theorem for one voter,Theorem 13.1. As mentioned before, we present this proof before proving Theorem 17.1,because the proof of Theorem 17.1 follows the lines of this proof, with slight modificationsneeded to deal with having n > 1 coordinates.

For the remainder of this section, let us fix the number of voters to be 1, the number ofalternatives k ≥ 3, and the SCF f , which satisfies D (f,NONMANIP) ≥ ε. Accordingly,we typically omit the dependence of various sets (e.g., boundaries between two alternatives)on f .

An additional notational remark: since our SCF is on one voter only, we omit thesubscripts that denote the coordinate we are on. E.g., we write simply Infa,b instead ofInfa,b1 , etc.

We present the proof in several steps.

87

16.1 Large boundary between two alternatives

The first thing we have to establish is a large boundary between two alternatives. Thiscan be done just like in Lemma 15.3, except there are two small differences. On the onehand, the assumption of the lemma, namely that D (f,NONMANIP) ≥ ε, is weaker thanthat of the original lemma. On the other hand, here we only need one big boundary, unlikein Lemma 15.3, where there are two big boundaries in two different coordinates. Thefollowing lemma formulates what we need.

Lemma 16.1. Recall that f is a SCF on 1 voter and k ≥ 3 alternatives which satisfiesD (f,NONMANIP) ≥ ε. Let σ ∈ Sk be selected uniformly. Then either

P (σ ∈M2) ≥ 4ε

k6(94)

or there exist alternatives a, b ∈ [k], a 6= b such that

Infa,b;[a:b] ≥ 2ε

k6. (95)

Proof. The proof is just like the proof of Lemma 15.3. First, suppose that Infa,b;z ≥ 2εk6

forsome pair of alternatives a 6= b, and transposition z 6= [a : b]. Then by Lemma 15.1, forany point (σ, σ′) ∈ Ba,b;z, at least one of σ or σ′ = zσ is a 2-manipulation point. Then

|M2| ≥∣∣∣Ba,b;z

∣∣∣ = 2 · k! · Infa,b;z ≥ 4ε

k6k!,

and dividing with k! gives (94). So for the remainder of the proof we may assume thatInfa,b;z < 2ε

k6for every a 6= b and z 6= [a : b].

For every a ∈ [k], D(f, topa

)≥ ε, so P (f (σ) = a) ≤ 1−ε. On the other hand, there

exists an alternative, say a ∈ [k], such that P (f (σ) = a) ≥ 1k . So for this alternative we

haveVar (1 [f (σ) = a]) ≥ ε

k,

and consequently using Corollary 14.8 and Proposition 11.12 we have∑w∈T

∑b 6=a

Infa,b;w =∑w∈T

Infa;w ≥ 1

k2Var (1 [f (σ) = a]) ≥ ε

k3.

Hence there must exist some w ∈ T and b 6= a such that Infa,b;w ≥ 2εk6

, but by ourassumption we must have w = [a : b].

If (94) holds, then we are done, so in the following we assume that (95) holds.

88

We know that σ is on Ba,b;[a:b] if f (σ) = a and f ([a : b]σ) = b. We know that if bσ> a,

then σ is a 2-manipulation point, so if this happens in more than half of the cases when σis on Ba,b;[a:b], then we have

P (σ ∈M2) ≥ 2ε

k6,

in which case we are again done. So we may assume in the following that

P (σ ∈ B) ≥ 2ε

k6, (96)

whereB :=

σ : f (σ) = a, f ([a : b]σ) = b, a

σ> b.


We again divide into two cases.We introduce the set F of permutations where a is directly above b:

F :=

σ ∈ Sk : a

σ> b and b

σ′

> a, where σ′ = [a : b]σ

.

One of the following two cases must hold.Case 1: Small fiber case. We have

P(σ ∈ B

∣∣σ ∈ F ) ≤ 1− ε

4k. (97)

Case 2: Large fiber case. We have

P(σ ∈ B

∣∣σ ∈ F ) > 1− ε

4k. (98)

16.3 Small fiber case

In this section we assume that (97) holds.We first formalize that the boundary ∂ (B) of B is big (recall the definition of ∂ (B)

from Section 15.4). The proof uses the canonical path method again.

Lemma 16.2. If (97) holds, then

P (σ ∈ ∂ (B)) ≥ ε

2k4P (σ ∈ B) . (99)

Proof. Let Bc = F \ B. For every (σ, σ′) ∈ B × Bc, we define a canonical path from σ toσ′, which has to pass through at least one edge in ∂e (B). Then if we show that every edgein ∂e (B) lies on at most r canonical paths, then it follows that |∂e (B)| ≥ |B| |Bc| /r.

89

So let (σ, σ′) ∈ B × Bc. We apply the path construction of Proposition 14.7, butconsidering the block formed by a and b as a single element. Since this path goes from σ(which is in B) to σ′ (which is in Bc), it must pass through at least one edge in ∂e (B).

For a given edge (π, π′) ∈ ∂e (B), at most how many possible (σ, σ′) ∈ B×Bc pairs arethere such that the canonical path between σ and σ′ defined above passes through (π, π′)?We learn from Proposition 14.7 that there are at most (k − 1)2 (k − 1)!/2 < k2 (k − 1)!/2possibilities for the pair (σ, σ′).

Recall that∣∣F ∣∣ = (k − 1)!. By our assumption we have |B| ≤

(1− ε

4k

)(k − 1)!, and so

|Bc| ≥ ε4k (k − 1)!. Therefore

|∂e (B)| ≥ |B| |Bc|k2

2 (k − 1)!≥ ε

2k3|B| .

Now in G every ranking profile has k − 2 < k neighbors, which implies (99).


P (σ ∈ ∂ (B)) ≥ ε2

k10. (100)

Proof. Combine Lemma 16.2 and (96).

Next we want to find manipulation points on the boundary ∂ (B). The next lemmatells us that if we are on the boundary ∂ (B), then either we can find manipulation pointseasily, or we are at a local dictator on three alternatives.

Lemma 16.4. Suppose that σ ∈ ∂ (B). Then

• either σ ∈ LD (a, b),

• or there exists σ ∈ M3 such that σ is equal to σ or [a : b]σ except that the positionof a third alternative c might be shifted arbitrarily.

Proof. Since σ ∈ ∂ (B) ⊆ B, we know that f (σ) = a, and if σ′ = [a : b]σ, then f (σ′) = b.Let π ∈ Bc denote the ranking profile such that (σ, π) ∈ ∂e (B), and let π′ = [a : b]π. Sinceπ /∈ B, (f (π) , f (π′)) 6= (a, b). Then, by Lemma 15.1, if f (π) 6= f (π′), then one of π andπ′ is a 2-manipulation point. So assume f (π) = f (π′).

There are two cases to consider: either σ and π differ by an adjacent transposition notinvolving the block of a and b, or they differ by an adjacent transposition that moves theblock of a and b.

In the former case, it is not hard to see that one of σ, σ′, π, π′ is a 2-manipulationpoint, by Lemma 15.1.

If σ and π differ by an adjacent transposition that involves the block of a and b, thenthere are again two cases to consider: either this transposition moves the block of a and bup in the ranking, or it moves it down.

90

If the block of a and b is moved up to get from σ to π, then we must have f (π) = a,or else σ or π is a 3-manipulation point. Then we must have f (π′) = f (π) = a, in whichcase π′ is a 3-manipulation point, since f (σ′) = b.

The final case is when the block of a and b is moved down to get from σ to π, anda third alternative, call it c, is moved up, directly above the block of a and b. Now iff (π) = d /∈ a, b, c, then σ or π is a 3-manipulation point. If f (π) = f (π′) = a, then π′

is a 3-manipulation point, whereas if f (π) = f (π′) = b, then π is a 3-manipulation point.The remaining case is when f (π) = f (π′) = c. Now if f ([b : c]σ) 6= a or f ([a : c]σ′) 6= b,then we again have a 3-manipulation point close to σ. Otherwise σ ∈ LD (a, b).

The following corollary then tells us that either we have found many 3-manipulationpoints, or we have many local dictators on three alternatives.

Corollary 16.5. If (97) holds, then either

∑c/∈a,b

P(σ ∈ LDa,b,c

)= P (σ ∈ LD (a, b)) ≥ ε2

2k10(101)

or

P (σ ∈M3) ≥ ε2

4k12.

16.3.1 Dealing with local dictators

So the remaining case we have to deal with in this small fiber case is when (101) holds,i.e., we have many local dictators on three alternatives.

Lemma 16.6. Suppose σ ∈ LDa,b,c for some alternative c /∈ a, b. Let σ′ be equal to σexcept that the block of a, b and c is moved to the top of the coordinate. Then

• either σ′ ∈ LDa,b,c,

• or there exists a 3-manipulation point σ which agrees with σ except that the positionsof a, b and c might be shifted arbitrarily.

Proof. W.l.o.g. we may assume that in σ alternative a is ranked above b, which is rankedabove c. Now move a to the top using a sequence of adjacent transpositions, all involvinga; we call this procedure “bubbling” a to the top. If at any point during this the outcomeof f is not a, then we have found a 2-manipulation point. Now bubble up b to right belowa, and then bubble up c to be right below b. Again, if at any point during this the outcomeof f is not a, then there is a 2-manipulation point. Otherwise we now have a, b and c atthe top (in this order), with the outcome of f being a. Now permuting alternatives a, band c at the top, we either have a 3-manipulation point, or σ′ ∈ LDa,b,c.

91

Corollary 16.7. If (101) holds, then either∑c/∈a,b

P(σ ∈ LDa,b,c, σ (1) , σ (2) , σ (3) = a, b, c

)≥ ε2

4k11(102)

or

P (σ ∈M3) ≥ ε2

4k13.

Proof. Lemma 16.6 tells us that when we move the block of a, b, and c up to the top, weeither encounter a 3-manipulation point, or we get a local dictator on a, b, c at the top.

If we get a 3-manipulation point, by the describtion of this manipulation point in thelemma, there can be at most k3 ranking profiles that give the same manipulation point.

If we arrive at a local dictator at the top, then there could have been at most k differentplaces where the block of a, b and c could have come from.

Now (102) is equivalent to∑c/∈a,b

P(σ ∈ LDa,b,c, (σ (1) , σ (2) , σ (3)) = (a, b, c)

)≥ ε2

24k11. (103)

We know that

P ((σ (1) , σ (2) , σ (3)) = (a, b, c)) =1

k (k − 1) (k − 2)≤ 6

k3,

and so (103) implies (recall Definition 15.7)∑c/∈a,b

P(a,b,c)(σ ∈ LDa,b,c

)≥ ε2

144k8. (104)

Now fix an alternative c /∈ a, b and define the graph G(a,b,c) =(V(a,b,c), E(a,b,c)

)to

have vertex setV(a,b,c) := σ ∈ Sk : (σ (1) , σ (2) , σ (3)) = (a, b, c)

and for σ, π ∈ V(a,b,c) let (σ, π) ∈ E(a,b,c) if and only if σ and π differ by an adjacenttransposition. So G(a,b,c) is the subgraph of the refined rankings graph induced by thevertex set V(a,b,c). (If k = 3 or k = 4, then this graph consists of only one vertex, and noedges.)

LetT (a, b, c) := V(a,b,c) ∩ LDa,b,c,

and let ∂e (T (a, b, c)) and ∂ (T (a, b, c)) denote the edge- and vertex-boundary of T (a, b, c)in G(a,b,c), respectively.

The next lemma shows that unless T (a, b, c) is almost all of V(a,b,c), the size of theboundary ∂ (T (a, b, c)) is comparable to the size of T (a, b, c). The proof uses a canonicalpath argument, just like in Lemma 16.2.

92

Lemma 16.8. Let c /∈ a, b be arbitrary. Write T ≡ T (a, b, c) for simplicity. IfP(a,b,c) (σ ∈ T ) ≤ 1− δ, then

P(a,b,c) (σ ∈ ∂ (T )) ≥ δ

k3P(a,b,c) (σ ∈ T ) . (105)

Proof. Let T c = V(a,b,c) \ T (a, b, c). For every (σ, σ′) ∈ T × T c, we define a canonicalpath from σ to σ′, which has to pass through at least one edge in ∂e (T ). Then if weshow that every edge in ∂e (T ) lies on at most r canonical paths, then it follows that|∂e (T )| ≥ |T | |T c| /r.

So let (σ, σ′) ∈ T × T c. We apply the path construction of Proposition 14.7, but onlyto alternatives [k] \ a, b, c.

The analysis of this construction is done in exactly the same way as in Lemma 16.2; inthe end we get that there are at most k2 (k − 3)! paths that pass through a given edge in∂e (T ).

Recall that∣∣V(a,b,c)

∣∣ = (k − 3)! and that by our assumption |T | ≤ (1− δ) (k − 3)!, so|T c| ≥ δ (k − 3)!. Therefore

|∂e (T )| ≥ |T | |T c|k2 (k − 3)!

≥ δ

k2|T | .

Now every vertex in V(a,b,c) has k − 4 < k neighbors, which implies (105).

The next lemma tells us that if σ is on the boundary of a set of local dictators ona, b, c for some alternative c /∈ a, b, then there is a 2-manipulation point σ which isclose to σ.

Lemma 16.9. Suppose σ ∈ ∂ (T (a, b, c)) for some c /∈ a, b. Then there exists σ ∈ M2

which equals zσ for some adjacent transposition z that does not involve a, b or c, exceptthat the order of the block of a, b and c might be rearranged.

Proof. Let π be the ranking profile such that (σ, π) ∈ ∂e (T (a, b, c)), and let z be theadjacent transposition in which they differ, i.e., π = zσ. Since π /∈ T (a, b, c), there exists areordering of the block of a, b, and c at the top of π such that the outcome of f is not thetop ranked alternative. Call the resulting vector π′. W.l.o.g. let us assume that π′ (1) = a.Let us also define σ′ := zπ′. Now π′ is a 2-manipulation point, since f (σ′) = a.

The next corollary puts together Corollary 16.7 and Lemmas 16.8 and 16.9.

Corollary 16.10. Suppose (102) holds. Then if for every c /∈ a, b we have P(a,b,c) (σ ∈ T (a, b, c)) ≤1− ε

100k , then

P (σ ∈M2) ≥ ε3

105k16.

93

Proof. We know that (102) implies

∑c/∈a,b

Pa,b,c (σ ∈ T (a, b, c)) ≥ ε2

144k8.

Now using the assumptions, Lemma 16.8 with δ = ε100k , and Lemma 16.9, we have

P (σ ∈M2) ≥∑

c 6=a,b

1

k3P(a,b,c) (σ ∈M2) ≥

∑c/∈a,b

1

6k4P(a,b,c) (σ ∈ ∂ (T (a, b, c)))

≥∑

c/∈a,b

ε

600k8P(a,b,c) (σ ∈ T (a, b, c)) ≥ ε3

86400k16≥ ε3

105k16.

So again we are left with one case to deal with: if there exists an alternative c /∈ a, bsuch that P(a,b,c) (σ ∈ T (a, b, c)) > 1− ε

100k . Define a subset of alternatives K ⊆ [k] in thefollowing way:

K := a, b ∪c ∈ [k] \ a, b : P(a,b,c) (σ ∈ T (a, b, c)) > 1− ε

100k

.

In addition to a and b, K contains those alternatives that whenever they are at the topwith a and b, they form a local dictator with high probability.

So our assumption now is that |K| ≥ 3.Our next step is to show that unless we have many manipulation points, for any alter-

native c ∈ K, conditioned on c being at the top, the outcome of f is c with probabilityclose to 1.

Lemma 16.11. Let c ∈ K. Then either

P(c) (f (σ) = c) ≥ 1− ε

50k, (106)

orP (σ ∈M2) ≥ ε

100k4. (107)

Proof. First assume that c /∈ a, b.Let σ be uniform according to P(c), i.e., uniform on Sk conditioned on σ (1) = c. Define

σ′, where σ′ is constructed from σ by first bubbling up alternative a to just below c, usingadjacent transpositions, and then bubbling up b to just below a. Clearly σ′ is distributedaccording to P(c,a,b), i.e., it is uniform on Sk conditioned on (σ (1) , σ (2) , σ (3)) = (c, a, b).

Since c ∈ K, we know that P(c,a,b)(σ ∈ LDa,b,c

)> 1− ε

100k . This also means that

P(c)(σ′ ∈ LDa,b,c

)> 1− ε

100k.

94

Now we can partition the ranking profiles into three parts, based on the outcome of theSCF f at σ and σ′:

I1 =σ : f (σ) = c, f

(σ′)

= c

I2 =σ : f (σ) 6= c, f

(σ′)

= c

I3 =σ : f

(σ′)6= c.

If P(c) (I1) ≥ 1 − ε50k , then (106) holds. Otherwise we have P(c) (I2 ∪ I3) ≥ ε

50k , and sinceP(c) (I3) ≤ ε

100k , we have P(c) (I2) ≥ ε100k .

Now if σ ∈ I2, then we know that there is a 2-manipulation point along the way as wego from σ to σ′. I.e., to every σ ∈ I2 there exists σ ∈M2 such that σ is equal to σ exceptperhaps a and b are shifted arbitrarily. So there can be at most k2 ranking profiles σ givingthe same 2-manipulation point σ, and so we have

P (σ ∈M2) ≥ 1

kP(c) (σ ∈M2) ≥ 1

k3P(c) (I2) ≥ ε

100k4,

showing (107).Now suppose c ∈ a, b, w.l.o.g. assume c = a. We know that |K| ≥ 3 and so there

exists an alternative d ∈ K \ a, b. We can then do the same thing as above, but we nowbubble up b and d.

We now deal with alternatives that are not in K: either we have many manipulationpoints, or for any alternative d /∈ K, the outcome of f is not d with probability close to 1.

Lemma 16.12. Let d /∈ K. If P (f (σ) = d) ≥ ε4k , then

P (σ ∈M2) ≥ ε2

106k9.

Proof. Let σ be such that f (σ) = d. Bubble up d to the top, and call this ranking profileσ′. Now if f (σ′) 6= d, then we know that there exists a 2-manipulation point σ along theway, i.e., a 2-manipulation σ which agrees with σ except perhaps d is shifted arbitrarily.Consequently, either

P (σ ∈M2) ≥ ε

8k2,

in which case we are done, or

P(σ : f (σ) = f

(σ′)

= d)≥ ε

8k.

Next, let us bubble up a to just below d, and then bubble up b to just below d. Denotethis ranking profile by σ(d,b,a), and analogously define σ(d,a,b), σ(a,b,d), σ(a,d,b), σ(b,a,d), andσ(b,d,a). Either we encounter a 2-manipulation point σ along the way of bubbling up to

95

σ(d,b,a) (σ agrees with σ except d is at the top, and a and b might be arbitrarily shifted),or the outcome of the SCF f is d all along. So we have that either

P (σ ∈M2) ≥ ε

16k3,


P(σ : f (σ) = f

(σ′)

= f(σ(d,b,a)

)= f

(σ(d,a,b)

)= d)≥ ε

16k.

Now start from σ(d,a,b). First swap a and d to get σ(a,d,b), then swap d and b to getσ(a,b,d), and finally bubble d and b down to their original positions in σ, except for thefact that a is now at the top of the coordinate. Call this profile σ. Since σ is uniformlydistributed, σ is distributed according to P(a)

1 , i.e., uniformly conditional on σ (1) = a. Nownote that one of the following three events has to happen. (These events are not mutuallyexclusive.)

I1 =f(σ(a,d,b)

)= f

(σ(a,b,d)

)= a

I2 = f (σ) 6= aI3 = σ : ∃ σ ∈M2 which is equal to σ except a is shifted

to the top, and b and d may be shifted arbitrarily.

Since a ∈ K, we know by Lemma 16.11 that (unless we already have enough manipulationpoints by the lemma) we must have

P (f (σ) 6= a) = P(a) (f (σ) 6= a) ≤ ε

50k.

Consequently

P(I1 ∪ I3, f (σ) = f

(σ′)

= f(σ(d,b,a)

)= f

(σ(d,a,b)

)= d)≥ ε

16k− ε

50k=

17ε

400k,

and so eitherP (σ ∈M2) ≥ 17ε

800k3,


P(σ : f

(σ(d,b,a)

)= f

(σ(d,a,b)

)= d, f

(σ(a,b,d)

)= f

(σ(a,d,b)

)= a

)≥ 17ε

800k.

Next, we can do the same thing with b on top, and we ultimately get that either

P (σ ∈M2) ≥ ε

1600k3,

96


P(a,b,d)(σ(a,b,d) ∈ LDa,b,d

)= P

(σ : σ(a,b,d) ∈ LDa,b,d

)≥ ε

1600k. (108)

Define G(a,b,d) and T(a,b,d) analogously to G(a,b,c) and T(a,b,c), respectively.Suppose that (108) holds. We also know that d /∈ K, so Lemma 16.8 applies, and

then Lemma 16.9 shows us how to find manipulation points. We can put these argumentstogether, just like in the proof of Corollary 16.10, to show what we need:

P (σ ∈M2) ≥ 1

k3P(a,b,d) (σ ∈M2) ≥ 1

6k4P(a,b,d) (σ ∈ ∂ (T (a, b, d)))

≥ ε

600k8P(a,b,d) (σ ∈ T (a, b, d)) ≥ ε2

106k9.

Putting together the results of the previous lemmas, there is only one case to be covered,which is covered by the following final lemma. Basically, this lemma says that unlessthere are enough manipulation points, our function is close to a dictator on the subset ofalternatives K.

Lemma 16.13. Recall that we assume that D (f,NONMANIP) ≥ ε. Furthermore assumethat |K| ≥ 3, for every c ∈ K we have

P(c) (f (σ) = c) ≥ 1− ε

50k, (109)

and for every d /∈ K we haveP (f (σ) = d) ≤ ε

4k.

ThenP (σ ∈M2) ≥ ε

4k2. (110)

Proof. First note that

P (f (σ) 6= topK (σ)) = P (f (σ) /∈ K) + P (f (σ) 6= topK (σ) , f (σ) ∈ K) .

We know thatε ≤ D (f,NONMANIP) ≤ P (f (σ) 6= topK (σ))

and also thatP (f (σ) /∈ K) ≤ (k − |K|) ε

4k≤ ε

2,

which together imply that

P (f (σ) 6= topK (σ) , f (σ) ∈ K) ≥ ε

2.

97

Let σ be such that f (σ) 6= topK (σ) and f (σ) ∈ K. Now bubble topK (σ) up to the topin σ, call this ranking profile σ. Clearly then topK (σ) = topK (σ).

There are two cases to consider. If f (σ) 6= f (σ), then there is a 2-manipulation pointalong the way from σ to σ, i.e., a 2-manipulation point σ such that σ agrees with σexcept perhaps some alternative c is arbitrarily shifted. Otherwise f (σ) = f (σ), and sof (σ) 6= topK (σ).

Consequently we have that either (110) holds, or that

P (σ : f (σ) 6= topK (σ)) ≥ ε

4. (111)

By the construction of σ, we know that σ is uniformly distributed conditional on σ (1) ∈ K.Consequently, by (109), we have that

P (σ : f (σ) 6= topK (σ)) ≤ ε

50k,

which contradicts with (111) since ε50k <

ε4 .

This concludes the proof of the small fiber case.

16.4 Large fiber case

In this section we assume that (98) holds. We show that we either have a lot 2-manipulationpoints or we have a lot of local dictators on three alternatives.

Our first step towards this is the following lemma.

Lemma 16.14. Suppose (98) holds. Then

P(a,b) (σ ∈ B) ≥ 1− ε

4. (112)

Proof. Let Bc = F \B. Our assumption (98) implies that P(σ ∈ Bc

∣∣σ ∈ F ) ≤ ε4k , which

means that |Bc| ≤ ε(k−1)!4k , and so

P(a,b) (σ /∈ B) ≤ ε (k − 1)!

4k (k − 2)!<ε

4,

which is equivalent to (112).

The next lemma (together with Section 16.3.1) concludes the proof in the large fibercase.

Lemma 16.15. Suppose (98) holds and recall that our SCF f satisfies D (f,NONMANIP) ≥ε. Then either

P (σ ∈M2) ≥ ε

4k2(113)

orP (σ ∈ LD (a, b)) ≥ ε

4k2. (114)

98

Proof. By Lemma 16.14 we know that (112) holds.Let σ ∈ Sk be uniform. Define σ′ by being the same as σ except alternatives a and b

are moved to the top of the coordinate: σ′ (1) = a and σ′ (2) = b. Clearly σ′ is distributedaccording to P(a,b) (·). Also define σ′′ = [a : b]σ′.

We partition the set of ranking profiles Sk into three parts:

I1 :=σ ∈ Sk : f (σ) = topa,b (σ) ,

(f(σ′), f(σ′′))

= (a, b)

I2 :=σ ∈ Sk : f (σ) 6= topa,b (σ) ,

(f(σ′), f(σ′′))

= (a, b)

I3 :=σ ∈ Sk :

(f(σ′), f(σ′′))6= (a, b)

.

By (112) we know that P (σ ∈ I3) ≤ ε4 . We also know that P (σ ∈ I1) ≤ 1 − ε, since

D (f,NONMANIP) ≥ ε. Therefore we must have

P (σ ∈ I2) ≥ 3ε

4>ε

2.

Let us partition I2 further, and write it as I2 = I ′2 ∪(∪c/∈a,bI2,c

), where

I ′2 :=σ ∈ I2 : f (σ) 6= topa,b (σ) , f (σ) ∈ a, b

and for any c /∈ a, b,

I2,c := σ ∈ I2 : f (σ) = c .Suppose σ ∈ I ′2. W.l.o.g. let us assume that a is ranked higher than b by σ, and

therefore f (σ) = b, since σ ∈ I ′2. Then we can get from σ to σ′ by first bubbling up a tothe top, and then bubbling up b to just below a. Since f (σ) = b and f (σ′) = a, theremust be a 2-manipulation point σ along the way, which is equal to σ except perhaps thepositions of a and b are arbitrarily shifted.

Now suppose that σ ∈ I2,c for some c /∈ a, b. We distinguish two cases: either c isranked above both a and b in σ, or it is not.

If not, then say a is ranked above c in σ. Bubble a all the way to the top, and thenbubble b as well, all the way to the top, just below a. Since f (σ) = c and f (σ′) = a, theremust be a 2-manipulation point σ along the way, which is equal to σ except perhaps thepositions of a and b are arbitrarily shifted.

If c is ranked above both a and b in σ, then the argument is similar. First bubble upa and b to just below c, and denote this ranking profile by σ, then permute these threealternatives arbitrarily, and then bubble a and b to the top. It is not hard to think throughthat either there is a 2-manipulation σ along the way, which is then equal to σ exceptperhaps the positions of a and b are arbitrarily shifted, or else σ ∈ LDa,b,c.

Combining these cases we see that either (113) or (114) must hold.

So if (113) holds then we are done, and if (114) holds, then we refer back to Sec-tion 16.3.1, where we deal with the case of local dictators on three alternatives.

99


Proof of Theorem 13.1. Our starting point is Lemma 16.1, which implies that (96) holds(unless we already have many 2-manipulation points). We then consider two cases, asindicated in Section 16.2.

We deal with the small fiber case—when (97) holds—in Section 16.3. First, Lemma 16.2,Corollary 16.3, Lemma 16.4 and Corollary 16.5 show that either there are many 3-manipulationpoints, or there are many local dictators on three alternatives. We then deal with the case ofmany local dictators in Section 16.3.1. Lemma 16.6, Corollary 16.7, Lemmas 16.8 and 16.9,Corollary 16.10, and Lemmas 16.11, 16.12 and 16.13 together show that there are many3-manipulation points if there are many local dictators on three alternatives, and the SCFis ε-far from the family of nonmanipulable functions.

We deal with the large fiber case—when (98) holds—in Section 16.4. Here Lemma 16.15shows that either we have many 2-manipulation points, or we have many local dictatorson three alternatives. In this latter case we refer back to Section 16.3.1 to conclude theproof.

17 Inverse polynomial manipulability for any number of al-ternatives

In this section we prove the theorem below, which is the same as our main theorem,Theorem 11.9, except that the condition of D (f,NONMANIP) ≥ ε from Theorem 11.9 isreplaced with the stronger condition D

(f,NONMANIP

)≥ ε.

Theorem 17.1. Suppose we have n ≥ 2 voters, k ≥ 3 alternatives, and a SCF f : Snk → [k]satisfying D

(f,NONMANIP

)≥ ε. Then

P (σ ∈M (f)) ≥ P (σ ∈M4 (f)) ≥ p(ε,

1

n,

1

k

), (115)

for some polynomial p, where σ ∈ Snk is selected uniformly. In particular, we show a lowerbound of ε5

109n7k46.

An immediate consequence is that

P((σ, σ′


)≥ q

(ε,

1

n,

1

k

),

for some polynomial q, where σ ∈ Snk is uniformly selected, and σ′ is obtained from σby uniformly selecting a coordinate i ∈ 1, . . . , n, uniformly selecting j ∈ 1, . . . , n− 3,and then uniformly randomly permuting the following four adjacent alternatives in σi:σi (j) , σi (j + 1) , σi (j + 2), and σi (j + 3). In particular, the specific lower bound for P (σ ∈M4 (f))

implies that we can take q(ε, 1

n ,1k

)= ε5

1011n8k47.

100

For the remainder of the section, let us fix the number of voters n ≥ 2, the number ofalternatives k ≥ 3, and the SCF f , which satisfies D

(f,NONMANIP

)≥ ε. Accordingly,

we typically omit the dependence of various sets (e.g., boundaries between two alternatives)on f .


Our starting point in proving Theorem 17.1 is Lemma 15.3. Clearly if (85) holds then weare done, so in the rest of Section 17 we assume that this is not the case. Then Lemma 15.3tells us that (86) holds, and w.l.o.g. we may assume that the two boundaries that the lemmagives us have i = 1 and j = 2. I.e., we have

P(σ on B

a,b;[a:b]1

)≥ 4ε

nk7and P

(σ on B

c,d;[c:d]2

)≥ 4ε

nk7,

where recall that σ is on Ba,b;[a:b]1 if f (σ) = a and f ([a : b]1 σ) = b. If σ is on B

a,b;[a:b]1 and

bσ1> a, then σ is a 2-manipulation point, so if this happens in more than half of the cases

when σ is on Ba,b;[a:b]1 , then we have

P (σ ∈M2) ≥ 2ε

nk7,

and we are done. Similarly in the case of the boundary between c and d in coordinate 2.So we may assume from now on that

P(σ ∈ ∪

za,b−1B1

(za,b−1

))≥ 2ε

nk7and P

(σ ∈ ∪

zc,d−2B2

(zc,d−2

))≥ 2ε

nk7.

The following lemma is an immediate corollary.

Lemma 17.2. EitherP(σ ∈ Sm

(Ba,b;[a:b]1

))≥ ε

nk7(116)

orP(σ ∈ Lg

(Ba,b;[a:b]1

))≥ ε

nk7, (117)

and the same can be said for the boundary Bc,d;[c:d]2 .

We distinguish cases based upon this: either (116) holds, or (116) holds for the boundaryBc,d;[c:d]2 , or (117) holds for both boundaries. We only need one boundary for the small

fiber case, and we need both boundaries only in the large fiber case. So in the large fibercase we must differentiate between two cases: whether d ∈ a, b or d /∈ a, b. We will seethat our method of proof works in both cases.

In the rest of the section we first deal with the small fiber case, and then with the largefiber case.

101

17.2 Small fiber case

We now deal with the case when (116) holds. We formalize the ideas of the outline in aseries of statements.

First, we want to formalize that the boundaries of the boundaries are big in this refinedgraph setting as well, when we are on a small fiber. The proof uses the canonical pathmethod, as successfully adapted to this setting by Isaksson, Kindler and Mossel [36], andis very similar to the proof of Lemma 16.2, with some necessary modifications due to thefact that we now have n coordinates.

Lemma 17.3. Fix a coordinate and a pair of alternatives—for simplicity we choose coor-dinate 1 and alternatives a and b, but we note that this lemma holds in general, we do notassume anything special about these choices. Let za,b−1 be such that B1

(za,b−1

)is a small fiber

for Ba,b;[a:b]1 . Then, writing B ≡ B1

(za,b−1

)for simplicity, we have

P (σ ∈ ∂ (B)) ≥ γ

2nk5P (σ ∈ B) . (118)

Proof. Let Bc = F(za,b−1

)\B. For every (σ, σ′) ∈ B×Bc, we define a canonical path from

σ to σ′, which has to pass through at least one edge in ∂e (B). Then if we show that everyedge in ∂e (B) lies on at most r canonical paths, then it follows that |∂e (B)| ≥ |B| |Bc| /r.

So let (σ, σ′) ∈ B×Bc. We define a path from σ to σ′ by applying a path construction ineach coordinate one by one, and then concatenating these paths: first in the first coordinatewe get from σ1 to σ′1, while leaving all other coordinates unchanged, then in the secondcoordinate we get from σ2 to σ′2, while leaving all other coordinates unchanged, and so on,finally in the last coordinate we get from σn to σ′n. In the first coordinate we apply thepath construction of Proposition 14.7, but considering the block formed by a and b as asingle element; in all other coordinates we apply the path construction of Proposition 14.9.Since this path goes from σ (which is in B) to σ′ (which is in Bc), it must pass through atleast one edge in ∂e (B).

For a given edge (π, π′) ∈ ∂e (B), at most how many possible (σ, σ′) ∈ B×Bc pairs arethere such that the canonical path between σ and σ′ defined above passes through (π, π′)?Let us differentiate two cases.

Suppose π and π′ differ in the first coordinate. Then coordinates 2 through n of σmust agree with the respective coordinates of π, while coordinates 2 through n of σ′ can beanything (up to the restriction given by σ′ ∈ Bc ⊆ F

(za,b−1

)), giving

(k!2

)n−1possibilities.

Now fixing all coordinates except the first,Proposition 14.7 tells us that there are at most(k − 1)2 (k − 1)!/2 < k2 (k − 1)! possibilities for the pair (σ1, σ

′1). So altogether there are

at most k2 (k − 1)!(k!2

)n−1paths that pass through a given edge in ∂e (B) in this case.

Suppose now that π and π′ differ in the ith coordinate, i 6= 1. Then the first i − 1coordinates of σ′ must agree with the first i−1 coordinates of π, while coordinates i+1, . . . , n

102

of σ must agree with the respective coordinates of π. The first i − 1 coordinates of σ,and coordinates i + 1, . . . , n of σ′ can be anything (up to the restriction given by σ, σ′ ∈F(za,b−1

)), giving (k − 1)!

(k!2

)n−2possibilities. Now fixing all coordinates except the ith

coordinate, [36, Proposition 6.6.] tells us that there are at most k4k! possibilities for thepair (σi, σ

′i). So altogether there are at most 2k4 (k − 1)!

(k!2

)n−1paths that pass through

a given edge in ∂e (B) in this case.So in any case, there are at most 2k4 (k − 1)!

(k!2

)n−1paths that pass through a given

edge in ∂e (B).

Recall that∣∣∣F (za,b−1

)∣∣∣ = (k − 1)!(k!2

)n−1, and also |Bc| ≥ γ (k − 1)!

(k!2

)n−1since B is

a small fiber. Therefore

|∂e (B)| ≥ |B| |Bc|2k4 (k − 1)!

(k!2

)n−1 ≥γ

2k4|B| .

Now in G(za,b−1

)every ranking profile has no more than nk neighbors, which implies (118).


P

σ ∈ ⋃za,b−1

∂(B1

(za,b−1

)) ≥ γε

2n2k12.

Proof. Using the previous lemma and (116) we have

P

σ ∈ ⋃za,b−1

∂(B1

(za,b−1

)) =∑za,b−1

P(σ ∈ ∂

(B1

(za,b−1

)))≥

∑za,b−1 :B1

(za,b−1

)⊆Sm

(Ba,b;[a:b]1

)P(σ ∈ ∂

(B1

(za,b−1

)))

≥∑

za,b−1 :B1

(za,b−1

)⊆Sm

(Ba,b;[a:b]1

)γ

2nk5P(σ ∈ B1

(za,b))

=γ

2nk5P(σ ∈ Sm

(Ba,b

1

))≥ γε

2n2k12.

Next, we want to find manipulation points on the boundaries of boundaries.Before we do this, let us divide the boundaries of the boundaries according to which

direction they are in. If σ ∈ ∂(B1

(za,b−1

))for some za,b−1, then we know that there exists a

103

ranking profile π such that (σ, π) ∈ ∂e(B1

(za,b−1

)). We know that σ and π differ in exactly

one coordinate, say coordinate j; in this case we say that σ is on the boundary of B1

(za,b−1

)in direction j, and we write σ ∈ ∂j

(B1

(za,b−1

)). (This notation should not be confused

with that of the edge boundary.)

We can write the boundary of B1

(za,b−1

)as a union of boundaries in the different

directions:∂(B1

(za,b−1

))= ∪nj=1∂j

(B1

(za,b−1

)),

but note that this is not (necessarily) a disjoint union, as a ranking profile σ for which

σ ∈ ∂(B1

(za,b−1

))might lie on the boundary in multiple directions.

In particular, we differentiate between the boundary in direction 1 and the boundaryin all other directions. To this end we introduce the notation

∂−1

(B1

(xa,b−1

)):= ∪nj=2∂j

(B1

(xa,b−1

)).

With this notation we have the following corollary of Corollary 17.4.

Corollary 17.5. If (116) holds, then either

P(σ ∈ ∪

za,b−1∂−1

(B1

(za,b−1

)))≥ γε

4n2k12(119)

orP(σ ∈ ∪

za,b−1∂1

(B1

(za,b−1

)))≥ γε

4n2k12. (120)

Lemma 17.6. Suppose the ranking profile σ is on the boundary of a fiber for Ba,b;[a:b]1 in

direction j 6= 1, i.e.,σ ∈ ∪

za,b−1∂−1

(B1

(za,b−1

)).

Then there exists a 3-manipulation point σ which agrees with σ in all coordinates exceptperhaps coordinate 1 and some coordinate j 6= 1; furthermore σ1 is equal to σ1 or [a : b]σ1,except that the position of a third alternative c might be shifted arbitrarily, and σj is equalto σj or zσj for some adjacent transposition z ∈ T , except the position of b might be shiftedarbitrarily.

Proof. Suppose xa,b−1 (σ) = za,b−1. Since σ ∈ ∂(B1

(za,b−1

))⊆ B1

(za,b−1

), we know that f (σ) =

a, and if σ′ = [a : b]1 σ, then f (σ′) = b.

Let π = (πj , σ−j) denote the ranking profile such that (σ, π) ∈ ∂e

(B1

(za,b−1

)). Let

π′ := [a : b]1 π. Since π /∈ B1

(za,b−1

), (f (π) , f (π′)) 6= (a, b). Then, by Lemma 15.1, if

f (π) 6= f (π′), then one of π and π′ is a 2-manipulation point.

104

So let us suppose that f (π) = f (π′). If f (π′) = a, then one of σ′ and π′ is a 2-manipulation point by Lemma 15.1, since π′ = zjσ

′ for some adjacent transposition z 6=[a : b]. If f (π) = b, then similarly one of σ and π is a 2-manipulation point.

Finally let us suppose that f (π) = c for some c /∈ a, b. In this case Lemma 15.2 tellsus that there exists an appropriate 3-manipulation point σ.


P (σ ∈M3) ≥ γε

8n3k16. (121)

Proof. Lemma 17.6 tells us that for every ranking profile σ which is on the boundary of afiber for Ba,b;[a:b]

1 in some direction j 6= 1, there is a 3-manipulation point σ “nearby”; thelemma specifies what “nearby” means.

How many ranking profiles σ may give the same σ? At most 2nk4, which comes fromthe following: σ and σ agree in all coordinates except maybe two, one of which is thefirst coordinate; there are n − 1 < n possibilities for the other coordinate; in the firstcoordinate, σ1 is either σ1 or [a : b]σ1 (giving 2 possibilities), while some alternative c(k − 2 < k possibilities) might be shifted arbitrarily (at most k possibilities); in the othercoordinate j 6= 1, σj is equal to σj or zσj for some adjacent transposition z ∈ T (at mostk possibilities), except b might be shifted arbitrarily (k possibilities).

So putting this result from Lemma 17.6 together with (119) yields (121).

The remaining case we have to deal with is when (120) holds.

Lemma 17.8. Suppose the ranking profile σ is on the boundary of a fiber for Ba,b;[a:b]1 in

direction 1, i.e.,σ ∈ ∪

za,b−1∂1

(B1

(za,b−1

)).

Then either σ ∈ LD1 (a, b), or there exists a 3-manipulation point σ which agrees with σ inall coordinates except perhaps in coordinate 1; furthermore σ1 is equal to σ1, or [a : b]σ1

except that the position of a third alternative c might be shifted arbitrarily.

Proof. Just like the proof of Lemma 16.4.

The following corollary then tells us that either we have found many 3-manipulationpoints, or we have many local dictators on three alternatives in coordinate 1.

Corollary 17.9. Suppose (120) holds. Then either∑c/∈a,b

P(σ ∈ LD

a,b,c1

)= P (σ ∈ LD1 (a, b)) ≥ γε

8n2k12(122)

orP (σ ∈M3) ≥ γε

16n2k14.

105

17.2.1 Dealing with local dictators

So the remaining case we have to deal with in this small fiber case is when (122) holds,i.e., we have many local dictators in coordinate 1.

Lemma 17.10. Suppose σ ∈ LDa,b,c1 for some alternative c /∈ a, b. Define σ′ :=

(σ′1, σ−1) by letting σ′1 be equal to σ1 except that the block of a, b and c is moved to the topof the coordinate. Then

• either σ′ ∈ LDa,b,c1 ,

• or there exists a 3-manipulation point σ which agrees with σ in all coordinates exceptperhaps in coordinate 1; furthermore σ1 is equal to σ1 except that the position of a, band c might be shifted arbitrarily.


Corollary 17.11. If (122) holds, then either∑c/∈a,b

P(σ ∈ LDa,b,c1 , σ1 (1) , σ1 (2) , σ1 (3) = a, b, c

)≥ γε

16n2k13(123)

orP (σ ∈M3) ≥ γε

16n2k15.

Proof. Just like the proof of Corollary 16.7.

Now (123) is equivalent to∑c/∈a,b

P(σ ∈ LDa,b,c1 , (σ1 (1) , σ1 (2) , σ1 (3)) = (a, b, c)

)≥ γε

96n2k13. (124)

We know that

P ((σ1 (1) , σ1 (2) , σ1 (3)) = (a, b, c)) =1

k (k − 1) (k − 2)≤ 6

k3,

and so (124) implies (recall Definition 15.7)∑c/∈a,b

P(a,b,c)1

(σ ∈ LDa,b,c1

)≥ γε

576n2k10. (125)

Now fix an alternative c /∈ a, b and define the graph G(a,b,c) =(V(a,b,c), E(a,b,c)

)to

have vertex set

V(a,b,c) := σ ∈ Snk : (σ1 (1) , σ1 (2) , σ1 (3)) = (a, b, c)

106

and for σ, π ∈ V(a,b,c) let (σ, π) ∈ E(a,b,c) if and only if σ and π differ in exactly onecoordinate, and by an adjacent transposition in this coordinate. So G(a,b,c) is the subgraphof the refined rankings graph induced by the vertex set V(a,b,c).

LetT1 (a, b, c) := V(a,b,c) ∩ LD

a,b,c1 ,

and let ∂e (T1 (a, b, c)) and ∂ (T1 (a, b, c)) denote the edge and vertex boundary of T1 (a, b, c)in G(a,b,c), respectively.

The next lemma shows that unless T1 (a, b, c) is almost all of V(a,b,c), the size of theboundary ∂ (T1 (a, b, c)) is comparable to the size of T1 (a, b, c).

Lemma 17.12. Let c /∈ a, b be arbitrary. Write T ≡ T1 (a, b, c) for simplicity. IfP(a,b,c)

1 (σ ∈ T ) ≤ 1− δ, then

P(a,b,c)1 (σ ∈ ∂ (T )) ≥ δ

nk3P(a,b,c)

1 (σ ∈ T ) . (126)

Proof. The proof is essentially the same as the proof of Lemma 16.8, with a slight modifi-cation to deal with n coordinates. Let T c = V(a,b,c) \ T (a, b, c). For every (σ, σ′) ∈ T × T cwe define a canonical path from σ to σ′ by applying a path construction in each coor-dinate one by one, and then concatenating these paths. In all coordinates we apply thepath construction of [36, Proposition 6.4.], but in the first coordinate we only apply it toalternatives [k] \ a, b, c.

The analysis of this construction is done in exactly the same way as in Lemma 17.3;in the end we get that |∂e (T )| ≥ δ

k2|T |. Now every vertex in V(a,b,c) has no more than nk

neighbors, which implies (126).

The next lemma tells us that if σ is on the boundary of a set of local dictators ona, b, c for some alternative c /∈ a, b in coordinate 1, then there is a 4-manipulationpoint σ which is close to σ. The proof is similar to that of Lemma 16.9, but we have totake care of all n coordinates.

Lemma 17.13. Suppose σ ∈ ∂ (T1 (a, b, c)) for some c /∈ a, b. We distinguish two cases,based on the number of alternatives.

If k = 3, then there exists a (3-)manipulation point σ which differs from σ in at mosttwo coordinates, one of them being the first coordinate.

If k ≥ 4, then there exists a 4-manipulation point σ which differs from σ in at most twocoordinates, one of them being the first coordinate; furthermore, σ1 is equal to σ1 exceptthat the order of the block of a, b and c might be rearranged and an additional alternatived might be shifted arbitrarily; and in the other coordinate, call it j, σj is equal to σj exceptperhaps a, b and c are shifted arbitrarily.

Proof. Let π be the ranking profile such that (σ, π) ∈ ∂e (T1 (a, b, c)), let j be the coordinatein which they differ, and let z be the adjacent transposition in which they differ, i.e.,

107

π = zjσ. Since π /∈ T1 (a, b, c), there exists a reordering of the block of a, b, and c at thetop of π1 such that the outcome of f is not the top ranked alternative in coordinate 1. Callthe resulting vector π′1, and let π′ := (π′1, π−1). W.l.o.g. let us assume that π′1 (1) = a. Letus also define σ′ := zjπ

′. We distinguish two cases: j = 1 and j 6= 1.If j = 1 (in which case we must have k ≥ 5), π′ is a 2-manipulation point, since

f (σ′) = a.If j 6= 1, then there are various cases to consider. If the adjacent transposition z does

not move a, then either π′ or σ′ is a 2-manipulation point. So let us suppose that z = [a : d]for some d 6= a.

Clearly we must have f (π′) = d, or else π′ or σ′ is a 2-manipulation point. Supposefirst that d ∈ b, c. W.l.o.g. suppose that d = b.

Then take alternative c in coordinate j of both σ′ and π′, and bubble it to the blockof a and b simultaneously in the two ranking profiles. If along the way the value of theoutcome of the SCF f changes from a or b, respectively, then we have a 2-manipulationpoint by Lemma 15.1. Otherwise, we now have a, b, and c adjacent in both coordinates1 and j. Now rearranging the order of the blocks of a, b, and c in these two coordinates(which can be done using adjacent transpositions), we either get a 2-manipulation point byLemma 15.1, or we can define a new SCF on two voters and three alternatives, a, b, and c.This SCF takes on three values and it is also not hard to see that the outcome is not onlya function of the first coordinate, so by the Gibbard-Satterthwaite theorem we know thatthis SCF has a manipulation point, which is a 3-manipulation point of the original SCF f .

Now let us look at the case when d /∈ b, c. In this case we do something similar towhat we just did in the previous paragraph. In both σ′ and π′, first bubble up alternatived in coordinate 1 up to the block of a, b, and c, and then bubble b and c in coordinate j tothe block of a and d. All of this using adjacent transpositions. If the value of the outcomeof the SCF f changes from a or d, respectively, at any time along the way, then we havea 2-manipulation point by Lemma 15.1. Otherwise, we now have a, b, c and d adjacent inboth coordinates 1 and j, and we can apply the same trick to find a 4-manipulation point,using the Gibbard-Satterthwaite theorem.

The next corollary puts together Corollary 17.11 and Lemmas 17.12 and 17.13.

Corollary 17.14. Suppose (123) holds. Then if for every c /∈ a, b we have P(a,b,c)1 (σ ∈ T1 (a, b, c)) ≤

1− ε100k , then

P (σ ∈M4) ≥ γε2

345600n4k22.

Proof. We know that (123) implies∑c/∈a,b

Pa,b,c1 (σ ∈ T1 (a, b, c)) ≥ γε

576n2k10.

108

Now then using the assumptions, Lemma 17.12 with δ = ε100k and Lemma 17.13, we have

P (σ ∈M4) ≥∑

c 6=a,b

1

k3P(a,b,c)

1 (σ ∈M4) ≥∑

c/∈a,b

1

6nk8P(a,b,c)

1 (σ ∈ ∂ (T1 (a, b, c)))

≥∑

c/∈a,b

ε

600n2k12P(a,b,c)

1 (σ ∈ T1 (a, b, c)) ≥ γε2

345600n4k22.

So again we are left with one case to deal with: if there exists an alternative c /∈ a, bsuch that P(a,b,c)

1 (σ ∈ T1 (a, b, c)) > 1− ε100k . Define a subset of alternatives K ⊆ [k] in the

following way:

K := a, b ∪c ∈ [k] \ a, b : P(a,b,c)

1 (σ ∈ T1 (a, b, c)) > 1− ε

100k

.

In addition to a and b, K contains those alternatives that whenever they are at the top ofcoordinate 1 with a and b, they form a local dictator with high probability.

So our assumption now is that |K| ≥ 3.Our next step is to show that unless we have many manipulation points, for any alter-

native c ∈ K, conditioned on c being at the top of the first coordinate, the outcome of fis c with probability close to 1.

Lemma 17.15. Let c ∈ K. Then either

P(c)1 (f (σ) = c) ≥ 1− ε

50k, (127)

orP (σ ∈M2) ≥ ε

100k4. (128)


We now deal with alternatives that are not in K: either we have many manipulationpoints, or for any alternative d /∈ K, the outcome of f is not d with probability close to 1.

Lemma 17.16. Let d /∈ K. If P (f (σ) = d) ≥ ε4k , then

P (σ ∈M4) ≥ ε2

106n2k13.

Proof. The proof is very similar to that of Lemma 16.12: we do the same steps in the firstcoordinate as done in the proof of Lemma 16.12, and the fact that we have n coordinatesonly matters at the very end.

Let σ be such that f (σ) = d. We will keep coordinates 2 through n to be fixed asσ−1 throughout the proof. By bubbling alternatives d, a, and b in the first coordinate, we

109

can define σ′, σ(d,b,a), σ(d,a,b), σ(a,b,d), σ(a,d,b), σ(b,a,d), and σ(b,d,a) just as in the proof ofLemma 16.12. Again, we can show that either

P (σ ∈M2) ≥ ε

1600k3,


P(a,b,d)1

(σ(a,b,d) ∈ LDa,b,d1

)= P

(σ : σ(a,b,d) ∈ LDa,b,d1

)≥ ε

1600k. (129)

Define G(a,b,d) and T(a,b,d) analogously to G(a,b,c) and T(a,b,c), respectively.Suppose that (129) holds. We also know that d /∈ K, so Lemma 17.12 applies, and

then Lemma 17.13 shows us how to find manipulation points. We can put these argumentstogether, just like in the proof of Corollary 17.14, to show what we need:

P (σ ∈M4) ≥ 1

k3P(a,b,d)

1 (σ ∈M4) ≥ 1

6nk8P(a,b,d)

1 (σ ∈ ∂ (T1 (a, b, d)))

≥ ε

600n2k12P(a,b,d)

1 (σ ∈ T1 (a, b, d)) ≥ ε2

106n2k13.

Putting together the results of the previous lemmas, there is only one case to be covered,which is covered by the following final lemma. Basically, this lemma says that unless thereare enough manipulation points, our function is close to a dictator in the first coordinate,on the subset of alternatives K.

Lemma 17.17. Recall that we assume that D(f,NONMANIP

)≥ ε. Furthermore assume

that |K| ≥ 3, for every c ∈ K we have

P(c)1 (f (σ) = c) ≥ 1− ε

50k, (130)

and for every d /∈ K we haveP (f (σ) = d) ≤ ε

4k.

ThenP (σ ∈M2) ≥ ε

4k2. (131)


To conclude the proof in the small fiber case, inspect all the lower bounds for P (σ ∈M4)

obtained in Section 17.2, and recall that γ = ε3

103n3k24.

110

17.3 Large fiber case

We now deal with the large fiber case, when (117) holds for both boundaries, i.e., when

P(σ ∈ Lg

(Ba,b;[a:b]1

))≥ ε

nk7

andP(σ ∈ Lg

(Bc,d;[c:d]2

))≥ ε

nk7.

We differentiate between two cases: whether d ∈ a, b or d /∈ a, b.

17.3.1 Case 1

Suppose d ∈ a, b, in which case w.l.o.g. we may assume that d = a. That is, in the restof this case we may assume that

P(σ ∈ Lg

(Ba,b;[a:b]1

))≥ ε

nk7(132)

andP(σ ∈ Lg

(Ba,c;[a:c]2

))≥ ε

nk7. (133)

First, let us look at only the boundary between a and b in direction 1. Let us fix avector za,b−1 which gives a large fiber B1

(za,b−1

)for the boundary Ba,b;[a:b]

1 , i.e., we know that

P(σ ∈ B1

(za,b−1

) ∣∣∣σ ∈ F (za,b−1

))≥ 1− γ. (134)

Our basic goal in the following will be to show that conditional on the ranking profileσ being in the fiber F

(za,b−1

)(but not necessarily in F

(za,b−1

)), with high probability the

outcome of the vote is topa,b (σ1), or else we have a lot of 2-manipulation points or localdictators on three alternatives in coordinate 1.

Our first step towards this is the following.

Lemma 17.18. Suppose za,b−1 gives a large fiber B1

(za,b−1


1 . Then

P(a,b)1

(σ ∈ B1

(za,b−1

) ∣∣∣σ ∈ F (za,b−1

))≥ 1− kγ. (135)

Proof. We know that

P(

(σ1 (1) , σ1 (2)) = (a, b)∣∣∣σ ∈ F (za,b−1

))=

1

k − 1,

and so

P(a,b)1

(σ /∈ B1

(za,b−1

) ∣∣∣σ ∈ F (za,b−1

))= P(a,b)

1

(σ /∈ B1

(za,b−1

) ∣∣∣σ ∈ F (za,b−1

))= (k − 1)P

(σ /∈ B1

(za,b−1

), (σ1 (1) , σ1 (2)) = (a, b)

∣∣∣σ ∈ F (za,b−1

))≤ (k − 1) γ < kγ.

111

The next lemma formalizes our goal mentioned above.

Lemma 17.19. Suppose za,b−1 gives a large fiber B1

(za,b−1


1 . Theneither

P(f (σ) = topa,b (σ1)

∣∣∣σ ∈ F (za,b−1

))≥ 1− 2kγ (136)

orP(σ ∈M2

∣∣∣σ ∈ F (za,b−1

))≥ γ

2k(137)

orP(σ ∈ LD1 (a, b)

∣∣∣σ ∈ F (za,b−1

))≥ γ

2k. (138)

Proof. The proof of this lemma is essentially the same as that of Lemma 16.15, there areonly two slight differences. First, we use Lemma 17.18 to know that (135) holds. Second,

we take σ ∈ F(za,b−1

)to be uniform, and we stay on the fiber F

(za,b−1

)throughout the

proof: we modify only the first coordinate throughout the proof, in the same way as wedid for Lemma 16.15. We omit the details.

Now this lemma holds for all vectors za,b−1 which give a large fiber B1

(za,b−1

)for the

boundary Ba,b;[a:b]1 . By (132) we know that

P(σ : B1

(xa,b−1 (σ)

)is a large fiber

)≥ ε

nk7.

Now if (137) holds for at least a third of the vectors za,b−1 that give a large fiber B1

(za,b−1

),

then it follows thatP (σ ∈M2) ≥ γε

6nk8

and we are done. If (138) holds for at least a third of the vectors za,b−1 that give a large

fiber B1

(za,b−1

), then similarly we have

P (σ ∈ LD1 (a, b)) ≥ γε

6nk8,

which means that (122) also holds, and so we are done by the argument in Section 17.2.1.So the remaining case to consider is when (136) holds for at least a third of the vectors

za,b−1 that give a large fiber B1

(za,b−1

).

We can go through this same argument for the boundary between a and c in direction2 as well, and either we are done because

P (σ ∈M2) ≥ γε

6nk8

orP (σ ∈ LD2 (a, c)) ≥ γε

6nk8,

112

or for at least a third of the vectors za,c−2 that give a large fiber B2

(za,c−2

)we have

P(f (σ) = topa,c (σ2)

∣∣∣σ ∈ F (za,c−2

))≥ 1− 2kγ.

So basically our final case is if

P(σ ∈ F a,b1

)≥ ε

3nk7(139)

and alsoP (σ ∈ F a,c2 ) ≥ ε

3nk7. (140)

Notice that being in the set F a,b1 only depends on the vector xa,b (σ) of preferences betweena and b, and similarly being in the set F a,c2 only depends on the vector xa,c (σ) of preferences

between a and c. We know that(xa,bi (σ) , xa,ci (σ)

)ni=1

are independent, and for any given

i we know that∣∣∣E(xa,bi (σ)xa,ci (σ)

)∣∣∣ = 13 . Hence we can apply reverse hypercontractivity

(Lemma ??), to get the following result.

Lemma 17.20. If (139) and (140) hold, then also

P(σ ∈ F a,b1 ∩ F a,c2

)≥ ε3

27n3k21. (141)

Proof. See above.

The next and final lemma then concludes that we have lots of manipulation points.

Lemma 17.21. Suppose (141) holds. Then

P (σ ∈M3) ≥ ε3

54n3k27− 9γ

k3. (142)

Proof. First let us define two events:

I1 :=σ : f (σ) = topa,b (σ1)

I2 :=

σ : f (σ) = topa,c (σ2)

.

Using similar estimates as previously in Lemma 12.3, we have

P(σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2

)≥ P

(σ ∈ F a,b1 ∩ F a,c2

)− P

(σ /∈ I1, σ ∈ F a,b1 ∩ F a,c2

)− P

(σ /∈ I2, σ ∈ F a,b1 ∩ F a,c2

).

113

The first term is bounded below via (141), while the other two terms can be bounded usingthe definition of F a,b1 and F a,c2 , respectively:

P(σ /∈ I1, σ ∈ F a,b1 ∩ F a,c2

)≤ P

(σ /∈ I1, σ ∈ F a,b1

)≤ P

(σ /∈ I1

∣∣∣σ ∈ F a,b1

)≤ 2kγ,

and similarly for the other term. Putting everything together gives us

P(σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2

)≥ ε3

27n3k21− 4kγ.

If σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2 , then clearly we must have f (σ) = a, and therefore xa,b1 (σ) = 1and xa,c2 (σ) = 1. Now define σ′ from σ by bubbling up b in coordinate 1 to just below a,and bubbling up c in coordinate 2 to just below a. Either we encounter a 2-manipulationpoint along the way, or the outcome is still a: f (σ′) = a. If we encounter a 2-manipulationpoint along the way for at least half of such ranking profiles, then we are done:

P (σ ∈M2) ≥ 1

k2

(ε3

54n3k21− 2kγ

)=

ε3

54n3k23− 2γ

k.

Otherwise, we may assume that

P(σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2 , f

(σ′)

= a)≥ ε3

54n3k21− 2kγ.

In this case define σ′ := [a : b]1 σ′ and σ′′ := [a : c]2 σ

′. If f (σ′) /∈ a, b or f (σ′′) /∈ a, c,then we automatically have that one of σ′, σ′, σ′′ is a 2-manipulation point. If f (σ′) = band f (σ′′) = c, then by Lemma 15.2 we know that there exists a 3-manipulation point σwhich agrees with σ except perhaps a, b, and c could be arbitrarily shifted in the first twocoordinates. The final case is when a ∈ f (σ′) , f (σ′′). But we now show that this hassmall probability, and therefore (142) follows.

First let us look at the case of f (σ′) = a. We have

P(σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2 , f

(σ′)

= a, f(σ′)

= a)

=∑

za,b−1 :F(za,b−1

)⊆Fa,b1

P(σ ∈ I1 ∩ I2 ∩ F

((1, za,b−1

))∩ F a,c2 , f

(σ′)

= a, f(σ′)

= a)

=∑


)⊆Fa,b1

P(σ ∈ I1 ∩ I2 ∩ F a,c2 , f

(σ′)

= a, f(σ′)

= a∣∣∣σ ∈ F ((1, za,b−1

)))P(σ ∈ F

((1, za,b−1

)))

≤∑


)⊆Fa,b1

P(σ : f

(σ′)

= a∣∣∣σ ∈ F ((1, za,b−1

)))P(σ ∈ F

((1, za,b−1

))).

114

Now we know that σ′ ∈ F((−1, za,b−1

))⊆ F a,b1 , and we also know that

P(f (σ) 6= b

∣∣∣σ ∈ F ((−1, za,b−1

)))≤ 4kγ.

The number of σ’s that give the same σ′ is at most k2, and so we can conclude that

P(σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2 , f

(σ′)

= a, f(σ′)

= a)≤ 4k3γ,

and similarly

P(σ ∈ I1 ∩ I2 ∩ F a,b1 ∩ F a,c2 , f

(σ′)

= a, f(σ′′)

= a)≤ 4k3γ,

which shows that

P (σ ∈M3) ≥ 1

k6

(ε3

54n3k21− 2kγ − 8k3γ

)≥ ε3

54n3k27− 9γ

k3.

To conclude the proof in this case, recall that we have chosen γ = ε3

103n3k24.

17.3.2 Case 2

First, as in the previous case, we can look at simply the boundary between a and b indirection 1, and conclude that either there are many manipulation points, or there aremany local dictators, or (139) holds. This holds similarly for the boundary between c andd in direction 2. Finally, just as in Section 12.3.2, we can show that (139) and (140) cannothold at the same time. We omit the details.


Proof of Theorem 17.1. Our starting point is Lemma 15.3, which directly implies Lemma 17.2(unless there are many 2-manipulation points, in which case we are done). We then considertwo cases, as indicated in Section 17.1.

We deal with the small fiber case in Section 17.2. First, Lemmas 17.3, 17.6, and 17.8,and Corollaries 17.4, 17.5, 17.7, and 17.9 imply that either there are many 3-manipulationpoints, or there are many local dictators on three alternatives in coordinate 1. We thendeal with the case of many local dictators in Section 17.2.1. Lemma 17.10, Corollary 17.11,Lemmas 17.12, 17.13, Corollary 17.14, and Lemmas 17.15, 17.16, and 17.17 together showthat there are many 4-manipulation points if there are many local dictators on three alter-natives, and the SCF is ε-far from the family of nonmanipulable functions.

We deal with the large fiber case in Section 17.3. Here Lemmas 17.18, 17.19, 17.20,and 17.21 show that if there are not many local dictators on three alternatives, then thereare many 3-manipulation points. In the case when there are many local dictators, we referback to Section 17.2.1 to conclude the proof.

115

18 Reduction to distance from truly nonmanipulable SCFs

Proof of Theorem 15.8. Our assumption means that there exists a SCF g ∈ NONMANIPsuch that D (f, g) ≤ α. We distinguish two cases: either g is a function of one coordinate,or g takes on at most two values.

Case 1. g is a function of one coordinate. In this case we can assume w.l.o.g.that g is a function of the first coordinate, i.e., there exists a SCF h : Sk → [k] on onecoordinate such that for every ranking profile σ, we have g (σ) = h (σ1).

We know from the quantitative Gibbard-Satterthwaite theorem for one voter that forany β either D (h,NONMANIP (1, k)) ≤ β, or P (σ ∈M3 (h)) ≥ β3

105k16.

In the former case, we have that

D (g,NONMANIP (n, k)) ≤ D (h,NONMANIP (1, k)) ≤ β,

and so consequentlyD (f,NONMANIP (n, k)) ≤ α+ β.

In the latter case, we have that

P (σ ∈M3 (g)) = P (σ ∈M3 (h)) ≥ β3

105k16,

and so consequently

P (σ ∈M3 (f)) ≥ β3

105k16− 6nkα,

since changing the outcome of a SCF at one ranking profile can change the number of3-manipulation points by at most 6nk. Now choosing β = 100nk6α1/3 shows that either(92) or (93) holds.

Case 2. g is a function which takes on at most two values. W.l.o.g. we mayassume that the range of g is a, b ⊂ [k], i.e., for every ranking profile σ ∈ Snk we haveg (σ) ∈ a, b.

There is one thing we have to be careful about: even though g takes on at most twovalues, it is not necessarily a Boolean function, since the value of g (σ) does not necessarilydepend only on the Boolean vector xa,b (σ).

We now define a function h : Snk → a, b that is close in some sense to g and whichcan be viewed as a Boolean function h : a, bn → a, b because h (σ) depends on σ onlythrough xa,b (σ). (The vector xa,b (σ) ∈ −1, 1n encodes which of a and b is preferred ineach coordinate, and a vector in a, bn can encode the same information.) For a givenranking profile σ, let us consider the fiber on which it is on, F

(xa,b (σ)

), and let us define

g|F(xa,b(σ)) to be the restriction of g to ranking profiles in the fiber F(xa,b (σ)

). Then

define (see Definition 15.9)

h (σ) := Maj(g|F(xa,b(σ))

).

116

By definition, h (σ) depends on σ only through xa,b (σ), so we may also view h as a Booleanfunction h : a, bn → a, b.

For any given 0 < δ < 1, we either have D (g, h) ≤ δ, in which case D (f, h) ≤ α + δ,or if D (g, h) > δ, then we show presently that

P (σ ∈M2 (f)) ≥ δ

4nk5− nkα. (143)

Choosing δ = 8n2k6α then shows that either (93) holds, or D (f, h) ≤ 9n2k6α.Let us now show (143). We use a canonical path argument again, but first we divide

the ranking profiles according to the fibers with respect to preference between a and b.Let us consider an arbitrary fiber F

(za,b), and divide it into two disjoint sets: into

those ranking profiles for which the outcome of g and h agree, and those for which theseoutcomes are different. I.e.,

F(za,b)

= Fmaj(za,b)∪ Fmin

(za,b),

where

Fmaj(za,b)

=σ ∈ F

(za,b)

: g (σ) = h (σ),

Fmin(za,b)

=σ ∈ F

(za,b)

: g (σ) 6= h (σ).

By construction, we know that∣∣∣Fmin(za,b)∣∣∣ ≤ 1

2

∣∣∣F (za,b)∣∣∣ =1

2

(k!

2

)n.

Now for every pair of profiles (σ, σ′) ∈ Fmin(za,b)× Fmaj

(za,b)

define a canonical pathfrom σ to σ′ by applying a path construction in each coordinate one by one, and thenconcatenating these paths. In each coordinate we apply the path construction of [36,Proposition 6.6.]: we bubble up everything except a and b, and then finally bubble up thelast two alternatives as well.

For a given edge (π, π′) ∈ Fmin(za,b)×Fmaj

(za,b)

there are at most 2k4(k!2

)npossible

pairs (σ, σ′) ∈ Fmin(za,b)× Fmaj

(za,b)

such that the canonical path between σ and σ′

defined above passes through (π, π′). (This can be shown just like in the previous lemmas,e.g., Lemma 17.3.) Consequently we have∣∣∣∂e (Fmin

(za,b))∣∣∣ ≥ ∣∣Fmin

(za,b)∣∣ ∣∣Fmaj

(za,b)∣∣

2k4(k!2

)n ≥∣∣Fmin

(za,b)∣∣

4k4,

where the edge boundary ∂e(Fmin

(za,b))

is defined via the refined rankings graph restrictedto the fiber F

(za,b). Summing this over all fibers we have that∑

za,b

∣∣∣∂e (Fmin(za,b))∣∣∣ ≥∑

za,b

∣∣Fmin(za,b)∣∣

4k4≥ δ

4k4(k!)n , (144)

117

using the fact that D (g, h) > δ.Now it is easy to see that if (σ, σ′) ∈ ∂e

(Fmin

(za,b))

for some za,b, then either σ or σ′ isa 2-manipulation point for g. In the refined rankings graph every vertex (ranking profile)has n (k − 1) < nk neighbors, so each 2-manipulation point can be counted at most nktimes in the sum on the left hand side of (144), showing that

P (σ ∈M2 (g)) ≥ δ

4nk5,

from which (143) follows immediately, since changing the outcome of a SCF at one rankingprofile can change the number of 2-manipulation points by at most nk.

So either we are done because (93) holds, or D (f, h) ≤ 9n2k6α; suppose the lattercase. Our final step is to look at h as a Boolean function, and use a result on testingmonotonicity [32].

Denote by D the distance of h when viewed as a Boolean function from the set ofmonotone Boolean functions. Let 0 < ε < 1 be arbitrary. Then either D ≤ ε, in whichcase D (h,NONMANIP) ≤ D ≤ ε and therefore D (f,NONMANIP) ≤ 9n2k6α + ε, orD > ε. In the latter case we show that then

P (σ ∈M2 (f)) ≥ 2ε

nk− 9n3k7α. (145)

Choosing ε = 5n4k8α then shows that either (92) or (93) holds.Let us now show (145). Let us view h as a Boolean function, and denote by p (h) the

fraction of pairs of strings, differing on one coordinate, that violate the monotonicity con-dition. Goldreich, Goldwasser, Lehman, Ron, and Samorodnitsky showed in [32, Theorem2] that p (h) ≥ D

n .Now going back to viewing h as a SCF on k alternatives, this tells us that there are

at least ε22n pairs of fibers, which differ on one coordinate, that violate monotonicity. For

each such pair of fibers, whenever a and b are adjacent in the coordinate where the twofibers differ, we get a 2-manipulation point. Such a 2-manipulation point can be countedat most n times in this way (since there are n coordinates where a and b can be adjacent).Consequently, we have

|M2 (h)| ≥ ε

2· 2n · 2 (k − 1)!

(k!

2

)n−1

· 1

n=

2ε

nk(k!)n ,

i.e.,

P (σ ∈M2 (h)) ≥ 2ε

nk,

from which (145) follows immediately, since changing the outcome of a SCF at one rankingprofile can change the number of 2-manipulation points by at most nk.

118

Proof of Theorem 11.9. First we argue without specific bounds. Suppose on the contrarythat our SCF f does not have many 4-manipulation points. Then f is close to NONMANIPby Theorem 17.1. Consequently, by Theorem 15.8, f is close to NONMANIP, which is acontradiction.

Now we argue with specific bounds. Assume on the contrary that

P (σ ∈M4 (f)) <ε15

1039n67k166.

Then by Theorem 17.1 we have that D(f,NONMANIP

)< ε3

106n12k24, and consequently by

Theorem 15.8 we have D (f,NONMANIP) < ε, which is a contradiction.

19 Aggregation power of Boolean Functions

We now take a modern view of Condorcet Jury Theorem. First, recall the setting. Thereare two alternatives denoted + and −, which are apiori equally likely to be the preferablealternative, and that each voter independently receives the correct information with prob-ability p > 1/2 and incorrect information with probability 1 − p. We now denote the nsignals by x1, . . . , xn and aggregate them via a boolean function f : −1, 1n → −1, 1.The following fact is a generalization of the second part of the Jury Theorem.

Theorem 19.1. Let 0.5 < p ≤ 1. Then f : −1, 1n → −1, 1 maximizes P [f(x) = s]if and only if f(x) = + for all x such that

∑ni=1 xi > 0 and f(x) = − for all x such that∑

xi < 0.In particular when n is odd Majority is the only function that maximizes P [f(x) = s].

This is a special case of the Neyman-Pearson lemma for hypothesis testing in statistics.The proof is easy. In our special case it immediately follows from the fact that

P [s = +|x1, . . . , xn]

P [s = −|x1, . . . , xn]=P [x1, . . . , xn|s = +]

P [x1, . . . , xn|s = −]=

(p

1− p

)n+(x)−n−(x)

where n+(x) is the number of +s in x and similarly n−.Assume for a moment that s = +. Therefore, our interest is in the quantity P[f = +].

If f is a monotone function, we have the following Russo’s formula, where Pp denotes theproduct measure with marginals Pp[xi = +] = p:

Proposition 19.2. Let f : −1, 1n → −1, 1 be monotone then

d

dp

∣∣∣∣p=1/2

Pp[f = 1] =

n∑i=1

Ii(f).

119

One short proof of Proposition 19.2 is to consider an n variable function P(p1,...,pn)[f = 1]and applying the chain rule. In fact, if we define for Boolean functions

Jp,i(f) = Ep[Pp[f(x−i,+) 6= f(x−i,−)]],

then the same proof yields more generally that for monotone f : −1, 1n → −1, 1 itholds that

d

dp

∣∣∣∣p

Pp[f = 1] =

n∑i=1

Jp,i(f) =1

4p(1− p)

n∑i=1

Ip,i(f), (146)

whereIp,i(f) = E[V ar[f(x)|x−i]],

Lemma 19.3. In the setting of Condorcet voting with uniform prior on s, among allmonotone functions majority maximizes

d

dp

∣∣∣∣p=1/2

Pp[f = s] =n∑i=1

Ii(f)

Proof. Consider jurors who receive the correct signal with probability p. Then for anymonotone function f we have that

Pp[f = s] = 0.5(Pp[f = +] + P1−p[f = −]) = 0.5 + 0.5(Pp[f = +]− P1−p[f = +]).

We know that for every p, among all monotone functions, the pan-ultimae expression aboveis maximized for the Majority function. Taking the derivative of the ultimate expressionwith respect to p concludes the proof.

By examining the proof of Theorem 19.1 more carefully, we see that the Majorities arethe only maximizers of the influence sum.

Interestingly, dictators are the minimizers in their aggregation power. In particular wehave:

Proposition 19.4. For any Boolean function with f : −1, 1n → −1, 1 it holds that∑Ii(f) ≥ V ar[f ]. Equality holds if and only if f is a dictator or a constant.

Proof. This follows from the fact that∑i

Ii(f) =∑S

|S|f2(S), V ar[f ] =∑S 6=∅

f2(S).

Again, in fact the claim is true for every 0 < p < 1:

120

Proposition 19.5. For any Boolean function f : −1, 1n → −1, 1 it holds that∑Ip,i(f) ≥ V arp[f ]

Equality holds if and only if f is a dictator.

Exercise 19.6. Prove it.

In fact more is true:

Proposition 19.7. Among all f : −1, 1n → −1, 1 which are monotone and satisfyE1/2[f ] = 0, dictators are the only minimizers of Ep[f ] for all 0.5 < p < 1.

Proof. Assume by contradiction the statement is false. Then there exists a non-dictatorfunction f and 1 > r > 1/2 such that Pr[f = 1] = r. Let q = inf(r > 1/2 : Pr[f = 1] = r).Since Pr[f = 1] is a polynomial in r, it follows that Pq[f = 1] = q. Moreover, by Poincare’sinequality at 1/2, it follows that q > 1/2. Since Pr[f = 1] ≥ r for 1/2 ≤ r ≤ q it followsthat d/dr|r=qPr[f = 1] ≤ 1. However, by Poincare’s inequality at q it is strictly greaterthan 1. The proof follows.

20 The minimal Influence Problem and Coalitions

Poincare’s inequality:n∑i=1

Ii(f) =∑S

|S|f2(S) ≥ Var[f ]

implies that any function f : −1, 1n → R has a variable i with Ii(f) ≥ Var[f ]/n. This inturn implies the following:

Lemma 20.1. If f : −1, 1n → −1, 1 is a monotone function with E[f ] = 0, then forany 1 > ε > 0, there exists a set |S| ≤ εn such that E[f |∀i ∈ S, xi = 1] ≥ ε/4.

That is, a small linear fraction of the voters can change the expected value of the func-tion noticeably. For the proof the following lemma is useful to note that if f : −1,+1n →−1,+1 is monotone then:

f(i) = E[fxi] = E[E[fxi|x−i]] = Ex−i [V ar[f |x−i]] = Ii(f),

since conditioned on x−i, either f is a constant or f is the identity and in both cases,E[fxi|x−i] = V ar[f |x−i].

Proof. Construct S iteratively. Let f1 = f and let S0 = ∅. At stage i of the construction,let j be the variable with most influence of the function fi and let Si = Si−1∪j. Let fi+1

be defined as fi conditioned on xj = 1. If E[fi+1] ≥ ε we stop. Otherwise, we continue.Note that at each step we have E[fi+1 − fi] ≥ Ij(fi) ≥ 0.25/n. The proof follows.

121

It is natural to ask if the claim and proof are tight. We first ask if one can improvethe bound Ii(f) ≥ Var[f ]/n. For general functions the answer to the later question is no,as the function f = n−1/2

∑xi shows. For Boolean functions there is an example that is

almost tight i.e., the tribes function.

Example 20.2. Let m = 2r and let n = m log2m = mr and define the tribes functionf : 0, 1n → 0, 1 by

f(x) = (x1 ∧ . . . ∧ xr) ∨ (xr+1 ∧ . . . ∧ x2r) ∨ · · · ∨ (xn−r+1 ∧ . . . ∧ xn)

This example from [8] can be described as follows. The function is 1 if there is a decision togo to war and 0 otherwise. In order to go to war there has to be at least one tribe where allmembers decide to go to war, i.e, there has to exist a k so that xkr+1 = . . . = x(k+1)r = 1. Itis easy to see that the number of tribes that go to war is Bin(m, 1/m) which is approximatelyPoisson. In particular,

E[f ] = P[Bin(m, 1/m) ≥ 1] ∈ [0.1, 0.9],

for r ≥ 1. For ∂1f = 1, i.e., that the first coordinate is influential, we need that none ofthe tribes but the first tribe wants to go to war and that in tribe 1, everyone but the firstvoter wants to go to war. Thus

Ii(f) = P[∂1f = 1] = 2−r+1P[Bin(m− 1, 1/m) = 0] ≤ 2−r+1 =2

m≤ 2 log2(n)

n.

It is also easy to see from the equation above that in fact Ii(f) is of order log2(n)n .

Thus, we already know that the minimal influence is of order at least Var[f ]/n and wehave an example where it is of order Var[f ] log n/n. The KKL Theorem proven in the nextsection shows that the log n factor is in fact necessary.

Theorem 20.3. There exists a constant c such that for all f : −1, 1n → −1, 1 it holdsthat

miniIi(f) ≥ c log n

nVar[f ]

Repeating the proof of Lemma 20.1 we obtain the following:

Lemma 20.4. Let f : −1, 1n → −1, 1 be a monotone function with E[f ] = 0, then forany ε > 0, there exists a set |S| ≤ c(ε)n/ log n such that E[f |∀i ∈ S, xi = 1] ≥ 1− ε.

Interestingly, the tribes example is far from tight for Lemma 20.4 as it suffices to takeS of size log n (e.g., a tribe) to fix the value of the function to 1. There is a famousexample by Ajtai and Linial where a coalition of size of order n/ log2 n is needed to shiftthe expected value of f [1] by a constant.

122

21 Semi-Groups, Talagrand and the KKL Theorem

In this section we will write Pt = Te−t so that Pt is a semi-group, so Ps+t = PsPt.

Lemma 21.1.

Ptf =∑S

f(S)e−t|S|xS , (∇if)(x) =1

2(f(x−i, 1)− f(x−i,−1)) =

∑S:i∈S

f(S)xS\i

Note that∇i(Ptf) =

∑S:i∈S

f(S)e−t|S|xS\i = e−tPt∇fi.

and that

d

dtPtf = −

∑S

|S|f(S)e−t|S|xS = −n∑i=1

xi∇i(Ptf) = −e−tn∑i=1

xiPt∇if

sod

dtE[fPtg] = −e−t

n∑i=1

E[fxiPt∇ig] = −e−tn∑i=1

E[∇ifPt∇ig]

Claim 21.2. If f and g are monotone functions then E[fPtg] is monotonically decreasingin t.

Proof. Assume f and g are increasing then the proof follows since ∇if and ∇ig are alwaysnon-negative.

Claim 21.3. Let f1, . . . , fk be positive and monotone increasing then E[Ptf1Ptf2 . . . Ptfk]is monotone decreasing.

Proof. Write the derivative as a sum of partial derivatives.

Remark 21.4. The proofs are so easy that the claims can be generalized much further.We need MC that respects and such the generator only moves to comparable elements.

We now turn to applications a la Cordero-Ersquin and Ledoux [18] of the derivativeformula which can be thought of as an integration by parts formula.

Claim 21.5.

Cov(f, g) =

∫ ∞0

e−tn∑i=1

E[∇ifPt∇ig]dt

Proof.

Cov(f, g) = E[fP0g]− E[fP∞g] = −∫ ∞

0

d

dtE[fPtg]dt =

∫ ∞0

e−tn∑i=1

E[∇ifPt∇ig]dt

as needed.

123

We can now prove some useful corollaries.

Claim 21.6.

|Cov(f, g)| ≤n∑i=1

√Ii(f)Ii(g)

Proof. Note that

|E[∇ifPt∇ig]| ≤ ‖∇if‖2‖Pt∇ig‖2 ≤ ‖∇if‖2‖∇ig‖2 =√Ii(f)Ii(g).

The proof follows by integration.

Remark 21.7. The proof again generalizes assuming only the Cordero Eresquin setup andspectral gap.

If instead of the contractive estimate above we use the hyper-contractive inequality:

‖E[∇ifPt∇ig]| ≤ ‖∇if‖1+e−t‖∇ig‖1+e−t

We get the following bound:

|Cov(f, g)| ≤∫ ∞

0e−t

n∑i=1

‖∇if‖1+e−t‖∇ig‖1+e−tdt =

∫ 2

1

n∑i=1

‖∇if‖v‖∇ig‖vdv

Now By Holder inequality

‖f‖v ≤ ‖f‖1−θ(v)2 ‖f‖θ(v)

1 = ‖f‖2(‖f‖1‖f‖2

)θ(v)

,

where vθ + 0.5(1− θ)v = 1 so 0.5θv = 1− 0.5v so θ = 2/v − 1. Writing

ri =‖∇if‖1‖‖∇ig‖1‖∇if‖2‖∇ig‖2

,

we see that

|Cov(f, g)| ≤n∑i=1

‖∇if‖2‖∇ig‖2∫ 2

1rθ(v)i dv

Note that θ(v) = 2/v − 1 ≥ 1− v/2 so∫ 2

1rθ(v)i dv ≤

∫ 2

1r

1−v/2i dv = 2

∫ 1/2

0rvi dv =

1

ln ri(ri − 1) ≤ 5

1 + ln(1/ri)

We thus obtain

124

Claim 21.8.

|Cov(f, g)| ≤ 5n∑i=1

√Ii(f)Ii(g)

1 + ln (‖∇if‖2‖∇ig‖2/(‖∇if‖1∇‖ig‖2))

Remark 21.9. This is of course just a slight generalization of the proof by [18] of Tala-grands’s result [78].

V ar[f ] ≤ 5

n∑i=1

Ii(f)

1 + ln (‖∇if‖2/(‖∇if‖1))

We claim that furthermore the last inequality implies the KKL Theorem that for Booleanfunctions f : −1, 1n → 0, 1 there exists an i such that Ii(f) ≥ cV ar[f ] log n/n. This istrue since for Boolean functions ∇if ∈ 1, 0,−1 and therefore

‖∇if‖2 =√Ii(f), ‖∇if‖1 = Ii(f).

So for such function we get that

V ar[f ] ≤ 10

n∑i=1

Ii(f)

1− ln Ii(f)

So the there exist an i such that

Ii(f)

1− ln Ii(f)≥ cV ar[f ]/n

and therefore Ii(f) ≥ cV ar[f ] log n/n.

In the case where f and g are monotone in the same direction and moreover f and gare Boolean we can apply reverse-hyper-contraction instead to

E[∇ifPt∇ig] ≥ ‖fi‖1−e−t‖gi‖1−e−t = (Ii(f)Ii(g))1/(1−e−t)

A careful analysis of the integral lets us conclude that

Cov(f, g) ≥ cn∑i=1

Ii(f)Ii(g)√1 + 1/ log Ii(f)

√1 + 1/ log Ii(g)

This is following the proof in Keller-M-Sen in the Gaussian case.

125

21.1 Further applications for aggregation of information

Remark 21.10. The KKL Theorem and its proof immediately extend from the measure2−n(δ−1 + δ1)⊗n to any measure of the form (pδ−1 + (1 − p)δ1)⊗n for any fixed p, sincehyper-contractive estimates are known for such measures.

Allowing varying values of p in the KKL Theorem has interesting implications foraggregation of information.

We call a function f : −1, 1n → −1, 1n symmetric, if there exists a group Π who actstransitively on [n] such that f(π(x)) = f(x) for all π ∈ Π, where π(x) = (xπ(1), . . . , xπ(n)).Recall that Π acts transitively on [n] if for all i, j ∈ [n], there exists π ∈ Π with π(i) = j.

Corollary 21.11. Let f : −1, 1n → −1, 1 be monotone and symmetric and assumef(−x) = −f(x) for all x. Then for any ε > 0, there exists a c(ε) > 0 such that if

E1/2+c/ logn[f ] ≥ 1− ε.

Proof. Use Russo’s formula and the KKL Theorem.

Of course there are examples, where the theorem is far from tight. For example for themajority function, we know that the statement of the theorem holds when p = 1/2+c/

√n.

Exercise 21.12. Consider the tribes functions and show that the statement of the theoremcannot be improved.

Exercise 21.13. Show that for the recursive Majority of threes functions, the statementof the theorem holds for p = 1/2 + n−δ for some δ > 0.

These results can be further generalized to scenarios involving multiple candidates:

21.2 Monotonicity and Aggregation for multiple alternatives

Let A be a finite set. Let X = An. For σ ∈ S(n), a permutation on n elements and x ∈ Anwe denote by y = xσ the vector satisfying yi = xσ(i) for all i. For σ ∈ S(A) we writey = σ(x) for the vector satisfying yi = σ(xi) for all i. For a ∈ A and x, y ∈ X we writex ≤a y if i : xi = a ⊂ i : yi = a and for all i such that yi 6= a it holds that xi = yi. Inother words if x ≤a y then for all i if xi 6= yi then yi = a. It is easy to see that ≤a definesa partial order on X.

Definition 21.14. We say that f : X = An → A is monotone if for all a ∈ A and x, y ∈ Xsuch that x ≤a y it holds that f(x) = a implies that f(y) = a. We say that f is symmetricif there exists a transitive group Σ ⊂ S(n) such that f(xσ) = f(x) for all x ∈ X and σ ∈ Σ.We say that f is fair if for all σ ∈ S(A) and all x ∈ X it holds that f(σ(x)) = σ(f(x)).

Let ∆[A] denote the simplex of probability measures on A and let γ denote the standardprobability measure on ∆[A]. For µ ∈ ∆[A] denote by Pµ the measure µ⊗n on X. Wedenote by Eµ the expected value according to the measure Pµ. For any measure µ ∈ ∆[A],we write µ∗ for the minimal probability µ assigns to any of the atoms in A.

126

It is not too hard to generalize the proof of the KKL Theorem to obtain the followingtheorems, see cite

Theorem 21.15. There exists an absolute constant C = C(|A|) such that if f is symmetricand monotone then for any a ∈ A and 1/2 > ε > 0 it holds that

γ[µ : ε ≤ Pµ[f = a] ≤ 1− ε] ≤ C(log(1− ε)− log(ε))log log n

log n.

Theorem 21.16. For every q there exists a constant C = C(q) for which the followingholds for every 0 < ε < 1/3. Let µ ∈ ∆(q) and i ∈ [q] satisfy that

µ(i) > maxj 6=i

µ(j) + C(log(1− ε)− log(1/q))log log n

log n.

Then for every fair monotone symmetric function f : [q]n → [q] it holds that

µ[f = i] ≥ 1− ε.

The results above establishes sharp threshold for symmetric functions as it shows thatfor almost all probability measures f takes one specific value with probability at least 1−ε.

Remark 21.17. It is interesting to compare the results established here to those of [25].For |A| = 2 our results give a threshold interval of length O(log log n/ log n) compared tothe results of [25] which give threshold interval of length O(1/ log n). The example of thetribes function shows that a better bound is impossible. It is natural to conjecture thatthe threshold is always of measure O(1/ log n).

21.3 Applications to Majority/Plurality Dynamics

An interesting application of the theorems above is to Majority and Plurality dynamics [].We will concentrate on Majority dynamics.

Consider a graph G where all the degree are odd and an assignment of opinions Xv(0)for nodes of the graph. Consider the following dynamics, where Xv(t+ 1) = maj(Xw(t) :w ∼ v).

It is natural to ask a few questions: Does this process converge? It is easy to see thatthe answer to this question is no. This can be seen for example in the case where G isbi-partite graph where the initial opinions on one side are + and on the other −.

Interestingly a finer notion of convergence is known to hold []

Proposition 21.18. For each v, the sequence Xv(2t) converges.

127

Proof. Let

Jv(t) = (Xv(t+ 1)−Xv(t− 1))∑w∼v

Xw(t), J(t) =∑v

Jt(v)

We first note that Jv(t) ≥ 0 and Jv(t) = 0 if and only if Xv(t + 1) = Xv(t − 1). Fromthis we conclude that Jt ≥ 0 and Jt = 0 if and only if Xv(t + 1) = Xv(t − 1) for all v.Thus it suffices to show that Jt = 0 from some point on. Also note that if Jt = 0 thenXv(t+ 2i− 1) = Xv(t− 1) for all i.

For this let Lt =∑

u∼v(Xv(t+1)−Xu(t))2 be a measure of disagreement. We claim thatLt−Lt−1 = −Jt. Note that this implies that Lt is decreasing and therefore at some point wewill have to have Jt = 0 as needed. It is straightforward to check that Lt−Lt−1 = −Jt.

Let us write Xv = limt→∞Xv(2t). Given that this process converges, it is natural toask if it retains information in the following sense: Suppose that Xv(0) are chosen i.i.d.pδ1 + (1− p)δ−1 for p > 1/2. What is the probability that the information is retained, i.e.,what is

Pp[maj(Xv) = +]?

KKL Theorem allows us to answer this question in the special case where the graph istransitive.

22 Aggregation for non-product measures

In this section we ask the following question: can aggregation results such as the Jury The-orem and the KKL theorem be extended to non-product measures. This section providessome answers to this question. The section is based on [34]

To describe a more general settings consider the following framework. Let f : 0, 1n →0, 1 be a Boolean function. We will assume that f is

• monotone non-decreasing, i.e.,

(∀i : xi ≥ yi) =⇒ f(x1, . . . , xn) ≥ f(y1, . . . , yn),

• and anti-symmetric, i.e.,

f(1− x1, 1− x2, . . . , 1− xn) = 1− f(x1, x2, . . . , xn).

The purpose of this section is to study extensions of the weak law of large numbersin the context of general probability distributions. Let µ be a probability distribution on0, 1n. When µ is not a product measure the notion of influence can be extended indifferent way compared to the above. Define the effect of the k’th variable on the Booleanfunction f as the difference between the expected value of f(x1, . . . , xn) conditioned on

128

xk = 1 and the expected value of f(x1, . . . , xn) conditioned on xk = 0, and denote by eµk(f)the effect of the k’th variable for the Boolean function f , w.r.t. the distribution µ. Moreprecisely,

eµk(f) = µ[f(X1, . . . , Xn)|Xk = 1]− µ[f(X1, . . . , Xn)|Xk = 0]. (147)

The effect is undefined if the probability for Xk = 1 is 1 or 0. Writing µ[Xk] = p andYk = Xk − p, we get

Covµ

[f(X1, . . . , Xn), Xk] = µ[f(X1, . . . , Xn)Yk]

= pµ[(1− p)f |Xk = 1] + (1− p)µ[−pf |Xk = 0]

= p(1− p)eµk(f)

so that the effect may be interpreted as a normalized form of the correlation between theindividual vote and the election’s outcome.

When µ represent a product probability measure (??), the effect (147) and the influencecoincide, but in general this is not the case. For instance, for general µ the effect may benegative (an example will appear later) while the influence is of course always non-negative.

It is not true that for general probability distributions and general f , small influencesimplies aggregation of information. Our main result is that small effects implies aggregationof information for the particular case of weighted majority functions. Moreover, the boundsfor weighted majorities are rather realistic.

We call monotone antisymmetric function f a weighted majority function if there existsnon-negative weights w1, . . . , wn, not all zero such that f(x1, . . . , xn) = 1 if

∑ni=1wi(2xi −

1) > 0 and and f(x1, . . . , xn) = 0 if∑n

i=1wi(2xi− 1) < 0. If n is odd and wi = 1 for everyi, f is called the majority function (or simple majority).

Note that in our definition of a weighted majority function, if∑wi(2xi − 1) = 0 then

the value of f(x) may be either 0 or 1 as long as f is monotone and anti-symmetric. Thisis different from the traditional definition of a weighted majority (or threshold) functionwhere f(x) = 1 iff

∑wi(2xi − 1) > 0 and f(x) = 0 iff

∑wi(2xi − 1) < 0.

Thus for example, any monotone anti-symmetric function f : 0, 1n → 0, 1 satisfyingf(x) = 1 when x1 = x2 = 1 and f(x) = 0 when x1 = x2 = 0 is a weighted majority function(taking w1 = w2 = 1 and w3 = · · · = wn = 0) according to our definition.

The above example demonstrates that under our definition of weighted majority func-tions, there are at least 22n−2

weighted majority functions. Under the traditional definitionthe number of weighted majority functions is at most 2n

2[15, 75].

Of particular interest are voting schemes where all the voters have the same power. Onesuch case is when f is invariant under a transitive group of permutations. In other wordsthere exists a group of permutation Γ ⊂ Sn such that f(x1, . . . , xn) = f(xσ(1), . . . , xσ(n))for all σ ∈ Γ and for all 1 ≤ i, j ≤ n there exists σ ∈ Γ such that σ(i) = j; here Sn denotesthe full permutation group on n elements. One instructive example is the simple majorityfunction when n is odd which is invariant under Sn; another is the recursive majority

129

function RMk,` which is defined for n = k` where k is odd. The definition is by induction.RMk,1 is just the majority function on k bits and

RMk,`+1(x1, . . . , xk`+1) = RMk,1

(RMk,`(x1, . . . , xk`), . . . , RMk,`(xk`−k`−1+1, . . . , xk`)

).

Theorem 22.1.

(a) For every p > 12 , ε > 0 there is δ = δ(p, ε) > 0 such that for every weighted majority

function f and any distribution µ on 0, 1n, if eµk [f ] ≤ δ and µ[Xk = 1] ≥ p for allk then µ[f ] ≥ 1− ε.In other words, if the effect of each variable is at most δ and the probability that eachvariable is 1 is at least p, then f = 1 with µ-probability at least 1− ε.

(b) If f is a monotone anti-symmetric function but not a weighted majority function,then there exists a probability distribution µ such that µ[Xk = 1] > 1/2 for all k, yetµ[f ] = 0 and eµk(f) = 0 for all k.

In other words, if f is not a weighted majority function, then there is a probabilitymeasure µ for which f = 0 with µ-probability 1, yet µ[Xk = 1] > 1

2 for all k. (Sincef is constant according to the measure µ, all the effects are 0 in this case.)

(c) If f is monotone anti-symmetric and invariant under a transitive group, but is not the(simple) majority function, then then there exists a probability distribution µ suchthat µ[Xk = 1] > 1/2 for all k, yet µ[f ] = 0 and eµk(f) = 0 for all k.

The rest of this section is organized as follows. In subsection 22.1 we will discussthe notions of aggregation of information, influences and effects for general probabilitydistributions on 0, 1n. We will try to examine what aggregation of information meanswhen we do not suppose that the probability distribution for the voter’s behavior is aproduct distribution. We also examine to what extent our technical notion of “effects”represent real influence in the non-technical sense of the words. Subsection 22.2 containsthe proof of our theorem and in subsection 22.3 we present several natural problems as wellas an example showing that Theorem ?? does not extend to arbitrary Boolean monotonefunctions even for the restricted class of FKG-distributions.

22.1 Voting games, information aggregation and notions of influence

Consider the following scenario. Every agent k receives a single bit of information si whichis either ‘Vote for Alice’ or ‘Vote for Bob’ and these signals are independent. When Aliceis the better candidate the probability of receiving the signal ‘Vote for Alice’ is p > 1/2.Condorcet’s Jury Theorem deals with the case that the voters vote precisely as the signaldictates and the decision is made according to the simple majority rule. It asserts that forevery p > 1/2 the better candidate will be elected with probability tending to one. Thus

130

the majority rule allows to reveal the actual state of the world from rather weak individualsignals.

A major problem in the economic and political interpretation of Condorcet’s JuryTheorem and its extensions arises from the fact that the basic assumption of probabilityindependence among voters is quite unrealistic. Without the assumption of independence,Condorcet’s Jury Theorem as stated is no longer true, and it will no longer be the casethat when each individual votes for Alice with probability p > 1/2, Alice will win with ahigh probability.

To see this, consider the following example. As before, we have an election betweenAlice and Bob and Alice is the superior candidate. The distribution of signals s1, s2, . . . , snwill be biased towards Alice as follows: Let p = 1/2 + ε/2, where ε is small. First choose atrandom a number t uniformly in the interval [ε, 1]. Then, independently for each i, choosethe i’th voter signal si to be ‘1’ with probability t and ‘0’ with probability 1 − t. Voterswith si = 1 will vote for Alice. In this case, the probability for each individual signal sibeing ‘1’ is p but the individual signals are not independent. The probability that Alicewill win is below 1

2(1−ε) for any number of voters. This is because we can think of t beingchosen in two stages. First we toss a coin which is ’H’ with probability ε/(1 − ε). If thecoin is ’H’, then t is chosen uniformly in the interval [1 − ε, 1]. This contributes to theprobability that Alice wins at most ε/(1−ε). If the coin is ’T’ then t is chosen uniformly inthe interval [ε, 1− ε]. Here by symmetry, Alice and Bob have the same chance of winning.Thus the contribution to the probability that Alice will win from this case is 1−2ε

2(1−ε) . Thus

the overall probability that Alice will win is at most 1−2ε2(1−ε) + ε

1−ε = 12(1−ε) .

An even more extreme example is the case in which all voters vote in the same way:With probability p they all vote for Alice and with probability 1− p they all vote for Bob.Alice will be elected with probability p regardless of the number of voters when the electionis based on simple majority and for every other simple game.

These simple examples will help us to examine the notions of information aggregationand influence in the case when the assumption of probability independence is dropped.The problem in these examples is not in the way information aggregates but in the qualityof the information to start with. This assertion can be formalized as follows. Suppose thatAlice and Bob are given an a-priori probability 1/2 of being the superior candidate. Weassume that the distribution of voters for Bob given Bob is the superior candidate is thesame as the distribution of voters for Alice given that Alice is the superior candidate. Thusin the first example above if Bob is superior then we choose t uniformly in [ε, 1] and theneach voter votes for Bob independently with probability t. In the second example if Bobis superior, then all voters will vote for Bob with probability p.

We now wish to decide between the hypothesis that Alice is the superior and thehypothesis that Bob is the superior candidate given the entire vector of individual signals.It is intuitively clear and easy to prove using the Neyman-Pearson Lemma that in bothcases described above one should guess that Alice is superior to Bob exactly when the

131

majority of voters voted for Alice. However, in both examples above the probability thatthe majority will vote Alice when Alice is superior is bounded away from 1 and tends to1/2 as p does.

When we consider general distributions, the issue is to understand what information wecan derive on the superior alternative from knowing the signals of all individuals and howthe voting mechanism extracts this information. Note that in the examples we consideredabove the individual effects are large while the individual influences are small. This ismost transparent in the second example where if f is the majority function and n ≥ 3,then all of the influences are 0, while the effect of all voters are 1. Theorem 22.1 assertsthat for the weighted majority voting rule (and only for these rules) for every probabilitymeasure on 0, 1n, small individual effects implies asymptotically complete aggregation ofinformation.

Let us next consider the notion of influence without probability independence. Thenotion of pivotal variables (or players) and influence is of important technical importancein various areas of mathematics, computer science and economics. This notion is also of aconsiderable conceptual importance. The voting power index of Banzhaf is based on mea-suring the influence with respect to the uniform distribution. The Shapley–Shubik powerindex can also be based on the influence with respect to another distribution. Conceptualunderstanding of voting power in situations where the voters’ behavior is not independentis of great interest. In [29] the authors propose to define the voting power as the probabilityto be pivotal based on realistic assumptions on individual voting distributions. We makethe following remarks on the notion of individual effects which is quite a different extensionof influence and voting power measures to arbitrary probability distributions.

(i) For general distributions, the effect of an agent can be negative. This will be the casefor a voter who always votes for the candidate who is the underdog in the electionpolls and also for a committee member who antagonizes the other members of thecommittee. (On the other hand, the influence of an agent is always nonegative,because it is defined as a represents a probability.)

(ii) A dummy (a voter k which is never pivotal) has zero influence (with respect to everyprobability distribution). He may nevertheless have a large effect, such as if he alwaysvotes for the candidate who is expected to win according to election polls. In reallife, this will also be the case for an observer on a committee without the right tovote but who is likely to convince the committee of his opinion. Note that in the firstcase we do not attribute to that player real “influence” in the (non-technical) Englishsense of the word, while in the second case we would consider him “influential”. Theuncertainty in interpreting effects as real “influences” is genuine.

(iii) What is the motivation for a voter to vote, given the small probability for him tobe pivotal? This is a social dilemma, related to, e.g., the so-called tragedy of thecommons, and has been extensively discussed in the political science and philosophy

132

literature. (Sometimes the term voting paradox has been used for this dilemma, butmay cause some confusion as the same term is used also for Condorcet’s famousobservation that when three or more choices are available, the majority preferencebetween them need not be transitive.) A possible solution to the dilemma may lie inthe fact that in real-life elections, individual effects tend to be large, namely boundedaway from zero regardless of the size of the society. The uncertainty in regardingeffects as real “influence” may suggest that it is the effect of an agent rather than hisinfluence which is related to his “satisfaction” with the social decision process andhis ability to identify with the collective choice.

22.2 Proof of Theorem 22.1

22.2.1 Part (a) of the theorem

We begin this section by providing a probabilistic proof of the following result, whichclearly implies Theorem 22.1 (a).

Lemma 22.2. Let (wi)ni=1 be non-negative weights which are not all 0, let 0 < q < 1, and

let f : 0, 1n → −1, 1 be a function which satisfies

f =

1 if

∑ni=1(2xi − 2q)wi > 0

0 if∑n

i=1(2xi − 2q)wi < 0.

Write W =∑n

i=1wi. Suppose furthermore that p > q and that µ is a probability measuresatisfying µ[Xi] = pi and

n∑i=1

wipi = pW (148)

as well asn∑i=1

wipi(1− pi)eµi [f ] ≤ p(1− p)δW. (149)

Then

µ[f ] ≥ 1− δp(1− p)p− q

.

(Note that (148) holds if µ[Xi] = p for all i, and that (149) holds if µ[Xi] = p andeµi [f ] ≤ δ for all i, so that indeed Theorem 22.1 (a) follows.)

Proof. Let X =∑n

i=1(2Xi − 2q)wi. We start by noting that µ[X] = (2p− 2q)W .We let g = 1− f and Yi = pi −Xi, so that

µ[Yig] = Covµ

[f,Xi] = pi(1− pi)eµk [f ].

133

Note that conditioned on g = 1,∑n

i=1(2Xi − 2q)wi ≤ 0 and therefore∑n

i=1wiYi ≥(p− q)W . It follows that

µ

[(n∑i=1

wiYi

)g(X1, . . . , Xn)

]≥ (p− q)Wµ[g]

= (p− q)W (1− µ[f ]). (150)

On the other hand,

µ

[(n∑i=1

wiYi

)g(X1, . . . , Xn)

]=

n∑i=1

wiµ[Yig(X1, . . . , Xn)]

=n∑i=1

wipi(1− pi)eµi [f ]

≤ p(1− p)δW. (151)

Combining (150) and (151), we get that

µ[f ] ≥ 1−∑n

i=1wipi(1− pi)eµi [f ]

(p− q)W

≥ 1− δp(1− p)p− q

.

22.2.2 Parts (b) and (c) of the theorem

We note that part (c) of Theorem 22.1 follows immediately from part (b), because the onlyweighted majority function that is invariant under a transitive group, is simple majority.Let us nevertheless begin by giving an independent and simple proof of part (c). Notethat if f is not the majority function then there is a vector (x1, x2, . . . , xn) ∈ 0, 1n suchthat f(x) = 0 and x1 + x2 + · · · + xn > n/2. Then we can simply take µ to be uniformprobability distribution on the orbit of x under Γ. It is then easy to see that µ[Xk] > 1/2for all k and that µ[f = 0] = 1.

We now turn to the proof of Theorem 22.1 (b). We will show that if f is not a weightedmajority function, then there exists a measure µ satisfying µ[Xk] > 1/2 for all k andµ[f = 0] = 1.

Define [n] = 1, 2, . . . , n. For S ⊂ [n] put xS = (x1, x2 . . . , xn) where xi = 1 if andonly if i ∈ S. Let H be a hypergraph whose set of vertices is [n] and whose edges aresubsets S of [n] such that f(xS) = 0. Let τ∗ = τ∗(H) be the fractional cover numberof H, i.e., the infimum over all ν : 0, 1n → R of

∑S∈H ν[xS ], under the condition that

134

ν(xS) ≥ 0 for every S ∈ H and∑

S∈H,k∈S ν[xS ] ≥ 1 for all k. We get τ∗ =∞ if there areno ν satisfying the two conditions above (note that this is the case if f(x) = x1, say).

If τ∗ < 2, then we can define µ(S) = 0 if f(S) = 1 and µ(S) = ν(S)/τ∗ when f(S) = 0.The probability measure µ satisfies that∑

S:k∈S,f(S)=0

µ(xS) ≥ 1/τ∗ > 1/2

for every k and µ[f = 0] = 1 as stated in the theorem. Therefore, in order to prove part(b) of the theorem, it only remains to analyze the case τ∗ ≥ 2.

A well known equivalent (by linear programming duality) definition of of τ∗ is as thesupremum of

∑ni=1wi under the condition that wk ≥ 0 for k = 1, 2, . . . , n and

∑wi : i ∈

S ≤ 1 for every S ∈ H.Assume first that τ∗ > 2. In this case we can find wi’s such that

∑iwi > 2 and

f(x1, . . . , xn) = 1 if∑wixi > 1. By slightly perturbing the wi we may assume that for all

x ∈ 0, 1n it holds that∑

iwixi 6=12

∑iwi in addition to the properties that

∑iwi > 2

and f(x1, . . . , xn) = 1 if∑wixi > 1. Let g(x) = 1 if

∑iwixi >

12

∑iwi and g(x) = 0 if∑

iwixi <12

∑iwi. Then g is anti-symmetric and f = 0 =⇒ g = 0. It follows that f = g

so that f is a weighted majority function as needed.The remaining case is where τ∗ =

∑wi = 2. We obtain that f(x1, . . . , xn) = 1 if∑

wixi > 1. Since f is anti-symmetric it follows that f(x1, . . . , xn) = 0 if∑wixi < 1. The

result follows.

22.3 Problems and an additional example

The following problems naturally suggest themselves at this point:

(1) For which class of distributions is it the case that for simple majority small votingpower implies asymptotically complete aggregation of information?

(2) For which class of distributions is it the case that for every monotone Boolean functionsmall voting power implies asymptotically complete aggregation of information?

(3) For which class of distributions is it the case that for every monotone Boolean functionsmall individual effects implies asymptotically complete aggregation of information?

A natural condition to impose on the distribution µ which is realistic in various economicsituations is the FKG condition (see [47]). For x = (x1, . . . , xn) and y = (y1, . . . , yn), define

max(x, y) = (max(x1, y1), . . . ,max(xn, yn))

andmin(x, y) = (min(x1, y1), . . . ,min(xn, yn)).

135

One definition of FKG measure on 0, 1n goes as follows: A distribution µ on 0, 1n (oron Rn) is called an FKG measure if for every x, y ∈ 0, 1n we have

µ(x)µ(y) ≤ µ(max(x, y))µ(min(x, y)).

The FKG property is a profound notion of non-negative correlations between agents’ sig-nals. It implies (but is strictly stronger than) the following condition (known as non-negative association, see [49]): For all increasing real functions f and g, it is the case thatE[fg] ≥ E[f ]E[g]. This is equivalent to the condition that for all increasing events Aand B we have that P [AB] ≥ P [A]P [B]. Under the FKG property if the simple game ismonotone, all effects are non-negative. This form of non-negative correlation is a plausibleassumption to make in various contexts of collective choice. It is easy to see that underthe condition of non-negative association all individual effects are non-negative.

(4) For which class of monotone Boolean functions does small individual effects implyasymptotically complete aggregation of information?

In the following subsection, we present an example of an FKG measure and a monotoneBoolean function such that the individual effects are small and yet there is no asymptot-ically complete aggregation of information. In this example both the voting scheme andthe measure µ are invariant under a transitive group of permutations.

22.4 Example: FKG without aggregation

The measure µ. We start by describing the measure µ. The measure is given by a Gibbsmeasure for the Ising model on the 3-regular tree. See e.g. [30]. The measure is defined asfollows. Let Tr = (Vr, Er) be the r-level 3-regular tree. This is a rooted tree where eachinternal nodes has exactly 3 children and all the leaves are at distance exactly r from theroot ρ. Let Lr be the set of leaves of that tree. Note that |Lr| = 3r. Thus in Figure ??the underlying tree is T2.

We first define a measure ν on the tree 0, 1Vr . In this measure the probability of x isgiven by

ν[x] =1

2

∏(u,v)∈Er

((1− ε)1xu=xv + ε1xu 6=xv

).

In words, this means that in the measure ν the sign of the root xρ is chosen to be 0 or 1with probability 1/2. Then each vertex inherits its parents label with probability θ = 1−2εand is chosen independently otherwise.

Our measure µ is defined on 0, 1Lr (so that the voters are the leaves of the tree) asfollows.

µ[x] =∑

y:y|Lr≤x

ν[y]δ|i:xi=1,yi=0|.

136

In other words, a configuration of votes according to µ may be obtained by drawing aconfiguration x according to ν and looking at x|Lr. Then for each of the coordinates ofi ∈ Lr independently, the vote at x re-sampled to have the value 1 with probability δ.Below we will sometime abuse notation and write µ for the joint probability distributionof x and y.

Standard results for the Ising model (see, e.g., [30]) imply that µ is an FKG measure.Moreover it is easy to see that the measure is invariant under a transitive group and thatµ[xi] = (1 + δ)/2 for all i.

The function m. The function m is given by the recursive majority function m =RM3,r. Clearly, m is monotone, anti-symmetric and invariant under a transitive group.

Claim 22.3. If ε = δ ≤ 0.01 then µ[m] ≤ 1/2 + δ/2 for m = RM3,r and all r.

Proof. The proof below is similar to arguments in [50, 51]. Let (yv : v ∈ Vr) be chosenaccording to the measure ν. Let (xv : v ∈ Lr) be obtained from yv by re-sampling each ofthe coordinates of (yv : v ∈ Lr) to 1 with probability δ. Let (mv : v ∈ Vr) denote the valueof the recursive majority of all (xw : w ∈ Lr(v)), where Lr(v) are all the leaves of T belowv. We will show that µ[m = mρ = 0|yρ = 0] ≥ 1 − δ. Since µ[yρ = 0] = 1/2, we concludethat µ[m] ≤ 1/2 + µ[m|xρ = 0]/2 ≤ (1 + δ)/2, as needed.

We are interested in the probability that mv = 0 conditioned on yv = 0. It is easyto see that this probability only depends on the height of v, i.e., on the distance betweenv and the set of leaves. We let p(k) denote the probability that mv = 0 conditioned onyv = 0 for a vertex v of height k.

Clearly, p(0) = 1 − δ. We want to prove by induction that p(k) ≥ 1 − δ for all k. Letv be a node of height k + 1 and w a child of v. Note that conditioned on xv = 0 theprobability that mw = 0 is at least (1 − ε)p(k) which is at least t = (1 − ε)(1 − δ) by theinduction hypothesis. Moreover, noting that the values of the majorities of the children ofthe node v are conditionally independent given that mv = 0, we conclude that

p(k) ≥ t3 + 3t2(1− t) = 3t2 − 2t3 = t2(3− 2t).

We need that t2(3−2t) ≥ 1−δ or recalling that ε = δ: (1−ε)4(3−2(1−ε)2) ≥ (1−ε). Thisin turn is equivalent to (1−ε)3(3−2(1−ε)2) ≥ 1. The function h(ε) = (1−ε)3(3−2(1−ε)2)has h′(ε) = 10(1− ε)4 − 9(1− ε)2 = (1− ε)2(10(1− ε)2 − 9). Therefore h is increasing inthe interval [0, 0.01]. Since h(0) = 1 it follows that h(ε) ≥ 1 for all ε ≤ 0.01 as needed.

Our next objective is to bound the effect of a voter at level r. We will prove thefollowing:

Claim 22.4. The measure µ on Tr and the function m = RM3,r satisfy that the effect ofeach voter is at most (1− ε/2)(r−1)/2 + 2−(r−1)/2.

Proof. The argument here is similar in spirit to an argument in [10]. Let t+ s = r wheret ≥ (r − 1)/2 and s ≥ (r − 1)/2. Fix a leaf voter i. We want to estimate µ[m = 1|yi =

137

1] − µ[m = 1|yi = 0]. Let’s denote by µ0 the measure µ conditioned on yi = 0 and by µ1

the measure µ conditioned on yi = 1.Let i = v0, v1, . . . , vr = ρ denote the path from i to the root. We first claim that the

measures µ0, µ1 and µ may be coupled in such a way that except with probability (1−2ε)t

the only disagreements between µ0, µ1 and µ are on vertices below vt.The follows immediately from the random cluster representation of the model. In this

representation we declare and edge (u, v) open with probability (1 − 2ε) and closed withprobability 2ε. If the edge (u, v) is open then yu = yv, otherwise, the two labels areindependent. It is then clear that we may couple the two measure µ0, µ1 and µ below vt aslong as the path from i to vt contains at least one closed edge. The probability that suchan edge does not exist is at most (1− 2ε)t. The proof of the first claim follows.

For each j denote by uj and wj the siblings of vj . We assume that the measures µ0, µ1

and µ are coupled in such a way that the only disagreements between them are on verticesbelow vt. Note that if this is the case, then if the values of m under µ0 and µ1 are differentthen for all r ≥ j ≥ t it holds that muj 6= mwj . We wish to bound the µ probabilitythat muj 6= mwj for r ≥ j ≥ t. We will bounds this probability conditioned on the values(yvj )

rj=t. Conditioned on (yvj )

rj=t the event muj 6= mwj are independent for different j’s.

Moreover, by the Markov property, µ[muj 6= mwj |(yvh)rh=t] = µ[muj 6= mwj |yvj−1 ]. Finallynote that conditioned on yvj−1 the random variables muj ,mwj are identically distributedand independent. Therefore

µ[muj 6= mwj |yvj−1 ] ≤ maxp∈[0,1]

2p(1− p) ≤ 1/2.

We thus obtain that the µ probability that muj 6= mwj for r ≥ j ≥ t is at most 2−s.

References

[1] M. Ajtai and N. Linial. The influence of large coalitions. Combinatorica, 13(2):129–145, 1993.

[2] K. Arrow. A difficulty in the theory of social welfare. J. of Political Economy, 58:328–346, 1950.

[3] K. Arrow. Social choice and individual values. John Wiley and Sons, 1963.

[4] S. Barbera. Pivotal voters: A new proof of arrow’s theorem. Economics Letter, 6:13–16, 1980.

[5] J. J. Bartholdi, C. A. Tovey, and M. A. Trick. The computational difficulty of manip-ulating an election. Social Choice and Welfare, 6(3):227–241, 1989.

[6] W. Beckner. Inequalities in Fourier analysis. Ann. of Math. (2), 102(1):159–182, 1975.

138

[7] C. E. Bell. A random voting graph almost surely has a hamiltonian cycle when thenumber of alternatives is large. Econometrica, 49(6):1597–1603, 1981.

[8] M. Ben-Or and N. Linial. Collective coin flipping. In S. Micali, editor, Randomnessand Computation. Academic Press, New York, 1990.

[9] I. Benjamini, G. Kalai, and O. Schramm. Noise sensitivity of boolean functions andapplications to percolation. Inst. Hautes Etudes Sci. Publ. Math., 90:5–43, 1999.

[10] N. Berger, C. Kenyon, E. Mossel, and Y. Peres. Glauber dynamics on prob andhyperbolic graphs. Probab. Theory Related Fields, 131(3):311–340, 2005.

[11] A. Bonami. Etude des coefficients de Fourier des fonctions de Lp(G). Ann. Inst.Fourier (Grenoble), 20(fasc. 2):335–402 (1971), 1970.

[12] C. Borell. Positivity improving operators and hypercontractivity. Math. Zeitschrift,180(2):225–234, 1982.

[13] C. Borell. Geometric bounds on the Ornstein-Uhlenbeck velocity process. Z. Wahrsch.Verw. Gebiete, 70(1):1–13, 1985.

[14] B. Chor, O. Goldreich, J. Hasted, J. Freidmann, S. Rudich, and R. Smolensky. Thebit extraction problem or t-resilient functions. In — 26th Annual Symposium onFoundations of Computer Science, pages 396–407. IEEE, 1985.

[15] C. L. Coates and P. Lewis. Donut: A threshold gate computer. IEEE Transactionson Electronic Computers, (3):240–247, 1964.

[16] J.-A.-N. Condorcet. Essai sur l’application de l’analyse a la probabilite des decisionsrendues a la pluralite des voix. De l’Imprimerie Royale, 1785.

[17] V. Conitzer and T. Sandholm. Computing the optimal strategy to commit to. InProceedings of the 7th ACM conference on Electronic commerce, pages 82–90. ACM,2006.

[18] D. Cordero-Erausquin and M. Ledoux. Hypercontractive measures, talagrand’s in-equality, and influences. In Geometric Aspects of Functional Analysis, volume 2050 ofLecture Notes in Mathematics. 2012.

[19] A. De, E. Mossel, and J. Neeman. Majority is stablest : Discrete and sos. In STOC(Symposium on Theory of Computing), pages 477–486, 2013.

[20] A. De, E. Mossel, and J. Neeman. Majority is stablest: Discrete and sos. Theory ofComputing, 12(4):1–50, 2016.

139

[21] I. Diakonikolas, D. M. Kane, and J. Nelson. Bounded independence fools degree-2threshold functions. In Foundations of Computer Science (FOCS), 2010 51st AnnualIEEE Symposium on, pages 11–20. IEEE, 2010.

[22] R. Eldan. A two-sided estimate for the gaussian noise stability deficit. Inventionesmathematicae, 201(2):561–624, 2015.

[23] P. Faliszewski and A. D. Procaccia. Ai?s war on manipulation: Are we winning? AIMagazine, 31(4):53–64, 2010.

[24] W. Feller. An introduction to probability theory and its applications. Vol. I. Thirdedition. John Wiley & Sons Inc., New York, 1968.

[25] E. Friedgut and G. Kalai. Every monotone graph property has a sharp threshold.Proc. Amer. Math. Soc., 124:2993–3002, 1996.

[26] E. Friedgut, G. Kalai, N. Keller, and N. Nisan. A quantitative version of the gibbard–satterthwaite theorem for three alternatives. SIAM Journal on Computing, 40(3):934–952, 2011.

[27] E. Friedgut, G. Kalai, and A. Naor. Boolean functions whose fourier transform isconcentrated on the first two levels. Advances in Applied Mathematics, 29(3):427–437,2002.

[28] E. Friedgut, G. Kalai, and N. Nisan. Elections can be manipulated often. In Pro-ceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science(FOCS), pages 243–249, 2009.

[29] A. Gelman, J. N. Katz, and F. Tuerlinckx. The mathematics and statistics of votingpower. Statistical Science, pages 420–435, 2002.

[30] H. O. Georgii. Gibbs measures and phase transitions, volume 9 of de Gruyter Studiesin Mathematics. Walter de Gruyter & Co., Berlin, 1988.

[31] A. Gibbard. Manipulation of voting schemes: a general result. Econometrica,41(4):587–601, 1973.

[32] O. Goldreich, S. Goldwasser, E. Lehman, D. Ron, and A. Samorodnitsky. Testingmonotonicity. Combinatorica, 20(3):301–337, 2000.

[33] L. Gross. Logarithmic Sobolev inequalities. Amer. J. Math., 97(4):1061–1083, 1975.

[34] O. Haggstrom, G. Kalai, and E. Mossel. A law of large numbers for weighted majority.Advances in Applied Mathematics, 37(1):112–123, 2006.

140

[35] M. Isaksson, G. Kindler, and E. Mossel. The geometry of manipulation - a quantitativeproof of the gibbard satterthwaite theorem. In Foundations of Computer Science(FOCS), pages 319–328, 2010.

[36] M. Isaksson, G. Kindler, and E. Mossel. The geometry of manipulation - a quantitativeproof of the gibbard satterthwaite theorem. Combinatorica, 32(2):221–250, 2012.

[37] M. Isaksson and E. Mossel. New maximally stable Gaussian partitions with discreteapplications. Israel Journal of Mathematics, 189:347–396, 2012.

[38] J. Jendrej, K. Oleszkiewicz, and J. O. Wojtaszczyk. On some extensions of the fkntheorem. Theory of Computing, 11(1):445–469, 2015.

[39] C. Jones. A noisy-influence regularity lemma for boolean functions, 2016. arXivpreprint arXiv:1610.06950.

[40] J. Kahn, G. Kalai, and N. Linial. The influence of variables on boolean functions.In Proceedings of the 29th Annual Symposium on Foundations of Computer Science,pages 68–80, 1988.

[41] G. Kalai. A Fourier-theoretic perspective on the Condorcet paradox and Arrow’stheorem. Adv. in Appl. Math., 29(3):412–426, 2002.

[42] G. Kalai. Social Indeterminacy. Econometrica, 72:1565–1581, 2004.

[43] N. Keller. A tight quantitative version of arrow’s impossibility theorem. 2012.

[44] J. S. Kelly. Voting anomalies, the number of voters, and the number of alternatives.Econometrica: Journal of the Econometric Society, pages 239–251, 1974.

[45] G. Kindler, N. Kirshner, and R. O?Donnell. Gaussian noise sensitivity and fouriertails. Israel Journal of Mathematics, 225(1):71–109, 2018.

[46] M. Ledoux. Remarks on gaussian noise stability, brascamp-lieb and slepian inequali-ties. In Geometric aspects of functional analysis, pages 309–333. Springer, 2014.

[47] T. M. Liggett. Interacting particle systems, volume 276 of Grundlehren der Mathema-tischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, New York, 1985.

[48] J. H. Lindsey. Assignment of numbers to vertices. The American MathematicalMonthly, 71(5):508–516, 1964.

[49] P. R. Milgrom and R. J. Weber. A theory of auctions and competitive bidding.Econometrica: Journal of the Econometric Society, pages 1089–1122, 1982.

141

[50] E. Mossel. Recursive reconstruction on periodic trees. Random Structures Algorithms,13(1):81–97, 1998.

[51] E. Mossel. Reconstruction on trees: beating the second eigenvalue. Ann. Appl. Probab.,11(1):285–300, 2001.

[52] E. Mossel. Gaussian bounds for noise correlation of functions and tight analysis oflong codes. In Foundations of Computer Science, 2008 (FOCS 08), pages 156–165.IEEE, 2008.

[53] E. Mossel. Arrow’s impossibility theorem without unanimity. Posted on Arxiv0901.4727, 2009.

[54] E. Mossel. Gaussian bounds for noise correlation of functions. GAFA, 19:1713–1756,2010.

[55] E. Mossel. A quantitative arrow theorem. Probability Theory and Related Fields,154(1):49–88, 2012.

[56] E. Mossel. Gaussian Bounds for Noise Correlation of Resilient Functions. ArXive-prints 1704.04745, 2019.

[57] E. Mossel and J. Neeman. Robust optimality of Gaussian noise stability. J. Eur.Math. Soc. (JEMS), 17(2):433–482, 2015.

[58] E. Mossel and R. O’Donnell. On the noise sensitivity of monotone functions. RandomStructures Algorithms, 23(3):333–350, 2003.

[59] E. Mossel, R. O’Donnell, and K. Oleszkiewicz. Noise stability of functions with lowinfluences: invariance and optimality (extended abstract). In 46th Annual IEEE Sym-posium on Foundations of Computer Science (FOCS 2005), 23-25 October 2005, Pitts-burgh, PA, USA, Proceedings, pages 21–30. IEEE Computer Society, 2005.

[60] E. Mossel, R. O’Donnell, and K. Oleszkiewicz. Noise stability of functions with lowinfluences: invariance and optimality. Annals of Mathematics, 171(1):295–341, 2010.

[61] E. Mossel, R. O’Donnell, O. Regev, J. E. Steif, and B. Sudakov. Non-interactive cor-relation distillation, inhomogeneous Markov chains, and the reverse Bonami-Becknerinequality. Israel J. Math., 154:299–336, 2006.

[62] E. Mossel, K. Oleszkiewicz, and A. Sen. On reverse hypercontractivity. Geometricand Functional Analysis, 23(3):1062–1097, 2013.

[63] E. Mossel and M. Z. Racz. A quantitative gibbard-satterthwaite theorem withoutneutrality. In H. J. Karloff and T. Pitassi, editors, STOC. Proceedings of the 44thSymposium on Theory of Computing Conference, STOC 2012, New York, NY, USA,May 19 - 22, 2012, pages 1041–1060. ACM, 2012.

142

[64] E. Mossel and M. Z. Racz. A quantitative gibbard-satterthwaite theorem withoutneutrality. Combinatorica, 35(3):317–387, 2015.

[65] E. Mossel and O. Schramm. Representations of general functions using smooth func-tions, 2008. Unpublished manuscript.

[66] J. Neeman et al. A multidimensional version of noise stability. Electronic Communi-cations in Probability, 19:1–10, 2014.

[67] I. Nehama. Approximately classic judgement aggregation. Annals of Mathematics andArtificial Intelligence, 68(1-3):91–134, 2013.

[68] E. Nelson. The free Markoff field. J. Functional Analysis, 12:211–227, 1973.

[69] R. G. Niemi and H. F. Weisberg. A mathematical solution for the probability ofparadox of voting. Behavioral Science, 13:317–323, 1968.

[70] N. Nisan. Algorithmic game theory / economics. postname: From arrow to fourier,2009.

[71] R. O’Donnell. Hardness amplification within np. In Proceedings of the 34th ACMSymposium on Theory of Computing, 2002.

[72] R. O’Donnell. Analysis of boolean functions. Cambridge University Press, 2014.

[73] R. O’Donnell, R. Servedio, L.-Y. Tan, and A. Wan. A regularity lemma for lownoisy-influences, 2010. Unpublished manuscript.

[74] A. D. Procaccia and J. S. Rosenschein. Junta distributions and the average-case com-plexity of manipulating elections. Journal of Artificial Intelligence Research, 28:157–181, 2007.

[75] V. Roychowdhury, K.-Y. Siu, A. Orlitsky, and T. Kailath. A geometric approach tothreshold circuit complexity. In Proceedings of the fourth annual workshop on Com-putational learning theory, pages 97–111. Morgan Kaufmann Publishers Inc., 1991.

[76] M. A. Satterthwaite. Strategy-proofness and Arrow’s Conditions: Existence and Cor-respondence Theorems for Voting Procedures and Social Welfare Functions. J. ofEconomic Theory, 10:187–217, 1975.

[77] W. Sheppard. On the application of the theory of error to cases of normal distributionand normal correlation. Phil. Trans. Royal Soc. London, 192:101–168, 1899.

[78] M. Talagrand. On Russo’s approximate 0-1 law. Annals of Probability, 22:1576–1587,1994.

143

[79] Wikipedia contributors. Marquis de condorcet — Wikipedia, the free encyclopedia,2019. [Online; accessed 2019].

[80] R. Wilson. Social choice theory without the pareto principle. Journal of EconomicTheory, 5(3):478–486, 1972.

144

probabilistic aspects of voting, intransitivity and manipulationelmos/fall19/survey.pdf ·...

Documents