Folk Theorem in Repeated Games with Private Monitoring
Takuo Sugaya
Princeton University
September 2, 2011
Abstract
We show that the folk theorem with individually rational payoffs defined by pure strategies generically holds for N-player repeated games with private monitoring when each player's number of signals is sufficiently large. No cheap talk communication device or public randomization device is necessary.
Journal of Economic Literature Classification Numbers: C72, C73, D82
Keywords: repeated game, folk theorem, private monitoring
�[email protected] thank Stephen Morris for invaluable advice, Yuliy Sannikov for helpful discussion, and Dilip Abreu,
Edoardo Grillo, Johannes Hörner, Yuhta Ishii, Michihiro Kandori, Vijay Krishna, George Mailath, TadashiSekiguchi, Andrzej Skrzypacz, Satoru Takahashi, Yuichi Yamamoto and seminar participants at 10th SAETconference, 10th World Congress of the Econometric Society, 22nd Stony Brook Game Theory Festival andSummer Workshop on Economic Theory for useful comments. The usual disclaimer of course applies.
1 Introduction
One of the key results in the literature on infinitely repeated games is the folk theorem: Any feasible and individually rational payoff can be sustained in equilibrium when players are sufficiently patient. Fudenberg and Maskin (1986) establish the folk theorem under perfect monitoring, that is, when players can directly observe the action profile. Fudenberg, Levine and Maskin (1994) extend the folk theorem to imperfect public monitoring, where players can observe only public noisy signals about the action profile.
The driving force of the folk theorem under perfect or public monitoring is the coordination of future play based on common knowledge of relevant histories. Specifically, the public component of histories, such as action profiles in perfect monitoring or public signals in public monitoring, reveals past action profiles (at least statistically). Since this public information is common knowledge, players can coordinate a punishment contingent on the public information, and thereby provide dynamic incentives to choose actions that are not static best responses.
With private monitoring, players can observe only private noisy signals about the action profile. Common knowledge no longer exists and coordination is difficult (we call this problem "coordination failure").1 Hence, the robustness of the folk theorem to general private monitoring has been an open question. For example, Kandori (2002) mentions that "[t]his is probably one of the best known long-standing open questions in economic theory."
Many economic situations should be analyzed as repeated games with private monitoring. For example, Stigler (1964) proposes a repeated price-setting oligopoly, where firms set their own prices in face-to-face negotiations and cannot observe their opponents' prices. Instead, a firm obtains some information about opponents' prices through its own sales. Since the sales level depends on both opponents' prices and unobservable shocks due to business cycles, sales are an imperfect signal. In addition, each firm's sales are often private information. Thus, monitoring is imperfect and private. In addition, Fuchs (2007) applies a repeated game
1Mailath and Morris (2002 and 2006) and Sugaya and Takahashi (2010) offer a formal model of this argument.
with private monitoring to a contract between a principal and an agent, and Harrington
and Skrzypacz (2007) analyze cartel behaviors using the framework of a repeated game with
private monitoring.
This paper is the first to show that the folk theorem holds in repeated games with generic private monitoring. We unify and improve the three approaches in the literature on private monitoring that have been used to show partial results so far: the belief-free, belief-based and communication approaches.
The belief-free approach (and its generalization) has been successful in showing the folk theorem in the prisoners' dilemma.2 A strategy profile is belief-free if, for any history profile, the continuation strategy of each player is optimal conditional on the opponent's history. With almost perfect monitoring, Piccione (2002) and Ely and Välimäki (2002) show the folk theorem for the two-player prisoners' dilemma. Without any assumption on the precision of monitoring but with conditionally independent monitoring, Matsushima (2004) obtains the folk theorem in the two-player prisoners' dilemma.
Unfortunately, only limited results have been shown without almost perfect or conditionally independent monitoring: Fong, Gossner, Hörner and Sannikov (2010) show that the payoff of mutual cooperation is approximately attainable, and Sugaya (2010a) shows the folk theorem in the two-player prisoners' dilemma for some restricted classes of distributions of the private signals.
Several papers construct belief-based equilibria, where players' strategies involve statistical inference about the opponents' past histories. With almost perfect monitoring, Sekiguchi (1997) shows that the payoff of mutual cooperation is approximately attainable and Bhaskar and Obara (2002) show the folk theorem in the two-player prisoners' dilemma. Mailath and Morris (2002 and 2006) consider the robustness of equilibria in public monitoring to almost public monitoring. Based on their insights, Hörner and Olszewski (2009) establish
2Kandori and Obara (2006) use a similar concept to analyze a private strategy in public monitoring. Kandori (2010) considers "weakly belief-free equilibria," which is a generalization of belief-free equilibria. Apart from a typical repeated-game setting, Takahashi (2010) and Deb (2011) consider community enforcement and Miyagawa, Miyahara and Sekiguchi (2008) consider the situation where a player can improve the monitoring by paying a cost.
the folk theorem for almost public monitoring. Phelan and Skrzypacz (2009) characterize the set of possible beliefs about opponents' states in a finite-state automaton strategy and Kandori and Obara (2010) offer an easy way to verify whether a finite-state automaton strategy is an equilibrium.
Another approach to analyzing repeated games with private monitoring is to introduce public communication. Folk theorems have been proven by Compte (1998), Kandori and Matsushima (1998), Aoyagi (2002), Fudenberg and Levine (2002) and Obara (2007). Introducing a public element and letting a strategy depend only on the public element allow these papers to sidestep the difficulty of coordination through private signals. However, the analyses are not applicable to settings where communication is not allowed: For example, in Stigler's (1964) oligopoly example, antitrust laws make communication illegal. Furthermore, it is unclear whether their equilibria are robust to small private noise in the message transmission, although robustness to small private noise is one of the main motivations for studying private monitoring.
This paper incorporates all three approaches. First, the equilibrium strategy used to show the folk theorem is occasionally belief-free. That is, we see the repeated game as the repetition of long review phases. At the beginning of each review phase, every on-path strategy of each player is optimal conditional on the histories of the opponents. Second, however, the belief-free property does not hold except at the beginning of the phases. Hence, within each phase we consider each player's statistical inference about the opponents' past histories, as in the belief-based approach. Finally, in our equilibrium, the players do communicate, but the message exchange can be done with their actions. One of our methodological contributions is to offer a systematic way to dispense with the cheap talk message protocol by replacing it with message exchange via actions.
The rest of the paper is organized as follows. Section 2 introduces the model and Section 3 states the assumptions and the main result. The remaining parts of the paper are devoted to its proof. Since the complete proof is long and complicated, in the proof in the main text we illustrate the main structure by focusing on the two-player prisoners' dilemma with cheap talk and public randomization, and refer the complete proof for a general game without cheap talk and public randomization device to the Supplemental Materials. Section 4 relates
the infinitely repeated game to a finitely repeated game with an auxiliary scenario (reward function), as in Hörner and Olszewski (2006), and derives a sufficient condition on the finitely repeated game to show the folk theorem. In Section 5, we intuitively explain the equilibrium construction. In particular, after explaining Matsushima (2004) with conditionally independent signals, we explain how to extend his result to conditionally dependent signals. We will see that the players need to coordinate their future play through private signals. After we formally define the structure of the finitely repeated game in Section 6 and the strategy in Section 8, Section 9 explains how the coordination works. Sections 11 and 12 verify that the strategy satisfies the sufficient condition for the folk theorem derived in Section 4. All the proofs are given in the Appendix. In Sections 13 and 14, we comment on how we generalize the proof in the main text to the case of a general game without cheap talk and public randomization device in the Supplemental Materials.
2 Model
2.1 Stage Game
The stage game is given by $(I, (A_i, Y_i, \tilde{u}_i)_{i \in I}, q)$. $I = \{1, \dots, N\}$ is the set of players, $A_i$ with $|A_i| \ge 2$ is the finite set of player $i$'s pure actions, $Y_i$ is the finite set of player $i$'s private signals, and $\tilde{u}_i : A_i \times Y_i \to \mathbb{R}$ is player $i$'s ex-post utility function. Let $A \equiv \prod_{i \in I} A_i$ and $Y \equiv \prod_{i \in I} Y_i$ be the sets of action profiles and signal profiles, respectively.
In every stage game, player $i$ chooses an action $a_i \in A_i$, which induces the action profile $a \equiv (a_1, \dots, a_N) \in A$. Then, a signal profile $y = (y_1, \dots, y_N) \in Y$ is realized according to a joint conditional probability function $q(y \mid a)$. Given an action $a_i \in A_i$ and a private signal $y_i \in Y_i$, player $i$ receives the ex-post utility $\tilde{u}_i(a_i, y_i)$. Thus, her expected payoff conditional on an action profile $a \in A$ is given by $u_i(a) \equiv \sum_{y \in Y} q(y \mid a)\, \tilde{u}_i(a_i, y_i)$. For each $a \in A$, let $u(a)$ represent the payoff vector $(u_i(a))_{i \in I}$.
2.2 Repeated Game
Consider the infinitely repeated game of the above stage game in which the (common) discount factor is $\delta \in (0, 1)$. Let $a_{i,\tau}$ and $y_{i,\tau}$ denote respectively the action played and the private signal observed in period $\tau$ by player $i$. Player $i$'s private history up to period $t - 1$ is given by $h_i^t \equiv (a_{i,\tau}, y_{i,\tau})_{\tau=1}^{t-1}$. With $h_i^1 = \{\emptyset\}$, for each $t \ge 1$, let $H_i^t$ be the set of all $h_i^t$. A strategy for player $i$ is defined to be a mapping $\sigma_i : \bigcup_{t=1}^{\infty} H_i^t \to \Delta(A_i)$. Let $\Sigma_i$ be the set of all strategies for player $i$. Finally, let $E(\delta)$ be the set of sequential equilibrium payoffs with a common discount factor $\delta$.
3 Assumptions and Result
In this section, we state two assumptions and the main result (folk theorem).
First, we state an assumption on the payoff structure. Let $F \equiv \mathrm{co}(\{u(a)\}_{a \in A})$ be the set of feasible payoffs. The individually rational payoff for player $i$ is $v_i^* \equiv \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i})$. Note that we concentrate on the pure strategy minimax. Then, the set of feasible and individually rational payoffs is given by $F^* \equiv \{v \in F : v_i \ge v_i^* \text{ for all } i\}$. We assume the full dimensionality of $F^*$.
Assumption 1 The stage game payoff structure satisfies the full dimensionality condition: $\dim(F^*) = N$.
Second, we state an assumption on the signal structure.
Assumption 2 Each player's number of signals is sufficiently large: For any $i \in I$, we have
$$|Y_i| \ge \max\left\{ \max_{j \ne i} |A_j| + 2 \sum_{n \ne i, i-1} |A_n|,\ \sum_{j \in I} |A_j|,\ \max_{j \in I} 2|A_j| \right\}.$$
Note that the right-hand side is bounded by a linear function of $\sum_{j} |A_j|$. Under these assumptions, we can generically construct an equilibrium to attain any point in $\mathrm{int}(F^*)$.
Theorem 1 If Assumptions 1 and 2 are satisfied, then the folk theorem generically holds: For generic $q(\cdot \mid \cdot)$, for any $v \in \mathrm{int}(F^*)$, there exists $\bar{\delta} < 1$ such that, for all $\delta > \bar{\delta}$, $v \in E(\delta)$.
Since the full support assumption $q(y \mid a) > 0$ for all $a \in A$ and $y \in Y$ is generic, we assume the monitoring has full support. Then, any sequential equilibrium is realization equivalent to a Nash equilibrium. Hence, for the rest of the paper, we consider a Nash equilibrium.
For the proof in the main text, we focus on the two-player prisoners' dilemma with perfect and public cheap talk and a public randomization device: $I = \{1, 2\}$, $A_i = \{C_i, D_i\}$ and
$$u_i(D_i, C_j) > u_i(C_i, C_j) > u_i(D_i, D_j) > u_i(C_i, D_j) \quad (1)$$
for all $i$. For notational convenience, whenever we say players $i$ and $j$, unless otherwise mentioned, $i$ and $j$ are different. In the two-player prisoners' dilemma, Assumptions 1 and 2 are equivalent to $|Y_i| \ge 4$ for all $i$. Furthermore, we focus on $v$ with
$$v \in \mathrm{int}([u_1(D_1, D_2), u_1(C_1, C_2)] \times [u_2(D_2, D_1), u_2(C_2, C_1)]). \quad (2)$$
Section 13 briefly explains how to show the folk theorem for a general game and Section 14 explains how to dispense with cheap talk and the public randomization device. See the Supplemental Materials for the formal proofs.
We prove the theorem with the following steps. We arbitrarily fix $v$ satisfying (2) and implement $v$ by a strategy profile that is recursive in every $T^P$ periods, where $T^P \in \mathbb{N}$ will be determined later. In Section 4, we relate the infinitely repeated game to a $T^P$-period finitely repeated game with an auxiliary scenario. Specifically, we derive sufficient conditions on a strategy and an auxiliary scenario in the $T^P$-period finitely repeated game from which we can construct an equilibrium strategy to implement $v$ in the infinitely repeated game. In Section 5, we intuitively explain the structure of the equilibrium. Given this, Section 6 explains the formal structure of the finitely repeated game and Section 8 explains the equilibrium strategy. In Sections 9, 11 and 12, we verify that the strategy satisfies the sufficient conditions in the finitely repeated game.
4 Finitely Repeated Game
In this section, we consider a $T^P$-period finitely repeated game with auxiliary scenarios. We derive sufficient conditions on strategies and auxiliary scenarios in the finitely repeated game such that we can construct a strategy in the infinitely repeated game to support $v$. The sufficient conditions are stated in Lemma 1.
Let $\sigma_i^{T^P} : H_i^{T^P} \to \Delta(A_i)$ be player $i$'s strategy in the finitely repeated game. Let $\Sigma_i^{T^P}$ be the set of all strategies in the finitely repeated game. Each player $i$ has a state $x_i \in \{G, B\}$. In state $x_i$, player $i$ plays $\sigma_i(x_i) \in \Sigma_i^{T^P}$. In addition, player $i$ with $x_i$ gives an "auxiliary scenario" (or "reward function") $\pi_j(x_i, \cdot : \delta)$ to player $j$. Here, $\pi_j(x_i, \cdot : \delta) : H_i^{T^P+1} \to \mathbb{R}$, that is, the auxiliary scenarios are functions from the histories in the finitely repeated game to the real numbers.
For $v$ with (2), we can take $\bar{\pi} > 0$, $\underline{v}_i$ and $\bar{v}_i$ such that
$$u_i(D_1, D_2) + \bar{\pi} < \underline{v}_i < v_i < \bar{v}_i < u_i(C_1, C_2) - \bar{\pi}. \quad (3)$$
Our task is to find $\sigma_i(x_i)$ and $\pi_j(x_i, \cdot : \delta)$ such that, for sufficiently large $\delta$, there exists $T^P$ such that, for any $i \in I$,
1. For any $x_j \in \{G, B\}$, $\sigma_i(G)$ and $\sigma_i(B)$ are optimal in the finitely repeated game:
$$\sigma_i(G), \sigma_i(B) \in \arg\max_{\sigma_i^{T^P} \in \Sigma_i^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_i(a_t) + \pi_i(x_j, h_j^{T^P+1} : \delta) \,\Big|\, \sigma_i^{T^P}, \sigma_j(x_j)\right]. \quad (4)$$
2. The discounted average of player $i$'s instantaneous utilities and player $j$'s auxiliary scenario on player $i$ is equal to $\bar{v}_i$ if player $j$'s state is good ($x_j = G$) and equal to $\underline{v}_i$ if player $j$'s state is bad ($x_j = B$):
$$\frac{1 - \delta}{1 - \delta^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_i(a_t) + \pi_i(x_j, h_j^{T^P+1} : \delta) \,\Big|\, \sigma(x)\right] = \begin{cases} \bar{v}_i & \text{if } x_j = G, \\ \underline{v}_i & \text{if } x_j = B. \end{cases} \quad (5)$$
Intuitively, for sufficiently large $\delta$, since $\lim_{\delta \to 1} \frac{1-\delta}{1-\delta^{T^P}} = \frac{1}{T^P}$, this requires that the time average of the expected sum of the instantaneous utilities and the reward is close to the targeted payoffs $\underline{v}_i$ and $\bar{v}_i$.
3. $\pi_i(G, h_j^{T^P+1} : \delta)$ and $\pi_i(B, h_j^{T^P+1} : \delta)$ are uniformly bounded with respect to $\delta$ and
$$\pi_i(G, h_j^{T^P+1} : \delta) \le 0, \qquad \pi_i(B, h_j^{T^P+1} : \delta) \ge 0. \quad (6)$$
We call (6) the "feasibility constraint."3
The following lemma gives us a sufficient condition on the finitely repeated game to show the folk theorem in the infinitely repeated game:
Lemma 1 For Theorem 1, it suffices to show that there exist $\{\{\sigma_i(x_i)\}_{x_i \in \{G,B\}}\}_{i \in I}$ and $\{\{\pi_i(x_j, \cdot : \delta)\}_{x_j \in \{G,B\}}\}_{i \in I}$ satisfying (4), (5) and (6) in the $T^P$-period finitely repeated game.
5 Intuitive Explanation
5.1 With Conditionally Independent Signals
Following Matsushima (2004), we first consider the case with conditionally independent monitoring:
$$q(y_j \mid a, y_i) = q(y_j \mid a)$$
for all $a \in A$ and $y \in Y$. That is, conditional on an action profile $a$, player $i$'s signal has no information about the opponent's signal. In this case, it is easy to construct $\sigma_i(x_i)$ and $\pi_i(x_j, \cdot : \delta)$ satisfying (4), (5) and (6).
3For notational convenience, when we say that (6) is satisfied, it also implies that $\pi_i(G, h_j^{T^P+1} : \delta)$ and $\pi_i(B, h_j^{T^P+1} : \delta)$ are uniformly bounded with respect to $\delta$.
With conditional independence, we can see the $T^P$-period finitely repeated game as a single $T$-period "review round" with $T^P = T$ sufficiently long.
$\sigma_i(x_i)$ is simply defined as follows: At the beginning of the finitely repeated game, send the message $x_i \in \{G, B\}$ by cheap talk. Then, constantly take
$$a_i(x_i) \equiv \begin{cases} C_i & \text{if } x_i = G, \\ D_i & \text{if } x_i = B \end{cases} \quad (7)$$
for $T$ periods. That is, player $i$ with $\sigma_i(G)$, who wants to make player $j$'s value high, takes $C_i$, and player $i$ with $\sigma_i(B)$, who wants to make player $j$'s value low, takes $D_i$.
We are left to construct $\pi_i(x_j, \cdot : \delta)$. We concentrate on the case with $x_j = G$ since the case with $x_j = B$ is symmetric. If player $j$ receives the message $x_i = B$, then player $i$ is supposed to take $D_i$. Since $D_i$ is the dominant action, player $i$ does not need to be incentivized by the auxiliary scenario. Consider the following constant reward function:
$$\pi_i(G, h_j^{T^P+1} : \delta) \equiv -\bar{\pi} T.$$
Since the reward is constant, player $i$ plays $D_i$. Therefore, $\sigma_i(B)$ is optimal after the message $x_i = B$ and
$$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_{i,t}(a) + \pi_i(G, h_j^{T^P+1} : \delta) \,\Big|\, \sigma_i(B), \sigma_j(G)\right] = \lim_{T \to \infty} \frac{1}{T} \left[\sum_{t=1}^{T} u_i(D_i, C_j) - \bar{\pi} T\right] = u_i(D_i, C_j) - \bar{\pi} > \bar{v}_i \text{ from (3).} \quad (8)$$
By subtracting a proper fixed (depending only on $x$) positive number from the reward function, it is possible to attain $\bar{v}_i$ exactly for sufficiently large $\delta$ and $T$. Note that subtracting a positive number does not violate (6).
On the other hand, if player $j$ receives the message $x_i = G$, then player $j$ needs to incentivize player $i$ to take $C_i$ by the auxiliary scenario $\pi_i(x_j, \cdot : \delta)$. Suppose there exist statistics $C_{j,t}^{i,a_j} \in \{0, 1\}$ and $q_2 > q_1$ such that, for each $a_j \in A_j$, $C_{j,t}^{i,a_j} = 1$ indicates that $C_i$ is more likely to have been played:
$$E\left[C_{j,t}^{i,a_j} \mid a_i, a_j\right] = \begin{cases} q_2 & \text{if } a_i = C_i, \\ q_1 & \text{if } a_i = D_i. \end{cases} \quad (9)$$
The existence of such $C_{j,t}^{i,a_j}$, $q_2$ and $q_1$ will be proven in Lemma 3. Take $\bar{L}$ such that
$$\bar{L}(q_2 - q_1) > \max_{a \in A,\, i \in I} 2|u_i(a)|. \quad (10)$$
Since player $j$ with $x_j = G$ takes $C_j$, player $j$ rewards player $i$ based on $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$, the summation of $C_{j,t}^{i,C_j}$:
$$\pi_i(G, h_j^{T^P+1} : \delta) \equiv \bar{L} \min\left\{\sum_{t=1}^{T} C_{j,t}^{i,C_j} - (q_2 T + 2\varepsilon T),\ 0\right\} - \bar{\pi} T$$
with some small $\varepsilon > 0$. The reward is linearly increasing in $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ with slope $\bar{L}$ until $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound $q_2 T + 2\varepsilon T$. If $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ does not hit the upper bound, since playing $C_i$ instead of $D_i$ increases the expectation of $C_{j,t}^{i,C_j}$ by $(q_2 - q_1)$ from (9), the marginal gain of taking $C_i$ is $\bar{L}(q_2 - q_1)$. Since (10) implies that this marginal gain dominates the difference in the instantaneous utilities, player $i$ has the incentive to take $C_i$.
Hence, in order to show that player $i$ takes $C_i$, it suffices to show that, regardless of player $i$'s signal observations, player $i$ believes that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound with little probability: For all $h_i$,
$$\Pr\left(\sum_{t=1}^{T} C_{j,t}^{i,C_j} > q_2 T + 2\varepsilon T \,\Big|\, x_j = G, h_i\right) \le \exp(-(q_2 - q_1) T).$$
This can be shown as follows: If the monitoring is conditionally independent, regardless of player $i$'s signal observations, player $i$'s beliefs on $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ given $x_j = G, h_i$ are approximately distributed according to the normal distribution with mean
$$E\left[\sum_{t=1}^{T} C_{j,t}^{i,C_j} \mid C_i, C_j\right] = q_2 T$$
and standard deviation $O(T^{\frac{1}{2}})$ by the central limit theorem.4 Since $q_2 T + 2\varepsilon T$ is greater than the mean by $2\varepsilon T^{\frac{1}{2}}$ times $T^{\frac{1}{2}}$, the order of the standard deviation, player $i$ believes that the probability that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound is negligible.
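This concentration argument can be illustrated with a short Monte Carlo sketch. All parameter values below (q2, q1, T, epsilon, L_bar and the payoff bound) are hypothetical and only chosen to satisfy (10); under conditional independence the review statistic is modeled as a sum of i.i.d. Bernoulli draws.

```python
import random

# Hypothetical parameters (not from the paper): accuracies q2 > q1 from (9),
# review length T, and the slack epsilon in the bound q2*T + 2*eps*T.
q2, q1, T, eps = 0.6, 0.4, 1000, 0.05
threshold = q2 * T + 2 * eps * T  # upper bound at which the reward is capped

random.seed(0)

def review_score(p, T):
    """Sum of T i.i.d. Bernoulli(p) draws: the statistic sum_t C_{j,t}^{i,C_j}."""
    return sum(random.random() < p for _ in range(T))

# Under cooperation each C_{j,t}^{i,C_j} has mean q2, so the score concentrates
# around q2*T and exceeds q2*T + 2*eps*T (here 700) only with tiny probability.
n_sims = 2000
hits = sum(review_score(q2, T) > threshold for _ in range(n_sims))
print("empirical Pr(score > threshold):", hits / n_sims)

# Deviating to D_i lowers the mean score by (q2 - q1) per period; the slope
# L_bar converts this into a loss that dominates any stage-game gain whenever
# L_bar * (q2 - q1) > 2 * max_a |u_i(a)|, which is condition (10).
L_bar, max_abs_u = 50.0, 4.0  # hypothetical values satisfying (10)
assert L_bar * (q2 - q1) > 2 * max_abs_u
```

With these numbers the threshold sits roughly 6.5 standard deviations above the mean of the score, so the empirical frequency is essentially always 0, in line with the exponential bound in the text.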
Therefore, $\sigma_i(G)$ is optimal after the message $x_i = G$ and
$$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_{i,t}(a) + \pi_i(G, h_j^{T^P+1} : \delta) \,\Big|\, \sigma_i(G), \sigma_j(G)\right]$$
$$= \lim_{T \to \infty} \frac{1}{T} E\left[\sum_{t=1}^{T} u_{i,t}(a) + \bar{L} \min\left\{\sum_{t=1}^{T} C_{j,t}^{i,C_j} - (q_2 T + 2\varepsilon T),\ 0\right\} - \bar{\pi} T \,\Big|\, \sigma_i(G), \sigma_j(G)\right]$$
$$= u_i(C_i, C_j) - 2\varepsilon \bar{L} - \bar{\pi},$$
which is larger than $\bar{v}_i$ for sufficiently small $\varepsilon > 0$ from (3). By subtracting a proper fixed number from the reward function, it is possible to attain $\bar{v}_i$ exactly for sufficiently large $\delta$ and $T$. Again, we can keep (6).
Since both $\sigma_i(G)$ and $\sigma_i(B)$ yield the same expected value $\bar{v}_i$ and are subgame perfect once the message $x_i$ is sent, both $\sigma_i(G)$ and $\sigma_i(B)$ are optimal. Therefore, we are done.
4Here, we consider the incentive only on the equilibrium path. Since defection decreases the expected value of $C_{j,t}^{i,C_j}$, together with conditional independence, defection reduces the probability that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound even more.
5.2 With Conditionally Dependent Signals
5.2.1 Problem
Now we consider the two-player prisoners' dilemma with conditionally dependent monitoring and explain the difficulty. To illustrate the problem, consider the case with $x_j = G$ and suppose we use the same $\sigma_i(G)$ and $\sigma_i(B)$ and the same reward as in the case with conditionally independent monitoring. A problem occurs when $x_i = G$ and player $j$ needs to incentivize player $i$ to take $C_i$. Suppose player $i$'s signal and player $j$'s signal are highly correlated and player $i$ can infer $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ from her own history very precisely. (i) Equation (5) requires that the time average of the expected sum of the instantaneous utilities and the reward function should be close to $u_i(C_i, C_j)$. (ii) Inequality (6) requires that the reward should be negative. Since (iii) the time average of the instantaneous utilities is close to $u_i(C_i, C_j)$ if player $i$ takes $C_i$, (i), (ii) and (iii) together require that if $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ is close to its ex ante value $q_2 T$, then the time average of the reward should be close to 0. This means that player $i$ cannot be rewarded if $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ is unusually high. If, in period $\tau$, player $i$ infers from $h_i^\tau$ that $\sum_{t=1}^{\tau} C_{j,t}^{i,C_j}$ is already unusually high (greater than $q_2 T + 2\varepsilon T$) with high probability, then player $i$ stops cooperation after period $\tau$.5
Specifically, since the reward after receiving $x_i = G$ is $\bar{L} \min\{\sum_{t=1}^{T} C_{j,t}^{i,C_j} - (q_2 T + 2\varepsilon T), 0\} - \bar{\pi} T$, if player $i$ believes that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ has hit the upper bound $q_2 T + 2\varepsilon T$ with high probability, then player $i$ wants to stop cooperation. The problem is that the reward stops increasing when $\sum_{t=1}^{\tau} C_{j,t}^{i,C_j} > q_2 T + 2\varepsilon T$.
5.2.2 Modification
Structure of the Phase To deal with the problem above, we consider the following modification. First, for a moment,6 let us see the $T^P$-period finitely repeated game as $L$ repetitions of $T$-period review rounds with
$$L\bar{\pi} > \bar{L}. \quad (11)$$
Now, $T^P = LT$.
5This is also mentioned by Fong, Gossner, Hörner and Sannikov (2010).
6We will modify the structure further to deal with problems arising later. See Section 6 for the final structure.
[Insert Figure 1].
For notational convenience, let $T(l)$ be the set of periods in the $l$th review round. When player $j$ monitors player $i$, player $j$ randomly drops one period $t_j(l)$ from $T(l)$. That is, $\Pr(t_j(l) = t) = \frac{1}{T}$ for all $t \in T(l)$. Let $T_j(l) \equiv T(l) \setminus \{t_j(l)\}$ be the remaining periods in the $l$th review round. Player $j$ monitors player $i$ during $T(l)$ by
$$X_j(l) \equiv \begin{cases} \sum_{t \in T_j(l)} C_{j,t}^{i,C_j} + \mathbf{1}_{t_j(l)} & \text{if } x_j = G, \\ \sum_{t \in T_j(l)} C_{j,t}^{i,D_j} + \mathbf{1}_{t_j(l)} & \text{if } x_j = B. \end{cases} \quad (12)$$
Here, $\mathbf{1}_{t_j(l)} \in \{0, 1\}$ is a random variable with $\Pr(\mathbf{1}_{t_j(l)} = 1) = q_2$ conditional on $t_j(l)$. Hence, instead of monitoring by $\sum_{t \in T(l)} C_{j,t}^{i,C_j}$, player $j$ randomly picks $t_j(l)$ and replaces $C_{j,t_j(l)}^{i,C_j}$ with the random variable $\mathbf{1}_{t_j(l)}$, which is independent of the players' actions. The reason will be explained in Section 15.7.
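The construction of the review score (12) can be sketched as follows; the per-period scores and the parameters are hypothetical, with each score equal to 1 with probability q2 under cooperation, as in (9).

```python
import random

random.seed(1)

# Hypothetical per-period scores C_{j,t}^{i,C_j} for one review round.
q2, T = 0.6, 10
scores = [random.random() < q2 for _ in range(T)]

def X_j(scores, q2, rng):
    """Review score (12): drop one uniformly random period t_j(l) and replace
    its score with an action-independent Bernoulli(q2) draw 1_{t_j(l)}."""
    t_drop = rng.randrange(len(scores))        # Pr(t_j(l) = t) = 1/T
    kept = sum(s for t, s in enumerate(scores) if t != t_drop)
    dummy = rng.random() < q2                  # the draw 1_{t_j(l)}
    return kept + dummy

print(X_j(scores, q2, random))
```

Since the dropped period is replaced by a draw with the same mean q2, the on-path distribution of $X_j(l)$ differs from the plain sum in only one period, which is why the substitution does not affect incentives or values in the large-$T$ limit.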
Reward Function by Player j Second, we heuristically define the reward function given by player $j$.7 There are the following cases:
$x_i = B$: When player $j$ receives the message $x_i = B$, then player $i$ takes a dominant action and no incentive by the auxiliary scenario is necessary. Hence, it is straightforward to construct a constant reward as in Section 5.1, achieving (4), (5) and (6). In this case, we say $\lambda_j(l) = G$ for all $l = 1, \dots, L$. See the case with $x_i = G$ for the meaning of $\lambda_j(l)$.
$x_i = G$: When player $j$ receives the message $x_i = G$, then we need to deal with the problem mentioned in Section 5.2.1. See each of the $L$ review rounds as independent and
7See Section 8.3 for the formal definition of the reward function.
consider the following reward function for each $l$th round:
$$\pi_i(G, h_j^{T^P+1}, l) = \bar{L}\{X_j(l) - (q_2 T + 2\varepsilon T)\} - \bar{\pi} T \quad (13)$$
$$\pi_i(B, h_j^{T^P+1}, l) = \bar{L}\{X_j(l) - (q_2 T - 2\varepsilon T)\} + \bar{\pi} T \quad (14)$$
If we make
$$\pi_i(x_j, h_j^{T^P+1} : \delta) = \sum_{l=1}^{L} \pi_i(x_j, h_j^{T^P+1}, l),$$
then, since the reward function is always increasing in $X_j(l)$ with slope $\bar{L}$, it is always optimal for player $i$ to take $C_i$.
Since $\pi_i(x_j, h_j^{T^P+1} : \delta) = \sum_{l=1}^{L} \pi_i(x_j, h_j^{T^P+1}, l)$ does not satisfy (6), we need further modification. Observe the following: For $x_j = G$, if $\pi_i > -\bar{\pi} T$ (that is, $X_j(l) > q_2 T + 2\varepsilon T$) occurs for at most one review round, then (6) is still satisfied. To see why, note that the maximum reward in one review round is attained at $X_j(l) = T$ and $\pi_i(G, h_j^{T^P+1}, l) < \bar{L} T - \bar{\pi} T$. On the other hand, if $X_j(l) \le q_2 T + 2\varepsilon T$, then $\pi_i(G, h_j^{T^P+1}, l) \le -\bar{\pi} T$. Since we take $\bar{L} < L\bar{\pi}$ in (11), the total reward in the finitely repeated game is still negative if $X_j(l) > q_2 T + 2\varepsilon T$ occurs only for one review round. Symmetrically, for $x_j = B$, if $\pi_i < \bar{\pi} T$ (that is, $X_j(l) < q_2 T - 2\varepsilon T$) occurs for at most one review round, then (6) is still satisfied.
Therefore, once
$$X_j(l) \notin [q_2 T - 2\varepsilon T,\ q_2 T + 2\varepsilon T]$$
happens in the $l$th review round, we make player $j$'s reward function for the following review rounds a negative constant for $x_j = G$ and a positive constant for $x_j = B$. Specifically, for the following review rounds $\tilde{l}$ with $\tilde{l} \in \{l + 1, \dots, L\}$, suppose we modify the reward function as
$$\pi_i(G, h_j^{T^P+1}, \tilde{l}) = -\bar{\pi} T \quad (15)$$
$$\pi_i(B, h_j^{T^P+1}, \tilde{l}) = \bar{\pi} T. \quad (16)$$
Importantly, (6) is recovered since $\pi_i(G, h_j^{T^P+1}, l) > -\bar{\pi} T$ and $\pi_i(B, h_j^{T^P+1}, l) < \bar{\pi} T$ happen for at most one review round by definition.
Let $\lambda_j(l) \in \{G, B\}$ denote which reward function player $j$ is using in the $l$th review round:
- In the first review round, $\lambda_j(1) = G$ and the reward is (13) or (14).
- In the $(l+1)$th review round with $l \ge 1$,
  - If $\lambda_j(l) = B$, then $\lambda_j(l+1) = B$ and the reward is (15) or (16).
  - If $\lambda_j(l) = G$, then
    - If $X_j(l) \in [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$, then $\lambda_j(l+1) = G$ and the reward is (13) or (14).
    - If $X_j(l) \notin [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$, then $\lambda_j(l+1) = B$ and the reward is (15) or (16).
See Figure 2 for the automaton representation. Hence, $\lambda_j(l) = G$ implies that the reward in the $l$th review round is (13) or (14), while $\lambda_j(l) = B$ implies that the reward in the $l$th review round is (15) or (16).
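The transition rule above can be sketched as a two-state automaton; the parameter values are hypothetical.

```python
# Player j's reward-state automaton (Figure 2): lambda_j(1) = "G", and the
# state absorbs at "B" after the first erroneous review score.
def next_state(state, X, q2, T, eps):
    if state == "B":
        return "B"  # (15)/(16): constant reward for all remaining rounds
    lo, hi = q2 * T - 2 * eps * T, q2 * T + 2 * eps * T
    return "G" if lo <= X <= hi else "B"  # keep (13)/(14) iff X_j(l) is typical

# Example run: with q2*T = 60 and bounds [50, 70], the erroneous score 95 in
# round 3 switches the state to "B" permanently.
q2, T, eps = 0.6, 100, 0.05
states = ["G"]
for X in [61, 58, 95, 60]:
    states.append(next_state(states[-1], X, q2, T, eps))
print(states)  # ['G', 'G', 'G', 'B', 'B']
```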
Note that, ex ante, if the players play $a_i(x_i), a_j(x_j)$, then $X_j(l)$ is approximately distributed according to the normal distribution with expectation $q_2 T$ and standard deviation $O(T^{\frac{1}{2}})$. Since $X_j(l) \notin [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$ implies that $X_j(l)$ is away from its ex ante value by $2\varepsilon T^{\frac{1}{2}}$ times the order of the standard deviation, $\lambda_j(l+1) = B$ happens only with small probability, less than $\exp(-(q_2 - q_1) T)$. Hence, we call the observation "erroneous" if $X_j(l) \notin [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$.
[Insert Figure 2].
Optimal Action of Player i Third, given player $j$'s reward function, we derive the optimal action of player $i$. Since the shape of the reward function changes as $\lambda_j(l)$ changes, player $i$ needs to infer $\lambda_j(l)$. Let $\hat{\lambda}_j(l) \in \{G, B\}$ be player $i$'s inference of $\lambda_j(l)$. Forgetting the question of how player $i$ infers $\lambda_j(l)$ for a while, assume $\hat{\lambda}_j(l)$ is always correct: $\hat{\lambda}_j(l) = \lambda_j(l)$ for all $l$.
After sending $x_i = B$ Since $\lambda_j(l) = G$ for any history of player $j$, $\hat{\lambda}_j(l) = G$. Since the reward function is constant, it is optimal to take $a_i(x_i) = D_i$. Therefore, $\sigma_i(B)$ such that player $i$ always takes $D_i$ is optimal.
After sending $x_i = G$ If $\hat{\lambda}_j(l) = B$, then since the reward is constant for the rest of the finitely repeated game, it is optimal to take $D_i$. We verify the incentive to cooperate when $\hat{\lambda}_j(l) = G$ by backward induction.
For the last review round $l = L$,
- If $\lambda_j(L) = G$, then player $i$ wants to take $C_i$ since the reward is linearly increasing in $X_j(L)$ with slope $\bar{L}$ and there is no effect on $\lambda_j(L+1)$. Player $i$'s average continuation payoff at the beginning of the $L$th review round is
  - If $x_j = G$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} E\left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(C_i, C_j) + \delta^{T}\left(\bar{L}\{X_j(L) - (q_2 T + 2\varepsilon T)\} - \bar{\pi} T\right) \,\Big|\, C_i, C_j\right] = u_i(C_i, C_j) - 2\varepsilon \bar{L} - \bar{\pi} \ge \bar{v}_i \quad (17)$$
  with $t_L$ being the first period of the $L$th review round.
  - If $x_j = B$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} E\left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(C_i, D_j) + \delta^{T}\left(\bar{L}\{X_j(L) - (q_2 T - 2\varepsilon T)\} + \bar{\pi} T\right) \,\Big|\, C_i, D_j\right] = u_i(C_i, D_j) + 2\varepsilon \bar{L} + \bar{\pi} \le \underline{v}_i. \quad (18)$$
  Note that replacing $C_{j,t_j(l)}^{i,C_j}$ with $\mathbf{1}_{t_j(l)}$ in (12) does not change the incentive and value in the limit as $T$ goes to infinity.
- If $\lambda_j(L) = B$, then player $i$ wants to take $D_i$ since the reward is constant. Player $i$'s average continuation payoff at the beginning of the $L$th review round is
  - If $x_j = G$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} \left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(D_i, C_j) - \delta^{T} \bar{\pi} T\right] = u_i(D_i, C_j) - \bar{\pi} \ge \bar{v}_i. \quad (19)$$
  - If $x_j = B$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} \left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(D_i, D_j) + \delta^{T} \bar{\pi} T\right] = u_i(D_i, D_j) + \bar{\pi} \le \underline{v}_i. \quad (20)$$
We can subtract proper positive numbers depending only on $x$ and $\lambda_j(L)$ from (17) and (19) such that the continuation payoff is exactly $\bar{v}_i$ if $x_j = G$. Similarly, we can add proper positive numbers depending only on $x$ and $\lambda_j(L)$ to (18) and (20) such that the continuation payoff is exactly $\underline{v}_i$ if $x_j = B$. Note that we can keep (6).
Therefore, $\sigma_i(G)$ such that player $i$ plays $C_i$ when $\hat{\lambda}_j(L) = G$ and plays $D_i$ when $\hat{\lambda}_j(L) = B$ is optimal, and player $i$'s continuation payoff is independent of $\lambda_j(L)$.
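Before the induction step, the chains of inequalities just derived can be checked numerically for one concrete parameterization; every number below is illustrative, not from the paper.

```python
# Hypothetical prisoners' dilemma payoffs and constants pi_bar, eps, L_bar
# chosen so that every chain of inequalities below holds.
u = {("D", "C"): 3.0, ("C", "C"): 2.0, ("D", "D"): 0.0, ("C", "D"): -1.0}
pi_bar = 0.5                    # the constant in (3)
v_lo, v, v_hi = 0.7, 1.0, 1.3   # targets: underline-v_i < v_i < bar-v_i
eps, L_bar = 0.001, 50.0        # review slack and reward slope

# (3): u(D,D) + pi_bar < v_lo < v < v_hi < u(C,C) - pi_bar
assert u[("D", "D")] + pi_bar < v_lo < v < v_hi < u[("C", "C")] - pi_bar
# (17): cooperating while rewarded exceeds the high target bar-v_i
assert u[("C", "C")] - 2 * eps * L_bar - pi_bar >= v_hi
# (18): cooperating against a punisher stays below the low target
assert u[("C", "D")] + 2 * eps * L_bar + pi_bar <= v_lo
# (19)/(20): defecting under a constant reward also brackets the targets
assert u[("D", "C")] - pi_bar >= v_hi
assert u[("D", "D")] + pi_bar <= v_lo
print("payoff chains (3), (17)-(20) hold for these parameters")
```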
For the second-to-last review round $l = L - 1$, since the continuation payoff from the $L$th review round is constant in $\lambda_j(L)$, player $i$ does not need to consider the effect of her strategy in the $(L-1)$th review round on $\lambda_j(L)$. Therefore, the same proof shows that $\sigma_i(G)$ such that player $i$ plays $C_i$ when $\hat{\lambda}_j(L-1) = G$ and plays $D_i$ when $\hat{\lambda}_j(L-1) = B$ is optimal. Further, we can make sure that player $i$'s continuation payoff at the beginning of the $(L-1)$th review round is $\bar{v}_i$ or $\underline{v}_i$ depending on $x_j$ but independent of $\lambda_j(L-1)$.
Recursively, we can show that $\sigma_i(G)$ such that, for all $l$, player $i$ plays $C_i$ when $\hat{\lambda}_j(l) = G$ and plays $D_i$ when $\hat{\lambda}_j(l) = B$ is optimal and gives player $i$ $\bar{v}_i$ or $\underline{v}_i$ depending on $x_j$.
In summary, player $i$'s action in the $l$th review round is defined as in Figure 3:
[Insert Figure 3]
Reward Function of Player j Revisited Fourth, we modify player $j$'s reward function further to incorporate the fact that player $j$ with $\hat{\lambda}_i(l) = B$ (that is, player $j$ infers that player $i$ has observed erroneous histories) takes $D_j \ne a_j(x_j)$. As a preparation, we construct special reward functions $\theta_i^G(a_j, y_j)$ and $\theta_i^B(a_j, y_j)$ that make player $i$ indifferent among all action profiles.
Lemma 2 Generically, the following statement is true: For any $\bar{\pi} > 0$, there exists $\bar{u} > \bar{\pi}$ such that, for each $i \in I$, there exist $\theta_i^G : A_j \times Y_j \to [-\bar{u}, -\bar{\pi}]$ and $\theta_i^B : A_j \times Y_j \to [\bar{\pi}, \bar{u}]$ such that
$$u_i(a) + E\left[\theta_i^G(a_j, y_j) \mid a\right] = \text{constant} \quad \text{for all } a \in A,$$
$$u_i(a) + E\left[\theta_i^B(a_j, y_j) \mid a\right] = \text{constant} \quad \text{for all } a \in A.$$
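The existence claim can be illustrated as a linear-algebra sketch (hypothetical, randomly generated monitoring structure; the sign and boundedness restrictions of the lemma are ignored, only the indifference system is solved): the conditions "u_i(a) + E[theta | a] constant in a" are |A| linear equations in the |A_j| * |Y_j| unknowns theta(a_j, y_j) plus the constant, which is generically solvable.

```python
import numpy as np

# Hypothetical two-player setting: A_i = {C, D}, four signals for player j,
# and a randomly drawn full-support monitoring distribution q(y_j | a).
A = ["C", "D"]
Y = range(4)
u_i = {("C", "C"): 2.0, ("C", "D"): -1.0,
       ("D", "C"): 3.0, ("D", "D"): 0.0}     # player i's expected payoffs
rng = np.random.default_rng(0)
q = {a: rng.dirichlet(np.ones(len(Y))) for a in u_i}  # q(. | a) per profile

# Unknowns: theta(a_j, y) for a_j in A, y in Y, plus the constant c.
# One equation per profile a: sum_y q(y|a) * theta(a_j, y) - c = -u_i(a).
rows, rhs = [], []
for (ai, aj) in u_i:
    row = np.zeros(len(A) * len(Y) + 1)
    for y in Y:
        row[A.index(aj) * len(Y) + y] = q[(ai, aj)][y]
    row[-1] = -1.0                            # coefficient on c
    rows.append(row)
    rhs.append(-u_i[(ai, aj)])
sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
theta, c = sol[:-1], sol[-1]

# Indifference check: u_i(a) + E[theta | a] equals the same constant c for all a.
vals = [u_i[a] + q[a] @ theta[A.index(a[1]) * len(Y):(A.index(a[1]) + 1) * len(Y)]
        for a in u_i]
print(np.allclose(vals, c))
```

Because the system has more unknowns than equations (here 9 versus 4) and generic coefficients, an exact solution exists; Lemma 2 additionally pins the transfers into the stated intervals.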
When $\hat{\lambda}_i(l) = B$, player $j$ uses the reward $\theta_i^{x_j}(a_{j,t}, y_{j,t})$ for each $t$ in the $\tilde{l}$th review round with $\tilde{l} \ge l$, so that player $i$ can always behave as if $\hat{\lambda}_i(l) = G$. The modified reward function is therefore as explained in the following figure:
[Insert Figure 4]
The incentive compatibility constraint that player $i$ does not try to manipulate $\lambda_j(l)$ will be proven in Proposition 1.
5.2.3 Summary from the Perspective of Infinitely Repeated Games
Here, we offer a summary of our equilibrium construction, using the language of infinitely repeated games. As the proof of Lemma 1 makes clear, we can see the finitely repeated game as the first T_P periods of the infinitely repeated game; a positive (negative, respectively) reward implies that the continuation payoff from period T_P + 1 is higher (lower, respectively) than the value in the initial period.
Suppose x_j = G. Then the value for player i in the initial period needs to be close to u_i(C, C) to attain efficiency. Since the value in the initial period is very high, player j with x_j = G cannot reward player i for a high realization of the auxiliary scenario (π_i > −ε̄T) by moving to a higher continuation payoff from period T_P + 1. Instead, player j "allows" player i to defect and "rewards" player i through higher instantaneous utilities. While doing so, player j's incentive is provided by the changes in her own continuation payoff from period T_P + 1. That is, the changes in the continuation payoff for player j while player i with λ̂_j(l) = B defects correspond to the realization of the auxiliary scenario in Lemma 2 (with the roles of players i and j reversed).
5.2.4 Coordination Problem
We are left to show how player i infers λ_j(l). The rest of the paper is mainly devoted to the coordination between λ_j(l) ∈ {G, B} and the inference λ̂_j(l) ∈ {G, B}. In Section 6, we further modify the structure of the finitely repeated game by introducing supplemental rounds in which player j sends a message about λ_j(l + 1) through her actions after the lth review round. In Section 8, we formally define the equilibrium strategy (actions and rewards) except for how player i infers player j's message in the supplemental rounds. Then, in Section 9, we specify player i's inference of player j's message in the supplemental rounds. This fully pins down the strategy (actions and rewards).
While defining the equilibrium, we introduce variables subject to various restrictions. In Section 10, we verify that these variables can be chosen consistently.
Then, in Sections 11 and 12, we verify that the equilibrium strategy (actions and rewards) satisfies (4), (5) and (6).
6 Structure of the Phase
In this section, we explain the structure of the T_P-period finitely repeated game, which is summarized in Figure 5 below. T_P depends on L and T. L ∈ N will be pinned down in Section 10. T ∈ N is a parameter of the equilibrium.
At the beginning of the finitely repeated game, we insert the "coordination block," in which each player i with σ_i(x_i) sends x_i by cheap talk simultaneously. At the end of the finitely repeated game, we insert the "report block," in which the players report their whole histories of the finitely repeated game by cheap talk. The details of the report block will be explained in Section 12. These blocks are instantaneous with cheap talk but will take multiple periods when we dispense with cheap talk in Supplemental Materials 3 and 4.
Between the coordination and report blocks, the players play the T_P-period finitely repeated game. We divide the T_P periods into L "main blocks." Each of the first (L − 1) blocks is further divided into the following five rounds: for l ∈ {1, ..., L − 1}, the lth block consists of the T-period review round, the T^{1/2}-period supplemental round 1 for λ_1(l + 1), the T^{1/2}-period supplemental round 2 for λ_1(l + 1), the T^{1/2}-period supplemental round 1 for λ_2(l + 1) and the T^{1/2}-period supplemental round 2 for λ_2(l + 1).[8] The last (Lth) block consists only of the T-period review round. In the lth block, the sets of periods in the review round, the supplemental round 1 for λ_i(l + 1) and the supplemental round 2 for λ_i(l + 1) are denoted by T(l), T(l, λ_i, 1) and T(l, λ_i, 2), respectively. For sufficiently large T, the length of the review round is much larger than that of the other four rounds, and the payoffs from the review rounds approximately determine the equilibrium payoff. See Figure 5 for the illustration.
[Insert Figure 5]
We show that, for sufficiently large T and sufficiently large δ, with T_P = (L − 1){T + 4T^{1/2}} + T, there exist σ_i(x_i) and π_i(x_j, · : δ) satisfying (4), (5) and (6).
7 Almost Optimality
Instead of proving (4), we only establish "almost optimality up to exp(−T^{1/4}) > 0" until Section 12: for all i ∈ I and x ∈ {G, B}², for any period τ and history h_i^τ, the loss from playing the continuation strategy σ_i(x_i) | h_i^τ is smaller than exp(−T^{1/4}):

max_{σ_i^{T_P} ∈ Σ_i^{T_P}} E[ Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(x_j, h_j^{T_P+1} : δ) | h_i^τ, σ_i^{T_P}, σ_j(x_j) ]
 − E[ Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(x_j, h_j^{T_P+1} : δ) | h_i^τ, σ(x) ] ≤ exp(−T^{1/4}).   (21)

Footnote 8: Throughout the paper, we neglect the integer problem, since it is handled by replacing each variable s that should be an integer with ⌊s⌋ = max{n ∈ N : n ≤ s}.
To see why this is sufficient, remember that we insert the report block, in which the players report the whole history h_i at the end of the finitely repeated game (see Figure 5). Based on the reported history ĥ_i, player j adjusts the reward function π_i(x_j, · : δ) so that the prescribed action is exactly optimal. This adjustment is very small for large T, since the original strategy was optimal up to a loss of exp(−T^{1/4}) by (21).
The remaining task is to show the incentive to tell the truth about h_i. Intuitively, for some norm, player j punishes player i in proportion to ‖h_j − E[h_j | ĥ_i]‖². For a properly chosen norm, the first-order condition for minimizing the expected punishment E[‖h_j − E[h_j | ĥ_i]‖² | h_i] is to tell the truth: ĥ_i = h_i. Since the adjustment is small, this punishment can also be small and does not affect the equilibrium payoff. Section 12 formalizes the argument.
Therefore, until Section 12, our objective is to construct σ_i(x_i) and π_i(x_j, · : δ) satisfying (21), (5) and (6).
8 Equilibrium Strategies
In this section, we partially define σ_i(x_i) and π_i(x_j, · : δ). In Section 8.1, we define the equilibrium action given λ̂_j(l). Then, in Section 8.2, we define λ̂_j(l) given player i's inference of player j's messages in the supplemental rounds. The definition of player i's inference of player j's messages in the supplemental rounds is deferred to Section 9. In addition, Section 8.3 defines the reward function π_i(x_j, · : δ).
8.1 Actions Given λ̂_j(l)
In this subsection, we explain the strategy σ_i(x_i) given λ̂_j(l) ∈ {G, B}. The transition of λ̂_j(l) will be explained in the next subsection.
In each lth review round, player i with σ_i(x_i) takes a_i(x_i), with

a_i(x_i) ≡ { C_i if x_i = G; D_i if x_i = B },   (22)

if λ̂_j(l) = G, and takes D_i if λ̂_j(l) = B. See Figure 3 for the equilibrium action.
As player j calculates X_j(l), defined in (12), to monitor player i after x_i = G, player i calculates

X_i(l) ≡ { Σ_{t∈T_i(l)} ψ_{i,t}^{(C_j,C_i)} + 1_{t_i(l)} if x_i = G; Σ_{t∈T_i(l)} ψ_{i,t}^{(C_j,D_i)} + 1_{t_i(l)} if x_i = B }   (23)

if x_j = G. Then, λ_i(l + 1) = B is the record of an erroneous history:

X_i(l) ∉ [q_2T − 2εT, q_2T + 2εT].   (24)

That is,

λ_i(l + 1) = { G if l = 0 or X_i(l̃) ∈ [q_2T − 2εT, q_2T + 2εT] for all l̃ ≤ l; B otherwise }   (25)

if the cheap talk message is x_j = G, and λ_i(l + 1) = G if x_j = B. Review Figure 2 for the graphical explanation (note that the roles of i and j are reversed).
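To fix ideas, the test in (24)–(25) is just a running band check on the per-round score sums. The sketch below is a toy illustration under our own assumptions (the function names, parameter values and the i.i.d. Bernoulli simulation are ours, not part of the model):

```python
import random

def lambda_flag(score_sums, T, q2, eps):
    # (25), sketched: the flag turns B as soon as any review round's
    # score sum X_i(l) leaves the band [q2*T - 2*eps*T, q2*T + 2*eps*T].
    lo, hi = q2 * T - 2 * eps * T, q2 * T + 2 * eps * T
    for X in score_sums:
        if not (lo <= X <= hi):
            return "B"
    return "G"

# Toy run: T = 100 periods per review round; on path the binary scores
# are roughly i.i.d. Bernoulli(q2), so X_i(l) concentrates near q2*T.
random.seed(0)
T, q2, eps = 100, 0.6, 0.05
X = sum(1 for _ in range(T) if random.random() < q2)
print(lambda_flag([X], T, q2, eps))
```

With these toy numbers the band is [50, 70], so an on-path score sum keeps the flag at G except with the small probability that the central limit theorem controls.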
Then, in the supplemental rounds 1 and 2 for λ_i(l + 1), player i sends λ_i(l + 1). How to send the message will be defined in Section 9.2. Analogously, in the supplemental rounds 1 and 2 for λ_j(l + 1), player j sends λ_j(l + 1), which is defined symmetrically. While player j sends the message, player i takes C_i. Let λ_j(l + 1)(i) be player i's inference of the message λ_j(l + 1), which will be defined in Section 9.2.
8.2 Inference of λ_j(l + 1)
In this subsection, we explain the transition of player i's working inference of λ_j(l + 1), denoted λ̂_j(l + 1) ∈ {G, B}. Here, we take player i's inference of player j's messages in the supplemental rounds for λ_j(l + 1) (denoted by λ_j(l + 1)(i) ∈ {G, B}) as given. See Section 9 for the explanation of λ_j(l + 1)(i).
8.2.1 Statistics
Since λ_j(l + 1) depends on X_j(l) = Σ_{t∈T_j(l)} ψ_{j,t}^{(a_i(x_i),a_j(x_j))} + 1_{t_j(l)}, as we have mentioned in (23), (24) and (25), we first formally define ψ_{j,t}^a. Since ψ_{j,t}^a is i.i.d. within each review round, we omit the time subscript when it is not confusing.
We want ψ_j^a to satisfy the following two conditions. First, player j needs to statistically monitor whether player i takes a_i or not. In particular, as we have mentioned in (9), we want to establish that

E[ψ_j^a | ã_i, a_j] = { q_2 for ã_i = a_i; q_1 for all ã_i ≠ a_i }

with q_2 > q_1.
Second, as we will see in Section 8.2.2, player i's continuation play depends on E[ψ_j^a | a, y_i]. We want to prevent player j from deviating from a_j to ã_j ≠ a_j in order to manipulate E[ψ_j^a | a, y_i]. It is sufficient that

E_{y_i}[ E[ψ_j^a | a, y_i] | a_i, ã_j ]

is constant with respect to ã_j. That is, player i calculates the conditional expectation of ψ_j^a believing that a_j is taken; the ex ante value of this conditional expectation is then constant even if player j secretly deviates to ã_j ≠ a_j.
In summary, we want to construct ψ_j^a satisfying the following conditions:

Condition 1. We want to have:
1. ψ_j^a monitors a_i:

E[ψ_j^a | ã_i, a_j] = { q_2 for ã_i = a_i; q_1 for all ã_i ≠ a_i }.

2. Player j cannot manipulate the ex ante value of E[ψ_j^a | a, y_i]: for all ã_j ∈ A_j,

E_{y_i}[ E[ψ_j^a | a, y_i] | a_i, ã_j ] = q_2.
We construct ψ_j^a in the following two steps. First, we define a function f_j^a : Y_j → (0, 1). Second, player j constructs a random variable ψ_j^a ∈ {0, 1} from f_j^a(y_j) as follows: after taking a_j and observing y_j, player j calculates f_j^a(y_j). After that, player j draws a random variable from the uniform distribution on [0, 1]. If the realization of this random variable is less than f_j^a(y_j), then ψ_j^a = 1; otherwise, ψ_j^a = 0.
Since Pr({ψ_j^a = 1} | ã, y) = f_j^a(y_j) for all ã ∈ A and y ∈ Y, Condition 1 is satisfied for ψ_j^a if and only if f_j^a(y_j) satisfies

1. E[f_j^a(y_j) | ã_i, a_j] ≡ Σ_{y_j} q(y_j | ã_i, a_j) f_j^a(y_j) = { q_2 if ã_i = a_i; q_1 if ã_i ≠ a_i };   (26)

2. E_{y_i}[ E_{y_j}[f_j^a(y_j) | a, y_i] | a_i, ã_j ] is constant with respect to ã_j; that is, for all ã_j ∈ A_j,

Σ_{y_i} { Σ_{y_j} f_j^a(y_j) q(y_j | a, y_i) } q(y_i | a_i, ã_j) = q_2.   (27)
Formally, we show the following lemma:

Lemma 3. Generically, the following statement is true: there exist q_2 > q_1 such that, for each i ∈ I and a ∈ A, there exists a function f_j^a : Y_j → (0, 1) such that (26) and (27) are satisfied.
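Concretely, the two-step construction is Bernoulli thinning: the score equals 1 with probability exactly equal to the value of the conditional probability function at the observed signal, so its conditional mean is the left-hand side of (26). A minimal sketch with a hypothetical three-signal distribution (f, q_on_path and q_deviate are our own toy numbers, not objects from the paper):

```python
import random

def draw_score(f, y, rng=random.random):
    # Step 2: after observing signal y, draw u ~ U[0, 1] and set
    # psi = 1 iff u < f(y), so that Pr(psi = 1 | y) = f(y).
    return 1 if rng() < f(y) else 0

def expected_score(f, q):
    # E[psi | a] = sum_y q(y | a) * f(y): the left-hand side of (26).
    return sum(q[y] * f(y) for y in q)

f = {"y1": 0.9, "y2": 0.5, "y3": 0.1}.get
q_on_path = {"y1": 0.5, "y2": 0.3, "y3": 0.2}   # hypothetical q(. | a_i, a_j)
q_deviate = {"y1": 0.2, "y2": 0.3, "y3": 0.5}   # hypothetical q(. | deviation)
print(round(expected_score(f, q_on_path), 2),   # plays the role of q2
      round(expected_score(f, q_deviate), 2))   # plays the role of q1 < q2
```

In this toy example the on-path mean exceeds the deviation mean, which is exactly the monitoring property that part 1 of Condition 1 demands; Lemma 3 asserts that such an f generically exists while also satisfying the non-manipulability condition (27).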
8.2.2 Inference
Now we are ready to explain the transition of λ̂_j(l + 1) ∈ {G, B}. If x_i = B, then λ_j(l + 1) = G is common knowledge, and so we define λ̂_j(l + 1) = G.
We are left to consider the case with x_i = G. Since λ_j(1) = G is common knowledge, we define λ̂_j(1) = G. Further, since λ_j(l + 1) = B once λ_j(l̃) = B has happened for some l̃ ≤ l by (25), we define λ̂_j(l + 1) = B once λ̂_j(l̃) = B has happened for some l̃ ≤ l. Hence, we are left to specify, conditional on x_i = G and λ̂_j(l̃) = G for all l̃ ≤ l, how λ̂_j(l + 1) ∈ {G, B} is determined.
Suppose λ̂_j(l) = G is a correct inference: λ_j(l) = G. Then λ_j(l + 1) is determined as

λ_j(l + 1) = { G if X_j(l) ∈ [q_2T − 2εT, q_2T + 2εT]; B if X_j(l) ∉ [q_2T − 2εT, q_2T + 2εT] }   (28)

with X_j(l) = Σ_{t∈T_j(l)} ψ_{j,t}^{(a_i(x_i),a_j(x_j))} + 1_{t_j(l)}. Therefore, it is natural to consider the conditional expectation of X_j(l),

E[X_j(l) | {a_t, y_{i,t}}_{t∈T(l)}].

Instead of using this, we consider

Σ_{t∈T_i(l)} E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] + q_2

with a(x) = (a_i(x_i), a_j(x_j)). Two reminders are in order. First, player i calculates the expectation of the summation of ψ_{j,t}^{a(x)} over T_i(l) (the set of periods that player i uses to monitor player j), not over T_j(l) (the set of periods that player j uses to monitor player i).[9] As we will see, since T_i(l) and T_j(l) differ in at most two periods, this difference is negligible for almost optimality (21). Second, player i conditions on a(x) being taken. Player i with λ̂_j(l) = G takes a_i(x_i) = C_i in the lth review round. In addition, as we have explained in Section 5.2.2, player j makes player i indifferent between any action profiles for the rest of the finitely repeated game if λ̂_i(l) = B. Therefore, to calculate the conditional expectation, player i always conditions on λ̂_i(l) = G and on player j taking a_j(x_j) in the lth review round.

Footnote 9: The term q_2 reflects the fact that the expected value of 1_{t_j(l)} is q_2.
Further, instead of using E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] directly, player i constructs (E_i ψ_j^{a(x)})_t ∈ {0, 1} as follows: after taking a_i(x_i) and observing y_{i,t}, player i calculates E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}]. After that, player i draws a random variable from the uniform distribution on [0, 1]. If the realization of this random variable is less than E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}], then (E_i ψ_j^{a(x)})_t = 1; otherwise, (E_i ψ_j^{a(x)})_t = 0. Let

E_i X_j(l) = Σ_{t∈T_i(l)} (E_i ψ_j^{a(x)})_t + q_2.

Since Pr({(E_i ψ_j^{a(x)})_t = 1} | a_t, y_t) = E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] for all a_t and y_t, conditional on {a_t, y_t}_{t∈T(l)}, the central limit theorem implies that, except with probability of order exp(−T),

| Σ_{t∈T_i(l)} E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] + q_2 − E_i X_j(l) | ≤ (1/4)εT.   (29)
There are the following cases:

1. (29) is not satisfied. Let τ_i(l) = B denote this event. Player i will use the reward θ_j^{x_i}(a_{i,t}, y_{i,t}) defined in Lemma 2 in the subsequent rounds, and so player j is indifferent between any action profiles. Hence, this case is excluded from player j's consideration. Player i's inference of λ_j(l + 1) will be equal to her inference of the messages about λ_j(l + 1) in the supplemental rounds 1 and 2 for λ_j(l + 1): λ̂_j(l + 1) = λ_j(l + 1)(i).

2. (29) is satisfied. Let τ_i(l) = G denote this event. Consider the following subcases:

(a) We have

E_i X_j(l) ∉ [q_2T − (1/2)εT, q_2T + (1/2)εT].

Let ρ_i(l) = B denote this event. Player i will use the reward θ_j^{x_i}(a_{i,t}, y_{i,t}) in the subsequent rounds. However, compared with Case 1, player j does not exclude this case from her consideration. Again, player i uses her inference of the messages in the supplemental rounds: λ̂_j(l + 1) = λ_j(l + 1)(i).

(b) We have

E_i X_j(l) ∈ [q_2T − (1/2)εT, q_2T + (1/2)εT].   (30)

Player i randomly picks one of the following two procedures:

i. With small probability p > 0, player i will use the reward θ_j^{x_i}(a_{i,t}, y_{i,t}) in the subsequent rounds. Again, player i uses her inference of the messages in the supplemental rounds: λ̂_j(l + 1) = λ_j(l + 1)(i). Let ρ_i(l) = B denote this event (hence, ρ_i(l) = B implies that either 2-(a) or 2-(b)-i happens).

ii. With large probability 1 − p, player i believes λ_j(l + 1) = G regardless of the history in the supplemental rounds 1 and 2 for λ_j(l + 1). Let ρ_i(l) = G denote this event.

Then, we define ρ_i(l) ∈ {G, B} after τ_i(l) = B in the same way as after τ_i(l) = G: if (30) is not satisfied, then ρ_i(l) = B; if (30) is satisfied, then with probability p we have ρ_i(l) = B, and with probability 1 − p we have ρ_i(l) = G.

For concreteness, we define that player i who has deviated before the supplemental round 1 for λ_j(l + 1) uses λ̂_j(l + 1) = λ_j(l + 1)(i).
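The classification above is a small decision rule. The following sketch uses our own labels: dev stands for the event that the approximation bound (29) fails, and p is the small randomization probability from case 2-(b)-i; it maps the two tests to the flags and to whether player i is ready to listen:

```python
import random

def classify(dev, EiXj, T, q2, eps, p, rng=random.random):
    # tau = B iff the bound (29) fails; player j excludes this case.
    tau = "B" if dev else "G"
    # (30): is E_i X_j(l) inside the half-width band?
    in_band = q2 * T - eps * T / 2 <= EiXj <= q2 * T + eps * T / 2
    # rho = B in case 2-(a) (outside the band) and, with small
    # probability p, in case 2-(b)-i; otherwise rho = G (case 2-(b)-ii).
    rho = "G" if in_band and rng() >= p else "B"
    # Player i listens to the supplemental-round messages in
    # cases 1, 2-(a) and 2-(b)-i; in case 2-(b)-ii she infers G regardless.
    listen = tau == "B" or rho == "B"
    return tau, rho, listen
```

Stubbing the uniform draw makes the rule deterministic; for example, classify(False, 60, 100, 0.6, 0.05, 0.01, rng=lambda: 0.5) returns ("G", "G", False), the case 2-(b)-ii branch.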
If 2-(b)-ii is the case, then, since T_i(l) and T_j(l) differ in at most two periods, after learning T_i(l) and T_j(l), (29) and (30) imply

E[X_j(l) | a(x), {y_{i,t}}_{t∈T(l)}] ∈ [q_2T − εT, q_2T + εT].   (31)

Conditional on a(x) and {y_{i,t}}_{t∈T(l)}, the distribution of X_j(l) is approximately normal with mean E[X_j(l) | a(x), {y_{i,t}}_{t∈T(l)}] and standard deviation of order T^{1/2}, by the central limit theorem. Since (31) implies that the conditional mean is inside [q_2T − 2εT, q_2T + 2εT] with a margin of at least εT, which is εT^{1/2} times T^{1/2} (the order of the standard deviation of the conditional distribution of X_j(l) given a(x) and {y_{i,t}}_{t∈T(l)}), player i believes that X_j(l) ∈ [q_2T − 2εT, q_2T + 2εT] with probability of order 1 − exp(−T). Hence, player i believes λ_j(l + 1) = G with high probability, and so λ̂_j(l + 1) = G.
If 1, 2-(a) or 2-(b)-i is the case, we say that "player i is ready to listen (to player j's message in the supplemental rounds)."
See the following figure for the full classification:
[Insert Figure 6]
Boxes 3, 5, 7 and 10 respectively correspond to Cases 1, 2-(b)-ii, 2-(a) and 2-(b)-i.
Note that, as we have mentioned in Figure 4 (with the roles of i and j reversed), when player i has λ̂_j(l + 1) = B, one of Boxes 3, 7 and 10 must be the case, and player j will be indifferent between any action profiles.
In addition, whenever player i uses the messages in the supplemental rounds to infer λ_j(l + 1), player i has made player j indifferent between any action profiles. Therefore, we do not need to consider the truthtelling incentive for player j in the supplemental rounds 1 and 2 for λ_j(l + 1).
Further, as we mentioned when deriving (29), conditional on {a_t, y_t}_{t∈T(l)}, (29) fails with probability of order exp(−T) by the central limit theorem. And (27) of Lemma 3 implies that the distribution of E_i X_j(l) is independent of player j's action. Therefore, player j cannot manipulate ρ_i(l) ∈ {G, B} by changing her own action in the lth review round.
Therefore, we have proven the following lemma:

Lemma 4. For sufficiently large T, for all x ∈ {G, B}², if λ̂_j(l) = λ̂_i(l) = G, then:
1. Conditional on any {a_t, y_t}_{t∈T(l)}, τ_i(l) = G with probability no less than 1 − exp(−T^{4/5}).
2. The distribution of ρ_i(l) ∈ {G, B} is independent of player j's action.
3. If 2-(b)-ii is the case in the above explanation and λ_j(l) = λ̂_j(l) = G, then player i's conditional belief in λ_j(l + 1) = G at the end of the lth review round is no less than 1 − exp(−T^{4/5}).
4. Whenever player i uses her inference of player j's messages in the supplemental rounds for λ_j(l + 1), player i has made player j indifferent between any action profiles in the rounds after the lth review round.
8.3 Reward Function
In this subsection, we explain player j's reward function for player i, π_i(x_j, · : δ). The following notation is useful: let r ∈ {1, ..., L} ∪ {(l, λ_1, 1), (l, λ_1, 2)}_l ∪ {(l, λ_2, 1), (l, λ_2, 2)}_l be the generic index of the rounds. From Figure 5, there is a chronological order of the rounds. Hence, with abuse of notation, we identify round r + 1 with the round coming right after r. For example, with r = l (the lth review round), r + 1 is the supplemental round 1 for λ_1(l + 1). In addition, let r < l if and only if round r is before the lth review round, and r ≤ l if and only if r < l or r = l.
The total reward is the summation of the following rewards in the review rounds and those in the supplemental rounds.

The lth review round. If τ_j(r) = B, κ_j(r) = B or ρ_j(r) = B happens for some r < l, then player j makes player i indifferent between any action profile sequences by

π_i(x_j, h_j^{T_P+1}, l) = { Σ_{t∈T(l)} δ^{t−1} θ_i^G(a_{j,t}, y_{j,t}) if x_j = G; Σ_{t∈T(l)} δ^{t−1} θ_i^B(a_{j,t}, y_{j,t}) if x_j = B }.   (32)

If τ_j(r) = κ_j(r) = ρ_j(r) = G for all r < l, then player j's reward is based on x and λ_j(l). Remember that the basic structure explained in Section 5.2.2 is summarized in the following figure:
[Insert Figure 7]
The formal description is given by

π_i(x_j, h_j^{T_P+1}, l) =
{ β_i(x, l, G) − β̄T if x_j = G and x_i = B;
  β_i(x, l, G) + β̄T if x_j = B and x_i = B;
  β_i(x, l, G) + β_L{X_j(l) − (q_2T + 2εT)} − β̄T if x_j = G, x_i = G and λ_j(l) = G;
  β_i(x, l, B) − β̄T if x_j = G, x_i = G and λ_j(l) = B;
  β_i(x, l, G) + β_L{X_j(l) − (q_2T − 2εT)} + β̄T if x_j = B, x_i = G and λ_j(l) = G;
  β_i(x, l, B) + β̄T if x_j = B, x_i = G and λ_j(l) = B }.   (33)

Here, β_i(x, l, λ_j(l)) will be determined in Section 11 so that (21), (5) and (6) are satisfied.
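Reading the branches of (33) as code may help: each branch is a base constant plus an offset whose sign depends on x_j, and, when x_i = G and λ_j(l) = G, a linear term in the realized score X_j(l). The sketch below uses stand-in names and made-up coefficients and follows our reading of the case structure; it is illustrative only:

```python
def review_reward(xj, xi, lam_j, Xj, T, q2, eps, beta, beta_L, beta_bar):
    # beta is a stand-in for the constants beta_i(x, l, .) fixed in Section 11.
    sign = -1 if xj == "G" else 1          # offset -bar(beta)*T or +bar(beta)*T
    if xi == "B":
        return beta["G"] + sign * beta_bar * T
    if lam_j == "B":
        return beta["B"] + sign * beta_bar * T
    # x_i = G and lambda_j(l) = G: the reward increases linearly in X_j(l).
    pivot = q2 * T + 2 * eps * T if xj == "G" else q2 * T - 2 * eps * T
    return beta["G"] + beta_L * (Xj - pivot) + sign * beta_bar * T
```

The linear slope in the λ_j(l) = G branches is what makes C_i uniquely optimal there, while every other branch is constant in the score, matching the set of almost optimal actions described in Section 8.3.1.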
The supplemental rounds 1 and 2 for λ_1(l + 1) and λ_2(l + 1). Player j makes player i indifferent between any action profile sequences by

Σ_t δ^{t−1} θ_i^{x_j}(a_{j,t}, y_{j,t}).   (34)

On top of that, in the supplemental rounds 1 and 2 for λ_j(l + 1), where player i is the receiver of the message, player j adds

{ −β_L Σ_t (1 − ψ_{j,t}^{(C_i,a_{j,t})}) ≤ 0 if x_j = G; β_L Σ_t ψ_{j,t}^{(C_i,a_{j,t})} ≥ 0 if x_j = B }   (35)

to make it strictly optimal to take C_i. Note that (35) is linearly increasing in ψ_{j,t}^{(C_i,a_{j,t})}.

The summary of rewards after τ_j = B, κ_j = B or ρ_j = B. In the lth review round, if τ_j(r) = B, κ_j(r) = B or ρ_j(r) = B has happened for some r < l, then (32) implies that any action profile gives the same payoff. On the other hand, in the supplemental rounds where player i receives the message, (35) remains in place even after τ_j = B, κ_j = B or ρ_j = B. Since (34) cancels out the differences in the instantaneous utilities with respect to the action profile, taking C_i is strictly optimal by (35) and gives a constant expected payoff.
With abuse of notation, we say that "player i is indifferent between any action profiles" if τ_j = B, κ_j = B or ρ_j = B has happened, although player i strictly prefers C_i while she receives the message.
For concreteness, we define that, after player j deviates in round r, player j sets τ_j(r) = B and makes any action optimal for player i.
8.3.1 Set of Almost Optimal Actions
Given the above reward function, let A_i(l) be the set of player i's optimal actions in the lth review round. As we will show in Section 11, A_i(l) is as follows. If τ_j(r) = B, κ_j(r) = B or ρ_j(r) = B happens for some r < l (so that (32) is being used), then A_i(l) = A_i. Otherwise (so that (33) is being used), A_i(l) depends on x_i ∈ {G, B} and λ̂_j(l) ∈ {G, B}. If x_i = λ̂_j(l) = G, then, since the reward is increasing in X_j(l), A_i(l) = {C_i}. If x_i = B or λ̂_j(l) = B, then, since the reward is constant, A_i(l) = {D_i}.
9 Inference of the Messages
In this section, we define how player i infers player j's messages in the supplemental rounds for λ_j(l + 1); that is, how λ_j(l + 1)(i) ∈ {G, B} is determined.
9.1 Conditions on the Message Exchange
We derive conditions on λ_j(l + 1)(i) that make the inferences explained in Figure 6 almost optimal. See Condition 2 below for the summary.
First, if (29) and (30) are satisfied, player i needs to believe that λ_j(l + 1) = G is true with high probability regardless of player i's continuation history. By Part 3 of Lemma 4, before seeing player j's actions in the continuation play, player i believes that the probability of a mistake is no more than

exp(−T^{4/5}).   (36)

As we will see in Sections 9.2.1 and 9.2.2, player j's strategy in the supplemental rounds is determined by λ_j(l + 1). In addition, as we have seen in Section 8.1, player j's strategy in the review rounds is determined by λ̂_i(l + 1). Since the length of the supplemental rounds is 4LT^{1/2}, observations there update player i's belief only by a factor of order exp(T^{1/2}). Since (36) is much smaller than 1/exp(T^{1/2}), player i keeps a high belief in λ_j(l + 1) = G regardless of player i's history in the supplemental rounds.
In addition, player j's actions in the review rounds indirectly reveal λ_j(l + 1) through λ̂_i(l + 1). Suppose player i knows λ̂_i(l + 1). See Figure 8, which summarizes the important features of λ_i and λ_j explained in Figure 6. Here, a_i(l + 1) is player i's equilibrium action in the (l + 1)th review round prescribed in Section 8.1. Boxes 1 to 4 are about how player i infers λ_j(l + 1). Remember that player j infers λ_i(l + 1) symmetrically, which is expressed in Boxes 5 to 11. Whether player j is in Box 5 or 6, player j uses λ_i(l + 1)(j) to determine λ̂_i(l + 1) with probability at least p. Again, since the length of the supplemental rounds is 4LT^{1/2}, any λ_i(l + 1)(j) can occur with probability of order exp(−T^{1/2}). Therefore, player i believes that any λ̂_i(l + 1) occurs with probability at least of order p exp(−T^{1/2}).[10] Since (36) is much smaller than p exp(−T^{1/2}), player i can keep a high belief in λ_j(l + 1) = G regardless of player i's history in the review rounds.
[Insert Figure 8]
In summary, if (29) and (30) are satisfied, player i believes that λ_j(l + 1) = G is true with high probability regardless of player i's continuation history, as desired.
Second, when player i uses λ_j(l + 1)(i) to determine λ̂_j(l + 1), there are two cases: λ_j(l + 1)(i) = λ_j(l + 1) and λ_j(l + 1)(i) ≠ λ_j(l + 1). If the former is the case, then the inference is optimal. If the latter is the case, then, for almost optimality, it suffices to show that, whenever λ_j(l + 1)(i) ≠ λ_j(l + 1), conditional on λ_j(l + 1) (that is, even after player i learns that her inference was not correct), player i believes with high probability that player j makes player i indifferent between any actions: A_i(l + 1) = A_i.

Footnote 10: As we will see in Lemma 5, excluding Box 12 does not affect the posterior by much.
Therefore, we want to have the following:

Condition 2. For the supplemental rounds, we want the following: conditional on λ_j(l + 1) and λ_j(l + 1)(i), if λ_j(l + 1)(i) ≠ λ_j(l + 1), then player i believes that A_i(l + 1) = A_i with high probability.
9.2 Message Protocol
In this section, we explain how we establish Condition 2 in the supplemental rounds 1 and 2 for λ_j(l + 1), where player j sends λ_j(l + 1) ∈ {G, B} to player i.
The following notation is useful. Let a_j^G = C_j and a_j^B = D_j.[11] Let q_j(a) = (q_j(y_j | a))_{y_j} be the vector of player j's signal distribution under a. In addition, let 1_{y_{j,t}} be the |Y_j| × 1 vector whose element corresponding to y_{j,t} is 1 and whose other elements are 0. q_i(a) = (q_i(y_i | a))_{y_i} and 1_{y_{i,t}} are defined symmetrically. ε > 0 is a small number that will be specified in Section 10.
9.2.1 Supplemental Round 1 for λ_j(l + 1)
First, we intuitively explain the supplemental round 1 for λ_j(l + 1), which is summarized in Figure 9 below. Let λ̄_j ∈ {G, B} be the true message, that is, λ_j(l + 1) = λ̄_j. Player j (the sender) takes a_j^{λ̄_j} ∈ {a_j^G, a_j^B} for T^{1/2} periods, while player i (the receiver) takes a_i^G for T^{1/2} periods.
Suppose player j requires player i to infer the message in the following way. From the set of periods in the current round, T(l, λ_j, 1) (see Figure 5), player j randomly picks a period t_j(l, λ_j, 1). Player j calculates the frequency of her signal observations, dropping period t_j(l, λ_j, 1): with T_j(l, λ_j, 1) ≡ T(l, λ_j, 1) \ {t_j(l, λ_j, 1)}, let y_j(l, λ_j, 1) ≡ (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} 1_{y_{j,t}}. (i) If the distance from y_j(l, λ_j, 1) to the affine hull of {q_j(a_j^{λ̄_j}, a_i)}_{a_i} (denoted aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i})) is no more than ε, then player j requires player i to infer λ_j(l + 1) properly.[12] (ii) Otherwise, player j will make any action optimal for player i (let τ_j(l, λ_j, 1) = B denote this case; then, from (32), A_i(l̃ + 1) = A_i for all l̃ ≥ l). Since we take the affine hull with respect to player i's actions, whether (i) or (ii) is the case is out of player i's control. By the central limit theorem, (ii) occurs with small probability and does not affect the equilibrium payoff.

Footnote 11: In a general game, as we will see in the Supplemental Materials, any a_j^G and a_j^B with a_j^G ≠ a_j^B generically work.
Footnote 12: As we will see in Section 9.2.2, there are subcases of (i) in which player j makes player i indifferent between any action profiles, depending on the realization of the supplemental round 2 for λ_j(l + 1).
Player i (the receiver) calculates the conditional expectation of y_j(l, λ_j, 1). For Condition 2, it suffices that player i infers (iii) λ_j(l + 1)(i) = G if the distance from E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^G, a_i^G] to aff({q_j(a_j^G, a_i)}_{a_i}) is no more than 2ε, and (iv) λ_j(l + 1)(i) = B if the distance from E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^B, a_i^G] to aff({q_j(a_j^B, a_i)}_{a_i}) is no more than 2ε. To see why this is sufficient, suppose that λ_j(l + 1) = G but player i infers λ_j(l + 1)(i) = B. Moreover, assume that player i knows that λ_j(l + 1) = G and that player j took a_j^G. Conditional on λ_j(l + 1) = G, the distance from E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^G, a_i^G] to aff({q_j(a_j^G, a_i)}_{a_i}) is more than 2ε (otherwise, (iii) would be the case and λ_j(l + 1)(i) = G). Notice that the conditional expectation uses the true action a_j^G. Since the conditional variance of y_j(l, λ_j, 1) is of order (T^{1/2} − 1)^{−1} by the central limit theorem, player i's belief in the event that the distance from y_j(l, λ_j, 1) to aff({q_j(a_j^G, a_i)}_{a_i}) is no more than ε is at most of order exp(−T^{1/2}). Hence, player i believes that A_i(l̃ + 1) = A_i for all l̃ ≥ l with high probability. The symmetric argument holds for λ_j(l + 1) = B.
For each λ_j ∈ {G, B}, instead of using E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^{λ_j}, a_i^G], player i calculates the following statistic: player i randomly picks a period t_i(l, λ_j, 1). With T_i(l, λ_j, 1) ≡ T(l, λ_j, 1) \ {t_i(l, λ_j, 1)}, player i calculates (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} E[1_{y_{j,t}} | y_{i,t}, a_j^{λ_j}, a_i^G] for each λ_j. Here, player i drops her own period t_i(l, λ_j, 1), not the period t_j(l, λ_j, 1) that player j drops. However, since T_j(l, λ_j, 1) and T_i(l, λ_j, 1) differ in at most two periods, the same argument as above asymptotically holds if we replace E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^{λ_j}, a_i^G] with

(1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} E[1_{y_{j,t}} | y_{i,t}, a_j^{λ_j}, a_i^G] = E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | y_i(l, λ_j, 1), a_j^{λ_j}, a_i^G ].

Here, y_i(l, λ_j, 1) ≡ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{i,t}}.
The above inference is well defined if there is ε̄ > 0 such that, for all ε < ε̄, there is no signal observation {y_{i,t}}_{t∈T_i(l,λ_j,1)} of player i such that both

the distance from E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | y_i(l, λ_j, 1), a_j^G, a_i^G ] to aff({q_j(a_j^G, a_i)}_{a_i}) is no more than 2ε   (37)

and

the distance from E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | y_i(l, λ_j, 1), a_j^B, a_i^G ] to aff({q_j(a_j^B, a_i)}_{a_i}) is no more than 2ε.   (38)

Since y_i(l, λ_j, 1) ∈ Δ({1_{y_i}}_{y_i∈Y_i}),[13] by continuity, a sufficient condition for the existence of such an ε̄ is that there is no y_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}) such that

E[y_j | y_i, a_j^G, a_i^G] ∈ aff({q_j(a_j^G, a_i)}_{a_i});   (39)

E[y_j | y_i, a_j^B, a_i^G] ∈ aff({q_j(a_j^B, a_i)}_{a_i}).   (40)

Here, E[y_j | y_i, a] is the conditional expectation of the frequency of player j's signal observations given the frequency of player i's signal observations over a given set of periods in which the players take a. (39) imposes |Y_j| − |A_i| conditions and (40) imposes |Y_j| − |A_i| conditions. Hence, there are 2(|Y_j| − |A_i|) conditions in total. On the other hand, we have |Y_i| − 1 degrees of freedom for y_i. Hence, if 2(|Y_j| − |A_i|) ≤ |Y_i| − 1, then there can be solutions to (39) and (40), and the inference above may not be well defined.
To deal with this problem, we consider the following procedure. (v) If player i observes y_i(l, λ_j, 1) whose distance from the affine hull of {q_i(a_j, a_i^G)}_{a_j} is no more than ε, then player i infers λ_j(l + 1)(i) in the supplemental round 1 for λ_j(l + 1). (vi) Otherwise, player i infers λ_j(l + 1)(i) in the supplemental round 2 for λ_j(l + 1), which will be explained later. Since we take the affine hull with respect to player j's actions, whether player i infers λ_j(l + 1) from the supplemental round 1 or 2 is out of player j's control (remember that the message receiver i takes a_i^G). By the central limit theorem, (vi) occurs with small probability and does not affect the equilibrium payoff.

Footnote 13: Here, Δ({1_{y_i}}_{y_i∈Y_i}) is the set of distributions over Y_i.
If (v) is the case, then the inferences (iii) and (iv) are generically well defined. To see why, notice that they are well defined if there exists ε > 0 such that there is no y_i(l, λ_j, 1) whose distance from aff({q_i(a_j, a_i^G)}_{a_j}) is no more than ε and for which (37) and (38) are satisfied. A sufficient condition is that there is no y_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}) such that

y_i ∈ aff({q_i(a_j, a_i^G)}_{a_j})   (41)

and such that (39) and (40) are satisfied. Since (41) imposes |Y_i| − |A_j| constraints, there are |Y_i| − |A_j| + 2(|Y_j| − |A_i|) constraints together with (39) and (40). Since there are |Y_i| − 1 degrees of freedom for y_i, if

|Y_i| − |A_j| + 2(|Y_j| − |A_i|) ≥ |Y_i|,

that is,

|Y_j| − |A_i| ≥ (1/2)|A_j|,

then there is generically no y_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}) satisfying (41), (39) and (40), as desired.
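The count above is a one-line comparison of constraints against degrees of freedom; a sketch (the cardinalities are toy numbers of our choosing):

```python
def generically_well_defined(Yi, Yj, Ai, Aj):
    # Constraints from (41), (39) and (40) versus the (|Y_i| - 1)-dimensional
    # freedom in y_i; True means the constraints generically admit no
    # solution, i.e., the inference is well defined.
    constraints = (Yi - Aj) + 2 * (Yj - Ai)
    return constraints >= Yi   # equivalently, |Y_j| - |A_i| >= |A_j| / 2

# Binary actions (|A_i| = |A_j| = 2): three signals per player suffice.
print(generically_well_defined(Yi=3, Yj=3, Ai=2, Aj=2))
```

With binary actions the condition reduces to |Y_j| ≥ 3, which is one way to read the abstract's requirement that each player's number of signals be sufficiently large.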
If (vi) is the case, then player i infers λ_j(l+1)(i) from the supplemental round 2. In this case, player i has τ_i(l, λ_j, 1) = B or ϑ_i(l, λ_j, 1) = B.[14] Then, from (32), player j is indifferent between any action profiles. That is, player j does not care how player i will play in the subsequent rounds. Therefore, player j excludes this case from consideration. In particular, player j does not care about player i's inference in this case. That is, whenever player j's message in the supplemental round 2 matters, player j puts belief 0 on this event and is indifferent between λ_j(l+1)(i) = G and λ_j(l+1)(i) = B.[15]

[14] See Section 15.4 for details.

[15] As we have seen in Figure 6, whenever player i uses λ_j(l+1)(i) to determine λ_j(l+1), player j is indifferent between any action profiles. Hence, player j is almost indifferent between any action profiles in the supplemental round 1. However, player j's history (and so her actions) in the supplemental round 1 affects player j's belief about player i's inference λ_j(l+1)(i), which, in turn, affects player j's belief about the optimality of λ_i(l+1), since player i's continuation strategy is jointly determined by λ_j(l+1)(i) and player i's history in the lth review round on which λ_i(l+1) is based, as we have seen in Figure 8 (with the roles of players i and j reversed). Hence, player j is almost indifferent, but not exactly indifferent, in the supplemental round 1 for λ_j(l+1). On the other hand, in the supplemental round 2, since player j excludes the cases where player i's history in the supplemental round 2 affects player i's continuation strategy, player j is exactly indifferent between any strategies. See Proposition 1 for the formal statement of the above discussion.

The summary of the cases is depicted in the following figure:

[Insert Figure 9]

9.2.2 Supplemental Round 2 for λ_j(l+1)

Second, we intuitively explain the supplemental round 2 for λ_j(l+1), which is summarized in Figures 10 and 11 below. Again, this round matters only if player i is in Box 8 of Figure 9 as a result of the supplemental round 1 for λ_j(l+1).

In this round, player j (the sender) sends the message about λ_j(l+1) again. Remember that λ̄_j is the true value: λ_j(l+1) = λ̄_j. This time, player j sends the message as follows. With η > 0 being a small number defined in Section 10, player j determines

z_j(λ̄_j) =
  λ̄_j with probability 1 − 2η,
  {G, B} \ {λ̄_j} with probability η,
  M with probability η,

and player j takes

a_j^{z_j(λ̄_j)} =
  a_j^G if z_j(λ̄_j) = G,
  a_j^B if z_j(λ̄_j) = B,
  (1/2) a_j^G + (1/2) a_j^B if z_j(λ̄_j) = M

for T^{1/2} periods. That is, player j sends the "true" message a_j^{z_j(λ̄_j)} = a_j^{λ̄_j} with high probability 1 − 2η, while player j "tells a lie" with probability 2η. With probability η, player j sends the opposite message z_j(λ̄_j) = {G, B} \ {λ̄_j}. With probability η, player j "mixes" the two messages: z_j(λ̄_j) = M and a_j^M = (1/2) a_j^G + (1/2) a_j^B. When player j tells a lie, player j makes player i indifferent between any actions: β_j(l, λ_j, 2) = B and A_i(l̃+1) = A_i for all l̃ ≥ l. Since a lie occurs with small probability 2η, it does not affect the equilibrium payoff.
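The sender's randomization can be sketched in a few lines; the function names, the value of η, and the string encoding of messages and actions are our illustrative choices, not the paper's notation:

```python
import random

# Sketch: the message z_j equals the truth with probability 1 - 2*eta, the
# opposite value with probability eta, and the mixing message M with
# probability eta; the action then follows the displayed rule, with M
# played as the half-half mixture of a_j^G and a_j^B.
def draw_message(truth, eta, rng):
    u = rng.random()
    if u < 1 - 2 * eta:
        return truth
    if u < 1 - eta:
        return "B" if truth == "G" else "G"   # the opposite of {G, B}
    return "M"

def draw_action(z, rng):
    if z == "M":
        return "aG" if rng.random() < 0.5 else "aB"
    return "aG" if z == "G" else "aB"

rng = random.Random(0)
draws = [draw_message("G", 0.05, rng) for _ in range(100_000)]
freq = {m: draws.count(m) / len(draws) for m in ("G", "B", "M")}
example_action = draw_action("M", rng)   # one draw of the mixed action
```

With η = 0.05, the empirical message frequencies come out close to (0.90, 0.05, 0.05), matching the (1 − 2η, η, η) scheme.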
Player i (the receiver) takes a_i^G. Player i infers z_j(λ̄_j) as follows. For the periods in the current round T(l, λ_j, 2) (see Figure 5), let y_i(l, λ_j, 2) = (y_i(l, λ_j, 2))_{y_i} be the vector representing the frequency with which player i observes y_i in T(l, λ_j, 2). Conditional on λ̄_j ∈ {G, B}, the likelihood ratio between z_j(λ̄_j) = z_j ∈ {G, B, M} and z_j(λ̄_j) = z'_j ∈ {G, B, M} is

Pr(z_j(λ̄_j) = z_j | λ̄_j, y_i(l, λ_j, 2)) / Pr(z_j(λ̄_j) = z'_j | λ̄_j, y_i(l, λ_j, 2))
  = [Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z_j) / Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z'_j)]
    × [Pr(z_j(λ̄_j) = z_j | λ̄_j) / Pr(z_j(λ̄_j) = z'_j | λ̄_j)].

The log of the first factor, log[Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z_j) / Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z'_j)], is expressed as T^{1/2} (L(y_i(l, λ_j, 2), z_j) − L(y_i(l, λ_j, 2), z'_j)) with

L(y_i(l, λ_j, 2), z_j) = y_{i,1}(l, λ_j, 2) log q(y_{i,1} | a_i^G, a_j^{z_j}) + ··· + y_{i,|Y_i|}(l, λ_j, 2) log q(y_{i,|Y_i|} | a_i^G, a_j^{z_j}).

Since L(y_i(l, λ_j, 2), z_j) is strictly concave with respect to the mixture of a_j^G and a_j^B, and y_i(l, λ_j, 2) lies in the compact set Δ({1_{y_i}}_{y_i∈Y_i}), there exists ζ > 0 such that one of the following is true:

1. z_j(λ̄_j) = G is sufficiently more likely than z_j(λ̄_j) = B: L(y_i(l, λ_j, 2), G) − ζ ≥ L(y_i(l, λ_j, 2), B).

2. z_j(λ̄_j) = B is sufficiently more likely than z_j(λ̄_j) = G: L(y_i(l, λ_j, 2), B) − ζ ≥ L(y_i(l, λ_j, 2), G).

3. If z_j(λ̄_j) = G and z_j(λ̄_j) = B are roughly equally likely, then, since L(y_i(l, λ_j, 2), z_j) is strictly concave, z_j(λ̄_j) = M is the most likely: L(y_i(l, λ_j, 2), M) − ζ ≥ L(y_i(l, λ_j, 2), G), L(y_i(l, λ_j, 2), B).
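A minimal sketch of the receiver's three-way classification, with hypothetical signal distributions (the numbers and names below are ours, not the paper's): it computes the log-likelihood score L for each message and separates the cases by a gap ζ.

```python
import math

# Hypothetical signal distributions q(y | a_i^G, a_j^z) for the three
# messages; M is the half-half mixture of the G and B action distributions.
q = {
    "G": [0.6, 0.3, 0.1],
    "B": [0.1, 0.3, 0.6],
}
q["M"] = [(g + b) / 2 for g, b in zip(q["G"], q["B"])]

def L(freq, z):
    """Per-period log-likelihood of the observed signal frequencies under z."""
    return sum(f * math.log(q[z][y]) for y, f in enumerate(freq))

def classify(freq, zeta):
    LG, LB = L(freq, "G"), L(freq, "B")
    if LG - zeta >= LB:
        return 1        # z = G sufficiently more likely than z = B
    if LB - zeta >= LG:
        return 2        # z = B sufficiently more likely than z = G
    return 3            # G and B roughly tied: by strict concavity, M wins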
If 1 is the case, then

Pr(z_j(λ̄_j) = G | λ̄_j, y_i(l, λ_j, 2)) / Pr(z_j(λ̄_j) = B | λ̄_j, y_i(l, λ_j, 2)) ≥ exp(ζ T^{1/2}) · η/(1 − 2η)

for λ̄_j ∈ {G, B}. Since z_j(λ̄_j) = M implies that player j told a lie and that player i is indifferent between any action profiles, player i can believe that, conditional on λ_j(l+1), inferring λ_j(l+1)(i) = G is almost optimal. Similarly, if 2 is the case, then player i can believe that, conditional on λ_j(l+1), inferring λ_j(l+1)(i) = B is almost optimal. Finally, if 3 is the case, then player i believes, conditional on λ_j(l+1), that player j told a lie, that any action profile is optimal with high probability, and that inferring λ_j(l+1)(i) = B is almost optimal.

In summary, there exists ζ̄ > 0 such that, for any history in the current round, conditional on the true λ_j(l+1), one of the following three cases is true:

1. z_j(λ̄_j) = G is more likely than z_j(λ̄_j) = B by a factor of exp(ζ̄ T^{1/2}).

2. z_j(λ̄_j) = B is more likely than z_j(λ̄_j) = G by a factor of exp(ζ̄ T^{1/2}).

3. z_j(λ̄_j) = M is the most likely by a factor of exp(ζ̄ T^{1/2}).

See Figure 10 for the illustration of player j's requirement and Figure 11 for player i's inference. Consider Condition 2: since the almost optimality is established conditional on λ_j(l+1), after realizing λ_j(l+1)(i) ≠ λ_j(l+1), player i believes that λ_j(l+1) = B and A_i(l̃+1) = A_i for all l̃ ≥ l with high probability.

[Insert Figures 10 and 11]
9.3 Formal Message Protocol

The formal definition of the supplemental rounds 1 and 2 for λ_j(l+1) is given in the Appendix. The difference from the above intuitive explanation is that there are several events, denoted by σ_j = B, which are excluded from player i's consideration. As we have mentioned in Section 9.2.1, when player j uses player i's message in the supplemental round 2 for λ_i(l+1), σ_j = B or ϑ_j = B has happened, and player i always assumes that her message in the supplemental round 2 is irrelevant. In summary, we can prove the following lemma.

Lemma 5 There exists ε̄ > 0 such that, for any ε ∈ (0, ε̄), for any l = 1, ..., L−1, for sufficiently large T, for any i, j ∈ I, there exists a message protocol in the supplemental rounds such that

1. In the supplemental round 1 for λ_j(l+1),

(a) Player j takes a_j^{λ_j(l+1)} and player i takes a_i^G for T^{1/2} periods.

(b) Player j creates σ_j(l, λ_j, 1), β_j(l, λ_j, 1) ∈ {G, B}.

(c) Player i creates τ_i(l, λ_j, 1), ϑ_i(l, λ_j, 1) ∈ {G, B}. If τ_i(l, λ_j, 1) = ϑ_i(l, λ_j, 1) = G, then λ_j(l+1)(i) is determined solely by player i's history in the supplemental round 1.

2. In the supplemental round 2 for λ_j(l+1),

(a) Player j takes a mixed strategy and player i takes a_i^G for T^{1/2} periods.

(b) Player j creates β_j(l, λ_j, 2) ∈ {G, B}.

(c) If λ_j(l+1)(i) is determined by player i's history in the supplemental round 2, then τ_i(l, λ_j, 1) = B or ϑ_i(l, λ_j, 1) = B in the previous round.

3. Let σ_j, ϑ_j and β_j be {σ_j(l̃), σ_j(l̃, λ_j, 1), τ_j(l̃, λ_i, 1)}_{l̃=1}^{L−1}, {ϑ_j(l̃, λ_i, 1)}_{l̃=1}^{L−1} and {β_j(l̃, λ_j, 1), β_j(l̃, λ_j, 2)}_{l̃=1}^{L−1}, respectively. Then

(a) Conditional on {λ_j(l̃+1)}_{l̃=1}^{L−1} and "σ_j and ϑ_j being all G," player i believes that "λ_j(l+1)(i) = λ_j(l+1) or β_j(l, λ_j, 1) = B or β_j(l, λ_j, 2) = B" with probability no less than 1 − exp(−T^{1/3}).

(b) Conditional on {λ_j(l̃+1)}_{l̃=1}^{L−1} and "σ_j and ϑ_j being all G," any history of player i in the supplemental rounds happens with probability no less than exp(−T^{1/3}).

(c) Conditional on {λ_j(l̃+1)}_{l̃=1}^{L−1} and "σ_j and ϑ_j being all G," conditional on any history of player i in all the supplemental rounds, player i believes that any λ_i(l+1)(j) is possible with probability no less than exp(−T^{1/3}).

(d) σ_j and ϑ_j are all G with probability no less than 1 − exp(−T^{1/3}).

(e) The distribution of β_j is independent of player i's strategy with probability no less than 1 − exp(−T^{1/3}).

(f) The distribution of {σ_j, ϑ_j} is independent of player i's strategy.
10 Variables

In this section, we show that all the variables can be taken consistently, satisfying all the requirements that we have mentioned: ρ, ū, q_2, q_1, ε̄, L̄, L, η and ε.

First, ρ is determined in (3), ū is determined in Lemma 2, q_1 and q_2 are determined in Lemma 3, and ε̄ ∈ (0, 1/2) is defined in Lemma 5, independently of the other variables.

Given q_1 and q_2, we define L̄ to satisfy (10):

L̄ (q_2 − q_1) > max_{a,i} 2 |u_i(a)|.   (42)

Given ρ and L̄, we define L sufficiently large so that (11) holds:

L̄ < L ρ.   (43)

We are left to pin down η and ε. Take ε > 0 and η > 0 sufficiently small such that

u_i(D_1, D_2) + η + 2ε L̄ + 3ū η < v̲_i < v̄_i < u_i(C_1, C_2) − η − 2ε L̄ − 3ū η,   (44)

and

ε < min {ε̄, q_2 − q_1}.   (45)
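The order in which the constants are pinned down can be sketched numerically; every number below is a hypothetical stand-in (the payoffs, the targets v̲, v̄, and the values taken for ρ, ū, q_1, q_2 and ε̄), not a value from the paper:

```python
# Sketch of the pinning-down order (42)-(45) for a hypothetical prisoners'
# dilemma with u(C1, C2) = 2, u(D1, D2) = 0; rho, u_bar, eps_bar stand in
# for the values fixed in (3), Lemma 2, and Lemma 5.
u_CC, u_DD = 2.0, 0.0
u_max = 3.0                            # stand-in for max_{a,i} |u_i(a)|
u_bar = 3.0                            # stand-in for the bound u-bar of Lemma 2
v_low, v_bar = 0.5, 1.5                # targets with u_DD < v_low < v_bar < u_CC
q1, q2 = 0.4, 0.6
rho, eps_bar = 0.1, 0.1

L_bar = 2 * u_max / (q2 - q1) + 1.0    # (42): L_bar (q2 - q1) > max 2|u_i(a)|
L = int(L_bar / rho) + 1               # (43): L_bar < L * rho

# (44)-(45): shrink eps and eta together until both payoff bounds hold.
eps = min(eps_bar, q2 - q1) / 2
eta = 0.1
while not (u_DD + eta + 2 * eps * L_bar + 3 * u_bar * eta < v_low
           and v_bar < u_CC - eta - 2 * eps * L_bar - 3 * u_bar * eta):
    eps /= 2
    eta /= 2
```

The loop terminates because both slack terms vanish as ε and η shrink, while the strict inequalities u_DD < v̲_i < v̄_i < u_CC leave a fixed gap.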
11 Almost Optimality of σ_i(x_i)

Since we have pinned down λ_j(l+1)(i) in Section 9, Section 8.2.2 pins down λ_j(l+1). Then, Section 8.1 pins down the equilibrium action. In addition, since we have pinned down σ_j, β_j and ϑ_j, Section 9 pins down π_i(x_j, ·, ·) except for π̄_i(x, l, λ_j). Therefore, we want to show that there exists π̄_i(x, l, λ_j) such that these actions and reward functions satisfy (21), (5) and (6).

11.1 Almost Optimality of λ_j(l+1)

First, we show the almost optimality of λ_j(l+1). Since we verified in Lemma 5 that 2 of Condition 2 is satisfied, this follows from Section 9.1.

For notational convenience, for each lth review round, let a_j(l) be the sequence of player j's equilibrium actions in each l̃th review round with l̃ ≤ l.

As we have mentioned in Section 9, in each lth review round (r = l), player i conditions on σ_j(r) = ϑ_j(r) = G for all r < l. Since Condition 2 is satisfied conditional on σ_j = ϑ_j = G in Lemma 5, the almost optimality still holds:

Lemma 6 For any lth review round, conditional on a_j(l) and σ_j(r) = ϑ_j(r) = G for all r < l, for any history of player i in which player i has not deviated in the supplemental rounds for λ_j(l̃) with l̃ ≤ l, player i can believe that "λ_j(l)(i) = λ_j(l) or there exists r < l with β_j(r) = B" with probability no less than

1 − exp(−T^{1/3}).   (46)
11.2 Determination of π̄_i(x, l, λ_j)

Second, based on Lemma 6, we determine π̄_i(x, l, λ_j) such that π̄_i(x, l, λ_j) satisfies (6), and show that σ_i(x_i) is almost optimal:

Proposition 1 There exists π̄_i(x, l, λ_j) that satisfies (6) and such that σ_i(x_i) is almost optimal; that is, for sufficiently large T, for all rounds r,

1. If σ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r, then any strategy is exactly optimal.

2. If σ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r, then

(a) For all l ∈ {1, ..., L−1}, for r = (l, λ_i, 2) (supplemental round 2 for λ_i(l+1)), where player i takes a mixed strategy, any strategy is exactly optimal.

(b) For all l ∈ {1, ..., L−1}, for r = (l, λ_j, 1) or (l, λ_j, 2) (supplemental round 1 or 2 for λ_j(l+1)), where player i receives the message, a_i^G is strictly optimal by at least (1/2) L̄ (q_2 − q_1).

(c) For the other rounds, σ_i(x_i) is optimal with loss up to exp(−T^{1/4}).

1 follows from (32). 2-(a) is true since σ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r implies that player j has inferred λ_i(l+1)(j) from the supplemental round 1 (see Figure 13, with the roles of i and j reversed). 2-(b) is true since taking a_i^G gives the almost optimal inference and (35) gives a high reward on a_i^G. 2-(c) is true intuitively because (i) in the supplemental round 1 for λ_i(l+1), whenever player j is ready to listen, player i's continuation payoff is fixed regardless of player j's inference, by 4 of Lemma 4, and (ii) Lemma 6 guarantees that the coordination goes well with high probability and that we can construct the reward functions in the review rounds as in Section 5.2.2.
12 Exact Optimality

We are left to show the truthtelling incentive for h_i^{T_P+1} in the report block and to establish the exact optimality of σ_i(x_i). Here, we offer the intuitive explanation. See Section 15.7 in the Appendix for the formal proof.

The report block proceeds as follows. Using the public randomization device, the players coordinate on who will report h_i^{T_P+1}. Only one player reports the history: player 1 reports h_1^{T_P+1} with probability 1/2 and player 2 reports h_2^{T_P+1} with probability 1/2. Player i, who is supposed to send h_i^{T_P+1} according to the public randomization device, sends two messages sequentially for each round r in which player i does not take a mixed strategy (that is, except for the supplemental round 2 for λ_i(l+1)): (i-r-1) what action player i took and what signal player i observed in the first period of the round r, (a_{i,t_r}, y_{i,t_r}), where t_r is the first period of the round r; and (i-r-2) the history in the round r, {a_{i,t}, y_{i,t}}_{t∈T(r)}.

Let h_i^r be the history, in each round where player i does not take a mixed strategy, before the round r. Notice that, before player i sends the message about (i-r-1), the messages about h_i^r have been sent.

By backward induction, for each r, player j, who is not supposed to send h_j^{T_P+1} by the realization of the public randomization device, adjusts the reward function as follows. Here, to verify the truthtelling incentive, we distinguish the true histories h_i^r, (a_{i,t_r}, y_{i,t_r}) and {a_{i,t}, y_{i,t}}_{t∈T(r)} from the reported messages ĥ_i^r, (â_{i,t_r}, ŷ_{i,t_r}) and {â_{i,t}, ŷ_{i,t}}_{t∈T(r)}.
(j-r-1) Given ĥ_i^r, player j gives a reward based on {π_{j,t_r}^{a_i, a_{j,t_r}}}_{a_i∈A_i} such that any action that should be taken by σ_i(x_i) after ĥ_i^r is optimal in t_r (the first period of the round r). (j-r-2) Player j punishes player i if player i is likely to have told a lie about (a_{i,t_r}, y_{i,t_r}) in (i-r-1). (j-r-3) Player j gives a reward based on Σ_{t∈T(r)} π_{j,t}^{â_{i,t_r}, a_{j,t}} such that constantly taking the same action as â_{i,t_r} during the round r is optimal. (j-r-4) Player j punishes player i if player i is likely to have told a lie about {a_{i,t}, y_{i,t}}_{t∈T(r)} in (i-r-2).
We take the punishment and reward satisfying the following requirements:

1. Within the round r, we take the reward in (j-r-1) much larger than the punishment or reward in (j-r-2), (j-r-3) and (j-r-4). Similarly, we take the punishment in (j-r-2) much larger than the punishment or reward in (j-r-3) and (j-r-4), and the reward in (j-r-3) much larger than the punishment in (j-r-4).

2. Between rounds, we take the punishment and reward from (j-r-1) to (j-r-4) much larger than those from (j-(r+1)-1) to (j-(r+1)-4).
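One way to realize both requirements at once is a geometric scale of magnitude bounds; the scale K, the number of rounds R, and the exponent schedule below are our illustrative sketch, not the paper's construction:

```python
# Sketch: bound the reward/punishment for step (j-r-k) by K^(4(R-r) + (4-k)),
# K > 1. Within a round, the four steps are strictly ordered (Requirement 1),
# and the smallest bound in round r, K^(4(R-r)), still exceeds the largest
# bound in round r+1, K^(4(R-r) - 1) (Requirement 2).
K, R = 10.0, 5

def bound(r, k):
    return K ** (4 * (R - r) + (4 - k))
```

With this schedule, backward induction can ignore all finer-level and later-round adjustments exactly as described in the text.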
By backward induction, we can verify the truthtelling incentive and make σ_i(x_i) exactly optimal. In round r, we do not need to consider the punishment or reward for the rounds r̃ > r, by Requirement 2 above. First, given that player i tells the truth in the report block, (j-r-1) is enough to make the equilibrium strategy optimal in t_r, taking into account the effect on (j-r-2), (j-r-3) and the punishment and reward for the rounds r̃ > r, by Requirement 1. Second, (j-r-2) is enough to give the truthtelling incentive for (a_{i,t_r}, y_{i,t_r}) since (i) the punishment and reward for (j-r̃-1), (j-r̃-2), (j-r̃-3) and (j-r̃-4) with r̃ < r are sunk, (ii) the reward in (j-r-1) is sunk, and (iii) the punishment and reward for (j-r-3) and (j-r-4) are much smaller than the punishment in (j-r-2). Third, given the truthtelling incentive for a_{i,t_r}, (j-r-3) is enough to incentivize player i to constantly take a_{i,t_r} within the round r since (i) the punishment and reward for (j-r̃-1), (j-r̃-2), (j-r̃-3) and (j-r̃-4) with r̃ < r are sunk, (ii) the punishment and reward for (j-r-1) and (j-r-2) are sunk, and (iii) the punishment for (j-r-4) is much smaller than the reward for (j-r-3). Finally, (j-r-4) is enough to incentivize player i to tell the truth about {a_{i,t}, y_{i,t}}_{t∈T(r)} since (i) the punishment and reward for (j-r̃-1), (j-r̃-2), (j-r̃-3) and (j-r̃-4) with r̃ < r are sunk, and (ii) the punishment and reward for (j-r-1), (j-r-2) and (j-r-3) are sunk.

Note that we do not require player i to send the messages about the rounds where player i takes a mixed strategy. From Proposition 1, σ_i(x_i) was exactly optimal there without the adjustment in the report block. We cancel out the adjustment of the punishment and reward in the round r explained above once σ_j(r̃) = B or ϑ_j(r̃) = B with r̃ < r happens. Then, from Proposition 1, σ_i(x_i) is exactly optimal in all the rounds.

In addition, in the supplemental round 2 for λ_i(l+1), if σ_j(r̃) = ϑ_j(r̃) = G until this round, then player i's strategy in this round does not affect the continuation strategy profile, including the report block, except for player j's messages in the report block. Since Proposition 1 guarantees that σ_i(x_i) is exactly optimal in this round without adjustment, we can omit this round from the rounds about whose history player i sends the messages in the report block.

From Proposition 1, the last adjustment (j-R-3) can be very small. Hence, the total reward and punishment can be small enough not to affect the equilibrium payoff.
13 General Games

13.1 General Two-Player Games

The prisoners' dilemma is special in that, when player j's state is B, the equilibrium action D_j can (i) attain the targeted payoff v̄_j or v̲_j, depending on player i's state, and (ii) minimax player i at the same time. However, in a general game, such an action does not always exist, which causes the following problem: player j with x_j = B and λ_j(l) = B needs to give a positive constant reward, as in (33), to sustain (6). On the other hand, player j needs to ensure that player i's payoff is below v̲_i regardless of player i's strategy. Therefore, player j with x_j = B and λ_j(l) = B needs to minimax player i with high probability if player i does not take a prescribed action, although the minimaxing action can be different from the prescribed action that attains the targeted payoff v̄_j or v̲_j.

For this purpose, player i constructs a statistic such that, if its realization is low, player i allows player j to minimax player i. Player i sends the message about whether she allows player j to minimax her. Player j, on the other hand, calculates the conditional expectation of that statistic, which decreases if player i deviates. Modifying the coordination protocol about λ_i(l+1), we can construct a protocol such that the players coordinate on the punishment properly and such that player j punishes player i with high probability if the realization of the conditional expectation is sufficiently low, regardless of player i's message, so as to keep player i's payoff below v̲_i regardless of player i's strategy. See the Supplemental Material 1 for the formal description.

The fact that there does not always exist an action satisfying both (i) and (ii) is the reason why the belief-free equilibrium payoff set is smaller than the feasible and individually rational payoff set except in the prisoners' dilemma. Hörner and Olszewski (2006) consider a way to coordinate on the punishment which is different from ours. However, since their coordination uses almost perfect monitoring, as Hörner and Olszewski (2006) mention, it was not known how to obtain the coordination when monitoring is not almost perfect.
13.2 General More-Than-Two-Player Games

If there are more than two players, then, as in Hörner and Olszewski (2006), we let player (i−1)'s state determine whether player i's value is v̄_i or v̲_i, independently of the states of the players other than i−1. With cheap talk, there is no conceptual difficulty in extending our result (see Supplemental Material 2).
14 Equilibrium Construction without Cheap Talk

14.1 Two-Player Games

Throughout the main text, we use perfect and public cheap talk and public randomization devices.[16] In the Supplemental Material 3, we show that cheap talk and public randomization are completely dispensable and that all the messages can be sent via actions without public randomization.

[16] Even with cheap talk, however, the general folk theorem is new even in the two-player case.

Note that we use the cheap talk messages for x_i and h_i^{T_P+1}. The basic idea for sending x_i is the same as in the supplemental rounds for λ_i(l+1). As Lemma 5 shows, conditional on the opponent's message, the receiver can believe that her inference is correct or that she is indifferent between any action profile sequences. Therefore, even after observing a continuation strategy that does not correspond to her inference, the receiver can believe that she is indifferent between any action profile sequences with high probability and that following the equilibrium strategy is almost optimal. As we have seen in Section 12, we can attain the exact optimality afterwards.

There is one difference between λ_i(l+1) and x_i. For λ_i(l+1), whenever player j changes her strategy based on the inference of λ_i(l+1), player j has made player i indifferent between any action profile sequences, and the sender's inference of the receiver's inference is irrelevant (4 of Lemma 4). On the other hand, the receiver j constructs the reward function on the sender i based on x_i (remember that π_i depends not only on x_j but also on x_i). Therefore, the sender needs to have an inference of the receiver's inference. For this purpose, player j sends back her inference of x_i to player i to inform player i of what inference the reward function is based on. The incentive for player j to tell the truth can be provided.

As for h_i^{T_P+1} in the report block, there are three problems. First, the cardinality of the message space for our equilibrium construction is of order (|A_i| |Y_i|)^T. If we replace cheap talk with the message exchange via actions straightforwardly, it takes too long to send all the messages, and this affects the equilibrium payoff. The second problem is that we need to dispense with the public randomization device. It is important to have only one player who reports h_i^{T_P+1}, since otherwise, while the opponent is reporting h_j^{T_P+1}, player i could update her belief about player j's signal observations and the incentive to tell the truth about h_i^{T_P+1} would be destroyed. On the other hand, to obtain the exact optimality, there needs to be a positive probability for each player to report h_i^{T_P+1}, with the reward function adjusted accordingly. Third, there exists a positive probability that the messages are not transmitted correctly. See the Supplemental Material 3 for the solutions to these problems.
14.2 More-Than-Two-Player Games

There is a problem in coordinating on x_i that is unique to the case with more than two players: each player takes an action based on her own inference about x_i. It may be in some player's interest that some players take actions corresponding to x_i = G while the others take actions corresponding to x_i = B. Then, that player may have an incentive to deviate from the equilibrium action in order to manipulate the signal distributions and to induce the miscoordination about x_i. See the Supplemental Material 4 for how to deal with this problem.
15 Appendix

15.1 Proof of Lemma 1

To see why this is enough for Theorem 1, define the strategy in the infinitely repeated game as follows. Define

p(G, h_j^{T_P+1} : δ) ≡ 1 + ((1 − δ)/δ^{T_P}) · π_i(G, h_j^{T_P+1} : δ)/(v̄_i − v̲_i),

p(B, h_j^{T_P+1} : δ) ≡ ((1 − δ)/δ^{T_P}) · π_i(B, h_j^{T_P+1} : δ)/(v̄_i − v̲_i).   (47)

If (6) is satisfied, then, for sufficiently large δ, p(G, h_j^{T_P+1} : δ), p(B, h_j^{T_P+1} : δ) ∈ [0, 1] for all h_j^{T_P+1}.

We see the repeated game as the repetition of T_P-period "review phases." In each phase, player i has a state x_i ∈ {G, B}. Within the phase, player i with state x_i plays according to σ_i(x_i). After observing h_i^{T_P+1} in the current phase, the state in the next phase is equal to G with probability p(x_i, h_i^{T_P+1} : δ) and to B with the remaining probability.

Player i's initial state is equal to G with probability p_v^i and to B with probability 1 − p_v^i, with

p_v^i v̄_j + (1 − p_v^i) v̲_j = v_j.

Then, since

(1 − δ) Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + δ^{T_P} [ p(G, h_j^{T_P+1}) v̄_i + {1 − p(G, h_j^{T_P+1})} v̲_i ]
  = (1 − δ^{T_P}) [ ((1 − δ)/(1 − δ^{T_P})) { Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(G, h_j^{T_P+1} : δ) } ] + δ^{T_P} v̄_i

and

(1 − δ) Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + δ^{T_P} [ p(B, h_j^{T_P+1}) v̄_i + {1 − p(B, h_j^{T_P+1})} v̲_i ]
  = (1 − δ^{T_P}) [ ((1 − δ)/(1 − δ^{T_P})) { Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(B, h_j^{T_P+1} : δ) } ] + δ^{T_P} v̲_i,

(4) and (5) imply, for a sufficiently large discount factor δ:

1. Conditional on the opponent's state, the above strategy in the infinitely repeated game is optimal.

2. If player j is in the state G, then player i's payoff from the infinitely repeated game is v̄_i, and if player j is in the state B, then player i's payoff is v̲_i.

3. The payoff in the initial period is p_v^j v̄_i + (1 − p_v^j) v̲_i = v_i, as desired.
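The two displayed identities are pure accounting; a numerical check with arbitrary test values (δ, T_P, the payoff stream, and the reward value are all hypothetical) verifies both the G-state and the B-state versions:

```python
# With p(G, h : d) = 1 + ((1-d)/d^T) * pi/(v_bar - v_low) and
# p(B, h : d) = ((1-d)/d^T) * pi/(v_bar - v_low), the phase payoff plus the
# continuation value reproduces the bracketed auxiliary payoff plus
# d^T * v_bar (G case) or d^T * v_low (B case).
d, T = 0.95, 10
v_bar, v_low = 1.5, 0.5
u = [0.3 * t for t in range(1, T + 1)]   # an arbitrary stream u_i(a_t)
pi = -0.2                                 # an arbitrary reward value

S = sum(d ** (t - 1) * u[t - 1] for t in range(1, T + 1))

pG = 1 + (1 - d) / d ** T * pi / (v_bar - v_low)
lhs_G = (1 - d) * S + d ** T * (pG * v_bar + (1 - pG) * v_low)
rhs_G = (1 - d ** T) * ((1 - d) / (1 - d ** T)) * (S + pi) + d ** T * v_bar

pB = (1 - d) / d ** T * pi / (v_bar - v_low)
lhs_B = (1 - d) * S + d ** T * (pB * v_bar + (1 - pB) * v_low)
rhs_B = (1 - d ** T) * ((1 - d) / (1 - d ** T)) * (S + pi) + d ** T * v_low
```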
15.2 Proof of Lemma 3

First, note that (27) is equivalent to

Σ_{y_i} { Σ_{y_j} θ_{a_j}(y_j) q(y_j | a, y_i) } q(y_i | a_i, ã_j) = Σ_{y_j} { Σ_{y_i} q(y_j | a, y_i) q(y_i | a_i, ã_j) } θ_{a_j}(y_j) = q_2.   (48)

Second, arbitrarily fix q_2 > q_1. Let Q_1(ã_i, a_j) ≡ (q(y_j | ã_i, a_j))_{y_j} be the vector of the distribution of player j's signals conditional on (ã_i, a_j). In addition, let Q_2(a_i, ã_j) ≡ (Σ_{y_i} q(y_j | a, y_i) q(y_i | a_i, ã_j))_{y_j}. The vectors Q_1(ã_i, a_j) and Q_2(a_i, ã_j) are generically linearly independent except for ã_i = a_i and ã_j = a_j, since Assumption 2 guarantees that |Y_j| ≥ |A_i| + |A_j| − 1. Therefore, there generically exists a solution {θ_{a_j}(y_j)}_{y_j} of (26) and (27). Note that if {θ_{a_j}(y_j)}_{y_j} solves the system for q_2 and q_1, then, for any m, m' ∈ R_{++}, {(θ_{a_j}(y_j) + m)/m'}_{y_j} solves the system for (q_2 + m)/m' and (q_1 + m)/m'. Therefore, we can make sure that θ_{a_j}: Y_j → (0, 1).
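The closing shift-and-scale step can be checked numerically; the values of θ, the weights w, and the choices of m and m' below are all hypothetical:

```python
# If {theta(y)} satisfies sum_y theta(y) w(y) = q for a probability vector w,
# then {(theta(y) + m)/m'} satisfies the same equation with (q + m)/m', so a
# solution can be shifted and scaled into (0, 1).
theta = [3.0, -1.0, 2.0]            # an unbounded hypothetical solution
w = [0.5, 0.3, 0.2]                 # a probability vector (sums to one)
q0 = sum(t * p for t, p in zip(theta, w))

m, m_prime = 2.0, 6.0               # shift past the minimum, scale past the maximum
theta_new = [(t + m) / m_prime for t in theta]
q0_new = (q0 + m) / m_prime
```

The argument only uses the fact that the weights sum to one, so the same shift works simultaneously for q_1 and q_2.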
15.3 Proof of Lemma 2

This is generically satisfied if |Y_j| ≥ |A_i|, as in Yamamoto (2009b).
15.4 Proof of Lemma 5

15.4.1 Explanation of 1 of Lemma 5: Supplemental Round 1 for λ_j(l+1)

The following is the formal description of the message exchanges in the supplemental round 1 for λ_j(l+1). Let us first introduce the following notation:

Notation 1 For λ̄_j ∈ {G, B}, we define

1. the matrix projecting player i's signals onto the conditional distribution of player j's signals given an action profile a:

Q_{j,i}(a) = [ q(y_{j,m} | a, y_{i,n}) ]_{m=1,...,|Y_j|; n=1,...,|Y_i|},

that is, the |Y_j| × |Y_i| matrix whose (m, n) element is q(y_{j,m} | a, y_{i,n}).

2. the (|Y_j| − |A_i| + 1) × |Y_j| matrix H_j(λ̄_j) and the (|Y_j| − |A_i| + 1) × 1 vector p_j(λ̄_j) such that the affine hull of player j's signal distributions with respect to player i's action when player j takes a_j^{λ̄_j} is represented by

aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ∩ R_+^{|Y_j|} = { y_j ∈ R_+^{|Y_j|} : H_j(λ̄_j) y_j = p_j(λ̄_j) }.   (49)

3. the set of hyperplanes generated by perturbing the right-hand side of the characterization of aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ∩ R_+^{|Y_j|}: for ε ≥ 0,[17]

H_j[ε](λ̄_j) ≡ { y_j ∈ R_+^{|Y_j|} : there exists e ∈ R_+^{|Y_j|−|A_i|+1} such that ‖e‖ ≤ ε and H_j(λ̄_j) y_j = p_j(λ̄_j) + e }.

4. the set of frequencies of player i's signal observations such that the conditional expectation of the frequency of player j's signal observations is close to H_j[ε](λ_j) when the players take (a_j^{λ_j}, a_i^G):

H_{j,i}[ε](λ_j) = { y_i ∈ R_+^{|Y_i|} : there exist e_1 ∈ R_+^{|Y_j|}, e_2 ∈ R_+^{|Y_j|−|A_i|+1} and y_j ∈ R_+^{|Y_j|} with y_j = Q_{j,i}(a_j^{λ_j}, a_i^G) y_i + e_1, H_j(λ_j) y_j = p_j(λ_j) + e_2, and ‖e_1‖, ‖e_2‖ ≤ ε }.

[17] We use the sup norm unless otherwise noted: ‖x‖ = max_i |x_i|.

Without loss of generality, we can make sure that all the elements of H_j(G) and H_j(B) are included in (0, 1), by the following lemma.

Lemma 7 We can take H_j(G) and H_j(B) such that all the elements are in (0, 1).

Proof. Let m_H be the minimum element of H_j(λ̄_j) and M_H the maximum element of H_j(λ̄_j). Let H̃_j(λ̄_j) be the matrix whose (l, m) element is ((H_j(λ̄_j))_{l,m} + |m_H| + 1)/(|M_H| + |m_H| + 2) ∈ (0, 1), and let p̃_j(λ̄_j) be the vector whose lth element is ((p_j(λ̄_j))_l + |m_H| + 1)/(|M_H| + |m_H| + 2).
We will show that

H_j(λ̄_j) ≡ { y_j ∈ R_+^{|Y_j|} : H_j(λ̄_j) y_j = p_j(λ̄_j) } = { y_j ∈ R_+^{|Y_j|} : H̃_j(λ̄_j) y_j = p̃_j(λ̄_j) } ≡ H̃_j(λ̄_j).

1. H_j(λ̄_j) ⊆ H̃_j(λ̄_j): Suppose y_j ∈ H_j(λ̄_j). Since y_j ∈ aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ⊆ aff({0, 1}^{|Y_j|}), we have (1, ..., 1) y_j = 1, and hence H̃_j(λ̄_j) y_j = p̃_j(λ̄_j), as desired.

2. H_j(λ̄_j) ⊇ H̃_j(λ̄_j): Suppose y_j ∉ H_j(λ̄_j). Since aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ⊆ aff({0, 1}^{|Y_j|}), without loss of generality we can assume that one row of H_j(λ̄_j) is parallel to (1, ..., 1) and that the element of p_j(λ̄_j) corresponding to that row is 1. If (1, ..., 1) y_j ≠ 1, then

((1 + |m_H| + 1)/(|M_H| + |m_H| + 2), ..., (1 + |m_H| + 1)/(|M_H| + |m_H| + 2)) y_j = ((1 + |m_H| + 1)/(|M_H| + |m_H| + 2)) (1, ..., 1) y_j ≠ (1 + |m_H| + 1)/(|M_H| + |m_H| + 2),

and y_j ∉ H̃_j(λ̄_j), as desired. If (1, ..., 1) y_j = 1, then there is another row h_j(λ̄_j), with corresponding element p_j(λ̄_j) of p_j(λ̄_j), such that

h_j(λ̄_j) y_j ≠ p_j(λ̄_j).

Let h̃_j(λ̄_j) be the corresponding row of H̃_j(λ̄_j) and p̃_j(λ̄_j) the corresponding element of p̃_j(λ̄_j). Then,

h̃_j(λ̄_j) y_j = (1/(|M_H| + |m_H| + 2)) (h_j(λ̄_j) + (|m_H| + 1)(1, ..., 1)) y_j
  = (1/(|M_H| + |m_H| + 2)) (h_j(λ̄_j) y_j + |m_H| + 1)
  ≠ (1/(|M_H| + |m_H| + 2)) (p_j(λ̄_j) + |m_H| + 1) = p̃_j(λ̄_j),

and so y_j ∉ H̃_j(λ̄_j).
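Lemma 7's construction can be checked numerically on random test data (the constraint matrix H and the vector y below are hypothetical, generated for the check):

```python
import random

# With m_H and M_H the minimum and maximum elements of H, the matrix with
# entries (H_lm + |m_H| + 1)/(|M_H| + |m_H| + 2) has all entries in (0, 1)
# and, on vectors y with sum(y) = 1, cuts out the same solution set as
# H y = p when p is rescaled the same way.
random.seed(0)
H = [[random.uniform(-2.0, 2.0) for _ in range(5)] for _ in range(3)]
raw = [random.random() for _ in range(5)]
y = [x / sum(raw) for x in raw]                          # sum(y) = 1
p = [sum(h * x for h, x in zip(row, y)) for row in H]    # so H y = p holds

mH = min(min(row) for row in H)
MH = max(max(row) for row in H)
c = abs(MH) + abs(mH) + 2
H_t = [[(h + abs(mH) + 1) / c for h in row] for row in H]
p_t = [(v + abs(mH) + 1) / c for v in p]
```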
Given the frequency of player j's signal observations y_j(l, λ_j, 1), the event y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j) corresponds to Box 2 of Figure 9. On the other hand, given the frequency of player i's signal observations y_i(l, λ_j, 1), the event y_i(l, λ_j, 1) ∈ H_i[ε](G) corresponds to Boxes 5, 6 and 7 of Figure 9. In addition, since

E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | {y_{i,t}}_{t∈T_i(l,λ_j,1)}, a_j^{λ_j}, a_i^G ] = Q_{j,i}(a_j^{λ_j}, a_i^G) y_i(l, λ_j, 1),

the event y_i(l, λ_j, 1) ∈ H_{j,i}[ε](λ_j) corresponds to Box 5 or 6 of Figure 9, depending on whether λ_j = G or B.

Strategies The strategy in the supplemental round 1 for λ_j(l+1) is as follows: player j (the sender) with λ_j(l+1) = λ̄_j ∈ {G, B} takes a_j^{λ̄_j} and player i (the receiver) takes a_i^G for T^{1/2} periods in the supplemental round 1 for λ_j(l+1), that is, for t ∈ T(l, λ_j, 1).
Requirement by Player j As we have mentioned in Section 9.2.1, player j requires player i to infer λ_j(l+1) correctly only if y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j).

Instead of using y_j(l, λ_j, 1) directly, player j constructs random variables {Ĥ_{j,t}}_{t∈T_j(l,λ_j,1)} as follows. After taking a_j^{λ̄_j} and observing y_{j,t}, player j calculates H_j(λ̄_j) 1_{y_{j,t}}. Then, player j draws (|Y_j| − |A_i| + 1) random variables independently from the uniform distribution on [0, 1]. If the lth realization of these random variables is less than the lth element of H_j(λ̄_j) 1_{y_{j,t}}, then the lth element of Ĥ_{j,t} is equal to 1. Otherwise, the lth element of Ĥ_{j,t} is equal to 0. From Lemma 7,

Pr( (Ĥ_{j,t})_l = 1 | a, y ) = (H_j(λ̄_j) 1_{y_{j,t}})_l ∈ (0, 1)   (50)

for all a and y.

Given {Ĥ_{j,t}}_{t∈T_j(l,λ_j,1)}, let

Ĥ_j(l, λ_j, 1) = (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} Ĥ_{j,t}.   (51)

Player j requires player i to infer λ_j(l+1) correctly only if

‖ Ĥ_j(l, λ_j, 1) − H_j(λ̄_j) y_j(l, λ_j, 1) ‖ = ‖ (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} Ĥ_{j,t} − (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} H_j(λ̄_j) 1_{y_{j,t}} ‖ ≤ ε/2   (52)

and

‖ Ĥ_j(l, λ_j, 1) − p_j(λ̄_j) ‖ ≤ ε/2.   (53)
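The thresholding construction in (50)-(51) can be simulated directly; the matrix H and the signal process below are hypothetical stand-ins for H_j(λ̄_j) and {y_{j,t}}:

```python
import random

# Each period, the vector H 1_{y_t} (a column of H, entries in (0, 1) as
# Lemma 7 permits) is turned into a 0/1 vector by independent uniform draws;
# the round average then concentrates on the average of the per-period
# conditional means, which is exactly what requirement (52) checks.
H = [[0.2, 0.5, 0.8],
     [0.6, 0.3, 0.4]]

def one_period(y_t, rng):
    # coordinate l equals 1 with probability (H 1_{y_t})_l = H[l][y_t]
    return [1 if rng.random() < H[l][y_t] else 0 for l in range(len(H))]

rng = random.Random(1)
signals = [rng.randrange(3) for _ in range(20_000)]
draws = [one_period(y, rng) for y in signals]
avg = [sum(d[l] for d in draws) / len(draws) for l in range(len(H))]
target = [sum(H[l][y] for y in signals) / len(signals) for l in range(len(H))]
```

By the central limit theorem, the gap between `avg` and `target` is of order T^{-1/2}, so a fixed tolerance ε/2 fails only with exponentially small probability.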
In summary, there are the following cases:

1. (52) is not satisfied. Let σ_j(l, λ_j, 1) = B denote this event. This case is excluded from player i's consideration. From (50) and (51), by the central limit theorem, conditional on {y_{j,t}}_{t∈T(l,λ_j,1)}, this occurs with probability no more than of order exp(−T^{1/2}).

2. (52) is satisfied. Let σ_j(l, λ_j, 1) = G denote this event.

(a) (53) is not satisfied. Let β_j(l, λ_j, 1) = B denote this event. Then, player j makes player i indifferent between any action profile sequences in the subsequent rounds. This case is not excluded from player i's consideration. Since we take the affine hull with respect to player i's action in (49), together with (50), the central limit theorem implies that this occurs with probability no more than of order exp(−T^{1/2}) ex ante, at the beginning of the supplemental round 1, regardless of player i's strategy.

(b) (53) is satisfied. Depending on the supplemental round 2, player j may require player i to infer λ_j(l+1) correctly. Let β_j(l, λ_j, 1) = G denote this event. Note that (52) and (53) imply ‖ H_j(λ̄_j) y_j(l, λ_j, 1) − p_j(λ̄_j) ‖ ≤ ε by the triangle inequality, and so y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j). Therefore, player j requires player i to infer λ_j(l+1) correctly only if y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j), as mentioned.

In addition, if 1 is the case, we define β_j(l, λ_j, 1) ∈ {G, B} as in the case with σ_j(l, λ_j, 1) = G: if (53) is not satisfied, then β_j(l, λ_j, 1) = B; if (53) is satisfied, then β_j(l, λ_j, 1) = G.

See the upper half of Figure 12 for the illustration. Figure 12 corresponds to Figure 9 in the intuitive explanation. Box 2 of Figure 9 corresponds to Box 5 in Figure 12, and it is still true that player j requires player i to infer λ_j(l+1) correctly only if y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j). Therefore, the sufficient condition for the almost optimal inference by player i is still valid.

[Insert Figure 12]
Inference by Player i As mentioned in Section 9.2.1, if player i infers λ_j(l+1) from the supplemental round 1, then player i infers λ_j(l+1) = λ_j ∈ {G, B} if

E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_j} | y_i(l, λ_j, 1), a_j^{λ_j}, a_i^G ]

is close to aff({q_j(a_j^{λ_j}, a_i)}_{a_i}). In other words, player i infers λ_j(l+1) = G if y_i(l, λ_j, 1) ∈ H_{j,i}[ε](G) and λ_j(l+1) = B if y_i(l, λ_j, 1) ∈ H_{j,i}[ε](B).

Instead of using y_i(l, λ_j, 1) directly, player i constructs random variables {Ω̂_{i,t}(G)}_{t∈T_i(l,λ_j,1)}, {Ω̂_{i,t}(B)}_{t∈T_i(l,λ_j,1)} and {Ω̂^G_{i,t}}_{t∈T_i(l,λ_j,1)} as follows.

Construction of {Ω̂_{i,t}(λ_j)}_{t∈T_i(l,λ_j,1)} with λ_j ∈ {G, B} After taking a_i^G and observing y_{i,t}, player i calculates H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}}. Then, player i draws |Y_j| − |A_i| + 1 random variables independently from the uniform distribution on [0, 1]. If the lth realization of these random variables is less than the lth element of H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}}, then the lth element of Ω̂_{i,t}(λ_j) is equal to 1. Otherwise, the lth element of Ω̂_{i,t}(λ_j) is equal to 0. Since all the elements of H_j(λ_j) are in (0, 1) by Lemma 7 and Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}} is a conditional probability distribution, we have that each element of H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}} lies in (0, 1), and

Pr( (Ω̂_{i,t}(λ_j))_l = 1 | a, y ) = (H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}})_l.   (54)

Given {Ω̂_{i,t}(λ_j)}_{t∈T_i(l,λ_j,1)}, let

Ω̂_i(λ_j)(l, λ_j, 1) = (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} Ω̂_{i,t}(λ_j).   (55)
Construction of {ψ^G_{i,t}}_{t∈T_i(l,λ_j,1)}   After taking a_i^G and observing y_{i,t}, player i calculates H_i(G)1_{y_{i,t}}. Then, player i draws (|Y_i| − |A_j| + 1) random variables independently from the uniform distribution on [0, 1]. If the lth realization of these random variables is less than the lth element of H_i(G)1_{y_{i,t}}, then the lth element of ψ^G_{i,t} is equal to 1. Otherwise, the lth element of ψ^G_{i,t} is equal to 0. By Lemma 7,

Pr( (ψ^G_{i,t})_l = 1 | a, y ) = ( H_i(G)1_{y_{i,t}} )_l ∈ (0, 1).   (56)

Given {ψ^G_{i,t}}_{t∈T_i(l,λ_j,1)}, let

ψ_i^G(l, λ_j, 1) = (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} ψ^G_{i,t}.   (57)
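Both constructions share one randomization step: a vector with entries in (0, 1) is converted, coordinate by coordinate, into {0, 1} draws by comparison with independent uniform random variables, which is exactly what makes (54) and (56) hold. A minimal numerical sketch (the vector v is a hypothetical stand-in for H_j(λ_j)Q_{j,i}(a_j^{λ_j}, a_i^G)1_{y_{i,t}} or H_i(G)1_{y_{i,t}}; its values are illustrative):

```python
import numpy as np

def binarize(v, rng):
    """Return a 0/1 vector whose lth element equals 1 with probability v[l],
    exactly as in (54) and (56): compare v element-wise with fresh uniforms."""
    u = rng.uniform(size=v.shape)  # one independent U[0, 1] draw per element
    return (u < v).astype(float)

rng = np.random.default_rng(0)
v = np.array([0.2, 0.5, 0.8])  # stand-in for the conditional-probability vector
draws = np.array([binarize(v, rng) for _ in range(200_000)])
# averaging the draws, as in (55) and (57), recovers v up to sampling error
assert np.allclose(draws.mean(axis=0), v, atol=0.01)
```

Because each element is an independent Bernoulli draw with the stated success probability, the averages ψ_i(λ_j)(l, λ_j, 1) and ψ_i^G(l, λ_j, 1) concentrate around the corresponding conditional expectations, which is what the conditions (58)–(60) exploit.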
Player i infers λ_j(l + 1)(i) from the supplemental round 1 if and only if

‖ψ_i(G)(l, λ_j, 1) − H_j(G)Q_{j,i}(a_j^G, a_i^G)y_i(l, λ_j, 1)‖
= ‖(1/(T^{1/2} − 1)) [ Σ_{t∈T_i(l,λ_j,1)} ψ_{i,t}(G) − Σ_{t∈T_i(l,λ_j,1)} H_j(G)Q_{j,i}(a_j^G, a_i^G)1_{y_{i,t}} ]‖ ≤ ε/2,   (58)

‖ψ_i(B)(l, λ_j, 1) − H_j(B)Q_{j,i}(a_j^B, a_i^G)y_i(l, λ_j, 1)‖
= ‖(1/(T^{1/2} − 1)) [ Σ_{t∈T_i(l,λ_j,1)} ψ_{i,t}(B) − Σ_{t∈T_i(l,λ_j,1)} H_j(B)Q_{j,i}(a_j^B, a_i^G)1_{y_{i,t}} ]‖ ≤ ε/2,   (59)

‖ψ_i^G(l, λ_j, 1) − H_i(G)y_i(l, λ_j, 1)‖
= ‖(1/(T^{1/2} − 1)) [ Σ_{t∈T_i(l,λ_j,1)} ψ^G_{i,t} − Σ_{t∈T_i(l,λ_j,1)} H_i(G)1_{y_{i,t}} ]‖ ≤ ε/2,   (60)

and

‖ψ_i^G(l, λ_j, 1) − p_i(G)‖ ≤ ε/2.   (61)
There are the following cases:

1. (58), (59) or (60) is not satisfied. Let τ_i(l, λ_j, 1) = B denote this event. This case is excluded from player j's consideration. From (54), (55), (56) and (57), by the central limit theorem, conditional on {y_{i,t}}_{t∈T(l,λ_j,1)}, this occurs with probability no more than of order exp(−T^{1/2}).

2. (58), (59) and (60) are satisfied. Let τ_i(l, λ_j, 1) = G denote this event.

(a) (61) is not satisfied. Let ϑ_i(l, λ_j, 1) = B denote this event. This case is also excluded from player j's consideration. Since we take the affine hull with respect to player j's actions in the definition of H_i(G) (this is symmetric to (49)), together with (56), the central limit theorem implies that this occurs with probability no more than of order exp(−T^{1/2}) ex ante at the beginning of the supplemental round 1, regardless of player j's strategy.

(b) (61) is satisfied. Let ϑ_i(l, λ_j, 1) = G denote this event. In this case, player i uses the supplemental round 1 to infer λ_j(l + 1).
For the following classification, define K̄ as follows: since a continuous linear transformation is Lipschitz continuous, we can take K̄ such that, for any i and λ_j ∈ {G, B}, if y_i ∈ H_{j,i}[ε](λ_j), then there exists ε̃ ∈ R_+^{|Y_j|−|A_i|+1} such that

H_j(λ_j)Q_{j,i}(a_j^{λ_j}, a_i^G)y_i = p_j(λ_j) + ε̃,  ‖ε̃‖ ≤ (K̄ − 1)ε.   (62)
Given this K̄, we consider the following subsubcases.

i. If we have

‖ψ_i(G)(l, λ_j, 1) − p_j(G)‖ ≤ K̄ε,   (63)

then player i infers λ_j(l + 1)(i) = G.

ii. If we have

‖ψ_i(B)(l, λ_j, 1) − p_j(B)‖ ≤ K̄ε,   (64)

then player i infers λ_j(l + 1)(i) = B.

iii. Otherwise, player i infers λ_j(l + 1)(i) = B.

In addition, if Case 1 applies, we define ϑ_i(l, λ_j, 1) ∈ {G, B} as in the case with τ_i(l, λ_j, 1) = G: if (61) is not satisfied, then ϑ_i(l, λ_j, 1) = B; if (61) is satisfied, then ϑ_i(l, λ_j, 1) = G.
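The subsubcases above amount to a nearest-target test in Euclidean norm. The following sketch shows the decision rule; the target vectors, K̄ and ε are illustrative stand-ins rather than the paper's objects, and by Lemma 8 the two distance tests are generically mutually exclusive for small ε, so the order of the checks is immaterial:

```python
import numpy as np

def infer(psi_G, psi_B, p_G, p_B, K_bar, eps):
    """Cases 2-(b)-i through 2-(b)-iii: infer G only when the empirical
    statistic for G is within K_bar * eps of its target p_G; otherwise
    infer B (either because the B-test passes or by default)."""
    if np.linalg.norm(psi_G - p_G) <= K_bar * eps:
        return "G"          # case 2-(b)-i
    if np.linalg.norm(psi_B - p_B) <= K_bar * eps:
        return "B"          # case 2-(b)-ii
    return "B"              # case 2-(b)-iii: inconclusive, default to B

p_G, p_B = np.array([0.5, 0.5]), np.array([0.3, 0.7])  # illustrative targets
assert infer(p_G + 0.01, p_B + 0.5, p_G, p_B, K_bar=2.0, eps=0.05) == "G"
assert infer(p_G + 0.5, p_B + 0.01, p_G, p_B, K_bar=2.0, eps=0.05) == "B"
assert infer(p_G + 0.5, p_B + 0.5, p_G, p_B, K_bar=2.0, eps=0.05) == "B"
```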
The following lemma shows that the above inferences are well defined. The intuition is the same as that in Section 9.2.1.

Lemma 8  Generically, the following statement is true: there exists ε̄ > 0 such that, for all ε < ε̄, if Case 2-(b)-i above is the case, then Case 2-(b)-ii is not the case. Symmetrically, if Case 2-(b)-ii above is the case, then Case 2-(b)-i is not the case.
Proof. (60) and (61) imply that there exists ε_1 with

H_i(G)y_i = p_i(G) + ε_1,  ‖ε_1‖ ≤ ε.

(58) and (63) imply that there exists ε_2 with

H_j(G)Q_{j,i}(a_j^G, a_i^G)y_i = p_j(G) + ε_2,  ‖ε_2‖ ≤ (K̄ + 1)ε.

On the other hand, (59) and (64) imply that there exists ε_3 with

H_j(B)Q_{j,i}(a_j^B, a_i^G)y_i = p_j(B) + ε_3,  ‖ε_3‖ ≤ (K̄ + 1)ε.

Therefore, it suffices to show that, for sufficiently small e, for any ‖ε‖ ≤ e, there does not exist y_i ∈ R_+^{|Y_i|} such that

[H_i(G); H_j(G)Q_{j,i}(a_j^G, a_i^G); H_j(B)Q_{j,i}(a_j^B, a_i^G)] y_i = (p_i(G); p_j(G); p_j(B)) + ε,   (65)

where [·; ·; ·] stacks the three matrices and (·; ·; ·) stacks the corresponding vectors.
For generic q, with ε = 0, such a y_i does not exist since we have |Y_i| degrees of freedom and |Y_i| + 2|Y_j| − |A_j| − 2|A_i| + 1 constraints. Note that one row of each of H_i(G), H_j(G)Q_{j,i}(a_j^G, a_i^G) and H_j(B)Q_{j,i}(a_j^B, a_i^G) is parallel to 1.

By the Farkas lemma, this nonexistence of y_i for ε = 0 is equivalent to the existence of x ∈ R^{|Y_i|+2|Y_j|−|A_j|−2|A_i|+1} with

[H_i(G); H_j(G)Q_{j,i}(a_j^G, a_i^G); H_j(B)Q_{j,i}(a_j^B, a_i^G)]′ x ≤ 0,  (p_i(G); p_j(G); p_j(B)) · x > 0.

Hence, for sufficiently small e, for any ‖ε‖ ≤ e, the same x satisfies

[H_i(G); H_j(G)Q_{j,i}(a_j^G, a_i^G); H_j(B)Q_{j,i}(a_j^B, a_i^G)]′ x ≤ 0,  ((p_i(G); p_j(G); p_j(B)) + ε) · x > 0.

Again, by the Farkas lemma, (65) does not have a solution for sufficiently small e.
See the lower half of Figure 13. Case 1 above is Box 7 of Figure 13. Case 2-(a) is Box 12. Case 2-(b)-i corresponds to Box 9, 2-(b)-ii corresponds to Box 10, and 2-(b)-iii corresponds to Box 11.

Figure 13 corresponds to Figure 11 in the intuitive explanation. Box 2 of Figure 11 corresponds to Box 9 in Figure 13, Box 3 of Figure 11 corresponds to Box 10 of Figure 13, and Box 4 of Figure 11 corresponds to Box 11 of Figure 13. The cases where player i answers "No" to the question in Box 1 of Figure 11 are included in Boxes 7 and 12 of Figure 13.
[Insert Figure 13]
Notice that whenever player i infers λ_j(l + 1) = B, at least one of τ_i(l) = B, θ_i(l) = B, τ_i(l, λ_j, 1) = B and ϑ_i(l, λ_j, 1) = B happens. Reversing the roles of i and j, whenever player j takes a_j(l + 1) ≠ a(x_j) (which implies λ_i(l + 1)(j) = B), any action profile is optimal for player i.
Finally, we show that Condition 2 in Section 9.1 is satisfied.
Lemma 9  For any ε ∈ (0, ε̄), for sufficiently large T, for any i ∈ I and λ_j(l + 1) ∈ {G, B},

1. Conditional on λ_j(l + 1) and θ_j(l) = θ_j(l, λ_j, 1) = G, player i believes that λ_j(l + 1)(i) = λ_j(l + 1) or κ_j(l, λ_j, 1) = B with probability no less than 1 − exp(−T^{2/5}).

2. Conditional on λ_j(l + 1) and θ_j(l) = θ_j(l, λ_j, 1) = G, any history of player i can happen with probability at least exp(−T^{2/5}).

3. Conditional on λ_j(l + 1) and τ_i(l) = τ_i(l, λ_j, 1) = ϑ_i(l, λ_j, 1) = G, conditional on any history in the supplemental rounds, player j believes that any λ_j(l + 1)(i) is possible with probability no less than exp(−T^{2/5}).

4. θ_j(l, λ_j, 1) = κ_j(l, λ_j, 1) = τ_i(l, λ_j, 1) = ϑ_i(l, λ_j, 1) = G with probability no less than 1 − exp(−T^{2/5}).

5. The distribution of θ_j(l, λ_j, 1) is independent of player i's strategy with probability no less than 1 − exp(−T^{2/5}).

6. The distribution of κ_j(l, λ_j, 1) is independent of player i's strategy.

7. The distribution of τ_i(l, λ_j, 1) is independent of player j's strategy with probability no less than 1 − exp(−T^{2/5}).

8. The distribution of ϑ_i(l, λ_j, 1) is independent of player j's strategy.

Notice that, as we have mentioned, player i excludes the case with κ_j = B in 1 and 2 of Lemma 9, and player j excludes the case with τ_i = B or ϑ_i = B in 3 of Lemma 9.
Proof.

1. Suppose λ_j(l + 1) = λ̄_j. If λ_j(l + 1)(i) = λ̄_j, then we are done. So, let us concentrate on λ_j(l + 1)(i) ≠ λ̄_j. Forget about the conditioning on θ_j(l, λ_j, 1) = G for a while. If λ_j(l + 1)(i) ≠ λ̄_j but player i uses the supplemental round 1 to infer λ_j(l + 1), then

‖ψ_i(λ̄_j)(l, λ_j, 1) − p_j(λ̄_j)‖ > K̄ε

and player i has (58) and (59). Therefore, by the triangle inequality,

‖H_j(λ̄_j)Q_{j,i}(a_j^{λ̄_j}, a_i^G)y_i(l, λ_j, 1) − p_j(λ̄_j)‖ > (K̄ − 1)ε.

From (62), this implies that y_i(l, λ_j, 1) ∉ H_{j,i}[ε](λ̄_j), which means that any y_j ∈ Δ({1_{y_j}}_{y_j∈Y_j}) within ε of the conditional expectation of (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_j}, conditional on the true message λ_j(l + 1) = λ̄_j, is not included in H_j[ε](λ̄_j). Since T_i(l, λ_j, 1) and T_j(l, λ_j, 1) differ by at most two periods, this implies that any y_j ∈ Δ({1_{y_j}}_{y_j∈Y_j}) within ε/2 of player i's conditional expectation of y_j(l, λ_j, 1), conditional on the true message λ_j(l + 1) = λ̄_j, is not included in H_j[ε](λ̄_j). Since y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j) is a necessary condition for κ_j(l, λ_j, 1) = G, player i puts belief no more than exp(−T^{3/7}) on κ_j(l, λ_j, 1) = G by Hoeffding's inequality.

Since Pr(θ_j(l, λ_j, 1) = G | {y_{j,t}}_t) ≥ 1 − exp(−T^{3/7}) for all {y_{j,t}}_t, we have

Pr({y_{j,t}}_t | θ_j(l, λ_j, 1) = G, {y_{i,t}}_t)
= Pr(θ_j(l, λ_j, 1) = G | {y_{i,t}}_t, {y_{j,t}}_t) Pr({y_{j,t}}_t | {y_{i,t}}_t) / Pr(θ_j(l, λ_j, 1) = G | {y_{i,t}}_t)
= Pr(θ_j(l, λ_j, 1) = G | {y_{j,t}}_t) Pr({y_{j,t}}_t | {y_{i,t}}_t) / Σ_{{y_{j,t}}_t} Pr(θ_j(l, λ_j, 1) = G | {y_{j,t}}_t) Pr({y_{j,t}}_t | {y_{i,t}}_t)
∈ [ (1 − 2 exp(−T^{3/7})) Pr({y_{j,t}}_t | {y_{i,t}}_t), (1 + 2 exp(−T^{3/7})) Pr({y_{j,t}}_t | {y_{i,t}}_t) ].   (66)

Hence, even after conditioning on θ_j(l, λ_j, 1) = G, player i believes κ_j(l, λ_j, 1) = G with probability at most exp(−T^{2/5}). In addition, conditional on λ_j(l + 1), θ_j(l) is independent of the supplemental rounds. Therefore, we are done.
2. Suppose that κ_j(l, λ_j, 1) = B never happens. Conditional on λ_j(l + 1) = λ̄_j, any (y_{i,t})_{t∈T(l,λ_j,1)} can occur with probability at least

( min_{y_i,a} q(y_i | a) )^{T^{1/2}}.

Hence, for

0 < e < 1/(− log( min_{y_i,a} q(y_i | a) )),

any history of player i can happen with probability at least exp(−(1/e)T^{1/2}).

By the proof symmetric to (66), conditioning on θ_j(l, λ_j, 1) = G does not change the probability much. Again, θ_j(l) is independent of the supplemental rounds.
3. Suppose that τ_i(l, λ_j, 1) = B never happens. Conditional on (a_t, y_{j,t})_{t∈T(l,λ_j,1)}, any y_i(l, λ_j, 1) can occur with probability at least

( min_{y_j,y_i,a} q(y_i | a, y_j) )^{T^{1/2}}.

Further, for any λ_j ∈ {G, B}, for y_i(l, λ_j, 1) sufficiently close to q(a_j^{λ_j}, a_i^G), (61) and (63) are satisfied and λ_j(l + 1)(i) = λ_j. Hence, for

0 < e < 1/(− log( min_{y_j,y_i,a} q(y_i | a, y_j) )),

λ_j(l + 1)(i) = λ_j can happen together with (61) (and so ϑ_i(l, λ_j, 1) = G) with probability no less than exp(−(1/e)T^{1/2}).

By the proof symmetric to (66), conditioning on τ_i(l, λ_j, 1) = G does not change the belief much. Again, τ_i(l) is independent of the supplemental rounds.

4 to 8. These follow from Hoeffding's inequality and the fact that we take the affine hull with respect to a_j in the definition of H_i(G).
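The exponential bounds in Lemma 9 and its proof rest on Hoeffding's inequality: the mean of n i.i.d. draws in [0, 1] deviates from its expectation by at least ε with probability at most 2 exp(−2nε²); with n of order T^{1/2} periods per supplemental round this yields the exp(−T^{3/7})- and exp(−T^{2/5})-type orders used above. A small simulation with illustrative n and ε (not the paper's values):

```python
import math
import numpy as np

def hoeffding_bound(n, eps):
    """Two-sided Hoeffding bound for the mean of n i.i.d. draws in [0, 1]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

rng = np.random.default_rng(1)
n, eps, p = 10_000, 0.05, 0.5
# 5,000 experiments: empirical mean of n fair-coin flips in each
means = rng.binomial(n, p, size=5_000) / n
empirical = float(np.mean(np.abs(means - p) >= eps))
# the observed tail frequency never exceeds the theoretical bound
assert empirical <= hoeffding_bound(n, eps)
```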
15.4.2 Explanation of 2 of Lemma 5: Supplemental Round 2 for λ_j(l + 1)

The following is the formal description of the message exchanges in the supplemental round 2 for λ_j(l + 1).

Strategies  Player j (the sender) with the message λ_j(l + 1) = λ̄_j ∈ {G, B} determines z_j(λ̄_j) ∈ {G, B, M} as follows:

z_j(λ̄_j) = λ̄_j with probability 1 − 2η;
           the element of {G, B} \ {λ̄_j} with probability η;
           M with probability η.

Here, η is determined in Section 10. Player j with z_j(λ̄_j) takes

α_j^{z_j(λ̄_j)} = a_j^G if z_j(λ̄_j) = G;
               a_j^B if z_j(λ̄_j) = B;
               (1/2)a_j^G + (1/2)a_j^B if z_j(λ̄_j) = M

for T^{1/2} periods in the supplemental round 2 for λ_j(l + 1), that is, for t ∈ T(l, λ_j, 2). Player i (the receiver) takes a_i^G.
Requirement of Player j  Player j who has had θ_j(l, λ_j, 1) = B or κ_j(l, λ_j, 1) = B in the supplemental round 1 has already made player i indifferent between any action profiles in the subsequent rounds. Player j who has θ_j(l, λ_j, 1) = κ_j(l, λ_j, 1) = G requires player i to infer λ_j(l + 1) correctly if and only if z_j(λ̄_j) = λ̄_j; λ_j(l + 1)(i) = B is required if z_j(λ̄_j) ≠ λ̄_j.

See Figure 12 again. Notice that Box 9 is the only place where player j requires player i to infer λ_j(l + 1) correctly, and this happens only if κ_j(l, λ_j, 1) = G and z_j(λ̄_j) = λ̄_j.
Inference of Player i  If player i has used the supplemental round 1 to infer λ_j(l + 1), player i sticks to that inference. Otherwise, player i infers λ_j(l + 1)(i) using the supplemental round 2. Remember that player i uses the supplemental round 2 only if τ_i(l, λ_j, 1) = B or ϑ_i(l, λ_j, 1) = B, and player j (the sender) excludes these cases from consideration (see Figure 13). Therefore, player j does not care about player i's inference in the supplemental round 2.

Based on the signal observations in the supplemental round 2, {y_{i,t}}_{t∈T(l,λ_j,2)}, the belief of player i can be classified into the following three cases:
Lemma 10  For each λ̄_j ∈ {G, B}, conditional on λ_j(l + 1) = λ̄_j, one of the following three is correct:

1. the likelihood ratio of z_j(λ̄_j) = G compared to z_j(λ̄_j) = B is no less than exp(T^{3/7});

2. the likelihood ratio of z_j(λ̄_j) = B compared to z_j(λ̄_j) = G is no less than exp(T^{3/7});

3. the likelihood ratio of z_j(λ̄_j) = M compared to each of z_j(λ̄_j) = G and z_j(λ̄_j) = B is no less than exp(T^{3/7}).

If 1 of Lemma 10 is the case, then player i infers λ_j(l + 1)(i) = G. This is almost optimal since we condition on λ_j(l + 1) = λ̄_j ∈ {G, B} and player i is indifferent between any actions if z_j(λ̄_j) ≠ λ̄_j. If 2 of Lemma 10 is the case, then player i infers λ_j(l + 1)(i) = B. This is almost optimal for the same reason. If 3 of Lemma 10 is the case, then player i infers λ_j(l + 1)(i) = B. In this case, z_j(λ̄_j) = M ≠ λ̄_j is highly likely and player i is indifferent between any actions. Therefore, this is also almost optimal.
Proof. Conditional on λ_j(l + 1) = λ̄_j,

Pr(z_j(λ̄_j) = z_j | λ̄_j, {y_{i,t}}_{t∈T(l,λ_j,2)}) / Pr(z_j(λ̄_j) = z′_j | λ̄_j, {y_{i,t}}_{t∈T(l,λ_j,2)})
= [ Pr({y_{i,t}}_{t∈T(l,λ_j,2)} | z_j(λ̄_j) = z_j) / Pr({y_{i,t}}_{t∈T(l,λ_j,2)} | z_j(λ̄_j) = z′_j) ] × [ Pr(z_j(λ̄_j) = z_j | λ̄_j) / Pr(z_j(λ̄_j) = z′_j | λ̄_j) ],

and Pr(z_j(λ̄_j) = z_j | λ̄_j) / Pr(z_j(λ̄_j) = z′_j | λ̄_j) is bounded by [η/(1 − 2η), (1 − 2η)/η]. With y_i(l, λ_j, 2) being the frequency of each y for {y_{i,t}}_{t∈T(l,λ_j,2)}, log Pr({y_{i,t}}_{t∈T(l,λ_j,2)} | z_j(λ̄_j) = z_j) is expressed as T^{1/2} L(y_i(l, λ_j, 2), z_j) with

L(y_i(l, λ_j, 2), z_j) = y_{i,1}(l, λ_j, 2) log q(y_{i,1} | a_i^G, α_j^{z_j}) + ⋯ + y_{i,|Y_i|}(l, λ_j, 2) log q(y_{i,|Y_i|} | a_i^G, α_j^{z_j}).
Hence, it suffices to show that there exists Δ̃ such that, with

L(f_i, z_j, z′_j) ≡ L(f_i, z_j) − L(f_i, z′_j),

for any f_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}), one of the following is true:

1. z_j(λ̄_j) = G is more likely than z_j(λ̄_j) = B: L(f_i, G, B) ≥ Δ̃;

2. z_j(λ̄_j) = B is more likely than z_j(λ̄_j) = G: L(f_i, B, G) ≥ Δ̃;

3. z_j(λ̄_j) = M is more likely than z_j(λ̄_j) = G and B: L(f_i, M, G) ≥ Δ̃ and L(f_i, M, B) ≥ Δ̃.

Generically, we can assume that for each k ∈ {1, …, |Y_i|},

q(y_{i,k} | a_i^G, α_j^G) ≠ q(y_{i,k} | a_i^G, α_j^B).   (67)
Let α_j^β = β a_j^G + (1 − β) a_j^B for β ∈ [0, 1] and consider

g(f_i, β) = f_{i,1} log q(y_{i,1} | a_i^G, α_j^β) + ⋯ + f_{i,|Y_i|} log q(y_{i,|Y_i|} | a_i^G, α_j^β).

Then,

d²g(f_i, β)/dβ² = − Σ_{k=1}^{|Y_i|} f_{i,k} { (q(y_{i,k} | a_i^G, α_j^G) − q(y_{i,k} | a_i^G, α_j^B)) / q(y_{i,k} | a_i^G, α_j^β) }² < 0

for any f_i because of (67). Hence, g(f_i, β) is strictly concave in β. Therefore, since L(f_i, z_j, z̃_j) is the difference in g(f_i, β) between the weights corresponding to z_j and z̃_j, one of the following is true:

1. z_j(λ̄_j) = G is more likely than z_j(λ̄_j) = B: L(f_i, G, B) > 0;

2. z_j(λ̄_j) = B is more likely than z_j(λ̄_j) = G: L(f_i, B, G) > 0;

3. z_j(λ̄_j) = M is more likely than z_j(λ̄_j) = G and B: L(f_i, M, G) > 0 and L(f_i, M, B) > 0.

Hence,

max{ L(f_i, G, B), L(f_i, B, G), min{L(f_i, M, G), L(f_i, M, B)} } > 0.

Since the left-hand side is continuous in f_i and Δ({1_{y_i}}_{y_i∈Y_i}) is compact, there exists Δ̃ > 0 such that

max{ L(f_i, G, B), L(f_i, B, G), min{L(f_i, M, G), L(f_i, M, B)} } > Δ̃

for all f_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}), as desired.
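The key step, strict concavity of the mixture log-likelihood, is easy to verify numerically: for any frequency vector f_i, g(f_i, β) at a midpoint strictly exceeds the average of its endpoint values whenever (67) holds. A sketch with hypothetical distributions q_G = q(· | a_i^G, α_j^G) and q_B = q(· | a_i^G, α_j^B) chosen to satisfy (67):

```python
import numpy as np

def g(f, beta, q_G, q_B):
    """Mixture log-likelihood g(f_i, beta) with mixing weight beta on q_G."""
    q = beta * q_G + (1.0 - beta) * q_B
    return float(np.dot(f, np.log(q)))

q_G = np.array([0.60, 0.30, 0.10])   # hypothetical q(. | a_i^G, alpha_j^G)
q_B = np.array([0.20, 0.35, 0.45])   # hypothetical q(. | a_i^G, alpha_j^B)
f = np.random.default_rng(2).dirichlet(np.ones(3))  # arbitrary frequency f_i
b0, b1 = 0.2, 0.9
# strict concavity: the midpoint value strictly exceeds the chord's value
assert g(f, 0.5 * (b0 + b1), q_G, q_B) > 0.5 * (g(f, b0, q_G, q_B) + g(f, b1, q_G, q_B))
```

Strict concavity in β is what rules out a signal frequency under which G, B and M are all (nearly) equally likely, which is exactly the trichotomy the compactness argument turns into the uniform bound Δ̃.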
15.4.3 Proof of 3-(a) to 3-(f)

This follows from Lemma 9 and the conditional independence of each round.

15.5 Proof of Lemma 6

Since λ_j(l) = λ_j(l)(i) always holds for σ_i(B), we concentrate on σ_i(G).
Since once λ_j(l̃) = B is induced, λ_j(l̃′) = B holds for all the following rounds, there exists a unique l* such that λ_j(l̃) = B is initially induced in the (l* + 1)th review round: λ_j(1) = ⋯ = λ_j(l*) = G and λ_j(l* + 1) = ⋯ = λ_j(L) = B. Similarly, there exists l_* with λ_j(1)(i) = ⋯ = λ_j(l_*)(i) = G and λ_j(l_* + 1)(i) = ⋯ = λ_j(L)(i) = B. If λ_j(L) = G (λ_j(L)(i) = G, respectively), then define l* = L (l_* = L, respectively).

Then, there are the following three cases:

• l* = l_*: This means λ_j(l) = λ_j(l)(i) for all l, as desired.

• l* > l_*: This means that player i in the supplemental rounds 1 and 2 for λ_j(l_* + 1) inferred λ_j(l_* + 1)(i) = B. Then, for any l ≥ l_* + 1, by 1 of Lemma 5, player i in the lth review round believes, conditional on θ_j(r̃) = ϑ_j(r̃) = G for all r̃ < l, that "λ_j(l_* + 1)(i) = λ_j(l_* + 1) or κ_j(l_*, λ_j, 1) = B or κ_j(l_*, λ_j, 2) = B" with probability no less than 1 − exp(−T^{1/3}), as desired.

• l* < l_*: There are the following two cases:

– Player i was ready to listen in the l_*th block: by the same reason as above, we are done.

– Player i was not ready to listen in the l_*th block (this means that player i has not deviated until the end of the main round of the l_*th block). This means that 3 of Lemma 4 is applicable and player i at the end of the l_*th review round believes that λ_j(l_* + 1) = G with probability no less than 1 − exp(−T^{4/5}).
As we have mentioned in Section 9.1, player j's continuation strategy reveals λ_j(l + 1) through (i) the strategy in the supplemental rounds and (ii) λ_i(l + 1). From 2 of Lemma 5, conditional on λ_j(l + 1) and θ_j(l_*) = θ_j(l_*, λ_j, 1) = τ_j(l_*, λ_i, 1) = ϑ_j(l_*, λ_i, 1) = G, any history of player i can happen in the supplemental rounds with probability no less than exp(−T^{2/5}). Hence, the update from (i) is bounded by exp(T^{2/5}). In addition, for λ_i(l + 1), player j is ready to listen with probability at least η, and 3 of Lemma 5 (with the roles of i and j reversed) implies that any λ_i(l + 1)(j) happens with probability no less than exp(−T^{2/5}). Hence, the update from (ii) is bounded by (1/η) exp(T^{2/5}). Therefore, after observing player j's continuation strategy, player i believes λ_j(l + 1) = B with probability at most

exp(−T^{4/5}) (1/η) exp(2T^{2/5}) / (1 − exp(−T^{4/5})) < exp(−T^{1/3})   (68)

as desired.
15.6 Proof of Proposition 1

For (6), it suffices to have

Δπ_i(x, l, λ_j) ≤ 0 if x_j = G;  Δπ_i(x, l, λ_j) ≥ 0 if x_j = B,   (69)

|Δπ_i(x, l, λ_j)| ≤ max_{i,a} 2|u_i(a)| T   (70)

for all x ∈ {G, B}², l ∈ {1, …, L} and λ_j ∈ {G, B}.

To see why (69) and (70) are sufficient for (6), notice the following: (70) implies uniform boundedness. (69) implies that π_i(x_j, h_j^{T_P+1}, l) > −ρT with x_j = G, or π_i(x_j, h_j^{T_P+1}, l) < ρT with x_j = B, only if λ_j(l) = G and X_j(l) ∉ [q_2T − 2εT, q_2T + 2εT]. Since we have λ_j(l̃) = B for l̃ > l after those events by (28), π_i(x_j, h_j^{T_P+1}, l) ≤ −ρT with x_j = G, or π_i(x_j, h_j^{T_P+1}, l) ≥ ρT with x_j = B, except for one round. In addition, (69) implies −ρ̄_L T − ρT ≤ π_i(x_j, h_j^{T_P+1}, l) ≤ ρ̄_L T + ρT for all x_j. Hence, in total, Σ_{l=1}^L π_i(x_j, h_j^{T_P+1}, l) ≤ ρ̄_L T − LρT < 0 with x_j = G and Σ_{l=1}^L π_i(x_j, h_j^{T_P+1}, l) ≥ −ρ̄_L T + LρT > 0 with x_j = B. The strict inequalities follow from (11). Therefore, (6) is satisfied.

Therefore, we prove Proposition 1 with (6) replaced by (69) and (70).
1 follows from (32). 2-(a) is true since θ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r implies that player j has inferred λ_i(l + 1)(j) from the supplemental round 1 (see Figure 13, reversing the roles of i and j). 2-(b) is true since taking a_i^G gives the almost optimal inference and (35) gives a high reward on a_i^G.

Therefore, we are left to show the existence of Δπ_i(x, l, λ_j) with (69) and (70), and 2-(c), by backward induction. From Lemmas 4 and 5, the distribution of θ_j, ϑ_j and κ_j is independent of player i's strategy with high probability, and so we can neglect the effect of player i's strategy on θ_j, ϑ_j and κ_j.
In the Lth review round, consider the case with x_i = B first. If player j uses (33), then D_i is strictly optimal. If player j uses (32), then any action is optimal.

Consider the case with x_i = G next. If player j uses (33) and λ_j(L) = λ_j(L)(i) = G, then C_i is strictly optimal for sufficiently large T since the reward (33) is increasing in X_j(L) and (42) implies that the marginal expected increase in ρ̄_L X_j(L) is sufficiently large.¹⁸ If player j uses (33) and λ_j(L) = λ_j(L)(i) = B, then D_i is strictly optimal since the reward (33) is constant. If player j uses (32), then any action is optimal. Hence, σ_i(x_i) is optimal in these cases.

For the other cases, by Lemma 6, player i does not put posterior probability more than exp(−T^{1/3}) on them. Since the per-period difference in payoffs between two different strategies is bounded by Ū ≡ ρ̄_L + max_{i,a} 2|u_i(a)|, the expected loss from σ_i(x_i) (taking C_i if λ_j(L)(i) = G and D_i if λ_j(L)(i) = B) is no more than exp(−T^{1/3}) Ū T ≤ exp(−T^{1/4}) for sufficiently large T.

Therefore, in total, σ_i(x_i) is almost optimal in the Lth review round.
Further, if player j uses (33) and λ_j(L) = λ_j(L)(i), then player i's average continuation payoff at the beginning of the Lth review round, excluding Δπ_i(x, L, λ_j), is

u_i(D_i, C_j) − ρ if x_i = B, x_j = G;
u_i(D_i, D_j) + ρ if x_i = B, x_j = B;
u_i(C_i, C_j) − ρ − 2ε ρ̄_L if x_i = G, x_j = G, λ_j(L) = λ_j(L)(i) = G;
u_i(C_i, D_j) + ρ + 2ε ρ̄_L if x_i = G, x_j = B, λ_j(L) = λ_j(L)(i) = G;
u_i(D_i, C_j) − ρ if x_i = G, x_j = G, λ_j(L) = λ_j(L)(i) = B;
u_i(D_i, D_j) + ρ if x_i = G, x_j = B, λ_j(L) = λ_j(L)(i) = B.   (71)

Hence, there exists Δπ_i(x, L, λ_j) with (69) and (70) such that player i's average continuation payoff is equal to u_i(C_i, C_j) − ρ − 2ε ρ̄_L if x_j = G and u_i(D_i, D_j) + ρ + 2ε ρ̄_L if x_j = B.

¹⁸ For large T, the effect of dropping one t_j(l) is negligible.

In the supplemental rounds for λ_j(L) (where player i receives the message), as we have mentioned for 2-(b) of Proposition 1, σ_i(x_i) is exactly optimal.

In the supplemental round 2 for λ_i(L), as we have mentioned for 2-(a) of Proposition 1, σ_i(x_i) is exactly optimal.
In the supplemental round 1 for λ_i(L), since (i) (34) cancels out the difference in the instantaneous utilities, (ii) 4 of Lemma 4 guarantees that player i's continuation payoff is fixed regardless of player j's inference of player i's message if player j uses the inference λ_i(L)(j) for λ_i(L), and (iii) the equilibrium strategy gives the almost optimal inferences,¹⁹ σ_i(x_i) is optimal up to a loss of exp(−T^{1/3}){Ū T + max_{x,λ_j} Δπ_i(x, L, λ_j)} ≤ exp(−T^{1/4}) for sufficiently large T.
In the (L−1)th main round, for the almost optimality, the only difference from the Lth review round is that, if λ_j(L−1) = G, then player i's action in the (L−1)th review round can affect the distribution of λ_j(L). However, since (i) miscoordination between λ_j(L) and λ_j(L)(i) will not occur with probability more than exp(−T^{1/3}) by Lemma 6 and (ii) Δπ_i(x, L, λ_j) is determined so that player i's continuation payoff is the same between λ_j(L) = G and B if the coordination goes well, this difference is negligible for the almost optimality.

Further, if player j uses (33) and λ_j(L−1) = λ_j(L−1)(i), then player i's average payoff from the (L−1)th review round, excluding Δπ_i(x, L−1, λ_j), is given by (71). The cases where (32) will be used in the Lth review round happen with probability no more than 3η (player j is ready to listen to the message λ_i(L) and z_j(λ̄_j) ≠ λ_j(L)) plus some negligible probabilities for κ_j = B or ϑ_j = B. When (32) is used, the per-period payoff is bounded by [−ū, ū] by Lemma 2. Therefore, there exists Δπ_i(x, L−1, λ_j) with (69) and (70) such that player i's average continuation payoff from the (L−1)th and Lth review rounds is equal to u_i(C_i, C_j) − ρ − 2ε ρ̄_L − 3ηū/2 if x_j = G and u_i(D_i, D_j) + ρ + 2ε ρ̄_L + 3ηū/2 if x_j = B. Since the length of the supplemental rounds is much smaller than that of the review rounds, the instantaneous utilities and rewards in the supplemental rounds do not affect the average payoff.

¹⁹ See (68) for how the history in this round can affect the optimality of player i's inferences slightly.
Recursively, for l = 1, Proposition 1 is satisfied and the average ex ante payoff of player i is u_i(C_i, C_j) − ρ − 2ε ρ̄_L − 3(L−1)ηū/L if x_j = G and u_i(D_i, D_j) + ρ + 2ε ρ̄_L + 3(L−1)ηū/L if x_j = B. From (44), we can further modify Δπ_i(x, 1, G) with (69) and (70) such that σ_i(x_i) gives v̄_i (v_i, respectively) if x_j = G (B, respectively).²⁰
15.7 Formal Construction of the Report Block

We are left to show the truth-telling incentive for h_i^{T_P+1} and to establish the exact optimality of σ_i(x_i). To verify the truth-telling incentive, we distinguish the true history h_i^{T_P+1} from the message ĥ_i^{T_P+1}. In general, when we write a variable in player i's history with a "hat," it means player i's message (and, with cheap talk, equivalently player j's inference) about that variable.

Let A_j(r) be the set of information up to and including round r, consisting of

• the state x_j player j is in,

• the action a_j(l) player j took in the lth review round with l ≤ r, and

• the states θ_j(r̃), ϑ_j(r̃) ∈ {G, B} with r̃ < r that player j had.

We want to show that σ_i(x_i) is exactly optimal in round r conditional on A_j(r). Note that A_j(r) contains x_j, and so the equilibrium is belief-free at the beginning of the finitely repeated game.

We introduce the following variables. Let R_i(r) be the set of rounds r̃ ≤ r − 1 that are not a supplemental round 2 for λ_i(l + 1) (with R_i(1) = ∅), t_r be the initial period of round r, and h_i^r be the summary of player i's history at the beginning of round r. h_i^r is a collection of |A_i||Y_i| × 1 vectors, one for each r̃ ∈ R_i(r). The element of the vector for r̃ corresponding to (a_i, y_i) represents how many times player i observed (a_i, y_i) in round r̃.

We construct the message protocol for ĥ_i^{T_P+1} as follows:

²⁰ Remember that λ_j(1) = G.
• By a public randomization device, the players coordinate on who will report ĥ^{T_P+1}. Only one player reports the history: player 1 reports ĥ_1^{T_P+1} with probability 1/2 and player 2 reports ĥ_2^{T_P+1} with probability 1/2. Suppose player i is picked by the public randomization device.

• For each round r that is not a supplemental round 2 for λ_i(l + 1),

– Player i sends the history of the initial period of round r: (â_{i,t_r}, ŷ_{i,t_r}).

– Player i sends the history of round r: {â_{i,t}, ŷ_{i,t}}_{t∈T(r)}.

With abuse of notation, we assume the players can send multiple messages sequentially. Note that ĥ_i^r can be calculated from the messages sent before (â_{i,t_r}, ŷ_{i,t_r}).

Player j gives a reward to player i as follows. Here, we do not consider the feasibility constraint (6). As we will see, the total reward in the report block is bounded by T^{−1}, and we can restore (6) by adding or subtracting a small constant depending on x_j without affecting the incentives.
As a preparation, we prove the following lemma:

Lemma 11  Let h_i and h_j be player i's and player j's histories at the end of the main blocks, respectively. There generically exist ε̄ > 0 and g_i(h_j, â_i, ŷ_i) such that, for sufficiently large T, for any round r ∈ R_i(R) and period t in T(r), conditional on θ_j(r̃) = ϑ_j(r̃) = G with r̃ < r, it is better for player i to report (a_{i,t}, y_{i,t}) truthfully: for all h_i,

E[ g_i(h_j, â_{i,t}, ŷ_{i,t}) | θ_j(r̃) = ϑ_j(r̃) = G with r̃ < r, h_i, (â_{i,t}, ŷ_{i,t}) = (a_{i,t}, y_{i,t}) ]
> E[ g_i(h_j, â_{i,t}, ŷ_{i,t}) | θ_j(r̃) = ϑ_j(r̃) = G with r̃ < r, h_i, (â_{i,t}, ŷ_{i,t}) ≠ (a_{i,t}, y_{i,t}) ] + ε̄ T^{−1},   (72)

where (â_{i,t}, ŷ_{i,t}) is player i's message.

Proof. We show that

g_i(h_j, â_{i,t}, ŷ_{i,t}) = −1_{{t_j(r)=t}} ‖1_{y_{j,t}} − E[1_{y_{j,t}} | â_{i,t}, ŷ_{i,t}, a_{j,t}]‖²

works.²¹ ²² To see this, consider the following two cases:
1. If t_j(r) ≠ t, any report is optimal since g_i(h_j, â_{i,t}, ŷ_{i,t}) = 0.

2. If t_j(r) = t, then period t is not used for the construction of the continuation strategy. Hence, player i, after knowing t_j(r) = t and a_{j,t}, wants to maximize

max_{â_{i,t}, ŷ_{i,t}} E[ −‖1_{y_{j,t}} − E[1_{y_{j,t}} | â_{i,t}, ŷ_{i,t}, a_{j,t}]‖² | a_{i,t}, y_{i,t}, a_{j,t} ].

The first-order condition is

E[1_{y_{j,t}} | a_{i,t}, y_{i,t}, a_{j,t}] = E[1_{y_{j,t}} | â_{i,t}, ŷ_{i,t}, a_{j,t}],

which generically implies (â_{i,t}, ŷ_{i,t}) = (a_{i,t}, y_{i,t}), and the second-order condition is also satisfied.
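The reward g_i is a quadratic scoring rule, and the first-order condition above reflects a general fact: E‖1_{y_{j,t}} − c‖² is uniquely minimized over c at c = E[1_{y_{j,t}}], so a report inducing the true conditional distribution is optimal. A sketch with hypothetical conditional distributions:

```python
import numpy as np

def expected_score(p_true, p_report):
    """E[-||1_y - p_report||^2] when the one-hot vector 1_y has mean p_true.
    Expanding the square gives -(1 - 2 p_true . p_report + ||p_report||^2)."""
    return -(1.0 - 2.0 * float(np.dot(p_true, p_report))
             + float(np.dot(p_report, p_report)))

p_true = np.array([0.5, 0.3, 0.2])  # hypothetical E[1_{y_j} | truth, a_j]
p_lie = np.array([0.1, 0.2, 0.7])   # hypothetical E[1_{y_j} | false report, a_j]
# a truthful report (inducing p_true) strictly beats any report inducing
# a different conditional distribution -- the generic case in the proof
assert expected_score(p_true, p_true) > expected_score(p_true, p_lie)
```

Genericity matters because two different reports could in principle induce the same conditional distribution of y_{j,t}; the proof rules this out for generic monitoring structures.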
We are left to show that there exists ε > 0 such that, for any h_i, r ∈ R_i(R) and t ∈ T(r), player i puts belief at least εT^{−1} on t_j(r) = t. Suppose player i knows {a_{j,τ}}_{τ∈T(r)} and {y_{j,τ}, φ_{j,τ}}_{τ∈T_j(r)} in addition to h_i. Here, φ_{j,t} is

• H_{j,t} for t in a round r where player j is the sender of a message by a pure strategy,

• ∅ (no information) for t in a round r where player j is the sender of a message by a mixed strategy,

• ψ^G_{j,t}, ψ_{j,t}(G) and ψ_{j,t}(B) for t in a round r where player j is the receiver of player i's message sent by a pure strategy, and

• φ_{j,t} = (a_{j,t}(x), (E_{ji})_t) for t in a round r that is a review round.

²¹ We use the Euclidean norm in Section 15.7.
²² Kandori and Matsushima (1994) use a similar reward to give a player the incentive to tell the truth about the history.
Since {a_{j,τ}, y_{j,τ}, φ_{j,τ}}_{τ∈T_j(r)} determines player j's continuation strategy (including θ_j and ϑ_j), it suffices to show that, conditional on {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)} and (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j), player i puts belief at least εT^{−1} on t_j(r) = t. For any t and t′ ∈ T(r), the likelihood ratio between t_j(r) = t and t_j(r) = t′ is given by

Pr(t_j(r) = t | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))
/ Pr(t_j(r) = t′ | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))
= Pr({y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j) | {a_τ, y_{i,τ}}_{τ∈T(r)}, t_j(r) = t)
/ Pr({y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j) | {a_τ, y_{i,τ}}_{τ∈T(r)}, t_j(r) = t′)
∈ [ min_{a,y_i} q(ȳ_j, φ̄_j | a, y_i), 1/min_{a,y_i} q(ȳ_j, φ̄_j | a, y_i) ]²
⊆ [ min_{a,y_i,y_j,φ_j} q(y_j, φ_j | a, y_i), 1/min_{a,y_i,y_j,φ_j} q(y_j, φ_j | a, y_i) ]².

Since min q(y_j, φ_j | a, y_i) ∈ (0, 1) by Lemma 7, there exists ε > 0 such that

Pr(t_j(r) = t | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))
> ε Pr(t_j(r) = t′ | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))

for all t and t′. Since there exists at least one t′ with Pr(t_j(r) = t′ | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j)) ≥ T^{−1}, we are done.
By backward induction, for each r, we will construct the following rewards based on player i's messages:

• If we come to round r with θ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r, then we cancel all the rewards explained below for the following rounds r′ ≥ r. Then, since we have established the exact optimality of σ_i(x_i) in round r with θ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r without adjustment, any action is exactly optimal after (and including) round r.

• If round r is a supplemental round 2 for λ_i(l + 1), player i does not report the history of that round and the rewards below are all 0 for that round. Since Proposition 1 establishes the exact optimality without adjustment, we can keep the exact optimality.
• Otherwise, we consider the following punishments and rewards:

– (r-j-1) Based on ĥ_i^r, player j gives

Σ_{a_i} f(a_i | ĥ_i^r, A_j(r)) ψ_{j,t_r}^{a_i, a_{j,t_r}}   (73)

with

f(a_i | ĥ_i^r, A_j(r)) ∈ [−T^{−10r+6}, T^{−10r+6}] for all a_i ∈ A_i

such that, after ĥ_i^r, it is optimal to take a_i ∈ A_i(ĥ_i^r). Note that if ψ_{j,t_r}^{a_i, a_{j,t_r}} = 1, that is, if it is likely that player i took a_i at the beginning of round r, player j rewards player i by f(a_i | ĥ_i^r, A_j(r)) so that player i wants to follow the equilibrium strategy. The existence of such a function f will be verified below.

– (r-j-2) Player j punishes player i if it is likely that player i told a lie about the history of the initial period of round r, by

T^{−10r+5} g_i(h_j, â_{i,t_r}, ŷ_{i,t_r}).   (74)

– (r-j-3) Player j makes it optimal to constantly take a_{i,t_r} within round r by adding

T^{−10r+3} Σ_{t∈T(r)} ψ_{j,t}^{â_{i,t_r}, a_{j,t}}   (75)

for t included in round r; that is, if player i tells the truth and â_{i,t_r} = a_{i,t_r}, then player j rewards player i if it is likely that player i in round r takes the same action as the one in the initial period, a_{i,t_r}.

– (r-j-4) Player j punishes player i if it is likely that player i told a lie about {a_{i,t}, y_{i,t}}_{t∈T(r)}, by

Σ_{t∈T(r)} T^{−10r} g_i(h_j, â_{i,t}, ŷ_{i,t}).   (76)

Based on the message ĥ_i^r, player j calculates A_i(ĥ_i^r), the set of player i's actions that should be taken with positive probability in round r after history ĥ_i^r.
We show the truth-telling incentive for (â_{i,t_r}, ŷ_{i,t_r}) and {â_{i,t}, ŷ_{i,t}}_{t∈T(r)} by backward induction. We start from r = R, the last round. If θ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r, the messages about the history of round r are irrelevant. If θ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r, then regardless of the specification of f, since (73), (74) and (75) have been sunk, it is optimal for player i to tell the truth about {a_{i,t}, y_{i,t}}_{t∈T(R)}. Since (74) dominates (75), it is optimal to tell the truth about (a_{i,t_r}, y_{i,t_r}).

For round R − 1, (73), (74), (75) and (76) for r = R are dominated by the smallest loss in (76) and (74) for r = R − 1. Therefore, the same argument as for round R works. We can proceed until the first round.

Recursively, therefore, regardless of the specification of f, we have established the optimality of truth-telling in the report block. Now, we construct f(a_i | ĥ_i^r, A_j(r)) by backward induction.
At the beginning of the round $R$, if $\lambda_j(\tilde r) = B$ or $\vartheta_j(\tilde r) = B$ for some $\tilde r \le R-1$, then any action is exactly optimal and the specification of $f$ is irrelevant. Otherwise, player $i$'s value of taking a constant action $a_i \in A_i$ conditional on $\mathcal{A}_j(R)$ depends only on $h_i^R$, given the truthtelling strategy in the report block. This is true even after player $i$'s deviation, since player $j$'s strategy is i.i.d. within each round. Hence, we can write the value as $v_i(a_i \mid h_i^R, \mathcal{A}_j(R))$. Let $f(a_i \mid h_i^R, \mathcal{A}_j(R))$ be such that
$$
\Pr(\text{player } i \text{ reports the history})
\sum_{\tilde a_i} f\bigl(\tilde a_i \mid h_i^R, \mathcal{A}_j(R)\bigr)
\Pr\Bigl(\bigl\{\psi_{j,t}^{\tilde a_i,\, a_{j,t}} = 1\bigr\} \Bigm| a_{i,t} = a_i\Bigr)
=
\begin{cases}
\max_{\tilde a_i \in A_i} v_i(\tilde a_i \mid h_i^R, \mathcal{A}_j(R)) - v_i(a_i \mid h_i^R, \mathcal{A}_j(R)) & \text{if } a_i \in \mathcal{A}_i(h_i^R), \\
0 & \text{otherwise,}
\end{cases}
$$
for all $h_i^R$. Remember that $\sigma_i(x_i)$ is almost optimal except for the adjustment of the reward in the report block, that all the messages about $h_i^R$ are transmitted correctly, and that the variance of the reward in the report block based on the histories in the round $R$ is bounded by $T^{-10R+5}$. Hence, we can make sure that
$$f(a_i \mid h_i^R, \mathcal{A}_j(R)) \in [-T^{-10R+6}, T^{-10R+6}].$$
This makes it exactly optimal to take $a_i \in \mathcal{A}_i(h_i^R)$ at $t_R$. After that, (75) and the truthtelling incentive imply that it is optimal to constantly take $a_{i,t_R}$.
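The construction of $f$ at the round $R$ amounts to solving a small linear system: the transfers conditioned on the inferred action must offset the value differences so that every action in the support yields the same total value. A minimal numerical sketch, with hypothetical values and a hypothetical signal matrix (not the paper's actual parameters):

```python
import numpy as np

# Hypothetical illustration of the indifference construction for f at the round R.
# v[a]: continuation value v_i(a | h_i^R, A_j(R)) of constantly playing action a
#       (made-up numbers, not the paper's parameters).
# P[a, a_tilde]: probability that the indicator psi^{a_tilde} fires when the true
#       constant action is a (a made-up, diagonally dominant signal matrix).
v = np.array([1.0, 0.7, 0.4])
P = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

# Indifference condition: the expected transfer must equal max_a v(a) - v(a),
# so that value plus expected transfer is the same for every action.
target = v.max() - v

# Solve the linear system P f = target for the transfers f(a_tilde | .).
f = np.linalg.solve(P, target)

# Every constant action now yields the same total value v.max().
total = v + P @ f
print(total)  # all entries equal to 1.0
```

The diagonal dominance of the signal matrix plays the role of the statistical identifiability that keeps the solved transfers small, mirroring the bound $f \in [-T^{-10R+6}, T^{-10R+6}]$ above.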
We can proceed until the first round and show the optimality of $\sigma_i(x_i)$. The difference from the round $R$ is that, when player $i$ takes $a_t$, it affects the reward for the messages sent after $(a_{t_r}, y_{t_r})$ about the history in the following rounds. Since this effect is dominated by (75), it is optimal for player $i$ to constantly take the same action as $a_{t_r}$.
Note that if the round $r$ is a supplemental round 2 for $\lambda_i(l+1)$ and this round does not have an impact ($\lambda_j(\tilde r) = \vartheta_j(\tilde r) = G$ for all $\tilde r < r$ guarantees this), then the expected rewards in the report block are not affected by the strategy in the round $r$. Therefore, $\sigma_i(x_i)$ is exactly optimal, as stated in Proposition 1. In addition, if the round $r$ is a supplemental round for $\lambda_j(l+1)$, then since $\sigma_i(x_i)$ is strictly optimal by $\tfrac{1}{2}\bar{u}L(q_2 - q_1)$ without $f$ by 2-(b) of Proposition 1, the equilibrium strategy is optimal regardless of the specification of $f$. Hence, we make $f$ constant at $0$.
Finally, we consider the reward on the message $x_i$:
$$f(x_i \mid x_j).$$
Conditional on $x_j \in \{G, B\}$, $\sigma_i(x_i)$ gives $\bar v_i$ ($\underline v_i$, respectively) if $x_j = G$ ($B$, respectively) without the reward in the report block from Section 11. Since the reward in the report block so far is bounded by $[-T^{-2}, T^{-2}]$, we can take $f(x_i \mid x_j) \in [-T^{-1}, T^{-1}]$ for all $x_i, x_j \in \{G, B\}$ such that $\sigma_i(x_i)$ gives exactly $\bar v_i$ ($\underline v_i$, respectively) if $x_j = G$ ($B$, respectively) once the reward in the report block is taken into account.
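The room for this last adjustment can be checked by the same order-of-magnitude accounting (a sketch, assuming each round has at most $T$ periods and per-period report-block weights of at most $T^{-10r+6}$):

```latex
\Bigl|\,\text{report-block reward}\,\Bigr|
  \;\le\; \sum_{r=1}^{R} T \cdot T^{-10r+6}
  \;\le\; 2\,T^{-3}
  \;\le\; T^{-2},
```

so an adjustment $f(x_i \mid x_j)$ of order $T^{-1}$ dominates every report-block term and can deliver the exact values $\bar v_i$ and $\underline v_i$.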
References

[1] Aoyagi, M. (2002): "Collusion in Dynamic Bertrand Oligopoly with Correlated Private Signals and Communication," Journal of Economic Theory, 102, 229-248.

[2] Bhaskar, V., and I. Obara (2002): "Belief-Based Equilibria in the Repeated Prisoners' Dilemma with Private Monitoring," Journal of Economic Theory, 102, 49-69.

[3] Compte, O. (1998): "Communication in Repeated Games with Imperfect Private Monitoring," Econometrica, 66, 597-626.

[4] Deb, D. (2011): "Cooperation and Community Responsibility: A Folk Theorem for Repeated Random Matching Games," mimeo.

[5] Ely, J., J. Hörner, and W. Olszewski (2004): "Dispensability of Public Randomization Device," mimeo.

[6] Ely, J., J. Hörner, and W. Olszewski (2005): "Belief-Free Equilibria in Repeated Games," Econometrica, 73, 377-415.

[7] Ely, J., and J. Välimäki (2002): "A Robust Folk Theorem for the Prisoners' Dilemma," Journal of Economic Theory, 102, 84-105.

[8] Fong, K., O. Gossner, J. Hörner, and Y. Sannikov (2007): "Efficiency in a Repeated Prisoners' Dilemma with Imperfect Private Monitoring," mimeo.

[9] Fuchs, W. (2007): "Contracting with Repeated Moral Hazard and Private Evaluations," American Economic Review, 97, 1432-1448.

[10] Fudenberg, D., and D. Levine (2002): "The Nash-Threats Folk Theorem with Communication and Approximate Common Knowledge in Two Player Games," Journal of Economic Theory, 132, 461-473.

[11] Fudenberg, D., D. Levine, and E. Maskin (1994): "The Folk Theorem with Imperfect Public Information," Econometrica, 62, 997-1040.

[12] Fudenberg, D., and E. Maskin (1986): "The Folk Theorem in Repeated Games with Discounting and with Incomplete Information," Econometrica, 54, 533-554.

[13] Harrington, J. E. Jr., and A. Skrzypacz (2007): "Collusion under Monitoring of Sales," RAND Journal of Economics, 38, 314-331.

[14] Hörner, J., and W. Olszewski (2006): "The Folk Theorem for Games with Private Almost-Perfect Monitoring," Econometrica, 74, 1499-1544.

[15] Hörner, J., and W. Olszewski (2009): "How Robust is the Folk Theorem with Imperfect Public Monitoring?," Quarterly Journal of Economics, 124, 1773-1814.

[16] Kandori, M. (2002): "Introduction to Repeated Games with Private Monitoring," Journal of Economic Theory, 102, 1-15.

[17] Kandori, M. (2010): "Weakly Belief-Free Equilibria in Repeated Games with Private Monitoring," Econometrica, 79, 877-892.

[18] Kandori, M., and H. Matsushima (1998): "Private Observation, Communication and Collusion," Econometrica, 66, 627-652.

[19] Kandori, M., and I. Obara (2006): "Efficiency in Repeated Games Revisited: The Role of Private Strategies," Econometrica, 72, 499-519.

[20] Kandori, M., and I. Obara (2010): "Towards a Belief-Based Theory of Repeated Games with Private Monitoring: An Application of POMDP," mimeo.

[21] Mailath, G., and S. Morris (2002): "Repeated Games with Almost-Public Monitoring," Journal of Economic Theory, 102, 189-228.

[22] Mailath, G., and S. Morris (2006): "Coordination Failure in Repeated Games with Almost-Public Monitoring," Theoretical Economics, 1, 311-340.

[23] Mailath, G., and L. Samuelson (2006): Repeated Games and Reputations: Long-Run Relationships. Oxford University Press, New York, NY.

[24] Matsushima, H. (2004): "Repeated Games with Private Monitoring: Two Players," Econometrica, 72, 823-852.

[25] Miyagawa, E., Y. Miyahara, and T. Sekiguchi (2008): "The Folk Theorem for Repeated Games with Observation Costs," Journal of Economic Theory, 139, 192-221.

[26] Obara, I. (2009): "Folk Theorem with Communication," Journal of Economic Theory, 144, 120-134.

[27] Phelan, C., and A. Skrzypacz: "Beliefs and Private Monitoring," mimeo.

[28] Piccione, M. (2002): "The Repeated Prisoners' Dilemma with Imperfect Private Monitoring," Journal of Economic Theory, 102, 70-83.

[29] Radner, R. (1985): "Repeated Principal-Agent Games with Discounting," Econometrica, 53, 1173-1198.

[30] Sekiguchi, T. (1997): "Efficiency in Repeated Prisoners' Dilemma with Private Monitoring," Journal of Economic Theory, 76, 345-361.

[31] Stigler, G. (1964): "A Theory of Oligopoly," Journal of Political Economy, 72, 44-61.

[32] Sugaya, T. (2010a): "Generic Characterization of the Set of Belief-Free Review-Strategy Equilibrium Payoffs," mimeo.

[33] Sugaya, T. (2010b): "Folk Theorem in a Repeated Prisoners' Dilemma without Conditional Independence," mimeo.

[34] Sugaya, T. (2010c): "Folk Theorem in a Repeated Prisoners' Dilemma with a Generic Private Monitoring," mimeo.

[35] Sugaya, T., and S. Takahashi (2010): "Coordination Failure in Repeated Games with Private Monitoring," mimeo.

[36] Takahashi, S. (2007): "Community Enforcement when Players Observe Partners' Past Play," mimeo.

[37] Yamamoto, Y. (2007): "Efficiency Results in N Player Games with Imperfect Private Monitoring," Journal of Economic Theory, 135, 382-413.

[38] Yamamoto, Y. (2009a): "A Limit Characterization of Belief-Free Equilibrium Payoffs in Repeated Games," forthcoming in Journal of Economic Theory.

[39] Yamamoto, Y. (2009b): "Repeated Games with Private and Independent Monitoring," mimeo.
[Figure 1: The Informal Structure of the Phase]

[Figure 2: Transition of …]

[Figure 3: Player …'s Action]

[Figure 4: Transition of … and Player …'s Reward After …]

[Figure 5: Formal Structure of the Phase]

[Figure 6: Inference of …]

[Figure 7: Reward Function by Player … on Player …]

[Figure 8: Player …'s Inference of … and Player …'s Inference of …]

[Figure 9: Player …'s Inference of Player …'s Message]

[Figure 10: Player …'s Requirement on Player …'s Inference]

[Figure 11: Player …'s Inference of Player …'s Message]

[Figure 11: Formal: Player …'s Requirement on Player …'s Inference]

[Figure 13: Formal: Player …'s Inference of Player …'s Message]