Folk Theorem in Repeated Games with Private Monitoring
Takuo Sugaya
Princeton University
September 2, 2011
Abstract
We show that the folk theorem with individually rational payoffs defined by pure strategies generically holds for N-player repeated games with private monitoring when each player's number of signals is sufficiently large. No cheap talk communication device or public randomization device is necessary.
Journal of Economic Literature Classification Numbers: C72, C73, D82
Keywords: repeated game, folk theorem, private monitoring
�[email protected] thank Stephen Morris for invaluable advice, Yuliy Sannikov for helpful discussion, and Dilip Abreu,
Edoardo Grillo, Johannes Hörner, Yuhta Ishii, Michihiro Kandori, Vijay Krishna, George Mailath, TadashiSekiguchi, Andrzej Skrzypacz, Satoru Takahashi, Yuichi Yamamoto and seminar participants at 10th SAETconference, 10th World Congress of the Econometric Society, 22nd Stony Brook Game Theory Festival andSummer Workshop on Economic Theory for useful comments. The usual disclaimer of course applies.
1 Introduction
One of the key results in the literature on infinitely repeated games is the folk theorem: Any feasible and individually rational payoff can be sustained in equilibrium when players are sufficiently patient. Fudenberg and Maskin (1986) establish the folk theorem under perfect monitoring, that is, when players can directly observe the action profile. Fudenberg, Levine and Maskin (1994) extend the folk theorem to imperfect public monitoring, where players can observe only public noisy signals about the action profile.
The driving force of the folk theorem under perfect or public monitoring is the coordination of future play based on common knowledge of relevant histories. Specifically, the public component of histories, such as action profiles in perfect monitoring or public signals in public monitoring, reveals past action profiles (at least statistically). Since this public information is common knowledge, players can coordinate a punishment contingent on the public information, and thereby provide dynamic incentives to choose actions that are not static best responses.
With private monitoring, players can observe only private noisy signals about the action profile. Common knowledge no longer exists and coordination is difficult (we call this problem "coordination failure").1 Hence, the robustness of the folk theorem to general private monitoring has been an open question. For example, Kandori (2002) mentions that "[t]his is probably one of the best known long-standing open questions in economic theory."
Many economic situations should be analyzed as repeated games with private monitoring. For example, Stigler (1964) proposes a repeated price-setting oligopoly, where firms set their own prices in face-to-face negotiations and cannot observe their opponents' prices. Instead, a firm obtains some information about opponents' prices through its own sales. Since the sales level depends on both opponents' prices and unobservable shocks due to business cycles, sales are an imperfect signal. In addition, each firm's sales are often private information. Thus, monitoring is imperfect and private. In addition, Fuchs (2007) applies a repeated game
1Mailath and Morris (2002 and 2006) and Sugaya and Takahashi (2010) offer a formal model of this argument.
with private monitoring to a contract between a principal and an agent, and Harrington
and Skrzypacz (2007) analyze cartel behaviors using the framework of a repeated game with
private monitoring.
This paper is the first to show that the folk theorem holds in repeated games with generic private monitoring. We unify and improve the three approaches in the literature on private monitoring that have been used to show partial results so far: the belief-free, belief-based and communication approaches.
The belief-free approach (and its generalization) has been successful in showing the folk theorem in the prisoners' dilemma.2 A strategy profile is belief-free if, for any history profile, the continuation strategy of each player is optimal conditional on the opponent's history. With almost perfect monitoring, Piccione (2002) and Ely and Välimäki (2002) show the folk theorem for the two-player prisoners' dilemma. Without any assumption on the precision of monitoring but with conditionally independent monitoring, Matsushima (2004) obtains the folk theorem in the two-player prisoners' dilemma.
Unfortunately, only limited results have been shown without almost perfect or conditionally independent monitoring: Fong, Gossner, Hörner and Sannikov (2010) show that the payoff of mutual cooperation is approximately attainable, and Sugaya (2010a) shows the folk theorem in the two-player prisoners' dilemma for some restricted classes of distributions of the private signals.
Several papers construct belief-based equilibria, where players' strategies involve statistical inference about the opponents' past histories. With almost perfect monitoring, Sekiguchi (1997) shows that the payoff of mutual cooperation is approximately attainable and Bhaskar and Obara (2002) show the folk theorem in the two-player prisoners' dilemma. Mailath and Morris (2002 and 2006) consider the robustness of equilibria in public monitoring to almost public monitoring. Based on their insights, Hörner and Olszewski (2009) establish
2Kandori and Obara (2006) use a similar concept to analyze a private strategy in public monitoring. Kandori (2010) considers "weakly belief-free equilibria," which is a generalization of belief-free equilibria. Apart from a typical repeated-game setting, Takahashi (2010) and Deb (2011) consider community enforcement and Miyagawa, Miyahara and Sekiguchi (2008) consider the situation where a player can improve the monitoring by paying a cost.
the folk theorem for almost public monitoring. Phelan and Skrzypacz (2009) characterize the set of possible beliefs about opponents' states in a finite-state automaton strategy and Kandori and Obara (2010) offer an easy way to verify whether a finite-state automaton strategy is an equilibrium.
Another approach to analyzing repeated games with private monitoring is to introduce public communication. Folk theorems have been proven by Compte (1998), Kandori and Matsushima (1998), Aoyagi (2002), Fudenberg and Levine (2002) and Obara (2007). Introducing a public element and letting a strategy depend only on the public element allow these papers to sidestep the difficulty of coordination through private signals. However, the analyses are not applicable to settings where communication is not allowed: For example, in Stigler's (1964) oligopoly example, antitrust laws make communication illegal. Furthermore, it is unclear whether their equilibria are robust to small private noise in the message transmission, although robustness to small private noise is one of the main motivations for studying private monitoring.
This paper incorporates all three approaches. First, the equilibrium strategy used to show the folk theorem is occasionally belief-free. That is, we see the repeated game as the repetition of long review phases. At the beginning of each review phase, every on-path strategy of each player is optimal conditional on the histories of the opponents. Second, however, the belief-free property does not hold except at the beginning of the phases. Hence, within each phase we consider each player's statistical inference about the opponents' past histories, as in the belief-based approach. Finally, in our equilibrium, the players do communicate, but the message exchange can be done with their actions. One of our methodological contributions is to offer a systematic way to dispense with the cheap talk message protocol by replacing it with message exchange via actions.
The rest of the paper is organized as follows. Section 2 introduces the model and Section 3 states the assumptions and the main result. The remaining parts of the paper are devoted to its proof. Since the complete proof is long and complicated, in the proof in the main text we illustrate the main structure by focusing on the two-player prisoners' dilemma with cheap talk and public randomization, and refer the complete proof for a general game without cheap talk and public randomization device to the Supplemental Materials. Section 4 relates
the infinitely repeated game to a finitely repeated game with an auxiliary scenario (reward function), as in Hörner and Olszewski (2006), and derives a sufficient condition on the finitely repeated game to show the folk theorem. In Section 5, we intuitively explain the equilibrium construction. In particular, after explaining Matsushima (2004) with conditionally independent signals, we explain how to extend his result to conditionally dependent signals. We will see that the players need to coordinate their future play through private signals. After we formally define the structure of the finitely repeated game in Section 6 and the strategy in Section 8, Section 9 explains how the coordination works. Sections 11 and 12 verify that the strategy satisfies the sufficient condition for the folk theorem derived in Section 4. All the proofs are given in the Appendix. In Sections 13 and 14, we comment on how we generalize the proof in the main text to the case of a general game without cheap talk and public randomization device in the Supplemental Materials.
2 Model
2.1 Stage Game
The stage game is given by $(I, (A_i, Y_i, \tilde{u}_i)_{i \in I}, q)$. $I = \{1, \dots, N\}$ is the set of players, $A_i$ with $|A_i| \ge 2$ is the finite set of player $i$'s pure actions, $Y_i$ is the finite set of player $i$'s private signals, and $\tilde{u}_i : A_i \times Y_i \to \mathbb{R}$ is player $i$'s ex-post utility function. Let $A \equiv \prod_{i \in I} A_i$ and $Y \equiv \prod_{i \in I} Y_i$ be the sets of action profiles and signal profiles, respectively.
In every stage game, player $i$ chooses an action $a_i \in A_i$, which induces the action profile $a \equiv (a_1, \dots, a_N) \in A$. Then, a signal profile $y = (y_1, \dots, y_N) \in Y$ is realized according to a joint conditional probability function $q(y \mid a)$. Given an action $a_i \in A_i$ and a private signal $y_i \in Y_i$, player $i$ receives the ex-post utility $\tilde{u}_i(a_i, y_i)$. Thus, her expected payoff conditional on an action profile $a \in A$ is given by $u_i(a) \equiv \sum_{y \in Y} q(y \mid a)\, \tilde{u}_i(a_i, y_i)$. For each $a \in A$, let $u(a)$ represent the payoff vector $(u_i(a))_{i \in I}$.
2.2 Repeated Game
Consider the infinitely repeated game of the above stage game in which the (common) discount factor is $\delta \in (0, 1)$. Let $a_{i,\tau}$ and $y_{i,\tau}$ denote respectively the action played and the private signal observed in period $\tau$ by player $i$. Player $i$'s private history up to period $t - 1$ is given by $h_i^t \equiv (a_{i,\tau}, y_{i,\tau})_{\tau=1}^{t-1}$. With $h_i^1 = \{\emptyset\}$, for each $t \ge 1$, let $H_i^t$ be the set of all $h_i^t$. A strategy for player $i$ is defined to be a mapping $\sigma_i : \bigcup_{t=1}^{\infty} H_i^t \to \Delta(A_i)$. Let $\Sigma_i$ be the set of all strategies for player $i$. Finally, let $E(\delta)$ be the set of sequential equilibrium payoffs with a common discount factor $\delta$.
3 Assumptions and Result
In this section, we state two assumptions and the main result (folk theorem).
First, we state an assumption on the payoff structure. Let $F \equiv \mathrm{co}(\{u(a)\}_{a \in A})$ be the set of feasible payoffs. The individually rational payoff for player $i$ is $v_i^* \equiv \min_{a_{-i} \in A_{-i}} \max_{a_i \in A_i} u_i(a_i, a_{-i})$. Note that we concentrate on the pure strategy minimax. Then, the set of feasible and individually rational payoffs is given by $F^* \equiv \{v \in F : v_i \ge v_i^* \text{ for all } i\}$. We assume the full dimensionality of $F^*$.
Assumption 1 The stage game payoff structure satisfies the full dimensionality condition: $\dim(F^*) = N$.
Second, we state an assumption on the signal structure.
Assumption 2 Each player's number of signals is sufficiently large: For any $i \in I$, we have
$$|Y_i| \ge \max\left\{ \max_{j \ne i} |A_j| + 2 \sum_{n \ne i, i-1} |A_n|,\ \sum_{j \in I} |A_j|,\ \max_{j \in I} 2|A_j| \right\}.$$
Note that the right-hand side is bounded by a linear function of $\sum_{j} |A_j|$. Under these assumptions, we can generically construct an equilibrium to attain any point in $\mathrm{int}(F^*)$.
Theorem 1 If Assumptions 1 and 2 are satisfied, then the folk theorem generically holds: For generic $q(\cdot \mid \cdot)$, for any $v \in \mathrm{int}(F^*)$, there exists $\bar{\delta} < 1$ such that, for all $\delta > \bar{\delta}$, $v \in E(\delta)$.
Since the full support assumption $q(y \mid a) > 0$ for all $a \in A$ and $y \in Y$ is generic, we assume the monitoring has full support. Then, any sequential equilibrium is realization equivalent to a Nash equilibrium. Hence, for the rest of the paper, we consider a Nash equilibrium.
For the proof in the main text, we focus on the two-player prisoners' dilemma with perfect and public cheap talk and a public randomization device: $I = \{1, 2\}$, $A_i = \{C_i, D_i\}$ and
$$u_i(D_i, C_j) > u_i(C_i, C_j) > u_i(D_i, D_j) > u_i(C_i, D_j) \quad (1)$$
for all $i$. For notational convenience, whenever we say players $i$ and $j$, unless otherwise mentioned, $i$ and $j$ are different. In the two-player prisoners' dilemma, Assumptions 1 and 2 are equivalent to $|Y_i| \ge 4$ for all $i$. Furthermore, we focus on $v$ with
$$v \in \mathrm{int}([u_1(D_1, D_2), u_1(C_1, C_2)] \times [u_2(D_2, D_1), u_2(C_2, C_1)]). \quad (2)$$
Section 13 briefly explains how to show the folk theorem for a general game and Section 14 explains how to dispense with cheap talk and the public randomization device. See the Supplemental Materials for the formal proofs.
We prove the theorem with the following steps. We arbitrarily fix $v$ satisfying (2) and implement $v$ by a strategy profile that is recursive in every $T^P$ periods, where $T^P \in \mathbb{N}$ will be determined later. In Section 4, we relate the infinitely repeated game to a $T^P$-period finitely repeated game with an auxiliary scenario. Specifically, we derive sufficient conditions on a strategy and an auxiliary scenario in the $T^P$-period finitely repeated game from which we can construct an equilibrium strategy to implement $v$ in the infinitely repeated game. In Section 5, we intuitively explain the structure of the equilibrium. Given this, Section 6 explains the formal structure of the finitely repeated game and Section 8 explains the equilibrium strategy. In Sections 9, 11 and 12, we verify that the strategy satisfies the sufficient conditions in the finitely repeated game.
4 Finitely Repeated Game
In this section, we consider a $T^P$-period finitely repeated game with auxiliary scenarios. We derive sufficient conditions on strategies and auxiliary scenarios in the finitely repeated game such that we can construct a strategy in the infinitely repeated game to support $v$. The sufficient conditions are stated in Lemma 1.
Let $\sigma_i^{T^P} : H_i^{T^P} \to \Delta(A_i)$ be player $i$'s strategy in the finitely repeated game. Let $\Sigma_i^{T^P}$ be the set of all strategies in the finitely repeated game. Each player $i$ has a state $x_i \in \{G, B\}$. In state $x_i$, player $i$ plays $\sigma_i(x_i) \in \Sigma_i^{T^P}$. In addition, player $i$ with $x_i$ gives an "auxiliary scenario" (or "reward function") $\pi_j(x_i, \cdot : \delta)$ to player $j$. Here, $\pi_j(x_i, \cdot : \delta) : H_i^{T^P+1} \to \mathbb{R}$, that is, the auxiliary scenarios are functions from the histories in the finitely repeated game to the real numbers.
For $v$ with (2), we can take $\bar{\pi} > 0$, $\underline{v}_i$ and $\bar{v}_i$ such that
$$u_i(D_1, D_2) + \bar{\pi} < \underline{v}_i < v_i < \bar{v}_i < u_i(C_1, C_2) - \bar{\pi}. \quad (3)$$
Our task is to find $\sigma_i(x_i)$ and $\pi_j(x_i, \cdot : \delta)$ such that, for sufficiently large $\delta$, there exists $T^P$ such that, for any $i \in I$,
1. For any $x_j \in \{G, B\}$, $\sigma_i(G)$ and $\sigma_i(B)$ are optimal in the finitely repeated game:
$$\sigma_i(G), \sigma_i(B) \in \arg\max_{\sigma_i^{T^P} \in \Sigma_i^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_i(a_t) + \pi_i(x_j, h_j^{T^P+1} : \delta) \,\Big|\, \sigma_i^{T^P}, \sigma_j(x_j)\right]. \quad (4)$$
2. The discounted average of player $i$'s instantaneous utilities and player $j$'s auxiliary scenario on player $i$ is equal to $\bar{v}_i$ if player $j$'s state is good ($x_j = G$) and equal to $\underline{v}_i$ if player $j$'s state is bad ($x_j = B$):
$$\frac{1 - \delta}{1 - \delta^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_i(a_t) + \pi_i(x_j, h_j^{T^P+1} : \delta) \,\Big|\, \sigma(x)\right] = \begin{cases} \bar{v}_i & \text{if } x_j = G, \\ \underline{v}_i & \text{if } x_j = B. \end{cases} \quad (5)$$
Intuitively, for sufficiently large $\delta$, since $\lim_{\delta \to 1} \frac{1-\delta}{1-\delta^{T^P}} = \frac{1}{T^P}$, this requires that the time average of the expected sum of the instantaneous utilities and the reward is close to the targeted payoffs $\underline{v}_i$ and $\bar{v}_i$.
3. $\pi_i(G, h_j^{T^P+1} : \delta)$ and $\pi_i(B, h_j^{T^P+1} : \delta)$ are uniformly bounded with respect to $\delta$ and
$$\pi_i(G, h_j^{T^P+1} : \delta) \le 0, \qquad \pi_i(B, h_j^{T^P+1} : \delta) \ge 0. \quad (6)$$
We call (6) the "feasibility constraint."3
The following lemma gives us a sufficient condition on the finitely repeated game to show the folk theorem in the infinitely repeated game:
Lemma 1 For Theorem 1, it suffices to show that there exist $\{\{\sigma_i(x_i)\}_{x_i \in \{G,B\}}\}_{i \in I}$ and $\{\{\pi_i(x_j, \cdot : \delta)\}_{x_j \in \{G,B\}}\}_{i \in I}$ satisfying (4), (5) and (6) in the $T^P$-period finitely repeated game.
5 Intuitive Explanation
5.1 With Conditionally Independent Signals
Following Matsushima (2004), we first consider the case with conditionally independent monitoring:
$$q(y_j \mid a, y_i) = q(y_j \mid a)$$
for all $a \in A$ and $y \in Y$. That is, conditional on an action profile $a$, player $i$'s signal has no information about the opponent's signal. In this case, it is easy to construct $\sigma_i(x_i)$ and $\pi_i(x_j, \cdot : \delta)$ satisfying (4), (5) and (6).
3For notational convenience, when we say that (6) is satisfied, it also implies that $\pi_i(G, h_j^{T^P+1} : \delta)$ and $\pi_i(B, h_j^{T^P+1} : \delta)$ are uniformly bounded with respect to $\delta$.
With conditional independence, we can see the $T^P$-period finitely repeated game as a single $T$-period "review round" with $T^P = T$ sufficiently long.
$\sigma_i(x_i)$ is simply defined as follows: At the beginning of the finitely repeated game, send the message $x_i \in \{G, B\}$ by cheap talk. Then, constantly take
$$a_i(x_i) \equiv \begin{cases} C_i & \text{if } x_i = G, \\ D_i & \text{if } x_i = B \end{cases} \quad (7)$$
for $T$ periods. That is, player $i$ with $\sigma_i(G)$, who wants to make player $j$'s value high, takes $C_i$, and player $i$ with $\sigma_i(B)$, who wants to make player $j$'s value low, takes $D_i$.
We are left to construct $\pi_i(x_j, \cdot : \delta)$. We concentrate on the case with $x_j = G$ since the case with $x_j = B$ is symmetric. If player $j$ receives the message $x_i = B$, then player $i$ is supposed to take $D_i$. Since $D_i$ is the dominant action, player $i$ does not need to be incentivized by the auxiliary scenario. Consider the following constant reward function:
$$\pi_i(G, h_j^{T^P+1} : \delta) \equiv -\bar{\pi} T.$$
Since the reward is constant, player $i$ plays $D_i$. Therefore, $\sigma_i(B)$ is optimal after the message $x_i = B$ and
$$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_{i,t}(a) + \pi_i(G, h_j^{T^P+1} : \delta) \,\Big|\, \sigma_i(B), \sigma_j(G)\right] = \lim_{T \to \infty} \frac{1}{T} \left[\sum_{t=1}^{T} u_i(D_i, C_j) - \bar{\pi} T\right] = u_i(D_i, C_j) - \bar{\pi} > \bar{v}_i \text{ from (3).} \quad (8)$$
By subtracting a proper fixed (depending only on $x$) positive number from the reward function, it is possible to attain $\bar{v}_i$ exactly for sufficiently large $\delta$ and $T$. Note that subtracting a positive number does not violate (6).
On the other hand, if player $j$ receives the message $x_i = G$, then player $j$ needs to incentivize player $i$ to take $C_i$ by the auxiliary scenario $\pi_i(x_j, \cdot : \delta)$. Suppose there exist statistics $C_{j,t}^{i,a_j} \in \{0, 1\}$ and $q_2 > q_1$ such that, for each $a_j \in A_j$, $C_{j,t}^{i,a_j} = 1$ indicates that $C_i$ is more likely to have been played:
$$E\left[C_{j,t}^{i,a_j} \mid a_i, a_j\right] = \begin{cases} q_2 & \text{if } a_i = C_i, \\ q_1 & \text{if } a_i = D_i. \end{cases} \quad (9)$$
The existence of such $C_{j,t}^{i,a_j}$, $q_2$ and $q_1$ will be proven in Lemma 3. Take $\bar{L}$ such that
$$\bar{L}(q_2 - q_1) > \max_{a \in A,\, i \in I} 2|u_i(a)|. \quad (10)$$
Since player $j$ with $x_j = G$ takes $C_j$, player $j$ rewards player $i$ based on $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$, the summation of $C_{j,t}^{i,C_j}$:
$$\pi_i(G, h_j^{T^P+1} : \delta) \equiv \bar{L} \min\left\{\sum_{t=1}^{T} C_{j,t}^{i,C_j} - (q_2 T + 2\varepsilon T),\ 0\right\} - \bar{\pi} T$$
with some small $\varepsilon > 0$. The reward is linearly increasing in $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ with slope $\bar{L}$ until $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound $q_2 T + 2\varepsilon T$. If $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ does not hit the upper bound, since playing $C_i$ instead of $D_i$ increases the expectation of $C_{j,t}^{i,C_j}$ by $(q_2 - q_1)$ from (9), the marginal gain of taking $C_i$ is $\bar{L}(q_2 - q_1)$. Since (10) implies that this marginal gain dominates the difference in the instantaneous utilities, player $i$ has the incentive to take $C_i$.
Hence, in order to show that player $i$ takes $C_i$, it suffices to show that, regardless of player $i$'s signal observations, player $i$ believes that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound with little probability: For all $h_i$,
$$\Pr\left(\sum_{t=1}^{T} C_{j,t}^{i,C_j} > q_2 T + 2\varepsilon T \,\Big|\, x_j = G, h_i\right) \le \exp(-(q_2 - q_1) T).$$
This can be shown as follows: If the monitoring is conditionally independent, regardless of player $i$'s signal observations, player $i$'s beliefs on $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ given $x_j = G, h_i$ are approximately distributed according to the normal distribution with mean
$$E\left[\sum_{t=1}^{T} C_{j,t}^{i,C_j} \mid C_i, C_j\right] = q_2 T$$
and standard deviation $O(T^{\frac{1}{2}})$ by the central limit theorem.4 Since $q_2 T + 2\varepsilon T$ is greater than the mean by $2\varepsilon T^{\frac{1}{2}}$ times $T^{\frac{1}{2}}$, the order of the standard deviation, player $i$ believes that the probability that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound is negligible.
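This concentration argument can be illustrated with a short Monte Carlo sketch. All parameter values below (q2, q1, T, epsilon, L_bar and the payoff bound) are hypothetical and only chosen to satisfy (10); under conditional independence the review statistic is modeled as a sum of i.i.d. Bernoulli draws.

```python
import random

# Hypothetical parameters (not from the paper): accuracies q2 > q1 from (9),
# review length T, and the slack epsilon in the bound q2*T + 2*eps*T.
q2, q1, T, eps = 0.6, 0.4, 1000, 0.05
threshold = q2 * T + 2 * eps * T  # upper bound at which the reward is capped

random.seed(0)

def review_score(p, T):
    """Sum of T i.i.d. Bernoulli(p) draws: the statistic sum_t C_{j,t}^{i,C_j}."""
    return sum(random.random() < p for _ in range(T))

# Under cooperation each C_{j,t}^{i,C_j} has mean q2, so the score concentrates
# around q2*T and exceeds q2*T + 2*eps*T (here 700) only with tiny probability.
n_sims = 2000
hits = sum(review_score(q2, T) > threshold for _ in range(n_sims))
print("empirical Pr(score > threshold):", hits / n_sims)

# Deviating to D_i lowers the mean score by (q2 - q1) per period; the slope
# L_bar converts this into a loss that dominates any stage-game gain whenever
# L_bar * (q2 - q1) > 2 * max_a |u_i(a)|, which is condition (10).
L_bar, max_abs_u = 50.0, 4.0  # hypothetical values satisfying (10)
assert L_bar * (q2 - q1) > 2 * max_abs_u
```

With these numbers the threshold sits roughly 6.5 standard deviations above the mean of the score, so the empirical frequency is essentially always 0, in line with the exponential bound in the text.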
Therefore, $\sigma_i(G)$ is optimal after the message $x_i = G$ and
$$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T^P}} E\left[\sum_{t=1}^{T^P} \delta^{t-1} u_{i,t}(a) + \pi_i(G, h_j^{T^P+1} : \delta) \,\Big|\, \sigma_i(G), \sigma_j(G)\right]$$
$$= \lim_{T \to \infty} \frac{1}{T} E\left[\sum_{t=1}^{T} u_{i,t}(a) + \bar{L} \min\left\{\sum_{t=1}^{T} C_{j,t}^{i,C_j} - (q_2 T + 2\varepsilon T),\ 0\right\} - \bar{\pi} T \,\Big|\, \sigma_i(G), \sigma_j(G)\right]$$
$$= u_i(C_i, C_j) - 2\varepsilon \bar{L} - \bar{\pi},$$
which is larger than $\bar{v}_i$ for sufficiently small $\varepsilon > 0$ from (3). By subtracting a proper fixed number from the reward function, it is possible to attain $\bar{v}_i$ exactly for sufficiently large $\delta$ and $T$. Again, we can keep (6).
Since both $\sigma_i(G)$ and $\sigma_i(B)$ yield the same expected value $\bar{v}_i$ and are subgame perfect once the message $x_i$ is sent, both $\sigma_i(G)$ and $\sigma_i(B)$ are optimal. Therefore, we are done.
4Here, we consider the incentive only on the equilibrium path. Since defection decreases the expected value of $C_{j,t}^{i,C_j}$, together with conditional independence, defection reduces the probability that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ hits the upper bound even more.
5.2 With Conditionally Dependent Signals
5.2.1 Problem
Now we consider the two-player prisoners' dilemma with conditionally dependent monitoring and explain the difficulty. To illustrate the problem, consider the case with $x_j = G$ and suppose we use the same $\sigma_i(G)$ and $\sigma_i(B)$ and the same reward as in the case with conditionally independent monitoring. A problem occurs when $x_i = G$ and player $j$ needs to incentivize player $i$ to take $C_i$. Suppose player $i$'s signal and player $j$'s signal are highly correlated and player $i$ can infer $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ from her own history very precisely. (i) Equation (5) requires that the time average of the expected sum of the instantaneous utilities and the reward function should be close to $u_i(C_i, C_j)$. (ii) Inequality (6) requires that the reward should be negative. Since (iii) the time average of the instantaneous utilities is close to $u_i(C_i, C_j)$ if player $i$ takes $C_i$, (i), (ii) and (iii) together require that if $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ is close to its ex ante value $q_2 T$, then the time average of the reward should be close to 0. This means that player $i$ cannot be rewarded if $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ is unusually high. If, in period $\tau$, player $i$ infers from $h_i^\tau$ that $\sum_{t=1}^{\tau} C_{j,t}^{i,C_j}$ is already unusually high (greater than $q_2 T + 2\varepsilon T$) with high probability, then player $i$ stops cooperation after period $\tau$.5
Specifically, since the reward after receiving $x_i = G$ is $\bar{L} \min\{\sum_{t=1}^{T} C_{j,t}^{i,C_j} - (q_2 T + 2\varepsilon T), 0\} - \bar{\pi} T$, if player $i$ believes that $\sum_{t=1}^{T} C_{j,t}^{i,C_j}$ has hit the upper bound $q_2 T + 2\varepsilon T$ with high probability, then player $i$ wants to stop cooperation. The problem is that the reward stops increasing when $\sum_{t=1}^{\tau} C_{j,t}^{i,C_j} > q_2 T + 2\varepsilon T$.
5.2.2 Modification
Structure of the Phase To deal with the problem above, we consider the following modification. First, for a moment,6 let us see the $T^P$-period finitely repeated game as $L$ repetitions of $T$-period review rounds with
$$L\bar{\pi} > \bar{L}. \quad (11)$$
Now, $T^P = LT$.
5This is also mentioned by Fong, Gossner, Hörner and Sannikov (2010).
6We will modify the structure further to deal with problems arising later. See Section 6 for the final structure.
[Insert Figure 1].
For notational convenience, let $T(l)$ be the set of periods in the $l$th review round. When player $j$ monitors player $i$, player $j$ randomly drops one period $t_j(l)$ from $T(l)$. That is, $\Pr(t_j(l) = t) = \frac{1}{T}$ for all $t \in T(l)$. Let $T_j(l) \equiv T(l) \setminus \{t_j(l)\}$ be the remaining periods in the $l$th review round. Player $j$ monitors player $i$ during $T(l)$ by
$$X_j(l) \equiv \begin{cases} \sum_{t \in T_j(l)} C_{j,t}^{i,C_j} + \mathbf{1}_{t_j(l)} & \text{if } x_j = G, \\ \sum_{t \in T_j(l)} C_{j,t}^{i,D_j} + \mathbf{1}_{t_j(l)} & \text{if } x_j = B. \end{cases} \quad (12)$$
Here, $\mathbf{1}_{t_j(l)} \in \{0, 1\}$ is a random variable with $\Pr(\mathbf{1}_{t_j(l)} = 1) = q_2$ conditional on $t_j(l)$. Hence, instead of monitoring by $\sum_{t \in T(l)} C_{j,t}^{i,C_j}$, player $j$ randomly picks $t_j(l)$ and replaces $C_{j,t_j(l)}^{i,C_j}$ with the random variable $\mathbf{1}_{t_j(l)}$, which is independent of the players' actions. The reason will be explained in Section 15.7.
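The construction of the review score (12) can be sketched as follows; the per-period scores and the parameters are hypothetical, with each score equal to 1 with probability q2 under cooperation, as in (9).

```python
import random

random.seed(1)

# Hypothetical per-period scores C_{j,t}^{i,C_j} for one review round.
q2, T = 0.6, 10
scores = [random.random() < q2 for _ in range(T)]

def X_j(scores, q2, rng):
    """Review score (12): drop one uniformly random period t_j(l) and replace
    its score with an action-independent Bernoulli(q2) draw 1_{t_j(l)}."""
    t_drop = rng.randrange(len(scores))        # Pr(t_j(l) = t) = 1/T
    kept = sum(s for t, s in enumerate(scores) if t != t_drop)
    dummy = rng.random() < q2                  # the draw 1_{t_j(l)}
    return kept + dummy

print(X_j(scores, q2, random))
```

Since the dropped period is replaced by a draw with the same mean q2, the on-path distribution of $X_j(l)$ differs from the plain sum in only one period, which is why the substitution does not affect incentives or values in the large-$T$ limit.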
Reward Function by Player j Second, we heuristically define the reward function given by player $j$.7 There are the following cases:
$x_i = B$: When player $j$ receives the message $x_i = B$, then player $i$ takes a dominant action and no incentive by the auxiliary scenario is necessary. Hence, it is straightforward to construct a constant reward as in Section 5.1, achieving (4), (5) and (6). In this case, we say $\lambda_j(l) = G$ for all $l = 1, \dots, L$. See the case with $x_i = G$ for the meaning of $\lambda_j(l)$.
$x_i = G$: When player $j$ receives the message $x_i = G$, then we need to deal with the problem mentioned in Section 5.2.1. See each of the $L$ review rounds as independent and
7See Section 8.3 for the formal definition of the reward function.
consider the following reward function for each $l$th round:
$$\pi_i(G, h_j^{T^P+1}, l) = \bar{L}\{X_j(l) - (q_2 T + 2\varepsilon T)\} - \bar{\pi} T \quad (13)$$
$$\pi_i(B, h_j^{T^P+1}, l) = \bar{L}\{X_j(l) - (q_2 T - 2\varepsilon T)\} + \bar{\pi} T \quad (14)$$
If we make
$$\pi_i(x_j, h_j^{T^P+1} : \delta) = \sum_{l=1}^{L} \pi_i(x_j, h_j^{T^P+1}, l),$$
then, since the reward function is always increasing in $X_j(l)$ with slope $\bar{L}$, it is always optimal for player $i$ to take $C_i$.
Since $\pi_i(x_j, h_j^{T^P+1} : \delta) = \sum_{l=1}^{L} \pi_i(x_j, h_j^{T^P+1}, l)$ does not satisfy (6), we need further modification. Observe the following: For $x_j = G$, if $\pi_i > -\bar{\pi} T$ (that is, $X_j(l) > q_2 T + 2\varepsilon T$) occurs for at most one review round, then (6) is still satisfied. To see why, note that the maximum reward in one review round is attained at $X_j(l) = T$ and $\pi_i(G, h_j^{T^P+1}, l) < \bar{L} T - \bar{\pi} T$. On the other hand, if $X_j(l) \le q_2 T + 2\varepsilon T$, then $\pi_i(G, h_j^{T^P+1}, l) \le -\bar{\pi} T$. Since we take $\bar{L} < L\bar{\pi}$ in (11), the total reward in the finitely repeated game is still negative if $X_j(l) > q_2 T + 2\varepsilon T$ occurs only for one review round. Symmetrically, for $x_j = B$, if $\pi_i < \bar{\pi} T$ (that is, $X_j(l) < q_2 T - 2\varepsilon T$) occurs for at most one review round, then (6) is still satisfied.
Therefore, once
$$X_j(l) \notin [q_2 T - 2\varepsilon T,\ q_2 T + 2\varepsilon T]$$
happens in the $l$th review round, we make player $j$'s reward function for the following review rounds a negative constant for $x_j = G$ and a positive constant for $x_j = B$. Specifically, for the following review rounds $\tilde{l}$ with $\tilde{l} \in \{l + 1, \dots, L\}$, suppose we modify the reward function as
$$\pi_i(G, h_j^{T^P+1}, \tilde{l}) = -\bar{\pi} T \quad (15)$$
$$\pi_i(B, h_j^{T^P+1}, \tilde{l}) = \bar{\pi} T. \quad (16)$$
Importantly, (6) is recovered since $\pi_i(G, h_j^{T^P+1}, l) > -\bar{\pi} T$ and $\pi_i(B, h_j^{T^P+1}, l) < \bar{\pi} T$ happen for at most one review round by definition.
Let $\lambda_j(l) \in \{G, B\}$ denote which reward function player $j$ is using in the $l$th review round:
- In the first review round, $\lambda_j(1) = G$ and the reward is (13) or (14).
- In the $(l+1)$th review round with $l \ge 1$,
  - If $\lambda_j(l) = B$, then $\lambda_j(l+1) = B$ and the reward is (15) or (16).
  - If $\lambda_j(l) = G$, then
    - If $X_j(l) \in [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$, then $\lambda_j(l+1) = G$ and the reward is (13) or (14).
    - If $X_j(l) \notin [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$, then $\lambda_j(l+1) = B$ and the reward is (15) or (16).
See Figure 2 for the automaton representation. Hence, $\lambda_j(l) = G$ implies that the reward in the $l$th review round is (13) or (14), while $\lambda_j(l) = B$ implies that the reward in the $l$th review round is (15) or (16).
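The transition rule above can be sketched as a two-state automaton; the parameter values are hypothetical.

```python
# Player j's reward-state automaton (Figure 2): lambda_j(1) = "G", and the
# state absorbs at "B" after the first erroneous review score.
def next_state(state, X, q2, T, eps):
    if state == "B":
        return "B"  # (15)/(16): constant reward for all remaining rounds
    lo, hi = q2 * T - 2 * eps * T, q2 * T + 2 * eps * T
    return "G" if lo <= X <= hi else "B"  # keep (13)/(14) iff X_j(l) is typical

# Example run: with q2*T = 60 and bounds [50, 70], the erroneous score 95 in
# round 3 switches the state to "B" permanently.
q2, T, eps = 0.6, 100, 0.05
states = ["G"]
for X in [61, 58, 95, 60]:
    states.append(next_state(states[-1], X, q2, T, eps))
print(states)  # ['G', 'G', 'G', 'B', 'B']
```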
Note that, ex ante, if the players play $a_i(x_i), a_j(x_j)$, then $X_j(l)$ is approximately distributed according to the normal distribution with expectation $q_2 T$ and standard deviation $O(T^{\frac{1}{2}})$. Since $X_j(l) \notin [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$ implies that $X_j(l)$ is away from its ex ante value by $2\varepsilon T^{\frac{1}{2}}$ times the order of the standard deviation, $\lambda_j(l+1) = B$ happens only with small probability, less than $\exp(-(q_2 - q_1) T)$. Hence, we call the observation "erroneous" if $X_j(l) \notin [q_2 T - 2\varepsilon T, q_2 T + 2\varepsilon T]$.
[Insert Figure 2].
Optimal Action of Player i Third, given player $j$'s reward function, we derive the optimal action of player $i$. Since the shape of the reward function changes as $\lambda_j(l)$ changes, player $i$ needs to infer $\lambda_j(l)$. Let $\hat{\lambda}_j(l) \in \{G, B\}$ be player $i$'s inference of $\lambda_j(l)$. Forgetting the question of how player $i$ infers $\lambda_j(l)$ for a while, assume $\hat{\lambda}_j(l)$ is always correct: $\hat{\lambda}_j(l) = \lambda_j(l)$ for all $l$.
After sending $x_i = B$ Since $\lambda_j(l) = G$ for any history of player $j$, $\hat{\lambda}_j(l) = G$. Since the reward function is constant, it is optimal to take $a_i(x_i) = D_i$. Therefore, $\sigma_i(B)$ such that player $i$ always takes $D_i$ is optimal.
After sending $x_i = G$ If $\hat{\lambda}_j(l) = B$, then since the reward is constant for the rest of the finitely repeated game, it is optimal to take $D_i$. We verify the incentive to cooperate when $\hat{\lambda}_j(l) = G$ by backward induction.
For the last review round $l = L$,
- If $\lambda_j(L) = G$, then player $i$ wants to take $C_i$ since the reward is linearly increasing in $X_j(L)$ with slope $\bar{L}$ and there is no effect on $\lambda_j(L+1)$. Player $i$'s average continuation payoff at the beginning of the $L$th review round is
  - If $x_j = G$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} E\left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(C_i, C_j) + \delta^{T}\left(\bar{L}\{X_j(L) - (q_2 T + 2\varepsilon T)\} - \bar{\pi} T\right) \,\Big|\, C_i, C_j\right] = u_i(C_i, C_j) - 2\varepsilon \bar{L} - \bar{\pi} \ge \bar{v}_i \quad (17)$$
  with $t_L$ being the first period of the $L$th review round.
  - If $x_j = B$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} E\left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(C_i, D_j) + \delta^{T}\left(\bar{L}\{X_j(L) - (q_2 T - 2\varepsilon T)\} + \bar{\pi} T\right) \,\Big|\, C_i, D_j\right] = u_i(C_i, D_j) + 2\varepsilon \bar{L} + \bar{\pi} \le \underline{v}_i. \quad (18)$$
  Note that replacing $C_{j,t_j(l)}^{i,C_j}$ with $\mathbf{1}_{t_j(l)}$ in (12) does not change the incentive and value in the limit as $T$ goes to infinity.
- If $\lambda_j(L) = B$, then player $i$ wants to take $D_i$ since the reward is constant. Player $i$'s average continuation payoff at the beginning of the $L$th review round is
  - If $x_j = G$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} \left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(D_i, C_j) - \delta^{T} \bar{\pi} T\right] = u_i(D_i, C_j) - \bar{\pi} \ge \bar{v}_i. \quad (19)$$
  - If $x_j = B$, then
  $$\lim_{T \to \infty} \lim_{\delta \to 1} \frac{1 - \delta}{1 - \delta^{T}} \left[\sum_{t \in T(L)} \delta^{t - t_L} u_i(D_i, D_j) + \delta^{T} \bar{\pi} T\right] = u_i(D_i, D_j) + \bar{\pi} \le \underline{v}_i. \quad (20)$$
We can subtract proper positive numbers depending only on $x$ and $\lambda_j(L)$ from (17) and (19) such that the continuation payoff is exactly $\bar{v}_i$ if $x_j = G$. Similarly, we can add proper positive numbers depending only on $x$ and $\lambda_j(L)$ to (18) and (20) such that the continuation payoff is exactly $\underline{v}_i$ if $x_j = B$. Note that we can keep (6).
Therefore, $\sigma_i(G)$ such that player $i$ plays $C_i$ when $\hat{\lambda}_j(L) = G$ and plays $D_i$ when $\hat{\lambda}_j(L) = B$ is optimal, and player $i$'s continuation payoff is independent of $\lambda_j(L)$.
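Before the induction step, the chains of inequalities just derived can be checked numerically for one concrete parameterization; every number below is illustrative, not from the paper.

```python
# Hypothetical prisoners' dilemma payoffs and constants pi_bar, eps, L_bar
# chosen so that every chain of inequalities below holds.
u = {("D", "C"): 3.0, ("C", "C"): 2.0, ("D", "D"): 0.0, ("C", "D"): -1.0}
pi_bar = 0.5                    # the constant in (3)
v_lo, v, v_hi = 0.7, 1.0, 1.3   # targets: underline-v_i < v_i < bar-v_i
eps, L_bar = 0.001, 50.0        # review slack and reward slope

# (3): u(D,D) + pi_bar < v_lo < v < v_hi < u(C,C) - pi_bar
assert u[("D", "D")] + pi_bar < v_lo < v < v_hi < u[("C", "C")] - pi_bar
# (17): cooperating while rewarded exceeds the high target bar-v_i
assert u[("C", "C")] - 2 * eps * L_bar - pi_bar >= v_hi
# (18): cooperating against a punisher stays below the low target
assert u[("C", "D")] + 2 * eps * L_bar + pi_bar <= v_lo
# (19)/(20): defecting under a constant reward also brackets the targets
assert u[("D", "C")] - pi_bar >= v_hi
assert u[("D", "D")] + pi_bar <= v_lo
print("payoff chains (3), (17)-(20) hold for these parameters")
```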
For the second-to-last review round $l = L - 1$, since the continuation payoff from the $L$th review round is constant in $\lambda_j(L)$, player $i$ does not need to consider the effect of her strategy in the $(L-1)$th review round on $\lambda_j(L)$. Therefore, the same proof shows that $\sigma_i(G)$ such that player $i$ plays $C_i$ when $\hat{\lambda}_j(L-1) = G$ and plays $D_i$ when $\hat{\lambda}_j(L-1) = B$ is optimal. Further, we can make sure that player $i$'s continuation payoff at the beginning of the $(L-1)$th review round is $\bar{v}_i$ or $\underline{v}_i$ depending on $x_j$ but independent of $\lambda_j(L-1)$.
Recursively, we can show that $\sigma_i(G)$ such that, for all $l$, player $i$ plays $C_i$ when $\hat{\lambda}_j(l) = G$ and plays $D_i$ when $\hat{\lambda}_j(l) = B$ is optimal and gives player $i$ $\bar{v}_i$ or $\underline{v}_i$ depending on $x_j$.
In summary, player $i$'s action in the $l$th review round is defined as in Figure 3:
[Insert Figure 3]
Reward Function of Player j Revisited Fourth, we modify player $j$'s reward function further to incorporate the fact that player $j$ with $\hat{\lambda}_i(l) = B$ (that is, player $j$ infers that player $i$ has observed erroneous histories) takes $D_j \ne a_j(x_j)$. As a preparation, we construct special reward functions $\theta_i^G(a_j, y_j)$ and $\theta_i^B(a_j, y_j)$ that make player $i$ indifferent among all action profiles.
Lemma 2 Generically, the following statement is true: For any $\bar{\pi} > 0$, there exists $\bar{u} > \bar{\pi}$ such that, for each $i \in I$, there exist $\theta_i^G : A_j \times Y_j \to [-\bar{u}, -\bar{\pi}]$ and $\theta_i^B : A_j \times Y_j \to [\bar{\pi}, \bar{u}]$ such that
$$u_i(a) + E\left[\theta_i^G(a_j, y_j) \mid a\right] = \text{constant} \quad \text{for all } a \in A,$$
$$u_i(a) + E\left[\theta_i^B(a_j, y_j) \mid a\right] = \text{constant} \quad \text{for all } a \in A.$$
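The existence claim can be illustrated as a linear-algebra sketch (hypothetical, randomly generated monitoring structure; the sign and boundedness restrictions of the lemma are ignored, only the indifference system is solved): the conditions "u_i(a) + E[theta | a] constant in a" are |A| linear equations in the |A_j| * |Y_j| unknowns theta(a_j, y_j) plus the constant, which is generically solvable.

```python
import numpy as np

# Hypothetical two-player setting: A_i = {C, D}, four signals for player j,
# and a randomly drawn full-support monitoring distribution q(y_j | a).
A = ["C", "D"]
Y = range(4)
u_i = {("C", "C"): 2.0, ("C", "D"): -1.0,
       ("D", "C"): 3.0, ("D", "D"): 0.0}     # player i's expected payoffs
rng = np.random.default_rng(0)
q = {a: rng.dirichlet(np.ones(len(Y))) for a in u_i}  # q(. | a) per profile

# Unknowns: theta(a_j, y) for a_j in A, y in Y, plus the constant c.
# One equation per profile a: sum_y q(y|a) * theta(a_j, y) - c = -u_i(a).
rows, rhs = [], []
for (ai, aj) in u_i:
    row = np.zeros(len(A) * len(Y) + 1)
    for y in Y:
        row[A.index(aj) * len(Y) + y] = q[(ai, aj)][y]
    row[-1] = -1.0                            # coefficient on c
    rows.append(row)
    rhs.append(-u_i[(ai, aj)])
sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
theta, c = sol[:-1], sol[-1]

# Indifference check: u_i(a) + E[theta | a] equals the same constant c for all a.
vals = [u_i[a] + q[a] @ theta[A.index(a[1]) * len(Y):(A.index(a[1]) + 1) * len(Y)]
        for a in u_i]
print(np.allclose(vals, c))
```

Because the system has more unknowns than equations (here 9 versus 4) and generic coefficients, an exact solution exists; Lemma 2 additionally pins the transfers into the stated intervals.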
When $\hat{\lambda}_i(l) = B$, player $j$ uses the reward $\theta_i^{x_j}(a_{j,t}, y_{j,t})$ for each $t$ in the $\tilde{l}$th review round with $\tilde{l} \ge l$, so that player $i$ can always behave as if $\hat{\lambda}_i(l) = G$. The modified reward function is therefore as explained in the following figure:
[Insert Figure 4]
The incentive compatibility constraint that player $i$ does not try to manipulate $\lambda_j(l)$ will be proven in Proposition 1.
5.2.3 Summary from the Perspective of Infinitely Repeated Games
Here, we offer a summary of our equilibrium construction, using the language of infinitely repeated games. As the proof of Lemma 1 makes clear, we can see the finitely repeated game as the first T_P periods of the infinitely repeated game; a positive (negative, respectively) reward implies that the continuation payoff from period T_P + 1 is higher (lower, respectively) than the value in the initial period.
Suppose x_j = G. Then the value for player i in the initial period needs to be close to u_i(C, C) to attain efficiency. Since the value in the initial period is very high, player j with x_j = G cannot reward player i for a high realization of the auxiliary scenario (π_i > −ε̄T) by moving to a higher continuation payoff from period T_P + 1. Instead, player j "allows" player i to defect and "rewards" player i through higher instantaneous utilities. While doing so, player j's incentive is provided by the changes in her own continuation payoff from period T_P + 1. That is, the changes in the continuation payoff for player j while player i with λ̂_j(l) = B defects correspond to the realization of the auxiliary scenario in Lemma 2 (with the roles of players i and j reversed).
5.2.4 Coordination Problem
We are left to show how player i infers λ_j(l). The rest of the paper is mainly devoted to the coordination between λ_j(l) ∈ {G, B} and the inference λ̂_j(l) ∈ {G, B}. In Section 6, we further modify the structure of the finitely repeated game by introducing supplemental rounds in which player j sends a message about λ_j(l + 1) through her actions after the lth review round. In Section 8, we formally define the equilibrium strategy (actions and rewards) except for how player i infers player j's message in the supplemental rounds. Then, in Section 9, we specify player i's inference of player j's message in the supplemental rounds. This fully pins down the strategy (actions and rewards).
While defining the equilibrium, we introduce variables subject to various restrictions. In Section 10, we verify that these variables can be chosen consistently.
Then, in Sections 11 and 12, we verify that the equilibrium strategy (actions and rewards) satisfies (4), (5) and (6).
6 Structure of the Phase
In this section, we explain the structure of the T_P-period finitely repeated game, which is summarized in Figure 5 below. T_P depends on L and T. L ∈ N will be pinned down in Section 10. T ∈ N is a parameter of the equilibrium.
At the beginning of the finitely repeated game, we insert the "coordination block," in which each player i with σ_i(x_i) sends x_i by cheap talk simultaneously. At the end of the finitely repeated game, we insert the "report block," in which the players report their whole histories of the finitely repeated game by cheap talk. The details of the report block will be explained in Section 12. These blocks are instantaneous with cheap talk but will take multiple periods when we dispense with cheap talk in Supplemental Materials 3 and 4.
Between the coordination and report blocks, the players play the T_P-period finitely repeated game. We divide the T_P periods into L "main blocks." Each of the first (L − 1) blocks is further divided into the following five rounds: for l ∈ {1, ..., L − 1}, the lth block consists of the T-period review round, the T^{1/2}-period supplemental round 1 for λ_1(l + 1), the T^{1/2}-period supplemental round 2 for λ_1(l + 1), the T^{1/2}-period supplemental round 1 for λ_2(l + 1) and the T^{1/2}-period supplemental round 2 for λ_2(l + 1).[8] The last (Lth) block consists only of the T-period review round. In the lth block, the sets of periods in the review round, the supplemental round 1 for λ_i(l + 1) and the supplemental round 2 for λ_i(l + 1) are denoted by T(l), T(l, λ_i, 1) and T(l, λ_i, 2), respectively. For sufficiently large T, the length of the review round is much larger than that of the other four rounds, and the payoffs from the review rounds approximately determine the equilibrium payoff. See Figure 5 for the illustration.
[Insert Figure 5]
We show that, for sufficiently large T and sufficiently large δ, with T_P = (L − 1){T + 4T^{1/2}} + T, there exist σ_i(x_i) and π_i(x_j, · : δ) satisfying (4), (5) and (6).
7 Almost Optimality
Instead of proving (4), we only establish "almost optimality up to exp(−T^{1/4}) > 0" until Section 12: for all i ∈ I and x ∈ {G, B}², for any period τ and history h_i^τ, the loss from playing the continuation strategy σ_i(x_i) | h_i^τ is smaller than exp(−T^{1/4}):

max_{σ_i^{T_P} ∈ Σ_i^{T_P}} E[ Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(x_j, h_j^{T_P+1} : δ) | h_i^τ, σ_i^{T_P}, σ_j(x_j) ]
 − E[ Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(x_j, h_j^{T_P+1} : δ) | h_i^τ, σ(x) ] ≤ exp(−T^{1/4}).   (21)

Footnote 8: Throughout the paper, we neglect the integer problem, since it is handled by replacing each variable s that should be an integer with ⌊s⌋ = max{n ∈ N : n ≤ s}.
To see why this is sufficient, remember that we insert the report block, in which the players report the whole history h_i at the end of the finitely repeated game (see Figure 5). Based on the reported history ĥ_i, player j adjusts the reward function π_i(x_j, · : δ) so that the prescribed action is exactly optimal. This adjustment is very small for large T, since the original strategy was optimal up to a loss of exp(−T^{1/4}) by (21).
The remaining task is to show the incentive to tell the truth about h_i. Intuitively, for some norm, player j punishes player i in proportion to ‖h_j − E[h_j | ĥ_i]‖². For a properly chosen norm, the first-order condition for minimizing the expected punishment E[‖h_j − E[h_j | ĥ_i]‖² | h_i] is to tell the truth: ĥ_i = h_i. Since the adjustment is small, this punishment can also be small and does not affect the equilibrium payoff. Section 12 formalizes the argument.
Therefore, until Section 12, our objective is to construct σ_i(x_i) and π_i(x_j, · : δ) satisfying (21), (5) and (6).
8 Equilibrium Strategies
In this section, we partially define σ_i(x_i) and π_i(x_j, · : δ). In Section 8.1, we define the equilibrium action given λ̂_j(l). Then, in Section 8.2, we define λ̂_j(l) given player i's inference of player j's messages in the supplemental rounds. The definition of player i's inference of player j's messages in the supplemental rounds is deferred to Section 9. In addition, Section 8.3 defines the reward function π_i(x_j, · : δ).
8.1 Actions Given λ̂_j(l)
In this subsection, we explain the strategy σ_i(x_i) given λ̂_j(l) ∈ {G, B}. The transition of λ̂_j(l) will be explained in the next subsection.
In each lth review round, player i with σ_i(x_i) takes a_i(x_i), with

a_i(x_i) ≡ { C_i if x_i = G; D_i if x_i = B },   (22)

if λ̂_j(l) = G, and takes D_i if λ̂_j(l) = B. See Figure 3 for the equilibrium action.
As player j calculates X_j(l), defined in (12), to monitor player i after x_i = G, player i calculates

X_i(l) ≡ { Σ_{t∈T_i(l)} ψ_{i,t}^{(C_j,C_i)} + 1_{t_i(l)} if x_i = G; Σ_{t∈T_i(l)} ψ_{i,t}^{(C_j,D_i)} + 1_{t_i(l)} if x_i = B }   (23)

if x_j = G. Then, λ_i(l + 1) = B is the record of an erroneous history:

X_i(l) ∉ [q_2T − 2εT, q_2T + 2εT].   (24)

That is,

λ_i(l + 1) = { G if l = 0 or X_i(l̃) ∈ [q_2T − 2εT, q_2T + 2εT] for all l̃ ≤ l; B otherwise }   (25)

if the cheap talk message is x_j = G, and λ_i(l + 1) = G if x_j = B. Review Figure 2 for the graphical explanation (note that the roles of i and j are reversed).
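To fix ideas, the test in (24)–(25) is just a running band check on the per-round score sums. The sketch below is a toy illustration under our own assumptions (the function names, parameter values and the i.i.d. Bernoulli simulation are ours, not part of the model):

```python
import random

def lambda_flag(score_sums, T, q2, eps):
    # (25), sketched: the flag turns B as soon as any review round's
    # score sum X_i(l) leaves the band [q2*T - 2*eps*T, q2*T + 2*eps*T].
    lo, hi = q2 * T - 2 * eps * T, q2 * T + 2 * eps * T
    for X in score_sums:
        if not (lo <= X <= hi):
            return "B"
    return "G"

# Toy run: T = 100 periods per review round; on path the binary scores
# are roughly i.i.d. Bernoulli(q2), so X_i(l) concentrates near q2*T.
random.seed(0)
T, q2, eps = 100, 0.6, 0.05
X = sum(1 for _ in range(T) if random.random() < q2)
print(lambda_flag([X], T, q2, eps))
```

With these toy numbers the band is [50, 70], so an on-path score sum keeps the flag at G except with the small probability that the central limit theorem controls.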
Then, in the supplemental rounds 1 and 2 for λ_i(l + 1), player i sends λ_i(l + 1). How to send the message will be defined in Section 9.2. Analogously, in the supplemental rounds 1 and 2 for λ_j(l + 1), player j sends λ_j(l + 1), which is defined symmetrically. While player j sends the message, player i takes C_i. Let λ_j(l + 1)(i) be player i's inference of the message λ_j(l + 1), which will be defined in Section 9.2.
8.2 Inference of λ_j(l + 1)
In this subsection, we explain the transition of player i's working inference of λ_j(l + 1), denoted λ̂_j(l + 1) ∈ {G, B}. Here, we take player i's inference of player j's messages in the supplemental rounds for λ_j(l + 1) (denoted by λ_j(l + 1)(i) ∈ {G, B}) as given. See Section 9 for the explanation of λ_j(l + 1)(i).
8.2.1 Statistics
Since λ_j(l + 1) depends on X_j(l) = Σ_{t∈T_j(l)} ψ_{j,t}^{(a_i(x_i),a_j(x_j))} + 1_{t_j(l)}, as we have mentioned in (23), (24) and (25), we first formally define ψ_{j,t}^a. Since ψ_{j,t}^a is i.i.d. within each review round, we omit the time subscript when it is not confusing.
We want ψ_j^a to satisfy the following two conditions. First, player j needs to statistically monitor whether player i takes a_i or not. In particular, as we have mentioned in (9), we want to establish that

E[ψ_j^a | ã_i, a_j] = { q_2 for ã_i = a_i; q_1 for all ã_i ≠ a_i }

with q_2 > q_1.
Second, as we will see in Section 8.2.2, player i's continuation play depends on E[ψ_j^a | a, y_i]. We want to prevent player j from deviating from a_j to ã_j ≠ a_j in order to manipulate E[ψ_j^a | a, y_i]. It is sufficient that

E_{y_i}[ E[ψ_j^a | a, y_i] | a_i, ã_j ]

is constant with respect to ã_j. That is, player i calculates the conditional expectation of ψ_j^a believing that a_j is taken; the ex ante value of this conditional expectation is then constant even if player j secretly deviates to ã_j ≠ a_j.
In summary, we want to construct ψ_j^a satisfying the following conditions:

Condition 1. We want to have:
1. ψ_j^a monitors a_i:

E[ψ_j^a | ã_i, a_j] = { q_2 for ã_i = a_i; q_1 for all ã_i ≠ a_i }.

2. Player j cannot manipulate the ex ante value of E[ψ_j^a | a, y_i]: for all ã_j ∈ A_j,

E_{y_i}[ E[ψ_j^a | a, y_i] | a_i, ã_j ] = q_2.
We construct ψ_j^a in the following two steps. First, we define a function f_j^a : Y_j → (0, 1). Second, player j constructs a random variable ψ_j^a ∈ {0, 1} from f_j^a(y_j) as follows: after taking a_j and observing y_j, player j calculates f_j^a(y_j). After that, player j draws a random variable from the uniform distribution on [0, 1]. If the realization of this random variable is less than f_j^a(y_j), then ψ_j^a = 1; otherwise, ψ_j^a = 0.
Since Pr({ψ_j^a = 1} | ã, y) = f_j^a(y_j) for all ã ∈ A and y ∈ Y, Condition 1 is satisfied for ψ_j^a if and only if f_j^a(y_j) satisfies

1. E[f_j^a(y_j) | ã_i, a_j] ≡ Σ_{y_j} q(y_j | ã_i, a_j) f_j^a(y_j) = { q_2 if ã_i = a_i; q_1 if ã_i ≠ a_i };   (26)

2. E_{y_i}[ E_{y_j}[f_j^a(y_j) | a, y_i] | a_i, ã_j ] is constant with respect to ã_j; that is, for all ã_j ∈ A_j,

Σ_{y_i} { Σ_{y_j} f_j^a(y_j) q(y_j | a, y_i) } q(y_i | a_i, ã_j) = q_2.   (27)
Formally, we show the following lemma:

Lemma 3. Generically, the following statement is true: there exist q_2 > q_1 such that, for each i ∈ I and a ∈ A, there exists a function f_j^a : Y_j → (0, 1) such that (26) and (27) are satisfied.
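Concretely, the two-step construction is Bernoulli thinning: the score equals 1 with probability exactly equal to the value of the conditional probability function at the observed signal, so its conditional mean is the left-hand side of (26). A minimal sketch with a hypothetical three-signal distribution (f, q_on_path and q_deviate are our own toy numbers, not objects from the paper):

```python
import random

def draw_score(f, y, rng=random.random):
    # Step 2: after observing signal y, draw u ~ U[0, 1] and set
    # psi = 1 iff u < f(y), so that Pr(psi = 1 | y) = f(y).
    return 1 if rng() < f(y) else 0

def expected_score(f, q):
    # E[psi | a] = sum_y q(y | a) * f(y): the left-hand side of (26).
    return sum(q[y] * f(y) for y in q)

f = {"y1": 0.9, "y2": 0.5, "y3": 0.1}.get
q_on_path = {"y1": 0.5, "y2": 0.3, "y3": 0.2}   # hypothetical q(. | a_i, a_j)
q_deviate = {"y1": 0.2, "y2": 0.3, "y3": 0.5}   # hypothetical q(. | deviation)
print(round(expected_score(f, q_on_path), 2),   # plays the role of q2
      round(expected_score(f, q_deviate), 2))   # plays the role of q1 < q2
```

In this toy example the on-path mean exceeds the deviation mean, which is exactly the monitoring property that part 1 of Condition 1 demands; Lemma 3 asserts that such an f generically exists while also satisfying the non-manipulability condition (27).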
8.2.2 Inference
Now we are ready to explain the transition of λ̂_j(l + 1) ∈ {G, B}. If x_i = B, then λ_j(l + 1) = G is common knowledge, and so we define λ̂_j(l + 1) = G.
We are left to consider the case with x_i = G. Since λ_j(1) = G is common knowledge, we define λ̂_j(1) = G. Further, since λ_j(l + 1) = B once λ_j(l̃) = B has happened for some l̃ ≤ l by (25), we define λ̂_j(l + 1) = B once λ̂_j(l̃) = B has happened for some l̃ ≤ l. Hence, we are left to specify, conditional on x_i = G and λ̂_j(l̃) = G for all l̃ ≤ l, how λ̂_j(l + 1) ∈ {G, B} is determined.
Suppose λ̂_j(l) = G is a correct inference: λ_j(l) = G. Then λ_j(l + 1) is determined as

λ_j(l + 1) = { G if X_j(l) ∈ [q_2T − 2εT, q_2T + 2εT]; B if X_j(l) ∉ [q_2T − 2εT, q_2T + 2εT] }   (28)

with X_j(l) = Σ_{t∈T_j(l)} ψ_{j,t}^{(a_i(x_i),a_j(x_j))} + 1_{t_j(l)}. Therefore, it is natural to consider the conditional expectation of X_j(l),

E[X_j(l) | {a_t, y_{i,t}}_{t∈T(l)}].

Instead of using this, we consider

Σ_{t∈T_i(l)} E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] + q_2

with a(x) = (a_i(x_i), a_j(x_j)). Two reminders are in order. First, player i calculates the expectation of the summation of ψ_{j,t}^{a(x)} over T_i(l) (the set of periods that player i uses to monitor player j), not over T_j(l) (the set of periods that player j uses to monitor player i).[9] As we will see, since T_i(l) and T_j(l) differ in at most two periods, this difference is negligible for almost optimality (21). Second, player i conditions on a(x) being taken. Player i with λ̂_j(l) = G takes a_i(x_i) = C_i in the lth review round. In addition, as we have explained in Section 5.2.2, player j makes player i indifferent between any action profiles for the rest of the finitely repeated game if λ̂_i(l) = B. Therefore, to calculate the conditional expectation, player i always conditions on λ̂_i(l) = G and on player j taking a_j(x_j) in the lth review round.

Footnote 9: The term q_2 reflects the fact that the expected value of 1_{t_j(l)} is q_2.
Further, instead of using E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] directly, player i constructs (E_i ψ_j^{a(x)})_t ∈ {0, 1} as follows: after taking a_i(x_i) and observing y_{i,t}, player i calculates E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}]. After that, player i draws a random variable from the uniform distribution on [0, 1]. If the realization of this random variable is less than E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}], then (E_i ψ_j^{a(x)})_t = 1; otherwise, (E_i ψ_j^{a(x)})_t = 0. Let

E_i X_j(l) = Σ_{t∈T_i(l)} (E_i ψ_j^{a(x)})_t + q_2.

Since Pr({(E_i ψ_j^{a(x)})_t = 1} | a_t, y_t) = E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] for all a_t and y_t, conditional on {a_t, y_t}_{t∈T(l)}, the central limit theorem implies that, except with probability of order exp(−T),

| Σ_{t∈T_i(l)} E[ψ_{j,t}^{a(x)} | a(x), y_{i,t}] + q_2 − E_i X_j(l) | ≤ (1/4)εT.   (29)
There are the following cases:

1. (29) is not satisfied. Let τ_i(l) = B denote this event. Player i will use the reward θ_j^{x_i}(a_{i,t}, y_{i,t}) defined in Lemma 2 in the subsequent rounds, and so player j is indifferent between any action profiles. Hence, this case is excluded from player j's consideration. Player i's inference of λ_j(l + 1) will be equal to her inference of the messages about λ_j(l + 1) in the supplemental rounds 1 and 2 for λ_j(l + 1): λ̂_j(l + 1) = λ_j(l + 1)(i).

2. (29) is satisfied. Let τ_i(l) = G denote this event. Consider the following subcases:

(a) We have

E_i X_j(l) ∉ [q_2T − (1/2)εT, q_2T + (1/2)εT].

Let ρ_i(l) = B denote this event. Player i will use the reward θ_j^{x_i}(a_{i,t}, y_{i,t}) in the subsequent rounds. However, compared with Case 1, player j does not exclude this case from her consideration. Again, player i uses her inference of the messages in the supplemental rounds: λ̂_j(l + 1) = λ_j(l + 1)(i).

(b) We have

E_i X_j(l) ∈ [q_2T − (1/2)εT, q_2T + (1/2)εT].   (30)

Player i randomly picks one of the following two procedures:

i. With small probability p > 0, player i will use the reward θ_j^{x_i}(a_{i,t}, y_{i,t}) in the subsequent rounds. Again, player i uses her inference of the messages in the supplemental rounds: λ̂_j(l + 1) = λ_j(l + 1)(i). Let ρ_i(l) = B denote this event (hence, ρ_i(l) = B implies that either 2-(a) or 2-(b)-i happens).

ii. With large probability 1 − p, player i believes λ_j(l + 1) = G regardless of the history in the supplemental rounds 1 and 2 for λ_j(l + 1). Let ρ_i(l) = G denote this event.

Then, we define ρ_i(l) ∈ {G, B} after τ_i(l) = B in the same way as after τ_i(l) = G: if (30) is not satisfied, then ρ_i(l) = B; if (30) is satisfied, then with probability p we have ρ_i(l) = B, and with probability 1 − p we have ρ_i(l) = G.

For concreteness, we define that player i who has deviated before the supplemental round 1 for λ_j(l + 1) uses λ̂_j(l + 1) = λ_j(l + 1)(i).
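The classification above is a small decision rule. The following sketch uses our own labels: dev stands for the event that the approximation bound (29) fails, and p is the small randomization probability from case 2-(b)-i; it maps the two tests to the flags and to whether player i is ready to listen:

```python
import random

def classify(dev, EiXj, T, q2, eps, p, rng=random.random):
    # tau = B iff the bound (29) fails; player j excludes this case.
    tau = "B" if dev else "G"
    # (30): is E_i X_j(l) inside the half-width band?
    in_band = q2 * T - eps * T / 2 <= EiXj <= q2 * T + eps * T / 2
    # rho = B in case 2-(a) (outside the band) and, with small
    # probability p, in case 2-(b)-i; otherwise rho = G (case 2-(b)-ii).
    rho = "G" if in_band and rng() >= p else "B"
    # Player i listens to the supplemental-round messages in
    # cases 1, 2-(a) and 2-(b)-i; in case 2-(b)-ii she infers G regardless.
    listen = tau == "B" or rho == "B"
    return tau, rho, listen
```

Stubbing the uniform draw makes the rule deterministic; for example, classify(False, 60, 100, 0.6, 0.05, 0.01, rng=lambda: 0.5) returns ("G", "G", False), the case 2-(b)-ii branch.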
If 2-(b)-ii is the case, then, since T_i(l) and T_j(l) differ in at most two periods, after learning T_i(l) and T_j(l), (29) and (30) imply

E[X_j(l) | a(x), {y_{i,t}}_{t∈T(l)}] ∈ [q_2T − εT, q_2T + εT].   (31)

Conditional on a(x) and {y_{i,t}}_{t∈T(l)}, the distribution of X_j(l) is approximately normal with mean E[X_j(l) | a(x), {y_{i,t}}_{t∈T(l)}] and standard deviation of order T^{1/2}, by the central limit theorem. Since (31) implies that the conditional mean is inside [q_2T − 2εT, q_2T + 2εT] with a margin of at least εT, which is εT^{1/2} times T^{1/2} (the order of the standard deviation of the conditional distribution of X_j(l) given a(x) and {y_{i,t}}_{t∈T(l)}), player i believes that X_j(l) ∈ [q_2T − 2εT, q_2T + 2εT] with probability of order 1 − exp(−T). Hence, player i believes λ_j(l + 1) = G with high probability, and so λ̂_j(l + 1) = G.
If 1, 2-(a) or 2-(b)-i is the case, we say that "player i is ready to listen (to player j's message in the supplemental rounds)."
See the following figure for the full classification:
[Insert Figure 6]
Boxes 3, 5, 7 and 10 respectively correspond to Cases 1, 2-(b)-ii, 2-(a) and 2-(b)-i.
Note that, as we have mentioned in Figure 4 (with the roles of i and j reversed), when player i has λ̂_j(l + 1) = B, one of Boxes 3, 7 and 10 must be the case, and player j will be indifferent between any action profiles.
In addition, whenever player i uses the messages in the supplemental rounds to infer λ_j(l + 1), player i has made player j indifferent between any action profiles. Therefore, we do not need to consider the truthtelling incentive for player j in the supplemental rounds 1 and 2 for λ_j(l + 1).
Further, as we mentioned when deriving (29), conditional on {a_t, y_t}_{t∈T(l)}, (29) fails with probability of order exp(−T) by the central limit theorem. And (27) of Lemma 3 implies that the distribution of E_i X_j(l) is independent of player j's action. Therefore, player j cannot manipulate ρ_i(l) ∈ {G, B} by changing her own action in the lth review round.
Therefore, we have proven the following lemma:

Lemma 4. For sufficiently large T, for all x ∈ {G, B}², if λ̂_j(l) = λ̂_i(l) = G, then:
1. Conditional on any {a_t, y_t}_{t∈T(l)}, τ_i(l) = G with probability no less than 1 − exp(−T^{4/5}).
2. The distribution of ρ_i(l) ∈ {G, B} is independent of player j's action.
3. If 2-(b)-ii is the case in the above explanation and λ_j(l) = λ̂_j(l) = G, then player i's conditional belief in λ_j(l + 1) = G at the end of the lth review round is no less than 1 − exp(−T^{4/5}).
4. Whenever player i uses her inference of player j's messages in the supplemental rounds for λ_j(l + 1), player i has made player j indifferent between any action profiles in the rounds after the lth review round.
8.3 Reward Function
In this subsection, we explain player j's reward function for player i, π_i(x_j, · : δ). The following notation is useful: let r ∈ {1, ..., L} ∪ {(l, λ_1, 1), (l, λ_1, 2)}_l ∪ {(l, λ_2, 1), (l, λ_2, 2)}_l be the generic index of the rounds. From Figure 5, there is a chronological order of the rounds. Hence, with abuse of notation, we identify round r + 1 with the round coming right after r. For example, with r = l (the lth review round), r + 1 is the supplemental round 1 for λ_1(l + 1). In addition, let r < l if and only if round r is before the lth review round, and r ≤ l if and only if r < l or r = l.
The total reward is the summation of the following rewards in the review rounds and those in the supplemental rounds.

The lth review round. If τ_j(r) = B, κ_j(r) = B or ρ_j(r) = B happens for some r < l, then player j makes player i indifferent between any action profile sequences by

π_i(x_j, h_j^{T_P+1}, l) = { Σ_{t∈T(l)} δ^{t−1} θ_i^G(a_{j,t}, y_{j,t}) if x_j = G; Σ_{t∈T(l)} δ^{t−1} θ_i^B(a_{j,t}, y_{j,t}) if x_j = B }.   (32)

If τ_j(r) = κ_j(r) = ρ_j(r) = G for all r < l, then player j's reward is based on x and λ_j(l). Remember that the basic structure explained in Section 5.2.2 is summarized in the following figure:
[Insert Figure 7]
The formal description is given by

π_i(x_j, h_j^{T_P+1}, l) =
{ β_i(x, l, G) − β̄T if x_j = G and x_i = B;
  β_i(x, l, G) + β̄T if x_j = B and x_i = B;
  β_i(x, l, G) + β_L{X_j(l) − (q_2T + 2εT)} − β̄T if x_j = G, x_i = G and λ_j(l) = G;
  β_i(x, l, B) − β̄T if x_j = G, x_i = G and λ_j(l) = B;
  β_i(x, l, G) + β_L{X_j(l) − (q_2T − 2εT)} + β̄T if x_j = B, x_i = G and λ_j(l) = G;
  β_i(x, l, B) + β̄T if x_j = B, x_i = G and λ_j(l) = B }.   (33)

Here, β_i(x, l, λ_j(l)) will be determined in Section 11 so that (21), (5) and (6) are satisfied.
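Reading the branches of (33) as code may help: each branch is a base constant plus an offset whose sign depends on x_j, and, when x_i = G and λ_j(l) = G, a linear term in the realized score X_j(l). The sketch below uses stand-in names and made-up coefficients and follows our reading of the case structure; it is illustrative only:

```python
def review_reward(xj, xi, lam_j, Xj, T, q2, eps, beta, beta_L, beta_bar):
    # beta is a stand-in for the constants beta_i(x, l, .) fixed in Section 11.
    sign = -1 if xj == "G" else 1          # offset -bar(beta)*T or +bar(beta)*T
    if xi == "B":
        return beta["G"] + sign * beta_bar * T
    if lam_j == "B":
        return beta["B"] + sign * beta_bar * T
    # x_i = G and lambda_j(l) = G: the reward increases linearly in X_j(l).
    pivot = q2 * T + 2 * eps * T if xj == "G" else q2 * T - 2 * eps * T
    return beta["G"] + beta_L * (Xj - pivot) + sign * beta_bar * T
```

The linear slope in the λ_j(l) = G branches is what makes C_i uniquely optimal there, while every other branch is constant in the score, matching the set of almost optimal actions described in Section 8.3.1.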
The supplemental rounds 1 and 2 for λ_1(l + 1) and λ_2(l + 1). Player j makes player i indifferent between any action profile sequences by

Σ_t δ^{t−1} θ_i^{x_j}(a_{j,t}, y_{j,t}).   (34)

On top of that, in the supplemental rounds 1 and 2 for λ_j(l + 1), where player i is the receiver of the message, player j adds

{ −β_L Σ_t (1 − ψ_{j,t}^{(C_i,a_{j,t})}) ≤ 0 if x_j = G; β_L Σ_t ψ_{j,t}^{(C_i,a_{j,t})} ≥ 0 if x_j = B }   (35)

to make it strictly optimal to take C_i. Note that (35) is linearly increasing in ψ_{j,t}^{(C_i,a_{j,t})}.

The summary of rewards after τ_j = B, κ_j = B or ρ_j = B. In the lth review round, if τ_j(r) = B, κ_j(r) = B or ρ_j(r) = B has happened for some r < l, then (32) implies that any action profile gives the same payoff. On the other hand, in the supplemental rounds where player i receives the message, (35) remains in place even after τ_j = B, κ_j = B or ρ_j = B. Since (34) cancels out the differences in the instantaneous utilities with respect to the action profile, taking C_i is strictly optimal by (35) and gives a constant expected payoff.
With abuse of notation, we say that "player i is indifferent between any action profiles" if τ_j = B, κ_j = B or ρ_j = B has happened, although player i strictly prefers C_i while she receives the message.
For concreteness, we define that, after player j deviates in round r, player j sets τ_j(r) = B and makes any action optimal for player i.
8.3.1 Set of Almost Optimal Actions
Given the above reward function, let A_i(l) be the set of player i's optimal actions in the lth review round. As we will show in Section 11, A_i(l) is as follows. If τ_j(r) = B, κ_j(r) = B or ρ_j(r) = B happens for some r < l (so that (32) is being used), then A_i(l) = A_i. Otherwise (so that (33) is being used), A_i(l) depends on x_i ∈ {G, B} and λ̂_j(l) ∈ {G, B}. If x_i = λ̂_j(l) = G, then, since the reward is increasing in X_j(l), A_i(l) = {C_i}. If x_i = B or λ̂_j(l) = B, then, since the reward is constant, A_i(l) = {D_i}.
9 Inference of the Messages
In this section, we define how player i infers player j's messages in the supplemental rounds for λ_j(l + 1); that is, how λ_j(l + 1)(i) ∈ {G, B} is determined.
9.1 Conditions on the Message Exchange
We derive conditions on λ_j(l + 1)(i) that make the inferences explained in Figure 6 almost optimal. See Condition 2 below for the summary.
First, if (29) and (30) are satisfied, player i needs to believe that λ_j(l + 1) = G is true with high probability regardless of player i's continuation history. By Part 3 of Lemma 4, before seeing player j's actions in the continuation play, player i believes that the probability of a mistake is no more than

exp(−T^{4/5}).   (36)

As we will see in Sections 9.2.1 and 9.2.2, player j's strategy in the supplemental rounds is determined by λ_j(l + 1). In addition, as we have seen in Section 8.1, player j's strategy in the review rounds is determined by λ̂_i(l + 1). Since the length of the supplemental rounds is 4LT^{1/2}, observations there update player i's belief only by a factor of order exp(T^{1/2}). Since (36) is much smaller than 1/exp(T^{1/2}), player i keeps a high belief in λ_j(l + 1) = G regardless of player i's history in the supplemental rounds.
In addition, player j's actions in the review rounds indirectly reveal λ_j(l + 1) through λ̂_i(l + 1). Suppose player i knows λ̂_i(l + 1). See Figure 8, which summarizes the important features of λ_i and λ_j explained in Figure 6. Here, a_i(l + 1) is player i's equilibrium action in the (l + 1)th review round prescribed in Section 8.1. Boxes 1 to 4 are about how player i infers λ_j(l + 1). Remember that player j infers λ_i(l + 1) symmetrically, which is expressed in Boxes 5 to 11. Whether player j is in Box 5 or 6, player j uses λ_i(l + 1)(j) to determine λ̂_i(l + 1) with probability at least p. Again, since the length of the supplemental rounds is 4LT^{1/2}, any λ_i(l + 1)(j) can occur with probability of order exp(−T^{1/2}). Therefore, player i believes that any λ̂_i(l + 1) occurs with probability at least of order p exp(−T^{1/2}).[10] Since (36) is much smaller than p exp(−T^{1/2}), player i can keep a high belief in λ_j(l + 1) = G regardless of player i's history in the review rounds.
[Insert Figure 8]
In summary, if (29) and (30) are satisfied, player i believes that λ_j(l + 1) = G is true with high probability regardless of player i's continuation history, as desired.
Second, when player i uses λ_j(l + 1)(i) to determine λ̂_j(l + 1), there are two cases: λ_j(l + 1)(i) = λ_j(l + 1) and λ_j(l + 1)(i) ≠ λ_j(l + 1). If the former is the case, then the inference is optimal. If the latter is the case, then, for almost optimality, it suffices to show that, whenever λ_j(l + 1)(i) ≠ λ_j(l + 1), conditional on λ_j(l + 1) (that is, even after player i learns that her inference was not correct), player i believes with high probability that player j makes player i indifferent between any actions: A_i(l + 1) = A_i.

Footnote 10: As we will see in Lemma 5, excluding Box 12 does not affect the posterior by much.
Therefore, we want to have the following:

Condition 2. For the supplemental rounds, we want the following: conditional on λ_j(l + 1) and λ_j(l + 1)(i), if λ_j(l + 1)(i) ≠ λ_j(l + 1), then player i believes that A_i(l + 1) = A_i with high probability.
9.2 Message Protocol
In this section, we explain how we establish Condition 2 in the supplemental rounds 1 and 2 for λ_j(l + 1), where player j sends λ_j(l + 1) ∈ {G, B} to player i.
The following notation is useful. Let a_j^G = C_j and a_j^B = D_j.[11] Let q_j(a) = (q_j(y_j | a))_{y_j} be the vector of player j's signal distribution under a. In addition, let 1_{y_{j,t}} be the |Y_j| × 1 vector whose element corresponding to y_{j,t} is 1 and whose other elements are 0. q_i(a) = (q_i(y_i | a))_{y_i} and 1_{y_{i,t}} are defined symmetrically. ε > 0 is a small number that will be specified in Section 10.
9.2.1 Supplemental Round 1 for λ_j(l + 1)
First, we intuitively explain the supplemental round 1 for λ_j(l + 1), which is summarized in Figure 9 below. Let λ̄_j ∈ {G, B} be the true message, that is, λ_j(l + 1) = λ̄_j. Player j (the sender) takes a_j^{λ̄_j} ∈ {a_j^G, a_j^B} for T^{1/2} periods, while player i (the receiver) takes a_i^G for T^{1/2} periods.
Suppose player j requires player i to infer the message in the following way. From the set of periods in the current round, T(l, λ_j, 1) (see Figure 5), player j randomly picks a period t_j(l, λ_j, 1). Player j calculates the frequency of her signal observations, dropping period t_j(l, λ_j, 1): with T_j(l, λ_j, 1) ≡ T(l, λ_j, 1) \ {t_j(l, λ_j, 1)}, let y_j(l, λ_j, 1) ≡ (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} 1_{y_{j,t}}. (i) If the distance from y_j(l, λ_j, 1) to the affine hull of {q_j(a_j^{λ̄_j}, a_i)}_{a_i} (denoted aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i})) is no more than ε, then player j requires player i to infer λ_j(l + 1) properly.[12] (ii) Otherwise, player j will make any action optimal for player i (let τ_j(l, λ_j, 1) = B denote this case; then, from (32), A_i(l̃ + 1) = A_i for all l̃ ≥ l). Since we take the affine hull with respect to player i's actions, whether (i) or (ii) is the case is out of player i's control. By the central limit theorem, (ii) occurs with small probability and does not affect the equilibrium payoff.

Footnote 11: In a general game, as we will see in the Supplemental Materials, any a_j^G and a_j^B with a_j^G ≠ a_j^B generically work.
Footnote 12: As we will see in Section 9.2.2, there are subcases of (i) in which player j makes player i indifferent between any action profiles, depending on the realization of the supplemental round 2 for λ_j(l + 1).
Player i (the receiver) calculates the conditional expectation of y_j(l, λ_j, 1). For Condition 2, it suffices that player i infers (iii) λ_j(l + 1)(i) = G if the distance from E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^G, a_i^G] to aff({q_j(a_j^G, a_i)}_{a_i}) is no more than 2ε, and (iv) λ_j(l + 1)(i) = B if the distance from E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^B, a_i^G] to aff({q_j(a_j^B, a_i)}_{a_i}) is no more than 2ε. To see why this is sufficient, suppose that λ_j(l + 1) = G but player i infers λ_j(l + 1)(i) = B. Moreover, assume that player i knows that λ_j(l + 1) = G and that player j took a_j^G. Conditional on λ_j(l + 1) = G, the distance from E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^G, a_i^G] to aff({q_j(a_j^G, a_i)}_{a_i}) is more than 2ε (otherwise, (iii) would be the case and λ_j(l + 1)(i) = G). Notice that the conditional expectation uses the true action a_j^G. Since the conditional variance of y_j(l, λ_j, 1) is of order (T^{1/2} − 1)^{−1} by the central limit theorem, player i's belief in the event that the distance from y_j(l, λ_j, 1) to aff({q_j(a_j^G, a_i)}_{a_i}) is no more than ε is at most of order exp(−T^{1/2}). Hence, player i believes that A_i(l̃ + 1) = A_i for all l̃ ≥ l with high probability. The symmetric argument holds for λ_j(l + 1) = B.
For each λ_j ∈ {G, B}, instead of using E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^{λ_j}, a_i^G], player i calculates the following statistic: player i randomly picks a period t_i(l, λ_j, 1). With T_i(l, λ_j, 1) ≡ T(l, λ_j, 1) \ {t_i(l, λ_j, 1)}, player i calculates (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} E[1_{y_{j,t}} | y_{i,t}, a_j^{λ_j}, a_i^G] for each λ_j. Here, player i drops her own period t_i(l, λ_j, 1), not the period t_j(l, λ_j, 1) that player j drops. However, since T_j(l, λ_j, 1) and T_i(l, λ_j, 1) differ in at most two periods, the same argument as above asymptotically holds if we replace E[y_j(l, λ_j, 1) | {y_{i,t}}_{t∈T(l,λ_j,1)}, a_j^{λ_j}, a_i^G] with

(1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} E[1_{y_{j,t}} | y_{i,t}, a_j^{λ_j}, a_i^G] = E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | y_i(l, λ_j, 1), a_j^{λ_j}, a_i^G ].

Here, y_i(l, λ_j, 1) ≡ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{i,t}}.
The above inference is well defined if there is ε̄ > 0 such that, for all ε < ε̄, there is no signal observation {y_{i,t}}_{t∈T_i(l,λ_j,1)} of player i such that both

the distance from E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | y_i(l, λ_j, 1), a_j^G, a_i^G ] to aff({q_j(a_j^G, a_i)}_{a_i}) is no more than 2ε   (37)

and

the distance from E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | y_i(l, λ_j, 1), a_j^B, a_i^G ] to aff({q_j(a_j^B, a_i)}_{a_i}) is no more than 2ε.   (38)

Since y_i(l, λ_j, 1) ∈ Δ({1_{y_i}}_{y_i∈Y_i}),[13] by continuity, a sufficient condition for the existence of such an ε̄ is that there is no y_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}) such that

E[y_j | y_i, a_j^G, a_i^G] ∈ aff({q_j(a_j^G, a_i)}_{a_i});   (39)

E[y_j | y_i, a_j^B, a_i^G] ∈ aff({q_j(a_j^B, a_i)}_{a_i}).   (40)

Here, E[y_j | y_i, a] is the conditional expectation of the frequency of player j's signal observations given the frequency of player i's signal observations over a given set of periods in which the players take a. (39) imposes |Y_j| − |A_i| conditions and (40) imposes |Y_j| − |A_i| conditions. Hence, there are 2(|Y_j| − |A_i|) conditions in total. On the other hand, we have |Y_i| − 1 degrees of freedom for y_i. Hence, if 2(|Y_j| − |A_i|) ≤ |Y_i| − 1, then there can be solutions to (39) and (40), and the inference above may not be well defined.
To deal with this problem, we consider the following procedure. (v) If player i observes y_i(l, λ_j, 1) whose distance from the affine hull of {q_i(a_j, a_i^G)}_{a_j} is no more than ε, then player i infers λ_j(l + 1)(i) in the supplemental round 1 for λ_j(l + 1). (vi) Otherwise, player i infers λ_j(l + 1)(i) in the supplemental round 2 for λ_j(l + 1), which will be explained later. Since we take the affine hull with respect to player j's actions, whether player i infers λ_j(l + 1) from the supplemental round 1 or 2 is out of player j's control (remember that the message receiver i takes a_i^G). By the central limit theorem, (vi) occurs with small probability and does not affect the equilibrium payoff.

Footnote 13: Here, Δ({1_{y_i}}_{y_i∈Y_i}) is the set of distributions over Y_i.
If (v) is the case, then the inferences (iii) and (iv) are generically well defined. To see why, notice that they are well defined if there exists ε > 0 such that there is no y_i(l, λ_j, 1) whose distance from aff({q_i(a_j, a_i^G)}_{a_j}) is no more than ε and for which (37) and (38) are satisfied. A sufficient condition is that there is no y_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}) such that

y_i ∈ aff({q_i(a_j, a_i^G)}_{a_j})   (41)

and such that (39) and (40) are satisfied. Since (41) imposes |Y_i| − |A_j| constraints, there are |Y_i| − |A_j| + 2(|Y_j| − |A_i|) constraints together with (39) and (40). Since there are |Y_i| − 1 degrees of freedom for y_i, if

|Y_i| − |A_j| + 2(|Y_j| − |A_i|) ≥ |Y_i|,

that is,

|Y_j| − |A_i| ≥ (1/2)|A_j|,

then there is generically no y_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}) satisfying (41), (39) and (40), as desired.
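The count above is a one-line comparison of constraints against degrees of freedom; a sketch (the cardinalities are toy numbers of our choosing):

```python
def generically_well_defined(Yi, Yj, Ai, Aj):
    # Constraints from (41), (39) and (40) versus the (|Y_i| - 1)-dimensional
    # freedom in y_i; True means the constraints generically admit no
    # solution, i.e., the inference is well defined.
    constraints = (Yi - Aj) + 2 * (Yj - Ai)
    return constraints >= Yi   # equivalently, |Y_j| - |A_i| >= |A_j| / 2

# Binary actions (|A_i| = |A_j| = 2): three signals per player suffice.
print(generically_well_defined(Yi=3, Yj=3, Ai=2, Aj=2))
```

With binary actions the condition reduces to |Y_j| ≥ 3, which is one way to read the abstract's requirement that each player's number of signals be sufficiently large.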
If (vi) is the case, then player i infers λ_j(l+1)(i) from the supplemental round 2. In this case, player i has τ_i(l, λ_j, 1) = B or ϑ_i(l, λ_j, 1) = B.[14] Then, from (32), player j is indifferent between any action profiles. That is, player j does not care how player i will play in the subsequent rounds. Therefore, player j excludes this case from consideration. In particular, player j does not care about player i's inference in this case. That is, whenever player j's message in the supplemental round 2 matters, player j puts belief 0 on this event and is indifferent between λ_j(l+1)(i) = G and λ_j(l+1)(i) = B.[15]

[14] See Section 15.4 for details.

[15] As we have seen in Figure 6, whenever player i uses λ_j(l+1)(i) to determine λ_j(l+1), player j is indifferent between any action profiles. Hence, player j is almost indifferent between any action profiles in the supplemental round 1. However, player j's history (and so her actions) in the supplemental round 1 affects player j's belief about player i's inference λ_j(l+1)(i), which, in turn, affects player j's belief about the optimality of λ_i(l+1), since player i's continuation strategy is jointly determined by λ_j(l+1)(i) and player i's history in the lth review round on which λ_i(l+1) is based, as we have seen in Figure 8 (with the roles of players i and j reversed). Hence, player j is almost indifferent, but not exactly indifferent, in the supplemental round 1 for λ_j(l+1). On the other hand, in the supplemental round 2, since player j excludes the cases where player i's history in the supplemental round 2 affects player i's continuation strategy, player j is exactly indifferent between any strategies. See Proposition 1 for the formal statement of the above discussion.

The summary of the cases is depicted in the following figure:

[Insert Figure 9]

9.2.2 Supplemental Round 2 for λ_j(l+1)

Second, we intuitively explain the supplemental round 2 for λ_j(l+1), which is summarized in Figures 10 and 11 below. Again, this round matters only if player i is in Box 8 of Figure 9 as a result of the supplemental round 1 for λ_j(l+1).

In this round, player j (the sender) sends the message about λ_j(l+1) again. Remember that λ̄_j is the true value: λ_j(l+1) = λ̄_j. This time, player j sends the message as follows. With η > 0 being a small number defined in Section 10, player j determines

z_j(λ̄_j) =
  λ̄_j with probability 1 − 2η,
  {G, B} \ {λ̄_j} with probability η,
  M with probability η,

and player j takes

a_j^{z_j(λ̄_j)} =
  a_j^G if z_j(λ̄_j) = G,
  a_j^B if z_j(λ̄_j) = B,
  (1/2) a_j^G + (1/2) a_j^B if z_j(λ̄_j) = M

for T^{1/2} periods. That is, player j sends the "true" message a_j^{z_j(λ̄_j)} = a_j^{λ̄_j} with high probability 1 − 2η, while player j "tells a lie" with probability 2η. With probability η, player j sends the opposite message z_j(λ̄_j) = {G, B} \ {λ̄_j}. With probability η, player j "mixes" the two messages: z_j(λ̄_j) = M and a_j^M = (1/2) a_j^G + (1/2) a_j^B. When player j tells a lie, player j makes player i indifferent between any actions: β_j(l, λ_j, 2) = B and A_i(l̃+1) = A_i for all l̃ ≥ l. Since a lie occurs with small probability 2η, it does not affect the equilibrium payoff.
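The sender's randomization can be sketched in a few lines; the function names, the value of η, and the string encoding of messages and actions are our illustrative choices, not the paper's notation:

```python
import random

# Sketch: the message z_j equals the truth with probability 1 - 2*eta, the
# opposite value with probability eta, and the mixing message M with
# probability eta; the action then follows the displayed rule, with M
# played as the half-half mixture of a_j^G and a_j^B.
def draw_message(truth, eta, rng):
    u = rng.random()
    if u < 1 - 2 * eta:
        return truth
    if u < 1 - eta:
        return "B" if truth == "G" else "G"   # the opposite of {G, B}
    return "M"

def draw_action(z, rng):
    if z == "M":
        return "aG" if rng.random() < 0.5 else "aB"
    return "aG" if z == "G" else "aB"

rng = random.Random(0)
draws = [draw_message("G", 0.05, rng) for _ in range(100_000)]
freq = {m: draws.count(m) / len(draws) for m in ("G", "B", "M")}
example_action = draw_action("M", rng)   # one draw of the mixed action
```

With η = 0.05, the empirical message frequencies come out close to (0.90, 0.05, 0.05), matching the (1 − 2η, η, η) scheme.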
Player i (the receiver) takes a_i^G. Player i infers z_j(λ̄_j) as follows. For the periods in the current round T(l, λ_j, 2) (see Figure 5), let y_i(l, λ_j, 2) = (y_i(l, λ_j, 2))_{y_i} be the vector representing the frequency with which player i observes y_i in T(l, λ_j, 2). Conditional on λ̄_j ∈ {G, B}, the likelihood ratio between z_j(λ̄_j) = z_j ∈ {G, B, M} and z_j(λ̄_j) = z'_j ∈ {G, B, M} is

Pr(z_j(λ̄_j) = z_j | λ̄_j, y_i(l, λ_j, 2)) / Pr(z_j(λ̄_j) = z'_j | λ̄_j, y_i(l, λ_j, 2))
  = [Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z_j) / Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z'_j)]
    × [Pr(z_j(λ̄_j) = z_j | λ̄_j) / Pr(z_j(λ̄_j) = z'_j | λ̄_j)].

The log of the first factor, log[Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z_j) / Pr(y_i(l, λ_j, 2) | z_j(λ̄_j) = z'_j)], is expressed as T^{1/2} (L(y_i(l, λ_j, 2), z_j) − L(y_i(l, λ_j, 2), z'_j)) with

L(y_i(l, λ_j, 2), z_j) = y_{i,1}(l, λ_j, 2) log q(y_{i,1} | a_i^G, a_j^{z_j}) + ··· + y_{i,|Y_i|}(l, λ_j, 2) log q(y_{i,|Y_i|} | a_i^G, a_j^{z_j}).

Since L(y_i(l, λ_j, 2), z_j) is strictly concave with respect to the mixture of a_j^G and a_j^B, and y_i(l, λ_j, 2) lies in the compact set Δ({1_{y_i}}_{y_i∈Y_i}), there exists ζ > 0 such that one of the following is true:

1. z_j(λ̄_j) = G is sufficiently more likely than z_j(λ̄_j) = B: L(y_i(l, λ_j, 2), G) − ζ ≥ L(y_i(l, λ_j, 2), B).

2. z_j(λ̄_j) = B is sufficiently more likely than z_j(λ̄_j) = G: L(y_i(l, λ_j, 2), B) − ζ ≥ L(y_i(l, λ_j, 2), G).

3. If z_j(λ̄_j) = G and z_j(λ̄_j) = B are roughly equally likely, then, since L(y_i(l, λ_j, 2), z_j) is strictly concave, z_j(λ̄_j) = M is the most likely: L(y_i(l, λ_j, 2), M) − ζ ≥ L(y_i(l, λ_j, 2), G), L(y_i(l, λ_j, 2), B).
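A minimal sketch of the receiver's three-way classification, with hypothetical signal distributions (the numbers and names below are ours, not the paper's): it computes the log-likelihood score L for each message and separates the cases by a gap ζ.

```python
import math

# Hypothetical signal distributions q(y | a_i^G, a_j^z) for the three
# messages; M is the half-half mixture of the G and B action distributions.
q = {
    "G": [0.6, 0.3, 0.1],
    "B": [0.1, 0.3, 0.6],
}
q["M"] = [(g + b) / 2 for g, b in zip(q["G"], q["B"])]

def L(freq, z):
    """Per-period log-likelihood of the observed signal frequencies under z."""
    return sum(f * math.log(q[z][y]) for y, f in enumerate(freq))

def classify(freq, zeta):
    LG, LB = L(freq, "G"), L(freq, "B")
    if LG - zeta >= LB:
        return 1        # z = G sufficiently more likely than z = B
    if LB - zeta >= LG:
        return 2        # z = B sufficiently more likely than z = G
    return 3            # G and B roughly tied: by strict concavity, M wins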
If 1 is the case, then

Pr(z_j(λ̄_j) = G | λ̄_j, y_i(l, λ_j, 2)) / Pr(z_j(λ̄_j) = B | λ̄_j, y_i(l, λ_j, 2)) ≥ exp(ζ T^{1/2}) · η/(1 − 2η)

for λ̄_j ∈ {G, B}. Since z_j(λ̄_j) = M implies that player j told a lie and that player i is indifferent between any action profiles, player i can believe that, conditional on λ_j(l+1), inferring λ_j(l+1)(i) = G is almost optimal. Similarly, if 2 is the case, then player i can believe that, conditional on λ_j(l+1), inferring λ_j(l+1)(i) = B is almost optimal. Finally, if 3 is the case, then player i believes, conditional on λ_j(l+1), that player j told a lie, that any action profile is optimal with high probability, and that inferring λ_j(l+1)(i) = B is almost optimal.

In summary, there exists ζ̄ > 0 such that, for any history in the current round, conditional on the true λ_j(l+1), one of the following three cases is true:

1. z_j(λ̄_j) = G is more likely than z_j(λ̄_j) = B by a factor of exp(ζ̄ T^{1/2}).

2. z_j(λ̄_j) = B is more likely than z_j(λ̄_j) = G by a factor of exp(ζ̄ T^{1/2}).

3. z_j(λ̄_j) = M is the most likely by a factor of exp(ζ̄ T^{1/2}).

See Figure 10 for the illustration of player j's requirement and Figure 11 for player i's inference. Consider Condition 2: since the almost optimality is established conditional on λ_j(l+1), after realizing λ_j(l+1)(i) ≠ λ_j(l+1), player i believes that λ_j(l+1) = B and A_i(l̃+1) = A_i for all l̃ ≥ l with high probability.

[Insert Figures 10 and 11]
9.3 Formal Message Protocol

The formal definition of the supplemental rounds 1 and 2 for λ_j(l+1) is given in the Appendix. The difference from the above intuitive explanation is that there are several events, denoted by σ_j = B, which are excluded from player i's consideration. As we have mentioned in Section 9.2.1, when player j uses player i's message in the supplemental round 2 for λ_i(l+1), σ_j = B or ϑ_j = B has happened, and player i always assumes that her message in the supplemental round 2 is irrelevant. In summary, we can prove the following lemma.

Lemma 5 There exists ε̄ > 0 such that, for any ε ∈ (0, ε̄), for any l = 1, ..., L−1, for sufficiently large T, for any i, j ∈ I, there exists a message protocol in the supplemental rounds such that

1. In the supplemental round 1 for λ_j(l+1),

(a) Player j takes a_j^{λ_j(l+1)} and player i takes a_i^G for T^{1/2} periods.

(b) Player j creates σ_j(l, λ_j, 1), β_j(l, λ_j, 1) ∈ {G, B}.

(c) Player i creates τ_i(l, λ_j, 1), ϑ_i(l, λ_j, 1) ∈ {G, B}. If τ_i(l, λ_j, 1) = ϑ_i(l, λ_j, 1) = G, then λ_j(l+1)(i) is determined solely by player i's history in the supplemental round 1.

2. In the supplemental round 2 for λ_j(l+1),

(a) Player j takes a mixed strategy and player i takes a_i^G for T^{1/2} periods.

(b) Player j creates β_j(l, λ_j, 2) ∈ {G, B}.

(c) If λ_j(l+1)(i) is determined by player i's history in the supplemental round 2, then τ_i(l, λ_j, 1) = B or ϑ_i(l, λ_j, 1) = B in the previous round.

3. Let σ_j, ϑ_j and β_j be {σ_j(l̃), σ_j(l̃, λ_j, 1), τ_j(l̃, λ_i, 1)}_{l̃=1}^{L−1}, {ϑ_j(l̃, λ_i, 1)}_{l̃=1}^{L−1} and {β_j(l̃, λ_j, 1), β_j(l̃, λ_j, 2)}_{l̃=1}^{L−1}, respectively. Then

(a) Conditional on {λ_j(l̃+1)}_{l̃=1}^{L−1} and "σ_j and ϑ_j being all G," player i believes that "λ_j(l+1)(i) = λ_j(l+1) or β_j(l, λ_j, 1) = B or β_j(l, λ_j, 2) = B" with probability no less than 1 − exp(−T^{1/3}).

(b) Conditional on {λ_j(l̃+1)}_{l̃=1}^{L−1} and "σ_j and ϑ_j being all G," any history of player i in the supplemental rounds happens with probability no less than exp(−T^{1/3}).

(c) Conditional on {λ_j(l̃+1)}_{l̃=1}^{L−1} and "σ_j and ϑ_j being all G," conditional on any history of player i in all the supplemental rounds, player i believes that any λ_i(l+1)(j) is possible with probability no less than exp(−T^{1/3}).

(d) σ_j and ϑ_j are all G with probability no less than 1 − exp(−T^{1/3}).

(e) The distribution of β_j is independent of player i's strategy with probability no less than 1 − exp(−T^{1/3}).

(f) The distribution of {σ_j, ϑ_j} is independent of player i's strategy.
10 Variables

In this section, we show that all the variables can be taken consistently, satisfying all the requirements that we have mentioned: ρ, ū, q_2, q_1, ε̄, L̄, L, η and ε.

First, ρ is determined in (3), ū is determined in Lemma 2, q_1 and q_2 are determined in Lemma 3, and ε̄ ∈ (0, 1/2) is defined in Lemma 5, independently of the other variables.

Given q_1 and q_2, we define L̄ to satisfy (10):

L̄ (q_2 − q_1) > max_{a,i} 2 |u_i(a)|.   (42)

Given ρ and L̄, we define L sufficiently large so that (11) holds:

L̄ < L ρ.   (43)

We are left to pin down η and ε. Take ε > 0 and η > 0 sufficiently small such that

u_i(D_1, D_2) + η + 2ε L̄ + 3ū η < v̲_i < v̄_i < u_i(C_1, C_2) − η − 2ε L̄ − 3ū η,   (44)

and

ε < min {ε̄, q_2 − q_1}.   (45)
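The order in which the constants are pinned down can be sketched numerically; every number below is a hypothetical stand-in (the payoffs, the targets v̲, v̄, and the values taken for ρ, ū, q_1, q_2 and ε̄), not a value from the paper:

```python
# Sketch of the pinning-down order (42)-(45) for a hypothetical prisoners'
# dilemma with u(C1, C2) = 2, u(D1, D2) = 0; rho, u_bar, eps_bar stand in
# for the values fixed in (3), Lemma 2, and Lemma 5.
u_CC, u_DD = 2.0, 0.0
u_max = 3.0                            # stand-in for max_{a,i} |u_i(a)|
u_bar = 3.0                            # stand-in for the bound u-bar of Lemma 2
v_low, v_bar = 0.5, 1.5                # targets with u_DD < v_low < v_bar < u_CC
q1, q2 = 0.4, 0.6
rho, eps_bar = 0.1, 0.1

L_bar = 2 * u_max / (q2 - q1) + 1.0    # (42): L_bar (q2 - q1) > max 2|u_i(a)|
L = int(L_bar / rho) + 1               # (43): L_bar < L * rho

# (44)-(45): shrink eps and eta together until both payoff bounds hold.
eps = min(eps_bar, q2 - q1) / 2
eta = 0.1
while not (u_DD + eta + 2 * eps * L_bar + 3 * u_bar * eta < v_low
           and v_bar < u_CC - eta - 2 * eps * L_bar - 3 * u_bar * eta):
    eps /= 2
    eta /= 2
```

The loop terminates because both slack terms vanish as ε and η shrink, while the strict inequalities u_DD < v̲_i < v̄_i < u_CC leave a fixed gap.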
11 Almost Optimality of σ_i(x_i)

Since we have pinned down λ_j(l+1)(i) in Section 9, Section 8.2.2 pins down λ_j(l+1). Then, Section 8.1 pins down the equilibrium action. In addition, since we have pinned down σ_j, β_j and ϑ_j, Section 9 pins down π_i(x_j, ·, ·) except for π̄_i(x, l, λ_j). Therefore, we want to show that there exists π̄_i(x, l, λ_j) such that these actions and reward functions satisfy (21), (5) and (6).

11.1 Almost Optimality of λ_j(l+1)

First, we show the almost optimality of λ_j(l+1). Since we verified in Lemma 5 that 2 of Condition 2 is satisfied, this follows from Section 9.1.

For notational convenience, for each lth review round, let a_j(l) be the sequence of player j's equilibrium actions in each l̃th review round with l̃ ≤ l.

As we have mentioned in Section 9, in each lth review round (r = l), player i conditions on σ_j(r) = ϑ_j(r) = G for all r < l. Since Condition 2 is satisfied conditional on σ_j = ϑ_j = G in Lemma 5, the almost optimality still holds:

Lemma 6 For any lth review round, conditional on a_j(l) and σ_j(r) = ϑ_j(r) = G for all r < l, for any history of player i in which player i has not deviated in the supplemental rounds for λ_j(l̃) with l̃ ≤ l, player i can believe that "λ_j(l)(i) = λ_j(l) or there exists r < l with β_j(r) = B" with probability no less than

1 − exp(−T^{1/3}).   (46)
11.2 Determination of π̄_i(x, l, λ_j)

Second, based on Lemma 6, we determine π̄_i(x, l, λ_j) such that π̄_i(x, l, λ_j) satisfies (6), and show that σ_i(x_i) is almost optimal:

Proposition 1 There exists π̄_i(x, l, λ_j) that satisfies (6) and such that σ_i(x_i) is almost optimal; that is, for sufficiently large T, for all rounds r,

1. If σ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r, then any strategy is exactly optimal.

2. If σ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r, then

(a) For all l ∈ {1, ..., L−1}, for r = (l, λ_i, 2) (supplemental round 2 for λ_i(l+1)), where player i takes a mixed strategy, any strategy is exactly optimal.

(b) For all l ∈ {1, ..., L−1}, for r = (l, λ_j, 1) or (l, λ_j, 2) (supplemental round 1 or 2 for λ_j(l+1)), where player i receives the message, a_i^G is strictly optimal by at least (1/2) L̄ (q_2 − q_1).

(c) For the other rounds, σ_i(x_i) is optimal with loss up to exp(−T^{1/4}).

1 follows from (32). 2-(a) is true since σ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r implies that player j has inferred λ_i(l+1)(j) from the supplemental round 1 (see Figure 13, with the roles of i and j reversed). 2-(b) is true since taking a_i^G gives the almost optimal inference and (35) gives a high reward on a_i^G. 2-(c) is true intuitively because (i) in the supplemental round 1 for λ_i(l+1), whenever player j is ready to listen, player i's continuation payoff is fixed regardless of player j's inference, by 4 of Lemma 4, and (ii) Lemma 6 guarantees that the coordination goes well with high probability and that we can construct the reward functions in the review rounds as in Section 5.2.2.
12 Exact Optimality

We are left to show the truthtelling incentive for h_i^{T_P+1} in the report block and to establish the exact optimality of σ_i(x_i). Here, we offer the intuitive explanation. See Section 15.7 in the Appendix for the formal proof.

The report block proceeds as follows. Using the public randomization device, the players coordinate on who will report h_i^{T_P+1}. Only one player reports the history: player 1 reports h_1^{T_P+1} with probability 1/2 and player 2 reports h_2^{T_P+1} with probability 1/2. Player i, who is supposed to send h_i^{T_P+1} according to the public randomization device, sends two messages sequentially for each round r in which player i does not take a mixed strategy (that is, except for the supplemental round 2 for λ_i(l+1)): (i-r-1) what action player i took and what signal player i observed in the first period of the round r, (a_{i,t_r}, y_{i,t_r}), where t_r is the first period of the round r; and (i-r-2) the history in the round r, {a_{i,t}, y_{i,t}}_{t∈T(r)}.

Let h_i^r be the history, in each round where player i does not take a mixed strategy, before the round r. Notice that, before player i sends the message about (i-r-1), the messages about h_i^r have been sent.

By backward induction, for each r, player j, who is not supposed to send h_j^{T_P+1} by the realization of the public randomization device, adjusts the reward function as follows. Here, to verify the truthtelling incentive, we distinguish the true histories h_i^r, (a_{i,t_r}, y_{i,t_r}) and {a_{i,t}, y_{i,t}}_{t∈T(r)} from the reported messages ĥ_i^r, (â_{i,t_r}, ŷ_{i,t_r}) and {â_{i,t}, ŷ_{i,t}}_{t∈T(r)}.
(j-r-1) Given ĥ_i^r, player j gives a reward based on {π_{j,t_r}^{a_i, a_{j,t_r}}}_{a_i∈A_i} such that any action that should be taken by σ_i(x_i) after ĥ_i^r is optimal in t_r (the first period of the round r). (j-r-2) Player j punishes player i if player i is likely to have told a lie about (a_{i,t_r}, y_{i,t_r}) in (i-r-1). (j-r-3) Player j gives a reward based on Σ_{t∈T(r)} π_{j,t}^{â_{i,t_r}, a_{j,t}} such that constantly taking the same action as â_{i,t_r} during the round r is optimal. (j-r-4) Player j punishes player i if player i is likely to have told a lie about {a_{i,t}, y_{i,t}}_{t∈T(r)} in (i-r-2).
We take the punishment and reward satisfying the following requirements:

1. Within the round r, we take the reward in (j-r-1) much larger than the punishment or reward in (j-r-2), (j-r-3) and (j-r-4). Similarly, we take the punishment in (j-r-2) much larger than the punishment or reward in (j-r-3) and (j-r-4), and the reward in (j-r-3) much larger than the punishment in (j-r-4).

2. Between rounds, we take the punishment and reward from (j-r-1) to (j-r-4) much larger than those from (j-(r+1)-1) to (j-(r+1)-4).
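One way to realize both requirements at once is a geometric scale of magnitude bounds; the scale K, the number of rounds R, and the exponent schedule below are our illustrative sketch, not the paper's construction:

```python
# Sketch: bound the reward/punishment for step (j-r-k) by K^(4(R-r) + (4-k)),
# K > 1. Within a round, the four steps are strictly ordered (Requirement 1),
# and the smallest bound in round r, K^(4(R-r)), still exceeds the largest
# bound in round r+1, K^(4(R-r) - 1) (Requirement 2).
K, R = 10.0, 5

def bound(r, k):
    return K ** (4 * (R - r) + (4 - k))
```

With this schedule, backward induction can ignore all finer-level and later-round adjustments exactly as described in the text.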
By backward induction, we can verify the truthtelling incentive and make σ_i(x_i) exactly optimal. In round r, we do not need to consider the punishment or reward for the rounds r̃ > r, by Requirement 2 above. First, given that player i tells the truth in the report block, (j-r-1) is enough to make the equilibrium strategy optimal in t_r, taking into account the effect on (j-r-2), (j-r-3) and the punishment and reward for the rounds r̃ > r, by Requirement 1. Second, (j-r-2) is enough to give the truthtelling incentive for (a_{i,t_r}, y_{i,t_r}) since (i) the punishment and reward for (j-r̃-1), (j-r̃-2), (j-r̃-3) and (j-r̃-4) with r̃ < r are sunk, (ii) the reward in (j-r-1) is sunk, and (iii) the punishment and reward for (j-r-3) and (j-r-4) are much smaller than the punishment in (j-r-2). Third, given the truthtelling incentive for a_{i,t_r}, (j-r-3) is enough to incentivize player i to constantly take a_{i,t_r} within the round r since (i) the punishment and reward for (j-r̃-1), (j-r̃-2), (j-r̃-3) and (j-r̃-4) with r̃ < r are sunk, (ii) the punishment and reward for (j-r-1) and (j-r-2) are sunk, and (iii) the punishment for (j-r-4) is much smaller than the reward for (j-r-3). Finally, (j-r-4) is enough to incentivize player i to tell the truth about {a_{i,t}, y_{i,t}}_{t∈T(r)} since (i) the punishment and reward for (j-r̃-1), (j-r̃-2), (j-r̃-3) and (j-r̃-4) with r̃ < r are sunk, and (ii) the punishment and reward for (j-r-1), (j-r-2) and (j-r-3) are sunk.

Note that we do not require player i to send the messages about the rounds where player i takes a mixed strategy. From Proposition 1, σ_i(x_i) was exactly optimal there without the adjustment in the report block. We cancel out the adjustment of the punishment and reward in the round r explained above once σ_j(r̃) = B or ϑ_j(r̃) = B with r̃ < r happens. Then, from Proposition 1, σ_i(x_i) is exactly optimal in all the rounds.

In addition, in the supplemental round 2 for λ_i(l+1), if σ_j(r̃) = ϑ_j(r̃) = G until this round, then player i's strategy in this round does not affect the continuation strategy profile, including the report block, except for player j's messages in the report block. Since Proposition 1 guarantees that σ_i(x_i) is exactly optimal in this round without adjustment, we can omit this round from the rounds about whose history player i sends the messages in the report block.

From Proposition 1, the last adjustment (j-R-3) can be very small. Hence, the total reward and punishment can be small enough not to affect the equilibrium payoff.
13 General Games

13.1 General Two-Player Games

The prisoners' dilemma is special in that, when player j's state is B, the equilibrium action D_j can (i) attain the targeted payoff v̄_j or v̲_j, depending on player i's state, and (ii) minimax player i at the same time. However, in a general game, such an action does not always exist, which causes the following problem: player j with x_j = B and λ_j(l) = B needs to give a positive constant reward, as in (33), to sustain (6). On the other hand, player j needs to ensure that player i's payoff is below v̲_i regardless of player i's strategy. Therefore, player j with x_j = B and λ_j(l) = B needs to minimax player i with high probability if player i does not take a prescribed action, although the minimaxing action can be different from the prescribed action that attains the targeted payoff v̄_j or v̲_j.

For this purpose, player i constructs a statistic such that, if its realization is low, player i allows player j to minimax player i. Player i sends the message about whether she allows player j to minimax her. Player j, on the other hand, calculates the conditional expectation of that statistic, which decreases if player i deviates. Modifying the coordination protocol about λ_i(l+1), we can construct a protocol such that the players coordinate on the punishment properly and such that player j punishes player i with high probability if the realization of the conditional expectation is sufficiently low, regardless of player i's message, so as to keep player i's payoff below v̲_i regardless of player i's strategy. See the Supplemental Material 1 for the formal description.

The fact that there does not always exist an action satisfying both (i) and (ii) is the reason why the belief-free equilibrium payoff set is smaller than the feasible and individually rational payoff set except in the prisoners' dilemma. Hörner and Olszewski (2006) consider a way to coordinate on the punishment which is different from ours. However, since their coordination uses almost perfect monitoring, as Hörner and Olszewski (2006) mention, it was not known how to obtain the coordination when monitoring is not almost perfect.
13.2 General More-Than-Two-Player Games

If there are more than two players, then, as in Hörner and Olszewski (2006), we let player (i−1)'s state determine whether player i's value is v̄_i or v̲_i, independently of the states of the players other than i−1. With cheap talk, there is no conceptual difficulty in extending our result (see Supplemental Material 2).
14 Equilibrium Construction without Cheap Talk

14.1 Two-Player Games

Throughout the main text, we use perfect and public cheap talk and public randomization devices.[16] In the Supplemental Material 3, we show that cheap talk and public randomization are completely dispensable and that all the messages can be sent via actions without public randomization.

[16] Even with cheap talk, however, the general folk theorem is new even in the two-player case.

Note that we use the cheap talk messages for x_i and h_i^{T_P+1}. The basic idea for sending x_i is the same as in the supplemental rounds for λ_i(l+1). As Lemma 5 shows, conditional on the opponent's message, the receiver can believe that her inference is correct or that she is indifferent between any action profile sequences. Therefore, even after observing a continuation strategy that does not correspond to her inference, the receiver can believe that she is indifferent between any action profile sequences with high probability and that following the equilibrium strategy is almost optimal. As we have seen in Section 12, we can attain the exact optimality afterwards.

There is one difference between λ_i(l+1) and x_i. For λ_i(l+1), whenever player j changes her strategy based on the inference of λ_i(l+1), player j has made player i indifferent between any action profile sequences, and the sender's inference of the receiver's inference is irrelevant (4 of Lemma 4). On the other hand, the receiver j constructs the reward function on the sender i based on x_i (remember that π_i depends not only on x_j but also on x_i). Therefore, the sender needs to have an inference of the receiver's inference. For this purpose, player j sends back her inference of x_i to player i to inform player i of what inference the reward function is based on. The incentive for player j to tell the truth can be provided.

As for h_i^{T_P+1} in the report block, there are three problems. First, the cardinality of the message space for our equilibrium construction is of order (|A_i| |Y_i|)^T. If we replace cheap talk with the message exchange via actions straightforwardly, it takes too long to send all the messages, and this affects the equilibrium payoff. The second problem is that we need to dispense with the public randomization device. It is important to have only one player who reports h_i^{T_P+1}, since otherwise, while the opponent is reporting h_j^{T_P+1}, player i could update her belief about player j's signal observations and the incentive to tell the truth about h_i^{T_P+1} would be destroyed. On the other hand, to obtain the exact optimality, there needs to be a positive probability for each player to report h_i^{T_P+1}, with the reward function adjusted accordingly. Third, there exists a positive probability that the messages are not transmitted correctly. See the Supplemental Material 3 for the solutions to these problems.
14.2 More-Than-Two-Player Games

There is a problem in coordinating on x_i that is unique to the case with more than two players: each player takes an action based on her own inference about x_i. It may be in some player's interest that some players take actions corresponding to x_i = G while the others take actions corresponding to x_i = B. Then, that player may have an incentive to deviate from the equilibrium action in order to manipulate the signal distributions and to induce the miscoordination about x_i. See the Supplemental Material 4 for how to deal with this problem.
15 Appendix

15.1 Proof of Lemma 1

To see why this is enough for Theorem 1, define the strategy in the infinitely repeated game as follows. Define

p(G, h_j^{T_P+1} : δ) ≡ 1 + ((1 − δ)/δ^{T_P}) · π_i(G, h_j^{T_P+1} : δ)/(v̄_i − v̲_i),

p(B, h_j^{T_P+1} : δ) ≡ ((1 − δ)/δ^{T_P}) · π_i(B, h_j^{T_P+1} : δ)/(v̄_i − v̲_i).   (47)

If (6) is satisfied, then, for sufficiently large δ, p(G, h_j^{T_P+1} : δ), p(B, h_j^{T_P+1} : δ) ∈ [0, 1] for all h_j^{T_P+1}.

We see the repeated game as the repetition of T_P-period "review phases." In each phase, player i has a state x_i ∈ {G, B}. Within the phase, player i with state x_i plays according to σ_i(x_i). After observing h_i^{T_P+1} in the current phase, the state in the next phase is equal to G with probability p(x_i, h_i^{T_P+1} : δ) and to B with the remaining probability.

Player i's initial state is equal to G with probability p_v^i and to B with probability 1 − p_v^i, with

p_v^i v̄_j + (1 − p_v^i) v̲_j = v_j.

Then, since

(1 − δ) Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + δ^{T_P} [ p(G, h_j^{T_P+1}) v̄_i + {1 − p(G, h_j^{T_P+1})} v̲_i ]
  = (1 − δ^{T_P}) [ ((1 − δ)/(1 − δ^{T_P})) { Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(G, h_j^{T_P+1} : δ) } ] + δ^{T_P} v̄_i

and

(1 − δ) Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + δ^{T_P} [ p(B, h_j^{T_P+1}) v̄_i + {1 − p(B, h_j^{T_P+1})} v̲_i ]
  = (1 − δ^{T_P}) [ ((1 − δ)/(1 − δ^{T_P})) { Σ_{t=1}^{T_P} δ^{t−1} u_i(a_t) + π_i(B, h_j^{T_P+1} : δ) } ] + δ^{T_P} v̲_i,

(4) and (5) imply, for a sufficiently large discount factor δ:

1. Conditional on the opponent's state, the above strategy in the infinitely repeated game is optimal.

2. If player j is in the state G, then player i's payoff from the infinitely repeated game is v̄_i, and if player j is in the state B, then player i's payoff is v̲_i.

3. The payoff in the initial period is p_v^j v̄_i + (1 − p_v^j) v̲_i = v_i, as desired.
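The two displayed identities are pure accounting; a numerical check with arbitrary test values (δ, T_P, the payoff stream, and the reward value are all hypothetical) verifies both the G-state and the B-state versions:

```python
# With p(G, h : d) = 1 + ((1-d)/d^T) * pi/(v_bar - v_low) and
# p(B, h : d) = ((1-d)/d^T) * pi/(v_bar - v_low), the phase payoff plus the
# continuation value reproduces the bracketed auxiliary payoff plus
# d^T * v_bar (G case) or d^T * v_low (B case).
d, T = 0.95, 10
v_bar, v_low = 1.5, 0.5
u = [0.3 * t for t in range(1, T + 1)]   # an arbitrary stream u_i(a_t)
pi = -0.2                                 # an arbitrary reward value

S = sum(d ** (t - 1) * u[t - 1] for t in range(1, T + 1))

pG = 1 + (1 - d) / d ** T * pi / (v_bar - v_low)
lhs_G = (1 - d) * S + d ** T * (pG * v_bar + (1 - pG) * v_low)
rhs_G = (1 - d ** T) * ((1 - d) / (1 - d ** T)) * (S + pi) + d ** T * v_bar

pB = (1 - d) / d ** T * pi / (v_bar - v_low)
lhs_B = (1 - d) * S + d ** T * (pB * v_bar + (1 - pB) * v_low)
rhs_B = (1 - d ** T) * ((1 - d) / (1 - d ** T)) * (S + pi) + d ** T * v_low
```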
15.2 Proof of Lemma 3

First, note that (27) is equivalent to

Σ_{y_i} { Σ_{y_j} θ_{a_j}(y_j) q(y_j | a, y_i) } q(y_i | a_i, ã_j) = Σ_{y_j} { Σ_{y_i} q(y_j | a, y_i) q(y_i | a_i, ã_j) } θ_{a_j}(y_j) = q_2.   (48)

Second, arbitrarily fix q_2 > q_1. Let Q_1(ã_i, a_j) ≡ (q(y_j | ã_i, a_j))_{y_j} be the vector of the distribution of player j's signals conditional on (ã_i, a_j). In addition, let Q_2(a_i, ã_j) ≡ (Σ_{y_i} q(y_j | a, y_i) q(y_i | a_i, ã_j))_{y_j}. The vectors Q_1(ã_i, a_j) and Q_2(a_i, ã_j) are generically linearly independent except for ã_i = a_i and ã_j = a_j, since Assumption 2 guarantees that |Y_j| ≥ |A_i| + |A_j| − 1. Therefore, there generically exists a solution {θ_{a_j}(y_j)}_{y_j} of (26) and (27). Note that if {θ_{a_j}(y_j)}_{y_j} solves the system for q_2 and q_1, then, for any m, m' ∈ R_{++}, {(θ_{a_j}(y_j) + m)/m'}_{y_j} solves the system for (q_2 + m)/m' and (q_1 + m)/m'. Therefore, we can make sure that θ_{a_j}: Y_j → (0, 1).
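The closing shift-and-scale step can be checked numerically; the values of θ, the weights w, and the choices of m and m' below are all hypothetical:

```python
# If {theta(y)} satisfies sum_y theta(y) w(y) = q for a probability vector w,
# then {(theta(y) + m)/m'} satisfies the same equation with (q + m)/m', so a
# solution can be shifted and scaled into (0, 1).
theta = [3.0, -1.0, 2.0]            # an unbounded hypothetical solution
w = [0.5, 0.3, 0.2]                 # a probability vector (sums to one)
q0 = sum(t * p for t, p in zip(theta, w))

m, m_prime = 2.0, 6.0               # shift past the minimum, scale past the maximum
theta_new = [(t + m) / m_prime for t in theta]
q0_new = (q0 + m) / m_prime
```

The argument only uses the fact that the weights sum to one, so the same shift works simultaneously for q_1 and q_2.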
15.3 Proof of Lemma 2

This is generically satisfied if |Y_j| ≥ |A_i|, as in Yamamoto (2009b).
15.4 Proof of Lemma 5

15.4.1 Explanation of 1 of Lemma 5: Supplemental Round 1 for λ_j(l+1)

The following is the formal description of the message exchanges in the supplemental round 1 for λ_j(l+1). Let us first introduce the following notation:

Notation 1 For λ̄_j ∈ {G, B}, we define

1. the matrix projecting player i's signals onto the conditional distribution of player j's signals given an action profile a:

Q_{j,i}(a) = [ q(y_{j,m} | a, y_{i,n}) ]_{m=1,...,|Y_j|; n=1,...,|Y_i|},

that is, the |Y_j| × |Y_i| matrix whose (m, n) element is q(y_{j,m} | a, y_{i,n}).

2. the (|Y_j| − |A_i| + 1) × |Y_j| matrix H_j(λ̄_j) and the (|Y_j| − |A_i| + 1) × 1 vector p_j(λ̄_j) such that the affine hull of player j's signal distributions with respect to player i's action when player j takes a_j^{λ̄_j} is represented by

aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ∩ R_+^{|Y_j|} = { y_j ∈ R_+^{|Y_j|} : H_j(λ̄_j) y_j = p_j(λ̄_j) }.   (49)

3. the set of hyperplanes generated by perturbing the right-hand side of the characterization of aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ∩ R_+^{|Y_j|}: for ε ≥ 0,[17]

H_j[ε](λ̄_j) ≡ { y_j ∈ R_+^{|Y_j|} : there exists e ∈ R_+^{|Y_j|−|A_i|+1} such that ‖e‖ ≤ ε and H_j(λ̄_j) y_j = p_j(λ̄_j) + e }.

4. the set of frequencies of player i's signal observations such that the conditional expectation of the frequency of player j's signal observations is close to H_j[ε](λ_j) when the players take (a_j^{λ_j}, a_i^G):

H_{j,i}[ε](λ_j) = { y_i ∈ R_+^{|Y_i|} : there exist e_1 ∈ R_+^{|Y_j|}, e_2 ∈ R_+^{|Y_j|−|A_i|+1} and y_j ∈ R_+^{|Y_j|} with y_j = Q_{j,i}(a_j^{λ_j}, a_i^G) y_i + e_1, H_j(λ_j) y_j = p_j(λ_j) + e_2, and ‖e_1‖, ‖e_2‖ ≤ ε }.

[17] We use the sup norm unless otherwise noted: ‖x‖ = max_i |x_i|.

Without loss of generality, we can make sure that all the elements of H_j(G) and H_j(B) are included in (0, 1), by the following lemma.

Lemma 7 We can take H_j(G) and H_j(B) such that all the elements are in (0, 1).

Proof. Let m_H be the minimum element of H_j(λ̄_j) and M_H the maximum element of H_j(λ̄_j). Let H̃_j(λ̄_j) be the matrix whose (l, m) element is ((H_j(λ̄_j))_{l,m} + |m_H| + 1)/(|M_H| + |m_H| + 2) ∈ (0, 1), and let p̃_j(λ̄_j) be the vector whose lth element is ((p_j(λ̄_j))_l + |m_H| + 1)/(|M_H| + |m_H| + 2).
We will show that

H_j(λ̄_j) ≡ { y_j ∈ R_+^{|Y_j|} : H_j(λ̄_j) y_j = p_j(λ̄_j) } = { y_j ∈ R_+^{|Y_j|} : H̃_j(λ̄_j) y_j = p̃_j(λ̄_j) } ≡ H̃_j(λ̄_j).

1. H_j(λ̄_j) ⊆ H̃_j(λ̄_j): Suppose y_j ∈ H_j(λ̄_j). Since y_j ∈ aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ⊆ aff({0, 1}^{|Y_j|}), we have (1, ..., 1) y_j = 1, and hence H̃_j(λ̄_j) y_j = p̃_j(λ̄_j), as desired.

2. H_j(λ̄_j) ⊇ H̃_j(λ̄_j): Suppose y_j ∉ H_j(λ̄_j). Since aff({q_j(a_j^{λ̄_j}, a_i)}_{a_i}) ⊆ aff({0, 1}^{|Y_j|}), without loss of generality we can assume that one row of H_j(λ̄_j) is parallel to (1, ..., 1) and that the element of p_j(λ̄_j) corresponding to that row is 1. If (1, ..., 1) y_j ≠ 1, then

((1 + |m_H| + 1)/(|M_H| + |m_H| + 2), ..., (1 + |m_H| + 1)/(|M_H| + |m_H| + 2)) y_j = ((1 + |m_H| + 1)/(|M_H| + |m_H| + 2)) (1, ..., 1) y_j ≠ (1 + |m_H| + 1)/(|M_H| + |m_H| + 2),

and y_j ∉ H̃_j(λ̄_j), as desired. If (1, ..., 1) y_j = 1, then there is another row h_j(λ̄_j), with corresponding element p_j(λ̄_j) of p_j(λ̄_j), such that

h_j(λ̄_j) y_j ≠ p_j(λ̄_j).

Let h̃_j(λ̄_j) be the corresponding row of H̃_j(λ̄_j) and p̃_j(λ̄_j) the corresponding element of p̃_j(λ̄_j). Then,

h̃_j(λ̄_j) y_j = (1/(|M_H| + |m_H| + 2)) (h_j(λ̄_j) + (|m_H| + 1)(1, ..., 1)) y_j
  = (1/(|M_H| + |m_H| + 2)) (h_j(λ̄_j) y_j + |m_H| + 1)
  ≠ (1/(|M_H| + |m_H| + 2)) (p_j(λ̄_j) + |m_H| + 1) = p̃_j(λ̄_j),

and so y_j ∉ H̃_j(λ̄_j).
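Lemma 7's construction can be checked numerically on random test data (the constraint matrix H and the vector y below are hypothetical, generated for the check):

```python
import random

# With m_H and M_H the minimum and maximum elements of H, the matrix with
# entries (H_lm + |m_H| + 1)/(|M_H| + |m_H| + 2) has all entries in (0, 1)
# and, on vectors y with sum(y) = 1, cuts out the same solution set as
# H y = p when p is rescaled the same way.
random.seed(0)
H = [[random.uniform(-2.0, 2.0) for _ in range(5)] for _ in range(3)]
raw = [random.random() for _ in range(5)]
y = [x / sum(raw) for x in raw]                          # sum(y) = 1
p = [sum(h * x for h, x in zip(row, y)) for row in H]    # so H y = p holds

mH = min(min(row) for row in H)
MH = max(max(row) for row in H)
c = abs(MH) + abs(mH) + 2
H_t = [[(h + abs(mH) + 1) / c for h in row] for row in H]
p_t = [(v + abs(mH) + 1) / c for v in p]
```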
Given the frequency of player j's signal observations y_j(l, λ_j, 1), the event y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j) corresponds to Box 2 of Figure 9. On the other hand, given the frequency of player i's signal observations y_i(l, λ_j, 1), the event y_i(l, λ_j, 1) ∈ H_i[ε](G) corresponds to Boxes 5, 6 and 7 of Figure 9. In addition, since

E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_{j,t}} | {y_{i,t}}_{t∈T_i(l,λ_j,1)}, a_j^{λ_j}, a_i^G ] = Q_{j,i}(a_j^{λ_j}, a_i^G) y_i(l, λ_j, 1),

the event y_i(l, λ_j, 1) ∈ H_{j,i}[ε](λ_j) corresponds to Box 5 or 6 of Figure 9, depending on whether λ_j = G or B.

Strategies The strategy in the supplemental round 1 for λ_j(l+1) is as follows: player j (the sender) with λ_j(l+1) = λ̄_j ∈ {G, B} takes a_j^{λ̄_j} and player i (the receiver) takes a_i^G for T^{1/2} periods in the supplemental round 1 for λ_j(l+1), that is, for t ∈ T(l, λ_j, 1).
Requirement by Player j As we have mentioned in Section 9.2.1, player j requires player i to infer λ_j(l+1) correctly only if y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j).

Instead of using y_j(l, λ_j, 1) directly, player j constructs random variables {Ĥ_{j,t}}_{t∈T_j(l,λ_j,1)} as follows. After taking a_j^{λ̄_j} and observing y_{j,t}, player j calculates H_j(λ̄_j) 1_{y_{j,t}}. Then, player j draws (|Y_j| − |A_i| + 1) random variables independently from the uniform distribution on [0, 1]. If the lth realization of these random variables is less than the lth element of H_j(λ̄_j) 1_{y_{j,t}}, then the lth element of Ĥ_{j,t} is equal to 1. Otherwise, the lth element of Ĥ_{j,t} is equal to 0. From Lemma 7,

Pr( (Ĥ_{j,t})_l = 1 | a, y ) = (H_j(λ̄_j) 1_{y_{j,t}})_l ∈ (0, 1)   (50)

for all a and y.

Given {Ĥ_{j,t}}_{t∈T_j(l,λ_j,1)}, let

Ĥ_j(l, λ_j, 1) = (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} Ĥ_{j,t}.   (51)

Player j requires player i to infer λ_j(l+1) correctly only if

‖ Ĥ_j(l, λ_j, 1) − H_j(λ̄_j) y_j(l, λ_j, 1) ‖ = ‖ (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} Ĥ_{j,t} − (1/(T^{1/2} − 1)) Σ_{t∈T_j(l,λ_j,1)} H_j(λ̄_j) 1_{y_{j,t}} ‖ ≤ ε/2   (52)

and

‖ Ĥ_j(l, λ_j, 1) − p_j(λ̄_j) ‖ ≤ ε/2.   (53)
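The thresholding construction in (50)-(51) can be simulated directly; the matrix H and the signal process below are hypothetical stand-ins for H_j(λ̄_j) and {y_{j,t}}:

```python
import random

# Each period, the vector H 1_{y_t} (a column of H, entries in (0, 1) as
# Lemma 7 permits) is turned into a 0/1 vector by independent uniform draws;
# the round average then concentrates on the average of the per-period
# conditional means, which is exactly what requirement (52) checks.
H = [[0.2, 0.5, 0.8],
     [0.6, 0.3, 0.4]]

def one_period(y_t, rng):
    # coordinate l equals 1 with probability (H 1_{y_t})_l = H[l][y_t]
    return [1 if rng.random() < H[l][y_t] else 0 for l in range(len(H))]

rng = random.Random(1)
signals = [rng.randrange(3) for _ in range(20_000)]
draws = [one_period(y, rng) for y in signals]
avg = [sum(d[l] for d in draws) / len(draws) for l in range(len(H))]
target = [sum(H[l][y] for y in signals) / len(signals) for l in range(len(H))]
```

By the central limit theorem, the gap between `avg` and `target` is of order T^{-1/2}, so a fixed tolerance ε/2 fails only with exponentially small probability.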
In summary, there are the following cases:

1. (52) is not satisfied. Let σ_j(l, λ_j, 1) = B denote this event. This case is excluded from player i's consideration. From (50) and (51), by the central limit theorem, conditional on {y_{j,t}}_{t∈T(l,λ_j,1)}, this occurs with probability no more than of order exp(−T^{1/2}).

2. (52) is satisfied. Let σ_j(l, λ_j, 1) = G denote this event.

(a) (53) is not satisfied. Let β_j(l, λ_j, 1) = B denote this event. Then, player j makes player i indifferent between any action profile sequences in the subsequent rounds. This case is not excluded from player i's consideration. Since we take the affine hull with respect to player i's action in (49), together with (50), the central limit theorem implies that this occurs with probability no more than of order exp(−T^{1/2}) ex ante, at the beginning of the supplemental round 1, regardless of player i's strategy.

(b) (53) is satisfied. Depending on the supplemental round 2, player j may require player i to infer λ_j(l+1) correctly. Let β_j(l, λ_j, 1) = G denote this event. Note that (52) and (53) imply ‖ H_j(λ̄_j) y_j(l, λ_j, 1) − p_j(λ̄_j) ‖ ≤ ε by the triangle inequality, and so y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j). Therefore, player j requires player i to infer λ_j(l+1) correctly only if y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j), as mentioned.

In addition, if 1 is the case, we define β_j(l, λ_j, 1) ∈ {G, B} as in the case with σ_j(l, λ_j, 1) = G: if (53) is not satisfied, then β_j(l, λ_j, 1) = B; if (53) is satisfied, then β_j(l, λ_j, 1) = G.

See the upper half of Figure 12 for the illustration. Figure 12 corresponds to Figure 9 in the intuitive explanation. Box 2 of Figure 9 corresponds to Box 5 in Figure 12, and it is still true that player j requires player i to infer λ_j(l+1) correctly only if y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j). Therefore, the sufficient condition for the almost optimal inference by player i is still valid.

[Insert Figure 12]
Inference by Player i As mentioned in Section 9.2.1, if player i infers λ_j(l+1) from the supplemental round 1, then player i infers λ_j(l+1) = λ_j ∈ {G, B} if

E[ (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_j} | y_i(l, λ_j, 1), a_j^{λ_j}, a_i^G ]

is close to aff({q_j(a_j^{λ_j}, a_i)}_{a_i}). In other words, player i infers λ_j(l+1) = G if y_i(l, λ_j, 1) ∈ H_{j,i}[ε](G) and λ_j(l+1) = B if y_i(l, λ_j, 1) ∈ H_{j,i}[ε](B).

Instead of using y_i(l, λ_j, 1) directly, player i constructs random variables {Ω̂_{i,t}(G)}_{t∈T_i(l,λ_j,1)}, {Ω̂_{i,t}(B)}_{t∈T_i(l,λ_j,1)} and {Ω̂^G_{i,t}}_{t∈T_i(l,λ_j,1)} as follows.

Construction of {Ω̂_{i,t}(λ_j)}_{t∈T_i(l,λ_j,1)} with λ_j ∈ {G, B} After taking a_i^G and observing y_{i,t}, player i calculates H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}}. Then, player i draws |Y_j| − |A_i| + 1 random variables independently from the uniform distribution on [0, 1]. If the lth realization of these random variables is less than the lth element of H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}}, then the lth element of Ω̂_{i,t}(λ_j) is equal to 1. Otherwise, the lth element of Ω̂_{i,t}(λ_j) is equal to 0. Since all the elements of H_j(λ_j) are in (0, 1) by Lemma 7 and Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}} is a conditional probability distribution, we have that each element of H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}} lies in (0, 1), and

Pr( (Ω̂_{i,t}(λ_j))_l = 1 | a, y ) = (H_j(λ_j) Q_{j,i}(a_j^{λ_j}, a_i^G) 1_{y_{i,t}})_l.   (54)

Given {Ω̂_{i,t}(λ_j)}_{t∈T_i(l,λ_j,1)}, let

Ω̂_i(λ_j)(l, λ_j, 1) = (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} Ω̂_{i,t}(λ_j).   (55)
Construction of {ψ^G_{i,t}}_{t∈T_i(l,λ_j,1)}   After taking a_i^G and observing y_{i,t}, player i calculates H_i(G)1_{y_{i,t}}. Then, player i draws (|Y_i| − |A_j| + 1) random variables independently from the uniform distribution on [0, 1]. If the lth realization of these random variables is less than the lth element of H_i(G)1_{y_{i,t}}, then the lth element of ψ^G_{i,t} is equal to 1. Otherwise, the lth element of ψ^G_{i,t} is equal to 0. By Lemma 7,

Pr( (ψ^G_{i,t})_l = 1 | a, y ) = ( H_i(G)1_{y_{i,t}} )_l ∈ (0, 1).   (56)

Given {ψ^G_{i,t}}_{t∈T_i(l,λ_j,1)}, let

ψ_i^G(l, λ_j, 1) = (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} ψ^G_{i,t}.   (57)
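Both constructions share one randomization step: a vector with entries in (0, 1) is converted, coordinate by coordinate, into {0, 1} draws by comparison with independent uniform random variables, which is exactly what makes (54) and (56) hold. A minimal numerical sketch (the vector v is a hypothetical stand-in for H_j(λ_j)Q_{j,i}(a_j^{λ_j}, a_i^G)1_{y_{i,t}} or H_i(G)1_{y_{i,t}}; its values are illustrative):

```python
import numpy as np

def binarize(v, rng):
    """Return a 0/1 vector whose lth element equals 1 with probability v[l],
    exactly as in (54) and (56): compare v element-wise with fresh uniforms."""
    u = rng.uniform(size=v.shape)  # one independent U[0, 1] draw per element
    return (u < v).astype(float)

rng = np.random.default_rng(0)
v = np.array([0.2, 0.5, 0.8])  # stand-in for the conditional-probability vector
draws = np.array([binarize(v, rng) for _ in range(200_000)])
# averaging the draws, as in (55) and (57), recovers v up to sampling error
assert np.allclose(draws.mean(axis=0), v, atol=0.01)
```

Because each element is an independent Bernoulli draw with the stated success probability, the averages ψ_i(λ_j)(l, λ_j, 1) and ψ_i^G(l, λ_j, 1) concentrate around the corresponding conditional expectations, which is what the conditions (58)–(60) exploit.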
Player i infers λ_j(l + 1)(i) from the supplemental round 1 if and only if

‖ψ_i(G)(l, λ_j, 1) − H_j(G)Q_{j,i}(a_j^G, a_i^G)y_i(l, λ_j, 1)‖
= ‖(1/(T^{1/2} − 1)) [ Σ_{t∈T_i(l,λ_j,1)} ψ_{i,t}(G) − Σ_{t∈T_i(l,λ_j,1)} H_j(G)Q_{j,i}(a_j^G, a_i^G)1_{y_{i,t}} ]‖ ≤ ε/2,   (58)

‖ψ_i(B)(l, λ_j, 1) − H_j(B)Q_{j,i}(a_j^B, a_i^G)y_i(l, λ_j, 1)‖
= ‖(1/(T^{1/2} − 1)) [ Σ_{t∈T_i(l,λ_j,1)} ψ_{i,t}(B) − Σ_{t∈T_i(l,λ_j,1)} H_j(B)Q_{j,i}(a_j^B, a_i^G)1_{y_{i,t}} ]‖ ≤ ε/2,   (59)

‖ψ_i^G(l, λ_j, 1) − H_i(G)y_i(l, λ_j, 1)‖
= ‖(1/(T^{1/2} − 1)) [ Σ_{t∈T_i(l,λ_j,1)} ψ^G_{i,t} − Σ_{t∈T_i(l,λ_j,1)} H_i(G)1_{y_{i,t}} ]‖ ≤ ε/2,   (60)

and

‖ψ_i^G(l, λ_j, 1) − p_i(G)‖ ≤ ε/2.   (61)
There are the following cases:

1. (58), (59) or (60) is not satisfied. Let τ_i(l, λ_j, 1) = B denote this event. This case is excluded from player j's consideration. From (54), (55), (56) and (57), by the central limit theorem, conditional on {y_{i,t}}_{t∈T(l,λ_j,1)}, this occurs with probability no more than of order exp(−T^{1/2}).

2. (58), (59) and (60) are satisfied. Let τ_i(l, λ_j, 1) = G denote this event.

(a) (61) is not satisfied. Let ϑ_i(l, λ_j, 1) = B denote this event. This case is also excluded from player j's consideration. Since we take the affine hull with respect to player j's actions in the definition of H_i(G) (this is symmetric to (49)), together with (56), the central limit theorem implies that this occurs with probability no more than of order exp(−T^{1/2}) ex ante at the beginning of the supplemental round 1, regardless of player j's strategy.

(b) (61) is satisfied. Let ϑ_i(l, λ_j, 1) = G denote this event. In this case, player i uses the supplemental round 1 to infer λ_j(l + 1).
For the following classification, define K̄ as follows: since a continuous linear transformation is Lipschitz continuous, we can take K̄ such that, for any i and λ_j ∈ {G, B}, if y_i ∈ H_{j,i}[ε](λ_j), then there exists ε̃ ∈ R_+^{|Y_j|−|A_i|+1} such that

H_j(λ_j)Q_{j,i}(a_j^{λ_j}, a_i^G)y_i = p_j(λ_j) + ε̃,  ‖ε̃‖ ≤ (K̄ − 1)ε.   (62)
Given this K̄, we consider the following subsubcases.

i. If we have

‖ψ_i(G)(l, λ_j, 1) − p_j(G)‖ ≤ K̄ε,   (63)

then player i infers λ_j(l + 1)(i) = G.

ii. If we have

‖ψ_i(B)(l, λ_j, 1) − p_j(B)‖ ≤ K̄ε,   (64)

then player i infers λ_j(l + 1)(i) = B.

iii. Otherwise, player i infers λ_j(l + 1)(i) = B.

In addition, if Case 1 applies, we define ϑ_i(l, λ_j, 1) ∈ {G, B} as in the case with τ_i(l, λ_j, 1) = G: if (61) is not satisfied, then ϑ_i(l, λ_j, 1) = B; if (61) is satisfied, then ϑ_i(l, λ_j, 1) = G.
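The subsubcases above amount to a nearest-target test in Euclidean norm. The following sketch shows the decision rule; the target vectors, K̄ and ε are illustrative stand-ins rather than the paper's objects, and by Lemma 8 the two distance tests are generically mutually exclusive for small ε, so the order of the checks is immaterial:

```python
import numpy as np

def infer(psi_G, psi_B, p_G, p_B, K_bar, eps):
    """Cases 2-(b)-i through 2-(b)-iii: infer G only when the empirical
    statistic for G is within K_bar * eps of its target p_G; otherwise
    infer B (either because the B-test passes or by default)."""
    if np.linalg.norm(psi_G - p_G) <= K_bar * eps:
        return "G"          # case 2-(b)-i
    if np.linalg.norm(psi_B - p_B) <= K_bar * eps:
        return "B"          # case 2-(b)-ii
    return "B"              # case 2-(b)-iii: inconclusive, default to B

p_G, p_B = np.array([0.5, 0.5]), np.array([0.3, 0.7])  # illustrative targets
assert infer(p_G + 0.01, p_B + 0.5, p_G, p_B, K_bar=2.0, eps=0.05) == "G"
assert infer(p_G + 0.5, p_B + 0.01, p_G, p_B, K_bar=2.0, eps=0.05) == "B"
assert infer(p_G + 0.5, p_B + 0.5, p_G, p_B, K_bar=2.0, eps=0.05) == "B"
```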
The following lemma shows that the above inferences are well defined. The intuition is the same as that in Section 9.2.1.

Lemma 8  Generically, the following statement is true: there exists ε̄ > 0 such that, for all ε < ε̄, if Case 2-(b)-i above is the case, then Case 2-(b)-ii is not the case. Symmetrically, if Case 2-(b)-ii above is the case, then Case 2-(b)-i is not the case.
Proof. (60) and (61) imply that there exists ε_1 with

H_i(G)y_i = p_i(G) + ε_1,  ‖ε_1‖ ≤ ε.

(58) and (63) imply that there exists ε_2 with

H_j(G)Q_{j,i}(a_j^G, a_i^G)y_i = p_j(G) + ε_2,  ‖ε_2‖ ≤ (K̄ + 1)ε.

On the other hand, (59) and (64) imply that there exists ε_3 with

H_j(B)Q_{j,i}(a_j^B, a_i^G)y_i = p_j(B) + ε_3,  ‖ε_3‖ ≤ (K̄ + 1)ε.

Therefore, it suffices to show that, for sufficiently small e, for any ‖ε‖ ≤ e, there does not exist y_i ∈ R_+^{|Y_i|} such that

[H_i(G); H_j(G)Q_{j,i}(a_j^G, a_i^G); H_j(B)Q_{j,i}(a_j^B, a_i^G)] y_i = (p_i(G); p_j(G); p_j(B)) + ε,   (65)

where [·; ·; ·] stacks the three matrices and (·; ·; ·) stacks the corresponding vectors.
For generic q, with ε = 0, such a y_i does not exist since we have |Y_i| degrees of freedom and |Y_i| + 2|Y_j| − |A_j| − 2|A_i| + 1 constraints. Note that one row of each of H_i(G), H_j(G)Q_{j,i}(a_j^G, a_i^G) and H_j(B)Q_{j,i}(a_j^B, a_i^G) is parallel to 1.

By the Farkas lemma, this nonexistence of y_i for ε = 0 is equivalent to the existence of x ∈ R^{|Y_i|+2|Y_j|−|A_j|−2|A_i|+1} with

[H_i(G); H_j(G)Q_{j,i}(a_j^G, a_i^G); H_j(B)Q_{j,i}(a_j^B, a_i^G)]′ x ≤ 0,  (p_i(G); p_j(G); p_j(B)) · x > 0.

Hence, for sufficiently small e, for any ‖ε‖ ≤ e, the same x satisfies

[H_i(G); H_j(G)Q_{j,i}(a_j^G, a_i^G); H_j(B)Q_{j,i}(a_j^B, a_i^G)]′ x ≤ 0,  ((p_i(G); p_j(G); p_j(B)) + ε) · x > 0.

Again, by the Farkas lemma, (65) does not have a solution for sufficiently small e.
See the lower half of Figure 13. Case 1 above is Box 7 of Figure 13. Case 2-(a) is Box 12. Case 2-(b)-i corresponds to Box 9, 2-(b)-ii corresponds to Box 10, and 2-(b)-iii corresponds to Box 11.

Figure 13 corresponds to Figure 11 in the intuitive explanation. Box 2 of Figure 11 corresponds to Box 9 in Figure 13, Box 3 of Figure 11 corresponds to Box 10 of Figure 13, and Box 4 of Figure 11 corresponds to Box 11 of Figure 13. The cases where player i answers "No" to the question in Box 1 of Figure 11 are included in Boxes 7 and 12 of Figure 13.
[Insert Figure 13]
Notice that whenever player i infers λ_j(l + 1) = B, at least one of τ_i(l) = B, θ_i(l) = B, τ_i(l, λ_j, 1) = B and ϑ_i(l, λ_j, 1) = B happens. Reversing the roles of i and j, whenever player j takes a_j(l + 1) ≠ a(x_j) (which implies λ_i(l + 1)(j) = B), any action profile is optimal for player i.
Finally, we show that Condition 2 in Section 9.1 is satisfied.
Lemma 9  For any ε ∈ (0, ε̄), for sufficiently large T, for any i ∈ I and λ_j(l + 1) ∈ {G, B},

1. Conditional on λ_j(l + 1) and θ_j(l) = θ_j(l, λ_j, 1) = G, player i believes that λ_j(l + 1)(i) = λ_j(l + 1) or κ_j(l, λ_j, 1) = B with probability no less than 1 − exp(−T^{2/5}).

2. Conditional on λ_j(l + 1) and θ_j(l) = θ_j(l, λ_j, 1) = G, any history of player i can happen with probability at least exp(−T^{2/5}).

3. Conditional on λ_j(l + 1) and τ_i(l) = τ_i(l, λ_j, 1) = ϑ_i(l, λ_j, 1) = G, conditional on any history in the supplemental rounds, player j believes that any λ_j(l + 1)(i) is possible with probability no less than exp(−T^{2/5}).

4. θ_j(l, λ_j, 1) = κ_j(l, λ_j, 1) = τ_i(l, λ_j, 1) = ϑ_i(l, λ_j, 1) = G with probability no less than 1 − exp(−T^{2/5}).

5. The distribution of θ_j(l, λ_j, 1) is independent of player i's strategy with probability no less than 1 − exp(−T^{2/5}).

6. The distribution of κ_j(l, λ_j, 1) is independent of player i's strategy.

7. The distribution of τ_i(l, λ_j, 1) is independent of player j's strategy with probability no less than 1 − exp(−T^{2/5}).

8. The distribution of ϑ_i(l, λ_j, 1) is independent of player j's strategy.

Notice that, as we have mentioned, player i excludes the case with κ_j = B in 1 and 2 of Lemma 9, and player j excludes the case with τ_i = B or ϑ_i = B in 3 of Lemma 9.
Proof.

1. Suppose λ_j(l + 1) = λ̄_j. If λ_j(l + 1)(i) = λ̄_j, then we are done. So, let us concentrate on λ_j(l + 1)(i) ≠ λ̄_j. Forget about the conditioning on θ_j(l, λ_j, 1) = G for a while. If λ_j(l + 1)(i) ≠ λ̄_j but player i uses the supplemental round 1 to infer λ_j(l + 1), then

‖ψ_i(λ̄_j)(l, λ_j, 1) − p_j(λ̄_j)‖ > K̄ε

and player i has (58) and (59). Therefore, by the triangle inequality,

‖H_j(λ̄_j)Q_{j,i}(a_j^{λ̄_j}, a_i^G)y_i(l, λ_j, 1) − p_j(λ̄_j)‖ > (K̄ − 1)ε.

From (62), this implies that y_i(l, λ_j, 1) ∉ H_{j,i}[ε](λ̄_j), which means that any y_j ∈ Δ({1_{y_j}}_{y_j∈Y_j}) within ε of the conditional expectation of (1/(T^{1/2} − 1)) Σ_{t∈T_i(l,λ_j,1)} 1_{y_j}, conditional on the true message λ_j(l + 1) = λ̄_j, is not included in H_j[ε](λ̄_j). Since T_i(l, λ_j, 1) and T_j(l, λ_j, 1) differ by at most two periods, this implies that any y_j ∈ Δ({1_{y_j}}_{y_j∈Y_j}) within ε/2 of player i's conditional expectation of y_j(l, λ_j, 1), conditional on the true message λ_j(l + 1) = λ̄_j, is not included in H_j[ε](λ̄_j). Since y_j(l, λ_j, 1) ∈ H_j[ε](λ̄_j) is a necessary condition for κ_j(l, λ_j, 1) = G, player i puts belief no more than exp(−T^{3/7}) on κ_j(l, λ_j, 1) = G by Hoeffding's inequality.

Since Pr(θ_j(l, λ_j, 1) = G | {y_{j,t}}_t) ≥ 1 − exp(−T^{3/7}) for all {y_{j,t}}_t, we have

Pr({y_{j,t}}_t | θ_j(l, λ_j, 1) = G, {y_{i,t}}_t)
= Pr(θ_j(l, λ_j, 1) = G | {y_{i,t}}_t, {y_{j,t}}_t) Pr({y_{j,t}}_t | {y_{i,t}}_t) / Pr(θ_j(l, λ_j, 1) = G | {y_{i,t}}_t)
= Pr(θ_j(l, λ_j, 1) = G | {y_{j,t}}_t) Pr({y_{j,t}}_t | {y_{i,t}}_t) / Σ_{{y_{j,t}}_t} Pr(θ_j(l, λ_j, 1) = G | {y_{j,t}}_t) Pr({y_{j,t}}_t | {y_{i,t}}_t)
∈ [ (1 − 2 exp(−T^{3/7})) Pr({y_{j,t}}_t | {y_{i,t}}_t), (1 + 2 exp(−T^{3/7})) Pr({y_{j,t}}_t | {y_{i,t}}_t) ].   (66)

Hence, even after conditioning on θ_j(l, λ_j, 1) = G, player i believes κ_j(l, λ_j, 1) = G with probability at most exp(−T^{2/5}). In addition, conditional on λ_j(l + 1), θ_j(l) is independent of the supplemental rounds. Therefore, we are done.
2. Suppose that κ_j(l, λ_j, 1) = B never happens. Conditional on λ_j(l + 1) = λ̄_j, any (y_{i,t})_{t∈T(l,λ_j,1)} can occur with probability at least

( min_{y_i,a} q(y_i | a) )^{T^{1/2}}.

Hence, for

0 < e < 1/(− log( min_{y_i,a} q(y_i | a) )),

any history of player i can happen with probability at least exp(−(1/e)T^{1/2}).

By the proof symmetric to (66), conditioning on θ_j(l, λ_j, 1) = G does not change the probability much. Again, θ_j(l) is independent of the supplemental rounds.
3. Suppose that τ_i(l, λ_j, 1) = B never happens. Conditional on (a_t, y_{j,t})_{t∈T(l,λ_j,1)}, any y_i(l, λ_j, 1) can occur with probability at least

( min_{y_j,y_i,a} q(y_i | a, y_j) )^{T^{1/2}}.

Further, for any λ_j ∈ {G, B}, for y_i(l, λ_j, 1) sufficiently close to q(a_j^{λ_j}, a_i^G), (61) and (63) are satisfied and λ_j(l + 1)(i) = λ_j. Hence, for

0 < e < 1/(− log( min_{y_j,y_i,a} q(y_i | a, y_j) )),

λ_j(l + 1)(i) = λ_j can happen together with (61) (and so ϑ_i(l, λ_j, 1) = G) with probability no less than exp(−(1/e)T^{1/2}).

By the proof symmetric to (66), conditioning on τ_i(l, λ_j, 1) = G does not change the belief much. Again, τ_i(l) is independent of the supplemental rounds.

4 to 8. These follow from Hoeffding's inequality and the fact that we take the affine hull with respect to a_j in the definition of H_i(G).
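The exponential bounds in Lemma 9 and its proof rest on Hoeffding's inequality: the mean of n i.i.d. draws in [0, 1] deviates from its expectation by at least ε with probability at most 2 exp(−2nε²); with n of order T^{1/2} periods per supplemental round this yields the exp(−T^{3/7})- and exp(−T^{2/5})-type orders used above. A small simulation with illustrative n and ε (not the paper's values):

```python
import math
import numpy as np

def hoeffding_bound(n, eps):
    """Two-sided Hoeffding bound for the mean of n i.i.d. draws in [0, 1]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)

rng = np.random.default_rng(1)
n, eps, p = 10_000, 0.05, 0.5
# 5,000 experiments: empirical mean of n fair-coin flips in each
means = rng.binomial(n, p, size=5_000) / n
empirical = float(np.mean(np.abs(means - p) >= eps))
# the observed tail frequency never exceeds the theoretical bound
assert empirical <= hoeffding_bound(n, eps)
```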
15.4.2 Explanation of 2 of Lemma 5: Supplemental Round 2 for λ_j(l + 1)

The following is the formal description of the message exchanges in the supplemental round 2 for λ_j(l + 1).

Strategies  Player j (the sender) with the message λ_j(l + 1) = λ̄_j ∈ {G, B} determines z_j(λ̄_j) ∈ {G, B, M} as follows:

z_j(λ̄_j) = λ̄_j with probability 1 − 2η;
           the element of {G, B} \ {λ̄_j} with probability η;
           M with probability η.

Here, η is determined in Section 10. Player j with z_j(λ̄_j) takes

α_j^{z_j(λ̄_j)} = a_j^G if z_j(λ̄_j) = G;
               a_j^B if z_j(λ̄_j) = B;
               (1/2)a_j^G + (1/2)a_j^B if z_j(λ̄_j) = M

for T^{1/2} periods in the supplemental round 2 for λ_j(l + 1), that is, for t ∈ T(l, λ_j, 2). Player i (the receiver) takes a_i^G.
Requirement of Player j  Player j who has had θ_j(l, λ_j, 1) = B or κ_j(l, λ_j, 1) = B in the supplemental round 1 has already made player i indifferent between any action profiles in the subsequent rounds. Player j who has θ_j(l, λ_j, 1) = κ_j(l, λ_j, 1) = G requires player i to infer λ_j(l + 1) correctly if and only if z_j(λ̄_j) = λ̄_j; λ_j(l + 1)(i) = B is required if z_j(λ̄_j) ≠ λ̄_j.

See Figure 12 again. Notice that Box 9 is the only place where player j requires player i to infer λ_j(l + 1) correctly, and this happens only if κ_j(l, λ_j, 1) = G and z_j(λ̄_j) = λ̄_j.
Inference of Player i  If player i has used the supplemental round 1 to infer λ_j(l + 1), player i sticks to that inference. Otherwise, player i infers λ_j(l + 1)(i) using the supplemental round 2. Remember that player i uses the supplemental round 2 only if τ_i(l, λ_j, 1) = B or ϑ_i(l, λ_j, 1) = B, and player j (the sender) excludes these cases from consideration (see Figure 13). Therefore, player j does not care about player i's inference in the supplemental round 2.

Based on the signal observations in the supplemental round 2, {y_{i,t}}_{t∈T(l,λ_j,2)}, the belief of player i can be classified into the following three cases:
Lemma 10  For each λ̄_j ∈ {G, B}, conditional on λ_j(l + 1) = λ̄_j, one of the following three is correct:

1. the likelihood ratio of z_j(λ̄_j) = G compared to z_j(λ̄_j) = B is no less than exp(T^{3/7});

2. the likelihood ratio of z_j(λ̄_j) = B compared to z_j(λ̄_j) = G is no less than exp(T^{3/7});

3. the likelihood ratio of z_j(λ̄_j) = M compared to each of z_j(λ̄_j) = G and z_j(λ̄_j) = B is no less than exp(T^{3/7}).

If 1 of Lemma 10 is the case, then player i infers λ_j(l + 1)(i) = G. This is almost optimal since we condition on λ_j(l + 1) = λ̄_j ∈ {G, B} and player i is indifferent between any actions if z_j(λ̄_j) ≠ λ̄_j. If 2 of Lemma 10 is the case, then player i infers λ_j(l + 1)(i) = B. This is almost optimal for the same reason. If 3 of Lemma 10 is the case, then player i infers λ_j(l + 1)(i) = B. In this case, z_j(λ̄_j) = M ≠ λ̄_j is highly likely and player i is indifferent between any actions. Therefore, this is also almost optimal.
Proof. Conditional on λ_j(l + 1) = λ̄_j,

Pr(z_j(λ̄_j) = z_j | λ̄_j, {y_{i,t}}_{t∈T(l,λ_j,2)}) / Pr(z_j(λ̄_j) = z′_j | λ̄_j, {y_{i,t}}_{t∈T(l,λ_j,2)})
= [ Pr({y_{i,t}}_{t∈T(l,λ_j,2)} | z_j(λ̄_j) = z_j) / Pr({y_{i,t}}_{t∈T(l,λ_j,2)} | z_j(λ̄_j) = z′_j) ] × [ Pr(z_j(λ̄_j) = z_j | λ̄_j) / Pr(z_j(λ̄_j) = z′_j | λ̄_j) ],

and Pr(z_j(λ̄_j) = z_j | λ̄_j) / Pr(z_j(λ̄_j) = z′_j | λ̄_j) is bounded by [η/(1 − 2η), (1 − 2η)/η]. With y_i(l, λ_j, 2) being the frequency of each y for {y_{i,t}}_{t∈T(l,λ_j,2)}, log Pr({y_{i,t}}_{t∈T(l,λ_j,2)} | z_j(λ̄_j) = z_j) is expressed as T^{1/2} L(y_i(l, λ_j, 2), z_j) with

L(y_i(l, λ_j, 2), z_j) = y_{i,1}(l, λ_j, 2) log q(y_{i,1} | a_i^G, α_j^{z_j}) + ⋯ + y_{i,|Y_i|}(l, λ_j, 2) log q(y_{i,|Y_i|} | a_i^G, α_j^{z_j}).
Hence, it suffices to show that there exists Δ̃ such that, with

L(f_i, z_j, z′_j) ≡ L(f_i, z_j) − L(f_i, z′_j),

for any f_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}), one of the following is true:

1. z_j(λ̄_j) = G is more likely than z_j(λ̄_j) = B: L(f_i, G, B) ≥ Δ̃;

2. z_j(λ̄_j) = B is more likely than z_j(λ̄_j) = G: L(f_i, B, G) ≥ Δ̃;

3. z_j(λ̄_j) = M is more likely than z_j(λ̄_j) = G and B: L(f_i, M, G) ≥ Δ̃ and L(f_i, M, B) ≥ Δ̃.

Generically, we can assume that for each k ∈ {1, …, |Y_i|},

q(y_{i,k} | a_i^G, α_j^G) ≠ q(y_{i,k} | a_i^G, α_j^B).   (67)
Let α_j^β = β a_j^G + (1 − β) a_j^B for β ∈ [0, 1] and consider

g(f_i, β) = f_{i,1} log q(y_{i,1} | a_i^G, α_j^β) + ⋯ + f_{i,|Y_i|} log q(y_{i,|Y_i|} | a_i^G, α_j^β).

Then,

d²g(f_i, β)/dβ² = − Σ_{k=1}^{|Y_i|} f_{i,k} { (q(y_{i,k} | a_i^G, α_j^G) − q(y_{i,k} | a_i^G, α_j^B)) / q(y_{i,k} | a_i^G, α_j^β) }² < 0

for any f_i because of (67). Hence, g(f_i, β) is strictly concave in β. Therefore, since L(f_i, z_j, z̃_j) is the difference in g(f_i, β) between the weights corresponding to z_j and z̃_j, one of the following is true:

1. z_j(λ̄_j) = G is more likely than z_j(λ̄_j) = B: L(f_i, G, B) > 0;

2. z_j(λ̄_j) = B is more likely than z_j(λ̄_j) = G: L(f_i, B, G) > 0;

3. z_j(λ̄_j) = M is more likely than z_j(λ̄_j) = G and B: L(f_i, M, G) > 0 and L(f_i, M, B) > 0.

Hence,

max{ L(f_i, G, B), L(f_i, B, G), min{L(f_i, M, G), L(f_i, M, B)} } > 0.

Since the left-hand side is continuous in f_i and Δ({1_{y_i}}_{y_i∈Y_i}) is compact, there exists Δ̃ > 0 such that

max{ L(f_i, G, B), L(f_i, B, G), min{L(f_i, M, G), L(f_i, M, B)} } > Δ̃

for all f_i ∈ Δ({1_{y_i}}_{y_i∈Y_i}), as desired.
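The key step, strict concavity of the mixture log-likelihood, is easy to verify numerically: for any frequency vector f_i, g(f_i, β) at a midpoint strictly exceeds the average of its endpoint values whenever (67) holds. A sketch with hypothetical distributions q_G = q(· | a_i^G, α_j^G) and q_B = q(· | a_i^G, α_j^B) chosen to satisfy (67):

```python
import numpy as np

def g(f, beta, q_G, q_B):
    """Mixture log-likelihood g(f_i, beta) with mixing weight beta on q_G."""
    q = beta * q_G + (1.0 - beta) * q_B
    return float(np.dot(f, np.log(q)))

q_G = np.array([0.60, 0.30, 0.10])   # hypothetical q(. | a_i^G, alpha_j^G)
q_B = np.array([0.20, 0.35, 0.45])   # hypothetical q(. | a_i^G, alpha_j^B)
f = np.random.default_rng(2).dirichlet(np.ones(3))  # arbitrary frequency f_i
b0, b1 = 0.2, 0.9
# strict concavity: the midpoint value strictly exceeds the chord's value
assert g(f, 0.5 * (b0 + b1), q_G, q_B) > 0.5 * (g(f, b0, q_G, q_B) + g(f, b1, q_G, q_B))
```

Strict concavity in β is what rules out a signal frequency under which G, B and M are all (nearly) equally likely, which is exactly the trichotomy the compactness argument turns into the uniform bound Δ̃.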
15.4.3 Proof of 3-(a) to 3-(f)

This follows from Lemma 9 and the conditional independence of each round.

15.5 Proof of Lemma 6

Since λ_j(l) = λ_j(l)(i) always holds for σ_i(B), we concentrate on σ_i(G).
Since once λ_j(l̃) = B is induced, λ_j(l̃′) = B holds for all the following rounds, there exists a unique l* such that λ_j(l̃) = B is initially induced in the (l* + 1)th review round: λ_j(1) = ⋯ = λ_j(l*) = G and λ_j(l* + 1) = ⋯ = λ_j(L) = B. Similarly, there exists l_* with λ_j(1)(i) = ⋯ = λ_j(l_*)(i) = G and λ_j(l_* + 1)(i) = ⋯ = λ_j(L)(i) = B. If λ_j(L) = G (λ_j(L)(i) = G, respectively), then define l* = L (l_* = L, respectively).

Then, there are the following three cases:

• l* = l_*: This means λ_j(l) = λ_j(l)(i) for all l, as desired.

• l* > l_*: This means that player i in the supplemental rounds 1 and 2 for λ_j(l_* + 1) inferred λ_j(l_* + 1)(i) = B. Then, for any l ≥ l_* + 1, by 1 of Lemma 5, player i in the lth review round believes, conditional on θ_j(r̃) = ϑ_j(r̃) = G for all r̃ < l, that "λ_j(l_* + 1)(i) = λ_j(l_* + 1) or κ_j(l_*, λ_j, 1) = B or κ_j(l_*, λ_j, 2) = B" with probability no less than 1 − exp(−T^{1/3}), as desired.

• l* < l_*: There are the following two cases:

– Player i was ready to listen in the l_*th block: by the same reason as above, we are done.

– Player i was not ready to listen in the l_*th block (this means that player i has not deviated until the end of the main round of the l_*th block). This means that 3 of Lemma 4 is applicable and player i at the end of the l_*th review round believes that λ_j(l_* + 1) = G with probability no less than 1 − exp(−T^{4/5}).
As we have mentioned in Section 9.1, player j's continuation strategy reveals λ_j(l + 1) through (i) the strategy in the supplemental rounds and (ii) λ_i(l + 1). From 2 of Lemma 5, conditional on λ_j(l + 1) and θ_j(l_*) = θ_j(l_*, λ_j, 1) = τ_j(l_*, λ_i, 1) = ϑ_j(l_*, λ_i, 1) = G, any history of player i can happen in the supplemental rounds with probability no less than exp(−T^{2/5}). Hence, the update from (i) is bounded by exp(T^{2/5}). In addition, for λ_i(l + 1), player j is ready to listen with probability at least η, and 3 of Lemma 5 (with the roles of i and j reversed) implies that any λ_i(l + 1)(j) happens with probability no less than exp(−T^{2/5}). Hence, the update from (ii) is bounded by (1/η) exp(T^{2/5}). Therefore, after observing player j's continuation strategy, player i believes λ_j(l + 1) = B with probability at most

exp(−T^{4/5}) (1/η) exp(2T^{2/5}) / (1 − exp(−T^{4/5})) < exp(−T^{1/3})   (68)

as desired.
15.6 Proof of Proposition 1

For (6), it suffices to have

Δπ_i(x, l, λ_j) ≤ 0 if x_j = G;  Δπ_i(x, l, λ_j) ≥ 0 if x_j = B,   (69)

|Δπ_i(x, l, λ_j)| ≤ max_{i,a} 2|u_i(a)| T   (70)

for all x ∈ {G, B}², l ∈ {1, …, L} and λ_j ∈ {G, B}.

To see why (69) and (70) are sufficient for (6), notice the following: (70) implies uniform boundedness. (69) implies that π_i(x_j, h_j^{T_P+1}, l) > −ρT with x_j = G, or π_i(x_j, h_j^{T_P+1}, l) < ρT with x_j = B, only if λ_j(l) = G and X_j(l) ∉ [q_2T − 2εT, q_2T + 2εT]. Since we have λ_j(l̃) = B for l̃ > l after those events by (28), π_i(x_j, h_j^{T_P+1}, l) ≤ −ρT with x_j = G, or π_i(x_j, h_j^{T_P+1}, l) ≥ ρT with x_j = B, except for one round. In addition, (69) implies −ρ̄_L T − ρT ≤ π_i(x_j, h_j^{T_P+1}, l) ≤ ρ̄_L T + ρT for all x_j. Hence, in total, Σ_{l=1}^L π_i(x_j, h_j^{T_P+1}, l) ≤ ρ̄_L T − LρT < 0 with x_j = G and Σ_{l=1}^L π_i(x_j, h_j^{T_P+1}, l) ≥ −ρ̄_L T + LρT > 0 with x_j = B. The strict inequalities follow from (11). Therefore, (6) is satisfied.

Therefore, we prove Proposition 1 with (6) replaced by (69) and (70).
1 follows from (32). 2-(a) is true since θ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r implies that player j has inferred λ_i(l + 1)(j) from the supplemental round 1 (see Figure 13, reversing the roles of i and j). 2-(b) is true since taking a_i^G gives the almost optimal inference and (35) gives a high reward on a_i^G.

Therefore, we are left to show the existence of Δπ_i(x, l, λ_j) with (69) and (70), and 2-(c), by backward induction. From Lemmas 4 and 5, the distribution of θ_j, ϑ_j and κ_j is independent of player i's strategy with high probability, and so we can neglect the effect of player i's strategy on θ_j, ϑ_j and κ_j.
In the Lth review round, consider the case with x_i = B first. If player j uses (33), then D_i is strictly optimal. If player j uses (32), then any action is optimal.

Consider the case with x_i = G next. If player j uses (33) and λ_j(L) = λ_j(L)(i) = G, then C_i is strictly optimal for sufficiently large T since the reward (33) is increasing in X_j(L) and (42) implies that the marginal expected increase in ρ̄_L X_j(L) is sufficiently large.¹⁸ If player j uses (33) and λ_j(L) = λ_j(L)(i) = B, then D_i is strictly optimal since the reward (33) is constant. If player j uses (32), then any action is optimal. Hence, σ_i(x_i) is optimal in these cases.

For the other cases, by Lemma 6, player i does not put posterior probability more than exp(−T^{1/3}) on them. Since the per-period difference in payoffs between two different strategies is bounded by Ū ≡ ρ̄_L + max_{i,a} 2|u_i(a)|, the expected loss from σ_i(x_i) (taking C_i if λ_j(L)(i) = G and D_i if λ_j(L)(i) = B) is no more than exp(−T^{1/3}) Ū T ≤ exp(−T^{1/4}) for sufficiently large T.

Therefore, in total, σ_i(x_i) is almost optimal in the Lth review round.
Further, if player j uses (33) and λ_j(L) = λ_j(L)(i), then player i's average continuation payoff at the beginning of the Lth review round, excluding Δπ_i(x, L, λ_j), is

u_i(D_i, C_j) − ρ if x_i = B, x_j = G;
u_i(D_i, D_j) + ρ if x_i = B, x_j = B;
u_i(C_i, C_j) − ρ − 2ε ρ̄_L if x_i = G, x_j = G, λ_j(L) = λ_j(L)(i) = G;
u_i(C_i, D_j) + ρ + 2ε ρ̄_L if x_i = G, x_j = B, λ_j(L) = λ_j(L)(i) = G;
u_i(D_i, C_j) − ρ if x_i = G, x_j = G, λ_j(L) = λ_j(L)(i) = B;
u_i(D_i, D_j) + ρ if x_i = G, x_j = B, λ_j(L) = λ_j(L)(i) = B.   (71)

Hence, there exists Δπ_i(x, L, λ_j) with (69) and (70) such that player i's average continuation payoff is equal to u_i(C_i, C_j) − ρ − 2ε ρ̄_L if x_j = G and u_i(D_i, D_j) + ρ + 2ε ρ̄_L if x_j = B.

¹⁸ For large T, the effect of dropping one t_j(l) is negligible.

In the supplemental rounds for λ_j(L) (where player i receives the message), as we have mentioned for 2-(b) of Proposition 1, σ_i(x_i) is exactly optimal.

In the supplemental round 2 for λ_i(L), as we have mentioned for 2-(a) of Proposition 1, σ_i(x_i) is exactly optimal.
In the supplemental round 1 for λ_i(L), since (i) (34) cancels out the difference in the instantaneous utilities, (ii) 4 of Lemma 4 guarantees that player i's continuation payoff is fixed regardless of player j's inference of player i's message if player j uses the inference λ_i(L)(j) for λ_i(L), and (iii) the equilibrium strategy gives the almost optimal inferences,¹⁹ σ_i(x_i) is optimal up to a loss of exp(−T^{1/3}){Ū T + max_{x,λ_j} Δπ_i(x, L, λ_j)} ≤ exp(−T^{1/4}) for sufficiently large T.
In the (L−1)th main round, for the almost optimality, the only difference from the Lth review round is that, if λ_j(L−1) = G, then player i's action in the (L−1)th review round can affect the distribution of λ_j(L). However, since (i) miscoordination between λ_j(L) and λ_j(L)(i) will not occur with probability more than exp(−T^{1/3}) by Lemma 6 and (ii) Δπ_i(x, L, λ_j) is determined so that player i's continuation payoff is the same between λ_j(L) = G and B if the coordination goes well, this difference is negligible for the almost optimality.

Further, if player j uses (33) and λ_j(L−1) = λ_j(L−1)(i), then player i's average payoff from the (L−1)th review round, excluding Δπ_i(x, L−1, λ_j), is given by (71). The cases where (32) will be used in the Lth review round happen with probability no more than 3η (player j is ready to listen to the message λ_i(L) and z_j(λ̄_j) ≠ λ_j(L)) plus some negligible probabilities for κ_j = B or ϑ_j = B. When (32) is used, the per-period payoff is bounded by [−ū, ū] by Lemma 2. Therefore, there exists Δπ_i(x, L−1, λ_j) with (69) and (70) such that player i's average continuation payoff from the (L−1)th and Lth review rounds is equal to u_i(C_i, C_j) − ρ − 2ε ρ̄_L − 3ηū/2 if x_j = G and u_i(D_i, D_j) + ρ + 2ε ρ̄_L + 3ηū/2 if x_j = B. Since the length of the supplemental rounds is much smaller than that of the review rounds, the instantaneous utilities and rewards in the supplemental rounds do not affect the average payoff.

¹⁹ See (68) for how the history in this round can affect the optimality of player i's inferences slightly.
Recursively, for l = 1, Proposition 1 is satisfied and the average ex ante payoff of player i is u_i(C_i, C_j) − ρ − 2ε ρ̄_L − 3(L−1)ηū/L if x_j = G and u_i(D_i, D_j) + ρ + 2ε ρ̄_L + 3(L−1)ηū/L if x_j = B. From (44), we can further modify Δπ_i(x, 1, G) with (69) and (70) such that σ_i(x_i) gives v̄_i (v_i, respectively) if x_j = G (B, respectively).²⁰
15.7 Formal Construction of the Report Block

We are left to show the truth-telling incentive for h_i^{T_P+1} and to establish the exact optimality of σ_i(x_i). To verify the truth-telling incentive, we distinguish the true history h_i^{T_P+1} from the message ĥ_i^{T_P+1}. In general, when we write a variable in player i's history with a "hat," it means player i's message (and, with cheap talk, equivalently player j's inference) about that variable.

Let A_j(r) be the set of information up to and including round r, consisting of

• the state x_j player j is in,

• the action a_j(l) player j took in the lth review round with l ≤ r, and

• the states θ_j(r̃), ϑ_j(r̃) ∈ {G, B} with r̃ < r that player j had.

We want to show that σ_i(x_i) is exactly optimal in round r conditional on A_j(r). Note that A_j(r) contains x_j, and so the equilibrium is belief-free at the beginning of the finitely repeated game.

We introduce the following variables. Let R_i(r) be the set of rounds r̃ ≤ r − 1 that are not a supplemental round 2 for λ_i(l + 1) (with R_i(1) = ∅), t_r be the initial period of round r, and h_i^r be the summary of player i's history at the beginning of round r. h_i^r is a collection of |A_i||Y_i| × 1 vectors, one for each r̃ ∈ R_i(r). The element of the vector for r̃ corresponding to (a_i, y_i) represents how many times player i observed (a_i, y_i) in round r̃.

We construct the message protocol for ĥ_i^{T_P+1} as follows:

²⁰ Remember that λ_j(1) = G.
• By a public randomization device, the players coordinate on who will report ĥ^{T_P+1}. Only one player reports the history: player 1 reports ĥ_1^{T_P+1} with probability 1/2 and player 2 reports ĥ_2^{T_P+1} with probability 1/2. Suppose player i is picked by the public randomization device.

• For each round r that is not a supplemental round 2 for λ_i(l + 1),

– Player i sends the history of the initial period of round r: (â_{i,t_r}, ŷ_{i,t_r}).

– Player i sends the history of round r: {â_{i,t}, ŷ_{i,t}}_{t∈T(r)}.

With abuse of notation, we assume the players can send multiple messages sequentially. Note that ĥ_i^r can be calculated from the messages sent before (â_{i,t_r}, ŷ_{i,t_r}).

Player j gives a reward to player i as follows. Here, we do not consider the feasibility constraint (6). As we will see, the total reward in the report block is bounded by T^{−1}, and we can restore (6) by adding or subtracting a small constant depending on x_j without affecting the incentives.
As a preparation, we prove the following lemma:

Lemma 11  Let h_i and h_j be player i's and player j's histories at the end of the main blocks, respectively. There generically exist ε̄ > 0 and g_i(h_j, â_i, ŷ_i) such that, for sufficiently large T, for any round r ∈ R_i(R) and period t in T(r), conditional on θ_j(r̃) = ϑ_j(r̃) = G with r̃ < r, it is better for player i to report (a_{i,t}, y_{i,t}) truthfully: for all h_i,

E[ g_i(h_j, â_{i,t}, ŷ_{i,t}) | θ_j(r̃) = ϑ_j(r̃) = G with r̃ < r, h_i, (â_{i,t}, ŷ_{i,t}) = (a_{i,t}, y_{i,t}) ]
> E[ g_i(h_j, â_{i,t}, ŷ_{i,t}) | θ_j(r̃) = ϑ_j(r̃) = G with r̃ < r, h_i, (â_{i,t}, ŷ_{i,t}) ≠ (a_{i,t}, y_{i,t}) ] + ε̄ T^{−1},   (72)

where (â_{i,t}, ŷ_{i,t}) is player i's message.

Proof. We show that

g_i(h_j, â_{i,t}, ŷ_{i,t}) = −1_{{t_j(r)=t}} ‖1_{y_{j,t}} − E[1_{y_{j,t}} | â_{i,t}, ŷ_{i,t}, a_{j,t}]‖²

works.²¹ ²² To see this, consider the following two cases:
1. If t_j(r) ≠ t, any report is optimal since g_i(h_j, â_{i,t}, ŷ_{i,t}) = 0.

2. If t_j(r) = t, then period t is not used for the construction of the continuation strategy. Hence, player i, after knowing t_j(r) = t and a_{j,t}, wants to maximize

max_{â_{i,t}, ŷ_{i,t}} E[ −‖1_{y_{j,t}} − E[1_{y_{j,t}} | â_{i,t}, ŷ_{i,t}, a_{j,t}]‖² | a_{i,t}, y_{i,t}, a_{j,t} ].

The first-order condition is

E[1_{y_{j,t}} | a_{i,t}, y_{i,t}, a_{j,t}] = E[1_{y_{j,t}} | â_{i,t}, ŷ_{i,t}, a_{j,t}],

which generically implies (â_{i,t}, ŷ_{i,t}) = (a_{i,t}, y_{i,t}), and the second-order condition is also satisfied.
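The reward g_i is a quadratic scoring rule, and the first-order condition above reflects a general fact: E‖1_{y_{j,t}} − c‖² is uniquely minimized over c at c = E[1_{y_{j,t}}], so a report inducing the true conditional distribution is optimal. A sketch with hypothetical conditional distributions:

```python
import numpy as np

def expected_score(p_true, p_report):
    """E[-||1_y - p_report||^2] when the one-hot vector 1_y has mean p_true.
    Expanding the square gives -(1 - 2 p_true . p_report + ||p_report||^2)."""
    return -(1.0 - 2.0 * float(np.dot(p_true, p_report))
             + float(np.dot(p_report, p_report)))

p_true = np.array([0.5, 0.3, 0.2])  # hypothetical E[1_{y_j} | truth, a_j]
p_lie = np.array([0.1, 0.2, 0.7])   # hypothetical E[1_{y_j} | false report, a_j]
# a truthful report (inducing p_true) strictly beats any report inducing
# a different conditional distribution -- the generic case in the proof
assert expected_score(p_true, p_true) > expected_score(p_true, p_lie)
```

Genericity matters because two different reports could in principle induce the same conditional distribution of y_{j,t}; the proof rules this out for generic monitoring structures.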
We are left to show that there exists ε > 0 such that, for any h_i, r ∈ R_i(R) and t ∈ T(r), player i puts belief at least εT^{−1} on t_j(r) = t. Suppose player i knows {a_{j,τ}}_{τ∈T(r)} and {y_{j,τ}, φ_{j,τ}}_{τ∈T_j(r)} in addition to h_i. Here, φ_{j,t} is

• H_{j,t} for t in a round r where player j is the sender of a message by a pure strategy,

• ∅ (no information) for t in a round r where player j is the sender of a message by a mixed strategy,

• ψ^G_{j,t}, ψ_{j,t}(G) and ψ_{j,t}(B) for t in a round r where player j is the receiver of player i's message sent by a pure strategy, and

• φ_{j,t} = (a_{j,t}(x), (E_{ji})_t) for t in a round r that is a review round.

²¹ We use the Euclidean norm in Section 15.7.
²² Kandori and Matsushima (1994) use a similar reward to give a player the incentive to tell the truth about the history.
Since {a_{j,τ}, y_{j,τ}, φ_{j,τ}}_{τ∈T_j(r)} determines player j's continuation strategy (including θ_j and ϑ_j), it suffices to show that, conditional on {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)} and (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j), player i puts belief at least εT^{−1} on t_j(r) = t. For any t and t′ ∈ T(r), the likelihood ratio between t_j(r) = t and t_j(r) = t′ is given by

Pr(t_j(r) = t | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))
/ Pr(t_j(r) = t′ | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))
= Pr({y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j) | {a_τ, y_{i,τ}}_{τ∈T(r)}, t_j(r) = t)
/ Pr({y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j) | {a_τ, y_{i,τ}}_{τ∈T(r)}, t_j(r) = t′)
∈ [ min_{a,y_i} q(ȳ_j, φ̄_j | a, y_i), 1/min_{a,y_i} q(ȳ_j, φ̄_j | a, y_i) ]²
⊆ [ min_{a,y_i,y_j,φ_j} q(y_j, φ_j | a, y_i), 1/min_{a,y_i,y_j,φ_j} q(y_j, φ_j | a, y_i) ]².

Since min q(y_j, φ_j | a, y_i) ∈ (0, 1) by Lemma 7, there exists ε > 0 such that

Pr(t_j(r) = t | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))
> ε Pr(t_j(r) = t′ | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j))

for all t and t′. Since there exists at least one t′ with Pr(t_j(r) = t′ | {a_τ, y_{i,τ}}_{τ∈T(r)}, {y_{j,τ}, φ_{j,τ}}_{τ∈T(r)}, (y_{j,t_j(r)}, φ_{j,t_j(r)}) = (ȳ_j, φ̄_j)) ≥ T^{−1}, we are done.
By backward induction, for each r, we will construct the following rewards based on player i's messages:

• If we come to round r with θ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r, then we cancel all the rewards explained below for the following rounds r′ ≥ r. Then, since we have established the exact optimality of σ_i(x_i) in round r with θ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r without adjustment, any action is exactly optimal after (and including) round r.

• If round r is a supplemental round 2 for λ_i(l + 1), player i does not report the history of that round and the rewards below are all 0 for that round. Since Proposition 1 establishes the exact optimality without adjustment, we can keep the exact optimality.
• Otherwise, we consider the following punishments and rewards:

– (r-j-1) Based on ĥ_i^r, player j gives

Σ_{a_i} f(a_i | ĥ_i^r, A_j(r)) ψ_{j,t_r}^{a_i, a_{j,t_r}}   (73)

with

f(a_i | ĥ_i^r, A_j(r)) ∈ [−T^{−10r+6}, T^{−10r+6}] for all a_i ∈ A_i

such that, after ĥ_i^r, it is optimal to take a_i ∈ A_i(ĥ_i^r). Note that if ψ_{j,t_r}^{a_i, a_{j,t_r}} = 1, that is, if it is likely that player i took a_i at the beginning of round r, player j rewards player i by f(a_i | ĥ_i^r, A_j(r)) so that player i wants to follow the equilibrium strategy. The existence of such a function f will be verified below.

– (r-j-2) Player j punishes player i if it is likely that player i told a lie about the history of the initial period of round r, by

T^{−10r+5} g_i(h_j, â_{i,t_r}, ŷ_{i,t_r}).   (74)

– (r-j-3) Player j makes it optimal to constantly take a_{i,t_r} within round r by adding

T^{−10r+3} Σ_{t∈T(r)} ψ_{j,t}^{â_{i,t_r}, a_{j,t}}   (75)

for t included in round r; that is, if player i tells the truth and â_{i,t_r} = a_{i,t_r}, then player j rewards player i if it is likely that player i in round r takes the same action as the one in the initial period, a_{i,t_r}.

– (r-j-4) Player j punishes player i if it is likely that player i told a lie about {a_{i,t}, y_{i,t}}_{t∈T(r)}, by

Σ_{t∈T(r)} T^{−10r} g_i(h_j, â_{i,t}, ŷ_{i,t}).   (76)

Based on the message ĥ_i^r, player j calculates A_i(ĥ_i^r), the set of player i's actions that should be taken with positive probability in round r after history ĥ_i^r.
We show the truth-telling incentive for (â_{i,t_r}, ŷ_{i,t_r}) and {â_{i,t}, ŷ_{i,t}}_{t∈T(r)} by backward induction. We start from r = R, the last round. If θ_j(r̃) = B or ϑ_j(r̃) = B for some r̃ < r, the messages about the history of round r are irrelevant. If θ_j(r̃) = ϑ_j(r̃) = G for all r̃ < r, then regardless of the specification of f, since (73), (74) and (75) have been sunk, it is optimal for player i to tell the truth about {a_{i,t}, y_{i,t}}_{t∈T(R)}. Since (74) dominates (75), it is optimal to tell the truth about (a_{i,t_r}, y_{i,t_r}).

For round R − 1, (73), (74), (75) and (76) for r = R are dominated by the smallest loss in (76) and (74) for r = R − 1. Therefore, the same argument as for round R works. We can proceed until the first round.

Recursively, therefore, regardless of the specification of f, we have established the optimality of truth-telling in the report block. Now, we construct f(a_i | ĥ_i^r, A_j(r)) by backward induction.
At the beginning of the round $R$, if $\lambda_j(\tilde r) = B$ or $\vartheta_j(\tilde r) = B$ for some $\tilde r \le R-1$, then any action is exactly optimal and the specification of $f$ is irrelevant. Otherwise, player $i$'s value of taking a constant action $a_i \in A_i$ conditional on $\mathcal{A}_j(R)$ depends only on $h_i^R$, given the truthtelling strategy in the report block. This is true even after player $i$'s deviation, since player $j$'s strategy is i.i.d. within each round. Hence, we can write the value as $v_i(a_i \mid h_i^R, \mathcal{A}_j(R))$. Let $f(a_i \mid h_i^R, \mathcal{A}_j(R))$ be such that
$$
\Pr(\text{player } i \text{ reports the history})
\sum_{\tilde a_i} f\bigl(\tilde a_i \mid h_i^R, \mathcal{A}_j(R)\bigr)
\Pr\Bigl(\bigl\{\psi_{j,t}^{\tilde a_i,\, a_{j,t}} = 1\bigr\} \Bigm| a_{i,t} = a_i\Bigr)
=
\begin{cases}
\max_{\tilde a_i \in A_i} v_i(\tilde a_i \mid h_i^R, \mathcal{A}_j(R)) - v_i(a_i \mid h_i^R, \mathcal{A}_j(R)) & \text{if } a_i \in \mathcal{A}_i(h_i^R), \\
0 & \text{otherwise,}
\end{cases}
$$
for all $h_i^R$. Remember that $\sigma_i(x_i)$ is almost optimal except for the adjustment of the reward in the report block, that all the messages about $h_i^R$ are transmitted correctly, and that the variance of the reward in the report block based on the histories in the round $R$ is bounded by $T^{-10R+5}$. Hence, we can make sure that
$$f(a_i \mid h_i^R, \mathcal{A}_j(R)) \in [-T^{-10R+6}, T^{-10R+6}].$$
This makes it exactly optimal to take $a_i \in \mathcal{A}_i(h_i^R)$ at $t_R$. After that, (75) and the truthtelling incentive imply that it is optimal to constantly take $a_{i,t_R}$.
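The construction of $f$ at the round $R$ amounts to solving a small linear system: the transfers conditioned on the inferred action must offset the value differences so that every action in the support yields the same total value. A minimal numerical sketch, with hypothetical values and a hypothetical signal matrix (not the paper's actual parameters):

```python
import numpy as np

# Hypothetical illustration of the indifference construction for f at the round R.
# v[a]: continuation value v_i(a | h_i^R, A_j(R)) of constantly playing action a
#       (made-up numbers, not the paper's parameters).
# P[a, a_tilde]: probability that the indicator psi^{a_tilde} fires when the true
#       constant action is a (a made-up, diagonally dominant signal matrix).
v = np.array([1.0, 0.7, 0.4])
P = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

# Indifference condition: the expected transfer must equal max_a v(a) - v(a),
# so that value plus expected transfer is the same for every action.
target = v.max() - v

# Solve the linear system P f = target for the transfers f(a_tilde | .).
f = np.linalg.solve(P, target)

# Every constant action now yields the same total value v.max().
total = v + P @ f
print(total)  # all entries equal to 1.0
```

The diagonal dominance of the signal matrix plays the role of the statistical identifiability that keeps the solved transfers small, mirroring the bound $f \in [-T^{-10R+6}, T^{-10R+6}]$ above.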
We can proceed until the first round and show the optimality of $\sigma_i(x_i)$. The difference from the round $R$ is that, when player $i$ takes $a_t$, it affects the reward for the messages sent after $(a_{t_r}, y_{t_r})$ about the history in the following rounds. Since this effect is dominated by (75), it is optimal for player $i$ to constantly take the same action as $a_{t_r}$.
Note that if the round $r$ is a supplemental round 2 for $\lambda_i(l+1)$ and this round does not have an impact ($\lambda_j(\tilde r) = \vartheta_j(\tilde r) = G$ for all $\tilde r < r$ guarantees this), then the expected rewards in the report block are not affected by the strategy in the round $r$. Therefore, $\sigma_i(x_i)$ is exactly optimal, as stated in Proposition 1. In addition, if the round $r$ is a supplemental round for $\lambda_j(l+1)$, then since $\sigma_i(x_i)$ is strictly optimal by $\tfrac{1}{2}\bar{u}L(q_2 - q_1)$ without $f$ by 2-(b) of Proposition 1, the equilibrium strategy is optimal regardless of the specification of $f$. Hence, we make $f$ constant at $0$.
Finally, we consider the reward on the message $x_i$:
$$f(x_i \mid x_j).$$
Conditional on $x_j \in \{G, B\}$, $\sigma_i(x_i)$ gives $\bar v_i$ ($\underline v_i$, respectively) if $x_j = G$ ($B$, respectively) without the reward in the report block from Section 11. Since the reward in the report block so far is bounded by $[-T^{-2}, T^{-2}]$, we can take $f(x_i \mid x_j) \in [-T^{-1}, T^{-1}]$ for all $x_i, x_j \in \{G, B\}$ such that $\sigma_i(x_i)$ gives exactly $\bar v_i$ ($\underline v_i$, respectively) if $x_j = G$ ($B$, respectively) once the reward in the report block is taken into account.
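The room for this last adjustment can be checked by the same order-of-magnitude accounting (a sketch, assuming each round has at most $T$ periods and per-period report-block weights of at most $T^{-10r+6}$):

```latex
\Bigl|\,\text{report-block reward}\,\Bigr|
  \;\le\; \sum_{r=1}^{R} T \cdot T^{-10r+6}
  \;\le\; 2\,T^{-3}
  \;\le\; T^{-2},
```

so an adjustment $f(x_i \mid x_j)$ of order $T^{-1}$ dominates every report-block term and can deliver the exact values $\bar v_i$ and $\underline v_i$.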
References

[1] Aoyagi, M. (2002): "Collusion in Dynamic Bertrand Oligopoly with Correlated Private Signals and Communication," Journal of Economic Theory, 102, 229-248.

[2] Bhaskar, V., and I. Obara (2002): "Belief-Based Equilibria in the Repeated Prisoners' Dilemma with Private Monitoring," Journal of Economic Theory, 102, 49-69.

[3] Compte, O. (1998): "Communication in Repeated Games with Imperfect Private Monitoring," Econometrica, 66, 597-626.

[4] Deb, D. (2011): "Cooperation and Community Responsibility: A Folk Theorem for Repeated Random Matching Games," mimeo.

[5] Ely, J., J. Hörner, and W. Olszewski (2004): "Dispensability of Public Randomization Device," mimeo.

[6] Ely, J., J. Hörner, and W. Olszewski (2005): "Belief-Free Equilibria in Repeated Games," Econometrica, 73, 377-415.

[7] Ely, J., and J. Välimäki (2002): "A Robust Folk Theorem for the Prisoners' Dilemma," Journal of Economic Theory, 102, 84-105.

[8] Fong, K., O. Gossner, J. Hörner, and Y. Sannikov (2007): "Efficiency in a Repeated Prisoners' Dilemma with Imperfect Private Monitoring," mimeo.

[9] Fuchs, W. (2007): "Contracting with Repeated Moral Hazard and Private Evaluations," American Economic Review, 97, 1432-1448.

[10] Fudenberg, D., and D. Levine (2002): "The Nash-Threats Folk Theorem with Communication and Approximate Common Knowledge in Two Player Games," Journal of Economic Theory, 132, 461-473.

[11] Fudenberg, D., D. Levine, and E. Maskin (1994): "The Folk Theorem with Imperfect Public Information," Econometrica, 62, 997-1040.

[12] Fudenberg, D., and E. Maskin (1986): "The Folk Theorem in Repeated Games with Discounting and with Incomplete Information," Econometrica, 54, 533-554.

[13] Harrington, J. E. Jr., and A. Skrzypacz (2007): "Collusion under Monitoring of Sales," RAND Journal of Economics, 38, 314-331.

[14] Hörner, J., and W. Olszewski (2006): "The Folk Theorem for Games with Private Almost-Perfect Monitoring," Econometrica, 74, 1499-1544.

[15] Hörner, J., and W. Olszewski (2009): "How Robust is the Folk Theorem with Imperfect Public Monitoring?," Quarterly Journal of Economics, 124, 1773-1814.

[16] Kandori, M. (2002): "Introduction to Repeated Games with Private Monitoring," Journal of Economic Theory, 102, 1-15.

[17] Kandori, M. (2010): "Weakly Belief-Free Equilibria in Repeated Games with Private Monitoring," Econometrica, 79, 877-892.

[18] Kandori, M., and H. Matsushima (1998): "Private Observation, Communication and Collusion," Econometrica, 66, 627-652.

[19] Kandori, M., and I. Obara (2006): "Efficiency in Repeated Games Revisited: The Role of Private Strategies," Econometrica, 72, 499-519.

[20] Kandori, M., and I. Obara (2010): "Towards a Belief-Based Theory of Repeated Games with Private Monitoring: An Application of POMDP," mimeo.

[21] Mailath, G., and S. Morris (2002): "Repeated Games with Almost-Public Monitoring," Journal of Economic Theory, 102, 189-228.

[22] Mailath, G., and S. Morris (2006): "Coordination Failure in Repeated Games with Almost-Public Monitoring," Theoretical Economics, 1, 311-340.

[23] Mailath, G., and L. Samuelson (2006): Repeated Games and Reputations: Long-Run Relationships. Oxford University Press, New York, NY.

[24] Matsushima, H. (2004): "Repeated Games with Private Monitoring: Two Players," Econometrica, 72, 823-852.

[25] Miyagawa, E., Y. Miyahara, and T. Sekiguchi (2008): "The Folk Theorem for Repeated Games with Observation Costs," Journal of Economic Theory, 139, 192-221.

[26] Obara, I. (2009): "Folk Theorem with Communication," Journal of Economic Theory, 144, 120-134.

[27] Phelan, C., and A. Skrzypacz: "Beliefs and Private Monitoring," mimeo.

[28] Piccione, M. (2002): "The Repeated Prisoners' Dilemma with Imperfect Private Monitoring," Journal of Economic Theory, 102, 70-83.

[29] Radner, R. (1985): "Repeated Principal-Agent Games with Discounting," Econometrica, 53, 1173-1198.

[30] Sekiguchi, T. (1997): "Efficiency in Repeated Prisoners' Dilemma with Private Monitoring," Journal of Economic Theory, 76, 345-361.

[31] Stigler, G. (1964): "A Theory of Oligopoly," Journal of Political Economy, 72, 44-61.

[32] Sugaya, T. (2010a): "Generic Characterization of the Set of Belief-Free Review-Strategy Equilibrium Payoffs," mimeo.

[33] Sugaya, T. (2010b): "Folk Theorem in a Repeated Prisoners' Dilemma without Conditional Independence," mimeo.

[34] Sugaya, T. (2010c): "Folk Theorem in a Repeated Prisoners' Dilemma with a Generic Private Monitoring," mimeo.

[35] Sugaya, T., and S. Takahashi (2010): "Coordination Failure in Repeated Games with Private Monitoring," mimeo.

[36] Takahashi, S. (2007): "Community Enforcement when Players Observe Partners' Past Play," mimeo.

[37] Yamamoto, Y. (2007): "Efficiency Results in N Player Games with Imperfect Private Monitoring," Journal of Economic Theory, 135, 382-413.

[38] Yamamoto, Y. (2009a): "A Limit Characterization of Belief-Free Equilibrium Payoffs in Repeated Games," forthcoming in Journal of Economic Theory.

[39] Yamamoto, Y. (2009b): "Repeated Games with Private and Independent Monitoring," mimeo.
[Figure 1: The Informal Structure of the Phase]

[Figure 2: Transition of …]

[Figure 3: Player …'s Action]

[Figure 4: Transition of … and Player …'s Reward After …]

[Figure 5: Formal Structure of the Phase]

[Figure 6: Inference of …]

[Figure 7: Reward Function by Player … on Player …]

[Figure 8: Player …'s Inference of … and Player …'s Inference of …]

[Figure 9: Player …'s Inference of Player …'s Message]

[Figure 10: Player …'s Requirement on Player …'s Inference]

[Figure 11: Player …'s Inference of Player …'s Message]

[Figure 11: Formal: Player …'s Requirement on Player …'s Inference]

[Figure 13: Formal: Player …'s Inference of Player …'s Message]