ec487 advanced microeconomics, part i: lecture 10econ.lse.ac.uk/staff/lfelli/teach/ec487 slides...

EC487 Advanced Microeconomics, Part I: Lecture 10

Leonardo Felli

32L.LG.04

1 December 2017

Repeated Games

- This is the class of dynamic games that is best understood in game theory.
- In each period players face the same normal form stage game.
- Players’ payoffs are a discounted weighted average of the payoffs they receive in each stage game.


Repeated Games (cont’d)

Main point of the analysis:

- players’ overall payoffs depend on the present and the future stage game payoffs,
- it is possible that the threat of a lower future payoff induces a player to choose, at present, a strategy different from the stage game best reply.


Example: the repeated prisoner’s dilemma

- Stage game:

| 1\2 | C | D |
|---|---|---|
| C | 1, 1 | −1, 2 |
| D | 2, −1 | 0, 0 |

- Per-period payoff depends on the current action profile: $g_i(a^t)$.
- Players share a common discount factor δ.
- It is convenient to label the first period t = 0.


Repeated Prisoner’s Dilemma (cont’d)

- Since we are going to compare equilibrium payoffs across different time horizons, we need to re-normalize the payoffs so that they are comparable.
- The average discounted payoff for a T-period game is:

$$\Pi = \frac{1-\delta}{1-\delta^{T}} \sum_{t=0}^{T-1} \delta^{t} g_i(a^t)$$

- Clearly, if $g_i(a^t) = 1$ in every period:

$$\Pi = \frac{1-\delta}{1-\delta^{T}} \sum_{t=0}^{T-1} \delta^{t} = \left(\frac{1-\delta}{1-\delta^{T}}\right)\left(\frac{1-\delta^{T}}{1-\delta}\right) = 1$$
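The normalization above can be checked numerically. The following is a minimal sketch, assuming the payoff stream is given as a plain Python list; the function name `average_discounted_payoff` is illustrative and not part of the lecture.

```python
def average_discounted_payoff(stage_payoffs, delta):
    """Normalized discounted payoff of a finite stream g(a^0), ..., g(a^{T-1})."""
    T = len(stage_payoffs)
    if T == 0:
        raise ValueError("need at least one period")
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    # Discounted sum of the stage payoffs.
    total = sum(delta**t * g for t, g in enumerate(stage_payoffs))
    # Re-normalize by (1 - delta) / (1 - delta^T) so a constant stream of 1 is worth 1.
    return (1 - delta) / (1 - delta**T) * total
```

With this normalization a constant stream of 1 is worth 1 for every horizon T, which is what makes payoffs comparable across horizons.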


Finitely Repeated Prisoner’s Dilemma

- Assume first that the prisoner’s dilemma game is repeated a finite number of times.
- Nash equilibrium payoffs of the stage game: (0, 0).
- Subgame perfect equilibrium strategies: each player chooses action D in every period, independently of the actions the other player chose in the past.

Proof: backward induction.

- Subgame perfection seems to prevent any gain from repeated but finite interaction, but...


Finitely Repeated Game

- Consider a different finitely repeated game.
- Stage game:

| 1\2 | L | C | R |
|---|---|---|---|
| T | 1, 1 | 5, 0 | 0, 0 |
| M | 0, 5 | 4, 4 | 0, 0 |
| B | 0, 0 | 0, 0 | 3, 3 |

- Nash equilibria of the stage game: (T, L) and (B, R).


Finitely Repeated Game (cont’d)

Assume the game is played twice and consider the following strategies:

Player 1:

- play M in the first period;
- in the second period play B if the observed first-period outcome is (M, C);
- in the second period play T if the observed first-period outcome is not (M, C).

Player 2:

- play C in the first period;
- in the second period play R if the observed first-period outcome is (M, C);
- in the second period play L if the observed first-period outcome is not (M, C).


Finitely Repeated Game (cont’d)

Proposition

If δ ≥ 1/2 then these strategies are a subgame perfect equilibrium of the game.

Proof: Backward induction. In the last period the strategies prescribe a Nash equilibrium of the stage game after every history. In the first period both player 1 and player 2 conform to the strategies if and only if:

$$\left(\frac{1-\delta}{1-\delta^{2}}\right)\left[4 + 3\delta\right] = \frac{4 + 3\delta}{1+\delta} \ge \left(\frac{1-\delta}{1-\delta^{2}}\right)\left[5 + \delta\right] = \frac{5 + \delta}{1+\delta}$$

The inequality reduces to 4 + 3δ ≥ 5 + δ, which is satisfied for δ ≥ 1/2.
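The two steps of the backward-induction argument can be sketched in code: first find the pure Nash equilibria of the stage game, then compare the first-period payoffs from conforming and from the best one-shot deviation. This is only an illustrative check; the dictionary layout and function names are assumptions, not part of the lecture.

```python
from fractions import Fraction

# Stage game from the slides: rows T, M, B for player 1; columns L, C, R for player 2.
PAYOFFS = {
    ("T", "L"): (1, 1), ("T", "C"): (5, 0), ("T", "R"): (0, 0),
    ("M", "L"): (0, 5), ("M", "C"): (4, 4), ("M", "R"): (0, 0),
    ("B", "L"): (0, 0), ("B", "C"): (0, 0), ("B", "R"): (3, 3),
}
ROWS, COLS = ("T", "M", "B"), ("L", "C", "R")


def pure_nash(payoffs):
    """Return all pure action profiles from which no player gains by deviating."""
    equilibria = []
    for r in ROWS:
        for c in COLS:
            row_ok = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in ROWS)
            col_ok = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in COLS)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


def first_period_incentive_ok(delta):
    """Compare two-period payoffs; the common factor (1-d)/(1-d^2) cancels out."""
    conform = 4 + delta * 3  # (M, C) today, then the reward equilibrium (B, R)
    deviate = 5 + delta * 1  # best deviation today, then the punishment (T, L)
    return conform >= deviate
```

By symmetry of the stage game the same comparison applies to player 2, whose best first-period deviation against M is L.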


Infinitely Repeated Prisoner’s Dilemma

- Consider now the infinitely repeated prisoner’s dilemma: T = +∞.
- Stage game:

| 1\2 | C | D |
|---|---|---|
| C | 1, 1 | −1, 2 |
| D | 2, −1 | 0, 0 |

Proposition

Both players choosing strategy D in every period is an SPE of the repeated game.

- Proof: by the one-deviation principle. Notice that an infinitely repeated game with δ < 1 and bounded stage payoffs is continuous at infinity.


Infinitely Repeated Prisoner’s Dilemma (cont’d)

Proposition

The (D, D) equilibrium is the only equilibrium if we restrict players’ strategies to be history independent.

Proposition

If δ ≥ 1/2 then the following strategy profile $(\sigma_A, \sigma_B)$ is an SPE of the repeated game:

- Player i chooses C in the first period.
- Player i continues to choose C as long as no player has chosen D in any previous period.
- Player i chooses D for the rest of the game if any player has chosen D in the past.



Proof: If player i conforms to the prescribed strategy, the payoff is 1.

If a player deviates in one period and conforms to the prescribed strategy from there on (one-deviation principle), the continuation payoff is:

$$(1-\delta)(2 + 0 + \ldots) = (1-\delta)\,2$$

If δ ≥ 1/2 then

$$1 \ge (1-\delta)\,2.$$
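The on-path incentive constraint of this trigger strategy can be written for general stage payoffs. The sketch below uses the prisoner’s dilemma numbers from the slides; the constant and function names are illustrative assumptions.

```python
from fractions import Fraction

# Player's own stage payoffs in the slides' prisoner's dilemma.
COOPERATE = 1   # payoff at (C, C)
TEMPTATION = 2  # payoff from playing D against C
PUNISHMENT = 0  # payoff at (D, D)


def conforms_on_path(delta):
    """True if cooperating forever beats a one-shot deviation followed by (D, D) forever."""
    conform = Fraction(COOPERATE)  # (1 - d) * sum_t d^t * 1 = 1
    deviate = (1 - delta) * TEMPTATION + delta * PUNISHMENT
    return conform >= deviate


def critical_delta():
    """Smallest discount factor satisfying the constraint above."""
    return Fraction(TEMPTATION - COOPERATE, TEMPTATION - PUNISHMENT)
```

Rearranging `conform >= deviate` gives δ ≥ (2 − 1)/(2 − 0) = 1/2, the threshold stated in the proposition.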



We still need to check that in the subgames in which both players are choosing D neither player wants to deviate.

However, choosing D in every period is an SPE of the entire game, hence it is an SPE of the (punishment) subgames.

Notice that with this type of strategy not only is choosing (C, C) in every period an SPE outcome; a large number of other SPE outcomes are also achievable.



Indeed there exists a Folk Theorem.

[Figure: feasible payoffs of the prisoner’s dilemma in the (Π1, Π2) plane, spanned by the points (1, 1), (0, 0), (−1, 2) and (2, −1).]


General repeated normal form game

Definition

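Let G be a given stage game: a normal form game

$$G = \{N, A_i, g_i(a^t)\}$$

Definition

Let $G^\infty$ be the infinitely repeated game associated with the stage game above:

$$G^\infty = \{N, H, P, U_i(\sigma)\}$$

such that:

- $H = \bigcup_{t=0}^{\infty} A^t$, where $A^0 = \emptyset$;
- $P(h) = N$ for every $h \in H \setminus Z$, where Z denotes the set of terminal histories.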


Repeated Normal form Game (cont’d)

- The payoffs for the game $G^\infty$ in the case δ < 1 are:

$$U_i(\sigma) = (1-\delta)\sum_{t=0}^{\infty} \delta^{t} g_i(\sigma^t(h^t))$$

- Denote by $h^t$ the history known to the players at the beginning of period t: $h^t = \{a^0, a^1, \ldots, a^{t-1}\}$.
- Let $H^t = A^{t-1}$ be the space of all possible period-t histories.
- A pure strategy for player i ∈ {1, 2} in the game $G^\infty$ is then the infinite sequence of mappings $\{s_i^t\}_{t=0}^{\infty}$ such that $s_i^t : H^t \to A_i$.



In general we will allow players to mix in every stage game: $\Delta(A_i)$ is the set of probability distributions on $A_i$.

A behavioral mixed strategy in this environment is instead an infinite sequence of mappings $\{\sigma_i^t\}_{t=0}^{\infty}$ such that $\sigma_i^t : H^t \to \Delta(A_i)$.

Notice that mixed strategies cannot depend on the opponents’ past mixed strategies, only on their realizations.

The payoffs for the game $G^\infty$ in the case δ < 1 are:

$$U_i = E_\sigma\,(1-\delta)\sum_{t=0}^{\infty} \delta^{t} g_i(\sigma^t(h^t))$$



- Notice that $E_\sigma(\cdot)$ is the expectation with respect to the distribution over infinite histories generated by the profile of mixed behavioral strategies $\{\sigma_i^t\}_{t=0}^{\infty}$.
- Notice that this specification of payoffs allows us to reinterpret the discount factor δ as:
  - the probability that the game will be played in the following period, where these probabilities are assumed to be independent across periods.
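One way to see this reinterpretation, sketched under the assumption that players do not discount at all and that, after every period, play continues with independent probability δ: period t is then reached with probability $\delta^t$, so along a given path of actions the expected total payoff coincides with the discounted sum.

```latex
\mathbb{E}\Big[\sum_{t=0}^{\infty} \mathbf{1}\{\text{period } t \text{ is reached}\}\, g_i(a^t)\Big]
  = \sum_{t=0}^{\infty} \Pr(\text{period } t \text{ is reached})\, g_i(a^t)
  = \sum_{t=0}^{\infty} \delta^{t} g_i(a^t),
\qquad
(1-\delta)\sum_{t=0}^{\infty} \delta^{t} = 1 .
```

The factor (1 − δ) is the reciprocal of the expected number of periods played, 1/(1 − δ), so the normalized payoff is an expected per-period average.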



We allow the players to coordinate their strategies through the use of a public randomizing device whose realization in period t is $\omega^t$.

Therefore a period-t history for player i is:

$$h^t = \{a^0, \ldots, a^{t-1};\ \omega^0, \ldots, \omega^t\}.$$

Proposition

If $\alpha^*$ is a NE strategy profile for the stage game G, then the strategies:

“each player i plays $\alpha^*_i$ independently of the history of play”

are a NE and an SPE of the infinitely repeated game $G^\infty(\delta)$.



- The proof that the strategies above are a subgame perfect equilibrium of the game $G^\infty(\delta)$ is easily obtained by using the one-deviation principle.
- In any given period consider a deviation in the immediate period and then let the players continue playing the equilibrium strategies.
- No such deviation can be profitable: it is a deviation from the Nash equilibrium action of the stage game G, and it does not affect future play.



Assume now that the stage game G has n NE $\{\alpha^{j,*}\}_{j=1}^{n}$.

Proposition

Then, for any map j(t) from time periods into an index of the NE $\{\alpha^{j,*}\}_{j=1}^{n}$, the strategies:

“each player i plays $\alpha_i^{j(t),*}$ in period t”

are an SPE of the game $G^\infty(\delta)$.

These SPE strategies are history independent. Therefore each player’s best response in every period t is to play the stage game best response in t: today’s decision does not affect future play.



- In other words, playing the stage game G repeatedly does not reduce the set of equilibrium payoffs.
- To be able to move to the Folk Theorem we first need to define the set of feasible and individually rational payoffs.
- Consider as an example the following battle of the sexes game G:

| 1\2 | B | F |
|---|---|---|
| B | 1, 2 | 0, 0 |
| F | 0, 0 | 2, 1 |

- Assume this game is repeated an infinite number of times: $G^\infty(\delta)$.



- We first need to define the set of feasible payoffs of $G^\infty$.
- Recall that when choosing actions in every period players can coordinate using a public randomizing device.
- This implies that every payoff associated with a pure strategy profile a can be achieved: (1, 2), (0, 0), (2, 1).
- It also implies that every payoff generated by a convex combination of the payoffs of pure strategy profiles can be achieved.



- In general these payoffs are all in the convex hull of the payoffs associated with the pure strategy profiles.
- This is the smallest convex set that includes the payoffs associated with the pure strategy profiles.
- Formally:

$$V = \text{convex hull}\,\{\pi \mid \pi = g(a) \text{ for some } a \in A\}$$
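As an illustration of how a public randomizing device reaches points inside the convex hull, the sketch below averages pure-profile payoff vectors with the device’s probabilities. The function name and the lottery representation are assumptions made for this example.

```python
from fractions import Fraction


def expected_payoff(lottery):
    """Expected payoff vector of a public lottery.

    lottery: list of (probability, payoff_vector) pairs, one per pure action
    profile that the public device may recommend.
    """
    if sum(p for p, _ in lottery) != 1:
        raise ValueError("probabilities must sum to one")
    n_players = len(lottery[0][1])
    return tuple(sum(p * v[i] for p, v in lottery) for i in range(n_players))


# Battle of the sexes: a fair public coin between (B, B) and (F, F).
half = Fraction(1, 2)
coin_flip = expected_payoff([(half, (1, 2)), (half, (2, 1))])
```

Here `coin_flip` equals (3/2, 3/2), a point of V that no pure action profile delivers.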



Graphically:

[Figure: the feasible set V in the (π1, π2) plane, the triangle with vertices (0, 0), (1, 2) and (2, 1).]



- We now define the set of individually rational payoffs of $G^\infty$.
- We first need to define the minmax payoff of each player.
- The minmax payoff of player 1 is the lowest payoff that player 2 can impose on player 1.
- Given that player 1 is rational, this is the lowest payoff among the ones player 1 obtains by best replying to player 2’s strategies.
- Player 1 best replies because, being rational, he tries to achieve the best outcome for himself.
- In other words, among the best reply payoffs of player 1, player 2 chooses the strategy that minimizes player 1’s payoff.
- In the battle of the sexes game:

| 1\2 | B | F |
|---|---|---|
| B | 1, 2 | 0, 0 |
| F | 0, 0 | 2, 1 |

- the minmax payoff for player 1 (in pure strategies) is $\underline{\pi}_1 = 1$;
- the minmax payoff for player 2 (in pure strategies) is $\underline{\pi}_2 = 1$.
- In general:

$$\underline{\pi}_i = \min_{\alpha_{-i}}\left[\max_{\alpha_i} g_i(\alpha_i, \alpha_{-i})\right]$$
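The pure-strategy version of the min-max operation above is easy to compute for a two-player game. The following is a minimal sketch, assuming payoffs are stored as a nested list of (player 1, player 2) pairs; the names are illustrative.

```python
def pure_minmax(payoff, player):
    """Pure-strategy minmax payoff of `player` (0 or 1) in a two-player game.

    payoff[r][c] = (g1, g2); rows are player 1's actions, columns player 2's.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    if player == 0:
        # Player 2 picks the column; player 1 best replies over rows.
        return min(max(payoff[r][c][0] for r in range(n_rows)) for c in range(n_cols))
    # Player 1 picks the row; player 2 best replies over columns.
    return min(max(payoff[r][c][1] for c in range(n_cols)) for r in range(n_rows))


BATTLE_OF_SEXES = [[(1, 2), (0, 0)], [(0, 0), (2, 1)]]
```

Restricting the punisher to pure actions gives an upper bound on the minmax payoff defined over mixed strategies $\alpha_{-i}$.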



- Denote by $m^i_{-i}$ the profile of minmax strategies for players −i when they minmax player i.
- The minmax payoff $\underline{\pi}_i$ is the lowest payoff player i’s opponents can hold player i to by their choice of $\alpha_{-i}$.

Definition

A payoff $\pi_i$ for player i is individually rational if and only if:

$$\pi_i \ge \underline{\pi}_i = \min_{\alpha_{-i}}\left[\max_{\alpha_i} g_i(\alpha_i, \alpha_{-i})\right]$$



Definition

We define the set of individually rational payoffs to be the set of payoff vectors that give each player at least his minmax payoff:

$$I = \{(\pi_i, \pi_{-i}) \mid \pi_i \ge \underline{\pi}_i \ \ \forall i \in N\}$$

The relevant set for us is the set of feasible and individually rational payoffs:

$$\mathcal{V} = I \cap V$$


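Repeated Normal form Game (cont’d)

The region of feasible and individually rational payoffs $\mathcal{V}$:

[Figure: the triangle with vertices (0, 0), (1, 2) and (2, 1) in the (π1, π2) plane; $\mathcal{V}$ is the part of the triangle lying above and to the right of the minmax point (1, 1).]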



- Consider the following game:

| 1\2 | L | R |
|---|---|---|
| U | −2, 2 | 1, −2 |
| M | 1, −2 | −1, 2 |
| D | 0, 1 | 0, 1 |

- Restricting attention to pure strategies we obtain:
  - $m^2_1 = D$ and $\underline{\pi}_2 = 1$;
  - $m^1_2 \in \{L, R\}$ and $\underline{\pi}_1 = 1$.



- Consider mixed strategies: assume that player 2 plays L with probability q.
- Then player 1’s expected payoffs for every possible strategy choice are:

$$\Pi_1(U, q) = 1 - 3q$$

$$\Pi_1(M, q) = 2q - 1$$

$$\Pi_1(D, q) = 0$$

- This implies that $m^1_2 = q \in \left[\tfrac{1}{3}, \tfrac{1}{2}\right]$ and $\underline{\pi}_1 = 0$.



- Assume that player 1 plays U with probability $p_U$ and M with probability $p_M$.
- Then player 2’s expected payoffs for every possible strategy choice are:

$$\Pi_2(p_U, p_M, L) = 2\,(p_U - p_M) + 1 - p_U - p_M$$

$$\Pi_2(p_U, p_M, R) = 2\,(p_M - p_U) + 1 - p_U - p_M$$

- This implies that $m^2_1 = (p_U, p_M) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)$ and that $\underline{\pi}_2 = 0$.
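The mixed-strategy minmax payoffs of this 3×2 game can be confirmed by a brute-force search over a grid of mixing probabilities. This is only a rough numerical sketch (exact here because the minimizers lie on the chosen grid); the function names and the grid size are assumptions.

```python
from fractions import Fraction


def player1_best_reply_payoff(q):
    """Player 1's best payoff when player 2 plays L with probability q."""
    return max(1 - 3 * q, 2 * q - 1, 0)


def player2_best_reply_payoff(p_u, p_m):
    """Player 2's best payoff when player 1 plays U, M, D with (p_u, p_m, rest)."""
    rest = 1 - p_u - p_m
    return max(2 * (p_u - p_m) + rest, 2 * (p_m - p_u) + rest)


def grid(steps):
    return [Fraction(k, steps) for k in range(steps + 1)]


def minmax_player1(steps=60):
    """Return player 1's minmax payoff and the grid points q that attain it."""
    values = {q: player1_best_reply_payoff(q) for q in grid(steps)}
    low = min(values.values())
    return low, [q for q, v in values.items() if v == low]


def minmax_player2(steps=60):
    """Return player 2's minmax payoff and one minimizing (p_u, p_m)."""
    best = None
    for p_u in grid(steps):
        for p_m in grid(steps):
            if p_u + p_m <= 1:
                v = player2_best_reply_payoff(p_u, p_m)
                if best is None or v < best[0]:
                    best = (v, (p_u, p_m))
    return best
```

The search shows how allowing the punisher to mix lowers both minmax payoffs from 1 (pure strategies) to 0.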


Folk Theorem

- Consider a general finite normal form stage game:

$$G = \{N; A_i, g_i(a), \forall i \in N\}$$

- and the dynamic game that consists of the infinitely repeated play of the game G when players’ discount factor is δ:

$$G^\infty(\delta).$$

- The payoffs of the infinitely repeated game are:

$$\pi_i = (1-\delta)\sum_{t=0}^{\infty} \delta^{t}\, g_i(a^t_i, a^t_{-i})$$


Subgame Perfect Folk Theorem

Theorem (Subgame Perfect Folk Theorem, Fudenberg and Maskin (1986))

Consider a stage game such that $\dim(\mathcal{V}) = \#N$, where #N denotes the number of players and $\mathcal{V}$ denotes the set of feasible and individually rational payoffs.

Then for any $v \in \mathcal{V}$ such that $v_i > \underline{\pi}_i$ for every i, there exists a $\delta_v$ such that for every $\delta \ge \delta_v$ there exists an SPE of $G^\infty(\delta)$ with payoff vector v.


Subgame Perfect Folk Theorem (cont’d)

- Notice that the extra condition $\dim(\mathcal{V}) = \#N$ is not tight; in particular, the theorem can be proved when

$$\dim(\mathcal{V}) = \#N - 1.$$

- The dimensionality assumption is satisfied, for instance, in the following example.


Subgame Perfect Folk Theorem: Example

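- Let G be the following finite normal form game:

| 1\2 | L | R |
|---|---|---|
| U | 2, 1 | 0, 2 |
| D | 0, 0 | −1, −1 |

- Consider the dynamic game $G^\infty(\delta)$.
- The minmax payoffs of both players, in pure and mixed strategies, are:

$$\underline{\pi}_1 = 0, \qquad \underline{\pi}_2 = 0.$$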


Subgame Perfect Folk Theorem: Example (cont’d)

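- The set $\mathcal{V}$ satisfies the dimensionality assumption $\dim(\mathcal{V}) = 2$:

[Figure: feasible payoffs in the (Π1, Π2) plane with vertices (0, 2), (0, 0), (−1, −1) and (2, 1); the minmax levels $\underline{\pi}_1$ and $\underline{\pi}_2$ delimit the region $\mathcal{V}$.]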


Subgame Perfect Folk Theorem: Proof

Proof:

- For simplicity we focus on the case in which there exists a pure action profile a such that

$$g(a) = v.$$

- Assume first that the minmax action profile $m^i_{-i}$ for every i ∈ N is also a pure strategy profile, so that any deviation from minmax behavior is easy to detect.
- Choose $v' \in \operatorname{int}(\mathcal{V})$ (recall that $v_i > \underline{\pi}_i$) such that:

$$\underline{\pi}_i < v'_i < v_i \quad \forall i \in N$$


Subgame Perfect Folk Theorem: Proof (cont’d)

- Choose also an ε > 0 and a

$$v'(i) = (v'_1 + \varepsilon, \ldots, v'_{i-1} + \varepsilon,\ v'_i,\ v'_{i+1} + \varepsilon, \ldots, v'_I + \varepsilon)$$

such that $v'(i) \in \mathcal{V}$ for all i ∈ N.

- Notice that the role of the full-dimensionality assumption is to ensure that there exists such a v′(i) for all i, for some ε and v′.



- Once again for simplicity assume that for every i ∈ N there exists an action profile a(i) such that

$$g(a(i)) = v'(i).$$

- Further denote by $w^j_i = g_i(m^j)$ player i’s payoff when minmaxing player j.



- Finally, choose n such that

$$\max_a g_i(a) + n\,\underline{\pi}_i < \min_a g_i(a) + n\,v'_i$$

or

$$n > \frac{\max_a g_i(a) - \min_a g_i(a)}{v'_i - \underline{\pi}_i}.$$

- Clearly there exists an n satisfying the condition above, since the numerator is bounded above and the denominator is strictly positive.
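The smallest admissible punishment length follows directly from the inequality above. The sketch below computes it for one player; the function name is illustrative, and the value $v'_1 = 1/2$ used in the example is a hypothetical choice, not a number from the lecture.

```python
import math
from fractions import Fraction


def punishment_length(max_g, min_g, v_prime, minmax):
    """Smallest integer n with n > (max_g - min_g) / (v_prime - minmax)."""
    if v_prime <= minmax:
        raise ValueError("v'_i must lie strictly above the minmax payoff")
    bound = Fraction(max_g - min_g) / (Fraction(v_prime) - Fraction(minmax))
    # floor(bound) + 1 is the smallest integer strictly greater than bound.
    return math.floor(bound) + 1


# Player 1 in the 2x2 example game: max payoff 2, min payoff -1, minmax 0,
# with a hypothetical target v'_1 = 1/2.
n_player1 = punishment_length(2, -1, Fraction(1, 2), 0)
```

With several players one would take the largest such n across players, so that the same punishment length works for everyone.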



- We label n the length of a punishment.
- To understand the condition above notice that, since $1-\delta^n = (1-\delta)(1 + \delta + \ldots + \delta^{n-1})$, for δ close to 1:

$$(1-\delta^n) \simeq (1-\delta)\,n.$$

- Consider now the following strategy profile.



1. The play starts in Phase I.

   Phase I: play the action profile a (with g(a) = v).

2. The play remains in Phase I so long as in each period:
   - either the realized action profile is a,
   - or the realized action profile differs from a in two or more components.

3. If a single player j deviates from a then the play moves to Phase II_j.

   Phase II_j: play $m^j$ each period.

4. The play stays in Phase II_j for n periods so long as in each period:
   - either the realized action profile is $m^j$,
   - or the realized action profile differs from $m^j$ in two or more components.

5. After n subsequent periods in Phase II_j the play switches to Phase III_j.

   Phase III_j: play a(j).

6. If during Phase II_j a single player i’s action differs from $m^j_i$, begin Phase II_i.

7. The play stays in Phase III_j so long as in each period:
   - either the realized action profile is a(j),
   - or the realized action profile differs from a(j) in two or more components.

8. If during Phase III_j a single player i’s action differs from $a_i(j)$ then begin Phase II_i.

- Using the one-deviation principle we now check that no player has an incentive to deviate from the prescribed action in any subgame.
- Clearly each phase corresponds to a different type of proper subgame.
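The phase structure described above can be read as a small state machine. The following sketch encodes the transition rule only; the state encoding, the function names and the dictionaries `m` and `a_reward` (holding $m^j$ and a(j)) are assumptions made for illustration.

```python
def deviators(realized, prescribed):
    """Indices of the players whose realized action differs from the prescription."""
    return [i for i, (x, y) in enumerate(zip(realized, prescribed)) if x != y]


def next_state(state, realized, a, m, a_reward, n):
    """One-period transition of the three-phase strategy profile.

    state: ("I",), ("II", j, k) with k punishment periods left, or ("III", j).
    a: target profile; m[j]: profile minmaxing player j; a_reward[j]: profile a(j).
    """
    phase = state[0]
    if phase == "I":
        prescribed = a
    elif phase == "II":
        prescribed = m[state[1]]
    else:
        prescribed = a_reward[state[1]]

    dev = deviators(realized, prescribed)
    if len(dev) == 1:
        # A single deviator i always triggers a fresh Phase II_i of length n.
        return ("II", dev[0], n)
    if phase == "II":
        # No deviation, or a multilateral one: keep counting down the punishment.
        j, k = state[1], state[2]
        return ("II", j, k - 1) if k > 1 else ("III", j)
    # Phases I and III_j persist unless a single player deviates.
    return state
```

For instance, starting from `("I",)` a unilateral deviation by player 1 (index 1) leads to `("II", 1, n)`, and n conforming periods later the state becomes `("III", 1)`.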



Consider Phase I.

- By conforming player i receives payoff $v_i$, while by deviating he cannot receive a payoff higher than:

$$\hat{\pi}_i = (1-\delta)\max_a g_i(a) + \delta\left[(1-\delta^n)\,\underline{\pi}_i + \delta^n v'_i\right]$$

- Since by construction $v_i > v'_i$, for δ sufficiently close to 1: $\hat{\pi}_i < v_i$.
- Notice indeed that if δ = 0 then $\hat{\pi}_i = \max_a g_i(a)$ and if δ = 1 then $\hat{\pi}_i = v'_i$.
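The behavior of the deviation bound as δ varies can be explored numerically. This is a small sketch under hypothetical parameter values (maximum stage payoff 2, minmax payoff 0, v′ = 0.5, n = 7); none of these numbers is taken from the proof itself.

```python
def deviation_bound(delta, n, max_g, minmax, v_prime):
    """Upper bound on the payoff from a one-shot deviation.

    The deviator gets at most max_g today, is minmaxed for n periods,
    and receives v_prime forever after.
    """
    punishment_then_reward = (1 - delta**n) * minmax + delta**n * v_prime
    return (1 - delta) * max_g + delta * punishment_then_reward
```

At δ = 0 the bound equals the best one-shot gain and at δ = 1 it equals v′, so for any target payoff above v′ the bound eventually falls below the target as δ approaches 1.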



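Consider Phase III_j, j ≠ i.

- By conforming player i receives payoff $v'_i + \varepsilon$, while by deviating he triggers Phase II_i and cannot receive more than:

$$\hat{\pi}_i = (1-\delta)\max_a g_i(a) + \delta\left[(1-\delta^n)\,\underline{\pi}_i + \delta^n v'_i\right]$$

- Payoff $\hat{\pi}_i < v'_i + \varepsilon$ for δ sufficiently close to 1.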



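Consider Phase III_i.

- By conforming player i receives payoff $v'_i$, while by deviating he cannot receive more than:

$$\hat{\pi}_i = (1-\delta)\max_a g_i(a) + \delta\left[(1-\delta^n)\,\underline{\pi}_i + \delta^n v'_i\right]$$

- Indeed we need:

$$v'_i > (1-\delta)\max_a g_i(a) + \delta\left[(1-\delta^n)\,\underline{\pi}_i + \delta^n v'_i\right]$$

- That can be re-written as:

$$(1-\delta^{n+1})\,v'_i > (1-\delta)\max_a g_i(a) + \delta\,(1-\delta^n)\,\underline{\pi}_i$$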



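- Using the approximation $(1-\delta^n) \simeq (1-\delta)\,n$ and dividing both sides by (1 − δ) we get:

$$(n+1)\,v'_i > \max_a g_i(a) + \delta\,n\,\underline{\pi}_i$$

- Since $v'_i > \min_a g_i(a)$ and δ < 1, the following is a sufficient condition for the inequality above:

$$\min_a g_i(a) + n\,v'_i > \max_a g_i(a) + n\,\underline{\pi}_i$$

- Clearly, from the definition of n, for δ sufficiently close to 1 player i’s payoff from deviating, $\hat{\pi}_i$, satisfies:

$$v'_i > \hat{\pi}_i$$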



Consider Phase II_j, j ≠ i.

- If n′ periods remain in Phase II_j, player i’s payoff by conforming is

$$u_i = \left(1-\delta^{n'}\right) w^j_i + \delta^{n'}\,(v'_i + \varepsilon)$$

- while by deviating he cannot obtain more than:

$$\hat{\pi}_i = (1-\delta)\max_a g_i(a) + \delta\left[(1-\delta^n)\,\underline{\pi}_i + \delta^n v'_i\right]$$

- Notice that for δ sufficiently close to 1

$$u_i > \hat{\pi}_i$$



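Finally consider Phase II_i.

- If n′ < n periods remain in Phase II_i, player i’s payoff by conforming is

$$u'_i = \left(1-\delta^{n'}\right)\underline{\pi}_i + \delta^{n'} v'_i$$

- while by deviating:

$$\pi'_i = (1-\delta^n)\,\underline{\pi}_i + \delta^n v'_i$$

- Clearly $u'_i > \pi'_i$, since n′ < n and $v'_i > \underline{\pi}_i$.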


Application of Repeated Games: Cartels

- Consider two firms repeatedly involved in a Cournot duopoly for an infinite number of periods.
- Both firms produce a perfectly homogeneous good with cost functions:

$$c(q_i) = c\,q_i \quad \forall i \in \{1, 2\}$$

- and inverse demand function:

$$P(q_1 + q_2) = a - (q_1 + q_2)$$

where c < a.


Application of Repeated Games: Cartels (cont’d)

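- The two firms’ profit functions are:

$$\Pi_1(q_1, q_2) = q_1\,[a - (q_1 + q_2) - c]$$

$$\Pi_2(q_1, q_2) = q_2\,[a - (q_1 + q_2) - c]$$

- The stage game equilibrium choices $(q_1, q_2)$ solve:

$$\max_{q_1 \in \mathbb{R}_+} q_1\,[a - (q_1 + q_2) - c]$$

$$\max_{q_2 \in \mathbb{R}_+} q_2\,[a - (q_1 + q_2) - c]$$

- whose first order conditions yield the following system of best replies:

$$q^c_1 = \frac{1}{2}\,(a - q^c_2 - c), \qquad q^c_2 = \frac{1}{2}\,(a - q^c_1 - c).$$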



- The solution is:

$$q^c_1 = q^c_2 = \frac{a-c}{3}$$

- with profits:

$$\pi^c_1 = \pi^c_2 = \frac{(a-c)^2}{9}$$

- Consider now a single firm that is a monopolist in this market and produces a quantity Q.
- This firm’s profit maximization problem is:

$$\max_{Q \in \mathbb{R}_+} Q\,[a - Q - c]$$



- The first order condition is then:

$$a - 2Q - c = 0$$

- which gives the monopolist’s quantity and profit:

$$Q^m = \frac{a-c}{2}, \qquad \Pi^m = \frac{(a-c)^2}{4}$$

- Assume now that the two firms, without any explicit deal, each decide to produce half of the monopolist’s quantity:

$$q^m = \frac{1}{2}\,Q^m = \frac{a-c}{4}$$



- Each firm’s profit in this case is:

$$\pi^m_1 = \pi^m_2 = \frac{(a-c)^2}{8}$$

- Notice that clearly:

$$\pi^c_i = \frac{(a-c)^2}{9} < \pi^m_i = \frac{(a-c)^2}{8}$$

- Both firms are therefore better off when each produces $q^m$ than when each produces $q^c_i$.
- However, if one of the firms, say firm 1, produces quantity

$$q^m_1 = \frac{a-c}{4}$$

then firm 2 can gain by choosing a different quantity.



- In particular, if firm 2 chooses the quantity:

$$\bar{q}_2 = \frac{a - q^m - c}{2} = \frac{3\,(a-c)}{8}$$

- then firm 2’s profit is:

$$\bar{\pi}_2 = \frac{9\,(a-c)^2}{64}$$

- which clearly is:

$$\bar{\pi}_2 = \frac{9\,(a-c)^2}{64} > \pi^m_2 = \frac{(a-c)^2}{8}$$

- This is the reason why both firms choosing $(q^m_1, q^m_2)$ is not a Nash equilibrium of the Cournot model.
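The quantities and profits derived above can be reproduced with exact arithmetic. The sketch below assumes the illustrative parameter values a = 10 and c = 2 (so a − c = 8); the function names are not from the lecture.

```python
from fractions import Fraction


def profit(q_own, q_other, a, c):
    """Cournot profit with inverse demand P = a - (q_own + q_other) and unit cost c."""
    return q_own * (a - (q_own + q_other) - c)


def best_reply(q_other, a, c):
    """Best reply from the first order condition, truncated at zero."""
    return max(Fraction(0), Fraction(a - q_other - c, 2))


def cournot_quantity(a, c):
    return Fraction(a - c, 3)


def collusive_quantity(a, c):
    """Half of the monopoly quantity (a - c) / 2."""
    return Fraction(a - c, 4)


A, C = 10, 2  # hypothetical demand intercept and marginal cost
q_c = cournot_quantity(A, C)
q_m = collusive_quantity(A, C)
q_bar = best_reply(q_m, A, C)  # optimal deviation against the collusive quantity
```

With these numbers `q_bar` equals 3 = 3(a − c)/8 and the deviation profit is 9 = 9(a − c)²/64, above the collusive profit of 8 = (a − c)²/8.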



- Assume however that the two firms compete for an infinite number of periods.
- Consider the following strategies:
- Firm 1:
  - choose quantity $q^m_1$ in the first period;
  - in every subsequent period choose quantity $q^m_1$ if the observed outcome in the previous period is $(q^m_1, q^m_2)$;
  - in every subsequent period choose quantity $q^c_1$ if in the previous period you observe that firm 2 chose quantity $\bar{q}_2$;



- Firm 2:
  - choose quantity $q^m_2$ in the first period;
  - in every subsequent period choose quantity $q^m_2$ if the observed outcome in the previous period is $(q^m_1, q^m_2)$;
  - in every subsequent period choose quantity $q^c_2$ if in the previous period you observe that firm 1 chose quantity $\bar{q}_1$;
- Recall that the average discounted payoff of each firm is:

$$(1-\delta)\sum_{t=0}^{\infty} \delta^{t}\,\pi_i(t)$$



- These strategies do not require an explicit agreement between the two firms, provided each firm believes the other firm behaves this way.
- Question: for which δ does neither firm want to deviate from these strategies?
- Consider firm i:

$$\pi^m_i \ge (1-\delta)\,\bar{\pi}_i + \delta\,\pi^c_i$$

or

$$\frac{(a-c)^2}{8} \ge (1-\delta)\,\frac{9\,(a-c)^2}{64} + \delta\,\frac{(a-c)^2}{9}$$



- which is satisfied if and only if:

$$\delta \ge \frac{9}{17}$$

- Moreover no firm has an incentive to deviate from the punishment strategies since $(q^c_1, q^c_2)$ is a Nash equilibrium of the Cournot stage game.
- Therefore the cartel behaviour described by the strategies above is a subgame perfect equilibrium of the infinitely repeated game if and only if δ ≥ 9/17.
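The threshold 9/17 can be verified by solving the incentive constraint for δ with exact fractions. In the sketch below profits are expressed in units of (a − c)²; the function name is an illustrative assumption.

```python
from fractions import Fraction


def critical_discount_factor(pi_collude, pi_deviate, pi_punish):
    """Smallest delta with pi_collude >= (1 - delta) * pi_deviate + delta * pi_punish."""
    return Fraction(pi_deviate - pi_collude) / Fraction(pi_deviate - pi_punish)


# Profits in units of (a - c)^2: collusion 1/8, optimal deviation 9/64, Cournot 1/9.
delta_star = critical_discount_factor(Fraction(1, 8), Fraction(9, 64), Fraction(1, 9))
```

Rearranging the constraint gives δ(π̄ − π^c) ≥ π̄ − π^m, and plugging in the three profit levels yields `delta_star == Fraction(9, 17)`.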


Subgame Perfect Folk Theorem: Comment

- Notice that in the theory of repeated games there does not exist a commonly accepted theory predicting that the players will play an equilibrium whose payoff is on the Pareto frontier of $\mathcal{V}$.
- In other words, nothing guarantees that the outcome will be on the Pareto frontier.


Subgame Perfect Folk Theorem: Comment (cont’d)

Indeed:

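[Figure: feasible payoffs in the (Π1, Π2) plane with vertices (0, 2), (0, 0), (−1, −1) and (2, 1); the minmax levels $\underline{\pi}_1$ and $\underline{\pi}_2$ delimit the region $\mathcal{V}$.]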


Subgame Perfect Folk Theorem: Comment (cont’d)

Overall the Folk Theorem warns us to use caution when arguing that the best way of making predictions in a strategic setting is by using Nash and even subgame perfect equilibria.

- Office hours in the next two weeks:

Tuesday, Dec. 5 and 12: 11:00-13:00

- Exam scheduled for:

Thursday, January 4, 2018 at 14:30

