TRANSCRIPT
Evolution and Repeated Games
D. Fudenberg (Harvard)
E. Maskin (IAS, Princeton)
2
Theory of repeated games is important:
• central model for explaining how self-interested agents can cooperate
• used in economics, biology, political science, and other fields
            C       D
Cooperate  2,2    -1,3
Defect     3,-1    0,0
3
But theory has a serious flaw:
• although cooperative behavior possible, so is uncooperative behavior (and everything in between)
• theory doesn’t favor one behavior over another
• theory doesn’t make sharp predictions
4
Evolution (biological or cultural) can promote efficiency
• might hope that uncooperative behavior will be “weeded out”
• this view expressed in Axelrod (1984)
5
Basic idea:
• Start with a population playing the repeated-game strategy Always D
• Consider a small group of mutants using Conditional C (play C until someone plays D, thereafter play D)
  – does essentially the same against Always D as Always D does against itself
  – does much better against Conditional C than Always D does
• Thus Conditional C will invade Always D
• uncooperative behavior driven out
            C       D
Cooperate  2,2    -1,3
Defect     3,-1    0,0
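The invasion argument on this slide can be checked with a short simulation; this is a minimal sketch assuming the payoff matrix shown (the function names `always_d` and `conditional_c` are illustrative, not from the paper).

```python
# Prisoner's Dilemma payoffs from the slides:
# (C,C)->(2,2), (C,D)->(-1,3), (D,C)->(3,-1), (D,D)->(0,0)
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def always_d(my_hist, their_hist):
    return 'D'

def conditional_c(my_hist, their_hist):
    # Play C until someone plays D, thereafter play D
    return 'D' if ('D' in my_hist or 'D' in their_hist) else 'C'

def average_payoffs(s1, s2, periods=100):
    h1, h2, tot1, tot2 = [], [], 0, 0
    for _ in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        tot1 += p1; tot2 += p2
        h1.append(a1); h2.append(a2)
    return tot1 / periods, tot2 / periods

print(average_payoffs(always_d, always_d))            # (0.0, 0.0)
print(average_payoffs(conditional_c, always_d))       # (-0.01, 0.03)
print(average_payoffs(conditional_c, conditional_c))  # (2.0, 2.0)
```

Against a population of Always D the mutant sacrifices essentially nothing (one period of −1), while mutants who meet each other earn 2 per period, so Conditional C invades.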
6
But consider ALT: alternate between C and D until the pattern is broken, thereafter play D
• ALT can’t be invaded by some other strategy
  – the other strategy would have to alternate too, or else would do much worse against ALT than ALT does against itself
• Thus ALT is “evolutionarily stable”
• But ALT is quite inefficient (average payoff 1)

       C       D
C     2,2    -1,3
D     3,-1    0,0
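A quick way to see the average payoff of 1: two copies of ALT play (C,C), (D,D), (C,C), ..., so per-period payoffs alternate 2, 0, 2, 0. A minimal sketch (function names are illustrative):

```python
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def alt(my_hist, their_hist):
    # Alternate C, D, C, D, ... until either player breaks the pattern, then D forever
    for t, (a1, a2) in enumerate(zip(my_hist, their_hist)):
        want = 'C' if t % 2 == 0 else 'D'
        if a1 != want or a2 != want:
            return 'D'
    return 'C' if len(my_hist) % 2 == 0 else 'D'

def average_payoffs(s1, s2, periods=100):
    h1, h2, tot1, tot2 = [], [], 0, 0
    for _ in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a1, a2)]
        tot1 += p1; tot2 += p2
        h1.append(a1); h2.append(a2)
    return tot1 / periods, tot2 / periods

print(average_payoffs(alt, alt))  # (1.0, 1.0): inefficient relative to (2, 2)
# A non-alternating strategy (e.g. always C) fares much worse against ALT:
print(average_payoffs(lambda my, their: 'C', alt))  # (-0.97, 2.99)
```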
7
• Still, ALT is highly inflexible
  – relies on perfect alternation
  – if the pattern is broken, get D forever
• What if there is a (small) probability of a mistake in execution?
8
• Consider a mutant strategy s′, identical to ALT except if (by mistake) the alternating pattern is broken:
  – s′ “signals” an intention to cooperate by playing C in the following period
  – if the other strategy plays C too, s′ plays C forever
  – if the other strategy plays D, s′ plays D forever (identical to ALT before the pattern is broken)
• s′ and ALT each get about 0 against ALT after the pattern is broken
• s′ gets 2 against s′ after the pattern is broken; ALT gets about 0 against s′
• so s′ invades ALT

       C       D
C     2,2    -1,3
D     3,-1    0,0
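The comparison on this slide can be made concrete by injecting a single mistake and comparing continuation payoffs; a sketch, with a hypothetical implementation of the mutant:

```python
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def break_period(h1, h2):
    # First period at which the alternating C, D, C, D, ... pattern broke, else None
    for t, (a1, a2) in enumerate(zip(h1, h2)):
        want = 'C' if t % 2 == 0 else 'D'
        if a1 != want or a2 != want:
            return t
    return None

def alt(my, their):
    if break_period(my, their) is not None:
        return 'D'
    return 'C' if len(my) % 2 == 0 else 'D'

def mutant(my, their):
    b = break_period(my, their)
    if b is None:                        # pattern intact: behave exactly like ALT
        return 'C' if len(my) % 2 == 0 else 'D'
    if len(my) == b + 1:
        return 'C'                       # signal intention to cooperate
    return 'C' if their[b + 1] == 'C' else 'D'   # cooperate iff the signal was answered

def play(s1, s2, periods=100, mistakes=None):
    # mistakes: {period: (player, action)} -- forced deviations modeling execution errors
    h1, h2, tot1, tot2 = [], [], 0, 0
    for t in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        if mistakes and t in mistakes:
            who, act = mistakes[t]
            if who == 1: a1 = act
            else: a2 = act
        p1, p2 = PAYOFF[(a1, a2)]
        tot1 += p1; tot2 += p2
        h1.append(a1); h2.append(a2)
    return tot1 / periods, tot2 / periods

err = {2: (1, 'D')}   # player 1 mistakenly plays D in period 2
print(play(alt, alt, 100, err))        # (0.05, 0.01): about 0 after the break
print(play(mutant, alt, 100, err))     # (0.04, 0.04): also about 0
print(play(mutant, mutant, 100, err))  # (1.99, 1.95): cooperation resumes, about 2
```

The three runs reproduce the slide's comparison: after a mistake, ALT condemns both players to 0, while two mutants signal, re-coordinate, and return to payoff 2.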
9
Main results in paper (for 2-player symmetric repeated games)

(1) If s is evolutionarily stable and
  – discount rate r is small (the future is important)
  – mistake probability p is small (but p > 0)
then s is (almost) “efficient”

(2) If payoffs (v, v) are “efficient”, then there exists an ES strategy s (almost) attaining (v, v), provided
  – r small
  – p small relative to r

• generalizes Fudenberg–Maskin (1990), in which r = p = 0
10
Finite symmetric 2-player game g: A × A → R²
• symmetric: if g(a_1, a_2) = (v_1, v_2), then g(a_2, a_1) = (v_2, v_1)
• normalize payoffs so that the minmax value is 0:

  min_{a_2} max_{a_1} g_1(a_1, a_2) = 0

• V = convex hull of { g(a_1, a_2) : (a_1, a_2) ∈ A × A }
11
• (v_1, v_2) ∈ V is strongly efficient if v_1 + v_2 = w*, where

  w* = max_{(v_1, v_2) ∈ V} (v_1 + v_2)

• example:

   0,0    1,2
   2,1    0,0

  here the pairs with v_1 + v_2 = 3, i.e. (1,2) and (2,1), are strongly efficient
• in the Prisoner's Dilemma

       C       D
  C   2,2    -1,3
  D   3,-1    0,0

  (2,2) is the unique strongly efficient pair
12
Repeated game: g repeated infinitely many times
• period-t history: h^{t−1} = (a(1), …, a(t−1)), where a(τ) = (a_1(τ), a_2(τ))
• H = set of all histories
• repeated-game strategy s: H → A
  – assume finitely complex (playable by a finite computer)
• in each period, probability p that player i makes a mistake
  – chooses an action at random (equal probabilities for all actions)
  – mistakes independent across players
13
Normalized discounted payoffs, with discount rate r and mistake probability p:

U^{r,p}(s_1, s_2) = E[ (r/(1+r)) Σ_{t=1}^{∞} (1/(1+r))^{t−1} g_1(a_1(t), a_2(t)) | s_1, s_2, p ]

and, conditional on a history h,

U^{r,p}(s_1, s_2 | h) = E[ (r/(1+r)) Σ_{t=1}^{∞} (1/(1+r))^{t−1} g_1(a_1(t), a_2(t)) | s_1, s_2, p, h ]

(the payoff to the player using s_1, normalized so that a payoff of v every period has value v)
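The normalized discounted sum can be evaluated numerically for simple payoff streams; a sketch (truncating the infinite sum, with illustrative streams):

```python
def discounted_value(stream, r):
    # U = (r/(1+r)) * sum_{t>=1} (1/(1+r))^(t-1) * g_t, truncated at len(stream)
    delta = 1.0 / (1.0 + r)
    return (r / (1.0 + r)) * sum(g * delta**t for t, g in enumerate(stream))

r = 0.01
coop = [2] * 5000             # mutual cooperation: payoff 2 every period
alternating = [2, 0] * 2500   # ALT vs ALT: (C,C), (D,D), (C,C), ...

print(round(discounted_value(coop, r), 3))         # 2.0
print(round(discounted_value(alternating, r), 3))  # 1.005
```

The normalization makes a constant stream of 2 worth exactly 2; as r → 0 the weights spread evenly across periods, so the alternating stream's value approaches the average payoff 1 cited earlier.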
14
• informally, s is evolutionarily stable (ES) if no mutant s′ can invade a population with a big proportion playing s and a small proportion playing s′
• formally, s is ES w.r.t. (q̄, r, p) if, for all s′ ≠ s and all q ≤ q̄,

  (1−q) U^{r,p}(s, s) + q U^{r,p}(s, s′) ≥ (1−q) U^{r,p}(s′, s) + q U^{r,p}(s′, s′)

• evolutionary stability
  – expressed statically here
  – but can be given a precise dynamic meaning
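The ES inequality can be evaluated directly once the four match-up payoffs are known; a sketch using limiting-average payoffs (r → 0, no mistakes) for two illustrative match-ups from the earlier slides:

```python
def mutant_invades(U, q):
    # U: long-run payoffs, keys ('s','s'), ('s','m'), ('m','s'), ('m','m')
    incumbent_fitness = (1 - q) * U[('s', 's')] + q * U[('s', 'm')]
    mutant_fitness = (1 - q) * U[('m', 's')] + q * U[('m', 'm')]
    return mutant_fitness > incumbent_fitness

# Incumbent Always D, mutant Conditional C: the mutant matches the incumbent
# against Always D and earns 2 against itself, so it invades.
U1 = {('s', 's'): 0, ('s', 'm'): 0, ('m', 's'): 0, ('m', 'm'): 2}
print(mutant_invades(U1, q=0.05))   # True

# Incumbent ALT, mutant Always D: breaking the pattern forfeits ALT's average
# payoff of 1, so the mutant cannot invade.
U2 = {('s', 's'): 1, ('s', 'm'): 0, ('m', 's'): 0, ('m', 'm'): 0}
print(mutant_invades(U2, q=0.05))   # False
```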
15
• population of b players
• suppose time is measured in “epochs” T = 1, 2, …
• s_T = strategy state in epoch T
  – most players in population use s_T
• group of mutants (of size a) plays s′
  – a drawn randomly from {1, 2, …, ā}, where ā/b ≤ q̄
  – s′ drawn randomly from finitely complex strategies
• M random drawings of pairs of players
  – each pair plays the repeated game
• s_{T+1} = strategy with highest average score
16
Theorem 1: For any q̄, p, r, and ε > 0, there exists T̄ such that, for all T ≥ T̄, there exists M̄(T) such that, for all M ≥ M̄(T):

(i) if s is not ES, then Pr{ s_t = s for all t > T | s_T = s } < ε

(ii) if s is ES, then Pr{ s_t = s for all t > T | s_T = s } > 1 − ε
17
Let v* = min { v : (v, v′) is strongly efficient for some v′ }

Theorem 2: Given ε > 0 and q̄ > 0, there exist r̄ > 0 and p̄ > 0 such that, for all r ∈ (0, r̄] and p ∈ (0, p̄], if s is ES w.r.t. (q̄, r, p), then

  U^{r,p}(s, s | h) ≥ v* − ε for all h.
18
Examples:

   0,0    1,2
   2,1    0,0

v* = 1, so U^{r,p}(s, s | h) ≥ 1 − ε

       C       D
  C   2,2    -1,3
  D   3,-1    0,0

v* = 2, so U^{r,p}(s, s | h) ≥ 2 − ε
19
Proof:
• Suppose U^{r,p}(s, s | h) < v* − ε for some h
• will construct a mutant s′ that can invade s
• let h* = argmin_h U^{r,p}(s, s | h)
• if s = ALT, h* = any history for which the alternating pattern is broken
20
Construct s′ so that
• if h is not a continuation of h*, then s′(h) = s(h)
• after h*, strategy s′
  – “signals” willingness to cooperate by playing differently from s for 1 period (assume s is a pure strategy)
  – if the other player responds positively, plays strongly efficiently thereafter
  – if not, plays according to s thereafter
• after h* followed by the other player's one-period deviation from s, strategy s′
  – responds positively if the other strategy has signaled, and thereafter plays strongly efficiently
  – plays according to s otherwise
21
• because h* is already the worst history, s′ loses for only 1 period by signaling (small loss if r small)
• if p is small, the probability that s′ “misreads” the other player's intention is small
• hence, s′ does nearly as well against s as s does against itself (even after h*)
• s′ does very well against itself (strong efficiency) after h*:

  U^{r,p}(s′, s′ | h*) ≈ w*/2 > U^{r,p}(s, s | h*)
22
• remains to check how well s does against s′
• by definition of h*, U^{r,p}(s, s | h*) ≤ U^{r,p}(s, s | h) for all h
• ignoring the effect of p: after the deviation (the signal) by s′, the punishment is started again, and so

  U^{r,p}(s, s′ | h*) ≈ U^{r,p}(s, s | h*) < v* − ε

  Hence

  U^{r,p}(s, s′ | h*) ≤ U^{r,p}(s′, s′ | h*) − w̃ for some w̃ > 0

• so s does appreciably worse against s′ than s′ does against s′
23
• Summing up, for q ∈ (0, q̄] we have

  (1−q) U^{r,p}(s′, s) + q U^{r,p}(s′, s′) > (1−q) U^{r,p}(s, s) + q U^{r,p}(s, s′)

• so s is not ES
24
• Theorem 2 implies, for the Prisoner's Dilemma, that for any ε > 0,

  U^{r,p}(s, s | h) ≥ 2 − ε for r and p small

• doesn't rule out punishments of arbitrary (finite) length
25
• Consider strategy s with “cooperative” and “punishment” phases
  – in the cooperative phase, play C
  – stay in the cooperative phase until one player plays D, in which case go to the punishment phase
  – in the punishment phase, play D
  – stay in the punishment phase for m periods (and then go back to the cooperative phase) unless at some point some player chooses C, in which case restart the punishment
• For any m,

  U^{r,p}(s, s | h) → 2 (efficiency) as r → 0 and p → 0
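The phase strategy above can be sketched as a small state machine; a minimal sketch assuming the slides' Prisoner's Dilemma payoffs (constructor and helper names are illustrative):

```python
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def make_phase_strategy(m):
    # Cooperative phase: play C. Any observed D starts a punishment phase of
    # m periods of D; a C observed during punishment restarts the clock.
    state = {'punish_left': 0}
    def act(my, their):
        if my:  # update phase from last period's outcome
            last = (my[-1], their[-1])
            if state['punish_left'] == 0:
                if 'D' in last:
                    state['punish_left'] = m
            else:
                state['punish_left'] -= 1
                if 'C' in last:
                    state['punish_left'] = m   # restart punishment
        return 'D' if state['punish_left'] > 0 else 'C'
    return act

def play(s1, s2, periods, mistakes=None):
    h1, h2, tot1, tot2 = [], [], 0, 0
    for t in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        if mistakes and t in mistakes:
            who, act = mistakes[t]
            if who == 1: a1 = act
            else: a2 = act
        p1, p2 = PAYOFF[(a1, a2)]
        tot1 += p1; tot2 += p2
        h1.append(a1); h2.append(a2)
    return tot1 / periods, tot2 / periods

# One mistaken D in period 2 costs m = 2 punishment periods, then cooperation
# resumes, so long-run averages stay near the efficient payoff 2:
print(play(make_phase_strategy(2), make_phase_strategy(2), 1000, {2: (1, 'D')}))
# (1.997, 1.993)
```

With any fixed m, a rare mistake costs only finitely many periods before cooperation resumes, which is why the value tends to 2 as r, p → 0.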
26
Can sharpen Theorem 2 for the Prisoner's Dilemma:

Given q̄, there exist r̄ > 0 and p̄ > 0 such that, for all r ∈ (0, r̄] and p ∈ (0, p̄], if s is ES w.r.t. (q̄, r, p), then it cannot entail a punishment lasting more than

  (3 − 2q̄) / (2q̄) periods

Proof: very similar to that of Theorem 2
27
For r and p too big, an ES strategy s may not be “efficient”
• if p ≥ 1/2, then even fully cooperative strategies in the Prisoner's Dilemma generate payoffs of at most 1.5
• if r is large, we are effectively back in the one-shot case
28
Theorem 3: Let (v, v) ∈ V with v > 0.
For all ε > 0, there exist q̄ > 0 and r̄ > 0 such that, for all r ∈ (0, r̄], there exists p̄(r) > 0 such that, for all p ∈ (0, p̄(r)], there exists s, ES w.r.t. (q̄, r, p), for which

  U^{r,p}(s, s) ≥ v − ε
29
Proof: Construct s so that
• along the equilibrium path of (s, s), payoffs are (approximately) (v, v)
• punishments are nearly strongly efficient
  – deviating player (say 1) is minimaxed long enough to wipe out the gain
  – thereafter go to a strongly efficient point
  – overall payoffs after a deviation: approximately (v̂, w* − v̂), with v̂ < v for the deviator
• if r and p are small, (s, s) is a subgame-perfect equilibrium
30
• In the Prisoner's Dilemma, consider s that
  – plays C the first period
  – thereafter, plays C if and only if either both players played C the previous period or neither did
• strategy s
  – is efficient
  – entails punishments that are as short as possible
  – is a modification of Tit-for-Tat (C the first period; thereafter, do what the other player did the previous period)
• Tit-for-Tat is not ES
  – if a mistake (D, C) occurs, we get a wave of alternating punishments (C, D), (D, C), (C, D), … until another mistake is made
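The alternating punishment wave, and its absence under the modified strategy, show up directly in simulation; a sketch with one injected mistake (helper names are illustrative):

```python
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def tit_for_tat(my, their):
    return their[-1] if their else 'C'

def modified_tft(my, their):
    # C the first period; thereafter C iff both players took the same action
    if not my:
        return 'C'
    return 'C' if my[-1] == their[-1] else 'D'

def play(s1, s2, periods, mistakes=None):
    h1, h2, tot1, tot2 = [], [], 0, 0
    for t in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        if mistakes and t in mistakes:
            who, act = mistakes[t]
            if who == 1: a1 = act
            else: a2 = act
        p1, p2 = PAYOFF[(a1, a2)]
        tot1 += p1; tot2 += p2
        h1.append(a1); h2.append(a2)
    return tot1 / periods, tot2 / periods

err = {2: (1, 'D')}   # a single mistaken D in period 2
# Tit-for-Tat: the mistake echoes forever as (C,D), (D,C), (C,D), ...
print(play(tit_for_tat, tit_for_tat, 100, err))    # (1.02, 1.02)
# Modified Tit-for-Tat: one period of (D,D), then cooperation resumes
print(play(modified_tft, modified_tft, 100, err))  # (1.99, 1.95)
```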
31
• Let s = play d as long as, in all past periods, either
  – both players played d, or
  – neither played d
• if a single player deviates from d
  – henceforth, that player plays b
  – the other player plays a
• s is ES even though inefficient
  – any attempt to improve on efficiency is punished forever
  – can't invade during the punishment, because the punishment is efficient

modified battle of the sexes:

       a      b      c      d
  a   0,0    4,1    0,0    0,0
  b   1,4    0,0    0,0    0,0
  c   0,0    0,0    0,0    0,0
  d   0,0    0,0    0,0    2,2
32
Consider a potential invader s′.
• For any h, s′ cannot do better against s than s does against itself, since (s, s) is an equilibrium; hence, for all h,

  U^{r,p}(s′, s | h) ≤ U^{r,p}(s, s | h)    (*)

• For s′ to invade, we need

  U^{r,p}(s′, s′ | h′) > U^{r,p}(s, s′ | h′) for some h′    (**)

  (otherwise s′ can't invade)
• Claim: (**) implies h′ involves a deviation from the equilibrium path of (s, s)
  – the only other possibility is that s′ differs from s on the equilibrium path
  – but then s′ is punished by s, and (*) makes the inequality (**) infeasible
• We thus have a deviation after h′, and the ensuing punishment payoffs are strongly efficient (they sum to w*): any gain s′ obtains against itself comes entirely at the other copy's expense
• Hence (**) cannot hold: s′ cannot invade, and s is ES
33
For Theorem 3 to hold, p must be small relative to r
• consider modified Tit-for-Tat against itself (play C if and only if both players took the same action last period)
• with every mistake, there is an expected loss of
  2 − (½ · 3 + ½ · (−1)) = 1 in the first period
  2 − 0 = 2 in the second period
• so overall the expected loss from mistakes is approximately 3(1+r)p/r
• by contrast, a mutant strategy that signals, etc., and doesn't punish at all against itself loses only about (1+r)p/r
• so if r is small enough relative to p, the mutant can invade
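The per-mistake losses quoted above can be checked directly; this sketch assumes the stage-game payoffs from the talk's Prisoner's Dilemma.

```python
# Stage-game payoffs: (C,C) gives 2 each; a mistaken (D,C) gives 3 to the
# deviator and -1 to the other player; (D,D) gives 0 each.

# First period after a mistake: each player is equally likely to be the
# deviator, so the expected per-player loss relative to (C,C) is:
first_period_loss = 2 - (0.5 * 3 + 0.5 * (-1))
# Second period: modified Tit-for-Tat has both players punish with (D,D):
second_period_loss = 2 - 0

print(first_period_loss, second_period_loss)  # 1.0 2
```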