
EVOLUTIONARY GAME THEORY

Heinrich H. Nax

www.nax.science

Dec 2, 2019 Agent-Based Modeling and Social System Simulation (Fall Semester 2019)


Lecture 6: Evolutionary game theory

Common knowledge of rationality and the game

Suppose that players are rational decision makers and that mutual rationality is common knowledge, that is:

I know that she knows that I will play rationally

She knows that “I know that she knows that I will play rationally”

I know that “She knows that “I know that she knows that I will play rationally””

...

Further suppose that all players know the game and that this again is common knowledge.


Rationality and the “as if” approach

The rationalistic paradigm in economics (Savage, The Foundations of Statistics, 1954)

A person’s behavior is based on maximizing some goal function (utility) under given constraints and information

The “as if” approach (Friedman, The methodology of positive economics, 1953)

Do not theorize about the intentions of agents’ actions but consider only the outcome (observables)

Similar to the natural sciences where a model is seen as an approximation of reality rather than a causal explanation (e.g., Newton’s laws)

But is the claim right? Do people act as if they were rational?


Nash’s mass-action interpretation (Nash, PhD thesis, 1950)

“We shall now take up the “mass-action” interpretation of equilibrium points. In this interpretation solutions have no great significance. It is unnecessary to assume that the participants have full knowledge of the total structure of the game, or the ability and inclination to go through any complex reasoning processes. But the participants are supposed to accumulate empirical information on the relative advantages of the various pure strategies at their disposal.

...

Thus the assumption we made in this “mass-action” interpretation lead to the conclusion that the mixed strategies representing the average behavior in each of the populations form an equilibrium.”

(bold text added for this presentation)


Nash’s mass-action interpretation (Nash, PhD thesis, 1950)

A large population of identical individuals represents each player role in a game

The game is played recurrently (t = 0, 1, 2, 3, ...):

In each period one individual from each player population is drawn randomly to play the game

Individuals observe samples of earlier behaviors in their own population and avoid suboptimal play (successful strategies are copied more frequently)

Nash’s claim: If all individuals avoid suboptimal pure strategies and the population distribution is stationary then it constitutes a [Nash] equilibrium

Almost true! Evolutionary game theory formalizes these questions and provides answers.


The folk theorem of evolutionary game theory

Folk theorem

If the population process converges from an interior initial state, then for large t the distribution is a Nash equilibrium

If a stationary population distribution is stable, then it coincides with a Nash equilibrium

Charles Darwin: “Survival of the fittest”
The population which is best adapted to the environment (exogenous) will reproduce more

Evolutionary game theory
The population which performs best against other populations (endogenous) will survive/reproduce more


Domain of analysis

Symmetric two-player games

A symmetric two-player normal form game $G = \langle N, \{S_i\}_{i \in N}, \{u_i\}_{i \in N} \rangle$ consists of three objects:

1. Players: $N = \{1, 2\}$, with typical player $i \in N$.
2. Strategies: $S_1 = S_2 = S$ with typical strategy $s \in S$.
3. Payoffs: A function $u_i : S \times S \to \mathbb{R}$ mapping strategy profiles $(h, k)$ to a payoff for each player $i$ such that for all $h, k \in S$:

$u_2(h, k) = u_1(k, h)$


Battle of the Sexes

        Cafe    Pub
Cafe    4, 3    0, 0
Pub     0, 0    3, 4

Not symmetric since:

u1(Cafe,Cafe) ≠ u2(Cafe,Cafe)


Prisoner’s dilemma

             Cooperate    Defect
Cooperate    −1, −1       −8, 0
Defect       0, −8        −5, −5

Symmetric since:

u1(Cooperate,Cooperate) = u2(Cooperate,Cooperate) = −1

u1(Cooperate,Defect) = u2(Defect,Cooperate) = −8

u1(Defect,Cooperate) = u2(Cooperate,Defect) = 0

u1(Defect,Defect) = u2(Defect,Defect) = −5
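The symmetry condition u2(h, k) = u1(k, h) can be checked mechanically. The following is a minimal sketch, assuming the payoff tables are stored as dictionaries keyed by (row strategy, column strategy); the names `is_symmetric`, `prisoners_dilemma` and `battle_of_the_sexes` are illustrative and not part of the lecture.

```python
# Payoff tables: payoffs[(row_strategy, col_strategy)] = (u1, u2).
# The dictionary layout is an assumption made for this sketch.
prisoners_dilemma = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"): (-8, 0),
    ("Defect", "Cooperate"): (0, -8),
    ("Defect", "Defect"): (-5, -5),
}
battle_of_the_sexes = {
    ("Cafe", "Cafe"): (4, 3),
    ("Cafe", "Pub"): (0, 0),
    ("Pub", "Cafe"): (0, 0),
    ("Pub", "Pub"): (3, 4),
}


def is_symmetric(payoffs):
    """Return True if u2(h, k) == u1(k, h) for every profile (h, k)."""
    return all(payoffs[(h, k)][1] == payoffs[(k, h)][0] for (h, k) in payoffs)


print(is_symmetric(prisoners_dilemma))    # Prisoner's dilemma
print(is_symmetric(battle_of_the_sexes))  # Battle of the Sexes
```

The check swaps the two arguments rather than comparing the two entries of one cell, which is what the definition asks for.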



Symmetric Nash equilibrium

Definition: Symmetric Nash Equilibrium

A symmetric Nash equilibrium is a strategy profile $(\sigma^*, \sigma^*)$ such that for every player $i$,

$u_i(\sigma^*, \sigma^*) \geq u_i(\sigma, \sigma^*)$ for all $\sigma$

In words: If no player has an incentive to deviate from their part in a particular strategy profile, then it is a Nash equilibrium.

Proposition

In a symmetric normal form game there always exists a symmetric Nash equilibrium.

Note: Not all Nash equilibria of a symmetric game need to be symmetric.


Evolutionarily stable strategy (Maynard Smith and Price, 1972)

Definition: Evolutionarily stable strategy (ESS)

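A mixed strategy $\sigma \in \Delta(S)$ is an evolutionarily stable strategy (ESS) if for every strategy $\tau \neq \sigma$ there exists $\bar{\varepsilon}(\tau) \in (0, 1)$ such that for all $\varepsilon \in (0, \bar{\varepsilon}(\tau))$:

$U(\sigma, \varepsilon\tau + (1-\varepsilon)\sigma) > U(\tau, \varepsilon\tau + (1-\varepsilon)\sigma)$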

Let ∆ESS be the set of evolutionarily stable strategies.
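A useful way to read the ESS inequality, given here as a standard reformulation rather than something stated on the slides, is to expand both sides using the linearity of U in its second argument. The symbols are those of the definition; nothing new is assumed beyond that linearity.

```latex
\begin{align*}
&U(\sigma, \varepsilon\tau + (1-\varepsilon)\sigma) - U(\tau, \varepsilon\tau + (1-\varepsilon)\sigma) \\
&\quad = (1-\varepsilon)\,\bigl[U(\sigma,\sigma) - U(\tau,\sigma)\bigr]
      + \varepsilon\,\bigl[U(\sigma,\tau) - U(\tau,\tau)\bigr].
\end{align*}
% For all sufficiently small \varepsilon > 0 this is positive for every \tau \neq \sigma
% exactly when both conditions hold:
\begin{align*}
\text{(i)}\quad  & U(\sigma,\sigma) \geq U(\tau,\sigma), \\
\text{(ii)}\quad & U(\sigma,\sigma) = U(\tau,\sigma) \;\Rightarrow\; U(\sigma,\tau) > U(\tau,\tau).
\end{align*}
```

Condition (i) says that (σ, σ) is a Nash equilibrium; condition (ii) is the extra stability requirement against mutants that are alternative best replies.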



Prisoner’s dilemma

             Cooperate    Defect
Cooperate    −1, −1       −8, 0
Defect       0, −8        −5, −5

∆ESS = {Defect}


Coordination game

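     A       B
A    4, 4    0, 0
B    0, 0    1, 1

Nash equilibria: (A, A), (B, B), (0.2 · A + 0.8 · B, 0.2 · A + 0.8 · B)

All Nash equilibria are symmetric.

But the mixed Nash equilibrium is not an ESS: a mutant playing A earns as much as the mixed strategy against the mixed strategy, but strictly more against A (4 versus 0.8), so A performs better once it has entered the population!

Note that the mixed Nash equilibrium is trembling-hand perfect.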
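The invasion argument for the coordination game can be reproduced numerically. The sketch below assumes a small mutant share ε = 0.01 and tests the ESS inequality against a single mutant only, so it can refute evolutionary stability but does not prove it; the helper names `U` and `resists` are hypothetical.

```python
# Row player's payoffs in the coordination game; rows/columns ordered (A, B).
A = [[4, 0], [0, 1]]


def U(sigma, mu):
    """Expected payoff of mixed strategy sigma against mixed strategy mu."""
    return sum(sigma[h] * A[h][k] * mu[k] for h in range(2) for k in range(2))


def resists(sigma, tau, eps):
    """True if incumbent sigma earns strictly more than mutant tau
    in the post-entry population eps*tau + (1 - eps)*sigma."""
    mix = [eps * t + (1 - eps) * s for s, t in zip(sigma, tau)]
    return U(sigma, mix) > U(tau, mix)


mixed = [0.2, 0.8]   # the mixed Nash equilibrium strategy
pure_A = [1.0, 0.0]
pure_B = [0.0, 1.0]
eps = 0.01           # assumed mutant share for this illustration

print(resists(mixed, pure_A, eps))   # mixed equilibrium against mutant A
print(resists(pure_A, pure_B, eps))  # incumbent A against mutant B
print(resists(pure_B, pure_A, eps))  # incumbent B against mutant A
```

Against mutant A the mixed strategy earns 0.8 regardless of ε, while A earns 0.8 + 3.2ε, which is why the first test fails for every positive ε.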



Existence of ESS not guaranteed

Example: Rock, paper, scissors

     R        P        S
R    0, 0     −1, 1    1, −1
P    1, −1    0, 0     −1, 1
S    −1, 1    1, −1    0, 0

Unique Nash equilibrium, and thus symmetric: $\sigma = (\tfrac{1}{3} R, \tfrac{1}{3} P, \tfrac{1}{3} S)$

All pure strategies are best replies and do as well against themselves as σ does against them ⇒ Not an ESS!
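The failure of the strict inequality in rock, paper, scissors can be verified with exact arithmetic. This is a minimal sketch using a mutant that always plays Rock and an assumed mutant share of 1/100; `fractions.Fraction` is used so that the tie is exact rather than a floating-point coincidence.

```python
from fractions import Fraction

# Row player's payoffs; rows/columns ordered (R, P, S).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]


def U(sigma, mu):
    """Expected payoff of mixed strategy sigma against mixed strategy mu."""
    return sum(sigma[h] * A[h][k] * mu[k] for h in range(3) for k in range(3))


third = Fraction(1, 3)
sigma = [third, third, third]                   # the unique Nash equilibrium
rock = [Fraction(1), Fraction(0), Fraction(0)]  # mutant strategy
eps = Fraction(1, 100)                          # assumed mutant share

# Post-entry population: eps * rock + (1 - eps) * sigma.
mix = [eps * t + (1 - eps) * s for s, t in zip(sigma, rock)]

incumbent_payoff = U(sigma, mix)
mutant_payoff = U(rock, mix)
print(incumbent_payoff, mutant_payoff)
```

Both payoffs are equal, so the strict inequality in the ESS definition fails for this mutant.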



Relations to normal form refinements

Propositions

If σ ∈ ∆(S) is weakly dominated, then it is not evolutionarily stable.

If σ ∈ ∆ESS, then (σ, σ) is a perfect equilibrium.

If (σ, σ) is a strict Nash equilibrium, then σ is evolutionarily stable.


Summary

Evolutionary game theory studies mutation processes (ESS)

The stable states often coincide with solution concepts from the “rational” framework

Evolutionary game theory does not explain how a population arrives at such a strategy ⇒ Learning in games and behavioral game theory

The “best” textbook: Weibull, Evolutionary game theory, 1995

Game theory describes interactions

Stock market:

Individuals (traders)

Strategies (buy/sell)

Outcome (profit/loss)

wired.co.uk

Fun and games:

Players (hands)

Strategies (rock-paper-scissors)

Outcome (winner/loser)

Game theory and (distributed) control...

Biology:

Individuals (honeybees)

Strategies (foraging nectar)

Outcome (survival)

beecare.bayer.com

Control theory:

Distributed agents (turbines)

Actions (orientation)

System performance (energy)

studyindenmark.dk


Distributed control applications

Characteristics:

Multiple decision making elements

Interdependency

No central authority

Distributed information

Collective performance

From the analyst’s point of view, this constitutes a “game”!

Centralized versus distributed control

Optimization

vs

Decentralization

Distributed information

Costly (time, energy, etc.) communication

Not just “multi component”

Not just “graph structure”

Efficiency loss

Tragedy of the commons

Price of Anarchy

Aims of today’s lecture

Understand the common principles of distributed control applications: routing, flocking, formation, coverage, assignment, cooperation, ...

Walk through details of one application: wind farm

Understand the game-theoretic parallels: players, actions, outcomes

Required reading (please ask for more!):

Marden & Shamma, “Game theory and distributed control”, Handbook of Game Theory IV, Young & Zamir (Eds.), 2015.

Game theory and distributed systems

“... the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.”

Myerson, Game Theory, 1991.

“...systems are characterized by decentralization in available information, multiplicity of decision makers, and individuality of objective functions for each decision maker.”

Saksena, O’Reilly, & Kokotovic, Automatica, 1984.

Game theory: distributed efficiency loss

Local objectives ≠ collective objective

Braess Paradox

[Figure: road network with nodes S, A, B, D and edge labels %, 1, 0]

New road worsens congestion

60 people from S to D

No middle road: NE at 90 mins

With middle road: NE at 119/120 mins

Game theory: Basic concepts

Elements:

Players/Agents/Actors/Individuals

Actions/Strategies/Choices/Decisions

Individual preferences over joint choices (payoffs, utility functions)

Solution concept in a distributed environment

What to expect?

Nash Equilibrium.
Everyone’s choice is a best response from an individual perspective given the choices of others.

Nash equilibrium & descriptive agenda

Game Elements

Solution Concept: NE

Rationality

“Keynes beauty contest”

Choose number between 0 and 100

Winner = Closest to 1/2 of average

NE: All pick 0



Game Elements

Solution Concept

Rationality Perception




Individual best reply: pick 1/2 of what YOU THINK others will play
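Iterating this best reply shows why the guesses unravel towards 0. The sketch below assumes a hypothetical level-0 guess of 50 and ignores each player's own influence on the average; the function name `level_k_guesses` and its parameters are illustrative only.

```python
def level_k_guesses(start=50.0, factor=0.5, levels=10):
    """Guess at each depth of reasoning.

    Level 0 guesses `start` (an assumption, not from the lecture);
    level k best-replies to level k-1 by picking `factor` times its guess.
    """
    guesses = [start]
    for _ in range(levels):
        guesses.append(factor * guesses[-1])
    return guesses


guesses = level_k_guesses()
for depth, guess in enumerate(guesses):
    print(depth, guess)
```

Each additional step of "I think that they think" halves the guess, so only unbounded depth of reasoning reaches the Nash equilibrium at 0.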


Repeated beauty contest

[Figure panels: first round choices; fourth vs third round]

Nagel, “Unraveling in guessing games: An experimental study”, AER, 1995.


“Our” beauty contest from lecture 1



Game Elements

Solution Concept

Rationality Perception Evolution




Long-run outcome: All pick 0, i.e. NE


Learning/evolutionary games

Shift of focus:

Away from solution concept: Nash equilibrium

Towards how players might arrive at the solution, i.e., dynamics

“The attainment of equilibrium requires a disequilibrium process.”

Arrow, 1987.

“The explanatory significance of the equilibrium concept depends on the underlying dynamics.”

Skyrms, 1992.

Distributed control: first, identify the target state; second, encourage dynamics that lead to it.

Literature

Monographs:

Weibull, Evolutionary Game Theory, 1995.

Young, Individual Strategy and Social Structure, 1998.

Fudenberg & Levine, The Theory of Learning in Games, 1998.

Samuelson, Evolutionary Games and Equilibrium Selection, 1998.

Young, Strategic Learning and Its Limits, 2004.

Sandholm, Population Games and Evolutionary Dynamics, 2010.

Surveys:

Hart, “Adaptive heuristics”, Econometrica, 2005.

Fudenberg & Levine, “Learning and equilibrium”, Annual Review of Economics, 2009.

Illustration: Fictitious play (1951)

Stages: t = 0, 1, 2, ...

Each player:

Maintains empirical frequencies (histograms) of opposing actions

Forecasts (incorrectly) that others play according to observed empirical frequencies

Selects an action that maximizes expected payoff

Bookkeeping:

$x_i(\cdot)$ = evolving empirical frequency of player $i$; $\beta_i$ denotes player $i$'s best reply

Discrete-time:

$x_i(t+1) = x_i(t) + \frac{1}{t+1}\left(\mathrm{rand}[\beta_i(x_{-i}(t))] - x_i(t)\right)$

Continuous-time:

$\frac{dx_i}{dt} = -x_i + \beta_i(x_{-i})$
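The discrete-time bookkeeping can be simulated directly. The following is a minimal sketch for two players on matching pennies, a zero-sum game chosen here as an illustration; it replaces the random selection among best replies by a deterministic tie-break (lowest index), and the function name `fictitious_play` and the initial actions are assumptions of this sketch.

```python
def fictitious_play(A, B, steps=5000):
    """Two-player fictitious play.

    A[i][j] and B[i][j] are the payoffs of players 1 and 2 when player 1
    plays i and player 2 plays j. Returns the empirical action
    frequencies (x1, x2) after `steps` rounds.
    """
    n, m = len(A), len(A[0])
    counts1, counts2 = [0] * n, [0] * m
    a1, a2 = 0, 0  # arbitrary initial actions (assumption)
    for _ in range(steps):
        counts1[a1] += 1
        counts2[a2] += 1
        # Best reply to the opponent's empirical counts (ties -> lowest index).
        next1 = max(range(n), key=lambda i: sum(A[i][j] * counts2[j] for j in range(m)))
        next2 = max(range(m), key=lambda j: sum(B[i][j] * counts1[i] for i in range(n)))
        a1, a2 = next1, next2
    x1 = [c / steps for c in counts1]
    x2 = [c / steps for c in counts2]
    return x1, x2


# Matching pennies: player 1 wants to match, player 2 wants to mismatch.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
x1, x2 = fictitious_play(A, B)
print(x1, x2)
```

Actual play keeps cycling, but the empirical frequencies drift towards the mixed equilibrium (1/2, 1/2); only the frequencies, not the actions, settle down.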


Descriptive agenda analysis

Meta-theorem
For [special structure games] under [specific dynamics], players exhibit [asymptotic behavior].

Theorem
For zero-sum games under fictitious play, empirical frequencies converge to NE.

Many more...

Prescriptive agenda

Design degrees of freedom:

Game elements: Players, Actions, Preferences

Evolutionary dynamics: Online adaptation

[Diagram: Game Elements, Collective Objective, Evolution, Collective Behavior]

Potential appeal:

Distributed self-organization

Adaptation to environment

Resilience to disruptions

Marden & Shamma, “Game theory and distributed control”, Handbook of Game Theory IV, Young & Zamir (Eds.), 2015.

Prescriptive agenda in action

Theorem
For potential games under restricted movement log linear learning, joint actions “linger” at potential maximizer.

Distributed graph coverage

Local movements

Local information exchange

Linger at maximal coverage

Marden and JSS, “Cooperative control and potential games”, 2009.

Yazicioglu, Egerstedt, and JSS, “A game theoretic approach to distributed coverage of graphs by heterogenous mobile agents”, 2013.

There and back again...

Game Elements

Solution Concept

Rationality Perception Evolution


Evolution

CollectiveBehavior

“The explanatory significance of the equilibrium concept depends on the underlying dynamics.”

Skyrms, 1992.

How to identify appropriate dynamics?


An application

A wind farm:

Each windmill takes a directional orientation and a blade angle

Depending on wind direction, this leads to an energy production for each windmill

The central authority (for simplicity) aims to maximize the energy total

For larger wind farms the centralized control approach has proven unsuccessful

Marden et al. 2013. “A Model-Free Approach to Wind Farm Control Using Game Theoretic Methods”. IEEE Transactions on Control Systems Technology 21(4):

“Each turbine does not have access to the functional form of the power generated by the wind farm. This is because the aerodynamic interaction between the turbines is poorly understood. [...] access to the choices of other turbines. This is because of the lack of a suitable communication system.”

Bee intermezzo

Bees:

Bees fly to different patches of flowers foraging for nectar

If nectar per flower is abundant (high payoffs), bees continue in the current patch with high probability

If a series of flowers yields low payoff, bees fly far away to a new patch

Rule governs the behavior of bees (Thuijsman et al. JTB 1995)

Shown to be a successful foraging strategy at the population level (implementing NE: Young 2009; even total payoff maximizing NE: Pradelski and Young 2012).

More formally:

Game:

Players i = 1, 2, ..., n

Finite strategy set Ai = {ai, bi, ..., ki}

Joint strategy space A = Πi Ai

Payoffs ui : A → R

How do you get windmills to play this game, giving them private utility functions, so as to maximize total energy production?

The single turbine:

Game (given a certain wind direction):

Players i = 1, 2, ..., n (windmills/turbines)

Finite strategy set Ai = {ai, bi, ..., ki} (orientations)

Joint strategy space A = Πi Ai (wind park configuration)

Payoffs ui : A → R (own energy production)

The learning rule (pseudo code)

1. Initialize. t = 0, 1: each turbine i selects a random (benchmark) orientation $a_i^t$, resulting in power $u_i^t$.

Windmill ‘moods’. t + 1 > 1:
if $a_i^t \neq a_i^{t-1}$ or $u_i^t \geq u_i^{t-1}$, windmill ‘content’
if $a_i^t = a_i^{t-1}$ and $u_i^t < u_i^{t-1}$, windmill ‘discontent’

2a. Benchmark update. t + 1 > 1:
if ‘content’, keep or switch benchmark according to higher payoff
if ‘discontent’, keep old benchmark

2. Action update. t + 1 > 1:
if ‘content’, play $a_i^t$ with (high) probability 1 − ε and RAND with probability ε
if ‘discontent’, windmill plays RAND with probability 1
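The mood, benchmark and action steps can be put together in a short simulation. This is a loose sketch of the rule as stated on the slide, not the algorithm of Marden et al.: the power function `toy_power`, the number of turbines and orientations, and all parameter values are hypothetical, and each turbine only ever sees its own action and its own power.

```python
import random


def toy_power(actions):
    """Hypothetical power model: a turbine loses output when its
    orientation equals that of its upwind neighbour (index i - 1)."""
    return [1.0 if i == 0 or a != actions[i - 1] else 0.2
            for i, a in enumerate(actions)]


def run_learning(n=4, n_actions=3, eps=0.05, steps=2000, seed=0):
    rng = random.Random(seed)
    # Initialize: random benchmark orientations and the power they yield.
    bench_a = [rng.randrange(n_actions) for _ in range(n)]
    bench_u = toy_power(bench_a)
    prev_a, prev_u = list(bench_a), list(bench_u)
    content = [True] * n
    for _ in range(steps):
        # Action update: content turbines repeat the benchmark with
        # probability 1 - eps; otherwise, and if discontent, play RAND.
        acts = [bench_a[i] if content[i] and rng.random() > eps
                else rng.randrange(n_actions)
                for i in range(n)]
        power = toy_power(acts)
        for i in range(n):
            # Mood: discontent only if the same action now pays strictly less.
            content[i] = acts[i] != prev_a[i] or power[i] >= prev_u[i]
            # Benchmark update: content turbines switch if the new payoff is higher.
            if content[i] and power[i] > bench_u[i]:
                bench_a[i], bench_u[i] = acts[i], power[i]
        prev_a, prev_u = acts, power
    return bench_a, bench_u


benchmark_actions, benchmark_power = run_learning()
print(benchmark_actions, benchmark_power)
```

The design point carried over from the slide is that no turbine needs the functional form of the farm's power or the other turbines' choices; the seed is fixed only to make runs repeatable.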


Performance

Theorem. For any desired probability p < 1, there exists ε > 0 such that, after sufficiently many iterations, total power generated is maximal with probability at least p.

Intuition:

A series of experiments leads to states with ever higher welfare until someone’s payoff goes down.

That individual becomes discontent, and his searching may cause other agents to become discontent.

Eventually the discontent agents settle into a new all-content state, where the settling probability increases with the overall welfare of the state.

Alternative approaches: cooperative control

Game:

Players i = 1, 2, ..., n

Finite strategy set Ai = {ai, bi, ..., ki}

Joint strategy space A = Πi Ai

Payoffs ui : A → R (total energy production)

Making windmills play this game, giving them altruistic utility functions, will also maximize total energy production.

Harmony intermezzo

Recall the difference between the prisoner’s dilemma and the harmony game:

defection dominant strategy in prisoner’s dilemma

cooperation dominant strategy in harmony game

Prisoner’s dilemma:

                  A: Confess      A: Stay quiet
B: Confess        A −6, B −6      A −10, B 0
B: Stay quiet     A 0, B −10      A −2, B −2

How to transform a prisoner’s dilemma into a harmony game by adding altruism...

Harmony intermezzo




                 A: Defect             A: Cooperate
B: Defect        A 10−6, B 10−6        A 10−10, B 10−0
B: Cooperate     A 10−0, B 10−10       A 10−2, B 10−2

Harmony intermezzo




                 A: Defect       A: Cooperate
B: Defect        A 4, B 4        A 0, B 10
B: Cooperate     A 10, B 0       A 8, B 8

Harmony intermezzo

Now each player cares for self and other in the same way:

Write φS for the payoff for self

Write φO for the payoff of the other

Assume ui(φS, φO) = φS + φO, i.e., altruism/other-regarding concern

                 A: Defect             A: Cooperate
B: Defect        A 4+4, B 4+4          A 0+10, B 10+0
B: Cooperate     A 10+0, B 0+10        A 8+8, B 8+8

Harmony intermezzo




                 A: Defect       A: Cooperate
B: Defect        A 8, B 8        A 10, B 10
B: Cooperate     A 10, B 10      A 16, B 16

Now any dynamic that implements Nash equilibrium in this modified harmony game would maximize total payoffs...
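The effect of the altruistic transformation on the equilibrium set can be checked by enumeration. The sketch below uses the shifted prisoner's dilemma payoffs from the slides (4, 0, 10, 8), keyed by (A's action, B's action); the helper names `pure_nash` and `add_altruism` are hypothetical.

```python
from itertools import product


def pure_nash(payoffs, actions):
    """Pure Nash equilibria of a two-player game.

    payoffs[(a, b)] = (u_A, u_B) when A plays a and B plays b.
    """
    equilibria = []
    for a, b in product(actions, repeat=2):
        u_a, u_b = payoffs[(a, b)]
        a_ok = all(payoffs[(a2, b)][0] <= u_a for a2 in actions)
        b_ok = all(payoffs[(a, b2)][1] <= u_b for b2 in actions)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria


def add_altruism(payoffs):
    """Give each player the sum of own and other's payoff (phi_S + phi_O)."""
    return {profile: (u_a + u_b, u_a + u_b)
            for profile, (u_a, u_b) in payoffs.items()}


actions = ["Defect", "Cooperate"]
shifted_pd = {
    ("Defect", "Defect"): (4, 4),
    ("Defect", "Cooperate"): (10, 0),
    ("Cooperate", "Defect"): (0, 10),
    ("Cooperate", "Cooperate"): (8, 8),
}

print(pure_nash(shifted_pd, actions))
print(pure_nash(add_altruism(shifted_pd), actions))
```

Before the transformation mutual defection is the only pure equilibrium; afterwards mutual cooperation is, and it also maximizes the total payoff of 16.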


Staghunt game

Think of the following coordination game:

there are two actions: safe and risky

one equilibrium is when both players play safe

another is when both players play risky

risky leads to higher total payoffs

Staghunt dilemma:

              A: Risky         A: Safe
B: Risky      A 5, B 5         A 4.5, B 0
B: Safe       A 0, B 4.5       A 2, B 2

Adding altruism...

Staghunt modified




              A: Risky                A: Safe
B: Risky      A 5+5, B 5+5            A 4.5+0, B 0+4.5
B: Safe       A 0+4.5, B 4.5+0        A 2+2, B 2+2

Staghunt modified




              A: Risky            A: Safe
B: Risky      A 10, B 10          A 4.5, B 4.5
B: Safe       A 4.5, B 4.5        A 4, B 4

Now risky-risky is the unique Nash equilibrium.

Harmony intermezzo




              A: Risky            A: Safe
B: Risky      A 10, B 10          A 4.5, B 4.5
B: Safe       A 4.5, B 4.5        A 4, B 4

Now any dynamic that implements Nash equilibrium in this game would maximize total payoffs...

But this need not always work

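Own payoffs:

              A: Risky       A: Safe
B: Risky      A 5, B 5       A 3, B 0
B: Safe       A 0, B 3       A 2, B 2

With altruism (own + other):

              A: Risky         A: Safe
B: Risky      A 10, B 10       A 3, B 3
B: Safe       A 3, B 3         A 4, B 4

Now not every dynamic that implements Nash equilibrium in this game would maximize total payoffs: safe-safe remains an equilibrium, so we are back to a selection problem!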
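Whether altruism removes the inefficient equilibrium depends on a single payoff: what the safe player earns against a risky partner (4.5 in the first stag hunt, 3 in the second). A minimal sketch that makes this parameter explicit; the function name `altruistic_staghunt_equilibria` and its argument are hypothetical.

```python
def altruistic_staghunt_equilibria(safe_vs_risky):
    """Pure Nash equilibria of the stag hunt once both players
    maximize the pair's total payoff.

    `safe_vs_risky` is the own payoff of a player choosing Safe
    against a partner choosing Risky (the partner then earns 0).
    Profiles are keyed by (A's action, B's action).
    """
    actions = ["Risky", "Safe"]
    own = {
        ("Risky", "Risky"): (5, 5),
        ("Safe", "Risky"): (safe_vs_risky, 0),
        ("Risky", "Safe"): (0, safe_vs_risky),
        ("Safe", "Safe"): (2, 2),
    }
    total = {profile: sum(u) for profile, u in own.items()}
    equilibria = []
    for a in actions:
        for b in actions:
            a_ok = all(total[(a2, b)] <= total[(a, b)] for a2 in actions)
            b_ok = all(total[(a, b2)] <= total[(a, b)] for b2 in actions)
            if a_ok and b_ok:
                equilibria.append((a, b))
    return equilibria


print(altruistic_staghunt_equilibria(4.5))
print(altruistic_staghunt_equilibria(3))
```

With 4.5 a unilateral move away from safe-safe raises the total from 4 to 4.5, so safe-safe unravels; with 3 the same move lowers the total to 3, so safe-safe survives and the selection problem returns.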


Differences in information

Own energy only: ui(φS) = φS

no information necessary about structure of the game

program dynamic offline

very specific dynamics will work

dynamic requires no feedback

Total energy: e.g. ui(φS, φO) = φS + φO

need to understand structure of the game in order to identify which specification will generate desired equilibria

more general class of dynamics will work

program dynamic offline

dynamic requires feedback about energy total as game continues

Which approach is better depends on the application.

Summary: game theory describes interactions

Economics:

Individuals (traders)

Strategies (buy/sell)

Outcome (profit/loss)

wired.co.uk

Mechanism design:

Players (doctors and hospitals)

Strategies (applications)

Outcome (Matching)

NRMP


Game theory and (distributed) control...

Biology:

Individuals (honeybees)

Strategies (foraging nectar)

Outcome (survival)

beecare.bayer.com

Distributed control:

Distributed agents (turbines)

Actions (orientation)

System performance (energy)

studyindenmark.dk


Broad agenda comparison

                  Biology    Social systems    Mechanism design    Distributed control
Game structure    given      given             manipulable         manipulable
Actions           given      given             given               given
Payoffs           given      given             given               manipulable
Information       given      given             manipulable         given

Thanks & Acknowledgements

Bary Pradelski

Jeff Shamma

Peyton Young
