Tutorial: Introduction to Game Theory

Jesus Rios
IBM T.J. Watson Research Center, USA
[email protected]

July, 2013

© 2013 IBM Corporation


Approaches to decision analysis

 Descriptive – Understanding of how decisions are made

 Normative – Models of how decisions should be made

 Prescriptive – Helping DM make smart decisions – Use of normative theory to support DM – Elicit inputs of normative models

•  DM preferences and beliefs (psycho-analysis) •  use of experts

– Role of descriptive theories of DM behavior


Game theory arena

 Non-cooperative games – More than one intelligent player –  Individual action spaces –  Interdependent consequences

•  Players’ consequences depend on their own and other players’ actions

 Cooperative game theory – Normative bargaining models

•  Joint decision making -  Binding agreements on what to play

•  Given players’ preferences and the solution space, find a fair, jointly satisfying, Pareto-optimal agreement/solution

– Group decision making on a common action space (Social choice) •  Preference aggregation •  Voting rules

-  Arrow’s theorem – Coalition games


Cooperative game theory: Bargaining solution concepts

•  Disagreement point: BATNA, status quo •  Feasible solutions: ZOPA •  Pareto-efficiency •  Aspiration levels •  Fairness:

K-S, Nash, maxmin solutions

Example: Juan earns $10 working alone and Maria $20; working together they can earn $100. How should they distribute the profits of the cooperation?

– Disagreement point: (10, 20); feasible agreements: x + y = 100, with x ≥ 10 (Juan) and y ≥ 20 (Maria)
– Bliss (ideal) points: 80 for Juan, 90 for Maria
– Fair K-S solution: x = 45 for Juan, y = 55 for Maria
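The split above can be reproduced with the symmetric Nash bargaining solution, which here coincides with the K-S point: maximizing (x − 10)(y − 20) subject to x + y = 100 gives each player their disagreement payoff plus half the excess. A minimal sketch (function name is illustrative):

```python
def nash_bargaining_split(total, d1, d2):
    """Symmetric Nash bargaining solution for splitting `total`
    between two players with disagreement payoffs d1 and d2:
    maximizing (x - d1)*(y - d2) subject to x + y = total gives
    each player their disagreement payoff plus half the excess."""
    excess = total - d1 - d2
    return d1 + excess / 2, d2 + excess / 2

# Juan ($10 alone) and Maria ($20 alone) splitting $100:
x, y = nash_bargaining_split(100, 10, 20)
print(x, y)  # 45.0 55.0
```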


Normative models of decision making under uncertainty

 Models for a unitary DM – vN-M expected utility

•  Objective probability distributions – Subjective expected utility (SEU)

•  Subjective probability distributions

 Example: investment decision problem – One decision variable with two alternatives

•  In what to invest? -  Treasury bonds -  IBM shares

– One uncertainty with two possible states •  IBM share price at the end of the year

-  High -  Low

– One evaluation criterion for consequences •  Profit from investment

 The simplest decision problem under uncertainty


Decision Table

  DM chooses a row without knowing which column will occur (Shares pay $2,000 if High and −$1,000 if Low; Bonds pay $500 either way)

  Choice depends on the relative likelihood of High and Low –  If DM is sure that IBM share price will be High,

best choice is to buy Shares –  If DM is sure that IBM share price will be Low,

best choice is to buy Bonds → Elicit the DM’s beliefs about which column will occur

  Choice depends on the value of money –  Expected return is not a good measure of decision preferences

•  The two alternatives give the same expected return, but most DMs would not feel indifferent between them → Elicit the risk attitude of the DM


Decision tree representation

  What does the choice depend upon? –  relative likelihood of H vs L –  strength of preferences for money

What to buy:
–  IBM Shares → price: High → $2,000; Low → −$1,000 (uncertainty)
–  Bonds → $500 (certainty)


Subjective expected utility solution

  If the DM’s decision behavior is consistent with some set of “rational” desiderata (axioms), the DM decides as if he has

–  probabilities to represent his beliefs about the future price of IBM shares –  “utilities” to represent his preferences and risk attitude towards money

and chooses the alternative of maximum expected utility

  The subjective expected utility model balances in a “rational” manner –  the DM’s beliefs and risk attitudes

  Application requires –  knowing the DM’s beliefs and “utilities”

•  Different elicitation methods –  computing the expected utility of each decision strategy

•  It may require approximation in non-simple problems


  The Basic Canonical Reference Lottery ticket: p-BCRL

A p-BCRL pays $2,000 with canonical probability p and −$1,000 with probability 1 − p.

Preferences over BCRLs: p-BCRL > q-BCRL iff p > q

where p and q are canonical probabilities

A constructive definition of “utility”


Elicit prob. of the price of IBM shares

  Event H –  IBM price High

  Event L –  IBM price Low

  Pr( H ) + Pr( L ) = 1

Compare holding IBM shares (price: H → $2,000, L → −$1,000) against a p-BCRL (p → $2,000, 1 − p → −$1,000).

  Move p from 1 to 0   Which alternative is preferred by the DM?

–  IBM shares –  p-BCRL

  There exists a breakeven canonical prob. such that the DM is indifferent –  pH-BCRL ~ IBM shares

–  The judgmental probability of H is pH


Elicit the utility of $500

  U( $500 )? Compare Bonds ($500 for certain) against a p-BCRL (p → $2,000, 1 − p → −$1,000).

  Move p from 1 to 0   Which alternative is preferred by the DM?

p-BCRL vs. Bonds   There exists a breakeven canonical prob. such that the DM is indifferent

–  u-BCRL ~ Bonds

–  This scales the value of $500 between the values of $2,000 and −$1,000: U($500) = u

  What is then U($500)? –  The probability of a BCRL between $2,000 and - $1,000 that is indifferent (for the DM) to getting

$500 with certainty


Comparison of alternatives

–  IBM shares ~ pH-BCRL (price: H → $2,000, L → −$1,000)

–  Bonds ($500) ~ U($500)-BCRL

The DM prefers to invest in “IBM Shares” iff pH > U($500)


Solving the tree: backward induction

 Utility scaling 0 = U( - $1,000 ) < U( $500 ) = u < U( $2,000 ) = 1

What to buy:
–  IBM Shares → price: High (prob pH) → $2,000 (utility 1); Low (prob 1 − pH) → −$1,000 (utility 0)
–  Bonds → $500 (utility u)
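Backward induction on this tree reduces to one comparison: EU(Shares) = pH·1 + (1 − pH)·0 = pH versus EU(Bonds) = u, so the DM buys shares iff pH > u. A minimal sketch (names are illustrative):

```python
def best_buy(p_high, u_500):
    """Backward induction on the investment tree with utilities
    scaled so that U(-$1,000) = 0, U($500) = u_500, U($2,000) = 1."""
    eu_shares = p_high * 1.0 + (1.0 - p_high) * 0.0  # = p_high
    eu_bonds = u_500
    return "IBM Shares" if eu_shares > eu_bonds else "Bonds"

print(best_buy(0.7, 0.6))  # IBM Shares
print(best_buy(0.5, 0.6))  # Bonds
```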


Preferences: value vs. utility

 Value function – measures the desirability (intensity of preferences) of money gained, – but does not measure risk attitude

 Utility function – Measures risk attitude – but not the intensity of preferences over sure consequences

 Many methods to elicit a utility function – Qualitative analysis of risk attitude leads to parametric utility functions – Ask quantitative indifference questions between deals (one of which must be an uncertain lottery) to assess the parameters of the utility function – Consistency checks and sensitivity analysis


The Bayesian process of inference and evaluation with several stakeholders and decision makers (Group decision making)


Disagreements in group decision making

 Group decision making assumes – Group value/utility function – Group probabilities on the uncertainties

  If our experts disagree on the science (Expert problem) – How to draw together and learn from conflicting probabilistic judgements – Mathematical aggregation

•  Bayesian approach •  Opinion pools

-  No opinion pool satisfies a minimal consensus set of “good” probabilistic properties •  Issues

-  How do we model knowledge overlap/correlation? -  Expertise evaluation

– Behavioural aggregation – The textbook problem

•  If we do not have access to experts we need to develop meta-analytical methodologies for drawing together expert judgment studies
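The simplest mathematical aggregation rule is the linear opinion pool, a weighted average of the experts’ distributions. A minimal sketch (weights and expert probabilities are illustrative):

```python
def linear_opinion_pool(expert_probs, weights):
    """Linear opinion pool: weighted average of the experts'
    probability distributions over the same events.
    weights must sum to 1."""
    n_events = len(expert_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, expert_probs))
            for i in range(n_events)]

# Two experts' probabilities for (High, Low), equal weights:
pooled = linear_opinion_pool([[0.8, 0.2], [0.6, 0.4]], [0.5, 0.5])
print(pooled)  # approximately [0.7, 0.3]
```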


Disagreements in group decision making

  If group members disagree on the values –  How to combine different individuals’ rankings of options into a group ranking? –  Arbitration/voting

•  Ordinal rankings -  Arrow impossibility results.

•  Cardinal ranking (values and not utilities -- Decisions without uncertainty) -  Interpersonal comparison of preferences’ strengths -  Supra decision maker approach (MAUT)

•  Issues: manipulation and truthful reporting of rankings

  Disagreement on the values and the science –  Combining

•  individual probabilities and utilities •  into group probabilities and utilities, respectively, •  to form the corresponding group expected utilities and choosing accordingly

–  Impossibility of being Bayesian and Paretian at the same time •  No aggregation method (of probabilities and utilities) exists that is compatible with the Pareto order

–  Behavioral approaches •  Consensus on group probabilities and utilities via sensitivity analysis. •  Agreement on what to do via negotiation


Decision analysis in the presence of intelligent others

 Matrix games against nature – One player: R (Row)

•  Two choices: U (Up) and D (Down) – Payoff matrix

            Nature
             L    R
   R    U    0    5
        D   10    3

If you were R, what would you do? D > U against L; U > D against R


Games against nature

 Do we know which column Nature will choose? – We know our best responses to Nature’s moves, but not which move Nature will choose

 Do we know the (objective) probabilities of Nature’s possible moves? – YES

With Pr(L) = p and Pr(R) = 1 − p (payoffs = vNM utils):

            Nature
             L    R      Expected payoff
   R    U    0    5      0·p + 5·(1 − p)
        D   10    3      10·p + 3·(1 − p)

U > D iff p < 1/6
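The threshold p = 1/6 follows directly from the expected payoffs; a minimal sketch:

```python
def expected_payoffs(p):
    """Expected payoffs of U and D when Nature plays L with
    probability p (payoff matrix: U -> (0, 5), D -> (10, 3))."""
    eu_up = 0 * p + 5 * (1 - p)
    eu_down = 10 * p + 3 * (1 - p)
    return eu_up, eu_down

# U beats D iff 5 - 5p > 3 + 7p, i.e. p < 1/6:
print(expected_payoffs(0.0))  # (5.0, 3.0) -> U preferred
print(expected_payoffs(0.5))  # (2.5, 6.5) -> D preferred
```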


Games against nature and the SEU criteria

 Do we know the (objective) probabilities of Nature’s possible moves? – No

•  Variety of decision criteria -  Maximin (pessimistic), maximax (optimistic), Hurwicz, minimax regret,…

            Nature
             L    R   |  Min   Max   Max regret
   R    U    0    5   |   0     5       10
        D   10    3   |   3    10        2

Maximin: D        Maximax: D        Minimax regret: D

SEU criterion: elicit the DM’s subjective probabilistic beliefs about Nature’s move (p) and compute the SEU of each alternative: D > U iff p > 1/6
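The criteria on this slide can be checked mechanically; a minimal sketch comparing maximin, maximax and minimax regret on the same payoff table:

```python
def decision_criteria(payoffs):
    """Compare the maximin, maximax and minimax-regret choices for a
    payoff table {action: [payoff per state of Nature]}."""
    actions = list(payoffs)
    n = len(next(iter(payoffs.values())))
    col_max = [max(payoffs[a][j] for a in actions) for j in range(n)]
    maximin = max(actions, key=lambda a: min(payoffs[a]))
    maximax = max(actions, key=lambda a: max(payoffs[a]))
    minimax_regret = min(
        actions,
        key=lambda a: max(col_max[j] - payoffs[a][j] for j in range(n)))
    return maximin, maximax, minimax_regret

print(decision_criteria({"U": [0, 5], "D": [10, 3]}))  # ('D', 'D', 'D')
```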


Games against other intelligent players

 Bimatrix (simultaneous) games – Second intelligent player: C (Column)

•  Two choices: L (Left) and R (Right) – Payoff bimatrix

•  we know C’s payoffs and that he will try to maximize them – As R, what would you do?

              C
             L         R
   R    U   (0, 2)    (5, 4)
        D  (10, 3)    (3, 8)

– Knowledge of C’s payoffs and rationality allows us to predict with certitude C’s move (R); knowing this, Row’s best choice is U (payoff 5)
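The prediction works in two steps: find Column’s dominant action, then best-respond to it. A minimal sketch (function name is illustrative):

```python
def dominant_action(payoffs_by_action):
    """Return the action that strictly dominates every other one,
    or None if no dominant action exists.
    payoffs_by_action: {action: [payoff per opponent action]}."""
    for a, pa in payoffs_by_action.items():
        if all(a == b or all(x > y for x, y in zip(pa, pb))
               for b, pb in payoffs_by_action.items()):
            return a
    return None

# Column's payoffs against Row's moves (U, D): R strictly dominates L.
c_payoffs = {"L": [2, 3], "R": [4, 8]}
print(dominant_action(c_payoffs))  # R
# Row best-responds to the predicted move R (second column):
r_payoffs = {"U": [0, 5], "D": [10, 3]}
print(max(r_payoffs, key=lambda a: r_payoffs[a][1]))  # U
```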


One-shot simultaneous bi-matrix games

 Two players – Trying to maximize their payoffs

 Players must choose one out of two fixed alternatives – Row player chooses a row – Column player chooses a column

 Payoffs depend on both players’ moves  Simultaneous move game

– Players must act without knowing what the other player does – Play once

 No other uncertainties involved  Players have full and common knowledge of

– choice spaces – bi-matrix payoffs

 No cooperation allowed

              C
             L                       R
   R    U   (uR(U,L), uC(U,L))     (uR(U,R), uC(U,R))
        D   (uR(D,L), uC(D,L))     (uR(D,R), uC(D,R))


Dominant alternatives and social dilemmas

 Prisoner’s dilemma –  (NC,NC) is mutually dominant

•  Players’ choices are independent of information regarding the other player’s move

–  (NC,NC) is socially dominated by (C,C)

 Airport network security

              C
             C          NC
   R    C   (5, 5)    (−5, 10)
        NC (10, −5)   (−2, −2)*


Iterative dominance

 No dominant strategy for either player; however – There are iteratively dominated strategies

•  L > R •  Now M is dominant in the restricted game

-  M > U and M > D •  Now L > C in the restricted game

-  20 > − 10 –  (M,L) solution by iterative elimination of (strictly) dominated strategies

•  Common knowledge and rationality assumptions

 Exercise – Find if there is a solution by iteratively eliminating dominated strategies

Solution: (D,C)
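The 3×3 matrices on this slide were lost with the original figure, so the following sketch runs iterated elimination of strictly dominated strategies on a hypothetical bimatrix (all payoffs invented for illustration):

```python
def iterated_dominance(row_u, col_u):
    """Iteratively eliminate strictly dominated pure strategies.
    row_u[i][j] / col_u[i][j]: payoffs when row plays i, column plays j.
    Returns the surviving row and column indices."""
    rows = list(range(len(row_u)))
    cols = list(range(len(row_u[0])))
    changed = True
    while changed:
        changed = False
        for i in list(rows):  # drop rows strictly dominated by another row
            if any(all(row_u[k][j] > row_u[i][j] for j in cols)
                   for k in rows if k != i):
                rows.remove(i)
                changed = True
        for j in list(cols):  # drop dominated columns (column's payoffs)
            if any(all(col_u[i][k] > col_u[i][j] for i in rows)
                   for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols

# Hypothetical bimatrix, rows (U, M, D) x columns (L, C, R):
row_u = [[3, 5, 9], [4, 6, 0], [1, 2, 8]]
col_u = [[2, 6, 1], [5, 3, 1], [7, 4, 3]]
rows_left, cols_left = iterated_dominance(row_u, col_u)
print([["U", "M", "D"][i] for i in rows_left],
      [["L", "C", "R"][j] for j in cols_left])  # ['M'] ['L']
```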


Nash equilibrium

 Games without – Dominant solution – Solution by iterative elimination of dominated alternatives

Battle of the sexes:

             Concert     Ballet
  Ballet     (0, 0)      (2, 1)*
  Concert    (1, 2)*     (0, 0)

Matching pennies (no pure-strategy equilibrium):

             Heads       Tails
  Heads     (1, −1)     (−1, 1)
  Tails    (−1, 1)      (1, −1)


Existence of Nash equilibrium (Nash)

  Every finite game has a NE in mixed strategies –  Requires extending the original set of alternatives of each player

  Consider the matching pennies game – Mixed strategies

•  Choosing a lottery of certain probabilities over Heads and Tails –  Players’ choice sets defined by the lottery’s probability

•  Row: p in [0,1] •  Column: q in [0,1]

–  Payoff associated with a pair of strategies (p,q) is •  uR(p,q) = (p, 1−p) P (q, 1−q)^T

where P is the payoff matrix for the original game in pure strategies •  Payoffs need to be vNM utilities

–  Nash equilibrium •  Intersection of players’ best-response correspondences: (p*,q*) such that

uR(p*,q*) ≥ uR(p,q*) for all p, and uC(p*,q*) ≥ uC(p*,q) for all q
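The bilinear payoff (p, 1−p) P (q, 1−q)^T is easy to compute directly; in matching pennies, at the mixed equilibrium q* = 1/2 the row player is indifferent among all p. A minimal sketch:

```python
def mixed_payoff(P, p, q):
    """Row's expected payoff (p, 1-p) P (q, 1-q)^T in a 2x2 game."""
    probs_r = [p, 1 - p]
    probs_c = [q, 1 - q]
    return sum(probs_r[i] * P[i][j] * probs_c[j]
               for i in range(2) for j in range(2))

# Matching pennies, row player's payoff matrix P:
P = [[1, -1], [-1, 1]]
# At the mixed NE q* = 1/2, the row player is indifferent over p:
print(mixed_payoff(P, 1.0, 0.5), mixed_payoff(P, 0.0, 0.5))  # 0.0 0.0
```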


Nash equilibria concept as predictive tool

 Supporting the row player against the column player  Games with multiple NEs

             L           R
   U     (4, −100)    (10, 6)*
   D     (12, 8)*      (5, 4)

 Two NEs: (D,L) and (U,R)
 (D,L) > (U,R), since 12 > 10 and 8 > 6
 C may prefer to play R

– To protect himself against -100  Knowing this, R would prefer to play U

– ending up at the inferior NE (U,R)  How can we model C’s behavior?

– Bayesian K-level thinking


K-level thinking

 Row is not sure about Column’s move – p: Row’s beliefs about C moving L – Row’s SEU

•  U: 4 p + 10 (1-p) •  D: 12 p + 5 (1-p)

– U > D iff p < 5/13 ≈ 0.38  How to elicit p?

– Row’s analysis of Column’s decision •  Assuming C behaves as an SEU maximizer •  q: C’s beliefs about whether Row is smart enough to choose D (best NE) •  L SEU: -100 (1-q) + 8 q

R SEU: 6 (1-q) + 4 q •  L > R iff q > 53/55 ≈ 0.96 •  Since Row does not know q, his beliefs about q are represented by a CDF F •  p = Pr( q > 53/55 ) = 1 − F(53/55)

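These thresholds can be checked numerically. Purely for illustration, suppose Row’s beliefs F about q are uniform on [0, 1] (the slides leave F unspecified):

```python
def row_seu(p):
    """Row's SEU of U and D given belief p that Column plays L."""
    return 4 * p + 10 * (1 - p), 12 * p + 5 * (1 - p)

# Column plays L iff his belief q (that Row will pick D) exceeds 53/55.
# Illustrative assumption: Row's beliefs about q are uniform on [0, 1],
# so p = Pr(q > 53/55) = 1 - 53/55 = 2/55.
p = 1 - 53 / 55
eu_u, eu_d = row_seu(p)
print(p < 5 / 13, eu_u > eu_d)  # True True -> Row plays U
```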


Simultaneous vs sequential games

  First mover advantage –  Both players want to move first

•  Credible commitment/threat

  Second mover advantage –  Players want to observe their opponent’s move

before acting –  Both players try not to disclose their moves

Game of Chicken Matching pennies game


Dynamic games: backward induction

  Sequential Defend-Attack games – Two intelligent players

•  Defender and Attacker – Sequential moves

•  First Defender, afterwards Attacker knowing Defender’s decision


Standard Game Theoretic Analysis

Solution:

Expected utilities at node S

Best Attacker’s decision at node A

Assuming the Defender knows the Attacker’s analysis, the Defender’s best decision at node D


Supporting a SEU maximizer Defender

Defender’s problem Defender’s solution of maximum SEU

Modeling input: ??


Example: Banks-Anderson (2006)

 Exploring how to defend US against a possible smallpox attack – Random costs (payoffs)

– Conditional probabilities of each kind of smallpox attack given terrorists know what defence has been adopted

– Compute expected cost of each defence strategy

 Solution: defence of minimum expected cost – This is the problematic step of the analysis


Predicting the Attacker’s decision:

Defender problem Defender’s view of Attacker problem


Solving the assessment problem

Defender’s view of Attacker problem

Elicitation of

A is an EU maximizer

D’s beliefs about

MC simulation


Bayesian decision solution for the sequential Defend–Attack model


Standard Game Theory vs. Bayesian Decision Analysis

 Decision Analysis (unitary DM) – Use of decision trees – Opponents’ actions treated as random variables

•  How to elicit probs on opponents’ decisions?? •  Sensitivity analysis on (problematic) probabilities

 Game theory (multiple DMs) – Use of game trees – Opponents’ actions treated as decision variables – All players are EU maximizers

•  Do we really know the utilities our opponents try to maximize?


Bayesian decision analysis approach to games

 One-sided prescriptive support – Use a prescriptive model (SEU) for supporting one of the DMs – Treat opponent's decisions as uncertainties – Assess probs over opponent's possible actions – Compute action of maximum expected utility

 The ‘real’ Bayesian approach to games (Kadane & Larkey 1982) – Weaken common (prior) knowledge assumption

 How to assess a prob distribution over actions of intelligent others?? –  “Adversarial Risk Analysis” (DRI, DB and JR) – Development of new methods for the elicitation of probs on adversary’s actions

•  by modeling the adversary’s decision reasoning -  Descriptive decision models


Relevance to counterbioterrorism

 Biological Threat Risk Assessment for DHS (Battelle, 2006) – Based on Probability Event Trees (PET)

•  Government & Terrorists’ decisions treated as random events

 Methodological improvements study (NRC committee) – PET appropriate for risk assessment of

•  Random failures in engineering systems, but not for adversarial risk assessment

•  Terrorists are intelligent adversaries trying to achieve their own objectives

•  Their decisions (if rational) can be somehow anticipated

– PET cannot be used for a full risk management analysis •  Government is a decision maker, not a random variable


Methodological improvement recommendations

 Distinction between risks from – Nature/Accidents vs. – Actions of intelligent adversaries

 Need for models to predict Terrorists’ behavior – Red team role playing (simulations of adversaries’ thinking)

– Attack-preference models •  Examine decision from Attacker viewpoint (T as DM)

– Decision analytic approaches •  Transform the PET into a decision tree (G as DM)

-  How to elicit probs on terrorist decisions?? -  Sensitivity analysis on (problematic) probabilities -  Von Winterfeldt and O’Sullivan (2006)

– Game theoretic approaches •  Transform the PET into a game tree (G & T as DMs)


Models to predict opponents’ behavior

 Role playing (simulations of adversaries’ thinking)

 Opponent-preference models – Examine decision from the opponent viewpoint

•  Elicit opponent’s probs and utilities from our viewpoint (point estimates) – Treat the opponent as an EU maximizer ( = rationality?)

•  Solve opponent’s decision problem by finding his action of max. EU

– Assuming we know the opponent’s true probs and utilities •  We can anticipate with certitude what the opponent will do

 Probabilistic prediction models – Acknowledge our uncertainty on opponent’s thinking


Opponent-preference models

 Von Winterfeldt and O’Sullivan (2006) – Should We Protect Commercial Airplanes Against Surface-to-Air Missile Attacks by Terrorists?

Decision tree + sensitivity analysis on probs


Parnell (2007)

  Elicit Terrorist’s probs and utilities from our viewpoint –  Point estimates

  Solve Terrorist’s decision problem –  Finding Terrorist’s action that gives him max. expected utility

  Assuming we know the Terrorist’s true probs and utilities –  We can anticipate with certitude what the terrorist will do


Parnell (2007)

 Terrorist decision tree


Paté-Cornell & Guikema (2002)

Attacker Defender


Paté-Cornell & Guikema (2002)

 Assessing probabilities of terrorist’s actions – From the Defender viewpoint

•  Model the Attacker’s decision problem •  Estimate Attacker’s probs and utilities (point estimates) •  Calculate expected utilities of attacker’s actions

– Prob of attacker’s actions proportional to their perceived EU

 Feed these probs into the Defender’s decision problem – Uncertainty of Attacker’s decisions has been quantified – Choose defense of maximum expected utility

 Shortcoming –  If the (idealized) adversary is an EU maximizer, he would certainly choose the attack of max expected utility


How to assess probabilities over the actions of an intelligent adversary??

 Raiffa (2002): Asymmetric prescriptive/descriptive approach – Prescriptive advice to one party conditional on a (probabilistic) description of how others will behave – Assess probability distribution from experimental data

•  Lab role simulation experiments

 Rios Insua, Rios & Banks (2009) – Assessment based on an analysis of the adversary’s rational behavior

•  Assuming the opponent is a SEU maximizer -  Model his decision problem -  Assess his probabilities and utilities -  Find his action of maximum expected utility

– Uncertainty in the Attacker’s decision stems from •  our uncertainty about his probabilities and utilities

– Sources of information •  Available past statistical data of Attacker’s decision behavior •  Expert knowledge / Intelligence


The Defend–Attack–Defend model

  Two intelligent players –  Defender and Attacker

  Sequential moves –  First, Defender moves –  Afterwards, Attacker knowing Defender’s move –  Afterwards, Defender again responding to attack

  Infinite regress


Standard Game Theory Analysis

  Under common knowledge of utilities and probs   At node

  Expected utilities at node S

  Best Attacker’s decision at node A

  Best Defender’s decision at node

  Nash Solution:


Supporting the Defender against the Attacker

  At node

  Expected utilities at node S

  At node A

  Best Defender’s decision at node

  ??


Predicting

  Attacker’s problem as seen by the Defender


Given Assessing


Monte-Carlo approximation of

 Drawn

 Generate by

 Approximate
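The formulas on this slide were lost in extraction, but the Monte-Carlo idea can be sketched: draw the Attacker’s probabilities and utilities from the Defender’s uncertainty distributions, solve the Attacker’s max-EU problem for each draw, and use the frequencies of his optimal actions as the predictive distribution. All distributions below are illustrative assumptions:

```python
import random

def predict_attacker(n_draws=10_000, seed=0):
    """Monte-Carlo predictive distribution over the Attacker's actions.
    Each draw samples the Attacker's success probability and utilities
    (illustrative distributions), solves his max-EU problem, and
    records the optimal action."""
    rng = random.Random(seed)
    counts = {"attack": 0, "no attack": 0}
    for _ in range(n_draws):
        p_success = rng.uniform(0.1, 0.6)   # Defender's beliefs about
        u_success = rng.uniform(0.8, 1.0)   # the Attacker's probs/utils
        u_failure = rng.uniform(0.0, 0.3)   # (all assumed for this sketch)
        u_no_attack = 0.5
        eu_attack = p_success * u_success + (1 - p_success) * u_failure
        best = "attack" if eu_attack > u_no_attack else "no attack"
        counts[best] += 1
    return {a: c / n_draws for a, c in counts.items()}

print(predict_attacker())  # relative frequencies of each action
```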


The assessment of

 The Defender may want to exploit information about how the Attacker analyzes her problem

 Hierarchy of recursive analysis –  Infinite regress – Stop when there is no more information to elicit


Games with private information

 Example: – Consider the following two-person simultaneous game with asymmetric information

•  Player 1 (Row) knows whether he is stronger than Player 2 (Column), but Player 2 does not know this

•  A player’s type is used to represent information privately known by that player


Bayes Nash Equilibrium

 Assumption – common prior over the row player's type:

•  Column’s beliefs about the row player’s type are common knowledge •  Why would Column disclose this information? •  Why would Row believe that Column is disclosing her true beliefs about his type?

 Row’s strategy function


Bayes Nash Equilibrium


Is the common knowledge assumption realistic?

  – Column is better off reporting that



Modeling opponents' learning of private information

 Simultaneous decisions – Bayes Nash Equilibrium – No opportunity to learn about this information

 Sequential decisions •  Perfect Bayesian equilibrium/Sequential rationality •  Opportunity to learn from the observed decision behavior

-  Signaling games

 Models of adversaries' thinking to anticipate their decision behavior – need to model opponents' learning of private information we want to keep secret – how would this lead to a predictive probability distribution?


Sequential Defend-Attack model with Defender’s private information

 Two intelligent players – Defender and Attacker

 Sequential moves – First Defender, afterwards Attacker knowing Defender’s decision

 Defender’s decision takes into account her private information – The vulnerabilities and importance of sites she wants to protect – The position of ground soldiers in the data ferry control problem (ITA)

 Attacker observes Defender’s decision – Attacker can infer/learn about information she wants to keep secret

 How to model the Attacker’s learning?


Influence diagram vs. game tree representation


A game theoretic analysis


A game theoretic analysis


A game theoretic solution


Supporting the Defender

 We weaken the common knowledge assumption  The Defender’s decision problem

[Influence diagram: Defender’s decision D, private information V, outcome S, Attacker’s decision A (??)]


Defender’s solution


Predicting the Attacker’s move:


Attacker action of MEU


Assessing


How to stop this hierarchy of recursive analysis?

 Potentially infinite analysis of nested decision models – where to stop?

•  Accommodate as much information as we can •  Stop when the Defender has no more information •  Non-informative or reference model •  Sensitivity analysis test