TUTORIAL ON STOCHASTIC GAMES Anna Jaśkiewicz Wrocław University of Science and Technology Department of Pure and Applied Mathematics e-mail: [email protected]

Post on 11-Jul-2020


TUTORIAL ON STOCHASTIC GAMES I - Lake Como …

TUTORIAL ON STOCHASTIC GAMES

Anna Jaśkiewicz

Wrocław University of Science and Technology

Department of Pure and Applied Mathematics

e-mail: [email protected]

PLAN:

- Theory:
  - The existence result.
  - Comments on the extensions.
  - Algorithms.
  - Limiting average payoff game.
- Applications: Bequest games.

MODEL OF A STOCHASTIC GAME

- $I = \{1, \ldots, n\}$ - the set of players; $i \in I$;
- $X = \{1, \ldots, N\}$ - the state space;
- $A_i$ - the finite action set of player $i$ (or $A_i(x)$);
- $A := A_1 \times \cdots \times A_n$ - the set of action profiles;
- $r_i : X \times A \to \mathbb{R}$ - the reward function of player $i$;
- $q(x' \mid x, a)$ - the probability that the next state is $x'$ given the current state $x$ and the action profile $a \in A$;
- $\beta \in (0, 1)$ - the discount factor.

EVOLUTION OF A STOCHASTIC GAME

[Diagram: the play visits the states $x_1, x_2, \ldots, x_k, \ldots$ At stage $k$ the players choose the action profile $a(k) = (a_{1k}, a_{2k}, \ldots, a_{nk})$ and receive the rewards $r_1(x_k, a(k)), \ldots, r_n(x_k, a(k))$; the next state is drawn from $q(\cdot \mid x_k, a(k))$.]

STRATEGIES

A strategy for player $i$ is a sequence

$$\pi_i = (\pi_{i1}, \pi_{i2}, \ldots),$$

where $\pi_{ik}(\cdot \mid h_k) \in \Pr(A_i)$, $k \in \mathbb{N}$, and

$$h_k = (x_1, a(1), x_2, a(2), \ldots, x_k)$$

is the history of the process up to the $k$-th state ($h_1 = x_1$), and

$$a(m) = (a_{1m}, a_{2m}, \ldots, a_{nm})$$

is the profile of actions at the $m$-th stage of the game.

STATIONARY STRATEGIES

A stationary strategy for player $i$ is a constant sequence

$$f_i^\infty = (f_i, f_i, \ldots),$$

where $f_i : X \to \Pr(A_i)$.

Identify $f_i^\infty \longleftrightarrow f_i$.

We shall write $f_i(\cdot \mid x)$ or $f_i(x)(\cdot)$, and

$$f = (f_1, \ldots, f_n), \quad \pi = (\pi_1, \ldots, \pi_n).$$

The set of stationary strategies for player $i$ is $F_i$.

EXPECTED DISCOUNTED REWARD

Let $x \in X$ be an initial state and $\pi$ be a strategy profile chosen by the players. Then $r_i^{(k)}(x, \pi)$ is the expected reward of player $i$ at the $k$-th stage of the game. Define the expected discounted reward over the infinite time horizon as follows:

$$J_i(x, \pi) = \sum_{k=1}^{\infty} \beta^{k-1} r_i^{(k)}(x, \pi).$$

It is well-defined, since

$$|J_i(x, \pi)| \le \frac{R}{1 - \beta}, \quad \text{where } \max_{i, x, a} |r_i(x, a)| \le R.$$

$\beta$ - the probability of continuation of the game, or $\beta = \frac{1}{1 + \rho}$, where $\rho$ is the interest rate.

AN EXAMPLE OF 2-PLAYER STOCHASTIC GAME I

[Figure: a two-state game. In state $x = 1$ each player has two actions; the four cells carry the payoff pairs $(6, 2)$, $(3, -3)$, $(2, 1)$ and $(1, 3)$, three of them with transition vector $(1, 0)$ and one with $\left(\tfrac13, \tfrac23\right)$. In state $x = 2$ there is a single cell with payoffs $(0, 0)$ and transition vector $(0, 1)$.]

Stationary strategies: $f_1 = \left(\left(\tfrac12, \tfrac12\right), 1\right)$, $f_2 = ((0, 1), 1)$.

$$r_1(x, f_1, f_2) = \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} r_1(x, a_1, a_2) f_1(a_1 \mid x) f_2(a_2 \mid x) = \begin{cases} 7/2, & x = 1 \\ 0, & x = 2 \end{cases}$$

$$r_2(x, f_1, f_2) = \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} r_2(x, a_1, a_2) f_1(a_1 \mid x) f_2(a_2 \mid x) = \begin{cases} 5/2, & x = 1 \\ 0, & x = 2 \end{cases}$$

$$q(y \mid x, f_1, f_2) = \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} q(y \mid x, a_1, a_2) f_1(a_1 \mid x) f_2(a_2 \mid x)$$

AN EXAMPLE OF 2-PLAYER STOCHASTIC GAME II

[Figure: the same two-state game as on the previous slide.]

$f_1 = \left(\left(\tfrac12, \tfrac12\right), 1\right)$, $f_2 = ((0, 1), 1)$

$$J_1(1, f_1, f_2) = \frac72 \left(1 + \beta \frac23 + \beta^2 \frac{2^2}{3^2} + \ldots\right) = \frac{7/2}{1 - \frac23 \beta}$$

$$J_2(1, f_1, f_2) = \frac52 \left(1 + \beta \frac23 + \beta^2 \frac{2^2}{3^2} + \ldots\right) = \frac{5/2}{1 - \frac23 \beta}$$

Formula:

$$Q_{f_1 f_2} = \begin{bmatrix} \frac23 & \frac13 \\ 0 & 1 \end{bmatrix}$$

$$r_1^{(k)}(x, f_1, f_2) = \sum_{y \in X} r_1(y, f_1, f_2)\, q^{(k-1)}(y \mid x, f_1, f_2)$$

$$J_i(f_1, f_2) = \sum_{k=1}^{\infty} \beta^{k-1} Q_{f_1 f_2}^{(k-1)} r_i(f_1, f_2) = (I - \beta Q_{f_1 f_2})^{-1} r_i(f_1, f_2)$$
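The formula above can be checked numerically. The following is a minimal sketch, assuming $\beta = 0.9$ (the slides do not fix a value) and using a hypothetical helper `discounted_payoff`. It sums the discounted series through the recursion $J \leftarrow r + \beta Q J$, whose limit is $(I - \beta Q)^{-1} r$, and compares the result in state 1 with the closed form $\frac{7/2}{1 - \frac23 \beta}$.

```python
def discounted_payoff(Q, r, beta, tol=1e-12, max_iter=100_000):
    """Approximate J = (I - beta*Q)^{-1} r by iterating J <- r + beta*Q*J."""
    n = len(r)
    J = [0.0] * n
    for _ in range(max_iter):
        J_new = [r[x] + beta * sum(Q[x][y] * J[y] for y in range(n))
                 for x in range(n)]
        if max(abs(a - b) for a, b in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J


# Transition matrix and one-stage rewards induced by f1, f2 in the example.
Q = [[2 / 3, 1 / 3],
     [0.0, 1.0]]
r1 = [7 / 2, 0.0]
r2 = [5 / 2, 0.0]
beta = 0.9  # assumed value, chosen only for illustration

J1 = discounted_payoff(Q, r1, beta)
J2 = discounted_payoff(Q, r2, beta)

# Closed forms from the slide for the initial state x = 1.
J1_closed = (7 / 2) / (1 - 2 * beta / 3)
J2_closed = (5 / 2) / (1 - 2 * beta / 3)
print(J1[0], J1_closed)
print(J2[0], J2_closed)
```

State 2 is absorbing with zero reward, so the second component of each payoff vector stays at 0.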

NASH EQUILIBRIUM IN A DISCOUNTED STOCHASTIC GAME

A profile $\pi^* = (\pi_1^*, \ldots, \pi_n^*)$ is a Nash equilibrium if

$$J_i(x, \pi^*) \ge J_i(x, \pi_i, \pi_{-i}^*)$$

for all $x \in X$, $\pi_i$ and $i \in I$.

Recall that for a vector $y \in \mathbb{R}^n$ we define

$$y_{-i} = (y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n),$$

$$(z_i, y_{-i}) = (y_1, \ldots, y_{i-1}, z_i, y_{i+1}, \ldots, y_n).$$

A stationary Nash equilibrium is a Nash equilibrium that belongs to the class of strategy profiles $F_1 \times \cdots \times F_n$.

EXISTENCE OF A NASH EQUILIBRIUM

Every discounted stochastic game possesses a stationary Nash equilibrium, i.e., there exists $f^* = (f_1^*, \ldots, f_n^*) \in F := F_1 \times \cdots \times F_n$ such that

$$J_i(x, f^*) \ge J_i(x, \pi_i, f_{-i}^*)$$

for all $x \in X$, $\pi_i$ and $i \in I$.

PROOF I

- Note that $F$ is a compact convex set in some Euclidean space.
- Observe that $J_i(x, \cdot)$ is continuous on $F$, i.e., if

  $$(f_1^k, f_2^k, \ldots, f_n^k) \to (f_1, f_2, \ldots, f_n) \text{ as } k \to \infty$$

  (i.e., $f_i^k(x) \to f_i(x)$ for all $x \in X$, $i \in I$), then $J_i(x, f_1^k, \ldots, f_n^k) \to J_i(x, f_1, \ldots, f_n)$. This follows from the formula $J_i(f_1, \ldots, f_n) = (I - \beta Q_{f_1 \ldots f_n})^{-1} r_i(f_1, \ldots, f_n)$.
- CLAIM: A policy $f^* = (f_1^*, \ldots, f_n^*)$ is a Nash equilibrium if $J_i(\cdot, f^*)$ satisfies the optimality equation

  $$J_i(x, f^*) = \max_{\mu \in \Pr(A_i)} \Big[ r_i(x, \mu, f_{-i}^*) + \beta \sum_{y \in X} J_i(y, f^*)\, q(y \mid x, \mu, f_{-i}^*) \Big] = r_i(x, f^*) + \beta \sum_{y \in X} J_i(y, f^*)\, q(y \mid x, f^*)$$

  for all $x \in X$ and $i \in I$.

PROOF II

- Note that for every $f \in F$ there exists $g_i \in F_i$ such that

  $$\max_{\mu \in \Pr(A_i)} \Big[ r_i(x, \mu, f_{-i}) + \beta \sum_{y \in X} J_i(y, f)\, q(y \mid x, \mu, f_{-i}) \Big] = r_i(x, g_i, f_{-i}) + \beta \sum_{y \in X} J_i(y, f)\, q(y \mid x, g_i, f_{-i}).$$

- Let $\varphi_i(f)$ be the set of all $g_i \in F_i$ that satisfy the above equality for all $x \in X$.
- Observe that $\varphi_i(f) \ne \emptyset$ and $\varphi_i(f)$ is convex, since

  $$r_i(x, \lambda g_{i1} + (1 - \lambda) g_{i2}, f_{-i}) = \lambda r_i(x, g_{i1}, f_{-i}) + (1 - \lambda) r_i(x, g_{i2}, f_{-i}), \quad \lambda \in [0, 1].$$

  The same holds for $q$.
- Define the mapping

  $$\Phi(f) = \varphi_1(f) \times \cdots \times \varphi_n(f).$$

PROOF: upper semicontinuity of $f \mapsto \Phi(f)$.

Fix two sequences $(f^k)$ and $(g^k)$ of stationary strategies such that

(1) $f^k \to f \in F$, $g^k \to g \in F$,

(2) $g^k \in \Phi(f^k)$.

Then, letting $k \to \infty$ in

$$\max_{\mu \in \Pr(A_i)} \Big[ r_i(x, \mu, f_{-i}^k) + \beta \sum_{y \in X} J_i(y, f^k)\, q(y \mid x, \mu, f_{-i}^k) \Big] = r_i(x, g_i^k, f_{-i}^k) + \beta \sum_{y \in X} J_i(y, f^k)\, q(y \mid x, g_i^k, f_{-i}^k),$$

we get (use the continuity of $J_i(x, \cdot)$!)

$$\max_{\mu \in \Pr(A_i)} \Big[ r_i(x, \mu, f_{-i}) + \beta \sum_{y \in X} J_i(y, f)\, q(y \mid x, \mu, f_{-i}) \Big] = r_i(x, g_i, f_{-i}) + \beta \sum_{y \in X} J_i(y, f)\, q(y \mid x, g_i, f_{-i})$$

for all $i \in I$ and $x \in X$. Hence, $g \in \Phi(f)$.

PROOF: the Kakutani fixed point theorem (1941)

Let $S \subset \mathbb{R}^m$ be a compact and convex set and let $\phi$ be an upper semicontinuous correspondence from $S$ to $S$ such that for every $s \in S$ the set $\phi(s)$ is non-empty and convex. Then there exists a point $s^* \in S$ such that $s^* \in \phi(s^*)$.

Since $\Phi$ is an upper semicontinuous correspondence from the compact convex set $F$ into itself, with non-empty convex values, it follows that there exists $f^* \in F$ such that $f^* \in \Phi(f^*)$.

PROOF: CLAIM REVISITED I

We have

$$\max_{\mu \in \Pr(A_i)} \Big[ r_i(x, \mu, f_{-i}^*) + \beta \sum_{y \in X} J_i(y, f^*)\, q(y \mid x, \mu, f_{-i}^*) \Big] = r_i(x, f^*) + \beta \sum_{y \in X} J_i(y, f^*)\, q(y \mid x, f^*) = J_i(x, f^*)$$

because for any $f \in F$

$$J_i(x, f) = \sum_{k=1}^{\infty} \beta^{k-1} r_i^{(k)}(x, f) = r_i(x, f) + \beta \sum_{k=2}^{\infty} \beta^{k-2} r_i^{(k)}(x, f) = r_i(x, f) + \beta \sum_{y \in X} J_i(y, f)\, q(y \mid x, f).$$

PROOF: CLAIM REVISITED II

For every $\mu \in \Pr(A_i)$ we have

$$J_i(x, f^*) \ge r_i(x, \mu, f_{-i}^*) + \beta \sum_{y \in X} J_i(y, f^*)\, q(y \mid x, \mu, f_{-i}^*).$$

Iterating this inequality $(T - 1)$ times:

$$J_i(x, f^*) \ge \sum_{k=1}^{T} \beta^{k-1} r_i^{(k)}(x, \pi_i, f_{-i}^*) + \beta^T \sum_{y \in X} J_i(y, f^*)\, q^{(T)}(y \mid x, \pi_i, f_{-i}^*).$$

Since $|J_i(x, \pi)| \le \frac{R}{1 - \beta}$, letting $T \to \infty$ we get

$$J_i(x, f^*) \ge J_i(x, \pi_i, f_{-i}^*).$$

COMMENTS ON THE EXISTENCE OF A STATIONARY NASH EQUILIBRIUM

- Non-zero-sum stochastic games with finite state and action sets: A.M. Fink (J Sci Hiroshima Univ Ser A-I, 28 (1964)), M. Takahashi (J Sci Hiroshima Univ Ser A-I, 26 (1963));
- These works generalise the paper of L.S. Shapley (Proc Natl Acad Sci 39 (1953)) on zero-sum stochastic games with finite state and action sets;

Year 1953 - the beginning of stochastic games

L.S. Shapley, Stochastic Games

COMMENTS II

- Non-zero-sum stochastic games with a countable state space and compact metric action spaces: A. Federgruen (Adv Appl Prob, 10 (1978)):

(1) introduce the topology of point-wise convergence in the set of stationary strategies;

(2) $f \mapsto J_i(x, f)$ is continuous for every $i \in I$;

(3) a generalisation of Kakutani's fixed point theorem due to I. Glicksberg (1952) and K. Fan (1952).

COMMENTS III

- Non-zero-sum stochastic games with an uncountable (general) state space and finite action spaces.

The problem of the existence of a stationary Nash equilibrium remained unsolved until 2015!

Y.J. Levy and A. McLennan (Econometrica 83 (2015)) gave a counterexample:

1. an 8-player stochastic game, state space $X = [0, 1]$, finite action sets;
2. the transition probability is a convex combination of the Dirac measure $\delta_1$ and the uniform distribution on $[x, 1]$, with coefficients depending on the current state $x$ and the action profile $a$ chosen by the players;
3. payoff functions are complex;
4. E. Kohlberg and J.-F. Mertens (Econometrica 54 (1986)): On strategic stability of equilibria.

COMMENTS IV: Why is it so difficult?

- The problem of the existence of a stationary Nash equilibrium is a fixed point problem, which requires:
  - compactness of the domain;
  - continuity of the expected payoffs on the product of stationary strategy sets.

Remedy (?):

1. the weak-star topology on the set of stationary strategies (the Banach-Alaoglu theorem gives compactness of $F$); however, we lose continuity of the expected payoffs (counterexample: R.J. Elliot, N.J. Kalton, L. Marcus (1973));
2. the topology of uniform convergence on the set $F$; but then a family is compact if it is uniformly bounded and equicontinuous (the Arzela-Ascoli theorem): one may consider Lipschitz functions with constant 1, but the best response can be a Lipschitz function with constant 2...

COMMENTS V: General state space

- P. Barelli and J. Duggan (J Econ Theory 15, (2014)):
  1. the transition law $q(\cdot \mid y, a)$ is absolutely continuous w.r.t. some non-atomic measure $\mu$;
  2. the equilibrium strategies are independent of the stage of the game; they may depend on the current state, the previous state and the action profile chosen by the players in the previous state;
- A. Jaśkiewicz, A.S. Nowak (Math Oper Res 41 (2016)):
  1. $q(\cdot \mid x, a) = \sum_{k=1}^{l} g_k(x, a)\, q_k(\cdot \mid x)$ such that $\sum_{k=1}^{l} g_k(x, a) = 1$, where $g_k$ is a Carathéodory function;
  2. $q_k(\cdot \mid x)$ is absolutely continuous w.r.t. the measure $\mu$;
  3. the equilibrium strategies are independent of the stage of the game; they may depend on the current state and the previous state;
  4. the counterexample of Levy/McLennan belongs to the games considered here (SAMPE).
- Proofs: look for a fixed point in the set of Nash equilibrium payoffs!
- J.-F. Mertens and T. Parthasarathy (1991, 2003): subgame perfect equilibrium (the entire history of the game).

ALGORITHMS: ZERO-SUM STOCHASTIC GAMES I

- Consider the game in which $r := r_1 = -r_2$, player 1 maximises and player 2 minimises; $X$, $A_1$ and $A_2$ are finite;
- Define the game $\Gamma_v(x)$ with the payoff matrix

  $$P_v(x) = \Big[ r(x, a_1, a_2) + \beta \sum_{y \in X} v(y)\, q(y \mid x, a_1, a_2) \Big]_{a_1, a_2}, \qquad Tv(x) = \operatorname{val}[P_v(x)];$$

- By Shapley (1953), the value function $w$ is the unique solution to the equation $w = Tw$, and if $f_1^*(x)$ and $f_2^*(x)$ are optimal strategies in the game $\Gamma_w(x)$, then $f_1^*$ and $f_2^*$ are optimal in the stochastic game, i.e.,

  $$J(x, \pi_1, f_2^*) \le J(x, f_1^*, f_2^*) = w(x) \le J(x, f_1^*, \pi_2)$$

  for every $x \in X$, $\pi_1$ and $\pi_2$.

ALGORITHMS: ZERO-SUM STOCHASTIC GAMES II

- $w$ is the fixed point of the contractive operator $T$;
- value iteration (VI): $T^{(k)} 0 \to w$, but slowly!
- A.J. Hoffman and R.M. Karp (1966): a modification of VI that uses information on optimal strategies in the $k$-th stage of the game:
  1. find $g_1(x)$ - an optimal strategy for player 2 in $\Gamma_0(x)$;
  2. find $w_1(x) = \sup_{\pi_1} J(x, \pi_1, g_1)$;
  3. find $g_2(x)$ - an optimal strategy for player 2 in $\Gamma_{w_1}(x)$;
  4. find $w_2(x) = \sup_{\pi_1} J(x, \pi_1, g_2)$;
  5. find $g_3(x)$ - an optimal strategy for player 2 in $\Gamma_{w_2}(x)$;
  6. etc.
  7. Claim: $w_k \to w$.

ALGORITHMS: ZERO-SUM STOCHASTIC GAMES III

- Pollatschek and Avi-Itzhak algorithm (1969) - proved to be convergent under stringent assumptions;
- J. Filar and B. Tolwinski (1991) applied the modified Newton's method and improved this algorithm: $w$ is the unique solution to the Shapley equation, which is equivalent to the unconstrained optimisation problem

  $$\min_{v \in \mathbb{R}^N} \sum_{x \in X} (Tv(x) - v(x))^2.$$

  It has a unique global minimum ($v = (v(1), \ldots, v(N))$);
- 1-player game - MDP: LP solves the problem!
- In zero-sum games this is not the case!

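ALGORITHMS: COUNTEREXAMPLE FOR LP

[Figure: a two-state zero-sum game with $\beta = \tfrac12$. In state $x = 1$ the payoff matrix has the entries $1$ and $3$ on the diagonal, each with transition vector $(1, 0)$, and $0$ off the diagonal, with transition vector $(0, 1)$. State $x = 2$ has payoff $0$ and transition vector $(0, 1)$.]

Shapley equation:

$$w(1) = \operatorname{val} \Big[ r(1, a_1, a_2) + \beta \sum_{y \in X} w(y)\, q(y \mid 1, a_1, a_2) \Big]_{a_1, a_2}$$

With $\beta = \tfrac12$ and $w(2) = 0$:

$$w(1) = \operatorname{val} \begin{bmatrix} 1 + \frac12 w(1) & 0 + \frac12 w(2) \\ 0 + \frac12 w(2) & 3 + \frac12 w(1) \end{bmatrix}$$

$$w(1) = \frac13 \left( -4 + 2\sqrt{13} \right)$$

All data of the game are rational, yet the value $w(1)$ is irrational, so it cannot be obtained as the solution of a linear program with rational data.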
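The Shapley equation of this counterexample can be solved numerically by value iteration. The sketch below is illustrative only: `val_2x2` is a hypothetical helper that computes the value of a 2x2 matrix game (a pure saddle point if one exists, otherwise the standard mixed-strategy formula), and `shapley_iteration` applies the operator $T$ repeatedly starting from $w = 0$, with $\beta = \tfrac12$ as on the slide.

```python
import math


def val_2x2(m):
    """Value of the 2x2 zero-sum matrix game m (row player maximises)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    upper = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if lower == upper:                  # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)  # completely mixed case


def shapley_iteration(beta=0.5, iters=100):
    """Value iteration w <- Tw for the two-state counterexample."""
    w1, w2 = 0.0, 0.0  # w(2) = 0: state 2 has payoff 0 and is never left
    for _ in range(iters):
        w1 = val_2x2([[1 + beta * w1, 0 + beta * w2],
                      [0 + beta * w2, 3 + beta * w1]])
    return w1


w1 = shapley_iteration()
closed_form = (-4 + 2 * math.sqrt(13)) / 3
print(w1, closed_form)
```

Because $T$ is a contraction with modulus $\beta$, the error shrinks at least geometrically; a hundred iterations are far more than needed here.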

ALGORITHMS: LP FOR ZERO-SUM STOCHASTIC GAMES

LP for the Single-Controller Stochastic Game (SCSG): player 2 controls the transition probability, i.e., $q(y \mid x, a_1, a_2) = q(y \mid x, a_2)$.

$$(\mathrm{LP}) \qquad \max_{v \in \mathbb{R}^N} \sum_{x \in X} v(x)$$

subject to

$$r(x, f_1, a_2) + \beta \sum_{y \in X} v(y)\, q(y \mid x, f_1, a_2) \ge v(x), \quad a_2 \in A_2,\ x \in X,$$

$$v(x) \ge 0, \qquad f_1 \in F_1.$$

The problem (LP) has a solution $(v^*, f_1^*)$ such that $v^* = w$ and $f_1^*$ is an optimal strategy in the SCSG. To find an optimal strategy $f_2^*$, use the Shapley equation $w = Tw$.

ALGORITHMS FOR NON-ZERO-SUM STOCHASTIC GAMES I

Linear Complementarity Problem (LCP):

- given a square matrix $M$ ($m \times m$)

- given a column vector $Q \in \mathbb{R}^m$

- find two vectors $Z = [z_1, \ldots, z_m]^T \in \mathbb{R}^m$ and $W = [w_1, \ldots, w_m]^T \in \mathbb{R}^m$ such that

$$MZ + Q = W, \quad w_j, z_j \ge 0, \quad w_j z_j = 0$$

for all $j = 1, \ldots, m$.

C.E. Lemke (1965) proposed a finite-step pivoting algorithm to solve the LCP for a large class of matrices $M$ and vectors $Q$.

ALGORITHMS FOR NON-ZERO-SUM STOCHASTIC GAMES II

Finding a NE in any bimatrix game $(C, D)$ is equivalent to solving the LCP with

$$M = \begin{bmatrix} D^T & O \\ O & C \end{bmatrix}, \qquad Q = [-1, \ldots, -1]^T.$$

C.E. Lemke and J.T. Howson (1964): a finite-step algorithm for this LCP;

If $Z^* = [Z_1^*, Z_2^*]$ is a solution to the LCP, then the normalisation of $Z_i^*$ is an equilibrium strategy for player $i$.

ALGORITHMS: LCP FOR NON-ZERO-SUM STOCHASTIC GAMES I

LCP for the Single-Controller Stochastic Game:

- player 2 controls the transition probability, i.e., $q(y \mid x, a_1, a_2) = q(y \mid x, a_2)$

- $\{f_1^1, \ldots, f_1^{m_1}\}$, $\{f_2^1, \ldots, f_2^{m_2}\}$ - the sets of pure stationary strategies of the players

Consider the bimatrix game:

$$C \leftrightarrow c_{k,l} = r_1(x, f_1^k(x), f_2^l(x))$$

$$D \leftrightarrow d_{k,l} = \sum_{x \in X} J_2(x, f_1^k, f_2^l)$$

ALGORITHMS: LCP FOR NON-ZERO-SUM STOCHASTIC GAMES II

Let $p = (p_1, \ldots, p_{m_1})$, $q = (q_1, \ldots, q_{m_2})$ be a NE in the bimatrix game $(C, D)$. Then the stationary strategies

$$f_1^*(x) = \sum_{k=1}^{m_1} p_k\, \delta_{f_1^k(x)}, \qquad f_2^*(x) = \sum_{l=1}^{m_2} q_l\, \delta_{f_2^l(x)}$$

form a NE in the discounted stochastic game.

A.S. Nowak, T.E.S. Raghavan (1993), improvements due to S.R. Mohan, S.K. Neogy, T. Parthasarathy (1997, 2001)

RECENT DEVELOPMENTS ON ALGORITHMS

- J.J.P. Herings and R.J.A.P. Peters (2004, 2010): homotopy methods in computing Nash equilibria. The idea is based on the application of the Browder fixed point theorem (1960).
- S. Govindan and R. Wilson (2003): the algorithm combines the global Newton method and a homotopy method for finding fixed points of some continuous mapping.

MODEL OF A ZERO-SUM STOCHASTIC GAME

- $I = \{1, 2\}$ - the set of players; $i \in I$;
- $A := A_1 \times A_2$ - the set of action profiles;
- $r : X \times A \to \mathbb{R}$ - the reward function of player 1 (the loss function of player 2);
- $V_\beta$ - the value of the normalised $\beta$-discounted game, i.e., for every $x \in X$ and strategies $\pi_i$, $i = 1, 2$,

  $$J_\beta(x, \pi_1, f_2^*) \le J_\beta(x, f_1^*, f_2^*) = V_\beta(x) \le J_\beta(x, f_1^*, \pi_2),$$

  where $(f_1^*, f_2^*) \in F_1 \times F_2$ are optimal stationary strategies and

  $$J_\beta(x, \pi_1, \pi_2) = (1 - \beta) \sum_{k=1}^{\infty} \beta^{k-1} r^{(k)}(x, \pi_1, \pi_2).$$

SHAPLEY'S THEOREM

An auxiliary matrix game $\Gamma_v(x)$:

$$\Big[ (1 - \beta) r(x, a_1, a_2) + \beta \sum_{y \in X} v(y)\, q(y \mid x, a_1, a_2) \Big]_{a_1, a_2}$$

The discounted zero-sum stochastic game possesses the value $V_\beta$, which is the unique solution to

$$V_\beta(x) = \operatorname{val} \Big[ (1 - \beta) r(x, a_1, a_2) + \beta \sum_{y \in X} V_\beta(y)\, q(y \mid x, a_1, a_2) \Big]$$

for all $x \in X$. Moreover, if $(f_1^*(x), f_2^*(x))$ is an optimal strategy pair in the game $\Gamma_{V_\beta}(x)$ for every $x \in X$, then $(f_1^*, f_2^*)$ is an optimal pair of strategies in the stochastic zero-sum game.

The map $T_\beta : \mathbb{R}^N \to \mathbb{R}^N$ is contractive:

$$(T_\beta v)(x) := \operatorname{val} \Big[ (1 - \beta) r(x, a_1, a_2) + \beta \sum_{y \in X} v(y)\, q(y \mid x, a_1, a_2) \Big]$$

THE LIMITING AVERAGE PAYOFF CRITERION

The worst scenario for player 1:

$$V(x, \pi_1, \pi_2) = \liminf_{T \to \infty} \frac1T \sum_{k=1}^{T} r^{(k)}(x, \pi_1, \pi_2)$$

- MDP model (1-player game): the limiting average payoff (LAP) criterion is considerably more difficult to analyse;
- There exists a stationary optimal strategy, and it can be found by a suitably constructed linear program.

Questions:

- Does the value of the limiting average game exist?
- Do players possess optimal (stationary) strategies?

THE BIG MATCH (GILLETTE (1957))

[Figure: a three-state game; the states $x = 2$ and $x = 3$ are absorbing (marked *). In state $x = 1$ player 1 chooses a row (his prediction, 1 or 2) and player 2 a column (his number, 1 or 2). Row 1 yields payoff 1 against column 1 and 0 against column 2, with transition vector $(1, 0, 0)$ in both cases. Row 2 yields payoff 0 against column 1, with transition vector $(0, 1, 0)$, and payoff 1 against column 2, with transition vector $(0, 0, 1)$. State 2 yields payoff 0 and state 3 yields payoff 1.]

Every day player 2 chooses a number 1 or 2 and player 1 tries to predict 2's choice, winning a point if he is correct. This continues as long as player 1 predicts 1. But if he ever predicts 2, all future choices for both players are required to be the same as that day's choices: if player 1 is correct on that day, he wins a point every day thereafter; if he is wrong on that day, he wins zero every day thereafter. The payoff to player 1 is

$$V(1, \pi_1, \pi_2) = \liminf_{n \to \infty} \frac{\xi_1 + \ldots + \xi_n}{n}, \qquad \xi_k \in \{0, 1\},$$

where $\xi_k$ is the point won by player 1 on day $k$.

STATIONARY STRATEGIES IN THE BIG MATCH

Gillette (1957): in stationary strategies

$$\max_{f_1 \in F_1} \min_{f_2 \in F_2} V(1, f_1, f_2) = 0 < \min_{f_2 \in F_2} \max_{f_1 \in F_1} V(1, f_1, f_2) = \frac12.$$

Consider the following stationary strategies:

$$f_1^p = ((p, 1 - p), 1, 1)$$

$$f_2^q = ((q, 1 - q), 1, 1)$$

THE LOWER VALUE

[Figure: the Big Match when player 1 uses $f_1^p(1) = (p, 1 - p)$ and player 2 uses $f_2^q(1) = (q, 1 - q)$. Against column 1 the transition vector from state 1 is $(p, 1 - p, 0)$; against column 2 it is $(p, 0, 1 - p)$. States 2 and 3 are absorbing (marked *).]

CASE 1. $p = 1$: player 1 never chooses the row causing absorption; but against the strategy $f_2^0(1) = (0, 1)$ player 1 will always earn 0, and hence $V(1, f_1^1, f_2^0) = 0$.

CASE 2. $0 \le p < 1$: against $f_2^1(1) = (1, 0)$ player 1 will ultimately be absorbed in state 2 with probability 1, since

$$(1 - p) + p(1 - p) + p^2(1 - p) + \ldots = 1.$$

In view of the nature of the LAP, $V(1, f_1^p, f_2^1) = 0$.

CONCLUSION: $\min_{f_2 \in F_2} V(1, f_1^p, f_2) = 0$.

THE UPPER VALUE

[Figure: the Big Match when player 2 uses $f_2^q(1) = (q, 1 - q)$ and player 1 uses $f_1^p(1) = (p, 1 - p)$. Row 1 leads to the transition vector $(1, 0, 0)$ and row 2 to $(0, q, 1 - q)$. States 2 and 3 are absorbing (marked *). A plot shows $\max_{f_1 \in F_1} V(1, f_1, f_2^q)$ as a function of $q \in [0, 1]$, with minimum $\tfrac12$ at $q = \tfrac12$.]

CASE 1. $p < 1$: absorption in state 2 [3] will occur with probability $q$ [$1 - q$], since

$$q(1 - p) + pq(1 - p) + p^2 q(1 - p) + \ldots = q.$$

CASE 2. $p = 1$: state 1 will repeat itself infinitely often.

$$V(1, f_1^p, f_2^q) = \begin{cases} q & \text{if } p = 1 \\ 1 - q & \text{if } p < 1 \end{cases}$$

CASE 0. $q = \tfrac12$: irrespective of what player 1 does in state 1, $V(1, f_1^p, f_2^{1/2}) = \tfrac12$.

CONCLUSION:

$$\min_{f_2 \in F_2} \max_{f_1 \in F_1} V(1, f_1, f_2) = \frac12.$$
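The gap between the lower and the upper value in stationary strategies can be reproduced with a few lines of code. This is a rough sketch on a grid of stationary strategies: `lap_payoff` is a hypothetical name for the piecewise formula $V(1, f_1^p, f_2^q)$ derived on these slides, and the grid resolution is an arbitrary choice.

```python
def lap_payoff(p, q):
    """Limiting average payoff V(1, f1^p, f2^q) in the Big Match.

    p is the probability that player 1 plays the non-absorbing row,
    q is the probability that player 2 plays column 1.
    """
    return q if p == 1.0 else 1.0 - q


grid = [k / 100 for k in range(101)]  # p and q range over {0, 0.01, ..., 1}

# Lower value: player 1 commits first, player 2 replies with the worst q.
lower = max(min(lap_payoff(p, q) for q in grid) for p in grid)

# Upper value: player 2 commits first, player 1 replies with the best p.
upper = min(max(lap_payoff(p, q) for p in grid) for q in grid)

print(lower, upper)
```

The grid contains $q = \tfrac12$, where $\max(q, 1 - q)$ attains its minimum, so the computed upper value coincides with the one on the slide.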

SOLUTION OF THE BIG MATCH

- The Big Match does not have a limiting average value in stationary strategies;
- The Big Match does not have a limiting average value in Markov strategies (dependence on the stage and the current state);
- The Big Match was solved by Blackwell and Ferguson (1968): the limiting average value of the Big Match equals $\tfrac12$.

Note: Behavioural strategies are indispensable for achieving limiting average $\varepsilon$-optimality. Player 1 can guarantee a LAP as close to $\tfrac12$ as he likes by carefully taking into account the opponent's behaviour (his past actions) in the process of choosing his own actions. There is no way that player 1 can guarantee exactly $\tfrac12$.

PROOF - COMMENTS

- Blackwell and Ferguson (1968) provided two different constructions of an $\varepsilon$-optimal strategy for player 1. One of them relies on using a sequence of optimal stationary strategies in discounted games with the discount factor tending to 1. The idea is to switch from one discounted optimal strategy to another on the basis of some statistics defined on the past plays.
- Their results were generalised by Kohlberg (1974) to absorbing zero-sum stochastic games, i.e., stochastic games in which all states but one are absorbing.

THE RESULT OF MERTENS AND NEYMAN (1981)

The limiting average value $V^*$ exists for every finite zero-sum stochastic game.

Moreover, $\forall \varepsilon > 0$ $\exists\, (\pi_1^\varepsilon, \pi_2^\varepsilon)$, $n_0 \in \mathbb{N}$, $\beta_0 \in (0, 1)$ such that

$$\sup_{\pi_1} J_n(x, \pi_1, \pi_2^\varepsilon) - \varepsilon \le V^*(x) \le \inf_{\pi_2} J_n(x, \pi_1^\varepsilon, \pi_2) + \varepsilon$$

for all $n \ge n_0$, $x \in X$, and

$$\sup_{\pi_1} J_\beta(x, \pi_1, \pi_2^\varepsilon) - \varepsilon \le V^*(x) \le \inf_{\pi_2} J_\beta(x, \pi_1^\varepsilon, \pi_2) + \varepsilon$$

for all $\beta \in (\beta_0, 1)$, $x \in X$, where

$$J_n(x, \pi_1, \pi_2) = \frac1n \sum_{k=1}^{n} r^{(k)}(x, \pi_1, \pi_2).$$

$(\pi_1^\varepsilon, \pi_2^\varepsilon)$ are nearly optimal in sufficiently long finite games and in all discounted games with discount factor sufficiently close to 1.

THE BIG MATCH

$$V^*(x) = \lim_{n \to \infty} V_n(x) = \lim_{\beta \to 1} V_\beta(x)$$

The value for $n$-stage games: $V_n(1) = \tfrac12$ for all $n$.

The value for $\beta$-discounted games: $V_\beta(1) = \tfrac12$ for all $\beta$.

The unique optimal strategy for player 2 is $\left(\tfrac12, \tfrac12\right)$ for all $n$-stage games and all $\beta$-discounted games.

For player 1 the $\beta$-discounted optimal strategy is $\left(\frac{1}{2 - \beta}, \frac{1 - \beta}{2 - \beta}\right)$.

For player 1 an optimal Markov strategy in the $n$-stage game is to play $\left(\frac{1 + m}{2 + m}, \frac{1}{2 + m}\right)$ at stage $n - m$ for $m = 1, 2, \ldots, n - 1$.
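The discounted claims above can be checked against Shapley's equation for the normalised payoff. The sketch below assumes the Big Match payoffs as described on the earlier slides (row 1 pays 1 or 0 and stays in state 1; row 2 leads to an absorbing state paying 0 or 1 forever); the function names are hypothetical. In state 1 the auxiliary matrix game is $\begin{bmatrix} (1-\beta) + \beta v & \beta v \\ 0 & 1 \end{bmatrix}$, where the second row already contains the normalised payoffs of the absorbing states.

```python
def solve_2x2(m):
    """Return (value, prob. of row 1) for a 2x2 zero-sum matrix game."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:  # pure saddle point: pick a row achieving it
        return lower, (1.0 if min(a, b) == lower else 0.0)
    denom = a + d - b - c
    return (a * d - b * c) / denom, (d - c) / denom


def big_match_discounted(beta, iters=2000):
    """Iterate the Shapley operator of the normalised beta-discounted game."""
    v = 0.0
    for _ in range(iters):
        matrix = [[(1 - beta) * 1 + beta * v, (1 - beta) * 0 + beta * v],
                  [0.0, 1.0]]
        v, row1_prob = solve_2x2(matrix)
    return v, row1_prob


beta = 0.9  # assumed value, for illustration only
value, row1_prob = big_match_discounted(beta)
print(value, row1_prob, 1 / (2 - beta))
```

With these payoffs the iteration settles at the value $\tfrac12$, and the probability of the non-absorbing row agrees with $\frac{1}{2 - \beta}$.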

PROOF: COMMENTS

The proof contains a clever use of the Blackwell and Ferguson approach and the Bewley and Kohlberg result (the field of real Puiseux series turns out to be extremely useful in studying the asymptotic behaviour of $V_\beta$ as $\beta \nearrow 1$ and $V_n$ as $n \to \infty$).

The proof is complex and makes use of algebraic tools and Tarski's principle from logic.

Bewley and Kohlberg (1976):

There exist $\beta^* \in (0, 1)$, $M \in \mathbb{N}$ and real numbers $c_k(x)$, $k = 0, 1, \ldots$ such that for all $\beta \in (\beta^*, 1)$

$$V_\beta(x) = \sum_{k=0}^{\infty} c_k(x) (1 - \beta)^{k/M}.$$

There exist $M \in \mathbb{N}$ and real numbers $d_k(x)$, $k = 0, 1, \ldots$ such that for all sufficiently large $n$

$$\Big| V_n(x) - \sum_{k=0}^{\infty} d_k(x)\, n^{-k/M} \Big| = O(\ln n / n).$$

LAP IN NON-ZERO SUM GAMES

- The limiting average equilibrium payoffs can neither be approached asymptotically from the set of $\beta$-discounted equilibrium payoffs nor from the set of $n$-stage equilibrium payoffs (see The Paris Match due to S. Sorin).
- The Puiseux series expansions for the normalised $\beta$-discounted payoffs in finite non-zero-sum games were provided by Mertens (1982).

MODEL OF A BEQUEST GAME

- $\{1, 2, \ldots\}$ - the set of short-lived generations;
- generation $t$ lives, acts and dies in period $t$;
- there is a single good used both for consumption and as productive capital; $S = [0, 1]$ is the space of endowments;
- $s_t \in [0, 1]$ - the endowment of generation $t$;
- generation $t$ saves $y_t \in [0, s_t]$ and consumes $a_t$:

  $$\underbrace{s_t}_{\text{endowment}} = \underbrace{a_t}_{\text{consumption}} + \underbrace{y_t}_{\text{investment}}$$

- the next generation's inheritance or capital: $s_{t+1} = f(y_t)$, where $f$ is an increasing continuous production function with $f(0) = 0$;
- $u_t := u(a_t, a_{t+1})$ - generation $t$'s utility (a generation cares about itself and its descendant);

EVOLUTION OF A BEQUEST GAME

[Diagram: generation $t$ inherits $s_t$, consumes $a_t$ and invests $y_t = s_t - a_t$; the next endowment is $f(y_t) = s_{t+1}$. The same step is repeated for generations $t + 1, \ldots, t + m$, with $y_{t+m} = s_{t+m} - a_{t+m}$.]

$u(a_t, a_{t+1})$: each generation derives its utility from its own consumption and the follower's consumption.

STRATEGIES

- $\Phi$ - the set of functions $\varphi : S \to S$ such that $\varphi(s) \in [0, s]$;
- a strategy for generation $t$ is a function $c_t \in \Phi$;
- if $c_t = c$ for all $t$ and some $c \in \Phi$, then the generations employ a stationary strategy;
- $c \in \Phi$ - a consumption strategy $\Longrightarrow$ $i(s) = s - c(s)$ - an investment/saving strategy.

Page 49: TUTORIAL ON STOCHASTIC GAMES I - Lake Como …TUTORIAL ON STOCHASTIC GAMES Anna Ja!kiewicz Wroc"aw University of Science and Technology Department of Pure and Applied Mathematics e-mail:

SOLUTION

E. Phelps and R. Pollak (Rev. Econ. Stud. (1968)): a game between generations!

Definition: A strategy $c^* \in \Phi$ is a stationary Markov perfect equilibrium (SMPE) if

$$c^*(s) \in \arg\max_{a \in [0,s]} u\bigl(a, c^*(f(s - a))\bigr) \quad \text{for every } s \in S.$$

In other words, $c^*$ is a best reply for the current generation when the next generation uses $c^*$.


EXAMPLE

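- $u(a_t, a_{t+1}) = \ln a_t + \beta \ln a_{t+1}$, where $\beta \in (0, 1)$;

- $f(y) = y^{\xi}$, where $\xi \in (0, 1)$;

- we look for a stationary strategy $c(\cdot)$ that is differentiable;

- solve the problem

$$\max_{a \in [0,s]} \bigl(\ln a + \beta \ln c((s - a)^{\xi})\bigr)$$

- FOC:

$$\frac{1}{a} = \frac{\beta \xi (s - a)^{\xi - 1}\, c'((s - a)^{\xi})}{c((s - a)^{\xi})}$$

- this suggests a linear strategy $c(s) := As$, $A \in (0, 1)$:

$$\frac{1}{a} = \frac{\beta \xi (s - a)^{\xi - 1} A}{A (s - a)^{\xi}} = \frac{\beta \xi}{s - a} \;\Rightarrow\; a = \frac{1}{1 + \beta \xi}\, s$$

- hence, $c^*(s) = \dfrac{1}{1 + \beta \xi}\, s$ is an SMPE.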
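The equilibrium condition can also be checked numerically. The following sketch, with assumed parameter values $\beta = 0.9$, $\xi = 0.5$ and a hypothetical helper `best_reply_on_grid`, maximises the current generation's payoff over a fine grid of consumption levels when the successor uses $c^*$, and compares the maximiser with $c^*(s)$.

```python
import math

beta, xi = 0.9, 0.5              # illustrative values (assumptions)


def c_star(s):
    """Candidate equilibrium c*(s) = s / (1 + beta * xi)."""
    return s / (1.0 + beta * xi)


def payoff(a, s, c):
    """ln a + beta * ln c((s - a)^xi): payoff when the successor uses c."""
    return math.log(a) + beta * math.log(c((s - a) ** xi))


def best_reply_on_grid(s, c, n=20000):
    """Maximise the payoff over the interior grid a = s * k / n, k = 1..n-1."""
    k_best = max(range(1, n), key=lambda k: payoff(s * k / n, s, c))
    return s * k_best / n


best_replies = {s: best_reply_on_grid(s, c_star) for s in (0.2, 0.5, 1.0)}
for s, a_hat in best_replies.items():
    print(s, a_hat, c_star(s))
```

Because the payoff is strictly concave in $a$, the grid maximiser lies within one grid step of the exact best reply, so agreement up to the grid resolution is what one should expect here.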


EXISTENCE OF SMPE: BEQUEST GAME

- Suppose that a generation inherits $s$ and chooses a consumption level $\hat{c}(s) = a$ that maximises

$$u\bigl(a, c(f(s - a))\bigr),$$

where $c(\cdot)$ is the strategy of the following generation;

- $c(\cdot)$ continuous $\Rightarrow$ the maximum on $[0, s]$ exists;

- one can thus assign to a continuous function $c$ a best-reply function $\hat{c}$;

- a fixed point of such a map is an SMPE;

- PROBLEM!

  - $\hat{c}$ need not be continuous;

  - bounded sets in $C[0, 1]$ are not compact, e.g., the sequence $(x^n)$ has no uniformly convergent subsequence;

- Remedy: restrict attention to Lipschitz continuous functions (a stringent assumption);

- compactness of the domain: the Arzelà-Ascoli theorem, the Banach-Alaoglu theorem.
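The failure of compactness for the sequence $(x^n)$ can be seen numerically. Since $\sup_{x \in [0,1]} (x^n - x^{2n}) = 1/4$ (take $x^n = 1/2$) and $x^m \le x^{2n}$ for $m \ge 2n$, every term stays at sup-distance at least $1/4$ from all sufficiently late terms, so no subsequence is Cauchy in the sup norm. The sketch below approximates these distances on a grid; the helper name `sup_distance` and the grid size are assumptions of the illustration.

```python
def sup_distance(n, m, points=20001):
    """Approximate the sup over [0, 1] of |x^n - x^m| on a uniform grid."""
    xs = (k / (points - 1) for k in range(points))
    return max(abs(x ** n - x ** m) for x in xs)


distances = {n: sup_distance(n, 2 * n) for n in (1, 2, 4, 8, 16, 32, 64)}
for n, d in distances.items():
    print(n, round(d, 6))
```

A family of Lipschitz functions with a common constant is equicontinuous, which is what rules out this behaviour and makes the Arzelà-Ascoli theorem applicable.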


EXAMPLE: FISH WAR GAME I

- Each generation extracts a renewable common-property resource, e.g., fish;

- $f(y) = y^{\xi}$ - the production function;

- The utility of generation $t$:

$$u_t(a_t, a_{t+1}, a_{t+2}, \ldots) := \ln a_t + \alpha \bigl(\beta \ln a_{t+1} + \beta^2 \ln a_{t+2} + \ldots\bigr)$$

- Generation $t$ cares about the consumption levels of all future generations; $\alpha$ is the altruism coefficient ($\alpha < 1$).


EXAMPLE: FISH WAR GAME II

- Game: between the current generation and all future generations;

- Look for an SMPE in the set of linear functions, i.e., $c(s) = As$, where $0 < A < 1$;

- Assume that all future generations use $c(s) = As$:

$$s_{\tau} = f(s_{\tau-1} - a_{\tau-1}) = f(s_{\tau-1} - c(s_{\tau-1})) = (s_{\tau-1} - A s_{\tau-1})^{\xi} = (s_{\tau-1})^{\xi} (1 - A)^{\xi}$$

- Define the discounted payoff when all future generations $t+1, t+2, \ldots$ employ strategy $c$:

$$J(c)(s_{t+1}) = \sum_{\tau=1}^{\infty} \beta^{\tau-1} \ln c(s_{t+\tau}) = \sum_{\tau=1}^{\infty} \beta^{\tau-1} \ln (A s_{t+\tau})$$

$$J(c)(s_{t+1}) = \frac{\ln s_{t+1}}{1 - \xi\beta} + \frac{\ln A}{1 - \beta} + \frac{\xi\beta \ln(1 - A)}{(1 - \beta)(1 - \xi\beta)}$$
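The closed form follows from the recursion $\ln s_{t+\tau+1} = \xi \ln s_{t+\tau} + \xi \ln(1 - A)$, which makes the first term $\ln s_{t+1} / (1 - \xi\beta)$. As a sanity check, the sketch below compares this expression with a long truncated sum; the function names and the parameter values are assumptions made for the illustration.

```python
import math


def j_closed_form(s_next, A, beta, xi):
    """Closed form of J(c)(s_{t+1}) for c(s) = A s and f(y) = y^xi."""
    return (math.log(s_next) / (1 - xi * beta)
            + math.log(A) / (1 - beta)
            + xi * beta * math.log(1 - A) / ((1 - beta) * (1 - xi * beta)))


def j_truncated(s_next, A, beta, xi, horizon=2000):
    """Sum of beta^(tau-1) * ln(A s_{t+tau}) for tau = 1..horizon."""
    total, s = 0.0, s_next
    for tau in range(1, horizon + 1):
        total += beta ** (tau - 1) * math.log(A * s)
        s = ((1 - A) * s) ** xi      # s_{t+tau+1} = ((1 - A) s_{t+tau})^xi
    return total


params = dict(A=0.6, beta=0.9, xi=0.5)   # illustrative values (assumptions)
closed = j_closed_form(0.8, **params)
truncated = j_truncated(0.8, **params)
print(closed, truncated)
```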


EXAMPLE: FISH WAR GAME III

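- Payoff for the current generation if it consumes $a$, so that $s_{t+1} = (s_t - a)^{\xi}$:

$$P(s_t, c)(a) = \ln a + \alpha\beta J(c)(s_{t+1})$$

- Problem of the current generation: $\max_{a \in [0,s]} P(s, c)(a)$, where

$$P(s, c)(a) = \ln a + \frac{\alpha\beta\xi}{1 - \xi\beta} \ln(s - a) + \frac{\alpha\beta \ln A}{1 - \beta} + \frac{\alpha\xi\beta^2 \ln(1 - A)}{(1 - \beta)(1 - \xi\beta)}$$

- FOC:

$$\frac{1}{a} = \frac{\alpha\beta\xi}{(1 - \xi\beta)(s - a)}$$

- SMPE:

$$c^*(s) = \frac{1 - \xi\beta}{1 - \xi\beta + \alpha\xi\beta}\, s$$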
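A notable feature of this example is that the best reply does not depend on the coefficient $A$ used by the successors, because $A$ enters the payoff only through additive constants. The sketch below illustrates this by grid search; the parameter values and the helper names are assumptions, not part of the original example.

```python
import math

alpha, beta, xi = 0.5, 0.9, 0.5   # illustrative values (assumptions)


def continuation_value(s_next, A):
    """J(c)(s_{t+1}) for the linear strategy c(s) = A s."""
    return (math.log(s_next) / (1 - xi * beta)
            + math.log(A) / (1 - beta)
            + xi * beta * math.log(1 - A) / ((1 - beta) * (1 - xi * beta)))


def payoff(a, s, A):
    """P(s, c)(a) = ln a + alpha * beta * J(c)((s - a)^xi)."""
    return math.log(a) + alpha * beta * continuation_value((s - a) ** xi, A)


def best_reply_on_grid(s, A, n=20000):
    """Maximise the payoff over the interior grid a = s * k / n, k = 1..n-1."""
    k_best = max(range(1, n), key=lambda k: payoff(s * k / n, s, A))
    return s * k_best / n


share = (1 - xi * beta) / (1 - xi * beta + alpha * xi * beta)
checks = [(s, A, best_reply_on_grid(s, A))
          for s in (0.4, 1.0) for A in (0.3, share, 0.9)]
for s, A, a_hat in checks:
    print(s, round(A, 4), a_hat, share * s)
```

For every assumed value of $A$ the grid maximiser agrees with $\frac{1 - \xi\beta}{1 - \xi\beta + \alpha\xi\beta}\, s$ up to the grid resolution, so the linear strategy with this coefficient is a best reply to itself.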


MODEL OF A STOCHASTIC BEQUEST GAME

- $\{1, 2, \ldots\}$ - the set of short-lived generations;

- $S = [0, 1]$ - the space of endowments:

$$\underbrace{s_t}_{\text{endowment}} = \underbrace{a_t}_{\text{consumption}} + \underbrace{y_t}_{\text{investment}}$$

- the next generation's inheritance or capital:

$$s_{t+1} \sim q(\cdot | y_t)$$

- $u(a_t)$ - generation $t$'s utility;

- $v(a_{t+1})$ - the satisfaction from its children's consumption, from the point of view of generation $t$.


STOCHASTIC BEQUEST GAME: DYNAMICS

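[Diagram: timeline of the stochastic bequest game. Generation $t$ inherits $s_t$, consumes $a_t$ and invests $y_t = s_t - a_t$; the next endowment is drawn as $s_{t+1} \sim q(\cdot | y_t)$, and likewise for generations $t+1, \ldots, t+m$.]

$$P(a, c)(s) = u(a) + \beta \int_S v(c(s'))\, q(ds' | s - a)$$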


STRATEGIES

- $I$ - the set of non-decreasing lower semicontinuous functions $i : S \to S$ such that $i(s) \in [0, s]$; every $i \in I$ is continuous from the left and has at most countably many discontinuity points;

- Define:

$$F := \{c \in \Phi : c(s) = s - i(s),\ i \in I\};$$

every $c \in F$ is upper semicontinuous and continuous from the left;

- D. Bernheim and D. Ray (1987), R. Sundaram (1989).


WHY THE CLASS F? (I)

- $X$ - the vector space of real-valued functions of bounded variation on $S$ that are continuous from the left;

- $(\eta_m)$ in $X$ converges weakly to $\eta \in X$ if $\lim_{m \to \infty} \eta_m(s) = \eta(s)$ for every continuity point $s$ of $\eta$;

- Endow $I \subset X$ with the topology of weak convergence;

- Consider the dual of $C(S)$ (regular signed measures of bounded variation) and equip it with the weak-star topology;

- $i \longleftrightarrow \mu$, where $\mu \in C^*(S)$ is such that $\mu(S) \le 1$ (denote this set of measures by $M$);

- $M$ is compact (the Banach-Alaoglu theorem) $\Rightarrow$ $I$ is compact $\Rightarrow$ $F$ is compact ($c(s) = s - i(s)$).


WHY THE CLASS F? (II)

The Helly Theorem

Let $(\eta_m)$ be a sequence of functions of uniformly bounded variation on $S$ such that $|\eta_m(0)| \le$ constant for all $m$. Then there exists a subsequence converging pointwise to some function $\eta$ of bounded variation on $S$.

Note: If $\eta_m =: i_m$ is non-decreasing, then $\eta =: i$ is also non-decreasing. We may assume that $\eta_m$ and $\eta$ are continuous from the left.


ASSUMPTIONS

(A1) $u \in C(S)$ is strictly concave and increasing; $v \in C(S)$ is increasing;

(A2) $q$ is weakly continuous on $S$, i.e., $q(\cdot | y_m) \Rightarrow q(\cdot | y_0)$ if $y_m \to y_0$, that is,

$$\int_S w(s)\, q(ds | y_m) \to \int_S w(s)\, q(ds | y_0) \quad \text{for } w \in C(S);$$

moreover, $q(\{0\} | 0) = 1$;

(A3) $Z_s := \{y \in S : q(\{s\} | y) > 0\}$ is countable;

(A4) $q$ is stochastically increasing, i.e., if $z \mapsto Q(z | y)$ is the cdf of $q(\cdot | y)$, then for all $y_1 < y_2$ and $z \in S$

$$Q(z | y_1) \ge Q(z | y_2).$$


EXAMPLE - (A3)

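$Z_s := \{y \in S : q(\{s\} | y) > 0\}$ is countable

- $N \subset \mathbb{N}$,

- $\{f_n\}_{n \in N}$ - a set of continuous increasing production functions ($f_n(0) = 0$),

- $\{\alpha_n\}_{n \in N}$ - a set of positive numbers such that $\sum_{n \in N} \alpha_n = 1$,

- Define

$$q(\cdot | y) = \sum_{n \in N} \alpha_n \delta_{f_n(y)}(\cdot), \quad y \in S.$$

- $Z_s = \{y \in S : f_n(y) = s \text{ for some } n \in N\}$ is countable, since $|Z_s| \le |N|$.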


EXAMPLE - (A3)

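$$q(\cdot | y) = \sum_{n \in N} \alpha_n \delta_{f_n(y)}(\cdot), \quad y \in S$$

- (A2): for $w \in C(S)$ and $y_m \to y_0$,

$$\int_S w(s)\, q(ds | y_m) = \sum_{n \in N} \alpha_n w(f_n(y_m)) \to \sum_{n \in N} \alpha_n w(f_n(y_0)) = \int_S w(s)\, q(ds | y_0),$$

and $q(\{0\} | 0) = \sum_{n \in N} \alpha_n = 1$ since $f_n(0) = 0$;

- (A4) is satisfied, since each $f_n$ is increasing ($f_n \nearrow$);

- Other examples: convex combinations of the above transition probabilities and non-atomic transitions.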
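A small numerical illustration of this transition law may help. The sketch below assumes two production functions, $f_1(y) = y$ and $f_2(y) = \sqrt{y}$, with weights $0.3$ and $0.7$ (all of these are choices made for the example), evaluates integrals against $q(\cdot | y)$ as finite sums, and checks the stochastic monotonicity in (A4) on a grid.

```python
import math

# Illustrative choices (assumptions): two production functions and weights.
alphas = [0.3, 0.7]
production = [lambda y: y, math.sqrt]


def integral(w, y):
    """Integral of w against q(.|y) = sum_n alpha_n * delta_{f_n(y)}."""
    return sum(a * w(f(y)) for a, f in zip(alphas, production))


def cdf(z, y):
    """Q(z|y): total weight of the atoms f_n(y) lying in [0, z]."""
    return sum(a for a, f in zip(alphas, production) if f(y) <= z)


grid = [k / 20 for k in range(21)]
stochastically_increasing = all(
    cdf(z, y1) >= cdf(z, y2)
    for z in grid for y1 in grid for y2 in grid if y1 < y2
)
print(integral(math.cos, 0.25), cdf(0.0, 0.0), stochastically_increasing)
```

Here `cdf(0.0, 0.0)` returns the full mass, reflecting $q(\{0\} | 0) = 1$, and the integral varies continuously with $y$ because each $f_n$ and the test function are continuous.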


MAIN RESULT

Assume (A1)-(A4). Then there exists an SMPE $c^* \in F$, i.e.,

$$\max_{a \in [0,s]} P(a, c^*)(s) = P(c^*(s), c^*)(s) \quad \text{for every } s \in S,$$

where

$$P(a, c^*)(s) = u(a) + \beta \int_S v(c^*(s'))\, q(ds' | s - a).$$


PROOF - THE MAIN IDEA

- $c \in F$ - the strategy used by the descendant;

- $c_0(s) = s - i_0(s)$ - the best reply of the current generation to $c$;

- CLAIM: $c_0 \in F$ or, equivalently, $i_0 \in I$ (non-decreasing and continuous from the left);

- Define the mapping $L : F \to F$ by

$$Lc(s) := c_0(s).$$

- $L$ is continuous when $F$ is given the topology of weak convergence, i.e., if $c_n \to c$ weakly in $F$, $c_{0,n}(s) = Lc_n(s)$ and $c_{0,n} \to c_0$ weakly in $F$, then $c_0(s) = Lc(s)$.

- By the Schauder-Tychonoff fixed point theorem there exists $c^* \in F$ such that $c^*(s) = Lc^*(s)$ for $s \in S$ $\Rightarrow$ $c^*$ is an SMPE.
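The claim that the best-reply investment is non-decreasing can be illustrated on a grid. The sketch below is a discretised stand-in for the operator $L$, not the construction used in the proof: it assumes $u = v = \sqrt{\cdot}$, $\beta = 0.9$, a transition law that mixes Dirac measures at $y$ and $\sqrt{y}$ with equal weights, and a discontinuous successor strategy whose investment function is non-decreasing. All names and values are choices made for the example.

```python
import math

beta = 0.9                                   # illustrative discount factor
alphas = [0.5, 0.5]                          # weights of the Dirac mixture
production = [lambda y: y, math.sqrt]        # f_1(y) = y, f_2(y) = sqrt(y)
u = math.sqrt                                # strictly concave, increasing
v = math.sqrt                                # increasing


def c(s):
    """Successor's strategy; its investment s - c(s) is non-decreasing."""
    return s if s <= 0.5 else s / 2


n = 200
grid = [k / n for k in range(n + 1)]         # states and investment levels

# Continuation term beta * integral of v(c(s')) q(ds'|y) at each grid point y.
cont = [beta * sum(a * v(c(f(y))) for a, f in zip(alphas, production))
        for y in grid]


def best_investment(k):
    """First grid maximiser y_j <= s_k of u(s_k - y_j) + cont[j]."""
    j_best = max(range(k + 1), key=lambda j: u(grid[k] - grid[j]) + cont[j])
    return grid[j_best]


i0 = [best_investment(k) for k in range(n + 1)]   # investment best reply
c0 = [s - y for s, y in zip(grid, i0)]            # consumption best reply
print(i0[::40], c0[::40])
```

The monotonicity of `i0` comes from the strict concavity of $u$: the objective $u(s - y) + g(y)$ has strictly increasing differences in $(s, y)$, so a larger endowment never leads to a smaller optimal investment, whatever the shape of the continuation term $g$.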


PROOF

- Assumption (A4) $\Rightarrow$ Skorohod Representation Theorem:

$$\lim_{m \to \infty} \int_S v(c(s'))\, q(ds' | y_m) = \int_S v(c(s'))\, q(ds' | y).$$

- Assumption (A3) helps in controlling the atoms of $q$.


EXTENSION TO A MODEL WITH MORE DESCENDANTS?

- Two descendants; utilities

$$u(a) = \sqrt[4]{8}\,\sqrt{a}, \quad v_1(a) = 0.8\,u(a), \quad v_2(a) = 0.64\,u(a).$$

- Transition law is deterministic: $f(y) = \sqrt{y}$.

$y$ - investment of $t$ $\to$ $f(y)$ - endowment of $t+1$ $\to$ $c(f(y))$ - consumption of $t+1$ $\to$ $f(y) - c(f(y))$ - investment of $t+1$ $\to$ $f(f(y) - c(f(y)))$ - endowment of $t+2$ $\to$ $c(f(f(y) - c(f(y))))$ - consumption of $t+2$

$$R(y, c)(s)\ (:= P(a, c)(s)) = u(s - y) + 0.8\,u(c(f(y))) + 0.64\,u(c(f(f(y) - c(f(y))))).$$


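EXTENSION TO A MODEL WITH MORE DESCENDANTS? - CONT.

$$R(y, c)(s) = u(s - y) + 0.8\,u(c(f(y))) + 0.64\,u(c(f(f(y) - c(f(y))))).$$

- The successors employ the following strategy:

$$c(s) = \begin{cases} s, & s \in [0, 0.5], \\ s/2, & s \in (0.5, 1]. \end{cases}$$

Then,

$$R(y, c)(1) = \begin{cases} \sqrt[4]{8}\,\sqrt{1 - y} + 0.8\,\sqrt[4]{8y}, & y \in [0, 0.25], \\ \sqrt[4]{8}\,\sqrt{1 - y} + 0.8\,\sqrt[4]{2y} + 0.64\,\sqrt[8]{y}, & y \in (0.25, 1]. \end{cases}$$

Note that

$$\arg\max_{y \in A(1)} R(y, c)(1) = \emptyset.$$

Indeed, $R(\cdot, c)(1)$ jumps upward at $y = 0.25$ and is decreasing on $(0.25, 1]$: the supremum, about 2.667, is approached as $y \downarrow 0.25$ but is not attained, since $R(0.25, c)(1) \approx 2.408$.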
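The counterexample can be reproduced directly from the definitions. The sketch below evaluates $R(y, c)(1)$ by following the chain of endowments and consumption levels rather than the piecewise formula, so the two can be compared; the function names are assumptions of the illustration, and $u(a) = \sqrt[4]{8}\sqrt{a}$ is the reading of the utility that is consistent with the displayed formula for $R(y, c)(1)$.

```python
import math


def u(a):
    return 8 ** 0.25 * math.sqrt(a)      # u(a) = 8^(1/4) * sqrt(a)


def f(y):
    return math.sqrt(y)                  # deterministic transition


def c(s):
    return s if s <= 0.5 else s / 2      # successors' strategy


def R(y, s=1.0):
    """Payoff of a generation with endowment s that invests y."""
    s1 = f(y)                            # endowment of generation t + 1
    a1 = c(s1)                           # its consumption
    s2 = f(s1 - a1)                      # endowment of generation t + 2
    a2 = c(s2)                           # its consumption
    return u(s - y) + 0.8 * u(a1) + 0.64 * u(a2)


at_jump = R(0.25)
just_above = [R(0.25 + eps) for eps in (1e-2, 1e-4, 1e-6, 1e-8)]
print(at_jump, just_above)
```

The values just above $y = 0.25$ keep increasing as $y$ approaches $0.25$ from the right, while the value at $y = 0.25$ itself is markedly lower, which is why no maximiser exists.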


PLOT OF R(y, c)(1)

The result cannot be easily extended!

[Plot of $R(y, c)(1)$ against $y \in [0, 1]$; the vertical axis is marked from 1.8 to 2.6.]


POSSIBLE EXTENSION TO A MODEL WITH MORE DESCENDANTS

Assume that generation $t$'s utility equals

$$U_t(h^t) := u(a_t) + w(a_{t+1}, a_{t+2}, \ldots)$$

$h^t = (a_t, s_{t+1}, a_{t+1}, \ldots)$ - the feasible future history from period $t$ onwards;

$u, w \in C(S)$


STOCHASTIC BEQUEST GAMES: DYNAMICS

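[Diagram: timeline of the stochastic bequest game with many descendants. The $t$-th generation inherits $s_t$, consumes $a_t$ and invests $y_t = s_t - a_t$; its 1st, ..., $m$-th successors receive endowments $s_{t+1}, \ldots, s_{t+m}$ drawn from $q(\cdot | y_t), \ldots, q(\cdot | y_{t+m-1})$ and consume $a_{t+1}, \ldots, a_{t+m}$. The utility of the $t$-th generation is $u(a_t) + w(a_{t+1}, a_{t+2}, \ldots, a_{t+m}, \ldots)$.]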


A MODEL WITH INFINITELY MANY DESCENDANTS

- The expected utility of generation $t$:

$$W_t(c)(s_t) := E^c_{s_t} U_t(h^t) = u(c(s_t)) + E^c_{s_t}\bigl[w(a_{t+1}, a_{t+2}, \ldots)\bigr]$$

- Define

$$J(c)(s_{t+1}) = E^c_{s_{t+1}}\bigl[w(a_{t+1}, a_{t+2}, \ldots)\bigr]$$

- Then, the expected utility of generation $t$:

$$W_t(c)(s_t) = u(c(s_t)) + \int_S J(c)(s_{t+1})\, q(ds_{t+1} | s_t - c(s_t)).$$


A MODEL WITH INFINITELY MANY DESCENDANTS

- Assume that the successors use $c$. Then

$$P(a, c)(s) := u(a) + \int_S J(c)(s')\, q(ds' | s - a)$$

- An SMPE is a function $c^* \in F$ such that

$$\sup_{a \in [0,s]} P(a, c^*)(s) = P(c^*(s), c^*)(s)$$


ASSUMPTIONS

(A1) $u$ - strictly concave, increasing, continuous; $w$ - continuous;

(A2) $q$ is weakly continuous on $S$, i.e.,

$$q(\cdot | y_m) \Rightarrow q(\cdot | y_0) \quad \text{if } y_m \to y_0;$$

$q(\cdot | y)$ is non-atomic for each $y \in S \setminus \{0\}$ and $q(\{0\} | 0) = 1$.


SOME NOTABLE EXAMPLES

The dynamics evolve according to the equation $s_{t+1} = f(y_t, z_t)$, where $(z_t)$ are i.i.d. random shocks having a non-atomic probability distribution $\pi$:

$$q(B | y) = \int_S 1_B(f(y, z))\, \pi(dz);$$

(C1) $f(y_t, z_t) = z_t + f_0(y_t)$, where $f_0 : S \to S$ is continuous and increasing; $\mathrm{supp}(\pi) \subset [0, 1 - f_0(1)]$;

(C2) $f(y_t, z_t) = z_t f_0(y_t)$; $\mathrm{supp}(\pi) \subset [0, 1/f_0(1)]$;

(C3) $q(B | y) = \sum_{j=1}^{l} g_j(y) \mu_j(B)$, where the $g_j \ge 0$ are continuous and the $\mu_j$ are non-atomic.
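Transition laws of this kind are easy to sample from. The sketch below illustrates the multiplicative case (C2) under assumed choices: $f_0(y) = \sqrt{y}$ and shocks $z$ uniform on $[0, 1)$, so that the support condition holds with $f_0(1) = 1$. The function names and the Monte Carlo sample size are also assumptions of the example.

```python
import math
import random


def f0(y):
    return math.sqrt(y)              # continuous, increasing, f0(1) = 1


def sample_next_state(y, rng):
    """Draw s' = z * f0(y) with z uniform on [0, 1), as in case (C2)."""
    return rng.random() * f0(y)


def expected_value(w, y, rng, draws=10000):
    """Monte Carlo estimate of the integral of w against q(.|y)."""
    return sum(w(sample_next_state(y, rng)) for _ in range(draws)) / draws


rng = random.Random(0)               # fixed seed for reproducibility
samples = [sample_next_state(0.49, rng) for _ in range(10000)]
mean_next_state = expected_value(lambda s: s, 0.49, rng)
print(min(samples), max(samples), mean_next_state)
```

With an investment of $y = 0.49$ the next endowment is spread over $[0, 0.7)$ with mean close to $0.35$, and with $y = 0$ every draw equals $0$, in line with $q(\{0\} | 0) = 1$.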


MAIN RESULT I

Assume (A1)-(A2).

Then there exists an SMPE $c^* \in F$.

Why does it work: an investment strategy $i \in I$ (hence $c \in F$) has at most countably many discontinuity points; since $q$ is non-atomic, these points carry no probability mass and therefore do not spoil the continuity of the operator $L$ (the proof is based on the Schauder-Tychonoff theorem).


TRANSITION PROBABILITIES WITH ATOMS I

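- Consider the model as in the fish war game:

$$U_t(h^t) = u(a_t) + \alpha\beta \left(\sum_{\tau=1}^{\infty} \beta^{\tau-1} u(a_{t+\tau})\right)$$

- The expected utility for generation $t$:

$$W(c)(s_t) = E^c_{s_t} U_t(h^t) = u(c(s_t)) + \alpha\beta\, E^c_{s_t}\left(\sum_{\tau=1}^{\infty} \beta^{\tau-1} u(a_{t+\tau})\right)$$

- Put

$$J(c)(s_{t+1}) = E^c_{s_{t+1}}\left(\sum_{\tau=1}^{\infty} \beta^{\tau-1} u(a_{t+\tau})\right)$$

- Define

$$P(a, c)(s) = u(a) + \alpha\beta \int_S J(c)(s')\, q(ds' | s - a).$$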


TRANSITION PROBABILITIES WITH ATOMS II

- $q(\cdot | s - a) = q(\cdot | y) = \sum_{j=1}^{l} g_j(y) \mu_j(\cdot)$, where

  - the $g_j$ are continuous and $\sum_{j=1}^{l} g_j(y) = 1$;

  - $\exists\, \nu_j$ such that $\mu_j$ is absolutely continuous with respect to $\nu_j$ for each $j$;

- $g_2, \ldots, g_l$ are non-decreasing and concave;

- $\mu_1$ is (first-order) stochastically dominated by $\mu_j$ for all $j \ge 2$.


MAIN RESULT II

There exists an SMPE $c^*$ in the class of Lipschitz functions with constant one.


LITERATURE

1. A. Alj, A. Haurie (IEEE Trans Autom Control (1983)) - denumerable state space of endowments

2. C. Harris, D. Laibson (Econometrica (2001)) - all descendants

3. A. Amir (Econ Theory (1996)), Ł. Balbus, K. Reffett, Ł. Woźny (J Math Econ (2012)) - narrow class of transitions; an SMPE is shown to exist in the class of Lipschitz continuous functions

4. Ł. Balbus, A. Jaśkiewicz, A.S. Nowak (Games Econ Behav (2015), J Optim Theory Appl (2015))

5. Purely deterministic cases:
   - W. Leininger (Rev Econ Studies (1986)) - "levelling" technique;
   - D. Bernheim, D. Ray (Report (1983)) - "filling" technique;
   - Ł. Balbus, A. Jaśkiewicz, A.S. Nowak (J Math Analysis Appl (2015)) - unbounded utilities.


CREDITS

1. J. Filar, K. Vrieze, Competitive Markov Decision Processes, Springer, 1996

2. A. Jaśkiewicz, A.S. Nowak, Zero-sum stochastic games, Handbook of Dynamic Games, Vol I, Birkhäuser, 2017

3. A. Jaśkiewicz, A.S. Nowak, Non-zero-sum stochastic games, Handbook of Dynamic Games, Vol I, Birkhäuser, 2017

4. A. Neyman, S. Sorin (Eds), Stochastic games, NATO Series, 2003