csr2011 june14 16_30_ibsen-jensen

The complexity of solving reachability games using value and strategy iteration. Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen. Aarhus University, Denmark. CSR 2011, 14th June.

Post on 21-Oct-2014


DESCRIPTION

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen and Peter Bro Miltersen. The complexity of solving reachability games using value and strategy iteration

TRANSCRIPT

Page 1: Csr2011 june14 16_30_ibsen-jensen

The complexity of solving reachability games using value and strategy iteration

Kristoffer Arnsfelt Hansen, Rasmus Ibsen-Jensen, Peter Bro Miltersen

Aarhus University, Denmark
CSR 2011, 14th June

Overview

What are concurrent reachability games?

Two standard algorithms for solving concurrent reachability games:
  The value iteration algorithm
  The strategy iteration algorithm

Examples of the facts that matter for the proof of the time lower bound for both algorithms

Matrix games (von Neumann 1928)

 0  -1   1
 1   0  -1
-1   1   0
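As a small illustration of what the value of a matrix game means, the sketch below checks the matrix on this slide. It assumes the entries are the row player's payoffs; the helper names `row_guarantee` and `col_guarantee` are made up for this example. A mixed row strategy guarantees at least its worst column payoff, and a mixed column strategy concedes at most its worst row payoff. When the two bounds meet, that common number is the value.

```python
from fractions import Fraction

# Payoff matrix from the slide (assumed to be the row player's payoff).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def row_guarantee(x, A):
    """Smallest expected payoff of mixed row strategy x over all columns."""
    return min(sum(x[i] * A[i][j] for i in range(len(A)))
               for j in range(len(A[0])))

def col_guarantee(y, A):
    """Largest expected payoff conceded by mixed column strategy y over all rows."""
    return max(sum(A[i][j] * y[j] for j in range(len(A[0])))
               for i in range(len(A)))

uniform = [Fraction(1, 3)] * 3
lower = row_guarantee(uniform, A)   # what the row player can secure
upper = col_guarantee(uniform, A)   # what the column player can hold the row player to
print(lower, upper)                 # both bounds are 0, so the value of the game is 0
```

Exact fractions are used so that the two bounds can be compared without rounding.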

Concurrent reachability games (Everett 1957; de Alfaro, Henzinger, Kupferman 1998)

Dante* vs. Lucifer*

Each entry can be either 0, 1 or a pointer.

[Figure: an example game built up step by step. It consists of four 3×3 matrices linked in a chain; the first one is the start position, labelled S. In every matrix the entries below the diagonal are 0 and the entries above the diagonal are pointers back to S. The diagonal entries of the last matrix are 1; the diagonal entries of the other matrices are pointers, drawn as arrows, to the next matrix in the chain.]

* Naming convention from Hansen, Koucky and Miltersen, 2009

Histories and strategies

History: Sequence of positions and choices for each player in each position.

Strategy: Map from histories to probability distributions over the choices in the position we arrive at after the history.

S1: Set of strategies for Dante

S2: Set of strategies for Lucifer

H1/H2: Sets of stationary strategies for Dante/Lucifer (strategies that depend only on the position we arrive at after the history)

Payoffs

v(i,σ,π): The probability of eventually reaching a 1 from position i, if Dante plays strategy σ and Lucifer plays strategy π.

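Everett 1957

Value of i:

$$\forall i:\quad v_i := \sup_{\sigma \in S_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in S_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi)$$

$$\forall i:\quad v_i = \sup_{\sigma \in H_1} \inf_{\pi \in S_2} v(i,\sigma,\pi) = \inf_{\pi \in H_2} \sup_{\sigma \in S_1} v(i,\sigma,\pi)$$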

Algorithmic problems

Quantitatively solving a game: Given the game, compute the value of all positions.

Strategically solving a game: Given the game and ε > 0, compute a strategy σ for Dante such that for all π and i: v(i,σ,π) > v_i − ε.

Value iteration (Shapley 1953)

G_t: A modified version of G, where Dante loses after t moves.

Value iteration computes the value of each position in G_t in iteration t, on the basis of the value of each position in G_{t-1}.
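The update from G_{t-1} to G_t can be sketched for the example game of these slides (four 3×3 matrices; 0 below the diagonal, pointers to S above it, and on the diagonal a pointer to the next matrix, or a 1 in the last matrix). This is a minimal sketch, not the authors' implementation. To stay dependency-free it replaces a general matrix-game solver by a closed form that holds only for this triangular shape and only when 0 ≤ s ≤ d; the names `triangular_value` and `value_iteration` are hypothetical.

```python
N, M = 4, 3  # four matrices with three rows and columns, as in the example figure

def triangular_value(d, s):
    """Value of the M x M matrix game with d on the diagonal, s above it and
    0 below it. Closed form, valid only for 0 <= s <= d (an assumption that
    holds throughout this example)."""
    if d == 0:
        return 0.0                     # Lucifer's first column contains only zeros
    assert 0 <= s <= d
    r = s / d
    return d / sum((1 - r) ** i for i in range(M))

def value_iteration(steps):
    """Return the position values of G_0, G_1, ..., G_steps (start position first)."""
    v = [0.0] * N                      # G_0: Dante has lost before making a move
    history = [v]
    for _ in range(steps):
        # Pointers are replaced by the values of G_{t-1}: the diagonal of
        # matrix k points to matrix k+1 (a 1 in the last matrix), and the
        # entries above the diagonal point to the start position.
        v = [triangular_value(v[k + 1] if k + 1 < N else 1.0, v[0])
             for k in range(N)]
        history.append(v)
    return history

history = value_iteration(7)
for t, values in enumerate(history):
    print(t, " ".join(f"{x:.5f}" for x in values))
```

Each printed row lists the values from the start position S to the last matrix. The values creep upward slowly even though the value of the game itself is 1.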

Our results: Lower bound for value iteration

There exists a concurrent reachability game G, with N matrices and m rows and columns in each matrix, such that val(G) = 1 and

$$\mathrm{val}(G_t) \le 3m^{-N/2} \quad \text{for } t = 2^{m^{N/2}}.$$

Our results: Upper bound for value iteration

For any concurrent reachability game G,

$$\mathrm{val}(G) - \mathrm{val}(G_t) < \varepsilon \quad \text{for } t = (1/\varepsilon)^{m^{O(N)}}.$$

Value iteration example

[Figure: the example game (four 3×3 matrices, start position S), annotated slide by slide with the value of each position in G_0, G_1, ..., G_9. In each step the pointers are replaced by the values from the previous step and the value of each matrix is recomputed. The values shown on the slides are tabulated below, with positions in chain order from S to the last matrix.]

t    S         2nd       3rd       last
0    0         0         0         0
1    0         0         0         0.33333
2    0         0         0.11111   0.33333
3    0         0.03704   0.11111   0.33333
4    0.01235   0.03704   0.11111   0.33333
5    0.01754   0.04147   0.11533   0.33748
6    0.02172   0.04493   0.11855   0.33925
7    0.02519   0.04772   0.12064   0.34068
8    0.02815   0.04991   0.12388   0.34187
9    0.03070   0.05129   0.12517   0.34378

Strategy iteration (Chatterjee, de Alfaro, Henzinger ’06)

Was conjectured to be fast

Our results: Upper bound for strategy iteration

An ε-optimal strategy is computed after $t = (1/\varepsilon)^{m^{O(N)}}$ iterations of strategy iteration.

This follows from the corresponding result for value iteration.

Our results: Lower bound for strategy iteration

There exists a concurrent reachability game G, with N matrices, for large N, and m rows and columns in each matrix, such that val(G) = 1 and the strategy obtained by strategy iteration guarantees winning probability at most $4m^{-N/2}$ for $t = 2^{m^{N/4}}$.

Strategy iteration, m=2

N    Number of iterations needed to get over 1/2
7    18446744073709551617
8    340282366920938463463374607431768211457
9    115792089237316195423570985008687907853269984665640564039457584007913129639937
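The three iteration counts in the table have a recognisable shape. The check below is plain arithmetic on the numbers as printed, an observation about the table rather than a claim made on the slides: each count equals 2^(2^(N-1)) + 1.

```python
# Iteration counts copied from the table above (m = 2), keyed by N.
table = {
    7: 18446744073709551617,
    8: 340282366920938463463374607431768211457,
    9: 115792089237316195423570985008687907853269984665640564039457584007913129639937,
}

# Compare each count with 2^(2^(N-1)) + 1 using Python's exact integers.
matches = {n: count == 2 ** (2 ** (n - 1)) + 1 for n, count in table.items()}
print(matches)
```

Every added matrix squares the count (up to the +1), which is the doubly exponential growth that the lower bound describes.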

Page 43: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Before iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S:

S S

0 S

0 0

1. Start strategy for Dante:= Uniform

25/42

Page 44: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Before iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

SS S

0 S

0 0

1. Start strategy for Dante:= Uniform

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

25/42

Page 45: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

S

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

Page 46: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

S

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

Page 47: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

1

1

1

0

0 0

0

0 0

0

0 0

0

0 0

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

S

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

Page 48: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

0 S

S S

0 S

0 0

S S

0 S

0 0

1

0 0

S S

S S

0 S

0 0

0.66667

The numbers on the edges are the probability that the edge is used.Edges without a number have probability 0.33333 to be used.

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

S

0

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

Page 49: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

0

1

0.66667

The numbers on the edges are the probability that the edge is used.Edges without a number have probability 0.33333 to be used.

0.66667

0.66667

0.66667

0.66667

0.66667

0.66667

0.66667

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

26/42

Page 50: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

S

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

Page 51: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11111

0.03704

0.01235

0.33333

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

Page 52: Csr2011 june14 16_30_ibsen-jensen

0.11111

0.03704

0.01235

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

Strategy iteration: Iteration 1

1 S S

0 1 S

0 0 1

0.01235

0

0 0

S

1

1

1

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.012350.012350.01235

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.33748

26/42

Page 53: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

S

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11111

0.03704

0.01235

0.33333

0.33748

0.33332

0.32920

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

0.33333

26/42

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

Page 54: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 1

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11111

0.03704

0.01235

0.33333

0.33748

0.33332

0.32920

0.34599

0.33317

0.32084

0.37327

0.33180

0.29493

0.47368

0.31579

0.21053

26/42

Page 55: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 2

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11111

0.03704

0.01235

0.33333

0.33748

0.33332

0.32920

0.34599

0.33317

0.32084

0.37327

0.33180

0.29493

0.47368

0.31579

0.21053

27/42

Page 56: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 2

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11111

0.03704

0.01235

0.33333

0.33748

0.33332

0.32920

0.34599

0.33317

0.32084

0.37327

0.33180

0.29493

0.47368

0.31579

0.21053

27/42

Page 57: Csr2011 june14 16_30_ibsen-jensen

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11111

0.03704

0.01235

0.33333

0.33748

0.33332

0.32920

0.34599

0.33317

0.32084

0.37327

0.33180

0.29493

0.47368

0.31579

0.21053

27/42

Strategy iteration: Iteration 2

Page 58: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 2

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11677

0.04359

0.02065

0.33748

0.33748

0.33332

0.32920

0.34599

0.33317

0.32084

0.37327

0.33180

0.29493

0.47368

0.31579

0.21053

27/42

Page 59: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 2

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11677

0.04359

0.02065

0.33748

0.34031

0.33329

0.32640

0.35458

0.33289

0.31253

0.39987

0.33180

0.32917

0.55453

0.29186

0.15361

27/42

Page 60: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11677

0.04359

0.02065

0.33748

0.34031

0.33329

0.32640

0.35458

0.33289

0.31253

0.39987

0.33180

0.32917

0.55453

0.29186

0.15361

28/42

Page 61: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11677

0.04359

0.02065

0.33748

0.34031

0.33329

0.32640

0.35458

0.33289

0.31253

0.39987

0.33180

0.32917

0.55453

0.29186

0.15361

28/42

Page 62: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.11677

0.04359

0.02065

0.33748

0.34031

0.33329

0.32640

0.35458

0.33289

0.31253

0.39987

0.33180

0.32917

0.55453

0.29186

0.15361

28/42

Page 63: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34031

0.33329

0.32640

0.35458

0.33289

0.31253

0.39987

0.33180

0.32917

0.55453

0.29186

0.15361

28/42

Page 64: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34031

0.33329

0.32640

0.35458

0.33289

0.31253

0.39987

0.33180

0.32917

0.55453

0.29186

0.15361

28/42

Page 65: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 3

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

28/42

Page 66: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

29/42

Page 67: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

29/42

Page 68: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

29/42

Page 69: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12067

0.04825

0.02676

0.34031

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

29/42

Page 70: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

0.12360

0.05185

0.03154

0.34241

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

29/42

Page 71: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12360

0.05185

0.03154

0.34241

0.34241

0.33325

0.32434

0.36097

0.33259

0.30644

0.41947

0.32646

0.25407

0.60831

0.27098

0.12071

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

29/42

Page 72: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 4

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12360

0.05185

0.03154

0.34241

0.34407

0.33322

0.32271

0.36601

0.33230

0.30169

0.43486

0.32390

0.24125

0.64720

0.25350

0.09930

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

29/42

Page 73: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 5

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12360

0.05185

0.03154

0.34241

0.34407

0.33322

0.32271

0.36601

0.33230

0.30169

0.43486

0.32390

0.24125

0.64720

0.25350

0.09930

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

30/42

Page 74: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 5

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12360

0.05185

0.03154

0.34241

0.34407

0.33322

0.32271

0.36601

0.33230

0.30169

0.43486

0.32390

0.24125

0.64720

0.25350

0.09930

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

30/42

Page 75: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 5

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12360

0.05185

0.03154

0.34241

0.34407

0.33322

0.32271

0.36601

0.33230

0.30169

0.43486

0.32390

0.24125

0.64720

0.25350

0.09930

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

30/42

Page 76: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 5

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12593

0.05476

0.03544

0.34407

0.34407

0.33322

0.32271

0.36601

0.33230

0.30169

0.43486

0.32390

0.24125

0.64720

0.25350

0.09930

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

30/42

Page 77: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 5

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12593

0.05476

0.03544

0.34407

0.34543

0.33319

0.32138

0.37015

0.33202

0.29783

0.44745

0.32152

0.23103

0.67692

0.23882

0.08426

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

30/42

Page 78: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12593

0.05476

0.03544

0.34407

0.34543

0.33319

0.32138

0.37015

0.33202

0.29783

0.44745

0.32152

0.23103

0.67692

0.23882

0.08426

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 79: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12593

0.05476

0.03544

0.34407

0.34543

0.33319

0.32138

0.37015

0.33202

0.29783

0.44745

0.32152

0.23103

0.67692

0.23882

0.08426

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 80: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12593

0.05476

0.03544

0.34407

0.34543

0.33319

0.32138

0.37015

0.33202

0.29783

0.44745

0.32152

0.23103

0.67692

0.23882

0.08426

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 81: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12786

0.05721

0.03873

0.34543

0.34543

0.33319

0.32138

0.37015

0.33202

0.29783

0.44745

0.32152

0.23103

0.67692

0.23882

0.08426

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 82: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12786

0.05721

0.03873

0.34543

0.34543

0.33319

0.32138

0.37015

0.33202

0.29783

0.44745

0.32152

0.23103

0.67692

0.23882

0.08426

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 83: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12786

0.05721

0.03873

0.34543

0.34658

0.33316

0.32026

0.37366

0.33177

0.29457

0.45807

0.31933

0.22260

0.70055

0.22633

0.07312

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 84: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 6

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12786

0.05721

0.03873

0.34543

0.34658

0.33316

0.32026

0.37366

0.33177

0.29457

0.45807

0.31933

0.22260

0.70055

0.22633

0.07312

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

31/42

Page 85: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 7

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12786

0.05721

0.03873

0.34543

0.34658

0.33316

0.32026

0.37366

0.33177

0.29457

0.45807

0.31933

0.22260

0.70055

0.22633

0.07312

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

32/42

Page 86: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 7

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12786

0.05721

0.03873

0.34543

0.34658

0.33316

0.32026

0.37366

0.33177

0.29457

0.45807

0.31933

0.22260

0.70055

0.22633

0.07312

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

32/42

Page 87: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 7

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12950

0.05932

0.04156

0.34658

0.34658

0.33316

0.32026

0.37366

0.33177

0.29457

0.45807

0.31933

0.22260

0.70055

0.22633

0.07312

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

32/42

Page 88: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 7

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12950

0.05932

0.04156

0.34658

0.34658

0.33316

0.32026

0.37366

0.33177

0.29457

0.45807

0.31933

0.22260

0.70055

0.22633

0.07312

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

32/42

Page 89: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 7

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12950

0.05932

0.04156

0.34658

0.34758

0.33313

0.31929

0.37670

0.33153

0.29177

0.46723

0.31730

0.21547

0.71988

0.21557

0.06455

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

32/42

Page 90: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 8

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12950

0.05932

0.04156

0.34658

0.34758

0.33313

0.31929

0.37670

0.33153

0.29177

0.46723

0.31730

0.21547

0.71988

0.21557

0.06455

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

33/42

Page 91: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 8

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12950

0.05932

0.04156

0.34658

0.34758

0.33313

0.31929

0.37670

0.33153

0.29177

0.46723

0.31730

0.21547

0.71988

0.21557

0.06455

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

33/42

Page 92: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 8

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.12950

0.05932

0.04156

0.34658

0.34758

0.33313

0.31929

0.37670

0.33153

0.29177

0.46723

0.31730

0.21547

0.71988

0.21557

0.06455

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

33/42

Page 93: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 8

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13093

0.06118

0.04404

0.34758

0.34758

0.33313

0.31929

0.37670

0.33153

0.29177

0.46723

0.31730

0.21547

0.71988

0.21557

0.06455

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

33/42

Page 94: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 8

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13093

0.06118

0.04404

0.34758

0.34845

0.33311

0.31844

0.37937

0.33130

0.28933

0.47527

0.31541

0.20932

0.73606

0.20618

0.05776

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

33/42

Page 95: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13093

0.06118

0.04404

0.34758

0.34845

0.33311

0.31844

0.37937

0.33130

0.28933

0.47527

0.31541

0.20932

0.73606

0.20618

0.05776

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

34/42

Page 96: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13093

0.06118

0.04404

0.34758

0.34845

0.33311

0.31844

0.37937

0.33130

0.28933

0.47527

0.31541

0.20932

0.73606

0.20618

0.05776

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

34/42

Page 97: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13093

0.06118

0.04404

0.34758

0.34845

0.33311

0.31844

0.37937

0.33130

0.28933

0.47527

0.31541

0.20932

0.73606

0.20618

0.05776

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

34/42

Page 98: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13219

0.06283

0.04624

0.34845

0.34845

0.33311

0.31844

0.37937

0.33130

0.28933

0.47527

0.31541

0.20932

0.73606

0.20618

0.05776

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

34/42

Page 99: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

S

1. Best response for Lucifer2. Calculate values from those strategies3. Update strategy for Dante

0.13219

0.06283

0.04624

0.34845

0.34845

0.33311

0.31844

0.37937

0.33130

0.28933

0.47527

0.31541

0.20932

0.73606

0.20618

0.05776

1 S S

0 1 S

0 0 1

S S

0 S

0 0

S S

0 S

0 0

S S

0 S

0 0

34/42

Page 100: Csr2011 june14 16_30_ibsen-jensen

Strategy iteration: Iteration 9

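1. Best response for Lucifer
2. Calculate values from those strategies
3. Update strategy for Dante

[Figure: four 3×3 game matrices and the start marker S. One matrix reads 1 S S / 0 1 S / 0 0 1; for the other three only the entries S S / 0 S / 0 0 survive extraction.]

Values: 0.13219, 0.06283, 0.04624, 0.34845

Dante's strategy, one distribution per matrix:
0.34923 0.33309 0.31768
0.38176 0.33109 0.28715
0.48241 0.31366 0.20393
0.74985 0.19791 0.05224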

34/42
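Step 3 on the slide above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The update is taken to be an optimal strategy of the one-shot matrix game obtained by replacing each pointer with its current value: `a` on the diagonal (value of the next position, or 1 for the final position), `s` above the diagonal (value of the start) and 0 below. For such a triangular matrix with 0 <= s < a, both players have strategies that equalize the opponent's options, so the equalizing strategy is optimal; Dante's has weights proportional to (1 - s/a)^j. The names `equalizing_strategy`, `improve` and `SLIDE_VALUES` are hypothetical.

```python
# Values from the slide, reordered here from start to final position
# (the reordering is an assumption about the slide layout).
SLIDE_VALUES = [0.04624, 0.06283, 0.13219, 0.34845]


def equalizing_strategy(next_value, start_value, m):
    """Dante's strategy that makes all of Lucifer's columns pay the same.

    Matrix: next_value on the diagonal, start_value above it, 0 below it.
    Requires 0 <= start_value < next_value.
    """
    ratio = start_value / next_value
    weights = [(1.0 - ratio) ** j for j in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


def improve(values, m):
    """New strategy for every position, from start to final."""
    n = len(values)
    strategies = []
    for i in range(n):
        nxt = 1.0 if i == n - 1 else values[i + 1]
        strategies.append(equalizing_strategy(nxt, values[0], m))
    return strategies


if __name__ == "__main__":
    for strategy in improve(SLIDE_VALUES, m=3):
        print([round(p, 5) for p in strategy])
```

Because the inputs are the rounded five-digit values from the slide, the output matches the slide's updated strategies only to about four decimals; the largest deviation occurs at the start position.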

Page 101: Csr2011 june14 16_30_ibsen-jensen

Generalized Purgatory P(N,m)

Lucifer repeatedly hides a number between 1 and m.
Dante must try to guess the number.
If he guesses correctly N times in a row, he goes to heaven.
If he ever guesses incorrectly, overshooting Lucifer’s number, he goes to hell.

35/42
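The rules above can be written down as game matrices, one per position. The sketch below is illustrative only. The slide states what happens on a correct guess and on an overshoot; the rule that an undershoot sends Dante back to the start (written S) is an assumption taken from the S entries in the matrices of the strategy iteration slides. The function names and the `->k` notation for pointers are hypothetical.

```python
def outcome(position, guess, hidden, n_rounds):
    """Entry of the matrix at `position` (1-based) in P(n_rounds, m)."""
    if guess > hidden:
        return "0"  # overshoot: Dante goes to hell
    if guess < hidden:
        return "S"  # assumption: undershoot sends Dante back to the start
    if position == n_rounds:
        return "1"  # last correct guess in a row: heaven
    return f"->{position + 1}"  # pointer to the next position


def game_matrix(position, n_rounds, m):
    """Rows are Dante's guesses 1..m, columns are Lucifer's hidden numbers 1..m."""
    return [
        [outcome(position, guess, hidden, n_rounds) for hidden in range(1, m + 1)]
        for guess in range(1, m + 1)
    ]


if __name__ == "__main__":
    for row in game_matrix(position=4, n_rounds=4, m=3):
        print(" ".join(row))
```

For the final position of P(4, 3) this yields the rows 1 S S, 0 1 S and 0 0 1, which is the matrix that survives in the strategy iteration slides.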

Page 102: Csr2011 june14 16_30_ibsen-jensen

Interesting fact

If Dante plays well enough, the probability that he goes to heaven from Purgatory is arbitrarily close to 1.

36/42

Page 103: Csr2011 june14 16_30_ibsen-jensen

Exemplifying important facts

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]

37/42

Page 104: Csr2011 june14 16_30_ibsen-jensen

Exemplifying important facts

t := 0

Value iteration on 1 matrix
Strategy iteration on 1 matrix
Strategy iteration on 3 matrices

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]

37/42

Page 105: Csr2011 june14 16_30_ibsen-jensen

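Exemplifying important facts

t := 0

Value iteration on 1 matrix: value 0
Strategy iteration on 1 matrix: strategy (0.5, 0.5)
Strategy iteration on 3 matrices: strategies (0.5, 0.5), (0.5, 0.5), (0.5, 0.5)

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]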

37/42

Page 106: Csr2011 june14 16_30_ibsen-jensen

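Exemplifying important facts

t := 1

Value iteration on 1 matrix: value 0
Strategy iteration on 1 matrix: strategy (0.5, 0.5)
Strategy iteration on 3 matrices: strategies (0.5, 0.5), (0.5, 0.5), (0.5, 0.5)

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]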

38/42

Page 108: Csr2011 june14 16_30_ibsen-jensen

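Exemplifying important facts

t := 1

Value iteration on 1 matrix: value 0.5
Strategy iteration on 1 matrix: strategy (0.5, 0.5); value 0.5
Strategy iteration on 3 matrices: strategies (0.5, 0.5), (0.5, 0.5), (0.5, 0.5); values 0.5, 0.25, 0.125

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]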

38/42
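The value iteration panel can be reproduced with a few lines. This is a sketch under an assumption: the single matrix is taken to be P(1, 2), that is, rows (1, v) and (0, 1), where v stands for the pointer back to the matrix itself and is replaced by the current value estimate. If Dante plays the first row with probability p, Lucifer's two columns pay p and p·v + (1 − p); equalizing them gives p = 1/(2 − v), which is also the value of the matrix. The function names are hypothetical.

```python
def matrix_value(v):
    """Value of the matrix game with rows (1, v) and (0, 1), for 0 <= v <= 1."""
    return 1.0 / (2.0 - v)


def value_iteration(steps):
    """Value estimates for t = 0, 1, ..., steps, starting from 0."""
    v = 0.0
    history = [v]
    for _ in range(steps):
        v = matrix_value(v)  # plug the old estimate in for the pointer
        history.append(v)
    return history


if __name__ == "__main__":
    for t, v in enumerate(value_iteration(4)):
        print(t, round(v, 5))
```

Under this assumption the estimate after t steps is t/(t + 1), which gives 0.5, 0.66667 and 0.75 for t = 1, 2, 3, the numbers shown next to t on the slides. The estimates approach 1, but only slowly.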

Page 109: Csr2011 june14 16_30_ibsen-jensen

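Exemplifying important facts

t := 1

Value iteration on 1 matrix: value 0.5
Strategy iteration on 1 matrix: strategy (0.66667, 0.33333); value 0.5
Strategy iteration on 3 matrices: strategies (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667); values 0.5, 0.25, 0.125

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]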

38/42

Page 110: Csr2011 june14 16_30_ibsen-jensen

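Exemplifying important facts

t := 2

Value iteration on 1 matrix: value 0.5
Strategy iteration on 1 matrix: strategy (0.66667, 0.33333); value 0.5
Strategy iteration on 3 matrices: strategies (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667); values 0.5, 0.25, 0.125

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]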

39/42

Page 112: Csr2011 june14 16_30_ibsen-jensen

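Exemplifying important facts

t := 2

Value iteration on 1 matrix: value 0.66667
Strategy iteration on 1 matrix: strategy (0.66667, 0.33333); value 0.66667
Strategy iteration on 3 matrices: strategies (0.66667, 0.33333), (0.57143, 0.42857), (0.53333, 0.46667); values 0.53333, 0.30476, 0.20317

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]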

39/42

Page 113: Csr2011 june14 16_30_ibsen-jensen

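Exemplifying important facts

t := 2

Value iteration on 1 matrix: value 0.66667
Strategy iteration on 1 matrix: strategy (0.75000, 0.25000); value 0.66667
Strategy iteration on 3 matrices: strategies (0.75000, 0.25000), (0.61765, 0.38235), (0.55654, 0.44346); values 0.53333, 0.30476, 0.20317

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]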

39/42

Page 114: Csr2011 june14 16_30_ibsen-jensen

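Exemplifying important facts

t := 3

Value iteration on 1 matrix: value 0.66667
Strategy iteration on 1 matrix: strategy (0.75000, 0.25000); value 0.66667
Strategy iteration on 3 matrices: strategies (0.75000, 0.25000), (0.61765, 0.38235), (0.55654, 0.44346); values 0.53333, 0.30476, 0.20317

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]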

40/42

Page 116: Csr2011 june14 16_30_ibsen-jensen

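Exemplifying important facts

t := 3

Value iteration on 1 matrix: value 0.75000
Strategy iteration on 1 matrix: strategy (0.75000, 0.25000); value 0.75000
Strategy iteration on 3 matrices: strategies (0.75000, 0.25000), (0.61765, 0.38235), (0.55654, 0.44346); values 0.55654, 0.34374, 0.25781

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]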

40/42

Page 117: Csr2011 june14 16_30_ibsen-jensen

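Exemplifying important facts

t := 3

Value iteration on 1 matrix: value 0.75000
Strategy iteration on 1 matrix: strategy (0.80000, 0.20000); value 0.75000
Strategy iteration on 3 matrices: strategies (0.80000, 0.20000), (0.65072, 0.34928), (0.57399, 0.42601); values 0.55654, 0.34374, 0.25781

[Figure: the 2×2 game matrices of the three panels; only their 0 and 1 entries survive extraction.]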

41/42
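The strategy iteration panels can be reproduced with the sketch below. It is not the authors' implementation and rests on assumptions: the panels are taken to be P(1, 2) and P(3, 2), an undershoot sends Dante back to the start, every position starts from the uniform strategy, and the update in each position is the equalizing strategy p = a / (2a − s), where a is the value of the next position (1 after the final one) and s is the value of the start. All function names are hypothetical. Lists are ordered from start to final position.

```python
def evaluate(p, sweeps=2000):
    """Values against Lucifer's best response.

    p[i] is the probability that Dante guesses 1 in position i.
    """
    n = len(p)
    v = [0.0] * n
    for _ in range(sweeps):
        updated = []
        for i in range(n):
            nxt = 1.0 if i == n - 1 else v[i + 1]
            hide_1 = p[i] * nxt                         # guessing 2 overshoots: hell
            hide_2 = p[i] * v[0] + (1.0 - p[i]) * nxt   # guessing 1 undershoots: start
            updated.append(min(hide_1, hide_2))
        v = updated
    return v


def improve(v):
    """Equalizing strategy in every position for the current values."""
    n = len(v)
    new_p = []
    for i in range(n):
        a = 1.0 if i == n - 1 else v[i + 1]
        new_p.append(a / (2.0 * a - v[0]))
    return new_p


def strategy_iteration(n, rounds):
    """Return a list of (values, updated strategy) pairs, one per round."""
    p = [0.5] * n
    trace = []
    for _ in range(rounds):
        v = evaluate(p)
        p = improve(v)
        trace.append((v, p))
    return trace


if __name__ == "__main__":
    for n in (1, 3):
        for t, (v, p) in enumerate(strategy_iteration(n, 3), start=1):
            print(n, t, [round(x, 5) for x in v], [round(x, 5) for x in p])
```

With these assumptions the three-matrix run gives start-to-final values 0.125, 0.25, 0.5, then 0.20317, 0.30476, 0.53333, then 0.25781, 0.34374, 0.55654, matching the slides. In each round the updated strategy in the start position of the three-matrix game coincides with the updated strategy in the one-matrix game (0.66667, 0.75, 0.8), as the slides also show.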

Page 118: Csr2011 june14 16_30_ibsen-jensen

The end

Open problem: find a fast algorithm for solving concurrent reachability games.

A PSPACE algorithm for the problem exists, but it is not fast.

Thanks for listening

42/42