TRANSCRIPT
A complexity analysis
Solving Markov Decision Processes using Policy Iteration
Romain Hollanders, UCLouvain
Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers
Seminar at Loria – Inria, Nancy, February 2015
Policy Iteration to solve Markov Decision Processes
Two powerful tools for the analysis
Acyclic Unique Sink Orientations Order-Regular matrices
How much will we pay ?
(from a given starting state)
Total-cost criterion
Average-cost criterion
Discounted-cost criterion
(each formula involves the horizon and the cost vector; the discounted case also involves a discount factor)
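The formulas themselves appear only as images on the slides. For reference, the three objective functions in common notation (here $c$ is the cost vector, $T$ the horizon, $\gamma$ the discount factor, and the expectation is over trajectories from the starting state under a fixed policy):

```latex
\begin{align*}
\text{Total cost:} \quad & \lim_{T\to\infty} \mathbb{E}\!\left[ \sum_{t=0}^{T} c(s_t) \right] \\
\text{Average cost:} \quad & \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} c(s_t) \right] \\
\text{Discounted cost:} \quad & \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t) \right], \qquad 0 < \gamma < 1
\end{align*}
```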
Markov chains: one action per state
Markov Decision Processes: in general, several actions per state
action
action cost
transition probability
Goal: find the optimal policy
Evaluate a policy using an objective function
Total-cost
Average-cost
Discounted-cost
Proposition: an optimal policy always exists. That is what we aim for !
How do we solve a Markov Decision Process ?
Policy Iteration
POLICY ITERATION
Choose an initial policy.
while the policy keeps changing:
    1. Evaluate the current policy
    2. Improve: take the best action in each state according to that evaluation
end
Stop ! We found the optimal policy
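As a concrete illustration, here is a minimal Python sketch of the loop above on a made-up two-state, two-action MDP under the discounted-cost criterion. All numbers (transition probabilities, costs, discount factor) are illustrative assumptions, not taken from the talk.

```python
# Hypothetical MDP: P[a][s][t] = transition probability from state s to t
# under action a, c[a][s] = cost of action a in state s, gamma = discount.
P = {0: [[0.9, 0.1], [0.2, 0.8]],
     1: [[0.5, 0.5], [0.7, 0.3]]}
c = {0: [2.0, 1.0], 1: [0.5, 3.0]}
gamma = 0.9
states, actions = range(2), [0, 1]

def evaluate(policy, sweeps=2000):
    """Approximate the discounted value of a policy by fixed-point sweeps
    of v <- c_pi + gamma * P_pi v (a gamma-contraction, so it converges)."""
    v = [0.0 for _ in states]
    for _ in range(sweeps):
        v = [c[policy[s]][s]
             + gamma * sum(P[policy[s]][s][t] * v[t] for t in states)
             for s in states]
    return v

def policy_iteration(policy):
    while True:
        v = evaluate(policy)                       # 1. Evaluate
        improved = [min(actions, key=lambda a, s=s:
                        c[a][s] + gamma * sum(P[a][s][t] * v[t]
                                              for t in states))
                    for s in states]               # 2. Improve
        if improved == policy:                     # no change: optimal
            return policy, v
        policy = improved

optimal_policy, optimal_value = policy_iteration([0, 0])
```

On this example the loop stabilizes after two improvement steps, choosing the cheap action in each state once the transition structure is taken into account.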
Markov Decision Processes: one player
Turn Based Stochastic Games: two players
minimizer versus maximizer
STRATEGY ITERATION
Fix the maximizer's strategy and find the minimizer's best response against it, using POLICY ITERATION.
Then find the maximizer's best response against the minimizer's new strategy, again using POLICY ITERATION.
Repeat until nothing changes.
What is the complexity of Policy Iteration ?
![Page 32: A complexity analysis Solving Markov Decision Processes using Policy Iteration Romain Hollanders, UCLouvain Joint work with: Balázs Gerencsér, Jean-Charles](https://reader036.vdocuments.site/reader036/viewer/2022062504/5a4d1b5e7f8b9ab0599ac274/html5/thumbnails/32.jpg)
Total-cost criterion: Exponential [Friedmann ‘09, Fearnley ‘10]
Average-cost criterion: Exponential [Friedmann ‘09, Fearnley ‘10]
Discounted-cost criterion: Exponential [H. et al. ‘12]
Exponential in general !
But…
![Page 36: A complexity analysis Solving Markov Decision Processes using Policy Iteration Romain Hollanders, UCLouvain Joint work with: Balázs Gerencsér, Jean-Charles](https://reader036.vdocuments.site/reader036/viewer/2022062504/5a4d1b5e7f8b9ab0599ac274/html5/thumbnails/36.jpg)
Fearnley’s example is pathological
![Page 37: A complexity analysis Solving Markov Decision Processes using Policy Iteration Romain Hollanders, UCLouvain Joint work with: Balázs Gerencsér, Jean-Charles](https://reader036.vdocuments.site/reader036/viewer/2022062504/5a4d1b5e7f8b9ab0599ac274/html5/thumbnails/37.jpg)
Discounted-cost criterion with a fixed discount rate: Polynomial [Ye ‘10, Hansen et al. ‘11, Scherrer ‘13]
Deterministic MDPs: Polynomial for a close variant [Post & Ye ‘12, Scherrer ‘13]
MDPs with only positive costs: ???
Let us find upper bounds for the general case !
Acyclic Unique Sink Orientation
Every subcube has a unique sink
The orientation is acyclic
Let us find the sink with POLICY ITERATION
Start from an initial policy, a vertex of the cube.
At each step, consider the set of dimensions of the improvement edges and switch along those dimensions.
In this example: convergence in 5 vertex evaluations.
The sequence of visited vertices is the PI-sequence.
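The walk on the cube can be sketched in a few lines of Python. This is only an illustration: the orientation below comes from a hypothetical linear objective (which induces a particularly simple acyclic unique sink orientation, with each edge pointing toward its lower-valued endpoint), and the pivoting rule shown is the greedy all-switches variant; real AUSOs and PI variants can be far more intricate.

```python
n = 3  # dimension of the cube; vertices are 0/1 tuples of length n

def value(v):
    # Hypothetical objective; its unique minimizer (0,0,0) is the sink.
    return 4 * v[0] + 2 * v[1] + v[2]

def improvement_dims(v):
    """Dimensions along which flipping one bit reaches a better neighbour."""
    dims = []
    for i in range(n):
        w = list(v)
        w[i] ^= 1
        if value(tuple(w)) < value(v):
            dims.append(i)
    return dims

def find_sink(v):
    """All-switches Policy Iteration: flip every improving dimension at once,
    recording the PI-sequence of visited vertices until the sink is reached."""
    pi_sequence = [v]
    while improvement_dims(v):
        w = list(v)
        for i in improvement_dims(v):
            w[i] ^= 1
        v = tuple(w)
        pi_sequence.append(v)
    return pi_sequence

seq = find_sink((1, 0, 1))
```

For this linear orientation, flipping all improving bits simultaneously still decreases the value, so the walk reaches the sink immediately; the interesting complexity questions arise for less well-behaved orientations.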
Two properties to derive an upper bound
1. There exists a path connecting the policies of the PI-sequence
2.
A new upper bound
The trivial bound is the total number of policies.
We prove that a PI-sequence cannot contain too many large improvement sets.
Therefore, a better upper bound follows.
Can we do even better?
![Page 54: A complexity analysis Solving Markov Decision Processes using Policy Iteration Romain Hollanders, UCLouvain Joint work with: Balázs Gerencsér, Jean-Charles](https://reader036.vdocuments.site/reader036/viewer/2022062504/5a4d1b5e7f8b9ab0599ac274/html5/thumbnails/54.jpg)
The matrix is “Order-Regular”
How large are the largest Order-Regular matrices that we can build?
![Page 72: A complexity analysis Solving Markov Decision Processes using Policy Iteration Romain Hollanders, UCLouvain Joint work with: Balázs Gerencsér, Jean-Charles](https://reader036.vdocuments.site/reader036/viewer/2022062504/5a4d1b5e7f8b9ab0599ac274/html5/thumbnails/72.jpg)
The answer of exhaustive search
Conjecture (Hansen & Zwick, 2012): the largest Order-Regular matrices have a Fibonacci number of rows, so their size grows like a power of the golden ratio.
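As a quick numeric aside (not from the talk): Fibonacci numbers grow like powers of the golden ratio phi = (1 + sqrt(5)) / 2 ≈ 1.618, which is why a Fibonacci-sized family of matrices corresponds to exponential growth with base phi.

```python
from math import sqrt

def fib(n):
    """n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

phi = (1 + sqrt(5)) / 2
# Ratios of consecutive Fibonacci numbers converge to the golden ratio.
ratio = fib(30) / fib(29)
```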
The answer of exhaustive search
Theorem (H. et al., 2014): the conjectured values are confirmed for small sizes.
(Proof: a “smart” exhaustive search)
A constructive approach
Iterate the construction to build ever larger Order-Regular matrices.
Can we do better ?
Yes!
We can build matrices of size
So, what do we know about Order-Regular matrices ?
Order-Regular matrices relax Acyclic Unique Sink Orientations: every PI-sequence on an AUSO yields an Order-Regular matrix.
Let’s recap !
PART 1 Policy Iteration for Markov Decision Processes
Efficient in practice but not in the worst case
PART 2 The Acyclic Unique Sink Orientations point of view
Leads to a new upper bound
PART 3 Order-Regular matrices: towards new bounds
The Fibonacci conjecture fails