Partially Observable Markov Decision Process (Chapters 15 & 16)
José Luis Peralta
TKK | Automation Technology Laboratory

Page 1: Title

Partially Observable Markov Decision Process (Chapters 15 & 16)

José Luis Peralta
TKK | Automation Technology Laboratory

Page 2: Contents

• POMDP
• Example POMDP
• Finite World POMDP algorithm
• Practical Considerations
• Approximate POMDP Techniques

Page 3: Partially Observable Markov Decision Processes (POMDP)

• POMDP: uncertainty in measurements, in the state, and in the effects of controls.
• Idea: adapt the previous Value Iteration Algorithm (VIA).

Page 4: Partially Observable Markov Decision Processes (POMDP)

• POMDP: the world cannot be sensed directly.
• Measurements are incomplete, noisy, etc.
• Partial observability: the robot has to estimate a posterior distribution over the possible world states.

Page 5: Partially Observable Markov Decision Processes (POMDP)

• POMDP: an algorithm exists to find the optimal control policy for a FINITE world, i.e. the state space, the action space, the space of observations, and the planning horizon are all finite.
• The computation is complex; for the continuous case there are approximations.

Page 6: Partially Observable Markov Decision Processes (POMDP)

• The algorithms we are going to study are all based on Value Iteration (VI), the same as before, except that the state x is not observable.
• The robot has to make decisions in the BELIEF STATE b: the robot's internal knowledge about the state of the environment, i.e. the space of posterior distributions over states.

Page 7: Partially Observable Markov Decision Processes (POMDP)

• So value iteration is carried out over beliefs b rather than states x.
• The control policy likewise maps beliefs to controls.

Page 8: Partially Observable Markov Decision Processes (POMDP)

• Belief b: each value in a POMDP is a function of an entire probability distribution.
• Problems: a finite state space gives a continuous belief space; a continuous state space gives an infinitely-dimensional belief-space continuum.
• There is also complexity in calculating the value function, because of the integral over all distributions.

Page 9: Partially Observable Markov Decision Processes (POMDP)

• In the end, an optimal solution exists for an interesting special case, the FINITE world: state space, action space, space of observations, and planning horizon all finite.
• The solutions of the value function are piecewise linear functions over the belief space. This arises because:
  • expectation is a linear operation;
  • we have the ability to select different controls in different parts of the belief space.

Page 10: Example POMDP

2 states: x1, x2. 3 control actions: u1, u2, u3.

Page 11: Example POMDP

Payoff when executing u1 or u2:

r(x1, u1) = -100    r(x1, u2) = +100
r(x2, u1) = +100    r(x2, u2) = -50

Dilemma: the payoffs are opposite in each state, so knowledge of the state translates directly into payoff.

Page 12: Example POMDP

To acquire knowledge, the robot has the control u3 (cost of waiting, cost of sensing, etc.):

r(x1, u3) = r(x2, u3) = -1

u3 affects the state of the world in a non-deterministic manner:

p(x1' | x1, u3) = 0.2    p(x1' | x2, u3) = 0.8
p(x2' | x1, u3) = 0.8    p(x2' | x2, u3) = 0.2

Page 13: Example POMDP

• Benefit: before each control decision, the robot can sense. By sensing, the robot gains knowledge about the state, makes better control decisions, and raises its payoff expectation.
• With control action u3, the robot senses without taking a terminal action.

Page 14: Example POMDP

• The measurement model is governed by the following probability distribution:

p(z1 | x1) = 0.7    p(z2 | x1) = 0.3
p(z1 | x2) = 0.3    p(z2 | x2) = 0.7

Page 15: Example POMDP

This example is easy to graph over the belief space (2 states).
• Belief state: p1 = b(x1), p2 = b(x2), but p2 = 1 - p1, so we just graph p1.

Page 16: Example POMDP

• Control policy: a function that maps the unit interval [0; 1] to the space of all actions, pi: [0; 1] -> u.

Page 17: Example POMDP - Control Choice

• Control choice: when to execute what control?
• Payoff in POMDPs: first consider the immediate payoff of u1, u2, u3. The payoff is now a function of the belief state, so for b = (p1, p2) the expected payoff is

r(b, u) = p1 r(x1, u) + p2 r(x2, u)
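As a quick check of the numbers above, here is a minimal Python sketch (not from the slides; the function name is ours) that evaluates r(b, u) for the example's payoff table:

```python
# Sketch: expected immediate payoff r(b, u) for the two-state example,
# with belief b = (p1, 1 - p1).
R = {  # r(x, u) from the example slides
    ("x1", "u1"): -100.0, ("x2", "u1"): 100.0,
    ("x1", "u2"):  100.0, ("x2", "u2"): -50.0,
    ("x1", "u3"):   -1.0, ("x2", "u3"):  -1.0,
}

def expected_payoff(p1: float, u: str) -> float:
    """r(b, u) = p1 * r(x1, u) + (1 - p1) * r(x2, u)."""
    return p1 * R[("x1", u)] + (1.0 - p1) * R[("x2", u)]

for u in ("u1", "u2", "u3"):
    print(u, expected_payoff(0.4, u))  # e.g. at p1 = 0.4: 20, 10, -1
```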

Page 18: Example POMDP - Control Choice

Page 19: Example POMDP - Control Choice

Page 20: Example POMDP - Control Choice

Page 21: Example POMDP - Control Choice

• First we calculate V1(b) = max_u r(b, u): at horizon T = 1, the robot simply selects the action of highest expected payoff.
• V1 is a piecewise linear, convex function: the maximum of the individual payoff functions.

Page 22: Example POMDP - Control Choice

Page 23: Example POMDP - Control Choice

• Again V1(b) = max_u r(b, u): the robot selects the action of highest expected payoff.
• The transition in the optimal policy occurs where r(b, u1) = r(b, u2), i.e. at p1 = 3/7.
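A small sketch (assuming numpy; not part of the original slides) that builds this upper envelope and confirms the crossover at p1 = 3/7:

```python
import numpy as np

# The horizon-1 value function V1(p1) = max_u r(b, u) is the upper
# envelope of the three linear payoff functions from the example.
p = np.linspace(0.0, 1.0, 1001)
r_u1 = -100 * p + 100 * (1 - p)   # r(b, u1) = 100 - 200 p1
r_u2 =  100 * p -  50 * (1 - p)   # r(b, u2) = 150 p1 - 50
r_u3 = np.full_like(p, -1.0)      # r(b, u3) = -1
V1 = np.maximum.reduce([r_u1, r_u2, r_u3])

# u1/u2 crossover: 100 - 200 p1 = 150 p1 - 50  =>  p1 = 3/7
print(3 / 7, V1[p.searchsorted(3 / 7)])  # value there is 100/7, about 14.3
```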

Page 24: Example POMDP - Sensing

• Now we add perception: what if the robot can sense before it chooses a control? How does this affect the optimal value function?
• Sensing information about the state enables the robot to choose a better control action.
• In the previous example, at p1 = 3/7 the expected payoff is 100/7, about 14.3. How much better will this be after sensing?

Page 25: Example POMDP - Control Choice

The belief after sensing z1, as a function of the belief before sensing, is given by Bayes rule:

p1' = p(z1 | x1) p1 / p(z1)

Finally, for p1 = 0.4:

p1' = 0.7 * 0.4 / (0.7 * 0.4 + 0.3 * 0.6) = 0.28 / 0.46 = 0.6087
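A minimal sketch of this Bayes update in Python (the function name update_belief is ours, not the book's):

```python
# Belief update after sensing, for the example's measurement model
# p(z1 | x1) = 0.7, p(z1 | x2) = 0.3.
def update_belief(p1: float, p_z_given_x1: float, p_z_given_x2: float) -> float:
    """Posterior p1' = p(z|x1) p1 / (p(z|x1) p1 + p(z|x2) (1 - p1))."""
    pz = p_z_given_x1 * p1 + p_z_given_x2 * (1.0 - p1)  # normalizer p(z)
    return p_z_given_x1 * p1 / pz

print(update_belief(0.4, 0.7, 0.3))  # 0.28 / 0.46 = 0.6087
```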

Page 26: Example POMDP - Control Choice

How does this affect the value function?

Page 27: Example POMDP - Control Choice

Mathematically, that is just replacing p1 with p1' in the value function V1.

Page 28: Example POMDP - Control Choice

However, our interest is the complete expected value function after sensing, which also considers the probability of sensing the other measurement z2. This is the expectation of V1 over both measurements, weighted by p(z | b).
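A sketch of this expectation for the example's two measurements, reusing the horizon-1 value function; the names and structure are illustrative, not from [1]:

```python
def V1(p1: float) -> float:
    """Horizon-1 value: max of the three linear payoffs."""
    return max(100 - 200 * p1, 150 * p1 - 50, -1.0)

def expected_V1_after_sensing(p1: float) -> float:
    """sum over z of p(z | b) * V1(posterior belief after z)."""
    total = 0.0
    for pz_x1, pz_x2 in ((0.7, 0.3), (0.3, 0.7)):  # models for z1, z2
        pz = pz_x1 * p1 + pz_x2 * (1.0 - p1)       # p(z | b)
        total += pz * V1(pz_x1 * p1 / pz)          # Bayes posterior p1'
    return total

print(expected_V1_after_sensing(0.4))  # about 49.0 > 20 without sensing
```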

Page 29: Example POMDP - Control Choice

And this results in:

Page 30: Example POMDP - Control Choice

Mathematically:

Page 31: Example POMDP - Prediction

To plan at a horizon larger than T = 1, we have to take the state transition into consideration and project our value function accordingly.

According to our transition probability model for u3:

p1' = 0.2 p1 + 0.8 (1 - p1) = 0.8 - 0.6 p1

If p1 = 0, then p1' = 0.8; if p1 = 1, then p1' = 0.2. In between, the expectation is linear.
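A one-line sketch of this prediction step (illustrative naming, using the transition model from page 12):

```python
# Projecting the belief through action u3 with the example's model
# p(x1' | x1, u3) = 0.2 and p(x1' | x2, u3) = 0.8.
def predict_belief_u3(p1: float) -> float:
    """p1' = 0.2 * p1 + 0.8 * (1 - p1) = 0.8 - 0.6 * p1."""
    return 0.2 * p1 + 0.8 * (1.0 - p1)

print(predict_belief_u3(0.0), predict_belief_u3(1.0))  # 0.8, 0.2
```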

Page 32: Example POMDP - Prediction

And this results in:

Page 33: Example POMDP - Prediction

And adding u1 and u2, we have:

Page 34: Example POMDP - Prediction

Mathematically (note the fixed cost of u3, -1, in the backup):

Page 35: Example POMDP - Pruning

Full backups quickly become impractical:
• At T = 20, the value function is defined over 10^547,864 linear functions.
• At T = 30, it is defined over 10^561,012,337 linear functions.

Impractical!!! Efficient approximate POMDP methods are needed.
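To see why, here is a rough Python sketch of one common counting, where each unpruned backup pairs a control with one horizon-(T-1) function per measurement, so |V_T| = |U| * |V_(T-1)|^|Z|; the slide's exact exponents may come from a slightly different counting:

```python
from math import log10

# Track log10 of the number of linear functions to avoid overflow.
U, Z = 3, 2              # controls and measurements in the toy example
log_n = log10(3.0)       # |V_1| = 3 linear payoff functions
for T in range(2, 21):   # unpruned backups up to horizon 20
    log_n = log10(U) + Z * log_n
print(f"|V_20| ~ 10^{log_n:.0f} linear functions")  # astronomically many
```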

Page 36: Finite World POMDP algorithm

To understand this, read the mathematical derivation of POMDPs, pp. 531-536 in [1].

Page 37: Finite World POMDP algorithm

To understand this, read the mathematical derivation of POMDPs, pp. 531-536 in [1].

Page 38: Example POMDP - Practical Considerations

It looks easy, so let's try something more "real": the probabilistic robot "RoboProb".

Page 39: Example POMDP - Practical Considerations

It looks easy, so let's try something more "real": the probabilistic robot "RoboProb".

11 states: x1, x2, ..., x11, arranged in a grid (top row x1-x4, middle row x5-x7, bottom row x8-x11).
5 control actions: u1, u2, u3, u4, u5, where u5 = sense without moving.

Transition model: a motion command succeeds with probability 0.8 and slips sideways to an adjacent cell with probability 0.1 to each side.

Page 40: Example POMDP - Practical Considerations

It looks easy, so let's try something more "real": the probabilistic robot "RoboProb".

"Reward" payoff over the grid:

-0.04  -0.04  -0.04    +1
-0.04  (wall) -0.04    -1
-0.04  -0.04  -0.04  -0.04

The same set holds for all control actions. Examples:

r(x1, u1) = -0.04    r(x8, u2) = -0.04
r(x7, u5) = -1       r(x7, u3) = -1

Page 41: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

Transition probability p(xi' | xj, uk). Example: p(xi' | xj, u1), rows = current state, columns = posterior state:

      1    2    3    4    5    6    7    8    9    10   11
 1   0.9  0.1  0    0    0    0    0    0    0    0    0
 2   0.1  0.8  0.1  0    0    0    0    0    0    0    0
 3   0    0.1  0.8  0.1  0    0    0    0    0    0    0
 4   0    0    0    1    0    0    0    0    0    0    0
 5   0.8  0    0    0    0.2  0    0    0    0    0    0
 6   0    0    0.8  0    0    0.1  0.1  0    0    0    0
 7   0    0    0    0    0    0    1    0    0    0    0
 8   0    0    0    0    0.8  0    0    0.1  0.1  0    0
 9   0    0    0    0    0    0    0    0.1  0.8  0.1  0
10   0    0    0    0    0    0.8  0    0    0.1  0    0.1
11   0    0    0    0    0    0    0.8  0    0    0.1  0.1
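As a sketch of how this matrix is used (assuming numpy; not from the slides), one belief prediction step is b_bar = T^T b:

```python
import numpy as np

# T_u1[i, j] = p(x_j' | x_i, u1), the u1 transition matrix from the slide.
T_u1 = np.array([
    [0.9, 0.1, 0,   0,   0,   0,   0,   0,   0,   0,   0  ],
    [0.1, 0.8, 0.1, 0,   0,   0,   0,   0,   0,   0,   0  ],
    [0,   0.1, 0.8, 0.1, 0,   0,   0,   0,   0,   0,   0  ],
    [0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0  ],
    [0.8, 0,   0,   0,   0.2, 0,   0,   0,   0,   0,   0  ],
    [0,   0,   0.8, 0,   0,   0.1, 0.1, 0,   0,   0,   0  ],
    [0,   0,   0,   0,   0,   0,   1,   0,   0,   0,   0  ],
    [0,   0,   0,   0,   0.8, 0,   0,   0.1, 0.1, 0,   0  ],
    [0,   0,   0,   0,   0,   0,   0,   0.1, 0.8, 0.1, 0  ],
    [0,   0,   0,   0,   0,   0.8, 0,   0,   0.1, 0,   0.1],
    [0,   0,   0,   0,   0,   0,   0.8, 0,   0,   0.1, 0.1],
])
b = np.full(11, 1 / 11)   # uniform prior belief
b_bar = T_u1.T @ b        # predicted belief after executing u1
print(b_bar.sum())        # rows sum to 1, so b_bar is a distribution
```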

Page 42: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

Transition probability p(xi' | xj, u5): since u5 senses without moving, the matrix is the 11 x 11 identity; the state stays unchanged.

Page 43: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

Measurement probability p(zj | xi), rows = current state, columns = probability of measuring zj: p(zi | xi) = 0.7 on the diagonal, and 0.03 for each of the ten other measurements (0.7 + 10 * 0.03 = 1).
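A companion sketch for the measurement step (again assuming numpy, with illustrative names): the Bayes correction b'(x) proportional to p(z | x) * b_bar(x):

```python
import numpy as np

# Measurement model from the slide: 0.7 on the diagonal, 0.03 elsewhere.
N = 11
P_z_given_x = np.full((N, N), 0.03) + np.diag(np.full(N, 0.67))

def correct(b_bar: np.ndarray, z: int) -> np.ndarray:
    """b'(x) proportional to p(z | x) * b_bar(x), then normalized."""
    unnorm = P_z_given_x[:, z] * b_bar
    return unnorm / unnorm.sum()

b = correct(np.full(N, 1 / N), z=4)  # observe z5 from a uniform belief
print(b[4])                          # belief mass concentrates on x5
```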

Page 44: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

Belief states: p1 = b(x1), p2 = b(x2), p3 = b(x3), ..., p11 = b(x11) = 1 - p1 - p2 - ... - p10.

Impossible to graph!!

Page 45: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

Each linear function results from executing a control u, followed by observing a measurement z, and then executing a control u'.

Page 46: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

• Defining the measurement probability
• Defining the "reward" payoff
• Defining the transition probability
• Merging the transition (control) probability

Page 47: Example POMDP - Practical Considerations

It's getting kind of hard :S... Probabilistic robot "RoboProb".

With N the number of states and NC the number of controls:
• Setting beliefs
• Executing u: NC * N times
• Sensing z: N times
• Executing u': NC * N times

Page 48: Example POMDP - Practical Considerations

Now what...? Probabilistic robot "RoboProb".

Calculating r(b, u) is a sum of N terms, with N the number of states and NC the number of controls.

The real problem is to compute p(b' | u, b).

Page 49: Example POMDP - Practical Considerations

The real problem is to compute p(b' | u, b).

• Given a belief b and a control action u, the outcome is a distribution over distributions; the key factor in this update is the conditional probability p(b' | u, b).
• Because the posterior belief also depends on the next measurement, and the measurement itself is generated stochastically, this probability specifies a distribution over probability distributions.

Page 50: Example POMDP - Practical Considerations

The real problem is to compute p(b' | u, b). So we write it as an integral of p(b' | u, b, z) p(z | u, b) over z, where p(b' | u, b, z) contains only one non-zero term: the belief that results from b after executing u and observing z.

Page 51: Example POMDP - Practical Considerations

The real problem is to compute p(b' | u, b). Arriving at:

p(b' | u, b) = integral of p(b' | u, b, z) p(z | u, b) dz

i.e. we just integrate over measurements z instead of over b'. Because our space is finite, the integral becomes a sum, with

p(z | u, b) = sum over x' of p(z | x') sum over x of p(x' | u, x) b(x)

Page 52: Example POMDP - Practical Considerations

The real problem is to compute p(b' | u, b). In the end we obtain a computable expression, but this VIA is far from practical: for any reasonable number of distinct states, measurements, and controls, the complexity of the value function is prohibitive, even for relatively benign planning horizons.

Approximations are needed.

Page 53: Approximate POMDP Techniques

• Here we have 3 approximate probabilistic planning and control algorithms: QMDP, AMDP, and MC-POMDP.
• They have varying degrees of practical applicability.
• All 3 algorithms rely on approximations of the POMDP value function.
• They differ in the nature of their approximations.

Page 54: Approximate POMDP Techniques - QMDP

• The QMDP framework considers uncertainty only for a single action choice:
  • It assumes that after the immediate next control action, the state of the world suddenly becomes observable.
  • Full observability makes it possible to use the MDP-optimal value function.
  • QMDP generalizes the MDP value function to belief spaces through the mathematical expectation operator (see the sketch below).
• Planning in QMDPs is as efficient as in MDPs, but the value function generally overestimates the true value of a belief state.
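A minimal sketch of the QMDP idea in Python (our own illustration, not the book's algorithm listing): solve the underlying MDP for Q(x, u), then score each action by its expectation under the belief:

```python
import numpy as np

def mdp_q_iteration(T, R, gamma=0.95, iters=200):
    """T: (U, N, N) transition tensor T[u, x, x']; R: (N, U) payoff.
    Plain MDP value iteration over Q-values."""
    N, U = R.shape
    Q = np.zeros((N, U))
    for _ in range(iters):
        V = Q.max(axis=1)                              # V(x') = max_u Q(x', u)
        Q = R + gamma * np.einsum("unm,m->nu", T, V)   # backup
    return Q

def qmdp_policy(b, Q):
    """QMDP action choice: argmax_u sum_x b(x) Q(x, u)."""
    return int(np.argmax(b @ Q))
```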

Page 55: Approximate POMDP Techniques - QMDP

• Algorithm
• The QMDP framework considers uncertainty only for a single action choice.

Page 56: Approximate POMDP Techniques - AMDP

• The Augmented MDP (AMDP) maps the belief into a lower-dimensional representation, over which it then performs exact value iteration.
• The "classical" representation consists of the most likely state under a belief, along with the belief entropy (see the sketch below).
• AMDPs are like MDPs with one added dimension in the state representation that measures the global degree of uncertainty.
• To implement an AMDP, it is necessary to learn the state transition and the reward function in the low-dimensional belief space.
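A tiny sketch of this belief statistic (illustrative, assuming numpy):

```python
import numpy as np

def amdp_state(b: np.ndarray):
    """Low-dimensional AMDP statistic of a belief: the most likely state
    plus the belief entropy (the global degree of uncertainty)."""
    b = b / b.sum()
    entropy = -np.sum(b * np.log(b + 1e-12))
    return int(np.argmax(b)), float(entropy)

print(amdp_state(np.array([0.7, 0.1, 0.1, 0.1])))  # (0, low-ish entropy)
```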

Page 57: Approximate POMDP Techniques - AMDP

• The "classical" representation consists of the most likely state under a belief, along with the belief entropy.

Page 58: Approximate POMDP Techniques - AMDP

[Figure: true vs. estimated mean and covariance]

Page 59: Approximate POMDP Techniques - AMDP

• The application of AMDPs to mobile robot navigation is called coastal navigation. It:
  • anticipates uncertainty;
  • selects motions that trade off overall path length against the uncertainty accrued along a path.
• The resulting trajectories differ significantly from any non-probabilistic solution.
• Being temporarily lost is acceptable if the robot can later re-localize with sufficiently high probability.

Page 60: Approximate POMDP Techniques - AMDP

• AMDP Algorithm

Page 61: Approximate POMDP Techniques - AMDP

Page 62: Approximate POMDP Techniques - MC-POMDP

• The Monte Carlo POMDP (MC-POMDP) is the particle-filter version of POMDPs.
• It calculates a value function defined over sets of particles.
• MC-POMDPs use a local learning technique: a locally weighted learning rule combined with a proximity test based on KL-divergence.
• MC-POMDPs then apply Monte Carlo sampling to implement an approximate value backup.
• The resulting algorithm is a full-fledged POMDP algorithm whose computational complexity and accuracy are both functions of the parameters of the learning algorithm.

Page 63: Approximate POMDP Techniques - MC-POMDP

• Particle set representing belief b
• Value function

Page 64: Approximate POMDP Techniques - MC-POMDP

• MC-POMDP Algorithm

Page 65: Approximate POMDP Techniques - MC-POMDP

The particle filter underlying MC-POMDPs:
1. Start from a discrete Monte Carlo representation of p(x_{k-1} | y_{1:k-1}): a set of N particles x_{k-1}^(i).
2. Draw new particles from the proposal distribution p(x_k^(i) | x_{k-1}^(i)).
3. Given the new observation y_k, evaluate importance weights using the likelihood function: w_k^(i) = p(y_k | x_k^(i)).
4. Resample the particles.
5. The result is a discrete Monte Carlo representation (approximation) of p(x_k | y_{1:k}).
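A compact sketch of one such update in Python; the Gaussian motion and measurement models here are illustrative stand-ins, not from the book:

```python
import numpy as np

def particle_filter_step(particles, y_k, motion_std=0.1, meas_std=0.5):
    """One update of a 1-D particle filter, following the steps above."""
    N = len(particles)
    # 1. Draw new particles from the proposal p(x_k | x_{k-1}).
    particles = particles + np.random.normal(0.0, motion_std, size=N)
    # 2. Evaluate importance weights with the likelihood p(y_k | x_k).
    w = np.exp(-0.5 * ((y_k - particles) / meas_std) ** 2)
    w /= w.sum()
    # 3. Resample particles in proportion to their weights.
    idx = np.random.choice(N, size=N, p=w)
    return particles[idx]  # approximates p(x_k | y_{1:k})
```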

Page 66: References and Links

• References
[1] S. Thrun, W. Burgard, D. Fox. Probabilistic Robotics. MIT Press, 2005.

• Links
http://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process
http://www.cs.cmu.edu/~trey/zmdp/
http://www.cassandra.org/pomdp/index.shtml
http://www.cs.duke.edu/~mlittman/topics/pomdp-page.html

Page 67: Exercise

Exercise 1 in [1], Chapter 15:

A person faces two doors. Behind one is a tiger, behind the other a reward of +10. The person can either listen or open one of the doors. When opening the door with the tiger, the person will be eaten, which has an associated cost of -20. Listening costs -1. When listening, the person will hear a roaring noise that indicates the presence of the tiger, but only with 0.85 probability will the person be able to localize the noise correctly. With 0.15 probability, the noise will appear as if it came from the door hiding the reward.

Your questions:

(a) Provide the formal model of the POMDP, in which you define the state, action, and measurement spaces, the cost function, and the associated probability functions.

(b) What is the expected cumulative payoff/cost of the open-loop action sequence "Listen, listen, open door 1"? Explain your calculation.

(c) What is the expected cumulative payoff/cost of the open-loop action sequence: "Listen, then open the door for which we did not hear a noise"? Again, explain your calculation.