
UNCERTAINTY IN SENSING (AND ACTION)

Upload: violet-jacobs

Post on 17-Jan-2016


Page 1:

UNCERTAINTY IN SENSING (AND ACTION)

Page 2:

AGENDA

• Planning with belief states
• Nondeterministic sensing uncertainty
• Probabilistic sensing uncertainty

Page 3:

ACTION UNCERTAINTY

Each action is represented in the form:

a(s) -> {s1, …, sr}

where each si, i = 1, …, r, describes one possible effect of executing action a in state s
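As an illustration, a nondeterministic action can be encoded as a function from a state to the set of its possible successor states. A minimal sketch in Python, assuming states are represented as frozensets of proposition strings (the `right` action and the proposition names are illustrative):

```python
from typing import Callable, FrozenSet, Set

State = FrozenSet[str]
Action = Callable[[State], Set[State]]

def right(s: State) -> Set[State]:
    """Right either moves the robot from R1 to R2, or does nothing."""
    if "In(R1)" not in s:
        return set()  # action not applicable in this state
    moved = (s - {"In(R1)"}) | {"In(R2)"}
    return {moved, s}  # two possible effects: move, or no change
```

Calling `right` on a state containing `In(R1)` returns both possible successor states, matching the {s1, …, sr} form above.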

Page 4:

AND/OR TREE

action nodes (world “decision” nodes)
state nodes (agent decision nodes)

[Figure: AND/OR tree with actions Right and Suck(R1); one branch loops]

Page 5:

AND/OR TREE

[Figure: the AND/OR tree expanded with actions Right, Left, Suck(R1), and Suck(R2); some branches reach the goal, others loop]

Page 6:

OR SUB-TREE

OR sub-tree : AND/OR tree :: path : classical search tree

• For each state node, only one child is included
• For each action node, all children are included
• It forms a part of a potential solution if none of its nodes is closed

A solution is an OR sub-tree in which all leaves are goal states
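The search for such a solution sub-tree can be sketched as a pair of mutually recursive functions, in the spirit of AND-OR graph search; `actions`, `results`, and `is_goal` are assumed to be supplied by the domain model:

```python
# Sketch of AND/OR search: OR nodes pick one action that works,
# AND nodes require a sub-plan for every possible outcome.
def or_search(s, actions, results, is_goal, path=()):
    if is_goal(s):
        return []                      # empty plan: already at a goal
    if s in path:
        return None                    # loop: this branch is closed
    for a in actions(s):
        plan = and_search(results(s, a), actions, results, is_goal, path + (s,))
        if plan is not None:
            return [a, plan]           # conditional plan: do a, then branch
    return None

def and_search(states, actions, results, is_goal, path):
    branches = {}
    for s in states:                   # every outcome must be solvable
        plan = or_search(s, actions, results, is_goal, path)
        if plan is None:
            return None
        branches[s] = plan
    return branches
```

The returned plan is a nested structure: an action followed by one sub-plan per possible resulting state, which is exactly an OR sub-tree.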

Page 7:

BELIEF STATE

A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions.

To plan a course of actions, the agent searches a space of belief states, instead of a space of states

Page 8:

SENSOR MODEL (DEFINITION #1)

State space S. The sensor model is a function

SENSE: S → 2^S

that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s)

Example: Assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it. But it can’t sense whether there is dust in the other room.

[Figure: SENSE applied to two example vacuum-world states]

Page 9:

SENSOR MODEL (DEFINITION #2)

State space S, percept space P. The sensor model is a function

SENSE: S → P

that maps each state s ∈ S to a percept (the percept that the agent would obtain if actually observing state s)

We can then define the set of states consistent with an observation p ∈ P:

CONSISTENT(p) = { s ∈ S | SENSE(s) = p }

[Figure: SENSE and CONSISTENT applied to example vacuum-world states]
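A small sketch of definition #2 in Python, assuming a percept that reports only the robot's current room and whether that room is clean (the encoding and helper names are illustrative):

```python
# States are frozensets of proposition strings, as before.
def sense(s: frozenset) -> tuple:
    """Percept: (current room, is that room clean?)."""
    room = "R1" if "In(R1)" in s else "R2"
    return (room, f"Clean({room})" in s)

def consistent(p, all_states):
    """CONSISTENT(p) = set of states whose percept equals p."""
    return {s for s in all_states if sense(s) == p}
```

Given all eight vacuum-world states, `consistent(("R1", True), states)` returns the two states in which the robot is in a clean R1, with R2's cleanliness unresolved.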

Page 10:

VACUUM ROBOT ACTION AND SENSOR MODEL

Right: Appl if s ⊨ In(R1)
{s1 = s - In(R1) + In(R2), s2 = s}
[Right does either the right thing, or nothing]

Left: Appl if s ⊨ In(R2)
{s1 = s - In(R2) + In(R1),
 s2 = s - In(R2) + In(R1) - Clean(R2)}
[Left always moves the robot to R1, but it may occasionally deposit dust in R2]

Suck(r): Appl if s ⊨ In(r)
{s1 = s + Clean(r)}
[Suck always does the right thing]

• The robot perfectly senses the room it is in and whether there is dust in it
• But it can’t sense whether there is dust in the other room

State s: any logical conjunction of In(R1), In(R2), Clean(R1), Clean(R2)
(notation: + adds an attribute, - removes an attribute)

Page 11:

TRANSITION BETWEEN BELIEF STATES

Suppose the robot is initially in state: [figure]

After sensing this state, its belief state is: [figure]

Just after executing Left, its belief state will be: [figure]

After sensing the new state, its belief state will be one of two belief states [figures]: one if there is no dust in R1, the other if there is dust in R1

Page 12:

[Figure: the same belief-state transitions shown as a diagram; after Left, sensing splits the belief into two belief states, labeled Clean(R1) and ¬Clean(R1)]

Page 13:

TRANSITION BETWEEN BELIEF STATES

How do you propagate the action/sensing operation to obtain the successors of a belief state?

[Figure: the Left transition diagram, with the two resulting belief states labeled Clean(R1) and ¬Clean(R1)]

Page 14:

COMPUTING THE TRANSITION BETWEEN BELIEF STATES

Given an action A and a belief state S = {s1, …, sn}:

• Result of applying the action, without sensing: take the union of all SUCC(si, A) for i = 1, …, n. This gives us a pre-sensing belief state S’.
• Possible percepts resulting from sensing: {SENSE(si’) for si’ in S’} (using SENSE definition #2). This gives us a percept set P.
• Possible states both in S’ and consistent with each possible percept pj in P: Sj = {si’ | SENSE(si’) = pj for si’ in S’}, i.e., Sj = CONSISTENT(pj) ∩ S’
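The three steps above can be sketched as one function, assuming a domain-supplied `succ(s, a)` (the set of possible successors of s under a) and `sense(s)` (definition #2):

```python
# Sketch of the belief-state transition: apply a nondeterministic
# action to every state, then split the union by percept.
def belief_successors(belief, action, succ, sense):
    # 1) pre-sensing belief state S' = union of all successors
    s_prime = set().union(*(succ(s, action) for s in belief))
    # 2) group S' by the percept each state would produce
    by_percept = {}
    for s in s_prime:
        by_percept.setdefault(sense(s), set()).add(s)
    # 3) one successor belief state per possible percept
    return by_percept
```

The result maps each possible percept pj to the belief state Sj = CONSISTENT(pj) ∩ S’.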

Page 15:

AND/OR TREE OF BELIEF STATES

[Figure: AND/OR tree over belief states with actions Left, Suck, and Right; some branches loop, others reach goal belief states]

A goal belief state is one in which all states are goal states

An action is applicable to a belief state B if its precondition is achieved in all states in B

Page 16:

[Figure: the AND/OR tree of belief states, further expanded]

Page 17:

[Figure: a solution OR sub-tree of the AND/OR tree of belief states; all leaves are goal belief states]

Page 18:

BELIEF STATE REPRESENTATION

Solution #1: Represent the set of states explicitly

• Under the closed-world assumption, if states are described with n propositions, there are O(2^n) states
• The number of belief states is O(2^(2^n))
• A belief state may contain O(2^n) states
• This can be hugely expensive

Page 19:

BELIEF STATE REPRESENTATION

Solution #2: Represent only what is known. For example, if the vacuum robot knows that it is in R1 (so, not in R2) and R2 is clean, then the representation is

K(In(R1)) ∧ K(¬In(R2)) ∧ K(Clean(R2))

where K stands for “Knows that ...”

How many belief states can be represented? Only 3^n, instead of O(2^(2^n)): each proposition is either known true, known false, or unknown

Page 20:

SUCCESSOR OF A BELIEF STATE THROUGH AN ACTION

Left: Appl if s ⊨ In(R2)
{s1 = s - In(R2) + In(R1),
 s2 = s - In(R2) + In(R1) - Clean(R2)}

Belief state: K(In(R2)) ∧ K(¬In(R1)) ∧ K(Clean(R2))

s1: K(¬In(R2)) ∧ K(In(R1)) ∧ K(Clean(R2))
s2: K(¬In(R2)) ∧ K(In(R1)) ∧ K(¬Clean(R2))

Successor belief state: K(¬In(R2)) ∧ K(In(R1))

An action does not depend on the agent’s belief state: K does not appear in the action description (different from R&N, p. 440)

Page 21:

SENSORY ACTIONS

So far, we have assumed a unique sensory operation automatically performed after the execution of each action of a plan

But an agent may have several sensors, each having some cost (e.g., time) to use

In certain situations, the agent may prefer to avoid the cost of using a sensor, even if using the sensor could reduce uncertainty

This leads to introducing specific sensory actions, each with its own representation → active sensing

As with other actions, the agent chooses which sensory actions it wants to execute, and when

Page 22:

EXAMPLE

Check-Dust(r): Appl if s ⊨ In(r)
{when Clean(r): b’ = b - K(¬Clean(r)) + K(Clean(r))}
{when ¬Clean(r): b’ = b - K(Clean(r)) + K(¬Clean(r))}

Check-Dust(R1):

Before: K(In(R1)) ∧ K(¬In(R2)) ∧ K(Clean(R2))
After (R1 clean): K(In(R1)) ∧ K(¬In(R2)) ∧ K(Clean(R2)) ∧ K(Clean(R1))
After (R1 dirty): K(In(R1)) ∧ K(¬In(R2)) ∧ K(Clean(R2)) ∧ K(¬Clean(R1))

A sensory action maps a state into a belief state. Its precondition is about the state; its effects are on the belief state.
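A sketch of Check-Dust(r) on the K-representation, assuming a belief is a set of K(...) literal strings with `~` marking negation (the encoding is illustrative):

```python
# The action branches on the actual state s but updates only the
# belief b, mirroring the slide: precondition on the state, effects
# on the belief state.
def check_dust(b: set, s: set, r: str) -> set:
    assert f"In({r})" in s, "precondition is about the state"
    if f"Clean({r})" in s:                    # the room is actually clean
        return (b - {f"K(~Clean({r}))"}) | {f"K(Clean({r}))"}
    else:                                     # the room is actually dirty
        return (b - {f"K(Clean({r}))"}) | {f"K(~Clean({r}))"}
```

Starting from the belief on the slide, the result gains either K(Clean(R1)) or K(~Clean(R1)) depending on the true state.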

Page 23:

INTRUDER FINDING PROBLEM

• A moving intruder is hiding in a 2-D workspace
• The robot must “sweep” the workspace to find the intruder
• Both the robot and the intruder are points

[Figure: the robot, its visibility region, the cleared region, and hiding regions 1–6]

Page 24:

DOES A SOLUTION ALWAYS EXIST?

Easy to test: a “hole” in the workspace → No!

Hard to test: no “hole” in the workspace

Page 25:

INFORMATION STATE

Example of an information state: (x, y, a=1, b=1, c=0), where (x, y) is the robot’s position and each of a, b, c is 0 or 1 (0 = cleared region, 1 = hiding region)

An initial state is of the form (x, y, 1, 1, ..., 1). A goal state is any state of the form (x, y, 0, 0, ..., 0)

Page 26:

CRITICAL LINE

[Figure: while the robot moves on one side of a critical line, the information state is unchanged (a=0, b=1); crossing the critical line changes it (a=0, b=0)]

Page 27:

CRITICALITY-BASED DISCRETIZATION

[Figure: workspace partitioned into regions A, B, C, D, and E by critical lines]

Each of the regions A, B, C, D, and E consists of “equivalent” positions of the robot, so it’s sufficient to consider a single position per region

Page 28:

CRITICALITY-BASED DISCRETIZATION

[Figure: search tree over the discretized regions, rooted at (C, 1, 1) with successors (B, 1) and (D, 1)]

Page 29:

[Figure: the search tree expanded further, adding nodes (E, 1) and (C, 1, 0)]

Page 30:

[Figure: the search tree expanded further, adding nodes (B, 0) and (D, 1)]

Page 31:

[Figure: the completed search tree over regions A–E]

Much smaller search tree than with grid-based discretization!

Page 32:

SENSORLESS PLANNING

Page 33:

PLANNING WITH PROBABILISTIC UNCERTAINTY IN SENSING

[Figure: two motion cases, labeled “no motion” and “perpendicular motion”]

Page 34:

PARTIALLY OBSERVABLE MDPS

Consider the MDP model with states s ∈ S and actions a ∈ A:
• Reward R(s)
• Transition model P(s’|s,a)
• Discount factor γ

With sensing uncertainty, the initial belief state is a probability distribution over states: b(s), with b(si) ≥ 0 for all si ∈ S and Σi b(si) = 1

Observations are generated according to a sensor model:
• Observation space o ∈ O
• Sensor model P(o|s)

The resulting problem is a Partially Observable Markov Decision Process (POMDP)

Page 35:

POMDP UTILITY FUNCTION

A policy π(b) is defined as a map from belief states to actions

Expected discounted reward with policy π:

Uπ(b) = E[Σt γ^t R(St)]

where St is the random variable indicating the state at time t

P(S0=s) = b0(s)

P(S1=s) = ?

Page 36:

P(S1=s) = P(s | π(b0), b0) = Σs’ P(s|s’, π(b0)) P(S0=s’) = Σs’ P(s|s’, π(b0)) b0(s’)

Page 37:

P(S2=s) = ?

Page 38:

P(S1=s) = Σs’ P(s|s’, π(b0)) b0(s’)

What belief states could the robot be in after 1 step?

Page 39:

[Diagram: starting from b0, choose action π(b0), then predict: b1(s) = Σs’ P(s|s’, π(b0)) b0(s’), yielding b1]

Page 40:

[Diagram extended: after b1, receive an observation, one of oA, oB, oC, oD]

Page 41:

[Diagram extended: the observations arrive with probabilities P(oA|b1), P(oB|b1), P(oC|b1), P(oD|b1), leading to belief states b2, b3, b4, b5]

Page 42:

[Diagram extended: update the belief on each observation branch: b2(s) = P(s|b1,oA), b3(s) = P(s|b1,oB), b4(s) = P(s|b1,oC), b5(s) = P(s|b1,oD)]

Page 43:

The observation probabilities and belief updates in the diagram are computed as:

P(o|b) = Σs P(o|s) b(s)

P(s|b,o) = P(o|s) P(s|b) / P(o|b) = (1/Z) P(o|s) b(s)
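The predict and update equations can be sketched with NumPy for a small tabular POMDP; the matrix orientation (T_a[s, s'] = P(s|s', a)) is an assumption of this sketch:

```python
import numpy as np

def predict(b, T_a):
    """b1(s) = sum_s' P(s|s', a) b(s'), with T_a[s, s'] = P(s|s', a)."""
    return T_a @ b

def update(b_pred, O_o):
    """b'(s) = P(o|s) b_pred(s) / P(o|b_pred), with O_o[s] = P(o|s)."""
    unnorm = O_o * b_pred            # elementwise P(o|s) b(s)
    return unnorm / unnorm.sum()     # the sum is P(o|b): the normalizer Z
```

The two functions together give one step of the belief-state transition shown in the diagram.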

Page 44:

BELIEF-SPACE SEARCH TREE

• Each belief node has |A| action node successors
• Each action node has |O| belief successors
• Each (action, observation) pair (a,o) requires a predict/update step similar to HMMs

Matrix/vector formulation:
• b(s): a vector b of length |S|
• P(s’|s,a): a set of |S|×|S| matrices Ta
• P(ok|s): a vector ok of length |S|

b’ = Ta b (predict)
P(ok|b’) = ok^T b’ (probability of observation)
bk = diag(ok) b’ / (ok^T b’) (update)

Denote the resulting belief as b_{a,o}

Page 45:

RECEDING HORIZON SEARCH

• Expand the belief-space search tree to some depth h
• Use an evaluation function on leaf beliefs to estimate utilities
• For internal nodes, back up estimated utilities:

U(b) = E[R(s)|b] + γ max_{a∈A} Σ_{o∈O} P(o|b_a) U(b_{a,o})
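The backup above can be sketched as a recursive function; all helper names (`belief_predict`, `belief_update`, `obs_prob`, `exp_reward`, `f`) are illustrative stand-ins for the domain model:

```python
# Receding-horizon backup over the belief-space tree: at depth 0 use
# the leaf evaluation function f(b); otherwise take the best action's
# expected backed-up value.
def backup(b, h, actions, observations, belief_predict, belief_update,
           obs_prob, exp_reward, f, gamma=0.95):
    if h == 0:
        return f(b)                          # leaf: evaluation function
    best = float("-inf")
    for a in actions:
        b_a = belief_predict(b, a)           # predict step
        val = 0.0
        for o in observations:
            p = obs_prob(o, b_a)             # P(o | b_a)
            if p > 0:
                val += p * backup(belief_update(b_a, o), h - 1, actions,
                                  observations, belief_predict, belief_update,
                                  obs_prob, exp_reward, f, gamma)
        best = max(best, val)
    return exp_reward(b) + gamma * best      # E[R(s)|b] + gamma * max_a ...
```

Each recursive call corresponds to one (action, observation) edge of the belief-space search tree.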

Page 46:

QMDP EVALUATION FUNCTION

One possible evaluation function is to compute the expectation of the underlying MDP value function over the leaf belief states:

f(b) = Σs UMDP(s) b(s)

“Averaging over clairvoyance”:
• Assumes the problem becomes instantly fully observable
• Is optimistic: U(b) ≤ f(b)
• Approaches the POMDP value function as state and sensing uncertainty decrease
• In the extreme h=1 case, this is called the QMDP policy
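As a sketch, f(b) is just a dot product between the belief vector and a precomputed MDP value function (assumed given):

```python
import numpy as np

def qmdp_eval(b, U_mdp):
    """f(b) = sum_s U_MDP(s) b(s): the optimistic 'clairvoyant' estimate."""
    return float(np.dot(U_mdp, b))
```

Because U_MDP assumes full observability from the next step on, this value can only overestimate the true POMDP utility, as noted above.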

Page 47:

QMDP POLICY (LITTMAN, CASSANDRA, KAELBLING 1995)

Page 48:

WORST-CASE COMPLEXITY

• Infinite-horizon undiscounted POMDPs are undecidable (by reduction from the halting problem)
• Exact solution of infinite-horizon discounted POMDPs is intractable even for low |S|
• Finite horizon: O(|S|^2 |A|^h |O|^h)
• Receding horizon approximation: one-step regret is O(γ^h)
• Approximate solutions are becoming tractable for |S| in the millions: α-vector point-based techniques, Monte Carlo tree search, … (beyond the scope of this course)

Page 49:

NEXT TIME

Is it possible to learn how to make good decisions just by interacting with the environment?

Reinforcement learning R&N 21.1-2

Page 50:

DUE TODAY

• HW6
• Midterm project report

HW7 available