
Page 1:

Probabilistic Planning

George Konidaris
[email protected]

Spring 2016

Page 2:

The Planning Problem

Finding a sequence of actions to achieve some goal.

Page 3:

Plans

It’s great when a plan works … but the world doesn’t work like that.

To plan effectively we need to take uncertainty seriously.

Page 4:

Probabilistic Planning

As before:
• Generalize logic to probabilities.
• Probabilistic planning.

This results in a harder planning problem. In particular:
• Plans can fail.
• We can no longer compute straight-line plans.

Page 5:

Probabilistic Planning

Recall:
• Problem has a state.
• State has the Markov property.

Markov Decision Processes (MDPs):
• The canonical decision-making formulation.
• Problem has a set of states.
• Agent has available actions.
• Actions cause stochastic state transitions.
• Transitions have costs/rewards.

P(S_t | S_{t-1}, a_{t-1}, S_{t-2}, a_{t-2}, ..., S_0, a_0) = P(S_t | S_{t-1}, a_{t-1})

Page 6:

Markov Decision Processes

<S, A, γ, R, T>
• S: set of states
• A: set of actions
• γ: discount factor
• R: reward function; R(s, a, s′) is the reward received for taking action a in state s and transitioning to state s′.
• T: transition function; T(s′ | s, a) is the probability of transitioning to state s′ after taking action a in state s.

(some states are absorbing - execution stops)
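A minimal Python sketch of how this <S, A, γ, R, T> tuple might be held in code; the class and field names (MDP, states, absorbing, and so on) are illustrative assumptions:

from typing import Callable, Dict, FrozenSet, List, NamedTuple

State = str
Action = str

class MDP(NamedTuple):
    # The tuple <S, A, gamma, R, T>; field names are illustrative.
    states: List[State]                                # S: set of states
    actions: List[Action]                              # A: set of actions
    gamma: float                                       # discount factor
    R: Callable[[State, Action, State], float]         # R(s, a, s'): reward
    T: Callable[[State, Action], Dict[State, float]]   # T(. | s, a): next-state distribution
    absorbing: FrozenSet[State]                        # execution stops in these states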

Page 7:

The Markov Property

Needs to be extended for MDPs:
• s_{t+1} depends only on s_t and a_t
• r_t depends only on s_t and a_t

The current state is a sufficient statistic of the agent’s history. This means that:
• Decision-making depends only on the current state.
• The agent does not need to remember its history.

Page 8:

Example

[Figure: gridworld transition diagram; transition probabilities 0.9, 0.05, 0.05]

States: set of grid locations
Actions: up, down, left, right
Transition function: move in direction of action with p=0.9
Reward function: -1 for every step, 1000 for finding the goal
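A hedged Python sketch of this gridworld’s T and R; the 5x5 grid size, the goal cell, and the assumption that the remaining probability splits into two 0.05 sideways slips are illustrative choices:

# Illustrative gridworld: states are (x, y) cells; size, goal, and slip model are assumptions.
WIDTH, HEIGHT = 5, 5
GOAL = (4, 4)
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def clamp(cell):
    # Keep the agent on the grid.
    x, y = cell
    return (min(max(x, 0), WIDTH - 1), min(max(y, 0), HEIGHT - 1))

def T(s, a):
    # Move in the direction of the action with p=0.9; assume p=0.05 for each sideways slip.
    dx, dy = MOVES[a]
    dist = {}
    for (mx, my), p in [((dx, dy), 0.9), ((dy, dx), 0.05), ((-dy, -dx), 0.05)]:
        cell = clamp((s[0] + mx, s[1] + my))
        dist[cell] = dist.get(cell, 0.0) + p
    return dist

def R(s, a, s_next):
    # -1 for every step, 1000 for finding the goal.
    return 1000.0 if s_next == GOAL else -1.0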

Page 9:

Example

[Figure: three-state MDP with states A, B, and C; transition probabilities 0.8 and 0.2; rewards r=-2 and r=-5]

Page 10:

MDPs

Our goal is to find a policy:

π : S → A

that maximizes the return: the expected discounted sum of rewards (equivalently, minimizes the expected sum of costs):

E[ Σ_{i=0}^{∞} γ^i r_i ]
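For instance, assuming γ = 0.9 and a made-up reward sequence of three steps followed by the goal reward, the return works out as:

gamma = 0.9
rewards = [-1, -1, -1, 1000]   # illustrative: three steps, then the goal reward

G = sum(gamma**i * r for i, r in enumerate(rewards))
print(G)                       # ≈ 726.29  (= -1 - 0.9 - 0.81 + 0.729 * 1000)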

Page 11:

Policies and Plans

Compare a policy:
• An action for every state.

… with a plan:
• A sequence of actions.

Why the difference?

Page 12:

Planning

So our goal is to produce the optimal policy:

π* = argmax_π R^π(s)

Planning: we know T and R.

Define the value function to estimate this quantity:

V^π(s) = E[ Σ_{i=0}^{∞} γ^i r_i ]

V is a useful thing to know. How do we find V?

Page 13:

Monte Carlo Estimation

One approach:
• For each state s, repeat many times:
  • Start at s.
  • Run the policy forward until an absorbing state is reached (or γ^t is small).
  • Write down the discounted sum of rewards received.
  • This is a sample of V(s).
• Average these samples.

This always works! But it has very high variance. Why?
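A sketch of this procedure in Python, assuming T and R like the gridworld sketch above and a fixed policy function; episodes, eps, and absorbing are illustrative parameters:

import random

def mc_value(s0, policy, T, R, gamma=0.9, episodes=1000, eps=1e-4, absorbing=frozenset()):
    # Average the discounted return of many rollouts from s0 under the fixed policy.
    returns = []
    for _ in range(episodes):
        s, G, discount = s0, 0.0, 1.0
        # Stop at an absorbing state, or once gamma^t is too small to matter.
        while s not in absorbing and discount > eps:
            a = policy(s)
            dist = T(s, a)
            s_next = random.choices(list(dist), weights=list(dist.values()))[0]
            G += discount * R(s, a, s_next)
            discount *= gamma
            s = s_next
        returns.append(G)          # one sample of V(s0)
    return sum(returns) / len(returns)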

Page 14:

Bellman

Bellman’s equation is a condition that must hold for V, for a fixed policy π:

V^π(s) = R(s, π(s)) + γ Σ_{s′} T(s′ | s, π(s)) V^π(s′)

(value of this state = reward + discounted expected value of the next state)

Page 15:

Value Iteration

This gives us an algorithm for computing the value function of a specific, fixed policy.

Repeat:
• Make a copy of the VF.
• For each state in the VF, assign its value using the Bellman equation.
• Replace the old VF.

This is known as value iteration. (In practice, only adjust “reachable” states.)

Why do we care so much about the VF?
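A sketch of that loop in Python for a fixed policy, applying the Bellman equation from the previous slide; the names are illustrative, and since R here also takes s′, it sits inside the sum:

def evaluate_policy(states, policy, T, R, gamma=0.9, sweeps=100, absorbing=frozenset()):
    # Repeatedly apply the Bellman equation for the fixed policy to every state.
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_V = dict(V)                     # make a copy of the VF
        for s in states:
            if s in absorbing:
                continue                    # absorbing states keep value 0
            a = policy(s)
            new_V[s] = sum(p * (R(s, a, s2) + gamma * V[s2])
                           for s2, p in T(s, a).items())
        V = new_V                           # replace the old VF
    return V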

Page 16:

Policy Iteration

We can adjust the policy given the VF:

π(s) = argmax_a [ R(s, a) + γ Σ_{s′} T(s′ | s, a) V(s′) ]

Adjust the policy to be greedy w.r.t. the VF.

We can alternate value and policy iteration. Surprising results:
• This converges even if we alternate every step.
• Converges to the optimal policy.
• Converges in polynomial time.
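A sketch of the greedy-improvement step and the alternation in Python, reusing the evaluate_policy sketch above; all names are illustrative:

def greedy_policy(states, actions, V, T, R, gamma=0.9):
    # Adjust the policy to pick the action with the highest one-step lookahead value.
    pi = {}
    for s in states:
        pi[s] = max(actions,
                    key=lambda a: sum(p * (R(s, a, s2) + gamma * V[s2])
                                      for s2, p in T(s, a).items()))
    return pi

def policy_iteration(states, actions, T, R, gamma=0.9, rounds=20):
    # Alternate evaluation and greedy improvement; converges to the optimal policy.
    pi = {s: actions[0] for s in states}        # arbitrary initial policy
    for _ in range(rounds):
        V = evaluate_policy(states, lambda s: pi[s], T, R, gamma)
        pi = greedy_policy(states, actions, V, T, R, gamma)
    return pi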

Page 17:

Elevator Scheduling

Crites and Barto (1996):
• System with 4 elevators, 10 floors.
• Realistic simulator.
• 46-dimensional state space.

Page 18:

MicroMAP

“Drivers and Loads” (trucking), CASTLE Lab at Princeton.

“the model was used by 20 of the largest truckload carriers to dispatch over 66,000 drivers”

Page 19:

Back to PDDL

Note that an MDP does not contain the structure of PDDL.

If we wish to combine that structure with probabilistic planning, we can use a related language called PPDDL - the Probabilistic Planning Domain Definition Language.

Page 20:

PPDDL Operators

Now operators have probabilistic outcomes:

(:action move_left
  :parameters (x, y)
  :precondition (not (wall(x-1, y)))
  :effect (probabilistic
            0.8 (and (at(x-1)) (not (at(x))) (decrease (reward) 1))
            0.2 (and (at(x+1)) (not (at(x))) (decrease (reward) 1))))

Page 21:

PPDDL

Instead of computing a plan, we again need a policy.

Most planners:
• Take PPDDL as input.
• Use a simulator.
• Compute a policy for the states reachable from the start state.
• Are evaluated jointly with the simulator.

Page 22:

Robot Planning

One more common type of planning left!