top-down and bottom-up approaches to planning with partial

IJCAI-2011 Workshop on Decision Makingin Partially Observable, Uncertain Worlds

Top-Down and Bottom-Up Approaches toPlanning with Partial Observability

Hector Geffner

ICREA & Universitat Pompeu Fabra

Barcelona, Spain

Joint work with Blai Bonet, Hector Palacios, Alex Albore, ...

Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 1

Planning and Autonomous Behavior

Three approaches for agent control; i.e., for deciding what do next:

• Programming-based: Specify control by hand

• Learning-based: Learn control from experience

• Model-based: Derive control from a model

Planning is the model-based approach where agent controller derived from modelof the actions, sensors, and goals.

Different models yield different types of controllers . . .


Basic State Model: Classical Planning

• finite and discrete state space S

• a known initial state s0 ∈ S

• a set SG ⊆ S of goal states

• actions A(s) ⊆ A applicable in each s ∈ S

• a deterministic transition function s′ = f(a, s) for a ∈ A(s)

• positive action costs c(a, s)

A solution is a sequence of applicable actions that maps s0 into SG, and it isoptimal if it minimizes sum of action costs (e.g., # of steps)

Resulting controller is open-loop

Different models and controllers obtained by relaxing assumptions in bold . . .


Uncertainty but No Feedback: Conformant Planning

• finite and discrete state space S

• a set of possible initial state S0 ∈ S

• a set SG ⊆ S of goal states

• actions A(s) ⊆ A applicable in each s ∈ S

• a non-deterministic transition function F (a, s) ⊆ S for a ∈ A(s)

• uniform action costs c(a, s)

A solution is still an action sequence but must achieve the goal for any possibleinitial state and transition

More complex than classical planning, verifying that a plan is conformant in-tractable in the worst case; but special case of planning with partial observability


Planning with Markov Decision Processes

MDPs are fully observable, probabilistic state models:

• a state space S

• initial state s0 ∈ S

• a set G ⊆ S of goal states

• actions A(s) ⊆ A applicable in each state s ∈ S

• transition probabilities Pa(s′|s) for s ∈ S and a ∈ A(s)

• action costs c(a, s) > 0

– Solutions are functions (policies) mapping states into actions

– Optimal solutions minimize expected cost to goal


Partially Observable MDPs (POMDPs)

POMDPs are partially observable, probabilistic state models:

• states s ∈ S

• a set G ⊆ S of observable goal states

• actions A(s) ⊆ A

• transition probabilities Pa(s′|s) for s ∈ S and a ∈ A(s)

• initial belief state b0

• sensor model given by probabilities Pa(o|s), o ∈ Obs

– Belief states are probability distributions over S

– Solutions are policies that map belief states into actions

– Optimal policies minimize expected cost to go from b0 to G


Models, Languages, and Solvers

• A planner is a solver over a class of models; it takes a model description, andcomputes the corresponding controller

Model =⇒ Planner =⇒ Controller

• Many models and solution forms; all intractable and challenge is to scale up

• Models described in compact form trough languages such as PDDL and PPDDL.

• (PO)MDPs above goal-based; discounted version common but less general

• Qualitative MDPs and POMDPs, where probabilities replaced by sets, andexpected cost by cost in worst case, also useful and challenging

• Contingent Planning or Planning with PO refer to such Qualitative POMDPs


Status AI Planning

• Classical Planning works

. Large problems solved very fast

. Key idea, but not only one: derivation and use of heuristics

• Model simple but useful

. Operators not primitive; can be policies themselves

. Fast closed-loop replanning able to cope with uncertainty sometimes

• Beyond Classical Planning

. Top-down approach: native general solvers for MDPs, POMDPs, ...

. Bottom-up approach: transformations and use of classical planners

I’ll focus on transformations for dealing with incomplete information and partialobservability


Illustration: Power of Transformations

• Problem P : find green block using visual-marker (circle) that can move aroundone cell at a time (a la Chapman and Ballard)

• Observables: Whether cell marked contains a green block (G), non-green block(B), or neither (C); and whether on table (T) or not (–)

q0

TB/Up-B/Up

TC/Right

q1-C/Down

TB/Right

-B/Down

• Controller on the right solves the problem, and not only that, it’s compact andgeneral: it applies to any number of blocks and any configuration!

Controller obtained by running a classical planner over transformed problem!


Another Illustration: Compilation of Soft Goals

• Planning with soft goals aimed at plans π that maximize utility

u(π) =∑

p∈do(π,s0)u(p) −

∑a∈π

c(a)

• Actions have cost c(a), and soft goals utility u(p)

• Best plans achieve best tradeoff between action costs and utilities

• Model used in recent planning competitions; e.g., net-benefit track 2008 IPC

• Yet it turns that soft goals do not add expressive power, and can be compiledaway . . .


Compilation of Soft Goals (cont.)

• For each soft goal p, create new hard goal p′ initially false, and two newactions:

. collect(p) with precondition p, effect p′ and cost 0, and

. forego(p) with an empty precondition, effect p′ and cost u(p)

• Plans π maximize u(π) iff minimize c(π) =∑a∈π c(a) in resulting problem

(Keyder & G. 2009)

• Many other non-classical features can be compiled away often quite effectively:

. LTL constraints and goals , Control knowledge, HTN hierarchies, ...

We focus next on compiling away uncertainty . . .


Incomplete Information with No Sensing: Conformant Planning

GI

Problem: A robot must move from an uncertain I into G with certainty, one cellat a time, in a grid nxn

• Problem very much like a classical planning problem except for uncertain I

• Plans, however, quite different: best conformant plan must move the robotto a corner first (localization)

• Conformant planning not too interesting in itself; yet provides basis to bestplanners that sense and captures special case


Conformant Planning: Belief State Formulation

GI

• call a set of possible states, a belief state

• actions then map a belief state b into a bel state ba = {s′ |s′ ∈ F (a, s) & s ∈ b}

• conformant problem becomes a path-finding problem in belief space

Problem: number of belief state is doubly exponential in number of variables.

– effective representation of belief states b

– effective heuristic h(b) for estimating cost in belief space

Recent alternative: translate into classical planning . . .


Basic Translation: Move to the ’Knowledge Level’

Given conformant problem P = 〈F,O, I,G〉

• F stands for the fluents in P

• O for the operators with effects C → L

• I for the initial situation (clauses over F -literals)

• G for the goal situation (set of F -literals)

Define classical problem K0(P ) = 〈F ′, O′, I ′, G′〉 as

• F ′ = {KL,K¬L | L ∈ F}• I ′ = {KL | clause L ∈ I}• G′ = {KL | L ∈ G}• O′ = O but preconds L replaced by KL, and effects C → L replaced by KC → KL

(supports) and ¬K¬C → ¬K¬L (cancellation)

K0(P ) is sound but incomplete: every classical plan that solves K0(P ) is aconformant plan for P , but not vice versa.


Key elements in Complete Translation KT,M(P )

• A set T of tags t: consistent sets of assumptions (literals) about the initialsituation I

I 6|= ¬t

• A set M of merges m: valid subsets of tags (= DNF)

I |=∨t∈m

t

• New (tagged) literals KL/t meaning that L is true if t true initially


General Translation Scheme KT,M(P )

Given conformant problem P = 〈F,O, I,G〉

• F stands for the fluents in P

• O for the operators with effects C → L

• I for the initial situation (clauses over F -literals)

• G for the goal situation (set of F -literals)

define classical problem KT,M(P ) = 〈F ′, O′, I ′, G′〉 as

• F ′ = {KL/t , K¬L/t | L ∈ F and t ∈ T}• I ′ = {KL/t | if I |= t ⊃ L}• G′ = {KL | L ∈ G}• O′ = O but preconds L replaced by KL, and effects C → L replaced by KC/t → KL/t

(supports) and ¬K¬C/t→ ¬K¬L/t (cancellation), and new merge actions∧t∈m,m∈M

KL/t → KL

The two parameters T and M are the set of tags (assumptions) and the set of merges (valid sets

of assumptions) . . .


Compiling Uncertainty Away: Properties

• General translation scheme KT,M(P ) is always sound, and for suitable choice ofthe sets of tags and merges, it is complete.

• KS0(P ) is complete instance of KT,M(P ) obtained by setting T to the set ofpossible initial states of P

• Ki(P ) is a polynomial instance of KT,M(P ) that is complete for problemswith width bounded by constant i.

• The width of many benchmarks bounded and equal 1!

• Such problems can be solved with a classical planner after a low poly translation(Palacios & G. 2009).


Next: Partial Observability and Sensing

1. Full translation-based account of contingent planning

. A translation-based approach to contingent planning, A. Albore and H. Palacios and H.

Geffner. IJCAI-09

2. Derivation of finite-state controllers using classical planners

. Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical

Planners, B. Bonet and H. Palacios and H. Geffner, IJCAI-09.

3. On-line planning with classical planners

. Planning under Partial Observability by Classical Replanning: Theory and Experiments, B.

Bonet and H. Geffner, IJCAI-2011

4. Approaches that combine translations and sampling

. Effective Heuristics and Belief Tracking for Planning with Incomplete Information, A.

Albore and M. Ramirez and H. Geffner, ICAPS-2011

. Replanning in Domains with Partial Information and Sensing Actions, G. Shani and R.

Brafman, IJCAI-2011

I’ll focus on 2 and 3; Guy Shani will talk about 4


On-line Planning for Simple Partially Observable Problems

• Partial observable problem is P = 〈F,O, I,G,M〉 where

. F is set of fluents

. O is set of actions with conditional effects a : C → L

. I is set of F -clauses defining uncertain initial situation

. G is set of F -literals defining goal

. M is set of sensors (C,L): if C true, L is observed

• Simple problems satisfy the following two restrictions:

. non-unary clauses in I are invariant

. hidden fluents do not appear in body of conditional effects

• Restrictions ensure that beliefs can be represented by literals:

. Theorem: reachable beliefs b for simple problems captured by literals truein b and the invariants

Beliefs ‘solved’ in simple problems; the problem left is action selection


Action Selection via Translation into Classical Planning

For P = 〈F,O, I,G,M〉, define classical problem K(P ) = 〈F ′, O′, I ′, G′〉 with

• F ′ = {KL,K¬L : L ∈ F}• I ′ = {KL : I |= L}• G′ = {KL : L ∈ G}• O′ = Oorig ∪Oinv ∪Osen where

. Oorig = {a : KC → KL , a : ¬K¬C → ¬K¬L | a : C → L ∈ O}

. Oinv = {KC → KL | ¬C ∨ L ∈ I} (invariants)

. Osen = {A(C,L), A(C,¬L) | sensor (C,L) ∈M} (assumptions)

where

– Actions a in K(P ) from P have the preconds L replaced by KL, and

– Actions A(C,L) have preconds KC,¬KL,¬K¬L and effect KL

Let K(P )[s] be K(P ) with initial ‘knowledge state’ I ′ = s.


Properties of plans π for classical problem K(P )[s]

• Classical plans π for K(P )[s] achieve goal but cannot be executed all the wayas they may contain ‘non-real’ actions A(C,L) or A(C,¬L) for sensors (C,L)

• These actions encode assumptions about the value of L that will be observedonce C is made true

• Yet the non-empty prefix of π up to the first such action can be executed

• Since the precondition of these actions includes KC, ¬KL, and ¬K¬L, suchprefix must uncover the truth of the hidden literal L

• As a result, the executable prefix of π either uncovers the truth of a hiddenliteral or achieves the goal (if there are no actions A(C,L) or A(C,¬L) in π)

• This is crucial for the scheme to be complete for simple problems, andanalogous to the explore or exploit principle in PAC-MDP (Kearns & Singh2002),


K-Replanner

1. Let s be the initial state in K(P )

2. While s is not a goal in K(P ) do

3. Compute classical plan π for classical problem K(P )[s]

4. Set π′ := executable prefix of π

5. Set s′ := result of applying π′ to K(P )[s]

6. Add observations to s′ and close s′ with the invariants

7. Set s := s′

Theorem (Soundness): If K-Replanner terminates, it terminates in the goal

Theorem (Completeness): if P is a simple problem with a connected spaceand the non-unary clauses in I are in prime implicate form, then the K-Replannerterminates.


Examples

RB

GRB

GR

R

GG

B

W

K-Replanner using FF

domain #inst. #solved avg. length avg. #calls avg. srch. time

wumpus 70 70 34.60 2.39 1.11

kill-wumpus 30 30 49.20 31.10 1.92

colored-balls 75 26 115.81 27.54 26.51

trail 140 133 22.77 8.18 7.54

K-Replanner using Mp on colored-balls

instance #states length #calls srch. time total time

9×9-15balls-#2 3.68e39 298 65 190.4 428.9

9×9-15balls-#4 3.68e39 369 73 816.1 1,164.8

9×9-17balls-#2 3.87e44 339 69 721.7 1,082.2

9×9-18balls-#4 1.25e47 406 69 533.6 857.0


Finite State Controllers: Example

• Starting in A, move to B and back to A; marks A and B observable.

A B

• This finite-state controller solves the problem

q0

A/Right-/Right

q1B/Left

-/Left

• FSC is compact and general: can add noise, vary distance, etc.

• Heavily used in practice, e.g. video-games and robotics, but written by hand

• The Challenge: How to get these controllers automatically


Basic Notions

• Target problem P is like a classical problem with possible incomplete initialsituation I and some observable fluents

• Finite State Controller C is a set of tuples t = 〈q, o, a, q′〉:

. tuple t = 〈q, o, a, q′〉, depicted qo/a→ q′, tells to do action a

when o is observed in controller state q and then to switch to q′

. no two tuples t = 〈q, o, a, q′〉 and t′ = 〈q, o, a′, q′′〉 for same q, o

• Finite State Controller C solves P if all state trajectories compatible with P andC reach the goal

Question: how to derive FSC from P?


Idea: FSCs as Conformant Plans of Transformed Problem

For P = 〈F, I,O,G〉 and N controller states q, let P ′ = 〈F ′, I ′, O′, G′〉 with

• F ′ = F along with fluents q0, . . . , qN−1, and joint observations o

• I ′ = I ∪ {q0}• G′ = G

• O′ = actions a(q, o, q′) for every a in P , obs o, and q, q′:

. a(q, o, q′) is like action a but conditional on q and o being true

. a(q, o, q′) also makes q′ true if q and o are true

In addition, axioms and bookkeeping for

– expressing dependence of o on ‘primitive’ fluents,

– keeping q fluents mutex, and

– preventing two actions a(q, o, q′) and b(q, o, q′′) for same pair q, o in plan

Theorem: The finite state controller C solves P iff C contains the tuples〈q, o, a, q′〉 that correspond to the actions a(q, o, q′) in a conformant plan for P ′.


Example Revisited

A Bq0

A/Right-/Right

q1B/Left

-/Left

For this problem, compilation yields P ′ = 〈F ′, I ′, O′, G′〉 where:

• F ′ contains at(p1), . . . , at(p5), visited(B); q0, q1; oA, oB, o−• I ′ given by at(p1), q0, oA• G′ given by at(p1) and visited(B)

• O′ contains new actions a(q, o, q′) for a = Left, a = Right, and each q and o

. e.g., action Right(q0, oA, q1) has effects

q0, oA, at(pi)→ at(pi+1)&¬at(pi)q0, oA → q1&¬q0

• Axioms enforce oA iff at(p1), oB iff at(p5), etc.

– Resulting problem P ′ in this case is classical as I not uncertain

– Finite-state controller above ‘read off’ from the classical plan for P ′


Example 2: FSC for Visual Marker Problem

• Problem P : find green block using visual-marker (circle) that can move aroundone cell at a time (a la Chapman or Ballard)

• Observables: Whether cell marked contains a green block (G), non-green block(B), or neither (C); and whether on table (T) or not (–)

q0

TB/Up-B/Up

TC/Right

q1-C/Down

TB/Right

-B/Down

• Controller obtained using a classical planner from translation that assumes 2controller states.

• Since initial situation of P is uncertain, resulting problem P ′ is conformant;translated into classical through KS0


Summary

• Planning is model-based approach to autonomous behavior

• Many models and dimensions; all intractable, challenge is scale up

• Key technique in classical planning is derivation and use of heuristics

• Power of classical planners used for other tasks via transformations; e.g.,

. conformant planning

. on-line planning with partial observability

. derivation of finite state-controllers

• Important difference to MDP and POMDPs solvers:

. inference at level of variables, not at level of states or beliefs

• Challenge: add probabilities and narrow gap with POMDPs


Another Challenge: Russell’s and Norvig’s Wumpus World

Wumpus World PEAS description

Performance measuregold +1000, death -1000-1 per step, -10 for using the arrow

EnvironmentSquares adjacent to wumpus are smellySquares adjacent to pit are breezyGlitter iff gold is in the same squareShooting kills wumpus if you are facing itShooting uses up the only arrowGrabbing picks up gold if in same squareReleasing drops the gold in same square

Breeze Breeze

Breeze

BreezeBreeze

Stench

Stench

BreezePIT

PIT

PIT

1 2 3 4

1

2

3

4

START

Gold

Stench

Actuators Left turn, Right turn,Forward, Grab, Release, Shoot

Sensors Breeze, Glitter, Smell

Chapter 7 5Options for scaling up: Compilation with classical planners; UCT methods forPOMDPs (Silver & Veness 2010); .... ??


References

[1] A. Albore, H. Palacios, and H. Geffner. A translation-based approach to contingent planning. In Proc. IJCAI-09.

[2] A. Albore, M. Ramirez, and H. Geffner. Effective heuristics and belief tracking for planning with incompleteinformation. In Proc. ICAPS-2011.

[3] J. Baier and S. McIlraith. Exploiting procedural domain control knowledge in state-of-the-art planners. In Proc.ICAPS-07.

[4] J. Baier and S. McIlraith. Planning with temporally extended goals using heuristic search. In Proc. ICAPS-06.

[5] B. Bonet and H. Geffner. Planning under partial observability by classical replanning. In Proc. IJCAI-2011.

[6] B. Bonet and H. Geffner. Planning with incomplete information as heuristic search in belief space. In Proc.AIPS-2000.

[7] B. Bonet, H. Palacios, and H. Geffner. Automatic derivation of memoryless policies and finite-state controllersusing classical planners. In Proc. ICAPS-09.

[8] E. Keyder and H. Geffner. Soft goals can be compiled away. JAIR, 2009.

[9] M. Lekavy and P. Navrat. Expressivity of strips-like and htn-like planning. Agent and Multi-Agent Systems:Technologies and Applications, 2007.

[10] H. Palacios and H. Geffner. Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width.JAIR, 2009.

[11] R. Petrick and F. Bacchus. A knowledge-based approach to planning with incomplete information and sensing. InProc. AIPS’02.

[12] G. Shani and R. Brafman. Replanning in domains with partial information and sensing actions. In Proc. IJCAI-2011.

[13] D. Silver and J. Veness. Monte-carlo planning in large pomdps. Advances in Neural Information Processing Systems(NIPS), 2010.

[14] S. Yoon, A. Fern, and R. Givan. FF-replan: A baseline for probabilistic planning. In Proc. ICAPS-07.


top-down and bottom-up approaches to planning with partial

Documents