top-down and bottom-up approaches to planning with partial
TRANSCRIPT
IJCAI-2011 Workshop on Decision Makingin Partially Observable, Uncertain Worlds
Top-Down and Bottom-Up Approaches toPlanning with Partial Observability
Hector Geffner
ICREA & Universitat Pompeu Fabra
Barcelona, Spain
Joint work with Blai Bonet, Hector Palacios, Alex Albore, ...
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 1
Planning and Autonomous Behavior
Three approaches for agent control; i.e., for deciding what do next:
• Programming-based: Specify control by hand
• Learning-based: Learn control from experience
• Model-based: Derive control from a model
Planning is the model-based approach where agent controller derived from modelof the actions, sensors, and goals.
Different models yield different types of controllers . . .
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 2
Basic State Model: Classical Planning
• finite and discrete state space S
• a known initial state s0 ∈ S
• a set SG ⊆ S of goal states
• actions A(s) ⊆ A applicable in each s ∈ S
• a deterministic transition function s′ = f(a, s) for a ∈ A(s)
• positive action costs c(a, s)
A solution is a sequence of applicable actions that maps s0 into SG, and it isoptimal if it minimizes sum of action costs (e.g., # of steps)
Resulting controller is open-loop
Different models and controllers obtained by relaxing assumptions in bold . . .
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 3
Uncertainty but No Feedback: Conformant Planning
• finite and discrete state space S
• a set of possible initial state S0 ∈ S
• a set SG ⊆ S of goal states
• actions A(s) ⊆ A applicable in each s ∈ S
• a non-deterministic transition function F (a, s) ⊆ S for a ∈ A(s)
• uniform action costs c(a, s)
A solution is still an action sequence but must achieve the goal for any possibleinitial state and transition
More complex than classical planning, verifying that a plan is conformant in-tractable in the worst case; but special case of planning with partial observability
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 4
Planning with Markov Decision Processes
MDPs are fully observable, probabilistic state models:
• a state space S
• initial state s0 ∈ S
• a set G ⊆ S of goal states
• actions A(s) ⊆ A applicable in each state s ∈ S
• transition probabilities Pa(s′|s) for s ∈ S and a ∈ A(s)
• action costs c(a, s) > 0
– Solutions are functions (policies) mapping states into actions
– Optimal solutions minimize expected cost to goal
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 5
Partially Observable MDPs (POMDPs)
POMDPs are partially observable, probabilistic state models:
• states s ∈ S
• a set G ⊆ S of observable goal states
• actions A(s) ⊆ A
• transition probabilities Pa(s′|s) for s ∈ S and a ∈ A(s)
• initial belief state b0
• sensor model given by probabilities Pa(o|s), o ∈ Obs
– Belief states are probability distributions over S
– Solutions are policies that map belief states into actions
– Optimal policies minimize expected cost to go from b0 to G
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 6
Models, Languages, and Solvers
• A planner is a solver over a class of models; it takes a model description, andcomputes the corresponding controller
Model =⇒ Planner =⇒ Controller
• Many models and solution forms; all intractable and challenge is to scale up
• Models described in compact form trough languages such as PDDL and PPDDL.
• (PO)MDPs above goal-based; discounted version common but less general
• Qualitative MDPs and POMDPs, where probabilities replaced by sets, andexpected cost by cost in worst case, also useful and challenging
• Contingent Planning or Planning with PO refer to such Qualitative POMDPs
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 7
Status AI Planning
• Classical Planning works
. Large problems solved very fast
. Key idea, but not only one: derivation and use of heuristics
• Model simple but useful
. Operators not primitive; can be policies themselves
. Fast closed-loop replanning able to cope with uncertainty sometimes
• Beyond Classical Planning
. Top-down approach: native general solvers for MDPs, POMDPs, ...
. Bottom-up approach: transformations and use of classical planners
I’ll focus on transformations for dealing with incomplete information and partialobservability
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 8
Illustration: Power of Transformations
• Problem P : find green block using visual-marker (circle) that can move aroundone cell at a time (a la Chapman and Ballard)
• Observables: Whether cell marked contains a green block (G), non-green block(B), or neither (C); and whether on table (T) or not (–)
q0
TB/Up-B/Up
TC/Right
q1-C/Down
TB/Right
-B/Down
• Controller on the right solves the problem, and not only that, it’s compact andgeneral: it applies to any number of blocks and any configuration!
Controller obtained by running a classical planner over transformed problem!
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 9
Another Illustration: Compilation of Soft Goals
• Planning with soft goals aimed at plans π that maximize utility
u(π) =∑
p∈do(π,s0)u(p) −
∑a∈π
c(a)
• Actions have cost c(a), and soft goals utility u(p)
• Best plans achieve best tradeoff between action costs and utilities
• Model used in recent planning competitions; e.g., net-benefit track 2008 IPC
• Yet it turns that soft goals do not add expressive power, and can be compiledaway . . .
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 10
Compilation of Soft Goals (cont.)
• For each soft goal p, create new hard goal p′ initially false, and two newactions:
. collect(p) with precondition p, effect p′ and cost 0, and
. forego(p) with an empty precondition, effect p′ and cost u(p)
• Plans π maximize u(π) iff minimize c(π) =∑a∈π c(a) in resulting problem
(Keyder & G. 2009)
• Many other non-classical features can be compiled away often quite effectively:
. LTL constraints and goals , Control knowledge, HTN hierarchies, ...
We focus next on compiling away uncertainty . . .
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 11
Incomplete Information with No Sensing: Conformant Planning
GI
Problem: A robot must move from an uncertain I into G with certainty, one cellat a time, in a grid nxn
• Problem very much like a classical planning problem except for uncertain I
• Plans, however, quite different: best conformant plan must move the robotto a corner first (localization)
• Conformant planning not too interesting in itself; yet provides basis to bestplanners that sense and captures special case
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 12
Conformant Planning: Belief State Formulation
GI
• call a set of possible states, a belief state
• actions then map a belief state b into a bel state ba = {s′ |s′ ∈ F (a, s) & s ∈ b}
• conformant problem becomes a path-finding problem in belief space
Problem: number of belief state is doubly exponential in number of variables.
– effective representation of belief states b
– effective heuristic h(b) for estimating cost in belief space
Recent alternative: translate into classical planning . . .
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 13
Basic Translation: Move to the ’Knowledge Level’
Given conformant problem P = 〈F,O, I,G〉
• F stands for the fluents in P
• O for the operators with effects C → L
• I for the initial situation (clauses over F -literals)
• G for the goal situation (set of F -literals)
Define classical problem K0(P ) = 〈F ′, O′, I ′, G′〉 as
• F ′ = {KL,K¬L | L ∈ F}• I ′ = {KL | clause L ∈ I}• G′ = {KL | L ∈ G}• O′ = O but preconds L replaced by KL, and effects C → L replaced by KC → KL
(supports) and ¬K¬C → ¬K¬L (cancellation)
K0(P ) is sound but incomplete: every classical plan that solves K0(P ) is aconformant plan for P , but not vice versa.
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 14
Key elements in Complete Translation KT,M(P )
• A set T of tags t: consistent sets of assumptions (literals) about the initialsituation I
I 6|= ¬t
• A set M of merges m: valid subsets of tags (= DNF)
I |=∨t∈m
t
• New (tagged) literals KL/t meaning that L is true if t true initially
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 15
General Translation Scheme KT,M(P )
Given conformant problem P = 〈F,O, I,G〉
• F stands for the fluents in P
• O for the operators with effects C → L
• I for the initial situation (clauses over F -literals)
• G for the goal situation (set of F -literals)
define classical problem KT,M(P ) = 〈F ′, O′, I ′, G′〉 as
• F ′ = {KL/t , K¬L/t | L ∈ F and t ∈ T}• I ′ = {KL/t | if I |= t ⊃ L}• G′ = {KL | L ∈ G}• O′ = O but preconds L replaced by KL, and effects C → L replaced by KC/t → KL/t
(supports) and ¬K¬C/t→ ¬K¬L/t (cancellation), and new merge actions∧t∈m,m∈M
KL/t → KL
The two parameters T and M are the set of tags (assumptions) and the set of merges (valid sets
of assumptions) . . .
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 16
Compiling Uncertainty Away: Properties
• General translation scheme KT,M(P ) is always sound, and for suitable choice ofthe sets of tags and merges, it is complete.
• KS0(P ) is complete instance of KT,M(P ) obtained by setting T to the set ofpossible initial states of P
• Ki(P ) is a polynomial instance of KT,M(P ) that is complete for problemswith width bounded by constant i.
• The width of many benchmarks bounded and equal 1!
• Such problems can be solved with a classical planner after a low poly translation(Palacios & G. 2009).
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 17
Next: Partial Observability and Sensing
1. Full translation-based account of contingent planning
. A translation-based approach to contingent planning, A. Albore and H. Palacios and H.
Geffner. IJCAI-09
2. Derivation of finite-state controllers using classical planners
. Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical
Planners, B. Bonet and H. Palacios and H. Geffner, IJCAI-09.
3. On-line planning with classical planners
. Planning under Partial Observability by Classical Replanning: Theory and Experiments, B.
Bonet and H. Geffner, IJCAI-2011
4. Approaches that combine translations and sampling
. Effective Heuristics and Belief Tracking for Planning with Incomplete Information, A.
Albore and M. Ramirez and H. Geffner, ICAPS-2011
. Replanning in Domains with Partial Information and Sensing Actions, G. Shani and R.
Brafman, IJCAI-2011
I’ll focus on 2 and 3; Guy Shani will talk about 4
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 18
On-line Planning for Simple Partially Observable Problems
• Partial observable problem is P = 〈F,O, I,G,M〉 where
. F is set of fluents
. O is set of actions with conditional effects a : C → L
. I is set of F -clauses defining uncertain initial situation
. G is set of F -literals defining goal
. M is set of sensors (C,L): if C true, L is observed
• Simple problems satisfy the following two restrictions:
. non-unary clauses in I are invariant
. hidden fluents do not appear in body of conditional effects
• Restrictions ensure that beliefs can be represented by literals:
. Theorem: reachable beliefs b for simple problems captured by literals truein b and the invariants
Beliefs ‘solved’ in simple problems; the problem left is action selection
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 19
Action Selection via Translation into Classical Planning
For P = 〈F,O, I,G,M〉, define classical problem K(P ) = 〈F ′, O′, I ′, G′〉 with
• F ′ = {KL,K¬L : L ∈ F}• I ′ = {KL : I |= L}• G′ = {KL : L ∈ G}• O′ = Oorig ∪Oinv ∪Osen where
. Oorig = {a : KC → KL , a : ¬K¬C → ¬K¬L | a : C → L ∈ O}
. Oinv = {KC → KL | ¬C ∨ L ∈ I} (invariants)
. Osen = {A(C,L), A(C,¬L) | sensor (C,L) ∈M} (assumptions)
where
– Actions a in K(P ) from P have the preconds L replaced by KL, and
– Actions A(C,L) have preconds KC,¬KL,¬K¬L and effect KL
Let K(P )[s] be K(P ) with initial ‘knowledge state’ I ′ = s.
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 20
Properties of plans π for classical problem K(P )[s]
• Classical plans π for K(P )[s] achieve goal but cannot be executed all the wayas they may contain ‘non-real’ actions A(C,L) or A(C,¬L) for sensors (C,L)
• These actions encode assumptions about the value of L that will be observedonce C is made true
• Yet the non-empty prefix of π up to the first such action can be executed
• Since the precondition of these actions includes KC, ¬KL, and ¬K¬L, suchprefix must uncover the truth of the hidden literal L
• As a result, the executable prefix of π either uncovers the truth of a hiddenliteral or achieves the goal (if there are no actions A(C,L) or A(C,¬L) in π)
• This is crucial for the scheme to be complete for simple problems, andanalogous to the explore or exploit principle in PAC-MDP (Kearns & Singh2002),
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 21
K-Replanner
1. Let s be the initial state in K(P )
2. While s is not a goal in K(P ) do
3. Compute classical plan π for classical problem K(P )[s]
4. Set π′ := executable prefix of π
5. Set s′ := result of applying π′ to K(P )[s]
6. Add observations to s′ and close s′ with the invariants
7. Set s := s′
Theorem (Soundness): If K-Replanner terminates, it terminates in the goal
Theorem (Completeness): if P is a simple problem with a connected spaceand the non-unary clauses in I are in prime implicate form, then the K-Replannerterminates.
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 22
Examples
RB
GRB
GR
R
GG
B
W
K-Replanner using FF
domain #inst. #solved avg. length avg. #calls avg. srch. time
wumpus 70 70 34.60 2.39 1.11
kill-wumpus 30 30 49.20 31.10 1.92
colored-balls 75 26 115.81 27.54 26.51
trail 140 133 22.77 8.18 7.54
K-Replanner using Mp on colored-balls
instance #states length #calls srch. time total time
9×9-15balls-#2 3.68e39 298 65 190.4 428.9
9×9-15balls-#4 3.68e39 369 73 816.1 1,164.8
9×9-17balls-#2 3.87e44 339 69 721.7 1,082.2
9×9-18balls-#4 1.25e47 406 69 533.6 857.0
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 23
Finite State Controllers: Example
• Starting in A, move to B and back to A; marks A and B observable.
A B
• This finite-state controller solves the problem
q0
A/Right-/Right
q1B/Left
-/Left
• FSC is compact and general: can add noise, vary distance, etc.
• Heavily used in practice, e.g. video-games and robotics, but written by hand
• The Challenge: How to get these controllers automatically
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 24
Basic Notions
• Target problem P is like a classical problem with possible incomplete initialsituation I and some observable fluents
• Finite State Controller C is a set of tuples t = 〈q, o, a, q′〉:
. tuple t = 〈q, o, a, q′〉, depicted qo/a→ q′, tells to do action a
when o is observed in controller state q and then to switch to q′
. no two tuples t = 〈q, o, a, q′〉 and t′ = 〈q, o, a′, q′′〉 for same q, o
• Finite State Controller C solves P if all state trajectories compatible with P andC reach the goal
Question: how to derive FSC from P?
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 25
Idea: FSCs as Conformant Plans of Transformed Problem
For P = 〈F, I,O,G〉 and N controller states q, let P ′ = 〈F ′, I ′, O′, G′〉 with
• F ′ = F along with fluents q0, . . . , qN−1, and joint observations o
• I ′ = I ∪ {q0}• G′ = G
• O′ = actions a(q, o, q′) for every a in P , obs o, and q, q′:
. a(q, o, q′) is like action a but conditional on q and o being true
. a(q, o, q′) also makes q′ true if q and o are true
In addition, axioms and bookkeeping for
– expressing dependence of o on ‘primitive’ fluents,
– keeping q fluents mutex, and
– preventing two actions a(q, o, q′) and b(q, o, q′′) for same pair q, o in plan
Theorem: The finite state controller C solves P iff C contains the tuples〈q, o, a, q′〉 that correspond to the actions a(q, o, q′) in a conformant plan for P ′.
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 26
Example Revisited
A Bq0
A/Right-/Right
q1B/Left
-/Left
For this problem, compilation yields P ′ = 〈F ′, I ′, O′, G′〉 where:
• F ′ contains at(p1), . . . , at(p5), visited(B); q0, q1; oA, oB, o−• I ′ given by at(p1), q0, oA• G′ given by at(p1) and visited(B)
• O′ contains new actions a(q, o, q′) for a = Left, a = Right, and each q and o
. e.g., action Right(q0, oA, q1) has effects
q0, oA, at(pi)→ at(pi+1)&¬at(pi)q0, oA → q1&¬q0
• Axioms enforce oA iff at(p1), oB iff at(p5), etc.
– Resulting problem P ′ in this case is classical as I not uncertain
– Finite-state controller above ‘read off’ from the classical plan for P ′
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 27
Example 2: FSC for Visual Marker Problem
• Problem P : find green block using visual-marker (circle) that can move aroundone cell at a time (a la Chapman or Ballard)
• Observables: Whether cell marked contains a green block (G), non-green block(B), or neither (C); and whether on table (T) or not (–)
q0
TB/Up-B/Up
TC/Right
q1-C/Down
TB/Right
-B/Down
• Controller obtained using a classical planner from translation that assumes 2controller states.
• Since initial situation of P is uncertain, resulting problem P ′ is conformant;translated into classical through KS0
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 28
Summary
• Planning is model-based approach to autonomous behavior
• Many models and dimensions; all intractable, challenge is scale up
• Key technique in classical planning is derivation and use of heuristics
• Power of classical planners used for other tasks via transformations; e.g.,
. conformant planning
. on-line planning with partial observability
. derivation of finite state-controllers
• Important difference to MDP and POMDPs solvers:
. inference at level of variables, not at level of states or beliefs
• Challenge: add probabilities and narrow gap with POMDPs
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 29
Another Challenge: Russell’s and Norvig’s Wumpus World
Wumpus World PEAS description
Performance measuregold +1000, death -1000-1 per step, -10 for using the arrow
EnvironmentSquares adjacent to wumpus are smellySquares adjacent to pit are breezyGlitter iff gold is in the same squareShooting kills wumpus if you are facing itShooting uses up the only arrowGrabbing picks up gold if in same squareReleasing drops the gold in same square
Breeze Breeze
Breeze
BreezeBreeze
Stench
Stench
BreezePIT
PIT
PIT
1 2 3 4
1
2
3
4
START
Gold
Stench
Actuators Left turn, Right turn,Forward, Grab, Release, Shoot
Sensors Breeze, Glitter, Smell
Chapter 7 5Options for scaling up: Compilation with classical planners; UCT methods forPOMDPs (Silver & Veness 2010); .... ??
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 30
References
[1] A. Albore, H. Palacios, and H. Geffner. A translation-based approach to contingent planning. In Proc. IJCAI-09.
[2] A. Albore, M. Ramirez, and H. Geffner. Effective heuristics and belief tracking for planning with incompleteinformation. In Proc. ICAPS-2011.
[3] J. Baier and S. McIlraith. Exploiting procedural domain control knowledge in state-of-the-art planners. In Proc.ICAPS-07.
[4] J. Baier and S. McIlraith. Planning with temporally extended goals using heuristic search. In Proc. ICAPS-06.
[5] B. Bonet and H. Geffner. Planning under partial observability by classical replanning. In Proc. IJCAI-2011.
[6] B. Bonet and H. Geffner. Planning with incomplete information as heuristic search in belief space. In Proc.AIPS-2000.
[7] B. Bonet, H. Palacios, and H. Geffner. Automatic derivation of memoryless policies and finite-state controllersusing classical planners. In Proc. ICAPS-09.
[8] E. Keyder and H. Geffner. Soft goals can be compiled away. JAIR, 2009.
[9] M. Lekavy and P. Navrat. Expressivity of strips-like and htn-like planning. Agent and Multi-Agent Systems:Technologies and Applications, 2007.
[10] H. Palacios and H. Geffner. Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width.JAIR, 2009.
[11] R. Petrick and F. Bacchus. A knowledge-based approach to planning with incomplete information and sensing. InProc. AIPS’02.
[12] G. Shani and R. Brafman. Replanning in domains with partial information and sensing actions. In Proc. IJCAI-2011.
[13] D. Silver and J. Veness. Monte-carlo planning in large pomdps. Advances in Neural Information Processing Systems(NIPS), 2010.
[14] S. Yoon, A. Fern, and R. Givan. FF-replan: A baseline for probabilistic planning. In Proc. ICAPS-07.
Hector Geffner, Planning with Partial Observability; IJCAI-2011 Workshop on Decision Making . . . 31