Post on 20-Dec-2015
Finding Admissible Bounds for Over-subscribed Planning Problems
J. Benton
Menkes van den Briel
Subbarao Kambhampati
Arizona State University
Is this plan "good"?
How good is a given plan?
How to drive a planner to find a good plan?
Related
Admissible heuristics
Need a heuristic schema that admits degrees of relaxation
Helps per-node use
Helps one-shot use
Especially important when quality may vary widely (e.g., when we have many soft goals)
Challenges
1. Build a strong admissible heuristic
2. Provide a way to add relaxation for varied use
An integer programming (IP) based heuristic
Use the linear programming (LP) relaxation
PSPUD
Partial Satisfaction Planning with Utility Dependency
utility((at t loc1)) = 10
utility((at p1 loc2)) = 10
utility((at t loc1) & (at p1 loc2)) = 60

S0: (at t loc1) (in p1 t); util(S0): 10; sum cost: 0; net benefit(S0): 10-0=10
  (move t loc2), cost: 20
S1: (at t loc2) (in p1 t); util(S1): 0; sum cost: 20; net benefit(S1): 0-20=-20
  (unload p1 loc2), cost: 5
S2: (at t loc2) (at p1 loc2); util(S2): 10; sum cost: 25; net benefit(S2): 10-25=-15
  (move t loc1), cost: 20
S3: (at t loc1) (at p1 loc2); util(S3): 10+10+60=80; sum cost: 45; net benefit(S3): 80-45=35

Actions have cost. Goal sets have utility.
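The net-benefit figures in this example can be recomputed with a short Python sketch. It assumes the 60-utility dependency is over (at t loc1) and (at p1 loc2), which is what the S3 total of 10+10+60=80 implies. The names `ACTION_COST`, `GOAL_UTILITY`, `utility`, and `net_benefits` are illustrative and not from the talk.

```python
# Illustrative encoding of the truck/package example; all names are hypothetical.
ACTION_COST = {"move t loc2": 20, "unload p1 loc2": 5, "move t loc1": 20}

# Each entry is (goal set, utility). The last entry is the utility dependency,
# assumed here to be over (at t loc1) and (at p1 loc2).
GOAL_UTILITY = [
    ({"at t loc1"}, 10),
    ({"at p1 loc2"}, 10),
    ({"at t loc1", "at p1 loc2"}, 60),
]

STATES = [
    {"at t loc1", "in p1 t"},     # S0
    {"at t loc2", "in p1 t"},     # S1
    {"at t loc2", "at p1 loc2"},  # S2
    {"at t loc1", "at p1 loc2"},  # S3
]
PLAN = ["move t loc2", "unload p1 loc2", "move t loc1"]


def utility(state):
    """Sum the utility of every goal set fully contained in the state."""
    return sum(u for goals, u in GOAL_UTILITY if goals <= state)


def net_benefits(states, plan):
    """Return utility minus accumulated action cost for each state in order."""
    result = []
    spent = 0
    for i, state in enumerate(states):
        if i > 0:
            spent += ACTION_COST[plan[i - 1]]
        result.append(utility(state) - spent)
    return result


print(net_benefits(STATES, PLAN))  # [10, -20, -15, 35]
```

Net benefit dips below the initial value before the dependency pays off at S3, so a planner needs a bound on achievable quality rather than a greedy comparison of neighbouring states.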
Building a Heuristic
A network flow model on variable transitions
Capture relevant transitions with multi-valued fluents
Add prevail constraints
Add initial states
Add goal states
Add cost on actions
Add utility on goals
[Figure: transition graphs for the truck and package variables over loc1 and loc2. Edge labels: cost: 20 (two edges), cost: 5 (four edges). Goal labels: util: 10, util: 10, util: 60.]
Building a Heuristic
[Figure: the truck and package transition graphs again, with edge labels cost: 20 and cost: 5 and goal labels util: 10, util: 10, util: 60.]
Constraints of this model
1. If an action executes, then all of its effects and prevail conditions must also.
2. If a fact is deleted, then it must be added to re-achieve a value.
3. If a prevail condition is required, then it must be achieved.
4. A goal utility dependency is achieved if its goals are achieved.
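The flow-balance idea behind constraint 2 can be illustrated on the truck and package variables from the PSPUD example: for each value, one unit of initial flow plus the transitions into the value must equal the transitions out of it plus its end-value indicator. The sketch below is only an illustration under that reading. The function `end_values` and the value name `in-truck` are hypothetical.

```python
from collections import Counter


def end_values(initial, transitions, domain):
    """Solve the flow balance for each value f of one state variable:
    [f is initial] + (#transitions into f) = (#transitions out of f) + endvalue(f)
    """
    adds = Counter(dst for _src, dst in transitions)
    deletes = Counter(src for src, _dst in transitions)
    return {f: int(f == initial) + adds[f] - deletes[f] for f in domain}


# Truck: starts at loc1, moves to loc2, then moves back to loc1.
truck = end_values("loc1", [("loc1", "loc2"), ("loc2", "loc1")], ["loc1", "loc2"])

# Package: starts in the truck and is unloaded at loc2.
package = end_values("in-truck", [("in-truck", "loc2")], ["in-truck", "loc1", "loc2"])

print(truck)    # {'loc1': 1, 'loc2': 0}
print(package)  # {'in-truck': 0, 'loc1': 0, 'loc2': 1}
```

The balance counts transitions but ignores their order, which is one source of relaxation in the model.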
Formulation
Variables
action(a) ∈ Z+: the number of times a ∈ A is executed
effect(a,v,e) ∈ Z+: the number of times a transition e in state variable v is caused by action a
prevail(a,v,f) ∈ Z+: the number of times a prevail condition f in state variable v is required by action a
endvalue(v,f) ∈ {0,1}: equal to 1 if value f is the end value in state variable v
goaldep(k): equal to 1 if goal dependency Gk is achieved
Parameters
cost(a): the cost of executing action a ∈ A
utility(v,f): the utility of achieving value f in state variable v
utility(k): the utility of achieving goal dependency Gk
Constraints
1. If an action executes, then all of its effects and prevail conditions must also.
   action(a) = Σ_{effects of a in v} effect(a,v,e) + Σ_{prevails of a in v} prevail(a,v,f)
2. If a fact is deleted, then it must be added to re-achieve a value.
   1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) = Σ_{effects that delete f} effect(a,v,e) + endvalue(v,f)
3. If a prevail condition is required, then it must be achieved.
   1{if f ∈ s0[v]} + Σ_{effects that add f} effect(a,v,e) ≥ prevail(a,v,f) / M
4. A goal utility dependency is achieved if its goals are achieved.
   goaldep(k) ≥ Σ_{f in dependency k} endvalue(v,f) – (|Gk| – 1)
   goaldep(k) ≤ endvalue(v,f) ∀ f in dependency k
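The pair of inequalities in constraint 4 is the standard linear encoding of a logical AND over binary variables. The sketch below enumerates a two-goal dependency to show this, assuming the lower bound is read as the sum of end values minus (|Gk| – 1). The function name `feasible_goaldep` is hypothetical.

```python
from itertools import product


def feasible_goaldep(endvalues):
    """Return the goaldep values in {0, 1} allowed by constraint 4
    for one dependency, given its binary endvalue indicators."""
    n = len(endvalues)
    lower = sum(endvalues) - (n - 1)  # goaldep >= sum(endvalue) - (|Gk| - 1)
    upper = min(endvalues)            # goaldep <= endvalue for every goal
    return [g for g in (0, 1) if lower <= g <= upper]


for ev in product((0, 1), repeat=2):
    print(ev, feasible_goaldep(ev))
# (0, 0) [0]
# (0, 1) [0]
# (1, 0) [0]
# (1, 1) [1]
```

With integer variables the dependency indicator is forced to 1 exactly when every goal in Gk holds. Under the LP relaxation the same inequalities admit fractional values, which is part of why the LP bound can be looser than the IP bound.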
Objective Function
Maximize Net Benefit:
Σ_{v∈V, f∈Dv} utility(v,f) endvalue(v,f) + Σ_{k∈K} utility(k) goaldep(k) – Σ_{a∈A} cost(a) action(a)
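As a worked instance, the objective can be evaluated on the variable assignment induced by the three-action plan from the PSPUD example. This sketch assumes the 60-utility dependency covers the truck at loc1 and the package at loc2. The dictionary layout and the names `COST`, `UTILITY`, `DEP_UTILITY`, `objective`, and `k1` are illustrative only.

```python
# Hypothetical encoding of the example's parameters.
COST = {"move t loc2": 20, "unload p1 loc2": 5, "move t loc1": 20}
UTILITY = {("truck", "loc1"): 10, ("package", "loc2"): 10}
DEP_UTILITY = {"k1": 60}  # assumed: truck at loc1 AND package at loc2


def objective(action, endvalue, goaldep):
    """Net benefit: goal utility plus dependency utility minus action cost."""
    goal_term = sum(UTILITY[vf] * x for vf, x in endvalue.items() if vf in UTILITY)
    dep_term = sum(DEP_UTILITY[k] * x for k, x in goaldep.items())
    cost_term = sum(COST[a] * n for a, n in action.items())
    return goal_term + dep_term - cost_term


# Assignment induced by the plan: every action executed once.
action = {a: 1 for a in COST}
endvalue = {("truck", "loc1"): 1, ("truck", "loc2"): 0, ("package", "loc2"): 1}
goaldep = {"k1": 1}
print(objective(action, endvalue, goaldep))  # 35

# Empty plan: the truck stays at loc1 and nothing is spent.
print(objective({}, {("truck", "loc1"): 1}, {"k1": 0}))  # 10
```

Because every valid plan induces a feasible assignment like this one, the maximum of the IP is at least the net benefit of the best plan, which is the sense in which the IP value bounds plan quality.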
Experimental Setup
Three modified IPC 3 domains: zenotravel, satellite, rovers (maximize net benefit)
One IPC 5 domain: Rovers, simple preferences (minimize (goal achievement violations + action cost))
Compared with , a cost propagation-based heuristic
Heuristic value at initial state versus optimal plan (found using a branch and bound search)
Maximizing: LP ≥ IP ≥ OPTIMAL
Minimizing: LP ≤ IP ≤ OPTIMAL
Results
[Figures: results plots, including one panel with series labeled IP and LP.]
Summary
IP gives bound on quality of plan
Doubly relaxed (LP) to provide heuristic for search (Search I Session: Monday at 4:10 pm)
Future Work
Improve encoding (to give better LP values)
Use fluent merging