conformant probabilistic planning via csps icaps-2003 nathanael hyafil & fahiem bacchus...
TRANSCRIPT
![Page 1: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/1.jpg)
Conformant Probabilistic Planning via CSPs
ICAPS-2003
Nathanael Hyafil & Fahiem Bacchus
University of Toronto
![Page 2: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/2.jpg)
2
Contributions• Conformant Probabilistic Planning
– Uncertainty about initial state
– Probabilistic Actions
– No observations during plan execution
– Find plan with highest probability of achieving the goal
• Utilize a CSP Approach– Encode the problem as a CSP
– Develop new techniques for solving this kind of CSP
• Compare Decision-Theoretic algorithms
![Page 3: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/3.jpg)
3
Conformant Probabilistic Planning
CPP Problem consists of:• S: (finite) set of states (factored representation)• B: (initial) probability distribution over S (belief state)• A: (finite) set of (prob.) actions (represented as sequential-
effect trees)• G: set of Goal states (Boolean expression over state vars)• n: length/horizon of the plan to be computed
Solution: find a plan (finite sequence of actions) that maximizes the probability of reaching a goal state from B
![Page 4: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/4.jpg)
4
Example Problem: SandCastle• S: {moat, castle}, both Boolean• G: castle = true• A: sequential effects tree representation (Littman 1997)
Erect-Castle
castle
castle
T F1.0 moat
T F1.0 0.5
moat
moat
T Fcastle 0.0
T F0.75 castle:new
T F0.67 0.25
R1 R2R3
R4
![Page 5: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/5.jpg)
5
Constraint Satisfaction Problems
• Encode the CPP as a CSP• CSP
– finite set of variables each with its own finite domain– finite set of constraints, each over some subset of the variables.
• Constraint: satisfied by only some variable assignments.• Solution: an assignment to all Variables that satisfies all
Constraints• Standard Algorithm: Depth-First Tree Search with
Constraint Propagation
![Page 6: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/6.jpg)
6
The Encoding• Variables:
– State variables (usually Boolean)– Action variables {1, …, |A|}– Random variables to encode the action probabilities
(True/False/Irrelevant). • The setting of the random variables makes the actions deterministic• Random variables are independent
• Constraints– Initial belief state represented as an “Initial action” (as in partial
order planning).– Actions constraint the state variables at step i with those at step i+1
• Each branch of the effects trees is represented as one constraint.
– A constraint representing the goal
![Page 7: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/7.jpg)
7
V1
V2
T FState Var
![Page 8: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/8.jpg)
8
V1
V2
T FState Var
![Page 9: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/9.jpg)
9
V1
V2
a1 a2 a3
State Var
Action VarT F
![Page 10: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/10.jpg)
10
V1
V2
a1 a2 a3
State Var
Action Var
Random Var
T F
![Page 11: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/11.jpg)
11
V1
V2
A1
State Var
Action Var
Random Var
T F
T F
![Page 12: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/12.jpg)
12
V1
V2
A1
State Var
Action Var
Random Var
T F
T F
![Page 13: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/13.jpg)
13
V1
V2
A1
State Var
Action Var
Random Var
T F
![Page 14: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/14.jpg)
14
V1
V2
A1
State Var
Action Var
Random Var
T F
Goal States
![Page 15: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/15.jpg)
15
V1
V2
A1
State Var
Action Var
Random Var
T F
Goal States
a2
![Page 16: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/16.jpg)
16
So far
• Encoded CPP as a CSP
• Solved the CSP
• How do we now solve the CPP?
![Page 17: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/17.jpg)
17
State Var
Action Var
Random Var
Goal States
CSPSolutions
![Page 18: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/18.jpg)
18
State Var
Action Var
Random Var
Goal States
CSP solution:
assignment to State/Action/Random variables that is
- valid sequence of transitions- from initial - to goal state
![Page 19: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/19.jpg)
19
State Var
Action Var
Random Var
Goal Statesa1
a1
Action variables: plan executed
State variables: execution path induced by plan
![Page 20: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/20.jpg)
20
State Var
Action Var
Random Var
Goal Statesa1
a2
![Page 21: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/21.jpg)
21
State Var
Action Var
Random Var
Goal States
a2
a1
![Page 22: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/22.jpg)
22
State Var
Action Var
Random Var
Goal States
1-.25
.25
Probability ofa Solution
Product prob of Random vars :probability that this path was traversed by this plan
![Page 23: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/23.jpg)
23
State Var
Action Var
Random Var
Goal Statesa1
a2
Value ofa Plan
a1
a2
Value of plan : probs of all solutions with plan =
After all ’s evaluated:optimal plan = one with highest value
![Page 24: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/24.jpg)
24
Redundant Computations
• Due to the Markov property if the same state is encountered again at step i of the plan the subtree below will be the same– If we can cache all the info in this subtree:
explore only once
• To compute the best overall n-step plan: need to know value of every n-1 step plan for all states at step 1.
![Page 25: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/25.jpg)
25
Caching• Probability of success (value) of a i step plan
<a, > in state s = expectation of ’s success probability over the states reached from s by a:
• If we know the value of in each of these states: can compute its value in s without further search
• So, for each state s reached at step i, we cache the value of all n-i step plans
s
s1 s3s2
V1
a
.5 .2 .3
V3
V2
![Page 26: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/26.jpg)
26
CPplan Algorithm• CPplan():
Select next unassigned variable VIf V is last state var of a step:
If this state/step is cached returnElse if all vars are assigned (must be at a goal state):
Cache 1 as value of previous state/stepElse
For each value d of VV=dCPplan()If V is some Action var Ai
Update Cache for previous state/step with values of plans starting with d
![Page 27: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/27.jpg)
27
Caching Scheme
• Needs a lot of memory– proportional to |S|*|A|n
– no known algorithm does better
• Other features– Cache key simple (state / step)– Partial Caching achieves a good
space/time tradeoff
![Page 28: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/28.jpg)
28
MAXPLAN (Majercik & Littman 1998)
• Parallel approach based on Stochastic SAT
• Caching Scheme different
• Faster than Buridan and other AI Planners
• uses (even) more memory than CPplan
• 2 to 3 orders of magnitude slower than CPplan
![Page 29: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/29.jpg)
29
Results vs. Maxplan
0.1
1
10
100
1000
10000
10 11 12 13 14 15 16 17 18 19 20
MaxPlanCpplan
SandCastle-67 Slippery GripperTimeTime
Number of Steps Number of Steps
0.1
1
10
100
1000
10000
5 6 7 8 9 10 11 12
MaxplanCpplan
![Page 30: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/30.jpg)
30
CPP as special case of POMDPs
• POMDP: model for probabilistic planning in partially observable environments
• CPP can be cast as a POMDP in which there are no observations.
• Value Iteration, a standard POMDP algorithm, can be used to compute a solution to CPP.
![Page 31: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/31.jpg)
31
Value Iteration: Intuitions
• Value Iteration utilizes a powerful form of state abstraction.– Value of an i-step plan (for every belief state) is represented
compactly by vector of values (one for each state): value on a belief state is the expectation of these values.
– This vector of values is called an -vector.– Value iteration need only consider the set of -vectors that are
optimal for some belief state.– Plans optimal for some belief state are optimal over an entire
region of belief states. – So regions of belief states are managed collectively by a single
plan (-vector) that is i-step optimal for all belief states in the region.
![Page 32: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/32.jpg)
32
-vector Abstraction
• Number of alpha-vectors that need to be considered might grow much more slowly than the number of action sequences.
• Slippery Gripper:– 1 step to go: 2 -vectors instead of 4 actions– 2 steps to go: 6 instead of 16 (action sequences)– 3 steps to go: 10 instead of 64– 10 steps to go: 40 instead of >106
![Page 33: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/33.jpg)
33
Results vs. POMDP
0.01
0.1
1
10
100
1000
5 6 7 8 9 10 11 12 13 14 15
POMDPCpplan
Time
Number of Steps
Slippery Gripper Grid 10x10Time
Number of Steps
0.1
1
10
100
1000
5 6 7 8 9 10 11 12 13
POMDPCpplan
![Page 34: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/34.jpg)
34
Dynamic Reachability
• POMDPs: small portion of all possible plans to evaluate but on all belief states including those not reachable from the initial belief state.
• Combinatorial Planners (CPplan, Maxplan) must evaluate all |A|n plans but tree search performs dynamic reachability and goal attainability analysis to only evaluate plans on reachable states at each step
• Ex: Grid 10x10, only 4 states reachable in 1 step
![Page 35: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/35.jpg)
35
Conclusion -- Future Work
• New approach to CPP, better than previous AI planning techniques (Maxplan, Buridan)
• Analysis of respective benefits of decision theoretic techniques and AI techniques
• Ways to combine abstraction with dynamic reachability for POMDPs and MDPs.
![Page 36: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/36.jpg)
36
Results vs. Maxplan
SandCastle - 67 Slippery Gripper
Time Time
Number of Steps Number of Steps
![Page 37: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto](https://reader035.vdocuments.site/reader035/viewer/2022081516/56649f435503460f94c64102/html5/thumbnails/37.jpg)
37
Results vs. POMDP
Time
Number of Steps
Slippery Gripper Grid 10x10
Time
Number of Steps