Planning Under Uncertainty
Sensing error
Partial observability
Unpredictable dynamics
Other agents
Three Strategies for Handling Uncertainty
Reactive strategies: replanning
Proactive strategies: anticipating future uncertainty; shaping future uncertainty (active sensing)
Uncertainty in Dynamics
This class: the robot has uncertain dynamics (calibration errors, wheel slip, actuator error, unpredictable obstacles) but perfect (or good-enough) sensing
Modeling imperfect dynamics
Probabilistic dynamics model: P(x'|x,u), a probability distribution over successors x' given state x and control u (Markov assumption)
Nondeterministic dynamics model: f(x,u) -> the set of possible successors
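The two models can be contrasted in code; a minimal sketch with a hypothetical 1-D robot (the state x, command u, and the 0.8/0.2 slip probabilities are all made up for illustration):

```python
import random

def p_successors(x, u):
    """Probabilistic model P(x'|x,u): a distribution over successors.
    Here a nonzero command u succeeds with prob. 0.8 and slips with 0.2."""
    return {x + u: 0.8, x: 0.2}

def f_successors(x, u):
    """Nondeterministic model f(x,u): just the set of possible successors."""
    return {x + u, x}

def sample_successor(x, u):
    """Draw x' ~ P(.|x,u); the Markov assumption means the draw depends
    only on the current state x and control u, not on the history."""
    succs, probs = zip(*p_successors(x, u).items())
    return random.choices(succs, weights=probs)[0]
```

Sampling supports simulation and expectation computations; the nondeterministic set only supports worst-case reasoning over its members.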
Target Tracking
The robot must keep a target in its field of view
The robot has a prior map of the obstacles
But it does not know the target’s trajectory in advance
Target-Tracking Example
Time is discretized into small steps of unit duration
At each time step, each of the two agents moves by at most one increment along a single axis
The two moves are simultaneous
The robot senses the new position of the target at each step
The target is not influenced by the robot (non-adversarial, non-cooperative target)
Time-Stamped States (no cycles possible)
State = (robot-position, target-position, time)
In each state, the robot can execute 5 possible actions: {stop, up, down, right, left}
Each action has 5 possible outcomes (one for each possible action of the target), with some probability distribution
[Potential collisions are ignored to simplify the presentation]
Example: applying "right" in state ([i,j], [u,v], t) leads to one of:
([i+1,j], [u,v], t+1), ([i+1,j], [u-1,v], t+1), ([i+1,j], [u+1,v], t+1), ([i+1,j], [u,v-1], t+1), ([i+1,j], [u,v+1], t+1)
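The successor enumeration can be sketched directly; the tuple representation and helper names below are illustrative, not from the slides:

```python
# Each robot action has 5 outcomes, one per possible target move.
TARGET_MOVES = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0)]
ROBOT_ACTIONS = {"stop": (0, 0), "up": (0, 1), "down": (0, -1),
                 "left": (-1, 0), "right": (1, 0)}

def outcomes(state, action):
    """state = (robot_pos, target_pos, t). Returns the 5 possible
    time-stamped successors (collisions ignored, as on the slide)."""
    (i, j), (u, v), t = state
    di, dj = ROBOT_ACTIONS[action]
    return [((i + di, j + dj), (u + du, v + dv), t + 1)
            for du, dv in TARGET_MOVES]
```

Because the time stamp increases by one in every successor, no state can ever repeat, which is why the state/action graph is a tree.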
Rewards and Costs
The robot must keep seeing the target as long as possible
Each state where it does not see the target is terminal
The reward collected in every non-terminal state is 1; it is 0 in each terminal state
[ The sum of the rewards collected in an execution run is exactly the amount of time the robot sees the target]
No cost for moving vs. not moving
Expanding the state/action tree
[Diagram: state/action tree expanded from horizon 1 to horizon h]
Assigning rewards
Terminal states: states where the target is not visible
Rewards: 1 in non-terminal states; 0 in others
But how to estimate the utility of a leaf at horizon h?
Estimating the utility of a leaf
Compute the shortest distance d for the target to escape the robot’s current field of view
If the maximal velocity v of the target is known, estimate the utility of the state as d/v [a conservative estimate]
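One way to sketch this estimate, assuming (hypothetically) that the field of view is given as a set of visible grid cells:

```python
from collections import deque

def escape_distance(target, visible):
    """Shortest number of unit moves for the target to leave the set of
    visible cells: BFS on the 4-connected grid. `visible` is a set of
    (x, y) cells in the robot's current field of view."""
    if target not in visible:
        return 0
    frontier, dist = deque([target]), {target: 0}
    while frontier:
        cell = frontier.popleft()
        for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
            nxt = (cell[0] + dx, cell[1] + dy)
            if nxt not in dist:
                if nxt not in visible:      # first cell outside the view
                    return dist[cell] + 1   # BFS => this is the minimum d
                dist[nxt] = dist[cell] + 1
                frontier.append(nxt)
    return float("inf")                     # target can never escape

def leaf_utility(target, visible, v_max):
    """Conservative utility estimate d/v for a leaf at horizon h."""
    return escape_distance(target, visible) / v_max
```

The estimate is conservative because it assumes the target heads straight for the nearest exit at full speed while the robot stands still.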
Selecting the next action
Compute the optimal policy over the state/action tree using estimated utilities at leaf nodes
Execute only the first step of this policy
Repeat everything again at t+1… (sliding horizon)
Real-time constraint: h is chosen so that a decision can be returned in unit time [a larger h may yield a better decision that arrives too late!]
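The sliding-horizon loop can be sketched as a depth-h expectimax; the `model` interface below (`actions`, `outcomes`, `is_terminal`, `leaf_utility`) is an assumed wrapper around the dynamics, terminal test, and leaf estimate described above, not an API from the slides:

```python
def action_value(state, action, h, model):
    """Expected utility of `action`: terminal successors (target lost)
    contribute 0, interior states collect reward 1, and horizon-h leaves
    use the conservative d/v heuristic."""
    value = 0.0
    for prob, succ in model.outcomes(state, action):   # P(x'|x,u)
        if model.is_terminal(succ):
            continue                                   # contributes 0
        if h == 1:
            value += prob * model.leaf_utility(succ)
        else:
            value += prob * (1.0 + max(
                action_value(succ, a, h - 1, model) for a in model.actions))
    return value

def select_action(state, h, model):
    """One step of the sliding-horizon policy: pick the best first action,
    execute it, and re-plan from the next observed state."""
    return max(model.actions, key=lambda a: action_value(state, a, h, model))
```

Only `select_action`'s result is executed; everything below the first level of the tree is discarded and recomputed at t+1.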
Pure Visual Servoing
Computing and Using a Policy
Markov Decision Process
Finite state space X, action space U
Reward function R(x)
Optimal value function V(x)
Bellman equations:
If x is terminal: V(x) = R(x)
If x is non-terminal: V(x) = R(x) + max_{u∈U} Σ_{x'∈X} P(x'|x,u) V(x')
Optimal policy: π*(x) = arg max_{u∈U} Σ_{x'∈X} P(x'|x,u) V(x')
Solving Finite MDPs
Value iteration: improve V(x) by repeatedly "backing up" the value of the best action
Policy iteration: improve π(x) by repeatedly computing the best action
Linear programming
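Value iteration for a finite MDP can be sketched in a few lines; the transition representation `P[x][u]` as a list of `(prob, successor)` pairs is an assumption. With undiscounted rewards the backup converges here because time-stamped states admit no cycles, so every run reaches a terminal state:

```python
def value_iteration(states, actions, P, R, terminal, eps=1e-6):
    """Repeatedly back up V(x) = R(x) + max_u sum_x' P(x'|x,u) V(x')
    until the largest change falls below eps; then read off the
    greedy (optimal) policy."""
    V = {x: R(x) for x in states}
    while True:
        delta = 0.0
        for x in states:
            if terminal(x):
                continue                      # terminal: V(x) = R(x)
            best = max(sum(p * V[y] for p, y in P[x][u]) for u in actions)
            new = R(x) + best
            delta = max(delta, abs(new - V[x]))
            V[x] = new
        if delta < eps:
            break
    policy = {x: max(actions, key=lambda u: sum(p * V[y] for p, y in P[x][u]))
              for x in states if not terminal(x)}
    return V, policy
```

For the tracking MDP, R(x) = 1 in non-terminal states and 0 in terminal ones, so V(x) is exactly the expected remaining time the target stays visible.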
Continuous Markov Decision Processes
Discretize the state and action space (e.g., grids, random samples), then perform value/policy iteration or an LP formulation on the discrete space
Or discretize the value function (e.g., basis functions), then use iterative least-squares approaches
Extra Credit for Open House
+5% of final project grade
Preliminary demo & description by next Tuesday
Open House on April 16, 4-7pm
Upcoming Subjects
Sensorless planning (Goldberg, 1994)
An example of a nondeterministic uncertainty model
Planning to sense (Gonzalez-Banos and Latombe 2002)
Comments on Uncertainty
Reasoning with uncertainty requires representing and transforming sets of states
This is challenging in high-dimensional spaces: it is hard to get an accurate, sparse representation of multi-modal, nonlinear, degenerate distributions
Breakthroughs would have dramatic impact