Finding Approximate POMDP Solutions through Belief Compression
DESCRIPTION
Finding Approximate POMDP Solutions through Belief Compression. Based on slides by Nicholas Roy, MIT. Reliable Navigation: conventional trajectories may not be robust to localisation error.
TRANSCRIPT
![Page 1: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/1.jpg)
Based on slides by Nicholas Roy, MIT
Finding Approximate POMDP Solutions through Belief Compression
![Page 2: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/2.jpg)
Reliable Navigation
Conventional trajectories may not be robust to localisation error
Figure labels: estimated robot position, robot position distribution, true robot position, goal position
![Page 3: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/3.jpg)
Perception and Control
Perception Control
World state
Control algorithms
![Page 4: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/4.jpg)
Perception and Control
Assumed full observability (brittle): World state → Probabilistic Perception Model → P(x) → argmax P(x) → Control
Exact POMDP planning (intractable): World state → Probabilistic Perception Model → P(x) → Control
![Page 5: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/5.jpg)
Perception and Control
Assume full observability: brittle
Exact POMDP planning: intractable
World state → Probabilistic Perception Model → P(x) → Compressed P(x) → Control
![Page 6: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/6.jpg)
Main Insight
World state → Probabilistic Perception Model → P(x) → Low-dimensional P(x) → Control
Good policies for real world POMDPs can be found by planning over low-dimensional representations of the belief space.
![Page 7: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/7.jpg)
Belief Space Structure
The controller may be globally uncertain... but not usually.
![Page 8: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/8.jpg)
Coastal Navigation
Represent beliefs using $\tilde{b} = \langle \arg\max_s b(s);\ H(b) \rangle$
Discretise into low-dimensional belief space MDP
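As a minimal sketch of the representation above, the following computes the most likely state and the entropy of a discrete belief, then bins the entropy. The function names, the use of the natural logarithm, and the uniform binning scheme are assumptions made for illustration; they do not come from the slides.

```python
import math

def amdp_statistic(belief):
    """Return (most likely state index, entropy) for a discrete belief."""
    ml_state = max(range(len(belief)), key=lambda s: belief[s])
    # Entropy in nats; zero-probability states contribute nothing.
    entropy = -sum(p * math.log(p) for p in belief if p > 0.0)
    return ml_state, entropy

def discretise_entropy(entropy, n_states, n_bins):
    """Map entropy in [0, log(n_states)] to one of n_bins uniform bins (assumed scheme)."""
    max_entropy = math.log(n_states)
    bin_index = int(entropy / max_entropy * n_bins)
    return min(bin_index, n_bins - 1)

# Usage: a peaked belief and a uniform belief over four states.
peaked = [0.0, 1.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
for b in (peaked, uniform):
    state, h = amdp_statistic(b)
    print(state, discretise_entropy(h, len(b), n_bins=5))
```

The pair (state, entropy bin) then serves as one discrete state of the low-dimensional belief space MDP.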
![Page 9: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/9.jpg)
Coastal Navigation
![Page 10: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/10.jpg)
A Hard Navigation Problem
[Bar chart: Average Distance to Goal (distance in m, axis 0 to 9) for Maximum Likelihood and AMDP]
![Page 11: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/11.jpg)
Dimensionality Reduction
Principal Components Analysis
Figure labels: original beliefs, weights, characteristic beliefs
![Page 12: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/12.jpg)
Principal Components Analysis
Given belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.
[Plot: collection of beliefs drawn from a 200-state problem; probability of being in state vs. state]
![Page 13: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/13.jpg)
Principal Components Analysis
Given belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.
One sample distribution
m=9 gives this representation for one sample distribution
[Plot: probability of being in state vs. state]
![Page 14: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/14.jpg)
Principal Components Analysis
Many real world POMDP distributions are characterised by large regions of low probability.
Idea: Create fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA)
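One way to make the fitting criterion concrete is the usual exponential-family PCA formulation with an exponential link, where a belief is reconstructed as exp(U w) and the loss is an unnormalised KL-style divergence. The sketch below assumes that formulation; the names `basis`, `weights`, `epca_reconstruct` and `epca_loss` are illustrative, and no fitting procedure is shown.

```python
import math

def _linear(basis, weights):
    """Compute U w, where basis has one row per state and one column per basis vector."""
    return [sum(u * w for u, w in zip(row, weights)) for row in basis]

def epca_reconstruct(basis, weights):
    """Reconstruct a belief as normalised exp(U w), so every entry stays positive."""
    unnormalised = [math.exp(v) for v in _linear(basis, weights)]
    total = sum(unnormalised)
    return [v / total for v in unnormalised]

def epca_loss(basis, weights, belief):
    """Assumed loss: sum_s exp((U w)_s) - b(s) * (U w)_s."""
    return sum(math.exp(v) - b * v
               for v, b in zip(_linear(basis, weights), belief))

# Usage: with an identity basis the loss is minimised at w = log(b).
belief = [0.7, 0.2, 0.1]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
best = [math.log(p) for p in belief]
worse = [w + 0.3 for w in best]
print(epca_loss(identity, best, belief) < epca_loss(identity, worse, belief))
```

Because the reconstruction passes through an exponential, errors in regions where the belief is near zero are penalised on a log scale, which is the behaviour the slide asks for.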
![Page 15: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/15.jpg)
Example EPCA
[Plot: probability of being in state vs. state, with reconstructions using 1 basis, 2 bases, 3 bases and 4 bases]
![Page 16: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/16.jpg)
Example Reduction
![Page 17: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/17.jpg)
Finding Dimensionality
E-PCA will indicate appropriate number of bases, depending on beliefs encountered
![Page 18: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/18.jpg)
Planning
Original POMDP (states S1, S2, S3) → E-PCA → Low-dimensional belief space $\tilde{B}$ → Discretise → Discrete belief space MDP
![Page 19: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/19.jpg)
Model Parameters
Reward function
Figure labels: R(b); p(s) over states s1, s2, s3
Back-project to high dimensional belief
Compute expected reward from belief:

$$\tilde{R}(\tilde{b}) = E_b(R(s)) = \sum_{s \in S} p(s)\, R(s)$$
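The expected-reward computation can be sketched as follows. The `recover` argument stands in for whatever back-projection from the low-dimensional belief to the full belief is in use; it is a hypothetical placeholder here, as are the function names.

```python
def expected_reward(full_belief, reward):
    """Expected reward sum_s p(s) * R(s) under a full belief over states."""
    return sum(p * r for p, r in zip(full_belief, reward))

def compressed_reward(low_dim_belief, recover, reward):
    """Back-project the low-dimensional belief, then take the expected reward."""
    return expected_reward(recover(low_dim_belief), reward)

# Usage with a stand-in back-projection that always returns the same belief.
reward = [0.0, 4.0, 8.0]
toy_recover = lambda b_tilde: [0.5, 0.25, 0.25]  # hypothetical placeholder
print(compressed_reward((0.0,), toy_recover, reward))  # 3.0
```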
![Page 20: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/20.jpg)
Model Parameters
Low dimension / full dimension
1. For each belief $\tilde{b}_i$ and action $a$
2. Recover full belief $b_i$
3. Propagate according to action
4. Propagate according to observation
5. Recover $\tilde{b}_j$
6. Set $T(\tilde{b}_i, a, \tilde{b}_j)$ to probability of observation

$$T(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z|} \sum_{l=1}^{|S|} p(z_k \mid s_l) \sum_{m=1}^{|S|} p(s_l \mid s_m, a)\, b_i(s_m)$$
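The six steps can be sketched for one full belief and one action as below. This is an illustration under stated assumptions: the belief passed in is already the recovered full belief (step 2), `compress` is a hypothetical mapping from a full belief to a discrete low-dimensional key, and the probability mass of each observation is credited to the key its updated belief maps to.

```python
def propagate_action(belief, trans_a):
    """Step 3: predicted[l] = sum_m p(s_l | s_m, a) * b(s_m); trans_a[m][l] holds p(s_l | s_m, a)."""
    n = len(belief)
    return [sum(trans_a[m][l] * belief[m] for m in range(n)) for l in range(n)]

def transition_row(belief, trans_a, obs_model, compress):
    """Steps 3 to 6: map each compressed successor to its total observation probability."""
    predicted = propagate_action(belief, trans_a)
    row = {}
    for k in range(len(obs_model[0])):
        # Step 4: weight the prediction by p(z_k | s_l); obs_model[l][k] holds that value.
        joint = [obs_model[l][k] * predicted[l] for l in range(len(predicted))]
        p_obs = sum(joint)  # probability of observing z_k
        if p_obs == 0.0:
            continue
        posterior = [x / p_obs for x in joint]
        key = compress(posterior)  # step 5: recover the low-dimensional belief
        row[key] = row.get(key, 0.0) + p_obs  # step 6
    return row

# Usage: two states, a "stay" action, a noisy sensor, argmax as a toy compression.
stay = [[1.0, 0.0], [0.0, 1.0]]
sensor = [[0.8, 0.2], [0.2, 0.8]]
argmax = lambda b: max(range(len(b)), key=lambda s: b[s])
print(transition_row([0.5, 0.5], stay, sensor, argmax))
```

Each returned row sums to one, so it can be used directly as a transition distribution in the discrete belief space MDP.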
![Page 21: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/21.jpg)
Robot Navigation Example
Figure labels: true (hidden) robot position, goal position, goal state, initial distribution
![Page 22: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/22.jpg)
Robot Navigation Example
Figure labels: true robot position, goal position
![Page 23: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/23.jpg)
Policy Comparison
[Bar chart: Average Distance to Goal (distance in m, axis 0 to 9) for Maximum Likelihood, AMDP and E-PCA (6 bases)]
![Page 24: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/24.jpg)
People Finding
![Page 25: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/25.jpg)
People Finding as a POMDP
Fully observable robot
Position of person unknown
Figure labels: robot position, true person position
![Page 26: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/26.jpg)
Finding and Tracking People
Figure labels: robot position, true person position
![Page 27: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/27.jpg)
People Finding as a POMDP
Factored belief space
2 dimensions: fully-observable robot position
6 dimensions: distribution over person positions
Regular grid gives ≈ $10^{16}$ states
![Page 28: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/28.jpg)
Variable Resolution
Non-regular grid using samples
Figure: sampled beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$ with transitions $T(\tilde{b}_1, a_1, \tilde{b}_2)$ and $T(\tilde{b}_1, a_2, \tilde{b}_5)$
Compute model parameters using nearest-neighbour
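A nearest-neighbour lookup over sampled low-dimensional beliefs might look like the sketch below. Euclidean distance and the function name `nearest_sample` are assumptions; the slides do not state which metric is used.

```python
def nearest_sample(point, samples):
    """Return the index of the sampled belief closest to point (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(samples)), key=lambda i: sq_dist(point, samples[i]))

# Usage: snap a propagated low-dimensional belief to the nearest grid sample.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
successor = (0.9, 0.1)
print(nearest_sample(successor, samples))  # 1
```

A successor belief produced by an action is snapped to its nearest sample, and the probability of reaching it is added to the corresponding transition entry.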
![Page 29: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/29.jpg)
Refining the Grid
Sample beliefs according to policy
Construct new model
Keep new belief if $V(\tilde{b}'_1) > V(\tilde{b}_1)$
Figure labels: $\tilde{b}_1$, $\tilde{b}'_1$, $V(\tilde{b}_1)$, $V(\tilde{b}'_1)$
![Page 30: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/30.jpg)
The Optimal Policy
Original distribution
Reconstruction using EPCA and 6 bases
Figure labels: robot position, true person position
![Page 31: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/31.jpg)
Policy Comparison
Average time to find person
[Bar chart: average # of actions to find person (axis 0 to 250) for Closest, Densest, Maximum Likelihood, E-PCA and Refined E-PCA]
E-PCA: 72 states
Refined E-PCA: 260 states
Fully observable MDP
![Page 32: Finding Approximate POMDP Solutions through Belief Compression](https://reader033.vdocuments.site/reader033/viewer/2022051117/56815ce7550346895dcaed63/html5/thumbnails/32.jpg)
Nick’s Thesis Contributions
Good policies for real world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
POMDPs can scale to bigger, more complicated real-world problems.
POMDPs can be used for real deployed robots.