TRANSCRIPT
Penn State
Robert Collins
Sampling Methods: Particle Filtering
CSE586 Computer Vision II
CSE Dept, Penn State Univ
Recall: Importance Sampling
Procedure to estimate EP(f(x)):
1) Generate N samples xi from Q(x)
2) form importance weights wi = P(xi) / Q(xi)
3) compute the empirical estimate of EP(f(x)), the
expected value of f(x) under distribution P(x), as
EP(f(x)) ≈ (1/N) Σi wi f(xi)
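As a concrete sketch (my own toy example, not from the slides): estimating EP(f(x)) for f(x) = x² under a standard normal target P, using a wider normal proposal Q:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

def p(x):  # target density P = N(0, 1)
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def q(x):  # proposal density Q = N(0, sigma=2)
    return np.exp(-x**2 / 8.0) / np.sqrt(8.0 * np.pi)

x = rng.normal(0.0, 2.0, size=N)   # 1) generate N samples from Q
w = p(x) / q(x)                    # 2) form importance weights
estimate = np.mean(w * x**2)       # 3) empirical estimate of EP(x^2)
# the true value is the variance of x under P, i.e. 1
```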
Resampling
Note: We thus have a set of weighted samples (xi, wi | i=1,…,N)
If we really need random samples from P, we can generate them
by resampling such that the likelihood of choosing value xi is
proportional to its weight wi
This would now involve sampling from a discrete distribution
of N possible values (the N values of xi)
Therefore, regardless of the dimensionality of vector x, we are
resampling from a 1D distribution (we are essentially
sampling from the indices 1...N, in proportion to the
importance weights wi). So we can use the inverse
transform sampling method we discussed earlier.
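Inverse transform sampling on the discrete distribution over indices can be sketched as follows (an illustrative helper; the `resample` function and toy values are my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def resample(samples, weights, rng, n=None):
    """Draw n samples from the weighted set by inverse transform
    sampling on the discrete distribution over indices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                  # normalize the weights
    cdf = np.cumsum(w)               # discrete CDF over indices 1..N
    n = len(w) if n is None else n
    idx = np.searchsorted(cdf, rng.random(n))  # invert the CDF
    return np.asarray(samples)[idx]

samples = np.array([-1.0, 0.0, 2.0, 5.0])
weights = [0.1, 0.2, 0.6, 0.1]
new = resample(samples, weights, rng)
# the value 2.0 (weight 0.6) dominates the resampled set in the long run
```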
Sequential Monte Carlo Methods
Sequential Importance Sampling (SIS) and the closely
related algorithm Sampling Importance Resampling (SIR)
are known by various names in the literature:
- bootstrap filtering
- particle filtering
- Condensation algorithm
- survival of the fittest
General idea: Importance sampling on time series data,
with samples and weights updated as each new data
term is observed. Well-suited for simulating recursive
Bayes filtering!
Recall: Bayes Filtering
Two-step Iteration at Each Time t:
Motion Prediction Step:
P(xt | z1:t-1) = ∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1
Data Correction Step (Bayes rule):
P(xt | z1:t) = P(zt | xt) P(xt | z1:t-1) / ∫ P(zt | xt) P(xt | z1:t-1) dxt
Recall: Bayes Filtering
Problem: in general we get intractable integrals
Motion Prediction Step:
P(xt | z1:t-1) = ∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1
Data Correction Step (Bayes rule):
P(xt | z1:t) = P(zt | xt) P(xt | z1:t-1) / ∫ P(zt | xt) P(xt | z1:t-1) dxt
Sequential Monte Carlo Methods
Intuition:
• Represent probability distributions by samples
(called particles).
• Each particle is a “guess” at the true state.
• For each one, simulate its motion update and add
noise to get a motion prediction. Measure the
likelihood of this prediction, and weight the
resulting particles in proportion to their likelihoods.
Back to Bayes Filtering
This integral in the denominator of Bayes rule disappears as a
consequence of representing distributions by a weighted set of
samples. Since we have only a finite number of samples, the
normalization constant will be the sum of the weights!
Data Correction Step (Bayes rule):
Back to Bayes Filtering
Now let’s write the Bayes filter by combining motion
prediction and data correction steps into one equation.
P(xt | z1:t) = c P(zt | xt) ∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1
(new posterior on the left; data term P(zt | xt); motion term P(xt | xt-1); old posterior P(xt-1 | z1:t-1))
Monte Carlo Bayes Filtering
Assume the posterior at time t-1 (which is the prior at time t)
has been approximated as a set of N weighted particles
{(x^i_t-1, w^i_t-1) | i = 1,…,N}, so that
P(xt-1 | z1:t-1) ≈ Σi w^i_t-1 δ(xt-1 - x^i_t-1)
where δ(·) is the Dirac delta function.
Useful property: ∫ f(x) δ(x - a) dx = f(a)
Monte Carlo Bayes Filtering
Then the motion prediction integral simplifies to a summation:

∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1            (motion prediction integral)
≈ ∫ P(xt | xt-1) Σi w^i_t-1 δ(xt-1 - x^i_t-1) dxt-1   (the prior had been approximated by N particles)
= Σi w^i_t-1 ∫ P(xt | xt-1) δ(xt-1 - x^i_t-1) dxt-1   (exchange order of summation and integration)
= Σi w^i_t-1 P(xt | x^i_t-1)                      (property of the Dirac delta function)
Monte Carlo Bayes Filtering
Our Bayes filtering equation thus simplifies as well:

P(xt | z1:t) = c P(zt | xt) Σi w^i_t-1 P(xt | x^i_t-1)   (plugging in the result from the previous page)
             = c Σi w^i_t-1 P(zt | xt) P(xt | x^i_t-1)   (bringing the term that doesn't depend on i into the summation)
Monte Carlo Bayes Filtering
Our new posterior is therefore
P(xt | z1:t) = c Σi w^i_t-1 P(zt | xt) P(xt | x^i_t-1)
but this is still not amenable to closed-form computation for
arbitrary motion models and likelihood functions (e.g. we would
have to integrate it to compute the normalization constant c)
Idea 1: Let's approximate the posterior as a set of N samples!
Idea 2: Hey wait a minute, the prior was already represented as
a set of N samples! Why don't we just “update” each of those?
Monte Carlo Bayes Filtering
Approach: for each sample x^i_t-1, generate a new sample x^i_t from
c w^i_t-1 P(zt | xt) P(xt | x^i_t-1)
by importance sampling, using some convenient proposal
distribution q(xt | x^i_t-1, zt).
So, generate a sample
x^i_t ~ q(xt | x^i_t-1, zt)
and compute its importance weight
w^i_t = w^i_t-1 P(zt | x^i_t) P(x^i_t | x^i_t-1) / q(x^i_t | x^i_t-1, zt)
Monte Carlo Bayes Filtering
We then can approximate our posterior as
P(xt | z1:t) ≈ Σi w^i_t δ(xt - x^i_t)
where the weights have been normalized to sum to one:
w^i_t ← w^i_t / Σj w^j_t
SIS Algorithm
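The SIS loop can be sketched in a few lines of Python. This is a minimal illustrative version; the 1-D state, random-walk motion model, and Gaussian likelihood are my assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)

def sis_step(particles, weights, z, motion_std=1.0, obs_std=1.0):
    """One SIS iteration, using the motion model as the proposal."""
    # sample x_t^i from the proposal: motion prediction plus noise
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # with the prior as proposal, the weight update multiplies the
    # old weight by the likelihood P(z_t | x_t^i)
    weights = weights * np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    return particles, weights / weights.sum()

N = 500
particles = rng.normal(0.0, 1.0, size=N)  # samples from the prior
weights = np.full(N, 1.0 / N)
for z in [0.5, 0.7, 0.6]:                 # a short, made-up observation sequence
    particles, weights = sis_step(particles, weights, z)
posterior_mean = np.sum(weights * particles)
```

Note that nothing here resamples, which is exactly why the weight degeneracy discussed next sets in after enough iterations.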
SIS Degeneracy
Unfortunately, pure SIS suffers from degeneracy. In
many cases, after a few iterations, all but one particle
will have negligible weight.
Illustration of degeneracy:
[Figure: particle weights w plotted at Time 1, Time 10, and Time 19;
over time, nearly all the weight mass concentrates onto a single particle.]
Resampling to Combat Degeneracy
Sampling with replacement to get N new samples, each
having equal weight 1/N
Samples with high weight get replicated
Samples with low weight die off
Concentrates particles in areas of higher probability
Generic Particle Filter
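A generic particle filter step can be sketched as below. This is an illustrative version with assumed 1-D models; the effective-sample-size trigger for resampling is one common design choice, not something specified on the slide:

```python
import numpy as np

rng = np.random.default_rng(3)

def particle_filter_step(particles, weights, z,
                         motion_std=1.0, obs_std=1.0, ess_frac=0.5):
    N = len(particles)
    # 1) motion prediction: propagate each particle, add process noise
    particles = particles + rng.normal(0.0, motion_std, size=N)
    # 2) data correction: reweight by the likelihood P(z | x)
    weights = weights * np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    weights = weights / weights.sum()
    # 3) resample only when the effective sample size drops too low
    ess = 1.0 / np.sum(weights ** 2)
    if ess < ess_frac * N:
        idx = rng.choice(N, size=N, p=weights)  # with replacement
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights

# usage: filter a short, made-up observation sequence
particles = rng.normal(0.0, 1.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
for z in [0.2, 0.4, 0.6, 0.8]:
    particles, weights = particle_filter_step(particles, weights, z)
```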
Sample Importance Resample (SIR)
SIR is a special case of the generic particle filter where:
- the prior density is used as the proposal density
- resampling is done every iteration
therefore
q(xt | x^i_t-1, zt) = P(xt | x^i_t-1)
and thus, after cancellation (and since the old weights
are all equal due to resampling),
w^i_t ∝ P(zt | x^i_t)
SIR Algorithm
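A SIR step specializes the generic filter accordingly: the new weights are just the likelihoods, and we resample every iteration. Again an illustrative sketch with an assumed 1-D state and Gaussian models:

```python
import numpy as np

rng = np.random.default_rng(4)

def sir_step(particles, z, motion_std=1.0, obs_std=1.0):
    """One SIR (Condensation) iteration. With the prior as proposal
    and resampling every step, incoming weights are all 1/N, so the
    new (unnormalized) weights are just the likelihoods."""
    N = len(particles)
    particles = particles + rng.normal(0.0, motion_std, size=N)  # predict
    w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)          # weight
    w = w / w.sum()
    idx = rng.choice(N, size=N, p=w)                             # resample
    return particles[idx]  # equally weighted output

particles = rng.normal(0.0, 1.0, size=500)
for z in [2.0] * 5:        # repeatedly observe the same made-up value
    particles = sir_step(particles, z)
# the particle cloud should settle near the observed value 2.0
```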
Drawing from the Prior Density
xk = fk(xk-1, vk-1),  where v is process noise
Note: when we use the prior as the importance density, we only
need to sample from the process noise distribution (typically
uniform or Gaussian).
Why? Recall the process model xk = fk(xk-1, vk-1).
Thus we can sample from the prior P(xk | xk-1) by starting with
sample x^i_k-1, generating a noise vector v^i_k-1 from the noise
process, and forming the noisy sample
x^i_k = fk(x^i_k-1, v^i_k-1)
If the noise is additive, this leads to a very simple interpretation:
move each particle using motion prediction, then add noise.
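Under the additive-noise interpretation this is only a couple of lines; the constant-velocity motion function and Gaussian noise below are illustrative choices of my own, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_from_prior(particles, velocity=1.0, noise_std=0.5):
    """x_k^i = f_k(x_k-1^i, v_k-1^i): deterministic motion prediction
    followed by additive process noise."""
    predicted = particles + velocity     # f_k (assumed constant velocity)
    noise = rng.normal(0.0, noise_std, size=particles.shape)  # v_k-1^i
    return predicted + noise

particles = np.zeros(10_000)
moved = sample_from_prior(particles)
# moved has mean ~1.0 (the motion) and std ~0.5 (the process noise)
```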
SIR Filtering Illustration
[Figure: one cycle of SIR with M particles. Starting from an equally
weighted sample set {x^(m)_k-1, 1/M}, resampling yields {x~^(m)_k-1, 1/M};
propagating through the motion model gives the prediction {x^(m)_k, 1/M};
weighting by the likelihood gives {x^(m)_k, w^(m)_k}; and resampling
restores equal weights for the next cycle.]
Problems with SIS/SIR
Degeneracy: in SIS, after several iterations all samples
except one tend to have negligible weight. Thus a lot of
computational effort is spent on particles that make no
contribution. Resampling is supposed to fix this, but
also causes a problem...
Sample Impoverishment: in SIR, after several iterations
all samples tend to collapse into a single state. The
ability to represent multimodal distributions is thus
short-lived.
CSE598G
Robert Collins
Particle Filter Failure Analysis
References
King and Forsyth, “How Does CONDENSATION Behave with
a Finite Number of Samples?” ECCV 2000, 695-709.
Karlin and Taylor, A First Course in Stochastic Processes, 2nd
edition, Academic Press, 1975.
Particle Filter Failure Analysis
Summary
Condensation/SIR is asymptotically correct as the number of samples tends
to infinity. However, as a practical matter, it has to be run with a finite
number of samples.
Iterations of Condensation form a Markov chain whose state space is quantized
representations of a density.
This Markov chain has some undesirable properties
• high variance - different runs can lead to very different answers
• low apparent variance within each individual run (appears stable)
• state can collapse to single peak in time roughly linear in number
of samples
• tracker may appear to follow peaks in the posterior even in the absence
of any meaningful measurements.
These properties are generally known as “sample impoverishment”
Stationary Analysis
For simplicity, we focus on tracking problems with
stationary distributions (posterior should be the same
at any time step).
[because it is hard to really focus on what is going on
when the posterior modes are deterministically
moving around. Any movement of modes in our
analysis will be due to behavior of the particle filter]
A Simple PMF State Space
Consider 10 particles representing a probability mass
function over 2 locations.
PMF state space (counts of particles in locations 1 and 2):
{(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}
e.g. (4,6) means 4 particles in location 1 and 6 in location 2.
We will now instantiate a particular two-state filtering model
that we can analyze in closed-form, and explore the Markov
chain process (on the PMF state space above) that describes
how particle filtering performs on that process.
Discrete, Stationary, No Noise
Assume a stationary process model with no noise:
Xk+1 = F Xk + vk,  with F = I (identity) and vk = 0 (no noise),
so the process model is simply Xk+1 = Xk.
Perfect Two-State Ambiguity
Let our two filtering states be {a,b}.
We define both the prior distribution and the observation
model to be ambiguous (equal belief in a and b):
P(X0 = a) = .5,  P(X0 = b) = .5
P(Z | Xk = a) = .5,  P(Z | Xk = b) = .5
and from the process model, the transition matrix over {a, b}
is the identity:
P(Xk+1 | Xk) = [1 0; 0 1]
Recall: Recursive Filtering
Prediction:
P(xk | z1:k-1) = ∫ P(xk | xk-1) P(xk-1 | z1:k-1) dxk-1
(state transition × previous estimated state → predicted current state)
Update:
P(xk | z1:k) = P(zk | xk) P(xk | z1:k-1) / P(zk | z1:k-1)
(measurement × predicted current state → estimated current state;
the denominator is the normalization term)
These are exact propagation equations.
Analytic Filter Analysis
Predict:
[1 0; 0 1] [.5; .5] = [.5; .5]
Update:
P(a | z) = (.5)(.5) / ((.5)(.5) + (.5)(.5)) = .25 / (.25 + .25) = .5
P(b | z) = (.5)(.5) / ((.5)(.5) + (.5)(.5)) = .25 / (.25 + .25) = .5
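The predict/update arithmetic can be checked numerically; this is a direct transcription of the two-state model, and the loop just confirms the posterior is stationary:

```python
import numpy as np

F = np.array([[1.0, 0.0],          # P(X_k+1 | X_k): identity (no noise)
              [0.0, 1.0]])
likelihood = np.array([0.5, 0.5])  # ambiguous observation model
posterior = np.array([0.5, 0.5])   # ambiguous prior over {a, b}

for _ in range(10):                      # iterate the exact filter
    predicted = F @ posterior            # predict: stays (.5, .5)
    unnorm = likelihood * predicted      # update numerator: (.25, .25)
    posterior = unnorm / unnorm.sum()    # normalize: back to (.5, .5)
# posterior remains [0.5, 0.5] for every k
```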
Analytic Filter Analysis
Therefore, for all k, the posterior distribution is
P(Xk | z1:k) = .5 Xk = a .5 Xk = b
which agrees with our intuition regarding the
stationarity and ambiguity of our two-state model.
Now let’s see how a particle filter behaves...
Particle Filter
Consider 10 particles representing a probability mass
function over our 2 locations {a,b}.
In accordance with our ambiguous prior, we will initialize
with 5 particles in each location:
P(X0): configuration (5,5), five particles at a and five at b.
Condensation (SIR) Particle Filter
1) Select N new samples with replacement, according
to the sample weights (the weights are equal in this case)
2) Apply process model to each sample (deterministic
motion + noise) (a no-op in this case)
3) For each new position, set the weight of the particle in
accordance with the observation probability (all weights become .5 in this case)
4) Normalize weights so they sum to one (the weights are still equal)
Condensation as Markov Chain (Key Step)
Recall that 10 particles representing a probability
mass function over 2 locations can be thought of as
having a state space with 11 elements:
{(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}
current configuration: (5,5)
Condensation as Markov Chain (Key Step)
We want to characterize the probability that the
particle filter procedure will transition from the
current configuration to a new configuration:
{(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}
current configuration: (5,5) → ?
Let P(j | i) be the probability of transitioning
from configuration (i, 10-i) to (j, 10-j).
Example
N=10 samples
Starting from (5,5), a single resampling step can move the
configuration to neighboring states, for example to (3,7) with
probability .1172, to (4,6) with probability .2051, staying at
(5,5) with probability .2461, or to (6,4) with probability .2051.
The distribution P(j | 5) over j = 0...10 is peaked at j = 5,
with maximum just under .25.
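These probabilities follow from resampling with equal weights: each of the 10 draws independently lands in location a with probability i/10, so P(j | i) is the Binomial(10, i/10) pmf. A quick check of the numbers on the slide:

```python
from math import comb

def p_trans(j, i, n=10):
    """P(j | i): probability of moving from configuration (i, n-i) to
    (j, n-j) when n equally weighted particles are resampled with
    replacement (each draw lands in location a with probability i/n)."""
    p = i / n
    return comb(n, j) * p**j * (1 - p)**(n - j)

probs = [round(p_trans(j, 5), 4) for j in (3, 4, 5, 6, 7)]
# probs == [0.1172, 0.2051, 0.2461, 0.2051, 0.1172], matching the slide
```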
Full Transition Table
[Figure: the full 11×11 table of transition probabilities P(j | i),
for i, j = 0...10, shown alongside the row P(j | 5), which peaks
near .25 at j = 5.]
The Crux of the Problem
From (5,5), there is a good chance we will jump away
from (5,5), say to (6,4) [distribution P(j|5)].
Once we do that, we are no longer sampling from the transition
distribution at (5,5), but from the one at (6,4) [distribution
P(j|6)]. But this is biased off center from (5,5).
And so on [P(j|7), ...]. The behavior will be
similar to that of a random walk.
Another Problem
[Figure: the transition table P(j | i) again, highlighting the corners:
P(0 | 0) = 1 and P(10 | 10) = 1.]
(0,10) and (10,0) are absorbing states!
Observations
• The Markov chain has two absorbing states
(0,10) and (10,0)
• Once the chain gets into either of these two
states, it can never get out (all the particles
have collapsed into a single bucket)
• There is a nonzero probability of getting into
either absorbing state, starting from (5,5)
These are the seeds of our destruction!
Simulation
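The simulation can be reconstructed in a few lines (an illustrative reconstruction, not the original simulator): with equal likelihoods, each Condensation iteration just resamples, so the count of particles in location a performs a random walk until it absorbs at 0 or N.

```python
import numpy as np

rng = np.random.default_rng(6)

def run_until_absorbed(n=10, max_steps=10_000):
    """Two-state Condensation with ambiguous likelihoods: the count of
    particles in location a after resampling is Binomial(n, i/n).
    Returns the step at which all particles collapse into one state."""
    i = n // 2                        # start at configuration (n/2, n/2)
    for t in range(1, max_steps + 1):
        i = rng.binomial(n, i / n)    # one resampling step
        if i == 0 or i == n:
            return t                  # absorbed at (0,n) or (n,0)
    return max_steps

times = [run_until_absorbed() for _ in range(100)]
# every run collapses, typically within a few tens of steps for n = 10
```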
Some sample runs with 10 particles
More Sample Runs
[Figure: sample runs at N=10, N=20, and N=100 particles.]
Average Time to Absorption
[Plot: average time to absorption vs. number of particles N.
Dots: from running the simulator (100 trials at N=10,20,30...)
Line: plot of 1.4 N, the asymptotic analytic estimate (King and Forsyth)]
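Averaging absorption times over many trials reproduces the roughly linear growth in N (the bounds in the sketch below are deliberately loose, since individual absorption times vary a lot):

```python
import numpy as np

rng = np.random.default_rng(7)

def absorption_time(n, max_steps=100_000):
    i = n // 2
    for t in range(1, max_steps + 1):
        i = rng.binomial(n, i / n)    # one resampling step
        if i == 0 or i == n:
            return t
    return max_steps

def mean_absorption_time(n, trials=100):
    return float(np.mean([absorption_time(n) for _ in range(trials)]))

avg10 = mean_absorption_time(10)   # expect roughly 1.4 * 10
avg20 = mean_absorption_time(20)   # expect roughly 1.4 * 20
```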
More Generally
Implications of stationary process model with no
noise, in a discrete state space.
• any time any bucket contains zero particles, it will
forever after have zero particles (for that run).
• there is typically a nonzero probability of getting
zero particles in a bucket sometime during the run.
• thus, over time, the particles will inevitably
collapse into a single bucket.
Extending to Continuous Case
A similar thing happens in more realistic cases. Consider a
continuous case with two stationary modes in the likelihood,
and where each mode has small variance with respect to
distance between modes.
[Figure: a likelihood with two narrow, well-separated modes, mode1 and mode2.]
Extending to Continuous Case
The process noise variance, being very small relative to the distance
between modes, is fatal to any particles that try to cross from one
mode to the other via diffusion.
Extending to Continuous Case
Each mode thus becomes an isolated island, and we can reduce
this case to our previous two-state analysis (each mode is one
of the discrete states, a or b).