TRANSCRIPT
Penn State
Robert Collins
Sampling Methods: Particle Filtering
CSE586 Computer Vision II
CSE Dept, Penn State Univ
Recall: Importance Sampling
Procedure to estimate EP(f(x)):
1) Generate N samples xi from Q(x)
2) form importance weights wi = P(xi) / Q(xi)
3) compute the empirical estimate of EP(f(x)), the
expected value of f(x) under distribution P(x), as
EP(f(x)) ≈ (1/N) Σi wi f(xi)
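As a concrete sketch (my own toy example, not from the slides): estimating EP(f(x)) for f(x) = x² under a standard normal target P, using a wider normal proposal Q:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

def p(x):  # target density P = N(0, 1)
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def q(x):  # proposal density Q = N(0, sigma=2)
    return np.exp(-x**2 / 8.0) / np.sqrt(8.0 * np.pi)

x = rng.normal(0.0, 2.0, size=N)   # 1) generate N samples from Q
w = p(x) / q(x)                    # 2) form importance weights
estimate = np.mean(w * x**2)       # 3) empirical estimate of EP(x^2)
# the true value is the variance of x under P, i.e. 1
```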
Resampling
Note: We thus have a set of weighted samples (xi, wi | i=1,…,N)
If we really need random samples from P, we can generate them
by resampling such that the likelihood of choosing value xi is
proportional to its weight wi
This would now involve sampling from a discrete distribution
of N possible values (the N values of xi)
Therefore, regardless of the dimensionality of vector x, we are
resampling from a 1D distribution (we are essentially
sampling from the indices 1...N, in proportion to the
importance weights wi). So we can use the inverse
transform sampling method we discussed earlier.
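Inverse transform sampling on the discrete distribution over indices can be sketched as follows (an illustrative helper; the `resample` function and toy values are my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def resample(samples, weights, rng, n=None):
    """Draw n samples from the weighted set by inverse transform
    sampling on the discrete distribution over indices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                  # normalize the weights
    cdf = np.cumsum(w)               # discrete CDF over indices 1..N
    n = len(w) if n is None else n
    idx = np.searchsorted(cdf, rng.random(n))  # invert the CDF
    return np.asarray(samples)[idx]

samples = np.array([-1.0, 0.0, 2.0, 5.0])
weights = [0.1, 0.2, 0.6, 0.1]
new = resample(samples, weights, rng)
# the value 2.0 (weight 0.6) dominates the resampled set in the long run
```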
Sequential Monte Carlo Methods
Sequential Importance Sampling (SIS) and the closely
related algorithm Sampling Importance Resampling (SIR)
are known by various names in the literature:
- bootstrap filtering
- particle filtering
- Condensation algorithm
- survival of the fittest
General idea: Importance sampling on time series data,
with samples and weights updated as each new data
term is observed. Well-suited for simulating recursive
Bayes filtering!
Recall: Bayes Filtering
Two-step Iteration at Each Time t:
Motion Prediction Step:
P(xt | z1:t-1) = ∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1
Data Correction Step (Bayes rule):
P(xt | z1:t) = P(zt | xt) P(xt | z1:t-1) / ∫ P(zt | xt) P(xt | z1:t-1) dxt
Recall: Bayes Filtering
Problem: in general we get intractable integrals
Motion Prediction Step:
P(xt | z1:t-1) = ∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1
Data Correction Step (Bayes rule):
P(xt | z1:t) = P(zt | xt) P(xt | z1:t-1) / ∫ P(zt | xt) P(xt | z1:t-1) dxt
Sequential Monte Carlo Methods
Intuition:
• Represent probability distributions by samples
(called particles).
• Each particle is a “guess” at the true state.
• For each one, simulate its motion update and add
noise to get a motion prediction. Measure the
likelihood of this prediction, and weight the
resulting particles in proportion to their likelihoods.
Back to Bayes Filtering
This integral in the denominator of Bayes rule disappears as a
consequence of representing distributions by a weighted set of
samples. Since we have only a finite number of samples, the
normalization constant will be the sum of the weights!
Data Correction Step (Bayes rule):
Back to Bayes Filtering
Now let’s write the Bayes filter by combining motion
prediction and data correction steps into one equation.
P(xt | z1:t) = c P(zt | xt) ∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1
(new posterior on the left; data term P(zt | xt); motion term P(xt | xt-1); old posterior P(xt-1 | z1:t-1))
Monte Carlo Bayes Filtering
Assume the posterior at time t-1 (which is the prior at time t)
has been approximated as a set of N weighted particles
{(x^i_t-1, w^i_t-1) | i = 1,…,N}, so that
P(xt-1 | z1:t-1) ≈ Σi w^i_t-1 δ(xt-1 - x^i_t-1)
where δ(·) is the Dirac delta function.
Useful property: ∫ f(x) δ(x - a) dx = f(a)
Monte Carlo Bayes Filtering
Then the motion prediction integral simplifies to a summation:

∫ P(xt | xt-1) P(xt-1 | z1:t-1) dxt-1            (motion prediction integral)
≈ ∫ P(xt | xt-1) Σi w^i_t-1 δ(xt-1 - x^i_t-1) dxt-1   (the prior had been approximated by N particles)
= Σi w^i_t-1 ∫ P(xt | xt-1) δ(xt-1 - x^i_t-1) dxt-1   (exchange order of summation and integration)
= Σi w^i_t-1 P(xt | x^i_t-1)                      (property of the Dirac delta function)
Monte Carlo Bayes Filtering
Our Bayes filtering equation thus simplifies as well:

P(xt | z1:t) = c P(zt | xt) Σi w^i_t-1 P(xt | x^i_t-1)   (plugging in the result from the previous page)
             = c Σi w^i_t-1 P(zt | xt) P(xt | x^i_t-1)   (bringing the term that doesn't depend on i into the summation)
Monte Carlo Bayes Filtering
Our new posterior is therefore
P(xt | z1:t) = c Σi w^i_t-1 P(zt | xt) P(xt | x^i_t-1)
but this is still not amenable to closed-form computation for
arbitrary motion models and likelihood functions (e.g. we would
have to integrate it to compute the normalization constant c)
Idea 1: Let's approximate the posterior as a set of N samples!
Idea 2: Hey wait a minute, the prior was already represented as
a set of N samples! Why don't we just “update” each of those?
Monte Carlo Bayes Filtering
Approach: for each sample x^i_t-1, generate a new sample x^i_t from
c w^i_t-1 P(zt | xt) P(xt | x^i_t-1)
by importance sampling, using some convenient proposal
distribution q(xt | x^i_t-1, zt).
So, generate a sample
x^i_t ~ q(xt | x^i_t-1, zt)
and compute its importance weight
w^i_t = w^i_t-1 P(zt | x^i_t) P(x^i_t | x^i_t-1) / q(x^i_t | x^i_t-1, zt)
Monte Carlo Bayes Filtering
We then can approximate our posterior as
P(xt | z1:t) ≈ Σi w^i_t δ(xt - x^i_t)
where the weights have been normalized to sum to one:
w^i_t ← w^i_t / Σj w^j_t
SIS Algorithm
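The SIS loop can be sketched in a few lines of Python. This is a minimal illustrative version; the 1-D state, random-walk motion model, and Gaussian likelihood are my assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)

def sis_step(particles, weights, z, motion_std=1.0, obs_std=1.0):
    """One SIS iteration, using the motion model as the proposal."""
    # sample x_t^i from the proposal: motion prediction plus noise
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # with the prior as proposal, the weight update multiplies the
    # old weight by the likelihood P(z_t | x_t^i)
    weights = weights * np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    return particles, weights / weights.sum()

N = 500
particles = rng.normal(0.0, 1.0, size=N)  # samples from the prior
weights = np.full(N, 1.0 / N)
for z in [0.5, 0.7, 0.6]:                 # a short, made-up observation sequence
    particles, weights = sis_step(particles, weights, z)
posterior_mean = np.sum(weights * particles)
```

Note that nothing here resamples, which is exactly why the weight degeneracy discussed next sets in after enough iterations.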
SIS Degeneracy
Unfortunately, pure SIS suffers from degeneracy. In
many cases, after a few iterations, all but one particle
will have negligible weight.
Illustration of degeneracy:
[Figure: particle weights w plotted at Time 1, Time 10, and Time 19;
over time, nearly all the weight mass concentrates onto a single particle.]
Resampling to Combat Degeneracy
Sampling with replacement to get N new samples, each
having equal weight 1/N
Samples with high weight get replicated
Samples with low weight die off
Concentrates particles in areas of higher probability
Generic Particle Filter
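A generic particle filter step can be sketched as below. This is an illustrative version with assumed 1-D models; the effective-sample-size trigger for resampling is one common design choice, not something specified on the slide:

```python
import numpy as np

rng = np.random.default_rng(3)

def particle_filter_step(particles, weights, z,
                         motion_std=1.0, obs_std=1.0, ess_frac=0.5):
    N = len(particles)
    # 1) motion prediction: propagate each particle, add process noise
    particles = particles + rng.normal(0.0, motion_std, size=N)
    # 2) data correction: reweight by the likelihood P(z | x)
    weights = weights * np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    weights = weights / weights.sum()
    # 3) resample only when the effective sample size drops too low
    ess = 1.0 / np.sum(weights ** 2)
    if ess < ess_frac * N:
        idx = rng.choice(N, size=N, p=weights)  # with replacement
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    return particles, weights

# usage: filter a short, made-up observation sequence
particles = rng.normal(0.0, 1.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
for z in [0.2, 0.4, 0.6, 0.8]:
    particles, weights = particle_filter_step(particles, weights, z)
```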
Sample Importance Resample (SIR)
SIR is a special case of the generic particle filter where:
- the prior density is used as the proposal density
- resampling is done every iteration
therefore
q(xt | x^i_t-1, zt) = P(xt | x^i_t-1)
and thus, after cancellation (and since the old weights
are all equal due to resampling),
w^i_t ∝ P(zt | x^i_t)
SIR Algorithm
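A SIR step specializes the generic filter accordingly: the new weights are just the likelihoods, and we resample every iteration. Again an illustrative sketch with an assumed 1-D state and Gaussian models:

```python
import numpy as np

rng = np.random.default_rng(4)

def sir_step(particles, z, motion_std=1.0, obs_std=1.0):
    """One SIR (Condensation) iteration. With the prior as proposal
    and resampling every step, incoming weights are all 1/N, so the
    new (unnormalized) weights are just the likelihoods."""
    N = len(particles)
    particles = particles + rng.normal(0.0, motion_std, size=N)  # predict
    w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)          # weight
    w = w / w.sum()
    idx = rng.choice(N, size=N, p=w)                             # resample
    return particles[idx]  # equally weighted output

particles = rng.normal(0.0, 1.0, size=500)
for z in [2.0] * 5:        # repeatedly observe the same made-up value
    particles = sir_step(particles, z)
# the particle cloud should settle near the observed value 2.0
```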
Drawing from the Prior Density
xk = fk(xk-1, vk-1),  where v is process noise
Note: when we use the prior as the importance density, we only
need to sample from the process noise distribution (typically
uniform or Gaussian).
Why? Recall the process model xk = fk(xk-1, vk-1).
Thus we can sample from the prior P(xk | xk-1) by starting with
sample x^i_k-1, generating a noise vector v^i_k-1 from the noise
process, and forming the noisy sample
x^i_k = fk(x^i_k-1, v^i_k-1)
If the noise is additive, this leads to a very simple interpretation:
move each particle using motion prediction, then add noise.
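Under the additive-noise interpretation this is only a couple of lines; the constant-velocity motion function and Gaussian noise below are illustrative choices of my own, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_from_prior(particles, velocity=1.0, noise_std=0.5):
    """x_k^i = f_k(x_k-1^i, v_k-1^i): deterministic motion prediction
    followed by additive process noise."""
    predicted = particles + velocity     # f_k (assumed constant velocity)
    noise = rng.normal(0.0, noise_std, size=particles.shape)  # v_k-1^i
    return predicted + noise

particles = np.zeros(10_000)
moved = sample_from_prior(particles)
# moved has mean ~1.0 (the motion) and std ~0.5 (the process noise)
```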
SIR Filtering Illustration
[Figure: one cycle of SIR with M particles. Starting from an equally
weighted sample set {x^(m)_k-1, 1/M}, resampling yields {x~^(m)_k-1, 1/M};
propagating through the motion model gives the prediction {x^(m)_k, 1/M};
weighting by the likelihood gives {x^(m)_k, w^(m)_k}; and resampling
restores equal weights for the next cycle.]
Problems with SIS/SIR
Degeneracy: in SIS, after several iterations all samples
except one tend to have negligible weight. Thus a lot of
computational effort is spent on particles that make no
contribution. Resampling is supposed to fix this, but
also causes a problem...
Sample Impoverishment: in SIR, after several iterations
all samples tend to collapse into a single state. The
ability to represent multimodal distributions is thus
short-lived.
CSE598G
Robert Collins
Particle Filter Failure Analysis
References
King and Forsyth, “How Does CONDENSATION Behave with
a Finite Number of Samples?” ECCV 2000, 695-709.
Karlin and Taylor, A First Course in Stochastic Processes, 2nd
edition, Academic Press, 1975.
Particle Filter Failure Analysis
Summary
Condensation/SIR is asymptotically correct as the number of samples tends
to infinity. However, as a practical matter, it has to be run with a finite
number of samples.
Iterations of Condensation form a Markov chain whose state space is quantized
representations of a density.
This Markov chain has some undesirable properties
• high variance - different runs can lead to very different answers
• low apparent variance within each individual run (appears stable)
• state can collapse to single peak in time roughly linear in number
of samples
• tracker may appear to follow peaks in the posterior even in the absence
of any meaningful measurements.
These properties are generally known as “sample impoverishment”
Stationary Analysis
For simplicity, we focus on tracking problems with
stationary distributions (posterior should be the same
at any time step).
[because it is hard to really focus on what is going on
when the posterior modes are deterministically
moving around. Any movement of modes in our
analysis will be due to behavior of the particle filter]
A Simple PMF State Space
Consider 10 particles representing a probability mass
function over 2 locations.
PMF state space (counts of particles in locations 1 and 2):
{(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}
e.g. (4,6) means 4 particles in location 1 and 6 in location 2.
We will now instantiate a particular two-state filtering model
that we can analyze in closed-form, and explore the Markov
chain process (on the PMF state space above) that describes
how particle filtering performs on that process.
Discrete, Stationary, No Noise
Assume a stationary process model with no noise:
Xk+1 = F Xk + vk,  with F = I (identity) and vk = 0 (no noise),
so the process model is simply Xk+1 = Xk.
Perfect Two-State Ambiguity
Let our two filtering states be {a,b}.
We define both the prior distribution and the observation
model to be ambiguous (equal belief in a and b):
P(X0 = a) = .5,  P(X0 = b) = .5
P(Z | Xk = a) = .5,  P(Z | Xk = b) = .5
and from the process model, the transition matrix over {a, b}
is the identity:
P(Xk+1 | Xk) = [1 0; 0 1]
Recall: Recursive Filtering
Prediction:
P(xk | z1:k-1) = ∫ P(xk | xk-1) P(xk-1 | z1:k-1) dxk-1
(state transition × previous estimated state → predicted current state)
Update:
P(xk | z1:k) = P(zk | xk) P(xk | z1:k-1) / P(zk | z1:k-1)
(measurement × predicted current state → estimated current state;
the denominator is the normalization term)
These are exact propagation equations.
Analytic Filter Analysis
Predict:
[1 0; 0 1] [.5; .5] = [.5; .5]
Update:
P(a | z) = (.5)(.5) / ((.5)(.5) + (.5)(.5)) = .25 / (.25 + .25) = .5
P(b | z) = (.5)(.5) / ((.5)(.5) + (.5)(.5)) = .25 / (.25 + .25) = .5
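The predict/update arithmetic can be checked numerically; this is a direct transcription of the two-state model, and the loop just confirms the posterior is stationary:

```python
import numpy as np

F = np.array([[1.0, 0.0],          # P(X_k+1 | X_k): identity (no noise)
              [0.0, 1.0]])
likelihood = np.array([0.5, 0.5])  # ambiguous observation model
posterior = np.array([0.5, 0.5])   # ambiguous prior over {a, b}

for _ in range(10):                      # iterate the exact filter
    predicted = F @ posterior            # predict: stays (.5, .5)
    unnorm = likelihood * predicted      # update numerator: (.25, .25)
    posterior = unnorm / unnorm.sum()    # normalize: back to (.5, .5)
# posterior remains [0.5, 0.5] for every k
```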
Analytic Filter Analysis
Therefore, for all k, the posterior distribution is
P(Xk | z1:k) = .5 Xk = a .5 Xk = b
which agrees with our intuition regarding the
stationarity and ambiguity of our two-state model.
Now let’s see how a particle filter behaves...
Particle Filter
Consider 10 particles representing a probability mass
function over our 2 locations {a,b}.
In accordance with our ambiguous prior, we will initialize
with 5 particles in each location:
P(X0): configuration (5,5), five particles at a and five at b.
Condensation (SIR) Particle Filter
1) Select N new samples with replacement, according
to the sample weights (the weights are equal in this case)
2) Apply process model to each sample (deterministic
motion + noise) (a no-op in this case)
3) For each new position, set the weight of the particle in
accordance with the observation probability (all weights become .5 in this case)
4) Normalize weights so they sum to one (the weights are still equal)
Condensation as Markov Chain (Key Step)
Recall that 10 particles representing a probability
mass function over 2 locations can be thought of as
having a state space with 11 elements:
{(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}
current configuration: (5,5)
Condensation as Markov Chain (Key Step)
We want to characterize the probability that the
particle filter procedure will transition from the
current configuration to a new configuration:
{(0,10)(1,9)(2,8)(3,7)(4,6)(5,5)(6,4)(7,3)(8,2)(9,1)(10,0)}
current configuration: (5,5) → ?
Let P(j | i) be the probability of transitioning
from configuration (i, 10-i) to (j, 10-j).
Example
N=10 samples
Starting from (5,5), a single resampling step can move the
configuration to neighboring states, for example to (3,7) with
probability .1172, to (4,6) with probability .2051, staying at
(5,5) with probability .2461, or to (6,4) with probability .2051.
The distribution P(j | 5) over j = 0...10 is peaked at j = 5,
with maximum just under .25.
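These probabilities follow from resampling with equal weights: each of the 10 draws independently lands in location a with probability i/10, so P(j | i) is the Binomial(10, i/10) pmf. A quick check of the numbers on the slide:

```python
from math import comb

def p_trans(j, i, n=10):
    """P(j | i): probability of moving from configuration (i, n-i) to
    (j, n-j) when n equally weighted particles are resampled with
    replacement (each draw lands in location a with probability i/n)."""
    p = i / n
    return comb(n, j) * p**j * (1 - p)**(n - j)

probs = [round(p_trans(j, 5), 4) for j in (3, 4, 5, 6, 7)]
# probs == [0.1172, 0.2051, 0.2461, 0.2051, 0.1172], matching the slide
```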
Full Transition Table
[Figure: the full 11×11 table of transition probabilities P(j | i),
for i, j = 0...10, shown alongside the row P(j | 5), which peaks
near .25 at j = 5.]
The Crux of the Problem
From (5,5), there is a good chance we will jump away
from (5,5), say to (6,4) [distribution P(j|5)].
Once we do that, we are no longer sampling from the transition
distribution at (5,5), but from the one at (6,4) [distribution
P(j|6)]. But this is biased off center from (5,5).
And so on [P(j|7), ...]. The behavior will be
similar to that of a random walk.
Another Problem
[Figure: the transition table P(j | i) again, highlighting the corners:
P(0 | 0) = 1 and P(10 | 10) = 1.]
(0,10) and (10,0) are absorbing states!
Observations
• The Markov chain has two absorbing states
(0,10) and (10,0)
• Once the chain gets into either of these two
states, it can never get out (all the particles
have collapsed into a single bucket)
• There is a nonzero probability of getting into
either absorbing state, starting from (5,5)
These are the seeds of our destruction!
Simulation
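The simulation can be reconstructed in a few lines (an illustrative reconstruction, not the original simulator): with equal likelihoods, each Condensation iteration just resamples, so the count of particles in location a performs a random walk until it absorbs at 0 or N.

```python
import numpy as np

rng = np.random.default_rng(6)

def run_until_absorbed(n=10, max_steps=10_000):
    """Two-state Condensation with ambiguous likelihoods: the count of
    particles in location a after resampling is Binomial(n, i/n).
    Returns the step at which all particles collapse into one state."""
    i = n // 2                        # start at configuration (n/2, n/2)
    for t in range(1, max_steps + 1):
        i = rng.binomial(n, i / n)    # one resampling step
        if i == 0 or i == n:
            return t                  # absorbed at (0,n) or (n,0)
    return max_steps

times = [run_until_absorbed() for _ in range(100)]
# every run collapses, typically within a few tens of steps for n = 10
```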
Some sample runs with 10 particles
More Sample Runs
[Figure: sample runs at N=10, N=20, and N=100 particles.]
Average Time to Absorption
[Plot: average time to absorption vs. number of particles N.
Dots: from running the simulator (100 trials at N=10,20,30...)
Line: plot of 1.4 N, the asymptotic analytic estimate (King and Forsyth)]
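Averaging absorption times over many trials reproduces the roughly linear growth in N (the bounds in the sketch below are deliberately loose, since individual absorption times vary a lot):

```python
import numpy as np

rng = np.random.default_rng(7)

def absorption_time(n, max_steps=100_000):
    i = n // 2
    for t in range(1, max_steps + 1):
        i = rng.binomial(n, i / n)    # one resampling step
        if i == 0 or i == n:
            return t
    return max_steps

def mean_absorption_time(n, trials=100):
    return float(np.mean([absorption_time(n) for _ in range(trials)]))

avg10 = mean_absorption_time(10)   # expect roughly 1.4 * 10
avg20 = mean_absorption_time(20)   # expect roughly 1.4 * 20
```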
More Generally
Implications of stationary process model with no
noise, in a discrete state space.
• any time any bucket contains zero particles, it will
forever after have zero particles (for that run).
• there is typically a nonzero probability of getting
zero particles in a bucket sometime during the run.
• thus, over time, the particles will inevitably
collapse into a single bucket.
Extending to Continuous Case
A similar thing happens in more realistic cases. Consider a
continuous case with two stationary modes in the likelihood,
and where each mode has small variance with respect to
distance between modes.
[Figure: a likelihood with two narrow, well-separated modes, mode1 and mode2.]
Extending to Continuous Case
The process noise variance, being very small relative to the distance
between modes, is fatal to any particles that try to cross from one
mode to the other via diffusion.
Extending to Continuous Case
Each mode thus becomes an isolated island, and we can reduce
this case to our previous two-state analysis (each mode is one
of the discrete states, a or b).