
  • The Unscented Particle Filter

    Rudolph van der Merwe (OGI)
    Nando de Freitas (UC Berkeley)
    Arnaud Doucet (Cambridge University)
    Eric Wan (OGI)

    Slides from: cslu.ece.ogi.edu/publications/ps/UPF_CSLU_talk.pdf
    Mirror: www.cavs.msstate.edu/~patil/particle_filter/UPF_CSLU_talk.pdf

  • Outline

    - Optimal Estimation & Filtering
    - Optimal Recursive Bayesian Solution
    - Practical Solutions
      - Gaussian approximations (EKF, UKF)
      - Sequential Monte Carlo methods (Particle Filters)
    - The Unscented Particle Filter
      - The Unscented Transformation and UKF
      - Applications of UT/UKF to Particle Filters
    - Experimental Results
    - Conclusions

  • Filtering

    - General problem statement

      Filtering is the problem of sequentially estimating the states (parameters or hidden variables) of a system as a set of observations becomes available on-line.

    [Figure: hidden Markov model. Unobserved state sequence … → x_{k-2} → x_{k-1} → x_k with transition density p(x_k | x_{k-1}); observed measurements y_{k-2}, y_{k-1}, y_k generated from the states with likelihood p(y_k | x_k).]

    Note: lower-case y's are instantaneous observations; capital Y's are the observation sequence. Similarly for the states denoted x, X.

    Note: in speech, y would be utterances or speech data; x would be articulator positions or vocal tract parameters.

    Note: p(y_k | x_k) is a nice representation of the probability of the current observation given the state, and is more appropriate than y = f(x), where f would be some linear or nonlinear function.

    Note: p(y_k | x_k) is the emission probability (y_k is a noisy observation); p(x_k | x_{k-1}) is the forward transition probability.

    Note: this is the typical case in practice: the output is available or measurable, and we either try to estimate the input or try to model the system (estimate its transfer function).

    Note: an observation is related only to the current state, not directly to previous samples; any dependence on earlier samples is mediated by the current state. Hence y_k = h(x_k, input, measurement noise) on the state-space slide. Likewise, the current state is a function of the previous state, input, and process noise, and not of the observations per se.

    Note: our assumption is a Markovian, nonlinear, non-Gaussian state-space model. The unobserved signal (hidden state) is modelled as a Markov process with initial distribution p(x_0) and transition density p(x_k | x_{k-1}); the observations y_k are assumed conditionally independent given the process x_k, with marginal distribution p(y_k | x_k). Reference: "Sequential Monte Carlo Methods in Practice," Doucet, de Freitas, Gordon.

    Note: our aim is to estimate recursively in time: (1) the posterior distribution p(X_k | Y_k); (2) its associated features, including the marginal filtering distribution p(x_k | Y_k); (3) expectations of functions integrable w.r.t. the posterior, e.g. the conditional mean or conditional covariance.

  • Filtering

    - The solution of the sequential estimation problem is given by the posterior density

        p(X_k | Y_k),   where  X_k = {x_1, x_2, …, x_k},  Y_k = {y_1, y_2, …, y_k}

    - By recursively computing a marginal of the posterior, the filtering density

        p(x_k | Y_k),

      one need not keep track of the complete history of the states.

    Note: this is a posterior density because we estimate the current state after the current observation has been made; it is related to the evidence and the prior.

    Note: the word POSTERIOR implies measuring a quantity (or sequence) after the observations have been made. Here p(X_k | Y_k) = p(x_k, x_{k-1}, …, x_0 | y_k, y_{k-1}, …, y_0): the conditional joint probability of X_k given Y_k.

  • Filtering

    - Given the filtering density, a number of estimates of the system state can be calculated:

      - Mean (optimal MMSE estimate of the state):

          x̂_k = E[x_k | Y_k] = ∫ x_k p(x_k | Y_k) dx_k

      - Mode
      - Median
      - Confidence intervals
      - Kurtosis, etc.

    Note: at this point nothing has been said yet about how p(x_k | Y_k) is actually computed.

    Note: the mode is the value at which the density is maximal. The median is the value in the middle of the distribution when values are arranged in ascending or descending order. Confidence intervals give the degree of certainty with which the value lies within given thresholds. Kurtosis is related to the fourth-order moment of the distribution; see http://www.riskglossary.com/articles/kurtosis.htm or any statistics text.

    Note: cf. slide 3; this is the conditional mean, i.e. the expectation of "some function", here f(x_k) = x_k.

  • State Space Formulation of System

    - General discrete-time nonlinear, non-Gaussian dynamic system:

        x_k = f(x_{k-1}, u_{k-1}, v_{k-1})    (state)
        y_k = h(x_k, u_k, n_k)                (noisy observation)

      where f is the process model, h the measurement model, u_k a known input, v_k the process noise, and n_k the measurement noise.

    - Assumptions:

      1) States follow a first-order Markov process:

          p(x_k | x_{k-1}, x_{k-2}, …, x_0) = p(x_k | x_{k-1})

      2) Observations are independent given the states:

          p(y_k | x_k, A) = p(y_k | x_k)

    Note: these assumptions are laid down within the format of this discussion; they are not forced by the state-space formulation itself. Their validity needs to be checked for a given nonlinear system, but to start with they offer simplicity in the calculations.

    Note: assumption 1 says the current state depends only on the previous state. Assumption 2 is used in finding the estimate of the filtering density; in it, A is any single observation y_n or any set of observations Y_n, with n between 0 and k.

    Note: the assumptions are specific to nonlinear state-space models; see www.cavs.msstate.edu/~patil/particle_filter/Nonlinear_state_space_models.pdf, section 2.1 "The Probabilistic Model" (page 3 of 8).

  • Recursive Bayesian Estimation

    - Given this state-space model, how do we recursively estimate the filtering density?

        p(x_k | Y_k) = p(Y_k | x_k) p(x_k) / p(Y_k)

                     = p(y_k, Y_{k-1} | x_k) p(x_k) / p(y_k, Y_{k-1})

                     = p(y_k | x_k, Y_{k-1}) p(Y_{k-1} | x_k) p(x_k) / [ p(y_k | Y_{k-1}) p(Y_{k-1}) ]

                     = p(y_k | x_k, Y_{k-1}) p(x_k | Y_{k-1}) p(Y_{k-1}) p(x_k) / [ p(y_k | Y_{k-1}) p(Y_{k-1}) p(x_k) ]

                     = p(y_k | x_k) p(x_k | Y_{k-1}) / p(y_k | Y_{k-1})

    Note: the final line is the update stage; p(y_k | Y_{k-1}) is the normalizing factor (constant), which depends on the likelihood p(y_k | x_k) defined in the measurement model.

    Note: the first equality is Bayes' rule, p(a | b) = p(b | a) p(a) / p(b). In the second line, p(Y_k | x_k) = p(y_k, y_{k-1}, …, y_0 | x_k) = p(y_k, Y_{k-1} | x_k), grouping y_{k-1}, …, y_0 as Y_{k-1}; the numerator is then the joint distribution of the current observation and the previous observation sequence given the current state, and the denominator is the corresponding joint distribution of the observations.

    Note: the expansion into the third and fourth lines uses the chain rule p(a, b) = p(a | b) p(b); the last line uses the assumption that observations are independent given the states.

    Note: see also www.cavs.msstate.edu/~patil/particle_filter/Tutorial_on_particlefilter.pdf

  • Recursive Bayesian Estimation

    - Prior (prediction stage: propagation of the past state into the future before the new observation is made):

        p(x_k | Y_{k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | Y_{k-1}) dx_{k-1}

      where the transition density p(x_k | x_{k-1}) is given by the process model.

    - Likelihood: p(y_k | x_k), defined in terms of the observation model.

    - Evidence:

        p(y_k | Y_{k-1}) = ∫ p(y_k | x_k) p(x_k | Y_{k-1}) dx_k

    - Update stage (posterior = likelihood × prior / evidence):

        p(x_k | Y_k) = p(y_k | x_k) p(x_k | Y_{k-1}) / p(y_k | Y_{k-1})

    Note: p(y_k | Y_{k-1}) is called the normalizing constant; cf. eqns (3) and (5) of the February 2002 IEEE Transactions on Signal Processing tutorial, p. 175. The posterior p(x_k | Y_k) is the quantity we want to estimate.

    Note: p(x_k | x_{k-1}) and p(x_k | Y_{k-1}) are called densities because the state is continuous, even though the observation sequence Y_{k-1} is discrete. In the prior integral, Y_{k-1} contributes only through p(x_{k-1} | Y_{k-1}); it does not appear in the first term because, by the Markov assumption, p(x_k | x_{k-1}, Y_{k-1}) = p(x_k | x_{k-1}).

    Note: predicting the next observation from the previously available observation sequence via the evidence equation is not based on an MSE criterion; the estimate is the most likely (probable) value. There is more than one way to estimate the next observation; the evidence equation is one of them.
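    A worked sketch of this recursion (not from the talk): on a discretized (grid) state space both the prediction and update integrals become sums. The Python below assumes a hypothetical scalar model with a Gaussian random-walk transition and a Gaussian likelihood; every name and constant is illustrative.

        import numpy as np

        grid = np.linspace(-5.0, 5.0, 201)     # discretized state space for x_k
        dx = grid[1] - grid[0]

        def gauss(x, mu, var):
            return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

        def predict(post_prev, q_var=0.5):
            # prior: p(x_k | Y_{k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | Y_{k-1}) dx_{k-1}
            trans = gauss(grid[:, None], grid[None, :], q_var)  # p(x_k | x_{k-1})
            return trans @ post_prev * dx

        def update(prior, y, r_var=1.0):
            # update: p(x_k | Y_k) = p(y_k | x_k) p(x_k | Y_{k-1}) / p(y_k | Y_{k-1})
            lik = gauss(y, grid, r_var)        # likelihood p(y_k | x_k)
            unnorm = lik * prior
            evidence = unnorm.sum() * dx       # p(y_k | Y_{k-1})
            return unnorm / evidence

        posterior = gauss(grid, 0.0, 1.0)      # initial density p(x_0)
        for y in (0.3, -0.1, 0.8):             # made-up observation stream
            posterior = update(predict(posterior), y)
        print(grid @ posterior * dx)           # conditional mean E[x_k | Y_k]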

  • Practical Solutions

    - Gaussian Approximations
    - Perfect Monte Carlo Simulation
    - Sequential Monte Carlo Methods: "Particle Filters"
      - Bayesian Importance Sampling
      - Sampling-importance resampling (SIR)

    Note: Gaussian approximations include the Kalman-filter family (extended KF, unscented KF) as well as grid-based filters and Gaussian-sum filters.

    Note: HMM filters are an application of approximate grid-based methods in a fixed-interval smoothing context and have been used extensively in speech processing. In HMM-based tracking, the Viterbi algorithm computes the maximum a posteriori estimate of the path through the trellis; another approach is the Baum-Welch algorithm. Both are typically applied when the state space is approximated as discrete, and they are optimal only if the underlying state space is truly discrete in nature.

    Note: particle filtering goes by several names: bootstrap filtering, the condensation algorithm, interacting particle filtering, survival of the fittest. It is a technique for implementing a recursive Bayesian filter by Monte Carlo simulation, with the weights chosen using the principle of importance sampling.

    Note: importance sampling is a variance-reduction technique for efficient Monte Carlo estimation of rare-event probabilities; in other words, a method for reducing the variance of the estimate of an expectation by carefully choosing a sampling distribution.

  • Gaussian Approximations

    - Most common approach.
    - Assume all RV statistics are Gaussian.
    - The optimal recursive MMSE estimate is then given by

        x̂_k = (prediction of x_k) + K_k [ y_k − (prediction of y_k) ]

      where y_k is the observation and K_k the gain.

    - Different implementations:

      - Extended Kalman Filter (EKF): optimal quantities approximated via first-order Taylor series expansion (linearization) of the process and measurement models.
      - Unscented Kalman Filter (UKF): optimal quantities calculated using the Unscented Transformation (accurate to second order for any nonlinearity). Drastic improvement over the EKF [Wan, van der Merwe, Nelson 2000].

    - Problem: the Gaussian approximation breaks down for most nonlinear real-world applications (multi-modal distributions, non-Gaussian noise sources, etc.)

    Note: for the use of the unscented Kalman filter in speech processing, see http://cslu.cse.ogi.edu/nsel/ukf/ (mirrored at www.cavs.msstate.edu/~patil/particle_filter/Ukf_final_speech.pdf). For the basics of Kalman filtering, see http://www.cs.unc.edu/~welch/media/pdf/kalman_intro.pdf

    Note: the update is like refining a predicted value toward a local (or global) optimum; the correction is the gain K_k times the innovation, i.e. the difference between the observed and predicted observation.

  • Perfect Monte Carlo Simulation

    - Allows for a complete representation of the posterior distribution.
    - Maps the intractable integrals of the optimal Bayesian solution to tractable discrete sums of weighted samples drawn from the posterior distribution:

        p̂(x_k | Y_k) = (1/N) Σ_{i=1}^{N} δ(x_k − x_k^(i)),    x_k^(i) ~ p(x_k | Y_k)  (i.i.d.)

    - So, any estimate of the form

        E[f(x_k)] = ∫ f(x_k) p(x_k | Y_k) dx_k

      may be approximated by:

        E[f(x_k)] ≈ (1/N) Σ_{i=1}^{N} f(x_k^(i))

    Note: i.i.d. = independent, identically distributed; see http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html. The index i labels the i-th sample (particle) x_k^(i).

    Note: the integrals are intractable in the sense that they cannot, in general, be evaluated in closed form (not easily managed or computed directly). For the same derivation with more explanation, see pages 26-28 (chapter 2) of http://cbcl.mit.edu/projects/cbcl/publications/theses/thesis-shelton.pdf
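    A minimal sketch of this approximation (not from the talk), pretending the posterior is a Gaussian we can sample from directly; the density and the test function f are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)
        N = 100_000
        # pretend p(x_k | Y_k) is N(1.0, 0.5^2) and draw i.i.d. samples x_k^(i)
        samples = rng.normal(1.0, 0.5, size=N)

        f = lambda x: x ** 2                   # any function integrable w.r.t. p
        print(f(samples).mean())               # (1/N) Σ_i f(x_k^(i))
        print(1.0 ** 2 + 0.5 ** 2)             # exact E[x^2] = μ² + σ², for comparison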

  • Particle Filters

    - Bayesian Importance Sampling

      - It is often impossible to sample directly from the true posterior density.
      - However, we can instead sample from a known, easy-to-sample proposal distribution q(x_k | Y_k) and make use of the following substitution:

    ( | )k kq x Y

    [ ] ( ) ( )( )

    ( ) ( ) ( )

    ( ) ( )( )

    ( )

    |

    ( | )

    ( | )

    ( )

    ( | )

    |

    |( ) ( )

    ( ) |

    ( |

    ( )

    |

    )

    k k

    k

    k k

    k k

    k

    k k k

    k

    k

    k

    k

    k

    k

    k

    pk k k

    k k k kq

    wk k k k

    p

    k kq

    p

    pk

    p

    p

    pk q

    E f f d

    f q d

    f q

    q

    d

    w

    =

    =

    =

    =

    ∫∫∫

    x Y

    x

    Y

    Y

    x Y

    Y x x

    Y

    x

    x

    Y

    x

    Y

    xx x x

    x x Y x

    x x Y x

    x

    x Y

    Note: q(x_k | Y_k) is the importance (proposal) density, written q(X_k | Z_k) in some references.

    Note: some references on importance sampling:
    http://rkb.home.cern.ch/rkb/AN16pp/node132.html
    http://www.tcm.phy.cam.ac.uk/~ajw29/thesis/node17.html
    http://xbeams.chem.yale.edu/~batista/vaa/node44.html
    http://www.roma1.infn.it/~dagos/rpp/node41.html
    http://www.stat.wisc.edu/~mchung/teaching/stat471/lecture11.pdf
    http://www.cs.virginia.edu/~yp9x/projects/Importance_Sampling_Slides.pdf

  • Particle Filters

        E[f(x_k)] = (1 / p(Y_k)) ∫ f(x_k) w_k(x_k) q(x_k | Y_k) dx_k

                  = ∫ f(x_k) w_k(x_k) q(x_k | Y_k) dx_k / ∫ p(Y_k | x_k) p(x_k) dx_k

                  = ∫ f(x_k) w_k(x_k) q(x_k | Y_k) dx_k / ∫ w_k(x_k) q(x_k | Y_k) dx_k

                  = E_q[ w_k(x_k) f(x_k) ] / E_q[ w_k(x_k) ]

  • Particle Filters

    - So, by drawing samples x_k^(i) from q(x_k | Y_k), we can approximate the expectations of interest by:

        E[f(x_k)] ≈ [ (1/N) Σ_{i=1}^{N} w_k(x_k^(i)) f(x_k^(i)) ] / [ (1/N) Σ_{j=1}^{N} w_k(x_k^(j)) ]

                  = Σ_{i=1}^{N} w̃_k(x_k^(i)) f(x_k^(i))

    - where the normalized importance weights are given by

        w̃_k(x_k^(i)) = w_k(x_k^(i)) / Σ_{j=1}^{N} w_k(x_k^(j))
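    A minimal sketch of these weight formulas (not from the talk). The unnormalized target p(Y_k | x) p(x) and the Gaussian proposal below are toy assumptions, chosen so the exact posterior mean (1.6) is known for comparison.

        import numpy as np

        rng = np.random.default_rng(1)
        N = 50_000

        def target_unnorm(x):
            # p(Y_k | x) p(x): a N(2, 1) likelihood times a N(0, 4) prior
            return np.exp(-0.5 * (x - 2.0) ** 2) * np.exp(-0.125 * x ** 2)

        def proposal_pdf(x):
            # q(x | Y_k): a wider N(0, 9), easy to sample, heavier in the tails
            return np.exp(-x ** 2 / 18.0) / np.sqrt(18.0 * np.pi)

        x = rng.normal(0.0, 3.0, size=N)        # x^(i) ~ q(x | Y_k)
        w = target_unnorm(x) / proposal_pdf(x)  # unnormalized weights w_k(x^(i))
        w_tilde = w / w.sum()                   # normalized weights
        print(np.sum(w_tilde * x))              # estimate of E[x_k | Y_k], about 1.6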

  • Particle Filters

    - Using the state-space assumptions (1st-order Markov / observational independence given the states), the importance weights can be estimated recursively by [proof in De Freitas (2000)]:

        w_k = w_{k-1} · p(y_k | x_k) p(x_k | x_{k-1}) / q(x_k | X_{k-1}, Y_k)

    - Problem: with SIS (sequential importance sampling), the variance of the importance weights increases stochastically over time [Kong et al. (1994), Doucet et al. (1999)].

    - To solve this, we need to resample the particles:
      - keep / multiply particles with high importance weights
      - discard particles with low importance weights

    - Sampling-importance Resampling (SIR)

  • Particle Filters

    - Sampling-importance Resampling

      - Maps the N unequally weighted particles into a new set of N equally weighted samples:

          { x_k^(i), w̃_k^(i) }  →  { x_k^(j), N^{-1} }

      - Method proposed by Gordon, Salmond & Smith (1993) and proven mathematically by Gordon (1994).

    [Figure: resampling via the cumulative distribution (cdf) of the normalized weights w̃_t^(j); uniform draws on the [0, 1] axis, in steps of N^{-1}, select each resampled index j with probability proportional to its weight.]
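    A minimal sketch of this mapping (not from the talk), implementing multinomial resampling by inverting the cdf of the normalized weights; the particle values and weights are illustrative.

        import numpy as np

        def resample(particles, w_tilde, rng):
            # map {x^(i), w̃^(i)} to {x^(j), 1/N} via the weight cdf
            N = len(particles)
            cdf = np.cumsum(w_tilde)
            u = rng.uniform(size=N)             # uniform draws on [0, 1]
            idx = np.searchsorted(cdf, u)       # resampled indices j
            return particles[idx], np.full(N, 1.0 / N)

        rng = np.random.default_rng(2)
        x = np.array([-1.2, -0.3, 0.4, 0.9, 2.1])     # toy particles
        w = np.array([0.05, 0.05, 0.60, 0.20, 0.10])  # normalized weights
        print(resample(x, w, rng))   # high-weight particles are multiplied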

  • Particle Filters

    [Figure-only slide.]

  • Particle Filters

    - Choice of Proposal Distribution

      - A critical design issue for a successful particle filter:
        - samples/particles are drawn from this distribution
        - it is used to evaluate the importance weights:

            w_k = w_{k-1} · p(y_k | x_k) p(x_k | x_{k-1}) / q(x_k | X_{k-1}, Y_k)

      - Requirements:
        1) The support of the proposal distribution must include the support of the true posterior distribution, i.e. heavy-tailed distributions are preferable.
        2) It must include the most recent observations.

    Note: this is an absolutely critical design step.

  • Particle Filters

    - The most popular choice of proposal distribution does not satisfy these requirements though:

        q(x_k | X_{k-1}, Y_k) = p(x_k | x_{k-1})

      [Isard and Blake 96, Kitagawa 96, Gordon et al. 93, Beadle and Djuric 97, Avitzour 95]

    - Easy to implement:

        w_k = w_{k-1} · p(y_k | x_k) p(x_k | x_{k-1}) / p(x_k | x_{k-1}) = w_{k-1} · p(y_k | x_k)

    - Does not incorporate the most recent observation though! (A sketch of the resulting filter follows below.)
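    Putting the pieces together, here is a minimal sketch of the generic (bootstrap) particle filter built on this transition-prior proposal (not from the talk); the linear-Gaussian toy model and all constants are illustrative.

        import numpy as np

        rng = np.random.default_rng(3)
        N = 1000

        def sample_transition(x):              # draw from p(x_k | x_{k-1})
            return 0.9 * x + rng.normal(0.0, 0.5, size=x.shape)

        def likelihood(y, x):                  # p(y_k | x_k): Gaussian sensor
            return np.exp(-0.5 * (y - x) ** 2 / 0.1)

        particles = rng.normal(0.0, 1.0, size=N)      # samples from p(x_0)
        for y in (0.2, 0.5, 0.4, 0.9):                # made-up observation stream
            particles = sample_transition(particles)  # propose from the prior
            w = likelihood(y, particles)              # so w_k reduces to p(y_k | x_k)
            w /= w.sum()
            print(np.sum(w * particles))              # MMSE estimate E[x_k | Y_k]
            idx = rng.choice(N, size=N, p=w)          # SIR resampling step
            particles = particles[idx]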

  • Improving Particle Filters

    - Incorporate New Observations into Proposal

      - Use a Gaussian approximation (i.e. a Kalman filter) to generate the proposal by combining the new observation with the prior:

          q(x_k | X_{k-1}, Y_k) = p_G(x_k | X_{k-1}, Y_k) = N( x̂_k, cov[x_k] )

  • Improving Particle Filters

    - Extended Kalman Filter Proposal Generation

      - De Freitas (1998), Doucet (1998), Pitt & Shephard (1999).
      - Greatly improved performance compared to the standard particle filter in problems with very accurate measurements, i.e. a likelihood that is very peaked in comparison to the prior.
      - In highly nonlinear problems, the EKF tends to be very inaccurate and underestimates the true covariance of the state. This violates the distribution-support requirement for the proposal distribution and can lead to poor performance and filter divergence.

    - We propose the use of the Unscented Kalman Filter for proposal generation to address these problems!

  • Improving Particle Filters

    - Unscented Kalman Filter Proposal Generation

      - The UKF is a recursive MMSE estimator based on the Unscented Transformation (UT).
      - UT: a method for calculating the statistics of a RV that undergoes a nonlinear transformation (Julier and Uhlmann 1997).
      - UT/UKF: accurate to 3rd order for Gaussians; higher-order errors are scaled by the choice of transform parameters.
      - More accurate estimates than the EKF (Wan, van der Merwe, Nelson 2000).
      - Gives some control over higher-order moments, i.e. kurtosis, etc.; this allows heavier-tailed distributions!

  • The Unscented Transformation

    [Figure: sigma points χ_i = x̄ ± ( √((L+κ) P_x) )_i are drawn deterministically from the prior mean x̄ and covariance P_x, propagated through the nonlinearity, and recombined into a weighted sample mean ȳ and weighted sample covariance P_yy.]

  • The Unscented Transformation

    [Figure: comparison of three ways to propagate a mean and covariance through y = f(x). Actual (sampling): the true mean and true covariance of the transformed distribution. UT: sigma points are transformed as ψ_i = f(χ_i) and recombined into the UT mean and UT covariance. Linearized (EKF): ȳ = f(x̄) and P_y = A P_x Aᵀ.]
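    A minimal sketch of the UT (not from the talk), using the standard sigma-point weights W_0 = κ/(L+κ) and W_i = 1/(2(L+κ)); the scalar test function and κ = 2 are illustrative choices.

        import numpy as np

        def unscented_transform(f, x_mean, P_x, kappa=2.0):
            # sigma points: x_mean and x_mean ± columns of √((L+κ) P_x)
            L = x_mean.size
            S = np.linalg.cholesky((L + kappa) * P_x)
            sigma = np.vstack([x_mean, x_mean + S.T, x_mean - S.T])
            W = np.full(2 * L + 1, 0.5 / (L + kappa))
            W[0] = kappa / (L + kappa)
            psi = np.apply_along_axis(f, 1, sigma)  # transformed points ψ_i = f(χ_i)
            y_mean = W @ psi                        # weighted sample mean
            d = psi - y_mean
            P_y = (W * d.T) @ d                     # weighted sample covariance
            return y_mean, P_y

        x_mean, P_x = np.array([1.0]), np.array([[0.5]])
        print(unscented_transform(np.sin, x_mean, P_x))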

  • Unscented Particle Filter

    Particle Filter + UKF Proposal = Unscented Particle Filter

  • Experimental Results

    - Synthetic Experiment

      - Time-series model:

        - process model (process noise v_k is Gamma-distributed):

            x_{k+1} = 1 + sin(ωπk) + φ₁ x_k + v_k

        - nonstationary observation model (measurement noise n_k is Gaussian):

            y_k = φ₂ x_k² + n_k,        k ≤ 30
            y_k = φ₃ x_k − 2 + n_k,     k > 30
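    A minimal sketch that simulates this model (not from the talk). The parameter values ω, φ₁, φ₂, φ₃ and both noise settings below are assumptions for illustration only; the slide does not list them.

        import numpy as np

        rng = np.random.default_rng(4)
        omega, phi1, phi2, phi3 = 4e-2, 0.5, 0.2, 0.5   # assumed parameters
        T = 60

        x, y = np.zeros(T), np.zeros(T)
        for k in range(T - 1):
            v = rng.gamma(3.0, 0.5)                # Gamma process noise v_k (assumed)
            x[k + 1] = 1.0 + np.sin(omega * np.pi * k) + phi1 * x[k] + v
        for k in range(T):
            n = rng.normal(0.0, 1e-2)              # Gaussian noise n_k (assumed)
            y[k] = phi2 * x[k] ** 2 + n if k <= 30 else phi3 * x[k] - 2.0 + n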

  • Experimental Results

    - Synthetic Experiment: (100 independent runs)

        Filter                            MSE (mean)   MSE (variance)
        Unscented Particle Filter         0.070        0.006
        Particle Filter : EKF proposal    0.310        0.016
        Particle Filter : generic         0.424        0.053
        Unscented Kalman Filter (UKF)     0.280        0.012
        Extended Kalman Filter (EKF)      0.374        0.015

  • Experimental Results

    - Synthetic Experiment

    [Figure-only slide.]

  • Experimental Results

    - Pricing Financial Options

      - Options: financial derivatives that give the holder the right (but not the obligation) to do something in the future.
        - Call option: allows the holder to buy an underlying cash product, at a specified future date ("maturity time"), for a predetermined price ("strike price").
        - Put option: allows the holder to sell an underlying cash product.

      - Black-Scholes partial differential equation (the main industry standard for pricing options):

          ∂f/∂t + r S ∂f/∂S + ½ σ² S² ∂²f/∂S² = r f

        where f is the option value, S the value of the underlying cash product, r the risk-free interest rate, and σ the volatility of the cash product.

  • Experimental Results

    - Pricing Financial Options

      - Black & Scholes (1973) derived the following pricing solution:

          C = S N_c(d₁) − X e^{−r t_m} N_c(d₂)
          P = −S N_c(−d₁) + X e^{−r t_m} N_c(−d₂)

        with

          d₁ = [ ln(S/X) + (r + σ²/2) t_m ] / (σ √t_m)
          d₂ = d₁ − σ √t_m

        where N_c(·) is the cumulative normal distribution, X the strike price, and t_m the time to maturity.
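    A minimal sketch of these pricing formulas (not from the talk); the spot, strike, rate, volatility, and maturity in the last line are illustrative numbers.

        from math import exp, log, sqrt
        from statistics import NormalDist

        def black_scholes(S, X, r, sigma, tm):
            # d1, d2 and the call/put prices from the slide above
            d1 = (log(S / X) + (r + 0.5 * sigma ** 2) * tm) / (sigma * sqrt(tm))
            d2 = d1 - sigma * sqrt(tm)
            Nc = NormalDist().cdf                  # cumulative normal N_c(·)
            call = S * Nc(d1) - X * exp(-r * tm) * Nc(d2)
            put = X * exp(-r * tm) * Nc(-d2) - S * Nc(-d1)
            return call, put

        print(black_scholes(S=100.0, X=95.0, r=0.05, sigma=0.20, tm=0.5))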

  • Experimental Results

    - Pricing Financial Options

      - State-space representation to model the system for particle filters:
        - Hidden states: r, σ
        - Output observations: C, P
        - Known control signals: t_m, S

      - Estimate call and put prices over a 204-day period on the FTSE-100 index.
        - Performance: normalized square error (NSE) for one-step-ahead predictions,

            NSE = Σ_k (y_k − ŷ_k)²

  • Experimental Results

    - Options Pricing Experiment: (100 independent runs)

        Option Type   Algorithm                         NSE (mean)   NSE (var)
        Put           Unscented Particle Filter         0.008        0.000
        Put           Particle Filter : EKF proposal    0.024        0.007
        Put           Particle Filter : generic         0.023        0.000
        Put           Unscented Kalman Filter (UKF)     0.023        0.000
        Put           Extended Kalman Filter (EKF)      0.023        0.000
        Put           Trivial                           0.035        0.000
        Call          Unscented Particle Filter         0.009        0.000
        Call          Particle Filter : EKF proposal    0.092        0.508
        Call          Particle Filter : generic         0.037        0.000
        Call          Unscented Kalman Filter (UKF)     0.037        0.000
        Call          Extended Kalman Filter (EKF)      0.037        0.000
        Call          Trivial                           0.078        0.000

  • Experimental Results

    - Options Pricing Experiment: UPF one-step-ahead predictions [figure]

  • Experimental Results

    - Options Pricing Experiment: Estimated interest rate and volatility [figure]

  • Experimental Results

    - Options Pricing Experiment: Probability distributions of implied interest rate and volatility [figure]

  • Particle Filter Demos

    - Visual Dynamics Group, Oxford (Andrew Blake)

      - Tracking agile motion
      - Tracking motion against camouflage
      - Mixed state tracking

  • Conclusions

    - Particle filters allow for a practical but complete representation of the posterior probability distribution.
    - Applicable to general nonlinear, non-Gaussian estimation problems where standard Gaussian approximations fail.
    - Particle filters rely on importance sampling, so the proper choice of proposal distribution is very important:
      - The standard approach (i.e. the transition-prior proposal) fails when the likelihood of the new data is very peaked (accurate sensors) or for heavy-tailed noise sources.
      - EKF proposal: incorporates new observations, but can diverge due to inaccurate and inconsistent state estimates.
      - Unscented Particle Filter (UKF proposal):
        - More consistent and accurate state estimates.
        - Larger support overlap; can have heavier-tailed distributions.
        - Theory predicts, and experiments confirm, significantly better performance.

  • The End