ece531 lecture 11: dynamic parameter estimation and the ......ece531 lecture 1: dynamic parameter...

ECE531 Lecture 1: Dynamic Parameter Estim. System Model

ECE531 Lecture 11: Dynamic Parameter Estimation

and the Kalman-Bucy Filter

D. Richard Brown III

Worcester Polytechnic Institute

13-Apr-2011

Worcester Polytechnic Institute D. Richard Brown III 13-Apr-2011 1 / 48


Introduction

◮ So far, we have only considered estimation problems with fixedparameters, e.g.

◮ Yki.i.d.∼ N (θ, σ2).

◮ Yk = a cos(ωk + φ) + Wk for θ = [a, φ, ω]⊤.

◮ Yki.i.d.∼ θe−θyk .

◮ Many problems require us to estimate dynamic or time-varyingparameters, e.g.

◮ Radar (position and velocity of target changing over time)◮ Communications (amplitude and phase of signal changing over time)◮ Stock market (price of shares next week changing over time)

◮ Step 1: We need to develop a general dynamic model (ECE504) for◮ How time-varying parameters evolve over time and◮ How observations are generated from the parameter state.

◮ Step 2: We need to develop good techniques for estimating dynamicparameters. These techniques should leverage some knowledge of thedynamic model.



Discrete Time Model for Dynamic Parameters

unit

delay

X[n]

X[n + 1]f[n]

h[n]

Z[n]

V [n]

U [n]

Y [n]

The time index isdenoted asn = 0, 1, . . . . Allvectors are consideredto be random unlessotherwise specified.

U [n] ∈ Rs

X [n] ∈ Rm

Z[n] ∈ Rk

V [n] ∈ Rk

Y [n] ∈ Rk

f[n] : Rs × R

m 7→ Rm

h[n] : Rm 7→ R

k



Notation and Terminology

◮ U [n] is the dynamical system “input”, n = 0, 1, . . . . This input isusually random and is also called the “process noise”.

◮ X[n] is the “state” of the dynamical system, n = 0, 1, . . . . This iswhat we want to estimate.

◮ Z[n] is the dynamical system “output”, n = 0, 1, . . . .◮ f[n] is a time varying function that updates the state based on the

current state and the current input, i.e.

X[n + 1] = f[n](X[n], U [n])

◮ h[n] is a time varying function that generates the current outputbased on the current state, i.e.

Z[n] = h[n](X[n])

◮ V [n] is the “measurement noise” n = 0, 1, . . . .◮ Y [n] is the “observation” n = 0, 1, . . . .



Example: One-Dimensional Motion

Suppose we have a particle moving on a line with position x and velocity v

updated according to

x[n + 1] = x[n] + Tv[n]

v[n + 1] = v[n] + Ta[n]

where a[n] represents the piecewise constant acceleration and T is the timebetween samples. The implicit assumption here is that T is small enough suchthat the piecewise constant approximation holds.

◮ The state X = [x, v]⊤.

◮ The input U = a.

◮ The state update equation can be written as

X [n + 1] =

[1 T

0 1

]

X [n] +

[0T

]

U [n]

◮ Suppose we observe the position of the particle in noise. Then

Y [n] =[1 0

]X [n] + V [n]



Example: One-Dimensional Motion (White Gaussian Input)

0 5 10 15 20 25 30 35−4

−3

−2

−1

0

1

2

3

4

position x

velo

city

v



Remarks

◮ In these types of dynamical systems, X[k] is completely determinedby the earlier state X[ℓ], l < k, and the inputs {U [ℓ], . . . , U [k − 1]}.

◮ The state at time ℓ completely summarizes the system in the sensethat you don’t need to know the details of what happened prior totime ℓ if you know X[ℓ].

◮ We are going to study problems in which we wish to estimate thedynamic state X[ℓ] given a sequence of observations Y [0], . . . , Y [k].These problems can be categorized into three types:

1. Filtering: ℓ = k (estimate the current state)2. Prediction: ℓ > k (estimate a future state)3. Smoothing: ℓ < k (estimate a previous state)

◮ Note that the notation in dynamic parameter estimation problems isdifferent than static parameter estimation problems: We will use X[ℓ]to represent the quantity we wish to estimate (rather than θ[ℓ]).



Restriction 1: Squared Error Cost Assignment

We will only consider the squared error cost assignment, i.e.

MSE = E{

‖X [ℓ] − X[ℓ]‖22

}

where X [ℓ] is the estimate of the state X[ℓ].

We also assume that we know:

◮ the joint distribution of the inputs U [0], U [1], . . . , U [ℓ − 1] and

◮ the distribution of the initial state X[0].

Given the observation Y [0], . . . , Y [k], what estimator X[ℓ] minimizes theMSE?

Hint: Is this Bayesian estimation or non-random parameter estimation?



Restriction 2: Linear Dynamical Model

We are going to restrict our attention to systems with state updateequations and output equations of the form

X[n + 1] = F [n]X[n] + G[n]U [n] n = 0, 1, . . .

Y [n] = H[n]X[n] + V [n] n = 0, 1, . . .

where, for each n, F [n] ∈ Rm×m, G[n] ∈ R

m×s, and H[n] ∈ Rk×m.

◮ We’ve already seen that one-dimensional motion fits within this linearmodel.

◮ The same is true for two- and three-dimensional motion and lots ofother real-world dynamic systems.

◮ Many nonlinear systems can approximately fit in this model bylinearizing f and h around a nominal state (Taylor series expansion).



Linear Dynamical Model Review

State update:

X[n + 1] = F [n]X[n] + G[n]U [n]

= F [n] (F [n − 1]X[n − 1] + G[n − 1]U [n − 1]) + G[n]U [n]

= etc.

Repeating this process leads to an expression for the state at time n + 1 interms of the initital state X[0] and the inputs U [0], U [1], . . . :

X[n + 1] =

{n∏

k=0

F [n − k]

}

X[0] +n∑

j=0

{n−j−1∏

k=0

F [n − k]

}

G[j]U [j]

where

t∏

k=0

F [n − k] := F [n]F [n − 1]F [n − 2] · · ·F [n − t]

is called the state transition matrix from time n − t to time n + 1.Worcester Polytechnic Institute D. Richard Brown III 13-Apr-2011 10 / 48


Linear Dynamical Model Review: Time Invariant Case

When

F [n] ≡ F

G[n] ≡ G

then the discrete time dynamical system is time invariant and the state attime n + 1 is simply

X[n + 1] = Fn+1X[0] +n∑

j=0

Fn−jG[j]U [j]



Restriction 3: Gaussian Input and Measurement Noise

To develop the main result, we will only consider systems in which theinput (process noise) sequence U [0], U [1], . . . and the measurement noisesequence V [0], V [1], . . . are independent sequences of independent zeromean Gaussian random vectors, i.e.

E{U [k]} = 0

E{U [k]U⊤[j]} =

{

Q[k] k = j

0 otherwise

E{V [k]} = 0

E{V [k]V ⊤[j]} =

{

R[k] k = j

0 otherwise

E{U [k]V ⊤[j]} = 0 for all j and k

We also assume that the initial state X[0] ∼ N (m[0],Σ[0]) is a Gaussianrandom vector independent of U [0], U [1], . . . and V [0], V [1], . . . .



Dynamic MMSE State Estimation (Filtering)

unit

delay

X[n]

X[n + 1]F [n], G[n]

H[n]Z[n]

V [n]

U [n]

Y [n]

We will focus on thefiltering problem, i.e.estimating X[ℓ] given theobservations Y [0], . . . , Y [ℓ].We know that the MMSEstate estimator is theconditional mean

X[ℓ | ℓ] := En

X[ℓ] | Yℓ0

o

where

Yℓ0 :=

h

Y⊤[0], . . . , Y ⊤[ℓ]

i

⊤

is the “super vector” of all

observations up to and

including time ℓ.



The Brute Force Approach: Batch MMSE Estimation

Suppose we’ve received observations Yℓ0 :=

[Y ⊤[0], . . . , Y ⊤[ℓ]

]⊤and we

wish to estimate the state X[ℓ] from these observations. If we just did thisas a batch operation, what is the MMSE estimate of X[ℓ]?Recall the conditional mean of jointly Gaussian random vectors X and Y

E[X |Y = y] = E[X] + cov(X,Y ) [cov(Y, Y )]−1 (y − E[Y ])

We can use this result to write

X[ℓ | ℓ] = E{

X[ℓ] | Yℓ0

}

= E{X[ℓ]} + cov{X[ℓ],Yℓ0}

[

cov{Yℓ0,Y

ℓ0}

]−1 (

Yℓ0 − E{Yℓ

0})

Recall that each Y [n] ∈ Rk.

◮ What are the dimensions of the matrix inverse that we have tocompute here?

◮ What happens as we get more observations?Worcester Polytechnic Institute D. Richard Brown III 13-Apr-2011 14 / 48


MMSE State Prediction X[0] given no observations

Kalman’s big idea was to find a computationally efficient recursion for MMSEstate estimation. To see how this works, we need to approach the MMSEestimation problem from the beginning.

THE BEGINNING: We want to estimate the initial state X [0] before we have anyobservations. What is the best estimator of the initial state?

Recall that we have assumed a Gaussian distributed initial stateX [0] ∼ N (m[0], Σ[0]) where m[0] and Σ[0] are known. Prior to receiving the firstobservation, the MMSE estimate (prediction) of the initial state X [0] is simply

X[0 | − 1] := E {X [0] | no observations} = m[0].

The error covariance matrix of this MMSE estimator (predictor) is then

Σ[0 | − 1] := E

{(

X[0 | − 1] − X [0])(

X[0 | − 1] − X [0])⊤

}

= E{

(m[0] − X [0]) (m[0] − X [0])⊤

}

= Σ[0]



MMSE State Estimate X[0] given Y [0] (part 1)

At time n = 0, we receive the observation

Y [0] = H[0]X[0] + V [0]

where X[0] ∼ N (m[0],Σ[0]), V [0] ∼ N (0, R[0]), and H[0] is known.

We use our prior result for jointly Gaussian random vectors with

E[X] = m[0] E[Y ] = H[0]m[0]

cov(X,Y ) = Σ[0]H⊤[0] cov(Y, Y ) = H[0]Σ[0]H⊤[0] + R[0]

to write an expression for the MMSE estimate of X[0] given Y [0] as

X[0 | 0] := E {X[0] |Y [0]}

= m[0] + Σ[0]H⊤[0]“

H [0]Σ[0]H⊤[0] + R[0]”−1

(Y [0] − H [0]m[0])

= X[0 | − 1] +

Σ[0 | − 1]H⊤[0]“

H [0]Σ[0 | − 1]H⊤[0] + R[0]”−1

(Y [0] − H [0]X [0 | − 1])




If we define the Kalman Gain as

K[0] := cov(X [0], Y [0]) [cov(Y [0], Y [0])]−1

= Σ[0 | − 1]H⊤[0](H [0]Σ[0 | − 1]H⊤[0] + R[0]

)−1

we can write

X [0 | 0] = X [0 | − 1] + K[0](

Y [0] − H [0]X[0 | − 1])

Remarks:

◮ The term Y [0 | − 1] := Y [0] − H [0]X[0 | − 1] is sometimes called the“innovation”. It is the error between the MMSE prediction of Y [0] and theactual observation Y [0].

◮ Substituting for Y [0], note that the innovation can be written as

Y [0 | − 1] =(

H [0]X [0] + V [0] − H [0]X[0 | − 1])

= H [0](

X [0] − X[0 | − 1])

+ V [0]



Kalman Gain: Intuition

The matrix

K[0] := cov(X [0], Y [0]) [cov(Y [0], Y [0])]−1

is sometimes called the “Kalman Gain”. Both covariances are unconditional here.

◮ Rewriting first covariance term in the Kalman Gain as

cov(X [0], Y [0]) = E

{(

X [0] − X[0 | − 1])(

Y [0] − H [0]X[0 | − 1])⊤

}

and the second term as

cov(Y [0], Y [0]) = E

“

Y [0] − H [0]X [0 | − 1]” “

Y [0] − H [0]X[0 | − 1]”⊤

ff

we can see that the Kalman Gain (in a simplistic scalar sense) is

◮ “proportional” to the covariance between the state prediction error andthe innovation

◮ “inversely proportional” to the innovation variance

◮ Does this make sense in the context of our state estimation equation?

X[0 | 0] = X[0 | − 1] + K[0]“

Y [0] − H [0]X [0 | − 1]”




The error covariance matrix of the MMSE estimator X[0 | 0] can becomputed as

Σ[0 | 0] := E

{(

X[0 | 0] − X[0]) (

X[0 | 0] − X[0])⊤

|Y [0]

}

= cov {X[0] |Y [0]}

We can use the standard result for jointly Gaussian random vectors to write

Σ[0 | 0] = Σ[0] − Σ[0]H⊤[0](

H[0]Σ[0]H⊤[0] + R[0])−1

H[0]Σ[0]

= Σ[0 | − 1] − K[0]H[0]Σ[0 | − 1]

where we used our definitions for Σ[0 | − 1] and K[0] in the last equality.Note that, given K[0] and H[0], the error covariance matrix of the MMSEstate estimator after the observation Y [0] is only a function of the errorcovariance matrix of the prediction Σ[0 | − 1].



Remarks

What we have done so far:

1. Predicted the first state with no observations: X[0 | − 1].

2. Computed the error covariance matrix of this prediction: Σ[0 | − 1].

3. Received the observation Y [0].

4. Estimated the first state given the observation: X [0 | 0].

5. Computed the error covariance matrix of this estimate: Σ[0 | 0].

Interesting observations:◮ The estimate X [0 | 0] is expressed in terms of the prediction

X[0 | − 1] and the observation Y [0].◮ The estimate error covariance matrix Σ[0 | 0] is expressed in terms

of the prediction error covariance matrix Σ[0 | − 1].

The goal here is to develop a general recursion. Our first step will be tosee if the prediction for the next state (and its ECM) can be expressed interms of the estimate (and its ECM) of the previous state.



State Update from X[ℓ] to X[n + 1]

From our linear dynamical model, we have

X[n + 1] = F [n]X[n] + G[n]U [n]

= F [n] (F [n − 1]X[n − 1] + G[n − 1]U [n − 1]) + G[n]U [n]

= etc.

For n + 1 > ℓ, we can repeat this process to write

X[n + 1] =

{n−ℓ∏

k=0

F [n − k]

}

︸︷︷︸

Fℓn

X[ℓ] +n∑

j=ℓ

{n−j−1∏

k=0

F [n − k]

}

︸︷︷︸

Fj+1n

G[j]U [j]

where

F tn :=

{

F [n]F [n − 1] · · ·F [t] t ≤ n

I t > n



General Expression for MMSE State Prediction

For n + 1 > ℓ, we can use our prior result to write

X[n + 1 | ℓ] := E{X [n + 1] | Yℓ

0

}

model= E

Fℓ

nX [ℓ] +

n∑

j=ℓ

F j+1n G[j]U [j] | Yℓ

0

linearity= Fℓ

nE{X [ℓ] | Yℓ

0

}+

n∑

j=ℓ

F j+1n G[j]E

{U [j] | Yℓ

0

}

= FℓnX[ℓ | ℓ] +

n∑

j=ℓ

F j+1n G[j]E

{U [j] | Yℓ

0

}

irrelevance= Fℓ

nX[ℓ | ℓ] +n∑

j=ℓ

F j+1n G[j]E {U [j]}

= FℓnX[ℓ | ℓ]

Note that the one-step predictor can be expressed as X [ℓ + 1 | ℓ] = F [ℓ]X[ℓ | ℓ].Worcester Polytechnic Institute D. Richard Brown III 13-Apr-2011 22 / 48


Conditional ECM of MMSE State Prediction

Σ[n + 1 | ℓ] := covn

X[n + 1] | Yℓ0

o

model= cov

8

<

:

FℓnX[ℓ] +

nX

j=ℓ

Fj+1n G[j]U [j] | Yℓ

0

9

=

;

indep= cov

n

FℓnX[ℓ] | Yℓ

0

o

+ cov

8

<

:

nX

j=ℓ

Fj+1n G[j]U [j] | Yℓ

0

9

=

;

irrelevance= cov

n

FℓnX[ℓ] | Yℓ

0

o

+ cov

8

<

:

nX

j=ℓ

Fj+1n G[j]U [j]

9

=

;

= Fℓncov

n

X[ℓ] | Yℓ0

o

(Fℓn)⊤ +

nX

j=ℓ

Fj+1n G[j]cov {U [j]}

“

Fj+1n G[j]

”⊤

= FℓnΣ[ℓ | ℓ](Fℓ

n)⊤ +n

X

j=ℓ

Fj+1n G[j]Q[j]

“

Fj+1n G[j]

”

⊤

For one-step prediction: Σ[ℓ + 1 | ℓ] = F [ℓ]Σ[ℓ | ℓ]F⊤[ℓ] + G[ℓ]Q[ℓ]G⊤[ℓ].



Summary of Main Results

One-step MMSE state predictor:

X[ℓ + 1 | ℓ] = F [ℓ]X [ℓ | ℓ]

ECM of MMSE one-step state predictor:

Σ[ℓ + 1 | ℓ] = F [ℓ]Σ[ℓ | ℓ]F⊤[ℓ] + G[ℓ]Q[ℓ]G⊤[ℓ]

We have shown the following results only for the case ℓ = 0:MMSE state estimator:

X[ℓ | ℓ] = X [ℓ | ℓ − 1] + K[ℓ](

Y [ℓ] − H[ℓ]X [ℓ | ℓ − 1])

ECM of MMSE state estimator:

Σ[ℓ | ℓ] = Σ[ℓ | ℓ − 1] − K[ℓ]H[ℓ]Σ[ℓ | ℓ − 1]

We still need to show that these expressions hold for general ℓ = 0, 1, . . . .Worcester Polytechnic Institute D. Richard Brown III 13-Apr-2011 24 / 48


Induction: MMSE State Estimator for Arbitrary ℓ

Assume that

X[ℓ | ℓ] = X [ℓ | ℓ − 1] + K[ℓ](

Y [ℓ] − H[ℓ]X [ℓ | ℓ − 1])

and

Σ[ℓ | ℓ] = Σ[ℓ | ℓ − 1] − K[ℓ]H[ℓ]Σ[ℓ | ℓ − 1]

are true for some value of ℓ. We want to show that these expressions arealso true for ℓ + 1 using the fact that the prediction equations have alreadybeen shown to be true for any ℓ.

Fact (not too difficult to prove): The innovation

Y [ℓ + 1 | ℓ] := Y [ℓ + 1] − H[ℓ + 1]X [ℓ + 1 | ℓ]

is a zero-mean Gaussian random vector uncorrelated with Y [0], . . . , Y [ℓ].



Useful Result 0

Assume that X, Y , and Z are jointly Gaussian and that Y and Z areuncorrelated with each other. Denote C as the covariance matrix betweenthe subscripted random variables. We can use the conditional mean resultfor jointly Gaussian random vectors to write

E {X |Y,Z} = E{X} +[CXY CXZ

][CY Y CY Z

CZY CZZ

]−1 [Y − E{Y }Z − E{Z}

]

Under our assumption that Y and Z are uncorrelated, both CY Z and CZY

are equal to zero. The matrix inverse is then easy to compute and we havethe useful result

E {X |Y,Z} = E{X} + CXY C−1Y Y (Y − E{Y }) + CXZC−1

ZZ (Z − E{Z})

= E{X |Y } + CXZC−1ZZ (Z − E{Z}) .



Induction: Conditional Mean

X [ℓ + 1 | ℓ + 1] := E{X [ℓ + 1] | Yℓ+1

0

}

= E{X [ℓ + 1] | Yℓ

0, Y [ℓ + 1]}

= E{

X [ℓ + 1] | Yℓ0, Y [ℓ + 1 | ℓ]

}

ur0= E

{X [ℓ + 1] | Yℓ

0

}+ cov

(

X [ℓ + 1], Y [ℓ + 1 | ℓ])

×

[

cov(

Y [ℓ + 1 | ℓ], Y [ℓ + 1 | ℓ])]−1

Y [ℓ + 1 | ℓ]

= X [ℓ + 1 | ℓ] + cov(

X [ℓ + 1], Y [ℓ + 1 | ℓ])

×

[

cov(

Y [ℓ + 1 | ℓ], Y [ℓ + 1 | ℓ])]−1

Y [ℓ + 1 | ℓ]

where we have used the fact that E{

Y [ℓ + 1 | ℓ]}

= 0 in the second to last

expression.



Induction: Error Covariance Matrix

The final step is to show

Σ[ℓ + 1 | ℓ + 1] = Σ[ℓ + 1 | ℓ] − K[ℓ + 1]H[ℓ + 1]Σ[ℓ + 1 | ℓ]

To see this, start from the definition

Σ[ℓ + 1 | ℓ + 1] = En“

X [ℓ + 1 | ℓ + 1] − X[ℓ + 1]” “

X[ℓ + 1 | ℓ + 1] − X[ℓ + 1]”

⊤o

and substitute our result

X[ℓ + 1 | ℓ + 1] = X[ℓ + 1 | ℓ] + K[ℓ + 1](

Y [ℓ + 1] − H [ℓ + 1]X[ℓ + 1 | ℓ])

After expanding the outer product, there are four expectations that needto be computed...



The four expectations that need to be computed in Σ[ℓ + 1 | ℓ + 1]:

E

“

X[ℓ + 1] − X [ℓ + 1 | ℓ]” “

X[ℓ + 1] − X [ℓ + 1 | ℓ]”⊤

ff

−K[ℓ + 1]E

“

Y [ℓ + 1] − H[ℓ + 1]X[ℓ + 1 | ℓ]” “

X[ℓ + 1] − X [ℓ + 1 | ℓ]”

⊤ff

−E

“

X[ℓ + 1] − X [ℓ + 1 | ℓ]” “

Y [ℓ + 1] − H[ℓ + 1]X[ℓ + 1 | ℓ]”⊤

ff

K⊤[ℓ + 1] +

K[ℓ + 1]E

“

Y [ℓ + 1] − H[ℓ + 1]X[ℓ + 1 | ℓ]” “

Y [ℓ + 1] − H[ℓ + 1]X[ℓ + 1 | ℓ]”

⊤ff

K⊤[ℓ + 1]

◮ The first line is simply Σ[ℓ + 1 | ℓ].

◮ The third line was solved in “useful result number 1” and is

Σ[ℓ + 1 | ℓ]H⊤[ℓ + 1]K⊤[ℓ + 1]

◮ Inspection of K⊤[ℓ + 1] reveals that the third line is a symmetricmatrix. Hence, the second line is equal to the third line.

◮ The fourth line was solved in “useful result number 2”...



From useful result number 2, we can write the fourth line as

K[ℓ + 1](H[ℓ + 1]Σ[ℓ + 1 | ℓ]H⊤[ℓ + 1] + R[ℓ + 1])K⊤[ℓ + 1]

This can be simplified a bit since

K⊤[ℓ + 1] = (H [ℓ + 1]Σ[ℓ + 1 | ℓ]H⊤[ℓ + 1] + R[ℓ + 1])−1H [ℓ + 1]Σ[ℓ + 1 | ℓ]

Substituting for the K⊤[ℓ + 1] at the end of the equation, we can writethe fourth line as

K[ℓ + 1]H[ℓ + 1]Σ[ℓ + 1 | ℓ]

which is the same as the second and third lines, except for a sign change.Putting it all together, we have

Σ[ℓ + 1 | ℓ + 1] = Σ[ℓ + 1 | ℓ] − K[ℓ + 1]H[ℓ + 1]Σ[ℓ + 1 | ℓ]

which is the result we wanted to show.



The Discrete-Time Kalman-Bucy Filter (1961)

Theorem

Under the squared error cost assignment, the linear system model, and the whiteGaussian input, noise, and initial state assumptions discussed previously, theoptimal estimates for the current state (filtering) and the next state (prediction)are given recursively as

X[ℓ | ℓ] = X[ℓ | ℓ − 1] + K[ℓ](

Y [ℓ] − H [ℓ]X[ℓ | ℓ − 1])

for ℓ = 0, 1, . . .

X[ℓ + 1 | ℓ] = F [ℓ]X[ℓ | ℓ] for ℓ = 0, 1, . . .

with the initialization X [0 | − 1] = m[0] and where the matrix

K[ℓ] = Σ[ℓ | ℓ − 1]H⊤[ℓ](H [ℓ]Σ[ℓ | ℓ − 1]H⊤[ℓ] + R[ℓ]

)−1

with Σ[ℓ | ℓ − 1] := cov {X [ℓ] |Y [0], . . . , Y [ℓ − 1]} and R[ℓ] := cov {V [ℓ]}.



The Kalman Filter

unitdelay

-

Y [ℓ] K[ℓ] F [ℓ]

H[ℓ]X [ℓ | ℓ − 1]

X[ℓ | ℓ]

X [ℓ + 1 | ℓ]



The Kalman Filter: Time-Invariant Case

unitdelay

-

Y [ℓ] K[ℓ] F

HX [ℓ | ℓ − 1]

X[ℓ | ℓ]

X [ℓ + 1 | ℓ]

K[ℓ] = Σ[ℓ | ℓ − 1]H⊤(HΣ[ℓ | ℓ − 1]H⊤ + R

)−1

X[ℓ | ℓ] = X[ℓ | ℓ − 1] + K[ℓ](

Y [ℓ] − HX[ℓ | ℓ − 1])

Σ[ℓ | ℓ] = Σ[ℓ | ℓ − 1] − K[ℓ]HΣ[ℓ | ℓ − 1]

X[ℓ + 1 | ℓ] = FX[ℓ | ℓ]

Σ[ℓ + 1 | ℓ] = FΣ[ℓ | ℓ]F⊤ + GQG⊤

Even though F , G, H , Q, and R are time-invariant, K[ℓ] is still time-varying.




Our one-dimensional motion system:

◮ The state X [ℓ] =[x[ℓ], v[ℓ]

]⊤(position, velocity).

◮ The input U [ℓ] = a[ℓ]i.i.d.∼ N (0, σ2

a) (acceleration).

◮ The state update equation is

X [ℓ + 1] =

[1 T

0 1

]

︸︷︷︸

F

X [ℓ] +

[0T

]

︸︷︷︸

G

U [ℓ]

where T is the sampling time.

◮ Suppose we observe the position of the particle in noise. Then

Y [n] =[1 0

]

︸︷︷︸

H

X [n] + V [n]

where V [ℓ]i.i.d.∼ N (0, σ2

v) and is independent of U [ℓ].




−2 −1 0 1 20

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

position x

velo

city

v

0 5 10 15 200

0.5

1

1.5

2

2.5

3

3.5

time

squa

red

erro

r

true statepredicted stateestimated state

T = 0.1, σ2a = 1, and σ2

v = 0.1 in this example.



Example: One-Dimensional Motion Kalman Gain

−2 −1 0 1 20

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

position x

velo

city

v

0 5 10 15 200

0.2

0.4

0.6

0.8

1

1.2

1.4

time

Kal

man

gai

n

position xvelocity v

T = 0.1, σ2a = 1, and σ2

v = 0.1 in this example.



Example: One-Dimensional Motion MSE

0 5 10 15 20 25 300

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

time

mea

n sq

uare

d er

ror

true statepredicted stateestimated state

T = 0.1, σ2a = 1, σ2

v = 0.1; averaged over 5000 runs.



Computational Requirements: Kalman Filter

Note that the Kalman filter recursion also has a matrix inverse.

K[ℓ] = Σ[ℓ | ℓ − 1]H⊤[ℓ](

H[ℓ]Σ[ℓ | ℓ − 1]H⊤[ℓ] + R[ℓ])−1

What are the dimensions of the matrix inverse that we have to computehere? What happens as more observations arrive?



Computational Requirements: Kalman Filter

Note that the error covariance matrix and Kalman gain updates

K[ℓ] = Σ[ℓ | ℓ − 1]H⊤[ℓ](

H[ℓ]Σ[ℓ | ℓ − 1]H⊤[ℓ] + R[ℓ])−1

Σ[ℓ | ℓ] = Σ[ℓ | ℓ − 1] − K[ℓ]H[ℓ]Σ[ℓ | ℓ − 1]

Σ[ℓ + 1 | ℓ] = F [ℓ]Σ[ℓ | ℓ]F [ℓ]⊤ + G[ℓ]Q[ℓ]G⊤[ℓ]

do not depend on the observation. This means that, if F [ℓ], G[ℓ],H[ℓ], Q[ℓ], and R[ℓ] are known in advance, e.g. they are time invariant,the error covariance matrices (prediction and estimation) as well as theKalman gain can be computed in advance.

Pre-computation of K[ℓ], Σ[ℓ | ℓ], and Σ[ℓ + 1 | ℓ] makes the real-time

computational requirements of the Kalman filter very modest:

◮ State estimate: One matrix-vector product and one vector addition.

◮ State prediction: One matrix-vector product.



Static State Estimation with the Kalman Filter

Consider the special case F [ℓ] ≡ I and G[ℓ] ≡ 0. In this case, we have astatic parameter estimation problem since the state does not change overtime. It should be clear that X[ℓ] ≡ X[0] = X.

What does the Kalman filter do in this scenario? Let’s look at therecursion replacing F [ℓ] ≡ I and G[ℓ] ≡ 0 (for simplicity, we also assumethat H and R are time-invariant).

K[ℓ] = Σ[ℓ | ℓ − 1]H⊤(

HΣ[ℓ | ℓ − 1]H⊤ + R)−1

X[ℓ | ℓ] = X[ℓ | ℓ − 1] + K[ℓ](

Y [ℓ] − HX[ℓ | ℓ − 1])

Σ[ℓ | ℓ] = Σ[ℓ | ℓ − 1] − K[ℓ]HΣ[ℓ | ℓ − 1]

X [ℓ + 1 | ℓ] = X[ℓ | ℓ]

Σ[ℓ + 1 | ℓ] = Σ[ℓ | ℓ]

Both predictions are equal to the last estimates (this should make sense).Worcester Polytechnic Institute D. Richard Brown III 13-Apr-2011 44 / 48


Static State Estimation with the Kalman Filter

Since the predictions aren’t necessary anymore, the notation and therecursion can be simplified to

K[ℓ] = Σ[ℓ − 1]H⊤(

HΣ[ℓ − 1]H⊤ + R)−1

X [ℓ] = X [ℓ − 1] + K[ℓ](

Y [ℓ] − HX[ℓ − 1])

Σ[ℓ] = Σ[ℓ − 1] − K[ℓ]HΣ[ℓ − 1]

What is the Kalman filter doing here?

It is performing sequential MMSE estimation of the static parameter X.That is, given observations Y [0], Y [1], . . . , the Kalman filter is acomputationally efficient way to sequentially refine the MMSE estimate ofan unknown static parameter (state) with each new observation as theyarrive (rather than batch processing at the end of all the observations).



Example: Sequential MMSE Estimation of Static Param.

0 0.1 0.2 0.3 0.4−0.25

−0.2

−0.15

−0.1

−0.05

0

0.05

0.1

X0

X1

0 5 10 15 20 250

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

time

squa

red

erro

r

true stateestimated state

σ2v = 0.1 and H = I in this example.



Example: Sequential MMSE Estimation of Static Param.

0 5 10 15 20 25 30 35 40 45 500

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

time

mea

n sq

uare

d er

ror

true stateestimated state

σ2v = 0.1 and H = I; averaged over 5000 runs.



Conclusions

◮ The discrete-time Kalman-Bucy filter is often called the “Workhorseof Estimation”.

◮ Computationally efficient and no loss of optimality in the linearGaussian case.

◮ The discrete-time Kalman-Bucy filter is also optimum in dynamic (orstatic) MMSE parameter estimation problems among the class oflinear MMSE estimators even if the noise is non-Gaussian.

◮ Even more extensions not covered here:◮ Prediction and smoothing.◮ Continuous-time Kalman-Bucy filter.◮ Extended Kalman filter (nonlinear but differentiable state update,

input, and output functions).◮ Unscented Kalman filter (also allows for nonlinear functions, better

convergence properties than the EKF in some cases).◮ etc.


ece531 lecture 11: dynamic parameter estimation and the ......ece531 lecture 1: dynamic parameter...

Documents