11 - The linear model

S. Lall, Stanford, 2011.02.15.01

• The linear model

• The joint pdf and covariance

• Example: uniform pdfs

• The importance of the prior

• Linear measurements with Gaussian noise

• Example: Gaussian noise

• The signal-to-noise ratio

• Scalar systems and the SNR

• Example: small and large noise

• Posterior covariance

• Example: navigation

• Alternative formulae

• Weighted least squares


The linear model

A very important class of estimation problems is the linear model

y = Ax + w

• $x$ and $w$ are independent

• We have induced pdfs $p_x$ for $x$ and $p_w$ for $w$

• The matrix $A$ is $m \times n$

We measure $y = y_{\text{meas}}$ and would like to estimate $x$


The mean

• Let $\mu_x = \mathbf{E}\,x$ and $\mu_w = \mathbf{E}\,w$

• Then
$$\mathbf{E}\,y = A\mu_x + \mu_w$$

• Call this $\mu_y$


The linear map

Since $y = Ax + w$, we have
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} I & 0 \\ A & I \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix}$$

• We will measure $y = y_{\text{meas}}$ and estimate $x$

• To do this, we would like the conditional pdf of $x \mid y = y_{\text{meas}}$

• For this, we need the joint pdf of $x$ and $y$


The joint pdf

The joint pdf $p$ of $x$ and $y$ is
$$p(x, y) = p_x(x)\, p_w(y - Ax)$$

because the joint pdf of $x, w$ is
$$p_1\!\left(\begin{bmatrix} x \\ w \end{bmatrix}\right) = p_x(x)\, p_w(w)$$

and we know
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} I & 0 \\ A & I \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix}$$

Now $z = Hu$ implies $p_z(a) = |\det H|^{-1}\, p_u(H^{-1}a)$, so, since here $|\det H| = 1$,
$$p(x, y) = p_1\!\left(\begin{bmatrix} I & 0 \\ -A & I \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}\right)$$


The covariance

Let $\Sigma_w = \mathbf{cov}(w)$ and $\Sigma_x = \mathbf{cov}(x)$. We have
$$\mathbf{cov}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \Sigma_x & \Sigma_x A^T \\ A\Sigma_x & A\Sigma_x A^T + \Sigma_w \end{bmatrix}$$

• Call this $\begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix}$. The above holds because
$$\mathbf{cov}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} I & 0 \\ A & I \end{bmatrix} \begin{bmatrix} \Sigma_x & 0 \\ 0 & \Sigma_w \end{bmatrix} \begin{bmatrix} I & 0 \\ A & I \end{bmatrix}^T$$

• Then $\mathbf{cov}(y) = A\Sigma_x A^T + \Sigma_w$

• $A\Sigma_x A^T$ is the ‘signal covariance’

• $\Sigma_w$ is the ‘noise covariance’
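A quick numerical check of the block-covariance formula (my addition, not part of the original slides); the particular $A$, $\Sigma_x$, $\Sigma_w$ below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 2, 3, 200_000

# illustrative model parameters (assumptions, not from the notes)
A = rng.standard_normal((m, n))
Sigma_x = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_w = 0.3 * np.eye(m)

# independent x ~ N(0, Sigma_x) and w ~ N(0, Sigma_w); y = A x + w
x = rng.multivariate_normal(np.zeros(n), Sigma_x, size=N)
w = rng.multivariate_normal(np.zeros(m), Sigma_w, size=N)
y = x @ A.T + w

# sample covariance of the stacked vector [x; y]
sample_cov = np.cov(np.hstack([x, y]).T)

# block formula from the slide
block_cov = np.block([
    [Sigma_x, Sigma_x @ A.T],
    [A @ Sigma_x, A @ Sigma_x @ A.T + Sigma_w],
])

print(np.max(np.abs(sample_cov - block_cov)))  # small; shrinks as N grows
```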


Example: uniform pdfs

Suppose x ∼ U [−2, 2] and w ∼ U [−1, 1] and we measure y = x + w.

The joint pdf and MMSE estimator are shown:

[Figure: joint pdf of $(x, y)$ with the MMSE estimator overlaid; both axes run from $-3$ to $3$.]

Notice the estimator is not $\hat{x} = y_{\text{meas}}$, because of the prior information that $x \in [-2, 2]$.
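The conditional mean here has no single clean closed form over the whole range of $y$, but it is easy to evaluate numerically; a minimal sketch (my addition) on a grid:

```python
import numpy as np

# x ~ U[-2, 2], w ~ U[-1, 1], y = x + w; evaluate E[x | y] on a grid of x
xs = np.linspace(-2.0, 2.0, 4001)
px = np.full_like(xs, 0.25)                         # uniform density on [-2, 2]

def x_mmse(y):
    pw = np.where(np.abs(y - xs) <= 1.0, 0.5, 0.0)  # p_w(y - x), uniform on [-1, 1]
    joint = px * pw                                  # p(x, y) on the grid
    return np.sum(xs * joint) / np.sum(joint)        # conditional mean E[x | y]

for y in [0.0, 1.5, 2.5]:
    print(y, x_mmse(y))  # at y = 2.5, x | y ~ U[1.5, 2], so the estimate is 1.75
```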


The importance of the prior

• x ∼ U [−2, 2] as before

• w ∼ U [−0.3, 0.3]; signal x is large relative to the noise w

The joint pdf and MMSE estimator are shown:

[Figure: joint pdf of $(x, y)$ with the MMSE estimator overlaid; both axes run from $-3$ to $3$.]

The estimator is almost $\hat{x} = y_{\text{meas}}$


The importance of the prior

• x ∼ U [−2, 2] as before

• w ∼ U [−10, 10]; signal x is small relative to the noise w

• The joint pdf and MMSE estimator are shown:

[Figure: joint pdf of $(x, y)$ with the MMSE estimator overlaid; the $y$-axis runs from $-10$ to $10$.]

• The estimator mostly ignores the measurement

• The estimate is almost the prior mean $\mathbf{E}\,x = 0$


Linear measurements with Gaussian noise

We have the linear model

y = Ax + w

• $x \sim \mathcal{N}(0, \Sigma_x)$ and $w \sim \mathcal{N}(0, \Sigma_w)$ are independent

• So $\begin{bmatrix} x \\ y \end{bmatrix}$ is Gaussian, with mean and covariance
$$\mathbf{E}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \mathbf{cov}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \Sigma_x & \Sigma_x A^T \\ A\Sigma_x & A\Sigma_x A^T + \Sigma_w \end{bmatrix}$$


Linear measurements with Gaussian noise

The MMSE estimate of $x$ given $y = y_{\text{meas}}$ is
$$\hat{x}_{\text{mmse}} = \Sigma_x A^T (A\Sigma_x A^T + \Sigma_w)^{-1}\, y_{\text{meas}}$$

because we know $\hat{x}_{\text{mmse}} = \Sigma_{xy}\Sigma_y^{-1}\, y_{\text{meas}}$

The matrix $L = \Sigma_x A^T (A\Sigma_x A^T + \Sigma_w)^{-1}$ is called the estimator gain
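A minimal NumPy sketch of the gain and the estimate (my addition; `estimator_gain` and `x_mmse` are hypothetical helper names, not from the slides):

```python
import numpy as np

def estimator_gain(A, Sigma_x, Sigma_w):
    """L = Sigma_x A^T (A Sigma_x A^T + Sigma_w)^{-1}."""
    S = A @ Sigma_x @ A.T + Sigma_w            # Sigma_y, the covariance of y
    # solve rather than invert; S and Sigma_x are symmetric, so
    # (S^{-1} A Sigma_x)^T = Sigma_x A^T S^{-1} = L
    return np.linalg.solve(S, A @ Sigma_x).T

def x_mmse(A, Sigma_x, Sigma_w, y_meas):
    """MMSE estimate for the zero-mean Gaussian linear model y = A x + w."""
    return estimator_gain(A, Sigma_x, Sigma_w) @ y_meas
```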


Example: linear measurements with Gaussian noise

Suppose $y = 2x + w$, with

• prior covariance $\mathbf{cov}(x) = 1$

• noise covariance $\mathbf{cov}(w) = 3$

The estimator is
$$\hat{x}_{\text{mmse}} = \frac{2\, y_{\text{meas}}}{7}$$

[Figure: the lines $y = Ax$ and $x = Ly$.]

The MMSE estimator gives a smaller answer than just inverting $A$,
$$|\hat{x}_{\text{mmse}}| \leq |A^{-1} y_{\text{meas}}|$$

since we have prior information about $x$
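A self-contained check of the $2/7$, along the lines of the `estimator_gain` sketch above (my addition):

```python
import numpy as np

A = np.array([[2.0]])
S = A @ np.eye(1) @ A.T + 3.0 * np.eye(1)   # A Sigma_x A^T + Sigma_w = 7
L = np.linalg.solve(S, A @ np.eye(1)).T     # estimator gain
print(L[0, 0], 2 / 7)                       # both 0.2857...
print(abs(L[0, 0]) <= abs(1 / A[0, 0]))     # True: |L y| <= |A^{-1} y|
```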


Non-zero means

Suppose $x \sim \mathcal{N}(\mu_x, \Sigma_x)$ and $w \sim \mathcal{N}(\mu_w, \Sigma_w)$.

The MMSE estimate of $x$ given $y = y_{\text{meas}}$ is
$$\hat{x}_{\text{mmse}} = \mu_x + \Sigma_x A^T (A\Sigma_x A^T + \Sigma_w)^{-1}\,(y_{\text{meas}} - A\mu_x - \mu_w)$$


The signal-to-noise ratio

Suppose $x$, $y$ and $w$ are scalar, and $y = Ax + w$. The signal-to-noise ratio is
$$s = \frac{\sqrt{A^2 \Sigma_x}}{\sqrt{\Sigma_w}}$$

• Commonly used for scalar $w, x, y$; not used in the vector case

• In terms of $s$, the MMSE estimate is
$$\hat{x}_{\text{mmse}} = \mu_x + \frac{A\Sigma_x}{A^2\Sigma_x + \Sigma_w}\,(y_{\text{meas}} - A\mu_x) = \frac{1}{1+s^2}\,\mu_x + \frac{s^2}{1+s^2}\,A^{-1} y_{\text{meas}}$$


Scalar systems and the SNR

The MMSE estimate is
$$\hat{x}_{\text{mmse}} = \frac{1}{1+s^2}\,\mu_x + \frac{s^2}{1+s^2}\,A^{-1} y_{\text{meas}}$$

• Let $\theta = \frac{1}{1+s^2}$; then $\hat{x}_{\text{mmse}} = \theta\,\mu_x + (1-\theta)\,A^{-1}y$, a convex combination of the prior mean and the least-squares estimate (traced numerically in the sketch below)

• When $s$ is small, $\hat{x}_{\text{mmse}} \approx \mu_x$, the prior mean

• When $s$ is large, $\hat{x}_{\text{mmse}} \approx A^{-1}y$, the least-squares estimate given $y$
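A small sweep over $s$ showing how the blend moves between the two extremes (my addition; the numbers are arbitrary):

```python
import numpy as np

A, mu_x, y_meas = 2.0, 1.0, 3.0   # arbitrary scalar example
for s in [0.1, 0.45, 1.0, 3.2, 10.0]:
    theta = 1 / (1 + s**2)
    x_hat = theta * mu_x + (1 - theta) * (y_meas / A)
    print(f"s = {s:5.2f}   theta = {theta:.3f}   x_hat = {x_hat:.3f}")
# small s: x_hat -> mu_x = 1.0;  large s: x_hat -> y_meas / A = 1.5
```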


Example: small noise

Suppose $y = 2x + w$, with

• prior covariance $\mathbf{cov}(x) = 1$

• noise covariance $\mathbf{cov}(w) = 0.4$; the signal is large compared to the noise

Hence

• SNR $s = \frac{\sqrt{A^2\Sigma_x}}{\sqrt{\Sigma_w}} \approx 3.2$

• The estimate is
$$\hat{x}_{\text{mmse}} = \frac{s^2}{1+s^2}\,A^{-1} y_{\text{meas}} \approx 0.9\, A^{-1} y_{\text{meas}}$$
i.e., close to $y_{\text{meas}}/2$

[Figure: the lines $y = Ax$ and $x = Ly$.]


Example: large noise

Suppose $y = 2x + w$, with

• prior covariance $\mathbf{cov}(x) = 1$

• noise covariance $\mathbf{cov}(w) = 20$; the signal is small compared to the noise

Hence

• SNR $s = \frac{\sqrt{A^2\Sigma_x}}{\sqrt{\Sigma_w}} \approx 0.45$

• The estimate is
$$\hat{x}_{\text{mmse}} = \frac{s^2}{1+s^2}\,A^{-1} y_{\text{meas}} \approx 0.17\, A^{-1} y_{\text{meas}}$$
i.e., closer to $0$ for all $y_{\text{meas}}$

[Figure: the lines $y = Ax$ and $x = Ly$.]


The posterior covariance

The posterior covariance of $x$ given $y = y_{\text{meas}}$ is
$$\mathbf{cov}(x \mid y = y_{\text{meas}}) = \Sigma_x - \Sigma_x A^T (A\Sigma_x A^T + \Sigma_w)^{-1} A\Sigma_x$$

• The above follows because
$$\mathbf{cov}(x \mid y = y_{\text{meas}}) = \Sigma_x - \Sigma_{xy}\Sigma_y^{-1}\Sigma_{yx}$$

• We can use this to compute the MSE, since
$$\mathbf{E}\left(\|x - \hat{x}_{\text{mmse}}\|^2 \,\middle|\, y = y_{\text{meas}}\right) = \operatorname{trace}\, \mathbf{cov}(x \mid y = y_{\text{meas}})$$
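A sketch of the posterior covariance and the resulting MSE (my addition; `posterior_cov` is a hypothetical name, and the example matrices are arbitrary):

```python
import numpy as np

def posterior_cov(A, Sigma_x, Sigma_w):
    """Sigma_x - Sigma_x A^T (A Sigma_x A^T + Sigma_w)^{-1} A Sigma_x."""
    S = A @ Sigma_x @ A.T + Sigma_w
    G = Sigma_x @ A.T
    return Sigma_x - G @ np.linalg.solve(S, G.T)

# arbitrary example: two unknowns, three measurements
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = posterior_cov(A, np.eye(2), 0.5 * np.eye(3))
print(P)             # posterior covariance of x given y
print(np.trace(P))   # = E(||x - x_mmse||^2 | y), the MSE
```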


The posterior covariance and SNR

For scalar problems, the posterior covariance of $x$ given $y = y_{\text{meas}}$ is
$$\mathbf{cov}(x \mid y = y_{\text{meas}}) = \frac{\Sigma_x}{1+s^2}$$

• The measurement reduces the uncertainty (covariance) in $x$ by the factor $\frac{1}{1+s^2}$


Example: navigation

$x = \begin{bmatrix} p \\ q \end{bmatrix}$ is our location; we measure distances $r_i$ to $m$ beacons at points $(u_i, v_i)$

[Figure: location $(p, q)$ and ranges $r_1, \ldots, r_4$ to beacons at $(u_1, v_1), \ldots, (u_4, v_4)$.]

Assume $p, q$ are small compared to $u_i, v_i$. Then, approximately,
$$y = Ax$$

• $A \in \mathbb{R}^{m \times 2}$; the $i$th row of $A$ is the transpose of the unit vector in the direction of beacon $i$

• $y = \begin{bmatrix} \sqrt{u_1^2 + v_1^2} - r_1 \\ \vdots \\ \sqrt{u_m^2 + v_m^2} - r_m \end{bmatrix}$ is the measured vector of distances


Example: navigation

Here $A \in \mathbb{R}^{3\times 2}$ with
$$A = \begin{bmatrix} b_1^T \\ b_2^T \\ b_3^T \end{bmatrix}$$
and $y = Ax$. Each $b_i$ is a unit vector.

• Prior information is $x \sim \mathcal{N}\!\left(\begin{bmatrix} 4 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}\right)$

• $y$ is measured; $y_i$ is the range measurement in the direction $b_i$, with noise $w$ added

• Beacons at $\begin{bmatrix} 50 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 20 \\ 50 \end{bmatrix}$, $\begin{bmatrix} -50 \\ -50 \end{bmatrix}$

• The figure shows the prior 90% confidence ellipsoid

[Figure: prior mean, actual location, prior 90% confidence ellipsoid, and beacon directions $b_1, b_2, b_3$.]
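A NumPy sketch of this setup (my addition). The beacon positions and prior are from the slide; the true location, the noise covariance, and the simulated measurement are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# beacons from the slide; rows of A are the unit vectors b_i
beacons = np.array([[50.0, 0.0], [20.0, 50.0], [-50.0, -50.0]])
A = beacons / np.linalg.norm(beacons, axis=1, keepdims=True)

mu_x = np.array([4.0, 4.0])            # prior mean
Sigma_x = 2.0 * np.eye(2)              # prior covariance
Sigma_w = np.diag([0.01, 1.0, 1.0])    # assumed noise covariance (first case on the next slide)

x_true = np.array([5.5, 2.0])          # illustrative actual location
y_meas = A @ x_true + rng.multivariate_normal(np.zeros(3), Sigma_w)

# MMSE estimate with non-zero prior mean (mu_w = 0)
S = A @ Sigma_x @ A.T + Sigma_w
L = np.linalg.solve(S, A @ Sigma_x).T
x_hat = mu_x + L @ (y_meas - A @ mu_x)
P = Sigma_x - L @ A @ Sigma_x          # posterior covariance
print(x_hat)
print(P)
```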


Example: posterior confidence ellipsoids

Posterior confidence ellipsoids for two different possible noise covariances.

$$\Sigma_w = \begin{bmatrix} 0.01 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

[Figure: actual location, MMSE estimate, and posterior confidence ellipsoid, with beacon directions $b_1, b_2, b_3$.]

$$\Sigma_w = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.01 \end{bmatrix}$$

[Figure: actual location, MMSE estimate, and posterior confidence ellipsoid, with beacon directions $b_1, b_2, b_3$.]


Alternative formula

There is another way to write the posterior covariance:
$$\mathbf{cov}(x \mid y = y_{\text{meas}}) = \left(\Sigma_x^{-1} + A^T \Sigma_w^{-1} A\right)^{-1}$$

• This follows from the Sherman-Morrison-Woodbury formula
$$(A - BD^{-1}C)^{-1} = A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1}$$

• This is very useful when we have fewer unknowns than measurements, i.e., when $\Sigma_x$ is smaller than $A\Sigma_x A^T$


Alternative formula

There is also an alternative formula for the estimator gain:
$$L = \left(\Sigma_x^{-1} + A^T\Sigma_w^{-1}A\right)^{-1} A^T \Sigma_w^{-1}$$

• Because
$$\begin{aligned}
L &= \Sigma_x A^T (A\Sigma_x A^T + \Sigma_w)^{-1} \\
  &= \Sigma_x A^T (\Sigma_w^{-1} A\Sigma_x A^T + I)^{-1}\,\Sigma_w^{-1} \\
  &= (\Sigma_x A^T \Sigma_w^{-1} A + I)^{-1}\,\Sigma_x A^T\Sigma_w^{-1} \quad \text{by the push-through identity} \\
  &= (A^T\Sigma_w^{-1}A + \Sigma_x^{-1})^{-1} A^T\Sigma_w^{-1}
\end{aligned}$$
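A quick numerical check (my addition) that the two gain formulas and the two posterior-covariance formulas agree, for random positive-definite $\Sigma_x$, $\Sigma_w$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 5
A = rng.standard_normal((m, n))

def random_spd(k):
    """A random symmetric positive-definite matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

Sigma_x, Sigma_w = random_spd(n), random_spd(m)
Sx_inv, Sw_inv = np.linalg.inv(Sigma_x), np.linalg.inv(Sigma_w)

S = A @ Sigma_x @ A.T + Sigma_w
L1 = Sigma_x @ A.T @ np.linalg.inv(S)                 # original gain formula
L2 = np.linalg.inv(Sx_inv + A.T @ Sw_inv @ A) @ A.T @ Sw_inv  # alternative

P1 = Sigma_x - Sigma_x @ A.T @ np.linalg.inv(S) @ A @ Sigma_x  # posterior cov
P2 = np.linalg.inv(Sx_inv + A.T @ Sw_inv @ A)                  # alternative

print(np.allclose(L1, L2), np.allclose(P1, P2))  # True True
```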


Comparison with least-squares

The least-squares approach minimizes
$$\|y - Ax\|^2 = \sum_{i=1}^m \left(y_i - a_i^T x\right)^2$$
where $A = \begin{bmatrix} a_1 & a_2 & \cdots & a_m \end{bmatrix}^T$

Suppose instead we minimize
$$\sum_{i=1}^m w_i \left(y_i - a_i^T x\right)^2$$
where $w_1, w_2, \ldots, w_m$ are positive weights


Weighted norms

More generally, let’s look at weighted norms:

• contours of the 2-norm $\|x\|_2 = \sqrt{x^T x}$

• contours of the weighted norm $\|x\|_W = \sqrt{x^T W x} = \|W^{1/2} x\|_2$, where $W = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}$

[Figure: contours of the 2-norm (circles) and of the weighted norm (tilted ellipses).]


Weighted least squares

The weighted least-squares problem: given $y_{\text{meas}} \in \mathbb{R}^m$,
$$\text{minimize} \quad \|y_{\text{meas}} - Ax\|_W$$

Assume $A \in \mathbb{R}^{m \times n}$ is skinny and full rank, and $W \in \mathbb{R}^{m \times m}$ with $W > 0$.

Then (by differentiating) the optimal $x$ is
$$x_{\text{wls}} = (A^T W A)^{-1} A^T W y_{\text{meas}}$$
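A minimal sketch of this formula (my addition); solving the normal equations is preferable to forming the inverse explicitly:

```python
import numpy as np

def wls(A, W, y_meas):
    """Weighted least squares: solves (A^T W A) x = A^T W y_meas."""
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ y_meas)

# arbitrary example: the third measurement is weighted heavily,
# so the fit works hardest to match it
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.diag([1.0, 1.0, 100.0])
print(wls(A, W, np.array([1.0, 2.0, 2.5])))
```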


Weighted least squares

[Figure: $y_{\text{meas}} \in \mathbb{R}^m$, the subspace $\operatorname{range}(A)$, and the projection $Ax_{\text{est}}$.]

• if there is no noise, $y$ lies in $\operatorname{range}(A)$

• the weighted least-squares estimate $x_{\text{wls}}$ minimizes $\|y_{\text{meas}} - Ax\|_W$

• $Ax_{\text{wls}}$ is the closest point (in the weighted norm) in $\operatorname{range}(A)$ to $y_{\text{meas}}$


MMSE and weighted least squares

Suppose we choose the weight $W = \Sigma_w^{-1}$; then the WLS solution is
$$x_{\text{wls}} = (A^T\Sigma_w^{-1}A)^{-1} A^T\Sigma_w^{-1}\, y_{\text{meas}}$$

Compare with the MMSE estimate when $x \sim \mathcal{N}(0, \Sigma_x)$ and $w \sim \mathcal{N}(0, \Sigma_w)$:
$$\hat{x}_{\text{mmse}} = (\Sigma_x^{-1} + A^T\Sigma_w^{-1}A)^{-1} A^T\Sigma_w^{-1}\, y_{\text{meas}}$$

• as the prior covariance $\Sigma_x \to \infty$, the MMSE estimate tends to the WLS estimate (see the sketch below)

• if $\Sigma_w = I$, then the MMSE estimate tends to the usual least-squares solution as $\Sigma_x \to \infty$

• the weighted norm heavily penalizes the residual $y - Ax$ in low-noise directions
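A sketch of the limit (my addition): as the prior covariance grows, the MMSE estimate approaches the WLS estimate with $W = \Sigma_w^{-1}$; the model data below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))                   # arbitrary skinny full-rank A
Sigma_w = np.diag([0.1, 1.0, 1.0, 2.0, 0.5])      # arbitrary noise covariance
y_meas = rng.standard_normal(5)

W = np.linalg.inv(Sigma_w)
x_wls = np.linalg.solve(A.T @ W @ A, A.T @ W @ y_meas)

for alpha in [1.0, 10.0, 1000.0]:
    Sigma_x = alpha * np.eye(2)                   # prior covariance grows
    x_hat = np.linalg.solve(np.linalg.inv(Sigma_x) + A.T @ W @ A,
                            A.T @ W @ y_meas)
    print(alpha, np.linalg.norm(x_hat - x_wls))   # tends to 0
```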