

Stochastic Signals and Systems
Chapter 3 / Prof. Dr.-Ing. Dieter Kraus
Institute of Water Acoustics, Sonar Engineering and Signal Theory

Contents
1 Probability Theory
2 Stochastic Processes
3 Parameter Estimation
4 Signal Detection
5 Spectrum Analysis
6 Optimal Filtering


3 Parameter Estimation
3.1 Estimating Function and Estimator
3.2 Sufficient Statistic, Exponential Families
3.3 Linear Least Squares Estimation
3.4 Confidence Intervals
3.5 Cramer-Rao Lower Bound
3.6 Maximum Likelihood Estimation
3.7 Bayesian Estimation
3.7.1 Minimum Mean Square Error Estimation
3.7.2 Minimum Mean Absolute Error Estimation
3.7.3 Maximum A Posteriori Estimation
References to Chapter 3


3 Parameter Estimation

The observations $\mathbf{x} = (x_1,\dots,x_n)^T$ are realizations of the random variables $\mathbf{X} = (X_1,\dots,X_n)^T$ with density $f_{\mathbf{X}}(\mathbf{x})$, which is an element of a known family $\{ f_{\mathbf{X}}(\mathbf{x} \mid \boldsymbol\theta) : \boldsymbol\theta \in \Omega \}$ whose parameter vector $\boldsymbol\theta = (\theta_1,\dots,\theta_p)^T$ is unknown.

Problem: For given observations $\mathbf{x}$ we are looking for an estimate $\hat{\boldsymbol\theta} = (\hat\theta_1,\dots,\hat\theta_p)^T$ of $\boldsymbol\theta$ that depends on $\mathbf{x}$, i.e. $\hat{\boldsymbol\theta} = \hat{\boldsymbol\theta}(\mathbf{x})$. In parameter estimation problems one distinguishes whether the parameter vector is a completely unknown quantity or whether a prior density for the parameter vector is supposed to be known.


In the latter case one interprets the parameter vector as a random vector, whose posterior density is used for the subsequent statistical inference. This approach leads to Bayes estimators.

3.1 Estimating Function and Estimator

For determining $\theta_i$, we denote the mapping

$$(x_1,\dots,x_n) \mapsto \hat\theta_i(x_1,\dots,x_n), \quad i = 1,\dots,p,$$

an estimating function for $\theta_i$ and its function value for a given set of observations an estimate. Since $x_1,\dots,x_n$ can be considered as random values, the estimates are also random and therefore only approximate the true values $\theta_i$.
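As a small added illustration (not part of the original slides), the following Python sketch applies two common estimating functions, the sample mean and the sample variance, to a set of observations; the Gaussian toy data and all variable names are assumptions made only for this example.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy observations x_1,...,x_n drawn from N(theta, 1) with true theta = 2.0
    theta_true = 2.0
    x = rng.normal(loc=theta_true, scale=1.0, size=100)

    # Estimating functions: mappings from the observations to an estimate
    def theta_hat_mean(x):
        # sample mean as estimating function for the location parameter
        return np.mean(x)

    def theta_hat_var(x):
        # sample variance (divisor n-1) as estimating function for the variance
        return np.var(x, ddof=1)

    # The returned numbers are estimates; repeating the experiment with new
    # observations gives different values, which is why the corresponding
    # random variables are called estimators.
    print(theta_hat_mean(x), theta_hat_var(x))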


We examine the accuracy properties of the estimates on the basis of the corresponding random variables. Therefore, we define the random variable

$$\hat\Theta_i = \hat\theta_i(X_1,\dots,X_n), \quad i = 1,\dots,p,$$

and denote it as estimator of $\theta_i$.

To characterize the properties of the estimator

$$\hat{\boldsymbol\Theta} = \hat{\boldsymbol\theta}(X_1,\dots,X_n)$$

completely, the density function $f_{\hat{\boldsymbol\Theta}}(\hat{\boldsymbol\theta} \mid \boldsymbol\theta)$, which generally depends on $\boldsymbol\theta$, has to be determined, e.g. using the methods described in Chapter 1.7.


For simplicity let $p = 1$. As an example, the densities of two alternative estimators $\hat\Theta$ and $\tilde\Theta$ for $\theta$ are depicted below.

[Figure: densities $f_{\hat\Theta}(\hat\theta \mid \theta)$ and $f_{\tilde\Theta}(\tilde\theta \mid \theta)$ of the two estimators plotted versus $\theta$.]


The determination of the density function can be rather complicated. Therefore, one must often be satisfied with the determination of the moments of an estimator.

Bias: The bias or systematic error is defined by

$$b(\hat{\boldsymbol\Theta}) = \big(b(\hat\Theta_1),\dots,b(\hat\Theta_p)\big)^T = \big(E(\hat\Theta_1)-\theta_1,\dots,E(\hat\Theta_p)-\theta_p\big)^T = E(\hat{\boldsymbol\Theta}) - \boldsymbol\theta,$$

where

$$E(\hat\Theta_i) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \hat\theta_i(x_1,\dots,x_n)\, f_{X_1,\dots,X_n}(x_1,\dots,x_n \mid \boldsymbol\theta)\, dx_1\cdots dx_n.$$

If $b(\hat{\boldsymbol\Theta}) = \mathbf{0}$, the estimator $\hat{\boldsymbol\Theta}$ is said to be unbiased.


Covariance: The covariance matrix is defined by

$$\mathrm{Cov}(\hat{\boldsymbol\Theta}) = \big[\mathrm{Cov}(\hat\Theta_i,\hat\Theta_j)\big]_{i,j=1,\dots,p} = \Big[E\big((\hat\Theta_i - E(\hat\Theta_i))(\hat\Theta_j - E(\hat\Theta_j))\big)\Big]_{i,j=1,\dots,p} = E\Big(\big(\hat{\boldsymbol\Theta}-E(\hat{\boldsymbol\Theta})\big)\big(\hat{\boldsymbol\Theta}-E(\hat{\boldsymbol\Theta})\big)^T\Big),$$

where the diagonal elements represent the variances

$$\mathrm{Var}(\hat\Theta_i) = \mathrm{Cov}(\hat\Theta_i,\hat\Theta_i) = E\big((\hat\Theta_i - E(\hat\Theta_i))^2\big) = E(\hat\Theta_i^2) - \big(E(\hat\Theta_i)\big)^2$$

for $i = 1,\dots,p$.


Mean Square Error: The matrix-valued mean square error is defined by

$$\mathrm{MSE}(\hat{\boldsymbol\Theta}) = \Big[E\big((\hat\Theta_i-\theta_i)(\hat\Theta_j-\theta_j)\big)\Big]_{i,j=1,\dots,p} = E\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) = \mathrm{Cov}(\hat{\boldsymbol\Theta}) + b(\hat{\boldsymbol\Theta})\,b(\hat{\boldsymbol\Theta})^T,$$

where the diagonal elements represent the mean square errors of the individual components $\hat\Theta_i$, given by

$$\mathrm{MSE}(\hat\Theta_i) = E\big((\hat\Theta_i-\theta_i)^2\big) = \mathrm{Var}(\hat\Theta_i) + b(\hat\Theta_i)^2$$

for $i = 1,\dots,p$.


Minimum Variance Unbiased (MVU) Estimator: An unbiased estimator $\hat{\boldsymbol\Theta}$ is MVU if for any unbiased estimator $\tilde{\boldsymbol\Theta}$ the following inequality holds:

$$\mathbf{a}^T \mathrm{Cov}(\hat{\boldsymbol\Theta})\,\mathbf{a} \le \mathbf{a}^T \mathrm{Cov}(\tilde{\boldsymbol\Theta})\,\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p,\ \forall\,\boldsymbol\theta\in\Omega.$$

[Figure: $\mathrm{Var}$ of candidate estimators plotted versus $\theta$ for two scenarios; left panel: an MVU estimator exists, right panel: no MVU estimator exists.]


Note: To emphasize that the variance is smallest for all $\boldsymbol\theta\in\Omega$, an MVU estimator is sometimes called a uniformly minimum variance unbiased (UMVU) estimator.

Minimum Mean Square Error (MMSE) Estimator: An estimator $\hat{\boldsymbol\Theta}$ provides an MMSE if for any estimator $\tilde{\boldsymbol\Theta}$ the following inequality holds:

$$\mathbf{a}^T \mathrm{MSE}(\hat{\boldsymbol\Theta})\,\mathbf{a} \le \mathbf{a}^T \mathrm{MSE}(\tilde{\boldsymbol\Theta})\,\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p,\ \forall\,\boldsymbol\theta\in\Omega.$$

Linear Estimator: An estimator $\hat{\boldsymbol\Theta}$ is called linear if it can be expressed as a linear function of the observations, i.e.

$$\hat{\boldsymbol\Theta} = \mathbf{A}\mathbf{X} \quad\text{with}\quad \mathbf{X} = (X_1,\dots,X_n)^T \ \text{and}\ \mathbf{A} = (a_{i,j})_{i=1,\dots,p;\,j=1,\dots,n}.$$
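To make the bias/variance/MSE bookkeeping concrete, here is a hedged Monte Carlo sketch (an addition, not from the slides): it compares the sample variance with divisor $n-1$ against the divisor-$n$ version for i.i.d. Gaussian data and checks the decomposition $\mathrm{MSE} = \mathrm{Var} + b^2$ numerically. Sample size, distribution and the number of trials are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 10, 200_000
    sigma2_true = 4.0                     # true variance of the Gaussian samples

    x = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, n))
    s2_unbiased = x.var(axis=1, ddof=1)   # divisor n-1 (unbiased)
    s2_biased = x.var(axis=1, ddof=0)     # divisor n   (biased, smaller variance)

    for name, est in [("ddof=1", s2_unbiased), ("ddof=0", s2_biased)]:
        bias = est.mean() - sigma2_true
        var = est.var()
        mse = np.mean((est - sigma2_true) ** 2)
        # up to Monte Carlo error, mse equals var + bias**2
        print(f"{name}: bias={bias:+.4f} var={var:.4f} mse={mse:.4f} var+bias^2={var + bias**2:.4f}")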


Exercise 3.1-1: MVU and MMSE variance estimator


Consistency: Often one is interested in the behavior of an estimator if the number of observations grows. An estimator $\hat{\boldsymbol\Theta}_n$ is said to be

– strongly consistent if $\hat{\boldsymbol\Theta}_n$ converges with probability 1 towards $\boldsymbol\theta$, i.e.
$$\hat{\boldsymbol\Theta}_n = \hat{\boldsymbol\theta}(X_1,\dots,X_n) \xrightarrow[n\to\infty]{\text{a.s.}} \boldsymbol\theta,$$

– mean square consistent if $\hat{\boldsymbol\Theta}_n$ converges in the mean square sense towards $\boldsymbol\theta$, i.e.
$$\hat{\boldsymbol\Theta}_n = \hat{\boldsymbol\theta}(X_1,\dots,X_n) \xrightarrow[n\to\infty]{\text{m.s.}} \boldsymbol\theta,$$

– consistent if $\hat{\boldsymbol\Theta}_n$ converges in probability towards $\boldsymbol\theta$, i.e.
$$\hat{\boldsymbol\Theta}_n = \hat{\boldsymbol\theta}(X_1,\dots,X_n) \xrightarrow[n\to\infty]{P} \boldsymbol\theta.$$


Asymptotic Normality: In certain cases one can show that an estimator $\hat{\boldsymbol\Theta}_n$ is asymptotically normally distributed such that

$$\lim_{n\to\infty}\sqrt{n}\,(\hat{\boldsymbol\Theta}_n - \boldsymbol\theta) \sim N_p(\mathbf{0},\boldsymbol\Sigma).$$

Consequently, for large $n$ the density function of $\hat{\boldsymbol\Theta}_n$ can be approximated by

$$f_{\hat{\boldsymbol\Theta}_n}(\hat{\boldsymbol\theta}\mid\boldsymbol\theta) \approx \frac{n^{p/2}}{(2\pi)^{p/2}\sqrt{\det(\boldsymbol\Sigma)}}\,\exp\!\Big\{-\frac{n}{2}\,(\hat{\boldsymbol\theta}-\boldsymbol\theta)^T\boldsymbol\Sigma^{-1}(\hat{\boldsymbol\theta}-\boldsymbol\theta)\Big\}.$$


3.2 Sufficient Statistic, Exponential Families

A function $T = t(\mathbf{X})$ that depends only on the observation model $\mathbf{X}$ is called a statistic.

Let $\mathbf{X} = (X_1,\dots,X_n)^T$ be a model of a binary sequence with probability $p$ and $1-p$ of observing a one and a zero, respectively. Furthermore, we suppose that $X_1,\dots,X_n$ are independent. Thus, our model can be described by the probabilities

$$P(\mathbf{X}=\mathbf{x}\mid p) = \prod_{k=1}^{n} p^{x_k}(1-p)^{1-x_k} = p^{\sum_{k=1}^{n}x_k}\,(1-p)^{\,n-\sum_{k=1}^{n}x_k} = p^{\,t(\mathbf{x})}(1-p)^{\,n-t(\mathbf{x})}.$$


Now the following question arises: do we collect all information in terms of inference about $p$ by recording only

$$t(\mathbf{x}) = \sum_{k=1}^{n} x_k\,?$$

The statistical understanding of the collection of all information is quantified in the following definition.

Definition: A statistic $T = t(\mathbf{X})$ is called sufficient for the parameter $\boldsymbol\theta$ if the conditional distribution of $\mathbf{X}$ given $T = t(\mathbf{x})$ is independent of $\boldsymbol\theta$ for all $t$, i.e.

$$F_{\mathbf{X}}\big(\mathbf{x}\mid T=t(\mathbf{x});\boldsymbol\theta\big) = F_{\mathbf{X}}\big(\mathbf{x}\mid T=t(\mathbf{x})\big).$$

Hence, $T$ contains "all information" about $\boldsymbol\theta$ included in $\mathbf{x}$.
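The following enumeration sketch (added here; it is a numerical cross-check, not a proof and not part of the slides) illustrates the definition for the binary-sequence model: for every value of $p$ the conditional probability of a particular sequence given its sum $t$ equals $1/\binom{n}{t}$, i.e. it does not depend on the parameter.

    import numpy as np
    from itertools import product
    from math import comb

    n = 5
    for p in (0.2, 0.7):                        # two different parameter values
        for t in range(n + 1):
            seqs = [s for s in product((0, 1), repeat=n) if sum(s) == t]
            probs = np.array([p**t * (1 - p)**(n - t) for _ in seqs])
            cond = probs / probs.sum()          # P(X = x | T = t)
            # uniform over the C(n,t) sequences, independent of p
            assert np.allclose(cond, 1 / comb(n, t))
    print("conditional distribution given T = t is the same for both values of p")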


Exercise 3.2-1: Sufficiency of $t(\mathbf{x}) = \sum_{k=1}^{n} x_k$


Because the conditional distribution has to be determined, a direct evaluation of sufficiency is usually difficult. Fortunately, the following theorem exists, whose conditions can be verified easily.

Theorem (factorization theorem for densities): A necessary and sufficient condition for a statistic $T = t(\mathbf{X})$ to be sufficient is that there exist non-negative functions $g(t\mid\boldsymbol\theta)$ and $h(\mathbf{x})$ such that $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ satisfies

$$f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta) = g\big(t(\mathbf{x})\mid\boldsymbol\theta\big)\cdot h(\mathbf{x}).$$


Exercise 3.2-2: Proof of the Theorem

Exercise 3.2-3: Sufficient statistic for mean and variance


Minimal Sufficient Statistic: A sufficient statistic T is said to be minimal if of all sufficient statistics it provides the greatest possible reduction of data, i.e. if for any sufficient statistic T' there exists a function s such that T = s(T').

Complete Sufficient Statistic: A sufficient statistic T is said to be complete (or unique) if the condition

$$E_{\boldsymbol\theta}\big(f(T)\big) = 0 \quad \forall\,\boldsymbol\theta\in\Omega$$

implies $f(T) = 0$ with probability 1 for all $\boldsymbol\theta$.

Note: Completeness ensures minimality.


Exercise 3.2-4: Minimal and complete sufficient statistics


Uniqueness of the unbiased estimating function: Completeness implies that there is just one estimating function of the sufficient statistic that provides an unbiased estimator of $\boldsymbol\theta$.

Let T be a complete sufficient statistic and $f_1$ and $f_2$ be two functions such that

$$E_{\boldsymbol\theta}\big(f_1(T)\big) = E_{\boldsymbol\theta}\big(f_2(T)\big) = \boldsymbol\theta \quad \forall\,\boldsymbol\theta\in\Omega.$$

Then, with $f = f_1 - f_2$,

$$E_{\boldsymbol\theta}\big(f_1(T)-f_2(T)\big) = E_{\boldsymbol\theta}\big(f(T)\big) = 0 \quad \forall\,\boldsymbol\theta\in\Omega,$$

and due to the completeness of T

$$f(T) = 0 \ \Rightarrow\ f_1(T) = f_2(T) \quad\text{with probability } 1\ \ \forall\,\boldsymbol\theta\in\Omega.$$


Theorem (Rao-Blackwell): Let $\tilde{\boldsymbol\Theta}$ be an unbiased estimator of $\boldsymbol\theta$ and $T = t(\mathbf{X})$ be a sufficient statistic for $\boldsymbol\theta$. Then the estimator defined by

$$\hat{\boldsymbol\Theta} = E(\tilde{\boldsymbol\Theta}\mid T)$$

is unbiased and improves on $\tilde{\boldsymbol\Theta}$ as follows:

$$\mathbf{a}^T\mathrm{Cov}(\hat{\boldsymbol\Theta})\,\mathbf{a} \le \mathbf{a}^T\mathrm{Cov}(\tilde{\boldsymbol\Theta})\,\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p.$$

Theorem (Lehmann-Scheffé): If in addition to the assumptions employed for the Rao-Blackwell theorem the sufficient statistic is complete, then

$$\hat{\boldsymbol\Theta} = E(\tilde{\boldsymbol\Theta}\mid T)$$

is the unique MVU estimator of $\boldsymbol\theta$.


Exercise 3.2-5: Proof of the Theorems


Exponential Families: A family of distributions $\{F_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\}$ is forming a $k$-dimensional exponential family if the distributions $F_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ have densities of the form

$$f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta) = h(\mathbf{x})\cdot\exp\!\Big(\sum_{i=1}^{k}\xi_i(\boldsymbol\theta)\,t_i(\mathbf{x}) - B(\boldsymbol\theta)\Big).$$

Frequently, it is more convenient to use the $\xi_i$ as the parameters and write the density in the canonical form

$$f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\xi) = h(\mathbf{x})\cdot\exp\!\Big(\sum_{i=1}^{k}\xi_i\,t_i(\mathbf{x}) - A(\boldsymbol\xi)\Big).$$
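As an added worked example (not in the original slides), the binary-sequence model from the beginning of this section is a one-dimensional exponential family; in canonical form,

$$P(\mathbf{x}\mid p) = p^{\,t(\mathbf{x})}(1-p)^{\,n-t(\mathbf{x})} = \exp\!\big(\xi\,t(\mathbf{x}) - A(\xi)\big) \quad\text{with}\quad \xi = \ln\frac{p}{1-p},\ \ t(\mathbf{x}) = \sum_{k=1}^{n}x_k,\ \ A(\xi) = n\ln\!\big(1+e^{\xi}\big),\ \ h(\mathbf{x}) = 1.$$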


Applying the factorization theorem for densities one can easily observe that

$$\mathbf{T} = (T_1,\dots,T_k)^T = \big(t_1(\mathbf{X}),\dots,t_k(\mathbf{X})\big)^T$$

constitutes a sufficient statistic for the exponential family.

Note: The parameter space $\Xi\subset\mathbb{R}^k$ of the natural parameter vector $\boldsymbol\xi = (\xi_1,\dots,\xi_k)^T$ is convex. If the exponential family is of full rank, i.e. the parameter space $\Xi$ contains a $k$-dimensional rectangle, then $\mathbf{T} = (T_1,\dots,T_k)^T$ is complete.


For exponential families one can claim that the integral

$$\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\xi)\,d\mathbf{x} = \int_{\mathbb{R}^n} h(\mathbf{x})\cdot\exp\!\Big(\sum_{i=1}^{k}\xi_i\,t_i(\mathbf{x}) - A(\boldsymbol\xi)\Big)d\mathbf{x} = 1$$

has derivatives of all orders with respect to the $\xi_i$, which can be obtained by differentiating under the integral sign. Exploiting this property of the integral we can find

$$E(T_i) = E\big(t_i(\mathbf{X})\big) = \frac{\partial}{\partial\xi_i}A(\boldsymbol\xi)$$

and

$$\mathrm{Cov}(T_i,T_j) = \mathrm{Cov}\big(t_i(\mathbf{X}),t_j(\mathbf{X})\big) = \frac{\partial^2}{\partial\xi_i\,\partial\xi_j}A(\boldsymbol\xi).$$
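Continuing the added binary-sequence example with $A(\xi) = n\ln(1+e^{\xi})$, the two identities reproduce the familiar binomial moments of the natural sufficient statistic:

$$E(T) = \frac{\partial A(\xi)}{\partial\xi} = \frac{n\,e^{\xi}}{1+e^{\xi}} = np, \qquad \mathrm{Var}(T) = \frac{\partial^2 A(\xi)}{\partial\xi^2} = \frac{n\,e^{\xi}}{(1+e^{\xi})^2} = np(1-p).$$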


Exercise 3.2-6: Rayleigh distribution, mean and variance of the natural sufficient statistic


3.3 Linear Least Squares Estimation

Consider the linear model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ \ i = 1,\dots,n,$$

where $\mathbf{X}$ and $\mathbf{Z}$ are $n\times 1$ vectors modeling the measurements and the measurement noise, respectively. Furthermore, $\mathbf{H}$ denotes a known $n\times p$ matrix and $\boldsymbol\theta$ the $p\times 1$ parameter vector that has to be estimated. The measurement noise is supposed to be statistically characterized by

$$E(\mathbf{Z}) = \mathbf{0} \quad\text{and}\quad \mathrm{Cov}(\mathbf{Z}) = E(\mathbf{Z}\mathbf{Z}^T) = \sigma_Z^2\,\mathbf{I}.$$

Hence, the measurement model $\mathbf{X}$ possesses the mean vector and covariance matrix

$$E(\mathbf{X}) = \mathbf{H}\boldsymbol\theta \quad\text{and}\quad \mathrm{Cov}(\mathbf{X}) = \sigma_Z^2\,\mathbf{I}.$$


Exercise 3.3-1: Model examples
• polynomial curve fitting
• amplitude and phase estimation of sinusoids
• FIR filter identification


Now, the least squares criterion can be expressed by

$$q(\boldsymbol\theta) = \sum_{i=1}^{n}\big(x_i - (h_{i1}\theta_1+\dots+h_{ip}\theta_p)\big)^2 = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T(\mathbf{x}-\mathbf{H}\boldsymbol\theta) = \mathbf{x}^T\mathbf{x} - 2\boldsymbol\theta^T\mathbf{H}^T\mathbf{x} + \boldsymbol\theta^T\mathbf{H}^T\mathbf{H}\boldsymbol\theta.$$

Differentiating with respect to $\boldsymbol\theta$ by using the identities

$$\nabla_{\boldsymbol\theta}\,\boldsymbol\theta^T\mathbf{A}\mathbf{a} = \mathbf{A}\mathbf{a}, \qquad \nabla_{\boldsymbol\theta}\,\boldsymbol\theta^T\mathbf{A}\boldsymbol\theta = (\mathbf{A}+\mathbf{A}^T)\boldsymbol\theta$$

results after equating to zero in the following so-called normal equation system

$$\mathbf{H}^T\mathbf{H}\,\boldsymbol\theta = \mathbf{H}^T\mathbf{x}.$$

A solution of the normal equation system is given by

$$\hat{\boldsymbol\theta} = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathbf{x},$$


where $(\mathbf{H}^T\mathbf{H})^{-}$ denotes a generalized inverse.

1) If $\operatorname{rank}(\mathbf{H}) = p$, the number of unknown parameters, then $(\mathbf{H}^T\mathbf{H})^{-} = (\mathbf{H}^T\mathbf{H})^{-1}$ is the ordinary inverse.

2) Let $\operatorname{rank}(\mathbf{H}) = M < p$, i.e. either $n < p$ or the columns of $\mathbf{H}$ are linearly dependent; then

$$(\mathbf{H}^T\mathbf{H})^{-} = (\mathbf{H}^T\mathbf{H})^{+} = \sum_{m=1}^{M}\lambda_m^{-1}\,\mathbf{u}_m\mathbf{u}_m^T$$

is the Moore-Penrose inverse, where $\lambda_1,\dots,\lambda_M$ and $\mathbf{u}_1,\dots,\mathbf{u}_M$ denote the non-zero eigenvalues and corresponding eigenvectors of $\mathbf{H}^T\mathbf{H}$, respectively.


Supplement: A generalized inverse $\mathbf{A}^{-}$ of $\mathbf{A}$ is defined by the property

$$\mathbf{A}\mathbf{A}^{-}\mathbf{A} = \mathbf{A}.$$

It is not unique. The Moore-Penrose inverse $\mathbf{A}^{+}$ of $\mathbf{A}$ is defined by the properties

$$\mathbf{A}\mathbf{A}^{+}\mathbf{A} = \mathbf{A}, \quad \mathbf{A}^{+}\mathbf{A}\mathbf{A}^{+} = \mathbf{A}^{+}, \quad (\mathbf{A}\mathbf{A}^{+})^T = \mathbf{A}\mathbf{A}^{+}, \quad (\mathbf{A}^{+}\mathbf{A})^T = \mathbf{A}^{+}\mathbf{A}.$$

It is unique and provides the minimum length solution of the linear equation system

$$\mathbf{A}\mathbf{x} = \mathbf{b}.$$

The Moore-Penrose inverse $\mathbf{A}^{+}$ can be determined by employing the singular value decomposition of $\mathbf{A}$.
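A minimal numerical sketch (added, with an arbitrary toy design matrix) of the two cases discussed above: for full-rank H the normal equations are solved directly, while for a rank-deficient H the Moore-Penrose pseudoinverse yields the minimum-norm least squares solution. NumPy's pinv and lstsq are used here in place of the explicit eigen-decomposition.

    import numpy as np

    rng = np.random.default_rng(2)

    # Full-rank case: n = 20 observations, p = 3 parameters
    H = rng.normal(size=(20, 3))
    theta = np.array([1.0, -2.0, 0.5])
    x = H @ theta + 0.1 * rng.normal(size=20)
    theta_ls = np.linalg.solve(H.T @ H, H.T @ x)              # ordinary inverse of H^T H
    print("full rank:", theta_ls)

    # Rank-deficient case: duplicated column, rank(H) = M < p
    H_def = np.column_stack([H[:, 0], H[:, 0], H[:, 2]])
    x_def = H_def @ np.array([0.5, 0.5, 1.0]) + 0.1 * rng.normal(size=20)
    theta_pinv = np.linalg.pinv(H_def.T @ H_def) @ H_def.T @ x_def   # Moore-Penrose inverse
    theta_lstsq = np.linalg.lstsq(H_def, x_def, rcond=None)[0]       # minimum-norm solution
    print("rank deficient:", theta_pinv, theta_lstsq)                # both agree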


Exercise 3.3-2: Normal equation system and generalized inverse


Properties of the least squares estimator

The mean vector is given by

$$E(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T E(\mathbf{X}) = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathbf{H}\boldsymbol\theta = \begin{cases}\boldsymbol\theta & \operatorname{rank}(\mathbf{H}) = p\\ \mathbf{U}_1\mathbf{U}_1^T\boldsymbol\theta = (\mathbf{I}-\mathbf{U}_2\mathbf{U}_2^T)\boldsymbol\theta & \operatorname{rank}(\mathbf{H}) = M < p\end{cases},$$

where the matrices $\mathbf{U}_1$ and $\mathbf{U}_2$ can be derived from the singular value decomposition

$$\mathbf{H}^T\mathbf{H} = \mathbf{U}\,\operatorname{diag}(\lambda_1,\dots,\lambda_M,0,\dots,0)\,\mathbf{U}^T$$

with

$$\mathbf{U} = (\mathbf{U}_1\ \mathbf{U}_2),\quad \mathbf{U}_1 = (\mathbf{u}_1,\dots,\mathbf{u}_M),\quad \mathbf{U}_2 = (\mathbf{u}_{M+1},\dots,\mathbf{u}_p).$$


The covariance matrix of the least squares estimator

$$\mathbf{C}_{\hat\Theta\hat\Theta} = \mathrm{Cov}(\hat{\boldsymbol\Theta}) = E\Big(\big(\hat{\boldsymbol\Theta}-E(\hat{\boldsymbol\Theta})\big)\big(\hat{\boldsymbol\Theta}-E(\hat{\boldsymbol\Theta})\big)^T\Big)$$

results after exploiting

$$E(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathbf{H}\boldsymbol\theta$$

in the expression

$$\begin{aligned}\mathbf{C}_{\hat\Theta\hat\Theta} &= E\big((\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T(\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-}\big)\\ &= (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T E\big((\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\big)\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-}\\ &= (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T(\sigma_Z^2\mathbf{I})\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-} = \sigma_Z^2\,(\mathbf{H}^T\mathbf{H})^{-}.\end{aligned}$$


Exercise 3.3-3: LSE for p = 1 and sample mean


Theorem (Gauß-Markov): Given the model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ \ i = 1,\dots,n,$$

where

$$E(\mathbf{X}) = \mathbf{H}\boldsymbol\theta, \qquad \mathrm{Cov}(\mathbf{X}) = \mathrm{Cov}(\mathbf{Z}) = \sigma_Z^2\,\mathbf{I}$$

and $\operatorname{rank}(\mathbf{H}) = p$. Then the best linear unbiased estimator (BLUE) is the least squares estimator

$$\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{X}.$$


Exercise 3.3-4: Proof of the Gauß-Markov Theorem


Consequently, the minimum of the sum of squares is

$$q(\hat{\boldsymbol\theta}) = \big(\mathbf{x}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\big)^T\big(\mathbf{x}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\big) = (\mathbf{x}-\mathbf{P}\mathbf{x})^T(\mathbf{x}-\mathbf{P}\mathbf{x}) = \mathbf{x}^T(\mathbf{I}-\mathbf{P})^T(\mathbf{I}-\mathbf{P})\mathbf{x} = \mathbf{x}^T\mathbf{P}^{\perp T}\mathbf{P}^{\perp}\mathbf{x} = \mathbf{x}^T\mathbf{P}^{\perp}\mathbf{x},$$

where

$$\mathbf{P} = \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T \quad\text{and}\quad \mathbf{P}^{\perp} = \mathbf{I}-\mathbf{P} = \mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$$

are projection matrices, which project a vector $\mathbf{a}\in\mathbb{R}^n$ by $\mathbf{P}\mathbf{a}$ and $\mathbf{P}^{\perp}\mathbf{a}$ into $R(\mathbf{H})$ and $N(\mathbf{H}^T)$, respectively.

$R(\mathbf{H})$: range of $\mathbf{H}$, i.e. the space spanned by the columns of $\mathbf{H}$
$N(\mathbf{H}^T)$: null space of $\mathbf{H}^T$, i.e. the space that is orthogonal to $R(\mathbf{H})$


Thus,

$$\mathbf{H}^T\mathbf{P}^{\perp} = \mathbf{0}, \quad \mathbf{P}^{\perp}\mathbf{H} = \mathbf{0} \quad\text{and}\quad \mathbf{P}\mathbf{P}^{\perp} = \mathbf{P}^{\perp}\mathbf{P} = \mathbf{0}.$$

The projection matrices are also symmetric and idempotent, i.e.

$$\mathbf{P}^T = \mathbf{P}, \quad \mathbf{P}^{\perp T} = \mathbf{P}^{\perp} \quad\text{and}\quad \mathbf{P}\mathbf{P} = \mathbf{P}, \quad \mathbf{P}^{\perp}\mathbf{P}^{\perp} = \mathbf{P}^{\perp}.$$

Moreover, by employing the trace of a square matrix

$$\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii} \quad\text{with}\quad \mathbf{A} = (a_{ij})_{i,j=1,\dots,n},$$

together with its property

$$\operatorname{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) = \operatorname{tr}(\mathbf{B}\mathbf{C}\mathbf{A}) = \operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{B}),$$

where $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are $n\times k$, $k\times l$ and $l\times n$ matrices, the minimum of the sum of squares can be expressed by

$$q(\hat{\boldsymbol\theta}) = \mathbf{x}^T\mathbf{P}^{\perp}\mathbf{x} = \operatorname{tr}(\mathbf{P}^{\perp}\mathbf{x}\mathbf{x}^T).$$


Now we want to consider how the variance $\sigma_Z^2$ of the noise model can be estimated. The expected value of the minimum of the sum of squares provides

$$\begin{aligned} E\big(q(\hat{\boldsymbol\Theta})\big) &= E\big(\operatorname{tr}(\mathbf{P}^{\perp}\mathbf{X}\mathbf{X}^T)\big) = \operatorname{tr}\Big(\mathbf{P}^{\perp}E\big((\mathbf{H}\boldsymbol\theta+\mathbf{Z})(\mathbf{H}\boldsymbol\theta+\mathbf{Z})^T\big)\Big)\\ &= \operatorname{tr}\Big(\mathbf{P}^{\perp}\big(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + E(\mathbf{Z})\boldsymbol\theta^T\mathbf{H}^T + \mathbf{H}\boldsymbol\theta E(\mathbf{Z}^T) + E(\mathbf{Z}\mathbf{Z}^T)\big)\Big)\\ &= \operatorname{tr}\big(\mathbf{P}^{\perp}(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + \sigma_Z^2\mathbf{I})\big) = \sigma_Z^2\operatorname{tr}(\mathbf{P}^{\perp}) = (n-p)\,\sigma_Z^2, \end{aligned}$$

where

$$\operatorname{tr}(\mathbf{P}^{\perp}) = \operatorname{tr}\big(\mathbf{I}_n - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\big) = n - \operatorname{tr}\big((\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{H}\big) = n - \operatorname{tr}(\mathbf{I}_p) = n - p$$

has been exploited.


This result motivates

$$S^2 = q(\hat{\boldsymbol\Theta})\,/\,(n-p)$$

as an unbiased estimator for $\sigma_Z^2$.

Theorem: If in addition to the Gauß-Markov theorem $\mathbf{Z}$ obeys a normal distribution, i.e. $\mathbf{Z}\sim N_n(\mathbf{0},\sigma_Z^2\mathbf{I})$, the following holds:

a) $\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{X} \sim N_p\big(\boldsymbol\theta,\ \sigma_Z^2(\mathbf{H}^T\mathbf{H})^{-1}\big)$

b) $\hat{\boldsymbol\Theta}$ and $S^2$ are stochastically independent

c) $(n-p)\,S^2/\sigma_Z^2 \sim \chi^2_{n-p}$
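A hedged simulation sketch (added, not from the slides): it generates the white-noise linear model repeatedly, forms $S^2 = q(\hat{\boldsymbol\theta})/(n-p)$ from the residual, and checks that the average of $S^2$ is close to $\sigma_Z^2$. The design matrix, parameter values and noise level are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 50, 3
    sigma_z = 0.7
    H = rng.normal(size=(n, p))
    theta = np.array([1.0, 0.5, -1.5])

    s2_values = []
    for _ in range(5000):
        x = H @ theta + sigma_z * rng.normal(size=n)
        theta_hat = np.linalg.solve(H.T @ H, H.T @ x)
        resid = x - H @ theta_hat              # P_perp x
        s2_values.append(resid @ resid / (n - p))

    print("mean of S^2:", np.mean(s2_values), " true sigma_Z^2:", sigma_z**2)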


Exercise 3.3-5: Proof of the Theorem

Exercise 3.3-6: Amplitude and phase estimation of sinusoids

Exercise 3.3-7: System identification (FIR filter)


In the following we generalize the linear model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ \ i = 1,\dots,n,$$

such that the measurement noise possesses the mean and the covariance matrix

$$E(\mathbf{Z}) = \mathbf{0} \quad\text{and}\quad \mathrm{Cov}(\mathbf{Z}) = \sigma_Z^2\,\mathbf{C}_{ZZ}.$$

Prewhitening: If $\mathbf{C}_{ZZ}$ is decomposed, e.g. by the Cholesky decomposition $\mathbf{C}_{ZZ} = \mathbf{C}\mathbf{C}^T$, and

$$\mathbf{Y} = \mathbf{C}^{-1}\mathbf{X}, \quad \mathbf{K} = \mathbf{C}^{-1}\mathbf{H}, \quad \mathbf{W} = \mathbf{C}^{-1}\mathbf{Z}$$

are introduced, the linear model can be reformulated to

$$\mathbf{Y} = \mathbf{K}\boldsymbol\theta + \mathbf{W}$$

with

$$E(\mathbf{Y}) = \mathbf{K}\boldsymbol\theta = \mathbf{C}^{-1}\mathbf{H}\boldsymbol\theta \quad\text{and}\quad \mathrm{Cov}(\mathbf{Y}) = \mathrm{Cov}(\mathbf{W}) = \sigma_Z^2\,\mathbf{I}.$$


Hence, the weighted least squares criterion for the generalized linear model can be derived as follows:

$$q(\boldsymbol\theta) = (\mathbf{y}-\mathbf{K}\boldsymbol\theta)^T(\mathbf{y}-\mathbf{K}\boldsymbol\theta) = \big(\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta)\big)^T\big(\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta)\big) = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T(\mathbf{C}^{-1})^T\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta) = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T\mathbf{C}_{ZZ}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta).$$

The normal equation system

$$\mathbf{K}^T\mathbf{K}\,\boldsymbol\theta = \mathbf{K}^T\mathbf{y} \quad\text{or equivalently}\quad \mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\,\boldsymbol\theta = \mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{x}$$

and its solution

$$\hat{\boldsymbol\theta} = (\mathbf{K}^T\mathbf{K})^{-}\mathbf{K}^T\mathbf{y} = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{x}$$

are obtained analogously to the white noise case.
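A minimal prewhitening sketch (added): correlated noise with covariance matrix $\mathbf{C}_{ZZ}$ is generated, the Cholesky factor $\mathbf{C}$ whitens the data, and ordinary least squares on the whitened system coincides with the direct weighted formula. The exponentially decaying covariance is an assumption made only for this example.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 40, 2
    H = np.column_stack([np.ones(n), np.arange(n)])
    theta = np.array([2.0, 0.3])

    # Correlated noise: C_ZZ with exponentially decaying correlation (illustrative choice)
    rho = 0.8
    C_zz = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    C = np.linalg.cholesky(C_zz)                # C_ZZ = C C^T
    x = H @ theta + C @ rng.normal(size=n)      # noise with covariance C_ZZ (sigma_Z = 1)

    # Prewhitening: Y = C^-1 X, K = C^-1 H, then ordinary least squares
    K = np.linalg.solve(C, H)
    y = np.linalg.solve(C, x)
    theta_white = np.linalg.solve(K.T @ K, K.T @ y)

    # Direct weighted least squares formula
    Ci = np.linalg.inv(C_zz)
    theta_gls = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ x)
    print(theta_white, theta_gls)               # the two results agree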


Properties of the generalized least squares estimator

The mean vector is given by

$$E(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}E(\mathbf{X}) = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\boldsymbol\theta = \begin{cases}\boldsymbol\theta & \operatorname{rank}(\mathbf{H}) = p\\ \mathbf{U}_1\mathbf{U}_1^T\boldsymbol\theta = (\mathbf{I}-\mathbf{U}_2\mathbf{U}_2^T)\boldsymbol\theta & \operatorname{rank}(\mathbf{H}) = M < p\end{cases},$$

where the matrices $\mathbf{U}_1$ and $\mathbf{U}_2$ can be derived from the singular value decomposition

$$\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H} = \mathbf{U}\,\operatorname{diag}(\lambda_1,\dots,\lambda_M,0,\dots,0)\,\mathbf{U}^T$$

with

$$\mathbf{U} = (\mathbf{U}_1\ \mathbf{U}_2),\quad \mathbf{U}_1 = (\mathbf{u}_1,\dots,\mathbf{u}_M),\quad \mathbf{U}_2 = (\mathbf{u}_{M+1},\dots,\mathbf{u}_p).$$


The covariance matrix of the generalized least squares estimator

$$\mathbf{C}_{\hat\Theta\hat\Theta} = \mathrm{Cov}(\hat{\boldsymbol\Theta}) = E\Big(\big(\hat{\boldsymbol\Theta}-E(\hat{\boldsymbol\Theta})\big)\big(\hat{\boldsymbol\Theta}-E(\hat{\boldsymbol\Theta})\big)^T\Big)$$

results after exploiting

$$E(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\boldsymbol\theta$$

in the expression

$$\begin{aligned}\mathbf{C}_{\hat\Theta\hat\Theta} &= E\big((\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}(\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\big)\\ &= (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}E\big((\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\big)\mathbf{C}_{ZZ}^{-1}\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\\ &= (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}(\sigma_Z^2\mathbf{C}_{ZZ})\mathbf{C}_{ZZ}^{-1}\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\\ &= \sigma_Z^2\,(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}.\end{aligned}$$


Theorem (Gauß-Markov): Given the model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ \ i = 1,\dots,n,$$

where

$$E(\mathbf{X}) = \mathbf{H}\boldsymbol\theta, \qquad \mathrm{Cov}(\mathbf{X}) = \mathrm{Cov}(\mathbf{Z}) = \sigma_Z^2\,\mathbf{C}_{ZZ}$$

and $\operatorname{rank}(\mathbf{H}) = p$. Then the best linear unbiased estimator (BLUE) is the least squares estimator

$$\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{X}.$$


Hence, the minimum of the weighted sum of squares is

$$\begin{aligned} q(\hat{\boldsymbol\theta}) &= \operatorname{tr}(\mathbf{P}^{\perp}\mathbf{y}\mathbf{y}^T) = \operatorname{tr}\Big(\big(\mathbf{I}-\mathbf{K}(\mathbf{K}^T\mathbf{K})^{-1}\mathbf{K}^T\big)\mathbf{y}\mathbf{y}^T\Big)\\ &= \operatorname{tr}\Big(\big(\mathbf{I}-\mathbf{C}^{-1}\mathbf{H}\big(\mathbf{H}^T(\mathbf{C}^{-1})^T\mathbf{C}^{-1}\mathbf{H}\big)^{-1}\mathbf{H}^T(\mathbf{C}^{-1})^T\big)\,\mathbf{C}^{-1}\mathbf{x}\mathbf{x}^T(\mathbf{C}^{-1})^T\Big)\\ &= \operatorname{tr}\Big(\mathbf{C}_{ZZ}^{-1}\big(\mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\big)\mathbf{x}\mathbf{x}^T\Big)\\ &= \operatorname{tr}\Big(\mathbf{C}_{ZZ}^{-1}\big(\mathbf{I}-\mathbf{P}_{C_{ZZ}^{-1}}\big)\mathbf{x}\mathbf{x}^T\Big) = \operatorname{tr}\big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}\mathbf{x}\mathbf{x}^T\big), \end{aligned}$$

where

$$\mathbf{P}_{C_{ZZ}^{-1}} = \mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1} \quad\text{and}\quad \mathbf{P}^{\perp}_{C_{ZZ}^{-1}} = \mathbf{I}-\mathbf{P}_{C_{ZZ}^{-1}}$$

again represent projection matrices.


Again, we are interested in estimating the variance $\sigma_Z^2$ of the noise. The expected value of the minimum of the weighted sum of squares results in

$$\begin{aligned} E\big(q(\hat{\boldsymbol\Theta})\big) &= E\big(\operatorname{tr}(\mathbf{C}_{ZZ}^{-1}\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}\mathbf{X}\mathbf{X}^T)\big) = \operatorname{tr}\Big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}E\big((\mathbf{H}\boldsymbol\theta+\mathbf{Z})(\mathbf{H}\boldsymbol\theta+\mathbf{Z})^T\big)\Big)\\ &= \operatorname{tr}\Big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}\big(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + E(\mathbf{Z})\boldsymbol\theta^T\mathbf{H}^T + \mathbf{H}\boldsymbol\theta E(\mathbf{Z}^T) + E(\mathbf{Z}\mathbf{Z}^T)\big)\Big)\\ &= \operatorname{tr}\Big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}\big(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + \sigma_Z^2\mathbf{C}_{ZZ}\big)\Big) = \sigma_Z^2\operatorname{tr}\big(\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}\big) = (n-p)\,\sigma_Z^2, \end{aligned}$$

where

$$\operatorname{tr}\big(\mathbf{P}^{\perp}_{C_{ZZ}^{-1}}\big) = \operatorname{tr}\big(\mathbf{I}_n - \mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\big) = n - \operatorname{tr}\big((\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\big) = n - \operatorname{tr}(\mathbf{I}_p) = n - p$$

has been utilized.


This result motivates

$$S^2 = q(\hat{\boldsymbol\Theta})\,/\,(n-p)$$

as an unbiased estimator for $\sigma_Z^2$.

Theorem: If in addition to the Gauß-Markov theorem $\mathbf{Z}$ obeys a normal distribution, i.e. $\mathbf{Z}\sim N_n(\mathbf{0},\sigma_Z^2\mathbf{C}_{ZZ})$, the following holds:

a) $\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{X} \sim N_p\big(\boldsymbol\theta,\ \sigma_Z^2(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\big)$

b) $\hat{\boldsymbol\Theta}$ and $S^2$ are stochastically independent

c) $(n-p)\,S^2/\sigma_Z^2 \sim \chi^2_{n-p}$


3.4 Confidence Intervals

Now, an unknown parameter $\theta$ is considered, where the density $f_{\mathbf{X}}(\mathbf{x}\mid\theta)$ of $\mathbf{X}$ and an estimator $\hat\Theta = \hat\theta(\mathbf{X})$ for $\theta$ possessing the density $f_{\hat\Theta}(\hat\theta\mid\theta)$ are given. With the knowledge of $f_{\hat\Theta}(\hat\theta\mid\theta)$ and a given $\alpha$, e.g. $\alpha = 0.05$, we can derive from

$$1-\alpha = P\big(\theta - a_1 < \hat\Theta \le \theta + a_2\big) = F_{\hat\Theta}(\theta + a_2\mid\theta) - F_{\hat\Theta}(\theta - a_1\mid\theta)$$

the probability equation

$$1-\alpha = P\big(\theta - a_1 < \hat\Theta \le \theta + a_2\big) = P\big(-a_1 < \hat\Theta - \theta \le a_2\big) = P\big(-\hat\Theta - a_1 < -\theta \le a_2 - \hat\Theta\big) = P\big(\hat\Theta - a_2 \le \theta < \hat\Theta + a_1\big),$$


which states that the random interval $[\hat\Theta - a_2,\ \hat\Theta + a_1)$ covers the parameter $\theta$ with probability $1-\alpha$.

For given observations $\mathbf{x}$ the set

$$C(\mathbf{x}) = \big\{\theta:\ \hat\theta(\mathbf{x}) - a_2 \le \theta < \hat\theta(\mathbf{x}) + a_1\big\}$$

is called

– confidence interval for $\theta$ with the confidence coefficient $1-\alpha$,

or alternatively

– interval estimate to the confidence level $1-\alpha$.


In case of an unknown parameter vector $\boldsymbol\theta$ an interval estimate can be constructed if a function $g(\mathbf{x}\mid\boldsymbol\theta)$ exists such that the corresponding random variable

$$Y = g(\mathbf{X}\mid\boldsymbol\theta)$$

obeys a known distribution which is independent of $\boldsymbol\theta$.

Frequently one can find such a function which depends on $\mathbf{x}$ only through the estimate $\hat{\boldsymbol\theta}(\mathbf{x})$, i.e.

$$g(\mathbf{x}\mid\boldsymbol\theta) = h\big(\hat{\boldsymbol\theta}(\mathbf{x})\mid\boldsymbol\theta\big).$$

Examples are given in the subsequent exercises.
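As a hedged numerical illustration of this construction (added here; it is in the spirit of Exercise 3.4-1 but not intended as its worked solution), a two-sided confidence interval for a constant signal amplitude in white Gaussian noise with known variance follows from the pivot $(\hat\Theta-\theta)/(\sigma/\sqrt{n}) \sim N(0,1)$; the numerical values are arbitrary choices.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(5)
    theta_true, sigma, n, alpha = 1.5, 0.5, 25, 0.05

    x = theta_true + sigma * rng.normal(size=n)   # signal + white noise, known sigma
    theta_hat = x.mean()                          # amplitude estimate

    # symmetric interval: a1 = a2 = z_{1-alpha/2} * sigma / sqrt(n)
    a = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
    print(f"{1 - alpha:.0%} confidence interval: [{theta_hat - a:.3f}, {theta_hat + a:.3f}]")

    # coverage check: the random interval should contain theta_true in about 95% of runs
    hits, repeats = 0, 20_000
    for _ in range(repeats):
        xr = theta_true + sigma * rng.normal(size=n)
        hits += abs(xr.mean() - theta_true) <= a
    print("empirical coverage:", hits / repeats)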


Exercise 3.4-1: Signal + white noise, confidence interval for the
• signal amplitude estimate (known noise variance)
• signal amplitude estimate (unknown noise variance)
• noise variance estimate

Exercise 3.4-2: Sinusoids + white noise, confidence interval for the amplitude estimates (unknown noise variance)


3.5 Cramer-Rao Lower Bound

In the following only unbiased estimators, i.e.

$$E(\hat{\boldsymbol\Theta}) = \int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \boldsymbol\theta,$$

are considered, whose covariance matrix

$$\mathbf{C}_{\hat\Theta\hat\Theta} = \mathrm{Cov}(\hat{\boldsymbol\Theta}) = E\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big)$$

exists. The Cramer-Rao inequality now provides a lower bound for the covariance matrix of any unbiased estimator of $\boldsymbol\theta$. Under the following regularity conditions certain expected values can be determined, which are of importance for deriving the Cramer-Rao inequality.


Regularity conditions:

a) The parameter set $\Omega$ is an interval in $\mathbb{R}^p$.

b) The gradient $\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ exists for any $\mathbf{x}$ and $\boldsymbol\theta$.

c) The gradient of $\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}$ with respect to $\boldsymbol\theta$ can be obtained by taking the gradient under the integral sign, i.e.
$$\nabla_{\boldsymbol\theta}\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}.$$

d) The gradient of $E\big(\hat{\boldsymbol\theta}(\mathbf{X})\big) = \int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}$ with respect to $\boldsymbol\theta$ can be obtained by taking the gradient under the integral sign, i.e.
$$\nabla_{\boldsymbol\theta}\int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}.$$


Corollary: If the former regularity conditions hold, then

1) $E\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big) = \mathbf{0}$

2) $E\Big(\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\Big) = \mathbf{I}$

Proof of 1):

$$E\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big) = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\big)\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\frac{\nabla_{\boldsymbol\theta}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)}\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \nabla_{\boldsymbol\theta}\int_{\mathbb{R}^n}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \nabla_{\boldsymbol\theta}1 = \mathbf{0}$$


Proof of 2):

$$\begin{aligned} E\Big(\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\Big) &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\big)\big(\hat{\boldsymbol\theta}(\mathbf{x})-\boldsymbol\theta\big)^T f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}\\ &= \int_{\mathbb{R}^n}\frac{\nabla_{\boldsymbol\theta}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)}\big(\hat{\boldsymbol\theta}(\mathbf{x})-\boldsymbol\theta\big)^T f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,\big(\hat{\boldsymbol\theta}(\mathbf{x})-\boldsymbol\theta\big)^T d\mathbf{x}\\ &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,\hat{\boldsymbol\theta}(\mathbf{x})^T d\mathbf{x} - \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}\ \boldsymbol\theta^T\\ &= \nabla_{\boldsymbol\theta}\,\boldsymbol\theta^T - (\nabla_{\boldsymbol\theta}1)\,\boldsymbol\theta^T = \mathbf{I} \end{aligned}$$


Definition (Fisher information matrix): The $p\times p$ Fisher information matrix is defined by

$$\mathbf{I}(\boldsymbol\theta) = E\Big(\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\big(\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\Big) = \int_{\mathbb{R}^n}\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\big)\big(\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\big)\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}.$$

Hence, the entries of the Fisher information matrix are for $i,j = 1,\dots,p$ given by

$$\mathbf{I}_{ij}(\boldsymbol\theta) = E\!\left(\frac{\partial\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{\partial\theta_i}\cdot\frac{\partial\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{\partial\theta_j}\right) = \int_{\mathbb{R}^n}\frac{\partial\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)}{\partial\theta_i}\cdot\frac{\partial\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)}{\partial\theta_j}\,f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}.$$


Theorem (Cramer-Rao inequality): Suppose the former assumptions and regularity conditions hold and the Fisher information matrix $\mathbf{I}(\boldsymbol\theta)$ is positive definite. Then the variance of any estimator $\hat\Theta_a = \mathbf{a}^T\hat{\boldsymbol\Theta}$ is bounded from below by

$$\mathrm{Var}(\hat\Theta_a) = \mathbf{a}^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{a} \ \ge\ \mathbf{a}^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p.$$

If in particular $\mathbf{a} = \mathbf{e}_i$, i.e. the $i$-th Cartesian unit vector, the inequality provides the lower limit for the variance of the $i$-th component of $\hat{\boldsymbol\Theta}$:

$$\mathrm{Var}(\hat\Theta_i) = \mathbf{e}_i^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{e}_i \ \ge\ \mathbf{e}_i^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{e}_i = \big(\mathbf{I}(\boldsymbol\theta)^{-1}\big)_{ii}, \quad i = 1,\dots,p.$$
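A short added worked example: for $n$ i.i.d. observations $X_k \sim N(\theta,\sigma^2)$ with known $\sigma^2$ the score and the Fisher information are

$$\frac{\partial\ln f_{\mathbf{X}}(\mathbf{x}\mid\theta)}{\partial\theta} = \frac{1}{\sigma^2}\sum_{k=1}^{n}(x_k-\theta), \qquad I(\theta) = E\!\left(\Big(\frac{1}{\sigma^2}\sum_{k=1}^{n}(X_k-\theta)\Big)^{2}\right) = \frac{n\sigma^2}{\sigma^4} = \frac{n}{\sigma^2},$$

so every unbiased estimator of $\theta$ satisfies $\mathrm{Var}(\hat\Theta) \ge \sigma^2/n$; the sample mean has exactly this variance and therefore attains the bound.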


Proof: First, the random vector

$$\mathbf{Y} = \hat{\boldsymbol\Theta} - \boldsymbol\theta - \mathbf{I}(\boldsymbol\theta)^{-1}\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)$$

is introduced and its second order moment matrix

$$\begin{aligned} E(\mathbf{Y}\mathbf{Y}^T) &= E\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) - E\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)\,\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\,\mathbf{I}(\boldsymbol\theta)^{-1}\\ &\quad - \mathbf{I}(\boldsymbol\theta)^{-1}E\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\,(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) + \mathbf{I}(\boldsymbol\theta)^{-1}E\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\,\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\,\mathbf{I}(\boldsymbol\theta)^{-1} \end{aligned}$$

is determined. After exploiting the former corollary, the


second order moment matrix can be simplified to

$$E(\mathbf{Y}\mathbf{Y}^T) = E\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) - \mathbf{I}(\boldsymbol\theta)^{-1}.$$

Since

$$\mathbf{a}^T E(\mathbf{Y}\mathbf{Y}^T)\,\mathbf{a} = E(\mathbf{a}^T\mathbf{Y}\mathbf{Y}^T\mathbf{a}) = E\big((\mathbf{a}^T\mathbf{Y})^2\big) \ge 0 \quad \forall\,\mathbf{a}\in\mathbb{R}^p,$$

the second order moment matrix is positive semidefinite. Thus, we can write

$$\mathbf{a}^T E(\mathbf{Y}\mathbf{Y}^T)\,\mathbf{a} = \mathbf{a}^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{a} - \mathbf{a}^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{a} \ge 0 \quad \forall\,\mathbf{a}\in\mathbb{R}^p,$$

and the assertion

$$\mathbf{a}^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{a} \ \ge\ \mathbf{a}^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p$$

of the theorem follows.


Under the following regularity condition an alternative and often more convenient expression for calculating the information matrix can be presented.

Regularity condition:

e) The Hessian matrix of $\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}$ with respect to $\boldsymbol\theta$ can be obtained by taking all the required second partial derivatives under the integral sign, i.e.

$$\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x}.$$


Corollary: If the former regularity condition holds, the Fisher information matrix can alternatively be determined by

$$\mathbf{I}(\boldsymbol\theta) = E\Big(\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\big(\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\Big) = -E\Big(\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\Big).$$

Consequently, the elements of the Fisher information matrix can be expressed by

$$\mathbf{I}_{ij}(\boldsymbol\theta) = E\!\left(\frac{\partial\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{\partial\theta_i}\cdot\frac{\partial\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{\partial\theta_j}\right) = -E\!\left(\frac{\partial^2\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{\partial\theta_i\,\partial\theta_j}\right), \quad i,j = 1,\dots,p.$$


Proof:

$$\begin{aligned} E\Big(\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\Big) &= E\!\left(\nabla_{\boldsymbol\theta}\frac{\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}\right) = E\!\left(\frac{\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\cdot f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta) - \nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\cdot\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{f_{\mathbf{X}}^2(\mathbf{X}\mid\boldsymbol\theta)}\right)\\ &= E\!\left(\frac{\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}\right) - E\!\left(\frac{\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\cdot\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)}{f_{\mathbf{X}}^2(\mathbf{X}\mid\boldsymbol\theta)}\right)\\ &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} - E\Big(\big(\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\big(\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\big)\Big)\\ &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,d\mathbf{x} - \mathbf{I}(\boldsymbol\theta) = \nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T 1 - \mathbf{I}(\boldsymbol\theta) = -\mathbf{I}(\boldsymbol\theta) \end{aligned}$$


Theorem: Let $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\xi)$ belong to the exponential family in canonical form and $\mathbf{T}$ denote the corresponding sufficient statistic.

1) The regularity conditions a) – e) are satisfied and the information matrix is positive definite:

$$\mathbf{I}(\boldsymbol\xi) = E\Big(\big(\nabla_{\boldsymbol\xi}\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\xi)\big)\big(\nabla_{\boldsymbol\xi}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\xi)\big)\Big) = -E\Big(\nabla_{\boldsymbol\xi}\nabla_{\boldsymbol\xi}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\xi)\Big) = \nabla_{\boldsymbol\xi}\nabla_{\boldsymbol\xi}^T A(\boldsymbol\xi) = \mathrm{Cov}(\mathbf{T}) = \mathbf{C}_{TT}.$$

Thus, for unbiased estimators the Cramer-Rao inequality holds.


2) Let the vector valued function

$$\boldsymbol\xi = \mathbf{g}(\boldsymbol\theta), \quad \boldsymbol\xi\in\mathbb{R}^k,\ \boldsymbol\theta\in\mathbb{R}^p \ \text{with}\ p \le k,$$

be one-to-one and differentiable with

$$\mathbf{G}(\boldsymbol\theta) = \nabla_{\boldsymbol\theta}\,\mathbf{g}(\boldsymbol\theta)^T = \left(\frac{\partial g_j(\boldsymbol\theta)}{\partial\theta_i}\right)_{i=1,\dots,p;\ j=1,\dots,k}.$$

Then the regularity conditions a) – e) are satisfied with respect to $\boldsymbol\theta$ and the information matrix can be expressed by

$$\mathbf{I}(\boldsymbol\theta) = -E\Big(\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\ln f_{\mathbf{X}}(\mathbf{X}\mid\boldsymbol\theta)\Big) = \mathbf{G}(\boldsymbol\theta)\,\mathbf{I}(\boldsymbol\xi)\big|_{\boldsymbol\xi=\mathbf{g}(\boldsymbol\theta)}\,\mathbf{G}(\boldsymbol\theta)^T = \mathbf{G}(\boldsymbol\theta)\,\big(\nabla_{\boldsymbol\xi}\nabla_{\boldsymbol\xi}^T A(\boldsymbol\xi)\big)\big|_{\boldsymbol\xi=\mathbf{g}(\boldsymbol\theta)}\,\mathbf{G}(\boldsymbol\theta)^T.$$


Now one immediately asks whether there exists an unbiased estimator for which the information inequality takes the equality sign, i.e. whose covariance matrix equals the inverse Fisher information matrix. An estimator for which this is the case, i.e. whose covariance matrix coincides with the inverse Fisher information matrix (the Cramer-Rao lower bound), is called efficient.

Efficient estimators can be achieved only in special cases, since the random vector $\mathbf{Y}$, introduced for proving the information inequality, must then be the zero vector with probability 1.


Exercise 3.5-1: Non/efficiency of linear least squares estimators when the noise is i.i.d. and normally distributed


The previous exercise showed that the deviation of the covariance matrix from the inverse Fisher information matrix can be neglected for large samples. This motivates the definition of the limit

$$\boldsymbol\Gamma(\boldsymbol\theta) = \lim_{n\to\infty}\frac{1}{n}\,\mathbf{I}_n(\boldsymbol\theta),$$

where $n$ indicates the growing sample size. If the matrix $\boldsymbol\Gamma(\boldsymbol\theta)$ exists and is not singular, then unbiased and consistent estimators $\hat{\boldsymbol\Theta}_n$ are of interest that satisfy the limit

$$\lim_{n\to\infty} n\,\mathbf{C}_{\hat\Theta_n\hat\Theta_n} = \boldsymbol\Gamma(\boldsymbol\theta)^{-1}.$$

Simultaneously, such estimators are then even consistent in the mean square sense.


Estimators used in practice are often neither unbiased nor mean square consistent. In these cases the limiting distributions of such estimators are considered. If the limiting distributions have the property

$$\lim_{n\to\infty}\sqrt{n}\,(\hat{\boldsymbol\Theta}_n - \boldsymbol\theta) \sim N_p\big(\mathbf{0},\mathbf{D}(\boldsymbol\theta)\big) \quad\text{and}\quad \mathbf{D}(\boldsymbol\theta) = \boldsymbol\Gamma(\boldsymbol\theta)^{-1},$$

the estimators are said to be asymptotically efficient.

Furthermore, under certain regularity conditions (similar to the former) one can show that in the class of asymptotically normally distributed estimators the following lower bound can be proved:

$$\mathbf{a}^T\mathbf{D}(\boldsymbol\theta)\,\mathbf{a} \ \ge\ \mathbf{a}^T\boldsymbol\Gamma(\boldsymbol\theta)^{-1}\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p.$$


Exercise 3.5-2: Asymptotic efficiency of linear least squares estimatorswhen the noise is i.i.d. and normally distributed


3.6 Maximum Likelihood Estimation

The maximum likelihood method is one of the most important procedures for the construction of estimators. For given observations $\mathbf{x}$ the procedure selects that $\boldsymbol\theta$ as maximum likelihood estimate $\hat{\boldsymbol\theta}(\mathbf{x})$ for which the density function $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ takes its maximum. Thus, the maximum likelihood estimate can be interpreted heuristically as that parameter vector that makes the occurrence of the observed data most likely. Instead of determining the maximum of $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$, one can, due to the monotonicity of the logarithmic function, alternatively maximize $\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$.


Definition: Suppose $\mathbf{X}$ possesses the density $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ with $\boldsymbol\theta\in\Omega$ and the vector of observations $\mathbf{x} = (x_1,\dots,x_n)^T$ is given. Then

$$\boldsymbol\theta \mapsto f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$$

is called the likelihood function and

$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta\in\Omega} f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta) = \arg\max_{\boldsymbol\theta\in\Omega}\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$$

is called the maximum likelihood estimate (MLE) for $\boldsymbol\theta$.

If the gradient of $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ with respect to $\boldsymbol\theta$ exists and if $f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)$ is positive for the given $\mathbf{x}$, one can try to find the MLE by solving the likelihood equation system

$$\nabla_{\boldsymbol\theta}\ln f_{\mathbf{X}}(\mathbf{x}\mid\boldsymbol\theta)\,\big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}(\mathbf{x})} = \mathbf{0}.$$


Remarks:
• Generally the likelihood equation system is nonlinear.
• The likelihood equation system can possess several solutions.
• The solutions of the likelihood equation system do not necessarily correspond to relative maxima.
• If the absolute maximum of the likelihood function lies on the boundary of Ω, it cannot be described by a solution of the likelihood equation system.
• Consequently, one generally has to employ sophisticated numerical optimization techniques for determining maximum likelihood estimates, as sketched below.
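A minimal numerical sketch of such an optimization (Python with NumPy/SciPy; the gamma model, the starting point and the data are assumptions): the gamma shape and scale have no joint closed-form MLE, so the negative log-likelihood is minimized with a general-purpose optimizer.

```python
import numpy as np
from scipy import optimize, stats

# Sketch: numerical ML estimation of the gamma shape a and scale c from i.i.d. data
# by minimizing the negative log-likelihood with a derivative-free optimizer.
rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=1000)   # assumed synthetic data

def neg_log_likelihood(params):
    a, c = params
    if a <= 0 or c <= 0:                         # keep the search inside Omega
        return np.inf
    return -np.sum(stats.gamma.logpdf(x, a, scale=c))

res = optimize.minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)                                     # should lie close to (3.0, 2.0)
```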


Properties of Maximum Likelihood Estimation:
1) Let θ = g(η) with $g: \mathbb{R}^q \to \mathbb{R}^p$, $p \le q$, be a continuous mapping of the parameter vector η and let $\hat{\eta}$ be the MLE of η. Then the MLE of θ is given by $\hat{\theta} = g(\hat{\eta})$, as illustrated numerically below.
2) Suppose t(x) is a sufficient transformation for θ, i.e.
$$f_X(x\,|\,\theta) = g\bigl(t(x)\,|\,\theta\bigr)\cdot h(x).$$
Then the maximum likelihood estimate $\hat{\theta}$ depends on x only through t(x).
3) If $f_X(x\,|\,\xi)$ belongs to the exponential family in canonical form, the related likelihood equation system
$$\nabla_\xi \ln\bigl(f_X(x\,|\,\xi)\bigr) = t(x) - \nabla_\xi A(\xi) = 0$$
possesses a unique solution.
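Property 1 can be checked with a small numerical sketch (Python with NumPy/SciPy; exponential toy data assumed): the MLE of the exponential mean η is the sample mean, and the MLE of the rate θ = g(η) = 1/η is simply g(η̂).

```python
import numpy as np
from scipy import optimize, stats

# Sketch of the invariance property: the MLE of the exponential mean eta is the sample
# mean; the MLE of the rate theta = g(eta) = 1/eta coincides with g(eta_hat).
rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1000)        # assumed mean eta = 2, rate theta = 0.5

eta_hat = x.mean()                               # MLE of the mean
theta_via_g = 1.0 / eta_hat                      # g(eta_hat)

# Direct numerical MLE of the rate theta for comparison.
nll = lambda t: -np.sum(stats.expon.logpdf(x, scale=1.0 / t))
theta_direct = optimize.minimize_scalar(nll, bounds=(1e-6, 10.0), method="bounded").x

print(theta_via_g, theta_direct)                 # the two values agree closely
```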


Exercise 3.6-1: Maximum likelihood estimation for linear models when the noise is i.i.d. and normally distributed

Exercise 3.6-2: Sinusoids + white noise, maximum likelihood estimation of amplitude, phase and frequency when the noise is i.i.d. and normally distributed

Exercise 3.6-3: An example of inconsistent maximum likelihood estimation


3.7 Bayesian Estimation
In the Bayesian theory of parameter estimation the unknown parameter vector θ is itself treated as the realization of a random experiment that obeys its own, so-called prior distribution $f_\Theta(\theta)$. The objective is to use $f_\Theta(\theta)$ together with the measurements x drawn from the distribution $f_X(x\,|\,\theta)$ to turn the prior distribution into the posterior distribution

$$f_\Theta(\theta\,|\,x) = \frac{f_{X\Theta}(x,\theta)}{f_X(x)} = \frac{f_X(x\,|\,\theta)\, f_\Theta(\theta)}{\int_{\mathbb{R}^p} f_{X\Theta}(x,\theta)\, d\theta} = \frac{f_X(x\,|\,\theta)\, f_\Theta(\theta)}{\int_{\mathbb{R}^p} f_X(x\,|\,\theta)\, f_\Theta(\theta)\, d\theta}$$

that is used for the statistical inference.
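A minimal sketch of this prior-to-posterior update (Python with NumPy/SciPy; the Bernoulli likelihood, the Beta(2, 2) prior and the data are assumptions, chosen because the exact posterior is known): the posterior is evaluated on a grid from $f_X(x\,|\,\theta)\,f_\Theta(\theta)$ and compared with the conjugate closed form.

```python
import numpy as np
from scipy import stats

# Sketch: grid evaluation of f(theta | x) proportional to f(x | theta) f(theta)
# for Bernoulli data with a Beta(a, b) prior on the success probability theta.
rng = np.random.default_rng(0)
theta_true, a, b = 0.3, 2.0, 2.0
x = rng.binomial(1, theta_true, size=50)              # measurements x

grid = np.linspace(1e-4, 1 - 1e-4, 2000)
log_post = (np.sum(stats.bernoulli.logpmf(x[:, None], grid), axis=0)
            + stats.beta.logpdf(grid, a, b))
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])              # normalize numerically

# Conjugacy check: the exact posterior is Beta(a + sum(x), b + n - sum(x)).
exact = stats.beta.pdf(grid, a + x.sum(), b + len(x) - x.sum())
print(np.max(np.abs(post - exact)))                   # small (grid discretization only)
```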


Loss function
The quality of the estimate $\hat{\theta}(x)$ is measured by a real-valued loss function $L(\theta,\hat{\theta}(x))$, e.g. by the squared Euclidean distance between θ and $\hat{\theta}(x)$, i.e.

$$L(\theta,\hat{\theta}(x)) = \bigl(\theta - \hat{\theta}(x)\bigr)^T\bigl(\theta - \hat{\theta}(x)\bigr).$$

Risk or loss average
The risk $R_L(\hat{\theta}\,|\,\theta)$ is the loss averaged over the distribution of the measurements for any fixed parameter vector θ ∈ Ω, i.e.

$$R_L(\hat{\theta}\,|\,\theta) = E_{X|\theta}\bigl(L(\theta,\hat{\theta}(X))\bigr) = \int_{\mathbb{R}^n} L(\theta,\hat{\theta}(x))\, f_X(x\,|\,\theta)\, dx.$$


Bayes risk
The Bayes risk $R_L(\hat{\theta})$ is the risk averaged over the prior distribution $f_\Theta(\theta)$, i.e.

$$
\begin{aligned}
R_L(\hat{\theta}) &= E_\Theta\bigl(R_L(\hat{\theta}\,|\,\Theta)\bigr) = E_\Theta\Bigl(E_{X|\Theta}\bigl(L(\Theta,\hat{\theta}(X))\bigr)\Bigr) \\
&= \int_{\mathbb{R}^p}\int_{\mathbb{R}^n} L(\theta,\hat{\theta}(x))\, f_X(x\,|\,\theta)\, f_\Theta(\theta)\, dx\, d\theta \\
&= \int_{\mathbb{R}^{n+p}} L(\theta,\hat{\theta}(x))\, f_{X\Theta}(x,\theta)\, dx\, d\theta = E_{X\Theta}\bigl(L(\Theta,\hat{\theta}(X))\bigr).
\end{aligned}
$$

The Bayes risk measures the average risk one incurs if the estimator $\hat{\Theta} = \hat{\theta}(X)$ is used for the random experiment at hand. Hence, $R_L(\hat{\theta})$ depends only on the estimating function $\hat{\theta}(x)$.
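The Bayes risk can be approximated by Monte Carlo averaging over draws of (Θ, X). A minimal scalar sketch (Python with NumPy; the Gaussian prior and likelihood and all numbers are assumptions) compares the squared-error Bayes risk of two estimating functions.

```python
import numpy as np

# Sketch: Monte Carlo Bayes risk E_{X,Theta} L(Theta, theta_hat(X)) for squared-error
# loss, with Theta ~ N(mu0, s0^2) and a single measurement X | theta ~ N(theta, s^2).
rng = np.random.default_rng(1)
mu0, s0, s = 0.0, 1.0, 0.5
M = 200_000                                     # Monte Carlo repetitions

theta = rng.normal(mu0, s0, M)                  # draw Theta from the prior
x = rng.normal(theta, s)                        # draw X | Theta from the likelihood

k = s0**2 / (s0**2 + s**2)                      # shrinkage weight of the Gaussian posterior
est_ml = x                                      # estimating function 1: the raw measurement
est_bayes = mu0 + k * (x - mu0)                 # estimating function 2: the posterior mean

print("risk of x          :", np.mean((theta - est_ml)**2))     # about s^2
print("risk of post. mean :", np.mean((theta - est_bayes)**2))  # about k*s^2 (smaller)
```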


Bayes estimator
Minimization of the Bayes risk $R_L(\hat{\theta})$ over the class of all estimating functions $\hat{\theta}(x)$ for which $R_L(\hat{\theta})$ exists provides the Bayes estimating function

$$\hat{\theta}_L(x) = \arg\min_{\hat{\theta}(x)} R_L(\hat{\theta}) = \arg\min_{\hat{\theta}(x)} E_{X\Theta}\bigl(L(\Theta,\hat{\theta}(X))\bigr)$$

and therefore the Bayes estimator $\hat{\Theta}_L = \hat{\theta}_L(X)$.

In the following, Bayes estimators are considered in more detail for the squared-error, the absolute-error and the uniform-error loss functions.


3.7.1 Minimum Mean Square Error Estimation

Non-Linear Minimum Mean Square Error Estimation
To estimate the random parameter vector Θ by means of the realization of the random vector X in the minimum mean square error sense, we have to find an estimating function $\hat{\theta}(x)$ such that

$$
\begin{aligned}
R_{MSE}(\hat{\theta}) &= E\Bigl(\bigl(\Theta - \hat{\theta}(X)\bigr)^T\bigl(\Theta - \hat{\theta}(X)\bigr)\Bigr) \\
&= \int_{\mathbb{R}^{n+p}} \bigl(\theta - \hat{\theta}(x)\bigr)^T\bigl(\theta - \hat{\theta}(x)\bigr)\, f_{X\Theta}(x,\theta)\, dx\, d\theta
\end{aligned}
$$

is minimum. Over the class of all functions $\hat{\theta}(x)$ for which the expected value exists, $R_{MSE}(\hat{\theta})$ is minimized by

$$\hat{\theta}_{MMSE}(x) = E\bigl(\theta\,|\,X = x\bigr),$$

cf. Exercise 1.9-18.

Linear Minimum Mean Square Error Estimation
Now we wish to estimate the random parameter vector Θ by the linear estimating function

$$\hat{\theta}(x) = Ax + b$$

such that the mean square error

$$
\begin{aligned}
R_{LMSE}(A,b) &= E\Bigl(\bigl(\Theta - (AX + b)\bigr)^T\bigl(\Theta - (AX + b)\bigr)\Bigr) \\
&= \int_{\mathbb{R}^{n+p}} \bigl(\theta - (Ax + b)\bigr)^T\bigl(\theta - (Ax + b)\bigr)\, f_{X\Theta}(x,\theta)\, dx\, d\theta
\end{aligned}
$$


is minimized by varying the parameter matrix A and vector b. Thus, we have to solve the minimization problem

$$(\hat{A},\hat{b}) = \arg\min_{A,b} R_{LMSE}(A,b).$$

The mean square error is minimum if

$$\frac{\partial}{\partial A} R_{LMSE}(A,b) = -2\,E\Bigl(\bigl(\Theta - (AX + b)\bigr)X^T\Bigr) = 0$$

$$\nabla_b R_{LMSE}(A,b) = -2\,E\bigl(\Theta - (AX + b)\bigr) = 0$$

are satisfied. Applying the expectation operator we obtain

$$R_{\Theta X} - A R_{XX} - b\mu_X^T = 0 \quad\text{and}\quad \mu_\Theta - A\mu_X - b = 0,$$

where

$$R_{XX} = E(XX^T),\quad R_{\Theta X} = E(\Theta X^T),\quad \mu_X = E(X)\quad\text{and}\quad \mu_\Theta = E(\Theta).$$


Resolving for $\hat{A}$ and $\hat{b}$ leads after some manipulations to

$$\hat{A} = \bigl(R_{\Theta X} - \mu_\Theta\mu_X^T\bigr)\bigl(R_{XX} - \mu_X\mu_X^T\bigr)^{-1}$$

$$\hat{b} = \mu_\Theta - \bigl(R_{\Theta X} - \mu_\Theta\mu_X^T\bigr)\bigl(R_{XX} - \mu_X\mu_X^T\bigr)^{-1}\mu_X$$

and therefore to the estimating function

$$\hat{\theta}_{LMMSE}(x) = \hat{A}x + \hat{b} = \mu_\Theta + \bigl(R_{\Theta X} - \mu_\Theta\mu_X^T\bigr)\bigl(R_{XX} - \mu_X\mu_X^T\bigr)^{-1}\bigl(x - \mu_X\bigr).$$

After exploiting the relationships between the covariances and the first and second order moments,

$$C_{XX} = R_{XX} - \mu_X\mu_X^T \quad\text{and}\quad C_{\Theta X} = R_{\Theta X} - \mu_\Theta\mu_X^T,$$

the estimating function can also be expressed by

$$\hat{\theta}_{LMMSE}(x) = \mu_\Theta + C_{\Theta X} C_{XX}^{-1}\bigl(x - \mu_X\bigr).$$
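A minimal numerical sketch of this estimating function (Python with NumPy; the scalar Θ, the two-channel measurement model and the use of sample moments from simulated data are assumptions):

```python
import numpy as np

# Sketch: LMMSE estimator theta_hat(x) = mu_Theta + C_ThetaX C_XX^{-1} (x - mu_X),
# with the moments replaced by sample moments from simulated (Theta, X) pairs.
rng = np.random.default_rng(2)
M = 100_000
theta = rng.normal(1.0, 2.0, M)                            # Theta with mean 1, variance 4
x = theta[:, None] + rng.normal(0.0, 1.0, (M, 2))          # two noisy measurements of Theta

mu_x, mu_t = x.mean(axis=0), theta.mean()
C_xx = np.cov(x, rowvar=False)                             # C_XX (2 x 2)
C_tx = ((theta - mu_t)[:, None] * (x - mu_x)).mean(axis=0) # C_ThetaX (1 x 2)

A = C_tx @ np.linalg.inv(C_xx)                             # A_hat
b = mu_t - A @ mu_x                                        # b_hat
theta_lmmse = x @ A + b                                    # theta_hat(x) = A x + b

print("MSE (LMMSE)     :", np.mean((theta - theta_lmmse)**2))
print("MSE (prior mean):", np.mean((theta - mu_t)**2))
```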


Theorem:
Let Y be a random vector composed of the random vectors X and Θ and obeying the distribution

$$Y = \begin{pmatrix} X \\ \Theta \end{pmatrix} \sim N_{n+p}\!\left(\begin{pmatrix} \mu_X \\ \mu_\Theta \end{pmatrix},\ \begin{pmatrix} \Sigma_{XX} & \Sigma_{X\Theta} \\ \Sigma_{\Theta X} & \Sigma_{\Theta\Theta} \end{pmatrix}\right).$$

Then X and Θ possess the marginal distributions

$$X \sim N_n\bigl(\mu_X,\Sigma_{XX}\bigr), \qquad \Theta \sim N_p\bigl(\mu_\Theta,\Sigma_{\Theta\Theta}\bigr)$$

and the conditional distribution

$$\Theta\,|\,X = x \sim N_p\bigl(\mu_\Theta + \Sigma_{\Theta X}\Sigma_{XX}^{-1}(x - \mu_X),\ \Sigma_{\Theta\Theta} - \Sigma_{\Theta X}\Sigma_{XX}^{-1}\Sigma_{X\Theta}\bigr),$$

cf. Exercise 1.10-7.


Theorem:
Given the linear model

$$X = H\Theta + Z,$$

where H is a known matrix and Θ and Z are statistically independent random vectors with $\Theta \sim N_p(\mu_\Theta,\Sigma_{\Theta\Theta})$ and $Z \sim N_n(0,\Sigma_{ZZ})$. Then

$$Y = \begin{pmatrix} X \\ \Theta \end{pmatrix} \sim N_{n+p}\!\left(\begin{pmatrix} H\mu_\Theta \\ \mu_\Theta \end{pmatrix},\ \begin{pmatrix} H\Sigma_{\Theta\Theta}H^T + \Sigma_{ZZ} & H\Sigma_{\Theta\Theta} \\ \Sigma_{\Theta\Theta}H^T & \Sigma_{\Theta\Theta} \end{pmatrix}\right)$$

and the MMSE estimate can be expressed by

$$\hat{\theta}_{MMSE}(x) = \mu_\Theta + \Sigma_{\Theta\Theta}H^T\bigl(H\Sigma_{\Theta\Theta}H^T + \Sigma_{ZZ}\bigr)^{-1}\bigl(x - H\mu_\Theta\bigr).$$
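A minimal sketch of the formula in this theorem (Python with NumPy; H, the prior moments and the noise covariance are assumed toy values):

```python
import numpy as np

# Sketch: X = H Theta + Z with Theta ~ N(mu, S_tt), Z ~ N(0, S_zz) independent;
# theta_hat(x) = mu + S_tt H^T (H S_tt H^T + S_zz)^{-1} (x - H mu).
rng = np.random.default_rng(3)
H = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])       # known 3x2 model matrix
mu = np.array([0.5, -0.5])                               # prior mean mu_Theta
S_tt = np.diag([1.0, 2.0])                               # prior covariance Sigma_ThetaTheta
S_zz = 0.25 * np.eye(3)                                  # noise covariance Sigma_ZZ

G = S_tt @ H.T @ np.linalg.inv(H @ S_tt @ H.T + S_zz)    # gain matrix of the theorem

# Monte Carlo check that the MMSE estimate beats the prior mean in mean square error.
M = 50_000
theta = rng.multivariate_normal(mu, S_tt, M)
x = theta @ H.T + rng.multivariate_normal(np.zeros(3), S_zz, M)
theta_hat = mu + (x - H @ mu) @ G.T                      # theta_hat(x) row by row

print("MSE (MMSE)      :", np.mean(np.sum((theta_hat - theta)**2, axis=1)))
print("MSE (prior mean):", np.mean(np.sum((mu - theta)**2, axis=1)))
```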


Exercise 3.7-1: Proof of the Theorem


3.7.2 Minimum Mean Absolute Error Estimation
The mean absolute error estimation is considered for the single parameter case only. To estimate the random parameter θ by means of the realization of the random vector X in the minimum mean absolute error sense,

$$R_{MAE}(\hat{\theta}) = E\bigl(|\Theta - \hat{\theta}(X)|\bigr) = \int_{\mathbb{R}^{n+1}} |\theta - \hat{\theta}(x)|\, f_{X\Theta}(x,\theta)\, dx\, d\theta$$

has to be minimized. As shown in the subsequent exercise, the minimization of $R_{MAE}(\hat{\theta})$ leads to

$$\int_{-\infty}^{\hat{\theta}_{MMAE}(x)} f_\Theta(\theta\,|\,x)\, d\theta = \int_{\hat{\theta}_{MMAE}(x)}^{\infty} f_\Theta(\theta\,|\,x)\, d\theta.$$

Hence, $\hat{\theta}_{MMAE}(x)$ is the median of the posterior density function $f_\Theta(\theta\,|\,x)$.
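A minimal single-parameter sketch (Python with NumPy/SciPy; the Gamma(3, scale 2) density is an assumed stand-in for a skewed posterior $f_\Theta(\theta\,|\,x)$): the MMAE estimate is read off as the grid point where the posterior CDF crosses 1/2, and it differs from the posterior mean and mode.

```python
import numpy as np
from scipy import stats

# Sketch: posterior median (MMAE) of a skewed posterior evaluated on a grid,
# here a Gamma(3, scale=2) density standing in for f_Theta(theta | x).
grid = np.linspace(0.0, 40.0, 20_001)
post = stats.gamma.pdf(grid, a=3.0, scale=2.0)
post /= post.sum()                                  # normalize the grid weights

theta_mmae = grid[np.searchsorted(np.cumsum(post), 0.5)]   # CDF crosses 1/2

# For comparison: posterior mean (MMSE) and posterior mode (MAP).
theta_mmse = np.sum(grid * post)
theta_map = grid[np.argmax(post)]
print(theta_mmae, theta_mmse, theta_map)            # roughly 5.35, 6.0, 4.0
```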


Exercise 3.7-2: Proof of median


3.7.3 Maximum A Posteriori Estimation
Another estimation technique that fits within the Bayesian framework is maximum a posteriori (MAP) estimation.

Single parameter case
To motivate the approach, the uniform loss function

$$L(\theta,\hat{\theta}(x)) = \begin{cases} 0 & |\theta - \hat{\theta}(x)| \le \delta \\ 1 & |\theta - \hat{\theta}(x)| > \delta \end{cases} \qquad\text{where } \delta > 0,$$

is introduced. The Bayes risk is then given by

$$
\begin{aligned}
R_L(\hat{\theta}) &= E\bigl(L(\Theta,\hat{\theta}(X))\bigr) = \int_{\mathbb{R}^{n+1}} L(\theta,\hat{\theta}(x))\, f_{X\Theta}(x,\theta)\, dx\, d\theta \\
&= \int_{\mathbb{R}^{n}} \left( \int_{-\infty}^{\infty} L(\theta,\hat{\theta}(x))\, f_\Theta(\theta\,|\,x)\, d\theta \right) f_X(x)\, dx.
\end{aligned}
$$


Since $f_X(x)$ and the integral in the brackets are always positive, it is sufficient to minimize

$$\int_{-\infty}^{\infty} L(\theta,\hat{\theta}(x))\, f_\Theta(\theta\,|\,x)\, d\theta = \int_{-\infty}^{\hat{\theta}(x)-\delta} f_\Theta(\theta\,|\,x)\, d\theta + \int_{\hat{\theta}(x)+\delta}^{\infty} f_\Theta(\theta\,|\,x)\, d\theta$$

or equivalently to maximize

$$1 - \left(\int_{-\infty}^{\hat{\theta}(x)-\delta} f_\Theta(\theta\,|\,x)\, d\theta + \int_{\hat{\theta}(x)+\delta}^{\infty} f_\Theta(\theta\,|\,x)\, d\theta\right) = \int_{\hat{\theta}(x)-\delta}^{\hat{\theta}(x)+\delta} f_\Theta(\theta\,|\,x)\, d\theta$$

for every x. For δ arbitrarily small, $\hat{\theta}(x)$ determines the mode of $f_\Theta(\theta\,|\,x)$ and is termed the maximum a posteriori (MAP) estimate, i.e.

$$\hat{\theta}_{MAP}(x) = \arg\max_\theta f_\Theta(\theta\,|\,x) = \arg\max_\theta f_X(x\,|\,\theta)\, f_\Theta(\theta) = \arg\max_\theta \Bigl(\ln\bigl(f_X(x\,|\,\theta)\bigr) + \ln\bigl(f_\Theta(\theta)\bigr)\Bigr).$$
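A minimal sketch of the last expression (Python with NumPy/SciPy; the Gaussian likelihood, Gaussian prior and all numbers are assumptions, chosen so the MAP estimate is also available in closed form):

```python
import numpy as np
from scipy import optimize, stats

# Sketch: MAP estimate via maximizing ln f_X(x | theta) + ln f_Theta(theta)
# for X_i | theta ~ N(theta, s^2) i.i.d. with prior theta ~ N(mu0, s0^2).
rng = np.random.default_rng(4)
mu0, s0, s = 0.0, 1.0, 2.0
x = rng.normal(1.5, s, size=25)                       # assumed measurements

def neg_log_posterior(theta):
    return -(np.sum(stats.norm.logpdf(x, loc=theta, scale=s))
             + stats.norm.logpdf(theta, loc=mu0, scale=s0))

theta_map = optimize.minimize_scalar(neg_log_posterior, bounds=(-10.0, 10.0),
                                     method="bounded").x

# Closed form for this conjugate Gaussian case (posterior mode = posterior mean).
w = len(x) / s**2
theta_closed = (w * x.mean() + mu0 / s0**2) / (w + 1.0 / s0**2)
print(theta_map, theta_closed)                        # the two values agree closely
```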


Multiple parameter case
Employing the marginal posterior density functions

$$f_{\Theta_i}(\theta_i\,|\,x) = \int_{\mathbb{R}^{p-1}} f_\Theta(\theta\,|\,x)\, d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_p,$$

the MAP estimates are given by

$$\hat{\theta}_{i,MAP}(x) = \arg\max_{\theta_i} f_{\Theta_i}(\theta_i\,|\,x), \qquad i = 1,\dots,p.$$

However, due to the required integration, one might propose the alternative vector-valued MAP estimate

$$\hat{\theta}_{MAP}(x) = \arg\max_\theta f_\Theta(\theta\,|\,x) = \arg\max_\theta f_X(x\,|\,\theta)\, f_\Theta(\theta).$$

Remark:
In general the two MAP estimates are not equivalent.


Exercise 3.7-3: MMSE, MMAE and MAP estimation of the parameter of an exponential distribution, where the parameter itself also obeys an exponential distribution

Exercise 3.7-4: Signal + noise, MMSE signal amplitude estimation


References to Chapter 3
[1] M. Fisz, Probability Theory and Mathematical Statistics, Krieger Publishing Company, 1980
[2] C.W. Helstrom, Elements of Signal Detection and Estimation, Prentice Hall, 1994
[3] S. Kay, Fundamentals of Statistical Signal Processing, Vol. 1: Estimation Theory, Prentice Hall, 1993
[4] E.L. Lehmann and G. Casella, Theory of Point Estimation, Springer, 2003
[5] H.V. Poor, An Introduction to Signal Detection and Estimation, Springer, 1994
[6] C.R. Rao, Linear Statistical Inference and Its Applications, John Wiley, 1973
[7] L.L. Scharf, Statistical Signal Processing, Addison Wesley, 1990