Stochastic Signals and Systems, Chapter 3
Institute of Water Acoustics, Sonar Engineering and Signal Theory
Prof. Dr.-Ing. Dieter Kraus
Source: homepages.hs-bremen.de/~krausd/iwss/sss3.pdf
TRANSCRIPT
Stochastic Signals and Systems

Contents
1 Probability Theory
2 Stochastic Processes
3 Parameter Estimation
4 Signal Detection
5 Spectrum Analysis
6 Optimal Filtering
3 Parameter Estimation
3.1 Estimating Function and Estimator
3.2 Sufficient Statistic, Exponential Families
3.3 Linear Least Squares Estimation
3.4 Confidence Intervals
3.5 Cramer-Rao Lower Bound
3.6 Maximum Likelihood Estimation
3.7 Bayesian Estimation
3.7.1 Minimum Mean Square Error Estimation
3.7.2 Minimum Mean Absolute Error Estimation
3.7.3 Maximum A Posteriori Estimation
References to Chapter 3
3 Parameter Estimation

The observations $(x_1,\dots,x_n)^T = \mathbf{x}$ are realizations of the random variables $(X_1,\dots,X_n)^T = \mathbf{X}$ with density $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$, which is an element of a known set $\{ f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta) : \boldsymbol\theta \in \Omega \}$ and whose parameter vector $\boldsymbol\theta = (\theta_1,\dots,\theta_p)^T$ is unknown.

Problem: For given observations $\mathbf{x}$ we are looking for an estimate $\hat{\boldsymbol\theta}$ of $\boldsymbol\theta$ that depends on $\mathbf{x}$, i.e. $\hat{\boldsymbol\theta} = \hat{\boldsymbol\theta}(\mathbf{x})$. In parameter estimation problems one distinguishes whether the parameter vector is a perfectly unknown quantity or whether a prior density of the parameter vector is supposed to be known.
In the latter case one interprets the parameter vector as a random vector, whose posterior density is used for the subsequent statistical inference. This approach leads to Bayes estimators.
3.1 Estimating Function and Estimator

For determining $\theta_i$, we denote the mapping

$$(x_1,\dots,x_n) \mapsto \hat\theta_i(x_1,\dots,x_n), \quad i = 1,\dots,p,$$

an estimating function for $\theta_i$ and its function value for a given set of observations an estimate. Since $x_1,\dots,x_n$ can be considered as random values, the estimates are also random and therefore only approximate the true values $\theta_i$.
We examine the accuracy properties of the estimates on the basis of the corresponding random variables. Therefore, we define the random variable

$$\hat\Theta_i = \hat\theta_i(X_1,\dots,X_n), \quad i = 1,\dots,p,$$

and denote it as the estimator of $\theta_i$.

To characterize the properties of the estimator $\hat{\boldsymbol\Theta} = \hat{\boldsymbol\theta}(X_1,\dots,X_n)$ completely, the density function $f_{\hat{\boldsymbol\Theta}}(\hat{\boldsymbol\theta}\,|\,\boldsymbol\theta)$, which generally depends on $\boldsymbol\theta$, has to be determined, e.g. using the methods described in Chapter 1.7.
For simplicity let $p = 1$. As an example, the densities $f_{\hat\Theta}(\hat\theta\,|\,\theta)$ and $f_{\tilde\Theta}(\tilde\theta\,|\,\theta)$ of two alternative estimators $\hat\Theta$ and $\tilde\Theta$ for $\theta$ are depicted below.

[Figure: the densities $f_{\hat\Theta}(\hat\theta\,|\,\theta)$ and $f_{\tilde\Theta}(\tilde\theta\,|\,\theta)$ of the two estimators, plotted with the true parameter value $\theta$ marked on the abscissa.]
The determination of the density function can be rather complicated. Therefore, one must often be satisfied with the determination of the moments of an estimator.

Bias: The bias or systematic error is defined by

$$\mathbf{b}(\hat{\boldsymbol\Theta}) = \big(b(\hat\Theta_1),\dots,b(\hat\Theta_p)\big)^T = \big(\mathrm{E}(\hat\Theta_1)-\theta_1,\dots,\mathrm{E}(\hat\Theta_p)-\theta_p\big)^T = \mathrm{E}(\hat{\boldsymbol\Theta}) - \boldsymbol\theta,$$

where

$$\mathrm{E}(\hat\Theta_i) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \hat\theta_i(x_1,\dots,x_n)\, f_{X_1,\dots,X_n}(x_1,\dots,x_n\,|\,\boldsymbol\theta)\, dx_1 \cdots dx_n.$$

If $\mathbf{b}(\hat{\boldsymbol\Theta}) = \mathbf{0}$, the estimator $\hat{\boldsymbol\Theta}$ is said to be unbiased.
Covariance: The covariance matrix is defined by

$$\mathrm{Cov}(\hat{\boldsymbol\Theta}) = \big[\mathrm{Cov}(\hat\Theta_i,\hat\Theta_j)\big]_{i,j=1,\dots,p} = \Big[\mathrm{E}\big((\hat\Theta_i-\mathrm{E}(\hat\Theta_i))(\hat\Theta_j-\mathrm{E}(\hat\Theta_j))\big)\Big]_{i,j=1,\dots,p} = \mathrm{E}\Big(\big(\hat{\boldsymbol\Theta}-\mathrm{E}(\hat{\boldsymbol\Theta})\big)\big(\hat{\boldsymbol\Theta}-\mathrm{E}(\hat{\boldsymbol\Theta})\big)^T\Big),$$

where the diagonal elements represent the variances

$$\mathrm{Var}(\hat\Theta_i) = \mathrm{Cov}(\hat\Theta_i,\hat\Theta_i) = \mathrm{E}\big((\hat\Theta_i-\mathrm{E}(\hat\Theta_i))^2\big) = \mathrm{E}(\hat\Theta_i^2) - \big(\mathrm{E}(\hat\Theta_i)\big)^2$$

for $i = 1,\dots,p$.
Mean Square Error: The matrix-valued mean square error is defined by

$$\mathrm{MSE}(\hat{\boldsymbol\Theta}) = \Big[\mathrm{E}\big((\hat\Theta_i-\theta_i)(\hat\Theta_j-\theta_j)\big)\Big]_{i,j=1,\dots,p} = \mathrm{E}\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) = \mathrm{Cov}(\hat{\boldsymbol\Theta}) + \mathbf{b}(\hat{\boldsymbol\Theta})\,\mathbf{b}(\hat{\boldsymbol\Theta})^T,$$

where the diagonal elements represent the mean square errors of the individual components $\hat\Theta_i$, given by

$$\mathrm{MSE}(\hat\Theta_i) = \mathrm{E}\big((\hat\Theta_i-\theta_i)^2\big) = \mathrm{Var}(\hat\Theta_i) + b(\hat\Theta_i)^2$$

for $i = 1,\dots,p$.
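To make the bias-variance decomposition of the MSE concrete, here is a small Monte Carlo sketch (not part of the original slides; the distribution, the sample size and the deliberately biased "shrunken mean" are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0              # assumed true parameter for this experiment
n, trials = 25, 100_000

# 'trials' independent data sets of size n from N(theta, 1)
x = rng.normal(theta, 1.0, size=(trials, n))

estimators = {
    "sample mean":   x.mean(axis=1),        # unbiased estimator
    "shrunken mean": 0.9 * x.mean(axis=1),  # deliberately biased estimator
}

for name, est in estimators.items():
    bias = est.mean() - theta
    var = est.var()
    mse = np.mean((est - theta) ** 2)
    # up to Monte Carlo error: MSE = Var + bias^2
    print(f"{name}: bias={bias:+.4f}, var={var:.4f}, "
          f"mse={mse:.4f}, var+bias^2={var + bias**2:.4f}")
```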
Minimum Variance Unbiased (MVU) Estimator: An unbiased estimator $\hat{\boldsymbol\Theta}$ is MVU if for any unbiased estimator $\tilde{\boldsymbol\Theta}$ the following inequality holds:

$$\mathbf{a}^T\mathrm{Cov}(\hat{\boldsymbol\Theta})\,\mathbf{a} \le \mathbf{a}^T\mathrm{Cov}(\tilde{\boldsymbol\Theta})\,\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p,\ \forall\,\boldsymbol\theta\in\Omega.$$

[Figure: variance of competing estimators plotted over $\theta$; left panel: an MVU estimator exists, right panel: no MVU estimator exists.]
Note: To emphasize that the variance is smallest for all $\boldsymbol\theta\in\Omega$, an MVU estimator is sometimes called a uniformly minimum variance unbiased (UMVU) estimator.

Minimum Mean Square Error (MMSE) Estimator: An estimator $\hat{\boldsymbol\Theta}$ provides an MMSE if for any estimator $\tilde{\boldsymbol\Theta}$ the following inequality holds:

$$\mathbf{a}^T\mathrm{MSE}(\hat{\boldsymbol\Theta})\,\mathbf{a} \le \mathbf{a}^T\mathrm{MSE}(\tilde{\boldsymbol\Theta})\,\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p,\ \forall\,\boldsymbol\theta\in\Omega.$$

Linear Estimator: An estimator is called linear if it can be expressed as a linear function of the observations, i.e.

$$\hat{\boldsymbol\Theta} = \mathbf{A}\mathbf{X} \quad\text{with}\quad \mathbf{X} = (X_1,\dots,X_n)^T \ \text{and}\ \mathbf{A} = (a_{i,j})_{i=1,\dots,p;\,j=1,\dots,n}.$$
Exercise 3.1-1: MVU and MMSE variance estimator
Consistency: Often one is interested in the behavior of an estimator if the number of observations grows. An estimator $\hat{\boldsymbol\Theta}_n$ is said to be

– strongly consistent if $\hat{\boldsymbol\Theta}_n$ converges with probability 1 towards $\boldsymbol\theta$, i.e. $\hat{\boldsymbol\Theta}_n = \hat{\boldsymbol\theta}(X_1,\dots,X_n) \xrightarrow{\text{a.s.}} \boldsymbol\theta$ as $n\to\infty$,
– mean square consistent if $\hat{\boldsymbol\Theta}_n$ converges in the mean square sense towards $\boldsymbol\theta$, i.e. $\hat{\boldsymbol\Theta}_n = \hat{\boldsymbol\theta}(X_1,\dots,X_n) \xrightarrow{\text{m.s.}} \boldsymbol\theta$ as $n\to\infty$,
– consistent if $\hat{\boldsymbol\Theta}_n$ converges in probability towards $\boldsymbol\theta$, i.e. $\hat{\boldsymbol\Theta}_n = \hat{\boldsymbol\theta}(X_1,\dots,X_n) \xrightarrow{P} \boldsymbol\theta$ as $n\to\infty$.
Asymptotic Normality: In certain cases one can show that an estimator is asymptotically normally distributed such that

$$\sqrt{n}\,(\hat{\boldsymbol\Theta}_n - \boldsymbol\theta) \xrightarrow[n\to\infty]{} \mathcal{N}_p(\mathbf{0},\boldsymbol\Sigma).$$

Consequently, for large $n$ the density function of $\hat{\boldsymbol\Theta}_n$ can be approximated by

$$f_{\hat{\boldsymbol\Theta}_n}(\hat{\boldsymbol\theta}\,|\,\boldsymbol\theta) = \frac{n^{p/2}}{(2\pi)^{p/2}\sqrt{\det(\boldsymbol\Sigma)}}\,\exp\Big\{-\frac{n}{2}\,(\hat{\boldsymbol\theta}-\boldsymbol\theta)^T\boldsymbol\Sigma^{-1}(\hat{\boldsymbol\theta}-\boldsymbol\theta)\Big\}.$$
3.2 Sufficient Statistic, Exponential Families

A function $T = t(\mathbf{X})$ that depends only on the observation model $\mathbf{X}$ is called a statistic.

Let $\mathbf{X} = (X_1,\dots,X_n)^T$ be a model of a binary sequence with probability $p$ and $1-p$ of observing a one and a zero, respectively. Furthermore, we suppose that $X_1,\dots,X_n$ are independent. Thus, our model can be described by the probabilities

$$P(\mathbf{X}=\mathbf{x}\,|\,p) = \prod_{k=1}^{n} p^{x_k}(1-p)^{1-x_k} = p^{\sum_{k=1}^{n}x_k}\,(1-p)^{\,n-\sum_{k=1}^{n}x_k} = p^{t(\mathbf{x})}\,(1-p)^{\,n-t(\mathbf{x})}.$$
Now the following question arises: do we collect all information in terms of inference about $p$ by recording only

$$t(\mathbf{x}) = \sum_{k=1}^{n} x_k\;?$$

The statistical understanding of the collection of all information is quantified in the following definition.

Definition: A statistic $T = t(\mathbf{X})$ is called sufficient for the parameter $\boldsymbol\theta$ if the conditional distribution of $\mathbf{X}$ given $T = t(\mathbf{x})$ is independent of $\boldsymbol\theta$ for all $t$, i.e.

$$F_{\mathbf{X}}\big(\mathbf{x}\,|\,T=t(\mathbf{x});\boldsymbol\theta\big) = F_{\mathbf{X}}\big(\mathbf{x}\,|\,T=t(\mathbf{x})\big).$$

Hence, $T$ contains "all information" about $\boldsymbol\theta$ included in $\mathbf{x}$.
Exercise 3.2-1: Sufficiency of $t(\mathbf{x}) = \sum_{k=1}^{n} x_k$
Because the conditional distribution has to be determined, a direct evaluation of sufficiency is usually difficult. Fortunately, the following theorem exists whose conditions can be verified easily.

Theorem (factorization theorem for densities): A necessary and sufficient condition for a statistic $T = t(\mathbf{X})$ to be sufficient is that there exist non-negative functions $g(t\,|\,\boldsymbol\theta)$ and $h(\mathbf{x})$ such that $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ satisfies

$$f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta) = g\big(t(\mathbf{x})\,|\,\boldsymbol\theta\big)\cdot h(\mathbf{x}).$$
Exercise 3.2-2: Proof of the Theorem
Exercise 3.2-3: Sufficient statistic for mean and variance
Minimal Sufficient Statistic: A sufficient statistic $T$ is said to be minimal if of all sufficient statistics it provides the greatest possible reduction of data, i.e. if for any sufficient statistic $T'$ there exists a function $s$ such that $T = s(T')$.

Complete Sufficient Statistic: A sufficient statistic $T$ is said to be complete (or unique) if the condition

$$\mathrm{E}_{\boldsymbol\theta}\big(f(T)\big) = 0 \quad \forall\,\boldsymbol\theta\in\Omega$$

implies $f(T) = 0$ with probability 1 for all $\boldsymbol\theta$.

Note: Completeness ensures minimality.
Exercise 3.2-4: Minimal and complete sufficient statistics
Uniqueness of the unbiased estimating function: Completeness implies that there is just one estimating function of the sufficient statistic that provides an unbiased estimator of $\boldsymbol\theta$. Let $T$ be a complete sufficient statistic and $f_1$ and $f_2$ be two functions such that

$$\mathrm{E}_{\boldsymbol\theta}\big(f_1(T)\big) = \mathrm{E}_{\boldsymbol\theta}\big(f_2(T)\big) = \boldsymbol\theta \quad \forall\,\boldsymbol\theta\in\Omega.$$

Then

$$\mathrm{E}_{\boldsymbol\theta}\big(f_1(T)-f_2(T)\big) = \mathrm{E}_{\boldsymbol\theta}\big(f(T)\big) = 0 \quad \forall\,\boldsymbol\theta\in\Omega,$$

and due to the completeness of $T$,

$$f(T) = 0 \;\Rightarrow\; f_1(T) = f_2(T) \ \text{with probability 1} \quad \forall\,\boldsymbol\theta\in\Omega.$$
Theorem (Rao-Blackwell): Let $\tilde{\boldsymbol\Theta}$ be an unbiased estimator of $\boldsymbol\theta$ and $T = t(\mathbf{X})$ be a sufficient statistic for $\boldsymbol\theta$. Then the estimator defined by

$$\hat{\boldsymbol\Theta} = \mathrm{E}(\tilde{\boldsymbol\Theta}\,|\,T)$$

is unbiased and improves on $\tilde{\boldsymbol\Theta}$ as follows:

$$\mathbf{a}^T\mathrm{Cov}(\hat{\boldsymbol\Theta})\,\mathbf{a} \le \mathbf{a}^T\mathrm{Cov}(\tilde{\boldsymbol\Theta})\,\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p.$$

Theorem (Lehmann-Scheffe): If in addition to the assumptions employed for the Rao-Blackwell theorem the sufficient statistic is complete, then

$$\hat{\boldsymbol\Theta} = \mathrm{E}(\tilde{\boldsymbol\Theta}\,|\,T)$$

is the unique MVU estimator of $\boldsymbol\theta$.
Exercise 3.2-5: Proof of the Theorems
Exponential Families: A family of distributions $\{F_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\}$ is forming a $k$-dimensional exponential family if the distributions $F_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ have densities of the form

$$f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta) = h(\mathbf{x})\cdot\exp\Big(\sum_{i=1}^{k}\xi_i(\boldsymbol\theta)\,t_i(\mathbf{x}) - B(\boldsymbol\theta)\Big).$$

Frequently, it is more convenient to use the $\xi_i$ as the parameters and write the density in the canonical form

$$f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\xi) = h(\mathbf{x})\cdot\exp\Big(\sum_{i=1}^{k}\xi_i\,t_i(\mathbf{x}) - A(\boldsymbol\xi)\Big).$$
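As a standard illustration (a well-known fact, not taken from these slides), the univariate Gaussian $\mathcal{N}(\mu,\sigma^2)$ forms a $k=2$ exponential family in canonical form:

$$f_X(x\,|\,\boldsymbol\xi) = \frac{1}{\sqrt{2\pi}}\exp\big(\xi_1 x + \xi_2 x^2 - A(\boldsymbol\xi)\big), \qquad \xi_1 = \frac{\mu}{\sigma^2},\quad \xi_2 = -\frac{1}{2\sigma^2},$$

with $t_1(x) = x$, $t_2(x) = x^2$, $h(x) = 1/\sqrt{2\pi}$ and $A(\boldsymbol\xi) = \mu^2/(2\sigma^2) + \ln\sigma = -\xi_1^2/(4\xi_2) + \tfrac12\ln\big(-1/(2\xi_2)\big)$.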
Applying the factorization theorem for densities one can easily observe that

$$\mathbf{T} = (T_1,\dots,T_k)^T = \big(t_1(\mathbf{X}),\dots,t_k(\mathbf{X})\big)^T$$

constitutes a sufficient statistic for the exponential family.

Note: The parameter space $\Xi\subset\mathbb{R}^k$ of the natural parameter vector $\boldsymbol\xi = (\xi_1,\dots,\xi_k)^T$ is convex. If the exponential family is of full rank, i.e. the parameter space $\Xi$ contains a $k$-dimensional rectangle, then $\mathbf{T} = (T_1,\dots,T_k)^T$ is complete.
For exponential families one can claim that the integral

$$\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\xi)\,d\mathbf{x} = \int_{\mathbb{R}^n} h(\mathbf{x})\cdot\exp\Big(\sum_{i=1}^{k}\xi_i\,t_i(\mathbf{x}) - A(\boldsymbol\xi)\Big)\,d\mathbf{x} = 1$$

has derivatives of all orders with respect to the $\xi_i$, which can be obtained by differentiating under the integral sign. Exploiting the properties of the integral we can find

$$\mathrm{E}(T_i) = \mathrm{E}\big(t_i(\mathbf{X})\big) = \frac{\partial}{\partial\xi_i}A(\boldsymbol\xi)$$

and

$$\mathrm{Cov}(T_i,T_j) = \mathrm{Cov}\big(t_i(\mathbf{X}),t_j(\mathbf{X})\big) = \frac{\partial^2}{\partial\xi_i\,\partial\xi_j}A(\boldsymbol\xi).$$
Exercise 3.2-6: Rayleigh distribution, mean and variance of the natural sufficient statistic
3.3 Linear Least Squares Estimation

Consider the linear model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ i = 1,\dots,n,$$

where $\mathbf{X}$ and $\mathbf{Z}$ are $n\times 1$ vectors modeling the measurements and the measurement noise, respectively. Furthermore, $\mathbf{H}$ denotes a known $n\times p$ matrix and $\boldsymbol\theta$ the $p\times 1$ parameter vector that has to be estimated. The measurement noise is supposed to be statistically characterized by $\mathrm{E}(\mathbf{Z}) = \mathbf{0}$ and $\mathrm{Cov}(\mathbf{Z}) = \mathrm{E}(\mathbf{Z}\mathbf{Z}^T) = \sigma_Z^2\,\mathbf{I}$. Hence, the measurement model $\mathbf{X}$ possesses the mean vector $\mathrm{E}(\mathbf{X}) = \mathbf{H}\boldsymbol\theta$ and covariance matrix $\mathrm{Cov}(\mathbf{X}) = \sigma_Z^2\,\mathbf{I}$.
Exercise 3.3-1: Model examples:
• polynomial curve fitting
• amplitude and phase estimation of sinusoids
• FIR filter identification
Now, the least squares criterion can be expressed by

$$q(\boldsymbol\theta) = \sum_{i=1}^{n}\big(x_i - (h_{i1}\theta_1 + \dots + h_{ip}\theta_p)\big)^2 = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T(\mathbf{x}-\mathbf{H}\boldsymbol\theta) = \mathbf{x}^T\mathbf{x} - 2\boldsymbol\theta^T\mathbf{H}^T\mathbf{x} + \boldsymbol\theta^T\mathbf{H}^T\mathbf{H}\boldsymbol\theta.$$

Differentiating with respect to $\boldsymbol\theta$ by using the identities

$$\nabla_{\boldsymbol\theta}\,\boldsymbol\theta^T\mathbf{A}\mathbf{a} = \mathbf{A}\mathbf{a}, \qquad \nabla_{\boldsymbol\theta}\,\boldsymbol\theta^T\mathbf{A}\boldsymbol\theta = (\mathbf{A}+\mathbf{A}^T)\boldsymbol\theta$$

results after equating to zero in the following so-called normal equation system:

$$\mathbf{H}^T\mathbf{H}\,\boldsymbol\theta = \mathbf{H}^T\mathbf{x}.$$

A solution of the normal equation system is given by

$$\hat{\boldsymbol\theta} = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathbf{x},$$
where $(\mathbf{H}^T\mathbf{H})^{-}$ denotes a generalized inverse.

1) If $\mathrm{rank}(\mathbf{H}) = p$, the number of unknown parameters, then

$$(\mathbf{H}^T\mathbf{H})^{-} = (\mathbf{H}^T\mathbf{H})^{-1}$$

is the ordinary inverse.

2) Let $\mathrm{rank}(\mathbf{H}) = M < p$, i.e. either $n < p$ or the columns of $\mathbf{H}$ are linearly dependent. Then

$$(\mathbf{H}^T\mathbf{H})^{-} = (\mathbf{H}^T\mathbf{H})^{+} = \sum_{m=1}^{M}\lambda_m^{-1}\,\mathbf{u}_m\mathbf{u}_m^T$$

is the Moore-Penrose inverse, where $\lambda_1,\dots,\lambda_M$ and $\mathbf{u}_1,\dots,\mathbf{u}_M$ denote the non-zero eigenvalues and corresponding eigenvectors of $\mathbf{H}^T\mathbf{H}$, respectively.
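A minimal numerical sketch of both cases (the design matrix and data below are invented; `numpy.linalg.pinv` supplies the Moore-Penrose inverse in the rank-deficient case):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
H = rng.normal(size=(n, p))                  # full-column-rank design matrix
theta_true = np.array([1.0, -2.0, 0.5])
x = H @ theta_true + 0.1 * rng.normal(size=n)

# case 1: rank(H) = p, solve the normal equations with the ordinary inverse
theta_ls = np.linalg.solve(H.T @ H, H.T @ x)

# case 2: linearly dependent columns, Moore-Penrose (minimum-norm) solution
H_def = np.column_stack([H[:, 0], H[:, 0], H[:, 1]])
theta_mp = np.linalg.pinv(H_def.T @ H_def) @ H_def.T @ x

print(theta_ls)   # close to theta_true
print(theta_mp)   # minimum-norm solution of the singular normal equations
```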
Supplement: A generalized inverse of $\mathbf{A}$ is defined by the property

$$\mathbf{A}\mathbf{A}^{-}\mathbf{A} = \mathbf{A}.$$

It is not unique. The Moore-Penrose inverse of $\mathbf{A}$ is defined by the properties

$$\mathbf{A}\mathbf{A}^{+}\mathbf{A} = \mathbf{A}, \quad \mathbf{A}^{+}\mathbf{A}\mathbf{A}^{+} = \mathbf{A}^{+}, \quad (\mathbf{A}\mathbf{A}^{+})^T = \mathbf{A}\mathbf{A}^{+}, \quad (\mathbf{A}^{+}\mathbf{A})^T = \mathbf{A}^{+}\mathbf{A}.$$

It is unique and provides the minimum length solution of the linear equation system

$$\mathbf{A}\mathbf{x} = \mathbf{b}.$$

The Moore-Penrose inverse $\mathbf{A}^{+}$ can be determined by employing the singular value decomposition of $\mathbf{A}$.
Exercise 3.3-2: Normal equation system and generalized inverse
Properties of the least squares estimator: The mean vector is given by

$$\mathrm{E}(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathrm{E}(\mathbf{X}) = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathbf{H}\boldsymbol\theta = \begin{cases}\boldsymbol\theta & \mathrm{rank}(\mathbf{H}) = p\\ \mathbf{U}_1\mathbf{U}_1^T\boldsymbol\theta = (\mathbf{I}-\mathbf{U}_2\mathbf{U}_2^T)\boldsymbol\theta & \mathrm{rank}(\mathbf{H}) = M < p\end{cases},$$

where the matrices $\mathbf{U}_1$ and $\mathbf{U}_2$ can be derived from the singular value decomposition

$$\mathbf{H}^T\mathbf{H} = \mathbf{U}\,\mathrm{diag}(\lambda_1,\dots,\lambda_M,0,\dots,0)\,\mathbf{U}^T$$

with

$$\mathbf{U} = (\mathbf{U}_1\ \mathbf{U}_2), \quad \mathbf{U}_1 = (\mathbf{u}_1,\dots,\mathbf{u}_M), \quad \mathbf{U}_2 = (\mathbf{u}_{M+1},\dots,\mathbf{u}_p).$$
The covariance matrix of the least squares estimator

$$\mathbf{C}_{\hat\Theta\hat\Theta} = \mathrm{Cov}(\hat{\boldsymbol\Theta}) = \mathrm{E}\Big(\big(\hat{\boldsymbol\Theta}-\mathrm{E}(\hat{\boldsymbol\Theta})\big)\big(\hat{\boldsymbol\Theta}-\mathrm{E}(\hat{\boldsymbol\Theta})\big)^T\Big)$$

results after exploiting

$$\mathrm{E}(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\mathbf{H}\boldsymbol\theta$$

in the expression

$$\begin{aligned}\mathbf{C}_{\hat\Theta\hat\Theta} &= \mathrm{E}\big((\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T(\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-}\big)\\ &= (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T\,\mathrm{E}\big((\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\big)\,\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-}\\ &= (\mathbf{H}^T\mathbf{H})^{-}\mathbf{H}^T(\sigma_Z^2\mathbf{I})\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-} = \sigma_Z^2\,(\mathbf{H}^T\mathbf{H})^{-}.\end{aligned}$$
Exercise 3.3-3: LSE for p = 1 and sample mean
Theorem (Gauß-Markov): Given the model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ i = 1,\dots,n,$$

where

$$\mathrm{E}(\mathbf{X}) = \mathbf{H}\boldsymbol\theta, \quad \mathrm{Cov}(\mathbf{X}) = \mathrm{Cov}(\mathbf{Z}) = \sigma_Z^2\,\mathbf{I}$$

and $\mathrm{rank}(\mathbf{H}) = p$. Then the best linear unbiased estimator (BLUE) is the least squares estimator

$$\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{X}.$$
Exercise 3.3-4: Proof of the Gauß-Markov Theorem
Consequently, the minimum of the sum of squares is

$$\begin{aligned} q(\hat{\boldsymbol\theta}) &= \big(\mathbf{x}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\big)^T\big(\mathbf{x}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\big)\\ &= (\mathbf{x}-\mathbf{P}\mathbf{x})^T(\mathbf{x}-\mathbf{P}\mathbf{x}) = \mathbf{x}^T(\mathbf{I}-\mathbf{P})^T(\mathbf{I}-\mathbf{P})\mathbf{x} = \mathbf{x}^T\mathbf{P}^{\perp T}\mathbf{P}^{\perp}\mathbf{x} = \mathbf{x}^T\mathbf{P}^{\perp}\mathbf{x},\end{aligned}$$

where

$$\mathbf{P} = \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T \quad\text{and}\quad \mathbf{P}^{\perp} = \mathbf{I}-\mathbf{P} = \mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$$

are projection matrices, which project a vector $\mathbf{a}\in\mathbb{R}^n$ by $\mathbf{P}\mathbf{a}$ and $\mathbf{P}^{\perp}\mathbf{a}$ into $R(\mathbf{H})$ and $N(\mathbf{H}^T)$, respectively.

$R(\mathbf{H})$: range of $\mathbf{H}$, i.e. the space spanned by the columns of $\mathbf{H}$
$N(\mathbf{H}^T)$: nullspace of $\mathbf{H}^T$, i.e. the space that is orthogonal to $R(\mathbf{H})$
Thus, $\mathbf{H}^T\mathbf{P}^{\perp} = \mathbf{0}$, $\mathbf{P}^{\perp}\mathbf{H} = \mathbf{0}$ and $\mathbf{P}\mathbf{P}^{\perp} = \mathbf{P}^{\perp}\mathbf{P} = \mathbf{0}$. The projection matrices are also symmetric and idempotent, i.e.

$$\mathbf{P}^T = \mathbf{P}, \quad \mathbf{P}\mathbf{P} = \mathbf{P} \quad\text{and}\quad \mathbf{P}^{\perp T} = \mathbf{P}^{\perp}, \quad \mathbf{P}^{\perp}\mathbf{P}^{\perp} = \mathbf{P}^{\perp}.$$

Moreover, by employing the trace of a square matrix

$$\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii} \quad\text{with}\quad \mathbf{A} = (a_{ij})_{i,j=1,\dots,n}$$

together with its property

$$\mathrm{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) = \mathrm{tr}(\mathbf{B}\mathbf{C}\mathbf{A}) = \mathrm{tr}(\mathbf{C}\mathbf{A}\mathbf{B}),$$

where $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are $n\times k$, $k\times l$ and $l\times n$ matrices, the minimum of the sum of squares can be expressed by

$$q(\hat{\boldsymbol\theta}) = \mathbf{x}^T\mathbf{P}^{\perp}\mathbf{x} = \mathrm{tr}(\mathbf{P}^{\perp}\mathbf{x}\mathbf{x}^T).$$
Now we want to consider how the variance $\sigma_Z^2$ of the noise model can be estimated. The expected value of the minimum of the sum of squares provides

$$\begin{aligned}\mathrm{E}\big(q(\hat{\boldsymbol\Theta})\big) &= \mathrm{E}\big(\mathrm{tr}(\mathbf{P}^{\perp}\mathbf{X}\mathbf{X}^T)\big) = \mathrm{tr}\Big(\mathbf{P}^{\perp}\,\mathrm{E}\big((\mathbf{H}\boldsymbol\theta+\mathbf{Z})(\mathbf{H}\boldsymbol\theta+\mathbf{Z})^T\big)\Big)\\ &= \mathrm{tr}\Big(\mathbf{P}^{\perp}\big(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + \mathrm{E}(\mathbf{Z})\boldsymbol\theta^T\mathbf{H}^T + \mathbf{H}\boldsymbol\theta\,\mathrm{E}(\mathbf{Z}^T) + \mathrm{E}(\mathbf{Z}\mathbf{Z}^T)\big)\Big)\\ &= \mathrm{tr}\big(\mathbf{P}^{\perp}(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + \sigma_Z^2\mathbf{I})\big) = \sigma_Z^2\,\mathrm{tr}(\mathbf{P}^{\perp}) = (n-p)\,\sigma_Z^2,\end{aligned}$$

where

$$\mathrm{tr}(\mathbf{P}^{\perp}) = \mathrm{tr}\big(\mathbf{I}_n - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\big) = n - \mathrm{tr}\big((\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{H}\big) = n - \mathrm{tr}(\mathbf{I}_p) = n - p$$

has been exploited.
This result motivates

$$S^2 = q(\hat{\boldsymbol\Theta})/(n-p)$$

as an unbiased estimator for $\sigma_Z^2$.

Theorem: If in addition to the Gauß-Markov theorem $\mathbf{Z}$ obeys a normal distribution, i.e. $\mathbf{Z}\sim\mathcal{N}_n(\mathbf{0},\sigma_Z^2\mathbf{I})$, the following holds:

a) $\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{X} \sim \mathcal{N}_p\big(\boldsymbol\theta,\,\sigma_Z^2(\mathbf{H}^T\mathbf{H})^{-1}\big)
b) $\hat{\boldsymbol\Theta}$ and $S^2$ are stochastically independent
c) $(n-p)\,S^2/\sigma_Z^2 \sim \chi^2_{n-p}$
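A short simulation (invented data; not from the slides) illustrating that $S^2 = q(\hat{\boldsymbol\Theta})/(n-p)$ is indeed unbiased for $\sigma_Z^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2, trials = 40, 4, 0.5, 20_000

H = rng.normal(size=(n, p))
theta = rng.normal(size=p)
# projector onto N(H^T); q(theta_hat) = x^T P_perp x
P_perp = np.eye(n) - H @ np.linalg.inv(H.T @ H) @ H.T

s2 = np.empty(trials)
for k in range(trials):
    x = H @ theta + np.sqrt(sigma2) * rng.normal(size=n)
    s2[k] = x @ P_perp @ x / (n - p)

print(s2.mean())   # approximately sigma2 = 0.5
```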
Exercise 3.3-5: Proof of the Theorem
Exercise 3.3-6: Amplitude and phase estimation of sinusoids
Exercise 3.3-7: System identification (FIR filter)
In the following we generalize the linear model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ i = 1,\dots,n,$$

such that the measurement noise possesses the mean $\mathrm{E}(\mathbf{Z}) = \mathbf{0}$ and the covariance matrix $\mathrm{Cov}(\mathbf{Z}) = \sigma_Z^2\,\mathbf{C}_{ZZ}$.

Prewhitening: If $\mathbf{C}_{ZZ}$ is decomposed, e.g. by the Cholesky decomposition $\mathbf{C}_{ZZ} = \mathbf{C}\mathbf{C}^T$, and

$$\mathbf{Y} = \mathbf{C}^{-1}\mathbf{X}, \quad \mathbf{K} = \mathbf{C}^{-1}\mathbf{H}, \quad \mathbf{W} = \mathbf{C}^{-1}\mathbf{Z}$$

are introduced, the linear model can be reformulated to

$$\mathbf{Y} = \mathbf{K}\boldsymbol\theta + \mathbf{W}$$

with

$$\mathrm{E}(\mathbf{Y}) = \mathbf{K}\boldsymbol\theta = \mathbf{C}^{-1}\mathbf{H}\boldsymbol\theta \quad\text{and}\quad \mathrm{Cov}(\mathbf{Y}) = \mathrm{Cov}(\mathbf{W}) = \sigma_Z^2\,\mathbf{I}.$$
Hence, the weighted least squares criterion for the generalized linear model can be derived as follows:

$$q(\boldsymbol\theta) = (\mathbf{y}-\mathbf{K}\boldsymbol\theta)^T(\mathbf{y}-\mathbf{K}\boldsymbol\theta) = \big(\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta)\big)^T\big(\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta)\big) = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T(\mathbf{C}^{-1})^T\mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta) = (\mathbf{x}-\mathbf{H}\boldsymbol\theta)^T\mathbf{C}_{ZZ}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol\theta).$$

The normal equation system

$$\mathbf{K}^T\mathbf{K}\,\boldsymbol\theta = \mathbf{K}^T\mathbf{y} \quad\text{resp.}\quad \mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\,\boldsymbol\theta = \mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{x}$$

and its solution

$$\hat{\boldsymbol\theta} = (\mathbf{K}^T\mathbf{K})^{-}\mathbf{K}^T\mathbf{y} = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{x}$$

are obtained analogously to the white noise case.
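A sketch of the prewhitening route in code (the AR(1)-like noise covariance and all data are invented): decompose $\mathbf{C}_{ZZ} = \mathbf{C}\mathbf{C}^T$ with `numpy.linalg.cholesky`, whiten, and solve the ordinary normal equations, which is equivalent to the weighted solution above.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 2
H = rng.normal(size=(n, p))
theta = np.array([1.5, -0.7])

# correlated noise covariance, assumed known (AR(1)-like structure)
i = np.arange(n)
C_zz = 0.8 ** np.abs(i[:, None] - i[None, :])
C = np.linalg.cholesky(C_zz)              # C_zz = C C^T
x = H @ theta + C @ rng.normal(size=n)    # sample with correlated noise

# prewhitening: y = C^{-1} x, K = C^{-1} H, then ordinary least squares
y = np.linalg.solve(C, x)
K = np.linalg.solve(C, H)
theta_gls = np.linalg.solve(K.T @ K, K.T @ y)
print(theta_gls)   # close to theta
```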
Properties of the generalized least squares estimator: The mean vector is given by

$$\mathrm{E}(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathrm{E}(\mathbf{X}) = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\boldsymbol\theta = \begin{cases}\boldsymbol\theta & \mathrm{rank}(\mathbf{H}) = p\\ \mathbf{U}_1\mathbf{U}_1^T\boldsymbol\theta = (\mathbf{I}-\mathbf{U}_2\mathbf{U}_2^T)\boldsymbol\theta & \mathrm{rank}(\mathbf{H}) = M < p\end{cases},$$

where the matrices $\mathbf{U}_1$ and $\mathbf{U}_2$ can be derived from the singular value decomposition

$$\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H} = \mathbf{U}\,\mathrm{diag}(\lambda_1,\dots,\lambda_M,0,\dots,0)\,\mathbf{U}^T$$

with

$$\mathbf{U} = (\mathbf{U}_1\ \mathbf{U}_2), \quad \mathbf{U}_1 = (\mathbf{u}_1,\dots,\mathbf{u}_M), \quad \mathbf{U}_2 = (\mathbf{u}_{M+1},\dots,\mathbf{u}_p).$$
The covariance matrix of the generalized least squares estimator

$$\mathbf{C}_{\hat\Theta\hat\Theta} = \mathrm{Cov}(\hat{\boldsymbol\Theta}) = \mathrm{E}\Big(\big(\hat{\boldsymbol\Theta}-\mathrm{E}(\hat{\boldsymbol\Theta})\big)\big(\hat{\boldsymbol\Theta}-\mathrm{E}(\hat{\boldsymbol\Theta})\big)^T\Big)$$

results after exploiting

$$\mathrm{E}(\hat{\boldsymbol\Theta}) = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\boldsymbol\theta$$

in the expression

$$\begin{aligned}\mathbf{C}_{\hat\Theta\hat\Theta} &= \mathrm{E}\big((\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}(\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\big)\\ &= (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\,\mathrm{E}\big((\mathbf{X}-\mathbf{H}\boldsymbol\theta)(\mathbf{X}-\mathbf{H}\boldsymbol\theta)^T\big)\,\mathbf{C}_{ZZ}^{-1}\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\\ &= (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}(\sigma_Z^2\mathbf{C}_{ZZ})\mathbf{C}_{ZZ}^{-1}\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-} = \sigma_Z^2\,(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-}.\end{aligned}$$
Theorem (Gauß-Markov): Given the model

$$\mathbf{X} = \mathbf{H}\boldsymbol\theta + \mathbf{Z} \quad\text{with}\quad X_i = h_{i1}\theta_1 + \dots + h_{ip}\theta_p + Z_i,\ i = 1,\dots,n,$$

where

$$\mathrm{E}(\mathbf{X}) = \mathbf{H}\boldsymbol\theta, \quad \mathrm{Cov}(\mathbf{X}) = \mathrm{Cov}(\mathbf{Z}) = \sigma_Z^2\,\mathbf{C}_{ZZ}$$

and $\mathrm{rank}(\mathbf{H}) = p$. Then the best linear unbiased estimator (BLUE) is the least squares estimator

$$\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{X}.$$
Hence, the minimum of the weighted sum of squares is

$$\begin{aligned} q(\hat{\boldsymbol\theta}) &= \mathrm{tr}(\mathbf{P}^{\perp}\mathbf{y}\mathbf{y}^T) = \mathrm{tr}\Big(\big(\mathbf{I}-\mathbf{K}(\mathbf{K}^T\mathbf{K})^{-1}\mathbf{K}^T\big)\mathbf{y}\mathbf{y}^T\Big)\\ &= \mathrm{tr}\Big(\big(\mathbf{I}-\mathbf{C}^{-1}\mathbf{H}\big(\mathbf{H}^T(\mathbf{C}^{-1})^T\mathbf{C}^{-1}\mathbf{H}\big)^{-1}\mathbf{H}^T(\mathbf{C}^{-1})^T\big)\,\mathbf{C}^{-1}\mathbf{x}\mathbf{x}^T(\mathbf{C}^{-1})^T\Big)\\ &= \mathrm{tr}\Big(\mathbf{C}_{ZZ}^{-1}\big(\mathbf{I}-\mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\big)\mathbf{x}\mathbf{x}^T\Big)\\ &= \mathrm{tr}\Big(\mathbf{C}_{ZZ}^{-1}\big(\mathbf{I}-\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}\big)\mathbf{x}\mathbf{x}^T\Big) = \mathrm{tr}\Big(\mathbf{C}_{ZZ}^{-1}\,\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\mathbf{x}\mathbf{x}^T\Big),\end{aligned}$$

where

$$\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}} = \mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1} \quad\text{and}\quad \mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp} = \mathbf{I}-\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}$$

again represent projection matrices.
Again, we are interested in estimating the variance $\sigma_Z^2$ of the noise. The expected value of the minimum of the weighted sum of squares results in

$$\begin{aligned}\mathrm{E}\big(q(\hat{\boldsymbol\Theta})\big) &= \mathrm{E}\Big(\mathrm{tr}\big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\mathbf{X}\mathbf{X}^T\big)\Big) = \mathrm{tr}\Big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\,\mathrm{E}\big((\mathbf{H}\boldsymbol\theta+\mathbf{Z})(\mathbf{H}\boldsymbol\theta+\mathbf{Z})^T\big)\Big)\\ &= \mathrm{tr}\Big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\big(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + \mathrm{E}(\mathbf{Z})\boldsymbol\theta^T\mathbf{H}^T + \mathbf{H}\boldsymbol\theta\,\mathrm{E}(\mathbf{Z}^T) + \mathrm{E}(\mathbf{Z}\mathbf{Z}^T)\big)\Big)\\ &= \mathrm{tr}\Big(\mathbf{C}_{ZZ}^{-1}\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\big(\mathbf{H}\boldsymbol\theta\boldsymbol\theta^T\mathbf{H}^T + \sigma_Z^2\mathbf{C}_{ZZ}\big)\Big) = \sigma_Z^2\,\mathrm{tr}\big(\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\big) = (n-p)\,\sigma_Z^2,\end{aligned}$$

where

$$\mathrm{tr}\big(\mathbf{P}_{\mathbf{C}_{ZZ}^{-1}}^{\perp}\big) = \mathrm{tr}\Big(\mathbf{I}_n - \mathbf{H}(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\Big) = n - \mathrm{tr}\Big((\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H}\Big) = n - \mathrm{tr}(\mathbf{I}_p) = n - p$$

has been utilized.
This result motivates

$$S^2 = q(\hat{\boldsymbol\Theta})/(n-p)$$

as an unbiased estimator for $\sigma_Z^2$.

Theorem: If in addition to the Gauß-Markov theorem $\mathbf{Z}$ obeys a normal distribution, i.e. $\mathbf{Z}\sim\mathcal{N}_n(\mathbf{0},\sigma_Z^2\mathbf{C}_{ZZ})$, the following holds:

a) $\hat{\boldsymbol\Theta} = (\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{X} \sim \mathcal{N}_p\big(\boldsymbol\theta,\,\sigma_Z^2(\mathbf{H}^T\mathbf{C}_{ZZ}^{-1}\mathbf{H})^{-1}\big)
b) $\hat{\boldsymbol\Theta}$ and $S^2$ are stochastically independent
c) $(n-p)\,S^2/\sigma_Z^2 \sim \chi^2_{n-p}$
3.4 Confidence Intervals

Now, an unknown parameter $\theta$ is considered, where the density $f_{\mathbf{X}}(\mathbf{x}\,|\,\theta)$ of $\mathbf{X}$ and an estimator $\hat\Theta = \hat\theta(\mathbf{X})$ for $\theta$ possessing the density $f_{\hat\Theta}(\hat\theta\,|\,\theta)$ are given. With the knowledge of $f_{\hat\Theta}(\hat\theta\,|\,\theta)$ and a given $\alpha$, e.g. $\alpha = 0.05$, we can derive from

$$1-\alpha = P\big(\theta - a_1 < \hat\Theta \le \theta + a_2\big) = F_{\hat\Theta}(\theta + a_2\,|\,\theta) - F_{\hat\Theta}(\theta - a_1\,|\,\theta)$$

the probability equation

$$1-\alpha = P\big(\theta - a_1 < \hat\Theta \le \theta + a_2\big) = P\big(-a_1 < \hat\Theta - \theta \le a_2\big) = P\big(-\hat\Theta - a_1 < -\theta \le a_2 - \hat\Theta\big) = P\big(\hat\Theta - a_2 \le \theta < \hat\Theta + a_1\big),$$
which states that the random interval $[\hat\Theta - a_2,\,\hat\Theta + a_1)$ covers the parameter $\theta$ with probability $1-\alpha$.

For given observations $\mathbf{x}$ the set

$$C(\mathbf{x}) = \big\{\theta : \hat\theta(\mathbf{x}) - a_2 \le \theta < \hat\theta(\mathbf{x}) + a_1\big\}$$

is called

– confidence interval for $\theta$ with the confidence coefficient $1-\alpha$,

or alternatively

– interval estimate to the confidence level $1-\alpha$.
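A minimal numerical sketch for the simplest case, a Gaussian mean with known variance and a symmetric interval ($a_1 = a_2$); this anticipates Exercise 3.4-1, and the data and SciPy usage are my own illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sigma, n, alpha = 2.0, 50, 0.05
x = rng.normal(10.0, sigma, size=n)    # invented data, true mean 10

theta_hat = x.mean()                   # estimator of the mean
a = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
print(f"{1 - alpha:.0%} confidence interval: "
      f"[{theta_hat - a:.3f}, {theta_hat + a:.3f})")
```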
In the case of an unknown parameter vector $\boldsymbol\theta$ an interval estimate can be constructed if a function $g(\mathbf{x}\,|\,\boldsymbol\theta)$ exists such that the corresponding random variable

$$Y = g(\mathbf{X}\,|\,\boldsymbol\theta)$$

obeys a known distribution which is independent of $\boldsymbol\theta$.

Frequently one can find such a function which depends on $\mathbf{x}$ only through the estimate $\hat{\boldsymbol\theta}(\mathbf{x})$, i.e.

$$g(\mathbf{x}\,|\,\boldsymbol\theta) = h\big(\hat{\boldsymbol\theta}\,|\,\boldsymbol\theta\big).$$

Examples are given in the subsequent exercises.
Exercise 3.4-1: Signal + white noise, confidence interval for the
• signal amplitude estimate (known noise variance)
• signal amplitude estimate (unknown noise variance)
• noise variance estimate

Exercise 3.4-2: Sinusoids + white noise, confidence interval for the amplitude estimates (unknown noise variance)
3.5 Cramer-Rao Lower Bound

In the following only unbiased estimators, i.e.

$$\mathrm{E}(\hat{\boldsymbol\Theta}) = \int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \boldsymbol\theta,$$

are considered, whose covariance matrix

$$\mathbf{C}_{\hat\Theta\hat\Theta} = \mathrm{Cov}(\hat{\boldsymbol\Theta}) = \mathrm{E}\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big)$$

exists. The Cramer-Rao inequality now provides a lower bound on the covariance matrix of any unbiased estimator of $\boldsymbol\theta$. Under the following regularity conditions certain expected values can be determined, which are of importance for deriving the Cramer-Rao inequality.
Regularity conditions:

a) The parameter set $\Omega$ is an interval in $\mathbb{R}^p$.

b) The gradient $\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)$ exists for any $\mathbf{x}$ and $\boldsymbol\theta$.

c) The gradient of $\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}$ with respect to $\boldsymbol\theta$ can be obtained by taking the gradient under the integral sign, i.e. $\nabla_{\boldsymbol\theta}\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}$.

d) The gradient of $\mathrm{E}\big(\hat{\boldsymbol\theta}(\mathbf{X})\big)$ with respect to $\boldsymbol\theta$ can be obtained by taking the gradient under the integral sign, i.e. $\nabla_{\boldsymbol\theta}\int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\hat{\boldsymbol\theta}(\mathbf{x})\,\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}$.
Corollary: If the former regularity conditions hold, then

1) $\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big) = \mathbf{0}$

2) $\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\Big) = \mathbf{I}$

Proof of 1):

$$\begin{aligned}\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big) &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\frac{\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)}\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}\\ &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \nabla_{\boldsymbol\theta}\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \nabla_{\boldsymbol\theta}1 = \mathbf{0}\end{aligned}$$
Proof of 2):

$$\begin{aligned}\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\Big) &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)\,\big(\hat{\boldsymbol\theta}(\mathbf{x})-\boldsymbol\theta\big)^T f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}\\ &= \int_{\mathbb{R}^n}\frac{\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)}\,\big(\hat{\boldsymbol\theta}(\mathbf{x})-\boldsymbol\theta\big)^T f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,\big(\hat{\boldsymbol\theta}(\mathbf{x})-\boldsymbol\theta\big)^T d\mathbf{x}\\ &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,\hat{\boldsymbol\theta}(\mathbf{x})^T d\mathbf{x} - \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}\;\boldsymbol\theta^T\\ &= \nabla_{\boldsymbol\theta}\boldsymbol\theta^T - (\nabla_{\boldsymbol\theta}1)\,\boldsymbol\theta^T = \mathbf{I}\end{aligned}$$
Definition (Fisher information matrix): The $p\times p$ Fisher information matrix is defined by

$$\mathbf{I}(\boldsymbol\theta) = \mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big) = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)\,\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}.$$

Hence, the entries of the Fisher information matrix are for $i,j = 1,\dots,p$ given by

$$I_{ij}(\boldsymbol\theta) = \mathrm{E}\bigg(\frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)}{\partial\theta_i}\cdot\frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)}{\partial\theta_j}\bigg) = \int_{\mathbb{R}^n}\frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)}{\partial\theta_i}\cdot\frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)}{\partial\theta_j}\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}.$$
Theorem (Cramer-Rao inequality): Suppose the former assumptions and regularity conditions hold and the Fisher information matrix $\mathbf{I}(\boldsymbol\theta)$ is positive definite. Then the variance of any estimator $\hat\Theta_a = \mathbf{a}^T\hat{\boldsymbol\Theta}$ is bounded from below by

$$\mathrm{Var}(\hat\Theta_a) = \mathbf{a}^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{a} \ge \mathbf{a}^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p.$$

If in particular $\mathbf{a} = \mathbf{e}_i$, i.e. the $i$-th Cartesian unit vector, the inequality provides the lower limit

$$\mathrm{Var}(\hat\Theta_i) = \mathbf{e}_i^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{e}_i \ge \mathbf{e}_i^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{e}_i = \big(\mathbf{I}(\boldsymbol\theta)^{-1}\big)_{ii}, \quad i = 1,\dots,p,$$

for the variance of the $i$-th component of $\hat{\boldsymbol\Theta}$.
Proof: First, the random vector

$$\mathbf{Y} = \hat{\boldsymbol\Theta} - \boldsymbol\theta - \mathbf{I}(\boldsymbol\theta)^{-1}\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)$$

is introduced and its second order moment matrix

$$\begin{aligned}\mathrm{E}(\mathbf{Y}\mathbf{Y}^T) &= \mathrm{E}\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) - \mathrm{E}\Big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)\,\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big)\,\mathbf{I}(\boldsymbol\theta)^{-1}\\ &\quad - \mathbf{I}(\boldsymbol\theta)^{-1}\,\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\Big) + \mathbf{I}(\boldsymbol\theta)^{-1}\,\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big)\,\mathbf{I}(\boldsymbol\theta)^{-1}\end{aligned}$$

is determined. After exploiting the former corollary, the
second order moment matrix can be simplified to

$$\mathrm{E}(\mathbf{Y}\mathbf{Y}^T) = \mathrm{E}\big((\hat{\boldsymbol\Theta}-\boldsymbol\theta)(\hat{\boldsymbol\Theta}-\boldsymbol\theta)^T\big) - \mathbf{I}(\boldsymbol\theta)^{-1}.$$

Since

$$\mathbf{a}^T\mathrm{E}(\mathbf{Y}\mathbf{Y}^T)\,\mathbf{a} = \mathrm{E}(\mathbf{a}^T\mathbf{Y}\mathbf{Y}^T\mathbf{a}) = \mathrm{E}\big((\mathbf{a}^T\mathbf{Y})^2\big) \ge 0 \quad \forall\,\mathbf{a}\in\mathbb{R}^p,$$

the second order moment matrix is positive semidefinite. Thus, we can write

$$\mathbf{a}^T\mathrm{E}(\mathbf{Y}\mathbf{Y}^T)\,\mathbf{a} = \mathbf{a}^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{a} - \mathbf{a}^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{a} \ge 0 \quad \forall\,\mathbf{a}\in\mathbb{R}^p,$$

and the assertion

$$\mathbf{a}^T\mathbf{C}_{\hat\Theta\hat\Theta}\,\mathbf{a} \ge \mathbf{a}^T\mathbf{I}(\boldsymbol\theta)^{-1}\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p$$

of the theorem follows.
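A standard worked example (not from the slides): for $n$ i.i.d. observations $X_k\sim\mathcal{N}(\theta,\sigma^2)$ with known $\sigma^2$,

$$\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\theta)\big) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\theta)^2, \qquad \frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\theta)\big)}{\partial\theta} = \frac{1}{\sigma^2}\sum_{k=1}^{n}(x_k-\theta),$$

so that $I(\theta) = n/\sigma^2$ and $\mathrm{Var}(\hat\Theta)\ge\sigma^2/n$; the sample mean attains this bound.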
Under the following regularity condition an alternative and often more convenient expression for calculating the information matrix can be presented.

Regularity condition:

e) The Hessian matrix of $\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}$ with respect to $\boldsymbol\theta$ can be obtained by taking all the required second partial derivatives under the integral sign, i.e.

$$\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\int_{\mathbb{R}^n} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} = \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}.$$
Corollary: If the former regularity condition holds, the Fisher information matrix can alternatively be determined by

$$\mathbf{I}(\boldsymbol\theta) = \mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big) = -\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big).$$

Consequently, the elements of the Fisher information matrix can be expressed by

$$I_{ij}(\boldsymbol\theta) = \mathrm{E}\bigg(\frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)}{\partial\theta_i}\cdot\frac{\partial\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)}{\partial\theta_j}\bigg) = -\mathrm{E}\bigg(\frac{\partial^2\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)}{\partial\theta_i\,\partial\theta_j}\bigg), \quad i,j = 1,\dots,p.$$
Proof:

$$\begin{aligned}\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big) &= \mathrm{E}\bigg(\nabla_{\boldsymbol\theta}\frac{\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)}\bigg)\\ &= \mathrm{E}\bigg(\frac{\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\cdot f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta) - \nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\cdot\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)}{f_{\mathbf{X}}^2(\mathbf{X}\,|\,\boldsymbol\theta)}\bigg)\\ &= \mathrm{E}\bigg(\frac{\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)}\bigg) - \mathrm{E}\bigg(\frac{\nabla_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\cdot\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)}{f_{\mathbf{X}}^2(\mathbf{X}\,|\,\boldsymbol\theta)}\bigg)\\ &= \int_{\mathbb{R}^n}\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x} - \mathrm{E}\Big(\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\,\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big)\\ &= \nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T 1 - \mathbf{I}(\boldsymbol\theta) = -\mathbf{I}(\boldsymbol\theta)\end{aligned}$$
Theorem: Let $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\xi)$ belong to the exponential family in canonical form and $\mathbf{T}$ denote the corresponding sufficient statistic.

1) The regularity conditions a) - e) are satisfied and the information matrix is positive definite:

$$\mathbf{I}(\boldsymbol\xi) = \mathrm{E}\Big(\nabla_{\boldsymbol\xi}\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\xi)\big)\,\nabla_{\boldsymbol\xi}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\xi)\big)\Big) = -\mathrm{E}\Big(\nabla_{\boldsymbol\xi}\nabla_{\boldsymbol\xi}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\xi)\big)\Big) = \nabla_{\boldsymbol\xi}\nabla_{\boldsymbol\xi}^T A(\boldsymbol\xi) = \mathrm{Cov}(\mathbf{T}) = \mathbf{C}_{TT}.$$

Thus, for unbiased estimators the Cramer-Rao inequality holds.
2) Let the vector-valued function

$$\boldsymbol\xi = \mathbf{g}(\boldsymbol\theta), \quad \boldsymbol\xi\in\mathbb{R}^k,\ \boldsymbol\theta\in\mathbb{R}^p \ \text{with}\ p \le k,$$

be one-to-one and differentiable with

$$\mathbf{G}(\boldsymbol\theta) = \nabla_{\boldsymbol\theta}\,\mathbf{g}(\boldsymbol\theta)^T = \bigg(\frac{\partial g_j(\boldsymbol\theta)}{\partial\theta_i}\bigg)_{i=1,\dots,p;\,j=1,\dots,k}.$$

Then the regularity conditions a) - e) are satisfied with respect to $\boldsymbol\theta$ and the information matrix can be expressed by

$$\mathbf{I}(\boldsymbol\theta) = -\mathrm{E}\Big(\nabla_{\boldsymbol\theta}\nabla_{\boldsymbol\theta}^T\ln\big(f_{\mathbf{X}}(\mathbf{X}\,|\,\boldsymbol\theta)\big)\Big) = \mathbf{G}(\boldsymbol\theta)\,\mathbf{I}(\boldsymbol\xi)\big|_{\boldsymbol\xi=\mathbf{g}(\boldsymbol\theta)}\,\mathbf{G}(\boldsymbol\theta)^T = \mathbf{G}(\boldsymbol\theta)\,\big(\nabla_{\boldsymbol\xi}\nabla_{\boldsymbol\xi}^T A(\boldsymbol\xi)\big)\big|_{\boldsymbol\xi=\mathbf{g}(\boldsymbol\theta)}\,\mathbf{G}(\boldsymbol\theta)^T.$$
The question immediately arises whether there exists an unbiased estimator for which the information inequality takes the equality sign. An estimator whose covariance matrix coincides with the inverse Fisher information matrix (the Cramer-Rao lower bound) is called efficient.

Efficient estimators can be achieved only in special cases, since the random vector $\mathbf{Y}$, introduced for proving the information inequality, must be the zero vector with probability 1.
Exercise 3.5-1: (Non-)efficiency of linear least squares estimators when the noise is i.i.d. and normally distributed
The previous exercise showed that the deviation of the covariance matrix from the inverse Fisher information matrix can be neglected for large samples. This motivates the definition of the limit

$$\boldsymbol\Gamma(\boldsymbol\theta) = \lim_{n\to\infty}\frac{1}{n}\,\mathbf{I}_n(\boldsymbol\theta),$$

where $n$ indicates the growing sample size. If the matrix $\boldsymbol\Gamma(\boldsymbol\theta)$ exists and is not singular, then unbiased and consistent estimators $\hat{\boldsymbol\Theta}_n$ are of interest that satisfy the limit

$$\lim_{n\to\infty} n\,\mathbf{C}_{\hat\Theta_n\hat\Theta_n} = \boldsymbol\Gamma(\boldsymbol\theta)^{-1}.$$

Simultaneously, such estimators are then even consistent in the mean square sense.
Estimators used in practice are often neither unbiased nor mean square consistent. In these cases the limiting distributions of such estimators are considered. If the limiting distributions have the property

$$\sqrt{n}\,(\hat{\boldsymbol\Theta}_n - \boldsymbol\theta) \xrightarrow[n\to\infty]{} \mathcal{N}_p\big(\mathbf{0},\mathbf{D}(\boldsymbol\theta)\big) \quad\text{and}\quad \mathbf{D}(\boldsymbol\theta) = \boldsymbol\Gamma(\boldsymbol\theta)^{-1},$$

the estimators are said to be asymptotically efficient. Furthermore, under certain regularity conditions (similar to the former) one can show that in the class of asymptotically normally distributed estimators the following lower bound holds:

$$\mathbf{a}^T\mathbf{D}(\boldsymbol\theta)\,\mathbf{a} \ge \mathbf{a}^T\boldsymbol\Gamma(\boldsymbol\theta)^{-1}\mathbf{a} \quad \forall\,\mathbf{a}\in\mathbb{R}^p.$$
Exercise 3.5-2: Asymptotic efficiency of linear least squares estimators when the noise is i.i.d. and normally distributed
3.6 Maximum Likelihood Estimation

The maximum likelihood method is one of the most important procedures for the construction of estimators. For given observations $\mathbf{x}$ the procedure selects as maximum likelihood estimate $\hat{\boldsymbol\theta}(\mathbf{x})$ that $\boldsymbol\theta$ for which the density function $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ takes its maximum. Thus, the maximum likelihood estimate can be interpreted heuristically as that parameter vector that makes the occurrence of the observed data most likely. Instead of determining the maximum of $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$, one can, due to the monotonicity of the logarithmic function, alternatively maximize $\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)$.
Definition: Suppose $\mathbf{X}$ possesses the density $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ with $\boldsymbol\theta\in\Omega$ and the vector of observations $\mathbf{x} = (x_1,\dots,x_n)^T$ is given. Then $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ is called the likelihood function and

$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta\in\Omega} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta) = \arg\max_{\boldsymbol\theta\in\Omega}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)$$

is called the maximum likelihood estimate (MLE) for $\boldsymbol\theta$.

If the gradient of $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ with respect to $\boldsymbol\theta$ exists and if $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ is positive for the given $\mathbf{x}$, one can try to find the MLE by solving the likelihood equation system

$$\nabla_{\boldsymbol\theta}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\big)\Big|_{\boldsymbol\theta=\hat{\boldsymbol\theta}(\mathbf{x})} = \mathbf{0}.$$
Remarks:
• Generally the likelihood equation system is nonlinear.
• The likelihood equation system can possess several solutions.
• The solutions of the likelihood equation system do not necessarily correspond to relative maxima.
• If the absolute maximum of the likelihood function lies on the boundary of $\Omega$, it cannot be described by a solution of the likelihood equation system.
• Consequently, one generally has to employ sophisticated numerical optimization techniques for determining maximum likelihood estimates, as sketched below.
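A sketch of such a numerical maximization (exponential distribution with invented data; the closed form $\hat\lambda = 1/\bar x$ serves as a cross-check, and SciPy is assumed to be available):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.exponential(scale=1 / 3.0, size=200)   # true rate lambda = 3

def neg_log_lik(lam):
    # -ln f(x | lambda) for i.i.d. Exponential(lambda) observations
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, 1 / x.mean())   # numerical MLE vs. closed form 1/x_bar
```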
Properties of Maximum Likelihood Estimation:

1) Let $\boldsymbol\theta = \mathbf{g}(\boldsymbol\eta)$ with $\mathbf{g}:\mathbb{R}^q\to\mathbb{R}^p$, $p \le q$, be a continuous mapping of the parameter vector $\boldsymbol\eta$ and $\hat{\boldsymbol\eta}$ the MLE of $\boldsymbol\eta$. Then the MLE of $\boldsymbol\theta$ is given by $\hat{\boldsymbol\theta} = \mathbf{g}(\hat{\boldsymbol\eta})$.

2) Suppose $t(\mathbf{x})$ is a sufficient transformation for $\boldsymbol\theta$, i.e.

$$f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta) = g\big(t(\mathbf{x})\,|\,\boldsymbol\theta\big)\cdot h(\mathbf{x}).$$

Then the maximum likelihood estimate $\hat{\boldsymbol\theta}$ depends on $\mathbf{x}$ only through $t(\mathbf{x})$.

3) If $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\xi)$ belongs to the exponential family in canonical form, the related likelihood equation system

$$\nabla_{\boldsymbol\xi}\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\xi)\big) = \mathbf{t}(\mathbf{x}) - \nabla_{\boldsymbol\xi}A(\boldsymbol\xi) = \mathbf{0}$$

possesses a unique solution.
Exercise 3.6-1: Maximum likelihood estimation for linear models when the noise is i.i.d. and normally distributed

Exercise 3.6-2: Sinusoids + white noise, maximum likelihood estimation of amplitude, phase and frequency when the noise is i.i.d. and normally distributed

Exercise 3.6-3: Example of inconsistent maximum likelihood estimation
3.7 Bayesian Estimation

In the Bayesian theory of parameter estimation the unknown parameter vector $\boldsymbol\theta$ is treated itself as a realization of a random experiment that obeys its own so-called prior distribution $f_{\boldsymbol\Theta}(\boldsymbol\theta)$. The objective is to use $f_{\boldsymbol\Theta}(\boldsymbol\theta)$ together with the measurements $\mathbf{x}$ drawn from the distribution $f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)$ to turn the prior distribution into a posterior distribution

$$f_{\boldsymbol\Theta}(\boldsymbol\theta\,|\,\mathbf{x}) = \frac{f_{\mathbf{X}\boldsymbol\Theta}(\mathbf{x},\boldsymbol\theta)}{f_{\mathbf{X}}(\mathbf{x})} = \frac{f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,f_{\boldsymbol\Theta}(\boldsymbol\theta)}{\int_{\mathbb{R}^p} f_{\mathbf{X}\boldsymbol\Theta}(\mathbf{x},\boldsymbol\theta)\,d\boldsymbol\theta} = \frac{f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,f_{\boldsymbol\Theta}(\boldsymbol\theta)}{\int_{\mathbb{R}^p} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,f_{\boldsymbol\Theta}(\boldsymbol\theta)\,d\boldsymbol\theta}$$

that is used for the statistical inference.
Loss function: The quality of the estimate $\hat{\boldsymbol\theta}(\mathbf{x})$ is measured by a real-valued loss function $L(\boldsymbol\theta,\hat{\boldsymbol\theta}(\mathbf{x}))$, e.g. by the squared Euclidean distance between $\boldsymbol\theta$ and $\hat{\boldsymbol\theta}(\mathbf{x})$, i.e.

$$L\big(\boldsymbol\theta,\hat{\boldsymbol\theta}(\mathbf{x})\big) = \big(\boldsymbol\theta-\hat{\boldsymbol\theta}(\mathbf{x})\big)^T\big(\boldsymbol\theta-\hat{\boldsymbol\theta}(\mathbf{x})\big).$$

Risk or loss average: The risk $R_L(\hat{\boldsymbol\theta}\,|\,\boldsymbol\theta)$ is the loss averaged over the distribution of the measurements for any fixed parameter vector $\boldsymbol\theta\in\Omega$, i.e.

$$R_L(\hat{\boldsymbol\theta}\,|\,\boldsymbol\theta) = \mathrm{E}_{\mathbf{X}|\boldsymbol\theta}\Big(L\big(\boldsymbol\theta,\hat{\boldsymbol\theta}(\mathbf{X})\big)\Big) = \int_{\mathbb{R}^n} L\big(\boldsymbol\theta,\hat{\boldsymbol\theta}(\mathbf{x})\big)\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,d\mathbf{x}.$$
Bayes risk: The Bayes risk $R_L(\hat{\boldsymbol\theta})$ is the risk averaged over the prior distribution $f_{\boldsymbol\Theta}(\boldsymbol\theta)$, i.e.

$$\begin{aligned} R_L(\hat{\boldsymbol\theta}) &= \mathrm{E}_{\boldsymbol\Theta}\big(R_L(\hat{\boldsymbol\theta}\,|\,\boldsymbol\Theta)\big) = \mathrm{E}_{\boldsymbol\Theta}\Big(\mathrm{E}_{\mathbf{X}|\boldsymbol\theta}\big(L(\boldsymbol\Theta,\hat{\boldsymbol\theta}(\mathbf{X}))\big)\Big)\\ &= \int_{\mathbb{R}^p}\int_{\mathbb{R}^n} L\big(\boldsymbol\theta,\hat{\boldsymbol\theta}(\mathbf{x})\big)\,f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,f_{\boldsymbol\Theta}(\boldsymbol\theta)\,d\mathbf{x}\,d\boldsymbol\theta = \int_{\mathbb{R}^{n+p}} L\big(\boldsymbol\theta,\hat{\boldsymbol\theta}(\mathbf{x})\big)\,f_{\mathbf{X}\boldsymbol\Theta}(\mathbf{x},\boldsymbol\theta)\,d\mathbf{x}\,d\boldsymbol\theta = \mathrm{E}_{\mathbf{X}\boldsymbol\Theta}\big(L(\boldsymbol\Theta,\hat{\boldsymbol\theta}(\mathbf{X}))\big).\end{aligned}$$

The Bayes risk measures the average risk one incurs if the estimator $\hat{\boldsymbol\Theta} = \hat{\boldsymbol\theta}(\mathbf{X})$ is used for the random experiment at hand. Hence, $R_L(\hat{\boldsymbol\theta})$ depends only on the estimating function $\hat{\boldsymbol\theta}(\mathbf{x})$.
Bayes estimator: Minimization of the Bayes risk $R_L(\hat{\boldsymbol\theta})$ over the class of all estimating functions $\hat{\boldsymbol\theta}(\mathbf{x})$ for which $R_L(\hat{\boldsymbol\theta})$ exists provides the Bayes estimating function

$$\hat{\boldsymbol\theta}_L(\mathbf{x}) = \arg\min_{\hat{\boldsymbol\theta}(\mathbf{x})} R_L(\hat{\boldsymbol\theta}) = \arg\min_{\hat{\boldsymbol\theta}(\mathbf{x})}\mathrm{E}_{\mathbf{X}\boldsymbol\Theta}\big(L(\boldsymbol\Theta,\hat{\boldsymbol\theta}(\mathbf{X}))\big)$$

and therefore the Bayes estimator $\hat{\boldsymbol\Theta}_L = \hat{\boldsymbol\theta}_L(\mathbf{X})$.

In the following, Bayes estimators are considered in more detail for the square error, the absolute error and uniform error loss functions.
3.7.1 Minimum Mean Square Error Estimation

Non-Linear Minimum Mean Square Error Estimation: To estimate the random parameter vector $\boldsymbol\Theta$ by means of the realization of the random vector $\mathbf{X}$ in the minimum mean square error sense we have to find an estimating function $\hat{\boldsymbol\theta}(\mathbf{x})$ such that

$$R_{MSE}(\hat{\boldsymbol\theta}) = \mathrm{E}\Big(\big(\boldsymbol\Theta-\hat{\boldsymbol\theta}(\mathbf{X})\big)^T\big(\boldsymbol\Theta-\hat{\boldsymbol\theta}(\mathbf{X})\big)\Big) = \int_{\mathbb{R}^{n+p}}\big(\boldsymbol\theta-\hat{\boldsymbol\theta}(\mathbf{x})\big)^T\big(\boldsymbol\theta-\hat{\boldsymbol\theta}(\mathbf{x})\big)\,f_{\mathbf{X}\boldsymbol\Theta}(\mathbf{x},\boldsymbol\theta)\,d\mathbf{x}\,d\boldsymbol\theta$$

is minimum. Over the class of all functions for which the expected value exists, $R_{MSE}(\hat{\boldsymbol\theta})$ is minimized by
$$\hat{\boldsymbol\theta}_{MMSE}(\mathbf{x}) = \mathrm{E}\big(\boldsymbol\theta\,|\,\mathbf{X}=\mathbf{x}\big),$$

cf. Exercise 1.9-18.

Linear Minimum Mean Square Error Estimation: Now, we wish to estimate the random parameter vector $\boldsymbol\Theta$ by the linear estimating function

$$\hat{\boldsymbol\theta}(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{b}$$

such that the mean square error

$$R_{LMSE}(\mathbf{A},\mathbf{b}) = \mathrm{E}\Big(\big(\boldsymbol\Theta-(\mathbf{A}\mathbf{X}+\mathbf{b})\big)^T\big(\boldsymbol\Theta-(\mathbf{A}\mathbf{X}+\mathbf{b})\big)\Big) = \int_{\mathbb{R}^{n+p}}\big(\boldsymbol\theta-(\mathbf{A}\mathbf{x}+\mathbf{b})\big)^T\big(\boldsymbol\theta-(\mathbf{A}\mathbf{x}+\mathbf{b})\big)\,f_{\mathbf{X}\boldsymbol\Theta}(\mathbf{x},\boldsymbol\theta)\,d\mathbf{x}\,d\boldsymbol\theta$$
is minimized by varying the parameter matrix $\mathbf{A}$ and vector $\mathbf{b}$. Thus, we have to solve the minimization problem

$$(\hat{\mathbf{A}},\hat{\mathbf{b}}) = \arg\min_{\mathbf{A},\mathbf{b}} R_{LMSE}(\mathbf{A},\mathbf{b}).$$

The mean square error is minimum if

$$\frac{\partial}{\partial\mathbf{A}} R_{LMSE}(\mathbf{A},\mathbf{b}) = -2\,\mathrm{E}\Big(\big(\boldsymbol\Theta-(\mathbf{A}\mathbf{X}+\mathbf{b})\big)\mathbf{X}^T\Big) = \mathbf{0}, \qquad \nabla_{\mathbf{b}} R_{LMSE}(\mathbf{A},\mathbf{b}) = -2\,\mathrm{E}\big(\boldsymbol\Theta-(\mathbf{A}\mathbf{X}+\mathbf{b})\big) = \mathbf{0}$$

are satisfied. Applying the expectation operator we obtain

$$\mathbf{R}_{\Theta X} - \mathbf{A}\mathbf{R}_{XX} - \mathbf{b}\boldsymbol\mu_X^T = \mathbf{0} \quad\text{and}\quad \boldsymbol\mu_\Theta - \mathbf{A}\boldsymbol\mu_X - \mathbf{b} = \mathbf{0},$$

where

$$\mathbf{R}_{XX} = \mathrm{E}(\mathbf{X}\mathbf{X}^T), \quad \mathbf{R}_{\Theta X} = \mathrm{E}(\boldsymbol\Theta\mathbf{X}^T), \quad \boldsymbol\mu_X = \mathrm{E}(\mathbf{X}) \quad\text{and}\quad \boldsymbol\mu_\Theta = \mathrm{E}(\boldsymbol\Theta).$$
Resolving for $\hat{\mathbf{A}}$ and $\hat{\mathbf{b}}$ leads after some manipulations to

$$\hat{\mathbf{A}} = \big(\mathbf{R}_{\Theta X}-\boldsymbol\mu_\Theta\boldsymbol\mu_X^T\big)\big(\mathbf{R}_{XX}-\boldsymbol\mu_X\boldsymbol\mu_X^T\big)^{-1}, \qquad \hat{\mathbf{b}} = \boldsymbol\mu_\Theta - \big(\mathbf{R}_{\Theta X}-\boldsymbol\mu_\Theta\boldsymbol\mu_X^T\big)\big(\mathbf{R}_{XX}-\boldsymbol\mu_X\boldsymbol\mu_X^T\big)^{-1}\boldsymbol\mu_X$$

and therefore to the estimating function

$$\hat{\boldsymbol\theta}_{LMMSE}(\mathbf{x}) = \hat{\mathbf{A}}\mathbf{x} + \hat{\mathbf{b}} = \boldsymbol\mu_\Theta + \big(\mathbf{R}_{\Theta X}-\boldsymbol\mu_\Theta\boldsymbol\mu_X^T\big)\big(\mathbf{R}_{XX}-\boldsymbol\mu_X\boldsymbol\mu_X^T\big)^{-1}\big(\mathbf{x}-\boldsymbol\mu_X\big).$$

After exploiting the relationships between the covariance and the first and second order moments,

$$\mathbf{C}_{XX} = \mathbf{R}_{XX}-\boldsymbol\mu_X\boldsymbol\mu_X^T \quad\text{and}\quad \mathbf{C}_{\Theta X} = \mathbf{R}_{\Theta X}-\boldsymbol\mu_\Theta\boldsymbol\mu_X^T,$$

the estimating function can also be expressed by

$$\hat{\boldsymbol\theta}_{LMMSE}(\mathbf{x}) = \boldsymbol\mu_\Theta + \mathbf{C}_{\Theta X}\mathbf{C}_{XX}^{-1}\big(\mathbf{x}-\boldsymbol\mu_X\big).$$
Theorem: Let $\mathbf{Y}$ be a random vector composed of the random vectors $\mathbf{X}$ and $\boldsymbol\Theta$ and obeying the distribution

$$\mathbf{Y} = \begin{pmatrix}\mathbf{X}\\ \boldsymbol\Theta\end{pmatrix} \sim \mathcal{N}_{n+p}\Bigg(\begin{pmatrix}\boldsymbol\mu_X\\ \boldsymbol\mu_\Theta\end{pmatrix},\ \begin{pmatrix}\boldsymbol\Sigma_{XX} & \boldsymbol\Sigma_{X\Theta}\\ \boldsymbol\Sigma_{\Theta X} & \boldsymbol\Sigma_{\Theta\Theta}\end{pmatrix}\Bigg).$$

Then $\mathbf{X}$ and $\boldsymbol\Theta$ possess the marginal distributions

$$\mathbf{X}\sim\mathcal{N}_n(\boldsymbol\mu_X,\boldsymbol\Sigma_{XX}), \qquad \boldsymbol\Theta\sim\mathcal{N}_p(\boldsymbol\mu_\Theta,\boldsymbol\Sigma_{\Theta\Theta})$$

and the conditional distribution

$$\boldsymbol\Theta\,|\,\mathbf{X}=\mathbf{x} \sim \mathcal{N}_p\big(\boldsymbol\mu_\Theta + \boldsymbol\Sigma_{\Theta X}\boldsymbol\Sigma_{XX}^{-1}(\mathbf{x}-\boldsymbol\mu_X),\ \boldsymbol\Sigma_{\Theta\Theta} - \boldsymbol\Sigma_{\Theta X}\boldsymbol\Sigma_{XX}^{-1}\boldsymbol\Sigma_{X\Theta}\big),$$

cf. Exercise 1.10-7.
Theorem: Given the linear model

$$\mathbf{X} = \mathbf{H}\boldsymbol\Theta + \mathbf{Z},$$

where $\mathbf{H}$ is a known matrix, and $\boldsymbol\Theta$ and $\mathbf{Z}$ are statistically independent random vectors with $\boldsymbol\Theta\sim\mathcal{N}_p(\boldsymbol\mu_\Theta,\boldsymbol\Sigma_{\Theta\Theta})$ and $\mathbf{Z}\sim\mathcal{N}_n(\mathbf{0},\boldsymbol\Sigma_{ZZ})$. Then

$$\mathbf{Y} = \begin{pmatrix}\mathbf{X}\\ \boldsymbol\Theta\end{pmatrix} \sim \mathcal{N}_{n+p}\Bigg(\begin{pmatrix}\mathbf{H}\boldsymbol\mu_\Theta\\ \boldsymbol\mu_\Theta\end{pmatrix},\ \begin{pmatrix}\mathbf{H}\boldsymbol\Sigma_{\Theta\Theta}\mathbf{H}^T + \boldsymbol\Sigma_{ZZ} & \mathbf{H}\boldsymbol\Sigma_{\Theta\Theta}\\ \boldsymbol\Sigma_{\Theta\Theta}\mathbf{H}^T & \boldsymbol\Sigma_{\Theta\Theta}\end{pmatrix}\Bigg)$$

and the MMSE estimate can be expressed by

$$\hat{\boldsymbol\theta}_{MMSE}(\mathbf{x}) = \boldsymbol\mu_\Theta + \boldsymbol\Sigma_{\Theta\Theta}\mathbf{H}^T\big(\mathbf{H}\boldsymbol\Sigma_{\Theta\Theta}\mathbf{H}^T + \boldsymbol\Sigma_{ZZ}\big)^{-1}\big(\mathbf{x}-\mathbf{H}\boldsymbol\mu_\Theta\big).$$
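A numerical sketch of this posterior-mean formula (dimensions, prior and noise covariances are invented):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 20, 3
H = rng.normal(size=(n, p))
mu_theta = np.zeros(p)
Sigma_tt = np.eye(p)           # prior covariance of Theta
Sigma_zz = 0.1 * np.eye(n)     # noise covariance

theta = rng.multivariate_normal(mu_theta, Sigma_tt)
x = H @ theta + rng.multivariate_normal(np.zeros(n), Sigma_zz)

# theta_mmse = mu + Sigma_tt H^T (H Sigma_tt H^T + Sigma_zz)^{-1} (x - H mu)
S = H @ Sigma_tt @ H.T + Sigma_zz
theta_mmse = mu_theta + Sigma_tt @ H.T @ np.linalg.solve(S, x - H @ mu_theta)
print(theta_mmse)   # posterior mean, close to the drawn theta
print(theta)
```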
Exercise 3.7-1: Proof of the Theorem
3.7.2 Minimum Mean Absolute Error Estimation

The mean absolute error estimation is considered for the single parameter case only. To estimate the random parameter $\theta$ by means of the realization of the random vector $\mathbf{X}$ in the minimum mean absolute error sense,

$$R_{MAE}(\hat\theta) = \mathrm{E}\big(|\Theta-\hat\theta(\mathbf{X})|\big) = \int_{\mathbb{R}^{n+1}}|\theta-\hat\theta(\mathbf{x})|\,f_{\mathbf{X}\Theta}(\mathbf{x},\theta)\,d\mathbf{x}\,d\theta$$

has to be minimized. As shown in the subsequent exercise, the minimization of $R_{MAE}(\hat\theta)$ leads to

$$\int_{-\infty}^{\hat\theta_{MMAE}(\mathbf{x})} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta = \int_{\hat\theta_{MMAE}(\mathbf{x})}^{\infty} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta.$$

Hence, $\hat\theta_{MMAE}(\mathbf{x})$ is the median of the posterior density function $f_\Theta(\theta\,|\,\mathbf{x})$.
Exercise 3.7-2: Proof of median
3.7.3 Maximum A Posteriori Estimation

Another estimation technique that fits within the Bayesian framework is maximum a posteriori (MAP) estimation.

Single parameter case: To motivate the approach, the uniform loss function

$$L\big(\theta,\hat\theta(\mathbf{x})\big) = \begin{cases}0 & |\theta-\hat\theta(\mathbf{x})| \le \delta\\ 1 & |\theta-\hat\theta(\mathbf{x})| > \delta\end{cases} \qquad\text{where } \delta > 0,$$

is introduced. The Bayes risk is then given by

$$R_L(\hat\theta) = \mathrm{E}\big(L(\Theta,\hat\theta(\mathbf{X}))\big) = \int_{\mathbb{R}^{n+1}} L\big(\theta,\hat\theta(\mathbf{x})\big)\,f_{\mathbf{X}\Theta}(\mathbf{x},\theta)\,d\mathbf{x}\,d\theta = \int_{\mathbb{R}^n}\bigg(\int_{-\infty}^{\infty} L\big(\theta,\hat\theta(\mathbf{x})\big)\,f_\Theta(\theta\,|\,\mathbf{x})\,d\theta\bigg)\,f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}.$$
Since $f_{\mathbf{X}}(\mathbf{x})$ and the integral in the brackets are always positive, it is sufficient to minimize

$$\int_{-\infty}^{\infty} L\big(\theta,\hat\theta(\mathbf{x})\big)\,f_\Theta(\theta\,|\,\mathbf{x})\,d\theta = \int_{-\infty}^{\hat\theta(\mathbf{x})-\delta} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta + \int_{\hat\theta(\mathbf{x})+\delta}^{\infty} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta$$

or equivalently to maximize

$$1 - \bigg(\int_{-\infty}^{\hat\theta(\mathbf{x})-\delta} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta + \int_{\hat\theta(\mathbf{x})+\delta}^{\infty} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta\bigg) = \int_{\hat\theta(\mathbf{x})-\delta}^{\hat\theta(\mathbf{x})+\delta} f_\Theta(\theta\,|\,\mathbf{x})\,d\theta$$

for every $\mathbf{x}$. For $\delta$ arbitrarily small, $\hat\theta(\mathbf{x})$ determines the mode of $f_\Theta(\theta\,|\,\mathbf{x})$ and is termed the maximum a posteriori (MAP) estimate, i.e.

$$\hat\theta_{MAP}(\mathbf{x}) = \arg\max_\theta f_\Theta(\theta\,|\,\mathbf{x}) = \arg\max_\theta f_{\mathbf{X}}(\mathbf{x}\,|\,\theta)\,f_\Theta(\theta) = \arg\max_\theta\Big(\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\theta)\big) + \ln\big(f_\Theta(\theta)\big)\Big).$$
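A standard single-parameter illustration (well known, not from the slides): for $X_k\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\theta,\sigma^2)$ with prior $\theta\sim\mathcal{N}(\mu_0,\sigma_0^2)$, maximizing $\ln\big(f_{\mathbf{X}}(\mathbf{x}\,|\,\theta)\big) + \ln\big(f_\Theta(\theta)\big)$ yields

$$\hat\theta_{MAP}(\mathbf{x}) = \frac{\sigma_0^2\sum_{k=1}^{n} x_k + \sigma^2\mu_0}{n\sigma_0^2 + \sigma^2},$$

which here coincides with the posterior mean, since the Gaussian posterior is symmetric and unimodal.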
Multiple parameter case: Employing the marginal posterior density functions

$$f_{\Theta_i}(\theta_i\,|\,\mathbf{x}) = \int_{\mathbb{R}^{p-1}} f_{\boldsymbol\Theta}(\boldsymbol\theta\,|\,\mathbf{x})\,d\theta_1\cdots d\theta_{i-1}\,d\theta_{i+1}\cdots d\theta_p,$$

the MAP estimates are given by

$$\hat\theta_{i,MAP}(\mathbf{x}) = \arg\max_{\theta_i} f_{\Theta_i}(\theta_i\,|\,\mathbf{x}), \quad i = 1,\dots,p.$$

However, due to the required integration we might propose the alternative vector-valued MAP estimate

$$\hat{\boldsymbol\theta}_{MAP}(\mathbf{x}) = \arg\max_{\boldsymbol\theta} f_{\boldsymbol\Theta}(\boldsymbol\theta\,|\,\mathbf{x}) = \arg\max_{\boldsymbol\theta} f_{\mathbf{X}}(\mathbf{x}\,|\,\boldsymbol\theta)\,f_{\boldsymbol\Theta}(\boldsymbol\theta).$$

Remark: Generally, the two MAP estimates are not equivalent.
Exercise 3.7-3: MMSE, MMAE and MAP estimation of the parameter of an exponential distribution, where the parameter itself also obeys an exponential distribution
Exercise 3.7-4: Signal + noise, MMSE signal amplitude estimation
References to Chapter 3

[1] M. Fisz, Probability Theory and Mathematical Statistics, Krieger Publishing Company, 1980.
[2] C.W. Helstrom, Elements of Signal Detection and Estimation, Prentice Hall, 1994.
[3] S. Kay, Fundamentals of Statistical Signal Processing, Vol. 1: Estimation Theory, Prentice Hall, 1993.
[4] E.L. Lehmann and G. Casella, Theory of Point Estimation, Springer, 2003.
[5] H.V. Poor, An Introduction to Signal Detection and Estimation, Springer, 1994.
[6] C.R. Rao, Linear Statistical Inference and Its Applications, John Wiley, 1973.
[7] L.L. Scharf, Statistical Signal Processing, Addison Wesley, 1990.