
ESTIMATION THEORY

4.1. Introduction:

When we fit random data with an AR model, we have to determine the process parameters from the observed data.

In RADAR signal processing, we have to determine the location and the velocity of a target by observing the received noisy data.

In communication, we have to infer the transmitted signal from the received noisy data.

Generally, estimation includes parametric estimation and non-parametric estimation.



For example, consider the problem of estimating the probability density function $f_X(x)$ of a random variable $X$. We may assume a model for $X$, say the Gaussian, and find the mean $\mu$ and the variance $\sigma^2$ of the RV. Finding $\mu$ and $\sigma^2$ from the observed values of $X$ is a problem of parameter estimation. In particular, we may have to find each value of a signal from a noisy observation; this is known as the signal estimation problem.

Otherwise, we may be interested in finding the true value of $f_X(x)$ directly from the data for all values of $X$, without assuming any model for $f_X(x)$. This is the non-parametric method of estimation.

We will discuss the problem of parameter estimation here.

We have a sequence of observable random variables $X_1, X_2, \ldots, X_n$, represented by the vector

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$$

$\mathbf{X}$ is governed by the joint density function, which depends on some unobservable parameter $\theta$ and is given by

$$f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta) = f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$$

where $\theta$ may be deterministic or random. Our aim is to make an inference on $\theta$ from an observed sample of $X_1, X_2, \ldots, X_n$.

An estimator $\hat{\theta}(\mathbf{X})$ is a rule by which we guess the value of the unknown $\theta$ on the basis of $\mathbf{X}$.

$\hat{\theta}(\mathbf{X})$, being a function of random variables, is a random variable. For a particular observation $x_1, x_2, \ldots, x_n$ we get what is known as an estimate (not an estimator). An estimator of a single parameter is also called a point estimator.

Example 1:

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables with mean $\mu$ and variance $\sigma^2$.

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is an estimator for $\mu$, and

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{\mu}\right)^2$$

is an estimator for $\sigma^2$.

An estimator is a function of the random sequence $X_1, X_2, \ldots, X_n$ that does not involve any unknown parameters. Such a function is generally called a statistic.
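As a concrete illustration, these statistics can be computed directly from a batch of observations. The following is a minimal Python sketch (NumPy assumed available; the simulated data and parameter values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)          # simulated i.i.d. observations

mu_hat = x.mean()                                       # (1/n) * sum(X_i)
sigma2_hat = ((x - mu_hat) ** 2).sum() / (len(x) - 1)   # 1/(n-1) divisor

print(mu_hat, sigma2_hat)                               # point estimates of mu and sigma^2
```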

Example 2:

Suppose we have a DC voltage $X$ corrupted by noise $V_i$, and the observed data $Y_i,\ i = 1, 2, \ldots, n$, are given by

$$Y_i = X + V_i$$

Then

$$\hat{X} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

is an estimator for $X$.

Properties of Estimators

A good estimator should satisfy certain properties: $\hat{\theta}(\mathbf{X})$ should be as close to $\theta$ as possible. Some desirable properties of the estimator can be described in terms of the mean and the variance of the estimator.

(a) Unbiased Estimator

An estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if and only if $E(\hat{\theta}) = \theta$. The quantity $E(\hat{\theta}) - \theta$ is called the bias of the estimator. Unbiasedness is necessary but not sufficient to make an estimator a good one.

For a random parameter $\theta$, $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = E(\theta)$. We consider $\theta$ to be a deterministic parameter in this discussion.

Consider two estimators,

$$\hat{\sigma}_1^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\mu}\right)^2 \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{\mu}\right)^2$$

for an i.i.d. sequence $X_1, X_2, \ldots, X_n$.

We can show that $\hat{\sigma}_2^2$ is an unbiased estimator:

$$E\sum_{i=1}^{n}\left(X_i - \hat{\mu}\right)^2 = E\sum_{i=1}^{n}\left(X_i^2 - 2 X_i\hat{\mu} + \hat{\mu}^2\right) = \sum_{i=1}^{n} E X_i^2 - n\, E\hat{\mu}^2$$

Now, $E X_i^2 = \mu^2 + \sigma^2$, and

$$E\hat{\mu}^2 = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^2 = \frac{1}{n^2}\left(\sum_{i=1}^{n} E X_i^2 + \sum_{i \ne j} E X_i X_j\right) = \frac{1}{n^2}\left(n(\mu^2 + \sigma^2) + n(n-1)\mu^2\right) = \mu^2 + \frac{\sigma^2}{n}$$

(because of independence, $E X_i X_j = \mu^2$ for $i \ne j$). Therefore,

$$E\sum_{i=1}^{n}\left(X_i - \hat{\mu}\right)^2 = n\left(\mu^2 + \sigma^2\right) - n\left(\mu^2 + \frac{\sigma^2}{n}\right) = (n-1)\sigma^2$$

So,

$$E\hat{\sigma}_2^2 = \frac{1}{n-1}\, E\sum_{i=1}^{n}\left(X_i - \hat{\mu}\right)^2 = \sigma^2$$

Hence $\hat{\sigma}_2^2$ is an unbiased estimator of $\sigma^2$.

Similarly, the sample mean is an unbiased estimator of $\mu$:

$$E\hat{\mu} = E\,\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}\sum_{i=1}^{n} E X_i = \frac{n\mu}{n} = \mu$$
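The bias of $\hat{\sigma}_1^2$ and the unbiasedness of $\hat{\sigma}_2^2$ can also be checked by simulation. A minimal Monte Carlo sketch in Python (NumPy assumed; the values $\mu = 0$, $\sigma^2 = 4$ and $n = 10$ are arbitrary illustrative choices):

```python
import numpy as np

# Monte Carlo check: the 1/(n-1) estimator is unbiased, while the
# 1/n estimator underestimates sigma^2 on average.
rng = np.random.default_rng(1)
mu, sigma2, n, trials = 0.0, 4.0, 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
mu_hat = x.mean(axis=1, keepdims=True)
ss = ((x - mu_hat) ** 2).sum(axis=1)

print("E[sigma1^2] ~", (ss / n).mean())        # close to (n-1)/n * sigma^2 = 3.6
print("E[sigma2^2] ~", (ss / (n - 1)).mean())  # close to sigma^2 = 4.0
```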

Example 4: Suppose $X_1, X_2, \ldots, X_n$ is an i.i.d. sequence of Poisson random variables with unknown parameter $\lambda$. Then

$$\hat{\lambda}_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \hat{\lambda}_2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{\lambda}_1\right)^2$$

are two unbiased estimators of $\lambda$.

(b) Variance of the Estimator

The variance of the estimator is given by

$$\operatorname{var}(\hat{\theta}) = E\left(\hat{\theta} - E\hat{\theta}\right)^2$$

For the unbiased case,

$$\operatorname{var}(\hat{\theta}) = E\left(\hat{\theta} - \theta\right)^2$$

The variance of the estimator should be as low as possible.

An unbiased estimator $\hat{\theta}$ is called a minimum variance unbiased estimator (MVUE) if

$$E\left(\hat{\theta} - \theta\right)^2 \le E\left(\hat{\theta}' - \theta\right)^2$$

where $\hat{\theta}'$ is any other unbiased estimator.

(c) Mean Square Error of the Estimator

$$\operatorname{MSE}(\hat{\theta}) = E\left(\hat{\theta} - \theta\right)^2$$

The MSE should be as small as possible. Of all unbiased estimators, the MVUE has the minimum mean square error.

The MSE is related to the bias and the variance as shown below:

$$\begin{aligned}
\operatorname{MSE}(\hat{\theta}) &= E\left(\hat{\theta} - \theta\right)^2 = E\left(\hat{\theta} - E\hat{\theta} + E\hat{\theta} - \theta\right)^2 \\
&= E\left(\hat{\theta} - E\hat{\theta}\right)^2 + \left(E\hat{\theta} - \theta\right)^2 + 2\left(E\hat{\theta} - \theta\right)E\left(\hat{\theta} - E\hat{\theta}\right) \\
&= \operatorname{var}(\hat{\theta}) + b^2 + 0
\end{aligned}$$

So, $\operatorname{MSE}(\hat{\theta}) = \operatorname{var}(\hat{\theta}) + b^2$, where $b = E\hat{\theta} - \theta$ is the bias.
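As a quick illustration of this decomposition, consider the estimator $\hat{\sigma}_1^2$ above. Since $\hat{\sigma}_2^2$ is unbiased, $E\hat{\sigma}_1^2 = \frac{n-1}{n}\sigma^2$, so the bias is $b = -\frac{\sigma^2}{n}$ and

$$\operatorname{MSE}(\hat{\sigma}_1^2) = \operatorname{var}(\hat{\sigma}_1^2) + \frac{\sigma^4}{n^2}$$

The bias contribution vanishes as $n \to \infty$.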

(d) Consistent Estimators

As we have more data, the quality of the estimation should improve. This idea is used in defining a consistent estimator. An estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if $\hat{\theta}$ converges to $\theta$ in probability:

$$\lim_{n \to \infty} P\left(\left|\hat{\theta} - \theta\right| > \varepsilon\right) = 0 \quad \text{for any } \varepsilon > 0$$

A less rigorous test is obtained by applying the Markov inequality to $\left(\hat{\theta} - \theta\right)^2$:

$$P\left(\left|\hat{\theta} - \theta\right| \ge \varepsilon\right) \le \frac{E\left(\hat{\theta} - \theta\right)^2}{\varepsilon^2}$$

If $\hat{\theta}$ is an unbiased estimator ($b = 0$), then $\operatorname{MSE}(\hat{\theta}) = \operatorname{var}(\hat{\theta})$. Therefore, if $\lim_{n \to \infty} E\left(\hat{\theta} - \theta\right)^2 = 0$, then $\hat{\theta}$ will be a consistent estimator.

Also, note that $\operatorname{MSE}(\hat{\theta}) = \operatorname{var}(\hat{\theta}) + b^2$. Therefore, if the estimator is asymptotically unbiased (i.e. $b \to 0$ as $n \to \infty$) and $\operatorname{var}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\operatorname{MSE}(\hat{\theta}) \to 0$. Hence, for an asymptotically unbiased estimator $\hat{\theta}$, if $\operatorname{var}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ will be a consistent estimator.

Example 3:

Suppose $X_1, X_2, \ldots, X_n$ is an i.i.d. random sequence with unknown mean $\mu_x$ and known variance $\sigma_x^2$. Let $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$ be an estimator for $\mu_x$. We have already shown that $\hat{\mu}$ is unbiased. Also, $\operatorname{var}(\hat{\mu}) = \frac{\sigma_x^2}{n}$. Is it a consistent estimator?

Clearly, $\lim_{n \to \infty}\operatorname{var}(\hat{\mu}) = \lim_{n \to \infty}\frac{\sigma_x^2}{n} = 0$. Therefore, $\hat{\mu}$ is a consistent estimator of $\mu_x$.
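A quick simulation illustrates this behaviour: the empirical variance of $\hat{\mu}$ shrinks roughly as $\sigma_x^2/n$, so the estimate concentrates around $\mu_x$ as $n$ grows. A minimal Python sketch (NumPy assumed; parameter values are illustrative):

```python
import numpy as np

# Illustration of consistency: the spread of the sample mean around mu_x
# shrinks as n grows (its variance is sigma_x^2 / n).
rng = np.random.default_rng(2)
mu_x, sigma_x = 1.5, 3.0

for n in (10, 100, 1000):
    est = rng.normal(mu_x, sigma_x, size=(5000, n)).mean(axis=1)
    print(n, est.var())   # empirical variance, approximately sigma_x^2 / n
```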

Efficient Estimator

Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of the parameter $\theta$. The relative efficiency of the estimator $\hat{\theta}_2$ with respect to the estimator $\hat{\theta}_1$ is defined by

$$\text{Relative Efficiency} = \frac{\operatorname{var}(\hat{\theta}_1)}{\operatorname{var}(\hat{\theta}_2)}$$

In particular, if $\hat{\theta}_1$ is an MVUE, it is called an efficient estimator, and the absolute efficiency of an unbiased estimator is its relative efficiency with respect to this efficient estimator.

Example 5: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$. Then the sample mean $\hat{\mu}$ and the sample median $\hat{\mu}_m$ are two unbiased estimators of $\mu$. We have shown that $\operatorname{var}(\hat{\mu}) = \frac{\sigma^2}{n}$, and it can be shown that $\operatorname{var}(\hat{\mu}_m) \simeq \frac{\pi\sigma^2}{2n}$. Therefore,

$$\text{Efficiency of } \hat{\mu}_m = \frac{\operatorname{var}(\hat{\mu})}{\operatorname{var}(\hat{\mu}_m)} = \frac{2}{\pi}$$

Example 6: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$. Then

$$\hat{\mu}_1 = \frac{1}{n-1}\sum_{i=1}^{n} X_i$$

is a biased estimator of $\mu$, since $E\hat{\mu}_1 = \frac{n}{n-1}\mu$. Note that

$$\operatorname{var}(\hat{\mu}_1) = \frac{n^2}{(n-1)^2}\operatorname{var}(\hat{\mu}) = \frac{n\sigma^2}{(n-1)^2}$$

Minimum Variance Unbiased Estimator

We described the minimum variance unbiased estimator (MVUE), which is a desirable estimator. $\hat{\theta}$ is an MVUE if

$$E(\hat{\theta}) = \theta \quad \text{and} \quad \operatorname{Var}(\hat{\theta}) \le \operatorname{Var}(\tilde{\theta})$$

where $\tilde{\theta}$ is any other unbiased estimator of $\theta$.

Theorem: The MVUE is unique.

Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two MVUEs for the deterministic parameter $\theta$. Clearly, $E\hat{\theta}_1 = E\hat{\theta}_2 = \theta$. Suppose $\operatorname{Var}(\hat{\theta}_1) = \operatorname{Var}(\hat{\theta}_2) = \sigma^2$.

Consider another estimator

$$\hat{\theta}_3 = \frac{\hat{\theta}_1 + \hat{\theta}_2}{2}$$

which is also unbiased. Then

$$\begin{aligned}
\operatorname{var}(\hat{\theta}_3) &= \frac{\operatorname{var}(\hat{\theta}_1) + \operatorname{var}(\hat{\theta}_2) + 2\operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2)}{4} \\
&\le \frac{\operatorname{var}(\hat{\theta}_1) + \operatorname{var}(\hat{\theta}_2) + 2\sqrt{\operatorname{var}(\hat{\theta}_1)\operatorname{var}(\hat{\theta}_2)}}{4} \qquad \text{(using the Cauchy-Schwarz inequality)} \\
&= \sigma^2
\end{aligned}$$

But $\operatorname{var}(\hat{\theta}_3)$ cannot be less than $\sigma^2$, because $\hat{\theta}_1$ and $\hat{\theta}_2$ are MVUEs. Therefore $\operatorname{var}(\hat{\theta}_3) = \sigma^2$, which forces $\operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2) = \sigma^2$.

Now consider

$$\operatorname{var}\!\left(\frac{\hat{\theta}_1 - \hat{\theta}_2}{2}\right) = \frac{\operatorname{var}(\hat{\theta}_1) + \operatorname{var}(\hat{\theta}_2) - 2\operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2)}{4} = \frac{\sigma^2 + \sigma^2 - 2\sigma^2}{4} = 0$$

Therefore $\hat{\theta}_1 = \hat{\theta}_2$ with probability 1.

Cramer-Rao Theorem

Can we reduce the variance of an unbiased estimator indefinitely? The answer is given by the Cramer-Rao theorem.

Suppose $\hat{\theta}$ is an unbiased estimator based on the random sequence $X_1, X_2, \ldots, X_n$. Let us denote the sequence by the vector

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$$

Let $f_{\mathbf{X}/\theta}(x_1, \ldots, x_n/\theta)$ be the joint PDF which characterises $\mathbf{X}$. This function is called the likelihood function. Note that $\theta$ may also be random; in that case the likelihood function will represent the conditional joint density function.

The quantity $L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta)$ is called the log-likelihood function.

Statement of the Cramer-Rao theorem

Suppose $\hat{\theta}$ is an unbiased estimator of $\theta \in D$, where $D$ is an open interval, and $f_{\mathbf{X}/\theta}(x_1, \ldots, x_n/\theta)$ satisfies the following regularity conditions:

(i) The support $\{\mathbf{x} \mid f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) > 0\}$ does not depend on $\theta$. We may assume $\mathbb{R}^n$ to be the support.

(ii) For $\theta \in D$ and $\mathbf{x} \in \mathbb{R}^n$, $\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}$ exists and is finite.

Then

$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{I_n(\theta)} \qquad \text{where } I_n(\theta) = E\left(\frac{\partial L}{\partial\theta}\right)^2$$

$I_n(\theta)$ is a measure of the average information in the random sequence and is called the Fisher information statistic.

The equality in the CR bound holds if

$$\frac{\partial L}{\partial\theta} = c\left(\hat{\theta} - \theta\right)$$

where $c$ is a constant with respect to $\mathbf{x}$.

Proof: $\hat{\theta}$ is an unbiased estimator of $\theta$, so

$$E\left(\hat{\theta} - \theta\right) = 0$$

$$\int\left(\hat{\theta} - \theta\right) f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0$$

where the integration is an $n$-fold integration.

Differentiating with respect to $\theta$, we get

$$\frac{d}{d\theta}\int\left(\hat{\theta} - \theta\right) f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0$$

Note the regularity condition that the limits of integration are not functions of $\theta$. Therefore, the processes of integration and differentiation can be interchanged, and we get

$$\begin{aligned}
&\int \frac{\partial}{\partial\theta}\left\{\left(\hat{\theta} - \theta\right) f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\right\} d\mathbf{x} = 0 \\
\Rightarrow\ & \int \left(\hat{\theta} - \theta\right)\frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial\theta}\, d\mathbf{x} - \int f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0 \\
\Rightarrow\ & \int \left(\hat{\theta} - \theta\right)\frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial\theta}\, d\mathbf{x} = 1 \qquad (1)
\end{aligned}$$

Note that

$$\frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial\theta} = \frac{\partial\left\{\ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\right\}}{\partial\theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) = \frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$$

Therefore, from (1),

$$\int\left(\hat{\theta} - \theta\right)\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 1$$

so that

$$\left(\int\left(\hat{\theta} - \theta\right)\sqrt{f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}\ \frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\sqrt{f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}\, d\mathbf{x}\right)^2 = 1 \qquad (2)$$

since $f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \ge 0$.

Recall that the Cauchy-Schwarz inequality is given by

$$\left\langle \mathbf{a}, \mathbf{b}\right\rangle^2 \le \left\|\mathbf{a}\right\|^2\left\|\mathbf{b}\right\|^2$$

where the equality holds when $\mathbf{a} = c\,\mathbf{b}$ ($c$ any scalar).

Applying this inequality to the L.H.S. of equation (2), we get

$$\left(\int\left(\hat{\theta} - \theta\right)\sqrt{f_{\mathbf{X}/\theta}}\ \frac{\partial L}{\partial\theta}\sqrt{f_{\mathbf{X}/\theta}}\, d\mathbf{x}\right)^2
\le \int\left(\hat{\theta} - \theta\right)^2 f_{\mathbf{X}/\theta}\, d\mathbf{x}\ \int\left(\frac{\partial L}{\partial\theta}\right)^2 f_{\mathbf{X}/\theta}\, d\mathbf{x}
= \operatorname{var}(\hat{\theta})\, I_n(\theta)$$

i.e. L.H.S. $\le \operatorname{var}(\hat{\theta})\, I_n(\theta)$. But the R.H.S. of (2) is 1, so

$$\operatorname{var}(\hat{\theta})\, I_n(\theta) \ge 1
\quad\Rightarrow\quad
\operatorname{var}(\hat{\theta}) \ge \frac{1}{I_n(\theta)}$$

which is the Cramer-Rao inequality. The right-hand side is the Cramer-Rao lower bound (CRLB) for $\operatorname{var}(\hat{\theta})$.

The equality will hold when

$$\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\sqrt{f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)} = c\left(\hat{\theta} - \theta\right)\sqrt{f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}
\quad\text{so that}\quad
\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta} = c\left(\hat{\theta} - \theta\right)$$

where $c$ is independent of $\mathbf{x}$ and may be a function of $\theta$. Noting that, at equality,

$$\frac{1}{\operatorname{var}(\hat{\theta})} = E\left(\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\right)^2 = c^2\, E\left(\hat{\theta} - \theta\right)^2 = c^2\operatorname{var}(\hat{\theta})$$

we get

$$c = I_n(\theta)$$

Thus the CRLB is achieved if and only if

$$\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta} = I_n(\theta)\left(\hat{\theta} - \theta\right)$$

If $\hat{\theta}$ satisfies the CR bound with equality, then $\hat{\theta}$ is called an efficient estimator. Note that an efficient estimator is always an MVUE.

Also, from $\int f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 1$, we get

$$\int \frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial\theta}\, d\mathbf{x} = 0
\quad\Rightarrow\quad
\int \frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0$$

Taking the partial derivative with respect to $\theta$ again, we get

$$\int \frac{\partial^2 L(\mathbf{x}/\theta)}{\partial\theta^2}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x}
+ \int \left(\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\right)^2 f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0$$

so that

$$E\left(\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\right)^2 = -E\left(\frac{\partial^2 L(\mathbf{x}/\theta)}{\partial\theta^2}\right)$$

Thus the CR inequality may also be written as

$$\operatorname{var}(\hat{\theta}) \ge \frac{1}{-E\left(\dfrac{\partial^2 L(\mathbf{x}/\theta)}{\partial\theta^2}\right)}$$

Remarks

(1) If the information $I_n(\theta)$ is larger, the lower bound on $\operatorname{var}(\hat{\theta})$ is smaller.

(2) Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. Then

$$\begin{aligned}
I_1(\theta) &= E\left(\frac{\partial}{\partial\theta}\ln f_{X_1/\theta}(x_1/\theta)\right)^2 = -E\,\frac{\partial^2}{\partial\theta^2}\ln f_{X_1/\theta}(x_1/\theta) \\
I_n(\theta) &= -E\,\frac{\partial^2}{\partial\theta^2}\ln f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta) \\
&= -E\,\frac{\partial^2}{\partial\theta^2}\sum_{i=1}^{n}\ln f_{X_i/\theta}(x_i/\theta) \\
&= \sum_{i=1}^{n}\left(-E\,\frac{\partial^2}{\partial\theta^2}\ln f_{X_i/\theta}(x_i/\theta)\right) \\
&= n\, I_1(\theta)
\end{aligned}$$

(3) If $\hat{\theta}$ satisfies the CR bound with equality, then $\hat{\theta}$ is called an efficient estimator.
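The additivity $I_n(\theta) = n I_1(\theta)$ can be checked numerically: for i.i.d. samples the score of the joint density is the sum of the per-sample scores, so $E\left(\partial L/\partial\theta\right)^2$ grows linearly with $n$. For the Gaussian case with known standard deviation $\sigma$ (worked out in the example below), $I_1(\mu) = 1/\sigma^2$. A minimal Monte Carlo sketch (NumPy assumed; $\mu$ and $\sigma$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.7, 2.0                            # assumed true values for the simulation

for n in (1, 5, 20):
    x = rng.normal(mu, sigma, size=(100_000, n))
    score = ((x - mu) / sigma**2).sum(axis=1)   # d/d_mu of the joint log-likelihood
    print(n, (score**2).mean(), n / sigma**2)   # empirical I_n vs n * I_1
```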

Extension to Vector Parameters

Suppose $\theta_1, \theta_2, \ldots, \theta_k$ are $k$ parameters, represented as the vector $\boldsymbol{\theta} = [\theta_1\ \theta_2\ \ldots\ \theta_k]'$. Then the log-likelihood function is given by

$$L(\mathbf{x}/\boldsymbol{\theta}) = \ln f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta})$$

We can represent the first-order partial derivatives of $L(\mathbf{x}/\boldsymbol{\theta})$ by the gradient vector

$$\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \left[\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_1}\ \ \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_2}\ \ \ldots\ \ \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_k}\right]'$$

The Fisher information matrix is given by

$$\mathbf{I}_n(\boldsymbol{\theta}) = E\left(\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\left(\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right)'\right)$$

where the expectation is performed on each term of the matrix. It can be shown that

$$\mathbf{I}_n(\boldsymbol{\theta}) = -E\left(\frac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right)
= -E\begin{bmatrix}
\dfrac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_1^2} & \dfrac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_1\,\partial\theta_2} & \cdots & \dfrac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_1\,\partial\theta_k} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_k\,\partial\theta_1} & \dfrac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_k\,\partial\theta_2} & \cdots & \dfrac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_k^2}
\end{bmatrix}$$

Assume that the pdf $f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta})$ satisfies the regularity condition

$$E\left(\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right) = \mathbf{0}$$

where the expectation is taken with respect to $f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta})$. Then the covariance matrix $\mathbf{C}_{\hat{\boldsymbol{\theta}}}$ of any unbiased estimator $\hat{\boldsymbol{\theta}}$ satisfies

$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}_n^{-1}(\boldsymbol{\theta}) \ge \mathbf{0}$$

where the inequality with respect to the zero matrix means that the left-hand side is a positive semi-definite matrix.

The CR bound for the individual variances is given by

$$\operatorname{Var}(\hat{\theta}_i) \ge \left[\mathbf{I}_n^{-1}(\boldsymbol{\theta})\right]_{(i,i)}$$

where $[\cdot]_{(i,i)}$ denotes the $(i,i)$-th element of the matrix.

The equality will hold when

$$\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \mathbf{I}_n(\boldsymbol{\theta})\left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right)$$

Example 3:

Let $X_1, X_2, \ldots, X_n$ be an i.i.d. Gaussian random sequence with known variance $\sigma^2$ and unknown mean $\mu$. Suppose $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$, which is unbiased. Find the CR bound and hence show that $\hat{\mu}$ is an efficient estimator.

The likelihood function $f_{\mathbf{X}/\mu}(x_1, x_2, \ldots, x_n/\mu)$ will be the product of the individual densities (since i.i.d.):

$$f_{\mathbf{X}/\mu}(x_1, x_2, \ldots, x_n/\mu) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$$

so that

$$L(\mathbf{x}/\mu) = -n\ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

Now

$$\frac{\partial L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu), \qquad
\frac{\partial^2 L}{\partial\mu^2} = -\frac{n}{\sigma^2}, \qquad
\text{so that}\ -E\,\frac{\partial^2 L}{\partial\mu^2} = \frac{n}{\sigma^2}$$

$$\text{CR bound} = \frac{1}{I_n(\mu)} = \frac{1}{-E\dfrac{\partial^2 L}{\partial\mu^2}} = \frac{\sigma^2}{n}$$

Also,

$$\frac{\partial L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = \frac{n}{\sigma^2}\left(\frac{1}{n}\sum_{i=1}^{n} x_i - \mu\right) = \frac{n}{\sigma^2}\left(\hat{\mu} - \mu\right)$$

Hence $\frac{\partial L}{\partial\mu} = c\left(\hat{\mu} - \mu\right)$ with $c = \frac{n}{\sigma^2}$, and $\hat{\mu}$ is an efficient estimator: since $\operatorname{var}(\hat{\mu}) = \frac{\sigma^2}{n}$, the CR bound is attained.
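The conclusion can be checked by simulation: the empirical variance of the sample mean should match the CR bound $\sigma^2/n$. A minimal Python sketch (NumPy assumed; the parameter values are illustrative):

```python
import numpy as np

# Numerical check that the sample mean attains the CR bound sigma^2 / n.
rng = np.random.default_rng(4)
mu, sigma, n, trials = 2.0, 1.5, 25, 100_000

mu_hat = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
print("empirical var :", mu_hat.var())
print("CRLB sigma^2/n:", sigma**2 / n)
```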

Example 4: Suppose $X_n = a + bn + V_n$, where $V_n \sim N(0, \sigma^2)$ and $a$, $b$ are unknown constants. Here $\boldsymbol{\theta} = [a\ b]'$. The likelihood function is given by

$$f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta}) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - a - bi)^2}$$

so that

$$L(\mathbf{x}/\boldsymbol{\theta}) = -n\ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - a - bi)^2$$

$$\frac{\partial L}{\partial a} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - a - bi), \qquad
\frac{\partial L}{\partial b} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - a - bi)\, i$$

$$-E\frac{\partial^2 L}{\partial a^2} = \frac{n}{\sigma^2}, \qquad
-E\frac{\partial^2 L}{\partial a\,\partial b} = \frac{1}{\sigma^2}\sum_{i=1}^{n} i = \frac{n(n+1)}{2\sigma^2}, \qquad
-E\frac{\partial^2 L}{\partial b^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6\sigma^2}$$

Therefore

$$\mathbf{I}_n(\boldsymbol{\theta}) = \frac{1}{\sigma^2}\begin{bmatrix} n & \dfrac{n(n+1)}{2} \\[2mm] \dfrac{n(n+1)}{2} & \dfrac{n(n+1)(2n+1)}{6} \end{bmatrix}$$

Taking the inverse, we get

$$\mathbf{I}_n^{-1}(\boldsymbol{\theta}) = \sigma^2\begin{bmatrix} \dfrac{2(2n+1)}{n(n-1)} & -\dfrac{6}{n(n-1)} \\[2mm] -\dfrac{6}{n(n-1)} & \dfrac{12}{n(n^2-1)} \end{bmatrix}$$

so that

$$\operatorname{var}(\hat{a}) \ge \frac{2(2n+1)\,\sigma^2}{n(n-1)} \quad \text{and} \quad \operatorname{var}(\hat{b}) \ge \frac{12\,\sigma^2}{n(n^2-1)}$$
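These bounds can be checked by simulation. For Gaussian noise, the least-squares fit of $a$ and $b$ coincides with the ML estimate and is efficient, so its empirical variances should sit at the bounds above. A minimal Python sketch (NumPy assumed; $a$, $b$, $\sigma$ and $n$ below are illustrative choices):

```python
import numpy as np

# Monte Carlo check of the bounds above for X_i = a + b*i + V_i, i = 1..n.
rng = np.random.default_rng(5)
a, b, sigma, n, trials = 1.0, 0.3, 2.0, 50, 50_000

i = np.arange(1, n + 1)
H = np.column_stack([np.ones(n), i])                   # design matrix for [a, b]
X = a + b * i + rng.normal(0, sigma, size=(trials, n))
theta_hat, *_ = np.linalg.lstsq(H, X.T, rcond=None)    # shape (2, trials)

print("var(a_hat):", theta_hat[0].var(), "bound:", 2*(2*n + 1)*sigma**2 / (n*(n - 1)))
print("var(b_hat):", theta_hat[1].var(), "bound:", 12*sigma**2 / (n*(n**2 - 1)))
```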

MVUE through Sufficient Statistic

We saw that an MVUE achieving the CRLB can be obtained through the factorization

$$\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta} = I_n(\theta)\left(\hat{\theta} - \theta\right)$$

However, the CRLB may not be achieved by the MVUE. The sufficient statistic can be used to find the MVUE under certain conditions.

The observations $X_1, X_2, \ldots, X_n$ contain information about the unknown parameter $\theta$. An estimator should carry the same information about $\theta$ as the observed data. The concept of a sufficient statistic is based on this idea.

A measurable function $T(X_1, X_2, \ldots, X_n)$ is called a sufficient statistic for $\theta$ if it contains the same information about $\theta$ as is contained in the random sequence $X_1, X_2, \ldots, X_n$. In other words, the conditional joint density $f_{X_1, X_2, \ldots, X_n \mid T(X_1, X_2, \ldots, X_n)}(x_1, x_2, \ldots, x_n)$ does not involve $\theta$.

There are a large number of sufficient statistics for a particular estimation problem. One has to select a sufficient statistic which has good estimation properties.

Example 7: Suppose $X_i \sim N(\theta, 1),\ i = 1, 2$, are independent and $T(x_1, x_2) = x_1 + x_2$. Then

$$f_{X_1, X_2 \mid T(X_1, X_2)}(x_1, x_2 \mid t)
= \frac{f_{X_1, X_2, T(X_1,X_2)}(x_1, x_2, t)}{f_{T(X_1,X_2)}(t)}
= \frac{f_{X_1, X_2}(x_1, x_2)}{f_{T(X_1,X_2)}(x_1 + x_2)}
= \frac{\dfrac{1}{2\pi}\, e^{-\frac{1}{2}\left[(x_1 - \theta)^2 + (x_2 - \theta)^2\right]}}{\dfrac{1}{2\sqrt{\pi}}\, e^{-\frac{1}{4}(x_1 + x_2 - 2\theta)^2}}$$

Since

$$-\frac{1}{2}\left[(x_1 - \theta)^2 + (x_2 - \theta)^2\right] + \frac{1}{4}(x_1 + x_2 - 2\theta)^2 = -\frac{1}{4}(x_1 - x_2)^2$$

we get

$$f_{X_1, X_2 \mid T(X_1, X_2)}(x_1, x_2 \mid t) = \frac{1}{\sqrt{\pi}}\, e^{-\frac{1}{4}(x_1 - x_2)^2}$$

Thus $f_{X_1, X_2 \mid T(X_1, X_2)}(x_1, x_2 \mid t)$ does not involve the parameter $\theta$. Hence $T(x_1, x_2) = x_1 + x_2$ is a sufficient statistic.

Remark: If $T(x_1, x_2) = x_1 + 3x_2$, we can show in a similar way that $T(x_1, x_2)$ is not a sufficient statistic.

The above definition allows us to check whether a given statistic is sufficient or not. A way to determine a sufficient statistic is through the Neyman-Fisher factorization theorem.

Factorization theorem

For continuous RVs $X_1, X_2, \ldots, X_n$, the statistic $T(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if

$$f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n) = g\left(\theta, T(x_1, x_2, \ldots, x_n)\right)\, h(x_1, x_2, \ldots, x_n)$$

where $g(\theta, T)$ is a non-constant, nonnegative function of $\theta$ and $T$, and $h(x_1, x_2, \ldots, x_n)$ does not involve $\theta$ and is a nonnegative function of $x_1, x_2, \ldots, x_n$.

For the discrete case, the factorization theorem states that $T(\mathbf{x})$ is sufficient if and only if

$$p_{\mathbf{X}}(\mathbf{x}) = g\left(\theta, T(\mathbf{x})\right)\, h(\mathbf{x})$$

Proof: Denote the value $T(\mathbf{x})$ by $t$. Suppose $T(\mathbf{X})$ is a sufficient statistic. Then

$$\begin{aligned}
p_{\mathbf{X}}(\mathbf{x}) &= P(\mathbf{X} = \mathbf{x}) \\
&= P\left(\mathbf{X} = \mathbf{x},\ T(\mathbf{X}) = t\right) \\
&= P\left(T(\mathbf{X}) = t\right)\, P\left(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t\right) \qquad [\,T(\mathbf{X})\text{ is a sufficient statistic, so the conditional probability does not involve }\theta\,] \\
&= g(\theta, t)\, h(\mathbf{x})
\end{aligned}$$

where $g(\theta, t) = P\left(T(\mathbf{X}) = t\right) = \sum_{\mathbf{x}: T(\mathbf{x}) = t} p_{\mathbf{X}}(\mathbf{x})$ and $h(\mathbf{x}) = P\left(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t\right)$.

Conversely, suppose $p_{\mathbf{X}}(\mathbf{x}) = g(\theta, t)\, h(\mathbf{x})$. Then

$$\begin{aligned}
P\left(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t\right) &= \frac{P\left(\mathbf{X} = \mathbf{x},\ T(\mathbf{X}) = t\right)}{P\left(T(\mathbf{X}) = t\right)} = \frac{P(\mathbf{X} = \mathbf{x})}{P\left(T(\mathbf{X}) = t\right)} \\
&= \frac{g(\theta, t)\, h(\mathbf{x})}{\sum_{\mathbf{x}': T(\mathbf{x}') = t} g(\theta, t)\, h(\mathbf{x}')} \\
&= \frac{h(\mathbf{x})}{\sum_{\mathbf{x}': T(\mathbf{x}') = t} h(\mathbf{x}')}
\end{aligned}$$

which does not depend on $\theta$.

Example 8: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. Gaussian random variables with unknown mean $\mu$ and known variance 1. Then $T(\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\mu$, because

$$\begin{aligned}
f_{X_1, X_2, \ldots, X_n/\mu}(x_1, x_2, \ldots, x_n)
&= \frac{1}{\left(\sqrt{2\pi}\right)^n}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2} \\
&= \frac{1}{\left(\sqrt{2\pi}\right)^n}\, e^{-\frac{1}{2}\sum_{i=1}^{n} x_i^2}\ e^{\mu\sum_{i=1}^{n} x_i - \frac{n\mu^2}{2}}
\end{aligned}$$

The first exponential is a function of $x_1, x_2, \ldots, x_n$ only, and the second exponential is a function of $\mu$ and $T(\mathbf{x}) = \sum_{i=1}^{n} x_i$. Therefore $T(\mathbf{x}) = \sum_{i=1}^{n} x_i$ (and, equivalently, $\frac{1}{n}\sum_{i=1}^{n} x_i$) is a sufficient statistic for $\mu$.

Rao-Blackwell Theorem

Suppose $\hat{\theta}$ is an unbiased estimator of $\theta$ and $T(\mathbf{X})$ is a sufficient statistic for $\theta$. Then $\hat{\theta}_1 = E\left(\hat{\theta} \mid T(\mathbf{X})\right)$ is unbiased and $\operatorname{var}(\hat{\theta}_1) \le \operatorname{var}(\hat{\theta})$.

Proof: Using the property of conditional expectation, we have

$$E\hat{\theta}_1 = E\left(E\left(\hat{\theta} \mid T(\mathbf{X})\right)\right) = E\hat{\theta} = \theta$$

Therefore $\hat{\theta}_1$ is an unbiased estimator of $\theta$. Now

$$\begin{aligned}
\operatorname{var}(\hat{\theta}) &= E\left(\hat{\theta} - \theta\right)^2 \\
&= E\left(E\left(\left(\hat{\theta} - \theta\right)^2 \mid T(\mathbf{X})\right)\right) \\
&\ge E\left(\left(E\left(\hat{\theta} - \theta \mid T(\mathbf{X})\right)\right)^2\right) \qquad \text{(using Jensen's inequality for a convex function)} \\
&= E\left(\hat{\theta}_1 - \theta\right)^2 = \operatorname{var}(\hat{\theta}_1)
\end{aligned}$$

Complete Statistic

A statistic $T(\mathbf{X})$ is said to be complete if, for any bounded function $g\left(T(\mathbf{X})\right)$,

$$E\, g\left(T(\mathbf{X})\right) = 0 \ \text{ for all } \theta$$

implies that

$$P\left(g\left(T(\mathbf{X})\right) = 0\right) = 1 \ \text{ for all } \theta$$

Example: Suppose $X_1, X_2, X_3, \ldots, X_n$ are i.i.d. Bernoulli random variables with parameter $p$, $0 < p < 1$, and

$$T(\mathbf{X}) = \sum_{i=1}^{n} X_i$$

Clearly, $T(\mathbf{X}) \sim \text{Binomial}(n, p)$, and $T(\mathbf{X})$ takes the values $t = 0, 1, \ldots, n$.

$$E\, g(T) = \sum_{t=0}^{n} g(t)\binom{n}{t} p^{t}(1 - p)^{n-t} = 0 \quad \text{for all } p \in (0, 1)$$

$$\Rightarrow\ (1 - p)^{n}\sum_{t=0}^{n} g(t)\binom{n}{t}\left(\frac{p}{1 - p}\right)^{t} = 0 \quad \text{for all } p \in (0, 1)$$

$$\Rightarrow\ \sum_{t=0}^{n} g(t)\binom{n}{t}\left(\frac{p}{1 - p}\right)^{t} = 0$$

The left-hand side is a polynomial in $\frac{p}{1-p}$ and can vanish for all such $p$ if and only if the coefficients vanish, i.e.

$$g(t) = 0 \quad \text{for } t = 0, 1, 2, \ldots, n$$

Hence $T(\mathbf{X})$ is a complete statistic.

Remark: If $T(\mathbf{X})$ is a complete statistic, then there is only one function $g\left(T(\mathbf{X})\right)$ which is unbiased. Suppose there is another function $g_1\left(T(\mathbf{X})\right)$ which is unbiased. Then

$$\begin{aligned}
& E\left(g\left(T(\mathbf{X})\right) - g_1\left(T(\mathbf{X})\right)\right) = \theta - \theta = 0 \\
\Rightarrow\ & P\left(g\left(T(\mathbf{X})\right) - g_1\left(T(\mathbf{X})\right) = 0\right) = 1 \\
\Rightarrow\ & g\left(T(\mathbf{X})\right) = g_1\left(T(\mathbf{X})\right) \ \text{ with probability 1}
\end{aligned}$$

Lehmann-Scheffé Theorem

Suppose $T(\mathbf{X})$ is a complete sufficient statistic for $\theta$ and $g\left(T(\mathbf{X})\right)$ is an unbiased estimator of $\theta$ based on $T(\mathbf{X})$. Then $g\left(T(\mathbf{X})\right)$ is the MVUE.

Proof: By the Rao-Blackwell theorem, $g\left(T(\mathbf{X})\right) = E\left(\hat{\theta} \mid T(\mathbf{X})\right)$, where $\hat{\theta}$ is any unbiased estimator of $\theta$, is unbiased. $g\left(T(\mathbf{X})\right)$ is unique, since $T(\mathbf{X})$ is a complete statistic, and

$$\operatorname{Var}\left(g\left(T(\mathbf{X})\right)\right) = \operatorname{Var}\left(E\left(\hat{\theta} \mid T(\mathbf{X})\right)\right) \le \operatorname{Var}(\hat{\theta})$$

Therefore $g\left(T(\mathbf{X})\right)$ is an MVUE.

Exponential Family of Distributions

A family of distributions with a probability density function (or probability mass function) of the form

$$f_{X/\theta}(x) = a(\theta)\, b(x)\exp\left(c(\theta)\, t(x)\right)$$

with $a(\theta) > 0$ and $c(\theta)$ real functions of $\theta$, and $b(x) \ge 0$, is called an exponential family of distributions.

Similarly, a family of distributions

$$f_{X/\boldsymbol{\theta}}(x) = a(\boldsymbol{\theta})\, b(x)\exp\left(\sum_{i=1}^{k} c_i(\boldsymbol{\theta})\, t_i(x)\right)$$

with $a$, $b(x)$ and $c_i$ as specified above, is called a $k$-parameter exponential family. An exponential family of discrete RVs will have its probability mass function in the above forms.

Example 9

Suppose $X \sim N(\mu, \sigma^2)$. Then

$$\begin{aligned}
f_{X/\mu,\sigma^2}(x) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) \\
&= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}\left(x^2 - 2\mu x + \mu^2\right)\right) \\
&= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\mu^2}{2\sigma^2}\right)\exp\left(\frac{\mu}{\sigma^2}\,x - \frac{1}{2\sigma^2}\,x^2\right)
\end{aligned}$$

Thus $f_{X/\mu,\sigma^2}(x)$ belongs to a 2-parameter exponential family with

$$a(\boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{\mu^2}{2\sigma^2}\right),\quad b(x) = 1,\quad
c_1(\boldsymbol{\theta}) = \frac{\mu}{\sigma^2},\quad c_2(\boldsymbol{\theta}) = -\frac{1}{2\sigma^2},\quad
t_1(x) = x,\quad t_2(x) = x^2$$

If $X_1, X_2, \ldots, X_n$ are i.i.d. random variables from a $k$-parameter exponential family, then

$$\begin{aligned}
f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n) &= \prod_{j=1}^{n}\left[a(\boldsymbol{\theta})\, b(x_j)\exp\left(\sum_{i=1}^{k} c_i(\boldsymbol{\theta})\, t_i(x_j)\right)\right] \\
&= a^n(\boldsymbol{\theta})\left(\prod_{j=1}^{n} b(x_j)\right)\exp\left(\sum_{i=1}^{k} c_i(\boldsymbol{\theta})\, T_i(\mathbf{x})\right)
\end{aligned}$$

Define

$$\mathbf{T}(\mathbf{x}) = \begin{bmatrix} T_1(\mathbf{x}) \\ T_2(\mathbf{x}) \\ \vdots \\ T_k(\mathbf{x}) \end{bmatrix}
= \begin{bmatrix} \sum_{j=1}^{n} t_1(x_j) \\ \sum_{j=1}^{n} t_2(x_j) \\ \vdots \\ \sum_{j=1}^{n} t_k(x_j) \end{bmatrix}$$

It is easy to show that $\mathbf{T}(\mathbf{x})$ is sufficient and complete.

Criteria for Estimation

The estimation of a parameter is based on several well-known criteria. Each criterion tries to optimize some function of the observed samples with respect to the unknown parameter to be estimated. Some of the most popular estimation criteria are:

Maximum Likelihood

Minimum Mean Square Error

Bayes' Method

Maximum Likelihood Estimator (MLE)

Suppose $X_1, X_2, \ldots, X_n$ are random samples with the joint probability density function $f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n)$, which depends on an unknown non-random parameter $\theta$.

$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta)$ is called the likelihood function. If $X_1, X_2, \ldots, X_n$ are discrete, then the likelihood function will be a joint probability mass function. We represent the concerned random variables and their values in vector notation by $\mathbf{X} = [X_1\ X_2\ \ldots\ X_n]'$ and $\mathbf{x} = [x_1\ x_2\ \ldots\ x_n]'$ respectively. Note that $L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$ is the log-likelihood function. As functions of the random variables, the likelihood and log-likelihood functions are random variables.

The maximum likelihood estimator $\hat{\theta}_{MLE}$ is the estimator such that

$$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\hat{\theta}_{MLE}) \ge f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta) \quad \text{for all } \theta$$

If the likelihood function is differentiable with respect to $\theta$, then $\hat{\theta}_{MLE}$ is given by

$$\left.\frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial\theta}\right|_{\theta = \hat{\theta}_{MLE}} = 0
\quad\text{or}\quad
\left.\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\right|_{\theta = \hat{\theta}_{MLE}} = 0$$

Thus the MLE is given by the solution of the likelihood equation given above.
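When the likelihood equation does not have a convenient closed form, the MLE can be found by maximizing the log-likelihood numerically. A minimal Python sketch (NumPy and SciPy assumed available) for the Poisson parameter $\lambda$ of Example 4: terms of $L$ not involving $\lambda$ are dropped, and the numerical maximizer is compared with the closed-form MLE, which for the Poisson case is the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
x = rng.poisson(lam=3.2, size=500)              # simulated Poisson data (illustrative)

def neg_log_likelihood(lam):
    # -L(x/lambda) up to a term that does not depend on lambda (the log x_i! term)
    return len(x) * lam - x.sum() * np.log(lam)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
print(res.x, x.mean())     # numerical maximizer vs closed-form MLE (sample mean)
```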

If we have $k$ unknown parameters given by

$$\boldsymbol{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix}$$

then the MLE is given by the set of conditions

$$\left.\frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_1}\right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left.\frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_2}\right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left.\frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_k}\right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0$$

Since $\ln(\cdot)$ is a monotonic function of its argument, it is convenient to express the MLE conditions in terms of the log-likelihood function. The conditions are then

$$\left.\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_1}\right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left.\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_2}\right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left.\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_k}\right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0$$

Example 10:

Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed sequence of $N(\mu, \sigma^2)$ random variables. Find the MLE for $\mu$ and $\sigma^2$.

$$f_{\mathbf{X}/\mu,\sigma^2}(x_1, x_2, \ldots, x_n/\mu, \sigma^2) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$$

$$L(\mathbf{x}/\mu, \sigma^2) = \ln f_{\mathbf{X}/\mu,\sigma^2}(\mathbf{x}/\mu, \sigma^2) = -n\ln\left(\sqrt{2\pi}\right) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

$$\left.\frac{\partial L}{\partial\mu}\right|_{\hat{\mu}_{MLE}} = 0 \ \Rightarrow\ \frac{1}{\hat{\sigma}^2_{MLE}}\sum_{i=1}^{n}\left(x_i - \hat{\mu}_{MLE}\right) = 0$$

$$\left.\frac{\partial L}{\partial\sigma^2}\right|_{\hat{\sigma}^2_{MLE}} = 0 \ \Rightarrow\ -\frac{n}{2\hat{\sigma}^2_{MLE}} + \frac{1}{2\hat{\sigma}^4_{MLE}}\sum_{i=1}^{n}\left(x_i - \hat{\mu}_{MLE}\right)^2 = 0$$

Solving, we get

$$\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{\mu}_{MLE}\right)^2$$
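The closed-form solution can be evaluated directly from data. A minimal Python sketch (NumPy assumed; the simulated data are for illustration only); note that $\hat{\sigma}^2_{MLE}$ uses the $1/n$ divisor and is therefore the biased variance estimator discussed earlier:

```python
import numpy as np

# The closed-form Gaussian MLEs of Example 10, computed from data.
rng = np.random.default_rng(7)
x = rng.normal(loc=4.0, scale=3.0, size=2000)

mu_mle = x.mean()
sigma2_mle = ((x - mu_mle) ** 2).mean()     # 1/n divisor (biased for small n)
print(mu_mle, sigma2_mle)
```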

Example 11:

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random samples with

$$f_{X/\theta}(x) = \frac{1}{2}\, e^{-|x - \theta|}, \quad -\infty < x < \infty$$

Show that $\operatorname{median}(X_1, X_2, \ldots, X_n)$ is the MLE for $\theta$.

$$f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n) = \frac{1}{2^n}\, e^{-\sum_{i=1}^{n}|x_i - \theta|}$$

$$L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) = -n\ln 2 - \sum_{i=1}^{n}|x_i - \theta|$$

$L(\mathbf{x}/\theta)$ is maximized when $\sum_{i=1}^{n}|x_i - \theta|$ is minimized, and this sum is minimized by $\theta = \operatorname{median}(x_1, x_2, \ldots, x_n)$. Hence

$$\hat{\theta}_{MLE} = \operatorname{median}(x_1, x_2, \ldots, x_n)$$
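The claim that $\sum_{i}|x_i - \theta|$ is minimized at the sample median can be checked numerically by scanning $\theta$ over a grid. A minimal Python sketch (NumPy assumed; the Laplacian data are simulated with illustrative parameters):

```python
import numpy as np

# Quick check for Example 11: sum |x_i - theta| is minimized near the sample median.
rng = np.random.default_rng(8)
x = rng.laplace(loc=1.0, scale=2.0, size=501)

grid = np.linspace(x.min(), x.max(), 5001)
cost = np.abs(x[:, None] - grid[None, :]).sum(axis=0)
print(grid[cost.argmin()], np.median(x))    # grid minimizer vs the median (agree closely)
```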

Properties of MLE

(1) The MLE may be biased or unbiased. In Example 10, $\hat{\mu}_{MLE}$ is unbiased, whereas $\hat{\sigma}^2_{MLE}$ is a biased estimator.

(2) If an efficient estimator exists, the MLE is that efficient estimator. Suppose an efficient estimator $\hat{\theta}$ exists. Then

$$\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta} = c(\theta)\left(\hat{\theta} - \theta\right)$$

At $\theta = \hat{\theta}_{MLE}$,

$$\left.\frac{\partial L(\mathbf{x}/\theta)}{\partial\theta}\right|_{\hat{\theta}_{MLE}} = 0
\ \Rightarrow\ c(\hat{\theta}_{MLE})\left(\hat{\theta} - \hat{\theta}_{MLE}\right) = 0
\ \Rightarrow\ \hat{\theta} = \hat{\theta}_{MLE}$$

(3) The MLE is asymptotically unbiased and efficient. Thus, for large $n$, the MLE is approximately efficient.

(4) Invariance property of the MLE

This is a remarkable property of the MLE, not shared by other estimators. If $\hat{\theta}_{MLE}$ is the MLE of $\theta$ and $h(\theta)$ is a function of $\theta$, then $h(\hat{\theta}_{MLE})$ is the MLE of $h(\theta)$.