
Page 1: Maximum Likelihood Estimation

January 4, 2019

Page 2: Likelihood Function and ML estimator

Suppose we have a random sample $X_1, \dots, X_n$ i.i.d. $f(x;\theta)$.

Then the joint pdf is
$$f(\mathbf{x};\theta) = \prod_{i=1}^n f(x_i;\theta)$$
with $\mathbf{x} = (x_1,\dots,x_n)^T$.

The likelihood function $L(\theta;\mathbf{x})$ is the joint pdf viewed as a function of $\theta$. It is often more convenient to work with the log-likelihood
$$l(\theta;\mathbf{x}) = \ln[L(\theta;\mathbf{x})].$$

Page 3: Maximum likelihood estimator (MLE)

The maximum likelihood estimator (MLE) is the value $\hat\theta$ which maximises $L(\theta;\mathbf{x})$.

The MLE also maximises $l(\theta;\mathbf{x})$ because $\ln(\cdot)$ is monotonic. It is usually easier to maximise $l(\theta;\mathbf{x})$, so we work with this.

Comments:

- Finding the maximum of the likelihood function is an optimization problem. For simple cases we can find closed-form expressions for $\hat\theta$; however, we often need iterative numerical optimisation procedures.
- It is useful to plot the (log-)likelihood surface to identify potential problems.

Page 4: Example

Suppose $X_1,\dots,X_n$ is a random sample from the exponential distribution with pdf
$$f(x;\theta) = \begin{cases} \theta e^{-\theta x} & x > 0,\\ 0 & \text{otherwise.} \end{cases}$$

$L(\theta;\mathbf{x}) = \theta^n e^{-\theta\sum_{i=1}^n x_i}$, so $l(\theta;\mathbf{x}) = n\ln(\theta) - \theta\sum_{i=1}^n x_i$.

$$\frac{\partial l}{\partial\theta} = \frac{n}{\theta} - \sum_{i=1}^n x_i = 0 \quad\text{gives the MLE } \hat\theta = \frac{1}{\bar X}.$$

Check: $\dfrac{\partial^2 l}{\partial\theta^2} = -\dfrac{n}{\theta^2} < 0$, so $\hat\theta$ does correspond to a maximum.

Page 5: Example

The sample data on the atmospheric pollution (due to sulphur dioxide) in 50 cities, in micrograms/m3, follow an exponential distribution.

xin <- c(6.8, 6.0, 2.4, 0.98, 3.3,  5.3, 1.2, 3.7, 4.2, 7.5,
         6.9, 5.6, 3.2, 3.7, 4.7,   3.2, 3.5, 7.0, 4.4, 3.1,
         8.8, 3.4, 3.4, 7.9, 3.7,   3.4, 9.1, 5.2, 6.7, 2.5,
         7.8, 1.7, 2.4, 6.9, 4.2,   5.1, 6.4, 8.7, 3.6, 2.7,
         3.4, 5.7, 5.38, 5.2, 7.3,  4.9, 3.9, 7.9, 2.7, 2.4)

lvexp <- function(lambda, yoss) {   # the (log-)likelihood function
  n <- length(yoss)
  sumy <- sum(yoss)
  n * log(lambda) - lambda * sumy
}
lambda <- seq(0.01, 1, length = length(xin))  # grid for the plot (start above 0 to avoid log(0))
logv <- lvexp(lambda, xin)                    # plot of the log-likelihood
plot(lambda, logv, type = "l", xlab = "lambda", ylab = "log-likelihood")
lambda[which.max(logv)]   # max of log-likelihood function at
## [1] 0.2
1 / mean(xin)             # the MLE
## [1] 0.2

Page 6: Plot of the (log-)likelihood function

[Figure: "log likelihood: exponential distribution"; the log-likelihood plotted against lambda on [0, 1], y-axis running from about -240 to -140, with the peak near lambda = 0.2.]

Page 7: Example 2

Suppose $X_1,\dots,X_n$ is a random sample from the Bernoulli distribution with pdf
$$f(x) = \begin{cases} p^x(1-p)^{1-x} & x = 0,1,\quad 0 \le p \le 1\\ 0 & \text{otherwise} \end{cases}$$

$E[X] = p$ and $\mathrm{Var}[X] = p(1-p) = pq$.

$$l(p;\mathbf{x}) = \sum_{i=1}^n x_i \ln p + \left(n - \sum_{i=1}^n x_i\right)\ln(1-p)$$

$$\frac{\partial l}{\partial p} = \frac{\sum_{i=1}^n x_i}{p} - \frac{n - \sum_{i=1}^n x_i}{1-p} = 0 \quad\text{gives the MLE } \hat p = \bar X.$$

Check: $\dfrac{\partial^2 l}{\partial p^2} < 0$, so $\hat p$ does correspond to a maximum.

Since $\sum_{i=1}^n X_i \sim \mathrm{Bin}(n,p)$, we have $E[\hat p] = p$, so $\hat p$ is unbiased.

Page 8: Plot of the likelihood function

theta <- seq(0, 1, by = 0.01)             # values for theta
y <- dbinom(3, 10, theta)                 # likelihood for s = 3 successes in n = 10 trials
y <- y / max(y)                           # rescale so the maximum is 1
plot(theta, y, type = "l", xlab = expression(theta),
     ylab = "likelihood function", main = "Likelihood for n=10 s=3")
theta[which.max(y)]                       # theta maximising the likelihood
lines(theta[which.max(y)], max(y), type = "h", lty = 1)

[Figure: "Likelihood for n=10 s=3"; the rescaled likelihood plotted against θ on [0, 1], with a vertical line marking the maximum at θ = 0.3.]

Page 9: Example

Suppose $X_1,\dots,X_n$ is a random sample from $U[0,\theta]$. The likelihood function is
$$L(\theta;\mathbf{x}) = \frac{1}{\theta^n}\,\mathbf{1}\{\max_i x_i \le \theta\}.$$
For $\theta \ge \max_i x_i$, $L(\theta;\mathbf{x}) = \frac{1}{\theta^n} > 0$ and is decreasing as $\theta$ increases, while for $\theta < \max_i x_i$, $L(\theta;\mathbf{x}) = 0$. Hence the MLE is $\hat\theta = \max_i x_i$.

Exercise

Is $\hat\theta = \max_i x_i$ unbiased? Is it asymptotically unbiased?
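A quick Monte Carlo sketch of this exercise (illustrative only; the values of theta, n, and the seed are assumptions, not from the slides):

set.seed(11)
theta <- 10; n <- 4
that <- replicate(10000, max(runif(n, 0, theta)))  # simulate the MLE max(x)
mean(that)             # noticeably below theta = 10 for small n
theta * n / (n + 1)    # theoretical mean of the maximum, 8 here

The simulation suggests $E[\hat\theta] = \frac{n}{n+1}\theta < \theta$: biased for fixed $n$, but asymptotically unbiased as $n \to \infty$.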

Page 10: Plot of likelihood for U[0, θ]

Assume $\mathbf{x} = (4,7,2,10)$, so $n = 4$ and $\max_i x_i = 10$.

[Figure: "Likelihood U[0,theta] for max(x)=10, n=4"; the likelihood plotted against theta from 0 to 30: zero for theta < 10, jumping to 1e-04 at theta = 10, and decreasing thereafter.]

Page 11: R commands for plot of likelihood of U[0, θ]

L <- function(theta, x) {
  n <- length(x)
  maxx <- max(x)
  (1 / theta^n) * ifelse(maxx > theta, 0, 1)  # zero whenever theta < max(x)
}
x <- c(4, 7, 2, 10)
theta <- seq(1, 30, by = 1)
plot(theta, L(theta, x), type = "l", xlab = "theta", ylab = "likelihood",
     main = "Likelihood U[0,theta] for max(x)=10, n=4")
lines(theta[which.max(L(theta, x))], max(L(theta, x)), type = "h", lty = 1)  # mark the maximum

Page 12: MLE and Exponential Families of Distributions

Definition (The Exponential Family of Distributions)

The r.v. $X$ belongs to the $k$-parameter exponential family of distributions iff its pdf can be written in the form
$$f(x;\theta) = \exp\left\{\sum_{j=1}^k A_j(\theta)B_j(x) + C(x) + D(\theta)\right\}$$
where

- $A_1(\theta),\dots,A_k(\theta), D(\theta)$ are functions of $\theta$ alone.
- $B_1(x),\dots,B_k(x), C(x)$ are functions of $x$ alone.
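As a quick worked instance of the definition (a standard calculation; this is the $n = 1$ case of the Binomial row in the table two slides below), the Bernoulli pdf can be put in this form:
$$p^x(1-p)^{1-x} = \exp\left\{x\ln\left(\frac{p}{1-p}\right) + \ln(1-p)\right\}, \quad x = 0,1,$$
so $A(\theta) = \ln\frac{p}{1-p}$, $B(x) = x$, $C(x) = 0$ and $D(\theta) = \ln(1-p)$, giving $k = 1$.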

Page 13: Examples

Example

Exponential ($k = 1$): $\theta e^{-\theta x} = \exp\{(-\theta)(x) + \ln\theta\}$,

i.e. $A(\theta) = -\theta$, $B(x) = x$, $C(x) = 0$ and $D(\theta) = \ln\theta$.

Normal ($k = 2$):
$$(2\pi\sigma^2)^{-\frac12}\exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} = \exp\left\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{\mu^2}{2\sigma^2} - \frac12\ln(2\pi\sigma^2)\right\}$$

i.e. $A_1(\theta) = -\frac{1}{2\sigma^2}$, $A_2(\theta) = \frac{\mu}{\sigma^2}$, $B_1(x) = x^2$, $B_2(x) = x$, $C(x) = 0$ and $D(\theta) = -\frac{\mu^2}{2\sigma^2} - \frac12\ln(2\pi\sigma^2)$.

Page 14: Some distributions belonging to the exponential family

| Distribution | $f(x;\theta)$ | $A(\theta)$ | $B(x)$ | $C(x)$ | $D(\theta)$ |
|---|---|---|---|---|---|
| k = 1 | | | | | |
| Binomial | $\binom{n}{x}p^x(1-p)^{n-x}$ | $\ln\frac{p}{1-p}$ | $x$ | $\ln\binom{n}{x}$ | $n\ln(1-p)$ |
| Poisson | (exercise; see next page) | | | | |
| Exponential | $\theta e^{-\theta x}$ | $-\theta$ | $x$ | $0$ | $\ln\theta$ |
| $N(0,\sigma^2)$ | $(2\pi\sigma^2)^{-1/2}e^{-x^2/2\sigma^2}$ | $-\frac{1}{2\sigma^2}$ | $x^2$ | $0$ | $-\frac12\ln(2\pi\sigma^2)$ |
| $N(\mu,1)$ | $(2\pi)^{-1/2}e^{-(x-\mu)^2/2}$ | $\mu$ | $x$ | $-\frac{x^2}{2}$ | $-\frac12\ln(2\pi)-\frac{\mu^2}{2}$ |
| Gamma (1 param) | $\frac{\theta^r x^{r-1}e^{-\theta x}}{(r-1)!}$ | $-\theta$ | $x$ | $(r-1)\ln x$ | $r\ln\theta - \ln((r-1)!)$ |
| k = 2 | | | | | |
| $N(\mu,\sigma^2)$ | $(2\pi\sigma^2)^{-1/2}e^{-(x-\mu)^2/2\sigma^2}$ | $A_1 = -\frac{1}{2\sigma^2}$, $A_2 = \frac{\mu}{\sigma^2}$ | $B_1 = x^2$, $B_2 = x$ | $0$ | $-\frac{\mu^2}{2\sigma^2} - \frac12\ln(2\pi\sigma^2)$ |
| Gamma (2 param) | (exercise; see next page) | | | | |

Table: Some members of the exponential family of distributions

Page 15: Exercise

Fill in the gaps in the table above for the following:

Poisson: $\dfrac{e^{-\theta}\theta^x}{x!}$

Gamma (two parameters): $\dfrac{\beta^\alpha x^{\alpha-1}e^{-\beta x}}{\Gamma(\alpha)}$

Page 16: Natural Parameterization

Letting $\phi_j = A_j(\theta)$, $j = 1,\dots,k$, the exponential form becomes
$$f(x;\theta) = \exp\left\{\sum_{j=1}^k \phi_j B_j(x) + C(x) + D(\phi)\right\}$$
The parameters $\phi_1,\dots,\phi_k$ are called natural or canonical parameters.

Exponential in terms of its natural parameter $\phi = -\theta$:
$$-\phi e^{\phi x}$$

Normal in terms of its natural parameters $\phi_1 = -\frac{1}{2\sigma^2}$, $\phi_2 = \frac{\mu}{\sigma^2}$:
$$\exp\left\{\phi_1 x^2 + \phi_2 x + \frac{\phi_2^2}{4\phi_1} - \frac12\ln\left(-\frac{\pi}{\phi_1}\right)\right\}$$

Page 17: MLEs of Natural Parameters

Theorem (MLEs of Natural Parameters)

Suppose $X_1,\dots,X_n$ form a random sample from a distribution which is a member of the $k$-parameter exponential family with pdf
$$f(x;\theta) = \exp\left\{\sum_{j=1}^k \phi_j B_j(x) + C(x) + D(\phi)\right\}$$
then the MLEs of $\phi_1,\dots,\phi_k$ are found by solving the equations
$$t_j = E[T_j], \quad j = 1,\dots,k,$$
where $T_j = \sum_{i=1}^n B_j(X_i)$ and $t_j = \sum_{i=1}^n B_j(x_i)$, $j = 1,\dots,k$.
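As a quick illustration (a standard calculation, consistent with the exponential example on page 4): for the exponential distribution, $B(x) = x$, so $T = \sum_{i=1}^n X_i$ and $E[T] = n/\theta$. Solving $t = E[T]$, i.e. $\sum_{i=1}^n x_i = n/\theta$, gives $\hat\theta = 1/\bar x$, agreeing with the direct maximisation.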

Page 18

Proof. The likelihood function is

$$L(\phi;\mathbf{x}) = \prod_{i=1}^n f(x_i;\phi) = \prod_{i=1}^n \exp\left\{\sum_{j=1}^k \phi_j B_j(x_i) + C(x_i) + D(\phi)\right\}$$
$$= \exp\left\{\sum_{j=1}^k \phi_j \sum_{i=1}^n B_j(x_i) + \sum_{i=1}^n C(x_i) + nD(\phi)\right\}$$
$$= \exp\left\{\sum_{j=1}^k \phi_j t_j + \sum_{i=1}^n C(x_i) + nD(\phi)\right\}$$
$$\Rightarrow l(\phi;\mathbf{x}) = \text{constant} + \sum_{j=1}^k \phi_j t_j + nD(\phi)$$

Page 19

Proof (continued). From the log-likelihood,

$$l(\phi;\mathbf{x}) = \text{constant} + \sum_{j=1}^k \phi_j t_j + nD(\phi) \;\Rightarrow\; \frac{\partial l}{\partial\phi_j} = t_j + n\frac{\partial D(\phi)}{\partial\phi_j}$$

Furthermore,
$$E\left[\frac{\partial l}{\partial\phi_j}\right] = 0, \text{ so } E[T_j] = -n\frac{\partial D(\phi)}{\partial\phi_j},$$
hence
$$\frac{\partial l}{\partial\phi_j} = t_j - E[T_j]$$
and so solving $\frac{\partial l}{\partial\phi_j} = 0$ is equivalent to solving $t_j = E[T_j]$.

Moreover, it can be shown (not here) that if these equations have a solution then it is the unique MLE (thus there is no need to check second derivatives). See Bickel and Doksum (1977), Mathematical Statistics: Basic Ideas and Selected Topics, Holden-Day, San Francisco.

Page 20: Example: N(µ,1) distribution

For the $N(\mu,1)$ distribution,
$$A(\theta) = \mu \quad\text{and}\quad B(x) = x$$
Therefore $T = \sum_{i=1}^n X_i$ and
$$E[T] = nE[X_i] = n\mu$$
Setting $t = E[T]$ gives
$$\sum_{i=1}^n x_i = n\mu$$
and solving for $\mu$ gives the MLE $\hat\mu = \bar X$.

Page 21: The Cramér-Rao Inequality and Lower Bound

Theorem (The Cramér-Rao Inequality and Lower Bound)

Suppose $X_1,\dots,X_n$ form a random sample from the distribution with pdf $f(x;\theta)$. Subject to certain regularity conditions on $f(x;\theta)$, we have that for any unbiased estimator $\hat\theta$ for $\theta$,
$$\mathrm{Var}[\hat\theta] \ge I_\theta^{-1}$$
where $I_\theta$ is the Fisher information about $\theta$,
$$I_\theta = E\left[\left(\frac{\partial\ln[L(\theta;\mathbf{x})]}{\partial\theta}\right)^2\right] = E\left[\left(\frac{\partial l}{\partial\theta}\right)^2\right].$$
$I_\theta^{-1}$ is known as the Cramér-Rao lower bound.
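To make the bound concrete (a standard calculation, not on the original slide): for the exponential sample of page 4, $\frac{\partial l}{\partial\theta} = \frac{n}{\theta} - \sum_i x_i$ with $E\left[\sum_i X_i\right] = n/\theta$, so
$$I_\theta = E\left[\left(\frac{n}{\theta} - \sum_i X_i\right)^2\right] = \mathrm{Var}\left[\sum_i X_i\right] = \frac{n}{\theta^2},$$
and any unbiased estimator of $\theta$ has variance at least $\theta^2/n$.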

Page 22

Proof.

For unbiased $\hat\theta$,
$$E[\hat\theta] = \int_{\mathbf{x}} \hat\theta\, L(\theta;\mathbf{x})\, d\mathbf{x} = \theta$$
Under regularity conditions, differentiating both sides with respect to $\theta$,
$$\int \hat\theta\, \frac{\partial L}{\partial\theta}\, d\mathbf{x} = 1$$
Now
$$\frac{\partial l}{\partial\theta} = \frac{\partial\ln L}{\partial\theta} = \frac{1}{L}\frac{\partial L}{\partial\theta} \;\Rightarrow\; \frac{\partial L}{\partial\theta} = L\frac{\partial l}{\partial\theta}.$$
Therefore
$$1 = \int \hat\theta\,\frac{\partial L}{\partial\theta}\, d\mathbf{x} = E\left[\hat\theta\,\frac{\partial l}{\partial\theta}\right].$$

Page 23

Proof (continued).

We can then prove the result using the Cauchy-Schwarz inequality. Let $U = \hat\theta$ and $V = \frac{\partial l}{\partial\theta}$. Then
$$E[V] = \int \frac{\partial l}{\partial\theta}\, L\, d\mathbf{x} = \int \frac{\partial L}{\partial\theta}\, d\mathbf{x} = \frac{\partial}{\partial\theta}\left[\int L\, d\mathbf{x}\right] = 0$$
Therefore
$$\mathrm{Cov}[U,V] = E[UV] - E[U]E[V] = E[UV] = E\left[\hat\theta\,\frac{\partial l}{\partial\theta}\right] = 1$$
Also
$$\mathrm{Var}[V] = E[V^2] = E\left[\left(\frac{\partial l}{\partial\theta}\right)^2\right] = I_\theta$$
So by the Cauchy-Schwarz inequality,
$$(\mathrm{Cov}[U,V])^2 \le \mathrm{Var}[U]\,\mathrm{Var}[V] \;\Rightarrow\; 1 \le \mathrm{Var}(\hat\theta)\, I_\theta$$

Page 24

Comments:

- The larger $I_\theta$ is, the more information we have in the data about $\theta$, hence the lower the attainable variance of $\hat\theta$.
- Regularity conditions required to exchange integration and differentiation in the proof include that the range of values of $X$ must not depend on $\theta$.
- An unbiased estimator $\hat\theta$ whose variance attains the Cramér-Rao lower bound is called efficient.

Page 25: Example

Suppose $X_1,\dots,X_n$ form a random sample from $N(\mu,\sigma^2)$ with $\sigma^2$ known.
$$L = \prod_{i=1}^n f(x_i;\mu) = \prod_{i=1}^n (2\pi\sigma^2)^{-\frac12}\exp\left\{-\frac{1}{2\sigma^2}(x_i-\mu)^2\right\}$$
$$\Rightarrow l = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2$$
$$\Rightarrow \frac{\partial l}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i-\mu) = \frac{n}{\sigma^2}(\bar x - \mu)$$
$$\therefore \hat\mu = \bar X \text{ is the MLE.}$$

Page 26: Example (continued)

$$I_\theta = E\left[\left(\frac{\partial l}{\partial\mu}\right)^2\right] = E\left[\frac{n^2}{\sigma^4}(\bar X-\mu)^2\right] = \frac{n^2}{\sigma^4}E\left[(\bar X-\mu)^2\right] = \frac{n^2}{\sigma^4}\mathrm{Var}[\bar X] = \frac{n^2}{\sigma^4}\cdot\frac{\sigma^2}{n} = \frac{n}{\sigma^2}$$

Thus the lower bound is $I_\theta^{-1} = \frac{\sigma^2}{n}$, which is attained by $\hat\mu = \bar X$, hence $\hat\mu$ is an efficient estimator.

$\hat\mu$ may also be referred to as a minimum variance unbiased estimator (MVUE).

Page 27: Exercise

Under the same regularity conditions as before, show that $I_\theta$ can be expressed in the more useful form
$$I_\theta = -E\left[\frac{\partial^2 l}{\partial\theta^2}\right].$$
Using this result, show that the ML estimator obtained earlier for the parameter of a Poisson distribution attains the Cramér-Rao lower bound.

Page 28: Properties of MLEs

Theorem

Suppose $\theta$ and $\phi$ represent two alternative parameterizations and that $\phi$ is a one-to-one function of $\theta$, so we can write
$$\phi = g(\theta), \quad \theta = h(\phi)$$
for appropriate $g$ and $h$. Then if $\hat\theta$ is the MLE of $\theta$, the MLE of $\phi$ is $g(\hat\theta)$.

Page 29

Proof.

Suppose the value of $\phi$ that maximises $L$ corresponds to some $\tilde\theta \neq \hat\theta$, so that
$$L(g(\tilde\theta);\mathbf{x}) > L(g(\hat\theta);\mathbf{x})$$
Taking the inverse function $h(\cdot)$ we have that
$$L(\tilde\theta;\mathbf{x}) > L(\hat\theta;\mathbf{x})$$
so $\hat\theta$ is not the MLE, a contradiction.

Page 30: Invariance of MLE

Theorem (Invariance of MLE)

Let $\hat\theta_1,\dots,\hat\theta_k$ be an MLE for $\theta_1,\dots,\theta_k$. If
$$T(\theta) = (T_1(\theta),\dots,T_r(\theta))$$
is a transformation of the parameter space $\Omega$, then
$$T(\hat\theta) = (T_1(\hat\theta),\dots,T_r(\hat\theta))$$
is an MLE of $T(\theta)$.

Page 31: Example

Consider $X_1,\dots,X_n \sim N(\mu,\sigma^2)$, with $\mu$, $\sigma^2$ both unknown. Then the log-likelihood is
$$l = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2$$
Could find $\hat\sigma^2$ from
$$\frac{\partial l}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i-\mu)^2$$
or
$$\frac{\partial l}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (x_i-\mu)^2$$

Page 32: Example (continued)

Substituting $\hat\mu = \bar X$ and setting the second equation to zero:
$$\frac{1}{\sigma^3}\sum_{i=1}^n (x_i-\bar X)^2 = \frac{n}{\sigma}$$
Could solve for $\hat\sigma$ and square, but it is easier to solve for $\hat\sigma^2$ directly:
$$\hat\sigma^2 = \frac{\sum_{i=1}^n (x_i-\bar X)^2}{n}$$
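A one-line check in R (a sketch with assumed data; note that R's var() uses the divisor n - 1, not the MLE's n):

set.seed(9)
x <- rnorm(50, mean = 1, sd = 2)         # assumed example data
sum((x - mean(x))^2) / length(x)         # the MLE of sigma^2, divisor n
var(x) * (length(x) - 1) / length(x)     # var() rescaled; identical value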

Page 33: Example

By invariance, we can find MLEs for parameters of distributions in the exponential family via the natural parameterisation. E.g. for the Poisson distribution,
$$A(\theta) = \ln\theta \quad\text{and}\quad B(x) = x$$
Therefore $T = \sum_{i=1}^n X_i$ and
$$E[T] = nE[X_i] = n\theta$$
Setting $t = E[T]$ gives
$$\sum_{i=1}^n x_i = n\theta$$
and solving for $\theta$ gives the MLE $\hat\theta = \bar X$.
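A numerical check (a sketch with assumed example counts) that the Poisson log-likelihood is indeed maximised at the sample mean:

x <- c(2, 4, 3, 0, 1, 5, 2, 3)          # assumed example counts
loglik <- function(theta) sum(dpois(x, theta, log = TRUE))
optimise(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
mean(x)                                 # agrees with the numerical maximiser (2.5)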

Page 34: Lemmas

Lemma

Suppose there exists an unbiased estimator, $\tilde\theta$, which attains the Cramér-Rao bound. Suppose that the MLE $\hat\theta$ is a solution to
$$\frac{\partial l}{\partial\theta} = 0.$$
Then $\hat\theta = \tilde\theta$.

Page 35

Lemma

Under fairly weak regularity conditions, MLEs are consistent. If $\hat\theta$ is the MLE for $\theta$, then asymptotically
$$\hat\theta \sim N(\theta, I_\theta^{-1})$$
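A Monte Carlo illustration of this asymptotic normality (a sketch; theta, n, and the seed are assumed): for the exponential example, $I_\theta = n/\theta^2$, so the MLE $1/\bar X$ should have variance close to $\theta^2/n$.

set.seed(7)
theta <- 2; n <- 200
mle <- replicate(5000, 1 / mean(rexp(n, rate = theta)))
c(mean(mle), theta)       # approximately centred at theta
c(var(mle), theta^2 / n)  # variance close to I_theta^{-1} = 0.02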

Page 36: Example

For the normal distribution $N(\mu,\sigma^2)$:
- the MLE $\hat\mu$ is an efficient estimator for $\mu$;
- the MLE $\hat\sigma^2$ is asymptotically efficient for $\sigma^2$, but not efficient for finite sample sizes.

Page 37: MLE and properties of MLE for the multi-parameter case

Let $X_1,\dots,X_n$ be iid with common pdf $f(x;\boldsymbol\theta)$, where $\boldsymbol\theta \in \Omega_{\boldsymbol\theta} \subset \mathbb{R}^p$.

Likelihood function:
$$L(\boldsymbol\theta) = \prod_{i=1}^n f(x_i;\boldsymbol\theta)$$
$$l(\boldsymbol\theta) = \log L(\boldsymbol\theta) = \sum_{i=1}^n \log f(x_i;\boldsymbol\theta)$$

The maximum likelihood estimator (MLE) $\hat{\boldsymbol\theta}$ solves the vector equation
$$\frac{\partial}{\partial\boldsymbol\theta}\, l(\boldsymbol\theta) = \mathbf{0}.$$

Page 38: Properties

The following properties extend to $\hat{\boldsymbol\theta}$ from the scalar case:

Invariance: let $\eta = g(\boldsymbol\theta)$; then $\hat\eta = g(\hat{\boldsymbol\theta})$ is the MLE of $\eta$.

Page 39: Properties of MLE, multi-parameter case

Theorem (Properties of MLE, multi-parameter case)

Under a set of regularity conditions:

1. Consistency: the likelihood equation
$$\frac{\partial}{\partial\boldsymbol\theta}\, l(\boldsymbol\theta) = \mathbf{0}$$
has a solution $\hat{\boldsymbol\theta}_n$ such that $\hat{\boldsymbol\theta}_n \xrightarrow{P} \boldsymbol\theta$.

2. Asymptotic normality:
$$\sqrt{n}(\hat{\boldsymbol\theta}_n - \boldsymbol\theta) \xrightarrow{D} \mathrm{MVN}_p(\mathbf{0}, I^{-1}(\boldsymbol\theta)).$$

Page 40

Theorem (continued)

$I(\boldsymbol\theta)$ is the Fisher information matrix with entries
$$I_{ii}(\boldsymbol\theta) = \mathrm{Var}\left[\frac{\partial\log f(X;\boldsymbol\theta)}{\partial\theta_i}\right] = -E\left[\frac{\partial^2}{\partial\theta_i^2}\log f(X;\boldsymbol\theta)\right]$$
$$I_{jk}(\boldsymbol\theta) = \mathrm{Cov}\left[\frac{\partial\log f(X;\boldsymbol\theta)}{\partial\theta_j}, \frac{\partial\log f(X;\boldsymbol\theta)}{\partial\theta_k}\right] = -E\left[\frac{\partial^2}{\partial\theta_j\,\partial\theta_k}\log f(X;\boldsymbol\theta)\right]$$
for $i,j,k = 1,\dots,p$.

Page 41: Comments

Cramér-Rao bound in the multi-parameter case: let $\hat\theta_{n,j}$ be an unbiased estimator of $\theta_j$. Then it can be shown that
$$\mathrm{Var}(\hat\theta_{n,j}) \ge \frac{1}{n} I^{-1}_{jj}(\boldsymbol\theta).$$
The unbiased estimator is efficient if it attains the lower bound.

$\hat{\boldsymbol\theta}_n$ are asymptotically efficient estimators, that is, for $j = 1,\dots,p$,
$$\sqrt{n}(\hat\theta_{n,j} - \theta_j) \xrightarrow{D} N(0, I^{-1}_{jj}(\boldsymbol\theta)).$$

Page 42: Transformation

Theorem (Transformation)

Let $g$ be a transformation $g(\boldsymbol\theta) = (g_1(\boldsymbol\theta),\dots,g_k(\boldsymbol\theta))^T$ such that $1 \le k \le p$ and such that the $k \times p$ matrix of partial derivatives
$$B = \left[\frac{\partial g_i}{\partial\theta_j}\right], \quad i = 1,\dots,k;\; j = 1,\dots,p,$$
has continuous elements and does not vanish in the neighbourhood of $\boldsymbol\theta$. Let $\hat\eta = g(\hat{\boldsymbol\theta})$. Then $\hat\eta$ is the MLE of $\eta = g(\boldsymbol\theta)$ and
$$\sqrt{n}(\hat\eta - \eta) \xrightarrow{D} \mathrm{MVN}_k(\mathbf{0}, B I^{-1}(\boldsymbol\theta) B').$$

Page 43: Computation of MLEs: the Newton-Raphson Method

Let $g(\boldsymbol\theta)$ be the gradient of $l(\boldsymbol\theta;\mathbf{x})$ and let $H(\boldsymbol\theta)$ denote the matrix of second derivatives (i.e. the Hessian matrix). Suppose $\boldsymbol\theta_0$ is an initial estimate of $\boldsymbol\theta$ and $\hat{\boldsymbol\theta}$ is the MLE. Expanding $g(\hat{\boldsymbol\theta})$ about $\boldsymbol\theta_0$ using the Taylor expansion gives
$$g(\hat{\boldsymbol\theta}) = g(\boldsymbol\theta_0) + (\hat{\boldsymbol\theta} - \boldsymbol\theta_0)^T H(\boldsymbol\theta_0) + \dots$$
$$\Rightarrow \mathbf{0} = g(\boldsymbol\theta_0) + (\hat{\boldsymbol\theta} - \boldsymbol\theta_0)^T H(\boldsymbol\theta_0) + \dots$$
Therefore $\hat{\boldsymbol\theta}$ is approximated by
$$\boldsymbol\theta_1 = \boldsymbol\theta_0 - g(\boldsymbol\theta_0) H^{-1}(\boldsymbol\theta_0)$$
Begin again with the improved estimate $\boldsymbol\theta_1$ and iterate until convergence.
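A minimal R sketch of the iteration, for the two-parameter Gamma($\alpha$, $\beta$) model where no closed form exists (the data, seed, and starting rule are assumptions; the score and Hessian follow from $l = n\alpha\ln\beta + (\alpha-1)\sum\ln x_i - \beta\sum x_i - n\ln\Gamma(\alpha)$):

set.seed(1)
x <- rgamma(100, shape = 2, rate = 0.5)      # assumed example data
n <- length(x); sx <- sum(x); slx <- sum(log(x))
# Method-of-moments starting values keep N-R in its region of convergence
a <- mean(x)^2 / var(x); b <- mean(x) / var(x)
for (i in 1:50) {
  g <- c(n * log(b) + slx - n * digamma(a),  # score: dl/dalpha
         n * a / b - sx)                     # score: dl/dbeta
  H <- matrix(c(-n * trigamma(a), n / b,
                n / b, -n * a / b^2), 2, 2)  # Hessian of l
  step <- solve(H, g)                        # H^{-1} g(theta_0)
  a <- a - step[1]; b <- b - step[2]         # theta_1 = theta_0 - H^{-1} g
  if (max(abs(step)) < 1e-8) break           # iterate until convergence
}
c(alpha = a, beta = b)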

Page 44: Fisher's Method of Scoring

A simple modification of N-R in which $H(\boldsymbol\theta)$ is replaced by its expectation,
$$E[H(\boldsymbol\theta)] = -I_{\boldsymbol\theta}$$
Now (under the usual regularity conditions)
$$E\left[\left(\frac{\partial l}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2 l}{\partial\theta^2}\right] \quad\text{and so}\quad E\left[\frac{\partial^2 l}{\partial\theta_i\,\partial\theta_j}\right] = -E\left[\frac{\partial l}{\partial\theta_i}\frac{\partial l}{\partial\theta_j}\right],$$
therefore we need only calculate the score vector of first derivatives.

Also $I_{\boldsymbol\theta} = -E[H(\boldsymbol\theta)]$ is positive definite, thus eliminating possible non-convergence problems of N-R.
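A one-parameter sketch of scoring for the exponential rate (assumed data and seed): here $I_\theta = n/\theta^2$, so the update is $\theta_{i+1} = \theta_i + I_\theta^{-1} U(\theta_i)$ with score $U(\theta) = n/\theta - \sum x_i$.

set.seed(5)
x <- rexp(100, rate = 2)              # assumed example data
n <- length(x); sx <- sum(x)
theta <- 1                            # initial estimate
for (i in 1:100) {
  score <- n / theta - sx
  step <- (theta^2 / n) * score       # I_theta^{-1} * score
  theta <- theta + step
  if (abs(step) < 1e-10) break
}
c(theta, 1 / mean(x))                 # agrees with the closed-form MLE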

Page 45: N-R and the Exponential Families of Distributions

In this case N-R and Fisher's method of scoring are equivalent. Using the natural parametrization,
$$l(\phi;\mathbf{x}) = \text{constant} + \sum_{j=1}^k \phi_j t_j + nD(\phi)$$
thus
$$\frac{\partial l}{\partial\phi_j} = t_j + n\frac{\partial D(\phi)}{\partial\phi_j} \quad\text{and}\quad \frac{\partial^2 l}{\partial\phi_i\,\partial\phi_j} = n\frac{\partial^2 D(\phi)}{\partial\phi_i\,\partial\phi_j}$$
As $D(\phi)$ does not depend on $\mathbf{x}$, $H(\phi)$ and $E[H]$ are identical.

Page 46: The EM Algorithm

Suppose the data are decomposed into observed (incomplete data) and missing (augmented data) values

$$\mathbf{x} = (\mathbf{x}_0, \mathbf{x}_m)$$
$$L(\theta|\mathbf{x}_0) = \underbrace{g(\mathbf{x}_0|\theta)}_{\text{incomplete-data likelihood}} = \int \underbrace{f(\mathbf{x}_0,\mathbf{x}_m|\theta)}_{\text{complete-data likelihood}}\, d\mathbf{x}_m$$

We would like to maximise $L(\theta|\mathbf{x}_0)$, but this may be difficult to do directly. The EM algorithm maximises $L(\theta|\mathbf{x}_0)$ by working with $f(\mathbf{x}_0,\mathbf{x}_m|\theta)$.

We have
$$\frac{f(\mathbf{x}_0,\mathbf{x}_m|\theta)}{g(\mathbf{x}_0|\theta)} = k(\mathbf{x}_m|\theta,\mathbf{x}_0), \quad\text{so}\quad g(\mathbf{x}_0|\theta) = \frac{f(\mathbf{x}_0,\mathbf{x}_m|\theta)}{k(\mathbf{x}_m|\theta,\mathbf{x}_0)}$$

Page 47: The EM Algorithm (cont.)

Taking logs,
$$\ln g(\mathbf{x}_0|\theta) = \ln f(\mathbf{x}_0,\mathbf{x}_m|\theta) - \ln k(\mathbf{x}_m|\theta,\mathbf{x}_0)$$
or
$$l(\theta|\mathbf{x}_0) = l(\theta|\mathbf{x}_0,\mathbf{x}_m) - \ln k(\mathbf{x}_m|\theta,\mathbf{x}_0)$$
Let $\theta_i$ be a temporary estimate of $\theta$; now taking expectations under $k(\mathbf{x}_m|\theta_i,\mathbf{x}_0)$ (the left-hand side does not depend on $\mathbf{x}_m$),
$$l(\theta|\mathbf{x}_0) = E[l(\theta|\mathbf{x}_0,\mathbf{x}_m)|\theta_i,\mathbf{x}_0] - E[\ln k(\mathbf{x}_m|\theta,\mathbf{x}_0)|\theta_i,\mathbf{x}_0].$$
Let $Q(\theta|\theta_i) = E[l(\theta|\mathbf{x}_0,\mathbf{x}_m)|\theta_i,\mathbf{x}_0]$ and $H(\theta|\theta_i) = E[\ln k(\mathbf{x}_m|\theta,\mathbf{x}_0)|\theta_i,\mathbf{x}_0]$. Since for any $\theta$, $H(\theta|\theta_i) \le H(\theta_i|\theta_i)$, the log-likelihood increases whenever we increase $Q$. We therefore seek to maximise $Q(\theta|\theta_i)$.

Page 48: The EM Algorithm

Let $\theta_0$ be an initial estimate of $\theta$. The algorithm iterates as follows:

E-step: calculate the expected complete-data log-likelihood,
$$Q(\theta|\theta_i) = E[l(\theta|\mathbf{x}_0,\mathbf{x}_m)|\theta_i,\mathbf{x}_0] = \int l(\theta|\mathbf{x}_0,\mathbf{x}_m)\, k(\mathbf{x}_m|\theta_i,\mathbf{x}_0)\, d\mathbf{x}_m.$$

M-step: maximise $Q(\theta|\theta_i)$ w.r.t. $\theta$ to obtain a new estimate $\theta_{i+1}$.

We then iterate through the E- and M-steps until convergence (e.g. until $|\theta_{i+1} - \theta_i|$ is small) to the incomplete-data MLE.
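A minimal EM sketch in R for a two-component normal mixture with unit variances (a classic incomplete-data example; the data, model, and starting values are assumptions, not from the slides). The E-step computes membership probabilities under $k(\mathbf{x}_m|\theta_i,\mathbf{x}_0)$; the M-step maximises $Q(\theta|\theta_i)$ in closed form.

set.seed(3)
x <- c(rnorm(150, 0), rnorm(100, 4))             # assumed example data
p <- 0.5; mu1 <- -1; mu2 <- 1                    # initial estimate theta_0
for (i in 1:500) {
  # E-step: posterior probability that each observation came from component 1
  w <- p * dnorm(x, mu1) / (p * dnorm(x, mu1) + (1 - p) * dnorm(x, mu2))
  # M-step: closed-form maximisers of Q(theta | theta_i)
  p.new <- mean(w)
  mu1.new <- sum(w * x) / sum(w)
  mu2.new <- sum((1 - w) * x) / sum(1 - w)
  if (max(abs(c(p.new - p, mu1.new - mu1, mu2.new - mu2))) < 1e-8) break
  p <- p.new; mu1 <- mu1.new; mu2 <- mu2.new
}
c(p = p, mu1 = mu1, mu2 = mu2)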
