Research Method Lecture 13 (Greene Ch 16)
Maximum Likelihood Estimation (MLE)

Page 1:


Page 2:

Basic idea

Maximum likelihood estimation (MLE) is a method for finding the density function that is most likely to have generated the data.

Thus, MLE requires you to make a distributional assumption first.

This handout provides the intuition behind MLE using examples.

Page 3:

Example 1

Let me explain the basic idea of MLE using the data below.

Let us assume that the variable X follows a normal distribution. Recall that the density function of a normal distribution with mean μ and variance σ² is given by:

Id  X
1   1
2   4
3   5
4   6
5   9

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty$$

Page 4:

The data are plotted on the horizontal line. Now, ask yourself the following question: “Which distribution, A or B, is more likely to have generated the data?”

[Figure: two candidate normal densities, A and B, drawn over the data points 1, 4, 5, 6, 9 on the horizontal axis.]

Page 5:

The answer is A, because the data are clustered around the center of distribution A, but not around the center of distribution B.

This example illustrates that, by looking at the data, it is possible to find the distribution that is most likely to have generated the data.

Now, I will explain exactly how to find the distribution in practice.

Page 6:

The estimation procedure

MLE starts with computing the likelihood contribution of each observation.

The likelihood contribution is the height of the density function at the observed data value. We use $L_i$ to denote the likelihood contribution of the ith observation.

Page 7:

Graphical illustration of the likelihood contribution

[Figure: density A drawn over the data points 1, 4, 5, 6, 9; the height of the density at the first data value, x = 1, is that observation's likelihood contribution.]

The likelihood contribution of the first observation is:

$$L_1 = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1-\mu)^2/(2\sigma^2)}$$
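To make this concrete, here is a minimal sketch in Python (my own illustration, not part of the original slides) that evaluates the likelihood contribution of each observation at the trial parameter values μ = 5 and σ = 2:

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

data = [1, 4, 5, 6, 9]

# Likelihood contribution L_i of each observation at trial values mu=5, sigma=2
contributions = [normal_pdf(x, mu=5, sigma=2) for x in data]
```

Observations near the trial mean contribute more: the data value 5 sits at the peak of the density, while 1 and 9 sit in the tails.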

Page 8:

Then you multiply together the likelihood contributions of all the observations. This is called the likelihood function, denoted L. In our example, n = 5.

$$L = \prod_{i=1}^{n} L_i$$

The product notation means you multiply over i = 1 through n.
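As a sketch (mine, not from the slides), the likelihood function can be formed by multiplying the density heights, and a density centered on the data (like A above) yields a larger value than one centered far away (like B):

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def likelihood(mu, sigma, data):
    """Likelihood function: the product of the n likelihood contributions."""
    L = 1.0
    for x in data:
        L *= normal_pdf(x, mu, sigma)
    return L

data = [1, 4, 5, 6, 9]
```

For example, `likelihood(5, 2, data)` is larger than `likelihood(20, 2, data)`, echoing the A-versus-B comparison.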

Page 9:

In our example, the likelihood function looks like:

$$L(\mu,\sigma) = \prod_{i=1}^{5} L_i = L_1 \times L_2 \times L_3 \times L_4 \times L_5$$

$$= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(1-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(4-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(5-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(6-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(9-\mu)^2/(2\sigma^2)}$$

I wrote L(μ, σ) to emphasize that the likelihood function depends on these parameters.

Page 10:

Then you find the values of μ and σ that maximize the likelihood function.

The values of μ and σ which are obtained this way are called the Maximum Likelihood Estimators of μ and σ.

Most MLE problems cannot be solved by hand. Thus, you need an iterative numerical procedure to solve them on a computer.

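As an illustration (mine, not the slides'), even a crude grid search over (μ, σ) recovers the maximum likelihood estimates for Example 1. For the normal model the MLE is also known in closed form: the sample mean and the square root of the mean squared deviation.

```python
import math

data = [1, 4, 5, 6, 9]

def log_likelihood(mu, sigma, data):
    """Log of the likelihood function (sums are easier to handle than products)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

# Crude grid search: mu in [3, 7], sigma in [1, 5], step 0.01
best_mu, best_sigma = max(
    ((m / 100, s / 100) for m in range(300, 701) for s in range(100, 501)),
    key=lambda p: log_likelihood(p[0], p[1], data),
)
```

The grid maximum lands at μ ≈ 5 and σ ≈ √6.8 ≈ 2.61. Real software replaces the grid with Newton-type iterations, which is what the "iterative procedure" above refers to.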

Page 11:

Fortunately, there are many computer optimization programs that can do this.

The most common program among economists is GQOPT. This program runs on FORTRAN, so you need to write a FORTRAN program.

Even more fortunately, many of the models that require MLE (like Probit or Logit models) can be estimated automatically in STATA.

However, you still need to understand the basic idea of MLE in order to understand what STATA does.

Page 12:

Example 2

Example 1 was the simplest case.

We are usually interested in estimating a model like y=β0+β1x+u.

Estimating such a model can be done using MLE.

Page 13:

Suppose that you have this data, and you are interested in estimating the model: y=β0+β1x+u

Let us assume that u follows a normal distribution with mean 0 and variance σ².

Id  Y   X
1   2   1
2   6   4
3   7   5
4   9   6
5   15  9

Page 14:

You can write the model as:

u = y − (β0 + β1x)

This means that y − (β0 + β1x) follows the normal distribution with mean 0 and variance σ².

The likelihood contribution of each observation is the height of the density function of u at the data point y − β0 − β1x.

Page 15:

For example, the likelihood contribution of the 2nd observation is given by:

$$L_2 = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(6 - \beta_0 - 4\beta_1)^2/(2\sigma^2)}$$

where the density function of u is:

$$f(u) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-u^2/(2\sigma^2)}$$

[Figure: the density of u, with the residuals 2−β0−β1, 6−β0−4β1, 7−β0−5β1, 9−β0−6β1, 15−β0−9β1 marked on the horizontal axis; the height of the density at 6−β0−4β1 is the likelihood contribution of the 2nd observation.]

Page 16:

Then the likelihood function is given by:

$$L(\beta_0, \beta_1, \sigma) = \prod_{i=1}^{n} L_i = L_1 \times L_2 \times L_3 \times L_4 \times L_5$$

$$= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(2-\beta_0-\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(6-\beta_0-4\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(7-\beta_0-5\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(9-\beta_0-6\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(15-\beta_0-9\beta_1)^2/(2\sigma^2)}$$

The likelihood function is a function of β0, β1, and σ.

Page 17:

You choose the values of β0, β1, and σ that maximize the likelihood function. These are the maximum likelihood estimators of β0, β1, and σ.

Again, the maximization can easily be done using GQOPT or any other program with optimization routines (like MATLAB).

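To illustrate (my sketch, not part of the slides): under the normality assumption, the MLE of (β0, β1) in this linear model coincides with the OLS estimates, which have a closed form, and the MLE of σ is the root mean squared residual. We can check that the log likelihood really peaks there:

```python
import math

x = [1, 4, 5, 6, 9]
y = [2, 6, 7, 9, 15]
n = len(x)

def log_likelihood(b0, b1, sigma):
    """Log likelihood of y = b0 + b1*x + u with u ~ N(0, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (yi - b0 - b1 * xi) ** 2 / (2 * sigma ** 2)
        for xi, yi in zip(x, y)
    )

# Closed-form MLE: OLS slope and intercept, plus root mean squared residual
xbar, ybar = sum(x) / n, sum(y) / n
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
b0_hat = ybar - b1_hat * xbar
sigma_hat = math.sqrt(
    sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y)) / n
)
```

Perturbing either coefficient away from (b0_hat, b1_hat) lowers the log likelihood, which is what a numerical optimizer would detect and undo.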

Page 18:

Example 3

Consider the following model.

y*=β0+β1x+u

Sometimes, we only know whether y*≥0 or not.

Page 19:

The data contain a variable Y which is either 0 or 1.

If Y = 1, it means that y* ≥ 0.
If Y = 0, it means that y* < 0.

Id  Y  X
1   0  1
2   0  4
3   1  5
4   1  6
5   1  9

Page 20:

Then, what is the likelihood contribution of each observation? In this case, we only know whether y* ≥ 0 or y* < 0. We do not know the exact value of y*.

In such a case, we use the probability that y* ≥ 0 or y* < 0 as the likelihood contribution.

Now, let us assume that u follows the standard normal distribution (a normal distribution with mean 0 and variance 1).

Page 21:

Take the 2nd observation as an example. Since Y = 0 for this observation, we know y* < 0.

Thus, the likelihood contribution is:

$$L_2 = P(y^* < 0) = P(\beta_0 + 4\beta_1 + u < 0) = P(u < -\beta_0 - 4\beta_1) = \Phi(-\beta_0 - 4\beta_1) = \int_{-\infty}^{-\beta_0 - 4\beta_1} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

where Φ is the cumulative distribution function of the standard normal distribution.

[Figure: the standard normal density of u, with the points −β0−β1, −β0−4β1, −β0−5β1, −β0−6β1, −β0−9β1 marked on the horizontal axis; L2 is the area under the density to the left of −β0−4β1.]

Page 22:

Now, take the 3rd observation as an example. Since Y = 1 for this observation, we know y* ≥ 0.

Thus, the likelihood contribution is:

$$L_3 = P(y^* \ge 0) = P(\beta_0 + 5\beta_1 + u \ge 0) = P(u \ge -\beta_0 - 5\beta_1) = 1 - \Phi(-\beta_0 - 5\beta_1) = 1 - \int_{-\infty}^{-\beta_0 - 5\beta_1} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

[Figure: the standard normal density of u; L3 is the area under the density to the right of −β0−5β1.]

Page 23:

Thus, the likelihood function has the following, more complicated, form:

$$L(\beta_0, \beta_1) = \prod_{i=1}^{5} L_i = \Phi(-\beta_0 - \beta_1) \times \Phi(-\beta_0 - 4\beta_1) \times \left[1 - \Phi(-\beta_0 - 5\beta_1)\right] \times \left[1 - \Phi(-\beta_0 - 6\beta_1)\right] \times \left[1 - \Phi(-\beta_0 - 9\beta_1)\right]$$

Page 24:

You choose the values of β0 and β1 that maximize the likelihood function. These are the maximum likelihood estimators of β0 and β1.

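As a sketch (mine, not from the slides), this probit likelihood can be evaluated with the standard normal CDF, written here via the error function. A caveat: in this tiny dataset Y = 1 exactly when X ≥ 5, so the data are perfectly separated and an unconstrained maximum does not exist; the code therefore only evaluates and compares the likelihood at trial parameter values.

```python
import math

x = [1, 4, 5, 6, 9]
y = [0, 0, 1, 1, 1]

def Phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def likelihood(b0, b1):
    """Probit likelihood: Phi(-b0 - b1*x_i) if y_i = 0, else 1 - Phi(-b0 - b1*x_i)."""
    L = 1.0
    for xi, yi in zip(x, y):
        p0 = Phi(-b0 - b1 * xi)  # P(y* < 0) for this observation
        L *= p0 if yi == 0 else 1 - p0
    return L
```

At (β0, β1) = (0, 0) every contribution is 0.5, so L = 0.5⁵; a parameter vector that separates the zeros from the ones, such as the hypothetical trial value (−4.5, 1), yields a much larger likelihood.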

Page 25:

Procedure of the MLE

1. Compute the likelihood contribution of each observation: $L_i$ for i = 1, …, n.

2. Multiply all the likelihood contributions together to form the likelihood function L.

3. Maximize L by choosing the values of the parameters. The values that maximize L are the maximum likelihood estimators of the parameters.

$$L = \prod_{i=1}^{n} L_i$$

Page 26:

The log likelihood function

It is usually easier to maximize the natural log of the likelihood function than the likelihood function itself.  

$$\log(L) = \log\left(\prod_{i=1}^{n} L_i\right) = \sum_{i=1}^{n} \log(L_i)$$
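A quick sketch (mine) of why the log form is preferred: the log turns the product into a sum, which has exactly the same maximizer (log is monotone increasing) but is numerically far more stable, since a product of many small density values underflows to zero:

```python
import math

data = [1, 4, 5, 6, 9]
mu, sigma = 5.0, 2.0

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

contribs = [normal_pdf(xi, mu, sigma) for xi in data]

L = math.prod(contribs)                      # likelihood (product form)
logL = sum(math.log(Li) for Li in contribs)  # log likelihood (sum form)
```

With n = 5 both forms agree to floating-point precision; with thousands of observations only the sum form survives.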

Page 27:

The standard errors in MLE

This is usually an advanced topic. However, it is useful to know how the standard errors are computed in MLE, since we use them for t-tests.

Page 28:

The score vector is the first derivative of the log likelihood function with respect to the parameters.

Let θ be a column vector of the parameters. In Example 2, θ=(β0,β1,σ)’.

Then the score vector q is given by

$$q = \frac{\partial \log(L)}{\partial \theta}$$

Page 29:

Then, the standard errors of the parameters are given by the square roots of the diagonal elements of the following matrix, built from the per-observation scores $q_i$ (the outer-product-of-gradients, or BHHH, estimator):

$$\left[\sum_{i=1}^{n} q_i q_i'\right]^{-1}$$
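Putting the last two slides together for Example 1 (my sketch, not from the slides): the per-observation scores of the normal log likelihood are (x−μ)/σ² and −1/σ + (x−μ)²/σ³. At the MLE the scores sum to zero, and inverting the sum of their outer products gives the standard errors:

```python
import math

data = [1, 4, 5, 6, 9]
n = len(data)

# Closed-form MLE for the normal model of Example 1
mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

def score(x):
    """Score of one observation: gradient of log L_i with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1 / sigma + (x - mu) ** 2 / sigma ** 3
    return (d_mu, d_sigma)

scores = [score(x) for x in data]

# The full-sample score (sum over observations) should be ~0 at the MLE
total_score = [sum(s[k] for s in scores) for k in range(2)]

# Sum of outer products q_i q_i' -> 2x2 matrix [[a, b], [b, d]]
a = sum(s[0] * s[0] for s in scores)
b = sum(s[0] * s[1] for s in scores)
d = sum(s[1] * s[1] for s in scores)

# Invert the 2x2 matrix; standard errors are square roots of the diagonal
det = a * d - b * b
se_mu = math.sqrt(d / det)
se_sigma = math.sqrt(a / det)
```

For this symmetric dataset the cross term b is zero, so the standard error of μ collapses to the familiar σ/√n.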