Research Method Lecture 13 (Greene Ch 16)
Maximum Likelihood Estimation (MLE)

Page 1:


Page 2:

Basic idea

Maximum likelihood estimation (MLE) is a method for finding the density function that is most likely to have generated the data.

Thus, MLE requires you to make a distributional assumption first.

This handout provides the intuition behind MLE using examples.

Page 3:

Example 1

Let me explain the basic idea of MLE using the data below.

Let us assume that the variable X follows a normal distribution. Recall that the density function of a normal distribution with mean μ and variance σ² is given by:

Id  X
1   1
2   4
3   5
4   6
5   9

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty$$

Page 4:

The data are plotted on the horizontal line. Now, ask yourself the following question: “Which distribution, A or B, is more likely to have generated the data?”

[Figure: two candidate normal densities, A and B, drawn over the data points 1, 4, 5, 6, 9 on the horizontal axis.]

Page 5:

The answer is A, because the data are clustered around the center of distribution A, but not around the center of distribution B.

This example illustrates that, by looking at the data, it is possible to find the distribution that is most likely to have generated the data.

Now, I will explain exactly how to find the distribution in practice.

Page 6:

The estimation procedure

MLE starts with computing the likelihood contribution of each observation.

The likelihood contribution is the height of the density function at the observed data value. We use $L_i$ to denote the likelihood contribution of the ith observation.

Page 7:

Graphical illustration of the likelihood contribution

[Figure: density A drawn over the data points 1, 4, 5, 6, 9; the height of the density at the first data value, x = 1, is that observation's likelihood contribution.]

The likelihood contribution of the first observation is:

$$L_1 = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(1-\mu)^2/(2\sigma^2)}$$
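To make this concrete, here is a minimal sketch in Python (my own illustration, not part of the original slides) that evaluates the likelihood contribution of each observation at the trial parameter values μ = 5 and σ = 2:

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

data = [1, 4, 5, 6, 9]

# Likelihood contribution L_i of each observation at trial values mu=5, sigma=2
contributions = [normal_pdf(x, mu=5, sigma=2) for x in data]
```

Observations near the trial mean contribute more: the data value 5 sits at the peak of the density, while 1 and 9 sit in the tails.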

Page 8:

Then you multiply together the likelihood contributions of all the observations. This is called the likelihood function, denoted L. In our example, n = 5.

$$L = \prod_{i=1}^{n} L_i$$

The product notation means you multiply over i = 1 through n.
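As a sketch (mine, not from the slides), the likelihood function can be formed by multiplying the density heights, and a density centered on the data (like A above) yields a larger value than one centered far away (like B):

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def likelihood(mu, sigma, data):
    """Likelihood function: the product of the n likelihood contributions."""
    L = 1.0
    for x in data:
        L *= normal_pdf(x, mu, sigma)
    return L

data = [1, 4, 5, 6, 9]
```

For example, `likelihood(5, 2, data)` is larger than `likelihood(20, 2, data)`, echoing the A-versus-B comparison.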

Page 9:

In our example, the likelihood function looks like:

$$L(\mu,\sigma) = \prod_{i=1}^{5} L_i = L_1 \times L_2 \times L_3 \times L_4 \times L_5$$

$$= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(1-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(4-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(5-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(6-\mu)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(9-\mu)^2/(2\sigma^2)}$$

I wrote L(μ, σ) to emphasize that the likelihood function depends on these parameters.

Page 10:

Then you find the values of μ and σ that maximize the likelihood function.

The values of μ and σ which are obtained this way are called the Maximum Likelihood Estimators of μ and σ.

Most MLE problems cannot be solved by hand. Thus, you need an iterative numerical procedure to solve them on a computer.

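As an illustration (mine, not the slides'), even a crude grid search over (μ, σ) recovers the maximum likelihood estimates for Example 1. For the normal model the MLE is also known in closed form: the sample mean and the square root of the mean squared deviation.

```python
import math

data = [1, 4, 5, 6, 9]

def log_likelihood(mu, sigma, data):
    """Log of the likelihood function (sums are easier to handle than products)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

# Crude grid search: mu in [3, 7], sigma in [1, 5], step 0.01
best_mu, best_sigma = max(
    ((m / 100, s / 100) for m in range(300, 701) for s in range(100, 501)),
    key=lambda p: log_likelihood(p[0], p[1], data),
)
```

The grid maximum lands at μ ≈ 5 and σ ≈ √6.8 ≈ 2.61. Real software replaces the grid with Newton-type iterations, which is what the "iterative procedure" above refers to.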

Page 11:

Fortunately, there are many computer optimization programs that can do this.

The most common program among economists is GQOPT. This program runs on FORTRAN, so you need to write a FORTRAN program.

Even more fortunately, many of the models that require MLE (like Probit or Logit models) can be estimated automatically in STATA.

However, you still need to understand the basic idea of MLE in order to understand what STATA does.

Page 12:

Example 2

Example 1 was the simplest case.

We are usually interested in estimating a model like y=β0+β1x+u.

Estimating such a model can be done using MLE.

Page 13:

Suppose that you have this data, and you are interested in estimating the model: y=β0+β1x+u

Let us assume that u follows a normal distribution with mean 0 and variance σ².

Id  Y   X
1   2   1
2   6   4
3   7   5
4   9   6
5   15  9

Page 14:

You can write the model as:

u = y − (β0 + β1x)

This means that y − (β0 + β1x) follows the normal distribution with mean 0 and variance σ².

The likelihood contribution of each observation is the height of the density function of u at the data point y − β0 − β1x.

Page 15:

For example, the likelihood contribution of the 2nd observation is given by:

$$L_2 = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(6 - \beta_0 - 4\beta_1)^2/(2\sigma^2)}$$

where the density function of u is:

$$f(u) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-u^2/(2\sigma^2)}$$

[Figure: the density of u, with the residuals 2−β0−β1, 6−β0−4β1, 7−β0−5β1, 9−β0−6β1, 15−β0−9β1 marked on the horizontal axis; the height of the density at 6−β0−4β1 is the likelihood contribution of the 2nd observation.]

Page 16:

Then the likelihood function is given by:

$$L(\beta_0, \beta_1, \sigma) = \prod_{i=1}^{n} L_i = L_1 \times L_2 \times L_3 \times L_4 \times L_5$$

$$= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(2-\beta_0-\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(6-\beta_0-4\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(7-\beta_0-5\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(9-\beta_0-6\beta_1)^2/(2\sigma^2)} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(15-\beta_0-9\beta_1)^2/(2\sigma^2)}$$

The likelihood function is a function of β0, β1, and σ.

Page 17:

You choose the values of β0, β1, and σ that maximize the likelihood function. These are the maximum likelihood estimators of β0, β1, and σ.

Again, the maximization can easily be done using GQOPT or any other program with optimization routines (like MATLAB).

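To illustrate (my sketch, not part of the slides): under the normality assumption, the MLE of (β0, β1) in this linear model coincides with the OLS estimates, which have a closed form, and the MLE of σ is the root mean squared residual. We can check that the log likelihood really peaks there:

```python
import math

x = [1, 4, 5, 6, 9]
y = [2, 6, 7, 9, 15]
n = len(x)

def log_likelihood(b0, b1, sigma):
    """Log likelihood of y = b0 + b1*x + u with u ~ N(0, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (yi - b0 - b1 * xi) ** 2 / (2 * sigma ** 2)
        for xi, yi in zip(x, y)
    )

# Closed-form MLE: OLS slope and intercept, plus root mean squared residual
xbar, ybar = sum(x) / n, sum(y) / n
b1_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
b0_hat = ybar - b1_hat * xbar
sigma_hat = math.sqrt(
    sum((yi - b0_hat - b1_hat * xi) ** 2 for xi, yi in zip(x, y)) / n
)
```

Perturbing either coefficient away from (b0_hat, b1_hat) lowers the log likelihood, which is what a numerical optimizer would detect and undo.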

Page 18:

Example 3

Consider the following model.

y*=β0+β1x+u

Sometimes, we only know whether y*≥0 or not.

Page 19:

The data contain a variable Y which is either 0 or 1.

If Y = 1, it means that y* ≥ 0.
If Y = 0, it means that y* < 0.

Id  Y  X
1   0  1
2   0  4
3   1  5
4   1  6
5   1  9

Page 20:

Then, what is the likelihood contribution of each observation? In this case, we only know whether y* ≥ 0 or y* < 0. We do not know the exact value of y*.

In such a case, we use the probability that y* ≥ 0 or y* < 0 as the likelihood contribution.

Now, let us assume that u follows the standard normal distribution (a normal distribution with mean 0 and variance 1).

Page 21:

Take the 2nd observation as an example. Since Y = 0 for this observation, we know y* < 0.

Thus, the likelihood contribution is:

$$L_2 = P(y^* < 0) = P(\beta_0 + 4\beta_1 + u < 0) = P(u < -\beta_0 - 4\beta_1) = \Phi(-\beta_0 - 4\beta_1) = \int_{-\infty}^{-\beta_0 - 4\beta_1} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

where Φ is the cumulative distribution function of the standard normal distribution.

[Figure: the standard normal density of u, with the points −β0−β1, −β0−4β1, −β0−5β1, −β0−6β1, −β0−9β1 marked on the horizontal axis; L2 is the area under the density to the left of −β0−4β1.]

Page 22:

Now, take the 3rd observation as an example. Since Y = 1 for this observation, we know y* ≥ 0.

Thus, the likelihood contribution is:

$$L_3 = P(y^* \ge 0) = P(\beta_0 + 5\beta_1 + u \ge 0) = P(u \ge -\beta_0 - 5\beta_1) = 1 - \Phi(-\beta_0 - 5\beta_1) = 1 - \int_{-\infty}^{-\beta_0 - 5\beta_1} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

[Figure: the standard normal density of u; L3 is the area under the density to the right of −β0−5β1.]

Page 23:

Thus, the likelihood function has the following, more complicated, form:

$$L(\beta_0, \beta_1) = \prod_{i=1}^{5} L_i = \Phi(-\beta_0 - \beta_1) \times \Phi(-\beta_0 - 4\beta_1) \times \left[1 - \Phi(-\beta_0 - 5\beta_1)\right] \times \left[1 - \Phi(-\beta_0 - 6\beta_1)\right] \times \left[1 - \Phi(-\beta_0 - 9\beta_1)\right]$$

Page 24:

You choose the values of β0 and β1 that maximize the likelihood function. These are the maximum likelihood estimators of β0 and β1.

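As a sketch (mine, not from the slides), this probit likelihood can be evaluated with the standard normal CDF, written here via the error function. A caveat: in this tiny dataset Y = 1 exactly when X ≥ 5, so the data are perfectly separated and an unconstrained maximum does not exist; the code therefore only evaluates and compares the likelihood at trial parameter values.

```python
import math

x = [1, 4, 5, 6, 9]
y = [0, 0, 1, 1, 1]

def Phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def likelihood(b0, b1):
    """Probit likelihood: Phi(-b0 - b1*x_i) if y_i = 0, else 1 - Phi(-b0 - b1*x_i)."""
    L = 1.0
    for xi, yi in zip(x, y):
        p0 = Phi(-b0 - b1 * xi)  # P(y* < 0) for this observation
        L *= p0 if yi == 0 else 1 - p0
    return L
```

At (β0, β1) = (0, 0) every contribution is 0.5, so L = 0.5⁵; a parameter vector that separates the zeros from the ones, such as the hypothetical trial value (−4.5, 1), yields a much larger likelihood.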

Page 25:

Procedure of the MLE

1. Compute the likelihood contribution of each observation: $L_i$ for i = 1, …, n.

2. Multiply all the likelihood contributions together to form the likelihood function L.

3. Maximize L by choosing the values of the parameters. The values that maximize L are the maximum likelihood estimators of the parameters.

$$L = \prod_{i=1}^{n} L_i$$

Page 26:

The log likelihood function

It is usually easier to maximize the natural log of the likelihood function than the likelihood function itself.  

$$\log(L) = \log\left(\prod_{i=1}^{n} L_i\right) = \sum_{i=1}^{n} \log(L_i)$$
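A quick sketch (mine) of why the log form is preferred: the log turns the product into a sum, which has exactly the same maximizer (log is monotone increasing) but is numerically far more stable, since a product of many small density values underflows to zero:

```python
import math

data = [1, 4, 5, 6, 9]
mu, sigma = 5.0, 2.0

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

contribs = [normal_pdf(xi, mu, sigma) for xi in data]

L = math.prod(contribs)                      # likelihood (product form)
logL = sum(math.log(Li) for Li in contribs)  # log likelihood (sum form)
```

With n = 5 both forms agree to floating-point precision; with thousands of observations only the sum form survives.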

Page 27:

The standard errors in MLE

This is usually an advanced topic. However, it is useful to know how the standard errors are computed in MLE, since we use them for t-tests.

Page 28:

The score vector is the first derivative of the log likelihood function with respect to the parameters.

Let θ be a column vector of the parameters. In Example 2, θ=(β0,β1,σ)’.

Then the score vector q is given by

$$q = \frac{\partial \log(L)}{\partial \theta}$$

Page 29:

Then, the standard errors of the parameters are given by the square roots of the diagonal elements of the following matrix, built from the per-observation scores $q_i$ (the outer-product-of-gradients, or BHHH, estimator):

$$\left[\sum_{i=1}^{n} q_i q_i'\right]^{-1}$$
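Putting the last two slides together for Example 1 (my sketch, not from the slides): the per-observation scores of the normal log likelihood are (x−μ)/σ² and −1/σ + (x−μ)²/σ³. At the MLE the scores sum to zero, and inverting the sum of their outer products gives the standard errors:

```python
import math

data = [1, 4, 5, 6, 9]
n = len(data)

# Closed-form MLE for the normal model of Example 1
mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

def score(x):
    """Score of one observation: gradient of log L_i with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1 / sigma + (x - mu) ** 2 / sigma ** 3
    return (d_mu, d_sigma)

scores = [score(x) for x in data]

# The full-sample score (sum over observations) should be ~0 at the MLE
total_score = [sum(s[k] for s in scores) for k in range(2)]

# Sum of outer products q_i q_i' -> 2x2 matrix [[a, b], [b, d]]
a = sum(s[0] * s[0] for s in scores)
b = sum(s[0] * s[1] for s in scores)
d = sum(s[1] * s[1] for s in scores)

# Invert the 2x2 matrix; standard errors are square roots of the diagonal
det = a * d - b * b
se_mu = math.sqrt(d / det)
se_sigma = math.sqrt(a / det)
```

For this symmetric dataset the cross term b is zero, so the standard error of μ collapses to the familiar σ/√n.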