
Econ 140, Lecture 19: Heteroskedasticity

Today's plan

• How to test for heteroskedasticity: graphs, the Park and Glejser tests

• What we can do if we find heteroskedasticity

• How to estimate in the presence of heteroskedasticity

Palm Beach County revisited

• How much of an outlier is Palm Beach?

– Can the outlier be explained by heteroskedasticity?

– If so, what are the consequences?

• Heteroskedasticity will affect the variance of the regression line

– It will consequently affect the variance of the estimated coefficients

• L19.XLS provides an example of how to work through a problem like this using Excel

Palm Beach County revisited (2)

• Palm Beach is a good example to use since there are scale effects in the data

– The voting pattern shows that voting behavior and the number of registered voters are related to each county's population

• As the county gets larger, voting patterns may diverge from what would be assumed given the number of registered voters

– Note from the graph: as we move away from the origin, the difference between registered Reform voters and Reform votes cast increases

– We'll hypothesize that this scale effect gives rise to heteroskedasticity

Notation

• Heteroskedasticity is observed as cross-section variability in the data

– data across units at a point in time

• In our notation, heteroskedasticity is: $E(e_i^2) \neq \sigma^2$

• We can also write: $E(e_i^2) = \sigma_i^2$

– This means that we expect variable variance: the variance changes with each unit of observation
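To make the notation concrete, here is a minimal sketch in Python (simulated data; the numbers and variable names are illustrative, not from the lecture) of errors whose variance changes with each unit of observation:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = rng.uniform(1, 100, n)        # independent variable for each unit of observation
    sigma_i = 0.5 * X                 # error standard deviation grows with X
    e = rng.normal(0, sigma_i)        # heteroskedastic errors: E(e_i^2) = sigma_i^2 varies by unit
    Y = 50 + 2.5 * X + e              # dependent variable from a simple linear model

    # The sample variance of the errors differs across the range of X:
    print(e[X < 50].var(), e[X >= 50].var())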

Consequences

When heteroskedasticity is present:

1) The OLS estimator is still linear

2) The OLS estimator is still unbiased

3) The OLS estimator is not efficient: the minimum-variance property no longer holds

4) Estimates of the variances are biased

5) $\hat{\sigma}_{YX}^2 = \dfrac{\sum e_i^2}{n - k}$ is not an unbiased estimator of $\sigma_{YX}^2$

6) We can't trust the confidence intervals or hypothesis tests (t-tests & F-tests): we may draw the wrong conclusions
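A small Monte Carlo sketch in Python (simulated data, illustrative parameter values) of consequences 2 and 4: the OLS slope stays centered on the true value, but the conventional OLS standard error no longer tracks the estimator's actual sampling variability:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n, reps, true_b = 100, 2000, 2.0
    X = rng.uniform(1, 100, n)
    Xmat = sm.add_constant(X)

    slopes, reported_se = [], []
    for _ in range(reps):
        e = rng.normal(0, 0.5 * X)          # heteroskedastic errors
        Y = 10 + true_b * X + e
        res = sm.OLS(Y, Xmat).fit()
        slopes.append(res.params[1])        # estimated slope
        reported_se.append(res.bse[1])      # conventional OLS standard error of the slope

    print("mean slope estimate :", np.mean(slopes))        # close to 2.0: still unbiased
    print("true sampling s.d.  :", np.std(slopes))         # actual variability of the estimates
    print("mean reported OLS SE:", np.mean(reported_se))   # biased estimate of that variability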

Consequences (2)

• When BLUE holds and there is homoskedasticity, the first-order condition gives:

$V(\hat{b}) = \sigma^2 \sum_i c_i^2$

• With heteroskedasticity, we have:

$V(\hat{b}) = \sum_i c_i^2 \sigma_i^2$

• If we substitute $c_i = x_i / \sum_j x_j^2$ (where $x_i = X_i - \bar{X}$) into both equations, we find:

$V(\hat{b}) = \dfrac{\sigma^2}{\sum_i x_i^2}$ and $V(\hat{b}) = \dfrac{\sum_i x_i^2 \sigma_i^2}{\left(\sum_i x_i^2\right)^2}$
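Written out as a single step (a sketch, using the same notation as above), both variance expressions follow from $V(\hat{b}) = \sum_i c_i^2 V(e_i)$:

    \hat{b} = b + \sum_i c_i e_i, \qquad c_i = \frac{x_i}{\sum_j x_j^2},
    \qquad
    V(\hat{b}) = \sum_i c_i^2 V(e_i) =
    \begin{cases}
      \dfrac{\sigma^2}{\sum_i x_i^2} & \text{if } V(e_i) = \sigma^2 \text{ (homoskedasticity)} \\[2ex]
      \dfrac{\sum_i x_i^2 \sigma_i^2}{\bigl(\sum_i x_i^2\bigr)^2} & \text{if } V(e_i) = \sigma_i^2 \text{ (heteroskedasticity)}
    \end{cases}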

Cases

• With homoskedasticity, the variance of the errors around the regression line is constant for every value of the independent variable

• With heteroskedasticity, the variance of the errors around the regression line changes with the value of the independent variable

Detecting heteroskedasticity

• There are three ways of detecting heteroskedasticity:

1) Graphically

2) Park Test

3) Glejser Test

Graphical detection

• We can see that the errors vary with the unit of observation

• With homoskedasticity, where $E(e_i X_i) = 0$, the errors are independent of the independent variables

• With heteroskedasticity we can get a variety of patterns: the errors show a systematic relationship with the independent variables

• Note: you can use either e or e2 on the y-axis
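A rough sketch of this graphical check in Python (simulated data standing in for the spreadsheet; variable names are illustrative):

    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    X = rng.uniform(1, 100, 200)
    Y = 50 + 2.5 * X + rng.normal(0, 0.5 * X)   # data with errors that grow with X

    res = sm.OLS(Y, sm.add_constant(X)).fit()
    e = res.resid                               # residuals e_i

    plt.scatter(X, e)                           # or plt.scatter(X, e**2)
    plt.xlabel("X (independent variable)")
    plt.ylabel("residual e")
    plt.show()                                  # a fan or funnel shape suggests heteroskedasticity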

Graphical detection (3)

• Using the Palm Beach example (L19.xls), the estimated regression equation was: $\hat{Y} = 50.28 + 2.45X$

• The errors of this equation, $\hat{e} = Y - \hat{Y}$, can be graphed against the number of registered Reform party voters (the independent variable)

– The graph shows that the errors increase with the number of registered Reform voters

• While the graphs may be convincing, we also want to use a test to confirm this. We have two:

Park Test

• Here's the procedure:

1) Run the regression $Y_i = a + bX_i + e_i$ despite the heteroskedasticity problem (it can also be multivariate)

2) Obtain the residuals ($e_i$), square them ($e_i^2$), and take their logs ($\ln e_i^2$)

3) Run a spurious regression: $\ln e_i^2 = g_0 + g_1 \ln X_i + v_i$

4) Do a hypothesis test on $g_1$ with H0: $g_1 = 0$

5) Look at the results of the hypothesis test:

• reject the null: you have heteroskedasticity

• fail to reject the null: homoskedasticity, or $\ln e_i^2 = g_0$, which is a constant
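A minimal sketch of these five steps in Python (statsmodels, simulated data; the lecture itself works in Excel), assuming a single regressor:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = rng.uniform(1, 100, 200)
    Y = 50 + 2.5 * X + rng.normal(0, 0.5 * X)

    # Step 1: run the original regression despite the heteroskedasticity problem
    res = sm.OLS(Y, sm.add_constant(X)).fit()

    # Step 2: residuals, squared, then logged
    ln_e2 = np.log(res.resid ** 2)

    # Step 3: regress ln(e_i^2) on ln(X_i)
    park = sm.OLS(ln_e2, sm.add_constant(np.log(X))).fit()

    # Steps 4-5: test H0: g1 = 0; a small p-value rejects the null -> heteroskedasticity
    print("g1 estimate:", park.params[1], "  p-value:", park.pvalues[1])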

Glejser Test

• When we use the Glejser Test, we're looking for a scaling effect

• The procedure:

1) Run the regression (it can also be multivariate)

2) Collect ei terms

3) Take the absolute value of the errors

4) Regress |ei| against independent variable(s)

• you can run different kinds of regressions, for example:

$|e_i| = g_0 + g_1 X_i + u_i$ or

$|e_i| = g_0 + g_1 \sqrt{X_i} + u_i$ or

$|e_i| = g_0 + g_1 \dfrac{1}{X_i} + u_i$

Glejser Test (2)

4) [continued]

• If heteroskedasticity takes one of these forms, this will suggest an appropriate transformation of the model

• The null hypothesis is still H0: g1 = 0 since we’re testing for a relationship between the errors and the independent variables

• We reach the same conclusions as in the Park Test
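A sketch of the Glejser procedure in Python under the same simulated setup as before (only three illustrative functional forms are shown):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    X = rng.uniform(1, 100, 200)
    Y = 50 + 2.5 * X + rng.normal(0, 0.5 * X)

    # Steps 1-3: run the regression, collect the residuals, take absolute values
    abs_e = np.abs(sm.OLS(Y, sm.add_constant(X)).fit().resid)

    # Step 4: regress |e_i| on different functions of the independent variable
    for label, Z in [("X", X), ("sqrt(X)", np.sqrt(X)), ("1/X", 1 / X)]:
        glejser = sm.OLS(abs_e, sm.add_constant(Z)).fit()
        print(label, " g1 =", round(glejser.params[1], 3), " p =", round(glejser.pvalues[1], 4))

    # Rejecting H0: g1 = 0 for a given form points to heteroskedasticity and
    # suggests the corresponding transformation of the model.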

A cautionary note

• The errors in the Park Test (vi) and the Glejser Test (ui) might also be heteroskedastic.

– If this is the case, we cannot trust the hypothesis test H0: g1 = 0 or the t-test

• If we find heteroskedastic disturbances in the data, what can we do?

– Estimate the model Yi = a + bXi + ei using weighted least squares

– We’ll look at two examples of weighted least squares: one where we know the true variance, and one where we don’t

Correction with known $\sigma_i^2$

• Given that the true variance is known and our model is:

$Y_i = a + bX_i + e_i$

• Consider the following transformation of the model:

$\dfrac{Y_i}{\sigma_i} = a\,\dfrac{1}{\sigma_i} + b\,\dfrac{X_i}{\sigma_i} + \dfrac{e_i}{\sigma_i}$

– In the transformed model, let $u_i = \dfrac{e_i}{\sigma_i}$

– So the expected value of the error squared is: $E(u_i^2) = \dfrac{E(e_i^2)}{\sigma_i^2}$

Correction with known $\sigma_i^2$ (2)

• Given that there is heteroskedasticity, $E(e_i^2) = \sigma_i^2$

– thus: $E(u_i^2) = \dfrac{\sigma_i^2}{\sigma_i^2} = 1$

• In this simple example, we re-weighted the model by the constant $\sigma_i$

• What this example shows: when the variance is known, we must transform our model to obtain a homoskedastic error term.
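A sketch of this re-weighting in Python with simulated data, where $\sigma_i$ is known only because we generated it; dividing every term by $\sigma_i$ is equivalent to weighted least squares with weights $1/\sigma_i^2$:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    X = rng.uniform(1, 100, 200)
    sigma_i = 0.5 * X                                   # the (known) error standard deviations
    Y = 50 + 2.5 * X + rng.normal(0, sigma_i)

    # Transform the model: divide Y, the constant, and X by sigma_i
    Xmat = sm.add_constant(X)
    res_transformed = sm.OLS(Y / sigma_i, Xmat / sigma_i[:, None]).fit()

    # Equivalently: weighted least squares with weights 1 / sigma_i^2
    res_wls = sm.WLS(Y, Xmat, weights=1 / sigma_i ** 2).fit()

    print(res_transformed.params)                       # [a, b] from the transformed regression
    print(res_wls.params)                               # the same coefficients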

Correction with unknown $\sigma_i^2$

• Given an unknown variance, we need to state ad hoc but plausible assumptions about our variance $\sigma_i^2$ (how the errors vary with the independent variable)

• For example: we can assert that $E(e_i^2) = \sigma^2 X_i$

• Remember: the Glejser Test allows us to choose a relationship between the errors and the independent variable

• Under this assumption, the transformed model (derived on the next slide) is:

$\dfrac{Y_i}{\sqrt{X_i}} = a\,\dfrac{1}{\sqrt{X_i}} + b\,\dfrac{X_i}{\sqrt{X_i}} + \dfrac{e_i}{\sqrt{X_i}}$

Correction with unknown $\sigma_i^2$ (2)

• In this example you would transform the estimating equation by dividing through by $\sqrt{X_i}$ to get:

$\dfrac{Y_i}{\sqrt{X_i}} = a\,\dfrac{1}{\sqrt{X_i}} + b\,\dfrac{X_i}{\sqrt{X_i}} + \dfrac{e_i}{\sqrt{X_i}}$

• Letting $u_i = \dfrac{e_i}{\sqrt{X_i}}$:

– The expected value of this error squared is: $E(u_i^2) = E\!\left(\dfrac{e_i^2}{X_i}\right)$

Correction with unknown $\sigma_i^2$ (3)

• Recalling the earlier assumption $E(e_i^2) = \sigma^2 X_i$, we find:

$E(u_i^2) = E\!\left(\dfrac{e_i^2}{X_i}\right) = \dfrac{\sigma^2 X_i}{X_i} = \sigma^2$

• When we don't know the true variance, we re-scale the estimating equation by (a function of) the independent variable
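A sketch of the same correction in Python when $\sigma_i^2$ is unknown but we assume $E(e_i^2) = \sigma^2 X_i$ (simulated data); dividing through by $\sqrt{X_i}$ is equivalent to weighted least squares with weights $1/X_i$:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    X = rng.uniform(1, 100, 200)
    Y = 50 + 2.5 * X + rng.normal(0, np.sqrt(X))        # error variance proportional to X

    # Divide through by sqrt(X_i): regress Y/sqrt(X) on 1/sqrt(X) and X/sqrt(X)
    w = np.sqrt(X)
    res_transformed = sm.OLS(Y / w, np.column_stack([1 / w, X / w])).fit()

    # Equivalent weighted-least-squares formulation with weights 1 / X_i
    res_wls = sm.WLS(Y, sm.add_constant(X), weights=1 / X).fit()

    print(res_transformed.params)                        # [a, b] from the re-scaled equation
    print(res_wls.params)                                # same coefficients, same interpretation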

Returning to Palm Beach

• On L19.xls we have presidential election data by county in Florida

– To get a correct estimating equation, we can run a regression without Palm Beach if we think it’s an outlier.

– Then we can see whether we can obtain a prediction for the number of Reform votes cast in Palm Beach

– We can perform a Glejser Test for the regression excluding Palm Beach

– We run a regression of the absolute value of the errors ($|e_i|$) against registered Reform voters ($X_i$)
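A sketch of this workflow in Python (pandas + statsmodels). The column names ('county', 'reform_registered', 'reform_votes') and the layout of L19.xls are assumptions for illustration, not the actual spreadsheet structure:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical layout: one row per Florida county with these columns
    data = pd.read_excel("L19.xls")
    y = data["reform_votes"]                             # Reform votes cast, by county
    X = sm.add_constant(data["reform_registered"])       # registered Reform voters

    # Fit the regression excluding Palm Beach if we treat it as an outlier
    mask = data["county"] != "Palm Beach"
    res = sm.OLS(y[mask], X[mask]).fit()

    # Predicted Reform votes for Palm Beach from the regression that excludes it
    print(res.predict(X[~mask]))

    # Glejser step: regress |e_i| against registered Reform voters for the other counties
    glejser = sm.OLS(np.abs(res.resid), X[mask]).fit()
    print("g1 =", glejser.params.iloc[1], " p =", glejser.pvalues.iloc[1])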

Returning to Palm Beach (2)

• The t-test rejects the null

– this indicates the presence of heteroskedasticity

• We can re-scale the model in different ways or introduce a new independent variable (such as the total number of registered voters by county)

• Keep transforming the model and running the Glejser Test

– When we fail to reject the null: there is no longer heteroskedasticity in the model

Summary

• Even with re-weighted equations, we might still have heteroskedastic errors

– so we have to rerun the Glejser Test until we cannot reject the null

• If we still reject the null, we may have to rethink our model transformation

– if we suspect a scale effect, we may want to introduce new scaling variables

• Coefficients from the re-scaled equation are comparable with the coefficients from the original model