
Applied Statistics and Econometrics
Lecture 6

Saul Lach

September 2017


Outline of Lecture 6

1. Omitted variable bias (SW 6.1)
2. Multiple regression model (SW 6.2, 6.3)
3. Measures of fit (SW 6.4)
4. The Least Squares Assumptions (SW 6.5)
5. Sampling distribution of the OLS estimator (SW 6.6)
6. Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)


Omitted variable bias

The model with a single regressor is Y = β0 + β1X + u.

The error u arises because of factors that influence Y but are not included in the regression.

These excluded factors are omitted variables from the regression. There are always omitted variables, and sometimes this can lead to a bias in the OLS estimator.

We will study when such a bias arises and the likely direction of this bias.


Example: test scores and STR

The estimated regression line of the test score-class size relationship is

testscore = 698.9329 − 2.2798 × STR

A likely omitted variable here is "family income".

Suppose that in high-income districts classes are smaller and test scores are higher.

Is −2.28 a credible estimate of the causal effect on test scores of a change in the student-teacher ratio?

Probably not, because it is likely that the estimated effect of STR also reflects the impact on test scores of variations in income across districts.

Districts with smaller STR have higher test scores due to higher income. Thus −2.28 is a larger effect (in absolute value) than the true causal effect of class size.


Omitted variable bias

The bias in the OLS estimator that occurs as a result of an omitted factor is called the Omitted Variable Bias (OVB). Given that there are always omitted variables, it is important to understand when such an OVB occurs.

For OVB to occur, the omitted factor, which we call "Z", must satisfy two conditions:

1. Z is a determinant of Y (i.e., Z is part of u)
2. Z is correlated with the regressor X.

Both conditions must hold for the omission of Z to result in omitted variable bias.


Omitted variable bias: test scores and class size

Another omitted variable could be English language ability.

1. English language ability (whether the student has English as a second language) plausibly affects standardized test scores: Z is a determinant of Y.
2. Immigrant communities tend to be less affluent and thus have smaller school budgets, and higher STR: Z is correlated with X.

Accordingly, the OLS estimator β̂1 is biased: what is the direction of the bias? That is, what is the sign of this bias?

If intuition fails you, there is a formula... soon.


Conditions for OVB in CASchools data

Sometimes we can actually check these conditions (at least in a given sample). The California Schools dataset has data on the percentage of students learning English. The variable is el_pct.

    Variable |   Obs       Mean   Std. Dev.   Min        Max
    ---------+-----------------------------------------------
      el_pct |   420   15.76816   18.28593      0   85.53972

Is this variable correlated with STR and testscore (at least in this sample)?

. correlate el_pct str testscr

             |  el_pct      str  testscr
    ---------+---------------------------
      el_pct |  1
         str |  0.1876        1
     testscr | -0.6441  -0.2264        1


Conditions for OVB in CASchools data

[Figure: three scatterplots from the CASchools data: testscore vs. str, testscore vs. english (el_pct), and str vs. english (el_pct).]

Districts with lower percent English learners have higher test scores.

Districts with lower percent English learners have smaller classes.


OVB formula

Recall from Lecture 4 (Preliminary algebra 3 slide) that we can write

$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X) u_i}{\sum_{i=1}^n (X_i - \bar X)^2} = \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X) u_i}{\frac{n-1}{n}\, s_X^2}$$

Under assumptions LS2 and LS3 we have

$$\hat\beta_1 \overset{p}{\to} \beta_1 + \underbrace{\frac{Cov(X, u)}{Var(X)}}_{OVB}$$

If LS1 holds then Cov(X, u) = 0 and β̂1 →p β1 (and also E(β̂1) = β1).

If LS1 does not hold then Cov(X, u) ≠ 0 and β̂1 →p β1 + Cov(X, u)/Var(X) (and also E(β̂1) ≠ β1).


OVB formula in terms of omitted variable

The previous formula is in terms of the error term u and, although it is called the OVB, it is more general: the formula is correct irrespective of the reason for the correlation (or covariance) between u and X.

Suppose we now assert that a variable Z is omitted from the regression. We are then saying that Z is part of u and, w.l.o.g., we can then write

u = β2 Z + ε

where β2 is a coefficient.

Then

Cov(X, u) = Cov(X, β2 Z + ε) = β2 Cov(X, Z)

assuming ε is uncorrelated with X.


OVB formula in terms of omitted variable

The OVB formula in this case becomes

$$\hat\beta_1 \overset{p}{\to} \beta_1 + \underbrace{\beta_2 \frac{Cov(X, Z)}{Var(X)}}_{OVB}$$

The math makes clear the two conditions for an OVB:

1. Z is a determinant of Y ⟹ β2 ≠ 0.
2. Z is correlated with the regressor X ⟹ Cov(X, Z) ≠ 0.
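The formula is easy to check by simulation. Below is a minimal Stata sketch; the seed, sample size, and coefficients (β1 = 1, β2 = 2, Cov(X, Z) = 0.5) are illustrative assumptions, not from the lecture:

* Sketch: simulating omitted variable bias (all numbers are illustrative)
clear
set obs 10000
set seed 123
generate x = rnormal()                    // Var(X) = 1
generate z = 0.5*x + rnormal()            // Cov(X,Z) = 0.5, so Z is correlated with X
generate y = 1 + 1*x + 2*z + rnormal()    // true beta1 = 1, beta2 = 2
regress y x                               // short regression: slope near 1 + 2*0.5/1 = 2
regress y x z                             // long regression: slope on x near 1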


OVB formula: correlation version

An alternative formulation for the OVB formula is in terms of the correlation rather than the covariance:

$$\hat\beta_1 \overset{p}{\to} \beta_1 + \underbrace{\frac{Cov(X, u)}{Var(X)}}_{OVB} = \beta_1 + \underbrace{\rho_{Xu}\frac{\sigma_u}{\sigma_X}}_{OVB}$$

$$\hat\beta_1 \overset{p}{\to} \beta_1 + \underbrace{\beta_2 \frac{Cov(X, Z)}{Var(X)}}_{OVB} = \beta_1 + \underbrace{\beta_2\, \rho_{XZ}\frac{\sigma_Z}{\sigma_X}}_{OVB}$$


OVB formula in test score-class size example

We usually use the OVB formula to try to sign the direction of the bias. For example, when Z is the % of English Learners it is likely that β2 < 0 (the sample correlation also suggests this).

And ρXZ is likely to be positive, ρXZ > 0 (also suggested by the sample correlation).

Thus,

$$\underbrace{\beta_2}_{(-)}\,\underbrace{\rho_{XZ}}_{(+)}\,\frac{\sigma_Z}{\sigma_X} < 0$$

so that β̂1 converges to something smaller than the true parameter β1. Ignoring English learners overstates (in an absolute sense) the class size effect.

What is the likely sign of the bias when Z is family income?


Three ways to overcome omitted variable bias

1. Run a randomized controlled experiment in which treatment (STR) is randomly assigned: then el_pct is still a determinant of testscore, but el_pct is uncorrelated with STR.

Such random experiments are unrealistic in practice.


Three ways to overcome omitted variable bias

2. Adopt the "cross tabulation" approach: divide the sample into groups having approx. the same value of el_pct and analyze within groups. Problems: 1) soon we will run out of data, 2) there are other determinants (e.g., family income, parental education) that are omitted.


Three ways to overcome omitted variable bias

3. Use a regression in which the omitted variable (el_pct) is no longer omitted: include el_pct as an additional regressor in a multiple regression.

This is the approach we will focus on.


Where are we?

1. Omitted variable bias (SW 6.1)
2. Multiple regression model (SW 6.2, 6.3)
3. Measures of fit (SW 6.4)
4. The Least Squares Assumptions (SW 6.5)
5. Sampling distribution of the OLS estimator (SW 6.6)
6. Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)


The multiple regression model

The population regression model (or function) is

Y = β0 + β1X1 + β2X2 + · · ·+ βkXk + u

Y is the dependent variable.

X1, X2, ..., Xk are the k independent variables (regressors). β0 is the (unknown) intercept and β1, ..., βk are the various (unknown) slopes.

u is the regression error reflecting other omitted factors affecting Y .

We assume right away that

E(u | X1, X2, ..., Xk) = 0

so that the population regression line is the C.E. (conditional expectation) of Y given the k X's, and the slope parameters can be interpreted as causal effects.


Interpretation of coefficients (slopes) in multiple regression

Consider changing X1 from x1 to x1 + ∆1, while holding all the other X's fixed. Before the change we have

E(Y | X1 = x1, ..., Xk = xk) = β0 + β1x1 + β2x2 + · · · + βkxk

After the change we have

E(Y | X1 = x1 + ∆1, ..., Xk = xk) = β0 + β1(x1 + ∆1) + β2x2 + · · · + βkxk

The difference is

E(Y | X1 = x1 + ∆1, ..., Xk = xk) − E(Y | X1 = x1, ..., Xk = xk) = β1∆1


Interpretation of coefficients (slopes) in multiple regression

When ∆1 = 1 we have

E(Y | X1 = x1 + 1, ..., Xk = xk) − E(Y | X1 = x1, ..., Xk = xk) = β1

β1 measures the effect on (expected) Y of a unit change in X1, holding the other regressors X2, ..., Xk fixed (we also say controlling for X2, ..., Xk).

Whether this partial effect can be given a causal interpretation depends on what we assume for E(u | X1, X2, ..., Xk). If E(u | X1, X2, ..., Xk) is constant, as assumed here, then β1 is the causal effect of X1 on Y. Otherwise, it is not a causal effect. Why?

The same interpretation applies to βj, j = 2, ..., k.
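As an illustration, this partial effect can be read directly off a fitted multiple regression. A minimal sketch, assuming the California Schools data (testscr, str, el_pct) is in memory:

* Sketch: estimated effect of a unit change in str, holding el_pct fixed
quietly regress testscr str el_pct
display "Predicted change in testscr from one more student per teacher: " _b[str]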


The multiple regression model in the sample

The regression model (or function) in the sample is

Yi = β0 + β1X1i + β2X2i + · · · + βkXki + ui,   i = 1, . . . , n

The ith observation in the sample is (Yi, X1i, X2i, ..., Xki).


Estimation

To simplify the presentation we assume that we have two regressors only, k = 2:

Yi = β0 + β1X1i + β2X2i + ui

With two regressors, the OLS estimator solves:

$$\min_{b_0, b_1, b_2} \sum_{i=1}^n \big(Y_i - (b_0 + b_1 X_{1i} + b_2 X_{2i})\big)^2$$

The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (predicted value), b0 + b1X1i + b2X2i, based on such b's.

This minimization problem is solved using calculus.

The result is the OLS estimators of β0, β1, β2, denoted, respectively, by β̂0, β̂1, β̂2.

This generalizes the case with one regressor (k = 1).


Graphic intuition

$$\min_{b_0, b_1} \sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i})^2$$

Fits a line through the points in R².

[Figure: scatterplot of testscore vs. str.]

$$\min_{b_0, b_1, b_2} \sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2$$

Fits a plane through the points in R³.

[Figure: 3D scatterplot of testscore against str and english (el_pct).]


Matrix notation

The multiple regression model

Yi = β0 + β1X1i + β2X2i + · · · + βkXki + ui,   i = 1, . . . , n

can be written in matrix form as

Y = Xβ + u

where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & & \ddots & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{pmatrix}, \quad \boldsymbol\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$


OLS in matrix form

Using matrix notation, the minimization of the sum of squared residuals can be compactly written as

$$\min_{\boldsymbol\beta}\ (\mathbf{Y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{Y} - \mathbf{X}\boldsymbol\beta)$$

And the first order conditions are

$$\mathbf{X}'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol\beta}) = \mathbf{0} \implies \underbrace{\mathbf{X}'\mathbf{X}}_{(k+1)\times(k+1)}\ \underbrace{\hat{\boldsymbol\beta}}_{(k+1)\times 1} = \underbrace{\mathbf{X}'\mathbf{Y}}_{(k+1)\times 1}$$

which is a system of linear equations that can be solved for β̂ (recall Ax = c, where A = X'X, x = β̂ and c = X'Y). The solution is the OLS estimator given by

$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

provided X'X is invertible.


Example: the CASchools test score data

What happens to the coefficient on STR?

. reg testscr str

      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     22.58
       Model |  7794.11004         1  7794.11004   Prob > F        =    0.0000
    Residual |  144315.484       418  345.252353   R-squared       =    0.0512
-------------+----------------------------------   Adj R-squared   =    0.0490
       Total |  152109.594       419  363.030056   Root MSE        =    18.581

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -2.279808   .4798256    -4.75   0.000     -3.22298   -1.336637
       _cons |    698.933   9.467491    73.82   0.000     680.3231   717.5428
------------------------------------------------------------------------------

. reg testscr str el_pct

      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(2, 417)       =     155.01
       Model |  64864.3011         2  32432.1506   Prob > F        =    0.0000
    Residual |  87245.2925       417  209.221325   R-squared       =    0.4264
-------------+----------------------------------   Adj R-squared   =    0.4237
       Total |  152109.594       419  363.030056   Root MSE        =    14.464

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .3802783    -2.90   0.004    -1.848797   -.3537945
      el_pct |  -.6497768   .0393425   -16.52   0.000    -.7271112   -.5724423
       _cons |   686.0322   7.411312    92.57   0.000     671.4641   700.6004
------------------------------------------------------------------------------


OLS predicted values and residuals

Just as in the single regression model, the predicted value is

Ŷi = β̂0 + β̂1X1i + β̂2X2i + · · · + β̂kXki

And the residual is

ûi = Yi − Ŷi = Yi − (β̂0 + β̂1X1i + β̂2X2i + · · · + β̂kXki)

So that we can write

Yi = Ŷi + ûi = β̂0 + β̂1X1i + β̂2X2i + · · · + β̂kXki + ûi
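In Stata these objects are available via predict after the regression; a minimal sketch, assuming the California Schools data is in memory:

* Sketch: predicted values and residuals after a multiple regression
quietly regress testscr str el_pct
predict double yhat, xb                    // predicted values
predict double uhat, residuals             // residuals
assert abs(testscr - yhat - uhat) < 1e-6   // verifies Y = Yhat + uhat in the sample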


Where are we?

1. Omitted variable bias (SW 6.1)
2. Multiple regression model (SW 6.2, 6.3)
3. Measures of fit (SW 6.4)
4. The Least Squares Assumptions (SW 6.5)
5. Sampling distribution of the OLS estimator (SW 6.6)
6. Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)


Measures of fit for multiple regression

Same measures as before:

SER (RMSE) = std. deviation of residual ûi

R² = fraction of variance of Y explained or accounted for by X1, ..., Xk.

(new!) R̄² is the "adjusted R²" = R² adjusted for the number of regressors.


Measures of fit: SER and RMSE

As in the regression with a single regressor, the SER/RMSE measures the spread of the Y's around the estimated regression line:

$$SER(RMSE) = \sqrt{\frac{1}{n-k-1}\sum_{i=1}^n \hat u_i^2}$$
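A minimal sketch of this computation, assuming the CASchools regression with k = 2 regressors; the result should match the Root MSE reported by regress:

* Sketch: computing the SER/RMSE by hand
quietly regress testscr str el_pct
predict double ehat, residuals
generate double ehat2 = ehat^2
quietly summarize ehat2
display "SER = " sqrt(r(sum) / (e(N) - 2 - 1))   // n - k - 1, with k = 2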


Measures of fit: R squared

As in the regression with a single regressor, the R² is the fraction of the variance of Y accounted for by the model (i.e., by X1, ..., Xk):

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

where

$$ESS = \sum_{i=1}^n \big(\hat Y_i - \bar Y\big)^2, \quad TSS = \sum_{i=1}^n \big(Y_i - \bar Y\big)^2, \quad SSR = \sum_{i=1}^n \hat u_i^2$$

The R² never decreases when another regressor is added (i.e., when k increases). (Why?)

Not a good feature for a measure of “fit”.


Measures of fit: Adjusted R squared

The adjusted R², denoted R̄², addresses this issue by "penalizing" you for including another regressor:

$$\bar R^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\frac{SSR}{TSS} = R^2 - \left(\frac{k}{n-k-1}\right)\frac{SSR}{TSS}$$

Note that R̄² < R², but their difference tends to vanish for large n.

R̄² does not necessarily increase with k (although SSR decreases, (n−1)/(n−k−1) increases).

R̄² can be negative!
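A minimal check that the reported adjusted R² obeys this formula, using the results Stata stores after regress (note 1 − R² = SSR/TSS and e(df_m) = k):

* Sketch: adjusted R-squared recomputed from stored results
quietly regress testscr str el_pct
display "R-squared:     " e(r2)
display "adj R-squared: " 1 - (1 - e(r2)) * (e(N) - 1) / (e(N) - e(df_m) - 1)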


How to interpret the simple and adjusted R squared?

A high R² (or R̄²) means that the regressors account for much of the variation in Y.

A high R² (or R̄²) does not mean that you have eliminated omitted variable bias.

A high R² (or R̄²) does not mean that you have an unbiased estimator of a causal effect.

A high R² (or R̄²) does not mean that the included variables are statistically significant; this must be determined using hypothesis tests.

Maximizing R² (or R̄²) is not a criterion we use to select regressors.


CASchools data example

Regression of testscore against STR:

testscore = 698.93 − 2.28 str,   R² = 0.0512

Regression of testscore against STR and el_pct:

testscore = 686.032 − 1.101 str − 0.65 el_pct,   R² = 0.426

Adding the % of English Learners substantially improves the fit of the regression. Both regressors together account for almost 43% of the variation of test scores across districts.


Where are we?

1. Omitted variable bias (SW 6.1)
2. Multiple regression model (SW 6.2, 6.3)
3. Measures of fit (SW 6.4)
4. The Least Squares Assumptions (SW 6.5)
5. Sampling distribution of the OLS estimator (SW 6.6)
6. Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)


The Least Squares Assumptions for multiple regression

The multiple regression model is

Y = β0 + β1X1 + β2X2 + · · ·+ βkXk + u

The four least squares assumptions are:

Assumption #1: The conditional distribution of u given all the X's has mean zero, that is,

E(u | X1 = x1, ..., Xk = xk) = 0

for all (x1, ..., xk).

Assumption #2: (Yi, X1i, ..., Xki), i = 1, ..., n, are i.i.d.

Assumption #3: Large outliers in Y and X are unlikely: X1, ..., Xk and Y have finite fourth moments,

E(Y⁴) < ∞, E(X1⁴) < ∞, ..., E(Xk⁴) < ∞

Assumption #4: There is no perfect multicollinearity.

Assumption #1: mean independence

E(u | X1 = x1, ..., Xk = xk) = 0

Same interpretation as in the regression with a single regressor.

This assumption gives a causal interpretation to the parameters (the β's).

If an omitted variable (a) belongs in the equation (so it is in u) and (b) is correlated with an included X, then this condition fails and there is OVB (omitted variable bias).

The solution, if possible, is to include the omitted variable in the regression.

Usually, this assumption is more likely to hold when one controls for more factors by including them in the regression.


Assumption #2: i.i.d. sample

Same assumption as in the single regressor model.

This is satisfied automatically if the data are collected by simple random sampling.


Assumption #3: large outliers are unlikely

Same assumption as in the single regressor model.

OLS can be sensitive to large outliers. It is recommended to check the data (via scatterplots, etc.) to make sure there are no large outliers (due to typos, coding errors, etc.).

This is a technical assumption satisfied automatically by variables with a bounded domain.


Assumption #4: No perfect multicollinearity

This is a new assumption that applies when there is more than a single regressor.

Perfect multicollinearity occurs when one of the regressors is an exact linear function of the other regressors.

Assumption #4 rules this out.

We cannot estimate the effect of, say, X1 holding all other variables constant if one of these variables is a perfect linear function of X1.

When there is perfect multicollinearity, the statistical software will let you know it by crashing, giving an error message, or "dropping" one of the regressors arbitrarily.


Including a perfectly collinear regressor in Stata

Example: generate str_new = 5 + .2*str and add it to the regression. What happens? Stata drops one of the collinear variables.

. g str_new=5+.2*str

. reg testscr str str_new
note: str_new omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     22.58
       Model |  7794.11004         1  7794.11004   Prob > F        =    0.0000
    Residual |  144315.484       418  345.252353   R-squared       =    0.0512
-------------+----------------------------------   Adj R-squared   =    0.0490
       Total |  152109.594       419  363.030056   Root MSE        =    18.581

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -2.279808   .4798256    -4.75   0.000     -3.22298   -1.336637
     str_new |          0  (omitted)
       _cons |    698.933   9.467491    73.82   0.000     680.3231   717.5428
------------------------------------------------------------------------------


The dummy variable trap

Suppose you have a set of multiple binary (dummy) variables, which are mutually exclusive and exhaustive; that is, there are multiple categories and every observation falls in one and only one category (think of region of residence: Sicily, Lazio, Tuscany, etc.).

If you include all these dummy variables and a constant in the regression, you will have perfect multicollinearity; this is sometimes called the dummy variable trap. Why is there perfect multicollinearity here?

Solutions to the dummy variable trap:

1. Omit one of the groups (e.g., Lazio), or
2. Omit the intercept.

What are the implications of (1) or (2) for the interpretation of the coefficients? We will analyze this later in an example.
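A minimal Stata sketch of solution (1); the variables y and region here are hypothetical, not from the CASchools data:

* Sketch: avoiding the dummy variable trap (y and region are hypothetical variables)
tabulate region, generate(rdum)    // creates one dummy per category: rdum1, rdum2, ...
regress y rdum2 rdum3 rdum4        // omit rdum1; its category becomes the base group
* Including every dummy plus the constant is the trap: the dummies sum to 1,
* which is an exact linear function of the constant regressor.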


Assumption #4: No perfect multicollinearity

Perfect multicollinearity usually reflects a mistake in the definitions of the regressors, or an oddity in the data.

The solution to perfect multicollinearity is to modify your list of regressors so that you no longer have perfect multicollinearity.


Imperfect multicollinearity

Imperfect and perfect multicollinearity are quite different despite the similarity of their names.

Imperfect multicollinearity occurs when two or more regressors are very highly (but not perfectly) correlated.

Why the term "multicollinearity"? If two regressors are very highly correlated, then their scatterplot will pretty much look like a straight line; they are "co-linear". But unless the correlation is exactly ±1, that collinearity is imperfect.


Imperfect multicollinearity

Imperfect multicollinearity implies that one or more of the regression coefficients will be imprecisely estimated.

Intuition: the coefficient on X1 is the effect of X1 holding X2 constant; but if X1 and X2 are highly correlated, there is very little variation in X1 once X2 is held constant, so the data are pretty much uninformative about what happens when X1 changes but X2 doesn't. This means that the variance of the OLS estimator of the coefficient on X1 will be large.

Thus, imperfect multicollinearity (correctly) results in large standard errors for one or more of the OLS coefficients.

Importantly, imperfect multicollinearity does not violate Assumption #4. The OLS regression will "run".
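A minimal simulation sketch of this effect; all numbers (seed, sample size, the 0.05 noise scale) are illustrative assumptions:

* Sketch: imperfect multicollinearity inflates standard errors (simulated data)
clear
set obs 200
set seed 42
generate x1 = rnormal()
generate x2 = x1 + 0.05*rnormal()     // x2 is almost, but not exactly, collinear with x1
generate y  = 1 + x1 + x2 + rnormal()
regress y x1 x2                       // runs fine, but note the large standard errors

The regression runs (Assumption #4 holds), yet the coefficients on x1 and x2 are estimated very imprecisely.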


Where are we?

1. Omitted variable bias (SW 6.1)
2. Multiple regression model (SW 6.2, 6.3)
3. Measures of fit (SW 6.4)
4. The Least Squares Assumptions (SW 6.5)
5. Sampling distribution of the OLS estimator (SW 6.6)
6. Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)


The sampling distribution of OLS

Under the four LS assumptions:

1. β̂0, β̂1, ..., β̂k are unbiased and consistent estimators of β0, β1, ..., βk.

2. The joint sampling distribution of β̂0, β̂1, ..., β̂k is well approximated by a multivariate normal distribution.

3. This implies that, in large samples, for j = 0, 1, ..., k,

$$\hat\beta_j \sim N\big(\beta_j,\ \sigma^2_{\hat\beta_j}\big) \quad\text{or, equivalently,}\quad \frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2_{\hat\beta_j}}} \sim N(0, 1)$$


The variance of the OLS estimator

There is a more complicated formula for the estimator of the variance of β̂j... but the software does it for us!

As in the single regressor case, there is a formula that holds only when there is homoskedasticity, i.e., when Var(u | X1, ..., Xk) is a constant that does not vary with the values of (X1, ..., Xk), and another formula that holds when there is heteroskedasticity.

As in the single regressor case, we prefer to use the formula that is robust to heteroskedasticity because it is also correct when there is homoskedasticity.

Intuitively, we expect our estimator to be less precise (to have higher sampling variance) when using the same data to estimate more parameters. This is indeed correct, and the formula for the variance of β̂j (not shown) reflects this intuition, as it usually increases with the number of variables (k) included in the regression. This result prevents us from keeping adding regressors without limit.


Where are we?

1. Omitted variable bias (SW 6.1)
2. Multiple regression model (SW 6.2, 6.3)
3. Measures of fit (SW 6.4)
4. The Least Squares Assumptions (SW 6.5)
5. Sampling distribution of the OLS estimator (SW 6.6)
6. Hypothesis tests and confidence intervals for a single coefficient (SW 7.1)


Hypothesis Tests and Confidence Intervals for a Single Coefficient

This follows the same logic and recipe as for the slope coefficient in a single-regressor model.

Because $\frac{\hat\beta_j - \beta_j}{\sqrt{\sigma^2_{\hat\beta_j}}}$ is approximately distributed N(0, 1) in large samples (under the four LS assumptions), hypotheses on β1 can be tested using the usual t-statistic

$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)},$$

and 95% confidence intervals are constructed as

$$\big\{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\big\}$$

Similarly for β2, ..., βk.

β̂1 and β̂2 are generally not independently distributed, so neither are their t-statistics (more on this later).
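A minimal sketch of these computations using Stata's stored results, assuming the robust regression of testscr on str and el_pct (shown on the following slides) has been run:

* Sketch: t-statistic and 95% CI for the coefficient on str, computed by hand
quietly regress testscr str el_pct, robust
display "t-statistic: " _b[str] / _se[str]
display "95% CI: [" _b[str] - 1.96*_se[str] ", " _b[str] + 1.96*_se[str] "]"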


The California school dataset

Single regressor estimates

. reg testscr str, robust

Linear regression                               Number of obs     =        420
                                                F(1, 418)         =      19.26
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0512
                                                Root MSE          =     18.581

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
       _cons |    698.933   10.36436    67.44   0.000     678.5602   719.3057
------------------------------------------------------------------------------


The California school dataset

Multiple regression estimates

. reg testscr str el_pct, robust

Linear regression                               Number of obs     =        420
                                                F(2, 417)         =     223.82
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4264
                                                Root MSE          =     14.464

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
      el_pct |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
       _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
------------------------------------------------------------------------------


Testing hypotheses and CI in the California school dataset

The coefficient on STR in the multiple regression is the effect on Testscore of a unit change in STR, holding constant the percentage of English Learners in the district.

The coefficient on STR falls by one-half (in absolute value) when el_pct is added to the regression (does it make sense?).

The 95% confidence interval for the coefficient on STR is

{−1.10 ± 1.96 × 0.43} = (−1.95, −0.25)

The t-statistic testing H0: βSTR = 0 is

$$t = \frac{\hat\beta_{STR} - 0}{\sqrt{\sigma^2_{\hat\beta_{STR}}}} = \frac{-1.101}{0.433} = -2.54$$

so we reject the null hypothesis at the 5% significance level.

We use heteroskedasticity-robust standard errors for exactly the same reasons as in the case of a single regressor.
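The same conclusion can be reached with Stata's built-in test command; a sketch (test reports the equivalent F-statistic, the square of the t-statistic):

* Sketch: Wald test of H0: coefficient on str = 0
quietly regress testscr str el_pct, robust
test str      // F(1, 417) equal to the squared t-statistic, with its p-value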
