
Week 7: Multiple Regression

Brandon Stewart¹

Princeton

October 12–16, 2020

¹These slides are heavily influenced by Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller and Erin Hartman.


Where We’ve Been and Where We’re Going...

Last Week
- regression with two variables
- omitted variables, multicollinearity, interactions

This Week
- matrix form of linear regression
- inference and hypothesis tests

Next Week
- diagnostics

Long Run
- probability → inference → regression → causal inference


1 Matrix Form of Regression
- Estimation
- Fun With(out) Weights

2 OLS Classical Inference in Matrix Form
- Unbiasedness
- Classical Standard Errors

3 Agnostic Inference

4 Standard Hypothesis Tests
- t-Tests
- Adjusted R²
- F Tests for Joint Significance


The Linear Model with New Notation

Remember that we wrote the linear model as the following for all i ∈ [1, . . . , n]:

yi = β0 + xiβ1 + ziβ2 + ui

Imagine we had an n of 4. We could write out each formula:

y1 = β0 + x1β1 + z1β2 + u1 (unit 1)

y2 = β0 + x2β1 + z2β2 + u2 (unit 2)

y3 = β0 + x3β1 + z3β2 + u3 (unit 3)

y4 = β0 + x4β1 + z4β2 + u4 (unit 4)


We can write this as:

\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \beta_0
+
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \beta_1
+
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix} \beta_2
+
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
\]

The outcome is a linear combination of the x, z, and u vectors.


Grouping Things into Matrices

Can we write this in a more compact form? Yes! Let X and β be the following:

\[
\underset{(4 \times 3)}{\mathbf{X}} =
\begin{bmatrix}
1 & x_1 & z_1 \\
1 & x_2 & z_2 \\
1 & x_3 & z_3 \\
1 & x_4 & z_4
\end{bmatrix}
\qquad
\underset{(3 \times 1)}{\boldsymbol{\beta}} =
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
\]


Back to Regression

X is the n × (k + 1) design matrix of independent variables.

β is the (k + 1) × 1 column vector of coefficients.

Xβ will be n × 1, a linear combination of the columns of X (where 1 denotes the column of ones):

Xβ = β0·1 + β1x1 + β2x2 + · · · + βkxk

We can compactly write the linear model as the following:

\[
\underset{(n \times 1)}{\mathbf{y}} = \underset{(n \times 1)}{\mathbf{X}\boldsymbol{\beta}} + \underset{(n \times 1)}{\mathbf{u}}
\]

We can also write this at the individual level, where x′i is the ith row of X:

yi = x′iβ + ui
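As a quick check on this notation, here is a minimal NumPy sketch (not from the slides; all numbers are made up) that builds a small design matrix and verifies that y = Xβ + u matches the row-by-row form yi = x′iβ + ui:

```python
# Minimal sketch of y = X beta + u with n = 4 observations and two regressors (x and z).
# All values below are hypothetical, chosen only for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])        # hypothetical regressor x
z = np.array([0.5, 0.0, 1.5, 1.0])        # hypothetical regressor z
X = np.column_stack([np.ones(4), x, z])   # 4 x 3 design matrix: [1, x, z]
beta = np.array([0.2, 1.0, -0.5])         # (beta0, beta1, beta2)
u = np.array([0.1, -0.2, 0.0, 0.3])       # hypothetical disturbances

y = X @ beta + u                          # the compact form: an n x 1 vector

# Row i of X beta is x_i' beta = beta0 + x_i*beta1 + z_i*beta2:
for i in range(4):
    assert np.isclose(y[i], X[i] @ beta + u[i])
print(y)
```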


Multiple Linear Regression in Matrix Form

Let β̂ be the vector of estimated regression coefficients and ŷ be the vector of fitted values:

\[
\widehat{\boldsymbol{\beta}} =
\begin{bmatrix}
\widehat{\beta}_0 \\ \widehat{\beta}_1 \\ \vdots \\ \widehat{\beta}_k
\end{bmatrix}
\qquad
\widehat{\mathbf{y}} = \mathbf{X}\widehat{\boldsymbol{\beta}}
\]

It might be helpful to see this again more written out:

\[
\widehat{\mathbf{y}} =
\begin{bmatrix}
\widehat{y}_1 \\ \widehat{y}_2 \\ \vdots \\ \widehat{y}_n
\end{bmatrix}
= \mathbf{X}\widehat{\boldsymbol{\beta}} =
\begin{bmatrix}
1\widehat{\beta}_0 + x_{11}\widehat{\beta}_1 + x_{12}\widehat{\beta}_2 + \cdots + x_{1k}\widehat{\beta}_k \\
1\widehat{\beta}_0 + x_{21}\widehat{\beta}_1 + x_{22}\widehat{\beta}_2 + \cdots + x_{2k}\widehat{\beta}_k \\
\vdots \\
1\widehat{\beta}_0 + x_{n1}\widehat{\beta}_1 + x_{n2}\widehat{\beta}_2 + \cdots + x_{nk}\widehat{\beta}_k
\end{bmatrix}
\]
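To make the written-out form concrete, here is a small sketch (illustrative values only; the coefficient vector is hypothetical, not estimated from data) showing that the matrix product reproduces the element-wise sums above:

```python
# Sketch: fitted values y_hat = X @ beta_hat, where entry i equals
# 1*b0 + x_{i1}*b1 + ... + x_{ik}*bk. Data and coefficients are made up.
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1) design matrix
beta_hat = np.array([0.5, 2.0, -1.0])                       # hypothetical estimates

y_hat = X @ beta_hat
# The same thing written out row by row:
y_hat_rows = np.array([sum(X[i, j] * beta_hat[j] for j in range(k + 1)) for i in range(n)])
assert np.allclose(y_hat, y_hat_rows)
print(y_hat)
```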


Residuals

We can easily write the residuals in matrix form:

û = y − Xβ̂

Our goal, as usual, is to minimize the sum of the squared residuals, which we saw earlier we can write as:

û′û = (y − Xβ̂)′(y − Xβ̂)
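As an illustration (simulated data, not from the lecture), the sum of squared residuals can be computed directly as û′û, and it is smaller at coefficients near the truth than at an arbitrary guess:

```python
# Sketch on made-up data: residuals u_hat = y - X beta and the scalar objective u_hat' u_hat.
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

def ssr(beta, X, y):
    u = y - X @ beta   # residual vector for a candidate beta
    return u @ u       # u'u, the sum of squared residuals

print(ssr(np.array([1.0, 2.0, -0.5]), X, y))   # near the simulated truth: small
print(ssr(np.array([0.0, 0.0, 0.0]), X, y))    # arbitrary guess: much larger
```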


OLS Estimator in Matrix Form

Goal: minimize the sum of the squared residuals.

Take (matrix) derivatives, set equal to 0.

Resulting first order conditions:

X′(y − Xβ̂) = 0

Rearranging:

X′Xβ̂ = X′y

In order to isolate β̂, we need to move the X′X term to the other side of the equals sign.

We’ve learned about matrix multiplication, but what about matrix “division”?
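A numerical sketch of the first order conditions (simulated data; the data-generating values are invented): at the least-squares solution, X′(y − Xβ̂) is zero up to floating-point error:

```python
# Sketch: verify the normal equations X'(y - X beta_hat) = 0 at the least-squares fit.
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
foc = X.T @ (y - X @ beta_hat)                    # first order conditions
print(foc)                                        # each entry ~1e-12
assert np.allclose(foc, 0.0, atol=1e-8)
```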


Scalar Inverses

What is division in its simplest form? 1/a is the value such that a · (1/a) = 1.

For some algebraic expression au = b, let’s solve for u:

(1/a) au = (1/a) b
u = b/a

We need a matrix version of 1/a.


Matrix Inverses

Definition (Matrix Inverse)

If it exists, the inverse of a square matrix A, denoted A−1, is the matrix such that A−1A = I.

We can use the inverse to solve (systems of) equations:

Au = b

A−1Au = A−1b

Iu = A−1b

u = A−1b

If the inverse exists, we say that A is invertible or nonsingular.
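A small NumPy sketch of this idea (the matrix A and vector b below are arbitrary examples): solving Au = b via the inverse gives the same u as a direct solver:

```python
# Sketch: solve A u = b with the matrix inverse for a small invertible A.
# np.linalg.solve is usually preferred numerically; both agree here.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

u_via_inverse = np.linalg.inv(A) @ b   # u = A^{-1} b
u_via_solve = np.linalg.solve(A, b)    # solves A u = b directly

assert np.allclose(A @ u_via_inverse, b)
assert np.allclose(u_via_inverse, u_via_solve)
print(u_via_inverse)                   # [1., 3.]
```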


Back to OLS

Recall:

X′Xβ̂ = X′y

Let’s assume, for now, that the inverse of X′X exists. Then we can write the OLS estimator as the following:

β̂ = (X′X)−1X′y

See Aronow and Miller, Theorem 4.1.4, for a proof. “ex prime ex inverse ex prime y”: sear it into your soul.
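Here is a sketch (simulated data, not the course’s replication code) that computes “ex prime ex inverse ex prime y” directly and checks it against NumPy’s least-squares routine:

```python
# Sketch: the OLS estimator (X'X)^{-1} X'y on simulated data.
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), x, z])
y = 1.0 + 2.0 * x - 0.5 * z + rng.normal(size=n)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y        # (X'X)^{-1} X'y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None) # least-squares solver

assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)   # close to (1.0, 2.0, -0.5)
```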


Intuition for the OLS in Matrix Form

β̂ = (X′X)−1X′y

What’s the intuition here?

“Numerator” X′y is approximately composed of the covariances between the columns of X and y.

“Denominator” X′X is approximately composed of the sample variances and covariances of the variables within X.

Thus, we have something like:

β̂ ≈ (variance of X)−1(covariance of X & y)

i.e., analogous to the simple linear regression case!

Disclaimer: the final equation is exactly true for all non-intercept coefficients if you remove the intercept from X, such that β̂₋₀ = Var(X₋₀)−1 Cov(X₋₀, y). The numerator and denominator are the variances and covariances if X and y are demeaned and normalized by the sample size minus 1.
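The disclaimer can be checked numerically. In this sketch (simulated data; the coefficients and sample size are invented), the slope coefficients from a regression with an intercept match Var(X₋₀)−1 Cov(X₋₀, y) built from sample variances and covariances:

```python
# Sketch: for non-intercept coefficients, OLS equals Var(X_{-0})^{-1} Cov(X_{-0}, y),
# where X_{-0} is the design matrix with the column of ones removed.
import numpy as np

rng = np.random.default_rng(4)
n = 1000
X0 = rng.normal(size=(n, 2))                       # the two non-intercept regressors
y = 0.3 + X0 @ np.array([1.5, -2.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), X0])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # full OLS, intercept included

V = np.cov(X0, rowvar=False)                       # 2 x 2 sample variance-covariance of X_{-0}
c = np.array([np.cov(X0[:, j], y)[0, 1] for j in range(2)])  # Cov(x_j, y)
slopes = np.linalg.solve(V, c)

assert np.allclose(slopes, beta_hat[1:])
print(slopes, beta_hat[1:])
```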


Fun Without Weights


Improper Linear Models

If you have to diagnose a disease, are you better off with an expert or a statistical model?

Meehl (1954), Clinical Versus Statistical Prediction: A Theoretical Analysis and Review of the Evidence, argued that proper linear models outperform clinical intuition in many areas.

A proper linear model is one where the predictor variables are given weights that are optimized in some way (for example, through regression).

Dawes argues that even improper linear models (those where the weights are set by hand or set to be equal) outperform clinical intuition.


Example: Graduate Admissions

Faculty rated all students in the psych department at the University of Oregon.

Ratings were predicted from a proper linear model of students’ GRE scores, undergrad GPA, and the selectivity of the student’s undergraduate institution. The cross-validated correlation was .38.

The correlation of faculty ratings with the average rating of the admissions committee was .19.

A standardized and equally weighted improper linear model correlated at .48.
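The proper-versus-improper comparison is easy to mimic on simulated data (this sketch does not use Dawes’s admissions data; the data-generating process, predictors, and sample sizes are invented). It fits OLS weights on a training half and compares them with an equal-weight model on standardized predictors, judging both by out-of-sample correlation:

```python
# Sketch: "proper" (OLS) weights vs. an "improper" equal-weight model on z-scored predictors.
import numpy as np

rng = np.random.default_rng(5)
n, k = 400, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.6, 0.5, 0.4]) + rng.normal(scale=1.0, size=n)

train, test = slice(0, 200), slice(200, None)

# Proper model: OLS weights estimated on the training half.
Xtr = np.column_stack([np.ones(200), X[train]])
beta = np.linalg.lstsq(Xtr, y[train], rcond=None)[0]
pred_proper = np.column_stack([np.ones(200), X[test]]) @ beta

# Improper model: z-score each predictor (using training means/sds) and add them up.
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
pred_improper = ((X[test] - mu) / sd).sum(axis=1)

print("proper   r:", np.corrcoef(pred_proper, y[test])[0, 1])
print("improper r:", np.corrcoef(pred_improper, y[test])[0, 1])
```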


Other Examples

Self-assessed measures of marital happiness: modeled with an improper linear model of (rate of lovemaking − rate of arguments): correlation of .40.

Einhorn (1972) studied doctors who coded biopsies of patients with Hodgkin’s disease and then rated severity. Their ratings of severity were essentially uncorrelated with survival times, but the variables they coded did predict outcomes using a regression model.


Other Examples

Column descriptions:

C1) average of human judges

C2) model based on human judges

C3) randomly chosen weights preserving signs

C4) equal weighting

C5) cross-validated weights

C6) unattainable optimal linear model

Common pattern: c2, c3, c4, c5, c6 > c1

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 19 / 93
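To make the pattern concrete, here is a minimal simulation sketch (not from the slides; the data-generating process, coefficients, and sample sizes are invented) comparing three of these columns: OLS weights estimated on a small training sample, equal weights on standardized predictors, and the unattainable optimal weights, each scored by its correlation with the outcome in a large fresh sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, k = 50, 10_000, 3

# Hypothetical data-generating process: y depends positively on 3 predictors
beta_true = np.array([0.6, 0.3, 0.2])

def draw(n):
    X = rng.normal(size=(n, k))
    y = X @ beta_true + rng.normal(scale=1.0, size=n)
    return X, y

X_tr, y_tr = draw(n_train)     # small training sample (like the admissions data)
X_te, y_te = draw(n_test)      # large test sample for out-of-sample correlation

# Standardize test predictors (for the equal-weight "improper" model)
Z_te = (X_te - X_te.mean(axis=0)) / X_te.std(axis=0)

# (1) "Proper" model: OLS weights estimated on the training sample
Xd_tr = np.column_stack([np.ones(n_train), X_tr])
b_hat = np.linalg.solve(Xd_tr.T @ Xd_tr, Xd_tr.T @ y_tr)
pred_ols = np.column_stack([np.ones(n_test), X_te]) @ b_hat

# (2) "Improper" model: equal weights on standardized predictors
pred_equal = Z_te.sum(axis=1)

# (3) Unattainable optimal model: the true coefficients
pred_true = X_te @ beta_true

for name, pred in [("cross-validated OLS", pred_ols),
                   ("equal weights", pred_equal),
                   ("true (optimal) weights", pred_true)]:
    print(f"{name:>24}: r = {np.corrcoef(pred, y_te)[0, 1]:.3f}")
```

In runs like this, all three predictors correlate with the outcome at similar levels, which illustrates why deviations from the optimal weights cost so little.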

The Argument

"People – especially the experts in a field – are much better at selecting and coding information than they are at integrating it." (573)

The choice of variables is extremely important for prediction!

This parallels a piece of folk wisdom in the machine learning literature that a better predictor will beat a better model every time.

People are good at picking out relevant information, but terrible at integrating it.

The difficulty arises in part because people generally lack a strong reference for the distribution of the predictors.

Linear models are robust to deviations from the optimal weights (see also Waller 2008 on "Fungible Weights in Multiple Regression").

Thoughts on the Argument

Particularly in prediction, looking for the true or right model can be quixotic.

The broader research project suggests that a big part of what quantitative models do predictively is focus human talent in the right place.

This all applies because the predictors are well chosen and the sample size is small (so it is hard to learn much from the data).

Dawes (1979) is an intellectual basis for supporting algorithmic decision making. Roughly, if simple models are better than experts, then with lots of data a complicated model could be much better than experts.

We Covered

Matrix notation for OLS

Estimation mechanics

Next Time: Classical Inference and Properties

Where We've Been and Where We're Going...

Last Week: regression with two variables; omitted variables, multicollinearity, interactions

This Week: matrix form of linear regression; inference and hypothesis tests

Next Week: diagnostics

Long Run: probability → inference → regression → causal inference

1 Matrix Form of Regression: Estimation; Fun With(out) Weights

2 OLS Classical Inference in Matrix Form: Unbiasedness; Classical Standard Errors

3 Agnostic Inference

4 Standard Hypothesis Tests: t-Tests; Adjusted R²; F Tests for Joint Significance

OLS Assumptions in Matrix Form

1 Linearity: y = Xβ + u

2 Random/iid sample: (yᵢ, x′ᵢ) are an iid sample from the population.

3 No perfect collinearity: X is an n × (k+1) matrix with rank k+1

4 Zero conditional mean: E[u|X] = 0

5 Homoskedasticity: var(u|X) = σ²ᵤIₙ

6 Normality: u|X ∼ N(0, σ²ᵤIₙ)

Assumption 3: No Perfect Collinearity

Definition (Rank)
The rank of a matrix is the maximum number of linearly independent columns.

If X has rank k+1, then all of its columns are linearly independent.

. . . and if all of the columns are linearly independent, then the assumption of no perfect collinearity holds.

If X has rank k+1, then (X′X) is invertible (see a linear algebra book for the proof).

This is just like how variation in X allowed us to divide by the variance in simple OLS.
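As a quick numerical illustration (a sketch with made-up data, not from the slides), the snippet below builds one design matrix whose last column is an exact linear combination of the others and one without it, then checks the column rank and the conditioning of X′X:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 - 0.5 * x2          # perfectly collinear with x1 and x2

X_bad = np.column_stack([np.ones(n), x1, x2, x3])   # n x (k+1) with a redundant column
X_ok  = np.column_stack([np.ones(n), x1, x2])       # drop the redundant column

for name, X in [("collinear X", X_bad), ("full-rank X", X_ok)]:
    rank = np.linalg.matrix_rank(X)
    # X'X is invertible exactly when X has full column rank;
    # a huge condition number signals (near-)singularity
    cond = np.linalg.cond(X.T @ X)
    print(f"{name}: columns = {X.shape[1]}, rank = {rank}, cond(X'X) = {cond:.2e}")
```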

Expected Values of Vectors

The expected value of a vector is just the expected value of its entries.

Using the zero conditional mean error assumption:

E[u|X] = ( E[u₁|X], E[u₂|X], ..., E[uₙ|X] )′ = (0, 0, ..., 0)′ = 0

Unbiasedness of β̂

Is β̂ still unbiased under Assumptions 1–4? Does E[β̂] = β?

β̂ = (X′X)⁻¹X′y                      (linearity and no collinearity)
  = (X′X)⁻¹X′(Xβ + u)
  = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u
  = Iβ + (X′X)⁻¹X′u
  = β + (X′X)⁻¹X′u

E[β̂|X] = E[β|X] + E[(X′X)⁻¹X′u|X]
       = β + (X′X)⁻¹X′E[u|X]
       = β                          (zero conditional mean)

E[β̂] = E[E[β̂|X]] = β                 (law of iterated expectations)

So, yes!
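A small Monte Carlo sketch of this result (the coefficients, sample size, and error scale are invented for illustration): hold X fixed, repeatedly redraw the errors, recompute β̂ = (X′X)⁻¹X′y, and compare the average of the estimates to the true β.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5_000
beta = np.array([5.0, -1.0, 0.5])            # (intercept, beta1, beta2)

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design matrix

estimates = np.empty((reps, 3))
for r in range(reps):
    u = rng.normal(scale=2.0, size=n)        # homoskedastic errors
    y = X @ beta + u
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)   # beta_hat = (X'X)^{-1} X'y

print("true beta:        ", beta)
print("mean of beta_hat: ", estimates.mean(axis=0).round(3))
```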

A Much Shorter Proof of Unbiasedness of β̂

A shorter (but less helpful later) proof of unbiasedness:

E[E[β̂|X]] = E[E[(X′X)⁻¹X′y|X]]       (definition of the estimator)
          = E[(X′X)⁻¹X′Xβ]           (expectation of y)
          = β

Now that we know the sampling distribution is centered on β, we want to derive the variance of the sampling distribution conditional on X.

Assumption 5: Homoskedasticity

The stated homoskedasticity assumption is: var(u|X) = σ²ᵤIₙ

To really understand this we need to know what var(u|X) is in full generality.

The variance of a vector is actually a matrix:

var[u] = Σᵤ =
  | var(u₁)       cov(u₁, u₂)   ...  cov(u₁, uₙ) |
  | cov(u₂, u₁)   var(u₂)       ...  cov(u₂, uₙ) |
  | ...                         ...              |
  | cov(uₙ, u₁)   cov(uₙ, u₂)   ...  var(uₙ)     |

This matrix is always symmetric since cov(uᵢ, uⱼ) = cov(uⱼ, uᵢ) by definition.

Assumption 5: The Meaning of Homoskedasticity

What does var(u|X) = σ²ᵤIₙ mean?

Iₙ is the n × n identity matrix; σ²ᵤ is a scalar.

Visually:

var[u] = σ²ᵤIₙ =
  | σ²ᵤ  0    0   ...  0   |
  | 0    σ²ᵤ  0   ...  0   |
  | ...                    |
  | 0    0    0   ...  σ²ᵤ |

In less matrix notation:
- var(uᵢ) = σ²ᵤ for all i (constant variance)
- cov(uᵢ, uⱼ) = 0 for all i ≠ j (implied by iid)

Rule: Variance of Linear Function of Random Vector

Recall that for a linear transformation of a random variable X we have V[aX + b] = a²V[X] with constants a and b.

We will need an analogous rule for linear functions of random vectors.

Definition (Variance of Linear Transformation of Random Vector)
Let f(u) = Au + B be a linear transformation of a random vector u with non-random vectors or matrices A and B. Then the variance of the transformation is given by:

V[f(u)] = V[Au + B] = A V[u] A′
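A quick numerical check of this rule (with an arbitrary, made-up A, B, and error variance): simulate many draws of u with V[u] = σ²I, apply the transformation, and compare the empirical covariance matrix of Au + B to A V[u] A′.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 3, 100_000
sigma2 = 4.0

A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])                         # 2 x 3 transformation matrix
B = np.array([1.0, -2.0])

U = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))    # V[u] = sigma2 * I_3
Z = U @ A.T + B                                          # each row is A u + B

V_u = sigma2 * np.eye(n)
print("A V[u] A' =\n", A @ V_u @ A.T)
print("empirical cov of Au + B =\n", np.cov(Z, rowvar=False).round(2))
```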

Conditional Variance of β̂

β̂ = β + (X′X)⁻¹X′u and E[β̂|X] = β + E[(X′X)⁻¹X′u|X] = β, so the OLS estimator is a linear function of the errors. Thus:

V[β̂|X] = V[β|X] + V[(X′X)⁻¹X′u|X]
        = V[(X′X)⁻¹X′u|X]
        = (X′X)⁻¹X′ V[u|X] ((X′X)⁻¹X′)′      (X is nonrandom given X)
        = (X′X)⁻¹X′ V[u|X] X(X′X)⁻¹
        = (X′X)⁻¹X′ (σ²I) X(X′X)⁻¹           (by homoskedasticity)
        = σ²(X′X)⁻¹X′X(X′X)⁻¹
        = σ²(X′X)⁻¹

This gives the (k+1) × (k+1) variance-covariance matrix of β̂.

To estimate V[β̂|X], we replace σ² with its unbiased estimator σ̂², which is now written using matrix notation as:

σ̂² = Σᵢ ûᵢ² / (n − (k+1)) = û′û / (n − (k+1))
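The sketch below (toy data with invented coefficients) computes these pieces directly: the residuals û, the unbiased estimate σ̂² = û′û/(n − (k+1)), and the estimated variance-covariance matrix σ̂²(X′X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 500, 2
beta = np.array([5.0, -1.0, 0.5])

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ beta + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

u_hat = y - X @ beta_hat                      # residuals
sigma2_hat = (u_hat @ u_hat) / (n - (k + 1))  # unbiased error-variance estimate
vcov_hat = sigma2_hat * XtX_inv               # (k+1) x (k+1) variance-covariance matrix

print("sigma2_hat:", round(sigma2_hat, 3))
print("vcov_hat:\n", vcov_hat.round(4))
```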

Sampling Variance for β̂

Under Assumptions 1–5, the variance-covariance matrix of the OLS estimators is given by:

V[β̂|X] = σ²(X′X)⁻¹ =
  | V[β̂₀]          Cov[β̂₀, β̂₁]   Cov[β̂₀, β̂₂]   ...  Cov[β̂₀, β̂ₖ] |
  | Cov[β̂₀, β̂₁]    V[β̂₁]         Cov[β̂₁, β̂₂]   ...  Cov[β̂₁, β̂ₖ] |
  | Cov[β̂₀, β̂₂]    Cov[β̂₁, β̂₂]   V[β̂₂]         ...  Cov[β̂₂, β̂ₖ] |
  | ...                                         ...              |
  | Cov[β̂₀, β̂ₖ]    Cov[β̂₁, β̂ₖ]   Cov[β̂₂, β̂ₖ]   ...  V[β̂ₖ]       |

Recall that standard errors are the square roots of the diagonal of this matrix.
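Using a fresh draw of the same kind of toy data, the standard errors are the square roots of the diagonal of that estimated matrix. As a sanity check, the manual calculation can be compared against a packaged OLS routine; statsmodels is used here only as one convenient option.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([5.0, -1.0, 0.5]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - X.shape[1])       # n - (k+1)
se_manual = np.sqrt(np.diag(sigma2_hat * XtX_inv))    # sqrt of the diagonal

res = sm.OLS(y, X).fit()
print("manual SEs:     ", se_manual.round(4))
print("statsmodels SEs:", np.asarray(res.bse).round(4))
```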

Overview of Inference in the General Setting

Under Assumptions 1–5, in large samples:

(β̂ⱼ − βⱼ) / SE[β̂ⱼ] ∼ N(0, 1)

In small samples, under Assumptions 1–6,

(β̂ⱼ − βⱼ) / SE[β̂ⱼ] ∼ t with n − (k+1) degrees of freedom

Estimated standard errors are:

SE[β̂ⱼ] = √( V̂[β̂]ⱼⱼ ),   where   V̂[β̂] = σ̂²ᵤ(X′X)⁻¹   and   σ̂²ᵤ = û′û / (n − (k+1))

Thus, confidence intervals and hypothesis tests proceed in essentially the same way.
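As an illustration with another draw of the same invented toy data, the snippet below forms the t-statistic for H₀: βⱼ = 0, a two-sided p-value, and a 95% confidence interval using the t distribution with n − (k+1) degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([5.0, -1.0, 0.5]) + rng.normal(scale=2.0, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - (k + 1))
se = np.sqrt(np.diag(sigma2_hat * XtX_inv))

df = n - (k + 1)
t_stats = beta_hat / se                        # test of H0: beta_j = 0
p_vals = 2 * stats.t.sf(np.abs(t_stats), df)   # two-sided p-values
crit = stats.t.ppf(0.975, df)                  # 97.5th percentile of t_df
ci_low, ci_high = beta_hat - crit * se, beta_hat + crit * se

for j in range(k + 1):
    print(f"beta_{j}: est={beta_hat[j]: .3f}  se={se[j]:.3f}  "
          f"t={t_stats[j]: .2f}  p={p_vals[j]:.3f}  "
          f"95% CI=({ci_low[j]: .3f}, {ci_high[j]: .3f})")
```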

Properties of the OLS Estimator: Summary

Theorem
Under Assumptions 1–6, the (k+1) × 1 vector of OLS estimators β̂, conditional on X, follows a multivariate normal distribution with mean β and variance-covariance matrix σ²(X′X)⁻¹:

β̂ | X ∼ N( β, σ²(X′X)⁻¹ )

Each element of β̂ (i.e., β̂₀, ..., β̂ₖ) is normally distributed, and β̂ is an unbiased estimator of β since E[β̂] = β.

Variances and covariances are given by V[β̂|X] = σ²(X′X)⁻¹.

An unbiased estimator for the error variance σ² is given by σ̂² = û′û / (n − (k+1)).

With a large sample, β̂ approximately follows the same distribution under Assumptions 1–5 only, i.e., without assuming the normality of u.

Page 173: Week 7: Multiple Regression - Princeton University · 2020. 10. 10. · Week 7: Multiple Regression Brandon Stewart1 Princeton October 12{16, 2020 1These slides are heavily in uenced

Implications of the Variance-Covariance Matrix

Note that the sampling distribution is a joint distribution because it involves multiple random variables.

This is because the elements of $\hat{\beta}$ are correlated across repeated samples.

In a practical sense, this means that our uncertainty about one coefficient is correlated with our uncertainty about the others.


Multivariate Normal: Simulation

$Y = \beta_0 + \beta_1 X_1 + u$ with $u \sim N(0, \sigma^2_u = 4)$, $\beta_0 = 5$, $\beta_1 = -1$, and $n = 100$:

[Figure: left panel shows the sampling distribution of regression lines over $x_1$; right panel shows the joint sampling distribution of $(\hat{\beta}_0, \hat{\beta}_1)$.]
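A minimal simulation sketch of the setup on this slide (same parameter values; the number of replications and the uniform design are my own choices). It collects $(\hat{\beta}_0, \hat{\beta}_1)$ across repeated samples and shows that the two estimates are correlated (negatively here, since $\bar{x}_1 > 0$):

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma_u, n, reps = 5.0, -1.0, 2.0, 100, 5000  # sigma_u^2 = 4

x1 = rng.uniform(0, 1, size=n)             # design held fixed across replications
X = np.column_stack([np.ones(n), x1])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, 2))
for r in range(reps):
    y = beta0 + beta1 * x1 + rng.normal(scale=sigma_u, size=n)
    draws[r] = XtX_inv @ X.T @ y           # (beta0_hat, beta1_hat) for this sample

print(draws.mean(axis=0))                  # close to (5, -1): unbiasedness
print(np.cov(draws, rowvar=False))         # joint sampling variability
print(np.corrcoef(draws, rowvar=False))    # beta0_hat and beta1_hat are correlated
```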


Marginals of Multivariate Normal RVs are Normal

$Y = \beta_0 + \beta_1 X_1 + u$ with $u \sim N(0, \sigma^2_u = 4)$, $\beta_0 = 5$, $\beta_1 = -1$, and $n = 100$:

[Figure: histograms of the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$; each marginal distribution is normal.]


Matrix Notation Overview


We Covered

Unbiasedness

Classical Standard Errors

Next Time: Agnostic Inference


Where We’ve Been and Where We’re Going...

Last Week: regression with two variables; omitted variables, multicollinearity, interactions

This Week: matrix form of linear regression; inference and hypothesis tests

Next Week: diagnostics

Long Run: probability → inference → regression → causal inference


1. Matrix Form of Regression
   - Estimation
   - Fun With(out) Weights

2. OLS Classical Inference in Matrix Form
   - Unbiasedness
   - Classical Standard Errors

3. Agnostic Inference

4. Standard Hypothesis Tests
   - t-Tests
   - Adjusted R²
   - F Tests for Joint Significance


Agnostic Perspective on the OLS Estimator

We know the population value of $\beta$ is:
$$\beta = E[X'X]^{-1} E[X'y]$$

How do we get an estimator of this?

Plug-in principle: replace the population expectations with their sample versions:
$$\hat{\beta} = (X'X)^{-1} X'y$$
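A small numpy sketch of the plug-in step (placeholder data again): the sample analogue $\bigl(\tfrac{1}{n}\sum_i x_i x_i'\bigr)^{-1}\bigl(\tfrac{1}{n}\sum_i x_i y_i\bigr)$ is exactly the OLS formula $(X'X)^{-1}X'y$, since the $1/n$ factors cancel.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([5.0, -1.0]) + rng.normal(size=n)

# Sample analogues of E[X'X] and E[X'y]
S_xx = (X.T @ X) / n
S_xy = (X.T @ y) / n

beta_plugin = np.linalg.solve(S_xx, S_xy)        # plug-in estimator
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)     # usual OLS formula

print(np.allclose(beta_plugin, beta_ols))        # True: the 1/n factors cancel
```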


Asymptotic OLS Inference

With this representation, we can write the OLS estimator as follows:
$$\hat{\beta} = \beta + (X'X)^{-1} X'u$$

Core idea: $X'u$ is a sum of random variables, so the CLT applies.

That, plus some asymptotic theory, allows us to say:
$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \Omega)$$

The covariance matrix $\Omega$ is given by:
$$\Omega = E[X'X]^{-1}\, E[X'\,\mathrm{Diag}(u^2)\,X]\, E[X'X]^{-1}$$

We will again be able to replace $u$ with its empirical counterpart (the residuals $\hat{u} = y - X\hat{\beta}$), and $X$ with its sample counterpart.

No need for assumptions A1 (linearity), A4 (zero conditional mean errors), or A5 (homoskedasticity)! Just IID sampling (A2), no perfect collinearity (A3), and asymptotics.


Stepping Back: The Classical Approach (Homoskedasticity)

Remember what we did before:
$$\hat{\beta} = (X'X)^{-1} X'y$$

Let $\mathrm{Var}[u|X] = \Sigma$.

Recall that we used Assumptions 1–4 to show:
$$\mathrm{Var}[\hat{\beta}|X] = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$

With homoskedasticity, $\Sigma = \sigma^2 I$, this simplifies:
$$\begin{aligned}
\mathrm{Var}[\hat{\beta}|X] &= (X'X)^{-1} X'\Sigma X (X'X)^{-1} \\
&= (X'X)^{-1} X'\sigma^2 I X (X'X)^{-1} \quad \text{(by homoskedasticity)} \\
&= \sigma^2 (X'X)^{-1} X'X (X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}
\end{aligned}$$

Replacing $\sigma^2$ with the estimate $\hat{\sigma}^2$ gives us our estimate of the covariance matrix.
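As a quick sanity check (a sketch with placeholder data), the full sandwich expression with $\Sigma = \sigma^2 I$ does collapse numerically to $\sigma^2 (X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
sigma2 = 4.0

XtX_inv = np.linalg.inv(X.T @ X)
Sigma = sigma2 * np.eye(n)                      # homoskedastic error covariance

sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv  # (X'X)^{-1} X' Sigma X (X'X)^{-1}
simplified = sigma2 * XtX_inv                   # sigma^2 (X'X)^{-1}

print(np.allclose(sandwich, simplified))        # True
```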


What Does This Rule Out?

[Figure: scatterplot of y (0–120) against x (0–100); in context, an illustration of data that the homoskedasticity assumption rules out.]


Non-constant Error Variance

Homoskedastic:
$$V[u|X] = \sigma^2 I = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$

Heteroskedastic:
$$V[u|X] = \begin{pmatrix} \sigma^2_1 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_n \end{pmatrix}$$

Independent, not identical:
- $\mathrm{Cov}(u_i, u_j | X) = 0$ for $i \neq j$
- $\mathrm{Var}(u_i | X) = \sigma^2_i$


Consequences of Heteroskedasticity Under Classical SEs

Standard error estimates are incorrect:
$$\widehat{SE}[\hat{\beta}_1] = \sqrt{\frac{\hat{\sigma}^2}{\sum_i (X_i - \bar{X})^2}}$$

- For $\alpha$-level tests, the probability of a Type I error $\neq \alpha$
- Coverage of $1-\alpha$ confidence intervals $\neq 1-\alpha$
- OLS is not BLUE

However:
- $\hat{\beta}$ is still unbiased and consistent for $\beta$
- The degree of the problem depends on how serious the heteroskedasticity is
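A small simulation sketch of why this matters (an illustrative setup of my own, not from the slides): with an error standard deviation that grows with $x$, the classical SE for $\hat{\beta}_1$ tends to understate the true sampling variability in this particular setup.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 5000
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

beta1_hats, classical_ses = [], []
for r in range(reps):
    u = rng.normal(scale=0.5 + 2.0 * x, size=n)   # error sd grows with x: heteroskedastic
    y = 5.0 - 1.0 * x + u
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2_hat = resid @ resid / (n - 2)
    beta1_hats.append(b[1])
    classical_ses.append(np.sqrt(sigma2_hat * XtX_inv[1, 1]))

print(np.std(beta1_hats))       # true sampling variability of beta1_hat
print(np.mean(classical_ses))   # classical SEs misstate it (understate it here)
```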


Heteroskedasticity Consistent Estimator

Under non-constant error variance:
$$\mathrm{Var}[u] = \Sigma = \begin{pmatrix} \sigma^2_1 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_n \end{pmatrix}$$

When $\Sigma \neq \sigma^2 I$, we are stuck with this expression:
$$\mathrm{Var}[\hat{\beta}|X] = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$$

Idea: if we can consistently estimate the components of $\Sigma$, we could directly use this expression by replacing $\Sigma$ with its estimate, $\hat{\Sigma}$.


White's Heteroskedasticity Consistent Estimator

Suppose we have heteroskedasticity of unknown form (but zero covariance):
$$V[u] = \Sigma = \begin{pmatrix} \sigma^2_1 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_n \end{pmatrix}$$

Then $V[\hat{\beta}|X] = (X'X)^{-1} X'\Sigma X (X'X)^{-1}$, and White (1980) shows that
$$\hat{V}[\hat{\beta}|X] = (X'X)^{-1} X' \begin{pmatrix} \hat{u}^2_1 & 0 & \cdots & 0 \\ 0 & \hat{u}^2_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{u}^2_n \end{pmatrix} X (X'X)^{-1}$$

is a consistent estimator of $V[\hat{\beta}|X]$ under any form of heteroskedasticity consistent with $V[u]$ above.

The estimate based on the above is called the heteroskedasticity consistent (HC) or robust standard errors. This also coincides with the agnostic standard errors!


Intuition for Robust Standard Errors

Core intuition: while $\hat{\Sigma}$ is an $n \times n$ matrix, $X'\hat{\Sigma}X$ is a $(k+1) \times (k+1)$ matrix. So there is hope of estimating it consistently as the sample size grows, even when every true error variance is unique.

$$\hat{\Sigma} = \begin{pmatrix} \hat{u}^2_1 & 0 & \cdots & 0 \\ 0 & \hat{u}^2_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{u}^2_n \end{pmatrix}$$

$$X'\hat{\Sigma}X = \begin{pmatrix}
\sum_i x_{i,1} x_{i,1} \hat{u}^2_i & \sum_i x_{i,1} x_{i,2} \hat{u}^2_i & \cdots & \sum_i x_{i,1} x_{i,k+1} \hat{u}^2_i \\
\sum_i x_{i,2} x_{i,1} \hat{u}^2_i & \sum_i x_{i,2} x_{i,2} \hat{u}^2_i & \cdots & \sum_i x_{i,2} x_{i,k+1} \hat{u}^2_i \\
\vdots & & \ddots & \vdots \\
\sum_i x_{i,k+1} x_{i,1} \hat{u}^2_i & \sum_i x_{i,k+1} x_{i,2} \hat{u}^2_i & \cdots & \sum_i x_{i,k+1} x_{i,k+1} \hat{u}^2_i
\end{pmatrix}$$
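The displayed matrix is just a sum of $(k+1)\times(k+1)$ outer products, one per observation, weighted by the squared residuals. A quick sketch (placeholder values standing in for residuals) confirming the two ways of computing it agree:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
u_hat = rng.normal(size=n)                       # stand-in for OLS residuals

# n x n "meat" matrix route
meat_big = X.T @ np.diag(u_hat**2) @ X           # X' diag(u_hat^2) X

# Equivalent (k+1) x (k+1) accumulation, one outer product per observation
meat_sum = sum(u_hat[i]**2 * np.outer(X[i], X[i]) for i in range(n))

print(np.allclose(meat_big, meat_sum))           # True
```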


White's Heteroskedasticity Consistent Estimator

Robust standard errors are easily computed with the "sandwich" formula:

1. Fit the regression and obtain the residuals $\hat{u}$.
2. Construct the "meat" matrix $\hat{\Sigma}$ with the squared residuals on the diagonal:
$$\hat{\Sigma} = \begin{pmatrix} \hat{u}^2_1 & 0 & \cdots & 0 \\ 0 & \hat{u}^2_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{u}^2_n \end{pmatrix}$$
3. Plug $\hat{\Sigma}$ into the sandwich formula to obtain the robust estimator of the variance-covariance matrix:
$$\hat{V}[\hat{\beta}|X] = (X'X)^{-1} X'\hat{\Sigma}X (X'X)^{-1}$$

There are various small-sample corrections to improve performance when the sample size is small. The most common variant (sometimes labeled HC1) is:
$$\hat{V}[\hat{\beta}|X] = \frac{n}{n - k - 1}\,(X'X)^{-1} X'\hat{\Sigma}X (X'X)^{-1}$$
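A minimal numpy sketch of the three steps (placeholder data; the uncorrected version is often labeled HC0, and HC1 applies the $n/(n-k-1)$ factor above):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 1, size=(n, k))])
y = X @ np.array([5.0, -1.0, 2.0]) + rng.normal(scale=0.5 + 2.0 * X[:, 1], size=n)

# Step 1: fit the regression and obtain the residuals
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat

# Step 2: the "meat" matrix has the squared residuals on its diagonal;
# X' Sigma_hat X can be formed without building the full n x n matrix
meat = (X * u_hat[:, None]**2).T @ X

# Step 3: sandwich formula
vcov_hc0 = XtX_inv @ meat @ XtX_inv
vcov_hc1 = n / (n - k - 1) * vcov_hc0            # common small-sample correction

print(np.sqrt(np.diag(vcov_hc1)))                # robust standard errors
```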


Notes on White's Robust Standard Errors

- Doesn't change the estimate $\hat{\beta}$.
- Provides a plug-and-play estimate of $V[\hat{\beta}]$ which can be used for SEs, confidence intervals, etc.; it does not provide an estimate of $V[u]$.
- Consistent for $V[\hat{\beta}]$ under any form of heteroskedasticity (i.e., where the error covariances are 0).
- This is a large-sample result, so it works best with large $n$.
- For small $n$, performance can be poor and the estimates are downward biased (correction factors exist but are often insufficient).
- As we saw, we can arrive at White's heteroskedasticity consistent standard errors using the plug-in principle, so in some ways these are the natural way of getting standard errors in the agnostic regression framework.
- Robust SEs converge to the same point as the bootstrap (a small comparison sketch follows below).
- This is a general framework (more to come in Week 8).
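A sketch of that bootstrap comparison (placeholder data; pairs/nonparametric bootstrap with a modest number of resamples chosen by me): the bootstrap standard errors and the HC0 robust standard errors should be close, and they converge to the same limit as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
y = 5.0 - 1.0 * x + rng.normal(scale=0.5 + 2.0 * x, size=n)

def ols(Xm, yv):
    return np.linalg.solve(Xm.T @ Xm, Xm.T @ yv)

# HC0 robust standard errors
beta_hat = ols(X, y)
u_hat = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
vcov_hc0 = XtX_inv @ ((X * u_hat[:, None]**2).T @ X) @ XtX_inv
se_robust = np.sqrt(np.diag(vcov_hc0))

# Pairs bootstrap: resample (x_i, y_i) rows with replacement
B = 2000
boot = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = ols(X[idx], y[idx])
se_boot = boot.std(axis=0)

print(se_robust, se_boot)   # typically very close
```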


We Covered

Agnostic approach to deriving the estimator (see more in the Aronow and Miller textbook if you are interested).

Robust standard errors and how they flow naturally from the plug-in principle.

Next Time: Hypothesis Tests


Where We’ve Been and Where We’re Going...

Last Week: regression with two variables; omitted variables, multicollinearity, interactions

This Week: matrix form of linear regression; inference and hypothesis tests

Next Week: diagnostics

Long Run: probability → inference → regression → causal inference


1. Matrix Form of Regression
   - Estimation
   - Fun With(out) Weights

2. OLS Classical Inference in Matrix Form
   - Unbiasedness
   - Classical Standard Errors

3. Agnostic Inference

4. Standard Hypothesis Tests
   - t-Tests
   - Adjusted R²
   - F Tests for Joint Significance


Running Example: Chilean Referendum on Pinochet

- The 1988 Chilean national plebiscite was a national referendum held to determine whether or not dictator Augusto Pinochet would extend his rule for another eight-year term in office.
- Data: national survey conducted in April and May of 1988 by FLACSO in Chile.
- Outcome: 1 if the respondent intends to vote for Pinochet, 0 otherwise. We can interpret the β slopes as marginal "effects" on the probability that the respondent votes for Pinochet.
- The plebiscite was held on October 5, 1988. The No side won with 56% of the vote, with 44% voting Yes.
- We model the intended Pinochet vote as a linear function of the gender, education, and age of respondents (a code sketch follows below).

Hypothesis Testing in R

Model the intended Pinochet vote as a linear function of gender, education, and age of respondents.

R Code
> fit <- lm(vote1 ~ fem + educ + age, data = d)

> summary(fit)

~~~~~

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***

fem 0.1360034 0.0237132 5.735 1.15e-08 ***

educ -0.0607604 0.0138649 -4.382 1.25e-05 ***

age 0.0037786 0.0008315 4.544 5.90e-06 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.4875 on 1699 degrees of freedom

Multiple R-squared: 0.05112, Adjusted R-squared: 0.04945

F-statistic: 30.51 on 3 and 1699 DF, p-value: < 2.2e-16

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 59 / 93

The t-Value for Multiple Linear Regression

Consider testing a hypothesis about a single regression coefficient βj:

H0 : βj = c

In the simple linear regression we used the t-value to test this kind of hypothesis.

We can consider the same t-value about βj for the multiple regression:

T = (β̂j − c) / SE(β̂j)

How do we compute SE(β̂j)?

SE(β̂j) = √V̂(β̂j) = √[V̂(β̂)](j,j) = √[σ̂²(X′X)⁻¹](j,j)

where A(j,j) is the (j, j) element of matrix A.

That is, take the estimated variance-covariance matrix of β̂ and take the square root of the diagonal element corresponding to j.
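To make the formula concrete, here is a minimal sketch that rebuilds σ̂²(X′X)⁻¹ by hand and checks it against R's built-in calculation; it assumes the fitted object fit from the Pinochet running example (any lm fit would do).

R Code
# Rebuild the classical variance-covariance matrix by hand
X <- model.matrix(fit)            # design matrix, including the intercept column
e <- residuals(fit)
n <- nrow(X)
k <- ncol(X) - 1                  # number of predictors, excluding the intercept

sigma2_hat <- sum(e^2) / (n - k - 1)          # estimate of the error variance
V_manual   <- sigma2_hat * solve(t(X) %*% X)  # sigma^2-hat * (X'X)^{-1}

sqrt(diag(V_manual))              # standard errors; should match vcov(fit)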

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 60 / 93

Getting the Standard Errors

R Code

> fit <- lm(vote1 ~ fem + educ + age, data = d)

> summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***

fem 0.1360034 0.0237132 5.735 1.15e-08 ***

educ -0.0607604 0.0138649 -4.382 1.25e-05 ***

age 0.0037786 0.0008315 4.544 5.90e-06 ***

---

We can pull out the variance-covariance matrix σ̂²(X′X)⁻¹ in R from the lm() object:

R Code
> V <- vcov(fit)

> V

(Intercept) fem educ age

(Intercept) 2.642311e-03 -3.455498e-04 -5.270913e-04 -3.357119e-05

fem -3.455498e-04 5.623170e-04 2.249973e-05 8.285291e-07

educ -5.270913e-04 2.249973e-05 1.922354e-04 3.411049e-06

age -3.357119e-05 8.285291e-07 3.411049e-06 6.914098e-07

> sqrt(diag(V))

(Intercept) fem educ age

0.0514034097 0.0237132251 0.0138648980 0.0008315105
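The "t value" column in summary(fit) is just each estimate divided by its standard error; a quick sketch of that correspondence (using the same illustrative fit object):

R Code
se    <- sqrt(diag(vcov(fit)))    # standard errors from the diagonal
tvals <- coef(fit) / se           # t values for H0: beta_j = 0
tvals                             # compare with the summary(fit) table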

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 61 / 93

Using the t-Value as a Test Statistic

The procedure for testing this null hypothesis (βj = c) is identical to the simple regression case, except that our reference distribution is t_{n−k−1} instead of t_{n−2}.

1 Compute the t-value as T = (β̂j − c)/SE(β̂j)

2 Compare the value to the critical value t_{α/2} for the α-level test, which under the null hypothesis satisfies P(−t_{α/2} ≤ T ≤ t_{α/2}) = 1 − α

3 Decide whether the realized value of T in our data is unusual given the distribution of the test statistic under the null hypothesis.

4 Finally, either declare that we reject H0 or not, or report the p-value.
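A short sketch of these four steps for one coefficient; it tests H0: β_educ = c for an arbitrary hypothesized value c (with c = 0 this reproduces the summary() output; object and variable names follow the running example):

R Code
c0 <- 0                                   # hypothesized value under H0
b  <- coef(fit)["educ"]
se <- sqrt(diag(vcov(fit)))["educ"]

T_val  <- (b - c0) / se                   # step 1: the t-value
df     <- df.residual(fit)                # n - k - 1
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df)           # step 2: critical value for the alpha-level test

abs(T_val) > t_crit                       # step 3: is T unusual under H0?
2 * pt(-abs(T_val), df)                   # step 4: two-sided p-value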

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 62 / 93

Confidence Intervals

To construct confidence intervals, there is again no difference compared to the case of k = 1, except that we need to use t_{n−k−1} instead of t_{n−2}.

Since we know the sampling distribution of our t-value:

T = (β̂j − c) / SE(β̂j) ∼ t_{n−k−1}

we also know the probability that the value of our test statistic falls into a given interval:

P(−t_{α/2} ≤ (β̂j − βj)/SE(β̂j) ≤ t_{α/2}) = 1 − α

Rearranging gives [β̂j − t_{α/2} SE(β̂j), β̂j + t_{α/2} SE(β̂j)], so we can construct the confidence intervals as usual using:

β̂j ± t_{α/2} · SE(β̂j)
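The interval can also be built by hand from these pieces; this sketch should agree with confint(fit) up to rounding (illustrative object names again):

R Code
alpha <- 0.05
df    <- df.residual(fit)                 # n - k - 1
t_a2  <- qt(1 - alpha / 2, df)            # t_{alpha/2} critical value

b  <- coef(fit)
se <- sqrt(diag(vcov(fit)))

cbind(lower = b - t_a2 * se,              # beta_j-hat - t_{alpha/2} * SE
      upper = b + t_a2 * se)              # beta_j-hat + t_{alpha/2} * SE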

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 63 / 93

Confidence Intervals in R

R Code

> fit <- lm(vote1 ~ fem + educ + age, data = d)

> summary(fit)

~~~~~

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***

fem 0.1360034 0.0237132 5.735 1.15e-08 ***

educ -0.0607604 0.0138649 -4.382 1.25e-05 ***

age 0.0037786 0.0008315 4.544 5.90e-06 ***

---

R Code
> confint(fit)

2.5 % 97.5 %

(Intercept) 0.303407780 0.50504909

fem 0.089493169 0.18251357

educ -0.087954435 -0.03356629

age 0.002147755 0.00540954

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 64 / 93

Testing a Hypothesis About a Linear Combination of βj

R Code
> fit <- lm(REALGDPCAP ~ Region, data = D)

> summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 4452.7 783.4 5.684 2.07e-07 ***

RegionAfrica -2552.8 1204.5 -2.119 0.0372 *

RegionAsia 148.9 1149.8 0.129 0.8973

RegionLatAmerica -271.3 1007.0 -0.269 0.7883

RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***

β̂Asia and β̂LAm are close. So we may want to test the null hypothesis:

H0 : βLAm = βAsia ⇔ βLAm − βAsia = 0

against the alternative of

H1 : βLAm ≠ βAsia ⇔ βLAm − βAsia ≠ 0

What would be an appropriate test statistic for this hypothesis?

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 65 / 93

Testing a Hypothesis About a Linear Combination of βj

R Code

> fit <- lm(REALGDPCAP ~ Region, data = D)

> summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 4452.7 783.4 5.684 2.07e-07 ***

RegionAfrica -2552.8 1204.5 -2.119 0.0372 *

RegionAsia 148.9 1149.8 0.129 0.8973

RegionLatAmerica -271.3 1007.0 -0.269 0.7883

RegionOecd 9671.3 1007.0 9.604 5.74e-15 ***

Let's consider a t-value:

T = (β̂LAm − β̂Asia) / SE(β̂LAm − β̂Asia)

We will reject H0 if T is sufficiently different from zero.

Note that both β̂LAm and β̂Asia are random variables, so the denominator has to account for the sampling variability of their difference.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 66 / 93

Testing a Hypothesis About a Linear Combination of βj

Our test statistic:

T = (β̂LAm − β̂Asia) / SE(β̂LAm − β̂Asia) ∼ t_{n−k−1}

How do you find SE(β̂LAm − β̂Asia)?

Is it SE(β̂LAm) − SE(β̂Asia)? No!

Is it SE(β̂LAm) + SE(β̂Asia)? No!

Recall the following property of the variance:

V(X ± Y) = V(X) + V(Y) ± 2Cov(X, Y)

Therefore, the standard error for a linear combination of coefficients is:

SE(β̂1 ± β̂2) = √(V(β̂1) + V(β̂2) ± 2Cov(β̂1, β̂2))

which we can calculate from the estimated covariance matrix of β̂.

Since the estimates of the coefficients are correlated, we need the covariance term.
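Equivalently, any linear combination a′β̂ has estimated variance a′V̂a, so the difference and its standard error can be written with a contrast vector; a sketch for β̂LAm − β̂Asia, assuming the coefficient ordering shown in the Region regression output (Intercept, Africa, Asia, LatAmerica, Oecd):

R Code
V <- vcov(fit)
a <- c(0, 0, -1, 1, 0)                        # contrast for beta_LatAmerica - beta_Asia

est <- as.numeric(t(a) %*% coef(fit))         # the estimated difference
se  <- sqrt(as.numeric(t(a) %*% V %*% a))     # sqrt(a' V a) = SE of the difference
est / se                                      # the t-value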

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 67 / 93

Example: GDP per capita on Regions

R Code
> fit <- lm(REALGDPCAP ~ Region, data = D)

> V <- vcov(fit)

> V

(Intercept) RegionAfrica RegionAsia RegionLatAmerica

(Intercept) 613769.9 -613769.9 -613769.9 -613769.9

RegionAfrica -613769.9 1450728.8 613769.9 613769.9

RegionAsia -613769.9 613769.9 1321965.9 613769.9

RegionLatAmerica -613769.9 613769.9 613769.9 1014054.6

RegionOecd -613769.9 613769.9 613769.9 613769.9

RegionOecd

(Intercept) -613769.9

RegionAfrica 613769.9

RegionAsia 613769.9

RegionLatAmerica 613769.9

RegionOecd 1014054.6

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 68 / 93

Example: GDP per capita on Regions

We can then compute the test statistic for the hypothesis of interest:

R Code
> se <- sqrt(V[4,4] + V[3,3] - 2*V[3,4])

> se

[1] 1052.844

>

> tstat <- (coef(fit)[4] - coef(fit)[3])/se

> tstat

RegionLatAmerica

-0.3990977

t = (β̂LAm − β̂Asia) / SE(β̂LAm − β̂Asia), where

SE(β̂LAm − β̂Asia) = √(V(β̂LAm) + V(β̂Asia) − 2Cov(β̂LAm, β̂Asia))

Plugging in we get t ≈ −0.40. So what do we conclude?

We cannot reject the null hypothesis that Latin America and Asia have the same average GDP per capita; a difference this small is consistent with chance.
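To attach a p-value to this t-value against the t_{n−k−1} reference distribution (same illustrative objects as above):

R Code
2 * pt(-abs(tstat), df = df.residual(fit))    # two-sided p-value for H0: beta_LAm = beta_Asia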

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 69 / 93

Aside: Adjusted R²

R Code
> fit <- lm(vote1 ~ fem + educ + age, data = d)

> summary(fit)

~~~~~

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.4042284 0.0514034 7.864 6.57e-15 ***

fem 0.1360034 0.0237132 5.735 1.15e-08 ***

educ -0.0607604 0.0138649 -4.382 1.25e-05 ***

age 0.0037786 0.0008315 4.544 5.90e-06 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 0.4875 on 1699 degrees of freedom

Multiple R-squared: 0.05112, Adjusted R-squared: 0.04945

F-statistic: 30.51 on 3 and 1699 DF, p-value: < 2.2e-16

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 70 / 93

Aside: Adjusted R²

R² is often used to assess in-sample model fit. Recall

R² = 1 − SSres/SStot

where SSres is the sum of squared residuals and SStot is the sum of squared deviations from the mean.

Perhaps problematically, it can be shown that R² always stays constant or increases as more explanatory variables are added.

So, how do we penalize more complex models? Adjusted R².

This makes R² more 'comparable' across models with different numbers of variables, but the next section will show you an even better way to approach that problem in a testing framework.

Still, since people report it, the next slide derives adjusted R² (but we are going to skip it).

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 71 / 93

Aside: Adjusted R²

Key idea: rewrite R² in terms of variances:

R² = 1 − (SSres/n) / (SStot/n) = 1 − V̂res / V̂tot

where V̂res = SSres/n and V̂tot = SStot/n are (biased) plug-in estimators of the residual and total variance.

What if we replace the biased estimators with the unbiased estimators

Ṽres = SSres/(n − k − 1) and Ṽtot = SStot/(n − 1)?

Some algebra gets us to

R²adj = R² − (1 − R²) · k/(n − k − 1)

where the second term acts as a model complexity penalty.

Adjusted R² will always be smaller than R² (whenever k ≥ 1) and can sometimes be negative!

For more about R², check out Kvalseth (1985), "Cautionary Note about R²": https://doi.org/10.1080/00031305.1985.10479448
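Both quantities can be computed from first principles and checked against what summary() reports; a minimal sketch using the running example's fit, with k predictors and n − k − 1 residual degrees of freedom:

R Code
y      <- model.response(model.frame(fit))    # the outcome used in the fit
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)
n <- nobs(fit)
k <- length(coef(fit)) - 1                    # predictors, excluding the intercept

r2     <- 1 - ss_res / ss_tot
r2_adj <- 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))

c(r2, r2_adj)    # compare with summary(fit)$r.squared and summary(fit)$adj.r.squared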

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 72 / 93

Why Blow Through R²

In-sample model fit is not a particularly good indicator of model fit on a new sample.

Adjusted R² is solving a problem about increasingly complex models, but by the time you reach this problem, you should be using held-out data.

Stay tuned for more in Week 8!
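One way to act on this advice is to judge the model on data it never saw; a hedged sketch of a simple random train/test split using the running example's variables (the 70/30 split and the seed are illustrative choices, not part of the slides):

R Code
set.seed(1)                                       # illustrative seed for a reproducible split
idx   <- sample(nrow(d), size = floor(0.7 * nrow(d)))
train <- d[idx, ]
test  <- d[-idx, ]

fit_tr <- lm(vote1 ~ fem + educ + age, data = train)
pred   <- predict(fit_tr, newdata = test)

# Out-of-sample R^2: how much of the held-out variation the model explains
1 - sum((test$vote1 - pred)^2) / sum((test$vote1 - mean(test$vote1))^2)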

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 73 / 93


1 Matrix Form of Regression
   Estimation
   Fun With(out) Weights

2 OLS Classical Inference in Matrix Form
   Unbiasedness
   Classical Standard Errors

3 Agnostic Inference

4 Standard Hypothesis Tests
   t-Tests
   Adjusted R²
   F Tests for Joint Significance

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 74 / 93


F Test for Joint Significance of Coefficients

In research we often want to test a joint hypothesis which involves multiple linear restrictions (e.g. β1 = β2 = β3 = 0).

Suppose our regression model is:

Voted = β0 + γ1FEMALE + β1EDUCATION + γ2(FEMALE · EDUCATION) + β2AGE + γ3(FEMALE · AGE) + u

and we want to test H0: γ1 = γ2 = γ3 = 0.

Substantively, what question are we asking?

→ Do females and males vote systematically differently from each other? (Under the null, there is no difference in either the intercept or the slopes between females and males.)

This is an example of a joint hypothesis test involving three restrictions: γ1 = 0, γ2 = 0, and γ3 = 0.

If all the interaction terms and the lower-order group term are close to zero, then we fail to reject the null hypothesis of no gender difference.

F tests allow us to test joint hypotheses.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 75 / 93


The χ² Distribution

To test more than one hypothesis jointly we need to introduce some new probability distributions.

Suppose Z1, ..., Zn are n i.i.d. random variables following N(0, 1).

Then the sum of their squares, X = Z1² + Z2² + · · · + Zn², is distributed according to the χ² distribution with n degrees of freedom, X ∼ χ²n.

[Figure: χ² density curves with 1, 4, and 15 degrees of freedom.]

Properties: X > 0, E[X] = n and V[X] = 2n. In R: dchisq(), pchisq(), rchisq()

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 76 / 93

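A quick simulation sketch (not from the slides; the degrees of freedom and seed are arbitrary) confirming that a sum of squared standard normals behaves like a χ² draw:

R Code (illustrative sketch)
set.seed(2)
n.df <- 4
sims <- replicate(10000, sum(rnorm(n.df)^2))   # sums of n.df squared N(0,1) draws
mean(sims)                                     # approximately n.df
var(sims)                                      # approximately 2 * n.df
mean(sims <= qchisq(0.95, df = n.df))          # approximately 0.95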

The F distribution

The F distribution arises as a ratio of two independent chi-squared distributed random variables:

F = (X1/df1) / (X2/df2) ∼ Fdf1,df2

where X1 ∼ χ² with df1 degrees of freedom, X2 ∼ χ² with df2 degrees of freedom, and X1 ⊥⊥ X2.

df1 and df2 are called the numerator degrees of freedom and the denominator degrees of freedom.

[Figure: F density curves for (df1, df2) = (1, 2), (5, 5), (30, 20), and (500, 200).]

In R: df(), pf(), rf()

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 77 / 93

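Another small simulation sketch (arbitrary degrees of freedom, not from the slides) connecting the two distributions:

R Code (illustrative sketch)
set.seed(3)
df1 <- 3; df2 <- 100
ratio <- (rchisq(10000, df1) / df1) / (rchisq(10000, df2) / df2)
quantile(ratio, 0.95)     # close to the exact critical value below
qf(0.95, df1, df2)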

F Test against H0: γ1 = γ2 = γ3 = 0

The F statistic can be calculated by the following procedure:

1 Fit the Unrestricted Model (UR), which does not impose H0:

Vote = β0 + γ1FEM + β1EDUC + γ2(FEM · EDUC) + β2AGE + γ3(FEM · AGE) + u

2 Fit the Restricted Model (R), which does impose H0:

Vote = β0 + β1EDUC + β2AGE + u

3 From the two results, compute the F statistic:

F0 = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]

where SSR = sum of squared residuals, q = number of restrictions, k = number of predictors in the unrestricted model, and n = number of observations.

Intuition: F0 ≈ (increase in prediction error from imposing the restrictions) / (original prediction error).

The F statistic has the following sampling distributions:

Under Assumptions 1–6, F0 ∼ Fq,n−k−1 regardless of the sample size.

Under Assumptions 1–5, qF0 is asymptotically distributed χ² with q degrees of freedom as n → ∞ (see next section).

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 78 / 93


Unrestricted Model (UR)

R Code
> fit.UR <- lm(vote1 ~ fem + educ + age + fem:age + fem:educ, data = Chile)
> summary(fit.UR)
~~~~~
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.293130   0.069242   4.233 2.42e-05 ***
fem          0.368975   0.098883   3.731 0.000197 ***
educ        -0.038571   0.019578  -1.970 0.048988 *
age          0.005482   0.001114   4.921 9.44e-07 ***
fem:age     -0.003779   0.001673  -2.259 0.024010 *
fem:educ    -0.044484   0.027697  -1.606 0.108431
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.487 on 1697 degrees of freedom
Multiple R-squared: 0.05451, Adjusted R-squared: 0.05172
F-statistic: 19.57 on 5 and 1697 DF, p-value: < 2.2e-16

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 79 / 93

Restricted Model (R)

R Code
> fit.R <- lm(vote1 ~ educ + age, data = Chile)
> summary(fit.R)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.4878039  0.0497550   9.804  < 2e-16 ***
educ        -0.0662022  0.0139615  -4.742 2.30e-06 ***
age          0.0035783  0.0008385   4.267 2.09e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4921 on 1700 degrees of freedom
Multiple R-squared: 0.03275, Adjusted R-squared: 0.03161
F-statistic: 28.78 on 2 and 1700 DF, p-value: 5.097e-13

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 80 / 93

F Test in R

R Code
> SSR.UR <- sum(resid(fit.UR)^2)    # = 402
> SSR.R <- sum(resid(fit.R)^2)      # = 411
> DFdenom <- df.residual(fit.UR)    # = 1697
> DFnum <- 3
> F <- ((SSR.R - SSR.UR)/DFnum) / (SSR.UR/DFdenom)
> F
[1] 13.01581
> qf(0.99, DFnum, DFdenom)
[1] 3.793171

Given the above, what do we conclude?

F0 = 13 is greater than the critical value (3.79) for a .01 level test, so we reject the null hypothesis.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 81 / 93

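For reference (this is not on the slides), base R's anova() runs the same restricted-versus-unrestricted comparison for nested lm fits; a one-line sketch assuming the fit.R and fit.UR objects above:

R Code (illustrative sketch)
anova(fit.R, fit.UR)   # reports the same F ≈ 13.02 on 3 and 1697 df, plus its p-value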

Null Distribution, Critical Value, and Test Statistic

Note that the F statistic is always positive, so we only look at the right tail of the reference F (or, in a large sample, χ²) distribution.

[Figure: density of the F(3, 1703 − 5 − 1) null distribution, with the critical value and the observed test statistic in the right tail.]

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 82 / 93
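The slide mentions the χ² reference distribution for large samples; here is a small sketch (using the F0 ≈ 13.02 computed on the previous slides) comparing the exact and asymptotic p-values:

R Code (illustrative sketch)
pf(13.01581, 3, 1697, lower.tail = FALSE)          # exact Fq,n−k−1 reference
pchisq(3 * 13.01581, df = 3, lower.tail = FALSE)   # large-sample χ² approximation for qF0
# both are essentially zero here; the approximation improves as n grows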

F Test Examples I

The F test can be used to test various joint hypotheses which involve multiple linear restrictions. Consider the regression model:

Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + u

We may want to test: H0: β1 = β2 = ... = βk = 0

Have any of you used an F-test like this in your research?

This is called the omnibus test and is routinely reported by statistical software.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 83 / 93


Omnibus Test in R

R Code
> summary(fit.UR)
~~~~~
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.293130   0.069242   4.233 2.42e-05 ***
fem          0.368975   0.098883   3.731 0.000197 ***
educ        -0.038571   0.019578  -1.970 0.048988 *
age          0.005482   0.001114   4.921 9.44e-07 ***
fem:age     -0.003779   0.001673  -2.259 0.024010 *
fem:educ    -0.044484   0.027697  -1.606 0.108431
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.487 on 1697 degrees of freedom
Multiple R-squared: 0.05451, Adjusted R-squared: 0.05172
F-statistic: 19.57 on 5 and 1697 DF, p-value: < 2.2e-16

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 84 / 93

Omnibus Test in R with Random Noise

R Code
> set.seed(08540)
> p <- 10; x <- matrix(rnorm(p*1000), nrow=1000)
> y <- rnorm(1000); summary(lm(y~x))
~~~~~
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0115475  0.0320874  -0.360   0.7190
x1          -0.0019803  0.0333524  -0.059   0.9527
x2           0.0666275  0.0314087   2.121   0.0341 *
x3          -0.0008594  0.0321270  -0.027   0.9787
x4           0.0051185  0.0333678   0.153   0.8781
x5           0.0136656  0.0322592   0.424   0.6719
x6           0.0102115  0.0332045   0.308   0.7585
x7          -0.0103903  0.0307639  -0.338   0.7356
x8          -0.0401722  0.0318317  -1.262   0.2072
x9           0.0553019  0.0315548   1.753   0.0800 .
x10          0.0410906  0.0319742   1.285   0.1991
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.011 on 989 degrees of freedom
Multiple R-squared: 0.01129, Adjusted R-squared: 0.001294
F-statistic: 1.129 on 10 and 989 DF, p-value: 0.3364

Even with pure noise, one coefficient (x2) happens to clear the 0.05 threshold, but the omnibus F test (p = 0.34) correctly fails to reject the null that all ten coefficients are zero.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 85 / 93
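As an aside (not on the slides), the omnibus F reported by summary() is itself a nested-model test against the intercept-only model; a sketch reusing the simulated y and x from the code above:

R Code (illustrative sketch)
fit.noise <- lm(y ~ x)
anova(lm(y ~ 1), fit.noise)   # F = 1.129 on 10 and 989 df, matching the summary output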

F Test Examples II

The F test can be used to test various joint hypotheses which involve multiple linear restrictions. Consider the regression model:

Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + u

Next, let's consider: H0: β1 = β2 = β3

What question are we asking?

→ Are the coefficients on X1, X2 and X3 different from each other?

How many restrictions?

→ Two (β1 − β2 = 0 and β2 − β3 = 0)

How do we fit the restricted model?

→ The null hypothesis implies that the model can be written as:

Y = β0 + β1(X1 + X2 + X3) + ... + βkXk + u

So we create a new variable X* = X1 + X2 + X3 and fit:

Y = β0 + β1X* + ... + βkXk + u

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 86 / 93


Testing Equality of Coefficients in R

R Code
> fit.UR2 <- lm(REALGDPCAP ~ Asia + LatAmerica + Transit + Oecd, data = D)
> summary(fit.UR2)
~~~~~
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1899.9      914.9   2.077   0.0410 *
Asia          2701.7     1243.0   2.173   0.0327 *
LatAmerica    2281.5     1112.3   2.051   0.0435 *
Transit       2552.8     1204.5   2.119   0.0372 *
Oecd         12224.2     1112.3  10.990   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3034 on 80 degrees of freedom
Multiple R-squared: 0.7096, Adjusted R-squared: 0.6951
F-statistic: 48.88 on 4 and 80 DF, p-value: < 2.2e-16

Are the coefficients on Asia, LatAmerica and Transit statistically significantly different?

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 87 / 93

Testing Equality of Coefficients in R

R Code
> D$Xstar <- D$Asia + D$LatAmerica + D$Transit
> fit.R2 <- lm(REALGDPCAP ~ Xstar + Oecd, data = D)
> SSR.UR2 <- sum(resid(fit.UR2)^2)
> SSR.R2 <- sum(resid(fit.R2)^2)
> DFdenom <- df.residual(fit.UR2)
> F <- ((SSR.R2 - SSR.UR2)/2) / (SSR.UR2/DFdenom)
> F
[1] 0.08786129
> pf(F, 2, DFdenom, lower.tail = FALSE)
[1] 0.9159762

So, what do we conclude?

The three coefficients are statistically indistinguishable from each other, with a p-value of 0.916.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 88 / 93

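The same comparison can also be run with anova() on the two nested fits (an aside, not from the slides), avoiding the manual sums of squared residuals:

R Code (illustrative sketch)
anova(fit.R2, fit.UR2)   # F ≈ 0.088 on 2 and 80 df, p ≈ 0.916, as above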

t Test vs. F Test

Consider the hypothesis test of

H0: β1 = β2 vs. H1: β1 ≠ β2

What ways have we learned to conduct this test?

Option 1: Compute T = (β̂1 − β̂2)/SE(β̂1 − β̂2) and do the t test.

Option 2: Create X* = X1 + X2, fit the restricted model, compute F = (SSRR − SSRUR) / (SSRUR/(n − k − 1)), and do the F test.

It turns out these two tests give identical results. This is because

X ∼ tn−k−1 if and only if X² ∼ F1,n−k−1

So, for testing a single hypothesis it does not matter whether one does a t test or an F test.

Usually, the t test is used for single hypotheses and the F test is used for joint hypotheses.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 89 / 93

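Here is a minimal sketch of the t²-equals-F equivalence on simulated data (illustrative; for simplicity the single restriction tested is β2 = 0 rather than β1 = β2):

R Code (illustrative sketch)
set.seed(4)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)
full <- lm(y ~ x1 + x2)
restricted <- lm(y ~ x1)                       # imposes the single restriction beta2 = 0
t.stat <- summary(full)$coefficients["x2", "t value"]
F.stat <- anova(restricted, full)$F[2]
c(t.squared = t.stat^2, F = F.stat)            # identical up to floating-point error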

Some More Notes on F Tests

The F value can also be calculated from R²:

F = [(R²UR − R²R)/q] / [(1 − R²UR)/(n − k − 1)]

F tests only work for testing nested models, i.e. the restricted model must be a special case of the unrestricted model.

For example, F tests cannot be used to test

Y = β0 + β1X1 + β2X2 + u

against

Y = β0 + β1X1 + β3X3 + u

because neither model is a special case of the other.

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 90 / 93

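A quick check that the R² form and the SSR form of the statistic coincide (a sketch on simulated data with one restriction; names and numbers are illustrative):

R Code (illustrative sketch)
set.seed(5)
n <- 150; x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.3 * x1 + 0.3 * x2 + rnorm(n)
ur <- lm(y ~ x1 + x2); r <- lm(y ~ x1)           # restricted model drops x2, so q = 1
q <- 1; k <- 2
R2.ur <- summary(ur)$r.squared; R2.r <- summary(r)$r.squared
((R2.ur - R2.r)/q) / ((1 - R2.ur)/(n - k - 1))                            # R² form
((sum(resid(r)^2) - sum(resid(ur)^2))/q) / (sum(resid(ur)^2)/(n - k - 1)) # SSR form, identical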

Some More Notes on F Tests

Joint significance does not necessarily imply the significance of individual coefficients, or vice versa:

[Figure 1.5: t- versus F-Tests]

Image Credit: Hayashi (2011) Econometrics

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 91 / 93

Goal Check: Understand lm() Output

Call:
lm(formula = sr ~ pop15, data = LifeCycleSavings)

Residuals:
   Min     1Q Median     3Q    Max
-8.637 -2.374  0.349  2.022 11.155

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.49660    2.27972   7.675 6.85e-10 ***
pop15       -0.22302    0.06291  -3.545 0.000887 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.03 on 48 degrees of freedom
Multiple R-squared: 0.2075, Adjusted R-squared: 0.191
F-statistic: 12.57 on 1 and 48 DF, p-value: 0.0008866

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 92 / 93
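To tie this output back to the week's formulas, here are a few checks one could run (a sketch; LifeCycleSavings ships with base R, and the numbers in the comments are the ones printed above):

R Code (illustrative sketch)
fit <- lm(sr ~ pop15, data = LifeCycleSavings)
s <- summary(fit)
s$coefficients["pop15", "t value"]^2            # ≈ 12.57: with one predictor, t² equals the omnibus F
s$fstatistic                                    # value ≈ 12.57, numdf = 1, dendf = 48
1 - (1 - s$r.squared) * (50 - 1) / (50 - 1 - 1) # ≈ 0.191, the adjusted R² (n = 50, k = 1)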

This Week in Review

You now have seen the full linear regression model!

Multiple regression is much like the regression formulations we have already seen.

We showed how to estimate the coefficients and get the variance-covariance matrix.

You can also calculate robust standard errors, which provide a plug-and-play replacement.

Much of the hypothesis test infrastructure ports over nicely, plus there are new joint tests we can use.

Next week: Troubleshooting the Linear Model!

Stewart (Princeton) Week 7: Multiple Regression October 12–16, 2020 93 / 93
