Linear models and their mathematical foundations: Multiple linear regression, part I

Steffen Unkel
Department of Medical Statistics, University Medical Center Göttingen, Germany
Winter term 2018/19

Contents: Models, estimation and goodness-of-fit · Generalized least squares · Misspecifications and orthogonalization



Introduction

In multiple linear regression, we attempt to predict a continuous (random) response variable y on the basis of an assumed linear relationship with several (fixed) predictor variables x1, x2, ..., xk.

Given a sample of n observations on y and the associated x variables, the n model equations can be written as

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
$$

or, more compactly,

$$
\underset{n \times 1}{\mathbf{y}} \;=\; \underset{n \times (k+1)}{\mathbf{X}} \;\underset{(k+1) \times 1}{\boldsymbol{\beta}} \;+\; \underset{n \times 1}{\boldsymbol{\varepsilon}}\,.
$$


Assumptions

Model assumptions:

A1 E(ε) = 0.

A2 Cov(ε) = σ²I.

Occasionally, we will make use of the following additional assumption:

A3 ε ∼ N_n(0, σ²I).

For the time being, we assume that for the n × (k + 1) design matrix it holds that n > k + 1 and rank(X) = k + 1.

The β regression coefficients are sometimes referred to as partial regression coefficients.


Least squares estimation of β

To find β̂, we solve the optimization problem min_β ε⊤ε.

If y = Xβ + ε, where X has size n × (k + 1) with n > k + 1 and rank(X) = k + 1, then the (ordinary) least squares estimator β̂ that minimizes ε⊤ε is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}\,.$$

The least squares estimator is derived without any of the assumptions A1–A3.

If β̂ = (X⊤X)⁻¹X⊤y, then ε̂ = y − Xβ̂ = y − ŷ is the vector of residuals.
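As a quick numerical illustration (not part of the original slides), the estimator can be computed directly with NumPy; the simulated data and all variable names below are purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, k = 50, 3                                   # sample size and number of predictors

X = np.column_stack([np.ones(n),               # first column of ones for the intercept
                     rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])    # assumed "true" coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary least squares: beta_hat = (X'X)^{-1} X'y
# (solve the normal equations rather than forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat                           # fitted values
resid = y - y_hat                              # residual vector epsilon_hat
```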


Basic geometry of least squares

Figure: A general point Xβ in the prediction space, shown together with the origin O and the observation vector y.


Basic geometry of least squares (2)

Figure: The right-angled triangle formed by the vectors y, ŷ = Xβ̂ and ε̂ = y − ŷ, with ŷ lying in the prediction space and ε̂ orthogonal to it.


Properties of the least squares estimator β̂

1 If assumption A1 holds, then E(β̂) = β.

2 If assumption A2 holds, then Cov(β̂) = σ²(X⊤X)⁻¹.

3 Gauss-Markov theorem: If A1 and A2 hold, the least squares estimators β̂j, j = 0, ..., k, have minimum variance among all linear unbiased estimators; the β̂j (j = 0, ..., k) are best linear unbiased estimators (BLUE).

Corollary: If A1 and A2 hold, the BLUE of a⊤β is a⊤β̂, where β̂ = (X⊤X)⁻¹X⊤y.


Properties (2)

4 If x = (1, x1, ..., xk)⊤ and z = (1, c1x1, ..., ckxk)⊤, then ŷ = β̂⊤x = β̂z⊤z, where β̂z is the least squares estimator from the regression of y on z.

Corollary: The fitted value ŷ is invariant to a full-rank linear transformation on the x variables.


Estimation of σ²

We estimate σ² by

$$
s^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^\top\hat{\boldsymbol{\beta}}\bigr)^2
    = \frac{1}{n-k-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^\top(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})
    = \frac{\mathbf{y}^\top\mathbf{y} - \hat{\boldsymbol{\beta}}^\top\mathbf{X}^\top\mathbf{y}}{n-k-1}
    = \frac{\mathrm{SSE}}{n-k-1}\,,
$$

where x_i⊤ is the ith row of X and SSE = y⊤y − β̂⊤X⊤y.

If A1 and A2 hold, then E(s²) = σ² and an unbiased estimator of Cov(β̂) is Ĉov(β̂) = s²(X⊤X)⁻¹.
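Continuing the hypothetical NumPy sketch from above (reusing X, y, beta_hat, resid, n and k), s² and the estimated covariance matrix of β̂ could be obtained as:

```python
sse = resid @ resid                            # SSE = (y - X beta_hat)'(y - X beta_hat)
s2 = sse / (n - k - 1)                         # unbiased estimate of sigma^2

cov_beta_hat = s2 * np.linalg.inv(X.T @ X)     # estimated Cov(beta_hat) = s^2 (X'X)^{-1}
se_beta_hat = np.sqrt(np.diag(cov_beta_hat))   # standard errors of the coefficients
```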


Maximum likelihood estimation

To the assumptions A1 and A2, we now add A3: ε ∼ N_n(0, σ²I).

If y ∼ N_n(Xβ, σ²I) and X is an n × (k + 1) design matrix with rank(X) = k + 1 < n, the maximum likelihood estimators (MLEs) of β and σ² are

$$\hat{\boldsymbol{\beta}}_{\mathrm{MLE}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}\,, \qquad
\hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^\top(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})\,.$$

Whereas β̂MLE is the same as the least squares estimator, σ̂²MLE is biased.
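In the illustrative sketch, the two variance estimators differ only in their denominators, and the MLE is systematically smaller:

```python
sigma2_mle = sse / n            # ML estimate divides by n, hence biased downward
print(s2, sigma2_mle)           # sigma2_mle = s2 * (n - k - 1) / n < s2
```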


Some properties of the MLEs

1 The MLEs β̂MLE and σ̂²MLE have the following distributional properties:

(i) β̂MLE ∼ N_{k+1}(β, σ²(X⊤X)⁻¹).
(ii) nσ̂²MLE/σ² ∼ χ²(n − k − 1).
(iii) β̂MLE and σ̂²MLE are independent.

2 If y ∼ N_n(Xβ, σ²I), then β̂MLE and σ̂²MLE are jointly sufficient statistics for the parameters β and σ².


The multiple linear regression model in centered form

Let x̄_j = Σ_{i=1}^n x_ij / n (j = 1, ..., k). The centered multiple linear regression model for y is

$$\mathbf{y} = (\mathbf{1}_n \;\; \mathbf{X}_c)\begin{pmatrix}\alpha \\ \boldsymbol{\beta}_1\end{pmatrix} + \boldsymbol{\varepsilon}\,,$$

where α = β0 + β1x̄1 + ⋯ + βk x̄k, β1 = (β1, ..., βk)⊤, Xc = (I_n − n⁻¹1_n1_n⊤)X1 and

$$
\mathbf{X}_1 = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}.
$$

Recall the centering matrix I_n − n⁻¹1_n1_n⊤.


Least squares estimators in the centered model

The least squares estimators are given by

$$\hat{\alpha} = \bar{y}\,, \qquad \hat{\boldsymbol{\beta}}_1 = (\mathbf{X}_c^\top\mathbf{X}_c)^{-1}\mathbf{X}_c^\top\mathbf{y}\,.$$

The estimators above are the same as β̂ = (X⊤X)⁻¹X⊤y with the adjustment

$$\hat{\beta}_0 = \hat{\alpha} - \hat{\beta}_1\bar{x}_1 - \cdots - \hat{\beta}_k\bar{x}_k = \bar{y} - \hat{\boldsymbol{\beta}}_1^\top\bar{\mathbf{x}}\,.$$

We can express the fitted values ŷ1, ..., ŷn in centered form as ŷi = α̂ + β̂1(xi1 − x̄1) + ⋯ + β̂k(xik − x̄k).

We can write the error sum of squares as

$$\mathrm{SSE} = \sum_{i=1}^{n}(y_i - \bar{y})^2 - \hat{\boldsymbol{\beta}}_1^\top\mathbf{X}_c^\top\mathbf{y} = \mathbf{y}^\top\mathbf{y} - \hat{\boldsymbol{\beta}}^\top\mathbf{X}^\top\mathbf{y}\,.$$
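A short check of the centered fit with the hypothetical data from the earlier sketches; it reproduces β̂ from the uncentered model:

```python
X1 = X[:, 1:]                                  # predictor columns without the intercept
Xc = X1 - X1.mean(axis=0)                      # centered design matrix X_c
alpha_hat = y.mean()                           # alpha_hat = ybar
beta1_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)

beta0_hat = alpha_hat - X1.mean(axis=0) @ beta1_hat   # back-transform to the intercept
print(np.allclose(np.concatenate(([beta0_hat], beta1_hat)), beta_hat))  # True
```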


Coefficient of determination

Recall the coefficient of determination:

$$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}\,.$$

For the multiple linear regression model, R² (which satisfies 0 ≤ R² ≤ 1) can be written as

$$R^2 = \frac{\hat{\boldsymbol{\beta}}^\top\mathbf{X}^\top\mathbf{y} - n\bar{y}^2}{\mathbf{y}^\top\mathbf{y} - n\bar{y}^2}
= 1 - \frac{\mathbf{y}^\top\mathbf{y} - \hat{\boldsymbol{\beta}}^\top\mathbf{X}^\top\mathbf{y}}{\mathbf{y}^\top\mathbf{y} - n\bar{y}^2}
= \frac{\hat{\boldsymbol{\beta}}_1^\top\mathbf{X}_c^\top\mathbf{X}_c\hat{\boldsymbol{\beta}}_1}{\mathbf{y}^\top\mathbf{y} - n\bar{y}^2}\,.$$
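A sketch verifying on the hypothetical data from the earlier snippets that the expressions above agree:

```python
sst = np.sum((y - y.mean()) ** 2)              # SST = y'y - n*ybar^2
ssr = np.sum((y_hat - y.mean()) ** 2)          # SSR

r2_from_ssr = ssr / sst
r2_from_sse = 1.0 - sse / sst
r2_centered = (beta1_hat @ Xc.T @ Xc @ beta1_hat) / sst
print(np.allclose([r2_from_ssr, r2_from_sse], r2_centered))  # all three coincide
```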


Some properties of R2

1 The positive square root of R is the multiple correlationbetween the response and the predictors.

2 The multiple correlation is equal to the simple correlationbetween the observed yi ’s and the fitted yi ’s.

3 R2 is invariant to full-rank linear transformations on the x ’sand to a scale change on y .


Adjusted R²

Adding a predictor to the model cannot decrease the value of R².

However, this may conflict with the principle of parsimony.

An adjusted R²a has been proposed that includes a penalty for adding a predictor variable to the model.

It is defined as

$$R_a^2 = \frac{\bigl(R^2 - \tfrac{k}{n-1}\bigr)(n-1)}{n-k-1} = \frac{(n-1)R^2 - k}{n-k-1}\,.$$

R²a can be negative.
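Continuing the sketch (with r2 denoting any of the equivalent R² values computed above):

```python
r2 = r2_from_sse
r2_adj = ((n - 1) * r2 - k) / (n - k - 1)      # adjusted R^2; may drop below zero
print(r2, r2_adj)                              # r2_adj <= r2, with equality only if r2 = 1
```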


Model setting

We now consider situations for which assumption A2 is violated.

Instead, we impose the assumption Cov(ε) = σ²V, where V ≠ I is a known symmetric positive definite matrix of size n × n.

The matrix V has n diagonal elements and $\binom{n}{2}$ elements above (or below) the diagonal.

In certain applications, a simpler structure for V (e.g. diagonal) is assumed.


Generalized least squares (GLS) estimators

For the model with Cov(ε) = σ²V, we obtain the following results:

(i) The BLUE of β is β̂ = (X⊤V⁻¹X)⁻¹X⊤V⁻¹y.

(ii) The covariance matrix of β̂ is Cov(β̂) = σ²(X⊤V⁻¹X)⁻¹.

(iii) An unbiased estimator of σ² is

$$
s^2 = \frac{(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^\top\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})}{n-k-1}
    = \frac{\mathbf{y}^\top\bigl[\mathbf{V}^{-1} - \mathbf{V}^{-1}\mathbf{X}(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}\bigr]\mathbf{y}}{n-k-1}\,,
$$

where β̂ is given in (i).
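A minimal sketch of the GLS computations under an assumed known diagonal V (heteroscedastic errors), reusing the illustrative X, beta_true, rng, n and k from above; the construction of V and the data is purely hypothetical:

```python
v_diag = rng.uniform(0.5, 2.0, size=n)         # known, unequal error variances (up to sigma^2)
V_inv = np.diag(1.0 / v_diag)                  # V^{-1} for diagonal V

y_v = X @ beta_true + rng.normal(scale=0.5 * np.sqrt(v_diag))   # Cov(eps) = 0.25 * V

# BLUE under Cov(eps) = sigma^2 V: (X' V^{-1} X)^{-1} X' V^{-1} y
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y_v)

r = y_v - X @ beta_gls
s2_gls = (r @ V_inv @ r) / (n - k - 1)         # unbiased estimate of sigma^2
```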


Maximum likelihood estimators

For the model with Cov(ε) = σ²V (and, for maximum likelihood, the normality assumption ε ∼ N_n(0, σ²V)), the maximum likelihood estimators are

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{y}\,, \qquad
\hat{\sigma}^2 = \frac{(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^\top\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})}{n}\,.$$


Misspecification of the error structure

Suppose the model is y = Xβ + ε with E(y) = Xβ and Cov(y) = σ²V, and one uses the ordinary least squares estimator β̂OLS = (X⊤X)⁻¹X⊤y to estimate β.

The consequences for E(β̂OLS) and Cov(β̂OLS) of using the ordinary least squares estimator when the error structure Cov(ε) = σ²V holds will be discussed in the tutorial.


Model misspecification

Suppose the model is y = Xβ + ε with E(y) = Xβ and Cov(y) = σ²I.

Let the model be partitioned as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
= (\mathbf{X}_1 \;\; \mathbf{X}_2)\begin{pmatrix}\boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2\end{pmatrix} + \boldsymbol{\varepsilon}
= \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}\,.$$

Suppose we leave out X2β2 when it should be included, i.e., when β2 ≠ 0.

By doing so, we misspecify E(y).


Reduced model

We consider estimation of β1 when underfitting.

We write the reduced model as

$$\mathbf{y} = \mathbf{X}_1\boldsymbol{\beta}_1^* + \boldsymbol{\varepsilon}^*\,,$$

using β1* to emphasize that these parameters and their estimates β̂1* will be different from β1 and β̂1, respectively, in the full model.


Fitting the reduced model

If we fit the model y = X1β1* + ε* when the correct model is y = X1β1 + X2β2 + ε with Cov(y) = σ²I, then the following results for the least squares estimator β̂1* = (X1⊤X1)⁻¹X1⊤y can be obtained:

E(β̂1*): will be discussed in the tutorial,

Cov(β̂1*) = σ²(X1⊤X1)⁻¹.

Furthermore, Cov(β̂1) − Cov(β̂1*) = σ²AB⁻¹A⊤, which is a positive definite matrix, where A = (X1⊤X1)⁻¹X1⊤X2 and B = X2⊤X2 − X2⊤X1A. Therefore, Var(β̂j) > Var(β̂j*).
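The bias of β̂1* when a relevant, correlated predictor is omitted can be seen in a small simulation; the setup below is hypothetical (reusing rng and n from the earlier sketch) and not from the lecture:

```python
reps, slope_estimates = 2000, []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # omitted predictor, correlated with x1
    yy = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(scale=0.5, size=n)

    X1_red = np.column_stack([np.ones(n), x1])        # reduced design matrix (x2 left out)
    b_star = np.linalg.solve(X1_red.T @ X1_red, X1_red.T @ yy)
    slope_estimates.append(b_star[1])

# The average estimate is roughly 2 + 1.5 * 0.8 = 3.2 rather than the true slope 2.0
print(np.mean(slope_estimates), np.var(slope_estimates))
```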


Underfitting and overfitting

Underfitting leads in general to biased results but lower variances.

Overfitting leads to unbiased results but greater variances.

Seek an adequate balance between a biased model and a model with large variances.

Task: find an optimum subset of predictors.


Orthogonalization

Suppose that in the full model y = X1β1 + X2β2 + ε the columns of X1 are orthogonal to the columns of X2, that is, X1⊤X2 = O.

If X1⊤X2 = O, then the least squares estimator β̂1* obtained from fitting the reduced model is unbiased: E(β̂1*) = β1.

Moreover, if X1⊤X2 = O, then the estimator of β1 in the full model is the same as the estimator of β1* in the reduced model.

The process of orthogonalization can give additional insights into the meaning of the regression coefficients.
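A sketch of the orthogonal case: X2 is constructed so that X1⊤X2 = O, and the X1 coefficients then agree between the full and the reduced fit (names and data are again illustrative, reusing rng and n from the earlier sketches):

```python
X1_blk = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2))
# Residualize Z on X1_blk so that X1_blk' X2_blk = O
X2_blk = Z - X1_blk @ np.linalg.solve(X1_blk.T @ X1_blk, X1_blk.T @ Z)

y_o = X1_blk @ np.array([1.0, 2.0]) + X2_blk @ np.array([0.5, -1.0]) \
      + rng.normal(scale=0.3, size=n)

X_full = np.hstack([X1_blk, X2_blk])
b_full = np.linalg.solve(X_full.T @ X_full, X_full.T @ y_o)
b_reduced = np.linalg.solve(X1_blk.T @ X1_blk, X1_blk.T @ y_o)
print(np.allclose(b_full[:2], b_reduced))      # True: identical X1 coefficients
```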
