The General Linear Model. The Simple Linear Model: Linear Regression


TRANSCRIPT

Page 1:

The General Linear Model

Page 2:

The Simple Linear Model

Linear Regression

Page 3:

Suppose that we have two variables

1. Y – the dependent variable (response variable)

2. X – the independent variable (explanatory variable, factor)

Page 4:

$X$, the independent variable, may or may not be a random variable.

Sometimes it is randomly observed.

Sometimes specific values of $X$ are selected.

Page 5:

The dependent variable, $Y$, is assumed to be a random variable.

The distribution of $Y$ is dependent on $X$.

The objective is to determine that distribution using statistical techniques (estimation and hypothesis testing).

Page 6:

These decisions will be based on data collected on both the dependent variable $Y$ and the independent variable $X$.

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote $n$ pairs of values measured on the independent variable ($X$) and the dependent variable ($Y$).

The scatterplot:

The graphical plot of the points:

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Page 7:

Assume that we have collected data on two variables $X$ and $Y$. Let

$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$

denote the pairs of measurements on the two variables $X$ and $Y$ for $n$ cases in a sample (or population).

Page 8:

The assumption will be made that $y_1, y_2, y_3, \ldots, y_n$ are:

1. independent random variables,
2. normally distributed,
3. with common variance $\sigma^2$, and
4. mean $\mu_i = \alpha + \beta x_i$.

Data that satisfy the assumptions above are said to come from the Simple Linear Model.

Page 9:

Each $y_i$ is assumed to be randomly generated from a normal distribution with mean $\mu_i = \alpha + \beta x_i$ and standard deviation $\sigma$.

[Figure: the density of $y_i$, centred at $\alpha + \beta x_i$, plotted above the point $x_i$.]
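As a concrete illustration, here is a minimal Python sketch (numpy assumed; the values of $\alpha$, $\beta$, $\sigma$ and the $x_i$'s are hypothetical choices, not from the slides) that generates data satisfying these assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 1.5, 0.8      # hypothetical parameter values
x = np.linspace(0.0, 10.0, 25)          # fixed (selected) values of X

# Each y_i is drawn independently from N(alpha + beta * x_i, sigma^2)
y = alpha + beta * x + sigma * rng.normal(size=x.size)
```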

Page 10: The General Linear Model. The Simple Linear Model Linear Regression

• When data is correlated it falls roughly about a straight line.

0

20

40

60

80

100

120

140

160

40 60 80 100 120 140

Page 11:

The density of $y_i$ is:

$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}}$$

The joint density of $y_1, y_2, \ldots, y_n$ is:

$$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2}$$

Page 12:

Estimation of the parameters:

$\alpha$ — the intercept
$\beta$ — the slope
$\sigma$ — the standard deviation (or the variance $\sigma^2$)

Page 13:

The Least Squares Line

Fitting the best straight line

to “linear” data

Page 14: The General Linear Model. The Simple Linear Model Linear Regression

LetY = a + b X

denote an arbitrary equation of a straight line.a and b are known values.This equation can be used to predict for each value of X, the value of Y.

For example, if X = xi (as for the ith case) then the predicted value of Y is:

ii bxay ˆ

Page 15:

Define the residual for each case in the sample to be:

$$r_i = y_i - \hat{y}_i = y_i - (a + b x_i)$$

so that $r_1 = y_1 - \hat{y}_1,\; r_2 = y_2 - \hat{y}_2,\; \ldots,\; r_n = y_n - \hat{y}_n$.

The residual sum of squares (RSS) is defined as:

$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

The residual sum of squares (RSS) is a measure of the "goodness of fit" of the line $Y = a + bX$ to the data.

Page 16:

One choice of $a$ and $b$ will result in the residual sum of squares

$$R(a, b) = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

attaining a minimum. If this is the case then the line

$$Y = a + bX$$

is called the Least Squares Line.

Page 17:

To find the least squares estimates, $a$ and $b$, we need to solve the equations:

$$\frac{\partial R(a,b)}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = 0$$

and

$$\frac{\partial R(a,b)}{\partial b} = \frac{\partial}{\partial b} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = 0$$

Page 18:

Note:

$$\frac{\partial R(a,b)}{\partial a} = \frac{\partial}{\partial a}\sum_{i=1}^{n} (y_i - a - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-1) = 0$$

or

$$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = a + b \bar{x}$$

Page 19:

Note:

$$\frac{\partial R(a,b)}{\partial b} = \frac{\partial}{\partial b}\sum_{i=1}^{n} (y_i - a - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-x_i) = 0$$

or

$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

Page 20:

Hence the optimal values of $a$ and $b$ satisfy the equations:

$$\bar{y} = a + b\bar{x} \quad \text{and} \quad \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

From the first equation we have:

$$a = \bar{y} - b\bar{x}$$

The second equation becomes:

$$\sum_{i=1}^{n} x_i y_i = (\bar{y} - b\bar{x})\, n\bar{x} + b \sum_{i=1}^{n} x_i^2 = n\bar{x}\bar{y} + b\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

Page 21:

Solving the second equation for $b$:

$$b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{S_{xy}}{S_{xx}}$$

and

$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$

where

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \quad \text{and} \quad S_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

Page 22:

Note:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \quad \text{and} \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

Proof:

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} \left(x_i y_i - \bar{x} y_i - x_i \bar{y} + \bar{x}\bar{y}\right) = \sum_{i=1}^{n} x_i y_i - \bar{x}\sum_{i=1}^{n} y_i - \bar{y}\sum_{i=1}^{n} x_i + n\bar{x}\bar{y}$$

$$= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

(The identity for $S_{xx}$ follows by the same argument with $y_i$ replaced by $x_i$.)

Page 23:

Summary: slope and intercept of the least squares line:

$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{and} \quad a = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
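To make the summary concrete, here is a minimal Python sketch (numpy assumed; the data arrays are hypothetical) computing the least squares slope and intercept exactly as above:

```python
import numpy as np

# Hypothetical sample data (x_i, y_i), i = 1..n
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# S_xy = sum x_i y_i - n xbar ybar,  S_xx = sum x_i^2 - n xbar^2
Sxy = np.sum(x * y) - n * xbar * ybar
Sxx = np.sum(x**2) - n * xbar**2

b = Sxy / Sxx          # slope
a = ybar - b * xbar    # intercept
print(a, b)
```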

Page 24:

Maximum Likelihood Estimation of the parameters:

$\alpha$ — the intercept
$\beta$ — the slope
$\sigma$ — the standard deviation

Page 25:

Recall that the joint density of $y_1, y_2, \ldots, y_n$ is:

$$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i-\alpha-\beta x_i)^2}{2\sigma^2}} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\alpha-\beta x_i)^2} = L(\alpha, \beta, \sigma)$$

= the likelihood function.

Page 26:

$$l(\alpha, \beta, \sigma) = \ln L = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$$

is the log-likelihood function.

To find the maximum likelihood estimates of $\alpha$, $\beta$ and $\sigma$ we need to solve the equations:

$$\frac{\partial l}{\partial \alpha} = 0, \quad \frac{\partial l}{\partial \beta} = 0, \quad \frac{\partial l}{\partial \sigma} = 0$$

Page 27:

$$\frac{\partial l}{\partial \alpha} = 0 \quad \text{becomes} \quad \sum_{i=1}^{n}(y_i - \alpha - \beta x_i) = 0$$

$$\frac{\partial l}{\partial \beta} = 0 \quad \text{becomes} \quad \sum_{i=1}^{n} x_i\,(y_i - \alpha - \beta x_i) = 0$$

These are the same equations as for the least squares line, which have solution:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

Page 28:

The third equation:

$$\frac{\partial l}{\partial \sigma} = 0 \quad \text{becomes} \quad -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 = 0$$

which gives

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

Page 29:

Summary: maximum likelihood estimates:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

and

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

Page 30:

A computing formula for the estimate of $\sigma^2$:

Since $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2$, we have

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x})\right]^2$$

Hence

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y})^2 - 2\hat{\beta}(x_i - \bar{x})(y_i - \bar{y}) + \hat{\beta}^2 (x_i - \bar{x})^2\right] = \frac{1}{n}\left[S_{yy} - 2\hat{\beta} S_{xy} + \hat{\beta}^2 S_{xx}\right]$$

Page 31:

Now $\hat{\beta} = \dfrac{S_{xy}}{S_{xx}}$. Hence

$$\hat{\sigma}^2 = \frac{1}{n}\left[S_{yy} - 2\hat{\beta} S_{xy} + \hat{\beta}^2 S_{xx}\right] = \frac{1}{n}\left[S_{yy} - 2\frac{S_{xy}^2}{S_{xx}} + \frac{S_{xy}^2}{S_{xx}^2}\, S_{xx}\right] = \frac{1}{n}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$

Page 32: The General Linear Model. The Simple Linear Model Linear Regression

222 2ˆ

n

nE

It also can be shown that

Thus , the maximum likelihood estimator of 2, is a biased estimator of 2.

2

This estimator can be easily converted into an unbiased estimator of 2 by multiply by the ratio n/(n – 2)

n

iii xy

nn

ns

1

222 ˆˆ2

2

xx

xyyy S

SS

n

2

2

1
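A short sketch of both estimates via the computing formula (numpy assumed, hypothetical data again):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

Sxx = np.sum((x - x.mean())**2)
Syy = np.sum((y - y.mean())**2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

sigma2_hat = (Syy - Sxy**2 / Sxx) / n       # biased MLE of sigma^2
s2 = (Syy - Sxy**2 / Sxx) / (n - 2)         # unbiased estimator
print(sigma2_hat, s2)
```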

Page 33:

Estimators in linear regression:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

and

$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$

Page 34:

The major computation is:

$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \quad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Page 35:

Computing formulae:

$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
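The two forms can be checked against each other numerically; a minimal sketch with hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Definition form
Sxx_def = np.sum((x - x.mean())**2)
Sxy_def = np.sum((x - x.mean()) * (y - y.mean()))

# Computing-formula form
Sxx_cf = np.sum(x**2) - np.sum(x)**2 / n
Sxy_cf = np.sum(x * y) - np.sum(x) * np.sum(y) / n

assert np.isclose(Sxx_def, Sxx_cf) and np.isclose(Sxy_def, Sxy_cf)
```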

Page 36:

Application of Statistical Theory to Simple Linear Regression

We will now use statistical theory to prove optimal properties of the estimators.

Recall, the joint density of $y_1, y_2, \ldots, y_n$ is:

$$f(y_1, \ldots, y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2}$$

Page 37:

Expanding the exponent:

$$\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 = \sum_{i=1}^{n} y_i^2 - 2\alpha\sum_{i=1}^{n} y_i - 2\beta\sum_{i=1}^{n} x_i y_i + n\alpha^2 + 2\alpha\beta\sum_{i=1}^{n} x_i + \beta^2\sum_{i=1}^{n} x_i^2$$

Hence

$$f(y_1, \ldots, y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\left[n\alpha^2 + 2\alpha\beta\sum_{i=1}^{n} x_i + \beta^2\sum_{i=1}^{n} x_i^2\right]}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} y_i^2 + \frac{\alpha}{\sigma^2}\sum_{i=1}^{n} y_i + \frac{\beta}{\sigma^2}\sum_{i=1}^{n} x_i y_i}$$

$$= g(\boldsymbol{\theta})\, h(\mathbf{y})\, \exp\!\left\{\sum_{i=1}^{3} p_i(\boldsymbol{\theta})\, S_i(\mathbf{y})\right\}$$

where

$$h(\mathbf{y}) = 1, \quad g(\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\left[n\alpha^2 + 2\alpha\beta\sum_{i=1}^{n} x_i + \beta^2\sum_{i=1}^{n} x_i^2\right]}$$

Page 38: The General Linear Model. The Simple Linear Model Linear Regression

231

3

221

2

211

21

)( ,)(

,)( ,)(

,2

1)( ,)( and

θy

θy

θy

pyxS

pyS

pyS

n

iii

n

ii

n

ii

statisticssufficientcomplete

yyy

are

)(,)(,)(

Thus

13

12

1

21

n

iii

n

ii

n

ii yxSySyS

Page 39:

Now

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{S_2(\mathbf{y})}{n}$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = S_1(\mathbf{y}) - \frac{S_2(\mathbf{y})^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})$$

Page 40:

Also

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{1}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right]$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{S_2(\mathbf{y})}{n} - \frac{\bar{x}}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right]$$

and

$$s^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right] = \frac{1}{n-2}\left[S_1(\mathbf{y}) - \frac{S_2(\mathbf{y})^2}{n} - \frac{\left(S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right)^2}{S_{xx}}\right]$$

Thus all three estimators are functions of the set of complete sufficient statistics $S_1(\mathbf{y}), S_2(\mathbf{y}), S_3(\mathbf{y})$.

If they are also unbiased then they are Uniform Minimum Variance Unbiased (UMVU) estimators (using the Lehmann–Scheffé theorem).

Page 41:

We have already shown that $s^2$ is an unbiased estimator of $\sigma^2$. We need only show that

$$\hat{\beta} = \frac{1}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right] \quad \text{and} \quad \hat{\alpha} = \frac{S_2(\mathbf{y})}{n} - \frac{\bar{x}}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right]$$

are unbiased estimators of $\beta$ and $\alpha$.

Now $S_2(\mathbf{y}) = \sum_{i=1}^{n} y_i$, $S_3(\mathbf{y}) = \sum_{i=1}^{n} x_i y_i$ and $E[y_i] = \alpha + \beta x_i$. Thus

$$E[S_2(\mathbf{y})] = \sum_{i=1}^{n}(\alpha + \beta x_i) = n\alpha + n\beta\bar{x}$$

and

$$E[S_3(\mathbf{y})] = \sum_{i=1}^{n} x_i(\alpha + \beta x_i) = n\alpha\bar{x} + \beta\sum_{i=1}^{n} x_i^2$$

Page 42:

Thus

$$E[\hat{\beta}] = \frac{1}{S_{xx}}\left[E[S_3(\mathbf{y})] - \bar{x}\, E[S_2(\mathbf{y})]\right] = \frac{1}{S_{xx}}\left[n\alpha\bar{x} + \beta\sum_{i=1}^{n} x_i^2 - \bar{x}\left(n\alpha + n\beta\bar{x}\right)\right] = \frac{\beta}{S_{xx}}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] = \beta$$

since $S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$. Thus $\hat{\beta}$ is an unbiased estimator of $\beta$.

Page 43:

Also

$$E[\hat{\alpha}] = \frac{E[S_2(\mathbf{y})]}{n} - \frac{\bar{x}}{S_{xx}}\left[E[S_3(\mathbf{y})] - \bar{x}\, E[S_2(\mathbf{y})]\right] = \frac{n\alpha + n\beta\bar{x}}{n} - \frac{\bar{x}}{S_{xx}}\left[\beta S_{xx}\right] = \alpha + \beta\bar{x} - \beta\bar{x} = \alpha$$

Thus $\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators of $\alpha$ and $\beta$.

Page 44:

The General Linear Model

Page 45:

Consider the random variable $Y$ with

1. $E[Y] = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = \sum_{i=1}^{p}\beta_i X_i$

(alternatively $E[Y] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$, intercept included)

and

2. $\operatorname{var}(Y) = \sigma^2$

• where $\beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters

• and $X_1, X_2, \ldots, X_p$ are nonrandom variables.

• Assume further that $Y$ is normally distributed.

Page 46:

Thus the density of $Y$ is:

$$f(Y \mid \beta_1, \ldots, \beta_p, \sigma^2) = f(Y \mid \boldsymbol{\beta}, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left\{-\frac{1}{2\sigma^2}\left(Y - \beta_1 X_1 - \beta_2 X_2 - \cdots - \beta_p X_p\right)^2\right\}$$

where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)'$.

Page 47:

Now suppose that $n$ independent observations of $Y$, $(y_1, y_2, \ldots, y_n)$, are made

corresponding to $n$ sets of values of $(X_1, X_2, \ldots, X_p)$: $(x_{11}, x_{12}, \ldots, x_{1p})$, $(x_{21}, x_{22}, \ldots, x_{2p})$, ..., $(x_{n1}, x_{n2}, \ldots, x_{np})$.

Then the joint density of $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is:

$$f(y_1, \ldots, y_n \mid \beta_1, \ldots, \beta_p, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2\right\} = f(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2)$$

Page 48:

Thus

$$f(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right\}$$

$$= \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right)\right\}$$

$$= \underbrace{\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right\}}_{g(\boldsymbol{\beta},\,\sigma^2)}\; h(\mathbf{y})\; \exp\!\left\{-\frac{1}{2\sigma^2}\mathbf{y}'\mathbf{y} + \frac{1}{\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{y}\right\}, \quad h(\mathbf{y}) = 1$$

where

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$$

Page 49:

Thus $f(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2)$ is a member of the exponential family of distributions, and

$$S(\mathbf{y}) = \left(\mathbf{y}'\mathbf{y},\; \mathbf{X}'\mathbf{y}\right)$$

is a minimal complete set of sufficient statistics.

Page 50:

Matrix-vector formulation

The General Linear Model

Page 51: The General Linear Model. The Simple Linear Model Linear Regression

npnn

p

p

pn xxx

xxx

xxx

y

y

y

21

22221

11211

2

1

2

1

,,Let Xβy

ondistributi , a has Then 2IβXy

N X

ondistributi , a has and

thenlet or 2I0εεβXy

βXyε

N

Page 52:

The General Linear Model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \text{where } \boldsymbol{\varepsilon} \text{ has a } N(\mathbf{0}, \sigma^2\mathbf{I}) \text{ distribution}$$

with $\mathbf{y}$, $\boldsymbol{\beta}$ and $\mathbf{X}$ as defined above.

Page 53:

Geometrical interpretation of the General Linear Model

With $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\boldsymbol{\varepsilon}$ has a $N(\mathbf{0}, \sigma^2\mathbf{I})$ distribution, and $\mathbf{x}_1, \ldots, \mathbf{x}_p$ the columns of $\mathbf{X}$:

$$\boldsymbol{\mu} = E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta} = \beta_1\mathbf{x}_1 + \beta_2\mathbf{x}_2 + \cdots + \beta_p\mathbf{x}_p$$

lies in the linear space spanned by the columns of $\mathbf{X}$.

Page 54: The General Linear Model. The Simple Linear Model Linear Regression

Geometical interpretation of the General Linear Model

1y

βXμ

ε

y

px

X of columns by the

spanned spaceLinear

1x

2y

ny

Page 55:

Estimation

The General Linear Model

Page 56: The General Linear Model. The Simple Linear Model Linear Regression

Least squares estimates of Let

βXyβXy

n

i

p

jijjip xyRR

1

2

121 ,, β

p ,, of estimates squaresLeast The 21

p ˆ,ˆ,ˆ values theare 21

n

i

p

jijjip xyR

1

2

121 ,, minimizethat

Page 57: The General Linear Model. The Simple Linear Model Linear Regression

The Equations for the Least squares estimates

pk

R

k

p ,,2 ,1 ,0,, 21

02or 1 1

n

iik

p

jijji xxy

pkyxxxn

iiik

p

j

n

iikijj ,...,2 ,1 and

11 1

Page 58:

Written out in full:

$$\beta_1\sum_{i=1}^{n} x_{i1}^2 + \beta_2\sum_{i=1}^{n} x_{i1}x_{i2} + \cdots + \beta_p\sum_{i=1}^{n} x_{i1}x_{ip} = \sum_{i=1}^{n} x_{i1}\, y_i$$

$$\beta_1\sum_{i=1}^{n} x_{i2}x_{i1} + \beta_2\sum_{i=1}^{n} x_{i2}^2 + \cdots + \beta_p\sum_{i=1}^{n} x_{i2}x_{ip} = \sum_{i=1}^{n} x_{i2}\, y_i$$

$$\vdots$$

$$\beta_1\sum_{i=1}^{n} x_{ip}x_{i1} + \beta_2\sum_{i=1}^{n} x_{ip}x_{i2} + \cdots + \beta_p\sum_{i=1}^{n} x_{ip}^2 = \sum_{i=1}^{n} x_{ip}\, y_i$$

These equations are called the Normal Equations.

Page 59:

Matrix development of the Normal Equations:

Now

$$R(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$

$$\frac{\partial R(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{0}$$

or

$$\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y} \quad \text{(the Normal Equations)}$$

Page 60:

Summary (the least squares estimates):

The least squares estimates satisfy the Normal Equations

$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$

where $\mathbf{y}$ and $\mathbf{X}$ are as defined above.

Page 61: The General Linear Model. The Simple Linear Model Linear Regression

Note: Some matrix properties

Rank rank(AB) = min(rank(A), rank(B))

rank(A) ≤ min(# rows of A, # cols of A )

rank(A) = rank(A)Consider the normal equations

yXβXX ˆ

matrix. a is andmatrix a is pppn XXX

. if ,min nppnprankrank XXX

. is then invertibleXXXXX prankrank

. of be tosaid is matrix The rank fullX

Page 62: The General Linear Model. The Simple Linear Model Linear Regression

then the solution to the normal equations

yXβXX ˆ

matrix. a is andmatrix a is pppn XXX

. if ,min nppnprankrank XXX

. is then invertibleXXXXX prankrank

. of is matrix theIf rank fullX

yXXXβ 1ˆ
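A minimal sketch of solving the normal equations (numpy assumed; the design matrix and response are synthetic). Solving the linear system directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 4
X = rng.normal(size=(n, p))              # hypothetical full-rank design matrix
y = X @ np.array([1.2, -0.6, 0.0, 0.8]) + rng.normal(size=n)

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, numerically preferable least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```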

Page 63:

Maximum Likelihood Estimation

General Linear Model

Page 64: The General Linear Model. The Simple Linear Model Linear Regression

The General Linear Model

εβXy

ondistributi , a has where 2I0ε

N

βXyβXy

βy

22

1

2/2

2

2

1,

ef n

2

1,

function Likelihood

22

1

2/2

2βXyβXy

y β

eL n

Page 65: The General Linear Model. The Simple Linear Model Linear Regression

The Maximum Likelihood estimates of and 2 are the values

that maximize

or equivalently

2ˆ and ˆ β

2

1,

22

1

2/2

2βXyβXy

y β

eL n

,ln, 22 ββ yy

Ll

2

1ln2ln

22

22 βXyβXy

nn

22

1ln2ln

22

22 βXXββXyyy

nn

Page 66: The General Linear Model. The Simple Linear Model Linear Regression

This yields the system of linear equations

(The Normal Equations)

0

β

βXXβ

β

βXy

β

βy

2

,2l

yXβXX ˆ

0βXXyX

22 or

Page 67: The General Linear Model. The Simple Linear Model Linear Regression

0

βy

2

2 ,

l

yields the equation:

0

1

2

ln2

2

2

2

2

βXyβXy

n

0 2

1 2222

βXyβXy

n

0

22 42

βXyβXy

n

βXyβXy ˆˆ1ˆ 2

n

Page 68: The General Linear Model. The Simple Linear Model Linear Regression

If [X'X]-1 exists then the normal equations have solution:

and

βXyβXy ˆˆ1

ˆ 2

n

yXXXXyyXXXXy 11

n

1

yXXXXIXXXXIy 11

n

1

yXXXXIy 1

n

1

βXyyyyXXXXyyy 1 ˆ11

nn

yXXXβ 1

Page 69:

Summary:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

and

$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \frac{1}{n}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} = \frac{1}{n}\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)$$
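A brief numeric check of the equivalent forms of $\hat{\sigma}^2$ (numpy assumed; same synthetic $\mathbf{X}$ and $\mathbf{y}$ as the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 4
X = rng.normal(size=(n, p))              # hypothetical design matrix
y = X @ np.array([1.2, -0.6, 0.0, 0.8]) + rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ beta_hat
sigma2_mle = (resid @ resid) / n                   # (y - Xb)'(y - Xb) / n
sigma2_alt = (y @ y - beta_hat @ (X.T @ y)) / n    # (y'y - b'X'y) / n, same value
assert np.isclose(sigma2_mle, sigma2_alt)
```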

Page 70:

Comments:

The matrices

$$\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \quad \text{and} \quad \mathbf{E}_2 = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$

are symmetric idempotent matrices:

$$\mathbf{E}_1\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{E}_1$$

and also $\mathbf{E}_2\mathbf{E}_2 = \mathbf{E}_2$.

Page 71: The General Linear Model. The Simple Linear Model Linear Regression

Comments (continued)

pnrankrank

prankrank

prank

XXXXIE

XXXXE

XX

12

11 and

rank) full of (i.e. if

1E1E
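These two properties are easy to verify numerically; for a symmetric idempotent matrix the rank equals the trace, which the sketch below uses (numpy assumed, synthetic full-rank design):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 4
X = rng.normal(size=(n, p))                 # hypothetical full-rank design

E1 = X @ np.linalg.inv(X.T @ X) @ X.T       # projection onto the column space of X
E2 = np.eye(n) - E1

assert np.allclose(E1 @ E1, E1) and np.allclose(E2 @ E2, E2)
# For a symmetric idempotent matrix, rank = trace
assert np.isclose(np.trace(E1), p) and np.isclose(np.trace(E2), n - p)
```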

Page 72:

Geometry of Least Squares

[Figure: $\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is the projection of $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$ onto the linear space spanned by the columns $\mathbf{x}_1, \ldots, \mathbf{x}_p$ of $\mathbf{X}$.]

Page 73:

Example

• Data are collected for $n = 15$ cases on the variables $Y$ (the dependent variable) and $X_1$, $X_2$, $X_3$ and $X_4$.

• The data and calculations are displayed on the next page:

Page 74:

$$\mathbf{X} = \begin{bmatrix}
52 & 59 & 34 & 74 \\
49 & 67 & 37 & 89 \\
53 & 51 & 31 & 87 \\
38 & 76 & 21 & 74 \\
56 & 69 & 27 & 86 \\
48 & 74 & 30 & 83 \\
51 & 69 & 28 & 76 \\
52 & 70 & 37 & 77 \\
57 & 76 & 29 & 85 \\
49 & 63 & 29 & 76 \\
48 & 72 & 26 & 75 \\
48 & 66 & 31 & 87 \\
46 & 67 & 25 & 75 \\
48 & 61 & 21 & 80 \\
45 & 66 & 32 & 78
\end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 88.6 \\ 86.6 \\ 110.2 \\ 59.2 \\ 91.7 \\ 85.7 \\ 76.8 \\ 86.1 \\ 97.1 \\ 79.7 \\ 82.5 \\ 92.3 \\ 74.2 \\ 87.9 \\ 79.8 \end{bmatrix}$$

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix}
36806 & 49540 & 21734 & 59457 \\
49540 & 68096 & 29284 & 80543 \\
21734 & 29284 & 13118 & 35216 \\
59457 & 80543 & 35216 & 96736
\end{bmatrix}, \quad \mathbf{X}'\mathbf{y} = \begin{bmatrix} 63637.4 \\ 85176.9 \\ 37647 \\ 103047.5 \end{bmatrix}$$

Page 75: The General Linear Model. The Simple Linear Model Linear Regression

0.004291 -0.00019 -0.00131 -0.002

(XX)-1 = -0.00019 0.000979 0.00018 -0.00076

-0.00131 0.00018 0.003771 -0.00072-0.002 -0.00076 -0.00072 0.002134

1.238816

(XX)-1Xy = -0.64272-0.003

0.840056

β

yXc

yXβyy

where20.59156ˆ11

1

ˆ11

1

4

1

17

1

2

2

jjj

ii cy

s

4.53779220.59156 s
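The example can be reproduced with a short script (numpy assumed); entering the $\mathbf{X}$ and $\mathbf{y}$ shown above should return the slide's estimates, approximately:

```python
import numpy as np

X = np.array([
    [52, 59, 34, 74], [49, 67, 37, 89], [53, 51, 31, 87],
    [38, 76, 21, 74], [56, 69, 27, 86], [48, 74, 30, 83],
    [51, 69, 28, 76], [52, 70, 37, 77], [57, 76, 29, 85],
    [49, 63, 29, 76], [48, 72, 26, 75], [48, 66, 31, 87],
    [46, 67, 25, 75], [48, 61, 21, 80], [45, 66, 32, 78],
], dtype=float)
y = np.array([88.6, 86.6, 110.2, 59.2, 91.7, 85.7, 76.8, 86.1,
              97.1, 79.7, 82.5, 92.3, 74.2, 87.9, 79.8])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # ~ [1.2388, -0.6427, -0.0030, 0.8401]
n, p = X.shape
s2 = (y @ y - beta_hat @ (X.T @ y)) / (n - p)  # ~ 20.59
print(beta_hat, s2, np.sqrt(s2))
```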

Page 76:

Properties of The Maximum Likelihood Estimates

Unbiasedness, Minimum Variance

Page 77: The General Linear Model. The Simple Linear Model Linear Regression

Note:

and

yXXXβ

1ˆ EE

ββXXXXyXXX

11 E

βcβcβc

ˆˆ EE

Thus is an unbiased estimator of . Since

is also a function of the set of complete minimal sufficient statistics, it is the UMVU estimator of . (Lehman-Scheffe)

βcβc

βc

βc

Page 78: The General Linear Model. The Simple Linear Model Linear Regression

Note:

where

βXyβXy ˆˆ1

ˆ 2

n

yAyyXXXXIy 1

n

1

XXXXIA 1

n

1

In general

AΣμAμyAy trE

where yμ

EyΣ

ofmatrix covariance-variance and

Page 79: The General Linear Model. The Simple Linear Model Linear Regression

Thus:

βXyβXy ˆˆ1

ˆ 2

n

yAyyXXXXIy 1

E

nEE

1ˆ 2

XXXXIA 1

n

1where

Aμμ trA

βXyμ

E nI2

βXXXXXIXβ 1

nE

1ˆ 2

nntr IXXXXI 1 21

Page 80: The General Linear Model. The Simple Linear Model Linear Regression

Thus:

βXXXXXXXβ 1

nE

1ˆ 2

XXXXI 1 ntr

n

2

XXXXI 1 trtrn n

2

0

ptrnn

trnn

IXXXX 2

12

2n

pn

Page 81: The General Linear Model. The Simple Linear Model Linear Regression

Let

βXyβXy ˆˆ1

ˆ 22

pnpn

ns

Then

2222 ˆ

n

pn

pn

nE

pn

nsE

Thus s2 is an unbiased estimator of 2.

Since s2 is also a function of the set of complete minimal sufficient statistics, it is the UMVU estimator of 2.

Page 82: The General Linear Model. The Simple Linear Model Linear Regression

Distributional Properties

Least square Estimates (Maximum Likelidood estimates)

Page 83: The General Linear Model. The Simple Linear Model Linear Regression

1. If then where A is a q × p matrix

Recall

~ , μy

pN AAμAyAw , ~

qN

yAyμy UN p then , ~ Suppose 2.

.rank of idempotent is 2.

.rank of idempotent is .1

r

r

A

A

μAμ

21on with distributi , has r

t.independen are and

then and , ~ Suppose 3.

yAwyAy

0ACμy

U

N p

t.independen are and

then and , ~ Suppose 4.

21 yByyAy

0BAμy

UU

N p

Page 84: The General Linear Model. The Simple Linear Model Linear Regression

The General Linear Model

and yXXXXIy 1

pns

1 2. 2

XXXAyAyXXXβ 11 whereˆ 1.

yBy

yXXXXIy 1

1 Now

22

2

spn

U

XXXXIB 1 2

1 where

IβXy 2, ~

nN

The Estimates

Page 85: The General Linear Model. The Simple Linear Model Linear Regression

Theorem

.0 with ,~ 2. 22

2

pnspn

U

12, ~ ˆ 1. XXββ

pN

Proof

tindependen are and ˆ 3. 2sβ

IβXμy 2,, ~ Since

nn NN

AAμAyAyXXXβ , ~ ˆthen 1 pN

ββXXXXβXXXXμA

11 Now

12112

121

and

XXXXXXXX

XXXIXXXAA

12, ~ ˆ Thus XXββ

pN

Page 86: The General Linear Model. The Simple Linear Model Linear Regression

0on with distributi , has Now pnU yBy

.rank of idempotent is 2.

.rank of idempotent is .1

pn

pn

B

B

0 and 21 μBμ

IXXXXIB 1 22

and 1

where

BXXXXI

IXXXXIB

1

1

1 Now 2

2

XXXXXXXX

XXXXXXXXI

XXXXIXXXXI

11

11

11

Since

Page 87: The General Linear Model. The Simple Linear Model Linear Regression

βXXXXXIXβμBμ 1

221

21 1

Also

.idempotent is Thus XXXXIBB 1

XXXXXXXX

XXXXXXXXI11

11

XXXXI 1

βXXXXXIXβ 1

22

1

βXXXXXXXXβ 1

22

1

02

12

βXXXXβ

Page 88: The General Linear Model. The Simple Linear Model Linear Regression

pn

rank of

idempotent is Thus XXXXIBB 1

.rank full of is if Now pprankrank XXXXXX 1

XXXXIEXXXXE 11 21 and Let

0EE

EEIEE

21

2121

and

with idempotent symmetric are and

Thus 12 pnranknrank EE

.0 with ,~ Hence 22

2

pnspn

U yBy

Page 89: The General Linear Model. The Simple Linear Model Linear Regression

Finally

and yCyyXXXXIy 1

pns

12

XXXAyAyXXXβ 11 whereˆ Since

XXXXIC 1

pn

1with

XXXXIIXXXXCA 11

pn

1then

XXXXIXXXX 11

pn

XXXXXXXXXXXX 111

pn

0XXXXXXXX 11

pn

Page 90: The General Linear Model. The Simple Linear Model Linear Regression

Thus

are independent.

yXXXXIyyXXXβ 1

pns

1 and ˆ 21

Summary

.0 with ,~ 2. 22

2

pnspn

U

12, ~ ˆ 1. XXββ

pN

tindependen are and ˆ 3. 2sβ
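The summary can be checked by simulation; a sketch (numpy and scipy assumed, with hypothetical parameter values) that compares the empirical distribution of $(n-p)s^2/\sigma^2$ with $\chi^2_{n-p}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma = 15, 4, 2.0
X = rng.normal(size=(n, p))              # hypothetical fixed design
beta = np.array([1.0, -0.5, 0.0, 2.0])

u_samples = []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ b
    u_samples.append((r @ r) / sigma**2)   # (n - p) s^2 / sigma^2

# The simulated mean should be close to the chi-square mean, n - p = 11
print(np.mean(u_samples), n - p)
# A formal check: Kolmogorov-Smirnov against chi2(n - p)
print(stats.kstest(u_samples, 'chi2', args=(n - p,)))
```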

Page 91:

Example

• Data are collected for $n = 15$ cases on the variables $Y$ (the dependent variable) and $X_1$, $X_2$, $X_3$ and $X_4$.

• The data and calculations are displayed on the next page:

Page 92:

$$\mathbf{X} = \begin{bmatrix}
52 & 59 & 34 & 74 \\
49 & 67 & 37 & 89 \\
53 & 51 & 31 & 87 \\
38 & 76 & 21 & 74 \\
56 & 69 & 27 & 86 \\
48 & 74 & 30 & 83 \\
51 & 69 & 28 & 76 \\
52 & 70 & 37 & 77 \\
57 & 76 & 29 & 85 \\
49 & 63 & 29 & 76 \\
48 & 72 & 26 & 75 \\
48 & 66 & 31 & 87 \\
46 & 67 & 25 & 75 \\
48 & 61 & 21 & 80 \\
45 & 66 & 32 & 78
\end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 88.6 \\ 86.6 \\ 110.2 \\ 59.2 \\ 91.7 \\ 85.7 \\ 76.8 \\ 86.1 \\ 97.1 \\ 79.7 \\ 82.5 \\ 92.3 \\ 74.2 \\ 87.9 \\ 79.8 \end{bmatrix}$$

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix}
36806 & 49540 & 21734 & 59457 \\
49540 & 68096 & 29284 & 80543 \\
21734 & 29284 & 13118 & 35216 \\
59457 & 80543 & 35216 & 96736
\end{bmatrix}, \quad \mathbf{X}'\mathbf{y} = \begin{bmatrix} 63637.4 \\ 85176.9 \\ 37647 \\ 103047.5 \end{bmatrix}$$

Page 93:

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
0.004291 & -0.00019 & -0.00131 & -0.002 \\
-0.00019 & 0.000979 & 0.00018 & -0.00076 \\
-0.00131 & 0.00018 & 0.003771 & -0.00072 \\
-0.002 & -0.00076 & -0.00072 & 0.002134
\end{bmatrix}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} 1.238816 \\ -0.64272 \\ -0.003 \\ 0.840056 \end{bmatrix}$$

$$s^2 = \frac{1}{15-4}\sum_{i=1}^{15}\left(y_i - \sum_{j=1}^{4}\hat{\beta}_j x_{ij}\right)^2 = \frac{1}{11}\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{c}\right) = 20.59156, \quad \text{where } \mathbf{c} = \mathbf{X}'\mathbf{y}$$

$$s = \sqrt{20.59156} = 4.537792$$

Page 94: The General Linear Model. The Simple Linear Model Linear Regression

0.2096250.043943

0.2786530.077648

0.1419820.020159

0.2972470.088356

4

3

2

ˆ

ˆ

ˆ

ˆ

s

s

s

s

βvar 0.088356 -0.00401 -0.02694 -0.04116

s2(XX)-1

= -0.00401 0.020159 0.003709 -0.01567-0.02694 0.003709 0.077648 -0.0148-0.04116 -0.01567 -0.0148 0.043943
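As a sketch, the standard errors reported above are the square roots of the diagonal of $s^2(\mathbf{X}'\mathbf{X})^{-1}$; a small helper (numpy assumed) computes them for any full-rank design:

```python
import numpy as np

def coef_standard_errors(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Standard errors of the least squares coefficients: sqrt(diag(s^2 (X'X)^{-1}))."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = (y @ y - beta_hat @ (X.T @ y)) / (n - p)
    cov_beta = s2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov_beta))

# Applied to the example's X and y this should return approximately
# [0.297, 0.142, 0.279, 0.210], matching the slide.
```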

Page 95: The General Linear Model. The Simple Linear Model Linear Regression

1.238816

(XX)-1Xy = -0.64272-0.003

0.840056

β

0.2096250.043943

0.2786530.077648

0.1419820.020159

0.2972470.088356

4

3

2

ˆ

ˆ

ˆ

ˆ

s

s

s

s

Compare with SPSS output

Estimates of the coefficients

Page 96:

The General Linear Model

with an intercept

Page 97: The General Linear Model. The Simple Linear Model Linear Regression

Consider the random variable Y with

1. E[Y] = 0+ 1X1+ 2X2 + ... + pXp

(intercept included)

and

2. var(Y) = 2

• where 1, 2 , ... ,p are unknown parameters

• and X1 ,X2 , ... , Xp are nonrandom variables.

• Assume further that Y is normally distributed.

Page 98: The General Linear Model. The Simple Linear Model Linear Regression

npnn

p

p

pn xxx

xxx

xxx

y

y

y

21

22221

11211

2

1

0

2

1

1

1

1

,,Let Xβy

ondistributi , a has i.e. 2IβXy

N

ondistributi , a has where 2I0εεβXy

N

The matrix formulation (intercept included)

Then the model becomes

Thus to include an intercept add an extra column of 1’s in the design matrix X and include the intercept in the parameter vector

Page 99: The General Linear Model. The Simple Linear Model Linear Regression

nn x

x

x

y

y

y

1

1

1

,, 2

1

1

02

1

Xβy

The matrix formulation of the Simple Linear regression model

2

1

2

1

12

1

21

1

1

1

111

xnSxn

xnn

xx

xn

x

x

x

xxx

xx

n

ii

n

ii

n

ii

n

n XX

Page 100: The General Linear Model. The Simple Linear Model Linear Regression

and

yxnS

yn

yx

y

y

y

y

xxx xyn

iii

n

ii

n

n

1

12

1

21

111

yX

nxn

xnxnS

xnxnSn

xnSxn

xnn

xx

xx

xx

2

22

1

2

1

1

XX

12

1

1

xxxx

xxxx

SS

xS

x

S

x

n

Now

Page 101: The General Linear Model. The Simple Linear Model Linear Regression

thus

yxnS

yn

SS

xS

x

S

x

n

xy

xxxx

xxxx

1

1

ˆ

ˆ

2

1

1

yXXX

yxnS

Syn

S

x

yxnSS

xyn

S

x

n

xyxxxx

xyxxxx

1

1 2

xx

xy

xx

xy

S

S

xS

Sy

Page 102: The General Linear Model. The Simple Linear Model Linear Regression

Finally

xxxx

xxxx

S

ss

S

x

sS

xs

S

x

ns

22

222

12

1

1

ˆ

ˆvar XX

xx

xx

xx

S

sx

S

s

sS

x

n

2

10

2

1

22

ˆ,ˆcov

ˆvar

1ˆvar

Thus
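A quick numeric confirmation that the closed-form variances agree with the matrix form $s^2(\mathbf{X}'\mathbf{X})^{-1}$ (numpy assumed, hypothetical data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

X = np.column_stack([np.ones(n), x])       # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
s2 = np.sum((y - X @ beta_hat)**2) / (n - 2)

# Closed-form variances from the slide
Sxx = np.sum((x - x.mean())**2)
var_b0 = s2 * (1/n + x.mean()**2 / Sxx)
var_b1 = s2 / Sxx

cov = s2 * np.linalg.inv(X.T @ X)          # s^2 (X'X)^{-1}
assert np.allclose(np.diag(cov), [var_b0, var_b1])
```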

Page 103:

The Gauss–Markov Theorem

An important result in the theory of linear models: it proves optimality of least squares estimates in a more general setting.

Page 104: The General Linear Model. The Simple Linear Model Linear Regression

Assume the following model Linear Model

IyβXy 2var and

E

We will not necessarily assume Normality.

Consider the least squares estimate of

ˆ 1 yXXXβ

β

nn yayaya

2211

1

ˆ

ya

yXXXcβc

is an unbiased linear estimator of βc

Page 105: The General Linear Model. The Simple Linear Model Linear Regression

The Gauss-Markov Theorem

Assume

IyβXy 2var and

E

Consider the least squares estimate of

ˆ 1 yXXXβ

β

nn yayaya

2211

1

ˆ

ya

yXXXcβc

, an unbiased linear estimator of βc

and

Let nn ybybyb

2211yb

denote any other unbiased linear estimator of βc

βcyb ˆvarvarthen

Page 106: The General Linear Model. The Simple Linear Model Linear Regression

Proof Now IyβXy 2var and

E

βcβXXXXc

yXXXc

yXXXcβc

1

1

1

ˆ

E

EE

cXXccXXXXXXc

cXXXIXXXc

cXXXyXXXc

yXXXcβc

12112

121

11

1

var

var

varˆvar

Page 107: The General Linear Model. The Simple Linear Model Linear Regression

Now is an unbiased estimator of if

yb βc

ββcβXbybyb

allfor EE

cbXcXb

or i.e.

Also

bbbIbbybyb 22varvar

Thus

cXXcbbβcyb 122ˆvarvar

bXXXXbbb 122

bXXXXIb 12

bXXXXIuuu

bXXXXIXXXXIb

12

112

where0

Page 108: The General Linear Model. The Simple Linear Model Linear Regression

Thus

βcyb ˆvarvar

The Gauss-Markov theorem states that

is the Best Linear Unbiased Estimator (B.L.U.E.) of

βc

βc

Page 109:

Hypothesis testing for the GLM

The General Linear Hypothesis

Page 110: The General Linear Model. The Simple Linear Model Linear Regression

Testing the General Linear Hypotheses

The General Linear Hypothesis H0: h111 + h122 + h133 +... + h1pp = h1

h211 + h222 + h233 +... + h2pp = h2

...

hq11 + hq22 + hq33 +... + hqpp = hq

where h11h12, h13, ... , hqp and h1h2, h3, ... , hq are known coefficients.

In matrix notation11

qppqhβH

Page 111: The General Linear Model. The Simple Linear Model Linear Regression

Examples 1. H0: 1 = 0

2. H0: 1 = 0, 2 = 0, 3 = 0

3. H0: 1 = 2

6

5

4

3

2

1

16

β 0,000001

1161

hH

0

0

0

,

000100

000010

000001

1363hH

0,0000111161

hH

Page 112: The General Linear Model. The Simple Linear Model Linear Regression

Examples 4. H0: 1 = 2 , 3 = 4

5. H0: 1 = 1/2(2 + 3)

6. H0: 1 = 1/2(2 + 3), 3 = 1/3(4 + 5 + 6)

6

5

4

3

2

1

16

β

0

0,

001100

0000111262

hH

0,0001112

121

61

hH

0

0,

100

000112

31

31

31

21

21

62hH
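In code, $\mathbf{H}$ and $\mathbf{h}$ are just an ordinary matrix and vector; a minimal sketch (numpy assumed) building them for Example 4:

```python
import numpy as np

p = 6

# Example 4: H0: beta1 = beta2 and beta3 = beta4
H = np.zeros((2, p))
H[0, 0], H[0, 1] = 1.0, -1.0     # beta1 - beta2 = 0
H[1, 2], H[1, 3] = 1.0, -1.0     # beta3 - beta4 = 0
h = np.zeros(2)

# The hypothesis H beta = h can then be checked for any candidate beta:
beta = np.array([2.0, 2.0, 5.0, 5.0, 1.0, 3.0])
print(np.allclose(H @ beta, h))   # True: this beta satisfies H0
```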

Page 113: The General Linear Model. The Simple Linear Model Linear Regression

TheLikelihood Ratio Test

The joint density of is:

βXyβXyβy

22/2

2

2

1exp

2

1

nf

y

The likelihood function

βXyβXyβy

22/2

2

2

1exp

2

1

nL

The log-likelihood function

βXyβXy

ββ yy

22

22

22

2

1ln2ln

ln

nn

Ll

Page 114: The General Linear Model. The Simple Linear Model Linear Regression

Defn (Likelihood Ratio Test of size )Rejects

H0:

against the alternative hypothesis

H1: .

when

and K is chosen so that

KL

L

f

f

2

2

ˆˆ

ˆˆ

)|(max

)|(max

β

β

θx

θx

y

y

θ

θ

and allfor )|( θxθxxC

dfCP

0 oneleast at for )|( θxθxxC

dfCP

hβH

hβH

hβHβ

βββ

ˆ: assuming of sM.L.E.' theare

ˆˆ and of sM.L.E.' theare ˆˆ where

02

222

H

Page 115: The General Linear Model. The Simple Linear Model Linear Regression

Note

2ˆ and ˆ

find To β

We will maximize.

condition side thesubject to ly equivalent

2

1ln2ln

2

22

222

β

βXyβXyβ

y

y

L

l nn

βXyβXyyXXXβ ˆˆˆ and ˆ 121

n

hβH

:0H

The Lagrange multiplier technique will be used for this purpose

Page 116: The General Linear Model. The Simple Linear Model Linear Regression

We will maximize.

hβHλ

βXyβXy

hβHλβλβ y

2

1ln2ln

,

22

22

22

nn

lg

0hβH0

λ

λβ

gives ,2g

β

hβHλ

β

βXyβXy

β

λβ

2

2

2

1,

g

0λHβXXyX

222

12

Page 117: The General Linear Model. The Simple Linear Model Linear Regression

or0λHβXXyX

2

λHyXβXX 2

λHXXyXXXβ 121

0

2

1

2

,2222

2

βXyβXyλβ

ng

finally

or βXyβXy

n

12

Page 118: The General Linear Model. The Simple Linear Model Linear Regression

Thus the equations for are

λHXXyXXXβ 121 ˆˆ

Now

or

βXyβXy

ˆˆ1ˆ 2

n

hβH

ˆ

λHXXHyXXXHβHh 121 ˆˆ

yXXXHhλHXXH 1

2

1

ˆ1

and yXXXHHXXHhHXXHλ 11111

2ˆ1
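Substituting this $\boldsymbol{\lambda}$ back into the first equation, the $\hat{\hat{\sigma}}^2$ factors cancel, giving the standard restricted least squares form $\hat{\hat{\boldsymbol{\beta}}} = \hat{\boldsymbol{\beta}} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\left[\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\right]^{-1}(\mathbf{h} - \mathbf{H}\hat{\boldsymbol{\beta}})$. A minimal sketch (numpy assumed; the $\mathbf{X}$, $\mathbf{y}$, $\mathbf{H}$, $\mathbf{h}$ are hypothetical):

```python
import numpy as np

def restricted_ls(X, y, H, h):
    """Restricted least squares: the MLE of beta subject to H beta = h."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                    # unrestricted estimate
    M = H @ XtX_inv @ H.T                           # q x q, invertible if H has full row rank
    correction = XtX_inv @ H.T @ np.linalg.solve(M, h - H @ beta_hat)
    return beta_hat + correction

# Hypothetical check: constrain beta1 = beta2 in a p = 4 model
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 1.0, -0.5, 2.0]) + rng.normal(size=30)
H = np.array([[1.0, -1.0, 0.0, 0.0]])
h = np.array([0.0])
bb = restricted_ls(X, y, H, h)
print(bb, np.isclose(bb[0], bb[1]))   # restricted estimate satisfies H beta = h
```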