The General Linear Model. The Simple Linear Model: Linear Regression


TRANSCRIPT

Page 1:

The General Linear Model

Page 2:

The Simple Linear Model

Linear Regression

Page 3:

Suppose that we have two variables

1. Y – the dependent variable (response variable)

2. X – the independent variable (explanatory variable, factor)

Page 4:

$X$, the independent variable, may or may not be a random variable.

Sometimes it is randomly observed.

Sometimes specific values of $X$ are selected.

Page 5:

The dependent variable, $Y$, is assumed to be a random variable.

The distribution of $Y$ is dependent on $X$.

The objective is to determine that distribution using statistical techniques (estimation and hypothesis testing).

Page 6:

These decisions will be based on data collected on both the dependent variable $Y$ and the independent variable $X$.

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ denote $n$ pairs of values measured on the independent variable ($X$) and the dependent variable ($Y$).

The scatterplot:

The graphical plot of the points:

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Page 7:

Assume that we have collected data on two variables $X$ and $Y$. Let

$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$

denote the pairs of measurements on the two variables $X$ and $Y$ for $n$ cases in a sample (or population).

Page 8:

The assumption will be made that $y_1, y_2, y_3, \ldots, y_n$ are:

1. independent random variables,
2. normally distributed,
3. with common variance $\sigma^2$, and
4. mean $\mu_i = \alpha + \beta x_i$.

Data that satisfy the assumptions above are said to come from the Simple Linear Model.

Page 9:

Each $y_i$ is assumed to be randomly generated from a normal distribution with mean $\mu_i = \alpha + \beta x_i$ and standard deviation $\sigma$.

[Figure: the density of $y_i$, centred at $\alpha + \beta x_i$, plotted above the point $x_i$.]
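As a concrete illustration, here is a minimal Python sketch (numpy assumed; the values of $\alpha$, $\beta$, $\sigma$ and the $x_i$'s are hypothetical choices, not from the slides) that generates data satisfying these assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 1.5, 0.8      # hypothetical parameter values
x = np.linspace(0.0, 10.0, 25)          # fixed (selected) values of X

# Each y_i is drawn independently from N(alpha + beta * x_i, sigma^2)
y = alpha + beta * x + sigma * rng.normal(size=x.size)
```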

Page 10: The General Linear Model. The Simple Linear Model Linear Regression

• When data is correlated it falls roughly about a straight line.

0

20

40

60

80

100

120

140

160

40 60 80 100 120 140

Page 11:

The density of $y_i$ is:

$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}}$$

The joint density of $y_1, y_2, \ldots, y_n$ is:

$$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2}$$

Page 12:

Estimation of the parameters:

$\alpha$ — the intercept
$\beta$ — the slope
$\sigma$ — the standard deviation (or the variance $\sigma^2$)

Page 13:

The Least Squares Line

Fitting the best straight line

to “linear” data

Page 14: The General Linear Model. The Simple Linear Model Linear Regression

LetY = a + b X

denote an arbitrary equation of a straight line.a and b are known values.This equation can be used to predict for each value of X, the value of Y.

For example, if X = xi (as for the ith case) then the predicted value of Y is:

ii bxay ˆ

Page 15:

Define the residual for each case in the sample to be:

$$r_i = y_i - \hat{y}_i = y_i - (a + b x_i)$$

so that $r_1 = y_1 - \hat{y}_1,\; r_2 = y_2 - \hat{y}_2,\; \ldots,\; r_n = y_n - \hat{y}_n$.

The residual sum of squares (RSS) is defined as:

$$RSS = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

The residual sum of squares (RSS) is a measure of the "goodness of fit" of the line $Y = a + bX$ to the data.

Page 16:

One choice of $a$ and $b$ will result in the residual sum of squares

$$R(a, b) = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

attaining a minimum. If this is the case then the line

$$Y = a + bX$$

is called the Least Squares Line.

Page 17:

To find the least squares estimates, $a$ and $b$, we need to solve the equations:

$$\frac{\partial R(a,b)}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = 0$$

and

$$\frac{\partial R(a,b)}{\partial b} = \frac{\partial}{\partial b} \sum_{i=1}^{n} (y_i - a - b x_i)^2 = 0$$

Page 18:

Note:

$$\frac{\partial R(a,b)}{\partial a} = \frac{\partial}{\partial a}\sum_{i=1}^{n} (y_i - a - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-1) = 0$$

or

$$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = a + b \bar{x}$$

Page 19:

Note:

$$\frac{\partial R(a,b)}{\partial b} = \frac{\partial}{\partial b}\sum_{i=1}^{n} (y_i - a - b x_i)^2 = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-x_i) = 0$$

or

$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

Page 20:

Hence the optimal values of $a$ and $b$ satisfy the equations:

$$\bar{y} = a + b\bar{x} \quad \text{and} \quad \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

From the first equation we have:

$$a = \bar{y} - b\bar{x}$$

The second equation becomes:

$$\sum_{i=1}^{n} x_i y_i = (\bar{y} - b\bar{x})\, n\bar{x} + b \sum_{i=1}^{n} x_i^2 = n\bar{x}\bar{y} + b\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

Page 21:

Solving the second equation for $b$:

$$b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{S_{xy}}{S_{xx}}$$

and

$$a = \bar{y} - b\bar{x} = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$

where

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \quad \text{and} \quad S_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

Page 22:

Note:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \quad \text{and} \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

Proof:

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} \left(x_i y_i - \bar{x} y_i - x_i \bar{y} + \bar{x}\bar{y}\right) = \sum_{i=1}^{n} x_i y_i - \bar{x}\sum_{i=1}^{n} y_i - \bar{y}\sum_{i=1}^{n} x_i + n\bar{x}\bar{y}$$

$$= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}$$

(The identity for $S_{xx}$ follows by the same argument with $y_i$ replaced by $x_i$.)

Page 23:

Summary: slope and intercept of the least squares line:

$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{and} \quad a = \bar{y} - \frac{S_{xy}}{S_{xx}}\bar{x}$$
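To make the summary concrete, here is a minimal Python sketch (numpy assumed; the data arrays are hypothetical) computing the least squares slope and intercept exactly as above:

```python
import numpy as np

# Hypothetical sample data (x_i, y_i), i = 1..n
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# S_xy = sum x_i y_i - n xbar ybar,  S_xx = sum x_i^2 - n xbar^2
Sxy = np.sum(x * y) - n * xbar * ybar
Sxx = np.sum(x**2) - n * xbar**2

b = Sxy / Sxx          # slope
a = ybar - b * xbar    # intercept
print(a, b)
```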

Page 24:

Maximum Likelihood Estimation of the parameters:

$\alpha$ — the intercept
$\beta$ — the slope
$\sigma$ — the standard deviation

Page 25:

Recall that the joint density of $y_1, y_2, \ldots, y_n$ is:

$$f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i-\alpha-\beta x_i)^2}{2\sigma^2}} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\alpha-\beta x_i)^2} = L(\alpha, \beta, \sigma)$$

= the likelihood function.

Page 26:

$$l(\alpha, \beta, \sigma) = \ln L = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$$

is the log-likelihood function.

To find the maximum likelihood estimates of $\alpha$, $\beta$ and $\sigma$ we need to solve the equations:

$$\frac{\partial l}{\partial \alpha} = 0, \quad \frac{\partial l}{\partial \beta} = 0, \quad \frac{\partial l}{\partial \sigma} = 0$$

Page 27:

$$\frac{\partial l}{\partial \alpha} = 0 \quad \text{becomes} \quad \sum_{i=1}^{n}(y_i - \alpha - \beta x_i) = 0$$

$$\frac{\partial l}{\partial \beta} = 0 \quad \text{becomes} \quad \sum_{i=1}^{n} x_i\,(y_i - \alpha - \beta x_i) = 0$$

These are the same equations as for the least squares line, which have solution:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

Page 28:

The third equation:

$$\frac{\partial l}{\partial \sigma} = 0 \quad \text{becomes} \quad -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 = 0$$

which gives

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

Page 29:

Summary: maximum likelihood estimates:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

and

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2$$

Page 30:

A computing formula for the estimate of $\sigma^2$:

Since $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2$, we have

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - \hat{\beta}(x_i - \bar{x})\right]^2$$

Hence

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i - \bar{y})^2 - 2\hat{\beta}(x_i - \bar{x})(y_i - \bar{y}) + \hat{\beta}^2 (x_i - \bar{x})^2\right] = \frac{1}{n}\left[S_{yy} - 2\hat{\beta} S_{xy} + \hat{\beta}^2 S_{xx}\right]$$

Page 31:

Now $\hat{\beta} = \dfrac{S_{xy}}{S_{xx}}$. Hence

$$\hat{\sigma}^2 = \frac{1}{n}\left[S_{yy} - 2\hat{\beta} S_{xy} + \hat{\beta}^2 S_{xx}\right] = \frac{1}{n}\left[S_{yy} - 2\frac{S_{xy}^2}{S_{xx}} + \frac{S_{xy}^2}{S_{xx}^2}\, S_{xx}\right] = \frac{1}{n}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$

Page 32: The General Linear Model. The Simple Linear Model Linear Regression

222 2ˆ

n

nE

It also can be shown that

Thus , the maximum likelihood estimator of 2, is a biased estimator of 2.

2

This estimator can be easily converted into an unbiased estimator of 2 by multiply by the ratio n/(n – 2)

n

iii xy

nn

ns

1

222 ˆˆ2

2

xx

xyyy S

SS

n

2

2

1
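A short sketch of both estimates via the computing formula (numpy assumed, hypothetical data again):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

Sxx = np.sum((x - x.mean())**2)
Syy = np.sum((y - y.mean())**2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

sigma2_hat = (Syy - Sxy**2 / Sxx) / n       # biased MLE of sigma^2
s2 = (Syy - Sxy**2 / Sxx) / (n - 2)         # unbiased estimator
print(sigma2_hat, s2)
```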

Page 33:

Estimators in linear regression:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

and

$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right]$$

Page 34:

The major computation is:

$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \quad S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \quad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Page 35:

Computing formulae:

$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

$$S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
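The two forms can be checked against each other numerically; a minimal sketch with hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Definition form
Sxx_def = np.sum((x - x.mean())**2)
Sxy_def = np.sum((x - x.mean()) * (y - y.mean()))

# Computing-formula form
Sxx_cf = np.sum(x**2) - np.sum(x)**2 / n
Sxy_cf = np.sum(x * y) - np.sum(x) * np.sum(y) / n

assert np.isclose(Sxx_def, Sxx_cf) and np.isclose(Sxy_def, Sxy_cf)
```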

Page 36:

Application of Statistical Theory to Simple Linear Regression

We will now use statistical theory to prove optimal properties of the estimators.

Recall, the joint density of $y_1, y_2, \ldots, y_n$ is:

$$f(y_1, \ldots, y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2}$$

Page 37:

Expanding the exponent:

$$\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2 = \sum_{i=1}^{n} y_i^2 - 2\alpha\sum_{i=1}^{n} y_i - 2\beta\sum_{i=1}^{n} x_i y_i + n\alpha^2 + 2\alpha\beta\sum_{i=1}^{n} x_i + \beta^2\sum_{i=1}^{n} x_i^2$$

Hence

$$f(y_1, \ldots, y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\left[n\alpha^2 + 2\alpha\beta\sum_{i=1}^{n} x_i + \beta^2\sum_{i=1}^{n} x_i^2\right]}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} y_i^2 + \frac{\alpha}{\sigma^2}\sum_{i=1}^{n} y_i + \frac{\beta}{\sigma^2}\sum_{i=1}^{n} x_i y_i}$$

$$= g(\boldsymbol{\theta})\, h(\mathbf{y})\, \exp\!\left\{\sum_{i=1}^{3} p_i(\boldsymbol{\theta})\, S_i(\mathbf{y})\right\}$$

where

$$h(\mathbf{y}) = 1, \quad g(\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2}\left[n\alpha^2 + 2\alpha\beta\sum_{i=1}^{n} x_i + \beta^2\sum_{i=1}^{n} x_i^2\right]}$$

Page 38: The General Linear Model. The Simple Linear Model Linear Regression

231

3

221

2

211

21

)( ,)(

,)( ,)(

,2

1)( ,)( and

θy

θy

θy

pyxS

pyS

pyS

n

iii

n

ii

n

ii

statisticssufficientcomplete

yyy

are

)(,)(,)(

Thus

13

12

1

21

n

iii

n

ii

n

ii yxSySyS

Page 39:

Now

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{S_2(\mathbf{y})}{n}$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = S_1(\mathbf{y}) - \frac{S_2(\mathbf{y})^2}{n}$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})$$

Page 40:

Also

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{1}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right]$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{S_2(\mathbf{y})}{n} - \frac{\bar{x}}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right]$$

and

$$s^2 = \frac{1}{n-2}\left[S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right] = \frac{1}{n-2}\left[S_1(\mathbf{y}) - \frac{S_2(\mathbf{y})^2}{n} - \frac{\left(S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right)^2}{S_{xx}}\right]$$

Thus all three estimators are functions of the set of complete sufficient statistics $S_1(\mathbf{y}), S_2(\mathbf{y}), S_3(\mathbf{y})$.

If they are also unbiased then they are Uniform Minimum Variance Unbiased (UMVU) estimators (using the Lehmann–Scheffé theorem).

Page 41:

We have already shown that $s^2$ is an unbiased estimator of $\sigma^2$. We need only show that

$$\hat{\beta} = \frac{1}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right] \quad \text{and} \quad \hat{\alpha} = \frac{S_2(\mathbf{y})}{n} - \frac{\bar{x}}{S_{xx}}\left[S_3(\mathbf{y}) - \bar{x}\, S_2(\mathbf{y})\right]$$

are unbiased estimators of $\beta$ and $\alpha$.

Now $S_2(\mathbf{y}) = \sum_{i=1}^{n} y_i$, $S_3(\mathbf{y}) = \sum_{i=1}^{n} x_i y_i$ and $E[y_i] = \alpha + \beta x_i$. Thus

$$E[S_2(\mathbf{y})] = \sum_{i=1}^{n}(\alpha + \beta x_i) = n\alpha + n\beta\bar{x}$$

and

$$E[S_3(\mathbf{y})] = \sum_{i=1}^{n} x_i(\alpha + \beta x_i) = n\alpha\bar{x} + \beta\sum_{i=1}^{n} x_i^2$$

Page 42:

Thus

$$E[\hat{\beta}] = \frac{1}{S_{xx}}\left[E[S_3(\mathbf{y})] - \bar{x}\, E[S_2(\mathbf{y})]\right] = \frac{1}{S_{xx}}\left[n\alpha\bar{x} + \beta\sum_{i=1}^{n} x_i^2 - \bar{x}\left(n\alpha + n\beta\bar{x}\right)\right] = \frac{\beta}{S_{xx}}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] = \beta$$

since $S_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$. Thus $\hat{\beta}$ is an unbiased estimator of $\beta$.

Page 43:

Also

$$E[\hat{\alpha}] = \frac{E[S_2(\mathbf{y})]}{n} - \frac{\bar{x}}{S_{xx}}\left[E[S_3(\mathbf{y})] - \bar{x}\, E[S_2(\mathbf{y})]\right] = \frac{n\alpha + n\beta\bar{x}}{n} - \frac{\bar{x}}{S_{xx}}\left[\beta S_{xx}\right] = \alpha + \beta\bar{x} - \beta\bar{x} = \alpha$$

Thus $\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators of $\alpha$ and $\beta$.

Page 44:

The General Linear Model

Page 45:

Consider the random variable $Y$ with

1. $E[Y] = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = \sum_{i=1}^{p}\beta_i X_i$

(alternatively $E[Y] = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$, intercept included)

and

2. $\operatorname{var}(Y) = \sigma^2$

• where $\beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters

• and $X_1, X_2, \ldots, X_p$ are nonrandom variables.

• Assume further that $Y$ is normally distributed.

Page 46:

Thus the density of $Y$ is:

$$f(Y \mid \beta_1, \ldots, \beta_p, \sigma^2) = f(Y \mid \boldsymbol{\beta}, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left\{-\frac{1}{2\sigma^2}\left(Y - \beta_1 X_1 - \beta_2 X_2 - \cdots - \beta_p X_p\right)^2\right\}$$

where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_p)'$.

Page 47:

Now suppose that $n$ independent observations of $Y$, $(y_1, y_2, \ldots, y_n)$, are made

corresponding to $n$ sets of values of $(X_1, X_2, \ldots, X_p)$: $(x_{11}, x_{12}, \ldots, x_{1p})$, $(x_{21}, x_{22}, \ldots, x_{2p})$, ..., $(x_{n1}, x_{n2}, \ldots, x_{np})$.

Then the joint density of $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is:

$$f(y_1, \ldots, y_n \mid \beta_1, \ldots, \beta_p, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2\right\} = f(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2)$$

Page 48:

Thus

$$f(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right\}$$

$$= \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}\left(\mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right)\right\}$$

$$= \underbrace{\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\!\left\{-\frac{1}{2\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}\right\}}_{g(\boldsymbol{\beta},\,\sigma^2)}\; h(\mathbf{y})\; \exp\!\left\{-\frac{1}{2\sigma^2}\mathbf{y}'\mathbf{y} + \frac{1}{\sigma^2}\boldsymbol{\beta}'\mathbf{X}'\mathbf{y}\right\}, \quad h(\mathbf{y}) = 1$$

where

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}$$

Page 49:

Thus $f(\mathbf{y} \mid \boldsymbol{\beta}, \sigma^2)$ is a member of the exponential family of distributions, and

$$S(\mathbf{y}) = \left(\mathbf{y}'\mathbf{y},\; \mathbf{X}'\mathbf{y}\right)$$

is a minimal complete set of sufficient statistics.

Page 50:

Matrix-vector formulation

The General Linear Model

Page 51: The General Linear Model. The Simple Linear Model Linear Regression

npnn

p

p

pn xxx

xxx

xxx

y

y

y

21

22221

11211

2

1

2

1

,,Let Xβy

ondistributi , a has Then 2IβXy

N X

ondistributi , a has and

thenlet or 2I0εεβXy

βXyε

N

Page 52:

The General Linear Model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \text{where } \boldsymbol{\varepsilon} \text{ has a } N(\mathbf{0}, \sigma^2\mathbf{I}) \text{ distribution}$$

with $\mathbf{y}$, $\boldsymbol{\beta}$ and $\mathbf{X}$ as defined above.

Page 53:

Geometrical interpretation of the General Linear Model

With $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\boldsymbol{\varepsilon}$ has a $N(\mathbf{0}, \sigma^2\mathbf{I})$ distribution, and $\mathbf{x}_1, \ldots, \mathbf{x}_p$ the columns of $\mathbf{X}$:

$$\boldsymbol{\mu} = E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta} = \beta_1\mathbf{x}_1 + \beta_2\mathbf{x}_2 + \cdots + \beta_p\mathbf{x}_p$$

lies in the linear space spanned by the columns of $\mathbf{X}$.

Page 54: The General Linear Model. The Simple Linear Model Linear Regression

Geometical interpretation of the General Linear Model

1y

βXμ

ε

y

px

X of columns by the

spanned spaceLinear

1x

2y

ny

Page 55:

Estimation

The General Linear Model

Page 56: The General Linear Model. The Simple Linear Model Linear Regression

Least squares estimates of Let

βXyβXy

n

i

p

jijjip xyRR

1

2

121 ,, β

p ,, of estimates squaresLeast The 21

p ˆ,ˆ,ˆ values theare 21

n

i

p

jijjip xyR

1

2

121 ,, minimizethat

Page 57: The General Linear Model. The Simple Linear Model Linear Regression

The Equations for the Least squares estimates

pk

R

k

p ,,2 ,1 ,0,, 21

02or 1 1

n

iik

p

jijji xxy

pkyxxxn

iiik

p

j

n

iikijj ,...,2 ,1 and

11 1

Page 58:

Written out in full:

$$\beta_1\sum_{i=1}^{n} x_{i1}^2 + \beta_2\sum_{i=1}^{n} x_{i1}x_{i2} + \cdots + \beta_p\sum_{i=1}^{n} x_{i1}x_{ip} = \sum_{i=1}^{n} x_{i1}\, y_i$$

$$\beta_1\sum_{i=1}^{n} x_{i2}x_{i1} + \beta_2\sum_{i=1}^{n} x_{i2}^2 + \cdots + \beta_p\sum_{i=1}^{n} x_{i2}x_{ip} = \sum_{i=1}^{n} x_{i2}\, y_i$$

$$\vdots$$

$$\beta_1\sum_{i=1}^{n} x_{ip}x_{i1} + \beta_2\sum_{i=1}^{n} x_{ip}x_{i2} + \cdots + \beta_p\sum_{i=1}^{n} x_{ip}^2 = \sum_{i=1}^{n} x_{ip}\, y_i$$

These equations are called the Normal Equations.

Page 59:

Matrix development of the Normal Equations:

Now

$$R(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$$

$$\frac{\partial R(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{0}$$

or

$$\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{y} \quad \text{(the Normal Equations)}$$

Page 60:

Summary (the least squares estimates):

The least squares estimates satisfy the Normal Equations

$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$

where $\mathbf{y}$ and $\mathbf{X}$ are as defined above.

Page 61: The General Linear Model. The Simple Linear Model Linear Regression

Note: Some matrix properties

Rank rank(AB) = min(rank(A), rank(B))

rank(A) ≤ min(# rows of A, # cols of A )

rank(A) = rank(A)Consider the normal equations

yXβXX ˆ

matrix. a is andmatrix a is pppn XXX

. if ,min nppnprankrank XXX

. is then invertibleXXXXX prankrank

. of be tosaid is matrix The rank fullX

Page 62: The General Linear Model. The Simple Linear Model Linear Regression

then the solution to the normal equations

yXβXX ˆ

matrix. a is andmatrix a is pppn XXX

. if ,min nppnprankrank XXX

. is then invertibleXXXXX prankrank

. of is matrix theIf rank fullX

yXXXβ 1ˆ
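A minimal sketch of solving the normal equations (numpy assumed; the design matrix and response are synthetic). Solving the linear system directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 4
X = rng.normal(size=(n, p))              # hypothetical full-rank design matrix
y = X @ np.array([1.2, -0.6, 0.0, 0.8]) + rng.normal(size=n)

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, numerically preferable least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```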

Page 63:

Maximum Likelihood Estimation

General Linear Model

Page 64: The General Linear Model. The Simple Linear Model Linear Regression

The General Linear Model

εβXy

ondistributi , a has where 2I0ε

N

βXyβXy

βy

22

1

2/2

2

2

1,

ef n

2

1,

function Likelihood

22

1

2/2

2βXyβXy

y β

eL n

Page 65: The General Linear Model. The Simple Linear Model Linear Regression

The Maximum Likelihood estimates of and 2 are the values

that maximize

or equivalently

2ˆ and ˆ β

2

1,

22

1

2/2

2βXyβXy

y β

eL n

,ln, 22 ββ yy

Ll

2

1ln2ln

22

22 βXyβXy

nn

22

1ln2ln

22

22 βXXββXyyy

nn

Page 66: The General Linear Model. The Simple Linear Model Linear Regression

This yields the system of linear equations

(The Normal Equations)

0

β

βXXβ

β

βXy

β

βy

2

,2l

yXβXX ˆ

0βXXyX

22 or

Page 67: The General Linear Model. The Simple Linear Model Linear Regression

0

βy

2

2 ,

l

yields the equation:

0

1

2

ln2

2

2

2

2

βXyβXy

n

0 2

1 2222

βXyβXy

n

0

22 42

βXyβXy

n

βXyβXy ˆˆ1ˆ 2

n

Page 68: The General Linear Model. The Simple Linear Model Linear Regression

If [X'X]-1 exists then the normal equations have solution:

and

βXyβXy ˆˆ1

ˆ 2

n

yXXXXyyXXXXy 11

n

1

yXXXXIXXXXIy 11

n

1

yXXXXIy 1

n

1

βXyyyyXXXXyyy 1 ˆ11

nn

yXXXβ 1

Page 69:

Summary:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

and

$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \frac{1}{n}\mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\right]\mathbf{y} = \frac{1}{n}\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}\right)$$
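A brief numeric check of the equivalent forms of $\hat{\sigma}^2$ (numpy assumed; same synthetic $\mathbf{X}$ and $\mathbf{y}$ as the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 4
X = rng.normal(size=(n, p))              # hypothetical design matrix
y = X @ np.array([1.2, -0.6, 0.0, 0.8]) + rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ beta_hat
sigma2_mle = (resid @ resid) / n                   # (y - Xb)'(y - Xb) / n
sigma2_alt = (y @ y - beta_hat @ (X.T @ y)) / n    # (y'y - b'X'y) / n, same value
assert np.isclose(sigma2_mle, sigma2_alt)
```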

Page 70:

Comments:

The matrices

$$\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \quad \text{and} \quad \mathbf{E}_2 = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$

are symmetric idempotent matrices:

$$\mathbf{E}_1\mathbf{E}_1 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{E}_1$$

and also $\mathbf{E}_2\mathbf{E}_2 = \mathbf{E}_2$.

Page 71: The General Linear Model. The Simple Linear Model Linear Regression

Comments (continued)

pnrankrank

prankrank

prank

XXXXIE

XXXXE

XX

12

11 and

rank) full of (i.e. if

1E1E
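These two properties are easy to verify numerically; for a symmetric idempotent matrix the rank equals the trace, which the sketch below uses (numpy assumed, synthetic full-rank design):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 15, 4
X = rng.normal(size=(n, p))                 # hypothetical full-rank design

E1 = X @ np.linalg.inv(X.T @ X) @ X.T       # projection onto the column space of X
E2 = np.eye(n) - E1

assert np.allclose(E1 @ E1, E1) and np.allclose(E2 @ E2, E2)
# For a symmetric idempotent matrix, rank = trace
assert np.isclose(np.trace(E1), p) and np.isclose(np.trace(E2), n - p)
```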

Page 72:

Geometry of Least Squares

[Figure: $\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is the projection of $\mathbf{y} = (y_1, y_2, \ldots, y_n)'$ onto the linear space spanned by the columns $\mathbf{x}_1, \ldots, \mathbf{x}_p$ of $\mathbf{X}$.]

Page 73:

Example

• Data are collected for $n = 15$ cases on the variables $Y$ (the dependent variable) and $X_1$, $X_2$, $X_3$ and $X_4$.

• The data and calculations are displayed on the next page:

Page 74:

$$\mathbf{X} = \begin{bmatrix}
52 & 59 & 34 & 74 \\
49 & 67 & 37 & 89 \\
53 & 51 & 31 & 87 \\
38 & 76 & 21 & 74 \\
56 & 69 & 27 & 86 \\
48 & 74 & 30 & 83 \\
51 & 69 & 28 & 76 \\
52 & 70 & 37 & 77 \\
57 & 76 & 29 & 85 \\
49 & 63 & 29 & 76 \\
48 & 72 & 26 & 75 \\
48 & 66 & 31 & 87 \\
46 & 67 & 25 & 75 \\
48 & 61 & 21 & 80 \\
45 & 66 & 32 & 78
\end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 88.6 \\ 86.6 \\ 110.2 \\ 59.2 \\ 91.7 \\ 85.7 \\ 76.8 \\ 86.1 \\ 97.1 \\ 79.7 \\ 82.5 \\ 92.3 \\ 74.2 \\ 87.9 \\ 79.8 \end{bmatrix}$$

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix}
36806 & 49540 & 21734 & 59457 \\
49540 & 68096 & 29284 & 80543 \\
21734 & 29284 & 13118 & 35216 \\
59457 & 80543 & 35216 & 96736
\end{bmatrix}, \quad \mathbf{X}'\mathbf{y} = \begin{bmatrix} 63637.4 \\ 85176.9 \\ 37647 \\ 103047.5 \end{bmatrix}$$

Page 75: The General Linear Model. The Simple Linear Model Linear Regression

0.004291 -0.00019 -0.00131 -0.002

(XX)-1 = -0.00019 0.000979 0.00018 -0.00076

-0.00131 0.00018 0.003771 -0.00072-0.002 -0.00076 -0.00072 0.002134

1.238816

(XX)-1Xy = -0.64272-0.003

0.840056

β

yXc

yXβyy

where20.59156ˆ11

1

ˆ11

1

4

1

17

1

2

2

jjj

ii cy

s

4.53779220.59156 s
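The example can be reproduced with a short script (numpy assumed); entering the $\mathbf{X}$ and $\mathbf{y}$ shown above should return the slide's estimates, approximately:

```python
import numpy as np

X = np.array([
    [52, 59, 34, 74], [49, 67, 37, 89], [53, 51, 31, 87],
    [38, 76, 21, 74], [56, 69, 27, 86], [48, 74, 30, 83],
    [51, 69, 28, 76], [52, 70, 37, 77], [57, 76, 29, 85],
    [49, 63, 29, 76], [48, 72, 26, 75], [48, 66, 31, 87],
    [46, 67, 25, 75], [48, 61, 21, 80], [45, 66, 32, 78],
], dtype=float)
y = np.array([88.6, 86.6, 110.2, 59.2, 91.7, 85.7, 76.8, 86.1,
              97.1, 79.7, 82.5, 92.3, 74.2, 87.9, 79.8])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # ~ [1.2388, -0.6427, -0.0030, 0.8401]
n, p = X.shape
s2 = (y @ y - beta_hat @ (X.T @ y)) / (n - p)  # ~ 20.59
print(beta_hat, s2, np.sqrt(s2))
```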

Page 76:

Properties of The Maximum Likelihood Estimates

Unbiasedness, Minimum Variance

Page 77: The General Linear Model. The Simple Linear Model Linear Regression

Note:

and

yXXXβ

1ˆ EE

ββXXXXyXXX

11 E

βcβcβc

ˆˆ EE

Thus is an unbiased estimator of . Since

is also a function of the set of complete minimal sufficient statistics, it is the UMVU estimator of . (Lehman-Scheffe)

βcβc

βc

βc

Page 78: The General Linear Model. The Simple Linear Model Linear Regression

Note:

where

βXyβXy ˆˆ1

ˆ 2

n

yAyyXXXXIy 1

n

1

XXXXIA 1

n

1

In general

AΣμAμyAy trE

where yμ

EyΣ

ofmatrix covariance-variance and

Page 79: The General Linear Model. The Simple Linear Model Linear Regression

Thus:

βXyβXy ˆˆ1

ˆ 2

n

yAyyXXXXIy 1

E

nEE

1ˆ 2

XXXXIA 1

n

1where

Aμμ trA

βXyμ

E nI2

βXXXXXIXβ 1

nE

1ˆ 2

nntr IXXXXI 1 21

Page 80: The General Linear Model. The Simple Linear Model Linear Regression

Thus:

βXXXXXXXβ 1

nE

1ˆ 2

XXXXI 1 ntr

n

2

XXXXI 1 trtrn n

2

0

ptrnn

trnn

IXXXX 2

12

2n

pn

Page 81: The General Linear Model. The Simple Linear Model Linear Regression

Let

βXyβXy ˆˆ1

ˆ 22

pnpn

ns

Then

2222 ˆ

n

pn

pn

nE

pn

nsE

Thus s2 is an unbiased estimator of 2.

Since s2 is also a function of the set of complete minimal sufficient statistics, it is the UMVU estimator of 2.

Page 82: The General Linear Model. The Simple Linear Model Linear Regression

Distributional Properties

Least square Estimates (Maximum Likelidood estimates)

Page 83: The General Linear Model. The Simple Linear Model Linear Regression

1. If then where A is a q × p matrix

Recall

~ , μy

pN AAμAyAw , ~

qN

yAyμy UN p then , ~ Suppose 2.

.rank of idempotent is 2.

.rank of idempotent is .1

r

r

A

A

μAμ

21on with distributi , has r

t.independen are and

then and , ~ Suppose 3.

yAwyAy

0ACμy

U

N p

t.independen are and

then and , ~ Suppose 4.

21 yByyAy

0BAμy

UU

N p

Page 84: The General Linear Model. The Simple Linear Model Linear Regression

The General Linear Model

and yXXXXIy 1

pns

1 2. 2

XXXAyAyXXXβ 11 whereˆ 1.

yBy

yXXXXIy 1

1 Now

22

2

spn

U

XXXXIB 1 2

1 where

IβXy 2, ~

nN

The Estimates

Page 85: The General Linear Model. The Simple Linear Model Linear Regression

Theorem

.0 with ,~ 2. 22

2

pnspn

U

12, ~ ˆ 1. XXββ

pN

Proof

tindependen are and ˆ 3. 2sβ

IβXμy 2,, ~ Since

nn NN

AAμAyAyXXXβ , ~ ˆthen 1 pN

ββXXXXβXXXXμA

11 Now

12112

121

and

XXXXXXXX

XXXIXXXAA

12, ~ ˆ Thus XXββ

pN

Page 86: The General Linear Model. The Simple Linear Model Linear Regression

0on with distributi , has Now pnU yBy

.rank of idempotent is 2.

.rank of idempotent is .1

pn

pn

B

B

0 and 21 μBμ

IXXXXIB 1 22

and 1

where

BXXXXI

IXXXXIB

1

1

1 Now 2

2

XXXXXXXX

XXXXXXXXI

XXXXIXXXXI

11

11

11

Since

Page 87: The General Linear Model. The Simple Linear Model Linear Regression

βXXXXXIXβμBμ 1

221

21 1

Also

.idempotent is Thus XXXXIBB 1

XXXXXXXX

XXXXXXXXI11

11

XXXXI 1

βXXXXXIXβ 1

22

1

βXXXXXXXXβ 1

22

1

02

12

βXXXXβ

Page 88: The General Linear Model. The Simple Linear Model Linear Regression

pn

rank of

idempotent is Thus XXXXIBB 1

.rank full of is if Now pprankrank XXXXXX 1

XXXXIEXXXXE 11 21 and Let

0EE

EEIEE

21

2121

and

with idempotent symmetric are and

Thus 12 pnranknrank EE

.0 with ,~ Hence 22

2

pnspn

U yBy

Page 89: The General Linear Model. The Simple Linear Model Linear Regression

Finally

and yCyyXXXXIy 1

pns

12

XXXAyAyXXXβ 11 whereˆ Since

XXXXIC 1

pn

1with

XXXXIIXXXXCA 11

pn

1then

XXXXIXXXX 11

pn

XXXXXXXXXXXX 111

pn

0XXXXXXXX 11

pn

Page 90: The General Linear Model. The Simple Linear Model Linear Regression

Thus

are independent.

yXXXXIyyXXXβ 1

pns

1 and ˆ 21

Summary

.0 with ,~ 2. 22

2

pnspn

U

12, ~ ˆ 1. XXββ

pN

tindependen are and ˆ 3. 2sβ
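The summary can be checked by simulation; a sketch (numpy and scipy assumed, with hypothetical parameter values) that compares the empirical distribution of $(n-p)s^2/\sigma^2$ with $\chi^2_{n-p}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma = 15, 4, 2.0
X = rng.normal(size=(n, p))              # hypothetical fixed design
beta = np.array([1.0, -0.5, 0.0, 2.0])

u_samples = []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ b
    u_samples.append((r @ r) / sigma**2)   # (n - p) s^2 / sigma^2

# The simulated mean should be close to the chi-square mean, n - p = 11
print(np.mean(u_samples), n - p)
# A formal check: Kolmogorov-Smirnov against chi2(n - p)
print(stats.kstest(u_samples, 'chi2', args=(n - p,)))
```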

Page 91:

Example

• Data are collected for $n = 15$ cases on the variables $Y$ (the dependent variable) and $X_1$, $X_2$, $X_3$ and $X_4$.

• The data and calculations are displayed on the next page:

Page 92:

$$\mathbf{X} = \begin{bmatrix}
52 & 59 & 34 & 74 \\
49 & 67 & 37 & 89 \\
53 & 51 & 31 & 87 \\
38 & 76 & 21 & 74 \\
56 & 69 & 27 & 86 \\
48 & 74 & 30 & 83 \\
51 & 69 & 28 & 76 \\
52 & 70 & 37 & 77 \\
57 & 76 & 29 & 85 \\
49 & 63 & 29 & 76 \\
48 & 72 & 26 & 75 \\
48 & 66 & 31 & 87 \\
46 & 67 & 25 & 75 \\
48 & 61 & 21 & 80 \\
45 & 66 & 32 & 78
\end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 88.6 \\ 86.6 \\ 110.2 \\ 59.2 \\ 91.7 \\ 85.7 \\ 76.8 \\ 86.1 \\ 97.1 \\ 79.7 \\ 82.5 \\ 92.3 \\ 74.2 \\ 87.9 \\ 79.8 \end{bmatrix}$$

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix}
36806 & 49540 & 21734 & 59457 \\
49540 & 68096 & 29284 & 80543 \\
21734 & 29284 & 13118 & 35216 \\
59457 & 80543 & 35216 & 96736
\end{bmatrix}, \quad \mathbf{X}'\mathbf{y} = \begin{bmatrix} 63637.4 \\ 85176.9 \\ 37647 \\ 103047.5 \end{bmatrix}$$

Page 93:

$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
0.004291 & -0.00019 & -0.00131 & -0.002 \\
-0.00019 & 0.000979 & 0.00018 & -0.00076 \\
-0.00131 & 0.00018 & 0.003771 & -0.00072 \\
-0.002 & -0.00076 & -0.00072 & 0.002134
\end{bmatrix}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \begin{bmatrix} 1.238816 \\ -0.64272 \\ -0.003 \\ 0.840056 \end{bmatrix}$$

$$s^2 = \frac{1}{15-4}\sum_{i=1}^{15}\left(y_i - \sum_{j=1}^{4}\hat{\beta}_j x_{ij}\right)^2 = \frac{1}{11}\left(\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{c}\right) = 20.59156, \quad \text{where } \mathbf{c} = \mathbf{X}'\mathbf{y}$$

$$s = \sqrt{20.59156} = 4.537792$$

Page 94: The General Linear Model. The Simple Linear Model Linear Regression

0.2096250.043943

0.2786530.077648

0.1419820.020159

0.2972470.088356

4

3

2

ˆ

ˆ

ˆ

ˆ

s

s

s

s

βvar 0.088356 -0.00401 -0.02694 -0.04116

s2(XX)-1

= -0.00401 0.020159 0.003709 -0.01567-0.02694 0.003709 0.077648 -0.0148-0.04116 -0.01567 -0.0148 0.043943
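As a sketch, the standard errors reported above are the square roots of the diagonal of $s^2(\mathbf{X}'\mathbf{X})^{-1}$; a small helper (numpy assumed) computes them for any full-rank design:

```python
import numpy as np

def coef_standard_errors(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Standard errors of the least squares coefficients: sqrt(diag(s^2 (X'X)^{-1}))."""
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = (y @ y - beta_hat @ (X.T @ y)) / (n - p)
    cov_beta = s2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov_beta))

# Applied to the example's X and y this should return approximately
# [0.297, 0.142, 0.279, 0.210], matching the slide.
```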

Page 95: The General Linear Model. The Simple Linear Model Linear Regression

1.238816

(XX)-1Xy = -0.64272-0.003

0.840056

β

0.2096250.043943

0.2786530.077648

0.1419820.020159

0.2972470.088356

4

3

2

ˆ

ˆ

ˆ

ˆ

s

s

s

s

Compare with SPSS output

Estimates of the coefficients

Page 96:

The General Linear Model

with an intercept

Page 97: The General Linear Model. The Simple Linear Model Linear Regression

Consider the random variable Y with

1. E[Y] = 0+ 1X1+ 2X2 + ... + pXp

(intercept included)

and

2. var(Y) = 2

• where 1, 2 , ... ,p are unknown parameters

• and X1 ,X2 , ... , Xp are nonrandom variables.

• Assume further that Y is normally distributed.

Page 98: The General Linear Model. The Simple Linear Model Linear Regression

npnn

p

p

pn xxx

xxx

xxx

y

y

y

21

22221

11211

2

1

0

2

1

1

1

1

,,Let Xβy

ondistributi , a has i.e. 2IβXy

N

ondistributi , a has where 2I0εεβXy

N

The matrix formulation (intercept included)

Then the model becomes

Thus to include an intercept add an extra column of 1’s in the design matrix X and include the intercept in the parameter vector

Page 99: The General Linear Model. The Simple Linear Model Linear Regression

nn x

x

x

y

y

y

1

1

1

,, 2

1

1

02

1

Xβy

The matrix formulation of the Simple Linear regression model

2

1

2

1

12

1

21

1

1

1

111

xnSxn

xnn

xx

xn

x

x

x

xxx

xx

n

ii

n

ii

n

ii

n

n XX

Page 100: The General Linear Model. The Simple Linear Model Linear Regression

and

yxnS

yn

yx

y

y

y

y

xxx xyn

iii

n

ii

n

n

1

12

1

21

111

yX

nxn

xnxnS

xnxnSn

xnSxn

xnn

xx

xx

xx

2

22

1

2

1

1

XX

12

1

1

xxxx

xxxx

SS

xS

x

S

x

n

Now

Page 101: The General Linear Model. The Simple Linear Model Linear Regression

thus

yxnS

yn

SS

xS

x

S

x

n

xy

xxxx

xxxx

1

1

ˆ

ˆ

2

1

1

yXXX

yxnS

Syn

S

x

yxnSS

xyn

S

x

n

xyxxxx

xyxxxx

1

1 2

xx

xy

xx

xy

S

S

xS

Sy

Page 102: The General Linear Model. The Simple Linear Model Linear Regression

Finally

xxxx

xxxx

S

ss

S

x

sS

xs

S

x

ns

22

222

12

1

1

ˆ

ˆvar XX

xx

xx

xx

S

sx

S

s

sS

x

n

2

10

2

1

22

ˆ,ˆcov

ˆvar

1ˆvar

Thus
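A quick numeric confirmation that the closed-form variances agree with the matrix form $s^2(\mathbf{X}'\mathbf{X})^{-1}$ (numpy assumed, hypothetical data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

X = np.column_stack([np.ones(n), x])       # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
s2 = np.sum((y - X @ beta_hat)**2) / (n - 2)

# Closed-form variances from the slide
Sxx = np.sum((x - x.mean())**2)
var_b0 = s2 * (1/n + x.mean()**2 / Sxx)
var_b1 = s2 / Sxx

cov = s2 * np.linalg.inv(X.T @ X)          # s^2 (X'X)^{-1}
assert np.allclose(np.diag(cov), [var_b0, var_b1])
```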

Page 103:

The Gauss–Markov Theorem

An important result in the theory of linear models: it proves optimality of least squares estimates in a more general setting.

Page 104: The General Linear Model. The Simple Linear Model Linear Regression

Assume the following model Linear Model

IyβXy 2var and

E

We will not necessarily assume Normality.

Consider the least squares estimate of

ˆ 1 yXXXβ

β

nn yayaya

2211

1

ˆ

ya

yXXXcβc

is an unbiased linear estimator of βc

Page 105: The General Linear Model. The Simple Linear Model Linear Regression

The Gauss-Markov Theorem

Assume

IyβXy 2var and

E

Consider the least squares estimate of

ˆ 1 yXXXβ

β

nn yayaya

2211

1

ˆ

ya

yXXXcβc

, an unbiased linear estimator of βc

and

Let nn ybybyb

2211yb

denote any other unbiased linear estimator of βc

βcyb ˆvarvarthen

Page 106: The General Linear Model. The Simple Linear Model Linear Regression

Proof Now IyβXy 2var and

E

βcβXXXXc

yXXXc

yXXXcβc

1

1

1

ˆ

E

EE

cXXccXXXXXXc

cXXXIXXXc

cXXXyXXXc

yXXXcβc

12112

121

11

1

var

var

varˆvar

Page 107: The General Linear Model. The Simple Linear Model Linear Regression

Now is an unbiased estimator of if

yb βc

ββcβXbybyb

allfor EE

cbXcXb

or i.e.

Also

bbbIbbybyb 22varvar

Thus

cXXcbbβcyb 122ˆvarvar

bXXXXbbb 122

bXXXXIb 12

bXXXXIuuu

bXXXXIXXXXIb

12

112

where0

Page 108: The General Linear Model. The Simple Linear Model Linear Regression

Thus

βcyb ˆvarvar

The Gauss-Markov theorem states that

is the Best Linear Unbiased Estimator (B.L.U.E.) of

βc

βc

Page 109:

Hypothesis testing for the GLM

The General Linear Hypothesis

Page 110: The General Linear Model. The Simple Linear Model Linear Regression

Testing the General Linear Hypotheses

The General Linear Hypothesis H0: h111 + h122 + h133 +... + h1pp = h1

h211 + h222 + h233 +... + h2pp = h2

...

hq11 + hq22 + hq33 +... + hqpp = hq

where h11h12, h13, ... , hqp and h1h2, h3, ... , hq are known coefficients.

In matrix notation11

qppqhβH

Page 111: The General Linear Model. The Simple Linear Model Linear Regression

Examples 1. H0: 1 = 0

2. H0: 1 = 0, 2 = 0, 3 = 0

3. H0: 1 = 2

6

5

4

3

2

1

16

β 0,000001

1161

hH

0

0

0

,

000100

000010

000001

1363hH

0,0000111161

hH

Page 112: The General Linear Model. The Simple Linear Model Linear Regression

Examples 4. H0: 1 = 2 , 3 = 4

5. H0: 1 = 1/2(2 + 3)

6. H0: 1 = 1/2(2 + 3), 3 = 1/3(4 + 5 + 6)

6

5

4

3

2

1

16

β

0

0,

001100

0000111262

hH

0,0001112

121

61

hH

0

0,

100

000112

31

31

31

21

21

62hH
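In code, $\mathbf{H}$ and $\mathbf{h}$ are just an ordinary matrix and vector; a minimal sketch (numpy assumed) building them for Example 4:

```python
import numpy as np

p = 6

# Example 4: H0: beta1 = beta2 and beta3 = beta4
H = np.zeros((2, p))
H[0, 0], H[0, 1] = 1.0, -1.0     # beta1 - beta2 = 0
H[1, 2], H[1, 3] = 1.0, -1.0     # beta3 - beta4 = 0
h = np.zeros(2)

# The hypothesis H beta = h can then be checked for any candidate beta:
beta = np.array([2.0, 2.0, 5.0, 5.0, 1.0, 3.0])
print(np.allclose(H @ beta, h))   # True: this beta satisfies H0
```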

Page 113: The General Linear Model. The Simple Linear Model Linear Regression

TheLikelihood Ratio Test

The joint density of is:

βXyβXyβy

22/2

2

2

1exp

2

1

nf

y

The likelihood function

βXyβXyβy

22/2

2

2

1exp

2

1

nL

The log-likelihood function

βXyβXy

ββ yy

22

22

22

2

1ln2ln

ln

nn

Ll

Page 114: The General Linear Model. The Simple Linear Model Linear Regression

Defn (Likelihood Ratio Test of size )Rejects

H0:

against the alternative hypothesis

H1: .

when

and K is chosen so that

KL

L

f

f

2

2

ˆˆ

ˆˆ

)|(max

)|(max

β

β

θx

θx

y

y

θ

θ

and allfor )|( θxθxxC

dfCP

0 oneleast at for )|( θxθxxC

dfCP

hβH

hβH

hβHβ

βββ

ˆ: assuming of sM.L.E.' theare

ˆˆ and of sM.L.E.' theare ˆˆ where

02

222

H

Page 115: The General Linear Model. The Simple Linear Model Linear Regression

Note

2ˆ and ˆ

find To β

We will maximize.

condition side thesubject to ly equivalent

2

1ln2ln

2

22

222

β

βXyβXyβ

y

y

L

l nn

βXyβXyyXXXβ ˆˆˆ and ˆ 121

n

hβH

:0H

The Lagrange multiplier technique will be used for this purpose

Page 116: The General Linear Model. The Simple Linear Model Linear Regression

We will maximize.

hβHλ

βXyβXy

hβHλβλβ y

2

1ln2ln

,

22

22

22

nn

lg

0hβH0

λ

λβ

gives ,2g

β

hβHλ

β

βXyβXy

β

λβ

2

2

2

1,

g

0λHβXXyX

222

12

Page 117: The General Linear Model. The Simple Linear Model Linear Regression

or0λHβXXyX

2

λHyXβXX 2

λHXXyXXXβ 121

0

2

1

2

,2222

2

βXyβXyλβ

ng

finally

or βXyβXy

n

12

Page 118: The General Linear Model. The Simple Linear Model Linear Regression

Thus the equations for are

λHXXyXXXβ 121 ˆˆ

Now

or

βXyβXy

ˆˆ1ˆ 2

n

hβH

ˆ

λHXXHyXXXHβHh 121 ˆˆ

yXXXHhλHXXH 1

2

1

ˆ1

and yXXXHHXXHhHXXHλ 11111

2ˆ1
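Substituting this $\boldsymbol{\lambda}$ back into the first equation, the $\hat{\hat{\sigma}}^2$ factors cancel, giving the standard restricted least squares form $\hat{\hat{\boldsymbol{\beta}}} = \hat{\boldsymbol{\beta}} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\left[\mathbf{H}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{H}'\right]^{-1}(\mathbf{h} - \mathbf{H}\hat{\boldsymbol{\beta}})$. A minimal sketch (numpy assumed; the $\mathbf{X}$, $\mathbf{y}$, $\mathbf{H}$, $\mathbf{h}$ are hypothetical):

```python
import numpy as np

def restricted_ls(X, y, H, h):
    """Restricted least squares: the MLE of beta subject to H beta = h."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                    # unrestricted estimate
    M = H @ XtX_inv @ H.T                           # q x q, invertible if H has full row rank
    correction = XtX_inv @ H.T @ np.linalg.solve(M, h - H @ beta_hat)
    return beta_hat + correction

# Hypothetical check: constrain beta1 = beta2 in a p = 4 model
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, 1.0, -0.5, 2.0]) + rng.normal(size=30)
H = np.array([[1.0, -1.0, 0.0, 0.0]])
h = np.array([0.0])
bb = restricted_ls(X, y, H, h)
print(bb, np.isclose(bb[0], bb[1]))   # restricted estimate satisfies H beta = h
```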