lecture 11

Models Not of Full Rank

Estimation/Hypothesis Testing

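$$Y_{N\times 1} = X_{N\times p}\, b_{p\times 1} + e_{N\times 1}$$

where $e = y - E(y)$ and $E(e) = 0$, so that $E(Y) = Xb$ and $\mathrm{var}(e) = E(ee') = \sigma^2 I_N$. This results in $e \sim (0, \sigma^2 I)$ and $Y \sim (Xb, \sigma^2 I)$.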

Normal equations:

Using least squares we obtain

$$X'Xb = X'Y$$

Consider a completely randomized design

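$$y_{ij} = \mu + \alpha_i + e_{ij} \quad \text{for } i = 1, 2, 3$$

Then $b = (\mu, \alpha_1, \alpha_2, \alpha_3)'$ and the data are represented as

observation   $\mu$   $\alpha_1$   $\alpha_2$   $\alpha_3$
$y_{11}$        1       1            0            0
$y_{12}$        1       1            0            0
$y_{13}$        1       1            0            0
$y_{21}$        1       0            1            0
$y_{22}$        1       0            1            0
$y_{31}$        1       0            0            1

where the data are:

          normal   off-type   aberrant
          101      84         32
          105      88
          94
totals    300      172        32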

The sum of the last three columns of $X$ is the first column: every $y_{ij}$ contains $\mu$, so the first column of $X$ is all ones, and every $y_{ij}$ contains exactly one $\alpha_i$, so the last three columns sum to a column of ones. Hence $X$ is not of full column rank. $X'X$ is square and symmetric; its elements are the inner products of the columns of $X$ with each other. Because $X$ is not of full column rank, $X'X$ is not of full rank.


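$$X'X = \begin{pmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$

NOTE: the elements of $X'X$ are the number of times each parameter of the model occurs in a total, i.e. $\mu$ occurs 6 times in $y_{..}$, $\alpha_1$ occurs 3 times in $y_{..}$, $\alpha_2$ occurs 2 times in $y_{..}$, and $\alpha_3$ occurs once in $y_{..}$; $\mu$ occurs 3 times in $y_{1.}$, $\alpha_1$ occurs 3 times in $y_{1.}$, and $\alpha_2$ and $\alpha_3$ do not occur in $y_{1.}$.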

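and

$$X'Y = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{31} \end{pmatrix} = \begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{pmatrix} = \begin{pmatrix} 504 \\ 300 \\ 172 \\ 32 \end{pmatrix}$$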

For example, $101 = y_{11} = \mu + \alpha_1 + e_{11}$ and $105 = y_{12} = \mu + \alpha_1 + e_{12}$.

$X'Y$ is a vector consisting of the inner products of the columns of $X$ with $Y$, and since the nonzero elements of $X$ are ones, we obtain the vector of totals $(y_{..},\ y_{1.},\ y_{2.},\ y_{3.})'$.
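The quantities above can be checked numerically. The following is a minimal NumPy sketch, not part of the original notes; the variable names (`X`, `y`, `XtX`, `Xty`) are illustrative choices.

```python
import numpy as np

# Design matrix for y_ij = mu + alpha_i + e_ij.
# Columns: mu, alpha_1, alpha_2, alpha_3; rows: y11, y12, y13, y21, y22, y31.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)

XtX = X.T @ X   # inner products of the columns of X
Xty = X.T @ y   # grand total followed by the three group totals

rank_X = np.linalg.matrix_rank(X)
rank_XtX = np.linalg.matrix_rank(XtX)

print(XtX)
print(Xty)
print("rank(X) =", rank_X, " rank(X'X) =", rank_XtX)
```

Both ranks come out smaller than the number of columns of $X$, which is the rank deficiency described above.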

Since $X'X$ is not of full rank, there is no unique solution to the normal equations

$$X'Xb^0 = X'Y$$

where

$$b^0 = \begin{pmatrix} \mu^0 \\ \alpha_1^0 \\ \alpha_2^0 \\ \alpha_3^0 \end{pmatrix}$$

and, applying a generalized inverse $G$ of $X'X$, we write

$$GX'Xb^0 = GX'Y, \qquad b^0 = GX'Y$$

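For the example, the normal equations are

$$\begin{pmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mu^0 \\ \alpha_1^0 \\ \alpha_2^0 \\ \alpha_3^0 \end{pmatrix} = \begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{pmatrix} = \begin{pmatrix} 504 \\ 300 \\ 172 \\ 32 \end{pmatrix}$$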

Since $E(X'Y) = X'Xb$, the normal equations can be rewritten as

$$\widehat{E(X'Y)} = X'Y$$

replacing $b$ by $b^0$ on the left-hand side.

Hence a solution is $b^0 = GX'Y$.
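To illustrate that the solution depends on the choice of $G$, the sketch below computes $b^0 = GX'Y$ for two generalized inverses of $X'X$ in the example. It is an illustration only: `G1` (invert the $\alpha$ block and pad with zeros) and `G2` (the Moore-Penrose inverse from `np.linalg.pinv`) are choices made here, not ones prescribed by the notes.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X
Xty = X.T @ y

# G1: invert the nonsingular lower-right 3x3 block, zeros elsewhere.
G1 = np.zeros((4, 4))
G1[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

# G2: Moore-Penrose inverse (explicit cutoff so the zero singular value is dropped).
G2 = np.linalg.pinv(XtX, rcond=1e-10)

b0_1 = G1 @ Xty
b0_2 = G2 @ Xty

print("b0 from G1:", b0_1)
print("b0 from G2:", b0_2)
# Both vectors satisfy the normal equations X'X b0 = X'Y.
print(np.allclose(XtX @ b0_1, Xty), np.allclose(XtX @ b0_2, Xty))
```

The two solutions differ element by element, yet each satisfies the normal equations.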

Consequences of a Solution: $b^0$ is a function of $Y$

a. Expected value

$$E(b^0) = GX'E(Y) = GX'Xb = Hb, \qquad H = GX'X$$
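A short sketch of what $E(b^0) = Hb$ means in the example, assuming the generalized inverse `G` that inverts the $\alpha$ block of $X'X$; the parameter vector `b` is hypothetical and chosen only for illustration.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
XtX = X.T @ X

# One generalized inverse of X'X (an illustrative choice).
G = np.diag([0.0, 1 / 3, 1 / 2, 1.0])
H = G @ XtX
print(H)

# Hypothetical true parameters (mu, alpha_1, alpha_2, alpha_3).
b = np.array([10.0, 1.0, 2.0, 3.0])
expected_b0 = H @ b   # E(b0) = H b
print(expected_b0)
```

With this `G`, the expected value of $b^0$ is $(0,\ \mu+\alpha_1,\ \mu+\alpha_2,\ \mu+\alpha_3)'$, so $b^0$ is not an unbiased estimator of $b$ itself.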

b. Variance

$$\mathrm{var}(b^0) = \mathrm{var}(GX'Y) = GX'\,\mathrm{var}(Y)\,XG' = GX'XG'\sigma^2$$

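For symmetric $X'X$ there is an orthogonal permutation matrix $P$ such that

$$(XP)'(XP) = P'X'XP = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix}$$

with $A_{11}$ nonsingular and of the same rank as $X'X$; then

$$G = P \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} P'$$

and $GX'XG' = G$, so that $\mathrm{var}(b^0) = G\sigma^2$.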
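The construction of $G$ by permutation and partitioning can be sketched as follows. The permutation used here (moving the $\mu$ column to the end so that the leading $3\times 3$ block is nonsingular) is one possible choice for this example, not the only one.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
XtX = X.T @ X
r = np.linalg.matrix_rank(XtX)

# Permutation matrix P: reorder columns as (alpha_1, alpha_2, alpha_3, mu).
order = [1, 2, 3, 0]
P = np.eye(4)[:, order]

A = P.T @ XtX @ P      # (XP)'(XP)
A11 = A[:r, :r]        # leading r x r block, nonsingular for this ordering

# G = P [[A11^{-1}, 0], [0, 0]] P'
inner = np.zeros((4, 4))
inner[:r, :r] = np.linalg.inv(A11)
G = P @ inner @ P.T

print(G)
print("X'X G X'X == X'X:", np.allclose(XtX @ G @ XtX, XtX))
print("G X'X G' == G:  ", np.allclose(G @ XtX @ G.T, G))
```

Because $GX'XG' = G$ for this $G$, the variance of $b^0$ reduces to $G\sigma^2$, as stated above.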

c. Estimating $E(y)$

$$\widehat{E(y)} = \hat{y} = Xb^0 = XGX'y$$

Note that this vector is invariant to $G$, since $XGX'$ is invariant to $G$; hence $\hat{y}$ is always the same regardless of $b^0$.
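The invariance of $XGX'$ can be checked on the example with two different generalized inverses. This is a sketch under the same illustrative choices of `G1` and `G2` (block inverse and Moore-Penrose inverse); neither is singled out by the notes.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
XtX = X.T @ X

G1 = np.diag([0.0, 1 / 3, 1 / 2, 1.0])   # block-inverse generalized inverse
G2 = np.linalg.pinv(XtX, rcond=1e-10)    # Moore-Penrose inverse

proj1 = X @ G1 @ X.T
proj2 = X @ G2 @ X.T

y_hat_1 = proj1 @ y
y_hat_2 = proj2 @ y

print(np.allclose(proj1, proj2))   # X G X' does not depend on G
print(y_hat_1)                     # fitted values: the group means
```

The fitted values are the three group means, whichever solution $b^0$ is used.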

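d. Residual Error Sum of Squares

$$\begin{aligned}
SSE &= (y - Xb^0)'(y - Xb^0) \\
    &= y'y - y'Xb^0 - (Xb^0)'y + (Xb^0)'(Xb^0) \\
    &= y'(I - XGX')'(I - XGX')y \\
    &= y'(I - XGX')y \\
    &= y'y - y'XGX'y \\
    &= y'y - b^{0\prime}X'y \quad \text{(computational form)}
\end{aligned}$$

$XGX'$ is invariant to $G$, so $SSE$ is invariant to $G$ and hence invariant to $b^0$.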
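As a numerical check of the two forms of $SSE$ on the example data, here is a minimal sketch; the particular `G` is an illustrative choice and any generalized inverse of $X'X$ would give the same value.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)

G = np.diag([0.0, 1 / 3, 1 / 2, 1.0])   # one generalized inverse of X'X
b0 = G @ X.T @ y

# Definition: squared length of the residual vector.
resid = y - X @ b0
sse_definition = resid @ resid

# Computational form: y'y - b0'X'y.
sse_computational = y @ y - b0 @ (X.T @ y)

print(sse_definition, sse_computational)
```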

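e. Estimating residual error variance

With $y \sim N(Xb, \sigma^2 I)$,

$$\begin{aligned}
E(SSE) &= E[y'(I - XGX')y] \\
       &= \mathrm{tr}[(I - XGX')\sigma^2 I] + b'X'(I - XGX')Xb \\
       &= \sigma^2\,\mathrm{rank}(I - XGX') \\
       &= [N - \mathrm{rank}(X)]\,\sigma^2
\end{aligned}$$

Hence

$$\hat{\sigma}^2 = \frac{SSE}{N - \mathrm{rank}(X)}$$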
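The steps of this derivation can be checked on the example: $I - XGX'$ is idempotent, its trace equals $N - \mathrm{rank}(X)$, and the second term vanishes because $(I - XGX')X = 0$. The sketch below assumes the same illustrative `G` as before.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
N = X.shape[0]
rank_X = np.linalg.matrix_rank(X)

G = np.diag([0.0, 1 / 3, 1 / 2, 1.0])   # one generalized inverse of X'X
M = np.eye(N) - X @ G @ X.T             # I - X G X'

trace_M = np.trace(M)                   # equals rank(M) since M is idempotent
sse = y @ M @ y
sigma2_hat = sse / (N - rank_X)

print("M idempotent:", np.allclose(M @ M, M))
print("M X = 0:     ", np.allclose(M @ X, 0))
print("tr(M) =", trace_M, " N - rank(X) =", N - rank_X)
print("sigma2_hat =", sigma2_hat)
```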

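f. Partitioning the SST (sum of squares total):

(i)   $SSR = y'XGX'y = b^{0\prime}X'y$
      $SSE = y'(I - XGX')y$
      $SST = y'y$

(ii)  $SSM = N\bar{y}^2 = y'N^{-1}11'y$ (from fitting a general mean)
      $SSR_m = y'(XGX' - N^{-1}11')y = SSR - SSM$
      $SSE = y'(I - XGX')y$
      $SST = y'y$

(iii) $SSR_m = y'(XGX' - N^{-1}11')y$
      $SSE = y'(I - XGX')y$
      $SST_m = y'y - N\bar{y}^2$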
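The partition can be evaluated for the example data as follows. This is a sketch with illustrative variable names; `J` denotes the matrix $N^{-1}11'$.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
N = len(y)

G = np.diag([0.0, 1 / 3, 1 / 2, 1.0])   # one generalized inverse of X'X
proj = X @ G @ X.T                      # X G X'
J = np.ones((N, N)) / N                 # N^{-1} 1 1'

SST = y @ y
SSM = y @ J @ y                         # N * ybar^2
SSR = y @ proj @ y
SSE = y @ (np.eye(N) - proj) @ y
SSRm = y @ (proj - J) @ y               # SSR - SSM
SSTm = SST - SSM

for name, value in [("SST", SST), ("SSM", SSM), ("SSR", SSR),
                    ("SSE", SSE), ("SSRm", SSRm), ("SSTm", SSTm)]:
    print(f"{name:5s} = {value:.1f}")
```

In each of the three layouts the component sums of squares add up to the total in that layout, for example $SSM + SSR_m + SSE = SST$.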

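g. Coefficient of Determination $R^2$

The estimated expected values of $y$ are $\hat{y}$. The coefficient of determination is the square of the product-moment correlation between the observations $y$ and $\hat{y}$, so

$$R^2 = \frac{\left[\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2 \; \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}$$

Note:

$$X'XGX' = X'$$

and because $1'$ is the first row of $X'$, $1'XGX' = 1'$, so $\bar{\hat{y}} = \bar{y}$ and thus

$$R^2 = \frac{(SSR_m)^2}{SST_m \, SSR_m} = \frac{SSR_m}{SST_m}$$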
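The equality of the two expressions for $R^2$ can be checked on the example. The sketch below computes the squared correlation with `np.corrcoef` and compares it with $SSR_m / SST_m$; the generalized inverse `G` is again an illustrative choice.

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)
N = len(y)

G = np.diag([0.0, 1 / 3, 1 / 2, 1.0])   # one generalized inverse of X'X
y_hat = X @ G @ X.T @ y

# Squared product-moment correlation between y and y_hat.
r = np.corrcoef(y, y_hat)[0, 1]
R2_corr = r ** 2

# Ratio of mean-corrected sums of squares.
SSM = N * y.mean() ** 2
SSRm = y_hat @ y - SSM                  # y'XGX'y - N*ybar^2
SSTm = y @ y - SSM
R2_ratio = SSRm / SSTm

print("mean of y_hat equals mean of y:", np.isclose(y_hat.mean(), y.mean()))
print(R2_corr, R2_ratio)
```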
