lecture 11
Models Not of Full Rank
Estimation/Hypothesis Testing
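The model is
\[ Y_{N\times 1} = X_{N\times p}\, b_{p\times 1} + e_{N\times 1}, \]
with $e = y - E(y)$, so that $E(e) = 0$ and $E(Y) = Xb$, and $\mathrm{var}(e) = E(ee') = \sigma^2 I_N$.
This results in $e \sim (0, \sigma^2 I)$ and $Y \sim (Xb, \sigma^2 I)$.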
Normal equations:
Using least squares we obtain
\[ X'X\hat{b} = X'Y \]
Consider a completely randomized design
\[ y_{ij} = \mu + \alpha_i + e_{ij} \quad \text{for } i = 1, 2, 3. \]
Then $b' = [\mu \;\; \alpha_1 \;\; \alpha_2 \;\; \alpha_3]$ and the data are represented as
observation   $\mu$   $\alpha_1$   $\alpha_2$   $\alpha_3$
y11           1       1            0            0
y12           1       1            0            0
y13           1       1            0            0
y21           1       0            1            0
y22           1       0            1            0
y31           1       0            0            1
where the data are:
          normal   off-type   aberrant
          101      84         32
          105      88
           94
totals    300      172        32
Every $y_{ij}$ contains $\mu$, so the first column of $X$ is all ones. Every $y_{ij}$ also contains exactly one $\alpha_i$, so the last three columns sum to a column of ones. The sum of the last three columns is therefore the first column, and hence $X$ is not of full column rank.
$X'X$ is square and symmetric; its elements are the inner products of the columns of $X$ with each other. Because $X$ is not of full column rank, $X'X$ is not of full rank either.
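The rank argument can be checked numerically. The following is a minimal sketch in plain Python; the variable names (`X`, `c`, `Xc`) are illustrative and not part of the lecture. It confirms that the first column is the sum of the last three, which is the same as saying that a nonzero vector $c$ exists with $Xc = 0$.

```python
# Design matrix of the completely randomized design:
# columns are mu, alpha_1, alpha_2, alpha_3; rows are y11, y12, y13, y21, y22, y31.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

# In every row the first entry equals the sum of the last three entries.
first_equals_sum = all(row[0] == row[1] + row[2] + row[3] for row in X)

# Equivalently, the nonzero vector c = (1, -1, -1, -1)' gives Xc = 0,
# so the columns of X are linearly dependent.
c = [1, -1, -1, -1]
Xc = [sum(x * ci for x, ci in zip(row, c)) for row in X]

print(first_equals_sum)  # True
print(Xc)                # [0, 0, 0, 0, 0, 0]
```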
\[ X'X = \begin{bmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \]
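NOTE: the elements of $X'X$ are the numbers of times each parameter of the model occurs in a total, i.e.
$\mu$ occurs 6 times in $y_{..}$, $\alpha_1$ occurs 3 times in $y_{..}$, $\alpha_2$ occurs 2 times in $y_{..}$, and $\alpha_3$ occurs once in $y_{..}$;
$\mu$ occurs 3 times in $y_{1.}$, $\alpha_1$ occurs 3 times in $y_{1.}$, and $\alpha_2$ and $\alpha_3$ do not occur in $y_{1.}$.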
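and
\[ X'Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{31} \end{bmatrix} = \begin{bmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{bmatrix} = \begin{bmatrix} 504 \\ 300 \\ 172 \\ 32 \end{bmatrix} \]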
For example, $101 = y_{11} = \mu + \alpha_1 + e_{11}$ and $105 = y_{12} = \mu + \alpha_1 + e_{12}$.
$X'Y$ is a vector consisting of the inner products of the columns of $X$ with $Y$, and since the nonzero elements of $X$ are ones, we obtain the totals
\[ X'Y = \begin{bmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{bmatrix}. \]
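As a check on the arithmetic, $X'X$ and $X'Y$ can be computed directly from the design matrix and the six observations. This is a small sketch in plain Python; the helper names `transpose` and `matmul` are illustrative.

```python
# Design matrix (columns mu, alpha_1, alpha_2, alpha_3) and observations
# in the order y11, y12, y13, y21, y22, y31.
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
y = [101, 105, 94, 84, 88, 32]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Return the matrix product AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

Xt = transpose(X)
XtX = matmul(Xt, X)  # inner products of the columns of X with each other
XtY = [sum(x * yi for x, yi in zip(col, y)) for col in Xt]  # grand and group totals

for row in XtX:
    print(row)
print(XtY)  # [504, 300, 172, 32]
```

The rows printed for `XtX` are the counts in the matrix above, and `XtY` reproduces the grand total followed by the three group totals.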
Since $X'X$ is not of full rank, there is no unique solution to the normal equations
\[ X'Xb^0 = X'Y \]
where
\[ b^0 = \begin{bmatrix} \mu^0 \\ \alpha_1^0 \\ \alpha_2^0 \\ \alpha_3^0 \end{bmatrix} \]
and, applying a generalized inverse $G$ of $X'X$, we write
\[ GX'Xb^0 = GX'Y, \qquad b^0 = GX'Y. \]
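For the example, the normal equations are
\[ \begin{bmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu^0 \\ \alpha_1^0 \\ \alpha_2^0 \\ \alpha_3^0 \end{bmatrix} = \begin{bmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \end{bmatrix} = \begin{bmatrix} 504 \\ 300 \\ 172 \\ 32 \end{bmatrix} \]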
The normal equations can be read as $\widehat{E}(X'Y) = X'Y$: since $E(X'Y) = X'Xb$, replacing $b$ by $b^0$ on the left-hand side gives $X'Xb^0 = X'Y$.
Hence a solution is $b^0 = GX'Y$.
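To make the non-uniqueness concrete, the sketch below applies two different generalized inverses to the example's normal equations. The matrices `G1` and `G2` are illustrative choices, not taken from the lecture: `G1` inverts the diagonal block belonging to the $\alpha$'s, and `G2` inverts the leading $3 \times 3$ block of $X'X$. Exact fractions are used so that the checks are not affected by rounding.

```python
from fractions import Fraction as F

XtX = [[6, 3, 2, 1], [3, 3, 0, 0], [2, 0, 2, 0], [1, 0, 0, 1]]
XtY = [504, 300, 172, 32]

def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_generalized_inverse(A, G):
    """G is a generalized inverse of A when AGA = A."""
    return matmul(matmul(A, G), A) == A

# Two illustrative generalized inverses of X'X (assumed choices, see lead-in).
G1 = [[0, 0, 0, 0], [0, F(1, 3), 0, 0], [0, 0, F(1, 2), 0], [0, 0, 0, 1]]
G2 = [[1, -1, -1, 0], [-1, F(4, 3), 1, 0], [-1, 1, F(3, 2), 0], [0, 0, 0, 0]]

b0_1 = matvec(G1, XtY)
b0_2 = matvec(G2, XtY)

for G, b0 in ((G1, b0_1), (G2, b0_2)):
    # Both G's satisfy AGA = A and both b0's solve X'X b0 = X'Y.
    print(is_generalized_inverse(XtX, G), matvec(XtX, b0) == XtY, *b0)
# True True 0 100 86 32
# True True 32 68 54 0
```

The two solutions differ, yet in both cases $\mu^0 + \alpha_i^0$ equals the group means 100, 86 and 32.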
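Consequences of a solution: $b^0$ is a function of $Y$.
a. Expected value
\[ E(b^0) = GX'E(Y) = GX'Xb = Hb, \quad \text{where } H = GX'X \]
b. Variance
\[ \mathrm{var}(b^0) = \mathrm{var}(GX'Y) = GX'\,\mathrm{var}(Y)\,XG' = GX'(\sigma^2 I)XG' = GX'XG'\sigma^2 \]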
For the symmetric matrix $X'X$ there is an orthogonal permutation matrix $P$ such that
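\[ (XP)'(XP) = P'X'XP = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}, \]
with $A_{11}$ nonsingular and of order $\mathrm{rank}(X)$; then
\[ G = P \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix} P' \]
and $GX'XG' = G$, so that $\mathrm{var}(b^0) = G\sigma^2$.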
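For the example, one generalized inverse of this partitioned form comes from permuting $\mu$ to the last position, so that $A_{11} = \mathrm{diag}(3, 2, 1)$. The sketch below (plain Python, illustrative names, with this particular `G` as an assumed choice) computes $H = GX'X$ and checks that $GX'XG' = G$ for it.

```python
from fractions import Fraction as F

XtX = [[6, 3, 2, 1], [3, 3, 0, 0], [2, 0, 2, 0], [1, 0, 0, 1]]

# Assumed choice: invert the diagonal block diag(3, 2, 1) of the alphas
# and put zeros in the row and column belonging to mu.
G = [[0, 0, 0, 0], [0, F(1, 3), 0, 0], [0, 0, F(1, 2), 0], [0, 0, 0, 1]]

def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

H = matmul(G, XtX)                   # E(b0) = H b
GXtXGt = matmul(H, transpose(G))     # var(b0) = G X'X G' sigma^2

for row in H:
    print(*row)
# 0 0 0 0
# 1 1 0 0
# 1 0 1 0
# 1 0 0 1
print(GXtXGt == G)  # True
```

With this `G`, $E(b^0) = Hb = (0,\ \mu + \alpha_1,\ \mu + \alpha_2,\ \mu + \alpha_3)'$, so the solution estimates the group means rather than the individual parameters.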
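c. Estimating $E(y)$
\[ \widehat{E(y)} = \hat{y} = Xb^0 = XGX'y \]
Note that this vector is invariant to $G$, since $XGX'$ is invariant to the choice of $G$; hence $\hat{y}$ is always the same regardless of which solution $b^0$ is used.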
d. Residual Error Sum of Squares
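\[ \begin{aligned}
SSE &= (y - Xb^0)'(y - Xb^0) \\
    &= y'y - y'Xb^0 - (Xb^0)'y + (Xb^0)'(Xb^0) \\
    &= y'(I - XGX')(I - XGX')y \\
    &= y'(I - XGX')y \\
    &= y'y - y'XGX'y \\
    &= y'y - b^{0\prime}X'y \quad \text{(computational form)}
\end{aligned} \]
and $XGX'$ is invariant to $G$, so SSE is invariant to $G$ and hence invariant to $b^0$.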
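The invariance of $\hat{y}$ and SSE can be seen on the example data. The sketch below takes two different vectors, first confirms that each solves the normal equations, and then compares fitted values and SSE, using both the definition and the computational form. The two solution vectors and all names are illustrative.

```python
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
y = [101, 105, 94, 84, 88, 32]

# Two candidate solutions b0 (illustrative); the loop checks that each one
# really satisfies X'X b0 = X'y before using it.
solutions = [[0, 100, 86, 32], [32, 68, 54, 0]]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

Xt = [list(col) for col in zip(*X)]
XtY = matvec(Xt, y)
yty = sum(v * v for v in y)

results = []
for b0 in solutions:
    fitted = matvec(X, b0)                       # y-hat = X b0
    solves = matvec(Xt, fitted) == XtY           # X'X b0 = X'y
    sse_direct = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sse_comp = yty - sum(b * t for b, t in zip(b0, XtY))  # y'y - b0'X'y
    results.append((solves, fitted, sse_direct, sse_comp))
    print(solves, fitted, sse_direct, sse_comp)
# True [100, 100, 100, 86, 86, 32] 70 70
# True [100, 100, 100, 86, 86, 32] 70 70
```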
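e. Estimating residual error variance
With $y \sim N(Xb, \sigma^2 I)$,
\[ \begin{aligned}
E(SSE) &= E[\,y'(I - XGX')y\,] \\
       &= \mathrm{tr}[(I - XGX')\sigma^2 I] + b'X'(I - XGX')Xb \\
       &= \sigma^2\,\mathrm{rank}(I - XGX') \\
       &= [N - \mathrm{rank}(X)]\,\sigma^2
\end{aligned} \]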
Hence
\[ \hat{\sigma}^2 = \frac{SSE}{N - \mathrm{rank}(X)} \]
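For the example this estimate can be computed exactly. The sketch below assumes one particular generalized inverse `G` (an illustrative choice) and obtains $\mathrm{rank}(X)$ as the trace of $XGX'$, which is valid because $XGX'$ is idempotent with the same rank as $X$.

```python
from fractions import Fraction as F

X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
y = [101, 105, 94, 84, 88, 32]

# Assumed generalized inverse of X'X (illustrative choice).
G = [[0, 0, 0, 0], [0, F(1, 3), 0, 0], [0, 0, F(1, 2), 0], [0, 0, 0, 1]]

N = len(y)
p = len(G)

# tr(XGX') = sum over observations of x_i' G x_i; equals rank(X) here.
rank_X = sum(
    sum(xi[j] * G[j][k] * xi[k] for j in range(p) for k in range(p))
    for xi in X
)

XtY = [sum(x * yi for x, yi in zip(col, y)) for col in zip(*X)]
b0 = [sum(g * t for g, t in zip(row, XtY)) for row in G]      # b0 = G X'y
sse = sum(v * v for v in y) - sum(b * t for b, t in zip(b0, XtY))

sigma2_hat = F(sse) / (N - rank_X)
print(rank_X, sse, sigma2_hat)  # 3 70 70/3
```

So with $N = 6$ observations and $\mathrm{rank}(X) = 3$, the estimate is $70/3$, roughly 23.3.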
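f. Partitioning the SST (total sum of squares). Three forms of the partition:

Partition 1:
SST = $y'y$
SSR = $y'XGX'y = b^{0\prime}X'y$
SSE = $y'(I - XGX')y$

Partition 2:
SST = $y'y$
SSM = $N\bar{y}^2 = y'N^{-1}11'y$ (from fitting a general mean)
SSR$_m$ = $y'(XGX' - N^{-1}11')y$ = SSR $-$ SSM
SSE = $y'(I - XGX')y$

Partition 3:
SST$_m$ = $y'y - N\bar{y}^2$
SSR$_m$ = $y'(XGX' - N^{-1}11')y$
SSE = $y'(I - XGX')y$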
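The partitions can be evaluated for the example data. This sketch uses one solution `b0` of the normal equations (an illustrative choice; by the invariance results any solution gives the same sums of squares) and exact fractions for the mean.

```python
from fractions import Fraction as F

X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
y = [101, 105, 94, 84, 88, 32]
b0 = [0, 100, 86, 32]  # one solution of the normal equations for these data

N = len(y)
XtY = [sum(x * yi for x, yi in zip(col, y)) for col in zip(*X)]
ybar = F(sum(y), N)

SST = sum(v * v for v in y)                  # y'y
SSM = N * ybar ** 2                          # N * ybar^2
SSR = sum(b * t for b, t in zip(b0, XtY))    # b0'X'y
SSE = SST - SSR
SSR_m = SSR - SSM
SST_m = SST - SSM

print(SST, SSM, SSR, SSR_m, SST_m, SSE)  # 45886 42336 45816 3480 3550 70
```

All three partitions add up: $45816 + 70 = 45886$, $42336 + 3480 + 70 = 45886$, and $3480 + 70 = 3550$.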
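g. Coefficient of determination $R^2$
The estimated expected values of $y$ are $\hat{y}$.
The coefficient of determination is the square of the product-moment correlation between the observations $y$ and $\hat{y}$, so
\[ R^2 = \frac{\left[\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2 \, \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2} \]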
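Note:
\[ X'XGX' = X' \]
and because $1'$ is the first row of $X'$, $1'XGX' = 1'$. Hence $\bar{\hat{y}} = N^{-1}1'XGX'y = N^{-1}1'y = \bar{y}$, and thus
\[ R^2 = \frac{SSR_m^2}{SST_m \, SSR_m} = \frac{SSR_m}{SST_m} \]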
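Both expressions for $R^2$ can be compared on the example data. The sketch below uses one solution `b0` of the normal equations (an illustrative choice) to form $\hat{y}$, computes the squared correlation from its definition, and compares it with the ratio of SSR$_m$ to SST$_m$.

```python
from fractions import Fraction as F

X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
y = [101, 105, 94, 84, 88, 32]
b0 = [0, 100, 86, 32]  # one solution of the normal equations for these data

N = len(y)
fitted = [sum(x * b for x, b in zip(row, b0)) for row in X]   # y-hat = X b0
ybar = F(sum(y), N)
fbar = F(sum(fitted), N)   # mean of the fitted values, equal to ybar

# Squared product-moment correlation between y and y-hat.
sxy = sum((yi - ybar) * (fi - fbar) for yi, fi in zip(y, fitted))
syy = sum((yi - ybar) ** 2 for yi in y)
sff = sum((fi - fbar) ** 2 for fi in fitted)
r2_corr = sxy ** 2 / (syy * sff)

# Ratio of mean-corrected sums of squares: SSR_m / SST_m.
XtY = [sum(x * yi for x, yi in zip(col, y)) for col in zip(*X)]
SSM = N * ybar ** 2
SSR_m = sum(b * t for b, t in zip(b0, XtY)) - SSM
SST_m = sum(v * v for v in y) - SSM
r2_ss = SSR_m / SST_m

print(fbar == ybar, r2_corr, r2_ss)  # True 348/355 348/355
```

Both routes give $R^2 = 3480/3550 = 348/355$, about 0.98.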