Introduction to Simple Linear Regression: I

consider response Y and predictor X; model for simple linear regression assumes a mean function E(Y | X = x) of the form β₀ + β₁x and a variance function Var(Y | X = x) that is constant:
E(Y | X = x) = β₀ + β₁x and Var(Y | X = x) = σ²
3 parameters are β₀ (intercept), β₁ (slope) & σ² > 0
to interpret σ², define random variable e = Y − E(Y | X = x) so that Y = E(Y | X = x) + e
results in A.2.4 of ALR say that E(e) = 0 and Var(e) = σ²
σ² tells how close Y is likely to be to mean value E(Y | X = x)
[ALR 21, 292]

Introduction to Simple Linear Regression: II

let (xᵢ, yᵢ), i = 1, ..., n, denote predictor/response pairs (X, Y)
eᵢ = yᵢ − E(Y | X = xᵢ) = yᵢ − (β₀ + β₁xᵢ) is called a statistical error and represents distance between yᵢ and its mean function
will make two additional assumptions about the errors:
[1] E(eᵢ | xᵢ) = 0, implying a scatter plot of eᵢ versus xᵢ should resemble a null plot (random deviations about zero)
[2] e₁, ..., eₙ are a set of n independent random variables (RVs)
will make third additional assumption upon occasion:
[3] conditional on xᵢ's, errors eᵢ's are normally distributed
note: normally distributed is same as Gaussian distributed (preferred expression in engineering & physical sciences)
[ALR 21, 29]

Estimation of Model Parameters: I

Q: given data (xᵢ, yᵢ), i = 1, ..., n (n realizations of RVs X and Y), how should we determine parameters β₀ & β₁?
since β₀ & β₁ determine a line, question is equivalent to deciding how best to draw a line through a scatterplot of (xᵢ, yᵢ)
for n > 2, possibilities for defining "best" (lots more exist!):
- hire an expert to eyeball a line (Mosteller et al., 1981)
- find line minimizing distances between data and all possible lines, with some considerations being direction (vertical, horizontal, perpendicular) and type of difference (squared difference, absolute difference, etc.)
- look at all possible lines determined by two distinct points in scatterplot, and pick one with median slope (sounds bizarre, but later on will discuss why this might be of interest)
[ALR 22, 23]

Vertical, Horizontal & Perpendicular Least Squares

[Figure: scatterplot of yᵢ versus xᵢ illustrating the three choices of distance from a point to a line]

Estimation of Model Parameters: II

one strategy: form to-be-defined estimators β̂₀ & β̂₁ of β₀ & β₁, after which form residuals (observed errors):
êᵢ = yᵢ − (β̂₀ + β̂₁xᵢ) = yᵢ − ŷᵢ, where ŷᵢ = β̂₀ + β̂₁xᵢ is fitted value for ith case
Q: why isn't residual êᵢ in general equal to error eᵢ = yᵢ − (β₀ + β₁xᵢ)?
Q: if per chance we had β̂₀ = β₀ & β̂₁ = β₁, would fitted value ŷᵢ be equal to actual value yᵢ?
[ALR 22]

Least Squares Criterion: I

least squares scheme: estimate β₀ & β₁ such that sum of squares of resulting residuals is as small as possible
since residuals are given by êᵢ = yᵢ − (β̂₀ + β̂₁xᵢ) once β̂₀ & β̂₁ are known, consider
RSS(b₀, b₁) = Σᵢ₌₁ⁿ [yᵢ − (b₀ + b₁xᵢ)]²,
i.e., the residual sum of squares when we use b₀ for the intercept and b₁ for the slope
least squares estimators β̂₀ & β̂₁ are such that
RSS(β̂₀, β̂₁) < RSS(b₀, b₁)
when either b₀ ≠ β̂₀ or b₁ ≠ β̂₁ (or both)
[ALR 24]

Least Squares Criterion: II

Q: how do we set b₀ & b₁ to make RSS(b₀, b₁) the smallest? could try lots of different values (a grid search, a potentially exhausting task!), but can put calculus to good use here
to motivate how to find β̂₀ & β̂₁, first consider simpler mean function E(Y | X = x) = β₁x (regression through origin)
model is now Y = β₁x + e, and task is to find b₁ minimizing
RSS(b₁) = Σᵢ₌₁ⁿ [yᵢ − b₁xᵢ]² = Σᵢ₌₁ⁿ (yᵢ² − 2b₁xᵢyᵢ + b₁²xᵢ²)
Q: why is b₁ minimizing RSS(b₁) same as z̃ minimizing
f(z) = az² + bz, where a = Σᵢ₌₁ⁿ xᵢ² and b = −2 Σᵢ₌₁ⁿ xᵢyᵢ?
(hint: RSS(b₁) = Σᵢ yᵢ² + f(b₁), and the first term does not involve b₁)
[ALR 24, 47, 48]

Least Squares Criterion: III

since a = Σᵢ xᵢ² > 0, f(z) = az² + bz → ∞ as z → ±∞
since f(0) = 0, minimizer z̃ must be such that f(z̃) ≤ 0

[Figure: plot of f(z) = az² + bz versus z, with the minimizer z̃ and f(z̃) ≤ 0 marked]

Least Squares Criterion: IV

roots of polynomial az² + bz + c given by quadratic formula:
(−b ± √(b² − 4ac)) / (2a)
when c = 0, one root is 0, and nonzero root is −b/a, so minimizer (midway between the two roots) is
z̃ = −b/(2a) = Σᵢ xᵢyᵢ / Σᵢ xᵢ² = β̂₁  (*)
alternative approach to finding minimizer of RSS(b₁): differentiate with respect to b₁, set result to 0 and solve for b₁:
d RSS(b₁)/db₁ = d Σᵢ [yᵢ − b₁xᵢ]² / db₁ = −2 Σᵢ xᵢ(yᵢ − b₁xᵢ) = 0,
which yields same expression for β̂₁ as stated in (*)
[ALR 47, 48]

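As a quick aside (not from the slides), a minimal R sketch of regression through the origin, assuming numeric vectors x and y are already in the workspace: the closed-form estimate in (*) can be checked against R's lm() with the intercept suppressed.

## regression through the origin: closed form versus lm()
b1 <- sum(x * y) / sum(x^2)         # minimizer of RSS(b1), as in (*)
fit0 <- lm(y ~ x - 1)               # "- 1" drops the intercept term
all.equal(b1, unname(coef(fit0)))   # should be TRUE
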
Least Squares Criterion: V

return now to mean function E(Y | X = x) = β₀ + β₁x, for which RSS(b₀, b₁) = Σᵢ [yᵢ − b₀ − b₁xᵢ]²
calculus-based approach to get least squares estimators β̂₀ & β̂₁ follows a path similar to that for E(Y | X = x) = β₁x
leads to two equations to solve for two unknowns (β̂₀ & β̂₁)
differentiate RSS(b₀, b₁) with respect to b₀ and set result to 0:
−2 Σᵢ (yᵢ − b₀ − b₁xᵢ) = 0, giving b₀n + b₁ Σᵢ xᵢ = Σᵢ yᵢ
differentiate RSS(b₀, b₁) with respect to b₁ and set result to 0:
−2 Σᵢ xᵢ(yᵢ − b₀ − b₁xᵢ) = 0, giving b₀ Σᵢ xᵢ + b₁ Σᵢ xᵢ² = Σᵢ xᵢyᵢ
[ALR 293]

Least Squares Criterion: VI

so-called normal equations for simple linear regression are thus
b₀n + b₁ Σᵢ xᵢ = Σᵢ yᵢ and b₀ Σᵢ xᵢ + b₁ Σᵢ xᵢ² = Σᵢ xᵢyᵢ
using x̄ = (1/n) Σᵢ xᵢ & ȳ = (1/n) Σᵢ yᵢ, 1st normal equation gives
b₀ = ȳ − b₁x̄
replace b₀ in 2nd normal equation with right-hand side of above:
(ȳ − b₁x̄) Σᵢ xᵢ + b₁ Σᵢ xᵢ² = Σᵢ xᵢyᵢ
after a bit of algebra, get
b₁ = (Σᵢ xᵢyᵢ − ȳ Σᵢ xᵢ) / (Σᵢ xᵢ² − x̄ Σᵢ xᵢ) = (Σᵢ xᵢyᵢ − nx̄ȳ) / (Σᵢ xᵢ² − nx̄²) = β̂₁,
and hence β̂₀ = ȳ − β̂₁x̄
[ALR 293, 294]

Sum of Cross Products and Sum of Squares

define sum of cross products and sum of squares for x's:
SXY = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) & SXX = Σᵢ (xᵢ − x̄)²  (*)
Problem 1: show that
Σᵢ (xᵢ − x̄)(yᵢ − ȳ) = Σᵢ xᵢyᵢ − nx̄ȳ & Σᵢ (xᵢ − x̄)² = Σᵢ xᵢ² − nx̄²
can thus write
β̂₁ = (Σᵢ xᵢyᵢ − nx̄ȳ) / (Σᵢ xᵢ² − nx̄²) = SXY/SXX
note: should avoid Σᵢ xᵢyᵢ − nx̄ȳ & Σᵢ xᵢ² − nx̄² when actually computing β̂₁; use SXY and SXX from (*) instead
[ALR 294, 23]

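The Problem 1 identity and the numerical point behind the note are easy to check in R (my sketch, assuming numeric vectors x and y):

## Problem 1 identity, checked numerically
n <- length(x); xbar <- mean(x); ybar <- mean(y)
SXY <- sum((x - xbar) * (y - ybar))   # centered form, numerically stable
SXX <- sum((x - xbar)^2)
all.equal(SXY, sum(x * y) - n * xbar * ybar)  # equal in exact arithmetic
all.equal(SXX, sum(x^2) - n * xbar^2)

When the xᵢ's are large relative to their spread, the uncentered forms subtract two nearly equal large numbers and can lose precision, which is why the centered forms in (*) are preferred for computation.
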
Sufficient Statistics

since β̂₁ = SXY/SXX and β̂₀ = ȳ − β̂₁x̄, need only know x̄, ȳ, SXY & SXX to form β̂₀ & β̂₁
since
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ, ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ, SXY = Σᵢ₌₁ⁿ xᵢyᵢ − nx̄ȳ, SXX = Σᵢ₌₁ⁿ xᵢ² − nx̄²,
follows that β̂₀ & β̂₁ depend only on four sufficient statistics:
Σᵢ₌₁ⁿ xᵢ, Σᵢ₌₁ⁿ yᵢ, Σᵢ₌₁ⁿ xᵢyᵢ and Σᵢ₌₁ⁿ xᵢ²
in theory, can dispense with 2n values (xᵢ, yᵢ), i = 1, ..., n, and just keep 4 sufficient statistics as far as β̂₀ & β̂₁ are concerned
[ALR 294, 23]

Atmospheric Pressure & Boiling Point of Water

as 1st example, reconsider Forbes's recordings of atmospheric pressure and boiling point of water, which physics suggests are related by
log(pressure) = β₀ + β₁ × boiling point
taking response Y to be log₁₀(pressure) and predictor X to be boiling point, will estimate β₀ & β₁ for model
Y = β₀ + β₁X + e
via least squares based upon data (xᵢ, yᵢ), i = 1, ..., 17
taking log to mean log base 10, computations in R yield
x̄ = 202.9529, ȳ = 1.396041, SXY = 4.753781 & SXX = 530.7824,
from which we get
β̂₁ = SXY/SXX = 0.008956178 and β̂₀ = ȳ − β̂₁x̄ = −0.4216418
[ALR 25, 26]

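A minimal R sketch of these computations, assuming the Forbes data are available, e.g. as the forbes data frame in the alr4 package (columns bp and pres are the package's names; everything else below is illustrative):

## least squares for the Forbes data
x <- alr4::forbes$bp             # boiling point
y <- log10(alr4::forbes$pres)    # log10(pressure)
xbar <- mean(x); ybar <- mean(y)
SXY <- sum((x - xbar) * (y - ybar))
SXX <- sum((x - xbar)^2)
beta1hat <- SXY / SXX               # 0.008956178
beta0hat <- ybar - beta1hat * xbar  # -0.4216418

The vectors x and y defined here are reused in the R sketches that follow.
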
Predicting the Weather

as 2nd example, reconsider n = 93 years of measured early/late season snowfalls from Fort Collins, Colorado
taking response Y and predictor X to be late season (Jan–June) and early season (Sept–Dec) snowfalls, entertain model
Y = β₀ + β₁X + e
computations in R yield
x̄ = 16.74409, ȳ = 32.04301, SXY = 2229.014 & SXX = 10954.07,
from which we get
β̂₁ = SXY/SXX = 0.2034873 and β̂₀ = ȳ − β̂₁x̄ = 28.6358
[ALR 8, 9]

Scatterplot of Late Snowfall Versus Early Snowfall

[Figure: late snowfall (inches) versus early snowfall (inches) for the Fort Collins data; ALR 8]

Sample Variances, Covariance and Correlation

define sample variance and sample standard deviation of x's:
SDx² = Σᵢ (xᵢ − x̄)² / (n − 1) = SXX/(n − 1) & SDx = √(SXX/(n − 1))
note: sometimes n is used in place of n − 1 in defining SDx²
after defining SYY = Σᵢ (yᵢ − ȳ)² (sum of squares for y's), define similar quantities for y's:
SDy² = Σᵢ (yᵢ − ȳ)² / (n − 1) = SYY/(n − 1) & SDy = √(SYY/(n − 1))
finally define sample covariance and then sample correlation:
sxy = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1) = SXY/(n − 1) & rxy = sxy/(SDx SDy)
[ALR 23]

Alternative Expression for Slope Estimator

Problem 2: alternative expression for β̂₁ = SXY/SXX is
β̂₁ = rxy SDy/SDx
Problem 3: show that −1 ≤ rxy ≤ 1
note that, if xᵢ's & yᵢ's are such that SDy = SDx, then estimated slope is same as sample correlation, as following set of plots illustrates
[ALR 24]

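A one-line check of the Problem 2 expression in R (my sketch, again assuming vectors x and y):

## slope as rescaled correlation
fit <- lm(y ~ x)
b1  <- cor(x, y) * sd(y) / sd(x)       # Problem 2 expression
all.equal(unname(coef(fit)["x"]), b1)  # should be TRUE
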
Sample Correlation rxy = 0.999

[Figure: scatterplot illustrating a sample correlation of rxy = 0.999]

Estimating σ²: I

simple linear regression model has 3 parameters: β₀ (intercept), β₁ (slope) & σ² (variance of errors)
with β₀ & β₁ estimated by β̂₀ & β̂₁, will base estimator for σ² on variance of residuals (observed errors)
recall definition of residuals: êᵢ = yᵢ − (β̂₀ + β̂₁xᵢ)
in view of, e.g., SDx² = Σᵢ (xᵢ − x̄)² / (n − 1), obvious estimator of σ² would appear to be Σᵢ (êᵢ − ē)² / (n − 1), where ē = (1/n) Σᵢ₌₁ⁿ êᵢ is the average of the residuals
Problem 4: show that ē = 0 always for simple linear regression
[ALR 26]

Estimating σ²: II

obvious estimator thus simplifies to Σᵢ êᵢ² / (n − 1)
taking RSS to be shorthand for RSS(β̂₀, β̂₁), we have
RSS = Σᵢ₌₁ⁿ [yᵢ − (β̂₀ + β̂₁xᵢ)]² = Σᵢ₌₁ⁿ êᵢ²,
so the obvious estimator of σ² is RSS/(n − 1)
can show (e.g., Seber, 1977, p. 51) that unbiased estimator σ̂² of σ², i.e., E(σ̂²) = σ², is
σ̂² = RSS/(n − 2),
where n − 2 = sample size − # of parameters in mean function; obvious estimator RSS/(n − 1) divides by too large a quantity and hence is biased towards zero
[ALR 26, 27, 306]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point for the Forbes data; ALR 6]

Residual Plot for Forbes Data

[Figure: residuals versus boiling point for the Forbes data; ALR 6]

Estimating σ²: IV

for the Forbes data, computations in R yield
SYY = 0.04279135, SXY = 4.753781 & SXX = 530.7824,
from which we get
RSS = SYY − (SXY)²/SXX = 0.0002156426
since n = 17 for the Forbes data,
σ̂² = RSS/(n − 2) = RSS/15 = 0.00001437617
standard error of regression is
σ̂ = √0.00001437617 = 0.003791592 (also called residual standard error)
red dashed horizontal lines on residual plot show ±σ̂
[ALR 26]

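The same arithmetic in R (my sketch, reusing the Forbes vectors x and y):

## error variance estimate for the Forbes fit
fit <- lm(y ~ x)
RSS <- sum(resid(fit)^2)        # 0.0002156426
s2  <- RSS / (length(y) - 2)    # sigma-hat squared, 0.00001437617
s   <- sqrt(s2)                 # 0.003791592
all.equal(s, sigma(fit))        # sigma() reports the residual standard error
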
Estimating σ²: V

for the Fort Collins data, computations in R yield
SYY = 17572.41, SXY = 2229.014 & SXX = 10954.07,
from which we get
RSS = SYY − (SXY)²/SXX = 17118.83
since n = 93 for the Fort Collins data,
σ̂² = RSS/(n − 2) = RSS/91 = 188.119
standard error of regression is
σ̂ = √188.119 = 13.71565
[ALR 26]

Scatterplot of Late Snowfall Versus Early Snowfall

[Figure: late snowfall (inches) versus early snowfall (inches) for the Fort Collins data; ALR 8]

Residual Plot for Fort Collins Data

[Figure: residuals versus early snowfall (inches) for the Fort Collins data]

Matrix Formulation of Simple Linear Regression: I

matrix theory offers an alternative formulation for simple linear regression, with the advantage that it generalizes readily to handle multiple linear regression
start by defining an n-dimensional column vector Y containing the yᵢ's; an n × 2 matrix X whose 1st column consists of just 1's and whose 2nd has the xᵢ's (so ith row of X is (1, xᵢ)); a 2-dimensional vector β containing β₀ and β₁; and an n-dimensional vector e containing the eᵢ's:
Y = (y₁, y₂, ..., yₙ)′, β = (β₀, β₁)′ and e = (e₁, e₂, ..., eₙ)′, with ′ denoting transpose
matrix version of simple linear regression model is Y = Xβ + e
[ALR 63, 64, 60]

Matrix Formulation of Simple Linear Regression: II

since ith row of X is (1, xᵢ) and β = (β₀, β₁)′, it follows that Xβ is the n-vector whose ith element is β₀ + β₁xᵢ
hence ith row of matrix equation Y = Xβ + e says
yᵢ = β₀ + β₁xᵢ + eᵢ,
which is consistent with model Y = β₀ + β₁X + e stated earlier
let e′ and X′ denote the transposes of e and X; i.e., e′ is an n-dimensional row vector (e₁, e₂, ..., eₙ), while X′ is a 2 × n matrix whose 1st row consists of 1's and whose 2nd row is (x₁, x₂, ..., xₙ)
[ALR 299, 300, 301]

Matrix Formulation of Simple Linear Regression: III

since e = Y − Xβ and since e′e = Σᵢ eᵢ², can express sum of squares of errors as
e′e = (Y − Xβ)′(Y − Xβ)
if we entertain b = (b₀, b₁)′ rather than unknown β = (β₀, β₁)′, corresponding residuals are given by Y − Xb, so residual sum of squares can be written as
RSS(b) = (Y − Xb)′(Y − Xb)
       = (Y′ − b′X′)(Y − Xb)
       = Y′Y − Y′Xb − b′X′Y + b′X′Xb
       = Y′Y − 2Y′Xb + b′X′Xb,
where we make use of 2 facts: (1) transpose of a product is product of transposes in reverse order & (2) transpose of a scalar is itself (hence b′X′Y = (b′X′Y)′ = Y′Xb)
[ALR 61, 62, 300, 301, 304]

Taking Derivatives with Respect to Vector b: I

suppose f(b) is a scalar-valued function of vector b (elements are b₁, b₂, ..., b_q)
two examples, for which a is a vector (ith element is aᵢ) & A is a q × q matrix (element in ith row & jth column is Aᵢⱼ):
f₁(b) = a′b = Σᵢ aᵢbᵢ and f₂(b) = b′Ab = Σᵢ Σⱼ bᵢAᵢⱼbⱼ, with sums running from 1 to q
define df(b)/db to be the q-dimensional column vector whose elements are df(b)/db₁, df(b)/db₂, ..., df(b)/db_q
[ALR 301, 304]

Taking Derivatives with Respect to Vector b: II

can show (see, e.g., Rao, 1973, p. 71) that
f₁(b) = a′b has derivative df₁(b)/db = a
and, for symmetric A (as holds for X′X below),
f₂(b) = b′Ab has derivative df₂(b)/db = 2Ab
(not hard to show; do it for fun and games!)
Q: what is the derivative of f₃(b) = b′a?
Q: what is the derivative of f₄(b) = b′b = Σᵢ bᵢ²?
Q: what is the derivative of f₅(b) = c′Cb = Σᵢ Σⱼ cᵢCᵢⱼbⱼ (sums over i = 1, ..., p and j = 1, ..., q), where c is a p-dimensional vector and C is a p × q matrix?

Matrix Formulation of Simple Linear Regression: IV

returning to RSS(b) = Y′Y − 2Y′Xb + b′X′Xb, taking the derivative of f(b) = RSS(b) with respect to b and setting the resulting expression to 0 (a vector of zeros) yields the matrix version of the normal equations:
X′Xb = X′Y,
where we have made use of the facts
d(Y′Y)/db = 0, d(Y′Xb)/db = X′Y and d(b′X′Xb)/db = 2X′Xb
least squares estimator β̂ of β is solution to normal equations:
X′Xβ̂ = X′Y
[ALR 304]

Matrix Formulation of Simple Linear Regression: V

let's verify that solution to X′Xβ̂ = X′Y yields same estimators β̂₁ & β̂₀ as before, namely, SXY/SXX & ȳ − β̂₁x̄
now, writing 2 × 2 matrices row by row as [a b; c d],
X′X = [n Σᵢxᵢ; Σᵢxᵢ Σᵢxᵢ²] = [n nx̄; nx̄ Σᵢxᵢ²]
and
X′Y = (Σᵢyᵢ, Σᵢxᵢyᵢ)′ = (nȳ, Σᵢxᵢyᵢ)′
Problem 6: finish the verification!
[ALR 63, 64]

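A small R sketch of the matrix solution (my illustration, with the vectors x and y as before): build X, solve the normal equations, and compare with lm().

## normal equations in matrix form
X    <- cbind(1, x)                           # n x 2 design matrix
bhat <- solve(crossprod(X), crossprod(X, y))  # solves (X'X) b = X'y
cbind(bhat, coef(lm(y ~ x)))                  # two columns should agree

Using solve(crossprod(X), crossprod(X, y)) rather than forming an explicit inverse mirrors the normal equations directly; lm() itself uses a more numerically stable QR decomposition.
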
Properties of Least Squares Estimators: I

since E(Y | X = x) = β₀ + β₁x, fitted mean function is
Ê(Y | X = x) = β̂₀ + β̂₁x,  (*)
which is a line with intercept β̂₀ and slope β̂₁
recalling that β̂₀ = ȳ − β̂₁x̄, start from the right-hand side of (*) with x set to x̄ to get
β̂₀ + β̂₁x̄ = ȳ − β̂₁x̄ + β̂₁x̄ = ȳ,
which says point (x̄, ȳ) must lie on fitted mean function
vertical dashed line on following plots indicates the value of x̄, while horizontal dashed line, the value of ȳ
[ALR 27, 28]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point for the Forbes data, with dashed lines at x̄ and ȳ; ALR 6]

Scatterplot of Late Snowfall Versus Early Snowfall

[Figure: late snowfall versus early snowfall for the Fort Collins data, with dashed lines at x̄ and ȳ; ALR 8]

Properties of Least Squares Estimators: II

both β̂₀ and β̂₁ can be written as a linear combination of responses y₁, y₂, ..., yₙ
since β̂₁ = SXY/SXX and since SXY = Σᵢ (xᵢ − x̄)yᵢ (see Problem 1), we have
β̂₁ = Σᵢ (xᵢ − x̄)yᵢ / SXX = Σᵢ₌₁ⁿ cᵢyᵢ, where cᵢ = (xᵢ − x̄)/SXX
Q: which yᵢ's will have the most/least influence on β̂₁?
let's look at cᵢ plotted versus xᵢ for Forbes data
[ALR 27]

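The weights are easy to compute and inspect in R (my sketch, reusing the Forbes vectors x and y):

## weights c_i expressing beta1-hat as a linear combination of the y_i
ci <- (x - mean(x)) / sum((x - mean(x))^2)
all.equal(sum(ci * y), unname(coef(lm(y ~ x))["x"]))  # should be TRUE
plot(x, ci, xlab = "Boiling point", ylab = "c_i")     # |c_i| largest at extreme x
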
Weights Versus Boiling Point for Forbes Data

[Figure: weights cᵢ versus boiling point for the Forbes data]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point for the Forbes data; ALR 6]

Random Vectors and Their Properties: I

column vector U is said to be a random vector if each of its elements Uᵢ is an RV (random variable)
expected value of random vector, denoted by E(U), is a vector whose ith element is expected value of ith RV Uᵢ in U
for example, if U = (U₁, U₂, U₃)′, then E(U) = (E(U₁), E(U₂), E(U₃))′
if U has dimension q, if a is a p-dimensional column vector of constants and if A is a p × q dimensional matrix of constants, can show (fairly easily; give it a try!) that
E(a + AU) = a + A E(U)
[ALR 303]

Random Vectors and Their Properties: II

recall that, if Uᵢ and Uⱼ are two RVs, their covariance is defined to be
Cov(Uᵢ, Uⱼ) = E([Uᵢ − E(Uᵢ)][Uⱼ − E(Uⱼ)])
note that Cov(Uⱼ, Uᵢ) = Cov(Uᵢ, Uⱼ) and that
Cov(Uᵢ, Uᵢ) = E([Uᵢ − E(Uᵢ)][Uᵢ − E(Uᵢ)]) = E([Uᵢ − E(Uᵢ)]²) = Var(Uᵢ)
by definition, the covariance matrix for q-dimensional random vector U, to be denoted by Var(U), is q × q matrix whose (i, j)th element is Cov(Uᵢ, Uⱼ)
for example, if U = (U₁, U₂, U₃)′, then, row by row,
Var(U) = [Var(U₁) Cov(U₁, U₂) Cov(U₁, U₃); Cov(U₂, U₁) Var(U₂) Cov(U₂, U₃); Cov(U₃, U₁) Cov(U₃, U₂) Var(U₃)]
[ALR 291, 292, 303]

Random Vectors and Their Properties: III

if U has dimension q, if a is a p-dimensional column vector of constants and if A is a p × q dimensional matrix of constants, can show (a bit more challenging, but still worth a try!) that
Var(a + AU) = A Var(U) A′
RVs in U are uncorrelated if Cov(Uᵢ, Uⱼ) = 0 when i ≠ j
if each Uᵢ in U has the same variance (σ², say) and if Uᵢ's are uncorrelated, then Var(U) = σ²I, where I is the q × q identity matrix (1's along its diagonal and 0's elsewhere)
for this special case,
Var(a + AU) = A(σ²I)A′ = σ²AA′
[ALR 304, 292]

Properties of Least Squares Estimators: III

recall that least squares estimator β̂ solves normal equations:
X′Xβ̂ = X′Y  (*)
Problem 6: in the case of simple linear regression, X′X is an invertible matrix and thus has an inverse (X′X)⁻¹ such that
(X′X)⁻¹X′X = X′X(X′X)⁻¹ = I
premultiplication of both sides of (*) by (X′X)⁻¹ yields
(X′X)⁻¹X′Xβ̂ = (X′X)⁻¹X′Y,
from which we get Iβ̂ = (X′X)⁻¹X′Y and hence β̂ = (X′X)⁻¹X′Y
above succinctly expresses the fact that β̂₀ & β̂₁ (the elements of β̂) are linear combinations of yᵢ's (the elements of Y)
[ALR 61, 64, 304]

Properties of Least Squares Estimators: IV

considering β̂ = (X′X)⁻¹X′Y and taking conditional expectation of both sides yields
E(β̂ | X) = E((X′X)⁻¹X′Y | X)
         = (X′X)⁻¹X′ E(Y | X)
         = (X′X)⁻¹X′ E(Xβ + e | X)
         = (X′X)⁻¹X′ (Xβ + E(e | X))
         = (X′X)⁻¹X′Xβ = β
Q: what's the justification for each step above?
E(β̂ | X) = β holds for all X and hence E(β̂) = β unconditionally, from which we can conclude that β̂₀ & β̂₁ are unbiased estimators of β₀ & β₁: E(β̂₀) = β₀ and E(β̂₁) = β₁
[ALR 305]

Properties of Least Squares Estimators: V

since (i) Var(a + AU) = A Var(U) A′, (ii) (AB)′ = B′A′ and (iii) (A⁻¹)′ = (A′)⁻¹ for a square matrix, we have
Var(β̂ | X) = Var((X′X)⁻¹X′Y | X)
           = (X′X)⁻¹X′ Var(Y | X) ((X′X)⁻¹X′)′
           = (X′X)⁻¹X′ Var(Xβ + e | X) X(X′X)⁻¹
           = (X′X)⁻¹X′ Var(e | X) X(X′X)⁻¹
           = (X′X)⁻¹X′ (σ²I) X(X′X)⁻¹
           = σ²(X′X)⁻¹X′X(X′X)⁻¹
           = σ²(X′X)⁻¹
Q: justification for each step above?
[ALR 305]

Properties of Least Squares Estimators: VI

can readily verify that, writing 2 × 2 matrices row by row,
[a b; c d]⁻¹ = (1/(ad − cb)) [d −b; −c a]
since X′X = [n nx̄; nx̄ Σᵢxᵢ²], get
(X′X)⁻¹ = (1/(n Σᵢxᵢ² − n²x̄²)) [Σᵢxᵢ² −nx̄; −nx̄ n]
since
Var(β̂ | X) = [Var(β̂₀ | X) Cov(β̂₀, β̂₁ | X); Cov(β̂₀, β̂₁ | X) Var(β̂₁ | X)] = σ²(X′X)⁻¹,
we find that, e.g., Var(β̂₁ | X) = σ²/(Σᵢxᵢ² − nx̄²) = σ²/SXX
by making use of Σᵢxᵢ² − nx̄² = SXX (see Problem 1)
[ALR 64, 305, 28]

Properties of Least Squares Estimators: VII

Q: what happens to Var(β̂₁ | X) = σ²/SXX if we have the luxury of making the sample size n as large as we want?
in practice, σ² is usually unknown and must be estimated via σ̂², leading to the following estimator for Var(β̂₁ | X):
V̂ar(β̂₁ | X) = σ̂²/SXX
term standard error is sometimes (but not always) used to refer to the square root of an estimated variance
standard error of β̂₁, denoted by se(β̂₁), is thus σ̂/√SXX
[ALR 29]

Confidence Intervals and Tests for Slope: I

assuming errors eᵢ in simple linear regression to be normally distributed, parameter estimator β̂₁ for slope β₁ is also normally distributed (same holds for β̂₀ also)
further assuming errors eᵢ to have mean 0 and unknown variance σ², distribution of β̂₁ also depends upon unknown σ²
with σ² estimated by σ̂², confidence intervals (CIs) and tests concerning unknown true β₁ need to be based on t-distribution with degrees of freedom in sync with divisor used to form σ̂²
let T be a random variable with a t-distribution with d degrees of freedom, and let t(α/2, d) be percentage point such that
Pr(T ≥ t(α/2, d)) = α/2
[ALR 30, 31]

Confidence Intervals and Tests for Slope: II

plot below shows probability density function (PDF) for t-distribution with d = 15 degrees of freedom, with t(0.05, 15) = 1.753 marked by vertical dashed line (thus area under PDF to right of line is 0.05, and area to left is 0.95)

[Figure: PDF of the t-distribution with 15 degrees of freedom, with t(0.05, 15) = 1.753 marked]

Confidence Intervals and Tests for Slope: III

(1 − α) × 100% CI for slope β₁ is set of points β₁ in interval
β̂₁ − t(α/2, n − 2) se(β̂₁) ≤ β₁ ≤ β̂₁ + t(α/2, n − 2) se(β̂₁)
example: for Forbes data (n = 17), β̂₁ = 0.008956 and se(β̂₁) = σ̂/√SXX = 0.0001646 since σ̂ = 0.003792 and √SXX = 23.04, so 90% CI for β₁ is
0.008956 − 1.753 × 0.0001646 ≤ β₁ ≤ 0.008956 + 1.753 × 0.0001646
because t(0.05, 15) = 1.753, yielding
0.008668 ≤ β₁ ≤ 0.009245
[ALR 31]

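The same interval in R (my sketch, continuing with the Forbes fit): qt() supplies the t percentage point, and confint() checks the hand computation.

## 90% confidence interval for the slope
fit  <- lm(y ~ x)
se1  <- sigma(fit) / sqrt(sum((x - mean(x))^2))
tval <- qt(0.95, df = 15)                 # = t(0.05, 15) = 1.753
coef(fit)["x"] + c(-1, 1) * tval * se1    # hand-rolled interval
confint(fit, "x", level = 0.90)           # same interval from R
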
Confidence Intervals and Tests for Slope: IV

can test null hypothesis β₁ = β₁* versus alternative hypothesis β₁ ≠ β₁* by computing t-statistic
t = (β̂₁ − β₁*)/se(β̂₁)
and comparing it to percentage points for t-distribution with n − 2 degrees of freedom
example: for Fort Collins data (n = 93), β̂₁ = 0.2035 and se(β̂₁) = σ̂/√SXX = 0.1310 since σ̂ = 13.72 and √SXX = 104.7, so t-statistic for test of zero slope (β₁* = 0) is
t = (0.2035 − 0)/0.1310 = 1.553
[ALR 31]

Confidence Intervals and Tests for Slope: V

letting G(x) denote cumulative probability distribution function for random variable T with t(91) distribution, i.e., G(x) = Pr(T ≤ x), p-value associated with t is 2(1 − G(|t|))
p-value is 0.1239, which is not small by common standards (e.g., 0.05 or 0.01), so not much support for rejecting null hypothesis
[ALR 31]

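In R (my sketch): pt() is the t cumulative distribution function, so the two-sided p-value is a one-liner; summary() of the fitted model reports the same t-statistic and p-value for the slope.

## two-sided p-value for t = 1.553 with 91 degrees of freedom
t_stat <- 1.553
2 * (1 - pt(abs(t_stat), df = 91))   # about 0.1239
## summary(lm(y ~ x)) on the Fort Collins data reports the same test
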
Prediction: I

suppose now we want to predict a yet-to-be-observed response y* given a setting x* for the predictor
if assumed-to-be-true linear regression model were known perfectly, prediction would be ỹ* = β₀ + β₁x*, whereas model says
y* = β₀ + β₁x* + e* = ỹ* + e*
prediction error would be ỹ* − y* = −e*, which has variance Var(e* | x*) = σ²
in general we must be satisfied using estimators β̂₀ & β̂₁ in lieu of true values β₀ & β₁, which intuitively should lead to predictions that are not as good, resulting in a prediction error with a variance inflated above σ²
[ALR 32, 33]

Prediction: II

using fitted mean function Ê(Y | X = x) = β̂₀ + β̂₁x to predict response y* for given x*, prediction is now
ŷ* = β̂₀ + β̂₁x*,
and prediction error becomes
y* − ŷ* = β₀ + β₁x* + e* − (β̂₀ + β̂₁x*)
recall that, if U & V are uncorrelated RVs, then Var(U − V) = Var(U) + Var(V) (see Equation (A.2), p. 291, of Weisberg)
assuming that e* is uncorrelated with RVs involved in formation of β̂₀ & β̂₁, can regard U = β₀ + β₁x* + e* and V = β̂₀ + β̂₁x* as uncorrelated RVs when conditioned on x* and x₁, ..., xₙ
letting x₊ be shorthand for x*, x₁, ..., xₙ, we can write
Var(y* − ŷ* | x₊) = Var(β₀ + β₁x* + e* | x₊) + Var(β̂₀ + β̂₁x* | x₊)
[ALR 32, 33, 291]

Prediction: III

study pieces Var(β₀ + β₁x* + e* | x₊) & Var(β̂₀ + β̂₁x* | x₊) one at a time
using fact that Var(c + U) = Var(U) for a constant c, we have
Var(β₀ + β₁x* + e* | x₊) = Var(e* | x₊) = σ²
recall that, if U & V are correlated RVs and c is a constant, then Var(U + cV) = Var(U) + c² Var(V) + 2c Cov(U, V) (see Equation (A.3), p. 292, of Weisberg)
hence
Var(β̂₀ + β̂₁x* | x₊) = Var(β̂₀ | x₊) + x*² Var(β̂₁ | x₊) + 2x* Cov(β̂₀, β̂₁ | x₊)
                     = Var(β̂₀ | x₁, ..., xₙ) + x*² Var(β̂₁ | x₁, ..., xₙ) + 2x* Cov(β̂₀, β̂₁ | x₁, ..., xₙ)
under assumption x* is independent of RVs forming β̂₀ & β̂₁
[ALR 32, 33, 292]

Prediction: IV

expressions for Var(β̂₀ | x₁, ..., xₙ), Var(β̂₁ | x₁, ..., xₙ) and Cov(β̂₀, β̂₁ | x₁, ..., xₙ) can be extracted from matrix
Var(β̂ | X) = [Var(β̂₀ | X) Cov(β̂₀, β̂₁ | X); Cov(β̂₀, β̂₁ | X) Var(β̂₁ | X)] = σ²(X′X)⁻¹
Exercise (unassigned): using elements of (X′X)⁻¹, show that
Var(β̂₀ + β̂₁x* | x₊) = σ² (1/n + (x* − x̄)²/SXX)
above represents increase in variance of prediction error due to necessity of estimating β₀ & β₁, with the actual variance being
Var(y* − ŷ* | x₊) = σ² (1 + 1/n + (x* − x̄)²/SXX)
[ALR 32, 33, 295]

Prediction: V

estimating σ² by σ̂² and taking square root lead to standard error of prediction (sepred) at x*:
sepred(ŷ* | x₊) = σ̂ [1 + 1/n + (x* − x̄)²/SXX]^(1/2)
using Forbes data as an example, suppose we want to predict log₁₀(pressure) at a hypothetical location for which boiling point of water x* is somewhere between 190 and 215
prediction for log₁₀(pressure) given boiling point x* is
ŷ* = β̂₀ + β̂₁x* = −0.4216418 + 0.008956178 x*
(1 − α) × 100% prediction interval is set of points y* in interval
ŷ* − t(α/2, n − 2) sepred(ŷ* | x₊) ≤ y* ≤ ŷ* + t(α/2, n − 2) sepred(ŷ* | x₊)
[ALR 32, 33]

Prediction: VI

here n = 17, so, for a 99% prediction interval, we set α = 0.01 and use t(0.005, 15) = 2.947
since σ̂ = 0.003792, x̄ = 203.0 and SXX = 530.8, we have
sepred(ŷ* | x₊) = 0.003792 [1 + 1/17 + (x* − 203.0)²/530.8]^(1/2)
solid red curves on following plot depict 99% prediction interval as x* sweeps from 190 to 215 (black lines show intervals assuming, unrealistically, no uncertainty in parameter estimates)
for x* = 200, prediction is ŷ* = 1.370, and 99% prediction interval is specified by 1.358 ≤ y* ≤ 1.381
in original space, prediction is 10^ŷ* = 23.42, and interval is
10^1.358 ≤ 10^y* ≤ 10^1.381, i.e., 22.80 ≤ 10^y* ≤ 24.05
[ALR 32, 33]

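R's predict() produces these intervals directly (my sketch, continuing with the Forbes fit):

## 99% prediction interval at boiling point x* = 200
fit <- lm(y ~ x)
predict(fit, newdata = data.frame(x = 200),
        interval = "prediction", level = 0.99)
## fit = 1.370, lwr = 1.358, upr = 1.381; raising 10 to these powers
## recovers the interval for pressure itself
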
Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point with 99% prediction bands for boiling points from 190 to 215]

Scatterplot of Pressure Versus Boiling Point

[Figure: pressure versus boiling point, with the intervals mapped back to the original scale]

Coefficient of Determination R²: I

ignoring potential predictors, best prediction of response is sample average ȳ of observed responses y₁, y₂, ..., yₙ
for Fort Collins data, total sum of squares SYY = Σᵢ (yᵢ − ȳ)² is sum of squares of deviations of data from horizontal dashed line on next plot
with inclusion of predictors, unexplained variation is RSS
for Fort Collins data, RSS is sum of squares of deviations from solid line on next plot
[ALR 35, 36]

Scatterplot of Late Snowfall Versus Early Snowfall

[Figure: Fort Collins snowfall data with fitted line (solid) and horizontal line at ȳ (dashed); ALR 8]

Coefficient of Determination R²: II

difference between SYY and RSS is called sum of squares due to regression:
SSreg = SYY − RSS
Problem 5 says that
RSS = SYY − (SXY)²/SXX
hence
SSreg = SYY − [SYY − (SXY)²/SXX] = (SXY)²/SXX
divide SSreg by SYY to get definition for coefficient of determination:
R² = SSreg/SYY = (SXY)²/(SXX × SYY) = 1 − RSS/SYY
[ALR 35, 36]

Coefficient of Determination R²: III

Exercise (unassigned): show that R² = rxy² (squared sample correlation)
must have 0 ≤ R² ≤ 1
R² × 100 gives percentage of total sum of squares explained by regression (concept of R² generalizes to multiple regression)
examples: R² = 0.026 for Fort Collins & R² = 0.995 for Forbes
[ALR 35, 36]

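A quick R check of the three equivalent expressions (my sketch, for either data set, given vectors x and y):

## R^2 three ways
fit <- lm(y ~ x)
summary(fit)$r.squared                        # as reported by R
cor(x, y)^2                                   # squared sample correlation
1 - sum(resid(fit)^2) / sum((y - mean(y))^2)  # 1 - RSS/SYY
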
Coefficient of Determination R²: IV

R and other computer packages report both R² and a variation known as the adjusted R²:
R²adj = 1 − (RSS/df)/(SYY/(n − 1)) as compared to R² = 1 − RSS/SYY,
where df is the degrees of freedom
for simple linear regression, df = n − 2, so R²adj gets closer and closer to R² as n increases
in general, df = n minus # of parameters in mean function
R²adj is intended to facilitate comparison of models, but Weisberg notes (p. 36) there are better ways of doing so
note: R² useless if mean function does not have intercept term (e.g., regression through the origin: E(Y | X = x) = β₁x)
[ALR 36]

Inadequacy of Sufficient Statistics: I

all of the data-dependent variables connected with a simple linear regression (e.g., β̂₀, β̂₁, σ̂², SSreg, RSS, R², etc.) can be formed using just five fundamental statistics:
x̄, ȳ, SXX, SYY and SXY
since
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ, SXX = Σᵢ₌₁ⁿ xᵢ² − nx̄² and SXY = Σᵢ₌₁ⁿ xᵢyᵢ − nx̄ȳ
(with analogous equations for ȳ and SYY), it follows that basic linear regression analysis depends only on five so-called sufficient statistics:
Σᵢ₌₁ⁿ xᵢ, Σᵢ₌₁ⁿ yᵢ, Σᵢ₌₁ⁿ xᵢ², Σᵢ₌₁ⁿ yᵢ² and Σᵢ₌₁ⁿ xᵢyᵢ
[ALR 293, 294, 23, 24, 25]

Inadequacy of Sufficient Statistics: II

under assumptions of normality and correctness of regression model, we do not in theory lose any probabilistic information by tossing away the original data (xᵢ, yᵢ), i = 1, ..., n, and just keeping five sufficient statistics
reliance on sufficient statistics is dangerous in actual applications, where adequacy of basic assumptions (normality, correctness of model) is always open to question
Anscombe (1973) constructed an example of four data sets (n = 11) with sufficient statistics that are identical (to within rounding error), offering much food for thought
third data set: reconsider scheme of picking median slope amongst all possible lines determined by two distinct points
[ALR 12, 13]

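Anscombe's quartet ships with base R as the data frame anscombe (columns x1–x4 and y1–y4), so the point is easy to verify (my sketch):

## Anscombe's four data sets: same summary statistics, very different data
data(anscombe)
sapply(1:4, function(k) {
  x <- anscombe[[paste0("x", k)]]
  y <- anscombe[[paste0("y", k)]]
  fit <- lm(y ~ x)
  c(coef(fit), r.squared = summary(fit)$r.squared)
})
## each column: essentially the same intercept (~3.0), slope (~0.5) and R^2

The four scatterplots that follow make plain how different the data sets really are.
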
Anscombe's First Data Set

[Figure: response versus predictor for Anscombe's first data set; ALR 13]

Anscombe's Second Data Set

[Figure: response versus predictor for Anscombe's second data set; ALR 13]

Anscombe's Third Data Set

[Figure: response versus predictor for Anscombe's third data set; ALR 13]

Anscombe's Fourth Data Set

[Figure: response versus predictor for Anscombe's fourth data set; ALR 13]

Residuals: I

looking at residuals êᵢ is a vital step in regression analysis; can check assumptions to prevent garbage in/garbage out
basic tool is a plot of residuals versus other quantities, of which three obvious choices are (see the R sketch below):
1. residuals versus fitted values ŷᵢ
2. residuals versus predictors xᵢ
3. residuals versus case numbers i
special nature of certain data might suggest other plots
useful residual plot resembles a null plot when assumptions hold, and a non-null plot when some assumption fails
let's look at plots 1 to 3 using Anscombe's data sets as examples
[ALR 36, 37, 38]

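In R the three plots are one line each (my sketch, for a fitted model fit <- lm(y ~ x)):

## the three basic residual plots
r <- resid(fit)
plot(fitted(fit), r, ylab = "Residuals"); abline(h = 0, lty = 2)
plot(x, r, xlab = "Predictor", ylab = "Residuals"); abline(h = 0, lty = 2)
plot(seq_along(r), r, xlab = "Case number", ylab = "Residuals")
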
Residuals Versus Fitted Values, Data Set #1

[Figure: residuals versus fitted values for Anscombe's first data set]

Residuals Versus Predictors, Data Set #1

[Figure: residuals versus predictors for Anscombe's first data set]

Residuals Versus Fitted Values, Data Set #2

[Figure: residuals versus fitted values for Anscombe's second data set]

Residuals Versus Predictors, Data Set #2

[Figure: residuals versus predictors for Anscombe's second data set]

Residuals Versus Fitted Values, Data Set #3

[Figure: residuals versus fitted values for Anscombe's third data set]

Residuals Versus Fitted Values, Data Set #4

[Figure: residuals versus fitted values for Anscombe's fourth data set]

Residuals Versus Predictors, Data Set #4

[Figure: residuals versus predictors for Anscombe's fourth data set]

Residuals: II

Q: why is a plot of residuals versus ŷᵢ identical to a plot of residuals versus xᵢ after relabeling the horizontal axis?

Residuals Versus Case Numbers, Data Set #1

[Figure: residuals versus case numbers for Anscombe's first data set]

Residuals Versus Case Numbers, Data Set #2

[Figure: residuals versus case numbers for Anscombe's second data set]

Residuals Versus Case Numbers, Data Set #3

[Figure: residuals versus case numbers for Anscombe's third data set]

Residuals Versus Case Numbers, Data Set #4

[Figure: residuals versus case numbers for Anscombe's fourth data set]

Residuals: III

although plots of êᵢ versus i were not particularly useful for Anscombe's data, plot is useful for certain other data sets (particularly where cases are collected sequentially in time)
fourth obvious choice: plot residuals êᵢ versus responses yᵢ
this choice is problematic because relationship yᵢ = ŷᵢ + êᵢ says that, if spread of ŷᵢ's is small compared to spread of êᵢ's, large êᵢ's will correspond to large yᵢ's even if model is correct
thus residuals versus responses is not a useful residual plot because it need not resemble a null plot when assumptions hold
as an example, reconsider Fort Collins data

Scatterplot of Late Snowfall Versus Early Snowfall

[Figure: late snowfall (inches) versus early snowfall (inches) for the Fort Collins data; ALR 8]

Residuals Versus Fitted Values, Fort Collins Data

[Figure: residuals versus fitted values for the Fort Collins data]

Residuals Versus Predictors, Fort Collins Data

[Figure: residuals versus early snowfall (inches) for the Fort Collins data]

Residuals Versus Case Numbers, Fort Collins Data

[Figure: residuals versus case numbers for the Fort Collins data]

Residuals Versus Responses, Fort Collins Data

[Figure: residuals versus responses for the Fort Collins data]

Residuals: IV

reconsider Forbes data, focusing first on 3 following overheads
red dashed horizontal lines on residual plot show ±σ̂
recall definition of weights cᵢ:
β̂₁ = Σᵢ (xᵢ − x̄)yᵢ / SXX = Σᵢ₌₁ⁿ cᵢyᵢ, where cᵢ = (xᵢ − x̄)/SXX
[ALR 36, 37, 38]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point for the Forbes data; ALR 6]

Plot of Residuals versus Predictors for Forbes Data

[Figure: residuals versus boiling point for the Forbes data, with red dashed lines at ±σ̂; ALR 6]

Weights Versus Boiling Point for Forbes Data

[Figure: weights cᵢ versus boiling point for the Forbes data]

Residuals: V

Weisberg notes that Forbes deemed the outlying case (case 12) "evidently a mistake", but perhaps just because of its appearance as an outlier
Weisberg (p. 38) shows that, if (x₁₂, y₁₂) is removed and regression analysis is redone on reduced data set, resulting slope estimate is virtually the same, but σ̂ and quantities that depend upon it change drastically (see overheads that follow)
to delete or not to delete, that is the question:
- if we don't delete, normality assumption is questionable
- if we do delete, normality assumption is tenable, but no real scientific justification for doing so (open to charges of data massaging)
[ALR 36, 37, 38]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point for the Forbes data; ALR 6]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point, with the outlying case marked by an x]

Plot of Residuals versus Predictors for Forbes Data

[Figure: residuals versus boiling point for the Forbes data; ALR 6]

Plot of Residuals versus Predictors for Forbes Data

[Figure: residuals versus boiling point, with the outlying case marked by an x]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point over the range 190 to 215]

Scatterplot of log10(Pressure) Versus Boiling Point

[Figure: log₁₀(pressure) versus boiling point over the range 190 to 215, with the outlying case marked by an x]

Main Points: I

given a response Y and a predictor X, simple linear regression assumes (1) a linear mean function
E(Y | X = x) = β₀ + β₁x,
where β₀ (intercept term) and β₁ (slope term) are unknown parameters (constants), and (2) a constant variance function
Var(Y | X = x) = σ²,
where σ² > 0 is a third unknown parameter
simple linear regression model can also be written as
Y = E(Y | X = x) + e = β₀ + β₁x + e,
where e is a statistical error, a random variable (RV) such that E(e) = 0 and Var(e) = σ²
[ALR 21, 292, 293]

Main Points: II

let (xᵢ, yᵢ), i = 1, ..., n, be RVs obeying Y = β₀ + β₁x + e (predictor/response data are realizations of these 2n RVs)
for the ith case, have yᵢ = β₀ + β₁xᵢ + eᵢ
errors e₁, ..., eₙ are independent RVs such that E(eᵢ | xᵢ) = 0
model for data can also be written in matrix notation as
Y = Xβ + e,
where Y = (y₁, y₂, ..., yₙ)′, X is the n × 2 matrix with ith row (1, xᵢ), β = (β₀, β₁)′ and e = (e₁, e₂, ..., eₙ)′
[ALR 21, 29, 63, 64]

Main Points: III

given sample means
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ and ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ
and sample cross products and sums of squares
SXY = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ), SXX = Σᵢ₌₁ⁿ (xᵢ − x̄)² & SYY = Σᵢ₌₁ⁿ (yᵢ − ȳ)²,
can form least squares estimators for parameters β₁ and β₀:
β̂₁ = SXY/SXX and β̂₀ = ȳ − β̂₁x̄
corresponding estimator for error variance σ² is
σ̂² = RSS/(n − 2), where RSS = Σᵢ₌₁ⁿ [yᵢ − (β̂₀ + β̂₁xᵢ)]² = SYY − (SXY)²/SXX
[ALR 293, 294, 24, 25]

Main Points: IV

letting
RSS(b₀, b₁) = Σᵢ₌₁ⁿ [yᵢ − (b₀ + b₁xᵢ)]²,
least squares estimators β̂₀ and β̂₁ are choices for b₀ and b₁ such that RSS(b₀, b₁) is minimized
fitted values ŷᵢ and residuals êᵢ are defined as
ŷᵢ = β̂₀ + β̂₁xᵢ and êᵢ = yᵢ − (β̂₀ + β̂₁xᵢ) = yᵢ − ŷᵢ,
in terms of which we have
RSS = RSS(β̂₀, β̂₁) = Σᵢ₌₁ⁿ êᵢ² and σ̂² = Σᵢ êᵢ² / (n − 2)
[ALR 24, 22, 23]

Main Points: V

in matrix notation, least squares estimator β̂ of β is such that X′Xβ̂ = X′Y, i.e., β̂ is solution to normal equations X′Xb = X′Y
2 × 2 matrix X′X has an inverse as long as SXX ≠ 0, so β̂ = (X′X)⁻¹X′Y
since E(β̂) = β, estimators β̂₀ & β̂₁ are unbiased, as is σ̂² also:
E(β̂₀) = β₀, E(β̂₁) = β₁ and E(σ̂²) = σ²
also have Var(β̂ | X) = σ²(X′X)⁻¹, leading us to deduce
Var(β̂₁ | X) = σ²/SXX, which can be estimated by V̂ar(β̂₁ | X) = σ̂²/SXX,
the square root of which is se(β̂₁), the standard error of β̂₁
[ALR 304, 305, 61, 62, 63, 27, 28]

Main Points: VI

can test null hypothesis (NH) that β₁ = 0 by forming t-statistic t = β̂₁/se(β̂₁) and comparing it to percentage points t(α, n − 2) for t-distribution with n − 2 degrees of freedom, with a large value of |t| giving evidence against NH via a small p-value
(1 − α) × 100% confidence interval for β₁ is set of points in interval whose end points are
β̂₁ − t(α/2, n − 2) se(β̂₁) and β̂₁ + t(α/2, n − 2) se(β̂₁)
[ALR 31]

Main Points: VII

can predict a yet-to-be-observed response y* given a setting x* for the predictor using ŷ* = β̂₀ + β̂₁x*, which has a standard error given by
sepred(ŷ* | x₊) = σ̂ [1 + 1/n + (x* − x̄)²/SXX]^(1/2),
where x₊ denotes x* along with original predictors x₁, ..., xₙ
(1 − α) × 100% prediction interval constitutes all values from
ŷ* − t(α/2, n − 2) sepred(ŷ* | x₊) to ŷ* + t(α/2, n − 2) sepred(ŷ* | x₊)
[ALR 32, 33]

Main Points: IX

plots of residuals êᵢ are invaluable for assessing reasonableness of fitted model (a point that cannot be emphasized too much)
standard plot is residuals êᵢ versus fitted values ŷᵢ, which is equivalent to êᵢ's versus predictors xᵢ
plot of residuals versus case number i is potentially, but not always, useful
do not plot residuals versus responses yᵢ: misleading!
failure to plot residuals is potentially bad for your health!
"Thou Shalt Plot Residuals" (a proposed 11th commandment!)
[ALR 36, 37, 38]

Additional References
F. Mosteller, A.F. Siegel, E. Trapido and C. Youtz (1981), "Eye Fitting Straight Lines", The American Statistician, 35, pp. 150–152.
C.R. Rao (1973), Linear Statistical Inference and Its Applications (Second Edition), New York: John Wiley & Sons, Inc.
G.A.F. Seber (1977), Linear Regression Analysis, New York: John Wiley & Sons, Inc.