ST217 Mathematical Statistics B
Bärbel Finkenstädt
March 11, 2008
CHAPTER 6
LINEAR STATISTICAL MODELS
The (Normal) Linear Model
The Analysis of Variance (ANOVA)
- one-way ANOVA
- two-way ANOVA
Introduction
Definition [Response Variable] A response variable is a random variable Y whose value we wish to predict.

Definition [Explanatory Variable] An explanatory variable is a random variable X whose values can be used to predict Y.

Definition [Linear Model] A linear model is a prediction function for Y in terms of the values x_1, x_2, ..., x_k of X_1, X_2, ..., X_k of the form

E[Y | x_1, x_2, ..., x_k] = β_0 + β_1 x_1 + β_2 x_2 + ··· + β_k x_k.  (1)
Matrix formulation
If Y_1, Y_2, ..., Y_n are responses for cases 1, 2, ..., n, and x_ij is the value of X_j (j = 1, ..., k) for case i, then

E[Y | X] = Xβ.  (2)
Vector of responses:

Y = (Y_1, Y_2, ..., Y_n)^T

Matrix of explanatory variables (design matrix):

X = (x_ij), where x_i0 = 1 for i = 1, ..., n

Parameter vector (unknown):

β = (β_0, β_1, ..., β_k)^T.
Example 1
Response Y = S2 (systolic BP after treatment), with:
- explanatory variable S1 (this is a 'simple linear regression model', with just one explanatory variable),
- explanatory variable D2,
- explanatory variables D1 and S1 (a 'multiple regression model').
Example 2
Response Y = 2D2 + S2, with:
- explanatory variable Z = 2D1 + S1,
- explanatory variables Z and Z² (a 'quadratic regression model').

New response and explanatory variables may be obtained by transforming and/or combining old ones.
Comments
- A linear relationship is the simplest possible relationship between response variables and explanatory variables.
- Can sometimes approximate a nonlinear relationship by a linear model, for example 'polynomial regression':

E[Y | x] = β_0 + β_1 x + β_2 x² + ··· + β_m x^m.

- By 'linear model' we mean 'linear in the parameters'.
- Distributional assumptions (if any) are made ONLY about the response variable Y. The explanatory variables are considered as given and fixed.
- There are various other names for response and explanatory variables (for example, 'dependent' and 'independent' variables, or 'covariates' and 'regressors' for explanatory variables).
Simple Linear Regression

Definition [Simple linear regression] A simple linear regression model is a linear model with one response variable Y and one explanatory variable X, i.e. a model of the form

E[Y | x_1] = β_0 + β_1 x_1.  (3)

Typically in practice we have n data points (x_i, y_i) for i = 1, ..., n, and we want to predict a future response Y from the corresponding observed value x of X.
Which variable should be treated as the response?
1. X may precede Y in time, for example:
   - X is BP before treatment and Y is BP after treatment, or
   - X is number of hours of revision and Y is exam mark;
2. X may be in some way more fundamental, for example:
   - X is weight and Y is height, or
   - X is height and Y is weight;
3. X may be easier or cheaper to observe, and in future we hope to estimate/approximate Y without measuring it.
Parameters β_0 and β_1 are unknown and we need to estimate them in order to predict Y by Ŷ = β̂_0 + β̂_1 x.

To make accurate predictions we require the prediction error

Y − Ŷ = Y − (β̂_0 + β̂_1 x)

to be small.

This suggests that, given data (x_i, y_i) for i = 1, ..., n, we should find β̂_0 and β̂_1 so that all the vertical deviations of the observed data points from the fitted line y = β̂_0 + β̂_1 x are small.

To do this we apply the 'least squares' criterion: minimise the sum of squared deviations

Σ (y_i − ŷ_i)².
Method of Least Squares

For simple linear regression,

E[Y_i | x_i] = β_0 + β_1 x_i   (i = 1, ..., n).  (4)

To estimate β_0 and β_1 by least squares, we minimise

Q = Σ_{i=1}^n [y_i − (β_0 + β_1 x_i)]².  (5)

Exercise 6.1: Show that Q is minimised at values β̂_0 and β̂_1 satisfying the equations

β̂_0 n + β̂_1 Σ x_i = Σ y_i,
β̂_0 Σ x_i + β̂_1 Σ x_i² = Σ x_i y_i,  (6)

(these are called the normal equations for β̂_0 and β̂_1) and hence that

β̂_1 = (Σ x_i y_i − n x̄ ȳ) / (Σ x_i² − n x̄²),  (7)

β̂_0 = ȳ − β̂_1 x̄.  (8)
Comments
- Checking second order partial derivatives of Q verifies that Q is minimised at β = β̂.
- ŷ = β̂_0 + β̂_1 x is called the 'least squares fit' to the data.
- The least squares fitted line passes through (x̄, ȳ), the centroid of the data points.
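As a quick numerical illustration (not part of the original slides), here is a minimal Python/NumPy sketch of formulas (7) and (8); the data vectors are made up for the example.

```python
import numpy as np

# Made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# Normal-equation solutions, formulas (7) and (8)
beta1_hat = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
beta0_hat = ybar - beta1_hat * xbar

# The fitted line passes through the centroid (x-bar, y-bar)
assert np.isclose(beta0_hat + beta1_hat * xbar, ybar)
```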
The Normal Linear Model (NLM)

Definition [NLM] Given n response RVs Y_i (i = 1, 2, ..., n), with corresponding values of explanatory variables x_i^T, the NLM makes the following assumptions:
1. (Conditional) Independence: the Y_i are mutually independent given the x_i^T.
2. Linearity: the expected value of the response variable is linearly related to the unknown parameters β: E Y_i = x_i^T β.
3. Normality: the random variation Y_i | x_i is Normally distributed.
4. Homoscedasticity (Equal Variances): i.e. Y_i | x_i ∼ N(x_i^T β, σ²).
Matrix Formulation of NLM

The NLM for responses Y = (Y_1, Y_2, ..., Y_n)^T can be recast as follows:
1. E[Y] = Xβ for some parameter vector β = (β_1, β_2, ..., β_p)^T,
2. ε = Y − E[Y] ∼ MVN(0, σ² I), where I is the (n × n) identity matrix.
It can be shown that the least squares estimates of β are given by solving the simultaneous linear equations

X^T y = X^T X β̂  (9)

(the normal equations), with solution (assuming that X^T X is nonsingular)

β̂ = (X^T X)^{-1} X^T y.  (10)
- In the general formulation the first column of the matrix X contains 1's. The corresponding parameter β_1 will be the 'constant term'.
- The fitted values are ŷ = Xβ̂.
- The vector of residuals is r = y − ŷ.
Definition [Residual sum of squares] The residual sum of squares (RSS) in the fitted NLM is

s² = Σ_{i=1}^n (y_i − ŷ_i)² = (y − Xβ̂)^T (y − Xβ̂).  (11)

- S²/σ² ∼ χ²_{n−p},
- S² is independent of β̂.
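To make the matrix formulation concrete, here is a small Python/NumPy sketch (my own illustration, with simulated data) that solves the normal equations (9) and computes fitted values, residuals, the RSS (11) and the unbiased variance estimate from Exercise 6.2(d) below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated illustrative data: design matrix with a first column of 1's
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations (9): X^T X beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_fitted = X @ beta_hat           # fitted values
r = y - y_fitted                  # residuals
s2 = r @ r                        # residual sum of squares (11)
sigma2_unbiased = s2 / (n - p)    # unbiased estimator of sigma^2
```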
Exercise 6.2
(a) Show that the log-likelihood function for the NLM is

(constant) − (n/2) log(σ²) − (1/(2σ²)) (y − Xβ)^T (y − Xβ).  (12)

(b) Show that the maximum likelihood estimate of β is identical to the least squares estimate. What is the distribution of β̂?
(c) Show that the MLE σ̂² of σ² is

σ̂² = s²/n.  (13)

What are the mean and variance of σ̂²?
(d) Show that an unbiased estimator of σ² is given by the formula

(Residual Sum of Squares) / (Residual Degrees of Freedom).
[Amended lecture notes start here; insert the following slides instead of section 6.6 in the current lecture notes.]
Exercise
Give the Rao–Cramér lower bound for unbiased estimators of β and σ², and verify whether β̂ and the unbiased estimator of σ² attain it.
Degree of Explanation

A frequently reported summary statistic for the 'goodness of fit' is the coefficient of determination R²:

R² = (variation explained by regression) / (total variation)  (14)
   = Σ_{i=1}^n (ŷ_i − ȳ)² / Σ_{i=1}^n (y_i − ȳ)²,  (15)

where the total variation corresponds to the observed unconditional variation of the response variable.

Caution: R² cannot be used as a basis for comparing the goodness of fit of alternative models, as it can be increased simply by increasing the number of explanatory variables.
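A tiny Python helper (my own sketch, not from the notes) showing how (14)-(15) are computed from the observed and fitted values:

```python
import numpy as np

def r_squared(y, y_fitted):
    """Coefficient of determination, equations (14)-(15)."""
    ss_total = np.sum((y - y.mean()) ** 2)              # total variation
    ss_regression = np.sum((y_fitted - y.mean()) ** 2)  # explained variation
    return ss_regression / ss_total
```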
Hypothesis Testing

(a) General linear hypotheses
Suppose we wish to test the null hypothesis

H0 : Rβ = r

that the parameters of the β vector are contained in the subspace for which Rβ = r, where
- R is a J × p (J ≤ p) hypothesis design matrix of rank J;
- r is a J × 1 known vector.

Exercise: Formulate
(a) H0 : β_1 = β_2 for the case p = 2,
(b) H0 : β_1 = β_2 and Σ_{i=1}^p β_i = 1,
(c) H0 : β_1 − 3β_3 = 6β_4 and β_1/β_2 = 4,
as a general linear hypothesis Rβ = r.
Let β̂ = (X^T X)^{-1} X^T y be the unconstrained MLE and σ̂² = s²/(n − p). Then it can be shown that the following test is a general likelihood ratio test:

Under H0 : Rβ = r,

λ = (Rβ̂ − r)^T [R(X^T X)^{-1} R^T]^{-1} (Rβ̂ − r) / (J σ̂²) ∼ F(J, n−p).

Rejection rule: reject H0 if the realised λ is larger than the (1 − α) percentile of the F(J, n−p) distribution.
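A hedged Python sketch of this F test (the function name and interface are my own; it assumes X^T X is nonsingular):

```python
import numpy as np
from scipy import stats

def general_linear_test(X, y, R, r, alpha=0.05):
    """Likelihood ratio (F) test of H0: R beta = r in the NLM."""
    n, p = X.shape
    J = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)        # unbiased variance estimate
    d = R @ beta_hat - r
    lam = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (J * sigma2_hat)
    crit = stats.f.ppf(1 - alpha, J, n - p)     # (1 - alpha) percentile of F(J, n-p)
    return lam, crit, lam > crit                # reject H0 if lam > crit
```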
(b) Single Hypothesis
This is a subcase of (a) with rank J = 1. Suppose we wish to test the null hypothesis

H0 : R_1 β = r_1.

Examples: Formulate
- H0 : β_1 = β_2,
- H0 : Σ_{i=1}^p β_i = 1,
- H0 : β_1 = r_1,
as a simple linear hypothesis of the form R_1 β = r_1.
In the case of a simple linear hypothesis (J = 1) the likelihood ratio test in (a) reduces to

λ = (R_1 β̂ − r_1)² / (σ̂² [R_1 (X^T X)^{-1} R_1^T]) ∼ F(1, n−p) = t²_{n−p}.

Therefore we can use a t-test:

T = √λ = (R_1 β̂ − r_1) / (σ̂ √(R_1 (X^T X)^{-1} R_1^T)) ∼ t_{n−p}.

Decision rule: reject H0 if |T| is too large.
Comment: For the single hypothesis H0 : β_i = β_{i,0} note that

T = (β̂_i − β_{i,0}) / (σ̂ √a_ii) ∼ t_{n−p},

where a_ii is the i-th diagonal element of (X^T X)^{-1}.

Regression software often reports the value of T for H0 : β_i = 0 for each single regression coefficient.
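Those per-coefficient t statistics can be computed as follows (a Python sketch of my own; the two-sided p-values are an addition, not from the slides):

```python
import numpy as np
from scipy import stats

def coefficient_t_tests(X, y):
    """t statistics for H0: beta_i = 0, one per regression coefficient."""
    n, p = X.shape
    A = np.linalg.inv(X.T @ X)
    beta_hat = A @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - p))
    se = sigma_hat * np.sqrt(np.diag(A))         # sigma_hat * sqrt(a_ii)
    t = beta_hat / se
    p_values = 2 * stats.t.sf(np.abs(t), n - p)  # two-sided p-values
    return t, p_values
```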
(c) Hypothesis testing about σ²
Straightforward: the RV

(n − p) σ̂² / σ² ∼ χ²_{n−p}

provides a distributional basis for testing H0 : σ² = σ_0².
Analysis of Residuals

Residuals should be approximately iid, normal, random, patternless, uncorrelated, ...

Investigate residuals to:
- check for outliers: plot standardised residuals (these should be approximately standard normal) and be suspicious about absolute values larger than about 2;
- check for violations of normality: histogram and normal or QQ plot; possibly tests for normality (χ² goodness-of-fit test, Kolmogorov-Smirnov test);
- check for homogeneous variances by plotting residuals against predictions ŷ and also against explanatory variables;
- in particular, if the response variable is measured over time then the residuals might be correlated. You can inspect this in scatterplots where you plot the residual at time point t versus the residual at time point t − 1 and compute the sample correlation coefficient (see the sketch below).
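A small Python sketch (my own; it uses the crude standardisation r/σ̂, ignoring leverage corrections) for the outlier and serial-correlation checks above:

```python
import numpy as np

def residual_diagnostics(r, sigma_hat):
    """Crude standardised residuals plus the lag-1 sample correlation."""
    std_resid = r / sigma_hat
    # Be suspicious of absolute standardised residuals larger than about 2
    suspects = np.where(np.abs(std_resid) > 2)[0]
    # Sample correlation between residual at time t and at time t - 1
    lag1_corr = np.corrcoef(r[:-1], r[1:])[0, 1]
    return std_resid, suspects, lag1_corr
```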
Examples of the NLM

Example 1: Simple Linear Regression

Y_i = β_0 + β_1 x_i + ε_i,  (16)

where ε_i ∼ iid N(0, σ²).
Example 2: Two-sample t-test

y = (x_1, ..., x_m, y_1, ..., y_n)^T, β = (β_0, β_1)^T,  (17)

where X is the (m + n) × 2 design matrix whose first m rows are (1, 0) and whose last n rows are (0, 1), and we are interested in the hypothesis H0 : (β_0 − β_1) = 0.
Example 3: Multiple Regression
Y = SBP after captopril, x_1 = SBP before captopril, x_2 = DBP before captopril.

The data, one row per patient, with design matrix X = (1, x_1, x_2) and β = (β_0, β_1, β_2)^T:  (18)

y     1   x_1   x_2
201   1   210   130
165   1   169   122
166   1   187   124
157   1   160   104
147   1   167   112
145   1   176   101
168   1   185   121
180   1   206   124
147   1   173   115
136   1   146   102
151   1   174    98
168   1   201   119
179   1   198   106
129   1   148   107
131   1   154   100
See the slides captopril.pdf for output; the file can be downloaded from the materials webpage.
Exercise [dependence and heteroscedasticity]
Suppose that Y | X is MVN(Xβ, σ²Φ), where Φ is a known positive definite matrix. Show that the MLE of β is given by the generalised least squares estimator

β̂ = (X^T Φ^{-1} X)^{-1} X^T Φ^{-1} y.
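A short Python sketch of the generalised least squares estimator (my own illustration; it assumes Φ and X^T Φ^{-1} X are invertible):

```python
import numpy as np

def gls(X, y, Phi):
    """Generalised least squares: (X' Phi^-1 X)^-1 X' Phi^-1 y."""
    Phi_inv = np.linalg.inv(Phi)
    return np.linalg.solve(X.T @ Phi_inv @ X, X.T @ Phi_inv @ y)
```

With Φ = I this reduces to the ordinary least squares estimator (10).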
ANALYSIS OF VARIANCE (ANOVA)
One-Way Analysis of Variance
Two-Way Analysis of Variance
One-Way Analysis of Variance
Generalization of the two-sample t-test to p > 2 groups.

Observations y_ij (j = 1, 2, ..., n_i) in the i-th group (i = 1, 2, ..., p). Total number of observations n = n_1 + n_2 + ··· + n_p.

Denote the corresponding RVs by Y_ij. Assume that Y_ij ∼ N(β_i, σ²) independently.

We wish to test the null hypothesis

H0 : β_1 = β_2 = ··· = β_p.
The one-way ANOVA model is a NLM of the form

Y ∼ MVN(Xβ, σ² I),

where

Y = (Y_1, Y_2, ..., Y_n)^T,  β = (β_1, β_2, ..., β_p)^T,  (19)

and the n × p design matrix X consists of n_1 rows equal to (1, 0, 0, ..., 0), then n_2 rows equal to (0, 1, 0, ..., 0), ..., and finally n_p rows equal to (0, 0, 0, ..., 1), with n_1 + n_2 + ··· + n_p = n.
Notation

ȳ_{i+} = (1/n_i) Σ_{j=1}^{n_i} y_ij,

ȳ_{++} = (1/n) Σ_{i=1}^p Σ_{j=1}^{n_i} y_ij  ( = (1/n) Σ_{i=1}^p n_i ȳ_{i+} ), etc.
Exercise 6.6: Show that for one-way ANOVA, X^T X = diag(n_1, n_2, ..., n_p), and hence β̂ = (Ȳ_{1+}, Ȳ_{2+}, ..., Ȳ_{p+})^T.
Equivalent Reformulation

Y_ij = β_i + e_ij = µ + α_i + e_ij,

where µ is called the grand mean and α_i is called the treatment effect of treatment i:

µ = E[Ȳ_{++}] = (1/n) Σ_{i=1}^p Σ_{j=1}^{n_i} E Y_ij = (1/n) Σ_{i=1}^p n_i β_i,

α_i = β_i − µ (i = 1, 2, ..., p), so that Σ_{i=1}^p n_i α_i = 0.
Hypotheses are now

H0 : α_i = 0 (i = 1, 2, ..., p),
H1 : α_i ≠ 0 for at least one i.
One-Way ANOVA Test
Total variability of all data taken together around its estimated grand mean can be decomposed as

SST = Σ_{i=1}^p Σ_{j=1}^{n_i} (y_ij − ȳ_{++})²
    = Σ_{i=1}^p n_i (ȳ_{i+} − ȳ_{++})² + Σ_{i=1}^p Σ_{j=1}^{n_i} (y_ij − ȳ_{i+})²
    = SS(tr) + SSE,

with corresponding degrees of freedom

n − 1 = (p − 1) + (n − p).
Under H0 the F-ratio

f = [SS(tr)/((p − 1)σ²)] / [SSE/((n − p)σ²)] = MS(tr)/MSE ∼ F(p − 1, n − p),

and we reject H0 if f exceeds the critical value at significance level α.
Exercises
1) Prove that SST = SS(tr) + SSE.
2) Show that f is a general likelihood ratio test.
One-way ANOVA table

Source of variation | Degrees of freedom | Sum of squares | Mean square | f
Treatment           | p − 1              | SS(tr)         | MS(tr)      | MS(tr)/MSE
Error               | n − p              | SSE            | MSE         |
Total               | n − 1              | SST            | MST         |
Example: River pollution (p. 94)
The following data come from a study of pollution in inland waterways. In each of seven localities, five pike were caught and the log concentration of copper in their livers measured.

Locality               | Log concentration of copper (ppm)
1. Windermere          | 0.187   0.836   0.704   0.938   0.124
2. Grassmere           | 0.449   0.769   0.301   0.045   0.846
3. River Stour         | 0.628   0.193   0.810   0.000   0.855
4. Wimbourne St Giles  | 0.412   0.286   0.497   0.417   0.337
5. River Avon          | 0.243   0.258  -0.276  -0.538   0.041
6. River Leam          | 0.134   0.281   0.529   0.305   0.459
7. River Kennett       | 0.471   0.371   0.297   0.691   0.535

We want to carry out a one-way analysis of variance to test for differences between localities (a computational sketch follows the figure below).
Figure: Concentration of copper in pike livers
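One way to carry out this one-way ANOVA numerically (a Python sketch of my own, using the pike data tabulated above):

```python
import numpy as np
from scipy import stats

# Log concentration of copper (ppm) in pike livers, seven localities
groups = [
    [0.187, 0.836, 0.704, 0.938, 0.124],    # Windermere
    [0.449, 0.769, 0.301, 0.045, 0.846],    # Grassmere
    [0.628, 0.193, 0.810, 0.000, 0.855],    # River Stour
    [0.412, 0.286, 0.497, 0.417, 0.337],    # Wimbourne St Giles
    [0.243, 0.258, -0.276, -0.538, 0.041],  # River Avon
    [0.134, 0.281, 0.529, 0.305, 0.459],    # River Leam
    [0.471, 0.371, 0.297, 0.691, 0.535],    # River Kennett
]

p = len(groups)
all_y = np.concatenate(groups)
n = len(all_y)
grand_mean = all_y.mean()

# Decomposition SST = SS(tr) + SSE
ss_tr = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
sse = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups)

f = (ss_tr / (p - 1)) / (sse / (n - p))   # F-ratio MS(tr)/MSE
p_value = stats.f.sf(f, p - 1, n - p)     # compare with F(p-1, n-p)
print(f"f = {f:.3f}, p-value = {p_value:.4f}")
```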
Two-Way Analysis of Variance

Here there are two factors: factor A has I 'levels' 1, 2, ..., I, and factor B has J 'levels' 1, 2, ..., J.

                    Factor B
              1      2      ...    J
Factor A  1   Y_11   Y_12   ...    Y_1J
          2   Y_21   Y_22   ...    Y_2J
          3   Y_31   Y_32   ...    Y_3J
          ...
          I   Y_I1   Y_I2   ...    Y_IJ

i.e. there is precisely one observation Y_ij at each (i, j) combination of factor levels.
Again assume the NLM with

E[Y_ij] = θ_i + φ_j for i = 1, ..., I and j = 1, ..., J,

i.e.

Y_ij ∼ N(θ_i + φ_j, σ²) independently.  (20)
Exercise 6.7 What is the matrix formulation of this model?
Reformulation

The two-way ANOVA model is

Y_ij = µ + α_i + β_j + e_ij,

where

e_ij ∼ N(0, σ²),  Σ_{i=1}^I α_i = 0,  Σ_{j=1}^J β_j = 0.
The hypotheses of interest are

H0,A : α_i = 0 (i = 1, ..., I) versus H1,A : α_i ≠ 0 for at least one i;
H0,B : β_j = 0 (j = 1, ..., J) versus H1,B : β_j ≠ 0 for at least one j.
Notation
Estimated grand mean:

ȳ_{++} = (1/(IJ)) Σ_{i=1}^I Σ_{j=1}^J y_ij.

Mean observation for the i-th level of factor A:

ȳ_{i+} = (1/J) Σ_{j=1}^J y_ij.

Mean observation for the j-th level of factor B:

ȳ_{+j} = (1/I) Σ_{i=1}^I y_ij.
Decomposition of total variability of all data taken together around its estimated grand mean:

Σ_{i=1}^I Σ_{j=1}^J (y_ij − ȳ_{++})² = Σ_{i=1}^I J (ȳ_{i+} − ȳ_{++})²
                                     + Σ_{j=1}^J I (ȳ_{+j} − ȳ_{++})²
                                     + Σ_{i=1}^I Σ_{j=1}^J (y_ij − ȳ_{i+} − ȳ_{+j} + ȳ_{++})²,

SST = SSA + SSB + SSE,
with corresponding degrees of freedom

(IJ − 1) = (I − 1) + (J − 1) + (I − 1)(J − 1).
Two-way ANOVA test

We test H0,A with the F-ratio

f_A = [SSA/((I − 1)σ²)] / [SSE/((I − 1)(J − 1)σ²)] = MSA/MSE.

Under H0,A,

f_A ∼ F(I − 1, (I − 1)(J − 1)),

and we reject H0,A if f_A is larger than the appropriately chosen critical value.
The test for H0,B is analogous.
Two-way ANOVA table

Source of variation | Degrees of freedom | Sum of squares | Mean square | f
Factor A            | I − 1              | SSA            | MSA         | MSA/MSE
Factor B            | J − 1              | SSB            | MSB         | MSB/MSE
Error               | (I − 1)(J − 1)     | SSE            | MSE         |
Total               | IJ − 1             | SST            | MST         |
Exercise
Consider the following data on the amount of time (in minutes) it took a certain person to drive to work, Monday through Friday, along four different routes:

          Mon  Tues  Wed  Thu  Fri
Route 1   22   26    25   25   31
Route 2   25   27    28   26   29
Route 3   26   29    33   30   33
Route 4   26   28    27   30   30

Test at the 0.05 level of significance whether the differences among the means obtained for the different routes are significant, and also whether the differences among the means obtained for the different days of the week are significant.
Notes for Revision
Revise all chapters 2-6 carefully, including all material added on slides. You are expected to:

Chapter 2: be able to derive marginal and conditional distributions, understand independence, and be able to do all manipulations of expectations and conditional expectations (including correlations, variances and covariances); all exercises and problems of chapter 2 are relevant.

Chapter 3: fully understand and carry out manipulations of the multivariate normal as practised in exercises (you do need to memorize the pdf of the MVN distribution!); recognise the chi-square, t and F distributions as transformations of other random variables, and know some basic properties of these (such as the first two moments).

Chapter 4: fully understand all material, in particular about properties of estimators and likelihood estimation. Be able to derive ML estimators and their information matrix in the multiparameter case.

Revision contd.

Chapter 5: understand all basic concepts of testing, prove the Neyman-Pearson lemma, be able to carry out all tests introduced to data, be aware of the test assumptions and how to check them, and be able to prove when a test is a LR test.

Chapter 6: know the definition and all assumptions of the NLM and its matrix formulation, be able to derive ML and LS estimators and their properties (again, you do need to memorize the pdf of the MVN to set up the likelihood!), fully understand the output of some regression software and the analysis of residuals, as well as some formal testing in the NLM. Carry out a one-way and two-way analysis of variance for some real data (with your pocket calculator).
Notes on Revision

Revise:
- some linear algebra; you need to be able to carry out matrix manipulations (including determinants, inverses, rank) and formulate linear equations in matrix form;
- exercises and problems (and most importantly all exercises done in lectures and problems marked as past exam questions in the lecture notes).

Make sure you bring a working pocket calculator to the exam and that you know how to use it. The pocket calculator has to conform to University rules, i.e. it may only be able to display figures, no letters!

Be aware: if you forget your calculator you will not be able to do a significant part of the exam. Also, it is not permitted to share a pocket calculator with a neighbour.