TRANSCRIPT
STA 286 week 13 1
Inference for the Regression Coefficient
• Recall, b0 and b1 are the estimates of the intercept β0 and slope β1 of the population regression line.
• We can show that b0 and b1 are unbiased estimates of β0 and β1 and, furthermore, that b0 and b1 are Normally distributed with means β0 and β1 and standard deviations that can be estimated from the data.
• We use the above facts to obtain confidence intervals and conduct hypothesis tests about β0 and β1.
STA 286 week 13 2
CI for Regression Slope and Intercept
• A level 100(1-α)% confidence interval for the intercept β0 is

  b0 ± t(n−2; α/2) · SE(b0),

  where the standard error of the intercept is

  SE(b0) = s √( 1/n + x̄² / Σ(xi − x̄)² ).

• A level 100(1-α)% confidence interval for the slope β1 is

  b1 ± t(n−2; α/2) · SE(b1),

  where the standard error of the slope is

  SE(b1) = s / √( Σ(xi − x̄)² ).

• Example ….
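As a minimal sketch, the two intervals above can be computed with NumPy and SciPy; the data here are invented for illustration and are not the course example:

```python
import numpy as np
from scipy import stats

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
n = len(x)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)

# Least-squares estimates of the slope and intercept
b1 = np.sum((x - xbar) * (y - ybar)) / Sxx
b0 = ybar - b1 * xbar

# Residual standard deviation s, with n - 2 degrees of freedom
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

# Standard errors from the formulas on this slide
SE_b0 = s * np.sqrt(1.0 / n + xbar ** 2 / Sxx)
SE_b1 = s / np.sqrt(Sxx)

# 95% confidence intervals: estimate +/- t(n-2; alpha/2) * SE
tcrit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
ci_b0 = (b0 - tcrit * SE_b0, b0 + tcrit * SE_b0)
ci_b1 = (b1 - tcrit * SE_b1, b1 + tcrit * SE_b1)
print(ci_b0, ci_b1)
```

The same standard errors are what `scipy.stats.linregress` reports, which gives a quick way to check the hand computation.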
STA 286 week 13 3
Significance Tests for Regression Slope
• To test the null hypothesis H0: β1 = 0 we compute the test statistic

  t = b1 / SE(b1).

• The above test statistic has a t distribution with n−2 degrees of freedom. We can use this distribution to obtain the P-value for the various possible alternative hypotheses.
• Note: testing the null hypothesis H0: β1 = 0 is equivalent to testing the null hypothesis H0: ρ = 0, where ρ is the population correlation.
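A sketch of this test in Python, using the same invented data as before (not the course example); the last assertion illustrates the note about ρ:

```python
import numpy as np
from scipy import stats

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
n = len(x)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
SE_b1 = s / np.sqrt(Sxx)

# Test statistic for H0: beta1 = 0
t_stat = b1 / SE_b1

# Two-sided P-value from the t distribution with n - 2 df
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)
```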
STA 286 week 13 4
Example
• Refer to the heart rate and oxygen example….
STA 286 week 13 5
Confidence Interval for the Mean Response
• For any specific value of x, say x0, the mean of the response y in this subpopulation is given by μy = β0 + β1x0.
• We can estimate this mean from the sample by substituting the least-squares estimates of β0 and β1:

  μ̂y = b0 + b1x0.

• A level 100(1-α)% confidence interval for the mean response μy when x takes the value x0 is

  μ̂y ± t(n−2; α/2) · SE(μ̂),

  where the standard error of μ̂y is

  SE(μ̂) = s √( 1/n + (x0 − x̄)² / Σ(xi − x̄)² ).
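A minimal sketch of this interval, again with invented data (not the course example):

```python
import numpy as np
from scipy import stats

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
n = len(x)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

def mean_response_ci(x0, alpha=0.05):
    """Level 100(1-alpha)% CI for the mean response at x = x0."""
    mu_hat = b0 + b1 * x0
    se = s * np.sqrt(1.0 / n + (x0 - xbar) ** 2 / Sxx)
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return mu_hat - tcrit * se, mu_hat + tcrit * se

print(mean_response_ci(2.5))
```

Note from the SE formula that the interval is narrowest at x0 = x̄ and widens as x0 moves away from the mean of the x's.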
STA 286 week 13 6
Example
• Data on the wages and length of service (LOS) in months for 60 women who work in Indiana banks.
• We are interested in how LOS relates to wages. The Minitab output and commands are given in a separate file.
STA 286 week 13 7
Prediction Interval
• The predicted response y for an individual case with a specific value x0 of the explanatory variable x is:

  ŷ = b0 + b1x0.

• A useful prediction should include a margin of error to indicate its accuracy.
• The interval used to predict a future observation is called a prediction interval.
• A level 100(1-α)% prediction interval for a future observation on the response variable y from the subpopulation corresponding to x0 is

  ŷ ± t(n−2; α/2) · SE(ŷ),

  where the standard error of ŷ is

  SE(ŷ) = s √( 1 + 1/n + (x0 − x̄)² / Σ(xi − x̄)² ).
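The prediction interval can be sketched the same way (invented data, not the course example); the only change from the mean-response interval is the extra "1 +" under the square root:

```python
import numpy as np
from scipy import stats

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
n = len(x)

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * xbar
resid = y - (b0 + b1 * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))

def prediction_interval(x0, alpha=0.05):
    """Level 100(1-alpha)% PI for one future observation at x = x0.

    The extra '1 +' under the square root accounts for the scatter of
    an individual observation around the subpopulation mean, so a PI
    is always wider than the CI for the mean response at the same x0.
    """
    y_hat = b0 + b1 * x0
    se = s * np.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / Sxx)
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return y_hat - tcrit * se, y_hat + tcrit * se

print(prediction_interval(2.5))
```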
STA 286 week 13 8
Example
• Calculate a 95% PI for the wage of an employee with 3 years of experience (i.e. LOS = 36).
• Calculate a 90% PI for the wage of an employee with 3 years of experience (i.e. LOS = 36).
STA 286 week 13 9
Analysis of Variance for Regression
• Analysis of variance, ANOVA, is essential for multiple regression and for comparing several means.
• ANOVA summarizes information about the sources of variation in the data. It is based on the framework of DATA = FIT + RESIDUAL.
• The total variation in the response y is expressed by the deviations yi − ȳ.
• The overall deviation of any y observation from the mean of the y's can be split into two main sources of variation and expressed as

  yi − ȳ = (ŷi − ȳ) + (yi − ŷi).
STA 286 week 13 10
Sum of Squares
• Sums of squares (SS) represent variation present in the responses. They are calculated by summing squared deviations. Analysis of variance partitions the total variation between two sources.
• The total variation in the data is expressed as SST = SSM + SSE.
• SST stands for the total sum of squares; it is given by SST = Σ(yi − ȳ)².
• SSM stands for the sum of squares for the model; it is given by SSM = Σ(ŷi − ȳ)².
• SSE stands for the sum of squares for error; it is given by SSE = Σ(yi − ŷi)².
• Each of the above SS has degrees of freedom associated with it: n − 1 for SST, 1 for SSM, and n − 2 for SSE, so that the degrees of freedom also add up: n − 1 = 1 + (n − 2).
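The partition SST = SSM + SSE can be checked numerically on the same invented data used earlier (not the course example):

```python
import numpy as np

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
n = len(x)

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
y_hat = b0 + b1 * x

# The three sums of squares
SST = np.sum((y - ybar) ** 2)      # total
SSM = np.sum((y_hat - ybar) ** 2)  # model
SSE = np.sum((y - y_hat) ** 2)     # error

print(SST, SSM, SSE)  # SST equals SSM + SSE
```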
STA 286 week 13 11
Coefficient of Determination R2
• The coefficient of determination R² is the fraction of variation in the values of y that is explained by the least-squares regression. The SS make this interpretation precise.
• We can show that

  R² = SSM / SST = 1 − SSE / SST.

• This equation is the precise statement of the fact that R² is the fraction of variation in y explained by x.
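A short numerical check of the two forms of R², again on invented data (not the course example); it also illustrates that in simple linear regression R² equals the squared sample correlation r²:

```python
import numpy as np

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
y_hat = b0 + b1 * x

SST = np.sum((y - ybar) ** 2)
SSM = np.sum((y_hat - ybar) ** 2)
SSE = np.sum((y - y_hat) ** 2)

# The two forms of R^2 agree
R2 = SSM / SST

# In simple linear regression, R^2 is the squared sample correlation
r = np.corrcoef(x, y)[0, 1]
print(R2, r ** 2)
```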
STA 286 week 13 12
Mean Square
• For each source, the ratio of the SS to the degrees of freedom is called the mean square (MS).
• To calculate mean squares, use the formula

  MS = sum of squares / degrees of freedom.
STA 286 week 13 13
ANOVA Table and F Test
• In the simple linear regression model, the hypotheses H0: β1 = 0 vs H1: β1 ≠ 0 are tested by the F statistic.
• The F statistic is given by

  F = MSM / MSE.

• The F statistic has an F(1, n−2) distribution which we can use to find the P-value.
• Example…
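The F test can be sketched on the same invented data (not the course example); since F = t² in simple linear regression, its P-value matches the two-sided t test for the slope:

```python
import numpy as np
from scipy import stats

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])
n = len(x)

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
y_hat = b0 + b1 * x

SSM = np.sum((y_hat - ybar) ** 2)
SSE = np.sum((y - y_hat) ** 2)

# Mean squares: each SS divided by its degrees of freedom
MSM = SSM / 1
MSE = SSE / (n - 2)

# F statistic and its P-value from the F(1, n-2) distribution
F = MSM / MSE
p_value = stats.f.sf(F, 1, n - 2)
print(F, p_value)
```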
STA 286 week 13 14
Residual Analysis
• We will use residuals to examine the following six types of departures from the model:
  – The regression function is nonlinear.
  – The error terms do not have constant variance.
  – The error terms are not independent.
  – The model fits all but a few outlying observations.
  – The error terms are not normally distributed.
  – One or more important variables have been omitted from the model.
STA 286 week 13 15
Residual Plots
• We will use residual plots to examine the aforementioned types of departures. The plots that we will use are:
  – Residuals versus the fitted values
  – Residuals versus time (when the data are obtained in a time sequence) or other variables
  – Normal probability plot of the residuals
  – Histograms, stemplots and boxplots of the residuals
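These plots can be produced in Python as well as in Minitab; a minimal sketch with matplotlib and SciPy, using the same invented data (not the course example) and writing the figure to a file:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; save plots to a file
import matplotlib.pyplot as plt
from scipy import stats

# Invented illustrative data (not the lecture's example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
fitted = b0 + b1 * x
resid = y - fitted

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Residuals vs fitted values: look for curvature or changing spread
axes[0].scatter(fitted, resid)
axes[0].axhline(0.0, linestyle="--")
axes[0].set_title("Residuals vs fitted")

# Normal probability plot: points near a line suggest normal errors
stats.probplot(resid, plot=axes[1])
axes[1].set_title("Normal probability plot")

# Histogram of residuals
axes[2].hist(resid, bins="auto")
axes[2].set_title("Histogram of residuals")

fig.savefig("residual_plots.png")
```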
STA 286 week 13 16
Example
• Below are the residual plots from the model predicting GPA based on SAT scores….