STA 286 Week 13: Inference for the Regression Coefficient


Upload: jocelyn-simon

Post on 05-Jan-2016



Inference for the Regression Coefficient

• Recall that b0 and b1 are the estimates of the intercept β0 and slope β1 of the population regression line.

• We can show that b0 and b1 are unbiased estimates of β0 and β1 and, furthermore, that b0 and b1 are Normally distributed with means β0 and β1 and standard deviations that can be estimated from the data.

• We use these facts to obtain confidence intervals and conduct hypothesis tests about β0 and β1.


CI for Regression Slope and Intercept

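• A level 100(1-α)% confidence interval for the intercept β0 is

$$b_0 \pm t_{n-2;\,\alpha/2}\, SE_{b_0}$$

where the standard error of the intercept is

$$SE_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$$

• A level 100(1-α)% confidence interval for the slope β1 is

$$b_1 \pm t_{n-2;\,\alpha/2}\, SE_{b_1}$$

where the standard error of the slope is

$$SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$

• Example ….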
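As a rough illustration of the two intervals, here is a minimal Python sketch. The data are made up, the function names are hypothetical, and the critical value is assumed to come from a t table (here t with 3 degrees of freedom and upper-tail area 0.025, about 3.182). The quantity s is taken to be the usual regression standard error, the square root of SSE/(n-2).

```python
import math

def fit_line(x, y):
    """Least-squares fit: returns b0, b1, s, x_bar and Sxx = sum (x_i - x_bar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # regression standard error (assumed definition)
    return b0, b1, s, x_bar, sxx

def coefficient_intervals(x, y, t_crit):
    """Confidence intervals b +/- t_crit * SE for the intercept and the slope."""
    n = len(x)
    b0, b1, s, x_bar, sxx = fit_line(x, y)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    se_b1 = s / math.sqrt(sxx)
    return {
        "se_b0": se_b0,
        "se_b1": se_b1,
        "intercept": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "slope": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }

# Made-up data with n = 5, so the t distribution has n - 2 = 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = coefficient_intervals(x, y, t_crit=3.182)
print(result)
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; in practice it would come from statistical software or a table.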


Significance Tests for Regression Slope

• To test the null hypothesis H0: β1 = 0 we compute the test statistic

$$t = \frac{b_1}{SE_{b_1}}$$

• Under H0, this test statistic has a t distribution with n-2 degrees of freedom. We can use this distribution to obtain the P-value for the various possible alternative hypotheses.

• Note: testing the null hypothesis H0: β1 = 0 is equivalent to testing the null hypothesis H0: ρ = 0, where ρ is the population correlation.
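The equivalence between the slope test and the correlation test can be checked numerically. The sketch below uses made-up data and a hypothetical function name, and it assumes the usual form of the correlation test statistic, t = r√(n-2)/√(1-r²), which is not written out on the slide. It computes the statistic both ways so the two values can be compared.

```python
import math

def slope_t_statistics(x, y):
    """Return (t from the slope, t from the correlation, degrees of freedom)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Slope-based statistic: t = b1 / SE_b1
    b1 = sxy / sxx
    sse = syy - b1 * sxy              # residual sum of squares
    s = math.sqrt(sse / (n - 2))      # regression standard error
    se_b1 = s / math.sqrt(sxx)
    t_slope = b1 / se_b1

    # Correlation-based statistic (assumed textbook form)
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t_slope, t_corr, n - 2

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_slope, t_corr, df = slope_t_statistics(x, y)
print(t_slope, t_corr, df)
```

The P-value would then be read from a t distribution with `df` degrees of freedom, using a table or statistical software.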


Example

• Refer to the heart rate and oxygen example….


Confidence Interval for the Mean Response

• For any specific value of x, say x0, the mean of the response y in this subpopulation is given by μy = β0 + β1x0.

• We can estimate this mean from the sample by substituting the least-squares estimates of β0 and β1:

$$\hat{\mu}_y = b_0 + b_1 x_0$$

• A level 100(1-α)% confidence interval for the mean response μy when x takes the value x0 is

$$\hat{\mu}_y \pm t_{n-2;\,\alpha/2}\, SE_{\hat{\mu}}$$

where the standard error of the estimated mean response is

$$SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
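A small sketch of the mean-response interval follows. It uses made-up data rather than the course examples, a hypothetical function name, and a critical value assumed to be read from a t table (3 degrees of freedom, 95% confidence, about 3.182).

```python
import math

def mean_response_interval(x, y, x0, t_crit):
    """Estimate the mean response at x0 and return (mu_hat, SE, interval)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # regression standard error

    mu_hat = b0 + b1 * x0
    se_mu = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return mu_hat, se_mu, (mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
mu_hat, se_mu, interval = mean_response_interval(x, y, x0=4, t_crit=3.182)
se_at_mean = mean_response_interval(x, y, x0=3, t_crit=3.182)[1]
print(mu_hat, se_mu, interval, se_at_mean)
```

Because of the (x0 - x̄)² term, the standard error is smallest when x0 equals the sample mean of x and grows as x0 moves away from it.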


Example

• Data on the wages and length of service (LOS) in months for 60 women who work in Indiana banks.

• We are interested to know how LOS relates to wages. The Minitab output and commands are given in a separate file.


Prediction Interval

• The predicted response y for an individual case with a specific value x0 of the explanatory variable x is

$$\hat{y} = b_0 + b_1 x_0$$

• A useful prediction should include a margin of error to indicate its accuracy.

• The interval used to predict a future observation is called a prediction interval.

• A level 100(1-α)% prediction interval for a future observation on the response variable y from the subpopulation corresponding to x0 is

$$\hat{y} \pm t_{n-2;\,\alpha/2}\, SE_{\hat{y}}$$

where the standard error of the prediction is

$$SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
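The prediction interval can be sketched the same way as the interval for a mean response; the only change is the extra 1 under the square root, which accounts for the variation of an individual observation around its mean. The data, the function name, and the table value of the critical point (about 3.182 for 3 degrees of freedom at 95%) are assumptions made for illustration.

```python
import math

def prediction_interval(x, y, x0, t_crit):
    """Predict a future observation at x0 and return (y_hat, SE, interval)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # regression standard error

    y_hat = b0 + b1 * x0
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, se_pred, (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, se_pred, interval = prediction_interval(x, y, x0=4, t_crit=3.182)
print(y_hat, se_pred, interval)
```

At the same x0 and confidence level, this interval is always wider than the confidence interval for the mean response, since the squared standard error here equals s² plus the squared standard error of the estimated mean.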


Example

• Calculate a 95% PI for the wage of an employee with 3 years experience (i.e. LOS=36).

• Calculate a 90% PI for the wage of an employee with 3 years experience (i.e. LOS=36).


Analysis of Variance for Regression

• Analysis of variance, ANOVA, is essential for multiple regression and for comparing several means.

• ANOVA summarizes information about the sources of variation in the data. It is based on the framework DATA = FIT + RESIDUAL.

• The total variation in the response y is expressed by the deviations

$$y_i - \bar{y}$$

• The overall deviation of any y observation from the mean of the y's can be split into two main sources of variation and expressed as

$$(y_i - \bar{y}) = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$


Sum of Squares

• Sums of squares (SS) represent the variation present in the responses. They are calculated by summing squared deviations. Analysis of variance partitions the total variation between two sources.

• The total variation in the data is expressed as SST = SSM + SSE.

• SST stands for sum of squares for total; it is given by...

• SSM stands for sum of squares for model; it is given by...

• SSE stands for sum of squares for errors; it is given by ...

• Each of the above SS has degrees of freedom associated with it. The degrees of freedom are…
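The slide leaves the formulas and degrees of freedom to be filled in during lecture. The sketch below assumes the standard textbook definitions for simple linear regression (SST from deviations of y about its mean with n-1 degrees of freedom, SSM from deviations of the fitted values about the mean with 1, SSE from the residuals with n-2) and checks the identity SST = SSM + SSE on made-up data. The function name is hypothetical.

```python
def sums_of_squares(x, y):
    """Return {name: (sum of squares, degrees of freedom)} for a simple linear fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)            # model (FIT)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error (RESIDUAL)
    # Degrees of freedom below are the assumed standard values.
    return {"SST": (sst, n - 1), "SSM": (ssm, 1), "SSE": (sse, n - 2)}

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = sums_of_squares(x, y)
for name, (ss, df) in table.items():
    print(f"{name}: SS = {ss:.3f}, df = {df}")
```

The degrees of freedom add up in the same way as the sums of squares: (n-1) = 1 + (n-2).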


Coefficient of Determination R2

• The coefficient of determination R2 is the fraction of variation in the values of y that is explained by the least-squares regression. The SS make this interpretation precise.

• We can show that

$$R^2 = \frac{SSM}{SST} = 1 - \frac{SSE}{SST}$$

• This equation is the precise statement of the fact that R2 is the fraction of variation in y explained by x.
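To see the two expressions for R2 agree, the following sketch computes both on made-up data, along with the squared sample correlation, which in simple linear regression gives the same number. The function name is hypothetical and the data are for illustration only.

```python
import math

def r_squared_three_ways(x, y):
    """Return (SSM/SST, 1 - SSE/SST, r**2) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssm = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r = sxy / math.sqrt(sxx * syy)             # sample correlation
    return ssm / syy, 1 - sse / syy, r ** 2

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
from_ssm, from_sse, from_r = r_squared_three_ways(x, y)
print(from_ssm, from_sse, from_r)
```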


Mean Square

• For each source, the ratio of the SS to the degrees of freedom is called the mean square (MS).

• To calculate mean squares, use the formula

$$MS = \frac{\text{sum of squares}}{\text{degrees of freedom}}$$


ANOVA Table and F Test

• In the simple linear regression model, the hypotheses H0: β1 = 0 vs. H1: β1 ≠ 0 are tested by the F statistic.

• The F statistic is given by

$$F = \frac{MSM}{MSE}$$

• Under H0, the F statistic has an F(1, n-2) distribution, which we can use to find the P-value.

• Example…
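A minimal sketch of the F computation on made-up data is given below. It assumes the standard degrees of freedom (1 for the model, n-2 for error) and a hypothetical function name. It also returns the t statistic for the slope, since in simple linear regression the F statistic equals the square of that t statistic.

```python
import math

def anova_f(x, y):
    """Return (MSM, MSE, F, t) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssm = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msm = ssm / 1            # model mean square, 1 degree of freedom
    mse = sse / (n - 2)      # error mean square, n - 2 degrees of freedom
    f_stat = msm / mse

    # t statistic for the slope, using s = sqrt(MSE)
    t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))
    return msm, mse, f_stat, t_stat

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
msm, mse, f_stat, t_stat = anova_f(x, y)
print(msm, mse, f_stat, t_stat)
```

The P-value would be the upper-tail area of the F(1, n-2) distribution beyond `f_stat`, obtained from a table or statistical software.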


Residual Analysis

• We will use residuals to examine the following six types of departures from the model.

The regression is nonlinear

The error terms do not have constant variance

The error terms are not independent

The model fits, but there are some outliers

The error terms are not normally distributed

One or more important variables have been omitted from the model


Residual plots

• We will use residual plots to examine the aforementioned types of departures. The plots that we will use are:

Residuals versus the fitted values

Residuals versus time (when the data are obtained in a time sequence) or other variables

Normal probability plot of the residuals

Histograms, stemplots and boxplots of the residuals
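Before drawing any of these plots, the residuals and fitted values have to be computed. The sketch below does this on made-up data with a hypothetical function name and prints the pairs that a residuals-versus-fitted plot would display; the plotting itself is left to whatever software is in use.

```python
def residuals_vs_fitted(x, y):
    """Return (fitted value, residual) pairs sorted by fitted value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    pairs = [(b0 + b1 * xi, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]
    return sorted(pairs)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
pairs = residuals_vs_fitted(x, y)
for fitted, resid in pairs:
    print(f"fitted = {fitted:5.2f}   residual = {resid:6.2f}")
```

For a least-squares line with an intercept the residuals always sum to zero, so the plot is read for patterns (curvature, fanning, isolated large values) rather than for its overall level.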


Example

• Below are the residual plots from the model predicting GPA based on SAT scores….