STA291 Statistical Methods Lecture 27. Inference for Regression
![Page 1: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/1.jpg)
STA291 Statistical Methods
Lecture 27
![Page 2: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/2.jpg)
Inference for Regression
Does the cost of a movie depend on its length?
Now we want to know, how useful is this model?
![Page 3: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/3.jpg)
The Population and the Sample
The movie budget sample is based on 120 observations. But we know observations vary from sample to sample. So we imagine a true line that summarizes the relationship between x and y for the entire population,
$$\mu_y = \beta_0 + \beta_1 x$$
where µy is the population mean of y at a given value of x.
We write µy instead of y because the regression line assumes that the means of the y values for each value of x fall exactly on the line.
![Page 4: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/4.jpg)
For a given value x:
Most, if not all, of the y values obtained from a particular sample will not lie on the line.
The sampled y values will be distributed about µy.
We can account for the difference between each observed y and µy by adding the error, ε:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
The Population and the Sample
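To make the idea of sample-to-sample variation concrete, here is a minimal simulation sketch. The population values (β0 = 10, β1 = 2, error standard deviation 3) and the x values are made up for illustration and are not from the movie data; the helper names are hypothetical.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope b1 for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slopes(beta0, beta1, sigma, xs, n_samples, seed=0):
    """Draw repeated samples from y = beta0 + beta1*x + error and refit each time."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        # Each y is the population mean at x plus a Normal error term.
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return slopes


# Assumed (illustrative) population: beta0 = 10, beta1 = 2, sigma = 3.
xs = [float(x) for x in range(1, 21)]
slopes = simulate_slopes(beta0=10.0, beta1=2.0, sigma=3.0, xs=xs, n_samples=500)
mean_slope = sum(slopes) / len(slopes)
print(f"smallest b1 = {min(slopes):.3f}, largest b1 = {max(slopes):.3f}")
print(f"average b1 over all samples = {mean_slope:.3f}")
```

Each simulated sample gives a different b1, but the values cluster around the true slope β1.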
![Page 5: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/5.jpg)
Regression Inference
Collect a sample and estimate the population β’s by finding a regression line (Chapter 6):
$$\hat{y} = b_0 + b_1 x$$
where b0 estimates β0 and b1 estimates β1.
The residuals e = y – ŷ are the sample-based versions of ε.
Account for the uncertainties in β0 and β1 by making confidence intervals, as we’ve done for means and proportions.
The Population and the Sample
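As a small sketch of the estimation step, the code below fits b0 and b1 by least squares and computes the residuals e = y – ŷ. The five data points are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (b0, b1, residuals) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual = observed y minus predicted y-hat.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return b0, b1, residuals


# Made-up sample, not from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, residuals = fit_line(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals of a least-squares line with an intercept always sum to zero (up to rounding), which is a quick sanity check on the fit.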
![Page 6: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/6.jpg)
Assumptions and Conditions
In this order:
1. Linearity Assumption
2. Independence Assumption
3. Equal Variance Assumption
4. Normal Population Assumption
![Page 7: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/7.jpg)
Summary of Assumptions and Conditions
Assumptions and Conditions
![Page 8: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/8.jpg)
Summary of Assumptions and Conditions
1. Make a scatterplot of the data to check for linearity. (Linearity Assumption)
2. Fit a regression and find the residuals, e, and predicted values ŷ.
3. Plot the residuals against time (if appropriate) and check for evidence of patterns (Independence Assumption).
4. Make a scatterplot of the residuals against x or the predicted values. This plot should not exhibit a “fan” or “cone” shape. (Equal Variance Assumption)
5. Make a histogram and Normal probability plot of the residuals. (Normal Population Assumption)
Assumptions and Conditions
![Page 9: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/9.jpg)
The Standard Error of the Slope
For a sample, we expect b1 to be close, but not equal to the model slope β1. For similar samples, the standard error of the slope is a measure of the variability of b1 about the true slope β1.
Spread around the line: se
Spread of the x values: sx
Sample size: n
![Page 10: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/10.jpg)
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare se’s.
The Standard Error of the Slope
$$SE(b_1) = \frac{s_e}{s_x \sqrt{n-1}}$$
![Page 11: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/11.jpg)
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare sx’s.
The Standard Error of the Slope
$$SE(b_1) = \frac{s_e}{s_x \sqrt{n-1}}$$
![Page 12: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/12.jpg)
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? Hint: Compare n’s.
The Standard Error of the Slope
$$SE(b_1) = \frac{s_e}{s_x \sqrt{n-1}}$$
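The three comparisons above (se, sx, and n) can be checked numerically. This is a minimal sketch of the standard error formula; the input values are arbitrary and chosen only to show the direction of each effect.

```python
import math


def slope_standard_error(s_e, s_x, n):
    """SE(b1) = s_e / (s_x * sqrt(n - 1))."""
    return s_e / (s_x * math.sqrt(n - 1))


# Arbitrary baseline values for illustration.
base = slope_standard_error(s_e=2.0, s_x=3.0, n=26)
less_scatter = slope_standard_error(s_e=1.0, s_x=3.0, n=26)  # smaller spread around the line
wider_x = slope_standard_error(s_e=2.0, s_x=6.0, n=26)       # larger spread of x values
more_data = slope_standard_error(s_e=2.0, s_x=3.0, n=101)    # larger sample size

print(f"baseline      SE(b1) = {base:.4f}")
print(f"smaller s_e   SE(b1) = {less_scatter:.4f}")
print(f"larger s_x    SE(b1) = {wider_x:.4f}")
print(f"larger n      SE(b1) = {more_data:.4f}")
```

Less scatter around the line, a wider range of x values, and more observations each shrink SE(b1), so each gives a more consistent slope estimate.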
![Page 13: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/13.jpg)
A Test for the Regression Slope
When the conditions are met, the standardized estimated regression slope,
$$t = \frac{b_1 - \beta_1}{SE(b_1)}$$
follows a t-distribution with df = n – 2. We estimate SE(b1) with:
$$SE(b_1) = \frac{s_e}{s_x \sqrt{n-1}}$$
where sx is the ordinary standard deviation of the x’s and
$$s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}$$
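The pieces of the test statistic can be assembled from raw data as follows. This is a sketch under the formulas on this slide, using an invented five-point sample; `slope_t_statistic` is a hypothetical helper, and the P-value lookup is left out because it needs a t-table or statistics package.

```python
import math
import statistics


def slope_t_statistic(xs, ys, beta1_null=0.0):
    """Compute b1, s_e, SE(b1), the t statistic and df for a simple regression."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # s_e: standard deviation of the residuals, with n - 2 in the denominator.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    # s_x: ordinary sample standard deviation of the x values.
    s_x = statistics.stdev(xs)
    se_b1 = s_e / (s_x * math.sqrt(n - 1))
    t = (b1 - beta1_null) / se_b1
    return {"b1": b1, "s_e": s_e, "se_b1": se_b1, "t": t, "df": n - 2}


# Made-up sample, not from the lecture.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
result = slope_t_statistic(xs, ys)
print({key: round(value, 4) for key, value in result.items()})
```

The resulting t would then be compared with a t-distribution on n – 2 degrees of freedom.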
![Page 14: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/14.jpg)
The usual null hypothesis about the slope is that it’s equal to 0. Why?
A slope of zero says that y doesn’t tend to change linearly when x changes. In other words, if the slope equals zero, there is no linear association between the two variables.
H0: β1 = 0. This would mean that x and y are not linearly related.
Ha: β1 ≠ 0. This would mean . . .
A Test for the Regression Slope
![Page 15: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/15.jpg)
CI for the Regression Slope
When the assumptions and conditions are met, we can find a confidence interval for β1 from
$$b_1 \pm t^*_{n-2} \times SE(b_1)$$
where the critical value t* depends on the confidence level and has df = n – 2.
![Page 16: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/16.jpg)
16.4 A Test for the Regression Slope
Example: Soap
A soap manufacturer tested a standard bar of soap to see how long it would last. A test subject showered with the soap each day for 15 days and recorded the weight (in grams) remaining. Conditions were met so a linear regression gave the following:
Dependent variable is: Weight
R squared = 99.5%   s = 2.949

| Variable | Coefficient | SE(Coeff) | t-ratio | P-value |
|---|---|---|---|---|
| Intercept | 123.141 | 1.382 | 89.1 | <0.0001 |
| Day | -5.57476 | 0.1068 | -52.2 | <0.0001 |
What is the standard deviation of the residuals?
What is the standard error of b1?
What are the hypotheses for the regression slope?
At α = 0.05, what is the conclusion?
![Page 17: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/17.jpg)
16.4 A Test for the Regression Slope
Example : Soap
What is the standard deviation of the residuals? se = 2.949
What is the standard error of b1? SE(b1) = 0.1068
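One quick way to confirm which number in the output is SE(b1): under H0: β1 = 0, the t-ratio is the coefficient divided by its standard error. The sketch below redoes that division with the values from the regression output for the soap example.

```python
# Values copied from the soap regression output.
day_coefficient = -5.57476
day_se = 0.1068
intercept_coefficient = 123.141
intercept_se = 1.382

# With a null value of 0, t = (coefficient - 0) / SE(coefficient).
day_t_ratio = day_coefficient / day_se
intercept_t_ratio = intercept_coefficient / intercept_se

print(f"t-ratio for Day:       {day_t_ratio:.1f}")
print(f"t-ratio for Intercept: {intercept_t_ratio:.1f}")
```

Both ratios agree with the t-ratio column of the output, which is consistent with SE(b1) being the SE(Coeff) entry in the Day row.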
![Page 18: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/18.jpg)
16.4 A Test for the Regression Slope
Example : Soap
What are the hypotheses for the regression slope?
H0: β1 = 0
Ha: β1 ≠ 0
At α = 0.05, what is the conclusion? Since the P-value is small (<0.0001), reject the null hypothesis. There is strong evidence of a linear relationship between Weight and Day.
![Page 19: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/19.jpg)
16.4 A Test for the Regression Slope
Example : Soap
Find a 95% confidence interval for the slope.
Interpret the 95% confidence interval for the slope.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion?
![Page 20: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/20.jpg)
16.4 A Test for the Regression Slope
Example: Soap
Find a 95% confidence interval for the slope.
$$b_1 \pm t^* \times SE(b_1) = -5.57476 \pm (2.160)(0.1068) = (-5.805, -5.344)$$
Interpret the 95% confidence interval for the slope. We can be 95% confident that the weight of the soap decreases by between 5.344 and 5.805 grams per day.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion? Yes, the interval does not contain zero, so reject the null hypothesis.
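The interval arithmetic for the soap example can be reproduced in a few lines. This sketch takes the critical value t* = 2.160 (df = 15 – 2 = 13) as given on the slide rather than computing it; `slope_confidence_interval` is a hypothetical helper name.

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Return (low, high) for b1 +/- t* x SE(b1)."""
    margin = t_star * se_b1
    return (b1 - margin, b1 + margin)


# b1 and SE(b1) from the soap regression output; t* from the slide.
low, high = slope_confidence_interval(b1=-5.57476, se_b1=0.1068, t_star=2.160)
contains_zero = low <= 0.0 <= high

print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
print("interval contains zero:", contains_zero)
```

Because zero lies outside the interval, the confidence interval and the two-sided test at α = 0.05 lead to the same conclusion.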
![Page 21: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/21.jpg)
Don’t fit a linear regression to data that aren’t straight.
Watch out for changing spread.
Watch out for non-Normal errors. Check the histogram and the Normal probability plot.
Watch out for extrapolation. It is always dangerous to predict for x-values that lie far away from the center of the data.
![Page 22: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/22.jpg)
Watch out for high-influence points and unusual observations.
Watch out for one-tailed tests. Most software packages perform only two-tailed tests. Adjust your P-values accordingly.
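One common way to make that adjustment, sketched here under the assumption that the reported P-value is two-sided and comes from a symmetric t-distribution: halve it when the sign of the t-ratio agrees with the alternative, and use one minus half of it otherwise. The function name and the example numbers are illustrative only.

```python
def one_sided_p_value(two_sided_p, t_stat, alternative):
    """Convert a two-sided P-value to a one-sided one.

    alternative is "greater" (Ha: beta1 > 0) or "less" (Ha: beta1 < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_sided_p / 2.0
    # Does the observed t point in the direction stated by Ha?
    agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    return half if agrees else 1.0 - half


# Illustrative numbers, not from the lecture.
p_less = one_sided_p_value(two_sided_p=0.08, t_stat=-1.9, alternative="less")
p_greater = one_sided_p_value(two_sided_p=0.08, t_stat=-1.9, alternative="greater")
print(f"Ha: beta1 < 0 -> P = {p_less:.2f}")
print(f"Ha: beta1 > 0 -> P = {p_greater:.2f}")
```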
![Page 23: STA291 Statistical Methods Lecture 27. Inference for Regression](https://reader035.vdocuments.site/reader035/viewer/2022062516/56649e0d5503460f94af6fab/html5/thumbnails/23.jpg)
Looking back
- Know the assumptions and conditions for inference about regression coefficients and how to check them, in this order: LIEN (Linearity, Independence, Equal Variance, Normal Population)
- Know the components of the standard error of the slope coefficient
- Test statistic
- CI interpretation