ANOVA for Regression
ANOVA tests whether the regression model has any explanatory power.
In simple regression, the ANOVA F test and the t test for β1 are equivalent: both test H0: β1 = 0, and F = t².
ANOVA for Regression
MSE = SSE/(n-2)
MSR = SSR/p, where p = number of independent variables
F = MSR/MSE
ANOVA Hypothesis Test
H0: β1 = 0
Ha: β1 ≠ 0
Reject H0 if F > Fα, or if p-value < α
Regression and ANOVA

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Regression | SSR | 1 | MSR = SSR/1 | F = MSR/MSE |
| Error | SSE | n-2 | MSE = SSE/(n-2) | |
| Total | SST | n-1 | | |
ANOVA and Regression

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 3364 | 3364 | 273 | 1.23E-15 |
| Residual | 27 | 333.4 | 12.3 | | |
| Total | 28 | 3697 | | | |

Fα = 4.21 given α = .05, df num. = 1, df denom. = 27
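The F statistic in the table can be reproduced from the sums of squares. The sketch below is illustrative only: the function name `anova_f` is made up for this example, SSE is taken as SST - SSR from the rounded table values, and the critical value 4.21 is the one quoted above rather than computed.

```python
def anova_f(ssr, sse, n, p=1):
    """Return (MSR, MSE, F) for a regression with p independent variables."""
    msr = ssr / p
    mse = sse / (n - p - 1)  # equals SSE/(n-2) when p = 1
    return msr, mse, msr / mse


# Rounded values read from the ANOVA table (29 observations, total df = 28).
ssr, sst, n = 3364.0, 3697.0, 29
sse = sst - ssr  # relies on SST = SSR + SSE

msr, mse, f_stat = anova_f(ssr, sse, n)
f_crit = 4.21  # F_alpha quoted on the slide for alpha = .05, df = (1, 27)
reject_h0 = f_stat > f_crit

print(round(mse, 1), round(f_stat), reject_h0)
```

Because the inputs are rounded, the computed F agrees with the table only to about three significant digits.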
Issues with Hypothesis Test Results
• Correlation does NOT prove causation
• The test does not prove we used the correct functional form
Output with Temperature as Y
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.953884648 |
| R Square | 0.909895922 |
| Adjusted R Square | 0.906558734 |
| Standard Error | 5.053605155 |
| Observations | 29 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 6963.27661 | 6963.27661 | 272.6535 | 1.23118E-15 |
| Residual | 27 | 689.5509766 | 25.5389251 | | |
| Total | 28 | 7652.827586 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 67.59301867 | 1.358242515 | 49.7650588 | 4.24E-28 | 64.80613526 | 70.3799021 |
| Thousands of cubic feet | -1.372438825 | 0.083116544 | -16.512222 | 1.23E-15 | -1.542979885 | -1.20189776 |
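The summary statistics in this output are tied together by a few identities, which can be checked directly from the printed numbers. This is a rough sketch, not part of the original slides: the variable names are invented here, and the adjusted R² formula 1 - (1 - R²)(n - 1)/(n - p - 1) is the standard definition, which the slides do not state.

```python
import math

# Values copied from the summary output (temperature as y, one predictor).
n, p = 29, 1
ssr, sse, sst = 6963.27661, 689.5509766, 7652.827586
t_slope = -16.512222

mse = sse / (n - p - 1)
f_stat = (ssr / p) / mse
r2 = ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # standard definition (assumed)
std_err = math.sqrt(mse)                        # "Standard Error" in the output

print(f"F = {f_stat:.4f}, t^2 = {t_slope ** 2:.4f}")
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}, s = {std_err:.4f}")
```

With a single independent variable, F and the squared t statistic for the slope coincide, which is why the Significance F and the slope's P-value are both 1.23E-15.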
[Figure: Temperature and Natural Gas Consumed. Series: average daily temperature and thousands of cubic feet, by month from Jun-07 to Oct-09; vertical axis 0 to 80.]
[Figure: Monthly Natural Gas Use and Temperature. Horizontal axis: Average Daily Temperature (0 to 80); vertical axis: Thousands of cubic feet (0 to 40).]
Confidence Interval for Estimated Mean Value of y
xp = particular or given value of x
yp = value of the dependent variable for xp
E(yp) = expected value of yp, or E(y | x = xp)

$$\hat{y}_p = b_0 + b_1 x_p$$

ŷp is our estimate of E(yp).
Confidence Interval for Estimated Mean Value of y
$$s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$$
Computing b0 and b1, Example
From example of car age, price:

| x | y | (xi - x̄) | (yi - ȳ) | (xi - x̄)(yi - ȳ) | (xi - x̄)² |
|---|---|---|---|---|---|
| 1 | 15 | -3 | 3 | -9 | 9 |
| 3 | 14 | -1 | 2 | -2 | 1 |
| 3 | 11 | -1 | -1 | 1 | 1 |
| 4 | 12 | 0 | 0 | 0 | 0 |
| 9 | 8 | 5 | -4 | -20 | 25 |
| Sum = 20 | 60 | | | -30 | 36 |
| Mean = 4 | 12 | | | | |

b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = -30/36 = -0.83
b0 = ȳ - b1x̄ = 12 - (-0.83)(4) = 15.33
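The least-squares arithmetic in this example is short enough to check in a few lines. The following is a minimal sketch using only the five (x, y) pairs from the slide; the variable names `sxy` and `sxx` are labels chosen for this illustration.

```python
# Data from the slide: x and y for five cars.
x = [1, 3, 3, 4, 9]
y = [15, 14, 11, 12, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of cross-products and sum of squared x deviations.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"b1 = {b1:.3f}, b0 = {b0:.2f}")
```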
Confidence Interval of Conditional Mean

| x | y | (x - x̄)² | ŷ | (ŷ - ȳ)² | (y - ŷ)² | (y - ȳ)² |
|---|---|---|---|---|---|---|
| 1 | 15 | 9 | 14.5 | 6.2 | 0.3 | 9 |
| 3 | 14 | 1 | 12.84 | 0.7 | 1.3 | 4 |
| 3 | 11 | 1 | 12.84 | 0.7 | 3.4 | 1 |
| 4 | 12 | 0 | 12.01 | 0.0 | 0.0 | 0 |
| 9 | 8 | 25 | 7.86 | 17.4 | 0.0 | 16 |
| Sum = 20 | Sum = 60 | 36 | | SSR = 25.0 | SSE = 5.0 | SST = 30 |
| Mean = 4 | Mean = 12 | | | | | |

b1 = -0.833
b0 = 15.33
r² = 25/30 = .833
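The sums of squares in this table can be recomputed from the raw data. The sketch below refits the line with unrounded coefficients, so individual fitted values differ slightly from the slide's (which appear to use b0 = 15.33 and b1 = -0.83), while the totals agree. The helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


x = [1, 3, 3, 4, 9]
y = [15, 14, 11, 12, 8]
b0, b1 = fit_line(x, y)

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
r2 = ssr / sst

print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, SST = {sst:.0f}, r2 = {r2:.3f}")
```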
Confidence Interval of Conditional Mean

$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{5}{5-2}} = 1.29$$

$$s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 1.29\sqrt{\frac{1}{5} + \frac{(5-4)^2}{36}} = 1.29\sqrt{0.228} = 0.616$$
Confidence Interval of Conditional Mean
Given 1 - α = .95 and df = 3:

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p} = 11.18 \pm 3.182(0.616) = 11.18 \pm 1.96 = [9.22,\ 13.14]$$
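The interval for the conditional mean at xp = 5 can be reproduced as follows. This is a sketch under stated assumptions: the function name is invented, the critical value 3.182 is taken from the slide rather than computed, and the fit uses unrounded coefficients, so ŷp comes out near 11.17 rather than the slide's 11.18 and the endpoints shift by about 0.01.

```python
import math


def mean_confidence_interval(x, y, x_p, t_crit):
    """Confidence interval for E(y_p) at x = x_p in simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    s_yhat = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)

    y_hat = b0 + b1 * x_p
    margin = t_crit * s_yhat
    return y_hat, s_yhat, (y_hat - margin, y_hat + margin)


y_hat, s_yhat, (lo, hi) = mean_confidence_interval(
    [1, 3, 3, 4, 9], [15, 14, 11, 12, 8], x_p=5, t_crit=3.182
)
print(f"y_hat = {y_hat:.2f}, s_yhat = {s_yhat:.3f}, CI = [{lo:.2f}, {hi:.2f}]")
```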
Confidence Interval for Predicted Values of y
A confidence interval for a predicted value of y must take into account both the random error in the estimated regression line (b0 and b1) and the random deviations of individual values from the regression line.
Confidence Interval for Predicted Value of y

$$s_{ind} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

$$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$$
Confidence Interval of Individual Value

$$s_{ind} = s\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 1.29\sqrt{1 + \frac{1}{5} + \frac{(5-4)^2}{36}} = 1.29\sqrt{1.228} = 1.43$$
Confidence Interval of Individual Value
Given 1 - α = .95 and df = 3:

$$\hat{y}_p \pm t_{\alpha/2}\, s_{ind} = 11.18 \pm 3.182(1.43) = 11.18 \pm 4.55 = [6.63,\ 15.73]$$
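The interval for an individual value differs from the interval for the mean only by the extra 1 under the square root. A minimal sketch, again with a hypothetical function name, the slide's critical value 3.182, and unrounded coefficients (so the endpoints differ from the slide's by about 0.01):

```python
import math


def individual_interval(x, y, x_p, t_crit):
    """Interval for an individual y value at x = x_p in simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # The leading 1 accounts for the scatter of individual values around the line.
    s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)

    y_hat = b0 + b1 * x_p
    margin = t_crit * s_ind
    return s_ind, (y_hat - margin, y_hat + margin)


s_ind, (lo, hi) = individual_interval(
    [1, 3, 3, 4, 9], [15, 14, 11, 12, 8], x_p=5, t_crit=3.182
)
print(f"s_ind = {s_ind:.2f}, interval = [{lo:.2f}, {hi:.2f}]")
```

The result is more than twice as wide as the interval for the conditional mean at the same xp, since s_ind = 1.43 against 0.616.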
Residual Plots Against x
Residual – the difference between the observed value and the predicted value
Look for:
• Evidence of a nonconstant variance
• Nonlinear relationship
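As a small illustration of the definition, the residuals for the five-point car example from earlier in the deck can be listed against x. This sketch only computes the (x, residual) pairs that such a plot would show; plotting itself is left out, and the variable names are illustrative.

```python
x = [1, 3, 3, 4, 9]
y = [15, 14, 11, 12, 8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed value minus predicted value.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, e in zip(x, residuals):
    print(f"x = {xi}: residual = {e:+.3f}")
```

With an intercept in the model the residuals sum to zero, so what matters in the plot is their pattern against x (fanning out, or a curve), not their average.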
Regression and Outliers
Outliers can have a disproportionate effect on the estimated regression line.
[Figure: Natural Gas Usage and Temperature. Horizontal axis: Temperature (10 to 100); vertical axis: 000's Cubic Feet (0 to 40).]

| | Coefficients |
|---|---|
| Intercept | 36.19972 |
| X Variable 1 | -0.44381 |
Regression and Outliers
One solution is to estimate the model with and without the outlier.
Questions to ask:
• Is the value an error?
• Does the value reflect some unique circumstance?
• Is the data point providing unique information about values outside of the range of other observations?
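Estimating the model with and without a point is a mechanical comparison. The natural gas data are not reproduced in the slides, so the sketch below reuses the five-point car example and drops the point farthest from the others in x, (9, 8). That point is not identified as an outlier in the source; it is used here only to show how the comparison works.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


points = [(1, 15), (3, 14), (3, 11), (4, 12), (9, 8)]
suspect = (9, 8)  # illustrative choice, not flagged as an outlier in the source

b0_all, b1_all = fit_line(*zip(*points))
kept = [pt for pt in points if pt != suspect]
b0_wo, b1_wo = fit_line(*zip(*kept))

print(f"with point:    b0 = {b0_all:.2f}, b1 = {b1_all:.3f}")
print(f"without point: b0 = {b0_wo:.2f}, b1 = {b1_wo:.3f}")
```

A large shift in the coefficients between the two fits indicates that the single observation has substantial influence on the estimated line.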
Chapter 15
Multiple Regression
Regression
Multiple Regression Model
y = β0 + β1x1 + β2x2 + … + βpxp + ε
Multiple Regression Equation
E(y) = β0 + β1x1 + β2x2 + … + βpxp
Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + … + bpxp
Car Data

| MPG | Weight | Year | Cylinders |
|---|---|---|---|
| 18 | 3504 | 70 | 8 |
| 15 | 3693 | 70 | 8 |
| 18 | 3436 | 70 | 8 |
| 16 | 3433 | 70 | 8 |
| 17 | 3449 | 70 | 8 |
| 15 | 4341 | 70 | 8 |
| 14 | 4354 | 70 | 8 |
| 14 | 4312 | 70 | 8 |
| 14 | 4425 | 70 | 8 |
| 15 | 3850 | 70 | 8 |
| 15 | 3563 | 70 | 8 |
| 14 | 3609 | 70 | 8 |
| … | … | … | … |
Multiple Regression, Example

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | 46.3 | 0.800 | 57.8 |
| Weight | -0.00765 | 0.000259 | -29.4 |

R Square 0.687

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.7 | 3.96 | -3.71 |
| Weight | -0.00665 | 0.000214 | -31.0 |
| Year | 0.763 | 0.0490 | 15.5 |

R Square 0.807
Multiple Regression, Example
| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.4 | 4.03 | -3.58 |
| Weight | -0.00652 | 0.000460 | -14.1 |
| Year | 0.760 | 0.0498 | 15.2 |
| Cylinders | -0.0741 | 0.232 | -0.319 |

R Square 0.807

Predicted MPG for a car weighing 4000 lbs, built in 1980, with 6 cylinders:
-14.4 - .00652(4000) + .76(80) - .0741(6) = -14.4 - 26.08 + 60.8 - .4446 = 19.88
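The prediction is just the estimated equation evaluated at the given values. A short sketch, with the coefficients copied from the three-variable output and a dictionary layout chosen purely for illustration (note that Year is coded as two digits, so 1980 enters as 80):

```python
# Coefficients from the estimated equation with Weight, Year, and Cylinders.
intercept = -14.4
coefficients = {"Weight": -0.00652, "Year": 0.760, "Cylinders": -0.0741}

# Car to predict: 4000 lbs, model year 1980 (coded 80), 6 cylinders.
car = {"Weight": 4000, "Year": 80, "Cylinders": 6}

predicted_mpg = intercept + sum(coefficients[name] * car[name] for name in coefficients)
print(f"Predicted MPG: {predicted_mpg:.2f}")
```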
Multiple Regression Model
$$SSE = \sum (y_i - \hat{y}_i)^2$$

$$SSR = \sum (\hat{y}_i - \bar{y})^2$$

$$SST = \sum (y_i - \bar{y})^2$$
SST = SSR + SSE
Multiple Coefficient of Determination
The share of the variation explained by the estimated model.
R2 = SSR/SST
F Test for Overall Significance
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero
Reject H0 if F > Fα, or if p-value < α
F = MSR/MSE
ANOVA Table for Multiple Regression Model
| Source | Sum of Squares | Degrees of Freedom | Mean Squares | F |
|---|---|---|---|---|
| Regression | SSR | p | MSR = SSR/p | F = MSR/MSE |
| Error | SSE | n-p-1 | MSE = SSE/(n-p-1) | |
| Total | SST | n-1 | | |
t Test for Coefficients
H0: β1 = 0
Ha: β1 ≠ 0
Reject H0 if t < -tα/2 or t > tα/2, or if p-value < α
t = b1/s_b1, with a t distribution of n-p-1 df
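Applied to the three-variable car regression shown earlier, the test statistic is each coefficient divided by its standard error. The sketch below makes one loud assumption: the slides do not state the sample size, so the two-sided critical value is set to 1.96, the large-sample approximation for α = .05, rather than an exact t value for n-p-1 df.

```python
# (coefficient, standard error) pairs from the Weight/Year/Cylinders output.
estimates = {
    "Weight": (-0.00652, 0.000460),
    "Year": (0.760, 0.0498),
    "Cylinders": (-0.0741, 0.232),
}

T_CRIT = 1.96  # ASSUMPTION: large-sample value; n is not given in the slides

t_stats = {name: b / s_b for name, (b, s_b) in estimates.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name in estimates:
    verdict = "reject H0" if significant[name] else "do not reject H0"
    print(f"{name}: t = {t_stats[name]:.2f} ({verdict})")
```

The computed ratios match the t Stat column up to rounding of the printed coefficients. Cylinders falls far short of the critical value, consistent with R Square staying at 0.807 when it is added.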
Multicollinearity
When two or more independent variables are highly correlated.
When multicollinearity is severe, the estimated values of the coefficients will be unreliable.
Two guidelines for detecting multicollinearity. It may be a problem if:
• The absolute value of the correlation coefficient for two independent variables exceeds 0.7
• The correlation coefficient between an independent variable and some other independent variable is greater than its correlation with the dependent variable
Multicollinearity
| | MPG | Weight | Year | Cylinders |
|---|---|---|---|---|
| MPG | 1 | | | |
| Weight | -0.829 | 1 | | |
| Year | 0.578 | -0.300 | 1 | |
| Cylinders | -0.773 | 0.895 | -0.344 | 1 |
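The two multicollinearity guidelines can be applied to this correlation matrix mechanically. The sketch below is one possible reading of them: both rules are evaluated on absolute values, and a pair is flagged under the second rule if the condition holds for either variable in the pair. The helper `r` and the list names are invented for this example.

```python
from itertools import combinations

# Correlations from the matrix (MPG is the dependent variable).
corr = {
    ("MPG", "Weight"): -0.829,
    ("MPG", "Year"): 0.578,
    ("MPG", "Cylinders"): -0.773,
    ("Weight", "Year"): -0.300,
    ("Weight", "Cylinders"): 0.895,
    ("Year", "Cylinders"): -0.344,
}


def r(a, b):
    """Look up a correlation regardless of the order of the two names."""
    return corr[(a, b)] if (a, b) in corr else corr[(b, a)]


dependent = "MPG"
predictors = ["Weight", "Year", "Cylinders"]

# Guideline 1: |r| between two independent variables exceeds 0.7.
rule1 = [(a, b) for a, b in combinations(predictors, 2) if abs(r(a, b)) > 0.7]

# Guideline 2: two independent variables are more strongly correlated with
# each other than at least one of them is with the dependent variable.
rule2 = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(r(a, b)) > min(abs(r(a, dependent)), abs(r(b, dependent)))
]

print("Guideline 1 flags:", rule1)
print("Guideline 2 flags:", rule2)
```

Both guidelines single out Weight and Cylinders, the same pair whose joint inclusion left the Cylinders coefficient with a t statistic near zero.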