Chapter 15
Multiple Regression
Regression
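Multiple Regression Model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

Multiple Regression Equation

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Estimated Multiple Regression Equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Here $\beta_0, \ldots, \beta_p$ are the unknown parameters, $\varepsilon$ is the error term, and $b_0, \ldots, b_p$ are the sample estimates of those parameters.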
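Car Data

| MPG | Weight (lb) | Year | Cylinders |
|---|---|---|---|
| 18 | 3504 | 70 | 8 |
| 15 | 3693 | 70 | 8 |
| 18 | 3436 | 70 | 8 |
| 16 | 3433 | 70 | 8 |
| 17 | 3449 | 70 | 8 |
| 15 | 4341 | 70 | 8 |
| 14 | 4354 | 70 | 8 |
| 14 | 4312 | 70 | 8 |
| 14 | 4425 | 70 | 8 |
| 15 | 3850 | 70 | 8 |
| … | … | … | … |

The data continue for 397 observations in total.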
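Multiple Regression, Example

Weight only:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | 46.3 | 0.800 | 57.8 |
| Weight | -0.00765 | 0.000259 | -29.4 |

R Square 0.687

Weight and Year:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.7 | 3.96 | -3.71 |
| Weight | -0.00665 | 0.000214 | -31.0 |
| Year | 0.763 | 0.0490 | 15.5 |

R Square 0.807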
Multiple Regression, Example
| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.4 | 4.03 | -3.58 |
| Weight | -0.00652 | 0.000460 | -14.1 |
| Year | 0.760 | 0.0498 | 15.2 |
| Cylinders | -0.0741 | 0.232 | -0.319 |
R Square 0.807
Predicted MPG for a car weighing 4000 lbs, built in 1980 (Year = 80), with 6 cylinders:

$$\hat{y} = -14.4 - 0.00652(4000) + 0.760(80) - 0.0741(6) = -14.4 - 26.08 + 60.8 - 0.4446 \approx 19.88$$
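The prediction above is just the estimated regression equation evaluated at one point. Below is a minimal Python sketch of that evaluation using the rounded coefficients from the three-variable car model. The function name `predict` and the dictionary layout are illustrative choices, not from the chapter. Year is entered as 80 for 1980, as in the hand calculation.

```python
# Rounded coefficients of the MPG model with Weight, Year and Cylinders
COEFS = {"Intercept": -14.4, "Weight": -0.00652, "Year": 0.76, "Cylinders": -0.0741}


def predict(coefs: dict[str, float], x: dict[str, float]) -> float:
    """Evaluate y-hat = b0 + b1*x1 + ... + bp*xp for one observation."""
    y_hat = coefs["Intercept"]
    for name, value in x.items():
        y_hat += coefs[name] * value
    return y_hat


# 4000 lb car, model year 1980 (coded as 80), 6 cylinders
mpg_hat = predict(COEFS, {"Weight": 4000, "Year": 80, "Cylinders": 6})
print(round(mpg_hat, 2))  # 19.88
```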
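Sums of Squares

Sum of squares due to error:

$$SSE = \sum (y_i - \hat{y}_i)^2$$

Sum of squares due to regression:

$$SSR = \sum (\hat{y}_i - \bar{y})^2$$

Total sum of squares:

$$SST = \sum (y_i - \bar{y})^2$$

$$SST = SSR + SSE$$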
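The sketch below checks SST = SSR + SSE numerically. To stay self-contained, it fits a one-variable least squares line (MPG on Weight) to only the ten car rows listed earlier. Its numbers will therefore not match the 397-observation output on the slides. The identity itself holds for any least squares fit that includes an intercept, multiple regression included. The function names are illustrative.

```python
# The ten rows of the car data shown earlier (MPG and Weight only)
mpg = [18, 15, 18, 16, 17, 15, 14, 14, 14, 15]
weight = [3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sums_of_squares(y: list[float], y_hat: list[float]) -> tuple[float, float, float]:
    """Return (SSE, SSR, SST) for observed y and fitted y-hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, ssr, sst


b0, b1 = fit_line(weight, mpg)
y_hat = [b0 + b1 * w for w in weight]
sse, ssr, sst = sums_of_squares(mpg, y_hat)
print(f"SST = {sst:.1f}, SSR + SSE = {ssr + sse:.1f}")  # both 22.4
```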
Multiple Coefficient of Determination

The share of the total variation in y explained by the estimated model:

$$R^2 = \frac{SSR}{SST}$$

Multiple Correlation Coefficient

$$R = \sqrt{R^2} = r_{y\hat{y}}$$

This is the correlation coefficient between the actual and predicted values of y.
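Here is a quick check of both definitions. It assumes the sums of squares reported in this chapter's ANOVA example for the Weight/Year/Cylinders car model (SSR = 19382, SST = 24021). Taking R as the square root of R² matches the correlation between y and ŷ for a least squares fit with an intercept.

```python
import math

# Sums of squares from the ANOVA example for the three-variable car model
SSR = 19382.0
SST = 24021.0

r_squared = SSR / SST              # multiple coefficient of determination
multiple_r = math.sqrt(r_squared)  # multiple correlation coefficient

print(f"R Square   = {r_squared:.3f}")   # R Square   = 0.807
print(f"Multiple R = {multiple_r:.3f}")  # Multiple R = 0.898
```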
Adjusted Multiple Coefficient of Determination

$$R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$$

| Regression Statistics | |
|---|---|
| Multiple R | 0.898 |
| R Square | 0.807 |
| Adjusted R Square | 0.805 |
| Standard Error | 3.44 |
| Observations | 397 |
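The adjusted R² and the Standard Error line can both be recomputed from the ANOVA sums. The Standard Error reported here is the square root of MSE = SSE/(n - p - 1). This sketch plugs in the rounded sums from the car model's ANOVA example, so the last digit could differ slightly from the software output.

```python
import math


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """R_a^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Inputs from the car model's ANOVA output: SSR, SSE, SST, n = 397, p = 3
ssr, sse, sst, n, p = 19382.0, 4638.0, 24021.0, 397, 3

adj = adjusted_r_squared(ssr / sst, n, p)
std_error = math.sqrt(sse / (n - p - 1))  # square root of MSE

print(f"Adjusted R Square = {adj:.3f}")        # Adjusted R Square = 0.805
print(f"Standard Error    = {std_error:.2f}")  # Standard Error    = 3.44
```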
F Test for Overall Significance
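$H_0\colon \beta_1 = \beta_2 = \cdots = \beta_p = 0$

$H_a\colon$ One or more of the parameters is not equal to zero

$F = MSR/MSE$

Reject $H_0$ if $F > F_\alpha$, or if p-value $< \alpha$. Here $F_\alpha$ comes from an F distribution with $p$ numerator and $n - p - 1$ denominator degrees of freedom.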
ANOVA Table for Multiple Regression Model
| Source | Sum of Squares | Degrees of Freedom | Mean Squares | F |
|---|---|---|---|---|
| Regression | SSR | p | MSR = SSR/p | F = MSR/MSE |
| Error | SSE | n - p - 1 | MSE = SSE/(n - p - 1) | |
| Total | SST | n - 1 | | |
ANOVA Example
| ANOVA | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 19382 | 6460 | 547 | 6.42E-140 |
| Residual | 393 | 4638 | 11.8 | | |
| Total | 396 | 24021 | | | |
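The ANOVA rows follow mechanically from SSR, SSE, n and p. This sketch rebuilds them for the car model. The function name is illustrative. Because the reported sums are rounded, SSR + SSE comes to 24020 rather than the printed 24021. Significance F is the upper-tail probability of an F distribution with p and n - p - 1 degrees of freedom. The standard library has no F distribution, so that value is left out here; SciPy's `scipy.stats.f.sf` is one way to obtain it.

```python
def anova_table(ssr: float, sse: float, n: int, p: int) -> dict[str, tuple]:
    """Build (df, SS, MS, F) rows of the regression ANOVA table."""
    df_reg, df_err = p, n - p - 1
    msr = ssr / df_reg
    mse = sse / df_err
    return {
        "Regression": (df_reg, ssr, msr, msr / mse),
        "Residual": (df_err, sse, mse, None),
        "Total": (n - 1, ssr + sse, None, None),
    }


table = anova_table(ssr=19382, sse=4638, n=397, p=3)
for source, (df, ss, ms, f) in table.items():
    ms_text = "" if ms is None else f"{ms:.1f}"
    f_text = "" if f is None else f"{f:.0f}"
    print(f"{source:10s} {df:4d} {ss:8.0f} {ms_text:>8s} {f_text:>5s}")
```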
t Test for Coefficients
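$H_0\colon \beta_1 = 0$

$H_a\colon \beta_1 \neq 0$

$t = b_1 / s_{b_1}$, which follows a t distribution with $n - p - 1$ degrees of freedom.

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, or if p-value $< \alpha$.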
t Test Example
| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -14.48 | 4.038 | -3.587 | 0.0003769 |
| Weight | -0.006525 | 0.0004603 | -14.18 | 3.892E-37 |
| Year | 0.7608 | 0.04985 | 15.26 | 1.258E-41 |
| Cylinders | -0.07420 | 0.2322 | -0.3196 | 0.7494 |
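The t statistics in this output are each coefficient divided by its standard error. This sketch recomputes them and applies the rejection rule at α = 0.05. As a stated approximation, it uses the standard normal critical value z₀.₀₂₅ ≈ 1.96 in place of t₀.₀₂₅. With 393 degrees of freedom the exact t value is only slightly larger, and no decision here is close to that boundary.

```python
from statistics import NormalDist

# (coefficient, standard error) pairs from the regression output
ESTIMATES = {
    "Intercept": (-14.48, 4.038),
    "Weight": (-0.006525, 0.0004603),
    "Year": (0.7608, 0.04985),
    "Cylinders": (-0.07420, 0.2322),
}
ALPHA = 0.05

# Approximation: with n - p - 1 = 393 df the t distribution is close to the
# standard normal, so z_{alpha/2} (about 1.96) stands in for t_{alpha/2}.
critical = NormalDist().inv_cdf(1 - ALPHA / 2)

decisions = {}
for name, (b, s_b) in ESTIMATES.items():
    t = b / s_b  # t = b_i / s_{b_i}
    decisions[name] = "reject H0" if abs(t) > critical else "do not reject H0"
    print(f"{name:10s} t = {t:8.3f}  {decisions[name]}")
```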
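Multicollinearity

Multicollinearity occurs when two or more independent variables are highly correlated.

When multicollinearity is severe, the estimated coefficients are unreliable. Their standard errors are inflated, so a coefficient may appear insignificant or change substantially when a correlated variable is added or dropped.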
Multicollinearity

Two guidelines for identifying multicollinearity:

• The absolute value of the correlation coefficient between two independent variables exceeds 0.7.
• The correlation between an independent variable and some other independent variable is greater (in absolute value) than its correlation with the dependent variable.
Multicollinearity
Table of correlation coefficients:

| | MPG | Weight | Year | Cylinders |
|---|---|---|---|---|
| MPG | 1 | | | |
| Weight | -0.829 | 1 | | |
| Year | 0.578 | -0.300 | 1 | |
| Cylinders | -0.773 | 0.895 | -0.344 | 1 |
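The two guidelines can be applied mechanically to the correlation table. This sketch hard-codes the table's values. The helper names are illustrative, and so is the reading of the second guideline: a pair is flagged if their correlation exceeds, in absolute value, either variable's correlation with y.

```python
from itertools import combinations

# Pairwise correlations from the table (each pair stored once)
CORR = {
    ("MPG", "Weight"): -0.829,
    ("MPG", "Year"): 0.578,
    ("MPG", "Cylinders"): -0.773,
    ("Weight", "Year"): -0.300,
    ("Weight", "Cylinders"): 0.895,
    ("Year", "Cylinders"): -0.344,
}


def corr(a: str, b: str) -> float:
    """Look up r(a, b) regardless of the order the pair was stored in."""
    return CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]


def flag_pairs(xs: list[str], y: str, threshold: float = 0.7) -> list[tuple]:
    """Return (a, b, rule1, rule2) for each pair of x's that trips a guideline."""
    flagged = []
    for a, b in combinations(xs, 2):
        r_ab = abs(corr(a, b))
        rule1 = r_ab > threshold
        rule2 = r_ab > abs(corr(a, y)) or r_ab > abs(corr(b, y))
        if rule1 or rule2:
            flagged.append((a, b, rule1, rule2))
    return flagged


print(flag_pairs(["Weight", "Year", "Cylinders"], "MPG"))
# [('Weight', 'Cylinders', True, True)]
```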
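Multicollinearity

Weight, Year and Cylinders:

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -14.4 | 4.03 | -3.58 |
| Weight | -0.00652 | 0.000460 | -14.1 |
| Year | 0.760 | 0.0498 | 15.2 |
| Cylinders | -0.0741 | 0.232 | -0.319 |

R Square 0.807

Year and Cylinders (Weight dropped):

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | -16.9 | 4.95 | -3.42 |
| Year | 0.747 | 0.0612 | 12.21 |
| Cylinders | -2.99 | 0.133 | -22.46 |

R Square 0.708

Weight and Cylinders are highly correlated (r = 0.895). With Weight in the model, Cylinders is not significant (t = -0.319). Once Weight is dropped, Cylinders becomes highly significant (t = -22.46) and its coefficient changes from -0.0741 to -2.99.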
Qualitative Variables and Regression
Quantitative variable – A variable that can be measured numerically (interval or ratio scale of measurement)
Qualitative variable – A variable where labels or names are used to identify some attribute (nominal or ordinal scale of measurement)
Qualitative Variables and Regression
The effect of a qualitative variable can be estimated using a dummy variable.

A dummy variable equals either 0 or 1. It creates different y-intercepts for groups with different attributes.
Qualitative Variables and Regression
Assume we estimate a regression model for the number of sick days an employee takes per year. A dummy variable is included that equals 1 if the individual smokes and 0 if they do not. Age is also included in the model.
Qualitative Variables and Regression
Estimated model:

Sick days taken = -1 + 3(Smoker) + 0.1(Age)

Example of how the data would be coded:

| Sick Days | Smoker | Age |
|---|---|---|
| 3 | 0 | 45 |
| 6 | 1 | 50 |
| 0 | 0 | 20 |
| 5 | 0 | 65 |
| 10 | 1 | 60 |
Dummy Variables

Sick days taken = -1 + 3(Smoker) + 0.1(Age)

• What is the y-intercept for nonsmokers? -1
• What is the y-intercept for smokers? 2 (= -1 + 3)
• What is the predicted number of sick days for a 40-year-old smoker? 6 (= -1 + 3 + 0.1 × 40)
• What is the average difference in the number of sick days taken by smokers and nonsmokers of the same age? 3
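Here is a small sketch of the estimated sick-days model as a function. It reproduces the answers above and shows that the smoker/nonsmoker gap equals the dummy's coefficient at every age. The function name is illustrative.

```python
def sick_days(smoker: int, age: float) -> float:
    """Estimated model: -1 + 3*Smoker + 0.1*Age, with Smoker coded 0 or 1."""
    return -1 + 3 * smoker + 0.1 * age


print(sick_days(0, 0))   # -1.0  intercept for nonsmokers
print(sick_days(1, 0))   # 2.0   intercept for smokers
print(sick_days(1, 40))  # 6.0   40-year-old smoker
# Smokers vs. nonsmokers of the same age: always the dummy's coefficient
print(sick_days(1, 40) - sick_days(0, 40))  # 3.0
```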
Dummy Variables
If an attribute has three or more possible values, you must include k - 1 dummy variables in the model, where k is the number of possible values.
Dummy Variables

Suppose we have three job classifications: manager, operator, and secretary.

Operator dummy equals 1 if the person is an operator, 0 otherwise.

Secretary dummy equals 1 if the person is a secretary, 0 otherwise.

Manager is the omitted group (the choice of omitted group does not alter the predicted values).
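The k - 1 rule exists because, with an intercept in the model, a full set of k dummies always sums to 1. That makes them perfectly collinear with the intercept, so their coefficients cannot be estimated uniquely. A minimal encoding sketch follows. The helper name `dummy_code` and the lowercase level labels are illustrative assumptions.

```python
JOB_LEVELS = ["manager", "operator", "secretary"]


def dummy_code(value: str, levels: list[str], omitted: str) -> dict[str, int]:
    """Encode one categorical value as k - 1 zero/one dummies (omitted group = all zeros)."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != omitted}


for job in JOB_LEVELS:
    print(job, dummy_code(job, JOB_LEVELS, omitted="manager"))
# manager {'operator': 0, 'secretary': 0}
# operator {'operator': 1, 'secretary': 0}
# secretary {'operator': 0, 'secretary': 1}
```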
Dummy Variables

Sick days taken = -1 + 1(Operator) + 1.5(Secretary) + 0.1(Age)

• What is the y-intercept for each job classification? Managers = -1, Operators = 0, Secretaries = 0.5
• What is the predicted number of sick days for a 40-year-old secretary? 4.5
• What is the average difference in the number of sick days taken by operators and secretaries? 0.5 (secretaries take 0.5 more)
Dummy Variables

In some cases there will be multiple sets of dummy variables, such as:

Sick days taken = -1 + 3(Smoker) + 1(Operator) + 1.5(Secretary) + 0.1(Age)

Note that there are now 6 different intercepts:

| Group | Intercept |
|---|---|
| Nonsmoker, Manager (omitted group) | -1 |
| Smoker, Manager | 2 |
| Nonsmoker, Operator | 0 |
| Smoker, Operator | 3 |
| Nonsmoker, Secretary | 0.5 |
| Smoker, Secretary | 3.5 |
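The six intercepts are every combination of the two dummy sets evaluated at Age = 0. This sketch enumerates them with `itertools.product`. The coefficient names are illustrative. Because a nonsmoker has Smoker = 0, the last line also reproduces the 4.5 sick days predicted earlier for a 40-year-old secretary.

```python
from itertools import product

B0, B_SMOKER, B_AGE = -1.0, 3.0, 0.1
B_JOB = {"manager": 0.0, "operator": 1.0, "secretary": 1.5}  # manager is omitted


def sick_days(smoker: int, job: str, age: float) -> float:
    """-1 + 3*Smoker + 1*Operator + 1.5*Secretary + 0.1*Age."""
    return B0 + B_SMOKER * smoker + B_JOB[job] + B_AGE * age


for smoker, job in product((0, 1), B_JOB):
    group = "Smoker" if smoker else "Nonsmoker"
    print(f"{group}, {job}: intercept {sick_days(smoker, job, 0):g}")

print(sick_days(0, "secretary", 40))  # 4.5
```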
Dummy Variables

Note that when dummy variables are used, we are assuming that the coefficients of the other variables are the same for all groups.

In this example, the predicted increase in sick days from aging one year is 0.1 for every group.
If there is reason to believe the effect of an independent variable differs by group, you may want to estimate separate equations for each group.
Nonlinear Relationships

Nonlinear relationships can be modeled by including a variable that is a nonlinear function of an independent variable.

For example, it is usually assumed that health care expenditures increase at an increasing rate as people age.
Nonlinear Relationships

In that case you might try including age squared (AgeSQ) in the model:

Health expend = 500 + 5(Age) + 0.5(AgeSQ)

| Age | Predicted Health Expend |
|---|---|
| 10 | 600 |
| 20 | 800 |
| 30 | 1100 |
| 40 | 1500 |
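The table values come straight from the estimated equation. This sketch recomputes them and shows the "increasing rate": each additional decade adds more than the last. The implied slope at a given age is 5 + Age, the derivative of 5·Age + 0.5·Age². The function name is illustrative.

```python
def health_expend(age: float) -> float:
    """Estimated model: 500 + 5*Age + 0.5*AgeSQ, where AgeSQ = Age squared."""
    return 500 + 5 * age + 0.5 * age ** 2


ages = [10, 20, 30, 40]
values = [health_expend(a) for a in ages]
increments = [later - earlier for earlier, later in zip(values, values[1:])]
print(values)      # [600.0, 800.0, 1100.0, 1500.0]
print(increments)  # [200.0, 300.0, 400.0]
```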
Nonlinear Relationships

If the dependent variable increases at a decreasing rate as the independent variable rises, you might want to include the square root of the independent variable.

If you are unsure of the nature of the relationship, you can use dummy variables for different ranges of values of the independent variable.
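Range dummies let each band of the independent variable have its own intercept shift without assuming a functional form. This sketch codes age into bands using the same k - 1 rule as the job-classification example. The band names and cutpoints are hypothetical, chosen only for illustration. (The square-root option would simply add `math.sqrt(x)` as a new column.)

```python
# Hypothetical age bands, [lower, upper); the cutpoints are illustrative only
AGE_BANDS = {"under_30": (0, 30), "30_to_49": (30, 50), "50_plus": (50, float("inf"))}
OMITTED = "under_30"  # reference band, absorbed into the intercept


def age_band_dummies(age: float) -> dict[str, int]:
    """One 0/1 dummy per non-omitted band: k - 1 dummies for k bands."""
    return {
        band: int(lo <= age < hi)
        for band, (lo, hi) in AGE_BANDS.items()
        if band != OMITTED
    }


for age in (25, 45, 70):
    print(age, age_band_dummies(age))
# 25 {'30_to_49': 0, '50_plus': 0}
# 45 {'30_to_49': 1, '50_plus': 0}
# 70 {'30_to_49': 0, '50_plus': 1}
```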
Non-continuous Relationships

If the relationship between the dependent variable and an independent variable is non-continuous, a slope dummy variable can be used to estimate two sets of coefficients for the independent variable.

For example, if natural gas usage is not affected by temperature when the temperature rises above 60 degrees, we could have:

Gas usage = b0 + b1(GT60) + b2(Temp) + b3(GT60)(Temp)

where GT60 equals 1 if the temperature is above 60 degrees and 0 otherwise.
Non-continuous Relationships
| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 53.002 | 2.415 | 21.95 | 7.48E-18 |
| GT60 | -46.623 | 16.682 | -2.79 | 0.0098 |
| Temp | -0.866 | 0.0595 | -14.56 | 1.02E-13 |
| (GT60)(Temp) | 0.810 | 0.255 | 3.18 | 0.0039 |

Note that at temperatures above 60 degrees, the net effect of a 1-degree increase in temperature on gas usage is -0.056 (= -0.866 + 0.810). At or below 60 degrees, the effect is -0.866.
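The slope dummy gives the temperature effect two values, one on each side of 60 degrees. This sketch evaluates the fitted gas-usage equation with the coefficients from the output and confirms the net slope above 60. It assumes GT60 is 1 only for temperatures strictly above 60. The function and constant names are illustrative.

```python
# Coefficients from the gas-usage regression output
B0, B_GT60, B_TEMP, B_GT60_TEMP = 53.002, -46.623, -0.866, 0.810


def gas_usage(temp: float) -> float:
    """Predicted usage; GT60 is assumed to be 1 only for temperatures above 60."""
    gt60 = 1 if temp > 60 else 0
    return B0 + B_GT60 * gt60 + B_TEMP * temp + B_GT60_TEMP * gt60 * temp


slope_below = B_TEMP                # effect of +1 degree at or below 60
slope_above = B_TEMP + B_GT60_TEMP  # effect of +1 degree above 60
print(f"{slope_below:.3f} {slope_above:.3f}")  # -0.866 -0.056
print(f"{gas_usage(71) - gas_usage(70):.3f}")  # -0.056
```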