Chapter 15: Multiple Linear Regression
Apr 22, 2023
In Chapter 15:

15.1 The General Idea
15.2 The Multiple Linear Regression Model
15.3 Categorical Explanatory Variables in Regression Models
15.4 Regression Coefficients
15.5 ANOVA for Multiple Linear Regression
15.6 Examining Multiple Regression Conditions
15.1 The General Idea
Simple regression considers the relation between a single explanatory variable X and response variable Y.
The General Idea, cont.

Multiple regression considers the relation between multiple explanatory variables (X1, X2, …, Xk) and a response variable Y.
The General Idea, cont.
• The multiple regression model helps “adjust out” the effects of confounding factors (i.e., extraneous lurking variables that bias results).
• Simple linear concepts (Chapter 14) must be mastered before tackling multiple regression.
Simple Regression Model

The simple regression population model is:

μY|x = α + βx

where

μY|x ≡ expected value of Y given value x
α ≡ intercept parameter
β ≡ slope parameter
Simple Regression Model, cont.

Estimates for the model are derived by minimizing ∑residuals² to obtain estimates for the intercept (denoted a) and slope (b):

ŷ = a + bx

The standard error of the regression (sY|x) quantifies variability around the line:

sY|x = √( ∑residuals² / (n − 2) )
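As a sketch of how these estimates arise, the least-squares formulas can be computed directly with NumPy. The data below are made up for illustration; only the formulas (slope b = Sxy/Sxx, intercept a = ȳ − b·x̄, and sY|x = √(∑residuals²/(n − 2))) come from the chapter.

```python
import numpy as np

# Hypothetical data, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Least-squares estimates minimize the sum of squared residuals
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# Standard error of the regression quantifies scatter around the line
residuals = y - (a + b * x)
s_yx = np.sqrt(np.sum(residuals ** 2) / (n - 2))
```

For this toy data, `np.polyfit(x, y, 1)` returns the same slope and intercept.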
Multiple Regression Model

The multiple regression population model with two explanatory variables is:

μY|x = α + β1x1 + β2x2

where

μY|x ≡ expected value of Y given values x1 and x2
α ≡ intercept parameter
β1 ≡ slope parameter for X1
β2 ≡ slope parameter for X2
Multiple Regression Model, cont.
Estimates for the coefficients are derived by minimizing ∑residuals², giving this fitted multiple regression model:

ŷ = a + b1x1 + b2x2

The standard error of the regression quantifies variability around the regression plane.
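Computationally, the same least-squares criterion extends to two predictors by adding one column per variable to a design matrix. A minimal NumPy sketch with made-up data (the values are mine, chosen only for illustration):

```python
import numpy as np

# Hypothetical data for two explanatory variables (illustration only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
y  = np.array([1.2, 2.9, 3.1, 4.8, 5.2, 6.9])

# Design matrix: a column of ones (intercept) plus one column per predictor
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution minimizes the sum of squared residuals
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

# Standard error of the regression around the fitted plane, df = n - k - 1
resid = y - X @ coef
n, k = len(y), 2
s = np.sqrt(np.sum(resid ** 2) / (n - k - 1))
```

A property of the least-squares fit worth noting: the residuals are orthogonal to every column of the design matrix, which is what "minimizing ∑residuals²" implies geometrically.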
Regression Modeling
• A simple regression model fits a line in two-dimensional space
• A multiple regression model with two explanatory variables fits a regression plane in three-dimensional space
• A multiple regression model with k explanatory variables fits a regression “surface” in (k + 1)-dimensional space (which cannot be visualized when k > 2)
Interpretation of Coefficients

Consider a model with two independent variables:

• The intercept predicts where the regression plane crosses the Y axis
• The slope for variable X1 predicts the change in Y per unit X1, holding X2 constant
• The slope for variable X2 predicts the change in Y per unit X2, holding X1 constant
15.3 Categorical Explanatory Variables in Regression Models
• Categorical explanatory (independent) variables can be fit into a regression model by converting them into 0/1 indicator variables (“dummy variables”)
• For binary variables, code the variable 0 for “no” and 1 for “yes”
• For categorical variables with k categories, use k–1 dummy variables.
Dummy Variables, cont.
In this example, the variable SMOKE2 has three levels, initially coded 0 = non-smoker, 1 = former smoker, 2 = current smoker. To use this variable in a regression model, convert it to two 0/1 dummy variables: one indicating former smokers and one indicating current smokers, with non-smokers as the reference category.
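A minimal sketch of this recoding (the function name and the tiny example are mine, not the textbook's):

```python
# Recode a three-level variable (0 = non-smoker, 1 = former smoker,
# 2 = current smoker) into two 0/1 dummy variables, with non-smoker
# as the reference category.
def to_dummies(smoke2):
    """Return a (former, current) indicator pair for one SMOKE2 value."""
    former = 1 if smoke2 == 1 else 0
    current = 1 if smoke2 == 2 else 0
    return former, current

# non-smoker -> (0, 0), former -> (1, 0), current -> (0, 1)
codes = [to_dummies(v) for v in [0, 1, 2]]
```

Note that only k − 1 = 2 dummies are needed for k = 3 categories: a non-smoker is identified by both dummies being 0.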
Illustrative Example
Childhood respiratory health survey.
• A binary explanatory variable (SMOKE) is coded 0 for non-smoker and 1 for smoker.
• The response variable Forced Expiratory Volume (FEV) is measured in liters/second.
• The mean FEV in nonsmokers is 2.566.
• The mean FEV in smokers is 3.277.
Illustrative Example, cont.

• A regression model is applied (regress FEV on SMOKE)
• The least squares regression line is ŷ = 2.566 + 0.711X
• The intercept (2.566) = the mean FEV of group 0
• The slope = the mean difference = 3.277 − 2.566 = 0.711
• tstat = 6.464 with 652 df, P ≈ 0.000 (same as the equal variance t test)
• The 95% CI for the slope is 0.495 to 0.927 (same as the 95% CI for μ1 − μ2)
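These facts are easy to verify numerically: regressing a response on a 0/1 indicator reproduces the group means. The six data points below are made up for illustration; they are not the survey data.

```python
import numpy as np

# Hypothetical mini-dataset: FEV-like values, 0 = non-smoker, 1 = smoker
smoke = np.array([0, 0, 0, 1, 1, 1])
fev   = np.array([2.4, 2.6, 2.7, 3.1, 3.3, 3.4])

# Simple regression of fev on the 0/1 indicator
X = np.column_stack([np.ones_like(smoke), smoke]).astype(float)
(a, b), *_ = np.linalg.lstsq(X, fev, rcond=None)

mean0 = fev[smoke == 0].mean()  # intercept a equals the group-0 mean
mean1 = fev[smoke == 1].mean()  # slope b equals mean1 - mean0
```

This is why the regression line on the scatter plot passes through the two group means.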
Illustrative Example, cont.

Scatter plot for the categorical independent variable (SMOKE) with the regression line passing through the group means.
Confounding, Illustrative Example
• In our example, the children who smoked had higher mean FEV than the children who did not smoke
• How can this be true given what we know about the deleterious respiratory effects of smoking?
• The answer lies in the fact that the smokers were older than the nonsmokers: AGE confounds the relationship between SMOKE and FEV
• A multiple regression model can adjust for AGE in this situation
15.4 Multiple Regression Coefficients
We rely on software to calculate the multiple regression statistics. This slide shows the regression dialog box from SPSS.
Multiple Regression Example cont.
The multiple regression model is:

FEV = 0.367 − 0.209(SMOKE) + 0.231(AGE)

The SPSS output reports the intercept a and the slopes b1 (for SMOKE) and b2 (for AGE).
Multiple Regression Coefficients, cont.
• The slope coefficient for SMOKE is −.209, suggesting that smokers have .209 less FEV on average compared to non-smokers (after adjusting for age)
• The slope coefficient for AGE is .231, suggesting that each year of age is associated with an increase of .231 FEV units on average (after adjusting for SMOKE)
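Plugging values into the fitted equation shows what "adjusting" means in practice: at any fixed age, the predicted FEV of a smoker and a non-smoker differ by exactly the SMOKE coefficient. The age value used below is arbitrary.

```python
def predict_fev(smoke, age):
    """Fitted model from the SPSS output: FEV = 0.367 - 0.209*SMOKE + 0.231*AGE."""
    return 0.367 - 0.209 * smoke + 0.231 * age

# Smoker vs. non-smoker at the same (arbitrarily chosen) age of 12:
# the age terms cancel, leaving the SMOKE slope, -0.209
diff = predict_fev(1, 12) - predict_fev(0, 12)
```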
Coefficients(a)

| Model 1    | B (Unstandardized) | Std. Error | Beta (Standardized) | t      | Sig. |
|------------|--------------------|------------|---------------------|--------|------|
| (Constant) | .367               | .081       |                     | 4.511  | .000 |
| smoke      | −.209              | .081       | −.072               | −2.588 | .010 |
| age        | .231               | .008       | .786                | 28.176 | .000 |

a. Dependent Variable: fev
Inference About the Coefficients

Inferential statistics are calculated for each regression coefficient. For example, in testing

H0: β1 = 0 (the SMOKE coefficient controlling for AGE),

tstat = −2.588 and P = 0.010.

This statistic has df = n − k − 1 = 654 − 2 − 1 = 651. See the text for formulas.
Inference About the Coefficients, cont.

The 95% confidence interval for the slope of SMOKE controlling for AGE is −0.368 to −0.050.

Coefficients(a): 95% Confidence Interval for B

| Model 1    | Lower Bound | Upper Bound |
|------------|-------------|-------------|
| (Constant) | .207        | .527        |
| smoke      | −.368       | −.050       |
| age        | .215        | .247        |

a. Dependent Variable: fev
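As a quick check on the printed interval: the CI is b ± t*·SE, and with df = 651 the t critical value is essentially the Normal value 1.96 (this approximation is mine; the software uses the exact t quantile).

```python
# Reported estimates for the SMOKE slope from the SPSS output
b, se = -0.209, 0.081

# With df = 651, the t critical value is ~1.96 (Normal approximation)
t_crit = 1.96
lower, upper = b - t_crit * se, b + t_crit * se
# Reproduces the printed interval (-0.368, -0.050) to three decimals
```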
15.5 ANOVA for Multiple Regression
See pp. 343–346.
15.6 Examining Regression Conditions
• Conditions for multiple regression mirror those of simple regression:
– Linearity
– Independence
– Normality
– Equal variance
• These can be evaluated by analyzing the pattern of the residuals
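A sketch of the residual computation behind such checks, with made-up data. The simple standardization here divides each residual by the standard error of the regression; statistical packages use slightly more refined, leverage-based versions.

```python
import numpy as np

# Hypothetical data, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.3, 2.9, 4.2, 4.8, 6.3])

# Fit the line and compute residuals
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# Crude standardized residuals: residual / standard error of the regression.
# Plotting std_resid against fitted values is the diagnostic described above.
s = np.sqrt(np.sum(resid ** 2) / (len(y) - 2))
std_resid = resid / s
```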
Examining Conditions

This plot shows standardized residuals against standardized predicted values for the illustrative model (FEV regressed on AGE and SMOKE):

• Roughly the same number of points lie above and below the horizontal → no major departures from linearity
• Higher variability at high values of Y → possible unequal variance (biologically reasonable)
Examining Conditions, cont.

Normal Q-Q plot of standardized residuals: the fairly straight diagonal suggests no major departures from Normality.