© 2002 Prentice-Hall, Inc. Chap 12-1 Statistics for Managers Using Microsoft Excel, 3rd Edition. Chapter 12: Multiple Regression

Upload: sybil-warren

Post on 23-Dec-2015


TRANSCRIPT

  • Slide 1
  • Statistics for Managers Using Microsoft Excel, 3rd Edition. Chapter 12: Multiple Regression. © 2002 Prentice-Hall, Inc.
  • Slide 2
  • Chapter Topics: the multiple regression model; residual analysis; testing for the significance of the regression model; inferences on the population regression coefficients; testing portions of the multiple regression model.
  • Slide 3
  • Chapter Topics (continued): the quadratic regression model; dummy variables; using transformations in regression models; collinearity; model building; pitfalls in multiple regression and ethical considerations.
  • Slide 4
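  • The Multiple Regression Model: The relationship between one dependent variable and two or more independent variables is a linear function. Population model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$, where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_p$ are the population slopes, and $\varepsilon_i$ is the random error. Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i$, where $Y_i$ is the dependent (response) variable, the $X$'s are the independent (explanatory) variables, and $e_i$ is the residual.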
  • Slide 5
  • Population Multiple Regression Model: bivariate model (two explanatory variables).
  • Slide 6
  • Sample Multiple Regression Model: bivariate model; the fitted equation is the sample regression plane.
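For the two-variable case named on Slides 5 and 6, the models can be written out as below. This is the standard textbook form, offered as a reconstruction under the assumption that the slides use the same notation as Slide 4 (Greek letters for population parameters, $b$ for sample estimates, $e_i$ for the residual).

```latex
\begin{align*}
\text{Population model:}\quad Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \\
\text{Sample regression plane:}\quad \hat{Y}_i &= b_0 + b_1 X_{1i} + b_2 X_{2i},
\qquad e_i = Y_i - \hat{Y}_i
\end{align*}
```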
  • Slide 7
  • Simple and Multiple Regression Compared: Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and with the dependent variable. Coefficients in a multiple regression net out the impacts of the other variables in the equation.
  • Slide 8
  • Simple and Multiple Regression Compared: Example. Two simple regressions are compared with one multiple regression.
  • Slide 9
  • Multiple Linear Regression Equation: too complicated to compute by hand. Ouch!
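Although the coefficients are tedious by hand, software obtains them by solving the least-squares normal equations. The sketch below is one minimal way to do that in plain Python; it is not the deck's Excel/PHStat procedure. The function name `fit_ols` and the five data points are made up for illustration (they lie exactly on the plane Y = 1 + 2X1 + 3X2) and are not the heating oil data.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [b0, b1, ..., bp].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    rows: list of [x1, ..., xp] observations; y: list of responses.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # add intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Illustrative (made-up) data generated from Y = 1 + 2*X1 + 3*X2
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coeffs = fit_ols(rows, y)
print([round(b, 6) for b in coeffs])  # approximately b0 = 1, b1 = 2, b2 = 3
```

Because the made-up points fall exactly on a plane, the fit recovers the generating coefficients up to floating-point error; with real data the residuals would be nonzero.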
  • Slide 10
  • Interpretation of Estimated Coefficients. Slope ($b_i$): the average value of Y is estimated to change by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant (ceteris paribus). Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$). Y-intercept ($b_0$): the estimated average value of Y when all $X_i = 0$.
  • Slide 11
  • Multiple Regression Model: Example. Develop a model for estimating the heating oil used by a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
  • Slide 12
  • Sample Multiple Regression Equation: Example (Excel output). For each 1-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. For each 1-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
  • Slide 13
  • Multiple Regression in PHStat: PHStat | Regression | Multiple Regression. Excel spreadsheet for the heating oil example.
  • Slide 14
  • Venn Diagrams and Explanatory Power of Regression (Oil and Temp): The overlap of the two circles is the variation in Oil explained by Temp, or equivalently the variation in Temp used in explaining the variation in Oil. The rest of the Oil circle is the variation in Oil explained by the error term. The rest of the Temp circle is the variation in Temp not used in explaining the variation in Oil.
  • Slide 15
  • Venn Diagrams and Explanatory Power of Regression (continued). [Diagram: Oil and Temp]
  • Slide 16
  • Venn Diagrams and Explanatory Power of Regression (Oil, Temp, and Insulation): The overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of the Temp slope nor the Insulation slope. The remainder of the Oil circle is variation NOT explained by Temp nor Insulation.
  • Slide 17
  • Coefficient of Multiple Determination: The proportion of total variation in Y explained by all X variables taken together. It never decreases when a new X variable is added to the model, which is a disadvantage when comparing models.
  • Slide 18
  • Venn Diagrams and Explanatory Power of Regression. [Diagram: Oil, Temp, and Insulation]
  • Slide 19
  • Adjusted Coefficient of Multiple Determination: The proportion of variation in Y explained by all X variables, adjusted for the number of X variables used. Penalizes excessive use of independent variables. Smaller than $r^2$. Useful for comparing models.
  • Slide 20
  • Coefficient of Multiple Determination (Excel output): Adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.
  • Slide 21
  • Interpretation of Coefficient of Multiple Determination: 96.56% of the total variation in heating oil use can be explained by variation in temperature and amount of insulation. After adjusting for the number of explanatory variables and the sample size, 95.99% of the total variation in heating oil use can be explained by temperature and amount of insulation.
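The two percentages above are linked by the usual adjustment formula, sketched here in Python. The value r² = 0.9656 is from the slide; n = 15 and p = 2 are inferred from the F-test degrees of freedom (2 and 12) on Slide 36, so treat them as an assumption. The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - p - 1).

    r2: coefficient of multiple determination
    n:  sample size
    p:  number of explanatory variables
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# r^2 from the slide; n and p inferred from df = 2 and 12 (assumption)
adj = adjusted_r2(0.9656, n=15, p=2)
print(f"{adj:.4f}")  # 0.9599, matching the 95.99% quoted on the slide
```

The penalty factor (n - 1)/(n - p - 1) grows with each added variable, which is why the adjusted value can fall even though r² itself never decreases.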
  • Slide 22
  • Using the Model to Make Predictions: Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is six inches. The predicted heating oil used is 278.97 gallons.
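The prediction can be reproduced from the slopes quoted on Slide 12. The intercept does not appear in this transcript, so the value below is back-solved from the stated 278.97-gallon prediction and should be read as an assumption rather than the Excel output; `predict_oil` is an illustrative name.

```python
# Slopes are from Slide 12 of the deck.
B1_TEMP = -5.437         # gallons per degree F
B2_INSULATION = -20.012  # gallons per inch of insulation
# ASSUMPTION: intercept back-solved from the slide's 278.97 prediction
# (278.97 + 5.437*30 + 20.012*6), not read from the Excel output.
B0 = 562.151


def predict_oil(temp_f, insulation_in):
    """Predicted January heating oil use (gallons) from the fitted plane."""
    return B0 + B1_TEMP * temp_f + B2_INSULATION * insulation_in


pred = predict_oil(30, 6)
print(round(pred, 2))  # 278.97
```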
  • Slide 23
  • Predictions in PHStat: PHStat | Regression | Multiple Regression; check the confidence and prediction interval estimate box. Excel spreadsheet for the heating oil example.
  • Slide 24
  • Residual Plots: Residuals vs. $\hat{Y}$: may need to transform the Y variable. Residuals vs. $X_1$: may need to transform the $X_1$ variable. Residuals vs. $X_2$: may need to transform the $X_2$ variable. Residuals vs. time: may have autocorrelation.
  • Slide 25
  • Residual Plots: Example. [Two residual plots: one shows no discernible pattern; the other may show some nonlinear relationship.]
  • Slide 26
  • Influence Analysis: Identifies observations that have an influential effect on the fitted model. Potentially influential points become candidates for removal from the model. The criteria used are the hat matrix elements $h_i$, the Studentized deleted residuals $t_i^*$, and Cook's distance statistic $D_i$. All three criteria are complementary. An observation should be removed only when all three criteria give a consistent result.
  • Slide 27
  • The Hat Matrix Element $h_i$: If $h_i > 2(p+1)/n$, $X_i$ is an influential point and may be considered a candidate for removal from the model.
  • Slide 28
  • The Hat Matrix Element $h_i$: Heating Oil Example. No $h_i > 0.4$, so no observation appears to be a candidate for removal from the model.
  • Slide 29
  • The Studentized Deleted Residuals $t_i^*$: Built from two quantities: the difference between the observed and predicted Y based on a model that includes all observations except observation i, and the standard error of the estimate for a model that includes all observations except observation i. An observation is considered influential if $|t_i^*| > t_{n-p-2}$, where $t_{n-p-2}$ is the critical value of a two-tail test at the 10% level of significance.
  • Slide 30
  • The Studentized Deleted Residuals $t_i^*$: Example. $t_{10}^*$ and $t_{13}^*$ flag observations 10 and 13 as influential points for potential removal from the model.
  • Slide 31
  • Cook's Distance Statistic $D_i$: $SR_i$ is the Studentized residual. If $D_i > F_{p+1,\,n-p-1}$, the observation is considered influential, where $F_{p+1,\,n-p-1}$ is the critical value of the F distribution at the 50% level of significance.
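The slide's formula for $D_i$ is not preserved in this transcript. As a hedged reference, the standard definition of Cook's distance in terms of the Studentized residual $SR_i$ and the hat matrix element $h_i$ is given below; the symbol $S_{YX}$ for the standard error of the estimate is an assumption about the deck's notation.

```latex
\[
D_i \;=\; \frac{SR_i^{2}}{p+1}\cdot\frac{h_i}{1-h_i},
\qquad
SR_i \;=\; \frac{e_i}{S_{YX}\sqrt{1-h_i}}
\]
```

With $p = 2$ explanatory variables and 12 error degrees of freedom, the comparison value is the 50% point of the F distribution with 3 and 12 degrees of freedom, which is the 0.835 used on the next slide.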
  • Slide 32
  • Cook's Distance Statistic $D_i$: Heating Oil Example. No $D_i > 0.835$, so no observation appears to be a candidate for removal from the model. Using the three criteria together, there is insufficient evidence for the removal of any observation from the model.
  • Slide 33
  • Testing for Overall Significance: Shows whether there is a linear relationship between all of the X variables taken together and Y. Uses the F test statistic. Hypotheses: $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (no linear relationship); $H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y). The null hypothesis is a very strong statement, so it is almost always rejected.
  • Slide 34
  • Testing for Overall Significance (continued): Test statistic: $F = \dfrac{MSR}{MSE} = \dfrac{SSR/p}{SSE/(n-p-1)}$, where F has p numerator and (n - p - 1) denominator degrees of freedom.
  • Slide 35
  • Test for Overall Significance (Excel output): Example. In the ANOVA table, the regression degrees of freedom are p = 2 (the number of explanatory variables), the total degrees of freedom are n - 1, and the p-value of the F test is reported.
  • Slide 36
  • Test for Overall Significance: Example Solution. $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$; $H_1$: at least one $\beta_i \neq 0$. $\alpha = 0.05$; df = 2 and 12. Critical value: F = 3.89. Test statistic: F = 168.47 (Excel output). Decision: reject $H_0$ at $\alpha = 0.05$. Conclusion: there is evidence that at least one independent variable affects Y.
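As a cross-check, the overall F statistic can be recovered from r² alone, because dividing SSR and SSE by SST turns F = (SSR/p) / (SSE/(n - p - 1)) into the expression below. This is a sketch using the rounded r² = 0.9656 from Slide 21 with n = 15 and p = 2 (inferred from df = 2 and 12), so it only approximates the 168.47 in the Excel output; `overall_f` is an illustrative name.

```python
def overall_f(r2, n, p):
    """Overall F statistic written in terms of r^2.

    F = (r^2 / p) / ((1 - r^2) / (n - p - 1))
    """
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


F_CRITICAL = 3.89  # from the slide: alpha = 0.05, df = 2 and 12

f_stat = overall_f(0.9656, n=15, p=2)
reject_h0 = f_stat > F_CRITICAL
print(round(f_stat, 1), reject_h0)  # about 168.4 (slide: 168.47), True
```

The small gap from 168.47 comes from rounding r² to four decimals before plugging it in.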
  • Slide 37
  • Test for Significance: Individual Variables. Shows whether there is a linear relationship between the variable $X_i$ and Y. Uses the t test statistic. Hypotheses: $H_0: \beta_i = 0$ (no linear relationship); $H_1: \beta_i \neq 0$ (linear relationship between $X_i$ and Y).
  • Slide 38
  • t Test Statistic (Excel output): Example. The output reports the t test statistic for $X_1$ (temperature) and the t test statistic for $X_2$ (insulation).
  • Slide 39
  • t Test: Example Solution. Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$. $H_0: \beta_1 = 0$; $H_1: \beta_1 \neq 0$. df = 12. Critical values: $t = \pm 2.1788$ (0.025 in each tail). Test statistic: t = -16.1699. Decision: reject $H_0$ at $\alpha = 0.05$. Conclusion: there is evidence of a significant effect of temperature on oil consumption.
  • Slide 40
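  • Venn Diagrams and Estimation of Regression Model (Oil, Temp, and Insulation): Only the variation that Oil shares with a single explanatory variable is used in the estimation of that variable's slope. The variation shared by Temp and Insulation is NOT used in the estimation of the Temp slope nor the Insulation slope.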
  • Slide 41
  • Confidence Interval Estimate for the Slope: Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption): $-6.169 \le \beta_1 \le -4.704$. The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1°F increase in temperature.
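The interval follows the usual form $b_1 \pm t_{n-p-1} S_{b_1}$. The standard error is not shown in this transcript, so the sketch below recovers it from the slope and t statistic quoted on earlier slides ($t = b_1 / S_{b_1}$); that recovery step and the variable names are assumptions for illustration.

```python
b1 = -5.437        # slope for temperature (Slide 12)
t_stat = -16.1699  # t test statistic for temperature (Slide 39)
t_crit = 2.1788    # two-tail critical value, alpha = 0.05, df = 12 (Slide 39)

# ASSUMPTION: standard error recovered as b1 / t, since t = b1 / S_b1
se_b1 = b1 / t_stat

half_width = t_crit * se_b1
lower = b1 - half_width
upper = b1 + half_width
print(f"{lower:.2f} <= beta_1 <= {upper:.2f}")  # -6.17 <= beta_1 <= -4.70
```

The interval lies entirely below zero, which agrees with the t test's rejection of $H_0: \beta_1 = 0$ at the same 5% level.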
  • Slide 42
  • Contribution of a Single Independent Variable $X_k$: Let $X_k$ be the independent variable of interest. $SSR(X_k \mid \text{all others}) = SSR(\text{all variables including } X_k) - SSR(\text{all variables except } X_k)$ measures the contribution of $X_k$ in explaining the total variation in Y (SST).
  • Slide 43
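  • Contribution of a Single Independent Variable (continued): The contribution of $X_k$ in explaining SST is the difference between two regression sums of squares. Each one is taken from the ANOVA section of the regression output: one for the model that includes $X_k$ and one for the model that omits it.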
  • Slide 44
  • Coefficient of Partial Determination of $X_k$: Measures the proportion of variation in the dependent variable that is explained by $X_k$ while controlling for (holding constant) the other independent variables.
  • Slide 45
  • Coefficient of Partial Determination of $X_k$ (continued): Example for a model with two independent variables.
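For the two-variable example, the coefficients of partial determination take the standard form below. The slide's own equations are not in this transcript, so this is a reconstruction in conventional notation, where $SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2)$.

```latex
\begin{align*}
r^2_{Y1.2} &= \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} \\[4pt]
r^2_{Y2.1} &= \frac{SSR(X_2 \mid X_1)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_2 \mid X_1)}
\end{align*}
```

The denominator of the first expression simplifies to $SST - SSR(X_2)$: the variation in Y left unexplained by $X_2$ alone, of which $X_1$ then explains the stated share.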
  • Slide 46
  • Venn Diagrams and Coefficient of Partial Determination. [Diagram: Oil, Temp, and Insulation]
  • Slide 47
  • Coefficient of Partial Determination in PHStat: PHStat | Regression | Multiple Regression; check the coefficient of partial determination box. Excel spreadsheet for the heating oil example.
  • Slide 48
  • Contribution of a Subset of Independent Variables: Let $X_s$ be the subset of independent variables of interest. $SSR(X_s \mid \text{all others})$ measures the contribution of the subset $X_s$ in explaining SST.
  • Slide 49
  • Contribution of a Subset of Independent Variables: Example. Let $X_s$ be $X_1$ and $X_3$. The required sums of squares come from the ANOVA section of the regression output for the model with and the model without $X_1$ and $X_3$.
  • Slide 50
  • Testing Portions of Model: Examines the contribution of a subset $X_s$ of explanatory variables to the relationship with Y. Null hypothesis: the variables in the subset do not significantly improve the model when all other variables are included. Alternative hypothesis: at least one variable in the subset is significant.
  • Slide 51
  • Testing Portions of Model (continued): Always a one-tailed rejection region. Requires comparison of two regressions: one that includes everything, and another that includes everything except the portion to be tested.
  • Slide 52
  • Partial F Test for Contribution of a Subset of X Variables: Hypotheses: $H_0$: variables $X_s$ do not significantly improve the model given all other variables included; $H_1$: variables $X_s$ significantly improve the model given all others included. Test statistic: $F = \dfrac{SSR(X_s \mid \text{all others})/m}{MSE(\text{all})}$ with df = m and (n - p - 1), where m is the number of variables in the subset $X_s$.
  • Slide 53
  • Partial F Test for Contribution of a Single Variable $X_j$: Hypotheses: $H_0$: variable $X_j$ does not significantly improve the model given all others included; $H_1$: variable $X_j$ significantly improves the model given all others included. Test statistic: $F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$ with df = 1 and (n - p - 1); here m = 1.
  • Slide 54
  • Testing Portions of Model: Example. Test at the $\alpha = 0.05$ level to determine whether the variable of average temperature significantly improves the model, given that insulation is included.
  • Slide 55
  • Testing Portions of Model: Example. $H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included; $H_1$: $X_1$ does improve the model. $\alpha = 0.05$, df = 1 and 12, critical value = 4.75. The test uses the ANOVA output of two regressions: one for $X_1$ and $X_2$, and one for $X_2$ alone. Conclusion: reject $H_0$; $X_1$ does improve the model.
  • Slide 56
  • Testing Portions of Model in PHStat: PHStat | Regression | Multiple Regression; check the coefficient of partial determination box. Excel spreadsheet for the heating oil example.
  • Slide 57
  • Do We Need to Do This for One Variable? The F test for the inclusion of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable. The only reason to do an F test is to test several variables together.
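The equivalence can be checked numerically with figures already quoted in the deck: for a single variable the partial F statistic equals the square of its t statistic, and the F critical value (df = 1 and 12) equals the square of the two-tail t critical value (df = 12). A short sketch, using only numbers from Slides 39 and 55:

```python
t_stat = -16.1699  # t statistic for temperature (Slide 39)
t_crit = 2.1788    # two-tail t critical value, alpha = 0.05, df = 12

f_stat = t_stat ** 2  # partial F statistic for temperature
f_crit = t_crit ** 2  # F critical value, df = 1 and 12

print(round(f_stat, 2), round(f_crit, 2))  # 261.47 4.75

# Both tests lead to the same decision
same_decision = (abs(t_stat) > t_crit) == (f_stat > f_crit)
print(same_decision)  # True
```

The squared critical value reproduces the 4.75 shown on Slide 55, so the two tests must always agree.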
  • Slide 58
  • The Quadratic Regression Model: The relationship between one response variable and two or more explanatory variables is a quadratic polynomial function. Useful when the scatter diagram indicates a nonlinear relationship. Quadratic model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$. The second explanatory variable is the square of the first variable.
  • Slide 59
  • Quadratic Regression Model (continued): Quadratic models may be considered when the scatter diagram of Y against $X_1$ is curved. [Panels of example curves for $\beta_2 > 0$ and $\beta_2 < 0$, where $\beta_2$ is the coefficient of the quadratic term.]
  • Slide 60
  • Testing for Significance: Quadratic Model. Testing for overall relationship: similar to the test for the linear model, with F test statistic = MSR / MSE. Testing the quadratic effect: compare the quadratic model with the linear model. Hypotheses: $H_0: \beta_2 = 0$ (no second-order polynomial term); $H_1: \beta_2 \neq 0$ (second-order polynomial term is needed).
  • Slide 61
  • Heating Oil Example: Determine whether a quadratic model is needed for estimating the heating oil used by a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
  • Slide 62
  • Heating Oil Example: Residual Analysis (continued). [Two residual plots: one shows no discernible pattern; the other may show some nonlinear relationship.]
  • Slide 63
  • Heating Oil Example: t Test for Quadratic Model (continued). Testing the quadratic effect: compare the model with a quadratic term in insulation against the linear model. Hypotheses: $H_0: \beta_3 = 0$ (no quadratic term in insulation); $H_1: \beta_3 \neq 0$ (quadratic term in insulation is needed).
  • Slide 64
  • Example Solution: Is a quadratic term in insulation needed to model monthly consumption of heating oil? Test at $\alpha = 0.05$. $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$. df = 11. Critical values: $t = \pm 2.2010$ (0.025 in each tail). Test statistic: t = 1.6611. Decision: do not reject $H_0$ at $\alpha = 0.05$. Conclusion: there is not sufficient evidence of a need to include a quadratic effect of insulation on oil consumption.
  • Slide 65
  • Example Solution in PHStat: PHStat | Regression | Multiple Regression. Excel spreadsheet for the heating oil example.
  • Slide 66
  • Dummy Variable Models: A categorical explanatory variable (dummy variable) with two or more levels, e.g. yes or no, on or off, male or female. Coded as 0 or 1. Only the intercepts are different; the model assumes equal slopes across categories. The number of dummy variables needed is (number of levels - 1). The regression model has the same form: $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} + \varepsilon_i$.
  • Slide 67
  • Dummy-Variable Models (with 2 Levels): Given Y = assessed value of house, $X_1$ = square footage of house, and $X_2$ = desirability of neighborhood (0 if undesirable, 1 if desirable). Desirable ($X_2 = 1$): $\hat{Y}_i = (b_0 + b_2) + b_1 X_{1i}$. Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i}$. Same slopes.
  • Slide 68
  • Dummy-Variable Models (with 2 Levels) (continued): [Plot of Y (assessed value) against $X_1$ (square footage): two parallel lines, with intercept $b_0 + b_2$ for the desirable location and $b_0$ for the undesirable location. Same slopes, different intercepts.]
  • Slide 69
  • Interpretation of the Dummy Variable Coefficient (with 2 Levels): Example. Y: annual salary of college graduate, in thousands of dollars. $X_1$: GPA. $X_2$: 0 if female, 1 if male. On average, male college graduates are making an estimated six thousand dollars more than female college graduates with the same GPA.
  • Slide 70
  • Dummy-Variable Models (with 3 Levels)
  • Slide 71
  • Interpretation of the Dummy Variable Coefficients (with 3 Levels): With the same square footage, a split-level will have an estimated average assessed value 18.84 thousand dollars higher than a condo. With the same square footage, a ranch will have an estimated average assessed value 23.53 thousand dollars higher than a condo.
  • Slide 72
  • Interaction Regression Model: Hypothesizes interaction between pairs of X variables: the response to one X variable varies at different levels of another X variable. Contains two-way cross-product terms. Can be combined with other models, e.g. the dummy variable model.
  • Slide 73
  • Effect of Interaction: Given $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$. Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$. With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$, so the effect changes as $X_2$ increases.
  • Slide 74
  • Interaction Example: $Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$. With $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$. With $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$. The effect (slope) of $X_1$ on Y does depend on the value of $X_2$. [Plot of the two lines, Y against $X_1$.]
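The slide's arithmetic can be checked directly. The sketch below codes the example equation and the implied slope of $X_1$, which is $2 + 4X_2$; the function names `y_hat` and `slope_x1` are illustrative.

```python
def y_hat(x1, x2):
    """The slide's example model: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_x1(x2):
    """Effect of a 1-unit increase in X1 at a given X2: 2 + 4*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    # Intercept and slope of the line Y vs. X1 at this level of X2
    print(x2, y_hat(0, x2), slope_x1(x2))
# prints "0 1 2" then "1 4 6", i.e. Y = 1 + 2*X1 and Y = 4 + 6*X1
```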
  • Slide 75
  • Interaction Regression Model Worksheet: Multiply $X_1$ by $X_2$ to get $X_1X_2$, then run the regression with Y, $X_1$, $X_2$, and $X_1X_2$.
  • Slide 76
  • Interpretation When There Are More Than Three Levels: MALE = 1 if male, 0 if female. MARRIED = 1 if married, 0 if not. DIVORCED = 1 if divorced, 0 if not. MALEMARRIED = MALE times MARRIED = 1 if male and married, 0 otherwise. MALEDIVORCED = MALE times DIVORCED = 1 if male and divorced, 0 otherwise.
  • Slide 77
  • Interpretation When There Are More Than Three Levels (continued)
  • Slide 78
  • Interpreting Results: [Table of fitted results for FEMALE (single, married, divorced) and MALE (single, married, divorced), with the difference between them.] Main effects: MALE, MARRIED, and DIVORCED. Interaction effects: MALEMARRIED and MALEDIVORCED.
  • Slide 79
  • Evaluating Presence of Interaction: Hypothesize interaction between pairs of independent variables. The model contains two-way product terms. Hypotheses: $H_0: \beta_3 = 0$ (no interaction between $X_1$ and $X_2$); $H_1: \beta_3 \neq 0$ ($X_1$ interacts with $X_2$).
  • Slide 80
  • Using Transformations: Requires data transformation. Either or both of the independent and dependent variables may be transformed. Can be based on theory, logic, or scatter diagrams.
  • Slide 81
  • Inherently Linear Models: Nonlinear models that can be expressed in linear form. Can be estimated by least squares in linear form. Require data transformation.
  • Slide 82
  • Transformed Multiplicative Model (Log-Log): [Curves of Y against $X_1$; similarly for $X_2$.]
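The equations for this slide are not preserved in the transcript. In the usual two-variable form, assumed here, a multiplicative model becomes linear in the parameters after taking natural logs of both sides:

```latex
\begin{align*}
\text{Original:}\quad Y_i &= \beta_0\, X_{1i}^{\beta_1}\, X_{2i}^{\beta_2}\, \varepsilon_i \\
\text{Transformed:}\quad \ln Y_i &= \ln \beta_0 + \beta_1 \ln X_{1i} + \beta_2 \ln X_{2i} + \ln \varepsilon_i
\end{align*}
```

Least squares is then run on $\ln Y$ against $\ln X_1$ and $\ln X_2$.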
  • Slide 83
  • Square Root Transformation: [Curves for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for $X_2$.] Transforms one of the above models into one that appears linear. Often used to overcome heteroscedasticity.
  • Slide 84
  • Linear-Logarithmic Transformation: [Curves for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for $X_2$.] Transformed from an original multiplicative model.
  • Slide 85
  • Exponential Transformation (Log-Linear): The original model [curves for $\beta_1 > 0$ and $\beta_1 < 0$] is transformed into a form that is linear in the parameters.
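The slide's equations are missing from the transcript. The conventional two-variable form of the exponential model and its log-linear transformation, assumed here, is:

```latex
\begin{align*}
\text{Original:}\quad Y_i &= e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}}\, \varepsilon_i \\
\text{Transformed:}\quad \ln Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i
\end{align*}
```

Only the dependent variable is logged, which is why a coefficient in this form reads approximately as a percentage change in Y per unit change in X.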
  • Slide 86
  • Interpretation of Coefficients: When the dependent variable is logged, the coefficient of the independent variable can be approximately interpreted as: a 1-unit change in $X_1$ leads to an estimated $100\,b_1$ percent change in the average of Y. When the independent variable is logged, the coefficient can be approximately interpreted as: a 100 percent change in $X_1$ leads to an estimated $b_1$-unit change in the average of Y.
  • Slide 87
  • Interpretation of Coefficients (continued): When both the dependent and independent variables are logged, the coefficient of the independent variable can be approximately interpreted as: a 1 percent change in $X_1$ leads to an estimated $b_1$ percent change in the average of Y. Therefore $b_1$ is the elasticity of Y with respect to a change in $X_1$.
  • Slide 88
  • Interpretation of Coefficients (continued): If both Y and the $X_i$ are measured in standardized form, the $b_i$ are called standardized coefficients. They indicate the estimated number of standard deviations by which Y will change, on average, when $X_i$ changes by one standard deviation.
  • Slide 89
  • Collinearity (Multicollinearity): High correlation between explanatory variables. The coefficient of multiple determination measures the combined effect of the correlated explanatory variables, so a collinear variable provides no new information. Leads to unstable coefficients (large standard errors), depending on the explanatory variables.
  • Slide 90
  • Venn Diagrams and Collinearity (Oil, Temp, and Insulation): A large overlap in the variation of Temp and Insulation is used in explaining the variation in Oil but NOT in estimating the Temp and Insulation slopes. A large overlap reflects collinearity between Temp and Insulation.
  • Slide 91
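  • Detect Collinearity (Variance Inflationary Factor): $VIF_j = \dfrac{1}{1 - r_j^2}$, where $r_j^2$ is the coefficient of determination from regressing $X_j$ on the other explanatory variables, is used to measure collinearity. If $VIF_j > 5$, $X_j$ is highly correlated with the other explanatory variables.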
  • Slide 92
  • Detect Collinearity in PHStat: PHStat | Regression | Multiple Regression; check the variance inflationary factor (VIF) box. Excel spreadsheet for the heating oil example. Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet. No VIF is > 5, so there is no evidence of collinearity.
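A minimal sketch of the VIF rule, using the standard definition VIF_j = 1 / (1 - r_j²), where r_j² comes from regressing X_j on the other explanatory variables. The r_j² inputs below are hypothetical, chosen only to show where the cutoff of 5 falls; they are not taken from the heating oil output.

```python
def vif(r2_j):
    """Variance inflationary factor for X_j: 1 / (1 - r_j^2)."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must be in [0, 1)")
    return 1.0 / (1.0 - r2_j)


VIF_CUTOFF = 5.0  # rule of thumb used in the deck

# Hypothetical r_j^2 values for illustration only
for r2_j in (0.0, 0.5, 0.9):
    value = vif(r2_j)
    print(r2_j, round(value, 2), value > VIF_CUTOFF)
```

A VIF above 5 corresponds to r_j² above 0.80. With only two explanatory variables, r_j² is the same whichever one is regressed on the other, which is why a single VIF is reported.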
  • Slide 93
  • Model Building: The goal is to develop a good model with the fewest explanatory variables: easier to interpret, with a lower probability of collinearity. Stepwise regression procedure: provides limited evaluation of alternative models. Best-subsets approach: uses the $C_p$ statistic and selects a model with a small $C_p$ near p + 1.
  • Slide 94
  • Model Building Flowchart: (1) Choose $X_1, X_2, \ldots, X_p$. (2) Run a regression to find the VIFs. (3) Is any VIF > 5? If yes and more than one exceeds 5, remove the variable with the highest VIF and rerun the regression; if yes and only one exceeds 5, remove that X. (4) If no, run best-subsets regression to obtain the best models in terms of $C_p$. (5) Do a complete analysis. (6) Add a curvilinear term and/or transform variables as indicated. (7) Perform predictions.
  • Slide 95
  • Pitfalls and Ethical Considerations: To avoid pitfalls and address ethical considerations: understand that the interpretation of the estimated regression coefficients is performed holding all other independent variables constant; evaluate residual plots for each independent variable; evaluate interaction terms.
  • Slide 96
  • Additional Pitfalls and Ethical Considerations (continued): To avoid pitfalls and address ethical considerations: obtain the VIF for each independent variable and remove variables that exhibit high collinearity with other independent variables before performing significance tests on each independent variable; examine several alternative models using best-subsets regression; use other methods when the assumptions necessary for least-squares regression have been seriously violated.
  • Slide 97
  • Chapter Summary: Developed the multiple regression model. Discussed residual plots. Addressed testing the significance of the multiple regression model. Discussed inferences on population regression coefficients. Addressed testing portions of the multiple regression model.
  • Slide 98
  • Chapter Summary (continued): Described the quadratic regression model. Addressed dummy variables. Discussed using transformations in regression models. Described collinearity. Discussed model building. Addressed pitfalls in multiple regression and ethical considerations.