Week 12 (November 17-21): Four Mini-Lectures, QMM 510, Fall 2014
Chapter Contents
13.1 Multiple Regression
13.2 Assessing Overall Fit
13.3 Predictor Significance
13.4 Confidence Intervals for Y
13.5 Categorical Predictors
13.6 Tests for Nonlinearity and Interaction
13.7 Multicollinearity
13.8 Violations of Assumptions
13.9 Other Regression Topics
Chapter 13
Multiple Regression ML 12.1
Much of this is like Chapter 12, except that we have more than one predictor.
• Multiple regression is an extension of simple regression to include more than one independent variable.
• Limitations of simple regression:
• often simplistic
• biased estimates if relevant predictors are omitted
• lack of fit does not show that X is unrelated to Y if the true model is multivariate
Simple or Multivariate?
• Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
• The estimated (fitted) regression equation is:
ŷ = b0 + b1x1 + b2x2 + … + bkxk
Regression Terminology
Use Roman letters for sample estimates
Use Greek letters for population parameters
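A minimal sketch of how the fitted coefficients b0, b1, …, bk can be obtained by least squares, using only the Python standard library. The function names and the small data set are hypothetical; the data are generated exactly from y = 1 + 2*x1 + 3*x2 so the recovered coefficients are easy to check. In practice Excel, MINITAB, or MegaStat does this calculation.

```python
def fit_multiple_regression(X, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared errors.

    X is a list of n rows, each holding k predictor values; y has n responses.
    """
    n, k = len(X), len(X[0])
    p = k + 1
    # Design matrix: a leading 1 in each row carries the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    # Normal equations (A'A) b = A'y, stored as an augmented p x (p+1) matrix.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
         + [sum(A[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(M[r][c] * b[c] for c in range(r + 1, p))
        b[r] = (M[r][p] - s) / M[r][r]
    return b


def predict(b, x):
    """Fitted value: y-hat = b0 + b1*x1 + ... + bk*xk."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
b = fit_multiple_regression(X, y)
```

Because the illustrative data contain no error term, b should come back as approximately [1, 2, 3]; with real data the fitted values would differ from the observations.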
Fitted Regression: Simple versus Multivariate
If we have more than two predictors, there is no way to visualize it …
n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are presented in the form of an n x k matrix.
Data Format
Common Misconceptions about Fit
• A common mistake is to assume that the model with the best fit is preferred.
• Sometimes a model with a low R2 may give useful predictions, while a model with a high R2 may conceal problems.
• Thoroughly analyze the results before choosing the model.
Four Criteria for Regression Assessment
• Logic - Is there an a priori reason to expect a causal relationship between the predictors and the response variable?
• Fit - Does the overall regression show a significant relationship between the predictors and the response variable?
• Parsimony - Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?
• Stability - Are the predictors related to one another so strongly that the regression estimates become erratic?
Assessing Overall Fit
• For a regression with k predictors, the hypotheses to be tested are
H0: All the true coefficients are zero
H1: At least one of the coefficients is nonzero
• In other words,
H0: β1 = β2 = … = βk = 0
H1: At least one of the coefficients is nonzero
F Test for Significance
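The ANOVA calculations for a k-predictor model resemble those for a simple regression, except for degrees of freedom:
• Regression: sum of squares SSR, d.f. = k, mean square MSR = SSR/k
• Error: sum of squares SSE, d.f. = n - k - 1, mean square MSE = SSE/(n - k - 1)
• Total: sum of squares SST, d.f. = n - 1
• Test statistic: Fcalc = MSR/MSE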
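As a rough illustration, the F statistic can be computed from the ANOVA sums of squares as the regression mean square (SSR/k) divided by the error mean square (SSE/(n - k - 1)). The function name and the numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def anova_f(ssr, sse, n, k):
    """F statistic for overall fit with n observations and k predictors."""
    msr = ssr / k              # regression mean square, d.f. = k
    mse = sse / (n - k - 1)    # error mean square, d.f. = n - k - 1
    return msr / mse


# Hypothetical sums of squares: SSR = 300, SSE = 100, n = 24, k = 3.
f_calc = anova_f(300.0, 100.0, n=24, k=3)   # MSR = 100, MSE = 5
```

The result is then compared with the critical value of F for k and n - k - 1 degrees of freedom, or equivalently the p-value reported by the software is compared with α.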
• R2, the coefficient of determination, is a common measure of overall fit.
• It can be calculated in one of two ways (always done by computer): R2 = 1 - SSE/SST or R2 = SSR/SST.
• For example, for the home price data,
Coefficient of Determination (R2)
• It is generally possible to raise the coefficient of determination R2 by including additional predictors.
• The adjusted coefficient of determination is designed to penalize the inclusion of useless predictors.
• For n observations and k predictors:
R2adj = 1 - (1 - R2)(n - 1)/(n - k - 1)
Adjusted R2
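A small sketch of the two fit measures, assuming the standard definitions R2 = 1 - SSE/SST and R2adj = 1 - (1 - R2)(n - 1)/(n - k - 1). The function names and input values are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of determination from the error and total sums of squares."""
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """R2 penalized for the number of predictors k, given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: SSE = 100, SST = 400, n = 24, k = 3.
r2 = r_squared(100.0, 400.0)               # 0.75
r2_adj = adjusted_r_squared(r2, n=24, k=3) # 1 - 0.25 * 23 / 20
```

The adjusted value is never larger than R2, and the gap widens as k grows relative to n.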
• Limit the number of predictors based on the sample size.
• A large sample size permits many predictors.
• When n/k is small, the R2 no longer gives a reliable indication of fit.
• Suggested rules are:
Evan’s Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
Doane’s Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)
How Many Predictors?
These are just guidelines – use your judgment.
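The two guidelines are simple enough to check in a couple of lines. This is only a sketch; the function name and the dictionary keys are made up for illustration.

```python
def predictor_rules(n, k):
    """Check the observations-per-predictor guidelines for n rows, k predictors."""
    ratio = n / k
    return {
        "ratio": ratio,
        "evans": ratio >= 10,   # conservative: at least 10 observations per predictor
        "doane": ratio >= 5,    # relaxed: at least 5 observations per predictor
    }


check_a = predictor_rules(50, 3)   # ratio is about 16.7, both guidelines met
check_b = predictor_rules(30, 4)   # ratio is 7.5, only the relaxed guideline met
```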
• Test each fitted coefficient to see whether it is significantly different from zero.
• The hypotheses for the coefficient of predictor Xj are
H0: βj = 0
H1: βj ≠ 0
• If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute significantly to the prediction of Y.
Predictor Significance
• Excel reports the test statistic for the coefficient of predictor Xj: tcalc = (bj - 0)/sbj, where sbj is the standard error of bj and d.f. = n - k - 1.
Test Statistic
• Find the critical value tα for the chosen level of significance α from Appendix D or from Excel using =T.INV.2T(α,df) for a two-tailed test.
• To reject H0, compare tcalc to tα for the relevant hypothesis (or reject if p-value ≤ α).
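• The 95% confidence interval for coefficient βj is
bj - t.025 sbj ≤ βj ≤ bj + t.025 sbj
where sbj is the standard error of bj and t.025 has d.f. = n - k - 1.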
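A sketch of the coefficient test and interval, assuming the usual forms tcalc = bj / sbj and bj ± t × sbj, where sbj is the standard error of bj. The critical value is passed in by the caller (from a t table or Excel's =T.INV.2T); the value 2.0 and the coefficient figures below are hypothetical.

```python
def t_statistic(bj, se_bj):
    """Test statistic for H0: beta_j = 0."""
    return (bj - 0.0) / se_bj


def coefficient_interval(bj, se_bj, t_crit):
    """Confidence interval bj +/- t_crit * se_bj for the true coefficient."""
    margin = t_crit * se_bj
    return bj - margin, bj + margin


# Hypothetical fitted coefficient 1.5 with standard error 0.5.
t_calc = t_statistic(1.5, 0.5)                      # 3.0
lower, upper = coefficient_interval(1.5, 0.5, 2.0)  # (0.5, 2.5)
significant = abs(t_calc) > 2.0                     # two-tailed comparison
```

An interval that excludes zero leads to the same conclusion as rejecting H0 in the two-tailed test at the matching α.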
Confidence Intervals for Y
• The standard error of the regression (se) is another important measure of fit. Except for d.f., the formula for se resembles se for simple regression.
• For n observations and k predictors:
se = √(SSE/(n - k - 1))
Standard Error
• If all predictions were perfect (SSE = 0) then se = 0.
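A brief sketch of the standard error calculation, assuming se = sqrt(SSE / (n - k - 1)). The function name and the sample values are hypothetical.

```python
import math


def standard_error(sse, n, k):
    """Standard error of the regression for n observations and k predictors."""
    return math.sqrt(sse / (n - k - 1))


se_example = standard_error(100.0, n=24, k=3)   # sqrt(100 / 20) = sqrt(5)
se_perfect = standard_error(0.0, n=24, k=3)     # perfect predictions give 0
```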
• Approximate 95% confidence interval for conditional mean of Y: ŷi ± t.025 se/√n
• Approximate 95% prediction interval for individual Y value: ŷi ± t.025 se
Approximate Confidence and Prediction Intervals for Y
• The t-values for 95% confidence are typically near 2 (as long as n is not too small).
• Very quick 95% intervals for Y, without using a t table, are:
Confidence interval for conditional mean of Y: ŷi ± 2 se/√n
Prediction interval for individual Y value: ŷi ± 2 se
Quick 95 Percent Confidence and Prediction Interval for Y
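The quick intervals replace the t-value with 2. A minimal sketch, assuming the approximate forms ŷ ± 2 se/√n for the conditional mean and ŷ ± 2 se for an individual value; the function name and numbers are hypothetical.

```python
import math


def quick_intervals(y_hat, se, n):
    """Return (confidence interval, prediction interval) using t of about 2."""
    ci_margin = 2 * se / math.sqrt(n)   # interval for the conditional mean of Y
    pi_margin = 2 * se                  # interval for an individual Y value
    ci = (y_hat - ci_margin, y_hat + ci_margin)
    pi = (y_hat - pi_margin, y_hat + pi_margin)
    return ci, pi


# Hypothetical fitted value 100 with se = 10 from n = 25 observations.
ci, pi = quick_intervals(100.0, 10.0, 25)   # ci = (96, 104), pi = (80, 120)
```

The prediction interval is always the wider of the two, since it covers a single observation rather than the mean.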
Unusual Observations ML 12.2
Standardized Residuals
• Use Excel, MINITAB, MegaStat or other software to compute standardized residuals.
• If the absolute value of any standardized residual is at least 2, then it is classified as unusual (as in simple regression).
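A simplified sketch of the flagging rule, assuming each residual is standardized by dividing by the standard error se. Statistical packages may apply a further leverage adjustment, so their values can differ slightly. The function name and residuals are hypothetical.

```python
def unusual_residuals(residuals, se, cutoff=2.0):
    """Return (index, standardized residual) pairs with |value| >= cutoff."""
    flagged = []
    for i, e in enumerate(residuals):
        z = e / se   # simple standardization; software may also adjust for leverage
        if abs(z) >= cutoff:
            flagged.append((i, z))
    return flagged


# Hypothetical residuals with se = 10.
flagged = unusual_residuals([5.0, -30.0, 12.0, 20.0], se=10.0)
```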
Leverage and Influence
• A high leverage statistic indicates unusual X values in one or more predictors.
• Such observations are influential because they are near the edge(s) of the fitted regression plane.
• Leverage for observation i is denoted hi (computed by MegaStat)
Leverage
• For a regression model with k predictors, an observation whose leverage exceeds 2(k+1)/n is unusual.
• In Chapter 12, the leverage rule was 4/n. With k = 1 predictor, we get 2(k+1)/n = 2(1+1)/n = 4/n.
• So this leverage criterion applies to simple regression as a special case.
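The leverage criterion is easy to express directly. In this sketch the leverage values hi are assumed to come from software such as MegaStat; the function names and the sample leverages are hypothetical.

```python
def leverage_cutoff(n, k):
    """High-leverage threshold 2(k+1)/n for n observations and k predictors."""
    return 2 * (k + 1) / n


def high_leverage(h, k):
    """Indices of observations whose leverage exceeds the threshold."""
    cutoff = leverage_cutoff(len(h), k)
    return [i for i, hi in enumerate(h) if hi > cutoff]


simple_case = leverage_cutoff(20, 1)    # 4/n with one predictor
three_pred = leverage_cutoff(50, 3)     # 2(3+1)/50
flagged = high_leverage([0.1, 0.5, 0.2, 0.9, 0.3], k=1)   # cutoff is 0.8
```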
Example: Heart Death Rate in 50 States
• n = 50 states, k = 3 predictors
• Standard error: se = 27.422
• High leverage criterion is 2(k+1)/n = 2(3+1)/50 = 0.160
• MegaStat highlights the high leverage observations (> 0.160)
• 4 states (FL, HI, OK, WV) have unusual residuals (> 2 se), highlighted by MegaStat
Note: Only unusual observations are shown (there were n = 50 observations)
Categorical Predictors ML 12.3
• A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition.
• For example, for n graduates from an MBA program: Employed = 1, Unemployed = 0
• These variables are also called dummy, dichotomous, or indicator variables.
• For clarity, name the binary variable for the characteristic that corresponds to the value 1.
What Is a Binary or Categorical Predictor?
• A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down.
• Suppose X1 is a binary predictor that can take on only the values of 0 or 1.
• Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).
• The slope does not change: only the intercept is shifted. For example,
Effects of a Binary Predictor
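To make the shift concrete, here is a sketch with one binary predictor X1 and one numeric predictor X2. The coefficients b0 = 10, b1 = 4, b2 = 2 are hypothetical and not taken from any fitted model in the lecture.

```python
def fitted_value(b0, b1, b2, x1, x2):
    """Fitted Y with binary predictor x1 (0 or 1) and numeric predictor x2."""
    return b0 + b1 * x1 + b2 * x2


# Hypothetical coefficients: b0 = 10, b1 = 4 (binary), b2 = 2.
without_condition = fitted_value(10.0, 4.0, 2.0, x1=0, x2=3.0)   # intercept b0
with_condition = fitted_value(10.0, 4.0, 2.0, x1=1, x2=3.0)      # intercept b0 + b1
shift = with_condition - without_condition                       # equals b1
```

The difference equals b1 at every value of X2, which is why the slope on X2 is unchanged and only the intercept moves.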
• In multiple regression, binary predictors require no special treatment. They are tested as any other predictor using a t test.
Testing a Binary for Significance
More Than One Binary
• More than one binary occurs when the number of categories to be coded exceeds two.
• For example, for the variable GPA by class level, each category is a binary variable:
Freshman = 1 if a freshman, 0 otherwise
Sophomore = 1 if a sophomore, 0 otherwise
Junior = 1 if a junior, 0 otherwise
Senior = 1 if a senior, 0 otherwise
Masters = 1 if a master’s candidate, 0 otherwise
Doctoral = 1 if a PhD candidate, 0 otherwise
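One way to code the class-level categories as binaries is sketched below. The helper name and the choice of Freshman as the omitted baseline are assumptions for illustration; omitting one category is the usual practice so that the binaries are not redundant.

```python
LEVELS = ["Freshman", "Sophomore", "Junior", "Senior", "Masters", "Doctoral"]


def encode(level, levels=LEVELS, omit="Freshman"):
    """Return 0/1 binaries for every category except the omitted baseline."""
    if level not in levels:
        raise ValueError(f"unknown class level: {level}")
    return [1 if level == name else 0 for name in levels if name != omit]


junior = encode("Junior")       # columns: Sophomore, Junior, Senior, Masters, Doctoral
baseline = encode("Freshman")   # all zeros identifies the omitted category
```

With the baseline left out, each coefficient on a binary measures the shift relative to freshmen.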
• Including all binaries for all categories may introduce a serious problem of collinearity for the regression estimation. Collinearity occurs when there are redundant independent variables.
• When the value of one independent variable can be determined from the values of other independent variables, one column in the X data matrix will be a perfect linear combination of the other column(s).
• The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).
What if I Forget to Exclude One Binary?
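The redundancy can be seen directly: when every category has its own binary, the binaries in each row add up to 1, which duplicates the intercept column. The sketch below uses a hypothetical list of students and a made-up helper name; it checks only this particular linear dependency, not full rank in general.

```python
LEVELS = ["Freshman", "Sophomore", "Junior", "Senior", "Masters", "Doctoral"]


def dummy_row(level, levels):
    """One 0/1 column per category listed in levels."""
    return [1 if level == name else 0 for name in levels]


# Hypothetical sample of students.
students = ["Freshman", "Junior", "Doctoral", "Senior", "Junior"]
intercept = [1] * len(students)

# All six binaries: every row sums to 1, so the columns reproduce the intercept.
full = [dummy_row(s, LEVELS) for s in students]
full_is_redundant = [sum(row) for row in full] == intercept

# Excluding one binary (Freshman) breaks that exact dependency.
reduced = [dummy_row(s, LEVELS[1:]) for s in students]
reduced_is_redundant = [sum(row) for row in reduced] == intercept
```

Here full_is_redundant is True and reduced_is_redundant is False, because the freshman row now sums to 0 instead of 1.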
Other Regression Problems
• Outliers? (omit only if clearly errors)
• Missing Predictors? (usually you can’t tell)
• Ill-Conditioned Data (adjust decimals or take logs)
• Significance in Large Samples? (if n is huge, almost any regression will be significant)
• Model Specification Errors? (may show up in residual patterns)
• Missing Data? (we may have to live without it)
• Binary Response? (if Y = 0,1 we use logistic regression)
• Stepwise and Best Subsets Regression (MegaStat does these)