
Week 12 November 17-21

Four Mini-Lectures QMM 510, Fall 2014


Chapter Contents

13.1 Multiple Regression

13.2 Assessing Overall Fit

13.3 Predictor Significance

13.4 Confidence Intervals for Y

13.5 Categorical Predictors

13.6 Tests for Nonlinearity and Interaction

13.7 Multicollinearity

13.8 Violations of Assumptions

13.9 Other Regression Topics


Multiple Regression ML 12.1

Much of this is like Chapter 12, except that we have more than one predictor.

Simple or Multivariate?

• Multiple regression is an extension of simple regression to include more than one independent variable.

• Limitations of simple regression:

• often simplistic

• biased estimates if relevant predictors are omitted

• lack of fit does not show that X is unrelated to Y if the true model is multivariate


Visualizing a Multiple Regression

[Figure: visualization of a multiple regression, not reproduced in this transcript.]

Regression Terminology

• Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

• The estimated (fitted) regression equation is:

ŷ = b0 + b1x1 + b2x2 + … + bkxk


Use Roman letters for sample estimates

Use Greek letters for population parameters
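To make the notation concrete, here is a minimal Python sketch (numpy, with made-up data; none of this appears in the slides) that computes the fitted coefficients b0, b1, b2 by least squares:

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors
X = np.column_stack([
    np.ones(6),                          # column of 1s gives the intercept b0
    [2.0, 1.5, 3.0, 2.5, 0.5, 1.0],      # predictor X1
    [3.0, 2.0, 1.0, 4.0, 2.5, 3.5],      # predictor X2
])
y = np.array([10.0, 7.5, 9.0, 14.0, 6.5, 9.5])

# Least squares estimates b = (b0, b1, b2) of the population betas
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b                            # fitted values from the estimated equation
print("b:", np.round(b, 3))
```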

Fitted Regression: Simple versus Multivariate

If we have more than two predictors, there is no way to visualize it …

Data Format

n observed values of the response variable Y and its proposed predictors X1, X2, …, Xk are presented in the form of an n × k data matrix.

Common Misconceptions about Fit

• A common mistake is to assume that the model with the best fit is preferred.

• Sometimes a model with a low R2 may give useful predictions, while a model with a high R2 may conceal problems.

• Thoroughly analyze the results before choosing the model.


Four Criteria for Regression Assessment

• Logic - Is there an a priori reason to expect a causal relationship between the predictors and the response variable?

• Fit - Does the overall regression show a significant relationship between the predictors and the response variable?

• Parsimony - Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?

• Stability - Are the predictors related to one another so strongly that the regression estimates become erratic?


Assessing Overall Fit

F Test for Significance

• For a regression with k predictors, the hypotheses to be tested are

H0: All the true coefficients are zero
H1: At least one of the coefficients is nonzero

• In other words,

H0: β1 = β2 = … = βk = 0
H1: At least one of the coefficients is nonzero

The ANOVA calculations for a k-predictor model resemble those for a simple regression, except for degrees of freedom: SSR (explained) has df = k, SSE (error) has df = n − k − 1, and SST (total) has df = n − 1. The test statistic is Fcalc = MSR/MSE = (SSR/k) / (SSE/(n − k − 1)).
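As a concrete illustration (not from the slides), here is a short Python sketch with made-up data that carries out these ANOVA calculations; scipy supplies the right-tail F probability:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 8 observations, k = 2 predictors (plus intercept column)
X = np.column_stack([np.ones(8),
                     [2.0, 1.5, 3.0, 2.5, 0.5, 1.0, 2.2, 3.1],
                     [3.0, 2.0, 1.0, 4.0, 2.5, 3.5, 1.8, 2.9]])
y = np.array([10.0, 7.5, 9.0, 14.0, 6.5, 9.5, 9.8, 12.1])
n, k = len(y), X.shape[1] - 1

b, *_ = np.linalg.lstsq(X, y, rcond=None)       # least squares fit
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)               # total variation, df = n - 1
sse = np.sum((y - y_hat) ** 2)                  # unexplained variation, df = n - k - 1
ssr = sst - sse                                 # explained variation, df = k

f_calc = (ssr / k) / (sse / (n - k - 1))        # MSR / MSE
p_value = stats.f.sf(f_calc, k, n - k - 1)      # right-tail area
print(f"F = {f_calc:.3f}, p = {p_value:.4f}")
```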

Coefficient of Determination (R2)

• R2, the coefficient of determination, is a common measure of overall fit.

• It can be calculated in one of two ways (always done by computer):

R2 = SSR/SST = 1 − SSE/SST

• For example, both formulas yield the same R2 for the home price data.

Adjusted R2

• It is generally possible to raise the coefficient of determination R2 by including additional predictors.

• The adjusted coefficient of determination penalizes the inclusion of useless predictors.

• For n observations and k predictors:

R2adj = 1 − (1 − R2)(n − 1)/(n − k − 1)
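A small Python sketch of the penalty at work; the summary numbers (SSE, SST, n, k) are made up for illustration:

```python
# Hypothetical summary values for a fit with n = 8 observations, k = 2 predictors
n, k = 8, 2
sse, sst = 14.2, 46.9                            # made-up sums of squares

r2 = 1 - sse / sst                               # R2 = SSR/SST = 1 - SSE/SST
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # penalizes useless predictors
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```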

How Many Predictors?

• Limit the number of predictors based on the sample size.

• A large sample size permits many predictors.

• When n/k is small, the R2 no longer gives a reliable indication of fit.

• Suggested rules are:

Evan’s Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)

Doane’s Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)

These are just guidelines – use your judgment.

Predictor Significance

• Test each fitted coefficient to see whether it is significantly different from zero.

• The hypothesis tests for the coefficient of predictor Xj are

H0: βj = 0
H1: βj ≠ 0

• If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute to the prediction of Y.

Test Statistic

• Excel reports the test statistic for the coefficient of predictor Xj:

tcalc = bj / sbj (where sbj is the standard error of bj), with df = n − k − 1

• Find the critical value tα for the chosen level of significance α from Appendix D or from Excel using =T.INV.2T(α, df) for a two-tailed test.

• To reject H0, we compare tcalc to tα for the different hypotheses (or reject if p-value < α).

• The 95% confidence interval for coefficient βj is

bj ± tα/2 sbj
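For readers who want to see these quantities computed outside Excel, here is a minimal numpy/scipy sketch with made-up data; sbj comes from the diagonal of (XᵀX)⁻¹ scaled by the standard error of the regression:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 8 observations, k = 2 predictors (plus intercept)
X = np.column_stack([np.ones(8),
                     [2.0, 1.5, 3.0, 2.5, 0.5, 1.0, 2.2, 3.1],
                     [3.0, 2.0, 1.0, 4.0, 2.5, 3.5, 1.8, 2.9]])
y = np.array([10.0, 7.5, 9.0, 14.0, 6.5, 9.5, 9.8, 12.1])
n, k = len(y), X.shape[1] - 1

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
se = np.sqrt(resid @ resid / (n - k - 1))           # standard error of regression

s_b = se * np.sqrt(np.diag(np.linalg.inv(X.T @ X))) # s_bj for each coefficient
t_calc = b / s_b                                    # one t statistic per coefficient
p_vals = 2 * stats.t.sf(np.abs(t_calc), n - k - 1)  # two-tailed p-values

t_crit = stats.t.ppf(0.975, n - k - 1)              # t_(alpha/2) for 95% confidence
ci = np.column_stack([b - t_crit * s_b, b + t_crit * s_b])
print(np.round(t_calc, 3), np.round(p_vals, 4), np.round(ci, 3), sep="\n")
```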

Confidence Intervals for Y

Standard Error

• The standard error of the regression (se) is another important measure of fit. Except for the degrees of freedom, the formula for se resembles that for simple regression.

• For n observations and k predictors:

se = √(SSE/(n − k − 1))

• If all predictions were perfect (SSE = 0), then se = 0.

Approximate Confidence and Prediction Intervals for Y

• Approximate 95% confidence interval for the conditional mean of Y:

ŷ ± tα/2 se/√n

• Approximate 95% prediction interval for an individual Y value:

ŷ ± tα/2 se

Quick 95 Percent Confidence and Prediction Intervals for Y

• The t-values for 95% confidence are typically near 2 (as long as n is not too small).

• Very quick confidence and prediction intervals for Y, without using a t table, are:

ŷ ± 2 se/√n (confidence interval for the conditional mean of Y)

ŷ ± 2 se (prediction interval for an individual Y value)
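A quick Python sketch of the arithmetic; n, se, and the predicted value ŷ are all made up here:

```python
import numpy as np

# Hypothetical values from some fitted model
n = 50          # sample size
se = 5.2        # standard error of the regression (made up)
y_hat = 100.0   # predicted Y for a given set of X values (made up)

half_ci = 2 * se / np.sqrt(n)    # quick 95% CI half-width (conditional mean)
half_pi = 2 * se                 # quick 95% PI half-width (individual Y)
print(f"CI for mean Y: {y_hat - half_ci:.1f} to {y_hat + half_ci:.1f}")
print(f"PI for individual Y: {y_hat - half_pi:.1f} to {y_hat + half_pi:.1f}")
```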

Unusual Observations ML 12.2

Standardized Residuals

• Use Excel, MINITAB, MegaStat, or other software to compute standardized residuals.

• If the absolute value of any standardized residual is at least 2, then it is classified as unusual (as in simple regression).

Leverage and Influence

• A high leverage statistic indicates unusual X values in one or more predictors.

• Such observations are influential because they are near the edge(s) of the fitted regression plane.

• Leverage for observation i is denoted hi (computed by MegaStat).


Leverage

• For a regression model with k predictors, an observation whose leverage exceeds 2(k+1)/n is unusual.

• In Chapter 12, the leverage rule was 4/n. With k = 1 predictor, we get 2(k+1)/n = 2(1+1)/n = 4/n.

• So this leverage criterion applies to simple regression as a special case.
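For the curious, leverage values are the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ; here is a minimal numpy sketch with made-up data (not from the slides):

```python
import numpy as np

# Hypothetical data: n = 8 observations, k = 2 predictors (plus intercept)
X = np.column_stack([np.ones(8),
                     [2.0, 1.5, 3.0, 2.5, 0.5, 1.0, 2.2, 9.0],  # last X1 is extreme
                     [3.0, 2.0, 1.0, 4.0, 2.5, 3.5, 1.8, 2.9]])
n, k = X.shape[0], X.shape[1] - 1

H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
h = np.diag(H)                          # leverage h_i for each observation

cutoff = 2 * (k + 1) / n                # unusual if leverage exceeds 2(k+1)/n
print("leverages:", np.round(h, 3))
print("high leverage rows:", np.where(h > cutoff)[0])
```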


Unusual Observations Example: Heart Death Rate in 50 States

• n = 50 states, k = 3 predictors

• High leverage criterion: 2(k+1)/n = 2(3+1)/50 = 0.160

• MegaStat highlights the high leverage observations (> .160). Note: only unusual observations are shown (there were n = 50 observations).

• Four states (FL, HI, OK, WV) have unusual residuals (> 2se), highlighted by MegaStat.

• Standard error: se = 27.422

Categorical Predictors ML 12.3

What Is a Binary or Categorical Predictor?

• A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition.

• For example, for n graduates from an MBA program:

Employed = 1
Unemployed = 0

• These variables are also called dummy, dichotomous, or indicator variables.

• For clarity, name the binary variable for the characteristic that corresponds to a value of 1 (e.g., Employed).

Effects of a Binary Predictor

• A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down.

• Suppose X1 is a binary predictor that can take on only the values of 0 or 1.

• Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).

• The slope does not change: only the intercept is shifted. For example, with one other predictor X2, the fitted model ŷ = b0 + b1X1 + b2X2 becomes ŷ = b0 + b2X2 when X1 = 0 and ŷ = (b0 + b1) + b2X2 when X1 = 1.
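A quick numpy sketch of the shift (made-up data, not from the slides): the dummy changes only the intercept of the fitted equation, never the slope on X2.

```python
import numpy as np

# Hypothetical data: X1 is a 0/1 dummy, X2 is a continuous predictor
X1 = np.array([0, 0, 0, 1, 1, 1, 0, 1])
X2 = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0, 4.0])
y = np.array([5.1, 7.0, 8.9, 8.2, 10.1, 11.8, 11.0, 14.1])

X = np.column_stack([np.ones_like(X2), X1, X2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"intercept when X1=0: {b0:.2f}; when X1=1: {b0 + b1:.2f}; common slope: {b2:.2f}")
```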

Testing a Binary for Significance

• In multiple regression, binary predictors require no special treatment. They are tested as any other predictor, using a t test.

More Than One Binary

• More than one binary is needed when the number of categories to be coded exceeds two.

• For example, to code class level in a regression on GPA, each category gets its own binary variable:

Freshman = 1 if a freshman, 0 otherwise
Sophomore = 1 if a sophomore, 0 otherwise
Junior = 1 if a junior, 0 otherwise
Senior = 1 if a senior, 0 otherwise
Masters = 1 if a master’s candidate, 0 otherwise
Doctoral = 1 if a PhD candidate, 0 otherwise

What if I Forget to Exclude One Binary?

• Including all binaries for all categories may introduce a serious problem of collinearity for the regression estimation. Collinearity occurs when there are redundant independent variables.

• When the value of one independent variable can be determined from the values of other independent variables, one column in the X data matrix will be a perfect linear combination of the other column(s).

• The least squares estimation would fail because the data matrix would be singular (i.e., would have no inverse).
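A minimal sketch of this trap (hypothetical data): with an intercept plus a binary for every category, the binary columns sum to the intercept column, so the matrix loses full rank.

```python
import numpy as np

# Three categories, one binary each, plus an intercept column
cat = np.array([0, 1, 2, 0, 1, 2])
D = np.eye(3)[cat]                       # one-hot: binaries for all three categories
X = np.column_stack([np.ones(6), D])     # binaries sum to the intercept column

print(np.linalg.matrix_rank(X))          # 3, not 4: perfect collinearity
# Because rank < number of columns, X'X has no inverse and the least
# squares solution is not unique. Fix: drop one binary, so the omitted
# category becomes the baseline.
X_ok = np.column_stack([np.ones(6), D[:, :2]])
print(np.linalg.matrix_rank(X_ok))       # 3 = full column rank
```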


Other Regression Problems

• Outliers? (omit only if clearly errors)

• Missing Predictors? (usually you can’t tell)

• Ill-Conditioned Data (adjust decimals or take logs)

• Significance in Large Samples? (if n is huge, almost any regression will be significant)

• Model Specification Errors? (may show up in residual patterns)

• Missing Data? (we may have to live without it)

• Binary Response? (if Y = 0,1 we use logistic regression)

• Stepwise and Best Subsets Regression (MegaStat does these)
