Multiple Logistic Regression STAT E-150 Statistical Methods

Upload: phyllis-collins

Post on 18-Jan-2016


Multiple Logistic Regression

STAT E-150 Statistical Methods


Multiple Logistic Regression is used when there are several predictors, and the response variable is a binary variable.

As we did previously, we can use an indicator variable to represent the response variable, with 1 to represent the presence of some condition ("success" or "yes") and 0 to represent the absence of the condition ("failure" or "no"):

$$y = \begin{cases} 1 & \text{if success} \\ 0 & \text{if failure} \end{cases}$$

The logistic regression model describes how the probability of "success" is related to the values of the explanatory variables, which can be categorical or quantitative.


The Multiple Logistic Regression Model

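Logit form:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

Probability form:

$$\pi = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}$$

Here π denotes the probability of success.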
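The logit form and the probability form describe the same model, and a short numerical check can make that concrete. The sketch below is illustrative only: the coefficients and predictor values are hypothetical, and the function names are not from the course materials.

```python
import math

def linear_predictor(betas, xs):
    """beta_0 + beta_1*x_1 + ... + beta_k*x_k (betas[0] is the intercept)."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def probability(betas, xs):
    """Probability form: e^eta / (1 + e^eta)."""
    eta = linear_predictor(betas, xs)
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """Logit form: log(p / (1 - p)), the log odds of success."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and predictor values, for illustration only.
betas = [-1.0, 0.5, 0.25]
xs = [2.0, 4.0]

eta = linear_predictor(betas, xs)
p = probability(betas, xs)
print(f"linear predictor = {eta:.4f}")
print(f"probability      = {p:.4f}")
print(f"logit(prob)      = {logit(p):.4f}")
```

Applying the logit to the probability returns the linear predictor, which is why the two forms can be used interchangeably.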


Conditions:

Linearity: Check for a linear relationship between each predictor and the logit-transformed response variable, using logit plots.

Probability Model: The response values must be random and independent. Think carefully about how the data were produced.
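One common way to build a logit plot for a quantitative predictor is to group the observations into bins and compute an empirical logit for each bin. The sketch below assumes equal-count bins and the adjusted empirical logit (0.5 added to each count); the data and the function name are made up for illustration, and the slides do not say which variant the course uses.

```python
import math

def empirical_logits(x, y, n_bins=4):
    """Return (mean x, adjusted empirical logit) for equal-count bins of x."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    points = []
    for b in range(n_bins):
        # The last bin absorbs any leftover observations.
        if b == n_bins - 1:
            chunk = pairs[b * size:]
        else:
            chunk = pairs[b * size:(b + 1) * size]
        n = len(chunk)
        successes = sum(yi for _, yi in chunk)
        mean_x = sum(xi for xi, _ in chunk) / n
        # Adding 0.5 keeps the logit finite when a bin is all 0s or all 1s.
        value = math.log((successes + 0.5) / (n - successes + 0.5))
        points.append((mean_x, value))
    return points

# Made-up data, for illustration only.
ages = [22, 25, 31, 34, 38, 41, 45, 48, 52, 57, 60, 66]
bought = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]

points = empirical_logits(ages, bought)
for mean_x, value in points:
    print(f"{mean_x:6.1f}  {value:7.3f}")
```

Plotting the second column against the first gives the logit plot; points that fall roughly along a straight line support the linearity condition.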


As with other regression methods, if the conditions are satisfied, we can test hypotheses and construct confidence intervals, and use the results to describe relationships and make predictions. There are advantages to this analysis. First, there are no assumptions about the distributions of the predictors; they do not have to be normally distributed, linearly related, or have equal variances within each group. In addition, there are no restrictions on the type of predictors.


Let's return to the example we discussed earlier: Suppose that the sales director of appliance stores wants to find out which factors encourage customers to purchase extended warranties after a major appliance purchase. The response variable indicates whether a warranty is purchased. The predictor variables are

- Customer gender
- Age of the customer
- Whether a gift is offered with the warranty
- Price of the appliance
- Race of the customer (this is coded with four indicator variables to represent White, African-American, Hispanic, and Other)


Let's start with the full model, using all predictors:

Variables in the Equation

| Step 1 (a) | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Gender | -3.772 | 2.568 | 2.158 | 1 | .142 | .023 |
| Gift | 2.715 | 1.567 | 3.003 | 1 | .083 | 15.112 |
| Age | .091 | .056 | 2.638 | 1 | .104 | 1.096 |
| Price | .001 | .000 | 3.363 | 1 | .067 | 1.001 |
| White | 3.773 | 13.863 | .074 | 1 | .785 | 43.518 |
| AfricanAmerican | 1.163 | 13.739 | .007 | 1 | .933 | 3.199 |
| Hispanic | 6.347 | 14.070 | .203 | 1 | .652 | 570.898 |
| Constant | -12.018 | 14.921 | .649 | 1 | .421 | .000 |

a. Variable(s) entered on step 1: Gender, Gift, Age, Price, White, AfricanAmerican, Hispanic.

The significance of each predictor is measured using the Wald statistic.

(Note that SPSS reports a Wald chi-square rather than the Wald z that you may see elsewhere. The z value is the square root of the chi-square value, taken with the sign of the corresponding coefficient estimate, βi.)
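The relationship between the Wald z and the Wald chi-square can be checked directly from the B and S.E. columns. The sketch below uses the Gift and Gender rows of the table above; the function name is illustrative, and the results differ slightly from the SPSS output because the table shows rounded values.

```python
import math

def wald_summary(b, se):
    """Return (z, chi-square, p-value) for one coefficient estimate."""
    z = b / se                    # Wald z keeps the sign of the coefficient
    chi_square = z ** 2           # Wald chi-square, 1 degree of freedom
    # The two-sided normal p-value equals the chi-square (1 df) upper tail.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi_square, p_value

# B and S.E. copied from the Gift and Gender rows of the table.
gift = wald_summary(2.715, 1.567)
gender = wald_summary(-3.772, 2.568)

for name, (z, chi_square, p_value) in [("Gift", gift), ("Gender", gender)]:
    print(f"{name:7s} z = {z:6.3f}  Wald = {chi_square:5.3f}  Sig. = {p_value:.3f}")
```

Note that the z value for Gender is negative because its coefficient is negative, while its chi-square value is positive.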


Which predictors are significant at the .10 level of significance?

Gift (p = .083) and Price (p = .067) are significant at this level. Gender (p = .142) and the race variables White (p = .785), AfricanAmerican (p = .933), and Hispanic (p = .652) are not significant.

(Age is marginal (p = .104), and we'll leave it in for now.)
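The screening step can be written out as a simple filter on the Sig. column. This is only a sketch of the comparison against the .10 level; the variable names are illustrative, and keeping the marginal Age variable remains a judgment call that the code does not make.

```python
# Sig. values copied from the full-model table.
p_values = {
    "Gender": 0.142,
    "Gift": 0.083,
    "Age": 0.104,
    "Price": 0.067,
    "White": 0.785,
    "AfricanAmerican": 0.933,
    "Hispanic": 0.652,
}

ALPHA = 0.10  # level of significance used for screening

significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print("Significant at .10:    ", significant)
print("Not significant at .10:", not_significant)
```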


We can now repeat the analysis using only the significant predictors.

Which predictors are significant in this reduced model? In the reduced model, all of the remaining predictors (Gift, Age, and Price) are significant, even at the .05 level of significance.


As we have discussed, there are seemingly contradictory values: the coefficient for Price is too small to show in three decimal places and its odds ratio is displayed as 1, which would suggest no effect, yet the Wald test shows that Price is a significant predictor. The values are small because they describe the effect of a one-unit change in Price. We can find more information by dividing the values of Price by 100.


Here are the results for the new model:

Variables in the Equation

| Step 1 (a) | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Gift | 2.339 | 1.131 | 4.273 | 1 | .039 | 10.368 |
| Age | .064 | .032 | 4.132 | 1 | .042 | 1.066 |
| Price100 | .040 | .016 | 6.165 | 1 | .013 | 1.041 |
| Constant | -6.096 | 2.142 | 8.096 | 1 | .004 | .002 |

a. Variable(s) entered on step 1: Gift, Age, Price100.

Note that the odds ratio for Price100 is now 1.041 and the new coefficient is .040. All other values are unchanged.
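The effect of rescaling can be verified by hand: each Exp(B) value is e raised to the coefficient, and dividing a predictor by 100 multiplies its coefficient by 100. The sketch below uses the coefficients reported for the Price100 model; the customer in the prediction is hypothetical, and small differences from the SPSS output come from working with rounded coefficients.

```python
import math

# Coefficients copied from the Price100 model.
INTERCEPT = -6.096
COEF = {"Gift": 2.339, "Age": 0.064, "Price100": 0.040}

# Exp(B): the odds ratio for a one-unit increase in each predictor.
odds_ratios = {name: math.exp(b) for name, b in COEF.items()}

# On the original Price scale the coefficient is about .040 / 100, which
# displays as .000 with an odds ratio of 1.000 at three decimal places.
price_coef = COEF["Price100"] / 100
price_display = (f"{price_coef:.3f}", f"{math.exp(price_coef):.3f}")

def predicted_probability(gift, age, price):
    """Probability form of the model; price is on the original scale."""
    eta = (INTERCEPT + COEF["Gift"] * gift + COEF["Age"] * age
           + COEF["Price100"] * price / 100)
    # 1 / (1 + e^-eta) is algebraically the same as e^eta / (1 + e^eta).
    return 1.0 / (1.0 + math.exp(-eta))

for name, ratio in odds_ratios.items():
    print(f"{name:9s} Exp(B) = {ratio:.3f}")
print("Price on original scale (B, Exp(B)):", price_display)

# Hypothetical customer: gift offered, age 40, appliance price 1500.
example = predicted_probability(1, 40, 1500)
print(f"Predicted probability of purchase: {example:.3f}")
```

Rescaling changes only how the Price effect is displayed; the fitted probabilities are the same under either scale.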

 


How do we assess this model?

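Use the -2(log likelihood) value, written -2LL; lower values indicate a better fit.

When you are comparing nested models, consider the difference in the -2LL values. If the extra predictors have no effect, this difference approximately follows a chi-square distribution with degrees of freedom equal to the number of predictors removed, so you can test whether the difference is significant.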
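A comparison of two -2LL values can be sketched as follows for the case where the models differ by exactly one predictor, so the difference is referred to a chi-square distribution with 1 degree of freedom. The -2LL values below are hypothetical placeholders, not the values from the SPSS output, and the function name is illustrative.

```python
import math

def drop_in_neg2ll_test(neg2ll_reduced, neg2ll_full):
    """Test the difference in -2LL when the models differ by ONE predictor."""
    difference = neg2ll_reduced - neg2ll_full
    if difference < 0:
        raise ValueError("the reduced model cannot have the lower -2LL")
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(difference / 2.0))
    return difference, p_value

# Hypothetical -2LL values, for illustration only.
difference, p_value = drop_in_neg2ll_test(neg2ll_reduced=30.0, neg2ll_full=25.0)
print(f"difference in -2LL = {difference:.3f}, p = {p_value:.4f}")
```

A small p-value indicates that the removed predictor contributes significantly, so the larger model is preferred.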


What if we remove the Age variable?


With the Age variable:

Without the Age variable:

Comparing the -2 log likelihood values for the two models, we can conclude that the larger model is better.


We can also compare the results of the Hosmer-Lemeshow test for goodness-of-fit; it assesses the general model, not the model parameters.

With the Age variable:

Without the Age variable:

Smaller p-values indicate a lack of fit for the model. Which model appears to be better based on this value?


Again we can conclude that the larger model is a better fit.
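To show what the Hosmer-Lemeshow test measures, here is a minimal sketch of the usual calculation: sort the cases by predicted probability, split them into groups, and compare observed and expected successes in each group. The data are made up, the function names are illustrative, and the grouping rule is an assumption; SPSS may form its groups differently. The p-value helper uses a closed form that is valid only for an even number of degrees of freedom.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p_hat, groups=10):
    """Return (statistic, p-value) using df = groups - 2."""
    order = sorted(range(len(y)), key=lambda i: p_hat[i])
    statistic = 0.0
    for g in range(groups):
        idx = order[g * len(y) // groups:(g + 1) * len(y) // groups]
        n = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p_hat[i] for i in idx)
        # Combines the success and failure cells for this group.
        statistic += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return statistic, chi2_upper_tail_even_df(statistic, groups - 2)

# Made-up predicted probabilities and outcomes, for illustration only.
p_hat = [(i + 0.5) / 20 for i in range(20)]
y = [0, 0, 0, 0, 1,  0, 0, 1, 0, 1,  0, 1, 1, 0, 1,  1, 1, 0, 1, 1]

statistic, p_value = hosmer_lemeshow(y, p_hat, groups=4)
print(f"Hosmer-Lemeshow statistic = {statistic:.3f}, p = {p_value:.3f}")
```

A large p-value, as in this made-up example, gives no evidence of lack of fit.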