Logistic regression
A quick intro
Why Logistic Regression?
Big idea: the dependent variable is a dichotomy (though the approach extends to more than two categories via multinomial logistic regression)
Why would we use it? It is one thing to use a t-test (or its multivariate counterpart) to say that groups are different; however, the research goal may be to predict group membership.
• Clinical/medical context
  • Schizophrenic or not
  • Clinical depression or not
  • Cancer or not
• Social/cognitive context
  • Vote yes or no
  • Preference A over B
  • Graduate or not
Things to cover
• Relationship to typical multiple regression
• Interpretation of fit
• Interpretation of coefficients
Questions
• Can the cases be accurately classified given a set of predictors?
• Can the solution generalize to predicting new cases?
• How does the model with predictors plus intercept compare to a model with just the intercept?
• What is the relative importance of each predictor?
• How does each variable affect the outcome?
• Does a predictor make the solution better or worse, or have no effect?
• Are there interactions among predictors? Does adding interactions (continuous or categorical) significantly improve the model?
• Can parameters be accurately estimated?
• What is the strength of association between the outcome variable and a set of predictors?
Multiple regression approach
• With MR, we used a method that minimizes the squared deviations from our predicted values
• We can't really pull that off with a dichotomous outcome:
  • There are only two outcome values to produce residuals
  • We can't meet the normality or homoscedasticity assumptions
  • While MR could produce what are essentially predicted probabilities of belonging to a particular group, those probabilities are not bounded by zero and one (see the sketch below)
• Logistic regression will allow us to go about the prediction/explanation process in a similar manner, but without these problems
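To see the boundedness problem concretely, here is a minimal sketch with synthetic (made-up) data: an ordinary least squares line fit to a 0/1 outcome happily produces "probabilities" outside [0, 1].

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-4, 4, 200)
y = (x + rng.normal(0, 1, x.size) > 0).astype(float)  # dichotomous outcome

# OLS line fit to the 0/1 outcome: y_hat = b0 + b1*x
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# the linear fit's "predicted probabilities" fall below 0 and above 1
print(y_hat.min(), y_hat.max())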
Assumptions
The only “real” limitation with logistic regression is that the outcome must be discrete.
If its distributional assumptions are met, discriminant function analysis may be more powerful, although it has been shown to overestimate the association when discrete predictors are used.
If the outcome is continuous, then multiple regression is more powerful, given that its assumptions are met.
Assumptions
• Ratio of cases to variables: using discrete predictors requires enough responses in every category to allow reasonable estimation of the parameters/predictive power
• Linearity in the logit: the IVs should have a linear relationship with the logit form of the DV
• There is no assumption about the predictors being linearly related to each other
Assumptions
• Absence of collinearity among predictors
• No outliers
• Independence of errors
• Categories are mutually exclusive
Coefficients
In interpreting coefficients, we are now thinking about a particular case's tendency toward some outcome.
The problem with probabilities is that they are non-linear: going from .10 to .20 doubles the probability, but going from .80 to .90 increases it by only 12.5%.
With logistic regression we start to think about the odds. Odds are just an alternative way of expressing the likelihood (probability) of an event:
• Probability is the expected number of occurrences of the event divided by the total number of possible outcomes
• Odds are the expected number of occurrences of the event divided by the expected number of non-occurrences
  • They express the likelihood of occurrence relative to the likelihood of non-occurrence
Odds
Let's begin with probability, and say that the probability of success is .8; thus p = .8.
Then the probability of failure is q = 1 - p = .2
The odds of success are defined as odds(success) = p/q = .8/.2 = 4, that is, the odds of success are 4 to 1.
We can also define the odds of failure odds(failure) = q/p = .2/.8 = .25, that is, the odds of failure are 1 to 4.
Odds Ratio
Next, let's compute the odds ratio: OR = odds(success)/odds(failure) = 4/.25 = 16.
The interpretation of this odds ratio is that the odds of success are 16 times the odds of failure.
If we had formed the odds ratio the other way around, with the odds of failure in the numerator, we would have gotten
OR = odds(failure)/odds(success) = .25/4 = .0625
Here the interpretation is that the odds of failure are one-sixteenth the odds of success.
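As a quick check, the arithmetic above in a few lines of Python:

p = 0.8                           # probability of success
q = 1 - p                         # probability of failure, .2
odds_success = p / q              # .8/.2 = 4, i.e., 4 to 1
odds_failure = q / p              # .2/.8 = .25, i.e., 1 to 4
OR = odds_success / odds_failure  # 4/.25 = 16
print(odds_success, odds_failure, OR)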
Logit
The logit is the natural log (base e) of the odds, often called the log odds.
• The logit scale is linear
• Logits are continuous and are centered on zero (kind of like z-scores):
  • p = 0.50, odds = 1.00, logit = 0
  • p = 0.70, odds = 2.33, logit = 0.85
  • p = 0.30, odds = 0.43, logit = -0.85
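A small sketch reproducing the logit values above (the natural log of the odds):

import math

def logit(p):
    # natural log of the odds p/(1 - p)
    return math.log(p / (1 - p))

for p in (0.50, 0.70, 0.30):
    odds = p / (1 - p)
    print(p, round(odds, 2), round(logit(p), 2))
# prints: 0.5 1.0 0.0 / 0.7 2.33 0.85 / 0.3 0.43 -0.85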
Logit
So conceptually, putting things in our standard regression form:

log odds = b0 + b1X

Now a one-unit change in X leads to a b1 change in the log odds.

In terms of odds:

odds(Y = 1) = e^(b0 + b1X)

In terms of probability:

Pr(Y = 1) = e^(b0 + b1X) / (1 + e^(b0 + b1X))

Thus the logit, odds, and probability are different ways of expressing the same thing.
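A minimal sketch of these three equivalent expressions; the coefficients b0 and b1 here are hypothetical, purely for illustration:

import math

def describe(b0, b1, x):
    log_odds = b0 + b1 * x     # the logit: b0 + b1*X
    odds = math.exp(log_odds)  # odds(Y = 1)
    prob = odds / (1 + odds)   # Pr(Y = 1)
    return log_odds, odds, prob

print(describe(b0=-1.0, b1=0.5, x=3.0))  # hypothetical values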
Coefficients
The raw coefficients for our predictor variables in the output are the amount of increase in the log odds given a one-unit increase in that predictor.
The coefficients are determined through an iterative process, maximum likelihood, that finds the coefficients that best match the data at hand: it starts with a set of coefficients (e.g., ordinary least squares estimates) and then alters them until there is almost no change in fit.
Note that SPSS codes the outcome variable as 0 and 1 and here predicts with respect to the 0 category ("Yes" is coded 0 below), so it might be more intuitive to switch the signs of the coefficients in your output.
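A minimal sketch of this iterative maximum likelihood fitting, using statsmodels on synthetic data (the data and the true coefficients are made up for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # true logistic model
y = rng.binomial(1, p_true)                  # dichotomous outcome

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit()  # iterates until the fit stops changing
print(fit.params)           # raw coefficients, on the log-odds scale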
Coefficients
We also receive a different type of coefficient, Exp(B), expressed in odds:
• Anything above 1 suggests an increase in the odds of the event; anything below 1, a decrease
• For example, if Exp(B) = 1.14, moving one unit on the independent variable increases the odds of the event by a factor of 1.14
• Essentially it is the odds ratio for one value of X vs. the next value of X
• More intuitively, it gives the percentage increase (or decrease) in the odds of being a member of such-and-such a group with a one-unit increase in the predictor:

% change in odds = (e^b - 1) * 100
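For instance, with a hypothetical raw coefficient b of about .131 (chosen so that e^b matches the 1.14 example above):

import math

b = 0.131                       # hypothetical raw coefficient
print(math.exp(b))              # Exp(B), about 1.14: the odds ratio
print((math.exp(b) - 1) * 100)  # about a 14% increase in the odds per unit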
Example
Example: predicting art museum visitation from education, age, income, and political views (GSS93 dataset).
SPSS will start with "Block 0", which tests whether the intercept is a worthwhile predictor by itself; in other words, is just guessing the more common outcome all the time going to be enough?
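A hedged sketch of how this example model might be fit outside SPSS; it assumes the GSS93 data are available as a CSV with a 0/1 museum-visit column and the predictor names from the output below (the file name and the "visit" column are assumptions):

import pandas as pd
import statsmodels.formula.api as smf

# assumed file and column names; 'visit' assumed coded 0/1
df = pd.read_csv("gss93.csv")
fit = smf.logit("visit ~ polviews + educ + age + income91", data=df).fit()
print(fit.summary())  # coefficients and fit statistics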
Model fit
Goodness-of-fit statistics help you determine whether the model adequately describes the data.
• Here statistical significance is not desired
• Really it is more a badness-of-fit measure, and problematic, since one cannot accept the null hypothesis merely because of non-significance
• Perhaps best used descriptively
Pseudo R-squared statistics
• In this dichotomous situation we will have trouble devising an R-squared
Model fit
• Cox & Snell's value would not reach 1.0 even for a perfect fit
• Nagelkerke's is a version of Cox & Snell's that would, so it is probably preferred, though it may be a little optimistic (just like our regular R-squared)
• The Hosmer and Lemeshow GOF test suggests we're OK here too
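A sketch of how these pseudo R-squareds are computed from the null and model log-likelihoods. The null log-likelihood is not shown in the SPSS output, so the value below is an assumption chosen to roughly reproduce the figures in the Model Summary that follows; n is taken from the classification table.

import math

def pseudo_r2(ll_null, ll_model, n):
    # Cox & Snell: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Nagelkerke rescales by Cox & Snell's maximum possible value
    max_cs = 1 - math.exp(2 * ll_null / n)
    return cox_snell, cox_snell / max_cs

# ll_model from the output (-2LL = 1610.561); ll_null assumed
print(pseudo_r2(ll_null=-931.6, ll_model=-805.28, n=1374))  # ~ (.168, .226)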
Model Summary (Step 1)
  -2 Log likelihood: 1610.561(a)
  Cox & Snell R Square: .168
  Nagelkerke R Square: .226
  a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test (Step 1)
  Chi-square: 4.677, df: 8, Sig.: .792
Coefficients
It would appear that age is the only predictor that does not contribute statistically significantly.
• Note its odds ratio of 1.00: moving one unit in age says nothing about whether you will be more or less likely to go to the museum
• Polviews (1 = extremely liberal, 7 = extremely conservative) is perhaps not doing much either: the more conservative, the less likely to go to the museum, but it is only a very small change
• Education: more education, more likely to visit (more interest?)
• Income: higher income, more likely to visit (more leisure?)
Variables in the Equation (Step 1a)

              B       S.E.   Wald      df   Sig.   Exp(B)
  polviews    .082    .045   3.329     1    .068   1.086
  educ       -.281    .026   120.857   1    .000   .755
  age         .004    .004   1.206     1    .272   1.004
  income91   -.052    .013   16.746    1    .000   .949
  Constant    4.379   .431   103.158   1    .000   79.766

a. Variable(s) entered on step 1: polviews, educ, age, income91.
Dependent Variable Encoding
  Original Value   Internal Value
  Yes              0
  No               1
Classification
Classification table: here we get a good sense of how well we are able to predict the outcome.
69% correct overall, compared with 58.7% if we had just guessed "No" every time (Block 0).
Classification Table (Step 1; the cut value is .500)

  Observed:                Predicted Yes   Predicted No   Percentage Correct
  Visited Art Museum or
  Gallery in Last Yr
    Yes                    298             269            52.6
    No                     157             650            80.5
  Overall Percentage                                      69.0
Other measures regarding classification
  Measure                        Calculation
  Prevalence                     (a + c)/N
  Overall Diagnostic Power       (b + d)/N
  Correct Classification Rate    (a + d)/N
  Sensitivity                    a/(a + c)
  Specificity                    d/(b + d)
  False Positive Rate            b/(b + d)
  False Negative Rate            c/(a + c)
  Positive Predictive Power      a/(a + b)
  Negative Predictive Power      d/(c + d)
  Misclassification Rate         (b + c)/N
  Odds Ratio                     (a*d)/(c*b)
  Kappa                          [(a + d) - ((a + c)(a + b) + (b + d)(c + d))/N] / [N - ((a + c)(a + b) + (b + d)(c + d))/N]
  NMI n(s)                       1 - [-a*ln(a) - b*ln(b) - c*ln(c) - d*ln(d) + (a + b)*ln(a + b) + (c + d)*ln(c + d)] / [N*ln(N) - ((a + c)*ln(a + c) + (b + d)*ln(b + d))]
               Actual +   Actual -
  Predicted +  a          b
  Predicted -  c          d
The classification stats from DFA would apply here as well
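A sketch computing these measures from the 2x2 layout above, using the counts from the classification table ("Yes" treated as the positive class, so a = 298, b = 157, c = 269, d = 650):

def classification_stats(a, b, c, d):
    # a = true positives, b = false positives,
    # c = false negatives, d = true negatives
    N = a + b + c + d
    return {
        "prevalence": (a + c) / N,
        "correct classification rate": (a + d) / N,  # .690 here
        "sensitivity": a / (a + c),                  # .526 here
        "specificity": d / (b + d),                  # .805 here
        "false positive rate": b / (b + d),
        "false negative rate": c / (a + c),
        "positive predictive power": a / (a + b),
        "negative predictive power": d / (c + d),
        "misclassification rate": (b + c) / N,
        "odds ratio": (a * d) / (c * b),
    }

print(classification_stats(a=298, b=157, c=269, d=650))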