Logistic Regression 2 Sociology 8811 Lecture 7 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission


Page 1

Logistic Regression 2

Sociology 8811 Lecture 7

Copyright © 2007 by Evan Schofer. Do not copy or distribute without permission

Page 2

Logistic Regression

• We can convert a probability to odds:

• “Logit” = natural log (ln) of an odds
• Natural log means base “e”, not base 10

– We can model a logit as a function of independent variables:

• Just as we model Y or a probability (the LPM)

$$\text{odds}_i = \frac{p_i}{1 - p_i}$$

$$\text{logit}(p_i) = L_i = \ln\left(\frac{p_i}{1 - p_i}\right) = \sum_{j=1}^{K} B_j X_{ij}$$

Page 3

Logistic Regression

• Note: We can solve for “p” and reformulate the model:

$$P(Y=1) = \frac{e^{\sum_{j=1}^{K} B_j X_{ij}}}{1 + e^{\sum_{j=1}^{K} B_j X_{ij}}} = \frac{1}{1 + e^{-\sum_{j=1}^{K} B_j X_{ij}}}$$

• Why model this rather than a probability?
– Because it is a useful non-linear transformation

• It always generates Ps between 0 and 1, regardless of the values of X variables

• Note: probit transformation has similar effect.
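The claim that the transformation always yields probabilities between 0 and 1 is easy to check numerically; a minimal Python sketch of the logistic (inverse-logit) function:

```python
import math

def logistic(x):
    """Inverse-logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# No matter how extreme the linear predictor, P stays strictly inside (0, 1)
for xb in [-10, -1, 0, 1, 10]:
    p = logistic(xb)
    assert 0 < p < 1
    print(f"XB = {xb:>3}  ->  P(Y=1) = {p:.4f}")
```

Note that `logistic(0)` returns exactly .5: a linear predictor of zero corresponds to even odds.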

Page 4

Logistic Regression

• Benefits of Logistic regression:
• You can now effectively model probability as a function of X variables
• You don’t have to worry about violations of OLS assumptions
• Predictions fall between 0 and 1

• Downsides
– You lose the “simple” interpretation of linear coefficients
• In a linear model, the effect of each unit change in X on Y is consistent
• In a non-linear model, the effect isn’t consistent…
• Also, you can’t compute some stats (e.g., R-square).

Page 5

Logistic Regression Example

• Stata output for gun ownership:

. logistic gun male educ income south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

• Note: Results aren’t that different from LPM
• We’re dealing with big effects, large sample…
• But, predicted probabilities & SEs will be better.

Page 6

Interpreting Coefficients

• Raw coefficients (βs) show effect of 1-unit change in X on the log odds of Y=1
– Positive coefficients make “Y=1” more likely
• Negative coefficients mean “less likely”

– But, effects are not linear
• Effect of unit change on p(Y=1) isn’t same for all values of X!

– Rather, Xs have a linear effect on the “log odds”
• But, it is hard to think in units of “log odds”, so we need to do further calculations
• NOTE: log-odds interpretation doesn’t work on Probit!

Page 7

Interpreting Coefficients

• Best way to interpret logit coefficients is to exponentiate them
• This converts from “log odds” to simple “odds”
• Exponentiation = opposite of natural log
– On a calculator, use the “e^x” or “inverse ln” function

– Exponentiated coefficients are called odds ratios
• An odds ratio of 3.0 indicates odds are 3 times higher for each unit change in X
– Or, you can say the odds increase “by a factor of 3”.
• An odds ratio of .5 indicates odds decrease by ½ for each unit change in X.
– Odds ratios < 1 indicate negative effects.

Page 8

Raw Coefs vs. Odds Ratios

• It is common to present results either way:

. logistic gun male educ income south liberal, coef

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

. logistic gun male educ income south liberal

------------------------------------------------------------------------------
         gun | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   2.189562   .3432446     5.00   0.000     1.610347    2.977112
        educ |    .926097   .0235272    -3.02   0.003     .8811137    .9733768
      income |   1.273367   .0628781     4.89   0.000     1.155904    1.402767
       south |    2.08823   .4132686     3.72   0.000     1.416845    3.077757
     liberal |    .848648    .049066    -2.84   0.005     .7577291    .9504762
------------------------------------------------------------------------------

Can you see the relationship? Negative coefficients yield odds ratios below 1.0!

Page 9

Interpreting Coefficients

• Example: Do you drink coffee?
• Y=1 indicates coffee drinkers; Y=0 indicates no coffee
• Key independent variable: Year in grad program

– Observed “raw” coefficient: b = 0.67
• A positive effect… each year increases log odds by .67
• But how big is it really?

– Exponentiation: e^.67 = 1.95
• Odds increase multiplicatively by 1.95
• If a person’s initial odds were 2.0 (2:1), an extra year of school would result in: 2.0 * 1.95 = 3.90
• The odds nearly DOUBLE for each unit change in X
– Net of other variables in the model…
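The exponentiation step above can be sketched in Python (b = 0.67 and the initial odds of 2:1 are the slide's hypothetical coffee-example values):

```python
import math

# Hypothetical raw logit coefficient for "year in grad program" (from the slide)
b = 0.67

odds_ratio = math.exp(b)          # convert the log-odds effect to an odds ratio
print(round(odds_ratio, 2))       # -> 1.95

# If a person's initial odds are 2.0 (2:1), one extra year multiplies the odds:
new_odds = 2.0 * 1.95             # using the rounded OR, as on the slide
print(new_odds)                   # -> 3.9
```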

Page 10

Interpreting Coefficients

• Exponentiated coefficients (“odds ratios”) operate multiplicatively
• Effect on odds is found by multiplying coefficients

– e^b of 1.0 means that a variable has no effect
• Multiplying anything by 1.0 results in same value

– e^b > 1.0 means that the variable has a positive effect on the odds of “Y=1”
• e^b < 1.0 means that the variable has a negative effect

• Hint: Papers may present results as “raw” coefficients or odds ratios
• It is important to be aware of what you’re looking at
• If all numbers are positive, it is probably odds ratios!

Page 11

Interpreting Coefficients

• To further aid interpretation, we can convert exponentiated coefficients to % change in odds
– Calculate: (exponentiated coef − 1) * 100%

• Ex: (e^.67 − 1) * 100% = (1.95 − 1) * 100% = 95%
• Interpretation: Every unit change in X (year of school) increases the odds of coffee drinking by 95%

• What about a 2-point change in X?
• Is it 2 * 95%? No!!! You must multiply odds ratios:
• (1.95 * 1.95 − 1) * 100% = (3.80 − 1) * 100% = +280%

– 3-point change = (1.95 * 1.95 * 1.95 − 1) * 100%
• N-point change = (OR^n − 1) * 100%
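The N-point rule above can be written as a small helper function; a sketch using the coffee example's coefficient (b = 0.67, unrounded here, so the 2-point figure comes out +282% rather than the slide's rounded +280%):

```python
import math

OR = math.exp(0.67)               # ~1.95, odds ratio from the coffee example

def pct_change(odds_ratio, n=1):
    """Percent change in odds for an n-point change in X: (OR^n - 1) * 100."""
    return (odds_ratio ** n - 1) * 100

print(round(pct_change(OR, 1)))   # -> 95
print(round(pct_change(OR, 2)))   # -> 282 (the slide's rounded 1.95^2 gives +280%)
print(round(pct_change(OR, 3)))
```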

Page 12

Interpreting Coefficients

• What is the effect of a 1-unit decrease in X?• No, you can’t flip sign… it isn’t -95%

– You must invert odds ratios to see opposite effect• Additional year in school = (1.95 – 1) * 100% = +95%• One year less: (1/1.95 – 1)*100 =(.512 -1)*100= -48.7%

• What is the effect of two variables together?• To combine odds ratios you must multiply

– Ex: Have a mean advisor; b=.1.2; OR = e1.2 = 3.32• Effect of 1 additional year AND mean advisor:• (1.95 * 3.32 – 1)*100 = (6.47 – 1) * 100% = 547%

increase in odds of coffee drinking…
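Both tricks — inverting for a decrease, multiplying to combine — can be checked numerically (b = 0.67 and the hypothetical advisor coefficient b = 1.2 are the slide's values; exact exponentials give slightly different rounding than the slide's):

```python
import math

or_year = math.exp(0.67)      # ~1.95: one more year in the program
or_advisor = math.exp(1.2)    # ~3.32: hypothetical "mean advisor" effect (b = 1.2)

# A 1-unit DECREASE: invert the odds ratio; don't just flip the sign of the percent
one_year_less = (1 / or_year - 1) * 100
print(round(one_year_less, 1))            # ~ -48.8 (the slide's rounding gives -48.7)

# Two effects together: multiply the odds ratios, then convert to percent
combined = (or_year * or_advisor - 1) * 100
print(round(combined))                    # ~ +549 (the slide's rounded values give +547)
```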

Page 13

Interpreting Coefficients

• Gun ownership: Effect of education?

. logistic gun male educ income south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

• Educ: (e^−.0768 − 1) * 100% = −7.4%: odds of gun ownership are about 7.4% lower per year of education
• Also: Male: (e^.78 − 1) * 100% = +118% -- more than double!

Page 14

Interpreting Interactions

• Interactions work like linear regression

. gen maleXincome = male * income

. logistic gun male educ income maleXincome south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(6)      =      93.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -500.93966                       Pseudo R2       =     0.0850

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   2.914016   1.186788     2.46   0.014     .5879542    5.240078
        educ |  -.0783493   .0254356    -3.08   0.002    -.1282022   -.0284964
      income |   .3595354   .0879431     4.09   0.000     .1871701    .5319008
 maleXincome |  -.1873155   .1030033    -1.82   0.069    -.3891982    .0145672
       south |   .7293419   .1987554     3.67   0.000     .3397886    1.118895
     liberal |  -.1671854   .0579675    -2.88   0.004    -.2807996   -.0535711
       _cons |   -3.58824   1.030382    -3.48   0.000     -5.60775   -1.568729
------------------------------------------------------------------------------

Income coef for women is .359. For men it is .359 + (−.187) = .172; exp(.172) = 1.188

Combining odds ratios (by multiplying) gives identical results:

exp(.359) * exp(−.187) = 1.4319 * .8294 = 1.188
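The equivalence of the two routes — adding coefficients on the log-odds scale versus multiplying odds ratios — follows from e^(a+b) = e^a * e^b, and is easy to verify with the model's coefficients:

```python
import math

b_income = 0.359          # income coefficient = effect for women (male = 0)
b_interaction = -0.187    # maleXincome coefficient

# Income effect for men, on the log-odds scale:
b_men = b_income + b_interaction              # = 0.172
print(round(math.exp(b_men), 3))              # -> 1.188

# Multiplying the two odds ratios gives the identical answer:
print(round(math.exp(b_income) * math.exp(b_interaction), 3))   # -> 1.188
```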

Page 15

Predicted Probabilities

• To determine predicted probabilities, first compute the predicted Logit value:

$$\hat{L}_i = B_0 + B_1 X_{1i} + B_2 X_{2i} + \dots + B_K X_{Ki}$$

• Then, plug logit values back into the P formula:

$$P(Y=1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{\sum_{j} B_j X_{ij}}}{1 + e^{\sum_{j} B_j X_{ij}}}$$

Page 16

Predicted Probabilities: Own a gun?

• Predicted probability for a female PhD student
• Highly educated northern liberal female

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

$$\hat{L}_i = -2.28 + .78(0) - .077(20) + .24(4) + .73(0) - .16(7) = -4.0$$

$$P(Y=1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{-4.0}}{1 + e^{-4.0}} = \frac{.018}{1.018} = .017$$
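The slide's calculation can be reproduced in Python (coefficients rounded as on the slide; the case values male=0, educ=20, income=4, south=0, liberal=7 describe the hypothetical female PhD student):

```python
import math

# Rounded coefficients from the gun-ownership model (as on the slide)
b = {"_cons": -2.28, "male": 0.78, "educ": -0.077,
     "income": 0.24, "south": 0.73, "liberal": -0.16}

# Hypothetical case: female PhD student
x = {"male": 0, "educ": 20, "income": 4, "south": 0, "liberal": 7}

L = b["_cons"] + sum(b[k] * x[k] for k in x)      # predicted logit
print(round(L, 2))                                # ~ -4.0

p = math.exp(L) / (1 + math.exp(L))               # plug the logit into the P formula
print(round(p, 3))                                # ~ .018 (the slide rounds down to .017)
```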

Page 17

The Logit Curve

• Effect of log odds on probability = nonlinear!

[Figure: the S-shaped logit curve relating log odds to probability – from Knoke et al., p. 300]

Page 18

Predicted Probabilities

• Important point: Substantive effect of a variable on predicted probability differs depending on values of other variables
• If probability is already high (or low), variable changes may matter less…

– Suppose a 1-point change in X doubles the odds…
• Effect isn’t substantively consequential if probability (Y=1) is already very high
– Ex: 20:1 odds = .95 probability; 40:1 odds = .975 probability
– Change in probability is only .025

• Effect matters a lot for cases with probabilities near .5
– 1:1 odds = .5 probability. 2:1 odds = .67 probability
– Change in probability is nearly .2!
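The two comparisons above use the odds-to-probability conversion p = odds / (1 + odds); a quick check:

```python
def odds_to_p(odds):
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)

# Doubling already-long odds barely moves the probability...
print(round(odds_to_p(20), 3), round(odds_to_p(40), 3))   # -> 0.952 0.976
# ...but doubling 1:1 odds moves it a lot
print(round(odds_to_p(1), 3), round(odds_to_p(2), 3))     # -> 0.5 0.667
```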

Page 19

Logit Example: Own a gun?

• Predicted probability of gun ownership for a female PhD student is very low: P = .017
– Two additional years of education lowers probability from .017 to .015 – not a big effect
• An additional unit change can’t have a big effect – because probability can’t go below zero
• It would matter much more for a southern male…

$$\hat{L}_i = -2.28 + .78(0) - .077(22) + .24(4) + .73(0) - .16(7) = -4.16$$

$$P(Y=1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{-4.16}}{1 + e^{-4.16}} = \frac{.0156}{1.0156} = .0153$$

Page 20

Predicted Probabilities

• Predicted probabilities are a great way to make findings accessible to a reader
– Often people make bar graphs of probabilities

– 1. Show predicted probabilities for real cases
• Ex: probability of civil war for Ghana vs. Sweden

– 2. Show probabilities for “hypothetical” cases that exemplify key contrasts in your data
• Ex: Guns: Southern male vs. female PhD student

– 3. Show how a change in a critical independent variable would affect predicted probability
• Ex: Guns: What would happen to a southern male who went and got a PhD?

Page 21

Predicted Probabilities: Stata

• Like OLS regression, we can calculate predicted values for all cases

. predict predprob, pr
(1488 missing values generated)

. list predprob gun if gun ~= .

     +----------------+
     | predprob   gun |
     |----------------|
  1. |  .486874     0 |
  2. | .6405225     1 |
  6. | .7078031     1 |
  9. | .6750654     1 |
 14. | .4243994     0 |
     |----------------|
 17. | .0617232     0 |
 19. | .6556235     1 |
 22. | .6356462     0 |
 27. | .3670604     0 |
 32. | .5620316     0 |
     +----------------+

Many of the predictions are pretty good. But, some aren’t!

Page 22

Predicted Probabilities: Stata

• The “adjust” command can produce predicted values for different groups in your data
• Also – can set variables at mean or specific values
– NOTE: “adjust” does not take into account weighted data

• Example: Probabilities for men/women

. adjust, pr by(male)

------------------------------------------------------------------
     Dependent variable: gun     Command: logistic
     Variables left as is: educ, income, south, liberal
------------------------------------------------------------------

----------------------
     male |         pr
----------+-----------
        0 |    .225814
        1 |    .417045
----------------------

Note that the predicted probability for men is nearly twice as high as for women.

Page 23

Stata Notes: Adjust Command

• Stata’s “adjust” command can be tricky
– 1. By default it uses the entire sample, not just cases in your prior analysis
• Best to specify prior sample:
• adjust if e(sample), pr by(male)

– 2. For non-specified variables, Stata uses group means (defined by the “by” option)
• Don’t assume it pegs cases to overall sample mean
• Variables “left as is” take on mean for subgroups

– 3. It doesn’t take into account weighted data
• Use “lincom” if you have weighted data

Page 24

Predicted Probabilities: Stata

• Effect of pol views & gender for PhD students

. adjust south=0 income=4 educ=20, pr by(liberal male)

------------------------------------------------------------
     Dependent variable: gun     Command: logistic
     Covariates set to value: south = 0, income = 4, educ = 20
------------------------------------------------------------

----------------------------
          |       male
  liberal |       0         1
----------+-----------------
        1 | .046588   .096652
        2 | .039818   .083241
        3 | .033996   .071544
        4 |     .029    .06138
        5 | .024719   .052578
        6 | .021057   .044978
        7 | .017927   .038433
----------------------------

Note that independent variables are set to values of interest. (Or can be set to mean.)

Page 25

Graphing Predicted Probabilities

• P(Y=1) for Women & Men by Liberal

• scatter Women Men Liberal, c(l l)

[Figure: line graph of predicted P(Y=1), from .02 to .1 on the y-axis, against Liberal (0 to 8) on the x-axis, with separate lines for Women and Men]

Page 26

Did model categorize cases correctly?

• We can choose a criterion: predicted P > .5:

. estat clas

               -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        64            48  |        112
     -     |       229           509  |        738
-----------+--------------------------+-----------
   Total   |       293           557  |        850

Classified + if predicted Pr(D) >= .5
True D defined as gun != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   21.84%
Specificity                     Pr( -|~D)   91.38%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   68.97%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    8.62%
False - rate for true D         Pr( -| D)   78.16%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   31.03%
--------------------------------------------------
Correctly classified                        67.41%
--------------------------------------------------

The model yields predicted p > .5 for 112 people; only 64 of them actually have guns.

Overall, this simple model doesn’t offer extremely accurate predictions… 67% of people are correctly classified.

Note: Results change if you use a different criterion (e.g., p > .6).
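The sensitivity, specificity, and correct-classification figures follow directly from the 2×2 table; a quick Python check using the cell counts from the estat clas output:

```python
# Cells from the classification table: classified +/- (rows) by true D / ~D (cols)
tp, fp = 64, 48        # classified +: true gun owners / non-owners
fn, tn = 229, 509      # classified -: true gun owners / non-owners

sensitivity = tp / (tp + fn)              # of gun owners, share predicted to own
specificity = tn / (tn + fp)              # of non-owners, share predicted not to
correct = (tp + tn) / (tp + fp + fn + tn) # overall share correctly classified

print(f"{sensitivity:.2%} {specificity:.2%} {correct:.2%}")   # 21.84% 91.38% 67.41%
```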

Page 27

Sensitivity / Specificity of Prediction

• Sensitivity: Of gun owners, what proportion were correctly predicted to own a gun?

• Specificity: Of non-gun owners, what proportion did we correctly predict?

• Choosing a different probability cutoff affects those values

• If we reduce the cutoff to P > .4, we’ll catch a higher proportion of gun owners

• But, we’ll incorrectly identify more non-gun owners.
• And, we’ll have more false positives.

Page 28

Sensitivity / Specificity of Prediction

• Stata can produce a plot showing how predictions will change if we vary “P” cutoff:

• Stata command: lsens

[Figure: lsens plot – Sensitivity and Specificity (0.00 to 1.00) on the y-axis against Probability cutoff (0.00 to 1.00) on the x-axis]

Page 29

Hypothesis tests

• Testing hypotheses using logistic regression
• H0: There is no effect of year in grad program on coffee drinking
• H1: Year in grad school is associated with coffee

– Or, one-tail test: Year in school increases probability of coffee

– MLE estimation yields standard errors… like OLS
– Test statistic: 2 options; both yield same results
• t = b/SE… just like OLS regression
• Wald test (Chi-square, 1 df); essentially the square of t

– Reject H0 if Wald or t > critical value
• Or if p-value is less than alpha (usually .05).

Page 30

Model Fit: Likelihood Ratio Tests

• MLE computes a likelihood for the model
• “Better” models have higher likelihoods
• Log likelihood is typically a negative value, so “better” means a less negative value… −100 > −1000

• Log likelihood ratio test: Allows comparison of any two nested models
• One model’s variables must be a subset of the other model’s
– You can’t compare totally unrelated models!
• Models must use the exact same sample.

Page 31

Model Fit: Likelihood Ratio Tests

• Default LR test comparison: Current model versus “null model”
• Null model = only a constant; no covariates; K = 0

• Also useful: Compare small & large model
• Do added variables (as a group) fit the data better?

– Ex: Suppose a theory suggests 4 psychological variables will have an important effect…
• We could use an LR test to compare the “base model” to the model with 4 additional variables.

• Stata: Run the first model; “store” estimates; run the second model; use the Stata command “lrtest” to compare models

Page 32

Model Fit: Likelihood Ratio Tests

• Likelihood ratio test is based on the G-square
• Chi-square distributed; df = K1 − K0
• K = # variables; K1 = full model, K0 = simpler model
• L1 = likelihood for full model; L0 = simpler model

$$G^2 = -2\ln\left(\frac{L_0}{L_1}\right) = 2\ln L_1 - 2\ln L_0$$

• Significant likelihood ratio test indicates that the larger model (L1) is an improvement

• G2 > critical value; or p-value < .05.

Page 33

Model Fit: Likelihood Ratio Tests

• Stata’s default LR test compares to the null model

. logistic gun male educ income south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

LR chi2(5) reports the G-square with 5 degrees of freedom.

Prob > chi2 is a p-value; p < .05 indicates a significantly better model.

Model likelihood = −502.7. The null model has a lower (more negative) value.

Page 34

Model Fit: Likelihood Ratio Tests

• Example: Null model log likelihood: −547.5; Full model: −502.7
• 5 new variables, so K1 − K0 = 5.

$$G^2 = 2\ln L_1 - 2\ln L_0 = 2(-502.725) - 2(-547.5) = 89.5$$

• According to the χ² table, the critical value (df = 5, α = .05) is 11.07
• Since 89.5 greatly exceeds 11.07, we are confident that the full model is an improvement
• Also, the observed p-value in the Stata output is .000!
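The G-square arithmetic on this slide is a one-liner; a Python check using the two log likelihoods from the Stata output:

```python
ll_null = -547.5        # log likelihood of the constant-only model
ll_full = -502.7251     # log likelihood of the full model

G2 = 2 * (ll_full - ll_null)      # same as (-2 ln L0) - (-2 ln L1)
print(round(G2, 1))               # -> 89.5

crit = 11.07                      # chi-square critical value for df = 5, alpha = .05
print(G2 > crit)                  # -> True: the full model is an improvement
```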

Page 35

Model Fit: Pseudo R-Square

• Pseudo R-square
• “A descriptive measure that indicates roughly the proportion of observed variation accounted for by the… predictors.” (Knoke et al., p. 313)

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   2.189562   .3432446     5.00   0.000     1.610347    2.977112
        educ |    .926097   .0235272    -3.02   0.003     .8811137    .9733768
      income |   1.273367   .0628781     4.89   0.000     1.155904    1.402767
       south |    2.08823   .4132686     3.72   0.000     1.416845    3.077757
     liberal |    .848648    .049066    -2.84   0.005     .7577291    .9504762
------------------------------------------------------------------------------

Model explains roughly 8% of variation in Y
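Stata's Pseudo R2 here is McFadden's measure, 1 − lnL_full / lnL_null; a sketch assuming the null log likelihood of −547.5 from the LR-test example reproduces the output:

```python
ll_null = -547.5        # constant-only model (from the LR-test slide)
ll_full = -502.7251     # full model

# McFadden's pseudo R-square: 1 - lnL_full / lnL_null
pseudo_r2 = 1 - ll_full / ll_null
print(round(pseudo_r2, 4))        # -> 0.0818, matching the Stata header
```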

Page 36

Assumptions & Problems

• Assumption: Independent random sample
• Serial correlation or clustering violates assumptions; biases SE estimates and hypothesis tests
• We will discuss possible remedies in the future

• Multicollinearity: High correlation among independent variables causes problems
• Unstable, inefficient estimates
• Watch for coefficient instability, check VIF/tolerance
• Remove unneeded variables or create indexes of related variables.

Page 37

Assumptions & Problems

• Outliers/Influential cases
• Unusual/extreme cases can distort results, just like OLS

– Logistic requires different influence statistics
• Example: dbeta – very similar to OLS “Cook’s D”

– Outlier diagnostics are available in Stata
• After model: “predict outliervar, dbeta”
• Lists & graphs of residuals & dbetas can identify influential cases.

Page 38

Assumptions & Problems

• Insufficient variance: You need cases for both values of the dependent variable
• Extremely rare (or common) events can be a problem
• Suppose N = 1000, but only 3 are coded Y=1
• Estimates won’t be great

– Also: Maximum likelihood estimates cannot be computed if any independent variable perfectly predicts the outcome (Y=1)
• Ex: Suppose Soc 8811 drives all students to drink coffee… so there is no variation…
– In that case, you cannot include a dummy variable for taking Soc 8811 in the model.

Page 39

Assumptions & Problems

• Model specification / Omitted variable bias
• Just like any regression model, it is critical to include appropriate variables in the model
• Omission of important factors or ‘controls’ will lead to misleading results.

Page 40

Stata Notes: Logistic Regression

• Stata has two commands: “logit” & “logistic”
– logit, by default, produces raw coefficients
– logistic, by default, produces odds ratios
• It exponentiates all coefficients for you!

• Note: Both yield identical results
– The following pairs of commands are equivalent

– For raw coefficients:
• logit gun male educ income south liberal
• logistic gun male educ income south liberal, coef

– And for odds ratios:
• logit gun male educ income south liberal, or
• logistic gun male educ income south liberal

Page 41

Real World Example: Coups

• Issue: Many countries face the threat of a coup d’etat – violent overthrow of the regime
• What factors affect whether a country will have a coup?

• Paper handout: Belkin and Schofer (2005)
• What are the basic findings?

• How much do the odds of a coup differ for military regimes vs. civilian governments?
– b = 1.74; (e^1.74 − 1) * 100% = +470%

• What about a 2-point increase in log GDP?
– b = −.233; ((e^−.233 * e^−.233) − 1) * 100% = −37%
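Both percentages follow from the same odds-ratio arithmetic used throughout the lecture; a quick Python check of the coefficients cited from Belkin and Schofer (2005):

```python
import math

# Military regime vs. civilian government (b = 1.74)
print(round((math.exp(1.74) - 1) * 100))          # -> 470

# 2-point increase in log GDP (b = -.233): apply the odds ratio twice
print(round((math.exp(-0.233) ** 2 - 1) * 100))   # -> -37
```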