Logistic Regression
Sociology 229: Advanced Regression
Copyright © 2010 by Evan Schofer. Do not copy or distribute without permission
Agenda
• Today’s Class
• Introductions
• Go over syllabus
• Review topic: Logistic regression
– Not required – only for those who want to stay…
• Next week:
• Multinomial logistic regression
Introduction
• Goal of this course: expand your methodological “toolbox”
– Regression is extremely robust and versatile...
• BUT: often we have data that violates assumptions of regression models…
– Such as a dichotomous dependent variable
• OR: we wish to do a kind of analysis beyond what can be done with ordinary regression models
– Ex: quantile regression
– So, we need to develop a set of additional tools…
Introduction
• Main course topics
• Multinomial logistic regression
• Count models
• Event history / survival analysis
• Multilevel models & panel models
– & some additional stuff squeezed in…
• Issue: There is always a trade-off between depth and coverage
• The course covers a lot of topics briefly
• Advantage: exposes you to lots of useful things
• Disadvantage: We don’t have nearly enough time to cover material thoroughly…
Review Syllabus
• Main points:
• All readings are available online
– Complete readings before class in the week they are assigned
• Grades are based on several short assignments
– Plus, a small “participation” component
– No big paper at the end
– NOTE: This class has some overlap with my Event History Analysis class
• I’ve come up with some (optional) alternative material for those who took my earlier class.
Introductions
• This is a small class… let’s introduce ourselves
• Also: It is helpful to get to know your classmates… for when you are stuck on the homework…
Review: Types of Variables
• Continuous variable = can take any value within a range (it can, in principle, be measured with arbitrarily fine precision)
• Age: we may round off, but great precision is possible
• Discrete variable = can only take on a specific set of values
• Typically: non-negative integers or a small set of categories
• Ex: # children living in a household; race; gender
• Note: Dichotomous = discrete with 2 categories.
Review: Types of Variables
• And, don’t forget about measurement scales:
• Nominal: Categories that can’t be ordered
• Note: Also called “categorical” variables
• Ex: Religion; race; geographic state of residence
• Ordinal: Orderable categories
• Ex: Social class; College “rankings”; Most attitudinal measures (Do you approve of… on a 1-5 scale)
• Interval/Continuous: Ordered, with consistent differences across units
• Ex: Age; Cholesterol level; Income (in dollars).
Review: OLS Regression
• Question: What kinds of variables can be analyzed with OLS regression?
– Basic correlation and regression were designed for 2 interval/ratio variables
• Does fat consumption correlate with cholesterol level?
– Also: It is easy to incorporate nominal/categorical independent variables
• Strategy: Use dummy variables in regression
• Ex: Is gender associated with cholesterol level?
– Also: OLS is “robust” and works reasonably well with many ordinal measures (ideally 5+ categories)
• Ex: Are environmental attitudes associated with approval of the president?
Example 1: OLS Regression
• Example: Study time and student achievement.
– X variable: Average # hours spent studying per day
– Y variable: Score on reading test
Case X Y
1 2.6 28
2 1.4 13
3 .65 19
4 4.1 31
5 .25 8
6 1.9 16
[Figure: scatterplot of reading score (Y axis, 0 to 30) against hours spent studying per day (X axis, 0 to 4)]
Example 2: Dichotomous Variable
• Ex: Did students pass the test (score > 18)?
• Does OLS regression make sense here?
Case X Y
1 2.6 1
2 1.4 0
3 .65 1
4 4.1 1
5 .25 0
6 1.9 0
[Figure: pass (1) / fail (0) plotted against hours spent studying per day (X axis, 0 to 4)]
OLS & Dichotomous Variables
• Problem: OLS regression wasn’t really designed for dichotomous dependent variables
• Two possible outcomes (typically labeled 0 & 1)
• What kinds of problems come up?
– Linearity assumption doesn’t hold up
– Error distribution is not normal
– The model offers nonsensical predicted values
• Instead of predicting pass (1) or fail (0), the regression line might predict -.5.
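To make the problem concrete, the sketch below fits an ordinary least-squares line to the six pass/fail cases from Example 2. It is written in Python (these notes otherwise use Stata), and `ols_fit` is a hypothetical helper, not a library function. A student studying 6 hours per day, only a little beyond the observed range, gets a fitted “probability” above 1.

```python
def ols_fit(xs, ys):
    """Closed-form simple OLS: returns (intercept a, slope b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Example 2 data: X = hours studying per day, Y = 1 if the student passed
hours = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9]
passed = [1, 0, 1, 1, 0, 0]
a, b = ols_fit(hours, passed)

# Fitted "probabilities" from the straight line
for x in [0.0, 2.0, 4.0, 6.0]:
    print(f"hours = {x}: predicted P(pass) = {a + b * x:.3f}")
```

Nothing in the straight line stops it at 1 (or at 0 for a negative slope); that is the core defect of using OLS on a 0/1 outcome.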
The Linear Probability Model (LPM)
• Solution #1: Use OLS regression anyway!
• Dependent variable = the probability that a case scores 1 (as opposed to 0)
• In previous example, 1 = passed test; 0 = failed.
– We’ll assume that the probability changes as a linear function of independent variables:
• Note: This assumption may not be appropriate
$$P(Y_i = 1) = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + e_i = \alpha + \sum_{j=1}^{K} \beta_j X_{ji} + e_i$$
Linear Probability Model (LPM)
• The LPM may yield reasonable results
• Often good enough to get a “crude look” at your data
• Results tend to be better if data are well behaved
– Ex: If there are decent numbers of cases in each category of the dependent variable.
• Interpretation:
– Coefficients (b) reflect the increase in probability of Y=1 for each unit change in X
– Constant (a) reflects the base probability of Y=1 if all X variables are zero
– Significance tests are done, but may not be trustworthy due to OLS assumption violations.
LPM Example: Own a gun?
• Stata OLS output:
. regress gun male educ income south liberal

      Source |       SS       df       MS              Number of obs =     850
-------------+------------------------------           F(  5,   844) =   17.86
       Model |  18.3727851     5  3.67455703           Prob > F      =  0.0000
    Residual |  173.628391   844  .205720843           R-squared     =  0.0957
-------------+------------------------------           Adj R-squared =  0.0903
       Total |  192.001176   849  .226149796           Root MSE      =  .45356

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1637871   .0314914     5.20   0.000     .1019765    .2255978
        educ |  -.0153661     .00525    -2.93   0.004    -.0256706   -.0050616
      income |   .0379628   .0071879     5.28   0.000     .0238546    .0520711
       south |   .1539077   .0420305     3.66   0.000     .0714111    .2364043
     liberal |  -.0313841    .011572    -2.71   0.007    -.0540974   -.0086708
       _cons |     .13901   .1027844     1.35   0.177    -.0627331    .3407531
------------------------------------------------------------------------------
Interpretation: Each additional year of education decreases probability of gun ownership by .015. What about other vars?
LPM Example: Own a gun?
• OLS results can yield predicted probabilities
• Just plug values of the constant and the X’s into the linear equation
• Ex: A conservative, poor, southern male:
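$$P(Y_i = 1) = \alpha + \beta_1 \text{Male}_i + \beta_2 \text{Educ}_i + \beta_3 \text{Inc}_i + \beta_4 \text{South}_i + \beta_5 \text{Liberal}_i$$
$$P(Y = 1) = .139 + .16(1) - .015(12) + .038(6) + .15(1) - .03(0) \approx .50$$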
LPM Example: Own a gun?
• Predicted probability for a female PhD student
• Highly educated northern liberal female
$$P(Y = 1) = .139 + .16(0) - .015(20) + .038(4) + .15(0) - .03(7)$$
$$= .139 + 0 - .30 + .15 + 0 - .21 \approx -.22$$
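The same plug-in arithmetic can be sketched in Python, using the unrounded coefficients from the Stata output. The dictionary keys and the `lpm_predict` helper are illustrative names, not part of Stata. With unrounded coefficients the two cases come out at about .500 and −.236; the hand calculations differ slightly only because of rounding.

```python
# LPM (OLS) coefficients copied from the Stata "regress" output above
lpm = {"_cons": .13901, "male": .1637871, "educ": -.0153661,
       "income": .0379628, "south": .1539077, "liberal": -.0313841}

def lpm_predict(case):
    """Linear probability model: P(Y=1) = a + sum of b_j * X_j."""
    return lpm["_cons"] + sum(lpm[name] * value for name, value in case.items())

# Hypothetical profiles used on the slides
southern_male = {"male": 1, "educ": 12, "income": 6, "south": 1, "liberal": 0}
female_phd = {"male": 0, "educ": 20, "income": 4, "south": 0, "liberal": 7}

p_male = lpm_predict(southern_male)
p_fem = lpm_predict(female_phd)
print(f"southern male: {p_male:.3f}")  # a sensible probability
print(f"female PhD:    {p_fem:.3f}")   # negative: impossible as a probability
```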
LPM: Weaknesses
• Model yields nonsensical predicted values
• Probabilities should always fall between 0 and 1.
• Assumptions of OLS regression are violated
• Linearity
• Homoskedasticity (equal error variance across values of X): the error variance is p(1 − p), so it is low when the predicted probability is near 0 or 1 and highest near .5
• Normality of error distribution
– Coefficients (b) are not biased; but not “best” (i.e., lowest possible sampling variance)
– Variances & standard errors will be inaccurate
• Hypothesis tests (t-tests, F-tests) can’t be trusted
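Why the homoskedasticity assumption fails can be checked directly: for a 0/1 outcome with P(Y=1) = p, the error takes the value 1 − p with probability p and −p with probability 1 − p, so its variance is p(1 − p). A minimal Python sketch (the function name is hypothetical):

```python
def error_variance(p):
    """Variance of e = Y - p for a 0/1 outcome with P(Y=1) = p.
    e = 1 - p with probability p, and e = -p with probability 1 - p (mean 0)."""
    return p * (1 - p) ** 2 + (1 - p) * p ** 2  # simplifies to p * (1 - p)

# The variance changes with p, so it cannot be constant across values of X
for p in [0.05, 0.25, 0.5, 0.75, 0.95]:
    print(p, round(error_variance(p), 4))
```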
Logistic Regression
• Better Alternative: Logistic Regression
• Also called “Logit”
• A non-linear form of regression that works well for dichotomous dependent variables
• Other non-linear formulations also work (e.g., probit)
• Based on “odds” rather than probability
• Rather than model P(Y=1), we model the “log odds” of Y=1
• “Logit” refers to the natural log of an odds…
– Logistic regression is regression for a logit
• Rather than a simple variable “Y” (OLS)
• Or a probability (the Linear Probability Model).
Probability & Odds
• Probability of event A defined as p(A):
$$p(A) = \frac{\text{number of outcomes in which A occurs}}{\text{total number of outcomes}}$$
• Example: Coin flip… probability of “heads”
• 1 outcome is “heads”, 2 total possible outcomes
• P(“heads”) = 1 / 2 = .5
• Odds of A = Number of outcomes that are A, divided by number of outcomes that are not A
• Odds of “heads” = 1 / 1 = 1.0
• Also equivalent to: probability of the event over probability of it not happening: p/(1 − p) = .5/(1 − .5) = 1.0
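The two conversions are one-liners; here is a minimal Python sketch with hypothetical function names:

```python
def prob_to_odds(p):
    """Odds = p / (1 - p); defined for 0 <= p < 1."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)

print(prob_to_odds(0.5))   # coin flip: 1.0
print(prob_to_odds(0.75))  # 3.0, i.e., 3:1 odds
print(odds_to_prob(3.0))   # back to 0.75
```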
Logistic Regression
• We can convert a probability to odds:
$$\text{odds}_i = \frac{p_i}{1 - p_i}$$
• “Logit” = natural log (ln) of an odds
• Natural log means base e, not base 10
– We can model a logit as a function of independent variables:
$$\text{logit}(p_i) = L_i = \ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + \sum_{j=1}^{K} \beta_j X_{ji}$$
• Just as we model Y or a probability (the LPM)
Logistic Regression
• Note: We can solve for “p” and reformulate the model:
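$$P(Y_i = 1) = \frac{e^{\alpha + \sum_{j=1}^{K} \beta_j X_{ji}}}{1 + e^{\alpha + \sum_{j=1}^{K} \beta_j X_{ji}}} = \frac{1}{1 + e^{-\left(\alpha + \sum_{j=1}^{K} \beta_j X_{ji}\right)}}$$
• Why model this rather than a probability?
– Because it is a useful non-linear transformation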
• It always generates Ps between 0 and 1, regardless of the values of X variables
• Note: probit transformation has similar effect.
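A short Python sketch of the logit and its inverse (hypothetical helper names) shows the key property: whatever value the linear predictor takes, the resulting probability stays between 0 and 1. The inverse is written in a numerically stable form; in floating point, extremely large |L| would eventually round to exactly 0 or 1, so the loop stays within ±20.

```python
import math

def logit(p):
    """Log odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def inv_logit(L):
    """P(Y=1) = e^L / (1 + e^L), computed without overflowing for large |L|."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    z = math.exp(L)
    return z / (1.0 + z)

# Any value of the linear predictor maps into (0, 1)
for L in [-20, -4, 0, 4, 20]:
    print(L, inv_logit(L))
```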
Logistic Regression: Estimation
• Estimation: We can model the logit
• Recall: “Hat” = estimate…
$$\hat{L}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_K X_{Ki}$$
• Solution requires Maximum Likelihood Estimation (MLE)
• In OLS there was an algebraic solution
• Here, we allow the computer to “search” for the values of the coefficients (“a” and “b”s) that best fit the observed data.
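To illustrate what the computer’s “search” can look like, here is a Newton-Raphson sketch for a one-predictor logit, applied to the six pass/fail cases from Example 2. This is one standard way to maximize the log likelihood, assumed here for illustration; it is not necessarily the exact algorithm Stata uses, and `fit_logit` and `log_likelihood` are hypothetical names.

```python
import math

def fit_logit(xs, ys, tol=1e-10, max_iter=50):
    """MLE for P(Y=1) = 1 / (1 + exp(-(a + b*x))) via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = 0.0           # gradient (score) of the log likelihood
        i_aa = i_ab = i_bb = 0.0  # information matrix = minus the Hessian
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (a, b) += I^{-1} g, using the explicit 2x2 inverse
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

def log_likelihood(a, b, xs, ys):
    """Sum over cases of ln P(observed y)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

hours = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9]
passed = [1, 0, 1, 1, 0, 0]
a_hat, b_hat = fit_logit(hours, passed)
print(a_hat, b_hat, log_likelihood(a_hat, b_hat, hours, passed))
```

Because the logit log likelihood is concave, each step climbs toward the single maximum; the search breaks down only when the data are perfectly separated, in which case no finite maximum exists.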
Logistic Regression: Estimation
• Properties of Maximum Likelihood Estimation
– See Long & Freese 2003:69, Long 1997:54 for a summary
• “Consistent, efficient and asymptotically normal as N approaches infinity.” Large N = better!
– Rules of thumb regarding sample size
• N > 500 = fine; N < 100 can be worrisome
– Results aren’t necessarily wrong if N < 100
– But it is a possibility, and it is hard to know when problems crop up
• Plus ~10 cases per independent variable
• Eliason (1993) suggests minimum N ~ 60 for up to 5 IVs
– Higher N is needed if data are problematic due to:
• Multicollinearity
• Limited variation in dependent variable.
Logistic Regression
• Benefits of Logistic regression:
• You can now effectively model probability as a function of X variables
• You don’t have to worry about violations of OLS assumptions
• Predictions fall between 0 and 1
• Downsides
– You lose the “simple” interpretation of linear coefficients
• In a linear model, the effect of a unit change in X on Y is the same at every value of X
• In a non-linear model, the effect depends on where you start…
• Also, you can’t compute some familiar stats (e.g., the OLS R-square; Stata reports a “pseudo R2” instead).
Logistic Regression Example
• Stata output for gun ownership:
. logistic gun male educ income south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

• Note: Results aren’t that different from LPM
• We’re dealing with big effects, large sample…
• But, predicted probabilities & SEs will be better.
Interpreting Coefficients
• Raw coefficients (βs) show the effect of a 1-unit change in X on the log odds of Y=1
– Positive coefficients make “Y=1” more likely
• Negative coefficients mean “less likely”
– But, effects are not linear
• The effect of a unit change on p(Y=1) isn’t the same for all values of X!
– Rather, Xs have a linear effect on the “log odds”
• But, it is hard to think in units of “log odds”, so we need to do further calculations
• NOTE: the log-odds interpretation doesn’t work for probit!
Interpreting Coefficients
• Best way to interpret logit coefficients is to exponentiate them
• This converts from “log odds” to simple “odds”
• Exponentiation = opposite of natural log
– On a calculator use the “e^x” or “inverse ln” function
– Exponentiated coefficients are called odds ratios
• An odds ratio of 3.0 indicates odds are 3 times higher for each unit change in X
– Or, you can say the odds increase “by a factor of 3”.
• An odds ratio of .5 indicates the odds are cut in half for each unit change in X.
– Odds ratios < 1 indicate negative effects.
Interpreting Coefficients
• Example: Do you drink coffee?
• Y=1 indicates coffee drinkers; Y=0 indicates no coffee
• Key independent variable: Year in grad program
– Observed “raw” coefficient: b = 0.67
• A positive effect… each year increases log odds by .67
• But how big is it really?
– Exponentiation: e^.67 = 1.95
• Odds increase multiplicatively by 1.95
• If a person’s initial odds were 2.0 (2:1), an extra year of school would result in: 2.0 × 1.95 = 3.90
• The odds nearly DOUBLE for each unit change in X
– Net of other variables in the model…
Interpreting Coefficients
• Exponentiated coefficients (“odds ratios”) operate multiplicatively
• The effect on the odds is found by multiplying the odds by the odds ratio
– e^b of 1.0 means that a variable has no effect
• Multiplying anything by 1.0 results in the same value
– e^b > 1.0 means that the variable has a positive effect on the odds of “Y=1”
• e^b < 1.0 means that the variable has a negative effect
• Hint: Papers may present results as “raw” coefficients or odds ratios
• It is important to be aware of what you’re looking at
• If all coefficients are positive, they might be odds ratios!
Interpreting Coefficients
• To further aid interpretation, we can convert exponentiated coefficients to % change in odds
– Calculate: (exponentiated coef − 1) × 100%
• Ex: (e^.67 − 1) × 100% = (1.95 − 1) × 100% = 95%
• Interpretation: Every unit change in X (year of school) increases the odds of coffee drinking by 95%
• What about a 2-point change in X?
• Is it 2 × 95%? No!!! You must multiply odds ratios:
• (1.95 × 1.95 − 1) × 100% = (3.80 − 1) × 100% = +280%
– 3-point change = (1.95 × 1.95 × 1.95 − 1) × 100%
• N-point change = (OR^N − 1) × 100%
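The odds-ratio arithmetic can be written as a small Python function (hypothetical name). Because odds ratios multiply, an N-unit change corresponds to OR^N = e^(b·N). Using the unrounded e^.67 gives about +95.4% for one year and +281.9% for two; the slide’s +280% comes from rounding the odds ratio to 1.95.

```python
import math

def pct_change_in_odds(b, n=1):
    """Percent change in the odds for an n-unit change in X, given raw logit coefficient b.
    Odds ratios multiply, so an n-unit change gives OR**n = exp(b * n)."""
    return (math.exp(b * n) - 1) * 100

b_year = 0.67  # coffee example: raw coefficient for year in grad program
for n in (1, 2, 3):
    print(n, round(pct_change_in_odds(b_year, n), 1))
```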
Interpreting Coefficients
• What is the effect of a 1-unit decrease in X?
• No, you can’t flip the sign… it isn’t −95%
– You must invert odds ratios to see the opposite effect
• Additional year in school = (1.95 − 1) × 100% = +95%
• One year less: (1/1.95 − 1) × 100% = (.513 − 1) × 100% = −48.7%
• What is the effect of two variables together?
• To combine odds ratios you must multiply
– Ex: Have a mean advisor; b = 1.2; OR = e^1.2 = 3.32
• Effect of 1 additional year AND mean advisor:
• (1.95 × 3.32 − 1) × 100% = (6.47 − 1) × 100% = 547% increase in odds of coffee drinking…
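The same logic in Python: invert the odds ratio for a decrease, and multiply odds ratios (equivalently, add raw coefficients) to combine effects. Unrounded values give about −48.8% and +548.8%, versus −48.7% and +547% above with the odds ratios rounded to 1.95 and 3.32. Variable names are illustrative.

```python
import math

or_year = math.exp(0.67)    # odds ratio for one more year in the program
or_advisor = math.exp(1.2)  # odds ratio for having a mean advisor

# One year LESS: invert the odds ratio (do not just flip the sign of the % change)
pct_one_less_year = (1 / or_year - 1) * 100
# Both effects together: multiply the odds ratios (same as exp(0.67 + 1.2))
pct_year_and_advisor = (or_year * or_advisor - 1) * 100
print(round(pct_one_less_year, 1), round(pct_year_and_advisor, 1))
```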
Interpreting Coefficients
• Gun ownership: effect of education and gender, using the logit coefficients (. logistic gun male educ income south liberal, coef)
• Education: (e^−.0768 − 1) × 100% = −7.39%, i.e., 7.39% lower odds per additional year
• Also: Male: (e^.784 − 1) × 100% ≈ +119%: men’s odds are more than double women’s!
Raw Coefs vs. Odds ratios
• It is common to present results either way:
. logistic gun male educ income south liberal, coef
------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------

. logistic gun male educ income south liberal
------------------------------------------------------------------------------
         gun | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   2.189562   .3432446     5.00   0.000     1.610347    2.977112
        educ |    .926097   .0235272    -3.02   0.003     .8811137    .9733768
      income |   1.273367   .0628781     4.89   0.000     1.155904    1.402767
       south |    2.08823   .4132686     3.72   0.000     1.416845    3.077757
     liberal |    .848648    .049066    -2.84   0.005     .7577291    .9504762
------------------------------------------------------------------------------
Can you see the relationship? Negative coefficients yield odds ratios below 1.0; positive coefficients yield odds ratios above 1.0!
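Exponentiating the raw coefficients reproduces the odds-ratio table. A minimal Python check, with the coefficients copied from the output above (dictionary names are illustrative):

```python
import math

# Raw logit coefficients from the ", coef" version of the output
coefs = {"male": .7837017, "educ": -.0767763, "income": .2416647,
         "south": .7363169, "liberal": -.1641107}

odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, ratio in odds_ratios.items():
    # Negative coefficients map to odds ratios below 1, positive ones above 1
    print(f"{name:8s} {ratio:.6f}")
```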
Interpreting Interactions
• Interactions work like linear regression
. gen maleXincome = male * income
. logistic gun male educ income maleXincome south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(6)      =      93.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -500.93966                       Pseudo R2       =     0.0850

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   2.914016   1.186788     2.46   0.014     .5879542    5.240078
        educ |  -.0783493   .0254356    -3.08   0.002    -.1282022   -.0284964
      income |   .3595354   .0879431     4.09   0.000     .1871701    .5319008
 maleXincome |  -.1873155   .1030033    -1.82   0.069    -.3891982    .0145672
       south |   .7293419   .1987554     3.67   0.000     .3397886    1.118895
     liberal |  -.1671854   .0579675    -2.88   0.004    -.2807996   -.0535711
       _cons |   -3.58824   1.030382    -3.48   0.000     -5.60775   -1.568729
------------------------------------------------------------------------------

Income coef for women is .359. For men it is .359 + (−.187) = .172; exp(.172) ≈ 1.188
Combining odds ratios (by multiplying) gives identical results:
exp(.3595) × exp(−.1873) = 1.433 × .829 ≈ 1.188
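The two routes to the men’s income effect can be checked in Python (coefficients copied from the interaction model output; variable names are illustrative):

```python
import math

b_income = .3595354          # income effect for women (male = 0)
b_male_x_income = -.1873155  # additional income effect for men

# Route 1: add the interaction to the main effect on the log-odds scale
b_income_men = b_income + b_male_x_income
or_income_men = math.exp(b_income_men)
# Route 2: multiply the two odds ratios
or_income_men_alt = math.exp(b_income) * math.exp(b_male_x_income)
print(round(b_income_men, 3), round(or_income_men, 3), round(or_income_men_alt, 3))
```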
Predicted Probabilities
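• To determine predicted probabilities, first compute the predicted logit value:
$$\hat{L}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_K X_{Ki}$$
• Then, plug logit values back into the P formula:
$$\hat{P}(Y_i = 1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{1}{1 + e^{-\hat{L}_i}}$$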
Predicted Probabilities: Own a gun?
• Predicted probability for a female PhD student
• Highly educated northern liberal female
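$$\hat{L}_i = -2.28 + .78(0) - .077(20) + .24(4) + .73(0) - .16(7) \approx -4.0$$
$$\hat{P}(Y_i = 1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{-4.0}}{1 + e^{-4.0}} = \frac{.018}{1.018} \approx .018$$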
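The calculation can be sketched in Python with the unrounded logit coefficients; `predicted_logit` and `predicted_prob` are hypothetical helpers. The result, about .0179, agrees with the .017927 that Stata’s adjust command reports for this profile (male = 0, liberal = 7) in the table further below.

```python
import math

# Logit coefficients copied from the "logistic ..., coef" output
coefs = {"_cons": -2.28572, "male": .7837017, "educ": -.0767763,
         "income": .2416647, "south": .7363169, "liberal": -.1641107}

def predicted_logit(case):
    """L-hat = a + sum of b_j * X_j."""
    return coefs["_cons"] + sum(coefs[k] * v for k, v in case.items())

def predicted_prob(case):
    """P-hat(Y=1) = e^L / (1 + e^L)."""
    L = predicted_logit(case)
    return math.exp(L) / (1 + math.exp(L))

female_phd = {"male": 0, "educ": 20, "income": 4, "south": 0, "liberal": 7}
print(round(predicted_logit(female_phd), 3), round(predicted_prob(female_phd), 4))
```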
Predicted Probabilities
• Important point: Substantive effect of a variable on predicted probability differs depending on values of other variables
• If probability is already high (or low), variable changes may matter less…
– Suppose a 1-point change in X doubles the odds…
• The effect isn’t substantively consequential if P(Y=1) is already very high
– Ex: 20:1 odds = .952 probability; 40:1 odds = .976 probability
– Change in probability is only about .02
• The effect matters a lot for cases with probabilities near .5
– 1:1 odds = .5 probability; 2:1 odds = .67 probability
– Change in probability is about .17!
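A quick Python sketch of the odds-doubling comparison (the helper name is hypothetical):

```python
def odds_to_prob(odds):
    """p = odds / (1 + odds)."""
    return odds / (1 + odds)

deltas = {}
for base_odds in (1.0, 20.0):
    p_before = odds_to_prob(base_odds)
    p_after = odds_to_prob(2 * base_odds)  # a 1-unit change in X doubles the odds
    deltas[base_odds] = p_after - p_before
    print(f"odds {base_odds:g}:1 -> {2 * base_odds:g}:1, "
          f"p {p_before:.3f} -> {p_after:.3f}, change {deltas[base_odds]:.3f}")
```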
Logit Example: Own a gun?
• Predicted probability of gun ownership for a female PhD student is very low: P ≈ .018
– Two additional years of education lower the probability from .018 to .015; not a big effect
• An additional unit change can’t have a big effect, because the probability can’t go below zero
• It would matter much more for a southern male…
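$$\hat{L}_i = -2.28 + .78(0) - .077(22) + .24(4) + .73(0) - .16(7) \approx -4.16$$
$$\hat{P}(Y_i = 1) = \frac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}} = \frac{e^{-4.16}}{1 + e^{-4.16}} = \frac{.0156}{1.0156} \approx .015$$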
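The contrast can be sketched in Python by applying the same two-year increase in education to two profiles. The southern-male profile (male = 1, educ = 12, income = 6, south = 1, liberal = 0) is hypothetical, borrowed from the LPM example; the helper names are illustrative. The drop for the female PhD is about .0025, versus roughly .04 for the southern male.

```python
import math

# Logit coefficients copied from the "logistic ..., coef" output
coefs = {"_cons": -2.28572, "male": .7837017, "educ": -.0767763,
         "income": .2416647, "south": .7363169, "liberal": -.1641107}

def predicted_prob(case):
    """P-hat(Y=1) = 1 / (1 + e^(-L)), where L = a + sum of b_j * X_j."""
    L = coefs["_cons"] + sum(coefs[k] * v for k, v in case.items())
    return 1 / (1 + math.exp(-L))

def change_from_more_educ(case, extra_years=2):
    """Change in predicted probability if this case had extra_years more education."""
    more_educ = dict(case, educ=case["educ"] + extra_years)
    return predicted_prob(more_educ) - predicted_prob(case)

female_phd = {"male": 0, "educ": 20, "income": 4, "south": 0, "liberal": 7}
southern_male = {"male": 1, "educ": 12, "income": 6, "south": 1, "liberal": 0}  # hypothetical
d_female = change_from_more_educ(female_phd)
d_male = change_from_more_educ(southern_male)
print(f"female PhD:    {predicted_prob(female_phd):.4f}, change {d_female:+.4f}")
print(f"southern male: {predicted_prob(southern_male):.4f}, change {d_male:+.4f}")
```

The odds ratio for education is identical for both people; only the starting probability differs, and that is what drives the gap in the probability change.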
Predicted Probabilities
• Predicted probabilities are a great way to make findings accessible to a reader
– Often people make bar graphs of probabilities
– 1. Show predicted probabilities for real cases
• Ex: probability of civil war for Ghana vs. Sweden
– 2. Show probabilities for “hypothetical” cases that exemplify key contrasts in your data
• Ex: Guns: Southern male vs. female PhD student
– 3. Show how a change in critical independent variable would affect predicted probability
• Ex: Guns: What would happen to southern male who went and got a PhD?
Predicted Probabilities: Stata
• Like OLS regression, we can calculate predicted values for all cases
. predict predprob, pr
(1488 missing values generated)

. list predprob gun if gun ~= .

     +----------------+
     | predprob   gun |
     |----------------|
  1. |  .486874     0 |
  2. | .6405225     1 |
  6. | .7078031     1 |
  9. | .6750654     1 |
 14. | .4243994     0 |
     |----------------|
 17. | .0617232     0 |
 19. | .6556235     1 |
 22. | .6356462     0 |
 27. | .3670604     0 |
 32. | .5620316     0 |
Many of the predictions are pretty good
But, some aren’t!
Predicted Probabilities: Stata
• “Adjust” (Stata 9/10) and “margins” (Stata 11) commands can produce predicted values for different groups in your data
• Also, you can set variables at their means or at specific values
• Example: Probabilities for men/women
. adjust, pr by(male)

------------------------------------------------------------------
     Dependent variable: gun     Command: logistic
Variables left as is: educ, income, south, liberal

----------------------
     male |         pr
----------+-----------
        0 |    .225814
        1 |    .417045
----------------------
Note that the predicted probability for men is nearly twice as high as for women.
Stata Notes: Adjust Command
• Stata “adjust” command can be tricky
– 1. By default it uses the entire sample, not just cases in your prior analysis
• Best to specify prior sample:
• adjust if e(sample), pr by(male)
– 2. For non-specified variables, Stata uses group means (groups defined by the “by()” option)
• Don’t assume it pegs cases to the overall sample mean
• Variables “left as is” take on the means for subgroups
– 3. It doesn’t take into account weighted data
• Use “lincom” if you have weighted data
Marginal Change in Logit
• Issue: How to best capture effect size in non-linear models?
– % change in odds (from odds ratios) for 1-unit change in X
– Change in actual probability for 1-unit change in X
• Either for hypothetical cases or an actual case
• Another option: marginal change
• The actual slope of the curve at a specific point
• Again, can be computed for real or hypothetical cases
• Use “adjust” (Stata 9/10) or “margins” (Stata 11)
– Recall from calculus: derivatives are slopes...
• So, a marginal change is just a derivative.
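For the logit, the derivative has a simple closed form: dP/dX_j = β_j · p(1 − p), so the slope is steepest (β_j/4) when p = .5 and shrinks toward zero as p approaches 0 or 1. The Python sketch below computes it for education at the female PhD profile (logit ≈ −4.0) and checks it against a numerical derivative; names are illustrative, and this is a hand calculation, not the output of Stata’s margins command.

```python
import math

def inv_logit(L):
    return 1 / (1 + math.exp(-L))

def marginal_effect(b_j, L):
    """Slope of P(Y=1) with respect to X_j at a point with logit L: b_j * p * (1 - p)."""
    p = inv_logit(L)
    return b_j * p * (1 - p)

b_educ = -.0767763
L_female_phd = -4.0033621  # predicted logit for the female PhD profile

analytic = marginal_effect(b_educ, L_female_phd)
# Numerical check: nudge educ up and down by a tiny h (the logit moves by b_educ * h)
h = 1e-6
numeric = (inv_logit(L_female_phd + b_educ * h)
           - inv_logit(L_female_phd - b_educ * h)) / (2 * h)
print(analytic, numeric)
```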
Predicted Probabilities: Stata
• Effect of pol views & gender for PhD students
. adjust south=0 income=4 educ=20, pr by(liberal male)
------------------------------------------------------------
     Dependent variable: gun     Command: logistic
Covariates set to value: south = 0, income = 4, educ = 20
----------------------------
          |       male
  liberal |       0        1
----------+-----------------
        1 | .046588  .096652
        2 | .039818  .083241
        3 | .033996  .071544
        4 |    .029   .06138
        5 | .024719  .052578
        6 | .021057  .044978
        7 | .017927  .038433
Note that independent variables are set to values of interest. (Or can be set to mean).
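As a cross-check, the grid can be recomputed directly from the logit coefficients. This Python sketch assumes every covariate is fixed at a specific value, as in this adjust command; it does not reproduce adjust’s group-mean behavior for variables left as is. Names are illustrative.

```python
import math

# Logit coefficients copied from the "logistic ..., coef" output
coefs = {"_cons": -2.28572, "male": .7837017, "educ": -.0767763,
         "income": .2416647, "south": .7363169, "liberal": -.1641107}

def predicted_prob(case):
    """P-hat(Y=1) = 1 / (1 + e^(-L)), where L = a + sum of b_j * X_j."""
    L = coefs["_cons"] + sum(coefs[k] * v for k, v in case.items())
    return 1 / (1 + math.exp(-L))

# Same settings as the adjust command: south = 0, income = 4, educ = 20
grid = {}
for liberal in range(1, 8):
    for male in (0, 1):
        case = {"male": male, "educ": 20, "income": 4, "south": 0, "liberal": liberal}
        grid[(liberal, male)] = predicted_prob(case)
    print(liberal, f"{grid[(liberal, 0)]:.6f}", f"{grid[(liberal, 1)]:.6f}")
```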
Graphing Predicted Probabilities
• P(Y=1) for Women & Men by Liberal
• scatter Women Men Liberal, c(l l)
[Figure: predicted P(Y=1) for women and men (y axis, .02 to .10) plotted against Liberal (x axis, 0 to 8)]
Did model categorize cases correctly?
• We can choose a criterion: predicted P ≥ .5
. estat clas

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |        64            48  |        112
     -     |       229           509  |        738
-----------+--------------------------+-----------
   Total   |       293           557  |        850

Classified + if predicted Pr(D) >= .5
True D defined as gun != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   21.84%
Specificity                     Pr( -|~D)   91.38%
Positive predictive value       Pr( D| +)   57.14%
Negative predictive value       Pr(~D| -)   68.97%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)    8.62%
False - rate for true D         Pr( -| D)   78.16%
False + rate for classified +   Pr(~D| +)   42.86%
False - rate for classified -   Pr( D| -)   31.03%
--------------------------------------------------
Correctly classified                        67.41%
--------------------------------------------------
The model yields predicted p>.5 for 112 people; only 64 of them actually have guns
Overall, this simple model doesn’t offer extremely accurate predictions…
67% of people are correctly classified
Note: Results change if you use a different criterion (e.g., p > .6)
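All of the summary rates come straight from the four cells of the classification table. A minimal Python sketch, with counts copied from the output above (variable names are illustrative):

```python
# Counts from the estat classification table (cutoff: predicted Pr >= .5)
tp, fp = 64, 48    # classified +: true gun owners, true non-owners
fn, tn = 229, 509  # classified -: true gun owners, true non-owners

sensitivity = tp / (tp + fn)  # Pr(+ | D)
specificity = tn / (tn + fp)  # Pr(- | ~D)
ppv = tp / (tp + fp)          # Pr(D | +)
npv = tn / (tn + fn)          # Pr(~D | -)
accuracy = (tp + tn) / (tp + fp + fn + tn)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("correctly classified", accuracy)]:
    print(f"{name:22s} {100 * value:.2f}%")
```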
Sensitivity / Specificity of Prediction
• Sensitivity: Of gun owners, what proportion were correctly predicted to own a gun?
• Specificity: Of non-gun owners, what proportion did we correctly predict?
• Choosing a different probability cutoff affects those values
• If we reduce the cutoff to P > .4, we’ll catch a higher proportion of gun owners (higher sensitivity)
• But, we’ll also incorrectly classify more non-gun owners as owners: more false positives, lower specificity.
Sensitivity / Specificity of Prediction
• Stata can produce a plot showing how predictions will change if we vary “P” cutoff:
• Stata command: lsens
[Figure: lsens plot of Sensitivity and Specificity (y-axis, 0.00 to 1.00) against Probability cutoff (x-axis, 0.00 to 1.00)]
Hypothesis tests
• Testing hypotheses using logistic regression
• H0: There is no effect of year in grad program on coffee drinking
• H1: Year in grad school is associated with coffee drinking
– Or, one-tail test: Year in school increases probability of coffee drinking
– MLE estimation yields standard errors… like OLS
– Test statistic: 2 options; both yield the same results
• z = b/SE… just like the t statistic in OLS regression (Stata labels it z because it is compared to the standard normal distribution)
• Wald test (chi-square, 1 df): the square of z
– Reject H0 if |z| or Wald exceeds its critical value
• Or if the p-value is less than alpha (usually .05).
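Both test statistics can be reproduced from a coefficient and its standard error. This Python sketch uses the male coefficient from the gun model output in these notes. wald_test is an illustrative helper, and the p-value is the two-tailed standard normal area.

```python
import math


def wald_test(b, se):
    """Return z = b/SE, the 1-df Wald chi-square (z squared), and a two-tailed p-value."""
    z = b / se
    wald = z ** 2
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| > |z|) for standard normal Z
    return z, wald, p_two_tailed


# 'male' coefficient and standard error from the gun model output
z, wald, p = wald_test(0.7837017, 0.156764)
print(f"z = {z:.2f}, Wald = {wald:.2f}, p = {p:.2g}")
```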
Model Fit: Likelihood Ratio Tests
• MLE computes a likelihood for the model
• “Better” models have higher likelihoods
• Log likelihood is typically a negative value, so “better” means a less negative value… -100 > -1000
• Likelihood ratio test: Allows comparison of any two nested models
• One model’s variables must be a subset of the other model’s variables
– You can’t compare totally unrelated models!
• Models must use the exact same sample.
Model Fit: Likelihood Ratio Tests
• Default LR test comparison: Current model versus “null model”
• Null model = only a constant; no covariates; K=0
• Also useful: Compare small & large model
• Do added variables (as a group) fit the data better?
– Ex: Suppose a theory suggests 4 psychological variables will have an important effect…
• We could use LR test to compare “base model” to model with 4 additional variables.
• Stata: Run the first model; store its estimates (estimates store); run the second model; then compare the models with the command “lrtest”
Model Fit: Likelihood Ratio Tests
• Likelihood ratio test is based on the G-square
• Chi-square distributed; df = K1 – K0
• K = # variables; K1 = full model, K0 = simpler model
• L1 = likelihood for full model; L0 = simpler model
$$G^2 = 2\ln\left(\frac{L_1}{L_0}\right) = 2\ln L_1 - 2\ln L_0$$
• Significant likelihood ratio test indicates that the larger model (L1) is an improvement
• G² > critical value; or p-value < .05.
Model Fit: Likelihood Ratio Tests
• Stata’s default LR test compares the model to the null model
. logistic gun male educ income south liberal, coef

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7837017    .156764     5.00   0.000     .4764499    1.090954
        educ |  -.0767763   .0254047    -3.02   0.003    -.1265686    -.026984
      income |   .2416647   .0493794     4.89   0.000     .1448828    .3384466
       south |   .7363169   .1979038     3.72   0.000     .3484327    1.124201
     liberal |  -.1641107   .0578167    -2.84   0.005    -.2774294   -.0507921
       _cons |   -2.28572   .6200443    -3.69   0.000    -3.500984   -1.070455
------------------------------------------------------------------------------
LR Chi2(5) indicates G-square for 5 degrees of freedom
Prob > chi2 is a p-value. p < .05 indicates a significantly better model
Model log likelihood = -502.7; the null model’s log likelihood is lower (more negative)
Model Fit: Likelihood Ratio Tests
• Example: Null model log likelihood: -547.5; Full model: -502.7
• 5 new variables, so K1 – K0 = 5.
$$G^2 = 2\ln\left(\frac{L_1}{L_0}\right) = 2\ln L_1 - 2\ln L_0$$
• According to a χ² table (df = 5, α = .05), the critical value is 11.07
• Since 89.5 greatly exceeds 11.07, we are confident that the full model is an improvement
• Also, the p-value in the Stata output is reported as 0.0000 (i.e., p < .0001)!
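$$G^2 = 2\ln L_1 - 2\ln L_0 = 2(-502.7) - 2(-547.5) = 89.6$$
(Stata reports 89.53, computed from the unrounded log likelihoods.)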
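The worked example can be checked in code. This sketch computes G² from two log likelihoods and then finds a chi-square p-value using only Python's math module. chi2_sf is a hand-written helper that stands in for a statistics library: it uses the closed-form upper tail that exists for integer degrees of freedom. lr_test is also an illustrative name.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        half = x / 2.0
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-tailed normal area plus a finite correction series
    root = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
        series += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * series


def lr_test(ll_small, ll_large, k_small, k_large):
    """G-square = 2 lnL1 - 2 lnL0, with df = K1 - K0 and its chi-square p-value."""
    g2 = 2.0 * ll_large - 2.0 * ll_small
    df = k_large - k_small
    return g2, df, chi2_sf(g2, df)


# Rounded log likelihoods from the example: null model (K0 = 0) vs. full model (K1 = 5)
g2, df, p = lr_test(ll_small=-547.5, ll_large=-502.7, k_small=0, k_large=5)
print(f"G2 = {g2:.1f}, df = {df}, p = {p:.1e}")
print(f"P(chi2(5) > 11.07) = {chi2_sf(11.07, 5):.3f}")
```

The second line confirms that 11.07 is the .05 critical value for 5 df.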
Model Fit: Pseudo R-Square
• Pseudo R-square
• “A descriptive measure that indicates roughly the proportion of observed variation accounted for by the… predictors.” Knoke et al., p. 313

Logistic regression                               Number of obs   =        850
                                                  LR chi2(5)      =      89.53
                                                  Prob > chi2     =     0.0000
Log likelihood = -502.7251                        Pseudo R2       =     0.0818

------------------------------------------------------------------------------
         gun | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   2.189562   .3432446     5.00   0.000     1.610347    2.977112
        educ |    .926097   .0235272    -3.02   0.003     .8811137    .9733768
      income |   1.273367   .0628781     4.89   0.000     1.155904    1.402767
       south |    2.08823   .4132686     3.72   0.000     1.416845    3.077757
     liberal |    .848648    .049066    -2.84   0.005     .7577291    .9504762
------------------------------------------------------------------------------
Pseudo R2 = .0818: loosely, the model explains roughly 8% of the variation in Y
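After logit/logistic, Stata's Pseudo R2 is McFadden's measure, 1 − lnL1/lnL0. Using that definition, this sketch first recovers the null log likelihood from the reported LR chi2 (LR chi2 = 2lnL1 − 2lnL0) and then reproduces the .0818. mcfadden_pseudo_r2 is an illustrative name.

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo R-square: 1 - lnL(model) / lnL(null)."""
    return 1.0 - ll_model / ll_null


ll_model = -502.7251               # "Log likelihood" in the Stata output
lr_chi2 = 89.53                    # "LR chi2(5)" in the Stata output
ll_null = ll_model - lr_chi2 / 2.0 # since LR chi2 = 2 lnL1 - 2 lnL0

print(round(mcfadden_pseudo_r2(ll_model, ll_null), 4))
```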
Assumptions & Problems
• Assumption: Independent random sample
• Serial correlation or clustering violates this assumption and biases SE estimates and hypothesis tests
• We will discuss possible remedies in the future
• Multicollinearity: High correlation among independent variables causes problems
• Unstable, inefficient estimates
• Watch for coefficient instability; check VIF/tolerance
• Remove unneeded variables or create indexes of related variables.
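With exactly two predictors, VIF and tolerance are easy to compute by hand. Each predictor's R² from a regression on the other is just their squared correlation, so tolerance = 1 − r² and VIF = 1/(1 − r²). The sketch below handles only that two-predictor special case, and the data are hypothetical. With more predictors, R² for x_j comes from regressing x_j on all the other predictors.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF and tolerance for either predictor when the model has only these two."""
    tolerance = 1.0 - pearson_r(x1, x2) ** 2
    return 1.0 / tolerance, tolerance


# Hypothetical predictor values (illustration only)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif, tol = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.2f}, tolerance = {tol:.2f}")
```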
Assumptions & Problems
• Outliers/Influential cases
• Unusual/extreme cases can distort results, just like OLS
– Logistic requires different influence statistics
• Example: dbeta – very similar to OLS “Cook’s D”
– Outlier diagnostics are available in Stata
• After the model: “predict outliervar, dbeta”
• Lists & graphs of residuals & dbetas can identify influential cases.
Plotting Residuals by Casenumber
• predict sresid, rstandard
• gen casenum = _n
• scatter sresid casenum
[Figure: standardized Pearson residual (y-axis, -2 to 3) plotted against casenum (x-axis, 0 to 3000)]
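The rstandard option produces standardized Pearson residuals. As a rough sketch, assume each case has its own covariate pattern (Stata actually computes these residuals by covariate pattern). Then one case's Pearson residual is (y − p)/√(p(1 − p)), and the standardized version divides that by √(1 − h), where h is the case's leverage. The helper names and the example values are hypothetical.

```python
import math


def pearson_residual(y, p):
    """Pearson residual for a 0/1 outcome with predicted probability p."""
    return (y - p) / math.sqrt(p * (1.0 - p))


def standardized_pearson_residual(y, p, leverage):
    """Pearson residual divided by sqrt(1 - h), where h is the case's leverage."""
    return pearson_residual(y, p) / math.sqrt(1.0 - leverage)


# Hypothetical case: a gun owner (y = 1) with predicted P = .05 and leverage .01
print(round(standardized_pearson_residual(1, 0.05, 0.01), 2))
```

An owner whom the model considered very unlikely to own a gun produces a large positive residual, which is the kind of point that stands out in the plot.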
Assumptions & Problems
• Insufficient variance: You need cases for both values of the dependent variable
• Extremely rare (or common) events can be a problem
• Suppose N=1000, but only 3 are coded Y=1
• Estimates won’t be great
• Also: Maximum likelihood estimates cannot be computed if any independent variable perfectly predicts the outcome (Y=1)
• Ex: Suppose taking sociology classes drives all students to drink coffee... so there is no variation in Y among those students…
– In that case, you cannot include a dummy variable for taking sociology classes in the model.
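When a dummy perfectly predicts Y=1, the log likelihood keeps rising as that dummy's coefficient grows, so no finite maximum exists. The sketch below shows this on a small hypothetical data set that is not from the course, using an illustrative log_likelihood function. It holds the constant at 0, which is optimal here because exactly half of the non-sociology students drink coffee.

```python
import math


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log likelihood for a one-predictor logit model."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log P(Y=1) = -log(1 + e^-eta); log P(Y=0) = -log(1 + e^eta)
        total -= math.log1p(math.exp(-eta)) if y == 1 else math.log1p(math.exp(eta))
    return total


# Hypothetical data: x = 1 if the student took sociology classes, y = 1 if they drink coffee
xs = [0, 0, 0, 0, 1, 1, 1]
ys = [0, 1, 0, 1, 1, 1, 1]  # every sociology student drinks coffee

for b1 in (1, 5, 10, 20):
    print(b1, round(log_likelihood(0.0, b1, xs, ys), 6))
```

The values keep creeping toward an upper bound without reaching it, so the estimation routine has no finite coefficient to settle on.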
Assumptions & Problems
• Model specification / Omitted variable bias
• Just like any regression model, it is critical to include appropriate variables in the model
• Omission of important factors or ‘controls’ will lead to misleading results.
Probit
• Probit models are an alternative to logistic regression
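• Involves a different non-linear transformation (the standard normal CDF instead of the logistic)
• Generally yields results very similar to logit models
– Coefficients differ in scale: logit coefficients are (approx) 1.6 times the corresponding probit coefficients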
– For ‘garden variety’ analyses, there is little reason to prefer either logit or probit
• But, probit has advantages in some circumstances
– Ex: Multinomial models that violate the IIA assumption (to be discussed later).
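Why roughly 1.6? The logistic CDF evaluated at 1.6z stays close to the standard normal CDF Φ(z). A logit coefficient therefore has to be about 1.6 times the probit coefficient to trace a similar probability curve. Here is a quick numeric check in Python; the grid and the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (the probit link)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def logistic_cdf(z):
    """Logistic CDF (the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))


grid = [i / 100.0 for i in range(-400, 401)]
gap_scaled = max(abs(normal_cdf(z) - logistic_cdf(1.6 * z)) for z in grid)
gap_unscaled = max(abs(normal_cdf(z) - logistic_cdf(z)) for z in grid)
print(f"max |Phi(z) - Lambda(1.6 z)| = {gap_scaled:.3f}")
print(f"max |Phi(z) - Lambda(z)|     = {gap_unscaled:.3f}")
```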
Example: Coup d’etat
• Issue: Many countries face the threat of a coup d’etat – violent overthrow of the regime
• What factors affect whether a country will have a coup?
• Paper Handout: Belkin and Schofer (2005)
• What are the basic findings?
• How much do the odds of a coup differ for military regimes vs. civilian governments?
– b=1.74; (e^1.74 - 1)*100% = +470%
• What about a 2-point increase in log GDP?
– b=-.233; ((e^-.233 * e^-.233) - 1)*100% = -37%
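Both answers apply one formula: the percent change in the odds for a Δ-unit change in x is (e^(bΔ) − 1) × 100%. Because e^(2b) = e^b · e^b, this formula matches the slide's two-step product for the 2-point case. The Python sketch below uses an illustrative helper name and the coefficients quoted above.

```python
import math


def pct_change_in_odds(b, delta=1.0):
    """Percent change in the odds for a delta-unit increase in x: (exp(b * delta) - 1) * 100."""
    return (math.exp(b * delta) - 1.0) * 100.0


military = pct_change_in_odds(1.74)             # military vs. civilian regime
gdp = pct_change_in_odds(-0.233, delta=2)       # 2-point increase in log GDP
print(round(military), round(gdp))
```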