![Page 1: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/1.jpg)
Lecture 18: Ordinal and Polytomous Logistic Regression
BMTRY 701
Biostatistical Methods II
![Page 2: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/2.jpg)
Categorical Outcomes
Logistic regression is appropriate for binary outcomes.
What about other kinds of categorical data?
• >2 categories
• ordinal data
Standard logistic regression is not applicable unless you ‘threshold’ the data or collapse categories.
See BMTRY 711: Analysis of Categorical Data. This is just an overview.
![Page 3: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/3.jpg)
Ordinal Logistic Regression
Ordinal Dependent Variable
• Teaching experience
• SES (high, middle, low)
• Degree of agreement
• Ability level (e.g. literacy, reading)
• Severity of disease/outcome
• Severity of toxicity
Context is important. Example: attitudes towards smoking
![Page 4: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/4.jpg)
Proportional Odds Model
One of several possible regression models for the analysis of ordinal data, and also the most common.
Model predicts the ln(odds) of being in category j or beyond.
Simplifying assumption: “proportional odds”
• Effect of covariate assumed to be invariant across splits
• Example: 4 categories (0, 1, 2, 3) give three splits
    0 vs 1,2,3
    0,1 vs 2,3
    0,1,2 vs 3
• Assumes that each of these comparisons yields the same odds ratio
![Page 5: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/5.jpg)
Motivating Example: YTS
The South Carolina Youth Tobacco Survey (SC YTS) is part of the National Youth Tobacco Survey program sponsored by the Centers for Disease Control and Prevention. The YTS is an annual school-based survey designed to evaluate youth-related smoking practices, including initiation and prevalence, cessation, attitudes towards smoking, media influences, and more. The SC YTS is coordinated by the SC Department of Health and Environmental Control and has been administered yearly since 2005. Data for this report are based on years 2005-2007. The SC YTS uses a two-stage sample cluster design to select a representative sample of public middle (grades 6-8) and high school (grades 9-12) students.
![Page 6: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/6.jpg)
Ordinal Outcomes
. tab cr44

 “do you think |
       smoking |
    cigarettes |
   makes young |
   people look |
   cool or fit |
          in?” |      Freq.     Percent        Cum.
---------------+-----------------------------------
definitely yes |        460        6.07        6.07
  probably yes |        818       10.79       16.86
  probably not |      1,329       17.53       34.38
definitely not |      4,975       65.62      100.00
---------------+-----------------------------------
         Total |      7,582      100.00

.

 “do you think |
  young people |
  risk harming |
 themselves if |
    they smoke |
    from 1 - 5 |
          ciga |      Freq.     Percent        Cum.
---------------+-----------------------------------
definitely yes |      5,387       70.98       70.98
  probably yes |      1,283       16.91       87.89
  probably not |        360        4.74       92.63
definitely not |        559        7.37      100.00
---------------+-----------------------------------
         Total |      7,589      100.00
![Page 7: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/7.jpg)
What factors are related to these attitudes?
Gender?
Grade?
Race?
Parental education (surrogate for SES)?
Year? (2005, 2006, 2007)
Have tried cigarettes?
School performance?
Smoker in the home?
![Page 8: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/8.jpg)
Tabulation of gender vs. look cool
 “do you think |
       smoking |
    cigarettes |
   makes young |
   people look |
   cool or fit |        gender
          in?” |         0          1 |     Total
---------------+----------------------+----------
definitely yes |       278        177 |       455
  probably yes |       446        364 |       810
  probably not |       692        628 |     1,320
definitely not |     2,158      2,797 |     4,955
---------------+----------------------+----------
         Total |     3,574      3,966 |     7,540
![Page 9: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/9.jpg)
Possible “breaks”
OR = 1.81
             male   female
def yes       278      177
else         3296     3789

OR = 1.61
             male   female
yes           724      541
no           2850     3425

OR = 1.57
             male   female
else         1416     1169
def no       2158     2797
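The three "breaks" can be recomputed directly from the gender tabulation. The following is a minimal Python sketch, not part of the lecture (the lecture itself uses Stata and R); the helper name `split_odds_ratio` is hypothetical. Each odds ratio compares males to females on being above the break rather than below it, and from the tabulated counts they come out to roughly 1.81, 1.61 and 1.57.

```python
# Counts from the gender-by-"look cool" tabulation: (male, female) per response level.
counts = {
    "definitely yes": (278, 177),
    "probably yes": (446, 364),
    "probably not": (692, 628),
    "definitely not": (2158, 2797),
}
levels = list(counts)  # ordered from "definitely yes" to "definitely not"


def split_odds_ratio(n_upper):
    """Odds ratio (male vs. female) for the first n_upper levels vs. the rest."""
    male_upper = sum(counts[lev][0] for lev in levels[:n_upper])
    female_upper = sum(counts[lev][1] for lev in levels[:n_upper])
    male_lower = sum(counts[lev][0] for lev in levels[n_upper:])
    female_lower = sum(counts[lev][1] for lev in levels[n_upper:])
    # Cross-product ratio of the collapsed 2x2 table.
    return (male_upper * female_lower) / (female_upper * male_lower)


odds_ratios = [split_odds_ratio(n) for n in (1, 2, 3)]
for n, value in zip((1, 2, 3), odds_ratios):
    print(f"break after level {n}: OR = {value:.2f}")
```

If the three values are close to each other, as they are here, a single common odds ratio is a reasonable summary.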
![Page 10: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/10.jpg)
Proportional Odds Assumption
How to implement this? Model the ‘cumulative’ logits.

Instead of

$$\text{odds} = \frac{P(y = 1)}{1 - P(y = 1)}$$

here we have

$$\text{odds} = \frac{P(y \le k)}{1 - P(y \le k)}$$
![Page 11: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/11.jpg)
The (simple) ordinal logistic model
$$\ln\left[\frac{P(y \le k \mid x)}{1 - P(y \le k \mid x)}\right] = \alpha_k + \beta_1 x; \quad k = 1, \ldots, K-1$$
Warning! Different packages parameterize it in different ways! Stata codes it differently than SAS and R.
Notice how this differs from logistic regression: there is a ‘level’-specific intercept.
But, there is just ONE log odds ratio describing the association between x and y.
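A short derivation shows why a single slope gives the same odds ratio at every split. It assumes the cumulative-logit form with level-specific intercepts $\alpha_k$ and one common slope $\beta_1$; the sign convention is an assumption and varies by package. Comparing a one-unit increase in $x$ at any fixed $k$:

```latex
\frac{\operatorname{odds}(y \le k \mid x + 1)}{\operatorname{odds}(y \le k \mid x)}
  = \frac{\exp\{\alpha_k + \beta_1 (x + 1)\}}{\exp\{\alpha_k + \beta_1 x\}}
  = \exp(\beta_1),
  \qquad k = 1, \ldots, K-1
```

The intercept $\alpha_k$ cancels, so the odds ratio does not depend on $k$. That is the proportional odds assumption.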
![Page 12: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/12.jpg)
Example
. ologit lookcool gender
Iteration 0:   log likelihood = -7465.0108
Iteration 1:   log likelihood = -7418.1251
Iteration 2:   log likelihood = -7418.0256

Ordered logistic regression                       Number of obs   =       7540
                                                  LR chi2(1)      =      93.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -7418.0256                       Pseudo R2       =     0.0063

------------------------------------------------------------------------------
    lookcool |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .4605529   .0476442     9.67   0.000      .367172    .5539338
-------------+----------------------------------------------------------------
       /cut1 |  -2.525572   .0529346                     -2.629322   -2.421823
       /cut2 |  -1.375663   .0380258                     -1.450193   -1.301134
       /cut3 |  -.4159722   .0336987                     -.4820204    -.349924
------------------------------------------------------------------------------
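To see what these numbers mean, it helps to turn them back into probabilities. The sketch below is in Python rather than Stata and is only an illustration; `invlogit` and `category_probs` are hypothetical helper names. It relies on Stata's ologit convention, P(y ≤ k | x) = invlogit(cut_k − x·β), and plugs in the coefficient and cutpoints from the output above. The fitted category probabilities land within about one percentage point of the observed column proportions in the gender tabulation, and exp(β) is about 1.58.

```python
import math


def invlogit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Estimates copied from the ologit output.
beta_gender = 0.4605529
cuts = [-2.525572, -1.375663, -0.4159722]


def category_probs(gender):
    """Fitted P(y = 1), ..., P(y = 4) for gender coded 0 or 1."""
    # Stata's ologit convention: P(y <= k | x) = invlogit(cut_k - x * beta).
    cumulative = [invlogit(c - beta_gender * gender) for c in cuts] + [1.0]
    # Difference successive cumulative probabilities to get each category.
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


probs_gender0 = category_probs(0)
probs_gender1 = category_probs(1)
odds_ratio = math.exp(beta_gender)  # common odds ratio across all three splits

print([round(p, 3) for p in probs_gender0])
print([round(p, 3) for p in probs_gender1])
print(round(odds_ratio, 2))
```

Because β enters with a minus sign inside the cumulative probability, a positive coefficient shifts gender = 1 toward the higher categories (here, toward "definitely not").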
![Page 13: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/13.jpg)
R estimation
Different parameterization
Makes you think about what the model is doing!
$$\ln\left[\frac{P(y \ge k \mid x)}{1 - P(y \ge k \mid x)}\right] = \alpha_k + \beta_1 x; \quad k = 2, \ldots, K$$
![Page 14: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/14.jpg)
> library(Design)
> oreg <- lrm(lookcool ~ gender, data=data)
> oreg
Logistic Regression Model

lrm(formula = lookcool ~ gender, data = data)

Frequencies of Responses
   1    2    3    4
 455  810 1320 4955

Frequencies of Missing Values Due to Each Variable
lookcool   gender
     196       50

  Obs  Max Deriv  Model L.R.  d.f.  P      C    Dxy
 7540      2e-12       93.97     1  0  0.552  0.104
Gamma  Tau-a     R2  Brier
0.206  0.054  0.014  0.056

         Coef    S.E.     Wald Z  P
y>=2     2.5256  0.05293  47.71   0
y>=3     1.3757  0.03803  36.18   0
y>=4     0.4160  0.03370  12.34   0
gender   0.4606  0.04764   9.67   0
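The two parameterizations describe the same fitted model. As a rough check, sketched in Python with hypothetical helper names, the lrm form P(y ≥ k | x) = invlogit(α_k + x·β) and the Stata form P(y ≤ k | x) = invlogit(cut_k − x·β) should give the same P(y ≥ k) once the Stata cumulative probability is subtracted from 1. In effect the lrm intercepts are the Stata cutpoints with the sign flipped.

```python
import math


def invlogit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# R (lrm) estimates: intercepts are labelled y>=2, y>=3, y>=4.
lrm_intercepts = {2: 2.5256, 3: 1.3757, 4: 0.4160}
lrm_beta = 0.4606

# Stata (ologit) estimates: /cut1, /cut2, /cut3.
stata_cuts = {1: -2.525572, 2: -1.375663, 3: -0.4159722}
stata_beta = 0.4605529


def p_at_least_lrm(k, gender):
    """P(y >= k) under the lrm parameterization."""
    return invlogit(lrm_intercepts[k] + lrm_beta * gender)


def p_at_least_stata(k, gender):
    """P(y >= k) = 1 - P(y <= k - 1) under the Stata parameterization."""
    return 1.0 - invlogit(stata_cuts[k - 1] - stata_beta * gender)


# Largest disagreement over all splits and both gender codes (rounding only).
max_gap = max(
    abs(p_at_least_lrm(k, g) - p_at_least_stata(k, g))
    for k in (2, 3, 4)
    for g in (0, 1)
)
print(max_gap)
```

Any remaining gap comes only from the different number of digits printed by the two programs.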
![Page 15: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/15.jpg)
MLR
. ologit lookcool gender evertried smokerhome grade school_perf

Iteration 0:   log likelihood = -2123.9232
Iteration 1:   log likelihood = -2052.0964
Iteration 2:   log likelihood = -2051.2897
Iteration 3:   log likelihood = -2051.2895

Ordered logistic regression                       Number of obs   =       2125
                                                  LR chi2(5)      =     145.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -2051.2895                       Pseudo R2       =     0.0342

------------------------------------------------------------------------------
    lookcool |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |    .214247     .09052     2.37   0.018     .0368309     .391663
   evertried |   1.048804   .0999844    10.49   0.000      .852838     1.24477
  smokerhome |   .1350945   .0931715     1.45   0.147    -.0475182    .3177072
       grade |   .0646475   .0253649     2.55   0.011     .0149332    .1143618
school_per~e |  -.0407656   .0591738    -0.69   0.491    -.1567441    .0752128
-------------+----------------------------------------------------------------
       /cut1 |  -.9447746    .301191                     -1.535098   -.3544511
       /cut2 |   .3491469   .2940768                     -.2272331    .9255269
       /cut3 |   1.386131   .2950286                      .8078857    1.964377
------------------------------------------------------------------------------
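The coefficients above are log odds ratios, so exponentiating them (and their interval endpoints) puts them on the odds ratio scale. A minimal Python sketch, assuming the usual Wald interval coef ± 1.96·SE; the dictionary keys follow the variable names in the command line, and `or_with_ci` is a hypothetical helper name.

```python
import math

# (coefficient, standard error) pairs copied from the ologit output.
estimates = {
    "gender": (0.214247, 0.09052),
    "evertried": (1.048804, 0.0999844),
    "smokerhome": (0.1350945, 0.0931715),
    "grade": (0.0646475, 0.0253649),
    "school_perf": (-0.0407656, 0.0591738),
}

Z_975 = 1.959964  # 97.5th percentile of the standard normal


def or_with_ci(coef, se):
    """Return (odds ratio, lower 95% limit, upper 95% limit)."""
    return (
        math.exp(coef),
        math.exp(coef - Z_975 * se),
        math.exp(coef + Z_975 * se),
    )


results = {name: or_with_ci(c, s) for name, (c, s) in estimates.items()}
for name, (or_, lo, hi) in results.items():
    print(f"{name:12s} OR = {or_:.2f}  (95% CI {lo:.2f} to {hi:.2f})")
```

Each odds ratio is adjusted for the other covariates and, under proportional odds, applies to every split of the outcome.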
![Page 16: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/16.jpg)
It is a pretty strong assumption
How can we check?
Simple check as shown in the 2x2 tables.
Continuous variables: harder
• need to consider the model
• no direct ‘tabular’ comparison
Multiple regression: does it hold for all covariates?
Tricky! It needs to make sense, and you need to do some ‘model checking’ for all of your variables.
Worthwhile to check each individually.
![Page 17: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/17.jpg)
There is another approach
There is a test of proportionality.
Implemented easily in Stata with an add-on package: omodel
• Ho: proportionality holds
• Ha: proportionality is violated
Why? Violation would require more parameters and would be a larger model.
What does a small p-value imply?
• evidence that proportionality is violated, but be careful of sample size!
• large sample sizes will make it hard to ‘adhere’ to the proportionality assumption, because even small departures become statistically significant
![Page 18: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/18.jpg)
Estimation in Stata
. omodel logit lookcool gender

Iteration 0:   log likelihood = -7465.0108
Iteration 1:   log likelihood = -7418.1251
Iteration 2:   log likelihood = -7418.0256

Ordered logit estimates                           Number of obs   =       7540
                                                  LR chi2(1)      =      93.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -7418.0256                       Pseudo R2       =     0.0063

------------------------------------------------------------------------------
    lookcool |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |   .4605529   .0476442     9.67   0.000      .367172    .5539338
-------------+----------------------------------------------------------------
       _cut1 |  -2.525572   .0529346          (Ancillary parameters)
       _cut2 |  -1.375663   .0380258
       _cut3 |  -.4159722   .0336987
------------------------------------------------------------------------------

Approximate likelihood-ratio test of proportionality of odds
across response categories:
         chi2(2) =      2.43
       Prob > chi2 =    0.2964
![Page 19: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/19.jpg)
. omodel logit lookcool grade
Iteration 0:   log likelihood = -7425.0617
Iteration 1:   log likelihood = -7424.7193
Iteration 2:   log likelihood = -7424.7193

Ordered logit estimates                           Number of obs   =       7505
                                                  LR chi2(1)      =       0.68
                                                  Prob > chi2     =     0.4079
Log likelihood = -7424.7193                       Pseudo R2       =     0.0000

------------------------------------------------------------------------------
    lookcool |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |  -.0106001   .0128062    -0.83   0.408    -.0356997    .0144995
-------------+----------------------------------------------------------------
       _cut1 |  -2.784359   .0678301          (Ancillary parameters)
       _cut2 |  -1.640955   .0567613
       _cut3 |  -.6923403   .0534291
------------------------------------------------------------------------------

Approximate likelihood-ratio test of proportionality of odds
across response categories:
         chi2(2) =     22.31
       Prob > chi2 =    0.0000
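The test has 2 degrees of freedom because dropping proportionality for one covariate with four outcome categories replaces 1 slope with 3, which is 2 extra parameters. For 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−x/2), so the reported p-values can be checked by hand. A small Python sketch (the function name is hypothetical, and the formula is valid only for df = 2):

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Test statistics copied from the two omodel outputs.
p_gender = chi2_upper_tail_df2(2.43)
p_grade = chi2_upper_tail_df2(22.31)

print(f"gender: p = {p_gender:.4f}")
print(f"grade:  p = {p_grade:.6f}")
```

Small differences from the printed p-values are expected because the printed chi-square statistics are rounded to two decimals. For gender there is no evidence against proportionality. For grade the assumption is clearly rejected, although with about 7,500 observations the sample-size caveat applies.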
![Page 20: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/20.jpg)
What would the ORs be?
Generate three separate binary outcome variables from the ordinal variable
• lookcool1v234
• lookcool12v34
• lookcool123v4
Estimate the odds ratio for each binary outcome
![Page 21: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/21.jpg)
Stata Code
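* Split 1: level 1 (definitely yes) vs. levels 2, 3, 4
gen lookcool1v234=1 if lookcool==2 | lookcool==3 | lookcool==4
replace lookcool1v234=0 if lookcool==1
* Split 2: levels 1, 2 vs. levels 3, 4
gen lookcool12v34=1 if lookcool==3 | lookcool==4
replace lookcool12v34=0 if lookcool==1 | lookcool==2
* Split 3: levels 1, 2, 3 vs. level 4 (definitely not)
gen lookcool123v4=1 if lookcool==4
replace lookcool123v4=0 if lookcool==2 | lookcool==3 | lookcool==1

* One binary logistic regression per split (add the "or" option to report odds ratios)
logit lookcool1v234 grade
logit lookcool12v34 grade
logit lookcool123v4 grade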
![Page 22: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/22.jpg)
Results
For a one-grade difference (range = 6 – 12):
• lookcool1v234 vs. grade: OR = 1.002 (p=0.93)
• lookcool12v34 vs. grade: OR = 1.04 (p=0.03)
• lookcool123v4 vs. grade: OR = 0.98 (p=0.11)
![Page 23: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/23.jpg)
Another approach: Polytomous Logistic Regression
Polytomous (aka Polychotomous) Logistic Regression
Fits the regression model with all contrasts.
Can be used as an inferential model.
Or, can be used to estimate odds ratios to see if they look ‘ordered’.
Model is different though:

$$\ln\left[\frac{P(y = k \mid x)}{P(y = K \mid x)}\right] = \alpha_k + \beta_k x; \quad k = 1, \ldots, K-1$$
![Page 24: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/24.jpg)
. mlogit lookcool gender
Iteration 0:   log likelihood = -7465.0108
Iteration 1:   log likelihood = -7417.1379
Iteration 2:   log likelihood = -7416.8737
Iteration 3:   log likelihood = -7416.8737

Multinomial logistic regression                   Number of obs   =       7540
                                                  LR chi2(3)      =      96.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -7416.8737                       Pseudo R2       =     0.0064

------------------------------------------------------------------------------
    lookcool |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
definitely~s |
      gender |  -.7108369   .1003382    -7.08   0.000    -.9074962   -.5141776
       _cons |  -2.049316   .0637222   -32.16   0.000    -2.174209   -1.924423
-------------+----------------------------------------------------------------
probably yes |
      gender |  -.4625306   .0762255    -6.07   0.000    -.6119298   -.3131314
       _cons |  -1.576618   .0520148   -30.31   0.000    -1.678565   -1.474671
-------------+----------------------------------------------------------------
probably not |
      gender |  -.3564113   .0621157    -5.74   0.000    -.4781559   -.2346668
       _cons |  -1.137351   .0436861   -26.03   0.000    -1.222974   -1.051728
------------------------------------------------------------------------------
(lookcool==definitely not is the base outcome)
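Each block of the output is a separate logit against the base outcome, so fitted probabilities come from normalizing the exponentiated linear predictors, with the base outcome contributing exp(0) = 1. The Python sketch below is an illustration only, and `predicted_probs` is a hypothetical name. With a single binary covariate this model has as many parameters as the 2x4 table has free cells, so the fitted probabilities should reproduce the observed column proportions from the gender tabulation almost exactly.

```python
import math

# (intercept, gender coefficient) for each non-base outcome, from the mlogit output.
coefs = {
    "definitely yes": (-2.049316, -0.7108369),
    "probably yes": (-1.576618, -0.4625306),
    "probably not": (-1.137351, -0.3564113),
}
BASE = "definitely not"  # base outcome: its linear predictor is fixed at 0


def predicted_probs(gender):
    """Fitted probability of each response level for gender coded 0 or 1."""
    scores = {k: math.exp(a + b * gender) for k, (a, b) in coefs.items()}
    denom = 1.0 + sum(scores.values())  # the 1.0 is exp(0) for the base outcome
    probs = {k: s / denom for k, s in scores.items()}
    probs[BASE] = 1.0 / denom
    return probs


probs_gender0 = predicted_probs(0)
probs_gender1 = predicted_probs(1)
for level in list(coefs) + [BASE]:
    print(f"{level:15s} {probs_gender0[level]:.4f} {probs_gender1[level]:.4f}")
```

Compare this with the ordinal model, which fits the same table with two fewer parameters and therefore only approximates the observed proportions.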
![Page 25: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/25.jpg)
Interpretation
For gender, notice the ordered nature of the odds ratios.
Suggests that it may be appropriate to use an ordinal model.
This model is more general and less restrictive, but sort of a mess to interpret.
![Page 26: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/26.jpg)
. mlogit lookcool grade
Iteration 0:   log likelihood = -7425.0617
Iteration 1:   log likelihood = -7414.6932
Iteration 2:   log likelihood = -7414.6755

Multinomial logistic regression                   Number of obs   =       7505
                                                  LR chi2(3)      =      20.77
                                                  Prob > chi2     =     0.0001
Log likelihood = -7414.6755                       Pseudo R2       =     0.0014

------------------------------------------------------------------------------
    lookcool |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
definitely~s |
       grade |   .0051085   .0264996     0.19   0.847    -.0468299    .0570468
       _cons |  -2.407127   .1089328   -22.10   0.000    -2.620632   -2.193623
-------------+----------------------------------------------------------------
probably yes |
       grade |  -.0390659   .0207178    -1.89   0.059    -.0796721    .0015402
       _cons |  -1.672094   .0825905   -20.25   0.000    -1.833968   -1.510219
-------------+----------------------------------------------------------------
probably not |
       grade |   .0627357   .0166685     3.76   0.000      .030066    .0954055
       _cons |  -1.562533   .0709374   -22.03   0.000    -1.701568   -1.423499
------------------------------------------------------------------------------
(lookcool==definitely not is the base outcome)
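One informal way to judge whether the polytomous odds ratios "look ordered" is to exponentiate the coefficients and check whether they move steadily in one direction across the response levels. The Python sketch below does this for the gender and grade coefficients from the two mlogit outputs. The monotonicity check is only a rough heuristic assumed here for illustration, not a formal test, and the helper names are hypothetical.

```python
import math

# Coefficients vs. the base outcome ("definitely not"), listed in response order:
# definitely yes, probably yes, probably not.
gender_coefs = [-0.7108369, -0.4625306, -0.3564113]
grade_coefs = [0.0051085, -0.0390659, 0.0627357]


def odds_ratios(coefs):
    """Exponentiate log odds ratios."""
    return [math.exp(c) for c in coefs]


def is_monotone(values):
    """True if the values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


gender_ors = odds_ratios(gender_coefs)
grade_ors = odds_ratios(grade_coefs)

print([round(v, 2) for v in gender_ors], is_monotone(gender_ors))
print([round(v, 2) for v in grade_ors], is_monotone(grade_ors))
```

The gender odds ratios move steadily toward 1 as the comparison category gets closer to the base outcome, while the grade odds ratios go up and down. That pattern agrees with the proportionality tests shown earlier for the two variables.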
![Page 27: Lecture 18 Ordinal and Polytomous Logistic Regression](https://reader035.vdocuments.site/reader035/viewer/2022062322/568145b7550346895db2bafc/html5/thumbnails/27.jpg)
In R?
The mlogit library requires a data transformation step.