Estimating Drug Use Prevalence Using Latent Class Models with Item Count Response as One Indicator
Paul Biemer
RTI International and
University of North Carolina
Presentation Outline
• Describe the item count (IC) method
• Present standard IC estimates of cocaine use and compare them with direct estimates
• Describe a method for adjusting the standard estimates for measurement bias
• Present the bias-corrected estimates
• Discuss implications for future applications of IC
What is the item count method?
• Used for estimating the prevalence of sensitive behaviors
• The sensitive behavior is one of a small number of behaviors in a list
• Respondents indicate only how many behaviors in the list apply, not which ones
• If the average number of “other” (non-sensitive) behaviors is known, the prevalence of the sensitive behavior can be estimated by subtraction
Illustration – One Pair of Lists
Random sample, randomly split into two subsamples:
• Subsample A answers the ICQ (short list); average count $\bar{x}_{\text{Short}}$
• Subsample B answers the ICQ (long list); average count $\bar{x}_{\text{Long}}$
Long list = short list + sensitive item
Prevalence Estimate for Single Pair Design
$\hat{p} = \bar{x}_{\text{Long}} - \bar{x}_{\text{Short}}$
Prevalence = average count for long list − average count for short list
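A minimal sketch of the single-pair estimator, assuming the two random subsamples can be treated as independent simple random samples with unweighted responses (NSDUH estimates would use survey weights and design-based variances). The function name `ic_estimate` and the example counts are hypothetical.

```python
import math
from statistics import mean, variance


def ic_estimate(short_counts, long_counts):
    """Standard item count estimator for one pair of lists.

    short_counts: counts reported by Subsample A (short list)
    long_counts:  counts reported by Subsample B (long list = short list + sensitive item)
    Returns (p_hat, se). The SE assumes two independent simple random
    samples and ignores survey weights (a simplifying assumption).
    """
    p_hat = mean(long_counts) - mean(short_counts)
    # Variance of a difference of two independent sample means
    se = math.sqrt(variance(long_counts) / len(long_counts)
                   + variance(short_counts) / len(short_counts))
    return p_hat, se


# Hypothetical responses, for illustration only
short = [0, 1, 0, 2, 1]
long_ = [1, 1, 0, 2, 2]
p_hat, se = ic_estimate(short, long_)
print(f"p_hat = {p_hat:.2f}, se = {se:.3f}")
```

Nothing in a difference of means keeps $\hat{p}$ inside [0, 1], which is how negative values such as the −0.08% and −0.55% for ages 18+ in the following table can arise.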
Example of Youth ICQ: ICQ1 – Short
Next is a list of things that you may or may not have done in the past 12 months. How many of the things on this list did you do in the past 12 months, that is since [DATE 12 MONTHS AGO].
• Rode with a drunk driver
• Walked alone after dark through a dangerous neighborhood
• Rode a bicycle without a helmet
• Went swimming or played outdoor sports when it was lightning
Example of Youth ICQ: ICQ1 – Long
Next is a list of things that you may or may not have done in the past 12 months. How many of the things on this list did you do in the past 12 months, that is since [DATE 12 MONTHS AGO].
• Rode with a drunk driver
• Walked alone after dark through a dangerous neighborhood
• Rode a bicycle without a helmet
• Went swimming or played outdoor sports when it was lightning
• Used cocaine, in any form, one or more times
Results Using the Standard IC Estimator
Item Count Estimates by Age and Gender
Age     Gender   Item Count estimate   NSDUH direct estimate
12-17   Total     0.73%                1.5%
        Male      0.19%                1.4%
        Female    1.28%                1.5%
18+     Total    -0.08%                1.9%
        Male      0.42%                2.8%
        Female   -0.55%                1.1%
Pseudo IC Variable
• Recall that each of the 4 IC short-list items was also asked separately
• Form a “pseudo-” IC variable corresponding to the IC short-list response, where
  Pseudo-IC = number of positive responses to the 4 IC short-list questions asked separately
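A minimal sketch of forming the pseudo-IC variable and cross-classifying it against the short-list IC response, assuming the four direct answers are coded 1 = yes and 0 = no and that refusals or missing answers have already been handled. The names `pseudo_ic` and `crosstab` and the respondents are hypothetical.

```python
def pseudo_ic(direct_answers):
    """Pseudo-IC: number of 'yes' answers among the 4 short-list items
    asked separately as direct questions (assumed coding: 1 = yes, 0 = no)."""
    if len(direct_answers) != 4:
        raise ValueError("expected answers to exactly 4 short-list items")
    return sum(1 for ans in direct_answers if ans == 1)


def crosstab(ic_responses, pseudo_responses, max_count=4):
    """Cross-classify short-list IC responses (rows) by pseudo-IC (columns)."""
    table = [[0] * (max_count + 1) for _ in range(max_count + 1)]
    for ic, ps in zip(ic_responses, pseudo_responses):
        table[ic][ps] += 1
    return table


# Hypothetical respondents: direct answers to the 4 items and their IC count
answers = [[1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
ic = [1, 2, 2]
pseudo = [pseudo_ic(a) for a in answers]
table = crosstab(ic, pseudo)
```

Respondents whose IC count differs from their pseudo-IC fall off the diagonal of the table.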
Item Count Response by Pseudo-Item Count Response for Both Short IC Questions
Rows: short-list IC response; columns: pseudo-IC response
IC \ Pseudo-IC   0        1       2      3     4
0                51,015   1,641   286    49    47
1                4,392    6,333   447    48    19
2                718      607     622    53    9
3                263      114     48     44    7
4                1,393    96      37     9     8
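A short calculation on the published counts shows how often the two measures agree. It assumes rows are the short-list IC response and columns the pseudo-IC response; the agreement rate does not depend on that orientation, but the two means swap if it is reversed. The function name `summarize` is hypothetical.

```python
# Counts from the table above, pooled over both short IC questions.
# Assumed orientation: rows = short-list IC response (0-4),
# columns = pseudo-IC response (0-4).
TABLE = [
    [51015, 1641, 286, 49, 47],
    [4392, 6333, 447, 48, 19],
    [718, 607, 622, 53, 9],
    [263, 114, 48, 44, 7],
    [1393, 96, 37, 9, 8],
]


def summarize(table):
    """Return (n, share on the diagonal, mean row count, mean column count)."""
    n = sum(sum(row) for row in table)
    agree = sum(table[k][k] for k in range(len(table)))
    row_mean = sum(i * cell for i, row in enumerate(table) for cell in row) / n
    col_mean = sum(j * cell for row in table for j, cell in enumerate(row)) / n
    return n, agree / n, row_mean, col_mean


n, agreement, mean_ic, mean_pseudo = summarize(TABLE)
print(f"n = {n}, agreement = {agreement:.3f}, "
      f"mean IC = {mean_ic:.3f}, mean pseudo-IC = {mean_pseudo:.3f}")
```

About 85% of respondents (58,022 of 68,305) give the same count both ways. Under the assumed orientation, the mean short-list IC response (about 0.33) is well above the mean pseudo-IC (about 0.19).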
Objective of the Modeling Approach
• Combine all data on cocaine use, including:
  – Direct question
  – Item count pair of questions
  – Pseudo-item count data
• Apply latent class models to predict cocaine use
• Why latent class models?
  – Accounts for measurement error in all the observations
  – Model assumptions are plausible for the current application
Central Idea for the Modeling Approach
Let A = short-form response and D = long-form response
A is an indicator of X (latent variable); D is an indicator of Z (latent variable)
Standard IC estimator is $\hat{p}_{IC} = \bar{D} - \bar{A}$
Use LCA to estimate Z and X and form $\hat{\pi}_{IC} = \bar{Z} - \bar{X}$
Repeat this for each of the two IC pairs
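A one-line derivation of why the latent difference targets prevalence, assuming (as in the likelihood that follows) that Y is the 0/1 latent cocaine-use indicator and Z = X + Y:

```latex
E(Z) - E(X) = E(X + Y) - E(X) = E(Y) = 1 \cdot \Pr(Y = 1) + 0 \cdot \Pr(Y = 0) = \Pr(Y = 1)
```

By contrast, $E(\bar{D} - \bar{A}) = E(D) - E(A)$ equals this only when the mean reporting error is the same for the long list as for the short list, that is, $E(D) - E(Z) = E(A) - E(X)$.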
[Figure: Path Model for One IC Pair of Questions. Observed variables: A = short IC question, B = pseudo short IC question, C = cocaine (direct question), D = long IC question, G = grouping variable. Latent variables: X, Y, Z.]
Data Likelihood
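$L(\text{GABC}, \text{GCD}) = L(\text{GABC})\, L(\text{GCD})$
GABC is observed on Subsample I and GCD on Subsample II (random split half-samples), so the unobserved items are missing at random (MAR), and
$\log L(\text{GABC}) = \sum_{gabc} n_{gabc} \log\Big(\sum_{xyz} \pi_{xyz|g}\, \pi_{abc|xyz}\Big)$
$\log L(\text{GCD}) = \sum_{gcd} n_{gcd} \log\Big(\sum_{xyz} \pi_{xyz|g}\, \pi_{cd|xyz}\Big)$
where $n_{gabc}$ and $n_{gcd}$ are the observed cell frequencies and $\sum_{xyz}$ denotes summation over x, y, and z = x + y.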
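A minimal sketch that evaluates this observed-data log-likelihood for given parameter values. It assumes local independence, π_{abc|xyz} = π_{a|x} π_{b|x} π_{c|y} and π_{cd|xyz} = π_{c|y} π_{d|z}; that factorization is an assumption for illustration, not stated on the slide. The function and argument names are hypothetical.

```python
import math


def log_likelihood(n_gabc, n_gcd, p_xy_g, p_a_x, p_b_x, p_c_y, p_d_z):
    """Observed-data log-likelihood log L(GABC) + log L(GCD).

    n_gabc: {(g, a, b, c): count} from Subsample I
    n_gcd:  {(g, c, d): count} from Subsample II
    p_xy_g[g][x][y]: latent class probabilities; z is fixed at x + y
    p_a_x[x][a], p_b_x[x][b], p_c_y[y][c], p_d_z[z][d]: response probabilities
    ASSUMPTION: local independence of the indicators given the latent classes.
    """
    xs = range(len(p_a_x))  # latent short-list count X = 0..K
    ys = (0, 1)             # latent cocaine use Y
    ll = 0.0
    for (g, a, b, c), n in n_gabc.items():
        # Summing over (x, y) covers (x, y, z) because z = x + y
        s = sum(p_xy_g[g][x][y] * p_a_x[x][a] * p_b_x[x][b] * p_c_y[y][c]
                for x in xs for y in ys)
        ll += n * math.log(s)
    for (g, c, d), n in n_gcd.items():
        s = sum(p_xy_g[g][x][y] * p_c_y[y][c] * p_d_z[x + y][d]
                for x in xs for y in ys)
        ll += n * math.log(s)
    return ll


# Toy check with error-free indicators: X in {0, 1}, one group g = 0
p_xy = [[[0.60, 0.10], [0.25, 0.05]]]
eye2 = [[1.0, 0.0], [0.0, 1.0]]
eye3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ll = log_likelihood({(0, 1, 1, 0): 10}, {(0, 1, 2): 4},
                    p_xy, eye2, eye2, eye2, eye3)
```

With perfect classification each observed cell maps to a single latent class, so the toy value reduces to 10 log(0.25) + 4 log(0.05). Fitting would maximize this over the π parameters, for example with EM; that step and any identifiability restrictions are not shown.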
Estimation of Cocaine Use Prevalence
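Parameters and estimators:
• $\boldsymbol{\pi}_{c|y} = [\pi_{c=i|y=j}]$, the 2 × 2 matrix of classification probabilities, estimated by $\hat{\boldsymbol{\pi}}_{c|y}$ from the LCA
• $\hat{\boldsymbol{\pi}}_c^{(\text{NSDUH})} = [\hat{\pi}_{c1}, \hat{\pi}_{c0}]$, the NSDUH estimate of the observed cocaine response distribution
• $\boldsymbol{\pi}_y = [\pi_{y1}, \pi_{y0}]$, the cocaine prevalence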
Corrected Estimator of Cocaine Prevalence
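$\hat{\boldsymbol{\pi}}_y = \hat{\boldsymbol{\pi}}_{c|y}^{-1}\, \hat{\boldsymbol{\pi}}_c^{(\text{NSDUH})}$
where $\hat{\boldsymbol{\pi}}_y$ is the corrected cocaine use prevalence, $\hat{\boldsymbol{\pi}}_{c|y}^{-1}$ is the correction estimated from the LCM, and $\hat{\boldsymbol{\pi}}_c^{(\text{NSDUH})}$ is the NSDUH estimate.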
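Because $\pi_{c=i} = \sum_j \pi_{c=i|y=j}\, \pi_{y=j}$, the correction is a 2 × 2 linear solve. A minimal sketch for the binary case, assuming rows and columns ordered (1, 0) as in the vectors above and ignoring sampling error; `corrected_prevalence` is a hypothetical name.

```python
def corrected_prevalence(p_c1, sens, spec):
    """Return pi_y1 from pi_y = inv(pi_c|y) @ pi_c for a binary indicator.

    p_c1: estimated share reporting use (NSDUH), so pi_c = [p_c1, 1 - p_c1]
    sens: pi_{c=1|y=1};  spec: pi_{c=0|y=0}
    Classification matrix, rows c = (1, 0), columns y = (1, 0):
        [[sens,     1 - spec],
         [1 - sens, spec    ]]
    """
    m11, m10 = sens, 1.0 - spec
    m01, m00 = 1.0 - sens, spec
    det = m11 * m00 - m10 * m01  # equals sens + spec - 1
    if abs(det) < 1e-12:
        raise ValueError("classification matrix is singular")
    p_c0 = 1.0 - p_c1
    # First row of the 2x2 inverse, applied to [p_c1, p_c0]
    return (m00 * p_c1 - m10 * p_c0) / det


full = corrected_prevalence(0.0190, 0.7009, 0.9991)
sens_only = corrected_prevalence(0.0190, 0.7009, 1.0)
print(f"full inversion: {full:.4f}, specificity set to 1: {sens_only:.4f}")
```

With the averaged LCM estimates reported in the results that follow (.7009 and .9991) and the NSDUH total of 1.90%, the full inversion gives about 2.59%. Treating the false-positive rate as zero reduces the correction to dividing by the sensitivity, 1.90/.7009 ≈ 2.71%; each LCM entry in the prevalence table that follows equals the corresponding NSDUH entry divided by .7009.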
Results Using the LCM-based Estimator
Estimates of Classification Accuracy from LCM
Parameter                             IC Pair 1  s.e.    IC Pair 2  s.e.    Average  s.e.
$\hat{\pi}_{c=1|y=1}$ (sensitivity)   .6953      .1961   .7065      .2835   .7009    .1724
$\hat{\pi}_{c=0|y=0}$ (specificity)   .9988      .0012   .9993      .0012   .9991    .0004
NSDUH and Model-based IC Estimates of Past-Year Cocaine Use Prevalence (%) by Gender and Age
Group    NSDUH   s.e.   LCM    s.e.
Total    1.90    0.08   2.71   0.36
Male     2.60    0.14   3.71   0.44
Female   1.10    0.08   1.57   0.28
12-17    1.50    0.10   2.14   0.33
18+      1.90    0.09   2.71   0.36
Summary
• Despite careful design and a large sample size, the standard item count method failed
  – Estimates of cocaine use prevalence were lower than the direct NSDUH estimates, and two of the 18+ estimates were negative
• The major cause appeared to be measurement error
  – Difficult response task
  – IC masking may be ineffective for eliciting truthful counts
  – IC and direct questions may be interpreted differently
• Latent class model corrections were successful at reducing the downward bias
  – NSDUH estimates were increased by ~40% on average
  – Standard errors were much larger (e.g., 0.36 vs. 0.08 for the total)