Modeling Correlated/Clustered Multinomial Data
Justin Newcomer
Department of Mathematics and Statistics
University of Maryland, Baltimore County
Probability and Statistics Day, April 28, 2007
Joint Research with Professor Nagaraj K. Neerchal, UMBC and Jorge G. Morel, PhD, P&G Pharmaceuticals, Inc.
Motivation
Example – Forest Pollen Count, Mosimann (1962)
In the analysis of forest pollen, counts of the frequency of occurrence of different kinds of pollen grains are made at various levels of a sediment core
An attempt is then made to reconstruct the past vegetation changes in the area from which the core was taken
Motivation
Example – Forest Pollen Count, Mosimann (1962)
Four arboreal types of fossil forest pollen (pine, fir, oak and alder) were counted in the Bellas Artes core from the Valley of Mexico
At various levels of the core, pollen was classified in clusters of 100 pollen grains
The Data:
Core Level (Cluster)   Pine   Fir   Oak   Alder
1                      94     0     5     1
2                      75     2     14    9
3                      81     2     13    4
4                      95     2     3     0
...
72                     80     0     14    6
73                     85     3     9     3
Motivation
The Multinomial Model
The probability function:
$$\Pr(\mathbf{T} = \mathbf{t}) = \frac{m!}{t_1!\, t_2! \cdots t_k!}\; \pi_1^{t_1} \pi_2^{t_2} \cdots \pi_k^{t_k}$$
Key assumptions:
Each observation can be classified by exactly one of $k$ possible outcomes, with probabilities $\pi_1, \ldots, \pi_k$
All observations are independent of each other
In our example, since each pollen count comes from a cluster of 100 pollen grains, the individual observations within a cluster can be expected to be correlated
The possible correlations are a violation of the multinomial model assumptions!
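As a small illustration of the multinomial probability function on this slide, the sketch below evaluates it directly from its two factors. The function name `multinomial_pmf` and the toy counts and probabilities are assumptions made for this example; they are not part of the pollen data.

```python
from math import factorial, prod


def multinomial_pmf(t, pi):
    """Pr(T = t) for counts t = (t_1..t_k) and probabilities pi = (pi_1..pi_k)."""
    m = sum(t)
    # m! / (t_1! ... t_k!): number of orderings that produce these counts.
    coef = factorial(m)
    for t_i in t:
        coef //= factorial(t_i)
    # pi_1^t_1 * ... * pi_k^t_k: probability of any single such ordering.
    return coef * prod(p ** t_i for p, t_i in zip(pi, t))


# Hypothetical toy case: m = 4 observations, k = 3 categories.
example = multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2))
```

The formula is only valid when the m observations are independent, which is exactly the assumption questioned on this slide.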
Motivation
Problem Statement
How can we properly model these data and estimate the proportions of pollen grains?
What are the effects of using the wrong model?
Overdispersion (Extra Variation)
Overview
Data exhibit variances larger than that permitted by the multinomial model
Usually caused by a lack of independence or clustering of experimental units
“Overdispersion is not uncommon in practice. In fact, some would maintain that over-dispersion is the norm in practice and nominal dispersion the exception.” McCullagh and Nelder (1989)
Overdispersion (Extra Variation)
Multinomial Overdispersion
Usually characterized by the first two moments
$$E(\mathbf{T}) = m\boldsymbol{\pi}, \qquad \mathrm{Var}(\mathbf{T}) = m\,\{\mathrm{Diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}\,\{1 + \rho^2 (m - 1)\}$$
The quantity $\{1 + \rho^2 (m - 1)\}$ is known as the design effect (Kish, 1965).
The parameter $\rho$ is known as the “intra class” or “intra cluster” correlation
We use $\rho^2$ to denote a positive intra cluster correlation which corresponds to overdispersion
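To make the two moments concrete, here is a minimal sketch that builds the overdispersed covariance matrix as the multinomial covariance multiplied by the design effect 1 + ρ²(m − 1). The function names and the example values of π, m and ρ² are hypothetical and chosen only for illustration.

```python
def design_effect(m, rho2):
    """Design effect 1 + rho^2 (m - 1) for cluster size m."""
    return 1.0 + rho2 * (m - 1)


def overdispersed_cov(pi, m, rho2):
    """m {Diag(pi) - pi pi'} {1 + rho^2 (m - 1)} as a list of rows."""
    deff = design_effect(m, rho2)
    k = len(pi)
    return [
        [m * ((pi[i] if i == j else 0.0) - pi[i] * pi[j]) * deff for j in range(k)]
        for i in range(k)
    ]


# Hypothetical values: three categories, clusters of 100, rho^2 = 0.01.
pi = [0.5, 0.3, 0.2]
cov_independent = overdispersed_cov(pi, 100, 0.0)   # plain multinomial covariance
cov_clustered = overdispersed_cov(pi, 100, 0.01)    # every entry inflated by 1.99
```

With clusters of 100, even an intra cluster correlation of ρ² = 0.01 nearly doubles every variance and covariance.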
Parameter Estimation
How can we properly model these data and estimate the proportions of pollen grains?
Moment Based (Easily implemented in SAS – Proc Genmod)
Quasi-Likelihood
Generalized Estimating Equations
Likelihood Based (Not currently in SAS – Must write your own code)
Finite Mixture Distribution
Dirichlet Multinomial Distribution
Quasi-Likelihood Estimation
Wedderburn (1974), Cox and Snell (1989)
Here we assume that overdispersion occurs by inflation of variances by a constant factor
$$E(Y_j) = \mu_j, \qquad \mathrm{Var}(Y_j) = \phi\, \mathrm{Var}(\mu_j)$$
Estimate systematic structure of the model via maximum likelihood procedures
Inflate the variance by a suitable constant
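The slide does not say how the inflation constant is chosen. One common choice, assumed here, is the Pearson chi-square statistic divided by its degrees of freedom. The sketch below applies it to only the six core levels reproduced on the data slide (not the full data set), so the numbers are illustrative rather than a reanalysis; the function name `pearson_dispersion` is hypothetical.

```python
from math import sqrt

# Only the six core levels shown on the data slide: pine, fir, oak, alder.
counts = [
    [94, 0, 5, 1],
    [75, 2, 14, 9],
    [81, 2, 13, 4],
    [95, 2, 3, 0],
    [80, 0, 14, 6],
    [85, 3, 9, 3],
]


def pearson_dispersion(counts):
    """Pooled proportions and the Pearson X^2 / df estimate of the inflation factor."""
    n, k = len(counts), len(counts[0])
    m = sum(counts[0])  # assumes every cluster has the same size
    pi_hat = [sum(row[i] for row in counts) / (n * m) for i in range(k)]
    x2 = sum(
        (row[i] - m * pi_hat[i]) ** 2 / (m * pi_hat[i])
        for row in counts
        for i in range(k)
    )
    df = (n - 1) * (k - 1)
    return pi_hat, x2 / df


pi_hat, phi_hat = pearson_dispersion(counts)

# Naive multinomial standard errors, then the same errors inflated by sqrt(phi_hat).
total = len(counts) * sum(counts[0])
naive_se = [sqrt(p * (1 - p) / total) for p in pi_hat]
inflated_se = [sqrt(phi_hat) * se for se in naive_se]
```

A value of `phi_hat` well above 1 signals that the counts vary more between core levels than the multinomial model allows.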
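Generalized Estimating Equations (GEE)
Liang and Zeger (1986), Zeger and Liang (1986)
Extension of Quasi-likelihood to clustered and longitudinal data:
$$E(\mathbf{y}_j) = \boldsymbol{\mu}_j, \qquad \mathrm{Cov}(y_{jr}, y_{js}) = v_{jrs}(\boldsymbol{\mu}_j, \boldsymbol{\alpha})$$
$$\boldsymbol{\mu}_j = (\mu_{j1}, \mu_{j2}, \ldots, \mu_{jm_j})', \qquad \mathbf{V}_j = \{v_{jrs}\}$$
The Generalized Estimating Equations are:
$$\mathbf{U}(\boldsymbol{\beta}) = \sum_{j=1}^{n} \mathbf{H}_j(\boldsymbol{\beta})'\, \mathbf{V}_j^{-1} \{\mathbf{y}_j - \boldsymbol{\mu}_j(\boldsymbol{\beta})\} = \mathbf{0}$$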
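For intuition, consider the simplest special case: no covariates, a common cluster size m, and mean mπ for every cluster. Under those assumptions (made here, not stated on the slide) the estimating equations have a closed-form solution, the pooled proportions, and the empirical "sandwich" covariance that usually accompanies GEE reduces to the sum of outer products of cluster residuals divided by (nm)². The sketch uses only the six core levels shown on the data slide, and the function name is hypothetical.

```python
from math import sqrt

# Only the six core levels shown on the data slide: pine, fir, oak, alder.
counts = [
    [94, 0, 5, 1],
    [75, 2, 14, 9],
    [81, 2, 13, 4],
    [95, 2, 3, 0],
    [80, 0, 14, 6],
    [85, 3, 9, 3],
]


def gee_intercept_only(counts):
    """Pooled proportions with robust (sandwich) and naive covariance matrices."""
    n, k = len(counts), len(counts[0])
    m = sum(counts[0])  # assumes every cluster has the same size
    # Solving U = 0 for an intercept-only model gives the pooled proportions.
    pi_hat = [sum(row[i] for row in counts) / (n * m) for i in range(k)]
    # Cluster residuals r_j = t_j - m * pi_hat.
    resid = [[row[i] - m * pi_hat[i] for i in range(k)] for row in counts]
    # Sandwich covariance: sum_j r_j r_j' / (n m)^2 in this special case.
    robust = [
        [sum(r[i] * r[j] for r in resid) / (n * m) ** 2 for j in range(k)]
        for i in range(k)
    ]
    # Naive multinomial covariance: {Diag(pi) - pi pi'} / (n m).
    naive = [
        [((pi_hat[i] if i == j else 0.0) - pi_hat[i] * pi_hat[j]) / (n * m) for j in range(k)]
        for i in range(k)
    ]
    return pi_hat, robust, naive


pi_hat, robust_cov, naive_cov = gee_intercept_only(counts)
robust_se_pine = sqrt(robust_cov[0][0])
naive_se_pine = sqrt(naive_cov[0][0])
```

The robust standard error uses the observed between-cluster variation, so it does not depend on the working covariance being correct.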
Likelihood Models for Correlated Multinomial
Dirichlet Multinomial Distribution, Mosimann (1962)
Multinomial Distribution with a Dirichlet Prior
$$\mathbf{T} \mid \mathbf{P} \sim \mathrm{Multinomial}(\mathbf{P}, m), \qquad \mathbf{P} \sim \mathrm{Dirichlet}(C\pi_1, \ldots, C\pi_k)$$
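Likelihood Models for Correlated Multinomial
Dirichlet Multinomial Distribution, Mosimann (1962)
It can be shown that
$$\Pr(\mathbf{T} = \mathbf{t}) = \frac{m!}{t_1! \cdots t_k!}\; \frac{\Gamma(C)}{\Gamma(m + C)}\; \frac{\prod_{i=1}^{k} \Gamma(t_i + C\pi_i)}{\prod_{i=1}^{k} \Gamma(C\pi_i)}$$
If we let $C = \dfrac{1 - \rho^2}{\rho^2}$ then the moments of the Dirichlet Multinomial distribution are given by
$$E(\mathbf{T}) = m\boldsymbol{\pi}, \qquad \mathrm{Var}(\mathbf{T}) = m\,\{\mathrm{Diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}\,\{1 + \rho^2 (m - 1)\}$$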
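The Dirichlet Multinomial probability function and its moments can be checked numerically on a small case. The sketch below evaluates the probability function with log-gamma terms, enumerates every possible count vector for a toy cluster, and compares the resulting mean and variance of the first count with the moment formulas, using C = (1 − ρ²)/ρ². The helper names and the toy values of m, π and ρ² are assumptions for this example.

```python
from math import exp, lgamma


def dm_pmf(t, pi, C):
    """Dirichlet Multinomial Pr(T = t) with Dirichlet parameters C * pi_i."""
    m = sum(t)
    log_p = lgamma(m + 1) - sum(lgamma(t_i + 1) for t_i in t)   # m! / prod t_i!
    log_p += lgamma(C) - lgamma(m + C)                           # Gamma(C) / Gamma(m + C)
    log_p += sum(lgamma(t_i + C * p) - lgamma(C * p) for t_i, p in zip(t, pi))
    return exp(log_p)


def compositions(m, k):
    """Yield every vector of k non-negative integers that sums to m."""
    if k == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, k - 1):
            yield (first,) + rest


# Hypothetical toy case: clusters of 5, three categories, rho^2 = 0.2.
m, pi, rho2 = 5, (0.5, 0.3, 0.2), 0.2
C = (1 - rho2) / rho2

support = list(compositions(m, len(pi)))
probs = [dm_pmf(t, pi, C) for t in support]

total = sum(probs)
mean_1 = sum(p * t[0] for p, t in zip(probs, support))
var_1 = sum(p * (t[0] - mean_1) ** 2 for p, t in zip(probs, support))
expected_var_1 = m * pi[0] * (1 - pi[0]) * (1 + rho2 * (m - 1))
```

Brute-force enumeration like this is only practical for small m and k.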
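Likelihood Models for Correlated Multinomial
Finite Mixture of Multinomials, Morel & Neerchal (1993)
Can be represented as: $\mathbf{T} = \mathbf{Y}N + (\mathbf{X} \mid N)$
$N \sim \mathrm{Binomial}(\rho, m)$, $\mathbf{Y} \sim \mathrm{Multinomial}(\boldsymbol{\pi}, 1)$, $N \perp \mathbf{Y}$
$(\mathbf{X} \mid N) \sim \mathrm{Multinomial}(\boldsymbol{\pi}, m - N)$ if $N < m$
[Figure: two panels, (a) and (b). Each shows a cluster split into a block of N units labelled YN and a block of m − N units labelled X given N. In (a) the YN block is all 1s, in (b) it is all 0s; the X given N block is shown as "?" in both.]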
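The representation T = YN + (X | N) translates directly into a simulation recipe: draw N, draw a single shared outcome Y that is copied N times, then draw the remaining m − N outcomes independently. The sketch below is one way to do this with the standard library; the function name `simulate_fm`, the seed, and the parameter values are hypothetical.

```python
import random


def simulate_fm(pi, rho, m, rng):
    """One cluster of counts from T = Y*N + (X | N)."""
    k = len(pi)
    # N ~ Binomial(rho, m): how many units copy the shared outcome.
    n_copied = sum(rng.random() < rho for _ in range(m))
    # Y ~ Multinomial(pi, 1): the single shared outcome.
    shared = rng.choices(range(k), weights=pi)[0]
    counts = [0] * k
    counts[shared] += n_copied
    # (X | N) ~ Multinomial(pi, m - N): the remaining units are independent.
    for category in rng.choices(range(k), weights=pi, k=m - n_copied):
        counts[category] += 1
    return counts


rng = random.Random(2007)          # arbitrary seed
pi = [0.5, 0.3, 0.2]               # hypothetical category probabilities
draws = [simulate_fm(pi, 0.3, 100, rng) for _ in range(200)]
all_copied = simulate_fm(pi, 1.0, 100, rng)   # rho = 1: the whole cluster shares Y
```

Setting ρ = 0 gives an ordinary multinomial cluster, while ρ = 1 puts all m units in a single category.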
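Likelihood Models for Correlated Multinomial
Finite Mixture of Multinomials, Morel & Neerchal (1993)
It can be shown that:
$$\Pr(\mathbf{T} = \mathbf{t}) = \sum_{i=1}^{k} \pi_i \Pr(\mathbf{X}_i = \mathbf{t}) = \sum_{i=1}^{k} \pi_i\, \frac{m!}{t_1! \cdots t_k!} \prod_{j=1}^{k} p_{ij}^{t_j}$$
If
$$\mathbf{p}_i = (1 - \rho)\boldsymbol{\pi} + \rho\,\mathbf{e}_i, \qquad \mathbf{e}_i \text{ is the } i\text{th column of } \mathbf{I}_{k-1}$$
and,
$$\mathbf{p}_k = (1 - \rho)\boldsymbol{\pi}$$
Then the moments of the Finite Mixture distribution are given by,
$$E(\mathbf{T}) = m\boldsymbol{\pi}, \qquad \mathrm{Var}(\mathbf{T}) = m\,\{\mathrm{Diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}\,\{1 + \rho^2 (m - 1)\}$$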
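The mixture form of the Finite Mixture probability function can be verified the same way on a toy case. The sketch below writes each p_i as a full k-vector, (1 − ρ)π + ρe_i with e_i a column of the k × k identity; this is a notational assumption of the example, equivalent to the reduced (k − 1)-vector form on the slide. The helper names and the toy values are hypothetical.

```python
from math import factorial, prod


def multinomial_pmf(t, p):
    """Multinomial Pr(X = t) with probabilities p."""
    coef = factorial(sum(t))
    for t_j in t:
        coef //= factorial(t_j)
    return coef * prod(p_j ** t_j for p_j, t_j in zip(p, t))


def fm_pmf(t, pi, rho):
    """Finite Mixture Pr(T = t) = sum_i pi_i * Pr(X_i = t)."""
    total = 0.0
    for i, pi_i in enumerate(pi):
        # p_i = (1 - rho) * pi + rho * e_i, written over all k categories.
        p_i = [(1 - rho) * pi_j + (rho if j == i else 0.0) for j, pi_j in enumerate(pi)]
        total += pi_i * multinomial_pmf(t, p_i)
    return total


def compositions(m, k):
    """Yield every vector of k non-negative integers that sums to m."""
    if k == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in compositions(m - first, k - 1):
            yield (first,) + rest


# Hypothetical toy case: clusters of 5, three categories, rho = 0.4.
m, pi, rho = 5, (0.5, 0.3, 0.2), 0.4
support = list(compositions(m, len(pi)))
probs = [fm_pmf(t, pi, rho) for t in support]

total = sum(probs)
mean_1 = sum(p * t[0] for p, t in zip(probs, support))
var_1 = sum(p * (t[0] - mean_1) ** 2 for p, t in zip(probs, support))
expected_var_1 = m * pi[0] * (1 - pi[0]) * (1 + rho ** 2 * (m - 1))
```

The first two moments match those of the Dirichlet Multinomial, so the two models differ only in higher-order structure.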
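Maximum Likelihood Estimation
Overview
Computed using the Fisher Scoring Algorithm:
$$\hat{\boldsymbol{\theta}}_{i+1} = \hat{\boldsymbol{\theta}}_i + \mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}_i)\, \left.\frac{\partial L}{\partial \boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_i}$$
Fisher Information Matrix plays an important role
$$\mathbf{I}(\boldsymbol{\theta}) = -E\left[\frac{\partial^2 \log P(\mathbf{t})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'}\right] = -\sum_{\mathbf{t}} \frac{\partial^2 \log P(\mathbf{t})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'}\, P(\mathbf{t})$$
Can be computationally challenging
If $m = 100$ and $k = 4$ then the sum over $\mathbf{t}$ has 176,851 elements!
Approximations are available
Dirichlet Multinomial FIM can be computed using marginal Beta-Binomial moments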
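The figure of 176,851 is consistent with counting the ways to split m observations among k categories, which is the binomial coefficient C(m + k − 1, k − 1). A short check, with a hypothetical helper name:

```python
from math import comb


def support_size(m, k):
    """Number of count vectors (t_1, ..., t_k) with non-negative entries summing to m."""
    return comb(m + k - 1, k - 1)


n_terms = support_size(100, 4)   # terms in the exact expectation for m = 100, k = 4
```

The count grows quickly with k, which is one reason the exact Fisher Information Matrix becomes expensive.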
Maximum Likelihood Estimation
Example – Forest Pollen Count, Mosimann (1962)
Maximum Likelihood Estimation results under the Finite Mixture and Dirichlet Multinomial Distributions
Model             Naïve (Multinomial)     Dirichlet Multinomial     Finite Mixture
Parameter         Estimate   S.E.         Estimate   S.E.           Estimate   S.E.
$\pi_1$ (pine)    0.8627     0.0040       0.8621     0.0065         0.8684     0.0048
$\pi_2$ (fir)     0.0141     0.0014       0.0164     0.0022         0.0151     0.0015
$\pi_3$ (oak)     0.0906     0.0034       0.0888     0.0053         0.0863     0.0040
$\rho$                                    0.1278     0.0109         0.0897     0.0139
$\pi_4$ (alder) $= 1 - (\pi_1 + \pi_2 + \pi_3)$
The naïve model underestimates the standard errors
The FM model gives smaller standard errors for the estimates of $\boldsymbol{\pi}$
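The naïve standard errors in the table can be reproduced by hand, which also shows roughly how much the design effect inflates them. The sketch assumes 73 clusters of 100 grains each (the data slide lists core levels up to 73) and takes the Dirichlet Multinomial estimate in the last parameter row to be ρ; both are assumptions of this example. The inflated values are only a rough moment-based check, not the likelihood-based standard errors reported in the table.

```python
from math import sqrt

n_clusters, m = 73, 100          # assumption: 73 core levels of 100 grains each
naive_est = {"pine": 0.8627, "fir": 0.0141, "oak": 0.0906}

# Naive multinomial standard error: sqrt(pi (1 - pi) / (n m)).
naive_se = {name: sqrt(p * (1 - p) / (n_clusters * m)) for name, p in naive_est.items()}

# Design effect using the Dirichlet Multinomial estimate, read here as rho.
rho_dm = 0.1278
deff = 1 + rho_dm ** 2 * (m - 1)

# Naive standard errors scaled by sqrt(design effect), as a rough comparison.
inflated_se = {name: se * sqrt(deff) for name, se in naive_se.items()}
```

Rounded to four decimals, `naive_se` agrees with the naïve column, and `inflated_se` lands close to the Dirichlet Multinomial column.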
Maximum Likelihood Estimation
Simulation Study
What are the effects of using the wrong model?
Simulate 5,000 datasets from: Finite Mixture, Dirichlet Multinomial
For each simulated dataset, fit both likelihood models:
Finite Mixture: Calculate an estimate of $\boldsymbol{\pi}$ and its SE under the FM model. Calculate the determinant of the estimated inverse FIM
Dirichlet Multinomial: Calculate an estimate of $\boldsymbol{\pi}$ and its SE under the DM model. Calculate the determinant of the estimated inverse FIM
After each simulation, we calculate the average of the determinants from each model
A comparison of these averages gives us insight as to which model may be more efficient
Maximum Likelihood Estimation
Simulation Study
The Joint Asymptotic Relative Efficiency (JARE) can be used to summarize the simulation results as it indicates which estimate would have a smaller asymptotic variance
For a vector parameter, JARE is the ratio of the determinants of the asymptotic variance-covariance matrices
$$\mathrm{JARE}(\hat{\boldsymbol{\pi}}_{DM}, \hat{\boldsymbol{\pi}}_{FM}) = \frac{\det\{\mathrm{var}_{DM}(\hat{\boldsymbol{\pi}}_{DM})\}}{\det\{\mathrm{var}_{FM}(\hat{\boldsymbol{\pi}}_{FM})\}}$$
Value of $\rho$   Simulated Data From   $\boldsymbol{\pi}$ = (0.3)   (0.5)     (0.1, 0.3)'   (0.1, 0.5)'
0.3               FM                    1.16028                      1.24236   1.11770       1.15731
0.3               DM                    1.15604                      1.23019   1.20241       1.22824
0.7               FM                    2.20322                      2.28815   2.60584       2.67401
0.7               DM                    2.13496                      2.19185   3.52726       3.48980
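As a sketch of the computation behind JARE, the ratio of determinants of two variance-covariance matrices can be written in a few lines. The orientation assumed here puts the Dirichlet Multinomial matrix in the numerator and the Finite Mixture matrix in the denominator, so values above 1 indicate a smaller generalized variance under FM. The matrices below are hypothetical, not simulation output.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result        # a row swap flips the sign
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
    return result


def jare(var_dm, var_fm):
    """Ratio of determinants: det(var_DM) / det(var_FM)."""
    return det(var_dm) / det(var_fm)


# Hypothetical 2 x 2 covariance matrices: the DM one is twice the FM one.
var_fm = [[2.0, 0.5], [0.5, 1.0]]
var_dm = [[2.0 * v for v in row] for row in var_fm]
ratio = jare(var_dm, var_fm)
```

Doubling every entry of a 2 × 2 matrix multiplies its determinant by 4, so the ratio grows with the dimension of the parameter vector as well as with the size of the variances.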
Conclusions
If we observe correlated/clustered multinomial data, use of the naïve multinomial model causes the standard errors to be underestimated, which leads to erroneous inferences and inflated Type-I error rates
If the data truly come from a Finite Mixture distribution, then estimation using this model clearly outperforms the Dirichlet Multinomial in terms of efficiency
If we are unsure of the distribution, the FM model may underestimate the standard errors, and the Dirichlet Multinomial model provides a safe alternative
Future Work
Extension to Include Covariates
Covariates can be included and linked to the model parameters through “link” functions as in the Generalized Linear Model (GLM) framework
Obtain the expressions for the efficiency of likelihood models relative to GEE
Simulation Study
Use simulations to see if gains in efficiency of the likelihood models can be achieved over GEE
Does the inclusion of covariates change our conclusions?
Does the choice of link function have an influence?
References
Cox, D.R. and Snell, E.J. (1989) Analysis of Binary Data. 2nd Ed. New York: Chapman and Hall.
Kish, L. (1965) Survey Sampling. New York: John Wiley & Sons.
Liang, K.Y. and Zeger, S.L. (1986) “Longitudinal data analysis using generalized linear models.” Biometrika 73: 13-22.
McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd Ed. London: Chapman and Hall.
Morel, J.G. and Nagaraj, N.K. (1993) “A finite mixture distribution for modelling multinomial extra variation.” Biometrika 80: 363-371.
Mosimann, J.E. (1962) “On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions.” Biometrika 49: 65-82.
Neerchal, N.K. and Morel, J.G. (1998) “Large cluster results for two parametric multinomial extra variation models.” Journal of the American Statistical Association 93: 1078-1087.
Wedderburn, R.W.M. (1974) “Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method.” Biometrika 61: 439-447.
Zeger, S.L. and Liang, K.Y. (1986) “Longitudinal data analysis for discrete and continuous outcomes.” Biometrics 42: 121-130.