Lorelogram models for spatially clustered binary data
Manuela Cattelan and Cristiano Varin∗
Department of Statistical Sciences, University of Padova, Italy
∗ D.E.S.I.S., Ca’ Foscari University - Venice, Italy
Intermediate Workshop of the PRIN 2015 “Likelihood-free Methods of Inference”
February 19, 2019
Outline
• A motivating example
• Spatially clustered binary data
• Spatial lorelogram
• Hybrid pairwise likelihood
• Application
Spatially clustered binary data Spatial lorelogram Hybrid pairwise likelihood Application Conclusions
The Gambia malaria data
[Figure: map of the village locations in The Gambia, labelled Area 1 to Area 5; horizontal axis East [Km], vertical axis North [Km].]
• 65 villages in 5 areas in The Gambia
• 8 to 63 children per village
• Response: presence of malaria parasites in the children's blood
• Individual covariates: age, bed net use, bed net treated with insecticide
• Village-level covariates: presence of a health center, measure of vegetation close to the village
Spatially clustered binary data
• Random effects models (Diggle et al., 2002)
• GEE with spatial exponential correlation matrix (Thomson et al., 1999)
• Marginal multivariate probit with tetrachoric spatial correlation matrix (Bai et al., 2014)
Here:
• Marginal model
• Extend the lorelogram (Heagerty and Zeger, 1998) for spatial dependence
• Employ odds ratios
Notation
• $n_k$ number of observations in cluster $k$, $k = 1, \ldots, K$,
• $n = \sum_{k=1}^{K} n_k$ total number of observations,
• $Y = (Y_1, \ldots, Y_n)^\top$ vector of all observations, with those belonging to the same cluster contiguous,
• $\pi_i(\beta) = \Pr(Y_i = 1)$,
• $x_i$ $p$-dimensional vector of covariates,
• $\beta$ regression coefficients.
Optimal estimating equations
Focus on the marginal logistic model
$$\operatorname{logit}\{\pi_i(\beta)\} = x_i^\top \beta. \tag{1}$$
Optimal estimating equations for $\beta$ are
$$D^\top V^{-1}(y - \pi) = 0, \tag{2}$$
where
• $D = \partial\pi/\partial\beta$,
• $V$ covariance matrix of $Y$,
• $y = (y_1, \ldots, y_n)^\top$ vector of observed values of $Y$,
• $\pi = (\pi_1, \ldots, \pi_n)^\top$ corresponding vector of marginal probabilities.
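As an illustration of how equations of the form (2) can be solved, here is a minimal Fisher-scoring sketch in plain Python. It assumes the independence working covariance diag{π_i(1 − π_i)}, under which D^T V^{-1}(y − π) reduces to X^T(y − π); a spatial V would replace the diagonal weights. The function names (`fisher_scoring`, `score_vector`, `solve`) and the toy data are hypothetical and not taken from the slides.

```python
import math

def expit(t):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def probabilities(X, beta):
    """Marginal probabilities pi_i from the logistic model (1)."""
    return [expit(sum(xj * bj for xj, bj in zip(x, beta))) for x in X]

def score_vector(X, y, beta):
    """Left-hand side of (2) under independence: X^T (y - pi)."""
    pi = probabilities(X, beta)
    p = len(beta)
    return [sum(x[a] * (yi - pii) for x, yi, pii in zip(X, y, pi)) for a in range(p)]

def fisher_scoring(X, y, n_iter=25):
    """Iterate beta <- beta + (D^T V^{-1} D)^{-1} D^T V^{-1} (y - pi), assuming V is diagonal."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        pi = probabilities(X, beta)
        # Under independence, D^T V^{-1} D = X^T diag{pi (1 - pi)} X
        info = [[sum(x[a] * x[b] * pii * (1.0 - pii) for x, pii in zip(X, pi))
                 for b in range(p)] for a in range(p)]
        step = solve(info, score_vector(X, y, beta))
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta

# Toy data (hypothetical): intercept plus one covariate
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
beta_hat = fisher_scoring(X, y)
```

At convergence `score_vector(X, y, beta_hat)` is numerically zero, which is the estimating equation (2) in this special case.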
Covariance matrix
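• $V_{\mathrm{ind}} = \operatorname{diag}\{\pi_i(1 - \pi_i)\}$,
• $V_{\mathrm{GEE}} = V_{\mathrm{ind}}^{1/2} R\, V_{\mathrm{ind}}^{1/2}$, where $R = \operatorname{diag}(R_1, \ldots, R_K)$ is a block-diagonal correlation matrix.
For binary responses the correlation is constrained by the marginal means, since $\max(0, \pi_i + \pi_j - 1) \leq \pi_{ij} \leq \min(\pi_i, \pi_j)$.
Let $\pi_{ij} = \Pr(Y_i = 1, Y_j = 1)$, then the covariance between observations is $\operatorname{cov}(Y_i, Y_j) = \pi_{ij} - \pi_i\pi_j$.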
Pairwise odds ratio
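The pairwise odds ratio of $Y_i$ and $Y_j$ is
$$\psi_{ij} = \frac{\pi_{ij}(1 - \pi_i - \pi_j + \pi_{ij})}{(\pi_i - \pi_{ij})(\pi_j - \pi_{ij})}.$$
This can be inverted to obtain $\pi_{ij}$ given $\pi_i$, $\pi_j$ and $\psi_{ij}$ (Mardia, 1967),
$$\pi_{ij}(\psi_{ij}, \pi_i, \pi_j) = \begin{cases} \pi_i\pi_j, & \text{if } \psi_{ij} = 1, \\ \dfrac{1 + (\pi_i + \pi_j)(\psi_{ij} - 1) - G(\pi_i, \pi_j, \psi_{ij})}{2(\psi_{ij} - 1)}, & \text{if } \psi_{ij} \neq 1, \end{cases}$$
with
$$G(\pi_i, \pi_j, \psi_{ij}) = \sqrt{\{1 + (\pi_i + \pi_j)(\psi_{ij} - 1)\}^2 + 4\psi_{ij}(1 - \psi_{ij})\pi_i\pi_j}.$$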
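The inversion above can be checked numerically. The sketch below codes the odds ratio and its inverse exactly as written on the slide and verifies the round trip; the function names `odds_ratio` and `joint_prob` and the numerical tolerance for ψ = 1 are illustrative choices, not part of the original.

```python
import math

def odds_ratio(pi_ij, pi_i, pi_j):
    """Pairwise odds ratio psi_ij from the joint and marginal probabilities."""
    return pi_ij * (1.0 - pi_i - pi_j + pi_ij) / ((pi_i - pi_ij) * (pi_j - pi_ij))

def joint_prob(psi, pi_i, pi_j):
    """Joint probability Pr(Y_i = 1, Y_j = 1) from the odds ratio and the marginals."""
    if abs(psi - 1.0) < 1e-12:  # independence case (tolerance is an assumption)
        return pi_i * pi_j
    a = 1.0 + (pi_i + pi_j) * (psi - 1.0)
    g = math.sqrt(a * a + 4.0 * psi * (1.0 - psi) * pi_i * pi_j)
    return (a - g) / (2.0 * (psi - 1.0))

# Round trip with illustrative values: marginals 0.3 and 0.4, odds ratio 1.63
p11 = joint_prob(1.63, 0.3, 0.4)
psi_back = odds_ratio(p11, 0.3, 0.4)  # recovers 1.63 up to rounding error
```

For an odds ratio above 1 the joint probability exceeds the product of the marginals, and it falls below the product for an odds ratio under 1, while staying inside the admissible range max(0, π_i + π_j − 1) ≤ π_ij ≤ min(π_i, π_j).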
The spatial lorelogram
• Lorelogram: introduced by Heagerty and Zeger (1998) for longitudinal categorical data.
• $u_i$ coordinate of the $i$-th observation,
• spatial lorelogram between observation $i$ and observation $j$: $\gamma_{ij} = \log\psi_{ij} = \gamma(u_i, u_j)$.
• Consider isotropic spatial lorelogram models with general form
$$\gamma(d_{ij}; \alpha) = \alpha_1 \mathbf{1}(d_{ij} = 0) + \alpha_2\, \rho(d_{ij}/\alpha_3),$$
• $d_{ij}$ distance between observations $i$ and $j$,
• $\mathbf{1}(E)$ indicator function of event $E$,
• $\alpha = (\alpha_1, \alpha_2, \alpha_3)^\top$ vector of non-negative parameters.
Examples of spatial lorelogram models
[Figure: examples of spatial lorelogram models (Exponential, Gaussian, Spherical, Wave); horizontal axis Distance (0 to 30), vertical axis Lorelogram.]
Model        Dependence function ρ(x)        Range     Monotone
Exponential  exp(−x)                         Infinite  yes
Gaussian     exp(−x²)                        Infinite  yes
Spherical    (1 − 1.5x + 0.5x³) 1(x < 1)     Finite    yes
Wave         sin(x)/x                        Infinite  no
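A short sketch of the four dependence functions and of the general lorelogram form γ(d; α) = α1 1(d = 0) + α2 ρ(d/α3) may help fix ideas. The function names and the parameter values in the usage lines are hypothetical; the wave function is set to its limit 1 at x = 0, which is an implementation choice.

```python
import math

def rho_exponential(x):
    return math.exp(-x)

def rho_gaussian(x):
    return math.exp(-x * x)

def rho_spherical(x):
    # Finite range: exactly zero for x >= 1
    return 1.0 - 1.5 * x + 0.5 * x ** 3 if x < 1.0 else 0.0

def rho_wave(x):
    # sin(x)/x, using the limiting value 1 at x = 0
    return 1.0 if x == 0 else math.sin(x) / x

def lorelogram(d, alpha, rho):
    """gamma(d; alpha) = alpha1 * 1(d = 0) + alpha2 * rho(d / alpha3)."""
    alpha1, alpha2, alpha3 = alpha
    within_cluster = alpha1 if d == 0 else 0.0
    return within_cluster + alpha2 * rho(d / alpha3)

# Illustrative parameter values (not estimates from the slides)
alpha = (0.2, 0.3, 5.0)
gamma_same_location = lorelogram(0.0, alpha, rho_exponential)   # alpha1 + alpha2
gamma_5km = lorelogram(5.0, alpha, rho_exponential)             # alpha2 * exp(-1)
odds_ratio_5km = math.exp(gamma_5km)                            # psi = exp(gamma)
```

The term α1 only contributes at distance zero, so it captures the extra dependence between observations sharing a location, while α2 and α3 control the strength and the spatial scale of the dependence between locations.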
The empirical spatial lorelogram
• $d_{\max}$ distance beyond which dependence becomes negligible
• subdivide interval $[0, d_{\max}]$ into $B + 1$ possibly overlapping bins
• first bin includes only observations in the same cluster
• other bins constructed around a sequence of equispaced midpoints $0 < m_1 < m_2 < \ldots < m_B < d_{\max}$ and with radius $h$
• the empirical spatial lorelogram is the set of estimates $\hat\gamma_b$ of the logarithm of pairwise odds ratios $\gamma_b = \gamma(m_b)$ using the pairs of observations in the $b$-th bin.
The empirical spatial lorelogram
$\hat\gamma_b$ estimated from the pairwise likelihood
$$\ell(\beta, \gamma_b) = \sum_{(i,j) \in I_b} \log \Pr(Y_i = y_i, Y_j = y_j),$$
where
$$I_b = \{(i, j) : i < j \text{ and } \max(0, m_b - h) \leq d_{ij} \leq m_b + h\},$$
and the bivariate probability function is
$$\Pr(Y_i = y_i, Y_j = y_j) = \pi_{ij}^{y_i y_j} (\pi_i - \pi_{ij})^{y_i(1 - y_j)} (\pi_j - \pi_{ij})^{(1 - y_i) y_j} (1 - \pi_i - \pi_j + \pi_{ij})^{(1 - y_i)(1 - y_j)}.$$
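The bin construction and the pairwise log-likelihood can be sketched as follows. This is a simplified illustration, not the authors' implementation: the marginal probabilities are held fixed instead of being estimated jointly with γ_b, the maximisation is a plain grid search, and all function names and toy values are hypothetical.

```python
import math

def joint_prob(psi, pi_i, pi_j):
    """Pr(Y_i = 1, Y_j = 1) from the odds ratio psi and the marginals."""
    if abs(psi - 1.0) < 1e-12:
        return pi_i * pi_j
    a = 1.0 + (pi_i + pi_j) * (psi - 1.0)
    g = math.sqrt(a * a + 4.0 * psi * (1.0 - psi) * pi_i * pi_j)
    return (a - g) / (2.0 * (psi - 1.0))

def pair_prob(yi, yj, pi_i, pi_j, gamma):
    """Bivariate probability function with log odds ratio gamma."""
    pi_ij = joint_prob(math.exp(gamma), pi_i, pi_j)
    if yi == 1 and yj == 1:
        return pi_ij
    if yi == 1:
        return pi_i - pi_ij
    if yj == 1:
        return pi_j - pi_ij
    return 1.0 - pi_i - pi_j + pi_ij

def bin_pairs(dist, m_b, h):
    """Index set I_b: pairs i < j with max(0, m_b - h) <= d_ij <= m_b + h."""
    lower, upper = max(0.0, m_b - h), m_b + h
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if lower <= dist[i][j] <= upper]

def pairwise_loglik(gamma, pairs, y, pi):
    """Pairwise log-likelihood over the given pairs, with marginals pi held fixed."""
    return sum(math.log(pair_prob(y[i], y[j], pi[i], pi[j], gamma))
               for i, j in pairs)

def estimate_gamma(pairs, y, pi, grid):
    """Grid-search maximiser of the pairwise log-likelihood (a crude stand-in for an optimiser)."""
    return max(grid, key=lambda g: pairwise_loglik(g, pairs, y, pi))

# Toy example: three locations on a line at 0, 1 and 2.5 distance units
dist = [[0.0, 1.0, 2.5], [1.0, 0.0, 1.5], [2.5, 1.5, 0.0]]
pairs = bin_pairs(dist, m_b=1.25, h=0.5)  # pairs with distance in [0.75, 1.75]
```

The first bin of the empirical lorelogram, which contains only same-cluster pairs, would be built by selecting pairs with equal cluster labels instead of calling `bin_pairs`.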
Hybrid pairwise likelihood estimation
• Alternates between optimal estimating equations for regression parameters and pairwise likelihood for dependence parameters (Kuk, 2007).
• Here, there are no independent clusters.
• We employ the pairwise likelihood
$$\ell(\alpha, \beta) = \sum_{(i,j) \in S_d} \log \Pr(Y_i = y_i, Y_j = y_j),$$
where $S_d$ identifies all pairs of observations at most $d$ units apart,
$$S_d = \{(i, j) : i < j \text{ and } d_{ij} \leq d\}.$$
• Standard errors can be computed from
$$\operatorname{avar}(\hat\beta) = (D^\top V^{-1} D)^{-1}.$$
Simulation studies
Figure: Ratio of the standard errors computed ignoring spatial dependence (numerator of the ratio) to the standard errors computed under the assumed exponential spatial lorelogram model (denominator of the ratio). The three panels display the ratios for intercept (left panel), trend covariate (central panel) and subject-specific covariate (right panel). The lines shown are for $\alpha_2 = 0.25, 0.5, 0.75, 1$ (descending). Horizontal axis: $\alpha_3$ (0.00 to 0.30); vertical axis: ratio of standard errors (0.4 to 1.0).
Simulation studies
[Figure: coverage for $\beta_0$, $\beta_1$ and $\beta_2$ (vertical axes, 75% to 100%) against the number of clusters, with panels for $\alpha_3 = 0.03$, $0.05$ and $0.07$.]
The Gambia data
GEE
Parameter       Est.    SE     z
Intercept       8.30   3.61   2.30
Age × 10³       0.60   0.11   5.56
Net-use        −0.40   0.14   2.80
Treated net    −0.39   0.19   2.04
Green          −0.43   0.16   2.77
Green² × 10²    0.50   0.16   3.03
PHC            −0.27   0.22   1.24
Within-village pairwise odds ratio exp(0.49) = 1.63.
The Gambia data
Consider cluster-level residuals $\hat\varepsilon = (\hat\varepsilon_1, \ldots, \hat\varepsilon_K)^\top$ (Diggle et al., 2002) computed as
$$\hat\varepsilon = C^\top \hat V^{-1/2}(y - \hat\pi),$$
where $C$ is an $n \times K$ contrast matrix whose element in position $(i, k)$ is equal to $1/\sqrt{n_k}$ if observation $i$ belongs to cluster $k$ and zero otherwise.
Departures from model assumptions can be checked using the D statistic (Walter, 1994):
$$D = \frac{\sum_{k<k'} w(k, k')\,|\operatorname{rank}(\hat\varepsilon_k) - \operatorname{rank}(\hat\varepsilon_{k'})|}{\sum_{k<k'} w(k, k')}.$$
In the application $D = 19.47$, corresponding to a p-value of $0.9 \times 10^{-3}$.
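The D statistic itself is simple to compute once residuals and weights are available. The sketch below assumes no ties among the residuals and takes the weights w(k, k′) as a user-supplied symmetric matrix; the slides do not specify the weights, so the 0/1 neighbour weights in the example are purely hypothetical, as are the function names.

```python
def ranks(values):
    """Ranks 1..K of the values in increasing order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result

def d_statistic(residuals, weights):
    """Weighted mean absolute rank difference over pairs k < k'."""
    r = ranks(residuals)
    K = len(residuals)
    numerator = 0.0
    denominator = 0.0
    for k in range(K):
        for k_prime in range(k + 1, K):
            w = weights[k][k_prime]
            numerator += w * abs(r[k] - r[k_prime])
            denominator += w
    return numerator / denominator

# Three clusters on a line; hypothetical weights: 1 for adjacent clusters, 0 otherwise
residuals = [0.1, 0.5, -0.3]
weights = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
d_value = d_statistic(residuals, weights)
```

If ranks were assigned at random with no ties, the expected absolute rank difference between two clusters would be (K + 1)/3, for instance 22 when K = 65, so values of D well below that level indicate similar residuals in neighbouring clusters.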
The Gambia data: empirical spatial lorelogram
[Figure: empirical spatial lorelogram for the Gambia data; horizontal axis Distance [Km] (0 to 30), vertical axis Lorelogram (0.0 to 0.5).]
The Gambia data
                        GEE                   Spatial
Parameter       Est.    SE     z       Est.    SE     z
Intercept       8.30   3.61   2.30     6.95   3.35   2.08
Age × 10³       0.60   0.11   5.56     0.60   0.11   5.69
Net-use        −0.40   0.14   2.80    −0.39   0.14   2.77
Treated net    −0.39   0.19   2.04    −0.32   0.17   1.90
Green          −0.43   0.16   2.77    −0.36   0.14   2.51
Green² × 10²    0.50   0.16   3.03     0.40   0.15   2.69
PHC            −0.27   0.22   1.24    −0.26   0.17   1.48
Conclusions
• Analysis of spatially clustered binary data using a marginal model
• Consider pairwise odds ratios instead of correlations
• Extend the lorelogram to spatial data
• Use hybrid pairwise likelihood: need only up to bivariate probabilities
• Extensions for ordinal and nominal outcomes
References
• Bai, Y., Kang, J. and Song, P. X.-K. (2014). Efficient pairwise composite likelihood estimation for spatial-clustered data. Biometrics 70, 661–670.
• Diggle, P. J., Moyeed, R., Rowlingson, B. and Thomson, M. (2002). Childhood malaria in the Gambia: a case study in model-based geostatistics. Journal of the Royal Statistical Society Series C 51, 493–506.
• Heagerty, P. J. and Zeger, S. L. (1998). Lorelogram: A regression approach to exploring dependence in longitudinal categorical responses. Journal of the American Statistical Association 93, 150–162.
• Kuk, A. Y. (2007). A hybrid pairwise likelihood method. Biometrika 94, 939–952.
• Mardia, K. V. (1967). Some contributions to the contingency-type bivariate distributions. Biometrika 54, 235–249.
• Thomson, M. C., Connor, S. J., D’Alessandro, U., Rowlingson, B., Diggle, P., Cresswell, M. and Greenwood, B. (1999). Predicting malaria infection in Gambian children from satellite data and bed net use surveys: The importance of spatial correlation in the interpretation of results. American Journal of Tropical Medicine and Hygiene 61, 2–8.
• Walter, S. D. (1994). A simple test for spatial pattern in regional health data. Statistics in Medicine 13, 1037–1044.