Lorelogram models for spatially clustered binary data (fmi.stat.unipd.it/pdf/slides_cattelan.pdf)


Lorelogram models for spatially clustered binary data

Manuela Cattelan and Cristiano Varin∗

Department of Statistical Sciences, University of Padova, Italy
∗ D.E.S.I.S., Ca’ Foscari University - Venice, Italy

Intermediate Workshop of the PRIN 2015 “Likelihood-free Methods of Inference”

February 19, 2019


Outline

• A motivating example

• Spatially clustered binary data

• Spatial lorelogram

• Hybrid pairwise likelihood

• Application


The Gambia malaria data

[Figure: map of the villages, East [Km] against North [Km], grouped into Area 1 to Area 5]

• 65 villages in 5 areas in The Gambia

• 8 to 63 children per village

• Response: presence of malaria parasites in the children's blood

• Individual covariates: age, bed net use, bed net treated with insecticide

• Village-level covariates: presence of a health center, measure of vegetation close to the village


Spatially clustered binary data

• Random effects models (Diggle et al., 2002)

• GEE with spatial exponential correlation matrix (Thomson et al., 1999)

• Marginal multivariate probit with tetrachoric spatial correlation matrix (Bai et al., 2014)

Here:

• Marginal model

• Extend the lorelogram (Heagerty and Zeger, 1998) to spatial dependence

• Employ pairwise odds ratios, rather than correlations, to measure dependence


Notation

• $n_k$: number of observations in cluster $k$, $k = 1, \dots, K$,

• $n = \sum_{k=1}^{K} n_k$: total number of observations,

• $Y = (Y_1, \dots, Y_n)^\top$: vector of all observations, with those belonging to the same cluster contiguous,

• $\pi_i(\beta) = \Pr(Y_i = 1)$,

• $x_i$: $p$-dimensional vector of covariates,

• $\beta$: regression coefficients.


Optimal estimating equations

Focus on the marginal logistic model

$$\mathrm{logit}\{\pi_i(\beta)\} = x_i^\top \beta. \qquad (1)$$

Optimal estimating equations for $\beta$ are

$$D^\top V^{-1} (y - \pi) = 0, \qquad (2)$$

where

• $D = \partial \pi / \partial \beta$,

• $V$: covariance matrix of $Y$,

• $y = (y_1, \dots, y_n)^\top$: vector of observed values of $Y$,

• $\pi = (\pi_1, \dots, \pi_n)^\top$: corresponding vector of marginal probabilities.
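A minimal sketch, assuming NumPy, of solving equation (2) by Fisher scoring. The covariance enters through a hypothetical user-supplied callable `cov_fn` that returns $V$ for the current fitted probabilities; under the logit link in (1), $D = \mathrm{diag}\{\pi_i(1 - \pi_i)\} X$. With the diagonal independence covariance the equations reduce to ordinary logistic regression, which the last lines use as a check on simulated, purely illustrative data.

```python
import numpy as np


def expit(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-eta))


def solve_estimating_equations(X, y, cov_fn, tol=1e-10, max_iter=100):
    """Solve D^T V^{-1} (y - pi) = 0 for beta by Fisher scoring.

    X: (n, p) covariates; y: (n,) binary responses;
    cov_fn: hypothetical user-supplied callable mapping pi (n,) to V (n, n).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = expit(X @ beta)
        D = (pi * (1.0 - pi))[:, None] * X   # d pi / d beta under the logit link
        V = cov_fn(pi)
        Vinv_D = np.linalg.solve(V, D)        # V^{-1} D
        score = Vinv_D.T @ (y - pi)           # D^T V^{-1} (y - pi), V symmetric
        info = D.T @ Vinv_D                   # D^T V^{-1} D
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def independence_cov(pi):
    """Working independence: V = diag{pi_i (1 - pi_i)}."""
    return np.diag(pi * (1.0 - pi))


# Illustrative simulated data (not the Gambia data).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (rng.uniform(size=200) < expit(X @ np.array([-0.5, 1.0]))).astype(float)
beta_hat = solve_estimating_equations(X, y, independence_cov)
```

Passing a non-diagonal `cov_fn` is where a spatial dependence model would plug in; the solver itself does not change.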


Covariance matrix

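• $V_{\mathrm{ind}} = \mathrm{diag}\{\pi_i(1 - \pi_i)\}$,

• $V_{\mathrm{GEE}} = V_{\mathrm{ind}}^{1/2} R \, V_{\mathrm{ind}}^{1/2}$, where $R = \mathrm{diag}(R_1, \dots, R_K)$ is a block-diagonal correlation matrix.

Let $\pi_{ij} = \Pr(Y_i = 1, Y_j = 1)$; then the covariance between observations is $\mathrm{cov}(Y_i, Y_j) = \pi_{ij} - \pi_i \pi_j$. Since $\pi_{ij}$ must lie between $\max(0, \pi_i + \pi_j - 1)$ and $\min(\pi_i, \pi_j)$, the correlation of binary observations is constrained by the marginal means.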
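A small numerical illustration, in plain Python, of how the marginal means restrict the correlation of two binary variables. The marginal probabilities 0.1 and 0.9 are hypothetical values chosen to make the restriction visible: the largest attainable correlation is only about 0.111. The odds ratio, by contrast, can take any positive value whatever the marginals, which is plausibly one reason to model dependence through odds ratios.

```python
import math


def correlation_bounds(pi_i, pi_j):
    """Admissible range of corr(Y_i, Y_j) for binary Y_i, Y_j.

    pi_ij = Pr(Y_i = 1, Y_j = 1) must lie in [max(0, pi_i + pi_j - 1), min(pi_i, pi_j)],
    and corr = (pi_ij - pi_i pi_j) / sqrt(pi_i (1 - pi_i) pi_j (1 - pi_j)).
    """
    sd_prod = math.sqrt(pi_i * (1 - pi_i) * pi_j * (1 - pi_j))
    lower = (max(0.0, pi_i + pi_j - 1) - pi_i * pi_j) / sd_prod
    upper = (min(pi_i, pi_j) - pi_i * pi_j) / sd_prod
    return lower, upper


lo, hi = correlation_bounds(0.1, 0.9)  # hypothetical marginal probabilities
print(f"{lo:.3f} {hi:.3f}")  # -1.000 0.111
```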


Pairwise odds ratio

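The pairwise odds ratio of $Y_i$ and $Y_j$ is

$$\psi_{ij} = \frac{\pi_{ij}\,(1 - \pi_i - \pi_j + \pi_{ij})}{(\pi_i - \pi_{ij})\,(\pi_j - \pi_{ij})}.$$

This can be inverted to obtain $\pi_{ij}$ given $\pi_i$, $\pi_j$ and $\psi_{ij}$ (Mardia, 1967):

$$\pi_{ij}(\psi_{ij}, \pi_i, \pi_j) = \begin{cases} \pi_i \pi_j, & \text{if } \psi_{ij} = 1, \\[4pt] \dfrac{1 + (\pi_i + \pi_j)(\psi_{ij} - 1) - G(\pi_i, \pi_j, \psi_{ij})}{2(\psi_{ij} - 1)}, & \text{if } \psi_{ij} \neq 1, \end{cases}$$

with

$$G(\pi_i, \pi_j, \psi_{ij}) = \sqrt{\{1 + (\pi_i + \pi_j)(\psi_{ij} - 1)\}^2 + 4\psi_{ij}(1 - \psi_{ij})\pi_i \pi_j}.$$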
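The odds-ratio definition and its inversion can be checked against each other with a short round trip in plain Python. The function names and the values $\pi_i = 0.3$, $\pi_j = 0.6$, $\psi_{ij} = 2.5$ are hypothetical; the code implements the two displayed formulas and nothing more.

```python
import math


def odds_ratio(pi_ij, pi_i, pi_j):
    """Pairwise odds ratio psi_ij from the joint and marginal probabilities."""
    return pi_ij * (1 - pi_i - pi_j + pi_ij) / ((pi_i - pi_ij) * (pi_j - pi_ij))


def joint_prob(psi, pi_i, pi_j):
    """pi_ij = Pr(Y_i = 1, Y_j = 1) given psi_ij, pi_i and pi_j (Mardia, 1967)."""
    if abs(psi - 1.0) < 1e-12:          # independence case
        return pi_i * pi_j
    a = 1 + (pi_i + pi_j) * (psi - 1)
    G = math.sqrt(a * a + 4 * psi * (1 - psi) * pi_i * pi_j)
    return (a - G) / (2 * (psi - 1))


p11 = joint_prob(2.5, 0.3, 0.6)
print(round(odds_ratio(p11, 0.3, 0.6), 6))  # 2.5
```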


The spatial lorelogram

• Lorelogram: introduced by Heagerty and Zeger (1998) for longitudinal categorical data.

• $u_i$: coordinates of the $i$-th observation,

• spatial lorelogram between observations $i$ and $j$: $\gamma_{ij} = \log \psi_{ij} = \gamma(u_i, u_j)$.

• Consider isotropic spatial lorelogram models with general form

$$\gamma(d_{ij}; \alpha) = \alpha_1 \mathbb{1}(d_{ij} = 0) + \alpha_2 \, \rho(d_{ij} / \alpha_3),$$

• $d_{ij}$: distance between observations $i$ and $j$,

• $\mathbb{1}(E)$: indicator function of event $E$,

• $\alpha = (\alpha_1, \alpha_2, \alpha_3)^\top$: vector of non-negative parameters.
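Assuming $\rho$ is continuous with $\rho(0) = 1$ (read as the limit $\rho(x) \to 1$ as $x \to 0$ for a function such as $\sin(x)/x$), the role of each parameter follows directly from the general form:

```latex
\gamma(0;\alpha) = \alpha_1 + \alpha_2\,\rho(0) = \alpha_1 + \alpha_2,
\qquad
\lim_{d \downarrow 0} \gamma(d;\alpha) = \alpha_2,
\qquad
\gamma(d;\alpha) = \alpha_2\,\rho(d/\alpha_3) \quad (d > 0).
```

So $\alpha_1$ is the jump in the log odds ratio at distance zero, $\alpha_2$ is its limit as the distance shrinks to zero, and $\alpha_3$ rescales distance. For example, with $\rho(x) = \exp(-x)$ the log odds ratio at $d = \alpha_3$ is $\alpha_2 e^{-1}$.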


Examples of spatial lorelogram models

[Figure: lorelogram against distance (0 to 30) for the Exponential, Gaussian, Spherical and Wave models]

Model         Dependence function ρ(x)        Range      Monotone
Exponential   exp(−x)                         Infinite   yes
Gaussian      exp(−x²)                        Infinite   yes
Spherical     (1 − 1.5x + 0.5x³) 1(x < 1)     Finite     yes
Wave          sin(x)/x                        Infinite   no
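The four dependence functions and the isotropic form $\gamma(d; \alpha) = \alpha_1 \mathbb{1}(d = 0) + \alpha_2 \rho(d/\alpha_3)$ can be coded in a few lines of plain Python. This is a sketch only: the function names and the parameter vector $\alpha = (0.2, 0.5, 5)$ used below are hypothetical, and the wave function is given its limiting value 1 at $x = 0$.

```python
import math


def rho_exponential(x):
    return math.exp(-x)


def rho_gaussian(x):
    return math.exp(-x * x)


def rho_spherical(x):
    # Finite range: exactly zero once x >= 1.
    return (1 - 1.5 * x + 0.5 * x ** 3) if x < 1 else 0.0


def rho_wave(x):
    # sin(x)/x, with its limit 1 at x = 0; takes negative values, so not monotone.
    return 1.0 if x == 0 else math.sin(x) / x


def lorelogram(d, alpha, rho):
    """Isotropic spatial lorelogram gamma(d; alpha) = a1 1(d = 0) + a2 rho(d / a3)."""
    a1, a2, a3 = alpha
    return a1 * (d == 0) + a2 * rho(d / a3)


alpha = (0.2, 0.5, 5.0)  # hypothetical parameter values
for rho in (rho_exponential, rho_gaussian, rho_spherical, rho_wave):
    print(rho.__name__, [round(lorelogram(d, alpha, rho), 3) for d in (0.0, 2.5, 10.0)])
```

The odds ratio implied at distance $d$ is then `math.exp(lorelogram(d, alpha, rho))`.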


The empirical spatial lorelogram

• $d_{\max}$: distance beyond which dependence becomes negligible,

• subdivide the interval $[0, d_{\max}]$ into $B + 1$ possibly overlapping bins,

• the first bin includes only pairs of observations in the same cluster,

• the other bins are constructed around a sequence of equispaced midpoints $0 < m_1 < m_2 < \dots < m_B < d_{\max}$, each with radius $h$,

• the empirical spatial lorelogram is the set of estimates $\hat\gamma_b$ of the logarithms of the pairwise odds ratios $\gamma_b = \gamma(m_b)$, each computed from the pairs of observations in the $b$-th bin.
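A sketch, in plain Python, of how the $B + 1$ bins might be assembled. Two choices here are assumptions not stated above: the midpoints are taken as $m_b = b\,d_{\max}/(B + 1)$, and within-cluster pairs are kept out of the distance bins. The helper name `empirical_bins` and the toy coordinates are hypothetical.

```python
import math


def empirical_bins(coords, cluster, d_max, B, h):
    """Hypothetical helper returning midpoints and B + 1 lists of pairs (i, j), i < j.

    Bin 0 holds within-cluster pairs; bin b = 1..B holds between-cluster pairs with
    max(0, m_b - h) <= d_ij <= m_b + h. Bins may overlap when h exceeds half the spacing.
    """
    mids = [d_max * b / (B + 1) for b in range(1, B + 1)]  # assumed equispaced midpoints
    bins = [[] for _ in range(B + 1)]
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if cluster[i] == cluster[j]:
                bins[0].append((i, j))
                continue  # assumption: within-cluster pairs only enter bin 0
            d = math.dist(coords[i], coords[j])
            for b, m in enumerate(mids, start=1):
                if max(0.0, m - h) <= d <= m + h:
                    bins[b].append((i, j))
    return mids, bins


# Toy example: two observations share a location and a cluster.
coords = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
cluster = [0, 0, 1, 2]
mids, bins = empirical_bins(coords, cluster, d_max=4.0, B=3, h=0.5)
print(mids)  # [1.0, 2.0, 3.0]
print(bins)  # [[(0, 1)], [(0, 2), (1, 2)], [(2, 3)], [(0, 3), (1, 3)]]
```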


The empirical spatial lorelogram

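$\hat\gamma_b$ is estimated from the pairwise likelihood

$$\ell(\beta, \gamma_b) = \sum_{(i,j) \in I_b} \log \Pr(Y_i = y_i, Y_j = y_j),$$

where

$$I_b = \{(i, j) : i < j \text{ and } \max(0, m_b - h) \le d_{ij} \le m_b + h\},$$

and the bivariate probability function is

$$\Pr(Y_i = y_i, Y_j = y_j) = \pi_{ij}^{y_i y_j} (\pi_i - \pi_{ij})^{y_i(1 - y_j)} (\pi_j - \pi_{ij})^{(1 - y_i) y_j} (1 - \pi_i - \pi_j + \pi_{ij})^{(1 - y_i)(1 - y_j)},$$

with $\pi_{ij}$ obtained from $\psi_{ij} = \exp(\gamma_b)$ through the inversion of the pairwise odds ratio.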
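A minimal plain-Python sketch of estimating $\gamma_b$ for a single bin. As a simplifying assumption, $\beta$ (hence every $\pi_i$) is held fixed, for instance at a preliminary independence fit, and the one-dimensional maximisation uses golden-section search on a hypothetical bracket $[-5, 5]$. The toy bin below has cell counts $n_{11} = n_{00} = 30$ and $n_{10} = n_{01} = 10$ with all $\pi_i = 0.5$, for which the maximiser works out to $\log 9 \approx 2.197$.

```python
import math


def joint_prob(psi, pi_i, pi_j):
    """pi_ij from the odds ratio and the marginals (Mardia, 1967)."""
    if abs(psi - 1.0) < 1e-12:
        return pi_i * pi_j
    a = 1 + (pi_i + pi_j) * (psi - 1)
    G = math.sqrt(a * a + 4 * psi * (1 - psi) * pi_i * pi_j)
    return (a - G) / (2 * (psi - 1))


def bivariate_logprob(y_i, y_j, pi_i, pi_j, gamma):
    """log Pr(Y_i = y_i, Y_j = y_j) when the pairwise log odds ratio is gamma."""
    p11 = joint_prob(math.exp(gamma), pi_i, pi_j)
    cells = {(1, 1): p11, (1, 0): pi_i - p11, (0, 1): pi_j - p11,
             (0, 0): 1 - pi_i - pi_j + p11}
    return math.log(cells[(y_i, y_j)])


def bin_loglik(gamma, pairs, y, pi):
    """Pairwise log-likelihood over the pairs of one bin, with pi held fixed."""
    return sum(bivariate_logprob(y[i], y[j], pi[i], pi[j], gamma) for i, j in pairs)


def estimate_gamma(pairs, y, pi, lo=-5.0, hi=5.0, tol=1e-8):
    """Maximise bin_loglik over gamma by golden-section search on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if bin_loglik(c, pairs, y, pi) > bin_loglik(d, pairs, y, pi):
            b, d = d, c                      # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d                      # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2


# Hypothetical bin: 80 pairs with n11 = 30, n10 = 10, n01 = 10, n00 = 30.
cells = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 30
y = [v for cell in cells for v in cell]
pairs = [(2 * k, 2 * k + 1) for k in range(len(cells))]
pi = [0.5] * len(y)
gamma_hat = estimate_gamma(pairs, y, pi)
```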


Hybrid pairwise likelihood estimation

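• Alternates between optimal estimating equations for the regression parameters and pairwise likelihood for the dependence parameters (Kuk, 2007).

• Here, unlike the setting of Kuk (2007), the clusters are not independent: spatial dependence links observations in different clusters.

• We employ the pairwise likelihood

$$\ell(\alpha, \beta) = \sum_{(i,j) \in S_d} \log \Pr(Y_i = y_i, Y_j = y_j),$$

where $S_d$ identifies all pairs of observations that are $d$ or fewer units apart,

$$S_d = \{(i, j) : i < j \text{ and } d_{ij} \le d\}.$$

• Standard errors can be computed from

$$\mathrm{avar}(\hat\beta) = (D^\top V^{-1} D)^{-1}.$$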
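A sketch, assuming NumPy, of the piece that couples the two steps: building $V$ from a lorelogram model and then the model-based $\mathrm{avar}(\hat\beta) = (D^\top V^{-1} D)^{-1}$. The exponential model, the coordinates, the covariates and $\alpha = (0.3, 0.5, 2)$ are hypothetical choices. Pairwise odds ratios need not correspond to a valid joint distribution, so $V$ may fail to be positive definite for some $\alpha$; the Cholesky call is a cheap check.

```python
import numpy as np


def joint_prob(psi, pi_i, pi_j):
    """pi_ij from the odds ratio and the marginals (Mardia, 1967)."""
    if abs(psi - 1.0) < 1e-12:
        return pi_i * pi_j
    a = 1 + (pi_i + pi_j) * (psi - 1)
    G = np.sqrt(a * a + 4 * psi * (1 - psi) * pi_i * pi_j)
    return (a - G) / (2 * (psi - 1))


def lorelogram_cov(pi, coords, alpha):
    """V_ii = pi_i (1 - pi_i), V_ij = pi_ij - pi_i pi_j, with the (assumed) exponential
    lorelogram log psi_ij = a1 1(d_ij = 0) + a2 exp(-d_ij / a3)."""
    a1, a2, a3 = alpha
    n = len(pi)
    V = np.diag(pi * (1 - pi))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(coords[i] - coords[j])
            gamma = a1 * (d == 0) + a2 * np.exp(-d / a3)
            V[i, j] = V[j, i] = joint_prob(np.exp(gamma), pi[i], pi[j]) - pi[i] * pi[j]
    return V


def avar_beta(X, beta, V):
    """Model-based asymptotic variance (D^T V^{-1} D)^{-1} under the logit link."""
    pi = 1 / (1 + np.exp(-X @ beta))
    D = (pi * (1 - pi))[:, None] * X
    return np.linalg.inv(D.T @ np.linalg.solve(V, D))


# Hypothetical toy data: the first two observations share a location.
coords = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
X = np.column_stack([np.ones(4), [0.5, -1.0, 2.0, 0.0]])
beta = np.array([-0.2, 0.4])
pi = 1 / (1 + np.exp(-X @ beta))
V = lorelogram_cov(pi, coords, alpha=(0.3, 0.5, 2.0))
np.linalg.cholesky(V)  # raises LinAlgError if V is not positive definite
se = np.sqrt(np.diag(avar_beta(X, beta, V)))
```

In a hybrid scheme of the kind described above, $V$ would be rebuilt after each update of $\alpha$ before the next estimating-equation step for $\beta$.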


Simulation studies

[Figure: three panels plotting the ratio of standard errors (0.4 to 1.0) against $\alpha_3$ (0.00 to 0.30)]

Figure: Ratio of the standard errors computed ignoring spatial dependence (numerator of the ratio) to the standard errors computed under the assumed exponential spatial lorelogram model (denominator of the ratio). The three panels display the ratios for the intercept (left panel), the trend covariate (central panel) and the subject-specific covariate (right panel). The lines shown are for $\alpha_2 = 0.25, 0.5, 0.75, 1$ (descending).


Simulation studies

[Figure: Coverage ($\beta_0$), Coverage ($\beta_1$) and Coverage ($\beta_2$), from 75% to 100%, against the number of clusters (50 to 400), in panels for $\alpha_3 = 0.03$, $0.05$ and $0.07$]


The Gambia data

GEE
Parameter        Est.    SE     |z|
Intercept        8.30    3.61   2.30
Age × 10³        0.60    0.11   5.56
Net-use         −0.40    0.14   2.80
Treated net     −0.39    0.19   2.04
Green           −0.43    0.16   2.77
Green² × 10²     0.50    0.16   3.03
PHC             −0.27    0.22   1.24

Within-village pairwise odds ratio exp(0.49) = 1.63.


The Gambia data

Consider cluster-level residuals $\hat\varepsilon = (\hat\varepsilon_1, \dots, \hat\varepsilon_K)^\top$ (Diggle et al., 2002), computed as

$$\hat\varepsilon = C^\top \hat V^{-1/2} (y - \hat\pi),$$

where $C$ is an $n \times K$ contrast matrix whose $(i, k)$ element equals $1/\sqrt{n_k}$ if observation $i$ belongs to cluster $k$ and zero otherwise.

Departures from model assumptions can be checked using the $D$ statistic (Walter, 1994):

$$D = \frac{\sum_{k < k'} w(k, k') \, |\mathrm{rank}(\hat\varepsilon_k) - \mathrm{rank}(\hat\varepsilon_{k'})|}{\sum_{k < k'} w(k, k')}.$$

In the application, $D = 19.47$, corresponding to a p-value of $0.9 \times 10^{-3}$.
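A sketch, assuming NumPy, of the residual computation and the $D$ statistic. Several pieces are assumptions rather than details from the slides: $\hat V^{-1/2}$ is taken as the symmetric inverse square root, the weight matrix `W` holding $w(k, k')$ is left to the user (for instance, hypothetically, inverse distances between villages), ties in the ranks are broken by order, and the p-value comes from a simple Monte Carlo permutation of the residuals in which small values of $D$ signal similar neighbouring residuals. Walter (1994) may calibrate the statistic differently.

```python
import numpy as np


def cluster_residuals(y, pi_hat, V_hat, cluster):
    """eps = C^T V^{-1/2} (y - pi), with C[i, k] = 1/sqrt(n_k) if i is in cluster k."""
    labels = np.unique(cluster)
    C = np.zeros((len(y), len(labels)))
    for k, lab in enumerate(labels):
        members = cluster == lab
        C[members, k] = 1.0 / np.sqrt(members.sum())
    # Assumption: symmetric inverse square root via the eigendecomposition of V_hat.
    w, Q = np.linalg.eigh(V_hat)
    V_inv_sqrt = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
    return C.T @ V_inv_sqrt @ (y - pi_hat)


def walter_D(eps, W):
    """Weighted mean absolute rank difference over cluster pairs k < k'."""
    ranks = np.argsort(np.argsort(eps)) + 1      # ranks 1..K, ties broken by order
    iu = np.triu_indices(len(eps), k=1)          # all pairs k < k'
    diffs = np.abs(ranks[:, None] - ranks[None, :])[iu]
    return np.sum(W[iu] * diffs) / np.sum(W[iu])


def permutation_pvalue(eps, W, n_perm=999, seed=0):
    """Assumed Monte Carlo calibration: share of permuted D values at most the observed D."""
    rng = np.random.default_rng(seed)
    d_obs = walter_D(eps, W)
    d_perm = np.array([walter_D(rng.permutation(eps), W) for _ in range(n_perm)])
    return (1 + np.sum(d_perm <= d_obs)) / (n_perm + 1)


# Toy check: three clusters with equal weights; ranks 1, 2, 3 give D = 4/3.
print(walter_D(np.array([0.1, 0.5, 0.9]), np.ones((3, 3))))
```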


The Gambia data: empirical spatial lorelogram

[Figure: empirical spatial lorelogram for The Gambia data, lorelogram (0.0 to 0.5) against distance [Km] (0 to 30)]


The Gambia data

                     GEE                       Spatial
Parameter        Est.    SE     |z|       Est.    SE     |z|
Intercept        8.30    3.61   2.30      6.95    3.35   2.08
Age × 10³        0.60    0.11   5.56      0.60    0.11   5.69
Net-use         −0.40    0.14   2.80     −0.39    0.14   2.77
Treated net     −0.39    0.19   2.04     −0.32    0.17   1.90
Green           −0.43    0.16   2.77     −0.36    0.14   2.51
Green² × 10²     0.50    0.16   3.03      0.40    0.15   2.69
PHC             −0.27    0.22   1.24     −0.26    0.17   1.48


Conclusions

• Analysis of spatially clustered binary data using a marginal model

• Consider pairwise odds ratios instead of correlations

• Extend the lorelogram to spatial data

• Use hybrid pairwise likelihood: only univariate and bivariate probabilities are needed

• Extensions for ordinal and nominal outcomes


References

• Bai, Y., Kang, J. and Song, P. X.-K. (2014). Efficient pairwise composite likelihood estimation for spatial-clustered data. Biometrics 70, 661–670.

• Diggle, P. J., Moyeed, R., Rowlingson, B. and Thomson, M. (2002). Childhood malaria in the Gambia: a case study in model-based geostatistics. Journal of the Royal Statistical Society Series C 51, 493–506.

• Heagerty, P. J. and Zeger, S. L. (1998). Lorelogram: a regression approach to exploring dependence in longitudinal categorical responses. Journal of the American Statistical Association 93, 150–162.

• Kuk, A. Y. (2007). A hybrid pairwise likelihood method. Biometrika 94, 939–952.

• Mardia, K. V. (1967). Some contributions to the contingency-type bivariate distributions. Biometrika 54, 235–249.

• Thomson, M. C., Connor, S. J., D’Alessandro, U., Rowlingson, B., Diggle, P., Cresswell, M. and Greenwood, B. (1999). Predicting malaria infection in Gambian children from satellite data and bed net use surveys: the importance of spatial correlation in the interpretation of results. American Journal of Tropical Medicine and Hygiene 61, 2–8.

• Walter, S. D. (1994). A simple test for spatial pattern in regional health data. Statistics in Medicine 13, 1037–1044.