
Computational Statistics manuscript No.(will be inserted by the editor)

Working correlation structure selection in generalized estimating equations

Liya Fu · Yangyang Hao · You-Gan Wang

Received: date / Accepted: date

Abstract Selecting an appropriate correlation structure in analyzing longitudinal data can greatly improve the efficiency of parameter estimation and thus lead to more reliable statistical inference. A number of selection criteria have been proposed in the literature from different perspectives. However, little is known about the relative performance of these criteria. We review and evaluate these criteria by carrying out extensive simulation studies. Surprisingly, we find that the AIC and the BIC based on either the Gaussian working likelihood or the empirical likelihood outperform the others.

Keywords Correlation Information Criterion · Empirical likelihood · Longitudinal data · Model selection

L. Fu
School of Mathematics and Statistics, Xi’an Jiaotong University, China
E-mail: [email protected]

Y. Hao
Logistics Research Center, Shanghai Maritime University, China
E-mail: [email protected]

Y-G. Wang
School of Mathematical Sciences, Queensland University of Technology, Australia
E-mail: [email protected]

1 Introduction

Suppose that a longitudinal data set is composed of an outcome variable, $y_{it}$, and a $p \times 1$ covariate vector, $x_{it}$, observed at times $t = 1, \ldots, n_i$ from subjects $i = 1, \ldots, m$. Let $Y_i = (y_{i1}, \ldots, y_{in_i})^T$ be the outcome values and $X_i = (x_{i1}, \ldots, x_{in_i})^T$ be the $n_i \times p$ matrix of covariate values for $i = 1, \ldots, m$. Usually, the observations from the same subject are correlated, and the underlying correlation structure is unknown (Diggle et al., 2002). In economics, longitudinal data are called panel data, where a “panel” is a group of individuals surveyed repeatedly over time. Let $\mu_{ik}$ be the conditional expectation $E(y_{ik}|x_{ik})$ given $x_{ik}$, and $\sigma_{ik}^2$ be the corresponding conditional variance of $y_{ik}$. In marginal models, it is assumed that (i) $\mu_{ik} = g(x_{ik}^T\beta)$, where $g(\cdot)$ is a specified link function, and $\beta$ is an unknown parameter vector, and that (ii) $\sigma_{ik}^2 = \phi\nu(\mu_{ik})$, where $\nu(\cdot)$ is a given or known function of $\mu_{ik}$, and $\phi$ is a scale parameter. Note that when $y_{ik}$ is a continuous response, the marginal models correspond to the multivariate nonlinear regression models discussed by Davidson and MacKinnon (2004, p.509).

Let $R_i(\alpha)$ be a correlation matrix governed by a parameter vector $\alpha$ of dimension $q$. Define $V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}$, where $A_i = \mathrm{diag}(\nu(\mu_{ik}))$. The “working” matrix $V_i$ will be the same as $\mathrm{cov}(Y_i)$ if $R_i(\alpha)$ is correctly specified as the true correlation matrix of $Y_i$. Because the true correlation matrix is usually difficult to specify and misspecification is the norm, Liang and Zeger (1986) developed the generalized estimating equations (GEE) approach by incorporating the working correlation matrix $R_i(\alpha)$ to account for the within-subject correlation. The GEE can be expressed as

$$U(\beta) = \sum_{i=1}^{m} D_i^T A_i^{-1/2} R_i^{-1}(\alpha) A_i^{-1/2} S_i = 0, \tag{1}$$

where $D_i = \partial\mu_i/\partial\beta^T$ with $\mu_i = (\mu_{i1}, \ldots, \mu_{in_i})^T$, and $S_i = Y_i - \mu_i$. Note that $D_i$, $A_i$, and $S_i$ all depend on $\beta$ via the mean function $\mu_i$. Hence, eqn (1) is generally nonlinear in $\beta$. Let $\hat\beta(R)$ be the GEE estimator obtained by solving eqn (1). The covariance matrix of $\hat\beta(R)$ is given by

$$V_R = \left(\sum_{i=1}^{m} D_i^T V_i^{-1} D_i\right)^{-1} \left(\sum_{i=1}^{m} D_i^T V_i^{-1}\,\mathrm{cov}(S_i)\,V_i^{-1} D_i\right) \left(\sum_{i=1}^{m} D_i^T V_i^{-1} D_i\right)^{-1}.$$

The estimate $\hat V_R$ of the covariance matrix $V_R$ can be obtained by replacing $\mathrm{cov}(S_i)$ with $S_i S_i^T$ and $\beta$, $\alpha$, $\phi$ with their estimates. Note that the GEE can be derived from the criterion function of the generalized method of moments (GMM) (Davidson and MacKinnon, 2004; Hansen, 1982). Therefore, the GEE estimator $\hat\beta(R_i)$ is also a GMM estimator.

Liang and Zeger (1986) proved that $\hat\beta(R)$ is consistent even when the correlation matrix is misspecified. However, the efficiency of $\hat\beta(R)$ depends on the proximity of $R_i(\alpha)$ to the underlying true correlation matrix. When the correlation structure is misspecified, the asymptotic efficiency of $\hat\beta(R)$ could be very low (Fitzmaurice, 1995; Wang and Carey, 2003). Therefore, it is important to seek better modeling of $R_i(\alpha)$ to improve the efficiency of $\hat\beta(R_i)$. The most commonly used working structures are as follows: (i) independence, $q = 0$, $R_i = I_{n_i}$ for all $i$; (ii) exchangeable, $q = 1$, $R_i = (1 - \alpha)I_{n_i} + \alpha ee^T$ with $-1/(\max\{n_i\}_{i=1}^{m} - 1) < \alpha < 1$, where $e = (1, \ldots, 1)^T$; (iii) the first-order autoregression (AR(1)), $q = 1$, $R_i = (\alpha^{|k-l|})$ with $-1 < \alpha < 1$; and (iv) stationary correlation structure, that is, $R_i(\alpha)$ is a matrix in which each descending diagonal from left to right is constant, and $\alpha = (\alpha_1, \ldots, \alpha_q)^T$, where $q = n_i - 1$. Note that the independence, exchangeable, and AR(1) correlation structures are all special cases of the stationary correlation structure.
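The four working structures can be written down directly from their definitions. The following is a minimal sketch in plain Python for a single subject with $n \ge 2$ observations; the function names are illustrative assumptions and do not come from any GEE package, and the stationary constructor does not check positive definiteness.

```python
def independence(n):
    """Independence structure: R = I_n (q = 0)."""
    return [[1.0 if k == l else 0.0 for l in range(n)] for k in range(n)]


def exchangeable(n, alpha):
    """Exchangeable structure: R = (1 - alpha) I_n + alpha e e^T (q = 1)."""
    if not -1.0 / (n - 1) < alpha < 1.0:
        raise ValueError("alpha must lie in (-1/(n-1), 1)")
    return [[1.0 if k == l else alpha for l in range(n)] for k in range(n)]


def ar1(n, alpha):
    """AR(1) structure: entry (k, l) equals alpha^|k-l| (q = 1)."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (-1, 1)")
    return [[alpha ** abs(k - l) for l in range(n)] for k in range(n)]


def stationary(alphas):
    """Stationary structure: entry (k, l) equals alpha_|k-l|, with alpha_0 = 1.

    `alphas` holds (alpha_1, ..., alpha_{n-1}), so q = n - 1.
    """
    lags = [1.0] + list(alphas)
    n = len(lags)
    return [[lags[abs(k - l)] for l in range(n)] for k in range(n)]
```

Passing constant lags, geometric lags, or zero lags to `stationary` reproduces the exchangeable, AR(1), and independence matrices, which is the nesting noted above.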

Traditional model selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are not useful for selecting a correlation structure. Pan (2001) modified the AIC and developed a quasi-likelihood information criterion (QIC) for model selection in the GEE framework. This criterion can be used to select covariates in mean function modeling and working correlation structure modeling. Wang and Carey (2004) first utilized the method proposed by Rotnitzky and Jewell (1990) to choose a working correlation matrix in the GEE. Shults and Chaganty (1998) proposed a simple selection criterion (SC) to choose a correlation structure, which corresponds to the minimum value of the weighted error sum of squares. Shults et al. (2009) evaluated this criterion in the GEE by analyzing correlated binary data. Hin and Wang (2009) proposed a correlation information criterion (CIC) as a modification of the QIC to improve its performance. Gosho et al. (2011) provided a criterion (C(R)) based on a statistic designed to test the hypothesis that the covariance matrix equals a given matrix. Carey and Wang (2011) evaluated the Gaussian pseudolikelihood and geodesic distance criteria for selecting the variance model and the correlation structure in the GEE. Chen and Lazar (2012) proposed two criteria, the empirical Akaike information criterion (EAIC) and the empirical Bayesian information criterion (EBIC), by substituting the empirical likelihood for the parametric likelihood in the AIC and the BIC, which are more powerful than the QIC and the CIC.

As these criteria can help identify the appropriate correlation structure, it is important to compare how their performance changes under different data setups, including the type of response (Gaussian, binary, or Poisson), the true correlation structure (independence, exchangeable, AR(1), or stationary), the value of the correlation coefficient, the sample size $m$, and the number of observation times $n_i$. Knowledge of their performance profiles can reveal their strengths and limitations, which can help shed more light on the choice of a more suitable working correlation structure among competing structures in longitudinal data analysis. In this paper, we describe and evaluate the performance of these criteria based on extensive simulation studies.

The rest of the paper is organized as follows: In Section 2, we review the methods for selecting a working correlation structure. In Section 3, we report the results of extensive simulation studies designed to evaluate the performance of these methods. In Section 4, we analyze a real dataset. In Section 5, we conclude by suggesting a strategy for selecting a working correlation structure in the GEE.


2 Criteria for selecting a working correlation structure

In this section, we describe how the aforementioned criteria select a working correlation structure in the longitudinal data setting.

2.1 Quasi-likelihood criterion

Classical model selection criteria such as the AIC and the BIC need to specify the joint density function of response variables, which is usually unknown or difficult to justify in longitudinal data analysis. Therefore, these model criteria cannot be directly used to select a correlation structure in analysis of longitudinal data. Pan (2001) proposed a criterion named the QIC to choose the appropriate mean model or the working correlation structure, which is a modification of the AIC based on quasi-likelihood.

According to McCullagh and Nelder (1989), the moment assumptions, $\mu_{ik} = E(y_{ik}|x_{ik})$ and $\sigma_{ik}^2 = \phi\nu(\mu_{ik})$, lead to the log quasi-likelihood function of $y_{ik}$ as

$$Q(y_{ik}, \mu_{ik}, \phi) = \int_{y_{ik}}^{\mu_{ik}} \frac{y_{ik} - t}{\phi\nu(t)}\,dt.$$

If $y_{ik}$ is a continuous response with specified $\nu(\mu_{ik}) = 1$, then $Q(y_{ik}, \mu_{ik}, \phi) = -(y_{ik} - \mu_{ik})^2/(2\phi)$. If $y_{ik}$ is a binary response with specified $\nu(\mu_{ik}) = \mu_{ik}(1 - \mu_{ik})$, then $Q(y_{ik}, \mu_{ik}, \phi) = \{y_{ik}\log[\mu_{ik}/(1 - \mu_{ik})] + \log(1 - \mu_{ik})\}/\phi$. If $y_{ik}$ is a count response with specified $\nu(\mu_{ik}) = \mu_{ik}$, then $Q(y_{ik}, \mu_{ik}, \phi) = (y_{ik}\log\mu_{ik} - \mu_{ik})/\phi$ (McCullagh and Nelder, 1989). Note that the displayed expression of $Q(y_{ik}, \mu_{ik}, \phi)$ was used by Pan (2001) without subtracting $Q(y_{ik}, y_{ik}, \phi)$. The dispersion parameter is $\phi = 1$ for a binary response. For other response types, $\phi$ is unknown and estimated by

$$\hat\phi = (M - p)^{-1} \sum_{i=1}^{m} \sum_{k=1}^{n_i} (y_{ik} - \hat\mu_{ik})^2/\nu(\hat\mu_{ik})$$

throughout this paper, in which $M$ is the total number of observations, and $\hat\mu_{ik}$ and $\nu(\hat\mu_{ik})$ are the estimated mean and variance values evaluated at $\beta = \hat\beta(R)$ obtained from eqn (1) with a correlation matrix $R_i$.
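The closed forms above and the dispersion estimate can be checked numerically. The sketch below, in plain Python with illustrative function names that are assumptions rather than part of any package, implements the three closed forms, a midpoint-rule approximation of the defining integral, and $\hat\phi$. For the count response, the integral equals the closed form minus $Q(y_{ik}, y_{ik}, \phi)$, which is the term the text notes was not subtracted.

```python
import math


def q_gaussian(y, mu, phi=1.0):
    """Log quasi-likelihood for a continuous response with nu(mu) = 1."""
    return -(y - mu) ** 2 / (2.0 * phi)


def q_binary(y, mu, phi=1.0):
    """Log quasi-likelihood for a binary response with nu(mu) = mu (1 - mu)."""
    return (y * math.log(mu / (1.0 - mu)) + math.log(1.0 - mu)) / phi


def q_poisson(y, mu, phi=1.0):
    """Log quasi-likelihood for a count response with nu(mu) = mu."""
    return (y * math.log(mu) - mu) / phi


def q_integral(y, mu, var_fn, phi=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of (y - t) / (phi nu(t)) from y to mu."""
    h = (mu - y) / steps
    total = 0.0
    for s in range(steps):
        t = y + (s + 0.5) * h
        total += (y - t) / (phi * var_fn(t))
    return total * h


def dispersion(y, mu_hat, var_fn, p):
    """phi_hat = (M - p)^{-1} * sum of (y - mu_hat)^2 / nu(mu_hat) over all M observations."""
    big_m = len(y)
    pearson = sum((yi - mi) ** 2 / var_fn(mi) for yi, mi in zip(y, mu_hat))
    return pearson / (big_m - p)
```

Here `y` and `mu_hat` are flat lists pooling the observations of all subjects, and `p` is the number of regression parameters.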

Denote the data from the $m$ subjects as $D = ((Y_1, X_1), \ldots, (Y_m, X_m))$. The corresponding log quasi-likelihood using the simple independence working matrix is

$$Q(\beta, \phi; I, D) = \sum_{i=1}^{m} \sum_{k=1}^{n_i} Q(y_{ik}, \mu_{ik}, \phi).$$

Therefore, for a given working correlation matrix, the QIC proposed by Pan (2001) can be expressed as

$$\mathrm{QIC} = -2Q(\hat\beta, \hat\phi; I, D) + 2\,\mathrm{tr}(\hat V_R \hat Q_I), \tag{2}$$

where the first term is the estimated log quasi-likelihood with $\beta = \hat\beta(R)$ and $\phi = \hat\phi(R)$, and the second term is the trace of the product of the estimate $\hat V_R$ of $V_R$ and $\hat Q_I = \phi^{-1}\sum_{i=1}^{m} D_i^T A_i^{-1} D_i|_{\beta=\hat\beta(R),\,\phi=\hat\phi(R)}$. The QIC can be used to select variables in addition to selecting a working correlation matrix in longitudinal data analysis.


2.2 Correlation information criterion

Hin and Wang (2009) noted that the expectation of the first term in the QIC is free from the working correlation matrix and the true correlation matrix. Therefore, $Q(\hat\beta, \hat\phi; I, D)$ does not contain information about the hypothesized correlation structure, and the random errors from $Q(\hat\beta, \hat\phi; I, D)$ can affect the performance of the QIC. For this reason, Hin and Wang (2009) proposed using only the second term in the QIC as a correlation information criterion (CIC) for correlation structure selection,

$$\mathrm{CIC} = \mathrm{tr}(\hat V_R \hat Q_I).$$

Thus, $\mathrm{QIC} = -2Q(\hat\beta, \hat\phi; I, D) + 2\,\mathrm{CIC}$. Without the effect of the random error from the first term in eqn (2), the CIC could be more powerful than the QIC. The theoretical underpinning for the biased first term in eqn (2) was outlined in Wang and Hin (2010). Note that for the continuous response with specified $\nu(\mu_{ik}) = 1$ and $\phi$ estimated by $\hat\phi$, we have $\mathrm{QIC} = (M - p) + 2\,\mathrm{CIC}$, which means that the CIC is equivalent to the QIC in the working correlation structure chosen for the continuous response.
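The identity for the continuous response follows in one step from the definitions in Section 2.1. As a short check, using the Gaussian form of $Q$ and the estimator $\hat\phi$ with $\nu(\mu_{ik}) = 1$:

$$
\begin{aligned}
-2Q(\hat\beta, \hat\phi; I, D)
&= -2\sum_{i=1}^{m}\sum_{k=1}^{n_i}\left\{-\frac{(y_{ik} - \hat\mu_{ik})^2}{2\hat\phi}\right\}
= \frac{1}{\hat\phi}\sum_{i=1}^{m}\sum_{k=1}^{n_i}(y_{ik} - \hat\mu_{ik})^2 \\
&= \frac{(M - p)\,\hat\phi}{\hat\phi} = M - p .
\end{aligned}
$$

The first term of eqn (2) therefore does not vary across candidate working structures in a way that carries any information beyond the constant $M - p$, so ranking structures by the QIC or by the CIC gives the same choice in this case.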

2.3 The Rotnitzky-Jewell criterion

Let $Q_0 = m^{-1}\sum_{i=1}^{m} D_i^T V_i^{-1} D_i$, $Q_1 = m^{-1}\sum_{i=1}^{m} D_i^T V_i^{-1} S_i S_i^T V_i^{-1} D_i$, and $Q = Q_0^{-1} Q_1$. Rotnitzky and Jewell (1990) proposed a generalized Wald test statistic for testing regression parameters based on $Q$. Hin et al. (2007) noted that when the correlation structure is correctly specified, $Q$ should be close to the $p$-dimensional identity matrix, and thus, $C_1 = \mathrm{tr}(Q)/p$ and $C_2 = \mathrm{tr}(Q^2)/p$ should be close to one in value. Therefore, Hin et al. (2007) defined the Rotnitzky-Jewell criterion (RJ) based on the following measure

$$\mathrm{RJ} = \sqrt{(1 - C_1)^2 + (1 - C_2)^2}\,\Big|_{\beta=\hat\beta(R),\,\phi=\hat\phi(R)}.$$

The value of the RJ should be close to zero when the correlation structure is accurately specified. Moreover, Carey and Wang (2011) considered two criteria by geodesic distance: $\Delta_1 = \sum_{i=1}^{p}(\lambda_i - 1)^2/p = C_2 - 2C_1 + 1$ and $\Delta_2 = \sum_{i=1}^{p}(\log\lambda_i)^2$, where $\lambda_i$ are the eigenvalues of $Q$. When $R_i$ approximates the true correlation matrix, all $\lambda_i$ will be close to one, and the values of $\Delta_1$ and $\Delta_2$ will be close to zero. The RJ, $\Delta_1$, and $\Delta_2$ are all based on the statistic $Q$; thus, their performance for selecting the true correlation structure could be similar.
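Given a computed matrix $Q$, the quantities $C_1$, $C_2$, RJ, and $\Delta_1$ need only traces. The following is a minimal sketch in plain Python; the helper names are illustrative assumptions, $Q$ is passed as a list of rows, and $\Delta_2$ is omitted because it requires the eigenvalues of $Q$.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[j][j] for j in range(len(a)))


def rj_criteria(q):
    """Return (C1, C2, RJ, Delta1) for a p x p matrix Q = Q0^{-1} Q1."""
    p = len(q)
    c1 = trace(q) / p
    c2 = trace(matmul(q, q)) / p
    rj = math.sqrt((1.0 - c1) ** 2 + (1.0 - c2) ** 2)
    delta1 = c2 - 2.0 * c1 + 1.0  # equals the mean of (lambda_i - 1)^2
    return c1, c2, rj, delta1
```

For a diagonal $Q$ with entries 1.2 and 0.8, $C_1$ equals one exactly while $C_2 = 1.04$, so RJ and $\Delta_1$ are both 0.04: the second moment $C_2$ is what detects eigenvalues that deviate from one in opposite directions.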

2.4 The Shults-Chaganty criterion

Shults and Chaganty (1998) proposed a quasi-least squares method, which estimates $\alpha$ in the GEE based on

$$\mathrm{SC} = \sum_{i=1}^{m} S_i^T V_i^{-1} S_i\,\Big|_{\beta=\hat\beta(R),\,\phi=\hat\phi(R)}.$$

Shults et al. (2009) proposed the Shults-Chaganty criterion based on the SC to select a working correlation structure for correlated binary data. This criterion is motivated by the hypothesis that the appropriate correlation matrix should minimize the error sum of squares weighted by the inverse of the working correlation matrix. The SC can also be utilized to choose a correlation structure for correlated continuous data and count data.

2.5 The C(R) criterion

Gosho et al. (2011) proposed a criterion to choose a working correlation structure by minimizing the C(R), in which

$$C(R) = \mathrm{tr}\left\{\left[\left(\frac{1}{m}\sum_{i=1}^{m} S_i S_i^T\right)\left(\frac{1}{m}\sum_{i=1}^{m} V_i\right)^{-1} - I_n\right]^2\right\}. \tag{3}$$

From a theoretical point of view, the C(R) measures the discrepancy between the covariance matrix estimator and the specified working covariance matrix. Furthermore, the C(R) is based on a statistic to test the hypothesis that the covariance matrix equals a given matrix. However, the C(R) is appropriate only for balanced data.

2.6 Gaussian pseudolikelihood criterion

Carey and Wang (2011) used the following Gaussian pseudolikelihood criterion to choose a working correlation structure and a variance model:

$$L_G = -\frac{1}{2}\sum_{i=1}^{m}\left\{(Y_i - \mu_i)^T V_i^{-1}(Y_i - \mu_i) + \log(|V_i|)\right\}\Big|_{\beta=\hat\beta(R_i),\,\phi=\hat\phi(R_i)}.$$

A correlation structure yielding the largest $L_G$ value is chosen. On the grounds that competing working correlation structures may have different numbers of correlation parameters, it is necessary to consider the model dimension. Zhu and Zhu (2013) substituted the Gaussian pseudolikelihood for the parametric likelihood in the AIC and the BIC and obtained two criteria:

$$\mathrm{GAIC} = -2L_G + 2\dim(\theta),$$

$$\mathrm{GBIC} = -2L_G + \log(m)\dim(\theta),$$

where $\theta = (\beta^T, \alpha^T)^T$, $\beta \in \mathbb{R}^p$, $\alpha \in \mathbb{R}^q$. For a given set of correlation structures and estimates of $\alpha$ and $\beta$, choose the correlation structure corresponding to the minimum value of the GAIC or the GBIC. Note that there is a subtle difference between the traditional AIC/BIC and the GAIC/GBIC: the traditional AIC/BIC requires the likelihood function to be correct, while the GAIC/GBIC does not. Here, $L_G$ is a working likelihood.
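The selection rule can be sketched in a few lines once $L_G$ has been evaluated for each candidate. In the plain-Python sketch below, the function names are illustrative assumptions, and the $L_G$ values are hypothetical numbers made up to show the mechanics; they are not results from this paper.

```python
import math


def gaic(lg, p, q):
    """GAIC = -2 L_G + 2 dim(theta), with dim(theta) = p + q."""
    return -2.0 * lg + 2.0 * (p + q)


def gbic(lg, p, q, m):
    """GBIC = -2 L_G + log(m) dim(theta), with m the number of subjects."""
    return -2.0 * lg + math.log(m) * (p + q)


def select(candidates, p, m, criterion="GBIC"):
    """Return the candidate name with the smallest criterion value.

    `candidates` maps a structure name to a pair (L_G value, q).
    """
    def score(item):
        lg, q = item[1]
        return gbic(lg, p, q, m) if criterion == "GBIC" else gaic(lg, p, q)
    return min(candidates.items(), key=score)[0]


# Hypothetical L_G values for n = 4 (so q = 3 for the stationary structure).
P, M = 2, 50
candidates = {
    "IN": (-310.0, 0),
    "EX": (-295.0, 1),
    "AR1": (-297.5, 1),
    "ST": (-294.6, 3),
}
best_by_lg = max(candidates, key=lambda name: candidates[name][0])
best_by_gaic = select(candidates, P, M, criterion="GAIC")
best_by_gbic = select(candidates, P, M, criterion="GBIC")
```

With these made-up inputs the largest $L_G$ belongs to the stationary structure, whereas both penalized criteria pick the exchangeable structure, which illustrates why the model dimension is taken into account.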


2.7 The empirical likelihood method

The empirical likelihood method provides a valuable approach for estimating parameters and testing hypotheses in nonparametric or distribution-free contexts (Owen, 2001). Chen and Lazar (2012) proposed novel criteria for selecting a correlation structure via the empirical likelihood. Because their proposed criteria are complex for unbalanced data, we only consider the balanced case in this paper, that is, $n_1 = n_2 = \cdots = n_m = n$.

Let $\alpha = (\alpha_1, \ldots, \alpha_{n-1})^T$. Define the estimating functions for $\beta$ and $\alpha$ based on the $i$th subject:

$$g(Y_i, X_i, \beta, \alpha, R_F) = \begin{pmatrix} D_i^T A_i^{-1/2} R_F^{-1}(\alpha) A_i^{-1/2} S_i \\ \sum_{k=1}^{n-1} e_{ik}e_{i,k+1} - \alpha_1\hat\phi(n - 1 - p/m) \\ \vdots \\ \sum_{k=1}^{1} e_{ik}e_{i,k+n-1} - \alpha_{n-1}\hat\phi(1 - p/m) \end{pmatrix},$$

where $R_F(\alpha)$ is a stationary correlation matrix, and $e_{ik} = (Y_{ik} - \mu_{ik})/\sqrt{\nu(\mu_{ik})}$. The empirical likelihood ratio is

$$R_F(\theta) = \max\left\{\prod_{i=1}^{m} m p_i;\ 0 \le p_i \le 1,\ \sum_{i=1}^{m} p_i = 1,\ \sum_{i=1}^{m} p_i\,g(Y_i, X_i, \beta, \alpha, R_F) = 0\right\}.$$

Chen and Lazar (2012) modified the AIC and the BIC by substituting the empirical likelihood for the parametric likelihood and gave the empirical likelihood versions of the AIC and the BIC:

$$\mathrm{EAIC} = -2\log R_F(\hat\theta) + 2\dim(\theta),$$

$$\mathrm{EBIC} = -2\log R_F(\hat\theta) + \log(m)\dim(\theta).$$

It is worth mentioning that, when calculating the EAIC and the EBIC, $\hat\theta$ is the GEE estimator of $(\beta^T, \alpha^T)^T$ and $R_F(\alpha)$ is embedded by the working correlation matrix. Note that the dimension of $\alpha$ is $n - 1$ (balanced cases) and hence the EAIC and the EBIC contain $p + n - 1$ unknown parameters even for an exchangeable matrix and an AR(1) correlation matrix. Therefore, if the sample size $m$ is small and $n$ is large, the performance of the EAIC and the EBIC is affected, which will be demonstrated with simulation studies.

3 Simulation studies

We now carry out extensive simulation studies to assess the performance of the aforementioned criteria under different settings.


3.1 Data generation designs

We consider three different types of response distributions: Gaussian, Poisson, and binary.

Case (1) Gaussian responses: correlated normal data are generated from a linear model $Y_{ik} = x_{ik1}\beta_1 + x_{ik2}\beta_2 + \epsilon_{ik}$, $k = 1, \ldots, n$, $i = 1, \ldots, m$, where $\beta_1 = \beta_2 = 1$. Covariate $x_{i11} = \cdots = x_{in1} \sim \mathrm{Bernoulli}(0.5)$ and $x_{i12}, \ldots, x_{in2}$ are generated independently from $N(0, 1)$. The errors $\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{in})^T$, $i = 1, \ldots, m$, are generated independently from a multivariate normal distribution $N(0, R(\rho))$.

Case (2) Poisson responses: correlated Poisson data are generated using the mean model $\log(\mu_{ik}) = \beta_0 + x_{ik1}\beta_1 + (k - 1)\beta_2$, $k = 1, \ldots, n$, $i = 1, \ldots, m$, where $\beta_0 = 0.5$, $\beta_1 = \beta_2 = -0.2$, and $x_{ik1} \sim \mathrm{Bernoulli}(0.5)$. Different correlation structures are also considered as in Case (1).

Case (3) Binary responses: correlated binary data are generated using the marginal probability model $\mathrm{logit}(\mu_{ik}) = \beta_0 + x_{ik1}\beta_1 + (k - 1)\beta_2$, $k = 1, \ldots, n$, $i = 1, \ldots, m$, where $x_{ik1}$ and $(\beta_0, \beta_1, \beta_2)$ are the same as those in Case (2). This model was used in Gosho et al. (2011).

For Cases (1) and (2), each subject is assumed to have the same number of observations $n$, and three values of $n$ are investigated: $n = 4$, 8, and 12. As for the true correlation structures, we consider three different structures for the true correlation matrix $R_T$ used in generating the data: independence, exchangeable, and AR(1). When $R_T$ is exchangeable or AR(1), seven different values are investigated for the correlation parameter $\rho$: 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, and 0.8.

For Case (3), because of the intrinsic constraints $\max\{0, p_{ik} + p_{il} - 1\} \le p(y_{ik} = 1, y_{il} = 1) \le \min\{p(y_{ik} = 1), p(y_{il} = 1)\}$ for any $i$, $k$, and $l$, the correlation value $\rho$ between any pair $y_{ik}$ and $y_{il}$ must satisfy the following constraints,

$$\max_{i,\,k \ne l}\left\{-\sqrt{\frac{p_{ik}p_{il}}{q_{il}q_{ik}}},\ -\sqrt{\frac{q_{il}q_{ik}}{p_{ik}p_{il}}}\right\} \le \rho \le \min_{i,\,k \ne l}\left\{\sqrt{\frac{p_{ik}q_{il}}{p_{il}q_{ik}}},\ \sqrt{\frac{p_{il}q_{ik}}{p_{ik}q_{il}}}\right\},$$

where $p_{ik} = p(y_{ik} = 1)$ and $q_{ik} = 1 - p_{ik}$ (McDonald, 1993; Shults et al., 2009). To ensure the constraints are met, we let $\rho$ take values of 0.1, 0.2, 0.3, 0.4, and 0.6 when $n = 4$, and 0.1, 0.2, 0.3, and 0.4 when $n = 8$.
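The feasible range of $\rho$ for the marginal model of Case (3) can be evaluated directly from these bounds. The plain-Python sketch below uses illustrative function names that are assumptions, and it further assumes that the binary covariate $x_{ik1}$ may take either value at every time point, so that all combinations of covariate values are scanned for each pair $k \ne l$.

```python
import math
from itertools import product


def expit(eta):
    """Inverse logit: marginal probability for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def pair_bounds(p_k, p_l):
    """Lower and upper bounds on the correlation of two binary variables."""
    q_k, q_l = 1.0 - p_k, 1.0 - p_l
    lower = max(-math.sqrt(p_k * p_l / (q_k * q_l)),
                -math.sqrt(q_k * q_l / (p_k * p_l)))
    upper = min(math.sqrt(p_k * q_l / (p_l * q_k)),
                math.sqrt(p_l * q_k / (p_k * q_l)))
    return lower, upper


def rho_range(n, beta0=0.5, beta1=-0.2, beta2=-0.2):
    """Range of rho that is feasible for every pair k != l and every x in {0, 1}."""
    lowers, uppers = [], []
    for k, l in product(range(1, n + 1), repeat=2):
        if k == l:
            continue
        for x_k, x_l in product((0, 1), repeat=2):
            p_k = expit(beta0 + beta1 * x_k + beta2 * (k - 1))
            p_l = expit(beta0 + beta1 * x_l + beta2 * (l - 1))
            lo, up = pair_bounds(p_k, p_l)
            lowers.append(lo)
            uppers.append(up)
    return max(lowers), min(uppers)


lower4, upper4 = rho_range(4)
lower8, upper8 = rho_range(8)
```

Under these assumptions the upper bound equals $\exp(-0.4) \approx 0.67$ for $n = 4$ and $\exp(-0.8) \approx 0.45$ for $n = 8$, which is consistent with the largest values $\rho = 0.6$ and $\rho = 0.4$ used above.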

As for the number of subjects, $m$, we consider four different values, $m = 50$, 100, 200, and 500. For each simulation setting, we generate 1000 independent datasets for evaluation. Computation is carried out using the statistical software R version 3.2.5. The correlated Gaussian data are generated using the function mvrnorm(m, mu, Sigma) in the mvtnorm library (Genz et al., 2014), where m is the number of samples, mu is a vector giving the means of the variables, and Sigma is a covariance matrix of the variables. The correlated count data are generated using the function rcounts(m, margins, mu, corstr, corpar) in the corcounts library (Erhardt, 2009), where margins is a vector of margin tokens and is specified as “Poi” for Poisson response, mu is a mean vector of length $n$ for the Poisson margins, corstr is the correlation structure, and corpar is the correlation parameter. The correlated binary data are generated using the function rmvbin(n, margprob, bincorr) in the bindata library (Leisch et al., 2011), where n is the number of observations, margprob are the margin probabilities, which equal $(\mu_{i1}, \ldots, \mu_{in})$ in Case (3) for subject $i$, and bincorr is a given correlation matrix. Parameter estimates $(\hat\beta, \hat\alpha, \hat\phi)$ can be obtained using the function gee in the gee library or the function geese in the geepack library. All simulation R codes are given in the supplementary materials.

3.2 Simulation study results

For the working correlation matrix $R$, we consider three different candidate structures: independence (IN), exchangeable (EX), and AR(1) (AR) structures. The simulation results are plotted in Figures 1–7. The three panels from the top to the bottom in Figures 1–6 correspond to $n = 4$, 8, and 12, while the four plots in each panel correspond to $m = 50$, 100, 200, and 500. Each plot in Figures 1–6 shows the frequency of selecting the true correlation structure versus the correlation value.

From Figures 1–6, we can see that when $R_T$ is exchangeable or AR(1), for a given sample size $m$, the frequencies of selecting the correct structure for most criteria increase with the number of observations $n$, which implies these criteria work well when $n$ is large. For a fixed value of $n$, most criteria have higher frequencies of selecting the correct structure when $m$ becomes larger.

The frequencies of selecting the correct structure for most criteria increase as the correlation $\rho$ increases. However, in the case of Poisson responses (Figs. 3–4), as $n$ increases, the performance of the EBIC and the EAIC deteriorates even when a strong correlation exists. The CIC and the QIC perform similarly for the Gaussian responses (Figs. 1 and 2). However, for other response types (binary and Poisson), the CIC performs much better than the QIC in most cases (Figs. 3–6). The SC does not perform well for most values of $m$, $n$, and $\rho$. The criteria RJ, $\Delta_1$, and $\Delta_2$ perform similarly in most cases, because they are based on the same statistic.

Figure 7 summarizes the results when the true structure is independence; the three panels from top to bottom correspond to the Gaussian, Poisson, and binary responses. Plots from left to right correspond to the different numbers of observation times for each subject ($n$). The x-axis is the number of subjects ($m$), and the y-axis is the frequency of selecting the true correlation structure.

When there is indeed no correlation (the independence model is the true structure), the GAIC, GBIC, EAIC, and EBIC are much more effective than the other criteria (Fig. 7). Furthermore, the GAIC is better than the EAIC, especially when $n$ is large and $m$ is small. Also, the GBIC appears to outperform the EBIC in most cases.

Furthermore, we consider the situation in which the stationary correlation structure (ST) is included in the working correlation structure candidates in addition to the independence, exchangeable, and AR(1) structures. The simulation results show similar patterns (Supplemental Figs. 1–7).

For the Gaussian response (Supplemental Figs. 1 and 2), when $R_T$ is exchangeable or AR(1), the frequencies of selecting the correct structure for most criteria increase as the correlation increases. When the sample size is small ($m = 50$), the GBIC performs better than the EBIC, and the GAIC performs better than the EAIC. When the sample size becomes large ($m = 500$), the GBIC and EBIC are comparable, but the GAIC performs slightly better than the EAIC. The SC performs better than the $\Delta_1$, $\Delta_2$, RJ, and C(R) in most cases. The CIC, $\Delta_1$, $\Delta_2$, and C(R) tend to select the more complex stationary structure. This is because these criteria do not take account of the parameter dimension (i.e., model complexity).

For the Poisson response (Supplemental Figs. 3 and 4), when $R_T$ is exchangeable or AR(1), and $n = 4$ or $n = 8$, the frequencies of selecting the correct structure for most criteria increase as the correlation increases, whereas for $n = 12$ the correct selection rates for most criteria first increase and then decrease as the correlation increases. For a smaller sample size $m = 50$ and a larger $n = 12$, the GAIC and the GBIC perform better than the EAIC and the EBIC. Furthermore, for a large sample size, the GBIC is more effective than the GAIC, and the EBIC demonstrates a higher correct identification rate than the EAIC. When $R_T$ is exchangeable (Supplemental Fig. 3), the $\Delta_1$ and $\Delta_2$ perform better than the RJ, while the RJ performs better than the $\Delta_1$ and $\Delta_2$ when $R_T$ is AR(1) (Supplemental Fig. 4).

For the binary response (Supplemental Figs. 5 and 6), the performance of the C(R), CIC, $\Delta_1$, and $\Delta_2$ is poor. They all tend to select the ST structure more often. When $R_T$ is exchangeable (Supplemental Fig. 5), the EAIC and EBIC perform slightly better than the GBIC for $n = 4$, and they all become more effective than the GAIC. However, the GBIC performs better than the EAIC and EBIC when $m = 50$ and $n = 8$. The GBIC, EAIC, and EBIC perform similarly for large sample sizes. The QIC performs better than the CIC. The performance of the RJ, $\Delta_1$, and $\Delta_2$ is similar. When $R_T$ is AR(1) (Supplemental Fig. 6), the GBIC outperforms other criteria. The RJ is better than the $\Delta_1$ and $\Delta_2$.

When $R_T$ is an independence matrix (Supplemental Fig. 7), the GBIC and EBIC are consistently better than other criteria for a small $n$. Furthermore, the frequencies of selecting the correct structure of the EAIC and EBIC decrease as $n$ increases for a small sample size. This is because the EAIC and EBIC contain more parameters for a larger $n$.

In summary, the performance of all the criteria depends on the true correlation structure, the type of responses, the correlation values, the number of time periods $n$, and the sample size $m$. Overall, the GBIC performs best although not in all cases, and its performance is never too disappointing.


4 Epileptic seizure data

In this section, we apply the methods to the epileptic seizure data from a clinical trial of 59 epileptics (Diggle et al., 2002; Thall and Vail, 1990), which were analyzed by Carey and Wang (2011). For each patient, the number of epileptic seizures was recorded during a baseline period of eight weeks. Patients were randomized to treatment with progabide or placebo. The number of seizures was recorded in four consecutive two-week intervals. Therefore, each patient had four observations. What is of interest is whether progabide decreased the epileptic seizure rate. We use a Poisson model to fit these count data. The response is the number of seizures, and the covariates are progabide treatment, the logarithm of 1/4 the eight-week baseline count, the logarithm of the patient’s age, the interaction between treatment and the logarithm of 1/4 the eight-week baseline count, and an indicator of period four, which were considered in Thall and Vail’s (1990) original analysis.

Following Carey and Wang (2011), we consider the variance function $\nu(\mu_{ik}) = \mu_{ik}^2$. We fit the GEE model assuming four competing working correlation structures: IN, EX, AR(1), and ST, with and without the 49th patient, who had unusual pre- and post-randomization seizure counts (Diggle et al., 2002). The parameter estimates, their standard errors, and criteria values are presented in Table 1. When the 49th patient is included, we can see that the Gaussian pseudolikelihood criteria (the GAIC and the GBIC), $\Delta_1$, $\Delta_2$, SC, and C(R) select an exchangeable structure in agreement with Carey and Wang (2011), whereas the empirical likelihood criteria (the EAIC and the EBIC), RJ, QIC, and CIC select AR(1) instead. When the 49th patient is excluded, the choices of the GAIC, GBIC, EAIC, EBIC, CIC, $\Delta_1$, $\Delta_2$, and C(R) remain unchanged, while the QIC and RJ choose the exchangeable structure in this case. This indicates that the choices of the QIC and RJ are sensitive to the outlier. According to the simulations, the GBIC performs the best overall. Therefore, we suggest choosing the exchangeable correlation structure. In addition, the parameter estimates under the exchangeable correlation structure assumption have smaller standard errors than those obtained under the AR(1) correlation structure assumption except for the log(base) effect. Taking into consideration the standard errors of the parameter estimates, the exchangeable correlation structure may be preferred over AR(1).

5 Conclusion

We have reviewed and evaluated several criteria with simulation studies. The results show that the GAIC, GBIC, EAIC, and EBIC are more useful than other criteria when the true correlation structure is independence, exchangeable, or AR(1). The modified versions of the AIC and BIC via the Gaussian pseudolikelihood are better than their empirical likelihood counterparts in most cases. The GBIC is better than the GAIC, and the EBIC performs better than the EAIC. When the correlation structure candidates are exchangeable or AR(1), the C(R) performs very well. When $R_T$ is a stationary structure and can be approximated by an exchangeable or AR(1) structure, the GAIC, GBIC, EAIC, and EBIC tend to select a parsimonious structure, but they will not result in significant efficiency loss (Chen and Lazar, 2012). According to our simulation results, the GAIC, GBIC, EAIC, and EBIC seem to be reasonably reliable analytical tools, especially when the correlation structure is prone to misspecification. We suggest computing the GAIC, GBIC, EAIC, and EBIC for all the plausible candidate working correlation structures. If the correlation structures selected using these criteria are different, the one with the smaller standard errors can be trusted. In our simulation studies, we consider only balanced data in which the number of observations is the same for each subject. However, unbalanced cases are common in longitudinal data because subjects can drop out at any follow-up. All the criteria can be directly used in unbalanced cases except for the C(R), unless we modify the C(R) by incorporating an indicator function $J_i$ in eqn (7) of Gosho et al. (2014).

In this study, we focused on selecting the correlation structure instead of selecting the covariates in regression (which is also an important topic). The CIC, C(R), RJ, $\Delta_1$, and $\Delta_2$ cannot be used to select covariates in the mean regression, as these criteria depend upon the regression mean model specification. The QIC can be used to select the correlation structure and covariates in the GEE method (Pan, 2001). The SC, GAIC, GBIC, EAIC, and EBIC can also be used to select covariates in the mean model, but their performance has not been assessed. Furthermore, the importance of modelling the variance function has attracted researchers’ attention (Carey and Wang, 2011; Wang and Zhao, 2007; Wang and Hin, 2010). The performance of the Gaussian pseudolikelihood $L_G$, $\Delta_1$, and $\Delta_2$ for selecting a variance function has been studied by Carey and Wang (2011). Our future work will study the performance of the SC, QIC, EAIC, EBIC, GAIC, and GBIC for selecting the covariates or the variance functions.

Acknowledgements This research was funded by the Australian Research Council Discovery Projects (DP130100766 and DP160104292). L. Fu’s research was partly supported by the National Science Foundation of China (Grant Nos. 11201365 and 11301408) and the Doctoral Programs Foundation of Ministry of Education of China (Grant No. 2012020112005).

References

Carey, V. J. and Wang, Y-G. (2011). Working covariance model selection forgeneralized estimating equations. Statistics in Medicine 30, 3117–3124.

Chen, J. and Lazar, N. A. (2012). Selection of working correlation structure ingeneralized estimating equations via empirical likelihood. Journal of Com-putational and Graphical Statistics 21, 18–41.

Davidson, R and MacKinnon, J. G. (2004). Econometric Theory and Methods,Oxford University Press, Oxford.

Diggle, P. J., Heagerty, P. J., Liang, K. L. and Zeger, S. L. (2002). Analysisof longitudinal data. Oxford University Press, Oxford.


Erhardt, V. (2009). Generate correlated count random variables. R package version 1.4.

Fitzmaurice, G. M. (1995). A caveat concerning independence estimating equations with multivariate binary data. Biometrics 51, 309–317.

Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Bornkamp, B. and Hothorn, T. (2014). Multivariate normal and t distributions. R package version 0.9-9997.

Gosho, M., Hamada, C. and Yoshimura, I. (2011). Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal data. Communications in Statistics–Theory and Methods 40, 3839–3856.

Gosho, M., Hamada, C. and Yoshimura, I. (2014). Selection of working correlation structure in weighted generalized estimating equation method for incomplete longitudinal data. Communications in Statistics–Theory and Methods 43, 62–81.

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.

Hin, L-Y., Carey, V. J. and Wang, Y-G. (2007). Criteria for working-correlation-structure selection in GEE: assessment via simulation. The American Statistician 61, 360–364.

Hin, L-Y. and Wang, Y-G. (2009). Working correlation structure identification in generalized estimating equations. Statistics in Medicine 28, 642–658.

Leisch, F., Weingessel, A. and Hornik, K. (2011). Generation of artificial binary data. R package version 0.9-19.

Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.

McCullagh, P. and Nelder, J. (1989). Generalized linear models. Chapman & Hall: London.

McDonald, B. W. (1993). Estimating logistic regression parameters for bivariate binary data. Journal of the Royal Statistical Society. Series B 55, 391–397.

Owen, A. B. (2001). Empirical likelihood. New York: Chapman and Hall-CRC.

Pan, W. (2001). Akaike’s information criterion in generalized estimating equations. Biometrics 57, 120–125.

Rotnitzky, A. and Jewell, N. P. (1990). Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika 77, 485–497.

Shults, J. and Chaganty, N. R. (1998). Analysis of serially correlated data using quasi-least squares. Biometrics 54, 1622–1630.

Shults, J., Sun, W., Tu, X., Kim, H. F., Amsterdam, J., Hilbe, J. M. and Ten-Have, T. (2009). A comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data. Statistics in Medicine 28, 2338–2355.

Thall, P. F. and Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657–671.

Wang, Y-G. and Carey, V. J. (2003). Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance. Biometrika 90, 29–41.

Wang, Y-G. and Carey, V. J. (2004). Unbiased estimating equations from working correlation models for irregularly timed repeated measures. Journal of the American Statistical Association 99, 845–853.

Wang, Y-G. and Hin, L-Y. (2010). Modeling strategies in longitudinal data analysis: Covariate, variance function and correlation structure selection. Computational Statistics and Data Analysis 54, 3359–3370.

Wang, Y-G. and Zhao, Y. (2007). A modified pseudolikelihood approach for analysis of longitudinal data. Biometrics 63, 681–689.

Zhu, X. and Zhu, Z. (2013). Comparison of criteria to select working correlation matrix in generalized estimating equations. Chinese Journal of Applied Probability and Statistics 29, 515–530.


Table 1 Parameter estimates, their standard errors, and the criteria values for the seizure data with four working correlation structures and variance ν(µ) = µ²

                                       Working correlation structure
                                  Complete data                      An outlier removed
                                IN        EX        AR        ST        IN        EX        AR        ST
Intercept                   -0.607    -0.607    -0.801    -0.783    -0.697    -0.697    -0.903    -0.895
                           (0.826)   (0.826)   (0.839)   (0.840)   (0.813)   (0.813)   (0.815)   (0.818)
Log base                     0.872     0.872     0.860     0.860     0.871     0.871     0.858     0.858
                           (0.123)   (0.123)   (0.114)   (0.114)   (0.123)   (0.123)   (0.114)   (0.113)
Progabide                   -0.794    -0.794    -0.857    -0.853    -0.586    -0.586    -0.633    -0.633
                           (0.401)   (0.401)   (0.415)   (0.415)   (0.375)   (0.375)   (0.378)   (0.379)
Log base × Progabide         0.286     0.286     0.305     0.303     0.148     0.148     0.156     0.155
                           (0.207)   (0.207)   (0.215)   (0.215)   (0.196)   (0.196)   (0.196)   (0.197)
Log age                      0.302     0.302     0.367     0.363     0.330     0.330     0.399     0.399
                           (0.239)   (0.239)   (0.245)   (0.245)   (0.235)   (0.235)   (0.238)   (0.238)
Visit 4                     -0.127    -0.127    -0.086    -0.105    -0.129    -0.129    -0.090    -0.111
                           (0.096)   (0.096)   (0.101)   (0.099)   (0.095)   (0.095)   (0.101)   (0.100)
Correlation coefficient          0      0.34      0.45        α1         0      0.34      0.46        α∗

Criteria
GAIC                      1415.950  1385.517  1388.195  1389.352  1369.807  1340.336  1343.005  1344.472
GBIC                      1428.415  1400.060  1402.738  1408.050  1382.170  1354.759  1357.428  1363.016
EAIC                        67.592    19.159    16.577    18.000    60.074    19.580    16.544    18.000
EBIC                        80.057    33.702    31.120    36.698    72.437    34.003    30.967    36.544
QIC                       1950.218  1950.218  1907.170  1920.055  1872.231  1872.231  1824.760  1836.728
CIC                          9.442     9.442     9.321     9.353     8.528     8.528     8.056     8.088
RJ                           2.110     0.147     0.091     0.161     1.523     0.309     0.329     0.395
∆1                           0.884     0.149     0.194     0.176     0.621     0.153     0.161     0.161
∆2                           2.895     1.545     1.700     1.723     2.482     1.827     1.915     1.970
SC                         230.000   229.482   237.892   237.406   226.000   225.489   235.184   234.863
C(R)                         3.438     1.394     1.822     1.574     2.504     1.848     2.195     2.076

α1 = (0.45, 0.30, 0.15) and α∗ = (0.46, 0.30, 0.13).
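
As a reading aid for the criteria rows of Table 1, the following minimal Python sketch picks, for each criterion, the working correlation structure with the smallest value in the complete-data columns. It assumes that smaller values are preferred, which is the usual convention for AIC/BIC-type criteria and for the QIC and CIC; only these six criteria are included for that reason. The function name `select_structure` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Criterion values copied from Table 1 (complete data).
# Column order follows the table: IN, EX, AR, ST.
structures = ["IN", "EX", "AR", "ST"]
criteria = {
    "GAIC": [1415.950, 1385.517, 1388.195, 1389.352],
    "GBIC": [1428.415, 1400.060, 1402.738, 1408.050],
    "EAIC": [67.592, 19.159, 16.577, 18.000],
    "EBIC": [80.057, 33.702, 31.120, 36.698],
    "QIC": [1950.218, 1950.218, 1907.170, 1920.055],
    "CIC": [9.442, 9.442, 9.321, 9.353],
}


def select_structure(values, names=structures):
    """Return the structure with the smallest criterion value.

    Assumption: smaller is better. Ties go to the first structure listed.
    """
    best = min(range(len(values)), key=lambda j: values[j])
    return names[best]


# Structure chosen by each criterion for the complete data.
selected = {name: select_structure(vals) for name, vals in criteria.items()}
for name, choice in selected.items():
    print(f"{name}: {choice}")
```

Under this assumption, the two Gaussian-likelihood criteria pick EX for the complete data, while the other four pick AR. The same function can be applied to the "outlier removed" columns by substituting those values.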

[Figure: three panels (n = 4, n = 8, n = 12), each with sub-panels labelled 50, 100, 200, and 500. Horizontal axis: ρ (0.2 to 0.8). Vertical axis: frequencies of selection (EX). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), SC.]

Fig. 1 Frequencies of selecting the correct correlation structure for Gaussian responses when the true correlation structure is exchangeable.


[Figure: three panels (n = 4, n = 8, n = 12), each with sub-panels labelled 50, 100, 200, and 500. Horizontal axis: ρ (0.2 to 0.8). Vertical axis: frequencies of selection (AR). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), SC.]

Fig. 2 Frequencies of selecting the correct correlation structure for Gaussian responses when the true correlation structure is AR(1).

[Figure: three panels (n = 4, n = 8, n = 12), each with sub-panels labelled 50, 100, 200, and 500. Horizontal axis: ρ (0.2 to 0.8). Vertical axis: frequencies of selection (EX). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), SC.]

Fig. 3 Frequencies of selecting the correct correlation structure for Poisson responses when the true correlation structure is exchangeable.


[Figure: three panels (n = 4, n = 8, n = 12), each with sub-panels labelled 50, 100, 200, and 500. Horizontal axis: ρ (0.2 to 0.8). Vertical axis: frequencies of selection (AR). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), SC.]

Fig. 4 Frequencies of selecting the correct correlation structure for Poisson responses when the true correlation structure is AR(1).

[Figure: two panels (n = 4, n = 8), each with sub-panels labelled 50, 100, 200, and 500. Horizontal axis: ρ (0.1 to 0.6 for n = 4; 0.10 to 0.40 for n = 8). Vertical axis: frequencies of selection (EX). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), and SC (SC is listed in the legend of the n = 4 panel only).]

Fig. 5 Frequencies of selecting the correct correlation structure for binary responses when the true correlation structure is exchangeable (EX).


[Figure: two panels (n = 4, n = 8), each with sub-panels labelled 50, 100, 200, and 500. Horizontal axis: ρ (0.1 to 0.6 for n = 4; 0.10 to 0.40 for n = 8). Vertical axis: frequencies of selection (AR). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), SC.]

Fig. 6 Frequencies of selecting the correct correlation structure for binary responses when the true correlation structure is AR(1).

[Figure: three panels. The top and middle panels have sub-panels labelled 4, 8, and 12; the bottom panel has sub-panels labelled 4 and 8. Horizontal axis: sample size (100 to 500). Vertical axis: frequencies of selection (IN). Curves: GAIC, GBIC, EAIC, EBIC, QIC, CIC, ∆1, ∆2, RJ, C(R), SC.]

Fig. 7 Frequencies of selecting the correct correlation structure (IN) for Gaussian (top panel), Poisson (middle panel), and binary (bottom panel) responses when the true correlation structure is independent (IN).
