the adjustment of random baseline measurements in treatment effect estimation



Journal of Statistical Planning and Inference 136 (2006) 4161–4175

www.elsevier.com/locate/jspi

The adjustment of random baseline measurements in treatment effect estimation

Xun Chen∗
Clinical Biostatistics, Sanofi-Aventis, BX2-400C, Bridgewater, NJ 08807, USA

Received 11 February 2004; received in revised form 18 March 2005; accepted 3 August 2005
Available online 22 September 2005

Abstract

The analysis of covariance (ANCOVA) is often used in analyzing clinical trials that make use of “baseline” response. Unlike Crager [1987. Analysis of covariance in parallel-group clinical trials with pretreatment baseline. Biometrics 43, 895–901.], we show that for a random baseline covariate, the ordinary least squares (OLS)-based ANCOVA method provides invalid unconditional inference for the test of treatment effect when heterogeneous regression exists for the baseline covariate across different treatments. To correctly address the random feature of baseline response, we propose to directly model the pre- and post-treatment measurements as repeated outcome values of a subject. This bivariate modeling method is evaluated and compared with the ANCOVA method by a simulation study under a wide variety of settings. We find that the bivariate modeling method, applying the Kenward–Roger approximation and assuming distinct general variance–covariance matrices for different treatments, performs the best in analyzing a clinical trial that makes use of random baseline measurements.

© 2005 Elsevier B.V. All rights reserved.

Keywords: ANCOVA; Bivariate model; Random baseline measurement

1. Introduction

The analysis of covariance (ANCOVA) is conventionally used to analyze clinical trials that make use of pretreatment “baseline” response. Compared with the simple ANOVA method (either for the posttreatment measurements alone or for the difference between posttreatment and pretreatment measurements), the ANCOVA method may reduce bias due to the imbalance of baseline covariates and, perhaps more importantly, may achieve more powerful comparisons in randomized clinical trials (Laird, 1983; Follmann, 1991; Senn, 1994). One important feature of the ANCOVA method is that it develops inference based on ordinary least squares (OLS) regression, that is, conditionally on the observed baseline covariates. When baseline covariates are fixed observations, e.g., observations controlled by the investigator, tests based on the conditional estimate and variance are valid unconditionally. But when baseline covariates are considered random, which is quite common for laboratory measurements, e.g., blood pressure, fasting plasma glucose, etc., it is not obvious whether the conditional inference will remain valid unconditionally. In a series of papers dealing with the applications of ANCOVA in the early years of Biometrics, Cox and McCullagh (1982) actually called for a distinction between the case in which the covariate is fixed and that in which the covariate is essentially random when applying the ANCOVA method.

∗ Tel.: +1 908 304 6582; fax: +1 908 231 4784.
E-mail address: xun.chen@sanofi-aventis.com.

0378-3758/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2005.08.046

Crager (1987) presented an important result for the application of the ANCOVA model with random pretreatment baselines. In that paper, Crager claimed that the usual (fixed-covariate) ANCOVA treatment effect tests are valid (both conditionally and unconditionally) for a random covariate as long as the pretreatment and posttreatment measurements follow a bivariate normal distribution. This finding was welcome to practitioners, as it suggested no need to distinguish random from fixed baseline covariates for valid ANCOVA tests. It is counter-intuitive, however, especially in the case of heterogeneous regression (i.e., different regression slopes against the baseline covariate for different treatments), when the treatment effect varies with baseline measurements. In this paper, we re-examine the validity of the usual ANCOVA method in the case of a random baseline covariate while revisiting the proof of Crager (1987). We organize the paper as follows. In Section 2, we examine the validity of the usual ANCOVA method when the baseline response variable is considered random. In Section 3, we introduce an alternative bivariate modeling method which directly addresses the random feature of baseline measurements. A simulation study is conducted in Section 4 to evaluate the performance of the usual ANCOVA method and the proposed bivariate modeling method. A data analysis example is presented in Section 5 for illustration purposes. We conclude the paper in Section 6.

2. The validity of the usual ANCOVA method

Assume there are $n_1$ subjects in treatment group 1 and $n_2$ subjects in treatment group 2 in a randomized clinical trial that compares the two treatments. Let $X_{ij}$ and $Y_{ij}$ denote the response at baseline and the response after treatment, respectively, for subject $j$ in treatment group $i$ ($i = 1, 2$; $j = 1, \ldots, n_i$). Assume in the randomized trial

$$\begin{pmatrix} X_{ij} \\ Y_{ij} \end{pmatrix} \sim \text{i.i.d. BVN}\left( \begin{pmatrix} \mu_i \\ \mu_i + \delta_i \end{pmatrix},\ \Sigma^{(i)} \right) \qquad (1)$$

for $i = 1, 2$, $j = 1, \ldots, n_i$. BVN denotes the bivariate normal distribution;

$$\Sigma^{(i)} = \begin{pmatrix} \sigma^{(i)}_{11} & \sigma^{(i)}_{12} \\ \sigma^{(i)}_{12} & \sigma^{(i)}_{22} \end{pmatrix}$$

represents the variance–covariance matrix of $(X_{ij}, Y_{ij})'$.

Because the trial is randomized, it is reasonable to further assume $\mu_1 = \mu_2 = \mu$ and $\sigma^{(1)}_{11} = \sigma^{(2)}_{11} = \sigma_{11}$.

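Here we are interested in estimating the average effect of treatment $i$, say $\delta_i$ in distribution (1), and testing the null hypothesis of no difference between treatments, that is, $H_0: \delta_1 - \delta_2 = 0$. One common method is to fit an ANCOVA model as follows (Crager, 1987):

$$Y_{ij} - X_{ij} = (1 - \beta_i)\mu + \delta_i + (\beta_i - 1)X_{ij} + \varepsilon_{ij}, \qquad (2)$$

where $\varepsilon_{ij}$ is normally distributed with mean 0 and variance $\sigma^{(i)}_{22} - (\sigma^{(i)}_{12})^2/\sigma_{11}$ and is independent of $X_{ij}$. Note

$$\beta_i = \frac{\mathrm{Cov}(Y_{ij}, X_{ij})}{\mathrm{Var}(X_{ij})} = \frac{\sigma^{(i)}_{12}}{\sigma_{11}}.$$

Different covariance between pretreatment values and posttreatment values in different treatment groups may thus result in heterogeneity of regression in model (2). Model (2) is usually fitted by the OLS method, which estimates $\delta_i$ by $\hat\delta_i = \bar Y_{i\cdot} - \hat\mu - \hat\beta_i(\bar X_{i\cdot} - \hat\mu)$ and $c = \delta_1 - \delta_2$ by $\hat c = [\bar Y_{1\cdot} - \hat\beta_1(\bar X_{1\cdot} - \hat\mu)] - [\bar Y_{2\cdot} - \hat\beta_2(\bar X_{2\cdot} - \hat\mu)]$, and estimates their variances conditionally on the baseline covariate, that is, using $\mathrm{Var}(\hat\delta_i \mid \vec X)$ for $\mathrm{Var}(\hat\delta_i)$ and using $\mathrm{Var}(\hat c \mid \vec X)$ for $\mathrm{Var}(\hat c)$. Here $\vec X$ denotes all of the pretreatment measurements $X_{11}, \ldots, X_{1n_1}, X_{21}, \ldots, X_{2n_2}$;

$$\hat\mu = \frac{\sum_i \sum_j X_{ij}}{n_1 + n_2} = \frac{n_1 \bar X_{1\cdot} + n_2 \bar X_{2\cdot}}{n_1 + n_2};$$

$$\hat\beta_i = \frac{\sum_j (X_{ij} - \bar X_{i\cdot})(Y_{ij} - \bar Y_{i\cdot})}{\sum_j (X_{ij} - \bar X_{i\cdot})^2} \quad \text{for } \beta_1 \neq \beta_2,$$

and

$$\hat\beta_1 = \hat\beta_2 = \frac{\sum_i \sum_j (X_{ij} - \bar X_{i\cdot})(Y_{ij} - \bar Y_{i\cdot})}{\sum_i \sum_j (X_{ij} - \bar X_{i\cdot})^2} \quad \text{for } \beta_1 = \beta_2.$$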
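The OLS point estimates defined above can be computed directly from the group data. The following is a minimal sketch, assuming NumPy is available; the function name `ols_ancova` and its `common_slope` flag are hypothetical names introduced here for illustration (they do not come from the paper), and the sketch returns point estimates only, not the conditional variances.

```python
import numpy as np

def ols_ancova(x1, y1, x2, y2, common_slope=False):
    """Return (mu_hat, (b1, b2), (d1, d2), c_hat) from the OLS formulas.

    common_slope=False uses a separate slope per group (ANCOVA II);
    common_slope=True pools the slope across groups (ANCOVA I).
    """
    x1, y1, x2, y2 = (np.asarray(a, dtype=float) for a in (x1, y1, x2, y2))

    # Pooled baseline mean over all n1 + n2 subjects.
    mu_hat = (x1.sum() + x2.sum()) / (x1.size + x2.size)

    def sxy(x, y):
        # Within-group cross-product of centered baseline and outcome.
        return np.sum((x - x.mean()) * (y - y.mean()))

    def sxx(x):
        # Within-group sum of squares of centered baseline.
        return np.sum((x - x.mean()) ** 2)

    if common_slope:
        b1 = b2 = (sxy(x1, y1) + sxy(x2, y2)) / (sxx(x1) + sxx(x2))
    else:
        b1 = sxy(x1, y1) / sxx(x1)
        b2 = sxy(x2, y2) / sxx(x2)

    # Adjusted mean change from baseline in each group, then the contrast.
    d1 = y1.mean() - mu_hat - b1 * (x1.mean() - mu_hat)
    d2 = y2.mean() - mu_hat - b2 * (x2.mean() - mu_hat)
    return mu_hat, (b1, b2), (d1, d2), d1 - d2
```

Note that the contrast `d1 - d2` depends on `mu_hat` only through the term `(b1 - b2) * mu_hat`, which is where the randomness of the baseline enters when the slopes differ.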

The inference developed from the usual ANCOVA model is thus conditional on the observed baseline covariates. In practice, however, interest is more in the unconditional interpretation. To promote unconditional interpretation for the estimates and tests of the ANCOVA model, one needs to make sure that the conditional inference from the usual ANCOVA model is valid unconditionally.

Now $\mathrm{Var}(\hat c) = E[\mathrm{Var}(\hat c \mid \vec X)] + \mathrm{Var}[E(\hat c \mid \vec X)]$ (not $\mathrm{Var}(\hat c) = E[\mathrm{Var}(\hat c \mid \vec X)]$ as in Crager (1987)), and likewise $\mathrm{Var}(\hat\delta_i) = E[\mathrm{Var}(\hat\delta_i \mid \vec X)] + \mathrm{Var}[E(\hat\delta_i \mid \vec X)]$. It follows directly that when the baseline covariates are fixed values, the unconditional inference based on $\mathrm{Var}(\hat c)$ will be identical to the conditional inference based on $\mathrm{Var}(\hat c \mid \vec X)$, as $\mathrm{Var}[E(\hat c \mid \vec X)] = \mathrm{Var}[f(\vec X)] \equiv 0$ for fixed baseline values. But when the pretreatment measurements are random variables, we find that $\mathrm{Var}(\hat c \mid \vec X)$ could underestimate $\mathrm{Var}(\hat c)$ and result in invalid unconditional inference.

Specifically, it can be shown that $E(\hat c \mid \vec X) = \delta_1 - \delta_2 + (\beta_1 - \beta_2)(\hat\mu - \mu)$ and

$$\mathrm{Var}[E(\hat c \mid \vec X)] = (\beta_1 - \beta_2)^2\, \mathrm{Var}(\hat\mu), \qquad (3)$$

where $\mathrm{Var}(\hat\mu) = \sigma_{11}/(n_1 + n_2) > 0$ when the baseline measurements are random variables. It follows directly that for random baseline covariates, $\mathrm{Var}(\hat c) > E[\mathrm{Var}(\hat c \mid \vec X)]$ when $\beta_1 \neq \beta_2$ (that is, when there is an interaction between treatment and baseline). So unlike Crager (1987), we find that the usual ANCOVA model does not always provide valid unconditional inference for the test of $H_0: \delta_1 - \delta_2 = 0$ for data from distribution (1).
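The step leading to Eq. (3) can be spelled out as follows. This is a reconstruction under model (1) with a common baseline mean, using the conditional mean of a bivariate normal and the conditional unbiasedness of the OLS slope; it is offered as a sketch rather than as the paper's own derivation.

```latex
\begin{align*}
E(\bar Y_{i\cdot} \mid \vec X) &= \mu + \delta_i + \beta_i(\bar X_{i\cdot} - \mu),
  \qquad E(\hat\beta_i \mid \vec X) = \beta_i, \\
E\big[\bar Y_{i\cdot} - \hat\beta_i(\bar X_{i\cdot} - \hat\mu) \,\big|\, \vec X\big]
  &= \mu + \delta_i + \beta_i(\hat\mu - \mu), \\
E(\hat c \mid \vec X) &= \delta_1 - \delta_2 + (\beta_1 - \beta_2)(\hat\mu - \mu), \\
\mathrm{Var}\big[E(\hat c \mid \vec X)\big]
  &= (\beta_1 - \beta_2)^2 \, \mathrm{Var}(\hat\mu)
   = (\beta_1 - \beta_2)^2 \, \frac{\sigma_{11}}{n_1 + n_2}.
\end{align*}
```

As a worked number, take the heterogeneous covariance setting used as Scenario II in the simulation study of Section 4, where the slopes are 0.8 and 0.2 and the baseline variance is 1. With 50 subjects per group, the term omitted by the conditional variance is $(0.8 - 0.2)^2 \times 1/100 = 0.0036$.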

Furthermore, it can be shown that when the baseline values are random, $\mathrm{Var}[E(\hat\delta_i \mid \vec X)] = (\beta_i - 1)^2\, \mathrm{Var}(\hat\mu) > 0$ (provided $\beta_i \neq 1$) whether or not $\beta_1 = \beta_2$. This tells us that the usual ANCOVA method tends to underestimate the unconditional variance of its OLS estimate of mean change from baseline for a random baseline covariate. There is thus a need to address the random feature of baseline variables for valid unconditional inference (e.g., for the test of $H_0: \delta_1 - \delta_2 = 0$ and for the estimation of $\mathrm{Var}(\hat\delta_i)$). In the next section, we will introduce the application of a bivariate modeling method, in place of the usual ANCOVA method, for the analysis of data from distribution (1).

3. The application of bivariate modeling method

Considering posttreatment response values as the repeated measures of their corresponding pretreatment values, we may directly address the random feature of the baseline measurements in (1) using a bivariate model:

$$X_{ij} = \mu + e_{ij}, \qquad Y_{ij} = \gamma_i + e'_{ij}, \qquad (4)$$

where $\gamma_i = \mu + \delta_i$ is the posttreatment mean in treatment group $i$,

$$\mathrm{Var}\left[ \begin{pmatrix} X_{ij} \\ Y_{ij} \end{pmatrix} \,\middle|\, i \right] = \Sigma^{(i)} \quad \text{and} \quad \begin{pmatrix} e_{ij} \\ e'_{ij} \end{pmatrix} \sim \text{BVN}(0, \Sigma^{(i)})$$

for $i = 1, 2$, $j = 1, \ldots, n_i$.

Let $\vec\theta = (\mu, \gamma_1, \gamma_2)'$ denote the unknown parameters. The generalized least squares (GLS) estimator of $\vec\theta$ can be represented as

$$\hat{\vec\theta}_{GLS} = \left[ \sum_{i=1}^{2} \sum_{j=1}^{n_i} Z_i^T (\Sigma^{(i)})^{-1} Z_i \right]^{-1} \times \sum_{i=1}^{2} \sum_{j=1}^{n_i} Z_i^T (\Sigma^{(i)})^{-1} \begin{pmatrix} X_{ij} \\ Y_{ij} \end{pmatrix}, \qquad Z_i = \begin{pmatrix} 1 & 0 & 0 \\ 0 & I(i,1) & I(i,2) \end{pmatrix},$$

where $I(i, k) = 1$ if $i = k$ and $I(i, k) = 0$ otherwise. Note that this GLS estimator coincides with the GEE estimator in Yang and Tsiatis (2001). It is known that the GLS estimator is fully efficient under the bivariate normal distribution (1). Under the normality assumption, the restricted maximum likelihood (REML) method can be used to derive empirical estimates for $\Sigma^{(i)}$ (Swallow and Monahan, 1984). Denote the empirical estimator of $\Sigma^{(i)}$ by $\hat\Sigma^{(i)}$ and the corresponding empirical GLS (say, EGLS) estimator for $\vec\theta$ as $\hat{\vec\theta}$. The $F$-test statistic for any linear contrast of $\vec\theta$, say for $H_0: L'\vec\theta = 0$, is then conventionally set as (Verbeke and Molenberghs, 1999)

$$f = \frac{\hat{\vec\theta}' L\, [\widehat{\mathrm{Var}}(L'\hat{\vec\theta})]^{-1} L' \hat{\vec\theta}}{\mathrm{rank}(L)} \sim F\big(\mathrm{rank}(L),\ \mathrm{df}(\widehat{\mathrm{Var}}(L'\hat{\vec\theta}))\big). \qquad (5)$$

Note here $L' = (1, -1, 0)$ and $(1, 0, -1)$ for the estimation of mean change from baseline in treatment groups 1 and 2, respectively; and $L' = (0, 1, -1)$ for the comparison between treatments 1 and 2.
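The GLS estimator above can be illustrated numerically when the covariance matrices are treated as known. The sketch below assumes NumPy; the function name `gls_bivariate` is hypothetical, and the parameter vector is taken to hold the baseline mean followed by the two posttreatment means, so that the design row for a posttreatment value selects only that group's posttreatment mean, consistent with model (4). It does not perform the REML estimation or the Satterthwaite or Kenward–Roger adjustments that the paper relies on in practice.

```python
import numpy as np

def gls_bivariate(x1, y1, x2, y2, sigma1, sigma2):
    """GLS estimate of (baseline mean, post mean 1, post mean 2).

    sigma1 and sigma2 are the 2x2 covariance matrices of (X, Y) in
    groups 1 and 2, assumed known here (an assumption of this sketch).
    Returns the estimate and its covariance matrix.
    """
    lhs = np.zeros((3, 3))
    rhs = np.zeros(3)
    groups = [(x1, y1, sigma1), (x2, y2, sigma2)]
    for i, (x, y, sigma) in enumerate(groups):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # Design matrix for one subject in group i: the baseline row picks
        # the common baseline mean, the post row picks post mean i.
        z = np.zeros((2, 3))
        z[0, 0] = 1.0
        z[1, 1 + i] = 1.0
        w = np.linalg.inv(np.asarray(sigma, dtype=float))
        # Every subject in a group shares z and w, so sum over subjects.
        lhs += x.size * (z.T @ w @ z)
        rhs += z.T @ w @ np.array([x.sum(), y.sum()])
    theta = np.linalg.solve(lhs, rhs)
    cov = np.linalg.inv(lhs)
    return theta, cov

# Contrast for the between-treatment comparison, L' = (0, 1, -1).
L_DIFF = np.array([0.0, 1.0, -1.0])
```

For example, `theta, cov = gls_bivariate(x1, y1, x2, y2, s1, s2)` followed by `L_DIFF @ theta` and `L_DIFF @ cov @ L_DIFF` gives the estimated treatment difference and its variance under known covariance matrices.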

In the next section, we will evaluate the performance of the usual ANCOVA method and the bivariate modeling method by a simulation study over a variety of settings.

4. Simulation study

In the simulation, the usual ANCOVA method and the bivariate modeling method are conducted by SAS™ Proc GLM and SAS™ Proc Mixed, respectively. A variety of model settings are considered:

• Three different distributions are employed when generating the pretreatment and posttreatment measurements $(X_{ij}, Y_{ij})$: standard normal, standardized $t(5)$ (with mean 0 and standard deviation 1), and standardized log-normal(0, 1) (with mean 0 and standard deviation 1). These three distributions are illustrated in Fig. 1. The measurements $(X_{ij}, Y_{ij})$ are generated in three steps: first, a pair of random variables is generated from independent and identical distributions (standard normal, standardized $t(5)$, or standardized log-normal); the pair of generated random variables is then transformed to have a preassigned variance–covariance matrix $\Sigma_i$ for group $i$; finally, the prespecified set of means is added to the pair of transformed random variables, which completes the generation of $(X_{ij}, Y_{ij})$.

• Two scenarios of variance–covariance matrices are applied in the simulation. Scenario I: the two treatment groups have the same variance–covariance matrix, say

$$\Sigma_1 = \Sigma_2 = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix},$$

such that the regressions of posttreatment measurements on pretreatment measurements are parallel across treatments (that is, no interaction). Scenario II: the two treatment groups have different variance–covariance matrices, say

$$\Sigma_1 = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}, \qquad \Sigma_2 = \begin{pmatrix} 1 & 0.2 \\ 0.2 & 0.4 \end{pmatrix},$$

such that the regressions of posttreatment measurements on pretreatment measurements are not parallel across treatments (that is, interaction).

• Different small to moderate sample settings are evaluated. For the equal sample case, we consider $n_1 = n_2 = 10, 20, 50$; for the unequal sample case, we let $n_1 = 20, n_2 = 10$; $n_1 = 50, n_2 = 25$; and $n_1 = 50, n_2 = 40$, separately.
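The three generation steps described in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the author's SAS code: it assumes NumPy, the function names `standardized_draws` and `generate_group` are hypothetical, and the Cholesky factor is used as the transform to the target covariance because the text does not say which transform was applied.

```python
import numpy as np

def standardized_draws(dist, size, rng):
    """Draw i.i.d. values with mean 0 and standard deviation 1."""
    if dist == "normal":
        return rng.standard_normal(size)
    if dist == "t5":
        # Var of t(5) is 5 / (5 - 2), so divide by its square root.
        return rng.standard_t(5, size) / np.sqrt(5.0 / 3.0)
    if dist == "lognormal":
        # log-normal(0, 1): mean exp(1/2), variance (e - 1) * e.
        raw = rng.lognormal(0.0, 1.0, size)
        return (raw - np.exp(0.5)) / np.sqrt((np.e - 1.0) * np.e)
    raise ValueError(f"unknown distribution: {dist}")

def generate_group(n, means, sigma, dist, rng):
    """Generate n pairs (X, Y) with the given means and covariance."""
    # Step 1: independent standardized pairs, one row per subject.
    z = standardized_draws(dist, (n, 2), rng)
    # Step 2: impose the covariance via a Cholesky factor (assumed choice).
    chol = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    # Step 3: shift by the prespecified (baseline, posttreatment) means.
    return z @ chol.T + np.asarray(means, dtype=float)
```

A call such as `generate_group(50, [0.0, 0.4], [[1, 0.8], [0.8, 1]], "t5", np.random.default_rng(1))` would produce one treatment group of 50 subjects under the heterogeneous covariance scenario.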

When analyzing the simulated data, the usual ANCOVA method is conducted in two ways: assuming $\beta_1 = \beta_2$ (ANCOVA I) or assuming $\beta_1 \neq \beta_2$ (ANCOVA II). Correspondingly, versions I and II of the bivariate modeling method assume $\Sigma_1 = \Sigma_2$ and $\Sigma_1 \neq \Sigma_2$, respectively. No specific structure pattern is assumed for $\Sigma_i$ when analyzing the bivariate model in our simulation, as the structure pattern is rarely known a priori in practice and misspecifying it may lead to inflated type I error in analysis (Wright and Wolfinger, 1996). Two different methods are applied to derive the inference in (5) (specifically, to estimate $\mathrm{Var}(L'\hat{\vec\theta})$ and to estimate the degrees of freedom of $\widehat{\mathrm{Var}}(L'\hat{\vec\theta})$) in the simulation: the well-known Satterthwaite approximation (Satterthwaite, 1941) (denoted as Bivariate_S I or Bivariate_S II) and the more recently developed Kenward and Roger approximation (Kenward and Roger, 1997) (denoted as Bivariate_KR I or Bivariate_KR II). The Kenward and Roger approximation is expected to be more reliable in small sample cases (Chen and Wei, 2003).

[Figure 1: density functions f(x) plotted against x (from −3 to 3) for Normal(0,1), standardized t(5), and standardized log-normal(0,1).]

Fig. 1. An illustration for the density functions of the tested distributions.

Without loss of generality, we set $\mu = \delta_1 = \delta_2 = 0$ for the evaluation of the type I error rate for the test of $H_0: \delta_1 = \delta_2$. The powers of the different methods are evaluated at $\mu = \delta_2 = 0$, $\delta_1 = 0.4$. To compare the empirical powers at the same empirical significance level, we adjusted the estimated empirical powers following the method of Zhang and Boos (1994) to take into account the true levels of the different tests. The empirical power of a test will not be reported if the test has an empirical type I error greater than 6% (i.e., 20% above the nominal level). The results presented in this paper are all based on 5000 simulations. The tested nominal level is the conventional 5%.

Tables 1–3 summarize the performance of the different methods when testing $H_0: \delta_1 - \delta_2 = 0$ under different pre–post distribution settings. The estimated standard error of $\hat\delta_1 - \hat\delta_2$ (generated by the respective SAS™ GLM and Mixed procedures) is denoted as SE_est. Its relative difference to the Monte Carlo standard error (SE_MC) is calculated by

$$\frac{\mathrm{SE\_est} - \mathrm{SE\_MC}}{\mathrm{SE\_MC}} \times 100\%.$$
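The relative difference above, together with the empirical type I error rate, can be computed from the per-replicate simulation output. The sketch below assumes NumPy; the function name `summarize_simulation` is hypothetical. It also assumes that SE_est is the average of the model-based standard errors across replicates and that SE_MC is the sample standard deviation of the point estimates across replicates, which the text does not state explicitly.

```python
import numpy as np

def summarize_simulation(estimates, se_estimates, p_values, alpha=0.05):
    """Summarize simulation replicates for one method and setting.

    Returns (se_est, se_mc, rel_diff_pct, type1_pct).
    """
    estimates = np.asarray(estimates, dtype=float)
    se_estimates = np.asarray(se_estimates, dtype=float)
    p_values = np.asarray(p_values, dtype=float)

    # Monte Carlo SE: spread of the point estimates over replicates.
    se_mc = estimates.std(ddof=1)
    # Model-based SE, averaged over replicates (assumed definition).
    se_est = se_estimates.mean()
    # Relative difference in percent; negative means underestimation.
    rel_diff_pct = (se_est - se_mc) / se_mc * 100.0
    # Rejection rate under the null, in percent.
    type1_pct = np.mean(p_values < alpha) * 100.0
    return se_est, se_mc, rel_diff_pct, type1_pct
```

A negative relative difference indicates that the model-based standard error understates the actual sampling variability, which is the pattern that goes together with inflated type I error rates.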

The tables indicate that for random baseline measurements:

• When there is interaction between treatment and baseline, the ANCOVA I method (assuming a homogeneous slope across treatments) performs well for equal sample cases but fails for unequal sample cases; the ANCOVA II method (assuming heterogeneous slopes across treatments), on the other hand, tends to underestimate the standard error of $\hat\delta_1 - \hat\delta_2$ in both equal and unequal sample cases, which results in inflated type I error rates for the corresponding tests.

• When there is no interaction between treatment and baseline, the ANCOVA I and II methods have comparably good performance under normal and t-distributions; for the lognormal distribution, the ANCOVA I method tends to be more reliable than the ANCOVA II method.

• The bivariate modeling method that applies the Kenward–Roger approximation and assumes distinct general variance–covariance matrix for pre–post measurements under different treatments performs well throughout the simulation (except for the log-normal case). The conventional Satterthwaite approximation is not reliable when sample size is smaller than 20.

• The power of the bivariate modeling method is similar to that of the ANCOVA method.

Furthermore, Table 4 summarizes the performance of the different methods estimating mean change from baseline after each treatment. It can be seen that the usual ANCOVA method (both ANCOVA I and II) is unable to provide correct inference for its OLS estimates.

Table 1
Comparison of the usual ANCOVA method and the bivariate method for normal distribution (based on 5000 simulations, nominal level = 5%)

Scenario I: Same covariance matrix across treatments, equal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 10/10 | −0.002 | 0.392 | −2.7 | 5.1 | 15.5 |
| | 20/20 | −0.002 | 0.275 | −2.7 | 5.3 | 27.6 |
| | 50/50 | 0.000 | 0.173 | −0.8 | 5.3 | 60.5 |
| ANCOVA II | 10/10 | −0.002 | 0.393 | −3.0 | 5.1 | 15.4 |
| | 20/20 | −0.002 | 0.275 | −2.8 | 5.6 | 27.4 |
| | 50/50 | 0.000 | 0.174 | −0.8 | 5.3 | 60.8 |
| Bivariate_S I | 10/10 | −0.002 | 0.37 | −8.2 | 6.4 | 15.4 |
| | 20/20 | −0.002 | 0.268 | −5.2 | 6.3 | 27.7 |
| | 50/50 | 0.000 | 0.172 | −1.8 | 5.5 | 60.1 |
| Bivariate_S II | 10/10 | −0.002 | 0.37 | −8.2 | 6.2 | 15.4 |
| | 20/20 | −0.002 | 0.268 | −5.2 | 6.3 | 27.5 |
| | 50/50 | 0.000 | 0.172 | −1.8 | 5.5 | 60.1 |
| Bivariate_KR I | 10/10 | −0.002 | 0.39 | −3.2 | 5.3 | 15.4 |
| | 20/20 | −0.002 | 0.274 | −2.8 | 5.3 | 27.7 |
| | 50/50 | 0.000 | 0.173 | −0.8 | 5.4 | 60.1 |
| Bivariate_KR II | 10/10 | −0.002 | 0.391 | −2.9 | 5.1 | 15.3 |
| | 20/20 | −0.002 | 0.275 | −2.7 | 5.5 | 27.5 |
| | 50/50 | 0.000 | 0.173 | −0.8 | 5.4 | 60.1 |

Scenario I: Same covariance matrix across treatments, unequal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 20/10 | −0.004 | 0.337 | −2.7 | 5.2 | 19.8 |
| | 50/25 | 0.000 | 0.213 | −0.3 | 4.6 | 46.6 |
| | 50/40 | 0.001 | 0.184 | −0.3 | 5.2 | 56.2 |
| ANCOVA II | 20/10 | −0.004 | 0.341 | −2.6 | 5.4 | 19.0 |
| | 50/25 | 0.000 | 0.214 | −0.1 | 4.7 | 46.4 |
| | 50/40 | 0.001 | 0.184 | −0.2 | 5.1 | 56.2 |
| Bivariate_S I | 20/10 | −0.004 | 0.325 | −6.2 | 5.9 | 19.7 |
| | 50/25 | 0.000 | 0.21 | −1.7 | 4.9 | 46.8 |
| | 50/40 | 0.001 | 0.182 | −1.5 | 5.5 | 56.0 |
| Bivariate_S II | 20/10 | −0.004 | 0.321 | −8.2 | 6.6 | 19.1 |
| | 50/25 | 0.000 | 0.209 | −2.2 | 5.1 | 45.2 |
| | 50/40 | 0.001 | 0.182 | −1.4 | 5.4 | 56.4 |
| Bivariate_KR I | 20/10 | −0.004 | 0.336 | −2.9 | 5.3 | 19.7 |
| | 50/25 | 0.000 | 0.213 | −0.3 | 4.6 | 46.8 |
| | 50/40 | 0.001 | 0.184 | −0.3 | 5.3 | 55.9 |
| Bivariate_KR II | 20/10 | −0.004 | 0.338 | −3.3 | 5.3 | 18.9 |
| | 50/25 | 0.000 | 0.213 | −0.2 | 4.6 | 45.2 |
| | 50/40 | 0.001 | 0.184 | −0.2 | 5.1 | 56.4 |

Scenario II: Different covariance matrix across treatments, equal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 10/10 | 0.004 | 0.303 | −1.2 | 4.8 | 24.3 |
| | 20/20 | 0.002 | 0.213 | −0.8 | 5.0 | 44.8 |
| | 50/50 | 0.001 | 0.134 | 0.4 | 4.3 | 85.3 |
| ANCOVA II | 10/10 | 0.003 | 0.272 | −10.8 | 7.0 | — |
| | 20/20 | 0.002 | 0.191 | −10.8 | 7.3 | — |
| | 50/50 | 0.001 | 0.12 | −9.9 | 7.3 | — |
| Bivariate_S I | 10/10 | 0.004 | 0.286 | −6.7 | 6.3 | — |
| | 20/20 | 0.002 | 0.207 | −3.4 | 5.7 | 44.7 |
| | 50/50 | 0.001 | 0.133 | −0.6 | 4.6 | 85.1 |
| Bivariate_S II | 10/10 | 0.004 | 0.286 | −6.9 | 6.2 | — |
| | 20/20 | 0.002 | 0.207 | −3.5 | 5.7 | 44.6 |
| | 50/50 | 0.001 | 0.133 | −0.6 | 4.6 | 85.1 |
| Bivariate_KR I | 10/10 | 0.004 | 0.301 | −1.6 | 5.0 | 24.0 |
| | 20/20 | 0.002 | 0.213 | −0.9 | 5.1 | 44.6 |
| | 50/50 | 0.001 | 0.134 | −0.4 | 4.9 | 85.1 |
| Bivariate_KR II | 10/10 | 0.004 | 0.304 | −0.9 | 4.7 | 23.7 |
| | 20/20 | 0.002 | 0.214 | −0.5 | 5.0 | 44.6 |
| | 50/50 | 0.001 | 0.135 | −0.2 | 4.9 | 85.1 |

Scenario II: Different covariance matrix across treatments, unequal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 20/10 | 0.003 | 0.248 | −6.5 | 7.0 | — |
| | 50/25 | 0.000 | 0.162 | −4.9 | 6.5 | — |
| | 50/40 | 0.001 | 0.142 | 0.5 | 4.5 | 80.6 |
| ANCOVA II | 20/10 | 0.001 | 0.237 | −9.4 | 7.5 | — |
| | 50/25 | 0.000 | 0.148 | −8.4 | 6.7 | — |
| | 50/40 | 0.001 | 0.128 | −8.9 | 6.8 | — |
| Bivariate_S I | 20/10 | 0.003 | 0.249 | −8.8 | 7.4 | — |
| | 50/25 | 0.000 | 0.161 | −5.3 | 6.3 | — |
| | 50/40 | 0.001 | 0.141 | −0.6 | 4.7 | 80.7 |
| Bivariate_S II | 20/10 | 0.002 | 0.246 | −6.4 | 6.5 | — |
| | 50/25 | 0.000 | 0.16 | −1.2 | 5.0 | 68.5 |
| | 50/40 | 0.001 | 0.141 | 0.0 | 4.4 | 81.9 |
| Bivariate_KR I | 20/10 | 0.003 | 0.258 | −5.6 | 6.6 | — |
| | 50/25 | 0.000 | 0.163 | −4.0 | 6.0 | — |
| | 50/40 | 0.001 | 0.142 | 0.6 | 4.5 | 80.7 |
| Bivariate_KR II | 20/10 | 0.002 | 0.260 | −1.2 | 5.3 | 30.3 |
| | 50/25 | 0.000 | 0.159 | −0.6 | 4.8 | 68.7 |
| | 50/40 | 0.001 | 0.140 | −0.4 | 4.9 | 81.9 |

Table 2
Comparison of the usual ANCOVA method and the bivariate method for standardized t(5) distribution (based on 5000 simulations, nominal level = 5%)

Scenario I: Same covariance matrix across treatments, equal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 10/10 | 0.004 | 0.381 | −2.9 | 4.7 | 18.3 |
| | 20/20 | 0.005 | 0.27 | −2.1 | 4.9 | 31.9 |
| | 50/50 | 0.005 | 0.172 | −0.1 | 4.9 | 65.4 |
| ANCOVA II | 10/10 | 0.004 | 0.377 | −4.2 | 5.1 | 17.9 |
| | 20/20 | 0.004 | 0.268 | −3.0 | 5.4 | 31.0 |
| | 50/50 | 0.005 | 0.171 | −0.7 | 5.0 | 65.6 |
| Bivariate_S I | 10/10 | 0.004 | 0.359 | −8.5 | 6.1 | — |
| | 20/20 | 0.005 | 0.263 | −4.6 | 5.4 | 31.7 |
| | 50/50 | 0.005 | 0.170 | −1.1 | 5.2 | 65.3 |
| Bivariate_S II | 10/10 | 0.004 | 0.359 | −8.5 | 5.9 | 18.0 |
| | 20/20 | 0.005 | 0.263 | −4.7 | 5.4 | 31.7 |
| | 50/50 | 0.005 | 0.170 | −1.1 | 5.2 | 65.3 |
| Bivariate_KR I | 10/10 | 0.004 | 0.379 | −3.5 | 4.9 | 18.3 |
| | 20/20 | 0.005 | 0.27 | −2.2 | 5.0 | 31.6 |
| | 50/50 | 0.005 | 0.172 | −0.1 | 4.9 | 65.3 |
| Bivariate_KR II | 10/10 | 0.004 | 0.381 | −3.1 | 4.7 | 18.1 |
| | 20/20 | 0.005 | 0.271 | −2.0 | 4.9 | 32.0 |
| | 50/50 | 0.005 | 0.172 | −0.1 | 4.9 | 65.3 |

Scenario I: Same covariance matrix across treatments, unequal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 20/10 | 0.007 | 0.33 | −2.6 | 5.1 | 22.8 |
| | 50/25 | 0.001 | 0.210 | −0.6 | 5.1 | 47.0 |
| | 50/40 | 0.004 | 0.182 | −0.4 | 4.9 | 60.0 |
| ANCOVA II | 20/10 | 0.006 | 0.331 | −2.8 | 5.4 | 22.0 |
| | 50/25 | 0.001 | 0.209 | −0.8 | 5.2 | 46.8 |
| | 50/40 | 0.004 | 0.181 | −1.0 | 5.1 | 59.8 |
| Bivariate_S I | 20/10 | 0.007 | 0.318 | −6.1 | 6.0 | 22.7 |
| | 50/25 | 0.001 | 0.207 | −2.0 | 5.3 | 47.2 |
| | 50/40 | 0.004 | 0.180 | −1.5 | 5.2 | 60.2 |
| Bivariate_S II | 20/10 | 0.006 | 0.312 | −8.0 | 6.4 | — |
| | 50/25 | 0.001 | 0.206 | −2.5 | 5.2 | 48.1 |
| | 50/40 | 0.004 | 0.180 | −1.4 | 5.3 | 60.2 |
| Bivariate_KR I | 20/10 | 0.007 | 0.329 | −2.8 | 5.2 | 22.6 |
| | 50/25 | 0.001 | 0.210 | −0.7 | 5.1 | 47.2 |
| | 50/40 | 0.004 | 0.182 | −0.4 | 4.9 | 60.2 |
| Bivariate_KR II | 20/10 | 0.006 | 0.328 | −3.2 | 5.2 | 22.5 |
| | 50/25 | 0.001 | 0.210 | −0.5 | 4.9 | 48.1 |
| | 50/40 | 0.004 | 0.182 | −0.2 | 4.9 | 60.2 |

Scenario II: Different covariance matrix across treatments, equal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 10/10 | 0.006 | 0.296 | −0.5 | 4.7 | 27.8 |
| | 20/20 | −0.002 | 0.21 | −0.7 | 4.7 | 47.9 |
| | 50/50 | −0.005 | 0.133 | −1.7 | 5.2 | 82.3 |
| ANCOVA II | 10/10 | 0.004 | 0.267 | −11.3 | 7.2 | — |
| | 20/20 | −0.002 | 0.189 | −11.2 | 7.7 | — |
| | 50/50 | −0.005 | 0.119 | −12.0 | 7.9 | — |
| Bivariate_S I | 10/10 | 0.006 | 0.279 | −6.1 | 5.8 | 27.6 |
| | 20/20 | −0.002 | 0.205 | −3.3 | 5.2 | 48.2 |
| | 50/50 | −0.005 | 0.132 | −2.7 | 5.4 | 82.2 |
| Bivariate_S II | 10/10 | 0.006 | 0.279 | −6.3 | 5.7 | 27.2 |
| | 20/20 | −0.002 | 0.205 | −3.3 | 5.2 | 48.2 |
| | 50/50 | −0.005 | 0.132 | −2.7 | 5.4 | 82.2 |
| Bivariate_KR I | 10/10 | 0.006 | 0.294 | −1.0 | 4.6 | 27.6 |
| | 20/20 | −0.002 | 0.21 | −0.8 | 4.8 | 48.2 |
| | 50/50 | −0.005 | 0.133 | −1.7 | 5.2 | 82.2 |
| Bivariate_KR II | 10/10 | 0.006 | 0.297 | −0.3 | 4.8 | 27.5 |
| | 20/20 | −0.002 | 0.211 | −0.4 | 4.7 | 48.3 |
| | 50/50 | −0.005 | 0.133 | −1.5 | 5.2 | 82.2 |

Scenario II: Different covariance matrix across treatments, unequal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 20/10 | 0.001 | 0.244 | −8.2 | 7.1 | — |
| | 50/25 | −0.006 | 0.161 | −6.0 | 6.7 | — |
| | 50/40 | −0.005 | 0.141 | −1.9 | 5.4 | 76.9 |
| ANCOVA II | 20/10 | −0.001 | 0.234 | −10.5 | 7.7 | — |
| | 50/25 | −0.005 | 0.146 | −11.4 | 8.1 | — |
| | 50/40 | −0.005 | 0.126 | −11.8 | 8.1 | — |
| Bivariate_S I | 20/10 | 0.001 | 0.244 | −8.0 | 6.8 | — |
| | 50/25 | −0.006 | 0.159 | −7.3 | 7.1 | — |
| | 50/40 | −0.005 | 0.139 | −3.0 | 5.7 | 77.0 |
| Bivariate_S II | 20/10 | 0.000 | 0.243 | −6.7 | 6.0 | 32.8 |
| | 50/25 | −0.005 | 0.158 | −4.1 | 5.9 | 65.4 |
| | 50/40 | −0.005 | 0.139 | −2.7 | 5.6 | 77.7 |
| Bivariate_KR I | 20/10 | 0.001 | 0.253 | −4.8 | 5.8 | 32.0 |
| | 50/25 | −0.006 | 0.161 | −6.1 | 6.2 | — |
| | 50/40 | −0.005 | 0.141 | −1.9 | 5.4 | 77.0 |
| Bivariate_KR II | 20/10 | 0.000 | 0.256 | −1.6 | 5.0 | 32.7 |
| | 50/25 | −0.005 | 0.162 | −2.0 | 5.4 | 65.3 |
| | 50/40 | −0.005 | 0.141 | −1.4 | 5.2 | 77.6 |

Table 3
Comparison of the usual ANCOVA method and the bivariate method for standardized log-normal(0, 1) distribution (based on 5000 simulations, nominal level = 5%)

Scenario I: Same covariance matrix across treatments, equal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 10/10 | 0.003 | 0.302 | −7.4 | 4.0 | 33.7 |
| | 20/20 | −0.003 | 0.232 | −3.5 | 4.1 | 45.4 |
| | 50/50 | −0.004 | 0.157 | −4.1 | 5.4 | 69.5 |
| ANCOVA II | 10/10 | 0.005 | 0.29 | −23.8 | 12.0 | — |
| | 20/20 | −0.002 | 0.221 | −18.6 | 11.1 | — |
| | 50/50 | −0.004 | 0.151 | −12.8 | 9.0 | — |
| Bivariate_S I | 10/10 | 0.003 | 0.285 | −12.6 | 5.2 | 34.5 |
| | 20/20 | −0.003 | 0.226 | −6.1 | 4.7 | 45.5 |
| | 50/50 | −0.004 | 0.155 | −5.1 | 5.7 | 69.7 |
| Bivariate_S II | 10/10 | 0.003 | 0.284 | −12.6 | 4.9 | 34.7 |
| | 20/20 | −0.003 | 0.226 | −6.0 | 4.5 | 45.4 |
| | 50/50 | −0.004 | 0.155 | −5.1 | 5.6 | 69.7 |
| Bivariate_KR I | 10/10 | 0.003 | 0.3 | −7.8 | 4.0 | 34.4 |
| | 20/20 | −0.003 | 0.232 | −3.6 | 4.0 | 45.4 |
| | 50/50 | −0.004 | 0.157 | −4.2 | 5.4 | 69.7 |
| Bivariate_KR II | 10/10 | 0.003 | 0.303 | −6.8 | 3.6 | 34.4 |
| | 20/20 | −0.003 | 0.233 | −3.1 | 3.9 | 45.6 |
| | 50/50 | −0.004 | 0.157 | −4.0 | 5.3 | 69.7 |

Scenario I: Same covariance matrix across treatments, unequal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 20/10 | −0.002 | 0.277 | −5.3 | 4.7 | 34.7 |
| | 50/25 | −0.007 | 0.188 | −3.7 | 5.0 | 55.9 |
| | 50/40 | −0.005 | 0.165 | −4.4 | 5.3 | 65.8 |
| ANCOVA II | 20/10 | 0.012 | 0.274 | −17.4 | 10.5 | — |
| | 50/25 | −0.001 | 0.182 | −12.2 | 8.4 | — |
| | 50/40 | −0.003 | 0.159 | −13.2 | 9.5 | — |
| Bivariate_S I | 20/10 | −0.002 | 0.267 | −8.7 | 5.2 | 35.1 |
| | 50/25 | −0.007 | 0.185 | −5.0 | 5.4 | 55.7 |
| | 50/40 | −0.005 | 0.163 | −5.5 | 5.5 | 65.8 |
| Bivariate_S II | 20/10 | 0.006 | 0.255 | −11.8 | 6.5 | — |
| | 50/25 | −0.003 | 0.18 | −6.9 | 6.4 | — |
| | 50/40 | −0.004 | 0.162 | −5.7 | 5.6 | 65.4 |
| Bivariate_KR I | 20/10 | −0.002 | 0.276 | −5.5 | 4.7 | 35.1 |
| | 50/25 | −0.007 | 0.188 | −3.7 | 5.1 | 55.7 |
| | 50/40 | −0.005 | 0.165 | −4.5 | 5.3 | 65.8 |
| Bivariate_KR II | 20/10 | 0.006 | 0.269 | −7.2 | 5.6 | 32.1 |
| | 50/25 | −0.003 | 0.184 | −5.0 | 5.8 | 55.2 |
| | 50/40 | −0.004 | 0.165 | −4.4 | 5.3 | 65.3 |

Scenario II: Different covariance matrix across treatments, equal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 10/10 | −0.048 | 0.246 | −17.2 | 9.5 | — |
| | 20/20 | −0.025 | 0.185 | −11.5 | 8.9 | — |
| | 50/50 | −0.010 | 0.124 | −5.7 | 7.1 | — |
| ANCOVA II | 10/10 | −0.043 | 0.222 | −30.0 | 13.9 | — |
| | 20/20 | −0.025 | 0.165 | −23.7 | 13.2 | — |
| | 50/50 | −0.010 | 0.11 | −16.8 | 10.5 | — |
| Bivariate_S I | 10/10 | −0.048 | 0.228 | −23.2 | 12.5 | — |
| | 20/20 | −0.025 | 0.179 | −14.4 | 10.0 | — |
| | 50/50 | −0.010 | 0.122 | −6.8 | 7.4 | — |
| Bivariate_S II | 10/10 | −0.048 | 0.228 | −24.1 | 11.9 | — |
| | 20/20 | −0.025 | 0.179 | −14.5 | 9.8 | — |
| | 50/50 | −0.010 | 0.122 | −6.8 | 7.4 | — |
| Bivariate_KR I | 10/10 | −0.048 | 0.24 | −18.9 | 10.9 | — |
| | 20/20 | −0.025 | 0.184 | −12.1 | 9.4 | — |
| | 50/50 | −0.010 | 0.124 | −5.8 | 7.2 | — |
| Bivariate_KR II | 10/10 | −0.048 | 0.243 | −19.0 | 10.4 | — |
| | 20/20 | −0.025 | 0.184 | −11.9 | 9.1 | — |
| | 50/50 | −0.010 | 0.124 | −5.7 | 7.1 | — |

Scenario II: Different covariance matrix across treatments, unequal sample size

| Method | Sample size | $\hat\delta_1 - \hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) | Type I error (%) | Adj. power (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 20/10 | −0.017 | 0.217 | −15.9 | 8.8 | — |
| | 50/25 | −0.007 | 0.148 | −11.1 | 7.5 | — |
| | 50/40 | −0.010 | 0.13 | −6.8 | 6.8 | — |
| ANCOVA II | 20/10 | −0.043 | 0.204 | −23.9 | 12.0 | — |
| | 50/25 | −0.020 | 0.134 | −16.5 | 9.9 | — |
| | 50/40 | −0.013 | 0.116 | −16.7 | 10.3 | — |
| Bivariate_S I | 20/10 | −0.017 | 0.208 | −19.5 | 10.5 | — |
| | 50/25 | −0.007 | 0.145 | −12.5 | 8.1 | — |
| | 50/40 | −0.010 | 0.129 | −8.0 | 7.3 | — |
| Bivariate_S II | 20/10 | −0.032 | 0.206 | −20.4 | 11.6 | — |
| | 50/25 | −0.013 | 0.145 | −9.0 | 8.2 | — |
| | 50/40 | −0.011 | 0.129 | −7.4 | 7.4 | — |
| Bivariate_KR I | 20/10 | −0.017 | 0.215 | −16.7 | 9.4 | — |
| | 50/25 | −0.007 | 0.147 | −11.3 | 7.7 | — |
| | 50/40 | −0.010 | 0.13 | −7.0 | 7.0 | — |
| Bivariate_KR II | 20/10 | −0.032 | 0.217 | −16.0 | 10.3 | — |
| | 50/25 | −0.013 | 0.148 | −7.1 | 7.9 | — |
| | 50/40 | −0.011 | 0.131 | −6.1 | 7.0 | — |

Table 4
Estimations of mean change from baseline after each treatment for normal distribution ($n_1 = n_2 = 50$, based on 5000 simulations)

Scenario I: Same covariance matrix across treatments

| Method | $\hat\delta_1$ | SE_est. | Rel. diff. to SE_MC (%) | $\hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 0.002 | 0.122 | −8.8 | 0.002 | 0.122 | −8.7 |
| ANCOVA II | 0.002 | 0.123 | −9.0 | 0.001 | 0.123 | −8.6 |
| Bivariate_S I | 0.002 | 0.132 | −1.9 | 0.002 | 0.132 | −1.8 |
| Bivariate_S II | 0.002 | 0.131 | −3.1 | 0.001 | 0.131 | −2.6 |
| Bivariate_KR I | 0.002 | 0.132 | −1.5 | 0.002 | 0.132 | −1.3 |
| Bivariate_KR II | 0.002 | 0.132 | −2.0 | 0.001 | 0.133 | −1.5 |

Scenario II: Different covariance matrix across treatments

| Method | $\hat\delta_1$ | SE_est. | Rel. diff. to SE_MC (%) | $\hat\delta_2$ | SE_est. | Rel. diff. to SE_MC (%) |
|---|---|---|---|---|---|---|
| ANCOVA I | 0.000 | 0.095 | 3.0 | −0.001 | 0.095 | −22.6 |
| ANCOVA II | 0.000 | 0.085 | −3.0 | −0.001 | 0.085 | −28.0 |
| Bivariate_S I | 0.000 | 0.107 | 15.8 | −0.001 | 0.107 | −12.9 |
| Bivariate_S II | 0.000 | 0.086 | −1.7 | −0.001 | 0.115 | −3.1 |
| Bivariate_KR I | 0.000 | 0.107 | 16.3 | −0.001 | 0.107 | −12.6 |
| Bivariate_KR II | 0.000 | 0.087 | −0.6 | −0.001 | 0.117 | −1.6 |

5. Real life example

Here we use the data presented in Rutherford (2001) to illustrate the application of the different methods discussed in this paper. In this data set, it was assumed that 16 patients’ memory recall scores were measured before and after two different treatments (8 patients per group). The interaction (between baseline and treatments) was found to be statistically significant ($p = 0.016$) following the ANCOVA II analysis. Table 5 summarizes the results of the ANCOVA methods and the bivariate modeling methods. The estimated standard error for $\hat\delta_1 - \hat\delta_2$ following the ANCOVA II analysis is noticeably smaller than those from the other three methods. This is not surprising given the obvious random feature of each patient's memory recall score at baseline, and it coincides with the simulation results in Section 4. The ANCOVA I method provided more reliable inference for the estimation of $\delta_1 - \delta_2$. But none of the ANCOVA methods correctly estimated the SE of $\hat\delta_i$. It is worth noting that the SE of $\hat\delta_i$ is very different between treatments following the Bivariate_KR II method. This actually supports the significant interaction detected in the ANCOVA II analysis. Following the recommendations of the simulation study, practitioners are advised to summarize the analysis based on the Bivariate_KR II results.

Table 5
Real data illustration

| Analysis method | Treatment | Change from baseline, estimate (SE) | Between treatment difference, estimate (SE) |
|---|---|---|---|
| ANCOVA I | Trt 1 | 4.07 (1.38) | −9.60 (1.96) |
| | Trt 2 | 13.67 (1.38) | |
| ANCOVA II | Trt 1 | 4.23 (1.12) | −9.59 (1.58) |
| | Trt 2 | 13.82 (1.12) | |
| Bivariate_KR I | Trt 1 | 4.07 (1.54) | −9.60 (2.00) |
| | Trt 2 | 13.67 (1.54) | |
| Bivariate_KR II | Trt 1 | 4.23 (0.59) | −9.60 (2.02) |
| | Trt 2 | 13.83 (2.01) | |

6. Conclusion and discussion

The focus of this paper is to examine the validity of different methods that make use of ran-dom baseline measurements in treatment effect estimation. The treatment effect is defined in an“average” sense regardless of parallel or nonparallel regressions against baseline across differenttreatments. We have shown that when baseline covariates are random variables, the usual OLS-based ANCOVA treatment effect test is not valid when the regression slopes against the baselinecovariates are different across treatments. We have also shown that with random baseline covari-ates, the usual ANCOVA method is unable to provide valid inference for its OLS estimate of meanchange from baseline for response variable under each treatment. To remedy this deficiency of theOLS-based ANCOVA method, we suggest modeling the pretreatment and posttreatment responsevariables as repeated measures of an outcome. Specifically, when applying the bivariate modelingmethod, our simulation study further suggests that applying the Kenward–Roger approximationand assuming distinct general variance–covariance matrix for different treatment groups wouldachieve the most reliable performance.

When applying the ANCOVA method, practitioners often need to choose between the ANCOVA I model and the ANCOVA II model. A statistical test for the significance of the interaction is usually run beforehand to assist this decision. The problem, however, is that most interaction tests are not sufficiently powered in practice. Moreover, this kind of “conditional” analysis strategy is generally discouraged, as it may inflate the type I error rate of the subsequent tests. This dilemma may be easily resolved by applying the bivariate modeling method with distinct general variance–covariance matrices for different treatment groups. The assumption of a distinct general variance–covariance structure for each treatment automatically covers both the ANCOVA I and ANCOVA II models. And, as we have shown in the simulation study, the loss of power due to such general model assumptions is very limited.
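To see why distinct variance–covariance matrices cover both ANCOVA models, consider the following sketch. The notation is assumed here for illustration and may differ from that used earlier in the paper: X_ij and Y_ij denote the pretreatment and posttreatment responses of subject j in treatment i, assumed bivariate normal with a common baseline mean, as would be expected under randomization.

```latex
\begin{aligned}
\begin{pmatrix} X_{ij} \\ Y_{ij} \end{pmatrix}
  &\sim N\!\left(
     \begin{pmatrix} \mu_x \\ \mu_{y,i} \end{pmatrix},\;
     \Sigma_i =
     \begin{pmatrix} \sigma_{i,xx} & \sigma_{i,xy} \\
                     \sigma_{i,xy} & \sigma_{i,yy} \end{pmatrix}
   \right), \\
E(Y_{ij} \mid X_{ij} = x)
  &= \mu_{y,i} + \beta_i\,(x - \mu_x),
  \qquad \beta_i = \frac{\sigma_{i,xy}}{\sigma_{i,xx}}, \\
\operatorname{Var}(Y_{ij} \mid X_{ij} = x)
  &= \sigma_{i,yy} - \frac{\sigma_{i,xy}^{2}}{\sigma_{i,xx}}.
\end{aligned}
```

If the two matrices are equal, the implied slopes are equal, which is the parallel-regression structure of ANCOVA I. If the matrices are allowed to differ, the slopes may differ as in ANCOVA II, and the residual variances may differ as well. No preliminary interaction test is needed to choose between the two.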

Another advantage of the bivariate modeling method is that it keeps subjects with partial observations (with only pretreatment measurements or only posttreatment measurements) in the analysis, while the ANCOVA method eliminates those subjects from the analysis. The bivariate modeling method could thus be more powerful than the ANCOVA-type method when some subjects have partial observations.
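The difference in data usage can be sketched as follows. The records, field names, and helper functions are hypothetical and serve only to show which subjects each approach retains: a complete-case ANCOVA data set versus the long (one row per measurement) layout typically supplied to a repeated-measures model.

```python
# Hypothetical wide-format records; None marks a missing measurement.
records = [
    {"id": 1, "trt": 1, "pre": 10.0, "post": 14.0},
    {"id": 2, "trt": 1, "pre": 12.0, "post": None},  # posttreatment missing
    {"id": 3, "trt": 2, "pre": None, "post": 25.0},  # pretreatment missing
    {"id": 4, "trt": 2, "pre": 11.0, "post": 24.0},
]


def complete_cases(rows):
    """Subjects usable by ANCOVA: both measurements must be observed."""
    return [r for r in rows if r["pre"] is not None and r["post"] is not None]


def to_long(rows):
    """One row per observed measurement, as used by a repeated-measures model.
    Only the missing cell is dropped, not the whole subject."""
    long_rows = []
    for r in rows:
        for time in ("pre", "post"):
            if r[time] is not None:
                long_rows.append(
                    {"id": r["id"], "trt": r["trt"], "time": time, "value": r[time]}
                )
    return long_rows


if __name__ == "__main__":
    print("subjects in ANCOVA:", len(complete_cases(records)))
    print("subjects in bivariate model:", len({r["id"] for r in to_long(records)}))
```

The sketch addresses data layout only. Whether inference from the retained partial observations is valid depends on the missing-data mechanism, which is not examined here.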

Finally, we should note that the bivariate modeling method can be easily generalized to accommodate more adjustment factors or covariates and to capture the time course of the treatment effect when the posttreatment response is measured more than once in a clinical trial (a multivariate model will be more applicable then). More in-depth exploration of this generalization will be the topic of our future work.

References

Chen, X., Wei, L., 2003. A comparison of recent methods for the analysis of small-sample cross-over studies. Stat. Med. 22, 2821–2833.

Cox, D.R., McCullagh, P., 1982. Some aspects of analysis of covariance. Biometrics 38, 541–561.

Crager, M.R., 1987. Analysis of covariance in parallel-group clinical trials with pretreatment baseline. Biometrics 43, 895–901.

Follmann, D.A., 1991. The effects of screening on some pretest–posttest variance. Biometrics 47, 763–771.

Kenward, M., Roger, J., 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53, 983–997.

Laird, N., 1983. Further comparative analysis of pretest–posttest research designs. Amer. Statist. 37, 329–340.

Rutherford, A., 2001. Introducing ANOVA and ANCOVA: A GLM Approach. Sage Publications, Beverly Hills, CA.

Satterthwaite, F., 1941. Synthesis of variance. Psychometrika 6, 309–316.

Senn, S., 1994. Testing for baseline balance in clinical trials. Statist. Med. 13, 1715–1726.

Swallow, W.H., Monahan, J.F., 1984. Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components. Technometrics 28, 47–57.

Verbeke, G., Molenberghs, G., 1999. Linear Mixed Models in Practice. Springer, Berlin.

Wright, S.P., Wolfinger, R.D., 1996. Repeated measures analysis using mixed models: some simulation results. Paper presented at the Conference on Modeling Longitudinal and Spatially Correlated Data: Methods, Applications, and Future Directions, Nantucket, MA.

Yang, L., Tsiatis, A.A., 2001. Efficiency study of estimators for a treatment effect in a pretest–posttest trial. Amer. Statist. 55, 314–321.

Zhang, J., Boos, D., 1994. Adjusted power estimates in Monte Carlo experiments. Comm. Statist. Simulation Comput. 23 (1), 165–173.