This article was downloaded by: [University of California Santa Cruz] on 24 October 2014, at 22:28.
Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Journal of Biopharmaceutical Statistics. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lbps20

A SEQUENTIAL PROCEDURE FOR COMPARING TWO EXPERIMENTAL TREATMENTS WITH A CONTROL
Emmanuelle Vincent (a), Susan Todd (b) & John Whitehead (b)
(a) Pfizer Global Research and Development, Fresnes Laboratories, 3 à 9 rue de la Loge, 94265 Fresnes Cedex, France
(b) Medical and Pharmaceutical Statistics Research Unit, The University of Reading, Earley Gate, Reading RG6 6FN, UK
Published online: 05 Oct 2011.

To cite this article: Emmanuelle Vincent, Susan Todd & John Whitehead (2002) A SEQUENTIAL PROCEDURE FOR COMPARING TWO EXPERIMENTAL TREATMENTS WITH A CONTROL, Journal of Biopharmaceutical Statistics, 12:2, 249-265, DOI: 10.1081/BIP-120015747

To link to this article: http://dx.doi.org/10.1081/BIP-120015747
A SEQUENTIAL PROCEDURE FOR COMPARING TWO EXPERIMENTAL TREATMENTS WITH A CONTROL

Emmanuelle Vincent,1,* Susan Todd,2 and John Whitehead2

1 Pfizer Global Research and Development, Fresnes Laboratories, 3 à 9 rue de la Loge, 94265 Fresnes Cedex, France
2 Medical and Pharmaceutical Statistics Research Unit, The University of Reading, Earley Gate, Reading RG6 6FN, UK
ABSTRACT
A procedure is described in which patients are randomized between two
experimental treatments and a control. At a series of interim analyses, each
experimental treatment is compared with control. One of the experimental
treatments might then be found sufficiently superior to the control for it to be
declared the best treatment, and the trial stopped. Alternatively, experimental
treatments might be eliminated from further consideration at any stage. It is
shown how the procedure can be conducted while controlling overall error
probabilities. Data concerning evaluation of different doses of riluzole in the
treatment of motor neurone disease are used for illustration.
Key Words: Clinical trials; Multiple comparisons; Sequential methods
INTRODUCTION
Sequential designs for clinical comparisons of two treatments are now well
established and widely implemented: for introductory accounts see Whitehead[1]
*Corresponding author. E-mail: [email protected]

JOURNAL OF BIOPHARMACEUTICAL STATISTICS, Vol. 12, No. 2, pp. 249–265, 2002
DOI: 10.1081/BIP-120015747; 1054-3406 (Print); 1520-5711 (Online)
Copyright © 2002 by Marcel Dekker, Inc. www.dekker.com
©2002 Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc.
MARCEL DEKKER, INC. • 270 MADISON AVENUE • NEW YORK, NY 10016
and Jennison and Turnbull.[2] Much less developed is the methodology for
comparing more than two treatments. In practice, ad-hoc modifications to methods
for two treatments have often been used: see Whitehead and Thomas[3] for a
description of a comparison of two nonsteroidal anti-inflammatory drugs with
placebo control in arthritis and Johnson et al.,[4] who compare low and high
intravenous doses of a drug with oral administration in prostatic cancer.
Amongst multiple treatment methods, three situations can be distinguished:
(a) The treatments comprise a control and two or more qualitatively
different experimental treatments;
(b) The treatments are all qualitatively different and none of them can be
considered to be a control;
(c) The treatments are different doses of the same drug, and if a control
group is present it can be characterized as being dose 0.
This paper concerns situation (a), which commonly arises in drug
development. The objective is to establish whether one or more of the
experimental treatments is significantly superior to a control. A series of interim
analyses will allow early elimination of inferior treatments or early selection of the
obvious best. The goal of early elimination and selection will be pursued in order
to minimize the exposure of patients to inferior treatments. Alternative approaches
might give precise estimation a greater priority, which would lead to an
avoidance of eliminating or selecting treatments early, but this strategy is not
adopted here.
Previous approaches to situation (a) include the method of Paulson[5] that
allowed elimination of inferior treatments, but did not consider whether the
remaining treatments were significantly better than control. Follmann, Proschan,
and Geller[6] consider an approach to sequential multiple treatment comparisons
that can be applied to either situation (a) or (b) (see also Ref. [7]). They apply an
α-spending function approach to the preservation of overall type I error rate, and
evaluate the associated critical stopping limits using simulation. Although the
approach taken here differs in detail, it can be viewed as building upon Follmann,
Proschan, and Geller’s work, adding to it methods for achieving set power that
react progressively to accumulating knowledge of the rate at which information is
acquired and of values of certain nuisance parameters. Numerical integration is
used for all calculations, rather than simulation. Tang and Geller[8] suggest an
alternative approach based on a closed testing procedure, as an application of a
method developed principally in the context of multiple endpoints. They present
procedures that preserve type I error, but do not explore the computation of power
in any detail.
The methods developed for case (a) may be applied when treatments are
actually alternative doses of the same drug. This is appropriate when it is not
desirable to make assumptions about the shape of the dose–response curve. The
principal illustration of this paper is a phase III comparison of placebo and daily
doses of 50, 100, and 200 mg bid of the drug riluzole in amyotrophic lateral
sclerosis (ALS).[9] This condition is also known as motor neurone disease or Lou
Gehrig’s disease. The endpoint under consideration was survival time from
randomization to death, and at the design stage investigators were not prepared to
assume monotonicity of the dose–response relationship, let alone linearity. This
trial is described in greater detail in “A Trial Evaluating Riluzole in the Treatment
of Amyotrophic Lateral Sclerosis.”
Situation (b), in which no treatment acts as control, is less common in
clinical trials. Were it to occur, it is unlikely that the elimination procedure of
Paulson[10] or the sequential χ²- and F-tests of Siegmund[11] and Jennison and
Turnbull[12] would be suitable. The former seeks to select the best treatment,
without regard to it being significantly better than its rivals; the latter stop as soon
as it has been established that not all treatments are identical, regardless of the
pattern of treatment effects. Of greater practical interest for clinical trials is the
extension due to Siegmund[13] for the case of three treatments: once it has been
established that the treatments are not identical, a second stage is used to identify
the best. This approach is applied to three treatment survival studies by
Betensky.[14] As mentioned earlier, the approach due to Follmann, Proschan, and
Geller[6] can also be applied to case (b).
In case (c), the treatments are all doses of the same drug, and if a particular
form of dose–response relationship can be assumed, then specific methods can be
applied. Whitehead (see Ref. [1]; Section 8.4) describes a sequential trend test
applicable to binary responses which is a sequential version of the trend test of
Armitage.[15] Focusing on a single parameter such as slope allows the direct
application of univariate procedures.
The methodology described in this paper is based on the statistics comparing
each of the experimental treatments with control, and their joint, approximately
multivariate normal distribution. Especially important is the allowance for the
correlations between them. The approach builds on the methods of bivariate
sequential analysis developed for the simultaneous monitoring of safety and
efficacy in a comparison of two treatments by Jennison and Turnbull[16] and Cook
and Farewell.[17] More specifically, we extend the work of Todd[18,19] who allows
for updating the parameters underlying the design at each interim analysis in the
context of a more general formulation. This allows us to determine designs that
will achieve given overall power specifications, and enables appropriate
recomputation of future stopping limits in line with emerging patterns of
information accrual and correlation.
A TRIAL EVALUATING RILUZOLE IN THE TREATMENT OF AMYOTROPHIC LATERAL SCLEROSIS
Between December 1992 and November 1993, 959 patients suffering
from ALS were randomized in roughly equal proportions between a placebo
control arm and daily doses 50, 100, and 200 mg bid of the drug riluzole.[9]
Treatment was administered to each patient for a period of up to 18 months,
after which open label treatment was allowed at the discretion of the treating
clinician. The primary efficacy response was time from randomization to
failure, where failure included death, tracheostomy, or intubation with artificial
ventilation. Survival times were censored at 18 months or at the analysis cut-
off date of 31 December 1994. The principal analysis was based on logrank
tests stratified by mode of disease onset (limb or bulbar). An intention-to-treat
approach was taken, and data from all 959 randomized patients were included.
Each dose group was separately compared with control, with primary attention
focused on the case of the 100 mg group because this dose had already been
found to have a significantly beneficial effect on survival in an earlier phase II
study. Although a trend test was planned in the protocol, this was not
identified as the primary analysis because no pre-trial assumption of linearity
(in log dose; placebo taken to be dose 1) could confidently be made. Indeed,
it was feared that adverse events might cause mortality to be higher on
200 mg than on 100 mg and so not even monotonicity could be taken for
granted.
The trial employed separate sequential “open top” designs[1] for each
pairwise comparison with placebo. Using this method, the logrank statistic, in the
form observed number of failures on control minus expected number and stratified
by mode of onset, was plotted against its null variance at each of a series of five
interim analyses. The null variance is approximately equal to one quarter of the
number of failures and the logrank statistic has expected value equal to the log
hazard ratio multiplied by the null variance. The log hazard ratio is expressed so as
to have positive values if riluzole is superior to control. The open top design has a
single lower boundary, and if the logrank statistic ever falls below this, it can be
concluded that riluzole is significantly worse than control. Otherwise, the trial
proceeds to its planned conclusion and a final analysis that allows for the use of the
lower boundary is conducted.
This mild form of sequential procedure allowed stopping of the trial for
evidence of harm, but not for benefit. The investigators were reluctant to stop
early for benefit in this trial because of the complications of the eventual
multiple comparisons with control, and because of concern that the hazards in
the riluzole groups might converge to that in the control group over time. The
latter feature had already been suggested by the data from the phase II trial of
the 100 mg dose and, if true, it would invalidate the assumption of propor-
tional hazards. Consequently, a sequential design based on proportional hazards
would have an elevated probability of wrongly stopping at an early interim
analysis.
The method described in this paper allows properly for the multiple
comparison issue, but it must be acknowledged that it does depend for its validity
on proportional hazards. Retrospective reanalyses of data from the riluzole trial
will nevertheless be used to illustrate the new methodology.
A REVIEW OF SEQUENTIAL METHODS FOR TWO TREATMENTS
In this paper, the parameterizations and test statistics presented in Ref. [1] will be combined with the error-spending approach of Slud and Wei.[20] In this section, the approach will be reviewed in the simpler setting of two treatments: an experimental (E) and a control (C).

The parameter θ represents the true advantage of E over C (if θ > 0, then E is better than C). For binary data, θ could be a log-odds ratio; for survival data, a log-hazard ratio. The statistic Z is the efficient score statistic for θ based on the available data, while V represents Fisher's (observed) information about θ contained in the data. In the survival case mentioned earlier, Z and V are the logrank statistic and its null variance, respectively. When θ is small, conditionally on the value of V, Z follows the normal distribution with mean θV and variance V. Furthermore, in the cases of binary and survival data, and in many more situations, Z has independent increments between consecutive interim analyses, and values of Z plotted against V at successive interim analyses resemble points on a Brownian motion with drift θ.[21] Note that, in this notation, the variance of Z is not unity.
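This Brownian-motion behaviour can be sketched by simulation. The following Python fragment is an illustration only, not part of the original methodology: the drift value and increment sizes are arbitrary choices, and each increment of Z is drawn as N(θ·ΔV, ΔV).

```python
import random

def simulate_score_path(theta, v_increments, seed=1):
    """Simulate (Z_m, V_m) at successive interim analyses.

    Z has independent increments: each increment is drawn from
    N(theta * dV, dV), so the points (V_m, Z_m) resemble a
    Brownian motion with drift theta when Z is plotted against V.
    """
    rng = random.Random(seed)
    z, v, path = 0.0, 0.0, []
    for dv in v_increments:
        # rng.gauss takes a standard deviation, hence sqrt(dv)
        z += rng.gauss(theta * dv, dv ** 0.5)
        v += dv
        path.append((v, z))
    return path

# five equally spaced looks, hypothetical drift theta = 0.3
path = simulate_score_path(theta=0.3, v_increments=[20] * 5)
```

Averaged over many replicates, the final Z is close to θ·V_M, consistent with the mean θV stated above.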
A sequential procedure consists of a series of interim analyses. At the mth of these, the test statistics, denoted by Z_m and V_m, are calculated, m = 1, ..., M. If Z_m ∉ (ℓ_m, u_m), where ℓ_m and u_m are critical bounds such that ℓ_m < u_m, then the trial will be stopped. Usually the outcome Z_m > u_m corresponds to evidence that E is significantly better than C, while Z_m < ℓ_m corresponds to evidence that E is no better than, or significantly worse than, C. The values ℓ_m and u_m are determined so that the null probability that Z_j > u_j, for any j ≤ m, is equal to a predetermined fraction of the overall one-sided type I error rate α. The critical values ℓ_m and u_m are computed progressively, so that when they are to be found, the values ℓ_j and u_j (j = 1, ..., m − 1) are already known. There remains a constraint to be set, which can be done by requiring that ℓ_m = −u_m or by specifying a second spending function concerning the null probability of Z_j < ℓ_j, for any j ≤ m, m = 1, ..., M.

The power of the procedure is the probability that, for some m, Z_m > u_m for a clinically relevant value θ_R (> 0) of θ. The value of V_M is fixed in order that the pre-specified power is achieved. The power does depend on the sequence of values V_1, V_2, ... actually observed, but it will be fixed for the sequence that is anticipated at the design stage. In fact the power changes little in response to minor departures from the anticipated schedule of V values: the design can be progressively modified in order to preserve power if there is a more substantial departure from the interim analysis schedule. Jennison and Turnbull (see Ref. [2]: Section 7.2.2) discuss the robustness of power to changes in the schedule of looks, and Scharfstein and Tsiatis[22] describe ways of modifying designs to preserve power when departures from the schedule are more substantial.
A SEQUENTIAL DESIGN FOR THE COMPARISON OF TWO EXPERIMENTAL TREATMENTS WITH A CONTROL
The procedure described in this section generalizes the univariate approach
of “A Review of Sequential Methods for Two Treatments,” enabling selection of
the best of several experimental treatments while controlling an overall type I error
rate and power. The sequential aspect of the procedure allows the selection of the
best experimental treatment at any of the planned interim inspections, provided
sufficient information has been gathered. Elimination of an experimental
treatment found to be worse than control is also permitted at any interim inspection
of the data. Thus, both a selection and an elimination process are carried out
simultaneously through this sequential procedure. The procedure developed here
is conducted using the statistics described in the previous section for pairwise
comparisons of treatments with control and derived as efficient for the respective
marginal null hypotheses. This allows the use of familiar test statistics and
becomes natural once treatments have been eliminated. For simplicity, the
procedure is developed for the case of just two experimental treatments and
control, but the principles involved can be extended to more.
Overall Type I Error Rate and Overall Power
An overall type I error rate v is defined as the probability that at least one of the
experimental treatments is found to be sufficiently better than the control under the
overall null hypothesis H0 that all experimental treatments are equivalent to control.
By definition, this type I error rate is one-sided: the probability that is controlled
under H0 is the probability of wrongly selecting one of the experimental treatments.
The word "significantly" and the notation p are avoided here, since conventional levels for these quantities may not be appropriate in this bivariate setting.
The overall power, 1 − λ, is defined as the probability of selecting the best experimental treatment provided the latter is in truth better than control and better than the other experimental treatment. The alternative hypothesis H1 considered here and defined later has already been used by Dunnett,[23] who allows a more general formulation and considers fixed sample sizes, and by Thall, Simon, and Ellenberg,[24] who develop two-stage procedures applied to oncology clinical trials.
Following "A Review of Sequential Methods for Two Treatments," let the parameter θ_i represent the true advantage of experimental treatment E_i over the control (i = 1, 2). The statistics Z_m^(i) compare each experimental treatment E_i with the control at each interim inspection m (m = 1, ..., M): Z_m^(i) is the efficient score statistic for θ_i based on the data available from the control and the ith experimental treatment. The statistics V_m^(i) represent Fisher's (observed) information about θ_i at the mth interim inspection (i = 1, 2; m = 1, ..., M).

At the mth interim inspection, the observed efficient scores Z_m^(i) are compared to the critical bounds c_m√V_m^(i) and d_m√V_m^(i), where c_m and d_m are critical values
with c_m > d_m, common to the two dimensions (i = 1 and 2) and chosen to control the overall type I error rate and overall power. If both observed Z_m^(i) lie within the continuation region (d_m√V_m^(i), c_m√V_m^(i)), then both pairwise comparisons with control are considered again at the (m + 1)th interim inspection. The sequential procedure stops if, for at least one experimental treatment E_i, i = 1, 2, the observed statistic Z_m^(i) is greater than c_m√V_m^(i); in this case, experimental treatment E_i is selected as the best treatment. However, if Z_m^(i) is less than d_m√V_m^(i), then experimental treatment E_i is eliminated and will not be considered at any later inspection. If both experimental treatments are found to be sufficiently better than control, then the selection of the best experimental treatment is made according to some predefined criterion, such as an ordering in Z_m^(i), Z_m^(i)/√V_m^(i), or Z_m^(i)/V_m^(i). If, by the end of the sequential procedure, neither of the experimental treatments has been selected or both have been eliminated, then neither can be regarded as superior to the control. This procedure is described in "A Review of Sequential Methods for Two Treatments" and illustrated in Fig. 1(c) of Ref. [19].
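The stopping rule just described can be summarised in a short sketch. The function below is illustrative only: the treatment labels, the dict-based interface, and the use of Z/√V as the selection criterion (one of the orderings mentioned above) are choices made for this example, not prescriptions of the paper.

```python
from math import sqrt

def interim_decision(z, v, c_m, d_m, active):
    """One interim inspection of the bivariate procedure (sketch).

    z, v: dicts mapping a treatment label i to Z_m^(i) and V_m^(i)
    for the experimental arms still in the trial ('active').
    Arms with Z >= c_m * sqrt(V) are candidates for selection
    (best by Z / sqrt(V)); arms with Z <= d_m * sqrt(V) are
    eliminated. Returns (selected_or_None, arms_still_active).
    """
    crossed = [i for i in active if z[i] >= c_m * sqrt(v[i])]
    if crossed:
        # predefined selection criterion: ordering by Z / sqrt(V)
        best = max(crossed, key=lambda i: z[i] / sqrt(v[i]))
        return best, []
    still = [i for i in active if z[i] > d_m * sqrt(v[i])]
    return None, still

# hypothetical look: arm 2 falls below d_m*sqrt(V) and is eliminated
sel, active = interim_decision(
    z={1: 5.0, 2: -9.0}, v={1: 16.0, 2: 16.0},
    c_m=2.5, d_m=-1.96, active=[1, 2])
# → sel is None, active == [1]: the trial continues with arm 1 only
```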
Computation of the overall type I error rate ω can be illustrated in the simple situation in which only two looks at the data are planned: one interim and one final inspection (M = 2). The overall one-sided type I error rate ω is then the probability that at least one of the two experimental treatments is found to be sufficiently better than control at either look at the data, under the overall null hypothesis H0 that both experimental treatments are equivalent to control: H0: θ1 = θ2 = 0. The continuation region for treatment E_i at the first interim look at the data is defined as:

C_1^(i) = (d_1√V_1^(i), c_1√V_1^(i)), i = 1, 2.

Then:

ω = P(Z_1^(1) ≥ c_1√V_1^(1) and/or Z_1^(2) ≥ c_1√V_1^(2))
  + P[(Z_2^(1) ≥ c_2√V_2^(1) and/or Z_2^(2) ≥ c_2√V_2^(2)) and (Z_1^(1) ∈ C_1^(1) and Z_1^(2) ∈ C_1^(2))]
  + P[Z_2^(1) ≥ c_2√V_2^(1) and (Z_1^(1) ∈ C_1^(1) and Z_1^(2) ≤ d_1√V_1^(2))]
  + P[Z_2^(2) ≥ c_2√V_2^(2) and (Z_1^(2) ∈ C_1^(2) and Z_1^(1) ≤ d_1√V_1^(1))].   (1)
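Eq. (1) can be checked by simulation. The sketch below generates the two score statistics with correlation 0.5 through a shared "control" component and counts false rejections of H0; the critical values c1, c2, d1 and the information levels are hypothetical, and Monte Carlo stands in for the recursive numerical integration used in the paper.

```python
import random
from math import sqrt

def type1_error_mc(c1, c2, d1, v1, v2, n_sim=200_000, seed=1):
    """Monte Carlo estimate of the overall type I error in Eq. (1):
    M = 2 looks, equal information on both comparisons, and
    correlation 0.5 between arms induced by a shared control term.
    All design constants here are hypothetical.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # look 1: Z_1^(i) = sqrt(v1/2) * (W + U_i) has variance v1,
        # and cov across arms v1/2, i.e. correlation 0.5
        w = rng.gauss(0, 1)
        z1 = [sqrt(v1 / 2) * (w + rng.gauss(0, 1)) for _ in range(2)]
        if any(z >= c1 * sqrt(v1) for z in z1):
            hits += 1
            continue
        # arms at or below d1*sqrt(v1) are eliminated at look 1
        alive = [i for i in range(2) if z1[i] > d1 * sqrt(v1)]
        # look 2: independent increments, again correlated 0.5
        dv = v2 - v1
        w2 = rng.gauss(0, 1)
        z2 = {i: z1[i] + sqrt(dv / 2) * (w2 + rng.gauss(0, 1)) for i in alive}
        if any(z2[i] >= c2 * sqrt(v2) for i in alive):
            hits += 1
    return hits / n_sim

omega = type1_error_mc(c1=2.35, c2=2.24, d1=-1.96, v1=50.0, v2=100.0)
```

Tightening the upper bounds c1 and c2 reduces the estimated ω, as expected.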
The alternative hypothesis H1 considered is based on two quantities, δ1 and δ2 (δ2 > δ1 > 0). The value δ1 corresponds to a marginal improvement over the control, whereas δ2 represents a clinically relevant improvement. Suppose that only one of the two experimental treatments achieves a clinically relevant improvement θ_i = δ2 over the control, while the remaining experimental treatment brings only a
marginal improvement δ1 over the control. In this case, it is important that we select the better treatment and that we do not select the other. The power of the procedure, 1 − λ, is defined as the probability of making the correct selection under this alternative hypothesis, denoted by H1. The values δ1 and δ2 thus define an indifference zone. The number of patients per treatment group is calculated to achieve a specified value of overall power.

Computation of the power is also illustrated in the case where the two experimental treatments are considered at two looks only. Here, the alternative hypothesis is H1: θ1 = δ1, θ2 = δ2, and

1 − λ = P(Z_1^(2) ≥ c_1√V_1^(2) and Z_1^(2)/√V_1^(2) ≥ Z_1^(1)/√V_1^(1))
      + P[(Z_2^(2) ≥ c_2√V_2^(2) and Z_2^(2)/√V_2^(2) ≥ Z_2^(1)/√V_2^(1)) and (Z_1^(1) ∈ C_1^(1) and Z_1^(2) ∈ C_1^(2))]
      + P[Z_2^(2) ≥ c_2√V_2^(2) and (Z_1^(2) ∈ C_1^(2) and Z_1^(1) ≤ d_1√V_1^(1))],   (2)

assuming that selection between two good treatments is based on Z/√V. A similar and equally valued power would be achieved for the alternative H1: θ2 = δ1, θ1 = δ2.
Choosing the Stopping Boundaries
For the sequential design considered here, the upper critical values c_m, m = 1, ..., M, are determined through appropriate allocation of the one-sided type I error rate across interim analyses, whereas the lower critical values d_m are set to be constant (d_m = d, m = 1, ..., M). The methodology could be extended to allow different upper critical values for each experimental treatment, and to allow the lower critical values d_m to be fixed using another error-spending function.

In order to obtain the upper critical values c_m, the joint distribution of the two efficient scores Z_m^(i) (m = 1, ..., M) is determined (under H0) using recursive multivariate numerical integration, generalizing the approach of Armitage et al.[25] to the bivariate case. When designing the trial, computations are conducted to determine the power and other properties of the procedure using anticipated values of the information statistics V_m^(i). When allocation to E1 and E2 proceeds at equal rates, it will usually be sufficient to suppose that these statistics will be equal at all interim looks: V_m^(i) = V_m (i = 1, 2 and m = 1, ..., M). An exception is the case of survival data, in which the condition for equality of the V_m^(i) is that the event rates on E1 and E2 are equal, which will occur when the allocation rates are equal and the treatment difference is small.
Since both pairwise comparisons of the experimental treatments are against the same control group, the efficient scores Z_m^(i) are correlated. The correlation between the Z_m^(i) statistics is allowed for in the determination of their joint distribution. It can be shown, in the case of binary or normally distributed observations, that when the experimental treatment groups are of equal size the correlation between Z_m^(1) and Z_m^(2) is approximately 0.5, m = 1, ..., M. For binary data, this approximation is valid when the success probabilities in the three groups are similar, and exact when they are equal. For normally distributed data, the result assumes a common variance in all three treatment groups and neglects the sampling distribution of its estimate. These results follow because, in each case, Z_m^(i) is proportional to S_m^(i) − S_m^(C), where S_m^(i) and S_m^(C) are the sums of the observations in the ith treated group (i = 1, 2) and the control group, respectively. Now var{S_m^(i) − S_m^(C)} ≈ 2 var{S_m^(C)} under the conditions stated earlier, and cov{S_m^(1) − S_m^(C), S_m^(2) − S_m^(C)} = var{S_m^(C)}. The case of survival data is considered in the next section.
Given the critical values c_m and d_m, and planned information ratios V_m/V_M, m = 1, ..., M, where V_M is the information statistic at the last possible inspection, the power of the procedure can be found. A search procedure can then be used to find the value of V_M that achieves the specified power level 1 − λ. Recursive numerical integration can also be used to compute expected final values of V and to deduce the maximum sample size. This process is illustrated for survival data in "Specifications for the Sequential Design."

When interim analyses are conducted, the upper critical values will be recomputed to allow for the actual (possibly unequal) values of V_m^(1) and V_m^(2) and the estimated correlation between Z_m^(1) and Z_m^(2).
Application to Survival Data
For survival data, the parameter ui can be taken to be minus the log of the ratio
of the hazard on Ei to that on C ði ¼ 1; 2Þ: Proportional hazards are assumed so that
the ui do not depend on time. The efficient score Z (i ) is the logrank statistic for
comparing Ei with C and V (i ) is its null variance, where in this section the subscript
m indicating the mth interim analysis has been suppressed for the sake of simplicity.
In order to derive an expression for the correlation between Z (1) and Z (2),
some notation must be introduced. Suppose there are f distinct event times,
tð1Þ , · · · , tðf Þ; at the jth of which o†j events occur, oCj on control and oij on Ei
ði ¼ 1; 2; o†j ¼ oCj þ o1j þ o2j; j ¼ 1; . . . ; f Þ: The total numbers of events on C
and Ei are oC† and oi†, respectively, ði ¼ 1; 2; o†† ¼ oC† þ o1† þ o2†Þ: The
numbers of patients at risk at time t( j ) on C and Ei are rCj and rij, respectively
ði ¼ 1; 2; r†j ¼ rCj þ r1j þ r2j; j ¼ 1; . . . ; f Þ:In this notation, the familiar expressions for the pairwise logrank statistics
and their null variances become
Z ðiÞ ¼ oC† 2Xf
j¼ 1
ðoCj þ oijÞrCj
ðrCj þ rijÞð3Þ
and
V ðiÞ ¼Xf
j¼ 1
ðoCj þ oijÞðrCj þ rij 2 oCj 2 oijÞrCjrij
ðrCj þ rijÞ2ðrCj þ rij 2 1Þ
; i ¼ 1; 2: ð4Þ
The covariance between Z (1) and Z (2) is obtained in Appendix 1 using the joint
hypergeometric distribution of oCj, o1j, and o2j as
covðZ ð1Þ;Z ð2ÞÞ ¼Xf
j¼1
ðoCj þo1j þo2jÞðrCj þ r1j þ r2j 2oCj 2o1j 2o2jÞrCjr1jr2j
ðrCj þ r1jÞðrCj þ r2jÞðrCj þ r1j þ r2jÞðrCj þ r1j þ r2j 21Þ:
ð5Þ
Finally, the correlation r can be found as cov(Z (1), Z (2))/p
(V (1)V (2)).
In the neighborhood of the global null hypothesis, with equal allocation to treatments, $r_{Cj} \approx r_{1j} \approx r_{2j} \approx r_{\bullet j}/3$ ($j = 1, \ldots, f$) and $o_{C\bullet} \approx o_{1\bullet} \approx o_{2\bullet} \approx o_{\bullet\bullet}/3$, so that $V^{(i)} \approx (o_{C\bullet} + o_{i\bullet})/4 \approx o_{\bullet\bullet}/6$ ($i = 1, 2$) and $\mathrm{cov}(Z^{(1)}, Z^{(2)}) \approx o_{\bullet\bullet}/12 \approx V^{(1)}/2 \approx V^{(2)}/2$. When designing the trial, it will be supposed that the correlation $\rho$ is equal to 0.5, whereas when monitoring the trial, Eqs. (4) and (5) will be used to obtain more accurate estimates at each interim analysis.
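As a concrete illustration of Eqs. (3)-(5), the following sketch (our addition, not part of the original paper; the data layout and function names are invented for illustration) computes the pairwise logrank statistics, their null variances, and the resulting correlation from per-event-time counts:

```python
import math

def logrank_Z_V(oC, oE, rC, rE):
    """Pairwise logrank statistic (Eq. 3) and null variance (Eq. 4).

    oC[j], oE[j]: events on control and on the experimental arm at the
    jth distinct event time; rC[j], rE[j]: numbers then at risk.
    """
    Z = sum(oC) - sum((oc + oe) * rc / (rc + re)
                      for oc, oe, rc, re in zip(oC, oE, rC, rE))
    V = sum((oc + oe) * (rc + re - oc - oe) * rc * re
            / ((rc + re) ** 2 * (rc + re - 1))
            for oc, oe, rc, re in zip(oC, oE, rC, rE))
    return Z, V

def cov_Z1_Z2(oC, o1, o2, rC, r1, r2):
    """Covariance between the two logrank statistics (Eq. 5)."""
    return sum((oc + oa + ob) * (rc + ra + rb - oc - oa - ob) * rc * ra * rb
               / ((rc + ra) * (rc + rb) * (rc + ra + rb)
                  * (rc + ra + rb - 1))
               for oc, oa, ob, rc, ra, rb in zip(oC, o1, o2, rC, r1, r2))

# Toy data: three event times, one event each, 30 patients per arm at entry.
oC, o1, o2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
rC, r1, r2 = [30, 29, 29], [30, 30, 29], [30, 30, 30]
_, V1 = logrank_Z_V(oC, o1, rC, r1)
_, V2 = logrank_Z_V(oC, o2, rC, r2)
rho = cov_Z1_Z2(oC, o1, o2, rC, r1, r2) / math.sqrt(V1 * V2)
```

With equal allocation and events sparse relative to the numbers at risk, the estimated correlation comes out close to the design value of 0.5, in line with the approximation above.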
RETROSPECTIVE ANALYSIS OF THE ALS/RILUZOLE STUDY
In this section, a sequential procedure is retrospectively designed and
conducted for the ALS/riluzole study. Only two of the experimental treatments, the 100 mg daily riluzole dose (E1) and the 200 mg daily riluzole dose (E2), are compared with placebo, in order to keep the computational aspects of the problem relatively simple.
Specifications for the Sequential Design
As in the original study,[9] five interim inspections of the data are planned (M = 5), spaced equally in terms of information. The overall one-sided type I error rate v is set equal to 0.05. A linear spending function is adopted for v: if vm is the probability of falsely finding at least one of the two experimental treatments sufficiently better than control at or before the mth interim look, then vm should equal (vVm)/V5, m = 1, ..., 5. The probability of falsely rejecting H0 at the mth interim inspection (and not before) is denoted by hm, so that vm = h1 + ... + hm.

The upper critical value cm is found so that the type I error requirement vm is met, taking the correlation between $Z_m^{(1)}$ and $Z_m^{(2)}$ to be 0.5. The critical value d for elimination was chosen to be -1.96 for each interim look. This familiar, but admittedly rather arbitrary, critical value was chosen to allow elimination to occur before too many adverse data could be observed.
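To see what the first-look boundary represents, a quick Monte Carlo check (our addition; the paper's boundaries were obtained by numerical integration, not simulation) estimates the probability that at least one of two standard normal statistics with correlation 0.5 exceeds c1 = 2.5578, which should be close to the planned spend v1 = 0.01:

```python
import math, random

random.seed(0)
rho, c1 = 0.5, 2.5578      # design correlation and first-look boundary
n = 500_000
a = math.sqrt(1 - rho ** 2)
hits = 0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    z1 = x
    z2 = rho * x + a * random.gauss(0.0, 1.0)   # corr(z1, z2) = 0.5
    if max(z1, z2) > c1:                        # at least one arm crosses
        hits += 1
p = hits / n   # should come out close to the planned spend v1 = 0.01
```

At later looks the calculation is recursive over the joint continuation region, which is why the authors needed a generalization of the Armitage-McPherson-Rowe algorithm rather than a one-shot computation like this.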
The alternative hypothesis H1 was defined by setting d1 = 0.20 and d2 = 0.43. The latter figure is the estimate of the treatment effect from the Cox model for the primary comparison (100 mg/day riluzole) observed in the original study. This estimate is used in this illustration to ensure sufficient power for the statistical procedure proposed here. These choices seemed appropriate in the context of the ALS/riluzole study, where no effective treatment was available; d1 might be set closer to d2 in other situations. The overall power 1 - l was chosen to be 0.90. Table 1 shows the critical values and required amounts of information for this design, computed to fulfill this power requirement using extensions of Eqs. (1) and (2) to five looks at the data. The corresponding maximum pairwise Fisher's information statistic, V5, was found to be 92.27, as shown in Table 1. Multiplying by 6 (as $V^{(i)} \approx o_{\bullet\bullet}/6$; see "Application to Survival Data") leads to a predicted maximum of 554 events. The maximum number of patients required can then be obtained by dividing the total number of events required by the probability of failing for any study patient (following the approach of Freedman[26]). The latter probability was anticipated to be 0.48 based upon results of the earlier phase II study, and the corresponding maximum number of patients Nmax is 1153. Expressions similar to those in Eqs. (1) and (2) can be derived for the probabilities of elimination and selection at each of the interim analyses. From these, the expected number of events while on study can be deduced. Under the alternative hypothesis, this is equal to 264 events. Of course, patients already recruited may experience events after elimination of their treatment. Furthermore, the expected number of interim analyses can be found to be 2.38. From details of the anticipated recruitment pattern, the corresponding expected trial duration could also be found.

Sample size is driven by the detection of a clinically relevant difference vs. control. However, the position of the different experimental treatments relative to control can affect the calculations dramatically. For comparison, the maximum pairwise Fisher's information for the power requirement d1 = 0 and d2 = 0.43 was found to be 75.41. This is relatively close to the value of 92.27 for d1 = 0.2 and d2 = 0.43 used in the example. However, when d1 = 0.3 and d2 = 0.43, the maximum pairwise Fisher's information required rises to 529.86. Thus, if d1 is set close to d2 in a trial, the sample size required can increase substantially.
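The arithmetic linking V5 to the maximum numbers of events and patients can be retraced in a few lines; the exact rounding conventions here are our assumption:

```python
import math

V5 = 92.27                    # maximum pairwise Fisher's information (Table 1)
events_max = 6 * V5           # V ≈ o../6 near H0, so events ≈ 6·V5 = 553.62
p_event = 0.48                # anticipated probability that a patient fails
N_max = events_max / p_event  # patients needed to yield that many events

print(math.ceil(events_max), round(N_max))   # 554 events, 1153 patients
```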
Table 1. Sequential Design for the ALS/Riluzole Study with Linear Spending Function, v = 0.05 and 1 - l = 0.90; Nmax = 1153; E(Number of Events) = 264; E(Number of Interim Analyses) = 2.38

m    vm      hm      cm        d        Vm
1    0.01    0.01    2.5578    -1.96    18.45
2    0.02    0.01    2.4642    -1.96    36.91
3    0.03    0.01    2.3744    -1.96    55.36
4    0.04    0.01    2.2944    -1.96    73.81
5    0.05    0.01    2.2226    -1.96    92.27
Monitoring the Study
At the first interim analysis, the observed logrank statistics $z_1^{(1)}$, $z_1^{(2)}$ and their variances $v_1^{(1)}$, $v_1^{(2)}$ were calculated. The values of $v_1^{(1)}$ and $v_1^{(2)}$ were 2.46 and 2.64, respectively, and the correlation between the two efficient scores was now estimated as $\rho_1 = 0.437$ using Eq. (5). These values differ from those planned and listed in Table 1; in particular, the V values are much smaller than anticipated because of slow recruitment. As a result, the critical value c1 was recomputed allowing for $\rho_1$, $v_1^{(1)}$, and $v_1^{(2)}$, maintaining the probability of falsely rejecting the null hypothesis H0 at the first interim look at v1 = 0.01. The revised critical bound c1 was found to be 2.5624. Each of the two test statistics $z_1^{(1)}$ and $z_1^{(2)}$ was then compared with the corresponding critical bounds $d\sqrt{v_1^{(i)}}$ and $c_1\sqrt{v_1^{(i)}}$ ($i = 1, 2$). Since both statistics were in the continuation region, i.e., $d\sqrt{v_1^{(1)}} < z_1^{(1)} < c_1\sqrt{v_1^{(1)}}$ and $d\sqrt{v_1^{(2)}} < z_1^{(2)} < c_1\sqrt{v_1^{(2)}}$ (see Table 2), the trial progressed to the second interim look with all three treatment groups.
At the second look, the correlation $\rho_2$ between $Z_2^{(1)}$ and $Z_2^{(2)}$ was re-evaluated using the additional observations in the three treatment groups and was found to be 0.457. The actual probability $h_1^{[2]}$ of falsely rejecting the null hypothesis at the first interim inspection, given $\rho_2$, was recomputed as 0.009962. The critical value c2 such that v2 was equal to 0.02 was then based on a recalculated probability of falsely rejecting the null hypothesis at the second interim look (and not before) of 0.010038 ($= 0.02 - h_1^{[2]}$); c2 was found to be 2.5204. Again, $z_2^{(1)}$ and $z_2^{(2)}$ were compared with their respective operative boundaries; as $d\sqrt{v_2^{(1)}} < z_2^{(1)} < c_2\sqrt{v_2^{(1)}}$ and $d\sqrt{v_2^{(2)}} < z_2^{(2)} < c_2\sqrt{v_2^{(2)}}$, the trial proceeded to the third interim inspection with C, E1, and E2. Notice that only future critical values are recomputed; those already used are not changed, although the effect of using them is reassessed.
At the third interim look, much more data had accumulated, and the resulting test statistics are shown in Table 2. The probabilities of falsely rejecting H0 at the first two looks, given $\rho_3 = 0.478$, were found to be $h_1^{[3]} = 0.009919$ and $h_2^{[3]} = 0.009989$, leaving $0.03 - h_1^{[3]} - h_2^{[3]} = 0.010092$ to be spent at the third look. It follows that c3 = 2.4878, and since $z_3^{(1)} > c_3\sqrt{v_3^{(1)}}$, monitoring stopped with the selection of E1 (the 100 mg daily dose of riluzole) as the best treatment.
Table 2. Monitoring of the Case Study Comparing Placebo, 100, and 200 mg Riluzole Daily

m   Ei   ρm      z_m^(i)   v_m^(i)   cm       d·√v_m^(i)   cm·√v_m^(i)   h_j^[m], j ≤ m      vm
1   E1   0.437     0.56      2.46    2.5624     -3.07          4.02                           0.01
    E2             -0.18     2.64               -3.18          4.16
2   E1   0.457     1.78      9.58    2.5204     -6.07          7.80       0.009962 (j = 1)    0.02
    E2              2.14    12.10               -6.82          8.77       0.010038 (j = 2)
3   E1   0.478    14.41     33.15    2.4878    -11.28         14.32       0.009919 (j = 1)    0.03
    E2             11.29    35.13              -11.62         14.74       0.009989 (j = 2)
                                                                          0.010092 (j = 3)
Treatment E2 was not found to be sufficiently better than control at this third
interim look, but was not eliminated for being worse.
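The decisions at the third look can be verified directly from the figures in Table 2. The following check is our addition and simply re-applies the selection/elimination rule to the published numbers:

```python
import math

c3, d = 2.4878, -1.96                                  # third-look bounds
stats = {"E1": (14.41, 33.15), "E2": (11.29, 35.13)}   # (z, v) from Table 2

for arm, (z, v) in stats.items():
    upper = c3 * math.sqrt(v)    # selection boundary  c_m * sqrt(v)
    lower = d * math.sqrt(v)     # elimination boundary d * sqrt(v)
    if z > upper:
        decision = "select"      # E1: 14.41 exceeds 14.32
    elif z < lower:
        decision = "eliminate"
    else:
        decision = "continue"    # E2: -11.62 < 11.29 < 14.74
    print(arm, decision)
```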
DISCUSSION
In this paper, a procedure has been described for comparing two
experimental treatments with a control while maintaining specified values for
the risks of type I and type II error. Calculation of a precise p-value, and of point and interval estimates for the magnitude of the advantage of the chosen treatment (and of the discarded treatment) over control, remains an open problem.
This paper shows how a sequential procedure can be constructed so that the
upper critical values cm satisfy a prespecified error spending function, and the
lower critical values dm are either imposed or derived from a second spending
function. It is possible to compute the expected number of interim analyses for any
given design under any pair of treatment differences u1 and u2, and such calcula-
tions can provide the basis for choice between rival designs. The identification of
optimal designs, and indeed the identification of suitable optimality criteria,
remains to be explored. This paper has also shown how the design can be revised at successive interim analyses to allow for the emerging pattern of increases in information and for the evolving value of the correlation between test statistics.
Here, the error spending approach of Slud and Wei[20] has been used for
simplicity. The better-known α-spending approach of Lan and DeMets[27] is an
attractive alternative, as it allows the proportion of the error rate to be spent at any
look to be determined by the proportion of the maximum information collected so
far, rather than just by the look number. In the example in “Retrospective Analysis
of the ALS/Riluzole Study,” the amount of information available at the first look
was considerably less than anticipated, and it would have been desirable to spend
less than the 0.01 of the error rate at that stage. However, information is
accumulated on two endpoints, possibly at different rates, and it is not
straightforward to generalize the concept of information fraction to the bivariate
case.
Computation for the methodology introduced here, via a generalization of the algorithm of Armitage, McPherson, and Rowe,[25] is rather intensive and slow.
An executable program to perform the five-look design illustrated here is available
from the authors. The principles of the methodology extend easily to the case of
more than two experimental treatments, but the computational difficulties multiply
and become the limiting factor in such a generalization.
An alternative approach has been described by Stallard and Todd[28]
following the methods of Thall et al.[24] They allow selection of a single experi-
mental treatment from several competitors to be made once, at the first interim
analysis. The selected treatment then continues in one or more comparisons with
control. This simpler procedure allows more than two experimental treatments to
be considered, and a p-value to be computed, but its structure is rather inflexible.
Sequential methodology for comparing more than two treatments is
presently at an early stage of development, comparable with that of designs for
two treatments 25 years ago. The problem is a special case of multivariate
sequential analysis that includes the simultaneous monitoring of multiple
endpoints for each patient, and the repeated conduct of both treatment
comparisons and goodness-of-fit assessments at consecutive interim analyses.
The need for such methods is becoming pressing as imaginative sequential
procedures are being attempted in practice, and it is hoped that this paper will
stimulate further research into suitable methodology.
APPENDIX 1: DERIVATION OF THE COVARIANCE BETWEEN TWO LOGRANK STATISTICS
Consider a clinical trial in which patients are randomized between two active
treatments E1 and E2 and a control treatment C. Each patient is followed up from
the time of randomization until the occurrence of some event. Suppose that r
patients survive without experiencing the event for time t or longer past the time of
their randomization, of whom o experience the event at time t precisely. The
experiences of these r patients at time t can be summarized as below.
                      E1         E2         C          Total
Event at time t       o1         o2         oC         o
Event at time > t     r1 - o1    r2 - o2    rC - oC    r - o
Event at time ≥ t     r1         r2         rC         r
Conditioning on the margins of the table, the probability that the $o$ events are distributed as shown is

$$ P(o_1, o_2, o_C) = \frac{\binom{r_1}{o_1}\binom{r_2}{o_2}\binom{r_C}{o_C}}{\binom{r}{o}}, \qquad (A1) $$

for all values of $o_1$, $o_2$, $o_C$ consistent with the margins. Consequently, with the same conditioning,
$$ E(o_1) = \sum o_1 \frac{\binom{r_1}{o_1}\binom{r_2}{o_2}\binom{r_C}{o_C}}{\binom{r}{o}} = \frac{r_1 o}{r} \sum \frac{\binom{r_1 - 1}{o_1 - 1}\binom{r_2}{o_2}\binom{r_C}{o_C}}{\binom{r - 1}{o - 1}} = \frac{r_1 o}{r}, \qquad (A2) $$

where the sums are over all possible values of $o_1$, $o_2$, $o_C$ consistent with the margins. Similarly,
$$ E\{o_1(o_1 - 1)\} = \sum o_1(o_1 - 1) \frac{\binom{r_1}{o_1}\binom{r_2}{o_2}\binom{r_C}{o_C}}{\binom{r}{o}} = \frac{r_1(r_1 - 1)\, o(o - 1)}{r(r - 1)}, \qquad (A3) $$

so that

$$ E(o_1^2) = \frac{r_1(r_1 - 1)\, o(o - 1)}{r(r - 1)} + \frac{r_1 o}{r}, \qquad (A4) $$

and

$$ E(o_1 o_2) = \sum o_1 o_2 \frac{\binom{r_1}{o_1}\binom{r_2}{o_2}\binom{r_C}{o_C}}{\binom{r}{o}} = \frac{r_1 r_2\, o(o - 1)}{r(r - 1)}. \qquad (A5) $$
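The closed forms (A2) and (A5) can be verified numerically by enumerating the conditional distribution (A1) for a small table; this check is our addition, with arbitrary margins:

```python
from math import comb

# Hypothetical margins: numbers at risk per arm and total events o.
r1, r2, rC = 5, 4, 6
r = r1 + r2 + rC
o = 3

total = comb(r, o)
E_o1 = E_o1o2 = 0.0
for o1 in range(0, min(r1, o) + 1):
    for o2 in range(0, min(r2, o - o1) + 1):
        oC = o - o1 - o2
        if oC > rC:
            continue
        # Eq. (A1): conditional (multivariate hypergeometric) probability
        p = comb(r1, o1) * comb(r2, o2) * comb(rC, oC) / total
        E_o1 += o1 * p
        E_o1o2 += o1 * o2 * p

assert abs(E_o1 - r1 * o / r) < 1e-12                               # (A2)
assert abs(E_o1o2 - r1 * r2 * o * (o - 1) / (r * (r - 1))) < 1e-12  # (A5)
```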
Similar expressions apply for the expectations of squares and products of the other observed numbers of events at time $t$. Now, as $E(Z^{(1)}) = E(Z^{(2)}) = 0$,

$$ \mathrm{cov}(Z^{(1)}, Z^{(2)}) = E\left\{ \left( \frac{r_1 o_C - r_C o_1}{r_1 + r_C} \right) \left( \frac{r_2 o_C - r_C o_2}{r_2 + r_C} \right) \right\}. $$

Expanding the product within the braces and applying equations of the type (A4) and (A5) yields

$$ \mathrm{cov}(Z^{(1)}, Z^{(2)}) = \frac{o(r - o)\, r_1 r_2 r_C}{r(r - 1)(r_1 + r_C)(r_2 + r_C)}. \qquad (A6) $$

Summing such terms over similar $2 \times 3$ tables for each event time leads to Eq. (5) in "A Sequential Design for the Comparison of Two Experimental Treatments with a Control."
Notice that, when Eq. (A6) is applied to find $\mathrm{cov}(Z^{(1)}, Z^{(1)})$, the result is not the same as $V^{(1)}$ given by Eq. (4) and used as the variance of $Z^{(1)}$ in the logrank test. Equation (4) follows from an argument similar to that above, applied to $2 \times 2$ tables concerning only the data from E1 and C. Here we retain the standard form of the logrank test based on Eqs. (3) and (4) because these statistics are familiar, and because they allow a direct comparison between any two treatments without influence from data concerning the third and without assumptions concerning the third. It would be straightforward to derive and use test statistics $Z^{(1)}$ and $V^{(1)}$ from the series of $2 \times 3$ tables instead.
ACKNOWLEDGMENT
During the development of this research, the first author was in receipt of a grant
from the University of Reading Research Endowment Trust Fund.
REFERENCES
1. Whitehead, J. The Design and Analysis of Sequential Clinical Trials, Revised 2nd
Ed.; Wiley: Chichester, 1997.
2. Jennison, C.; Turnbull, B.W. Group Sequential Methods with Applications to Clinical
Trials; Chapman and Hall/CRC: Boca Raton, USA, 2000.
3. Whitehead, J.; Thomas, P. A Sequential Trial of Pain Killers in Arthritis: Issues of
Multiple Comparisons with Control and of Interval-Censored Survival Data.
J. Biopharm. Stat. 1997, 7, 333–353.
4. Johnson, C.D.; Puntis, M.; Davidson, N.; Todd, S.; Bryce, R. A Randomised, Dose-
Finding Phase III Study of Lithium Gamolenate (LiGLA) in Advanced Pancreatic
Adenocarcinoma. Br. J. Surg. 2001, 88(5), 662–668.
5. Paulson, E. A Sequential Procedure for Comparing Several Experimental Categories
with a Standard or Control. Ann. Math. Stat. 1962, 33, 438–443.
6. Follmann, D.A.; Proschan, M.A.; Geller, N.L. Monitoring Pairwise Comparisons in
Multi-Armed Clinical Trials. Biometrics 1994, 50, 325–336.
7. Proschan, M.A.; Follmann, D.A.; Geller, N.L. Monitoring Multi-Armed Trials. Stat.
Med. 1994, 13, 1441–1452.
8. Tang, D.-I.; Geller, N.L. Closed Testing Procedures for Group Sequential Clinical
Trials with Multiple Endpoints. Biometrics 1999, 55, 1188–1192.
9. Lacomblez, L.; Bensimon, G.; Leigh, P.N.; Guillet, P.; Meininger, V. Dose-Ranging
Study of Riluzole in Amyotrophic Lateral Sclerosis. Lancet 1996, 347, 1425–1431.
10. Paulson, E. A Sequential Procedure for Selecting the Population with the Largest
Mean from k Normal Populations. Ann. Math. Stat. 1964, 35, 174–180.
11. Siegmund, D. Sequential χ2 and F Tests and the Related Confidence Intervals.
Biometrika 1980, 67, 389–402.
12. Jennison, C.; Turnbull, B.W. Exact Calculations for Sequential t, χ2 and F Tests.
Biometrika 1991, 78, 133–141.
13. Siegmund, D. A Sequential Clinical Trial for Comparing Three Treatments. Ann.
Stat. 1993, 21, 464–483.
VINCENT, TODD, AND WHITEHEAD264
©2002 Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc.
MARCEL DEKKER, INC. • 270 MADISON AVENUE • NEW YORK, NY 10016
Dow
nloa
ded
by [
Uni
vers
ity o
f C
alif
orni
a Sa
nta
Cru
z] a
t 22:
28 2
4 O
ctob
er 2
014
14. Betensky, R.A. Sequential Analysis of Censored Survival Data from Three
Treatment Groups. Biometrics 1997, 53, 807–822.
15. Armitage, P. Tests for Linear Trends in Proportions and Frequencies. Biometrics
1955, 11, 375–386.
16. Jennison, C.; Turnbull, B.W. Group Sequential Tests for Bivariate Response: Interim
Analyses of Clinical Trials with Both Efficacy and Safety Endpoints. Biometrics
1993, 49, 741–752.
17. Cook, J.C.; Farewell, V.T. Guidelines for Monitoring Efficacy and Toxicity
Responses in Clinical Trials. Biometrics 1994, 50, 1146–1152.
18. Todd, S. Sequential Designs for Monitoring Two Endpoints in a Clinical Trial. Drug
Inf. J. 1999, 33, 417–426.
19. Todd, S. A Flexible Information-Based Approach to the Design and Interim
Monitoring of Bivariate Group Sequential Trials, School of Applied Statistics
Technical Report 02/1; The University of Reading.
20. Slud, E.V.; Wei, L.J. Two-Sample Repeated Significance Tests Based on the
Modified Wilcoxon Statistic. J. Am. Stat. Assoc. 1982, 77, 862–868.
21. Scharfstein, D.O.; Tsiatis, A.A.; Robins, J.M. Semiparametric Efficiency and Its
Implication on the Design and Analysis of Group Sequential Studies. J. Am. Stat.
Assoc. 1997, 92, 1342–1350.
22. Scharfstein, D.O.; Tsiatis, A.A. The Use of Simulation and Bootstrap in Information-
Based Group Sequential Studies. Stat. Med. 1998, 17, 75–87.
23. Dunnett, C.W. Selection of the Best Treatment in Comparison to a Control with an
Application to a Medical Trial. In Design of Experiments: Ranking and Selection;
Santner, T.J., Tamhane, A.C., Eds.; Marcel Dekker: New York, 1984; 47–66.
24. Thall, P.F.; Simon, R.; Ellenberg, S.S. A Two-Stage Design for Choosing Among
Several Experimental Treatments and a Control in Clinical Trials. Biometrics 1989,
45, 537–547.
25. Armitage, P.; McPherson, C.K.; Rowe, B.C. Repeated Significance Tests on
Accumulating Data. J. R. Stat. Soc. Ser. A. 1969, 132, 235–244.
26. Freedman, L.S. Tables of the Number of Patients Required in Clinical Trials Using
the Logrank Test. Stat. Med. 1982, 1, 121–129.
27. Lan, K.K.G.; DeMets, D.L. Discrete Sequential Boundaries for Clinical Trials.
Biometrika 1983, 70, 659–663.
28. Stallard, N.; Todd, S. Sequential Designs for Phase III Clinical Trials Incorporating
Treatment Selection, 2002, accepted.
Received November 2001
Revised May 2002
Accepted June 2002