

STATISTICS IN MEDICINE, VOL. 13, 1459-1469 (1994)

EARLY STOPPING RULES - CLINICAL PERSPECTIVES AND ETHICAL CONSIDERATIONS

M. BAUM*†, J. HOUGHTON† AND K. ABRAMS†‡

*Department of Surgery, Royal Marsden Hospital, Fulham Road, London SW3 6JJ, U.K.

†Cancer Research Campaign Clinical Trials Centre, King’s College School of Medicine and Dentistry, Rayne Institute, 123 Coldharbour Lane, London SE5 9NU, U.K.

‡Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, U.K.

SUMMARY A clinical trial should only be launched in the presence of equipoise both amongst clinicians responsible for treating the disease and their patients. However, during the period of patient recruitment the chances are that the levels of equipoise will modify. In some cases, where toxicity or inferiority of efficacy are readily demonstrable, the trial needs to be stopped prematurely to prevent harm to patients. In other cases, the degree of equipoise may be increased by the very existence of the trial or from conflicting evidence reported from other studies. By reference to four specific trials it is argued that instead of definitive rules, the decision as to whether recruitment should be continued must be the role of a Data Monitoring Committee which is able to consider and respond to all the available evidence.

INTRODUCTION

Sound science is a necessary but not sufficient condition for sound ethics in clinical research. A randomized controlled trial is the expression of the scientific method in clinical research and appropriate statistical methodology is essential at the design stage or as one humorist put it, ‘the statistician should be the obstetrician of the clinical trial, not its morbid anatomist’. Having agreed on the ethical imperative to launch a trial it is unethical to close it prematurely without good reason, as emphasized by Sir Austin Bradford-Hill. The ethical prerequisites for launching a trial are well rehearsed, yet the ethical conditions for the premature closing of this trial are still uncertain and are the very justification for this three-day conference.

ETHICAL PRECONDITIONS FOR LAUNCHING A TRIAL

Before launching a randomized controlled trial there has to be a defined area of uncertainty amongst the medical profession. This is sometimes referred to as ‘group’ or ‘collective equipoise’. For an individual clinician to enter a patient into the trial he is expected to express an individual uncertainty or degree of equipoise for each patient he sees or for any predetermined group of patients. It has been argued that since it is extremely unlikely that an individual clinician will express the same general degree of uncertainty for all his patients as has been identified collectively, all randomized controlled trials are unethical. However, this completely overlooks the point that there can be no collective uncertainty unless there is variability amongst the uncertainties expressed by the individuals who make up the group. Provided that the main question to be addressed within the trial lies within the extremes of this variability of uncertainty, the collective uncertainty can be addressed, assuming that the entry criteria are sufficiently pragmatic.

CCC 0277-6715/94/141459-11 © 1994 by John Wiley & Sons, Ltd.

The next precondition for launching the trial is to have available a promising new treatment, or two competing therapies both of which are in current use, where a worthwhile difference in outcome might be anticipated. These differences in outcome could be expressed in improvement of quality of life, prolongation of life or in a similar outcome but with reduced toxicity or expenditure. Having defined the minimum difference that would be worthwhile to detect, it is then necessary to make the power calculations so that there is a reasonable chance - say 90 per cent power - that the clinical trial will demonstrate a statistically significant result that might be expected to hold, at least approximately in both direction and size of effect, when generalized to the real world of clinical practice.

Having met these preconditions and satisfied the needs of multiple ethics committees, it would be judged unethical to prolong the study when unequivocal evidence of benefit or harm is detected or, more subtly, when the confidence intervals around the point estimate discovered at a predetermined inspection interval suggest that there is little chance of demonstrating a worthwhile benefit that justifies either the cost or the toxicity of the intervention. (We note with irony that although hospital ethics committees are very concerned about the starting of a clinical trial, they seem to take little interest in monitoring the results or questioning the ethics of either premature stopping or post-mature recruitment.)

Of course, we appreciate the danger of too many unauthorized ‘peeks’ at the data which may lead to the premature abandonment of the trial, yet without some form of data monitoring on the way to achieving the target sample, important and unpredicted results might be missed. For example, the benefit might be significantly greater than predicted and thus appear following fewer events; the toxicity might be greater than predicted; or there might be the suggestion of a marginal gain too small to justify the continuation of the trial to the target sample size. It is for these reasons that each major randomized controlled trial is now associated with an independent Data Monitoring Committee which has the awesome ethical role of protecting the public from the zeal of the trialist. These Data Monitoring Committees have, therefore, adopted the role with which the hospital ethics committees have neither the stomach nor the necessary statistical skills to cope.
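The danger of repeated ‘peeks’ can be illustrated with a small simulation. The sketch below is not drawn from any of the trials discussed here. It assumes a trial with no true treatment difference, a normally distributed test statistic, equally spaced looks, and a naive rule that declares significance the first time |Z| exceeds 1.96; the function name and its parameters are illustrative only.

```python
import math
import random


def false_positive_rate(n_looks, n_trials=20000, z_crit=1.96, seed=1):
    """Estimate how often a trial with NO true effect crosses |Z| > z_crit
    at any of n_looks equally spaced, unadjusted interim analyses."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one block of data; under the null hypothesis the
            # block contributes a standard normal increment to the running sum.
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / math.sqrt(look)
            if abs(z) > z_crit:
                crossings += 1
                break  # the naive rule stops at the first "significant" look
    return crossings / n_trials


if __name__ == "__main__":
    for looks in (1, 2, 5, 10):
        print(looks, round(false_positive_rate(looks), 3))
```

Under these assumptions a single analysis keeps the false positive rate near the nominal 5 per cent, whereas five or ten unadjusted looks push it well above that level. This is the statistical reason for confining interim inspection to a monitoring committee rather than leaving it to the trialist.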

In addition it is also the role of the Data Monitoring Committee together with that of the Steering Committee to monitor the literature and to make sure the trial is not overtaken by events and to include unpublished individual patient data to aid formal meta-analysis, which in itself may provide grounds for early stopping of active recruitment.

HAZARDS OF PREMATURE STOPPING

As with the ethical imperatives of launching a clinical trial, the hazards of premature stopping have been well rehearsed. For example, the famous ‘double blind’ clinical trial investigating the objective efficacy of prayer might have been stopped after the first six patients’ results were available, all of which showed an advantage to being the unwitting recipient of unsolicited prayer.1 However, five of the next six showed an advantage to the control group and ultimately no significant difference emerged. At least one can say of this ‘therapy’ that there was no possibility of toxic side effects, yet imagine if the treatments had been some form of mutilating surgery or toxic chemotherapy! At a more serious level, however, we can use the example of the U.S. Physicians’ Health Study, which concerned the primary prevention of myocardial infarction with the use of a regular low dose of aspirin.2 The study was designed to look at total mortality, yet the Data Monitoring Committee recommended stopping it prematurely because there was a significant reduction in non-fatal myocardial infarction (MI) with a non-significant adverse effect on stroke. As a result it still remains unclear what the net effect of anti-platelet therapy would be as primary prevention in low risk individuals.

This then leads to a broader discussion of the danger of surrogate rather than real endpoints and condition specific versus all-cause mortality. Real endpoints in medicine are simple to define and involve length of life and quality of life. The first is easy to measure and the second more difficult. Important examples of the danger of using surrogate endpoints would include clot-lysis in the tissue plasminogen activator (TPA) versus streptokinase trial. Early data suggested clot-lysis was significantly better with TPA, yet the more mature follow-up of these studies showed that streptokinase was equally effective but infinitely cheaper.3 As far as breast cancer is concerned, because of its prolonged natural history many trialists are too impatient to wait for the results of overall survival and satisfy themselves with disease-free survival. Aggressive chemotherapy regimens are very effective in demonstrating prolongation of early disease-free survival as against the non-toxic endocrine approaches, and this led to many American oncologists adopting adjuvant chemotherapy prematurely, whereas the more mature follow-up of these trials using death as the endpoint showed that tamoxifen, a very non-toxic agent, produced twice the relative risk reduction for mortality at ten years amongst unselected post-menopausal women when compared with aggressive chemotherapy regimens.4

Coming next to the knotty problem of condition specific versus all-cause mortality, we have to accept that with some of the less common causes of death it would be almost impossible to collect a sufficient sample size to demonstrate an impact on all-cause mortality. For example, trials of mammographic screening for breast cancer use cause-specific mortality as the major endpoint. Breast cancer is the commonest female malignancy and the commonest cause of death for young women but it is still relatively rare compared with cardiovascular disease in older women. That is why proponents of screening satisfy themselves with a reduction in cause-specific mortality which reaches significant levels in the age group 50-70 while ignoring the fact that this fails to translate into an all-cause mortality reduction.5 In ignoring this possibility there is the risk of ignoring a small number of treatment related deaths that could completely counter-balance the reduction in cause-specific deaths. The same might apply in trials of cholesterol lowering agents even though ischaemic heart disease is such a common cause of death. These trials could easily have been stopped prematurely when a reduction in the risk of dying from ischaemic heart disease was demonstrated, yet when pursued to completion they have shown a surprising and counter-intuitive result, that the reduction in risk of death from myocardial infarction is almost completely offset by the increase in deaths from colonic cancer and suicide.6

HAZARDS OF POST-MATURE RECRUITMENT INTO TRIALS

Unlike the hazards of premature stopping, the dangers of doggedly sticking to rules whereby analysis of the results is denied until a predetermined target size has been reached have not been well formulated. In stating this we do not intend to support the silly allegations by certain critics of clinical trials who say that it is unethical to recruit one more patient than is necessary to achieve statistical significance at the 5 per cent level and that it is also unethical to recruit one patient fewer than is needed to achieve this! We merely wish to emphasize that the sample size estimates are based on assumptions which could be wildly wrong in both directions, leading to an unnecessarily prolonged exposure to an ineffective, toxic agent on the one hand, or the delayed introduction of a beneficial therapy on the other.


For example, William Silverman cites a study comparing two different antibacterial prophylactic regimens for very small neonates with high mortality rates from infection.7 A fixed sample size with 100 in each arm was set in motion with a promise from the statisticians that the results would not be inspected until 200 infants had been enrolled. After the 192nd infant had been entered one of the house officers took an 'illegal' peek at the autopsy findings amongst the entered babies and demonstrated a highly significant and unpredicted excess of fatal kernicterus. Although the conventional regimen had been quite effective in reducing the number of fatal infections, Silverman was horrified to note that his house officer was right, with fatal kernicterus occurring nine times more frequently amongst the routinely treated babies compared with those given the novel treatment. In retrospect, this problem might have been avoided using sequential analysis, which should be the ethical recommendation when events occur very rapidly, as for example in neonates at risk of fatal infection. At the other extreme, Tom Chalmers and his colleagues have argued, following the advent of cumulative meta-analysis, against the legitimacy of continued randomization to a control group after reasonable evidence of efficacy has been established.8 They use examples concerning prophylactic antibiotics in colon surgery, where 48 trials were published reporting assignment of over 2000 patients to non-antibiotic control groups since 1971, when a cumulative meta-analysis might have demonstrated beyond a shadow of a doubt that this was a simple means of significantly reducing peri-operative death. They also suggested that fibrinolytic drugs had shown a reduction in early mortality from myocardial infarction by over 20 per cent in 1973. Since then 54 trials have randomized over 40,000 patients.

What they failed to discuss, however, was whether it is possible to shake the collective uncertainty of the medical profession earlier on in the development of an effective new therapy apart from all trials being registered and incorporated into prospective meta-analyses. Perhaps this might be a role for the emerging health technology assessment centres advocated by the advisory group to the Director of Research and Development and chaired by Iain Chalmers.

SPECIFIC EXAMPLES OF THE APPLICATION OF EARLY STOPPING RULES IN CURRENT CLINICAL PRACTICE

We wish to illustrate many of the points described above by reference to four clinical trials that have either been recently closed or are still recruiting in a modified format. The first is the trial of high energy neutron therapy for pelvic cancers, which was stopped early because of an unanticipated increased mortality in the experimental treatment group.9 The second is the European carotid endarterectomy trial, which was recently stopped for two-thirds of the patients:10 one third showed a significant benefit whilst another third showed a significant detriment. The third is a trial of treatment for anal cancer in which the Data Monitoring Subcommittee recommended recruitment beyond the target sample size. Finally, we wish to discuss the ethics of continuing the trial of mammographic screening for a cohort of women aged 40 at recruitment.

HIGH ENERGY NEUTRON TREATMENT FOR PELVIC CANCER

The objective of this trial was to compare high energy neutron treatment with conventional megavoltage X-ray treatment in the management of locally advanced pelvic cancer.9 Randomization commenced in February 1986 and patients were allocated to neutron or photon treatment in an unstratified manner at a ratio of 3:1 in favour of neutrons until January 1988, when randomization became 1:1 and stratified by tumour site.

[Figure 1: histogram; horizontal axis: Neutron failure rate (%)]

Figure 1. The histogram represents the aggregate opinion of ten clinicians involved in the neutron therapy trial as to their belief about the twelve month failure rate following high energy neutron treatment in patients with tumours of the pelvic region. The vertical lines represent the aggregate belief at 50 per cent failure rate and the clinical demand equivalent to a relative risk of 1.3 or greater (photons to neutrons)

In March 1988, when 65 patients had been entered, a non-random sample of ten clinicians, with an interest or involvement in neutron therapy, were asked to quantify their beliefs about the 12 month failure rate* following high energy neutron therapy in patients with cancer of the pelvic region. Their aggregated belief, using a linear opinion pool,11 was for a relative risk of failure of 1.14 (photons to neutrons). They were also asked what the relative risk would have to be before they would routinely use neutron therapy for pelvic cancer. Their aggregate demand was that the relative risk of failure would need to be 1.3 or greater (photons to neutrons). Figure 1 shows the aggregated beliefs together with mean demand. The skew distribution is due to one clinician being willing to give only a 5 per cent chance that neutrons were going to be beneficial. Assuming 80 per cent power, and a significance level of 5 per cent, over 600 patients in total would be required to detect a treatment difference as great as that demanded by the clinicians. For 50 per cent power the number of patients would fall by half, to 300.
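The halving of the sample size at 50 per cent power can be checked without knowing the baseline failure rates. The sketch below assumes the usual normal approximation, under which the required sample size for a fixed effect size is proportional to the square of the sum of the two standard normal quantiles; the paper does not state which method was actually used, and the function name is illustrative.

```python
from statistics import NormalDist


def relative_sample_size(power, reference_power=0.80, alpha=0.05):
    """Ratio of required sample sizes at `power` versus `reference_power`,
    for the same effect size and two-sided significance level `alpha`.

    Under the normal approximation the required sample size is
    proportional to (z_{1 - alpha/2} + z_{power}) ** 2.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    return ((z_alpha + z(power)) / (z_alpha + z(reference_power))) ** 2


if __name__ == "__main__":
    ratio = relative_sample_size(0.50)
    print(round(ratio, 2))     # roughly 0.49
    print(round(600 * ratio))  # just under 300 patients
```

At 50 per cent power the power quantile is zero, so only the significance quantile remains and the requirement drops to a little under half of that for 80 per cent power, in line with the fall from about 600 to about 300 patients.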

In anticipation of a mid-term review by the Medical Research Council (MRC) neutron therapy cancer subcommittee an informal ad hoc analysis was conducted in October 1989. As a result of this analysis, and after reference to the MRC subcommittee, an independent Data Monitoring Committee (DMC) was formed. The DMC was made up of a statistician, a consultant in radiotherapy and a surgeon. The DMC was presented with an analysis of the mortality and morbidity data up to January 1990, together with the ad hoc interim analysis of October 1989, a statistical overview of five published randomized trials of low energy neutron therapy for pelvic cancer, a summary of neutron therapy trials in head and neck cancer, and a summary of the elicited clinical beliefs. In February 1990, prior to the DMC meeting, the clinical coordinator of the trial suspended randomization as he felt that he could no longer ethically continue entering patients into the trial. This decision was later ratified by both the MRC neutron therapy subcommittee and the DMC.

*Failure was defined to be death without having achieved local control, recurrence after local control, death due to metastatic disease or severe (grade IV/V EORTC) radiation morbidity.


By January 1990, 151 patients with locally advanced non-metastatic cancer of the pelvic region had been recruited into the trial. The main outcome measures considered were all-cause mortality, death in relation to metastatic disease, and treatment related morbidity. Preliminary analysis of the data showed that the two treatment groups were well balanced with respect to various patient characteristics thought a priori to be of possible importance. The relative risk for all-cause mortality (photons to neutrons), making no allowance for covariates, was estimated to be 0.59, with 95 per cent confidence interval (0.36, 0.95), and P-value less than 0.03. When allowance was made for site of tumour and other prognostic variables, the relative risk for all-cause mortality (photons to neutrons) became 0.66, with 95 per cent confidence interval (0.4, 1.1).

There are a number of issues raised by the early stopping of this trial. The main issue is whether the trial should have been stopped when it was, bearing in mind that the trial had only recruited 151 patients out of an anticipated 600, and that the results to January 1990 had shown evidence of an advantage to photons. Whilst continuation of the trial can be shown12 to yield a very small chance of eventually demonstrating statistical significance in favour of neutron therapy, the key point was that the chance of the trial ever showing clinical significance, using the group’s demands described above, was so small that it was unethical to continue. In reality the situation may be even more unfavourable towards neutron therapy. Here clinical significance is based on the elicited demands of a group of clinicians who were involved in neutron therapy. Had the clinical demands of a broader range of clinicians been elicited, then these may well have been more sceptical, requiring a much larger treatment difference in favour of neutrons before they would use them routinely. In the light of this, is it ethical to continue with a trial that is extremely unlikely to influence clinical practice, whilst at the same time possibly exposing patients to an inferior treatment? Clearly, the answer is no.

The second major issue is the role that DMCs play in the stopping of trials. In the neutron therapy trial, provision for neither formal stopping rules nor a DMC was made in the original trial protocol. Instead the formation of the DMC was in response to adverse interim results, and was intended to provide independent verification of future decisions made about the trial. Since the stopping of the neutron therapy trial the MRC have considered the creation of DMCs for all their ‘high profile’ trials,13 for example, CHART. Other U.K. groups have different policies, but the trials office to which all the authors are attached also recommends that all recruiting, and indeed some closed, trials are reviewed by DMCs. Obviously it may not be practical to have a DMC for every single trial, but for those that are seen as ‘high profile’, and likely to have a profound effect on clinical practice, a DMC must be seen as mandatory.

Both of the above issues have implications for the planning of future trials. The elicitation of prior beliefs and prior demands about the efficacy of a new treatment is of crucial importance. It is only by conducting such an exercise that interim analyses, whether planned or not, can be placed into perspective, and decisions made about the future of a trial that are both clinically relevant and ethical.

THE EUROPEAN CAROTID ENDARTERECTOMY TRIAL

Patients with moderate to severe carotid artery stenosis may give notice of this condition by a transient ischaemic attack (TIA) when platelet thrombi embolize from the area of stenosis into the cerebral circulation. Although a TIA is in itself not a cause of significant morbidity it is a marker of an impending major stroke or death from a cerebrovascular accident. Until recently there was collective uncertainty as to the best method of management. Many neurologists would be happy to treat this condition medically with anti-platelet drugs whereas many surgeons would take a more aggressive approach in attacking the carotid artery directly and performing surgical endarterectomy even though there is a significant treatment-related mortality. In fact, this has become one of the most popular and frequent operations in the United States of America. Furthermore, teams of neurologists/vascular surgeons expressed different levels of uncertainty according to the degree of stenosis within the carotid artery as judged by Doppler scanning or angiography. Some teams felt that the tighter the stenosis the greater the indication for surgery, whereas others felt the looser the stenosis the safer the operation and the greater the indication for surgery. Thus, when a collaborative group was established to investigate this dichotomy of opinion, their collective uncertainty bracketed all degrees of carotid artery stenosis, although individual surgeon/physician pairs expressed different grey areas in this continuum.

One of us (MB) had the privilege to serve on the Data Monitoring Committee of this trial, which was responsible for making some tough decisions where ethics and statistics could not be separated. Fortunately, the design of the trial allowed stratification into three sub-groups: those with tight stenosis, those with loose stenosis and an intermediate group. For all groups there was a treatment-related mortality which was expressed within the first few weeks after surgery. At long term follow-up those patients with loose stenosis never compensated for this early mortality by reduction in major stroke or cerebrovascular death, whereas those with a tight stenosis rapidly compensated for the short term treatment-related mortality by a significant reduction in major strokes or death from cerebrovascular accidents.
Thus, the Data Monitoring Committee could advise the Steering Committee with confidence that the trial should be stopped both for greater than anticipated benefit for the tight stenosis and the greater than anticipated detriment for the loose stenosis, leaving the middle third in a grey area where recruitment should continue. It is unlikely that this trial will ever demonstrate the specific degree of stenosis where the decision to operate or not will be a precisely defined threshold. It is likely that there will always be a grey area where clinical judgement and patient preference may have to be considered, but then the objectives of science in medicine are merely to set limits to our ignorance rather than providing us with certainty in all therapeutic decision-making.

ANAL CANCER TRIAL

On the basis of five-year survival data from non-randomized studies, surgical and non-surgical approaches to the management of epidermoid carcinoma of the anal canal and anal margin appeared to give equivalent results. However, the advantage of anal conservation in most patients by primary non-surgical treatment led to a trial being designed to compare the relative efficacy of combined modality therapy (chemotherapy and radiation) with radiation alone in the primary treatment of this disease.14 The principal endpoint was defined as local treatment failure, whether due to the disease or complications of the treatment, as indicated by the need for major surgical intervention. Five-year survival was a secondary endpoint.

Since anal carcinoma is a relatively uncommon cancer with an annual incidence of between 250 and 300 cases in the U.K., it was felt that recruitment would be limited. To demonstrate a difference at P = 0.05 between local control, six months after completion of therapy, of 75 per cent by one treatment and 90 per cent by the other, would require 130 patients per arm for 90 per cent power. To recruit 260 patients in 3-4 years was considered to be a formidable task in comparison with the recruitment rates achieved by other solid tumour trials in the U.K.
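The figure of 130 patients per arm can be reproduced with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test and equal allocation; the paper does not say which formula the trialists used, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def patients_per_arm(p1, p2, alpha=0.05, power=0.90):
    """Patients needed in each arm to detect a difference between success
    rates p1 and p2 (two-sided test, equal arms, normal approximation)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z_total ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    n = patients_per_arm(0.75, 0.90)
    print(n, 2 * n)  # per arm, and total for the two arms
```

With local control of 75 per cent against 90 per cent this gives 130 per arm, that is 260 patients in total. The calculation is very sensitive to the assumed difference, which is the weakness the Data Monitoring Subcommittee later identified.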

However, the level of interest in the trial was such that the original target was reached in just over three years and the Data Monitoring Subcommittee (DMSC) advised that recruitment should continue for a few months in order to allow most of the 260 patients required by the protocol to reach the time of the primary endpoint, that is, six months post-treatment, when a full review of the data would be undertaken. This was done in June 1991 when 38 patients were known to be dead and 33 of the survivors had local failure requiring colostomy. The DMSC acknowledged that the trial had reached its recruitment target but was also well aware that a 60 per cent reduction in the local failure rate (from 25 per cent to 10 per cent) was an over-optimistic projection of the likely treatment benefit. Given the relatively little extra toxicity of the combined modality over radiotherapy alone, a smaller reduction in local failures of about half this size (that is, 30 per cent) would be worth detecting. A trial of 260 patients had a high chance (about 70 per cent) of missing a 30 per cent reduction in local failure (for example, 25 per cent to 17 per cent).

The second point considered by the committee was that local control at six months, as had been suggested from the previously published series, did not seem to be the most appropriate endpoint. At the time of review, one-third of the local failures in the U.K. trial were already occurring more than six months after completion of therapy, and this with a relatively short median follow-up time. Three year or five year local control as the principal endpoint seemed preferable.

These considerations led to a clear indication for the need to continue accrual, particularly since it seemed that other trials of anal cancer around the world were unlikely to provide much further data on this question. The DMSC therefore revised the recruitment target to 430 patients, which would lead to only a 10 per cent chance of missing a reduction of local failure of a third (for example, 50 per cent to 67 per cent three year colostomy-free survival). There would also be an evens (50:50) chance of detecting a smaller improvement, of say 50 to 60 per cent. Following this recommendation recruitment to the study continued and in fact increased from about 80 patients per annum to over 100.
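The committee's reasoning about the chance of missing a smaller benefit can be approximated with a simple power calculation for two proportions. This is a rough sketch under stated assumptions (two-sided 5 per cent test, equal arms, unpooled normal approximation, failure rates treated as fixed proportions). The committee's own calculations are not described and presumably used time-to-event methods, so the sketch matches only the rough size of the quoted figures. The function name is illustrative.

```python
import math
from statistics import NormalDist


def approximate_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided comparison of two failure
    proportions with equal arms (unpooled normal approximation)."""
    norm = NormalDist()
    se = math.sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_arm
    )
    z_effect = abs(p_control - p_treated) / se
    # The negligible chance of significance in the wrong direction is ignored.
    return norm.cdf(z_effect - norm.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    # Original target of 260: a 30 per cent reduction from 25 per cent failure.
    print(1 - approximate_power(0.25, 0.175, 130))
    # Revised target of 430: failure reduced by a third from 50 per cent.
    print(1 - approximate_power(0.50, 1 / 3, 215))
    # Revised target of 430: the smaller improvement of 50 to 60 per cent.
    print(approximate_power(0.50, 0.40, 215))
```

Under these assumptions the original 260 patients give roughly a two in three chance of missing the smaller benefit, and the revised target gives a little better than evens for the 50 to 60 per cent improvement. For a reduction of a third the sketch suggests a somewhat smaller chance of a miss than the 10 per cent quoted, as might be expected from so crude an approximation.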

The DMSC met again in July 1992 to discuss the future of the trial. Based on the current estimates of failure rate, the DMSC expected that in a further two years sufficient events should have occurred in the patients already entered into the study to provide a reliable answer. The Working Party was therefore faced with two alternatives: to close recruitment, or to continue recruiting into the trial whilst waiting for the required number of events to accumulate. There appeared to be no compelling reason to stop the study, as closure of the trial at this stage might risk not achieving sufficient statistical power to detect a moderate difference between therapies. The outstanding effort of so many clinicians, the absence of any other similar trials to add further data and the impossibility at the present time of providing treatment guidelines were all factors which led the Working Party to recommend continuing accrual.

In this situation the DMSC has prevented a very successful trial from stopping despite its having reached its original accrual target. Patients have been carefully monitored for toxicity and, since the addition of chemotherapy did not greatly increase the morbidity of treatment, uncertainty remained about the outcome. Indeed, judging from the recruitment figures, the existence of the trial had increased equipoise during the initial 3 years’ accrual (Figure 2). In such a situation it seems that the role of the DMSC should be that of encouraging recruitment, so that a really reliable answer can be obtained rather than accepting the somewhat optimistic power calculations which have been based on realistic estimates of achievable recruitment rather than realistic expectations of differences in outcome.

Figure 2. Patient accrual to the UKCCCR Anal Cancer Trial. The bars represent the annual recruitment (left axis) and the line, the cumulative accrual from the start of the trial in December 1987 (right axis)

MAMMOGRAPHIC SCREENING FOR BREAST CANCER IN PRE-MENOPAUSAL WOMEN

The intention of mammographic screening is to reduce deaths from breast cancer, on the assumption that there is a window of opportunity between the mammographic detection threshold of disease and the clinical threshold, during which time the cancer, if left in situ, has the opportunity to metastasize via vascular dissemination. Interpreting the value of mammographic screening is subject to a well-defined bias whereby an increase in case survival after treatment of two to three years could easily be ascribed to the lead time given by earlier detection, without ultimately influencing the point at which the patient dies. For this reason a number of randomized controlled trials have been carried out in which breast cancer mortality in the population invited to be screened has been judged as the significant outcome measure. There is now a widespread consensus that these trials have demonstrated a 20-30 per cent reduction in the risk of breast-cancer-associated death amongst the population offered screening at two to three year intervals over a period of up to twelve years for women over 50 years of age.15 In contrast, the results for screening women under the age of 50 are extremely uncertain, with point estimates of relative risks varying between 0.8 and 1.3 and with confidence intervals that might include a 30 per cent reduction in the risk of breast cancer death and up to a 50-60 per cent excess risk amongst the screened population.16 To resolve this uncertainty once and for all, the UKCCCR launched a trial recruiting a cohort of women aged 40 who would have annual mammographic screening for ten years, with cause-specific mortality as the major endpoint. Power calculations were based on the assumption that, in public health terms, a 20 per cent reduction in the relative risk of breast cancer death was worth detecting. Currently, the trial is recruiting at a slower than anticipated rate.
Events may have overtaken this trial with the publication of an overview analysis which includes 90,000 pre-menopausal women from trials conducted in Sweden17 and of a trial conducted in Canada involving 40,000 women aged 40 to 49 years.18 The Canadian trial showed a non-significant detriment amongst the screened group, whereas the overview of Swedish trials showed a non-significant reduction in the risk of breast cancer death of approximately 10 per cent. Adding these data to our background knowledge, it seems implausible that, if a real benefit exists, it would be in the order of a 20 per cent reduction in the risk of breast cancer death, particularly as the most promising result from the Swedish overview was associated with a 90 per cent compliance rate, as against an approximate 60 per cent compliance rate in the study currently recruiting in the United Kingdom. To determine whether the 10 per cent reduction is a closer approximation to the truth would involve an enormous expansion of recruitment into the U.K. trial, at enormous cost, which would have to be offset against the funding of research into more promising areas. Furthermore, when recruiting women to this trial they should be informed of the possibility of harm as well as potential benefit, which would be likely to reduce compliance still further below the already critically low acceptance rate. It could be judged, therefore, that a rational and ethical decision might be to abort this trial prematurely. However, the debate continues and we urge our readers to watch this space!
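The scale of the expansion needed to resolve a 10 per cent rather than a 20 per cent reduction in mortality can be illustrated with a rough calculation. The sketch below uses Schoenfeld's approximation for the number of deaths required in a 1:1 randomized comparison; the two-sided 5 per cent significance level, the 90 per cent power, the equal allocation and the function name `events_required` are all assumptions made for illustration and are not taken from the UKCCCR protocol.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(relative_risk, alpha=0.05, power=0.90):
    """Approximate number of deaths needed to detect a given relative risk.

    Uses Schoenfeld's approximation for a two-arm trial with equal
    allocation and a two-sided test. The default alpha and power are
    illustrative assumptions, not values from the trial protocol.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(relative_risk) ** 2)


events_20 = events_required(0.80)  # 20 per cent reduction in risk of death
events_10 = events_required(0.90)  # 10 per cent reduction in risk of death
ratio = events_10 / events_20

print(f"Deaths needed for a 20 per cent reduction: {events_20}")
print(f"Deaths needed for a 10 per cent reduction: {events_10}")
print(f"Ratio: {ratio:.1f}")
```

Because the required number of deaths varies with the inverse square of the log relative risk, halving the target effect multiplies the number of events by more than four. For a fixed death rate and length of follow-up, the number of women to be recruited would grow in roughly the same proportion.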

DISCUSSION

From the examples quoted in this paper, the necessity of constant review of a trial’s progress is obvious, both to protect patients and to allow equitable distribution of the scarce resources available for the administration of large RCTs. We do not think that the provision of predetermined rules in the original protocol for the trial is a sufficient safeguard, nor the most appropriate means of ensuring that constraints are correctly applied to decisions concerning continuing recruitment to a particular trial. Rules are often rigid and inflexible, and those prescribing them are unable to foresee all relevant future events. In the absence of DMCs, planned interim analyses or diligent ethics committees, one has to rely on the premise that the clinical trials that are conducted are also well designed. Unfortunately, this would appear not to be the case. In a recent survey, of 16 first trial reports of phase III cancer clinical trials in the British Journal of Cancer in 1991 and 1992, eight made no mention of sample size or power calculations.19 We would urge those who fund important phase III trials to ensure that there is an adequate mechanism available to review the progress of a trial continuously.

ACKNOWLEDGEMENTS

We would like to thank Dr. Sheila Gore for permission to use the individual elicited beliefs of the ten clinicians involved in the neutron therapy trial. We would also like to thank Drs. Deborah Ashby and Doug Errington for useful discussions regarding the neutron therapy trial. We would also like to acknowledge the support of the Cancer Research Campaign for two of the authors (JH and KA).

REFERENCES

1. Joyce, C. R. B. and Welldon, R. M. C. ‘The objective efficacy of prayer - a double-blind clinical trial’, Journal of Chronic Diseases, 18, 367-377 (1965).

2. Steering Committee of the Physicians’ Health Study Research Group. ‘Final report on the aspirin component of the ongoing physicians’ health study’, New England Journal of Medicine, 321, 129-135 (1989).

3. Third International Study of Infarct Survival. ‘ISIS-3: A randomised comparison of streptokinase vs tissue plasminogen activator vs anistreplase and of aspirin plus heparin vs aspirin alone among 41,299 cases of suspected acute myocardial infarction’, Lancet, 339, 754-769 (1992).

4. Early Breast Cancer Trialists’ Collaborative Group. ‘Systemic treatment of early breast cancer by hormonal, cytotoxic, or immune therapy’, Lancet, 339, 1-15 and 71-85 (1992).

5. Skrabanek, P. ‘The debate over mass mammography in Britain: the case against’, British Medical Journal, 297, 971-972 (1988).

6. Muldoon, M. F., Manuck, S. B. and Matthews, K. A. ‘Lowering cholesterol concentrations and mortality: a quantitative review of primary prevention trials’, British Medical Journal, 301, 309-314 (1990).

7. Silverman, W. A., Andersen, D. H., Blanc, W. A. and Crozier, D. N. ‘A difference in mortality rate and incidence of kernicterus among premature infants allotted two prophylactic antibacterial regimens’, Pediatrics, 18, 614-625 (1956).

8. Lau, J., Antman, E. M., Jimenez-Silva, J., Kupelnick, B., Mosteller, F. and Chalmers, T. C. ‘Cumulative meta-analysis of therapeutic trials for myocardial infarction’, New England Journal of Medicine, 327, 248-254 (1992).

9. Errington, R. D., Ashby, D., Gore, S. M., Abrams, K. R., Myint, S., Bonnett, D. E., Blake, S. W. and Saxton, T. E. ‘High energy neutron treatment for pelvic cancers: study stopped because of increased mortality’, British Medical Journal, 302, 1045-1051 (1991).

10. European Carotid Surgery Trialists’ Collaborative Group. ‘MRC European carotid surgery trial: interim results for symptomatic patients with severe (70-99%) or with mild (0-29%) carotid stenosis’, Lancet, 337, 1235-1243 (1991).

11. Spiegelhalter, D. J., Freedman, L. S. and Parmar, M. K. B. ‘Applying Bayesian ideas in drug development and clinical trials’, Statistics in Medicine, 12, 1501-1511 (1993).

12. Genest, C. and Zidek, J. V. ‘Combining probability distributions’, Statistical Science, 1, 114-147 (1986).

13. Parmar, M. K. B. and Machin, D. ‘Monitoring clinical trials: experience of, and proposals under consideration by, the cancer therapy committee of the British Medical Research Council’, Statistics in Medicine, 12, 497-504 (1993).

14. ‘UKCCCR Trial of combined modality therapy v radiation alone in the management of anal cancer’, Protocol available from CRC Clinical Trials Centre, London SE5 9NU, 1987.

15. Wald, N., Frost, C. and Cuckle, H. ‘Breast cancer screening: the current position’, British Medical Journal, 302, 845 (1991).

16. Eddy, D. M., Hasselblad, V., McGiveney, W. and Hendee, W. ‘The value of mammography screening in women under age 50 years’, Journal of the American Medical Association, 259, 1513-1519 (1988).

17. Nystrom, L., Rutqvist, L. E., Wall, S., Lindgren, A., Lindqvist, M., Ryden, S., Anderson, I., Bjurstam, N., Fagerberg, G., Frisell, J., Tabar, L. and Larsson, L-G. ‘Breast cancer screening with mammography: overview of Swedish randomised trials’, Lancet, 341, 973-978 (1993).

18. Miller, A. B., Baines, C. J., To, T. and Wall, C. ‘The Canadian National Breast Screening Study. 1. Breast cancer detection and death rates among women aged 40 to 49 years’, Canadian Medical Association Journal, 147 (10), 1459-1476 (1992).

19. Ashby, D. and Machin, D. ‘Stopping rules, interim analyses and data monitoring committees’, Editorial, British Journal of Cancer, 68, 1047-1050 (1993).