discussion of ‘the what, why and how of bayesian clinical trials monitoring’


Upload: david-machin


STATISTICS IN MEDICINE, VOL. 13, 1385-1389 (1994)

DISCUSSION OF ‘THE WHAT, WHY AND HOW OF BAYESIAN CLINICAL TRIALS MONITORING’

INTRODUCTION

It is perhaps worth remembering that in the late seventies, at least for me, working at the European Organisation for the Research and Treatment of Cancer Data Center, many issues that have been discussed in this workshop were not obvious. As time has passed, the need for larger clinical trials has become more and more clear, both because we recognize the need and, importantly, because they are now accepted as feasible and practicable. Much of the early experience in the field of randomized trials in cancer was, in some sense, a learning experience rather than a serious attack on the basic questions. For example, there was much concern about carefully defining eligibility criteria, there was little awareness of the uncertainty principle and its consequences, and intention-to-treat analyses were not current. There was a rather naive (optimistic) view of what was possible with new therapies in cancer, so that many of the trials designed turned out to be too small to demonstrate convincingly what turned out to be modest but worthwhile benefits. Standard design criteria were α = 0.05, 1 − β = 0.80 and a 50 per cent improvement in median survival, and so patient numbers obtained from George and Desu¹ usually led to trials of fewer than 200 patients. Thus, the early ‘optimism’ about the likely size of treatment effect led to small trials with consequently inconclusive (non-significant) results. Early stopping for efficacy reasons was rarely a problem, since the trials were generally too small to establish even quite considerable benefits. The problems associated with repeated significance testing on accumulating data were essentially ignored.
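To give a feel for the arithmetic behind these design criteria, the following is a minimal sketch of an event-count calculation. It assumes exponential survival (so that a 50 per cent improvement in median survival corresponds to a hazard ratio of 1/1.5), equal allocation, and the widely used approximation d = 4(z_α + z_β)² / (ln HR)² for the total number of events. The function name and defaults are illustrative, and this is not claimed to be the exact calculation tabulated by George and Desu.

```python
import math
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.80, two_sided=True):
    """Approximate total number of events for a two-arm trial with 1:1 allocation.

    Uses d = 4 * (z_alpha + z_beta)**2 / (ln HR)**2, a standard normal
    approximation for the log hazard ratio (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


# A 50 per cent improvement in median survival under exponential survival.
optimistic = events_required(1 / 1.5)
# A more modest 20 per cent improvement in median survival.
modest = events_required(1 / 1.2)

print("events needed, 50 per cent improvement:", optimistic)
print("events needed, 20 per cent improvement:", modest)
```

Under these assumptions the optimistic target needs of the order of a couple of hundred events, whereas a more modest target needs several times as many, which is the sense in which early optimism led to small trials.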

It is also important to note that any methods concerned with early stopping have to be readily applicable in a wide context and particularly in the context of a (very) busy trials office. Thus, for example, the current trial portfolio of the MRC Cancer Trials Office (CTO) is approximately 40 open randomized phase III trials of varying size at varying stages of maturity. Their size ranges from the low hundreds to a planned four thousand patients in one trial.

The approach to the design of stopping rules for these different trials varies. Thus, for example, in a renal carcinoma trial comparing medroxyprogesterone (MPA) with α-interferon a sequential design is utilized;²,³ for the continuous hyperfractionated accelerated radiotherapy (CHART) trials in carcinoma of the bronchus and in head and neck cancer the approach is Bayesian;⁴ but for the majority the approach is to use conservative tests in the spirit of O'Brien and Fleming.⁵ As an example, the following gives an extract from a protocol designed in 1983 to compare the addition of a radiosensitizer to radiotherapy in patients with advanced cervix cancer and illustrates a very flexible approach to interim analysis:

© 1994 by John Wiley & Sons, Ltd.


'There will be continued analysis of the data which may influence the final total number of cases admitted. Progress reports will be circulated every six months from the beginning of the study, to maintain interest in the progress of the study, review patient entry rates, monitor early toxicity and report on the organisational aspects. Interim analyses of the difference in local recurrence and survival rates will be conducted at less frequent intervals to detect gross differences in treatment response. These analyses will be simple comparisons interpreted by conservative significance tests. The results of these analyses will be discussed by the Chairman, Statistician and one other member of the group. They will not be circulated unless there is a reason to recommend termination of the trial. In this case the final decision will rest with the Working Party.'

This trial was designed at a stage when the Bayesian approach was in its early formative period. The results of this trial have now been published by the MRC Working Party on advanced carcinoma of the cervix.⁶

It is clear from the extract that the stopping rule was rather vague and non-prescriptive. The consequences of this rather informal procedure are described in the trial publication and have been discussed by Parmar and Machin.⁷ This approach contrasts markedly with the very structured approach described by Freedman et al.,⁸ see also Bartolucci et al.⁹

BAYESIAN METHODS

It is clear that the advantages of using a (structured) Bayesian approach are that it can provide sensible procedures for assessing trials that need to be stopped early or extended, it can incorporate external information into the final assessment of the value of a particular therapy and it can also cope with any extra (unscheduled) analysis. These properties of the Bayesian approach are important as they provide an integrated view of all aspects of stopping, continuing or extending recruitment to randomized trials, as well as their reporting. They are mathematically elegant, in that the various prior distributions can be easily incorporated to obtain the relevant posterior distributions.

The Initial Prior Distribution

It is important to remember that at the design stage of a trial arbitrary decisions are made when determining trial size with respect to: the significance level (α); whether a one- or a two-sided comparison is relevant; the power (1 − β); and, rather critically, the effect size (δ) to be detected in the trial. The latter is often expressed as the logarithm of the hazard ratio (HR) for survival studies. It must also be recognized that the method of interim analysis chosen is in itself arbitrary, as is the frequency of such analyses and, importantly, how the results of such analyses are implemented.

It is a clear ethical requirement at the design stage of a trial that there must be considerable uncertainty about the size of δ.¹⁰ At that stage there is, at the very least, a considerable likelihood that no (zero) effect may be observed. This uncertainty implies a 'very vague' prior distribution centred on δ = 0. The variance of this prior will depend on how the distribution is obtained. It may come, for example, from the investigators who will participate in the trial or from an analysis of the literature. Freedman and Spiegelhalter,¹¹ see also Freedman et al.,¹² have described a method of obtaining a prior distribution from the investigators, but this tends to be centred around their optimistic belief of the improvement that might be achieved rather than a summary of their knowledge of the current evidence. As far as I am aware, there has been no description of how the information available in the published literature is quantified into a prior distribution. In certain circumstances, although this will be rare, such a prior may be obtained from a formal overview providing an estimate of the pooled hazard ratio and its associated variance.¹³ It is a moot point whether the point estimator from such an overview should be used as the prior mean of δ, but nevertheless it may make sense to use the variance as an indicator of the uncertainty. If the investigators themselves are used to obtain the prior, then some care needs to be attached to its interpretation. At best such a prior will be a compound of anecdotal information obtained from unpublished observations, of information obtained from the published literature, and of personal experience with the regimen. Such difficulties, however, should not imply that this prior will be of no value at the planning stage.

It is clear that at the beginning of a new trial, any prior used will dominate the data if its variance is small. Thus, whatever is observed with the trial patients will not overcome this prior until the number of patients recruited, or rather the number of critical events observed, reaches a certain minimum size. It is a useful concept, therefore, to translate the initial prior into an equivalent number of patients (events) so that any (first) interim analysis can be focused at a point when the data begin to influence the posterior distribution in a substantial way. Recommendations for when such a first interim analysis is appropriate would be valuable.
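One way to make the 'equivalent number of events' concrete is sketched below. It assumes a normal prior on the log hazard ratio and the common approximation that an estimate of the log hazard ratio based on d events, with equal allocation, has variance of about 4/d. A prior with standard deviation s is then 'worth' roughly n₀ = 4/s² events. The function names and the numerical values are hypothetical illustrations, not taken from any of the trials discussed.

```python
import math


def prior_equivalent_events(prior_sd):
    """Number of events carrying the same information as a normal prior
    on the log hazard ratio with the given standard deviation
    (assumes var(log HR estimate) is about 4 / events)."""
    return 4.0 / prior_sd ** 2


def posterior(prior_mean, prior_sd, observed_log_hr, events):
    """Normal-normal update; returns (mean, sd, weight given to the trial data)."""
    n0 = prior_equivalent_events(prior_sd)
    total = n0 + events
    mean = (n0 * prior_mean + events * observed_log_hr) / total
    sd = math.sqrt(4.0 / total)
    return mean, sd, events / total


# Hypothetical prior: centred on no effect, sd 0.25 on the log HR scale.
n0 = prior_equivalent_events(0.25)
print("prior is worth about", n0, "events")

# Hypothetical observed hazard ratio of 0.7 at three interim analyses.
for events in (16, 64, 256):
    mean, sd, weight = posterior(0.0, 0.25, math.log(0.7), events)
    print(events, "events: posterior HR", round(math.exp(mean), 3),
          "data weight", round(weight, 2))
```

In this sketch the trial data carry half the weight in the posterior mean only once the observed events equal the prior's equivalent events, which suggests one possible (assumed, not prescribed) yardstick for timing a first interim analysis.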

The Evolving Prior Distribution

As a trial progresses, the external information provided, for example, from the medical literature will be evolving alongside the patient recruitment to the trial and observation of the associated events. In planning a trial using a Bayesian approach it may be necessary to also stipulate how this evolving external information is to be progressively summarized. It is also important to decide what information should, or should not, influence the prior. One may confine attention only to randomized clinical trials of exactly the same or similar therapies obtained perhaps from a register of trials.¹⁴ One must also be aware that the literature may be biased.¹³ The importance of developing the external prior as the trial progresses is that this (evolving) subjective distribution, together with the (baseline) prior obtained before the start of the trial (although these two can be formally combined into one prior) and the event data obtained from the trial patients, are all elements necessary for a full Bayesian approach to interim analysis. At the time of such analysis all elements are combined into the associated posterior distribution, which is then presented to a Data Monitoring Committee for their consideration.
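As a sketch of how the three elements might be combined, suppose (as assumptions of this illustration, not statements from the protocol of any trial) that each source is summarized on the log hazard ratio scale by a normal distribution, that the sources are independent and unbiased, and that an estimate based on n events has variance 4/n. Here n₀ and n_E denote the equivalent numbers of events in the baseline prior and in the external evidence, and d the number of events observed in the trial:

```latex
\begin{align*}
\text{baseline prior:}\quad & \delta \sim N\!\left(\mu_0,\ 4/n_0\right),\\
\text{external evidence:}\quad & \hat\delta_E \mid \delta \sim N\!\left(\delta,\ 4/n_E\right),\\
\text{trial data:}\quad & \hat\delta_T \mid \delta \sim N\!\left(\delta,\ 4/d\right),\\
\text{posterior:}\quad & \delta \mid \hat\delta_E, \hat\delta_T \sim
N\!\left(\frac{n_0\mu_0 + n_E\hat\delta_E + d\,\hat\delta_T}{n_0 + n_E + d},\
\frac{4}{n_0 + n_E + d}\right).
\end{align*}
```

The order in which the baseline prior and the external evidence are combined does not matter under these assumptions, which is why the two can be merged into a single prior before the trial data are added. If the literature is thought to be biased, the independence and unbiasedness assumptions fail and n_E would need to be discounted in some agreed way.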

Sceptical Priors

Freedman et al.⁸ have described the role of the sceptical prior both at the interim analysis stage and also at the final reporting at the end of a trial. This seems to me a very useful concept irrespective of whether or not a Bayesian approach has been used at the design stage. It enables, for example, the reader (or formal assessor) to see whether his or her personal scepticism about the therapy on trial is sufficient to negate the (positive) trial results. This is particularly important if the trial is to have any impact on subsequent clinical practice.¹⁵
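The effect of a sceptical prior on a reported result can be illustrated with a short sketch. The construction assumed here, a normal prior on the log hazard ratio centred on zero with only a small probability (5 per cent by default) of an effect as large as the design alternative, is one common choice in the Bayesian monitoring literature and is not spelled out in this discussion. The function names, the convention that a log hazard ratio below zero means benefit, and the numerical values are all hypothetical.

```python
import math
from statistics import NormalDist


def sceptical_prior_sd(alternative_log_hr, tail_prob=0.05):
    """SD of a zero-centred normal prior that gives probability tail_prob
    to an effect at least as large as the design alternative."""
    return abs(alternative_log_hr) / NormalDist().inv_cdf(1 - tail_prob)


def prob_benefit(observed_log_hr, events, prior_sd, prior_mean=0.0):
    """Posterior probability that the log hazard ratio is below zero,
    assuming var(observed log HR) is about 4 / events."""
    n0 = 4.0 / prior_sd ** 2
    total = n0 + events
    mean = (n0 * prior_mean + events * observed_log_hr) / total
    sd = math.sqrt(4.0 / total)
    return NormalDist(mean, sd).cdf(0.0)


# Hypothetical trial: design alternative HR = 1/1.5, observed HR = 0.75 on 100 events.
sd_sceptic = sceptical_prior_sd(math.log(1 / 1.5))
p_sceptic = prob_benefit(math.log(0.75), 100, sd_sceptic)
p_vague = prob_benefit(math.log(0.75), 100, prior_sd=100.0)  # near-flat prior

print("P(benefit), sceptical prior:", round(p_sceptic, 3))
print("P(benefit), very vague prior:", round(p_vague, 3))
```

A reader whose own scepticism is stronger than the assumed 5 per cent tail can lower tail_prob and see how far the posterior probability of benefit falls.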

Practicalities

In practice there will be difficulty in obtaining (updating) the relevant prior distributions, and there is also the difficulty in maintaining the trial database, in that there is often delay between the occurrence of an event (death), its reporting and finally its addition to the trial database. It would be an interesting exercise to assess the effect of such information delay on the associated priors (see Parmar and Machin⁷). Clearly, this will not be important for very large trials as the delayed events may be a small proportion compared to those already in the database. Such delays will be important in small trials and may be important in the early stages of all trials when there is a possibility of stopping too early - with the decision being based on incomplete patient data.¹⁶

CASE STUDIES

There is a clear need for case studies both of trials that go on to completion of their initial recruitment target (will this ever be the case with a full Bayesian approach?) and those that stop early. Such studies could be of two types. One might be retrospective, perhaps examining the literature available at the start of a particular trial, then at the half-way stage and then one year (say) prior to publication. Such an analysis would mirror those done by Donaldson et al.¹⁷ to evaluate sequential designs. The other type of case study will be real and Parmar⁴ has described how a Bayesian approach to the CHART trials has been used at the design and interim analysis stages. Once these trials are closed, then a detailed account of the advantages and disadvantages of the Bayesian procedures should also be reported. Such case studies will allow an assessment of the Bayesian approach to be made and perhaps more importantly set the standard for the procedures to follow in adopting such an approach by other investigators. To facilitate this it is also important to develop appropriate and user-friendly computer software which could then link to standard statistical packages, for example those used for survival analysis.

CONCLUSION

The obvious drawbacks of a Bayesian approach are persuading both the statistical and, perhaps more importantly, the clinical community of both its ‘scientific validity’ and practicability. Such an approach must add to the ‘cost’ of organizing randomized trials and, therefore, must be clearly seen to bring ‘benefits’. Clinicians in particular are apt to be uncomfortable with a procedure or a series of techniques which provides ‘optimal answers’ as they often think that statistical analysis is not based on subjective assumptions but rather on entirely objective fact. Showing them how different assumptions (priors), purporting to represent belief, influence the interpretation of results is likely to be treated with some scepticism. They will also need help in constructing and applying their own scepticism. Considerable work needs to be done in developing the methodology for summarizing the results of concurrent, external trials.¹³ There is a need for ‘case’ studies using the Bayesian approach to illustrate what it gives above and beyond current methods. These will also allow the trial co-ordinating team and other interested clinicians to get some ‘feel’ for the approach. Until this is done Bayesian methods for monitoring clinical trials will remain no more than an interesting but unused tool.

REFERENCES

1. George, S. L. and Desu, M. ‘Planning the size and duration of a clinical trial studying the time to some critical event’, Journal of Chronic Diseases, 27, 15-24 (1974).

2. Whitehead, J. The Design and Analysis of Sequential Clinical Trials, 2nd edn, Ellis Horwood, Chichester, 1992.

3. Fayers, P. M., Cook, P. A., Machin, D. et al. ‘On the development of the Medical Research Council trial of α-interferon in metastatic renal carcinoma’, Statistics in Medicine, in press.

4. Parmar, M. K. B. Trials of the MRC Cancer Therapy Committee, Statistics in Medicine, 13, 1297-1312 (1994).

5. O'Brien, P. C. and Fleming, T. R. ‘A multiple testing procedure for clinical trials’, Biometrics, 35, 549-556 (1979).


6. Medical Research Council Working Party on Advanced Carcinoma of the Cervix. ‘A trial of Ro 03-8799 (pimonidazole) in carcinoma of the uterine cervix: an interim report from the Medical Research Council Working Party on advanced carcinoma of the cervix’, Radiotherapy and Oncology, 26, 93-103 (1993).

7. Parmar, M. K. B. and Machin, D. ‘Monitoring Clinical Trials: Experience of, and proposals under consideration by, the Cancer Therapy Committee of the British Medical Research Council’, Statistics in Medicine, 12, 497-504 (1993).

8. Freedman, L. S., Spiegelhalter, D. J. and Parmar, M. K. B. ‘The what, why and how of Bayesian clinical trials monitoring’, Statistics in Medicine, 13, (1994).

9. Bartolucci, A. A., Katholi, C. R. and Birch, R. ‘Interim analysis of failure time data - a Bayesian approach’, Environmetrics, 3, 465-477 (1992).

10. Machin, D. ‘Interim analysis and ethical issues in the conduct of trials’, in Williams, C. J. (ed), Introducing New Treatments for Cancer: Practical, Ethical and Legal Problems, Wiley, Chichester, 1992.

11. Freedman, L. S. and Spiegelhalter, D. J. ‘The assessment of subjective opinion and its use in relation to stopping rules for clinical trials’, The Statistician, 32, 153-160 (1983).

12. Freedman, L. S., Lowe, D. and Macaskill, P. ‘Stopping rules for clinical trials incorporating clinical opinion’, Biometrics, 40, 575-586 (1984).

13. Stewart, L. A. and Parmar, M. K. B. ‘Meta-analysis of the literature or of individual patient data: Is there a difference?’, The Lancet, 341, 418-422 (1993).

14. Fayers, P. M. and Armitage, T. ‘Towards an international register of cancer trials: The UKCCCR Register of UK Trials’, European Journal of Cancer, 29A, 907-912 (1993).

15. Stephens, R. J. and Gibson, D. ‘The impact of clinical trials of the treatment of lung cancer’, Clinical Oncology, 5, 211-219 (1993).

16. COMPACT Steering Committee. ‘Improving the quality of data in clinical trials in cancer’, British Journal of Cancer, 63, 412-415 (1991).

17. Donaldson, A. N., Whitehead, J., Stephens, R. J. and Machin, D. ‘A simulated sequential analysis based on data from two MRC lung cancer trials’, British Journal of Cancer, 68, 1171-1178 (1993).