WHY DO WE NEED CONTROLS? WHY DO WE NEED TO RANDOMIZE?

FRED EDERER, M.A.

Bethesda, Maryland

758 AMERICAN JOURNAL OF OPHTHALMOLOGY MAY, 1975


A controlled clinical trial is a scientific experiment. In the ideal scientific experiment, there is an experimental group and a control group, that is, a comparison group, and the two groups are identical in every respect except for the experimental treatment. Then any difference in outcome must be attributable to the difference in treatments.

Most clinical trials in the history of medicine have been uncontrolled. The usual practice has been to treat a series of patients with a new treatment and then form an impression, or opinion, as to whether the results are excellent, good, same as before, or bad. If the results are excellent or good, they are more likely to be published than if they are indifferent or bad. Negative findings lack glamour.

Mr. Ederer is head, Section of Clinical Trials and Natural History Studies, National Eye Institute, Bethesda, Maryland.

Presented at the National Eye Institute Workshop on Randomized Controlled Clinical Trials, Washington, D.C., Nov. 6, 1973.

Reprint requests to Mr. Fred Ederer, National Eye Institute, National Institutes of Health, Bethesda, MD 20014.

When a treatment is dramatically effective, no controls may be needed. The first case treated with penicillin in this country provided strong indications of its remarkable therapeutic potential.¹ But how many penicillins have we had? Usually we are concerned with much smaller effects. While the truth may eventually emerge from a series of uncontrolled trials, this is at best an inefficient process. And the danger is that the truth may never emerge. Photocoagulation treatment to prevent blindness from diabetic retinopathy² and screening for early diagnosis of cervical cancer are two widely used, costly procedures, the values of which remain inadequately substantiated. Of the latter, Geoffrey Rose³ recently said:

It is now clear that a definite answer to this major question could only have been obtained by a controlled trial of early diagnosis by cervical smear versus clinical diagnosis of invasive disease. Such a trial seems no longer possible either ethically or practically. We are now committed to continuing a costly screening service and shall never know whether it is doing little or much to control the fatal forms of the disease.

I would like to give three examples illustrating the difficulty in drawing conclusions from uncontrolled studies. The first (Table 1) is from a study of antihistamines in the treatment of the common cold.⁴ Of colds under one day's duration, 13.4% were cured and 68.2% were cured or improved. Is that good, bad, or indifferent? Without controls, the investigator would have to search his memory about past results in untreated cases. In place of objective evidence, we would have a subjective, intuitive impression, that is, one person's opinion as to whether the results are excellent, good, or no better than before. Fortunately, the study in question did use controls, randomized controls at that, in the form of placebos. The placebo results were essentially the same as the antihistamine results: 13.9% cured and 64.7% cured or improved.

The second example comes from the National Diet-Heart Study, which attempted to assess the influence of several randomized, double-blind diets on serum cholesterol (Table 2).⁵ The results from two of the participating centers showed percent serum cholesterol changes in middle-aged men after the start of two somewhat different experimental diets, B and C, and a control diet, D. Without the control diet we would be tempted to conclude that the diets were more effective in Minneapolis-St. Paul than in Oakland. But after subtracting the control results, the net experimental effect is about the same in the two cities: 8 or 9%. The cause of the greater Minneapolis-St. Paul drops in all groups was never explained. It might have been some coincidental occurrence, such as a seasonal influence.

TABLE 1
COLDS UNDER ONE DAY'S DURATION*

                 Cured, %    Cured or Improved, %
Antihistamine      13.4              68.2
Placebo            13.9              64.7

* Results on second day following treatment.

TABLE 2
NATIONAL DIET-HEART STUDY*

Diet    Minneapolis-St. Paul    Oakland
B              -14.7             -11.0
C              -15.5             -10.9
D               -7.3              -1.8

* Mean percent serum cholesterol change.
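The subtraction described for Table 2 can be written out as a short calculation. This is only an illustrative sketch: the figures are those of Table 2 (the value -15.5 for diet C in Minneapolis-St. Paul is read from a poorly printed entry), and the function name `net_effects` is a hypothetical helper, not part of the study.

```python
# Mean percent serum cholesterol change, as given in Table 2.
changes = {
    "Minneapolis-St. Paul": {"B": -14.7, "C": -15.5, "D": -7.3},
    "Oakland": {"B": -11.0, "C": -10.9, "D": -1.8},
}


def net_effects(center_changes, control="D"):
    """Subtract the control-diet change from each experimental-diet change."""
    baseline = center_changes[control]
    return {
        diet: round(change - baseline, 1)
        for diet, change in center_changes.items()
        if diet != control
    }


# Net effect of each experimental diet within each center.
net = {center: net_effects(values) for center, values in changes.items()}

for center, effects in net.items():
    print(center, effects)
```

The raw changes differ between the two cities by 3.7 to 4.6 percentage points, while the net effects computed this way fall between 7.4 and 9.2 points in both.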

Results of another double-blind diet-heart study, conducted over an eight-year period, suggest at first glance that the experimental diet became more and more effective as the study progressed (Figure).⁶ But the people on the control diet also showed a progressive serum cholesterol decrease over the eight years. We may not know how to explain these decreases. The important point is that a true measure of the effectiveness of an experimental treatment is obtained not from the treatment alone, but by comparing it with a control.

Examples of this kind led Professor Hugo Muench of Harvard University to formulate his Second Law: "Results can always be improved by omitting controls."⁷

The last example again illustrates how coincidental occurrences, in the absence of controls, can lead to a misinterpretation of an observed effect.⁸ During World War II, rescue workers digging in the ruins of an apartment house blown up in the London blitz found an old man lying naked in a bathtub, fully conscious. He said to his rescuers: "You know, that was the most amazing experience I ever had. When I pulled the plug and the water started down the drain, the whole house blew up."

[Figure: serum cholesterol (vertical axis) against years from start of diet (horizontal axis), with a curve labeled Control; mean difference = 12.7%.]

Figure (Ederer). Changes in serum cholesterol in a controlled, double-masked diet trial. (From Dayton and associates.⁶)

The next question is: Why do we need to randomize? Before answering we should explain what randomization means. It means that the choice of treatment for each unit (for example, patient, eye) should be made by an independent act of randomization, such as the toss of a coin or the use of a table of random numbers. We randomize because it prevents any possibility of bias in the choice of treatment, and it is by far the best method to date to make the groups we want to compare as similar as possible. But why take blind luck? Why not match the groups on important prognostic factors, such as age, sex, and severity of disease? The answer is, first, that we can match only on factors we know, or believe, to be important. We cannot match on factors we do not know about. Secondly, we can match only on factors we can observe or measure. We cannot match on factors we cannot observe or measure. Randomization tends to match on all factors, prognostic and nonprognostic, observable and not observable, measurable and not measurable.
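The claim that randomization tends to balance even unobserved factors can be illustrated with a small simulation. This is a sketch under stated assumptions, not a trial procedure: the 2,000 patients, the 30% frequency of the hidden factor, the seed, and the helper names `randomize` and `share_with_factor` are all hypothetical.

```python
import random


def randomize(units, rng):
    """Assign each unit to a group by an independent, computerized coin toss."""
    return {unit: rng.choice(["treatment", "control"]) for unit in units}


rng = random.Random(1975)  # fixed seed so the sketch is reproducible

# Hypothetical patients, each with a prognostic factor that the
# investigator cannot observe and therefore cannot match on.
patients = {f"patient-{i}": rng.random() < 0.3 for i in range(2000)}

assignment = randomize(patients, rng)


def share_with_factor(group):
    """Fraction of a group's members who carry the hidden factor."""
    members = [p for p, g in assignment.items() if g == group]
    return sum(patients[p] for p in members) / len(members)


print("treatment:", share_with_factor("treatment"))
print("control:  ", share_with_factor("control"))
```

With groups of this size the two fractions should be close to each other and to 0.3, although the assignment never looked at the factor. With very small groups, chance imbalances can be much larger.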

The Coronary Drug Project illustrates how effective randomization is in matching groups (Table 3).⁹ The percentages at baseline with ten given findings are nearly identical in the Atromid-S and placebo groups. The publication lists 44 such findings.

Suppose we do not randomize. What kinds of trouble does that get us into? A number of alternatives to randomization have been used.

TABLE 3
CORONARY DRUG PROJECT

Percent with Given Baseline Findings

Finding                                          Atromid-S    Placebo
Age >45                                             85.2        85.2
Race, nonwhite                                       6.5         6.8
Risk group 2 (high risk)                            33.8        34.3
>2 previous myocardial infarctions                  18.0        19.9
>12 months since last myocardial infarction         85.3        86.1
Electrocardiogram, Class 1 or 2 Q waves             83.6        83.3
Definite angina pectoris                            47.1        46.5
T-wave                                              50.7        49.5
Relative body weight >1.00                          87.9        87.9
Cigarette smokers                                   39.0        37.9

One of these is to use unplanned controls. Some patients are given a drug, and some are not. The drug may be given to clinic patients and private patients may serve as controls, or vice versa. Whatever the method of selection, the basic question is whether the two groups are comparable, and with unplanned controls this question cannot ever be answered affirmatively with assurance.

A sea captain was given samples of anti-nausea pills to test during a voyage. The need for controls was carefully explained to him. Upon return of the ship, the captain reported the results enthusiastically. "Practically every one of the controls was ill, and not one of the subjects had any trouble. Really wonderful stuff." A skeptic asked how he had chosen the controls and the subjects. "Oh, I gave the stuff to my seamen and used the passengers as controls."¹⁰

Not only are unplanned controls hazardous, but when a randomization procedure is developed, it is essential to adhere strictly to it. The investigator is sometimes tempted to depart from the randomization procedure, particularly when he doubles as therapist. The 1930 Lanarkshire milk experiment, which involved 20,000 school children, was a test of the effect of extra milk on their height and weight. Selection of the 10,000 subjects and 10,000 controls was left to the teachers, in some cases "by ballot," in others on an alphabetical system. But there was an escape clause: the teachers were allowed to "improve" the selections if they considered them unbalanced. It was discovered afterward that sympathies had biased the selections so that at the outset the controls were heavier and taller than the experimental subjects. As a result, the experiment ". . . failed to produce a valid estimate of the advantage of giving milk to children."¹¹

Studies have been done in which patients are alternated between treatments A and B or in which selection is determined by whether the hospital number or the day of the month is even or odd. While many clinicians believe these methods to be random, in fact they are not. A post-World War II multi-clinic trial of anticoagulant therapy used the day-of-the-month method, and it was discovered that more patients than expected were admitted on odd days. The investigators reported that "as physicians observed the benefits of anticoagulant therapy, they speeded up, where feasible, the hospitalization of those patients . . . who would routinely have been hospitalized on an even day in order to bring as many as possible under the odd-day deadline."¹²

Another method is to select controls from among those who refuse the treatment. Again we must ask: Are the two groups comparable in all relevant respects? It has been shown that volunteers and nonvolunteers tend to differ in a number of ways.¹³

Still another method is to use one treatment in one center and another treatment in a second center. The Diet-Heart Study example illustrates the kinds of difficulties we can encounter (Table 2). Hospitals differ in kinds of patients, in kinds of doctors, and in kinds of ancillary care, but these differences are often not obvious. Suppose diet B had been administered at the Oakland Center and C at the Twin Cities Center. It would have been concluded that C is more effective than B. Sometimes differences are subtle and escape notice. I have another anti-nausea example. Two drugs and a placebo were tested in several boats. It was argued that all boats were alike and that they would run parallel courses. But the designers of the experiment insisted that all remedies be used on each boat, and this is how it was in fact done. The results were interesting: they showed that the men on one boat had lower illness rates for each remedy than the men on the other boat. It turned out that this boat carried a different ballast. No one would have bothered to look for the ballast difference if each boat had carried a single remedy.
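The hypothetical one-diet-per-center design can be worked through with the Table 2 figures. The sketch below is only an illustration: the numbers are those of Table 2 (with -15.5 read from a poorly printed entry), and the variable names are assumptions made for the example.

```python
# Mean percent serum cholesterol change, as given in Table 2.
oakland = {"B": -11.0, "C": -10.9, "D": -1.8}
twin_cities = {"B": -14.7, "C": -15.5, "D": -7.3}

# Hypothetical design: diet B given only in Oakland, diet C only in the
# Twin Cities. The diet comparison is then also a comparison of centers.
apparent_advantage_of_c = round(twin_cities["C"] - oakland["B"], 1)

# Actual design: both diets given in both centers, so C can be compared
# with B inside each center, where center differences cancel.
within_center = {
    "Oakland": round(oakland["C"] - oakland["B"], 1),
    "Twin Cities": round(twin_cities["C"] - twin_cities["B"], 1),
}

print("C minus B across centers:", apparent_advantage_of_c)
print("C minus B within centers:", within_center)
```

Across centers, C appears to lower cholesterol by 4.5 percentage points more than B. Within each center the difference is under one point (0.1 in Oakland, -0.8 in the Twin Cities), so most of the apparent advantage reflects the center, not the diet.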

Another favorite method is the historical control. A new treatment is tried on a series of patients, and case records are drawn from the same clinic for a previous period when the old treatment was used. Again, the fundamental assumption is that the two series are alike in all relevant respects, and this is impossible to prove. Many things can change over time, and often these changes are subtle and difficult to detect.¹⁴⁻¹⁶

When coronary care units were introduced in the 1960s, attempts were made to compare mortality from coronary heart disease in these units with previous experience in the same hospitals. But referral practice had changed: more, and undoubtedly different, types of patients were admitted after the coronary care units were opened. In ophthalmology, similar changes in referral patterns can occur gradually as an ophthalmologist becomes better known or more rapidly after he acquires a new instrument, such as a laser.

When conditions permit, a patient may be used as his own control, and this is usually an advantageous experimental design. In the case of monocular treatment, one eye may be treated while the fellow eye serves as a control. Here, not the patients, but the eyes are randomized. When treatment is binocular, it may be possible to apply one treatment for several weeks or months, and then switch the patient to another treatment for an equivalent period. Here the sequence of treatments is randomized.
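Both own-control designs can be sketched in a few lines. This is a minimal illustration under assumptions of my own: the function names, the patient identifiers, the treatment labels "A" and "B", and the use of a seeded pseudorandom generator in place of a coin or a table of random numbers are all hypothetical.

```python
import random


def randomize_eyes(patient_ids, rng):
    """Monocular treatment: a coin toss per patient picks the treated eye;
    the fellow eye serves as the control."""
    plan = {}
    for pid in patient_ids:
        treated = rng.choice(["right", "left"])
        control = "left" if treated == "right" else "right"
        plan[pid] = {"treated": treated, "control": control}
    return plan


def randomize_sequence(patient_ids, rng, treatments=("A", "B")):
    """Binocular treatment: a coin toss per patient picks which of two
    treatments is given first; the other is given in the second period."""
    plan = {}
    for pid in patient_ids:
        first = rng.choice(treatments)
        second = treatments[1] if first == treatments[0] else treatments[0]
        plan[pid] = (first, second)
    return plan


rng = random.Random(79)  # fixed seed so the sketch is reproducible
ids = ["patient-1", "patient-2", "patient-3", "patient-4"]

eye_plan = randomize_eyes(ids, rng)
sequence_plan = randomize_sequence(ids, rng)

for pid in ids:
    print(pid, eye_plan[pid], sequence_plan[pid])
```

In either design every patient contributes to both arms of the comparison, so differences between patients do not enter the treatment contrast.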

DISCUSSION

DR. BERNARD SCHWARTZ: Randomization, isn't that a function of the size of the sample which you are testing?

MR. EDERER: If you have two treatments to compare, you can randomize with two patients. I am not recommending this as an adequate sample size, but there is no minimum required for randomization.

REFERENCES

1. Weinstein, L.: Antibiotics. 2. Penicillin. In Goodman, L. S., and Gilman, A. (eds.): The Pharmacological Basis of Therapeutics. New York, Macmillan, 1965, pp. 1193-1195.

2. Ederer, F., and Hiller, R.: Clinical trials, diabetic retinopathy, and photocoagulation. A reanalysis of five studies. Survey Ophthalmol. 19:267, 1975.

3. Rose, G.: Early diagnosis of chronic disease. Br. J. Hosp. Med. 1971, vol. 6.

4. Medical Research Council: Clinical trials of antihistaminic drugs in the prevention and treatment of the common cold. Br. Med. J. 2:425, 1950.

5. National Diet-Heart Study Research Group: The National Diet-Heart Study final report. Circulation 37(Suppl. 1):1, 1968.

6. Dayton, S., Pearce, M. L., Hashimoto, S., Dixon, W. J., and Tomiyasu, U.: A controlled clinical trial of a diet high in unsaturated fat. Circulation 39(Suppl. 2):1, 1969.

7. Bearman, J. E., Loewenson, R. B., and Gullen, W. H.: Muench's Postulates, Laws, and Corollaries. Biometrics Note No. 4, Office of Biometry and Epidemiology, National Eye Institute, April 1974.

8. Chalmers, T. C.: Science versus ethics in human drug trials. Problems and solutions. In Proger, S. (ed.): The Medicated Society. New York, Macmillan, 1968, pp. 181-203.

9. Coronary Drug Project Research Group: The Coronary Drug Project. Design, Methods, and Baseline Results. Am. Heart Assoc. Monograph No. 38. New York, Am. Heart Assoc., 1973.

10. Wilson, E. B.: An Introduction to Scientific Research. New York, McGraw-Hill, 1952, p. 42.

11. "Student": The Lanarkshire milk experiment. Biometrika 23:398, 1931.

12. Wright, I. S., Marple, C. D., and Beck, D. F.: Myocardial Infarction. Its Clinical Manifestations and Treatment with Anticoagulants. New York, Grune and Stratton, 1954, pp. 9-11.

13. Crocetti, A.: Volunteering in Medical Research. Doctoral dissertation. Baltimore, Johns Hopkins University, 1970.

14. Merrel, M.: Clinical therapeutic trial of a new drug. Bull. Johns Hopkins Hosp. 85:223, 1949.

15. Mainland, D.: Statistical ward rounds. Clin. Pharmacol. Ther. 8:876, 1967.

16. Hill, A. B.: Clinical trials. In Principles of Medical Statistics. New York, Oxford University Press, 1971, chap. 20, pp. 251-52.
