
Statistical Science 1998, Vol. 13, No. 2, 95-122

R. A. Fisher in the 21st Century

Invited Paper Presented at the 1996 R. A. Fisher Lecture

Bradley Efron

Abstract. Fisher is the single most important figure in 20th century statistics. This talk examines his influence on modern statistical thinking, trying to predict how Fisherian we can expect the 21st century to be. Fisher's philosophy is characterized as a series of shrewd compromises between the Bayesian and frequentist viewpoints, augmented by some unique characteristics that are particularly useful in applied problems. Several current research topics are examined with an eye toward Fisherian influence, or the lack of it, and what this portends for future statistical developments. Based on the 1996 Fisher lecture, the article closely follows the text of that talk.

Key words and phrases: Statistical inference, Bayes, frequentist, fiducial, empirical Bayes, model selection, bootstrap, confidence intervals.

1. INTRODUCTION

Even scientists need their heroes, and R. A. Fisher was certainly the hero of 20th century statistics. His ideas dominated and transformed our field to an extent a Caesar or an Alexander might have envied. Most of this happened in the second quarter of the century, but by the time of my own education Fisher had been reduced to a somewhat minor figure in American academic statistics, with the influence of Neyman and Wald rising to their high water mark.

There has been a late 20th century resurgence of interest in Fisherian statistics, in England where his influence never much waned, but also in America and the rest of the statistical world. Much of this revival has gone unnoticed because it is hidden behind the dazzle of modern computational methods. One of my main goals here will be to clarify Fisher's influence on modern statistics. Both the strengths and limitations of Fisherian thinking will be described, mainly by example, finally leading up to some speculations on Fisher's role in the statistical world of the 21st century.

Bradley Efron is Max H. Stein Professor of Humanities and Sciences and Professor of Statistics and Biostatistics, Department of Statistics, Stanford University, Stanford, California 94305-4065 (e-mail: brad@stat.stanford.edu).

What follows is basically the text of the Fisher lecture presented to the August 1996 Joint Statistical Meetings in Chicago. The talk format has certain advantages over a standard journal article. First and foremost, it is meant to be absorbed quickly, in an hour, forcing the presentation to concentrate on main points rather than technical details. Spoken language tends to be livelier than the gray prose of a journal paper. A talk encourages bolder distinctions and personal opinions, which are dangerously vulnerable in a written article but appropriate I believe for speculations about the future. In other words, this will be a broad-brush painting, long on color but short on detail.

These advantages may be viewed in a less favorable light by the careful reader. Fisher's mathematical arguments are beautiful in their power and economy, and most of that is missing here. The broad brush strokes sometimes conceal important areas of controversy. Most of the argumentation is by example rather than theory, with examples from my own work playing an exaggerated role. References are minimal, and not indicated in the usual author-year format but rather collected in annotated form at the end of the text. Most seriously, the one-hour limit required a somewhat arbitrary selection of topics, and in doing so I concentrated on those parts of Fisher's work that have been most important to me, omitting whole areas of Fisherian influence such as randomization and experimental design. The result is more a personal essay than a systematic survey.

This is a talk (as I will now refer to it) on Fisher's influence, not mainly on Fisher himself or even his intellectual history. A much more thorough study of the work itself appears in L. J. Savage's famous talk and essay, "On rereading R. A. Fisher," the 1971 Fisher lecture, a brilliant account of Fisher's statistical ideas as sympathetically viewed by a leading Bayesian (Savage, 1976). Thanks to John Pratt's editorial efforts, Savage's talk appeared, posthumously, in the 1976 Annals of Statistics. In the article's discussion, Oscar Kempthorne called it the best statistics talk he had ever heard, and Churchill Eisenhart said the same. Another fine reference is Yates and Mather's introduction to the 1971 five-volume set of Fisher's collected works. The definitive Fisher reference is Joan Fisher Box's 1978 biography, The Life of a Scientist.

It is a good rule never to meet your heroes. I inadvertently followed this rule when Fisher spoke at the Stanford Medical School in 1961, without notice to the Statistics Department. The strength of Fisher's powerful personality is missing from this talk, but not I hope the strength of his ideas. Heroic is a good word for Fisher's attempts to change statistical thinking, attempts that had a profound influence on this century's development of statistics into a major force on the scientific landscape. "What about the next century?" is the implicit question asked in the title, but I won't try to address that question until later.

2. THE STATISTICAL CENTURY

Despite its title, the greater portion of the talk concerns the past and the present. I am going to begin by looking back on statistics in the 20th century, which has been a time of great advancement for our profession. During the 20th century statistical thinking and methodology have become the scientific framework for literally dozens of fields, including education, agriculture, economics, biology and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology and physics.

In other words, we have grown from a small obscure field into a big obscure field. Most people and even most scientists still don't know much about statistics except that there is something good about the number ".05" and perhaps something bad about the bell curve. But I believe that this will change in the 21st century and that statistical methods will be widely recognized as a central element of scientific thinking.

The 20th century began on an auspicious statistical note with the appearance of Karl Pearson's famous χ² paper in the spring of 1900. The groundwork for statistics' growth was laid by a pre-World War II collection of intellectual giants: Neyman, the Pearsons, Student, Kolmogorov, Hotelling and Wald, with Neyman's work being especially influential. But from our viewpoint at the century's end, or at least from my viewpoint, the dominant figure has been R. A. Fisher. Fisher's influence is especially pervasive in statistical applications, but it also runs through the pages of our theoretical journals. With the end of the century in view this seemed like a good occasion for taking stock of the vitality of Fisher's legacy and its potential for future development.

A more accurate but less provocative title for this talk would have been "Fisher's influence on modern statistics." What I will mostly do is examine some topics of current interest and assess how much Fisher's ideas have or have not influenced them. The central part of the talk concerns six research areas of current interest that I think will be important during the next couple of decades. This will also give me a chance to say something about the kinds of applied problems we might be dealing with soon, and whether or not Fisherian statistics is going to be of much help with them.

First though I want to give a brief review of Fisher's ideas and the ideas he was reacting to. One difficulty in assessing the importance of Fisherian statistics is that it's hard to say just what it is. Fisher had an amazing number of important ideas and some of them, like randomization inference and conditionality, are contradictory. It's a little as if in economics Marx, Adam Smith and Keynes turned out to be the same person. So I am just going to outline some of the main Fisherian themes, with no attempt at completeness or philosophical reconciliation. This and the rest of the talk will be very short on references and details, especially technical details, which I will try to avoid entirely.

In 1910, two years before the 20-year-old Fisher published his first paper, an inventory of the statistics world's great ideas would have included the following impressive list: Bayes theorem, least squares, the normal distribution and the central limit theorem, binomial and Poisson methods for count data, Galton's correlation and regression, multivariate distributions, Pearson's χ² and Student's t. What was missing was a core for these ideas. The list existed as an ingenious collection of ad hoc devices. The situation for statistics was similar to the one now faced by computer science.


In Joan Fisher Box's words, "The whole field was like an unexplored archaeological site, its structure hardly perceptible above the accretions of rubble, its treasures scattered throughout the literature."

There were two obvious candidates to provide a statistical core: "objective" Bayesian statistics in the Laplace tradition of using uniform priors for unknown parameters, and a rough frequentism exemplified by Pearson's χ² test. In fact, Pearson was working on a core program of his own through his system of Pearson distributions and the method of moments.

By 1925, Fisher had provided a central core for statistics - one that was quite different and more compelling than either the Laplacian or Pearsonian schemes. The great 1925 paper already contains most of the main elements of Fisherian estimation theory: consistency; sufficiency; likelihood; Fisher information; efficiency; and the asymptotic optimality of the maximum likelihood estimator. Partly missing is ancillarity, which is mentioned but not fully developed until the 1934 paper.

The 1925 paper even contains a fascinating and still controversial section on what Rao has called the second order efficiency of the maximum likelihood estimate (MLE). Fisher, never really satisfied with asymptotic results, says that in small samples the MLE loses less information than competing asymptotically efficient estimators, and implies that this helps solve the problem of small-sample inference (at which point Savage wonders why one should care about the amount of information in a point estimator).

Fisher's great accomplishment was to provide an optimality standard for statistical estimation - a yardstick of the best it's possible to do in any given estimation problem. Moreover, he provided a practical method, maximum likelihood, that quite reliably produces estimators coming close to the ideal optimum even in small samples.

Optimality results are a mark of scientific maturity. I mark 1925 as the year statistical theory came of age, the year statistics went from an ad hoc collection of ingenious techniques to a coherent discipline. Statistics was lucky to get a Fisher at the beginning of the 20th century. We badly need another one to begin the 21st, as will be discussed near the end of the talk.

3. THE LOGIC OF STATISTICAL INFERENCE

Fisher believed that there must exist a logic of inductive inference that would yield a correct answer to any statistical problem, in the same way that ordinary logic solves deductive problems. By using such an inductive logic the statistician would be freed from the a priori assumptions of the Bayesian school.

Fisher's main tactic was to logically reduce a given inference problem, sometimes a very complicated one, to a simple form where everyone should agree that the answer is obvious. His favorite target for the "obvious" was the situation where we observe a single normally distributed quantity x with unknown expectation θ,

(1)  $x \sim N(\theta, \sigma^2)$,

the variance σ² being known. Everyone agrees, says Fisher, that in this case the best estimate is θ̂ = x and the correct 90% confidence interval for θ (to use terminology Fisher hated) is

(2)  $\hat{\theta} \pm 1.645\,\sigma$.

Fisher's inductive logic might be called a theory of types, in which problems are reduced to a small catalogue of obvious situations. This had been tried before in statistics, the Pearson system being a good example, but never so forcefully nor successfully. Fisher was astoundingly resourceful at reducing problems to simple forms like (1). Some of the devices he invented for this purpose were sufficiency, ancillarity and conditionality, transformations, pivotal methods, geometric arguments, randomization inference and asymptotic maximum likelihood theory. Only one major reduction principle has been added to this list since Fisher's time, invariance, and that one is not in universal favor these days.

Fisher always preferred exact small-sample results but the asymptotic optimality of the MLE has been by far the most influential, or at least the most popular, of his reduction principles. The 1925 paper shows that in large samples the MLE θ̂ of an unknown parameter θ approaches the ideal form (1),

$\hat{\theta} \rightarrow N(\theta, \sigma^2)$,

with the variance σ² determined by the Fisher information and the sample size. Moreover, no other "reasonable" estimator of θ has a smaller asymptotic variance. In other words, the maximum likelihood method automatically produces an estimator that can reasonably be termed "optimal," without ever invoking the Bayes theorem.

Fisher's great accomplishment triggered a burst of interest in optimality results. The most spectacular product of this burst was the Neyman-Pearson lemma for optimal hypothesis testing, followed soon by Neyman's theory of confidence intervals. The Neyman-Pearson lemma did for hypothesis testing what Fisher's MLE theory did for estimation, by pointing the way toward optimality.


Philosophically, the Neyman-Pearson lemma fits in well with Fisher's program: using mathematical logic it reduces a complicated problem to an obvious solution without invoking Bayesian priors. Moreover, it is a tremendously useful idea in applications, so that Neyman's ideas on hypothesis testing and confidence intervals now play a major role in day-to-day applied statistics.

However, the success of the Neyman-Pearson lemma triggered new developments, leading to a more extreme form of statistical optimality that Fisher deeply distrusted. Even though Fisher's personal motives are suspect here, his philosophical qualms were far from groundless. Neyman's ideas, as later developed by Wald into decision theory, brought a qualitatively different spirit into statistics.

Fisher's maximum likelihood theory was launched in reaction to the rather shallow Laplacian Bayesianism of the previous century. Fisher's work demonstrated a more stringent approach to statistical inference. The Neyman-Wald decision theoretic school carried this spirit of astringency much further. A strict mathematical statement of the problem at hand, often phrased quite narrowly, followed by an optimal solution became the ideal. The practical result was a more sophisticated form of frequentist inference having enormous mathematical appeal.

Fisher, caught I think by surprise by this flanking attack from his right, complained that the Neyman-Wald decision theorists could be accurate without being correct. A favorite example of his concerned a Cauchy distribution with unknown center θ,

(3)  $f_\theta(x) = \frac{1}{\pi\,[1 + (x - \theta)^2]}$.

Given a random sample $x = (x_1, x_2, \ldots, x_n)$ from (3), decision theorists might try to provide the shortest interval of the form θ̂ ± c that covers the true θ with probability 0.90. Fisher's objection, spelled out in his 1934 paper on ancillarity, was that c should be different for different samples x, depending upon the correct amount of information in x.

The decision theory movement eventually spawned its own counter-reformation. The neo-Bayesians, led by Savage and de Finetti, produced a more logical and persuasive Bayesianism, emphasizing subjective probabilities and personal decision making. In its most extreme form the Savage-de Finetti theory directly denies Fisher's claim of an impersonal logic of statistical inference. There has also been a postwar revival of interest in objectivist Bayesian theory, Laplacian in intent but based on Jeffreys's more sophisticated methods for choosing objective priors, which I shall talk more about later on.

Very briefly then, this is the way we arrived at the end of the 20th century with three competing philosophies of statistical inference: Bayesian; Neyman-Wald frequentist; and Fisherian. In many ways the Bayesian and frequentist philosophies stand at opposite poles from each other, with Fisher's ideas being somewhat of a compromise. I want to talk about that compromise next because it has a lot to do with the popularity of Fisher's methods.

4. THREE COMPETING PHILOSOPHIES

The chart in Figure 1 shows four major areas of disagreement between the Bayesians and the frequentists. These are not just philosophical disagreements. I chose the four categories because they lead to different behavior at the data-analytic level. For each category I have given a rough indication of Fisher's preferred position.

FIG. 1. Four major areas of disagreement between Bayesian and frequentist methods. For each one I have inserted a row of stars to indicate, very roughly, the preferred location of Fisherian inference.

4.1 Individual Decision Making versus Scientific Inference

Bayes theory, and in particular Savage-de Finetti Bayesianism (the kind I'm focusing on here, though later I'll also talk about the Jeffreys brand of objective Bayesianism), emphasizes the individual decision maker, and it has been most successful in fields like business where individual decisions are paramount. Frequentists aim for universal acceptance of their inferences. Fisher felt that the proper realm of statistics was scientific inference, where it is necessary to persuade all or at least most of the world of science that you have reached the correct conclusion. Here Fisher is far over to the frequentist side of the chart (which is philosophically accurate but anachronistic, since Fisher's position predates both the Savage-de Finetti and Neyman-Wald schools).

4.2 Coherence versus Optimality

Bayesian theory emphasizes the coherence of its judgments, in various technical ways but also in the wider sense of enforcing consistency relationships between different aspects of a decision-making situation. Optimality in the frequentist sense is frequently incoherent. For example, the uniform minimum variance unbiased (UMVU) estimate of exp{θ} does not have to equal exp{the UMVU estimate of θ}, and more seriously there is no simple calculus relating the two different estimates. Fisher wanted to have things both ways, coherent and optimal, and in fact maximum likelihood estimation does satisfy

$\widehat{\exp\{\theta\}} = \exp\{\hat{\theta}\}$.

The tension between coherence and optimality is like the correctness-accuracy disagreement concerning the Cauchy example (3), where Fisher argued strongly for correctness. The emphasis on correctness, and a belief in the existence of a logic of statistical inference, moves Fisherian philosophy toward the Bayesian side of Figure 1. Fisherian practice is a less clear story. Different parts of the Fisherian program don't cohere with each other, and in practice Fisher seemed quite willing to sacrifice logical consistency for a neat solution to a particular problem, for example, switching back and forth between frequentist and nonfrequentist justifications of the Fisher information. This kind of case-to-case expediency, which is a common attribute of modern data analysis, has a frequentist flavor. I have located the Fisherian stars for this category a little closer to the Bayesian side of Figure 1, but spreading over a wide range.

4.3 Synthesis versus Analysis

Bayesian decision making emphasizes the collection of information across all possible sources, and the synthesis of that information into the final inference. Frequentists tend to break problems into separate small pieces that can be analyzed separately (and optimally). Fisher emphasized the use of all available information as a hallmark of correct inference, and in this way he is more in sympathy with the Bayesian position.

In this case Fisher tended toward the Bayesian position both in theory and in methodology: maximum likelihood estimation and its attendant theory of approximate confidence intervals based on Fisher information are superbly suited to the combination of information from different sources. (On the other hand, we have this quote from Yates and Mather: "In his own work Fisher was at his best when confronted with small self-contained sets of data. . . . He was never much interested in the assembly and analysis of large amounts of data from varied sources bearing on a given issue." They blame this for his stubbornness on the smoking-cancer controversy. Here as elsewhere we will have to view Fisher as a lapsed Fisherian.)

4.4 Optimism versus Pessimism

This last category is more psychological than philosophical, but it is psychology rooted in the basic nature of the two competing philosophies. Bayesians tend to be more aggressive and risk-taking in their data analyses. There couldn't be a more pessimistic and defensive theory than minimax, to choose an extreme example of frequentist philosophy. It says that if anything can go wrong it will. Of course a minimax person might characterize the Bayesian position as "If anything can go right it will."

Fisher took a middle ground here. He scorns the finer mathematical concerns of the decision theorists ("Not only does it take a cannon to shoot a sparrow, but it misses the sparrow!"), but he fears averaging over the states of nature in a Bayesian way. One of the really appealing features of Fisher's work is its spirit of reasonable compromise, cautious but not overly concerned with pathological situations. This has always struck me as the right attitude toward most real-life problems, and it's certainly a large part of Fisher's dominance in statistical applications.

Looking at Figure 1, I think it is a mistake trying too hard to make a coherent philosophy out of Fisher's theories. From our current point of view they are easier to understand as a collection of extremely shrewd compromises between Bayesian and frequentist ideas. Fisher usually wrote as if he had a complete logic of statistical inference in hand, but that didn't stop him from changing his system when he thought up another landmark idea.

De Finetti, as quoted by Cifarelli and Regazzini, puts it this way: "Fisher's rich and manifold personality shows a few contradictions. His common sense in applications on one hand and his lofty conception of scientific research on the other lead him to disdain the narrowness of a genuinely objectivist formulation, which he regarded as a wooden attitude. He professes his adherence to the objectivist point of view by rejecting the errors of the Bayes-Laplace formulation. What is not so good here is his mathematics, which he handles with mastery in individual problems but rather cavalierly in conceptual matters, thus exposing himself to clear and sometimes heavy criticism. From our point of view it appears probable that many of Fisher's observations and ideas are valid provided we go back to the intuitions from which they spring and free them from the arguments by which he thought to justify them."

Figure 1 describes Fisherian statistics as a compromise between the Bayesian and frequentist schools, but in one crucial way it is not a compromise: in its ease of use. Fisher's philosophy was always expressed in very practical terms. He seemed to think naturally in terms of computational algorithms, as with maximum likelihood estimation, analysis of variance and permutation tests. If anything is going to replace Fisher in the 21st century it will have to be a methodology that is equally easy to apply in day-to-day practice.

5. FISHER'S INFLUENCE ON CURRENT RESEARCH

There are three parts to this talk: past, present and future. The past part, which you have just seen, didn't do justice to Fisher's ideas, but the subject here is more one of influence than ideas, admitting of course that the influence is founded on the ideas' strengths. So now I am going to discuss Fisher's influence on current research.

What follows are several (actually six) examples of current research topics that have attracted a lot of attention recently. No claim of completeness is being made here. The main point I'm trying to make with these examples is that Fisher's ideas are still exerting a powerful influence on developments in statistical theory, and that this is an important indication of their future relevance. The examples will gradually get more speculative and futuristic, and will include some areas of development not satisfactorily handled by Fisher - holes in the Fisherian fabric - where we might expect future work to be more frequentist or Bayesian in motivation.

The examples will also allow me to talk about the new breed of applied problems statisticians are starting to see, the bigger, messier, more complicated data sets that we will have to deal with in the coming decades. Fisherian methods were fashioned to deal with the problems of the 1920s and 1930s. It is not a certainty that they will be equally applicable to the problems of the 21st century - a question I hope to shed at least a little light upon.

5.1 Fisher Information and the Bootstrap

This first example is intended to show how Fisher's ideas can pop up in current work, but be difficult to recognize because of computational advances. First, here is a very brief review of Fisher information. Suppose we observe a random sample $x_1, x_2, \ldots, x_n$ from a density function $f_\theta(x)$ depending on a single unknown parameter θ,

$f_\theta(x) \rightarrow x_1, x_2, \ldots, x_n$.

The Fisher information in any one x is the expected value of minus the second derivative of the log density,

$i_\theta = E_\theta\Bigl\{-\frac{\partial^2}{\partial\theta^2}\log f_\theta(x)\Bigr\}$,

and the total Fisher information in the whole sample is $n\,i_\theta$.

Fisher showed that the asymptotic standard error of the MLE is inversely proportional to the square root of the total information,

(4)  $\mathrm{se}(\hat{\theta}) \doteq \frac{1}{\sqrt{n\,i_\theta}}$,

and that no other consistent and sufficiently regular estimator of θ - essentially no other asymptotically unbiased estimator - can do better.

A tremendous amount of philosophical interpretation has been attached to $i_\theta$, concerning the meaning of statistical information, but in practice Fisher's formula (4) is most often used simply as a handy estimate of the standard error of the MLE. Of course, (4) by itself cannot be used directly because $i_\theta$ involves the unknown parameter θ. Fisher's tactic, which seems obvious but in fact is quite central to Fisherian methodology, is to plug in the MLE θ̂ for θ in (4), giving a usable estimate of standard error,

(5)  $\widehat{\mathrm{se}} = \frac{1}{\sqrt{n\,i_{\hat{\theta}}}}$.

Here is an example of formula (5) in action. Figure 2 shows the results of a small study designed to test the efficacy of an experimental antiviral drug.


FIG. 2. The cd4 data; 20 AIDS patients had their cd4 counts measured before and after taking an experimental drug; correlation coefficient θ̂ = 0.723.

A total of n = 20 AIDS patients had their cd4 counts measured before and after taking the drug, yielding data

$x_i = (\mathrm{before}_i, \mathrm{after}_i)$ for $i = 1, 2, \ldots, 20$.

The Pearson sample correlation coefficient was θ̂ = 0.723. How accurate is this estimate?

If we assume a bivariate normal model for the data,

(6)  $N_2(\mu, \Sigma) \rightarrow x_1, x_2, x_3, \ldots, x_{20}$,

the notation indicating a random sample of 20 pairs from a bivariate normal distribution with expectation vector μ and covariance matrix Σ, then θ̂ is the MLE for the true correlation coefficient θ. The Fisher information for estimating θ turns out to be $i_\theta = 1/(1 - \theta^2)^2$ (after taking proper account of the "nuisance parameters" in (6) - one of those technical points I am avoiding in this talk) so (5) gives estimated standard error

$\widehat{\mathrm{se}} = \frac{1 - \hat{\theta}^2}{\sqrt{20}} = 0.107$.
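A minimal Python sketch of this plug-in calculation, using just the two numbers quoted above:

```python
import math

# Plug-in (Fisher information) standard error for the cd4 correlation.
# Under the bivariate normal model, i_theta = 1 / (1 - theta^2)^2, so
# formula (5) reduces to se = (1 - theta_hat^2) / sqrt(n).
n = 20
theta_hat = 0.723                        # sample correlation from Figure 2

i_hat = 1.0 / (1.0 - theta_hat**2) ** 2  # Fisher information at the MLE
se_hat = 1.0 / math.sqrt(n * i_hat)      # formula (5)
print(round(se_hat, 3))                  # 0.107, as quoted in the text
```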

Here is a bootstrap estimate of standard error for the same problem, also assuming that the bivariate normal model is correct. In this context the bootstrap samples are generated from model (6), but with estimates μ̂ and Σ̂ substituted for the unknown parameters μ and Σ:

$N_2(\hat{\mu}, \hat{\Sigma}) \rightarrow x_1^*, x_2^*, x_3^*, \ldots, x_{20}^* \rightarrow \hat{\theta}^*$,

where θ̂* is the sample correlation coefficient for the bootstrap data set $x_1^*, x_2^*, x_3^*, \ldots, x_{20}^*$.

This whole process was independently repeated 2,000 times, giving 2,000 bootstrap correlation coefficients θ̂*. Figure 3 shows their histogram. The empirical standard deviation of the 2,000 θ̂* values is

$\widehat{\mathrm{se}}_{\mathrm{boot}} = 0.112$,

which is the normal-theory bootstrap estimate of standard error for θ̂; 2,000 is 10 times more than needed for a standard error, but we will need all 2,000 later for the discussion of approximate confidence intervals.
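In code, the whole procedure is a short loop. The sketch below is an illustration under stated assumptions: the 20 (before, after) pairs are not listed in the text, so a randomly generated stand-in array named cd4 takes their place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 20 (before, after) cd4 pairs, which are not listed
# in the text; any 20 x 2 array of real measurements would do instead.
cd4 = rng.multivariate_normal([3.0, 4.0], [[1.0, 0.7], [0.7, 1.0]], size=20)

mu_hat = cd4.mean(axis=0)              # plug-in estimate of mu
sigma_hat = np.cov(cd4, rowvar=False)  # plug-in estimate of Sigma

B = 2000
theta_star = np.empty(B)
for b in range(B):
    # Draw a bootstrap sample from the plugged-in bivariate normal (6)
    # and record its sample correlation coefficient, theta-hat-star.
    x_star = rng.multivariate_normal(mu_hat, sigma_hat, size=20)
    theta_star[b] = np.corrcoef(x_star[:, 0], x_star[:, 1])[0, 1]

se_boot = theta_star.std(ddof=1)  # empirical sd of the 2,000 replications
print(se_boot)  # close to the Fisher-information value for data like cd4
```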

5.2 The Plug-in Principle

The Fisher information and bootstrap standard error estimates, 0.107 and 0.112, are quite close to each other. This is no accident. Despite the fact that they look completely different, the two methods are doing very similar calculations. Both are using the "plug-in principle" as a crucial step in getting the answer.

Here is a plug-in description of the two methods:

• Fisher information - (i) compute an approximate formula for the standard error of the sample correlation coefficient as a function of the unknown parameters (μ, Σ); (ii) plug in estimates (μ̂, Σ̂) for the unknown parameters (μ, Σ) in the formula;

• bootstrap - (i) plug in (μ̂, Σ̂) for the unknown parameters (μ, Σ) in the mechanism generating the data; (ii) compute the standard error of the sample correlation coefficient, for the plugged-in mechanism, by Monte Carlo simulation.

The two methods proceed in reverse order, "compute and then plug in" versus "plug in and then compute," but this is a relatively minor technical difference. The crucial step in both methods, and the only statistical inference going on, is the substitution of the estimates (μ̂, Σ̂) for the unknown parameters (μ, Σ), in other words the plug-in principle. Fisherian inference makes frequent use of the plug-in principle, and this is one of the main reasons that Fisher's methods are so convenient to use in practice. All possible inferential questions are answered by simply plugging in estimates, usually maximum likelihood estimates, for unknown parameters.

FIG. 3. Histogram of 2,000 bootstrap correlation coefficients; bivariate normal sampling model.

The Fisher information method involves cleverer mathematics than the bootstrap, but it has to because we enjoy a $10^7$ computational advantage over Fisher. A year's combined computational effort by all the statisticians of 1925 wouldn't equal a minute of modern computer time. The bootstrap exploits this advantage to numerically extend Fisher's calculations to situations where the mathematics becomes hopelessly complicated. One of the less attractive aspects of Fisherian statistics is its overreliance on a small catalog of simple parametric models like the normal, understandable enough given the limitations of the mechanical calculators Fisher had to work with.

Modern computation has given us the opportunity to extend Fisher's methods to a much wider class of models, including nonparametric ones (the more usual arena of the bootstrap). We are beginning to see many such extensions, for example, the extension of discriminant analysis to CART, and the extension of linear regression to generalized additive models.

6. THE STANDARD INTERVALS

I want to continue the cd4 example, but proceeding from standard errors to confidence intervals. The confidence interval story illustrates how computer-based inference can be used to extend Fisher's ideas in a more ambitious way.

The MLE and its estimated standard error were used by Fisher to form approximate confidence intervals, which I like to call the standard intervals because of their ubiquity in day-to-day practice,

(7)  $\hat{\theta} \pm 1.645\,\widehat{\mathrm{se}}$.

The constant, 1.645, gives intervals of approximate 90% coverage for the unknown parameter θ, with 5% noncoverage probabilities at each end of the interval. We could use 1.96 instead of 1.645 for 95% coverage, and so on, but here I'll stick to 90%.

The standard intervals follow from Fisher's result that θ̂ is asymptotically normal, unbiased and with standard error fixed by the sample size and the Fisher information,

(8)  $\hat{\theta} \rightarrow N(\theta, \mathrm{se}^2)$,

as in (4). We recognize (8) as one of Fisher's ideal "obvious" forms.

If usage determines importance then the standard intervals were Fisher's most important invention. Their popularity is due to a combination of optimality, or at least asymptotic optimality, with computational tractability. The standard intervals are:

• accurate - their noncoverage probabilities, which are supposed to be 0.05 at each end of the interval, are actually

(9)  $0.05 + c/\sqrt{n}$,

where c depends on the situation, so as the sample size n gets large we approach the nominal value 0.05 at rate $n^{-1/2}$;

• correct - the estimated standard error based on the Fisher information is the minimum possible for any asymptotically unbiased estimate of θ, so interval (7) doesn't waste any information nor is it misleadingly optimistic;

• automatic - θ̂ and ŝe are computed from the same basic algorithm no matter how complicated the problem may be.

Despite these advantages, applied statisticians know that the standard intervals can be quite inaccurate in small samples. This is illustrated in the left panel of Figure 4 for the cd4 correlation example, where we see that the standard interval endpoints lie far to the right of the endpoints for the normal-theory exact 90% central confidence interval. In fact, we can see from the bootstrap histogram (reproduced from Figure 3) that in this case the asymptotic normality of the MLE hasn't taken hold at n = 20, so that there is every reason to doubt the standard interval. Being able to look at the histogram, which has a lot of information in it, is a luxury Fisher did not have.

Fisher suggested a fix for this specific situation: transform the correlation coefficient to φ̂ = tanh⁻¹(θ̂), that is, to

(10)  $\hat{\phi} = \frac{1}{2}\log\frac{1 + \hat{\theta}}{1 - \hat{\theta}}$,

apply the standard method on this scale and then transform the standard interval back to the θ scale. This was another one of Fisher's ingenious reduction methods. The tanh⁻¹ transformation greatly accelerates convergence to normality, as we can see from the histogram of the 2,000 values of $\hat{\phi}^* = \tanh^{-1}(\hat{\theta}^*)$ in the right panel of Figure 4, and makes the standard intervals far more accurate. However, we have now lost the "automatic" property of the standard intervals. The tanh⁻¹ transformation works only for the normal correlation coefficient and not for most other problems.
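Concretely, both versions of the 90% interval take a few lines. The sketch below is my illustration: the raw-scale computation reproduces the "Standard" column of Table 1, and the transformed computation uses the classical normal-theory value 1/√(n - 3) for the standard error of φ̂ (an assumption, since the talk does not spell out this step).

```python
import math

n, theta_hat, z = 20, 0.723, 1.645        # z gives 90% central coverage

# Standard interval (7) on the raw correlation scale.
se_hat = (1 - theta_hat**2) / math.sqrt(n)
raw = (theta_hat - z * se_hat, theta_hat + z * se_hat)
print([round(e, 3) for e in raw])         # [0.547, 0.899], as in Table 1

# Fisher's fix: map to the phi scale (10), apply (7) there, map back.
phi_hat = math.atanh(theta_hat)           # tanh^{-1} transformation
se_phi = 1.0 / math.sqrt(n - 3)           # classical se of Fisher's z
trans = (math.tanh(phi_hat - z * se_phi), math.tanh(phi_hat + z * se_phi))
print([round(e, 3) for e in trans])       # about [0.47, 0.86], far closer
                                          # to the exact endpoints
```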

The standard intervals take literally the large sample approximation $\hat{\theta} \sim N(\theta, \mathrm{se}^2)$, which says that θ̂ is normally distributed, is unbiased for θ and has a constant standard error. A more careful look at the asymptotics shows that each of these three assumptions can fail in a substantial way: the sampling distribution of θ̂ can be skewed; θ̂ can be biased as an estimate of θ; and its standard error can change with θ. Modern computation makes it practical to correct all three errors. I am going to mention two methods of doing so, the first using the bootstrap histogram, the second based on likelihood methods.

FIG. 4. (Left panel) Endpoints of exact 90% confidence interval for cd4 correlation coefficient (solid lines) are much different than standard interval endpoints (dashed lines), as suggested by the nonnormality of the bootstrap histogram. (Right panel) Fisher's transformation normalizes the bootstrap histogram and makes the standard interval more accurate.

It turns out that there is enough information in the bootstrap histogram to correct all three errors of the standard intervals. The result is a system of approximate confidence intervals an order of magnitude more accurate, with noncoverage probabilities

$0.05 + c/n$,

compared to (9), achieving what is called second order accuracy. Table 1 demonstrates the practical advantages of second order accuracy. In most situations we would not have exact endpoints as a "gold standard" for comparison, but second order accuracy would still point to the superiority of the bootstrap intervals.
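The simplest histogram-based construction is the percentile interval, which reads the 5th and 95th percentiles straight off the 2,000 θ̂* values; the intervals behind Table 1 add bias and skewness corrections (the BCa refinement) to achieve second order accuracy. A hedged sketch, with a synthetic stand-in for the bootstrap replications:

```python
import numpy as np

def percentile_interval(theta_star, coverage=0.90):
    # Read the interval straight off the bootstrap histogram.
    alpha = 100 * (1.0 - coverage) / 2.0
    return np.percentile(theta_star, [alpha, 100 - alpha])

# Synthetic stand-in for the 2,000 bootstrap correlations of Figure 3
# (the real theta_star array comes from the bootstrap loop above).
rng = np.random.default_rng(1)
theta_star = np.tanh(rng.normal(0.91, 0.24, size=2000))

print(percentile_interval(theta_star))
# The endpoints are pulled left of the standard interval (7), in the
# direction of the exact endpoints; BCa adds bias and acceleration
# corrections to reach the 0.05 + c/n noncoverage rate.
```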

The bootstrap method, and also the likelihood-based methods of the next section, are transformation invariant; that is, they give the same interval for the correlation coefficient whether or not you go through the tanh⁻¹ transformation. In this sense they automate Fisher's wonderful transformation trick.

I like this example because it shows how a basic Fisherian construction, the standard intervals, can be extended by modern computation. The extension lets us deal easily with very complicated probability models, even nonparametric ones, and also with complicated statistics such as a coefficient in a stepwise robust regression.

Moreover, the extension is not just to a wider set of applications. Some progress in understanding the theoretical basis of approximate confidence intervals is made along the way. Other topics are springing up in the same fashion. For example, Fisher's 1925 work on the information loss for insufficient estimators has transmuted into our modern theories of the EM algorithm and Gibbs sampling.

7. CONDITIONAL INFERENCE, ANCILLARITY AND THE MAGIC FORMULA

Table 2 shows the occurrence of a very undesirable side effect in a randomized experiment that will be described more fully later. The treatment produces a smaller ratio of these undesirable effects than does the control, the sample log odds ratio being

$\hat{\theta} = \log\Bigl(\frac{1/15}{13/3}\Bigr) = -4.2$.

TABLE 1
Endpoints of exact and approximate 90% confidence intervals for the cd4 correlation coefficient, assuming bivariate normality

         Exact    Bootstrap    Standard
0.05     0.464    0.468        0.547
0.95     0.859    0.856        0.899

TABLE 2
The occurrence of adverse events in a randomized experiment; sample log odds ratio θ̂ = -4.2

             Yes    No
Treatment      1    15    16
Control       13     3    16
              14    18

Fisher wondered how one might make appropriate inferences for θ, the true log odds ratio. The trouble here is nuisance parameters. A multinomial model for the 2 × 2 table has three free parameters, representing four cell probabilities constrained to add up to 1, and in some sense two of the three parameters have to be eliminated in order to get at θ. To do this Fisher came up with another device for reducing a complicated situation to a simple form.

Fisher showed that if we condition on the marginals of the table, then the conditional density of θ̂ given the marginals depends only on θ. The nuisance parameters disappear. This conditioning is "correct," he argued, because the marginals are acting as what might be called approximate ancillary statistics. That is, they do not carry much direct information concerning the value of θ, but they have something to say about how accurately θ̂ estimates θ. Later Neyman gave a much more specific frequentist justification for conditioning on the marginals, through what is now called Neyman structure.

For the data in Table 2, the conditional distribution of θ̂ given the marginals yields [-6.3, -2.4] as a 90% confidence interval for θ, ruling out the null hypothesis value θ = 0 where Treatment equals Control. However, the conditional distribution is not easy to calculate, even in this simple case, and it becomes prohibitive in more complicated situations.
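Today the computation is routine. The sketch below is my reconstruction of one standard convention (5% exact conditional probability in each tail): conditional on all the margins of Table 2, the Treatment-Yes count A follows a noncentral hypergeometric distribution whose only parameter is the log odds ratio θ. Different treatments of the discrete tails shift the endpoints somewhat, so this sketch brackets rather than reproduces the [-6.3, -2.4] quoted above.

```python
import math
from scipy.optimize import brentq
from scipy.special import comb

# Table 2 margins: row totals 16 and 16, "Yes" column total 14.
a_obs = 1                      # observed Treatment-Yes count
row1 = row2 = 16
col1 = 14
support = list(range(max(0, col1 - row2), min(row1, col1) + 1))

def cond_probs(theta):
    # Conditional distribution of the Treatment-Yes count A given all
    # margins: noncentral hypergeometric, depending only on theta.
    w = [comb(row1, k, exact=True) * comb(row2, col1 - k, exact=True)
         * math.exp(k * theta) for k in support]
    total = sum(w)
    return {k: wk / total for k, wk in zip(support, w)}

def tail(theta, upper):
    p = cond_probs(theta)
    return sum(pk for k, pk in p.items()
               if (k >= a_obs if upper else k <= a_obs))

# Equal-tailed exact 90% interval: 5% conditional probability per tail.
lo = brentq(lambda t: tail(t, True) - 0.05, -20.0, 5.0)
hi = brentq(lambda t: tail(t, False) - 0.05, -20.0, 5.0)
print(round(lo, 1), round(hi, 1))  # a somewhat wider interval containing
                                   # the [-6.3, -2.4] quoted in the text
```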

In his 1934 paper, which was the capstone of Fisher's work on efficient estimation, he solved the conditioning problem for translation families. Suppose that $x = (x_1, x_2, \ldots, x_n)$ is a random sample from a Cauchy distribution (3) and that we wish to use x to make inferences about θ, the unknown center point of the distribution. In this case there is a genuine ancillary statistic A, the vector of spacings between the ordered values of x. Again Fisher argued that correct inferences about θ should be based on $f_\theta(\hat{\theta} \mid A)$, the conditional density of the MLE θ̂ given the ancillary A, not on the unconditional density $f_\theta(\hat{\theta})$.

Fisher also provided a wonderful trick for calculating $f_\theta(\hat{\theta} \mid A)$. Let L(θ) be the likelihood function: the unconditional density of the whole sample, considered as a function of θ with x fixed. Then it turns out that

(11)  $f_\theta(\hat{\theta} \mid A) = c\,\frac{L(\theta)}{L(\hat{\theta})}$,

where c is a constant. Formula (11) allows us to compute the conditional density $f_\theta(\hat{\theta} \mid A)$ from the likelihood, which is easy to calculate. It also hints at a deep connection between likelihood-based inference, a Fisherian trademark, and frequentist methods.

Despite this promising start, the promise went unfulfilled in the years following 1934. The trouble was that formula (11) applies only in very special circumstances, not including the 2 × 2 table example, for instance. Recently, though, there has been a revival of interest in likelihood-based conditional inference. Durbin, Barndorff-Nielsen, Hinkley and others have developed a wonderful generalization of (11) that applies to a wide variety of problems having approximate ancillaries, the so-called magic formula

(12)  $f_\theta(\hat{\theta} \mid A) = c\,\frac{L(\theta)}{L(\hat{\theta})}\Bigl\{-\frac{d^2}{d\theta^2}\log L(\theta)\Big|_{\theta = \hat{\theta}}\Bigr\}^{1/2}$.

The bracketed factor is constant in the Cauchy situation, reducing (12) back to (11).
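Formula (11) is easy to exercise numerically. In the sketch below (the five sample values are made up for illustration), the spacings of the fixed sample determine A, and normalizing the likelihood ratio over a grid recovers the conditional density that Fisher's argument calls for:

```python
import numpy as np

# A made-up Cauchy sample; its spacings fix the ancillary A.
x = np.array([-1.1, 0.3, 0.8, 4.2, 5.0])

def log_lik(theta_grid):
    # Cauchy log likelihood from density (3), up to the constant in pi.
    return -np.log(1.0 + (x - theta_grid[:, None]) ** 2).sum(axis=1)

grid = np.linspace(-10.0, 15.0, 4001)
ll = log_lik(grid)
theta_hat = grid[ll.argmax()]            # MLE, to grid accuracy

# Formula (11): f_theta(theta-hat | A) = c L(theta) / L(theta-hat).
# Normalizing the likelihood ratio over the grid supplies c.
ratio = np.exp(ll - ll.max())            # L(theta) / L(theta_hat)
density = ratio / (ratio.sum() * (grid[1] - grid[0]))

# 'density' measures how accurately theta-hat pins down theta for THIS
# configuration of spacings - Fisher's "correct" amount of information.
```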

Likelihood-based conditional inference has been pushed forward in current work by Fraser, Cox and Reid, McCullagh, Barndorff-Nielsen, Pierce, DiCiccio and many others. It represents a major effort to perfect and extend Fisher's goal of an inferential system based directly on likelihoods.

In particular the magic formula can be used to generate approximate confidence intervals that are more accurate than the standard intervals, at least second order accurate. These intervals agree to second order with the bootstrap intervals. If this were not true, then one or both of them would not be second order correct. Right now it looks like attempts to improve upon the standard intervals are converging from two directions: likelihood and bootstrap.

Results like (12) have enormous potential. Likelihood inference is the great unfulfilled promise of Fisherian statistics - the promise of a theory that directly interprets likelihood functions in a way that simultaneously satisfies Bayesians and frequentists. Fulfilling that promise, even partially, would greatly influence the shape of 21st century statistics.

8. FISHER’S BIGGEST BLUNDER

Now I'll start edging gingerly into the 21st century by discussing some topics where Fisher's ideas have not been dominant, but where they might or might not be important in future developments. I am going to begin with the fiducial distribution, generally considered to be Fisher's biggest blunder. But in Arthur Koestler's words, "The history of ideas is filled with barren truths and fertile errors." If fiducial inference is an error it certainly has been a fertile one.

In terms of Figure 1, the Bayesian-frequentist comparison chart, fiducial inference was Fisher's closest approach to the Bayesian side of the ledger. Fisher was trying to codify an objective Bayesianism in the Laplace tradition but without using Laplace's ad hoc uniform prior distributions. I believe that Fisher's continuing devotion to fiducial inference had two major influences, a negative reaction against Neyman's ideas and a positive attraction to Jeffreys's point of view.

The solid line in Figure 5 is the fiducial density for a binomial parameter θ having observed 3 successes in 10 trials,

$s \sim \mathrm{Binomial}(n, \theta)$, with s = 3 and n = 10.

Also shown is an approximate fiducial density that I will refer to later. Fisher's fiducial theory at its boldest treated the solid curve as a genuine a posteriori density for θ even though, or perhaps because, no prior assumptions had been made.

8.1 The Confidence Density

We could also call the fiducial distribution the "confidence density" because this is an easy way to motivate the fiducial construction. As I said earlier, Fisher would have hated this name.

Suppose that for every value of α between 0 and 1 we have an upper 100αth confidence limit θ̂[α] for θ, so that by definition

$\mathrm{prob}\{\theta < \hat{\theta}[\alpha]\} = \alpha$.

We can interpret this as a probability distribution for θ given the data if we are willing to accept the classic wrong interpretation of confidence:

θ is in the interval (θ̂[0.90], θ̂[0.91]) with probability 0.01, and so on.

Going to the continuous limit gives the "confidence density," a name Neyman would have hated.

FIG. 5. Fiducial density for a binomial parameter θ having observed 3 successes out of 10 trials. The dashed line is an approximation that is useful in complicated situations.

The confidence density is the fiducial distribution, at least in those cases where Fisher would have considered the confidence limits to be inferentially correct. The fiducial distribution in Figure 5 is the confidence density based on the usual confidence limits for θ (taking into account the discrete nature of the binomial distribution): θ̂[α] is the value of θ such that $S \sim \mathrm{Binomial}(10, \theta)$ satisfies

$\mathrm{prob}\{S > 3\} + \tfrac{1}{2}\,\mathrm{prob}\{S = 3\} = \alpha$.

Fisher was uncomfortable applying fiducial arguments to discrete distributions because of the ad hoc continuity corrections required, but the difficulties caused are more theoretical than practical.
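The recipe above translates directly into code. A sketch (my illustration of the construction, using the half-correction just described):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom

s, n = 3, 10

def alpha_of_theta(theta):
    # prob{S > 3} + (1/2) prob{S = 3}, with S ~ Binomial(10, theta).
    return binom.sf(s, n, theta) + 0.5 * binom.pmf(s, n, theta)

def upper_limit(alpha):
    # theta-hat[alpha]: the theta solving the defining equation above.
    return brentq(lambda t: alpha_of_theta(t) - alpha, 1e-9, 1 - 1e-9)

alphas = np.linspace(0.001, 0.999, 999)
thetas = np.array([upper_limit(a) for a in alphas])

# The confidence density is d(alpha)/d(theta); differentiate numerically.
conf_density = np.gradient(alphas, thetas)
# Plotting conf_density against thetas traces the solid curve of Figure 5.
```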

The advantage of stating fiducial ideas in terms of the confidence density is that they then can be applied to a wider class of problems. We can use the approximate confidence intervals mentioned earlier, either the bootstrap or the likelihood ones, to get approximate fiducial distributions even in very complicated situations having lots of nuisance parameters. (The dashed curve in Figure 5 is the confidence density based on approximate bootstrap intervals.) And there are practical reasons why it would be very convenient to have good approximate fiducial distributions, reasons connected with our profession's 250-year search for a dependable objective Bayes theory.

8.2 Objective Bayes

By "objective Bayes" I mean a Bayesian theory in which the subjective element is removed from the choice of prior distribution; in practical terms a universal recipe for applying Bayes theorem in the absence of prior information. A widely accepted objective Bayes theory, which fiducial inference was intended to be, would be of immense theoretical and practical importance.

I have in mind here dealing with messy, complicated problems where we are trying to combine information from disparate sources - doing a metaanalysis, for example. Bayesian methods are particularly well-suited to such problems. This is particularly true now that techniques like the Gibbs sampler and Markov chain Monte Carlo are available for integrating the nuisance parameters out of high-dimensional posterior distributions.

The trouble of course is that the statistician still has to choose a prior distribution in order to use Bayes's theorem. An unthinking use of uniform priors is no better now than it was in Laplace's day. A lot of recent effort has been put into the development of uninformative or objective prior distributions, priors that eliminate nuisance parameters safely while remaining neutral with respect to the parameter of interest. Kass and Wasserman's 1996 JASA article reviews current developments by Berger, Bernardo and many others, but the task of finding genuinely objective priors for high-dimensional problems remains daunting.

Fiducial distributions, or confidence densities, offer a way to finesse this difficulty. A good argument can be made that the confidence density is the posterior density for the parameter of interest, after all of the nuisance parameters have been integrated out in an objective way. If this argument turns out to be valid, then our progress in constructing approximate confidence intervals, and approximate confidence densities, could lead to an easier use of Bayesian thinking in practical problems.

This is all quite speculative, but here is a safe prediction for the 21st century: statisticians will be asked to solve bigger and more complicated problems. I believe that there is a good chance that objective Bayes methods will be developed for such problems, and that something like fiducial inference will play an important role in this development. Maybe Fisher's biggest blunder will become a big hit in the 21st century!

9. MODEL SELECTION

Model selection is another area of statistical research where important developments seem to be building up, but without a definitive breakthrough. The question asked here is how to select the model itself, not just the continuous parameters of a given model, from the observed data. F-tests, and "F" stands for Fisher, help with this task, and are certainly the most widely used model selection techniques. However, even in relatively simple problems things can get complicated fast, as anyone who has gotten lost in a tangle of forward and backward stepwise regression programs can testify.

The fact is that classic Fisherian estimation and testing theory are a good start, but not much more than that, on model selection. In particular, maximum likelihood estimation theory and model fitting do not account for the number of free parameters being fit, and that is why frequentist methods like Mallows' $C_p$, the Akaike information criterion and cross-validation have evolved. Model selection seems to be moving away from its Fisherian roots.

Now statisticians are starting to see really complicated model selection problems, with thousands and even millions of data points and hundreds of candidate models. A thriving area called machine learning has developed to handle such problems, in ways that are not yet very well connected to statistical theory.

Table 3, taken from Gail Gong's 1982 thesis, shows part of the data from a model selection problem that is only moderately complicated by today's standards, though hopelessly difficult from a prewar viewpoint. A "training set" of 155 chronic hepatitis patients were measured on 19 diagnostic prediction variables. The outcome variable y was whether or not the patient died from liver failure (122 lived, 33 died), the goal of the study being to develop a prediction rule for y in terms of the diagnostic variables.

In order to predict the outcome, a logistic regression model was built up in three steps:

• Individual logistic regressions were run for each of the 19 predictors, yielding 13 that were significant at the 0.05 level.

• A forward stepwise logistic regression program, including only those patients with none of the 13 predictors missing, retained 5 of the 13 predictors at significance level 0.10.

• A second forward stepwise logistic regression program, including those patients with none of the 5 predictors missing, retained 4 of the 5 at significance level 0.05.

These last four variables,

(13) ascites, (15) bilirubin, (7) malaise, (20) histology,

were deemed the "important predictors." The logistic regression based on them misclassified 16% of the 155 patients, with cross-validation suggesting a true error rate of about 20%.
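The three-step procedure is concrete enough to sketch in code. The following Python fragment is a minimal reconstruction, not Gong's actual program: the file name hepatitis.csv and its column layout are hypothetical, the entry tests use Wald p-values as a stand-in for whatever tests the original stepwise software used, and no guard against nonconvergence is included.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical layout: 155 rows, predictor columns x1..x19, outcome
# column y (0 = lived, 1 = died); negative entries code missing data.
df = pd.read_csv("hepatitis.csv")
df[df < 0] = np.nan

predictors = [c for c in df.columns if c != "y"]

def pvalue(y, x):
    """Wald p-value for the slope in a one-predictor logistic regression."""
    keep = x.notna()
    fit = sm.Logit(y[keep], sm.add_constant(x[keep])).fit(disp=0)
    return fit.pvalues.iloc[1]

def forward_stepwise(data, candidates, alpha):
    """Greedy forward selection on complete cases: repeatedly add the
    candidate with the smallest entry p-value while it is below alpha."""
    data = data[candidates + ["y"]].dropna()
    chosen = []
    while True:
        best, best_p = None, alpha
        for c in candidates:
            if c in chosen:
                continue
            fit = sm.Logit(data["y"], sm.add_constant(data[chosen + [c]])).fit(disp=0)
            if fit.pvalues[c] < best_p:
                best, best_p = c, fit.pvalues[c]
        if best is None:
            return chosen
        chosen.append(best)

# Step 1: 19 individual logistic regressions, keep p < 0.05.
step1 = [c for c in predictors if pvalue(df["y"], df[c]) < 0.05]
# Step 2: forward stepwise on the survivors at level 0.10.
step2 = forward_stepwise(df, step1, alpha=0.10)
# Step 3: forward stepwise again at level 0.05.
step3 = forward_stepwise(df, step2, alpha=0.05)
print("important predictors:", step3)
```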

A crucial question concerns the validity of the selected model. Should we take the four "important predictors" very seriously in a medical sense? The bootstrap answer seems to be "probably not," even though it was natural for the medical investigators to do so given the impressive amount of statistical machinery involved in their selection.

Gail Gong resampled the 155 patients, taking as a unit each patient's entire record of 19 predictors and response. For each bootstrap data set of 155 resampled records, she reran the three-stage logistic regression model, yielding a bootstrap set of "important predictors." This was done 500 times. Figure 6 shows the important predictors for the final 25 bootstrap data sets. The first of these is (13, 7, 20, 15), agreeing except for order with the set (13, 15, 7, 20) from the original data. This didn't happen in any other of the 499 bootstrap cases.


TABLE 3
155 chronic hepatitis patients were measured on 19 diagnostic variables; data shown for the last 11 patients; outcome y is 0 or 1 as patient lived or died; negative numbers indicate missing data

[The rotated data table did not survive extraction; its individual entries are not recoverable. The columns are numbered 1-20: (1) constant, (2) age, (3) sex, (4) steroid, (5) antiviral, (6) fatigue, (7) malaise, (8) anorexia, (9) liver big, (10) liver firm, (11) spleen palpable, (12) spiders, (13) ascites, (14) varices, (15) bilirubin, (16) alk phos, (17) SGOT, (18) albumin, (19) protein, (20) histology, plus the outcome y.]

FIG. 6. The set of "important predictors" selected in the last 25 of 500 bootstrap replications of the three-step logistic regression model selection program; original choices were (13, 15, 7, 20).

In all 500 bootstrap replications only variable 20, histology, which appeared 295 times, was "important" more than half of the time. These results certainly discourage confidence in the causal nature of the predictor variables (13, 15, 7, 20).

Or do they? It seems like we should be able to use the bootstrap results to quantitatively assess the validity of the various predictors. Perhaps they could also help in selecting a better prediction model. Questions like these are being asked these days, but the answers so far are more intriguing than conclusive.
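Gong's experiment is straightforward to mimic once the whole selection procedure is wrapped in a function. Here is a hedged sketch, reusing the hypothetical df, predictors, pvalue and forward_stepwise objects from the previous fragment; real runs need more care, since a bootstrap sample can defeat the logistic fits entirely (perfect separation, empty categories).

```python
from collections import Counter

def select(data):
    """The full three-step selection applied to one data set."""
    step1 = [c for c in predictors if pvalue(data["y"], data[c]) < 0.05]
    step2 = forward_stepwise(data, step1, alpha=0.10)
    return forward_stepwise(data, step2, alpha=0.05)

counts = Counter()
for b in range(500):
    # Resample whole patient records (all 19 predictors plus y),
    # then rerun the entire selection from scratch.
    boot = df.sample(n=len(df), replace=True, random_state=b)
    try:
        counts.update(select(boot))
    except Exception:
        pass  # skip replications where a fit fails

# How often was each variable declared "important"?
print(counts.most_common())
```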

It is not clear to me whether Fisherian methods will play much of a role in the further progress of model selection theory. Figure 6 makes model selection look like an exercise in discrete estimation, while Fisher's MLE theory was always aimed at continuous situations. Direct frequentist methods like cross-validation seem more promising right now, and there have been some recent developments in Bayesian model selection, but in fact our best efforts so far are inadequate for problems like the hepatitis data. We could badly use a clever Fisherian trick for reducing complicated model selection problems to simple obvious ones.

10. EMPIRICAL BAYES METHODS

As a final example, I wanted to say a few words about empirical Bayes methods. Empirical Bayes seems like the wave of the future to me, but it seemed that way 25 years ago and the wave still hasn't washed in, despite the fact that it is an area of enormous potential importance. It is not a topic that has had much Fisherian input.

Table 4 shows the data for an empirical Bayes situation: independent clinical trials were run in 41 cities, comparing the occurrence of recurrent bleeding, an undesirable side effect, for two stomach ulcer surgical techniques, a new treatment and an older control. Each trial yielded an estimate of the true log odds ratio for recurrent bleeding, Treatment versus Control,

θ_i = log odds ratio in city i,  i = 1, 2, . . . , 41.

In city 8, for example, we have the estimate seen earlier in Table 2,

$$\hat\theta_8 = \log\left(\frac{1/15}{13/3}\right) = -4.2,$$

indicating that the new surgery was very effective in reducing recurrent bleeding, at least in city 8.

Figure 7 shows the likelihoods for θ_i in 10 of the 41 cities. These are conditional likelihoods, using Fisher's trick of conditioning on the marginals to get rid of the nuisance parameters in each city. It seems clear that the log odds ratios θ_i are not all the same. For instance, the likelihoods for cities 8 and 13 barely overlap. On the other hand, the θ_i values are not wildly discrepant, most of the 41 likelihood functions concentrating themselves on the range (-6, 3). (This is the kind of complicated inferential situation I was worrying about in the discussion of fiducial inference, confidence densities and objective Bayes methods.)

FIG. 7. Individual likelihood functions for θ_i, for 10 of the 41 experiments in Table 4; L_8, the likelihood for the log odds ratio in city 8, lies to the left of most of the others.
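Fisher's conditioning trick is concrete: given both margins of a 2x2 table, the count a follows a noncentral hypergeometric distribution whose only parameter is the odds ratio, so the city's nuisance parameters drop out. A minimal sketch (my own illustration, not the paper's code), evaluated at city 8's table (1, 15, 13, 3):

```python
import numpy as np
from scipy.special import gammaln

def log_comb(n, k):
    # log of the binomial coefficient C(n, k)
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

def conditional_loglik(theta, a, b, c, d):
    """Log conditional likelihood of the log odds ratio theta for the
    2x2 table (a, b; c, d), conditioning on both margins, i.e., the
    noncentral hypergeometric log probability of the observed a."""
    n1, n2, t = a + b, c + d, a + c               # row totals, first column total
    u = np.arange(max(0, t - n2), min(n1, t) + 1) # support of a given the margins
    terms = log_comb(n1, u) + log_comb(n2, t - u) + theta * u
    return (log_comb(n1, a) + log_comb(n2, t - a) + theta * a
            - np.logaddexp.reduce(terms))

thetas = np.linspace(-8, 2, 201)
L8 = np.exp([conditional_loglik(th, 1, 15, 13, 3) for th in thetas])
print("conditional MLE roughly at theta =", thetas[np.argmax(L8)])
```

Plotting curves like L8 for each city, normalized to unit maximum, gives the kind of display shown in Figure 7.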

TABLE 4
Ulcer data: 41 independent experiments concerning the number of occurrences of recurrent bleeding following ulcer surgery; (a, b) = (number bleeding, number nonbleeding) for Treatment, a new surgical technique; (c, d) is the same for Control, an older surgery; θ̂ is the sample log odds ratio, with estimated standard deviation SD; stars indicate cases shown in Figure 7

Experiment   a   b   c   d     θ̂     SD    Experiment   a   b   c   d     θ̂     SD
  1*         7   8  11   2  -1.84   0.86      21         6  34  13   8  -2.22   0.61
  2          8  11   8   8  -0.32   0.66      22         4  14   5  34   0.66   0.71
  3*         5  29   4  35   0.41   0.68      23        14  54  13  61   0.20   0.42
  4          7  29   4  27   0.49   0.65      24         6  15   8  13  -0.43   0.64
  5*         3   9   0  12    inf   1.57      25         0   6   6   0   -inf   2.08
  6*         4   3   4   0   -inf   1.65      26         1   9   5  10  -1.50   1.02
  7*         4  13  13  11  -1.35   0.68      27         5  12   5  10  -0.18   0.73
  8*         1  15  13   3  -4.17   1.04      28         0  10  12   2   -inf   1.60
  9          3  11   7  15  -0.54   0.76      29         0  22   8  16   -inf   1.49
 10*         2  36  12  20  -2.38   0.75      30         2  16  10  11  -1.98   0.80
 11          6   6   8   0   -inf   1.56      31         1  14   7   6  -2.79   1.01
 12*         2   5   7   2  -2.17   1.06      32         8  16  15  12  -0.92   0.57
 13*         9  12   7  17   0.60   0.61      33         6   6   7   2  -1.25   0.92
 14          7  14   5  20   0.69   0.66      34         0  20   5  18   -inf   1.51
 15          3  22  11  21  -1.35   0.68      35         4  13   2  14   0.77   0.87
 16          4   7   6   4  -0.97   0.86      36        10  30  12   8  -1.50   0.57
 17          2   8   8   2  -2.77   1.02      37         3  13   2  14   0.48   0.91
 18          1  30   4  23  -1.65   0.98      38         4  30   5  14  -0.99   0.71
 19          4  24  15  16  -1.73   0.62      39         7  31  15  22  -1.11   0.52
 20          7  36  16  27  -1.11   0.51      40*        0  34  34   0   -inf   2.01
                                              41         0   9   0  16     NA   2.04


Notice that L_8, the likelihood for θ_8, lies to the left of most of the other curves. This would still be true if we could see all 41 curves instead of just 10 of them. In other words, θ_8 appears to be more negative than the log odds ratios in most of the other cities.

What is a good estimate or confidence interval for θ_8? Answering this question depends on how much the results in other cities influence our thinking about city 8. That is where empirical Bayes theory comes in, giving us a systematic framework for combining the direct information for θ_8 from city 8's experiment with the indirect information from the experiments in the other 40 cities.

The ordinary 90% confidence interval for θ_8, based only on the data (1, 15, 13, 3) from its own experiment, is

(13) θ_8 ∈ [-6.3, -2.4].

Empirical Bayes methods give a considerably different result. The empirical Bayes analysis uses the data in the other 40 cities to estimate a prior density for log odds ratios. This prior density can be combined with the likelihood L_8 for city 8, using Bayes theorem, to get a central 90% a posteriori interval for θ_8,

(14) θ_8 ∈ [-5.1, -1.8].

The fact that most of the cities had less negatively tending results than city 8 plays an important role in the empirical Bayes analysis. The Bayesian prior estimated from the other 40 cities says that θ_8 is unlikely to be as negative as its own data by itself would indicate.
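Efron's actual analysis combines the conditional likelihoods themselves (see the Section 10 reference). As a much cruder illustration of the same shrinkage idea, here is a normal-normal parametric empirical Bayes sketch: each finite θ̂_i is treated as normal with its tabled SD, the prior mean and variance are fitted by moments from the other 40 cities, and prior and likelihood are then combined exactly. Its numbers will not reproduce interval (14).

```python
import numpy as np
from scipy.stats import norm

def eb_interval(theta8, sd8, theta_hat, sd, level=0.90):
    """Normal-normal empirical Bayes interval for city 8.

    theta_hat, sd: arrays of estimates and SDs for the *other* cities
    (the infinite estimates in Table 4 simply excluded here)."""
    m = theta_hat.mean()
    tau2 = max(theta_hat.var(ddof=1) - np.mean(sd ** 2), 0.0)  # prior variance, by moments
    w = tau2 / (tau2 + sd8 ** 2)        # weight on city 8's own data
    post_mean = w * theta8 + (1 - w) * m
    post_sd = np.sqrt(w * sd8 ** 2)     # exact posterior sd in this model
    z = norm.ppf(0.5 + level / 2)
    return post_mean - z * post_sd, post_mean + z * post_sd

# e.g., with theta_hat and sd filled in from Table 4:
# lo, hi = eb_interval(-4.17, 1.04, theta_hat, sd)
```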

The empirical Bayes analysis implies that there is a lot of information in the other 40 cities' data for estimating θ_8, as a matter of fact, just about as much as in city 8's own data. This kind of "other" information does not have a clear Fisherian interpretation. The whole empirical Bayes analysis is heavily Bayesian, as if we had begun with a genuinely informative prior for θ_8, and yet it still has some claims to frequentist objectivity.

Perhaps we are verging here on a new compromise between Bayesian and frequentist methods, one that is fundamentally different from Fisher's proposals. If so, the 21st century could look a lot less Fisherian, at least for problems with parallel structure like the ulcer data. Right now there aren't many such problems. This could change quickly if the statistics community became more confident about analyzing empirical Bayes problems. There weren't many factorial design problems before Fisher provided an effective methodology for handling them. Scientists tend to bring us the problems we can solve. The current attention to metaanalysis and hierarchical models certainly suggests a growing interest in the empirical Bayes kind of situation.

11. THE STATISTICAL TRIANGLE

The development of modern statistical theory has been a three-sided tug of war between the Bayesian, frequentist and Fisherian viewpoints. What I have been trying to do with my examples is apportion the influence of the three philosophies on several topics of current interest: standard error estimation; approximate confidence intervals; conditional inference; objective Bayes theories and fiducial inference; model selection; and empirical Bayes techniques.

Figure 8, the statistical triangle, does this more concisely. It uses barycentric coordinates to indicate the influence of Bayesian, frequentist and Fisherian thinking upon a variety of active research areas. The Fisherian pole of the triangle is located between the Bayesian and frequentist poles, as in Figure 1, but here I have allocated Fisherian philosophy its own dimension to take account of its distinctive operational features: reduction to "obvious" types; the plug-in principle; an emphasis on inferential correctness; the direct interpretation of likelihoods; and the use of automatic computational algorithms.

Of course, a picture like this cannot be more than roughly accurate, even if one accepts the author's prejudices, but many of the locations are difficult to argue with. I had no trouble placing conditional inference and partial likelihood near the Fisherian pole, robustness at the frequentist pole and multiple imputation near the Bayesian pole. Empirical Bayes is clearly a mixture of Bayesian and frequentist ideas. Bootstrap methods combine the convenience of the plug-in principle with a strong frequentist desire for accurate operating characteristics, particularly for approximate confidence intervals, while the jackknife's development has been more purely frequentistic.

Some of the other locations in Figure 8 are more problematical. Fisher provided the original idea behind the EM algorithm, and in fact the self-consistency of maximum likelihood estimation (when missing data is filled in by the statistician) is a classic Fisherian correctness argument. On the other hand EM's modern development has had a strong Bayesian component, seen more clearly in the related topic of Gibbs sampling. Similarly, Fisher's method for combining independent p-values is an early form of metaanalysis, but the subject's recent growth has been strongly frequentist.


FIG. 8. A barycentric picture of modern statistical research, showing the relative influence of the Bayesian, frequentist and Fisherian philosophies upon various topics of current interest.

The trouble here is that Fisher wasn't always a Fisherian, so it is easy to confuse parentage with development.

The most difficult and embarrassing case concerns what I have been calling "objective Bayes" methods, among which I included fiducial inference. One definition of frequentism is the desire to do well, or at least not to do poorly, against every possible prior distribution. The Bayesian spirit, as epitomized by Savage and de Finetti, is to do very well against one prior distribution, presumably the right one.

There have been a variety of objective Bayes compromises between these two poles. Working near the frequentist end of the spectrum, Welch and Peers showed how to calculate priors whose a posteriori credibility intervals coincide closely with standard confidence intervals. Jeffreys's work, which has led to vigorous modern development of Bayesian model selection, is less frequentistic. In a bivariate normal situation Jeffreys would recommend the same prior distribution for estimating the correlation coefficient or for the ratio of expectations, while the Welch-Peers theory would use two different priors in order to separately match each of the frequentist solutions.

Nevertheless Jeffreys's Bayesianism has an undeniable objectivist flavor. Erich Lehmann (personal communication) had this to say: "If one separates the two Bayesian concepts [Savage-de Finetti and Jeffreys] and puts only the subjective version in your Bayesian corner, it seems to me that something interesting happens: the Jeffreys concept moves to the right and winds up much closer to the frequency corner than to the Bayesian one. For example, you contrasted Bayes as optimistic and risk-taking with frequentist as pessimistic and playing it safe. On both of these scales Jeffreys is much closer to the frequentist end of the spectrum. In fact, the concept of uninformative prior is philosophically close to Wald's least favorable distribution, and the two often coincide."

Lehmann’s advice is followed a bit in Figure 8,Ž .where the Bayesian model selection BIC point, a

direct legacy of Jeffreys’s work, has been moved alittle ways toward the frequentist pole. However, Ihave located fiducial inference, Fisher’s form ofobjective Bayesianism, near the center of the trian-gle. There isn’t much work in that area right nowbut there is a lot of demand coming from all threedirections.

The point of my examples, and the main point of this talk, was to show that Fisherian statistics is not a dead language and that it continues to inspire new research. I think this is clear in Figure 8, even allowing for its inaccuracies. But Fisher's language is not the only language in town, and it is not even the dominant language of our research journals. That prize would have to go to a rather casual frequentism, not usually as hard-edged as pure decision theory these days. We might ask what Figure 8 will look like 20 or 30 years from now, and whether there will be many points of active research interest lying near the Fisherian pole of the triangle.

12. R. A. FISHER IN THE 21ST CENTURY

Most talks about the future are really about the present, and this one has certainly been no exception. But here at the end of the talk, and nearly at the end of the 20th century, we can peek cautiously ahead and speculate at least a little bit about the future of Fisherian statistics.

Of course Fisher's fundamental discoveries like sufficiency, Fisher information, the asymptotic efficiency of the MLE, experimental design and randomization inference are not going to disappear. They might become less visible though. Right now we use those ideas almost exactly as Fisher coined them, but modern computing equipment could change that.

For example, maximum likelihood estimates can be badly biased in certain situations involving a great many nuisance parameters (as in the Neyman-Scott paradox). A computer-modified version of the MLE that was less biased could become the default estimator of choice in applied problems. REML estimation of variance components offers a current example. Likewise, with the universal spread of high-powered computers statisticians might automatically use some form of the more accurate confidence intervals I mentioned earlier instead of the standard intervals.

Changes like these would conceal Fisher's influence, but not really diminish it. There are a couple of good reasons though that one might expect more dramatic changes in the statistical world, the first of these being the miraculous improvement in our computational equipment, by orders of magnitude every decade. Equipment is destiny in science, and statistics is no exception to that rule. Second, statisticians are being asked to solve bigger, harder, more complicated problems, under such names as pattern recognition, DNA screening, neural networks, imaging and machine learning. New problems have always evoked new solutions in statistics, but this time the solutions might have to be quite radical ones.

Almost by definition it's hard to predict radical change, but I thought I would finish with a few speculative possibilities about a statistical future that might, or might not, be a good deal less Fisherian.

12.1 A Bayesian World

In 1974 Dennis Lindley predicted that the 21st century would be Bayesian. (I notice that his recent Statistical Science interview now predicts the year 2020.) He could be right. Bayesian methods are attractive for complicated problems like the ones just mentioned, but unless the scientific world changes the way it thinks I can't imagine subjective Bayes methods taking over. What I called objective Bayes, the use of neutral or uninformative priors, seems a lot more promising and is certainly in the air these days.

A successful objective Bayes theory would have to provide good frequentist properties in familiar situations, for instance, reasonable coverage probabilities for whatever replaces confidence intervals. Such a Bayesian world might not seem much different than the current situation except for more straightforward analyses of complicated problems like multiple comparisons. One can imagine the statistician of the year 2020 hunched over his or her supercomputer terminal, trying to get Proc Prior to run successfully, and we can only wish that future colleague "good luck."

12.2 Nonparametrics

As part of our Fisherian legacy we tend to overuse simple parametric models like the normal. A nonparametric world, where parametric models were a last resort instead of the first, would favor the frequentist vertex of the triangle picture.

12.3 A New Synthesis

The postwar years and especially the last few decades have been more notable for methodological progress than the development of fundamental new ideas in the theory of statistical inference. This doesn't mean that such developments are finished forever. Fisher's work came out of the blue in the 1920s, and maybe our field is due for another bolt of lightning.

It’s easy for us to imagine that Fisher, Neymanand the others were lucky to live in a time when allthe good ideas hadn’t been plucked from the trees.In fact, we are the ones living in the golden age ofstatistics}the time when computation has becomefast and easy. In this sense we are overdue for anew statistical paradigm, to consolidate themethodological gains of the postwar period. Therubble is building up again, to use Joan FisherBox’s simile, and we could badly use a new Fisherto put our world in order.

My actual guess is that the old Fisher will have a very good 21st century. The world of applied statistics seems to need an effective compromise between Bayesian and frequentist ideas, and right now there is no substitute in sight for the Fisherian synthesis. Moreover, Fisher's theories are well suited to life in the computer age. Fisher seemed naturally to think in algorithmic terms. Maximum likelihood estimates, the standard intervals, ANOVA tables, permutation tests are all expressed algorithmically and are easy to extend with modern computation.

Let me say finally that Fisher was a genius of the first rank, who has a solid claim to being the most important applied mathematician of the 20th century. His work has a unique quality of daring mathematical synthesis combined with the utmost practicality. The stamp of his work is very much upon our field and shows no sign of fading. It is the stamp of a great thinker, and statistics, and science in general, is much in his debt.

REFERENCES

Section 1

SAVAGE, L. J. (1976). On rereading R. A. Fisher (with discussion). Ann. Statist. 4 441-500. (Savage says that Fisher's work greatly influenced his seminal book on subjective Bayesianism. Fisher's great ideas are examined lovingly here, but not uncritically.)

YATES, F. and MATHER, K. (1971). Ronald Aylmer Fisher. In Collected Papers of R. A. Fisher (K. Mather, ed.) 1 23-52. Univ. Adelaide Press. (Reprinted from a 1963 Royal Statistical Society memoir. Gives a nontechnical assessment of Fisher's ideas, personality and attitudes toward science.)

BOX, J. F. (1978). The Life of a Scientist. Wiley, New York. (This is both a personal and an intellectual biography by Fisher's daughter, a scientist in her own right and also an historian of science, containing some unforgettable vignettes of precocious mathematical genius mixed with a difficulty in ordinary human interaction. The sparrow quote in Section 4 is put in context on page 130.)

Section 2

FISHER, R. A. (1925). Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22 700-725. (Reprinted in the Mather collection, and also in the 1950 Wiley Fisher collection Contributions to Mathematical Statistics. This is my choice for the most important single paper in statistical theory. A competitor might be Fisher's 1922 Philosophical Society paper, but as Fisher himself points out in the Wiley collection, the 1925 paper is more compact and businesslike than was possible in 1922, and more sophisticated as well.)

EFRON, B. (1995). The statistical century. Royal Statistical Society News 22(5) 1-2. (This is mostly about the postwar boom in statistical methodology and uses a different statistical triangle than Figure 8.)

Section 3

FISHER, R. A. (1934). Two new properties of mathematical likelihood. Proc. Roy. Soc. Ser. A 144 285-307. (Concerns two situations when fully efficient estimation is possible in finite samples: one-parameter exponential families, where the MLE is a sufficient statistic, and location-scale families, where there are exhaustive ancillary statistics. Reprinted in the Mather and the Wiley collections.)

Section 4

EFRON, B. (1978). Controversies in the foundations of statistics. Amer. Math. Monthly 85 231-246. (The Bayes-frequentist-Fisherian argument in terms of what kinds of averages should the statistician take. Includes Fisher's famous circle example of ancillarity.)

EFRON, B. (1982). Maximum likelihood and decision theory. Ann. Statist. 10 340-356. (Examines five questions concerning maximum likelihood estimation: What kind of theory is it? How is it used in practice? How does it look from a frequentistic decision-theory point of view? What are its principal virtues and defects? What improvements have been suggested by decision theory?)

CIFARELLI, D. and REGAZZINI, E. (1996). De Finetti's contribution to probability and statistics. Statist. Sci. 11 253-282. [The second half of the quote in my Section 4, their Section 3.2.2, goes on to criticize the Neyman-Pearson school. De Finetti is less kind to Fisher in the discussion following Savage's (1976) article.]

Sections 5 and 6

DICICCIO, T. and EFRON, B. (1996). Bootstrap confidence intervals (with discussion). Statist. Sci. 11 189-228. (Presents and discusses the cd4 data of Figure 2. The bootstrap confidence limits in Table 1 were obtained by the BC_a method.)

Section 7

REID, N. (1995). The roles of conditioning in inference. Statist. Sci. 10 138-157. [This is a survey of the p* formula, what I called the magic formula following Ghosh's terminology, and many other topics in conditional inference; see also the discussion following the companion article on pages 173-199, in particular McCullagh's commentary. Gives an extensive bibliography.]

EFRON, B. and HINKLEY, D. (1978). Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information. Biometrika 65 457-487. (Concerns ancillarity, approximate ancillarity and the assessment of accuracy for a MLE.)

Section 8

EFRON, B. (1993). Bayes and likelihood calculations from confidence intervals. Biometrika 80 3-26. (Shows how approximate confidence intervals can be used to get good approximate confidence densities, even in complicated problems with a great many nuisance parameters.)

Section 9

EFRON, B. and GONG, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Amer. Statist. 37 36-48. (The chronic hepatitis example is discussed in Section 10 of this bootstrap-jackknife survey article.)

O'HAGAN, A. (1995). Fractional Bayes factors for model comparison (with discussion). J. Roy. Statist. Soc. Ser. B 57 99-138. (This paper and the ensuing discussion, occasionally rather heated, give a nice sense of Bayesian model selection in the Jeffreys tradition.)

KASS, R. and RAFTERY, A. (1995). Bayes factors. J. Amer. Statist. Assoc. 90 773-795. (This review of Bayesian model selection features five specific applications and an enormous bibliography.)


Section 10

EFRON, B. (1996). Empirical Bayes methods for combining likelihoods. J. Amer. Statist. Assoc. 91 538-565.

Section 11

KASS, R. and WASSERMAN, L. (1996). The selection of prior distributions by formal rules. J. Amer. Statist. Assoc. 91 1343-1370. (Begins "Subjectivism has become the dominant philosophical tradition for Bayesian inference. Yet in practice, most Bayesian analyses are performed with so-called noninformative priors. . . . ")

Section 12

LINDLEY, D. V. (1974). The future of statistics: a Bayesian 21st century. In Proceedings of the Conference on Directions for Mathematical Statistics. Univ. College, London.

SMITH, A. (1995). A conversation with Dennis Lindley. Statist. Sci. 10 305-319. (This is a nice view of Bayesians and Bayesianism. The 2020 prediction is attributed to de Finetti.)

Comment
D. R. Cox

I very much enjoyed Professor Efron's eloquent and perceptive assessment of R. A. Fisher's contributions and of their current relevance. I am sure that Professor Efron is right to attach outstanding importance to Fisher's ideas.

As Professor Efron emphasizes, Fisher's ideas are so wide ranging that it is not feasible to cover them all in a single paper. The following outline notes are in supplementation of rather than in disagreement with the paper.

(1) Fisher stressed the need for different modes of attack on different types of inferential problem.

(2) While his formal ideas deal with fully parametric problems he gave the "exact" randomization test based on the procedure used in design. The status to him of such tests is not entirely clear. Did he regard them as reassurance for the faint of heart, timid about working assumptions of normality, or are they the preferred method of analysis to which normal theory based results are often a convenient approximation? Yates vigorously rejected the second interpretation. The more important point is probably that Fisher recognized that randomization indicated the appropriate analysis of variance, that is, appropriate estimate of error. This replaced the need for special ad hoc assumptions of a new linear model for each design.

(3) A key to understanding some of the distinctions between Fisherian and Neyman-Pearson approaches lies in Fisher's special notion of the meaning of the probability p of a "unique" event as set out, for example, in Fisher (1956, pages 31-36).

D. R. Cox is an Honorary Fellow, Nuffield College, Oxford OX1 1NF, United Kingdom (e-mail: [email protected]).

There are two aspects, one that the individual belongs to an ensemble or population in a proportion p of which the event holds. The other is that it is not possible to recognize the individual as lying in a subpopulation with a different proportion. Fisher considered, it seems to me correctly, that this enabled probability statements to be attached to an unknown parameter on the basis of a random sample, no other information being available, without invoking a prior distribution. The snag is that such distributions cannot be manipulated or combined by the ordinary laws of probability.

(4) The development of Fisher's ideas on Bayesian inference can be traced by comparing the polemical remarks at the start of Design of Experiments (Fisher, 1935) with the more measured comments in Fisher (1956).

(5) Fisher was of course an extremely powerful mathematician especially with distributional and combinatorial calculations. It helps us to understand his attitude to mathematical rigor to note the remarks of Mahalanobis (1938), who wrote:

The explicit statement of a rigorous argument interested him but only on the important condition that such explicit demonstration of rigor was needed. Mechanical drill in the technique of rigorous statement was abhorrent to him, partly for its pedantry, and partly as an inhibition to the active use of the mind. He felt it was more important to think actively, even at the expense of occasional errors from which an alert intelligence would soon recover, than to proceed with perfect safety at a snail's pace along well known paths with the aid of a most perfectly designed mechanical crutch.


This is a comment on Fisher's attitude when an undergraduate at Cambridge; it is tempting to think that Mahalanobis was using Fisher's own words.

(6) In some ways Fisher's greatest methodological contributions, the analysis of variance including discriminant analysis, and ideas about experimental design, appear to owe relatively little directly to his ideas on general statistical theory. There are of course connections to be perceived subsequently, for example, to sufficiency and transformation models, and in the case of randomization somewhat underdeveloped connections to conditioning and ancillary statistics. Fisher's mastery of distribution theory was, however, obviously relevant, perhaps most strikingly in his approach to the connection between the distribution theory of multiple regression and that of linear discriminant analysis.

(7) In the normal process of scientific development important notions get simplified and absorbed into the general ethos of the subject and reference to original sources becomes unnecessary except for the historian of ideas. It is a measure of the range and subtlety of Fisher's ideas that it is still fruitful to read parts at least of the two books mentioned above as well as the papers that Professor Efron mentions in his list of references.

(8) I agree with Professor Efron that one key current issue is the synthesis of ideas on modified likelihood functions, empirical Bayes methods and some notion of reference priors.

(9) I like Professor Efron's triangle although on balance I would prefer a square, labeled by one axis that represents mathematical formulation and the other that represents conceptual objective. For example Fisher and Jeffreys were virtually identical in their objective although of course different in their mathematics.

(10) Finally, in contemplating Fisher's contributions, one must not forget that he was as eminent as a geneticist as he was as a statistician.

Comment
Rob Kass

Very nice, very provocative; as I read Efron's version of Fisher's profound influence on our discipline, I found myself wondering whether statistics could have succeeded to this point so spectacularly, across so many disciplines, if it had been based primarily on Bayesian logic rather than largely Fisherian frequentist logic. Without the historical counterfactual, the question might be stated this way: from a Bayesian point of view, when, if ever, is frequentist logic necessary?

I believe there are three places where frequentist reasoning is essential to the success of our enterprise. First, we need goodness-of-fit assessments, or what Dempster in his Fisherian ruminations has called "postdictive inference" (see Gelman, Meng and Stern, 1996, and the discussion of it). Second, although many principles have been formulated for defining noninformative, or reference, priors, it seems that good behavior under repeated sampling should play a role somehow. It is possible, as Cox implied in his 1994 Statistical Science interview, that we have not yet recognized how, and I have the sense that Efron shares this view. But the third and perhaps most important place even we Bayesians currently find frequentist methods useful is that they offer highly desirable shortcuts. This is related to Efron's point at the end of Section 4. The Fisherian frequentist methods for what are now relatively simple situations, such as analysis of variance, are certainly easy to use; it remains unclear whether standardized Bayesian methods could entirely replace them. Equally crucial, however, are the more sophisticated data analytic methods, such as modern nonparametric regression, which share a big relative advantage in ease of use over their Bayesian counterparts. In short, despite my strong preference for Bayesian thinking, based on our current understanding of inference, I cannot see how the next century or any other could be exclusively Bayesian. I have felt for a long time that the Bayesian versus frequentist contrast, while strictly a matter of logic, might more usefully be considered, metaphorically, a matter of language; that is, they are two alternative languages that are used to grapple with uncertainty, with fluent speakers of one being capable of arriving at an understanding of essentially any phenomenon that is understood by fluent speakers of the other, despite there being no good translation of certain phrases.

Rob Kass is Professor and Head, Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213 (e-mail: [email protected]).

I suspect we all agree on Fisher's greatness. I like to say Fisher was to statistics what Newton was to physics. Continuing the analogy, Efron suggests that we need a statistical Einstein. But the real question is whether it is possible to obtain a new framework that achieves the goal of fiducial inference. The situation in statistical inference is beautifully peaceful and compelling for one-parameter problems: reference Bayesian and Fisherian roads converge to second-order, via the magic formula. When we go to the multiparameter world, however, the hope dims not only for a reconciliation of Bayesian and frequentist paradigms, but for any satisfactory, unified approach in either a frequentist or Bayesian framework, and we must wonder whether the world is simply depressingly messy. Indeed, some of the cautionary notes sounded in the 1996 JASA review paper I wrote with Larry Wasserman implicitly suggest that (as pure subjectivists are quick to argue) there may be no way around the fundamental difficulties.

It is clear that statistical problems are becoming much more complicated. I got the possibly erroneous sense (from his comment at the end of Section 8) that Efron connects this to his unease with the current situation and the need for a new paradigm, to be furnished perhaps by our Einstein Messiah. I would instead look toward a different big new theoretical development. Bayesian and frequentist analyses based on parametric models are, from a consumer's point of view, really quite similar. Bayesian nonparametrics, however, is in its infancy and its connection with frequentist nonparametrics almost nonexistent. I hope for a much more thorough and successful Bayesian nonparametric theory and a resulting deeper understanding of infinite-dimensional problems. Perhaps entirely new principles would have to be invoked to supplement those of Fisher, Jeffreys, Neyman and de Finetti-Savage. If it happens, an increasingly nonparametric future would not, in principle, move us toward the frequentist vertex of Efron's triangle. Rather, Bayesian inference would continue to play its illuminating foundational role, important new methodology would be developed, and it might even turn out that there is a genuine, deep and detailed sense in which frequentist methods could be considered shortcut substitutes for full-fledged Bayesian alternatives.

Comment
Ole E. Barndorff-Nielsen

It has been a pleasure reading Professor Efron's far-ranging, thoughtful and valiant paper. I agree with most of the views presented there and this contribution to the discussion of the paper consists of a number of disperse comments, mostly adding to what has been mentioned in the paper.

Ole E. Barndorff-Nielsen is Professor, Theoretical Statistics, Institute of Mathematics, Aarhus University, 8000 AARHUS C, Denmark (e-mail: [email protected]).

(1) One, potentially major, omission from the paper's vision of Fisherian influence in the next century is the lack of discussion of the role that ideas and methods of statistical inference may have in quantum mechanics. Such ideas and methods are likely to become of increasing importance, particularly in connection with the developments in experimental techniques that allow the study of very small quantum systems.

Moreover, there is already now, in the physical literature, a substantial body of results on quantum analogues of Fisher information and on associated results of statistical differential geometry, in the vein of Amari.

(2) It also seems pertinent to stress the importance of "pseudolikelihood," that is, functions of part or all of the data and part or all of the parameters that to a large extent can be treated as genuine likelihoods. Many of the most fruitful advances in the second half of the 20th century center around such functions.

(3) My own view on optimality is, perhaps, somewhat different from that of Bradley Efron in that I find that the focussing on optimality has to a considerable extent been to the detriment of statistical development. The problems and danger stem from too narrow definitions of what is meant by optimality. One prominent instance of how too much emphasis on a restricted concept of optimality may lead astray from general principles of scientific enquiry is provided by the upper storeys of the Neyman-Pearson test theory edifice.

In general, what good is it to have a procedure that is "optimal" if this "optimality" does not fit in naturally with a comprehensive and integrated methodological approach to scientific studies.

However, if I read correctly, Professor Efron and I do not disagree strongly here. I am thinking in particular of his reference to the "spirit of reasonable compromise."

(4) In relation to the statement at the end of Section 8.1 (on "The Confidence Density") concerning approximate confidence intervals and approximate fiducial distributions in situations with many nuisance parameters, I wonder how this is to be reconciled with the many counterexamples to Fisher's idea(s) for fiducial inference under multiparameter distributions.

(5) Concerning model selection I suspect that techniques in Fisherian spirit can be developed although this largely remains to be done.

(6) On a historical note, neither Bayesian statistics nor Neyman-Pearson-Wald type theory has ever taken serious foothold in statistics in Denmark, and thinking along lines close to those of Fisher has been prominent throughout.

The tradition does, in fact, go back to the latter part of the 19th century, in particular to Thiele, who, one might add, essentially worked with analysis of variance long before Fisher (cf. Hald, 1981).

Comment
D. V. Hinkley

Using this paper as an excuse, I revisited the Collected Papers (Bennett, 1972) after a gap of 15 years and was struck again by how clear and concise the writing was: usually firm and irreverent, but often good-humored.

Fisher’s work certainly colored my own view ofthe principles and practices of statistics, especiallyin the early 1980s. But I think that the biologicalcontext of much of Fisher’s work made him wronglydismissive of Bayes’s theorem as a potential tool. Itis instructive to read the two-page note by FisherŽ .1929 , which answered Student’s suggestion thatnormal-theory methods such as analysis of variancebe extended to nonnormal data. Fisher says ‘‘I havenever known difficulty to arise in biological workfrom imperfect normality of variation, often thoughI have examined data for this particular cause ofdifficulty. . . . This is not to say that the deviation

w xfrom normal-theory methods may not have a realŽapplication in some technological work . . . ’’ what

would Fisher have thought of mathematical fi-.nance, one wonders! . I think that Fisher saw sam-

pling distributions as concrete, logically distinctfrom the weaker uncertainty characterizing prior

D. V. Hinkley is Professor, Department of Statisticsand Applied Probability, University of California,

ŽSanta Barbara, California 93106-3110 e-mail:[email protected] .

distributions, and so incompatible with the applica-tion of Bayes’s theorem. Fisher’s view about sam-pling models is still widely held, especially amongthose using non-Bayesian methods, and so modeluncertainty is a nonissue in nearly all of statistical

Ž .education not so in some sciences, however . But itmay yet turn out to be one of the most importanttopics in statistics. Note that model uncertainty isthe antithesis of model selection, which we arebeginning to understand fairly well using non-Fisherian ideas.

Other discussants may comment on Efron's interpretation of Fisher, so I shall note only that he may overstate the contradictions. For example, randomization inference could be conditioned, when appropriate, by appropriate design, as happened with the Knight's Move and Diagonal Squares; see Yates (1970, page 58) and Savage et al. (1976, page 464). Indeed the combination of Efron (1971) and Cox (1982) seems quintessentially Fisherian! (Similar ideas extend into resampling-bootstrap-cross-validation analysis, but are not yet common.)

But what of Fisher's possible influence on the future of statistics? With the bootstrap I am not sure. Certainly the parametric bootstrap involves a harmless use of computers to do Fisher's calculations. The ingenious theoretical basis for the parametric BC_a bootstrap method (Efron, 1987) could almost have been written by Fisher himself. The idea of going beyond a simple standard error and standardized estimate for confidence intervals (or tests) follows Fisher; see Fisher (1928), where in reference to the conventional standard error of the sample correlation Fisher quips "among the things 'which students have to know, but only a fool would use'." Nevertheless, the alternative developments based on likelihood seem closer to continuation of the Fisher approach. (All of these methods do need further development to become easily and widely useable.)

A major question is whether or not the nonparametric bootstrap methods have or will have Fisherian influence. I think not. Certainly when we get into model selection, highly nonparametric regression such as regression trees (Ripley, 1996), we are more in the arena of Tukey than Fisher. Purely empirical validation and assessment does not seem to appear in Fisher's work, possibly due to impossibility of practical calculations.

The topic of empirical Bayes estimation, or random-effect modelling, has become of increasing importance with the spread of metaanalysis ideas and methods. To paraphrase Efron (Section 10), the information about one subpopulation in another subpopulation "does not have a clear Fisherian interpretation." But surely Fisher would have addressed this problem in a sensible way, so we need to figure out how by going back to read Fisher.

Fisher's major contributions may be the ideas about designs to avoid bias in experimental results and to enable calculation of reliable measures of uncertainty. Compared to these, the fine points about whether confidence intervals are Bayesian or not seem relatively unimportant. Fisher did make valuable contributions to the discussion of the roles of probability in statistics, much of it usefully surveyed by Lane (1980). As far as Fisher's theoretical work goes, particularly the 1925 and 1934 masterpieces, it has long seemed to me that Fisher may have inadvertently prepared us for an acceptable future of Bayesian methodology.

Comment
D. A. S. Fraser

It is a delight to read Brad's thoughtful and insightful overview giving deep credits to Fisher's contributions to current and future statistics. Brad mentions large areas not covered by his review and it is our loss that these areas such as randomization and experimental design have been omitted; they may also be among Fisher's major contributions although much neglected in current practice. I strongly endorse Brad's positive approval of Fisher's contributions and add just a few further approving remarks.

Brad notes that at "the time of Brad's education Fisher had been reduced to a somewhat minor figure in American academic statistics." This seems to be a substantial understatement particularly at Stanford in 1961-1962 where, and when, Brad began his graduate work in statistics. In the fall of 1961 a psychologist-statistician Sidney Siegel spoke at the Stanford statistics seminar describing how he had graphed the likelihood function for an applied problem to obtain insight on a parameter of interest, something that now would be viewed as a sensible step in a statistical analysis. Siegel was widely challenged by the Statistics faculty then, that this cannot and should not be done, a frontal rejection of a Fisherian idea, leaving Siegel feeling dispossessed of his credibility as a statistician. Also that same fall "Fisher spoke at the Stanford Medical School" as noted by Brad. I was visiting the Stanford Statistics Department that fall and did have notice of Fisher's talk. The absence of Statistics faculty from Fisher's talk was a measure of Fisher's influence at that time, perhaps the nadir of his influence. Brad does much to correct this and give credit for the wealth we have inherited from Fisher.

D. A. S. Fraser is Professor, Department of Statistics, Sidney Smith Hall, University of Toronto, Toronto, Canada M5S 3G3 (e-mail: [email protected]).

I prefer a somewhat different interpretation for "frequentist" than that indicated by "competing philosophies . . . : Bayesian; Neyman-Wald frequentist; and Fisherian." "Frequentist" seems to refer to the interpretation for the probabilities given by the statistical model alone, thus embracing both Fisherian and decision theoretic philosophies. Bayesian statistics adds prior probabilities as the distinguishing feature. But even here the distinction may now be blurring when we view such additions on a "what if" basis of temporarily enlarging the model, in a way similar to the "what if" basis we often apply to the model itself.

Brad refers to empirical Bayes as "not a topic that has had much Fisherian input." It does seem, however, to be quite consonant with Fisher's view and indeed Fisher should be treated as the initiator of what was later called empirical Bayes. In his 1956 book Statistical Methods and Scientific Inference (page 18f) Fisher considers the genetic origins of an animal under investigation and appends to the initial statistical model the theoretical-empirical probabilities concerning the origins of the animal. This is pure what-we-now-call empirical Bayes and Fisher should be credited. From another viewpoint it is just enlarging the statistical model to provide in some sense proper modeling.

In referring to Fisher's "devices" for solving problems, Brad sometimes uses the seemingly pejorative term "trick." Most of these devices did seem to be tricks when introduced but hardly now. Perhaps we both long for more of these "tricks" as guides to future devices.

Two formulas, (10) and (11), are recorded for "the conditional density" f_θ(θ̂ | A) "of the MLE θ̂ given an ancillary" A. These formulas use cryptic notation that makes them technically wrong without clarification to indicate additional dependencies on θ̂. The context assumes that x is equivalent to (θ̂, A) so that density and likelihood have the form f_θ(θ̂, A) and L(θ; θ̂, A); the amplified formulas would then appear as

$$f_\theta(\hat\theta \mid A) = c\,\frac{L(\theta;\,\hat\theta,\,A)}{L(\hat\theta;\,\hat\theta,\,A)},$$

$$(1)\qquad f_\theta(\hat\theta \mid A) = c\,\frac{L(\theta;\,\hat\theta,\,A)}{L(\hat\theta;\,\hat\theta,\,A)}\left\{-\frac{d^2}{d\theta^2}\log L(\theta;\,\hat\theta,\,A)\,\Big|_{\theta=\hat\theta}\right\}^{1/2}$$

for the location and general cases (with appropriate accuracies). In the location model case with

$$f(x;\,\theta) = g(\hat\theta-\theta \mid A)\,h(A),$$

the amplified version of (10) then gives

$$c\,\frac{g(\hat\theta-\theta \mid A)}{g(0 \mid A)},$$

which reproduces the conditional density except for a norming constant. By contrast the given formula (10) taken literally with the preamble describing L(θ) "as a function of θ with x fixed" at an observed value (θ̂⁰, A) gives

$$c\,\frac{g(\hat\theta^{\,0}-\theta \mid A)}{g(\hat\theta^{\,0}-\hat\theta \mid A)},$$

which is the MLE density approximation only for the observed data point (θ̂, A) = (θ̂⁰, A). This may seem like a very technical point but the formulas are often cited as a way "for calculating f_θ(θ̂ | A)." In practice they can rarely be used for such a calculation as it would require the likelihood function to be available at points (θ̂, A) with fixed A and varying θ̂, and this would require an explicit ancillary and much computation.

For the location model case there is a formula from Fisher that gives the conditional density of θ̂ from just the observed likelihood L⁰(θ) = L(θ; θ̂⁰, A):

$$f_\theta(\hat\theta \mid A) = c\,L^0(\theta - \hat\theta + \hat\theta^{\,0}).$$

An extension of this is available for transformation models.

Brad mentions that "the magic formula can be used to generate approximate confidence intervals with at least second order accuracy." In fact for continuous models third order confidence intervals are available in wide generality but require additional theory for approximate ancillaries.

I particularly welcome Brad's positive views on the prospects for fiducial methods. As one who has worked extensively on contexts where fiducial calculations have good conventional properties (the transformation group context) or are useful for increasing the accuracy of a significance value (using fiducial to eliminate a nuisance parameter), I find it reassuring to see optimism elsewhere. Certainly the typical statistician feels fiducial is wrong but in most cases he is also unfamiliar with the details or the overlap with confidence methods.

For the case where the dimension of the variable has been reduced to the dimension of the parameter both fiducial and confidence make use quite generally of a pivotal quantity. The confidence procedure chooses a 90% region on the pivot space and then inverts to the parameter space to get the confidence region; by contrast the fiducial inverts to the parameter space and then chooses the 90% region. The fiducial issues arise then from the increased generality in the second procedure. Of course any observed 90% fiducial region has a corresponding 90% pivot region and is then of course the 90% confidence region for that pivot region. The controversial issues arise with simulations and repetitions. Should these repetitions always be with the same value of the parameter? This is not the reality of applications! And there are alternatives. I do applaud Brad's boldness in inverting the pivotal distribution and getting a confidence density; the fiducial whisper campaign has been too constraining on the profession.

Comment
A. P. Dempster

Bradley Efron’s colorful lecture is fun to read.Brad is generous and accurate in crediting Fisher’simportant mathematical foundations for sparkingthe ‘‘frequentist’’ school whose framework is evi-dently deeply embedded in Brad’s psyche and work.At the same time, I question whether he is suffi-ciently in touch with Fisher’s thinking to do justiceto Fisher’s also remarkable contributions to thelogic of inference. A balanced reading of the Fish-er]Neyman disputes suggests that the history of20th century statistics is not a linear path fromFisher, to Neyman and ultimately to a modern‘‘frequentist’’ statistics whose main challenger is‘‘Bayesianism’’ of either ‘‘subjective’’ or ‘‘objective’’varieties. The hackneyed terms in quotes need fun-damental clarification and definition, laying outroles in science as actually practiced.

Fisher’s conception of inference is built aroundthe interpretation of sampling distributions in rela-tion to sample data. Sample data are frequencydata, and sampling distributions have natural fre-quency interpretations. But these roles for fre-quency are basic to any view of statistics as adiscipline and are far from making Fisher ‘‘fre-quentist’’ in Neyman’s sense. Fisher aimed to char-acterize the information in data, whereas Neymansettled on a theory that guides a statistician’sbehavior in choosing among procedures seen ascompeting on the basis of long run operating char-acteristics. ‘‘Neyman]Wald’’ theory provides usefulinsights, but creates a sterile view of practice. Aftera procedure is chosen and applied, how does onethen think postanalysis about uncertainties? Fishergave interpretations of significance tests, likelihoodand fiducial intervals that whatever their strengthsand weaknesses meet the question head-on.

A. P. Dempster is Professor, Department of Statistics, Harvard University, Cambridge, Massachusetts 02138 (e-mail: [email protected]).

Understanding Fisher requires understanding that a probability in practice determines a "well-specified state of uncertainty" (Fisher, 1958) about a specific situation in hand, a position identifiable as describing a kind of formal subjectivity that complements rather than contradicts the conventional view of science as objective. Interpreting the p-value of a significance test Fisher's way (Fisher, 1956, page 39), after the result is reported, requires "postdictive" assessment of such a formal subjective probability (Dempster, 1971). As early as 1935, Fisher recognized that "confidence intervals" are "only another way of saying that, by a certain test of significance, some kinds of hypothetical possibilities are to be rejected, while others are not" (Bennett, 1990, page 187). It may be, as Brad says, "generally considered" that fiducial probability was Fisher's "biggest blunder," but it should not be so judged on the basis of the simplistic arguments accompanying Neyman's exasperated polemics, for example, "a conglomeration of mutually inconsistent assertions, not a mathematical theory" (Neyman, 1977, page 100). Interpreting a fiducial probability shares with interpreting a Bayesian posterior probability an intended postanalysis predictive interpretation of formal subjective probability that depends on accepting detailed prior assumptions, including specifically in the case of the fiducial argument that the distribution of an identified pivotal quantity is independent of, and hence unaffected by, the observations. Both fiducial inferences and Bayesian inferences stand or fall on case-by-case judgments of both the models and the independences assumed.

Was Fisher a "lapsed Fisherian" in the smoking and lung cancer debate of the late 1950s? He was certainly correct to draw attention to the potential for misleading selection biases affecting causal inferences from observational data. Are Fisher's innovative ideas "like randomization inference and conditionality . . . contradictory"? Not if one accepts that postdictive reasoning and predictive reasoning are complementary activities. Conditionality is important for estimation, but irrelevant for interpreting the outcome of a randomization test. Also, in Fisher's view of estimation, conditionality is not the blanket principle of Bayesian statistics. Indeed, selective conditioning was for Fisher a key to avoiding universal submission to Bayes. Fisher was unsuccessful in obvious ways, but he addresses deeper and harder questions than Neymanian theory attempts to solve. I believe we should downplay sound-bite criticisms coming from self-limiting theoretical perspectives, either frequentist or Bayesian. I do not wish to protest too much, however. Brad and I agree in seeking to resurrect Fisher from his relative obscurity in American academic statistics.

Rejoinder

Bradley Efron

One could scarcely ask for a clearer set of discussions, or a more qualified set of discussants, leaving me with very little to rejoin. The lecture itself was written quickly, in a few weeks, and barely revised for publication. My fear was that a careful revision would become just that, careful, and lose its force in an attempt to cover all Fisherian bases. The commentaries each add important ideas that I omitted, while scarcely overlapping with each other. Fisher's world must be a very high-dimensional one! Let me end here with just a few brief reactions to the commentaries.

(1) Paragraph 3 of Professor Cox's comments concerns Fisher's definition of probability, hinging on the absence of recognizable subsets. This is a crucial point for the relevance of statistical arguments to actual decision making, but it is not an easy criterion to apply. As an analog I like to imagine a field of intermixed orange and white poppies, with the proportion of orange poppies representing the probability of an event of interest. Perhaps the orange proportion increases steadily from east to west or from north to south or diagonally from corner to corner. A model-building program based on logistic regression or CART amounts to a systematic search for the field's inhomogeneities.

We are all used to basing statistical inferences on the outputs of such programs, saying perhaps that the estimated probability of an orange poppy at the field's center point is 0.20, but how does this "probability" relate to Fisher's definition? A different model, amounting to a different partitioning of the field, might give a different probability. My problem here is not with recognizability, which is obviously an important idea, but with its application to contexts where the statistician must choose which subsets might be recognizable.
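As an illustrative sketch of that model-building search (everything here is invented: the field, the smooth east-west drift in the orange proportion, and the fitted model), a logistic regression on the field's coordinates estimates exactly the kind of center-point probability in question:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical poppy field on the unit square; the true chance of an
# orange poppy drifts smoothly from east to west (an inhomogeneity).
n = 2000
east, north = rng.uniform(size=n), rng.uniform(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-2.0 + 3.0 * east)))
orange = rng.uniform(size=n) < p_true

# Logistic regression on the coordinates, fit by Newton's method:
# the model-based "search" for recognizable subsets of the field.
X = np.column_stack([np.ones(n), east, north])
beta = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (orange - p))

# Estimated probability of orange at the field's center point; a
# different model (a different partitioning) could give a different answer.
center = np.array([1.0, 0.5, 0.5])
print(1.0 / (1.0 + np.exp(-center @ beta)))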

(2) Professor Kass discusses one of the deepest problems of statistical philosophy: why are frequentist computations often useful and compelling, even for those who prefer the Bayesian paradigm? Fisher's work provides the best answers so far to that question, but we are still a long way from having a satisfactory resolution. Kass offers a different hope, via Bayesian nonparametrics, for bridging the chasm. Going this route requires us to solve another deep problem: how to put uninformative prior distributions on high- (or infinite-) dimensional parameter spaces.

(3) On the same point, Professor Barndorff-Nielsen asks how we can reconcile fiducial inference with high-dimensional estimation results like the James-Stein theorem. For a multivariate version of formula (1), x ~ N_K(θ, σ²I), the original fiducial argument seems to say that θ | x ~ N_K(x, σ²I), a terrible answer if we are interested in estimating ‖θ‖², for example. The confidence density approach gives sensible results in such situations. My 1993 Biometrika paper argues, à la Kass, that this is a way of using frequentist methods to aid Bayesian calculations.
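A small simulation (hypothetical numbers, under the model just stated) shows why: since E‖x‖² = ‖θ‖² + Kσ², the fiducial posterior mean E(‖θ‖² | x) = ‖x‖² + Kσ² overshoots the target by about 2Kσ² on average, while subtracting Kσ² instead gives an unbiased answer:

import numpy as np

rng = np.random.default_rng(1)

K, sigma = 100, 1.0
theta = rng.normal(size=K)        # a hypothetical true parameter vector
target = np.sum(theta**2)         # ||theta||^2, the quantity of interest

reps = 5000
x = theta + sigma * rng.normal(size=(reps, K))   # x ~ N_K(theta, sigma^2 I)
norm2_x = np.sum(x**2, axis=1)

# Fiducial posterior mean of ||theta||^2 under theta | x ~ N_K(x, sigma^2 I):
fiducial_mean = norm2_x + K * sigma**2
# An unbiased frequentist estimate, by contrast:
unbiased = norm2_x - K * sigma**2

print(target)                # the true ||theta||^2
print(fiducial_mean.mean())  # overshoots by roughly 2 * K * sigma^2
print(unbiased.mean())       # close to the target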

(4) Professor Hinkley notes the continued vitality of Tukey-style data analysis. In its purest form this line of work is statistics without probability theory (see, e.g., Mosteller and Tukey's 1977 book, Data Analysis and Regression), and as such I could not place it anywhere in the statistical triangle of Section 11. This is my picture's fault of course, not Tukey's. Problem-driven areas like neural networks often begin with a healthy burst of pure data analysis before settling down to an accommodation with statistical theory.

(5) I am grateful to Professor Fraser for presenting a more intelligible version of the magic formula. This was the spot in the talk where "avoiding technicalities" almost avoided coherency. "Trick" is a positive word in my vocabulary, reflecting a Caltech education, and I only wish I could think of some more Fisher-level tricks. Fisherian statistics was not entirely missing from early 1960s Stanford: it flourished in the medical school, where Lincoln Moses and Rupert Miller ran the biostatistics training program.

(6) I resisted the urge to locate the discussants in the statistical triangle, perhaps because Professor Dempster has placed me closer to the frequentist corner than I feel comfortable with. Dempster emphasizes an important point: that Fisher's theory, more so than Neyman's, aimed at postdata interpretations of uncertainty. This appeared in almost pure Bayesian form with fiducial inference, but usually was expressed less formally, as in his interpretations of significance test p-values. (See Section 20 of the 1954 edition of Statistical Methods for Research Workers.)

One cannot consider Fisher without also discussing Neyman, and he is likely to come out of such discussions sounding like the bad guy. This is most unfair. To invert Kass's metaphor, Neyman played Niels Bohr to Fisher's Einstein. Nobody could match Fisher's intuition for statistical inference, but even the strongest intuition can sometimes go astray. Neyman-Wald decision theory was an heroic and largely successful attempt to put statistical inference on a sound mathematical foundation. Too much mathematical soundness can be stultifying, as Barndorff-Nielsen points out, but too much intuition can cross over into mysticism, and one can sympathize with Neyman's frustration over Fisher's sometimes Delphic fiducial pronouncements.

(7) Hinkley and Dempster (and others at the lecture) questioned whether randomization inference really contradicts the conditionality principle. I have to admit to using both ends of the contradiction myself without feeling any great amount of guilt, but the logical inconsistency still seems to be there. Isn't a randomly chosen experimental design an ancillary, and shouldn't we condition on it rather than basing inference on its random properties? Dempster's statement "Conditionality is important for estimation, but irrelevant for interpreting the outcome of a randomization test" seems to just restate the problem. We condition on a randomly selected sample size n (to borrow Professor Cox's famous example) whether the data is used for estimation or testing.
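A toy simulation of that example (all numbers invented: a fair coin picks a sample size of 5 or 100 draws from N(θ, 1)) makes the point concrete: inference conditional on the realized n uses a very different variance for the sample mean than the unconditional, pre-randomization calculation:

import numpy as np

rng = np.random.default_rng(2)

theta = 0.0
small, large = 5, 100
reps = 20_000

# A coin flip chooses the sample size (an ancillary statistic); we then
# observe that many draws from N(theta, 1) and record the sample mean.
ns = np.where(rng.uniform(size=reps) < 0.5, small, large)
means = np.array([rng.normal(theta, 1.0, n).mean() for n in ns])

# Conditional variances, given the realized design:
print(means[ns == small].var())   # roughly 1/5
print(means[ns == large].var())   # roughly 1/100
# Unconditional variance, averaging over the coin flip:
print(means.var())                # roughly (1/5 + 1/100) / 2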

(8) Lumping subjective and objective Bayesianism together simplified my presentation, but maybe to a dangerous degree. It elicited Professor Lehmann's objection quoted in Section 11. Perhaps it would be better to follow Professor Cox's preference for a square instead of a triangle.

Statistical philosophy is best taken in small doses. The discussants have followed this rule (even if I have not) with six excellent brief essays. I am grateful to them, to COPSS for the invitation to give the Fisher lecture, and to the Editors of Statistical Science for arranging this discussion.

ADDITIONAL REFERENCES

BENNETT, J. H. (1972). Collected Papers of R. A. Fisher. Univ. Adelaide Press.

BENNETT, J. H., ed. (1990). Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher. Oxford Univ. Press.

COX, D. R. (1982). A remark on randomization in clinical trials. Utilitas Math. 21A 245-252. (Birthday volume for F. Yates.)

DEMPSTER, A. P. (1971). Model searching and estimation in the logic of inference. In Foundations of Statistical Inference (V. P. Godambe and D. A. Sprott, eds.) 56-78. Holt, Rinehart and Winston, Toronto.

EFRON, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 58 403-417.

EFRON, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82 171-200.

FISHER, R. A. (1928). Correlation coefficients in meteorology. Nature 121 712.

FISHER, R. A. (1929). Statistics and biological research. Nature 124 266-267.

FISHER, R. A. (1935). Design of Experiments. Oliver and Boyd, Edinburgh.

FISHER, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh. (Slightly revised versions appeared in 1958 and 1960.)

FISHER, R. A. (1958). The nature of probability. Centennial Review 2 261-274.

GELMAN, A., MENG, X.-L. and STERN, H. (1996). Posterior predictive assessments of model fitness (with discussion). Statist. Sinica 6 773-807.

HALD, A. (1981). T. N. Thiele's contributions to statistics. Internat. Statist. Rev. 49 1-20.

LANE, D. A. (1980). Fisher, Jeffreys and the nature of probability. In R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1. Springer, New York.

MAHALANOBIS, P. C. (1938). Professor Ronald Aylmer Fisher. Sankhya 4 265-272.

NEYMAN, J. (1977). Frequentist probability and frequentist statistics. Synthese 36 97-131.

RIPLEY, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press.

SAVAGE, L. J. (1976). On rereading R. A. Fisher. Ann. Statist. 4 441-483.

YATES, F. (1970). Experimental Design: Selected Papers of Frank Yates, C.B.E., F.R.S. Griffin, London.