Introduction to Statistical Decision Theory by John W. Pratt; Howard Raiffa; Robert Schlaifer

Introduction to Statistical Decision Theory by John W. Pratt; Howard Raiffa; Robert Schlaifer. Review by: Mark J. Schervish. Journal of the American Statistical Association, Vol. 91, No. 435 (Sep., 1996), pp. 1376-1377. Published by: American Statistical Association. Stable URL: http://www.jstor.org/stable/2291759. Accessed: 15/06/2014 11:13.


1376 Journal of the American Statistical Association, September 1996

semivariogram estimation (see, e.g., Russo 1984 and Warrick and Myers 1987).

Other than mentioning in an appendix that the kriging estimator is the conditional expected value if the random field is Gaussian (p. 215), the author does not make it clear that only if the random field is nearly Gaussian is a linear estimator the minimum variance estimator among all unbiased estimators (Cressie 1993, pp. 110, 278). This lack of optimality for non-Gaussian-like fields has precipitated much work in nonlinear geostatistics (e.g., disjunctive kriging and indicator kriging). Introductory books on spatial statistics should be trying to correct the widespread misunderstanding that kriging is a "distribution-free" estimator; implicitly, kriging's optimality is very dependent on a strong distributional assumption.

The book includes short, introductory chapters on principal components analysis (PCA), canonical correlations analysis (CCA), and correspondence analysis (CA) (Chaps. 17-19). Although linkages between PCA, CCA, and geostatistics are discussed in Chapters 22 and 25, apart from a very brief allusion to a similarity between CA and disjunctive kriging, it is not clear why the CA chapter was included. Also, there are no references to basic texts in PCA or CCA and no discussion of the effect on PCA and/or CCA analyses of using the sample covariance matrix in place of the true covariance matrix; that is, there is no mention of the sampling behavior of principal components and/or canonical correlations. Also, no methods are given for fitting generalized covariance functions to data (see pp. 186-187).

On p. 136, the author dismisses use of the pseudo-cross-semivariogram by first noting unexplained "limitations" reported by other authors and then making the unexplained assertion that the subtraction of two variables having different units "does not make sense" even after the variables have been rescaled. No mention is made of the theoretical and simulation work of Ver Hoef and Cressie (1993) that finds cokriging based on the pseudo-cross-semivariogram to be superior in certain cases to the "classical" cross-semivariogram. This issue would be no more than theoretical swordplay if not for the case of variables measured on noncoincident networks. In this situation, the author acknowledges that the classical cross-semivariogram cannot be estimated (p. 145). At least in the environmental sciences, as data from different monitoring efforts are pooled to give more cost-effective estimates of environmental status, analysis of data from noncoincident networks will become more frequent (see, e.g., Haas 1996).

The presentation of external drift (Chap. 30) allows universal kriging to make use of covariates other than polynomial functions of spatial location. However, the situation definition is confusing. On p. 190, the author describes one variable being accurately measured at only a few locations with another variable measured "imprecisely" at all grid and variable 1 measurement locations. Variable 1 is modeled as a random function, but variable 2 is deterministic. If variable 2 is so imprecisely measured, then why is it not also modeled as random?

Finally, a minor quibble. Many of the references are to works written in French, either in French-language journals or in technical reports issued by the Ecole des Mines de Paris. For many readers, this will make for some frustrating efforts to track down a tantalizing reference.

It is a credit to the author that, despite the oversights that I have pointed out here, I nonetheless found myself impressed with both the clarity of his style and the range of theory and applications he manages to cover in 256 pages.

Timothy C. HAAS University of Wisconsin, Milwaukee

REFERENCES

Brockwell, P. J., and Davis, R. A. (1991), Time Series: Theory and Methods, New York: Springer-Verlag.

Cressie, N. (1993), Statistics for Spatial Data (rev. ed.), New York: John Wiley.

Cressie, N. (1985), "Fitting Variogram Models by Weighted Least Squares," Journal of the International Association for Mathematical Geology, 17, 563-586.

Davis, B. M., and Borgman, L. E. (1979), "Some Exact Sampling Distributions for Variogram Estimators," Journal of the International Association for Mathematical Geology, 11, 643-653.

Haas, T. C. (1996), "Multivariate Spatial Prediction in the Presence of Nonlinear Trend and Covariance Nonstationarity," Environmetrics, 7, (to appear).

Handcock, M. S., and Stein, M. L. (1993), "A Bayesian Analysis of Kriging," Technometrics, 35, 403-410.

Kitanidis, P. K. (1985), "Minimum Variance Unbiased Quadratic Estimation of Covariances of Regionalized Variables," Journal of the International Association for Mathematical Geology, 17, 195-208.

Russo, D. (1984), "Design of an Optimal Sampling Network for Estimating the Variogram," Soil Science Society of America Journal, 48, 708-716.

Ver Hoef, J. M., and Cressie, N. (1993), "Multivariable Spatial Prediction," International Journal of Mathematical Geology, 25, 219-240.

Warrick, A. W., and Myers, D. E. (1987), "Optimization of Sampling Locations for Variogram Calculations," Water Resources Research, 23, 496-500.

Yfantis, E. A., Flatman, G. T., and Behar, J. V. (1987), "Efficiency of Kriging Estimation for Square, Triangular, and Hexagonal Grids," Mathematical Geology, 19, 183-205.

Zimmerman, D. L., and Zimmerman, M. B. (1991), "A Comparison of Spatial Semivariogram Estimators and Corresponding Ordinary Kriging Predictors," Technometrics, 33, 77-91.

Introduction to Statistical Decision Theory. John W. PRATT, Howard RAIFFA, and Robert SCHLAIFER, Cambridge, MA: MIT Press, 1995. xix + 875 pp. $65.

This book is a classic. I mean that both as a compliment and as a reflection of the fact that a "preliminary edition" of the book has been in widespread use since 1965. I am honored that the American Statistical Association has waited until 1995 to ask me to review this book, as I had not yet mastered statistics (or even the quadratic formula) in 1965. On the other hand, it is sad that Robert Schlaifer did not live to see the final version of his masterpiece in print.

The book begins with an introduction to decision making under uncertainty and then gradually introduces concepts and tools that facilitate the solution of increasingly complicated decision problems, ending with multiple linear regression. The prototypical decision problem covered in this text can be described as follows. There is an unknown state of the world, s; for example, the amount of oil under the ground at a particular site. A decision maker then has the choice of several possible experiments from a set E that can help her to learn about s. For example, she could pay for seismic tests or not. Each experiment e leads to the observation of some datum z, and the decision maker must then choose some action a from a set A. Possible actions might be to drill a well or to abandon the site altogether. Finally, some consequence c accrues to the decision maker as a function of all that has come before, namely s, e, z, and a. The consequence will generally include any costs of experimentation and data collection as well as any income that might be generated as a result of any sales.

Using a well-motivated set of axioms for rational preference among uncertain consequences, the authors derive the usual expected-utility theory of decision making. That is, to choose rationally, the decision maker must behave as if she had a real-valued utility function over consequences and a probability distribution over all uncertain quantities (s and z) depending on the choices made (e and a) and then make the choices that lead to the largest expected value of the utility. In these cases the best terminal action a given observed data z from experiment e is usually simple to determine. The analysis then concentrates on the choice of optimal experiment e, namely the best sample size n to observe.
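The structure just described (state s, experiment e, datum z, action a, consequence c) can be illustrated with a small sketch. Everything numerical here is a made-up assumption for illustration: the prior on oil, the accuracy and cost of the seismic test, and the monetary payoffs do not come from the book or from this review, and utility is taken to be linear in money. The function names are likewise hypothetical.

```python
# Hypothetical numbers throughout; none come from the book or the review.
PRIOR = {"oil": 0.3, "dry": 0.7}  # P(s)

# P(z | s, e): the seismic test is informative; "none" yields a null datum.
LIKELIHOOD = {
    "seismic": {"oil": {"good": 0.8, "bad": 0.2},
                "dry": {"good": 0.3, "bad": 0.7}},
    "none": {"oil": {"null": 1.0}, "dry": {"null": 1.0}},
}
EXPERIMENT_COST = {"seismic": 10.0, "none": 0.0}
PAYOFF = {("drill", "oil"): 400.0, ("drill", "dry"): -100.0,
          ("abandon", "oil"): 0.0, ("abandon", "dry"): 0.0}
ACTIONS = ("drill", "abandon")


def utility(s, e, a):
    """Consequence as a function of state, experiment, and action."""
    return PAYOFF[(a, s)] - EXPERIMENT_COST[e]


def posterior(e, z):
    """Bayes's theorem: return P(s | z, e) and the marginal P(z | e)."""
    joint = {s: PRIOR[s] * LIKELIHOOD[e][s].get(z, 0.0) for s in PRIOR}
    marginal = sum(joint.values())
    return {s: p / marginal for s, p in joint.items()}, marginal


def best_action(e, z):
    """Terminal analysis: the action maximizing posterior expected utility."""
    post, _ = posterior(e, z)
    scores = {a: sum(post[s] * utility(s, e, a) for s in post)
              for a in ACTIONS}
    a_star = max(scores, key=scores.get)
    return a_star, scores[a_star]


def value_of_experiment(e):
    """Preposterior analysis: average the best terminal utility over z."""
    outcomes = {z for s in PRIOR for z in LIKELIHOOD[e][s]}
    return sum(posterior(e, z)[1] * best_action(e, z)[1] for z in outcomes)


if __name__ == "__main__":
    for e in LIKELIHOOD:
        print(e, round(value_of_experiment(e), 2))
```

Under these assumed numbers, drilling without testing has expected utility 50, while paying for the seismic test and then acting optimally on its result has expected utility 65, so the test would be worth buying. With other numbers the ranking can of course reverse.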

The most commonly used examples in the text are those in which s is a parameter of some distribution and each experiment is to sample some specific number n of conditionally independent (given s) observations x_1, ..., x_n with that distribution. The action set A is either a finite set (as in a hypothesis-testing problem) or the set of possible s values (as in an estimation problem). The utility of the consequence in the examples equals a sampling cost plus either one of several linear functions of s (one for each possible action) or some convex function of a - s in the estimation case.
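As a rough illustration of the estimation case, consider one specific instance that is an assumption of this sketch rather than an example taken from the book: s is a normal mean with a normal prior, the observations are normal with known variance, and the loss is k(a - s)^2 plus a fixed cost per observation. Under quadratic loss the best estimate is the posterior mean and the expected loss is k times the posterior variance, which in this model does not depend on the data, so the preposterior analysis reduces to a one-dimensional search over n. The function and parameter names are hypothetical.

```python
def expected_total_loss(n, prior_var, obs_var, k, cost_per_obs):
    """Expected loss of sampling n observations, then estimating s.

    Assumes a normal prior, normal data with known variance, and
    quadratic terminal loss k * (a - s)**2.
    """
    # Posterior precision = prior precision + n * (data precision).
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    return k * post_var + cost_per_obs * n


def best_sample_size(prior_var, obs_var, k, cost_per_obs, n_max=10_000):
    """Search n = 0, 1, ..., n_max for the smallest expected total loss."""
    return min(
        range(n_max + 1),
        key=lambda n: expected_total_loss(n, prior_var, obs_var,
                                          k, cost_per_obs),
    )


if __name__ == "__main__":
    n_star = best_sample_size(prior_var=4.0, obs_var=16.0,
                              k=100.0, cost_per_obs=1.0)
    print(n_star, expected_total_loss(n_star, 4.0, 16.0, 100.0, 1.0))
```

With these illustrative inputs the expected loss is 1600/(4 + n) + n, which is minimized at n = 36 with a total of 76. A brute-force search is used instead of calculus so that the integer constraint on n is handled directly.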

The book is noteworthy not so much for the statistical topics covered but rather for the approach taken in covering them. The authors chose to develop the subjective Bayesian approach to decision theory and inference. Approximately two-thirds of the way through the book they include Chapter 20 on "Classical Methods." This turns on its head the more typical arrangement of introductory texts in which classical methods are discussed exclusively with at most one chapter on Bayesian methods thrown in near the end. Furthermore, this chapter is less an exposition of classical methods than a series of criticisms. But even for a book on Bayesian statistics, the decision-theoretic emphasis is far stronger than one finds in typical introductory texts. For example, the first six chapters contain much material on the foundations of decision theory not found in other introductions to Bayesian statistics (e.g., DeGroot 1986).

In one sense, this book is actually two books in one binding: an introduction to probability and statistics from the Bayesian viewpoint and an introduction to statistical decision theory. Although one could easily isolate much of the standard material needed for a 1-year introduction to probability and statistics from the Bayesian viewpoint, it seems clear that the authors did not have this in mind when writing the book. Because distribution theory and statistical modeling rely so little on decision theory, the authors would have been hard-pressed to write the book in such a way as to prevent the separation. However, those portions of the book that do not pertain to decision theory would not make a particularly good introductory statistics text. There is far too little exposition and there are too few examples of some of the fundamental concepts. For example, abstract probability and conditional probability take fewer than seven pages. But, as we have already noted, the authors did not intend for the book to be used this way.

Of more interest is the introduction to decision theory, from which the book earns its title. Chapter 2 provides an informal discussion of lotteries and quantification of preferences and judgments; Chapter 3 and Section 8.1 give a more mathematical treatment of these same ideas. Chapters 4 and 5 concentrate on utilities and probability judgments, and Chapter 6 introduces decision trees with several good examples. The remainder of the introduction to decision theory is interspersed with the introduction to probability and statistics. To some extent, the text seems to jump back and forth between the two "subbooks" quite a bit more than one might like. For example, Section 10.2 introduces conditional preferences for lotteries, but it does not rely on the material in most of Chapter 8 (expectations) and Chapter 9 (special distributions). Section 10.3 covers Bayes's theorem, which also could have followed immediately after Chapter 7 (random variables). The next five chapters alternate between probability theory and decision theory for no apparent reason: Chapter 11 (Bernoulli process), Chapter 12 (terminal analysis), Chapter 13 (paired random variables), Chapter 14 (preposterior analysis), and Chapter 15 (Poisson process). At this point the text settles down temporarily. The ends of Chapters 15-17 (the latter two covering normal random variables with known and unknown variance) present examples of terminal and preposterior analysis for decision problems involving the distributions introduced earlier in the chapters. After a digression on large-sample theory (Chap. 18), the formal introduction to decision theory concludes in Chapter 19 with a discussion of extensive and normal forms of analysis.

The remaining material also divides nicely between the two "subbooks." Chapters 21, 22, and 24 cover multivariate distributions, multivariate normal distribution, and linear regression. Chapter 23 discusses four decision problems: choosing the best of several treatments, allocating resources between biased and unbiased observations, selecting a stratified sample, and choosing a portfolio of investments.

Although the authors claim to not assume any previous training in probability and statistical inference, this book would be most useful for a second course following a traditional 1-year probability and statistics sequence or at least one semester of mathematical probability as presented by DeGroot (1986). The decision theory portions of the book rely on an understanding of probability theory and Bayesian reasoning. This text provides a good introduction to Bayesian reasoning but is noticeably weak on probability theory. This is not to say that the authors did a bad job; they merely devoted their effort to those areas where the book was intended to make the largest contributions. Here they succeeded admirably.

The major strengths of this text are twofold. First, it gives a general and well-motivated introduction to the principles of Bayesian decision theory that should be accessible to anyone with a good mathematical statistics background. Second, it provides a good introduction to Bayesian inference in general, with particular emphasis on the use of subjective information to choose prior distributions. After the introduction of each new process (e.g., Bernoulli, Poisson, normal) the text gives guidance on how to choose prior distributions and on the effect of the priors on inference. In addition, several extended case studies at key locations pull together the principles and methods presented earlier. Each chapter begins with an introductory statement putting its material into context. There are over 450 exercises at the ends of the chapters, but more than 30% of them are in three chapters: 22 and 24 (multivariate normal and regression) and Appendix 3 ("Properties of Utility Functions for Monetary Consequences"). Many chapters have no exercises at all. Fortunately, the major decision-theoretic chapters (2, 4, 12, 14, 19, and 23) have good sets of exercises.

For those who have used the preliminary edition, there have been some changes, but they seem small in comparison to the size of the text. Chapters 15 and 17 and Appendix 3 are all new. Additional sections have appeared in Chapters 14 (on sequential sampling), 19 (on classical decision theory), 20 (on further uses of tests and on sufficient statistics), 23D (a section on capital asset pricing models, replacing a section on nonnegativity constraints), and 24 (on choice among models and on causation). Most of the sections that were listed as "to be added" in Chapters 11, 18, 23, and 24 in fact were not added. A new case study was added to Chapter 4, the term "preference function" was changed to "utility function," and the hand-drawn graphs were all polished up considerably. Some more modern references have been added, but the treatments of almost all subjects are basically the same as they were in 1965.

In summary, Introduction to Statistical Decision Theory provides an excellent introduction to Bayesian decision theory and to subjective Bayesian reasoning, at a level appropriate for students who have already been introduced to mathematical probability. It is not a modern text in any sense; for example, it makes no pretense of covering data-analytic, graphical, and computer-intensive methods. But what it strives to do, it does very well.

Mark J. SCHERVISH Carnegie Mellon University

REFERENCE

DeGroot, M. H. (1986), Probability and Statistics (2nd ed.), Reading, MA: Addison-Wesley.

Analysing Survival Data from Clinical Trials and Observational Studies.

Ettore MARUBINI and Marcia Grazia VALSECCHI. New York: John Wiley, 1995. xvi + 414 pp. $

Since scientists first calculated survival rates for cancer, the analysis of time-to-event data has played an important role in clinical trials, oncological medicine, infectious diseases, actuarial science, engineering, economics, management, and the social sciences. Theoretical methods of survival analysis can be found in some standard reference books, including works by Andersen, Borgan, Gill, and Keiding (1993), Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Lawless (1982), and Miller (1981). In contrast, this book provides an introduction to survival analysis without theoretical derivations and is aimed at clinical researchers.

The intended readers are medical research professionals with some experience in applied statistics. With the aim of providing a practical guide for clinicians with limited statistical knowledge, the methodology discussions rely on heuristics. By including some mathematical statistics background, such as the delta method and asymptotic results, in appendices, this book becomes useful for biostatisticians as well. It not only covers recent methodological developments, but also provides applications and practical discussions. On the whole, it represents an excellent contribution to the field.

Chapters 1 and 2 begin with a brief introduction to the characteristics of survival data and some general principles of statistical inference, including the concepts of type I and type II censoring. Sections 2.2 and 2.3 discuss the selection of patients, trial design, and randomization. These topics are important issues for planning and designing medical studies. Section 2.5 discusses the intention-to-treat principle. In carrying out clinical trials with a long follow-up, such as cancer clinical trials, an ethical issue must be considered while making decisions. Section 2.8 discusses this issue and other important procedures, such as monitoring and interim analyses via the group sequential approach and repeated significance testing for controlled clinical trials. Practical discussions in Sections 2.5-2.8 are very interesting and useful for both clinicians and applied statisticians.

Chapter 3, "Estimation of Survival Probabilities," presents basic concepts of product-limit estimates, life table estimates, comparison and measures of precision for survivorship probability estimates, and confidence bands. A list of suggestions is given for reporting and interpreting a survival curve. Section 3.6 on event rates discusses the rate estimates from an epidemiologist's viewpoint. The theory of the product-limit estimate and the delta method included in the appendixes at the end of Chapter 3 provides readers with the mathematical background required to understand the analytical results.
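For readers unfamiliar with the product-limit (Kaplan-Meier) estimate mentioned above, a minimal sketch may help. It follows the standard textbook definition rather than anything specific to the book under review: at each distinct event time the running survival estimate is multiplied by one minus the ratio of events to subjects still at risk. The function name and the toy data are hypothetical, and observations censored at an event time are assumed to be still at risk at that time.

```python
def product_limit(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, estimated_survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data: five subjects, two of them censored (event flag 0).
    for t, s in product_limit([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

For the toy data the estimate drops to 0.8 at time 2, to 0.6 at time 3, and to 0.3 at time 5; the censored subjects lower the number at risk without producing a step of their own.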

Chapter 4 presents nonparametric methods for the comparison of survival curves such as the Mantel-Haenszel (log-rank) test, the Gail and Simon test of qualitative interaction, and comparison methods for more than two samples. Because relative failure rates may vary widely among strata, discussions on inclusion of strata in the Mantel-Haenszel test provide readers with very useful information. Section 4.6 discusses the Mantel-Byar test, an extension of the Mantel-Haenszel test, for two-sample comparisons with time-dependent covariates. Section 4.7 presents the problem of comparing more than two samples. Section 4.8 considers sample size calculations in comparative trials. Appendixes at the end of Chapter 4 present derivations of critical values for the Gail-Simon test and Freedman's procedure for sample size determination.

Chapter 5, "Distribution Functions for Failure Time," deals mainly with equivalent functions describing survival. It introduces the exponential model as the simplest parametric model that can be adopted for fitting failure data and discusses the estimation of exponential regression param-
