
This book is a major new contribution to decision theory, focusing on the question of when it is rational to accept scientific theories.

The author examines both Bayesian decision theory and confirmation theory, refining and elaborating the views of Ramsey and Savage. He argues that the most solid foundations for confirmation theory are to be found in decision theory, and he provides a decision-theoretic derivation of principles for how new probabilities should be revised over time. Professor Maher defines a notion of accepting a hypothesis, and then shows that it is not reducible to probability and that it is needed to deal with some important questions in the philosophy of science. A Bayesian decision-theoretic account of rational acceptance is provided, together with a proof of the foundations for this theory. A final chapter shows how this account can be used to cast light on such vexed issues as verisimilitude and scientific realism.

This is a book of critical importance to all philosophers of science and epistemologists, as well as to decision theorists in economics and other branches of the social sciences.

Betting on theories

Cambridge Studies in Probability, Induction, and Decision Theory

General editor: Brian Skyrms

Advisory editors: Ernest W. Adams, Ken Binmore, Jeremy Butterfield, Persi Diaconis, William L. Harper, John Harsanyi, Richard C. Jeffrey, Wolfgang Spohn, Patrick Suppes, Amos Tversky, Sandy Zabell

This new series is intended to be the forum for the most innovative and challenging work in the theory of rational decision. It focuses on contemporary developments at the interface between philosophy, psychology, economics, and statistics. The series addresses foundational theoretical issues, often quite technical ones, and therefore assumes a distinctly philosophical character.

Other titles in the series
Ellery Eells, Probabilistic Causality

Richard Jeffrey, Probability and the Art of Judgment
Robert C. Koons, Paradoxes of Belief and Strategic Rationality

Cristina Bicchieri and Maria Luisa Dalla Chiara (eds.), Knowledge, Belief and Strategic Interaction

Forthcoming
J. Howard Sobel, Taking Chances

Patrick Suppes and Mario Zanotti, Foundations of Probability with Applications

Clark Glymour and Kevin Kelly (eds.), Logic, Computation, and Discovery

Betting on theories

Patrick Maher
University of Illinois at Urbana-Champaign

CAMBRIDGE UNIVERSITY PRESS

Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP

40 West 20th Street, New York, NY 10011-4211, USA

10 Stamford Road, Oakleigh, Victoria 3166, Australia

© Cambridge University Press 1993

First published 1993

Library of Congress Cataloging-in-Publication Data
Maher, Patrick.
Betting on theories / Patrick Maher.
p. cm. - (Cambridge studies in probability, induction, and decision theory)
Includes bibliographical references and index.
ISBN 0-521-41850-X
1. Decision-making. I. Title. II. Series.
QA279.4.M33 1993
519.5'42 - dc20    92-13817
CIP

A catalog record for this book is available from the British Library

ISBN 0-521-41850-X

Transferred to digital printing 2004

Contents

Preface

1 The logic of preference
  1.1 Expected utility
  1.2 Calculation
  1.3 Representation
  1.4 Preference
  1.5 Connectedness
  1.6 Normality
  1.7 Rationality
  1.8 Justification
  1.9 Qualification

2 Transitivity and normality
  2.1 Popular endorsement
  2.2 Arguments for transitivity and normality
    2.2.1 The money pump
    2.2.2 Consequentialism
    2.2.3 Modesty
    2.2.4 Principles α and β
  2.3 Objections to transitivity and normality
    2.3.1 Probabilistic prevalence
    2.3.2 Shifting focus
    2.3.3 Majority rule
    2.3.4 Levi
    2.3.5 Discrimination threshold
  2.4 Hume and McClennen

3 Independence
  3.1 Violations
    3.1.1 The Allais problems
    3.1.2 The Ellsberg problems
  3.2 Arguments for independence
    3.2.1 Synchronic separability
    3.2.2 Diachronic separability and rigidity
    3.2.3 The sure-thing principle
    3.2.4 The value of information
    3.2.5 Other arguments
  3.3 Objections to independence
  3.4 Conclusion

4 Subjective probability in science
  4.1 Confirmation
  4.2 Normativity
  4.3 Betting on theories
  4.4 Subjectivity
  4.5 Empiricism
  4.6 The Dutch book argument for probability
    4.6.1 The simple argument
    4.6.2 The fallacy
    4.6.3 Introduction of utilities
    4.6.4 Posting betting quotients
    4.6.5 Mathematical aggregation

5 Diachronic rationality
  5.1 Reflection
    5.1.1 The Dutch book argument
    5.1.2 Counterexamples
    5.1.3 The fallacy
    5.1.4 Integrity
    5.1.5 Reflection and learning
    5.1.6 Reflection and rationality
  5.2 Conditionalization
    5.2.1 Conditionalization, Reflection, and rationality
    5.2.2 Other arguments for conditionalization
    5.2.3 Van Fraassen on conditionalization
    5.2.4 The rationality of arbitrary shifts
  5.3 Probability kinematics
  5.4 Conclusion

6 The concept of acceptance
  6.1 Definition
  6.2 Acceptance and probability
    6.2.1 Probability 1 not necessary
    6.2.2 High probability not sufficient
    6.2.3 Probability 1 not sufficient
    6.2.4 High probability not necessary
  6.3 Rational acceptance
    6.3.1 Theory
    6.3.2 Example
    6.3.3 Objections
  6.4 Acceptance and action
  6.5 Belief
  6.6 Other concepts of acceptance
    6.6.1 Kaplan
    6.6.2 Levi
    6.6.3 Van Fraassen
  6.7 Summary

7 The significance of acceptance
  7.1 Explaining the history of science
  7.2 The role of alternative hypotheses
  7.3 The scientific value of evidence
  7.4 Summary

8 Representation theorem
  8.1 Savage's uninterpretable acts
  8.2 Simple cognitive expected utility
    8.2.1 Notation
    8.2.2 Connectedness
    8.2.3 Two constant acts
    8.2.4 Closure condition
    8.2.5 Transitivity
    8.2.6 Independence
    8.2.7 Generalized weak dominance
    8.2.8 Qualitative probability
    8.2.9 Continuity
    8.2.10 Representation theorem
  8.3 General cognitive expected utility
    8.3.1 Dominance for countable partitions
    8.3.2 Acts and consequences
    8.3.3 Density of simple acts
    8.3.4 Structural assumptions
    8.3.5 Representation theorem

9 Scientific values
  9.1 Truth
  9.2 Necessary conditions
    9.2.1 Respect for truth
    9.2.2 Respect for information
    9.2.3 Impartiality
    9.2.4 Contradiction suboptimal
  9.3 Value incommensurability
  9.4 Verisimilitude
    9.4.1 Why verisimilitude?
    9.4.2 A little history
    9.4.3 The case for a subjective approach
    9.4.4 A subjective definition of verisimilitude
  9.5 Information and distance from truth
    9.5.1 A subjective definition of information
    9.5.2 Comparison with other accounts of information
    9.5.3 Distance from truth
  9.6 Scientific realism

Appendixes
  A Proof for Section 5.1.6
  B Proof of Theorem 8.1
    B.1 Probability
    B.2 Utility of gambles
    B.3 Utility of consequences
  C Sufficient conditions for Axiom 10
  D Proof of Theorem 8.2
    D.1 Countable additivity of probability
    D.2 The function w
    D.3 The signed measures w_f
    D.4 Utility on Y
    D.5 The need for Axiom 11

Bibliography
Index

Preface

Under what conditions does evidence confirm a scientific hypothesis? And why under those conditions only? There is an answer to these questions that is both precise and general, and which fits well with scientific practice. I allude to the Bayesian theory of confirmation. This theory represents scientists as having subjective probabilities for hypotheses, and it uses probability theory (notably Bayes' theorem) to explain when, and why, evidence confirms scientific theories.

I think Bayesian confirmation theory is correct as far as it goes and represents a great advance in the theory of confirmation. But its foundations have sometimes been seen as shaky. Can we really say that scientists have subjective probabilities for scientific theories - or even that rationality requires this? One purpose of the present book is to address this foundational issue. In Chapter 1, I defend an interpretation of subjective probability that is in the spirit of Frank Ramsey. On this interpretation, a person has subjective probabilities if the person has preferences satisfying certain conditions. In Chapters 2 and 3 I give reasons for thinking that these conditions are requirements of rationality. It follows that rational people have (not necessarily precise) subjective probabilities. In Chapter 4, I apply this general argument to science, to conclude that rational scientists have (not necessarily precise) subjective probabilities for scientific theories.

The presupposition of Bayesian confirmation theory that I have been discussing is a synchronic principle of rationality; it says that a rational scientist at a given time has subjective probabilities. But Bayesian confirmation theory also assumes principles about how these probabilities should be revised as new information is acquired; these are diachronic principles of rationality. Such principles are investigated in Chapter 5. Here


I show that the usual arguments for these diachronic principles are fallacious. Furthermore, if the arguments were right, they would legitimate these principles in contexts where they are really indefensible. I offer a more modest, and I hope more cogent, replacement for these flawed arguments.

Non-Bayesian philosophers of science have focused on the question of when it is rational to accept a scientific theory. Bayesians have tended to regard their notion of subjective probability as a replacement for the notion of acceptance and thus to regard talk of acceptance as a loose way of talking about subjective probabilities. Insofar as the notion of acceptance has been poorly articulated in non-Bayesian philosophy of science, this cavalier attitude is understandable. But there is a notion of acceptance that can be made clear and is not reducible to the concept of subjective probability. Confirmation theory does not provide a theory of rational acceptance in this sense. Instead, we need to employ decision theory. This is argued in Chapter 6.

In my experience, Bayesian philosophers of science tend to think that if acceptance is not reducible to subjective probability then it cannot be an important concept for philosophy of science. But this thought is mistaken. One reason is that Bayesian analyses of the history of science, which are used to argue for the correctness of Bayesian confirmation theory, themselves require a theory of acceptance. Another reason is the acknowledged role of alternative hypotheses in scientific development; accounting for this role is beyond the resources of confirmation theory and requires the theory of acceptance. A third reason is that we would like to explain why gathering evidence contributes to scientific goals, and acceptance theory provides a better account of this than is possible using confirmation theory alone. These claims are defended in Chapter 7.

A decision-theoretic account of rational acceptance raises foundational questions again. We now require not only that scientists have probabilities for scientific hypotheses but also that acceptance of hypotheses have consequences to which utilities can be assigned. Since acceptance is a cognitive act, the consequences in question can be called cognitive consequences, and their utilities can be called cognitive utilities. It needs to be shown that the required measures of cognitive utility can

meaningfully be assigned. According to the approach defended in Chapter 1, this requires showing that rational scientists have preferences among cognitive options that can be represented by a pair of probability and cognitive utility functions. Chapter 8 takes up this challenge, and answers it by providing a new representation theorem.

In the final chapter of the book, I try to show that my theory of acceptance provides a fruitful perspective on traditional questions about scientific values. Here I partially side with those who see science as aiming at truth. In fact, I show how it is possible to define a notion of verisimilitude (or closeness to the whole truth), and I argue that this is ultimately the only scientific value. But on the other hand, I differ from most realists in allowing that different scientists may, within limits, assess verisimilitude in different ways. The upshot is a position on scientific values that is sufficiently broad-minded to be consistent with van Fraassen's antirealism, but is strong enough to be inconsistent with Kuhn's conception of scientific values.

I began work on the acceptance theory in this book in the winter of 1980, when I was a graduate student at the University of Pittsburgh. That semester, seminars by both Carl Hempel and Teddy Seidenfeld introduced me to Hempel's and Levi's decision-theoretic models of rational acceptance. I later wrote my dissertation (Maher 1984) on this topic, proving the representation theorem described in Chapter 8 and developing the subjective theory of verisimilitude presented in Chapter 9. Wesley Salmon gently directed my dissertation research, other members of my dissertation committee (David Gauthier, Clark Glymour, Carl Hempel, and Nicholas Rescher) provided valuable feedback, and Teddy Seidenfeld helped enormously, especially on the representation theorem.

In 1987, I received a three-year fellowship from the Michigan Society of Fellows, a grant from the National Science Foundation, a summer stipend from the National Endowment for the Humanities, and a humanities released-time award from the University of Illinois - all to write this book. Eventually I decided that the focus of the book needed to be narrowed; hence much of the research supported by these awards has been excluded from this book, although most has been published


elsewhere. Still, without those awards, especially the Michigan fellowship, this book would never have been written.

I gave seminars based on drafts of this book at Michigan and Illinois; I thank the participants in those seminars for discussions that helped improve the book. Critiques by George Mavrodes and Wei-ming Wu are especially vivid in my mind. Brad Armendt read the book for Cambridge University Press, and his written comments were very useful to me. When I thought the book was basically finished, Allan Gibbard's (1990) book prompted me to change radically the conception of rationality I was working with; I thank my Illinois colleagues who read through Gibbard's book with me, and thank Gibbard for an illuminating e-mail correspondence. Fred Schmitt and Steven Wagner gave me stimulating written comments on the material in Chapter 6. Many others discussed aspects of the book with me, and I hope they will forgive me if I don't drag these acknowledgments out to an interminable length by attempting to list them all. I will, however, mention my wife, Janette, who not only provided domestic support but also read much of the book, talked about it with me on and off for years, and suggested improvements. I will also mention my trusty PCs and Leslie Lamport's (1986) LaTeX software, which together turned countless drafts into beautiful pages. LaTeX deserves to be more widely used by philosophers than it is, especially since it is free.

Chapter 5 is an improved version of my "Diachronic Rationality," Philosophy of Science 59 (1992). Section 7.3 is a refined version of an argument first published in "Why Scientists Gather Evidence," British Journal for the Philosophy of Science 41 (1990). Section 4.3, and parts of Chapter 6 and Section 9.6, appeared in "Acceptance Without Belief," PSA 1990, vol. 1. This material is reproduced here by permission of the publishers.


The logic of preference

The heart of Bayesian theory is the principle that rational choices maximize expected utility. This chapter begins with a statement of that principle. The principle is a formal one, and what it means is open to some interpretation. The remainder of the chapter is concerned with setting out an interpretation that makes the principle both correct and useful. I also indicate how I would defend these claims of correctness and usefulness.

1.1 EXPECTED UTILITY

If you need to make a decision, then there is more than one possible act that you could choose. In general, these acts will have different consequences, depending on what the true state of the world may be; and typically one is not certain which state that is. Bayesian decision theory is a theory about what counts as a rational choice in a decision problem. The theory postulates that a rational person has a probability function p defined over the states, and a utility function u defined over the consequences. Let a(x) denote the consequence that will be obtained if act a is chosen and state x obtains, and let X be the set of all possible states. Then the expected utility of act a is the expected value of u(a(x)); I will refer to it as EU(a). If X is countable, we can write

EU(a) = Σ_{x∈X} p(x) u(a(x)).

Bayesian decision theory holds that the choice of act a is rational just in case the expected utility of a is at least as great as that of any other available act. That is, rational choices maximize expected utility.
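To make the definition concrete, here is a minimal sketch in Python; the states, consequences, probabilities, and utilities are hypothetical illustrations, not taken from the text.

```python
# Minimal sketch of expected utility maximization (hypothetical numbers).
# p gives the probability of each state; each act maps states to
# consequences; u gives the utility of each consequence.

p = {"x1": 0.3, "x2": 0.7}

acts = {
    "a1": {"x1": "c1", "x2": "c2"},
    "a2": {"x1": "c3", "x2": "c4"},
}

u = {"c1": 10.0, "c2": 0.0, "c3": 4.0, "c4": 5.0}

def expected_utility(a):
    """EU(a) = sum over states x of p(x) * u(a(x))."""
    return sum(p[x] * u[acts[a][x]] for x in p)

for a in acts:
    print(a, expected_utility(a))       # a1 -> 3.0, a2 -> 4.7

# Rational choices maximize expected utility:
print(max(acts, key=expected_utility))  # a2
```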

The principle of maximizing expected utility presupposes that the acts, consequences, and states have been formulated appropriately. The formulation is appropriate if the decision maker is (or ought to be) sure that

1. one and only one state obtains;
2. the choice of an act has no causal influence on which state obtains;[1] and
3. the consequences are sufficiently specific that they determine everything that is of value in the situation.

The following examples illustrate why conditions 2 and 3 are needed.

Mr. Coffin is a smoker considering whether to quit or continue smoking. All he cares about is whether or not he smokes and whether or not he lives to age 65, so he takes the consequences to be

Smoke and live to age 65
Quit and live to age 65
Smoke and die before age 65
Quit and die before age 65

The first-listed consequence has highest utility for Coffin, the second-listed consequence has second-highest utility, and so on down. And Coffin takes the states to be "Live to age 65" and "Die before age 65." Then each act-state pair determines a unique consequence, as in Figure 1.1. Applying the principle of maximizing expected utility, Coffin now reaches the conclusion that smoking is the rational choice. For he sees that whatever state obtains, the consequence obtained from smoking has higher utility than that obtained from not smoking; and so the expected utility of smoking is higher than that of not smoking. But if Coffin thinks that smoking might reduce the chance of living to age 65, then his reasoning is clearly faulty, for he has not taken account of this obviously relevant possibility. The fault lies in using states ("live to 65," "die before 65") that may be causally influenced by what is chosen, in violation of condition 2.

[1] Richard Jeffrey (1965, 1st ed.) maintained that what was needed was that the states be probabilistically independent of the acts. For a demonstration that this is not the same as requiring causal independence, and an argument that causal independence is in fact the correct requirement, see (Gibbard and Harper 1978).

          Live to 65            Die before 65
Smoke     Smoke and live to 65  Smoke and die before 65
Quit      Quit and live to 65   Quit and die before 65

Figure 1.1: Coffin's representation of his decision problem

Suppose Coffin is sure that the decision to smoke or not has no influence on the truth of the following propositions:

A: If I continue smoking then I will live to age 65.
B: If I quit smoking then I will live to age 65.

Then condition 2 would be satisfied by taking the states to be the four Boolean combinations of A and B (i.e., "A and B," "A and not B," "B and not A," and "neither A nor B"). Also, these states uniquely determine what consequence will be obtained from each act. And with these states, the principle of maximizing expected utility no longer implies that the rational choice is to smoke; the rational choice will depend on the probabilities of the states and the utilities of the consequences.
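A sketch of this reformulation in Python may help; the probabilities and utilities below are hypothetical, and the states are the four Boolean combinations of A and B.

```python
# Coffin's corrected formulation (hypothetical numbers). A state fixes the
# truth values of A ("if I smoke I live to 65") and B ("if I quit I live
# to 65"), so the states are causally independent of the act chosen.

states = [(a, b) for a in (True, False) for b in (True, False)]
p = {(True, True): 0.2, (True, False): 0.1,
     (False, True): 0.5, (False, False): 0.2}

# Utility of each (act, lives-to-65) consequence, in Coffin's ranking.
u = {("smoke", True): 4, ("quit", True): 3,
     ("smoke", False): 1, ("quit", False): 0}

def consequence(act, state):
    a, b = state
    lives = a if act == "smoke" else b  # A settles smoking, B settles quitting
    return (act, lives)

def eu(act):
    return sum(p[s] * u[consequence(act, s)] for s in states)

# Smoking no longer dominates; the answer depends on the probabilities.
print(eu("smoke"), eu("quit"))   # 1.9 vs 2.1: here quitting is rational
```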

Next example: Ms. Drysdale is about to go outside and is wondering whether to take an umbrella. She takes the available acts to be "take umbrella" and "go without umbrella," and she takes the states to be "rain" and "no rain." She notes that with these identifications, she has satisfied the requirement of act-state independence. Finally, she identifies the consequences as being that she is "dry" or "wet." So she draws up the matrix shown in Figure 1.2. Because she gives higher utility to staying dry than getting wet, she infers that the expected utility of taking the umbrella is higher than that of going without it, provided only that her probability for rain is not zero. Drysdale figures that the probability of rain is never zero, and takes her umbrella.

Since a nonzero chance of rain is not enough reason to carry an umbrella, Drysdale's reasoning is clearly faulty.

                 Rain    No rain
Take umbrella    Dry     Dry
Go without       Wet     Dry

Figure 1.2: Drysdale's representation of her decision problem

                 Rain                 No rain
Take umbrella    Dry & umbrella       Dry & umbrella
Go without       Wet & no umbrella    Dry & no umbrella

Figure 1.3: Corrected representation of Drysdale's decision problem

The trouble is that carrying the umbrella has its own disutility, which has not been included in the specification of the consequences; this violates condition 3. If we include in the consequences a specification of whether or not the umbrella is carried, the consequences become those shown in Figure 1.3.

Suppose that these consequences are ranked by utility in this order:

Dry & no umbrella
Dry & umbrella
Wet & no umbrella

A mere positive probability for rain is now not enough to make taking the umbrella maximize expected utility; a small risk of getting wet would be worth running, for the sake of not having to carry the umbrella.
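To see the threshold explicitly, here is a small sketch with hypothetical utilities that respect this ranking; it computes how probable rain must be before taking the umbrella maximizes expected utility.

```python
# Hypothetical utilities respecting the ranking:
# dry & no umbrella > dry & umbrella > wet & no umbrella.
u_dry_no_umbrella = 10.0
u_dry_umbrella = 8.0
u_wet_no_umbrella = 0.0

def eu_take(p_rain):
    return u_dry_umbrella   # dry either way, but carrying the umbrella

def eu_go_without(p_rain):
    return p_rain * u_wet_no_umbrella + (1 - p_rain) * u_dry_no_umbrella

# Taking the umbrella maximizes EU iff
#   p_rain >= (u_dry_no_umbrella - u_dry_umbrella)
#             / (u_dry_no_umbrella - u_wet_no_umbrella)
threshold = (u_dry_no_umbrella - u_dry_umbrella) / \
            (u_dry_no_umbrella - u_wet_no_umbrella)
print(threshold)                            # 0.2
print(eu_go_without(0.05) > eu_take(0.05))  # True: a bare positive
                                            # probability of rain is not enough
```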

It is implicit in the definition of expected utility that each act has a unique consequence in any given state. This together with the previous conditions prevents the principle of maximizing expected utility being applied to cases where the laws of nature and the prior history of the world, together with the act chosen, do not determine everything of value in the situation (as might happen when the relevant laws are quantum mechanical). In such a situation, taking the states to consist of the laws of nature and prior history of the world (or some part thereof) would not give a unique consequence for each state, unless the consequences omitted something of value in the situation. Including in the states a specification of what consequence will in fact be obtained avoids this problem but violates the requirement that the states be causally independent of the acts. There is a generalization of the principle of maximizing expected utility that can deal with decision problems of this kind; but I shall not present it here, because it introduces complexities that are irrelevant to the themes of this book. The interested reader is referred to (Lewis 1981).

1.2 CALCULATION

In many cases, it would not be rational to bother doing a calculation to determine which option maximizes expected utility. So if Bayesian decision theory held that a rational person would always do such calculations, the theory would be obviously incorrect. But the theory does not imply this.

To see that the theory has no such implication, note that doing a calculation to determine what act maximizes expected utility is itself an act; and this act need not maximize expected utility. For an illustration, consider again the problem of whether to take an umbrella. A fuller representation of the acts available would be the following:

t: Take umbrella, without calculating expected utility.
t̄: Go without umbrella, without calculating expected utility.
c: Calculate the expected utility of t and t̄ (with a view to subsequently making a choice that is calculated to maximize expected utility).[2]

Because calculation takes time, it may well be that t or t̄ has higher expected utility than c; and if so, then Bayesian decision theory itself endorses not calculating expected utility.

[2] After calculating expected utility, one would choose an act without again calculating expected utility; thus the choice at that time will be between t and t̄. So whether or not expected utility is calculated, one eventually chooses t or t̄.

Figure 1.4: Acts of taking an umbrella (t), not taking umbrella (t̄), and calculating (c) whether to choose t or t̄

Conversely, if c has higher expected utility than t or t̄, then the theory holds that it is rational to do the calculation.

If we wanted to, we could do an expected utility calculation, to determine which of t, t̄, and c maximizes expected utility. The situation is represented in Figure 1.4. Here "c→t" means that choosing c (calculating the expected utility of t and t̄) would lead to t being chosen;[3] and similarly for "c→t̄." When c→t is true, the utility of c is equal to that of t, less the cost of calculation; and when c→t̄ is true, the utility of c is equal to that of t̄, less the same cost of calculation. Suppose that[4]

p(rain.c→t) = p(no rain.c→t̄) = .4
p(rain.c→t̄) = p(no rain.c→t) = .1
u(t̄.rain) = 0; u(t.no rain) = 3
u(t.rain) = 2; u(t̄.no rain) = 4

[3] Assuming one will choose an act that is calculated to maximize expected utility, c→t includes all states in which calculation would show t to have a higher expected utility than t̄. But it may also include states in which calculation would show t and t̄ to have the same expected utility.

[4] Here the dot represents conjunction, and its scope extends to the end of the formula. For example, p(no rain.c→t) is the probability that there is no rain and that c→t.

Then

EU(t) = 2.5; EU(t̄) = 2

and letting x be the cost of calculation,

EU(c) = 2.7 - x.

Thus Bayesian decision theory deems c the rational choice if x is less than 0.2, but t is the rational choice if x exceeds 0.2.

In the course of doing this second-order expected utility calculation, I have also done the first-order calculation, showing that EU(t) > EU(t̄). But this does not negate the point I am making, namely that Bayesian decision theory can deem it irrational to calculate expected utility. For Bayesian decision theory also does not require the second-order calculation to be done. This point will be clear if we suppose that you are the person who has to decide whether or not to take an umbrella, and I am the one doing the second-order calculation of whether you should do a first-order calculation. Then I can calculate (using your probabilities and utilities) that you would be rational to choose t, and irrational to choose c; and this does not require you to do any calculation at all. Likewise, I could if I wished (and if it were true) show that you would be irrational to do the second-order calculation which shows that you would be irrational to do the first-order calculation.[5]

This conclusion may at first sight appear counterintuitive. For instance, suppose that t maximizes expected utility and, in particular, has higher expected utility than both t̄ and c. Suppose further that you currently prefer t̄ to the other options and would choose it if you do not do any calculation. Thus if you do no calculation, you will make a choice that Bayesian decision theory deems irrational. But if you do a calculation, you also do something deemed irrational, for calculation has a lower expected utility than choosing t outright. This may seem

[5] Kukla (1991) discusses the question of when reasoning is rational, and sees just these options: (a) reasoning is rationally required only when we know that the benefits outweigh the costs; or (b) a metacalculation of whether the benefits outweigh the costs is always required. Since (b) is untenable, he opts for (a). But he fails to consider the only option consistent with decision theory and the one being advanced here: that reasoning and metacalculations alike are rationally required just when the benefits do outweigh the costs, whether or not it is known that they do.

This much is right: In the case described, what you would doif you do not calculate is irrational, and so is calculating. Butthis does not mean that decision theory deems you irrationalno matter what you do. In fact, there is an option available toyou that decision theory deems rational, namely t. So there isno violation here of the principle that 'ought' implies 'can'.

What the case shows is that Bayesian decision theory does not provide a means of guaranteeing that your choices are rational. I suggest that expecting a theory of rational choice to do this is expecting too much. What we can reasonably ask of such a theory is that it provide a criterion for when choices are rational, which can be applied to actual cases, even though it may not be rational to do so; Bayesian decision theory does this.[6]

Before leaving this topic, let me note that in reality we have more than the two options of calculating expected utility and choosing without any calculation. One other option is to calculate expected utility for a simplified representation that leaves out some complicating features of the real problem. For example, in a real problem of deciding whether or not to take an umbrella we would be concerned, not just with whether or not it rains, but also with how much rain there is and when it occurs; but we could elect to ignore these aspects and do a calculation using the simple matrix I have been using here. This will maximize expected utility if the simplifications reduce the computational costs sufficiently without having too great a probability of leading to the wrong choice. Alternatively, it might maximize expected utility to use some non-Bayesian rule, such as minimizing the maximum loss or settling for an act in which all the outcomes are "satisfactory."[7]

[6] Railton (1984) argues for a parallel thesis in ethics. Specifically, he contends that morality does not require us to always calculate the ethical value of acts we perform, and may even forbid such calculation in some cases.

[7] This is the Bayes/non-Bayes compromise advocated by I. J. Good (1983, 1988), but contrary to what Good sometimes says, the rationale for the compromise does not depend on probabilities being indeterminate.


1.3 REPRESENTATION

Bayesian decision theory postulates that rational persons have the probability and utility functions needed to define expected utility. What does this mean, and why should we believe it?

I suggest that we understand attributions of probability and utility as essentially a device for interpreting a person's preferences. On this view, an attribution of probabilities and utilities is correct just in case it is part of an overall interpretation of the person's preferences that makes sufficiently good sense of them and better sense than any competing interpretation does. This is not the place to attempt to specify all the criteria that go into evaluating interpretations, nor shall I attempt to specify how good an interpretation must be to be sufficiently good. For present purposes, it will suffice to assert that if a person's preferences all maximize expected utility relative to some p and u, then it provides a perfect interpretation of the person's preferences to say that p and u are the person's probability and utility functions. Thus, having preferences that all maximize expected utility relative to p and u is a sufficient (but not necessary) condition for p and u to be one's probability and utility functions. I shall call this the preference interpretation of probability and utility.[8] Note that on this interpretation, a person can have probabilities and utilities without consciously assigning any numerical values as probabilities or utilities; indeed, the person need not even have the concepts of probability and utility.

Thus we can show that rational persons have probability and utility functions if we can show that rational persons have preferences that maximize expected utility relative to some such functions. An argument to this effect is provided by representation theorems for Bayesian decision theory. These theorems show that if a person's preferences satisfy certain putatively reasonable qualitative conditions, then those preferences are indeed representable as maximizing expected utility relative to some probability and utility functions. Ramsey (1926) and Savage (1954) each proved a representation theorem, and there have been many subsequent theorems, each making somewhat different assumptions. (For a survey of representation theorems, see Fishburn 1981.)

[8] The preference interpretation is (at least) broadly in agreement with work in philosophy of mind, e.g., by Davidson (1984, pp. 159f.).

As an illustration, and also to prepare the way for later discussion, I will describe two of the central assumptions used in Savage's (1954) representation theorem. First, we need to introduce the notion of weak preference. We say that you weakly prefer g to f if you either prefer g to f, or else are indifferent between them. The notation 'f ≼ g' will be used to denote that g is weakly preferred to f. Now Savage's first postulate can be stated: It is that for any acts f, g, and h, the following conditions are satisfied:

Connectedness. Either f ≼ g or g ≼ f (or both).

Transitivity. If f ≼ g and g ≼ h, then f ≼ h.

A relation that satisfies both the conditions of connectedness and transitivity is said to be a weak (or simple) order. So an alternative statement of this postulate is that the relation ≼ is a weak order on the set of acts.

Savage's second postulate asserts that if two acts have the same consequences in some states, then the person's preferences regarding those acts should be independent of what that common consequence is. For example, in Figure 1.5, f and g have the same consequence on A, and f′ and g′ are the result of replacing that common consequence with something else; so according to this postulate, if f ≼ g, then it should be that f′ ≼ g′. Formally, the postulate is that for any acts f, g, f′, and g′, and for any event A, the following condition holds:[9]

Independence. If f = f′ on Ā, g = g′ on Ā, f = g on A, f′ = g′ on A, and f ≼ g, then f′ ≼ g′.

[9] This postulate is often referred to as the sure-thing principle, a term that comes from Savage (1954, p. 21). But as I read Savage, what he means by the sure-thing principle is not any postulate of his theory, but rather an informal principle that motivates the present postulate. In Section 3.2.3 I will discuss that principle, and consider how well it motivates the postulate. So for my purposes, it would confuse an important distinction to refer to this postulate as "the sure-thing principle."


Figure 1.5: Illustration of the independence postulate

(The notation 'f = g on A' means that f and g have the same consequence for every state in A, i.e., f(x) = g(x) for all x ∈ A.)

Savage shows that if your preferences satisfy these and other assumptions, then they maximize expected utility relative to some probability and utility functions. But it is worth noting that for transitivity and independence, the converse is also true; that is, if your preferences maximize expected utility relative to some probability and utility functions, then transitivity and independence must hold.

Proof. (Proofs that are set off from the text like this may be skipped without loss of continuity.) Suppose preferences maximize expected utility relative to some probability and utility functions. Then if f ≼ g and g ≼ h, we have that EU(f) ≤ EU(g) and EU(g) ≤ EU(h), whence EU(f) ≤ EU(h), and f ≼ h; thus transitivity is satisfied. Now suppose the assumptions of the independence axiom are satisfied. It is straightforward to show that

EU(g) - EU(f) = EU(g′) - EU(f′).

Since f ≼ g, EU(f) ≤ EU(g), which by the above identity implies that EU(f′) ≤ EU(g′), and hence that f′ ≼ g′, as the independence axiom requires.

Thus any representation theorem must either assume transitivity and independence or else assume something at least as strong as these conditions.


I do not hold that rationality always forbids violations of transitivity or independence. For example, an anti-Bayesian tycoon might offer me a million dollars to have preferences that are intransitive in some insignificant way; then I would agree that if I could make my preferences intransitive, this would be the rational thing for me to do. More realistically, it may be that my preferences are intransitive, but it would take more effort than it is worth to remove the intransitivities, in which case the rational thing to do would be to remain intransitive.

What I do hold is that when the preferences are relevant to a sufficiently important decision problem, and where there are no rewards attached to violating transitivity or independence, then it is rational to have one's preferences satisfy these conditions. In Chapters 2 and 3, I offer arguments to support this position, and I will critique arguments in the literature that, if sound, would refute this position.

It is too cumbersome to keep saying that transitivity and independence are requirements of rationality "when the preferences are relevant to a sufficiently important decision problem, and where there are no rewards attached to violating transitivity or independence." So in this book I will make it a standing assumption that we are dealing with situations in which the conditions in quotes are satisfied; I can then say simply that transitivity and independence are requirements of rationality.

Suppose, for the moment, that transitivity and independence, and the other conditions necessary for a representation theorem, are indeed requirements of rationality. Then the representation theorem shows that a rational person's preferences maximize expected utility relative to some probability and utility functions. And on the preference interpretation of probability and utility, this vindicates the claim that a rational person has probability and utility functions, and prefers acts that maximize expected utility.

1.4 PREFERENCE

On the understanding of Bayesian decision theory I have been outlining, the notion of preference is central. But what is preference? A behaviorist definition would be: You prefer g to f


just in case you are disposed to choose g when presented with a choice between f and g.

Savage (1954, p. 54) rejects a definition like this on the ground that you might be indifferent between f and g, and choose g simply because you must make a choice. However, this is not necessarily a counterexample to the definition; for it may be that you chose g randomly and thus were not disposed to choose g. (Analogy: A fair coin does not have a disposition to land heads when tossed, even though it may land heads on some occasions.)

To put the behaviorist definition to a more severe test, suppose you are indifferent between f and g, and are disposed to resolve indifferences by accepting the last option offered. Then if someone offers you f and g in that order, aren't you disposed to choose g over f, without preferring g? If so, this is a counterexample to the behaviorist definition. But in fact the definition can be made to handle this sort of case correctly. We might say that in the case imagined, what you are disposed to choose is the last option offered; this option happens to be g, but dispositions may be held to be intensional, in which case it does not follow that you are disposed to choose g.

Nevertheless, there are counterexamples to the behaviorist definition of preference. Suppose you are indifferent between f and g but know that you will shortly face a choice between them. Thinking about the choice in advance, you may decide to choose g, for no reason other than that you must choose something. Having made that decision, you are now disposed to choose g, though you are still indifferent between f and g.[10]

For another sort of counterexample, consider a quiz show in which contestants are asked to select a box, and the boxes differ in nothing other than the number written on them. We might establish (perhaps on the basis of repeated choices) that one contestant is disposed to pick

box 1 when the choice is between boxes 1 and 2;
box 2 when the choice is between boxes 2 and 3;
box 3 when the choice is between boxes 1 and 3.

[10] What has been decided on here is a simple plan. The role of plans in constraining future choices is discussed by Bratman (1987).


These dispositions are intransitive, and we might criticize the contestant for having intransitive preferences. But the contestant may well respond: "I don't have intransitive preferences. I don't care which box I choose; they are all the same. It just happens that I always make the choices the same way, but surely rationality does not require me to choose sometimes one way, sometimes another, when I am indifferent between options." It seems to me that we should accept this answer; that is, we should accept that the contestant is really indifferent, and thus that the contestant's dispositions to choose do not reflect a preference for one box over the other.

As this last example shows, the difficulty with the behaviorist definition of preference is not merely that it fails to capture the ordinary meaning of 'preference.' There is also the problem that if we adopt the behaviorist definition, then it need not be irrational to have intransitive preferences. Since Bayesian decision theory does deem intransitive preferences irrational, the behaviorist definition is one that Bayesian decision theorists must reject.

Let us use the notation 'f ≺ g' to denote that g is strictly preferred to f, and 'f ∼ g' to denote that the person is indifferent between f and g. These two relations can be defined in terms of weak preference, as follows.

Definition 1.1. f ≺ g iff f ≼ g and not g ≼ f.

Definition 1.2. f ∼ g iff f ≼ g and g ≼ f.

The point of the preceding paragraphs can then be put by saying that a disposition to choose g is consistent with both f ≺ g and f ∼ g. For this reason, the notion of preference is finer grained than that of disposition to choose.

Although I have rejected the claim that disposition to choose is a sufficient condition for preference, I do wish to stipulate that a disposition to choose is a necessary condition for preference. That is to say, if f ≺ g, then you are disposed to choose g when the available options are f and g. Hence if f ≺ g at the time of choosing between f and g, you will choose g. This stipulation forces a fairly tight connection between preference and disposition to choose, without actually reducing the one notion to the other.


Furthermore, although disposition to choose is not sufficient for preference, it is strong prima facie evidence for preference. And in cases where it is doubtful that a disposition to choose represents a preference, there are ways of obtaining further relevant evidence. For example, we know that most people prefer more money to less. From this, we can infer that if f+ is like f except that with f+ the person receives an additional monetary prize, then (ceteris paribus) f ≺ f+. Thus if the person is disposed to choose g over f+, we have that f ≺ f+ ≼ g, and assuming that transitivity holds, we can infer that f ≺ g. On the other hand, if the person chooses g over f but chooses f+ over g, no matter how small the monetary prize, this would be strong evidence that f ∼ g.

People sometimes make one choice but say they want to choose differently. A common example is provided by smokers who say they want to quit; in a recent survey, 80 percent of smokers said this.[11] My stipulation about the connection between preference and choice implies that someone who chooses to smoke cannot prefer quitting to smoking. Furthermore, by what was said in the preceding paragraph, the fact that smokers pay money to smoke is evidence that they strictly prefer smoking to quitting. Using obvious notation, we have

q ≺ s. (1.1)

How then are we to interpret the smokers who say they want to quit?

One possibility would be to say that for these smokers, quitting is not an option they can choose. If this were so, then the would-be-reformed smokers do not choose smoking over quitting, and we could say that for these smokers, s ≺ q. This may be an acceptable account for some of the more deeply addicted smokers, but for the most part we are inclined to say that smokers do have the option of quitting. So we have reason to look for a different solution.

Jeffrey (1974) suggested that a would-be-reformed smoker could be understood as preferring smoking to quitting but also

[11] Survey conducted by the Centers for Disease Control; reported in the New York Times, July 20, 1989.


Figure 1.6: Decision between smoking preferences

preferring to prefer quitting. The would-be-reformed smokers' preferences then satisfy both (1.1) and

(q ≺ s) ≺ (s ≺ q). (1.2)

We could then say that quitting is an option for these smokers, but preferring quitting to smoking is not an option. Under these circumstances, (1.1) and (1.2) would appear to be consistent and to satisfactorily represent the predicament of the would-be-reformed smokers.

On closer inspection, though, it turns out that (1.2) cannot be attributed to would-be-reformed smokers, on the present conception of preference (which Jeffrey also holds). To see this, suppose Mr. Coffin is given the opportunity to choose between q ≺ s and s ≺ q. After making that choice, there will be the choice of whether to smoke or not. The choices are represented in the decision tree of Figure 1.6. If Coffin chooses q ≺ s at node 1, then at node 2 he will choose s. Similarly, if he chooses s ≺ q at node 1, then at node 3 he will choose q. Thus his choices at node 1 are equivalent to choosing between s and q. If he appreciates this equivalence, and if (1.1) holds, then he will choose s in this choice; thus he will choose q ≺ s at node 1, contrary to (1.2). Thus if we attribute both (1.1) and (1.2) to Coffin, we must also regard him as so confused that he does not appreciate that choosing q ≺ s is equivalent to choosing s, while choosing s ≺ q is equivalent to choosing q. But would-be-reformed smokers are surely not all guilty of a confusion of this sort.


My analysis assumes that at node 1, the value Coffin places on s and q is the same whether he obtains them via node 2 or node 3. This assumption may be thought objectionable. If Coffin gets q via node 3, then he will have reversed his preference to s ≺ q; wouldn't q be more desirable in that situation than if q ≺ s? But this worry is misplaced. Preference is a relation between acts; thus s and q must be acts. Hence s and q must be sufficiently specific that they together with the true state determine what consequence Coffin will obtain; furthermore, the states are assumed to be outside his influence (Section 1.1). Hence s and q must be sufficiently specific that Coffin's choice at node 1 does not alter the desirability of s or q for him.

I've argued that we cannot represent the would-be-reformed smokers' predicament by ascribing to them (1.1) and (1.2). A natural response is then to try to understand their situation in terms of some other preferences. Here is one attempt: Let q₀ represent the act of quitting without suffering withdrawal symptoms. Of course, this option is not actually available. What is available is the option of quitting and suffering withdrawal symptoms, which we continue to denote q. Then we might try saying that the would-be-reformed smoker's predicament is that q ≺ s ≺ q₀, but only q and s are available.

However, some would-be-reformed smokers think that they really ought to quit; that is, they think they ought to choose q over s, though they actually choose s instead. This is a case of weakness of will, and I do not think it can be represented in terms of preferences alone. Instead, I suggest the following analysis: When smokers say that they ought to quit, they are not expressing a preference but rather are expressing their acceptance of a norm that requires them to quit. Following Gibbard (1990), I take acceptance of a norm to be a mental state that is closely tied to linguistic assertion and also tends to motivate actions; however, the motivational component is not so strong as to preclude the possibility that one will sincerely accept a norm that requires q but nevertheless choose s. So what I would say about these would-be-reformed smokers is that they prefer s to q but accept a norm that requires them to choose q; hence they accept a norm that requires them to have different preferences than those they do have.


Generalizing from this example, I suggest that weakness of will arises when a person chooses in a way that is inconsistent with the norms the person accepts. Thus we can make sense of weakness of will while still insisting that people always choose what they prefer and without having to invoke second-order preferences.[12]

I've now discussed how preference, as I want to conceive of it, is and is not related to choice; but I have not yet said what preference is. Work by Davidson, Hurley, and others suggests that for a person to have a preference is for the person to be assigned that preference under the best interpretation of the person. (If there is more than one "best" interpretation, the person has the preference iff[13] it is assigned by all the best interpretations.) If this is accepted, then to say what preference is, I only need to fill in the criteria for evaluating attributions of preference by an interpretation. Much of what I have already said bears on this. Thus I would say that a necessary criterion for a satisfactory interpretation of preference is that it never assign to a person the preference f ≺ g at the same time that the person is not disposed to choose g in a choice between f and g. Interpretations satisfying this necessary condition are evaluated by the degree to which they satisfy the following desiderata:

(i) When the person is disposed to choose g over f, the person strictly prefers g to f.

(ii) The person's preferences are normal ones for people to have in the circumstances that the person is in.

(iii) The person's preferences are what the person says they are, except where we have reason to think the person mistaken or insincere.

(iv) The person's preferences are rational.

[12] My account of weakness of will agrees with Gibbard (1990, pp. 56-61). Davidson (1980, Essay 2) attempted to give an account of weakness of will that preserves the principle: "If an agent judges that it would be better to do x than to do y, then he wants to do x more than he wants to do y." In my terms, this principle is most naturally read as saying that if you accept a norm that requires choosing x over y, then you prefer choosing x over y. This is precisely what I think we must reject. If I'm right, then Davidson's account of weakness of will must be defective. For an argument that it is, see (Hurley 1989, pp. 131-5).

[13] I follow the common practice of using 'iff' as an abbreviation for 'if and only if.'


In the earlier example of the quiz show contestant, attribution of intransitive preferences violates (ii)-(iv), and my suggestion was that we get a better interpretation by allowing that the person is indifferent between the boxes, thus violating only (i).

1.5 CONNECTEDNESS

Everyone must have had the experience of agonizing over a decision, not knowing what to do. In such a situation, it seems most natural to say that we neither prefer one option to the other nor are indifferent between them. But then we violate Savage's connectedness postulate, since we are faced with options f and g, such that neither f ≼ g nor g ≼ f.

Besides the difficult decisions we actually face, there is a huge class of difficult decisions we have not had to face. In many cases, we also lack preferences about the options in these hypothetical decision problems. However, Savage's connectedness postulate requires that one have preferences, even about merely hypothetical options.

Now the axioms of representation theorems are meant to be requirements of rationality, not descriptions of real people. And perhaps it can be argued that when you must choose between some options, rationality requires you to acquire a preference (or indifference) between the options. But it is hard to see how rationality could require you to have preferences also about all the merely hypothetical options that are not available to you.

There is a way to sidestep this worry about connectedness. We could simply stipulate that f ≼ g means you do not strictly prefer f to g.[14] Since you cannot strictly prefer both f to g, and g to f, it follows that at least one of f ≼ g and g ≼ f must hold; hence the connectedness postulate is necessarily satisfied. Taking this line means that we cease to distinguish between cases where you are indifferent between f and g, and cases where you lack a view about the relative merits of f and g.

This defense of connectedness only shifts the underlying problem, and does not solve it. To see this, suppose you are undecided between f and g, and let g + d be an act that is just like g except that you also get a small additional reward d. For example, f and g might be two possible job offers that you would have trouble deciding between, and d might be $5 (so that g + d is accepting the second job offer and also receiving an additional $5). If d really is regarded by you as a reward, then g ≺ g + d. Since you do not strictly prefer f to g, the view we are considering has it that f ≼ g. But if your indecision between f and g is at all serious, and if the reward d is sufficiently small, you will also be undecided between f and g + d; on the view we are considering, that implies g + d ≼ f. So we have g + d ≼ f ≼ g ≺ g + d; thus you violate transitivity. Yet you need not be irrational to be in this situation, especially if f and g are options you are unlikely to actually have available to you. So if we were to make connectedness an analytic truth, we could no longer say transitivity was a requirement of rationality.

[14] In fact, this is the interpretation that Savage (1954, p. 17f.) gives to the weak preference relation.

Consequently, I will not interpret f ≼ g as meaning that you do not strictly prefer f to g. Instead, I will continue to interpret it as meaning that you either strictly prefer g to f or are indifferent between them. Since you might be in neither of these states, and since that need not be irrational, I then have to say that the connectedness postulate is not a requirement of rationality.

A more plausible condition is that rationality requires your preferences, so far as they go, to agree with at least one connected preference ordering that satisfies transitivity, independence, and the other assumptions of a representation theorem (Skyrms 1984, ch. 2). I will adopt this condition.

A representation theorem like Savage's shows that if your preferences are connected, and if you satisfy the other assumptions, then your preferences maximize expected utility relative to some probability and utility functions. Furthermore, the theorem shows that the probability function is unique, and the utility function is unique in the sense that if u and v are two utility functions representing your preferences, then there exist constants a and b, a > 0, such that v = au + b. Now suppose your preferences are not connected but satisfy all the other assumptions of a representation theorem. Then your preferences agree with more than one connected preference ranking that satisfies all the assumptions of a representation theorem. The representation theorem tells us that each of these connected preference


rankings is representable by a pair of probability and utility functions. We can then regard your unconnected preferences as represented by the set of all pairs of probability and utility functions that represent a connected extension of your preferences. I will call this set your representor¹⁵ and its elements p-u pairs.

From the way the representor has been defined, it follows that if you prefer f to g, then f has higher expected utility than g, relative to every p-u pair in your representor; and if you are indifferent between f and g, then f and g have the same expected utility relative to every p-u pair in your representor. You lack a preference between f and g if the p-u pairs in your representor are not unanimous about which of these acts has the higher expected utility.
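
To make the unanimity idea concrete, here is a minimal sketch in Python. Everything in it (the states, the acts, and the numbers in the two p-u pairs) is invented for illustration; it is not part of the theory itself.

    # A representor is modeled as a list of p-u pairs: each p assigns
    # probabilities to states, each u assigns utilities to consequences.
    # Acts are functions from states to consequences.
    STATES = ["s1", "s2"]

    def eu(act, p, u):
        # Expected utility of an act relative to a single p-u pair.
        return sum(p[s] * u[act[s]] for s in STATES)

    def compare(f, g, representor):
        # Preference and indifference require unanimity across pairs.
        diffs = [eu(f, p, u) - eu(g, p, u) for (p, u) in representor]
        if all(d > 0 for d in diffs):
            return "f preferred to g"
        if all(d < 0 for d in diffs):
            return "g preferred to f"
        if all(d == 0 for d in diffs):
            return "indifferent"
        return "no preference: the p-u pairs disagree"

    representor = [
        ({"s1": 0.6, "s2": 0.4}, {"win": 1.0, "lose": 0.0}),
        ({"s1": 0.3, "s2": 0.7}, {"win": 1.0, "lose": 0.0}),
    ]
    f = {"s1": "win", "s2": "lose"}   # pays off in s1 only
    g = {"s1": "lose", "s2": "win"}   # pays off in s2 only

    print(compare(f, g, representor))
    # -> no preference: the p-u pairs disagree

Here the two probability functions rank f and g oppositely, so you lack a preference between them.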

At the beginning of this chapter, I described Bayesian decision theory as holding that rational persons have a probability and utility function, and that a choice is rational just in case it maximizes the chooser's expected utility. In the light of the present section, that description needs a gloss. The statement that a rational person has a probability and utility function is not meant to imply that these functions are unique; a more explicit statement of the position I am defending would be that a rational person has a representor that is nonempty; that is, it contains at least one p-u pair. A corresponding gloss is needed for the statement that a choice is rational just in case it maximizes expected utility; a more complete statement of this condition would be that a choice is rational just in case it maximizes expected utility relative to every p-u pair in the chooser's representor.

From this perspective we can say that although connectedness is not a requirement of rationality, and although representation theorems assume connectedness, nevertheless representation theorems do provide the foundations for a normatively correct decision theory.

1.6 NORMALITY

The preference relations f ≼ g, f ≺ g, and f ~ g only deal with your attitude to choices in which there are just the two options

¹⁵The term comes from van Fraassen (1985, p. 249), but the meaning here is different.


f and g; they imply nothing about your attitude when there are more than these two options. To see this, note that if f ≺ g, then according to the stipulations I have made, in a choice between f and g you would choose g; however, it is consistent with this that in a choice where the options are f, g, and h, you would choose f. In order to be able to talk about your attitude to decision problems with more than two options, I will introduce a generalization of the binary preference relations f ≼ g, f ≺ g, and f ~ g.

Let 'C(F)' denote the acts you prefer or want to choose when the set of available acts is F. For example, if you prefer to choose f when the available acts are f, g, and h, then C{f, g, h} = {f}. If you prefer not choosing h but are indifferent between choosing f and g, then C{f, g, h} = {f, g}. If you are indifferent between all three options, then C{f, g, h} = {f, g, h}. If you are undecided about what to choose, then C{f, g, h} is the empty set, denoted by ∅. The function C so defined is called your choice function, and C(F) is called your choice set for the decision problem in which F is the set of available options. Of course, your choice function will in general change over time.

The notion of a choice function generalizes the binary preference relations. To see this, note that the binary preference relations can be defined in terms of the choice function, as follows:

f ≺ g iff C{f, g} = {g};
f ~ g iff C{f, g} = {f, g};
f ≼ g iff g ∈ C{f, g}.

Thus in an axiomatic presentation of this subject, one would begin with the choice function and then define the binary preference relations. I have followed the reverse order because I think it is easier to first master the simpler concepts of binary preference and then generalize to the choice function.

The discussion of the meaning of preference, in Section 1.4, generalizes in the obvious way to choice functions. In particular, the choice function is to be understood as satisfying the following stipulation:

If C(F) ≠ ∅ for you at the time of deciding between the acts in F, then you will decide on an act in C(F).


This implies, in particular, that if C(F) = {f}, you will choose f when the set of available acts is F.

Part of what is asserted by Bayesian decision theory cannot be expressed in terms of binary preferences, but rather requires the notion of a choice function. To see this, suppose your binary preference relation ≼ satisfies all the requirements of Bayesian decision theory (transitivity, independence, etc.); and suppose that for you f ≺ g, but C{f, g, h} = {f}. Since f ≺ g and your preferences accord with Bayesian decision theory, g must have higher expected utility than f, in which case f does not maximize expected utility in {f, g, h}, and so according to Bayesian decision theory, choosing f is irrational in this decision problem. Thus your choice function is inconsistent with Bayesian decision theory, even though your binary preferences are consistent with it.

Since the assumptions made in representation theorems concern binary preference only, we see that Bayesian decision theory is committed to more than the assumptions that appear in representation theorems. This extra commitment can be stated as follows.

Normality. If F is a set of acts on which ≼ is connected, then

C(F) = {f ∈ F : g ≼ f for all g ∈ F}.

(The term 'normality' comes from Sen (1971).) It is easy to verify that this condition must hold if Bayesian decision theory holds. For according to Bayesian decision theory, f ∈ C(F) iff f maximizes expected utility in F; that is, iff EU(g) ≤ EU(f) for all g ∈ F; and that is equivalent to g ≼ f for all g ∈ F.
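
The verification just given can also be run mechanically. In the following sketch (Python; the expected-utility numbers are invented toy values), C(F) is built by expected-utility maximization and checked against the set {f ∈ F : g ≼ f for all g ∈ F} for every nonempty set of available acts:

    from itertools import combinations

    # Toy expected utilities; the maximizing acts form the choice set.
    EU = {"f": 1.0, "g": 2.0, "h": 3.0}

    def choice_set(F):
        best = max(EU[a] for a in F)
        return {a for a in F if EU[a] == best}

    def normality_holds(F):
        # Normality: C(F) equals the set of acts weakly preferred to
        # every act in F, where g is weakly dispreferred to f just in
        # case EU(g) <= EU(f).
        rhs = {f for f in F if all(EU[g] <= EU[f] for g in F)}
        return choice_set(F) == rhs

    acts = ("f", "g", "h")
    all_F = [set(c) for n in (1, 2, 3) for c in combinations(acts, n)]
    print(all(normality_holds(F) for F in all_F))  # -> True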

In Chapter 2, I will defend the view that normality is a requirement of rationality.

1.7 RATIONALITY

Bayesian decision theory, as I have been articulating it, is concerned with the rationality of preferences and choices. But what is rationality?

In many discussions of Bayesian decision theory, the theory appears to be taken as holding that maximization of expected utility is (at least part of) what it means to be rational. If this


is accepted, then the claim that rational persons maximize expected utility becomes a tautology. But it seems clear that this is not a tautology, as 'rationality' is normally understood. That objection could be avoided by taking the theory to be giving a stipulative definition of 'rationality' rather than attempting to capture the ordinary meaning of the term; but then we would need some argument why rationality in this sense is an interesting notion. Such an argument would presumably try to show that we ought to maximize expected utility - in other words, that maximizing expected utility is rational, in a sense other than the one just stipulated. My topic in this section is what that other, essentially normative, sense of 'rationality' might be.

Taylor (1982) suggests that irrationality is just inconsistency (though he denies that rationality is just consistency). Davidson (1985) works with a definition of irrationality as violation of one's own standards. And Foley (1987) favors the view that a person's belief is rational if it conforms to the person's deepest epistemic standards. However, I do not think that this (in)consistency conception of (ir)rationality serves all the legitimate purposes for which we use the notion of (ir)rationality. An example that is particularly relevant to this book: Philosophers of science are concerned with normative theories of scientific method. One way to evaluate such theories is to test them against our judgment of particular cases. Thus we might, for example, have a firm pretheoretic intuition that geologists in the 1960s were rational to accept continental drift. If a theory of scientific method agrees with this intuition, that is a point in favor of the theory; otherwise, we have a strike against the theory. But when we judge what was rational for geologists in the 1960s, we are not primarily concerned with whether accepting continental drift was consistent with their standards, however interesting this question may be. For the purposes of evaluating our theory of scientific method, what we need to judge is whether we think geologists in the 1960s ought to have accepted continental drift. The 'ought' here involves a notion of rationality that goes beyond consistency.

The notion of rationality we need here is, I think, best analyzed along the lines of the norm-expressivist theory in Allan Gibbard's recent book (1990). On this theory, we do not


attempt to fill in the blank in "'X is rational' means ___." Rather, we say what someone is doing when they call X rational. And the theory says that when you call something rational, you are expressing your acceptance of norms that permit X. So, for example, to say geologists in the 1960s were rational to accept continental drift is to express your endorsement of norms that permit acceptance of continental drift in the circumstances geologists were in in the 1960s.

On Gibbard's view, to say that something is rational is not to make a statement that could be true or false; in particular, the assertion about geologists in the 1960s does not say that their acceptance of continental drift was consistent with the norms of either the speaker or the geologists. However, this assertion does express the speaker's acceptance of norms that permit these geologists to accept continental drift.

What I have said is just the barest outline of an analysis of rationality. For completeness, it especially requires an account of what it is to express one's acceptance of a norm. Here I refer the reader to Gibbard's discussion, which admirably presents one way such an account may be developed.

1.8 JUSTIFICATION

We talk not only about the rationality of actions and beliefs, but also about whether or not they are justified. What is the relation between these notions of rationality and justification? A common view is that they coincide; that is, that something is rational just in case it is justified. I shall argue that this is not so, and that an action or belief can be rational without being justified.

Consider the stock example of a man who has good evidence that his wife is unfaithful. If he were to believe her unfaithful, this would show - and probably would bring the marriage to an end; so, all things considered, his interests are best served by believing that she is faithful, contrary to his evidence.¹⁶ It seems clear that this man would not be justified in believing his wife faithful. On the other hand, the belief would be in the

¹⁶For definiteness, let 'belief' here mean high probability.


man's best interest and would do no harm to anyone else, so it seems that the norms we accept should permit him to believe this, in which case, the norm-expressivist theory would have us saying that it is rational for the man to believe his wife faithful. Thus we seem to be forced to divorce the notions of rationality and justification.

Gibbard has a way of attempting to avoid this divorce. He suggests that it would be rational (and justified) for the man to want to believe his wife faithful, but irrational (and unjustified) to actually believe it (Gibbard 1990, p. 36f.). Gibbard would allow that what is said here for wanting also applies to preference in my sense. For example, suppose the man can get himself to believe his wife faithful by some means, and let b be the act of doing this (and b̄ the act of refraining). Then Gibbard would allow that the man is rational (and justified) to have b̄ ≺ b, while still maintaining that the man would be irrational (and unjustified) to have the belief that b produces.

Is this a consistent position? In trying to answer that question, I will assume that consistency requires that the norms one accepts should be jointly satisfiable in at least one possible world. This is in accord with Gibbard's normative logic (1990, ch. 5), though I will gloss over certain refinements in Gibbard's account.

I have made it a conceptual truth that someone who has b̄ ≺ b will choose b when the alternatives are b̄ and b. So if we accept that the man would be rational to have b̄ ≺ b, we accept that the following norm applies to this man.

(1) If you have a choice between b̄ and b, choose b.

But b is an act that the man knows will produce in him the belief that his wife is faithful. Given this knowledge, there is no possible world in which the man chooses b but does not believe his wife is faithful. So if we judge that the man would be irrational to believe his wife faithful, we must accept that this norm applies to him:

(2) Do not choose b.

The only worlds in which (1) and (2) are both satisfied are those in which the man does not have a choice between b̄ and


b. Hence Gibbard must conclude that the man should not have the choice between b̄ and b. That is, Gibbard must accept the following norm:

(3) Do not have a choice between b̄ and b.

The thought here would be that fully rational persons are not able to get themselves to believe things that are not supported by the evidence. This is a possible position to take, so there is not yet any inconsistency.

Now suppose that our man does have the power to get himself to believe his wife is faithful. For the sake of argument, I will concede that his having this power is a failure of rationality. Still, he has it, and we can suppose that there is nothing he can do about it. A useful system of norms will say what should be done in situations where some norm is unavoidably violated. In a different context, Gibbard discusses norms of this sort, calling them "norms of the second-best" (1990, p. 241). So now I ask: Given that the man is unable to satisfy both (1) and (2), which one should we regard as applying to his situation?

Suppose we decide to endorse (1), and drop (2) as a norm of the second-best. Then we are saying, according to Gibbard, that in this situation the man is both rational and justified to believe that his wife is faithful. But in view of the contrary evidence, the claim that the man is justified in believing this seems plainly false.

So let us instead try endorsing (2), and dropping (1) as a norm of the second-best. Then we are saying, according to Gibbard, that the man would be irrational and unjustified to believe his wife faithful. But there is a clear sense in which it makes sense for the man to believe his wife faithful. Someone who advised the man to acquire this belief would have given the man good advice. In the original situation, Gibbard proposed to accommodate this by saying that wanting to believe is rational; but we are now in a situation where preference entails choice, so that option is not available here, unless we go back to the unsatisfactory situation of the preceding paragraph.

I conclude that Gibbard's attempt to equate rationality and justification, elegant though it is, ultimately fails. I think we


have to accept that rationality and justification are different notions. I would say that the man is rational to get himself to believe (and hence to believe) that his wife is faithful, but is not justified in believing this.

The norm-expressivist analysis is, I think, correct for rationality. That is, I agree that calling something rational is expressing acceptance of norms that permit it. Thus in saying that the man would be rational to believe his wife faithful, I am expressing my acceptance of norms that permit the man to believe this. Because the man is nevertheless not justified in believing this, I think the norm-expressivist analysis does not give a correct account of justification.

Of course, justification is itself an evaluative notion, like rationality; so if a norm-expressivist analysis of rationality is right, one should look for an analysis of justification along the same lines. In the case of subjective probability, we might try the following. The standard way in which subjective probabilities have value or disvalue is via their connection with preference for actions, and hence with action itself. It is desirable to have subjective probabilities that result in successful actions being chosen and unsuccessful ones avoided. Gibbard calls probabilities that are desirable in this sense systematically apt (1990, p. 221). But in some cases, subjective probabilities can have value or disvalue in another way. This is what happens in the case of the man with evidence that his wife is unfaithful. The reason he does not want to apportion probability in accordance with the evidence is that this would produce involuntary signs that would destroy the marriage; it is not that the preferences associated with this probability would be inappropriate. Indeed, I will show in Chapter 5 that considering only the influence on actions chosen, the man would maximize expected utility by assigning probabilities in accordance with the evidence. So what we might try saying is that for subjective probabilities to be justified is for them to be systematically apt - that is, rational if we consider only the influence on actions chosen and ignore other consequences. Then to call a subjective probability judgment justified is to express one's endorsement of norms that permit it in circumstances where no extrasystematic consequences attach to having that probability. A judgment of


justification would then be a conditional judgment of rationality, dealing with a possibly counterfactual situation. It is clear why such judgments would be useful.

Even if this account of justification is right as far as it goes, more work would be needed to deal with justification of other things, such as emotions and moral judgments. I will not pursue that here, because this book is about rationality, not justification. For my purposes, it suffices to have established that rationality is not the same thing as justification.

1.9 QUALIFICATION

I began this chapter by saying that according to Bayesian decision theory, it is rational to choose act a just in case a maximizes expected utility, relative to your probabilities and utilities. Taken literally, this position would imply that questions of the rationality of your probabilities and utilities are irrelevant to the question of what you rationally ought to do. It seems that currently most Bayesians accept this literal interpretation of the theory. For example, Eells (1982, p. 5) allows that an action that maximizes your expected utility may not be well informed, but he does not allow that it could be irrational. I will call this position unqualified Bayesianism.

I do not accept unqualified Bayesianism. I think we should allow that probabilities and utilities can themselves be irrational; and when they are, it will not generally be rational to maximize expected utility relative to them. Thus I view the principle of maximizing expected utility as elliptical; to get a literally correct principle, we must add to the principle of expected utility the proviso that the probabilities and utilities are themselves rational. I will call the position I am defending qualified Bayesianism.

In saying that probabilities and utilities can themselves be irrational, I am not saying that rationality determines unique probability and utility functions that everyone with such-and-such evidence ought to have. On the contrary, I would allow that a wide range of probability and utility functions may be rational for persons who have had all the same experiences. Thus I reject positions like that of Carnap (1950) or Salmon (1967) on what


it takes for probabilities to be rational. Qualified Bayesianism is not a species of "objective Bayesianism," as that term has usually been understood.

Then why do I advocate qualified Bayesianism? Let me begin by noting that there is no positive argument for unqualified Bayesianism. In particular, representation theorems do not provide such an argument. What representation theorems show (if their assumptions are accepted) is that you are not fully rational if your preferences do not all maximize expected utility relative to your probability and utility functions. It follows that maximization of your expected utility is a necessary condition of rationality. But it does not follow that maximization of your expected utility is a sufficient condition of rationality. Nor is there any other positive argument for unqualified Bayesianism, so far as I know. By contrast, representation theorems do provide an argument for qualified Bayesianism. For if it is rational to have p and u as your probability and utility functions, and if (as the representation theorems show) a rational person maximizes expected utility, then it is rational for you to maximize expected utility relative to p and u.

Though there is no positive argument for unqualified Bayesianism, there are arguments against it. One is that even if a person's preferences all maximize expected utility relative to p and u, having these preferences may conflict with the norms that the person accepts, and thus be irrational by the person's own lights. In such a situation, we have to say either that the person's preferences are irrational or the person's norms are mistaken. Unqualified Bayesianism entails that the first option is impossible (since the preferences all maximize expected utility relative to the person's probability and utility functions); thus it entails that the person's norms are mistaken. But in some cases, the opposite conclusion will be at least as plausible.

For example, suppose that I am one of those smokers who accept that they ought to quit; but being weak willed, I go on smoking. According to the account of preference in Section 1.4, I prefer smoking to quitting. Furthermore, my preferences might satisfy transitivity, independence, and the other assumptions of a representation theorem. If so, I have probability and utility functions, and the options I prefer maximize


expected utility relative to my probability and utility functions (Section 1.3). Hence smoking maximizes my expected utility. Unqualified Bayesianism concludes that I am rational to go on smoking, and hence that I am in error in thinking that I ought to quit. But it seems at least as plausible to say that my norms are right and it is my preferences that ought to change.

For another sort of example, suppose I prefer to go back into the house rather than cross a black cat in the street. Suppose further that this preference maximizes my expected utility, because I have a high probability for the claim that I will have bad luck if I cross a black cat. Elster (1983, p. 15) says Bayesian decision theory is an incomplete theory of rationality because it does not deem this preference irrational. Unqualified Bayesianism has to say that the preference is not irrational; and I would agree that the preference is not necessarily irrational. But suppose that I myself accept that my view about the influence of black cats is irrational, but I have not been able to rid myself of it. Then I think I am irrational, and it seems bizarre to say I must be wrong about this.

A possible response here would be to say that when a person's norms and choices diverge, we should take the norms rather than the choices as determining the person's preferences. But we want to attribute preferences to people on matters about which they have no normative commitments; and we attribute preferences to beasts, who accept no norms at all. So this proposal would make preference a gerrymandered concept. Furthermore, the proposal would not remove the basic problem; for when a person's norms and choices diverge, it is not necessarily the norms that are correct.

Second argument against unqualified Bayesianism: We attribute probabilities and utilities to persons, even when we know that their preferences do not all maximize expected utility relative to any one p-u pair. My account of personal probability in Section 1.3 allowed for this, by saying that p and u are your probability and utility functions if they provide an interpretation of your preferences that is sufficiently good and is better than any competing interpretation. So now, suppose that most of my preferences maximize expected utility relative to p and u, and on this account p and u are my probability and utility


functions. But I have some preferences that do not maximize expected utility relative to p and u. Unqualified Bayesianism here endorses majority rule: It says that most of my preferences are rational and the few that diverge are irrational. But when I reflect on the conflict, I might well decide that it is the minority that are right and the majority that are wrong. There seems no reason to think it would always be the other way around.

For example, I might be in the grip of the gambler's fallacy and think that when a coin has been tossed and landed heads, it is more likely that it will land tails the next time. Here a probability is attributed to me because it fits my preferences regarding bets on coin tosses. However, I may have a few other preferences that do not fit. For example, suppose I would bet that there is no causal influence between different tosses of the coin, though such a bet does not maximize expected utility relative to the probability and utility functions attributed to me on the basis of most of my preferences. Unqualified Bayesianism implies that the latter preference is irrational; but in this case I think most of us would say this preference is rational, and it is the ones that maximize my expected utility that are irrational.

Third argument: Suppose that, for no reason, I suddenly come to give a high probability to the proposition that my office is bugged. Most Bayesians will agree that if this new opinion was motivated by no evidence, then I was irrational to adopt it. In the jargon to be introduced in Chapter 4, the shift violates "conditionalization." In Chapter 5, I will show that Bayesians are right about this, provided certain conditions hold. So, let us grant that I was irrational to shift from my former view to the new one that my office is bugged. Question: Now that I have made the shift, is it rational for me to accept an even-money bet that my office is bugged? Unqualified Bayesianism says it is, because accepting the bet maximizes my current expected utility. But even unqualified Bayesians agree that I was irrational to acquire my current probability function; thus, unqualified Bayesianism seems committed to the view that once errors are made, they should not be corrected. It is hard to see the merit in this view; indeed, I suggest that the view is patently false.

Finally, let me draw on an analogy with the principle of deductive closure. This principle is apt to be stated in an


unqualified form, according to which it holds that persons ought to accept the logical consequences of what they accept. But on reflection, we soon see that this principle is untenable. For we want to allow that when the consequences of what we accept turn out to be sufficiently implausible, we should not accept them but instead revise some of the things we now accept. Thus the principle is acceptable only if qualified; we should say that if it is rational to accept what you have accepted, then it is rational to accept the logical consequences of what you have accepted. This qualified version of the principle of deductive closure is analogous to qualified Bayesianism. The unqualified version of deductive closure, which we quickly saw to be untenable, is analogous to unqualified Bayesianism.

Of course, qualified principles are not as informative as unqualified ones. Nevertheless, they can assist in making decisions. Consider first the qualified version of deductive closure. If you find that a proposition is a consequence of those you have accepted, the principle tells you that either the proposition should be accepted or something you have accepted should be abandoned; and though it does not tell you which to do, you will commonly be able to make a judgment in favor of one option or the other. The contribution that qualified Bayesianism can make to resolving decision problems is similar. You can determine which options in a decision problem maximize your expected utility, and qualified Bayesianism then tells you that you should either choose one of those options or else have different probabilities or utilities. Normally you will be able to make a judgment in favor of one or other of these alternatives. Here again, as in Section 1.2, we see that Bayesian decision theory is an aid to rational decision making though not an algorithm that can replace good sense.

I turn now to the argument that transitivity, normality, and independence are requirements of rationality. The reader who needs no convincing on this matter could skip the next two chapters.


Transitivity and normality

In this chapter, I begin by noting that transitivity and normality enjoy wide endorsement. I then consider arguments for transitivity and normality, and give reasons for thinking that these do not provide additional reason to endorse transitivity and normality. Then I consider arguments against transitivity and normality, and give reasons for thinking that they too are ineffective. I conclude that transitivity and normality are about as secure as any useful principle of rationality is likely to be.

2.1 POPULAR ENDORSEMENT

Although violations of transitivity and normality are not uncommon, most people, on discovering that they have made such a violation, feel that they have made a mistake. This indicates that most people already regard transitivity and normality as requirements of rationality.

Experimental evidence for this was obtained by MacCrimmon (1968). MacCrimmon's subjects were business executives in the Executive Development Program at the University of California at Los Angeles. In one of these experiments, subjects were asked to assume that they were deciding on the price of a product to be produced by their company. They were also asked to assume that four of the possible pricing policies would have the following consequences.

A. Expected return 10%; expected share of market 40%
B. Expected return 20%; expected share of market 20%
C. Expected return 5%; expected share of market 50%
D. Expected return 15%; expected share of market 30%

The subjects were presented with all possible pairs of these options and asked to indicate a preference. Options from two other transitivity experiments were interspersed with the presentations of these options.

In this experiment, only 2 of 38 subjects had intransitive preferences. Altogether, in the three experiments (which all had a similar structure), 8 of the 114 subject-experiments exhibited intransitivity. After the experiment, subjects were interviewed and given a chance to reflect on their choices. MacCrimmon writes:

During the interview, 6 of these 8 subjects quickly acknowledged that they had made a "mistake" in their choice and expressed a desire to change their choices. The remaining 2 subjects persisted in the intransitivity, asserting that they saw no need to change their choices because they focused on different dimensions for different choice pairs. The fallacy of this reasoning was not apparent to them in a five-minute discussion.

Thus MacCrimmon's results support the view that an overwhelming majority of people regard transitivity as a requirement of rationality; but they also show that some people do not.¹

After making the pairwise comparisons, subjects in MacCrimmon's experiment were asked to rank order all four options. MacCrimmon refers to a discrepancy between the rank ordering and the pairwise comparisons as a "choice instability." Any violation of normality would be a choice instability, but the converse does not hold. MacCrimmon reports:

Only 8 of the 38 subjects had no "choice instabilities" in the three experiments. The other 30 subjects (except for the 2 who persisted in intransitivities) all wished to change their choice (most would change the binary choice, but the difference was not significant). They generally attributed the "choice instability" to carelessness in their reading or thinking.

¹Tversky (1969) obtained a higher rate of intransitivity than occurred in MacCrimmon's study, but he too found that "the vast majority" of subjects regarded transitivity as a requirement of rationality.


MacCrimmon's results thus provide evidence that commitment to normality is at least as widespread as commitment to transitivity. There are, however, the holdouts. Furthermore, some of those who reject transitivity and normality are well-informed students of decision theory. For that reason, arguments have been offered that attempt to derive transitivity, and sometimes normality also, from more compelling principles. I discuss such arguments in the next section.

2.2 ARGUMENTS FOR TRANSITIVITY AND NORMALITY

I first discuss the best-known argument for transitivity, the money pump argument. I show that this argument is fallacious. I then consider three other arguments, which I take to be sound, but whose premises seem to me no more plausible than transitivity and normality themselves. Thus my ultimate conclusion is that none of these arguments adds much to the case for transitivity and normality. However, this is not a ground for doubting transitivity and normality; it rather reflects the difficulty of finding premises more plausible than transitivity and normality.

2.2.1 The money pump

The money pump argument is presented by Davidson, McKinsey, and Suppes (1955), Raiffa (1968), and in many other places. It goes like this:

Suppose you violate transitivity; then there are acts f, g, and h such that f ≺ g ≺ h ≺ f. If you had h, then since h ≺ f, you should be willing to pay a small premium to exchange h for f. But then, since f ≺ g ≺ h, you should be willing to exchange f for g, and then g for h. But now you are back where you were, except poorer by the premium you paid to exchange h for f. Furthermore, we could go around the cycle repeatedly, collecting a premium from you each time you exchange h for f. Thus you are said to be a money pump.

This is such a simple and vivid argument that it is a pity it is fallacious. But fallacious it is. The fallacy lies in a careless analysis of sequential choice. The argument assumes that someone with intransitive preferences will make each choice without any thought about what future options will be available, yet this is



Figure 2.1: The money pump decision tree

not in general a rational way to proceed. For example, suppose f ≺ g ≺ h ≺ f, that you now have h, and that you have the opportunity to exchange h for f, at a small cost d. Suppose you know that if you make the exchange, you will then be offered g for f; and subsequently h for g. Then your decision problem is really the sequential decision problem represented in Figure 2.1. In this tree, the nodes (which are numbered) represent points at which you have to make a decision. Node 1 represents the point at which you decide whether to exchange h for f, and pay d for the privilege. If you do not make the exchange, you get h; otherwise you move to node 2. At node 2 you have to decide whether to exchange f for g; if you do not make this exchange, you get f - d; otherwise, you move to node 3. At node 3, you have to decide whether to exchange g for h.

The money pump argument assumes that since g ≺ h, then g - d ≺ h - d, and hence that you would be willing to exchange g for h at node 3. But if you knew you would do this, then the choice at node 2 is effectively between f - d and h - d. And the money pump argument assumes that since h ≺ f, it is also the case that h - d ≺ f - d. Consequently, at node 2 you ought to choose f - d, and hence refuse to exchange f for g. The cycle, envisaged by the money pump argument, is broken at this point.
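
The backward induction just described can be made explicit. The following sketch (Python) encodes the agent's pairwise preferences directly, since intransitive preferences cannot be represented by numerical scores; the outcome labels and the 'better' table are my own illustrative device, not the author's notation. Here 'x-d' stands for outcome x less the premium d.

    # Pairwise strict preferences for an agent with the intransitive
    # cycle f < g < h < f. better[(x, y)] is the option taken from
    # the pair {x, y}.
    better = {
        ("f-d", "g-d"): "g-d",   # f < g
        ("g-d", "h-d"): "h-d",   # g < h
        ("h-d", "f-d"): "f-d",   # h < f
        ("h", "f-d"): "f-d",     # h < f, and the premium d is small
    }

    def pick(x, y):
        # Choose from the pair {x, y} according to the preferences.
        return better.get((x, y)) or better.get((y, x))

    # Work backward through the tree of Figure 2.1.
    node3 = pick("g-d", "h-d")   # offered h in exchange for g
    node2 = pick("f-d", node3)   # offered g for f, foreseeing node 3
    node1 = pick("h", node2)     # offered f for h, at cost d

    print(node1)  # -> f-d

Foreseeing that node 3 would yield h - d, the agent treats node 2 as a choice between f - d and h - d, keeps f, and the pump never gets a second turn.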

This analysis assumes you know what choices will be offered to you in the future. If this were not known, then you might well make choices that would leave you worse off than when you started. But this fact does not provide much of an argument against intransitivity. An investor who buys stocks and


later sells them at a lower price is not necessarily irrational; things have just not turned out the way they were expected to. Similarly, persons with intransitive preferences may argue, the fact that they might unexpectedly end up with a loss does not show that they are irrational.²

2.2.2 Consequentialism

Consequentialism is the principle that the value of acts is determined by their consequences in the various possible states of the world. Hammond (1988a) argues that this principle suffices to derive many of the postulates of a representation theorem, including transitivity and normality. Hammond's argument is like the money pump argument in that it considers sequential choices over time, but Hammond does not make the assumption of myopia that vitiated the money pump argument. As presented by Hammond, the argument is couched in formidable notation; I will try to state it in English.

Hammond begins by laying down a principle that I will call rigidity. (Hammond calls it "consistency.") This principle requires that the choices a person would prefer on reaching any choice node in any decision tree are the same as those the person would now prefer if now faced with the choices available at that node. Hammond claims that rigidity is a very weak principle, and to support that claim he discusses the following example: A potential drug addict faces the decision of whether or not to abstain entirely from a drug; and if the drug is tried, then there is the decision of whether or not to quit. Figure 2.2 represents the situation. We can suppose that initially (at node 1), what the potential addict most prefers is to use the drug, then quit before permanent damage is done. However, by the time node 2 is reached, the potential addict will have become an actual addict and will choose to continue use. Still, Hammond argues, rigidity is satisfied in this example. For he claims (p. 36, line 1)

²Lehrer and Wagner (1985) argue for transitivity by showing that in a particular example, if you have intransitive preferences, then the following holds: If you make sequential choices myopically in accordance with those preferences, then you may end up with an option that you disprefer to some other option that was available. To infer from this that intransitivity is irrational is to make the same sort of error as I have just identified in the money pump argument.



Figure 2.2: The potential addict's problem

that the potential addict would now choose to continue use if now faced with the choices at node 2.

The rationale for the latter claim is presumably that someone faced with the choices at node 2 has already tried the drug, and hence the potential addict faced with these choices must be addicted. But this need not be so. Suppose the potential addict has never tried the drug, is offered the choice between a small and a large quantity of it, has no other possibility of procuring the drug, and is required to consume at least the small quantity. Then the choices are those at node 2, but the potential addict is not yet addicted; and so we can expect that the potential addict would choose the small quantity (which amounts to choosing to use and then quit). But still, we can suppose, it remains the case that at node 2 in the tree in Figure 2.2, the potential addict will have become an actual addict and thus will prefer continued use. Then contrary to what Hammond claims, rigidity is violated in this example.

So rigidity is a substantive assumption, and the question becomes whether it is plausible as a principle of rationality. The fact that we are all potential addicts, and are not irrational on that account, suggests that rigidity is not a requirement of rationality. Perhaps it will be urged that an actual addict has irrational preferences, and this is the source of the failure of rigidity. But if this defense is relied on, then Hammond's argument would only show that transitivity and normality are satisfied by those who not only are rational, but also would be rational under all future contingencies. Since none of us are in that position, this


would leave open the possibility that transitivity and normality are not requirements of rationality for us.

In any case, potential addiction is not the only way in which normal people violate rigidity. Consider a ten-year-old boy who prefers that he never get involved with women, but who will have very different preferences in a few years' time. Here there is a failure of rigidity, but I don't think we want to say that either the ten-year-old or the sixteen-year-old is irrational. A possible response to this would be to say that the trees in Hammond's argument can be limited to ones that will be completed in a short period of time, so that significant maturation cannot occur within the tree. But do we really want to make the defense of transitivity and normality rest on contingent assumptions about the rate of human maturation?

There are other objections that can be raised against rigidity, and I will present some of them in Section 3.2.2. I don't press them here, because I guess that Hammond would respond to any one in a way that meets them all. Hammond remarks that "the potential addict is really two (potential) persons, before and after addiction" (p. 36). If we are using usual criteria of personal identity, then this statement seems clearly false; addiction does not in general destroy personal identity. But the statement can alternatively be interpreted as stipulating an idiosyncratic notion of personal identity, according to which drug-induced changes in preferences destroy personal identity. If we are going to take this step, we might as well go the whole way and stipulate that any change of preference over time destroys personal identity. Thus all objections to rigidity are met. Rigidity then becomes, not merely a very weak principle (as Hammond claims), but an empty tautology.

In addition to rigidity, Hammond adopts another principle, which he says is implied by consequentialism. The principle is this: The consequences you can obtain in a decision tree, choosing at each node in accordance with your preferences at that node, depend only on the consequences available in the tree and thus are independent of the structure of the branches in the tree. I will call the latter principle path independence. Hammond shows that rigidity and path independence together entail transitivity and normality.


Figure 2.3: Trees to illustrate (*)

If rigidity is assumed to hold, then the person referred to in the path-independence principle is one whose preferences do not change as the person traverses the tree. What path independence then amounts to is this:

(*) The options you can obtain in a decision tree, by choosing at each node in accordance with the preferences you now have for the options at that node, are a function of the consequences available in the tree.

I think Hammond is most illuminatingly read as showing that (*) entails transitivity and normality.

To see why this entailment holds in a particular case, suppose you violate normality by having f ≺ g, but C{f, g, h} = {f}. Then in Figure 2.3 tree (a), choosing in accordance with your preferences would give you f. But in tree (b), choosing in accordance with your current preferences at node 2 would give you g; thus you cannot obtain f by choosing in accordance with the preferences you now have at each node in tree (b). Since the same consequences are available in trees (a) and (b), this is a violation of (*).

Is this violation of (*) patently irrational? Someone who rejects normality might claim that there is nothing irrational about having the options one can obtain depend on the structure of the decision problem. Alternatively, the opponent of normality could concede that the options a person can choose should not depend on the structure of the problem, but could


deny that this shows violations of normality are irrational; instead, it could be maintained that what the case shows is that rational persons should sometimes change their preferences as they move through a decision tree. For example, in tree (b) of Figure 2.3, this view would hold that what you should do is proceed to node 2, and as you do so change your preferences to have g ≺ f, thus ensuring that you choose f. McClennen (1990) would make this second response.

Now we may well claim that these objections to (*) are mistaken. But that claim seems to me no more obvious than the claim that transitivity and normality are requirements of rationality. So while I personally would endorse (*), I do not think we add to the credibility of transitivity and normality by deriving them from (*). In other words, I judge Hammond's argument to be sound but not useful.

2.2.3 Modesty

I now present another argument based on consideration of sequential choice situations. The premise of this argument may seem more compelling than Hammond's, at least initially. Though the argument can be generalized to derive both transitivity and normality, I here keep things simple by considering it only as an argument against the most extreme type of violation of transitivity.

Let us say that your preferences are strictly intransitive if for some f, g, and h, you have f ≺ g ≺ h ≺ f. If your preferences are like this, then in certain sequential decision problems, you yourself would prefer to have different preferences to those you actually have. For example, consider the decision problem represented in Figure 2.4. Since f ≺ g, you may anticipate that you would choose g at node 2, in which case your choice at node 1 is effectively between g and h. Since g ≺ h, you would then choose h. But now consider what would happen if you had f ≻ g while other preferences remain unchanged. Then you anticipate that at node 2 you would choose f; hence your choice at node 1 is effectively between f and h; and so, since h ≺ f, you would choose to proceed to node 2 and obtain f. Thus with your present preferences, you would obtain h; but by reversing the preference between f and g you would obtain f. Since



Figure 2.4: Tree to show strict intransitivity entails modesty

h ≺ f, you therefore prefer that in this decision problem your preference be different from what it is. We can describe this by saying that your preferences are modest, since they themselves deem different preferences better. And this seems to be a failure of rationality.³

Note, though, that this argument assumes that you know your preference between f and g will be the same when you reach node 2 as it is now. If you knew you would at node 2 reverse your preference between f and g, you would not prefer to now have different preferences, and thus your preferences would not be modest. Thus an opponent of transitivity could agree that someone with modest preferences is irrational but claim that in the present example the irrationality derives, not from the violation of transitivity, but from the failure to change preferences appropriately when moving from node 1 to node 2. This would be the position of McClennen (1990).

Modesty is the property that you prefer that your current preferences be different. What I have just noted is that strict intransitivity entails modesty only if you know that your preferences

³The following might appear to be a counterexample to this claim: If I preferred buying XYZ Corporation's stock tomorrow to not buying, I would have evidence that the stock will rise, and thus I would be likely to make a profit. But in fact, I don't have such evidence and prefer not buying the stock. Thus I seem to prefer having different preferences to those I actually have, without being irrational. But here what I prefer is that I change my preferences as a result of a learning experience; if it were within my power to just choose to have different preferences, then absent any learning, I prefer to leave my preferences as they are. By contrast, the case described in the text is one where you would want to choose to have different preferences, without any learning. I mean to refer only to situations of the latter kind when I call preferences modest.



Figure 2.5: Choice of future preferences

at future nodes will be the same as your current preferences for the options available at those nodes. Thus modesty by itself cannot be used to argue for transitivity. However, we might try this: Say that your preferences are diachronically modest if you now prefer that at some future node in a decision problem, your preferences would be different from your current preferences. It might seem that diachronic modesty is also a failure of rationality, and that the argument so far shows that strict intransitivity entails diachronic modesty.

The main problem with this argument is with the notion of preferences for future preferences. To interpret such preferences, we need to connect them with choice. Presumably, to say "you now (prior to choosing at node 1) prefer that at node 2 you prefer f" implies this: If you could now choose what preference you would have at node 2, and then make your choice at node 1, you would choose to prefer f. But then the decision tree we are considering becomes the one shown in Figure 2.5. Here the choice of what preference to have at node 2 is made at node 0, and nodes 1a and 1b correspond to node 1 in Figure 2.4. If at node 0 you anticipate choosing in accordance with your current preferences at node 1 (a or b), then you anticipate that choosing to proceed to node 1a will result in obtaining f, while choosing to proceed to node 1b will result in obtaining h. Since h ≺ f, you will therefore proceed to node 1a and thus choose to have your preference between f and g reversed at node 2. But notice that in this reasoning I had to assume that you anticipate that at node 1 you will choose in accordance with your current preferences (i.e., your preferences at node 0). An opponent


of transitivity can point out that he has already rejected the assumption that preferences should be constant; and indeed, it was precisely to try to get around that objection that we got involved with preferences about future preferences. So this move in fact gets us nowhere.

Thus the argument from modesty must be as I originally stated it. And what we have seen it to rest on is the following principle:

In any decision tree, if you knew you would choose at future nodes in accordance with your current preferences, you would not prefer that your current preferences be different than they in fact are.

While I personally find this a plausible principle of rationality, I do not think that it is more compelling than transitivity and normality themselves. I have already indicated how McClennen would reject it.

2.2.4 Principles α and β

Sen (1971) shows that if a choice function C satisfies certain properties, then the choice function is normal, and preferences are transitive. This result can be regarded as providing yet another argument for transitivity and normality. It is the last argument I will consider.

One of the properties assumed by Sen is

Property α. If S and T are sets of acts, and S is a subset of T, then any act f in both S and C(T) is also in C(S).

That is: An act that is best in a given set of alternatives must also be best when the number of alternatives is reduced.

Another property needed for Sen's result is

Property β. If S and T are sets of acts, and S is a subset of T, and if f and g are both in C(S), then f is in C(T) iff g is also in C(T).

That is: When the set of alternatives is enlarged, two acts that were initially best either both remain best, or both cease to be


best. Adding alternatives cannot make only one of the initially best options cease to be best.

I will say that a choice function C is connected if for any nonempty set of acts S, C(S) is nonempty. Then the result proved by Sen is this: If C is a connected choice function satisfying properties α and β, then C is normal and binary preferences are transitive.
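
For readers who want the conditions stated operationally, here is a sketch (Python; the maximizing choice function and its utility numbers are an invented example) that checks properties α and β over all nonempty sets drawn from three acts:

    from itertools import combinations

    acts = ("f", "g", "h")
    SETS = [frozenset(c) for n in (1, 2, 3)
            for c in combinations(acts, n)]

    def alpha_holds(C):
        # Anything chosen from T that lies in a subset S of T must
        # also be chosen from S.
        return all(C[T] & S <= C[S]
                   for T in SETS for S in SETS if S <= T)

    def beta_holds(C):
        # If f and g are both chosen from S, a subset of T, then f is
        # chosen from T iff g is chosen from T.
        return all((f in C[T]) == (g in C[T])
                   for S in SETS for T in SETS if S <= T
                   for f in C[S] for g in C[S])

    # A connected choice function generated by maximizing a toy
    # utility; it satisfies both properties.
    U = {"f": 1, "g": 2, "h": 3}
    C = {S: frozenset(a for a in S if U[a] == max(U[b] for b in S))
         for S in SETS}

    print(alpha_holds(C), beta_holds(C))  # -> True True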

If we were to hold that rational persons must have connected choice functions, then this result would give us an argument for the rationality of transitivity and normality. But I have allowed that rational persons need not have connected choice functions. And Sen's result can fail when the choice function is not connected. That is, a person whose choice function is not connected can satisfy properties α and β, while violating transitivity or normality (or both).⁴ Thus Sen's result as it stands does not give a sound argument for transitivity or normality. I now consider how this defect might be rectified.

Although a person can rationally have a choice function C that is not connected, it is plausible that C should satisfy this condition:

Extendability. C is extendable to a connected choice function that can be held rationally.

Extendability is satisfied just in case there is a rational connected choice function C* such that, whenever C(S) ≠ ∅, C(S) = C*(S).

Another plausible rationality requirement is

Closure. If every rational connected extension C* of C is such that C*(S) = T, then C(S) = T.

In other words, if the requirement of extendability fixes what C(S) must be if it is nonempty, then C(S) must now have that value. One might object to closure on the ground that it may take an unreasonable amount of thought to figure out that C(S) can be given a nonempty value in only one way, if extendability is not to be violated; but I assume we are dealing

⁴For example, if f ∈ S, and g ≼ f for all g ∈ S, but C(S) = ∅, then normality is

violated, but there need be no violation of α or β.


with situations in which the options are important enough that doing the necessary thinking is worthwhile.

Now a modified version of Sen's result, which applies even to unconnected choice functions, is this: If a choice function satisfies principles α and β, plus extendability and closure, then normality and transitivity are satisfied. (I omit the proof.)

If the reader finds this a compelling argument for transitivity and normality, well and good. But my own judgment is that while I accept the premises of the argument, I don't find them more compelling than transitivity and normality themselves. The attractiveness of principles α and β seems to derive from the view of acts as ordered in a way that does not depend on what other acts are available, and this is just what transitivity and normality assert. Furthermore, the fact that we needed to appeal to two additional principles (extendability and closure) tends to undercut the force of the justification.

This completes my discussion of arguments for transitivity. I have contended that one of these arguments (the money pump argument) is fallacious. If somebody finds one or more of the other arguments compelling, I have no quarrel with that. But I have stated my own view that when the assumptions of these arguments are carefully scrutinized, the arguments do not succeed in adding to the initially high plausibility of transitivity and normality. This result can be seen as a compliment to the plausibility of transitivity and normality, rather than a strike against them. But if we have no good argument for transitivity or normality, it becomes all the more important to consider whether there is a good argument against them; to this I now turn.

2.3 OBJECTIONS TO TRANSITIVITY AND NORMALITY

Some writers on decision theory have explicitly rejected transitivity; a smaller number have also rejected normality. In this section, I will discuss some of the reasons that have been given for rejecting these conditions. The discussion will mostly concern transitivity, since this is the condition that has received most critical attention; but I will discuss one objection that is aimed at normality as well as transitivity (Levi's). I will argue


that the reasons offered for rejecting transitivity and normality have little, if any, force.

Recall that one possible objection to transitivity has already been dealt with in Section 1.5. We saw there that if lacking a view about the relative merits of two options were identified with indifference between those options, then transitivity could well be violated. I headed off this objection by insisting on a distinction between those two attitudes. If you lack a view about the relative merits of f and g, then the preference relation, as I interpret it, is not defined between these options.

The objections to transitivity that I will now discuss all purport to show that there can be good reasons to violate transitivity, even when preference is interpreted as I have proposed. I start with the least cogent objection and end with what I take to be the most serious challenge to transitivity.

2.3.1 Probabilistic prevalence

If a, b, and c are three quantities, it is possible that p(a < b), p(b < c), and p(c < a) are all greater than 1/2. Bar-Hillel and Margalit (1988) refer to this phenomenon as probabilistic prevalence. Blyth (1972), Packard (1982), Anand (1987), and Bar-Hillel and Margalit have all claimed that the phenomenon leads to intransitive preferences.⁵

Consider a concrete example, due essentially to Blyth. Let a, b, and c denote the times taken by three runners A, B, and C in a race. Assume the probability distribution of a, b, and c is as follows. (Here times are in minutes (') and seconds (").)

p(a = 1'0", b = 1'1", c = 1'2") = .3
p(a = 1'2", b = 1'0", c = 1'1") = .3
p(a = 1'1", b = 1'2", c = 1'0") = .4

Then p(a < b) = .7, p(b < c) = .6, and p(c < a) = .7.

5. Bar-Hillel and Margalit argue for intransitivity in choice patterns, which they distinguish from preference. But given the connection between preference and choice that I have set out in Section 1.4, their intransitivities in choice are intransitivities in preference. (Their discussion of the relation between preference and choice, in the final section of their paper, (a) conflates preferences with motives; and (b) erroneously claims that choice is extensional.)
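Each of these probabilities is just a sum over the three possible outcomes, so the claim is easy to check mechanically. Here is a minimal Python sketch (the encoding, including names like `prob`, is mine, not Blyth's; times are in seconds):

```python
# The distribution from the text, with times in seconds.
dist = [
    ((60, 61, 62), 0.3),   # (a, b, c) = (1'0", 1'1", 1'2")
    ((62, 60, 61), 0.3),   # (a, b, c) = (1'2", 1'0", 1'1")
    ((61, 62, 60), 0.4),   # (a, b, c) = (1'1", 1'2", 1'0")
]

def prob(event):
    """Total probability of the outcomes where `event` holds."""
    return sum(p for outcome, p in dist if event(outcome))

print(round(prob(lambda t: t[0] < t[1]), 10))   # p(a < b) = 0.7
print(round(prob(lambda t: t[1] < t[2]), 10))   # p(b < c) = 0.6
print(round(prob(lambda t: t[2] < t[0]), 10))   # p(c < a) = 0.7
```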


Blyth does not explain how this phenomenon is supposed to lead to intransitive preferences. Bar-Hillel and Margalit fill out the argument; they say that "A is a better bet than B when these two are the competing runners, B is a better bet than C when these two are the competing runners, and C is a better bet than A when these two are the competing runners" (1988, p. 131). They claim we have here a case of strict intransitivity which is "perfectly justified and rational" (1988, p. 132).

But in fact, there is no violation of transitivity in this example. Let f_AB be the act of betting that A will beat B; and similarly for the other pairs of runners. Assuming these bets are at even odds, then what we have in this example is that

f_BA ≺ f_AB, f_CB ≺ f_BC, and f_AC ≺ f_CA.

But this is a violation of transitivity only if f_AB = f_AC, f_BA = f_BC, and f_CA = f_CB. Now an act determines what consequences will be obtained in what states. Indeed, in Savage's (1954) presentation of decision theory, acts are identified with functions from the set of states to the set of consequences. Thus two acts can be said to be identical just in case they give the same consequences as each other in every state. One possible state is

a = 1'0", b = 1'1", c = 1'2".

In this state f_BA gives a loss, while f_BC gives a profit. So f_BA and f_BC give different consequences in some states, and hence are not identical acts. The other required identities are similarly shown to be false.

2.3.2 Shifting focus

Arthur Burks writes:

A subject may prefer act a1 to act a2 on the basis of one criterion, a2 to a3 on the basis of another criterion, a3 to a1 on the basis of a third, and yet be unable to measure the relative strength of his desire for each act in terms of each criterion or to weigh the relative merits of the three criteria. For example, he may prefer a Volkswagen to a Ford on the basis of size, a Ford to a Plymouth on the basis of safety, a Plymouth to a Volkswagen on the basis of speed, and yet be unable to rank the criteria of size, safety, and speed. In this case it seems reasonable for his preferences to be intransitive. (1977, p. 211)


The person in Burks's car example regards smallness, safety, and speed as desirable features in a car. Smallness is a reason for favoring the Volkswagen over the Ford, but on the other hand, speed and safety are reasons to favor the Ford over the Volkswagen. But then, the fact that the Volkswagen is smaller than the Ford is no compelling reason to prefer the Volkswagen to the Ford overall. It may be said that the Volkswagen is much smaller than the Ford, while the differences in speed and safety are not so dramatic. But as against that, it can be said that speed and safety are two factors, while size is only one. So we have been given no good reason why the Volkswagen should be preferred to the Ford overall. Similarly, we have been given no good reason why the Ford should be preferred to the Plymouth, or the Plymouth to the Volkswagen. Hence, I do not see any good reason to think that in this case it is reasonable to have intransitive preferences.

Burks is careful to say that the person in his example is unable to decide on the relative importance of the three desiderata. But in that case, it seems that the person ought to lack a view about whether the Volkswagen is better than the Ford, not focus on one desideratum for this comparison and ignore the others. So this feature of Burks's case does not provide any support for the conclusion that preferences can reasonably be intransitive.

After the passage quoted, Burks goes on to say the reason for not ranking the three options in a single transitive ordering may be that it is hard to do. But if we are looking for an easy basis for choosing between the options, we could simply rank all three on the basis of, say, safety alone. There is no reason why one comparison should be made on the basis of one desideratum alone, while other comparisons are made by using different desiderata in isolation. Anyway, the case now seems to be becoming one in which the rationale for intransitivity is that the decision problem is not sufficiently important to justify the computational costs that might be involved in achieving transitivity; and I have conceded (in Section 1.3) that intransitivity need not be irrational under these conditions.

So while shifting focus in different comparisons can lead to intransitive preferences, I do not think we have been given any reason for doing that, and certainly not in decision problems of sufficient importance. Hence this observation does not undermine the view that transitivity is a requirement of rationality, when the preferences are relevant to a sufficiently important decision problem.6

6. R. I. G. Hughes (1980) also argues for intransitivity on the basis of shifting focus. Unlike Burks, Hughes fills out his example in such a way as to make the preferences compelling. But so filled out, the options in the various pairwise comparisons are not the same, and there is no violation of transitivity. His mistake is the same one involved in the argument from probabilistic prevalence.

2.3.3 Majority rule

It is well known that majority rule can produce intransitive pairwise rankings. For example, suppose Fred, Gary, and Helen rank order the options f, g, and h as shown in Figure 2.6 (with 1 being the highest ranking). Then in a choice between f and g, a majority vote will select g; in a choice between g and h, a majority vote will select h; and in a choice between h and f, a majority vote will select f. This situation is sometimes referred to as the Condorcet paradox.
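The cycle can be verified directly from the rankings of Figure 2.6 (below). The following Python sketch (my encoding of the example; names like `majority_winner` are not part of the original discussion) tallies the pairwise majority votes:

```python
# The rankings of Figure 2.6 (1 = highest).
rankings = {
    "Fred":  {"f": 1, "g": 3, "h": 2},
    "Gary":  {"f": 2, "g": 1, "h": 3},
    "Helen": {"f": 3, "g": 2, "h": 1},
}

def majority_winner(x, y):
    """The option that a majority ranks above the other."""
    votes_x = sum(1 for r in rankings.values() if r[x] < r[y])
    return x if votes_x > len(rankings) / 2 else y

print(majority_winner("f", "g"))   # g
print(majority_winner("g", "h"))   # h
print(majority_winner("h", "f"))   # f, completing the cycle
```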

There have been several attempts to argue from the Condorcet paradox to the conclusion that intransitive preferences are sometimes reasonable. Perhaps the simplest such argument is one given by Bar-Hillel and Margalit (1988, p. 122). They suggest that a benevolent dictator might reasonably want to make whatever choices a majority of the subjects would vote for. In view of the Condorcet paradox, they immediately conclude that this dictator may have intransitive preferences.

The most natural interpretation of this scenario is that the dictator puts a value on doing what the majority would vote for. But if so, then there is no violation of transitivity. For example, suppose the dictator's subjects are Fred, Gary, and Helen, with rank orderings as in Figure 2.6. Then the consequence of choosing g over f for the dictator is not just whatever g gives but also the desired consequence of doing what the majority wants. Since an act must specify the consequences obtained in each state, and since consequences must be fully specific with regard to everything the decision maker cares about (Section 1.1), the act that the dictator chooses in this case cannot be identified with g.


          f    g    h
Fred      1    3    2
Gary      2    1    3
Helen     3    2    1

Figure 2.6: Preference rankings

Let us call it instead g⁺. Similarly, what choosing f would give in this case is not merely what f would give but also the undesirable consequence of not doing what the majority would vote for; we could call this option f⁻. Continuing in this way, we find that the dictator's preferences are f⁻ ≺ g⁺, g⁻ ≺ h⁺, and h⁻ ≺ f⁺. This is not a violation of transitivity, because f⁻ ≠ f⁺, and similarly for g and h.

But perhaps it will be stipulated that this dictator does not put any intrinsic value on doing what the majority would vote for; instead the dictator cares only for the welfare of the subjects and on this basis prefers options that would be chosen by the majority. In that case, I would suggest that since the majority votes here produce intransitive pairwise rankings, the dictator has good reason to think that in this case the majority vote is a poor guide to what is best for the welfare of the subjects. Majority rule may be desirable because of such considerations as fairness and practicality, but apart from those considerations (which must be ignored to get a counterexample to transitivity), there is no reason to think that majority rule always maximizes the welfare of the subjects. Indeed, if the subjects are Fred, Gary, and Helen, and they are faced with the decision tree of Figure 2.4, a majority would at node 1 vote not to use majority rule at node 2. (The proof is essentially the same as the proof that strictly intransitive preferences are modest, given in Section 2.2.3.)

Another attempt to use Condorcet's paradox against transitivity considers a situation in which you must choose between options that differ in several different aspects that you care about. It is claimed that in such a case, you could reasonably prefer one option to another just in case it is superior in a


          Volkswagen   Plymouth   Ford
Size          1            2        3
Safety        2            3        1
Speed         3            1        2

Figure 2.7: Ranking of car qualities

majority of the relevant respects. And this rule leads to intransitive preferences.

For a concrete illustration, let us reconsider Burks's car example. Suppose that the ranking of the three cars with respect to size, safety, and speed is as in Figure 2.7. Applying majority rule to the aspects, we get that the Volkswagen is better than the Plymouth, the Plymouth is better than the Ford, and the Ford is better than the Volkswagen. (The opposite of the preferences Burks thought reasonable in this case!)

However, the fact that the Volkswagen is better than the Plymouth in two respects is not much of a reason to prefer the Volkswagen. The respect in which it is worse may be much more significant. If you have no opinion about the magnitude of the differences in the various aspects, then it seems to me more natural to be indifferent between these options, or else to lack a view about which of these options is best. So I see no great intuitive appeal in basing preferences on majority rule among attributes. Thus the fact that preferences formed in this way may be intransitive is not a good reason to deny that rational preferences must be transitive.

2.3.4 Levi

In a series of works (1974, 1980, 1986), Isaac Levi has insisted that rational persons need not have precise probabilities and utilities for all conceivable events and consequences. I have myself endorsed this view in Section 1.5. However, Levi's conception of what is involved in having indeterminate probabilities and utilities is different from mine. On the preference interpretation I have adopted, to have indeterminate probabilities or


utilities is to lack a view about what choices are best in some decision problems. Levi, on the other hand, provides rules that would determine, for any rational agent and any decision problem, which options are best in that decision problem. Thus Levi is committed to connectedness, in the sense in which I use that term.7 And as Levi himself has emphasized (1986, ch. 6), his rules for making choices when probabilities and utilities are indeterminate violate both transitivity and normality. Here I will briefly review those rules and argue that there is no compelling reason to accept them.

Levi refers to the set of probability functions that represent the person's probability judgments as the person's credal state. And he refers to the set of utility functions that represent the person's value judgments as the person's value structure. When probabilities are indeterminate, the credal state will contain more than one probability function; and when utilities are indeterminate, the value structure will contain essentially different utility functions. An option that maximizes expected utility relative to some probability function in the credal state and some utility function in the value structure is said to be E-admissible. Levi holds that E-admissibility is a necessary, but not sufficient, condition for a choice to be rational. He suggests that considerations of security may be used to discriminate between the E-admissible options. Given some partition, the security level of an act is the minimum of the utilities of the act given each element of the partition. Levi leaves it up to the person to determine what partition will be used for assessing security. He says an option is S-admissible if its security level (relative to the agent's favorite partition) is at least as great as that of any other E-admissible option. Levi holds that a rational choice should be S-admissible. If there is more than one S-admissible act, then further tests may be used to distinguish between them. The acts that survive all tests are said by Levi to be "admissible"; using the terminology of Section 1.6, I would say these acts constitute the choice set.

7. The concept of preference that I set out in Section 1.4 is what Levi (1986, p. 97) calls "basic revealed" preference; and that relation is connected for a rational person, according to Levi's decision theory.


[Figure 2.8: Example in which Levi's theory violates transitivity - a plot of the utilities of the acts f, g, and h (specified below) on the events A, B, and C.]

To see that this proposal violates transitivity, let {A, B, C} be a partition of the states of nature, and let acts f, g, and h be as follows:8

u(f) = 3 on A ∪ B, −2 on C
u(g) = 3 on A, −1 on B ∪ C
u(h) = 0 everywhere.

These acts are represented in Figure 2.8. Suppose that A ∪ B has a determinate probability of 2/3, but that the probabilities of A and B are each indeterminate with a lower bound of 0 and an upper bound of 2/3. Suppose further that security is assessed relative to the partition {A, B, C} and that no considerations beyond E-admissibility and security are invoked. Then Levi's theory requires f ≺ g, g ≺ h, and h ≺ f, giving strictly intransitive preferences.

Proof. In a choice between f and g, both options are E-admissible, since EU(f) = 4/3 for every probability function in the credal state, while EU(g) ranges from −1 to 5/3. Since the security level of f is −2, while that of g is −1, only g is S-admissible in a choice between f and g, whence f ≺ g. In a choice between g and h, both options are again E-admissible, but h has a security level of 0, and so h is uniquely S-admissible; hence g ≺ h. In a choice between h and f, only f is E-admissible, since EU(f) = 4/3 > 0 = EU(h); and so h ≺ f.

8. By "u(f) = 3 on A ∪ B", I mean that for all states x ∈ A ∪ B, u[f(x)] = 3. Similarly for the other cases.
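For readers who want the proof's arithmetic spelled out, here is a minimal Python sketch. The encoding of Levi's rules (function names, the grid over p(A)) is mine, not Levi's or Maher's, and covers only the two tests used above (E-admissibility, then security):

```python
utilities = {                      # u(act) on the events A, B, C
    "f": {"A": 3, "B": 3, "C": -2},
    "g": {"A": 3, "B": -1, "C": -1},
    "h": {"A": 0, "B": 0, "C": 0},
}

# Credal state: p(A ∪ B) = 2/3 exactly, p(A) anywhere in [0, 2/3].
credal_state = [{"A": i / 300, "B": 2/3 - i / 300, "C": 1/3}
                for i in range(201)]

def eu(act, p):
    return sum(p[e] * utilities[act][e] for e in ("A", "B", "C"))

def choice_set(options):
    """E-admissible options, then the S-admissible ones among them."""
    e_adm = {x for x in options
             if any(all(eu(x, p) >= eu(y, p) for y in options)
                    for p in credal_state)}
    security = {x: min(utilities[x].values()) for x in e_adm}
    best = max(security.values())
    return {x for x in e_adm if security[x] == best}

print(choice_set({"f", "g"}))      # {'g'}  so f < g
print(choice_set({"g", "h"}))      # {'h'}  so g < h
print(choice_set({"h", "f"}))      # {'f'}  so h < f
```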

One can also construct examples in which Levi's proposals violate normality, without violating transitivity.9

So Levi's proposal conflicts with both transitivity and normality. Is that any reason not to be committed to transitivity and normality? It would be if there were some compelling reason for adopting Levi's proposal; but I see no such reason. For one thing, Levi's proposal presupposes that a rational person must have a view about which acts are best in any decision problem; as I said in Section 1.5, this is implausible.10

But then what advice would I offer you, when you are faced with a decision problem in which you lack a preference about what to choose? In some cases, I would give you specific advice based on information I have that you may lack; for example, if you have no preference about whether or not to take an umbrella, and I have reason to think it will rain, I will advise you to take an umbrella. But what if you and I have the same information? I still might accept norms that govern your case as I understand it, and give you advice based on these. For example, if you have no preference about whether or not to mutilate yourself for no reason, my advice would be not to do it, because I think this would be irrational. In other cases, the norms I accept would say that any option can be rationally chosen, in which case my advice would be that it doesn't matter what you choose. Finally, my norms might be like your preferences and

9. One such example is provided by Cases 1, 2, and 4 of (Levi 1974), when P/S is between .4 and .5. Levi there cites this example as showing that his theory violates the "principle of independence of irrelevant alternatives."

10. In a paper presented at the 1986 Philosophy of Science Association meetings (Maher 1986a), I pointed out that there are many alternatives to Levi's decision rule and claimed that Levi had given no reason for preferring his rule to these alternatives. Levi, who was in the audience, responded that his account fits choices that are often made in the Allais and Ellsberg problems. That response prompted the research reported in (Maher 1989) and (Maher and Kashima 1991), which shows that the choices Levi is referring to are not made for the reasons envisaged by his theory. (These choices will be discussed in Chapter 3.)


not extend to the decision problem at hand; then I would say I don't know what you should do. I think my advice, incomplete as it may be, is likely to be better than Levi's arbitrary rules.

2.3.5 Discrimination threshold

It is plausible that when the difference between two options is too small to be perceptually detectable, then a person would be indifferent between those options. But if this prima facie plausible thesis is accepted, then transitivity of preferences becomes unsupportable.

To illustrate this, suppose you prefer your coffee with a spoon of sugar in it.11 You surely cannot distinguish between cups of coffee that differ by only one grain of sugar. If so, and if you are indifferent between options that are not perceptually distinguishable, then you violate transitivity.

11. This example is derived from (Luce 1956).

Proof. Let n denote the number of grains in a spoonful of sugar, and let gᵢ denote the option of receiving a cup of coffee with i grains of sugar in it. By assumption, gᵢ ~ gᵢ₊₁ for all i. One application of transitivity gives g₀ ~ g₂. A second application gives g₀ ~ g₃. Continuing in this way, we get by n − 1 applications of transitivity that g₀ ~ gₙ. But you do not satisfy this condition; for you, g₀ ≺ gₙ.

If we are to maintain transitivity, and also that your preferences here are rational, it seems we must say that despite appearances, you do prefer a cup of coffee with i + 1 grains of sugar to one with i grains, for some i between 0 and n − 1. In defense of that option, it can be said that this preference need only be very slight - so slight that it would be outweighed by other small differences. For example, you needn't be prepared to pay as much as a penny more for the cup with the extra grain of sugar, or even to reach significantly further to obtain it. Also, there is a cost of calculation, which makes it uneconomic to worry much about very minor differences. Thus in practice, a very slight preference will be essentially indistinguishable from indifference. Ordinary language might well not distinguish between such a slight preference and true indifference; in ordinary


life, there would be little point in making the distinction. But when we philosophize about long sequences of options, each similar to the next but with very different endpoints, we can use this distinction between very slight preference and indifference to resolve an apparent contradiction between reasonable-seeming preferences and a commitment to transitivity.

That defense of transitivity could be criticized, as follows. The reason you want sugar in your coffee is presumably for its effect on the taste. But you cannot detect the difference between cups of coffee differing by only one grain of sugar. Hence the taste must be the same in each case; and since the taste is the only reason you care about having sugar in coffee, it follows that you must be indifferent between cups of coffee differing by only one grain of sugar. So there must be a violation of transitivity here, after all.

I will argue that this objection is fallacious. But before criticizing it, let me clarify one possible source of confusion. Suppose you are presented with two cups of coffee differing by only one grain of sugar and you have no idea which is which. Then surely you will be indifferent as to which one you choose. But it does not follow from this that you are indifferent between cups of coffee differing by only one grain of sugar. The situation is rather that each of the choices open to you is a gamble with a 50-50 chance of giving either a cup with i grains of sugar, or one with i + 1 grains, for some i. Analogy: Suppose you are presented with two black boxes, one containing a million dollars and the other containing shredded newspaper, so that you cannot tell which is which. Offered your choice between these boxes, you will probably be indifferent between the two options open to you; but you are not indifferent between getting a million dollars and getting shredded newspaper.

So the sort of case we need to focus on is one in which you not only know that the two cups of coffee differ by only one grain of sugar, but also are informed which one has the extra grain, although you could not distinguish which was which if you had not been told. And the question that concerns us is whether, given that your only reason for wanting sugar in your coffee is for its taste, it follows that you must be indifferent


between receiving either of these cups of coffee. I maintain that it does not follow. To fix ideas, suppose the following are true for direct comparisons of cups of coffee:

(i) You cannot distinguish by taste between a cup containing no sugar and one containing 10 grains of sugar;

(ii) You cannot distinguish by taste between a cup containing 10 grains of sugar and one containing 20 grains;

(iii) You can distinguish by taste between a cup containing no sugar and one containing 20 grains of sugar.

According to the argument of two paragraphs back, (i) implies that for you, the taste of coffee containing no sugar is the same as the taste of coffee containing 10 grains of sugar. However, this implication is demonstrably false in the case at hand. By (iii), you can taste the difference between cups of coffee containing 0 and 20 grains of sugar. If we replace the sugarless coffee by a cup containing 10 grains of sugar, you can no longer taste a difference with the 20-grain cup. Hence, you can by taste distinguish between the 0-grain and 10-grain cups; so these must taste different. The fact that you cannot distinguish them in a direct comparison therefore shows, not that they produce the same taste sensation, but rather that in a direct comparison you are unable to distinguish the taste sensations they produce. And given that the taste is different, you can prefer the coffee with 10 grains of sugar to the sugarless coffee, even though you cannot directly distinguish between these on the basis of taste, and even though taste is the only reason you care for sugar in your coffee. The fallacy in the argument of two paragraphs back is that it tacitly makes the false assumption that differences of sensation must always be detectable in direct comparisons.

Let us use the notation a ≫ b to denote that a tastes detectably better than b in a direct comparison, and use a ≈ b to denote that a and b cannot be distinguished on the basis of taste in a direct comparison. Then it is plausible that a really tastes


better than b just in case one of the following three conditions holds:

• a ≫ b;
• a ≈ b and there exists c such that a ≈ c and c ≫ b; or
• a ≈ b and there exists d such that a ≫ d and d ≈ b.

If so, then someone who cares only about taste should have a ≻ b just in case one of these three conditions holds. Luce (1956) shows that, provided ≫ and ≈ satisfy reasonable conditions,12

the preferences of such a person are transitive. So, contrary perhaps to first appearances, there is an attractive way of conforming to transitivity in the coffee example.

12. The conditions are that for all a, b, c, and d:

• Exactly one of a ≫ b, b ≫ a, or a ≈ b obtains;
• a ≈ a;
• If a ≫ b, b ≈ c, and c ≫ d, then a ≫ d;
• If a ≫ b, b ≫ c, and b ≈ d, then not both a ≈ d and c ≈ d.

To forestall possible misunderstanding, let me make it clear that although I am using Luce's formal result, the interpretation I am putting on that result is different from Luce's. Luce accepted that intransitive indifferences were reasonable, and offered his result to show that utilities could be defined nonetheless, with indifference occurring when the difference in utility was smaller than some threshold. His abandonment of transitivity may be due to a failure to make the distinctions I have drawn in the two preceding paragraphs.
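To illustrate how this works, here is a Python sketch of a toy threshold model. The linear taste scale and the 15-grain detection threshold are my illustrative assumptions, not Luce's; the point is only that the preference derived from the three conditions above is transitive even though indistinguishability is not:

```python
THRESHOLD = 15
CUPS = range(31)                     # cups with 0..30 grains of sugar

def detectably_better(a, b):         # a >> b
    return a - b > THRESHOLD

def indistinguishable(a, b):         # a ~ b in a direct comparison
    return abs(a - b) <= THRESHOLD

def really_better(a, b):             # the derived preference above
    if detectably_better(a, b):
        return True
    if indistinguishable(a, b):
        if any(indistinguishable(a, c) and detectably_better(c, b)
               for c in CUPS):
            return True
        if any(detectably_better(a, d) and indistinguishable(d, b)
               for d in CUPS):
            return True
    return False

# Indistinguishability is intransitive, as in conditions (i)-(iii):
print(indistinguishable(0, 10), indistinguishable(10, 20),
      detectably_better(20, 0))      # True True True

# Yet the derived preference is transitive (brute-force check), and it
# ranks the 10-grain cup above the sugarless one via the 20-grain cup.
assert all(really_better(a, c)
           for a in CUPS for b in CUPS for c in CUPS
           if really_better(a, b) and really_better(b, c))
print(really_better(10, 0))          # True
```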

2.4 HUME AND MCCLENNEN

I have now considered arguments for and against transitivity and normality, and found both wanting. That leaves us where we started: Transitivity and normality are attractive normative principles endorsed by the vast majority. While violations are not rare, almost everyone feels they should correct those violations when they discover them. I think this is as good a case as we can hope to get for substantive normative principles; if further support is desired, it should be sought in successful applications of the principles. Thus I propose to proceed with a normative theory incorporating transitivity and normality.

The position I am taking here has been criticized by McClennen. He notes that there is not a consensus on principles like transitivity and normality, and holds that "Where there is no consensus, it is important to proceed with the greatest of caution. Otherwise, one ends up legislating for all about matters best left to individual discretion" (1990, p. 86). He proposes to follow Hume, and say that choice is irrational only if it is inconsistent with the chooser's ends. As McClennen interprets it, this seems to boil down to saying that the only principle of rationality is that you should not make choices that are certain to leave you worse off than some other alternative that was available. As the previous discussion of modesty will have indicated, a person who violates transitivity and normality need not violate this principle, provided preferences are revised over time in suitable ways. Thus McClennen rejects transitivity and normality as requirements of rationality.13

McClennen's conception of rationality is similar to the consistency conception discussed in Section 1.7. As I argued there, this is a notion of rationality that is not adequate for the sort of purposes that concern me. The principle that one should not accept sure losses is not a sufficient basis on which to build a normative theory of scientific method, and thus acceptance of McClennen's strictures would make such a theory impossible.

Let us then evaluate McClennen's argument for his weak conception of rationality. His premise is that we should not adopt normative principles that lack unanimous support. Why should we accept this premise? Perhaps unanimity on normative principles is a desirable thing, and one way to achieve this would be for everyone to weaken the norms they accept to the common denominator. But this would involve many of us abandoning normative principles we accept, and speaking for myself, I don't think this is a price worth paying. Thus I doubt that McClennen's premise itself enjoys anything like unanimous endorsement. Thus if we accept McClennen's premise, we should reject it. Hence we should reject it.

13. Though this is not how McClennen himself puts the matter. He says that he accepts transitivity for a fixed set of options, and rejects only that the ordering must be unchanged by adding or removing options. But this separates preference from choice in a way that I think makes the notion of preference meaningless. For example, I can make no sense of what it would be to have f ≺ g ≺ h if the options are f, g, and h, but g ≺ f if the options are f and g. As I use the term, binary preference relates to choices where the available options are binary. With preference so understood, McClennen rejects transitivity and normality.


I suggest that our reaction to differences of opinion over transitivity and normality should instead be to let "a hundred flowers blossom, and a hundred schools of thought contend" (Mao 1966, ch. 32). Since foundational arguments have been found inadequate to settle the issue either way, advocates of different positions should get to work developing theories based on their preferred principles. We can then use our judgments of the resulting theories to help decide between the principles. In later chapters of this book, I try to develop further the theory based on transitivity and normality by applying it to the acceptance of scientific theories and the delineation of scientific values. I encourage critics of transitivity and normality to tell us how they would deal with these issues.


3

Independence

Having argued for the legitimacy of assuming transitivity and normality, I now turn to the independence principle (defined in Section 1.3). Independence does not have the kind of popular endorsement that transitivity does. Nevertheless, I show that independence does follow from premises that are widely endorsed. The argument that I give differs at least subtly from those in the literature, and I explain why I regard my argument as superior. I also explain why arguments against independence strike me as uncompelling.

3.1 VIOLATIONS

In Chapter 2 we saw that most people endorse transitivity and normality. With independence, on the other hand, there is fairly strong prima facie evidence that many people reject it. Part of this evidence consists of decision problems in which a substantial proportion of people make choices that appear inconsistent with independence; there is also evidence that to many people the axiom itself does not seem compelling.

3.1.1 The Allais problems

Maurice Allais was one of the earliest critics of independence, and backed up his position by formulating decision problems in which many people have preferences that appear to violate independence. I will present these problems in the way Savage (1954, p. 103) formulated them, since this brings out the conflict with independence most clearly.

A ball is to be randomly drawn from an urn containing 100 balls, numbered from 1 to 100. There are two separate decision problems to consider, with options and outcomes as in Figure 3.1. Studies have consistently found that a substantial proportion


Problem A

        1            2-11         12-100
a1   $1,000,000   $1,000,000   $1,000,000
a2   $0           $5,000,000   $1,000,000

Problem B

        1            2-11         12-100
b1   $1,000,000   $1,000,000   $0
b2   $0           $5,000,000   $0

Figure 3.1: The Allais problems

of subjects choose a1 in Problem A and b2 in Problem B. Assuming that these choices reflect strict preferences, it follows that these subjects have a2 ≺ a1 and b1 ≺ b2 - preferences that are inconsistent with independence if the consequences are taken to be the monetary outcomes. Allais claimed that these were reasonable choices to make, so I will call them the Allais choices. (The corresponding strict preferences will be called the Allais preferences.) In versions of the Allais problems sometimes differing slightly from that given here, the proportion of subjects making the Allais choices has been found to be 46% (Allais 1979), 80% and 50% (Morrison 1967), 39% (MacCrimmon 1968), 30% (Moscowitz 1974), 35% to 66% (Slovic and Tversky 1974), 33% (MacCrimmon and Larsson 1979), 61% (Kahneman and Tversky 1979), and 38% (Maher 1989).
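The conflict can be exhibited directly. Because Problems A and B differ only on tickets 12-100, where each option pays a constant amount, the expected-utility difference between a1 and a2 equals that between b1 and b2 whatever utility function is used; so no expected-utility maximizer can strictly prefer a1 and also strictly prefer b2. The following Python sketch (mine; the utility functions are merely illustrative) checks this:

```python
import math

# Probabilities of ticket groups 1, 2-11, and 12-100.
P = (0.01, 0.10, 0.89)
M = 10**6
acts = {                    # monetary consequences on the three groups
    "a1": (M, M, M), "a2": (0, 5 * M, M),
    "b1": (M, M, 0), "b2": (0, 5 * M, 0),
}

def expected_utility(act, u):
    return sum(p * u(x) for p, x in zip(P, acts[act]))

# Three illustrative utility functions (any increasing u would do).
for u in (lambda x: x,                    # risk-neutral
          lambda x: math.log(1 + x),      # risk-averse
          lambda x: x ** 0.5):            # risk-averse
    diff_A = expected_utility("a1", u) - expected_utility("a2", u)
    diff_B = expected_utility("b1", u) - expected_utility("b2", u)
    assert abs(diff_A - diff_B) < 1e-6    # identical up to rounding
# Hence a1 is preferred to a2 iff b1 is preferred to b2 for every
# expected-utility maximizer; the Allais choices are not among these.
```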

For all that I have said so far, it might be that the Allais choosers are committed to independence but fail to realize that their preferences violate that commitment. This hypothesis has been investigated, by presenting subjects with an argument against the Allais choices, based on independence; and an argument for those choices, conforming to Allais's reasoning. It is consistently found that most subjects rate Allais's


        1            2-11         12-100
a1   $1,000,000   $1,000,000   $1,000,000
a2   $0 + L       $5,000,000   $1,000,000
b1   $1,000,000   $1,000,000   $0
b2   $0           $5,000,000   $0

Figure 3.2: Modified consequences for the Allais problems

reasoning more compelling than that based on independence; and after reading these arguments, the proportion endorsing the Allais preferences is apt to increase slightly (MacCrimmon 1968, Slovic and Tversky 1974, MacCrimmon and Larsson 1979).

A number of writers (Morrison 1967, Raiffa 1968, Bell 1982, Eells 1982, Jeffrey 1987, Broome 1991) have suggested that those who make the Allais choices think it would be more unpleasant to have ball 1 drawn if a2 was chosen than if b2 was chosen. As Morrison (1967) puts it, if you choose a2 and ball 1 is drawn, you may feel that you have lost $1 million rather than that your wealth remains unchanged; and you may not feel this way if you choose b2 and ball 1 is drawn. But if this is so, then since consequences must specify everything you care about (Section 1.1), the consequence of choosing a2 when ball 1 is drawn is not the same as the consequence of choosing b2 when ball 1 is drawn. If this is right, then the consequences in the Allais problems would be more accurately represented as in Figure 3.2 (where L denotes the sense of having lost a million dollars). If the consequences are identified in this way, then the Allais preferences do not violate independence.

While the Allais preferences can be reconciled with independence in this way, I think it unacceptably ad hoc in the absence of some independent evidence that subjects do indeed think that if ball 1 will be drawn, they would be worse off choosing a2 than choosing b2. Jeffrey (1987, p. 234) claims it is clear from the subjects' explanations of their preference that this is indeed the case, but I know of no formal study which supports this claim. The studies that have asked subjects for reasons in the


Allais problems (MacCrimmon 1968, Slovic and Tversky 1974, MacCrimmon and Larsson 1979) have merely asked subjects to indicate agreement or disagreement with the independence principle and with Allais-type reasoning. A representative example of the latter is the following.

In Problem A, I have a choice between $1 million for certain and a gamble where I might end up with nothing. Why gamble? The small probability of missing the chance of a lifetime to become rich seems very unattractive to me.

In Problem B, there is a good chance that I will end up with nothing no matter what I do. The chance of getting $5 million is almost as good as getting $1 million, so I might as well go for the $5 million and choose b2 over b1. (Slovic and Tversky 1974, with notation changed)

The studies find that subjects tend to endorse this Allais-type reasoning; but it does not follow from this that subjects think they would be worse off with a2 and ball 1 than with b2 and ball 1.

What is needed is a study designed to test whether subjects do think they would be worse off with a2 and ball 1 than with b2 and ball 1. Until such a study is conducted, I think we cannot say with much confidence whether those with the Allais preferences are violating independence. But the fact that subjects rate Allais-type reasoning as more attractive than independence is reason to doubt that they are committed to independence.

3.1.2 The Ellsberg problems

Daniel Ellsberg (1961) also formulated decision problems in which it seems that many people have preferences that violate independence. One pair of decision problems, in which the apparent conflict with independence is most direct, is as follows.

A ball is to be randomly drawn from an urn containing 90 balls. You are informed that 30 of these balls are red and the remainder are either black or yellow, with the proportion of black to yellow balls being unknown. The two decision problems have the options shown in Figure 3.3. Ellsberg thought it reasonable to choose c1 in Problem C, and d2 in Problem D. His rationale


Problem C

       Red    Black   Yellow
c1     $100   $0      $0
c2     $0     $100    $0

Problem D

       Red    Black   Yellow
d1     $100   $0      $100
d2     $0     $100    $100

Figure 3.3: The Ellsberg problems

for choosing c1 was that with it there is a known chance of 1/3 of winning $100, while with c2 the chance of winning is unknown. His rationale for choosing d2 was similar: with d2 there is a known chance of 2/3 of winning, while with d1 all that is known is that the chance of winning is at least 1/3. I will call these choices the Ellsberg choices and will call the corresponding strict preferences the Ellsberg preferences. These preferences violate independence, provided the consequences can be taken to be the monetary outcomes.

In studies where subjects were presented with Problems C and D, the proportion selecting the Ellsberg choices has been found to be 66% to 80% (Slovic and Tversky 1974), 58% (MacCrimmon and Larsson 1979), 75% (Maher 1989), and 60% (Maher and Kashima 1991). Assuming that these choices reflect strict preferences,1 it follows that most subjects have the Ellsberg preferences.2

1. For evidence confirming this assumption, see the data on Problem F in (Maher 1989) and (Maher and Kashima 1991).

2. However, in two studies conducted at La Trobe University, Kashima and I (1992) found the proportion of Ellsberg choosers to be 34% and 25%. There were 131 subjects in the first study and 68 in the second, so these proportions differ significantly from those previously reported. There may be a cultural effect; La Trobe University is located in Melbourne, Australia, where gambling is part of the culture.
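The violation can also be seen by direct computation: no single probability for drawing a black ball rationalizes both Ellsberg choices under expected utility. A minimal Python sketch (the normalization u($100) = 1, u($0) = 0 is my assumption for illustration):

```python
# p(Red) = 1/3 and p(Black) = q, with q anywhere in [0, 2/3].
def eu(act, q):
    p = {"Red": 1/3, "Black": q, "Yellow": 2/3 - q}
    return sum(p[color] * act[color] for color in p)

c1 = {"Red": 1, "Black": 0, "Yellow": 0}
c2 = {"Red": 0, "Black": 1, "Yellow": 0}
d1 = {"Red": 1, "Black": 0, "Yellow": 1}
d2 = {"Red": 0, "Black": 1, "Yellow": 1}

for i in range(201):
    q = (2 / 3) * i / 200
    prefers_c1 = eu(c1, q) > eu(c2, q)   # holds only when q < 1/3
    prefers_d2 = eu(d2, q) > eu(d1, q)   # holds only when q > 1/3
    assert not (prefers_c1 and prefers_d2)
print("no single p(Black) supports both Ellsberg choices")
```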


       Red         Black       Yellow
c1     $100        $0          $0
c2     $0 + F      $100 + F    $0 + F
d1     $100 + F    $0 + F      $100 + F
d2     $0          $100        $100

Figure 3.4: Modified consequences for the Ellsberg problems

Could it be that the Ellsberg choosers are really committed to independence and simply make a mistake in these problems? The evidence suggests otherwise. In some studies, subjects have been presented with an argument for making choices conforming with independence and an Ellsberg-style argument for the Ellsberg choices (Slovic and Tversky 1974, MacCrimmon and Larsson 1979); the result is that subjects who make the Ellsberg choices tend to rate the Ellsberg-style argument higher than the argument based on independence. A high level of agreement with Ellsberg-style reasoning was also reported in Maher and Kashima (1991).

A number of writers have suggested that persons who have the Ellsberg preferences put an intrinsic value on knowing chances (Eells 1982, Skyrms 1984, Jeffrey 1987, Broome 1991). As Skyrms (1984, pp. 33f.) puts it, ignorance of chances might induce the unpleasant sensation of "butterflies in the stomach." If this is right, then the consequences in Problems C and D would be more correctly represented as in Figure 3.4 (where F = butterflies in the stomach). If something like this is the correct representation of the consequences, then the Ellsberg preferences do not violate independence.

Like the similar move with the Allais problems, this way of making the Ellsberg choosers consistent with independence is unacceptably ad hoc in the absence of some independent evidence that Ellsberg choosers do indeed put an intrinsic value on knowing chances. The mere fact that these subjects tend to prefer gambles where the chances are known does not show that they put an intrinsic value on knowing chances. Analogy: People


generally prefer gambles in which the probability of winning is high, but Bayesians do not infer from this that people generally put an intrinsic value on having a high probability of winning. Rather, the high probability of winning is taken to be a means to the intrinsically desired end of actually winning. Similarly, subjects who make the Ellsberg choices might do so because they think this is a good means to the intrinsically desired end of receiving $100, not because they put any intrinsic value on knowing objective chances.

Yoshihisa Kashima and I conducted a study to ascertain whether Ellsberg choosers put an intrinsic value on knowing chances. The study was conducted with undergraduate students at La Trobe University. They were presented with a version of Problems C and D, with the $100 increased to $1,000. Seventeen of our subjects made the Ellsberg choices, and these were presented with the following reasons for those choices. Some were offered these reasons with respect to Problem C, others with respect to Problem D.

1. If I choose c2 (d1) I don't know the probability of winning, and that will make me feel nervous. If I choose c1 (d2), I know the probability of winning, and I'll feel more comfortable with that. So I choose c1 (d2) to avoid the feeling of anxiousness that comes with not knowing the probability of winning.

2. My only concern is to try to win the $1,000; feelings of anxiety don't bother me. I choose c1 (d2) because I think it makes sense to choose a known chance of achieving my goal, instead of an unknown chance of achieving it.

3. I agree with both reasons 1 and 2 above for choosing c1 (d2). That is, I choose c1 (d2) both to avoid the feeling of anxiety that would come with not knowing the probability of winning, and also because choosing a known chance of winning is a good way to win.

Subjects were asked to indicate to what extent they agreed with each of these three arguments. The result was that they agreed most with the second reason. This tends to disconfirm the hypothesis that Ellsberg choosers put an intrinsic value on knowing chances.


So the data on the Ellsberg choices, like that on the Allais choices, provide reason to think that many people are not committed to independence. However, I will argue that these people are making a mistake.

3.2 ARGUMENTS FOR INDEPENDENCE

3.2.1 Synchronic separability

I begin by presenting an argument that the Allais preferences are a mistake. This argument is due in its essentials to Markowitz (1959, pp. 221-4) and Raiffa (1968, p. 82f.), though my presentation will differ slightly from either of theirs.

Suppose that, as in the version of the Allais problems considered above, we have an urn containing balls numbered from 1 to 100. But now, before you have to make any choice, the experimenter draws a ball from the urn; if the number on the ball is 12 or greater, you receive either $1 million or nothing, depending on which version of the experiment is being conducted. If the number on the ball is 11 or less, the experimenter does not reveal what the number is, and gives you your choice between the following options:

f: $1 million.
g: $5 million if the number on the ball is not 1, and nothing if it is 1.

The situation is diagrammed in Figure 3.5. I follow the standard convention, in which boxes represent nodes at which a choice is made, and circles represent nodes at which "nature" or "chance" determines the branch to be taken.

Now suppose you could specify in advance what choice you would make, should a ball numbered 11 or less be drawn. That is to say, you can now tell the experimenter whether you want f or g should a ball numbered 11 or less be drawn, and the experimenter will take that to be your choice, should the drawn ball be numbered 11 or less. What choice do you want to specify in tree (a) of Figure 3.5? Is this the same choice you want to specify in tree (b)?

Markowitz and Raiffa report that most people specify the same choice for both trees. Markowitz reports that the invariable


[Figure 3.5: Sequential version of the Allais problems. Two decision trees: in each, a chance node first determines whether the ball drawn is numbered 1-11 or 12-100. If it is numbered 12-100, the payoff is $1 million in tree (a) and nothing in tree (b). If it is numbered 1-11, a choice node offers f ($1 million) or g, which leads to a further chance node paying $5 million unless the ball is number 1, in which case nothing.]

choice was f - that is, the million dollars for sure. In any case, we have that for almost everyone, either

(i) Specifying f in advance is weakly preferred to specifying g in advance in both trees (a) and (b); or

(ii) Specifying g in advance is weakly preferred to specifying f in advance in both trees (a) and (b).

Furthermore, it seems uncontroversial that the consequences a person values are not changed by representing the options in a tabular or tree form. But then, specifying f in advance in tree (b) of Figure 3.5 is the same act as choosing b1 in the Allais problems, while specifying g in advance in that tree is the same as choosing b2. Hence (i) is inconsistent with the Allais preference b1 ≺ b2. Similarly, (ii) is inconsistent with the Allais preference a2 ≺ a1.
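On the assumption (as in the placeholder for Figure 3.5 above) that tree (b) pays nothing on balls 12-100, this identity of acts can be checked ball by ball. A minimal Python sketch, with an encoding that is mine rather than the book's:

```python
def tree_b(specified, ball):
    """Consequence of committing to `specified` ('f' or 'g') in tree (b)."""
    if ball >= 12:
        return 0                          # tree (b): nothing on 12-100
    if specified == "f":
        return 1_000_000                  # f: $1 million for sure
    return 0 if ball == 1 else 5_000_000  # g: $5 million unless ball 1

def b1(ball):
    return 1_000_000 if ball <= 11 else 0

def b2(ball):
    return 5_000_000 if 2 <= ball <= 11 else 0

# Committing to f just is b1; committing to g just is b2.
assert all(tree_b("f", ball) == b1(ball) for ball in range(1, 101))
assert all(tree_b("g", ball) == b2(ball) for ball in range(1, 101))
```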

The data reported by Markowitz and Raiffa were obtained in informal discussion; but a formal study by Kahneman and Tversky (1979) points to the same conclusion. Kahneman and Tversky used a version of the Allais problems with different probabilities and prizes, and found that the majority of their subjects made Allais-type choices. However, when subjects were presented with Kahneman and Tversky's analog of tree (b) and asked what choice they would specify in advance, 78 percent


specified Kahneman and Tversky's analog of f, a choice inconsistent with the Allais preferences.3

3. The decision problems mentioned here are problems 3, 4, and 10 in (Kahneman and Tversky 1979).

McClennen (1983, n. 5) attempts to resist this conclusion by claiming the instructions that Markowitz, and Kahneman and Tversky, gave their subjects were ambiguous. He claims that when it is said the choice must be specified in advance, it is unclear whether one is being asked to predict one's later choice, or whether one is being asked to now make the choice. However, the instructions that Markowitz as well as Kahneman and Tversky report giving seem unambiguous to me. Markowitz's formulation of the problem (modified slightly to fit my presentation above) is

Suppose you had to commit yourself in advance. Suppose that, before the ball is drawn, you had to say "I will take f if the choice arises" or "I will take g if the choice arises." To which would you commit yourself?

Kahneman and Tversky refer to the chance and choice nodes in their version of the problem as the first and second stages; and they specify that

Your choice must be made before the game starts, i.e., before the outcome of the first stage is known.

This leaves no room for reasonable doubt that subjects are being asked to make the choice in advance.

We have been considering an argument against the Allais preferences. Kashima and I (1992) show that parallel points can be made about the Ellsberg preferences. That is to say, the Ellsberg preferences can be presented in a tree format, and when they are so presented most Ellsberg choosers cease to have the Ellsberg preferences.

This line of argument can be generalized to give an argument against all violations of independence, as I will now show.

I will say that you currently prefer choosing f rather than g at a choice node iff you now prefer specifying that you choose f were you to reach that node, rather than specifying that you choose g. The preference here can be understood as either strict


or weak. So the data reported above can be expressed by saying that most people currently prefer that they choose f rather than g at both choice nodes in Figure 3.5. This data suggests that most people accept the following principle as normatively correct:

Synchronic separability. Your current preferences regarding how you choose at a choice node do not depend on what would happen if you do not reach that choice node.

For example, the difference between the choice nodes in trees (a) and (b) of Figure 3.5 consists solely in whether you would or would not have received a million dollars if you had not reached that node, and most people think this should not affect preferences regarding the choice at the node.

The term separability here denotes that the part of the decision tree prior to the choice node can be ignored in considering what choice is best at the node. I add the adjective synchronic because the preferences at issue are your current preferences only. (A principle of diachronic separability will be discussed in Section 3.2.2.)

Synchronic separability entails independence.

Proof. Suppose f = f′ on A, g = g′ on A, f = g on Ā, f′ = g′ on Ā, and f ≼ g. What needs to be shown is that if synchronic separability holds, it follows that f′ ≼ g′.

In tree (a) of Figure 3.6, specifying that you choose g, should you reach the choice node, is equivalent to now choosing g over f. So since f ≼ g, you must now weakly prefer specifying that you choose g, as opposed to f. And this is the same as saying that you now weakly prefer that you choose g, should you reach the choice node in tree (a). By synchronic separability, it follows that you now have the same preference about your choice at the choice node in tree (b). So you now weakly prefer specifying that you choose g at the choice node in tree (b), as opposed to choosing f. But specifying g in advance in tree (b) is the same as choosing g′ over f′. Hence you must have f′ ≼ g′.

Thus we have an argument for independence, from premises that appear to be very widely endorsed.


[Figure 3.6: Trees to show synchronic separability entails independence. In tree (a), a chance node leads on Ā to the consequences that f and g share there, and on A to a choice between f and g; tree (b) is the same except that on Ā it leads to the consequences that f′ and g′ share.]

There are a number of arguments for independence in the literature, and the argument I have just given is closely related to many of them. However, so far as I know, the argument as I have given it is also at least subtly different from the arguments in the literature. The differences are for a reason: I think the argument as I have given it is superior to the arguments in the literature. So, at least for aficionados of this literature, I will explain why I think the argument from synchronic separability is superior to some of the arguments familiar in the literature. Since this discussion will not add any stronger argument for independence than the one just given, many readers may want to skip it and turn directly to the consideration of the arguments against independence, in Section 3.3.

3.2.2 Diachronic separability and rigidity

The argument of Markowitz and Raiffa, against the Allais preferences, has not usually been seen as appealing to synchronic separability. Instead, it has been seen as appealing to two principles, one of which is:

Diachronic separability. The preferences you would have were you to reach a choice node do not depend on what would have happened, had you not reached that node.

Applied to the sequential version of the Allais problems, this says that if a ball numbered 1-11 is drawn, the preference you


would then have will be the same whether you are in tree (a) or (b) of Figure 3.5. It does not say anything about your current preferences regarding your choice at that point. Hence the name diachronic separability.

Diachronic separability is not by itself inconsistent with the Allais preferences. One needs in addition the principle of

Rigidity. Your current preferences regarding how you choose at a choice node are the same as the preferences you would have were you to reach that choice node.

If rigidity is assumed, then diachronic and synchronic separability become equivalent. So rigidity and diachronic separability together entail synchronic separability, and hence entail independence. However, this implication is not a good reason to conclude that independence is a requirement of rationality. I gave some reasons for rejecting rigidity in Section 2.2.2; I will give additional reasons here.

One way preferences might rationally change, as you move through a decision tree, is that you acquired new information along the way. One might attempt to preclude this by stipulating that no relevant new evidence is acquired. But you might still use the time interval to reconsider your preferences, and may have some insight that leads to a rational revision of those preferences, without acquiring any new evidence. To preclude this, one would need to stipulate that a rational person is logically omniscient, so that nothing further can be learned by thought alone. However, since we are plainly not such creatures, this would be to concede that the argument for independence does not apply to us.

Another way preferences might rationally change is due to selective discounting of the future. It is usual in economics to suppose that future benefits are discounted according to how far in the future they lie. This in itself does not seem irrational. Nor would it be irrational to use different discounting schedules for different things. For example, future pains might be discounted more or less strongly than future pleasures. But then, if some possible future consequences contain different mixtures of pleasure and pain, a person's preferences between those consequences will change as the time of obtaining them approaches.


In Section 2.2.2, I indicated a way that any objection to rigidity can be met. We can stipulate that our criterion of personal identity is such that persons cease to be self-identical whenever they change their preferences. This makes rigidity not only true but tautologous. If we take this line, then diachronic separability has the same content as synchronic separability, and the present argument for independence is just a misleading way of giving the same argument that I gave in the preceding subsection.

3.2.3 The sure-thing principle

Savage (1954, pp. 21-3) says that independence (which he calls "P2") is suggested by what he calls the sure-thing principle.4 Savage introduces this principle with the following example.

A businessman contemplates buying a certain piece of property. He considers the outcome of the next presidential election relevant to the attractiveness of the purchase. So, to clarify the matter for himself, he asks whether he would buy if he knew that the Republican candidate were going to win, and decides that he would do so. Similarly, he considers whether he would buy if he knew that the Democratic candidate were going to win, and again finds that he would do so. Seeing that he would buy in either event, he decides that he should buy, even though he does not know which event obtains.

This businessman is applying the sure-thing principle. Savage's "relatively formal" statement of the sure-thing principle is essentially as follows.

The sure-thing principle. If you would have f ≼ g were you to learn A and would also have f ≼ g were you to learn Ā, then you should now have f ≼ g. Moreover (provided you do not regard A as virtually impossible) if you would have f ≺ g were you to learn A and would also have f ≼ g were you to learn Ā, then you should now have f ≺ g.

4. As I noted in Chapter 1, many authors have identified Savage's sure-thing principle with his P2 (= independence). Ellsberg's 1961 paper contains the earliest example of this identification that I have noticed. Jeffrey (1983) considers Savage's discussion of the sure-thing principle in some detail but makes the same identification. I think this identification fits poorly what Savage says. Also, the possibility of arguing from (what I call) the sure-thing principle to (what I call) independence makes it desirable to have different names for these principles.


The name of the principle comes from the idea that, under the stated conditions, it is a sure thing that g is at least as good as f. Savage says that "except possibly for the assumption of simple ordering [i.e., transitivity and connectedness], I know of no other extralogical principle governing decisions that finds such ready acceptance."

The phrase "virtually impossible," which appears in the sure-thing principle, is vague. But later Savage gives it a precise meaning: You regard A as virtually impossible just in case you are indifferent between all acts that differ only on A (1954, p. 24). Events that you regard as virtually impossible, in this sense, are also said to be null.

The "learning" referred to in the sure-thing principle must be understood in such a way that it satisfies this condition:

(*) If you were to learn A, then you would be indifferent between acts that do not differ on A.

Another way of putting this would be to say that if you were to learn A, then Ā would be null.

With these understandings, the sure-thing principle entails independence.

Proof. Suppose f = f′ on A, g = g′ on A, f = g on Ā, f′ = g′ on Ā, and f ≼ g. Case (i): A is not null. For any preference relation R, let A→R denote that you would have R were you to learn A. By (*), Ā→f ~ g. If A→f ≻ g, we would then have by the sure-thing principle that f ≻ g, which is not the case; hence A→f ≼ g. Since f = f′ and g = g′ on A, it follows (by (*) and transitivity) that A→f′ ≼ g′. Since f′ = g′ on Ā, we also have Ā→f′ ~ g′. So by the sure-thing principle, f′ ≼ g′, as independence requires. Case (ii): A is null. Then since f′ = g′ on Ā, it follows from the definition of a null event that f′ ~ g′, and so again f′ ≼ g′.

Why is this not a better argument than the one from synchronic separability? Well, first of all, note that the sure-thing principle tacitly assumes that there is a fact about what preference you would have between f and g, were you to learn A. But if you violate diachronic separability, this need not be the case;


the preference you would have after learning A might depend on what you would have got if Ā had obtained. Thus the sure-thing principle, as stated by Savage, presupposes diachronic separability. Furthermore, this presupposition plays an essential role in the proof that the sure-thing principle entails independence.

Second, I note that the sure-thing principle entails rigidity. For suppose you violate rigidity, because you will have f ≺ g in the future, though you now have g ≺ f. Let A be any proposition whose truth value you will not learn until after your preference has changed. Then if you were to learn A or Ā, you would have f ≺ g; so by the sure-thing principle, you should now have f ≺ g. Hence you violate the sure-thing principle.

Thus the argument from the sure-thing principle is not essentially different from the argument from diachronic separability and rigidity. And so it is open to the same objection: Rationality does not preclude changes of preferences over time.

Furthermore, empirical studies do not bear out Savage's claim that "except possibly for the assumption of simple ordering . . . no other extralogical principle governing decisions . . . finds such ready acceptance" as the sure-thing principle. MacCrimmon and Larsson (1979, sec. 6) found that subjects on average rated their agreement with the first part of the principle, on a scale from 0 to 10, as 6.7. This is not inordinately high; subjects indicated higher agreement with some principles of choice that are inconsistent with the sure-thing principle.

After this disparaging of the argument from the sure-thing principle, I should say that I think there is something right about the principle. I will use the notation f ≼_A g to denote that if you could now specify how you would choose were you to learn A, you weakly prefer specifying that you choose g. (Relations like f ≺_A g and f ~_A g are defined similarly.) What I think is right is the

Synchronic sure-thing principle. If f ≼_A g and f ≼_Ā g, then f ≼ g. Moreover, if A is not null, and if f ≺_A g and f ≼_Ā g, then f ≺ g.

As the name indicates, this is a synchronic analog of Savage's sure-thing principle. While I would endorse this principle, it is not much use in arguing for independence. For the principle presupposes synchronic separability, and we have seen that synchronic separability by itself entails independence.

3.2.4 The value of information

A principle with considerable intuitive appeal is that it cannot be undesirable to acquire cost-free information. This principle follows from the principle of maximizing expected utility, provided suitable conditions are satisfied. (The "suitable conditions" are stated in Section 5.1.2.) Wakker (1988) suggests that independence follows from this attractive principle.5

For present purposes, it will be sufficient to present an example of how this argument is supposed to work. Suppose then that you have the Allais preferences but that (as most people apparently do) you would prefer f to g were you at either choice node in Figure 3.5. Now I present you with the following choice: You can either specify your choice in tree (b) of Figure 3.5 in advance, or else you can wait and see whether or not the ball is numbered between 1 and 11, and if it is, you then get to make your choice. This decision problem is represented in Figure 3.7.

Specifying f in advance is equivalent to choosing b1 (from Problem B, Section 3.1.1), whereas specifying g in advance is equivalent to choosing b2. Thus at node 2, your options are effectively between b1 and b2. Since you have the Allais preference b1 ≺ b2, you would therefore choose g at node 2. Thus deciding to choose now is equivalent to choosing to specify g in advance, which is equivalent to choosing b2.

On the other hand, we have assumed that you would prefer f to g at either choice node in Figure 3.5; hence6 you would prefer f to g at node 3 in Figure 3.7. So if you postpone your decision until after you learn whether the ball is numbered 1-11, you will get $0 if it is not and $1 million if it is.

5 Wakker presents an argument that certain violations of Samuelson's strong independence axiom entail that cost-free information is undesirable. But he says that the argument can be reformulated to apply to the "sure-thing principle" in place of strong independence. I guess that by the "sure-thing principle" he means what I am calling independence, and this is the argument I will consider here. But Wakker's argument is also easily adapted to give an argument for the sure-thing principle of Section 3.2.3.

6 This isn't a logical entailment. It actually depends on an application of diachronic separability. Another way in which Wakker's argument presupposes diachronic separability will be noted shortly.


[Figure 3.7: Decision of whether to acquire information - a decision tree in which you either choose now (node 2) or wait for information (node 3) about whether the ball is numbered 1-11 or 12-100, with the nonzero payoffs given in millions of dollars.]

This is equivalent to getting b1. Because you have the Allais preference b1 ≺ b2, you therefore prefer to make your choice before getting the information. Thus you violate the principle on the value of cost-free information.
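These equivalences can be checked state by state. The following sketch is purely illustrative: it assumes the standard Allais payoffs for Problem B (given in Section 3.1.1 and not repeated in this section), namely that, with 100 equiprobable numbered balls, b1 pays $1 million if the ball is numbered 1-11, while b2 pays $5 million if it is numbered 2-11 and nothing otherwise.

```python
MILLION = 1_000_000

def f(ball):
    # Option f at the choice node: $1 million for sure (given a 1-11 ball).
    return 1 * MILLION

def g(ball):
    # Option g at the choice node: $5 million if the ball is 2-11, else $0.
    return 5 * MILLION if 2 <= ball <= 11 else 0

def b1(ball):
    return 1 * MILLION if 1 <= ball <= 11 else 0

def b2(ball):
    return 5 * MILLION if 2 <= ball <= 11 else 0

def wait_for_information(ball):
    # You learn whether the ball is 1-11; at node 3 you would choose f.
    return f(ball) if ball <= 11 else 0

def choose_now(ball):
    # Specifying in advance, you specify g (you have the preference b1 < b2).
    return g(ball) if ball <= 11 else 0

balls = range(1, 101)
assert all(wait_for_information(n) == b1(n) for n in balls)  # waiting = b1
assert all(choose_now(n) == b2(n) for n in balls)            # choosing now = b2
# So the Allais preference b1 < b2 amounts to preferring to decide
# before the cost-free information arrives.
```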

We have been assuming that you would have g ≺ f at both choice nodes in Figure 3.5. If instead you would have f ≺ g at both choice nodes, the argument is easily reformulated, using tree (a) of Figure 3.5 in place of tree (b). But the argument does not work at all if you would have g ≺ f at the choice node in tree (a) but f ≺ g at the choice node in tree (b). Thus Wakker's argument tacitly assumes that diachronic separability holds (or at least that it is not violated in this way).

I turn now to what I take to be the main weakness in Wakker's argument. At node 1 in Figure 3.7, we are supposing you know you would choose f were you at node 3. But we have also seen that if you could specify your choice in advance, you would specify that you choose g. Thus you know that if you get the information, you will make a choice that you do not now think optimal for that situation. And it is really this that causes you not to want to acquire the information. But when your future choices will not be made in the way you now think would be optimal, it is no longer intuitively plausible that cost-free information is not undesirable, nor does expected utility theory imply this result. Consequently, the fact that you would want to avoid cost-free information under these circumstances is not a reason to say you are irrational.

One might try answering this objection by saying that the real indicator of irrationality is that you know you would under certain circumstances choose in ways you now think suboptimal. But this is to assume a kind of rigidity condition, and is open to the same sort of objections as I have already raised.

To avoid these difficulties, we might try framing Wakker's argument in terms of current preferences. I would agree that if you know your future choices will be made in ways you now think optimal, then you should not want to avoid cost-free information. But recall that Wakker's argument also assumed diachronic separability; if we shift to current preferences, that assumption becomes synchronic separability. And we have seen that synchronic separability by itself suffices to derive independence. Hence the appeal to the value of information becomes redundant.

3.2.5 Other arguments

I have now discussed three alternatives to my argument from synchronic separability, and a common pattern has emerged: These alternative arguments all assume diachronic separability, which I think is no more attractive than synchronic separability. In addition, these arguments all tacitly assume rigidity, or something comparable, which is not a requirement of rationality.

There are other arguments for independence that I have not discussed. These include the arguments of Hammond (1988a,b) and Seidenfeld (1988). I will not discuss these arguments here, because what I would end up saying about them is essentially the same as what I have said about the last three arguments discussed: They assume principles akin to diachronic separability and rigidity.

So I conclude that the argument from synchronic separability is the best argument for independence that we have. Fortunately, that argument is a good one; it derives independence from premises that most people accept.

3.3 OBJECTIONS TO INDEPENDENCE

There are able decision theorists who reject independence. It is time to look at what reasons they have to support this position. This can be brief, because the main points have already been covered in Section 3.1.

Allais, Ellsberg, and others have claimed that many people, who seem quite reasonable, do choose in a way that violates independence, and that this is a reason to think that a rational person need not satisfy independence. But as we have seen, the putative violations may not be genuine violations, because they may disappear when consequences are made sufficiently specific to include everything the person values in the outcome. Second, even if it can be shown that those with the Allais and Ellsberg preferences really are violating independence, there may be good reason to call them irrational; for they probably accept the premises of the argument from synchronic separability, in which case it is undeniable that they are making a mistake.

Allais thought there was a good reason to have the Allais preferences, namely that when an option has a sure payoff, the certainty itself provides a reason to choose that option, over and above what the payoff itself is. But if certainty has an intrinsic value, it needs to be included in the specification of the consequences; and once that is done, the Allais preferences do not violate independence. On the other hand, if certainty does not have an intrinsic value, Allais's argument in favor of the Allais preferences ceases to be compelling.

The situation is similar with Ellsberg. He maintained that there was a good reason to have the Ellsberg preferences, namely that when an option involves known chances, that fact itself provides a reason to choose the option. If knowing chances has intrinsic value, it needs to be included in the specification of the consequences; and once that is done, the Ellsberg preferences are consistent with independence. And if knowing chances does not have intrinsic value, Ellsberg's argument in favor of the Ellsberg preferences ceases to be compelling.


3.4 CONCLUSION

I have tried to show that there is no compelling argument against independence, and there is an argument for it from the attractive principle of synchronic separability. Of course, those decision theorists who oppose independence will respond by rejecting synchronic separability. I do not have further arguments to prove they are wrong, but they also lack an argument to prove their position. McClennen would claim that we should not adopt principles of rationality that do not enjoy unanimous endorsement, but I have already criticized that position in Section 2.4.

So while I do not claim to be able to bring opponents of independence to their knees, I do claim that the view that independence is a requirement of rationality is as reasonable as any contrary view. If the debate is to be taken further, I suggest that the most productive way to do it is for the different sides to develop theories based on their favored principles, so that we can judge the principles by their fruits. In this spirit, I now turn to the development of a theory based on the assumption that independence, along with transitivity and normality, is a requirement of rationality.


Subjective probability in science

4.1 CONFIRMATION

The preceding three chapters have been concerned with the Bayesian theory of rational preference. I turn now to a topic that may at first sight appear to be quite remote from those concerns, namely the question of the relation between scientific theories and the evidence for or against them. But as will quickly become apparent, these topics are intimately connected.

All scientific theories go beyond the evidence that is adduced in support of them; hence the truth of the theories does not follow by logic from that evidence. Even if the evidence is correct, it always remains possible that the theory is false. Nor is this a merely idle possibility. Newtonian mechanics was as well supported by evidence as any scientific theory is likely to be, but nevertheless proved to be not the literal truth. This fact naturally leads one to ask why it is that some evidence, which does not guarantee the truth of a theory, is nevertheless regarded as supporting, or confirming, the truth of the theory.

Bayesian philosophers of science hold that scientists have subjective probabilities for scientific theories. Evidence that confirms a theory is then taken to be evidence that increases the scientist's subjective probability for the theory. The question posed in the preceding paragraph is thus construed, from a Bayesian perspective, as a question about why some evidence increases the subjective probability of a theory.

Bayesian answers to this question make use of a principle of conditionalization, which we can state as follows.1

1 Here '·' represents an arbitrary event or proposition, and p(·|E) denotes probability conditioned on E. According to the usual definition, p(A|E) = p(AE)/p(E), and is defined only if p(E) ≠ 0.


Conditionalization. If your current probability function is p, and if q is the probability function you would have if you learned E and nothing else, then q(·) should be identical to p(·|E).

In Chapter 5 I will show that this principle follows from the principle of maximizing expected utility, under some reasonably mild additional assumptions.
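Stated computationally, conditionalization is just renormalization. Here is a minimal sketch over a finite set of states; the states and numbers are merely illustrative.

```python
def conditionalize(p, E):
    """Return q(.) = p(.|E): zero out states outside E and renormalize."""
    p_E = sum(prob for state, prob in p.items() if state in E)
    if p_E == 0:
        raise ValueError("p(E) = 0, so p(.|E) is undefined")
    return {state: (prob / p_E if state in E else 0.0)
            for state, prob in p.items()}

# Two coin tosses, initially judged equiprobable.
p = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
q = conditionalize(p, E={"HH", "HT"})  # learn that the first toss was heads
assert q == {"HH": 0.5, "HT": 0.5, "TH": 0.0, "TT": 0.0}
```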

The principle of conditionalization, together with the Bayesian understanding of what confirmation is, entails that evidence E confirms hypothesis H just in case p(H|E) > p(H). Similarly, E disconfirms H just in case p(H|E) < p(H).

Bayesian philosophers of science have shown that this rather modest-looking account of confirmation can explain many methodological intuitions about when, and how strongly, evidence confirms a hypothesis.

For a basic illustration of this, suppose that a hypothesis H, together with some background information in which we are highly confident, entails some testable proposition E. Suppose also that every plausible alternative hypothesis, together with background information, entails that E is false. Our intuitions are that in this case, the verification of E would confirm H. On the Bayesian account of confirmation, this intuition can be explained, as follows. The elementary theorem of probability known as Bayes' theorem entails that

    p(H|E) = p(H)p(E|H) / [p(H)p(E|H) + p(H̄)p(E|H̄)].

Since p(H) + p(H̄) = 1, we can rewrite this as

    p(H|E) = p(H)p(E|H) / [p(E|H̄) + p(H)(p(E|H) − p(E|H̄))].

Because E follows from H together with background information of which we are confident, p(E|H) is high. And because every plausible alternative hypothesis predicts that E is false, p(E|H̄) is low. Thus p(E|H) − p(E|H̄) > 0, and so, assuming p(H̄) > 0, we have

    p(H|E) > p(H),

which is the desired result.
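A quick numerical check of this derivation, with made-up values for the probabilities, may be helpful:

```python
p_H = 0.3            # prior for H (hypothetical)
p_E_H = 0.95         # p(E|H): high, since H plus background entails E
p_E_notH = 0.05      # p(E|not-H): low, since rivals predict E is false

p_E = p_H * p_E_H + (1 - p_H) * p_E_notH
p_H_E = p_H * p_E_H / p_E          # Bayes' theorem

assert p_H_E > p_H                 # E confirms H
print(round(p_H_E, 3))             # 0.891
```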

For a less trivial example, consider the following two scenarios.

(i) A hypothesis H is advanced, and H entails the empirical proposition E. Later it is discovered that E is indeed true.

(ii) E is discovered to be true, and then a hypothesis H is proposed to explain why E is true.

Many scientists and philosophers of science have maintained that, other things being equal, H would be better confirmed in situation (i) than in situation (ii). I call this view the predictivist thesis. Among those who have endorsed this thesis are Bacon (1620, book I, cvi), Leibniz (1678), Huygens (1690, preface), Whewell (1847, vol. 2, p. 64f.), Peirce (1883), Duhem (1914, ch. II, §5), Popper (1963, p. 241f.), and Kuhn (1962, p. 154f.). But why should the predictivist thesis be true? In (Maher 1988, 1990a) I show that there is a Bayesian explanation. I do not know of any cogent non-Bayesian explanation of the predictivist thesis.2

For Bayesian explanations of some other judgments that scientists make about confirmation, see (Howson and Urbach 1989, ch. 4).

4.2 NORMATIVITY

I have said Bayesian theory can explain judgments that scientists make about when evidence confirms theories. But in what sense does it explain these judgments? One possibility is that the Bayesian explanation represents the reasoning by which scientists arrive at their judgments about confirmation. But this view faces serious difficulties. For one thing, Bayesian explanations of judgments about confirmation often involve moderately lengthy mathematical derivations - my own explanation of the predictivist thesis takes several pages of mathematics.

2 For criticism of some non-Bayesian attempts at explaining the predictivist thesis, see Campbell and Vinci (1983). Campbell and Vinci (1982, 1983) attempt their own Bayesian explanations, which I criticized in (Maher 1988, p. 283). Howson and Franklin (1991) criticize my defense of the predictivist thesis, but their criticism rests on a failure to distinguish between a hypothesis and the method by which the hypothesis was generated (Maher in press).


It is not plausible to think that scientists are doing such nontrivial mathematics when they make judgments of confirmation. Second, psychologists have amassed evidence that people generally do not reason using probability theory, but rather use heuristics that violate probability theory in systematic ways (Kahneman, Slovic, and Tversky 1982).

So I think it is a mistake to view Bayesian confirmation theory as a psychological theory of scientists. I suggest we should view it rather as a normative theory. A Bayesian "explanation" of some judgment scientists make about confirmation purports to show that, and why, this judgment is rational. The reason why the judgment is rational may or may not be the reason why the judgment was actually made. Bayesian confirmation theory is concerned with the former, while the latter is the province of psychology.

Why does it matter whether Bayesian confirmation theory can explain the judgments about confirmation that scientists make? Following Rawls (1971, pp. 19-21, 46-7) and Goodman (1973, pp. 62-6), it is now widely allowed that a normative theory should be judged by both (a) the plausibility of its basic assumptions and (b) the degree to which it fits pretheoretic judgments about particular cases. I have argued in Chapters 1 to 3 that the subjective probability theory on which Bayesian confirmation theory rests is plausible; and in Chapter 5 I will show that under fairly weak conditions, the principle of conditionalization follows from this. Thus Bayesian confirmation theory is in a satisfactory state with respect to criterion (a). The point of explaining scientists' intuitive judgments about confirmation is to show that Bayesian confirmation theory is also successful with respect to criterion (b) - it fits pretheoretic judgments about particular cases.

But, some philosophers still ask, how can showing that a theory fits actual judgments show that it is correct as a normative theory? Isn't this a futile attempt to argue from the descriptive to the normative, from what is to what ought to be? Not at all, because the explanation of a judgment of confirmation is only reason to think the theory normatively correct if we are antecedently confident that the judgment itself is correct. So the argument is from particular normative judgments to general ones, not from the descriptive to the normative.


All of this has a more familiar parallel in logic. A system of logic can be used to explain why certain accepted arguments are valid. This explanation cannot generally be regarded as a description of the reasoning of people who make or endorse those arguments. All we can safely say is that the logical explanation of the validity of an argument explains why the judgment of validity is correct. Furthermore, if the question of the correctness of the logical system arises, a relevant factor in answering that question is the extent to which the system fits pretheoretic judgments about the validity of particular arguments.

I have been stressing that Bayesian confirmation theory should be interpreted as normative, not as a descriptive theory of how scientists actually reason. But this is not to say that there is no connection at all between the demonstrations of Bayesian confirmation theory and actual scientific reasoning. Indeed, the remarkable degree of agreement between considered scientific judgments of confirmation and Bayesian confirmation theory makes it overwhelmingly likely that there is some connection. But the degree of fit is surely rather crude, in general.

4.3 BETTING ON THEORIES

Bayesian confirmation theory assumes that scientists have subjective probabilities for scientific theories. In Chapter 1, I advocated a preference interpretation of subjective probability (and utility), according to which you have subjective probability function p if your preferences maximize expected utility relative to p (and some utility function). I intend this interpretation to be quite general, and so to apply in particular to subjective probabilities for scientific theories.

It has sometimes been claimed that the preference interpretation of subjective probability cannot apply to probabilities of scientific theories.3 Those who make this claim seem to have in mind an argument like the following.

3 For example, by Shimony (1970, p. 93), Dorling (1972, p. 184), and Thagard (1988, p. 98).

1. The preference interpretation of probability defines the probability of A in terms of preferences regarding bets on A.


2. A bet on A is possible only if we can definitively establish whether or not A is true.

3. We cannot definitively establish whether or not scientific theories are true.

Therefore:

4. Probabilities for scientific theories cannot be defined by the preference interpretation.

But I will show that this argument has two flaws: It is invalid, and one of its premises is false.

To see that the argument is invalid, note that what follows from (2) and (3) is merely that we cannot actually implement bets on scientific theories; it does not follow that we cannot have preferences regarding these bets. But it is the impossibility of having preferences regarding bets on scientific theories that is needed to derive the conclusion (4). Thus the argument tacitly assumes

5. Preferences are defined only for possible bets.

But (5) is surely false; we often have preferences between options we could not actually be presented with.

Consider an example: Thagard (1988, p. 98) writes that he "is unable to attach much sense to the question of how to bet, for example, on the theory of evolution." Now there is certainly a problem here, in that "the theory of evolution" is not a well-defined entity. But for present purposes, let's identify this "theory" with the hypothesis that all species on earth evolved from a small number of primitive species. I asked several biologists to imagine that we will definitively determine the truth value of this thesis. (For vividness, I suggested they might imagine that some reliable extraterrestrials have been observing the earth since its early days, and will inform us of what they saw.) I asked these biologists to suppose also that they have their choice of (a) $1,000 if the evolutionary hypothesis is true or (b) $1,000 if it is false. The biologists had no trouble making sense of the imagined situation and were emphatic that they would choose (a). From this we can infer that they prefer bet (a) to bet (b), and this is consistent with (a) and (b) not being bets one could actually make.4

4 If we assume that the biologists' preference for (a) over (b) maximizes their expected utility, then we can infer that these biologists give evolution a probability greater than 1/2.


I am not claiming that we have preferences regarding all conceivable bets. There may be some pairs of bets for which we neither prefer one to the other, nor are indifferent between them (without violating any requirements of rationality). In such a case, we have indeterminate subjective probabilities and utilities. What I am claiming is that we can have preferences regarding conceivable bets that cannot in fact be implemented; and I think most of us do have preferences about many such bets.

I have shown that the argument for (4) is invalid and that this invalidity cannot be removed without adding a false premise. I will now show that the original argument already contains a false premise. This is the premise (2), which asserts that a bet on A is possible only if we can definitively establish whether or not A is true. To see that this is false, note that a bet on A is an arrangement in which some desirable consequence occurs just in case A is true. If we cannot definitively establish whether or not A is true, then we cannot definitively establish whether or not we have won the bet; but it does not follow that we cannot make the bet.

This point can be illustrated by the arrangements David Hume made for publishing his Dialogues Concerning Natural Religion. Hume had been persuaded by his friends that publication of the Dialogues was likely to outrage the religious establishment, and that this could have unfortunate consequences for him. But Hume wanted the Dialogues published because he thought they might have a beneficial influence, and because he wanted to add to his fame. So he withheld the Dialogues from publication during his lifetime but made elaborate provisions to ensure that they would be published after his death. In making these provisions, Hume was betting that these provisions would result in publication of the Dialogues; but Hume knew he would never be able to establish definitively whether or not he had won this bet.5

5 Someone will think: "Hume made these arrangements because the thought that the Dialogues would be published pleased him - and he could verify that he had this thought." But what Hume wanted to achieve was not that he think the Dialogues would be published; he sought rather to achieve that the Dialogues be published. Given a choice between (i) falsely believing that the Dialogues would be published, and (ii) falsely believing that they would not be published, I am sure Hume would have chosen (ii).


In Chapter 6 I will introduce the notion of accepting a proposition. Suppose, as I think is not too far from the truth, that scientists typically would like the theories they accept to be true.6 Thus acceptance of theory A is an arrangement in which scientists get the consequence "Accepting A when it is true" if A is true, and "Accepting A when it is false" if A is false. They regard the former consequence as more desirable than the latter consequence, and so this is a bet on A - even though nobody can definitively establish whether A is true. Since scientists can (and do) accept scientific theories, they can (and do) make bets on propositions whose truth value cannot be conclusively ascertained. Thus (2) is false.

6 It would make no difference if we said they want theories they accept to be approximately true, or to be empirically adequate. For we cannot definitively establish whether a theory has these properties, just as we cannot definitively establish whether it is true.

In Chapter 8 I will show that preferences over the cognitive options of accepting various theories, together with conducting various experiments, are all that is needed in order to apply the preference interpretation of probability to scientific theories.

So the preference interpretation of subjective probability is applicable to the subjective probabilities of scientific theories.

4.4 SUBJECTIVITY

Probably the most common objection raised against Bayesian philosophy of science is that it is "too subjective." A vivid example is the following passage by Wesley Salmon. Salmon is discussing the view that satisfaction of the axioms of probability is necessary and sufficient for rationality, a view he associates with Ramsey and de Finetti (Salmon 1967, n. 102). He objects that on this view:

You cannot be convicted of irrationality as long as you are willing to make the appropriate adjustments elsewhere in your system of beliefs. You can believe to degree 0.99 that the sun will not rise tomorrow. You can believe with equal conviction that hens will lay billiard balls. You can maintain with virtual certainty that a coin that has consistently come up heads three quarters of the time in a hundred million trials is heavily biased for tails! There is no end to the plain absurdities that qualify as rational. It is not that the theory demands the acceptance of such foolishness, but it does tolerate it. (Salmon 1967, p. 81)

This objection can be read in two ways. On one reading, what it claims is that you, a person who thinks that the sun will rise tomorrow, and so on, could arbitrarily switch to thinking that the sun will not rise, and so on, without being deemed irrational by this theory. That is a relevant objection to the view that satisfaction of the axioms of probability is sufficient for rationality. But it is not a relevant objection to the Bayesian confirmation theory described in this chapter. The reason is that these arbitrary shifts would violate the principle of conditionalization. Since Ramsey and de Finetti also endorsed conditionalization, it is likewise not a relevant objection to their views.7

On the other possible reading of Salmon's objection, what it claims is that a being who had these bizarre probabilities (perhaps by being born with them) would not necessarily be deemed irrational by this theory. In this case, it is not so obvious to me that we are dealing with a case of irrationality. Still, I don't wish to deny the point, that a person who satisfies the formal requirements of Bayesian theory may nevertheless be irrational. I take Bayesian theory to be specifying necessary conditions of rationality, not sufficient ones (compare Section 1.9). So again, my response to Salmon's objection would be that the position he is criticizing is not the one I am defending. I think it is also not the position Ramsey defended. Ramsey (1926, sec. 5) says that logic, which he conceives as the rules of rational thought, contains not only a "logic of consistency" (including probability theory), but also a "logic of truth" (which specifies what inference patterns are reliable).

Perhaps it will now be objected that Bayesian theory is incomplete, because it does not give a sufficient condition of rationality. But I am pessimistic about the prospects for a formal theory that would give a sufficient condition of rationality. Of course, those who think otherwise are welcome to pursue that research program. In any case, it is enough for my purposes here if Bayesian theory gives a necessary condition of rationality.

7 Kyburg has claimed that Bayesians have no good rationale for conditionalization, and that it is inconsistent with the basic approach of Bayesian theory. These claims will also be shown to be mistaken, in Section 5.2.


4.5 EMPIRICISM

According to Bayesian confirmation theory, a rational person's subjective probability function is derived from an earlier subjective probability function, updated by what has been learned in the interim. But this process obviously cannot go back forever. Idealizing somewhat, we ultimately reach a subjective probability function that the person was born with.

The probabilities that a person has today will be to some extent dependent on what that initial probability function was. There are theorems to show that, under certain conditions, and given enough evidence, the effect of the initial probability function will be negligible. While these theorems are significant, the conditions that they require will not always be satisfied.

The upshot is that according to Bayesian confirmation theory, a person's probabilities today are not solely based on experience, but are also partly a function of an initial probability distribution that is unshaped by the person's experience. This situation is unacceptable to strict empiricists, who want all opinion about contingent facts to be based on experience. Locke and Hume were empiricists of this ilk; more recently, van Fraassen has advocated a similar position.8

I would answer this objection by simply observing that the only way to conform to the strict empiricist standard is to have no opinions at all about contingent facts, except perhaps those that have been directly observed. For any method of reasoning beyond direct experience cannot itself be based on experience and, in fact, performs the same role as the Bayesian's prior probability function. Since empiricists are plainly unwilling to accept the skepticism their position entails, their objections cannot be taken seriously. (An equally serious objection would be that Bayesians do not base their probabilities solely on the natural light of reason.)

It is sometimes thought that any opinion about contingent fact, if not based on experience, must be "synthetic a priori." Thus Kyburg (1968) argues that Bayesian prior probability functions commit one to a priori synthetic knowledge. This is at best misleading. A priori knowledge, as traditionally understood, is not only not based on experience, but is also not revisable by experience. A subjective probability function, even if not based on experience, is revisable by experience. Indeed, Bayesian confirmation theory is precisely a theory about how subjective probabilities should be revised in the light of experience. In this sense, subjective probabilities are not a priori.

8 "Nothing except experience may be treated as a source of information (hence, prior opinion cannot have such a status, so radical breaks with prior opinion - 'throwing away priors,' as born-again Bayesians call it - can be rational procedure)" (van Fraassen 1985, p. 250).

So while Bayesian confirmation theory is not strictly empiricist, I think it captures as much of empiricism as is reasonable.

4.6 THE DUTCH BOOK ARGUMENT FOR PROBABILITY

I began this chapter by noting the fundamental assumption of Bayesian philosophy of science, namely that scientists have subjective probabilities for scientific theories. Since the theory is normative, a more careful formulation of this fundamental assumption would be that rationality requires scientists to have (not necessarily precise) subjective probabilities for scientific theories. On the preference interpretation of subjective probability that I favor, this fundamental assumption is to be defended by producing a representation theorem and arguing that most scientists are committed to the basic conditions on preference assumed by this theorem.

This way of defending the fundamental assumption of Bayesian philosophy of science is not the usual one adopted by Bayesian philosophers of science. Typically, Bayesian philosophers of science appeal instead to the Dutch book argument (Horwich 1982, Howson and Urbach 1989). This argument does have the advantage of being much shorter than any representation theorem. So I need to explain why I invoke a representation theorem, not the Dutch book argument. My reason is that the Dutch book argument is fallacious, as I will now show.

My task here is complicated by the fact that "the Dutch book argument" has been propounded by different authors in different ways; so it is not really one argument but rather a family of resembling arguments. The approach I will adopt is to consider first a definite and simple form of the argument and identify the fallacy as it occurs in this form. I will subsequently show that various refinements of this argument either (a) do not avoid the fallacy, or (b) replace it with another fallacy, or (c) really abandon the Dutch book approach in favor of the representation theorem approach.

4.6.1 The simple argument

First, some gambling terminology: If you pay $r for the right to receive $s if A is true, you are said to have made a bet on A with a betting quotient of r/s, and stake s. Here r may be positive, zero, or negative; and s may be positive or negative.

Dutch book arguments normally assume that for any proposition A, there is a number p(A) such that you are willing to accept any bet with betting quotient p(A). As the notation indicates, p(A) is thought of as your subjective probability for A; and the task of the Dutch book argument is to show that it deserves to be called a probability, by showing that rationality requires p to satisfy the axioms of probability.

Dutch book arguments purport to do this by showing that if p does not satisfy the axioms of probability, then you will be willing to accept bets that necessarily give you a loss. A set of bets with this property is called a Dutch book; hence the name Dutch book argument.9 Since the argument has been given in full in many places, I will here merely give an example of how it is supposed to work. For this example, let H denote that a coin will land heads on the next toss, and suppose that for you p(H) = p(H̄) = .6, which violates the axioms of probability. The Dutch book argument, applied to this case, goes as follows: Since p(H) = .6, you are willing to pay 60 cents for a bet that pays you $1 if H. The same holds for H̄. But these two bets together will result in you losing 20 cents no matter how the coin lands; they constitute a Dutch book. The Dutch book argument assumes that you do not want to give away money to a bookie, and concludes that your willingness to accept these two bets shows you are irrational.

9 The term 'Dutch book' was introduced to the literature by Lehman (1955), who stated that the term was used by bookmakers.
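The arithmetic of the example above is easily mechanized. In this minimal sketch, the only inputs are the incoherent betting quotients of .6 on each of H and H̄:

```python
stake = 1.00        # each bet pays $1 if won
quotient = 0.60     # your betting quotient for H and for not-H
price = quotient * stake

for heads in (True, False):
    win_H = stake if heads else 0.0       # payoff of the bet on H
    win_notH = 0.0 if heads else stake    # payoff of the bet on not-H
    net = (win_H - price) + (win_notH - price)
    assert abs(net + 0.20) < 1e-9         # a 20-cent loss either way
```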


4.6.2 The fallacy

Suppose that, after a night on the town, you want to catch the bus home. Alas, you find that you have only 60 cents in your pocket, and the bus costs $1. A bookie, learning of your plight, offers you the following deal: If you give him your 60 cents, he will toss a coin; and if the coin lands heads, he will give you $1; otherwise, you have lost your 60 cents. If you accept the bookie's proposal, you stand a 50-50 chance of being able to take the bus home, while rejecting it means you will certainly have to walk. Under these conditions, you may well feel that the bookie's offer is acceptable; let us suppose you do. Presumably the offer would have been equally acceptable if you were betting on tails rather than heads; there is no reason, we can suppose, to favor one side over the other.

As subjective probability was defined for the simple Dutch book argument, your probability for heads in the above scenario is at least .6, and so is your probability for tails; thus your probabilities violate the probability calculus. The Dutch book argument claims to deduce from this that you are irrational. And yet, given the predicament you are in, your preferences seem perfectly reasonable.

Looking back over the simple Dutch book argument I gave, we can see what has gone wrong. From the fact that you are willing to accept each of two bets that together would give a sure loss, that argument infers that you are willing to give away money to a bookie. This assumes that if you are willing to accept each of the bets, you must be willing to accept both of them. But that assumption is surely false in the present case. In being willing to bet at less than even money on either heads or tails, you are merely being sensible; but you would certainly have taken leave of your senses if you were willing to accept both bets together. Accepting both bets, like accepting neither, means you will have to walk home.

This, then, is the fallacy in the Dutch book argument: It assumes that bets that are severally acceptable must also be jointly acceptable; and as our example shows, this is not so.

Now as I mentioned before, there are many versions of the Dutch book argument. Fans of one or another of these versions are sure to protest that the problem just identified is only a problem for the simple form of the argument I illustrated, and is not shared by their favorite version of the argument. But I think that response would be mistaken and that the elementary fallacy just identified actually goes to the core of the Dutch book approach. In the following subsections, I will substantiate this claim by considering the main refinements of the Dutch book argument known to me.

4.6.3 Introduction of utilities

One refinement of the simple Dutch book argument is to define betting quotients in terms of utilities rather than monetary amounts (Shimony 1955, Horwich 1982). In this version of the argument, if you pay $r for the right to receive $s if A is true, and you have utility function u, then the betting quotient is said to be u($r)/u($s). This will in general be different from the value r/s that was taken to be the betting quotient in the simple argument. Apart from that difference, everything proceeds as before: It is assumed that for any proposition A, there is a number p(A) such that you are willing to accept any bet with betting quotient p(A) - betting quotients now being defined in the way just stated. And it is argued that if the function p so defined does not satisfy the axioms of probability, then you are willing to accept a set of bets that together produce a sure loss.

Note that this approach assumes that a rational person does have a utility function. A skeptic could well question this assumption. In fact, prior to the representation theorem of von Neumann and Morgenstern (1947), it was standardly denied that utilities had any more than ordinal significance. This kind of skepticism can be answered, as von Neumann and Morgenstern answered it, by giving a representation theorem. But if Dutch book arguments appeal to a representation theorem, they have lost their putative advantage of simplicity over representation theorems.

Even if the assumption of utilities is granted, a more serious problem remains, namely: The Dutch book argument with utilities is still fallacious, in exactly the same way as the simple argument. For even with the introduction of utilities, the argument still assumes that bets that are severally acceptable are jointly acceptable, and we have seen that this is not a requirement of rationality. To see that this is so, return to the example of Section 4.6.2, and suppose your utility function u is such that

    u($0) = 0;  u($0.40) = .4;  u($0.60) = .6;  u($1.00) = 2.

Since you currently have $0.60, the expected utility of not accepting the bookie's deal is .6. Assuming your subjective probability for heads is 1/2, the expected utility of accepting the bookie's offer to bet on the coin landing heads is 1. (You have $1 if the coin lands heads, and nothing otherwise.) Thus you are willing to accept this bet. And the same is true if the bet is on tails rather than heads. But to accept both bets would leave you with only $0.40, and since the utility of that (.4) is less than the status quo, you are not willing to accept both bets. From this perspective, the reason why bets that are severally acceptable need not be jointly acceptable is that utility need not be a linear function of money. More generally, utilities need not be additive: The utility of two things together need not be equal to the sum of the utilities of each separately.
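The computations in this paragraph can be laid out explicitly. The following sketch just encodes the utilities given above, with each bet costing $0.60 and paying $1 if won:

```python
u = {0.00: 0.0, 0.40: 0.4, 0.60: 0.6, 1.00: 2.0}   # utilities from the text
p_heads = 0.5

eu_status_quo = u[0.60]                             # keep your 60 cents
eu_heads_bet = p_heads * u[1.00] + (1 - p_heads) * u[0.00]      # = 1.0
eu_tails_bet = (1 - p_heads) * u[1.00] + p_heads * u[0.00]      # = 1.0
# Taking both bets costs $1.20 and returns $1 (exactly one side wins),
# leaving $0.40 of your original $0.60.
eu_both_bets = u[0.40]                              # = 0.4

assert eu_heads_bet > eu_status_quo                 # severally acceptable
assert eu_tails_bet > eu_status_quo
assert eu_both_bets < eu_status_quo                 # jointly unacceptable
```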

One might try to repair the argument by requiring that the bets considered be for some commodity in which utilities are additive. But as Schick (1986) argues, one would then need a reason to think that such a commodity exists, and no such reason is in sight.10

Besides not fixing the fallacy in the simple argument, the introduction of utilities creates a new fallacy that was not present in the simple argument. To see this, suppose your probabilities (defined with betting quotients in terms of utilities) are such that p(H) = p(H̄) = .6. The argument that this violation of the probability calculus is irrational is now that you are willing to pay .6 utiles (units of utility) for a bet that pays 1 utile if H is true; and you would pay the same for a bet that pays 1 utile if H is false; and both bets together produce a sure loss. We have already seen that you need not be willing to accept both bets together.

10 If a person's probabilities satisfy the axioms of probability, then it can be shown that (expected) utilities are additive over lottery tickets. But the Dutch book argument is trying to show that probabilities should satisfy the axioms, so it would beg the question to rely on this fact here.


The point I wish to make now is that even if you did accept both bets together, you need not suffer a loss. For example, suppose your utilities are such that

    u($1.00) = 1;  u(-$0.25) = -.6.

Then to pay .6 utiles for a bet that pays 1 utile if H (or H̄) is to pay 25 cents for a bet that yields $1 if H (or H̄). Thus to accept both bets is to make a sure profit (of 50 cents), not a sure loss.
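Spelled out in dollars, the arithmetic is this (a trivial check using the utilities just stipulated):

```python
price = 0.25    # the dollar cost equivalent to paying .6 utiles
payoff = 1.00   # the dollar payoff equivalent to 1 utile

for heads in (True, False):
    # One bet is on H, the other on not-H; exactly one of them wins,
    # whichever way the coin lands.
    net = payoff - 2 * price
    assert net == 0.50   # a sure profit of 50 cents, not a sure loss
```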

4.6.4 Posting betting quotients

In de Finetti's formulation of the Dutch book argument, one considers a (hopefully fictitious) situation in which you are required to state some number p(A) for each event A, and must then accept whatever bets anyone cares to make with you, provided the betting quotients agree with your stated probabilities.11 It is then argued that if the betting quotients you post do not satisfy the axioms of probability, bets can be made that you are obliged to accept and that together will result in you suffering a sure loss.

This version of the Dutch book argument eliminates by fiat the possibility that you will not be willing to accept jointly bets that are severally acceptable. You are now required to accept all bets at your stated betting quotients. And if the betting quotient is defined in terms of money rather than utility (as it was in de Finetti's presentation), the difficulties raised by the introduction of utilities are also avoided.

On the other hand, de Finetti's version of the Dutch book argument introduces new difficulties. First, we can question the assumption that it would be irrational for you to set betting quotients such that someone could make a Dutch book against you. This need not be so. Consider again the scenario in which you only have 60 cents, and need $1 to ride the bus home. You might know the following things:

• If you post betting quotients that satisfy the axioms of probability, then the bookie will make no bet with you (and so you will have to walk home).

11 See (de Finetti 1931, pp. 304, 308, and 1937, p. 102). The relevant passages from the former article are translated in (Heilig 1978, p. 331).


• If you post betting quotients of .6 for both heads and tails, the bookie will place a bet on one or the other, but not both.

Since you prefer a bet at the betting quotient of .6 to no bet, it is then rational for you to post betting quotients of .6 on heads and tails. Hence it can be rational to post betting quotients that leave you open to a Dutch book.

The preceding difficulty might be avoided by stipulation. De Finetti has already stipulated that the situation being considered is one in which you are obliged to post betting quotients and accept all bets that anyone may care to make at these betting quotients. Why not add the further stipulation that there is a bookie who will make a Dutch book against you if that is possible? With this additional stipulation, we have that you will indeed suffer a sure loss if your posted betting quotients do not satisfy the axioms of probability.

Suppose we make this stipulation. Still it does not follow that you would be irrational to post betting quotients that violate the axioms of probability. For example, suppose you know that if you post betting quotients of .6 on a coin landing heads, and .6 on it landing tails, then the bookie will make bets with you such that you lose $1 for sure, and no other bets will be made. Suppose you also know that if you post betting quotients of .5 on heads and .5 on tails, the bookie will bet you $100 that the coin will land heads. If you are risk averse, you might well prefer to accept the sure loss of $1 than to gamble on losing $100; hence it would be rational for you to post betting quotients that violate the probability calculus.
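To see how risk aversion can make the sure $1 loss preferable, consider a minimal sketch with an assumed concave utility for wealth and an assumed initial wealth of $200; reading the bookie's bet as an even-odds gamble that wins or loses $100 is one natural way to make the example concrete.

```python
import math

def u(wealth):
    return math.log(wealth)   # an assumed risk-averse (concave) utility

wealth = 200.0                # assumed initial wealth

eu_sure_loss = u(wealth - 1)  # post .6/.6 quotients: lose $1 for certain
eu_gamble = 0.5 * u(wealth - 100) + 0.5 * u(wealth + 100)  # post .5/.5

assert eu_sure_loss > eu_gamble   # the Dutch-bookable quotients win out
```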

Of course, this gap in the argument can be closed by a further stipulation. We can suppose you are sure, not only that the bookie will make you suffer a sure loss if you post betting quotients violating the probability calculus, but also that the bookie will bankrupt you. I am willing to concede, at least for the sake of argument, that with all these stipulations assumed, it would indeed be irrational for you to post betting quotients that violate the axioms of probability. (If my concession is premature, some further stipulations can always be introduced to close the gaps.) But, I shall argue, this superficially successful defense of de Finetti's argument achieves a merely Pyrrhic victory.


Imagine someone arguing that subjective probabilities must satisfy the axioms of probability because, if you are required to assign numbers to propositions, and are told you will be shot if they do not satisfy the axioms of probability, then you would be irrational to assign numbers that violate the axioms. Clearly this argument has no force at all. (An exactly parallel argument could be used to "show" that subjective probabilities must violate the axioms of probability: Just stipulate that if the numbers you assign satisfy the axioms, you will be shot.) The flaw in the argument is that it assumes the numbers a person assigns to propositions in some arbitrarily specified situation can be identified with the person's subjective probabilities; and plainly that is not so.

But now let us ask whether de Finetti's argument does not have the same flaw. When it is stipulated that

• You must post betting quotients and accept all bets anyone wants to make at your posted betting quotients; and

• there is a bookie who will bankrupt you if you post betting quotients that do not satisfy the axioms of probability,

is there any good reason to think the betting quotients you specify represent your subjective probabilities? To answer this question, we need to think about what makes something your subjective probability. In Chapter 1, I suggested that for a function p to be your subjective probability function, it must be the case that the attribution of p and u to you provides a sufficiently good interpretation of your preferences. And a central desideratum for an interpretation is that it represent your preferences as by and large maximizing expected utility.

Now the numbers that you would be rational to nominate, when all the stipulations de Finetti must make are assumed, need not be your subjective probabilities, in this sense. For example, suppose your true subjective probabilities are .6 for H and H̄. Given the stipulations de Finetti must make, you know that if you post betting quotients of .6 for both these events, you will be bankrupted. You also figure that if you post betting quotients of .5 for both events, you stand at most a .6 chance of being bankrupted. (You are bankrupted iff (a) the bookie bets an amount equal to your total wealth on H or H̄, but not both; and (b) the bookie wins the bet. You put the probability of (b) at .6.) Hence you see that you would be irrational to post betting quotients equal to your true subjective probabilities.

Thus de Finetti's argument is fallacious, in the same way as the argument that appeals to what you would do when a gun is put to your head. At best, the argument shows that, under some specified conditions, it would be rational for you to assign numbers to propositions in a way that satisfies the axioms of probability. Since these numbers need not be your subjective probabilities, it does not follow that rationality requires you to have subjective probabilities satisfying the axioms of probability.

4.6.5 Mathematical aggregation

Skyrms (1987a) has recently claimed that the Dutch book argument can be reformulated in a way that makes it cogent. He refers to the result of making both bets b1 and b2 as the physical aggregate of b1 and b2. This is the kind of combination of bets that has been considered in the versions of the Dutch book argument discussed so far. Skyrms proposes to consider instead what he calls mathematical aggregates of bets. The mathematical aggregate of b1 and b2 is an arrangement that, in each state, gives a consequence whose utility is equal to the sum of the utilities of the consequences b1 and b2 give in that state.

In the first two versions of the Dutch book argument that I considered, it was erroneously assumed that bets that are severally acceptable must be jointly acceptable. In the terminology just introduced, we can put the point by saying that the physical aggregate of a set of acceptable bets need not itself be an acceptable bet.

From the perspective of expected utility theory, the reason why severally acceptable bets need not be jointly acceptable is that the expected utility of the physical aggregate need not be the sum of the expected utilities of the bets involved. On the other hand, the expected utility of a mathematical aggregate is necessarily the sum of the expected utilities of the aggregated bets. Hence if each of a set of bets is acceptable (i.e., each has higher expected utility than the status quo), then the mathematical aggregate of those bets should also be acceptable. So if we replace physical by mathematical aggregates, a false assumption of Dutch book arguments is replaced by a true one.

However, this also means that the mathematical aggregate of a set of separately acceptable bets must have greater expected utility than the status quo. Thus the mathematical aggregate of a set of separately acceptable bets can never give a sure loss. Hence if physical aggregation is replaced by mathematical aggregation, a Dutch book argument cannot possibly show that a person who violates the probability calculus is willing to accept a sure loss. Nor does Skyrms claim that it could.

What, then, is Skyrms's version of the Dutch book argument? Skyrms makes the following assumptions. (Here f#g denotes the mathematical aggregate of f and g.)

(1) If b gives the same utility u in every state, then the expected utility of b, EU(b), equals u.

(2) If EU(b1) = EU(b1') and EU(b2) = EU(b2'), then EU(b1#b2) = EU(b1'#b2').

Skyrms shows that given these assumptions, probabilities must satisfy the additive law of probability (i.e., the probability of a disjunction of incompatible propositions must be equal to the sum of the probabilities of the disjuncts).
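The key property of mathematical aggregation - that expected utilities add, so that severally acceptable bets can never aggregate to a sure loss - can be checked directly in a small finite model (all numbers here are illustrative):

```python
p = {"s1": 0.5, "s2": 0.5}                 # probabilities of the states

def EU(bet):
    return sum(p[s] * bet[s] for s in p)   # expected utility of a bet

f = {"s1": 1.0, "s2": -0.5}                # payoffs given directly in utiles
g = {"s1": -0.5, "s2": 1.0}
f_hash_g = {s: f[s] + g[s] for s in p}     # the mathematical aggregate f#g

assert abs(EU(f_hash_g) - (EU(f) + EU(g))) < 1e-12
# If f and g each beat the status quo (EU > 0), so does f#g:
assert EU(f) > 0 and EU(g) > 0 and EU(f_hash_g) > 0
```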

Skyrms anticipates objections to assumption (2); and he responds that this can be derived from more restrictive principles. He also begins to sketch a representation theorem, a la Ramsey, which would enable the additivity of probability to be derived without assuming (2). He says that "If we pursue this line we, like Ramsey, will have followed the dutch book theorems to deeper and deeper levels until it leads to the representation theorem."

However, what Skyrms here presents as a defense of Dutch book arguments, I interpret as an abandonment of such arguments, in favor of the representation theorem approach. The point is partly semantic. I think it is most useful to use the term 'Dutch book argument' to refer only to arguments that involve a Dutch book, that is, a set of bets that together produce a sure loss. Skyrms's version of the Dutch book argument involves no Dutch book, so is not really a Dutch book argument at all. Similarly, no Dutch book figures in representation theorems, so representation theorems are not a species of Dutch book argument.

Apart from the semantic issue of what counts as a Dutch book argument, I think Skyrms and I are in complete agreement. He seems to allow that it cannot cogently be argued that violation of the probability calculus makes you susceptible to a Dutch book. And he maintains, as I do, that the cogent way to argue that rational scientists have subjective probabilities, satisfying the axioms of probability, is to use a representation theorem.


Diachronic rationality

We have seen that Bayesian confirmation theory rests on two assumptions: (1) that rational scientists have probabilities for scientific hypotheses, and (2) the principle of conditionalization. The latter is a diachronic principle of rationality, because it concerns how probabilities at one time should be related to probabilities at a later time.

Chapters 1-4 gave an extended argument in support of (1). This chapter will examine what can be said on behalf of (2). I will reject the common Bayesian view that conditionalization is a universal requirement of rationality, but argue that nevertheless it should hold in normal scientific contexts.

I begin by discussing a putative principle of rationality known as Reflection. A correct understanding of the status of this principle will be the key to my account of the status of conditionalization.

5.1 REFLECTION

Suppose you currently have a (personal) probability function p, and let Rq denote that at some future time t + x you will have probability function q. Goldstein (1983) and van Fraassen (1984) have claimed that the following identity is a requirement of rationality:1

p(·|Rq) = q(·).

Following van Fraassen (1984), I will refer to this identity as Reflection.

1 Goldstein actually defends a stronger condition; but the argument for his stronger condition is the same as for the weaker one stated here.

As an example of what Reflection requires, suppose you are sure that you cannot drive safely after having ten drinks. Suppose further that you are sure that after ten drinks, you would be sure (wrongly, as you now think) that you could drive safely. Then you violate Reflection. For if p is your current probability function, q the one you would have after ten drinks, and D the proposition that you can drive safely after having ten drinks, we have

p(D|Rq) ≈ 0 < 1 ≈ q(D).

Reflection requires p(D|Rq) = q(D) ≈ 1. Thus you should now be sure that you would not be in error, if in the future you become sure that you can drive safely after having ten drinks.

5.1.1 The Dutch book argument

Why should we think Reflection is a requirement of rationality? According to Goldstein and van Fraassen, this conclusion is established by a diachronic Dutch book argument. A diachronic Dutch book argument differs from a regular Dutch book argument in that the bets are not all offered at the same time. But like a regular Dutch book argument, it purports to show that anyone who violates the condition is willing to accept bets that together produce a sure loss, and hence is irrational.

Since the diachronic Dutch book argument for Reflection has been stated in full generality elsewhere (Goldstein 1983, van Fraassen 1984, Skyrms 1987b), I will here merely illustrate how it works. Suppose, then, that you violate Reflection with respect to drinking and driving, in the way indicated above. For ease of computation, I will assume that p(D) = 0 and q(D) = 1. (Using less extreme values would not change the overall conclusion.) Let us further assume that your probability that you will have ten drinks tonight is 1/2. The Dutch bookie tries to make a sure profit from you by first offering a bet b1 whose payoff in units of utility is2

-2 if DRq;  2 if D̄Rq;  -1 if R̄q.

2 Conjunction is represented by concatenation, and negation by overbars. For example, D̄Rq is the proposition that D is false and Rq is true.

For you at this time, p(DRq) = 0, and p(D̄Rq) = p(Rq) = 1/2. Thus the expected utility of b1 is 1/2. We are taking the utility of the status quo to be 0, and so the bookie figures that you will accept this bet. If you accept the bet and do not get drunk (Rq is false), you lose one unit of utility. If you accept and do get drunk (Rq is true), the bookie offers you b2, whose payoff in units of utility is

1 if D;  -3 if D̄.

Since you are now certain D is true, accepting b2 increases your expected utility, and so the bookie figures you will accept it. But now, if D is true, you gain 1 from b2 but lose 2 from b1, for an overall loss of 1. And if D is false, you gain 2 from b1 but lose 3 from b2, again losing 1 overall. Thus no matter what happens, you lose.3

3 In presentations of this argument, it is usual to have two bets, where I have the single bet b1. Those two bets would be a bet on Rq, and a bet on D which is called off if Rq is false. By using a single bet instead, I show that the argument does not here require the assumption that bets that are separately acceptable are also jointly acceptable.
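The sure loss is easy to verify mechanically. The following sketch (my own check of the example's arithmetic, not part of the original argument) enumerates the relevant states and confirms that someone who accepts b1, and then accepts b2 whenever Rq obtains, loses one unit of utility no matter what happens:

    def b1_payoff(D, Rq):
        # Payoff of b1: -2 if DRq, 2 if not-D and Rq, -1 if not-Rq.
        if Rq:
            return -2 if D else 2
        return -1

    def b2_payoff(D):
        # Payoff of b2, which is offered (and accepted) only when Rq is true.
        return 1 if D else -3

    for D in (True, False):
        for Rq in (True, False):
            total = b1_payoff(D, Rq) + (b2_payoff(D) if Rq else 0)
            assert total == -1   # a loss of one unit in every state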

5.1.2 Counterexamples

Despite this argument, there are compelling prima facie counterexamples to Reflection. Indeed, the drinking/driving example is already a prima facie counterexample; it seems that you would be right to now discount any future opinions you might form while intoxicated - contrary to what Reflection requires. But we can make the counterexample even more compelling by supposing you are sure that tonight you will have ten drinks. It then follows from Reflection that you should now be sure that you can drive safely after having ten drinks.

Proof. p(D) = p(D|Rq), since p(Rq) = 1
            = q(D), by Reflection
            = 1.

This result seems plainly wrong. Nor does it help to say that a rational person should not drink so much; for it may be that the drinking you know you will do tonight will not be voluntary.

A defender of Reflection might try responding to such counterexamples by claiming that the person you would be when drunk is not the same person who is now sober. If you were absolutely sure of this, then for you p(Rq) = 0, since Rq asserts that you will come to have probability function q.4 In that case, p(·|Rq) may be undefined and the counterexample thereby avoided. But this is a desperate move. Nobody I know gives any real credence to the claim that having ten drinks, and as a result thinking he or she can drive safely, would destroy his or her personal identity. They are certainly not sure that this is true.

Alternatively, defenders of Reflection may bite the bullet, and declare that even when it is anticipated that one's probabilities will be influenced by drugs, Reflection should be satisfied. Perhaps nothing is too bizarre for such a die-hard defender of Reflection to accept. But it may be worth pointing out a peculiar implication of the position here being embraced: It entails that rationality requires taking mind-altering drugs, in circumstances where that position seems plainly false. I will now show how that conclusion follows.

It is well known that under suitable conditions, gathering evidence increases the expected utility of subsequent choices, if it has any effect at all. The following conditions are sufficient:5

1. The evidence is "cost free"; that is, gathering it does not alter what acts are subsequently available, nor is any penalty incurred merely by gathering the evidence.

2. Reflection is satisfied for the shifts in probability that could result from gathering the evidence. That is to say, if p is your current probability function, then for any probability function q you could come to have as a result of gathering the evidence, p(·|Rq) = q(·).

3. The decision to gather the evidence is not "symptomatic"; that is, it is not probabilistically relevant to states it does not cause.

4. Probabilities satisfy the axioms of probability, and choices maximize expected utility at the time they are made.

4 I assume that you are sure you cannot come to have q other than by drinking. This is a plausible assumption if (as we can suppose) q also assigns an extremely high probability to the proposition that you have been drinking.

5 I assume causal decision theory. For a discussion of this theory, and a proof that the stated conditions are indeed sufficient in this theory, see (Maher 1990b). In that work, I referred to Reflection as Miller's principle.


Now suppose you have the opportunity of taking a drug that will influence your probabilities in some way that is not completely predictable. The drug is cost free (in particular, it has no direct effect on your health or wealth), and the decision to take the drug is not symptomatic. Assume also that rationality requires condition 4 above to be satisfied. If Reflection is a general requirement of rationality, condition 2 should also be satisfied for the drug-induced shifts. Hence all four conditions are satisfied, and it follows that you cannot reduce your expected utility by taking this drug; and you may increase it.

For example, suppose a bookie is willing to bet with you on the outcome of a coin toss. You have the option of betting on heads or tails; you receive $1 if you are right and lose $2 if you are wrong. Currently your probability that the coin will land heads is 1/2, and so you now think the best thing to do is not bet. (I assume that your utility function is roughly linear for such small amounts of money.) But suppose you can take a drug that will make you certain of what the coin toss will be; you do not know in advance whether it will make you sure of heads or tails, and you antecedently think both results equally likely.6 The drug is cost free, and you satisfy condition 4. Then if Reflection should hold with regard to the drug-induced shifts, you think you can make money by taking the drug. For after you take the drug, you will bet on the outcome you are then certain will result; and if you satisfy Reflection, you are now certain that bet will be successful. By contrast, if you do not take the drug, you do not expect to make a profit betting on this coin toss. Thus the principle of maximizing expected utility requires you to take the drug.
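The expected utilities driving this conclusion are easy to compute. Here is a quick sketch (hypothetical, using the dollar amounts of the example) contrasting the evaluation that satisfies Reflection with the "random certainty" evaluation discussed next:

    win, lose = 1.0, -2.0    # $1 if your bet is right, -$2 if wrong

    # If Reflection holds for the drug-induced shifts, then conditional on the
    # drug making you sure of an outcome, you are now sure the bet on that
    # outcome will win; so taking the drug is worth $1 in expectation.
    ev_drug_reflection = 0.5 * win + 0.5 * win    # = 1.0

    # If instead you think the drug confers certainty at random, the post-drug
    # bet wins with probability 1/2 either way, and the drug looks like a bad deal.
    ev_drug_random = 0.5 * win + 0.5 * lose       # = -0.5

    ev_no_drug = 0.0    # declining to bet leaves the status quo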

But in fact, it is clear that taking the drug need not be rational. You could perfectly rationally think that the bet you would make after taking the drug has only a 50-50 chance of winning, and hence that taking the drug is equivalent to choosing randomly to bet on heads or tails. Since thinking that violates Reflection, we have another reason to deny that Reflection is a requirement of rationality.

6 This condition is necessary in order to ensure that your decision to take the drug is not symptomatic. If you thought the drug was likely to make you sure the coin will land heads (say), and if Reflection is satisfied, then the probability of the coin landing heads, given that you take the drug, would also be high. Since taking the drug has no causal influence on the outcome of the toss, and since the unconditional probability of heads is 1/2, taking the drug would then be a symptomatic act.

5.1.3 The fallacy

We now face a dilemma. On the one hand, we have a diachronic Dutch book argument to show that Reflection is a requirement of rationality. And on the other hand, we have strong reasons for saying that Reflection is not a requirement of rationality. There must be a mistake here somewhere.

In (Maher 1992), following Levi (1987), I argued that a sophisticated bettor who looks ahead will not accept the bets offered in the Dutch book argument for Reflection. The thought was that if you look ahead, you will see that accepting b1 inevitably leads to a sure loss, and hence will refuse to take the first step down the primrose path. This diagnosis assumed that if you do not accept b1, you will not be offered b2. However, Skyrms (in press) points out that the bookie could offer b2 if Rq obtains, regardless of whether b1 has been accepted. Faced with this strategy, you do best (maximize expected utility) to accept b1 as well, and thus ensure a sure loss.

So with Skyrms's emendation, the diachronic Dutch book argument does show that if you violate Reflection, you can be made to suffer a sure loss. Yet as Skyrms himself agrees, it is not necessarily rational to conform to Reflection. Thus we have to say that susceptibility to a sure loss does not prove irrationality. This conclusion may appear counterintuitive; but that appearance is an illusion, I will now argue.

We say that act a dominates act b if, in every state, the consequence of a is better than that of b. It is uncontroversial that it is irrational to choose an act that is dominated by some other available act. Call such an act dominated. One might naturally suppose that accepting a sure loss is a dominated act, and thereby irrational.

But consider this case: I have bet that it will not rain today. The deal, let us say, is that I lose $1 if it rains and win $1 otherwise. How I came to make this bet does not matter - perhaps it looked attractive to me at the time; perhaps I made it under duress. In any case, storm clouds are now gathering and I think I will lose the bet. I would now gladly accept a bet that pays me $0.50 if it rains and in which I pay $1.50 otherwise. If I did accept such a new bet, then together with the one I already have, I would be certain to lose $0.50. So I am willing to accept a sure loss. But I am not thereby irrational. The sure loss of $0.50 is better than a high probability of losing $1. Note also that although I am willing to accept a sure loss, I am not willing to accept a dominated option. My options are shown in Figure 5.1. The first option, which gives the sure loss, is not dominated by the only other available act.

                    No rain     Rain
  Accept 2nd bet    -$0.50     -$0.50
  Don't accept      +$1.00     -$1.00

Figure 5.1: Available options after accepting bet against rain

So we see that acceptance of a sure loss is not always a dominated act; and when it is not, acceptance of a sure loss can be rational. I suggest that the intuitive irrationality of accepting (or being willing to accept) a sure loss results from the false supposition that acceptance of a sure loss is always a dominated option, combined with the correct principle that it is irrational to accept (or be willing to accept) a dominated option.

Let us apply this to the Dutch book argument for Reflection. In the example of Section 5.1.1, you are now certain that accepting b2 would result in a loss, and hence you prefer that you not accept it. However, you also know that you will accept it if you get drunk. This indicates that your willingness to accept b2 when drunk is not something you are now able to reverse (for if you could, you would). Thus you are in effect now stuck with the fact that you will accept b2 if you get drunk, that is, if Rq is true. Hence you are in effect now saddled with the bet

1 if DRq;  -3 if D̄Rq,

though it looks unattractive to you now. (This is analogous to the first bet in the rain example.) But you do have a choice about whether or not to accept b1. Since b1 pays

-2 if DRq;  2 if D̄Rq;  -1 if R̄q

your options and their payoffs are as in Figure 5.2. Accepting b1 ensures that you suffer a sure loss; but it is not a dominated option. In fact, since p(D) = 0 and p(Rq) = 1/2, accepting b1 reduces your expected loss from -1.5 to -1. So in this case, as in the rain example, the willingness to accept a sure loss does not involve willingness to accept a dominated option, and does not imply irrationality.

                 DRq    D̄Rq    R̄q
  Accept b1      -1     -1     -1
  Reject b1       1     -3      0

Figure 5.2: Available options for Reflection violator
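A quick computation confirms the expected losses just cited (again a sketch of the example's arithmetic, using p(D) = 0 and p(Rq) = 1/2):

    # The three states of Figure 5.2 and their current probabilities.
    p = {"D_Rq": 0.0, "notD_Rq": 0.5, "notRq": 0.5}

    # Accepting b1 means b1 plus the unavoidable b2; rejecting b1 still leaves
    # you with the unavoidable b2 whenever Rq is true.
    accept_b1 = {"D_Rq": -1, "notD_Rq": -1, "notRq": -1}
    reject_b1 = {"D_Rq": 1, "notD_Rq": -3, "notRq": 0}

    eu = lambda act: sum(p[s] * act[s] for s in p)
    assert eu(accept_b1) == -1.0   # the sure loss
    assert eu(reject_b1) == -1.5   # the larger expected loss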

If there is any irrationality in this case, it lies in the potential future acceptance of b2. But because that future acceptance is outside your present control, it is no reason to say that you are now irrational. Perhaps your future self would be irrational when drunk, but that is not our concern. Reflection is a condition on your present probabilities only, and what we have seen is that you are not irrational to now have the probabilities you do, even though having these probabilities means you are willing to accept a sure loss.

Let us say that a Dutch book theorem asserts that violation of some condition leaves one susceptible to a sure loss, while a Dutch book argument infers from the theorem that violation of the condition is irrational. In Section 4.6, the condition in question was satisfaction of the axioms of probability, and my claim in effect was that the argument fails because the theorem is false. In the present section, the condition in question has been Reflection, and my claim has been that here the Dutch book theorem is correct but the argument based on it is fallacious. Consequently, this argument provides no reason not to draw the obvious conclusion from the counterexamples in Section 5.1.2: Reflection is not a requirement of rationality. (Christensen [1991] and Talbott [1991] arrive at the same conclusion by different reasoning.)

5.1.4 Integrity

Recognizing the implausibility of saying Reflection is a requirement of rationality, van Fraassen (1984, pp. 250-5) tried to bolster its plausibility with a voluntarist conception of personal probability judgments. He claimed that personal probability judgments express a kind of commitment; and he averred that integrity requires you to stand behind your commitments, including conditional ones. For example, he says your integrity would be undermined if you allowed that were you to promise to marry me, you still might not do it. And by analogy, he concludes that your integrity would be undermined if you said that your probability for A, given that tomorrow you give it probability r, is something other than r.

I agree that a personal probability judgment involves a kind of commitment; to make such a judgment is to accept a constraint on your choices between uncertain prospects. For example, if you judge A to be more probable than Ā, and if you prefer $1 to nothing, then faced with a choice between

(i) $1 if A, nothing otherwise

and

(ii) nothing if A, $1 otherwise,

you are committed to choosing (i). But of course, you are not thereby committed to making this choice at all times in the future; you can revise your probabilities without violating your commitment. The commitment is to make that choice now, if now presented with those options. But this being so, a violation of Reflection is not analogous to thinking you might break a marriage vow. To think you might break a marriage vow is to think you might break a commitment. To violate Reflection is to not now be committed to acting in accord with a future commitment, on the assumption that you will in the future have that commitment. The difference is that in violating Reflection, you are not thereby conceding that you might ever act in a way that is contrary to your commitments at the time of action. A better analogy for violations of Reflection would be saying that you now think you would be making a foolish choice, if you were to decide to marry me. In this case, as in the case of Reflection, you are not saying you could violate your commitments; you are merely saying you do not now endorse certain commitments, even on the supposition that you were to make them. Saying this does not undermine your status as a person of integrity.

5.1.5 Reflection and learning

In the typical case of taking a mind-altering drug, Reflection is violated, and we also feel that while the drug would shift our probabilities, we would not have learned anything in the process. For instance, if a drug will make you certain of the outcome of a coin toss, then under typical conditions the shift produced by the drug does not satisfy Reflection, and one also does not regard taking the drug as a way of learning the outcome of the coin toss.

Conversely, in typical cases where Reflection is satisfied, we do feel that the shift in probabilities would involve learning something. For example, suppose Persi is about to toss a coin, and suppose you know that Persi can (and will) toss the coin so that it lands how he wants, and that he will tell you what the outcome will be if you ask. Then asking Persi about the coin toss will, like taking the mind-altering drug, make you certain of the outcome of the toss. But in this case, Reflection will be satisfied, and we can say that by asking Persi you will learn how the coin is going to land.

What makes the difference between these cases is not that a drug is involved in one and testimony in the other. This can be seen by varying the examples. Suppose you think Persi really has no idea how the coin will land but has such a golden tongue that if you talked to him you would come to believe him; in this case, a shift caused by talking to Persi will not satisfy Reflection, and you will not think that by talking to him you will learn the outcome of the coin toss (even though you will become sure of some outcome). Conversely, you might think that if you take the drug, a benevolent genie will influence the coin toss so that it agrees with what the drug would make you believe; in this case, the shift in probabilities caused by taking the drug will satisfy Reflection, and you will think that by taking the drug you will learn the outcome of the coin toss.

These considerations lead me to suggest that regarding a potential shift in probability as a learning experience is the same thing as satisfying Reflection in regard to that shift. Symbolically: You regard the shift from p to q as a learning experience just in case p(·|Rq) = q(·).7

Shifts that do not satisfy Reflection, though not learning experiences in the sense just defined, may still involve some learning. For example, if q is the probability function you would have after taking the drug that makes you sure of the outcome of the coin toss, you may think that in shifting to q you would learn that you took the drug but not learn the outcome of the coin toss. In general, what you think you would learn in shifting from p to q is represented by the difference between p and p(·|Rq).8 When Reflection is satisfied, what is learned is represented by the difference between p and q, and we call the whole shift a learning experience.

Learning, so construed, is not limited to cases in which new empirical evidence is acquired. You may have no idea what is the square root of 289, but you may also think that if you pondered it long enough you would come to concentrate your probability on some particular number, and that potential shift may well satisfy Reflection. In this case, you would regard the potential shift as a learning experience, though no new empirical evidence has been acquired. On the other hand, any shift in probability that is thought to be due solely to the influence of evidence is necessarily regarded as a learning experience. Thus satisfaction of Reflection is necessary, but not sufficient, for regarding a shift in probability as due to empirical evidence.

7 This proposal was suggested to me by Skyrms (1990a), who assumes that what is thought to be a learning experience will satisfy Reflection. (He calls Reflection "Principle (M)".)

8 This assumes that q records everything relevant about the shift. Otherwise, it would be possible to shift from p to q in different ways (e.g., by acquiring evidence or taking drugs), some of which would involve more learning than others. This assumption could fail, e.g., if after making the shift, you would forget some relevant information about the shift. Such cases could be dealt with by replacing Rq with a proposition that specifies your probability distribution at every instant between t and t + x.

A defender of Reflection might think of responding to the counterexamples by limiting the principle to shifts of a certain kind. But the observations made in this section show that such a response will not help. If Reflection were said to be a requirement of rationality only for shifts caused in a certain way (e.g., by testimony rather than drugs), then there would still be counterexamples to the principle. And if Reflection were said to be a requirement of rationality for shifts that are regarded as learning experiences, or as due to empirical evidence, then the principle would be one that it is impossible to violate, and hence vacuous as a principle of rationality.9

5.1.6 Reflection and rationality

Although there is nothing irrational about violating Reflection, it is often irrational to implement those potential shifts that violate Reflection. That is to say, while one can rationally have p(·|Rq) ≠ q(·), it will in such cases often be irrational to choose a course of action that might result in acquiring the probability function q. The coin-tossing example of Section 5.1.2 provides an illustration of this. Let H denote that the coin lands heads, and let q be the probability function you would have if you took the drug, and it made you certain of H. Then if you think taking the drug gives you only a random chance of making a successful bet, p(H|Rq) = .5 < q(H) = 1, and you violate Reflection; but then you would be irrational to take the drug, since the expected return from doing so is (1/2)($1) - (1/2)($2) < 0.10

9 Jeffrey (1988, p. 233) proposed to restrict Reflection to shifts that are "reasonable," without saying what that means. His proposal faces precisely the dilemma I have just outlined. If a "reasonable" shift is defined by its causal origin, Jeffrey's principle is not a requirement of rationality. If a "reasonable" shift is defined to be a learning experience, Jeffrey's principle is vacuous. In the next section, we will see that if a "reasonable" shift is a shift that it would be rational to implement, Jeffrey's principle is again not a requirement of rationality.

10 Here and in what follows, I assume that the proviso of Section 1.9 holds. That is, I assume your probabilities and utilities are themselves rational, so that rationality requires maximizing expected utility.


This observation can be generalized, and made more precise, as follows. Let d and d' be two acts; for example, d might be the act of taking the drug in the coin-tossing case, and d' the act of not taking the drug. Assume that

(i) Any shift in probability after choosing d' would satisfy Reflection.

In the coin-tossing case, this will presumably be satisfied; if q' is the probability function you would have if you decided not to take the drug, q' will not differ much from p, and in particular p(·|Rq') = q'(·).

Assume also that

(ii) d and d' influence expected utility only via their influence on what subsequent choices maximize expected utility.

More fully: Choosing d or d' may have an impact on your probability function, and thereby influence your subsequent choices; but (ii) requires that they not influence expected utility in any other way. So there must not be a reward or penalty attached directly to having any of the probability functions that could result from choosing d or d'; nor can the choice of d or d' alter what subsequent options are available. This condition will also hold in the coin-tossing example if the drug is free and has no deleterious effects on health and otherwise the situation is fairly normal.11

Assume further that

(iii) If anything would be learned about the states by choosing d, it would also be learned by choosing d'.

What I mean by (iii) is that the following four conditions are all satisfied. Here Q is the set of all probability functions that you could come to have if you chose d.

(a) You are sure there is a fact about what probability function you would have if you chose d; that is, you give probability 1 to the proposition that for some q, the counterfactual conditional d □→ Rq is true.

11 According to an idea floated in Section 1.8, satisfaction of (ii) ensures that the subjective probabilities that it is rational to have are also justified.


(b) For all q ∈ Q there is a probability function q' such that you are sure that, if choosing d would give you q, then choosing d' would give you q'.

(c) There is a set S of states of nature that are suitable for calculating the expected utility of the acts that will be available after the choice between d and d' is made. (What this requires is explained in the first paragraph of the proof given in Appendix A.)

(d) For all q ∈ Q, and for q' related to q as in (b), and for all s ∈ S, p(s|Rq) = p(s|Rq').

In the coin-tossing example, condition (a) can be assumed to hold: Presumably the drug is deterministic, so that there is a fact about what probability function you would have if you took the drug, though you do not know in advance what that fact is. Condition (b) holds trivially in the coin-tossing example, because not taking the drug would leave you with the same probability function q' regardless of what effect the drug would have. Condition (c) is satisfied by taking S = {H, H̄}. And it is a trivial exercise to show that (d) holds, since

p(H|Rq) = p(H) = 1/2 = p(H|Rq').

The coin-tossing example thus satisfies condition (iii). We could say that in this example, you learn nothing about the states whether you choose d or d'.

Also assume that

(iv) d and d' have no causal influence on the states S mentioned in (c).

In the coin-tossing example, neither taking the drug nor refusing it has any causal influence on how the coin lands; and so (iv) is satisfied.

Finally, assume that

(v) d and d' are not evidence for events they have no tendency to cause.

In the coin-tossing example, (iv) and (v) together entail that p(H|d) = p(H|d') = 1/2, which is what one would expect to have in this situation.


Theorem. If conditions (i)-(v) are known to hold, then the expected utility of d' is not less than that of d, and may be greater.

So it would always be rational to choose d', but it may be irrational to choose d. The proof is given in Appendix A.

The theorem can fail when the stated conditions do not hold. For one example of this, suppose you are convinced there is a superior being who gives eternal bliss to all and only those who are certain that pigs can fly. Suppose also that there is a drug that, if you take it, will make you certain that pigs can fly. If q is the probability function you would have after taking this drug, and F is the proposition that pigs can fly, then q(F) = 1. Presumably p(F|Rq) = p(F) ≈ 0. So the shift resulting from taking this drug violates Reflection. On the other hand, not taking the drug would leave your current probability essentially unchanged. But in view of the reward attached to being certain pigs can fly, it would (or at least, could) be rational to take the drug and thus implement a violation of Reflection.12 Here the result fails, because condition (ii) does not hold: Taking the drug influences your utility other than via its influence on your subsequent decisions.

To illustrate another way in which the result may fail, suppose you now think there is a 90 percent chance that Persi knows how the coin will land, but that after talking to him you would be certain that what he told you was true. Again letting H denote that the coin lands heads, and letting qH be the probability function you would have if Persi told you the coin will land heads, we have p(H|RqH) = .9, while qH(H) = 1. Similarly for qH̄. Thus talking to Persi implements a shift that violates Reflection. If you do not talk to Persi, you will have probability function q' which, so far as H is concerned, is identical to your current probability function p; so p(H|Rq') = q'(H) = .5. Thus not talking to Persi avoids implementing a shift that violates Reflection. Your expected return from talking to Persi is

(.9)($1) + (.1)(-$2) = $0.70.

12 If eternal bliss includes epistemic bliss, taking the drug could even be rational from a purely epistemic point of view. Nevertheless, your certainty that pigs can fly would not be justified (see Section 1.8).


Since you will not bet if you do not talk to Persi, the expected return from not talking to him is zero. Hence talking to Persi maximizes your expected monetary return. And assuming your utility function is approximately linear for small amounts of money, it follows that talking to Persi maximizes expected utility. Here the theorem fails because condition (iii) fails. By talking to Persi, you do learn something about how the coin will land; and you learn nothing about this if you do not talk to him. The theorem I stated implies that the expected utility of talking to Persi is no higher than that of learning what you would learn from him, without violating Reflection; but in the problem I have described, the latter option is not available.
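The figures here can be checked in the same way (a sketch of the stated arithmetic, nothing more):

    win, lose = 1.0, -2.0
    p_win = 0.9    # your current probability that the post-testimony bet wins

    ev_talk = p_win * win + (1 - p_win) * lose   # (.9)($1) + (.1)(-$2) = $0.70
    ev_dont_talk = 0.0                           # without talking, you do not bet

    assert round(ev_talk, 2) == 0.70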

I will summarize the foregoing theorem by saying that, other things being equal, implementing a shift that violates Reflection cannot have greater expected utility than implementing a shift that satisfies Reflection. Conditions (i)-(v) specify what is meant here by "other things being equal." This, not the claim that a rational person must satisfy Reflection, gives the true connection between Reflection and rationality.

5.2 CONDITIONALIZATION

In Chapter 4, I noted that Bayesian confirmation theory makes use of a principle of conditionalization. The principle, as I formulated it there, was:

Conditionalization. If your current probability function is p, and if q is the probability function you would have if you learned E and nothing else, then q(·) should be identical to p(·|E).

An alternative formulation, couched in terms of evidence rather than learning, will be discussed in Section 5.2.3.
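In code, the principle is just conditioning on E. A toy sketch with hypothetical numbers of my own choosing:

    # Prior over four equally likely states; the learned proposition E = {s1, s2}.
    p = {"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25}
    E = {"s1", "s2"}

    def condition(p, E):
        # Return p(.|E): renormalize within E, zero outside it.
        p_E = sum(p[s] for s in E)
        return {s: (p[s] / p_E if s in E else 0.0) for s in p}

    q = condition(p, E)   # what conditionalization says your new function should be
    assert q == {"s1": 0.5, "s2": 0.5, "s3": 0.0, "s4": 0.0}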

Paul Teller (1973, 1976) reports a Dutch book argument due to David Lewis, which purports to show that conditionalization is a requirement of rationality. The argument is essentially the same as the Dutch book argument for Reflection,13 and is fallacious for the same reason.

13 But this way of putting the matter reverses the chronological order, since Lewis formulated the argument for conditionalization before the argument for Reflection was advanced.


5.2.1 Conditionalization, Reflection, and rationality

In this section, I will argue that conditionalization is not a universal requirement of rationality, and will explain what I take to be its true normative status.

Recall what the conditionalization principle says: If you learn E, and nothing else, then your posterior probability should equal your prior probability conditioned on E. But what does it mean to "learn E, and nothing else"? In Section 5.1.5, I suggested that what you think you would learn in shifting from p to q is represented by the difference between p and p(·|Rq). From this perspective, we can say that you think you would learn E and nothing else, in shifting from p to q, just in case p(·|Rq) = p(·|E).

This is only a subjective account of learning; it gives an interpretation of what it means to think E would be learned, not what it means to really learn E. But conditionalization is plausible only if your prior probabilities are rational; and then the subjective and objective notions of learning presumably coincide. So we can take the "learning" referred to in the principle of conditionalization to be learning as judged by you. In what follows, I use the term 'learning' in this way.

So if you learned E and nothing else, and if your probabilities shifted from p to q, then p(·|Rq) = p(·|E). If you also satisfy Reflection in regard to this shift, then p(·|Rq) = q(·), and so q(·) = p(·|E), as conditionalization requires. This simple inference shows that Reflection entails conditionalization.

It is also easy to see that if you learn E, and nothing else, and if your probabilities shift in a way that violates Reflection, then your probability distribution is not updated by conditioning on E. For since you learned E, and nothing else, p(·|Rq) = p(·|E); and since Reflection is not satisfied in this shift, q(·) ≠ p(·|Rq), whence q(·) ≠ p(·|E).

These results together show that conditionalization is equivalent to the following principle: When you learn E and nothing else, do not implement a shift that violates Reflection. But we saw, in Section 5.1.6, that there are cases in which it is rational to implement a shift that violates Reflection. I will now show that some of these cases are ones in which you learn E, and nothing else. This suffices to show that it can be rational to violate conditionalization.

Consider again the situation in which you are sure there is a superior being who will give you eternal bliss, if and only if you are certain that pigs can fly; and there is a drug available that will make you certain of this. Let d be the act of taking the drug, and q the probability function you would have after taking the drug. Then we can plausibly suppose that p(·|Rq) = p(·|d), and hence that in taking the drug you learn d, and nothing else. Consequently, conditionalization requires that your probability function after taking the drug be p(·|d), which it will not be. (With F denoting that pigs can fly, p(F|d) = p(F) ≈ 0, while q(F) = 1.) Hence taking the drug implements a violation of conditionalization. Nevertheless, it is rational to take the drug in this case, and hence to violate conditionalization.

Similarly for the other example of Section 5.1.6. Here you think there is a 90 percent chance that Persi knows how the coin will land, but you know that after talking to him, you would become certain that what he told you was true. We can suppose that in talking to Persi, you think you will learn what he said, and nothing else. Then an analysis just like that given for the preceding example shows that talking to Persi implements a violation of conditionalization. Nevertheless, it is rational to talk to Persi, because (as we saw) this maximizes your expected utility.

It is true that in both these examples, there are what we might call "extraneous" factors that are responsible for the rationality of violating conditionalization. In the first example, the violation is the only available way to attain eternal bliss; and in the second example, it is the only way to acquire some useful information. Can we show that, putting aside such considerations, it is irrational to violate conditionalization? Yes, we have already proved that. For we saw that when other things are equal (in a sense made precise in Section 5.1.6), expected utility can always be maximized without implementing a violation of Reflection. As an immediate corollary, we have that when other things are equal, expected utility can always be maximized without violating conditionalization.14

14 Brown (1976) gives a direct proof of a less general version of this result. What makes his result less general is that it applies only to cases where for each E you might learn, there is a probability function q such that you are sure q would be your probability function if you learned E, and nothing else. This means that Brown's result is not applicable to the coin-tossing example of Section 5.1.2, for example. (In this example, your posterior probability, on learning that you took the drug, could give probability 1 to either heads or tails.) Another difference between Brown's proof and mine is that his does not apply to probability kinematics (cf. Section 5.3).

To summarize: The principle of conditionalization is a special case of the principle that says not to implement shifts that violate Reflection. Like that more general principle, it is not a universal requirement of rationality; but it is a rationally acceptable principle in contexts where other things are equal, in the sense made precise in Section 5.1.6.

5.2.2 Other arguments for conditionalization

Lewis's Dutch book argument is not the only argument that has been advanced to show that conditionalization is a requirement of rationality. What I have said in the preceding section implies that these other arguments must also be incorrect. I will show that this is so for arguments offered by Teller, and by Howson.

After presenting Lewis's Dutch book argument, Teller (1973, 1976) proceeds to offer an argument of his own for conditionalization. The central assumption of this argument is that if you learn E and nothing else, then for all propositions A and B that entail E, if p(A) = p(B), then it ought to be the case that q(A) = q(B). (Here, as before, p and q are your prior and posterior probability functions, respectively.) Given this assumption, Teller is able to derive the principle of conditionalization. But the counterexamples that I have given to conditionalization are also counterexamples to Teller's assumption. To see this, consider the first counterexample, in which taking a drug will make you certain pigs can fly, and this will give you eternal bliss. Let F and d be as before, and let G denote that the moon is made of green cheese. We can suppose that in this example, p(Fd) = p(Gd), and q(Fd) = q(F) > q(G) = q(Gd). Assuming that d is all you learn from taking the drug, we have a violation of Teller's principle. But the shift from p to q involves no failure of rationality. You do not want q(F) to stay small, or else you will forgo eternal bliss; nor is there any reason to become certain of G, and preserve Teller's principle that way. Thus Teller's principle is not a universal requirement of rationality, and hence his argument fails to show that conditionalization is such a requirement. (My second counterexample to conditionalization could be used to give a parallel argument for this conclusion.)

Perhaps Teller did not intend his principle to apply to the sorts of cases considered in my counterexamples. If so, there may be no dispute between us, since I have agreed that conditionalization is rational when other things are equal. But then I would say that Teller's defense of conditionalization is incomplete, because he gives no method for distinguishing the circumstances in which his principle applies. By contrast, the decision-theoretic approach I have used makes it a straightforward matter of calculation to determine under what circumstances rationality requires conditionalization.

I turn now to Howson's argument for conditionalization (Howson and Urbach 1989, pp. 67f.). Howson interprets p(H) as the betting quotient on H that you now regard as fair, p(H|E) as the betting quotient that you now think would be fair were you to learn E (and nothing else), and q(H) as the betting quotient that you will in fact regard as fair after learning E (and nothing else). His argument is the following. (I have changed the notation.)

p(H|E) is, as far as you are concerned, just what the fair betting quotient would be on H were E to be accepted as true. Hence from the knowledge that E is true you should infer (and it is an inference endorsed by the standard analyses of subjunctive conditionals) that the fair betting quotient on H is equal to p(H|E). But the fair betting quotient on H after E is known is by definition q(H).

I would not endorse Howson's conception of conditional probability. But even granting Howson this conception, his argument is fallacious. Howson's argument rests on an assumption of the following form: People who accept "If A then B" are obliged by logic to accept B if they learn A. But this is a mistake; on learning A you might well decide to abandon the conditional "If A then B," thereby preserving logical consistency in a different way.


In the case at hand, Howson's conception of conditional probability says that you accept the conditional "If I were to learn E and nothing else, then the fair betting quotient for H would be p(H|E)." Howson wants to conclude from this that if you do learn E and nothing else, then logic obliges you to accept that the fair betting quotient for H is p(H|E). But as we have seen, this does not follow; for you may reject the conditional. In fact, if you adopt a posterior probability function q, then your conditional probability for H becomes q(H|E) = q(H); and according to Howson, this means you now accept the conditional "If I were to learn E and nothing else, then the fair betting quotient for H would be q(H)." In cases where conditionalization is violated, q(H) ≠ p(H|E), and so the conditional you now accept differs from the one you accepted before learning E.

Thus neither Teller's argument nor Howson's refutes my claim that it is sometimes rational to violate conditionalization. And neither is a substitute for my argument that, when other things are equal, rationality never requires violating conditionalization.

5.2.3 Van Fraassen on conditionalization

In a recent article, van Fraassen (in press) argues that conditionalization is not a requirement of rationality. From the perspective of this chapter, that looks at first sight to be a paradoxical position for him to take. I have argued that conditionalization is a special case of the principle not to implement shifts that violate Reflection. If this is accepted, then van Fraassen's claim that Reflection is a requirement of rationality implies that conditionalization is also a requirement of rationality.

I think the contradiction here is merely apparent. Van Fraassen's idea of how you could rationally violate conditionalization is that you might think that when you get some evidence and deliberate about it, you could have some unpredictable insight that will cause your posterior probability to differ from your prior conditioned on the evidence. Now I would say that if you satisfy Reflection, your unpredictable insight will be part of what you learned from this experience, and there is no violation of conditionalization. But there is a violation of what we could call

Evidence-conditionalization. If your current probability function is p, and if q is the probability function you would have if you acquired evidence E and no other evidence, then q(·) should be identical to p(·|E).

This principle differs from conditionalization as I defined it, in having E be the total evidence acquired, rather than the totality of what was learned. These are different things because, as argued in Section 5.1.5, not all learning involves getting evidence. Where ambiguity might otherwise arise, we could call conditionalization as I defined it learning-conditionalization.

These two senses of conditionalization are not usually distinguished in discussions of Bayesian learning theory, presumably because those discussions tend to focus on situations in which it is assumed that the only learning that will occur is due to acquisition of evidence. But once we consider the possibility of learning without acquisition of evidence, evidence-conditionalization becomes a very implausible principle. For example, suppose you were to think about the value of √289, and that as a result you substantially increase your probability that it is 17. We can suppose that you acquired no evidence over this time, in which case evidence-conditionalization would require your probability function to remain unchanged. Hence if evidence-conditionalization were a correct principle, you would have been irrational to engage in this ratiocination. This is a plainly false conclusion. (On the other hand, there need be no violation of learning-conditionalization; you may think you learned that √289 is 17.)

So van Fraassen is right to reject evidence-conditionalization, and doing so is not inconsistent with his endorsement of Reflection. But that endorsement of Reflection does commit him to learning-conditionalization; and I have urged that this principle should also be rejected.

5.2.4 The rationality of arbitrary shifts

Speaking of the theory of subjective probability, Henry Kyburg writes:


But the really serious problem is that there is nothing in the theory that says that a person should change his beliefs in response to evidence in accordance with Bayes' theorem. On the contrary, the whole thrust of the subjectivist theory is to claim that the history of the individual's beliefs is irrelevant to their rationality: all that counts at a given time is that they conform to the requirements of coherence. It is certainly not required that the person got to the state he is in by applying Bayes' theorem to the coherent degrees of belief he had in some previous state. No more, then, is it required that a rational individual pass from his present coherent state to a new coherent state by conditionalization.... For all the subjectivist theory has to say, he may with equal justification pass from one coherent state to another by free association, reading tea-leaves, or consulting his parrot. (Kyburg 1978, pp. 176-7)

The standard Bayesian response to this objection is to claim that conditionalization has been shown to be a requirement of rationality, for example, by the diachronic Dutch book argument (Skyrms 1990b, ch. 5). But I have shown that the arguments for conditionalization are fallacious and that the principle is not a general requirement of rationality. Nevertheless, Kyburg's objection is still mistaken.

If you think there is something wrong with revising your probabilities by free association, reading tea-leaves, or consulting your parrot, then presumably shifts in probability induced by these means do not satisfy Reflection for you. If that is so, then the theorem of Section 5.1.6 shows that if these shifts would make any difference at all to your expected utility, then implementing them would not maximize expected utility, other things being equal. Thus under fairly weak conditions, Bayesian theory does imply that it is irrational for you to revise your beliefs by free association, and so forth.

5.3 PROBABILITY KINEMATICS

It is possible for the shift from p to q to satisfy Reflection without it being the case that there is a proposition E such that q(·) = p(·|E). When this happens, you think you have learned something, but there is no proposition E that expresses what you learned. The principle of conditionalization is then not applicable.


Jeffrey (1965, ch. 11) proposed a generalization of conditionalization, called probability kinematics, that applies in such cases. Jeffrey supposed that what was learned can be represented as a shift in the probability of the elements of some partition {Ei}. The rule of probability kinematics then specifies that the posterior probability function q be related to the prior probability p by the condition

q(·) = Σi q(Ei) p(·|Ei).
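As a concrete sketch (toy numbers of my own choosing), the rule reweights the cells of the partition while leaving probabilities conditional on each cell untouched:

    # Prior over states; a two-cell partition with new cell probabilities q(Ei).
    p = {"s1": 0.1, "s2": 0.3, "s3": 0.2, "s4": 0.4}
    partition = {"E1": {"s1", "s2"}, "E2": {"s3", "s4"}}
    q_cells = {"E1": 0.7, "E2": 0.3}

    def kinematics(p, partition, q_cells):
        q = {}
        for name, cell in partition.items():
            p_cell = sum(p[s] for s in cell)
            for s in cell:
                q[s] = q_cells[name] * p[s] / p_cell   # q(s) = q(Ei) p(s|Ei)
        return q

    q = kinematics(p, partition, q_cells)
    # Conditional on each cell nothing changes: q(s1)/q(s2) equals p(s1)/p(s2).
    assert abs(q["s1"] / q["s2"] - p["s1"] / p["s2"]) < 1e-12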

Armendt (1980) has given a Dutch book argument to show that the rule of probability kinematics is a requirement of rationality. But this argument has the same fallacy as the Dutch book arguments for Reflection and conditionalization. Furthermore, my account of the true status of conditionalization also extends immediately to probability kinematics.

A natural interpretation of what it means for you to think that what you learned is represented by a shift in the probability of the Ei would be that the shift is to q, and

p(·|Rq) = Σi q(Ei) p(·|Ei).

But then it follows that the requirement to update your beliefs by probability kinematics is equivalent to the requirement not to implement any shifts that violate Reflection. Hence updating by probability kinematics is not in general a requirement of rationality, though it is a rational principle when other things are equal, in the sense of Section 5.1.6.

5.4 CONCLUSION

If diachronic Dutch book arguments were sound, then Reflection, conditionalization, and probability kinematics would all be requirements of rationality. But these arguments are fallacious, and in fact none of these three principles is a general requirement of rationality. Nevertheless, there is some truth to the idea that these three principles are requirements of rationality. Bayesian decision theory entails that when other things are equal, rationality never requires implementing a shift in probability that violates Reflection. Conditionalization and probability kinematics are special cases of the principle not to implement shifts that violate Reflection. Hence we also have that when other things are equal, it is always rationally permissible, and may be obligatory, to conform to conditionalization and probability kinematics.


6

The concept of acceptance

6.1 DEFINITION

In everyday life, and also in science, opinions are often expressed by making categorical assertions, rather than by citing personal probabilities. The categorical assertion of H, when sincere and intentional, expresses a mental state of the assertor that I refer to as acceptance of H. This chapter will examine the concept of acceptance just defined; the following chapter will argue that it is an important concept for the philosophy of science.

What I am here calling acceptance is commonly called belief. Consider, for example, G. E. Moore's (1942, p. 543) observation that it is paradoxical to say 'H, but I do not believe that H.' The paradox is explained by the fact that sincere and intentional assertion of H is taken to be a sufficient condition for believing H.

It is, then, part of the folk concept of belief that it is a mental state expressed by sincere, intentional assertions. However, I think the folk concept involves other aspects too, and I am going to argue that the various aspects of the folk concept of belief do not all refer to the same thing. That is why I am calling the concept I have defined 'acceptance', rather than 'belief'.

My definition of acceptance assumes that you understand what a sincere assertion is. If that is not so, we are in trouble. For I cannot define sincerity except by saying that an assertion is sincere iff it was intended to express something the assertor accepts; and you won't understand this definition unless you already understand acceptance. But I think that everyone knows what counts as evidence for and against the sincerity of an assertion, and this shows that everyone has the sort of understanding of sincerity that we require here. (Note that my aim here is not to reduce the concept of acceptance to other concepts, but merely to identify what I mean by 'acceptance'. Thus the fact that the notion of sincerity cannot be explained without invoking the notion of acceptance does not mean the definition of acceptance cannot achieve its purpose.)

I say that acceptance is expressed by assertions that are both sincere and intentional. What I mean by an intentional assertion is that it asserts what the person intended to assert; thus slips of the tongue are not intentional assertions. Unintentional assertions may very well be sincere; sincerity requires only the intention to assert something that is accepted and not that this intention is successful. Thus you may sincerely utter something you do not accept; however, you cannot make a sincere and intentional assertion of something you do not accept, as acceptance is here understood.

There is no infallible test for whether or not a person accepts H. For one thing, there is no infallible test of whether an assertion is sincere and intentional. And even if we had an infallible test of sincerity and intentionality, that test would not determine whether a person who does not assert H accepts H. But verificationism is now in disrepute, so I suppose I do not have to argue that a concept can be legitimate, even though there is no infallible test for its application. What is more important is that we be able to make reasonable inductive inferences about what a person accepts; this we can do, as the following examples will illustrate.

Categorical assertions predominate in science textbooks. Because these assertions appear to be sincere, and because they usually do not appear to be slips, we can say that the authors of science texts accept the assertions that occur in these books. Presumably their colleagues would also sincerely endorse most of these assertions, for example in teaching; and so they too accept most of what is in textbooks of their subject.1

1 Some texts present an outmoded theory (e.g., classical mechanics), because it is easier for students than the theory that is actually accepted (e.g., relativity theory). Often this is done by describing the theory, without asserting it. Where the outmoded theory is asserted, the assertion must be deemed insincere, even if pedagogically sound.

In their research articles, scientists often mix categorical assertions with more guarded statements. Here is an example, taken almost at random, from an article in a recent issue of Nature (15 June 1989 issue).

We found that this enzyme RNA catalyses the site-specific attack of guanosine on the isolated P1 stem, but that the Km for free P1 was very high (> 0.1 mM). This weak interaction probably reflects the fact that there are few sequence or size requirements for the recognition of P1 by the core intron.

Here is another example, from the same issue of Nature.

In conclusion, Greenland ice cores reveal abrupt and radical changes in the North Atlantic region during the Younger Dryas-Pre-Boreal transition, including decreased storminess, a 50% increase in the precipitation rate, a 7°C warming, and probably a temporary decrease of the evaporation temperature in the source area of moisture ultimately precipitated as snow at high elevation in the arctic.

In each of these quotations, categorical assertions are followed by statements that do not make a categorical assertion but rather express a judgment of probability. From these statements we can reasonably infer that the authors accept the assertions they have made categorically, but we cannot conclude that they accept the hypotheses they merely say to be probable. When scientists refrain from categorically asserting a hypothesis in a publication, this does not necessarily mean that they do not accept the hypothesis. They may accept the hypothesis themselves, but refrain from categorically asserting it in a publication because the evidence is not yet enough to convince the scientific community at large. In such a case, the scientists may say in private that they believe the hypothesis, and this would be strong evidence that they accept it. The point is that if there is any context in which a person would assert a hypothesis, then we can reasonably infer that the person accepts that hypothesis (provided it is reasonable to think that the person is sincere and not making a mistake in that context, and that the change of context would not change what the person accepts).

Acceptance of H is the state expressed by sincere intentional assertion of H. But what sort of thing is H here? Although assertions are made by means of sentences, I do not intend that H should be understood to be a sentence. I would say that what a German speaker asserts by saying 'Der Schnee ist weiss' is the same as what an English speaker asserts by saying 'Snow is white', though the sentences are different. I would also say that the state of acceptance these assertions express (if sincere and intentional) is the same. Conversely, a sentence like 'This is an electron' may be used to make quite different assertions on different occasions (possibly assertions with different truth values), even though the sentence is the same; here the state of acceptance that the assertions express (if sincere and intentional) is also different.

I therefore take H to be a proposition, rather than a sentence. Propositions are here understood as sets of states (they are also referred to as events). Thus we identify what a person has asserted on a given occasion with the set of states that are consistent with the assertion. For example, an utterance of 'Snow is white' (or 'Der Schnee ist weiss') asserts that the true state is a member of the class of all states in which snow is white. And an utterance of 'This is an electron' asserts that the true state is a member of that class of states in which the thing denoted by 'this' is an electron.
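
As an informal illustration of this set-of-states picture, here is a minimal Python sketch (the state space and the proposition names are my own invented examples, not anything from the text):

    # A toy state space: each state settles the color of snow and of grass.
    # Propositions are modeled as sets of states.
    from itertools import product

    STATES = set(product(["snow-white", "snow-gray"],
                         ["grass-green", "grass-brown"]))

    # The proposition asserted by 'Snow is white' (or 'Der Schnee ist
    # weiss'): the set of all states in which snow is white.
    snow_is_white = {s for s in STATES if s[0] == "snow-white"}

    # Logical operations come for free from set operations.
    negation = STATES - snow_is_white
    grass_is_green = {s for s in STATES if s[1] == "grass-green"}
    conjunction = snow_is_white & grass_is_green

    # A proposition is true in a state just in case the state belongs to it.
    print(("snow-white", "grass-green") in snow_is_white)  # True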

6.2 ACCEPTANCE AND PROBABILITY

6.2.1 Probability 1 not necessary

What is the relation between acceptance and probability? One suggestion would be to identify acceptance of a hypothesis with assignment of probability 1 to that hypothesis. But this view is untenable. For to give hypothesis H probability 1 is to be willing to bet on it at any odds; for example, a person who gave H probability 1 would be willing to accept a bet in which the person wins a penny if H is true, and dies a horrible death if H is false. I think it is clear that scientists are not usually this confident of the hypotheses they sincerely categorically assert, and thus that probability 1 is not a necessary condition for acceptance.2

2I argued that this is so in (Maher 1986b) and (1990c, n. 13). Because I think few readers will need convincing, I do not repeat those arguments here.


6.2.2 High probability not sufficient

Having discarded the idea that acceptance can be identified with probability 1, we might try identifying it with "high" probability, that is, probability greater than (or not less than) some value r < 1. But this also would be a mistake. For example, consider a lotto game in which six numbers between 1 and 50 are drawn without replacement. If you purchase a ticket in such a game, your chance of winning is less than one in ten million. Now suppose you are thinking of purchasing such a ticket; you have selected six numbers, and you ask me whether I think this combination will win. I certainly would not say it will win, but I also would not say it won't win. I would tell you the odds against it winning are enormous, but no greater than for any other number. Nor would I be being less than forthcoming here; I simply would not accept the proposition that your numbers will not win, even though I give this proposition an enormously high probability. Hence even an extremely high probability is not sufficient for acceptance.
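
For concreteness, the arithmetic behind the stated chance (my calculation, not the author's): the number of six-number combinations is

$$\binom{50}{6} = \frac{50!}{6!\,44!} = 15{,}890{,}700,$$

so the probability that a given ticket wins is $1/15{,}890{,}700 \approx 6.3 \times 10^{-8}$, indeed less than one in ten million.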

The claim of the preceding paragraph is that it is logically possible to give a proposition an extremely high probability, yet not accept it. Even if this is conceded, it might still be claimed that it is irrational not to accept a proposition with an extremely high probability. But I think this claim should also be rejected. I do not think the stance I adopted in the preceding paragraph is irrational. Furthermore, the claim conflicts with what I take to be a more compelling principle of rationality.

If I were to accept everything that I think extremely probable, then for every set of six lotto numbers, I would accept that this set would not win. However, I also accept that some set of six lotto numbers would win. Thus I accept an inconsistent set of propositions. Hence if rationality required me to accept all the propositions I regard as highly probable, it would require me to accept an inconsistent set of propositions. But we naturally suppose that rationality requires consistency.

That natural supposition needs some qualification to be defensible. It may be that there is some nonobvious inconsistency in the propositions I accept, but the only way for me to remove it would be to spend years investigating the logical relations among the things I accept, or else to give up much of what I accept. Either way, the cost of achieving consistency seems more than it is worth, and so it is rational for me to continue with the inconsistency.

However, the lotto example I was discussing is not like this. Here I have a clearly identified inconsistency, which can easily be removed. In such a situation, I think rationality requires consistency. Our everyday practice suggests that this endorsement of consistency is widely shared. Show people that various things they have said are inconsistent, and they will feel obliged to retract one of those propositions. But if rationality requires consistency, even if only in cases where inconsistency is easily avoidable, then rationality cannot require accepting all propositions with high probability.3

6.2.3 Probability 1 not sufficient

I have argued that no probability short of 1 is sufficient for acceptance, while a probability of 1 is not necessary for acceptance. Together, these results show that acceptance cannot be identified with any level of probability.

Still, it may seem that if a proposition is given a probability of 1, then it must be accepted. But this also is false. Suppose Professor Milli's probabilities concerning the charge of the electron, e, are as depicted in Figure 6.1. Here f(e) denotes Milli's "probability density" for e. What this means is that for any numbers a and b, Milli's probability that e lies between a and b is the area under the curve f(e) from a to b. As we can see from the figure, for any distinct values a and b, Milli gives a positive probability to e being between a and b. However, for any number a, Milli's probability that e = a is zero. (The "area" under the curve f(e) from a to a has zero width, and hence is zero.) Since Milli's probabilities satisfy the axioms of probability, it follows that for every number a, Milli gives probability 1 to the proposition that e ≠ a. So if probability 1 were sufficient for acceptance, then for each number a, Milli must accept that e ≠ a.

3The clash of principles here is the core of the "lottery paradox", which seems to have been introduced to the literature by Kyburg (1961, p. 197). Kyburg took the opposite stance to the one I am taking; he upheld the principle that high probability is sufficient for acceptance, and so rejected consistency. The difference between him and me may derive from the fact that his conception of acceptance differs from mine, being tied to practical action. For a critique of Jonathan Cohen's (1977) attempt to resolve the paradox with "inductive" probabilities, see Maher (1986b, p. 375f.).

Figure 6.1: Milli's probability distribution for e

But suppose Milli says: "My conclusion is that the charge on the electron is 4.774 ± .009 × 10^{-10} electrostatic units." Pressed further, he refuses to rule out any value of e in this interval. Then absent any indications that Milli is being less than forthcoming, I think we should allow that for values of a in the stated interval, Milli does not accept that e ≠ a. And this need not undermine our attribution to Milli of the probability density function f, that gives probability 1 to all propositions of the form e ≠ a. Thus probability 1 is not sufficient for acceptance.

If you have gone this far with me, you concede that it is logically possible for a person to give a proposition probability 1, yet not accept it. This still leaves open the possibility that probability 1 rationally obliges one to accept a proposition, even if it does not logically force it. But I think there is also no such rational obligation. Indeed, the idea that there is such an obligation, like the parallel idea considered in the preceding section, conflicts with the principle of consistency.

The conflict is evident in Milli's case. If Milli were to accept every proposition to which he gives probability 1, then for all a, he would accept that e ≠ a. Yet Milli is also sure that e has some value or other, and hence must accept this too. Thus if probability 1 is sufficient for acceptance to be rationally required, Milli is rationally obliged to accept an inconsistent set of propositions. Yet there is nothing pathological about Milli's probability function; probability functions like this are called "normal" in statistics. The problem lies rather in the idea that probability 1 is sufficient for rational acceptance.

6.2.4 High probability not necessary

For all that I have said so far, it could be that high probability is necessary for acceptance. Specifically, it may be thought that a proposition cannot be accepted without giving it a probability of at least 1/2. While I would agree that accepted propositions commonly do have a probability greater than 1/2, I do not think that this must always be the case.

It has often been noted that, in view of the regularity with which past scientific theories have been overthrown, we can reasonably infer that current scientific theories will also be overthrown. Thus anyone reflecting on the history of science ought to give a low probability (less than 1/2) to any given significant current theory being literally correct. Yet scientists continue to sincerely assert significant scientific theories. If high probability were necessary for acceptance, we would have to say that either

(a) These scientists have not drawn the appropriate lesson from the history of science, and accord their theories an unreasonably high probability; or

(b) Contrary to appearances, these scientists do not actually accept their theories.

While (a) might be true in many cases, we could have good evidence that it is false for some scientists. Suppose Einstein were offered his choice of

(1) World peace if general relativity is completely correct; or
(2) World peace if general relativity is false in some way.

I think Einstein probably would have chosen (2). In any case, let us suppose that he makes this choice. Then it would be reasonable to conclude (using the preference interpretation of probability) that his probability for general relativity is less than 1/2.

If high probability is necessary for acceptance, we then would have to say that Einstein did not accept general relativity. But suppose he does categorically assert the theory, at the same time that he is choosing (2) over (1); this is certainly possible. It is also possible that we could satisfy ourselves that no slip of the tongue occurred; his assertion was intentional. Then someone who holds that high probability is necessary for acceptance must say that Einstein's assertion is not sincere. "Actions speak louder than words," it will be said.

But suppose Einstein defends himself against this charge of insincerity. "General relativity is simple in conception, follows from attractive assumptions, and its empirical predictions have been successful. Thus although the theory is probably incorrect in some way, I am confident that it will live on as a limiting case of a future theory. And in the meantime, it is the only theory that fits the evidence. Thus for the time being I accept that the world is as the theory says, though I realize that the theory is likely to be corrected in the future." With such an explanation, and absent any other reasons to question sincerity, I think we should accept that Einstein is sincere in asserting general relativity, even though giving it a low probability of being completely correct. If someone thinks otherwise, their conception of sincerity is different from mine.

There may be a temptation to say that what Einstein really accepts is not general relativity itself, but rather the proposition that general relativity is approximately correct; that it will "live on as a limiting case of a future theory." But I am supposing that, as scientists usually do, Einstein is categorically asserting the theory itself, not merely that the theory is approximately correct. That being so, what he accepts, on my definition of acceptance, is the theory itself, not merely the claim that the theory is approximately correct.

The example I have been using is largely fictitious, though I hope it is not too implausible a fiction. In any case, the mere possibility of the story is enough to establish that it is possible to accept a proposition without giving it a probability as high as 1/2.

But even if acceptance is possible in this case, can it be rational? I submit that the example at hand supports an affirmative answer. It is certainly rationally permissible (if not obligatory) to give major scientific theories a low probability of being literally correct. But we also pretheoretically suppose that it is rational to accept our best current scientific theories. If these things seem in conflict, I would suggest the cause is an inadequate theory of rational acceptance. In the next section, I will describe a theory that lets us say all the things we want to say here.

We don't need to go to grand scientific theories to find examples in which acceptance of propositions, while giving them probability less than 1/2, can be rational. I think that for almost everyone, there are ten propositions that the person accepts, but to whose conjunction the person would give probability less than 1/2. The propositions can be mundane things like 'My desk is made of oak', 'My wife is in the living room', 'Tom is a psychology major', and so on. In such cases, giving the conjunction a probability less than 1/2 seems reasonable; this follows from the assumption that we are not much more than 90 percent confident of each, and their probabilities are independent. Accepting the individual propositions is also pretheoretically reasonable. So someone who maintains that a probability greater than 1/2 is necessary for rational acceptance must say that rationality does not require people to accept the logical consequences of what they accept. But this logical principle is one on which we rely every day, so abandoning it is a high price to pay.
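
The arithmetic behind this claim (my illustration): if each of the ten propositions has probability .9 and they are independent, then

$$P(H_1 \wedge \cdots \wedge H_{10}) = 0.9^{10} \approx 0.35 < \tfrac{1}{2}.$$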

6.3 RATIONAL ACCEPTANCE

6.3.1 Theory

I have argued that high probability is neither necessary nor sufficient for rational acceptance of a hypothesis. If this is right, rational acceptance must depend on something other than probability.

To see what this something else is, consider the conclusion Cavendish drew from an experiment he conducted in 1773. The experiment was to determine how the electrostatic force between charged particles varies with the distance between the particles. Cavendish states his conclusion this way:

We may therefore conclude that the electric attraction and repulsion must be inversely as some power of the distance between that of the 2 + 1/50th and that of the 2 − 1/50th, and there is no reason to think that it differs at all from the inverse duplicate ratio. (Cavendish 1879, pp. 111-2)

This statement indicates that Cavendish accepted

Hc: The electrostatic force falls off as the nth power of the distance, for some n between 1.98 and 2.02.

Why wouldn't Cavendish have accepted only a weaker conclusion, for example by broadening the range of possible values of n, as in

H'c: The electrostatic force falls off as the nth power of the distance, for some n between 1.9 and 2.1.

Or he could have made his conclusion conditional, as in

H''c: If the electrostatic force falls off as the nth power of the distance, for some n, then n is between 1.98 and 2.02.

Both H'c and H''c are more probable than the conclusion that Cavendish actually drew, as are infinitely many other weaker versions of Cavendish's hypothesis. The obvious suggestion is that although these weaker hypotheses are more probable than Hc, they are also considerably less informative, and that is why Cavendish did not limit himself to these weaker hypotheses. But if informativeness is what is wanted, why not accept a stronger hypothesis, a natural one here being

H'''c: The electrostatic force falls off as the second power of the distance.

And again there is an obvious answer: Although H'''c is more informative than Hc, Cavendish felt that it was not sufficiently probable to accept.

These considerations suggest that acceptance involves a trade-off of two competing considerations: the concern to be right (which would lead one to accept hypotheses of high probability), and the desire for informative hypotheses (which tends to favor hypotheses of low probability). Thus a theory of acceptance needs to take into account the scientist's goals or values, and specifically the relative weights put on the goals of truth and informativeness.



Figure 6.2: A simple conception of the utility of accepting hypotheses

Since the Bayesian way to represent goals is with a utility function, a Bayesian theory of acceptance thus requires there to be a utility function representing the weights that the scientist puts on the competing goals of truth and informativeness.

What would such a utility function be like? First we need to identify the consequences that are the domain of the utility function. A simple suggestion (Hempel 1960, 1962; Levi 1967) is that acceptance of a hypothesis H has two possible consequences, which we could describe as "accepting H when it is true," and "accepting H when it is false." The utility function would assign utilities to consequences such as these, and the goal of truth would be represented by giving a higher utility to the former consequence than to the latter. Then the utility of accepting H would be as in Figure 6.2.

If H' is a logically stronger hypothesis than H, then the goal of accepting informative true hypotheses would be represented by giving higher utility to the consequence "accepting H' when it is true" than to "accepting H when it is true." This is also diagrammed in Figure 6.2. (In this figure, the utility of accepting a false hypothesis is represented as being higher for the less informative hypothesis; but I do not insist that this must be the case.)

Figure 6.2 assumes that the utility of accepting a given hypothesis depends only on the truth value of the hypothesis, and thus has only two possible values. But it has often been said that false theories can be more or less close to the truth; or as Popper puts it, some false theories have greater verisimilitude than others. And it is held that the utility of accepting a false theory depends on this distance from the truth, being higher the closer the hypothesis is to the truth. On this view, not only will the utility of accepting a hypothesis depend on whether the hypothesis is true or false, but also, in case the hypothesis is false, it will depend on how far from the truth the hypothesis is. So a plot of the utility of acceptance might look like Figure 6.3.

Figure 6.3: Utility of acceptance depending on verisimilitude

Up to this point I have been discussing the acceptance of a single hypothesis. But scientists accept many hypotheses, and the utility of accepting two hypotheses will not in general be the sum of the utilities of accepting each. However, we can reduce the general case to the case of accepting a single proposition by focusing on the total corpus of accepted propositions. For this corpus can be represented by a single proposition K, namely the conjunction of all propositions that the person accepts. The acceptance of a new proposition H, without any change in those previously accepted, can then be represented as a replacement of the total corpus K by the logically stronger corpus consisting of the conjunction of K and H. The abandonment of a previously accepted proposition, and the replacement of one accepted proposition by another, can also be represented as shifts from one corpus to another. From this holistic perspective, a scientist's goals can be represented by assigning utilities to the possible consequences of accepting K as corpus, for each possible corpus K. I will refer to a function of this sort as a cognitive utility function, since it assigns utilities to cognitive consequences.

Chapter 8 will show how a cognitive utility function can be defined, consistently with the preference interpretation of probability and utility. In the meantime, I will anticipate that result and assume that a cognitive utility function has been defined. We can then define the expected cognitive utility of accepting a corpus in the usual way, as the probability-weighted sum of the utilities of the possible consequences. So if u(K, x) denotes the cognitive utility of accepting K as corpus in state x, and if the set of all possible states is X, then the expected cognitive utility of accepting K as corpus is $\sum_{x \in X} p(x)\,u(K, x)$, assuming X is countable. We can then say that acceptance of corpus K is rational just in case the expected cognitive utility of accepting this corpus is at least as great as that of accepting any other available corpus.
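
As a minimal sketch of this definition in Python (the state space, probability function, and utility values below are invented purely for illustration; they are not the author's):

    # Expected cognitive utility of accepting a corpus K, for a
    # countable state space X: sum over x in X of p(x) * u(K, x).

    X = ["x1", "x2", "x3"]                   # a toy (countable) state space
    p = {"x1": 0.5, "x2": 0.3, "x3": 0.2}    # a toy probability function

    def u(K, x, k=0.5):
        # A toy cognitive utility for accepting corpus K (a set of states)
        # in state x: reward content, penalize accepting a falsehood.
        content = 1 - len(K) / len(X)
        error = 0 if x in K else 1
        return k * content - error

    def expected_cognitive_utility(K):
        return sum(p[x] * u(K, x) for x in X)

    # Acceptance of K is rational iff no available corpus does better.
    candidates = [{"x1"}, {"x1", "x2"}, {"x1", "x2", "x3"}]
    best = max(candidates, key=expected_cognitive_utility)
    print(best, expected_cognitive_utility(best))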

6.3.2 Example

To illustrate this theory of rational acceptance, consider the problem of estimating the true value of some real-valued parameter. For example, the problem might be to estimate the true value of the charge on the electron. The problem can be phrased, in our terminology, as: For what set A of real numbers should I accept (as corpus) that the true value of the parameter is in A? Since the hypotheses in which we are interested differ only in the set A, it will be convenient to identify the set with the corresponding proposition.

Let us define the content of (the proposition that the true value is in) A as4

$$c(A) = \begin{cases} \dfrac{1}{1 + \sup(A) - \inf(A)} & \text{if } A \neq \emptyset \\ 1 & \text{if } A = \emptyset. \end{cases}$$

So, for example, the content of a set containing just a single point is 1, and it declines as the set A is enlarged, reaching 0 when A is the whole real line (−∞, ∞). Also, let A's distance from the truth, when the true value of the parameter is r, be defined as

$$d_r(A) = \begin{cases} \dfrac{\inf_{x \in A} |x - r|}{1 + \inf_{x \in A} |x - r|} & \text{if } A \neq \emptyset \\ 1 & \text{if } A = \emptyset. \end{cases}$$

4Here sup(A) denotes the least upper bound, or supremum, of A; and inf(A) denotes the greatest lower bound, or infimum, of A. If A is the interval (a, b) or [a, b], then sup(A) = b and inf(A) = a.



Figure 6.4: The standard normal distribution

If A is true (i.e., r ∈ A), then d_r(A) = 0. As the distance between r and the closest point in A increases, d_r(A) increases, approaching 1 in the limit.

Now for any real number k, define

$$u_k(A, r) = k\,c(A) - d_r(A).$$

For each k, u_k represents a possible measure of the cognitive utility of accepting A when the true state is r; and k represents the relative weight this utility function puts on the desideratum of informativeness, as compared with the competing desideratum of avoidance of error. I believe that there are other possible cognitive utility functions besides the u_k; I define these functions here merely to give an example.
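
These definitions translate directly into code. The following Python sketch (mine, restricted to interval and empty candidates A) computes c, d_r, and u_k:

    # A direct transcription of the definitions of c, d_r, and u_k,
    # restricted to interval candidates; None stands for the empty set.

    def content(A):
        # c(A) = 1/(1 + sup(A) - inf(A)); c(empty set) = 1
        if A is None:
            return 1.0
        lo, hi = A
        return 1.0 / (1.0 + (hi - lo))

    def distance(A, r):
        # d_r(A) = delta/(1 + delta), with delta = inf over x in A of
        # |x - r|; d_r(empty set) = 1
        if A is None:
            return 1.0
        lo, hi = A
        delta = 0.0 if lo <= r <= hi else min(abs(lo - r), abs(hi - r))
        return delta / (1.0 + delta)

    def u(k, A, r):
        # u_k(A, r) = k * c(A) - d_r(A)
        return k * content(A) - distance(A, r)

    # Accepting the singleton {0} (the degenerate interval [0, 0])
    # versus the interval [-1.95, 1.95], when the truth is r = 0.5:
    print(u(0.37, (0.0, 0.0), 0.5))      # 0.37 - 1/3, about 0.037
    print(u(0.37, (-1.95, 1.95), 0.5))   # 0.37/4.9 - 0, about 0.076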

To complete the stipulations of this example, suppose that some scientist's probability distribution for the parameter is the standard normal distribution; this means that for any set A, the probability that the true value r is in A equals the area above the set A under the curve f(x) in Figure 6.4.5 Milli might have this distribution if r is a multiple of e − 4.774 × 10^{-10}.

If cognitive utility is measured by u_k, for some k < 1, then expected cognitive utility can always be maximized by accepting as corpus a closed interval of the form [−a, a], for some a.

5For those who want the formula, it is

$$p(A) = \frac{1}{\sqrt{2\pi}} \int_A e^{-x^2/2}\, dx.$$

The set A must be measurable.


Figure 6.5: $\mathcal{E}(u_{.37}[-a, a])$ plotted as a function of a

Proof. For any A ≠ ∅, the interval [inf(A), sup(A)] has the same content as A, and cannot have a larger error. Also,

$$u_k(\emptyset, r) = k - 1 < 0 = u_k((-\infty, \infty), r),$$

and so expected utility is not maximized by accepting ∅. Thus expected utility can always be maximized by accepting a corpus that is a closed interval. Since the standard normal distribution is symmetric about 0, the closed intervals that maximize expected utility have the form [−a, a].

Thus we lose no real generality if we assume that the sets A that are candidates for acceptance are all closed intervals of the form [−a, a].

Figure 6.5 shows the expected utility of accepting [−a, a], plotted as a function of a, when k = .37.6 Expected utility is maximized at a = 1.95, and thus it is rational to accept as corpus that the true value of the parameter is in the interval [−1.95, 1.95]. The probability that the true value is indeed in this interval is .95, and thus the interval [−1.95, 1.95] can be thought of as the Bayesian analog of a 95 percent confidence interval.

6I use $\mathcal{E}$ to denote expected value. The formula is

$$\mathcal{E}(u_k[-a, a]) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} u_k([-a, a], r)\, e^{-r^2/2}\, dr.$$
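
The maximization in this example can be checked numerically. The following sketch (my code, assuming NumPy and SciPy are available) searches a grid of values of a and should recover a maximizer near a = 1.95 for k = .37:

    import numpy as np
    from scipy.integrate import quad

    K = 0.37

    def u(a, r):
        # u_k([-a, a], r) = k/(1 + 2a) - d_r([-a, a])
        delta = max(abs(r) - a, 0.0)        # distance from r to [-a, a]
        return K / (1.0 + 2.0 * a) - delta / (1.0 + delta)

    def expected_u(a):
        # E(u_k[-a, a]) under a standard normal distribution for r;
        # the normal tails beyond +/-10 are negligible.
        phi = lambda r: np.exp(-r * r / 2.0) / np.sqrt(2.0 * np.pi)
        val, _ = quad(lambda r: u(a, r) * phi(r), -10.0, 10.0)
        return val

    grid = np.linspace(0.0, 5.0, 501)
    best = max(grid, key=expected_u)
    print(round(float(best), 2), expected_u(best))  # near a = 1.95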

Note that in accepting [−1.95, 1.95] as corpus, one suspends judgment on propositions that are more probable than what is accepted. For example, to accept [−1.95, 1.95] as corpus is to suspend judgment on whether the value is outside the interval [−.01, .01]; and the probability of the latter is .992. Despite its higher probability, acceptance of the latter proposition (either as corpus, or in conjunction with [−1.95, 1.95]) would reduce expected cognitive utility. Similarly for the negation of even smaller intervals, which have even higher probability. In accepting [−1.95, 1.95], one also fails to accept that the value of the parameter is not precisely 0, though that proposition has probability 1. Unlike the interval cases, addition of this proposition to [−1.95, 1.95] would not reduce expected utility, because it does not reduce the content of what is accepted or increase the possible error; however, it does not increase expected utility either, so there is no positive reason to accept it. Furthermore, since the set of all propositions with probability 1 is inconsistent, to accept all propositions with probability 1 is to accept the empty set ∅ as corpus, which does not maximize expected utility (see preceding proof). Thus the present theory of rational acceptance supports my earlier claim that no level of probability is sufficient for acceptance.

As the value of k is increased, the cognitive utility function u_k puts more weight on content. Figure 6.6 shows $\mathcal{E}(u_k[-a, a])$ for a larger value of k, plotted as a function of a, assuming again that the parameter has a standard normal probability distribution. Here the maximum expected utility is attained when a = 0, and thus expected utility is maximized by taking as corpus the degenerate interval [−0, 0], that is, the singleton set {0}. Since this set has probability 0, one sees that expected cognitive utility can be maximized by accepting a proposition with zero probability. The reason is that although there is no chance of the corpus being literally correct, there is a very good probability that it will be quite close to the truth, and this together with the desire for an informative corpus makes acceptance of the corpus optimal. This dramatically supports my earlier intuitive argument, that high probability is not a necessary condition for accepting a hypothesis.7

Figure 6.6: $\mathcal{E}(u_k[-a, a])$ plotted as a function of a, for a larger value of k

6.3.3 Objections

Epistemologists have stock objections to decision-theoretic accounts of rational acceptance. One is that we don't have probabilities and utilities; the answer to this is in Sections 1.3 and 1.5. Another is that expected utility calculations are difficult to do; the answer to this is in Section 1.2. Here I wish to answer two other objections that are often raised, and have not been addressed earlier in this book.

The first of these objections is this: The acts that are evaluated in decision theory must be options that you could choose if you wanted to. But a doxastic state like acceptance is not something you can choose at will, like deciding to take an umbrella. Hence the application of decision theory to acceptance is fundamentally misguided.

To see the answer to this objection, we need to think about the role norms of rationality play. These norms get their point because our acceptance of them tends to influence what we do. Thus there is no point having norms requiring us to do f, if f is so far beyond our power that acceptance of this norm could have no tendency to make us do f. For example, it would be futile for me to accept the norm 'Have blue eyes,' given that doing so would have no tendency to make me have blue eyes. Typically, the things that acceptance of a norm cannot get us to do are the things that are not subject to our will; and hence we get the principle that 'ought' implies 'can'. However, the case of acceptance is not typical in this respect. While acceptance is not usually directly subject to the will, it nevertheless is the case that acceptance of norms governing acceptance influences what propositions we accept. We see this every day in science, for example where scientists determine a 95 percent confidence interval and then accept that the true value of the variable of interest lies in this interval. If the scientists accepted a different normative statistical theory, they would draw a different conclusion. Hence there is a point to saying that we ought to accept this or that proposition, even if acceptance is not directly subject to the will.

7Niiniluoto (1986) gives a similar Bayesian account of interval and point estimation.

Thus for the purposes of cognitive decision theory, we take the options to be not alternatives that are directly subject to the will but alternatives that we would (or could) take if our norms required this. It seems that we could accept any proposition we can formulate, if we accepted a norm requiring this. Hence there is no shortage of options to which to apply cognitive decision theory.

The second objection I wish to address is this: Suppose acceptance of A maximizes expected utility, and that I do accept it. Then according to the theory I have outlined, I have done the rational thing. But suppose that I accepted A for some irrelevant reason; perhaps I just picked it randomly and it was a fluke that I happened to pick the proposition that maximizes expected cognitive utility. Then, the objection claims, my acceptance of A was irrational, even though it maximizes expected utility.

This objection has an air of unreality about it, since I am not likely to be able to accept something for a reason that I think is irrelevant. And if we want to make the case one in which I do not think my reason is irrelevant, but ought to, then my irrationality can be attributed to having irrational probabilities and/or utilities (Section 1.9).

But putting aside such cavils, let us suppose that I can choose what to accept by a process that I regard as random. I hold that this would not be irrational unless the random process had some positive chance of resulting in me accepting something with lower expected utility than A. But in that case, the expected utility of randomly choosing what to accept is less than the expected utility of directly accepting A. This is so even if, as luck would have it, the random method results in me accepting A. For the expected utility of randomly deciding what to accept is a mixture of the expected utilities of the various propositions that I might accept by this method, and in the case we are dealing with, this must be less than the expected utility of A. Thus my theory of rational acceptance adequately explains the irrationality involved in accepting A as the result of a random process.

6.4 ACCEPTANCE AND ACTION

My definition of acceptance linked acceptance of H to assertion of H. Is there also a relation between acceptance and action? Only an indirect one, I shall argue.

Note first that acceptance of H does not commit you to act in all cases as if H were true. For to act as if H were true is to act in the way that would be optimal if H were indeed true (where what counts as "optimal" is determined by your utilities). Thus if you were committed to acting in all cases as if H were true, you would be committed to accepting any bet on H, since you will win such a bet if H is true. But I have already noted that you can accept a proposition without being willing to bet on it at any odds.

Conversely, you can be willing to act in all cases as if H were true yet not accept H. For we have seen that you can give H a probability of 1 yet not accept H; and if you give H a probability of 1, you are willing to act in all cases as if H were true.

Nor is it the case that acceptance of H commits you to betting on H when the odds exceed some threshold. For to be willing to bet on H at odds greater than m : n is to have a probability for H greater than m/(m + n); and we have seen that no positive level of probability is necessary for acceptance.
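
To spell out the threshold claim (my derivation, on the usual reading that a bet on H at odds m : n stakes m to win n): the bet's expected gain is

$$p(H)\,n - (1 - p(H))\,m,$$

which is positive just in case $p(H) > m/(m + n)$.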

There is thus no necessary link between what a person accepts and how the person acts in practical circumstances. In any situation, acceptance of H is consistent both with acting as if H were true and also with acting as if H were false, and nonacceptance of H is also compatible with acting either way. This reflects the fact that rational action is determined by probabilities (plus utilities), and acceptance is not identifiable with any level of probability.

The lack of linkage between acceptance and action may also be brought out by imagining two rational individuals who have the same probability distributions, and whose utility functions agree on practical consequences, but who assign different utilities to some cognitive consequences. Then both individuals will have identical preferences regarding all practical actions but may nevertheless accept different propositions. Conversely, if their utility functions agree on cognitive but not practical consequences, they will accept the same propositions but have different preferences regarding practical actions.

Furthermore, the decision to accept a theory (or to not accept it) normally should produce no change in one's willingness to act as if the theory were true in practical contexts. If I rationally accept H, then H maximizes expected cognitive utility, relative to my current probability and cognitive utility functions. The fact that I have accepted H would not normally give me any reason to further increase the probability of H. For example, the reasons that make me confident the theory of evolution is true do not include the fact that I accept the theory. Thus the decision to accept a theory (or to not accept it) normally should produce no change in the probability of that theory - and hence no change in one's willingness to act as if the theory were true in practical contexts.

If this were not so (i.e., if the acceptance of a theory produced a change in one's willingness to act in accordance with the theory), then acceptance of a theory would have an influence on practical utilities, and this would need to be taken into account in considering whether or not it is rational to accept the hypothesis. The fact that there is normally no such influence means that, normally, practical utilities are irrelevant to the rationality of accepting a hypothesis. That is why, in presenting my theory of acceptance in Section 6.3, I supposed that rational acceptance maximizes expected cognitive utility, and ignored practical utilities.


Having just mentioned my acceptance of the theory of evolution, let me indicate what role acceptance has played in our evolution. I have been arguing that acceptance of a hypothesis normally has no influence on practical action; and our survival, insofar as it is up to us, is completely determined by our practical actions. Why then would evolution produce creatures like us, who tend to place considerable importance on accepting truths, and avoiding accepting falsehoods? The reason, I suggest, is that our intellectual curiosity provides us with a motivation to reason, observe, and experiment, and these activities often tend ultimately to favor our practical success. The desire for practical success can also motivate such information gathering, but intellectual curiosity provides an additional, and often more immediate, stimulus.

This does not mean that intellectual curiosity is reducible to a desire for practical success, and hence that cognitive utility is a species of practical utility. Since the point is somewhat subtle, let me illustrate it with an unsubtle analogy: the desire for sex. It is clear that we have this desire because individuals who have it are more likely to reproduce. However, the desire for sex is not the same as a desire to reproduce, or else there would be no market for contraceptives. I am suggesting that intellectual curiosity (cognitive utility) is like the desire for sex: Just as the desire for sex increases our motivation to reproduce, so our intellectual curiosity increases our motivation to acquire information; and both of these tend to increase evolutionary fitness - but what the desire is a desire for is not the same as the evolutionary function that the desire serves. It may be that astrophysics is the intellectual equivalent of a contraceptive: something that breaks the connection between our desires and the evolutionary function these desires serve.

Although there is no necessary linkage between acceptance and practical action, we may sometimes be able to make inferences about what people accept on the basis of their practical actions. For people's practical actions give us some information about their probabilities; and from that, together with some assumptions about their cognitive utilities, we can make an inference about what they accept. This is the same process we use to predict people's actions in one practical context, on the basis of their actions in a different practical context.

There is, then, some connection between acceptance and practical action, due to the fact that both partially reflect a person's probability function. But this indirect connection is the only connection there is.

6.5 BELIEF

At the beginning of this chapter, I mentioned my reservation about the folk concept of belief. We have now reached the point where I can explain the problem with this concept.

It is standardly assumed that you believe H just in case you are willing to act as if H were true. But under what circumstances is this willingness to act supposed to occur? As we observed in the preceding section, to be committed to always acting as if H were true is to be willing to bet on H at any odds. However, the usual view seems to be that you do not need to be absolutely certain of H (give it probability 1) in order to believe it. For one thing, it is usually supposed that there is very little we can be rationally certain of, but that we can nevertheless rationally hold beliefs on a wide range of topics. Thus belief in H does not seem to imply a willingness to act as if H were true under all circumstances.

Two responses to this difficulty suggest themselves. One is to say that you believe H just in case you are willing to act as if H were true, provided the odds are not too high (where what counts as "too high" remains to be specified). On this account, belief in H would be identified with having a probability for H exceeding some threshold. The other approach is to abandon the idea that belief is a qualitative state that you either have or you don't, and instead say that it comes in degrees. Your degree of belief in H could then be measured by the highest odds at which you would be willing to bet on H. This second suggestion effectively identifies belief with probability; and in fact, Bayesians since Ramsey (1926) have referred to subjective probability as "degree of belief."

Whichever of these suggestions is adopted, one will still have to deal with another aspect of the concept of belief. As I remarked at the beginning of this chapter, it is standardly assumed that belief in H is the mental state expressed by sincere intentional assertion of H - in other words, that belief is the same as acceptance. But if belief is acceptance, then it cannot be related to probability in either of the ways just considered.

The reason why acceptance cannot be identified with probability greater than some threshold has already been given in Section 6.2. No matter where the threshold is set, you can give H a probability that high but still not accept H; and you also can accept something with a probability lower than the threshold. The reason why acceptance cannot be identified with probability itself is that acceptance (as I have defined it) is a qualitative state that one either has or lacks, while probability is a matter of degree. One might think of dealing with the latter problem by introducing a notion of "degree of acceptance," and attempting to identify that with probability; but this also will not work. On any reasonable understanding of degree of acceptance, a person who (qualitatively) accepts H but not K ought to count as having a higher degree of acceptance for H than for K; yet we have seen that such a person may well give a lower probability to H than to K.

The upshot, then, is that the folk concept of belief appears to regard belief in H as a single mental state that is expressed both by a willingness to act as if H were true and also by sincere intentional assertion of H; and these are in fact two distinct states.

Stich (1983, pp. 230-7) has claimed that the mental states underlying sincere assertion and practical action need not be the same, and has inferred that the folk concept of belief may not refer to anything. Obviously I agree with Stich that these states need not be the same; for I have argued that they are not the same.

Stich cites some psychological research that, he claims, supports the conclusion that the mental states underlying assertion and action are distinct. I would like to be able to cite some empirical support of this kind, for the conclusion I have reached on more a priori grounds. But unfortunately, the research cited by Stich gives no real support to this conclusion (though it also does not disconfirm it).


Consider, for example, a study by Wilson, Hull, and Johnson (1981), discussed by Stich. In this study, subjects were induced to volunteer to visit senior citizens in a nursing home, and then later were asked if they would volunteer to help former mental patients. For one group of subjects, overt pressure from the experimenter to visit the senior citizens was made salient; for another group, it was not. There was a negative correlation between being in the first group and agreeing to help the mental patients. The experimenters hypothesized that the reason for the correlation was that subjects in the second group, feeling less pressured, would tend to infer that they were visiting the nursing home because they were helpful people, and that this inference made them more likely to volunteer to help the mental patients.

After agreeing to visit the nursing home, half the subjects in each group were asked to list all the reasons they could think of that might explain why they agreed to go and to rate their importance. Later, all subjects were asked to rate themselves on various traits relevant to helpfulness. Those who had been asked to list reasons tended to rate themselves as more helpful than those who had not, but they were not more likely to volunteer to help the mental patients.

This study does show that subjects' actual and self-reported helpfulness can be independently manipulated. But it does not follow from this that different types of belief state underlie assertion and action; for subjects who act in helpful ways are not acting on the belief that they are helpful. What follows is rather that being helpful and accepting that you are helpful are two different states.

A suitable experiment to show the difference between action-producing and assertion-producing belief states would be to have subjects consider a million-ticket fair lottery. If they are not willing to assert categorically that a given ticket will not win, but are willing to bet on this at high odds, then we have established that (a high degree of) the kind of belief that underlies action is not sufficient for the kind of belief that underlies assertion. Conversely, if we can find propositions that subjects will categorically assert to be true but will not bet high odds on, we have shown that the kind of belief that underlies assertion is not sufficient for (a high degree of) the kind of belief that underlies action. I have already suggested that, at least for reflective people, scientific theories often fall in the latter category.

6.6 OTHER CONCEPTS OF ACCEPTANCE

The term 'acceptance' is widely used, especially in the philosophy of science; and other authors have given it definitions that differ from the one I have given in this chapter. I will conclude this chapter by comparing my definition with a few of these alternatives.

6.6.1 Kaplan

Mark Kaplan (1981) has advocated a conception of acceptance that is similar in spirit to mine. Like me, Kaplan views rational acceptance as maximizing expected cognitive utility, and he asserts that improbable propositions can be rationally accepted, while probable ones need not be. But he and I define acceptance in different ways.

Kaplan defines 'S accepts H' as meaning

S would defend H were S's sole aim to defend the truth.

By 'defend' Kaplan means categorical assertion; so his definition and mine are related in connecting acceptance to assertion. Also, if S's sole aim were to defend the truth, then S would be sincere; and so the notion of sincerity figures in both definitions.

The main difference between Kaplan's definition and mine is that his definition uses a counterfactual conditional. This results in his definition's having an extension different from mine. For example, on Kaplan's account, only very confused persons could ever accept

A: Defending the truth is not my sole aim.

For if your sole aim were to defend the truth, and you were not confused, you would know A was false, and hence would not defend it. On my account, you could accept A because you could be in the mental state expressed by sincere intentional assertion of A (even though you might not wish to actually assert it).


So far this difference is just a different choice of how to use a word. But when combined with the decision-theoretic account of rational acceptance that Kaplan and I share, it becomes a substantive difference. Suppose that you are in the mental state expressed by sincere intentional assertion of A, but would not defend A were your sole aim to defend the truth, since then you would know A was false. My theory says that you accept A, and hence that your present cognitive utility is higher if A is true than if A is false. Kaplan's theory says that you accept the negation of A, and hence that your present cognitive utility is higher if A is false than if A is true. I think my theory better fits our sense of what your cognitive utility would be in this situation.

Another difference is that Kaplan does not take account of the possibility of unintentional assertions. Suppose that if your sole aim were to defend the truth, then you would attempt to assert H, but you would make a slip, and unintentionally assert H'. Then on Kaplan's account you accept H', but on my account you probably8 accept H. So on Kaplan's account, your cognitive utility, if you would slip and assert H', is the same as if you would sincerely and intentionally assert H'; while on my account, it is the same as if you were to sincerely and intentionally assert H. Again, I think my account better fits our sense of what your cognitive utility would be in this situation.

6.6.2 Levi

Levi (1967) discussed two notions of acceptance, which he called 'acceptance as true' and 'acceptance as evidence'. Levi (1967, p. 25) says that 'accept as true' means the same as 'believe'. However, Levi allows that you can accept H as true, without being willing to act on H in all contexts; he even allows that you can accept H while giving it a low probability. Thus acceptance-as-true departs from the common concept of belief by not being linked with action. Levi does seem to assume, however, that sincere intentional assertion of H expresses a person's acceptance of H as true (1967, pp. 10, 223). In both these respects, acceptance as I have defined it appears to agree with Levi's concept of acceptance-as-true.

8If you are attempting to assert H, and if you are sincere, then on my account you accept H. But in the case we are considering here, this only shows that if your sole aim were to defend the truth, then you would (on my account) accept H. It remains possible that you do not in fact accept H.

Levi also offered a decision-theoretic account of when it is rational to accept a hypothesis as true. Like the account of rational acceptance I have sketched, Levi's account proposed that the cognitive goals of accuracy and content could be represented by a cognitive utility function. However, Levi held that a person might simultaneously have different demands for information, these demands being representable by different utility functions. And for Levi, acceptance-as-true was relative to these demands for information; on his theory, one could accept H as true to satisfy one demand for information, and at the same time fail to accept H as true to satisfy another demand for information. This would be rational, according to Levi, provided each acceptance-as-true maximized expected cognitive utility, relative to its own particular cognitive utility function.

My notion of acceptance differs from Levi's concept of acceptance-as-true in not being relative to demands for information. My main reason for not adopting a relativized notion is that rationality requires keeping the various propositions we accept consistent with one another; thus someone who accepts H to satisfy one of their demands for information is obliged to also accept H as an answer to any other demand for information that H may satisfy. We therefore do not need a question-relative notion of acceptance in a theory of rational acceptance.

In Levi's later writings (1976, 1980), the notion of acceptance-as-true has virtually disappeared, and acceptance-as-evidence occupies center stage. It is now acceptance-as-evidence for which Levi gives a decision-theoretic account involving cognitive utility. I have the impression that after (1967), Levi came to view acceptance-as-true as not a real bearer of cognitive utility. On this later view, as I interpret it, to accept H as true relative to some demand for information is merely to be committed to accept H as evidence if (possibly contrary to fact) that demand for information were one's only demand.9 This interpretation of Levi implies that he too does not now believe that acceptance, in the sense that influences cognitive utility, is relative to demands for information.

9This interpretation is suggested by (Levi 1976, p. 35, and 1984, p. xiv).

But while Levi's notion of acceptance-as-evidence is like my notion of acceptance in not being relativized, it differs from my notion in another crucial respect: To accept a hypothesis as evidence, Levi says, is to assume its truth in all (practical and theoretical) deliberation. Thus if one accepts H as evidence, one must give H probability 1. By contrast, my notion of acceptance, like Levi's notion of acceptance-as-true, allows that one can accept hypotheses without giving them probability 1, and indeed can do so while giving them a probability as low as you like.

I find the concept of acceptance-as-evidence less important than Levi does, because I think people accept-as-evidence much less than he supposes. Levi imagines that observation reports, scientific theories, and other things are often accepted-as-evidence. This means that people should be willing to bet at any odds on a wide range of observation reports and scientific theories. I am not willing to do that, and people I have talked to say they are not willing either. I do not deny that some propositions, even contingent ones, are assigned probability 1; if one is dealing with a real-valued parameter, for example, some contingent propositions must get probability 1. But I take it that the propositions given probability 1 are typically very uninformative ones, like "the force between charged particles does not vary precisely as the rth power of the distance." Observation reports and scientific theories are typically regarded as fallible.

In any event, acceptance as I have defined it is ubiquitous, and is not identifiable with acceptance-as-evidence. For as Levi has observed (1967, p. 10), people can and do sincerely assert propositions they are not willing to bet on at all odds and hence do not take to be certainly true; the propositions thus asserted are accepted in my sense, but not accepted-as-evidence. Thus a theory of acceptance in my sense is needed, whatever one thinks of Levi's concept of acceptance-as-evidence.

6.6.3 Van Fraassen

In The Scientific Image, van Fraassen maintained that acceptance of a theory in science involves belief that the theory is empirically adequate (it agrees with observable phenomena) but not that the theory is true. He also said that this belief in empirical adequacy is only a necessary condition for acceptance, not a sufficient condition. Acceptance, he said, also involves a certain commitment. For scientists, the relevant commitment entails the adoption of a particular kind of research program; and for nonscientists, it entails "willingness to answer questions ex cathedra" (van Fraassen 1980, p. 12).

For van Fraassen (1985, pp. 247ff.), belief is the same as probability. So his statement, that acceptance of a theory in science involves belief in its empirical adequacy, must mean that acceptance of a theory in science involves giving a high probability to the theory being empirically adequate. And similarly, the statement that belief in the truth of the theory is not necessary for acceptance in science must mean that one can accept a theory in science without giving a high probability to the theory being true.

Van Fraassen's reference to ex cathedra pronouncements points to a similarity between his concept of acceptance and mine: Both are connected with categorical assertion. However, the connection is less direct on my account. People, including scientists, are sometimes secretive; they can be in the mental state expressed by sincere intentional assertion of H, without being willing to assert H. Thus I would not agree that acceptance entails "willingness to answer questions ex cathedra."

On both van Fraassen's view and mine, a person can accept a theory without giving a high probability to the theory being true. However, I disagree with his statement that acceptance of a theory in science involves belief that the theory is empirically adequate. Van Fraassen's reason for saying this is that he thinks (a) science aims at accepting empirically adequate theories, and (b) (rational?) acceptance involves a high probability that the aim of acceptance is served. Proposition (a) is about the aim of science, not the concept of acceptance; I will discuss it in Section 9.6. Here I focus on (b), which is a claim about the nature of (rational) acceptance.

One problem with (b) - and also with (a) - is that the notion of an aim seems to presuppose that there is only one desirable outcome; whereas in fact there may be a continuum of possible outcomes, of varying degrees of desirability. But let us put that problem aside, and suppose that we have a simple case in which we can say that the aim in accepting H is to accept an empirically adequate proposition. Still, rational acceptance of H does not require that H have a high probability of being empirically adequate. For example, suppose that accepting H increases utility by 10 if H is empirically adequate, and reduces utility by 1 otherwise. Then the probability of H being empirically adequate need only be 0.1 for acceptance of H to be rational. Thus (b) is false.
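
Spelling out the arithmetic of this example (my reconstruction): with p the probability that H is empirically adequate, accepting H has expected utility

$$10p - (1 - p) = 11p - 1,$$

which is positive just in case $p > 1/11 \approx .09$; so a probability of 0.1 already suffices.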

Let us turn now to the question of whether acceptance of a hypothesis commits a scientist to a research program. Van Fraassen says that this commitment to a research program comes about because accepted theories are never complete (1980, p. 12); this suggests that the research program that he takes to be entailed by acceptance is one of extending the scope of the theory. But most scientists extend the scope of only a few theories, if any, and I doubt that van Fraassen wishes to keep the class of theories accepted by scientists this small. Perhaps the commitment to extend the scope of the theory is meant to apply only to those scientists who extend the scope of some theory in the subject area of the accepted theory: The position would be that if you accept H, then if you extend the scope of some theory in the subject area of H, you must extend H. But even this is not a condition that must hold for acceptance as I have defined it. For example, we can easily imagine a scientist Poisson', who sincerely asserted a corpuscular theory of light, but extended the scope of Fresnel's wave theory of light by deriving a previously unnoticed consequence of the theory, and extended the scope of no other theory of light.10 Poisson' would accept a corpuscular theory on my account of acceptance but not on van Fraassen's account.

Even if your sole concern was the scientific one of accepting the right theory, it could still be rational to accept (in my sense) one theory but work on an incompatible theory. To see this, suppose that acceptance of H currently has a higher expected cognitive utility than does the acceptance of H', but there is a chance H' could be developed further into a theory whose acceptance would have a higher expected utility than H. In such a case, it could be rational to develop H' while accepting H.11

10The real Poisson did extend the scope of Fresnel's theory, without accepting it (Worrall 1989). But I do not know whether Poisson accepted the corpuscular theory, or whether he extended the scope of that theory.

6.7 SUMMARY

I defined acceptance of H as the mental state expressed by sincere intentional assertion of H. We saw that whether a person will accept H depends not only on the probability of H but also on other factors, such as how informative H is. In the light of this, I proposed a decision-theoretic account of rational acceptance, in which acceptance is viewed as having consequences with cognitive utility, and rational acceptance maximizes expected cognitive utility.

Acceptance, as I have defined it, captures one aspect of the folk notion of belief, while probability is a different, incompatible aspect of that concept. This notion of acceptance is related to the notions of acceptance employed by Kaplan, Levi, and van Fraassen; where it differs from the latter, I have given reasons for preferring my definition.

Two main questions remain to be answered. One is why (if at all) we should think that acceptance is an important concept for the philosophy of science. The other is the question of how to justify the assumption of a utility function for cognitive consequences. These questions will be considered, in this order, in the next two chapters.

11 Essentially this point has been made by Laudan (1977, pp. 108-13). However, the point seems to be inconsistent with Laudan's notion of acceptance, which is treating a theory as if it were true (p. 108).


7

The significance of acceptance

In the previous chapter, I argued that the doxastic state of accepting a hypothesis is not reducible to probability, and I sketched a Bayesian theory of rational acceptance that takes into account cognitive utilities as well as probabilities. But a theory of acceptance is not a usual component of Bayesian philosophies of science. It seems that many Bayesian philosophers of science think of subjective probability as a replacement for the notion of acceptance, and so think that acceptance has no important role to play in a Bayesian philosophy of science. This chapter will argue that that view is a mistake, by describing three ways in which the theory of acceptance makes an important contribution to Bayesian philosophy of science.

7.1 EXPLAINING THE HISTORY OF SCIENCE

Much of what is recorded in the history of science is categorical assertions by scientists of one or another hypothesis, together with reasons adduced in support of those hypotheses and against competing hypotheses. It is much less common for history to record scientists' probabilities. Thus philosophers of science without a theory of acceptance lack the theoretical resources to discuss the rationality (or irrationality) of most of the judgments recorded in the history of science. But a philosophy of science this limited in scope can fairly be described as impoverished.

Without a theory of acceptance, it is also impossible to infer anything about scientists' subjective probabilities from their categorical assertions. Thus for a philosophy of science without a theory of acceptance, the subjective probabilities of most scientists must be largely inscrutable. This severely restricts the degree to which Bayesian confirmation theory can be shown to agree with pretheoretically correct judgments of confirmation that scientists have made. But as we saw in Section 4.2, demonstration of such agreement provides an important argument for the correctness of Bayesian confirmation theory. Thus the lack of a theory of acceptance would seriously limit the arguments supporting the correctness of Bayesian confirmation theory.

This conclusion will be surprising to most Bayesian philosophers of science. For while Bayesian philosophers of science do not generally see a need for a theory of acceptance, they have nevertheless gone to considerable lengths to analyze episodes in the history of science. How is this possible, when according to what I have just said, analysis of the history of science normally requires a theory of acceptance? It is possible because although these philosophers do not see the need for a theory of acceptance, they nevertheless operate with a tacit theory of acceptance. This tacit theory seems to identify acceptance with high probability. In Section 6.2 I explained why this is an untenable theory. Consequently, Bayesian analyses of historical episodes do not avoid reliance on a theory of acceptance; and they could be improved by using a better theory of acceptance.

To illustrate this, I will review three representative examples of Bayesian attempts to explain episodes from the history of science, and show how each of them presupposes an incorrect theory of acceptance.

One reason I used Cavendish as an example in Section 6.3.1 is that this example has been analyzed by Dorling (1974). Dorling takes the fact that needs explaining to be that "Cavendish's experiment and argument . . . render it highly probable that the correct law would be at any rate a very good macroscopic approximation to the inverse-square law" (1974, p. 336); and he gives a Bayesian analysis that yields this conclusion. But this analysis explains a historical fact only if Cavendish (or other scientists) did think his experiment made it highly probable that an inverse-square law was close to the truth. And history does not record Cavendish saying that he thought this. It simply records his categorical conclusion that the electrostatic law was approximately an inverse-square law.1 Nor does it seem that other scientists said Cavendish's experiment made his conclusion highly probable; certainly Dorling cites no evidence that they did.

1 For the relevant quotation from Cavendish, see page 139.

Presumably Dorling took Cavendish's assertion to show that Cavendish gave his conclusion a high probability. Dorling would then be tacitly assuming

(1) Accepted propositions are given high probability.

If my argument in Section 6.2.4 is right, then (1) is false, and so Dorling's analysis tacitly presupposes an untenable theory of acceptance. This is not to say that Dorling's analysis is fatally flawed; perhaps the theory of acceptance developed in Chapter 6 could be used to argue that Cavendish did indeed give a high probability to his conclusion, as a result of his experiment. But that is not my concern here. Rather my concern is simply to point out that Dorling's historical analysis does tacitly presuppose a theory of acceptance and that the analysis would be improved by using a better theory of acceptance.

My next example is a historical analysis by Franklin and Howson (1985), who offer what they say is a Bayesian explication of Newton's argument from Kepler's laws to the inverse-square law of gravitation. Newton gives this argument in his Principia. As Franklin and Howson point out, Newton offers a deductive argument from Kepler's laws, together with a result stated earlier in the Principia, to the conclusion that the inverse-square law holds between each planet and the sun.2 Newton also presents evidence that the inverse-square law holds between the moon and the earth, and between Jupiter and Saturn and their moons. Franklin and Howson's explanation of why Newton offers this additional evidence is that "The additional evidence supports the hypothesis that the inverse square force is universal."

Since Franklin and Howson claim to be giving a Bayesian explication, the notion of "support" they invoke here needs a Bayesian interpretation. I guess that what they mean by "support" is confirmation, or increase in probability. Thus if M is the evidence about the moons, and U the universal inverse-square law, Franklin and Howson's explication of why Newton offers M is that p(U|M) > p(U). And in fact, it can easily be argued that this inequality should hold. If the existence of the moons and planets is assumed, then p(M|U) = 1; and provided p(M) < 1, the desired inequality then follows from Bayes' theorem, since p(U|M) = p(M|U)p(U)/p(M) = p(U)/p(M) > p(U).

2 This derivation has been criticized because (a) it assumes that the sun is fixed in space, contrary to what Newtonian theory implies, and (b) the earlier result Newton uses for the derivation applies to circular motion only, not the elliptical orbits of the planets. Franklin and Howson argue that Newton was aware of these facts but regarded them as introducing negligible errors.

But what reason is there to think that this analysis explicates Newton's reasoning? Newton did not say that p(U|M) > p(U); rather he lists M as a premise in his argument for the categorical conclusion U. If Franklin and Howson are attempting to explain Newton's probability judgments, they must be inferring what those judgments were from Newton's use of this argument, in which case they are probably assuming

(2) If evidence strengthens the case for accepting a hypothesis, then it confirms that hypothesis.

Alternatively, Franklin and Howson might be trying to explain why Newton argued as he did, by showing that Newton would have held p(U|M) > p(U). In that case, they would seem to be assuming

(3) If evidence confirms a hypothesis, then it strengthens the case for accepting that hypothesis.3

However, both principles (2) and (3) are false. For example, suppose we have 100 exclusive and exhaustive hypotheses, denoted H1 through H100. If their prior probabilities are all the same, namely 0.01, then we would most likely suspend judgment, and not accept or reject any of them. Suppose this is so. Now we acquire some evidence E that raises the probability of H1 to 0.98, and the probability of H2 to 0.02, while the probabilities of the remaining hypotheses drop to zero. With the new probabilities, we would most likely accept H1. In that case, E has confirmed H2 (it has raised its probability from 0.01 to 0.02), yet has led to H2 being rejected and so certainly has not strengthened the case for accepting H2. Thus we have a counterexample to (3). Also, E has led to ¬H2 being accepted and so has strengthened the case for accepting ¬H2, yet E disconfirmed ¬H2 (it reduced the probability of ¬H2 from 0.99 to 0.98); this is a counterexample to (2).

3 This principle has been explicitly endorsed by van Fraassen, who asserts that "any reason for belief is a fortiori a reason for acceptance" (1983, p. 168). Van Fraassen identifies belief with (high) subjective probability.

This example is consistent with the decision-theoretic account of acceptance given in Section 6.3.1.

Proof. For 1 ≤ n ≤ 100, let the utility of accepting a disjunction of n of the Hi be 1/n - 1 if the disjunction is true, and -2 if the disjunction is false. Then expected utility is initially maximized by accepting the disjunction of all 100 Hi, that is, suspending judgment. And after E is learned, expected utility is maximized by accepting H1 alone.
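
The proof can also be checked numerically. The following sketch is my own illustration under the utilities just stated (1/n - 1 for a true disjunction of n hypotheses, -2 for a false one); it confirms that suspension of judgment is optimal before E and that accepting H1 alone is optimal afterward:

```python
# Utility of accepting a disjunction of n hypotheses: 1/n - 1 if the
# disjunction is true, -2 if it is false (the utilities in the proof).
def eu_accept_top_n(probs, n):
    """Expected utility of accepting the disjunction of the
    n most probable hypotheses."""
    p_true = sum(sorted(probs, reverse=True)[:n])
    return p_true * (1 / n - 1) + (1 - p_true) * (-2)

prior = [0.01] * 100                   # 100 equiprobable hypotheses
posterior = [0.98, 0.02] + [0.0] * 98  # probabilities after learning E

for label, probs in (("before E", prior), ("after E", posterior)):
    best = max(range(1, 101), key=lambda n: eu_accept_top_n(probs, n))
    print(f"{label}: optimal disjunction size = {best}")
# before E: 100 (suspend judgment); after E: 1 (accept H1 alone)
```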

Thus Franklin and Howson's explication of Newton's reasoning tacitly presupposes a false principle regarding acceptance. I think this flaw in their explication could be repaired by a more careful analysis, but that is not my concern here. The point I wish to make is merely that their analysis does not avoid reliance on a theory of acceptance; rather it tacitly assumes part of a theory, and what they assume does not stand up to critical scrutiny. Their explication of Newton's reasoning could be improved by the use of a better theory of acceptance.

For my third and last cautionary tale, I turn to Rosenkrantz's (1977) analysis of Copernicus's evidence for the heliocentric theory. Rosenkrantz writes:

[The] simplifications of the Copernican system (by virtue of which it is a system) are frequently cited reasons for preferring it to the Ptolemaic theory. Yet, writers from the time of Copernicus to our own, have uniformly failed to analyse these simplifications or account adequately for their force. The Bayesian analysis . . . fills this lacuna in earlier accounts by showing that the cited simplifications render the heliostatic theory . . . better supported than the corresponding (more complicated) geostatic theory. (1977, p. 140)

Let H be Copernicus's heliocentric hypothesis, G Ptolemy's geocentric hypothesis, and E the evidence of the observed planetary motions. Rosenkrantz's definition of support is such that E supports H better than G just in case

   p(H|E)/p(H) > p(G|E)/p(G).

As the preceding quote indicates, Rosenkrantz intends this inequality to be a reason for preferring H to G.

Now history records that Copernicus accepted H and rejected G, and it records the reasons he gave in support of this, but it does not record his probabilities for H or G. Thus if Rosenkrantz's account is to explain the historical facts, the preference that it explains must be a preference for accepting H rather than G. Rosenkrantz's account would then be tacitly assuming

(4) If E supports H better than G, then E is a reason to prefer accepting H over accepting G.

Alternatively, we might interpret Rosenkrantz as inferring from the historical record that Copernicus regarded E as supporting H better than G (in Rosenkrantz's sense) and attempting to explain why this probabilistic judgment should hold. In that case, his inference from the historical record appears to assume

(5) If E is a reason to prefer accepting H over accepting G, then E supports H better than G.

However, both principles (4) and (5) are also false. For example, suppose A, B, and C are mutually exclusive and exhaustive hypotheses, with probabilities before and after acquiring evidence E as shown in Figure 7.1. One might well suspend judgment on these three hypotheses before learning E, and accept A afterward. In that case, E is clearly not a reason to prefer accepting C over accepting A; nevertheless, E does support C better than A. Thus we have a counterexample to (4). Also, E is a reason to prefer accepting A over C, though it does not support A better than C; so we also have a counterexample to (5). I leave it to the interested reader to show that this example is consistent with the theory of Section 6.3.1.

Figure 7.1: Counterexample to (4) and (5)

The preceding example illustrates a situation that is ubiquitous in science. For example, let A be some hypothesis that is confirmed by evidence E, but does not logically entail E; and let C be the proposition AE. Then

   p(A|E)/p(A) = p(E|A)/p(E) < 1/p(E) = p(E|C)/p(E) = p(C|E)/p(C).

Thus whenever a hypothesis A does not logically entail evidence E, the proposition AE is better supported, in Rosenkrantz's sense, than is A; but it would be absurd to infer from this that E makes acceptance of AE preferable to acceptance of A. There are many cases in which we take evidence to provide a good reason for accepting a hypothesis that does not entail that evidence.
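
To illustrate this with made-up numbers (the probabilities below are hypothetical, chosen only to exhibit the inequality):

```python
# Rosenkrantz's ratio measure of support: p(H|E)/p(H). By Bayes'
# theorem this equals p(E|H)/p(E), so a hypothesis that entails E
# (like the conjunction AE) always scores 1/p(E), the maximum.
p_E = 0.5           # hypothetical prior probability of the evidence
p_E_given_A = 0.8   # A is confirmed by E but does not entail it

support_A = p_E_given_A / p_E   # p(A|E)/p(A) = 1.6
support_AE = 1.0 / p_E          # p(AE|E)/p(AE) = 2.0, since p(E|AE) = 1
print(support_A < support_AE)   # True: AE is always "better supported"
```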

Thus Rosenkrantz's explanation of Copernicus's reasoning, like the two earlier explanations I have considered, relies on a tacit theory of acceptance that does not stand up to critical scrutiny. To make his explanation cogent, a better theory of acceptance must be employed.

With these three examples, I have illustrated the point that I made at the beginning of this section, namely: A theory of acceptance is normally required to show that historical judgments of confirmation by scientists can be explained by Bayesian confirmation theory. Since the ability to explain exemplary episodes in the history of science can provide important support for Bayesian confirmation theory, this is one reason why even Bayesians who find acceptance intrinsically uninteresting should nevertheless regard the theory of acceptance as important. More reasons follow.

7.2 THE ROLE OF ALTERNATIVE HYPOTHESES

In The Structure of Scientific Revolutions, Kuhn wrote:

[O]nce it has achieved the status of paradigm, a scientific theory is declared invalid only if an alternate candidate is available to take its place. No process yet disclosed by the historical study of scientific development at all resembles the methodological stereotype of falsification by direct comparison with nature. . . . The decision to reject one paradigm is always simultaneously the decision to accept another, and the judgment leading to that decision involves the comparison of both paradigms with nature and with each other. (p. 77)

I think that what Kuhn means by a "paradigm" here is a theory that has been so successful as to be regarded as an exemplary achievement. And so understood, Kuhn's statement makes an observation that appears to be substantially correct and important for understanding the development of science. A good example, and one cited by Kuhn, is provided by the history of Newtonian gravitational theory. From around 1690 to 1750 it was well known that the moon's perigee precessed at twice the rate calculated by Newton himself, but this did not cause Newton's theory to be rejected, and in the end the discrepancy turned out to be due to a mathematical error. Again, in 1859 Le Verrier published his calculations showing that according to Newtonian theory, the precession of the perihelion of Mercury should be about 7 percent less than the observed value. The latter anomaly never was satisfactorily resolved within Newtonian theory and was part of the evidence that supported Einstein's relativity theory. But Newton's theory continued to be accepted in the intervening period despite the outstanding anomaly, and the lack of a better alternative surely was an important reason for this continued acceptance.

Of course, the advance of the perihelion of Mercury was not the only evidence supporting Einstein's theory, and it by itself may not have been enough to persuade many scientists to reject Newton's theory in favor of Einstein's. Other important evidence supporting Einstein was the null result of the Michelson-Morley experiment, and the results of Eddington's 1919 measurement of the curvature of light passing the sun. But the Michelson-Morley result was also known before Einstein's theory was proposed, and had not led to the rejection of Newton's theory. One can only speculate on what would have happened if Eddington's results had been obtained before Einstein's theory was proposed, but I think that this too would not have been enough to lead many scientists to reject Newton's theory in the absence of Einstein's theory explaining the results. Eddington's results were not completely unambiguous and were obtained under difficult conditions; they could easily have been attributed to experimental error. One might also suspect some error in the assumptions on which the result was based, such as the mass of the sun. If this is right, then the evidence that in fact persuaded many scientists of the falsity of Newton's theory would not have done so if Einstein's alternative theory had not been available.

As this illustrates, contrary evidence is rarely enough to persuade scientists to reject a theory that has been highly successful; only when there is also an alternative theory that accounts for the evidence better is the older theory likely to be rejected. This is a fact that is important for understanding the development of science, and it is something that a satisfactory philosophy of science should be able to explain.

The decision-theoretic account of acceptance is able to explain this phenomenon in a very natural way. Suppose H is a theory that has had exemplary success (such as Newton's theory of gravitation) but faces some serious anomalies; and suppose that there is no better alternative to H available. Then the options available to scientists can be adequately represented as: to accept H, to accept ¬H, or to suspend judgment altogether (i.e., to accept only H ∨ ¬H).4 If the anomalies are sufficiently serious, p(H) may be low; but since H has been highly successful, it is still probably reasonably close to the truth, since new anomalies do not negate earlier successes. Also H will be highly informative. By contrast, ¬H has very little content, even though the anomalies may indicate that its probability is high, and hence its expected distance from the truth is small. And while H ∨ ¬H is certainly true, it has no content at all. The situation is summarized in Figure 7.2. Thus scientists who aim to accept informative theories that are close to the truth can be expected to continue to accept H, despite the anomalies.

4 The symbol '∨' means 'or'. Thus 'H ∨ ¬H' is the tautologous statement that H is either true or false.

             Content    Expected distance from truth
   H         high       small
   ¬H        low        smaller
   H ∨ ¬H    zero       zero

Figure 7.2: Acceptance options without an alternative theory

Now suppose an alternative theory K is formulated. This gives scientists an option they did not have before, namely to accept K. We can suppose that the content of K is at least comparable with that of H (otherwise K would not really be an alternative to H). And if K accounts for the evidence better, or has other relevant virtues, then its expected distance from the truth will be surmised to be less than that of H. Thus accepting K will have higher expected utility than accepting H, and it is rational to accept K, thereby rejecting H (since the two are assumed to be incompatible). In this way, the formulation of the new theory leads to a rejection of the old theory, something that was not able to be effected merely by the anomalies the old theory faced.

The phenomenon to which Kuhn has drawn attention concerns the conditions under which scientists accept a scientific theory (or reject it - which is to accept its negation). And the explanation I have just given makes essential use of the theory of rational acceptance. To see that the notion of acceptance is essential here, consider how one might try to deal with the phenomenon without invoking acceptance.

The first difficulty is to state the phenomenon. A probabilistic ersatz version of it would be: Anomalies do not substantially disconfirm a theory in the absence of an alternative theory, but a theory can be substantially disconfirmed when an alternative is available. Now one problem with this ersatz version is that it is not borne out by the history of science, in the way that Kuhn's thesis is. For the data from the history of science primarily concern the hypotheses that scientists accepted at different times, not the probabilities that scientists gave to those hypotheses. Thus even if one could give an explanation for the ersatz thesis, that would be an answer to the wrong question. What a satisfactory philosophy of science needs to be able to explain is Kuhn's thesis, and since acceptance is not high confidence, this is not explained by explaining the ersatz thesis.

Second, I see no plausible explanation for why the ersatz thesis would be true, or even any good reason to think that it is true. I'll discuss this by considering both parts of the thesis in turn: the claim that anomalies do not substantially disconfirm a well-established theory in the absence of an alternative theory, and the claim that anomalies together with an alternative theory do substantially disconfirm an established theory.

Let E be the evidence about the advance of the perihelion of Mercury obtained by the end of the nineteenth century: that its precession is 7 percent higher than predicted on the basis of Newton's theory and current views about the remainder of the solar system, that attempts to locate additional planets to account for the discrepancy have been fruitless, and so on. Surely E is much more what one would expect if Newton's theory were false, rather than true; an intuition that is supported by the fact that at this time some scientists proposed modifications to Newton's theory to account for the anomalous perihelion advance - for example, modifications to the inverse-square law. So if H is Newton's theory, we appear to have that p(E|H) ≪ p(E|¬H); but then it follows from Bayes' theorem that E substantially disconfirmed H, contrary to what the ersatz thesis asserts.

Now let D denote that an alternative theory has been developed, which better accounts for the evidence to date (e.g., Einstein's theory). The ersatz thesis has it that ED disconfirms H to a much greater extent than E alone. This will be true just in case p(D|EH) ≪ p(D|E¬H). But we know that regardless of whether H is true, there are always alternative theories that fit the data to date, so there does not seem to be any reason why p(D|EH) should be much less than p(D|E¬H); thus there seems no reason why the discovery of an alternative theory should itself substantially disconfirm H.


To sum up this section: Kuhn's thesis makes an important observation about the dynamics of scientific theory change, and it is one that can be naturally explained using the theory of acceptance, and cannot be explained without invoking acceptance. This is a second reason why acceptance is important in the philosophy of science.

7.3 THE SCIENTIFIC VALUE OF EVIDENCE

Evidence gathering is a central part of scientific activity, and clearly makes an important contribution to the advance of science. It is just as clear that a satisfactory philosophy of science ought to be able to explain the importance of evidence gathering in pursuing scientific goals.

But what exactly should such an explanation show? A first suggestion might be: Gathering evidence is always worth doing so far as the goals of science are concerned. But this claim is not true. Gathering evidence takes time and money, and sometimes it would be wiser not to gather a particular piece of evidence, so that the resources can be put to use on other aspects of the scientific enterprise. And even if we put aside the cost of gathering evidence, it still would not be true that there is positive value in gathering any given piece of evidence; some evidence is simply irrelevant to anything we care about. (Who cares whether the number of hairs on my head is odd or even?)

What does seem to be true is that, ignoring the costs of evidence gathering, gathering any particular piece of evidence should not be expected to be positively detrimental to our scientific goals. For ignoring costs, gathering irrelevant information neither advances nor impedes our scientific goals; and gathering relevant evidence can be expected to advance those goals, even though in some cases it might in fact retard them. For future reference, I shall state this thesis here as the

Scientific value of evidence thesis (SVET). If there were no cost in gathering a piece of evidence, then so far as the goals of science are concerned, it would not be irrational to gather that evidence; and in some cases it would be irrational not to do so.


It is the truth of SVET that a philosophy of science ought to be able to explain.

Naively, one might suppose that it is trivial to give such an explanation: Science aims to find true theories, and gathering cost-free evidence cannot be counterproductive so far as this goal is concerned. But this is too simplistic because it overlooks the fact that evidence is not infrequently misleading, causing scientists to accept false theories and reject true ones. For example, experiments conducted in the 1920s were taken by Bohr and other physicists to show that the principle of conservation of energy is violated in the beta decay of atomic nuclei; but we now believe that this was a mistake, resulting from ignorance of the existence of a hitherto unknown particle, the neutrino. Of course, we believe that we now have the correct interpretation, but there is no guarantee that we will always find the correct interpretation of evidence, even in the long run.

There is a result in Bayesian decision theory that has been thought to give a satisfactory explanation of SVET. I shall argue that properly interpreted this result does indeed provide a satisfactory explanation of SVET, but that the requisite interpretation involves the notion of acceptance.

The result in question has already been stated in Section 5.1.2; it is that under suitable conditions (spelled out in Section 5.1.2), gathering evidence increases the expected utility of subsequent choices, if it has any effect at all. For example, checking the weather forecast before dressing does not guarantee that I will dress appropriately for the day's weather, but it does increase the probability of this and so increases the expected utility of my subsequent decision about how to dress.

This result will explain SVET, provided we are willing to identify the scientific value of gathering evidence with the expected utility of subsequent decisions that might be made in the light of the evidence. But what are these subsequent decisions? I. J. Good (1967) appears to assume that the subsequent decisions can be identified with practical actions. He recognizes that in many cases scientists gather evidence without any definite practical applications in view, but even in such cases he supposes that the ultimate goal is still pragmatic success. He has proposed that 'pseudo-' or 'quasiutilities' be used to measure the value of these unknown practical applications (1969, p. 185; 1983, pp. 40, 191-2, 219).

Now it is true that science contributes to pragmatic success, and this provides a perfectly good motivation for doing science. But scientists also have cognitive goals, which are not reducible to pragmatic goals. This is indicated by the importance that scientists attribute to different experiments. On Good's account, the importance that scientists attribute to an experiment ought to be a function of the expected practical applications of the experiment; and this is not the case. For example, Galileo's observation of the moons of Jupiter was rightly regarded as of great scientific importance, though surely nobody expected any practical applications to flow from it, and indeed none have been forthcoming.5

5 Of course, Galileo hoped to use his discoveries to increase his fame and his salary, and these are practical ends. But our concern here is with how evidence gathering contributes to scientific goals, and these goals of Galileo's are not scientific ones. If there were not other goals to which Galileo's discoveries contributed, then his discoveries would not have contributed to Galileo's fame or salary, either.

I wish to understand the 'goals of science' referred to in SVET as being these cognitive goals. With this understanding, we cannot explain SVET as Good attempts to do, by appealing to the fact that evidence gathering increases the expected utility of subsequent practical actions.

This failure of Good's account is avoidable by making use of the notion of acceptance. We can say that the decisions influenced by evidence gathering are not limited to practical applications but may also include cognitive decisions regarding what hypothesis to accept. Our formal result then shows that gathering cost-free evidence not only increases the expected utility of practical actions, but also increases the expected utility of acceptance decisions. The utility attaching to the latter decisions reflects cognitive goals, such as the goal of accepting true informative theories. Thus with the notion of acceptance, the formal result does indeed provide an explanation of SVET.

Some authors, seeing that SVET is not explained by Good's approach, have offered an alternative approach to explaining SVET, which does not make use of the notion of acceptance. This alternative approach has been advocated, in slightly different forms, by Rosenkrantz (1981) and Horwich (1982). According to these authors, the cognitive goal of scientists is not to accept true informative theories but rather to have probabilities that are close to the truth. A probability assignment p(H) is said to be close to the truth if p(H) is close to 1, and H is true; or if p(H) is close to 0, and H is false. On this conception, the ideal situation would be to give probability 1 to all truths and probability 0 to all falsehoods; other situations have lesser utility, depending on how closely they approximate that ideal. Call this the probabilist explanation of SVET, as opposed to the acceptance-based explanation I offered. I will now argue that the probabilist explanation is inferior to the acceptance-based explanation.

To keep things simple, suppose that all we are interested in is the truth value of some hypothesis H. Then the cognitive utility of having probability function p depends only on the truth value of H, and we can write u(p, H) for the cognitive utility of p when H is true, and u(p, ¬H) for the cognitive utility of p when H is false. I will call a utility function u truth-seeking if for all probability functions p and p', if p'(H) > p(H), then u(p', H) > u(p, H), and u(p', ¬H) < u(p, ¬H). Now it turns out that there are truth-seeking cognitive utility functions for which SVET is false; that is, a scientist who had one of these utility functions could reduce expected cognitive utility by gathering cost-free evidence. For example, let

   u(p, H) = p(H) + (1/3π) sin 3πp(H),

and let u(p, ¬H) be defined simply by replacing H by ¬H in this equation. From Figure 7.3, it can be seen that u(p, H) increases as p(H) increases, and thus that u is a truth-seeking cognitive utility function. Nevertheless, with this utility function, gathering cost-free evidence can reduce expected cognitive utility.

Proof. Suppose p(H) = 0.8, p(E) = 0.5, and p(H|E) = 0.9. The expected utility of not acquiring the evidence E or ¬E is the expected utility of holding to the current probability function, namely

   p(H)u(p, H) + p(¬H)u(p, ¬H) = 0.78.

Figure 7.3: u(p, H) = p(H) + (1/3π) sin 3πp(H), plotted as a function of p(H)

Let p_E and p_¬E denote the probability functions that result from conditioning p on E and ¬E respectively. Then the expected utility of acquiring the evidence E or ¬E is

   p(E)[p(H|E)u(p_E, H) + p(¬H|E)u(p_E, ¬H)] + p(¬E)[p(H|¬E)u(p_¬E, H) + p(¬H|¬E)u(p_¬E, ¬H)] = 0.76.

The expected utility of gathering the evidence is thus less than that of not gathering it.
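
The two expectations can be reproduced with a few lines of code. This is a sketch of mine, using the utility function and the probabilities given in the text:

```python
from math import sin, pi

def u(x):
    """Cognitive utility of giving probability x to a proposition
    that is in fact true: u(x) = x + (1/3pi) sin(3*pi*x)."""
    return x + sin(3 * pi * x) / (3 * pi)

def eu(q):
    """Expected cognitive utility of the probability assignment q
    for H, with the expectation computed using q itself."""
    return q * u(q) + (1 - q) * u(1 - q)

p_H, p_E, p_H_given_E = 0.8, 0.5, 0.9                    # from the text
p_H_given_notE = (p_H - p_E * p_H_given_E) / (1 - p_E)   # = 0.7

eu_stay = eu(p_H)                                # keep current probabilities
eu_gather = p_E * eu(p_H_given_E) + (1 - p_E) * eu(p_H_given_notE)
print(round(eu_stay, 2), round(eu_gather, 2))    # 0.78 0.76
```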

How might a defender of the probabilist account respond to this? One possible response would be to say that as a matter of fact, scientists do not have cognitive utility functions like the one in Figure 7.3; that the cognitive utility functions scientists actually have are such as to make SVET hold. But if this is all that can be said, then SVET merely expresses a contingent fact about scientists' cognitive utility functions. I think we want to say that SVET is a deeper truth than that; that gathering cost-free evidence could never be expected to be deleterious to scientific goals.

Another possible response is to say that a utility function simply does not count as representing the cognitive goals of science unless it makes SVET true. The cognitive utility function in Figure 7.3 would, on this view, be rejected as unscientific. This response avoids the objection raised in the preceding paragraph; indeed, it reduces SVET to a trivial necessary truth. But this success is only achieved at a price. The proposed restriction on what counts as a scientific utility function is not motivated by any reflection on the goals of science that the utility function is supposed to capture, but merely by the fact that the probabilist explanation fails without this restriction; thus the probabilist restriction on what counts as a scientific utility function is quite ad hoc. No such ad hocery is needed in the acceptance-based explanation of SVET; there we were able to show that SVET holds, regardless of the form of the cognitive utility function.

But even if this ad hocery is embraced, there is a further problem. Consider, for example, the cognitive utility function defined by

   u(p, H) = p(H);   u(p, ¬H) = p(¬H).

Horwich (1982, pp. 127-9) postulated that this utility function represented scientific cognitive values, and showed that on this assumption SVET is deducible.6 But if you have this utility function, then your probability function maximizes expected cognitive utility only if it assigns to H a probability of 0, 1/2, or 1; any other value is suboptimal.

Proof. Let p be your current probability for H, and let q be any number between 0 and 1. The expected cognitive utility of adopting q as the probability of H, calculated using your current probability p, is

   E_p(q) = pq + (1 - p)(1 - q).

Differentiating with respect to q gives dE_p(q)/dq = 2p - 1. Thus if p > 1/2, dE_p(q)/dq is positive, and q maximizes expected cognitive utility iff q = 1. Similarly, if p < 1/2, then q maximizes expected cognitive utility iff q = 0. If p = 1/2, then dE_p(q)/dq = 0, and all values of q have the same expected utility. Thus E_p(p) ≥ E_p(q), for all q ∈ [0, 1], only if p = 0, 1/2, or 1.

6 This way of putting the matter is mine, not Horwich's. What Horwich says is that the error in probability function p when H is true is 1 - p(H), and similarly for ¬H. Since he takes rational choice to minimize expected error, he is in effect identifying cognitive utility with negative error, i.e., u(p, H) = p(H) - 1. Since utility functions differing by a constant represent the same preferences, it is equivalent to say u(p, H) = p(H); similarly for ¬H.
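
A quick numerical illustration of this proof (mine, not Horwich's): E_p(q) is linear in q with slope 2p - 1, so whenever p is not 0, 1/2, or 1, an extreme value of q beats holding at p:

```python
def expected_utility(p, q):
    """E_p(q) = p*q + (1-p)*(1-q): expected cognitive utility,
    by current lights p, of shifting one's probability for H to q."""
    return p * q + (1 - p) * (1 - q)

p = 0.7   # an illustrative current probability for H
for q in (0.0, 0.25, 0.5, p, 1.0):
    print(f"q = {q:.2f}: E_p(q) = {expected_utility(p, q):.3f}")
# E_p increases linearly in q (slope 2p - 1 = 0.4), so jumping to
# q = 1 (EU 0.7) beats holding at p = 0.7 (EU 0.58).
```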

This means that if you were ever to find yourself giving a hypothesis a probability other than 0, 1/2, or 1, then you could increase your expected cognitive utility by shifting your probability for that hypothesis to the nearest extreme value (either 0 or 1, as the case may be). But in fact we think that shifting probabilities to extreme values like this, without any evidence to support the shift, could not increase expected cognitive utility.

The possible responses to this problem are the same as the responses to the problem about utility functions that make evidence gathering irrational; and they suffer from the same defects. Thus one might say that scientists do not in fact have utility functions like the one Horwich proposed; that in fact their utility functions are such that, in the absence of new evidence, expected utility is always maximized by holding to their current probability function. (Such utility functions have been discussed in the Bayesian literature under the rubric of 'proper scoring rules' [Savage 1971].) This response makes it merely a contingent fact, which might have been otherwise, that arbitrary shifts in probability to extreme values do not increase expected cognitive utility. But this truth is surely not something that might have been otherwise.

The other possible response is to say that a utility function does not count as representing the cognitive goals of science unless it is a proper scoring rule. This succeeds in making it a necessary truth that arbitrary shifts in probability do not increase expected cognitive utility; but like the similar move used to make SVET necessary, success is here bought at the price of invoking a completely ad hoc assumption. The lack of any prior plausibility for the present assumption is indicated by the fact that it deems unscientific the utility function that Horwich proposed as representing scientific goals!


When cognitive consequences are taken to result from acceptance decisions, rather than from probabilities, no such ad hocery is needed. In an arbitrary shift in probability, one learns nothing about the states, in the sense of learning defined in Section 5.1.5. Hence the theorem of Section 5.1.6 shows that the expected utility of making an arbitrary shift cannot be higher than that of leaving current probabilities unchanged. By taking the subsequent actions to be decisions about what to accept, this theorem shows that arbitrary shifts also cannot increase expected cognitive utility. And this holds no matter what cognitive utility function a person might have.

To be fair to the probabilist explanation, I should point out that the utility functions that violate SVET are not proper scoring rules.7 Thus if one counts only proper scoring rules as representing scientific values, no further assumption about the form of a cognitive utility function is needed to derive SVET. But this does not negate the fact that the probabilist must restrict cognitive utility functions in an ad hoc way, both to explain SVET and also to prevent arbitrary shifts in probability being rational.

To sum up this section: There are two attempts to explain SVET without invoking acceptance. The first, Good's, makes the mistake of conflating the cognitive goals of science with pragmatic goals. The second, the probabilist approach of Rosenkrantz and Horwich, postulates cognitive utilities attaching to a scientist's probability function; this postulate, unlike the corresponding postulate concerning acceptance, does no work elsewhere in the philosophy of science. Furthermore, the probabilist approach requires additional ad hoc assumptions about the form of a scientific cognitive utility function. By contrast, the acceptance-based approach gets to explain SVET for free - no further assumptions need to be invoked.8

7 Proof. Think of adopting a new probability function as an act, so that utilities depend on the act and state of nature, in the usual way. Then if the utility function is a proper scoring rule, the "act" chosen will be one that maximizes expected utility at the time it is chosen. (For a proper scoring rule is a utility function for probabilities that always makes the current probability function maximize expected utility.) We can now prove, in the usual way, that gathering evidence cannot reduce expected utility.

8 For an argument that neither Popper nor Kuhn is able to explain SVET, see (Maher 1990c).


7.4 SUMMARY

I have given three reasons why acceptance is an important concept for the philosophy of science. Acceptance is needed to explain the history of science; it is needed to explain the role of alternative hypotheses in science; and it figures in the best account of how gathering evidence contributes to scientific goals.


8

Representation theorem

In Chapters 6 and 7, I assumed that rational scientists have utilities for cognitive consequences, as well as probabilities for scientific hypotheses. It is now time to defend that assumption. According to the preference interpretation of probability and utility given in Section 1.3, the assumption would be true if rational scientists have preferences that maximize expected cognitive utility, relative to some probability and cognitive utility functions. My argument will therefore consist in showing that the cognitive preferences of rational scientists can be so represented. This will be done by stating a suitable representation theorem.

It turns out that existing representation theorems are not quite suitable for this purpose. I will show why Savage's representation theorem is not suitable and then state a new representation theorem that does what is wanted. This representation theorem comes in two parts, the first (Section 8.2) establishing a representation for simple cognitive acts, and the second (Section 8.3) extending this to cognitive acts in general. I will describe the assumptions of the theorem in some detail, but proofs are relegated to appendixes.

Even without proofs, this chapter is technical. Readers willing to grant the possibility of a representation (even if only for the sake of argument) could skip this chapter and proceed directly to the discussion of scientific values in Chapter 9.

8.1 SAVAGE'S UNINTERPRETABLE ACTS

In Chapter 1, I mentioned Savage's (1954) representation theorem and discussed some of its postulates. Here I will show that Savage's theorem has a defect that makes it unsatisfactory to use as a basis for deriving cognitive utilities.


I begin with Savage's conception of an act. In Savage's theory, acts, states, and consequences are conceptualized in such a way that the act chosen, together with the state that obtains, uniquely determines what consequence will be obtained. Hence it is possible to think of acts as functions mapping the set of states onto the set of consequences. But Savage goes a step further, and identifies the set of acts with the set of functions from the states to the consequences. Thus every function from the states to the consequences is regarded as an act. I will call these functions Savage acts.

In many decision problems, some of these Savage acts do not correspond to any conceivable decision. In particular, this is true of cognitive decision problems. For example, let A be a proposition, and let (A, t) be the cognitive consequence that we can describe as "accepting A when A is true." Let f be the function that maps every state onto (A, t); that is, f(x) = (A, t) for all states x. Then f is a Savage act. But if A is not a necessary truth, there are states x in which A is false; and there is no conceivable decision that would give the consequence (A, t) in those states. Thus the Savage act f does not correspond to any conceivable decision. Savage acts like this, which correspond to no conceivable decision, I will call uninterpretable acts.

The Savage act f in this example is an instance of what is called a constant act, that is, an act whose value is the same for all states. In cognitive decision problems, it would appear that constant acts are typically uninterpretable. However, there are exceptions. It is reasonable to take the cognitive consequence of accepting the tautologous proposition as being the same whatever state obtains, since this proposition is true in all of them. Similarly, we can plausibly take the cognitive consequence of accepting the contradictory proposition to be the same whatever state obtains, since this proposition is false in every state. (Moreover, it captures no more of the truth about any one state than about any other.) Thus the decisions to accept the tautologous proposition and to accept the contradictory proposition are both interpretable constant acts.

Not all uninterpretable acts are constant acts. If f(x) = (A, t), for some state x in which A is false, then f is an uninterpretable act, regardless of its values for other states. But all these functions are regarded as acts by Savage.

It makes no sense to speak of preferences regarding uninterpretable acts. For the notion of preference we are concerned with is a preference for performing one act rather than another, not a preference for (say) contemplating the mathematical form of one act rather than another; and the notion of performing an uninterpretable act is nonsense. This means that by including uninterpretable acts in the set of acts, Savage makes it logically impossible for preferences to be connected on the set of acts. Thus Savage's first postulate, which asserts that preferences are connected (Section 1.3), logically cannot be satisfied on the set of Savage acts.

In Section 1.5, I did allow that the connectedness axiom does not need to be a requirement of rationality, in order for representation theorems assuming this axiom to be useful. I said that if your preferences agree so far as they go with some connected preference rankings that satisfy the other postulates, then the representation theorem tells us that these connected preference orderings are representable by a p-u pair. Hence the representation theorem can be interpreted as telling us that rational preferences are represented by at least one p-u pair. But in saying that preferences satisfying the other axioms agree with some connected preference ordering, I was of course assuming that the acts were the sorts of things that it is possible to have preferences about; and that is not so when the class of acts includes uninterpretable acts.

Still, it remains the case that if your preferences satisfy the other postulates, there is a binary relation ≼ that is connected on the set of all acts, satisfies the other postulates, and agrees with your preferences so far as they go. Savage's representation theorem shows that there is a probability function p and a utility function u such that f ≼ g iff EU(f) ≤ EU(g), for all Savage acts f and g (where EU is expected utility calculated using p and u). We could then say that p and u represent your actual preferences, even though the relation ≼ is not a possible preference relation.

I think this approach does succeed in salvaging something from Savage's representation theorem. But interpreted in this way, Savage's representation theorem only establishes the existence of some p-u pair that represents rational preferences; it does not establish any uniqueness result. Even if your preferences were defined for all interpretable acts, Savage's representation theorem does not allow us to infer that your probability and utility functions are unique;1 for Savage gets this uniqueness result by assuming connectedness on the set of Savage acts, and we have seen that it is logically impossible for your preferences to be connected on this set.2

The inclusion of uninterpretable acts also creates problems for some of Savage's other postulates. For example, the independence postulate asserts that if f, f', g, and g' satisfy certain conditions, and if f ≼ g, then f' ≼ g'. However, it might be that f and g are interpretable, while f' and/or g' are not; and in this case, it would be impossible to satisfy independence. We might get around this by saying that independence is only a requirement of rationality when all the acts involved are interpretable; but still, this is an inelegant way to construct a theory. One would rather have a representation theorem that does not require the assumption of preferences regarding uninterpretable acts.

8.2 SIMPLE COGNITIVE EXPECTED UTILITY

I will now state a representation theorem that is broadly similar to Savage's, but I will avoid Savage's reliance on uninterpretable acts. This representation theorem will show that rational preferences regarding simple cognitive acts correspond to maximizing expected utility relative to some probability and cognitive utility functions. In Section 8.3, the restriction to simple acts will be removed.

1 A "unique" utility function here means one that is unique up to a positive affinetransformation. This notion will be explained in Section 8.2.10.

2 For Savage's use of the assumption that the preference relation extends to uninterpretable acts, see in particular the proof of the first theorem in Savage (1954, ch. 5). This proof requires that for each consequence a and for any state x there be an act f (in the domain of the preference relation) such that f(x) = a; and there will not generally be an interpretable act f satisfying this condition. Also, Savage's proof of the existence of a utility function (1954, ch. 5, sec. 3) depends on the assumption that for each consequence a there is a constant act that gives a in every state; and as we have seen, constant acts are often uninterpretable.


8.2.1 Notation

The set of states will be denoted by X. These are to be formulated in such a way that you are sure exactly one of them obtains. They are also to be sufficiently specific that they determine what consequence will result from each act under consideration. So, for example, if we are concerned with the act of accepting A, and the cognitive consequence of this act depends on the truth value of A, then the states must specify the truth value of A. If what is of cognitive value in accepting A depends on more than the truth value of A (e.g., because it matters how "close to the truth" A is when false), then the states need to specify more than the truth value of A. We saw an example of how to do this in Section 6.3.2, where the acts were to accept various hypotheses about the value of a parameter, and the states specified the true value of the parameter.

Events (or propositions) are taken to be sets of states, that is, subsets of X. The set of all events will be denoted by X. (Here I am using the convention according to which a boldface letter denotes a set of subsets of the set denoted by the corresponding italic letter.) I do not assume that X contains every subset of X; I merely assume that X is a σ-algebra of subsets of X.3

The set of consequences will be denoted by Y. As always, the consequences need to specify everything of value in the situation. In cognitive decision problems, if all we cared about was whether the corpus we accept is true or false, then the consequences could be taken to be of the form "Accepting A when it is true" and "Accepting A when it is false." But if the value of accepting a false corpus varies depending on which state obtains (as it will if we care about "distance from truth"), then the consequence of accepting A when it is false should rather be taken to be of the form "Accepting A when state x obtains."

I will use Y to denote a σ-algebra of subsets of Y that contains all the singleton sets of Y. That is, for all a ∈ Y, {a} ∈ Y.

The set of possible acts, or decisions, will be denoted by D. In cognitive decision problems, this set will include acts of accepting various potential corpora. As in Savage's theory, the elements of D will be taken to be functions from X to Y. However, unlike Savage's theory, I do not assume that D contains every function from X to Y. Exactly what D must include will be explained shortly.

3 This means that X contains the empty set ∅, the set X, and is closed under complementation and countable unions. To be closed under complementation means that if A ∈ X, then ¬A ∈ X. To be closed under countable unions means that if A1, A2, . . . ∈ X, then ∪i Ai ∈ X.

A function f from X to Y is said to be measurable if, for all B ∈ Y, f⁻¹(B) ∈ X.4 The acts in D will be required to be measurable. Thus if there are functions from X to Y that are not measurable, these are excluded from D. This is a merely formal requirement; it does not force any specific function to be excluded from D, since the choice of X and Y is up to us.

I turn now to a description, and discussion, of the various assumptions on which my first representation theorem rests.

8.2.2 Connectedness

The representation theorem will assume that the weak preference relation is connected on D. Formally, this is

Axiom 1. For all f, g ∈ D, either f ≼ g or g ≼ f (or both).

The status of this assumption was explained in Section 1.5; it is not a requirement of rationality, but its assumption in a representation theorem is harmless, provided we do not interpret a representation theorem as showing that a rational person must have unique probability and utility functions.5

8.2.3 Two constant acts

Unlike Savage's representation theorem, the theorem to be stated here does not assume that for every a ∈ Y there is a corresponding constant act in D. However, it is assumed that D contains (at least) two constant acts, one of which is strictly preferred to the other. As I noted in Section 8.1, this is a plausible assumption when we are dealing with cognitive decision problems. For there are always the options of accepting X (the tautologous proposition) and ∅ (the impossible proposition). In accepting X one accepts truth no matter what state obtains, and in accepting ∅ one accepts falsehood no matter what state obtains. Furthermore, I imagine that most people would strictly prefer accepting necessary truth to accepting necessary falsehood.

4 f⁻¹(B) is the set of all x ∈ X such that f(x) ∈ B.

5 Footnote 1 applies here also.

The preference relation ≼ can be extended in a natural way to the consequences that are values of constant acts. Thus if a, b ∈ Y, I shall interpret 'a ≼ b' as meaning that there exist f, g ∈ D such that f ≼ g, and for all x ∈ X, f(x) = a and g(x) = b. Similarly for 'a ≺ b'. In this notation, the assumption that there are two constant acts, one strictly preferred to the other, can be expressed as

Axiom 2. There exist a, b ∈ Y such that a ≺ b.

8.2.4 Closure condition

If f and g are acts, then the following is also a conceivable act: First determine whether or not A is true, then choose f if A is true, and g otherwise. This latter act gives the same consequences as f for states in A, and gives the same consequences as g for states in ¬A. I will assume that for any f, g ∈ D and for any A ∈ X, there is always such an act. Formally:

Axiom 3. For all f, g ∈ D and A ∈ X, there exists h ∈ D such that h = f on A, and h = g on ¬A.

We could call h a mixture of f and g, and then Axiom 3 can be expressed by saying that D must be closed under the mixing operation.
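
As a concrete sketch of the mixing operation (the dict representation and names are my illustration, not the book's formalism), acts can be modeled as functions from states to consequences, with the mixture defined pointwise:

```python
def mix(f, g, A):
    """The mixture required by Axiom 3: an act agreeing with f on
    states in the event A and with g on states outside A."""
    return {x: (f[x] if x in A else g[x]) for x in f}

states = ["x1", "x2", "x3"]
f = {x: "accept F" for x in states}   # constant act: accept F outright
g = {x: "accept G" for x in states}   # constant act: accept G outright
A = {"x1", "x2"}                      # the event to be observed

h = mix(f, g, A)  # the experiment: accept F if A obtains, else G
print(h)          # {'x1': 'accept F', 'x2': 'accept F', 'x3': 'accept G'}
```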

To see the import of this Axiom for cognitive decision problems, suppose that f is the act of accepting some hypothesis F, and g is the act of accepting a hypothesis G. Then h is the act of learning whether A is true and accepting F if it is, G if it is not. This act h will generally not be identifiable with the act of accepting any particular hypothesis, and thus Axiom 3 entails that in cognitive decision problems the class D cannot be limited to acts of acceptance.

The act h can be described as an experiment, since it consists of observing (or otherwise determining) whether or not a certain event obtains, and then making an inference conditional on that observation. Sometimes, of course, the mere making of an observation is called an experiment; but the design of an experiment properly includes a specification of the inference to be drawn from each possible outcome, and so it is not unnatural to think of an experiment as consisting not just of an observation, but as including also a rule determining the inference to be drawn from the observation.

In general, I will call an act an experiment if it involves learning which element in some partition⁶ {A_λ : λ ∈ Λ} obtains (where Λ is an arbitrary index set), and for each λ ∈ Λ there is a proposition B_λ that will be accepted if A_λ obtains. It is clear that the class of experiments is closed under mixing. That is to say, if f and g are two experiments, and h is the act that equals f on A and g on Ā, then h is also an experiment. Thus in cognitive decision problems, Axiom 3 can be satisfied by taking D to be a class of experiments. Furthermore, the act of accepting a hypothesis B is a degenerate case of an experiment; it consists of learning the true element of the trivial partition {X}, and accepting B if X obtains (as it must). Consequently, Axiom 2 can also be satisfied by taking D to be a class of experiments. The axioms yet to be stated can likewise be satisfied when D is a class of experiments.

⁶ A partition is a set of mutually exclusive and jointly exhaustive events (or propositions). In addition, I will add the stipulation that a set of events counts as a partition only if each event is an element of 𝐗.
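To make the closure property concrete, here is a minimal sketch in Python of how experiments and mixing might be modeled; the construction and all names are mine, not the text's. An experiment pairs each cell of a partition with the proposition accepted there, and the mixture of two experiments on an event A refines the two partitions:

# A minimal model of experiments and the mixing operation of Axiom 3.
# States are strings; events and propositions are frozensets of states.
from typing import Dict, FrozenSet

State = str
Event = FrozenSet[State]

class Experiment:
    """Learn which cell of a partition obtains; accept the proposition
    that the rule assigns to that cell."""
    def __init__(self, rule: Dict[Event, Event]):
        self.rule = rule  # partition cell -> proposition accepted there

    def accepted(self, state: State) -> Event:
        for cell, proposition in self.rule.items():
            if state in cell:
                return proposition
        raise ValueError("partition must exhaust the state space")

def mix(f: Experiment, g: Experiment, A: Event) -> Experiment:
    """The act h of Axiom 3: h = f on A and h = g off A.
    Its cells are intersections of the old cells with A or its
    complement, so h is again an experiment: closure under mixing."""
    rule: Dict[Event, Event] = {}
    for cell, prop in f.rule.items():
        if cell & A:
            rule[cell & A] = prop
    for cell, prop in g.rule.items():
        if cell - A:
            rule[cell - A] = prop
    return Experiment(rule)

# Accepting a hypothesis B outright is the degenerate experiment on {X}:
X = frozenset({"s1", "s2", "s3", "s4"})
accept_B = Experiment({X: frozenset({"s1", "s3"})})
accept_C = Experiment({X: frozenset({"s2", "s4"})})
h = mix(accept_B, accept_C, frozenset({"s1", "s2"}))
print(h.accepted("s1"), h.accepted("s3"))  # B's proposition, then C's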

It is not necessary for D to contain all experiments in order for the assumptions of the representation theorem to be satisfied. In particular, the assumptions that have been stated thus far are satisfied by taking D to be the class of those experiments that have the following form: It is observed whether or not some event A ∈ 𝐗 obtains, with X being accepted if A does obtain and ∅ being accepted otherwise. Of course, we would normally want to consider more interesting experiments than this; my point here is merely to indicate the smallest class of experiments that satisfies the assumptions of the representation theorem.

Not all experiments are practical possibilities. For example, an experiment that involves observing whether or not all electrons (in all of space-time) have the same charge is not an experiment that is available to finite beings. However, even experiments like this are conceptual possibilities, and often people will have preferences about them. If A is the proposition that all electrons have the same charge, then few would disagree that the experiment of observing whether A is true, and accepting A if it is true and Ā otherwise, is preferable to the experiment that makes the same observation but draws the opposite conclusion. This preference holds despite the fact that both of the experiments here compared are unrealizable in practice.

8.2.5 Transitivity
The representation theorem also assumes that the preference relation ≼ is transitive. This is

Axiom 4. For all f, g, h ∈ D, if f ≼ g ≼ h, then f ≼ h.

The status of this principle was discussed in Chapter 2. I do take it to be a requirement of rationality, though I conceded that this position is not entirely uncontroversial. I suggested that the most fruitful approach to resolving the controversy would be to see what consequences flow from the principle and its alternatives. The present work is drawing out some consequences of transitivity together with other principles.

8.2.6 Independence
My representation theorem also assumes the independence axiom:

Axiom 5. For all f, f′, g, g′ ∈ D and A ∈ 𝐗, if f = f′ on A, g = g′ on A, f = g on Ā, f′ = g′ on Ā, and f ≼ g, then f′ ≼ g′.

The status of this principle was discussed in Chapter 3. I showed there that independence follows from the intuitively attractive principle of synchronic separability. While this will not end all debate, my suggestion here too was that further progress toward consensus is most fruitfully sought by developing applications of the principle and its alternatives.

8.2.7 Generalized weak dominance
At this point, it is convenient to define a notion of conditional preference, as follows:


Definition 8.1. f ≼ g given A iff for all f′, g′ ∈ D, if f′ = f on A, g′ = g on A, and f′ = g′ on Ā, then f′ ≼ g′.

Axioms 1 and 5 imply that for all f, g ∈ D and A ∈ 𝐗, either f ≼ g given A or g ≼ f given A (or both). If f ≼ g given A but not g ≼ f given A, then we say f ≺ g given A. Similarly, if f ≼ g given A and g ≼ f given A, then we say f ∼ g given A.

An event A ∈ 𝐗 will be said to be a null event if the person is indifferent between all acts that differ only on A. The class of null events will be denoted by N. The formal definition of N is

Definition 8.2. A ∈ N iff A ∈ 𝐗 and for all f, g ∈ D, f ∼ g given A.

This is no more than a definition; it is not assumed that there actually are any null events. But null events, if there are any, will be assigned probability 0 by the representation theorem.

Given these two definitions, I can now state what I will call the principle of weak dominance, as follows:⁷

Weak dominance. If f, g ∈ D, a, b ∈ Y, f = a, and g = b, then for all A ∈ 𝐗 \ N, f ≼ g given A iff a ≼ b.

I think that weak dominance is an uncontroversial principle of rationality. But its restriction to consequences that are the values of constant acts seems unnecessarily restrictive. To see how a suitable generalization may be effected, note that the following is equivalent to the weak dominance principle:

If f, g ∈ D, a, b ∈ Y, f = a, and g = b, then for all A, B ∈ 𝐗 \ N, f ≼ g given A iff f ≼ g given B.

From this formulation, it is a natural step to the following generalization, which is not restricted to consequences that are the values of constant acts:

Axiom 6. For all f, f′, g, g′ ∈ D; a, b ∈ Y; and A, B ∈ 𝐗 \ N: if f = a on A, f′ = b on A, g = a on B, and g′ = b on B, then f ≼ f′ given A iff g ≼ g′ given B.

⁷ The notation 𝐗 \ N denotes the set of elements of 𝐗 that are not in N. In words, it is the set of nonnull events.


If it is assumed that every consequence is the value of a constant act, then Axiom 6 is in fact equivalent to weak dominance. So it does indeed capture the essence of weak dominance, without the restriction to consequences that are the values of constant acts. It seems to be no more controversial than weak dominance itself, and the representation theorem will assume it.

8.2.8 Qualitative probability
We can say that event B is more probable for you than event A just in case you prefer the option of getting a desirable prize if B obtains to the option of getting the same prize if A obtains. But for this "more probable than" relation to be well defined, it must be the case that whether you prefer a prize to be conditional on A or on B does not depend on what the prize is, so long as the prize is desirable. This appears a reasonable condition; and I will now state an axiom that captures it formally.

The option of getting a desirable prize if A obtains can be regarded as an act f such that f = b on A, f = a on Ā, and a ≺ b. Similarly, the option of getting the same desirable prize if B obtains is an act g such that g = b on B and g = a on B̄. You prefer the prize to be conditional on B iff f ≺ g. We could then say that your preference is independent of what the prize is, so long as substituting c for a and d for b does not reverse your preference, for all c and d such that c ≺ d.

While this is a reasonable condition, the formulation just given is unnecessarily restrictive, since it applies only when the prizes are the values of constant acts. (This is due to the conditions that a ≺ b and c ≺ d.) It should not make any difference if there are states in Ā ∩ B̄ in which no act gives b, or states in A ∩ B for which no act gives a. Similarly for c and d. To see how to remove this gratuitous restriction, note that if f and g are as in the preceding paragraph, we can replace the requirement that a ≺ b by the requirement that f ≺ g given Ā ∩ B. The latter requirement is equivalent to the former if a and b are the values of constant acts,⁸ but the latter requirement can hold in other cases as well. So the condition stated in the preceding paragraph can be generalized in the following way:

⁸ Assuming Ā ∩ B ∉ N. There is no need to discuss here the case when Ā ∩ B ∈ N, because Axiom 7 says nothing about that case. If Ā ∩ B ∈ N, then f ∼ g given Ā ∩ B, so the antecedent of Axiom 7 is false.

Axiom 7. For all f, f′, g, g′ ∈ D; a, b, c, d ∈ Y; and A, B ∈ 𝐗: if

• f = b on A and a on Ā,
• g = b on B and a on B̄,
• f′ = d on A and c on Ā,
• g′ = d on B and c on B̄,
• f ≺ g given Ā ∩ B,
• f′ ≺ g′ given Ā ∩ B, and
• f ≼ g,

then f′ ≼ g′.

The generalization does not affect the intuitive rationale for this condition, namely: Whether you would rather have a desirable prize riding on A or on B should not depend on what the prize is, so long as it is desirable.

8.2.9 Continuity
For the next axiom, it seems best to first state it, then give illustrations and discuss its import.

Axiom 8. For all f, g ∈ D, a ∈ Y, and A ∈ 𝐗, if f ≺ g then there exists in 𝐗 a partition {A₁, …, Aₙ} of A such that for i = 1, …, n:

• If fᵢ = a on Aᵢ and fᵢ = f on Āᵢ, then fᵢ ≺ g.
• If gᵢ = a on Aᵢ and gᵢ = g on Āᵢ, then f ≺ gᵢ.

Here is an example of the application of Axiom 8 in a cognitive decision problem. It makes use of these propositions from Section 6.3.1:

Hc: The electrostatic force falls off as the nth power of the distance, for some n between 1.98 and 2.02.

H′c: The electrostatic force falls off as the nth power of the distance, for some n between 1.9 and 2.1.

Let the f and g of Axiom 8 be the acts of accepting H′c and Hc, respectively. Also let A be the proposition that the electrostatic force falls off exactly as the second power of the distance, and let a be the cognitive consequence of accepting A when it is true. We know that for Cavendish, after he had conducted his experiment, f ≺ g. Axiom 8 thus requires that there be a partition {A₁, …, Aₙ} of A such that for i = 1, …, n,

If fᵢ = a on Aᵢ and fᵢ = f on Āᵢ, then fᵢ ≺ g.

Now consider the trivial partition of A that consists of just A itself; that is, A₁ = A. Then f₁ is the act of accepting A if A is true, and otherwise accepting H′c.⁹ It may be that for Cavendish, f₁ ≺ g. Intuitively, this would occur if Cavendish thought A very improbable; for then he would think f₁ is most likely to give whatever consequence f would give, and he has f ≺ g. If this possibility is realized, the first clause of Axiom 8 is satisfied for this example.

⁹ This is an experiment, in the sense of Section 8.2.4.

But perhaps Cavendish has g ≼ f₁. If so, we will consider a nontrivial partition of A. For example, Cavendish might know that a coin is to be tossed ten times. Then we can let A₁ be the proposition that A is true and that all ten tosses land heads, let A₂ be the proposition that A is true and only the first nine tosses land heads, and so on. In this way, we get a partition of A with 2¹⁰ = 1024 members. Now f₁ is the act that gives the consequence a if A is true and all ten tosses of the coin land heads; otherwise f₁ gives whatever consequence f would give. Similarly for f₂, …, f₁₀₂₄. Since each partition element is very unlikely, each fᵢ is overwhelmingly likely to give the same consequence as f would give; and so we might expect to have fᵢ ≺ g, for all i = 1, …, 1024. If so, the first clause of Axiom 8 is satisfied. If not, try an even finer partition of A.
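The intuition that a sufficiently fine partition makes each fᵢ nearly indistinguishable from f can be checked with a toy expected utility calculation. All the numbers below are invented for illustration (they are of course not Cavendish's), and the utility of f on the small cell is crudely approximated by f's overall expected utility; the point is only that a cell of probability p(A)/1024 can barely move the total:

# Toy check: a cell of probability p(A)/1024 can barely move EU(f).
p_A = 0.01          # assumed probability of A (exactly inverse-square law)
eu_f = 0.40         # assumed expected utility of f (accepting H'c)
eu_g = 0.50         # assumed expected utility of g (accepting Hc); so f < g
u_a = 5.0           # assumed utility of consequence a, even if a is very good

p_cell = p_A / 2 ** 10   # each coin-toss cell A_i is very improbable
# f_i agrees with f except on A_i, where it yields a instead, so EU shifts
# by roughly p_cell times the utility difference on that cell:
eu_fi = eu_f + p_cell * (u_a - eu_f)
print(f"EU(f) = {eu_f:.6f}, EU(f_i) ~ {eu_fi:.6f}")
print("f_i still strictly below g:", eu_fi < eu_g)   # True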

Thus we see the import of Axiom 8: It is likely to hold only if the set of states is sufficiently numerous that propositions can be subdivided into arbitrarily fine divisions. That is why this axiom is dubbed "continuity." This should not be a controversial assumption, because we can always enrich the class of states if necessary. Of course, a person might not have preferences over all the options definable in the enriched set of states, but we are not insisting that a person's preferences be connected.

Axiom 8 also requires that there be no consequence so desirable that adding even the tiniest chance of getting it is enough to reverse preferences between acts. This amounts to saying that the decision maker does not view any consequence the way Pascal, in his famous wager, viewed the prospect of getting to heaven. I think this is plausible in cognitive contexts, at least. The best possible cognitive consequence is presumably learning everything one would like to know, and few would be inclined to reverse cognitive preferences for an infinitesimal chance of attaining this consequence.

So far I have illustrated and discussed only the first clause of Axiom 8, but the second clause is similar. With a as above, the gᵢ defined in this clause would actually be more attractive options than g itself, and so it is unproblematic that f ≺ gᵢ. A more interesting case arises when a is an undesirable consequence. For example, suppose a is the consequence of accepting the contradictory proposition ∅. Using the coin-toss partition of A defined earlier, each gᵢ has only a tiny chance of resulting in the acceptance of a contradiction; it is almost certain that gᵢ will give the same consequence as g would. And so we might well expect that f ≺ gᵢ, for all i = 1, …, 1024. If not, try a finer partition.

Here we see that the second clause of Axiom 8 requires that there be no consequence so undesirable that adding even the tiniest chance of getting it is enough to reverse preferences between acts. This is, I think, plausible in cognitive contexts. Perhaps the worst cognitive consequence is to accept a contradiction, and I think most of us can accept a sufficiently small risk of doing that.

8.2.10 Representation theorem
A representation theorem for expected utility has both an existence and a uniqueness component. In a typical representation theorem, such as Savage's, the existence component says that there is a probability function p and a utility function u such that for all f, g ∈ D,

f ≼ g iff EU(f) ≤ EU(g),

where the expected utility EU is calculated using p and u. This establishes that there is a representation. The uniqueness component typically says that the probability function p is unique; that is, p cannot be replaced by any other function without destroying the representation. The uniqueness component also typically says that the utility function u is unique in the sense that if u′ can replace u without destroying the representation, then there exist constants ρ and σ, with ρ > 0, such that u′ = ρu + σ.¹⁰ This relation between u and u′ is called a positive affine transformation, and thus the uniqueness result is also expressed by saying that u is unique up to a positive affine transformation.

¹⁰ More fully: u′(a) = ρu(a) + σ, for all a ∈ Y.
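It is easy to see why a positive affine transformation preserves the representation: expectation is linear, so EU′(f) = ρEU(f) + σ for every act, and multiplying by ρ > 0 and adding σ does not change which of two expected utilities is larger. A toy numerical confirmation (the numbers are my own, purely illustrative):

# u' = rho*u + sigma with rho > 0 preserves the expected utility ordering.
p = {"x1": 0.2, "x2": 0.5, "x3": 0.3}     # probabilities of three states
u = {"a": 0.0, "b": 1.0, "c": 4.0}        # utilities of three consequences
f = {"x1": "a", "x2": "b", "x3": "c"}     # acts: state -> consequence
g = {"x1": "c", "x2": "a", "x3": "b"}

def eu(act, util):
    return sum(p[x] * util[act[x]] for x in p)

rho, sigma = 2.5, -7.0
u2 = {a: rho * u[a] + sigma for a in u}
assert abs(eu(f, u2) - (rho * eu(f, u) + sigma)) < 1e-12   # EU is affine too
assert (eu(f, u) <= eu(g, u)) == (eu(f, u2) <= eu(g, u2))  # ordering preserved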

The representation theorem that can be proved on the basis of Axioms 1 to 8 differs from this paradigm of a representation theorem in both its existence and uniqueness components. These differences will now be explained.

An act is said to be simple if it has only finitely many possible consequences. That is to say, f is simple if there are only finitely many a ∈ Y such that, for some x ∈ X, f(x) = a. If we suppose that the cognitive consequence of accepting a hypothesis depends only on the truth value of the hypothesis, then acceptance of a hypothesis has only two possible consequences and hence is a simple act. Under the same supposition, experiments are also simple acts, provided the partition whose true element will be ascertained is finite. The representation theorem to be stated here provides an expected utility representation for simple acts only. Thus the existence claim of this representation theorem is weaker than the paradigm described above.¹¹
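For a simple act the expected utility is just a finite sum: each possible consequence a contributes u(a) weighted by p(f⁻¹(a)). A sketch (hypothetical names of my own; probabilities given on individual states for simplicity):

# EU(f) = sum over consequences a of u(a) * p(f^-1(a)), a finite sum
# precisely because a simple act has finitely many consequences.
def expected_utility(f, p, u):
    prob = {}                       # accumulates p(f^-1(a)) for each a
    for x, a in f.items():
        prob[a] = prob.get(a, 0.0) + p[x]
    return sum(u[a] * q for a, q in prob.items())

# Accepting A, when the consequence depends only on A's truth value,
# is a two-consequence simple act:
p = {"x1": 0.3, "x2": 0.4, "x3": 0.3}
accept_A = {"x1": "A true", "x2": "A true", "x3": "A false"}
u = {"A true": 1.0, "A false": -0.5}
print(expected_utility(accept_A, p, u))   # 0.7 * 1.0 + 0.3 * (-0.5) = 0.55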

Let Y* be the set of all a ∈ Y such that, for some f ∈ D, f⁻¹(a) ∉ N. Thus Y* is the set of all consequences that can be obtained with positive probability. So the cognitive consequence "accepting A when A is true" is a member of Y* only if A ∉ N. The uniqueness part of the representation theorem to be stated here says that if p and u are probability and utility functions that give the expected utility representation (on simple acts), and if u′ is another utility function that can be substituted for u while preserving that representation, then there exist constants ρ and σ, ρ > 0, such that u′ = ρu + σ on Y*.¹² This can be described by saying that u is unique, up to a positive affine transformation, on Y*. The limitation to Y* makes this uniqueness result weaker than the paradigm described in the first paragraph of this section.

¹¹ Axioms 1–8 do not entail an expected utility representation for all acts. Savage (1954, p. 78) gives an example in which Axioms 1–8 are all satisfied but no complete expected utility representation exists. Savage's example was one in which the probability p was not countably additive, and Savage speculated that this failure of countable additivity was an essential feature of his example. However, Seidenfeld and Schervish (1983, p. 404) give an example in which p is countably additive, and Axioms 1–8 are satisfied, but still there is no expected utility representation of the preferences.

¹² More fully: u′(a) = ρu(a) + σ, for all a ∈ Y*.

With these explanations, I can now state the theorem.

Theorem 8.1. If Axioms 1–8 hold, then there exists a probability function p on 𝐗, and a utility function u on Y, such that for all simple f, g ∈ D,

f ≼ g iff EU(f) ≤ EU(g),

where EU is expected utility calculated using p and u. The probability function is unique, and the utility function is unique, up to a positive affine transformation, on Y*.

The proof of this theorem is given in Appendix B.

If the acceptance of a hypothesis were a simple act, then Theorem 8.1 would provide a satisfactory foundation for the decision-theoretic account of rational acceptance.

8.3 GENERAL COGNITIVE EXPECTED UTILITY

In Section 6.3, I noted that the cognitive utility of accepting a false hypothesis is often held to depend on how close the hypothesis is to the truth. As the example discussed in that section illustrated, this view entails that accepting a hypothesis may have infinitely many possible consequences. If that is so, then accepting a hypothesis is not a simple act. In that case, Theorem 8.1 does not establish an expected utility representation for preferences regarding acts of acceptance.

However, with some additional assumptions, Theorem 8.1 can be strengthened so that it does provide an expected utility representation for acts of acceptance, even when those acts are not assumed to be simple. In this section, I will state the additional assumptions that are needed, and then the stronger representation theorem that they allow to be proved.

I am not here assuming any theory of what it means for a false hypothesis to be more or less close to the truth. That topic will be taken up in Chapter 9. The point of invoking the idea here is merely to make it plausible that acceptance need not be a simple act. The representation theorem shows that we can speak meaningfully of cognitive utilities under these circumstances, without having any prior theory of distance from truth. Consequently, in Chapter 9 I will be able to define the notion of distance from truth in terms of cognitive utility.

8.3.1 Dominance for countable partitions
The axioms of Section 8.2 entail the following (cf. Theorem B.2):

If {A₁, …, Aₙ} is a partition of A, and if f ≼ g given Aᵢ for all i = 1, …, n, then f ≼ g given A.

This is a type of dominance condition. A somewhat weaker condition is obtained by replacing ≼ with ≺ in the antecedent of the conditional. This gives

If {A₁, …, Aₙ} is a partition of A, and if f ≺ g given Aᵢ for all i = 1, …, n, then f ≼ g given A.

I will assume that this weakened form of the dominance condition applies even when the partition is countably infinite. So I assume

Axiom 9. For all f, g ∈ D and A ∈ 𝐗, if {A₁, A₂, …} is a partition of A, and if f ≺ g given Aᵢ for all i, then f ≼ g given A.

Axiom 9 is, I think, immensely attractive; so much so that an argument for it would seem to be redundant. However, this axiom, together with the preceding ones, entails that the probability function p (whose existence is guaranteed by Theorem 8.1) is countably additive. This means that if A₁, A₂, … is a sequence of disjoint (or mutually exclusive) events, then p(A₁ ∪ A₂ ∪ ···) = p(A₁) + p(A₂) + ···. By contrast, Axioms 1–8 merely entail that p is finitely additive; that is, they guarantee that p(A₁ ∪ ··· ∪ Aₙ) = p(A₁) + ··· + p(Aₙ) for finite n, but not the extension of this to infinite sequences of events. The proof that Axiom 9 entails countable additivity is given in Appendix D.

In the mathematical theory of probability and in applications of it in the sciences, it is generally assumed that probability is countably additive. Nevertheless, de Finetti has argued that we should not take countable additivity of probability to be a condition of rationality. One of his reasons for this is that if one knows an integer will be selected randomly, one might want to say that each integer has the same probability of being chosen and that the probability of some integer or other being chosen is 1. This violates countable additivity. De Finetti's other reason for rejecting countable additivity as a rationality condition is that he thinks there is no compelling argument for it, other than mathematical convenience. I will address these two reasons in turn.

If you told me you had selected an integer "at random," I would think 5 more likely than 5 million. Of course, sometimes nature "chooses" an integer, such as the number of stars in the universe or the number of hairs on my head. But here again, I have no inclination to judge all integers as equally likely to be selected. In fact, I have not been able to think of an actual case where it seems natural to assign equal probabilities to all elements of a countable partition. Perhaps this is a failure of imagination on my part; but at the very least it does indicate that there are few situations in which it is natural to assign probabilities in this way.

Turning now to de Finetti's claim that there is no argument for countable additivity, other than convenience: I would rebut this by noting that Axiom 9, together with other conditions accepted by de Finetti, entails that probabilities should be countably additive.

There is also an ad hominem argument: De Finetti accepts Dutch book arguments, and we can give a Dutch book argument for countable additivity. For example, suppose an integer is to be selected, and as de Finetti wishes to allow, you give each integer probability zero of being selected but are sure some integer will be selected. If you post these probabilities in a de Finetti-style Dutch book setup (cf. Section 4.6.4), you will have to accept bets in which you lose $1 if the integer selected is n and gain nothing if it is not n. But accepting all such bets gives you a sure loss of $1. (We can also make the bets each have a positive expected utility for you and still ensure that together they produce a sure loss.) Perhaps it will be objected here that finite agents cannot make infinitely many bets. But the bets need not be made serially; and even if finitude did preclude making infinite sets of bets, this does not seem to be a relevant objection to a thought experiment concerned with rationality conditions. Thus de Finetti cannot consistently reject countable additivity.

8.3.2 Acts and consequences
Besides Axiom 9, there are two more axioms that I need in order to prove the representation theorem for general cognitive utility. These two axioms require D and Y to have a certain structure. I think that the best procedure here is for me to first present what I will call the favored interpretation of D and Y; later I will show that this interpretation of D and Y satisfies the axioms yet to be stated.

I begin with the favored interpretation of Y. On this interpretation, for each nonempty A ∈ 𝐗, Y contains the cognitive consequence "accepting A when A is true." I will continue to use the notation (A, t) to refer to this consequence. Since consequences must be complete in all respects that are of value (Section 1.1), I am assuming here that if you accept A and it is true, the satisfaction of your cognitive goals is not dependent on which state in A is the true state. For example, if the only thing you accept is that all ravens are black, and if all ravens are indeed black, then on this assumption your cognitive utility is not dependent on whether all swans are white, or on how many ravens there are, or on anything else. (This assumption will be defended in Section 9.3.)

On the favored interpretation, Y also includes, for each nonempty A ∈ 𝐗 and each x ∉ A, the cognitive consequence "accepting A when the true state is x." This consequence will be denoted (A, x). In identifying the consequences of accepting a false hypothesis in this way, we make no assumptions about what factors influence the cognitive value of accepting a false hypothesis; we merely assume that the states are sufficiently specific to determine the values of whatever the relevant factors may be.

Acceptance of the empty set (or contradictory proposition) ∅ was made an exception in each of the preceding two paragraphs. That is because, on the one hand, ∅ is true in no state; and on the other hand, as argued in Section 8.1, accepting ∅ is a constant act; it gives the same consequence in every state. This consequence will be denoted (∅, f).

On the favored interpretation, Y contains all the consequences just mentioned, and no others. Thus Y is

{(A, t) : A ∈ 𝐗 \ {∅}} ∪ {(A, x) : A ∈ 𝐗 \ {∅}, x ∉ A} ∪ {(∅, f)}.

The criterion for identity of cognitive consequences will be just the usual criterion of identity for ordered pairs, namely

(φ₁, ψ₁) = (φ₂, ψ₂) iff φ₁ = φ₂ and ψ₁ = ψ₂.

I turn now to the interpretation of D. Let a countable experiment be an experiment in which the true element of some countable partition A₁, A₂, … is ascertained, and for each i = 1, 2, … there is a proposition Bᵢ that will be accepted if Aᵢ obtains. In the favored interpretation, D contains all countable experiments.

It may be that only finitely many of the Aᵢ in a countable partition are nonempty; hence the class of countable experiments includes those experiments associated with a finite partition. In particular, if only one of the Aᵢ is nonempty, the countable experiment is identical with the decision to simply accept some hypothesis. Thus acceptance decisions are also included in the class of countable experiments.

In addition to countable experiments, the favored interpretation of D has D containing a special experiment, which I will denote f_T. This experiment consists of ascertaining which state obtains, and then accepting that the true state does obtain. So the definition of f_T is that for all x ∈ X, f_T(x) = ({x}, t). Since Axiom 8 forces X to be uncountable,¹³ f_T is not a countable experiment; it is an uncountable experiment.

Besides countable experiments and f_T, D must contain mixtures of these acts, as required by Axiom 3. The favored interpretation of D is that it includes all these acts, and no others.

¹³ As clause (iv) of Theorem B.1 shows.

As a result of these stipulations, D is the class of all acts f that have the following form:

There is a countable partition {A₁, A₂, …} such that f involves ascertaining which event Aᵢ obtains and also, in case A₁ obtains, ascertaining which state obtains. If state x ∈ A₁ obtains, then {x} is accepted; and for each i = 2, 3, … there is a proposition Bᵢ that will be accepted if Aᵢ obtains.

It is to be understood here that only one of A₁, A₂, … need be nonempty.

I will draw on this favored interpretation of D and Y in discussing the reasonableness of the two remaining axioms.

8.3.3 Density of simple acts
The next axiom is easy to state.

Axiom 10. For all f, g ∈ D, if f ≺ g, then there exists a simple act h ∈ D such that f ≼ h ≺ g.

An alternative way of saying the same thing is to say that the simple acts are dense in D.

If Axioms 1–8 are satisfied, then the following conditions are jointly sufficient (but not necessary) for Axiom 10:

(i) There are "best" and "worst" acts; that is, there exist k, l ∈ D such that, for all f ∈ D, k ≽ f ≽ l.

(ii) For all f ∈ D, if f ≺ k or l ≺ f, then there exists a simple act h ∈ D such that f ≺ h or h ≺ f (respectively).

A proof of this is given in Appendix C. I will now show that both of these conditions can be expected to hold, given the favored interpretation of D and Y.

The act f_T gives true and complete information about the world,¹⁴ and so it can be expected to be at least weakly preferred to all other experiments; that is, f ≼ f_T for all f ∈ D. At any rate, this will be so if the preferences reflect the scientific concerns for truth and informativeness. Hence there is a "best" act, as (i) requires. Furthermore, the act of accepting the contradictory proposition ∅ is surely not preferable to any other cognitive act; so if f_∅ is the act of accepting ∅, we have f_∅ ≼ f for all f ∈ D. So there is a "worst" act too, and hence (i) is satisfied.

¹⁴ At least, it is complete relative to the set X of states, which is all that matters here.

Turning now to (ii), suppose that f ≺ f_T. For any positive integer n and finite partition A₁, …, Aₙ there is a possible experiment gₙ that consists of ascertaining which Aᵢ obtains, and accepting the one that does obtain. It is plausible that as n is increased without bound, and provided the events Aᵢ are chosen suitably, the gₙ would approach f_T in the preference ranking; in that case, there will be some n such that f ≺ gₙ. Now the possible consequences of gₙ are (A₁, t), …, (Aₙ, t), and hence are finite in number; thus gₙ is a simple act. Thus the part of (ii) dealing with the "best" act does hold. Furthermore, as I argued in Section 8.1, the act f_∅ is a constant act; hence it is a simple act; so if f_∅ ≺ f, then f_∅ itself is a simple act h such that h ≺ f. Thus both parts of (ii) hold.

So on the favored interpretation of D and Y, both (i) and (ii) can be expected to hold; from which it follows that Axiom 10 holds.

8.3.4 Structural assumptions
The final axiom imposes four conditions on the structure of D, Y, and 𝐗. I will first explain how each condition is satisfied on the favored interpretation of D and Y, then bring these conditions together to state the axiom.

Acceptance of a proposition A is an act all of whose consequences are (on the favored interpretation) of the form (A, ψ), for some ψ. Since (A, ψ) ≠ (B, ψ′) if A ≠ B, it follows that the possible consequences of accepting A are disjoint from the possible consequences of accepting any other proposition B. Using 'f(X)' to denote the set of possible consequences of act f,¹⁵ the fact just established can be expressed by saying that if f and g are the acts of accepting two different hypotheses, then f(X) ∩ g(X) = ∅.

¹⁵ More fully: f(X) = {y : f(x) = y, for some x ∈ X}.

On the favored interpretation, it is also true that each consequence in Y is a possible consequence of accepting some proposition or other; for each element of Y has the form (A, t), (A, x), or (∅, f), and the first two of these are possible consequences of accepting A, while (∅, f) is the (inevitable) consequence of accepting ∅. Letting E be the class of all acts of accepting some proposition or other, we can then say that ∪{f(X) : f ∈ E} = Y. Combining this result with that of the preceding paragraph, we have that

(i) {f(X) : f ∈ E} is a partition of Y.

Now let f be any element of E. This means there is some proposition A such that f is the act of accepting A. Thus one possible consequence of f is (A, t) (or (∅, f) if A = ∅), and all other possible consequences, if there are any, are of the form (A, x). Now a consequence of the form (A, x) can only be obtained if x is the actual state. So we have that for each f ∈ E there is at most one exception to the rule that all possible consequences of f are consequences that can be obtained in just one state. This result can be expressed formally by saying:

For all f ∈ E, there is at most one y ∈ Y such that f⁻¹(y) contains more than one element.

The consequences of the act f_T are all of the form ({x}, t), and so each possible consequence of f_T is obtainable in just one state. So for all y ∈ Y, f_T⁻¹(y) contains at most one element. Letting E′ be the set whose sole member is f_T, we now have the following strengthening of the preceding result.

For all f ∈ E ∪ E′, there is at most one y ∈ Y such that f⁻¹(y) contains more than one element.

So a fortiori we have

(ii) For all f ∈ E ∪ E′, there are at most countably many y ∈ Y such that f⁻¹(y) contains more than one element.

In Section 8.3.2, I said that on the favored interpretation D is the closure under mixing of the set of all countable experiments, together with f_T. It is easy to show that on this interpretation, the following is true.

(iii) For all f ∈ D, there is a sequence f₁, f₂, … of elements of E ∪ E′, and a partition A₁, A₂, … of X, such that for all positive integers i, f = fᵢ on Aᵢ.

We have already observed that the possible consequences of f_T are all of the form ({x}, t). The only way any of these can be a consequence of accepting a hypothesis A is if A = {x} for some x ∈ X; and in that case there is just one consequence that f_T and accepting A have in common. (For all other consequences of accepting A are of the form ({x}, y), for some state y ∉ A.) Thus for all h ∈ E, h(X) ∩ f_T(X) contains at most one element, and it is of the form ({x}, t). Now for any act f ∈ D, f⁻¹(({x}, t)) is either {x} or ∅. Consequently, we have

For all f ∈ D and h ∈ E, f⁻¹[h(X) ∩ f_T(X)] contains at most one element.

Since E′ = {f_T}, we therefore have

(iv) E′ contains at most one element, and for all f ∈ D, h ∈ E, and h′ ∈ E′, f⁻¹[h(X) ∩ h′(X)] contains at most one element.

Putting together these four results, we have that on the favored interpretation, this axiom is satisfied:

Axiom 11. There exist E, E′ ⊆ D such that

(i) {h(X) : h ∈ E} is a partition of Y.

(ii) For all h ∈ E ∪ E′, there are at most countably many y ∈ Y such that h⁻¹(y) contains more than one element.

(iii) For all f ∈ D, there is a sequence f₁, f₂, … of elements of E ∪ E′, and a partition A₁, A₂, … of X, such that for all positive integers i, f = fᵢ on Aᵢ.

(iv) E′ contains at most one element, and for all f ∈ D, h ∈ E, and h′ ∈ E′, f⁻¹[h(X) ∩ h′(X)] contains at most one element.

It is possible to weaken Axiom 11 in various ways without affecting the representation theorem (other than to make the proof more complicated). For example, it would not matter if E′ had any finite cardinality. Also, it is possible to weaken the requirement in clause (iv) that f⁻¹[h(X) ∩ h′(X)] contain at most one element. Just how far such weakenings can go is an open question. However, I show in Section D.5 that Axiom 11 cannot be entirely eliminated.

8.3.5 Representation theorem
The expected utility representation whose existence is given by Theorem 8.1 was limited to preferences concerning simple acts. With the addition of Axioms 9–11, we obtain a representation theorem without that limitation; it applies to the preference relation ≼ on D, without restriction.

Also, the utility function whose existence is given by Theorem 8.1 was unique only up to a positive affine transformation on Y*, and was quite arbitrary on Y \ Y*. The utility function whose existence can now be derived satisfies the following much stricter uniqueness condition: If p and u are probability and utility functions that give the expected utility representation of preference on D, then u′ is another utility function that can be substituted for u while preserving that representation iff there exist real numbers ρ and σ, ρ > 0, such that for all f ∈ D, the set of all x such that

u′[f(x)] = ρu[f(x)] + σ

is an event with probability 1. I will describe this by saying that u is unique, up to a positive affine transformation, almost everywhere on Y.¹⁶

So the representation theorem is the following:

Theorem 8.2. If Axioms 1–11 hold, then there exists a probability function p on 𝐗, and a utility function u on Y, such that for all f, g ∈ D,

f ≼ g iff EU(f) ≤ EU(g),

where EU is expected utility calculated using p and u. The probability function is unique; and the utility function is unique, up to a positive affine transformation, almost everywhere on Y.

¹⁶ There is also a technical requirement: The composition u′ ∘ f of functions u′ and f must be measurable; otherwise the expected utility of act f would not be defined.

The proof of Theorem 8.2 is given in Appendix D.

Thus we now have a representation theorem that establishes an expected utility representation for cognitive preferences even when the acceptance of a hypothesis is not regarded as a simple act. This provides a vindication of the decision-theoretic account of rational acceptance that I gave in Section 6.3. It also provides the foundation for the following discussion of scientific values.


9

Scientific values

Chapter 6 put forward the idea that scientists are concerned to accept theories that are informative and close to the truth; and it was assumed there that these values can be represented by a cognitive utility function. Chapter 8 has provided support for the idea that cognitive values can be represented by a cognitive utility function. However, a person's preferences can satisfy all the axioms of Chapter 8 without that person having the sorts of values we would call scientific; in this case, the person's preferences are representable by a cognitive utility function, but that function does not reflect scientific values.

This chapter investigates the question of what sort of cognitive values count as scientific. I articulate and defend the view put forward in Chapter 6: that science values informativeness and closeness to the truth, and nothing else.

9.1 TRUTH

The notion of truth figures centrally in my account of scientific values. This notion is sometimes regarded as a dubious one, in need of clarification before it can be used with a good conscience. So I should perhaps begin by putting such worries to rest. Fortunately, that is easily done.

All we need to know about truth is that it has what Blackburn (1984) calls the transparency property. This property is that statements of the form 'It is true that p' mean the same as 'p' (where p is to be replaced by any sentence).¹ For example, 'It is true that sodium burns with a yellow flame' means the same as 'Sodium burns with a yellow flame'. Since the meaning of the latter sentence is unproblematic (if anything is), so is the meaning of the former. Consequently, talk of truth need not be mysterious or problematic.

¹ It would be customary to restrict p here to sentences that do not themselves contain the predicate 'is true', or to impose some other restriction to the same effect. But I think it is more elegant to follow the proposal of Gupta (1989), which does not require such a restriction, and instead provides a method for understanding circular definitions.

In my experience, those who find truth a problematic notion are often failing to distinguish between truth and certainty. For example, it is sometimes said that since we cannot be certain of anything, there is no such thing as absolute truth or falsity. If "absolute truth" means certainty, this little argument is sound, but trivial. If "absolute truth" means truth (that is, if it has the transparency property), then the inference is a non sequitur. In fact, if one accepts the logical principle of excluded middle

p or not-p

then it follows that every statement is either true or false.

Proof. Let p be any statement. From excluded middle and the transparency property we obtain

p is true or not-p is true.

Since 'p is false' means 'not-p is true', this is the desired result.

I mentioned that what I say about truth in this book requires only the transparency principle in order to be intelligible. For example, I hold that the scientific utility of accepting a statement A is at least as great when A is true as when A is false. This claim can be restated by saying that the scientific utility for 'A is accepted, and A' is at least as great as that for 'A is accepted, and not-A'. The latter formulation does not use the notion of truth; and so it suffices to explain what is meant by truth here.

9.2 NECESSARY CONDITIONS

Someone who prefers that accepted hypotheses be false would violate the condition I stated in the preceding paragraph, and I would say that such a person has unscientific values. Nevertheless, the person could satisfy all the axioms of Chapter 8 and hence have a cognitive utility function. In this case, the cognitive utility function does not represent scientific values.

Let a scientific utility function be any cognitive utility function that represents scientific values. That is, a scientific utility function is a cognitive utility function such that, if a person had it, we would say that person had scientific values. I do not assume that there is some one, essentially unique,² scientific utility function. In fact, I am quite sure such a thing does not exist. But some kinds of values are clearly unscientific. In this section, I will propose four necessary conditions for a cognitive utility function to count as scientific. If these conditions are accepted, then any cognitive utility function that violates them is unscientific; but there can still be many, essentially different, cognitive utility functions that are scientific.

9.2.1 Respect for truth
The first necessary condition for a scientific utility function has already been mentioned: The cognitive utility of accepting a hypothesis when it is false cannot be higher than that of accepting the hypothesis when it is true. Letting u(A, x) denote the cognitive utility of accepting A when the true state is x, we can formalize this condition as

u(A, x) ≥ u(A, y), for all A ∈ 𝐗, x ∈ A, and y ∉ A. (9.1)

(For those who skipped Chapter 8: 𝐗 is a set of subsets of the set of states X. Elements of 𝐗 are thus sets of states, and are regarded as events or propositions.)

In Chapter 8, I assumed that the cognitive utility of accepting A when it is true is independent of which state in A obtains. That is,

u(A, x) = u(A, y), for all A ∈ 𝐗 and x, y ∈ A. (9.2)

In this case, we can let u(A, t) denote the cognitive utility of accepting A when it is true; that is, u(A, t) will be the value of u(A, x) for any x ∈ A. And then we can express (9.1) a little more perspicuously as

u(A, t) ≥ u(A, x), for all A ∈ 𝐗 and x ∈ X. (9.3)

² For those who read Chapter 8: What I mean by "essentially unique" is unique up to a positive affine transformation almost everywhere.

Because the inequality is weak, (9.3) does not say that a positive value is put on truth, only that a positive value is not put on falsehood. So (9.3) is consistent, for example, with the view that scientists care about whether their theories correctly describe observable phenomena but do not care whether they are correct about unobservable entities.³ Although I suspect that scientists typically prefer to be right about unobservables as well as observables, I do not think that someone who is indifferent about correctly describing unobservables is thereby unscientific. For this reason, I would not require the inequality in (9.3) to be strict. More generally, there may be many things a person does not care whether they are right or wrong about, and this does not make a person unscientific; hence the weak inequality in (9.3) cannot be replaced by a strict one.

³ This view, which has been put forward by van Fraassen, will be discussed in Section 9.6.

A number of philosophers of science have maintained that scientists do not value truth but do value agreement between theory and available evidence. Kuhn (1977, 1983) took this view, and similar positions have been taken by Laudan (1977), Ellis (1988), and others. Superficially, this position may appear to be in conflict with both (9.1) and (9.2). For might it not be the case that the evidence for a theory is better in some states in which the theory is false than in some states in which the theory is true? And might it not be the case that the evidence for a theory could vary in strength in different states in which the theory is true? If so, and if agreement with evidence is a scientific value, then we have a conflict with (9.1) and (9.2).

However, it must be remembered that acceptance is being understood here as acceptance of a total corpus (cf. Section 6.3.1). If the evidence is different in different states, and if that evidence is accepted, then what is accepted as corpus is different in the different states. But for a counterexample to (9.1) or (9.2), what is needed is a case in which the same proposition A is accepted in two different states. Hence so long as we are dealing with accepted evidence, claims about cognitive utility being a function of the available evidence cannot be inconsistent with (9.1) or (9.2).

If the utility of accepting A were a function of the probability of A, then we could get counterexamples to (9.1) and (9.2). For example, suppose that accepting A when it has a high probability has higher utility,⁴ other things being equal, than accepting A when it has a low probability. Then the states will need to specify the probability of A; and if A is true in both states x and y but has a higher probability in the former, then u(A, x) > u(A, y), in violation of (9.2).

While the situation envisaged would be a counterexample to (9.2), there is no reason to think accepting propositions with a high probability is part of the ends of science. To the extent that high probability is a reason for accepting a theory, this can be understood as being because high probability increases the expected utility of accepting a theory; we do not need to suppose in addition that it increases the actual utility of accepting a theory, when that theory is true. In other words, the desirability of high probability is adequately accounted for by taking it to be a means to the end of accepting true theories, and without taking it to be an end in itself.

Furthermore, if the utility of accepting a true theory covaried with the probability of that theory, we would get into the sort of difficulties discussed in Section 7.3. We would have that arbitrary shifts in probability could increase expected cognitive utility and that gathering cost-free evidence could reduce expected cognitive utility. Since both possibilities conflict with our views about what would be rational so far as scientific goals are concerned, we have a strong positive reason to deny that the utility of accepting a true theory varies with the probability of that theory. And more generally, we have positive reason to say that the utility of accepting a theory does not depend on the probability of that theory.

⁴ 'Utility' here is actual utility, not expected utility. The expected utility of accepting A will covary with the probability of A; but we are considering here the quite different claim that the actual utility of accepting A covaries with the probability of A.

So I will assume that both (9.1) and (9.2) are necessary conditions for a utility function to represent scientific values.

9.2.2 Respect for information
The concern for informative truths would clearly be violated if we did not have

u(AB, t) ≥ u(A, t), for all A, B ∈ 𝐗. (9.4)

Since the inequality is weak, a violation of this condition would be a situation in which a positive value is put on avoiding some information. For example, married persons may sometimes judge it preferable to be agnostic about their spouse's fidelity, rather than accept the existence of infidelity, even when there is indeed infidelity. This is not necessarily irrational (cf. Section 1.8); but it is unscientific.

Suppose a, b, and c are true statements. Then abc ∨ āb̄c̄ and abc ∨ ābc ∨ āb̄c̄ are both true statements, with the former being logically stronger than the latter; hence (9.4) entails that accepting abc ∨ āb̄c̄ should have cognitive utility at least as high as accepting abc ∨ ābc ∨ āb̄c̄. However, Tichy (1978) and Oddie (1981) have argued that in this example, it is better to accept abc ∨ ābc ∨ āb̄c̄ than to accept abc ∨ āb̄c̄. Their reasoning is that in accepting abc ∨ āb̄c̄ one stands a 50 percent chance of being as far from the truth as possible, while accepting abc ∨ ābc ∨ āb̄c̄ gives one a 2/3 chance of being close to the truth and only a 1/3 chance of being as far from the truth as possible.⁵ If Tichy and Oddie were right about this, then we would have a counterexample to (9.4). But they are not right. Acceptance of a disjunction is acceptance of the disjunction, not acceptance of a randomly selected disjunct. So, in particular, acceptance of abc ∨ āb̄c̄ does not give one any chance of accepting āb̄c̄, and hence it does not give one any chance of being as far from the truth as possible. Thus we have not yet been given a good reason to reject (9.4) as a necessary condition for a scientific utility function. (For a more extended discussion of the position of Tichy and Oddie, see [Niiniluoto 1987, sec. 6.6].)

⁵ These judgments of closeness to the truth depend on Tichy's measure of verisimilitude, which will be presented in Section 9.4.2.
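On Tichy's measure, the distance of a state description from the truth is simply the number of atomic statements it gets wrong, and the fractions cited above can be checked mechanically. The sketch below encodes each disjunct as a truth-value assignment to (a, b, c), following the reconstruction of the example given here:

# Checking the Tichy-Oddie arithmetic; the truth: a, b, c all true.
TRUTH = (True, True, True)

def distance(disjunct):
    """Tichy distance: number of atoms the state description gets wrong."""
    return sum(v != t for v, t in zip(disjunct, TRUTH))

stronger = [(True, True, True), (False, False, False)]
weaker = [(True, True, True), (False, True, True), (False, False, False)]
for name, theory in (("stronger", stronger), ("weaker", weaker)):
    worst = sum(distance(d) == 3 for d in theory)
    print(f"{name}: {worst}/{len(theory)} disjuncts maximally false")
# stronger: 1/2 (the alleged 50 percent risk); weaker: 1/3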


The weak inequality in (9.4) should not be made strict. For one thing, A might entail B, in which case AB expresses the same proposition as A, and so identity must hold in (9.4). But even if A does not entail B, it need not be unscientific to have u(AB, t) = u(A, t). For example, let A specify the age of the universe and B specify this week's top-selling record; I don't care whether I know B, and so for me u(AB, t) = u(A, t), even though A does not entail B; but this is surely not a reason to say my values are unscientific. For another sort of example, suppose (as in the example of Section 6.3.2) that we are interested in the value of some real-valued parameter, and that the cognitive utility of accepting a true interval estimate depends only on the length of the interval, with intervals of length 1 having utility k/2, for some k. Then setting A = [0, 1] and B = {r : r ≠ 1}, we have

u(AB, t) = u(A, t) = k/2,

even though A does not entail B. But I would not want to call someone unscientific, merely because they have this utility function.

On the other hand, someone who does not care about being right about anything would not have scientific values. Thus for a person with scientific values, there must be some proposition A such that the person thinks that accepting A when it is true is better than complete agnosticism on empirical matters. Formally, this amounts to saying that the following condition is necessary for scientific values:

u(A, t) > u(X, t), for some A ∈ 𝐗. (9.5)

9.2.3 Impartiality
If science is a disinterested search for truth, then the scientific utility of accepting the whole truth would be the same, whatever that truth may be. Formally, this condition is that

u({x}, t) = u({y}, t), for all x, y ∈ X. (9.6)

This condition requires that the utility of accepting the whole truth be the same in every state, and hence that the graph of the utility of this act be a horizontal straight line.

But is science a disinterested search for truth, in this sense? It is often said that simplicity is a goal of science. This claim can be interpreted in different ways, and on one interpretation it is inconsistent with (9.6). I will therefore mention here the main interpretations and discuss their compatibility with (9.6).

Some writers, such as Jeffreys (1931), have held that scientists prefer simple theories because these have a higher probability of being true. On this view, simplicity is valued not because it is one of the ends of science, but rather because it is viewed as a means to the end of having true theories. Hence this view is consistent with the idea that the cognitive utility of accepting a complete true theory does not depend on the simplicity of the theory; in other words, the view is consistent with (9.6).

Popper (1959), Kemeny (1955), Sober (1975), and Rosenkrantz (1977) have all held that the simplicity of a theory covaries with the content of the theory. On this view too, simplicity is a means to the goal of having true informative theories, not a separate end of science. So this view of the role of simplicity in science is also consistent with (9.6).

Finally, there are those, such as Kuhn (1977, 1983), who see simplicity as an ultimate goal of science, not a means to the goal of having true informative theories. For these, the utility of accepting a complete true theory will vary, depending on the simplicity of the theory (and, in Kuhn's case, other factors also). So if this view is accepted, (9.6) would have to be rejected as a condition on a scientific utility function.

This is not the place to attempt a definitive account of the role of simplicity in science. However, I will express my sympathy with the view that simplicity is not an ultimate end of science, but only a means to the goal of having true informative theories. I also note that scientists themselves often talk in a way that fits better with the idea that simplicity is valued as an indicator of truth, rather than an end in itself. For example, the biologist Francis Crick writes, "Elegance and a deep simplicity, often expressed in a very abstract mathematical form, are useful guides in physics, but in biology such intellectual tools can be very misleading" (Crick 1988, p. 6). A similar statement, by the physicist John Schwarz, is quoted in (Kaku and Trainer 1987, p. 195). These scientists evidently value simplicity only so long as it appears to be a reliable guide; hence it is for them a means to an end, not an end in itself. Going back further, Newton's invocation of simplicity is supported with the claim that it is an indicator of truth: "Nature is pleased with simplicity, and affects not the pomp of superfluous causes" (Newton 1726, vol. II, p. 398).

Consequently, while I acknowledge that simplicity plays a role in theory choice, I hold that it is not an intrinsic goal of science; and hence I hold that (9.6) really is a necessary condition for a utility function to be scientific. Furthermore, reducing the number of postulated ends of science is surely a gain in simplicity, so the advocates of simplicity would themselves seem committed to agreeing that simplicity should not be taken to be an end of science.

9.2.4 Contradiction suboptimal
Conditions (9.3)–(9.6) impose no restrictions on preferences regarding acceptance of the contradictory proposition ∅. (In the case of (9.3)–(9.5), this is because there is no such cognitive consequence as (∅, t).) But someone who weakly prefers accepting a contradiction to accepting the complete truth is surely being unscientific, if not irrational. Hence a further condition satisfied by scientific utility functions is

u({x}, t) > u(∅, y), for all x, y ∈ X. (9.7)

I would not claim that conditions (9.3)–(9.7) are jointly sufficient for a cognitive utility function to be scientific, though I am not currently aware of any further conditions that seem plausible to me. But I doubt that any reasonable conditions on a scientific utility function could be so strong as to identify an essentially unique utility function as scientific.
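Since conditions (9.1)–(9.7) are all simple (in)equalities, they can be verified mechanically on a finite toy model. The following sketch is my own construction: the particular u, which rewards informativeness and penalizes falsehood, is just one arbitrary function passing the tests, not a canonical scientific utility function:

# Toy check of conditions (9.1)-(9.7) on a three-state space.
from itertools import combinations

X = frozenset({"x", "y", "z"})
def events(s):
    """All subsets of the state space."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

u = {}
for A in events(X):
    if not A:
        continue                    # accepting the contradiction handled below
    u[(A, "t")] = len(X) - len(A)   # truth rewarded more when A is informative
    for s in X - A:
        u[(A, s)] = -1.0            # accepting A when it is false
u[(frozenset(), "f")] = -2.0        # constant consequence of accepting a contradiction

ok_9_1 = all(u[(A, "t")] >= u[(A, s)] for A in events(X) if A for s in X - A)
ok_9_4 = all(u[(A & B, "t")] >= u[(A, "t")]
             for A in events(X) if A for B in events(X) if A & B)
ok_9_5 = any(u[(A, "t")] > u[(X, "t")] for A in events(X) if A)
ok_9_6 = len({u[(frozenset({s}), "t")] for s in X}) == 1
ok_9_7 = all(u[(frozenset({s}), "t")] > u[(frozenset(), "f")] for s in X)
print(ok_9_1, ok_9_4, ok_9_5, ok_9_6, ok_9_7)   # all True for this u

Condition (9.2) holds in this model by construction, since the utility of accepting A when true is stored as the single value u[(A, "t")].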

9.3 VALUE INCOMMENSURABILITY

Kuhn (1977) has maintained that certain values are partly constitutive of science, but that these values can be interpreted differently by different scientists. He also says the weight to be given to each can be assessed in different ways without being unscientific. In (1962), Kuhn seemed to be arguing from these premises to the conclusion that scientific debates cannot be rationally persuasive. For example, he wrote:

To the extent, as significant as it is incomplete, that two scientific schools disagree about what is a problem and what a solution, they will inevitably talk through each other when debating the relative merits of their respective paradigms. In the partially circular arguments that regularly result, each paradigm will be shown to satisfy more or less the criteria that it dictates for itself and to fall short of a few of those dictated by its opponent. (pp. 109f.)

Doppelt (1978) has elaborated and defended this argument that divergent values entail the incommensurability of theories.

While I differ from Kuhn on what the ultimate scientific values are, I agree with him that values necessarily play a role in theory choice, and that different scientists can have different values without being unscientific. Am I then committed to the view that scientific debates cannot be rationally persuasive? I am not.

It is true that if two scientists have different values, then they can disagree about the relative merits of different theories, without either having made a mistake. For example, suppose the utility of accepting theory A when it is true is 0.6 for scientist 1, and 0.4 for scientist 2; and suppose further that the utility of accepting A when it is false is −0.5 for both. Then if both agree that the probability of A is 0.5, accepting A will have a positive expected utility for scientist 1 and a negative expected utility for scientist 2. Assuming that the expected utility of suspending judgment is zero, this means that scientist 1 prefers accepting A to suspending judgment, while scientist 2 has the opposite preference. And neither can be said to be wrong; their difference simply reflects different values.
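The expected utilities in this example, and in the resolution described next, come from a one-line computation; a quick sketch with the example's numbers:

# EU of accepting A: p(A) * u(accept A, A true) + (1 - p(A)) * u(accept A, A false).
def eu_accept(p_A, u_true, u_false=-0.5):
    return p_A * u_true + (1 - p_A) * u_false

for p_A in (0.5, 0.6, 0.4):
    eu1, eu2 = eu_accept(p_A, 0.6), eu_accept(p_A, 0.4)   # scientists 1 and 2
    print(f"p(A) = {p_A}: EU1 = {eu1:+.2f}, EU2 = {eu2:+.2f}")
# p = 0.5: +0.05 vs -0.05 (disagreement); p = 0.6: both positive;
# p = 0.4: both negative.  Suspending judgment has expected utility 0.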

But it does not follow from this that the difference between these scientists cannot be resolved by rational means. As further evidence is acquired, the probability of A will shift. If it rises to 0.6, then scientist 2 will come to agree with scientist 1 that accepting A is preferable to suspending judgment. Similarly, if the probability of A falls to 0.4, then scientist 1 will adopt scientist 2's preferences. Thus it is possible for the difference between these scientists to be resolved by gathering further evidence. This is a rational way to resolve their difference and does not require either party to adopt the values of the other.6
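To make the arithmetic explicit, here is a minimal Python sketch of this example. The utility figures are those given above; treating acceptance as a simple two-outcome gamble is an illustrative simplification.

```python
# Expected cognitive utility of accepting a theory, given the
# probability that it is true and the utilities of the two outcomes.
def expected_utility(p_true, u_true, u_false):
    return p_true * u_true + (1 - p_true) * u_false

for p in (0.5, 0.6, 0.4):
    eu1 = expected_utility(p, u_true=0.6, u_false=-0.5)  # scientist 1
    eu2 = expected_utility(p, u_true=0.4, u_false=-0.5)  # scientist 2
    print(f"p(A) = {p}: EU1 = {eu1:+.2f}, EU2 = {eu2:+.2f}")

# p(A) = 0.5: EU1 = +0.05, EU2 = -0.05  (they disagree about accepting A)
# p(A) = 0.6: EU1 = +0.16, EU2 = +0.04  (both prefer accepting A)
# p(A) = 0.4: EU1 = -0.06, EU2 = -0.14  (both prefer suspending judgment)
```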

Of course, there is no guarantee that the requisite evidence will be found to settle scientific differences resulting from different values. But the history of science indicates that differences usually do get settled, and this happens primarily as a result of additional evidence being acquired, not by resolving divergent values. On the other hand, at the cutting edge of research there always are disagreements, and we should not require a philosophy of science to deem all such disagreements as irrational or unscientific. In fact, as Kuhn (1977) has persuasively argued, scientific progress is enhanced by the fact that different scientists sometimes accept different theories. On the account I am offering, such differences may be due in part to different probabilities on the part of the scientists involved, but may also be due to a difference in their values, even though those values are scientific.

6. Subsequently, Kuhn himself has noted that evidence can produce agreement amongst scientists with different algorithms or values (Kuhn 1977, p. 329). He does still maintain that change in theory produces a change in values, but notes that this "does not make the decision process circular in any damaging sense" (p. 336). Thus before Doppelt's article was published, Kuhn had repudiated the thesis Doppelt attributed to him.

9.4 VERISIMILITUDE

In Chapters 6 and 8, I allowed for the possibility that scientists would rather a false accepted theory be close to the truth than not so close. But what does it mean for a false theory to be more or less close to the truth? This notion cannot be explicated in the way I explicated the notion of truth itself. Furthermore, skepticism about the meaningfulness, or usefulness, of this notion has often been expressed. I will now argue that the notion is worth trying to make sense of, and that the representation theorem of Chapter 8 provides a way of making sense of it.

9.4.1 Why verisimilitude?

The idea that false theories can be more or less close to the truth has often been invoked by those who see science as aiming at truth. An important motivation has been to give an account of progress in science. The history of science is a history of false theories, and yet we want to say that science is making progress. Consequently, those who take science to be aiming at truth are apt to suggest that science is making progress in that it is tending to approach truth more and more closely, even though all past and present scientific theories may be strictly false.7 According to this view, Newtonian mechanics is an advance over Aristotelian mechanics because, despite the fact that both these theories are false, Newtonian mechanics is closer to the truth.

7. This view of the nature of scientific progress has been expressed by Popper (1963, p. 231), Niiniluoto (1984, ch. 5), and Newton-Smith (1981), among others.

But does a realist really need to take this approach? An alternative suggestion (made by Giere [1983, sec. 9]) is that scientists should not accept that their theories are literally correct, but rather that reality is like their theoretical models in certain specified respects, to within certain specified degrees of error. If the qualifications here are chosen suitably, what is accepted will stand a good chance of being true. One could then say that progress in science consists in this sequence of generally true propositions becoming more informative over time.

There are several problems facing this latter approach. The first is that scientists do not appear to have put limits on what they accepted that were sufficiently broad to make it the case that what they accepted was true. For example, the historical evidence strongly suggests that Newtonians accepted that gravitation acts instantaneously over the whole universe, that the mass of bodies is completely independent of their velocity, and so on. No margins of error for these theories were specified; and if the scientists had some tacit margins in mind, there is no reason to think these were broad enough that what the scientists accepted was true by our lights (i.e., consistent with relativity theory). So Giere's approach is not able to prevent the conclusion that Newtonians accepted false propositions. Since we nevertheless want to say that Newtonianism represented progress over Aristotelianism, we still have a motivation for saying that some false theories are closer to the truth than others.

Defense of the progressiveness of science is not the only reason for allowing that false theories can be more or less close to the truth. In many cases, the notion is intuitively compelling in its own right. Consider, for example, Cavendish's hypothesis:

H_C: The electrostatic force falls off as the nth power of the distance, for some n between 1.98 and 2.02.

This hypothesis is false if the true value of n is anything less than 1.98 or greater than 2.02; but we would certainly be inclined to say that Cavendish was closer to the truth if the true value of n was 2.021 than if it was 3.

9.4.2 A little history

There is, then, good motivation for saying that false theories can be more or less close to the truth. But we can say this meaningfully only if the required notion of distance from truth is meaningful. On this point, there is considerable skepticism; for example, Laudan (1977, pp. 125f.) writes that "no one has been able even to say what it would mean to be 'closer to the truth.'" Laudan's complaint is not that nobody has tried to say what it means; and in fact, the literature now contains a large number of attempts to say what it means. But there is a widespread sense that these attempts have not been successful, and this has fueled skepticism about the possibility of making clear sense of the notion.

The first rigorous account of 'closeness to the truth' was given by Popper (1963, pp. 231-7, 391-403; 1972, pp. 47-60). The notion of closeness to truth that interested Popper was closeness to the whole truth; thus a true but uninformative statement is not close to the truth, in this sense. Popper introduced the term verisimilitude to refer to closeness to the truth in this sense. Thus the verisimilitude of a theory is the degree to which it approximates the ideal of entailing all truths and no falsehoods. Popper observed that if all true consequences of B are also entailed by A, and if any false consequence of A is also entailed by B, the verisimilitude of A is at least as great as that of B. If in addition A entails some true statements not entailed by B, or B entails some false statements not entailed by A, then Popper said the verisimilitude of A is greater than that of B. These conditions on verisimilitude have come to be called Popper's "qualitative definition of verisimilitude."

Unfortunately, this qualitative definition permits the verisimilitude of theories to be compared only in special cases. To compare the verisimilitude of A and B, it must be the case that either (i) all true consequences of B are also entailed by A, and any false consequence of A is also entailed by B; or (ii) this relation must hold with A and B interchanged. However, Tichy (1974) and Miller (1974) showed that no two false theories satisfy these conditions, and hence no two false theories are comparable using Popper's qualitative definition of verisimilitude. But a central motivation for introducing the notion of verisimilitude is to say that some false scientific theories are closer to the truth than earlier ones.

Popper also gave a quantitative definition of verisimilitude, which was intended to define the verisimilitude of all statements. Letting H_t be the conjunction of the true consequences of H, and p a logical probability function, Popper suggested that the truth content of H could be taken to be measured by 1 − p(H_t). He also suggested that the falsity content of H could be measured by 1 − p(H|H_t). Thinking of the verisimilitude of a statement as its truth content minus its falsity content then leads to the definition8

v(H) = p(H|H_t) − p(H_t). (9.8)

One problem with (9.8) is that it rests on the assumption that there is such a thing as logical probability. Ramsey had already argued cogently against the existence of such a thing in (1926), and Carnap's later heroic efforts (1952) to make sense of the notion in the end only confirmed that Ramsey had been right.9 Nor could we substitute subjective for logical probability in (9.8), since that would make it impossible to be certain of the truth of any hypothesis with positive verisimilitude. As Popper himself insisted, verisimilitude is not meant to be an epistemic notion.

Even if the required logical probability function existed, (9.8) would still be an unsatisfactory definition for other reasons.

8. For numerical calculations of verisimilitude, Popper prefers to divide the following definiendum by the normalization factor p(H|H_t) + p(H_t) (Popper 1963, p. 399; 1972, p. 334). But this is a refinement that need not concern us here.

9. In a penetrating recent study, van Fraassen (1989, ch. 12) reaffirms this conclusion.

Note that a statement is a true consequence of H just in case it is a consequence of H ∨ T, where T is the true complete theory (relative to some partition). Hence we can substitute H ∨ T for H_t in (9.8), obtaining

v(H) = p(H | H ∨ T) − p(H ∨ T).

If H is false, then H and T are disjoint, and we get

v(H) = p(H) / [p(H) + p(T)] − [p(H) + p(T)].

Thus the verisimilitude of false theories is a function of the probability of the theory. In particular, all false theories with the same probability have the same verisimilitude. As an example of this, suppose an urn contains 100 balls, each of which is either black or white. Proponents of logical probability would say that (in the absence of other information) each attribution of colors to balls has the same logical probability. Then if all the balls are in fact white, Popper's account deems the hypothesis that all the balls are black just as close to the truth as the hypothesis that balls 1-99 are white and ball 100 is black. But this is not the judgment we would make. (Tichy [1974] made essentially this point.)

Although Popper's attempts to define verisimilitude were unsuccessful, their failure prompted a host of other attempts to achieve Popper's goal: to define a mathematical function that would be a measure of the verisimilitude of a hypothesis. Tichy made a start at the end of his 1974 paper criticizing Popper's approach; since this proposal was relatively simple, and since many later proposals are generalizations or modifications of the sort of approach Tichy adopted, it will be worthwhile to briefly review this first proposal of Tichy's.

Tichy considered a language in which there are finitely many primitive sentences, and all other sentences in the language are truth-functional combinations of these sentences. (So the language belonged to what is called "propositional logic.") For a simple example, let us suppose there are just three primitive sentences, a, b, and c. The maximally specific sentences that can be expressed in this language are the eight sentences of the form abc, ¬abc, . . . , ¬a¬b¬c. These are called constituents, and they are the linguistic equivalent of states of nature. In particular, the constituents form a partition: Exactly one of them must be true. Tichy proposed that the distance of a constituent from the truth could be taken to be measured by the number of primitive sentences that the constituent is wrong about. For example, if a, b, and c are all true, the distance of abc from the truth would be 0; the distance of ¬abc from the truth would be 1; the distance of ¬a¬bc from the truth would be 2; and so on.

That definition only defines distance from truth for constituents. It remains to extend the definition to other sentences. Tichy noted that for languages of the kind here considered, any sentence that can be expressed in the language is equivalent to a disjunction of constituents. For example, the sentence ab is equivalent to abc ∨ ab¬c. The disjunction of constituents that is equivalent to a given sentence is called the disjunctive normal form of that sentence. Tichy proposed that the distance from the truth of any sentence in the language could be taken to be the average distance from the truth of the constituents in the disjunctive normal form of the sentence.10 For example, the distance of ab from the truth, on this proposal, would be (0 + 1)/2 = 1/2.
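Tichy's proposal is easy to implement. The following Python sketch does so for the three-atom language of the text, assuming (as in the example) that a, b, and c are all true.

```python
from itertools import product

ATOMS = ('a', 'b', 'c')
TRUTH = {'a': True, 'b': True, 'c': True}

# A constituent is a truth-value assignment to all the atoms.
CONSTITUENTS = [dict(zip(ATOMS, vals))
                for vals in product((True, False), repeat=len(ATOMS))]

def constituent_distance(w):
    # Number of primitive sentences the constituent is wrong about.
    return sum(w[atom] != TRUTH[atom] for atom in ATOMS)

def distance_from_truth(sentence):
    # Average distance of the constituents in the sentence's
    # disjunctive normal form, i.e., of the constituents satisfying it.
    models = [w for w in CONSTITUENTS if sentence(w)]
    return sum(constituent_distance(w) for w in models) / len(models)

# The sentence ab has disjunctive normal form abc v ab¬c:
print(distance_from_truth(lambda w: w['a'] and w['b']))  # (0 + 1)/2 = 0.5
```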

The notion of distance from truth that Tichy intended to explicate is distance from the whole truth (where the whole truth is represented by the true constituent). Distance from the truth is thus the inverse of the notion of verisimilitude. So given a definition of distance from truth, we can define verisimilitude as a function that varies inversely with distance from truth. Thus Tichy's definition of distance from truth also provides a definition of verisimilitude.

How satisfactory is Tichy's definition, within its intended domain of application? Miller (1974, sec. 6) pointed out that on Tichy's definition, the verisimilitude of one and the same statement may be different, depending on what the primitive sentences of the language are taken to be. Tichy denies that this is the case. His argument (Tichy 1978, sec. 8) seems to assume that two sentences do not express the same statement unless both occur in a single language, and definitions that entail the equivalence are adopted. This is surely false. 'Der Schnee ist weiss' in German expresses the same statement as 'Snow is white' in English, though neither sentence occurs in the other language. (Quotation names of each sentence occur in the other language, but the sentences themselves do not.)11

10. Tichy (1974, p. 159) writes that this quantity is to be taken as the verisimilitude of the sentence; but this is a slip.

11. For another criticism of Tichy's position, see (Urbach 1983, sec. 1).

But even if Tichy were right about this, there would remain a problem of arbitrariness in Tichy's account. On Tichy's account, the verisimilitude of a sentence uttered by a scientist depends on what we take the scientist's language to include, and on which sentences in that language are taken to be primitive. If these choices of language and primitive sentences are arbitrary (and Tichy has given no reason to think they are not), then the verisimilitude to be attributed to a sentence uttered by a scientist will also be a matter of arbitrary choice. Such arbitrariness is unacceptable because an important motivation for introducing the notion of verisimilitude in the first place was to be able to say that scientific theories are getting closer to the truth. If the verisimilitude of those theories is arbitrary, then the claim that science is getting closer to the truth is equally arbitrary.

9.4.3 The case for a subjective approach

Although we have no use for an arbitrary conception of verisimilitude, it is a mistake to think that verisimilitude can be independent of human interests or values. Every false theory is like the truth in some respects and unlike it in others; so to say that it is like (or unlike) the truth overall requires a judgment as to which respects are important. If this judgment is not to be arbitrary, it must be founded on the degree to which we value correctness of various kinds.

Many philosophers of science seem intent on banishing value judgments from science; and so if persuaded that verisimilitude rests on value judgments, they would banish it too. (This seems to be the position of Urbach [1983].) But even if verisimilitude were banished, value judgments would remain. If one concedes that science aims to have theories that are both true and informative, then comparison of theories requires weighing the relative importance of these different desiderata. For example, if A is logically stronger than B, then acceptance of A would give more information than B, but also runs a greater risk of error. The decision of which to accept (as corpus) thus requires weighing the two desiderata against each other. If this weighting is not to be arbitrary, it must reflect our values.

We might also ask: What does it mean to "banish" verisimilitude? Presumably it means that the value of accepting a theory when it is false is independent of which aspects of the theory are false. But this is no less a value judgment than the contrary claim, that the value of accepting a false theory may be different depending on what aspects of the theory are false. So "banishing" verisimilitude is really just a different proposal about scientific values, and does not achieve the desired aim of banishing value judgments from science.12

Given that judgments of verisimilitude must depend on our values, we might fix the problem of language-relativity in Tichy's account of verisimilitude by saying that the choice of a language is fixed by our values. This has been proposed by Niiniluoto (1987, p. 459). The problem of arbitrariness is thereby solved, since it is no longer the case that any language may be used for assessing verisimilitude using Tichy's measure.

However, once we have allowed that judgments of verisimilitude are relative to our interests, I see no reason to think that those interests can always be represented by a fixed measure on a suitably chosen language. In any event, the use of syntax to define verisimilitude seems unnecessarily indirect. A more direct route would be to have the measure defined on propositions (which can be the same, though expressed by different sentences) and let our interests be reflected in the choice of the measure.

12. The aim might seem to be more nearly approximated by Koertge's (1979) suggestion that we need not make judgments about the relative merits of false theories. This suggestion gives up on trying to say that science has been making progress even though its theories have been false. But even if one were willing to abandon this view of scientific progress, Koertge's suggestion would still be untenable. For the rationality of accepting a theory depends, in part, on the value of accepting the theory when it is false.


Even though verisimilitude is relative to our interests, it could still be objective, in the sense that everyone has the same interests and hence verisimilitude is the same for everyone. But on the face of it, this is implausible. Individuals differ in the relative values they place on everything else; why not also on the relative importance of content and truth and on the relative importance of the different possible ways of being wrong?

There is an analogy here with early work in utility theory. At first it was assumed that the value of a bet was measured by its expected monetary return. This assumption was refuted by Nikolaus Bernoulli's "St. Petersburg paradox," where a game with infinite expected monetary value is worth very little. Daniel Bernoulli (1738) proposed that the value of a bet is measured by its expected utility, and argued that utility is a logarithmic function of money. At this time, it was still assumed that there is some one function that expresses the utility of money.13 But nobody thinks that any more; we now allow that different people may have different utility functions for money.

A measure of verisimilitude is essentially a measure of the cognitive utility of accepting a hypothesis under different circumstances. The attempts of Popper, Tichy, and others to produce a mathematical definition of verisimilitude are thus analogous to Daniel Bernoulli's attempt to define the utility of money. We gave up on Bernoulli's project long ago, and I think it is time we gave up on the Popperian analog of that project.

I began this chapter by distinguishing scientific values from other values, and in Section 9.2 I suggested some conditions that the former satisfy. But this still allowed that different sets of values can count as scientific, just as different sets of values can count as unscientific. Judgments of verisimilitude require a weighting of the relative importance of different ways of being wrong; and I see no reason to think that science is so narrowly defined that it can uniquely determine these weightings.

Recall that Popper's original motivation for proposing his definitions of verisimilitude was to show that the notion of verisimilitude was meaningful. However, we do not need there to be one unique verisimilitude function in order for the notion of verisimilitude to be meaningful. Consider again the analogous problem of interpreting the notion of the utility of money. On the approach advocated in Chapter 1, the utility of money is relative to persons; and a person has a particular utility function for money if the person's preferences regarding monetary gambles maximize expected utility relative to that utility function (and some probability function). Applying the same approach to verisimilitude, we will say that verisimilitude is relative to persons; and for a person to have a particular measure of verisimilitude is for the person to have a particular kind of cognitive utility function. Further, a person has a particular kind of cognitive utility function if the person's preferences regarding cognitive gambles maximize expected cognitive utility relative to that cognitive utility function (and some probability function). In this way, we can give meaning to the notion of verisimilitude, without requiring that there be some unique measure of verisimilitude that is the same for everyone. In the next subsection, I will fill in the details of how this subjective approach to verisimilitude may be carried out.

13. Daniel Bernoulli did allow that there were some "exceedingly rare" exceptions to his rule. His example is a prisoner who needs a fixed sum of money to purchase freedom. But even here, the utility function is conceived as fixed by the person's external circumstances.

9.4.4 A subjective definition of verisimilitude

The representation theorems of Chapter 8 show that rational persons have cognitive preferences that are representable by a probability function and a cognitive utility function (though these functions need not be unique). On the favored interpretation of Theorem 8.2, the cognitive preferences are over a set D of cognitive options that consists of (i) countable experiments, and (ii) the act f_T of accepting the complete truth (relative to the chosen partition of states). Recall that the act of accepting a hypothesis is a degenerate case of a countable experiment (Section 8.3.2), and hence the representation applies in particular to acts of accepting a hypothesis. Furthermore, Theorem 8.2 allows the cognitive consequence of accepting A when A is false to be different for every x ∉ A, and hence it allows the cognitive utility of accepting A to be different for every state in which A is false. What I will now do is show how such a cognitive utility function can be used to define a subjective notion of verisimilitude. I will assume that we are dealing with a person whose cognitive utilities satisfy (9.3)-(9.7).

A simple subjective measure of verisimilitude could be obtained by identifying the verisimilitude of a hypothesis with the cognitive utility of accepting it. But such a measure would be unsatisfactory. Cognitive utilities are, at best, unique only up to a positive affine transformation (as we saw in Chapter 8). This simple definition of verisimilitude would thus give different values of verisimilitude for one and the same hypothesis, depending on which of the equivalent utility functions we chose to use.

We can avoid this difficulty by normalizing the cognitive utility function. It is natural to do this in such a way that the verisimilitude of the complete true theory14 is 1, while that of a tautology is 0. This leads to the following definition. Here v_x(A) denotes the verisimilitude of proposition A when x is the true state. By (9.6), u({x}, x) is the same for all x ∈ X, and I will denote this value by u_T. Also, by (9.2), u(X, x) is the same for all x ∈ X; this value will be denoted u_X. The definition of verisimilitude is then

Definition 9.1.

v_x(A) = [u(A, x) − u_X] / (u_T − u_X).

It follows immediately from this definition that for all x ∈ X, v_x({x}) = 1 and v_x(X) = 0.

Anything deserving the name verisimilitude must satisfy the condition that a hypothesis is at least as close to the truth when it is true as when it is false. Formally, what we should have is

v_x(A) ≥ v_y(A), for all x ∈ A, y ∉ A. (9.9)

14. The "complete true theory" is the theory that asserts that the true state obtains. That is to say, it asserts that the true element of some partition, chosen by us, obtains. Each element of this partition is supposed to be sufficiently specific that it, together with the chosen act, determines everything that is of value (Section 1.1); hence the complete true theory, in a cognitive decision problem, should specify the truth value of every proposition whose truth value we would like to know. However, it need not be complete in any stronger sense than this.


From (9.4) and (9.5) we have that u_T − u_X > 0; and this together with (9.3) and Definition 9.1 entails (9.9).

Verisimilitude is meant to represent closeness to the whole truth. So if A and B are both true, the verisimilitude of AB should be at least as great as that of A. Formally:

v_x(AB) ≥ v_x(A), for all x ∈ AB. (9.10)

This inequality follows immediately from (9.4) and Definition 9.1.

Let x* denote the true state. Then the actual verisimilitude of any proposition A is v_x*(A), which I will write simply as v(A). At this point, I have addressed Laudan's complaint, mentioned earlier, that "no one has been able even to say what it would mean to be 'closer to the truth.'" The verisimilitude of a hypothesis has been defined partly in terms of a person's cognitive utilities (and hence preferences), and partly in terms of the true state of nature.

The fact that the definition of verisimilitude refers to the true state x* brings me to a second complaint that Laudan has made about verisimilitude, namely that no one has been able "to offer criteria for determining how we could assess" verisimilitude (Laudan 1977, p. 126; see also 1981, p. 32). Now it is true that we do not in general know which state is x*, and it follows from this that we generally do not know the value of the verisimilitude of a theory. But we can calculate an expected value for the verisimilitude of a theory. Assuming X is countable, we have, for any proposition A,

E[v(A)] = Σ_{x∈X} p(x) v_x(A).

As a simple example of this, let {H1, H2, H3} be a partition of hypotheses. Suppose that

u(H1, x) = 0.5 if x ∈ H1,
         = 0.25 if x ∈ H2,
         = −0.5 if x ∈ H3.

Suppose also that the utility of correctly accepting that state x is the true state is 1, for all x, and the utility of accepting X (i.e., completely suspending judgment) is 0. Then Definition 9.1 gives that v_x(H_i) = u(H_i, x), for all x and i. The expected verisimilitude of H1 is thus

E[v(H1)] = 0.5 p(H1) + 0.25 p(H2) − 0.5 p(H3).

For instance, if all the H_i have the same probability, then the expected verisimilitude of H1 is 1/12.
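The calculation can be checked mechanically; a minimal Python sketch, using the partition and utilities of the example:

```python
from fractions import Fraction

# u(H1, x) as a function of which cell of the partition contains x;
# with u({x}, x) = 1 and u(X, x) = 0, Definition 9.1 gives
# v_x(H1) = u(H1, x).
v_H1 = {'H1': Fraction(1, 2), 'H2': Fraction(1, 4), 'H3': Fraction(-1, 2)}
prob = {'H1': Fraction(1, 3), 'H2': Fraction(1, 3), 'H3': Fraction(1, 3)}

expected_v = sum(prob[h] * v_H1[h] for h in prob)
print(expected_v)  # 1/12
```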

Propositions with maximum expected verisimilitude are also those whose acceptance has maximum expected cognitive utility.

Proof.

E[v(A)] = E[(u(A, x*) − u_X) / (u_T − u_X)], by Definition 9.1
        = (E[u(A, x*)] − u_X) / (u_T − u_X), since u_T and u_X are constants.

Since u_T − u_X > 0, it follows that E[v(A)] is maximized by all and only those propositions A that maximize E[u(A, x*)]. Since E[u(A, x*)] is the expected utility of accepting A, this is the desired result.

From this result and the principle of maximizing expected utility, we have that rationality requires a person with a scientific cognitive utility function to accept a proposition that maximizes expected verisimilitude.

If all the axioms of Chapter 8 are satisfied, then Theorem 8.2 guarantees that the utility function is unique, up to a positive affine transformation, almost everywhere. Consequently, verisimilitude, as defined here, will be unique almost everywhere. (Definition 9.1 ensures that utility functions differing only by a positive affine transformation yield the same measure of verisimilitude.) And the fact that this uniqueness only holds almost everywhere is not significant, since expected verisimilitude is unaffected by differences on a set of probability 0. However, I have taken the position that the connectedness axiom (Axiom 1) is not a requirement of rationality. Consequently, I allow that the utility functions we can attribute to a scientist need not satisfy the above uniqueness condition. It will then follow from my definition of verisimilitude that the scientist's measure of verisimilitude is not uniquely determined, not even almost everywhere. Rather, the scientist's measure of verisimilitude will be represented by a set of verisimilitude functions, one for each utility function in some p-u pair in the scientist's representor (cf. Section 1.5). This is not, I think, a disturbing situation; there is a sense of unreality about measures of verisimilitude that purport to assign precise numbers as the verisimilitude of every hypothesis in every state.

9.5 INFORMATION AND DISTANCE FROM TRUTH

In Section 6.3, I suggested that scientists aim to accept hypotheses that are both informative and true - goals that tend to compete with one another. These two goals are both contained in the notion of verisimilitude, since verisimilitude is closeness to the whole truth. But the explication of verisimilitude given in the preceding section did not isolate these components of verisimilitude. The present section will make good that omission and show how measures of information and distance from the truth may be defined. I will also compare my definition of information with others in the literature.

9.5.1 A subjective definition of information

Information, as we are concerned with it here, is the extent to which a proposition answers a person's cognitive questions. Thus a measure of information is a measure of the relative importance of different questions to the person. As such, a measure of information must be a function of the person's values; and since different people can have different values, we should not require there to be one unique measure of information that is the same for everyone. Information, like verisimilitude, is understood here as a subjective notion. And in fact, the approach I will follow is to define informativeness in terms of verisimilitude. I assume throughout this section that the underlying cognitive utility function is scientific; that is, it satisfies (9.3)-(9.7).

If A is true, then the verisimilitude of A can itself be taken as a measure of the informativeness of A. So letting c(A) denote the information content of A, we can for true hypotheses set

c(A) = v(A). (9.11)

For example, this identity entails that c({x*}) = v({x*}) = 1, and c(X) = v(X) = 0. This is as it should be, since {x*} is fully informative, while X conveys no information at all.

As for false propositions, their information content can be identified with what their verisimilitude would be if they were true. The verisimilitude A would have if it were true is v_x(A), where x is any state in A. Of course, this makes sense only if A is not the empty (or contradictory) proposition ∅. But for ∅ we can simply adopt the convention that its informational content is 1. So my general definition of information is

Definition 9.2.

c(A) = v_x(A), for any x ∈ A, if A ≠ ∅;
c(A) = 1, if A = ∅.

Any satisfactory conception of information must satisfy the condition

c(AB) ≥ c(A), for all A, B ⊆ X.

This condition follows from Definition 9.2.

Proof. If AB ≠ ∅, then

c(AB) = v_x(AB), for x ∈ AB, by Definition 9.2
      ≥ v_x(A), for x ∈ AB, by (9.10)
      = c(A), by Definition 9.2.

If AB = ∅, then c(AB) = 1, and so again c(AB) ≥ c(A).

In Section 6.3.2, I discussed an example of estimating the true value of some real-valued parameter. Here the propositions A were identified with sets of real numbers; and I defined a measure of the content of any proposition A by setting

c(A) = 1 / [1 + sup(A) − inf(A)], if A ≠ ∅;
c(A) = 1, if A = ∅.


I also defined the distance of A from the true parameter value r to be

d_r(A) = inf_{x∈A} |x − r|, if A ≠ ∅;
d_r(A) = 1, if A = ∅.

And I considered cognitive utility functions of the form

u_k(A, r) = k c(A) − d_r(A). (9.12)

The function c in this example satisfies Definition 9.2.

Proof. Let r ∈ A. Then

c(A) = (1/k)[u_k(A, r) + d_r(A)], by (9.12)
     = (1/k) u_k(A, r), since d_r(A) = 0 for r ∈ A
     = [u_k(A, r) − u_X] / (u_T − u_X), since u_k(X, r) = 0 and u_k({x}, x) = k, for any x (so u_X = 0 and u_T = k)
     = v_r(A), by Definition 9.1.

This shows that c(A) satisfies Definition 9.2 when A ≠ ∅. And c(∅) is defined to be 1, as Definition 9.2 requires.

To summarize this subsection: The concept of information that is relevant to the acceptance of propositions is one that measures the degree to which a proposition answers a person's cognitive questions. When the person's cognitive values are scientific, we can interpret this sense of information as the verisimilitude a proposition would have if true. Since verisimilitude is defined in terms of the person's cognitive utility function, which is in turn defined in terms of the person's cognitive preferences, information has here been defined ultimately in terms of the person's cognitive preferences. Of course, if the person's preferences are not complete, then this subjective measure of information, like that of verisimilitude, will be to some degree indeterminate - which is as it should be.


9.5.2 Comparison with other accounts of information

Measures of the informativeness of propositions have been discussed by a number of philosophers of science, since many have agreed that informativeness is part of the goal of science. As Bar-Hillel and Carnap (1953) observed, the relevant notion of information for this purpose is a semantic one, and thus the purely syntactic measures of Shannon (1949) do not measure what we are interested in here.

The semantic measures that have been proposed have, almost without exception, been based on the intuition that the informativeness of propositions varies inversely with their probability. For example, Bar-Hillel and Carnap (1953) proposed that the amount of information conveyed by a proposition A could be defined either by

c(A) = 1 − p(A) (9.13)

or else by

c(A) = −log2 p(A) (9.14)

where p is some suitable probability function. Virtually all other authors who have discussed measures of informativeness have accepted some variant of (9.13) or (9.14).15 For example, Popper (1963, p. 392) uses (9.13).

As Hempel and Oppenheim observed in the conclusion of their classic (1948) paper, a crucial problem for definitions such as these is the problem of selecting, from among an infinity of formal possibilities, some suitable probability measure p. The usual suggestion was that p should be the logical probability function, but this suggestion only succeeds in rendering the theory of information hostage to the difficulties facing the theory of logical probability.

If one is willing to embrace a subjective conception of information, then one might think of taking p to be the subjective probability function of the person whose judgments of informativeness we are explicating. But this has the peculiar result that a person can never be highly confident of any informative proposition. If you acquire evidence that an informative proposition is probably true, it ceases to be informative. Acquisition of evidence would thus not advance the cognitive goal of accepting true informative propositions.

15. An exception is the notion of "pragmatic information" discussed by Szaniawski (1976).

The weakest, and hence most plausible, interpretation of probabilistic measures of information is Levi's. Levi (1980, p. 48) requires that for each person there be an "information-determining probability" function M, such that the information value of accepting proposition A is measured by the quantity 1 − M(A).16 But he allows the choice of M to be up to the individual, and to be a reflection of the individual's values. Thus the function M is a probability only in the abstract sense that it is required to satisfy the axioms of the probability calculus.

But even Levi's approach rules out certain possible judgments of informativeness - judgments that I think we often actually make. For example, in a partition of n hypotheses, at least one hypothesis must have probability no greater than 1/n, whatever probability function we use. Thus in any large partition, (9.13) or (9.14) entails that some element of the partition has close to maximal informative content. Yet if the partition concerned the possible outcomes in a long sequence of tosses of a coin, I think we would normally judge that no element of the partition has any great significance. In particular, the outcome of a long sequence of coin tosses is of vastly less significance than, say, the theory of relativistic mechanics; but for a sufficiently long sequence of tosses, some outcome is no more probable than relativity theory, and hence is at least as informative, according to (9.13) or (9.14).
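The point is easy to verify numerically; a sketch, taking 100 tosses of a fair coin:

```python
import math

# Any particular sequence of 100 fair tosses has probability 2**-100.
p_outcome = 2.0 ** -100

content_9_13 = 1 - p_outcome          # (9.13): indistinguishable from 1
content_9_14 = -math.log2(p_outcome)  # (9.14): 100.0

print(content_9_13, content_9_14)
# Both measures rank this outcome at (or near) maximal informativeness,
# although intuitively it has far less significance than a fundamental
# theory whose probability may be no higher.
```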

The axioms of probability entail that if p is a probability function, then for all propositions A and B,

1 − p(AB) ≤ 1 − p(A) + 1 − p(B).

So if a measure of informativeness must have the form (9.13),then the information value of a conjunction cannot exceed the16Levi allows that a rational person may have unresolved value conflicts, in which

case he holds that a rational person must have a set of information-determiningprobabilities, one for each way of resolving the conflicts (Levi 1980, ch. 8). Butfor a person with no unresolved value conflicts, Levi is committed to the positionas I describe it in the text.

235

sum of the information values of its conjuncts; that is, c(AB) <c(A) + c(B), for all A and B. But this is not a reasonable condi-tion to impose on a measure of informativeness, as may be seenfrom the following example. Let A be a proposition concerningthe mass of the proton, and let B b e a proposition concerningthe mass of the neutron. A physicist interested in the structureof atomic nuclei may be very interested in learning how closethese masses are to one another, but be much less interestedin the absolute magnitudes of the masses. A and B togetherwill tell the physicist how close the masses are to one another,but neither will do so separately. Our physicist therefore valuesthe information provided by A and B together much more thanthe information either provides separately. So if c measures ourphysicist's demand for information, we may well have

c(AB) > c(A) + c(B).

This is a further reason for rejecting the requirement that a measure of information satisfy (9.13), for some probability function p.

A similar objection can be made to the requirement that measures of information satisfy (9.14), for some probability function p. Suppose that in the example of the preceding paragraph, c(A) = c(B) = 0.1. Then if c satisfies (9.14) for probability function p, it follows that p(A) = p(B) = 2^(-0.1). Hence

c(AB) = −log2 p(AB)
      = −log2 [p(A) + p(B) − p(A ∪ B)]
      ≤ −log2 [p(A) + p(B) − 1]
      < 0.21.

But in view of the circumstances described, we may well expect c(AB) to be larger than this; for example, there is no reason why our physicist should not have c(AB) = 0.3. So (9.14), like (9.13), is not a condition that measures of informational content should be required to satisfy.
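A quick numerical check of the bound (a sketch; the figures are those of the example):

```python
import math

p_A = p_B = 2.0 ** -0.1      # from c(A) = c(B) = 0.1 under (9.14)
p_AB_min = p_A + p_B - 1     # the least p(AB) allowed by probability theory
print(-math.log2(p_AB_min))  # about 0.2074, so c(AB) < 0.21
```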

I do not deny that some individuals might be committed to having their judgments of informativeness satisfy (9.13) or (9.14), for some probability function p. For persons who have such a commitment, it would be irrational to assess informativeness in a way that violates the relevant condition. What I have argued is that someone who judges informativeness in a different way is not thereby unscientific.

My definition of informational content, Definition 9.2, does not entail that (9.13) or (9.14) is satisfied. The argument I have given in this section shows that this fact is not a defect of my definition, but rather a positive feature.

9.5.3 Distance from truth

The suggestion in Section 6.3 was that science aims at both informativeness and truth. Having just shown how the first of these desiderata may be characterized, I now turn to characterizing the notion of the distance of a proposition from the truth.

The verisimilitude of a proposition is a measure of its closeness to the whole truth. So we could easily define a measure of the distance of a proposition from the whole truth; for example, the distance of A from the whole truth could be measured by 1 − v(A). But the notion I wish to explicate here is not distance from the whole truth but distance from the truth, that is, how far a proposition is from being true.

The distinction can be made formally as follows. If d(A) denotes the distance of A from the truth, in the sense here intended, then it must be the case that

d(A) = 0, for all true propositions A. (9.15)

This condition does not hold for the notion of distance from the whole truth, since a true proposition need not be the whole truth (it need not have a verisimilitude of 1).

The notion of truth itself is objective. It is an objective fact that snow is white, and hence it is an objective fact that it is true that snow is white. But a measure of distance from truth must measure the relative importance of the true and false consequences of a hypothesis; and such measures of relative importance must rest ultimately on our interest in being right on these various topics. So for essentially the same reason that information content is a subjective notion, distance from truth must also be a subjective notion.


The informational content of any given proposition A is the same, whatever the true state may be; but u(A, x) will in general be different in these different states. Thus we can think of distance from truth as the component of cognitive utility that is responsible for the variation of u(A, x) as x is varied. Let d_x(A) be the distance of A from the truth when x is the true state; then a simple definition of the sort just indicated would be

d_x(A) = u(A, t) − u(A, x). (9.16)

If (9.16) were adopted as a definition of d_x(A), then a change in scale of the utility function would change the distance of propositions from the truth. This is an undesirable feature, and so we should multiply the right-hand side of (9.16) by a factor that will cancel out changes of scale. Let u_∅ = u(∅, x), assumed to be a constant independent of x. By (9.7), u_∅ < u_T, and so we can use u_T − u_∅ as a scaling factor for measurement of distance from truth,17 giving

d_x(A) = [u(A, t) − u(A, x)] / (u_T − u_∅).

Since u(∅, t) is not defined, this identity defines d_x(A) only for A ≠ ∅. It will be convenient to have d_x(∅) defined also. Hence the definition I will adopt is

Definition 9.3.

d_x(A) = [u(A, t) − u(A, x)] / (u_T − u_∅), if A ≠ ∅;
d_x(A) = 1, if A = ∅.

I will also use d(A) to denote the actual distance of A from the truth; that is, d(A) = d_x*(A).

17. Other choices of scaling factor could also be used; these would result in a different scale of measurement for distance from truth. But I am avoiding using u_T − u_X here, because (as will become clear shortly) that choice would force the k in u_k to be 1.


For any true proposition A, we now have

d(A) = d_x*(A)
     = [u(A, t) − u(A, x*)] / (u_T − u_∅)
     = 0, since x* ∈ A.

Thus (9.15) is satisfied by Definition 9.3. By (9.3) and (9.7), we also have that d(A) ≥ 0, for all A - as it should be.

Definitions 9.1-9.3 together imply that for all x ∈ X and all propositions A,

u(A, x) = (u_T − u_X) c(A) − (u_T − u_∅) d_x(A) + u_X.

(The proof is straightforward.)
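Since the proof is not spelled out in the text, here is a sketch of the algebra. For A ≠ ∅, Definition 9.2 gives c(A) = v_t(A) for any t ∈ A, so by Definition 9.1, (u_T − u_X) c(A) = u(A, t) − u_X; and by Definition 9.3, (u_T − u_∅) d_x(A) = u(A, t) − u(A, x). Hence

(u_T − u_X) c(A) − (u_T − u_∅) d_x(A) + u_X = [u(A, t) − u_X] − [u(A, t) − u(A, x)] + u_X = u(A, x).

For A = ∅, we have c(∅) = d_x(∅) = 1, and the right-hand side reduces to (u_T − u_X) − (u_T − u_∅) + u_X = u_∅ = u(∅, x).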

An equivalent utility function, differing from u only by a positive affine transformation, is

u′(A, x) = [(u_T − u_X) / (u_T − u_∅)] c(A) − d_x(A).

This utility function is identical to the utility function u_k of (9.12), with

k = (u_T − u_X) / (u_T − u_∅).

Hence adoption of Definitions 9.1-9.3 entails that all scientific utility functions can be expressed in the form u_k, for some k. Differences in the cognitive utilities of different scientists can thus all be attributed to differences in one or more of:

• The measure of content;
• The measure of distance from truth; or
• The weight put on content, as compared with closeness to truth.

It is also easy to show that if

u(A, x) = k c(A) − d_x(A)

for some functions c and d_x, and constant k, then d_x is a measure of distance from truth, in the sense of Definition 9.3. It follows that the measure of distance from truth used in the example in Section 6.3 satisfies Definition 9.3.

Definitions 9.1-9.3 entail that

v_x(A) = c(A) − [(u_T − u_∅) / (u_T − u_X)] d_x(A).

Letting k be as above, this can be written as

v_x(A) = c(A) − (1/k) d_x(A).

Thus we have succeeded in factoring verisimilitude into the two components that intuitively are part of it, namely content and closeness to the truth.

9.6 SCIENTIFIC REALISM

In the past decade, much of the debate on scientific values has been concerned with van Fraassen's attack on scientific realism. According to van Fraassen (1980, p. 12), scientific realism holds that science aims to accept true theories. Van Fraassen rejected this, and maintained instead a view he called constructive empiricism, which holds that science aims only to accept empirically adequate theories. (A theory is empirically adequate if everything it says about observable phenomena is true.) The account of scientific values outlined in this chapter has implications for that debate about scientific realism - but perhaps not the implications one might expect, as we will now see.

First, we need to be clear about just what is at issue in this debate. To aim at x is not just to value x but also to take steps designed to achieve x. So when van Fraassen denies that science aims at truth, he might be denying that scientists value truth beyond empirical adequacy; or he might merely be denying that they take steps designed to achieve it. For the latter claim to be correct, it would perhaps suffice for scientists to have completely indeterminate probabilities regarding unobservables. So on the latter interpretation, van Fraassen's claims about aims would have no implications for scientific values. But the latter interpretation is not the one van Fraassen intends. He construes the aim of science as the standard of success in science (1989, pp. 189f.), and understands this notion in such a way that it refers to the values put on the different possible outcomes of accepting a scientific theory. Thus van Fraassen is taking realists to hold that in science, acceptance of a true theory is better than acceptance of one that is false; and he is opposing this by maintaining that it is a matter of indifference whether an accepted theory is correct about unobservables.

So the issue is about scientific values. One might now ask: Is the issue a descriptive one about the values scientists actually have? Or is it a normative one, about the values scientists ought to have? It is neither of these. Van Fraassen is answering the question What is science? (1989, p. 189), and so his claims about scientific values are claims about what values a person must have to be a scientist. He is claiming that to be a scientist a person must value empirical adequacy, but need not value correctness about unobservables. This position is consistent with the possibility that many, or even all, scientists do in fact value correctness about unobservables. Van Fraassen's claim is merely that having this value is not essential to being a scientist.18

Thus understood, van Fraassen's claim about scientific values is addressed to the same issue that I dealt with in Section 9.2, where I proposed necessary conditions for a person to count as having scientific values. I will now examine the points of similarity and difference between my proposals and van Fraassen's position.

Talk of the aim of an activity, or of a criterion of success of that activity, seems to assume that the activity has only one desirable outcome. The view of scientific values I have been defending in this chapter does not make that assumption. Instead, I have allowed that acceptance of a scientific theory may have a continuum of more or less desirable outcomes; in which case, there is no nonarbitrary dividing line between what counts as "success" and what counts as "failure." For that reason, in this chapter I have chosen to talk of scientific values, represented by a utility function, rather than talking of an "aim" of science. This is one point of difference between my proposals and van Fraassen's position.

18. The question of whether van Fraassen is making a descriptive or normative claim was posed by Gideon Rosen, in a paper presented at the University of Michigan in April 1990, with van Fraassen responding. Rosen argued that on either of these two interpretations, van Fraassen's position is problematic. Van Fraassen replied to Rosen's paper in the way I have indicated here.

I do not deny that a scientist might have a cognitive utility function in which the acceptance of any theory has only two possible utilities. What I deny is that this is necessary for having scientific values. Indeed, I argued that real scientists typically do not satisfy this condition.

I turn now to the claim of van Fraassen's that has provoked the most debate - his claim that science does not aim at truth beyond empirical adequacy. Recall that what this claim means is that someone who does not value truth beyond empirical adequacy is not thereby unscientific. In terms of cognitive utility functions, the claim is this: It is possible for a scientific utility function u to satisfy the condition

u(A, t) = u(A, x), for all states x in which A is empirically adequate. (9.17)

As noted in Section 9.2.1, my conditions on a scientific utility function are consistent with this condition. Hence I agree with van Fraassen that scientists need not value truth beyond empirical adequacy. In fact, there have been very good scientists who have maintained that they cared only for empirical adequacy; so van Fraassen is surely right on this point.

But while a scientific utility function might satisfy (9.17), it need not do so; and I think that most scientists do prefer accepting true theories to ones that are merely empirically adequate. If this is right, then (9.17) does not usually hold for the cognitive utility functions of actual scientists.

That is an empirical claim, and to settle whether or not it is correct would require empirical evidence. This could be done by a survey that would ask scientists for their preferences regarding suitable possible experiments, with the preference interpretation of probability and utility then being used to infer the cognitive utilities we are interested in. (The representation theorem of Chapter 8 shows that this can be done.) I suspect that if such a survey were conducted, we would find that scientists typically have a higher cognitive utility for accepting a theory when it is true than when it is false but empirically adequate. But it would be interesting to have an actual study of this kind conducted.

In any case, this empirical claim is consistent with van Fraassen's constructive empiricism, as I noted earlier.

The positive side of van Fraassen's account of the aim of science is that science aims at empirical adequacy. Since this is a claim about scientific values, it can be expressed in terms of scientific utility. The claim would then presumably be that all scientific utility functions satisfy the condition

u(A, x) > u(A, y) whenever A is empirically adequate in x, and not empirically adequate in y. (9.18)

While I would allow that scientific utility functions may satisfy this condition, I deny that they must satisfy it. For example, imagine an individual who accepts theories that make claims about all of space-time, but cares only for whether those theories fit observable phenomena that occur within a billion light years of us. This individual cares about empirical adequacy with regard to all phenomena that our scientists are going to encounter, at least for the next billion years (by which time we probably will be long extinct). I see no compelling reason to say that this individual's values are unscientific. Indeed, I guess that those scientists who say they care only for empirical adequacy, not truth, would also say that they do not care for empirical adequacy in such remote regions of space-time.

So the position on scientific values that I have defended in this chapter is weaker than van Fraassen's. It is weaker because it does not require the scientific utility of accepting a theory to have only two possible values, and because it does not require (9.18) to hold. Since van Fraassen's position is in turn weaker than scientific realism (as he defines that view), I could describe my position by saying that I am even less of a scientific realist than van Fraassen is. However, I have suggested that the realist position is likely to be closer to being a correct account of most actual scientists' values.

In any case, neither van Fraassen nor realists dispute the necessary conditions on a scientific utility function that I adopted in Section 9.2. And I have shown that when these conditions are satisfied, then we can say that the cognitive utility of accepting a theory depends only on the content of the theory, and on its distance from the truth. The notions of content and distance from truth are subjective, and different scientific values will be reflected in different measures of content and distance from truth.


Appendix A

Proof for Section 5.1.6

This appendix proves the theorem stated in Section 5.1.6. Given a suitable set X of states of nature, the expected utility of any act a can be written as1

EU(a) = Σ_{x∈X} p(x) u(xa)

where p and u are the person's probability and utility functions, and xa denotes the conjunction of x and a. According to causal decision theory (which I assume here)2 X is a suitable set of states for calculating the expected utility of a iff X is a partition such that for each x ∈ X, x is not causally influenced by a, and x determines what consequence will be obtained if a is chosen. Consequences are here understood as including every aspect of the outcome that matters to the person.

Let S be a suitable set of states of nature for calculating the expected utility of the acts that will be available after the choice between d and d′ is made. By (iv) the states s ∈ S are causally independent of d, and hence conjunctions of the form s.d→R_q are also causally independent of d.3 Let the set of acts available after choosing d or d′ be B, and assume there is a unique b_q ∈ B that you would choose if your probability function were q. Then d together with d→R_q determines that you will choose b_q, and this together with s determines the unique consequence you will obtain as a result. Hence conjunctions of the form s.d→R_q are both causally independent of d and determine what consequence will eventually be obtained as an indirect result of choosing d. We can therefore take these propositions to be our states for the purpose of computing the expected utility of d. Letting Q be the set of all probability functions that you could have after choosing d, we then have

EU(d) = Σ_{s∈S} Σ_{q∈Q} p(s.d→R_q) u(sb_q). (A.1)

1. I will assume that the set of states is at most countable. This assumption can be removed by replacing summation with integration.

2. For a discussion of this theory and its alternative, see e.g., (Maher 1987).

3. Where ambiguity would otherwise arise, I use a dot to represent conjunction; the scope extends to the end of the formula (or to the next dot, if there is one). So s.d→R_q denotes the conjunction of s and d→R_q.

You know that if d and d→R_q are true, then R_q must obtain. Hence p(R_q | s.d→R_q.d) = 1. Thus (A.1) implies

EU(d) = Σ_{s∈S} Σ_{q∈Q} p(s.d→R_q) p(R_q | s.d→R_q.d) u(sb_q).

Applying Bayes' theorem to p(R_q | s.d→R_q.d) then gives

EU(d) = Σ_{s∈S} Σ_{q∈Q} p(s.d→R_q) [p(s.d→R_q | R_q d) p(R_q | d) / p(s.d→R_q | d)] u(sb_q).

By (v), p(s.d→R_q | d) = p(s.d→R_q), and so we have

EU(d) = Σ_{s∈S} Σ_{q∈Q} p(s.d→R_q | R_q d) p(R_q | d) u(sb_q). (A.2)

By condition (a), you are sure one of the d→R_q holds, and so p(s.d→R_q | R_q d) = p(s | R_q d). I assume that p(d | R_q) = 1, for all q ∈ Q; a sufficient condition for this would be that you know what act you have chosen after you choose it. Hence p(s | R_q d) = p(s | R_q), and (A.2) simplifies to

EU(d) = Σ_{s∈S} Σ_{q∈Q} p(s | R_q) p(R_q | d) u(sb_q)
      = Σ_{q∈Q} p(R_q | d) Σ_{s∈S} p(s | R_q) u(sb_q).

Since you are sure one of the d→R_q is true, p(R_q | d) = p(d→R_q | d); and by (v), p(d→R_q | d) = p(d→R_q). Thus

EU(d) = Σ_{q∈Q} p(d→R_q) Σ_{s∈S} p(s | R_q) u(sb_q). (A.3)


Condition (b) asserts that for each q ∈ Q there is a probability function q′ such that p(d′→R_q′ | d→R_q) = 1. Obviously, only one q′ can satisfy this condition for a given q; I will use the notation φ(q) to denote this q′. I will also use Q′ to denote the set of probability functions you might have after choosing d′. Then we can rearrange the summation over Q in (A.3), to give

EU(d) = Σ_{q′∈Q′} Σ_{φ(q)=q′} p(d→R_q) Σ_{s∈S} p(s | R_q) u(sb_q).

By condition (d), p(s | R_q) = p(s | R_φ(q)). Also, condition (i) entails that p(s | R_φ(q)) = φ(q)(s). Hence

EU(d) = Σ_{q′∈Q′} Σ_{φ(q)=q′} p(d→R_q) Σ_{s∈S} q′(s) u(sb_q)
      ≤ Σ_{q′∈Q′} Σ_{φ(q)=q′} p(d→R_q) max_{b∈B} Σ_{s∈S} q′(s) u(sb). (A.4)

By the definition of <p, p{df-+ Rqt\d-+ Rq) equals 1 if q' =and 0 otherwise. Hence

<p(q)=q'

The theorem of total probability then gives

<p(q)=q'

Substituting in (A.4), we have

EU(d)< E p(d'->Rq>) max J2q'(s)u{sb). (A.5)beBbeB seS

The reasoning leading to (A.3) can be repeated, mutatis mu-tandis, for d', giving

q'€Q' s<=S

247

Prom condition (i) we have p(s\Rqf) = </(s), and so

q'eQ' sesJ2u(sb) (A.6)ses

Comparing (A.5) and (A.6), we see that EU(d) < EU{d').

248

Appendix B

Proof of Theorem 8.1

This appendix proves the representation theorem for simple cog-nitive expected utility stated in Section 8.2. Throughout thisappendix, it is assumed that Axioms 1-8 are satisfied.

B.I PROBABILITY

The following theorem establishes the existence of a quantita-tive subjective probability function that agrees with the quali-tative probability relation discussed in Section 8.2.8.

Theorem B.I. Let •<• be a binary relation defined on X bythe condition that for all F,G G X, F <• G iff there exista,b eY and / , j G D such that

• a -<b;• f = b on F and a on F;• g = b on G and a on G; and

Then there exists a unique function p such that for allF,GeX;

(i) (X, X,p) is a (finitely additive) probability space.(U)p{F)<p{G)iffF<.G.

(iv) If p € [0,1] there exists A € X such that A c F andp(A)=pp(G).

Savage (1954, ch. 3) shows that Theorem B.I follows from hispostulates. His proof remains valid when the axioms of Sec-tion 8.1 are used in place of his postulates; consequently, thereader is referred to Savage for a proof of this theorem.

Throughout this appendix, 'p' denotes the unique probabilityfunction asserted to exist by Theorem B.I.

In the next section, we will make use of the following theorem.249

Theorem B.2. Let {Ai,..., An} C X be a partition ofand let f,g e D. If f £ g given A*, for all i — 1, . . . , n, thenf ~ 9 given A. And if in addition there exists i such that f -< ggiven Ai, then f -< g given A.As Savage (1954, p. 24) remarks, this can be proved using math-ematical induction.

B.2 UTILITY OF GAMBLES

Let a i , . . . , an G Y and let p i , . . . , pn > 0, where YA=I Pi = 1- Iwill use Yd=i Piai t o denote the set of all / G D such that forsome partition {Ai,. . . , An} of X, fi = ai on A^ and p(Ai) =pi, i = 1, . . . , n. A subset F of D is said to be a gamble (Savage1954) or lottery (Fishburn 1981) if F ^ 0 and

for some ai and p*. The set of all gambles will be denoted by L.The purpose of this section is to prove the existence of a util-

ity function on L. But first we need a well-defined notion of pref-erence on L. The following theorem is addressed to that need,showing that gambles are equivalence classes in IJ^?1 whence-< on D induces a corresponding ordering of L. (In this the-orem, and subsequently, the abbreviation 'a.e.' means 'almosteverywhere', i.e., on a set of probability 1.)

Theorem B.3. Let E?=iP^i € L, let f',gf G Td=iPiai> and

let f,g G D be such that f = f a.e. and g = g! a.e.. Thenf ~ 9-

Proof. It is easily shown, using Theorem B.I and Definition 8.2,that f ~ ff and g ~ g'. Thus there is no loss of generality insupposing that f,g G Y%=\ Pi<*>i-

I will first prove that the theorem holds for n = 2, and after-ward show that the general case follows.

Suppose then that f,g€ P\a\ + P2&2- We may assume, with-out loss of generality, that / •< g given f~x{a\) D g~1(a2), that1 U L is the union of the elements of L. It is identical to the set of simple acts.

250

is, that one or the other of the following three cases holds:(OrHaOn^MeN.

(ii) / ~ g given / 1(ai) n g 1(a2), and

(iii) / -< g given f"1(ai)ng"1(a2).If (i), then since (as is easily verified)

P [rHatirig-Hv)] =p[r1(a2)ng-1(a1)]we have by Theorem B.I that f~1(a2) C\g~1(ai) G N, whencef ~ g given f~1(a2) fl gf~1(ai). If (ii), then by Axiom 6, / ~ pgiven /~1(a2)Hp~1(ai). So whether (i) or (ii) holds, we have byTheorem B.2 that f ~ g. Turning now to case (iii), let 61,62 G Ybe such that 61 -<b2, and define f',g' G D by

f 61 on /-^ai) 1 ; = f 61 on g^fa)\ 62 on f~l{a2) J ' ( 62 on g~1(a2)

Then / ; -< g' given /~1(ai) fl ff"1(a2), by Axiom 6. So by Ax-iom 7, / 2 g iff / ' < pr. But by Theorem B.2 and the fact thatp[/~1(ai)] = p[flf~1(ai)], / ' ~ 5r. Hence f ~ g. This completesthe proof for n = 2.

Turning now to the general case, let C G X be such that forij = l , . . . , n

P [Cnr^ng'^aj)] =p[f-1(ai)ng-\aj)] /2.(The existence of such a C is guaranteed by Theorem B.2.)Let a G Y be such that /i = a for some h G D, and define

,Pi G D by

/ on C I I aonCaonC J [ / on C

By the n = 2 case already proved, we have that for all i =1,.. . ,n, /1 ~ /2 given /~x(ai). So by Theorem B.2, /1 ~ /2.Furthermore, since

251

Theorem B.I implies that fa ~ g\ given [/ 1(aj)nC] U[ 1 C ] , i = l,...,n. But

PI C] U [5-X(«i) n C] : i = 1 , . . . , n}

is a partition of X, as so we have by Theorem B.2 that fi ~ g\.So by the transitivity of ~, f\ ~ g\. Consequently, f ~ g givenC. Interchanging the roles of C and C in this proof gives alsothat / ~ g given (7, and so / ~ p. I

(The solid block is used to mark the end of a proof.)The preference relation •< can now be extended to L in the

obvious way.

Definition B.I. Let F, G G L, / G F, g G G. T/ien F <G iff

The relations < and ~ o n L are of course defined in terms of ^in the usual way.

I will now define the notion of a mixture of gambles. LetF i , . . . , Fn E L, and for i = 1, . . . , n let Fi = YJjLi Pijaj- (Notethat this latter condition involves no loss of generality, sincesome of the pij may be zero.) Then if <JI, . . . ,crn > 0, andJ2?=i ^i — ^-> the mixture

of F i , . . . , Fn is defined to be

The next theorem shows that L is closed under this mixingoperation. In other words, L is a convex set.

Theorem B.4. Let Fi , . . . , Fn e L, and let o\,..., on > 0,with Y2=i <?i = l- Then Yd=i °iFi G L.

252

Proof. What needs to be shown is that Yli=i ai^i ls nonempty.This will be proved by induction on n.

If n = 1, then YA=I ai^i = i> a n d then since i*\ is nonemptyby definition, so is Yli=i ^iFi-

Suppose now that the theorem is true for some n, and con-sider the mixture Y17=i a^i °f gambles JF\, . . . , Fn+i. If an+i =1, then J2?=i ViFi — -Fn+i> which is nonempty. Suppose thenthat o-n+i < 1, and let

By assumption, there exists / G F and g G Fn+\. By Theo-rem B.I we have that for each a, 6 G Y there exists Aa^ G Xsuch that

p [Aa,b H f-\a) H g-\b)\ = <Jn+iV [f'Ha) D g-\b)] .

LetA= |J A*,h.

Since only finitely many Aa^ are nonempty, A G X. Hence byAxiom 3 there exists h eT> such that h = f on A and g on A.Then

n+lh e ( l - <Jn+i)F

Thus the theorem holds with n + 1 in place of n. I

We are now in a position to define the notion of a utility on L.

Definition B.2. A real-valued function v onL is a utility onL iff for all F,G G L and p G [0,1],

(i)F*Giffv(F)<v(G).(ii) v(pF + (1 - p)G) = pv(F) + (1 - p)v{G).

In other words, a utility on L is an order-preserving linear real-valued function on L.

The next task is to prove that there is a utility on L. Not tobelabor familiar material, I will state without proof the follow-ing theorem.

253

Theorem B.5. There exists a utility on L iff for all F,G,H eL the following three conditions hold:

(i) •< is connected and transitive on L.(ii) If pe (0,1), then F^G iff

PF + (l-p)HlpG+(l-p)H.(in) If F -<G -< H, then there exist p,aG (0,1) such that

(l- a)H.

Essentially Theorem B.5 is due to von Neumann and Morgen-stern (1947), but for a proof in which the assumptions have theform (i)-(iii) see (Jensen 1967).

It is an immediate consequence of Definition B.I and Ax-ioms 1 and 4 that condition (i) of Theorem B.5 is satisfied.Thus the existence of a utility on L will be proved if we provethat conditions (ii) and (iii) are also satisfied. To this we nowturn.Theorem B.6. Let F, G G L with f G F and g G G, andsuppose A G X \ N is independent of f~1(a) n g~l(b), for alla, 6 eY. Then f £g iff f ^g given A.

Proof. I will first show that, if B G X is independent of f~1(a)np-1(6) for all a, 6 eY, and if p(A) = p(B), then f -<g given Aiff / •< g given B. To this end, let c G Y be such that h = c forsome h G D, and define / ' , / ; / , </, <//; G D by the conditions

r =

Then letting p be the common value oip(A) and p(B), we havethat / ' , f" G pF + (1 - p)c and #', g" G pG + (1 - p)c. So byTheorem B.3, / ' - / " and g' - g". Thus / ' < g1 iff f" < g",and from this we obtain the desired result, viz. / •< g given Aiff / •< g given B.

In view of the result just established, it will be convenientto introduce the (temporary) notation lf^g given p\ where

254

p G [0,1]. This notation will mean that there exists B G Xsuch that B is independent of f~1(a) flg~l{b) for all a,b €Y,p(B) = p, and f -<g given B.

Next the following three claims will be established. (Here andsubsequently, J + denotes the set of positive integers.)

(i) If p G (0,1], n G / + , and n < p"1, then / •< g given ponly if / < g given np.

(ii) If p G [0,1] and n G J+, then / •< g given p only if / •< ggiven n~1p.

(iii) If p G (0,1] and / -< g given p, then there exists e > 0such that / -< g given cr, for all a G [p — £, p].

To prove (i), note that by Theorem B.I there exist pairwisedisjoint Ai , . . . , An G X such that for all i = 1, . . . , n and a, 6 G

n /-^a) n5-1(6)] = pp [rH") n ^ '

By assumption, / ^ g given A$, i = 1,... ,n. Hence by Theo-rem B.2, f -<g given |J^=1 i4 . Since p [U?=i -A»] — nP-> this is thedesired result.

For the proof of (ii), suppose there exists n G / + such thatg -< / given nrV. Then one can show, much as in the proof of(i), that g -< / given p. So by contraposition, / ^ p given p onlyif / 3 ^ given n ~ V

For the proof of (iii), let £ G X be such that for all a, b G F,

P [B n rx(a) n5-1(6)] = PP n

Define /o G D to be the act that agrees with / on J5, and 5on B. Then by assumption, /o -< 5. Now let Ai , . . . , An be anenumeration of

such that ii f ^ g given A and 5 -< / given A , then i < j . Wedefine simultaneously a sequence / 1 , . . . , fn in D and a sequence

255

S i , . . . , Bn in X by the following two conditions, held true foreach i G {1 , . . . , n}:

(a) Let Ci , . . . , Cn be a partition of A% such that, for all j =1,... ,m if

/i_i on Cj

then h < g. Choose some j such that Cj fl i? 0 N, and let^ = C, n B.

(b) Let

(Note that /* -< g for all i = 1, . . . , n.) Now define

e = min {p(Bi)/p(Ai) : i = 1, . . . , n} .

For each i = 1,. . . , n, let Di G X be such that D{ C Bi andp(I^) = ep(Ai). Let D = U?=i A , and define / ' G D by thecondition

fl=( fonB\D[ g elsewhere

Then if m = max{i : / ^ g given A*}, we have by Theorem B.2that f'<fm< 9- And so, since

P[J4* n(B\ D)] = (p - e)p(i4i), i = 1,... ,n

we have that / -< g given p — e. Now if 0 < er < s, substitutingef for e in the proof just given shows that / -< g given p — e'.This completes the proof of (iii).

With (i)-(iii) established, I proceed to complete the proof ofthe theorem. For this it suffices to show that for all p G (0,1],f -<g given p iff / •< g. That result is trivial if, for all p G (0,1],/ rsj g given p. So suppose that for some a G (0,1], / ^ g givena. Without loss of generality, let / -< g given a. By (iii) there

256

exists e > 0 such that / -< g given r, for all re [a — e, or]. Nowsuppose that for some pG (0,1],

g-<f given p. (B.I)

Choose m,n e 1+ such that n > rap and mn~1p G [a — e,a].Then (ii) gives that

5 -< / given n~V

and hence by (i) it follows that

g-<f given mn~lp.

But this contradicts the choice of ra and n, and hence (B.I) isfalse. Thus we have shown that / -< g given p, for all p G (0,1],and hence that f £ g given p\S f <g. I

The following theorem asserts that condition (ii) of Theo-rem B.5 is satisfied. It is an easy corollary of Theorem B.6.

Theorem B.7. If F, G, H e L, and p G (0,1], then

iff F^G.

Proof. Let / G F, g G G, and /iGff. Choose 4 G X such thatfor all a ,6 ,cG7,

Define /',</ G D by the conditions

/ ' =

Then / ' G pF + (l-p)H, and fl/ G pG + (l-p)H. Thus we havethe following chain of equivalences (the third of which holds byTheorem B.6):

257

PF+(l-p)H^pG+(l-p)H<*= f ^9 given A

The next theorem asserts that condition (iii) of Theorem B.5is also satisfied.

Theorem B.8. / / F,G,H G L, and F -< G -< H, then thereexist p,cr G (0,1) such that

pF + (l-p)H^G^aF + (l- a)H.

Proof, Let / G F, g G G, h G H, and let Ai, . . . ,An be anenumeration of

{f-1(a)nh-1(b):a,beY}\N

such that \i f •< h given A{ and h -< f given AJ5 then i < j .By Axiom 8, there exists a sequence /o, / 1 , . . . , fn in D and asequence J5i,..., Bn in X \ N such that for all i = 1,. . . , n:

(i)(ii)

(iii)in

(iv)

Let

/oBi

fi

= f

J1-<9>

e

h o n B<k

fi-i on

= min

and for each i = 1, . . . , n let C{ G X be such that Ci C Bi andp(d) = ep(Ai). Let B = (JJU #f and C = U?=i Ci7 and definefen by

, ZionC

" " / on C

Then if m = max{i : f -<h given ^4^}, we have by Theorem B.2258

that / ' -< fm^ g. But / ' G (1 - e)F + eH, so setting p = 1 - ewe have that p G (0,1) and

pF + (1 - p)# -< G.

A similar proof establishes the dual result, namely that thereexists a G (0,1) such that

G -< <JF + (1 - &)H. I

We have now shown that all conditions (i)-(iii) in Theo-rem B.5 hold, and hence have established

Theorem B.9. There exists a utility on L.

It is a familiar consequence of Definition B.2 that this utilityon L is unique up to a positive affine transformation; that is,we have

Theorem B.10. Ifv is a utility on L; then a real-valued func-tion v' on L is a utility on L iff there exist real numbers p anda, p > 0, such that v' = pv + a.

A proof of Theorem B.10 may be found in many places, includ-ing the works cited in connection with Theorem B.5.

B.3 UTILITY OF CONSEQUENCES

Let y* be the set of all a G Y such that, for some / G D,f~1(a) £ N. The notion of utility of consequences I am aboutto introduce will be called a utility on F*, even though thedomain of this utility function is the whole of Y. The reasonfor this terminology is that the values assumed by this utilityfunction on 7 \ 7* are quite arbitrary. The term utility on Ywill therefore be reserved for the stricter notion of utility ofconsequences to be introduced in Appendix D. The definitionof the present notion of utility of consequences is as follows(where £{Z) is the expected value of the random variable Z):

Definition B.3. A real-valued function u onY is a utility onY* iff for atlf,ge\JL,flg iff S[u(f)} < £[u(g)}.

259

In other words, a utility on Y* is a function whose expectedvalue £[^(-)] is order preserving on (JL. In fact, £[^(-)] is orderpreserving on acts that are almost equal to acts in (JL. Moreprecisely, if / ' , </ G IJk, a n d f->9 € D, and if / = / ' a.e. andg — g1 a.e., then / < g iff 5[u(/)] < £[1/(5)]. (That this is so isan immediate consequence of Definition B.3 and clause (iii) ofTheorem B.I.)

The aim of the present section is to establish the existenceof a utility on Y*. But first I will establish a uniqueness condi-tion. This asserts that utilities on Y* are unique on Y* up toa positive affine transformation, although (as remarked above)there are no restrictions on the values they may take o n 7 \ 7 * .Formally, the condition isTheorem B. l l . Let u be a utility on Y*. Then a real-valuedfunction u1 on Y is a utility on Y* iff there exist real numbersp and cr, p > Q, such that u* = pu'+ + cr, where u* and u'^ arethe restrictions of u and uf respectively to Y*.

Proof. Let / , g G (J^? a nd suppose first that for some p > 0and real number a, u* = pu^ + a. Since p[f~1(Y \Y*)] =plg-^YXY*)] =0 , we have

S[u(f)] = £[pu'(f) + a}= p£[u\f)\ + a

and similarly for g. Applying first Definition B.3 and then thepreceding identity gives

e[u(f)]<S[u(g)}

So v! is a utility on Y*.For the other half of the proof, suppose u' is a utility on Y*.

Let a, b G Y, a -< 6. If we set

u{b) — u(a)F u'(b) - uf{a)

then p > 0. Then setting a = u(a) — pu'{a) gives

pu'{a) + a = u(a)puf(b) + (j = u(b).

260

Let u" = pu! + a. Then by the half of the theorem alreadyproved, v!1 is a utility on Y*. The proof will be completed byshowing that for any ceY*, U"(C) = u(c).

Since c e Y*, there exists h G D such that h~l(c) £ N.Define k G \J L by

{ c on h~1(c)a elsewhere

We consider three cases.First, suppose a ^ k -< b. Now it is a consequence of conditions

(i)-(iii) of Theorem B.5 that for any F, G, H G L, if F 3 G 3 Hthen there exists r G [0,1] such that G ~ T F + (1 - r ) ^ . (For aproof of this claim, see e.g., [Jensen 1967].) Hence there existsr G [0,1] such that

k ~ ra + (1 — r)6.

So since t/; is a utility on y*,

So if 77 = p [fc~1(c)], we have

rju"(c) + (1 - r/K(a) = rti'^a) + (1 - r)u"(b). (B.2)

Similarly, the fact that u is a utility on y* gives

+ (1 - ri)u{a) = ru{a) + (1 - r)w(6). (B.3)

But (B.2) and (B.3), together with the facts that u"{a) = u(a),u"(b) = tx(6), and 77 > 0 imply u"(c) = u(c).

For the second case, suppose k -< a. Then by the result ap-pealed to in the preceding paragraph, there exists r G (0,1)such that

a ~ r[rjc + (1 — rj)a] + (1 — r)b.

So since u" is a utility on Y*1

£[u"{a)\ = S[uff[rrjc + r ( l - rj)a + (1 - r)b}).261

That is,

u"{a) = rriu"(c) + r ( l - rj)u'\a) + (1 - r)u"{b). (B.4)

Similarly, the fact that u is a utility on F* gives

u(a) = rr)u(c) + r ( l - r))u(a) + (1 - r)u(b). (B.5)

It follows from (B.4) and (B.5) again that u"(c) = u(c).Finally, if b -< fc, then a similar argument shows that in this

case too u"{c) = tx(c). I

Theorem B.12. There exists a utility on Y*.

Proof. Let a E Y be such that for some / G D, / = a. Letb e y*, and let pa G (0,1] be such that pb + (1 - p)a G L andcr6 + (1 — a)a G L. Supposing without loss of generality thatp < cr, we have

pb + (1 - p)a = - [aft + (1 - a)a] + -—-a.a a

Thus if v is a utility on L such that v(a) = 0, we have byDefinition B.2

v[p6 + (1 - p)a] = -v[cr6 + (1 - a)a].

That is,

t;[p6 + (1 — p)o]/p = v[crb + (1 — a)a]/a.

In view of this result, we can define a function u on y as follows:For any b € y,

/ t ;[p6+(l-p)a]/p if 6 Gy*o if 6 G y \ y*

where p > 0 is such that (if 6 G 7 * then) p6 + (1 - p)a G L. Iwill now show that the function u thus defined is a utility ony*.

262

It suffices to show that for all pi,..., pm € [0,1] and o i , . . . ,meY, if ££LiPiateL, then

( TO

For if / , 5 G LJL, then / G ET-iPi^i and 5 G £"=I<7J6J, forsome p i , . . . , pm, a i , . . . , an G [0,1] and a i , . . . , am, 61 , . . . , bn GY. And so, since by Definition B.I

(B.6) and the fact that v is a utility on L imply

£[u(f)]<£[u(g)}

which is the desired result.I will prove (B.6) by induction on m. First I will show that

(B.6) holds for m = 3. (Of course, if (B.6) holds for m = 3,then it follows that (B.6) also holds for m = 1 and m = 2,since we can allow some or all of the ai appearing in (B.6) tobe identical.) Suppose then that ]T)?=i Piai ^ L. It follows thatfor i = 1,2,3,

+ (1 - Pi)a G L

and so3

i + (2/3)a = 2^(1/3)[piai + (1 — pi)a] G L.

Therefore

Thus (B.6) holds for m = 3.263

Now to complete the induction, suppose (B.6) is true for somera, and let Y^X1 Pi<*>i G L. Define Hu H2 G L by

m - l

PiCLi + (P™ + Prn+l)<*>

/m-l \H2 = pmO"m + Pm+lOm-fl + I ^ Pi 1 Q-

\i=l /

Then by assumption,

m - l

v(H2) = Pmu

Butm+1Y, (Pi/2)ai + (l/2)a - (1/2)^ + (1/2)H2i=l

and so

(m+l \ m+1

Thus (B.6) is true with m + 1 in place of m.

264

Appendix C

Sufficient conditions forAxiom 10

This appendix proves that when Axioms 1-8 are satisfied, thenthe following conditions are jointly sufficient (but not necessary)for Axiom 10:

(i) There are "best" and "worst" acts; that is, there existfc, / G D such that, for all / G D, k ^ / £ I.

(ii) For all / G D, if k -< / or / -< /, then there existsh G U L such that h ^ / or / ^ h (respectively).

As in Appendix B, we here use the fact that the set of simpleacts in D is (J L.

Suppose conditions (i) and (ii) are satisfied, and let / , g G Dbe such that / -< g. I will show that there exists h G (J L suchthat / -<h •< g. First note that, by (ii), there exist /ii, hi G (JLsuch that h\ -< g and / -< /i2- If either / ^ hi or /12 ^ p,there would be nothing more to prove; so suppose that theseconditions do not hold; that is, suppose hi -< / -< g -< hi- Let

a = sup J£[u(h)] : /i G | J L, /i -< / }

(3 = mf{£[u(h)]:he\jL, g * h} .

Obviously a < (3. Now since L is a convex set (Theorem B.4),there exists h G (J L such that

But then if a < (3, we would have

a < £[u(h)} < /?,265

and hence / -< h -< #, and the result would be established. Sosuppose instead that a = /3. Then we have

S[u(h)] = a = (3.

Now if h -< / , then /i -< /12, and so by Axiom 8 there existsb! e U L such that h^h' ^f. But then 5[ti(fe;)] > W O ] = a»contradicting the choice of a. Hence / •< h. A similar argumentshows that h •< g, and so we have the desired result that / •<h< g.

266

Appendix D

Proof of Theorem 8.2

This appendix gives a proof of Theorem 8.2, which establishesan expected utility representation for preferences over cogni-tive acts that are not necessarily simple. As in Appendix B,'p' denotes the unique probability function whose existence isvouched for by Theorem B.I. Also, it is assumed throughoutthis appendix that Axioms 1-11 are satisfied.

D.I COUNTABLE ADDITIVITY OF PROBABILITY

In this section, it will be shown that p is countably additive -that is, that for all disjoint sequences {Ai} in X, p(\JiAi) —J2iP(Ai). The proof of that result will utilize the following the-orem.

Theorem D.I. Ifp is not countably additive, then there existsA G X and a sequence {Ci} in X \ N such that p(A) > 1/2,

is a partition of X, andp(A\Ci) < 1/2 for all i G J+.

Proof.1 Let us say that a probability p is dilute if for alle > 0 there exists a measurable partition {Di} of X such that£ ; p ( A ) < e. Then it follows from Theorems 2.1 and 2.2 ofSchervish, Seidenfeld, and Kadane (1984) that any probabilityis a convex combination of a countably additive probability anda dilute probability. Thus for any probability p, we can write

p = apc + /3pd

where pc is a countably additive probability, pd is a dilute prob-ability, a G [0,1], and (3 = 1 - a.

Now suppose that p is not countably additive, so that /? > 0 inthe above decomposition. By the continuity of p (Theorem B.I,xThis proof is due to Teddy Seidenfeld (personal communication, April 5, 1984).

267

part iv), there is a partition {E\, E2, E3} of X such that p{E\) =p(E2) = 0.5 — 0.1/?. Also, by renaming E\ and E2 if necessary,we can suppose that Pd(Ei) > Pd(E2). Letting A = E\ U #3,we then have that p(A) = 0.5 + 0.1/3 and pd(A) > 0.5. By thedefinition of a dilute probability, there exists a partition {Ai}of A such that £;Pd(^i) < 0.01. Then

< 0.5-0.39/3< 0 . 5 - 0.1/3 = p(A).

It follows from this inequality that there is a partition {Bi} ofA such that p(Bi) > p(Ai) for all i G / + . For by the continuityof p, there is a disjoint sequence {Bi} of subsets of A such that,for all i £ I+,

Finally, for each i G /+ , let Ci = Ai U Bi. Then we have

The countable additivity of p is an easy corollary of Theo-rem D.I.

Theorem D.2. p is countably additive.

Proof. Suppose p is not countably additive. Then by Theo-rem D.I there exists A G X such that p(A) > 1/2, and thereexists a sequence {C^} in X \ N such that {Ci} is a partition ofX and p(A\Ci) < 1/2 for all i G /+ . Let a, b G Y be such thata -< 6, and define / , g G U L by the conditions

Let u be a utility on Y* (Definition B.3). Then u(a) < u(b).Also for all i G J+, if

f Olid

then £[u(f)] < £[u(g')], and so / -< g'; that is, / -< g given d-But f [^(/)] > £[u(g)]i and so / >- p, violating Axiom 9. Henceit is false that p is not countably additive. I

D.2 THE FUNCTION W

In this section I will define a function w with domain D andestablish some basic properties of this function - properties thatwill justify thinking of w as a kind of utility function on D. Butbefore turning to that, I will establish several useful results,beginning with the following strengthening of Axiom 10.

Theorem D.3. If / , g G D and f -< g, then there exists h GIJL such that f -< h -< g.

Proof. Let a, b G Y be such that a -< b. Then at least one ofthe following must hold:

(i) f*b.(ii) a -< g.

Suppose (i) holds. Then by Axiom 8, there exists a measurablepartition {A\,..., An} of X such that for alH — 1, . . . , n, if

then fi -< g. But by Theorem B.2, there exists j G {1, . . . ,n}such that / -< b given Aj, whence / -< fj -< g. Also, repeatingthe preceding argument with fj in place of p, we have that thereexists fj G D such that / -< /j -< fj. Then by Axiom 10 thereexists h G U L such that fj^h-< fj, whence / -< h -< #.

269

If (ii) holds, then using Axiom 8 and Theorem B.2 much asbefore, we have that there exist gj,g'j G D such that / -< gj -<g'j -< g. Then by Axiom 10 there exists h e \JL such thatgj £h£ 9j, whence / -<h^g again. I

The following theorem is proved by Savage (1954, p. 73). Hisproof of it is unaffected by the differences between his postulatesand my axioms, and so the theorem is stated here without proof.

Theorem D.4. If F\,F2 G L, g G D, and F\ -< g -< F2, thenthere exists a unique p G [0,1] such that pF\ + (1 — p)F2 ~ g.

Let ao G Y be such that ao -< a for some a G Y. By The-orems B.ll and B.12, there exists a function ixo such that u$is a utility on Y* (Definition B.3) and txo( o) = 0. We keep aoand ^o fixed throughout the remainder of this appendix. Alsowhen g e\JLI will for brevity write uo(g) in place of £[uo{g)},thus regarding uo as a function on (J L as well as on Y. No con-fusion should arise from this convention, since these two usesof the symbol UQ are distinguished by the different argumentsthat the function takes in each case. In the following theorem,for example, it is clear that ^o is being regarded as a function

Theorem D.5. For all f G D there exists g E \JL such thatg 3 / or f ~ 9- V there exist g,h e \JL such that g ^ / -<h,then

sup{uo(g) : 5GIJL, g £ /}

h€\JL, f*h}. (D.I)

Proof. The first part of the theorem follows immediately fromthe fact that ^ is connected (Axiom 1) and [jL ^ 0 (Axiom 2).

As for the second part, we have by Theorem D.4 that therealways exists k e\JL such that f ~ k. Thus

, g < f)

270

since k is in both sets. But the reverse inequality follows fromthe fact that uo is order preserving on |J L, and so (D.I) holds. I

In view of Theorem D.5, we can define a function w on D asfollows.

Definition D.I. For all f G D, if there exists g e \JL suchthat g ^ / , then

w(f) = sup {uo(g) : g G (JL, g * / }

and if there exists h G |J L such that f -<h, then

w(f) = ini{uo(h): h€[JL, f*h}.

It is clear from Definition D.I that w is an extended real-valued function; that is, w(f) G [—00,00] for all / G D. Also wagrees with UQ on (JL; that is, w(g) = uo(g) for all g G (J^. Sosince UQ is finite and order preserving on |J L, the same holds forw. The following two theorems generalize this result, showingthat w is in fact order preserving and finite on D.

Theorem D.6. For all f,g G D, f <g iffw(f) < w(g).

Proof. If / ~ g, it follows immediately from Definition D.I thatw(f) = W(Q)- Suppose then that / -< g. Applying Theorem D.3twice, we have that there exist /ii, /12 G |J L such that / <h\ -</12 < h. But then Definition D.I gives that

w(f) ^ w(hi) < w(h2) < w(g).

Thus we have shown that if / ^ (7, then w(f) < g.Reversing the role of / and g in the preceding argument, we

have that if g -< / then w(g) < w(f). So by contraposition, ifw(f) < w(g), then f <g. I

Before turning to the proof of the finiteness of w, I introducethe following notation, which will be used both in that proofand subsequently.

271

Definition D.2. For all f G D and A G X, fA is the uniqueelement of D such that

fA =

The finiteness of w may now be proved as follows.2

Theorem D.7. w is finite.

Proof. Suppose there is an / in D such that w(f) = oo. ThenDefinition D.I gives that

sup{uo(s) : 9 e | J L } = oo. (D.2)

Let A G X be such that A, A £ N. It is easy to verify that forall<?G|JL,

uo(g) =So we can infer from (D.2) that either

s\ip{uo(gA) : ge\jL} = oo (D.3)

ors\ip{uo(gA) : g G |JL} = oo

(or both). Suppose without loss of generality that (D.3) holds.Then since u$ is finite on (J L, we have that for all g G U L thereexists h €\JIJ such that uo(hA) > uo(gA), and hence hA >- gA.So if fA ^ gA for some g G IJ L, then fA -< hA for some h G |J L,and defining / ' by the condition

gives that / ' y f (by Theorem B.2). But this violates The-orem D.6, since w(f) < oo = w(f). Hence fA >- gA for all

and consequently w(fA) = oo.2A stronger result, namely that w is bounded, has been proved by Teddy Seiden-

feld (personal communication, June 8, 1983).

272

Now by our choice of ao, there exists a eY such that ao -< a.So if we let

we have (by Theorem B.2) that / " >- JA- But this violatesTheorem D.6, since w{ff) < oo = w(fA)- Hence our originalassumption, that w(f) = oo, is false. A similar argument showsthat w(f) ^ -oo. |

D.3 THE SIGNED MEASURES Wf

I will now define, for each act / in D, a real-valued function Wfon X. One can think of Wf{A), for any A £ X, as a measure ofthe expected utility of / when the event A is known to occur.The formal definition is as follows.

Definition D.3. For all f £ D, Wf is the function defined onX by the condition that for all 4 G X ,

wf(A) = w(fA).

This section will be devoted to establishing some propertiesof the functions Wf just defined. One property that the readercan easily verify - and which will be appealed to in subsequentproofs - is that Wf is finitely additive when f €\JIJ.

The following theorem shows that we can replace / , 5, and hin Definition D.I by /A, gA, and /i^, for any A £ X.

Theorem D.8. Let f £ D and AeX. If there exists g e\JLsuch that gA fA, then

w(fA) = sup {un(gA) : g £ |JL> 9A 3 JA\

and if there exists g £ |J L such that fA~£h,A, then

w(fA) = inf {uo(hA) : h £ | J L , fA^h

273

Proof. If p(A) = 1, then for all g G (J L, gA £ fA iff g ^ / , and^O(<M) = ^0(3)- The desired result then follows directly fromDefinition D.I.

Suppose there exist #, h G U L such that gA £ /A ^,h>A- Thenif G, if G L are such that gA G G and /i^ G if, we have byTheorem D.4 that for some p G [0,1],

pG + (l-p)H~fA.

Let B € X be such that for all y e Y,

HgjHy)nh^(y)] = PP[g^iy)n

Define k G (J L by the condition

{ a on B_

honB

Then fc^ G pG + (1 — p)H, and so fc>i ~ /^. The desired resultnow follows from the fact that uo is order preserving on |JLand

uo(kA) = w(kA) = w(fA).

Two cases remain to be considered:

(i) p(A) < 1 and for all g G |JL, 9A -< /A-(ii) p(A) < 1 and for all g G LJL, /A •« PA-

Suppose (i) holds, and for all E G X let

wE = s\ip{w(gE) : g G | J L } .

By Theorem D.7, w{fA) is finite, and so since 0 < U>A < w(fA),WA is also finite. Let e = w(fA) — WA, and suppose e > 0.Choose fc G D such that

w(kA) > m a x | ^ - | , OJ

274

and define / ' G D by the condition

By Definition D.I, there exists g £{JL such that

WUA) -e< w(g) < w(fA).

Since /A >- a, the weak inequality here can be replaced with astrict one; that is, there exists g E\Jh such that

U>UA) -£< w(g) < w{fA)-

For any such 5, there exists by Axiom 8 an / " G D such thatg -< f'X~< fA- But this means

w(fA)-e<w(f'JL)<w(fA).Since w(fA)—£ = WA, we then have that for all g G UL, f" y ggiven A.

Now if there exists / i G D such that W^A) > w^ then setting

h on A

we have that for all g e \JL, k y k' y g given A, and k ~k' y g given A, whence ky k1 y g. But this violates Axiom 10.Hence for all / i G D , w(/i^) < t&^. In particular, w(f^) < w^.Consequently, there exists / i G D such that w(f^) < w(h^).Then setting

h on A

gives k y f. By Axiom 10 we then have that there exists g GU L such that ff ^ g ^ k. Also a -< / ' , and so by an applicationof Theorem D.4 we infer that there exists g E [Jli such that

275

By the assumption (i), gA -< /A- Also since / ' = /A on A and/ ' £ /A giv e n -4) w e have that / ' £ ff

A, and so the fact thatg ~ f implies g y f'A = fA. In short, gA -< /A 3 5- It followsthat there exists 5 G X such that B D A and gB ~ IA- Alsosince w^ is finitely additive,

= W(9B) - w(gA) = w(fA) - w(gA) > e. (D.4)

Now define g1 G D by the condition

g' =g on B

Then g1 ~ g given i? and g1 ~ g given i?, whence g' ~ g ~ ff.But the fact that /A = /^ implies that </ = / ' on A, and so wehave that g' ~ f given A, whence g'^ = /^. Also since g' — aon B \ A, we have 3^ = #^ — 5B- Thus gg ~ f'A, whence

W(9B) = WU'A) > max{t&>i - | , 0} > ^ - e. (D.5)

Using (D.4) and (D.5) together with the additivity of wg, weobtain

But this contradicts the definition of w^ and so the assumptionthat e > 0 is false, whence e = 0. That is, W{JA) = WA, and sothe theorem holds in this case.

The proof that the theorem holds in case (ii) is similar to theproof just given with regard to (i). I

Theorem D.9. For all f G D, Wf is finitely additive.

Proof. Let / G D, and let {^1,^2} be a measurable partitionof A G X. Let

2

We want to show that e = 0.276

The following four cases are mutually exhaustive:

(i) There exists g G IJ L such that gAl 3 fAl and9A2 ^ / A 2 -

(ii) For all p G (JL, 5^1 £ IA1 and &42 £ /A 2 -(iii) There exists g G (J L such that 5^1 -< fAl and, for all

h€\JL, hM y fA2.(iv) There exists g G (J L such that 5^2 -< / A 2 and, for all

he\JL,hAl yfAl.

I now suppose that e > 0, and proceed to show that on thissupposition each of (i)-(iv) is impossible (from which we willbe able to conclude that e < 0).

If (i) holds, then by Theorem D.8 there exists g G (J L suchthat for i = 1,2:

Then since wg is additive, wg(A) > YA=I wf(Ai) ~ £ — wf(A)-Thus g y f given A, although g •< f given A\ and g < f givenA2. But this contradicts Theorem B.2, and so (i) is impossible.

If (ii) holds, then

2ini lwg(A) : g E[JL\ > 2wf(Ai) > wf(A)->

which contradicts Theorem D.8. So (ii) is impossible.If (iii) holds, then by Theorem D.8 there exists g E\Jli such

thatWf(Ai) - e < Wg(Ai) < Wf(Ai)

and (setting 6 = Wf{A\) — wg{A\))

Wf(A2) < wg(A2) < Wf{A2) + 6.

Then since

277

we have that QA >- /A- SO by Axiom 8, there is a measurablepartition {Bi, . . . , Bn} of A2 such that for i — 1, . . . , n, if

then gA>- I'A- NOW JA2 ~< &o by assumption, and so there existsk e { 1 , . . . , n) such that f'k y / . Then letting / ' = f'k, we have9A> J ' A ^ IA- Also /^2 >- /A2, SO using Theorem D.8 again wecan find gf E\Jh such that g1 =- g on A\, g1 •< g given A2, andg1 -< f given A2. Then ^ ( ^ 2 ) > ^ ( ^ 2 ) and

wg(A2) - wgl(A2) < wg(Ai) + wg(A2) - wf (A2)

<

So, by a further application of Theorem D.8, there exists g" EU L such that g" = g' on A2 and

Wg»(Al) = Wg(Al) + Wg(A 2) ~ Wg>(A2).

Then wgn{A\) = wg(A), and so g*A~ 9A-, whence 9A^ IA- - U^9M -< hi = IAV a n d 9A2

= 9A2 -< f'A2i s o t h a t A x i o m 8 isviolated. Consequently, (iii) is impossible.

To see that (iv) is impossible, one need only interchange A\and A2 in the proof that (iii) is impossible.

This completes the proof that £ < 0. The proof that e > 0is just the dual of the above, and so e = 0; that is, Wf(A) =Si=i wf(Ai). It then follows by mathematical induction thatWf is finitely additive. I

A real-valued function ip on X is said to be absolutely con-tinuous if for all e > 0 there exists 6 > 0 such that |< ( 4.)| < sfor all A G X for which p(A) < 6. I will now show that thefunctions Wf have this property.

Theorem D.10. For all f 6 D, Wf is absolutely continuous.

Proof. If Wf is not absolutely continuous, then there existse > 0 such that at least one of the following holds:

278

(i) For all 6 > 0 there exists i G X such that p{A) < 6 andWf(A) > e.

(ii) For all 6 > 0 there exists A G X such that p{A) < 6 andWf(A) < -e.

Suppose (i) holds for some e > 0. Then there exist sequencesin X, {pi} in (0, oo), and {gi} in (JL such that for all

(a) Wf(Ai) > e.(b) wgt > e.(c) Pi = £/max{uo(y) : y G Y, g^iy) i(d) p(Ai+1) < pi/4.

(The existence of a gi G U L satisfying (b) is guaranteed byTheorem D.8.) Then for all i G J+ we have by (b) that

e < p(Ai) max [uo(y) : y G Y", 5,"1 £ N } .Hence by (c) and (d),

4p(Ai+1)<pi<p(Ai). (D.6)Now define a sequence {i?j} in X by the condition that for alliG/+,

oo

Bi = Ai- | J Aj.

Note that in view of (D.6),

Pi U ^ ) ^ E PiAi) < MAi+i) < f • (D.7)We then have

oo

wgi(Bi) = ^(i4i)-t£;^( |J Aj)

> wgi(Ai)-p(

max{«0(y) : y € r, ( y )

> | , by (b), (c), and (D.7).

279

Since the Bi are pairwise disjoint, we can define a sequence {hi}in |J L by the condition

gi onhi = <

elsewhere

Then for all i G /+,

3=1 j=l

Thus w(/ii) —• oo as i —* oo. Now define h G D by the condition

Sj on B,-, j G J+elsewhere

Then for all j G J+, wh(Bj) = wg.(Bj) > 0, and so by Theo-rem D.6 h y hi given Bj, for all j > i. It follows by Axiom 9that h,y hi given Ujii+i Bj, and so since h = hi on Uj=i -Sj> w e

have by Theorem B.2 that h ^ hi. Thus w(h) > w(hi) for all i,whence ^(/i) = oo. Since this result contradicts Theorem D.7,we infer that (i) cannot hold for any e > 0.

A slight modification of the preceding argument shows sim-ilarly that (ii) also cannot hold for any e > 0, whence Wf isabsolutely continuous. I

Having now established that the functions Wf are both finite-ly additive and absolutely continuous, it follows easily that thesefunctions are in fact countably additive.

Theorem D. l l . For all f G D, Wf is countably additive.

Proof. Let {Ai} be a disjoint sequence in X, and let A =Uili M- Let e > 0. Then by Theorem D.10, there exists 6 > 0such that for all B G X, if p(B) < 6 then \wf(B)\ < e. And byTheorem D.2, there exists n G /+ such that p(Uiln+i M) < 8-

280

Then for all k > n we have (using Theorem D.10 and the fact

U < e.i=fc+l

Henceoo k

i

k 1

An extended real-valued function (p on X is said to be a signedmeasure if ip is countably additive and </?(0) = 0. Since it followsfrom Theorem D.10 that wf(0) = 0 for all / G D , Theorem D.llestablishes that the functions Wf are signed measures.

D.4 UTILITY ON Y

I will now define the notion of a utility on Y. Here I use theusual notation for composition of functions, writing ipoxf; for thecomposition of functions ip and , that is, the function

Definition D.4. A real-valued measurable function u with do-main Y is a utility on Y iff for aI! / , jED,• uo f is measurable, and

Essentially Definition D.4 says that a utility on Y is a func-tion whose expected value is order preserving on D. The con-cept of a utility on Y is thus stronger than that of a utility onY* (Definition B.3), since the expected value of a utility on Y*need only be order preserving on (J L.

In this section I will establish the existence of a utility on Y,and prove a uniqueness result for this utility.

Theorem D.12. There exists a utility on Y.

Proof. It has been established in Section D.3 that, for all/ G D, the function Wf is an absolutely continuous finite signed

281

measure. Consequently we have by the Radon-Nikodym Theo-rem (Halmos 1950, pp. 128f.) that for all / G D there exists areal-valued measurable function <p on X such that for all A G X,

wf(A)= f V dp. (D.8)JA

A function <p satisfying (D.8) is called a Radon-Nikodym deriva-tive Of Wf.

I will now show that for any h G EUE' there exists a Radon-Nikodym derivative ip of Wh such that

(i) For all y G Y, <p is constant on h~1(y).(ii) For all y G Y*, (p = uo(y) on h~l(y).

To this end, let <ff be a Radon-Nikodym derivative of WH- Lety G y , A G X, and A C fc"1^)- T h e n

tio(/iA) = ^o(yM^). (D.9)

Also, since p(A) > 0 only if y G Y*, there exists 5 G (Jk suchthat g = HA a.e.. But then

^O(^A) = ^0(5)? since g = hA a.e.= w(#), since uo = it; on U L= w(/iA), since ^ ~ ^ .

Hence (D.9) yields

wh{A) = ixo(yMA). (D.10)

From (D.10), and the fact that ip1 is a Radon-Nikodym deriva-tive of Wht it follows that <pf = uo(y) st-e- o n h~1(y). Also wehave from clause (ii) of Axiom 11 that there are at most count-ably many y G Y such that cpf is not constant on h~l(y). Soif

= {xeX: <ff(x

ipf not constant on h~1[h(x)]},

282

then B G N. Then defining <p on X by the condition

ip{x)

we have that ip is measurable and <p = <// a.e.. Thus </? is aRadon-Nikodym derivative of Wh that satisfies (i). As for (ii),observe that it follows from clauses (i), (iii), and (iv) of Ax-iom 11 that, if y G Y*, then either h~l{y) = 0 or plh'^y)] > 0;in the former case (ii) is trivially satisfied, and in the latter case(ii) follows from the fact that iff = uo(y) a.e. on h~1(y).

Now for each h G EUE7 we can define a function Vh on h(X)by the condition that for all x G X,

where <£> is a Radon-Nikodym derivative of it^ satisfying condi-tions (i) and (ii) above. For each of the functions Vh we defineanother function Uh by the following two conditions:

(a) If E7 is empty, then Uh = Vh-(b) If h! G E', then uhi = v^, and for all heE,

vh on h(X) \ ti(X)vhi on h(X) n ^

In view of clause (i) of Axiom 11, we may define a function uon Y by the condition

I will show that u is a utility on Y.The first step is to show that, for all / G D, w o f is mea-

surable. If / G E7 there is nothing to prove, since in that caseu o / = Vf o / , and the definition ofvfof entails that it is mea-surable. If / G E and E7 is empty, then again there is nothing

283

t o p r o v e , s i n c e i n t h i s c a s e a l s o u o f = vj o f. I f / € E a n d/ ' € E ' , t h e n

uof={ vf°fonf-1[f(X)-f'(X)}\ vf,ofonf-l[f(X)nf'(X)}

So if f(X)nf'(X) = 0, then uof = Ufof, which is measurable.On the other hand, if f(X) fl f(X) ± 0, then by clause (iv) ofAxiom 11 there exists y G Y such that f(X)C)f(X) = {y}; sinceY contains all the singleton sets of Y (Section 8.2.1), it followsthat f(X)nf'(X) G Y, and hence f-1[f(X)nf(X)] G X. Alsoclause (iv) of Axiom 11 entails that / = / ' on f~1[f(X)nf(X)},so that Vff o f is identical to the measurable function Vft o f onthe measurable set f~1[f(X)C\ff(X)]. Hence uof is measurableif / G E. Finally, if / G D, then by clause (iii) of Axiom 11 thereexists a sequence {fi} in EUE' and a measurable partition {A{}of X such that for all i G /+, / = fi on A{. We have shown thateach u o fa is measurable, and from this it follows that u o f ismeasurable.

It remains to show that the expected value of u is order pre-serving on D. Since w is order preserving on D (Theorem D.6),it suffices to show that £(u o f) = w(f), for all / G D. To thisend, note that if / G D, and if {fi} and {Ai} are the associatedsequences asserted to exist by clause (iii) of Axiom 11, then

w(f) = wf(X)oo

= ]T Wf (Ai), by Theorem D. 11

/ ^ definition ofoo .

i=i JA*

But Vf{ o fi differs from Uft o /i, if at all, only on a singletonset. The continuity of p (part iv of Theorem B.I) entails that

284

the probability of any singleton set is 0. Hence Vji o fi = Uji o fia.e., and so

oo c

Utilities on F are not in general unique up to a positive affinetransformation. However, the following weaker condition holds.

Theorem D.13. If u is a utility on Y, then a necessary andsufficient condition for a real-valued function v! on Y to be autility on Y is that there exist real numbers p and cr, p > 0,such that for all f G D, ur o / is measurable and

« o / = (puf + a) o f a.e..

Proof. If uf satisfies the stated condition, then for all / G Dwe have

£{u o / ) = p£(u' o / ) + cr.

Hence for all f,g G D,

/ ^ 9 ** S{u o / ) < £{u o g) & £{u! o f) < £{uf o g).

Thus the stated condition is sufficient for uf to be a utility on Y.To establish that the condition is necessary, let u and u1 be

two utilities on Y. Then u and u1 are in particular utilities ony*, and so it follows from Theorem B.ll that there exist realnumbers p and a, p > 0, such that u = pv! + a on Y*. Settingu" = pv! + cr, we then have that for all g G U L,

uog = u" og. (D.ll)

Now let / G D and suppose g •< f for some g G U L, so that

w(f) = sup {uo(g) : g-<f, g G | J

285

Let

e = S(uof)-sup{s(uog): 5 ^ / , g

I will show that e = 0. Since e > 0 simply by the fact that u isa utility on F, assume that e > 0. Then if 5 G U L and 5 •< / ,

Hence if A = [(tz o / ) - ( « o p)]"1 (0,00), then A G X \ N . Alsoby the absolute continuity of indefinite integrals (Halmos 1950,p. 97), there exists 6 > 0 such that for all B G X, if p(B) < 6then

(t* o / ) - (n o 0)] dp<e.)B

Then choosing B C A such that 0 < p(B) < 6 gives

0< f [(uof)-(uo g)] dp < e. (D.13)JB

Now define / ' G D by the condition

g on B

'•{f= •

/on B

Then

f(uo/ /)=f(tio/)-J^[(tio/)-(uo^)]dpand so by (D.13),

£(u o /) — e < £(u o /') < £{u o /). (D.14)

Now fix / ' as a particular act satisfying (D.14). Then in viewof (D.12) we have, for all g e\JL, that if g / then

Since this violates Axiom 10, we infer that e — 0; that is,

286

Obviously (D.15) also holds with un in place of u. Furthermore,we have from (D.ll) that

sup {£(u og) : g 3 / , g G

Consequently,e(uof) = £(u"of). (D.16)

This derivation of (D.16) has been made on the assumptionthat there is a g G |J L such that g •< f.Ii that assumption doesnot hold, then

w(f) = inf

and a simple modification of the previous argument for (D.16)shows that (D.16) holds in this case too. Hence (D.16) holds forall / G D.

Now for all C G X we have

(uof)dp = £(u o fc) - u(ao)p(C)Jc

= £{u" o fc) - u"(ao)p(C), by (D.ll) and (D.16)

= f(u"of)dp.JC

Hence u o / = u" o f a.e.. I

D.5 THE NEED FOR AXIOM 11

I conclude this appendix with an example that shows Theo-rem D.12 does not hold in the absence of Axiom 11.

Let X be the interval [—1,1], and let X be the Borel subsetsof X. Let Y = [0,1]2 U {0,1}, and let Y be the set of all A suchthat A\ {0,1} is a Borel subset of [0,1]2. Let D be the set of allfunctions from X toY such that for some measurable partition

of X:

(i) / = 0on^i.(ii) / = 1 on A2.

287

(iii) There is a measurable partition {2?i,..., Bn} of Assuch that for all i = 1, . . . , n there exists yi G [0,1]such that for all x G Bi, f(x) = (y , x + yi).

(iv) There is a measurable partition {i?i,..., Bn} of A4such that for alii = 1, . . . , n there exists yi G [0,1]such that for all x G Bi, f(x) = (yi — x, yi).

(Conditions (iii) and (iv) imply that f(A%) and f(A±) are sub-sets of [0,1]2. In fact, f(As) is a finite union of sets of the form{y} x C, where y G [0,1] and C C [0,1]; similarly f{A±) is afinite union of sets of the form C x {y}. Note also that for all(2/1*2/2) € [0,1]2, f~1(yi,y2) contains at most the single point2/2-2/1.)

Obviously A\ and A2 are uniquely determined by conditions(i) and (ii). Also As and A4 are determined to within a finite setby conditions (iii) and (iv). For suppose {Ai, A2,As, A4} and{Ai, A2, A's, A'A} are two measurable partitions of X satisfying(i)-(iv). As is a finite union of sets of the form

r\{yi} x Ci)

and A4 is a finite union of sets of the form

Hence As fl A± is a finite union of sets of the form

But /~1{(2/1J2/2)} is a singleton set, and so As H A'4 is finite.Hence As \ A's is finite. Similar reasoning shows that A'3 \ ^3 ,A4 \ A4, and A'± \ A4 are also finite, so that As and A4 aredetermined to within a finite set, as claimed.

Since the Lebesgue measure of any finite set is zero, theLebesgue measure of the sets Ai, A2, ^.3, and A4 is uniquelydetermined. Hence we may define a function Q on D by thecondition that, for all / G D, and for any measurable partition{Ai,A2,A3,A4} of X satisfying (i)-(iv), Q(f) is the Lebesgue

288

measure of A\ U A3. A relation •< on D may then be defined bystipulating that for all f,g G D,

f<g iff Q(f)<Q(g).

It is easy to verify that, with N defined by Definition 8.2, Ax-ioms 1-8 are all satisfied. (Note that the consequences 0 and1 satisfy Axiom 2, and all the other consequences belong toY\Y*.) Also the countable additivity of Lebesgue measure en-sures that Axiom 9 is satisfied. And for all / G D, if

. 1 on Ai U A3

0 on A2 U A4

then f G U L and / ' ~ / ; so Axiom 10 is satisfied. There are,however, no sets E and E' in D that satisfy Axiom 11. AndTheorem D.12 fails to hold, as I will now show.

If there is a utility on Y, then there is in particular a utility usuch that u(0) = 0 and u(l) = 1; furthermore, we may supposethat u is nonnegative. I will show that, for such a choice of u,

r1 r1

/ / u(yljJy 1=0 J2/2=0

/ /Jy 1=0 J2/2

and

/ / u(yuJy2=0 Jyi=0

But by Tonelli's theorem (Royden 1968, p. 270), not both of(D.17) and (D.18) can be true. Hence there is no utility on Y.

To establish (D.17), observe that for all y\ G [0,1] there exists/ G D such that

{yi,x + yi) for x G [-yi, 1 - yi0 elsewhere

Let p be the probability on X asserted to exist by Theorem B.I.It is easily verified that this probability is equal to one half of

289

Lebesgue measure on X. Hence

£(uof) = / (uof)dpJx=-yirl-yi

Jx=-yil r1

= o / u(yi,y2)dy2.* Jy2=0

Now if / ' G D is defined by the condition

[ 0 elsewhere

then / ' ~ / . So since u is a utility on y ,

5(u o / ) = S(u o / ' ) = p [ - y i , 1 - yi] = i . (D.20)

Combining (D.19) and (D.20) gives

Integrating over y\ in (D.21) yields (D.17).The proof of (D.18) is similar. Observe that for all y2 G [0,1]

there exists g G D such that

(y2 - x, y2) for x e [y2 - 1, y2]

0 elsewhere

Then

£(uog) = / (uog)dpJx=y2-l[V2

= / u(y2 - x, y2) dpJx=y2-li r1

= 5 / u(yuy2)dyi. (D.22)

290

Now if gf E D is the act with constant value 0, then g' ~ g. Sosince u is a utility on y ,

£(u og)= £(u o g') = 0. (D.23)

Combining (D.22) and (D.23) gives

/ u(yi,y2)dyi = 0. (D.24)J 2/1=0

Integrating over j/2 in (D.24) yields (D.18).

291

Bibliography

Allais, Maurice. 1979. "The So-Called Allais Paradox and Ra-tional Decisions under Uncertainty." In Maurice Allais andOle Hagen (eds.), Expected Utility Hypotheses and the Al-lais Paradox. Dordrecht: D. Reidel, pp. 437-681.

Anand, Paul. 1987. "Are the Preference Axioms Really Ratio-nal?" Theory and Decision 23:189-214.

Armendt, Brad. 1980. "Is There a Dutch Book Argument forProbability Kinematics?" Philosophy of Science 47:583-8.

Bacon, Francis. 1620. Novum Organum. English translation inJames Spedding, Robert Leslie Ellis, and Douglas DenonHeath (eds.), The Works of Francis Bacon, vol. 8. Boston:Brown and Taggard, 1863.

Bar-Hillel, Maya, and Avishai Margalit. 1988. "How ViciousAre Cycles of Intransitive Choice?" Theory and Decision24:119-45.

Bar-Hillel, Yehoshua, and Rudolf Carnap. 1953. "Semantic In-formation." British Journal for the Philosophy of Science4:147-57.

Bell, David E. 1982. "Regret in Decision Making under Uncer-tainty." Operations Research 30:961-81.

Bernoulli, Daniel. 1738. "Specimen Theoriae de MensuraSortis." Commentarii Academiae Sicentarum ImperialisPetropolitanae V: 175-92. English translation by LouiseSommer in Econometrica 22 (1954), 23-36; reprinted inAlfred N. Page (ed.), Utility Theory: A Book of Readings,New York: Wiley, 1968.

Blackburn, Simon. 1984. Spreading the Word. Oxford: OxfordUniversity Press.

292

Blyth, Colin R. 1972. "Some Probability Paradoxes in Choicefrom among Random Alternatives." Journal of the Amer-ican Statistical Association 67:366-73.

Bratman, Michael E. 1987. Intention, Plans, and Practical Rea-son. Cambridge, Mass.: Harvard University Press.

Broome, John. 1991. "Rationality and the Sure-Thing Princi-ple." In Gay Meeks (ed.), Thoughtful Economic Man. Cam-bridge: Cambridge University Press, pp. 74-102.

Brown, Peter M. 1976. "Conditionalization and Expected Util-ity." Philosophy of Science 43:415-19.

Burks, Arthur W. 1977. Chance, Cause, Reason. Chicago: Uni-versity of Chicago Press.

Campbell, Richmond, and Thomas Vinci. 1982. "Why AreNovel Predictions Important?" Pacific Philosophical Quar-terly 63:111-21.

1983. "Novel Confirmation." British Journal for the Philoso-phy of Science 34:315-41.

Carnap, Rudolf. 1950. Logical Foundations of Probability.Chicago: Chicago University Press; 2d ed. 1962.

1952. The Continuum of Inductive Methods. Chicago: Uni-versity of Chicago Press.

Cavendish, Henry. 1879. The Electrical Researches of the Hon-orable Henry Cavendish. Edited by J. Clerk Maxwell, Cam-bridge: Cambridge University Press.

Christensen, David. 1991. "Clever Bookies and Coherent Be-liefs." Philosophical Review 100:229-47.

Cohen, L. Jonathan. 1977. The Probable and the Provable. Ox-ford: Oxford University Press.

Crick, Francis. 1988. What Mad Pursuit. New York: Basic Books.Davidson, Donald. 1980. Essays on Actions and Events. New

York: Oxford University Press.1984. Inquiries into Truth and Interpretation. New York: Ox-

ford University Press.293

1985. "Incoherence and Irrationality." Dialectica 39:345-54.Davidson, Donald, J. C. C. McKinsey, and Patrick Suppes.

1955. "Outlines of a Formal Theory of Value, I." Philoso-phy of Science 22:60-80.

de Finetti, Bruno. 1931. "Sul Significato soggettivo della prob-abilita." Fundamenta Mathematicae 17:298-329.

1937. "La Prevision: ses lois logiques, ses sources subjectives."Annales de l'lnstitut Henri Poincare 7:1-68. Page refer-ences are to the English translation in Henry E. KyburgJr. and Howard E. Smokier (eds.), Studies in SubjectiveProbability, New York: Krieger, 1980.

Doppelt, Gerald. 1978. "Kuhn's Epistemological Relativism: AnInterpretation and Defense." Inquiry 21:33-86.

Dorling, Jon. 1972. "Bayesianism and the Rationality of Scien-tific Inference." British Journal for the Philosophy of Sci-ence 23:181-90.

1974. "Henry Cavendish's Deduction of the Electrostatic In-verse Square Law from the Result of a Single Experiment."Studies in History and Philosophy of Science 4:327-48.

Duhem, Pierre. 1914. La Theorie Physique: Son Objet, SaStructure. Paris: Marcel Riviere & Cie. 2d ed. Translatedby P. P. Wiener as The Aim and Structure of PhysicalTheory, Princeton: Princeton University Press, 1954.

Eells, Ellery. 1982. Rational Decision and Causality Cambridge:Cambridge University Press.

Ellis, Brian. 1988. "Solving the Problem of Induction Using aValues-Based Epistemology." British Journal for the Phi-losophy of Science 39:141-60.

Ellsberg, Daniel. 1961. "Risk, Ambiguity, and the SavageAxioms." Quarterly Journal of Economics 75:643-99.Reprinted in Peter Gardenfors and Nils-Eric Sahlin (eds.),Decision, Probability, and Utility, Cambridge: CambridgeUniversity Press, 1988.

Elster, Jon. 1983. Sour Grapes. Cambridge: Cambridge Univer-sity Press.

294

Fishburn, Peter C. 1981. "Subjective Expected Utility: A Re-view of Normative Theories." Theory and Decision 13:129-99.

Foley, Richard. 1987. The Theory of Epistemic RationalityCambridge, Mass.: Harvard University Press.

Franklin, Allan, and Colin Howson. 1985. "Newton and Kepler,A Bayesian Approach." Studies in History and Philoso-phy of Science 16:379-85. Reprinted in Allan Franklin, TheNeglect of Experiment, Cambridge: Cambridge UniversityPress, 1986, pp. 119-23.

Gardenfors, Peter, and Nils-Eric Sahlin (eds.). 1988. Decision,Probability and Utility. Cambridge: Cambridge UniversityPress.

Gibbard, Allan. 1990. Wise Choices, Apt Feelings. Cambridge,Mass.: Harvard University Press.

Gibbard, Allan, and William L. Harper 1978. "Counterfactu-als and Two Kinds of Expected Utility." In C. A. Hooker,J. J. Leach, and E. F. McClennen (eds.), Foundations andApplications of Decision Theory, vol. 1. Dordrecht: D. Rei-del, pp. 123-62.

Giere, Ronald M. 1983. "Testing Theoretical Hypotheses." InJohn Earman (ed.), Testing Scientific Theories. Minneapo-lis: University of Minnesota Press, pp. 269-98.

Goldstein, Michael. 1983. "The Prevision of a Prevision." Jour-nal of the Americal Statistical Association 78:817-19.

Good, I. J. 1967. "On the Principle of Total Evidence." BritishJournal for the Philosophy of Science 17:319-21. Reprintedin I. J. Good, Good Thinking, Minneapolis: University ofMinnesota Press, 1983.

1969. "What Is the Use of a Distribution?" In P. R. Krishna-iah (ed.), Multivariate Analysis-II. New York and London:Academic Press, pp. 183-203.

1983. Good Thinking. Minneapolis: University of MinnesotaPress.

295

1988. "The Interface between Statistics and Philosophy ofScience." Statistical Science 3:386-412.

Goodman, Nelson. 1973. Fact, Fiction, and Forecast. 3d ed.Indianapolis: Hackett.

Gupta, Anil. 1989. "Remarks on Definitions and the Concept ofTruth." Proceedings of the Aristotelian Society 89:227-46.

Halmos, Paul R. 1950. Measure Theory. New York: Van Nos-trand.

Hammond, Peter J. 1988a. "Consequentialist Foundations forExpected Utility." Theory and Decision 25:25-78.

1988b. "Consequentialism and the Independence Axiom." InBertrand R. Munier (ed.), Risk, Decision and Rationality.Dordrecht: D. Reidel, pp. 503-16.

Heilig, Klaus. 1978. "Carnap and de Finetti on Bets and theProbability of Singular Events: The Dutch Book ArgumentReconsidered." British Journal for the Philosophy of Sci-ence 29:325-46.

Hempel, Carl G. 1960. "Inductive Inconsistencies." Synthesei2:439-69. Reprinted in Carl G. Hempel, Aspects of Sci-entific Explanation, New York: The Free Press, 1965.

1962. "Deductive-Nomological vs Statistical Explanation." InH. Feigl and G. Maxwell (eds.), Minnesota Studies in thePhilosophy of Science III. Minneapolis: University of Min-nesota Press, pp. 98-169.

Hempel, Carl G., and Paul Oppenheim. 1948. "Studies in theLogic of Explanation." Philosophy of Science 15:135-75.Reprinted in Carl G. Hempel, Aspects of Scientific Expla-nation, New York: The Free Press, 1965.

Horwich, Paul. 1982. Probability and Evidence. Cambridge:Cambridge University Press.

Howson, Colin, and Allan Franklin. 1991. "Maher, Mendeleevand Bayesianism." Philosophy of Science 58:574-85.

Howson, Colin, and Peter Urbach. 1989. Scientific Reasoning.La Salle, 111.: Open Court.

Hughes, R. I. G. 1980. "Rationality and Intransitive Preferences." Analysis 40:132-4.

Hurley, S. L. 1989. Natural Reasons. New York: Oxford University Press.

Huygens, Christiaan. 1690. Traité de la Lumière. Leiden: Pierre van der Aa. Translated by S. P. Thompson as Treatise on Light, London: Macmillan, 1912.

Jeffrey, Richard C. 1965. The Logic of Decision. New York: McGraw-Hill; 2d ed., Chicago: University of Chicago Press, 1983.

1974. "Preference among Preferences." Journal of Philosophy 71:377-91. Reprinted in Richard C. Jeffrey, The Logic of Decision, 2d ed. Chicago: University of Chicago Press, 1983.

1983. "The Sure Thing Principle." In Peter D. Asquith and Thomas Nickles (eds.), PSA 1982, vol. 2. East Lansing: Philosophy of Science Association, pp. 719-30.

1987. "Risk and Human Rationality." The Monist 70:223-36.

1988. "Conditioning, Kinematics, and Exchangeability." In Brian Skyrms and William L. Harper (eds.), Causation, Chance, and Credence, vol. 1. Dordrecht: Kluwer, pp. 221-55.

Jeffreys, Harold. 1931. Scientific Inference. Cambridge: Cambridge University Press; 3d ed., 1973.

Jensen, Niels Erik. 1967. "An Introduction to Bernoullian Utility Theory I." Swedish Journal of Economics 69:163-83.

Kahneman, Daniel, Paul Slovic, and Amos Tversky (eds.). 1982. Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.

Kahneman, Daniel, and Amos Tversky. 1979. "Prospect Theory: An Analysis of Decisions under Risk." Econometrica 47:263-91. Reprinted in Peter Gärdenfors and Nils-Eric Sahlin (eds.), Decision, Probability, and Utility, Cambridge: Cambridge University Press, 1988.

Kaku, Michio, and Jennifer Trainer. 1987. Beyond Einstein. New York: Bantam.

Kaplan, Mark. 1981. "Rational Acceptance." Philosophical Studies 40:129-45.

Kashima, Yoshihisa, and Patrick Maher. 1992. "Framing of Decisions under Ambiguity." Department of Psychology, La Trobe University.

Kemeny, John G. 1955. "Two Measures of Simplicity." Journal of Philosophy 52:722-33.

Koertge, Noretta. 1979. "The Problem of Appraising Scientific Theories." In Peter D. Asquith and Henry E. Kyburg, Jr. (eds.), Current Research in Philosophy of Science, East Lansing: Philosophy of Science Association, pp. 228-51.

Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. Chicago: University of Chicago Press; 2d ed., 1970.

1977. "Objectivity, Value Judgment, and Theory Choice." In Thomas S. Kuhn, The Essential Tension. Chicago: University of Chicago Press, pp. 320-39.

1983. "Rationality and Theory Choice." Journal of Philosophy 80:563-70.

Kukla, Andre. 1991. "Criteria of Rationality and the Problem of Logical Sloth." Philosophy of Science 58:486-90.

Kyburg, Henry E., Jr. 1961. Probability and the Logic of Rational Belief. Middletown, Conn.: Wesleyan University Press.

1968. "Bets and Beliefs." American Philosophical Quarterly 5:63-78. Reprinted in Peter Gärdenfors and Nils-Eric Sahlin (eds.), Decision, Probability, and Utility, Cambridge: Cambridge University Press, 1988.

1978. "Subjective Probability: Criticisms, Reflections, and Problems." Journal of Philosophical Logic 7:157-80.

Lamport, Leslie. 1986. LaTeX: A Document Preparation System. Reading, Mass.: Addison-Wesley.

Laudan, Larry. 1977. Progress and Its Problems. Berkeley: University of California Press.

1981. "A Confutation of Convergent Realism." Philosophy of Science 48:19-49.

Lehman, R. Sherman. 1955. "On Confirmation and Rational Betting." Journal of Symbolic Logic 20:251-62.

Lehrer, Keith, and Carl Wagner. 1985. "Intransitive Indifference: The Semi-Order Problem." Synthese 65:249-56.

Leibniz, Gottfried Wilhelm. 1678. "Letter to Herman Conring." In Gottfried Wilhelm Leibniz, Philosophical Papers and Letters. Dordrecht: D. Reidel, pp. 186-91.

Levi, Isaac. 1967. Gambling with Truth. Cambridge, Mass.: MIT Press.

1974. "On Indeterminate Probabilities." Journal of Philosophy 71:391-418.

1976. "Acceptance Revisited." In Radu J. Bogdan (ed.), Local Induction. Dordrecht: D. Reidel, pp. 1-71.

1980. The Enterprise of Knowledge. Cambridge, Mass.: MIT Press.

1984. Decisions and Revisions. Cambridge: Cambridge University Press.

1986. Hard Choices. Cambridge: Cambridge University Press.

1987. "The Demons of Decision." The Monist 70:193-211.

Lewis, David. 1981. "Causal Decision Theory." Australasian Journal of Philosophy 59:5-30.

Luce, R. Duncan. 1956. "Semiorders and a Theory of Utility Discrimination." Econometrica 24:178-91.

McClennen, Edward F. 1983. "Sure Thing Doubts." In Bernt P. Stigum and Fred Wenstøp (eds.), Foundations of Utility and Risk Theory with Applications. Dordrecht: D. Reidel, pp. 117-36. Reprinted in Peter Gärdenfors and Nils-Eric Sahlin (eds.), Decision, Probability, and Utility, Cambridge: Cambridge University Press, 1988.

1990. Rationality and Dynamic Choice. Cambridge: Cambridge University Press.

MacCrimmon, Kenneth R. 1968. "Descriptive and Normative Implications of Decision Theory." In Karl Borch and Jan Mossin (eds.), Risk and Uncertainty. New York: St. Martin's Press, pp. 3-23.

MacCrimmon, Kenneth R., and Stig Larsson. 1979. "Utility Theory: Axioms versus 'Paradoxes'." In Maurice Allais and Ole Hagen (eds.), Expected Utility Hypotheses and the Allais Paradox. Dordrecht: D. Reidel, pp. 333-409.

Maher, Patrick. 1984. Rationality and Belief. Doctoral dissertation, University of Pittsburgh.

1986a. "What Is Wrong with Strict Bayesianism?" PSA 1986, vol. 1:450-7.

1986b. "The Irrelevance of Belief to Rational Action." Erkenntnis 24:363-84.

1987. "Causality in the Logic of Decision." Theory and Decision 22:155-72.

1988. "Prediction, Accommodation, and the Logic of Discovery." PSA 1988, vol. 1:273-85.

1989. "Levi on the Allais and Ellsberg Problems." Economics and Philosophy 5:69-78.

1990a. "How Prediction Enhances Confirmation." In J. Michael Dunn and Anil Gupta (eds.), Truth or Consequences: Essays in Honor of Nuel Belnap. Dordrecht: Kluwer, pp. 327-43.

1990b. "Symptomatic Acts and the Value of Evidence in Causal Decision Theory." Philosophy of Science 57:479-98.

1990c. "Why Scientists Gather Evidence." British Journal for the Philosophy of Science 41:103-19.

1992. "Diachronic Rationality." Philosophy of Science 59:120-41.

In press. "Howson and Franklin on Prediction." Philosophy of Science.

Maher, Patrick, and Yoshihisa Kashima. 1991. "On the Descriptive Adequacy of Levi's Decision Theory." Economics and Philosophy 7:93-100.

Mao Tse-tung. 1966. Quotations from Chairman Mao Tse-Tung. Peking: Foreign Languages Press.

Markowitz, Harry M. 1959. Portfolio Selection. New York: Wiley.

Miller, David. 1974. "Popper's Qualitative Theory of Verisimilitude." British Journal for the Philosophy of Science 25:166-77.

Moore, G. E. 1942. "A Reply to My Critics." In Paul Arthur Schilpp (ed.), The Philosophy of G. E. Moore. Evanston, Ill.: Northwestern University, pp. 535-677.

Morrison, Donald. 1967. "On the Consistency of Preferences in Allais' paradox." Behavioral Science 5:225-42.

Moscowitz, Herbert. 1974. "Effects of Problem Representation and Feedback on Rational Behavior in Allais and Morlat-type Problems." Decision Sciences 5:225-41.

Newton, Isaac. 1726. Philosophiae Naturalis Principia Mathematica. 3d ed. English translation by Andrew Motte, revised by Florian Cajori, Berkeley: University of California Press, 1934.

Newton-Smith, W. H. 1981. The Rationality of Science. Boston: Routledge and Kegan Paul.

Niiniluoto, Ilkka. 1984. Is Science Progressive? Dordrecht: D. Reidel.

1986. "Truthlikeness and Bayesian Estimation." Synthese 67:321-46.

1987. Truthlikeness. Dordrecht: D. Reidel.

Oddie, Graham. 1981. "Verisimilitude Reviewed." British Journal for the Philosophy of Science 32:237-65.

Packard, Dennis J. 1982. "Cyclical Preference Logic." Theory and Decision 14:415-26.

Peirce, Charles S. 1883. "A Theory of Probable Inference." In Charles S. Peirce (ed.), Studies in Logic. Boston: Little, Brown, pp. 126-81.

Popper, Karl R. 1959. The Logic of Scientific Discovery. New York: Harper and Row; 2d ed., 1968.

1963. Conjectures and Refutations. New York: Harper and Row; 2d ed., 1965.

1972. Objective Knowledge. Oxford: Oxford University Press; rev. ed., 1979.

Raiffa, Howard. 1968. Decision Analysis. Reading, Mass.: Addison-Wesley.

Railton, Peter. 1984. "Alienation, Consequentialism, and the Demands of Morality." Philosophy & Public Affairs 13:134-71.

Ramsey, F. P. 1926. "Truth and Probability." In F. P. Ramsey, Foundations. Atlantic Highlands, N.J.: Humanities Press, pp. 58-100.

Rawls, John. 1971. A Theory of Justice. Cambridge, Mass.: Harvard University Press.

Rosenkrantz, Roger D. 1977. Inference, Method and Decision. Dordrecht: D. Reidel.

1981. Foundations and Applications of Inductive Probability. Atascadero, Calif.: Ridgeview.

Royden, H. L. 1968. Real Analysis. 2d ed. New York: Macmillan.

Salmon, Wesley C. 1967. The Foundations of Scientific Inference. Pittsburgh: University of Pittsburgh Press.

Savage, Leonard J. 1954. The Foundations of Statistics. New York: John Wiley; 2d ed., New York: Dover, 1972.

1971. "The Elicitation of Personal Probabilities and Expectations." Journal of the American Statistical Association 66:783-801.

Schervish, Mark J., Teddy Seidenfeld, and Joseph B. Kadane. 1984. "The Extent of Non-Conglomerability of Finitely Additive Probabilities." Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 66:205-26.

Schick, Frederic. 1986. "Dutch Bookies and Money Pumps." Journal of Philosophy 83:112-19.

Seidenfeld, Teddy. 1988. "Decision Theory Without 'Independence' or without 'Ordering,' What Is the Difference?" Economics and Philosophy 4:267-90.

Seidenfeld, Teddy, and Mark Schervish. 1983. "A Conflict between Finite Additivity and Avoiding Dutch Book." Philosophy of Science 50:398-412.

Sen, Amartya K. 1971. "Choice Functions and Revealed Preference." Review of Economic Studies 38:307-17.

Shannon, C. E., and W. Weaver. 1949. The Mathematical Theory of Communication. Urbana: University of Illinois Press.

Shimony, Abner. 1955. "Coherence and the Axioms of Confirmation." Journal of Symbolic Logic 20:1-28.

1970. "Scientific Inference." In Robert Colodny (ed.), The Nature and Function of Scientific Theories. Pittsburgh: University of Pittsburgh Press, pp. 79-172.

Skyrms, Brian. 1984. Pragmatics and Empiricism. New Haven, Conn.: Yale University Press.

1987a. "Coherence." In Nicholas Rescher (ed.), Scientific Inquiry in Philosophical Perspective. Lanham, Md.: University Press of America, pp. 225-41.

1987b. "Dynamic Coherence and Probability Kinematics." Philosophy of Science 54:1-20.

1990a. "The Value of Knowledge." In C. Wade Savage (ed.), Scientific Theories. Minneapolis: University of Minnesota Press, pp. 245-66.

1990b. The Dynamics of Rational Deliberation. Cambridge, Mass.: Harvard University Press.

In press. "A Mistake in Dynamic Coherence Arguments?" Philosophy of Science.

Slovic, Paul, and Amos Tversky. 1974. "Who Accepts Savage's Axiom?" Behavioral Science 19:368-73.

Sober, Elliott. 1975. Simplicity. London: Oxford University Press.

Stich, Stephen P. 1983. From Folk Psychology to Cognitive Science. Cambridge, Mass.: MIT Press.

Szaniawski, Klemens. 1976. "Types of Information and Their Role in the Methodology of Science." In Marian Przełęcki, Klemens Szaniawski, and Ryszard Wójcicki (eds.), Formal Methods in the Methodology of Empirical Sciences. Dordrecht: D. Reidel, pp. 297-308.

Talbott, W. J. 1991. "Two Principles of Bayesian Epistemology." Philosophical Studies 62:135-50.

Taylor, Charles. 1982. "Rationality." In Martin Hollis and Steven Lukes (eds.), Rationality and Relativism. Cambridge, Mass.: MIT Press, pp. 87-105.

Teller, Paul. 1973. "Conditionalization and Observation." Synthese 26:218-58.

1976. "Conditionalization, Observation, and Change of Preference." In W. Harper and C. Hooker (eds.), Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science. Dordrecht: D. Reidel, pp. 205-53.

Thagard, Paul R. 1988. Computational Philosophy of Science. Cambridge, Mass.: MIT Press.

Tichy, Pavel. 1974. "On Popper's Definitions of Verisimilitude." British Journal for the Philosophy of Science 25:155-60.

1978. "Verisimilitude Revisited." Synthese 38:175-96.Tversky, Amos. 1969. "Intransitivity of Preferences." Psycho-

logical Review 76:31-48.Urbach, Peter. 1983. "Intimations of Similarity: The Shaky Ba-

sis of Verisimilitude." British Journal for the Philosophy ofScience 34:266-75.

van Fraassen, Bas C. 1980. The Scientific Image. New York: Oxford University Press.

1983. "Glymour on Evidence and Explanation." In John Earman (ed.), Testing Scientific Theories. Minneapolis: University of Minnesota Press, pp. 165-76.

1984. "Belief and the Will." Journal of Philosophy 81:235-56.

1985. "Empiricism in the Philosophy of Science." In Paul M. Churchland and Clifford A. Hooker (eds.), Images of Science. Chicago: University of Chicago Press, pp. 245-308.

1989. Laws and Symmetry. New York: Oxford University Press.

In press. "Rationality Does Not Require Conditionalization." In Edna Ullmann-Margalit (ed.), The Israel Colloquium: Studies in the History, Philosophy and Sociology of Science, vol. V. Dordrecht: Kluwer.

von Neumann, John, and Oskar Morgenstern. 1947. Theory of Games and Economic Behavior. 2d ed. Princeton, N.J.: Princeton University Press.

Wakker, Peter. 1988. "Nonexpected Utility as Aversion of Information." Journal of Behavioral Decision Making 1:169-75.

Whewell, William. 1847. The Philosophy of the Inductive Sciences. 2d ed. London: Parker.

Wilson, Timothy D., Jay G. Hull, and Jim Johnson. 1981. "Awareness and Self-Perception: Verbal Reports on Internal States." Journal of Personality and Social Psychology 40:53-71.

Worrall, John. 1989. "Fresnel, Poisson and the White Spot: The Role of Successful Predictions in the Acceptance of Scientific Theories." In David Gooding, Trevor Pinch, and Simon Schaffer (eds.), The Uses of Experiment. Cambridge: Cambridge University Press, pp. 135-57.

Index

acceptance, 130-3; and action, 149-52; vs. probability, 133-9, 146-7; random, 148-9; rational, 139-47; and will, 147-8; see also belief

act (or option), 1-2, 4-5, 49, 51-2; of acceptance, 148; constant, 183, 187-8; Savage, 183; set D of, 186-7, 201-2; simple, 196-7, 202-3; uninterpretable, 183-5

Allais, Maurice, 63, 64, 82
Allais problems, 63-6
alternative hypotheses, 169-73
Anand, Paul, 48
Armendt, Brad, xii, 128

Bacon, Francis, 86
Bar-Hillel, Maya, 48, 49, 51
Bar-Hillel, Yehoshua, 234
Bayesian decision theory, 1, 29
Bayesianism, qualified, 29-33
belief, 130, 152-5
Bell, David E., 65
Bernoulli, Daniel, 226
Bernoulli, Nikolaus, 226
Blackburn, Simon, 208
Blyth, Colin R., 48-9
Bratman, Michael E., 13n
Broome, John, 65, 68
Brown, Peter M., 123n
Burks, Arthur, 49-51, 53

calculation, 5-8
Campbell, Richmond, 86n
Carnap, Rudolf, 29, 221, 234
Cavendish, Henry, 139-40, 163-4, 194
choice function, 22
choice set, 22
Christensen, David, 113
cognitive utility, 141-2, 185
Cohen, L. Jonathan, 135n
conditionalization, 32, 84-5, 120-7
Condorcet paradox, 51
confirmation, 84-6
confirmation theory, explanatory value of, 85-8; subjectivity of, 91-2

connectedness, 10, 19-21, 46, 54, 184, 187

consequence, 1-5, 82; cognitive, 186, 200-1

consequentialism, 38-42
consistency, logical, 134-6, 216
continuity, 193-5
contradiction, see consistency, logical
Copernicus, Nicolaus, 166-8
corpus, 142, 211
Crick, Francis, 215

Davidson, Donald, 9, 18, 24, 36
deductive closure, 32-3
de Finetti, Bruno, 91-2, 99-102, 199-200
diachronic separability, 74, 77-8, 80
discrimination threshold, 57-60
dominance, 110-12, 191-2, 198
Doppelt, Gerald, 217, 218n
Dorling, Jon, 88n, 163-4
Duhem, Pierre, 86
Dutch book arguments, 94-104, 199-200; diachronic, 106-7, 110-13, 120, 128

Eddington, A. S., 169-70
Eells, Ellery, 29, 65, 68
Einstein, Albert, 137-8, 169-70
Ellis, Brian, 211
Ellsberg, Daniel, 66-7, 76n, 82
Ellsberg problems, 66-70
Elster, Jon, 31
empirical adequacy, 240
empiricism, 93-4
event, 133, 186
evidence, value of, 79-81, 108, 173-80, 212
expected utility, 1-5
experiment, 188-90

Fishburn, Peter C., 10
Foley, Richard, 24
Franklin, Allan, 86n, 164-6
fT, 201

Galileo, 175
Gauthier, David, xi
Gibbard, Allan, xii, 2n, 17, 18n, 24-8
Giere, Ronald, 219
Glymour, Clark, xi
Goldstein, Michael, 105, 106
Good, I. J., 8n, 174-5
Goodman, Nelson, 87
Gupta, Anil, 208n

Hammond, Peter J., 38-42, 81
Harper, William L., 2n
Heilig, Klaus, 99n
Hempel, Carl G., xi, 141, 234
history of science, 162-9
Horwich, Paul, 94, 97, 176, 178-9
Howson, Colin, 86, 94, 124-5, 164-6
Hughes, R. I. G., 51n
Hull, Jay G., 154
Hume, David, 61, 90, 93
Hurley, S. L., 18
Huygens, Christiaan, 86

iff, 18n
incommensurability, 216-18
independence, 10, 11, 12; arguments for, 70-82; axiom on, 190; objections to, 82; and uninterpretable acts, 185; and violations, 63-70
indifference, 14
inf(imum), 143n
information: and acceptance, 140-1; defined, 232; probability measures of, 234-7; respect for, 213-14

integrity, 113-14

Jeffrey, Richard, 2n, 15-16, 65, 68, 76n, 116n, 128

Jeffreys, Harold, 215
Jensen, Niels Erik, 254
Johnson, Jim, 154
justification, 25-9, 117n

Kadane, Joseph B., 267
Kahneman, Daniel, 64, 71-2, 87
Kaplan, Mark, 155-6
Kashima, Yoshihisa, 56n, 67, 68, 69, 72
Kemeny, John G., 215
Koertge, Noretta, 225n
Kuhn, Thomas S., 86, 169-73, 180n, 211, 215-18
Kukla, Andre, 7n
Kyburg, Henry E., Jr., 92n, 93, 126-7, 135n

Lamport, Leslie, xii
Larsson, Stig, 64-8, 78
Laudan, Larry, 161n, 211, 220, 229
learning, 114-16, 121
Lehman, R. Sherman, 95n
Lehrer, Keith, 38n
Leibniz, Gottfried Wilhelm, 86
Levi, Isaac, 53-7, 110, 141, 156-8, 235
Lewis, David, 5, 120
Locke, John, 93
lottery paradox, 135n
Luce, R. Duncan, 57n, 60

McClennen, Edward F., 42, 43, 45, 60-1, 72, 83

MacCrimmon, Kenneth R., 34-6, 64-8, 78

McKinsey, J. C. C., 36
Maher, Janette, xii
majority rule, 51-3
Mao Tse-tung, 62
Margalit, Avishai, 48, 49, 51
Markowitz, Harry M., 70-2, 74
Mavrodes, George, xii
Miller, David, 221, 223
modesty, 42-5, 61
money pump argument, 36-8
Moore, G. E., 130
Morgenstern, Oskar, 97, 254
Morrison, Donald, 64, 65
Moscowitz, Herbert, 64

Newton, Isaac, 164-5, 216
Newton-Smith, W. H., 219n
Niiniluoto, Ilkka, 147n, 213, 219n, 225
normality, 21-3, 34-6, 54, 56, 60-2
null event, 191

Oddie, Graham, 213
Oppenheim, Paul, 234
option, see act

p-u pair, 21
Packard, Dennis J., 48
partition, 189n
Pascal, Blaise, 195
Peirce, Charles S., 86
Popper, Karl R., 86, 141, 180n, 215, 219n, 220-2, 226, 234
preference, 12-19, 22-3, 54n; conditional, 190-1; weak, 10
preference interpretation of probability and utility, 9, 88-91
probabilistic prevalence, 48-9
probability, subjective, 9, 88-91; countable additivity of, 198-200; qualitative, 192-3

probability kinematics, 127-8
properties α and β, 45-7
proposition, 133, 186

qualified Bayesianism, 29-33

Raiffa, Howard, 36, 65, 70-1, 74
Railton, Peter, 8n
Ramsey, F. P., 9, 91-2, 152, 221
rationality, 23-9, 61
Rawls, John, 87
realism, scientific, 240-4
Reflection, 105-6; argument for, 106-7, 110-13; and conditionalization, 121; counterexamples to, 107-10; true status of, 116-20

representation theorem, 9-12, 20-1, 30, 195-7, 206-7

representor, 21
Rescher, Nicholas, xi
rigidity, 38-41, 75-6, 78, 81
Rosen, Gideon, 241n
Rosenkrantz, Roger D., 166-8, 176, 215

Salmon, Wesley C., xi, 29, 91-2
Savage, Leonard J., 9, 10, 11, 13, 19n, 63, 76-8, 179, 182-5, 196n, 249, 250, 270

Schervish, Mark J., 196n, 267
Schick, Frederic, 98
Schmitt, Frederick, xii
Schwartz, John, 215
scientific realism, 240-4
scientific values, 209-10
Seidenfeld, Teddy, xi, 81, 196n, 267, 272n
Sen, Amartya, 23, 45-7
Shannon, C. E., 234
shifting focus, 49-51
Shimony, Abner, 88n, 97
simplicity, 215-16
Skyrms, Brian, 20, 68, 102-4, 106, 110, 115n
Slovic, Paul, 64-8, 87
Sober, Elliott, 215
state, 1-5, 186
Stich, Stephen P., 153

strict preference, 14
Suppes, Patrick, 36
sup(remum), 143n
sure-thing principle, 10n, 76-9
SVET, definition of, 173; see also evidence, value of
synchronic separability, 73
Szaniawski, Klemens, 234

Talbott, W. J., 113
Taylor, Charles, 24
Teller, Paul, 120, 123-4
Thagard, Paul, 88-9
Tichy, Pavel, 213, 221-4, 226
transitivity, 10, 11, 12, 14, 20; arguments for, 36-47; axiom on, 190; objections to, 47-60; and popular endorsement, 34-6

truth, 208-9; distance from, 237-40; and impartiality, 214; respect for, 210-13

Tversky, Amos, 35n, 64-8, 71-2, 87

Urbach, Peter, 86, 94, 224
utility, 9, 88; expected, 1-5; scientific, 210

van Fraassen, Bas, 21n, 93, 105, 106, 113, 125-6, 158-61, 165n, 221n, 240-3

verisimilitude: defined, 228; expected, 229-30; need for, 141-2, 218-20; Popper's theory of, 220-2; subjective approach to, 224-7; Tichy's theory of, 222-4

Vinci, Thomas, 86n
von Neumann, John, 97, 254

Wagner, Carl, 38n
Wagner, Steven, xii
Wakker, Peter, 79-81
weak order, 10
Whewell, William, 86
will, weakness of, 17-18, 30-1
Wilson, Timothy D., 154
Worrall, John, 160n
Wu, Wei-ming, xii
