Preference for Knowledge

Rommeswinkel, Hendrik; Chang, Hung-Chi; and Hsu, Wen-Tai∗

August 17, 2020

Abstract

We examine the subjective value of gaining knowledge in a version of Savage’s model for decisions under uncertainty in which the received outcome provides information about which event has obtained. Decision makers commonly value such knowledge either because they want to use it in future decisions or because they are intrinsically interested in it. We find that in our model, the sure-thing principle and several other axioms of Savage are inconsistent with this value for knowledge about events. We provide a representation theorem for a subjective value of knowledge consisting of the sum of expected utility and a function of the information partition generated by the outcomes of an act. Bayesian updating of likelihood judgments and stationarity of preferences imply that decision makers rank acts by the sum of expected utility and the Shannon entropy. Our results also provide a novel critique of the necessity of Savage’s axioms for rational decisions under uncertainty.

Keywords: Decision Theory, Uncertainty, Learning, Expected Utility, Preferences, Entropy, Knowledge, Information

JEL Classification: D81, C44

∗ Department of Economics and Center for Research in Econometric Theory and Applications, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 106. We are greatly indebted to Peter Wakker for extensive suggestions and remarks on the paper. We thank presentation participants at the Warsaw School of Economics, Hitotsubashi University, and D-TEA 2020 as well as Bob Aumann, Franz Dietrich, Alex Jacobson, and Norio Takeoka for comments on the paper. This work was financially supported by the Center for Research in Econometric Theory and Applications (grant no. 109L900203) and by the Ministry of Science and Technology (MOST), Taiwan (grant no. MOST 109-2634-F-002-045). Hendrik Rommeswinkel thanks Erasmus University Rotterdam for their hospitality and the Ministry of Science and Technology (grant no. MOST 108-2918-I-002-009) for financial support of a research visit during which this paper was written.

1 Introduction

Decisions often involve choices between alternatives from which knowledge can be gained. Consider the number of test subjects recruited by an experimenter, or the too few hours of sleep left after finishing an addictive novel: both reveal a preference for acquiring knowledge. Sometimes this knowledge is acquired as a means to achieve other goals; at other times it is completely inconsequential. The experimenter hopes to gain useful knowledge, but Agatha Christie’s Murder on the Orient Express is read for pleasure only. In this paper, we construct a theory of decision under uncertainty that accounts for preferences for knowledge in general.

We embed our analysis into Savage’s (1954) framework and axiomatization of rational behavior under uncertainty. In this model, decision makers have preferences over acts. An act yields an outcome for every possible state of the world. We assume that after an act is resolved, the decision maker infers from the obtained outcome and the chosen act which event obtained. The decision maker can therefore distinguish two events only via an act that yields distinct outcomes on the two events.

We argue that if a decision maker is interested in gaining knowledge about which event obtains, then several of Savage’s axioms lose their normative appeal. This is easiest to see for the monotonicity axiom, which requires that the preferences over achieving outcomes on an event be the same irrespective of which outcomes are achieved on the other events. If the decision maker cares about knowledge, replacing an outcome by a worse outcome on some event may allow the decision maker to identify whether this event obtains and may thus be desirable. For example, in order to find out whether an event obtains, you may reject a sure payoff of 100 Euros in favor of a gamble that pays you 100 Euros only if the event obtains and nothing otherwise.
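For a concrete instance of the 100-Euro example, suppose, purely for illustration, that the decision maker ranks acts by expected utility plus a weight $\lambda$ times the Shannon entropy (natural logarithm) of the revealed partition, the entropic form previewed in the abstract. The weight $\lambda$, the normalization $u(100) = 1$, $u(0) = 0$, and the probability $P(E) = 0.9$ are our assumptions. Writing $100_E\,0$ for the gamble:

```latex
\begin{aligned}
U(100) &= u(100) + \lambda \cdot 0 = 1,\\
U(100_E\,0) &= 0.9\,u(100) + 0.1\,u(0) + \lambda\left(-0.9\ln 0.9 - 0.1\ln 0.1\right)
  \approx 0.9 + 0.325\,\lambda .
\end{aligned}
```

Under these assumptions the gamble is strictly preferred whenever $0.325\,\lambda > 0.1$, that is, for $\lambda$ above roughly $0.31$: the expected loss of 0.1 utils is outweighed by learning whether $E$ obtains.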

We provide an axiomatization of a decision model that accounts for a preference for acquiring knowledge about events. For this, we weaken the axioms of Savage (1954). Savage’s sure-thing principle imposes separability of preferences on conditional acts given an event from the preferences on conditional acts on the complement. We assume this separability of preferences only when the event can be distinguished from its complement. Similar adjustments are made to the monotonicity axiom and the likelihood outcome independence axiom. We add one axiom that guarantees that the preference for knowledge is separable from the preference for achieving good outcomes. Our representation theorem characterizes what we call a subjective knowledge utility, in which the utility of an act is additively separable into the expected utility of the outcomes and the value of the information partition revealed by the act. The value of the information partition is in turn additively separable in the revealed events. We then analyze the value of knowledge in more detail, both conceptually and axiomatically.

Conceptually, our axiomatic analysis gives rise to a utility representation of the value of knowledge from which we derive further concepts. First, we classify decision makers by their knowledge preferences into knowledge-seeking, knowledge-neutral, and knowledge-averse, and show that these classes can be linked to subadditivity and superadditivity properties of our utility representation of knowledge preferences. Second, we define a measure of the value of knowledge about events and a knowledge equivalent, which roughly play the roles of risk attitude measures and certainty equivalents in decisions under uncertainty. Third, we derive a curiosity relation with which decision makers can be ranked by their value of knowledge.

Axiomatically, we sharpen our decision model further by adding axioms that guarantee that the decision maker does not care about obtaining specific pieces of knowledge but rather about the (quantified) amount of information gained about the state of the world.

Probabilistic knowledge utility consists of the sum of expected utility and a very general information measure. It is characterized by assuming that the decision maker’s value of knowledge depends only on the likelihoods of the involved events. This representation turns out not to be stationary, in the sense that the value of information may change as additional knowledge is gained.

If the information measure is the Shannon (1948) entropy, we obtain entropic knowledge utility. It is characterized by a stationarity condition which ensures that, after Bayesian updating of the probabilities on an event, the tradeoff between achieving good outcomes and being able to distinguish two events with a fixed relative probability remains the same. Essentially, this allows the decision maker to use the same utility functions for evaluating conditional acts after Bayesian updating of the probabilities.
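As a minimal sketch of entropic knowledge utility, the code below evaluates an act on a finite state space as expected utility plus a weight `lam` times the Shannon entropy (in nats) of the partition the act reveals. The finite state space, the weight `lam`, and all function names are illustrative assumptions rather than the paper’s formal construction, which works with an atomless event space.

```python
import math
from collections import defaultdict


def information_partition(act):
    """Return iota(f): the preimages f^{-1}(alpha) of the outcomes the act yields."""
    cells = defaultdict(set)
    for state, outcome in act.items():
        cells[outcome].add(state)
    return [frozenset(cell) for cell in cells.values()]


def shannon_entropy(partition, prob):
    """Shannon entropy (natural log) of a partition under state probabilities."""
    h = 0.0
    for cell in partition:
        q = sum(prob[s] for s in cell)
        if q > 0:
            h -= q * math.log(q)
    return h


def entropic_knowledge_utility(act, prob, u, lam=1.0):
    """Expected utility of the outcomes plus lam times the entropy of iota(act)."""
    expected = sum(prob[s] * u[act[s]] for s in act)
    return expected + lam * shannon_entropy(information_partition(act), prob)


# Hypothetical two-state example: "e" stands for E, "n" for its complement.
prob = {"e": 0.5, "n": 0.5}
u = {"alpha": 1.0, "beta": 0.0}
constant = {"e": "alpha", "n": "alpha"}   # reveals nothing
revealing = {"e": "alpha", "n": "beta"}   # reveals whether E obtains

print(entropic_knowledge_utility(constant, prob, u))   # 1.0
print(entropic_knowledge_utility(revealing, prob, u))  # approximately 1.193 (= 0.5 + ln 2)
```

Setting `lam=0.0` recovers plain expected utility, under which the constant act would be strictly preferred.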

In the context of probabilistic knowledge utility, we also define what we call the elasticity of curiosity, which describes the change in intensity of knowledge preferences as the decision maker becomes more and more informed. From this definition, we derive constant elasticity of curiosity utility, which is a convenient two-parameter functional form that incorporates two important aspects of knowledge preferences: the intensity of knowledge preferences and the elasticity of curiosity.

Stationary information cost utility accounts for the fact that decision makers may sometimes be interested only in a specific statistical variable but consider any information received as costly to process. For example, a decision maker who is interested in the mean of a random variable would prefer to know the sample mean of a data set instead of being informed about every single data point. Stationary information cost utility attaches a prior-independent value to each posterior probability measure over a random variable. The cost of processing the information is given by the Shannon entropy. We characterize this functional form by weakening the stationarity axioms used to characterize entropic knowledge utility. An example of a stationary information cost utility is the mutual information between the known events and the variable the decision maker is interested in.
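The mutual information mentioned at the end of this paragraph can be computed directly when the state space is finite. The sketch below assumes hypothetical state probabilities and a variable `y` on four states, and compares an act that reveals only the value of `y` with one that reveals every state. It illustrates the quantity only and is not the paper’s characterization of stationary information cost utility.

```python
import math
from collections import defaultdict


def mutual_information(act, prob, y):
    """I(iota(act); Y) in nats: information the act's outcome carries about y."""
    joint = defaultdict(float)    # P(cell, y-value); each cell is labelled by its outcome
    p_cell = defaultdict(float)
    p_y = defaultdict(float)
    for s, p in prob.items():
        joint[(act[s], y[s])] += p
        p_cell[act[s]] += p
        p_y[y[s]] += p
    return sum(p * math.log(p / (p_cell[c] * p_y[v]))
               for (c, v), p in joint.items() if p > 0)


def partition_entropy(act, prob):
    """Shannon entropy (nats) of the partition the act reveals."""
    p_cell = defaultdict(float)
    for s, p in prob.items():
        p_cell[act[s]] += p
    return sum(-q * math.log(q) for q in p_cell.values() if q > 0)


prob = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
y = {1: 0, 2: 0, 3: 1, 4: 1}                          # the variable of interest
coarse = {1: "low", 2: "low", 3: "high", 4: "high"}   # reveals only y
fine = {1: "s1", 2: "s2", 3: "s3", 4: "s4"}           # reveals every state
blank = {s: "none" for s in prob}                     # reveals nothing

for name, act in [("coarse", coarse), ("fine", fine), ("blank", blank)]:
    print(name, mutual_information(act, prob, y), partition_entropy(act, prob))
```

Both `coarse` and `fine` carry ln 2 nats about `y`, but `fine` reveals a partition with entropy ln 4 rather than ln 2, which is the sense in which it conveys more information to process.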

In light of our results, we briefly revisit Savage’s endeavour to provide a foundation of statistics. Savage (1954) considered statistics as a practice of solving partition problems in which a decision maker tries to distinguish whether an event obtains or not. In our framework, acts not only provide more or less good outcomes but also solve partition problems. This allows us to treat the choice of a hypothesis test or the choice of the number of subjects invited to an experiment as choices over acts and thus to study them within our theory. As an application, we briefly sketch how estimation problems, hypothesis tests, and optimal data collection fall into the domain of decision problems to which our model applies.

The paper proceeds as follows. In section 2, we briefly review the related literature. The notation is introduced in section 3. In section 4, we introduce our model of knowledge in a Savage framework, propose our adjustments to Savage’s axioms, and characterize subjective knowledge utility based on these axioms. Several concepts to analyze the value of knowledge of decision makers are introduced in section 5. We then axiomatically characterize various forms of knowledge preferences, specifically, probabilistic knowledge utility in subsection 6.1, entropic knowledge utility in subsection 6.2, constant elasticity of curiosity utility in subsection 6.3, and stationary information cost utility in subsection 6.4. Section 7 translates aspects of statistical estimation into our framework. Finally, section 8 discusses alternative ways of modeling knowledge and the necessity of Savage’s axioms for rationality before section 9 concludes.

2 Literature

Our paper is closely related to the literature on the intrinsic value of information, which has been studied theoretically (Grant, Kajii, & Polak, 1998; Golman & Loewenstein, 2018) and empirically (Bennett, Bode, Brydevall, Warren, & Murawski, 2016; Falk & Zimmermann, 2016; Kops & Pasichnichenko, 2020; Masatlioglu, Orhun, & Raymond, 2017). Kadane, Schervish, and Seidenfeld (2008) discuss whether it can be rational for a decision maker to pay for not receiving information and how a theory of decision can incorporate this. Our adjustment to Savage’s model allows for a negative value of information and therefore also for such preferences. Wakker (1988a) shows that violations of the independence axiom imply that a decision maker displays information-averse behavior for some choices. Safra and Sulganik (1995) show that for every information structure there exists a nonexpected utility maximizer that prefers less information to more. In a working paper, Alaoui (2012) studies under which circumstances behavioral decision makers attach value to useless information. Luce, Ng, Marley, and Aczel (2008a, 2008b) study the functional form of our entropic knowledge utility representation in the context of utility for gambling. In a recent working paper, Liang (2019) characterizes intrinsic information preferences in an Anscombe and Aumann (1963) model. Information in Liang (2019) is modeled as objective probabilities over the state space as revealed by an outcome. Similar to our model, the decision maker infers information about the state space from the outcomes. Unlike our model, inference is about the objective probabilities of the second stage of the Anscombe-Aumann model. Thus, the remaining uncertainty after an act is resolved is an objective probability distribution over states of the world, which enters the decision maker’s utility function. We do not require the availability of objective probabilities for our axiomatization.
Our decision maker values knowledge about which event obtains and assigns these events a subjective probability. Knowledge about which event obtains of course provides information about what the state of the world is. However, in our model this information may be entirely subjective, and different decision makers may disagree on how informative an event is about a state. Another important aspect that distinguishes our model is that, since our decision maker values knowledge of events, the decision maker may have a stronger preference for knowing the combination of some states than the combination of other states.

Cooke (2017) models a tradeoff between current consumption and (taste) information using a multi-stage model in which choices over initial consumption and choices over menus of later consumption are observed. Caplin and Leahy (2001) model anticipatory feelings over the resolution of lotteries, such as suspense. In contrast to their model, in which probabilities are fully objective and are separated into two stages, the uncertainty in our paper is fully subjective on arbitrary event spaces.

There exists a large literature on the instrumental value of information. Blackwell (1953) shows the equivalence of ranking experiments by their instrumental value to expected utility maximizing decision makers and ranking experiments by statistical sufficiency. Celen (2012) extends this to maxmin preferences. Torgersen (1991) provides a comprehensive review of the literature on comparisons of experiments. Hilton (1981) summarizes some of the findings on the instrumental value of information and its relation to risk aversion, flexibility, wealth, and prior uncertainty. Snow (2010) examines the instrumental value of information under ambiguity. Bassan, Gossner, Scarsini, and Zamir (2003), Jakobsen (2016), Lehrer and Rosenberg (2006, 2010), Lehrer, Rosenberg, and Shmaya (2010, 2013), De Meyer, Lehrer, and Rosenberg (2010), and Rosenberg, Salomon, and Vieille (2013) consider the value of information in various game-theoretic settings.

The conception of knowledge about the state of the world that we employ is the usual one in decision and game theory, using partitions of the state space. Aumann (1999) links this to a syntactic representation of knowledge. Gilboa and Lehrer (1991) characterize the values of information partitions that are expected utility rationalizable, using methods from cooperative games. Their paper answers the question of under which conditions the value of knowledge we characterize may arise from expected utility maximization in a subsequent decision problem. Azrieli and Lehrer (2008) extend their analysis to stochastic information structures. Ghirardato (2002) discusses how a dynamic consistency axiom that resembles the sure-thing principle expresses a nonnegative instrumental value of information. In a recent working paper, Galanis (2019) further extends the analysis of this relation between dynamic consistency and information value to ambiguity averse decision makers.

If the value of knowledge in our paper is seen as purely instrumental for solving future decision problems, then there is a natural relation to the value of flexibility (Kreps, 1979). Flexibility allows decision makers to react to future information, while knowledge allows decision makers to use flexibility. Nehring (1999) axiomatizes preference for flexibility in a Savage framework. Studying the duality between these axiomatizations is an interesting avenue for further research.

3 Notation

Let $S$ be a set of states of nature. $\mathcal{E}$ denotes a σ-algebra of events on $S$. For any event $E$, let $\bar{E} = S \setminus E$ be the complement of $E$ in $S$. $\mathcal{X}$ is a set of outcomes. An act is a function $a : S \to \mathcal{X}$ that is measurable, i.e., for all subsets of outcomes $X \subseteq \mathcal{X}$, the preimage is an event, $a^{-1}(X) \in \mathcal{E}$. In this paper we only consider simple acts, i.e., acts that are finite-valued, $|a(S)| < \infty$.

The set of (simple) acts is denoted by $\mathcal{A}$. If $E \in \mathcal{E}$ is an event and $f$ an act, then $f_E : E \to \mathcal{X}$ denotes the restriction of $f$ to $E$, which we call a conditional act. The set of conditional acts on an event $E$ is denoted $\mathcal{A}_E$. The set of outcomes resulting from a conditional act $f_E$ and an event $F \subseteq E$ is defined as the image $f_E(F) = \{\alpha \in \mathcal{X} : f_E(s) = \alpha \text{ for some } s \in F\}$. For convenience, we introduce several ways of denoting acts. If $f$ and $g$ are acts and $E$ is an event, then $f_E g$ denotes the act that agrees with $f$ on the event $E$ and with $g$ on the event $\bar{E}$. Constant acts are simply denoted by their outcome, $\alpha \in \mathcal{X}$.
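The notation for conditional acts and composite acts can be mirrored with dictionaries that map states to outcomes. This sketch assumes a finite state space (the paper’s is atomless), and the helper names `restrict`, `image`, `compose`, and `constant` are hypothetical.

```python
def restrict(f, E):
    """Conditional act f_E: the restriction of the act f to the event E."""
    return {s: f[s] for s in E}


def image(f_E, F):
    """f_E(F): the set of outcomes the conditional act f_E yields on F."""
    return {f_E[s] for s in F}


def compose(f, E, g):
    """The act f_E g: agrees with f on E and with g on the complement of E."""
    return {s: (f[s] if s in E else g[s]) for s in g}


def constant(alpha, S):
    """The constant act that yields outcome alpha in every state."""
    return {s: alpha for s in S}


S = {1, 2, 3, 4}
E = {1, 2}
f = {1: "a", 2: "b", 3: "c", 4: "d"}
g = constant("z", S)

f_E = restrict(f, E)
print(image(f_E, E))      # {'a', 'b'} (set order may vary)
print(compose(f, E, g))   # {1: 'a', 2: 'b', 3: 'z', 4: 'z'}
```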

A decision maker ranks acts via a preference relation $\succsim$. $f \succsim g$ means that the act $f$ is at least as good as the act $g$; $f \sim g$ is the symmetric part of the relation and means that the decision maker is indifferent between the two acts. $\succ$ denotes the asymmetric part of the relation and indicates strict preference. A function $U : \mathcal{A} \to \mathbb{R}$ represents the relation $\succsim$ if $f \succsim g \Leftrightarrow U(f) \geq U(g)$.

An event $E$ is nonnull if $\gamma_E \beta \succ \beta$ for some outcomes $\gamma, \beta$. We denote the set of null events by $\mathcal{N} \subseteq \mathcal{E}$. An atom is a nonnull event $E$ that cannot be partitioned into two nonnull events. We assume throughout the paper that there are no atoms. This guarantees that the event space is sufficiently rich.

We make the following additional richness assumption, which greatly simplifies our analysis. For every outcome $\alpha$, there exist countably many indifferent outcomes $\alpha', \alpha'', \ldots$. We can interpret the outcome $\alpha'$ as the same as the outcome $\alpha$ but with a label attached to it that allows the decision maker to distinguish between $\alpha$ and $\alpha'$.

For some results, we also assume that the set of outcomes is rich enough that conditional acts are outcome-solvable, i.e., for all events $E$, all conditional acts $f_E$, and some $\gamma \notin f_E(E)$, there exists an outcome $\varepsilon \neq \gamma$ such that $\varepsilon_E \gamma \sim f_E \gamma$.

4 Axiomatization of a Preference for Knowledge

Savage (1954) did not specify what information the decision maker acquires about the state space as an act is resolved. To start, we clarify that Savage’s axioms maintain their normative appeal in case the exact state of the world is always revealed when the act is resolved.¹ However, there are many decisions in which different information is acquired depending on which act is chosen. To account for such decisions, we must clarify what information is acquired as an act is resolved.

We assume that the decision maker finds out which event obtains by combining the memory of which act she played with the information about which outcome she experienced. Since this is a justified true belief, we say that the decision maker knows that this event obtains. Thus, if the decision maker receives outcome $\alpha$ from act $f$, then the decision maker knows that the event $f^{-1}(\alpha)$ obtains. The knowledge that will be acquired from this act can therefore be described by the information partition $\iota(f) = \{f^{-1}(\alpha) : \alpha \in \mathcal{X}\}$. An example is provided in Table 1. $a$ is a constant act that does not reveal any knowledge about events. $b$ yields distinct outcomes on the events $E$ and $\bar{E}$, and the decision maker can conclude from the outcome which of these two events obtains. We justify this choice of modeling knowledge and discuss alternative possibilities in section 8.

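| Act | $E$ | $\bar{E}$ | $\iota$ |
|---|---|---|---|
| $a$ | $\alpha$ | $\alpha$ | $\{S\}$ |
| $b$ | $\alpha$ | $\beta$ | $\{E, \bar{E}\}$ |
| $c$ | $\beta$ | $\alpha$ | $\{E, \bar{E}\}$ |
| $d$ | $\beta$ | $\beta$ | $\{S\}$ |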

Table 1: Example Acts and their Information Partition
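Table 1 can be reproduced on a hypothetical four-state stand-in with $E = \{1, 2\}$ and $\bar{E} = \{3, 4\}$: grouping states by the outcome they receive yields $\iota(f)$. The state labels and the helper name are our assumptions.

```python
from collections import defaultdict


def information_partition(act):
    """iota(f) = {f^{-1}(alpha)}: group states by the outcome the act yields there."""
    cells = defaultdict(set)
    for state, outcome in act.items():
        cells[outcome].add(state)
    return {frozenset(cell) for cell in cells.values()}


S = frozenset({1, 2, 3, 4})
E = frozenset({1, 2})
E_bar = S - E

# Rows of Table 1: (outcome on E, outcome on the complement of E).
table = {"a": ("alpha", "alpha"), "b": ("alpha", "beta"),
         "c": ("beta", "alpha"), "d": ("beta", "beta")}

acts = {name: {s: (on_E if s in E else on_Ebar) for s in S}
        for name, (on_E, on_Ebar) in table.items()}

for name, act in acts.items():
    cells = information_partition(act)
    print(name, "reveals E" if cells == {E, E_bar} else "reveals nothing")
# a reveals nothing
# b reveals E
# c reveals E
# d reveals nothing
```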

¹ The state space of a roulette wheel fulfills this condition. The joint state space of a roulette wheel and a blackjack table violates this condition in case one can only stand at either table and not observe the other.

We next examine whether Savage’s axioms maintain their normative appeal under this interpretation of the knowledge gained from an act and propose alternative axioms where necessary. The normative position from which we discuss the axioms can be summarized as follows:

“A decision maker may gain joy from acquiring knowledge, irrespective of whether this knowledge will be useful in future decisions. The decision maker may be willing to forgo better outcomes in favor of gaining this knowledge. A theory of rational choice under uncertainty must allow for this behavior.”

We think this statement is uncontroversial to most decision makers. If you have ever bought and read a book purely for entertainment and were excited about the events revealed in the book, you will most likely need to accept this statement. We say that a decision maker who exhibits the described behavior has a preference for knowledge. Our theory of a preference for knowledge about events naturally encompasses a theory of preference for information about events or states of the world. This is because knowledge of an event may provide information about other, correlated events. This will be made more precise in subsection 6.4.

We now show that Savage’s axioms of expected utility are incompatible with a preference for knowledge and provide alternative axioms that characterize a novel representation of preferences.

P 1 (Weak Order). $\succsim$ is complete and transitive.

We maintain the Weak Order assumption. The Blackwell order is an example of an informativeness ranking that does not fulfill completeness, as it ranks experiments by whether they reveal unambiguously more information. Relaxing the assumption of completeness may therefore be an interesting avenue for future research.

P 2 (Sure-Thing Principle). For all $f, g \in \mathcal{A}$ and all $\alpha, \beta \in \mathcal{X}$,

$$\alpha_E f \succsim \alpha_E g \iff \beta_E f \succsim \beta_E g \qquad (1)$$

The sure-thing principle guarantees that we can ignore identical conditional subacts when comparing two acts. A simple example shows that the sure-thing principle is in direct conflict with a preference for knowledge. Consider the acts in Table 1. The sure-thing principle requires that $a \succsim b$ if and only if $c \succsim d$. However, $b$ and $c$ provide knowledge about $E$ while $a$ and $d$ do not. A decision maker who would like to know whether event $E$ obtains may therefore prefer $b$ to $a$ and $c$ to $d$ as long as the outcomes $\alpha$ and $\beta$ are sufficiently close in value. The issue is that $a$ and $b$, as well as $c$ and $d$, differ not only by a change in the outcomes but also by a change in the information partition. The sure-thing principle demands that the change in the information partition be irrelevant for preference. Notice, however, that in case the outcomes yielded by $f_{\bar{E}}$ and $g_{\bar{E}}$ on $\bar{E}$ all differ from $\alpha$ and $\beta$, P2 is not inconsistent with a preference for knowledge, because $E$ is always part of the information partition. In this case, the sure-thing principle maintains its normative appeal.
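The conflict can be checked numerically. Assuming, for illustration only, an entropic knowledge utility (expected utility plus the Shannon entropy of the revealed partition), $P(E) = 0.5$, and close utilities $u(\alpha) = 1$, $u(\beta) = 0.9$, the acts of Table 1 satisfy $b \succ a$ and $c \succ d$, which the sure-thing principle rules out. All numbers and names are hypothetical.

```python
import math
from collections import defaultdict


def utility(act, prob, u, lam=1.0):
    """Hypothetical entropic knowledge utility: EU plus lam times partition entropy."""
    cell_prob = defaultdict(float)
    for s, outcome in act.items():
        cell_prob[outcome] += prob[s]
    entropy = sum(-q * math.log(q) for q in cell_prob.values() if q > 0)
    return sum(prob[s] * u[act[s]] for s in act) + lam * entropy


prob = {"e": 0.5, "n": 0.5}        # "e" stands for E, "n" for its complement
u = {"alpha": 1.0, "beta": 0.9}    # outcomes close in value
a = {"e": "alpha", "n": "alpha"}
b = {"e": "alpha", "n": "beta"}
c = {"e": "beta", "n": "alpha"}
d = {"e": "beta", "n": "beta"}

U = {name: utility(act, prob, u) for name, act in zip("abcd", (a, b, c, d))}
# P2 requires: a >= b exactly when c >= d.
sure_thing_holds = (U["a"] >= U["b"]) == (U["c"] >= U["d"])
print(U, sure_thing_holds)   # sure_thing_holds is False
```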

We now formalize this idea using disjoint conditional acts.

Definition 1 (Disjoint Conditional Acts). A conditional act $f_E$ is disjoint from a conditional act $g_F$ if the images $f_E(E)$ and $g_F(F)$ are disjoint: $f_E(E) \cap g_F(F) = \emptyset$.

Two conditional acts are therefore disjoint if they have no outcomes in common. This definition helps us specify which events the decision maker can identify after an act is resolved. If $f_E$ and $f_{\bar{E}}$ are disjoint, then after the act $f$ the decision maker will know with certainty whether event $E$ has happened or not. We adjust the sure-thing principle in the following manner:²
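For finite acts, Definition 1 and the claim in this paragraph can be checked mechanically: $f_E$ and $f_{\bar{E}}$ are disjoint exactly when every cell of $\iota(f)$ lies inside $E$ or inside $\bar{E}$, so the outcome tells the decision maker whether $E$ happened. The helper names and the four-state example are hypothetical.

```python
def disjoint(f_E, g_F):
    """Definition 1: the images f_E(E) and g_F(F) share no outcome."""
    return set(f_E.values()).isdisjoint(g_F.values())


def learns_whether(act, E):
    """True if every cell f^{-1}(alpha) lies inside E or inside its complement."""
    for outcome in set(act.values()):
        cell = {s for s, o in act.items() if o == outcome}
        if not (cell <= E or cell.isdisjoint(E)):
            return False
    return True


def split(act, E):
    """Return the conditional acts (f_E, f_Ebar)."""
    return ({s: o for s, o in act.items() if s in E},
            {s: o for s, o in act.items() if s not in E})


E = {1, 2}
f = {1: "x", 2: "y", 3: "z", 4: "z"}   # no outcome shared across E and its complement
g = {1: "x", 2: "y", 3: "x", 4: "z"}   # outcome "x" occurs on both sides

for act in (f, g):
    print(disjoint(*split(act, E)), learns_whether(act, E))
# True True
# False False
```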

Axiom 2 (Information-Neutral Sure-Thing Principle). Let $E \in \mathcal{E}$ be an event. Suppose $f_E$ and $g_E$ are disjoint from $h_{\bar{E}}$ and $k_{\bar{E}}$. Then,

$$f_E h \succsim g_E h \iff f_E k \succsim g_E k \qquad (2)$$

The information-neutral sure-thing principle differs from the sure-thing principle in two respects. First, full separability of the preferences across $E$ and $\bar{E}$ is limited to acts in which the decision maker learns with certainty whether event $E$ obtains. This is achieved by requiring disjointness of the conditional acts on $E$ and the conditional acts on its complement. Second, instead of only changing a single outcome, we allow for changes of conditional acts, i.e., the change in Equation (2) from $h_{\bar{E}}$ to $k_{\bar{E}}$ may change not only a particular outcome but several outcomes. This means that the knowledge about events contained in $h_{\bar{E}}$ and $k_{\bar{E}}$ may also differ. The independence condition therefore imposes separability not only of the preferences on outcomes but also of those on knowledge about events. For example, $h_{\bar{E}}$ may achieve a single good outcome while $k_{\bar{E}}$ achieves several poor outcomes but informs the decision maker about several subevents of $\bar{E}$.
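To see why disjointness makes the separability in Equation (2) plausible, the sketch below evaluates acts with a hypothetical entropic knowledge utility on four equally likely states, with $E = \{1, 2\}$. When $f_E$ and $g_E$ are disjoint from both $h_{\bar{E}}$ and $k_{\bar{E}}$, the utility difference between $f_E h$ and $g_E h$ does not depend on what happens on $\bar{E}$; when the conditional act on $\bar{E}$ shares an outcome with $g_E$, it does. The utilities and outcome labels are illustrative assumptions.

```python
import math
from collections import defaultdict


def utility(act, prob, u, lam=1.0):
    """Hypothetical entropic knowledge utility: EU plus lam times partition entropy."""
    cell_prob = defaultdict(float)
    for s, outcome in act.items():
        cell_prob[outcome] += prob[s]
    entropy = sum(-q * math.log(q) for q in cell_prob.values() if q > 0)
    return sum(prob[s] * u[act[s]] for s in act) + lam * entropy


prob = {s: 0.25 for s in (1, 2, 3, 4)}
u = {"a": 1.0, "b": 0.0, "c": 0.5, "x": 0.2, "y": 0.8, "z": 0.5}

f_E = {1: "a", 2: "b"}
g_E = {1: "c", 2: "c"}
h = {3: "x", 4: "y"}          # disjoint from f_E and g_E
k = {3: "z", 4: "z"}          # disjoint from f_E and g_E
h_shared = {3: "c", 4: "c"}   # shares outcome "c" with g_E


def gap(tail):
    """U(f_E tail) - U(g_E tail) for a conditional act on the complement of E."""
    return utility({**f_E, **tail}, prob, u) - utility({**g_E, **tail}, prob, u)


print(gap(h), gap(k), gap(h_shared))
```

The first two gaps coincide (both equal $\tfrac{1}{2}\ln 2$ here), while the third is larger because $g_E h_{\text{shared}}$ merges $E$ and $\bar{E}$ into a single uninformative cell.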

Savage’s monotonicity axiom conflicts with a preference for knowledge in a similar manner as the sure-thing principle.

² For easier comparison, we number our axioms by their counterparts P1-P6 in Savage (1954).

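P 3 (Monotonicity). For all $f \in \mathcal{A}$, $E \in \mathcal{E} \setminus \mathcal{N}$, and all $\beta, \gamma \in \mathcal{X}$,

$$\gamma \succsim \beta \iff \gamma_E f \succsim \beta_E f \qquad (3)$$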

Monotonicity states that outcomes are ranked the same way on every event. In our previous example in Table 1, it implies that $a$ is preferred to $d$ if and only if $b$ is preferred to $d$. However, note that $b$ reveals whether event $E$ obtains but $a$ and $d$ do not. The decision maker may therefore judge $d$ to be better than $a$ because the outcome $\beta$ is preferable to the outcome $\alpha$, and at the same time judge $b$ to be better than $d$ because $b$ reveals event $E$. Note again that there is no issue with imposing the monotonicity axiom with respect to event $E$ in case the outcomes on $\bar{E}$ are disjoint from the outcomes on $E$.
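A numerical instance under the same kind of illustrative assumption (expected utility plus the Shannon entropy of the revealed partition, $P(E) = 0.5$, and hypothetical utilities $u(\alpha) = 0$, $u(\beta) = 0.2$) produces $d \succ a$ together with $b \succ d$, the combination that monotonicity forbids.

```python
import math


def binary_entropy(p):
    """Entropy (nats) of the partition {E, not-E} when P(E) = p."""
    return sum(-q * math.log(q) for q in (p, 1 - p) if q > 0)


p_E = 0.5
u_alpha, u_beta = 0.0, 0.2   # beta is the better outcome

U_a = u_alpha                # constant alpha: reveals nothing
U_d = u_beta                 # constant beta: reveals nothing
U_b = p_E * u_alpha + (1 - p_E) * u_beta + binary_entropy(p_E)  # alpha on E, beta off E

# Monotonicity (P3) requires: a >= d exactly when b >= d.
print(U_d > U_a, U_b > U_d, (U_a >= U_d) == (U_b >= U_d))   # True True False
```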

We therefore make the following adjustment to the monotonicity axiom:

Axiom 3 (Information-Neutral Monotonicity). Suppose $h_{\bar{E}}$ is disjoint from $\gamma$ and $\beta$. Then,

$$\gamma_E h_{\bar{E}} \succsim \beta_E h_{\bar{E}} \iff \gamma \succsim \beta \qquad (4)$$

Information-neutral monotonicity implies that outcomes are identically ranked on all events as long as a change in outcomes does not change the information partition. By requiring disjointness, we guarantee that when combining $\beta_E$ and $\gamma_E$ with $h_{\bar{E}}$, the information about the event $E$ is identical across the two acts. For example, $\beta_E \gamma_{\bar{E}} \succ \gamma$ together with $\gamma \succ \beta$ is a legitimate preference in our model. It means that the outcome $\gamma$ is preferable to $\beta$, but obtaining information about the event $E$ may be more valuable than receiving the better outcome $\gamma$ instead of $\beta$ on the event $E$.

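P 4 (Likelihood Outcome Independence). For all $E, F \in \mathcal{E}$ and all $\beta, \beta', \gamma, \gamma' \in \mathcal{X}$ such that $\gamma \succ \beta$ and $\gamma' \succ \beta'$,

$$\gamma_E \beta \succsim \gamma_F \beta \iff \gamma'_E \beta' \succsim \gamma'_F \beta' \qquad (5)$$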

The likelihood outcome independence axiom ensures that the decision maker has preferences consistent with the existence of a likelihood relation over events. For an expected utility maximizing decision maker who prefers $\gamma_E \beta$ to $\gamma_F \beta$, we can conclude that the decision maker judges $E$ to be more likely than $F$ if $\gamma \succ \beta$. No matter whether $\gamma'$ is a lot or just slightly better than $\beta'$, the decision maker should prefer to realize $\gamma'$ on the more likely event. Thus, the decision maker should also prefer $\gamma'_E \beta'$ to $\gamma'_F \beta'$.

For a decision maker who cares about knowledge, $\gamma_E \beta \succ \gamma_F \beta$ need not mean that $E$ is strictly more likely than $F$. Instead, the decision maker may simply prefer knowing whether $E$ obtains to knowing whether $F$ obtains. Indeed, if $F = S$, then $\gamma_F \beta = \gamma$ is completely uninformative, and $\gamma_E \beta \succ \gamma_F \beta$ simply means that the decision maker is willing to accept the loss of realizing $\beta$ instead of $\gamma$ on event $\bar{E}$ in order to find out whether event $E$ or event $\bar{E}$ obtains. Likelihood outcome independence then requires the decision maker to accept not only this loss but also any arbitrarily large loss of realizing $\beta'$ instead of $\gamma'$ on $\bar{E}$. In this manner, likelihood outcome independence disallows tradeoffs between the value of knowledge and outcomes.

Yet again, there are also cases in which likelihood outcome independence is consistent with a preference for knowledge. From $\gamma_E \beta \succsim \beta_E \gamma = \gamma_{\bar{E}} \beta$, we can indeed conclude that $E$ is at least as likely as $\bar{E}$; in both acts the decision maker learns whether the event $E$ obtains. The former act must then be at least as good as the latter act by virtue of event $E$ being at least as likely as its complement. We can generalize this idea to arbitrary disjoint events $E$ and $F$ by concluding that $E$ is at least as likely as $F$ if the decision maker weakly prefers receiving the good outcome $\gamma$ on $E$ and the bad outcome $\beta$ on $F$ to the reverse assignment, as long as both acts yield the same information. We therefore change the likelihood outcome independence axiom to be consistent with this way of identifying likelihoods.

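Axiom 4 (Information-Neutral Likelihood Outcome Independence). For all disjoint events $E, F \in \mathcal{E}$ and all distinct outcomes $\alpha, \beta, \beta', \gamma, \gamma' \in \mathcal{X}$, if $\gamma \succ \beta$ and $\gamma' \succ \beta'$, then

$$\gamma_E \beta_F \alpha \succsim \beta_E \gamma_F \alpha \iff \gamma'_E \beta'_F \alpha \succsim \beta'_E \gamma'_F \alpha \qquad (6)$$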

In fact, it is straightforward to show that the above axiom implies likelihood outcome independence given the sure-thing principle. Under the weaker information-neutral sure-thing principle, however, the two axioms are distinct.

For a preference $\succsim$ fulfilling information-neutral likelihood outcome independence, there exists a well-defined likelihood relation $\succsim^*$ over events.³

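Definition 2 (Likelihood Relation). Let $E, F \in \mathcal{E}$ be events and $\alpha, \beta, \gamma \in \mathcal{X}$ be distinct outcomes such that $\gamma \succ \beta$. $E$ is revealed by $\succsim$ to be at least as likely as $F$, written $E \succsim^* F$, if

$$\gamma_{E \setminus F}\, \beta_{F \setminus E}\, \alpha \succsim \beta_{E \setminus F}\, \gamma_{F \setminus E}\, \alpha \qquad (7)$$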
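Definition 2 recovers likelihoods even for a knowledge-seeking decision maker because the two acts in Equation (7) generate the same information partition, so any term that depends only on the partition cancels. The sketch checks this for a hypothetical entropic knowledge utility on six states with made-up probabilities: for several weights `lam` on the entropy term, the revealed comparison agrees with $P(E) \geq P(F)$.

```python
import math
from collections import defaultdict


def utility(act, prob, u, lam):
    """Hypothetical entropic knowledge utility: EU plus lam times partition entropy."""
    cell_prob = defaultdict(float)
    for s, outcome in act.items():
        cell_prob[outcome] += prob[s]
    entropy = sum(-q * math.log(q) for q in cell_prob.values() if q > 0)
    return sum(prob[s] * u[act[s]] for s in act) + lam * entropy


prob = {1: 0.1, 2: 0.2, 3: 0.15, 4: 0.25, 5: 0.2, 6: 0.1}
u = {"alpha": 0.5, "beta": 0.0, "gamma": 1.0}   # gamma strictly better than beta


def revealed_at_least_as_likely(E, F, lam):
    """Equation (7): gamma on E-F, beta on F-E, alpha elsewhere, versus the swap."""
    def act(on_E_only, on_F_only):
        outcomes = {}
        for s in prob:
            if s in E - F:
                outcomes[s] = on_E_only
            elif s in F - E:
                outcomes[s] = on_F_only
            else:
                outcomes[s] = "alpha"
        return outcomes
    lhs = utility(act("gamma", "beta"), prob, u, lam)
    rhs = utility(act("beta", "gamma"), prob, u, lam)
    return lhs >= rhs


E, F = {1, 2, 3}, {3, 4}   # P(E) = 0.45, P(F) = 0.40
for lam in (0.0, 1.0, 10.0):
    print(lam, revealed_at_least_as_likely(E, F, lam), revealed_at_least_as_likely(F, E, lam))
# 0.0 True False
# 1.0 True False
# 10.0 True False
```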

P 5 (Nontriviality). There are $\beta, \gamma \in \mathcal{X}$ such that $\gamma \succ \beta$.

Nontriviality is consistent with a preference for knowledge. However, it imposes that the decision maker also has some nontrivial consequentialist preference and thus cannot have only a preference for knowledge.

Finally, Savage introduces a continuity assumption that guarantees that one can find sufficiently small events to measure the utility of every act in terms of an arbitrarily chosen outcome.

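P 6 (Event Continuity). If $f, g \in \mathcal{A}$ are such that $f \succ g$ and $\alpha \in \mathcal{X}$, then there is a finite partition $\mathcal{H}$ of $S$ such that for every $H \in \mathcal{H}$,

$$\alpha_H f \succ g \quad \text{and} \quad f \succ \alpha_H g. \qquad (8)$$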

While we would technically be able to work with this axiom, we find that it is not quite in the spirit of our paper. Specifically, the change from $f$ to $\alpha_H f$ may involve different changes to the information partition than the change from $g$ to $\alpha_H g$. We therefore employ a continuity axiom that ensures that preferences are continuous in monotone changes in the events on which outcomes are achieved.

We state our continuity properties in terms of convergent sequences of actsin which both the information and the outcomes converge. Stating continuityproperties using convergent sequences is quite standard in the context of deci-sions within a topological space such as consumer choice. To avoid introducinga topological structure on outcomes or acts, we use set-theoretic limits of events.For a sequence f k, k = 1, 2, . . ., the notation f k → f means that the acts inthe sequence f k become arbitrarily similar to f with the following condition.

³ We note at this point that, empirically, obtaining likelihood judgments in this manner may be preferable if the exact state cannot be revealed to the subject. If subjects have a preference for knowledge, then the above way of obtaining likelihood judgments is robust to this preference.


Every outcome α is either acquired on more and more states or on fewer and fewer states as the sequence progresses. Moreover, the set of states on which the outcome obtains is, in the limit, the same as in f. We make this notion precise in Appendix A.

Continuity of the preference relation is then defined as follows:

Axiom 6 (Continuity). If $f^k \to f$ and $g^k \to g$ and $f^k \succsim g^k$ for all k, then $f \succsim g$.

As sequences of acts converge to an act, the preference is not permitted to “jump”: similar acts must be similarly ranked.

This concludes our adjustments to Savage’s original axioms. In addition, we assume that the intrinsic value of information is separable from the outcomes obtained. In principle, a decision maker may find information especially valuable in case a particular outcome is obtained. For example, knowing how to drive a car will generally be more useful if one acquires a car. However, there are many cases in statistics, research, and media consumption in which it is plausible to assume that the benefits from knowledge are separable from the outcomes obtained. Nevertheless, we fully acknowledge that the following axiom is not a pure axiom of rationality.

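Axiom 7 (Learning Independence). Suppose α ∼ α′ and δ ∼ δ′ are outcomes distinct from the outcomes of f and g, and E ∩ F = ∅. Then,

$$\alpha_E\,\alpha_F\,f \succsim \delta_E\,\delta_F\,g \iff \alpha_E\,\alpha'_F\,f \succsim \delta_E\,\delta'_F\,g \tag{9}$$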

This axiom guarantees that the value of learning whether event E or F obtains is unrelated to whether the outcomes α or δ arise on these events.

Our main representation theorem, based on the previously introduced axioms, characterizes the sum of expected utility and the value of learning whether particular events obtain or not. We call this value a subjective knowledge utility because the value of knowing whether an event obtains is subjectively determined. A decision maker may prefer learning about event E rather than event F simply because she finds E more interesting than F.

Definition 3 (Subjective Knowledge Utility). ≿ on the set $\mathcal{A}$ defined on events $\mathcal{E}$ has a subjective knowledge utility representation if there exist functions $U:\mathcal{A}\to\mathbb{R}$, $u:X\to\mathbb{R}$, a monotone continuous function $h:\mathcal{E}\to\mathbb{R}$, and a probability measure $\mu:\mathcal{E}\to\mathbb{R}$ such that U(f) ≥ U(g) if and only if f ≿ g and

$$U(a) = \sum_{\alpha\in X} u(\alpha)\,\mu\big(a^{-1}(\alpha)\big) + \sum_{\alpha\in X} h\big(a^{-1}(\alpha)\big) \tag{10}$$

The first component of the representation is the expected utility of the outcomes and will be called the instrumental component or the instrumental preference of the decision maker. The second component contains the utility from gaining knowledge about which event obtains. After choosing act a, the event $a^{-1}(\alpha)$ is known if outcome α is achieved. h(E) + h(F) expresses how much the decision maker enjoys being able to distinguish between the events E and F. If h(E) + h(F) is greater than h(E ∪ F), then the decision maker would like to know whether E or F obtains.
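To illustrate how the two components of (10) interact, here is a minimal sketch under hypothetical assumptions: four equally likely states, outcomes represented as (money, label) pairs, and the entropic choice h(E) = −µ(E) ln µ(E), which merely stands in for an arbitrary subjective h. The two acts pay the same amount in every state, so only the knowledge component differs.

```python
from math import log

# Hypothetical example: four equally likely states.
mu_state = {"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25}

def mu(event):
    return sum(mu_state[s] for s in event)

def u(outcome):
    # Outcomes are (money, label) pairs so that indifferent outcomes stay distinct.
    return outcome[0]

def h(event):
    # Hypothetical knowledge term; any set function h on events could be used here.
    p = mu(event)
    return 0.0 if p == 0 else -p * log(p)

def knowledge_utility(act):
    """Equation (10): expected utility plus h summed over the cells a^{-1}(alpha)."""
    cells = {}
    for s, outcome in act.items():
        cells.setdefault(outcome, set()).add(s)  # information partition of the act
    instrumental = sum(u(x) * mu(c) for x, c in cells.items())
    knowledge = sum(h(c) for c in cells.values())
    return instrumental + knowledge

coarse = {s: (10, "x") for s in mu_state}  # reveals nothing
fine = {"s1": (10, "x"), "s2": (10, "x"), "s3": (10, "y"), "s4": (10, "y")}
```

Here the finer act is worth exactly h(E) + h(F) − h(E ∪ F) = ln 2 more than the coarse act, so this hypothetical decision maker would like to know whether E = {s1, s2} or F = {s3, s4} obtains.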

Theorem 1. Suppose all conditional acts are outcome-solvable. Then the following statements are equivalent:

1. ≿ has a subjective knowledge utility representation with a unique probability measure.

2. ≿ fulfills weak order, the information-neutral sure-thing principle, information-neutral monotonicity, information-neutral likelihood outcome independence, nontriviality, continuity, and learning independence.

By adjusting Savage’s axioms, we have therefore obtained a representation that accounts for the value of knowing which event obtains.⁴ These preferences encompass a wealth of possible attitudes to information, as we will see in Section 6. They can express a varying intensity of preference for knowledge about some events but a preference against knowledge for other events. In the following section, we therefore analyze in more detail the subjective value of knowledge that a decision maker attaches to being able to distinguish between events.

5 Value of Knowledge

Regarding risk attitudes, we often distinguish preferences that are risk-seeking, risk-neutral, and risk-averse. A similar classification can be made for knowledge preferences, which we will call the knowledge attitude of a decision maker. A decision maker may exhibit knowledge-averse or knowledge-seeking behavior. An individual is knowledge seeking about distinguishing event E from event F if for outcomes α ∼ α′ and β we have that $\alpha_E\,\alpha'_F\,\beta \succsim \alpha_{E\cup F}\,\beta$. Similarly, the decision maker is knowledge averse about these events if the preference is reversed.

⁴ Gilboa and Lehrer (1991) provide conditions under which the preference for knowledge about events expressed by h can be derived from expected utility maximization with respect to a subsequent decision problem.

If the decision maker is knowledge seeking (averse) for all nonnull, disjoint events E, F, we simply say that the decision maker is knowledge seeking (averse). An expected utility maximizer is both knowledge seeking and knowledge averse and thus knowledge neutral. Since a set function h defined on a σ-algebra is subadditive if h(E ∪ F) ≤ h(E) + h(F) for all disjoint E, F, and superadditive if the reverse inequality holds, the following result is straightforward.

Proposition 1. A subjective knowledge utility is knowledge seeking if and only if h is a subadditive set function. The utility is knowledge averse if and only if h is a superadditive set function.

Subadditivity and superadditivity of h are therefore the defining properties of knowledge attitudes. We will see below that this property also extends to comparisons of decision makers.
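Proposition 1 can be checked by brute force on a finite state space. The sketch below uses a hypothetical prior and two hypothetical knowledge terms; it enumerates all pairs of nonempty disjoint events (all of which are nonnull here) and tests the subadditivity and superadditivity inequalities.

```python
from itertools import combinations
from math import log

prior = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4}  # hypothetical beliefs

def mu(event):
    return sum(prior[s] for s in event)

def all_events(states):
    """All subsets of a finite state space (the power-set sigma-algebra)."""
    return [frozenset(c) for k in range(len(states) + 1) for c in combinations(states, k)]

def disjoint_pairs(states):
    events = all_events(states)
    return [(E, F) for E in events for F in events if E and F and E.isdisjoint(F)]

def is_subadditive(h, states, tol=1e-12):
    return all(h(E | F) <= h(E) + h(F) + tol for E, F in disjoint_pairs(states))

def is_superadditive(h, states, tol=1e-12):
    return all(h(E | F) >= h(E) + h(F) - tol for E, F in disjoint_pairs(states))

def h_entropic(E):
    # -mu ln mu: subadditive, hence knowledge seeking by Proposition 1.
    p = mu(E)
    return 0.0 if p == 0 else -p * log(p)

def h_quadratic(E):
    # mu^2: superadditive, hence knowledge averse by Proposition 1.
    return mu(E) ** 2

states = list(prior)
```

Both checks are exhaustive over the power set, so this approach is only practical for a handful of states.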

For a decision maker with a subjective knowledge utility representation U, we define the value of knowledge as follows.

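Definition 4 (Value of Knowledge). For a decision maker with a subjective knowledge utility representation, the value of distinguishing a finite set of disjoint events $\{E_1,\dots,E_n\}$ is

$$H(E_1,\dots,E_n) \equiv \frac{U\big({\gamma^{(1)}}_{E_1}\cdots{\gamma^{(n)}}_{E_n}\,\delta\big) - U\big(\gamma_{E_1\cup\dots\cup E_n}\,\delta\big)}{\mu\big(\bigcup_{i=1}^{n}E_i\big)\,\big(U(\gamma)-U(\beta)\big)} = \frac{\sum_{i=1}^{n}h(E_i) - h\big(\bigcup_{i=1}^{n}E_i\big)}{\mu\big(\bigcup_{i=1}^{n}E_i\big)\,\big(U(\gamma)-U(\beta)\big)} \tag{11}$$

where $\gamma^{(1)},\dots,\gamma^{(n)}$ are mutually distinct outcomes indifferent to γ, and β is an outcome with γ ≻ β.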

This definition has the desirable property that it fulfills the identity

$$\mu(E\cup F\cup G)\,H(E,F,G) = \mu(E\cup F\cup G)\,H(E\cup F,G) + \mu(E\cup F)\,H(E,F),$$

that is, $H(E,F,G) = H(E\cup F,G) + \frac{\mu(E\cup F)}{\mu(E\cup F\cup G)}\,H(E,F)$. Thus, the value of gaining knowledge about several events can be decomposed into the expected value of knowledge of being able to distinguish between binary events.
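Since the numerators in (11) telescope, the decomposition µ(E ∪ F ∪ G)H(E, F, G) = µ(E ∪ F ∪ G)H(E ∪ F, G) + µ(E ∪ F)H(E, F) holds for any h. The sketch below checks it numerically with a hypothetical prior and a deliberately non-probabilistic h (it depends on hypothetical state weights, not only on µ), normalizing U(γ) − U(β) to 1.

```python
from math import sqrt

prior = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4}   # hypothetical beliefs
weight = {"s1": 1.0, "s2": 3.0, "s3": 0.5, "s4": 2.0}  # hypothetical state weights
UNIT = 1.0  # stands for U(gamma) - U(beta) > 0

def mu(E):
    return sum(prior[s] for s in E)

def h(E):
    # Hypothetical knowledge term that depends on which states an event contains.
    return sqrt(sum(weight[s] for s in E)) * mu(E)

def H(*events):
    """Value of knowledge (11) for a collection of disjoint events."""
    union = frozenset().union(*events)
    return (sum(h(E) for E in events) - h(union)) / (mu(union) * UNIT)

E, F, G = {"s1"}, {"s2"}, {"s3"}
EF, EFG = E | F, E | F | G
lhs = mu(EFG) * H(E, F, G)
rhs = mu(EFG) * H(EF, G) + mu(EF) * H(E, F)
```

Dividing both sides by µ(E ∪ F ∪ G) shows that the weight on H(E, F) is the conditional probability of E ∪ F given E ∪ F ∪ G, which is what makes the decomposition an expected value.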

In decisions under risk, certainty equivalents are a useful indicator of risk preferences. For knowledge preferences, we define the knowledge equivalent (KE) as follows:


Definition 5 (Knowledge Equivalent). At outcome β, the knowledge equivalent (KE) of whether E or F obtains is the outcome γ that fulfills

$$\beta_E\,\beta'_F\,\alpha \sim \gamma_{E\cup F}\,\alpha \tag{12}$$

for some α and β′ ∼ β. We denote the knowledge equivalent by KE(E, F, β) = γ.

The following result is straightforward.

Proposition 2. Suppose conditional acts are outcome-solvable and the decision maker’s preferences have a subjective knowledge utility representation. Then the decision maker has a higher value of knowledge for distinguishing E from F than for distinguishing G from H if and only if the corresponding knowledge equivalent is preferred, i.e.,

$$H(E,F) \ge H(G,H) \iff KE(E,F,\beta) \succsim KE(G,H,\beta) \quad \forall \beta \in X. \tag{13}$$
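Under representation (10), condition (12) can be solved explicitly: u(KE(E, F, β)) = u(β) + [h(E) + h(F) − h(E ∪ F)]/µ(E ∪ F), which is why knowledge equivalents are ordered like H in (13). The sketch below assumes a hypothetical logarithmic utility of money, a hypothetical prior, and an entropic h; the function names are illustrative only.

```python
from math import exp, log

prior = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4}  # hypothetical beliefs

def mu(E):
    return sum(prior[s] for s in E)

def h(E):
    # Hypothetical entropic knowledge term with v = 2.
    p = mu(E)
    return 0.0 if p == 0 else -2.0 * p * log(p)

def u(x):
    # Hypothetical utility of money, defined for x > 0.
    return log(x)

def u_inv(y):
    return exp(y)

def normalized_gain(E, F):
    """(h(E) + h(F) - h(E u F)) / mu(E u F), proportional to H(E, F) in (11)."""
    return (h(E) + h(F) - h(E | F)) / mu(E | F)

def knowledge_equivalent(E, F, beta):
    # Solve (12): u(gamma) mu(E u F) + h(E u F) = u(beta) mu(E u F) + h(E) + h(F).
    return u_inv(u(beta) + normalized_gain(E, F))

pairs = [({"s1"}, {"s2"}), ({"s3"}, {"s4"})]
```

For this knowledge-seeking h the knowledge equivalent exceeds β, and the pair with the larger normalized gain has the larger knowledge equivalent at every β.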

Comparing the knowledge preferences of different decision makers is nontrivial in a subjective model. Decision makers may differ in their utilities of outcomes, u, their probabilities, µ, and their knowledge preferences, h. Consider as an example the decision between $\beta_E\,\beta'_F\,\delta$ and $\gamma_{E\cup F}\,\delta$, which decision maker 1 may perceive as a decision between a better outcome $\gamma \succ_1 \beta \sim_1 \beta'$ and knowledge about E in case E ∪ F comes about. Decision maker 2, who disagrees on u and h, may see this as a decision between a lottery over outcomes $\beta \succ_2 \beta'$ and a safe payoff γ. Yet another decision maker 3 may think that E is null, so that learning about E ∪ F is sufficient to know that F obtains. Comparisons between decision makers therefore need to carefully account for the various aspects on which decision makers can differ.

To compare knowledge attitudes, we therefore assume that all decision makers agree on the likelihood of events, ≿∗. We say that two decision makers 1 and 2 have identical instrumental preferences if for all acts f and g such that ι(f) = ι(g), we have that $f \succsim_1 g$ if and only if $f \succsim_2 g$.

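Definition 6 (Curiosity Relation). For two decision makers with preferences $\succsim_1$ and $\succsim_2$ that yield identical likelihood relations $\succsim_1^* = \succsim_2^*$ and identical instrumental preferences, we say that $\succsim_1$ is more curious about (disjoint events) E and F than $\succsim_2$ if

$$\beta_E\,\beta'_F\,\alpha \succsim_2 \gamma_{E\cup F}\,\alpha \;\Rightarrow\; \beta_E\,\beta'_F\,\alpha \succsim_1 \gamma_{E\cup F}\,\alpha \tag{14}$$

and

$$\beta_E\,\beta'_F\,\alpha \succ_2 \gamma_{E\cup F}\,\alpha \;\Rightarrow\; \beta_E\,\beta'_F\,\alpha \succ_1 \gamma_{E\cup F}\,\alpha \tag{15}$$

for all α, β ∼ β′, γ ∈ X. We say that $\succsim_1$ is more curious than $\succsim_2$ if the former is more curious than the latter about all disjoint events E and F.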

The following proposition establishes that a DM is more curious than another if and only if she has a higher knowledge equivalent.

Proposition 3. Suppose two decision makers with a subjective knowledge utility representation agree on ≿∗ and have the same instrumental preferences. Then the following statements are equivalent:

1. $KE_1(E,F,\beta) \succsim KE_2(E,F,\beta)$ for all β ∈ X and $E, F \in \mathcal{E}$.

2. $\succsim_1$ is more curious than $\succsim_2$.

3. $D(E) = \frac{h_1(E)}{U_1(\gamma)-U_1(\beta)} - \frac{h_2(E)}{U_2(\gamma)-U_2(\beta)}$ is a subadditive function.

The function D(E) can be seen as the difference between the two decision makers in the (normalized) value of knowing whether event E obtains. The equivalence of the second and third statements is very natural given that subadditivity of h implies knowledge-seeking behavior: “higher subadditivity” means higher curiosity. This parallels analogous results for risk preferences; the subadditivity of h is to knowledge preferences what the concavity of u is to risk preferences.

Equipped with an understanding of how h represents knowledge preferences, we characterize specific functional forms of h in the following section.

6 Characterizations of the Value of Knowledge

While the appeal of the subjective knowledge utility representation lies in its generality, for practical applications it is often desirable to have more restrictive decision models. In this section, we axiomatically characterize several functional forms that are more tractable in practical applications.


6.1 Probabilistic Knowledge Utility

A decision maker might care only about the probability of the known events and ignore all other aspects of these events. This is an appealing condition if the event space is very homogeneous and there are no events that are intrinsically more interesting than others. In practical applications, it may also be a useful approximation in case the decision maker has already selected what information to gain and the primary tradeoff is how much information is gained.

Definition 7 (Probabilistic Knowledge Utility). ≿ on the set $\mathcal{A}$ of simple acts on events $\mathcal{E}$ has a probabilistic knowledge utility representation if there exist functions $U:\mathcal{A}\to\mathbb{R}$, $u:X\to\mathbb{R}$, a continuous function $h:[0,1]\to\mathbb{R}$, and a probability measure $\mu:\mathcal{E}\to\mathbb{R}$ such that U(f) ≥ U(g) if and only if f ≿ g and

$$U(a) = \sum_{\alpha\in X} u(\alpha)\,\mu\big(a^{-1}(\alpha)\big) + \sum_{\alpha\in X} h\Big(\mu\big(a^{-1}(\alpha)\big)\Big) \tag{16}$$

We call this utility representation a probabilistic knowledge utility because the knowledge about events is valued only in terms of the subjective probability attached to the events. Thus, acts are treated as signals about the state space, and no special value is attached to knowing particular events. We formalize the notion of a decision maker caring only about the likelihood of events for their value of knowledge as follows.

Axiom 8 (Principle of Indifference of Information). If E ∼∗ F, then

$$\alpha_E\,\beta_F\,\alpha_G\,\beta_H\,\gamma \sim \beta_E\,\alpha_F\,\alpha_G\,\beta_H\,\gamma. \tag{17}$$

This principle of indifference of information states that if E and F are equally likely, then the decision maker is indifferent between learning whether E ∪ G or F ∪ H obtains and learning whether F ∪ G or E ∪ H obtains. Together with our previous axioms, it characterizes a probabilistic knowledge utility.⁵
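The following sketch checks (17) numerically for a probabilistic knowledge term and shows that the indifference can fail when h depends on which states an event contains. The atoms, prior, outcomes, and both h functions are hypothetical; each of E, F, G, H is modeled as a single atom, and "rest" collects the remaining states.

```python
from math import sqrt

# Hypothetical atoms: E and F are equally likely.
prior = {"E": 0.2, "F": 0.2, "G": 0.15, "H": 0.25, "rest": 0.2}

def mu(c):
    return sum(prior[s] for s in c)

def U(act, h):
    """Representation (10) for outcomes (money, label); h is a function of events."""
    cells = {}
    for s, x in act.items():
        cells.setdefault(x, set()).add(s)
    return sum(x[0] * mu(c) + h(c) for x, c in cells.items())

alpha, beta, gamma = (1, "alpha"), (2, "beta"), (0, "gamma")
act1 = {"E": alpha, "F": beta, "G": alpha, "H": beta, "rest": gamma}  # left side of (17)
act2 = {"E": beta, "F": alpha, "G": alpha, "H": beta, "rest": gamma}  # right side of (17)

def h_prob(c):
    # Probabilistic knowledge term: depends on mu(c) only, as in (16).
    p = mu(c)
    return sqrt(p) - p ** 2

weight = {"E": 1.0, "F": 3.0, "G": 2.0, "H": 1.0, "rest": 1.0}

def h_state(c):
    # Hypothetical non-probabilistic knowledge term.
    return sqrt(sum(weight[s] for s in c))
```

With `h_prob` the two acts receive the same utility, whereas `h_state` treats the equally likely events E and F differently and breaks the indifference.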

Theorem 2. The following statements are equivalent:

1. ≿ has a probabilistic knowledge utility representation with a unique probability measure.

2. ≿ fulfills weak order, the information-neutral sure-thing principle, information-neutral monotonicity, information-neutral likelihood outcome independence, continuity, learning independence, and the principle of indifference of information.

⁵ Unlike our first result, this result does not use outcome solvability and therefore does not implicitly assume a continuum of outcomes. We derive this result for a countably infinite number of outcomes for technical convenience. In principle, the result can also be derived for a sufficiently large finite number of outcomes.

6.2 Entropic Knowledge Utility

A probabilistic knowledge utility representation is dynamically consistent in the sense that for every event we can find a probabilistic knowledge utility representation that represents the preferences over conditional acts on this event. However, this utility representation need not be the one obtained from applying Bayesian updating to the probabilities and then using the same representation. It is in this sense dynamically consistent but not stationary: suppose a decision maker chooses between acts that yield distinct outcomes on events E, F, G, which partition the state space. If the decision maker learns that event E does not obtain, updates beliefs according to Bayes’ rule, and then uses equation (16) to evaluate the conditional acts given F ∪ G, preference reversals may occur. In contrast, the following representation based on the Shannon entropy does not exhibit such preference reversals.

Definition 8 (Entropic Knowledge Utility). ≿ on the set $\mathcal{A}$ of simple acts on events $\mathcal{E}$ has an entropic knowledge utility representation if there exist functions $U:\mathcal{A}\to\mathbb{R}$, $u:X\to\mathbb{R}$, a real number v, and a probability measure $\mu:\mathcal{E}\to\mathbb{R}$ such that U(f) ≥ U(g) if and only if f ≿ g and

$$U(a) = \sum_{\alpha\in X} u(\alpha)\,\mu\big(a^{-1}(\alpha)\big) - v\sum_{\alpha\in X} \mu\big(a^{-1}(\alpha)\big)\ln\mu\big(a^{-1}(\alpha)\big) \tag{18}$$

The parameter v determines how strong the knowledge preference is and whether the decision maker is knowledge seeking or knowledge averse. The entropic knowledge utility is indeed unique in the respect that, if U is the entropic knowledge utility representation, then U can also be used to compare conditional acts in $\mathcal{A}_E$ after updating probabilities. It therefore deserves its own characterization, which we prepare with the following definitions.
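The claim that the entropic U can be reused after Bayesian updating can be checked directly. For acts that agree off an event F, with an outcome there that does not occur on F, (18) gives U(a) = µ(F)·U_F(a_F) + const, where U_F applies (18) with the updated probabilities. The sketch below, with a hypothetical prior, event F, and value of v, verifies that the constant is the same across several conditional acts.

```python
from math import log

prior = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.4}  # hypothetical beliefs
F = ["s2", "s3", "s4"]                               # hypothetical conditioning event
V = 1.5                                              # hypothetical knowledge parameter v

def entropic_U(act, prob):
    """Equation (18) for an act {state: (money, label)} under the measure prob."""
    cells = {}
    for s, x in act.items():
        cells[x] = cells.get(x, 0.0) + prob[s]
    return sum(x[0] * p - V * p * log(p) for x, p in cells.items() if p > 0)

mu_F = sum(prior[s] for s in F)
posterior = {s: prior[s] / mu_F for s in F}  # Bayesian updating on F
outside = {"s1": (0, "outside")}             # fixed outcome off F, not used on F

conditional_acts = [
    {"s2": (5, "a"), "s3": (5, "a"), "s4": (5, "a")},
    {"s2": (5, "a"), "s3": (5, "b"), "s4": (5, "b")},
    {"s2": (3, "a"), "s3": (6, "b"), "s4": (4, "c")},
]

# U(full act) - mu(F) * U_F(conditional act) should not depend on the conditional act.
gaps = [entropic_U({**outside, **a}, prior) - mu_F * entropic_U(a, posterior)
        for a in conditional_acts]
```

Because the gap is constant, ranking conditional acts with the updated representation agrees with ranking the corresponding full acts; by Theorem 3, replacing −p ln p with another function of p generally breaks this.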

Definition 9 (Bayesian Updating). A relation $\succsim^*_E$ on the subevents intersecting E is obtained from Bayesian updating of ≿∗ if it is a quantitative probability with representation $\mu_E$ and $\mu_E(F) = \mu(F\cap E)/\mu(E)$.

From the probabilities obtained by Bayesian updating, we can derive comparisons between conditional likelihoods. Let $\mathcal{C} = \mathcal{E}\times\mathcal{E}$ be the set of conditional events, with elements denoted by E|F. We define a likelihood relation ≿† on $\mathcal{C}$ as follows: E|F ≿† G|H if and only if $\mu_F(F\cap E) \ge \mu_H(G\cap H)$.⁶

The following stationarity condition sharpens the principle of indifference of information and guarantees that using the same utility function on conditional acts after Bayesian updating of beliefs does not yield preference reversals.

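Definition 10 (Stationarity I). Suppose that for all i ≤ k, $E_i \mid \bigcup_{j=1}^{k} E_j \sim^{\dagger} G_i \mid \bigcup_{j=1}^{k} G_j$. Then, for all (not necessarily distinct) $\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_k \in X\setminus\{\gamma\}$,

$${\alpha_1}_{E_1}\cdots{\alpha_k}_{E_k}\,\gamma \succsim {\beta_1}_{E_1}\cdots{\beta_k}_{E_k}\,\gamma \iff {\alpha_1}_{G_1}\cdots{\alpha_k}_{G_k}\,\gamma \succsim {\beta_1}_{G_1}\cdots{\beta_k}_{G_k}\,\gamma \tag{19}$$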

Let $f_F$ be an arbitrary conditional act on the event F and $\psi(f_F)$ an act fulfilling (19) with H = S. Thus, ψ maps conditional acts into acts such that the conditional probabilities of the outcomes are unchanged. If ≿ fulfills stationarity I, then the function U ∘ ψ can be used to evaluate conditional acts. This means that the relative value of knowledge remains the same after updating on an event.

The previous definition of stationarity employed the somewhat nonstandard conditional likelihood relation ≿†. We therefore also provide an alternative definition that does not rely on ≿† but is equivalent given our axioms of a subjective knowledge utility representation.

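Definition 11 (Stationarity II). For all events E, F, G, H such that E ∼ F and G ∼ H and for all (not necessarily distinct) outcomes $\alpha,\beta,\gamma,\delta \in X\setminus\{\varepsilon\}$,

$$\alpha_E\,\beta_F\,\varepsilon \succsim \gamma_E\,\delta_F\,\varepsilon \iff \alpha_G\,\beta_H\,\varepsilon \succsim \gamma_G\,\delta_H\,\varepsilon \tag{20}$$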

The next theorem shows that the two stationarity definitions are equivalent and that the entropic knowledge utility is the only probabilistic knowledge utility that fulfills these conditions.

⁶ This definition has a natural analogue using equiprobable partitions of events that does not directly refer to the probability measure µ. Simply define E|F ≿† G|H if, for partitions $F_1 \sim^* F_2 \sim^* \dots \sim^* F_k$ of F and $H_1 \sim^* H_2 \sim^* \dots \sim^* H_k$ of H, we have that for some l ≤ k, $E \supseteq F_1\cup\dots\cup F_l$ and $G \subseteq H_1\cup\dots\cup H_l$. Under monotone continuity, this is identical to the definition in terms of probabilities.


Theorem 3. Suppose ≿ has a subjective knowledge utility representation and fulfills outcome solvability. Then the following statements are equivalent.

1. ≿ fulfills stationarity I.

2. ≿ fulfills stationarity II.

3. ≿ has an entropic knowledge utility representation.

A useful comparison for this result is the result that the only stationary time discounting rule is exponential discounting (Koopmans, 1960). Under exponential discounting of a time-independent utility over commodities, we can use the same utility function on substreams of commodities without preference reversals. Similarly, in an entropic knowledge utility representation, the value of distinguishing an event of (conditional) probability p from an event of probability 1 − p is the same irrespective of how much knowledge the decision maker has already gained. This raises the more general question of how the value of knowledge in probabilistic knowledge utility representations changes as additional knowledge is gained, which we address next.

6.3 Constant Elasticity of Curiosity Utility

For decision makers with probabilistic knowledge preferences, the value of knowledge can be expressed as a function of the probabilities of events:

$$H(E,F) = H\big(\mu(E),\mu(F)\big) = \frac{h(\mu(E)) + h(\mu(F)) - h(\mu(E\cup F))}{\mu(E\cup F)\,\big(U(\gamma)-U(\beta)\big)} \tag{21}$$

Using the function H, we can express how the value of knowledge changes as events become more or less likely. Since the value of distinguishing E and F is only meaningful in case E ∪ F is already known, this simultaneously expresses how the value of knowledge changes as more knowledge is gained. We thus define the elasticity of curiosity as the percentage change of the value of knowledge for a percentage change in likelihood.

Definition 12 (Elasticity of Curiosity). The coefficient of the elasticity of curiosity at probability p for distinguishing events of probabilities q and 1 − q is

$$el(p,q) = \frac{\partial H\big(pq,\,p(1-q)\big)}{\partial p}\cdot\frac{p}{H\big(pq,\,p(1-q)\big)}. \tag{22}$$


The coefficient expresses by how many percent the value of knowledge changes for a one percent increase in the probability p of acquiring the information. The relative probabilities of the events to be distinguished, q and 1 − q, are held constant in this comparison. It is straightforward to verify that entropic knowledge utility preferences have a zero elasticity of curiosity.
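For the entropic case, the verification takes one line of algebra. Writing the knowledge term of (18) as h(x) = −v x ln x and substituting into (21), the terms in ln p cancel:

```latex
\begin{aligned}
h(pq) + h\big(p(1-q)\big) - h(p)
  &= -v\big[pq\ln(pq) + p(1-q)\ln\big(p(1-q)\big) - p\ln p\big]\\
  &= -v\,p\big[q\ln q + (1-q)\ln(1-q)\big],\\
H\big(pq,\,p(1-q)\big)
  &= \frac{-v\big[q\ln q + (1-q)\ln(1-q)\big]}{U(\gamma)-U(\beta)},\\
\frac{\partial H\big(pq,\,p(1-q)\big)}{\partial p} = 0
  &\;\Longrightarrow\; el(p,q) = 0 .
\end{aligned}
```

The value of knowledge is thus v times the binary entropy of the conditional split (q, 1 − q), scaled by 1/(U(γ) − U(β)), and does not depend on p.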

For many decision makers, the elasticity of curiosity may be nonzero. A decision maker who is knowledge seeking and has a high elasticity of curiosity will generally only choose to give up instrumental value for a coarsely grained partition of the state space. A decision maker with an entropic knowledge utility representation may also be willing to forego instrumental value to achieve a more detailed partition of the state space. A decision maker may even find a book “addictive” in the sense that reading the second half of the book (which allows her to distinguish between the equally likely events E and F) has a higher knowledge equivalent than reading the first half (which allows her to distinguish between the equally likely events E ∪ F and its complement). This is the case if the elasticity of curiosity is negative.

A functional form that is mathematically tractable and can account for both the intensity of knowledge preferences and different elasticities of curiosity is therefore desirable. The following two-parameter family of preferences achieves this:

Definition 13 (Constant Elasticity of Curiosity (CEC) Utility). ≿ on the set $\mathcal{A}$ of simple acts on events $\mathcal{E}$ has a constant elasticity of curiosity representation if there exist functions $U:\mathcal{A}\to\mathbb{R}$, $u:X\to\mathbb{R}$, real numbers r and v, and a probability measure $\mu:\mathcal{E}\to\mathbb{R}$ such that U(f) ≥ U(g) if and only if f ≿ g and

$$U(a) = \sum_{\alpha\in X} u(\alpha)\,\mu\big(a^{-1}(\alpha)\big) - v\Big(\sum_{\alpha\in X} \mu\big(a^{-1}(\alpha)\big)^{r} - 1\Big) \tag{23}$$

The above representation consists of the expected utility and a monotone transformation of the Rényi (1961) entropy of order r. The parameter v, together with r, determines whether the decision maker is knowledge seeking or knowledge averse, and v determines how strong this preference is compared with the instrumental preferences. The parameter r determines the elasticity of curiosity.
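With h(x) = −v x^r, the numerator of (21) equals −v p^r [q^r + (1 − q)^r − 1], so H(pq, p(1 − q)) is proportional to p^(r−1) and the elasticity of curiosity equals r − 1. The sketch below confirms this numerically, approximating (22) with a central finite difference; the parameter values are hypothetical, and the entropic case is included as a check of zero elasticity.

```python
from math import log

def value_of_knowledge(h, p, q, unit=1.0):
    """Equation (21) for events of probabilities p*q and p*(1-q); unit = U(gamma) - U(beta)."""
    return (h(p * q) + h(p * (1 - q)) - h(p)) / (p * unit)

def elasticity(h, p, q, eps=1e-6):
    """Equation (22), with the derivative in p approximated by a central difference."""
    dH = (value_of_knowledge(h, p + eps, q) - value_of_knowledge(h, p - eps, q)) / (2 * eps)
    return dH * p / value_of_knowledge(h, p, q)

def cec_h(v, r):
    # Per-cell knowledge term of (23); the additive constant v does not affect (21).
    return lambda x: -v * x ** r

def entropic_h(v):
    # Per-cell knowledge term of (18).
    return lambda x: -v * x * log(x) if x > 0 else 0.0
```

For example, `elasticity(cec_h(-1.0, 0.5), 0.6, 0.3)` is approximately −0.5, the negative elasticity associated above with “addictive” information.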

Table 2 provides an example of CEC preferences for a risk-neutral decision maker. For greater clarity of exposition, the willingness to pay (WTP) is only approximate. The decision maker has the choice to acquire knowledge of distinguishing event E from event F in case either of the two obtains.

| Situation | µ(E ∪ F) | µ(E) | µ(F) | approx. WTP |
|---|---|---|---|---|
| A | 1 | q | 1 − q | 200 EUR |
| B | .98 | .98 · q | .98 · (1 − q) | 198 EUR |
| C | .25 | .25 · q′ | .25 · (1 − q′) | 100 EUR |
| D | .245 | .245 · q′ | .245 · (1 − q′) | 99 EUR |

Table 2: CEC Preference Example

In situation A, the decision maker reports a willingness to pay for the information of 200 EUR. The decision maker is sure that she will learn which of E and F obtains, since the probabilities q and 1 − q sum to one. In situation B, the events E and F make up only 98% of the probability, but their relative likelihood is the same as in A. Although learning to distinguish between E and F is 2% less likely, the willingness to pay decreases by only 1%. Comparing situations C and D, we observe the same pattern: a 2% decrease in the likelihood of distinguishing E from F leads to a 1% decrease in the willingness to pay. Note that the relative likelihood of E and F in situations C and D may differ from that in situations A and B.
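The pattern in Table 2 can be reproduced with a hypothetical calibration. Assuming linear utility of money and a payment made in every state (so that the information partition is unchanged), the willingness to pay under (23) is h(µ(E)) + h(µ(F)) − h(µ(E ∪ F)) with h(x) = −v x^r. The values r = 1/2 and q = q′ = 1/2 are illustrative choices, not parameters taken from the paper, and v is chosen so that situation A yields 200 EUR.

```python
def wtp(p, q, v, r):
    """WTP for distinguishing E (prob p*q) from F (prob p*(1-q)) under (23),
    assuming linear utility of money and a payment made in every state."""
    h = lambda x: -v * x ** r  # per-cell knowledge term of (23)
    return h(p * q) + h(p * (1 - q)) - h(p)

r, q = 0.5, 0.5                # hypothetical calibration (q' = q as well)
g = q ** r + (1 - q) ** r - 1  # so that wtp(p, q, v, r) = -v * g * p ** r
v = -200 / g                   # chosen so that situation A gives 200 EUR

situations = {"A": 1.0, "B": 0.98, "C": 0.25, "D": 0.245}
table = {name: round(wtp(p, q, v, r), 1) for name, p in situations.items()}
print(table)
```

Under this calibration the WTP equals 200 · p^(1/2), so a 2% drop in p lowers it by roughly 1%, matching the comparisons of A with B and of C with D.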

We now justify the name CEC preferences for the functional form given above.

Proposition 4 (Characterization of CEC Preferences). Suppose a decision maker’s preferences have a probabilistic knowledge utility representation. Then the following statements are equivalent:

1. The elasticity of curiosity el(p, q) is constant in p.

2. The preferences have an entropic knowledge utility representation or a constant elasticity of curiosity representation.

In some sense, one can see CEC preferences as corresponding to CRRA risk preferences and entropic knowledge utility preferences as corresponding to CARA risk preferences. CARA preferences are invariant under additions and subtractions of income: the willingness to pay for a gamble does not depend on the starting income. Entropic knowledge utility preferences are invariant under additions or removals of initial information: the willingness to forego good outcomes in favor of more information in case of an event does not depend on the likelihood of this event. CRRA preferences change in a consistent manner as income changes, while CEC preferences change in a consistent manner as initial information changes.


6.4 Stationary Information Cost Utility

So far, the characterized functional forms of knowledge preferences treat all information about the event space symmetrically. Often, however, a decision maker may only be interested in a particular random variable and uninterested in other knowledge that can be gained. For example, an econometrician may be interested in a parameter of a distribution but not in every data point generated from this distribution. Nonetheless, knowledge about the data points may be useful for acquiring knowledge about the parameter.

We make this idea precise using random variables. A variable $V: S \to S_V$ is a measurable map from the state space S into the state space of the variable, $S_V$, with σ-algebra $\mathcal{E}_V$. That is, for all $E_V \in \mathcal{E}_V$, there exists an event $E \in \mathcal{E}$ such that $E = V^{-1}(E_V)$. The state space of the variable, $S_V$, may for example specify a parameter value of an econometric model the decision maker is uncertain about. We define $\mu^V_E = \mu_E \circ V^{-1}$ as the probability measure over the variable given event $E \in \mathcal{E}$.

A natural way to evaluate the knowledge gained from an act is to attach a value to each possible posterior probability measure over the random variable generated by the knowledge about events and to calculate its expected value.⁷

Definition 14 (Stationary Information Utility). ≿ has a stationary information representation over variable V if for all events $E \in \mathcal{E}$, the conditional relation $\succsim_E$ can be represented by

$$U_E(a_E) = \sum_{\alpha} \mu_E\big(a_E^{-1}(\alpha)\big)\Big(u(\alpha) + v\big(\mu^V_{a_E^{-1}(\alpha)}\big) - v\big(\mu^V_E\big)\Big) \tag{24}$$

In words, the decision maker perceives a tradeoff between expected utility and the expected valuation of the information gained about the variable V. Analogous to the Bernoulli utility function u for the evaluation of outcomes, the function v evaluates the value of the information $\mu^V_{a_E^{-1}(\alpha)}$ gained by learning that outcome α obtains and compares it with the value of the information $\mu^V_E$ the decision maker had initially.⁸ What is special about stationary information utility is that the evaluation of the posterior does not depend on the prior. An example of a stationary information representation is the mutual information, in which each conditional measure is valued according to $v(\mu^V_E) = \int \mu^V_E(\theta)\ln\mu^V_E(\theta)\,d\theta$. Similarly to the entropy, in mutual information the value of knowing about an event of the variable depends only on the likelihood of the event of the variable. However, in some cases this is not a plausible condition. For example, we may be more interested in the first than in the 100th digit of a uniform random variable. If the state space of the variable is more structured, other interesting functional forms of v are possible, such as the posterior variance of a real-valued parameter. If $S_V = \mathbb{R}$, i.e., if the parameter is real-valued, then the negative of the variance,

$$v(\mu^V_E) = -\int \mu^V_E(\theta)\Big(\theta - \int \theta'\,\mu^V_E(\theta')\,d\theta'\Big)^2 d\theta,$$

is a plausible way of measuring how informed the decision maker is about V.

⁷ To clarify that the evaluation of the posterior probability measures over the variable V does not depend on the prior, we state this and the following representation as representations for all conditional acts. Since for a subjective knowledge representation the preferences over acts determine the preferences over conditional acts, this is without loss of generality.

⁸ The representation is unique up to joint linear transformations of u and v and separate additive transformations of u and v. The functional form is chosen such that $U_E(\gamma_E) = u(\gamma)$.

A stationary information representation is only sensible under the assumption that it is costless for the decision maker to process the information. If a decision maker is, for example, interested in the mean of a random variable, the decision maker may rather choose to take an action that estimates the mean from a data set instead of choosing an act that informs the decision maker about all data in the data set. We can account for this by introducing a cost of “too much information”.
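For a finite parameter space, the integral in the mutual-information example becomes a sum, and the expected information term of (24) then coincides with the mutual information between V and the act’s outcome. The sketch below checks this on a hypothetical two-point parameter with one binary observation, and also evaluates the negative-variance alternative; the joint probabilities and all function names are illustrative.

```python
from math import log

# Hypothetical model: parameter theta in {0, 1}; one binary observation x.
# States of the world are (theta, x) pairs with these joint probabilities.
joint = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.40}

def prob(event):
    return sum(joint[s] for s in event)

def posterior(event):
    """mu^V_E: distribution of theta given an event (a set of states)."""
    post = {}
    for theta, x in event:
        post[theta] = post.get(theta, 0.0) + joint[(theta, x)] / prob(event)
    return post

def v_neg_entropy(dist):
    # Discrete analogue of the integral of mu ln mu.
    return sum(p * log(p) for p in dist.values() if p > 0)

def v_neg_variance(dist):
    # Negative posterior variance of theta.
    mean = sum(t * p for t, p in dist.items())
    return -sum(p * (t - mean) ** 2 for t, p in dist.items())

def information_term(partition, v):
    """Expected v(posterior) - v(prior): the information part of (24) with E = S."""
    prior_value = v(posterior(set(joint)))
    return sum(prob(A) * (v(posterior(A)) - prior_value) for A in partition)

def mutual_information():
    p_theta = {t: sum(p for (tt, _), p in joint.items() if tt == t) for t in (0, 1)}
    p_x = {x: sum(p for (_, xx), p in joint.items() if xx == x) for x in (0, 1)}
    return sum(p * log(p / (p_theta[t] * p_x[x])) for (t, x), p in joint.items())

observe_x = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]  # act that reveals x only
```

The negative-variance term is also positive here, because observing x lowers the expected posterior variance of θ; an act that reveals nothing yields zero under either choice of v.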

Definition 15 (Stationary Information Cost Utility). ≿ has a stationary information cost representation over variable V if for all events $E \in \mathcal{E}$, the conditional relation $\succsim_E$ can be represented by

$$U_E(a_E) = \sum_{\alpha} \mu_E\big(a_E^{-1}(\alpha)\big)\Big(u(\alpha) + v\big(\mu^V_{a_E^{-1}(\alpha)}\big) - v\big(\mu^V_E\big) - r\ln\mu_E\big(a_E^{-1}(\alpha)\big)\Big) \tag{25}$$

The cost of too much information is expressed via the Shannon entropy of the information partition. If r is negative, then the decision maker perceives a tradeoff between gaining as much information as possible about the variable V and having an information partition that is as simple as possible.
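The role of r can be seen in a hypothetical example in which one act additionally reveals an independent fair coin z that carries no information about the parameter. The sketch evaluates (25) with E = S, taking v to be the negative Shannon entropy of the posterior over θ; all names and numbers are illustrative.

```python
from math import log

# Hypothetical states (theta, x, z): x is informative about theta,
# z is an independent fair coin (irrelevant information).
joint = {}
for theta in (0, 1):
    for x in (0, 1):
        for z in (0, 1):
            joint[(theta, x, z)] = 0.5 * (0.8 if x == theta else 0.2) * 0.5

def prob(event):
    return sum(joint[s] for s in event)

def posterior(event):
    post = {}
    for s in event:
        post[s[0]] = post.get(s[0], 0.0) + joint[s] / prob(event)
    return post

def v(dist):
    # Negative Shannon entropy of the posterior over theta.
    return sum(p * log(p) for p in dist.values() if p > 0)

def cost_utility(act, r, u=lambda outcome: outcome[0]):
    """Equation (25) with E = S for an act given as {state: outcome}."""
    cells = {}
    for s, outcome in act.items():
        cells.setdefault(outcome, set()).add(s)
    prior_value = v(posterior(set(joint)))
    return sum(prob(A) * (u(o) + v(posterior(A)) - prior_value - r * log(prob(A)))
               for o, A in cells.items())

# Both acts pay 0 everywhere; labels keep the outcomes distinct.
reveal_x = {s: (0, ("x", s[1])) for s in joint}
reveal_x_and_z = {s: (0, ("x", s[1], "z", s[2])) for s in joint}
```

With r = 0, i.e., representation (24), the two acts are valued equally, in line with indifference to irrelevant information; with r < 0 the finer partition that also reveals z is penalized.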

In the remainder of this section, we characterize the stationary information cost utility. The stationary information utility then follows trivially from the additional condition that the decision maker only cares about knowledge of events in V, as stated by the following condition.

(Footnote 8, continued.) Omitting the term $-v(\mu^V_E)$ from the representation yields identical conditional preferences but does not fulfill this condition.


Definition 16 (Indifference to Irrelevant Information). Suppose E and F are disjoint, nonnull, and fulfill $\mu^V_E = \mu^V_F$. Then, for outcomes α ∼ α′ and β,

$$\alpha_{E\cup F}\,\beta \sim \alpha_E\,\alpha'_F\,\beta. \tag{26}$$

Thus, if two events yield the same conditional probability measure (and thus the same information) about the variable, then being able to distinguish between the two events is irrelevant to the decision maker.

We adjust the stationarity conditions to account for the fact that not only the likelihoods of events are important but also how well these correlate with events in $\mathcal{E}_V$.

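Definition 17 (Stationarity I*). Suppose that for all i ≤ k, $E_i \mid F \sim^{\dagger} G_i \mid H$ and $\mu'_{E_i} = \mu'_{G_i}$. Then, for all (not necessarily distinct) $\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_k \in X\setminus\{\gamma\}$,

$${\alpha_1}_{E_1}\cdots{\alpha_k}_{E_k}\,\gamma \succsim {\beta_1}_{E_1}\cdots{\beta_k}_{E_k}\,\gamma \iff {\alpha_1}_{G_1}\cdots{\alpha_k}_{G_k}\,\gamma \succsim {\beta_1}_{G_1}\cdots{\beta_k}_{G_k}\,\gamma \tag{27}$$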

The corresponding change to the second stationarity axiom is the following condition.

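Definition 18 (Stationarity II*). For all events E, F, G, H such that E ∼ F, G ∼ H, $\mu'_E = \mu'_G$, and $\mu'_F = \mu'_H$, and for all (not necessarily distinct) $\alpha,\beta,\gamma,\delta \in X\setminus\{\varepsilon\}$,

$$\alpha_E\,\beta_F\,\varepsilon \succsim \gamma_E\,\delta_F\,\varepsilon \iff \alpha_G\,\beta_H\,\varepsilon \succsim \gamma_G\,\delta_H\,\varepsilon \tag{28}$$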

We now state a result similar to the previous theorem that yielded an entropic knowledge representation, but allow for a decision maker’s special interest in V. The stationarity condition can be understood using the following simple example. Suppose a decision maker is willing to pay a certain amount of money for information about a parameter. Now somebody offers to flip a coin and to sell the information only if the coin comes up “heads”, while neither payment nor information is exchanged on “tails”. If the decision maker’s willingness to pay changes, this violates stationarity.

The additive separability of the evaluation of prior and posterior distributions turns out to be intricately linked with the stationarity of the value of knowledge.

Theorem 4. Suppose ≿ has a subjective knowledge utility representation and fulfills outcome solvability. Then the following statements are equivalent.

1. ≿ fulfills stationarity I*.

2. ≿ fulfills stationarity II*.

3. ≿ has a stationary information cost utility representation.

In stationary information representations and stationary information cost representations, actions become more or less valuable via the knowledge gained about the variable V. This gives the actions taken by the decision maker the character of estimators. If V is an unknown parameter of a distribution or an economic model, then the data E may be used to inform the decision maker about this parameter. This leads us very naturally to the problem of statistical estimation in general. In the following section, we therefore briefly sketch how certain practices of statistics reveal certain aspects of knowledge preferences.

7 Partition Problems

Savage (1954) considered statistics as the practice of solving partition problems. A partition problem is the problem of determining whether an event E or an event F obtains. Our representation theorem for subjective knowledge utility provides a subjective value of solving partition problems. This provides a novel perspective on partition problems in general because statistical estimators themselves can be considered as acts in our model. This in turn allows us to impose our axioms on the preferences over estimators and to analyze these preferences using the tools developed in the previous sections. To see this, we define estimation problems, data collection problems, and hypothesis testing problems in the language of our model in the following paragraphs.

Let $\mathcal{E}_V$ be a σ-algebra on a parameter space $S_V$. From the perspective of the decision maker, the parameter is a variable V on the more general state space S of states of the world. For simplicity, we assume that $S = \prod_{i=1}^{n} S_i$ consists of n realizations of a distribution based on the parameter space.⁹ We endow S with a product σ-algebra $\mathcal{E}$. The set of outcomes $X = \mathbb{R}_+ \times \mathcal{R}$ consists of elements (c, r) containing a monetary cost $c \in \mathbb{R}_+$ and estimation results $r \in \mathcal{R}$, where $\mathcal{R}$ is the set of possible estimation results.¹⁰

⁹ In fact, we could assume an arbitrary, rich event space and let the variable, and thus the parameter space the decision maker has in mind, be fully subjective and only revealed via the knowledge preferences.

¹⁰ Given our way of modeling knowledge with state-independent evaluation of outcomes, the estimation results are simply messages from a signal and are only meaningful in conjunction

The set of estimators is simply the set of acts in this model. An estimation problem is a choice from the set of estimators given a subjective knowledge utility.

The concept of an estimation problem is rather abstract. While solutions tospecific estimation problems are beyond the scope of our paper, we show howcertain assumptions employed in statistics can be readily translated into ourframework.

Curiosity about parameter: The decision maker attaches a positive valueto information about the variable V but a nonpositive value to informationabout data that does not reveal additional information about V. A stationaryinformation cost utility is an example that exhibits such preferences.

Distribution assumptions: If the decision maker believes that the observationsare independent and identically distributed, then this is straightforward toimpose on the beliefs µ about events and the variable V. Moreover, for suchestimation problems it is sensible to impose that the h function treats eachobservation symmetrically. Similarly, all other assumptions about the relationbetween parameters and observations are either expressed in the beliefs µ orthe variable V. Since V is only revealed by the knowledge preferences, thedistribution assumptions can be fully subjective as they are fully determinedby the subjectively determined h and µ. µV is the prior belief of the decisionmaker about the parameter. The likelihood that event E ∈ E is observed froman observation given an event EV in the parameter space is the conditional

probability l(E|EV) =µ(E)µV

E (EV)

µV(EV).

Hypothesis testing: A hypothesis testing problem is an estimation problem inwhich the decision maker is interested in distinguishing whether a hypothesisabout the parameters is true or not. Thus, we have a null hypothesis EV ∈EV and its alternative EV for which the decision maker attaches a positivevalue of knowledge, H(V−1(EV), V−1(EV)) > 0, and the value of any furtherpartitioning of EV is nonpositive. In classical hypothesis testing, the value ofh(V−1(EV)) for some EV ∈ EV only depends on the likelihoods of observationsgiven the parameters, l(E|EV) but not on prior probabilities.

Properties of maximum likelihood estimation: Contrary to Bayesian estimators,maximum likelihood estimators do not depend on the prior probabilities µV . Inour framework, this means that in most cases, a maximum likelihood estimatorcan only be a solution to an estimation problem if h has a negative elasticityof curiosity. The less likely an event in EV is, the greater must be the value of

with the chosen act.

28

knowledge attached to that event. Moreover, the elasticitiy must be -1 becausethe value of distinguishing two equally, highly likely events must be the sameas distinguishing two equally, but less likely events. This in turn means that thevalue of knowing the exact parameter is infinite if EV is atomless. Maximumlikelihood estimation therefore may therefore violate our continuity assumptionin case the decision maker attaches zero prior probability to some parametervalues. However, it may be approximated by an elasticity of curiosity thatapproaches -1.
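To make the elasticity argument concrete, the sketch below computes the value of splitting an event of probability p into two cells, $h(pq) + h(p(1-q)) - h(p)$, for knowledge functions of the constant-elasticity form $h(x) = C x^{1+c}$ and the entropic form $h(x) = C x \ln x$, i.e., the families derived in the proof of Proposition 4 with the linear term dropped. The function names and the constants are illustrative assumptions, not part of the model. With c = −1, h is constant, so every split is worth the same regardless of p, and refining the sure event into $2^n$ equally likely cells is worth $2^n - 1$ splits, which grows without bound.

```python
import math


def split_value(h, p, q=0.5):
    """Value of refining an event of probability p into cells of probability p*q and p*(1-q)."""
    return h(p * q) + h(p * (1 - q)) - h(p)


def knowledge_function(c, C=1.0):
    """Hypothetical h from the families in the proof of Proposition 4 (linear term dropped)."""
    if c == 0:
        return lambda x: C * x * math.log(x)
    return lambda x: C * x ** (1 + c)


def value_of_uniform_partition(h, n):
    """Value of refining the sure event into 2**n equally likely cells."""
    cells = 2 ** n
    return cells * h(1.0 / cells) - h(1.0)


# Elasticity -1: h is constant, so every split is worth C whatever its probability.
h_mle = knowledge_function(-1.0)
print([split_value(h_mle, p) for p in (0.8, 0.1, 0.001)])  # [1.0, 1.0, 1.0]
# ... and finer partitions are worth ever more (2**n - 1 splits).
print(value_of_uniform_partition(h_mle, 10))  # 1023.0

# Elasticity 0 with C = -1 (Shannon entropy): the split value is p * ln 2, proportional to p.
h_entropy = knowledge_function(0, C=-1.0)
print(split_value(h_entropy, 0.8), 0.8 * math.log(2))
```

The contrast between the two cases mirrors the text: the entropic value of a split shrinks with the probability of the event being split, whereas the c = −1 value does not.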

Data collection costs: A data collection problem is an estimation problem involving a choice from a subset of the set of estimators in which the cost of the estimator depends on the number of observations used. Note that an estimator f need not depend on all the data. An estimator uses at most the first n observations if for all outcomes (c, r), the preimages $f^{-1}(c, r)$ are cylinder sets of the form $S_V \times \prod_{i > n} S_i$ with $S_V \subseteq \prod_{i=1}^{n} S_i$. A data collection cost function $c : \mathbb{N} \to \mathbb{R}$ specifies the monetary cost associated with the number of observations used. A data collection problem is a set of estimators A′ such that for all a ∈ A′, the outcome (c, r) is only in the support of a if c = c(n(a)), where n(a) is the number of observations used by a. The decision maker therefore faces a tradeoff between using more observations to improve the information about V and the cost associated with using these observations.
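The tradeoff in a data collection problem can be illustrated numerically. The sketch below is purely hypothetical: it assumes that the knowledge value of an estimator using n observations equals a weight times the Shannon entropy (in nats) of the partition generated by the number of successes in n i.i.d. Bernoulli(1/2) observations, and that the data collection cost is linear, c(n) = κn. Neither functional form comes from the paper; they only show how diminishing gains in knowledge and a rising cost yield an interior choice of n.

```python
import math


def count_entropy(n, p=0.5):
    """Shannon entropy (nats) of the number of successes in n i.i.d. Bernoulli(p) draws."""
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return -sum(q * math.log(q) for q in probs if q > 0)


def net_value(n, weight, cost_per_obs):
    """Hypothetical objective: weight * knowledge gained minus linear data cost c(n) = cost_per_obs * n."""
    return weight * count_entropy(n) - cost_per_obs * n


def best_sample_size(weight, cost_per_obs, max_n=200):
    """Choose the number of observations n in 0..max_n that maximizes the net value."""
    return max(range(max_n + 1), key=lambda n: net_value(n, weight, cost_per_obs))


print(best_sample_size(weight=1.0, cost_per_obs=0.05))
print(best_sample_size(weight=1.0, cost_per_obs=10.0))  # 0: data is too costly
```

Because the entropy of the success count grows roughly logarithmically in n, any positive marginal cost eventually outweighs the marginal knowledge gained.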

The characterization of statistical estimators from assumptions on the value of information may be an exciting avenue for further research. From a normative perspective, this may be more interesting for a classical statistician than for a Bayesian statistician, as subjective knowledge utility appears to justify a greater variety of estimators than expected-utility-motivated Bayesian statistics. The revealed preference perspective may, however, be even more interesting: what do the number of test subjects in an experiment and their pay tell us about the researcher’s value of knowledge of the hypothesis being tested? Do the efforts and resources spent indicate that society places a higher value of knowledge on world temperatures in 2050 than on knowledge about the lives of certain celebrities? Subjective knowledge utility can address these questions within a unified decision theoretic framework.


8 Discussion

In this section, we discuss alternative ways of modeling the gain of knowledge in decisions under uncertainty and the implications of our work for the necessity of Savage’s axioms for rational decision making under uncertainty.

It is important to realize that the standard way of modeling decisions under uncertainty that we employed comes with some underlying maintained assumptions. The most crucial implicit assumption is that any outcome α can be realized on any event. The following property therefore follows directly from the way decisions under uncertainty are modeled:

P0 (Outcome Permutability). If σ : f(S) → f(S) is a permutation of the outcomes resulting from an act f ∈ A, then permuting the outcomes of f also yields an act σ ∘ f ∈ A.

Thus, starting from any act, we can permute all outcomes and obtain another act. This assumption is implicit in almost all of axiomatic decision theory in one form or another. However, it imposes severe restrictions on the way the knowledge gained from an act (and thus its value) can be modeled. For example, defining an outcome as “knowing event E obtains” is inconsistent with outcome permutability: under outcome permutability, this outcome could also result from the complementary event $\overline{E}$, implying that the decision maker knows (and therefore has the correct belief) that E obtains when in fact it does not obtain! Including beliefs in the description of outcomes faces similar issues. In this case, the value of “believing event E obtains” should be state-dependent, depending on whether E actually obtains or not.11 It therefore seems that if one accepts outcome permutability and wants to avoid state-dependent utility, then knowledge and beliefs about events cannot be (part of) outcomes.12 It is for this reason that we assumed that knowledge about which event obtains is derived from the chosen act and the observed outcomes.

11 Savage responded to the critique that the evaluation of outcomes should be state-dependent with the famous bon mot “I should not mind being hung so long as it be done without damage to my health and reputation”. If the reason for state dependence is the inclusion of beliefs in outcomes, the bon mot would become “I should not mind being hung as long as I believe that I am not being hung”.

12 For difficulties with including beliefs in outcomes to explain anticipatory feelings, see the extensive discussion in Eliaz and Spiegler (2006).

Another possible way of modeling knowledge would be to explicitly introduce an information partition in the description of an act. However, consider an act that yields outcome α on event E and β on event $\overline{E}$ but has a trivial information partition. In this act, the decision maker could refine the information partition to $\{E, \overline{E}\}$ by realizing that, according to her chosen act, outcome α occurs if and only if event E obtains. Thus, either the outcome must be unobservable after it is realized or the decision maker must “forget” which act she has chosen. Our way of modeling knowledge avoids such issues: indeed, the knowledge gained by the decision maker is exactly what can be concluded from observing an outcome and remembering which act has been chosen.
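The tension between outcome permutability and outcomes that describe knowledge can be seen in a finite example. In the sketch below, acts are Python dictionaries from states to outcomes, and the outcome labels "know E" and "know not-E" are hypothetical. Composing an act with a permutation of its outcomes, as P0 allows, leaves the generated information partition unchanged, but it attaches the label "know E" to a state in which E does not obtain.

```python
def information_partition(act):
    """Partition of the state space generated by the outcomes of a finite act (dict: state -> outcome)."""
    cells = {}
    for state, outcome in act.items():
        cells.setdefault(outcome, set()).add(state)
    return {frozenset(cell) for cell in cells.values()}


def permute_outcomes(act, sigma):
    """Outcome permutability (P0): the composition sigma ∘ f of an act with a permutation of its outcomes."""
    return {state: sigma[outcome] for state, outcome in act.items()}


# Hypothetical act whose outcomes are labelled as knowledge about E = {s1, s2}.
f = {"s1": "know E", "s2": "know E", "s3": "know not-E"}
sigma = {"know E": "know not-E", "know not-E": "know E"}
g = permute_outcomes(f, sigma)

print(information_partition(g) == information_partition(f))  # True: same knowledge gained
print(g["s3"])  # know E, although s3 is not in E
```

What the decision maker can actually conclude depends only on the partition and the remembered act, which is why the model derives knowledge from these rather than from outcome labels.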

If the outcomes in Savage’s axiomatization cannot contain descriptions of knowledge or beliefs without running into conceptual difficulties, then expected utility as axiomatized by Savage does not fulfill our normative position stated in section 4. This is because Savage’s axiomatization of expected utility cannot properly account for an intrinsic preference for knowledge. Moreover, in the case of an instrumental preference for knowledge (to improve future decisions), Savage’s axiomatization makes excessive demands on the decision maker to integrate the future decision problems into one grand decision problem. Our model allows decision makers to integrate a preference for knowledge into their preferences without specifying the exact way in which this knowledge will be used.

Our results extend previous critiques of the necessity of Savage’s axioms for rational decisions (Gilboa, Postlewaite, & Schmeidler, 2009, 2012). However, the motivation for Gilboa et al. (2012) to reject the sure-thing principle refers to the inability of the decision maker to fix a precise prior. A tough-minded expected utility maximizer may state that this inability is simply a mistake. Our critique instead refers to the tastes of the decision maker for knowledge. De gustibus non est disputandum: clearly, a theory of rational choice should allow a decision maker to give up a good night’s sleep in return for finding out who the murderer on the Orient Express is. How expected utility can be salvaged as a normatively compelling criterion when decision makers have a preference for knowledge is left to further research.

9 Concluding Remarks

While the instrumental value of information has been covered extensively in the literature, it is perhaps surprising that the standard model of rational choice does not account for the value of knowledge simpliciter. This is especially striking as an intrinsic preference for knowledge is present in many contexts of everyday life, be it media, news, and entertainment consumption, scientific exploration, education, or social relationships. With this paper, we hope to provide a starting point for a systematic analysis of rational decisions in the presence of a subjective value of knowledge.

A Monotone Sequences of Acts and Events

In this section of the appendix, we make our definitions regarding monotone continuity precise.

Definition 19 (Monotonely Continuous Function). A function h : ℰ → ℝ is monotonely continuous if for any convergent monotone sequence of events $E_1 \subset E_2 \subset \dots \to E$, we have $\lim_{k \to \infty} h(E_k) = h(E)$.

Definition 20 (Monotone Sequence). A sequence of acts $\{f^k\}_k$ is monotone in outcome α if k ≥ l implies $(f^k)^{-1}(\alpha) \subseteq (f^l)^{-1}(\alpha)$ or if k ≥ l implies $(f^k)^{-1}(\alpha) \supseteq (f^l)^{-1}(\alpha)$.

A monotone sequence of acts therefore makes a particular outcome either increasingly likely or increasingly unlikely by adding states to, or removing states from, the event on which that outcome obtains.

Definition 21 (Monotone Convergence). $\{f^k\}_k$ converges monotonically to f, denoted $f^k \to f$, if for all α ∈ supp(f), $\{f^k\}_k$ is monotone in α and $\bigcup_k (f^k)^{-1}(\alpha) = f^{-1}(\alpha)$ or $\bigcap_k (f^k)^{-1}(\alpha) = f^{-1}(\alpha)$.

A sequence of acts converges monotonically to an act f if it is a monotone sequence and the set-theoretic limit of the events on which each outcome is obtained is the event on which f obtains that outcome.

B Proof of Theorem 1

The proof proceeds as follows. First, we show that any set of conditional acts forms a linear continuum. This is highly nontrivial because the value of knowledge prevents us from using standard results. Since in a linear continuum the order topology is connected, this allows us to use additive representation theorems to obtain additive separability of utility across conditional acts. Next, for arbitrary information partitions, we derive an additive utility representation for conditional acts that in turn depends on the additive utility representations of its conditional acts. This allows us to use the uniqueness of additive representations to further refine the representation into the desired subjective knowledge utility.

Lemma 1. If E ∈ ℰ is null, then for all $f_E, g_E \in A_E$, $f_E \sim_E g_E$.


Proof. We prove this by outcome solvability. If $f_E \succ_E g_E$, then for some outcomes γ, β, $\gamma_E \succ_E \beta_E$, and thus E is not null.

Definition 22 (Supremum). f is a supremum of S′ under relation ≿, denoted $f \in \sup_{\succsim}(S')$, if for all g ∈ S′, f ≿ g, and for all h such that h ≿ g for all g ∈ S′, h ≿ f.

Definition 23 (Linear Continuum). A set S ordered by ≿ is a linear continuum if for all f, h ∈ S and all S′ ⊆ S:

$$ f \succ h \Rightarrow \exists g : f \succ g \succ h \tag{29} $$

$$ \sup_{\succsim} S' \subseteq S. \tag{30} $$

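Definition 24 (Interval of Acts). An interval I[f, h] of acts between f and h is defined as the set of all acts that are part of some monotone sequence from f to h, i.e.,

$$ I[f, h] = \left\{ g : \forall \alpha \in X : \left( f^{-1}(\alpha) \subseteq g^{-1}(\alpha) \subseteq h^{-1}(\alpha) \ \vee\ h^{-1}(\alpha) \subseteq g^{-1}(\alpha) \subseteq f^{-1}(\alpha) \right) \right\}. \tag{31} $$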
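Definition 24 can be checked mechanically when the state space is finite. The sketch below assumes acts are dictionaries from states to outcomes; a finite state space has atoms, so this only illustrates the membership condition in (31), not the atomless setting of the proofs. The function names are hypothetical.

```python
def preimage(act, outcome):
    """f^{-1}(outcome) for a finite act given as a dict state -> outcome."""
    return {s for s, x in act.items() if x == outcome}


def in_interval(g, f, h):
    """Membership test for I[f, h] in Definition 24 on a finite state space."""
    outcomes = set(f.values()) | set(g.values()) | set(h.values())
    for a in outcomes:
        F, G, H = preimage(f, a), preimage(g, a), preimage(h, a)
        # g's preimage must lie between those of f and h, in one direction or the other
        if not (F <= G <= H or H <= G <= F):
            return False
    return True


f = {1: "alpha", 2: "alpha", 3: "alpha", 4: "alpha"}
h = {1: "beta", 2: "beta", 3: "beta", 4: "beta"}
g = {1: "alpha", 2: "alpha", 3: "beta", 4: "beta"}  # moves states 3, 4 from alpha to beta
k = {1: "gamma", 2: "alpha", 3: "beta", 4: "beta"}  # uses an outcome that neither f nor h uses
print(in_interval(g, f, h), in_interval(k, f, h))  # True False
```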

Lemma 2. If f ≻ h and I[f, h] ≠ ∅, then there exists g ∈ I[f, h] such that f ≠ g ≠ h.

Proof. By Lemma 1 and f ≻ h, it cannot be the case that f and h are identical on all nonnull events, and therefore there must be an outcome α ∈ X such that $f^{-1}(\alpha) - h^{-1}(\alpha)$ is nonnull. Since ℰ contains no atoms, we can find a nonnull proper subset, denoted A, of $f^{-1}(\alpha) - h^{-1}(\alpha)$. Then $\alpha_A h$ is the desired act on the interval.

Lemma 3. For any two acts f, h, there exists a finite sequence of nonempty intervals I[f, g′], I[g′, g″], …, I[g‴, h].

Proof. This follows from the assumption that all acts are simple. Since the acts are simple, there are finitely many events on which the two acts differ. If two acts differ only on a single event, it is straightforward to show that a nonempty interval exists. We can therefore construct the finite sequence of intervals by changing a single event on each interval.

Lemma 4 (Villegas (1964), Theorem 5). In a qualitative probability σ-algebra, the following propositions are equivalent:


1. There are no atoms;

2. Every event can be partitioned into two equally probable events.

Lemma 5. Let I[f′, g′] be a nonempty interval. For any bounded subset I′ of the indifference curves I[f′, g′]/∼, there exists an interval $I_0$ between acts containing a bound and elements worse than or indifferent to the indifference classes of I′.

Proof. Assume the bound is an indifference class containing the act h. We consider the interval I[g′, h]. If I[g′, h] contains indifference classes worse than or equal to indifference classes in I′, then $I_0 \equiv I[g', h]$. Otherwise, I[g′, h] contains only elements that are strictly preferred to all elements of the indifference classes i ∈ I′. Then g′ is a bound for I′, and for any element i ∈ I′ with representative act h′, we can choose $I_0 \equiv I[h', g']$.

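Lemma 6. For any nonempty I[f, h], there exists an element g ∈ I[f, h] such that the following holds:

1. For α ∈ X with $f^{-1}(\alpha) \subseteq g^{-1}(\alpha) \subseteq h^{-1}(\alpha)$, $g^{-1}(\alpha) - f^{-1}(\alpha)$ is as likely as $h^{-1}(\alpha) - g^{-1}(\alpha)$.

2. For α ∈ X with $h^{-1}(\alpha) \subseteq g^{-1}(\alpha) \subseteq f^{-1}(\alpha)$, $g^{-1}(\alpha) - h^{-1}(\alpha)$ is as likely as $f^{-1}(\alpha) - g^{-1}(\alpha)$.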

We call this element g the half distance element for I[ f , h].

Proof. This directly follows from Villegas (1964), Theorem 5.

Lemma 7. If f ≻ h and I[f, h] ≠ ∅, then there exists g ∈ I[f, h] such that f ≻ g ≻ h.

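Proof. Let $\overline{g}_0 \equiv f$ and $\underline{g}_0 \equiv h$. Define $I_{1/2}[f, g]$ as the set of half-distance elements between f and g. Define $g_i$ as an arbitrary element of $I_{1/2}[\overline{g}_{i-1}, \underline{g}_{i-1}]$. If $f \succ g_i \succ h$, then we have found the desired element g. If $g_i \succsim f$, then define $\overline{g}_i \equiv g_i$ and $\underline{g}_i \equiv \underline{g}_{i-1}$. If $h \succsim g_i$, then define $\overline{g}_i \equiv \overline{g}_{i-1}$ and $\underline{g}_i \equiv g_i$. We obtain the monotone sequences $\overline{g}_i$ and $\underline{g}_i$. If the sequences do not terminate at some element g, then they converge to acts $\overline{g}^*$ and $\underline{g}^*$, respectively, which may only differ on null events. By Lemma 1, we have $\overline{g}^* \sim \underline{g}^*$. It follows from monotone continuity that $\overline{g}^* \succsim f$ and $h \succsim \underline{g}^*$, which together with f ≻ h gives $\overline{g}^* \succ \underline{g}^*$, a contradiction. It follows that the sequence terminates at some element g such that f ≻ g ≻ h.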

Lemma 8 (Least Upper Bound Property). For every nonempty subset I of an interval I[f, h] that is bounded by g with respect to ≿, there exists a supremum g*.


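Proof. By Lemma 5, there exists an interval $I[\overline{g}_0, \underline{g}_0]$ such that for all x ∈ I, $\overline{g}_0 \succsim x$, and for some y ∈ I, $y \succsim \underline{g}_0$. Define $I_{1/2}[\overline{g}_0, \underline{g}_0]$ as the set of half-distance elements in this interval. Define $g_i$ as an arbitrary element of $I_{1/2}[\overline{g}_{i-1}, \underline{g}_{i-1}]$. If for all x ∈ I, $g_i \succ x$, then define $\overline{g}_i \equiv g_i$ and $\underline{g}_i \equiv \underline{g}_{i-1}$. If there exists y ∈ I such that $y \succsim g_i$, then define $\overline{g}_i \equiv \overline{g}_{i-1}$ and $\underline{g}_i \equiv g_i$. We obtain the monotone sequences $\overline{g}_i$ and $\underline{g}_i$. The sequences converge to acts $\overline{g}^*$ and $\underline{g}^*$, respectively, which may only differ on null events and are thus indifferent by Lemma 1. Clearly, $g^* \equiv \overline{g}^*$ is a supremum of I, since if $g^* \succ x \in I[f, h]$, then there exist some y ∈ I and $g_i$ such that $y \succsim g_i \succ x$.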

Lemma 9 (Intervals are Linear Continuums). If I[f′, h′] is nonempty, then its set of indifference curves I[f′, h′]/∼ is a linear continuum.

Proof. This follows since the relation ≿ has the least upper bound property (Lemma 8) and, by Lemma 7, for all elements f, h ∈ I[f′, h′] with f ≻ h, there exists an element g such that f ≻ g ≻ h.

Lemma 10 (Connectedness). The order topology on the set of conditional acts on E is connected.

Proof. Suppose we can partition the set of conditional acts into two open sets A, B. Take an arbitrary act from each set. There exists a finite sequence of nonempty intervals between the sets, and each interval is a linear continuum. The union of these intervals forms a linear continuum U as well. But then, by continuity, A ∩ U and B ∩ U partition the linear continuum, a contradiction.

Using connectedness, we can now obtain an additive representation $U(f_E g) = u_E(f_E) + u_{\overline{E}}(g_{\overline{E}})$. From here, we use the remaining axioms to obtain a representation of the form $U(f) = \sum_{\alpha \in X} \left[ \mu(f^{-1}(\alpha))U(\alpha) + h(f^{-1}(\alpha)) \right]$.
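For a finite state space, the target representation can be evaluated directly. The sketch below computes $U(f) = \sum_{\alpha} [\mu(f^{-1}(\alpha))u(\alpha) + h(f^{-1}(\alpha))]$ with an entropic knowledge term $h(E) = -\lambda\,\mu(E)\ln\mu(E)$; the entropic choice, λ = 0.1, the four equally likely states, and the linear utility u(x) = x are assumptions made for illustration. With these numbers, an act that pays slightly less on half of the states but reveals which half obtains is ranked above a constant act.

```python
import math


def knowledge_utility(act, mu, u, h):
    """U(f) = sum over outcomes a of [mu(f^{-1}(a)) * u(a) + h(f^{-1}(a))] on a finite state space.

    act: dict state -> outcome; mu: dict state -> probability;
    u: utility of outcomes; h: knowledge value of an event (frozenset of states)."""
    cells = {}
    for s, a in act.items():
        cells.setdefault(a, set()).add(s)
    total = 0.0
    for a, cell in cells.items():
        event = frozenset(cell)
        total += sum(mu[s] for s in event) * u(a) + h(event)
    return total


def entropic_h(mu, lam):
    """Hypothetical entropic knowledge term h(E) = -lam * mu(E) * ln mu(E)."""
    def h(event):
        p = sum(mu[s] for s in event)
        return -lam * p * math.log(p) if p > 0 else 0.0
    return h


mu = {"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25}
h = entropic_h(mu, lam=0.1)
u = lambda x: x  # linear utility, assumed for illustration

f = {s: 1.0 for s in mu}  # constant act: reveals nothing
g = {"s1": 1.0, "s2": 1.0, "s3": 0.9, "s4": 0.9}  # lower expected value, reveals which half obtains
print(knowledge_utility(f, mu, u, h))  # 1.0
print(knowledge_utility(g, mu, u, h))  # 0.95 + 0.1 * ln 2, about 1.0193
```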

We call any partition $P_E$ of E that has at least three elements and is a subset of ℰ an admissible partition of E.

Lemma 11 (Additive Representation). For every event E and admissible partition $P_E$, there exists a representation $U_{P_E} : \prod_{E' \in P_E} A_{E'} \to \mathbb{R}$ of $\succsim_E$ of the form

$$ U_{P_E}(f_E) = \sum_{E' \in P_E} u_{P_E}(f_{E'}, E'). \tag{32} $$

Proof. Choose any admissible partition $P_E$. Note that for every element E′ of $P_E$, the set of conditional acts $A_{E'}$ is connected under the $\succsim_{E'}$-order topology. We now assume that α′, β′, etc. can only occur on E′ ∈ $P_E$, α″, β″, etc. can only occur on E″ ≠ E′, and so on. Since outcomes α′ ∼ α″ can be substituted for one another by our monotonicity assumption, and since we have countably many indifferent outcomes for each outcome, this is without loss of generality. By the information-neutral sure-thing principle, we have jointly independent preferences over the product space $\prod_{E' \in P_E} A_{E'}$. Since preferences over these are continuous in the order topology, it follows by Wakker (1988b) that an additive representation of the form

$$ U_{P_E}(f_E) = \sum_{E' \in P_E} u_{P_E}(f_{E'}, E') \tag{33} $$

exists.

Lemma 12 (Outcome Additive Representation). For every event E, there exists a representation $U_E : A_E \to \mathbb{R}$ of $\succsim_E$ of the form:

$$ U_E(f_E) = \sum_{\alpha \in X} u_E(\alpha, f_E^{-1}(\alpha)) \tag{34} $$

Proof. Let $P'_E$ be a refinement of $P_E$. Then $U_{P_E}$ and $U_{P'_E}$ must be affine transformations of one another by the uniqueness of additive representations over $\prod_{E' \in P'_E} A_{E'}$. We may assume without loss of generality (by simply applying this affine transformation to one of the representations) that they are identity transformations of one another on the domain $\prod_{E' \in P'_E} A_{E'}$. Thus, for all f such that the decision maker is informed about $P'_E$, we have $U_{P_E}(f_E) = U_{P'_E}(f_E)$. It follows that we may also choose $u_{P_E}(f_{E''} g_{E'''}, E'' \cup E''') = u_{P'_E}(f_{E''}, E'') + u_{P'_E}(g_{E'''}, E''')$ for arbitrary disjoint E″, E‴. Choosing $U_{P_E}(\gamma_E)$ and $U_{P_E}(\beta_E)$ therefore uniquely fixes the scale and utility representation of all refinements of $P_E$. Moreover, since all other partitions $P''_E$ share a common refinement with $P_E$, this indeed fixes the scale of all admissible partitions of E.

We now argue that for arbitrary partitions $P_E$ and $P'_E$, $U_{P_E}(f_E) \geq U_{P'_E}(g_E)$ if and only if $f_E \succsim_E g_E$. Since for any F ⊂ E and all $f_F$ ($g_F$) there exists an outcome δ (ε) such that $f_F \sim_F \delta_F$ ($g_F \sim_F \varepsilon_F$), and $f_F \succsim_F g_F$ iff $\delta_F \succsim_F \varepsilon_F$, we have consistency between $U_{P_E}$ and $U_{P'_E}$ if there are at least two common partition elements E′, E″, via the representation $U_{\{E', E'', E - (E' \cup E'')\}}(f_E)$. By continuity, for sufficiently small E′, E″, we can refine $P_E$ without changing the set of indifference curves much. Call this refinement $\tilde{P}_E$. We can therefore ensure that any two partitions are consistent with one another, since we can choose E′ and E″ such that the partitions $P_E$, $\tilde{P}_E$, $P'_E$, $\tilde{P}'_E$ are all consistent with one another on an arbitrarily large utility interval.


We can therefore define the following utility representation on all acts with at least three distinct outcomes:

$$ U_E(f_E) = \sum_{\alpha \in X} v_E(\alpha, f_E^{-1}(\alpha)) \tag{35} $$

$$ v_E(\alpha, E') = u_{P_E}(\alpha_{E'}, E') \text{ for some } P_E \ni E' \tag{36} $$

To extend this representation to all acts, we consider disjoint monotone sequences $E_k \to \emptyset$ and $F_k \to \emptyset$. By continuity, the utility of the outcome α must be the limit of $U_S(\gamma_{E_k} \delta_{F_k} \alpha)$. The utility of $\alpha_F \beta$ must be the limit of $U_S(\gamma_{E_k} \alpha_F \beta)$ if F is disjoint from $E_k$.

Lemma 13 (Monotone Additive Representation). For an arbitrary partition $P_E$, there exists a function $v_{P_E}$ such that:

$$ U_E(f_E) = \sum_{E' \in P_E} v_{P_E}(U_{E'}(f_{E'}), E') \tag{37} $$

Proof. Note that under the normalizations employed in the previous lemma, $U_E(f_E) = U_{P_E}(f_E)$ if $P_E$ is coarser than the events the individual is informed about, $f_E^{-1}(X)$. The desired result then follows from the existence of an additive representation $U_{P_E}$ and the fact that each $u_{P_E}$ must be an increasing function of $U_{E'}$.

Lemma 14 (Affinity). For all E′ ⊂ E ∈ ℰ, $v_{P_E}(x, E') = A_{E'|E}\,x + B_{E'|E}$.

Proof. Note that since we have normalized $u_{P_E}$ and $u_{P'_E}$ such that they are identical on identical subacts, we know that $v_{P_E}$ does not depend on the choice of partition but only on E. By the uniqueness of additive representations, we then have that $v_{P_E}$ must be affine in its first argument.

Lemma 15 (Multiplicativity). $A_{E''|E'} A_{E'|E} = A_{E''|E}$.

Proof. Consider the utility of the following act.

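$$ U_E(\alpha_{E''}\beta_{E'-E''}\gamma_{E-E'}) = A_{E''|E}U_{E''}(\alpha_{E''}) + A_{E'-E''|E}U_{E'-E''}(\beta_{E'-E''}) + A_{E-E'|E}U_{E-E'}(\gamma_{E-E'}) + B_{E''|E} + B_{E'-E''|E} + B_{E-E'|E} \tag{38} $$

$$ = A_{E'|E}\left(A_{E''|E'}U_{E''}(\alpha_{E''}) + A_{E'-E''|E'}U_{E'-E''}(\beta_{E'-E''}) + B_{E''|E'} + B_{E'-E''|E'}\right) + A_{E-E'|E}U_{E-E'}(\gamma_{E-E'}) + B_{E'|E} + B_{E-E'|E} \tag{39} $$

$$ U_E(\alpha_{E''}\beta_{E'-E''}\gamma_{E-E'}) = A_{E''|E}U_{E''}(\alpha_{E''}) + A_{E'-E''|E}U_{E'-E''}(\beta_{E'-E''}) + A_{E-E'|E}U_{E-E'}(\gamma_{E-E'}) + B_{E''|E} + B_{E'-E''|E} + B_{E-E'|E} \tag{40} $$

$$ = A_{E'|E}U_{E'}(\alpha_{E''}\beta_{E'-E''}) + B_{E'|E} + A_{E-E'|E}U_{E-E'}(\gamma_{E-E'}) + B_{E-E'|E} \tag{41} $$

$$ = A_{E'|E}\left(A_{E''|E'}U_{E''}(\alpha_{E''}) + B_{E''|E'} + A_{E'-E''|E'}U_{E'-E''}(\beta_{E'-E''}) + B_{E'-E''|E'}\right) + B_{E'|E} + A_{E-E'|E}U_{E-E'}(\gamma_{E-E'}) + B_{E-E'|E} \tag{42} $$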

Note that $U_{E''}(X)$ is a continuum. For a small change of $U_{E''}(\alpha_{E''})$, the above equation is only maintained if $A_{E''|E'}A_{E'|E} = A_{E''|E}$.

It follows that

$$ U(f) \equiv U_S(f) = \sum_{\alpha \in X} A_{f^{-1}(\alpha)|S}\, U_{f^{-1}(\alpha)}(\alpha_{f^{-1}(\alpha)}) + B_{f^{-1}(\alpha)|S} \tag{43} $$

is a utility representation. We now want to show that, without loss of generality, $U_E(\alpha_E) = U(\alpha)$ and that $A_{E|S}$ is a probability. Note that for all events E, we can rescale each representation $U_E$ such that $U_E(\gamma_E) = 1 > 0 = U_E(\beta_E)$ for two arbitrary outcomes γ ≻ β. We now have, for some suitably chosen acts f, g disjoint from β, γ:

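$$ A_{E|S} + A_{F|S} = u_{S-E-F}(f_{S-E-F}) - u_{S-E-F}(g_{S-E-F}) \tag{44} $$

$$ (A_{E|S} + A_{F|S})(U_E(\gamma_E) - U_E(\beta_E)) = u_{S-E-F}(f_{S-E-F}) - u_{S-E-F}(g_{S-E-F}) \tag{45} $$

$$ \Leftrightarrow U(\gamma_E \gamma'_F f) = U(\beta_E \beta'_F g) \tag{46} $$

$$ \Leftrightarrow U(\gamma_{E \cup F} f) = U(\beta_{E \cup F} g) \tag{47} $$

$$ \Leftrightarrow A_{E \cup F|S}\left(U_{E \cup F}(\gamma_{E \cup F}) - U_{E \cup F}(\beta_{E \cup F})\right) = u_{S-E-F}(f_{S-E-F}) - u_{S-E-F}(g_{S-E-F}) \tag{48} $$

$$ \Leftrightarrow A_{E \cup F|S} = u_{S-E-F}(f_{S-E-F}) - u_{S-E-F}(g_{S-E-F}) \tag{49} $$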

Thus, $A_{E \cup F|S} = A_{E|S} + A_{F|S}$. It is straightforward to show that $A_{\emptyset|S} = 0$ and, without loss of generality, $A_{S|S} = 1$. It follows that $A_{E|S} = \mu(E|S)$ is the unique probability representation of ≿*.

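We now show that under our normalization, $U_E(\alpha_E) = U_F(\alpha_F)$. For some acts f, g, we have:

$$ (A_{E|S} + A_{F|S})(U_E(\gamma_E) - U_E(\alpha_E)) = u_{S-E-F}(f_{S-E-F}) - u_{S-E-F}(g_{S-E-F}) \tag{50} $$

$$ \Leftrightarrow U(\gamma_E \gamma'_F f) = U(\alpha_E \alpha'_F g) \tag{51} $$

$$ \Leftrightarrow U(\gamma_{E \cup F} f) = U(\alpha_{E \cup F} g) \tag{52} $$

$$ \Leftrightarrow A_{E \cup F|S}\left(U_{E \cup F}(\gamma_{E \cup F}) - U_{E \cup F}(\alpha_{E \cup F})\right) = u_{S-E-F}(f_{S-E-F}) - u_{S-E-F}(g_{S-E-F}) \tag{53} $$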

Thus,

$$ A_{E|S}\left(U_{E \cup F}(\alpha_{E \cup F}) - U_E(\alpha_E)\right) = -A_{F|S}\left(U_{E \cup F}(\alpha_{E \cup F}) - U_F(\alpha_F)\right) \tag{54} $$

Since the LHS and RHS must be of the same sign and every event E ∪ F can be partitioned into two equally likely events E and F, it follows that for all events we must have $U_E(\alpha_E) = U_F(\alpha_F) = U_{E \cup F}(\alpha_{E \cup F}) = U_S(\alpha) = U(\alpha)$. We thus obtain the representation:

$$ U_E(f_E) = \sum_{\alpha} \mu(f_E^{-1}(\alpha))U(\alpha) + B_{f^{-1}(\alpha)|S} \tag{55} $$

Defining $h(E) = B_{E|S}$ yields the desired representation. Lastly, we show that h has special uniqueness properties.

Lemma 16. The function h in the subjective knowledge utility representation is unique up to the addition of a finite measure, i.e., if h′ = h + m, where m is a finite measure on the event algebra, then h′ represents the same knowledge preferences as h.


Proof.

U( f ) ≥ U(g) (56)

⇔ ∑α∈X

u(α)µ( f−1(α)) + ∑α∈X

h( f−1(α)) (57)

≥ ∑α∈X

u(α)µ(g−1(α)) + ∑α∈X

h(g−1(α)) (58)

⇔ ∑α∈X

u(α)µ( f−1(α)) + ∑α∈X

h( f−1(α)) + m(S) (59)

≥ ∑α∈X

u(α)µ(g−1(α)) + ∑α∈X

h(g−1(α)) + m(S) (60)

⇔ ∑α∈X

u(α)µ( f−1(α)) + ∑α∈X

h( f−1(α)) + m(∪α∈X f−1(α)) (61)

≥ ∑α∈X

u(α)µ(g−1(α)) + ∑α∈X

h(g−1(α)) + m(∪α∈Xg−1(α)) (62)

⇔ ∑α∈X

u(α)µ( f−1(α)) + ∑α∈X

h( f−1(α)) + ∑α∈X

m( f−1(α)) (63)

≥ ∑α∈X

u(α)µ(g−1(α)) + ∑α∈X

h(g−1(α)) + ∑α∈X

m(g−1(α)) (64)

⇔ ∑α∈X

u(α)µ( f−1(α)) + ∑α∈X

(h( f−1(α)) + m( f−1(α))) (65)

≥ ∑α∈X

u(α)µ(g−1(α)) + ∑α∈X

(h(g−1(α)) + m(g−1(α))) (66)

⇔ ∑α∈X

u(α)µ( f−1(α)) + ∑α∈X

h′( f−1(α)) (67)

≥ ∑α∈X

u(α)µ(g−1(α)) + ∑α∈X

h′(g−1(α)) (68)
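The computation behind Lemma 16 can be replicated numerically: adding a finite measure m to h adds $\sum_{\alpha} m(f^{-1}(\alpha)) = m(S)$ to the utility of every act, so the ranking of acts is unchanged. The sketch below assumes a three-state space with hypothetical beliefs, a linear utility u(a) = a, an entropic h, and a hypothetical measure m with m(S) = 4.2; it enumerates all 27 acts with outcomes in {0, 1, 2} and collects the utility shifts.

```python
import math
from itertools import product

STATES = ("s1", "s2", "s3")
MU = {"s1": 0.5, "s2": 0.3, "s3": 0.2}  # hypothetical beliefs
M = {"s1": 1.0, "s2": 2.5, "s3": 0.7}   # hypothetical finite measure m, with m(S) = 4.2


def h(event):
    """Entropic knowledge term (assumed): h(E) = -0.2 * mu(E) * ln mu(E)."""
    p = sum(MU[s] for s in event)
    return -0.2 * p * math.log(p)


def h_prime(event):
    """h' = h + m."""
    return h(event) + sum(M[s] for s in event)


def utility(act, knowledge):
    """U(f) = sum_a [mu(f^{-1}(a)) * u(a) + knowledge(f^{-1}(a))] with u(a) = a (assumed)."""
    cells = {}
    for s in STATES:
        cells.setdefault(act[s], set()).add(s)
    return sum(sum(MU[s] for s in c) * a + knowledge(frozenset(c)) for a, c in cells.items())


acts = [dict(zip(STATES, outcomes)) for outcomes in product((0.0, 1.0, 2.0), repeat=3)]
shifts = {round(utility(f, h_prime) - utility(f, h), 9) for f in acts}
print(shifts)  # {4.2}: every act gains m(S), so the ranking of acts is unchanged
```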

C Proof of Proposition 3

Proof. By the uniqueness of probabilities and of utility representations over lotteries, we must have $\mu_1 = \mu_2$, and $u_1$ is an affine transformation of $u_2$. Moreover, D is invariant under affine transformations of the utilities. Without loss of generality, we therefore assume that the transformation is the identity and thus $u_1 = u_2$.


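We show that 3 ⇒ 2. If D is subadditive, then

$$ D(E) + D(F) \geq D(E \cup F) \tag{69} $$

$$ \Leftrightarrow \frac{h_1(E) + h_1(F) - h_1(E \cup F)}{U_1(\gamma) - U_1(\beta)} \geq \frac{h_2(E) + h_2(F) - h_2(E \cup F)}{U_2(\gamma) - U_2(\beta)} \tag{70} $$

$$ \Leftrightarrow U_1(\beta_E \beta'_F \alpha) - U_1(\gamma_{E \cup F}\alpha) \geq U_2(\beta_E \beta'_F \alpha) - U_2(\gamma_{E \cup F}\alpha) \tag{71} $$

and thus if $\beta_E \beta'_F \alpha \succsim_2 \gamma_{E \cup F}\alpha$, then $\beta_E \beta'_F \alpha \succsim_1 \gamma_{E \cup F}\alpha$. 2 ⇒ 1 follows straightforwardly from transitivity and identical preferences over outcomes. 1 ⇒ 3 follows from:

$$ U_1(\beta_E \beta'_F \alpha) > U_1(LE_2(E, F, \beta)_{E \cup F}\alpha) \tag{72} $$

$$ \Leftrightarrow U_1(\beta_E \beta'_F \alpha) - U_2(\beta_E \beta'_F \alpha) > U_1(LE_2(E, F, \beta)_{E \cup F}\alpha) - U_2(LE_2(E, F, \beta)_{E \cup F}\alpha) \tag{73} $$

$$ \Leftrightarrow h_1(E) - h_2(E) + h_1(F) - h_2(F) + h_1(\overline{E \cup F}) - h_2(\overline{E \cup F}) > h_1(E \cup F) - h_2(E \cup F) + h_1(\overline{E \cup F}) - h_2(\overline{E \cup F}) \tag{74} $$

$$ \Leftrightarrow D(E) + D(F) > D(E \cup F) \tag{75} $$

where the equivalence between the first and second lines follows from the definition of the knowledge equivalent, and the remaining steps follow from the assumption that $u_1 = u_2$.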

D Proof of Proposition 4

Proof. It is straightforward to show that an entropic knowledge utility representation has an elasticity of curiosity of zero and that a constant elasticity of curiosity representation has a constant elasticity of curiosity. We prove the reverse implication. If el(p, q) = c, then

$$ el(p, q) = \frac{p^2 \left( \frac{h'(pq)q + h'(p(1-q))(1-q) - h'(p)}{p} - \frac{h(pq) + h(p(1-q)) - h(p)}{p^2} \right)}{h(pq) + h(p(1-q)) - h(p)} = c \tag{76} $$


and after reordering terms

$$ p\left(h'(pq)q + h'(p(1-q))(1-q) - h'(p)\right) = (1 + c)\left(h(pq) + h(p(1-q)) - h(p)\right) \tag{77} $$

Defining $k(p) = p \cdot h'(p) - (1 + c)h(p)$, we obtain:

$$ k(pq) + k(p(1-q)) = k(p) \tag{78} $$

Substituting pq = x and p(1 − q) = y, we obtain Cauchy’s functional equation:

$$ k(x) + k(y) = k(x + y) \tag{79} $$

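with the solution $k(x) = A \cdot x$. Thus, $A \cdot x = x \cdot h'(x) - (1 + c)h(x)$, which is a linear differential equation. The integrating factor of this differential equation is $x^{-(1+c)}$. The solution is then (for c ≠ 0, the constant A below absorbs the factor −1/c):

$$ h(x) = \begin{cases} A \cdot x + C x^{1+c}, & c \neq 0, \\ A \cdot x \ln x + C x, & c = 0. \end{cases} \tag{80} $$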

Since we may remove terms that are constant or linear in probability without changing the functional form of the representation, we obtain the desired representations.
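The derivation can be sanity-checked numerically. The sketch below evaluates el(p, q) from (76), written as $(p\,v'(p) - v(p))/v(p)$ for the split value $v(p) = h(pq) + h(p(1-q)) - h(p)$ and using the closed-form derivative of h, for members of the solution family (80) with arbitrary illustrative constants A and C. It confirms that the elasticity equals c for several p and q; it only checks this direction (the family has constant elasticity), not the uniqueness argument via Cauchy’s equation.

```python
import math


def solution_family(c, A=0.3, C=-1.0):
    """Return (h, h') for a member of family (80); A and C are arbitrary illustrative constants."""
    if c == 0:
        return (lambda x: A * x * math.log(x) + C * x,
                lambda x: A * (math.log(x) + 1.0) + C)
    return (lambda x: A * x + C * x ** (1 + c),
            lambda x: A + C * (1 + c) * x ** c)


def elasticity(h, dh, p, q):
    """el(p, q) as in (76): (p * v'(p) - v(p)) / v(p) for v(p) = h(pq) + h(p(1-q)) - h(p)."""
    v = h(p * q) + h(p * (1 - q)) - h(p)
    dv = dh(p * q) * q + dh(p * (1 - q)) * (1 - q) - dh(p)
    return (p * dv - v) / v


for c in (-1.0, -0.5, 0, 0.7):
    h, dh = solution_family(c)
    print(c, [round(elasticity(h, dh, p, q), 9) for p in (0.9, 0.3) for q in (0.5, 0.2)])
```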

E Proof of Theorem 2

E.1 Sufficiency Proof

Proof. Since we do not assume outcome solvability, we cannot use Theorem 1 to prove this result. However, the principle of indifference of information allows us to proceed in a more standard manner than in the proof of Theorem 1. We divide the proof into the following steps. First, we show that ≿ under our axioms induces a qualitative probability ≿*. Since our event space has no atoms, we obtain a quantitative probability. Next, we show that the probability distribution over outcomes determines the preference. Using our separability and monotonicity conditions, we obtain a utility representation that is additively separable in outcomes. Lastly, we use learning independence to separate the utility function into an expected utility and the value of information.


Definition 25 (Qualitative Probability). ≿* is a qualitative probability if it is a complete and transitive relation on ℰ and, for any events E, F, G ∈ ℰ such that (E ∪ F) ∩ G = ∅,

$$ S \succ^* \emptyset \tag{81} $$

$$ E \cup G \succsim^* E \tag{82} $$

$$ E \succsim^* F \Leftrightarrow E \cup G \succsim^* F \cup G. \tag{83} $$
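For a finite state space, the conditions of Definition 25 can be verified by brute force. The sketch below (function names hypothetical) checks completeness, transitivity, and (81) to (83) for a relation given as a Python predicate. A relation induced by a probability measure (exact fractions, to avoid rounding) passes; a hypothetical "latest state" relation, which compares events by their largest state, fails (83).

```python
from fractions import Fraction
from itertools import chain, combinations


def all_events(states):
    """All subsets of a finite state space, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(states, r) for r in range(len(states) + 1))]


def is_qualitative_probability(states, geq):
    """Brute-force check of Definition 25 for a relation geq(E, F), read as E ≿* F."""
    S, empty = frozenset(states), frozenset()
    evs = all_events(states)
    if not (geq(S, empty) and not geq(empty, S)):  # (81): S ≻* ∅
        return False
    for E in evs:
        for F in evs:
            if not (geq(E, F) or geq(F, E)):  # completeness
                return False
            for G in evs:
                if geq(E, F) and geq(F, G) and not geq(E, G):  # transitivity
                    return False
                if (E | F) & G:  # (82) and (83) only require (E ∪ F) ∩ G = ∅
                    continue
                if not geq(E | G, E):  # (82)
                    return False
                if geq(E, F) != geq(E | G, F | G):  # (83)
                    return False
    return True


states = (0, 1, 2, 3)
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
by_probability = lambda E, F: sum(mu[s] for s in E) >= sum(mu[s] for s in F)
by_latest_state = lambda E, F: max(E, default=-1) >= max(F, default=-1)  # hypothetical counterexample

print(is_qualitative_probability(states, by_probability))   # True
print(is_qualitative_probability(states, by_latest_state))  # False: violates (83)
```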

Conjecture 1 (Existence of a Qualitative Probability). ≿* is a qualitative probability.

We first prove several lemmas about the properties of %∗.

Lemma 17 (Completeness). For all E, F ∈ ℰ, E ≿* F, F ≿* E, or E ∼* F.

Proof. Completeness is guaranteed since for any two events E, F ∈ ℰ, we can write E = (E − F) ∪ (E ∩ F) and F = (F − E) ∪ (E ∩ F). Completeness of ≿ together with (7) then guarantees that E ≿* F, F ≿* E, or both.

Definition 26 (Event Solvability). ≿ fulfills Event Solvability if whenever E ≿* F and E and F are disjoint, there exists an event E′ ⊆ E such that E′ ∼* F.

In other words, if $\gamma_E \beta_F \succsim_{E \cup F} \beta_E \gamma_F$ for γ ≻ β, then there exists a subevent E′ of the event E such that $\gamma_{E'} \beta_F \alpha \sim \beta_{E'} \gamma_F \alpha$.

Lemma 18. If there are no atoms and continuity holds, then Event Solvability holds.

Proof. The proof is straightforward. We iteratively split the larger event and addor remove subevents to obtain a converging sequence of events. This sequencecan be split into sequences of events that are too large and those that are toosmall. Continuity then guarantees that the associated acts characterizing thelikelihood relation converge.

With an atomless ℰ, the proof is trivial if either E ∼* F or F ∼* ∅. Hence, assume E ≻* F ≻* ∅.

Because ℰ has no atoms, for every event A ≻* ∅ there exists A′ ⊂ A such that ∅ ≺* A′ ≺* A. Furthermore, we can find a partition P = {E_1, E_2, …, E_n} of E such that for all j, F − E_j ≻* E_j, and for all j, ∅ ≺* E_j ≺* F because F ≻* ∅. The former works because of Theorem 4 of Villegas (1964). Suppose the latter does not work; then there is an atom E′ in ℰ such that E′ ≿* F ≻* ∅, a contradiction. By construction, we can find i < n such that $L_1 \equiv \bigcup_{1 \le j \le i} E_j \precsim^* F$ and $U_1 \equiv L_1 \cup E_{i+1} \succsim^* F$.

Similarly, we can again find a nonnull partition $P^{(2)} = \{E^{(2)}_1, E^{(2)}_2, \dots, E^{(2)}_{n_2}\}$ of $E_{i+1}$ such that for every j: (1) $\emptyset \prec^* E^{(2)}_j \prec^* E_{i+1}$; (2) $L_1 \cup E^{(2)}_j \prec^* F$; (3) $E_{i+1} - E^{(2)}_j \succ^* E^{(2)}_j$. It follows that there again exists $i^{(2)}$ such that $L_2 \equiv L_1 \cup \bigcup_{1 \le j \le i^{(2)}} E^{(2)}_j \precsim^* F$ and $U_2 \equiv L_2 \cup E^{(2)}_{i^{(2)}+1} \succsim^* F$.

We repeat this step by finding such a partition $P^{(k+1)}$ of $E^{(k)}_{i^{(k)}+1}$ and obtain two sequences of events $\{L_k\}$ and $\{U_k\}$. By construction, $E' \equiv \bigcup_{k=1}^{\infty} L_k = \bigcap_{k=1}^{\infty} U_k$ and $L_k \precsim^* F \precsim^* U_k$. The former implies E′ ∈ ℰ (since ℰ is a σ-algebra), whereas the latter implies E′ ∼* F by continuity.
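The construction in the proof of Lemma 18 squeezes the target event between lower events $L_k$ and upper events $U_k$. A simplified analog can be computed for the uniform probability on [0, 1]: take subevents $E \cap [0, t]$ and bisect on t until their measure matches that of F. This swaps the paper’s partition-based construction for plain bisection and assumes E is a finite union of disjoint, sorted intervals in [0, 1]; all names are hypothetical.

```python
def measure(intervals):
    """Uniform (Lebesgue) probability of a finite union of disjoint intervals [(a, b), ...] in [0, 1]."""
    return sum(b - a for a, b in intervals)


def truncate(intervals, t):
    """The subevent E ∩ [0, t]."""
    return [(a, min(b, t)) for a, b in intervals if a < t]


def solve_subevent(E, target, tol=1e-12):
    """Find a subevent E' = E ∩ [0, t] with measure(E') ≈ target by bisection on t.

    Assumes 0 <= target <= measure(E)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if measure(truncate(E, mid)) < target:
            lo = mid  # E ∩ [0, mid] is too small, like the lower events L_k
        else:
            hi = mid  # E ∩ [0, mid] is large enough, like the upper events U_k
    return truncate(E, hi)


E = [(0.0, 0.2), (0.5, 0.9)]  # hypothetical event with probability 0.6
F = [(0.2, 0.5)]              # disjoint event with probability 0.3, so E ≻* F
E_prime = solve_subevent(E, measure(F))
print(E_prime, measure(E_prime))  # approximately [(0.0, 0.2), (0.5, 0.6)] with measure 0.3
```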

Lemma 19 (Nontriviality of Likelihood). S ≻* ∅.

Proof. Nontriviality of ≿ ensures that $\gamma_S \beta_\emptyset \alpha = \gamma \succ \beta = \beta_S \gamma_\emptyset \alpha$ and therefore S ≻* ∅.

Lemma 20 (Subset Consistency). E ⊆ F ⇒ F ≿* E.

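Proof. If E ⊆ F, then E ∪ G = F, where E ∩ G = ∅. We need to show that if γ ≻ β, then

$$ \gamma_{F-E}\beta_{E-F}\alpha \succsim \beta_{F-E}\gamma_{E-F}\alpha \tag{84} $$

$$ \Leftrightarrow \gamma_G\alpha \succsim \beta_G\alpha, \tag{85} $$

which is guaranteed by information-neutral monotonicity and γ ≻ β.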

Lemma 21 (Expansion Consistency). $E \succsim^* F \iff E \cup G \succsim^* F \cup G$ whenever $(E \cup F) \cap G = \emptyset$.

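The result follows since the definition of $E \succsim^* F$ is the same as the definition of $E \cup G \succsim^* F \cup G$:
\begin{align}
&E \succsim^* F \tag{86}\\
\iff{}& \gamma_{E-F}\,\beta_{F-E}\,\alpha \succsim \beta_{E-F}\,\gamma_{F-E}\,\alpha \tag{87}\\
\iff{}& \gamma_{(E\cup G)-(F\cup G)}\,\beta_{(F\cup G)-(E\cup G)}\,\alpha \succsim \beta_{(E\cup G)-(F\cup G)}\,\gamma_{(F\cup G)-(E\cup G)}\,\alpha \tag{88}\\
\iff{}& E \cup G \succsim^* F \cup G. \tag{89}
\end{align}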

Lemma 22 (Disjoint Measurability). If $G \sim^* H$ and $E$, $F$, $G$, and $H$ are mutually disjoint, then $E \succsim^* F \iff E \cup G \succsim^* F \cup H$.


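Proof.
\begin{align}
&\gamma_E\beta_F\alpha_G\alpha_H\alpha \succsim \beta_E\gamma_F\alpha_G\alpha_H\alpha \tag{90}\\
\iff{}& \gamma_E\beta_F\alpha_G\alpha'_H\alpha \succsim \beta_E\gamma_F\alpha_G\alpha'_H\alpha \tag{91}\\
\iff{}& \gamma_E\beta_F\gamma'_G\beta'_H\alpha \succsim \beta_E\gamma_F\gamma'_G\beta'_H\alpha \tag{92}\\
\iff{}& \gamma_E\beta_F\gamma'_G\beta'_H\alpha \succsim \beta_E\gamma_F\beta'_G\gamma'_H\alpha \tag{93}\\
\iff{}& \gamma_E\beta_F\gamma_G\beta'_H\alpha \succsim \beta_E\gamma_F\beta_G\gamma'_H\alpha \tag{94}\\
\iff{}& \gamma_E\beta_F\gamma_G\beta_H\alpha \succsim \beta_E\gamma_F\beta_G\gamma_H\alpha \tag{95}
\end{align}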

The first, second, fourth, and fifth equivalences follow from the information-neutral sure-thing principle; the third follows from the principle of indifference of information.

Lemma 23 (Transitivity). $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.

We prove this in several steps.

Lemma 24 (Disjoint Transitivity). If $E$, $F$, $G$ are mutually disjoint events, then $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.

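Proof. Let $\gamma \succ \beta \succ \alpha$ be given. Since $E \succsim^* F$, we have $\gamma_E\beta_F\alpha_G\delta \succsim \beta_E\gamma_F\alpha_G\delta$. Since $F \succsim^* G$, we have $\beta_E\gamma_F\alpha_G\delta \succsim \beta_E\alpha_F\gamma_G\delta$. By $E \succsim^* F$ again, $\beta_E\alpha_F\gamma_G\delta \succsim \alpha_E\beta_F\gamma_G\delta$. Hence, by transitivity of $\succsim$, we conclude that $\gamma_E\beta_F\alpha_G\delta \succsim \alpha_E\beta_F\gamma_G\delta$, that is, $E \succsim^* G$.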

Lemma 25 (Subset Transitivity).

1. Suppose $F \subseteq E$ and $F \succsim^* G$; then $E \succsim^* G$.

2. Suppose $G \subseteq F$ and $E \succsim^* F$; then $E \succsim^* G$.

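Proof.

1. Solving for $G$ within $F$, we obtain $E \supseteq F' \sim^* G$. If $G \succ^* E$, then $\beta_E\gamma_G\delta \succ \gamma_E\beta_G\delta$, which holds if and only if (by learning independence) $\beta'_{E-F'}\beta_{F'}\gamma_G\delta \succ \gamma'_{E-F'}\gamma_{F'}\beta_G\delta$. Since $F' \sim^* G$, we have that $\beta'_{E-F'}\beta_{F'}\gamma_G\delta \sim \beta'_{E-F'}\gamma_{F'}\beta_G\delta$, violating monotonicity.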

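2. By Event Solvability and $E \succsim^* F$, we can solve for $F$ within $E$, obtaining $E \supseteq E' \sim^* F$. Suppose, for contradiction, that $G \succ^* E$; that is, $\gamma_G\beta_E\alpha \succ \gamma_E\beta_G\alpha$. This is identical to the following expression: $\gamma_{G-F}\gamma_F\beta_{E'}\beta_{E-E'}\alpha \succ \gamma_{E'}\gamma_{E-E'}\beta_{G-F}\beta_F\alpha$. Since $E' \sim^* F$ and they are disjoint, we can switch $\gamma$ and $\beta$ on $E' \cup F$. This gives us $\emptyset = (G - F) \succ^* E - E' \succsim^* \emptyset$, a contradiction.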

Lemma 26 (Limited Transitivity 1). If $(E \cup F) \cap G = \emptyset$, then $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.

Proof. If $E \cap F \succsim^* G$, then we have the result by Subset Transitivity 1. Otherwise, we can solve $E \cap F$ into $G$, calling the resulting event $G'$. By Disjoint Measurability, $F - E \succsim^* G - G'$. Now we solve $G - G'$ into $F - E$, calling the resulting event $F' \sim^* G - G'$, and via Subset Transitivity 1 we have that $F - E \succsim^* G - G'$. By Disjoint Transitivity, $E - F \succsim^* G - G'$. By Disjoint Measurability, $E \succsim^* G$.

Lemma 27 (Limited Transitivity 2). If $E \cap G = \emptyset$, then $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.

Proof. Since $E \succsim^* F$, we have $E - F \succsim^* F - E \supseteq F \cap G$. By Subset Transitivity, $E - F \succsim^* F \cap G$. We can therefore solve for $E' \sim^* F \cap G$ such that $E' \subseteq E - F$. By Disjoint Measurability, $E - F \succsim^* F - E$ if and only if $(E - F) - E' \succsim^* F - G$. By Limited Transitivity 1, $(E - F) - E' \succsim^* G - F$ follows from $F - G \succsim^* G - F$ and $(E - F) - E' \succsim^* F - G$. Moreover, by Disjoint Measurability and Subset Transitivity, $E \succsim^* G$ if $(E - F) - E' \succsim^* G - F$.

Lemma 28 (Limited Transitivity 3). If $(F - E) - G = \emptyset$ and $E \cap F \cap G = \emptyset$, then $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.

Proof. Solve $G - F$ into $F - G$, naming the resulting set $F'$. Also, $E \succsim^* F$ is equivalent to $E - F \succsim^* F - E$. By Disjoint Measurability, we can add $F'$ to the left-hand side and $G - F$ to the right-hand side, yielding $(E - F) \cup F' \succsim^* G$. By Subset Transitivity, $E \succsim^* G$.

Lemma 29 (Limited Transitivity 4). If $E \cap F \cap G = \emptyset$, then $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.

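Proof. If $G \cap E \succsim^* (F - G) - E$, then we can solve for a $G'$ in $G \cap E$ such that $G' \sim^* (F - G) - E$. Then $E \succsim^* F$ if and only if $E - G' \succsim^* (F \cap E) \cup (F \cap G)$. Moreover, $F \succsim^* G$ if and only if $(F \cap E) \cup (F \cap G) \succsim^* G - G'$. Applying Limited Transitivity 3, we have that $E - G' \succsim^* G - G'$, and by Expansion Consistency, $E \succsim^* G$. If $(F - G) - E \succsim^* E \cap G$, then we can solve for an $F'$ in $(F - G) - E$. Then $E \succsim^* F$ if and only if $E - G \succsim^* F - F'$, and $F \succsim^* G$ if and only if $F - F' \succsim^* G - E$. By Limited Transitivity 2, it follows that $E - G \succsim^* G - E$, and hence, by Expansion Consistency, $E \succsim^* G$.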


Lemma 30 (Transitivity). If $\succsim^*$ is transitive on events with $E \cap F \cap G = \emptyset$, then it is transitive.

Proof. By Expansion Consistency, for any $X, Y \in \{E, F, G\}$, we have that $X \succsim^* Y$ if and only if $X - (E \cap F \cap G) \succsim^* Y - (E \cap F \cap G)$. Thus, proving Limited Transitivity 4 is equivalent to proving Transitivity.

Lemma 31. $\succsim^*$ has a quantitative probability representation.

Proof. This follows from Villegas (1964), since $\succsim^*$ is a qualitative probability without atoms.

Lemma 32 (Event Swapping). Suppose $\{E_1, \ldots, E_k\}$ and $\{F_1, \ldots, F_k\}$ are partitions of $S$ such that $E_i \sim^* F_i$ for all $i$. Then for all lists of outcomes $\alpha_1, \ldots, \alpha_k$,
\begin{equation}
(E_1 : \alpha_1, \ldots, E_k : \alpha_k) \sim (F_1 : \alpha_1, \ldots, F_k : \alpha_k). \tag{96}
\end{equation}

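Proof. If $G \subseteq E_1$, $H \subseteq E_2$, and $G \sim^* H$, then by the principle of indifference of information we have that
\begin{equation}
(E_1 : \alpha_1, E_2 : \alpha_2, \overline{E_1 \cup E_2} : \gamma) \sim \bigl((E_1 - G) \cup H : \alpha_1, (E_2 - H) \cup G : \alpha_2, \overline{E_1 \cup E_2} : \gamma\bigr). \tag{97}
\end{equation}
By the information-neutral sure-thing principle,
\begin{equation}
(E_1 : \alpha_1, E_2 : \alpha_2, \ldots, E_k : \alpha_k) \sim \bigl((E_1 - G) \cup H : \alpha_1, (E_2 - H) \cup G : \alpha_2, \ldots, E_k : \alpha_k\bigr). \tag{98}
\end{equation}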

From this result, the lemma is almost immediate, since we can change the parts of events on which outcomes occur as long as the probabilities are maintained. We iteratively apply this result to construct the indifference in (96). For this, choose $H_i = F_1 \cap E_i$ and $G_i$ as an equiprobable event contained in $E_1 - \bigcup_{l=2}^{i-1} H_l$, which exists by Event Solvability. By (98) we have that:

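\begin{align}
&(E_1 : \alpha_1, \ldots, E_k : \alpha_k) \notag\\
\sim{}& \bigl((E_1 - G_2) \cup H_2 : \alpha_1, (E_2 - H_2) \cup G_2 : \alpha_2, E_3 : \alpha_3, \ldots, E_k : \alpha_k\bigr) \notag\\
\sim{}& \cdots \notag\\
\sim{}& \Bigl(\Bigl(E_1 - \bigcup_{i=2}^{k} G_i\Bigr) \cup \bigcup_{i=2}^{k} H_i : \alpha_1, (E_2 - H_2) \cup G_2 : \alpha_2, \ldots, (E_k - H_k) \cup G_k : \alpha_k\Bigr) \notag\\
={}& \bigl(F_1 : \alpha_1, (E_2 - H_2) \cup G_2 : \alpha_2, \ldots, (E_k - H_k) \cup G_k : \alpha_k\bigr). \tag{99}
\end{align}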

Note that our choice of $E_1$ and $F_1$ was arbitrary; we can therefore repeat the above argument for all events. Since $F_i \cap F_j = \emptyset$ for all $i \neq j$, events that have already been changed are not changed again when (99) is applied to another event. Applying (99) to all events yields the desired indifference (96).

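Lemma 33 (Epimorphism to Mixture Space). Let $\Delta X$ be the space of finite-support probability measures over outcomes. Let $\mu$ be the quantitative probability representation of $\succsim^*$. There exist a function $\phi : \mathcal{A} \to \Delta X$ and a unique relation $\succsim^+$ such that for all $\alpha \in X$ and all $a, b \in \mathcal{A}$:
\begin{align}
a \succsim b &\iff \phi(a) \succsim^+ \phi(b), \tag{100}\\
\mu(a^{-1}(\alpha)) &= (\phi(a))(\alpha). \tag{101}
\end{align}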

Proof. By Event Swapping, any two acts that induce the same probability measure on outcomes are indifferent. If $\phi$ maps every act to this probability measure, then (101) holds by construction and (100) trivially holds.
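The map $\phi$ sends an act to the distribution it induces on outcomes. A minimal sketch, assuming a hypothetical finite state space with explicit probabilities in place of the nonatomic measure $\mu$ of the text: two acts that place the same outcomes on different but equally likely events receive the same image under `induced_lottery` (an illustrative name), which is exactly the situation covered by Event Swapping.

```python
from collections import defaultdict
from fractions import Fraction


def induced_lottery(act, mu):
    """phi(act): outcome -> mu(act^{-1}(outcome)).

    act: dict state -> outcome; mu: dict state -> probability
    (a hypothetical finite stand-in for the nonatomic measure in the text).
    """
    lottery = defaultdict(Fraction)
    for state, outcome in act.items():
        lottery[outcome] += mu[state]
    return dict(lottery)


# Six equally likely states (an illustrative assumption).
mu = {s: Fraction(1, 6) for s in range(6)}

# Two acts placing the outcomes on different, but equally likely, events.
a = {0: "x", 1: "x", 2: "y", 3: "z", 4: "z", 5: "z"}
b = {0: "z", 1: "y", 2: "z", 3: "x", 4: "z", 5: "x"}

print(induced_lottery(a, mu) == induced_lottery(b, mu))  # True
```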

Lemma 34 (Mixture Space Completeness). $\succsim^+$ is complete and transitive.

Proof. Completeness and transitivity follow directly from completeness and transitivity of $\succsim$.

We employ the monotone subsequence theorem, adapted to simple probability measures.

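Lemma 35 (Monotone Subsequence Theorem). If $\{\mu_k\}_k$ is a sequence of simple probability measures on $X$ with $\mu_k \to \mu$, then for any outcome $\alpha \in \operatorname{supp}(\mu)$, the sequence $\{\mu_k\}_k$ has a subsequence, again denoted $\{\mu_k\}$, with $\mu_k \to \mu$ such that either $\mu_k(\alpha) \ge \mu_l(\alpha)$ for all $l \ge k$, or $\mu_k(\alpha) \le \mu_l(\alpha)$ for all $l \ge k$.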

Proof. Let $\alpha$ be given. We first prove that every sequence $\{\mu_k\}_k$ has a monotonic subsequence. We call the $m$-th term $\mu_m$ of the sequence a peak if $\mu_m(\alpha) \ge \mu_n(\alpha)$ for all $n \ge m$.

— Case 1: Suppose the sequence $\{\mu_k\}_k$ has infinitely many peaks. We use these peaks to form a subsequence, denoted $\{\mu_k\}$. Since they are peaks, we have $\mu_1(\alpha) \ge \mu_2(\alpha) \ge \mu_3(\alpha) \ge \cdots$, and thus we have a monotonically decreasing sequence.

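— Case 2: Suppose the sequence $\{\mu_k\}_k$ has only finitely many peaks, and denote them $\mu_{k_1}, \mu_{k_2}, \ldots, \mu_{k_n}$. Let $\mu_1 = \mu_{k_n+1}$. Since $\mu_1$ is not a peak, we can find a later term $\mu_2$ such that $\mu_1(\alpha) \le \mu_2(\alpha)$. Since $\mu_2$ is not a peak either, continuing this process yields a monotonically increasing sequence.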

Since every subsequence of a convergent sequence converges to the same limit,we have the desired result.
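The peak construction can be traced on a finite prefix of a sequence. On a finite list the last term is always a peak, so this is only an illustration of the two cases, not a substitute for the argument on infinite sequences. The helper names `peaks` and `climb_from` and the sample values of $\mu_k(\alpha)$ are hypothetical.

```python
def peaks(values):
    """Indices m with values[m] >= values[n] for every later n (within this finite prefix)."""
    result, running_max = [], float("-inf")
    for m in range(len(values) - 1, -1, -1):
        if values[m] >= running_max:  # at least as large as everything after it
            result.append(m)
            running_max = values[m]
    return result[::-1]


def climb_from(values, start):
    """Case 2 step: from a non-peak, jump to a later term that is at least as large."""
    peak_set = set(peaks(values))
    chain, i = [start], start
    while i not in peak_set:
        # i is not a peak, so some later term is strictly larger and next() succeeds
        i = next(n for n in range(i + 1, len(values)) if values[n] >= values[i])
        chain.append(i)
    return chain


# Hypothetical values mu_k(alpha) for k = 0, ..., 7.
values = [0.5, 0.2, 0.4, 0.3, 0.35, 0.32, 0.34, 0.33]
print(peaks(values))          # [0, 2, 4, 6, 7]
print(climb_from(values, 1))  # [1, 2]
```

The peaks form a nonincreasing subsequence (Case 1), while each chain built by `climb_from` is nondecreasing (Case 2).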

Lemma 36 (Mixture Space Continuity). $\succsim^+$ is continuous (its weak upper and lower contour sets are closed).

Proof. Consider sequences $\mu_k \to \mu$ and $\nu_k \to \nu$ of simple probability measures on $X$. By Lemma 35, for any outcome $\alpha \in \operatorname{supp}(\mu)$, the sequence $\{\mu_k\}_k$ has a convergent subsequence $\{\mu_k\} \to \mu$ such that $\mu_k(\alpha) \ge \mu_l(\alpha)$ for all $l \ge k$ or $\mu_k(\alpha) \le \mu_l(\alpha)$ for all $l \ge k$. Applying this successively to each of the finitely many outcomes in $\operatorname{supp}(\mu)$, we can convert $\{\mu_k\}$ into a sequence that monotonically changes the probabilities of all outcomes in the support of $\mu$. Next, we convert the sequence of measures into a sequence of acts. Let $f$ be an arbitrary element of $\phi^{-1}(\mu)$. We then choose a sequence of acts $f^k \in \phi^{-1}(\mu_k)$ that converges monotonically to $f$. Proceeding the same way for $\nu_k \to \nu$, we obtain a monotonically convergent sequence $g^k \to g$. If $\mu_k \succsim^+ \nu_k$ for all $k$, then also $f^k \succsim g^k$ for all $k$. It follows by continuity that $f \succsim g$, and therefore $\mu \succsim^+ \nu$.

Lemma 37 (Additive Representation). $\succsim^+$ can be represented by
\begin{equation}
U(\mu) = \sum_{\alpha \in \operatorname{supp}(\mu)} v(\mu(\alpha), U(\alpha)). \tag{102}
\end{equation}

Proof. By the information-neutral sure-thing principle, for a state space partition $\{E, F, G\}$, preferences are separable in the conditional acts $f_E$, $f_F$, and $f_G$ whenever these have disjoint outcome domains. Since these conditional acts map straightforwardly into the restrictions $\mu|_{\operatorname{supp}(f_E)}$, $\mu|_{\operatorname{supp}(f_F)}$, and $\mu|_{\operatorname{supp}(f_G)}$ of $\mu$ to the corresponding subsets of outcomes, it follows by a standard argument that there exists an additive representation of the form
\begin{equation}
U(\mu) = u(\mu|_{\operatorname{supp}(f_E)}) + v(\mu|_{\operatorname{supp}(f_F)}) + w(\mu|_{\operatorname{supp}(f_G)}). \tag{103}
\end{equation}

Choosing any other partition of outcomes on $f_E$ and $f_F$ yields another representation of the above form, which must be a monotone transformation of $U$, since only the probabilities of outcomes matter, not the events on which they arise. By the uniqueness of additive representations, it is straightforward to show that the two representations must be affine transformations of one another. It follows that $u$ (and, in a similar manner, $v$ and $w$) is additively separable in the outcomes. We obtain a representation:

\begin{equation}
U(\mu) = \sum_{\alpha \in \operatorname{supp}(\mu)} V(\mu(\alpha), \alpha). \tag{104}
\end{equation}

By information-neutral monotonicity, $V$ must be increasing in $U(\alpha)$. The desired representation follows.

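We now finish the proof of the theorem. By learning independence, it follows after cancelling terms that
\begin{align}
&v(\mu(\alpha), U(\alpha)) + v(\mu(\alpha'), U(\alpha)) - v(\mu(\alpha) + \mu(\alpha'), U(\alpha)) \tag{105}\\
={}& v(\mu(\alpha), U(\gamma)) + v(\mu(\alpha'), U(\gamma)) - v(\mu(\alpha) + \mu(\alpha'), U(\gamma)). \tag{106}
\end{align}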

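This is a Pexider functional equation in the difference $v(\cdot, U(\alpha)) - v(\cdot, U(\gamma))$, with the solution
\begin{equation}
v(\mu(\alpha), U(\alpha)) - v(\mu(\alpha), U(\gamma)) = A(U(\alpha), U(\gamma))\mu(\alpha) + B(U(\alpha), U(\gamma)). \tag{107}
\end{equation}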

Since $\gamma$ was arbitrarily chosen, it follows that $v(\mu(\alpha), U(\alpha)) = A(U(\alpha))\mu(\alpha) + B(U(\alpha)) + C(\mu(\alpha))$. Noting that continuity requires that $v(0, U(\alpha)) = 0$, we must have that $B(U(\alpha)) = 0$. The desired representation follows.
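As an informal sanity check of the step from (105) and (106) to the form $v(p, U) = A(U)p + C(p)$, the sketch below evaluates the left-hand side of (105) at two utility levels. The particular choices $A(U) = e^U$, $C(p) = p\ln p$, and the non-separable alternative $v(p, U) = Up^2$ are hypothetical; they only show that the cross-difference is free of $U$ when $U$ enters linearly in $p$. A check at a single point is not a proof.

```python
import math


def cross_difference(v, p, q, u):
    """Left-hand side of (105) for probabilities p, q and a fixed utility level u."""
    return v(p, u) + v(q, u) - v(p + q, u)


def v_separable(p, u):
    """Hypothetical v of the form A(u) * p + C(p) with A(u) = e^u, C(p) = p ln p."""
    return math.exp(u) * p + (p * math.log(p) if p > 0 else 0.0)


def v_interacting(p, u):
    """Hypothetical v in which p and u genuinely interact."""
    return u * p ** 2


p, q = 0.2, 0.35
u_alpha, u_gamma = 1.0, -0.5

# The A(u) * p terms cancel, so (105) = (106) for the separable form.
print(math.isclose(cross_difference(v_separable, p, q, u_alpha),
                   cross_difference(v_separable, p, q, u_gamma)))    # True
# For u * p**2 the cross-difference is -2 * p * q * u, which depends on u.
print(math.isclose(cross_difference(v_interacting, p, q, u_alpha),
                   cross_difference(v_interacting, p, q, u_gamma)))  # False
```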

E.2 Necessity Proof

Lemma 38 (Necessity). In a probabilistic knowledge utility representation, Axioms 2, 3, 4, 6, 7, and 8 and weak order hold.


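Proof. It is clear that weak order holds. Axiom 2 holds because
\begin{align}
&U(f_E k) \ge U(g_E k) \tag{108}\\
\iff{}& \sum_{\alpha \in f(E)} \bigl[u(\alpha)\mu(f^{-1}(\alpha)) - h(\mu(f^{-1}(\alpha)))\bigr] + \sum_{\alpha \in k(\overline{E})} \bigl[u(\alpha)\mu(k^{-1}(\alpha)) - h(\mu(k^{-1}(\alpha)))\bigr] \tag{109}\\
&\quad\ge \sum_{\alpha \in g(E)} \bigl[u(\alpha)\mu(g^{-1}(\alpha)) - h(\mu(g^{-1}(\alpha)))\bigr] + \sum_{\alpha \in k(\overline{E})} \bigl[u(\alpha)\mu(k^{-1}(\alpha)) - h(\mu(k^{-1}(\alpha)))\bigr] \tag{110}\\
\iff{}& \sum_{\alpha \in f(E)} \bigl[u(\alpha)\mu(f^{-1}(\alpha)) - h(\mu(f^{-1}(\alpha)))\bigr] + \sum_{\alpha \in l(\overline{E})} \bigl[u(\alpha)\mu(l^{-1}(\alpha)) - h(\mu(l^{-1}(\alpha)))\bigr] \tag{111}\\
&\quad\ge \sum_{\alpha \in g(E)} \bigl[u(\alpha)\mu(g^{-1}(\alpha)) - h(\mu(g^{-1}(\alpha)))\bigr] + \sum_{\alpha \in l(\overline{E})} \bigl[u(\alpha)\mu(l^{-1}(\alpha)) - h(\mu(l^{-1}(\alpha)))\bigr] \tag{112}\\
\iff{}& U(f_E l) \ge U(g_E l), \tag{113}
\end{align}
where the middle equivalence holds because the terms contributed by $k$ (respectively $l$) on $\overline{E}$ appear on both sides and cancel.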

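Axiom 3 holds because
\begin{align}
&U(\gamma_E k) \ge U(\beta_E k) \tag{114}\\
\iff{}& u(\gamma)\mu(E) - h(\mu(E)) + \sum_{\alpha \in k(\overline{E})} u(\alpha)\mu(k^{-1}(\alpha)) - \sum_{\alpha \in k(\overline{E})} h(\mu(k^{-1}(\alpha))) \tag{115}\\
&\quad\ge u(\beta)\mu(E) - h(\mu(E)) + \sum_{\alpha \in k(\overline{E})} u(\alpha)\mu(k^{-1}(\alpha)) - \sum_{\alpha \in k(\overline{E})} h(\mu(k^{-1}(\alpha))) \tag{116}\\
\iff{}& u(\gamma) \ge u(\beta) \tag{117}\\
\iff{}& \gamma \succsim \beta. \tag{118}
\end{align}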

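Axiom 4 holds because
\begin{align}
&U(\gamma_E\beta_F\alpha) \ge U(\beta_E\gamma_F\alpha) \tag{119}\\
\iff{}& \mu(E)u(\gamma) + \mu(F)u(\beta) \ge \mu(E)u(\beta) + \mu(F)u(\gamma) \tag{120}\\
\iff{}& \mu(E) \ge \mu(F) \tag{121}\\
\iff{}& \mu(E)u(\gamma') + \mu(F)u(\beta') \ge \mu(E)u(\beta') + \mu(F)u(\gamma') \tag{122}\\
\iff{}& U(\gamma'_E\beta'_F\alpha) \ge U(\beta'_E\gamma'_F\alpha). \tag{123}
\end{align}
Here the knowledge terms cancel because both acts generate the same partition, and the equivalences around (121) use $\gamma \succ \beta$ and $\gamma' \succ \beta'$, so that $u(\gamma) > u(\beta)$ and $u(\gamma') > u(\beta')$.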
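The cancellations used for Axioms 2 and 4 can be spot-checked numerically. The sketch below assumes a hypothetical four-state space with explicit probabilities, made-up outcome utilities, and an arbitrary knowledge term $h(p) = p^2$. The helper `knowledge_utility` is an illustrative implementation of the form $\sum_\alpha [u(\alpha)\mu(a^{-1}(\alpha)) - h(\mu(a^{-1}(\alpha)))]$ used in this proof, not code from the paper.

```python
import math
from collections import defaultdict


def knowledge_utility(act, mu, u, h):
    """Sum over outcomes x of u(x) * mu(act^{-1}(x)) - h(mu(act^{-1}(x)))."""
    prob = defaultdict(float)
    for state, outcome in act.items():
        prob[outcome] += mu[state]
    return sum(u[x] * p - h(p) for x, p in prob.items())


# Hypothetical state space, utilities, and knowledge term.
mu = {"s1": 0.1, "s2": 0.25, "s3": 0.3, "s4": 0.35}
u = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 5.0, "e": -1.0, "x": 3.0, "y": 0.5}
h = lambda p: p * p  # arbitrary; the checks below do not depend on this choice


def U(act):
    return knowledge_utility(act, mu, u, h)


# Axiom 2: E = {s1, s2}; outcomes on E and on its complement are disjoint, so the
# comparison of f_E with g_E does not depend on the common part (k versus l).
f_k = {"s1": "c", "s2": "d", "s3": "x", "s4": "y"}
g_k = {"s1": "b", "s2": "b", "s3": "x", "s4": "y"}
f_l = {"s1": "c", "s2": "d", "s3": "a", "s4": "a"}
g_l = {"s1": "b", "s2": "b", "s3": "a", "s4": "a"}
print(math.isclose(U(f_k) - U(g_k), U(f_l) - U(g_l)))  # True


# Axiom 4: E = {s2}, F = {s3}, outcome "a" elsewhere. Betting on E rather than F
# is preferred iff mu(E) >= mu(F), whichever pair gamma > beta is used.
def bet(on_e, on_f):
    return {"s1": "a", "s2": on_e, "s3": on_f, "s4": "a"}


pair_1 = U(bet("c", "b")) >= U(bet("b", "c"))  # gamma = c, beta = b
pair_2 = U(bet("d", "e")) >= U(bet("e", "d"))  # gamma' = d, beta' = e
print(pair_1, pair_2, mu["s2"] >= mu["s3"])     # False False False
```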

For Axiom 6, $f^k \to f$ means that for each $\alpha$ there is a monotone sequence of events $E_k \to E = f^{-1}(\alpha)$. From this we obtain a sequence of probabilities $\rho_k(\alpha) \to \mu(E)$. Now define the utility contribution of $\alpha$ as $U^k_\alpha = u(\alpha)\rho_k(\alpha) - h(\rho_k(\alpha)) \to u(\alpha)\mu(E) - h(\mu(E))$. Since these sequences converge for every $\alpha$, the sum converges as well. Axiom 7 holds because

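\begin{align}
&U(\alpha_E\alpha_F f) \ge U(\delta_E\delta_F g) \tag{124}\\
\iff{}& \mu(E \cup F)u(\alpha) + \sum_{\beta \in f(\overline{E}\cap\overline{F})} u(\beta)\mu(f^{-1}(\beta)) - \sum_{\beta \in f(\overline{E}\cap\overline{F})} h(\mu(f^{-1}(\beta))) \tag{125}\\
&\quad\ge \mu(E \cup F)u(\delta) + \sum_{\beta \in g(\overline{E}\cap\overline{F})} u(\beta)\mu(g^{-1}(\beta)) - \sum_{\beta \in g(\overline{E}\cap\overline{F})} h(\mu(g^{-1}(\beta))) \tag{126}\\
\iff{}& \mu(E)u(\alpha) + \mu(F)u(\alpha) + \sum_{\beta \in f(\overline{E}\cap\overline{F})} u(\beta)\mu(f^{-1}(\beta)) - \sum_{\beta \in f(\overline{E}\cap\overline{F})} h(\mu(f^{-1}(\beta))) \tag{127}\\
&\quad\ge \mu(E)u(\delta) + \mu(F)u(\delta) + \sum_{\beta \in g(\overline{E}\cap\overline{F})} u(\beta)\mu(g^{-1}(\beta)) - \sum_{\beta \in g(\overline{E}\cap\overline{F})} h(\mu(g^{-1}(\beta))) \tag{128}\\
\iff{}& \mu(E)u(\alpha) + \mu(F)u(\alpha') + \sum_{\beta \in f(\overline{E}\cap\overline{F})} u(\beta)\mu(f^{-1}(\beta)) - \sum_{\beta \in f(\overline{E}\cap\overline{F})} h(\mu(f^{-1}(\beta))) \tag{129}\\
&\quad\ge \mu(E)u(\delta) + \mu(F)u(\delta') + \sum_{\beta \in g(\overline{E}\cap\overline{F})} u(\beta)\mu(g^{-1}(\beta)) - \sum_{\beta \in g(\overline{E}\cap\overline{F})} h(\mu(g^{-1}(\beta))) \tag{130}\\
\iff{}& U(\alpha_E\alpha'_F f) \ge U(\delta_E\delta'_F g). \tag{131}
\end{align}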

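Axiom 8 holds because
\begin{align}
&U(\alpha_E\beta_F\alpha_G\beta_H\gamma) = U(\beta_E\alpha_F\alpha_G\beta_H\gamma) \tag{132}\\
\iff{}& \mu(E \cup G)u(\alpha) + \mu(F \cup H)u(\beta) - h(\mu(E \cup G)) - h(\mu(F \cup H)) \tag{133}\\
&\quad= \mu(E \cup H)u(\beta) + \mu(F \cup G)u(\alpha) - h(\mu(E \cup H)) - h(\mu(F \cup G)) \tag{134}\\
\iff{}& \mu(E)u(\alpha) + \mu(G)u(\alpha) + \mu(F)u(\beta) + \mu(H)u(\beta) \tag{135}\\
&\quad= \mu(E)u(\beta) + \mu(H)u(\beta) + \mu(F)u(\alpha) + \mu(G)u(\alpha) \tag{136}\\
\iff{}& \mu(E) = \mu(F). \tag{137}
\end{align}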


F Proof of Theorem 3

Proof. The implication 3 ⇒ 2 is trivial. We prove 2 ⇒ 1. Using outcome solvability, we can show by a simple induction that if stationarity I (II) holds for $k = 2$, then stationarity I (stationarity I with equiprobable events) holds for arbitrary $k \in \mathbb{N}$. Naturally, stationarity I for $k > 2$ implies stationarity I for $k = 2$. What is left to show is that stationarity I for equiprobable events and arbitrary $k$ implies stationarity II for $k = 2$.

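Suppose $\alpha_E\beta_F\varepsilon \succsim \gamma_E\delta_F\varepsilon$ and $E \mid E \cup F \sim^{\dagger} G \mid G \cup H$. Without loss of generality, assume $E \succsim^* F$. If $\mu(E \mid E \cup F)$ is rational, we can find an equiprobable partition of $E \cup F$ into $k$ events such that $E = \bigcup_{i=1}^{k\mu(E \mid E \cup F)} E^*_i$ and $F = \bigcup_{i=1}^{k\mu(F \mid E \cup F)} F^*_i$. Then, by Learning Independence,
\[
\alpha'_{E^*_1}\alpha''_{E^*_2}\cdots\beta'_{F_{k\mu(E \mid E \cup F)+1}}\cdots\varepsilon \succsim \gamma'_{E^*_1}\gamma''_{E^*_2}\cdots\delta'_{F_{k\mu(E \mid E \cup F)+1}}\cdots\varepsilon.
\]
By stationarity I for equiprobable events, this is equivalent to
\[
\alpha'_{G^*_1}\alpha''_{G^*_2}\cdots\beta'_{H_{k\mu(E \mid E \cup F)+1}}\cdots\varepsilon \succsim \gamma'_{G^*_1}\gamma''_{G^*_2}\cdots\delta'_{H_{k\mu(E \mid E \cup F)+1}}\cdots\varepsilon
\]
for an equiprobable partition of $G \cup H$ fulfilling $G = \bigcup_{i=1}^{k\mu(G \mid G \cup H)} G^*_i$ and $H = \bigcup_{i=1}^{k\mu(H \mid G \cup H)} H^*_i$. By Learning Independence, $\alpha_G\beta_H\varepsilon \succsim \gamma_G\delta_H\varepsilon$. The case in which $\mu(E \mid E \cup F)$ is not rational follows from the continuity of the representation.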
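The rational case rests on simple bookkeeping: if $\mu(E \mid E \cup F) = m/k$ in lowest terms, then $E \cup F$ can be cut into $k$ equiprobable cells, $m$ of which make up $E$ and $k - m$ of which make up $F$. The sketch below does this with Python's `fractions.Fraction`; modelling $E \cup F$ as the interval $[0, 1)$ with uniform conditional probability is a hypothetical concrete setting, not part of the proof, and the helper names are illustrative. Any multiple of the denominator would work equally well as $k$; the sketch uses the smallest one.

```python
from fractions import Fraction


def equiprobable_split(p_e: Fraction):
    """Given mu(E | E ∪ F) = p_e, return (k, n_E, n_F) with n_E / k = p_e."""
    k = p_e.denominator     # smallest number of equiprobable cells
    n_e = p_e.numerator     # k * mu(E | E ∪ F) cells make up E
    return k, n_e, k - n_e  # the remaining cells make up F


def interval_cells(k: int):
    """Hypothetical model: E ∪ F as [0, 1) cut into k cells of length 1/k."""
    return [(Fraction(i, k), Fraction(i + 1, k)) for i in range(k)]


k, n_e, n_f = equiprobable_split(Fraction(3, 5))
cells = interval_cells(k)
e_cells, f_cells = cells[:n_e], cells[n_e:]
print(k, n_e, n_f)                     # 5 3 2
print(sum(b - a for a, b in e_cells))  # 3/5
```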

1 ⇒ 3: It is straightforward to derive that the stationarity axioms imply the principle of indifference of information and thus a probabilistic knowledge representation. Let $\psi(f_F)$ map a conditional act $f_F$ to an act $\psi(f_F)$ such that the conditional probabilities of outcomes remain unchanged. Under stationarity I and a probabilistic knowledge utility representation,

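\begin{align}
&\sum_\alpha \left[\frac{\mu(f^{-1}(\alpha))}{\mu(F)}u(\alpha) + h\!\left(\frac{\mu(f^{-1}(\alpha))}{\mu(F)}\right)\right] \ge \sum_\alpha \left[\frac{\mu(g^{-1}(\alpha))}{\mu(F)}u(\alpha) + h\!\left(\frac{\mu(g^{-1}(\alpha))}{\mu(F)}\right)\right] \tag{138}\\
\iff{}& \psi(f_F) \succsim \psi(g_F) \tag{139}\\
\iff{}& f_F \succsim_F g_F \tag{140}\\
\iff{}& f_F\beta \succsim g_F\beta \tag{141}\\
\iff{}& \sum_\alpha \bigl[\mu(f^{-1}(\alpha))u(\alpha) + h(\mu(f^{-1}(\alpha)))\bigr] \ge \sum_\alpha \bigl[\mu(g^{-1}(\alpha))u(\alpha) + h(\mu(g^{-1}(\alpha)))\bigr]. \tag{142}
\end{align}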

Thus, $U(f_F\alpha) = T(\psi(f_F))$ for a continuous, monotone transformation $T$. It follows from the uniqueness of additive representations that $T$ is affine. It is then straightforward to show that:

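\begin{align}
U(\alpha_E\beta_F\gamma)
={}& \mu(E)U(\alpha) + h(\mu(E)) + h(1 - \mu(E)) \notag\\
&+ (1 - \mu(E))\left(\frac{\mu(F)}{1 - \mu(E)}U(\beta) + h\!\left(\frac{\mu(F)}{1 - \mu(E)}\right) + \frac{\mu(\overline{E \cup F})}{1 - \mu(E)}U(\gamma) + h\!\left(\frac{\mu(\overline{E \cup F})}{1 - \mu(E)}\right)\right) \tag{143}\\
={}& \mu(F)U(\beta) + h(\mu(F)) + h(1 - \mu(F)) \notag\\
&+ (1 - \mu(F))\left(\frac{\mu(E)}{1 - \mu(F)}U(\alpha) + h\!\left(\frac{\mu(E)}{1 - \mu(F)}\right) + \frac{\mu(\overline{E \cup F})}{1 - \mu(F)}U(\gamma) + h\!\left(\frac{\mu(\overline{E \cup F})}{1 - \mu(F)}\right)\right). \tag{144}
\end{align}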

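Canceling terms, we obtain the fundamental equation of information:
\begin{equation}
\begin{aligned}
&h(\mu(\alpha)) + h(1 - \mu(\alpha)) + (1 - \mu(\alpha))\left(h\!\left(\frac{\mu(\beta)}{1 - \mu(\alpha)}\right) + h\!\left(\frac{\mu(\gamma)}{1 - \mu(\alpha)}\right)\right)\\
&\quad= h(\mu(\beta)) + h(1 - \mu(\beta)) + (1 - \mu(\beta))\left(h\!\left(\frac{\mu(\alpha)}{1 - \mu(\beta)}\right) + h\!\left(\frac{\mu(\gamma)}{1 - \mu(\beta)}\right)\right).
\end{aligned}
\tag{145}
\end{equation}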

The continuous solution to this functional equation is $h(\mu(\alpha)) = \mu(\alpha)\ln(\mu(\alpha))$. The desired representation follows.
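A quick numerical check, reading (145) as $h(x) + h(1-x) + (1-x)\bigl(h(\tfrac{y}{1-x}) + h(\tfrac{z}{1-x})\bigr)$ on the left and the same expression with $x$ and $y$ swapped on the right, where $x = \mu(\alpha)$, $y = \mu(\beta)$, $z = \mu(\gamma)$ and $x + y + z = 1$. With $h(p) = p\ln p$ both sides reduce to $x\ln x + y\ln y + z\ln z$ (the grouping property of Shannon entropy), whereas a function such as $h(p) = p^2$ fails. The sample probabilities are arbitrary, and a pointwise check does not establish that the solution is unique.

```python
import math


def fundamental_lhs(h, x, y, z):
    """h(x) + h(1 - x) + (1 - x) * (h(y / (1 - x)) + h(z / (1 - x)))."""
    return h(x) + h(1 - x) + (1 - x) * (h(y / (1 - x)) + h(z / (1 - x)))


def satisfies_145(h, x, y, z):
    # The right-hand side of (145) is the left-hand side with x and y swapped.
    return math.isclose(fundamental_lhs(h, x, y, z), fundamental_lhs(h, y, x, z))


def p_ln_p(p):
    return p * math.log(p)


def square(p):
    return p * p


x, y, z = 0.2, 0.3, 0.5  # illustrative values with x + y + z = 1
print(satisfies_145(p_ln_p, x, y, z))  # True
print(satisfies_145(square, x, y, z))  # False
```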

G Proof of Theorem 4

Proof. From the existence of a subjective knowledge utility representation, we can find, for the evaluation of all conditional acts given an arbitrary event $E$, a representation of the form:

\begin{equation}
U_E(a_E) = \sum_\alpha \bigl[\mu_E(a_E^{-1}(\alpha))u_E(\alpha) + h_E(a_E^{-1}(\alpha))\bigr]. \tag{146}
\end{equation}

Since the expected utility component always has the richness of a continuum, these representations are unique up to affine transformations. By stationarity I* (with respect to partitions fulfilling $\mu(E_1 \cup \cdots \cup E_k) = \mu(G_1 \cup \cdots \cup G_k)$), it is straightforward that $h_E(E) = h_E(\mu(E), \mu^V_E)$ for a suitably defined function $h_E$.

\begin{equation}
U_E(a_E) = \sum_\alpha \bigl[\mu_E(a_E^{-1}(\alpha))u_E(\alpha) + h_E(\mu(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)})\bigr]. \tag{147}
\end{equation}

By stationarity I* with respect to partitions with $\mu(E_1 \cup \cdots \cup E_k) \neq \mu(G_1 \cup \cdots \cup G_k)$, the representations $U_E$ and $U_G$ must be affine transformations of one another if $\mu^V_E = \mu^V_G$. If we normalize $u_E = u_G = u$, then this reduces to an additive transformation. Thus, whenever $a_E$ and $a_G$ are corresponding conditional acts in outcomes and knowledge partition gained, then

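\begin{align}
U_E(a_E) &= \sum_\alpha \bigl[\mu_E(a_E^{-1}(\alpha))u(\alpha) + h_E(\mu(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)})\bigr] + t_E \notag\\
&= U_G(a_G) = \sum_\alpha \bigl[\mu_G(a_G^{-1}(\alpha))u(\alpha) + h_G(\mu(a_G^{-1}(\alpha)), \mu^V_{a_G^{-1}(\alpha)})\bigr] + t_G. \tag{148}
\end{align}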

Cancelling the identical instrumental (expected utility) terms,
\begin{equation}
\sum_\alpha h_E(\mu(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)}) + t_E = \sum_\alpha h_G(\mu(a_G^{-1}(\alpha)), \mu^V_{a_G^{-1}(\alpha)}) + t_G. \tag{149}
\end{equation}

By Lemma 16, we can then assume that $h_E = h_G = h_{\mu^V_E}$. By the information-adjusted sure-thing principle and the uniqueness of additive representations, the representation $U_E(a_E)$ must also be an affine transformation of $U_{E\cup F}(a_E\gamma_F)$ for all $a_E$ disjoint from $\gamma_F$ if $E \cap F = \emptyset$. Thus,

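\begin{align}
&U_E(a_E) \tag{150}\\
={}& \sum_\alpha \bigl[\mu_E(a_E^{-1}(\alpha))u(\alpha) + h_{\mu^V_E}(\mu(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)})\bigr] + t_E \tag{151}\\
={}& A(E, \gamma, F)\,U_{E\cup F}(a_E\gamma_F) + B(E, \gamma, F) \tag{152}\\
={}& A(E, \gamma, F)\Bigl(\mu_{E\cup F}(F)u(\gamma) + h_{\mu^V_{E\cup F}}(\mu_{E\cup F}(F), \mu^V_F) \tag{153}\\
&+ \sum_\alpha \bigl[\mu_{E\cup F}(E)\mu_E(a_E^{-1}(\alpha))u(\alpha) + h_{\mu^V_{E\cup F}}(\mu_{E\cup F}(E)\mu_E(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)})\bigr] + t_{E\cup F}\Bigr) \notag\\
&+ B(E, \gamma, F). \tag{154}
\end{align}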

We cannot, without loss of generality, take $A$ to be an arbitrary function. This is because we have already chosen the affine transformation between $U_E$ and $U_G$ and other arbitrary events $G'$ with $\mu^V_E = \mu^V_G = \mu^V_{G'}$, and we have chosen $u$ to be identical in all representations.

Since $U_E$ and $U_G$ differ only by an additive transformation, we can identify that $A(E, \gamma, F) = \frac{1}{\mu_{E\cup F}(E)}$. Moreover, for convenience we substitute $B(E, \gamma, F) = B(\mu(E), \mu^V_E, \mu(F), \mu^V_F) + t_E - A(E, \gamma, F)\bigl(-t_{E\cup F} + \mu(E)u(\gamma) + h_{\mu^V_{E\cup F}}(\mu_{E\cup F}(F), \mu^V_F)\bigr)$.

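We then obtain:
\begin{equation}
\sum_\alpha h_{\mu^V_E}(\mu(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)}) = \sum_\alpha \frac{1}{\mu_{E\cup F}(E)} h_{\mu^V_{E\cup F}}\bigl(\mu_{E\cup F}(E)\mu(a_E^{-1}(\alpha)), \mu^V_{a_E^{-1}(\alpha)}\bigr) + B(\mu(E), \mu^V_E, \mu(F), \mu^V_F). \tag{155}
\end{equation}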

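Thus, $B(\mu(E), \mu^V_E, \mu(F), \mu^V_F) = F(\mu^V_E) + G(\mu_{E\cup F}(E), \mu^V_{E\cup F})$. From the case in which $\mu_E(a_E^{-1}(\alpha)) = 1$ for some $\alpha$, we have:
\begin{equation}
h_{\mu^V_E}(1, \mu^V_E) = \frac{1}{\mu_{E\cup F}(E)} h_{\mu^V_{E\cup F}}(\mu_{E\cup F}(E), \mu^V_E) + F(\mu^V_E) + G(\mu_{E\cup F}(E), \mu^V_{E\cup F}) \tag{156}
\end{equation}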

and therefore $h_{\mu^V_{E\cup F}}(\mu_{E\cup F}(E), \mu^V_E) = \mu_{E\cup F}(E)\bigl(f_{\mu^V_{E\cup F}}(\mu(E)) + g(\mu^V_E)\bigr)$. Next, we consider uninformative conditional acts such that:

\begin{equation}
U_E(a_E) = \sum_\alpha \bigl[\mu_E(a_E^{-1}(\alpha))u(\alpha) + h_{\mu^V_E}(\mu(a_E^{-1}(\alpha)), \mu^V_E)\bigr] + t_E. \tag{157}
\end{equation}

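For some arbitrary uninformative events $F$ and $G$, we determine the knowledge equivalent:
\begin{align}
U_{F\cup G}(\alpha_F\alpha'_G) &= u(\alpha) + h_{\mu^V_F}(\mu_{F\cup G}(F), \mu^V_F) + h_{\mu^V_F}(\mu_{F\cup G}(G), \mu^V_F) + t_{F\cup G} \notag\\
&= u(\gamma) + h_{\mu^V_F}(1, \mu^V_F) + t_{F\cup G}. \tag{158}
\end{align}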

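This can be substituted into $U_{E\cup F\cup G}(\alpha_E\gamma_{F\cup G})$ to obtain:
\begin{align}
&U_{E\cup F\cup G}(\alpha_E\gamma_{F\cup G}) \notag\\
={}& u(\alpha) + h_{\mu^V_F}(\mu_{E\cup F\cup G}(E), \mu^V_F) + h_{\mu^V_F}(1 - \mu_{E\cup F\cup G}(E), \mu^V_F) + t_{E\cup F\cup G} \notag\\
&+ \mu_{E\cup F\cup G}(F \cup G)\bigl(h_{\mu^V_F}(\mu_{F\cup G}(F), \mu^V_F) + h_{\mu^V_F}(\mu_{F\cup G}(G), \mu^V_F) - h_{\mu^V_F}(1, \mu^V_F)\bigr). \tag{159}
\end{align}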

Defining $H(p, \mu^V) = h_{\mu^V}(p, \mu^V) + h_{\mu^V}(1 - p, \mu^V) - g(\mu^V) = p f_{\mu^V}(p) + (1 - p) f_{\mu^V}(1 - p)$ and repeating the above steps for a knowledge equivalent of $E$ and $G$ yields the generalized fundamental equation of information:

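\begin{equation}
\begin{aligned}
&H(\mu_{E\cup F\cup G}(F), \mu^V_F) + (1 - \mu_{E\cup F\cup G}(F))\left(H\!\left(\frac{\mu_{E\cup F\cup G}(E)}{1 - \mu_{E\cup F\cup G}(F)}, \mu^V_F\right) - h_{\mu^V_F}(1, \mu^V_F)\right)\\
&\quad= H(\mu_{E\cup F\cup G}(E), \mu^V_F) + (1 - \mu_{E\cup F\cup G}(E))\left(H\!\left(\frac{\mu_{E\cup F\cup G}(F)}{1 - \mu_{E\cup F\cup G}(E)}, \mu^V_F\right) - h_{\mu^V_F}(1, \mu^V_F)\right).
\end{aligned}
\tag{160}
\end{equation}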

The general solution to this functional equation is (up to the uniqueness properties of Lemma 16) $p f_{\mu^V}(p) = c(\mu^V)(p\ln p) + p\,d(\mu^V) + e(\mu^V)$. In the specific case above, we can substitute this solution to obtain:

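\begin{equation}
(1 - \mu_{E\cup F\cup G}(F))\,e(\mu^V_F) = (1 - \mu_{E\cup F\cup G}(E))\,e(\mu^V_F), \tag{161}
\end{equation}
and therefore, since $\mu_{E\cup F\cup G}(E)$ and $\mu_{E\cup F\cup G}(F)$ need not coincide, $e(\mu^V) = 0$. Thus,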

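\begin{equation}
U_E(a_E) = \sum_\alpha \mu_E(a_E^{-1}(\alpha))\bigl(u_E(\alpha) + c(\mu^V_E)\ln(\mu_E(a_E^{-1}(\alpha))) + d(\mu^V_E) + g(\mu^V_{a_E^{-1}(\alpha)})\bigr) + t_E. \tag{162}
\end{equation}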

We now consider an arbitrary informative conditional act $a_E$. Let $\gamma$ be the knowledge equivalent of distinguishing events $G$ and $F$. Solving $U_{F\cup G}(\gamma_{F\cup G}) = U_{F\cup G}(\alpha_F\alpha'_G)$ for $u(\gamma)$ and inserting the result into $U_{E\cup F\cup G}(\alpha_E\gamma_{F\cup G}) = U_{E\cup F\cup G}(\alpha_E\alpha'_F\alpha''_G)$ yields, after simplifying:

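\begin{equation}
\bigl(c(\mu^V_{E\cup F\cup G}) - c(\mu^V_{F\cup G})\bigr)\bigl(\mu_{E\cup F\cup G}(F)\ln\mu_{E\cup F\cup G}(F) + \mu_{E\cup F\cup G}(G)\ln\mu_{E\cup F\cup G}(G) - \mu_{E\cup F\cup G}(F \cup G)\ln\mu_{E\cup F\cup G}(F \cup G)\bigr) = 0. \tag{163}
\end{equation}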

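Writing $p = \mu_{E\cup F\cup G}(F) > 0$ and $q = \mu_{E\cup F\cup G}(G) > 0$, the second factor equals $p\ln\frac{p}{p+q} + q\ln\frac{q}{p+q} < 0$, and therefore $c(\mu^V_{E\cup F\cup G}) = c(\mu^V_{F\cup G})$. Lastly, we choose $t_E$ such that $u(\gamma) = U_E(\gamma_E)$, which implies $g(\mu^V_E) = -d(\mu^V_E)$, and obtain the desired representation.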

References

Alaoui, L. (2012). The value of useless information.

Anscombe, F. J. & Aumann, R. J. (1963). A Definition of Subjective Probability. The Annals of Mathematical Statistics, 34(1), 199–205.

Aumann, R. J. (1999, August 17). Interactive epistemology I: Knowledge. International Journal of Game Theory, 28(3), 263–300. doi:10.1007/s001820050111

Azrieli, Y. & Lehrer, E. (2008, July). The value of a stochastic information structure. Games and Economic Behavior, 63(2), 679–693. doi:10.1016/j.geb.2005.12.003

Bassan, B., Gossner, O., Scarsini, M., & Zamir, S. (2003, December 1). Positive value of information in games. International Journal of Game Theory, 32(1), 17–31. doi:10.1007/s001820300142

Bennett, D., Bode, S., Brydevall, M., Warren, H., & Murawski, C. (2016, July 14). Intrinsic Valuation of Information in Decision Making under Uncertainty. PLOS Computational Biology, 12(7), e1005020. doi:10.1371/journal.pcbi.1005020

Blackwell, D. (1953, June). Equivalent Comparisons of Experiments. The Annals of Mathematical Statistics, 24(2), 265–272. doi:10.1214/aoms/1177729032

Caplin, A. & Leahy, J. (2001, February 1). Psychological Expected Utility Theory and Anticipatory Feelings. The Quarterly Journal of Economics, 116(1), 55–79. doi:10.1162/003355301556347

Celen, B. (2012, December). Informativeness of experiments for meu. Journal of Mathematical Economics, 48(6), 404–406. doi:10.1016/j.jmateco.2012.07.005

Cooke, K. (2017, September). Preference discovery and experimentation. Theoretical Economics, 12(3), 1307–1348. doi:10.3982/TE2263

De Meyer, B., Lehrer, E., & Rosenberg, D. (2010, November). Evaluating Information in Zero-Sum Games with Incomplete Information on Both Sides. Mathematics of Operations Research, 35(4), 851–863. doi:10.1287/moor.1100.0467

Eliaz, K. & Spiegler, R. (2006, July). Can anticipatory feelings explain anomalous choices of information sources? Games and Economic Behavior, 56(1), 87–104. doi:10.1016/j.geb.2005.06.004

Falk, A. & Zimmermann, F. (2016). Beliefs and Utility: Experimental Evidence on Preferences for Information.

Galanis, S. (2019). Dynamic Consistency, Valuable Information and Subjective Beliefs. City University of London Repository.

Ghirardato, P. (2002). Revisiting Savage in a conditional world. Economic Theory, 20(1), 83–92.

Gilboa, I. & Lehrer, E. (1991, January). The value of information - An axiomatic approach. Journal of Mathematical Economics, 20(5), 443–459. doi:10.1016/0304-4068(91)90002-B

Gilboa, I., Postlewaite, A., & Schmeidler, D. (2009, November). Is It Always Rational to Satisfy Savage's Axioms? Economics and Philosophy, 25(3), 285–296. doi:10.1017/S0266267109990241

Gilboa, I., Postlewaite, A., & Schmeidler, D. (2012, July). Rationality of belief or: Why Savage's axioms are neither necessary nor sufficient for rationality. Synthese, 187(1), 11–31. doi:10.1007/s11229-011-0034-2

Golman, R. & Loewenstein, G. (2018, July). Information gaps: A theory of preferences regarding the presence and absence of information. Decision, 5(3), 143–164. doi:10.1037/dec0000068

Grant, S., Kajii, A., & Polak, B. (1998, December). Intrinsic Preference for Information. Journal of Economic Theory, 83(2), 233–259. doi:10.1006/jeth.1996.2458

Hilton, R. W. (1981, January). The Determinants of Information Value: Synthesizing Some General Results. Management Science, 27(1), 57–64. doi:10.1287/mnsc.27.1.57

Jakobsen, A. (2016). Dynamic (In)Consistency and the Value of Information.

Kadane, J. B., Schervish, M., & Seidenfeld, T. (2008). Is Ignorance Bliss? The Journal of Philosophy, 105(1), 5–36. JSTOR: 20620069

Koopmans, T. C. (1960, April). Stationary Ordinal Utility and Impatience. Econometrica, 28(2), 287. doi:10.2307/1907722

Kops, C. & Pasichnichenko, I. (2020). A Test of Information Aversion.

Kreps, D. M. (1979, May). A Representation Theorem for "Preference for Flexibility". Econometrica, 47(3), 565. doi:10.2307/1910406. JSTOR: 1910406

Lehrer, E. & Rosenberg, D. (2006, June). What restrictions do Bayesian games impose on the value of information? Journal of Mathematical Economics, 42(3), 343–357. doi:10.1016/j.jmateco.2005.09.002

Lehrer, E. & Rosenberg, D. (2010, July). A note on the evaluation of information in zero-sum repeated games. Journal of Mathematical Economics, 46(4), 393–399. doi:10.1016/j.jmateco.2010.02.002

Lehrer, E., Rosenberg, D., & Shmaya, E. (2010, March). Signaling and mediation in games with common interests. Games and Economic Behavior, 68(2), 670–682. doi:10.1016/j.geb.2009.08.007

Lehrer, E., Rosenberg, D., & Shmaya, E. (2013, September). Garbling of signals and outcome equivalence. Games and Economic Behavior, 81, 179–191. doi:10.1016/j.geb.2013.05.005

Liang, Y. (2019). Information-dependent expected utility. Retrieved from http://ssrn.com/abstract-2842714

Luce, R. D., Ng, C. T., Marley, A. A. J., & Aczel, J. (2008a, July). Utility of gambling I: Entropy modified linear weighted utility. Economic Theory, 36(1), 1–33. doi:10.1007/s00199-007-0260-5

Luce, R. D., Ng, C. T., Marley, A. A. J., & Aczel, J. (2008b, August). Utility of gambling II: Risk, paradoxes, and data. Economic Theory, 36(2), 165–187. doi:10.1007/s00199-007-0259-y

Masatlioglu, Y., Orhun, A. Y., & Raymond, C. (2017). Intrinsic Information Preferences and Skewness. SSRN Electronic Journal. doi:10.2139/ssrn.3232350

Nehring, K. (1999). Preference for Flexibility in a Savage Framework. Econometrica, 67, 121–146.

Renyi, A. (1961). On measures of information and entropy. In Proceedings of the fourth Berkeley Symposium on Mathematics, Statistics and Probability 1960 (pp. 547–561).

Rosenberg, D., Salomon, A., & Vieille, N. (2013). On games of strategic experimentation. Games and Economic Behavior, 82, 31–51.

Safra, Z. & Sulganik, E. (1995). On the nonexistence of Blackwell's theorem-type results with general preference relations. Journal of Risk and Uncertainty, 10, 187–201.

Savage, L. J. (1954). The foundations of statistics. New York: Wiley.

Shannon, C. E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27(3), 379–423, 623–656.

Snow, A. (2010, April). Ambiguity and the value of information. Journal of Risk and Uncertainty, 40(2), 133–145. doi:10.1007/s11166-010-9088-7

Torgersen, E. (1991). Comparison of statistical experiments. Cambridge University Press.

Villegas, C. (1964, December). On Qualitative Probability σ-Algebras. The Annals of Mathematical Statistics, 35(4), 1787–1796. doi:10.1214/aoms/1177700400

Wakker, P. (1988a, July). Nonexpected utility as aversion of information. Journal of Behavioral Decision Making, 1(3), 169–175. doi:10.1002/bdm.3960010305

Wakker, P. (1988b). The Algebraic versus the Topological Approach to Additive Representations. Journal of Mathematical Psychology, 32, 421–435.