utility based risk measures · 2016-07-27 · construct these utility based risk measures. because...

Faculty of Science
Department of Applied Mathematics, Computer Science and Statistics

Utility based risk measures

Jasmine Maes

Promotor: Prof. Dr. D. Vyncke
Supervisor: H. Gudmundsson

Thesis submitted to obtain the academic degree of Master of Science: Applied Mathematics

Academic year 2015–2016

Acknowledgements

First of all I would like to thank my supervisor Mr. Gudmundsson for letting me come by his office whenever I felt like I needed it, for coming up with good ideas and for supporting me throughout the thesis. I would also like to thank my promotor Prof. Vyncke for giving me advice when I asked for it, while still allowing me a lot of freedom. Last but not least I would like to thank my friends for listening to all my complaints when things didn't go as planned and when I got stuck, and my parents for their financial support during my education.

The author gives permission to make this master thesis available for consultation and to copy parts of this master thesis for personal use. In the case of any other use, the limitations of the copyright have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.

Ghent, 1 June 2016.


Contents

Preface

1 Mathematical representation of risk
  1.1 Definitions and properties
  1.2 The acceptance set of a risk measure
  1.3 The penalty function
  1.4 Robust representation of convex risk measures

2 An introduction to decision theory
  2.1 The axioms of von Neumann-Morgenstern
  2.2 Risk and utility
  2.3 Certainty equivalents
    2.3.1 The ordinary certainty equivalent
    2.3.2 The optimised certainty equivalent
    2.3.3 The u-Mean certainty equivalent
  2.4 The exponential utility function
  2.5 Stochastic dominance

3 Value at Risk and Expected shortfall
  3.1 Value at Risk
    3.1.1 General properties
    3.1.2 Consistency with expected utility maximisation
  3.2 Expected shortfall
    3.2.1 General properties
    3.2.2 Consistency with expected utility maximisation

4 Utility based risk measures
  4.1 Utility based shortfall risk measures
  4.2 Divergence risk measures
    4.2.1 Construction and representation
    4.2.2 The coherence of divergence risk measures
    4.2.3 Examples
  4.3 The ordinary certainty equivalent as risk measure

5 Utility functions
  5.1 The power utility functions
  5.2 The exponential utility functions
  5.3 The polynomial utility functions
  5.4 The SAHARA utility functions
  5.5 The κ-utility functions

Conclusion

A Dutch summary

B Additional computations
  B.1 Computations regarding the SAHARA utility class
    B.1.1 Computation of the utility function
    B.1.2 Computation of the divergence function
  B.2 Computations regarding the κ-utility class
    B.2.1 Determining the asymptotic behaviour
    B.2.2 Computation of the divergence function

Preface

When choosing between different investment opportunities it is tempting to select the one which offers the highest expected return. However, this strategy would ignore the risk associated with that investment. Generally speaking, the larger the expected return of an investment, the larger the risk associated with it. Taking into account the risk of a particular investment is not only necessary to pick the best investment, but also to set up capital requirements. These capital requirements should create a buffer for potential losses of the investments.

But how do we describe and measure this risk? We could of course try to describe the cumulative distribution or the density function of the investment. Although this would give us a lot of information about the risk involved, it could still be very difficult to compare different investment opportunities in terms of risk. A more important problem is that it gives us, in some sense, too much information. Therefore it would be useful to summarize the distribution of the investment into a number which represents the risk. These numbers can then be used to determine the necessary capital requirements. More formally, if the stochastic variable X models the returns of an investment, then a risk measure is a mapping ρ : X ↦ ρ(X) such that ρ(X) ∈ ℝ.

Because a stochastic variable can be viewed as a function, a risk measure can be interpreted as a functional. We could therefore study risk measures by looking at them as purely mathematical objects. Using techniques and ideas from mathematical analysis we could then analyse properties of these functionals. This is exactly what we will do in the first chapter of this thesis.

Studying risk measures only from a purely mathematical point of view has the downside that it ignores the intuition behind them. The attitude towards risky alternatives is a subjective matter determined by personal preferences. These personal preferences can be represented using so-called utility functions. Utility functions are commonly used in economics to model how people make decisions under uncertainty. In the second chapter we will therefore introduce this decision theoretic framework and explain the necessary concepts of economics.

Armed with both a strong mathematical and economic framework we will then apply these concepts to two risk measures commonly used in industry, Value at Risk and Expected Shortfall. This analysis will be the subject of the third chapter.

The fourth chapter combines the axiomatic approach from the first chapter and the economic ideas from the second chapter and describes different ways in which utility functions can be used to construct risk measures. We will introduce utility based shortfall risk measures and divergence risk measures. Using ideas from mathematical optimisation we will link different utility based risk measures and discuss different representations of these risk measures.

After this discussion the question arises which utility function we should use to construct these utility based risk measures. Because utility functions represent personal preferences we do not believe that there is a straightforward answer to this question. However, the properties of the utility function used in a risk measure do affect this risk measure. The last chapter takes a closer look at different classes of utility functions which appear in the literature and assesses their properties in the context of utility based risk measures.

1 Mathematical representation of risk

In this chapter we will look at risk measures from a solely mathematical point of view. We will define what a risk measure is, and what properties it should have. We will take a closer look at the concepts of the acceptance set and the penalty function. Finally we will introduce the robust representation of a risk measure. The content of this chapter is largely based on the theorems found in [11].

1.1 Definitions and properties

Consider a probability space (Ω, ℱ, P), where Ω represents the set of all possible scenarios, ℱ is a σ-algebra and P is a probability measure. The future value of a scenario is uncertain and can be represented by a stochastic variable X. This is a function from the set of all possible scenarios to the real numbers, X : Ω → ℝ.

Let 𝒳 denote a given linear space of functions X : Ω → ℝ including the constants. A risk measure ρ is a mapping ρ : 𝒳 → ℝ. Our goal is to define ρ in such a way that it can quantify the risk of a market position X, so that it can serve as a measure to determine the capital requirement of X, that is, the amount of capital which, when invested in a risk-free manner, makes the position acceptable. Using this interpretation of ρ(X), we would like to have a risk measure that has some desirable properties.

First of all, if the value of the portfolio X is smaller than the value of the portfolio Y almost surely, then it is logical that you need more money to make the position X acceptable than to make the position Y acceptable. This property is called monotonicity.

Property 1. (Monotonicity) If X ≤ Y, then ρ(X) ≥ ρ(Y).

Secondly, it is logical to assume that to make the position X + m acceptable, where m is a risk-free amount, we need ρ(X) − m. This is precisely the amount ρ(X) needed to make the position X acceptable, reduced by the risk-free amount m we already had. This property is called translation invariance or cash invariance.

Property 2. (Translation invariance) If m ∈ ℝ, then ρ(X + m) = ρ(X) − m.

Definition 1.1. A mapping ρ : 𝒳 → ℝ is called a monetary risk measure if ρ satisfies both monotonicity and translation invariance.

Here we would like to point out that some authors define a monetary risk measure such that ρ can also take the values +∞ and −∞. They then add the property that ρ(0) is finite, or even normalized, ρ(0) = 0.

In [11] we found the following lemma.

Lemma 1.1. Any monetary risk measure ρ is Lipschitz continuous with respect to the supremum norm ‖·‖, i.e. for all X, Y ∈ 𝒳 we have:

$$|\rho(X) - \rho(Y)| \le \|X - Y\|. \tag{1.1}$$

Proof. We have that

$$X - Y \le \sup_{\omega \in \Omega} |X(\omega) - Y(\omega)| = \|X - Y\|,$$

hence X ≤ Y + ‖X − Y‖. Using monotonicity we find that ρ(X) ≥ ρ(Y + ‖X − Y‖). Using translation invariance we get ρ(X) ≥ ρ(Y) − ‖X − Y‖. This gives us that ρ(X) − ρ(Y) ≥ −‖X − Y‖ or equivalently, ρ(Y) − ρ(X) ≤ ‖X − Y‖. We also have that

$$Y - X \le \sup_{\omega \in \Omega} |Y(\omega) - X(\omega)| = \|X - Y\|.$$

Again using monotonicity and translation invariance we find that ρ(Y) ≥ ρ(X) − ‖X − Y‖, so that ρ(X) − ρ(Y) ≤ ‖X − Y‖. Combining both inequalities leads us to conclude that |ρ(X) − ρ(Y)| ≤ ‖X − Y‖.
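As a small numerical illustration of the definitions above and of lemma 1.1, the following sketch evaluates the worst-case risk measure ρ(X) = max over ω of −X(ω) on a finite scenario set. The choice of this risk measure, the four-scenario payoff vectors and the function names are assumptions made purely for illustration.

```python
# Positions are lists of payoffs, one entry per scenario of a finite Omega
# (an illustrative assumption; the text works with a general space).

def sup_norm(x, y):
    """Supremum norm of X - Y on a finite scenario set."""
    return max(abs(a - b) for a, b in zip(x, y))

def rho_worst(x):
    """Worst-case risk measure: the largest loss -X(omega) over all scenarios."""
    return max(-a for a in x)

X = [1.0, -2.0, 0.5, 3.0]
Y = [1.5, -1.0, 0.5, 4.0]   # Y >= X in every scenario
m = 0.75                    # risk-free amount

monotone = rho_worst(X) >= rho_worst(Y)
cash_invariant = abs(rho_worst([a + m for a in X]) - (rho_worst(X) - m)) < 1e-12
lipschitz = abs(rho_worst(X) - rho_worst(Y)) <= sup_norm(X, Y)
print(monotone, cash_invariant, lipschitz)
```

For these particular vectors the Lipschitz bound is attained with equality, since both sides equal 1.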

An important subclass of monetary risk measures are the convex risk measures. These risk measures have the extra property of being convex.

Property 3. (Convexity) ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y), for 0 ≤ λ ≤ 1.

We will prove in lemma 1.2 that for monetary risk measures this property is equivalent to the property of quasi convexity.

Property 4. (Quasi convexity) ρ(λX + (1 − λ)Y) ≤ max(ρ(X), ρ(Y)), for 0 ≤ λ ≤ 1.

Definition 1.2. A convex risk measure is a monetary risk measure satisfying the convexity property.

We can easily interpret the property of quasi convexity. Consider an investor who can invest his resources in such a way that he obtains X, or in another way so that he obtains Y. If he spends only a fraction λ of his resources on the first investment strategy and the rest on the second, he will obtain λX + (1 − λ)Y. This diversification strategy will give him a risk of ρ(λX + (1 − λ)Y). The property of quasi convexity then states that the risk of this diversified portfolio cannot be greater than the risk of the riskiest investment strategy.

In [11, p. 178] we find the following statement, which we will prove in this thesis.


Lemma 1.2. A monetary risk measure is convex if and only if it is quasi convex.

Proof. First consider a risk measure satisfying convexity, hence ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) for 0 ≤ λ ≤ 1. Without loss of generality we can assume that ρ(X) ≥ ρ(Y) and hence max(ρ(X), ρ(Y)) = ρ(X). We find that

$$\begin{aligned}
\rho(\lambda X + (1-\lambda)Y) &\le \lambda\rho(X) + (1-\lambda)\rho(Y) \\
&\le \lambda\rho(X) + (1-\lambda)\rho(X) \\
&= \rho(X) \\
&= \max(\rho(X), \rho(Y)),
\end{aligned}$$

from which we can conclude that convexity of a monetary risk measure implies quasi convexity.

Now consider a monetary risk measure satisfying quasi convexity. For all X, Y ∈ 𝒳 we can define X′ := X + ρ(X) and Y′ := Y + ρ(Y). Then it is clear that X′, Y′ ∈ 𝒳, and by translation invariance ρ(X′) = ρ(Y′) = 0. Because ρ is quasi convex we have that ρ(λX′ + (1 − λ)Y′) ≤ max(ρ(X′), ρ(Y′)) = ρ(Y′). Rewriting this expression in terms of X and Y we find that ρ(λX + λρ(X) + (1 − λ)Y + (1 − λ)ρ(Y)) ≤ ρ(Y + ρ(Y)). Now using the fact that ρ satisfies translation invariance we have that

$$\rho(\lambda X + (1-\lambda)Y) - \lambda\rho(X) - (1-\lambda)\rho(Y) \le \rho(Y + \rho(Y)) = \rho(Y) - \rho(Y) = 0.$$

We can conclude that ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) for all X, Y ∈ 𝒳, i.e. ρ is a convex risk measure.
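Lemma 1.2 can be illustrated numerically. The sketch below uses the entropic risk measure ρ(X) = (1/β) log E[exp(−βX)] under a uniform measure on five scenarios. This risk measure is a standard example of a convex risk measure, but it is not introduced in this section, so its use here, together with the parameter β = 1 and the random test positions, is an assumption made for illustration only.

```python
import math
import random

def rho_entropic(x, beta=1.0):
    """Entropic risk measure under the uniform measure: (1/beta) * log E[exp(-beta X)]."""
    n = len(x)
    return math.log(sum(math.exp(-beta * a) for a in x) / n) / beta

random.seed(0)
convex_ok = True
quasi_convex_ok = True
for _ in range(1000):
    X = [random.uniform(-2, 2) for _ in range(5)]
    Y = [random.uniform(-2, 2) for _ in range(5)]
    lam = random.random()
    mix = [lam * a + (1 - lam) * b for a, b in zip(X, Y)]
    r_mix, r_x, r_y = rho_entropic(mix), rho_entropic(X), rho_entropic(Y)
    # Property 3 (convexity), with a small tolerance for rounding errors.
    convex_ok &= r_mix <= lam * r_x + (1 - lam) * r_y + 1e-12
    # Property 4 (quasi convexity).
    quasi_convex_ok &= r_mix <= max(r_x, r_y) + 1e-12
print(convex_ok, quasi_convex_ok)
```

A random search of this kind can only fail to find a counterexample; it does not replace the proof above.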

We can define a special subclass of convex risk measures using the notion of positive homogeneity. Consider an investor who invests his wealth using an investment strategy that replicates X, with an associated risk ρ(X). If he only invests a fraction λ of his wealth in the same investment strategy he will obtain λX, with an associated risk of ρ(λX). If this risk equals the proportional risk of the initial investment, we say that the risk measure satisfies the property of positive homogeneity.

Property 5. (Positive homogeneity) If λ ≥ 0, then ρ(λX) = λρ(X).

Definition 1.3. A coherent risk measure is a convex risk measure satisfying positive homogeneity.

Coherent risk measures can also be defined by using the sub-additivity property. If a risk measure is sub-additive, you can decentralize the task of managing the risk of different positions. Consider an investor who has invested his wealth in a contingent claim X + Y. If the risk measure is sub-additive, the risk ρ(X + Y) will never be greater than ρ(X) + ρ(Y).

Property 6. (Sub-additivity) ρ(X + Y) ≤ ρ(X) + ρ(Y).

It is stated in [11] that a coherent risk measure is a monetary risk measure satisfying positive homogeneity and sub-additivity. We now prove this equivalent definition.


Lemma 1.3. For a monetary risk measure that satisfies positive homogeneity, the convexity property is equivalent to the sub-additivity property.

Proof. First assume ρ is sub-additive, X, Y ∈ 𝒳 and 0 ≤ λ ≤ 1. We find that

$$\rho(\lambda X + (1-\lambda)Y) \le \rho(\lambda X) + \rho((1-\lambda)Y) = \lambda\rho(X) + (1-\lambda)\rho(Y).$$

The inequality follows from the fact that ρ is sub-additive, the equality uses the assumption that ρ satisfies positive homogeneity. Note that λX ∈ 𝒳 and (1 − λ)Y ∈ 𝒳 because 𝒳 is assumed to be a linear space.

Now assume ρ is convex. Then for a fixed λ with 0 < λ < 1, define X′ := X/λ and Y′ := Y/(1 − λ). Notice that X′, Y′ ∈ 𝒳. It follows from the convexity property and the positive homogeneity that

$$\begin{aligned}
\rho(X + Y) = \rho(\lambda X' + (1-\lambda)Y') &\le \lambda\rho(X') + (1-\lambda)\rho(Y') \\
&= \rho(\lambda X') + \rho((1-\lambda)Y') \\
&= \rho(X) + \rho(Y).
\end{aligned}$$

This proves that ρ satisfies the sub-additivity property.
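The role of positive homogeneity in lemma 1.3 can be seen in a small counterexample. The sketch below again uses the entropic risk measure under a uniform measure, which is convex but not positively homogeneous; the risk measure, the parameter β = 1 and the payoff vector are assumptions chosen for illustration. Taking Y = X shows that sub-additivity can fail for such a measure.

```python
import math

def rho_entropic(x, beta=1.0):
    """Entropic risk measure under the uniform measure: (1/beta) * log E[exp(-beta X)]."""
    n = len(x)
    return math.log(sum(math.exp(-beta * a) for a in x) / n) / beta

X = [-1.0, 0.0, 2.0]
two_X = [2 * a for a in X]

risk_of_sum = rho_entropic(two_X)      # rho(X + X) = rho(2X)
sum_of_risks = 2 * rho_entropic(X)     # rho(X) + rho(X)

# Sub-additivity would require risk_of_sum <= sum_of_risks.
sub_additivity_fails = risk_of_sum > sum_of_risks
print(risk_of_sum, sum_of_risks, sub_additivity_fails)
```

The same numbers show that ρ(2X) differs from 2ρ(X), so positive homogeneity fails as well.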

So far we have defined a coherent risk measure as a risk measure which satisfies the following four properties:

1. (Monotonicity) If X ≤ Y, then ρ(X) ≥ ρ(Y).

2. (Translation invariance) If m ∈ ℝ, then ρ(X + m) = ρ(X) − m.

3. (Positive homogeneity) If λ ≥ 0, then ρ(λX) = λρ(X).

4. (Convexity) ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y), for 0 ≤ λ ≤ 1.

Here the convexity property can be replaced by the sub-additivity property. However, some authors, such as [1] and [2], use a positivity axiom instead of the monotonicity axiom.

Property 7. (Positivity) X ≥ 0 ⇒ ρ(X) ≤ 0.

In general positivity and monotonicity are not equivalent. However, it turns out that they are equivalent when a risk measure satisfies the positive homogeneity property and the sub-additivity property. The reason for using the positivity property instead of the monotonicity property is that the positivity property is often easier to prove.

Lemma 1.4. If a risk measure is translation invariant, sub-additive and positively homogeneous, then it is positive if and only if it is monotone.


Proof. First suppose the risk measure is positively homogeneous, translation invariant, sub-additive and positive. Because of positivity we have that

$$X - Y \ge 0 \Rightarrow \rho(X - Y) \le 0. \tag{1.2}$$

Using the sub-additivity property we find that

$$\rho(X) = \rho(X - Y + Y) \le \rho(X - Y) + \rho(Y). \tag{1.3}$$

Combining equation 1.2 and equation 1.3 we find that

$$X \ge Y \Rightarrow \rho(X) \le \rho(Y). \tag{1.4}$$

We conclude that the risk measure is monotone. Now suppose the risk measure is positively homogeneous, translation invariant, sub-additive and monotone. We need to show that it is positive. Using monotonicity we find that

$$X \ge 0 \Rightarrow \rho(X) \le \rho(0). \tag{1.5}$$

Using positive homogeneity we find that for all λ > 0,

$$\rho(0) = \rho(\lambda 0) = \lambda\rho(0). \tag{1.6}$$

Because this is true for all λ > 0 we can conclude that ρ(0) = 0. Using equation 1.5 we can conclude that

$$X \ge 0 \Rightarrow \rho(X) \le 0. \tag{1.7}$$

This proves positivity.

Remark 1.1. Using lemma 1.3 and lemma 1.4 we see that a risk measure is coherent if and only if it satisfies the following properties for X, Y ∈ 𝒳.

1. (Positivity) X ≥ 0 ⇒ ρ(X) ≤ 0.

2. (Sub-additivity) ρ(X + Y) ≤ ρ(X) + ρ(Y).

3. (Positive homogeneity) ∀λ > 0: ρ(λX) = λρ(X).

4. (Translation invariance) ∀m ∈ ℝ: ρ(X + m) = ρ(X) − m.

1.2 The acceptance set of a risk measure

In the previous section we interpreted ρ(X) as the amount of capital which, if invested in a risk-free manner, makes the position X acceptable. In this section we will define the acceptance set of a risk measure. This is the set of all positions which do not require surplus capital. We will also demonstrate the relationship between the properties of the risk measure and the corresponding acceptance set.

Definition 1.4. The acceptance set induced by a monetary risk measure ρ is defined by

$$\mathcal{A}_\rho := \{X \in \mathcal{X} \mid \rho(X) \le 0\}. \tag{1.8}$$

The following theorem was taken from [11] and shows that there is a clear connection between the properties of a monetary risk measure and the associated acceptance set. We have worked out the proof.

Theorem 1.1. If ρ is a monetary risk measure with acceptance set 𝒜 := 𝒜_ρ, then

1. 𝒜 is non-empty.

2. 𝒜 is closed in 𝒳 with respect to the supremum norm ‖·‖.

3. inf{m ∈ ℝ | m ∈ 𝒜} > −∞.

4. X ∈ 𝒜, Y ∈ 𝒳, Y ≥ X ⇒ Y ∈ 𝒜.

5. ρ can be recovered from 𝒜:

$$\rho(X) = \inf\{m \in \mathbb{R} \mid m + X \in \mathcal{A}\}. \tag{1.9}$$

6. If ρ is a convex risk measure, then 𝒜 is a convex set.

7. If ρ is positively homogeneous, then 𝒜 is a cone. In particular, if ρ is a coherent risk measure, 𝒜 is a convex cone.

Proof. 1. Consider m = ρ(0); then m ∈ 𝒳 because 𝒳 contains the constants. By translation invariance, m ∈ 𝒜 ⇔ ρ(m) ≤ 0 ⇔ ρ(0) − m ≤ 0 ⇔ ρ(0) ≤ m, which holds for m = ρ(0). Hence ρ(0) ∈ 𝒜.

2. Consider a sequence X_n ∈ 𝒜 such that X_n → X with respect to the supremum norm ‖·‖. We need to prove that X ∈ 𝒜. Suppose ρ(X) > 0. Since ρ(X_n) ≤ 0 for all n, there exists a c > 0, for example c = ρ(X)/2, such that |ρ(X_n) − ρ(X)| > c for all n. Using lemma 1.1 we then have that ‖X_n − X‖ ≥ |ρ(X_n) − ρ(X)| > c > 0. If X_n converges to X in the supremum norm the left-hand side goes to 0. This gives us a contradiction. Hence ρ(X) ≤ 0, and therefore X ∈ 𝒜.

3. For all m ∈ ℝ we have m ∈ 𝒜 ⇔ ρ(m) ≤ 0 ⇔ ρ(0) − m ≤ 0 ⇔ ρ(0) ≤ m. Hence ρ(0) is a lower bound for all m ∈ 𝒜. This concludes the proof since ρ(0) is finite for a monetary risk measure.

4. We know that X ∈ 𝒜 ⇒ ρ(X) ≤ 0, and using monotonicity Y ≥ X ⇒ ρ(Y) ≤ ρ(X). Combining those two facts we find that ρ(Y) ≤ ρ(X) ≤ 0, and we can conclude that Y ∈ 𝒜.

5. Using translation invariance, notice that inf{m ∈ ℝ | ρ(m + X) ≤ 0} = inf{m ∈ ℝ | ρ(X) ≤ m} = ρ(X).

6. We need to prove that for all X, Y ∈ 𝒜 and all λ ∈ [0, 1] we have λX + (1 − λ)Y ∈ 𝒜, i.e. that ρ(λX + (1 − λ)Y) ≤ 0. Since X, Y ∈ 𝒜 and λ ∈ [0, 1] we have λρ(X) ≤ 0 and (1 − λ)ρ(Y) ≤ 0. Since ρ is convex we have ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) ≤ 0. This is what we needed to prove.

7. To prove that 𝒜 is a cone it is sufficient to prove that for all X ∈ 𝒜 and all λ ≥ 0 we have λX ∈ 𝒜. Because ρ is positively homogeneous we have ρ(λX) = λρ(X) ≤ 0, since ρ(X) ≤ 0 and λ ≥ 0. Hence λX ∈ 𝒜, which proves that 𝒜 is a cone. From the above proofs it follows directly that if ρ is a coherent risk measure then 𝒜 is a convex cone.

In definition 1.4 we defined for each monetary risk measure the associated acceptance set. We can also do the opposite and define for each acceptance set an associated risk measure.

Definition 1.5. ρ_𝒜(X) := inf{m ∈ ℝ | m + X ∈ 𝒜}.

This is a very intuitive definition for a risk measure. If X is a financial position, then ρ_𝒜(X) is the minimal amount of money needed to make the position X acceptable. The next theorem will show that the properties of the acceptance set are linked to the properties of the associated risk measure. This theorem was found in [11] and we have worked out the proof.
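Definition 1.5 suggests a direct way to compute a risk measure from a membership test for the acceptance set. The sketch below approximates the infimum by bisection on a finite scenario set. The function names, the bracket [−1000, 1000], the tolerance and the two example acceptance sets are all hypothetical choices made for illustration; the bisection relies on the acceptance set being upward closed (property 4 from theorem 1.1).

```python
def rho_from_acceptance(x, accept, lo=-1e3, hi=1e3, tol=1e-9):
    """Approximate inf{m : m + X in A} by bisection.

    Assumes A is upward closed, so that m + X in A implies m' + X in A for
    all m' >= m, and that lo + X is rejected while hi + X is accepted.
    """
    assert not accept([a + lo for a in x]) and accept([a + hi for a in x])
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if accept([a + mid for a in x]):
            hi = mid      # mid + X is acceptable: the infimum is at most mid
        else:
            lo = mid      # mid + X is not acceptable: the infimum is at least mid
    return hi

def accept_worst_case(x):
    """A = {X : X(omega) >= 0 in every scenario}."""
    return min(x) >= 0

def accept_mean(x):
    """A = {X : E[X] >= 0} under the uniform measure."""
    return sum(x) / len(x) >= 0

X = [1.0, -2.0, 0.5, 3.0]
r_worst = rho_from_acceptance(X, accept_worst_case)   # close to max(-X) = 2.0
r_mean = rho_from_acceptance(X, accept_mean)          # close to -E[X] = -0.625
print(round(r_worst, 6), round(r_mean, 6))
```

In both cases the recovered value agrees, up to the tolerance, with the closed-form risk measure whose acceptance set was used.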

Theorem 1.2. If 𝒜 is a non-empty subset of 𝒳 such that properties 3 and 4 from theorem 1.1 are both satisfied, then the functional ρ_𝒜 has the following properties:

1. ρ_𝒜 is a monetary risk measure.

2. If 𝒜 is a convex set, then ρ_𝒜 is a convex risk measure.

3. If 𝒜 is a cone, then ρ_𝒜 is positively homogeneous. In particular, if 𝒜 is a convex cone then ρ_𝒜 is a coherent risk measure.

4. 𝒜 ⊆ 𝒜_{ρ_𝒜}, and 𝒜 = 𝒜_{ρ_𝒜} if and only if 𝒜 is ‖·‖-closed in 𝒳.

Proof. 1. To prove that ρ_𝒜 is a monetary risk measure, we need to check that ρ_𝒜(X) is finite for all X ∈ 𝒳 and that ρ_𝒜 satisfies monotonicity and translation invariance.

(translation invariance) We need to prove that for X ∈ 𝒳 and m ∈ ℝ, ρ_𝒜(X + m) = ρ_𝒜(X) − m. This follows almost immediately from the properties of the infimum, since ρ_𝒜(X) − m = inf{l ∈ ℝ | l + X ∈ 𝒜} − m = inf{l ∈ ℝ | l + X + m ∈ 𝒜} = ρ_𝒜(X + m).

(monotonicity) If X ≤ Y, then m + X ≤ m + Y for all m ∈ ℝ. By property 4 from theorem 1.1, m + X ∈ 𝒜 then implies m + Y ∈ 𝒜, so that {m ∈ ℝ | m + X ∈ 𝒜} ⊆ {m ∈ ℝ | m + Y ∈ 𝒜} and therefore inf{m ∈ ℝ | m + X ∈ 𝒜} ≥ inf{m ∈ ℝ | m + Y ∈ 𝒜}. Using the definition of ρ_𝒜 we conclude that X ≤ Y ⇒ ρ_𝒜(X) ≥ ρ_𝒜(Y).

(ρ_𝒜(X) is finite) Since 𝒜 ≠ ∅, we can find a Y ∈ 𝒜. Fix this Y and let X ∈ 𝒳. From the assumptions on 𝒳 we have that X and Y are both bounded, hence there exists an m ∈ ℝ such that m + X > Y. Using that Y ∈ 𝒜, monotonicity and translation invariance we find 0 ≥ ρ_𝒜(Y) ≥ ρ_𝒜(X + m) = ρ_𝒜(X) − m. We conclude that ρ_𝒜(X) ≤ m < +∞ for all X ∈ 𝒳. Because we have assumed property 3 from theorem 1.1, we have ρ_𝒜(0) > −∞. We need to prove that ρ_𝒜(X) > −∞ for all X ∈ 𝒳. Take m′ ∈ ℝ such that X + m′ ≤ 0. Using translation invariance and monotonicity we find that ρ_𝒜(X + m′) = ρ_𝒜(X) − m′ ≥ ρ_𝒜(0) > −∞. From this we can conclude that ρ_𝒜(X) > −∞ for an arbitrary X ∈ 𝒳.

2. We need to prove that if 𝒜 is convex, then for all X_1, X_2 ∈ 𝒳 and all λ ∈ [0, 1], ρ_𝒜(λX_1 + (1 − λ)X_2) ≤ λρ_𝒜(X_1) + (1 − λ)ρ_𝒜(X_2). Because of translation invariance we find for i ∈ {1, 2} that ρ_𝒜(X_i + ρ_𝒜(X_i)) = ρ_𝒜(X_i) − ρ_𝒜(X_i) = 0, hence ρ_𝒜(X_i) + X_i ∈ 𝒜. (Strictly speaking this requires the infimum in definition 1.5 to be attained; otherwise the same argument applies to ρ_𝒜(X_i) + ε + X_i ∈ 𝒜 for every ε > 0, after which we let ε ↓ 0.) Because 𝒜 is a convex set we have λ(ρ_𝒜(X_1) + X_1) + (1 − λ)(ρ_𝒜(X_2) + X_2) ∈ 𝒜. Using this we find that

$$\begin{aligned}
0 &\ge \rho_{\mathcal{A}}\big(\lambda(\rho_{\mathcal{A}}(X_1) + X_1) + (1-\lambda)(\rho_{\mathcal{A}}(X_2) + X_2)\big) \\
&= \rho_{\mathcal{A}}(\lambda X_1 + (1-\lambda)X_2) - \lambda\rho_{\mathcal{A}}(X_1) - (1-\lambda)\rho_{\mathcal{A}}(X_2).
\end{aligned}$$

From this we can conclude that for all X_1, X_2 ∈ 𝒳 and λ ∈ [0, 1], ρ_𝒜(λX_1 + (1 − λ)X_2) ≤ λρ_𝒜(X_1) + (1 − λ)ρ_𝒜(X_2), which is precisely what we needed to prove.

3. If 𝒜 is a cone we need to prove that for all X ∈ 𝒳 and all λ ≥ 0, ρ_𝒜(λX) = λρ_𝒜(X). We first prove ρ_𝒜(λX) ≤ λρ_𝒜(X). Since ρ_𝒜(X) + X ∈ 𝒜 and 𝒜 is a cone, we know that λ(ρ_𝒜(X) + X) ∈ 𝒜. Hence we have 0 ≥ ρ_𝒜(λ(ρ_𝒜(X) + X)) = ρ_𝒜(λX) − λρ_𝒜(X). This proves that ρ_𝒜(λX) ≤ λρ_𝒜(X). To prove the opposite inequality for λ > 0, take m such that m < ρ_𝒜(X). Then m + X ∉ 𝒜, which also implies that λm + λX ∉ 𝒜, since otherwise m + X = (1/λ)(λm + λX) would belong to the cone 𝒜. This gives λm ≤ ρ_𝒜(λX). Because this holds for every m < ρ_𝒜(X), we find λρ_𝒜(X) ≤ ρ_𝒜(λX). For λ = 0 the opposite inequality ρ_𝒜(0) ≥ 0 holds because a negative constant in the cone 𝒜 would contradict property 3 from theorem 1.1. Finally we can conclude that λρ_𝒜(X) = ρ_𝒜(λX).

4. First we prove the inclusion 𝒜 ⊆ 𝒜_{ρ_𝒜}. Take an X ∈ 𝒜; then it is clear that ρ_𝒜(X) = inf{m ∈ ℝ | m + X ∈ 𝒜} ≤ 0 and therefore X ∈ 𝒜_{ρ_𝒜}. Secondly, from part 2 of theorem 1.1 we know that if 𝒜 = 𝒜_{ρ_𝒜}, then 𝒜 is ‖·‖-closed in 𝒳. Finally assume that 𝒜 is ‖·‖-closed in 𝒳. We need to prove that 𝒜_{ρ_𝒜} ⊆ 𝒜, i.e. that X ∈ 𝒜_{ρ_𝒜} ⇒ X ∈ 𝒜. This is equivalent to X ∉ 𝒜 ⇒ X ∉ 𝒜_{ρ_𝒜}. Take an X ∉ 𝒜; it is sufficient to prove that ρ_𝒜(X) > 0. To prove this, take m > ‖X‖. Since 𝒜 is ‖·‖-closed in 𝒳, 𝒳 \ 𝒜 is ‖·‖-open in 𝒳. Because X ∈ 𝒳 \ 𝒜 we can find a λ ∈ (0, 1) such that λm + (1 − λ)X ∉ 𝒜. Therefore we have

$$0 \le \rho_{\mathcal{A}}(\lambda m + (1-\lambda)X) = \rho_{\mathcal{A}}((1-\lambda)X) - \lambda m.$$

Because ρ_𝒜 is a monetary risk measure we can apply lemma 1.1. We find that

$$|\rho_{\mathcal{A}}((1-\lambda)X) - \rho_{\mathcal{A}}(X)| \le \|X - \lambda X - X\| = \lambda\|X\|.$$

Using the two inequalities which we have obtained above, we can conclude that

$$\rho_{\mathcal{A}}(X) \ge \rho_{\mathcal{A}}((1-\lambda)X) - \lambda\|X\| \ge \lambda(m - \|X\|) > 0.$$

This is precisely what we needed to prove.
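The closedness condition in part 4 of theorem 1.2 can be made concrete with a small example. The sketch below takes the set of strictly positive positions on a two-point scenario set, which is upward closed but not closed in the supremum norm. The set, the closed form ρ(X) = −min X for its induced risk measure and the function names are assumptions chosen for illustration.

```python
def accept_open(x):
    """A = {X : X(omega) > 0 in every scenario}; upward closed but not closed."""
    return min(x) > 0

def rho_open(x):
    """Risk measure induced by this A: inf{m : m + min X > 0} = -min X."""
    return -min(x)

X = [0.0, 1.0]

in_A = accept_open(X)            # X touches zero, so it is not in A
in_A_rho = rho_open(X) <= 0      # but the induced risk is 0, so X is in the induced acceptance set
print(in_A, in_A_rho)
```

The position X therefore lies in the acceptance set of the induced risk measure without lying in the original set, so the inclusion in part 4 is strict here.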

We have connected the concepts of monetary risk measures, convex risk measures and coherent risk measures to the concept of the acceptance set. The acceptance set contains all acceptable financial positions. But what is an acceptable position? This is subjective and can depend on the risk aversion of the portfolio holder, or on the regulations of a supervisory agency.

1.3 The penalty function

In 1921 the economist Frank Knight formulated a distinction between risk and uncertainty. Risk only applies to situations where, although we do not know the outcome of an event, we can accurately assign a probability measure to the different outcomes. This situation might occur when tossing a fair coin. Although you do not know if the coin will land heads up or not, you know (with certainty) that this will happen with probability 1/2.

Uncertainty in Knight's work is different. It applies to situations in which we do not have all the information to accurately assign a probability measure to the different outcomes. This type of uncertainty, named after Knight, is called Knightian uncertainty. Knightian uncertainty is very common in real world situations. Consider for example the future return of a stock. The return of the stock is uncertain and we cannot accurately assign a probability measure to the different returns. From historical returns of the stock we could estimate such a probability measure. But would this be the correct probability measure? In general it would not, since it is only an estimate.

In this section we consider the case of Knightian uncertainty where we have a measurable space (Ω, ℱ) but without a fixed probability measure assigned to this space. Let 𝒳 be the space of all bounded measurable functions on (Ω, ℱ) endowed with the supremum norm ‖·‖. It is straightforward to show that 𝒳 is a Banach space. Let ℳ_1 := ℳ_1(Ω, ℱ) be the set of all probability measures on (Ω, ℱ) and denote by ℳ_{1,f} the set of all functions Q : ℱ → [0, 1] which are normalized, i.e. Q(Ω) = 1, and which are finitely additive. It is clear that ℳ_1 ⊂ ℳ_{1,f} and that the elements of ℳ_{1,f} are not necessarily probability measures, since it is not guaranteed that they satisfy σ-additivity. In what follows we use the notation E_Q[X] with Q ∈ ℳ_{1,f} for ∫ X dQ, where the integral is understood to be a Lebesgue integral.

Definition 1.6. A penalty function for ρ on ℳ_{1,f} is a functional α : ℳ_{1,f} → ℝ ∪ {+∞} such that

$$\inf_{Q \in \mathcal{M}_{1,f}} \alpha(Q) \in \mathbb{R}. \tag{1.10}$$

Penalty functions are strongly linked to convex risk measures. Each penalty function defines a convex risk measure and convex risk measures can be represented by using a penalty function. We will prove this in the next two theorems.

Theorem 1.3. The functional

$$\rho(X) := \sup_{Q \in \mathcal{M}_{1,f}} \big(E_Q[-X] - \alpha(Q)\big) \tag{1.11}$$

defines a convex risk measure on 𝒳, such that ρ(0) = −inf_{Q ∈ ℳ_{1,f}} α(Q).

Proof. For each Q ∈ ℳ_{1,f} we define for all X in 𝒳 the functional ρ_Q(X) := E_Q[−X] − α(Q). We will first show that ρ_Q satisfies monotonicity and translation invariance. Monotonicity follows from

$$\begin{aligned}
X \le Y &\Rightarrow -X \ge -Y \\
&\Rightarrow E_Q[-X] \ge E_Q[-Y] \\
&\Rightarrow E_Q[-X] - \alpha(Q) \ge E_Q[-Y] - \alpha(Q) \\
&\Rightarrow \rho_Q(X) \ge \rho_Q(Y).
\end{aligned}$$

To prove that ρ_Q satisfies translation invariance take X ∈ 𝒳 and m ∈ ℝ. We have that

$$\rho_Q(m + X) = E_Q[-(X + m)] - \alpha(Q) = E_Q[-X] - \alpha(Q) - m = \rho_Q(X) - m,$$

where we have used that Q is normalized. We also want to prove that the functional ρ_Q is convex. From lemma 1.2 we know that it is sufficient to prove that for all X, Y ∈ 𝒳 and all λ ∈ [0, 1] we have ρ_Q(λX + (1 − λ)Y) ≤ max(ρ_Q(X), ρ_Q(Y)). We can assume without loss of generality that E_Q[−X] ≤ E_Q[−Y]; then ρ_Q(X) ≤ ρ_Q(Y) and therefore max(ρ_Q(X), ρ_Q(Y)) = ρ_Q(Y). We also have that

$$\begin{aligned}
\rho_Q(\lambda X + (1-\lambda)Y) &= E_Q[-(\lambda X + (1-\lambda)Y)] - \alpha(Q) \\
&= \lambda E_Q[-X] + (1-\lambda)E_Q[-Y] - \alpha(Q) \\
&\le \lambda E_Q[-Y] + (1-\lambda)E_Q[-Y] - \alpha(Q) \\
&= E_Q[-Y] - \alpha(Q) \\
&= \rho_Q(Y) = \max(\rho_Q(X), \rho_Q(Y)).
\end{aligned}$$

The properties monotonicity, translation invariance and convexity are satisfied for all Q ∈ ℳ_{1,f}. Hence the functional defined by 1.11 also satisfies these properties, since they are preserved when taking the supremum over all Q ∈ ℳ_{1,f}. Because of the definition of a penalty function and the fact that X ∈ 𝒳 is bounded, ρ(X) only takes finite values. We can conclude that ρ is a convex risk measure. The fact that ρ(0) = −inf_{Q ∈ ℳ_{1,f}} α(Q) follows immediately from the properties of supremum and infimum.
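The construction in theorem 1.3 can be tried out on a finite scenario set. In the sketch below only three measures carry a finite penalty, which corresponds to setting α(Q) = +∞ for every other Q, so that the supremum in 1.11 reduces to a maximum over three terms. The measures, the penalties and the random test positions are made-up values used purely for illustration.

```python
import random

# Three hypothetical scenario measures Q on a 3-point Omega, each with a finite
# penalty alpha(Q). All numbers are made up for illustration.
MEASURES = [
    ([1 / 3, 1 / 3, 1 / 3], 0.1),
    ([0.6, 0.3, 0.1], 0.2),
    ([0.1, 0.2, 0.7], 0.5),
]

def rho_penalty(x):
    """rho(X) = max over the listed Q of (E_Q[-X] - alpha(Q))."""
    return max(sum(-qi * xi for qi, xi in zip(q, x)) - alpha for q, alpha in MEASURES)

rho_zero = rho_penalty([0.0, 0.0, 0.0])            # should equal -min alpha(Q)
min_alpha = min(alpha for _, alpha in MEASURES)

random.seed(3)
monotone = translation_invariant = convex = True
for _ in range(1000):
    X = [random.uniform(-2, 2) for _ in range(3)]
    Y = [a + random.uniform(0, 1) for a in X]       # Y >= X scenario by scenario
    Z = [random.uniform(-2, 2) for _ in range(3)]
    m, lam = random.uniform(-1, 1), random.random()
    monotone &= rho_penalty(X) >= rho_penalty(Y) - 1e-12
    translation_invariant &= abs(rho_penalty([a + m for a in X]) - (rho_penalty(X) - m)) < 1e-9
    mix = [lam * a + (1 - lam) * c for a, c in zip(X, Z)]
    convex &= rho_penalty(mix) <= lam * rho_penalty(X) + (1 - lam) * rho_penalty(Z) + 1e-9
print(rho_zero, monotone, translation_invariant, convex)
```

The tolerances only absorb floating point rounding; the three properties hold exactly for the underlying functional.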

The next theorem will prove that we can represent each convex risk measure using a penalty function. The proof of this theorem is not easy and uses results from functional analysis. For the convenience of the reader we state these results without proof.

Theorem 1.4. (Separating hyperplane theorem) In a topological vector space E, any two disjoint convex sets ℬ and 𝒞, one of which has an interior point, can be separated by a non-zero continuous linear functional l on E, i.e.

$$l(x) \le l(y) \quad \forall x \in \mathcal{C}, \ \forall y \in \mathcal{B}. \tag{1.12}$$

Proof. Without proof, see [11, p. 508].

Theorem 1.5. (Riesz representation theorem) There is a one-to-one correspondence between the set of functions Q ∈ ℳ_{1,f} and the continuous linear functionals l on 𝒳 such that l(1) = 1 and l(X) ≥ 0 for all X ∈ 𝒳 with X ≥ 0. The correspondence is defined by l(X) = E_Q[X] = ∫ X dQ for all X ∈ 𝒳.

Proof. Without proof, see [11, p. 506].

Theorem 1.6. Any convex risk measure ρ on 𝒳 is of the form

$$\rho(X) = \max_{Q \in \mathcal{M}_{1,f}} \big(E_Q[-X] - \alpha^{\min}(Q)\big), \quad X \in \mathcal{X}, \tag{1.13}$$

where the penalty function α^min is given by

$$\alpha^{\min}(Q) := \sup_{X \in \mathcal{A}_\rho} E_Q[-X] \quad \text{for } Q \in \mathcal{M}_{1,f}. \tag{1.14}$$

Moreover, α^min is the minimal penalty function which represents ρ, i.e. any penalty function α for which 1.11 holds satisfies α(Q) ≥ α^min(Q) for all Q ∈ ℳ_{1,f}.

Proof. Step 1: We will prove that

$$\rho(X) \ge \sup_{Q \in \mathcal{M}_{1,f}} \big(E_Q[-X] - \alpha^{\min}(Q)\big), \quad \forall X \in \mathcal{X}. \tag{1.15}$$

Let X′ := ρ(X) + X. Because of the translation invariance property we have that ρ(X′) = ρ(ρ(X) + X) = ρ(X) − ρ(X) = 0, hence X′ ∈ 𝒜_ρ. Because of the definition of α^min(Q) and X′ ∈ 𝒜_ρ, we have for all Q ∈ ℳ_{1,f} that α^min(Q) ≥ E_Q[−X′] = E_Q[−X] − ρ(X). This leads us to conclude that ρ(X) ≥ sup_{Q ∈ ℳ_{1,f}} (E_Q[−X] − α^min(Q)), which is what we wanted to prove.

Step 2: For a given X we will construct a Q_X ∈ ℳ_{1,f} such that

$$\rho(X) \le E_{Q_X}[-X] - \alpha^{\min}(Q_X). \tag{1.16}$$

In combination with 1.15 from the first step, this will prove 1.13. It is sufficient to prove this for X ∈ 𝒳 with ρ(X) = 0. Indeed, if ρ(X) = m, then ρ(X + m) = 0 and we have that ρ(X) − m = ρ(X + m) ≤ E_{Q_X}[−(X + m)] − α^min(Q_X) = E_{Q_X}[−X] − α^min(Q_X) − m. We can also assume without loss of generality that ρ(0) = 0.

Consider the set

$$\mathcal{B} := \{Y \in \mathcal{X} \mid \rho(Y) < 0\}. \tag{1.17}$$

It is clear that ℬ is non-empty. We will prove that ℬ is open in 𝒳. To prove this it is sufficient to prove that 𝒳 \ ℬ is closed in 𝒳. Take a sequence Y_n ∈ 𝒳 \ ℬ, i.e. ρ(Y_n) ≥ 0 for all n, such that Y_n → Y. Because of lemma 1.1, ρ is Lipschitz continuous with respect to the supremum norm and ρ(Y_n) → ρ(Y). We find that ρ(Y) ≥ 0, i.e. Y ∈ 𝒳 \ ℬ. The set ℬ is also convex: if we take Y_1, Y_2 ∈ ℬ and λ ∈ [0, 1], then because of the convexity of ρ we have ρ(λY_1 + (1 − λ)Y_2) ≤ λρ(Y_1) + (1 − λ)ρ(Y_2) < 0, hence λY_1 + (1 − λ)Y_2 ∈ ℬ. Since ρ(X) = 0 we have X ∉ ℬ, and since a singleton is a convex set and the open set ℬ has an interior point, we can apply theorem 1.4 to find a non-zero continuous linear functional l on 𝒳 such that

$$l(X) \le \inf_{Y \in \mathcal{B}} l(Y) =: b. \tag{1.18}$$

To construct QX we’ll use 1.5. For this we will first need to prove that Y ≥ 0 ⇒l(Y ) ≥ 0. Take Y ≥ 0 for all λ > 0 we have λY ≥ 0, because of monotonic-ity we have ρ(λY ) ≤ 0. Furthermore because of translation invariance we findρ(1 + λY ) = ρ(λY ) − 1 < 0. We find that ∀λ > 0 (1 + λY ) ∈ B. Because of thelinearity of l we have that l(X) ≤ l(1)+λl(Y ) for all λ > 0. If l(Y ) < 0 you could,by choosing λ large enough, make sure that l(1) + λl(Y ) < l(X), a contradiction.

We conclude that l(Y ) ≥ 0 if Y ≥ 0.

Now we will prove that $l(1) > 0$. Since $l$ is a non-zero continuous linear functional, there exists a $Y \in \mathcal{X}$ such that $l(Y) \neq 0$, and then also $l(-Y) = -l(Y) \neq 0$, so we may assume $l(Y) > 0$. Writing $Y = Y^+ - Y^-$ with $Y^+ \ge 0$ and $Y^- \ge 0$, we have $0 < l(Y) = l(Y^+) - l(Y^-)$. This representation of $Y$ is not unique, and because of the linearity of $l$ we can pick $Y$ with $l(Y) > 0$ and a representation of this $Y$ such that $\|Y^+\| < 1$. Because $1 - Y^+ \ge 0$, $Y^- \ge 0$ and $l$ is positive, we have $l(1 - Y^+) \ge 0$ and $l(Y^+) \ge l(Y) > 0$. Using linearity we find that $l(1) = l(Y^+) + l(1 - Y^+) > 0$.

Now we can use 1.5 to find a QX in M1,f such that

$$E_{Q_X}[Y] = \frac{l(Y)}{l(1)} \quad \forall Y \in \mathcal{X}. \tag{1.19}$$

It is clear from the definitions of B and Aρ that B ⊂ Aρ. Therefore we have

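$$\alpha_{\min}(Q_X) = \sup_{Y \in \mathcal{A}_\rho} E_{Q_X}[-Y] \ge \sup_{Y \in \mathcal{B}} E_{Q_X}[-Y] = \sup_{Y \in \mathcal{B}} \frac{-l(Y)}{l(1)} = -\inf_{Y \in \mathcal{B}} \frac{l(Y)}{l(1)} = \frac{-b}{l(1)}. \tag{1.20}$$

For any $Y \in \mathcal{A}_\rho$ and any $\varepsilon > 0$ we have $Y + \varepsilon \in \mathcal{B}$. Using the epsilon characterisation of the supremum, we can therefore conclude that the above inequality is an equality, hence $\alpha_{\min}(Q_X) = \frac{-b}{l(1)}$. Using the assumption that $\rho(X) = 0$ and the fact that $l(X) \le b$, we can conclude that

$$E_{Q_X}[-X] - \alpha_{\min}(Q_X) = \frac{1}{l(1)} \left( b - l(X) \right) \ge 0 = \rho(X). \tag{1.21}$$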

This is what we needed to prove.

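The only part which is left to prove is that $\alpha_{\min}$ is the minimal penalty function which represents $\rho$. Let $\alpha$ be an arbitrary penalty function which represents $\rho$. We need to prove that $\alpha(Q) \ge \alpha_{\min}(Q)$ for all $Q \in \mathcal{M}_{1,f}$. For all $X \in \mathcal{X}$ and $Q \in \mathcal{M}_{1,f}$ we have $\rho(X) \ge E_Q[-X] - \alpha(Q)$. Therefore

$$\alpha(Q) \ge \sup_{X \in \mathcal{X}} \left( E_Q[-X] - \rho(X) \right) \ge \sup_{X \in \mathcal{A}_\rho} \left( E_Q[-X] - \rho(X) \right) \ge \alpha_{\min}(Q),$$

where the last inequality holds because $\rho(X) \le 0$ for all $X \in \mathcal{A}_\rho$.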

This concludes the proof.

We have learned that each convex risk measure can be represented using a penalty function. Because coherent risk measures are by definition convex, this is also true for coherent risk measures. We now show that the penalty function of a coherent risk measure has some interesting properties and that the representation 1.13 can be further specified.

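Theorem 1.7. The minimal penalty function $\alpha_{\min}$ of a coherent risk measure $\rho$ takes only the values $0$ and $+\infty$. In particular, a coherent risk measure can be represented by

$$\rho(X) = \max_{Q \in \mathcal{Q}_{\max}} E_Q[-X], \tag{1.22}$$

where $\mathcal{Q}_{\max}$ is defined as

$$\mathcal{Q}_{\max} := \{ Q \in \mathcal{M}_{1,f} \mid \alpha_{\min}(Q) = 0 \}. \tag{1.23}$$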

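Proof. We know from theorem 1.1 that $\mathcal{A}_\rho$ is a convex cone. Hence for all $X \in \mathcal{A}_\rho$ and all $\lambda > 0$ we have $\lambda X \in \mathcal{A}_\rho$. Using theorem 1.6 we know that

$$\alpha_{\min}(Q) = \sup_{X \in \mathcal{A}_\rho} E_Q[-X] = \sup_{\lambda X \in \mathcal{A}_\rho} E_Q[-\lambda X] = \lambda \sup_{\lambda X \in \mathcal{A}_\rho} E_Q[-X] = \lambda \alpha_{\min}(Q).$$

Because this equation must hold for all $Q \in \mathcal{M}_{1,f}$ and for all $\lambda > 0$, we have that $\alpha_{\min}(Q) = 0$ or $\alpha_{\min}(Q) = +\infty$. It is now clear that 1.22 holds.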
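The representation 1.22 can be illustrated with a minimal numerical sketch. It assumes a finite sample space with three states and a small, hypothetical set of scenario measures standing in for the set of measures with zero penalty; the function names and all numbers are illustrative and not taken from the thesis. The sketch evaluates the worst-case expected loss and computes the quantities needed to check positive homogeneity, translation invariance and subadditivity on one example.

```python
def expected_loss(q, x):
    """E_Q[-X] for a probability vector q and a payoff vector x (one entry per state)."""
    return sum(qi * (-xi) for qi, xi in zip(q, x))


def worst_case_risk(scenarios, x):
    """rho(X) = max over Q in the scenario set of E_Q[-X], in the spirit of (1.22)."""
    return max(expected_loss(q, x) for q in scenarios)


# Hypothetical scenario set (illustrative numbers only).
Q_MAX = [
    (0.5, 0.3, 0.2),
    (0.2, 0.3, 0.5),
    (1 / 3, 1 / 3, 1 / 3),
]

X = (10.0, 0.0, -5.0)
Y = (-2.0, 4.0, 1.0)

rho_X = worst_case_risk(Q_MAX, X)
rho_Y = worst_case_risk(Q_MAX, Y)
rho_sum = worst_case_risk(Q_MAX, [a + b for a, b in zip(X, Y)])   # subadditivity check
rho_2X = worst_case_risk(Q_MAX, [2 * a for a in X])               # positive homogeneity check
rho_shift = worst_case_risk(Q_MAX, [a + 3.0 for a in X])          # translation invariance check
```

Because the maximum is taken over finitely many linear functionals, the coherence properties hold for every choice of payoffs, not only for the two used here.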

We would like to remind the reader that in the representation 1.13 of a convex risk measure the Q is not necessarily a probability measure. In the next section we will impose some extra conditions with respect to the space X and the continuity of ρ to obtain an analogous representation in which Q is indeed a probability measure.

1.4 Robust representation of convex risk measures

In the previous section we considered the situation in which there was no probability measure fixed on the space $(\Omega, \mathcal{F})$. In this section we fix a probability measure $P$ on the space $(\Omega, \mathcal{F})$ and let $\mathcal{X} = L^\infty := L^\infty(\Omega, \mathcal{F}, P)$. Theorem 1.6 gave us a representation for any convex risk measure. In this section we will only consider risk measures $\rho$ such that

$$\text{if } X = Y \ P\text{-almost surely, then } \rho(X) = \rho(Y). \tag{1.24}$$

We introduce the notion of absolute continuity.

Definition 1.7. $Q \in \mathcal{M}_{1,f}$ is absolutely continuous with respect to $P \in \mathcal{M}_{1,f}$ on the σ-algebra $\mathcal{F}$, and we write $Q \ll P$, if for all $A \in \mathcal{F}$

$$P(A) = 0 \Rightarrow Q(A) = 0. \tag{1.25}$$

Notice that if P and Q are probability measures then this definition reduces to the usual definition of absolute continuity of two probability measures.

Lemma 1.5. Let ρ be a convex risk measure that satisfies 1.24 and which is represented by a penalty function α as in 1.11. Then α(Q) = +∞ for any Q ∈ M1,f(Ω, F) which is not absolutely continuous with respect to P.

Proof. Take $Q \in \mathcal{M}_{1,f}(\Omega, \mathcal{F})$ such that $Q$ is not absolutely continuous with respect to $P$. Then, because $Q : \mathcal{F} \to [0, 1]$, there exists an $A \in \mathcal{F}$ such that

$$Q(A) > 0 \text{ and } P(A) = 0. \tag{1.26}$$

Take any $X \in \mathcal{A}_\rho$ and define $X_n := X - n I_A$, with

$$I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{if } \omega \notin A. \end{cases}$$

Because $P(A) = 0$, $A$ is a null-set of $P$. Since $X$ and $X_n$ only differ on a null-set of $P$ and we have assumed that $\rho$ satisfies 1.24, we have that $\rho(X_n) = \rho(X)$ for all $n$, so $X_n \in \mathcal{A}_\rho$. Using theorem 1.6 we have that

$$\alpha(Q) \ge \alpha_{\min}(Q) \ge E_Q[-X_n] = E_Q[-X + n I_A] = E_Q[-X] + n Q(A). \tag{1.27}$$

Because $Q(A) > 0$ we have that $E_Q[-X] + n Q(A) \to +\infty$ if $n \to +\infty$. We can conclude that $\alpha(Q) = +\infty$.

Now let M1 := M1(Ω, F, P) denote the set of all probability measures which are absolutely continuous with respect to P. From theorem 1.6 we know that each convex risk measure can be represented by a minimal penalty function αmin, but in the representation the supremum is taken over all Q ∈ M1,f. The following theorem characterizes a class of convex risk measures in which the Q is indeed a probability measure and αmin is a penalty function concentrated on M1(P). The proof of this theorem is very technical and is outside the scope of this thesis.

Theorem 1.8. Suppose $\rho : L^\infty \to \mathbb{R}$ is a convex risk measure. Then the following conditions are equivalent:

1. $\rho$ satisfies the following Fatou property: for any bounded sequence $(X_n)$ which converges $P$-a.s. to some $X$,

$$\rho(X) \le \liminf_{n \uparrow \infty} \rho(X_n).$$

2. $\rho$ can be represented by the restriction of the minimal penalty function $\alpha_{\min}$ to the set $\mathcal{M}_1(P)$:

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \alpha_{\min}(Q) \right), \quad X \in L^\infty. \tag{1.28}$$

Proof. Omitted; the proof can be found in [11].

Instead of proving this theorem we'll try to give an intuition behind the seemingly technical formula 1.28. Consider the situation where you have some subjective belief $P$. Consider also the set $\mathcal{M}_1(P)$ of all other probabilistic models which have the property that, if under your subjective belief it is impossible for an event $A$ to happen, then $A$ cannot happen under these other models either.

For a fixed probabilistic model $Q$ we can interpret $E_Q[X]$ as the expected value of the portfolio under this probabilistic model. Using the interpretation of a risk measure as a capital requirement, we can interpret $\rho(X) = E_Q[-X]$ as the risk-free capital you should hold so that your total expected wealth, which consists of the portfolio and the risk-free capital, equals zero. If your portfolio has a positive expected value under $Q$, then the position $X$ is acceptable. Hence $\rho(X) \le 0$ and $X \in \mathcal{A}_\rho$.

But which probability measure $Q$ should we pick in the definition of $\rho$? Instead of focusing on a specific probabilistic model, we could consider all plausible probabilistic models $\mathcal{M}_1(P)$. We could define $\rho$ as the capital requirement needed in the worst-case scenario of all these probabilistic models $\mathcal{M}_1(P)$ to make sure our total expected wealth is always at least zero, i.e.

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} E_Q[-X]. \tag{1.29}$$

But we did have some beliefs about the probabilistic model. So we would like to give more importance to probabilistic models which are "more similar" to P than to models which deviate a lot from P. This is where the penalty function comes in. If we let α(Q) be such that it assigns higher values to probabilistic models Q which deviate a lot from our model P, then they have a smaller influence on the supremum. Now the question is: how do we measure the similarity between two probability measures? One way of doing this is using the notion of relative entropy, or Kullback-Leibler divergence.

Definition 1.8. The Kullback-Leibler divergence or relative entropy of a probability measure $Q$ which is absolutely continuous with respect to a probability measure $P$ is defined as

$$KL(Q|P) = E_Q\left[ \ln\left( \frac{dQ}{dP} \right) \right] = \int \frac{dQ}{dP} \ln\left( \frac{dQ}{dP} \right) dP, \tag{1.30}$$

where $\frac{dQ}{dP}$ is the Radon-Nikodym derivative of $Q$ with respect to $P$.

If we want to use relative entropy as a penalty function, it is important to check that it takes a minimal value for Q = P. We want to penalize the probabilistic model P the least. This is what we prove in the following lemma.

Lemma 1.6. For all Q ∈ M1(P) we have KL(Q|P) ≥ 0. Furthermore we have KL(P|P) = 0.

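Proof. Let $f(x) = x \ln(x)$. Then $f$ is a convex function. Writing $E$ for the expectation under $P$, we have by definition that

$$KL(Q|P) = E\left[ \frac{dQ}{dP} \ln\left( \frac{dQ}{dP} \right) \right] = E\left[ f\left( \frac{dQ}{dP} \right) \right] \ge f\left( E\left[ \frac{dQ}{dP} \right] \right) = E\left[ \frac{dQ}{dP} \right] \ln\left( E\left[ \frac{dQ}{dP} \right] \right) = 1 \cdot \ln(1) = 0,$$

where the inequality follows from Jensen's inequality. It is clear that

$$KL(P|P) = E\left[ \frac{dP}{dP} \ln\left( \frac{dP}{dP} \right) \right] = E[1 \cdot \ln(1)] = 0.$$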

Using the Kullback-Leibler divergence as penalty function we get the following risk measure:

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - KL(Q|P) \right). \tag{1.31}$$

This risk measure is known as an entropic risk measure. We'll study this risk measure in more detail in chapters four and five.

We would like to point out that each risk measure that is defined as

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \alpha_{\min}(Q) \right), \quad X \in L^\infty, \tag{1.32}$$

is a convex risk measure. To see this it is sufficient to notice that the proof of theorem 1.11 still works if we take the supremum over all $Q \in \mathcal{M}_1(P)$ instead of over all $Q \in \mathcal{M}_{1,f}$. The representation of a convex risk measure in the form of 1.28 is often called the robust representation of a convex risk measure. This name refers to the fact that we don't pick a fixed probability measure $Q$ to calculate the risk, but consider all possible scenarios.

Sometimes the supremum in the representation 1.28 is actually a maximum.

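Theorem 1.9. Let $\rho$ be a convex risk measure on $\mathcal{X}$. If $\rho$ is continuous from below, which means that for all sequences $X_n$

$$X_n \nearrow X \text{ pointwise on } \Omega \ \Rightarrow \ \rho(X_n) \searrow \rho(X),$$

then the supremum in 1.28 is a maximum and we have that

$$\rho(X) = \max_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \alpha_{\min}(Q) \right), \quad X \in \mathcal{X}. \tag{1.33}$$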

Proof. Without proof, see [11, p192]
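To make the entropic risk measure 1.31 concrete, the following sketch evaluates the penalised expectation on a finite sample space. It relies on a standard variational fact that is not derived in this section: on a finite space the supremum is attained at the measure with weights proportional to $P(\omega)\exp(-X(\omega))$, and its value is $\ln E_P[\exp(-X)]$. The three-state model, the payoffs and all function names are assumptions made for illustration only.

```python
import math
import random


def kl_divergence(q, p):
    """KL(Q|P) = sum_i q_i ln(q_i / p_i), with the convention 0 ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def penalised_expectation(q, p, x):
    """E_Q[-X] - KL(Q|P), the expression inside the supremum in (1.31)."""
    return sum(qi * (-xi) for qi, xi in zip(q, x)) - kl_divergence(q, p)


def gibbs_measure(p, x):
    """Candidate maximiser: q_i proportional to p_i * exp(-x_i)."""
    weights = [pi * math.exp(-xi) for pi, xi in zip(p, x)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical reference model P and payoff X on three states.
P = [0.2, 0.5, 0.3]
X = [1.0, -0.5, 2.0]

closed_form = math.log(sum(pi * math.exp(-xi) for pi, xi in zip(P, X)))
at_gibbs = penalised_expectation(gibbs_measure(P, X), P, X)

# Crude random search over other measures Q; none should beat the closed form.
rng = random.Random(0)
best_random = -math.inf
for _ in range(2000):
    raw = [rng.random() for _ in P]
    total = sum(raw)
    q = [r / total for r in raw]
    best_random = max(best_random, penalised_expectation(q, P, X))
```

In this example the supremum in 1.31 is a maximum, which is the situation described in theorem 1.9. The random search is only a sanity check and not a proof.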

2 An introduction to decision theory

2.1 The axioms of von Neumann-Morgenstern

When dealing with risk, individual preferences matter. In the first chapter we defined the acceptance set of a risk measure. The acceptance set contained all acceptable positions X ∈ X. But we never gave a clear explanation of what an acceptable position is. This is because whether or not you find a specific position acceptable depends on your individual preferences and therefore your risk aversion.

In this section we repeat some basic notions from expected utility theory. A central question in this discussion is "How does a rational investor choose between different portfolios?" This choice is risky, because the returns of the portfolios are uncertain and can be modelled using stochastic variables. The attitude of the investor towards risk can be studied using expected utility theory. Crucial to this theory is the concept of a preference order over a set of lotteries $\mathcal{L}$.

Definition 2.1. A lottery L is defined as a probability measure over a set of outcomes, called the outcome space.

In our case the outcome space will be the real axis. These are all the possible net payoffs of the portfolio, X. The different probability distributions of the net payoffs of the different portfolios are given by the lotteries.

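Definition 2.2. A preference order on the set of lotteries $\mathcal{L}$ is defined as a binary relation $\succeq$ satisfying the following two axioms:

Axiom 1. (Completeness) $\forall L_1, L_2 \in \mathcal{L}$:

$$L_1 \succeq L_2 \text{ or } L_2 \succeq L_1.$$

Axiom 2. (Transitivity) $\forall L_1, L_2, L_3 \in \mathcal{L}$:

$$L_1 \succeq L_2 \text{ and } L_2 \succeq L_3 \Rightarrow L_1 \succeq L_3.$$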

Sometimes a preference order has a numerical representation.

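Definition 2.3. A numerical representation of a preference order $\succeq$ is a function $U : \mathcal{L} \to \mathbb{R}$ such that

$$L_1 \succeq L_2 \Leftrightarrow U(L_1) \ge U(L_2). \tag{2.1}$$

A numerical representation is called affine if for all $L_1, L_2 \in \mathcal{L}$ and $\alpha \in [0, 1]$

$$U(\alpha L_1 + (1 - \alpha) L_2) = \alpha U(L_1) + (1 - \alpha) U(L_2). \tag{2.2}$$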

To be sure that there exists a numerical representation and that it is affine, we have to impose two extra axioms.

Axiom 3. (Independence) $\forall L_1, L_2, L_3 \in \mathcal{L}$ and $\alpha \in (0, 1]$, we have that

$$L_1 \succeq L_2 \Rightarrow \alpha L_1 + (1 - \alpha) L_3 \succeq \alpha L_2 + (1 - \alpha) L_3.$$

Using the concept of a compound lottery, we can give an interpretation of the independence axiom. The compound lottery is represented by the distribution αL1 + (1 − α)L3, and can be interpreted as a two-step procedure where first a choice is made between lottery L1 and lottery L3, with probabilities α and 1 − α respectively, and then the chosen lottery is played. The axiom of independence states that if we prefer lottery L1 to lottery L2, we must prefer the compound lottery αL1 + (1 − α)L3 to αL2 + (1 − α)L3.

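Axiom 4. (Continuity) $\forall L_1, L_2, L_3 \in \mathcal{L}$, the following sets are closed:

$$\{ \alpha \in [0, 1] \mid \alpha L_1 + (1 - \alpha) L_2 \succeq L_3 \} \subset [0, 1],$$

$$\{ \alpha \in [0, 1] \mid L_3 \succeq \alpha L_1 + (1 - \alpha) L_2 \} \subset [0, 1].$$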

Theorem 2.1. If $\succeq$ is a preference order that satisfies the axiom of independence and the axiom of continuity, then there exists an affine numerical representation $U$ of $\succeq$. This $U$ is not unique, but it is unique up to an affine transformation: any other affine numerical representation $\tilde{U}$ of $\succeq$ is such that $\tilde{U} = aU + b$, with $a > 0$ and $b \in \mathbb{R}$.

Proof. Without proof, see e.g. [11, p. 58].

Sometimes this numerical affine representation has a special form, called the von Neumann-Morgenstern representation.

Definition 2.4. A numerical representation of a preference order $\succeq$ is a von Neumann-Morgenstern representation if it is of the form

$$U(L) = \int u(x) L(dx) \quad \forall L \in \mathcal{L}, \tag{2.3}$$

where $u$ is a real function of the outcomes. We will call this function $u$ the utility function.

In the case that the outcome space is not finite it is generally not guaranteed that the numerical representation will be of the von Neumann-Morgenstern form. But if there is a von Neumann-Morgenstern representation, then both $U$ and $u$ are only unique up to an affine transformation.

For an interpretation of the von Neumann-Morgenstern representation and to understand why it is useful, consider a fixed preference relation $\succeq$ which has a von Neumann-Morgenstern representation $U(L) = \int u(x) L(dx)$. In our context the lottery $L$ can be interpreted as the probability distribution that characterizes the returns of our investment, modelled by a stochastic variable $X$. We will assume that we can make a loss or a profit, such that the outcome space is the whole real line. Taking the integral over our outcome space $\mathbb{R}$ gives us that $U(L) = E[u(X)]$. In the expected utility framework a rational investor with utility function $u$ will rank different portfolios based on their expected utility:

$$X_1 \succeq X_2 \Leftrightarrow E[u(X_1)] \ge E[u(X_2)]. \tag{2.4}$$

2.2 Risk and utility

In this section we will only consider investors whose preference order admits a von Neumann-Morgenstern utility representation. The utility function u of such an investor reveals his attitude towards risk.

Definition 2.5. We will call a preference order $\succeq$ (strictly) risk averse if and only if $u$ is (strictly) concave.

This definition should not come as a surprise. From Jensen's inequality we know that if $u$ is a concave function we have that

u (E [X]) ≥ E [u (X)] . (2.5)

If $u$ is strictly concave and $X$ is not almost surely constant, the inequality is strict. In the expected utility context, Jensen's inequality states that when a rational risk averse investor has to choose between taking a gamble $X$ or getting a certain amount equal to the expected payoff of the gamble $E[X]$, he will prefer the certain amount. This is because his utility for taking the certain amount, $u(E[X])$, is at least as high as the expected utility he receives when he takes the gamble, i.e. $E[u(X)]$.

Similarly, if the utility function of an investor is convex this means the investor is risk loving. And if the utility function of an investor is a linear function it means that the investor is risk neutral.

This gives us another way to look at utility functions in the context of financial mathematics. Concave utility functions can be viewed as a way to make risky investments less valuable. Conceptually it is comparable to the discount factor used to make future payoffs less valuable.

If the utility function $u$ is twice differentiable, we could analyse the concavity of a utility function using the second derivative of this function. However, two remarks must be made about this approach. Firstly, the second derivative is a local measure and will reflect the local risk aversion. Secondly, it is impossible to compare the risk aversion of two utility functions using only the second derivative. This second problem is a direct consequence of the fact that the von Neumann-Morgenstern utility representation is only unique up to an affine transformation. The utility functions $u_1(x) = -x^2$ and $u_2(x) = -2x^2$ have as second derivatives respectively $-2$ and $-4$. But both utility functions represent the same preference order. One way to deal with this problem is to use the Arrow-Pratt coefficient of absolute risk aversion.

Definition 2.6. The Arrow-Pratt coefficient of absolute risk aversion of a twice differentiable utility function $u$ is defined as

$$r_A(x) = -\frac{u''(x)}{u'(x)}. \tag{2.6}$$

We will use the notation $r_A^u(x)$ if we want to specify the utility function $u$ used.

By dividing $u''(x)$ by $u'(x)$ we have made sure that all affine transformations of a utility function $u$ have the same coefficient of absolute risk aversion. The minus sign makes sure that positive values of the coefficient of absolute risk aversion reflect a risk averse investor.

For an interpretation of the numerical value of this coefficient consider

$$r_A(x) = -\frac{u''(x)}{u'(x)} = -\frac{du'/dx}{u'} = -\frac{du'/u'}{dx}, \tag{2.7}$$

where $u'(x)$ is called the marginal utility. The marginal utility measures the increase of utility per unit of increase in payoff of the portfolio. It would be rational to assume non-saturation, that is $u'(x) \ge 0$, which means that the utility is non-decreasing when the payoff of the portfolio increases. The Arrow-Pratt coefficient of absolute risk aversion can be interpreted as the percentage decrease in marginal utility per unit of increase in net payoffs of the portfolio. E.g. if $r_A = 0.01$ this means that in the neighbourhood of $x$ the investor's marginal utility is decreasing at the rate of 1% per unit of increase in the net payoff. As a little remark we would like to point out that if the net payoff of the portfolio is expressed in euro, then the unit of $r_A(x)$ is $\frac{1}{\text{euro}}$. However, generally the units of the Arrow-Pratt measure of absolute risk aversion are omitted, and we'll do the same.

Given the functional form of the Arrow-Pratt measure of risk aversion, you could find a utility function satisfying it by solving the following second order linear differential equation:

$$u''(x) + r_A(x) u'(x) = 0. \tag{2.8}$$

Solving this equation is fairly straightforward and we’ll do this in the next theorem.

Theorem 2.2. The solutions to equation 2.8 are given by

$$u(x) = \int_1^x C_1 \exp\left( \int_1^\eta -r_A(\zeta) \, d\zeta \right) d\eta + C_2, \tag{2.9}$$

where $C_1$ and $C_2$ are two constants and $r_A(x)$ is the Arrow-Pratt coefficient of absolute risk aversion.

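Proof. Using the substitution $v(x) = u'(x)$ in equation 2.8, we find

$$v'(x) = -r_A(x) v(x).$$

Rewriting this we find

$$\frac{v'(x)}{v(x)} = -r_A(x) \Leftrightarrow \left( \ln(v(x)) \right)' = -r_A(x).$$

Integrating both sides we get

$$\ln(v(\eta)) = \int_1^\eta -r_A(\zeta) \, d\zeta + C.$$

This gives us, with $C_1 = \exp(C)$,

$$v(\eta) = C_1 \exp\left( \int_1^\eta -r_A(\zeta) \, d\zeta \right).$$

Since $v(x) = u'(x)$ we find that

$$u(x) = \int_1^x C_1 \exp\left( \int_1^\eta -r_A(\zeta) \, d\zeta \right) d\eta + C_2.$$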
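Formula 2.9 can be checked numerically. The sketch below evaluates the nested integral with a simple midpoint rule for a user-supplied risk aversion function and compares the result with two closed forms: a constant $r_A(x) = a$, which gives an exponential utility, and $r_A(x) = 1/x$ on $x > 0$, which gives $u(x) = \ln(x)$ for $C_1 = 1$, $C_2 = 0$. The function name, the step count and the test points are assumptions chosen for illustration.

```python
import math


def utility_from_risk_aversion(r_a, x, c1=1.0, c2=0.0, steps=4000):
    """Approximate u(x) from (2.9) with the midpoint rule on [1, x]."""
    h = (x - 1.0) / steps      # signed step, so x < 1 works as well
    inner = 0.0                # inner integral of -r_A from 1 up to the left node
    outer = 0.0
    for k in range(steps):
        left = 1.0 + k * h
        mid = left + 0.5 * h
        # Inner integral extended from the left node to the midpoint of the cell.
        inner_mid = inner - r_a(left + 0.25 * h) * 0.5 * h
        outer += c1 * math.exp(inner_mid) * h
        # Advance the inner integral to the right node of the cell.
        inner -= r_a(mid) * h
    return outer + c2


A = 0.5  # assumed constant coefficient of absolute risk aversion


def exponential_closed_form(x):
    """Closed form of (2.9) for r_A = A, C1 = 1, C2 = 0."""
    return -(1.0 / A) * (math.exp(-A * (x - 1.0)) - 1.0)


u_const_right = utility_from_risk_aversion(lambda z: A, 3.0)
u_const_left = utility_from_risk_aversion(lambda z: A, -1.0)
u_log = utility_from_risk_aversion(lambda z: 1.0 / z, 2.5)
```

The midpoint rule is only second order accurate, so the comparison with the closed forms should be made with a tolerance rather than exact equality.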

2.3 Certainty equivalents

Certainty equivalents will play a crucial role in this thesis, since they are strongly linked to risk measures. In this section we will take a look at three different certainty equivalents. We will consider the ordinary certainty equivalent, the optimised certainty equivalent and the certainty equivalent resulting from the zero-utility principle.

2.3.1 The ordinary certainty equivalent

Definition 2.7. The (ordinary) certainty equivalent $CE_u(X)$ of a stochastic variable $X$ with distribution $L$ is the amount of money for which an individual is indifferent between $X$ and the certain amount $CE_u(X)$. This means

$$u(CE_u(X)) = \int u(x) L(dx) = E[u(X)]. \tag{2.10}$$

Sometimes we will use the notation $CE_u^L(X)$ if we explicitly want to specify the distribution $L$ of $X$.

If an investor is risk averse we have that $u(E[X]) \ge E[u(X)]$. For a strictly increasing utility function this means that

$$CE_u(X) = u^{-1}(E[u(X)]) \le E[X]. \tag{2.11}$$

This coincides with our intuition about risk aversion. When faced with the choice between a gamble $X$ and a risk-free amount $CE_u(X)$, it is possible that a risk averse investor will choose the certain amount even if it is less than the expected payoff of the gamble. So far we have seen different ways to assess the attitude of an investor towards risk. The following theorem links these different concepts.

Theorem 2.3. Given two investors with utility functions $u_1$ and $u_2$ respectively, the following statements are equivalent.

1. $r_A^{u_2}(x) \ge r_A^{u_1}(x)$ for every $x$.

2. There exists an increasing concave function $\varphi(\cdot)$ such that $u_2(x) = \varphi(u_1(x))$ at all $x$. This means that $u_2$ can be seen as a concave transformation of $u_1$.

3. $CE_{u_2}^L(X) \le CE_{u_1}^L(X)$ for all $L$.

4. Whenever the second investor with utility function $u_2$ finds a lottery $L$ at least as good as a risk-free outcome $x$, then the first investor with utility function $u_1$ also finds the lottery $L$ at least as good as $x$. So $\int u_2(y) L(dy) \ge u_2(x) \Rightarrow \int u_1(y) L(dy) \ge u_1(x)$ for all $L$ and $x$.

Proof. Without proof, see [20, p191].

All expressions in theorem 2.3 reflect the fact that the second investor is more risk averse than the first investor. Remember that in expected utility theory, a rational investor is able to rank different portfolios using their expected utility. It is easy to see that if the investor has an increasing utility function, this ranking can also be obtained when the investor uses his ordinary certainty equivalent, because:

$$X_1 \succeq X_2 \Leftrightarrow E[u(X_1)] \ge E[u(X_2)] \Leftrightarrow u^{-1}(E[u(X_1)]) \ge u^{-1}(E[u(X_2)]). \tag{2.12}$$
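As a small illustration of the ordinary certainty equivalent and of the ranking in 2.12, the sketch below computes the certainty equivalent of two discrete lotteries. The lotteries, the choice of logarithmic utility (which restricts the example to strictly positive payoffs) and the function names are assumptions made for the example only.

```python
import math


def expected_utility(u, outcomes, probs):
    """E[u(X)] for a discrete lottery."""
    return sum(p * u(x) for x, p in zip(outcomes, probs))


def certainty_equivalent(u, u_inv, outcomes, probs):
    """CE_u(X) = u^{-1}(E[u(X)]) for a strictly increasing utility u."""
    return u_inv(expected_utility(u, outcomes, probs))


# Two hypothetical lotteries with strictly positive payoffs.
LOTTERY_1 = ([50.0, 100.0, 200.0], [0.25, 0.5, 0.25])
LOTTERY_2 = ([90.0, 110.0], [0.5, 0.5])

mean_1 = sum(p * x for x, p in zip(*LOTTERY_1))

# Risk averse investor with logarithmic utility.
eu_1 = expected_utility(math.log, *LOTTERY_1)
eu_2 = expected_utility(math.log, *LOTTERY_2)
ce_1 = certainty_equivalent(math.log, math.exp, *LOTTERY_1)
ce_2 = certainty_equivalent(math.log, math.exp, *LOTTERY_2)

# Risk neutral investor: the certainty equivalent is the expected payoff.
ce_1_neutral = certainty_equivalent(lambda x: x, lambda y: y, *LOTTERY_1)
```

For the risk averse investor the certainty equivalent of the first lottery lies below its expected payoff, and the ranking of the two lotteries by expected utility agrees with the ranking by certainty equivalent.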

2.3.2 The optimised certainty equivalent

A certainty equivalent which will play an important role in the study of so-called divergence risk measures will be the optimised certainty equivalent.

Definition 2.8. The optimised certainty equivalent $OCE_u(X)$ of a stochastic variable $X$ of an investor with a concave utility function $u$ is defined as

$$OCE_u(X) = \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right). \tag{2.13}$$

Before we give the economic intuition behind this certainty equivalent, we will take a closer look at the definition in 2.13. When we do this, we immediately notice a problem. Consider $u_1(x)$ and $u_2(x) = u_1(x) + b$ with $b \neq 0$. From the von Neumann-Morgenstern utility theory we know that both utility functions represent the same preferences. However, because of the linearity of the expected value we have $OCE_{u_2}(X) = OCE_{u_1}(X) + b \neq OCE_{u_1}(X)$. This means that $OCE_u$ is not invariant under an affine transformation of the utility function, which is an undesirable property for a certainty equivalent and makes the interpretation difficult.

The authors of [5], who introduced this certainty equivalent, defined it only for a restricted class $U_0$ of "normalised" utility functions.

Definition 2.9. Let $u : \mathbb{R} \to [-\infty, +\infty)$ be a proper¹ closed concave and non-decreasing utility function with effective domain $\operatorname{dom} u = \{ t \in \mathbb{R} \mid u(t) > -\infty \} \neq \emptyset$. Then $u$ is contained in the class of normalised utility functions $U_0$ if $u$ satisfies $u(0) = 0$ and $1 \in \partial u(0)$, where $\partial u(\cdot)$ is the subdifferential² map of $u$.

If $u$ is differentiable at $0$ then the two normalisation properties of definition 2.9 yield $u(0) = 0$ and $u'(0) = 1$. Since the utility functions in $U_0$ are non-decreasing and $u(0) = 0$, we have for $u \in U_0$ that $u(x) \ge 0$ for all $x \ge 0$. We also have for $u \in U_0$ and all $x \in \mathbb{R}$ that $u(x) \le x$, because of the concavity of the utility function and $1 \in \partial u(0)$.

We will now try to give an intuition behind the definition of the optimised certainty equivalent. Let $X$ denote the net payoff of our portfolio. Then $E[u(X)]$ can be interpreted as the sure present value of the net payoff of our portfolio. Now consider an investor who can choose to receive a part $\eta$ of the future net payoff of the portfolio $X$, giving him a total present value of $\eta + E[u(X - \eta)]$. If the investor were to optimise this choice, he would receive $\max_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right)$. However, since it is not always guaranteed that a maximum exists, the optimised certainty equivalent is defined using a supremum. From [4] we have the following properties and proofs.

Theorem 2.4. For u ∈ U0 the optimised certainty equivalent has the following properties:

1. (Monotonicity) X ≤ Y ⇒ OCEu(X) ≤ OCEu(Y )

2. (Shift additive) For all c ∈ R we have OCEu(X + c) = OCEu(X) + c.

3. (Risk aversion) u(x) ≤ x for all x if and only if OCEu(X) ≤ E [X].

4. (Concavity) For all stochastic variables X1 and X2 and all λ ∈ [0, 1] we have

OCEu(λX1 + (1− λ)X2) ≥ λOCEu(X1) + (1− λ) OCEu(X2).

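Proof. The proofs of these properties follow from straightforward calculations, as demonstrated below.

1. (Monotonicity) Because $u$ is non-decreasing we have that

$$\begin{aligned} X \le Y &\Rightarrow X - \eta \le Y - \eta \\ &\Rightarrow E[u(X - \eta)] \le E[u(Y - \eta)] \\ &\Rightarrow \eta + E[u(X - \eta)] \le \eta + E[u(Y - \eta)] \\ &\Rightarrow \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \le \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(Y - \eta)] \right) \\ &\Rightarrow OCE_u(X) \le OCE_u(Y). \end{aligned}$$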

¹ This means that $u(\cdot)$ is such that $u(x) < +\infty$ for all $x$ and $u(x) > -\infty$ for at least one $x$.

² For a concave function the subdifferential at $x_0$ is defined as the set $\partial u(x_0) = \{ c \in \mathbb{R} \mid u(x) - u(x_0) \le c(x - x_0) \ \forall x \}$. In this case $1 \in \partial u(0) \Leftrightarrow u(x) \le x$.

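2. (Shift additive)

$$\begin{aligned} OCE_u(X + c) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X + c - \eta)] \right) \\ &= \sup_{\eta \in \mathbb{R}} \left( \eta - c + E[u(X - (\eta - c))] \right) + c \\ &= \sup_{(\eta - c) \in \mathbb{R}} \left( \eta - c + E[u(X - (\eta - c))] \right) + c \\ &= OCE_u(X) + c. \end{aligned}$$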

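3. (Risk aversion) First suppose $u(x) \le x$ for all $x$. Then

$$\begin{aligned} OCE_u(X) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \\ &\le \sup_{\eta \in \mathbb{R}} \left( \eta + E[X - \eta] \right) \\ &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[X] - \eta \right) \\ &= \sup_{\eta \in \mathbb{R}} E[X] \\ &= E[X]. \end{aligned}$$

Now suppose $OCE_u(X) \le E[X]$ for all $X$. Then

$$\begin{aligned} \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \le E[X] &\Rightarrow \eta + E[u(X - \eta)] \le E[X] \quad \forall \eta \in \mathbb{R} \\ &\Rightarrow E[u(X - \eta)] \le E[X - \eta] \quad \forall \eta \in \mathbb{R} \\ &\Rightarrow E[u(X)] \le E[X]. \end{aligned}$$

Since this is true for all $X$, it is in particular true for every constant $X = x$ with $x \in \mathbb{R}$. Hence we can conclude that $u(x) \le x$.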

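4. (Concavity) For all $\lambda \in [0, 1]$, let $X_\lambda = \lambda X_1 + (1 - \lambda) X_2$. Because of the concavity of $u$ we have for all $\eta_1, \eta_2 \in \mathbb{R}$ that

$$E[u(\lambda X_1 + (1 - \lambda) X_2 - \lambda \eta_1 - (1 - \lambda) \eta_2)] \ge \lambda E[u(X_1 - \eta_1)] + (1 - \lambda) E[u(X_2 - \eta_2)].$$

Notice that $\lambda \eta_1 + (1 - \lambda) \eta_2 \in \mathbb{R}$. Adding this to both sides we find that

$$\lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(\lambda X_1 + (1 - \lambda) X_2 - \lambda \eta_1 - (1 - \lambda) \eta_2)] \ge \lambda \eta_1 + \lambda E[u(X_1 - \eta_1)] + (1 - \lambda) \eta_2 + (1 - \lambda) E[u(X_2 - \eta_2)].$$

Taking the supremum of both sides we get

$$\sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{ \lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(X_\lambda - \lambda \eta_1 - (1 - \lambda) \eta_2)] \right\} \ge \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{ \lambda \left( \eta_1 + E[u(X_1 - \eta_1)] \right) + (1 - \lambda) \left( \eta_2 + E[u(X_2 - \eta_2)] \right) \right\}.$$

Because the mapping $(\eta_1, \eta_2) \mapsto \lambda \eta_1 + (1 - \lambda) \eta_2$ defines a surjection from $\mathbb{R}^2$ to $\mathbb{R}$, we have that

$$\begin{aligned} OCE_u(X_\lambda) &= \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{ \lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(X_\lambda - \lambda \eta_1 - (1 - \lambda) \eta_2)] \right\} \\ &\ge \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{ \lambda \left( \eta_1 + E[u(X_1 - \eta_1)] \right) + (1 - \lambda) \left( \eta_2 + E[u(X_2 - \eta_2)] \right) \right\} \\ &= \lambda \sup_{\eta_1 \in \mathbb{R}} \left( \eta_1 + E[u(X_1 - \eta_1)] \right) + (1 - \lambda) \sup_{\eta_2 \in \mathbb{R}} \left( \eta_2 + E[u(X_2 - \eta_2)] \right) \\ &= \lambda \, OCE_u(X_1) + (1 - \lambda) \, OCE_u(X_2). \end{aligned}$$

This proves the concavity property of the optimised certainty equivalent.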
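The four properties of theorem 2.4 can be checked numerically on a small example. The sketch below assumes a finite sample space with three states and uses a normalised quadratic-type utility, $u(t) = t - t^2/2$ for $t \le 1$ and $u(t) = 1/2$ for $t > 1$, which satisfies $u(0) = 0$ and $u'(0) = 1$. The supremum is located with a ternary search between the smallest and largest payoff, which is where the maximiser lies for this utility. All names and numbers are illustrative assumptions.

```python
def quadratic_utility(t):
    """Concave, non-decreasing utility with u(0) = 0 and u'(0) = 1."""
    return t - 0.5 * t * t if t <= 1.0 else 0.5


PROBS = [0.2, 0.5, 0.3]  # hypothetical probabilities of the three states


def expectation(values):
    """Expected value of a state-indexed payoff vector."""
    return sum(p * v for p, v in zip(PROBS, values))


def oce(u, payoffs, iters=200):
    """Optimised certainty equivalent (2.13) via ternary search on a concave objective."""
    def objective(eta):
        return eta + expectation([u(x - eta) for x in payoffs])

    # Assumption: the maximiser lies in [min payoff, max payoff].
    lo, hi = min(payoffs), max(payoffs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


X = [-1.0, 0.5, 2.0]
Y = [0.0, 0.5, 3.0]          # Y >= X in every state
LAM = 0.3
MIX = [LAM * x + (1.0 - LAM) * y for x, y in zip(X, Y)]

oce_X = oce(quadratic_utility, X)
oce_Y = oce(quadratic_utility, Y)
oce_shifted = oce(quadratic_utility, [x + 2.0 for x in X])
oce_mix = oce(quadratic_utility, MIX)
```

Since the optimisation is numerical, the comparisons between these values should allow for a small tolerance.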

It is now natural to ask if the optimised certainty equivalent provides the same ranking on the portfolios as the ordinary certainty equivalent, or the expected utility criterion. It is stated in [5] that this will not always be the case. Theorem 2.6 links the optimised certainty equivalent and the ordinary certainty equivalent. In this theorem we need to assume that the supremum in the definition of OCEu(X) is attained for an η ∈ R. This will be the case if the support of X is a closed bounded interval. From [5] we have the following theorems and proof.

Theorem 2.5. If $u \in U_0$ and if $X$ is a stochastic variable whose support is a closed bounded interval, then the supremum in the definition of the optimised certainty equivalent is attained, i.e.

$$\exists \eta \in \mathbb{R} : OCE_u(X) = \eta + E[u(X - \eta)].$$

Proof. Without proof, see [5].

Theorem 2.6. If X and Y are stochastic variables with a compact support, then

OCEu(X) ≥ OCEu(Y ) ∀u ∈ U0 ⇔ CEu(X) ≥ CEu(Y ) ∀u ∈ U0. (2.14)

Proof. First assume that $CE_u(X) \ge CE_u(Y)$ for all $u \in U_0$. Using that $u$ is nondecreasing we find that for all $u \in U_0$ and $\eta \in \mathbb{R}$:

$$\begin{aligned} CE_u(X) \ge CE_u(Y) &\Rightarrow E[u(X)] \ge E[u(Y)] \\ &\Rightarrow \eta + E[u(X - \eta)] \ge \eta + E[u(Y - \eta)] \\ &\Rightarrow \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \ge \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(Y - \eta)] \right) \\ &\Rightarrow OCE_u(X) \ge OCE_u(Y), \end{aligned}$$

where the first two implications follow from the fact that $u$ is nondecreasing. Now assume that $OCE_u(X) \ge OCE_u(Y)$ for all $u \in U_0$. Because $X$ and $Y$ have compact supports, the supremum in the optimised certainty equivalents is attained. Hence for every $u \in U_0$ there exist $\eta_X^u$ and $\eta_Y^u$ such that $OCE_u(X) = \eta_X^u + E[u(X - \eta_X^u)]$ and $OCE_u(Y) = \eta_Y^u + E[u(Y - \eta_Y^u)]$. We have for any $u \in U_0$ that

$$OCE_u(X) = \eta_X^u + E[u(X - \eta_X^u)] \ge \eta_Y^u + E[u(Y - \eta_Y^u)] \ge \eta_X^u + E[u(Y - \eta_X^u)].$$

We conclude that for any $u \in U_0$, $E[u(X - \eta_X^u)] \ge E[u(Y - \eta_X^u)]$, which implies that $E[u(X)] \ge E[u(Y)]$. Again using the fact that $u$ is nondecreasing, this implies that $CE_u(X) \ge CE_u(Y)$.

We would like to remark that we do not necessarily need the fact that both X and Y have compact support. We only need to be sure that the supremum is attained for all utility functions u.

2.3.3 The u-Mean certainty equivalent

Definition 2.10. The u-Mean certainty equivalent $M_u(X)$ of a stochastic variable $X$ is defined by the equation

$$E[u(X - M_u(X))] = 0, \tag{2.15}$$

where $u$ is a strictly increasing utility function. The equation 2.15 is known as the principle of zero utility.

Notice that the u-Mean certainty equivalent also has the problem that it is not invariant under an affine transformation of the utility function $u$. When $u$ is non-decreasing one can give a more general definition of the u-Mean certainty equivalent:

$$M_u(X) = \sup \{ m \in \mathbb{R} \mid E[u(X - m)] \ge 0 \}. \tag{2.16}$$

In the fourth chapter of this thesis we will derive a relation between the u-Mean certainty equivalent and the optimised certainty equivalent. We will also show that under reasonable assumptions, the zero-utility principle has a unique solution.
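The general definition 2.16 suggests a simple way to compute the u-Mean numerically: since $m \mapsto E[u(X - m)]$ is non-increasing for a non-decreasing $u$, the supremum can be located by bisection. The sketch below assumes a discrete payoff and a utility with $u(0) = 0$, so that the solution lies between the smallest and largest payoff. The utility $u(x) = 1 - \exp(-x)$, the payoff distribution and the function names are illustrative assumptions.

```python
import math


def shifted_expected_utility(u, outcomes, probs, m):
    """E[u(X - m)] for a discrete payoff X."""
    return sum(p * u(x - m) for x, p in zip(outcomes, probs))


def u_mean(u, outcomes, probs, tol=1e-12):
    """sup{m : E[u(X - m)] >= 0} by bisection, assuming u is non-decreasing with u(0) = 0."""
    lo, hi = min(outcomes), max(outcomes)   # E[u(X - lo)] >= 0 >= E[u(X - hi)]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shifted_expected_utility(u, outcomes, probs, mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return lo


def utility(x):
    """Assumed strictly increasing concave utility with u(0) = 0 and u'(0) = 1."""
    return 1.0 - math.exp(-x)


OUTCOMES = [-2.0, 0.0, 1.0, 4.0]
PROBS = [0.1, 0.3, 0.4, 0.2]

m_star = u_mean(utility, OUTCOMES, PROBS)
residual = shifted_expected_utility(utility, OUTCOMES, PROBS, m_star)
mean_payoff = sum(p * x for x, p in zip(OUTCOMES, PROBS))
```

For a strictly increasing and continuous utility the value found this way also solves the zero-utility equation 2.15, so the residual should be zero up to the bisection tolerance.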

2.4 The exponential utility function

The exponential utility function will be an important utility function in this thesis, since it is strongly connected with the concept of an entropic risk measure. In this section I will therefore apply the concepts described above to the exponential utility function.

The exponential utility function occurs when we model an investor with constant absolute risk aversion. Let $a$, with $a > 0$, be the coefficient of absolute risk aversion of a risk averse investor. Using theorem 2.2 we find that

$$u(x) = \int_1^x C_1 \exp\left( \int_1^\eta -a \, d\zeta \right) d\eta + C_2.$$

Calculating these integrals we have that

$$u(x) = \frac{-C_1}{a} \left( \exp(-a(x - 1)) - 1 \right) + C_2.$$

We will choose the constants $C_1$ and $C_2$ such that $u \in U_0$. The condition $u'(0) = 1$ gives us that $C_1 = \exp(-a)$, and the condition $u(0) = 0$ gives us that $C_2 = \frac{1}{a} \left( 1 - \exp(-a) \right)$. Using these constants the utility function becomes

$$u(x) = \frac{1 - \exp(-ax)}{a}. \tag{2.17}$$

We will take a look at the different kinds of certainty equivalents. It is stated in [5] that for the exponential utility function 2.17 the ordinary certainty equivalent, the optimised certainty equivalent and the u-Mean certainty equivalent coincide. We will provide a proof of this statement in this thesis.

Theorem 2.7. If $u(x) = \frac{1 - \exp(-ax)}{a}$ then

$$CE_u(X) = OCE_u(X) = M_u(X) = -\frac{1}{a} \ln\left( E[\exp(-aX)] \right). \tag{2.18}$$

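Proof. Using the definitions from section 2.3, we'll compute all the different certainty equivalents.

The ordinary certainty equivalent $CE_u(X)$

Notice that $u^{-1}(x) = \frac{-\ln(1 - ax)}{a}$. Hence

$$\begin{aligned} CE_u(X) &= u^{-1}\left( E[u(X)] \right) \\ &= \frac{-\ln\left( 1 - a E\left[ \frac{1 - \exp(-aX)}{a} \right] \right)}{a} \\ &= \frac{-1}{a} \ln\left( 1 - E[1 - \exp(-aX)] \right) \\ &= \frac{-1}{a} \ln\left( E[\exp(-aX)] \right). \end{aligned}$$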

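The optimised certainty equivalent $OCE_u(X)$

Consider the function $f(\eta) = \eta + E[u(X - \eta)]$. For the optimised certainty equivalent we're interested in the supremum of this function. If $f(\eta)$ has a maximum, this maximum will be equal to the supremum.

The first order condition gives

$$\begin{aligned} 0 &= \frac{d}{d\eta} \left( \eta + E[u(X - \eta)] \right) \\ &= 1 + \frac{d}{d\eta} \left( E\left[ \frac{1 - \exp(-a(X - \eta))}{a} \right] \right) \\ &= 1 + \frac{d}{d\eta} \left( \frac{1 - \exp(a\eta) E[\exp(-aX)]}{a} \right) \\ &= 1 - \exp(a\eta) E[\exp(-aX)]. \end{aligned}$$

Solving this last equation for $\eta$ we find

$$\eta^* = \frac{-\ln E[\exp(-aX)]}{a}.$$

To show that the function $f(\eta)$ attains a maximum at $\eta^*$ it is sufficient to notice that for $a > 0$ the second derivative $f''(\eta) = -a \exp(a\eta) E[\exp(-aX)]$ is negative.

We can conclude that

$$\begin{aligned} OCE_u(X) &= \eta^* + E\left[ \frac{1 - \exp(-a(X - \eta^*))}{a} \right] \\ &= \frac{-\ln E[\exp(-aX)]}{a} + E\left[ \frac{1 - \exp\left( -aX - a \frac{\ln E[\exp(-aX)]}{a} \right)}{a} \right] \\ &= \frac{-\ln E[\exp(-aX)]}{a} + \frac{1}{a} - \frac{1}{a} E\left[ \exp(-aX) \exp\left( -\ln E[\exp(-aX)] \right) \right] \\ &= \frac{-\ln E[\exp(-aX)]}{a} + \frac{1}{a} - \frac{1}{a} E\left[ \frac{\exp(-aX)}{E[\exp(-aX)]} \right] \\ &= \frac{-\ln E[\exp(-aX)]}{a}. \end{aligned}$$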

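The u-Mean $M_u(X)$

We will show that $M_u(X) = -\frac{\ln E[\exp(-aX)]}{a}$ satisfies the principle of zero utility.

$$\begin{aligned}
E[u(X - M_u(X))] &= \frac{1}{a} E\left[1 - \exp\left(-aX - \ln E[\exp(-aX)]\right)\right] \\
&= \frac{1}{a} E\left[1 - \exp(-aX) \exp\left(-\ln E[\exp(-aX)]\right)\right] \\
&= \frac{1}{a}\left(1 - \frac{E[\exp(-aX)]}{E[\exp(-aX)]}\right) \\
&= 0.
\end{aligned}$$

Because $u$ is a strictly increasing and continuous function, $M_u(X)$ is the unique solution of the zero-utility principle.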
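Theorem 2.7 can also be checked numerically. The following is a minimal sketch, assuming a hypothetical three-point payoff and the value a = 0.7 (neither is taken from the thesis). It evaluates the ordinary certainty equivalent through the inverse utility, the optimised certainty equivalent through a ternary search on the concave function f(η), and the u-Mean through bisection on the zero-utility equation, and compares all three with the closed form 2.18.

```python
import math

def u(x, a):
    """Exponential utility u(x) = (1 - exp(-a x)) / a from equation 2.17."""
    return (1.0 - math.exp(-a * x)) / a

def expect(g, values, probs):
    """Expectation of g(X) for a discrete random variable X."""
    return sum(p * g(v) for v, p in zip(values, probs))

def closed_form(values, probs, a):
    """Right-hand side of equation 2.18."""
    return -math.log(expect(lambda v: math.exp(-a * v), values, probs)) / a

def ordinary_ce(values, probs, a):
    """CE_u(X) = u^{-1}(E[u(X)]) with u^{-1}(y) = -ln(1 - a y) / a."""
    eu = expect(lambda v: u(v, a), values, probs)
    return -math.log(1.0 - a * eu) / a

def optimised_ce(values, probs, a, lo=-50.0, hi=50.0, iters=200):
    """sup over eta of eta + E[u(X - eta)], by ternary search (f is concave)."""
    f = lambda eta: eta + expect(lambda v: u(v - eta, a), values, probs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))

def u_mean(values, probs, a, lo=-50.0, hi=50.0, iters=200):
    """Root m of E[u(X - m)] = 0 by bisection (the left side decreases in m)."""
    g = lambda m: expect(lambda v: u(v - m, a), values, probs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example data, chosen only for illustration.
values, probs, a = [-1.0, 0.5, 2.0], [0.2, 0.5, 0.3], 0.7
target = closed_form(values, probs, a)
ce = ordinary_ce(values, probs, a)
oce = optimised_ce(values, probs, a)
mu = u_mean(values, probs, a)
```

The search bounds of ±50 are an assumption that comfortably brackets the solution for payoffs of this size; they would need to be widened for payoffs on a larger scale.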

2.5 Stochastic dominance

Expected utility theory states that an investor with a utility function $u$ would prefer $X_1$ to $X_2$ if and only if $E[u(X_1)] \ge E[u(X_2)]$. In this section we will take a look at a situation where $X_1$ is preferred to $X_2$, not only for one investor with a specific utility function $u$, but for a whole class of investors with different utility functions. This study can be done using the concept of stochastic dominance. The main goal of this section is to introduce the necessary concepts and definitions regarding stochastic dominance so that we can apply them later on in concrete situations. The definitions and theorems in this section are taken from [29].

As always the stochastic variable $X$ will be the net payoff of a portfolio, and can take negative as well as positive values. We will assume that the distribution of $X$ is given by $F(x)$ and that its derivative exists, so that the probability density of $X$ is well defined. We will further assume that the utility function $u$ is sufficiently differentiable.


An important concept in the study of stochastic dominance is the n-th order distribution of a stochastic variable $X$. The n-th order distribution can be defined inductively as follows.

Definition 2.11. The n-th order distribution function, $F^{(n)}(x)$, of a stochastic variable $X$ is inductively defined by

$$F^{(1)}(x) := F(x), \qquad F^{(n)}(x) := \int_{-\infty}^{x} F^{(n-1)}(u)\,du, \tag{2.19}$$

where $F(x)$ is the cumulative distribution of $X$.

Using the notion of n-th order distribution functions we can define the concept of n-th order stochastic dominance.

Definition 2.12. If $X_1$ and $X_2$ are random variables, then $X_1$ dominates $X_2$ in the sense of n-th order stochastic dominance, $X_1 \ge_{SD(n)} X_2$, if

$$F_1^{(n)}(x) \le F_2^{(n)}(x) \quad \forall x \in \mathbb{R}, \tag{2.20}$$

where $F_1^{(n)}(x)$ and $F_2^{(n)}(x)$ are the n-th order distributions of $X_1$ and $X_2$.

There is an important link between the concept of n-th order stochastic dominance and expected utility maximisation. This link is given by the following theorem.

Theorem 2.8.

$$X_1 \ge_{SD(n)} X_2 \Leftrightarrow E[u(X_1)] \ge E[u(X_2)] \tag{2.21}$$

for all utility functions $u(x)$ for which $(-1)^k u^{(k)}(x) \le 0$ for $k \in \{1, 2, 3, \ldots, n\}$ for all $x$ (with at least one utility function satisfying the inequality).

When we take a closer look at the special case of second order stochastic dominance, we can see that it has a clear and easy economic interpretation.

Theorem 2.9.

$$X_1 \ge_{SD(2)} X_2 \Leftrightarrow E[u(X_1)] \ge E[u(X_2)] \tag{2.22}$$

for all utility functions $u(x)$ with $u'(x) \ge 0$ and $u''(x) \le 0$ for all $x$, where there is at least one utility function $u(x)$ with the property that $u'(x) > 0$ and $u''(x) < 0$.

Second order dominance means that $X_1$ is ranked above $X_2$ if for all risk averse ($u''(x) \le 0$) and non-saturated ($u'(x) \ge 0$) investors the expected utility of $X_1$ is at least the expected utility of $X_2$.

Risk measures can induce an ordering on different portfolios, that is, $\rho(X) \ge \rho(Y) \Rightarrow X \le Y$. When all non-saturated risk averse investors obtain the same ordering using the von Neumann-Morgenstern criterion we will say that this risk measure is consistent with second order stochastic dominance. More generally we have the following definition.

Definition 2.13. A risk measure $\rho(X)$ is consistent with n-th order stochastic dominance if

$$X_1 \ge_{SD(n)} X_2 \Rightarrow \rho(X_1) \le \rho(X_2). \tag{2.23}$$


Using the definition of the n-th order distribution it is easy to see that one has the following inclusion.

Theorem 2.10. $X_1 \ge_{SD(n)} X_2 \Rightarrow X_1 \ge_{SD(n+1)} X_2$.

Proof. Without proof, see [29, Theorem 4].

Using definition 2.13 and theorem 2.10 we can conclude that the following theorem holds.

Theorem 2.11. A risk measure consistent with (n+1)-th order stochastic dominance is also consistent with n-th order stochastic dominance.

Proof. Without proof, see [29, Theorem 6].


3 Value at Risk and Expected shortfall

"Value at Risk is like an air bag that works well all the time except when you have an accident."

David Einhorn

In this chapter we will take a closer look at two commonly used risk measures, Value at Risk (VaR) and Expected shortfall (ES). We will use the concepts described in chapters one and two to check whether these risk measures have desirable mathematical properties as well as desirable decision theoretic properties.

3.1 Value at Risk

Value at Risk is perhaps the most famous risk measure in existence. Its definition is based on the quantiles of a stochastic variable.

Definition 3.1. The lower α quantile of a stochastic variable $X$ is defined by

$$x_{(\alpha)} := q_\alpha(X) := \inf\{x \in \mathbb{R} \mid P[X \le x] \ge \alpha\}. \tag{3.1}$$

Definition 3.2. The upper α quantile of a stochastic variable $X$ is defined by

$$x^{(\alpha)} := q^\alpha(X) := \inf\{x \in \mathbb{R} \mid P[X \le x] > \alpha\}. \tag{3.2}$$

Because $\{x \in \mathbb{R} \mid P[X \le x] > \alpha\} \subset \{x \in \mathbb{R} \mid P[X \le x] \ge \alpha\}$ we have that $q_\alpha(X) \le q^\alpha(X)$. The equality is in general not true. However in [1] it is stated that equality holds iff $P[X \le x] = \alpha$ for at most one $x$.

We follow [1], [11] and [29] in defining $\mathrm{VaR}_\alpha$ as the smallest value such that the probability of an absolute loss being at most this value is at least $1 - \alpha$. More formally we have the following definition.


Definition 3.3. Fix $\alpha \in (0, 1)$. Then the Value at Risk of a portfolio whose net payoff is modelled by $X$, at a level $\alpha$ (see footnote 1), is defined as

$$\mathrm{VaR}_\alpha(X) = -\inf\{x \in \mathbb{R} \mid P[X \le x] > \alpha\}. \tag{3.3}$$

We have illustrated the concept of Value at Risk in figure 3.1, where we have assumed that $X \sim N(0, 1)$ and the total blue area equals $\alpha$. In this case the upper and lower quantile are the same.

Figure 3.1: Value at Risk. (Density of $X$ plotted against $x$; the shaded area to the left of $-\mathrm{VaR}_\alpha$ equals $\alpha$.)

3.1.1 General properties

We will now apply properties of general risk measures from chapter one to analyse the properties of Value at Risk. It is easy to check that Value at Risk is a monetary risk measure.

Theorem 3.1. VaRα(X) is a monetary risk measure.

Proof. To prove that $\mathrm{VaR}_\alpha$ is a monetary risk measure, we need to prove that it satisfies both monotonicity and translation invariance.

1. Monotonicity

Take $X \le Y$. We need to prove that for all $\alpha \in (0, 1)$ we have that $\mathrm{VaR}_\alpha(X) \ge \mathrm{VaR}_\alpha(Y)$. Notice that

$$\mathrm{VaR}_\alpha(X) \ge \mathrm{VaR}_\alpha(Y) \Leftrightarrow \inf\{x \mid P[X \le x] > \alpha\} \le \inf\{x \mid P[Y \le x] > \alpha\}.$$

To prove the inequality on the right side we will show that

$$\{x \in \mathbb{R} \mid P[Y - x \le 0] > \alpha\} \subset \{x \in \mathbb{R} \mid P[X - x \le 0] > \alpha\}. \tag{3.4}$$

From our assumption $X \le Y$ we have that $\forall x \in \mathbb{R}$, $X - x \le Y - x$. From this it follows that $P[X - x \le 0] \ge P[Y - x \le 0]$.

Take an arbitrary $x \in \{x \in \mathbb{R} \mid P[Y - x \le 0] > \alpha\}$, then we have $P[Y - x \le 0] > \alpha$. From this it follows that $P[X - x \le 0] \ge P[Y - x \le 0] > \alpha$. We can conclude that $x \in \{x \in \mathbb{R} \mid P[X - x \le 0] > \alpha\}$. This proves 3.4. We conclude that if $X \le Y$ then $\mathrm{VaR}_\alpha(X) \ge \mathrm{VaR}_\alpha(Y)$.

Footnote 1: This means at the $100(1 - \alpha)$ percent confidence level.


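2. Translation invariance

We need to prove that $\mathrm{VaR}_\alpha(X + m) = \mathrm{VaR}_\alpha(X) - m$. This follows from a straightforward calculation.

$$\begin{aligned}
\mathrm{VaR}_\alpha(X + m) &= -\inf\{x \in \mathbb{R} \mid P[X + m \le x] > \alpha\} \\
&= -\inf\{x \in \mathbb{R} \mid P[X \le x - m] > \alpha\} \\
&= -\inf\{x + m \in \mathbb{R} \mid P[X \le x] > \alpha\} \\
&= -\inf\{x \in \mathbb{R} \mid P[X \le x] > \alpha\} - m \\
&= \mathrm{VaR}_\alpha(X) - m.
\end{aligned}$$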

This concludes the proof that VaRα(X) is a monetary risk measure.

It is now natural to ask whether Value at Risk is a convex risk measure. From theorem 1.1 in chapter one, we know that if $\mathrm{VaR}_\alpha$ is a convex risk measure, then the associated acceptance set should be convex as well. Unfortunately it turns out that $\mathrm{VaR}_\alpha$ is not a convex risk measure. This is undesirable because a non-convex risk measure can penalize more diversified portfolios. We will illustrate the fact that $\mathrm{VaR}_\alpha$ is not a convex risk measure using a simplified example.

Let the risk-free rate be 0%. Consider a zero-coupon bond which costs 100, pays out 101 and has a default probability of 0.0095. Denote the net payoff of an investment in the bond by $X$. Then

$$X = \begin{cases} -100, & \text{when the bond defaults,} \\ 1, & \text{otherwise.} \end{cases} \tag{3.5}$$

It is easy to see that the Value at Risk at the 99% confidence level is $\mathrm{VaR}_{0.01}(X) = -1$. This is because the probability of default is below the 1% level at which we calculate the Value at Risk. This default is considered too unlikely to be taken into account by $\mathrm{VaR}_{0.01}$. Because $\mathrm{VaR}_{0.01}(X) = -1 \le 0$, we have that $X$ is an acceptable position, i.e. $X \in A_{\mathrm{VaR}_{0.01}}$.

Now consider a second bond which has exactly the same default risk, payoff and price as the first bond. If the net payoff of the investment in this second bond is modelled by $Y$, then it is clear that $\mathrm{VaR}_{0.01}(Y) = -1$ and thus $Y \in A_{\mathrm{VaR}_{0.01}}$. We have that both $X$ and $Y$ are acceptable positions. Now assume that the default of the first bond is independent of the default of the second bond. If Value at Risk were a convex risk measure, then the more diversified portfolio with payoff $P = \frac{1}{2}X + \frac{1}{2}Y$ should be an acceptable position as well. Using the independence of $X$ and $Y$ we find that $P$ has the following distribution.

$$P = \begin{cases} -100, & \text{when both bonds default, } p = (0.0095)^2, \\ -\frac{99}{2}, & \text{when precisely one bond defaults, } p = 2 \cdot 0.0095(1 - 0.0095), \\ 1, & \text{otherwise.} \end{cases} \tag{3.6}$$

The probability that at least one bond will default equals $(0.0095)^2 + 2 \cdot 0.0095(1 - 0.0095) = 0.0189$. Hence the Value at Risk of the diversified portfolio is $\mathrm{VaR}_{0.01}(P) = \frac{99}{2}$. The portfolio $P$ is not an acceptable position. The acceptance set $A_{\mathrm{VaR}_{0.01}}$ is not convex.
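The numbers in this bond example can be reproduced with a few lines of code. The sketch below assumes a discrete payoff given as lists of values and probabilities and implements definition 3.3 directly; the function name `var_discrete` is a hypothetical helper introduced only for this illustration.

```python
def var_discrete(values, probs, alpha):
    """VaR_alpha(X) = -inf{x : P[X <= x] > alpha} for a discrete payoff X."""
    cumulative = 0.0
    for value, prob in sorted(zip(values, probs)):
        cumulative += prob
        if cumulative > alpha:
            # First atom at which the distribution function exceeds alpha.
            return -value
    raise ValueError("probabilities must sum to more than alpha")

p = 0.0095  # default probability of a single bond

# Single bond: loses 100 on default, gains 1 otherwise (equation 3.5).
bond_values, bond_probs = [-100.0, 1.0], [p, 1.0 - p]

# Equally weighted portfolio of two independent bonds (equation 3.6).
portfolio_values = [-100.0, -99.0 / 2.0, 1.0]
portfolio_probs = [p * p, 2.0 * p * (1.0 - p), (1.0 - p) ** 2]

var_bond = var_discrete(bond_values, bond_probs, 0.01)
var_portfolio = var_discrete(portfolio_values, portfolio_probs, 0.01)
```

Here `var_bond` is negative (an acceptable position) while `var_portfolio` is positive, which is exactly the failure of convexity described above.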


Although we constructed a concrete example for $\alpha = 0.01$, it is possible to construct such an example for all $\alpha \in (0, 1)$ by choosing the default probability of the bonds small enough.

In the previous example another important problem of Value at Risk became clear: Value at Risk can ignore potentially very large losses. Consider again the example of a bond which costs $b$, has a positive net return $r$ and defaults with probability $p < \alpha$. Then $\mathrm{VaR}_\alpha(X) = -r \le 0$ no matter the value of $b$. If the bond defaults the payoff is $-b$, but because the default probability is too low, Value at Risk isn't affected by this potentially very large loss.

Figure 3.2: Density function of $S_1$.

Figure 3.3: Standard normal density function.

Like all risk measures in this thesis, VaR tries to summarise the distribution of a portfolio into one number which should reflect the level of risk. Hence it is inevitable that some information regarding the complete distribution of the stochastic variable is lost. It is however important to be aware of this information loss. Because Value at Risk is defined as a quantile, it does not incorporate well the information about the shape of the left tail of the density function. We will illustrate this with a theoretical example. Remember that the normal density is given by

$$f(x, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \tag{3.7}$$

Now consider $S_1$ such that the net payoffs have the following density function

$$g(x) = 0.99 f(x, 0, 1.002974) + 0.01 f(x, -8, 0.04). \tag{3.8}$$

This density function is plotted in figure 3.2. Next to this figure the density of the standard normal is plotted. Apart from the spike which occurs around $-8$, both density functions are very similar. In fact if we calculate the Value at Risk at a 95% confidence level of $S_1$ we find that $\mathrm{VaR}_{0.05}(S_1) = 1.64485$. If we now consider $S_2$, for which the net payoffs have a standard normal distribution, then we find that the Value at Risk at a 95% confidence level is the same, i.e. $\mathrm{VaR}_{0.05}(S_2) = 1.64485$. Although the Value at Risk at the 95% confidence level is the same, the risk associated with both investments is definitely not. For $S_1$ there is a 0.5% probability that the loss exceeds 8, while for $S_2$ this probability is negligible (see footnote 2). This problem of Value at Risk is addressed by other risk measures such as Expected shortfall.

Footnote 2: $p = 6.22 \times 10^{-16}$.


3.1.2 Consistency with expected utility maximisation

In this subsection we will take a closer look at some of the results reported in [29] regarding the consistency of Value at Risk with expected utility maximisation. For this we will rely on the definitions and theorems introduced in chapter two regarding stochastic dominance. In [29] it is stated that Value at Risk is consistent with first order stochastic dominance. This should not be surprising, since Value at Risk is defined as a quantile.

Theorem 3.2. VaR is consistent with first-order stochastic dominance. This means

$$X_1 \ge_{SD(1)} X_2 \Rightarrow \mathrm{VaR}_\alpha(X_1) \le \mathrm{VaR}_\alpha(X_2). \tag{3.9}$$

Proof. Without proof, follows from [19, Theorem 1'].

Instead of copying the proof of this theorem, we will give an example which illustrates the fact that $X_1 \ge_{SD(1)} X_2$ is a very strong assumption. Consider two stocks and let $X_1$ and $X_2$ denote the net payoffs of these stocks. Assume $X_1 \sim N(1, 1)$ and $X_2 \sim N(0, 1)$. It is known that the cumulative distribution of a normal distribution with mean $\mu$ and standard deviation $\sigma$ is given by

$$F(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right), \quad \text{where } \operatorname{erf}(x) := \frac{1}{\sqrt{\pi}} \int_{-x}^{x} \exp(-t^2)\,dt. \tag{3.10}$$

We have that

$$F_1(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x - 1}{\sqrt{2}}\right)\right), \qquad F_2(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right).$$

Because the error function is an increasing function, we have that $F_1(x) \le F_2(x)$ for all $x \in \mathbb{R}$. By definition of first order stochastic dominance this implies that $X_1 \ge_{SD(1)} X_2$. We have plotted these cumulative distributions in figure 3.4 together with the line $y = 0.05$. By definition the x-coordinate of the intersection of this line with the cumulative distribution of $X_1$ equals $-\mathrm{VaR}_{0.05}(X_1) = -0.6449$. Similarly we find that $-\mathrm{VaR}_{0.05}(X_2) = -1.6449$. We have that $\mathrm{VaR}_{0.05}(X_1) \le \mathrm{VaR}_{0.05}(X_2)$. From figure 3.4 it is clear that this inequality would hold for any $\alpha \in (0, 1)$. Hence we have that

$$X_1 \ge_{SD(1)} X_2 \Rightarrow \mathrm{VaR}_\alpha(X_1) \le \mathrm{VaR}_\alpha(X_2). \tag{3.11}$$
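The two Value at Risk figures in this example can be checked with the Python standard library. The sketch below assumes normally distributed payoffs, for which the quantile is unique, so that the Value at Risk is simply minus the α-quantile; `var_normal` is a hypothetical helper name.

```python
from statistics import NormalDist

def var_normal(mu, sigma, alpha):
    """VaR_alpha(X) = -q_alpha(X) for X ~ N(mu, sigma^2)."""
    return -NormalDist(mu, sigma).inv_cdf(alpha)

var_x1 = var_normal(1.0, 1.0, 0.05)   # X1 ~ N(1, 1)
var_x2 = var_normal(0.0, 1.0, 0.05)   # X2 ~ N(0, 1)

# The ordering holds on a whole grid of levels, not only at alpha = 0.05.
levels = [k / 100.0 for k in range(1, 100)]
ordered_everywhere = all(
    var_normal(1.0, 1.0, a) <= var_normal(0.0, 1.0, a) for a in levels
)
```

Since the two distributions differ only by a shift of the mean, the two Value at Risk values differ by exactly that shift at every level.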


Figure 3.4: First order stochastic dominance and VaR

We realize the above example is a rather theoretical one. However it does illustrate an important point. The condition that $F_1(x) \le F_2(x)$ for all $x \in \mathbb{R}$ is very restrictive. A less severe restriction would be that $F_1^{(2)}(x) \le F_2^{(2)}(x)$ for all $x \in \mathbb{R}$, where $F^{(2)}$ denotes the second order distribution.

In [29] it is stated that Value at Risk is in general not consistent with second order stochastic dominance. This means that in general we have that

$$X_1 \ge_{SD(2)} X_2 \not\Rightarrow \mathrm{VaR}_\alpha(X_1) \le \mathrm{VaR}_\alpha(X_2). \tag{3.12}$$

However in [29] we also find an important exception.

Theorem 3.4. VaR is consistent with second order stochastic dominance when portfolios' profits and losses have an elliptical distribution (see footnote 3) with finite variance and the same mean.

Proof. Without proof, see [29, Theorem 14].

Again we will not repeat the proof here, but we will construct an example to get a better understanding of the concept of second order stochastic dominance and the assumptions made in theorem 3.4.

Consider again two stocks such that their net payoffs are given by $X_1$ and $X_2$ respectively. Assume that $X_1 \sim N(\mu, \sigma_1^2)$ and $X_2 \sim N(\mu, \sigma_2^2)$, with $\sigma_1 < \sigma_2$, and denote by $F_1^{(1)}$ and $F_2^{(1)}$ their cumulative distributions. It is known that a normal distribution is an elliptical distribution. Then for $i = 1$ and $i = 2$ we have that

$$F_i^{(1)}(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma_i\sqrt{2}}\right)\right). \tag{3.13}$$

Footnote 3: An n-dimensional random vector $R = [R_1, R_2, \ldots, R_n]^T$ has an elliptical distribution if the density function of $R$ (denoted by $f(R)$) can be represented with a function $\varphi(\cdot, n)$ as

$$f(R, \theta, \Sigma) = \frac{1}{|\Sigma|^{1/2}} \varphi\left((R - \theta)^T \Sigma^{-1} (R - \theta), n\right),$$

where $\Sigma$ is an n-dimensional positive definite matrix and $\theta$ is an n-dimensional column vector.


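By assumption we also have that

$$\sigma_1 < \sigma_2 \Rightarrow \frac{1}{\sigma_1\sqrt{2}} > \frac{1}{\sigma_2\sqrt{2}}. \tag{3.14}$$

From this we can conclude that

$$\begin{cases}
\frac{x - \mu}{\sigma_1\sqrt{2}} > \frac{x - \mu}{\sigma_2\sqrt{2}}, & \text{when } x > \mu, \\
\frac{x - \mu}{\sigma_1\sqrt{2}} < \frac{x - \mu}{\sigma_2\sqrt{2}}, & \text{when } x < \mu, \\
\frac{x - \mu}{\sigma_1\sqrt{2}} = \frac{x - \mu}{\sigma_2\sqrt{2}}, & \text{when } x = \mu.
\end{cases} \tag{3.15}$$

Using the fact that the error function erf is an increasing function we find that

$$\begin{cases}
F_1^{(1)}(x) > F_2^{(1)}(x), & \text{when } x > \mu, \\
F_1^{(1)}(x) < F_2^{(1)}(x), & \text{when } x < \mu, \\
F_1^{(1)}(x) = F_2^{(1)}(x), & \text{when } x = \mu.
\end{cases} \tag{3.16}$$

From this we can conclude that we do not have the necessary condition for first order stochastic dominance. Graphically this is illustrated in figure 3.5, in which we have taken $\mu = 0$, $\sigma_1 = 1$ and $\sigma_2 = 3$. Although the condition for first order stochastic dominance is not fulfilled, the condition for second order stochastic dominance is. The second order distributions are plotted in figure 3.6. We will have that

$$X_1 \sim N(\mu, \sigma_1^2) \text{ and } X_2 \sim N(\mu, \sigma_2^2), \text{ with } \sigma_1 < \sigma_2 \Rightarrow F_2^{(2)}(x) \ge F_1^{(2)}(x) \quad \forall x \in \mathbb{R}. \tag{3.17}$$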

Figure 3.5: First order distributions for $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 9)$.

Figure 3.6: Second order distributions for $X_1 \sim N(0, 1)$ and $X_2 \sim N(0, 9)$.

To see why 3.17 is true remember that for $i = 1$ and $i = 2$ we have by definition that

$$F_i^{(2)}(x) = \int_{-\infty}^{x} F_i^{(1)}(u)\,du. \tag{3.18}$$

Using equations 3.16 we see that $F_1^{(1)}$ and $F_2^{(1)}$ intersect each other at $x = \mu$. Using the assumption that both $X_1$ and $X_2$ have the same mean and the properties of the normal distribution we see that this intersection happens at $F_1^{(1)}(\mu) = F_2^{(1)}(\mu) = 0.5$. When calculating the second order distribution, we in fact calculate the area under the first order distribution from $-\infty$ to some point $x$.

We know from 3.16 that for $x < \mu$ we have that $F_2^{(1)}(x) > F_1^{(1)}(x)$. Hence when calculating the second order distribution until some $x < \mu$ we accumulate extra area; this difference in area is labelled A in figure 3.5. We can also clearly see this accumulating effect in figure 3.6, where the difference between $F_2^{(2)}$ and $F_1^{(2)}$ grows until $x = \mu$. In this same figure we also notice that after this point the difference between $F_2^{(2)}$ and $F_1^{(2)}$ decreases again, but it never becomes negative. This effect results from the fact that for $x > \mu$ we have that $F_1^{(1)}(x) > F_2^{(1)}(x)$, which implies that the excess area between $F_2^{(1)}(x)$ and $F_1^{(1)}(x)$ is negative for $x > \mu$. Using the symmetry property of the normal distribution we see that

$$\int_{-\infty}^{\mu} \left(F_2^{(1)}(x) - F_1^{(1)}(x)\right) dx = -\int_{\mu}^{+\infty} \left(F_2^{(1)}(x) - F_1^{(1)}(x)\right) dx. \tag{3.19}$$

Graphically this means that in figure 3.5 the areas A and B are the same, but they have a different sign. Hence when integrating the first order distributions from $-\infty$ to some point $x$, one first accumulates the extra area A, and then loses part of this excess area when $x > \mu$. However it is impossible to lose more than the already accumulated excess area because the absolute value of the area A is the same as that of area B, a fact which follows from equation 3.19.

Hence we conclude that 3.17 holds and by definition of second order stochastic dominance we have that

$$X_1 \sim N(\mu, \sigma_1^2),\ X_2 \sim N(\mu, \sigma_2^2) \text{ with } \sigma_1 < \sigma_2 \Rightarrow X_1 \ge_{SD(2)} X_2. \tag{3.20}$$

This conclusion coincides with our intuition that the riskiest of two portfolios $X_1$ and $X_2$, with the same expected return but with different variance, is the portfolio with the largest variance.

Applying theorem 3.4 we find that

$$X_1 \sim N(\mu, \sigma_1^2),\ X_2 \sim N(\mu, \sigma_2^2) \text{ with } \sigma_1 < \sigma_2 \Rightarrow \mathrm{VaR}_\alpha(X_1) \le \mathrm{VaR}_\alpha(X_2). \tag{3.21}$$

We want to stress that in this previous example the fact that $X_1$ and $X_2$ had the same mean is a crucial assumption. If this constraint were not fulfilled it would not be guaranteed that Value at Risk is consistent with second order stochastic dominance.

We can conclude that although Value at Risk has an easy definition, it also has a lot of shortcomings both from a mathematical point of view and from a decision theoretic point of view.
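The inequality 3.17 for the example with µ = 0, σ1 = 1 and σ2 = 3 can be verified numerically. The sketch below relies on a closed form for the second order distribution of a normal variable, σ(zΦ(z) + φ(z)) with z = (x − µ)/σ, which is a standard integral and is not stated in the thesis; the evaluation grid is an arbitrary choice.

```python
from statistics import NormalDist

def second_order_cdf(x, mu, sigma):
    """F^(2)(x) for N(mu, sigma^2): integral of the CDF from -infinity to x.

    Uses the closed form sigma * (z * Phi(z) + phi(z)), z = (x - mu) / sigma.
    """
    std = NormalDist()
    z = (x - mu) / sigma
    return sigma * (z * std.cdf(z) + std.pdf(z))

# Grid from -10 to 10 in steps of 0.1 (an arbitrary choice for illustration).
grid = [-10.0 + 0.1 * k for k in range(201)]

# Gap F_2^(2)(x) - F_1^(2)(x) for X1 ~ N(0, 1) and X2 ~ N(0, 9).
gaps = [
    second_order_cdf(x, 0.0, 3.0) - second_order_cdf(x, 0.0, 1.0)
    for x in grid
]
largest_gap_at = grid[gaps.index(max(gaps))]
```

On this grid every gap is non-negative and the largest gap sits at x = µ, in line with the discussion of the areas A and B.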

3.2 Expected shortfall

In this section we will look at an improvement of Value at Risk called Expected shortfall. The theorems and definitions used in this section are taken from [1].


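Definition 3.4. Assume $E[X^-] < +\infty$. Then the expected shortfall at a level $\alpha \in (0, 1)$ is defined as

$$ES_\alpha(X) = -\frac{1}{\alpha}\left(E\left[X I(X \le x_{(\alpha)})\right] + x_{(\alpha)}\left(\alpha - P(X \le x_{(\alpha)})\right)\right), \tag{3.22}$$

where $I(\cdot)$ denotes the indicator function.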

An interesting representation of expected shortfall is the integral representation.

Theorem 3.5. If $X$ is a real valued random variable on the probability space $(\Omega, \mathcal{F}, P)$ with $E[X^-] < +\infty$ and $\alpha \in (0, 1)$ is fixed, then

$$ES_\alpha(X) = -\frac{1}{\alpha} \int_0^\alpha x_{(u)}\,du = -\frac{1}{\alpha} \int_0^\alpha q_u(X)\,du. \tag{3.23}$$


At this point we would like to mention that sometimes the definition

$$TCE_\alpha(X) := -E[X \mid X \le -\mathrm{VaR}_\alpha(X)] \tag{3.24}$$

is used as a synonym of expected shortfall. When the distribution of $X$ is continuous, this definition is equivalent to the definition of expected shortfall given by equation 3.22, see [1]. However in general the equality $TCE_\alpha(X) = ES_\alpha(X)$ does not hold. The risk measure defined in equation 3.24 is known as upper tail conditional expectation. It is stated in [1] that it is not guaranteed that this risk measure is coherent, because it sometimes lacks the sub-additivity property.

3.2.1 General properties

The most important property of expected shortfall is that it is a coherent risk measure. This is an improvement upon Value at Risk since Value at Risk was not even convex. To prove this we will use the alternative characterisation of a coherent risk measure described in remark 1.1 of the first chapter. Here we find that we need to show that for all $X$, $Y$ and $\alpha \in (0, 1)$ we have that:

1. (Positivity) X ≥ 0⇒ ESα(X) ≤ 0

2. (Positive homogeneous) ∀λ > 0 ESα(λX) = λESα(X)

3. (Translation invariant) ∀m ∈ R ESα(X +m) = ESα(X)−m.

4. (Sub-additivity) ESα(X + Y ) ≤ ESα(X) + ESα(Y )

Of all these properties the sub-additivity property is the most difficult to prove. We will work out the sub-additivity proof from [1]. For this we need to define the following function.


$$I_\alpha(X \le x) := \begin{cases}
I(X \le x), & \text{if } P(X = x) = 0, \\
I(X \le x) + \frac{\alpha - P(X \le x)}{P(X = x)} I(X = x), & \text{if } P(X = x) > 0.
\end{cases} \tag{3.25}$$

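In [1] we find the following lemma.

Lemma 3.1. We have the following properties:

1. $I_\alpha(X \le x_{(\alpha)}) \in [0, 1]$

2. $E\left[I_\alpha(X \le x_{(\alpha)})\right] = \alpha$

3. $\frac{1}{\alpha} E\left[X I_\alpha(X \le x_{(\alpha)})\right] = -ES_\alpha(X)$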

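We will use this lemma to prove the following lemma.

Lemma 3.2.

$$\begin{cases}
I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)}) \ge 0, & \text{if } X > x_{(\alpha)}, \\
I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)}) \le 0, & \text{if } X < x_{(\alpha)}.
\end{cases} \tag{3.26}$$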

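Proof. If $X > x_{(\alpha)}$ or if $X < x_{(\alpha)}$ we have that $I(X = x_{(\alpha)}) = 0$. Using definition 3.25 we then have that $I_\alpha(X \le x_{(\alpha)}) = I(X \le x_{(\alpha)})$. Hence we have

$$\begin{cases}
I_\alpha(X \le x_{(\alpha)}) = 0, & \text{if } X > x_{(\alpha)}, \\
I_\alpha(X \le x_{(\alpha)}) = 1, & \text{if } X < x_{(\alpha)}.
\end{cases}$$

From lemma 3.1 we have that $I_\alpha(Z \le z_{(\alpha)}) \in [0, 1]$. Hence we can conclude that

$$\begin{cases}
I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)}) \ge 0, & \text{if } X > x_{(\alpha)}, \\
I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)}) \le 0, & \text{if } X < x_{(\alpha)},
\end{cases}$$

which is what we needed to prove.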

Theorem 3.6. Expected shortfall is a coherent risk measure.

Proof.

1. (Positivity) Take $X \ge 0$, then for all $\alpha \in (0, 1)$ we have that $q_\alpha(X) \ge 0$. Using the integral representation of expected shortfall we find that

$$ES_\alpha(X) = -\frac{1}{\alpha} \int_0^\alpha q_u(X)\,du \le 0. \tag{3.27}$$

This proves the positivity property.


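2. (Positive homogeneity) Take $\lambda > 0$, then for all $\alpha \in (0, 1)$ we have that

$$\begin{aligned}
q_\alpha(\lambda X) &= \inf\{x \in \mathbb{R} \mid P(\lambda X \le x) \ge \alpha\} \\
&= \inf\left\{x \in \mathbb{R} \,\middle|\, P\left(X \le \frac{x}{\lambda}\right) \ge \alpha\right\} \\
&= \inf\{\lambda x \in \mathbb{R} \mid P(X \le x) \ge \alpha\} \\
&= \lambda \inf\{x \in \mathbb{R} \mid P(X \le x) \ge \alpha\} \\
&= \lambda q_\alpha(X).
\end{aligned}$$

Using the integral representation of expected shortfall we get that

$$\begin{aligned}
ES_\alpha(\lambda X) &= -\frac{1}{\alpha} \int_0^\alpha q_u(\lambda X)\,du \\
&= -\frac{1}{\alpha} \int_0^\alpha \lambda q_u(X)\,du \\
&= \lambda ES_\alpha(X).
\end{aligned}$$

This proves positive homogeneity.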

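3. (Translation invariance) Let $m \in \mathbb{R}$, then we have for all $\alpha \in (0, 1)$

$$\begin{aligned}
q_\alpha(X + m) &= \inf\{x \in \mathbb{R} \mid P(X + m \le x) \ge \alpha\} \\
&= \inf\{x \in \mathbb{R} \mid P(X \le x - m) \ge \alpha\} \\
&= \inf\{x + m \in \mathbb{R} \mid P(X \le x) \ge \alpha\} \\
&= \inf\{x \in \mathbb{R} \mid P(X \le x) \ge \alpha\} + m \\
&= q_\alpha(X) + m.
\end{aligned}$$

Using the integral representation of expected shortfall we have that

$$\begin{aligned}
ES_\alpha(X + m) &= -\frac{1}{\alpha} \int_0^\alpha q_u(X + m)\,du \\
&= -\frac{1}{\alpha} \int_0^\alpha \left(q_u(X) + m\right) du \\
&= -\frac{1}{\alpha} \int_0^\alpha q_u(X)\,du - \frac{m}{\alpha} \int_0^\alpha du \\
&= -\frac{1}{\alpha} \int_0^\alpha q_u(X)\,du - m \\
&= ES_\alpha(X) - m.
\end{aligned}$$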

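4. (Sub-additivity) Take $X$ and $Y$, then we need to show that the following inequality holds.

$$ES_\alpha(X) + ES_\alpha(Y) - ES_\alpha(X + Y) \ge 0. \tag{3.28}$$

Let $Z := X + Y$ and take $\alpha \in (0, 1)$. From lemma 3.1 we have that $\alpha ES_\alpha(X) = -E\left[X I_\alpha(X \le x_{(\alpha)})\right]$. We find that

$$\alpha\left(ES_\alpha(X) + ES_\alpha(Y) - ES_\alpha(Z)\right) = E\left[Z I_\alpha(Z \le z_{(\alpha)}) - X I_\alpha(X \le x_{(\alpha)}) - Y I_\alpha(Y \le y_{(\alpha)})\right].$$

Using the fact that $Z = X + Y$, we can rewrite this as

$$E\left[X\left(I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)})\right)\right] + E\left[Y\left(I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(Y \le y_{(\alpha)})\right)\right]. \tag{3.29}$$

By lemma 3.2 the difference of the indicators is non-negative where $X > x_{(\alpha)}$ and non-positive where $X < x_{(\alpha)}$, so in both cases replacing the factor $X$ by $x_{(\alpha)}$ can only decrease the product. We obtain the following inequalities

$$\begin{aligned}
E\left[X\left(I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)})\right)\right] &\ge x_{(\alpha)} E\left[I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)})\right], \\
E\left[Y\left(I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(Y \le y_{(\alpha)})\right)\right] &\ge y_{(\alpha)} E\left[I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(Y \le y_{(\alpha)})\right].
\end{aligned}$$

We conclude that

$$\begin{aligned}
&E\left[X\left(I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)})\right)\right] + E\left[Y\left(I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(Y \le y_{(\alpha)})\right)\right] \\
&\quad \ge x_{(\alpha)} E\left[I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(X \le x_{(\alpha)})\right] + y_{(\alpha)} E\left[I_\alpha(Z \le z_{(\alpha)}) - I_\alpha(Y \le y_{(\alpha)})\right] \\
&\quad = x_{(\alpha)}(\alpha - \alpha) + y_{(\alpha)}(\alpha - \alpha) \\
&\quad = 0.
\end{aligned}$$

We conclude that expected shortfall satisfies the sub-additivity property.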

3.2.2 Consistency with expected utility maximisation

We will now look at the consistency of expected shortfall with expected utility maximisation. For this we will need the following result from [19, Theorem 5'].

Theorem 3.7. Let $q_\alpha(X_1)$ and $q_\alpha(X_2)$ be quantiles of $X_1$ and $X_2$ respectively, then the following expressions are equivalent

1. $X_1 \ge_{SD(2)} X_2$

2. $\int_0^\alpha q_u(X_1)\,du \ge \int_0^\alpha q_u(X_2)\,du$ for all $\alpha \in [0, 1]$ and a strict inequality holds for some $\alpha$.

Proof. Without proof, see [19, Theorem 5'].

In [29] we find the following theorem with a proof based on theorem 3.7 and the integral representation of expected shortfall.

Theorem 3.8. Expected shortfall is consistent with second-order stochastic dominance.


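Proof. By definition of consistency with second order stochastic dominance (definition 2.13) we need to show that for all $\alpha \in (0, 1)$

$$X_1 \ge_{SD(2)} X_2 \Rightarrow ES_\alpha(X_1) \le ES_\alpha(X_2). \tag{3.30}$$

From theorem 3.7 we have that

$$\begin{aligned}
X_1 \ge_{SD(2)} X_2 &\Rightarrow \int_0^\alpha q_u(X_1)\,du \ge \int_0^\alpha q_u(X_2)\,du \\
&\Rightarrow -\frac{1}{\alpha} \int_0^\alpha q_u(X_1)\,du \le -\frac{1}{\alpha} \int_0^\alpha q_u(X_2)\,du \\
&\Rightarrow ES_\alpha(X_1) \le ES_\alpha(X_2).
\end{aligned}$$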


The fact that expected shortfall is consistent with second order stochastic dominance means that if all risk averse and non-saturated investors prefer $X_1$ to $X_2$, then the expected shortfall of $X_1$ is at most the expected shortfall of $X_2$. This theorem shows that expected shortfall is not only an improvement upon Value at Risk from a mathematical point of view but also from an economic point of view. We would like to point out that the condition that all risk averse investors prefer $X_1$ to $X_2$ is a rather severe one. When this condition is not fulfilled consistency with expected utility maximisation cannot be guaranteed.

To illustrate the severity of the assumption of second order stochastic dominance we'll give a numerical example. Consider two investors A and B with the following utility functions.

$$u_A(x) := \frac{1 - \exp(-0.02x)}{0.02} \tag{3.31}$$

$$u_B(x) := 1 + x - \sqrt{1 + x^2} \tag{3.32}$$

Notice that both utility functions are increasing and concave. Consider two portfolios $X_1$ and $X_2$ such that their net payoffs are given by

$$X_1 = \begin{cases} 2, & p = 0.99, \\ -25, & p = 0.0075, \\ -50, & p = 0.0025, \end{cases} \tag{3.33}$$

and

$$X_2 = \begin{cases} 5, & p = 0.55, \\ 2, & p = 0.44, \\ -75, & p = 0.01. \end{cases} \tag{3.34}$$

We have calculated the expected utility for both investors as well as the expected shortfall at a 0.01 level of both portfolios. These results are summarised in table 3.1. We notice that on the basis of expected utility investor A prefers $X_2$ to $X_1$ while investor B prefers $X_1$ to $X_2$. We conclude that second order stochastic dominance cannot order $X_2$ and $X_1$.


Table 3.1: Summary of results

                      X1       X2       Conclusion
Expected utility A    1.48     1.74     $X_2 \succ_A X_1$
Expected utility B    0.14     -0.66    $X_1 \succ_B X_2$
ES_0.01               31.25    75       $ES_{0.01}(X_2) \ge ES_{0.01}(X_1)$
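The entries of table 3.1 can be recomputed from equations 3.31 to 3.34 and definition 3.4. The following is a minimal sketch; the representation of a portfolio as a list of (value, probability) pairs and the helper names are assumptions made for this illustration.

```python
import math

def expected_value(g, outcomes):
    """E[g(X)] for a discrete payoff given as (value, probability) pairs."""
    return sum(p * g(x) for x, p in outcomes)

def expected_shortfall(outcomes, alpha):
    """Expected shortfall of definition 3.4 for a discrete payoff."""
    cumulative, quantile = 0.0, None
    for x, p in sorted(outcomes):
        cumulative += p
        if cumulative >= alpha - 1e-12:  # tolerance guards against rounding
            quantile = x                 # lower alpha-quantile
            break
    tail_mean = sum(p * x for x, p in outcomes if x <= quantile)
    tail_prob = sum(p for x, p in outcomes if x <= quantile)
    return -(tail_mean + quantile * (alpha - tail_prob)) / alpha

u_a = lambda x: (1.0 - math.exp(-0.02 * x)) / 0.02   # equation 3.31
u_b = lambda x: 1.0 + x - math.sqrt(1.0 + x * x)     # equation 3.32

x1 = [(2.0, 0.99), (-25.0, 0.0075), (-50.0, 0.0025)]  # equation 3.33
x2 = [(5.0, 0.55), (2.0, 0.44), (-75.0, 0.01)]        # equation 3.34

table = {
    "EU_A": (expected_value(u_a, x1), expected_value(u_a, x2)),
    "EU_B": (expected_value(u_b, x1), expected_value(u_b, x2)),
    "ES": (expected_shortfall(x1, 0.01), expected_shortfall(x2, 0.01)),
}
```

Rounded to two decimals, the entries of `table` agree with table 3.1: investor A ranks $X_2$ first, investor B ranks $X_1$ first, and expected shortfall sides with investor B.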

Definition 3.5. Assume $E[X^-] < +\infty$. Then the conditional value at risk at level $\alpha$ of $X$ is defined as

$$\mathrm{CVaR}_\alpha(X) = \inf_{s \in \mathbb{R}} \left(\frac{1}{\alpha} E\left[(X - s)^-\right] - s\right). \tag{3.35}$$

In [1, Corollary 4.3] it is stated that under a mild integrability condition, expected shortfall and conditional value at risk are the same object. More formally the following theorem is stated.

Theorem 3.9. Let $X$ be a real integrable random variable on some probability space $(\Omega, \mathcal{F}, P)$ and $\alpha \in (0, 1)$ be fixed. Then

$$ES_\alpha(X) = \mathrm{CVaR}_\alpha(X). \tag{3.36}$$

Proof. Without proof, see [1, Corollary 4.3].
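Theorem 3.9 can be illustrated on the two portfolios of equations 3.33 and 3.34. The sketch below assumes that $(X - s)^-$ denotes the negative part $\max(0, s - X)$. For a discrete payoff the objective in definition 3.5 is convex and piecewise linear in $s$ with kinks at the atoms of $X$, so it suffices to evaluate it at the atoms; this shortcut is an observation made for this illustration and is not taken from [1].

```python
def cvar_discrete(outcomes, alpha):
    """CVaR of definition 3.5 for a payoff given as (value, probability) pairs.

    The objective s -> E[(X - s)^-] / alpha - s is convex and piecewise
    linear with kinks at the atoms of X, so its minimum is attained at an atom.
    """
    def objective(s):
        negative_part = sum(p * max(0.0, s - x) for x, p in outcomes)
        return negative_part / alpha - s
    return min(objective(x) for x, _ in outcomes)

x1 = [(2.0, 0.99), (-25.0, 0.0075), (-50.0, 0.0025)]  # equation 3.33
x2 = [(5.0, 0.55), (2.0, 0.44), (-75.0, 0.01)]        # equation 3.34

cvar_x1 = cvar_discrete(x1, 0.01)
cvar_x2 = cvar_discrete(x2, 0.01)
```

Both values coincide with the expected shortfall figures 31.25 and 75 reported in table 3.1.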

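It is interesting to notice that $\mathrm{CVaR}_\alpha$ can be rewritten in the form of an optimised certainty equivalent.

$$\begin{aligned}
\mathrm{CVaR}_\alpha(X) &= \inf_{s \in \mathbb{R}} \left(\frac{1}{\alpha} E\left[(X - s)^-\right] - s\right) \\
&= \inf_{s \in \mathbb{R}} \left(-\left(s - \frac{1}{\alpha} E\left[(X - s)^-\right]\right)\right) \\
&= -\sup_{s \in \mathbb{R}} \left(s + E\left[-\frac{1}{\alpha}(X - s)^-\right]\right) \\
&= -OCE_u(X),
\end{aligned}$$

where $u(x) = -\frac{1}{\alpha}\max(0, -x)$. Notice that $u(0) = 0$, that $1 \in \partial u(0)$ and that $u(x)$ is increasing. Furthermore, because $0 < \alpha < 1$, $u$ is a concave function: for $\lambda \in [0, 1]$ we have

$$\begin{aligned}
u(\lambda x + (1 - \lambda)y) &= -\frac{1}{\alpha}\max(0, -\lambda x - (1 - \lambda)y) \\
&\ge -\frac{1}{\alpha}\max(0, -\lambda x) - \frac{1}{\alpha}\max(0, -(1 - \lambda)y) \\
&= -\frac{\lambda}{\alpha}\max(0, -x) - \frac{1 - \lambda}{\alpha}\max(0, -y) \\
&= \lambda u(x) + (1 - \lambda)u(y).
\end{aligned}$$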


To get a better understanding of this utility function we have plotted it in figure 3.7. We notice that, locally, the investor is risk neutral because the utility function is a piecewise linear function.

The interpretation of expected shortfall as the optimised certainty equivalent of an investor with the utility function $u(x) = -\frac{1}{\alpha}\max(0, -x)$ reveals a potential criticism. When an investor with this utility function knows he will lose money, he is indifferent between an uncertain loss $X$ and a certain loss $E[X]$. And when this investor knows he will gain money, his utility score does not depend upon the amount he eventually gains.

Figure 3.7: Utility function for CVaRα with α = 0.05.


4 Utility based risk measures

In this chapter we will discuss how utility functions can be incorporated in financial risk measures. The stochastic variable $X$ will model, as always, the net payoffs of the portfolio. We will again assume that $X \in L^\infty(\Omega, \mathcal{F}, P)$ and that $X$ can take positive as well as negative values. When studying risk, we are especially interested in the losses of our portfolio. Instead of using the utility function $u$ to study these losses we will use the associated loss function $l$ defined as:

$$l(x) = -u(-x). \tag{4.1}$$

Non-saturated and risk averse investors are modelled using increasing and concave utility functions. Using the relation 4.1 we can see that this implies that their associated loss function will be increasing and convex.

4.1 Utility based shortfall risk measures

The first class of utility based risk measures that we will discuss was introduced in [13] and [11]. These risk measures are called utility based shortfall risk and can be constructed using the notion of a loss function. In this section we will explain the construction of this utility based risk measure and show the link to the u-Mean certainty equivalent.

Definition 4.1. A function $l : \mathbb{R} \to \mathbb{R}$ is called a loss function if it is increasing and not identically constant.

Loss functions can induce risk measures in a natural way using the notion of an acceptance set. Take $x_0$ in the interior of $l(\mathbb{R})$. We can now define the following acceptance set:

$$\begin{aligned}
A &:= \{X \in L^\infty(\Omega, \mathcal{F}, P) \mid E[l(-X)] \le x_0\} \\
&= \{X \in L^\infty(\Omega, \mathcal{F}, P) \mid E[-u(X)] \le x_0\} \\
&= \{X \in L^\infty(\Omega, \mathcal{F}, P) \mid E[u(X)] \ge -x_0\}.
\end{aligned}$$


Hence a position $X$ is acceptable if its expected utility is at least a given amount, or equivalently if the expected loss is at most a given amount. Using this acceptance set we are able to define the risk measure associated with it.

$$\begin{aligned}
\rho_A(X) &:= \inf\{m \in \mathbb{R} \mid m + X \in A\} \\
&= \inf\{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \\
&= \inf\{m \in \mathbb{R} \mid E[u(X + m)] \ge -x_0\}.
\end{aligned}$$

The risk measure defined above is called utility based shortfall risk and we will denote it by $SF^l_{x_0}(X)$. In [5, p. 473] a link between utility based shortfall risk measures and u-Mean certainty equivalents is mentioned. We will derive this link here.

$$\begin{aligned}
SF^l_{x_0}(X) &= \inf\{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \\
&= -\sup\{-m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \\
&= -\sup\{m \in \mathbb{R} \mid E[l(-X + m)] \le x_0\} \\
&= -\sup\{m \in \mathbb{R} \mid E[u(X - m)] \ge -x_0\} \\
&= -\sup\{m \in \mathbb{R} \mid E[\tilde{u}(X - m)] \ge 0\} \\
&= -M_{\tilde{u}}(X),
\end{aligned}$$

where $\tilde{u}(x) = u(x) + x_0 = -l(-x) + x_0$. We conclude that

$$SF^l_{x_0}(X) = -M_{\tilde{u}}(X), \quad \text{with } \tilde{u}(x) = u(x) + x_0 = -l(-x) + x_0. \tag{4.2}$$

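The value of $x_0$ has an influence on the utility based shortfall risk. Suppose $x_1 \ge x_0$ and take $m \in \{m \in \mathbb{R} \mid E[u(X + m)] + x_0 \ge 0\}$. Then we have that $E[u(X + m)] + x_1 \ge E[u(X + m)] + x_0 \ge 0$. Hence $m \in \{m \in \mathbb{R} \mid E[u(X + m)] + x_1 \ge 0\}$. We can conclude that $\{m \in \mathbb{R} \mid E[u(X + m)] + x_0 \ge 0\} \subset \{m \in \mathbb{R} \mid E[u(X + m)] + x_1 \ge 0\}$. This implies that $\inf\{m \in \mathbb{R} \mid E[u(X + m)] + x_0 \ge 0\} \ge \inf\{m \in \mathbb{R} \mid E[u(X + m)] + x_1 \ge 0\}$. We conclude that

$$x_1 \ge x_0 \Rightarrow SF^l_{x_1}(X) \le SF^l_{x_0}(X). \tag{4.3}$$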

In [11, p. 247] it is stated without proof that $SF^l_{x_0}(X)$ is a monetary risk measure and that if $l$ is a convex loss function this risk measure is convex. In this thesis we'll prove these claims.

Theorem 4.1. The utility based shortfall risk measure

$$SF^l_{x_0}(X) = \inf\{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \tag{4.4}$$

is a monetary risk measure.

Proof. To prove that $SF^l_{x_0}(X)$ is a monetary risk measure we will prove that it satisfies the monotonicity property and the translation invariance property.

1. (Monotonicity) Take $X \le Y$. Because $l$ is increasing we have that

$$X \le Y \Rightarrow -X - m \ge -Y - m \Rightarrow E[l(-X - m)] \ge E[l(-Y - m)].$$


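Now take $m \in \{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\}$, then we have that

$$E[l(-Y - m)] \le E[l(-X - m)] \le x_0.$$

From this we conclude that $m \in \{m \in \mathbb{R} \mid E[l(-Y - m)] \le x_0\}$. Hence we find that

$$\{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \subset \{m \in \mathbb{R} \mid E[l(-Y - m)] \le x_0\}$$

and therefore

$$\inf\{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \ge \inf\{m \in \mathbb{R} \mid E[l(-Y - m)] \le x_0\}.$$

This proves that if $X \le Y$ then $SF^l_{x_0}(X) \ge SF^l_{x_0}(Y)$.

2. (Translation invariance) We have that

$$\begin{aligned}
SF^l_{x_0}(X + k) &= \inf\{m \in \mathbb{R} \mid E[l(-X - k - m)] \le x_0\} \\
&= \inf\{m \in \mathbb{R} \mid E[l(-X - (k + m))] \le x_0\} \\
&= \inf\{m - k \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} \\
&= \inf\{m \in \mathbb{R} \mid E[l(-X - m)] \le x_0\} - k \\
&= SF^l_{x_0}(X) - k.
\end{aligned}$$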

This proves the translation invariance property.

We now prove that utility based shortfall risk is a convex risk measure if l is a convex loss function, or equivalently if u is a concave utility function.

Theorem 4.2. If the loss function l is convex, then the utility based shortfall risk measure

$$SF^{l}_{x_0}(X) = \inf\{m \in \mathbb{R} \mid E[l(-X-m)] \le x_0\} \tag{4.5}$$

is a convex risk measure.

Proof. From theorem 1.1 from chapter 1 we know that it is sufficient to prove that the acceptance set $\mathcal{A} = \{X \in L^\infty(\Omega,\mathcal{F},P) \mid E[l(-X)] \le x_0\}$ is convex. Take arbitrary $X \in \mathcal{A}$, $Y \in \mathcal{A}$ and $\lambda \in [0,1]$. We need to prove that $\lambda X + (1-\lambda)Y \in \mathcal{A}$. Because the loss function l is convex, we have that $E[l(-(\lambda X + (1-\lambda)Y))] \le E[\lambda l(-X) + (1-\lambda) l(-Y)] = \lambda E[l(-X)] + (1-\lambda) E[l(-Y)]$. Since $X \in \mathcal{A}$ and $Y \in \mathcal{A}$ we have $E[l(-X)] \le x_0$ and $E[l(-Y)] \le x_0$. We can conclude that $E[l(-(\lambda X + (1-\lambda)Y))] \le \lambda x_0 + (1-\lambda) x_0 = x_0$. This means that $\lambda X + (1-\lambda)Y \in \mathcal{A}$, which is what we needed to prove.

It is now natural to ask whether this utility based shortfall risk measure is a coherent risk measure. Unfortunately this will not always be the case. I will demonstrate this using a numerical example. Consider a bond which at time t = 0 costs 100, and will pay 105 at time t = 1. Assume a risk free interest rate of 2%, and a default probability of the bond of 1%. Then the net payoff X of this investment is $-100/1.02 = -98.04$ with probability 1% and $5/1.02 = 4.90$ with probability 99%. Suppose we use the exponential utility function $u(x) = 1 - \exp(-x)$ and take $x_0 = 0$. Then $SF^{l}_{0}(X) = -M_u(X) = \ln E[\exp(-X)]$, where the last equality follows from theorem 2.7 in chapter 2. If this risk measure was coherent it would satisfy the property of positive homogeneity, i.e. $\lambda \ln E[\exp(-X)] = \ln E[\exp(-\lambda X)]$ for all $\lambda > 0$. However if we take $\lambda = 2$, we find that $2 \ln E[\exp(-X)] = 186.87$ and $\ln E[\exp(-2X)] = 191.47$. This shows that this risk measure is not always coherent.

From [27, p.101] we have the following fact.

Lemma 4.1. A convex function l : R→ R is continuous.
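The two numbers in the bond example that precedes Lemma 4.1 can be reproduced with a few lines of code. The sketch below assumes nothing beyond the data of the example; the helper names `log_mean_exp` and `entropic_risk` are made up for this illustration, and the log-sum-exp rearrangement is only used to avoid forming very large exponentials.

```python
import math

def log_mean_exp(values, probs):
    """ln E[exp(V)] for a discrete V, computed by factoring out the largest value."""
    top = max(values)
    return top + math.log(sum(p * math.exp(v - top) for v, p in zip(values, probs)))

def entropic_risk(outcomes, probs, scale=1.0):
    """ln E[exp(-scale * X)], the shortfall risk of scale * X for u(x) = 1 - exp(-x), x0 = 0."""
    return log_mean_exp([-scale * x for x in outcomes], probs)

# Discounted net payoff of the bond: default with probability 1%.
outcomes = [-100 / 1.02, 5 / 1.02]
probs = [0.01, 0.99]

twice_risk = 2 * entropic_risk(outcomes, probs)            # 2 * rho(X)
risk_of_double = entropic_risk(outcomes, probs, scale=2.0)  # rho(2X)
print(round(twice_risk, 2), round(risk_of_double, 2))
```

The two printed values differ, which is exactly the failure of positive homogeneity used in the example.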

From [11, p.248] we have the following lemma. We will work out the proof of this lemma.

Lemma 4.2. If l is a convex loss function, then the equation

$$E[l(-z-X)] = x_0 \tag{4.6}$$

has a unique solution $z = SF^{l}_{x_0}(X)$.

Proof. Consider a sequence $(z_n)$ with $z_n \in \{z \in \mathbb{R} \mid E[l(-z-X)] = x_0\}$ (see footnote 1) such that $z_n \to SF^{l}_{x_0}(X) = \inf\{z \in \mathbb{R} \mid E[l(-z-X)] = x_0\}$ as $n \to +\infty$. Then $\lim_{n\to+\infty} E[l(-z_n-X)] = x_0$. Now we will use that, because $X \in L^\infty(\Omega,\mathcal{F},P)$, X is a bounded measurable function. This means that $\exists M \in \mathbb{R} : \forall \omega \in \Omega,\ |X(\omega)| \le M$. Because $l : \mathbb{R} \to \mathbb{R}$ is continuous and the convergent sequence $(z_n)$ is bounded, we have that

$$\exists M' \in \mathbb{R},\ \forall \omega \in \Omega,\ \forall n \ge 0 : |l(-z_n - X(\omega))| \le M' < +\infty.$$

Using bounded convergence we have that $E\left[\lim_{n\to+\infty} l(-z_n-X)\right] = x_0$. Using the fact that l is an increasing and convex function and thus continuous we have that

$$E\left[l\left(\lim_{n\to+\infty} (-z_n-X)\right)\right] = E\left[l\left(-SF^{l}_{x_0}(X)-X\right)\right] = x_0.$$

Hence $SF^{l}_{x_0}(X)$ is a solution to 4.6. To show that the solution is unique it is sufficient to notice that, since $x_0$ is an interior point of the range of the increasing, convex and non-constant function l, the function l is strictly increasing on $(l^{-1}(x_0) - \varepsilon, +\infty)$ for some $\varepsilon > 0$. Because any solution of 4.6 has to lie in this interval we have that the solution is unique.

In [11, p.248] we found following theorem and proof.

Theorem 4.3. The utility-based shortfall risk measure $SF^{l}_{x_0}(X)$ is continuous from below. Hence $SF^{l}_{x_0}(X)$ can be represented in the form

$$SF^{l}_{x_0}(X) = \max_{Q \in \mathcal{M}_1(P)} \left(E_Q(-X) - \alpha_{\min}(Q)\right). \tag{4.7}$$

Proof. If $SF^{l}_{x_0}(X)$ is continuous from below, representation 4.7 follows directly from theorem 1.9 from the first chapter. Take a sequence $X_n \in L^\infty(\Omega,\mathcal{F},P)$ such that $X_n \nearrow X$ pointwise. Then $SF^{l}_{x_0}(X_n) \searrow R \in \mathbb{R}$. We need to show that $R = SF^{l}_{x_0}(X)$. Just as in the proof of lemma 4.2 we can use bounded convergence to obtain that

$$\lim_{n\to+\infty} E\left[l(-SF^{l}_{x_0}(X_n) - X_n)\right] = E\left[\lim_{n\to+\infty} l(-SF^{l}_{x_0}(X_n) - X_n)\right].$$

Again using the fact that l is continuous we have

$$E\left[\lim_{n\to+\infty} l(-SF^{l}_{x_0}(X_n) - X_n)\right] = E[l(-R-X)].$$

Because $E\left[l(-SF^{l}_{x_0}(X_n) - X_n)\right] = x_0$ for all n we have that $\lim_{n\to+\infty} E\left[l(-SF^{l}_{x_0}(X_n) - X_n)\right] = x_0$. This implies that $E[l(-R-X)] = x_0$. Since we know that the only solution to equation 4.6 is $SF^{l}_{x_0}(X)$, we have that $R = SF^{l}_{x_0}(X)$.

1 We know this set is not empty because $x_0$ is assumed to be an internal point of l.

Equation 4.2 states the link between the utility based shortfall risk measure and the u-Mean certainty equivalent. We will work out the idea of using strong Lagrangian duality proposed in [5] to derive a relation between the optimised certainty equivalent and the u-Mean certainty equivalent. From [14, p60] we have the following theorem concerning strong Lagrangian duality.

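Theorem 4.4. Let X be a non-empty convex set in $\mathbb{R}^n$. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ be convex and $h : \mathbb{R}^n \to \mathbb{R}^l$ be affine. Suppose that the following constraint qualification is satisfied: there exists an $\hat{x} \in X$ such that $g(\hat{x}) < 0$ and $h(\hat{x}) = 0$, with $0 \in \operatorname{int}(h(X))$ where $h(X) = \{h(x) \mid x \in X\}$. Then

$$\inf\{f(x) \mid x \in X,\ g(x) \le 0,\ h(x) = 0\} = \sup\{\theta(u,v) \mid u \ge 0\}, \tag{4.8}$$

where $\theta(u,v) = \inf\{f(x) + u^T g(x) + v^T h(x) \mid x \in X\}$. Furthermore if the infimum is finite, then $\sup\{\theta(u,v) \mid u \ge 0\}$ is achieved at $(\bar{u}, \bar{v})$ with $\bar{u} \ge 0$. If the infimum is achieved at $\bar{x}$, then $\bar{u}^T g(\bar{x}) = 0$.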

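For $\lambda > 0$, denote $OCE_{\lambda u}(X) := \sup_{\eta \in \mathbb{R}} \left(\eta + \lambda E[u(X-\eta)]\right)$. By definition we have that

$$SF^{l}_{x_0}(X) = \inf\{\eta \in \mathbb{R} \mid E[l(-X-\eta)] \le x_0\}.$$

Using the translation $\tilde{l}(x) = l(x) - x_0$, we have that

$$SF^{\tilde{l}}_{0}(X) = \inf\{\eta \in \mathbb{R} \mid E[\tilde{l}(-X-\eta)] \le 0\}.$$

The Lagrange function for this problem is given by $L(\eta, \lambda) = \eta + \lambda E[\tilde{l}(-X-\eta)]$. We want to apply the strong duality theorem. We have that $f(\eta) = \eta$ is a convex function. We also have that $g(\eta) := E[\tilde{l}(-X-\eta)]$ is a convex function. This follows from the convexity of $\tilde{l}$. To formally prove this, take $\eta_1, \eta_2 \in \mathbb{R}$ and $t \in [0,1]$, then we have that

$$
\begin{aligned}
g(t\eta_1 + (1-t)\eta_2) &= E\left[\tilde{l}(-X - t\eta_1 - (1-t)\eta_2)\right] \\
&= E\left[\tilde{l}(-tX - (1-t)X - t\eta_1 - (1-t)\eta_2)\right] \\
&= E\left[\tilde{l}\left(t(-X-\eta_1) + (1-t)(-X-\eta_2)\right)\right] \\
&\le E\left[t\,\tilde{l}(-X-\eta_1) + (1-t)\,\tilde{l}(-X-\eta_2)\right] \\
&= tE\left[\tilde{l}(-X-\eta_1)\right] + (1-t)E\left[\tilde{l}(-X-\eta_2)\right] \\
&= tg(\eta_1) + (1-t)g(\eta_2).
\end{aligned}
$$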

To apply strong Lagrangian duality we need to show that there exists an internal solution. That is, we need to find an $\eta$ such that $g(\eta) < 0$. Because $x_0$ is an internal point of $l(\mathbb{R})$, 0 is an internal point of $\tilde{l}(\mathbb{R})$. Using the same arguments as with equation 4.6 we find that there exists an $\varepsilon > 0$ such that the equation $E[\tilde{l}(-X-\eta)] = -\varepsilon$ has a solution $\eta$. From this we can conclude there exists an $\eta$ such that $g(\eta) = E[\tilde{l}(-X-\eta)] = -\varepsilon < 0$, which proves the existence of the internal solution. Denote by $\tilde{u}$ the utility function associated with $\tilde{l}$. Because $\tilde{l}$ is a continuous and non-decreasing function, the restriction $E[\tilde{l}(-X-\eta)] \le 0$ will be binding. Hence we can assume $\lambda > 0$.

We can now apply the strong duality theorem.

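Using that $\theta(\lambda) = \inf_{\eta \in \mathbb{R}} \left(\eta + \lambda E[\tilde{l}(-X-\eta)]\right)$ we find that

$$
\begin{aligned}
SF^{\tilde{l}}_{0}(X) &= \inf\{\eta \in \mathbb{R} \mid E[\tilde{l}(-X-\eta)] \le 0\} \\
&= \sup_{\lambda>0} \left(\inf_{\eta \in \mathbb{R}} \left(\eta + \lambda E[\tilde{l}(-X-\eta)]\right)\right) \\
&= \sup_{\lambda>0} \left(\inf_{\eta \in \mathbb{R}} \left(\eta - \lambda E[\tilde{u}(X+\eta)]\right)\right) \\
&= \sup_{\lambda>0} \left(\inf_{\eta \in \mathbb{R}} \left(-\eta - \lambda E[\tilde{u}(X-\eta)]\right)\right) \\
&= \sup_{\lambda>0} \left(-\sup_{\eta \in \mathbb{R}} \left(\eta + \lambda E[\tilde{u}(X-\eta)]\right)\right) \\
&= -\inf_{\lambda>0} \left(\sup_{\eta \in \mathbb{R}} \left(\eta + \lambda E[\tilde{u}(X-\eta)]\right)\right) \\
&= -\inf_{\lambda>0} \left(OCE_{\lambda\tilde{u}}(X)\right).
\end{aligned}
$$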

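We already showed that $SF^{\tilde{l}}_{0}(X) = -M_{\tilde{u}}(X)$. Hence we have that

$$M_{\tilde{u}}(X) = \inf_{\lambda>0} \left(OCE_{\lambda\tilde{u}}(X)\right). \tag{4.9}$$

Taking $\lambda = 1$ we conclude that

$$M_{\tilde{u}}(X) \le OCE_{\tilde{u}}(X). \tag{4.10}$$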
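Relation (4.9) can be checked numerically in a case where both sides are available in closed form. The sketch below assumes the exponential utility $u(x) = 1 - e^{-x}$ with $x_0 = 0$ and a hypothetical three-point payoff; neither the data nor the helper names `oce_scaled` and `u_mean` come from the thesis. For this u the inner maximisation over $\eta$ has the explicit solution $\eta = -\ln(\lambda E[e^{-X}])$, and the u-Mean equals $-\ln E[e^{-X}]$.

```python
import math

def oce_scaled(outcomes, probs, lam):
    """sup over eta of eta + lam * E[u(X - eta)] for u(x) = 1 - exp(-x).

    The objective is concave in eta; setting its derivative to zero gives
    eta = -ln(lam * E[exp(-X)]).
    """
    mgf = sum(p * math.exp(-x) for x, p in zip(outcomes, probs))
    eta = -math.log(lam * mgf)
    return eta + lam * (1.0 - math.exp(eta) * mgf)

def u_mean(outcomes, probs):
    """sup{m : E[u(X - m)] >= 0} = -ln E[exp(-X)] for u(x) = 1 - exp(-x)."""
    mgf = sum(p * math.exp(-x) for x, p in zip(outcomes, probs))
    return -math.log(mgf)

# Hypothetical example data (assumption, not taken from the thesis).
outcomes = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

lambdas = [0.05 * k for k in range(1, 101)]   # grid on (0, 5]
values = [oce_scaled(outcomes, probs, lam) for lam in lambdas]
print(min(values), u_mean(outcomes, probs))
```

On this grid the smallest value of $OCE_{\lambda u}(X)$ is attained at $\lambda = 1$ and agrees with the u-Mean, while every other $\lambda$ gives a larger value, in line with (4.9) and (4.10).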

4.2 Divergence risk measures

Apart from utility based shortfall risk measures there is another way to incorporate utility functions into risk measures. Although less obvious, divergence risk measures are another example of utility based risk measures. The next section is devoted to the study of this class of risk measures.

4.2.1 Construction and representation

Divergence risk measures are based on the robust representation of a convex risk measure, something which we have discussed in the first chapter. The robust representation of a risk measure has the following form

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - \alpha(Q)\right). \tag{4.11}$$

In this representation we have taken some probabilistic models more seriously than others using the penalty function $\alpha(Q)$. In divergence based risk measures this penalty function will be the $\varphi$-divergence. We will make the following assumptions on the function $\varphi$:

1. $\varphi : \mathbb{R} \to (-\infty,+\infty]$ is a proper (see footnote 2) closed convex function.

2. $\varphi$ is lower semicontinuous.

3. If the effective domain (see footnote 3) is denoted by $\operatorname{dom}\varphi$, then $1 \in \operatorname{int}(\operatorname{dom}\varphi)$.

4. The minimum of $\varphi$ is 0 which is attained at 1.

The class of functions for which these properties are satisfied will be denoted by $\Phi$. We will call the function $\varphi$ a divergence function.

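Definition 4.2. For $\varphi \in \Phi$ the $\varphi$-divergence of the probability measure Q with respect to P is defined as

$$
I_\varphi(Q|P) =
\begin{cases}
\int_\Omega \varphi\left(\frac{dQ}{dP}\right) dP & \text{if } Q \in \mathcal{M}_1(P), \\
+\infty & \text{otherwise,}
\end{cases}
\tag{4.12}
$$

where $\frac{dQ}{dP}$ denotes the Radon-Nikodym derivative.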
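On a finite sample space the Radon-Nikodym derivative is simply the ratio of the point masses, so Definition 4.2 can be evaluated directly. The sketch below is a minimal illustration under assumptions that are not in the thesis: a three-point sample space with hypothetical probabilities, and the divergence function $\varphi(t) = t\ln t - t + 1$, which belongs to $\Phi$ and yields the relative entropy. The names `phi_kl` and `phi_divergence` are made up for this example.

```python
import math

def phi_kl(t):
    """phi(t) = t ln t - t + 1, extended by phi(0) = 1 and phi(t) = +inf for t < 0."""
    if t < 0:
        return math.inf
    if t == 0:
        return 1.0
    return t * math.log(t) - t + 1.0

def phi_divergence(q, p, phi):
    """I_phi(Q|P) on a finite sample space.

    Returns +inf when Q puts mass where P does not, i.e. when Q is not
    absolutely continuous with respect to P.
    """
    total = 0.0
    for qi, pi in zip(q, p):
        if pi == 0.0:
            if qi > 0.0:
                return math.inf
            continue
        total += pi * phi(qi / pi)  # density dQ/dP = qi / pi on this atom
    return total

# Hypothetical reference model P and alternative model Q.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

print(phi_divergence(p, p, phi_kl))   # the reference model itself is not penalised
print(phi_divergence(q, p, phi_kl))   # a deviating model receives a positive penalty
```

Because both Q and P sum to one, the terms $-t + 1$ integrate to zero and the result coincides with the usual relative entropy $\sum_i q_i \ln(q_i/p_i)$.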

2 Which means there exists an $x \in \mathbb{R}$ such that $\varphi(x) < +\infty$.
3 The effective domain of the proper function is the set $\{x \mid \varphi(x) < +\infty\}$.

Note that if the probability measure Q would not be absolutely continuous with respect to P then the Radon-Nikodym derivative would not be well defined. Using the $\varphi$-divergence as a penalty function we can define divergence based risk measures.

Definition 4.3. The $\varphi$-divergence based risk measure is defined as

$$D_\varphi(X) = \sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - I_\varphi(Q|P)\right). \tag{4.13}$$

In what follows we will often use the Legendre transform. This transform is sometimes also called the Fenchel-Legendre transform.

Definition 4.4. The Legendre transform of a convex function $l : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ is defined as

$$l^*(y) := \sup_{x \in \mathbb{R}} \left(yx - l(x)\right), \quad y \in \mathbb{R}. \tag{4.14}$$
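Definition 4.4 can be approximated numerically by taking the supremum over a grid. The sketch below does this for the divergence function $\varphi(t) = t\ln t - t + 1$ (an assumed example, not prescribed by the thesis), whose Legendre transform is $\varphi^*(y) = e^{y} - 1$; with $u(x) = -\varphi^*(-x)$ this corresponds to the exponential utility $u(x) = 1 - e^{-x}$. The grid bounds and the helper name `legendre` are choices made for this illustration only.

```python
import math

def phi(t):
    """phi(t) = t ln t - t + 1 on t >= 0, with phi(0) = 1."""
    if t == 0:
        return 1.0
    return t * math.log(t) - t + 1.0

def legendre(f, y, grid):
    """Grid approximation of f*(y) = sup over x of (y * x - f(x))."""
    return max(y * x - f(x) for x in grid)

# Grid on [0, 20] with step 0.001; the maximiser t = exp(y) lies inside it
# for the values of y used below.
grid = [k / 1000 for k in range(0, 20001)]

conjugate_values = {y: legendre(phi, y, grid) for y in (-1.0, 0.0, 0.5, 1.0, 2.0)}
for y, value in conjugate_values.items():
    print(y, value, math.exp(y) - 1.0)
```

The grid values agree with $e^{y} - 1$ up to the discretisation error, which is small here because the objective is smooth and concave around its maximiser.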

At first sight it might not be clear why divergence risk measures are also utility based risk measures. However, it turns out that divergence risk measures are in fact negative optimised certainty equivalents. The negative of the optimised certainty equivalent can be viewed as the dual optimisation problem of the divergence risk measure, i.e. for $u(x) = -\varphi^*(-x)$ we have that

$$\sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - I_\varphi(Q|P)\right) = -\sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X-\eta)]\right). \tag{4.15}$$

We want to make the remark that in the optimisation problem on the left hand side of 4.15 we optimise over an infinite dimensional space, while on the right hand side the optimisation happens over a finite dimensional space. In [5] the authors use strong Lagrangian duality to obtain this link. However we are not convinced that they checked all necessary assumptions to conclude that strong duality holds. Therefore we have added the assumption that $\varphi$ is a lower semicontinuous function, an assumption also made in [11, p.256]. Using this extra assumption and the ideas proposed in [5] we have reworked the proof of 4.15. Instead of using strong Lagrangian duality we will use the closely related concept of Fenchel duality to prove this connection. Using this type of duality explains why the Fenchel-Legendre transformation turns up in some of the equations. We will need the concept of the core of a set.

Definition 4.5. If X is a normed space then the core of a set $A \subset X$ is defined by: $x \in \operatorname{core}(A)$ if for each $h \in \{x \in X \mid \|x\| = 1\}$ there exists a $\delta > 0$ such that $x + th \in A$ for all $0 \le t \le \delta$.

Lemma 4.3. If A is a set then $\operatorname{int}(A) \subset \operatorname{core}(A)$.

Proof. Without proof, see [9].

Our proof will be based on the following duality theorem regarding Fenchel duality with equality constraints, which is taken from [7, Corollary 1.3].

Theorem 4.5. (Fenchel duality theorem for linear constraints) Let X and Y be Banach spaces. Given any $f : X \to (-\infty,+\infty]$, any bounded linear map $A : X \to Y$ and any element $b \in Y$, the following weak duality holds:

$$\inf_{x \in X} \{f(x) \mid Ax = b\} \ge \sup_{\mu \in Y^*} \left(\langle b, \mu \rangle - f^*(A^*\mu)\right). \tag{4.16}$$

If f is lower semicontinuous and $b \in \operatorname{core}(A \operatorname{dom} f)$, then we have equality, and the supremum is attained if finite.

In [16] we found the following Fatou property.

Theorem 4.6. (Fatou property) Let $g, f_n$ for $n \in \mathbb{N}$ be measurable functions such that $f_n \ge g$ for all n and $\int g\, d\mu > -\infty$, then

$$\liminf_{n\to\infty} \int f_n\, d\mu \ge \int \liminf_{n\to\infty} f_n\, d\mu. \tag{4.17}$$

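We also have the following result.

Theorem 4.7. Let $\Omega$ be a $\sigma$-finite measure space, and $X := L^p(\Omega,\mathcal{F},P)$, $p \in [1,+\infty]$. Let $g : \mathbb{R} \times \Omega \to (-\infty,+\infty]$ be a normal integrand, and define on X the integral functional $I_g(x) := \int_\Omega g(x(\omega),\omega)\, dP(\omega)$. Then,

$$\inf_{x \in X} \int_\Omega g(x(\omega),\omega)\, dP(\omega) = \int_\Omega \inf_{s \in \mathbb{R}} g(s,\omega)\, dP(\omega), \tag{4.18}$$

provided the left-hand side is finite. Moreover,

$$\bar{x} \in \arg\min_{x \in X} I_g(x) \Leftrightarrow \bar{x}(\omega) \in \arg\min_{s \in \mathbb{R}} g(s,\omega), \text{ a.e.} \tag{4.19}$$

Proof. Theorem from [5, p20].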

Theorem 4.8. Let $f : \mathbb{R} \times \Omega \to (-\infty,+\infty]$. If $f(\cdot,\omega)$ is (convex) and closed for almost all $\omega$, and measurable in $\omega$ for each x such that $\operatorname{dom} f(\cdot,\omega)$ has a non-empty interior for every $\omega$, then f is a normal (convex) integrand.


Theorem 4.9. For all $p \in [1,\infty]$ the spaces $L^p(\Omega,\mathcal{F},P)$ are Banach spaces.


Before we prove the main result of this section, theorem 4.10, we will prove some lemmas which will make the final proof easier.

Lemma 4.4. The functional $B : L^1(\Omega,\mathcal{F},P) \to \mathbb{R}$ defined by $B(z) = \int_\Omega z(\omega)\, dP(\omega)$ is continuous and linear.

Proof. Take $z_1, z_2 \in L^1(\Omega,\mathcal{F},P)$. We need to show that $\forall \varepsilon > 0\ \exists \delta > 0$ such that if $\|z_1 - z_2\|_{L^1} < \delta$ then $|B(z_1) - B(z_2)| < \varepsilon$. Suppose that $\|z_1 - z_2\|_{L^1} = \int_\Omega |z_1(\omega) - z_2(\omega)|\, dP(\omega) < \delta$. We can now conclude that

$$|B(z_1) - B(z_2)| = \left|\int_\Omega z_1(\omega)\, dP(\omega) - \int_\Omega z_2(\omega)\, dP(\omega)\right| \le \int_\Omega |z_1(\omega) - z_2(\omega)|\, dP(\omega) < \delta. \tag{4.20}$$

Hence for each $\varepsilon > 0$ we can pick $\delta = \varepsilon$.
The linearity of the functional B follows from the fact that the Lebesgue integral is linear.

A standard result from functional analysis yields that a linear operator between normed spaces is bounded if and only if it is a continuous linear operator. From this we can conclude that B is a bounded functional.

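Lemma 4.5. If A and B are sets such that $A \subset B$ then $\operatorname{int}(A) \subset \operatorname{int}(B)$ (see footnote 4).

Proof. Take $a \in \operatorname{int}(A)$, then there exists a neighbourhood U of a with $U \subset A$. Because $A \subset B$ we have $U \subset B$. Hence U is a neighbourhood of a in B. We conclude that $a \in \operatorname{int}(B)$. Because a was chosen arbitrarily, we can conclude that $\operatorname{int}(A) \subset \operatorname{int}(B)$.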

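Lemma 4.6. If $X \in L^\infty(\Omega,\mathcal{F},P)$, then the function $g : L^1(\Omega,\mathcal{F},P) \to \mathbb{R}$ defined by $g(z) := \int_\Omega X(\omega) z(\omega)\, dP(\omega)$ is continuous.

Proof. Take $z_1, z_2 \in L^1(\Omega,\mathcal{F},P)$ such that $\|z_1 - z_2\|_{L^1} < \delta$ for some $\delta > 0$. This means that $\int_\Omega |z_1(\omega) - z_2(\omega)|\, dP(\omega) < \delta$. Then we have that

$$
\begin{aligned}
|g(z_1) - g(z_2)| &= \left|\int_\Omega X(\omega)\left(z_1(\omega) - z_2(\omega)\right) dP(\omega)\right| \\
&\le \int_\Omega \left|X(\omega)\left(z_1(\omega) - z_2(\omega)\right)\right| dP(\omega) \\
&\le \sup_\omega |X(\omega)| \int_\Omega |z_1(\omega) - z_2(\omega)|\, dP(\omega) \\
&< \sup_\omega |X(\omega)|\, \delta.
\end{aligned}
$$

Because $X \in L^\infty(\Omega,\mathcal{F},P)$ we have $\sup_\omega |X(\omega)| < +\infty$. Hence if we pick $\delta = \frac{\varepsilon}{\sup_\omega |X(\omega)|} > 0$ then $\|z_1 - z_2\|_{L^1} < \delta$ implies $|g(z_1) - g(z_2)| < \varepsilon$.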

We can now prove the main theorem of this section.

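Theorem 4.10. Let $\varphi \in \Phi$ and let $X \in L^\infty(\Omega,\mathcal{F},P)$. Then

$$\inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + I_\varphi(Q|P)\right) = \sup_{\eta \in \mathbb{R}} \left(\eta - E_P[\varphi^*(\eta - X)]\right). \tag{4.21}$$

Therefore with $u(t) := -\varphi^*(-t)$, we have

$$
\begin{aligned}
OCE_u(X) &= \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + I_\varphi(Q|P)\right) && (4.22) \\
&= -\sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - I_\varphi(Q|P)\right) && (4.23) \\
&= -D_\varphi(X). && (4.24)
\end{aligned}
$$

4 int(A) denotes the interior of the set A.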

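Proof. Take $\varphi \in \Phi$. Let $v := \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + I_\varphi(Q|P)\right)$. Now fix $Q \in \mathcal{M}_1(P)$. Then by definition of $\mathcal{M}_1(P)$, Q is absolutely continuous with respect to P. By the Radon-Nikodym theorem this is equivalent to the existence of a density $z(\omega) := \frac{dQ}{dP}(\omega)$. We have that $z \ge 0$ a.e. and it is clear that $\int_\Omega |z(\omega)|\, dP(\omega) = 1$. Hence we have that $z \in L^1(\Omega,\mathcal{F},P)$. Therefore

$$
\begin{aligned}
v &= \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + I_\varphi(Q|P)\right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + E_P\left[\varphi\left(\frac{dQ}{dP}\right)\right]\right) \\
&= \inf_{z \in L^1} \left\{\int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega X(\omega) z(\omega)\, dP(\omega) \;\middle|\; \int_\Omega z(\omega)\, dP(\omega) = 1,\ z \ge 0 \text{ a.e.}\right\} \\
&= \inf_{z \in L^1} \left\{\int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega X(\omega) z(\omega)\, dP(\omega) \;\middle|\; \int_\Omega z(\omega)\, dP(\omega) = 1\right\}.
\end{aligned}
$$

The last equality follows from the fact that if $z(\omega) < 0$ on a set $S \subset \Omega$ with $P(S) > 0$ then z cannot correspond to the Radon-Nikodym derivative of a probability measure Q with respect to P. By definition of the $\varphi$-divergence we then have that $\int_\Omega \varphi(z(\omega))\, dP(\omega) = +\infty$. Furthermore we always have that $-\infty < \int_\Omega X(\omega) z(\omega)\, dP(\omega) \le \sup_{\omega \in \Omega} |X(\omega)| \int_\Omega z(\omega)\, dP(\omega) < +\infty$. From all this it follows that if $z(\omega) < 0$ on a set $S \subset \Omega$ with $P(S) > 0$ then $\int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega X(\omega) z(\omega)\, dP(\omega) = +\infty$. Therefore we can conclude that the last equality holds.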

We want to apply theorem 4.5 regarding Fenchel duality for linear constraints. In the context of this theorem let $f : L^1(\Omega,\mathcal{F},P) \to (-\infty,+\infty]$ be defined by

$$f(z) := \int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega X(\omega) z(\omega)\, dP(\omega) \tag{4.26}$$

and let $A : L^1(\Omega,\mathcal{F},P) \to \mathbb{R}$ be defined by

$$A(z) := \int_\Omega z(\omega)\, dP(\omega). \tag{4.27}$$

Then A is linear because the Lebesgue integral is linear. It is bounded because it is also a continuous functional, see lemma 4.4. Let $b = 1$ and note that $\mathbb{R}^* = \mathbb{R}$. We will now calculate

$$d := \sup_{\mu \in \mathbb{R}} \left(\langle b, \mu \rangle - f^*(A^*\mu)\right). \tag{4.28}$$

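We have that

$$
\begin{aligned}
f^*(A^*\mu) &= \sup_{z \in L^1} \left(\langle A^*\mu, z \rangle - f(z)\right) \\
&= \sup_{z \in L^1} \left(\langle \mu, Az \rangle - f(z)\right) \\
&= \sup_{z \in L^1} \left(\mu \int_\Omega z(\omega)\, dP(\omega) - \int_\Omega \varphi(z(\omega))\, dP(\omega) - \int_\Omega X(\omega) z(\omega)\, dP(\omega)\right) \\
&= \sup_{z \in L^1} \left(-\int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega (\mu - X(\omega)) z(\omega)\, dP(\omega)\right) \\
&= -\inf_{z \in L^1} \left(\int_\Omega \varphi(z(\omega))\, dP(\omega) - \int_\Omega (\mu - X(\omega)) z(\omega)\, dP(\omega)\right) \\
&= -\inf_{z \in L^1} \left(\int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega)\right).
\end{aligned}
$$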

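We now want to apply theorem 4.7. To be able to apply this theorem we first need to check that $I(s,\omega) := \varphi(s) - (\mu - X(\omega))s$ is a normal integrand. For this we can use theorem 4.8. $I(s,\omega)$ is convex and closed in s for almost all $\omega$ because $\varphi$ is convex and closed. Moreover $1 \in \operatorname{int}(\operatorname{dom} I(\cdot,\omega))$ for every $\omega$, because $1 \in \operatorname{int}(\operatorname{dom}\varphi)$, so the interior of $\operatorname{dom} I(\cdot,\omega)$ is non-empty.
We also need to prove that

$$\inf_{z \in L^1} \left(\int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega)\right) \tag{4.29}$$

is finite. By assumption the minimum of $\varphi$ is 0 which is attained at 1. Hence we have for all z

$$-\infty < \int_\Omega (\mu - X(\omega))\, dP(\omega) \le \int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega). \tag{4.30}$$

The first strict inequality follows from the fact that $E_P[X]$ is finite. Hence $\int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega)$ is bounded from below, which implies that $-\infty < \inf_{z \in L^1} \left(\int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega)\right)$. Because $z = 1$ is a possible solution we have

$$\inf_{z \in L^1} \left(\int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega)\right) \le -\int_\Omega (\mu - X(\omega))\, dP(\omega) < +\infty.$$

We conclude that $\inf_{z \in L^1} \left(\int_\Omega \left(\varphi(z(\omega)) - (\mu - X(\omega)) z(\omega)\right) dP(\omega)\right)$ is finite and that we can apply theorem 4.7. We find that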

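$$
\begin{aligned}
f^*(A^*\mu) &= -\int_\Omega \inf_{s \in \mathbb{R}} \left(\varphi(s) - (\mu - X(\omega))s\right) dP(\omega) \\
&= \int_\Omega \sup_{s \in \mathbb{R}} \left((\mu - X(\omega))s - \varphi(s)\right) dP(\omega) \\
&= \int_\Omega \varphi^*(\mu - X(\omega))\, dP(\omega).
\end{aligned}
$$

Using the fact that $b = 1$ we can conclude that

$$d = \sup_{\mu \in \mathbb{R}} \left(\mu - f^*(A^*\mu)\right) = \sup_{\mu \in \mathbb{R}} \left(\mu - \int_\Omega \varphi^*(\mu - X(\omega))\, dP(\omega)\right).$$

Using theorem 4.5 we can conclude that we have weak duality, which means that

$$\inf_{z \in L^1} \left\{\int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega X(\omega) z(\omega)\, dP(\omega) \;\middle|\; \int_\Omega z(\omega)\, dP(\omega) = 1\right\} \ge \sup_{\mu \in \mathbb{R}} \left(\mu - \int_\Omega \varphi^*(\mu - X(\omega))\, dP(\omega)\right).$$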

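We now want to show that equality holds. We need to show that $1 \in \operatorname{core}(A \operatorname{dom} f)$; this will follow from the assumption that $1 \in \operatorname{int}(\operatorname{dom}\varphi)$. We will first show that $\operatorname{dom}\varphi \cap \mathbb{R} \subset A \operatorname{dom} f$. Take $w \in \operatorname{dom}\varphi \cap \mathbb{R}$. Then by definition of the effective domain $M := \varphi(w) < +\infty$ and, viewing w as a constant function, we have that $\int_\Omega \varphi(w)\, dP(\omega) + \int_\Omega X(\omega) w\, dP(\omega) = M + wE_P[X] < +\infty$. Hence $w \in \operatorname{dom} f$. Because $Aw = w$ we have $w \in A \operatorname{dom} f$. By assumption we have $1 \in \operatorname{int}(\operatorname{dom}\varphi \cap \mathbb{R})$. Using lemma 4.5 we can conclude that $1 \in \operatorname{int}(A \operatorname{dom} f)$. Using lemma 4.3 we conclude that $1 \in \operatorname{core}(A \operatorname{dom} f)$.
We also need to show that f is lower semicontinuous, which means we need to prove that

$$\liminf_{z_n \to z_0} f(z_n) \ge f(z_0). \tag{4.31}$$

Denote $h(z) := \int_\Omega \varphi(z(\omega))\, dP(\omega)$ and $g(z) := \int_\Omega X(\omega) z(\omega)\, dP(\omega)$. Then $f(z) = h(z) + g(z)$.
Using the super-additivity property of the limit inferior we find that

$$\liminf_{z_n \to z_0} f(z_n) \ge \liminf_{z_n \to z_0} h(z_n) + \liminf_{z_n \to z_0} g(z_n).$$

We know from lemma 4.6 that g is continuous. This implies that g is lower semicontinuous and we can conclude that $\liminf_{z_n \to z_0} g(z_n) \ge g(z_0)$.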

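For each sequence $z_n \in L^1$ converging to $z_0$, we can define a sequence $\varphi_n(\omega) := \varphi(z_n(\omega))$ and set $\varphi_0(\omega) := \varphi(z_0(\omega))$. Then

$$\liminf_{z_n \to z_0} \int_\Omega \varphi(z_n(\omega))\, dP(\omega) = \liminf_{n\to\infty} \int_\Omega \varphi_n(\omega)\, dP(\omega).$$

We have assumed that the minimum of $\varphi$ is 0. Therefore we have that $\varphi_n(\omega) = \varphi(z_n(\omega)) \ge 0$. Because $\int_\Omega 0\, dP(\omega) = 0 > -\infty$ and the $\varphi_n$ are measurable functions, since $\varphi$ is a measurable function (see footnote 5), we can use Fatou's lemma (theorem 4.6 with $g = 0$).
We have that

$$
\begin{aligned}
\liminf_{n\to\infty} \int_\Omega \varphi_n(\omega)\, dP(\omega) &\ge \int_\Omega \liminf_{n\to\infty} \varphi_n(\omega)\, dP(\omega) \\
&= \int_\Omega \liminf_{z_n \to z_0} \varphi(z_n(\omega))\, dP(\omega) \\
&\ge \int_\Omega \varphi(z_0(\omega))\, dP(\omega) \\
&= h(z_0),
\end{aligned}
$$

where in the last inequality we have used that $\varphi$ is lower semicontinuous. We find that

$$\liminf_{z_n \to z_0} f(z_n) \ge \liminf_{z_n \to z_0} h(z_n) + \liminf_{z_n \to z_0} g(z_n) \ge h(z_0) + g(z_0) = f(z_0). \tag{4.32}$$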

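This proves the lower semicontinuity of f.
We can conclude that

$$\inf_{z \in L^1} \left\{\int_\Omega \varphi(z(\omega))\, dP(\omega) + \int_\Omega X(\omega) z(\omega)\, dP(\omega) \;\middle|\; \int_\Omega z(\omega)\, dP(\omega) = 1\right\} = \sup_{\mu \in \mathbb{R}} \left(\mu - \int_\Omega \varphi^*(\mu - X(\omega))\, dP(\omega)\right), \tag{4.33}$$

which means that

$$\inf_{Q \in \mathcal{M}_1(P)} \left(I_\varphi(Q|P) + E_Q[X]\right) = \sup_{\mu \in \mathbb{R}} \left(\mu - E_P[\varphi^*(\mu - X)]\right). \tag{4.34}$$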
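The duality just proven can be checked numerically on a finite sample space for one particular divergence. The sketch below assumes $\varphi(t) = t\ln t - t + 1$, so that $\varphi^*(y) = e^{y} - 1$ and $I_\varphi(Q|P)$ is the relative entropy; the data and the function names `primal_value` and `dual_value` are hypothetical. In this case both sides equal $-\ln E_P[e^{-X}]$, the primal infimum being attained at the measure with density proportional to $e^{-X}$.

```python
import math
import random

def primal_value(q, p, outcomes):
    """E_Q[X] + I_phi(Q|P) for phi(t) = t ln t - t + 1 on a finite space.

    For probability vectors the divergence reduces to sum q_i ln(q_i / p_i).
    """
    expectation = sum(qi * x for qi, x in zip(q, outcomes))
    divergence = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
    return expectation + divergence

def dual_value(p, outcomes):
    """sup over eta of eta - E_P[phi*(eta - X)] with phi*(y) = exp(y) - 1.

    The concave objective is maximised at eta = -ln E_P[exp(-X)].
    """
    mgf = sum(pi * math.exp(-x) for pi, x in zip(p, outcomes))
    eta = -math.log(mgf)
    return eta - sum(pi * (math.exp(eta - x) - 1.0) for pi, x in zip(p, outcomes))

# Hypothetical example data (assumption, not taken from the thesis).
outcomes = [-1.0, 0.5, 2.0]
p = [0.2, 0.5, 0.3]

# Candidate minimiser of the primal problem: density proportional to exp(-X).
mgf = sum(pi * math.exp(-x) for pi, x in zip(p, outcomes))
q_star = [pi * math.exp(-x) / mgf for pi, x in zip(p, outcomes)]

# Randomly drawn alternative models Q, all absolutely continuous w.r.t. P.
rng = random.Random(0)
random_primals = []
for _ in range(1000):
    weights = [rng.random() + 1e-9 for _ in p]
    total = sum(weights)
    q = [w / total for w in weights]
    random_primals.append(primal_value(q, p, outcomes))

print(primal_value(q_star, p, outcomes), dual_value(p, outcomes), min(random_primals))
```

None of the randomly drawn models falls below the dual value, and the candidate minimiser reaches it, which is consistent with equality in (4.34) for this choice of $\varphi$.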


Using the relationship between optimised certainty equivalents and $\varphi$-divergence risk measures and the relationship between u-Mean certainty equivalents and utility based shortfall risk measures we can derive the robust representation of a utility based shortfall risk measure. This representation was found in [13]. Their proof is rather technical and is outside the scope of this thesis. Therefore we will use the proof suggested in [5] to obtain this result. For this we will first derive some elementary properties of the Legendre transform. In [11] we found the following result.

Theorem 4.11. If $\varphi$ is a proper convex function which is lower semicontinuous, then $\varphi^{**} = \varphi$, i.e.

$$\varphi(t) = \sup_{x \in \mathbb{R}} \left(xt - \varphi^*(x)\right). \tag{4.35}$$

5 Because it is lower semicontinuous.

From this result we can conclude that if $\varphi$ is the divergence function which is linked to the utility function u by $u(t) = -\varphi^*(-t)$, or equivalently to the loss function by $\varphi^*(t) = l(t)$, then $\varphi$ can be obtained by $\varphi(t) = l^*(t)$.

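Lemma 4.7. Let f be a convex function and let $f^*$ denote its Legendre transform. If $\lambda > 0$ then $(\lambda f)^*(t) = \lambda f^*\left(\frac{t}{\lambda}\right)$.

Proof.

$$(\lambda f)^*(t) = \sup_{x \in \mathbb{R}} \left(xt - \lambda f(x)\right) = \lambda \sup_{x \in \mathbb{R}} \left(x\frac{t}{\lambda} - f(x)\right) = \lambda f^*\left(\frac{t}{\lambda}\right).$$

Lemma 4.8. Let l be a convex function and define $\tilde{l} := l - x_0$ with $x_0 \in \mathbb{R}$. Then $\tilde{l}^* = l^* + x_0$.

Proof.

$$\tilde{l}^*(t) = \sup_{x \in \mathbb{R}} \left(xt - \tilde{l}(x)\right) = \sup_{x \in \mathbb{R}} \left(xt - l(x) + x_0\right) = \sup_{x \in \mathbb{R}} \left(xt - l(x)\right) + x_0 = l^*(t) + x_0.$$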

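Theorem 4.12. For any convex loss function l, the minimal penalty function in the representation (4.7) is given by

$$\alpha_{\min}(Q) = \inf_{\lambda>0} \frac{1}{\lambda}\left(x_0 + E_P\left[l^*\left(\lambda\frac{dQ}{dP}\right)\right]\right), \quad Q \in \mathcal{M}_1(P). \tag{4.36}$$

In particular we have

$$SF^{l}_{x_0}(X) = \max_{Q \in \mathcal{M}_1(P)} \left[E_Q(-X) - \inf_{\lambda>0} \frac{1}{\lambda}\left(x_0 + E_P\left[l^*\left(\lambda\frac{dQ}{dP}\right)\right]\right)\right], \quad X \in L^\infty. \tag{4.37}$$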

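Proof. By definition we have that

$$
\begin{aligned}
SF^{l}_{x_0}(X) &= \inf\{\eta \in \mathbb{R} \mid E_P[l(-X-\eta)] \le x_0\} \\
&= \inf\{\eta \in \mathbb{R} \mid E_P[l(-X-\eta) - x_0] \le 0\} \\
&= \inf\{\eta \in \mathbb{R} \mid E_P[\tilde{l}(-X-\eta)] \le 0\} \\
&= SF^{\tilde{l}}_{0}(X),
\end{aligned}
$$

where $\tilde{l} = l - x_0$. Denote by $\tilde{u}$ the associated utility function. From equation 4.2 we have that $SF^{l}_{x_0}(X) = -M_{\tilde{u}}(X)$. Using equation 4.9 and theorem 4.10 we have that

$$
\begin{aligned}
M_{\tilde{u}}(X) &= \inf_{\lambda>0} \left(OCE_{\lambda\tilde{u}}(X)\right) \\
&= \inf_{\lambda>0} \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + I_\varphi(Q|P)\right) \\
&= \inf_{\lambda>0} \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + E_P\left[\varphi\left(\frac{dQ}{dP}\right)\right]\right) \\
&= \inf_{\lambda>0} \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + E_P\left[\lambda\tilde{l}^*\left(\frac{dQ}{\lambda\, dP}\right)\right]\right) \\
&= \inf_{\lambda>0} \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + E_P\left[\lambda x_0 + \lambda l^*\left(\frac{dQ}{\lambda\, dP}\right)\right]\right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + \inf_{\lambda>0} E_P\left[\lambda x_0 + \lambda l^*\left(\frac{dQ}{\lambda\, dP}\right)\right]\right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + \inf_{\lambda>0} \lambda E_P\left[x_0 + l^*\left(\frac{dQ}{\lambda\, dP}\right)\right]\right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left(E_Q[X] + \inf_{\lambda>0} \frac{1}{\lambda} E_P\left[x_0 + l^*\left(\lambda\frac{dQ}{dP}\right)\right]\right) \\
&= -\sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - \inf_{\lambda>0} \frac{1}{\lambda} E_P\left[x_0 + l^*\left(\lambda\frac{dQ}{dP}\right)\right]\right).
\end{aligned}
$$

In the fourth equality we have used that $\varphi^*(t) = -\lambda\tilde{u}(-t)$ and hence $\varphi(t) = (\lambda\tilde{l})^*(t)$. Using lemma 4.7 we have that $(\lambda\tilde{l})^*(t) = \lambda\tilde{l}^*\left(\frac{t}{\lambda}\right)$. In the fifth equality we have used lemma 4.8. In the second to last equality we have replaced $\lambda$ by $1/\lambda$, which does not change the infimum over $\lambda > 0$.
Using the relation $SF^{l}_{x_0}(X) = -M_{\tilde{u}}(X)$ we can conclude that

$$SF^{l}_{x_0}(X) = \sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - \inf_{\lambda>0} \frac{1}{\lambda} E_P\left[x_0 + l^*\left(\lambda\frac{dQ}{dP}\right)\right]\right), \tag{4.38}$$

which proves the theorem.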

We have shown the relation of utility based risk measures with the certainty equivalents defined in chapter two. We have also discussed the relation between these risk measures. We summarize the main results in table 4.1.

Table 4.1: summary utility based risk measures

| | $SF^{l}_{x_0}(X)$ | $D_\varphi(X)$ |
|---|---|---|
| Certainty equivalent | $-M_{\tilde{u}}(X)$ with $\tilde{u}(x) = -l(-x) + x_0$ | $-OCE_u(X)$ with $-u(-x) = \varphi^*(x)$ |
| Utility representation | $-\sup\{m \in \mathbb{R} \mid E[u(X-m)] \ge -x_0\}$ | $-\sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X-\eta)]\right)$ |
| Penalty function | $\inf_{\lambda>0} \frac{1}{\lambda} E_P\left[x_0 + l^*\left(\lambda\frac{dQ}{dP}\right)\right]$ | $E_P\left[\varphi\left(\frac{dQ}{dP}\right)\right]$ |

At this point we will take a closer look at the assumptions we have made. One of these assumptions was that the utility functions we use are normalised. We followed [5] by only considering the subset of utility functions which are non-decreasing and concave and for which $u(0) = 0$ and $1 \in \partial u(0)$. The authors of [5] give no clear explanation why they chose this normalisation. However they do state that they need this normalisation to be able to give a clear economic interpretation of the optimised certainty equivalent. They interpret the optimised certainty equivalent as a decision problem and use the utility function to 'discount' an uncertain payoff. If X is an uncertain payoff, then $E[u(X)]$ is the value of this payoff. If you give the investor the possibility to consume a part $\eta$ of this uncertain income in advance, then he gets $\eta + E[u(X-\eta)]$. The investor tries to optimise the decision on how much to consume in advance. Using this normalisation they guarantee that $u(x) \le x$. As we have shown in the second chapter this is equivalent with $OCE_u(X) \le E[X]$, a condition which reflects risk aversion. If the investor consumes too much in advance then $u(\cdot)$ will penalise this. If on the other hand the investor consumes too little in advance then this can be seen as a missed opportunity. All his money (or more) is stuck in the uncertain payoff and since he is risk averse, this would also be penalised by $u(\cdot)$. The investor's optimal allocation results in the optimised certainty equivalent.

Remember that under the von Neumann-Morgenstern axioms the utility function of an investor is only unique up to an affine transformation. This has the undesirable effect that the same investor, modelled by two different utility functions, can have different optimised certainty equivalents, because the optimised certainty equivalent is not invariant under an affine transformation of the utility function u. This makes the optimised certainty equivalent not a 'real' certainty equivalent.

From an economic point of view the standardisation of the utility functions is essential to give a clear interpretation to the optimised certainty equivalent, and to bypass the problem that the optimised certainty equivalent is not invariant under an affine transformation of the utility function.

From a mathematical point of view however, the dependence of the optimised certainty equivalent on the specific standardisation of the utility function is not really a problem but can be viewed as an opportunity. By wisely choosing a standardisation of the utility function, one can alter the optimised certainty equivalent, and thus the risk measure. Now a new question occurs: "What would be a good standardisation of the utility function from a mathematical point of view?" To answer this question we will need to further examine the connection between the utility function and the divergence function.

In the robust representation of the divergence risk measure one can observe that the divergence is a penalty function. A good penalty function would heavily penalise models Q which deviate a lot from the fixed model P, while lightly penalising models which are very close to P. Therefore it would be intuitive to assume that the divergence penalises the model P the least. That is, $\varphi(t)$ attains its minimum for $t = 1$. Using that $-u(-x) = \varphi^*(x)$ we have that

$$u(0) = -\sup_{t \in \mathbb{R}} \left(0 \cdot t - \varphi(t)\right) = \inf_{t \in \mathbb{R}} \varphi(t).$$

Assuming the infimum is attained we can conclude that $u(0)$ is the minimal penalty given. In what follows we will assume that $u \in C^1$. We are interested in what the condition $u'(0) = 1$ imposes on the divergence function $\varphi$.
Notice that $\varphi(1) = \sup_{x \in \mathbb{R}} \left(1 \cdot x + u(-x)\right)$. If $u'(0) = 1$, then the first order condition $1 - u'(-x) = 0$ has a solution $x = 0$. Because u is concave, the objective $x + u(-x)$ is concave as well, so this stationary point is a maximiser. Therefore $\varphi(1) = u(0)$, which again states that the penalty given to P equals $u(0)$.
Hence the standardisation $u(0) = 0$ and $u'(0) = 1$ implies that $\varphi$ attains its minimum at 1 and $\varphi(1) = 0$.

4.2.2 The coherence of divergence risk measures

We know that divergence risk measures are always convex. This follows easily from the properties of the optimised certainty equivalent proven in theorem 2.4 of the second chapter. However divergence risk measures are not always coherent. In chapter one, theorem 1.7, we have seen that the penalty function of a coherent risk measure can only take the values 0 or $+\infty$. In the case of divergence risk measures this would imply that the divergence is either 0 or $+\infty$. This is a rather restrictive condition. It is now natural to ask which utility functions give rise to coherent divergence risk measures. This question was answered in [5]. For this the authors considered the class of strongly risk averse utility functions $U_0^{<}$, i.e.

$$u \in U_0^{<} \text{ if and only if } u \in U_0 \text{ and } u(t) < t \ \ \forall t \ne 0.$$

We will further assume that u is continuous. In [5, lemma 2.1] we find the following lemma.

Lemma 4.9. Let $u : \mathbb{R} \to [-\infty,+\infty]$ be a proper closed and concave function. Then the right and left derivatives $u'_+$ and $u'_-$ exist as extended real numbers, and

1. for all $a < t < b$ we have that $u'_+(a) \ge u'_-(t) \ge u'_+(t) \ge u'_-(b)$, and

2. the subdifferential is given by

$$\partial u(t) = \{s \in \mathbb{R} \mid u'_+(t) \le s \le u'_-(t)\}. \tag{4.39}$$

Denote $g(\eta) := \eta + E[u(X-\eta)]$; then it is stated in [5, proposition 2.1] that

$$\eta^* \in \arg\max(g(\eta)) \Leftrightarrow E\left[u'_+(X-\eta^*)\right] \le 1 \le E\left[u'_-(X-\eta^*)\right]. \tag{4.40}$$

Here the authors have assumed that they can freely interchange the derivative and the expectation operator. They claim this is the case when the one-sided derivatives of u are continuous and when the associated expected values are finite. Hence if $u \in C^1$ we have that

$$\eta^* \in \arg\max(g(\eta)) \Leftrightarrow E\left[u'(X-\eta^*)\right] = 1. \tag{4.41}$$

In [5, Theorem 3.1] we find the following theorem which characterises the utility functions for which the associated divergence risk measure is coherent.

Theorem 4.13. In the class $U_0^{<}$ of strongly risk-averse utility functions that are finite valued, the divergence risk measure $D_\varphi(X) = -OCE_u(X)$ is a coherent risk measure if and only if u is the piecewise linear function given by

$$
u(x) =
\begin{cases}
\gamma_2 x, & \text{if } x \le 0 \\
\gamma_1 x, & \text{if } x > 0
\end{cases}
\tag{4.42}
$$

for some $\gamma_2 > 1 > \gamma_1 \ge 0$.
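The positive homogeneity behind Theorem 4.13 can be observed numerically for the piecewise linear utility (4.42). The sketch below uses hypothetical slopes $\gamma_1 = 0.5$, $\gamma_2 = 2$ and a hypothetical three-point payoff; the function names are made up for this illustration. It relies on the observation that, for a discrete payoff, $\eta \mapsto \eta + E[u(X-\eta)]$ is concave and piecewise linear with kinks only at the outcomes of X, so the supremum is attained at one of them (the slope is $1 - \gamma_1 > 0$ far to the left and $1 - \gamma_2 < 0$ far to the right).

```python
def u(x, gamma1=0.5, gamma2=2.0):
    """Piecewise linear utility (4.42) with assumed slopes gamma2 > 1 > gamma1 >= 0."""
    return gamma2 * x if x <= 0 else gamma1 * x

def oce(outcomes, probs, utility):
    """sup over eta of eta + E[u(X - eta)] for a discrete payoff X.

    For piecewise linear concave u it suffices to evaluate the objective at
    the outcomes of X, where its kinks are located.
    """
    return max(
        eta + sum(p * utility(x - eta) for x, p in zip(outcomes, probs))
        for eta in outcomes
    )

# Hypothetical example data (assumption, not taken from the thesis).
outcomes = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

base = oce(outcomes, probs, u)
doubled = oce([2.0 * x for x in outcomes], probs, u)
halved = oce([0.5 * x for x in outcomes], probs, u)
print(base, doubled, halved)
```

Scaling the payoff by 2 or by 0.5 scales the optimised certainty equivalent by the same factor, in contrast with the exponential utility used in the bond example of section 4.1.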

Proof. The proof of this theorem was taken from [5, propositions 3.1, 3.2] and consists of two parts, theorem 4.14 and 4.15.

Theorem 4.14. Let $u \in U_0^{<}$. Then $OCE_u(X)$ is positively homogeneous for all random variables X if and only if u is positively homogeneous.

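Proof. First suppose that u is positively homogeneous. Then we need to show that $OCE_u$ is positively homogeneous. Take $\lambda > 0$. Substituting $\lambda\eta$ for $\eta$ in the supremum, we have that

$$
\begin{aligned}
OCE_u(\lambda X) &= \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(\lambda X - \eta)]\right) \\
&= \sup_{\eta \in \mathbb{R}} \left(\lambda\eta + E[u(\lambda X - \lambda\eta)]\right) \\
&= \sup_{\eta \in \mathbb{R}} \left(\lambda\eta + \lambda E[u(X - \eta)]\right) \\
&= \lambda \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right) \\
&= \lambda\, OCE_u(X).
\end{aligned}
$$

This proves that the OCE is positively homogeneous. Conversely, suppose that $OCE_u$ is positively homogeneous. Take $\alpha > 0 > \beta$, and consider the random variable X such that $P(X = \alpha) = p$ and $P(X = \beta) = 1-p$. Now denote

$$g(\eta) := \eta + pu(\alpha - \eta) + (1-p)u(\beta - \eta). \tag{4.43}$$

Then the optimised certainty equivalent is given by

$$OCE_u(X) = \sup_{\eta \in \mathbb{R}} \left(\eta + pu(\alpha - \eta) + (1-p)u(\beta - \eta)\right) = \sup_{\eta \in \mathbb{R}} g(\eta). \tag{4.44}$$

Because $u \in U_0^{<}$ we have that $1 \in \partial u(0)$. Hence by lemma 4.9 we have that

$$u'_+(0) \le 1 \le u'_-(0).$$

Because $\alpha > 0 > \beta$ we can again apply lemma 4.9 such that

$$u'_-(\alpha) \ge u'_+(\alpha) \ge u'_-(0) \ge 1 \ge u'_+(0) \ge u'_-(\beta) \ge u'_+(\beta) > 0. \tag{4.45}$$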

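We know from equation 4.40 that

$$\eta^* \in \arg\max(g(\eta)) \Leftrightarrow E\left[u'_-(X-\eta^*)\right] \ge 1 \ge E\left[u'_+(X-\eta^*)\right]. \tag{4.46}$$

Hence $0 \in \arg\max(g(\eta))$ if and only if

$$pu'_-(\alpha) + (1-p)u'_-(\beta) \ge 1 \ge pu'_+(\alpha) + (1-p)u'_+(\beta).$$

From this we have that

$$pu'_-(\alpha) + (1-p)u'_-(\beta) \ge 1 \Leftrightarrow p \ge \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)},$$

and

$$pu'_+(\alpha) + (1-p)u'_+(\beta) \le 1 \Leftrightarrow p \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}.$$

Hence

$$0 \in \arg\max(g(\eta)) \Leftrightarrow \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} \le p \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}. \tag{4.47}$$

We will now check whether the right hand side of equation 4.47 is well defined. Because $u'_-(\alpha) > 1 > u'_-(\beta) > 0$, and similarly $u'_+(\alpha) > 1 > u'_+(\beta) > 0$, we have that $0 < \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} < 1$ and $0 < \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)} < 1$. We conclude that we always have that $p \in (0,1)$.
We also need to show that

$$\frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}. \tag{4.48}$$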

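Because both denominators are positive, it suffices to prove that

\[ (1 - u'_+(\beta))(u'_-(\alpha) - u'_-(\beta)) - (1 - u'_-(\beta))(u'_+(\alpha) - u'_+(\beta)) \ge 0. \tag{4.49} \]

We have that

\begin{align*}
&(1 - u'_+(\beta))(u'_-(\alpha) - u'_-(\beta)) - (1 - u'_-(\beta))(u'_+(\alpha) - u'_+(\beta)) \\
&\quad = u'_-(\alpha) - u'_-(\beta) - u'_-(\alpha)u'_+(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad = u'_-(\alpha)\left(1 - u'_+(\beta)\right) - u'_-(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad \ge u'_+(\alpha)\left(1 - u'_+(\beta)\right) - u'_-(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad = u'_+(\alpha)\left(1 - u'_+(\beta) - 1 + u'_-(\beta)\right) - u'_-(\beta) + u'_+(\beta) \\
&\quad = u'_+(\alpha)\left(u'_-(\beta) - u'_+(\beta)\right) - \left(u'_-(\beta) - u'_+(\beta)\right) \\
&\quad = \left(u'_-(\beta) - u'_+(\beta)\right)\left(u'_+(\alpha) - 1\right) \ge 0.
\end{align*}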

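In the first inequality we have used the facts that $u'_-(\alpha) \ge u'_+(\alpha)$ and $1 - u'_+(\beta) \ge 0$. We can conclude that the expression on the right side of 4.47 is well defined. Now take $p_0$ such that it satisfies 4.47. Then $\eta^* = 0$ is an optimal solution and $\mathrm{OCE}_u(X) = p_0 u(\alpha) + (1 - p_0)u(\beta)$. Take $\lambda \in (0, 1)$. Because we do not necessarily know that $\eta^* = 0$ is an optimal solution for $\mathrm{OCE}_u(\lambda X)$, we only have the following inequalities:

\begin{align*}
\mathrm{OCE}_u(\lambda X) &= \sup_{\eta \in \mathbb{R}} \left( \eta + E[u(\lambda X - \eta)] \right) \\
&\ge p_0 u(\lambda\alpha) + (1 - p_0)u(\lambda\beta) \\
&= p_0 u(\lambda\alpha + (1-\lambda)0) + (1 - p_0)u(\lambda\beta + (1-\lambda)0) \\
&\ge \lambda\left(p_0 u(\alpha) + (1 - p_0)u(\beta)\right) + (1-\lambda)\left(p_0 u(0) + (1 - p_0)u(0)\right) \\
&= \lambda\left(p_0 u(\alpha) + (1 - p_0)u(\beta)\right) \\
&= \lambda\,\mathrm{OCE}_u(X).
\end{align*}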

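Here we have used that $u$ is a concave function such that $u(0) = 0$. Because we assumed that the optimised certainty equivalent is positively homogeneous, all inequalities must hold with equality. Hence we have that

\[ p_0 u(\lambda\alpha) + (1 - p_0)u(\lambda\beta) = \lambda p_0 u(\alpha) + \lambda(1 - p_0)u(\beta). \]

We can rewrite this and find that

\[ p_0\left(u(\lambda\alpha) - \lambda u(\alpha)\right) + (1 - p_0)\left(u(\lambda\beta) - \lambda u(\beta)\right) = 0. \tag{4.50} \]

Because $u$ is a concave utility function for which $u(0) = 0$, we have for all $x \in \mathbb{R}$ and $\lambda \in [0, 1]$ that

\[ u(\lambda x) = u(\lambda x + (1-\lambda)0) \ge \lambda u(x) + (1-\lambda)u(0) = \lambda u(x). \]

Because $p_0 \in (0, 1)$ and $u(\lambda x) - \lambda u(x) \ge 0$ for all $x \in \mathbb{R}$, both terms in the sum 4.50 are non-negative, and because they sum to zero they must both be zero. Since $\alpha$ and $\beta$ were arbitrary points on opposite sides of zero, we can conclude that

\[ u(\lambda\alpha) = \lambda u(\alpha) \quad \text{and} \quad u(\lambda\beta) = \lambda u(\beta) \quad \text{for all such } \alpha \text{ and } \beta. \tag{4.51} \]

Because $u(0) = 0$ we conclude that

\[ u(\lambda x) = \lambda u(x), \quad \forall \lambda \in [0, 1],\ \forall x \in \mathbb{R}. \tag{4.52} \]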

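If $\lambda > 1$, then $\mu := \frac{1}{\lambda} \in (0, 1)$. We then have that $u(\mu x) = \mu u(x)$, which means that $u\left(\frac{1}{\lambda}x\right) = \frac{1}{\lambda}u(x)$. Because this holds for all $x \in \mathbb{R}$, it also holds for $\lambda x$, so that $u(x) = u\left(\frac{1}{\lambda}\lambda x\right) = \frac{1}{\lambda}u(\lambda x)$. We can conclude that

\[ u(\lambda x) = \lambda u(x), \quad \forall \lambda > 0,\ \forall x \in \mathbb{R}. \tag{4.53} \]

This means that $u$ is positively homogeneous, which concludes our proof.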

The next theorem characterises the positively homogeneous utility functions from $U_0^<$.

Theorem 4.15. Let $u \in U_0^<$ be a finite positively homogeneous utility function. Then $u$ is a piecewise linear function, i.e.

\[ u(x) = \begin{cases} \gamma_2 x, & x \le 0 \\ \gamma_1 x, & x > 0, \end{cases} \tag{4.54} \]

where $\gamma_2 > 1 > \gamma_1 \ge 0$.

The proof of this theorem is based on a lemma found in [25, corollary 13.2.1 ].

Lemma 4.10. Let $f$ be any positively homogeneous convex function which is not identically $+\infty$. Then $\mathrm{cl}(f)$ is the support function of a certain closed convex set $C$, namely

\[ C := \{ y \mid \forall x,\ \langle x, y \rangle \le f(x) \}. \tag{4.55} \]

Proof. Without proof, see [25, corollary 13.2.1].

Furthermore it is stated in [25, p. 51] that for proper convex functions the closedness property $\mathrm{cl}(f) = f$ is equivalent to lower semicontinuity.

Proof. (theorem 4.15) Denote $l(x) := -u(-x)$. Then $l$ is a positively homogeneous convex function, and $l$ is continuous because $u$ is. Using lemma 4.10 we know that $l$ is the support function of a closed convex subset of $\mathbb{R}$, i.e. an interval $[\gamma_1, \gamma_2]$ with $\gamma_1 \le \gamma_2$. Because $[\gamma_1, \gamma_2] = \{ y \mid \forall x,\ \langle x, y \rangle \le l(x) \}$, we have the following representation for $l$:

\[ l(x) = -u(-x) = \sup_{\gamma_1 \le y \le \gamma_2} (xy). \tag{4.56} \]

Hence we have that

\[ l(x) = \begin{cases} \gamma_1 x, & x \le 0 \\ \gamma_2 x, & x \ge 0. \end{cases} \tag{4.57} \]

Then the utility function $u$ is given by

\[ u(x) = \begin{cases} \gamma_2 x, & x \le 0 \\ \gamma_1 x, & x \ge 0. \end{cases} \tag{4.58} \]

Because $u \in U_0^<$ we have that $u(x) < x$ for $x \ne 0$, which implies that $\gamma_2 > 1 > \gamma_1$. Because $u$ is non-decreasing we also know that $\gamma_1 \ge 0$.
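The positive homogeneity in theorems 4.14 and 4.15 can be checked numerically on a small example. The sketch below is only an illustration under our own assumptions: the two-slope utility uses the arbitrary values gamma1 = 0.5 and gamma2 = 2, the random variable is discrete, and the function names are hypothetical. It relies on the observation that for this piecewise linear utility the objective of the optimised certainty equivalent is concave and piecewise linear in η, so the supremum is attained at one of the outcomes.

```python
def piecewise_linear_utility(x, gamma1=0.5, gamma2=2.0):
    """Utility of the form (4.54): slope gamma2 on losses, gamma1 on gains.

    The default slopes are illustrative and satisfy gamma2 > 1 > gamma1 >= 0.
    """
    return gamma2 * x if x <= 0 else gamma1 * x


def oce(outcomes, probs, u):
    """OCE_u(X) = sup_eta (eta + E[u(X - eta)]) for a discrete X.

    For the piecewise linear utility above the objective is concave and
    piecewise linear in eta, with kinks only at the outcomes, so it is
    enough to evaluate it at the outcomes themselves.
    """
    def objective(eta):
        return eta + sum(p * u(x - eta) for x, p in zip(outcomes, probs))

    return max(objective(eta) for eta in outcomes)


def scaled(outcomes, lam):
    """Outcomes of the random variable lam * X."""
    return [lam * x for x in outcomes]
```

Comparing `oce(scaled(xs, lam), ps, piecewise_linear_utility)` with `lam * oce(xs, ps, piecewise_linear_utility)` for a few values of `lam > 0` should give the same number up to rounding.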

4.2.3 Examples

In this section we will try to clarify the concept of divergence based risk measures further by calculating the corresponding utility function of some known divergence functions.

The $\chi^2$-divergence is given by $\phi(t) = (t - 1)^2$. We have that

\begin{align*}
-u(-x) &= \sup_{t \in \mathbb{R}} \left( xt - \phi(t) \right) \\
&= \sup_{t \in \mathbb{R}} \left( xt - (t - 1)^2 \right).
\end{align*}

The first order condition yields $x = 2(t - 1)$, and the second order condition is satisfied because the second derivative equals $-2 < 0$. From this we conclude that $xt - (t - 1)^2$ attains a maximum at $t = \frac{x}{2} + 1$. Hence:

\[ -u(-x) = x\left(\frac{x}{2} + 1\right) - \left(\frac{x}{2}\right)^2 = \frac{x^2}{4} + x. \]

We conclude that the corresponding utility function is given by $u(x) = -\frac{x^2}{4} + x$.

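The Hellinger divergence is given by $\phi(t) = \left(\sqrt{t} - 1\right)^2$, which is defined for $t \ge 0$. We have that

\begin{align*}
-u(-x) &= \sup_{t \ge 0} \left( xt - \phi(t) \right) \\
&= \sup_{t \ge 0} \left( xt - \left(\sqrt{t} - 1\right)^2 \right).
\end{align*}

For $x < 1$ the first order condition yields $x = 1 - \frac{1}{\sqrt{t}}$, or $t = \frac{1}{(1 - x)^2}$. The second order condition is satisfied because $\frac{-1}{2\sqrt{t^3}} < 0$. We find that

\begin{align*}
-u(-x) &= \frac{x}{(1 - x)^2} - \left(\frac{1}{1 - x} - 1\right)^2 \\
&= \frac{x}{(1 - x)^2} - \left(\frac{x}{1 - x}\right)^2 \\
&= \frac{x - x^2}{(1 - x)^2} \\
&= \frac{x}{1 - x}.
\end{align*}

We can conclude that $u(x) = \frac{x}{1 + x}$.

The reader might notice we have not included the Kullback-Leibler divergence. In the next chapter we will show that the associated utility function is the exponential utility function.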
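The two conjugates computed in this section can be verified with a crude numerical Legendre transform. The following sketch is not part of the original derivation: the grid search, its bounds and the function names are our own illustrative choices, and the Hellinger check only makes sense for arguments below 1.

```python
import math


def conjugate(phi, x, t_lo, t_hi, n=20001):
    """Approximate sup_t (x * t - phi(t)) on a uniform grid over [t_lo, t_hi]."""
    step = (t_hi - t_lo) / (n - 1)
    return max(x * (t_lo + i * step) - phi(t_lo + i * step) for i in range(n))


def chi2(t):
    """chi^2-divergence function phi(t) = (t - 1)^2."""
    return (t - 1.0) ** 2


def hellinger(t):
    """Hellinger divergence function phi(t) = (sqrt(t) - 1)^2, t >= 0."""
    return (math.sqrt(t) - 1.0) ** 2


def utility_from_divergence(phi, x, t_lo, t_hi):
    """Recover u(x) from the relation -u(-x) = sup_t (x * t - phi(t))."""
    return -conjugate(phi, -x, t_lo, t_hi)
```

For instance, `utility_from_divergence(chi2, 0.4, -5.0, 5.0)` should be close to $0.4 - 0.4^2/4 = 0.36$, in line with $u(x) = -\frac{x^2}{4} + x$.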

4.3 The ordinary certainty equivalent as risk measure

In previous sections we have found two ways to construct convex risk measures using utility functions: utility based shortfall risk and divergence risk measures. Using a duality theorem from mathematical optimisation we found that these utility based risk measures were the dual optimisation problems of negative certainty equivalents.

The link with certainty equivalents is not surprising at all. Certainty equivalents try to define an equivalent risk-free amount to an uncertain gamble. If this amount is negative, this means you are willing to pay some amount to not have to incur the risk of the gamble. This amount is then used as the risk measure.

It is now natural to ask whether we could use the ordinary certainty equivalent as a risk measure. That is, would $\rho(X) = -CE_u(X)$ be a good risk measure? In the first chapter we defined some axioms which a "good" risk measure should satisfy. First of all $-CE_u(X)$ would need to be a monetary risk measure. For this it needs to satisfy the translation property. This means we need to have

\[ CE_u(X + m) = CE_u(X) + m \quad \forall m \in \mathbb{R}. \tag{4.59} \]

It turns out that 4.59 is a rather severe restriction on the possible utility functions we can use. In what follows we will further assume that $u$ is strictly increasing and $u \in C^2$. Now define $u_m(x) := u(x + m)$ for all $m \in \mathbb{R}$. Then $u_m$ is also strictly increasing and $u_m \in C^2$. Because both $u$ and $u_m$ are strictly increasing, the inverse functions $u^{-1}$ and $u_m^{-1}$ are well defined. For all $m \in \mathbb{R}$ we have:

\begin{align*}
CE_u(X + m) = CE_u(X) + m &\Leftrightarrow u^{-1}\left(E[u(X + m)]\right) = u^{-1}\left(E[u(X)]\right) + m \\
&\Rightarrow E[u(X + m)] = u\left(u^{-1}\left(E[u(X)]\right) + m\right) \\
&\Rightarrow E[u_m(X)] = u_m\left(CE_u(X)\right) \\
&\Rightarrow u_m^{-1}\left(E[u_m(X)]\right) = CE_u(X) \\
&\Rightarrow CE_{u_m}(X) = CE_u(X).
\end{align*}

From theorem 2.3 of chapter 2 we know that for all $m \in \mathbb{R}$

\[ CE_{u_m}(X) = CE_u(X) \Leftrightarrow r_A^{u_m}(x) = r_A^{u}(x) \quad \forall x \in \mathbb{R}, \]

where $r_A^{u}(x)$ denotes the Arrow-Pratt coefficient of absolute risk aversion. We have that:

\[ r_A^{u_m}(x) = \frac{-u_m''(x)}{u_m'(x)} = \frac{-u''(x + m)}{u'(x + m)} = r_A^{u}(x + m). \]

This means we need to have $r_A^{u}(x + m) = r_A^{u}(x)$ for all $m \in \mathbb{R}$ and $x \in \mathbb{R}$. From this we derive that the Arrow-Pratt coefficient of absolute risk aversion $r_A^{u}(x)$ is independent of $x$. This implies that $r_A^{u}(x)$ is constant. Thus $u$ is a linear or an exponential utility function.

Because linear utility functions imply a risk neutral attitude they are not desirable to construct a risk measure with. In theorem 2.7 of chapter two we have shown that for the (normalised) exponential utility function all three different certainty equivalents coincide.
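The translation property 4.59 can be illustrated numerically. The sketch below is our own illustration, not taken from the cited literature: it inverts $u$ by bisection, uses a discrete random variable, and compares the exponential utility with a hypothetical utility `u_mix` whose absolute risk aversion is not constant. The bracket $[-20, 20]$ for the bisection is an assumption that suits the small outcomes used here.

```python
import math


def certainty_equivalent(outcomes, probs, u, lo=-20.0, hi=20.0):
    """CE_u(X) = u^{-1}(E[u(X)]) for a discrete X.

    The strictly increasing u is inverted by bisection on [lo, hi]; the
    bracket is assumed to contain all outcomes.
    """
    target = sum(p * u(x) for x, p in zip(outcomes, probs))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if u(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def u_exp(x, a=2.0):
    """Standardised exponential utility with constant absolute risk aversion a."""
    return (1.0 - math.exp(-a * x)) / a


def u_mix(x):
    """Hypothetical increasing concave utility with non-constant risk aversion."""
    return -math.exp(-x) - math.exp(-3.0 * x)


def shift(outcomes, m):
    """Outcomes of the random variable X + m."""
    return [x + m for x in outcomes]
```

With the exponential utility the difference `certainty_equivalent(shift(xs, m), ps, u_exp) - certainty_equivalent(xs, ps, u_exp)` equals `m` up to rounding, whereas with `u_mix` it does not.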


5 Utility functions

In this chapter we will take a closer look at some of the classes of utility functions we encountered in the literature. We will discuss their general properties and whether they are suitable to be used in utility based risk measures.

For each of the utility functions we will calculate the associated divergence function using the Legendre transform. Furthermore we will also illustrate the effect of the parameters that occur in both the utility based shortfall risk and the divergence risk. For this we have simulated 10000 returns from a normal distribution with mean 0.25 and standard deviation σ. One can think of the log-returns of a stock which follows a Brownian motion with a drift of 0.25 and a volatility of σ. Because the volatility of a stock is linked to the riskiness of this stock, we are interested to see the effect of σ on the risk measure. We expect to see that higher values of σ coincide with higher values of the risk measures.

Inspired by the results of these simulations we can state the following lemma, which does not assume a specific distribution of the returns.

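Lemma 5.1. Let $u_\alpha : \mathbb{R} \to \mathbb{R}$ be a class of utility functions with a parameter $\alpha$. If $u_{\alpha_1}(x) \ge u_{\alpha_2}(x)$ for all $x \in \mathbb{R}$, then for $X \in L^\infty(\Omega, \mathcal{F}, P)$ we have that

1. $D_{\phi_{\alpha_1}}(X) \le D_{\phi_{\alpha_2}}(X)$,

2. $SF^{l_{\alpha_1}}_{x_0}(X) \le SF^{l_{\alpha_2}}_{x_0}(X)$, for all $x_0$ in the interior of the ranges of $l_{\alpha_1}$ and $l_{\alpha_2}$.

Here $\phi_{\alpha_1}$ and $\phi_{\alpha_2}$ denote the associated divergence functions of $u_{\alpha_1}$ and $u_{\alpha_2}$ respectively, and $l_{\alpha_1}$ and $l_{\alpha_2}$ denote the associated loss functions.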


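Proof. 1. Let $\eta \in \mathbb{R}$ and let $X \in L^\infty(\Omega, \mathcal{F}, P)$. Then

\begin{align*}
u_{\alpha_1}(x - \eta) \ge u_{\alpha_2}(x - \eta),\ \forall x \in \mathbb{R} &\Rightarrow E[u_{\alpha_1}(X - \eta)] \ge E[u_{\alpha_2}(X - \eta)] \\
&\Rightarrow \eta + E[u_{\alpha_1}(X - \eta)] \ge \eta + E[u_{\alpha_2}(X - \eta)] \\
&\Rightarrow \sup_{\eta \in \mathbb{R}} \left(\eta + E[u_{\alpha_1}(X - \eta)]\right) \ge \sup_{\eta \in \mathbb{R}} \left(\eta + E[u_{\alpha_2}(X - \eta)]\right) \\
&\Rightarrow -\mathrm{OCE}_{u_{\alpha_1}}(X) \le -\mathrm{OCE}_{u_{\alpha_2}}(X) \\
&\Rightarrow D_{\phi_{\alpha_1}}(X) \le D_{\phi_{\alpha_2}}(X).
\end{align*}

The last implication follows from theorem 4.10 and uses the assumption that $X \in L^\infty(\Omega, \mathcal{F}, P)$.

2. To prove the effect on the utility based shortfall risk measure, take $X \in L^\infty(\Omega, \mathcal{F}, P)$ and assume that $u_{\alpha_1}(x) \ge u_{\alpha_2}(x)$ for all $x \in \mathbb{R}$. Then we have that

\[ E[u_{\alpha_1}(X + m)] \ge E[u_{\alpha_2}(X + m)], \quad \forall m \in \mathbb{R}. \]

To prove that $SF^{l_{\alpha_1}}_{x_0}(X) \le SF^{l_{\alpha_2}}_{x_0}(X)$ for all $x_0$ as in the statement of the lemma, we will show that

\[ \{ m \in \mathbb{R} \mid E[u_{\alpha_2}(X + m)] \ge -x_0 \} \subset \{ m \in \mathbb{R} \mid E[u_{\alpha_1}(X + m)] \ge -x_0 \}. \tag{5.1} \]

Take $m \in \{ m \in \mathbb{R} \mid E[u_{\alpha_2}(X + m)] \ge -x_0 \}$. We have that $E[u_{\alpha_1}(X + m)] \ge E[u_{\alpha_2}(X + m)] \ge -x_0$. Hence we can conclude that $m \in \{ m \in \mathbb{R} \mid E[u_{\alpha_1}(X + m)] \ge -x_0 \}$. This proves the fact that

\[ \inf \{ m \in \mathbb{R} \mid E[u_{\alpha_1}(X + m)] \ge -x_0 \} \le \inf \{ m \in \mathbb{R} \mid E[u_{\alpha_2}(X + m)] \ge -x_0 \}. \tag{5.2} \]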

The first class of utility functions we will study are the power utility functions, of which we found a brief description in [15]. These utility functions belong to a larger class of utility functions called the HARA class. The acronym HARA stands for hyperbolic absolute risk aversion. A utility function belongs to the HARA class if the Arrow-Pratt coefficient of absolute risk aversion is given by

\[ r_A(x) = \frac{1}{a + bx} \quad \forall x \in \mathcal{D}, \tag{5.3} \]

where $b \ge 0$, and $a > 0$ if $b = 0$. The domain is $\mathcal{D} = \mathbb{R}$ if $b = 0$. If $b \ne 0$ then $\mathcal{D} = \left(-\frac{a}{b}, +\infty\right)$.


5.1 The power utility functions

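Assume that $b > 0$. Then we can reconstruct the utility function using theorem 2.2 from the second chapter. First assume that $b \ne 1$. We have that

\begin{align*}
u(x) &= \int_1^x C_1 \exp\left(\int_1^\eta -r_A(\zeta)\,d\zeta\right) d\eta + C_2 \\
&= \int_1^x C_1 \exp\left(\int_1^\eta \frac{-1}{a + b\zeta}\,d\zeta\right) d\eta + C_2 \\
&= \int_1^x C_1 \exp\left(-\frac{1}{b}\ln\left(\frac{a + b\eta}{a + b}\right)\right) d\eta + C_2 \\
&= C_1 \int_1^x \left(\frac{a + b\eta}{a + b}\right)^{-\frac{1}{b}} d\eta + C_2 \\
&= C_1 \frac{a + b}{b - 1}\left[\left(\frac{a + bx}{a + b}\right)^{1 - \frac{1}{b}} - 1\right] + C_2 \\
&= \frac{D}{b - 1}(a + bx)^{1 - \frac{1}{b}} + E,
\end{align*}

where $D$ and $E$ are integration constants. If $b = 1$ we have that $r_A(x) = \frac{1}{a + x}$. Then the utility function is given by

\[ u(x) = D\ln(a + x) + E. \tag{5.4} \]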

The utility function

\[ u(x) = \begin{cases} \frac{D}{b - 1}(a + bx)^{1 - \frac{1}{b}} + E, & b \ne 1 \\ D\ln(a + x) + E, & b = 1 \end{cases} \tag{5.5} \]

is called the extended power utility. If $a > 0$ then we can standardise this utility function in the usual way such that $u(0) = 0$ and $u'(0) = 1$. To see this, first suppose $b \ne 1$. Since $u'(x) = D(a + bx)^{-\frac{1}{b}}$, the condition $u'(0) = 1$ gives $D = a^{\frac{1}{b}}$, and from $u(0) = 0$ we can conclude that $E = -\frac{D}{b - 1}a^{1 - \frac{1}{b}} = -\frac{a}{b - 1}$. Now consider the case where $b = 1$. Then $u(0) = 0$ implies that $E = -D\ln(a)$ and $u'(0) = 1$ implies that $D = a$. Therefore we can conclude that when $a > 0$ and $b > 0$ we have the following standardised utility function:

\[ u(x) = \begin{cases} \frac{a^{\frac{1}{b}}}{b - 1}(a + bx)^{1 - \frac{1}{b}} - \frac{a}{b - 1}, & b \ne 1 \\ a\ln(a + x) - a\ln(a), & b = 1, \end{cases} \tag{5.6} \]

where the domain is given by $\mathcal{D} = \left(-\frac{a}{b}, +\infty\right)$. When $a = 0$ the extended power utility function becomes the narrow power utility function, which takes the following form:

\[ u(x) = \begin{cases} D\frac{x^{1 - \gamma}}{1 - \gamma} + E, & \gamma \ne 1 \\ D\ln(x) + E, & \gamma = 1. \end{cases} \tag{5.7} \]


This utility function is defined for all $x > 0$. The parameter $\gamma$ is known as the coefficient of relative risk aversion $r_R$, which can be obtained using the following formula:

\[ r_R(x) = -x\frac{u''(x)}{u'(x)}. \tag{5.8} \]

It is clear that if $a \le 0$ the power utility cannot be standardised in the usual way. However there is a bigger problem when using the power utility in the context of utility based risk measures. The power utility is only defined for values greater than $-\frac{a}{b}$. Throughout this thesis however we have looked at the stochastic variable $X$ which modelled the net payoffs of a portfolio. When trying to quantify the risk of this portfolio we are especially interested in the potential losses of the portfolio. Because we have to evaluate these potential losses with a utility function, it is important that the utility function is defined for those negative values. This can be problematic with power utility.

There are however functions in the HARA class which do not have this problem. This occurs whenever $b = 0$, because then the coefficient of absolute risk aversion is a constant. As we have seen in chapter two of this thesis, the corresponding utility function is the exponential utility function.
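The standardisation and the domain restriction of the extended power utility can be made concrete with a short sketch. It is only an illustration: it implements the standardised form with $D = a^{1/b}$ and $E = -a/(b-1)$ as derived above, the function names are hypothetical, and the absolute risk aversion is estimated with finite differences rather than computed analytically.

```python
import math


def power_utility(x, a=1.0, b=2.0):
    """Standardised extended power utility, defined only for x > -a/b."""
    if a <= 0 or b <= 0:
        raise ValueError("standardisation requires a > 0 and b > 0")
    if x <= -a / b:
        raise ValueError("x lies outside the domain (-a/b, +inf)")
    if b == 1.0:
        return a * math.log(a + x) - a * math.log(a)
    return a ** (1.0 / b) / (b - 1.0) * (a + b * x) ** (1.0 - 1.0 / b) - a / (b - 1.0)


def absolute_risk_aversion(u, x, h=1e-4):
    """Finite-difference estimate of r_A(x) = -u''(x) / u'(x)."""
    d1 = (u(x + h) - u(x - h)) / (2.0 * h)
    d2 = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    return -d2 / d1
```

A loss at or below $-\frac{a}{b}$, for example `power_utility(-0.5, a=1.0, b=2.0)`, raises an error, which is exactly the problem described above when the payoff $X$ can take large negative values.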

5.2 The exponential utility functions

The standardised exponential utility function is given by

\[ u(x) = \frac{1 - \exp(-ax)}{a}, \tag{5.9} \]

where the parameter $a$ denotes the coefficient of absolute risk aversion. In theorem 2.7 of the second chapter we have proven that for the exponential utility function all certainty equivalents coincide.

We will now derive the divergence function associated with this utility function:

\begin{align*}
\phi(t) &\equiv \sup_{x \in \mathbb{R}} \left( xt + u(-x) \right) \\
&= \sup_{x \in \mathbb{R}} \left( xt + \frac{1 - \exp(ax)}{a} \right).
\end{align*}

The first order condition yields that in a maximum $x = \frac{1}{a}\ln(t)$. The second order condition for a maximum is fulfilled because for all $x$ we have that $-a\exp(ax) < 0$. Hence we find that the divergence function is given by

\[ \phi(t) = \frac{t}{a}\ln(t) + \frac{1 - \exp(\ln(t))}{a} = \frac{t}{a}\ln(t) + \frac{1}{a} - \frac{t}{a}. \]


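If $Q \in \mathcal{M}_1(P)$, then the divergence associated with this is

\begin{align*}
I_\phi(Q|P) &= \int_\Omega \frac{1}{a}\left(\frac{dQ}{dP}\ln\left(\frac{dQ}{dP}\right) + 1 - \frac{dQ}{dP}\right) dP \\
&= \frac{1}{a}\int_\Omega \frac{dQ}{dP}\ln\left(\frac{dQ}{dP}\right) dP + \frac{1}{a}\int_\Omega dP - \frac{1}{a}\int_\Omega dQ \\
&= \frac{1}{a}\int_\Omega \frac{dQ}{dP}\ln\left(\frac{dQ}{dP}\right) dP \\
&= \frac{1}{a}KL(Q|P).
\end{align*}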

We find that the divergence associated with the exponential utility function is the Kullback-Leibler entropy. We have already encountered this entropy in the first chapter where we used it as a penalty function. We note that the divergence risk measure associated with it, i.e.

\[ \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q(-X) - \frac{1}{a}KL(Q|P) \right), \tag{5.10} \]

is called the entropic risk measure. This definition of the entropic risk measure was given in [11, p 201], where we also find the definition in the form of a negative optimised certainty equivalent. Using theorem 2.7 from the second chapter and theorem 4.10 from the fourth chapter we can see that this entropic risk measure has the following representation:

\[ ER_a(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q(-X) - \frac{1}{a}KL(Q|P) \right) = \frac{1}{a}\ln\left(E[\exp(-aX)]\right). \tag{5.11} \]
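The equality of the two expressions in 5.11 can be checked on a finite probability space. The sketch below is an illustration under our own assumptions: $X$ is discrete, the function names are hypothetical, and the maximising measure is taken to be the exponentially tilted measure with weights proportional to $p_i \exp(-a x_i)$, which is a standard fact about the Kullback-Leibler penalty and not something derived in this section.

```python
import math


def entropic_risk(outcomes, probs, a):
    """ER_a(X) = (1/a) ln E[exp(-a X)] for a discrete X."""
    return math.log(sum(p * math.exp(-a * x) for x, p in zip(outcomes, probs))) / a


def penalised_loss(outcomes, probs, q, a):
    """E_Q[-X] - (1/a) KL(Q|P) for a measure Q with weights q."""
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, probs) if qi > 0)
    return sum(-qi * x for qi, x in zip(q, outcomes)) - kl / a


def tilted_measure(outcomes, probs, a):
    """Weights proportional to p_i * exp(-a * x_i), assumed to be the maximiser."""
    w = [p * math.exp(-a * x) for x, p in zip(outcomes, probs)]
    total = sum(w)
    return [wi / total for wi in w]
```

Evaluating `penalised_loss` at `tilted_measure(...)` should reproduce `entropic_risk(...)`, while any other choice of weights gives a value that is no larger.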

To study the effect of the coefficient of absolute risk aversion on the entropic risk measure, it is important to notice that if $a_1 \le a_2$ then $u_{a_1}(x) \ge u_{a_2}(x)$ for all $x \in \mathbb{R}$.

We will formally prove this fact by showing that $u_a(x)$ is a decreasing function of $a$. We have that

\[ \frac{\partial u_a(x)}{\partial a} = \frac{\exp(-ax)(xa + 1) - 1}{a^2} \le 0, \]

where the inequality follows from the fact that $(xa + 1) \le \exp(ax)$.¹ Hence we can use lemma 5.1 to conclude that the entropic risk measure is increasing in the coefficient of absolute risk aversion, i.e.

\[ a_1 \le a_2 \Rightarrow ER_{a_1}(X) \le ER_{a_2}(X). \tag{5.12} \]

To illustrate this relationship we have simulated different sets of 10000 returns each. These returns were generated from a normal distribution with mean 0.25 and different standard deviations σ. We calculated the entropic risk measure for different values of absolute risk aversion and plotted our results. The results can be found in figure 5.1. In this figure we can clearly observe that an increase of absolute risk aversion corresponds to an increase in entropic risk. Furthermore we also notice that a higher standard deviation corresponds to a higher risk.

¹ This is a known inequality which follows from the fact that $xa + 1$ is the tangent line to $\exp(ax)$ at $ax = 0$, and $\exp(\cdot)$ is a convex function.


Figure 5.1: Influence of the absolute risk aversion a on the exponential divergence risk measure.
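A simulation of the kind described above can be sketched as follows. This is not the code used to produce the figure: the seed, the sample generator and the function names are our own assumptions. For normally distributed returns the entropic risk has the known closed form $-\mu + \frac{a\sigma^2}{2}$, which gives a rough sanity check for the sample estimate.

```python
import math
import random


def simulate_returns(sigma, n=10000, mean=0.25, seed=0):
    """Draw n returns from a normal distribution (the seed is an arbitrary choice)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]


def empirical_entropic_risk(sample, a):
    """(1/a) ln of the sample mean of exp(-a x), with a log-sum-exp shift."""
    m = max(-a * x for x in sample)
    total = sum(math.exp(-a * x - m) for x in sample)
    return (m + math.log(total / len(sample))) / a
```

For a fixed sample the estimate is non-decreasing in `a`, in line with 5.12, because the empirical distribution is itself a probability distribution to which the lemma applies.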

We would like to point out that because of theorem 2.7 we know that $SF^l_0(X) = ER_a(X)$. This means that for the exponential utility function the utility based shortfall risk for $x_0 = 0$ equals the entropic risk, i.e. the exponential divergence risk. Therefore we will not study the effect of the absolute risk aversion on the exponentially based shortfall risk. As we have deduced in the fourth chapter, increasing the parameter $x_0$ of a utility based shortfall risk measure always results in a decrease of the risk, and this holds independently of the utility function.

We know from 4.13 that this risk measure is not coherent because the utility function is not a piecewise linear function. However there exists a coherent version of this entropic risk measure which is described in detail in [3]. This risk measure is called Entropic Value at Risk or EVaR. From [3, definition 3.1] we have the following definition.

Definition 5.1. Entropic Value at Risk at a $(1 - \alpha)100\%$ confidence level is defined as

\[ \mathrm{EVaR}_\alpha(X) := \inf_{z > 0} \left( \frac{1}{z}\ln\left(\frac{M_X(z)}{\alpha}\right) \right), \tag{5.13} \]

where $M_X$ denotes the moment generating function of $X$.

We will not work out any details of this article as it is outside the scope of this thesis. However we will report some key results from this paper and explain how these results are closely linked to the entropic risk measure and divergence based risk measures in general. This is interesting because these seemingly technical theorems could be used to construct coherent alternatives to the divergence risk measures discussed in chapter four.

In [3, Theorem 3.3] we find the following robust representation theorem regarding Entropic Value at Risk.


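Theorem 5.1. For $X \in L^\infty(\Omega, \mathcal{F}, P)$ we have that

\[ \mathrm{EVaR}_\alpha(X) = \sup_{Q \in I} E_Q[-X] = \inf_{z > 0} \left( \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - \frac{1}{z}KL(Q|P) - \frac{1}{z}\ln(\alpha) \right) \right), \tag{5.14} \]

where $I = \{ Q \in \mathcal{M}_1(P) \mid KL(Q|P) \le -\ln(\alpha) \}$.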
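The infimum over $z$ can be approximated numerically for a discrete payoff. The sketch below rests on several assumptions of our own: the payoff convention of this thesis is used, so the moment generating function is read as that of the loss $-X$, i.e. $E[\exp(-zX)]$; the search is carried out in $t = 1/z$, in which the objective is convex; and the search bounds and function names are hypothetical.

```python
import math


def log_mgf_of_loss(outcomes, probs, z):
    """ln E[exp(-z X)] for a discrete X, computed with a log-sum-exp shift."""
    m = max(-z * x for x in outcomes)
    return m + math.log(sum(p * math.exp(-z * x - m) for x, p in zip(outcomes, probs)))


def evar(outcomes, probs, alpha, t_lo=1e-4, t_hi=1e4, iterations=200):
    """Approximate inf_{z>0} (1/z) ln(E[exp(-z X)] / alpha).

    The objective is minimised by ternary search in t = 1/z; the bounds
    t_lo and t_hi are illustrative and truncate the range of z.
    """
    def objective(t):
        return t * (log_mgf_of_loss(outcomes, probs, 1.0 / t) - math.log(alpha))

    lo, hi = t_lo, t_hi
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi))
```

In line with the robust representation, the result should lie between the expected loss $E[-X]$ and the largest possible loss, and it should grow when $\alpha$ decreases because the set $I$ then becomes larger.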

It is easy to see that any risk measure which has a robust representation of the form $\sup_{Q \in I} E_Q[-X]$, where $I$ denotes a set of probability measures, is a coherent risk measure. This follows from the properties of the expected value and the supremum. When we compare this representation of Entropic Value at Risk to equation 5.11 we can clearly see and interpret the different approach.

In the divergence based approach we considered all probability measures $Q$ which are absolutely continuous with respect to $P$. We then looked at the expected losses under each of these probability measures $Q$, $E_Q[-X]$. Using the Kullback-Leibler entropy we penalised these expected losses depending on how similar the probability measure $Q$ was to $P$. We concluded our computation by taking the supremum over all these penalised expected losses.

In the coherent approach however, we only consider the probability measures $Q$ which have a Kullback-Leibler distance with respect to $P$ smaller than a given amount. We then take the supremum over all the expected losses with respect to those probability measures $Q$. No penalty functions are used here and every probability measure $Q$ for which $KL(Q|P) \le -\ln(\alpha)$ is taken to be equally important. This idea, which is used to construct Entropic Value at Risk, could be generalised to all divergence risk measures using the definition of a φ-entropic risk measure with divergence level β which we found in [3, definition 5.1].

Definition 5.2. Let $\phi$ be a convex function with $\phi(1) = 0$, and $\beta$ a non-negative number. The φ-entropic risk measure with divergence level $\beta$ is defined as

\[ ER_{\phi,\beta}(X) := \sup_{Q \in I} E_Q[-X], \tag{5.15} \]

where $I := \{ Q \in \mathcal{M}_1(P) \mid I_\phi(Q|P) \le \beta \}$.

This defines a class of coherent risk measures which shows a lot of similarities to the divergence risk measures, which were defined as

\[ \sup_{Q \in \mathcal{M}_1(P)} \left( E_Q[-X] - I_\phi(Q|P) \right). \]

So far we have discussed the power utility and the exponential utility, both of which are contained in the HARA class. These utility functions are commonly used in economics. There exist however a lot of other classes of utility functions. One such class is the class of the polynomial utility functions.

5.3 The polynomial utility functions

The following class of utility functions was found in [10].


Definition 5.3. For $\gamma > 1$ with $\gamma \in \mathbb{N}$ the polynomial utility function is defined as

\[ u(x) = \begin{cases} \frac{1 - (1 - x)^\gamma}{\gamma} & \text{if } x \le 1 \\ \frac{1}{\gamma} & \text{elsewhere.} \end{cases} \tag{5.16} \]

We have plotted this utility function for different values of $\gamma$ in figure 5.2. The associated loss function is given by

\[ l(x) = \begin{cases} \frac{(1 + x)^\gamma - 1}{\gamma} & \text{if } x \ge -1 \\ \frac{-1}{\gamma} & \text{elsewhere.} \end{cases} \tag{5.17} \]

The first derivative is given by

\[ u'(x) = \begin{cases} (1 - x)^{\gamma - 1}, & x \le 1 \\ 0, & x > 1. \end{cases} \tag{5.18} \]

The Arrow-Pratt measure of absolute risk aversion is not well defined for $x \ge 1$. For $x < 1$ we have that $r_A(x) = \frac{\gamma - 1}{1 - x} > 0$, which implies a risk averse attitude. Because the utility function is constant for $x \ge 1$ we can say that for $x \ge 1$ the utility function implies risk neutrality.

We will now calculate the divergence function associated with this utility function.

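\begin{align*}
\phi(t) &= \sup_{x \in \mathbb{R}} \left( xt - l(x) \right) \\
&= \max\left( \sup_{x \ge -1} \left( xt - \frac{(1 + x)^\gamma - 1}{\gamma} \right),\ \sup_{x < -1} \left( xt + \frac{1}{\gamma} \right) \right).
\end{align*}

First consider the case that $t < 0$. Then we have that $\sup_{x < -1} \left( xt + \frac{1}{\gamma} \right) = +\infty$. Hence for $t < 0$ we have that $\phi(t) = +\infty$. Now assume that $t \ge 0$. Then we have that

\[ \sup_{x < -1} \left( xt + \frac{1}{\gamma} \right) = -t + \frac{1}{\gamma}. \tag{5.19} \]

We will now calculate $\sup_{x \ge -1} \left( xt - \frac{(1 + x)^\gamma - 1}{\gamma} \right)$. The first order condition yields that $t - (1 + x)^{\gamma - 1} = 0$. Hence we have that $x = t^{\frac{1}{\gamma - 1}} - 1$. Notice that the second order condition for a maximum is also fulfilled in this point. Hence we have that

\begin{align*}
\sup_{x \ge -1} \left( xt - \frac{(1 + x)^\gamma - 1}{\gamma} \right) &= \left( t^{\frac{1}{\gamma - 1}} - 1 \right)t - \frac{t^{\frac{\gamma}{\gamma - 1}} - 1}{\gamma} \\
&= \frac{1}{\gamma}\left( \gamma t^{\frac{1}{\gamma - 1} + 1} - \gamma t - t^{\frac{\gamma}{\gamma - 1}} + 1 \right) \\
&= \frac{1}{\gamma}\left( \gamma t^{\frac{\gamma}{\gamma - 1}} - \gamma t - t^{\frac{\gamma}{\gamma - 1}} + 1 \right) \\
&= \frac{1}{\gamma}\left( (\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right).
\end{align*}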

Figure 5.2: Polynomial utility function for different values of γ.

Figure 5.3: Divergence function for different values of γ.

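Now it is sufficient to notice that if $t \ge 0$ then $\frac{\gamma - 1}{\gamma}t^{\frac{\gamma}{\gamma - 1}} \ge 0$. Hence we have that

\[ \frac{1}{\gamma}\left( (\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right) \ge -t + \frac{1}{\gamma}. \tag{5.20} \]

From this it follows that for $t \ge 0$

\[ \phi(t) = \frac{1}{\gamma}\left( (\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right). \tag{5.21} \]

We conclude that the associated divergence is given by

\[ \phi(t) = \begin{cases} \frac{1}{\gamma}\left( (\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1 \right) & \text{if } t \ge 0 \\ +\infty & \text{elsewhere.} \end{cases} \tag{5.22} \]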
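The closed form 5.22 can be compared with a direct numerical evaluation of $\sup_x (xt - l(x))$. The sketch below is only a consistency check under our own assumptions: the supremum is approximated on a uniform grid over $[-3, 5]$, which contains the maximiser for the values of $t$ and $\gamma$ tried here, and the function names are hypothetical.

```python
def polynomial_loss(x, gamma):
    """Loss function (5.17), l(x) = -u(-x)."""
    return ((1.0 + x) ** gamma - 1.0) / gamma if x >= -1.0 else -1.0 / gamma


def divergence_closed_form(t, gamma):
    """Divergence function (5.22)."""
    if t < 0:
        return float("inf")
    return ((gamma - 1.0) * t ** (gamma / (gamma - 1.0)) - gamma * t + 1.0) / gamma


def divergence_numeric(t, gamma, x_lo=-3.0, x_hi=5.0, n=40001):
    """Grid approximation of sup_x (x * t - l(x)) for t >= 0."""
    step = (x_hi - x_lo) / (n - 1)
    return max(
        t * (x_lo + i * step) - polynomial_loss(x_lo + i * step, gamma)
        for i in range(n)
    )
```

For $\gamma = 2$ the closed form reduces to $\frac{1}{2}(t - 1)^2$, so `divergence_closed_form(0.5, 2)` equals 0.125.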

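We have plotted this divergence function for different values of $\gamma$ in figure 5.3.

We will now study the effect of the parameter $\gamma$ on both the polynomial utility based shortfall risk and on the polynomial divergence risk. An excellent starting point for this is figure 5.2. This figure lets us suspect that, for a fixed return $x$, if $\gamma_1 \ge \gamma_2$ then $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$. We can see this in the following way.

Assume that $\gamma_1 \ge \gamma_2$. For $x \ge 1$ we have that $\frac{1}{\gamma_1} \le \frac{1}{\gamma_2}$. Hence for $x \ge 1$ we have that $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$. Now consider an arbitrary but fixed $x < 1$, so that $1 - x > 0$. Because for every $y > 0$ the map $\gamma \mapsto \frac{1 - y^\gamma}{\gamma}$ is non-increasing on $\gamma > 0$, we have for $\gamma_1 \ge \gamma_2$ that $\frac{1 - (1 - x)^{\gamma_1}}{\gamma_1} \le \frac{1 - (1 - x)^{\gamma_2}}{\gamma_2}$. From this we can conclude that $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$ for all $x \in \mathbb{R}$.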


We can use lemma 5.1 to conclude that both the polynomial divergence risk and the polynomial utility based shortfall risk will increase when the parameter γ increases. We have illustrated this relationship using a simulation. We generated 10000 returns from a normal distribution with mean 0.25. We did this for different standard deviations. For each of these sets of returns we computed the divergence risk and the utility based shortfall risk for different values of γ. We have listed the results in tables 5.1 and 5.2 respectively.

Table 5.1: Divergence risk of the polynomial utility for different values of γ and σ

           γ = 2    γ = 3    γ = 4    γ = 5    γ = 6
σ = 0.2   -0.226   -0.206   -0.187   -0.167   -0.148
σ = 0.4   -0.166   -0.092   -0.022    0.045    0.107
σ = 0.6   -0.073    0.077    0.214    0.340    0.457
σ = 0.8    0.046    0.289    0.514    0.732    0.943
σ = 1.0    0.206    0.545    0.836    1.095    1.330

Table 5.2: Utility based shortfall risk of the polynomial utility with x0 = 0 for different values of γ and σ

           γ = 2    γ = 3    γ = 4    γ = 5    γ = 6
σ = 0.2   -0.226   -0.206   -0.186    0.167   -0.148
σ = 0.4   -0.163   -0.086   -0.015    0.052    0.115
σ = 0.6   -0.058    0.095    0.234    0.361    0.479
σ = 0.8    0.083    0.332    0.562    0.784    0.977
σ = 1.0    0.276    0.614    0.903    1.159    1.393

When looking at these tables we further notice that the larger the standard deviation, the larger both risk measures. This coincides with our intuition. When we compare values across both tables, we notice that the divergence risk is always smaller than the utility based shortfall risk with $x_0 = 0$. This should not be surprising because this theoretical result follows directly from the results which were proven in the fourth chapter. In the fourth chapter we showed that² if $x_0 = 0$ then $SF^l_0(X) = -M_u(X)$. We also obtained a general inequality which stated that $M_u(X) \le \mathrm{OCE}_u(X)$. From those results we can conclude that

\[ SF^l_0(X) \ge D_\phi(X). \tag{5.23} \]
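Both risk measures can be computed on a simulated sample with elementary one-dimensional searches. The sketch below does not reproduce the tables above: the sample size, the seed and the function names are our own choices. It takes the divergence risk to be $-\mathrm{OCE}_u(X)$ and the shortfall risk with $x_0 = 0$ to be $\inf\{m : E[u(X + m)] \ge 0\}$, and it uses that the objective of the optimised certainty equivalent is concave in η while $E[u(X + m)]$ is non-decreasing in $m$.

```python
import random


def polynomial_utility(x, gamma):
    """Polynomial utility function (5.16)."""
    return (1.0 - (1.0 - x) ** gamma) / gamma if x <= 1.0 else 1.0 / gamma


def divergence_risk(sample, gamma, iterations=100):
    """-OCE_u(X) on a sample, with the supremum located by ternary search."""
    n = len(sample)

    def objective(eta):
        return eta + sum(polynomial_utility(x - eta, gamma) for x in sample) / n

    # The maximiser lies between min(sample) - 1 and max(sample).
    lo, hi = min(sample) - 1.0, max(sample)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return -objective(0.5 * (lo + hi))


def shortfall_risk(sample, gamma, iterations=100):
    """inf{m : E[u(X + m)] >= 0} on a sample (x0 = 0), located by bisection."""
    n = len(sample)
    lo, hi = -max(sample), -min(sample)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(polynomial_utility(x + mid, gamma) for x in sample) / n >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi


def simulate_returns(sigma, n=1000, mean=0.25, seed=1):
    """Normal returns; the sample size and seed are illustrative choices."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]
```

On any such sample the shortfall risk should not be smaller than the divergence risk, and both should not decrease when γ increases.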

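² See equation 4.2.

A special case of the polynomial utility is the quadratic utility. This utility function is obtained by taking $\gamma = 2$:

\[ u(x) = \begin{cases} -\frac{x^2}{2} + x & \text{if } x \le 1 \\ \frac{1}{2} & \text{elsewhere.} \end{cases} \tag{5.24} \]

The divergence associated with the quadratic utility is

\[ \phi(t) = \begin{cases} \frac{1}{2}(t - 1)^2 & \text{if } t \ge 0 \\ +\infty & \text{elsewhere.} \end{cases} \tag{5.25} \]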

The divergence function $(t - 1)^2$ is called the $\chi^2$-divergence function. It is stated in [5] that the optimised certainty equivalent associated with the quadratic utility function of a stochastic variable $X$ for which $x_{\max} \le 1 + E[X]$ is

\[ \mathrm{OCE}_u(X) = E[X] - \frac{1}{2}\mathrm{Var}(X). \tag{5.26} \]

We will verify this claim. We have that $u$ is a differentiable function and

\[ u'(x) = \begin{cases} 1 - x, & x \le 1 \\ 0, & x > 1. \end{cases} \tag{5.27} \]

Now we will use equation 4.41, which characterises the optimal allocation $\eta^*$ of the optimised certainty equivalent:

\[ \eta^* \in \arg\max_{\eta \in \mathbb{R}} \left( \eta + E[u(X - \eta)] \right) \Leftrightarrow E\left[u'(X - \eta^*)\right] = 1. \]

Because we know that $x_{\max} \le 1 + E[X]$, we have that $X(\omega) - E[X] \le 1$ for all $\omega \in \Omega$. Hence we have that

\[ E\left[u'(X - E[X])\right] = E\left[1 - X + E[X]\right] = 1 - E[X] + E[X] = 1. \]

This proves that $\eta^* = E[X]$ is an optimal allocation. We can now conclude that the optimised certainty equivalent is given by

\begin{align*}
\mathrm{OCE}_u(X) &= \eta^* + E\left[u(X - \eta^*)\right] \\
&= E[X] + E\left[u(X - E[X])\right] \\
&= E[X] + E\left[(X - E[X]) - \frac{1}{2}(X - E[X])^2\right] \\
&= E[X] + E[X] - E[X] - \frac{1}{2}E\left[(X - E[X])^2\right] \\
&= E[X] - \frac{1}{2}\mathrm{Var}(X).
\end{align*}

This proves the result.
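The mean-variance formula 5.26 and the role of the condition $x_{\max} \le 1 + E[X]$ can be illustrated on two small discrete examples. The sketch is our own illustration with hypothetical function names: the optimised certainty equivalent is computed by ternary search over η, which is justified by the concavity of the objective.

```python
def quadratic_utility(x):
    """Quadratic utility (5.24)."""
    return x - 0.5 * x * x if x <= 1.0 else 0.5


def oce_numeric(outcomes, probs, u, iterations=200):
    """sup_eta (eta + E[u(X - eta)]) for a discrete X, by ternary search."""
    def objective(eta):
        return eta + sum(p * u(x - eta) for x, p in zip(outcomes, probs))

    lo, hi = min(outcomes) - 1.0, max(outcomes) + 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def mean_and_variance(outcomes, probs):
    """Mean and variance of a discrete random variable."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return mean, var
```

For the outcomes $-0.5$, $0.2$, $0.9$ with probabilities $0.3$, $0.4$, $0.3$ the condition holds and the numerical value agrees with $E[X] - \frac{1}{2}\mathrm{Var}(X)$. For the outcomes $0$ and $3$ with probabilities $0.9$ and $0.1$ the condition fails and the optimised certainty equivalent is strictly larger than the mean-variance expression.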

5.4 The SAHARA utility functions

When we took a closer look at the application of the power utility to risk measures, we highlighted the problem that this utility function might not be defined for large negative values. The class of SAHARA utility functions was introduced in [8] to deal with the problem of the limited domain of certain HARA functions. The acronym SAHARA stands for symmetric asymptotic hyperbolic absolute risk aversion. Originally this class of utility functions was used for option pricing. In this thesis we will take a closer look at the properties of this class in the context of utility based risk measures. Just like the HARA class, the SAHARA class is defined using the Arrow-Pratt measure of absolute risk aversion.

Definition 5.4. A utility function $u$ with domain $\mathbb{R}$ belongs to the SAHARA class if the absolute risk aversion $r_A(x) = -\frac{u''(x)}{u'(x)}$ is given by

\[ r_A(x) = \frac{a}{\sqrt{b^2 + (x - d)^2}} \tag{5.28} \]

with $a > 0$, $b > 0$ and $d \in \mathbb{R}$.

We have plotted this risk aversion for several values of the parameters $a$, $b$ and $d$ in figure 5.4. We can see in these figures that the absolute risk aversion is a strictly positive symmetric function which attains a maximum for $x = d$. It is easy to prove these facts using equation 5.28. Furthermore we find that

\[ \lim_{x \to +\infty} r_A(x) = \lim_{x \to +\infty} \frac{a}{\sqrt{b^2 + (x - d)^2}} = 0, \tag{5.29} \]

\[ \lim_{x \to -\infty} r_A(x) = \lim_{x \to -\infty} \frac{a}{\sqrt{b^2 + (x - d)^2}} = 0. \tag{5.30} \]

This implies that an investor with the SAHARA utility has an almost risk neutral attitude towards very large losses and very large gains. We will call the point $d$ at which the absolute risk aversion attains its maximum the threshold loss³. When approached from above the absolute risk aversion is increasing. This implies that an investor or financial institution will become increasingly risk averse and will try to avoid falling below the threshold loss.

In the context of risk measures the parameter $d$ could be used to model a loss which, if exceeded, will cause the financial institution significant problems. Using this interpretation it is not difficult to see why $\lim_{x \to -\infty} r_A(x) = 0$ is not an unreasonable assumption. If the losses are so large that they ensure the bankruptcy of the financial institution, there is no reason to be risk averse any more. Unlike in [8] we will not assume that $d = 0$, because there is no reason to assume that the threshold loss is 0. This makes the computation of the associated utility function tedious. To ensure the readability of this section we will only report the results, and we have put the calculations in appendix B.

If the Arrow-Pratt measure of absolute risk aversion is given by equation 5.28, then the associated utility function is given by

\[ u(x) = \begin{cases} -\frac{C_1}{a^2 - 1}\left(\sqrt{b^2 + (x - d)^2} + (x - d)\right)^{-a}\left(a\sqrt{b^2 + (x - d)^2} + (x - d)\right) + C_2, & a \ne 1 \\ \frac{C_1}{2}\left(\ln\left(\sqrt{b^2 + (x - d)^2} + (x - d)\right) - \frac{\left(\sqrt{b^2 + (x - d)^2} - (x - d)\right)^2}{2b^2}\right) + C_2, & a = 1, \end{cases} \tag{5.31} \]

for some constants $C_1$ and $C_2$. For all $a > 0$ the marginal utility is given by

³ This is different from the definition given in [8], because in our context the stochastic variable X does not model the total wealth.


Figure 5.4: Absolute risk aversion of SAHARA utility.

Figure 5.5: Absolute risk aversion for varying values of a with b = 1 and d = 2 fixed.

Figure 5.6: Absolute risk aversion for varying values of b with a = 0.5 and d = 2 fixed.

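\[ u'(x) = C_1\left(\sqrt{b^2 + (x - d)^2} + (x - d)\right)^{-a}. \tag{5.32} \]

We can determine the constants $C_1$ and $C_2$ such that the utility function is standardised, i.e. such that $u(0) = 0$ and $u'(0) = 1$. We find that

\[ C_1 = \left(\sqrt{b^2 + d^2} - d\right)^a, \tag{5.33} \]

and

\[ C_2 = \begin{cases} \frac{C_1}{a^2 - 1}\left(\sqrt{b^2 + d^2} - d\right)^{-a}\left(a\sqrt{b^2 + d^2} - d\right), & a \ne 1 \\ -\frac{C_1}{2}\left(\ln\left(\sqrt{b^2 + d^2} - d\right) - \frac{\left(\sqrt{b^2 + d^2} + d\right)^2}{2b^2}\right), & a = 1. \end{cases} \tag{5.34} \]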

We have plotted the standardised SAHARA utility functions for different valuesof a,b and d in figure 5.7.


Figure 5.7: The SAHARA utility function.

Figure 5.8: b = 1 and d = 2.

Figure 5.9: a = 2 and d = 0.

Figure 5.10: a = 2 and b = 1.

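The calculation of the associated divergence function can be found in appendix B. We found that the associated divergence function is given by

\[ \phi(t) = \begin{cases} \frac{t}{2}\left( \frac{b^2 t^{\frac{1}{a}} C_1^{-\frac{1}{a}}}{1 + \frac{1}{a}} - \frac{t^{-\frac{1}{a}} C_1^{\frac{1}{a}}}{1 - \frac{1}{a}} - 2d \right) + C_2, & a \ne 1 \\ \frac{1}{2}\left( C_1\ln\left(\frac{C_1}{t}\right) - \frac{b^2 t^2}{2C_1} \right) - t\left( d + \frac{C_1}{2t} - \frac{b^2 t}{2C_1} \right) + C_2, & a = 1, \end{cases} \tag{5.35} \]

where the constants $C_1$ and $C_2$ are given by 5.33 and 5.34 respectively. We have plotted the divergence function in figure 5.11.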
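For $a \ne 1$ the divergence function can be compared with a direct numerical evaluation of the Legendre transform. The sketch below is a consistency check under our own assumptions: it only covers the case $a \ne 1$, it uses that $\sup_x (xt - l(x))$ with $l(x) = -u(-x)$ equals $\sup_x (u(x) - xt)$, and the search interval and function names are hypothetical. The parameters in the usage note are those of figure 5.11.

```python
import math


def sahara_constants(a, b, d):
    """Constants C1 and C2 from (5.33) and (5.34) for a != 1."""
    root0 = math.sqrt(b * b + d * d)
    c1 = (root0 - d) ** a
    c2 = c1 / (a * a - 1.0) * (root0 - d) ** (-a) * (a * root0 - d)
    return c1, c2


def sahara_utility(x, a, b, d):
    """Standardised SAHARA utility (5.31) for a != 1."""
    c1, c2 = sahara_constants(a, b, d)
    z = x - d
    root = math.sqrt(b * b + z * z)
    return -c1 / (a * a - 1.0) * (root + z) ** (-a) * (a * root + z) + c2


def divergence_closed_form(t, a, b, d):
    """Divergence function (5.35) for a != 1 and t > 0."""
    c1, c2 = sahara_constants(a, b, d)
    s = (c1 / t) ** (1.0 / a)
    return 0.5 * t * (b * b / s / (1.0 + 1.0 / a) - s / (1.0 - 1.0 / a) - 2.0 * d) + c2


def divergence_numeric(t, a, b, d, lo=-100.0, hi=100.0, iterations=200):
    """sup_x (u(x) - x * t) by ternary search on the concave objective."""
    def objective(x):
        return sahara_utility(x, a, b, d) - x * t

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))
```

With $a = 2$, $b = 2$ and $d = -1$ both functions should agree for moderate values of $t > 0$, and the divergence should vanish at $t = 1$ because the utility is standardised.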

Figure 5.11: Divergence function of SAHARA utility with a = 2, b = 2 and d = −1.

The SAHARA class of utility functions is the most complicated class of utility functions in this chapter.

Unlike in the case of the exponential utility functions and the polynomial utility functions, we will only illustrate the effect of the parameters of the SAHARA utility on the divergence risk measures in a concrete example. For the divergence risk measures we worked with different sets of returns which we generated from a normal distribution with mean 0.25 and different standard deviations. The effects of the parameters a, b and d were plotted in figures 5.12, 5.14 and 5.16 respectively.

When we look at the effect of the parameter a on the SAHARA divergence risk we see that in this case the divergence risk is increasing in the parameter a. Although we do not provide a formal proof, we do not think this relationship is purely coincidental. If we look at figure 5.8 we suspect that if a1 ≥ a2 and all other parameters are fixed, then ua1(x) ≤ ua2(x). Hence the relationship between the parameter a and the divergence risk measure might be due to lemma 5.1.

The same relationship is observed between the parameter a and the utility based shortfall risk. In figure 5.13 we have plotted both the divergence risk and the utility based shortfall risk with x0 = 0 of 10000 returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. Although both the divergence risk and the utility based shortfall risk were plotted, only one graph is visible. This is not a mistake. It turns out that in this example both risk measures yield very similar results, which makes it difficult to distinguish between them.

Figure 5.12: Effect of the parameter a on the SAHARA divergence risk with b = 2 andd = 0.

Figure 5.13: Effect of parameter a on utility based shortfall risk with b = 2 and d = 0.

Using the same sets of returns as in the illustration of the effect of the parameter a, we have illustrated the effect of the parameter b on the divergence risk. For the computations we have taken a = 2 and d = 0. The results are shown in figure 5.14. Here we observe that an increase in the parameter b corresponds to a decrease in the divergence risk measure. We also observe that a higher standard deviation leads to a higher divergence risk. These results are not surprising when we look at figure 5.9, where the effect of the parameter b on the SAHARA utility function is shown.
To illustrate the effect on the utility based shortfall risk we have calculated this risk measure on a set of 10000 returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. We took a = 2, d = 0 and x0 = 0. At the same time we also calculated the divergence risk of the same set of returns. The results are shown in figure 5.15. In this figure we observe not only that the SAHARA utility based shortfall risk is decreasing in the parameter b, but also that the difference between this risk measure and the divergence risk measure is very small.

Figure 5.14: Influence of the parameter b on the SAHARA divergence risk with a = 2 and d = 0.

Figure 5.15: Influence of the parameter b on the utility based shortfall risk with a = 2, d = 0 and x0 = 0.

Until this point each parameter we looked at had a monotone effect on the risk measure. However, if we look at figure 5.16 we notice a non-monotone relationship between the parameter d and the associated divergence risk. This figure was generated, as always, using different sets of returns, each taken from a normal distribution with mean 0.25 and different standard deviations. For the calculations we put a = 2 and b = 2. The same relationship is observed in figure 5.17. Here we have computed both the SAHARA utility based shortfall risk and the SAHARA divergence risk of a set of returns which was generated from a normal distribution with mean 0.25 and standard deviation 0.8. For the computations we took a = 2, b = 2 and x0 = 0. Again we notice that there is almost no difference between the utility based shortfall risk measure and the divergence risk measure. Notice that the non-monotonicity is a consequence of the way that we standardised the SAHARA utility function. If we had taken C1 = 1 and C2 = 0, for example, then the increasing relationship between the parameter d and the divergence risk would follow directly from the shift additivity of the optimised certainty equivalent. To see this, denote by ud1(x) the SAHARA utility function where d = d1. Notice that under the alternative standardisation ud(x) = u0(x − d); therefore, substituting η′ = η + d in the supremum, we have that

\[ OCE_{u_d}(X) = \sup_{\eta\in\mathbb{R}} \left(\eta + E[u_d(X-\eta)]\right) = \sup_{\eta\in\mathbb{R}} \left(\eta + E[u_0(X-d-\eta)]\right) = OCE_{u_0}(X) - d. \]

Hence we would have that Dφd(X) = Dφ0(X) + d. Using the standardisation C1 = 1 and C2 = 0 would also cause an increasing relationship between the parameter d and the utility based shortfall risk, a result which follows directly from the translation invariance of utility based shortfall risk measures.
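The shift property of the optimised certainty equivalent can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the exponential utility u0(x) = 1 − e^{−x} as a stand-in (not the SAHARA utility), an arbitrary shift d = 0.3, simulated normal returns, and a hypothetical helper `oce` built on `scipy.optimize.minimize_scalar`.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def oce(u, x):
    """Sample version of sup over eta of eta + mean(u(x - eta)) (hypothetical helper)."""
    res = minimize_scalar(lambda eta: -(eta + np.mean(u(x - eta))))
    return -res.fun

rng = np.random.default_rng(0)
x = rng.normal(0.25, 0.8, size=10_000)   # simulated returns

u0 = lambda y: 1.0 - np.exp(-y)          # illustrative utility, an assumption
d = 0.3                                  # arbitrary shift
ud = lambda y: u0(y - d)                 # shifted utility u_d(x) = u_0(x - d)

# Shift additivity predicts OCE_{u_d}(X) = OCE_{u_0}(X) - d, so the gap should vanish.
shift_gap = oce(ud, x) - (oce(u0, x) - d)
```

Up to the tolerance of the optimiser, `shift_gap` should be zero for any utility and any sample, since the identity is a change of variable in the supremum.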

Figure 5.16: Influence of the parameter d on the SAHARA divergence risk with a = 2 and b = 2.

Figure 5.17: Influence of the parameter d on the different risk measures with a = 2, b = 2 and x0 = 0. Based on 10000 returns generated from a normal distribution with µ = 0.25 and σ = 0.8.

5.5 The κ-utility functions

In [15] we found the following class of utility functions, which we will call the κ-utility functions. For each κ > 0

\[ u(x) = \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^2x^2}\right) \tag{5.36} \]

denotes a κ-utility function. Notice that this utility function is standardised such that u(0) = 0 and u′(0) = 1 for all κ > 0. We have plotted this function for several values of κ in figure 5.18. We will fix κ and calculate the first and second derivatives with respect to x. We find that

\[ u'(x) = 1 - \frac{\kappa x}{\sqrt{1+\kappa^2x^2}}. \tag{5.37} \]

Because κx < √(1 + κ²x²), we can conclude that u′(x) > 0 for all κ > 0 and for all x. This implies that all utility functions from the class 5.36 are strictly increasing. For the second derivative we find that

\[ u''(x) = \frac{-\kappa}{\left(\sqrt{1+\kappa^2x^2}\right)^3}. \tag{5.38} \]

Because κ > 0 we have that u′′(x) < 0 for all κ and x. This means all utility functions from the class 5.36 are strictly concave everywhere. Now we can easily deduce the Arrow-Pratt measure of absolute risk aversion.

\[ r_a(x) = -\frac{u''(x)}{u'(x)} = \frac{\kappa}{(1+\kappa^2x^2)\left(\sqrt{1+\kappa^2x^2} - \kappa x\right)} \tag{5.39} \]
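As a sanity check, the derivatives 5.37 and 5.38 can be compared with finite differences of 5.36. The following is a minimal sketch; the grid, the step size h and the value κ = 2 are arbitrary choices made for illustration, and the absolute risk aversion is evaluated directly as −u′′/u′.

```python
import numpy as np

def u(x, kappa):
    """kappa-utility 5.36."""
    return (1.0 + kappa * x - np.sqrt(1.0 + kappa**2 * x**2)) / kappa

def u_prime(x, kappa):
    """First derivative 5.37."""
    return 1.0 - kappa * x / np.sqrt(1.0 + kappa**2 * x**2)

def u_second(x, kappa):
    """Second derivative 5.38."""
    return -kappa / (1.0 + kappa**2 * x**2) ** 1.5

x = np.linspace(-3.0, 3.0, 61)
kappa, h = 2.0, 1e-4

# Central finite differences of u as an independent reference.
fd_first = (u(x + h, kappa) - u(x - h, kappa)) / (2 * h)
fd_second = (u(x + h, kappa) - 2 * u(x, kappa) + u(x - h, kappa)) / h**2

err_first = np.max(np.abs(fd_first - u_prime(x, kappa)))
err_second = np.max(np.abs(fd_second - u_second(x, kappa)))

# Arrow-Pratt absolute risk aversion, computed as -u''/u'.
risk_aversion = -u_second(x, kappa) / u_prime(x, kappa)
```

Both errors should be of the order of the finite difference accuracy, and `risk_aversion` should be strictly positive on the whole grid.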

Figure 5.18: Utility function 5.36.

Figure 5.19: Absolute risk aversion.

Figure 5.20: Skew asymptote, κ = 2.

We have plotted this absolute risk aversion for several values of κ in figure 5.19. Using this figure we can make several hypotheses about the class of κ-utility functions. We notice that when κ gets larger, the maximum of the absolute risk aversion gets larger. In the neighbourhood of zero we have that the larger the κ, the larger the absolute risk aversion. We also see that for all values of κ the absolute risk aversion tends to zero in the left tail. The larger the κ, the faster this happens. Hence in the left tail the agent tends to risk-neutrality. This will also be the case for the right tail. We can see this in figure 5.20, where we have plotted the same utility functions as in figure 5.18 but on a larger domain. We can see that when x becomes larger, the utility functions tend to some horizontal asymptote. In the same figure we see that when x gets smaller the utility function tends to some skew asymptote y = ax + b for some a and b. The computations regarding the asymptotic behaviour of the κ-utility can be found in appendix B. To improve the readability of this text we will only discuss the results.
It turns out that the skew asymptote is given by y = 2x + 1/κ. This is illustrated in figure 5.20. The equation of the horizontal asymptote is given by y = 1/κ. The class of utility functions 5.36 is defined for all κ > 0. We will now discuss what happens when κ tends to 0 and when κ tends to +∞.
We have that

\[ \lim_{\kappa\to+\infty} u(x) = x - \sqrt{x^2}. \tag{5.40} \]

From this we can conclude that

\[ \lim_{\kappa\to+\infty} u(x) = \begin{cases} 0, & x \ge 0 \\ 2x, & x < 0. \end{cases} \tag{5.41} \]

We recognize the utility function used in CVaR for a confidence level α = 0.5. The limit when κ tends to zero is given by

\[ \lim_{\kappa\to 0} u(x) = x, \tag{5.42} \]

which gives us the utility function of a risk neutral investor.
The determination of the associated divergence function is a tedious task. All computations can be found in appendix B. We obtained that the divergence function is given by

\[ \varphi(t) = \begin{cases} -\frac{1}{\kappa}\sqrt{1-(t-1)^2} + \frac{1}{\kappa} & \text{if } 0 \le t \le 2 \\ +\infty & \text{if } t > 2. \end{cases} \tag{5.43} \]

In figure 5.21 we have plotted this divergence function.

Figure 5.21: The divergence function of the κ-utility for different values of κ.
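The closed form 5.43 can be compared with a direct numerical evaluation of the Legendre transform φ(t) = sup_x (xt − l(x)) of the loss function l(x) = −u(−x). The sketch below is an illustration only: κ = 2 and the grid of t values inside (0, 2) are arbitrary, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kappa_loss(x, kappa):
    """Loss function l(x) = -u(-x) of the kappa-utility."""
    return -(1.0 - kappa * x - np.sqrt(1.0 + kappa**2 * x**2)) / kappa

def phi_closed(t, kappa):
    """Divergence function 5.43 on 0 <= t <= 2."""
    return (1.0 - np.sqrt(1.0 - (t - 1.0) ** 2)) / kappa

def phi_numeric(t, kappa):
    """sup_x (x t - l(x)), computed by minimising the convex map x -> l(x) - x t."""
    res = minimize_scalar(lambda x: kappa_loss(x, kappa) - x * t)
    return -res.fun

kappa = 2.0
ts = np.linspace(0.2, 1.8, 9)   # stay away from the end points 0 and 2
max_gap = max(abs(phi_numeric(t, kappa) - phi_closed(t, kappa)) for t in ts)
```

Close to t = 0 and t = 2 the maximiser moves off to infinity, which is why the grid avoids the end points.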

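To understand the effect of the parameter κ on both the divergence risk measure and the utility based shortfall risk measure we will again use lemma 5.1. The plots of the κ-utility functions lead us to suspect that if κ1 ≥ κ2 then uκ1(x) ≤ uκ2(x) for all x ∈ R.

To verify this, write s = 1/κ, so that

\[ u_\kappa(x) = \frac{1}{\kappa} + x - \frac{\sqrt{1+\kappa^2x^2}}{\kappa} = s + x - \sqrt{s^2+x^2}. \]

For every x ∈ R we have that

\[ \frac{\partial}{\partial s}\left(s + x - \sqrt{s^2+x^2}\right) = 1 - \frac{s}{\sqrt{s^2+x^2}} \ge 0, \]

so uκ(x) is non-decreasing in s = 1/κ. Assume that κ1 ≥ κ2 > 0; then 1/κ1 ≤ 1/κ2 and hence for all x ∈ R

\[ u_{\kappa_1}(x) = \frac{1}{\kappa_1} + x - \frac{\sqrt{1+\kappa_1^2x^2}}{\kappa_1} \le \frac{1}{\kappa_2} + x - \frac{\sqrt{1+\kappa_2^2x^2}}{\kappa_2} = u_{\kappa_2}(x). \]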

From this we can conclude that an increase in the parameter κ results in an increase of the divergence risk and of the utility based shortfall risk.
This effect of the parameter κ on the divergence risk is illustrated in figure 5.22. We have constructed this figure in the same fashion as before, i.e. sets of 10000 returns were simulated from a normal distribution with mean 0.25 such that each set had a different standard deviation.
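The ordering of the κ-utility functions in κ can also be observed numerically. This is a minimal sketch on an arbitrary grid of x values and an arbitrary increasing list of κ values; it illustrates the inequality and does not replace the argument.

```python
import numpy as np

def kappa_utility(x, kappa):
    """kappa-utility 5.36."""
    return (1.0 + kappa * x - np.sqrt(1.0 + kappa**2 * x**2)) / kappa

x = np.linspace(-5.0, 5.0, 201)
kappas = np.array([0.5, 1.0, 2.0, 5.0, 10.0])   # increasing, chosen arbitrarily

# One row per kappa; moving down the rows means increasing kappa.
values = np.array([kappa_utility(x, k) for k in kappas])

# u_kappa(x) should not increase when kappa increases (small tolerance for rounding).
is_nonincreasing = bool(np.all(np.diff(values, axis=0) <= 1e-12))
```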

Figure 5.22: Influence of κ-parameter on the divergence risk.

In figure 5.23 we have plotted the divergence risk and the utility based shortfall risk with x0 = 0 of a set of returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. This figure illustrates the fact that utility based shortfall risk is increasing in the parameter κ. We can also observe that there is a clear difference between the utility based shortfall risk with x0 = 0 and the divergence risk, and that the difference between the two risk measures increases if κ increases.

Figure 5.23: Comparison of the divergence risk and the utility based shortfall risk of the κ-utility.
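A comparison of this kind can be sketched as follows. This is not the code used for the figures; it is a minimal illustration that assumes the sample versions Dφ(X) = −sup_η (η + mean u(X − η)) and SF(X) = inf{m : mean u(X + m) ≥ 0} with x0 = 0, and it uses `minimize_scalar` and `brentq` from SciPy. The seed, the bracket [−10, 10] and the κ values are arbitrary.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def kappa_utility(x, kappa):
    """kappa-utility 5.36."""
    return (1.0 + kappa * x - np.sqrt(1.0 + kappa**2 * x**2)) / kappa

def divergence_risk(returns, kappa):
    """Negative optimised certainty equivalent of the sample."""
    res = minimize_scalar(
        lambda eta: -(eta + np.mean(kappa_utility(returns - eta, kappa))))
    return res.fun   # the minimum of -(eta + E[u(X - eta)]) equals -OCE

def shortfall_risk(returns, kappa):
    """Utility based shortfall risk with x0 = 0: root of m -> mean u(X + m)."""
    g = lambda m: np.mean(kappa_utility(returns + m, kappa))
    return brentq(g, -10.0, 10.0)   # g is increasing; bracket chosen generously

rng = np.random.default_rng(0)
returns = rng.normal(0.25, 0.8, size=10_000)

# Map each kappa to the pair (divergence risk, shortfall risk).
results = {k: (divergence_risk(returns, k), shortfall_risk(returns, k))
           for k in (0.5, 1.0, 2.0)}
```

For every κ the shortfall risk should be at least as large as the divergence risk, and both should grow with κ.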

When looking at the divergence of the κ-utility we notice that for t > 2 the divergence becomes +∞. That is, for values of t larger than the slope of the skew asymptote the divergence is +∞. We have seen something similar when we looked at the divergence associated with CVaR. In that case the utility function is given by u(x) = −(1/α) max(0, −x) and y = (1/α)x is a skew asymptote for x → −∞. The associated divergence was given by

\[ \varphi(t) = \begin{cases} 0 & \text{if } 0 \le t \le \frac{1}{\alpha} \\ +\infty & \text{if } t > \frac{1}{\alpha}. \end{cases} \tag{5.44} \]

This turns out not to be a coincidence. Because the slope of the skew asymptote is an upper bound of the slope of the utility function, or equivalently of the slope of the loss function, the Legendre transform of the loss function becomes infinite for values larger than this upper bound. This effect is illustrated in figure 5.24, where we have used κ = 5. We have formalised this intuition in a lemma.

Figure 5.24: Effect of skew asymptote on the divergence function.

Lemma 5.2. Let u(x) be an increasing and concave utility function, such that y = ax + b is a skew asymptote for x → −∞. Then the associated divergence φ(t) = l∗(t), where l(x) = −u(−x), equals +∞ for all t > a.

Proof. Because y = ax + b is a skew asymptote of u(x) when x tends to −∞, we have that y = ax − b is a skew asymptote of l(x) = −u(−x) when x tends to +∞. Because l(x) is a convex and increasing function, the slope of l(x) is increasing. Hence a is an upper bound for the slope of the loss function. Because ax − b is a skew asymptote when x tends to +∞ we have that

\[ \forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } x > \delta \Rightarrow |l(x) - (ax - b)| < \varepsilon. \]

Hence there exists a γ such that l(x) − ax + b < γ for all x > δ, because we can take γ = ε. We need to calculate sup_{x∈R} (xt − l(x)). We will show that for t > a the function xt − l(x) is unbounded from above. Suppose that xt − l(x) is bounded from above for t > a; then there exists an M ∈ R such that xt − l(x) ≤ M for all x ∈ R. We have that for x > δ, l(x) < γ + ax − b. Hence we have that xt ≤ l(x) + M < γ + ax − b + M. From this we have that x(t − a) < γ − b + M for all x > δ. Because t > a we find that x < (γ − b + M)/(t − a) for all x > δ, which gives a contradiction. Hence xt − l(x) is unbounded and φ(t) = l∗(t) = sup_{x∈R} (xt − l(x)) = +∞.
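The mechanism of the lemma can be observed on the κ-utility, whose skew asymptote has slope 2. The short sketch below evaluates xt − l(x) for a slope t above 2 at a few increasing values of x; the values κ = 5 and t = 2.5 and the evaluation points are arbitrary choices for illustration.

```python
import numpy as np

def kappa_loss(x, kappa):
    """Loss function l(x) = -u(-x) of the kappa-utility."""
    return -(1.0 - kappa * x - np.sqrt(1.0 + kappa**2 * x**2)) / kappa

kappa, t = 5.0, 2.5                      # t exceeds the asymptote slope 2
xs = np.array([10.0, 100.0, 1000.0])

# For large x the loss is close to 2x - 1/kappa, so x*t - l(x) grows like (t - 2)*x.
values = xs * t - kappa_loss(xs, kappa)
```

The entries of `values` keep increasing with x, in line with the supremum being +∞ for t > 2.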


Conclusion

In this master thesis we looked at different ways to incorporate utility functions in risk measures. We focused on two classes of risk measures: utility based shortfall risk measures and divergence risk measures. Both of these risk measures are convex, which means that they satisfy the properties of monotonicity, translation invariance and convexity. However, they are generally not coherent because they lack the positive homogeneity property. Like all convex risk measures, these risk measures have a robust representation of the following form

\[ \sup_{Q \in M_1(P)} \left(E_Q[-X] - \alpha(Q)\right), \]

where α(·) is a penalty function. In the case of divergence risk measures this representation is often used and the penalty function is taken to be the φ-divergence Iφ(Q|P) = E_P[φ(dQ/dP)], where the convex function φ is called the divergence function. The effect of this divergence function is difficult to analyse and to interpret. Fortunately the strong Fenchel duality theorem from mathematical optimisation allowed us to reformulate the robust representation of a divergence risk measure as a more comprehensible formula. We obtained that each divergence risk measure can be interpreted as the negative of an optimised certainty equivalent, where the utility function u is linked to the divergence function φ through the Fenchel-Legendre transform. More formally we obtained that if φ∗(x) = −u(−x) then

\[ D_\varphi(X) = \sup_{Q \in M_1(P)} \left(E_Q[-X] - I_\varphi(Q|P)\right) = -\sup_{\eta\in\mathbb{R}} \left(\eta + E[u(X-\eta)]\right) = -OCE_u(X). \]

The utility based shortfall risk measures were defined as the negative of the u-Mean certainty equivalent. We had that for the loss function l(x) = −u(−x) + x0

\[ SF^l_{x_0}(X) = \inf\{m \in \mathbb{R} \mid E[l(-X-m)] \le x_0\} = -\sup\{m \in \mathbb{R} \mid E[u(X-m)] \ge 0\} = -M_u(X). \]

Both utility based risk measures have a representation as an optimisation problem with regard to a utility function. These optimisation problems were linked using strong Lagrangian duality. We obtained that SF^l_0(X) ≥ Dφ(X).
Because both the divergence risk measure and the utility based shortfall risk measure can be interpreted as the negative of some certainty equivalent, we looked into the possibility of using the negative of the ordinary certainty equivalent as a risk measure. It turned out that to get a translation invariant risk measure, only linear or exponential utility functions could be used. Since we also noted that for the exponential utility function all certainty equivalents coincide, we did not obtain any new interesting convex risk measures.

After reading this thesis, the reader might feel there remains an important unanswered question, namely "Which utility function should be used in utility based risk measures?" We do not give an answer to this question.
One of the reasons for not proposing a specific utility function is that utility functions model preferences and these preferences are subjective. Another important reason is computability. In this thesis we did not look at how these risk measures could be efficiently computed. Although we did compute some utility based risk measures in the last chapter, we did this by using packages for constrained and unconstrained optimisation in Python. This computational aspect should be taken into account when choosing a suitable utility function.
Although we did not put forward a specific utility function that should be used in utility based risk measures, we did study some of them in the last chapter. Here we tried to illustrate how the parameters of the utility functions affect both the utility based shortfall risk and the divergence risk. For each of the utility functions we also computed the associated divergence function.

A Dutch summary (English translation)

To set capital requirements for financial institutions, it is necessary to be able to determine the risk of the portfolios of these institutions. This risk can be determined by means of risk measures. In the first chapter we study these risk measures from a mathematical point of view and formulate some properties that a good risk measure should have. Based on these properties we can construct the class of convex risk measures. Within this class we then pay extra attention to the subclass of coherent risk measures. Next we study the concept of acceptance sets. These are sets containing all possible portfolios whose risk we find acceptable. Using these sets we can define risk measures in a simple way. We then introduce the robust representation of convex risk measures and study the associated penalty function.
Studying risk measures only from a mathematical point of view would completely ignore the subjective nature of risk. What is too high a risk for one person is acceptable to another. The second chapter therefore gives an introduction to decision theory. Here we introduce the von Neumann-Morgenstern framework for making decisions under uncertainty. We explain what utility functions are and how they can model the different attitudes towards risk. We then turn to different types of certainty equivalents: the ordinary certainty equivalent (CEu), the optimised certainty equivalent (OCEu) and the u-Mean certainty equivalent (Mu). We close this chapter with an introduction to the concept of stochastic dominance.
Armed with both the mathematical concepts from the first chapter and the economic concepts from the second chapter, we can now analyse concrete risk measures. This is done in the third chapter, in which we apply these concepts to both Value at Risk and Expected Shortfall. Here we note that Value at Risk, one of the most widely used risk measures, shows some important shortcomings both mathematically and economically.
In the fourth chapter we go deeper into the main question of this thesis: "How can utility functions be incorporated into risk measures in a good way?" We give two possible answers to this question. First there are the utility based shortfall risk measures (SF^l_{x0}). These risk measures are constructed from acceptance sets. These acceptance sets contain all portfolios whose expected utility exceeds a certain bound. If we define the loss function l as l(x) = −u(−x), we have that

\[ SF^l_{x_0} = \inf\{m \in \mathbb{R} \mid E[u(X+m)] \ge -x_0\}. \]

Here we noted that we can rewrite this formula by means of a u-Mean certainty equivalent. We have that SF^l_0 = −Mu(X).
A second type of risk measures in which utility functions are incorporated are the so-called divergence risk measures (Dφ). In contrast to the utility based shortfall risk measures, these risk measures are not constructed by means of acceptance sets; instead, use is made of the robust representation of convex risk measures. We have that

\[ D_\varphi(X) = \sup_{Q \ll P} \left(E_Q[-X] - I_\varphi(Q|P)\right). \]

Characteristic of divergence risk measures is that the penalty function has the form Iφ(Q|P) = E_P[φ(dQ/dP)], where φ is a convex function [1] called the divergence function. One of the best-known examples of a divergence risk measure is entropic risk. Here one takes as divergence the Kullback-Leibler entropy, E_P[(dQ/dP) ln(dQ/dP)]. The interpretation of divergence risk measures is not easy if one only has the robust representation at one's disposal. Fortunately the strong Fenchel duality theorem offers a solution. If we take as utility function u(x) = −φ∗(−x), where φ∗ is the Fenchel-Legendre transform of the divergence function, we find that

\[ D_\varphi(X) = -\sup_{\eta\in\mathbb{R}} \left(\eta + E[u(X-\eta)]\right) = -OCE_u(X). \]

This representation is much easier to interpret than the robust representation. For instance, entropic risk can be interpreted as the negative optimised certainty equivalent of an individual with an exponential utility function.
Inspired by the fact that both divergence risk measures and utility based shortfall risk measures can be interpreted as negative certainty equivalents, we asked ourselves whether the ordinary certainty equivalent would also give rise to a good risk measure in this way. This idea turned out to be rather unsuccessful, since in many cases these risk measures did not have the desired mathematical properties.
Although entropic risk is a very well-known example of a divergence risk measure, from an economic point of view there is little reason to use the exponential utility function for the construction of risk measures. In the last chapter we therefore studied different utility functions in the context of risk measures. For each of these utility functions we computed the associated divergence function and investigated the influence of the parameters on the different risk measures.

[1] Which may also take the values +∞ and −∞, but is finite in at least one point.

B Additional computations

B.1 Computations regarding the SAHARA utility class

The class of SAHARA utility functions is defined using the following coefficient of absolute risk aversion.

\[ r_A(x) = \frac{a}{\sqrt{b^2+(x-d)^2}} \tag{B.1} \]

with a > 0, b > 0 and d ∈ R.

B.1.1 Computation of the utility function

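Let v(x) = u′(x); then we have that

\[ \frac{dv(x)}{v(x)} = \frac{-a}{\sqrt{b^2+(x-d)^2}}\,dx. \]

Integrating both sides, with y = x − d, gives

\[ \begin{aligned}
\ln(v(x)) &= \int \frac{-a}{\sqrt{b^2+(x-d)^2}}\,dx \\
&= \int \frac{-a}{\sqrt{b^2+y^2}}\,dy \\
&= -a \ln\left(\sqrt{b^2+y^2}+y\right) + C \\
&= \ln\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a} + C.
\end{aligned} \]

We conclude that

\[ u'(x) = C_1\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a} \tag{B.2} \]

for some integration constant C1. Then we have that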

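\[ u(x) = \int C_1\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a} dx. \]

Let y = x − d; then we have that

\[ u(y) = \int C_1\left(\sqrt{b^2+y^2}+y\right)^{-a} dy. \]

Now consider the following substitution

\[ y = \frac{z^2-b^2}{2z}, \qquad dy = \frac{z^2+b^2}{2z^2}\,dz. \]

Then we have that

\[ \begin{aligned}
u(z) &= \int C_1\left(\sqrt{b^2+\left(\frac{z^2-b^2}{2z}\right)^2} + \frac{z^2-b^2}{2z}\right)^{-a}\left(\frac{z^2+b^2}{2z^2}\right)dz \\
&= \int C_1\left(\sqrt{\frac{4b^2z^2+z^4-2z^2b^2+b^4}{4z^2}} + \frac{z^2-b^2}{2z}\right)^{-a}\left(\frac{z^2+b^2}{2z^2}\right)dz \\
&= \int C_1\left(\sqrt{\left(\frac{z^2+b^2}{2z}\right)^2} + \frac{z^2-b^2}{2z}\right)^{-a}\left(\frac{z^2+b^2}{2z^2}\right)dz \\
&= \int C_1\left(\frac{2z^2}{2z}\right)^{-a}\left(\frac{z^2+b^2}{2z^2}\right)dz \\
&= \int C_1\,\frac{1}{2}\,z^{-a-2}\left(z^2+b^2\right)dz.
\end{aligned} \]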

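Now consider the case that a ≠ 1; then

\[ \begin{aligned}
u(z) &= \int C_1\,\frac{1}{2}\,z^{-a-2}\left(z^2+b^2\right)dz \\
&= \frac{C_1}{2}\left(\frac{z^{-a+1}}{-a+1} + \frac{b^2z^{-a-1}}{-a-1}\right) + C_2 \\
&= \frac{-C_1z^{-a}}{2(a^2-1)}\left(z(a+1) + b^2z^{-1}(a-1)\right) + C_2.
\end{aligned} \]

Using the substitution y = (z² − b²)/(2z), or equivalently z = √(b² + y²) + y, we have that

\[ \begin{aligned}
u(y) &= \frac{-C_1\left(\sqrt{b^2+y^2}+y\right)^{-a}}{2(a^2-1)}\left(\left(\sqrt{b^2+y^2}+y\right)(a+1) + b^2\left(\sqrt{b^2+y^2}+y\right)^{-1}(a-1)\right) + C_2 \\
&= \frac{-C_1\left(\sqrt{b^2+y^2}+y\right)^{-a}}{2(a^2-1)}\left(\left(\sqrt{b^2+y^2}+y\right)(a+1) + \left(\sqrt{b^2+y^2}-y\right)(a-1)\right) + C_2 \\
&= \frac{-C_1\left(\sqrt{b^2+y^2}+y\right)^{-a}}{a^2-1}\left(a\sqrt{b^2+y^2}+y\right) + C_2.
\end{aligned} \]

We conclude that for a ≠ 1 we have that

\[ u(x) = \frac{-C_1}{a^2-1}\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a}\left(a\sqrt{b^2+(x-d)^2}+(x-d)\right) + C_2. \tag{B.3} \]

Now consider the case that a = 1; then we have that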

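\[ \begin{aligned}
u(z) &= \int \left(\frac{C_1}{2}z^{-1} + \frac{C_1}{2}b^2z^{-3}\right)dz \\
&= \frac{C_1}{2}\left(\ln(z) - \frac{b^2}{2z^2}\right) + C_2.
\end{aligned} \]

Now using the substitution z = √(b² + y²) + y we find that

\[ \begin{aligned}
u(y) &= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right) - \frac{b^2}{2\left(\sqrt{b^2+y^2}+y\right)^2}\right) + C_2 \\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right) - \frac{b^2\left(\sqrt{b^2+y^2}-y\right)^2}{2\left(\sqrt{b^2+y^2}+y\right)^2\left(\sqrt{b^2+y^2}-y\right)^2}\right) + C_2 \\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right) - \frac{b^2\left(\sqrt{b^2+y^2}-y\right)^2}{2b^4}\right) + C_2 \\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+y^2}+y\right) - \frac{\left(\sqrt{b^2+y^2}-y\right)^2}{2b^2}\right) + C_2.
\end{aligned} \]

Using that y = x − d we conclude that if a = 1 we have that

\[ u(x) = \frac{C_1}{2}\left(\ln\left(\sqrt{b^2+(x-d)^2}+(x-d)\right) - \frac{\left(\sqrt{b^2+(x-d)^2}-(x-d)\right)^2}{2b^2}\right) + C_2. \tag{B.4} \]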

Now we will determine the constants C1 and C2 such that u(0) = 0 and u′(0) = 1. From B.2 we have that the condition u′(0) = 1 yields

\[ C_1 = \left(\sqrt{b^2+d^2}-d\right)^{a}. \]

Notice that C1 > 0. The condition u(0) = 0 yields that when a ≠ 1

\[ C_2 = \frac{C_1}{a^2-1}\left(\sqrt{b^2+d^2}-d\right)^{-a}\left(a\sqrt{b^2+d^2}-d\right). \]

When a = 1 we have that

\[ C_2 = -\frac{C_1}{2}\left(\ln\left(\sqrt{b^2+d^2}-d\right) - \frac{\left(\sqrt{b^2+d^2}+d\right)^2}{2b^2}\right). \]
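The standardisation can be verified numerically for a ≠ 1. The sketch below implements B.3 with the constants C1 and C2 derived above and checks u(0) = 0 and u′(0) = 1 with a central finite difference; the function name `sahara_utility` and the step size are hypothetical choices, and the parameter values a = 2, b = 2, d = −1 are those of figure 5.11.

```python
import numpy as np

def sahara_utility(x, a, b, d):
    """Standardised SAHARA utility B.3 for a != 1, with u(0) = 0 and u'(0) = 1."""
    root0 = np.sqrt(b**2 + d**2)
    c1 = (root0 - d) ** a                                      # from u'(0) = 1
    c2 = c1 / (a**2 - 1) * (root0 - d) ** (-a) * (a * root0 - d)  # from u(0) = 0
    y = x - d
    root = np.sqrt(b**2 + y**2)
    return -c1 / (a**2 - 1) * (root + y) ** (-a) * (a * root + y) + c2

a, b, d = 2.0, 2.0, -1.0
h = 1e-5
value_at_zero = sahara_utility(0.0, a, b, d)
slope_at_zero = (sahara_utility(h, a, b, d) - sahara_utility(-h, a, b, d)) / (2 * h)
```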

B.1.2 Computation of the divergence function

Now we will derive the divergence function associated with the standardised SAHARA utility.
First suppose that a ≠ 1; then the utility function is given by B.3. Denote by

\[ u_1(x) := \frac{-1}{a^2-1}\left(\sqrt{b^2+(x-d)^2}+(x-d)\right)^{-a}\left(a\sqrt{b^2+(x-d)^2}+(x-d)\right). \tag{B.5} \]

Then u(x) = C1u1(x) + C2. We will calculate the divergence function associated with u1 and denote this function by φ1. There is a clear relation between φ1 and the divergence function φ associated with u. Because C1 > 0 we have that

\[ \begin{aligned}
\varphi(t) &= \sup_{x\in\mathbb{R}} \left(xt + u(-x)\right) \\
&= \sup_{x\in\mathbb{R}} \left(u(x) - xt\right) \\
&= \sup_{x\in\mathbb{R}} \left(C_1u_1(x) + C_2 - xt\right) \\
&= C_1 \sup_{x\in\mathbb{R}} \left(u_1(x) - x\frac{t}{C_1}\right) + C_2 \\
&= C_1\varphi_1\left(\frac{t}{C_1}\right) + C_2.
\end{aligned} \]

We have that φ1(t) = sup_{x∈R} (u1(x) − xt). The first order condition gives that u1′(x∗) = t, where x∗ denotes the optimal value.

\[ \left(\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right)^{-a} = t \;\Leftrightarrow\; (x^*-d) = \frac{1}{2}\left(t^{-\frac{1}{a}} - b^2t^{\frac{1}{a}}\right). \]

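Then we have that

\[ \begin{aligned}
u_1(x^*) &= \frac{-1}{a^2-1}\left(\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right)^{-a}\left(a\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{b^2+(x^*-d)^2}+(x^*-d)\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{b^2+\left(\frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right)^2} + \frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{\frac{4b^2+t^{-\frac{2}{a}}-2b^2+b^4t^{\frac{2}{a}}}{4}} + \frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{a^2-1}\left(a\sqrt{\left(\frac{t^{-\frac{1}{a}}+b^2t^{\frac{1}{a}}}{2}\right)^2} + \frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{a^2-1}\left(a\left(\frac{t^{-\frac{1}{a}}+b^2t^{\frac{1}{a}}}{2}\right) + \frac{t^{-\frac{1}{a}}}{2}-\frac{b^2t^{\frac{1}{a}}}{2}\right) \\
&= \frac{-t}{2(a^2-1)}\left(t^{-\frac{1}{a}}(a+1) + b^2t^{\frac{1}{a}}(a-1)\right) \\
&= \frac{-t}{2}\left(\frac{t^{-\frac{1}{a}}}{a-1} + \frac{b^2t^{\frac{1}{a}}}{a+1}\right).
\end{aligned} \]

\[ \begin{aligned}
\varphi_1(t) &= u_1(x^*) - x^*t \\
&= \frac{-t}{2}\left(\frac{t^{-\frac{1}{a}}}{a-1} + \frac{b^2t^{\frac{1}{a}}}{a+1}\right) - \frac{t}{2}\left(2d + t^{-\frac{1}{a}} - b^2t^{\frac{1}{a}}\right) \\
&= \frac{-t}{2}\left(t^{-\frac{1}{a}}\left(\frac{1}{a-1}+1\right) + b^2t^{\frac{1}{a}}\left(\frac{1}{a+1}-1\right)\right) - td \\
&= \frac{t}{2}\left(\frac{b^2t^{\frac{1}{a}}}{1+\frac{1}{a}} - \frac{t^{-\frac{1}{a}}}{1-\frac{1}{a}} - 2d\right).
\end{aligned} \]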
Hence we have that

φ(t) = C1φ1

(t

C1

)+ C2 (B.6)

When a = 1 we have that the first order condition yields that

t =(√

b2 + (x∗0 − d)2 + (x∗ − d)2)−1

.

111

From which we can conclude that

(x∗ − d) =t−1

2− b2t

2

x∗ = d+t−1

2− b2t

2.

We have that

u1(x∗) =1

2

(ln(√b2 + (x∗ − d)2 + (x− d))−

√b2 + (x∗ − d)2 − (x∗ − d)

2b2

)

=1

2

(ln(t−1)− 1

2b2

(√4b2 + t−2 − b2 + b4t2

4− t−1

2+b2t

2

))

=1

2

(ln(t−1)− 1

2b2

(t−1

2+b2t

2− t−1

2+b2t

2

))=

1

2

(ln(t−1)− b2t

2b2

)=

1

2

(ln(t−1)− t

2

).

Then we have that

φ1(t) = u1(x∗)− tx∗

=1

2

(ln(t−1)− t

2

)− t(d+

t−1

2− b2t

2

).

B.2 Computations regarding the κ-utility class

B.2.1 Determining the asymptotic behaviour

For κ > 0 the κ-utility is given by

\[ u(x) = \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^2x^2}\right). \tag{B.7} \]

We will now determine the equation corresponding to the skew asymptote of the κ-utility. The skew asymptote is given by y = ax + b. Using Cauchy's formulas we find that

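\[ \begin{aligned}
a &= \lim_{x\to-\infty} \frac{u(x)}{x} \\
&= \lim_{x\to-\infty} \frac{1}{\kappa}\left(\frac{1+\kappa x-\sqrt{1+\kappa^2x^2}}{x}\right) \\
&= 0 + 1 + \lim_{x\to-\infty} \frac{-\sqrt{1+\kappa^2x^2}}{\kappa x} \\
&= 1 + \lim_{x\to-\infty} \frac{-\sqrt{x^2}\sqrt{\frac{1}{x^2}+\kappa^2}}{\kappa x} \\
&= 1 + \lim_{x\to-\infty} \frac{x\sqrt{\frac{1}{x^2}+\kappa^2}}{\kappa x} \\
&= 1 + \lim_{x\to-\infty} \frac{\sqrt{\frac{1}{x^2}+\kappa^2}}{\kappa} \\
&= 1 + \frac{\sqrt{\kappa^2}}{\kappa} = 2
\end{aligned} \]

\[ \begin{aligned}
b &= \lim_{x\to-\infty} \left[u(x) - ax\right] \\
&= \lim_{x\to-\infty} \left[\frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) - 2x\right] \\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[x - \frac{1}{\kappa}\sqrt{1+\kappa^2x^2} - 2x\right] \\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[-x - \frac{1}{\kappa}\sqrt{1+\kappa^2x^2}\right] \\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[-x - \frac{\sqrt{x^2}}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}\right] \\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[-x + \frac{x}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}\right] \\
&= \frac{1}{\kappa} - \lim_{x\to-\infty} x\left[1 - \frac{1}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}\right] \\
&= \frac{1}{\kappa} - \lim_{x\to-\infty} \frac{1 - \frac{1}{\kappa}\sqrt{\frac{1}{x^2}+\kappa^2}}{\frac{1}{x}} \\
&= \frac{1}{\kappa} - \lim_{z\to 0} \left[\frac{1 - \frac{1}{\kappa}\sqrt{z^2+\kappa^2}}{z}\right] \\
&= \frac{1}{\kappa} - \lim_{z\to 0} \left[\frac{-z}{\kappa\sqrt{z^2+\kappa^2}}\right] = \frac{1}{\kappa}
\end{aligned} \]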

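We can conclude that the skew asymptote is given by y = 2x + 1/κ. We will now derive the equation of the horizontal asymptote.

\[ \begin{aligned}
\lim_{x\to+\infty} u(x) &= \lim_{x\to+\infty} \frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{1}{\kappa}\left(\kappa x - \kappa x\sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} x\left(1 - \sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{1 - \sqrt{1+\frac{1}{\kappa^2x^2}}}{\frac{1}{x}} \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{\frac{1}{\kappa^2x^3\sqrt{1+\frac{1}{\kappa^2x^2}}}}{\frac{-1}{x^2}} \\
&= \frac{1}{\kappa} - \lim_{x\to+\infty} \frac{1}{\kappa^2x\sqrt{1+\frac{1}{\kappa^2x^2}}} \\
&= \frac{1}{\kappa}.
\end{aligned} \]

We conclude that if x tends to infinity the utility function tends to the horizontal asymptote with equation y = 1/κ.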
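Both asymptotes can be checked numerically by evaluating the κ-utility far out in the tails. This is a minimal sketch; κ = 2 and the evaluation points ±1000 are arbitrary, and moderate values of |x| are used on purpose to limit cancellation errors in floating point arithmetic.

```python
import numpy as np

def kappa_utility(x, kappa):
    """kappa-utility B.7."""
    return (1.0 + kappa * x - np.sqrt(1.0 + kappa**2 * x**2)) / kappa

kappa = 2.0

# Distance to the skew asymptote y = 2x + 1/kappa in the left tail.
left_gap = kappa_utility(-1e3, kappa) - (2.0 * -1e3 + 1.0 / kappa)

# Distance to the horizontal asymptote y = 1/kappa in the right tail.
right_gap = kappa_utility(1e3, kappa) - 1.0 / kappa
```

Both gaps should be small and negative, since the concave utility lies below its asymptotes.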

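The class of utility functions 5.36 is defined for all κ > 0. We will now show what happens when κ tends to zero and when κ tends to +∞.

\[ \begin{aligned}
\lim_{\kappa\to+\infty} u(x) &= \lim_{\kappa\to+\infty} \frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \\
&= \lim_{\kappa\to+\infty} \left(\frac{1}{\kappa} + x - \sqrt{\frac{1}{\kappa^2}+x^2}\right) \\
&= x - \lim_{\kappa\to+\infty} \sqrt{\frac{1}{\kappa^2}+x^2} \\
&= x - \sqrt{x^2}.
\end{aligned} \]

\[ \begin{aligned}
\lim_{\kappa\to 0} u(x) &= \lim_{\kappa\to 0} \frac{1}{\kappa}\left(1+\kappa x-\sqrt{1+\kappa^2x^2}\right) \\
&= \lim_{\kappa\to 0} \left(x - \frac{\kappa x^2}{\sqrt{1+\kappa^2x^2}}\right) \\
&= x.
\end{aligned} \]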

B.2.2 Computation of the divergence function

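Fix some κ > 0; then the associated loss function is given by l(x) = −u(−x).

\[ l(x) = -\frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^2x^2}\right) \tag{B.8} \]

The associated divergence function is given by φ(t) = l∗(t).

\[ \varphi(t) = \sup_{x\in\mathbb{R}} \left(xt + \frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^2x^2}\right)\right) \tag{B.9} \]

The first order condition yields that

\[ t - \frac{1}{\kappa}\left(\kappa + \frac{\kappa^2x}{\sqrt{1+\kappa^2x^2}}\right) = 0. \tag{B.10} \]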

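We will only derive the divergence for t > 0. First assume that t ≥ 2. Then there does not exist an x ∈ R such that equation B.10 holds. We can see this easily when we rewrite this equation as

\[ t = 1 + \frac{\kappa x}{\sqrt{1+\kappa^2x^2}}. \tag{B.11} \]

Because κx < √(1 + κ²x²) for all x ∈ R and κ > 0 we have that κx/√(1 + κ²x²) < 1. Hence if equation B.11 holds for some x and some κ, then t < 2. If t ≥ 2 then the first derivative t − (1/κ)(κ + κ²x/√(1 + κ²x²)) > 0. This implies that the function xt + (1/κ)(1 − κx − √(1 + κ²x²)) is increasing. To determine the supremum over all x of this function we will study the limit when x tends to +∞. First consider the case when t = 2. We have that

\[ \begin{aligned}
\lim_{x\to+\infty} \left(2x + \frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right)\right) &= \frac{1}{\kappa} + \lim_{x\to+\infty} \left(x - \frac{1}{\kappa}\sqrt{1+\kappa^2x^2}\right) \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} x\left(1 - \sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{1 - \sqrt{1+\frac{1}{\kappa^2x^2}}}{\frac{1}{x}} \\
&= \frac{1}{\kappa} - \lim_{x\to+\infty} \frac{1}{\kappa^2x\sqrt{1+\frac{1}{\kappa^2x^2}}} \\
&= \frac{1}{\kappa}.
\end{aligned} \]

We conclude that:

\[ \varphi(2) = \frac{1}{\kappa}. \tag{B.12} \]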

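Now consider the case where t > 2. Calculating the limit gives us

\[ \begin{aligned}
\lim_{x\to+\infty} \left(tx + \frac{1}{\kappa}\left(1-\kappa x-\sqrt{1+\kappa^2x^2}\right)\right) &= \frac{1}{\kappa} + \lim_{x\to+\infty} \left((t-1)x - \frac{1}{\kappa}\sqrt{1+\kappa^2x^2}\right) \\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} x\left((t-1) - \sqrt{1+\frac{1}{\kappa^2x^2}}\right) \\
&= +\infty,
\end{aligned} \]

because the factor between brackets tends to t − 1 − 1 > 0. We find that:

\[ \varphi(t) = +\infty, \qquad t > 2. \tag{B.13} \]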

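When 0 < t < 2 equation B.11 can hold for some x. We will now determine this x as a function of t. We find that

\[ \begin{aligned}
t = 1 + \frac{\kappa x}{\sqrt{1+\kappa^2x^2}} &\Rightarrow (t-1) = \frac{\kappa x}{\sqrt{1+\kappa^2x^2}} \\
&\Rightarrow (t-1)^2 = \frac{\kappa^2x^2}{1+\kappa^2x^2} \\
&\Rightarrow \frac{1}{(t-1)^2} = \frac{1+\kappa^2x^2}{\kappa^2x^2} \\
&\Rightarrow \frac{1}{(t-1)^2} - 1 = \frac{1}{\kappa^2x^2} \\
&\Rightarrow \pm\sqrt{\frac{1}{(t-1)^2}-1} = \frac{1}{\kappa x} \\
&\Rightarrow x = \frac{\pm 1}{\kappa\sqrt{\frac{1}{(t-1)^2}-1}}.
\end{aligned} \]

In what follows we will use the following notations

\[ \begin{aligned}
x_+ &= \frac{1}{\kappa\sqrt{\frac{1}{(t-1)^2}-1}} = \frac{\sqrt{(t-1)^2}}{\kappa\sqrt{1-(t-1)^2}} \\
x_- &= \frac{-1}{\kappa\sqrt{\frac{1}{(t-1)^2}-1}} = \frac{-\sqrt{(t-1)^2}}{\kappa\sqrt{1-(t-1)^2}} \\
\varphi_+(t) &= x_+t + \frac{1}{\kappa}\left(1 - \kappa x_+ - \sqrt{1+\kappa^2x_+^2}\right) \\
\varphi_-(t) &= x_-t + \frac{1}{\kappa}\left(1 - \kappa x_- - \sqrt{1+\kappa^2x_-^2}\right).
\end{aligned} \]

We remark that if t = 1 then x+ = x− = 0. Hence for t = 1 we have that φ(1) = φ+(1) = φ−(1) = (1/κ)(1 − 1) = 0.
The second order condition follows from a straightforward calculation

\[ \frac{d}{dx}\left(t - \frac{1}{\kappa}\left(\kappa + \frac{\kappa^2x}{\sqrt{1+\kappa^2x^2}}\right)\right) = \frac{d}{dx}\left(\frac{-\kappa x}{\sqrt{1+\kappa^2x^2}}\right) = \frac{-\kappa}{(1+\kappa^2x^2)^{\frac{3}{2}}} < 0. \]

To calculate the divergence we need to calculate

\[ \varphi(t) = \max\left(\varphi_+(t), \varphi_-(t)\right). \tag{B.14} \]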

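\[ \begin{aligned}
\varphi_+(t) &= x_+t + \frac{1}{\kappa}\left(1 - \kappa x_+ - \sqrt{1+\kappa^2x_+^2}\right) \\
&= \frac{\sqrt{(t-1)^2}\,t}{\kappa\sqrt{1-(t-1)^2}} + \frac{1}{\kappa}\left(1 - \frac{\kappa\sqrt{(t-1)^2}}{\kappa\sqrt{1-(t-1)^2}} - \sqrt{1+\frac{(t-1)^2}{1-(t-1)^2}}\right) \\
&= \frac{\sqrt{(t-1)^2}\,t}{\kappa\sqrt{1-(t-1)^2}} + \frac{1}{\kappa}\left(1 - \frac{\sqrt{(t-1)^2}}{\sqrt{1-(t-1)^2}} - \frac{1}{\sqrt{1-(t-1)^2}}\right) \\
&= \frac{1}{\kappa}\left(\frac{\sqrt{(t-1)^2}\,t - \sqrt{(t-1)^2} - 1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa} \\
&= \frac{1}{\kappa}\left(\frac{\sqrt{(t-1)^2}\,(t-1) - 1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa}
\end{aligned} \]

We need to distinguish two cases: when t > 1 we have that √((t − 1)²) = (t − 1) and when t < 1 we have √((t − 1)²) = −(t − 1).
Hence if t > 1 we have

\[ \begin{aligned}
\varphi_+(t) &= \frac{1}{\kappa}\left(\frac{(t-1)^2-1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa} \\
&= \frac{-1}{\kappa}\left(\frac{1-(t-1)^2}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa} \\
&= \frac{-1}{\kappa}\sqrt{1-(t-1)^2} + \frac{1}{\kappa}.
\end{aligned} \]

And if t < 1 we have

\[ \varphi_+(t) = \frac{1}{\kappa}\left(\frac{-(t-1)^2-1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa}. \]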

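We will now calculate φ−(t).

\[ \begin{aligned}
\varphi_-(t) &= x_-t + \frac{1}{\kappa}\left(1 - \kappa x_- - \sqrt{1+\kappa^2x_-^2}\right) \\
&= \frac{-\sqrt{(t-1)^2}\,t}{\kappa\sqrt{1-(t-1)^2}} + \frac{1}{\kappa}\left(1 + \frac{\kappa\sqrt{(t-1)^2}}{\kappa\sqrt{1-(t-1)^2}} - \sqrt{1+\frac{(t-1)^2}{1-(t-1)^2}}\right) \\
&= \frac{-\sqrt{(t-1)^2}\,t}{\kappa\sqrt{1-(t-1)^2}} + \frac{1}{\kappa}\left(1 + \frac{\sqrt{(t-1)^2}}{\sqrt{1-(t-1)^2}} - \frac{1}{\sqrt{1-(t-1)^2}}\right) \\
&= \frac{1}{\kappa}\left(\frac{-\sqrt{(t-1)^2}\,t + \sqrt{(t-1)^2} - 1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa} \\
&= \frac{1}{\kappa}\left(\frac{-\sqrt{(t-1)^2}\,(t-1) - 1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa}
\end{aligned} \]

We again distinguish two cases. For t > 1 we find that

\[ \varphi_-(t) = \frac{1}{\kappa}\left(\frac{-(t-1)^2-1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa}, \]

and for t < 1 we have

\[ \begin{aligned}
\varphi_-(t) &= \frac{1}{\kappa}\left(\frac{(t-1)^2-1}{\sqrt{1-(t-1)^2}}\right) + \frac{1}{\kappa} \\
&= \frac{-1}{\kappa}\sqrt{1-(t-1)^2} + \frac{1}{\kappa}.
\end{aligned} \]

If t > 1 we have that φ+(t) ≥ φ−(t) and for t < 1 we find that φ−(t) ≥ φ+(t). We also have that

\[ \lim_{t\to 2} \left(\frac{-1}{\kappa}\sqrt{1-(t-1)^2} + \frac{1}{\kappa}\right) = \frac{1}{\kappa}. \]

We conclude that

\[ \varphi(t) = \begin{cases} -\frac{1}{\kappa}\sqrt{1-(t-1)^2} + \frac{1}{\kappa} & \text{if } 0 \le t \le 2 \\ +\infty & \text{if } t > 2. \end{cases} \]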

Bibliography

[1] C. Acerbi, D. Tasche, On the coherence of expected shortfall, Journal of Bankingand Finance, Vol. 26, Issue 7, 2002, 1487-1503.

[2] C. Acerbi, Spectral measures of risk: A coherent representation of subjectiverisk aversion, Journal of Banking and Finance, Vol. 26, 2002, 1505-1518.

[3] A. Ahmadi-Javid, Entropic Value-at-Risk: A new coherent risk measure, JOptim Theory Appl, Vol. 155,Issue 3, 2012, 1105-1123.

[4] A. Ben-Tal, A. Ben-Israel, A recourse certainty equivalent for decisions underuncertainty., Annals of Operations Research, Vol. 30, Issue 1, 1991, 1-44.

[5] A. Ben-Tal,M. Teboulle, An old-new concept of convex risk measures: the op-timised certainty equivalent., Mathematical Finance, Vol. 17, Issue 3, 2007,449-476.

[6] J.M. Borwein, A.S. Lewis, Partially finite convex programming, Part I: Quasirelative interiors and duality theory, Mathematical Programming, Vol. 57,1992, 15-48.

[7] J.M. Borwein, D.R. Luke, Duality an convex programming, Handbook of Math-ematical Methods in Imaging, edited by Scherzer and Otmar, Springer, 1992,229-270.

[8] A. Chen, A. Pelsser, M. Vellekoop, Modelling non-monotone risk aversion usingSAHARA utility functions, Journal of Economic Theory, Vol. 146, Issue 5,2011, 2075-2092.

[9] E.R. Csetnek, Overcoming the failure of classical generalized interior-point reg-ularity conditions in convex optimisation., Logos verlag Berlin GmbH, 2010.

[10] S. Drapeau, M. Kupper, A. Papapantoleon, A Fourier Approach to the Com-putation of CV@R and Optimized Certainty Equivalents, Journal of Risk, Vol.16, 2013, 3-29.

[11] H. Follmer, A. Schied, Stochastic finance: An introduction in discrete time,De Gruyter, 2010.

[12] H. Follmer, A. Schied, Convex and coherent risk measures, unpublished paper.

[13] H. Follmer, A. Schied Convex measures of risk and trading constraints., Fi-nance and Stochastics, Vol. 6, Issue 4, 2002, 429-447.

119

[14] G. C. Goodwin, M. M. Seron, J. A. de Dona , Constrained Control and Es-timation: An Optimisation Approach, Springer Science and Business Media,2006.

[15] V. Henderson, D. Hobson, Utility indifference Pricing: an Overview, Volumeon Indifference Pricing, 2004.

[16] O. Hernandez-Lerma, J. B. Lasserre Markov Chains and Invariant Probabili-ties, Birkhauser, 2012.

[17] V.Jose,R.Nau,R.Winkler, Scoring Rules, Generalized Entropy and Utilitymaximisation, Operations Research, Vol. 56, Issue 5, 2008, 1146-1157.

[18] T. Knispel, H. Follmer, Convex Risk Measures: Basic Facts, Law-invarianceand beyond, Asymptotics for Large Portfolios, Handbook of the Fundamentalsof Financial Decision Making: In 2 Parts, edited by L. MacLean, W. Ziemba,World scientific, 2013, 507-555.

[19] H. Levy, Y. Kroll, Ordering Uncertain Options with Borrowing and Lending,The Journal of Finance, Vol. 33, Issue 2, 1978, 553-574.

[20] A.Mas-Collell,M. Whinston,J. Green, Microeconomic theory, Oxford Univer-sity Press, 1995.

[21] K. Martin, C.T. Ryan, M. Stern, The Slater Conundrum: Duality and Pricingin Infinite Dimensional Optimization., SIAM. J. OPtim., Vol. 26, Issue 1, 2016,111-138.

[22] R.Nau, R.Jose, R. Winkler, Duality between maximization of expected utilityand minimization of Relative entropy when probabilities are imprecise, Int.Symp. on imprecise probability, 2009.

[23] J. Pontstein, Approaches in the theory of optimisation, Cambridge UniversityPress, 1980.

[24] R. Raskin, M. Cochran, Interpretations and transformations of scale forthe pratt-arrow absolute risk aversion coefficient: implications for generalizedstochastic dominance, Western Journal of Agricultural Economics Vol.11, Issue2, 1986, 204-210.

[25] R.T. Rockafellar, Convex Analysis, Princeton University press, 1970.

[26] L. Rogers, D. Williams Diffusions, markov processes and martingales: Volume1: Foundations, Cambridge University Press, 2000.

[27] W.Rudin, Principles of Mathematical Analysis , Third edition, Mc.Graw-Hill,1964.

[28] Y. Syau, A note on convex functions,International J. Math. and Math. Sci.,Vol.22, 1998, 525-534.

120

[29] Y.Yamai, T. Yoshiba, Comparative analyses of expected shortfall and valueat risk: expected utility maximisation and tail risk., Monetary and EconomicStudies, 2002, 95-116.

121
