



A weak dynamic programming principle

for zero-sum stochastic differential games

Pedro Miguel Almeida Serra Costa Vitória

Dissertation for the degree of Master in Mathematics and Applications

Jury
President: Professora Doutora Ana Bela Cruzeiro
Supervisor: Professor Doutor Diogo Aguiar Gomes
Member: Professor Doutor António Atalaia Serra
Member: Doutor Gabriele Terrone

June 2010


“I was born not knowing and have had only a little time to change that here and there.”

Richard Feynman



Acknowledgments

First, I would like to thank Prof. Nizar Touzi, who initially proposed the problem that gave rise to this thesis and supervised this work in its early stage.

I am also grateful to my adviser, Prof. Diogo Gomes, who took this project under his supervision. For his guidance and numerous insightful comments I express my deep gratitude.

This project started at the École Polytechnique, where I had the pleasure of staying for a whole semester. I am grateful for their hospitality.

My parents deserve a special acknowledgment, for all their love and unconditional support.

Last but not least, I would like to thank the numerous people (friends, colleagues and mentors) who, through their friendship, influence and encouragement, contributed to these last five fruitful and joyful years of my personal and academic development.



Resumo

In this work we extend a weak version of the dynamic programming principle, first proved in [1] for optimal control problems, to zero-sum stochastic differential games. In this way, we are able to derive the Hamilton-Jacobi-Bellman-Isaacs equation when one of the players is allowed to use strategies taking values in an unbounded set.

Keywords: stochastic differential games, value function, dynamic programming principle, viscosity solutions.


Abstract

We extend a weak version of the dynamic programming principle, first proven in [1] for stochastic control problems, to the context of zero-sum stochastic differential games. By doing so we are able to derive the Hamilton-Jacobi-Bellman-Isaacs equation when one of the players is allowed to use strategies taking values in an unbounded set.

Keywords: stochastic differential games, value function, dynamic programming principle, viscosity solutions.


Contents

Acknowledgments
Resumo
Abstract
Preface

1 Deterministic optimal control
1.1 Introduction
1.2 The controlled dynamical system
1.3 Value function
1.4 Dynamic programming principle
1.5 Hamilton-Jacobi-Bellman equation

2 Stochastic optimal control
2.1 Introduction
2.2 The controlled Markov diffusion
2.3 Value function
2.4 Dynamic programming principle
2.5 Hamilton-Jacobi-Bellman equation
2.6 Verification Theorem
2.7 Merton's optimal portfolio

3 Deterministic differential games
3.1 Introduction
3.2 The controlled dynamical system
3.3 Lower and upper values
3.4 Dynamic programming principle
3.5 Hamilton-Jacobi-Bellman-Isaacs equation

4 Stochastic differential games
4.1 Introduction
4.2 Preliminaries
4.3 The Markovian scenario
4.3.1 State dynamics
4.3.2 Admissible controls
4.3.3 Terminal reward
4.3.4 Strategies
4.3.5 Lower and upper values
4.3.6 Properties of strategies
4.4 Properties of the value function
4.4.1 Non-randomness
4.4.2 Growth rate
4.4.3 Continuity in the space variable
4.5 Weak dynamic programming principle
4.5.1 Optimal stochastic control as a particular case
4.5.2 Continuity of the reward function
4.6 Hamilton-Jacobi-Bellman-Isaacs equation
4.6.1 Uniqueness
4.7 Merton's optimal portfolio: worst-case approach
4.8 Conclusions and further research
4.9 Notation

A Viscosity solutions
A.1 Notion of viscosity solution
A.2 Uniqueness for the Dirichlet problem
A.3 Discontinuous viscosity solutions
A.4 Parabolic equations

B Stochastic calculus
B.1 Preliminaries
B.1.1 Brownian motion and filtration
B.1.2 Stopping times and progressive measurability
B.1.3 Martingales and local martingales
B.2 Stochastic integral and Itô's formula
B.2.1 Itô processes
B.2.2 Itô's formula
B.3 Martingale representation
B.4 Girsanov's Theorem
B.5 Stochastic differential equations
B.6 Controlled diffusions

C A measure result


Preface

In [1], Bouchard and Touzi propose a weak version of the dynamic programming principle in the context of stochastic optimal control. Their objective was to avoid technical difficulties related to the measurable selection argument. By doing this they were able to derive the dynamic programming equation without requiring the value function to be measurable. One question that arises naturally is how to extend this approach to stochastic differential games.

Zero-sum stochastic differential games were studied rigorously for the first time by Fleming and Souganidis in [2]. These problems are usually studied in a setting where strong assumptions imply that the value function is continuous, hence measurable. Considering a weak version of the dynamic programming principle gives us the opportunity to study these problems in a more general setting where the value function does not a priori have much regularity.

This thesis tackles the problem of extending the weak dynamic programming principle of [1] to the context of stochastic differential games. This is done in Chapter 4. For the convenience of the reader, we develop in the first Chapters brief introductions to the theories of deterministic optimal control, stochastic optimal control and deterministic two-person zero-sum differential games. In addition, two appendices at the end briefly recall and develop some basic notions and results on second-order viscosity solutions and stochastic calculus.


Chapter 1

Deterministic optimal control

In this Chapter we consider the deterministic optimal control problem in the finite horizon setting. Our objective is to establish the dynamic programming principle and derive from it the Hamilton-Jacobi-Bellman equation.

We aim to stress the main ideas and arguments. Thus, we consider the Mayer problem (no running cost) with compact-valued controls and bounded value function. For a detailed exposition we refer the reader to [3].

1.1 Introduction

We start by outlining in a brief and informal way the main ideas of this Chapter. Consider a state variable, $X$, driven by a control $\nu$ through the following nonlinear system:
$$dX(s) = \mu(s, X(s); \nu(s))\,ds,$$
with initial condition $X(t) = x$. Since this variable $X$ depends on $t, x, \nu$, we use the notation $X := X^{\nu}_{t,x}$.

We are interested in the terminal value of this variable, $X^{\nu}_{t,x}(T)$. More precisely, we are interested in the quantity
$$J(t, x; \nu) := f\big(X^{\nu}_{t,x}(T)\big),$$
which we want to maximize over all admissible controls. Thus we want to determine the value function
$$v(t, x) := \sup_{\nu} J(t, x; \nu).$$

We will use a dynamic programming approach for this optimization problem. To establish heuristically the dynamic programming principle we assume that there is an optimal control $\nu^*$, i.e., there exists $\nu^*$ such that, for all $(t, x)$,
$$v(t, x) = J(t, x; \nu^*).$$
Then, given $\tau \geq t$, we have, by the flow property for ordinary differential equations, that
$$v(t, x) = f\big(X^{\nu^*}_{t,x}(T)\big) = f\big(X^{\nu^*}_{\tau,\, X^{\nu^*}_{t,x}(\tau)}(T)\big) = J\big(\tau, X^{\nu^*}_{t,x}(\tau); \nu^*\big) = v\big(\tau, X^{\nu^*}_{t,x}(\tau)\big).$$


If the existence of an optimal control is not assumed, then we should expect the previous equality to be replaced by the so-called dynamic programming principle:
$$v(t, x) = \sup_{\nu} v\big(\tau, X^{\nu}_{t,x}(\tau)\big).$$

The dynamic programming principle gives us important information on the local behavior of the value function. Letting $\tau \to t$ we are able to derive an infinitesimal version of it, the Hamilton-Jacobi-Bellman equation. Indeed, if we assume that $v$ is $C^1$, then we have, by the chain rule, that
$$0 = \sup_{\nu} \frac{v\big(\tau, X^{\nu}_{t,x}(\tau)\big) - v(t, x)}{\tau - t} \;\xrightarrow[\tau \to t]{}\; \sup_{u} \big(\partial_t v(t, x) + \langle \mu(t, x; u), Dv(t, x)\rangle\big).$$
Thus, we should expect $v$ to be a solution of the following Hamilton-Jacobi-Bellman (HJB) equation:
$$\Big(-\partial_t v + \inf_{u} -\langle \mu(\cdot; u), Dv\rangle\Big)(t, x) = 0.$$

Since in many applications the value function lacks the regularity needed to be a classical solution of the previous equation, we need a weak notion of solution for such an equation. As it turns out, the theory of viscosity solutions provides the adequate framework for this derivation. In fact, as we will see, the value function is a viscosity solution of the HJB equation.
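To make the dynamic programming recursion concrete, here is a minimal numerical sketch of ours (not part of the original text): it discretizes a one-dimensional Mayer problem on a grid and applies the backward recursion $v(t, x) = \sup_{\nu} v(\tau, X^{\nu}_{t,x}(\tau))$ with one Euler step per stage. The dynamics $\mu(s, x; u) = ux$, the control set $U = [-1, 1]$, the payoff $f$ and all grid parameters are illustrative assumptions.

```python
import numpy as np

# Illustrative data (our assumptions): mu(s, x; u) = u * x,
# control set U = [-1, 1], payoff f(x) = -x**2.
T, n_steps = 1.0, 50
dt = T / n_steps
xs = np.linspace(-2.0, 2.0, 201)      # spatial grid
us = np.linspace(-1.0, 1.0, 21)       # discretized control set U

v = -xs**2                            # terminal condition v(T, .) = f
for _ in range(n_steps):              # march backward in time
    # one explicit Euler step of the dynamics per control and grid point,
    # then the supremum over controls: v(t, x) = sup_u v(t + dt, x + dt * mu)
    x_next = xs[:, None] + dt * us[None, :] * xs[:, None]
    v_next = np.interp(x_next.ravel(), xs, v).reshape(x_next.shape)
    v = v_next.max(axis=1)

print("approximate v(0, 1):", np.interp(1.0, xs, v))   # close to -exp(-2)
```

Starting from $x = 1$, the optimal control here steers the state toward $0$ (take $u = -1$), so the scheme should return a value near $-e^{-2}$.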

1.2 The controlled dynamical system

We consider a nonlinear system in $\mathbb{R}^d$, written in differential form as
$$dX^{\nu}_{t,x}(s) = \mu\big(s, X^{\nu}_{t,x}(s); \nu(s)\big)\,ds, \qquad X^{\nu}_{t,x}(t) = x, \tag{1.1}$$

where $\nu$ is a control function in a set $\mathcal{U}$ to be specified later, $(t, x)$ are the initial conditions and $\mu : [0, T] \times \mathbb{R}^d \times U \to \mathbb{R}^d$ is a continuous function which is Lipschitz continuous in the space variable, that is,
$$|\mu(t, x; u) - \mu(t, y; u)| \leq K|x - y|,$$
for some constant $K$.

We call $[0, T] \times \mathbb{R}^d$ the state space and we denote it by $\mathbb{S}$.

The space of admissible controls, $\mathcal{U}$, is defined as
$$\mathcal{U} := \{\psi : [0, T] \to U : \psi \text{ measurable}\},$$
where $U \subset \mathbb{R}^M$ is a compact set.

Under the assumptions made on $\mu$, the existence and uniqueness of a solution $X^{\nu}_{t,x}$ of (1.1) follows from the standard theory of ordinary differential equations, for each $\nu$. Furthermore there is continuity with respect to the initial conditions $(t, x)$, uniformly in $t$ and $\nu$. More precisely, for each $x$, there is a constant, $C_x$, depending only on $K, T, x$, such that for all $\nu \in \mathcal{U}$, $t \in [0, T]$, $t' \geq t$, $s \geq t'$, $x' \in B_1(x)$,
$$\big|X^{\nu}_{t',x'}(s) - X^{\nu}_{t,x}(s)\big| \leq C_x\big(|x - x'| + |t - t'|^{\frac{1}{2}}\big). \tag{1.2}$$

This estimate is a particular case of Lemma 148, when there is no diffusion.

1.3 Value function

In this Section we characterize our problem. We consider a terminal reward which we want to maximize,
$$J(t, x; \nu) := f\big(X^{\nu}_{t,x}(T)\big).$$


The payoff function $f$ is assumed to be bounded and continuous. The problem we consider is to determine the value function given by
$$v(t, x) := \sup_{\nu \in \mathcal{U}} J(t, x; \nu).$$

The following regularity result for v is easy to obtain.

Proposition 1. $v$ is bounded and continuous in $[0, T] \times \mathbb{R}^d$.

Proof. The boundedness of $v$ follows directly from the boundedness of $f$. To prove continuity we consider $\nu^\varepsilon$ such that
$$v(t, x) \leq J(t, x; \nu^\varepsilon) + \varepsilon.$$
Then
$$v(t, x) - v(t', x') \leq J(t, x; \nu^\varepsilon) - J(t', x'; \nu^\varepsilon) + \varepsilon = f\big(X^{\nu^\varepsilon}_{t,x}(T)\big) - f\big(X^{\nu^\varepsilon}_{t',x'}(T)\big) + \varepsilon \to \varepsilon,$$
where the convergence, as $(t', x') \to (t, x)$, follows from the continuity of $f$ and from (1.2). Thus, for $(t', x')$ sufficiently close to $(t, x)$, we have
$$v(t, x) - v(t', x') \leq 2\varepsilon.$$
The inequality
$$v(t, x) - v(t', x') \geq -2\varepsilon$$
is obtained in an analogous way.

1.4 Dynamic programming principle

In this Section we establish the dynamic programming principle for the value function $v$. The essential ingredient is the following property:

$$J(t, x; \nu_1 \oplus_\tau \nu_2) = J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big), \tag{1.3}$$
where
$$(\nu_1 \oplus_\tau \nu_2)(s) := \nu_1(s)\mathbf{1}_{[t,\tau]}(s) + \nu_2(s)\mathbf{1}_{(\tau,T]}(s)$$
denotes the concatenation of the controls $\nu_1, \nu_2$ at time $\tau \geq t$.

This property follows directly from the flow property for dynamical systems and a simple computation:
$$J(t, x; \nu_1 \oplus_\tau \nu_2) = f\big(X^{\nu_1 \oplus_\tau \nu_2}_{t,x}(T)\big) = f\big(X^{\nu_2}_{\tau,\, X^{\nu_1}_{t,x}(\tau)}(T)\big) = J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big).$$
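As a small illustration (our own sketch, with controls represented as functions of time; nothing here is from the thesis), the concatenation $\nu_1 \oplus_\tau \nu_2$ can be written directly:

```python
def concatenate(nu1, nu2, tau):
    """Concatenation of two controls at time tau:
    follow nu1 on [t, tau] and nu2 on (tau, T]."""
    return lambda s: nu1(s) if s <= tau else nu2(s)

# usage: constant control 1 until tau = 0.5, then constant control -1
nu = concatenate(lambda s: 1.0, lambda s: -1.0, 0.5)
print(nu(0.2), nu(0.7))  # -> 1.0 -1.0
```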

Theorem 2 (Dynamic programming principle). For all $x \in \mathbb{R}^d$, $t \in [0, T]$ and $\tau \in [t, T]$ the following holds:
$$v(t, x) = \sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big).$$


Proof. We prove the two inequalities separately.

Step 1: $v(t, x) \leq \sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big)$.

By (1.3) we have
$$J(t, x; \nu) = J\big(\tau, X^{\nu}_{t,x}(\tau); \nu\big) \leq \sup_{\nu_2 \in \mathcal{U}} J\big(\tau, X^{\nu}_{t,x}(\tau); \nu_2\big) = v\big(\tau, X^{\nu}_{t,x}(\tau)\big).$$
Taking the supremum on both sides of the inequality yields
$$v(t, x) \leq \sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big).$$

Step 2: $v(t, x) \geq \sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big)$.

Fix $\varepsilon > 0$ and consider $\nu_1, \nu_2 \in \mathcal{U}$ such that
$$\sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big) \leq v\big(\tau, X^{\nu_1}_{t,x}(\tau)\big) + \varepsilon, \qquad v\big(\tau, X^{\nu_1}_{t,x}(\tau)\big) \leq J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big) + \varepsilon.$$
Then
$$\sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big) \leq v\big(\tau, X^{\nu_1}_{t,x}(\tau)\big) + \varepsilon \leq J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big) + 2\varepsilon = J(t, x; \nu_1 \oplus_\tau \nu_2) + 2\varepsilon \leq v(t, x) + 2\varepsilon.$$
Since $\varepsilon$ is arbitrary we conclude that
$$\sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{t,x}(\tau)\big) \leq v(t, x).$$

1.5 Hamilton-Jacobi-Bellman equation

In this Section we prove that the dynamic programming principle implies that $v$ is a viscosity solution of the Hamilton-Jacobi-Bellman equation.

We introduce the Hamiltonian
$$H(t, x, p) := \inf_{u \in U} H^u(t, x, p), \quad \text{where} \quad H^u(t, x, p) := -\langle p, \mu(t, x; u)\rangle.$$

Notice that $H^u(t, x, p)$ is continuous in $t, x, p, u$. Thus, due to the compactness of $U$, $H$ is also continuous.

Theorem 3. The value function $v$ is a viscosity solution of
$$(-\partial_t v + H(\cdot, Dv))(t, x) = 0, \qquad (t, x) \in\, ]0, T[\, \times \mathbb{R}^d. \tag{1.4}$$


Proof. We begin by proving that $v$ is a viscosity supersolution of (1.4). Consider $\phi \in C^1(\mathbb{S})$ and $(\bar t, \bar x) \in \arg\min(v - \phi)$. By Remark 101, in the Appendix, we can suppose that $(\bar t, \bar x)$ is a strict global minimizer of $v - \phi$. Thus, for all $(t, x) \in \mathbb{S}$,
$$(v - \phi)(t, x) \geq (v - \phi)(\bar t, \bar x),$$
that is,
$$\phi(\bar t, \bar x) - \phi(t, x) \geq v(\bar t, \bar x) - v(t, x).$$

Fix $u \in U$ and consider the constant control $\nu(s) := u$. Then, for $\tau \geq \bar t$,
$$\phi(\bar t, \bar x) - \phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \geq v(\bar t, \bar x) - v\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big).$$
By the dynamic programming principle we know that
$$v(\bar t, \bar x) \geq v\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big),$$
hence
$$\phi(\bar t, \bar x) - \phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \geq 0. \tag{1.5}$$
Dividing both sides of (1.5) by $\tau - \bar t$ and letting $\tau \to \bar t$ we conclude that
$$\big(-\partial_t\phi - \langle D\phi, \mu(\cdot, u)\rangle\big)(\bar t, \bar x) = \big(-\partial_t\phi + H^u(\cdot, D\phi)\big)(\bar t, \bar x) \geq 0,$$
where we have used the differentiability of $\phi$. Since $u \in U$ is arbitrary we conclude that
$$\big(-\partial_t\phi + H(\cdot, D\phi)\big)(\bar t, \bar x) \geq 0.$$
Thus $v$ is a viscosity supersolution of (1.4).

To check that $v$ is a subsolution we consider $\phi \in C^1(\mathbb{S})$ and $(\bar t, \bar x) \in \arg\max(v - \phi)$. By Remark 101, in the Appendix, we can suppose that $(\bar t, \bar x)$ is a strict global maximizer of $v - \phi$. Thus, for all $(t, x) \in \mathbb{S}$,
$$(v - \phi)(t, x) \leq (v - \phi)(\bar t, \bar x),$$
that is,
$$\phi(\bar t, \bar x) - \phi(t, x) \leq v(\bar t, \bar x) - v(t, x). \tag{1.6}$$
We suppose by contradiction that
$$\big(-\partial_t\phi + H(\cdot, D\phi)\big)(\bar t, \bar x) \geq 2\delta,$$
for some $\delta > 0$. Then, for all $u \in U$,
$$\big(-\partial_t\phi + H^u(\cdot, D\phi)\big)(\bar t, \bar x) \geq 2\delta.$$
Since $H^u(t, x, p)$ is continuous with respect to $u, t, x, p$ and $U$ is compact, we deduce that there exists $R > 0$ such that, for all $u \in U$,
$$\big(-\partial_t\phi + H^u(\cdot, D\phi)\big)(t, x) \geq \delta, \quad \text{for all } (t, x) \in B_R(\bar t, \bar x).$$
By (1.2), for $\tau$ sufficiently close to $\bar t$, we have
$$\big|\big(s, X^{\nu}_{\bar t,\bar x}(s)\big) - (\bar t, \bar x)\big| < R,$$
for all $\nu \in \mathcal{U}$ and all $s \in [\bar t, \tau]$. Since $\phi \in C^1(\mathbb{S})$, we have
$$\phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) - \phi(\bar t, \bar x) = \int_{\bar t}^{\tau} \big(\partial_t\phi - H^{\nu(s)}(\cdot, D\phi)\big)\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,ds \leq -(\tau - \bar t)\delta.$$
By (1.6) we then conclude that
$$v(\bar t, \bar x) - v\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \geq \phi(\bar t, \bar x) - \phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \geq (\tau - \bar t)\delta.$$
Taking the supremum in $\nu$ we conclude that
$$v(\bar t, \bar x) \geq \sup_{\nu \in \mathcal{U}} v\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) + (\tau - \bar t)\delta,$$
which contradicts the dynamic programming principle.

We remark that $v$ also satisfies the terminal condition $v(T, x) = f(x)$. This provides a characterization of the value function as a solution of a partial differential equation. For the characterization to be complete, uniqueness of solutions of the HJB equation must be established. It is indeed possible to prove uniqueness by the standard techniques of first-order viscosity solutions. We will not carry out this study here; instead we refer the reader to [3].


Chapter 2

Stochastic optimal control

In this Chapter we turn our attention to stochastic optimal control problems. The results are analogous to those of the previous Chapter. A modern reference on the subject is the well-known book by Fleming and Soner, [4].

2.1 Introduction

Stochastic optimal control is, in many points, similar to its deterministic counterpart. Thus the procedure we will use to study this problem will be analogous to the previous one. In this introduction we outline the main differences.

In the stochastic scenario we consider a state variable, $X^{\nu}_{t,x}$, which is a stochastic process satisfying a stochastic differential equation,
$$dX^{\nu}_{t,x}(s) = \mu\big(s, X^{\nu}_{t,x}(s); \nu_s\big)\,ds + \sigma\big(s, X^{\nu}_{t,x}(s); \nu_s\big)\,dW_s,$$
with initial condition $X^{\nu}_{t,x}(t) = x$. This state variable is driven by a control $\nu$, which must be a progressively measurable stochastic process.

As in the deterministic case, we have a terminal payoff, $f\big(X^{\nu}_{t,x}(T)\big)$, which we want to maximize. One difference is that, in the stochastic setting, when the choice of control is to be made, at time $t$, we cannot predict the terminal payoff. Thus we maximize instead its conditional expectation,
$$J(t, x; \nu) := \mathbb{E}\Big[f\big(X^{\nu}_{t,x}(T)\big) \,\Big|\, \mathcal{F}_t\Big].$$
We call this random variable the terminal reward. The stochastic optimal control problem is then to determine the value function:
$$V(t, x) := \operatorname*{esssup}_{\nu} J(t, x; \nu).$$

Even though, a priori, $V(t, x)$ is a random variable, we will see that it is, in fact, a constant random variable. Thus we can think of $V$ as a function $V : [0, T] \times \mathbb{R}^d \to \mathbb{R}$.

As in the deterministic scenario, it is easy to derive heuristically the dynamic programming principle for this problem. It takes the following form:
$$V(t, x) = \operatorname*{esssup}_{\nu} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big].$$

Though the heuristic derivation is simple and the analogies with the deterministic case are many, the stochastic version of the dynamic programming principle encloses subtle technical difficulties related to measurability issues. Indeed, for the statement of the principle to make sense, we must first prove that $V$ is measurable. At this point, either we work in a restricted setting where strong assumptions easily imply that $V$ is measurable (typically, even continuous), or we have to use deep measurable selection arguments to prove it. In this Chapter we will take the first route. For the reader interested in following the second, we refer to [5], where that approach is used in the discrete time scenario.

Recently, Bouchard and Touzi proposed in [1] an alternative approach, where a weak version of the dynamic programming principle is used. Following this approach we avoid the measurability issues and are still able to derive the Hamilton-Jacobi-Bellman equation, which is the ultimate objective of the whole dynamic programming approach. In Chapter 4 we will use this approach in the context of stochastic differential games.

With the dynamic programming principle we are able to derive the Hamilton-Jacobi-Bellman equation (HJB). Again we must take a limit $\tau \to t$, in the context of the dominated convergence Theorem, and use the stochastic counterpart of the chain rule, Itô's formula. Indeed, under the assumption that $V$ is $C^{1,2}$, we have by Itô's formula that
$$0 = \operatorname*{esssup}_{\nu} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) - V(t, x) \,\Big|\, \mathcal{F}_t\Big] = \operatorname*{esssup}_{\nu} \mathbb{E}\Big[\int_t^{\tau} \Big(\partial_t V + \langle \mu(\cdot; \nu_s), DV\rangle + \tfrac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\cdot; \nu_s)D^2V\big)\Big)\big(s, X^{\nu}_{t,x}(s)\big)\,ds \,\Big|\, \mathcal{F}_t\Big] + \mathbb{E}\Big[\int_t^{\tau} (DV\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{t,x}(s)\big)\,dW_s \,\Big|\, \mathcal{F}_t\Big]. \tag{2.1}$$
If $X^{\nu}_{t,x}(s)$ remains bounded for $s \in [t, \tau]$, we then have, by the martingale properties of the stochastic integral, that
$$\mathbb{E}\Big[\int_t^{\tau} (DV\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{t,x}(s)\big)\,dW_s \,\Big|\, \mathcal{F}_t\Big] = 0.$$
In that case, we can divide both sides of (2.1) by $\tau - t$ and take the limit as $\tau \to t$ to conclude that
$$\sup_{u} \Big(\partial_t V + \langle \mu(\cdot; u), DV\rangle + \tfrac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\cdot; u)D^2V\big)\Big)(t, x) = 0.$$
Thus we should expect $V$ to be a solution of the HJB equation:
$$\Big(-\partial_t V + \inf_{u} \Big(-\langle \mu(\cdot; u), DV\rangle - \tfrac{1}{2}\mathrm{Tr}\big(\sigma\sigma^T(\cdot; u)D^2V\big)\Big)\Big)(t, x) = 0.$$
Notice that this is a parabolic second-order partial differential equation. Like in the first-order case, the theory of viscosity solutions provides the adequate framework for the study of such equations.

The HJB equation also provides us with a procedure to find optimal strategies. This procedure is explored in Section 2.6 and applied in Section 2.7.
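Before proceeding, a small numerical aside (our own sketch, not from the thesis): the reward $J(t, x; \nu) = \mathbb{E}[f(X^{\nu}_{t,x}(T))\,|\,\mathcal{F}_t]$ can be estimated by Monte Carlo using an Euler-Maruyama discretization of the controlled SDE. The coefficients $\mu(s, x; u) = ux$ and $\sigma(s, x; u) = 0.2x$, the constant control, and the payoff below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative coefficients (assumptions): mu(s, x; u) = u*x, sigma(s, x; u) = 0.2*x
def mu(s, x, u): return u * x
def sigma(s, x, u): return 0.2 * x

def estimate_J(t, x, u_const, T=1.0, n_steps=100, n_paths=10_000):
    """Monte Carlo / Euler-Maruyama estimate of E[f(X(T))] for a constant control."""
    dt = (T - t) / n_steps
    X = np.full(n_paths, x, dtype=float)
    s = t
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X + mu(s, X, u_const) * dt + sigma(s, X, u_const) * dW
        s += dt
    f = lambda y: np.minimum(y, 2.0)   # a truncated payoff, for illustration
    return f(X).mean()

print(estimate_J(0.0, 1.0, u_const=0.1))
```

Sweeping `u_const` over a grid and taking the largest estimate gives a crude lower bound for the value function at $(t, x)$.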

2.2 The controlled Markov diffusion

We fix a time horizon, $T$, and consider the classical Wiener space, $(\Omega, \mathcal{F}, \mathbb{P})$, endowed with the standard $N$-dimensional Brownian motion, $W$, and the natural filtration induced by $W$, $\mathbb{F} = \{\mathcal{F}_s : 0 \leq s \leq T\}$. For further details see Section 4.2.

As in the previous Chapter we denote by $\mathbb{S} := [0, T] \times \mathbb{R}^d$ the state space. We consider a control space, $\mathcal{U}$, consisting of the progressively measurable processes with values in $U$, where $U \subset \mathbb{R}^M$ is a compact set.

Let $\mu : \mathbb{S} \times U \to \mathbb{R}^d$ and $\sigma : \mathbb{S} \times U \to \mathbb{R}^{d \times N}$ be continuous functions such that
$$|\mu(t, x; u) - \mu(t, y; u)| + |\sigma(t, x; u) - \sigma(t, y; u)| \leq K|x - y|, \qquad |\mu(t, x; u)| + |\sigma(t, x; u)| \leq K(1 + |x|).$$


The dynamics of the state variable is given by a stochastic differential equation,
$$dX^{\nu}_{t,x}(s) = \mu\big(s, X^{\nu}_{t,x}(s); \nu_s\big)\,ds + \sigma\big(s, X^{\nu}_{t,x}(s); \nu_s\big)\,dW_s, \qquad X^{\nu}_{t,x}(t) = x, \tag{2.2}$$
where $(t, x)$ are the initial conditions and $\nu \in \mathcal{U}$.

Under the assumptions made on $\mu, \sigma$, we know by Theorem 147 that, for each $\nu$, there exists a unique strong solution, $X^{\nu}_{t,x}$, of (2.2). Furthermore, by Lemma 148, there is continuity with respect to the initial conditions, uniformly in $t$ and $\nu$, in the sense that, for each $x$ and $p \geq 2$, there is a constant, $C_x$, depending only on $K, T, x, p$, such that for all $\nu \in \mathcal{U}$, $t \in [0, T]$, $t' \geq t$, $s \geq t'$, $x' \in B_1(x)$,
$$\mathbb{E}\Big[\big|X^{\nu}_{t',x'}(s) - X^{\nu}_{t,x}(s)\big|^p \,\Big|\, \mathcal{F}_t\Big] \leq C_x\big(|x - x'|^p + |t - t'|^{\frac{p}{2}}\big). \tag{2.3}$$
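As an informal numerical check of (2.3) (our sketch; the coefficients are illustrative assumptions), one can couple two Euler-Maruyama paths through the same Brownian increments and observe that paths started at nearby points stay close in $L^2$, with the gap scaling like $|x - x'|$:

```python
import numpy as np

rng = np.random.default_rng(1)

def coupled_gap(x, x_prime, T=1.0, n_steps=200, n_paths=5000, u=0.1):
    """Euler-Maruyama for dX = u*X ds + 0.2*X dW, driven by the SAME Brownian
    increments from two initial points; returns E|X' - X|^2 at time T."""
    dt = T / n_steps
    X = np.full(n_paths, float(x))
    Xp = np.full(n_paths, float(x_prime))
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        X = X + u * X * dt + 0.2 * X * dW
        Xp = Xp + u * Xp * dt + 0.2 * Xp * dW
    return np.mean(np.abs(Xp - X) ** 2)

# the second moment of the gap scales like |x - x'|^2, consistent with (2.3)
print(coupled_gap(1.0, 1.1), coupled_gap(1.0, 1.01))
```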

2.3 Value function

We consider a terminal reward which we want to maximize,
$$J(t, x; \nu) := \mathbb{E}\Big[f\big(X^{\nu}_{t,x}(T)\big) \,\Big|\, \mathcal{F}_t\Big],$$
where the payoff function $f$ is assumed to be bounded and globally Lipschitz. Notice that $J(t, x; \nu)$ is a random variable and may depend on the past.

The problem we consider is to determine the value function given by
$$V(t, x) := \operatorname*{esssup}_{\nu \in \mathcal{U}} J(t, x; \nu).$$

For the notion of essential supremum, see Definition 39. Even though, a priori, $V(t, x)$ is a random variable, it turns out that it is, in fact, a constant random variable. For a proof of this result we refer the reader to Section 4.4.1, where an analogous result is proved in the context of stochastic differential games. Thus, we may think of $V(t, x)$ as a function $V : \mathbb{S} \to \mathbb{R}$.

It is possible to find controls which are uniformly $\varepsilon$-optimal. More precisely we have:

Lemma 4. Fix $\varepsilon > 0$. Then there exists $\nu^\varepsilon \in \mathcal{U}$ such that
$$V(t, x) \leq J(t, x; \nu^\varepsilon) + \varepsilon.$$

Proof. By Theorem 40 there exists a countable collection $\{\nu^i\} \subset \mathcal{U}$ such that
$$V(t, x) = \sup_i J(t, x; \nu^i).$$
Consider $\Lambda^i := \{V(t, x) \leq J(t, x; \nu^i) + \varepsilon\} \in \mathcal{F}_t$. We define $\tilde\Lambda^1 := \Lambda^1$, $\tilde\Lambda^{i+1} := \Lambda^{i+1} \setminus \bigcup_{k=1}^{i} \tilde\Lambda^k$. Then $\{\tilde\Lambda^i\} \subset \mathcal{F}_t$ forms a countable partition of $\Omega$, modulo null sets. We now define
$$\nu^\varepsilon := \sum_i \nu^i \mathbf{1}_{\tilde\Lambda^i}.$$
Then, since $U$ is compact, $\nu^\varepsilon \in \mathcal{U}$ and
$$J(t, x; \nu^\varepsilon) = \sum_i J(t, x; \nu^i)\mathbf{1}_{\tilde\Lambda^i} \geq V(t, x) - \varepsilon,$$
where we used in the first equality a property of $J$, to be proved in Lemma 35, that implies:
$$J\Big(t, x; \sum_i \mathbf{1}_{\tilde\Lambda^i}\nu^i\Big) = \sum_i \mathbf{1}_{\tilde\Lambda^i} J(t, x; \nu^i).$$
We call this property independence of irrelevant alternatives.


Remark 5. Even though, in the proof of the previous Lemma, we use the compactness of $U$, the same result still holds if $U$ is not compact. This will be a consequence of Proposition 68, and it will be explored in Section 4.5.1.

By the previous result and by the non-randomness of $V$ we get the following alternative definition of the value function:

Proposition 6. The following holds:
$$V(t, x) = \sup_{\nu \in \mathcal{U}} \mathbb{E}[J(t, x; \nu)].$$

Proof. Since $V$ is a constant random variable we have $V(t, x) = \mathbb{E}[V(t, x)]$. Thus, on one hand,
$$V(t, x) = \mathbb{E}\Big[\operatorname*{esssup}_{\nu \in \mathcal{U}} J(t, x; \nu)\Big] \geq \sup_{\nu \in \mathcal{U}} \mathbb{E}[J(t, x; \nu)].$$
On the other hand, by the previous Lemma,
$$V(t, x) = \mathbb{E}[V(t, x)] \leq \mathbb{E}[J(t, x; \nu^\varepsilon)] + \varepsilon \leq \sup_{\nu \in \mathcal{U}} \mathbb{E}[J(t, x; \nu)] + \varepsilon,$$
and since $\varepsilon$ is arbitrary we deduce that
$$V(t, x) \leq \sup_{\nu \in \mathcal{U}} \mathbb{E}[J(t, x; \nu)].$$

Using Proposition 6, the next regularity result for V is easy to obtain.

Proposition 7. The function $V(t, x)$ is bounded and continuous. Furthermore $V(t, \cdot)$ is Lipschitz continuous and $V(\cdot, x)$ is $\frac{1}{2}$-Hölder continuous.

Proof. The boundedness of $V$ follows directly from the boundedness of $f$. By Proposition 6, we have for all $x' \in B_1(x)$:
$$\begin{aligned}
V(t', x') - V(t, x) &= \sup_{\nu} \mathbb{E}[J(t', x'; \nu)] - \sup_{\nu} \mathbb{E}[J(t, x; \nu)] \\
&\leq \sup_{\nu} \mathbb{E}\big[f\big(X^{\nu}_{t',x'}(T)\big) - f\big(X^{\nu}_{t,x}(T)\big)\big] \\
&\leq \sup_{\nu} K\,\mathbb{E}\big[\big|X^{\nu}_{t',x'}(T) - X^{\nu}_{t,x}(T)\big|\big] \\
&\leq \sup_{\nu} K\,\mathbb{E}\big[\big|X^{\nu}_{t',x'}(T) - X^{\nu}_{t,x}(T)\big|^2\big]^{\frac{1}{2}} \\
&\leq C_x K\big(|x - x'|^2 + |t - t'|\big)^{\frac{1}{2}},
\end{aligned}$$
where the last inequality follows from (2.3). Exchanging the roles of $(t', x')$ and $(t, x)$ we get the desired continuity result for $V$.


Typically, the $\frac{1}{2}$-Hölder continuity of $V$ in the time variable is proved using the dynamic programming principle, by a slightly different argument. Because we state the dynamic programming principle with stopping times, we need some a priori regularity of $V$ in the time variable. This is the reason why we proved this regularity result at this point.

The fact that $V(t, x)$ is continuous is essential to simplify the subsequent exposition. Indeed, due to this fact, we conclude that $V\big(\tau, X^{\nu}_{t,x}(\tau)\big)$ is measurable, for any stopping time $\tau$. As we will see in the next Section, this is crucial for the statement of the dynamic programming principle. In [1], Bouchard and Touzi establish a weak version of the dynamic programming principle where such regularity of the value function is not required.

2.4 Dynamic programming principle

In this Section we establish the dynamic programming principle for the value function $V$. The essential ingredient is a result analogous to (1.3):

$$J(t, x; \nu_1 \oplus_\tau \nu_2) = \mathbb{E}\Big[J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big) \,\Big|\, \mathcal{F}_t\Big], \tag{2.4}$$
where
$$(\nu_1 \oplus_\tau \nu_2)(s) := \nu_1(s)\mathbf{1}_{[t,\tau]}(s) + \nu_2(s)\mathbf{1}_{(\tau,T]}(s)$$
denotes the concatenation of the controls $\nu_1, \nu_2$ at the stopping time $\tau \geq t$.

This property follows directly from the flow property for solutions of stochastic differential equations and the tower property of conditional expectations:
$$J(t, x; \nu_1 \oplus_\tau \nu_2) = \mathbb{E}\Big[f\big(X^{\nu_1 \oplus_\tau \nu_2}_{t,x}(T)\big) \,\Big|\, \mathcal{F}_t\Big] = \mathbb{E}\Big[\mathbb{E}\Big[f\big(X^{\nu_2}_{\tau,\, X^{\nu_1}_{t,x}(\tau)}(T)\big) \,\Big|\, \mathcal{F}_\tau\Big] \,\Big|\, \mathcal{F}_t\Big] = \mathbb{E}\Big[J\big(\tau, X^{\nu_1}_{t,x}(\tau); \nu_2\big) \,\Big|\, \mathcal{F}_t\Big].$$

Theorem 8 (Dynamic programming principle). For all $x \in \mathbb{R}^d$, $t \in [0, T]$ and stopping times $\tau \in [t, T]$ the following holds:
$$V(t, x) = \operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big].$$

Proof. We prove the two inequalities separately.

Step 1: $V(t, x) \leq \operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big]$.

By (2.4) we have
$$J(t, x; \nu) = \mathbb{E}\Big[J\big(\tau, X^{\nu}_{t,x}(\tau); \nu\big) \,\Big|\, \mathcal{F}_t\Big] \leq \mathbb{E}\Big[\operatorname*{esssup}_{\nu_2 \in \mathcal{U}} J\big(\tau, X^{\nu}_{t,x}(\tau); \nu_2\big) \,\Big|\, \mathcal{F}_t\Big] = \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big].$$
Taking the supremum on both sides of the inequality yields
$$V(t, x) \leq \operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big].$$

Step 2: $V(t, x) \geq \operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big)\,\big|\,\mathcal{F}_t\big]$.


This inequality requires more work. Fix $\varepsilon > 0$ and consider $\nu^\varepsilon \in \mathcal{U}$ such that
$$\operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big] \leq \mathbb{E}\Big[V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big] + \varepsilon.$$
For each $(s, y) \in \mathbb{S}$ consider $\nu^{(s,y)} \in \mathcal{U}$ such that
$$V(s, y) \leq J\big(s, y; \nu^{(s,y)}\big) + \varepsilon.$$
We know, by continuity of $V$ and by (2.3), that, for each $(s, y)$ there is also $r^{(s,y)}$ such that
$$|V(s', y') - V(s, y)| \leq \varepsilon, \qquad \big|J(s', y'; \nu) - \mathbb{E}[J(s, y; \nu)\,|\,\mathcal{F}_{s'}]\big| \leq \varepsilon,$$
for all $\nu \in \mathcal{U}$ and for all $(s', y') \in B(s, y; r^{(s,y)})$, where $B(s, y; r) := [s - r, s] \times B_r(y)$. Since $\{B(s, y; r) : (s, y) \in \mathbb{S},\ 0 < r \leq r^{(s,y)}\}$ forms a Vitali covering of $\mathbb{S}$, we can find a countable sequence $(s_i, y_i, r_i)$ such that $\{B(s_i, y_i; r_i)\}_i$ forms a partition of $\mathbb{S}$, modulo null sets, and $0 < r_i \leq r^{(s_i,y_i)}$. For the notion of Vitali covering we refer the reader to [6, p. 158].

Define
$$\Lambda_i := \Big\{\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big) \in B(s_i, y_i; r_i)\Big\} \in \mathcal{F}_\tau \cap \mathcal{F}_{s_i},$$
$$\nu^i := \nu^\varepsilon \oplus_{\tau \vee s_i} \nu^{(s_i,y_i)}, \qquad \bar\nu := \sum_i \mathbf{1}_{\Lambda_i}\nu^i = \sum_i \mathbf{1}_{\Lambda_i}\,\nu^\varepsilon \oplus_{s_i} \nu^{(s_i,y_i)}.$$
Then $\bar\nu \in \mathcal{U}$ and
$$\begin{aligned}
V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\mathbf{1}_{\Lambda_i} &\leq (V(s_i, y_i) + \varepsilon)\mathbf{1}_{\Lambda_i} \\
&= (\mathbb{E}[V(s_i, y_i)\,|\,\mathcal{F}_\tau] + \varepsilon)\mathbf{1}_{\Lambda_i} \\
&\leq \big(\mathbb{E}\big[J\big(s_i, y_i; \nu^{(s_i,y_i)}\big)\,\big|\,\mathcal{F}_\tau\big] + 2\varepsilon\big)\mathbf{1}_{\Lambda_i} \\
&= \big(\mathbb{E}[J(s_i, y_i; \nu^i)\,|\,\mathcal{F}_\tau] + 2\varepsilon\big)\mathbf{1}_{\Lambda_i} \\
&\leq \big(J\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau); \nu^i\big) + 3\varepsilon\big)\mathbf{1}_{\Lambda_i} \\
&= \big(J\big(\tau, X^{\bar\nu}_{t,x}(\tau); \nu^i\big) + 3\varepsilon\big)\mathbf{1}_{\Lambda_i}.
\end{aligned}$$
Thus,
$$\begin{aligned}
\operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big] &\leq \mathbb{E}\Big[V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big] + \varepsilon \\
&= \mathbb{E}\Big[\sum_i V\big(\tau, X^{\nu^\varepsilon}_{t,x}(\tau)\big)\mathbf{1}_{\Lambda_i} \,\Big|\, \mathcal{F}_t\Big] + \varepsilon \\
&\leq \mathbb{E}\Big[\sum_i J\big(\tau, X^{\bar\nu}_{t,x}(\tau); \nu^i\big)\mathbf{1}_{\Lambda_i} \,\Big|\, \mathcal{F}_t\Big] + 4\varepsilon \\
&= \mathbb{E}\Big[J\big(\tau, X^{\bar\nu}_{t,x}(\tau); \bar\nu\big) \,\Big|\, \mathcal{F}_t\Big] + 4\varepsilon \\
&= J(t, x; \bar\nu) + 4\varepsilon \leq V(t, x) + 4\varepsilon.
\end{aligned}$$
Since $\varepsilon$ is arbitrary, we conclude that
$$\operatorname*{esssup}_{\nu \in \mathcal{U}} \mathbb{E}\Big[V\big(\tau, X^{\nu}_{t,x}(\tau)\big) \,\Big|\, \mathcal{F}_t\Big] \leq V(t, x).$$


Remark 9. In our setting the value function is continuous, thus implying that $V\big(\tau, X^{\nu}_{t,x}(\tau)\big)$ is measurable. If this is not the case then more complicated arguments must be used. Examples of this can be seen in [5], where a deep measurable selection argument is used, and [7], where a compactification of the space of controls is carried through. As an alternative to this, in [1], a weak formulation of the dynamic programming principle is proposed.

2.5 Hamilton-Jacobi-Bellman equation

Now that we have established the dynamic programming principle and proved that $V$ is a continuous function, we can prove that it is a viscosity solution of the Hamilton-Jacobi-Bellman equation. In the stochastic setting the Hamilton-Jacobi-Bellman equation is a parabolic partial differential equation of second order.

The Hamiltonian we consider in this case is
$$H(t, x, p, X) := \inf_{u \in U} H^u(t, x, p, X), \quad \text{where} \quad H^u(t, x, p, X) := -\langle p, \mu(t, x; u)\rangle - \tfrac{1}{2}\mathrm{Tr}\big((\sigma\sigma^T)(t, x; u)X\big).$$
Notice that $H^u(t, x, p, X)$ is continuous in $t, x, p, u, X$. Thus, due to the compactness of $U$, $H$ is also continuous.

Theorem 10. The value function $V$ is a viscosity solution of
$$\big(-\partial_t V + H(\cdot, DV, D^2V)\big)(t, x) = 0, \qquad (t, x) \in\, ]0, T[\, \times \mathbb{R}^d. \tag{2.5}$$

Proof. We begin by proving that $V$ is a viscosity supersolution of (2.5). Consider $\phi \in C^{1,2}(\mathbb{S})$ and $(\bar t, \bar x) \in \arg\min(V - \phi)$. By Remark 101, in the Appendix, we can suppose that $(\bar t, \bar x)$ is a strict global minimizer of $V - \phi$. Thus, for all $(t, x) \in \mathbb{S}$,
$$(V - \phi)(t, x) \geq (V - \phi)(\bar t, \bar x),$$
that is,
$$\phi(\bar t, \bar x) - \phi(t, x) \geq V(\bar t, \bar x) - V(t, x).$$
Fix $u \in U$ and consider the constant control $\nu_s := u$. For each $r > 0$, consider the stopping time $\tau_r := \inf\{s \geq \bar t : X^{\nu}_{\bar t,\bar x}(s) \notin B_r(\bar x)\}$. Then,
$$\phi(\bar t, \bar x) - \phi\big(\tau_r, X^{\nu}_{\bar t,\bar x}(\tau_r)\big) \geq V(\bar t, \bar x) - V\big(\tau_r, X^{\nu}_{\bar t,\bar x}(\tau_r)\big).$$
By the dynamic programming principle we know that
$$V(\bar t, \bar x) \geq \mathbb{E}\Big[V\big(\tau_r, X^{\nu}_{\bar t,\bar x}(\tau_r)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big],$$
hence
$$\phi(\bar t, \bar x) - \mathbb{E}\Big[\phi\big(\tau_r, X^{\nu}_{\bar t,\bar x}(\tau_r)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] \geq 0. \tag{2.6}$$
By Itô's formula we have
$$\phi\big(\tau_r, X^{\nu}_{\bar t,\bar x}(\tau_r)\big) - \phi(\bar t, \bar x) = \int_{\bar t}^{\tau_r} \big(\partial_t\phi - H^{\nu_s}(\cdot; D\phi, D^2\phi)\big)\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,ds + \int_{\bar t}^{\tau_r} (D\phi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,dW_s. \tag{2.7}$$
Since $\tau_r$ is an exit time and $\sigma, D\phi$ are continuous, $(D\phi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)$ remains bounded for $s \in [\bar t, \tau_r]$, hence
$$\mathbb{E}\Big[\int_{\bar t}^{\tau_r} (D\phi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,dW_s \,\Big|\, \mathcal{F}_{\bar t}\Big] = 0.$$
Taking expectations on both sides of (2.7) and using (2.6) we get
$$\mathbb{E}\Big[\int_{\bar t}^{\tau_r} \big(\partial_t\phi - H^{\nu_s}(\cdot; D\phi, D^2\phi)\big)\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,ds \,\Big|\, \mathcal{F}_{\bar t}\Big] \leq 0.$$
Dividing the previous inequality by $\tau_r - \bar t$ and letting $r \to 0$ we have, by dominated convergence,
$$\big(\partial_t\phi - H^u(\cdot; D\phi, D^2\phi)\big)(\bar t, \bar x) \leq 0.$$
Thus $V$ is a supersolution of (2.5).

To check that $V$ is a subsolution we consider $\phi \in C^{1,2}(\mathbb{S})$ and $(\bar t, \bar x) \in \arg\max(V - \phi)$. By Remark 101 we can suppose that $(\bar t, \bar x)$ is a strict global maximizer of $V - \phi$. Thus, for all $(t, x) \in \mathbb{S}$,
$$(V - \phi)(t, x) \leq (V - \phi)(\bar t, \bar x),$$
that is,
$$\phi(\bar t, \bar x) - \phi(t, x) \leq V(\bar t, \bar x) - V(t, x). \tag{2.8}$$
We suppose by contradiction that
$$\big(-\partial_t\phi + H(\cdot, D\phi, D^2\phi)\big)(\bar t, \bar x) \geq 2\delta,$$
for some $\delta > 0$. Consider $\varphi(t, x) := \phi(t, x) + |t - \bar t|^2 + |x - \bar x|^4$. Then, for all $u \in U$,
$$\big(-\partial_t\varphi + H^u(\cdot, D\varphi, D^2\varphi)\big)(\bar t, \bar x) \geq 2\delta.$$
Since $H^u(t, x, p, X)$ is continuous with respect to $u, t, x, p, X$ and $U$ is compact, we deduce that there exists $R > 0$ such that, for all $u \in U$,
$$\big(-\partial_t\varphi + H^u(\cdot, D\varphi, D^2\varphi)\big)(t, x) \geq \delta, \quad \text{for all } (t, x) \in B_R(\bar t, \bar x).$$
We define
$$\eta := \min_{\partial B_R(\bar t, \bar x)} (\varphi - \phi) > 0.$$
Let $\nu$ be such that
$$V(\bar t, \bar x) \leq J(\bar t, \bar x; \nu) + \frac{\eta}{2}, \tag{2.9}$$
and consider the stopping time $\tau := \inf\{s \geq \bar t : \big(s, X^{\nu}_{\bar t,\bar x}(s)\big) \notin B_R(\bar t, \bar x)\}$.

Since $\varphi \in C^{1,2}(\mathbb{S})$, we have by Itô's formula that
$$\begin{aligned}
\varphi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) - \varphi(\bar t, \bar x) &= \int_{\bar t}^{\tau} \big(\partial_t\varphi - H^{\nu_s}(\cdot, D\varphi, D^2\varphi)\big)\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,ds + \int_{\bar t}^{\tau} (D\varphi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,dW_s \\
&\leq -(\tau - \bar t)\delta + \int_{\bar t}^{\tau} (D\varphi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,dW_s. \tag{2.10}
\end{aligned}$$
Since $\tau$ is an exit time and $\sigma, D\varphi$ are continuous, $(D\varphi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)$ remains bounded for $s \in [\bar t, \tau]$, hence
$$\mathbb{E}\Big[\int_{\bar t}^{\tau} (D\varphi\,\sigma(\cdot; \nu_s))\big(s, X^{\nu}_{\bar t,\bar x}(s)\big)\,dW_s \,\Big|\, \mathcal{F}_{\bar t}\Big] = 0.$$
Thus, taking expectations in (2.10) we get
$$\mathbb{E}\Big[\varphi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] - \varphi(\bar t, \bar x) \leq 0.$$
By definition of $\eta$ and since $\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \in \partial B_R(\bar t, \bar x)$, we have
$$\varphi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \geq \phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) + \eta,$$
and hence
$$\mathbb{E}\Big[\phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] - \phi(\bar t, \bar x) \leq -\eta.$$
By (2.8) we then conclude that
$$V(\bar t, \bar x) - \mathbb{E}\Big[V\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] \geq \phi(\bar t, \bar x) - \mathbb{E}\Big[\phi\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] \geq \eta.$$
Thus
$$V(\bar t, \bar x) \geq \mathbb{E}\Big[V\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] + \eta.$$
On the other hand, we have by (2.9) and (2.4) that
$$V(\bar t, \bar x) \leq J(\bar t, \bar x; \nu) + \frac{\eta}{2} = \mathbb{E}\Big[J\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau); \nu\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] + \frac{\eta}{2} \leq \mathbb{E}\Big[V\big(\tau, X^{\nu}_{\bar t,\bar x}(\tau)\big) \,\Big|\, \mathcal{F}_{\bar t}\Big] + \frac{\eta}{2}.$$
We have thus reached a contradiction.

Like in the deterministic case, the value function $V$ also satisfies the terminal condition $V(T, x) = f(x)$.

Regarding uniqueness of solutions, we refer the reader to Section 4.6.1. There we discuss uniqueness of solutions for the Hamilton-Jacobi-Bellman-Isaacs (HJBI) equation, which is a second-order partial differential equation similar to the HJB equation. The same arguments can easily be adapted to prove uniqueness of solutions for the HJB equation in this case. In fact, it is much easier, since in this Chapter we only consider the case of a bounded value function.

2.6 Verification Theorem

In this section we establish a verification result useful for the synthesis of optimal controls.


Theorem 11 (Verification Theorem). Let $v \in C^{1,2}([0, T) \times \mathbb{R}^d) \cap C([0, T] \times \mathbb{R}^d)$ be a classical solution of (2.5) such that $v(T, x) = f(x)$ and $v$ has polynomial growth. Suppose that there exists a measurable function $u^* : \mathbb{S} \to U$ such that
$$H(\cdot; Dv, D^2v)(t, x) = H^{u^*(\cdot)}(\cdot; Dv, D^2v)(t, x).$$
Then
$$V(t, x) = v(t, x),$$
and $\nu^*_s := u^*(s, X_{t,x}(s))$ is an optimal control.

Proof. Since $u^*$ is measurable, $\nu^*_s := u^*(s, X_{t,x}(s)) \in \mathcal{U}$. Let $X := X^{\nu^*}_{t,x}$ be the unique strong solution of (2.2) and consider the stopping time $\tau_n := T \wedge \inf\{s \geq t : X_s \notin [-n, n]\}$. Applying Itô's formula we have
$$v(\tau_n, X(\tau_n)) - v(t, x) = \int_t^{\tau_n} \big(\partial_t v - H^{u^*(\cdot)}(\cdot; Dv, D^2v)\big)(s, X(s))\,ds + \int_t^{\tau_n} (Dv\,\sigma(\cdot; u^*(\cdot)))(s, X(s))\,dW_s = \int_t^{\tau_n} (Dv\,\sigma(\cdot; u^*(\cdot)))(s, X(s))\,dW_s.$$
Since $\tau_n$ is an exit time, we have
$$\mathbb{E}\Big[\int_t^{\tau_n} (Dv\,\sigma(\cdot; u^*(\cdot)))(s, X(s))\,dW_s \,\Big|\, \mathcal{F}_t\Big] = 0,$$
hence
$$v(t, x) = \mathbb{E}[v(\tau_n, X(\tau_n))\,|\,\mathcal{F}_t].$$
Since $v$ has polynomial growth we can take $n \to \infty$ to conclude, by the dominated convergence Theorem, that
$$v(t, x) = \mathbb{E}[v(T, X(T))\,|\,\mathcal{F}_t] = \mathbb{E}[f(X(T))\,|\,\mathcal{F}_t] = J(t, x; \nu^*).$$
On the other hand, if we consider an arbitrary control $\nu \in \mathcal{U}$ and $X^\nu$, the unique strong solution of (2.2), then we remark that
$$\big(H^{u^*(\cdot)}(\cdot; Dv, D^2v)\big)(s, X^\nu(s)) \leq \big(H^{\nu_s}(\cdot; Dv, D^2v)\big)(s, X^\nu(s)),$$
hence, by an argument analogous to the one used before, we deduce that
$$\mathbb{E}[v(\tau_n, X^\nu(\tau_n))\,|\,\mathcal{F}_t] - v(t, x) \leq \mathbb{E}\Big[\int_t^{\tau_n} (Dv\,\sigma(\cdot; \nu_s))(s, X^\nu(s))\,dW_s \,\Big|\, \mathcal{F}_t\Big] = 0.$$
Letting $n \to \infty$ we deduce that
$$v(t, x) \geq \mathbb{E}[v(T, X^\nu(T))\,|\,\mathcal{F}_t] = \mathbb{E}[f(X^\nu(T))\,|\,\mathcal{F}_t] = J(t, x; \nu).$$
Since $\nu$ is arbitrary and $v(t, x) = J(t, x; \nu^*)$ we conclude that
$$V(t, x) \leq v(t, x) = J(t, x; \nu^*) \leq V(t, x).$$
Thus $\nu^*$ is an optimal control.


2.7 Merton’s optimal portfolio

We now give an example of application of the previous results to the well-known Merton optimal portfolio problem.

The setting for Merton's problem is that of a market with a risky asset $S_t$ and a non-risky asset $S^0_t$ which satisfy the stochastic differential equations
$$dS_t = S_t(\mu_t\,dt + \sigma_t\,dW_t), \qquad dS^0_t = S^0_t r_t\,dt.$$
In this setting we consider a self-financed portfolio whose value, $X_t$, we want to maximize. The value of the portfolio is to be maximized over all admissible strategies. An admissible strategy, $\pi \in \mathcal{U}$, indicates the ratio, $\pi_t \in [\pi_0, \pi_1]$, of the value of the portfolio to invest at time $t$ in the risky asset. The self-financing conditions are then:
$$dX_t = \frac{X_t\pi_t}{S_t}\,dS_t + \frac{X_t(1 - \pi_t)}{S^0_t}\,dS^0_t = X_t\big((\pi_t\mu_t + (1 - \pi_t)r_t)\,dt + \pi_t\sigma_t\,dW_t\big).$$

By choosing only self-financed strategies we are not allowing money to be added to or taken from the portfolio. Furthermore it is easy to see that the value of the portfolio never becomes negative. Indeed, if at some point the value reaches 0, then we have $dX_t = 0$, so that it remains 0.

Given a terminal time $T$, Merton's optimal portfolio problem consists in optimizing, over all admissible strategies $\pi \in \mathcal{U}$, the utility of the terminal value of the portfolio, that is, determining
$$V(t, x) = \sup_{\pi \in \mathcal{U}} \mathbb{E}\Big[f\big(X^{\pi}_{t,x}(T)\big) \,\Big|\, \mathcal{F}_t\Big],$$
where $\mathcal{U}$ is the set of progressively measurable processes taking values in $[\pi_0, \pi_1]$, $f$ is a utility function and $X^{\pi}_{t,x}(s)$ is the value of the portfolio at time $s$, with strategy $\pi$ and initial data $(t, x)$.

We consider in the following a power utility function, $f(x) = x^p$, for $p \in (0, 1)$, and deterministic parameters, $\mu_t = \mu$, $\sigma_t = \sigma$. In this case we have $f\big(\mathbb{R}^+_0\big) = \mathbb{R}^+_0$, thus $V(t, x) \geq 0$.

Because the dynamics is homogeneous, that is, $X^{\pi}_{t,\lambda x} = \lambda X^{\pi}_{t,x}$, the power utility function is particularly easy to deal with. This can be seen in the following Lemma:

Lemma 12. If $f(x)$ is multiplicative, i.e. $f(xy) = f(x)f(y)$, then
$$V(t, x) = f(x)V(t, 1).$$

Proof. This is a consequence of the following computation:
$$J(t, x; \pi) = \mathbb{E}\Big[f\big(X^{\pi}_{t,x}(T)\big) \,\Big|\, \mathcal{F}_t\Big] = \mathbb{E}\bigg[f\bigg(\frac{X^{\pi}_{t,x}(T)}{x}\,x\bigg) \,\bigg|\, \mathcal{F}_t\bigg] = f(x)\,\mathbb{E}\Big[f\big(X^{\pi}_{t,1}(T)\big) \,\Big|\, \mathcal{F}_t\Big] = f(x)J(t, 1; \pi).$$

Thus we conclude that $V$ is differentiable in the space variable and
$$DV(t, x) = f'(x)V(t, 1) = px^{p-1}V(t, 1) > 0, \qquad D^2V(t, x) = f''(x)V(t, 1) = -p(1 - p)x^{p-2}V(t, 1) < 0,$$
for $x > 0$. Taking these facts into account and denoting $h(t) := V(t, 1)$, we now need to determine $\pi^*$ such that
$$H\big(t, x; px^{p-1}h(t), -p(1 - p)x^{p-2}h(t)\big) = H^{\pi^*(t,x)}\big(t, x; px^{p-1}h(t), -p(1 - p)x^{p-2}h(t)\big).$$
Thus
$$\pi^*(t, x) = \operatorname*{argmin}_{\pi \in [\pi_0, \pi_1]} \Big(-(\pi(\mu - r) + r)px^p h(t) + \tfrac{1}{2}\sigma^2\pi^2 p(1 - p)x^p h(t)\Big) = \operatorname*{argmin}_{\pi \in [\pi_0, \pi_1]} \Big(-(\pi(\mu - r) + r) + \tfrac{1}{2}\sigma^2\pi^2(1 - p)\Big) = \min\Big(\pi_1,\ \frac{\mu - r}{\sigma^2(1 - p)} \vee \pi_0\Big).$$

It is then easy to deduce that
$$v(t, x) = x^p e^{p(T-t)\big(r + \pi^*\big((\mu - r) - \frac{1-p}{2}\pi^*\sigma^2\big)\big)}$$
is a $C^{1,2}$ solution of (2.5) such that
$$H(\cdot; Dv, D^2v)(t, x) = H^{\pi^*(\cdot)}(\cdot; Dv, D^2v)(t, x).$$
Thus we conclude, by the verification Theorem, that $V(t, x) = v(t, x)$ and $\pi^*$ is an optimal control. Later, in Section 4.7, we will perform a worst-case approach to this problem, when the parameters $\mu, \sigma$ are considered to be stochastic.
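As a quick sanity check of the closed-form solution (our own sketch; all numeric values below are illustrative assumptions), the following computes the unconstrained Merton ratio $(\mu - r)/(\sigma^2(1 - p))$, clips it to $[\pi_0, \pi_1]$, and evaluates $v(t, x)$:

```python
import math

def merton_optimal_ratio(mu, r, sigma, p, pi0, pi1):
    """pi* = min(pi1, max(pi0, (mu - r) / (sigma^2 * (1 - p))))."""
    unconstrained = (mu - r) / (sigma**2 * (1.0 - p))
    return min(pi1, max(pi0, unconstrained))

def merton_value(t, x, T, mu, r, sigma, p, pi0, pi1):
    """v(t, x) = x^p * exp(p (T - t) (r + pi*((mu - r) - (1-p)/2 * pi* * sigma^2)))."""
    pi = merton_optimal_ratio(mu, r, sigma, p, pi0, pi1)
    rate = r + pi * ((mu - r) - 0.5 * (1.0 - p) * pi * sigma**2)
    return x**p * math.exp(p * (T - t) * rate)

# Illustrative parameters (assumptions): mu = 8%, r = 2%, sigma = 20%, p = 0.5
pi_star = merton_optimal_ratio(0.08, 0.02, 0.20, 0.5, 0.0, 1.0)
print(pi_star, merton_value(0.0, 1.0, 1.0, 0.08, 0.02, 0.20, 0.5, 0.0, 1.0))
```

With these numbers the unconstrained ratio exceeds $\pi_1 = 1$, so the constraint binds and $\pi^* = 1$, illustrating the role of the interval $[\pi_0, \pi_1]$ in the formula above.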


Chapter 3

Deterministic differential games

In this Chapter we consider two-person zero-sum differential games. Analogously to Chapter 1, our objective is to establish the dynamic programming principle and use it to characterize the value functions as viscosity solutions of the associated Hamilton-Jacobi-Bellman-Isaacs (HJBI) equations. The exposition is similar to the one in [3]. For more classical approaches (without viscosity solutions) we refer the reader to [8, 9].

3.1 Introduction

The scenario for a two-person differential game is similar to that of optimal control. One main difference is that now the state variable is controlled by two controls, $a \in \mathcal{A}$ and $b \in \mathcal{B}$:
$$dX^{a,b}_{t,x}(s) = \mu\big(s, X^{a,b}_{t,x}(s); a(s), b(s)\big)\,ds.$$
The controls are adjusted by two players, who have antagonistic objectives. We assume that player one wants to maximize a terminal reward, while the other wants to minimize it. The terminal reward is the quantity
$$J(t, x; a, b) := f\big(X^{a,b}_{t,x}(T)\big).$$

In this scenario we will give advantage to one of the players by allowing him to use strategies. A strategy for player 1, $\alpha := \alpha[b]$, is a mapping $\alpha : \mathcal{B} \to \mathcal{A}$. Analogously we define strategies for player 2. We then consider the optimization problems of determining the value functions
$$v(t, x) := \inf_{\beta} \sup_{a} J(t, x; a, \beta[a]), \qquad u(t, x) := \sup_{\alpha} \inf_{b} J(t, x; \alpha[b], b).$$

These value functions, $v, u$, are called, respectively, the lower and upper values of the game. The names are justified by the inequality
$$v \leq u,$$
which is a consequence of the fact that in $v$ player two is given the advantage, while the opposite happens in $u$. In fact, in order not to give too much advantage to one of the players, and to consider a problem that can be tackled via a dynamic programming approach, we need to consider only strategies that do not foresee the future of the opponent's control. This is made precise by introducing the notion of non-anticipativity, which is a property of interest in many applications.

As the reader may predict, each of the value functions verifies a dynamic programming principle that allows us to prove that they are solutions of a partial differential equation. This is, indeed, true. The dynamic programming principle now takes the form:
$$v(t, x) = \inf_{\beta} \sup_{a} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big), \qquad u(t, x) = \sup_{\alpha} \inf_{b} u\big(\tau, X^{\alpha[b],b}_{t,x}(\tau)\big).$$

The equations that the value functions satisfy are the so-called Hamilton-Jacobi-Bellman-Isaacs equations:
$$\Big(-\partial_t v + \inf_{a}\sup_{b} -\langle \mu(\cdot; a, b), Dv\rangle\Big)(t, x) = 0, \qquad \Big(-\partial_t u + \sup_{b}\inf_{a} -\langle \mu(\cdot; a, b), Du\rangle\Big)(t, x) = 0.$$

After establishing uniqueness for the above equations, we can deduce that if Isaacs' condition holds, that is, if, for all $(t, x)$ and $p$,
$$\inf_{a}\sup_{b} -\langle \mu(t, x; a, b), p\rangle = \sup_{b}\inf_{a} -\langle \mu(t, x; a, b), p\rangle,$$
then the game has a value, i.e., $v = u$.
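As a numerical illustration (our own sketch, not part of the thesis), one can discretize the control sets and compare the two sides of Isaacs' condition for a fixed $p$. For dynamics separated in the controls, say $\mu(t, x; a, b) = g(a) + h(b)$, the two minimax values coincide, which is a standard sufficient condition; the specific $\mu$, grids and point below are assumptions.

```python
import numpy as np

# Illustrative separated dynamics (assumption): mu(t, x; a, b) = a + b in R
A = np.linspace(-1.0, 1.0, 101)   # discretized control set of player 1
B = np.linspace(-1.0, 1.0, 101)   # discretized control set of player 2
p = 0.7                           # a fixed gradient value

# H^{a,b}(t, x, p) = -<p, mu(t, x; a, b)> on the grid (state dependence dropped)
H = -p * (A[:, None] + B[None, :])

lower = H.max(axis=1).min()       # inf_a sup_b H^{a,b}
upper = H.min(axis=0).max()       # sup_b inf_a H^{a,b}
print(lower, upper)               # equal here: Isaacs' condition holds
```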

3.2 The controlled dynamical system

In two-person differential games we must consider two control spaces $\mathcal{A}, \mathcal{B}$, consisting of the measurable functions taking values in compact sets $A \subset \mathbb{R}^{d_a}$, $B \subset \mathbb{R}^{d_b}$, respectively. We say that $\mathcal{A}, \mathcal{B}$ are the control spaces of players one and two, respectively.

The nonlinear system in $\mathbb{R}^d$ that characterizes the state variable is similar to the one considered in optimal control,
$$dX^{a,b}_{t,x}(s) = \mu\big(s, X^{a,b}_{t,x}(s); a(s), b(s)\big)\,ds, \qquad X^{a,b}_{t,x}(t) = x, \tag{3.1}$$
where $a \in \mathcal{A}$, $b \in \mathcal{B}$, $(t, x)$ are the initial conditions and $\mu : [0, T] \times \mathbb{R}^d \times A \times B \to \mathbb{R}^d$ is a continuous function which is Lipschitz continuous in the space variable, that is,
$$|\mu(t, x; a, b) - \mu(t, y; a, b)| \leq K|x - y|,$$
for some constant $K$. The space $\mathbb{S} := [0, T] \times \mathbb{R}^d$ is again called the state space.

Under the assumptions made on $\mu$, the existence and uniqueness of a solution $X^{a,b}_{t,x}$ of (3.1) follows from the standard theory of ordinary differential equations, for each $(a, b)$. Furthermore there is continuity with respect to the initial conditions $(t, x)$, uniformly in $t$ and $a, b$. More precisely, for each $x$, there is a constant, $C_x$, depending only on $K, T, x$, such that for all $a \in \mathcal{A}$, $b \in \mathcal{B}$, $t \in [0, T]$, $t' \geq t$, $s \geq t'$, $x' \in B_1(x)$,
$$\big|X^{a,b}_{t',x'}(s) - X^{a,b}_{t,x}(s)\big| \leq C_x\big(|x - x'| + |t - t'|^{\frac{1}{2}}\big). \tag{3.2}$$
This estimate is a particular case of Lemma 148, when there is no diffusion.

3.3 Lower and upper values

We consider a terminal reward, which player one wants to maximize and player two wants to minimize,
$$J(t, x; a, b) := f\big(X^{a,b}_{t,x}(T)\big).$$


The payoff function $f$ is assumed to be bounded and continuous. We can think of this payoff as something that player two will have to pay to player one at the terminal time $T$. This explains their antagonistic interests.

In this scenario it is natural to define the following two value functions:
$$v_s(t, x) := \sup_{a \in \mathcal{A}} \inf_{b \in \mathcal{B}} J(t, x; a, b), \qquad u_s(t, x) := \inf_{b \in \mathcal{B}} \sup_{a \in \mathcal{A}} J(t, x; a, b).$$

We call $v_s$ the lower static value function and $u_s$ the upper static value function. Notice that, in the lower value function, advantage is given to player 2, who is allowed to make his choice with the information of the whole future response of the first player. Similarly, in the upper value, advantage is given to the first player.

In a dynamic formulation of the game we want to give advantage to one of the players but without letting him foresee the future. This is achieved by introducing the notion of non-anticipating strategies:

Definition 13. A strategy for player 2 is a mapping $\beta : \mathcal{A} \to \mathcal{B}$. A strategy, $\beta$, is said to verify the non-anticipativity property if for every $a_1, a_2 \in \mathcal{A}$ and $\tau \in [t, T]$ we have:
$$\big(\forall s \in [t, \tau]\;\; a_1(s) = a_2(s)\big) \Rightarrow \big(\forall s \in [t, \tau]\;\; \beta[a_1](s) = \beta[a_2](s)\big).$$
The space of non-anticipative strategies for player 2 is denoted by $\Delta$. The definition of a strategy for player 1 is analogous. The space of non-anticipative strategies for player 1 is denoted by $\Gamma$.
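To give a computational feel for non-anticipativity (an illustrative sketch of ours, with controls represented as arrays over a time grid): a strategy mapping each control path to a response path is non-anticipative precisely when the response at step $k$ depends only on the opponent's values up to step $k$. A feedback rule like the one below satisfies this by construction; a rule peeking at future entries violates it.

```python
import numpy as np

def feedback_strategy(a_path):
    """Non-anticipative: beta[a](k) depends only on a_path[:k+1].
    Here player 2 simply mirrors the running average of player 1's control."""
    running_avg = np.cumsum(a_path) / np.arange(1, len(a_path) + 1)
    return -running_avg

def anticipative_strategy(a_path):
    """NOT non-anticipative: the response at every step uses the final value."""
    return -np.full(len(a_path), a_path[-1])

# two controls of player 1 that agree on the first 5 steps and differ afterwards
a1 = np.array([1.0] * 5 + [1.0] * 5)
a2 = np.array([1.0] * 5 + [-1.0] * 5)
print(np.allclose(feedback_strategy(a1)[:5], feedback_strategy(a2)[:5]))          # True
print(np.allclose(anticipative_strategy(a1)[:5], anticipative_strategy(a2)[:5]))  # False
```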

We are now able to define the value functions which will be studied in the subsequent sections.

Definition 14. The lower value of a differential game is defined as
$$v(t, x) = \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} J(t, x; a, \beta[a]).$$
Similarly, the upper value of a differential game is
$$u(t, x) = \sup_{\alpha \in \Gamma} \inf_{b \in \mathcal{B}} J(t, x; \alpha[b], b).$$

The names upper and lower value are justified by the following inequality:
$$v \leq u. \tag{3.3}$$
This intuitive inequality needs to be proved. We will do that, in an indirect way, using the HJBI equations associated with the game.

Using the same arguments as in the deterministic optimal control problem, we deduce the following result on the regularity of $v, u$:

Proposition 15. The lower and upper value functions, $v, u$, are bounded and continuous in $\mathbb{S}$.

3.4 Dynamic programming principle

In this Section we establish the dynamic programming principle for the value functions $v, u$. As in the case of optimal control, property (1.3) will be essential.

Theorem 16 (Dynamic programming principle). For all $x \in \mathbb{R}^d$, $t \in [0, T]$ and $\tau \in [t, T]$ the following holds:
$$v(t, x) = \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big), \qquad u(t, x) = \sup_{\alpha \in \Gamma} \inf_{b \in \mathcal{B}} u\big(\tau, X^{\alpha[b],b}_{t,x}(\tau)\big).$$


Proof. We give only the proof for the lower value function, since the other is similar. We prove the two inequalities separately.

Step 1: $v(t, x) \leq \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big)$.

Let $\beta^\varepsilon_1$ be such that
$$\inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) \geq \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)\big) - \varepsilon.$$
For each $y$, let $\beta^\varepsilon_y$ be such that
$$v(\tau, y) \geq \sup_{a \in \mathcal{A}} J\big(\tau, y; a, \beta^\varepsilon_y[a]\big) - \varepsilon.$$
We now define $\beta^\varepsilon$ by
$$\beta^\varepsilon[a] := \beta^\varepsilon_1[a] \oplus_\tau \beta^\varepsilon_{X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)}[a].$$
It is straightforward to check that $\beta^\varepsilon \in \Delta$. Furthermore, for any $a \in \mathcal{A}$,
$$\begin{aligned}
J(t, x; a, \beta^\varepsilon[a]) &= J\big(\tau, X^{a,\beta^\varepsilon[a]}_{t,x}(\tau); a, \beta^\varepsilon[a]\big) \\
&= J\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau); a, \beta^\varepsilon_{X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)}[a]\big) \\
&\leq v\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)\big) + \varepsilon \\
&\leq \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon_1[a]}_{t,x}(\tau)\big) + \varepsilon \\
&\leq \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) + 2\varepsilon.
\end{aligned}$$
Since $a \in \mathcal{A}$ is arbitrary, we deduce that
$$\sup_{a \in \mathcal{A}} J(t, x; a, \beta^\varepsilon[a]) \leq \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) + 2\varepsilon.$$
Thus,
$$v(t, x) \leq \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) + 2\varepsilon.$$

Step 2: $v(t, x) \geq \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big)$.

Let $\beta^\varepsilon$ be such that
$$v(t, x) \geq \sup_{a \in \mathcal{A}} J(t, x; a, \beta^\varepsilon[a]) - \varepsilon,$$
and $a^\varepsilon_1$ be such that
$$\sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon[a]}_{t,x}(\tau)\big) \leq v\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau)\big) + \varepsilon.$$
Finally, define $\tilde\beta^\varepsilon$ by
$$\tilde\beta^\varepsilon[a] := \beta^\varepsilon[a^\varepsilon_1 \oplus_\tau a],$$
and let $a^\varepsilon_2$ be such that
$$\sup_{a} J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a, \tilde\beta^\varepsilon[a]\big) \leq J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a^\varepsilon_2, \tilde\beta^\varepsilon[a^\varepsilon_2]\big) + \varepsilon.$$
Let $a^\varepsilon := a^\varepsilon_1 \oplus_\tau a^\varepsilon_2$. Then, by the non-anticipativity property of $\beta^\varepsilon$ and by the definition of $\tilde\beta^\varepsilon$, we have
$$\beta^\varepsilon[a^\varepsilon] = \beta^\varepsilon[a^\varepsilon_1] \oplus_\tau \tilde\beta^\varepsilon[a^\varepsilon_2].$$
Thus,
$$\begin{aligned}
v(t, x) &\geq \sup_{a \in \mathcal{A}} J(t, x; a, \beta^\varepsilon[a]) - \varepsilon \\
&\geq J(t, x; a^\varepsilon, \beta^\varepsilon[a^\varepsilon]) - \varepsilon \\
&= J\big(\tau, X^{a^\varepsilon,\beta^\varepsilon[a^\varepsilon]}_{t,x}(\tau); a^\varepsilon, \beta^\varepsilon[a^\varepsilon]\big) - \varepsilon \\
&= J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a^\varepsilon_2, \tilde\beta^\varepsilon[a^\varepsilon_2]\big) - \varepsilon \\
&\geq \sup_{a} J\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau); a, \tilde\beta^\varepsilon[a]\big) - 2\varepsilon \\
&\geq v\big(\tau, X^{a^\varepsilon_1,\beta^\varepsilon[a^\varepsilon_1]}_{t,x}(\tau)\big) - 2\varepsilon \\
&\geq \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta^\varepsilon[a]}_{t,x}(\tau)\big) - 3\varepsilon \\
&\geq \inf_{\beta \in \Delta} \sup_{a \in \mathcal{A}} v\big(\tau, X^{a,\beta[a]}_{t,x}(\tau)\big) - 3\varepsilon.
\end{aligned}$$

3.5 Hamilton-Jacobi-Bellman-Isaacs equation

In this Section we use the dynamic programming principle to prove that the value functions areviscosity solutions of the associated Hamilton-Jacobi-Bellman-Isaacs equations.

Consider the Hamiltonians

H−(t, x, p) := infa∈A

supb∈B

Ha,b(t, x, p),

H+(t, x, p) := supb∈B

infa∈A

Ha,b(t, x, p),

where

Ha,b(t, x, p) := −〈p, µ(t, x; a, b)〉.

Notice that Ha,b(t, x, p) is continuous in t, x, p, a, b. Thus, due to the compactness of A and B, H isalso continuous.

Theorem 17. The lower value function v is a viscosity solution of

(−∂tv +H−(., Dv))(t, x) = 0, (t, x) ∈]0, T [×Rd. (3.4)

The upper value function u is a viscosity solution of

(−∂tu+H+(., Du))(t, x) = 0, (t, x) ∈]0, T [×Rd. (3.5)

Proof. We only make the proof for the lower value, since the other is analogous. We start with thesupersolution property, arguing by contradiction. With that purpose in mind consider φ ∈ C1(S),(t, x) ∈ argmin(v − φ) and suppose that

(−∂tφ+H−(., Dφ))(t, x) ≤ −3δ, (3.6)

Page 38: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

26 CHAPTER 3. DETERMINISTIC DIFFERENTIAL GAMES

where δ > 0.By Remark 101, in the Appendix, we can suppose that (t, x) is a strict global minimizer. Thus,

for all (t, x) ∈ S,

(v − φ)(t, x) ≥ (v − φ)(t, x),

that is

φ(t, x)− φ(t, x) ≥ v(t, x)− v(t, x). (3.7)

By (3.6), there is a∗ ∈ A such that, for all b ∈ B,(−∂tφ+Ha∗,b(., Dφ)

)(t, x) ≤ −2δ.

Because B is compact, then Ha,b(t, x) is continuous in (t, x) uniformly with respect to b. Thus thereis R such that, for all b ∈ B,(

−∂tφ+Ha∗,b(., Dφ))

(t, x) ≤ −δ, for all (t, x) ∈ BR(t, x).

Let τ be sufficiently close to t so that, for all a ∈ A, b ∈ B,(s,Xa,b

t,x(s))∈ BR(t, x), for all s ∈ [t, τ ].

Consider β ∈ ∆ arbitrary. Then, because φ is C1, we have

φ(τ,X

a∗,β[a∗]

t,x(τ))− φ(t, x) =

∫ τ

t

(∂tφ−Ha∗,β[a∗](., Dφ)

)(s,X

a∗,β[a∗]

t,x(s))ds

≥ (τ − t)δ.

Thus, by (3.7),

v(τ,X

a∗,β[a∗]

t,x(τ))− v(t, x) ≥ φ

(τ,X

a∗,β[a∗]

t,x(τ))− φ(t, x)

≥ (τ − t)δ.

Hence

supa∈A

v(τ,X

a,β[a]

t,x(τ))

≥ v(t, x) + (τ − t)δ,

and, since β is arbitrary,

infβ∈∆

supa∈A

v(τ,X

a,β[a]

t,x(τ))

≥ v(t, x) + (τ − t)δ,

which contradicts the dynamic programming principle.

To prove that v is a subsolution of (3.4) we also proceed by contradiction. Consider φ ∈ C1(S),(t, x) ∈ argmax(v − φ) and suppose that

(−∂tφ+H−(., Dφ))(t, x) ≥ 4δ. (3.8)

By Remark 101, in the Appendix, we can suppose that (t, x) is a strict global maximizer. Thus, forall (t, x) ∈ S,

(v − φ)(t, x) ≤ (v − φ)(t, x),

that is

φ(t, x)− φ(t, x) ≤ v(t, x)− v(t, x). (3.9)

Page 39: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

3.5. HAMILTON-JACOBI-BELLMAN-ISAACS EQUATION 27

By (3.8), for each a ∈ A there is ba ∈ B such that(−∂tφ+Ha,ba(., Dφ)

)(t, x) ≥ 3δ.

Furthermore, by continuity ofHa,b with respect to a, for each a there is ra such that, for all a ∈ Bra(a),(−∂tφ+H a,ba(., Dφ)

)(t, x) ≥ 2δ.

Since A is compact and Bra(a) : a ∈ A is an open covering of A, there is a finite collection ai, risuch that Bri

(ai) covers A. We then define Λ1 := Br1(a1), Λi+1 := Bri+1(ai+1) \⋃ik=1 Λk. Now

define ψ : A→ B by

ψ(a) :=∑i

1Λi(a)bai ,

and β ∈ ∆ by β[a]s := ψ(as). Then, for all a ∈ A,(−∂tφ+Ha,ψ(a)(., Dφ)

)(t, x) ≥ 2δ.

Because A,B are compact then Ha,b(t, x) is continuous in (t, x) uniformly with respect to (a, b). Thus,there is R such that, for all a ∈ A,(

−∂tφ+Ha,ψ(a)(., Dφ))

(t, x) ≥ δ, for all (t, x) ∈ BR(t, x).

Let τ be sufficiently close to t so that, for all a ∈ A, b ∈ B,(s,Xa,b

t,x(s))∈ BR(t, x), for all s ∈ [t, τ ].

Then, because φ is C1, we have, for an arbitrary a ∈ A,

φ(τ,X

a,β[a]

t,x(τ))− φ(t, x) =

∫ τ

t

(∂tφ−Has,ψ(as)(., Dφ)

)(s,X

a,β[a]

t,x(s))ds

≤ −(τ − t)δ.

Thus, by (3.9),

v(τ,X

a,β[a]

t,x(τ))− v(t, x) ≤ φ

(τ,X

a,β[a]

t,x(τ))− φ(t, x)

≤ −(τ − t)δ.

Taking the supremum in a we get

supa∈A

v(τ,X

a,β[a]

t,x(τ))

≤ v(t, x)− (τ − t)δ.

Thus

infβ∈∆

supa∈A

v(τ,Xa,β[a]

t,x(τ)) ≤ v(t, x)− (τ − t)δ,

which contradicts the dynamic programming principle.

It remains to establish uniqueness of solution for the HJBI equation.

Theorem 18. The lower value v is the minimal supersolution and maximal subsolution of (3.4) inthe class of bounded functions. In particular, it is the unique viscosity solution of (3.4) in that class.

An analogous result holds for the upper value, u.

Page 40: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

28 CHAPTER 3. DETERMINISTIC DIFFERENTIAL GAMES

For the proof of the previous Theorem we refer the reader to [3, p. 442]. We will establish lateruniqueness of solution for the analogous second order HJBI equation appearing in the stochasticscenario, see Section 4.6.1.

We can finally prove (3.3).

Corollary 19. Let v, u be the lower and upper values of the game, respectively. Then

v ≤ u.

Proof. Notice that H− ≥ H+. This implies that v is a subsolution of (3.5). Since, by the previousTheorem, u is the maximal subsolution of (3.5) we conclude that

v ≤ u.

Using the same arguments we can also establish a criterion for the game to have a value:

Corollary 20 (Isaac’s condition). If H− = H+ then the game has a value, that is,

v = u.

Proof. The proof follows directly from the fact that in this case equations (3.4) and (3.5) are the same,hence, by uniqueness of solution, we have v = u.

Page 41: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

Chapter 4

Stochastic differential games

29

Page 42: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

30 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

4.1 Introduction

Consider a scenario where two players compete in such a way that at some terminal time T thesecond player pays to the first one a certain payoff. This payoff is determined by the terminal value ofa state variable, Xa,b(T ), depending on parameters a, b which are controlled, respectively, by players1 and 2. More precisely, the payoff is f

(Xa,bt,x (T )

), where f is a measurable function.

We consider the case where the state variable is random with dynamics given by the followingstochastic differential equation:

dXa,bt,x (s) = µ

(s,Xa,b

t,x (s); as, bs)ds+ σ

(s,Xa,b

t,x (s); as, bs)dWs

Xa,bt,x (t) = x,

where t is the time when the players start competing and x is the initial value of the state variable.By adjusting the controls, both players can drive the state variable, hence changing the terminal

payoff. Obviously, if player 1 is rational then he is interested in maximizing the payoff. Similarly,player 2 is interested in minimizing the payoff.

At the initial time t, when both players choose their controls, they can not predict the payoff,hence they are not able to optimize it. Instead they can optimize the expected value of the payoff:

J(t, x; a, b) := E[f(Xa,bt,x (T )

) ∣∣∣Ft] .This quantity is also a random variable, but is Ft−measurable, hence its value is accessible to theplayers at the time of the start of the game. We shall call J the reward function. Both players willtry to choose a, b in order to optimize J according to their antagonic objectives.

If the players are rational then they are interested in choosing controls a, b in such a way thatmaximizes or minimizes J(t, x; a, b). Naively we could consider the problems

Vs(t, x) := supa

infbJ(t, x; a, b),

Us(t, x) := infb

supaJ(t, x; a, b).

These two functions are of interest but they will not be the scope of this work. They are called,respectively, the lower static value and upper static value of the game. Clearly on the lower staticvalue, player 2 is given an advantage over player 1 because he is able to look at the other player’scontrol before choosing his own. Analogously, on the upper static value it is player 1 who is inadvantage.

These values are called static, because on each of them, one of the players is given advantage bybeing allowed to know the control of the other player for the entire duration of the game, that is,in [t, T ]. In this case we see that some information about the future is being revealed to one of theplayers.

In a dynamic version of the same game we will still give advantage to one of the players, but insuch a way that no information about the future is revealed. In this case information to one playerabout the other player’s control is revealed, in a dynamic way, as time goes by. This justifies thenames static and dynamic given to these two different versions of the same game. More precisely, wewill consider

V (t, x) := infβ

supaJ(t, x; a, β[a]),

U(t, x) := supα

infbJ(t, x;α[b], b),

where α := α[b] and β := β[b] are, respectively, strategies for players 1 and 2. A strategy for player 1,α[b], can be seen as the reply of player 1 when player 2 chooses to use the control b. If no restrictionsare made on the admissible strategies then we get the static version of the game. Since we areinterested in not revealing information about the future, we require the admissible strategies to benon-anticipating, in a sense to be specified in the next paragraph.

Page 43: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.1. INTRODUCTION 31

We say that a strategy for player 1, α, is non-anticipating if, given any two controls of player 2,b and b′, which are equal up to some time s, br = b′r for all r ∈ [t, s], the replies of player 1 usingstrategy α, α[b] and α[b′], are also equal up to time s, α[b]r = α[b′]r for all r ∈ [t, s].

The scope of this chapter is to study and characterize the lower value V . The approach for theupper value is analogous and will be omitted. The main results are:

• V (t, x) is a constant random variable, that is, V (t, x) = E[V (t, x)];

• V is the viscosity solution of the Hamilton-Jacobi-Bellman-Isaacs (HJBI) equation associatedwith this game.

To derive the HJBI equation we will use a dynamic programming approach different from the oneexistent in the literature. Instead of the dynamic programming principle we will follow [1] and usea weak version of this principle. With this approach we avoid measurability problems and the needof continuity of the value function which will allow us to study V in a more general setting than theusual one.

Thus, we emphasize the two main contributions of this thesis, which are discussed in this chapter:

• Extending the weak dynamic programming principle to the context of stochastic differentialgames;

• Considering stochastic differential games in a more general setting, the setting where strategiesare allowed to take values in an unbounded set and where f is locally Lipschitz instead of globallyLipschitz.

Two main references for zero-sum stochastic differential games are [2], [10]. In their pioneeringwork, [2], Fleming and Souganidis studied rigorously for the first time zero-sum stochastic differentialgames. There they considered only controls which are independent of the past, thus getting determin-istic value functions. In [10], Buckdahn and Li take a more modern approach to the same problem,using backward stochastic differential equations (BSDEs). Our approach will follow more closely thislast reference.

The structure of this Chapter is the following:

• Section 4.2 concerns some notation and preliminary definitions and estimates;

• In Section 4.3 we consider zero-sum stochastic differential games in the Markovian setting. Wediscuss general properties of the dynamics, the space of controls, the non-anticipative strate-gies, the reward function, and the upper and lower values. A detailed discussion of strategies’properties, essential in the subsequent sections, ends this Section;

• In the following Section we prove that the value function is a constant random variable and canthus be seen as a deterministic function, V (t, x) = E[V (t, x)]. It is also proved in this sectionthat the value function has polynomial growth. This result is essential in the discussion ofuniqueness of the HJBI equation;

• Section 4.5 is the main section of the Chapter. There we prove a version of the weak dynamicprogramming principle for stochastic differential games;

• In Section 4.6 we use the weak dynamic programming principle to prove that the value functionis a viscosity solution of the HJBI equation. It is also proved in this section that when A,B areboth compact sets then the HJBI equation admits a unique solution on a class of functions thatcontains all the functions of polynomial growth;

• In Section 4.7 we discuss an application of zero-sum stochastic differential games to Merton’sproblem. There we use a worst-case approach to deal with stochastic volatilities;

• Finally, in Section 4.8, we gather the results and assumptions from this Chapter to extractconclusions and some directions where further research can be carried.

For convenience of the reader we review in the end of the Chapter the notation used throughoutthis work.

Page 44: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

32 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

4.2 Preliminaries

In this Chapter we consider zero-sum stochastic differential games on a time interval [0, T ].Therefore we consider the classical Wiener space (Ω,F ,P), on this finite horizon T . More precisely,

Ω is the space of continuous functions ω : [0, T ] → RN such that ω(0) = 0, and F is the Borel σ−algebrain Ω completed with respect to the Wiener measure P. We denote by W the standard N−dimensionalBrownian motion corresponding to the coordinate process, Ws(ω) = ωs.

We consider in (Ω,F ,P) the natural filtration induced by W augmented with the P−null sets,F = Fs, 0 ≤ s ≤ T, where

Fs = σWr : r ≤ s ∨ NP.

We denote by T the collection of all stopping times in F. Given τ1, τ2 such that τ1 ≤ τ2, T[τ1,τ2]

denotes the collection of all τ ∈ T such that τ1 ≤ τ ≤ τ2. When τ1 = 0 we simply write Tτ2 .The essential extrema of a random variable is defined as

Definition 21. Let X be a random variable. Then M ∈ R ∪ +∞ is said to be the essentialsupremum of X, M = esssupX, if:

• X ≤M P− a.s.;

• If there exists M ∈ R such that X ≤ M, P− a.s., then M ≤ M .

The essential infimum of X is defined as

essinfX := −esssup −X.

We consider processes in the following space

Hp(t, T ;A) :=

(ψs)s∈[t,T ], F− progressively measurable with values in A s.t.:

E

[∫ T

t

|ψs|pds

]<∞

.

In the above space we consider a norm ‖.‖Hp , for p ≥ 1, defined by

‖ψ‖Hp :=

(E

[∫ T

t

|ψs|pds

]) 1p

.

Notice that the above space is a Lp space. Indeed,

Hp(t, T ;A) = Lp([t, T ]× Ω,B([t, T ])⊗F ,m⊗ P),

where B([t, T ]) denotes the Borel σ−algebra of [t, T ] and m is the Lebesgue measure in [t, T ].We also consider the subspace of Hp(t, T ;A) consisting of the processes which have essentially

bounded integral:

Hp,∞(t, T ;A) :=

ψ ∈ Hp(t, T ;A) : esssup

∫ T

t

|ψs|pds <∞

.

In this space we consider the following two uniform norms:

‖ψ‖Hp,∞t

:= esssup

(E

[∫ T

t

|ψs|pds∣∣∣Ft])

1p

,

‖ψ‖Hp,∞ := esssup

(∫ T

t

|ψs|pds

) 1p

.

Page 45: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.2. PRELIMINARIES 33

Notice that ‖ψ‖Hp,∞t

≤ ‖ψ‖Hp,∞ .Finally, we consider the subspace of Hp,∞(t, T ;A) consisting of the processes which are essentially

bounded:

H∞(t, T ;A) :=

ψ ∈ Hp,∞(t, T ;A) : esssup sup

s∈[t,T ]

|ψs| <∞

.

In this space we also consider two uniform norms:

‖ψ‖H∞t,p:= esssup

(E

[sups∈[t,T ]

|ψs|p∣∣∣Ft])

1p

,

‖ψ‖H∞ := esssup sups∈[t,T ]

|ψs|.

In the following Lemma we collect some inequalities about the above norms:

Lemma 22. Let ψ ∈ H∞(t, T ;A) and let q ≤ p. Then the following hold:

(i) ‖ψ‖H∞t,q≤ ‖ψ‖H∞t,p

≤ ‖ψ‖H∞ ;

(ii) ‖ψ‖Hp,∞t

≤ (T − t)1p ‖ψ‖H∞t,p

;

(iii) ‖ψ‖Hp,∞ ≤ (T − t)1p ‖ψ‖H∞ .

Remark 23. The space H∞(t, T ;A), with the norm of Hp, is a dense subspace of Hp(t, T ;A).

Hprcll(t, T ;A) denotes the processes in Hp(t, T ;A) which are right-continuous and have finite left

limits, also called cadlag .Unless otherwise stated, equalities and inequalities between random variables are to be understood

in the P− a.s. sense.

Page 46: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

34 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

4.3 The Markovian scenario

4.3.1 State dynamics

We consider zero-sum stochastic differential games between two players. There is a controlled stateprocess that determines the reward of each of them. This process is a mapping taking values in Rd,

(t, x; a, b) ∈ S×At × Bt 7→ Xa,bt,x ∈ H0

rcll(t, T ; Rd),

where S := [0, T ]×Rd is the state space and At ⊂ H0(t, T ; Rda), Bt ⊂ H0(t, T ; Rdb) are the collectionsof admissible controls at time t for players 1 and 2 respectively. Here da, db represent the dimensionsof the space of controls for players 1 and 2, respectively.

In this work we consider the case where the controlled state process is a diffusion:dXa,b

t,x (s) = µ(s,Xa,b

t,x (s); as, bs)ds+ σ

(s,Xa,b

t,x (s); as, bs)dWs

Xa,bt,x (t) = x,

(4.1)

In the above equation µ, σ are functions such that the above SDE has an unique solution:

µ : S×A×B → Rd,σ : S×A×B → Rd×N ,

where A ⊂ Rda , B ⊂ Rdb , and we think of Rd×N as the space of d × N matrices with the Frobeniusnorm, i.e., |σ| =

√Tr(σσT ).

We start by making the typical global Lipschitz and linear growth conditions that assure existenceand uniqueness of a strong solution for (4.1). That is, there is K such that

|µ(t, x; a, b)− µ(t, y; a, b)|+ |σ(t, x; a, b)− σ(t, y; a, b)| ≤ K|x− y||µ(t, x; a, b)|+ |σ(t, x; a, b)| ≤ K(1 + |x|+ |a|+ |b|). (4.2)

In addition, as we proceed, further assumptions on the controls will be considered in order to establishour main results.

Example 24. We now give a few examples of state dynamics:

• In a pursuit-evasion game there is a pursuer P and an evader E. In this case we can think ofthe state variable as being the relative position between P and E. If the pursuer and evader cancontrol their velocities a, b, respectively, then we consider the dynamics of the state variable tobe

dXa,bt,x (s) = (as − bs)ds+ σdWs,

where σ is a volatility factor that controls the Brownian noise on the pursuit.

Clearly in this case conditions (4.2) are satisfied.

• We can also think of a game where there is a particle with a velocity controlled by players 1 and2: the first player controls the direction while the second controls the intensity. In this case thestate variable is the position of the particle. In dimension 2 with a 2−dimensional Browniannoise in the dynamics we get

dXa,bt,x (s) = bs cos(as)ds+ σ1,1dW

1s + σ1,2dW

2s

dY a,bt,x (s) = bs sin(as)ds+ σ2,1dW1s + σ2,2dW

2s ,

where X,Y represent, respectively, the x and y coordinates of the particle’s position.

Again, the dynamics satisfies (4.2).

Page 47: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 35

• We can do a worst-case approach to Merton’s optimal portfolio problem by considering a differ-ential game where a fictitious player controls the volatility. In this case we can take the value ofthe portfolio to be the state variable, which has dynamics

dXa,bt,x (s) = (asµ+ (1− as)r)X

a,bt,x (s)ds+ asbsX

a,bt,x (s)dWs,

where µ, r are, respectively, the growth rate of the risky asset and the interest rate, as is the ratioof the portfolio’s value invested in the risky asset at time s, and bs is the volatility of the riskyasset at time s.

In this example, if we consider compact-valued controls, that is, if a, b take values in a compactset, then conditions (4.2) are satisfied.

If conditions (4.2) are satisfied we know, by Theorem 147, that for each p ≥ 2 and each pair ofcontrols a ∈ At∩Hp(t, T ;A), b ∈ Bt∩Hp(t, T ;B), there is a unique strong solution Xa,b

t,x ∈ Hp(t, T ; Rd)to (4.1), which verifies additionally

E

[sups∈[t,T ]

∣∣∣Xa,bt,x (s)

∣∣∣p] ≤ C(1 + |x|p + ‖a‖pHp + ‖b‖pHp), (4.3)

where C is a constant depending only on K,T, p. We are interested in considering conditional expec-tations of the state process. Hence we consider a stronger version of (4.3):

E

[sups∈[t,T ]

∣∣∣Xa,bt,x (s)

∣∣∣p ∣∣∣Ft] ≤ C

(1 + |x|p + E

[∫ T

t

(|as|p + |bs|p)ds∣∣∣Ft]) .

If we know additionally that a, b ∈ Hp,∞ then we deduce from the previous estimate that

E

[sups∈[t,T ]

∣∣∣Xa,bt,x (s)

∣∣∣p ∣∣∣Ft] ≤ C(1 + |x|p + ‖a‖pHp,∞

t+ ‖b‖pHp,∞

t

). (4.4)

This estimate has the advantage of being uniform with respect to ω.

Remark 25. If we consider controls a, b in bounded sets A,B of Hp then estimate (4.3) is uniformwith respect to a, b. More precisely, if A ⊂ At, B ⊂ Bt are bounded sets, in the norm of Hp, thenthere is C depending only on K, T , sup ‖a‖Hp : a ∈ A, and sup ‖b‖Hp : b ∈ B such that, for alla ∈ A, b ∈ B,

E

[sups∈[t,T ]

∣∣∣Xa,bt,x (s)

∣∣∣p] ≤ C(1 + |x|p). (4.5)

Similarly, if we consider controls a, b in sets A,B bounded in the norm Hp,∞t then estimate (4.4)

is uniform with respect to a, b, in the sense that, for all a ∈ A, b ∈ B,

E

[sups∈[t,T ]

∣∣∣Xa,bt,x (s)

∣∣∣p ∣∣∣Ft] ≤ C (1 + |x|p) . (4.6)

4.3.2 Admissible controls

In order to be able to use estimate (4.4) we should consider

At ⊂ Hp,∞(t, T ;A), Bt ⊂ Hp,∞(t, T ;B).

Since we also want to use Lemma 1481 we consider the following spaces of admissible controls forplayers 1 and 2:

At := H∞(t, T ;A), Bt := H∞(t, T ;B).1In fact, to use Lemma 148, we could have just considered

At := Hp,∞(t, T ; A), Bt := Hp,∞(t, T ; B),

for some p > 2.

Page 48: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

36 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

In this case, estimates (4.3) and (4.4) are valid for all p ≥ 2.

Remark 26. If A,B are compact then

At = H0(t, T ;A), Bt = H0(t, T ;B).

This is the case considered in [10].

In the following we establish basic properties concerning the controls for player 1. Obviously thesame results apply to the controls of player 2.

Note that by considering restrictions of controls we have the inclusion At ⊂ As, whenever t ≤ s.More precisely

Lemma 27. Let t ≤ s. If a ∈ At then a|[s,T ] ∈ As.

Thus, given a ∈ At, when there is no ambiguity we will write a ∈ As instead of a|[s,T ] ∈ As.The following definition is useful to compare stochastic processes:

Definition 28. Given stochastic processes X,Y , stopping times τ1, τ2 and a measurable set Λ ∈ F ,we write

X ≡Λ Y on [τ1, τ2] (4.7)

to mean

P(1ΛXs = 1ΛYs; s a.e. on [τ1, τ2]) = 1.

If Λ = Ω we write (4.7) as

X ≡ Y on [τ1, τ2].

Remark 29. Notice that if X,Y are stochastic processes with enough regularity then we may lift the‘a.e.’ in the definition of ≡. More precisely, if X,Y are right continuous, then

X ≡Λ Y on [τ1, τ2] ⇒ P(1ΛXs = 1ΛYs;∀s ∈ [τ1, τ2)) = 1,

and if X,Y are left continuous, then

X ≡Λ Y on [τ1, τ2] ⇒ P(1ΛXs = 1ΛYs;∀s ∈ (τ1, τ2]) = 1.

We would like to identify controls that have an equivalent effect on the dynamics. Therefore wemake the following definition:

Definition 30. Let a1, a2 ∈ At, τ1 ∈ T[t,T ], τ2 ∈ T[τ1,T ], Λ ∈ F .

• We say that a1 equals a2 on [τ1, τ2] for all events in Λ if a1 ≡Λ a2 on [τ1, τ2].

• If a1 ≡ a2 on [t, T ] we simply write a1 ≡ a2 and say that a1, a2 are equivalent controls.

There are two very natural and useful ways of constructing controls out of existing ones. One isto concatenate two different controls. For this operation we use the following notation:

Definition 31. For t ≤ s ≤ T let a1 ∈ At, a2 ∈ As and θ ∈ T[s,T ]. Then we define a1 ⊕θ a2 ∈ At by

(a1 ⊕θ a2)s(ω) := (a1)s(ω)1[t,θ(ω)](s) + (a2)s(ω)1(θ(ω),T ](s).

The other one is to patch two controls in a suitable way by using one of the controls in a subset ofΩ and the other in the complement. The next Lemma gives conditions under which these operationsyield admissible controls.

Lemma 32. The following hold:

Page 49: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 37

• Let s ≥ t and θ ∈ T[s,T ]. If a1 ∈ At and a2 ∈ As then

a1 ⊕θ a2 ∈ At.

• Let θ ∈ T[t,T ] and ai ∈ At be such that ai ≡ aj on [t, θ] and supi ‖ai‖H∞ < ∞. SupposeΛii≥1 ⊂ Fθ forms a partition of Ω. Then∑

i≥1

1Λiai ∈ At.

Example 33. One example of admissible controls are the so called Markov control policies. Given ameasurable function ψ : S → A, a Markov control policy is the control defined by

as := ψ(s,Xs).

More precisely, if the control policy above is used then Xa,bt,x is the solution to

dX(s) = µ(s,X(s);ψ(s,X(s)), bs)ds+ σ(s,X(s);ψ(s,X(s)), bs)dWs

X(t) = x.

Obviously, further conditions on ψ may be required to obtain indeed an admissible control (for exampleboundedness of ψ). In the case of A compact then no other conditions are required.

4.3.3 Terminal reward

We consider zero-sum games, meaning that the rewards of both players add to zero. The terminalreward of player 1 is the random variable

J(t, x; a, b) := E[f(Xa,bt,x (T )

) ∣∣∣Ft] ,where f is a payoff function such that the above definition makes sense, i.e., f is measurable and forall a ∈ At, b ∈ Bt:

E[f(Xa,bt,x (T )

)]< +∞.

For example we can take f to be a bounded measurable function. More generally we will assume thatf has polynomial growth, i.e., there are p, C such that

|f(x)| ≤ C(1 + |x|p). (4.8)

Then by (4.3) we conclude that J is well defined.Due to the zero-sum condition, the reward of player 2 is −J(t, x; a, b).

Remark 34. For fixed t, x, a, b, (J(s, x; a, b))s∈[t,T ] is a martingale. Hence, by the martingale repre-sentation Theorem (Theorem 140), there exists ψ(., x; a, b) such that

J(s, x; a, b) = J(t, x; a, b) +∫ s

t

ψ(r, x; a, b)dWr.

Furthermore, since by (4.3) and (4.8), J(T, x; a, b) = f(Xa,bt,x (T )

)is square-integrable, then

ψ(., x; a, b) ∈ H2(t, T ; RN )

is unique. Thus,

(J(s, x; a, b), ψ(s, x; a, b))s∈[t,T ]

is the unique solution of the BSDEdJ(s, x; a, b) = ψ(s, x; a, b)dWs

J(T, x; a, b) = f(Xa,bt,x (T )

).

In [10] the reward function is seen as a solution of a similar BSDE.

Page 50: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

38 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

By the definition of J it follows that J(t, x; a, b) as function of a, b verifies a property of independenceof irrelevant alternatives2 in the following sense:

Lemma 35. Let Λ ∈ Ft and a1, a2 ∈ At, b1, b2 ∈ Bt such that

(a1, b1) ≡Λ (a2, b2) on [t, T ].

Then

J(t, x; a1, b1) =Λ J(t, x; a2, b2).

Proof. We have

(a1, b1) ≡Λ (a2, b2) on [t, T ] ⇒ Xa1,b1t,x ≡Λ X

a2,b2t,x on [t, T ],

and, by continuity of Xa1,b1t,x , Xa2,b2

t,x , we conclude that

Xa1,b1t,x (T ) =Λ X

a2,b2t,x (T ).

Thus, since Λ ∈ Ft,

J(t, x; a1, b1) =Λ J(t, x; a2, b2).

Remark 36. We can allow a running cost, l, by considering an extra state variable Y a,b that satisfiesdY a,b(s) = l

(s,Xa,b

t,x (s); as, bs)ds

Y a,b(t) = 0,

and the payoff function g(x, y) = f(x) + y.Then

g(Xa,bt,x (T ), Y a,b(T )

)= f

(Xa,bt,x (T )

)+∫ T

t

l(s,Xa,b

t,x (s); as, bs)ds

is a terminal reward that has incorporated the running cost. Therefore the fact that we are dealingwith terminal payoffs is not as restrictive as it may seem.

4.3.4 Strategies

Given a control a ∈ At for player 1, the second player needs to choose a control b ∈ Bt. In general,however, the second player at each time t only knows as for s ≤ t. The choice of controls of thesecond player must take into account his lack of information about the future. This is guaranteed byimposing a condition of non-anticipativity which is made precise by the next Definition.

Definition 37. A strategy for player 2 is a function β : At → Bt that maps equivalent controls toequivalent controls.

A strategy, β, is said to verify the non-anticipativity property if for every a1, a2 ∈ At, τ ∈ T[t,T ]

we have:

a1 ≡ a2 on [t, τ ] ⇒ β[a1] ≡ β[a2] on [t, τ ].

The space of non-anticipative strategies for player 2 is denoted by ∆(t).The definition of strategy for player 1 is analogous. The space of non-anticipative strategies for

player 1 is denoted by Γ(t).

2Independence of irrelevant alternatives (IIA) is a term for an axiom of decision theory.

Page 51: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 39

This is the stochastic analogue to strategies in deterministic differential games. We allow player2 to use all information he can access up to the present time without foreseeing the future. Thisinformation includes the strategy of player 1 and the state of the world.

Of course the definition would be useless if in the end ∆(t) = ∅. Thus we give now some examplesof strategies.

Example 38. 1. If β is the constant strategy, β[a] = b ∈ Bt, then β ∈ ∆(t).

2. If ψ : A→ B is measurable and bounded then the strategy β defined by

β[a]s(ω) := ψ(as(ω))

is in ∆(t).

In this example we can replace the condition of boundedness by linear growth or continuity.

3. Generalizing the previous example and in analogy with the Markov control policies we can con-sider β ∈ ∆(t) defined by

β[a]s := ψ(s,X

a,β[a]t,x (s); as

),

where ψ : S×A→ B is a measurable bounded function.

More precisely, in this case Xa,β[a]t,x is the solution to

dXs = µ(s,Xs; as, ψ (s,Xs; as))ds+ σ(s,Xs; as, ψ (s,Xs; as))dWs

Xt = x.

These are the Markov control policies for the player 2 when he is allowed to use strategies.

For properties of strategies we refer the reader to the end of the Section.

4.3.5 Lower and upper values

Since the reward functions of both players are symmetric, then it should be clear that if players1 and 2 are rational then they should be interested in maximizing and minimizing, respectively, thereward function of player 1. Thus we will take the supremum and infimum of the reward functionJ . Actually, because J is a random variable, we will need to use instead the essential versions of thesupremum and infimum which we recall here.

Definition 39. Given a family of indexed real-valued random variables, Xν , ν ∈ U , a random variableX is said to be the essential supremum of Xν with respect to ν ∈ U , X = esssupν∈UXν , if

1. X ≥ Xν , P− a.s., for all ν ∈ U ;

2. If there is another random variable X such that X ≥ Xν , P − a.s. for all ν ∈ U , then X ≥X, P− a.s..

The essential infimum of Xν with respect to ν ∈ U , X = essinfν∈UXν is defined as

essinfν∈U

Xν = −esssupν∈U

(−Xν).

Since a probability space is σ−finite we have the following important property of essential extrema,[6, p. 71]:

Theorem 40. Let Xνν∈U be a collection of measurable random variables. Then the essentialsupremum, esssupν∈UXν , exists, is unique up to null sets and there is a suitable countable sequenceXνn such that

esssupν

Xν = supnXνn .

Page 52: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

40 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

We can now define the upper and lower values of a stochastic differential game. In the lower valuewe allow player 2 to use strategies. This gives him an advantage over player 1. In the upper value wehave the opposite situation. More precisely, we have the following:

Definition 41. The lower value of a stochastic differential game is defined as

V (t, x) = essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a]).

Similarly, the upper value of a stochastic differential game is

U(t, x) = esssupα∈Γ(t)

essinfb∈Bt

J(t, x;α[b], b).

The name upper and lower values is justified by the following inequality, which is intuitive butneeds to be proved,

V (t, x) ≤ U(t, x).

This comparison is discussed in Corollary 96. If the lower and upper values are equal then we saythat the game has a value.

Remark 42. For these values to exist we need further assumptions. For example, any of the followingguarantees this:

• f is bounded;

• A,B are compact sets. Indeed, by (4.6) and by the polynomial growth of f we conclude that ifA,B are compact then J(t, x; .) is uniformly bounded.

• The following holds:

∃a∗∈At lim‖b‖H∞→∞

J(t, x; a∗, b) = +∞,

∃b∗∈Bt lim‖a‖H∞→∞

J(t, x; a, b∗) = −∞,

where the limits are taken uniformly in ω.

Indeed, if this is the case then there is M sufficiently large such that

V (t, x) ≤ esssupa∈At

J(t, x; a, b∗) = esssupa∈At;‖a‖H∞≤M

J(t, x; a, b∗) < +∞

V (t, x) ≥ essinfβ∈∆(t)

J(t, x; a∗, β[a∗]) ≥ essinfβ∈∆(t);‖β[a∗]‖H∞≤M

J(t, x; a∗, β[a∗]) > −∞

• The following holds:

∃a∈At ∀b∈Bt J(.; a, b) is bounded from below,∃b∈Bt ∀a∈At J(.; a, b) is bounded from above.

Notice that, a priori, for each (t, x), V (t, x) and U(t, x) are random variables. Still, because thedynamics is Markovian we see that if we fix the history up to time t then there is no point for bothplayers to use controls which are not independent of Ft and for these controls we have that J(t, x; a, b)is deterministic. Thus we should expect V,U to be deterministic in the sense that

V (t, x) = E[V (t, x)],U(t, x) = E[U(t, x)].

Our objective is to prove a weak version of the dynamic programming principle for V (t, x) whichwill allow us to prove that it is a viscosity solution of the HJBI equation. Using the same arguments,analogous results can be obtained for the upper value.

Page 53: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 41

4.3.6 Properties of strategies

In this Section we study basic properties of strategies. These will be required when proving thedynamic programming principle. We restrict ourselves to strategies of player 2 but obviously analogousresults hold for strategies of player 1. We start by considering the non-anticipativity property.

Proposition 43. Consider a strategy β. Then β ∈ ∆(t) iff for every a1, a2 ∈ At, τ ∈ T[t,T ]

β[a1 ⊕τ a2] ≡ β[a1]⊕τ β[a1 ⊕τ a2]. (4.9)

Proof. Consider a1, a2 ∈ At, τ ∈ T[t,T ]. On one hand, if β ∈ ∆(t) then

a1 ≡ a1 ⊕τ a2 on [t, τ ] ⇒ β[a1] ≡ β[a1 ⊕τ a2] on [t, τ ],

hence we conclude that β satisfies (4.9).On the other hand, suppose a1 ≡ a2 on [t, τ ]. Then a1 ⊕τ a2 ≡ a2 so if β is a strategy we have

that

β[a1 ⊕τ a2] ≡ β[a2]. (4.10)

Thus, if β additionally satisfies (4.9) we conclude that

β[a1]⊕τ β[a1 ⊕τ a2] ≡ β[a1 ⊕τ a2] ≡ β[a2],

by (4.10). This implies that β[a1] ≡ β[a2] on [t, τ ], i.e., β is a non-anticipative strategy.

Due to the arbitrariness of the stopping time in the definition of strategy we can deduce that astrategy verifies the, apparently stronger, property stated in the next Proposition.

Proposition 44. Let β ∈ ∆(t), a1, a2 ∈ At. Then for every sequence of stopping times τ0 := t ≤τ1 ≤ ... ≤ τn, and every sequence of events Λ1 ⊇ Λ2... ⊇ Λn, Λi ∈ Fτi−1 , we have:

if ∀i (a1 ≡Λi a2 on [τi−1, τi]) then ∀i (β[a1] ≡Λi β[a2] on [τi−1, τi]) .

Proof. Consider a sequence of stopping times τ0 := t ≤ τ1 ≤ ... ≤ τn and a sequence of eventsΛ1 ⊇ Λ2... ⊇ Λn, Λi ∈ Fτi−1 , and define

τ :=n∑i=0

1Λi1Λci+1τi,

where Λ0 := Ω,Λn+1 := ∅. Then τ is a stopping time because

τ ≤ s =n⋃i=0

τi ≤ s ∩ Λi ∩ Λci+1,

and Λi ∩ Λci+1 ∈ Fτi ⇒ τi ≤ s ∩ Λi ∩ Λci+1 ∈ Fs.Now notice that

∀i (a1 ≡Λi a2 on [τi−1, τi])

is equivalent to

∀i (a1 ≡Λi a2 on [t, τi]) .

The result then follows easily by the non-anticipativity of β. Indeed,

∀i (a1 ≡Λi a2 on [t, τi]) ⇒ a1 ≡ a2 on [t, τ ]⇒ β[a1] ≡ β[a2] on [t, τ ]⇒ ∀i (β[a1] ≡Λi β[a2] on [τi−1, τi]) ,

Page 54: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

42 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

where the first and third implications follow, respectively, from

τ1Λci+1

≤ τi1Λci+1,

τ1Λi ≥ τi1Λi .

The previous property tells us two important facts:

• What player 2 does up to time s depends only on what player 1 does up to time s;

• If at time s the control of player 1 is fragmented across Ω in sets of events in Fs then the replyof player 2 should be fragmented across the same sets as well.

In particular, a non-anticipative strategy verifies the property of independence of irrelevant alter-natives:

Corollary 45. Let β ∈ ∆(t). Then β is independent of irrelevant alternatives, that is, given a familyof controls ai ∈ At and a partition Λi ⊂ Ft of Ω such that

∑i 1Λiai ∈ At we have:

β

[∑i

1Λiai

]≡∑i

1Λiβ [ai] .

As a consequence of the independence of irrelevant alternatives we have the following interestingand useful property:

Proposition 46. Consider a set A ⊂ At bounded in the norm ‖.‖H∞ . Then, for each p ≥ 1, β[A] isbounded in the norm ‖.‖H∞t,p

.Similarly, if A is bounded in the norm ‖.‖Hp,∞ then β[A] is bounded in the norm ‖.‖Hp,∞

t.

Proof. We suppose, by contradiction, that β[A] is unbounded in the norm ‖.‖H∞t,p. Then there is a

sequence (ai) ⊂ A such that ‖β[ai]‖H∞t,pis unbounded. We suppose, without loss of generality, that

‖β[ai]‖H∞t,p≥ i.

We define

Λi :=

E

[sups∈[t,T ]

|β[ai]s|p∣∣∣Ft] ≥ i

,

λi := P(Λi).

Since ‖β[ai]‖H∞t,p≥ i then λi > 0. We now define λi by

λ1 := λ1,

λi+1 :=λi ∧ λi

3.

By Lemma 122, for each i, there is a Ft−measurable set Λi ⊂ Λi such that

P(Λi) = λi.

We now remark that

P

(⋃k>i

Λk

)≤

∑k>i

λk

≤ λi∑k>i

13k−i

=λi2.

Page 55: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 43

Let Γi := Λi \⋃k>i Λk. Then Γi is a family of disjoint sets and

P (Γi) ≥λi2> 0.

By Lemma 32 and because A is bounded, we have

a := a11(Si≥1 Γi)c +

∑i≥1

ai1Γi ∈ At.

Furthermore, by the independence of irrelevant alternatives, we have

β[a] = β[a1]1(Si≥1 Γi)c +

∑i≥1

β[ai]1Γi .

Thus

‖β[a]‖H∞t,p≥ esssup

∑i≥1

E

[sups∈[t,T ]

|β[ai]s|p∣∣∣Ft]1Γi

≥ esssup

∑i≥1

i1Γi

= +∞.

Since β[a] ∈ Bt and ‖β[a]‖H∞ ≥ ‖β[a]‖H∞t,pwe have a contradiction.

The case when A is bounded in the norm ‖.‖Hp,∞ is analogous.

The previous property tells us that a non-anticipating strategy is in some sense locally boundedwith respect to the control of the other player.

Remark 47. Using the previous Proposition we conclude that if A is compact, then β[At] is boundedwith the norm ‖.‖H∞t,p

. Thus, in this case, we can define

‖β‖p := supa∈At

‖β[a]‖H∞t,p. (4.11)

It is immediate that, if ∆(t) is a vector space then (4.11) defines a norm and that, in general,

dp(β1, β2) := supa∈At

‖β1[a]− β2[a]‖H∞t,p

defines a metric in ∆(t).

Notice that we can see a strategy as a controlled process. In this context, the property of non-anticipativity should say that the state of the controlled process in the present does not depend onthe future of the control, which is a natural condition to require. Thus we make the next definition,where we denote by U a suitable control space. For our purposes, U = At,Bt or At × Bt.

Definition 48. A controlled process, U 3 ν 7→ Xν , is said to verify the non-anticipativity propertyif for every ν1, ν2 ∈ U , τ ∈ T[t,T ] we have

ν1 ≡ ν2 on [t, τ ] ⇒ Xν1 ≡ Xν2 on [t, τ ].

Example 49. The solution to (4.1), Xa,bt,x , is a non-anticipative controlled process.

This is a consequence of the fact that Xa,bt,x (s) = ψ((Wr; ar, br)t≤r≤s) for some function ψ.

The notion of non-anticipativity is preserved by algebraic operations, that is:

Page 56: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

44 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Proposition 50. Let Xa, Y a be non-anticipative controlled processes and λ ∈ R. Then the followingare non-anticipative controlled processes as well: λXa, Xa + Y a, XaY a, Xa/Y a.

In the remainder of this section we will study natural ways of constructing strategies. We start bythe next two Propositions, which are the analogue of Lemma 32 in the context of strategies.

Proposition 51. Let t ≤ s ≤ T , β1 ∈ ∆(t), β2 ∈ ∆(s) and θ ∈ T[s,T ]. Then

β1 ⊕θ β2 := β11[t,θ] + β21(θ,T ] ∈ ∆(t).

Proposition 52. Let θ ∈ T[t,T ] and βi ∈ ∆(t) be such that ∀a∈Atβi[a] ≡ βj [a] on [t, θ]. If Λini=1 ⊂Fθ forms a partition of Ω then

n∑i=1

1Λiβi ∈ ∆(t).

Because in strategies there is a dependence in a there is a natural extension to the previousconstructions where we consider a stopping time dependent of a, θa. But to respect the flow ofinformation the choice of θa must have some restrictions, which intuitively will impose that it mustnot look in the future of the controls. These are made in the following definition.

Definition 53. A non-anticipative controlled stopping time, θν , is a mapping

U 3 ν 7→ θν ∈ T[t,T ]

such that, for all ν1, ν2 ∈ U , τ ∈ T[t,T ], f : R → R, we have

ν1 ≡ ν2 on [t, τ ] ⇒ f(θν1)1θν1<τ = f(θν2)1θν2<τ.

Remark 54. Notice that in the previous Definition

f(θν1)1θν1<τ = f(θν2)1θν2<τ

is equivalent to

1θν1<τ = 1θν2<τ,

θν11θν1<τ = θν21θν2<τ.

Notice as well that we considered 1θν1<τ instead of 1θν1≤τ. This is important because, dueto Remark 29, given a controlled stochastic process Xν , the condition Xν1 ≡ Xν2 on [t, τ ] does notcompare Xν1(τ) with Xν2(τ), unless Xν1 , Xν2 are know to be left-continuous. However we will need toconsider controlled stopping times associated with right continuous processes, as in the next example.

Example 55. Consider a controlled stochastic process, Xν , and suppose that it is either right or leftcontinuous. Let S be such that Xν(t) ∈ S. Then θν := infs > t : Xν(s) /∈ S is a non-anticipativecontrolled stopping time.

Indeed, if we consider ν1, ν2 ∈ U , τ ∈ T[t,T ] such that ν1 ≡ ν2 on [t, τ ], then Xν1 ≡ Xν2 on [t, τ ],hence by the assumption of continuity and Remark 29, we conclude that

P(Xν1(s) = Xν2(s);∀s ∈ (t, τ)) = 1.

Thus,

1θν1≥τ = 1Xν1 (s)∈S, ∀s∈(t,τ)

= 1Xν2 (s)∈S, ∀s∈(t,τ)

= 1θν2≥τ.

Besides

θν11θν1<τ = infs : τ > s > t, Xν1(s) /∈ S1θν1<τ

= infs : τ > s > t, Xν2(s) /∈ S1θν2<τ

= θν21θν2<τ.

Page 57: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 45

Proposition 56. Let θν1 , θν2 be non-anticipative controlled stopping times. Then θν1 ∧ θν2 , θν1 ∨ θν2 are

non-anticipative controlled stopping times.

Proof. Let ν1, ν2 ∈ U and τ ∈ T[t,T ] be such that

ν1 ≡ ν2 on [t, τ ].

Then

f(θν11 ∨ θν12 )1θν11 ∨θν1

2 <τ = f(θν11 ∨ θν12 )1θν11 <τ1θν1

2 <τ

= f(θν21 ∨ θν22 )1θν21 <τ1θν2

2 <τ

= f(θν21 ∨ θν22 )1θν21 ∨θν2

2 <τ ,

and

f(θν11 ∧ θν12 )1θν11 ∧θν1

2 <τ = f(θν11 )1θν11 <τ1θν1

2 ≥τ + f(θν11 ∧ θν12 )1θν11 <τ1θν1

2 <τ + f(θν12 )1θν12 <τ1θν1

1 ≥τ

= f(θν21 )1θν21 <τ1θν2

2 ≥τ + f(θν21 ∧ θν22 )1θν21 <τ1θν2

2 <τ + f(θν22 )1θν22 <τ1θν2

1 ≥τ

= f(θν21 ∧ θν22 )1θν21 ∧θν2

2 <τ .

Thus, θν1 ∧ θν2 , θν1 ∨ θν2 are non-anticipative controlled stopping times.

With the notion of non-anticipative controlled stopping time the extension of Proposition 51 isnow natural.

Proposition 57. Let t ≤ s ≤ T , β1 ∈ ∆(t), β2 ∈ ∆(s) and θa a non-anticipative controlled stoppingtime. Then β, defined by

β[a] := β1[a]⊕θa∨s β2[a],

is in ∆(t).

Proof. For a ∈ At, β[a] is well defined because At ⊂ As and hence β2[a] is well defined. Besides, byLemma 32, we have that β[a] ∈ Bt.

Now we need to verify the non-anticipativity property. For that, let a1, a2 ∈ At, τ ∈ T[t,T ] be suchthat

a1 ≡ a2 on [t, τ ].

Then we have by the non-anticipativity of β1 ∈ ∆(t) and β2 ∈ ∆(s)β1[a1] ≡ β1[a2] on [t, τ ],β2[a1] ≡ β2[a2] on [s, s ∨ τ ]. (4.12)

Since θa is non-anticipative then so is θa := θa ∨ s. Thus1θa1<τ1[t,θa1 ] = 1θa2<τ1[t,θa2 ],

1θa1<τ1(θa1 ,T ] = 1θa2<τ1(θa2 ,T ].(4.13)

Furthermore,

β[a1] = 1[t,θa1 ]β1[a1] + 1(θa1 ,T ]β2[a1]

= 1θa1≥τ

(1[t,θa1 ]β1[a1] + 1(θa1 ,T ]β2[a1]

)+

+1θa1<τ

(1[t,θa1 ]β1[a1] + 1(θa1 ,T ]β2[a1]

), (4.14)

and

1θa1≥τ

(1[t,θa1 ]β1[a1] + 1(θa1 ,T ]β2[a1]

)≡ 1θa1≥τβ1[a1] on [t, τ ]. (4.15)

Page 58: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

46 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Combining (4.12), (4.13) and (4.15) in (4.14) we conclude that

β[a1] ≡ β[a2] on [t, τ ].

To extend Proposition 52 we need an additional definition that has the following motivation. Inmany applications we are interested in observing certain quantities at a stopping time. More precisely,given a stopping time θ we are interested in considering a Fθ−measurable random variable, Xθ. Inthe case of a non-anticipative controlled stopping time, θν , we are interested in considering a familyof observations, Xθν

, which are also non-anticipative. Intuitively this should mean that if ν1 and ν2are equal up to time s and θν1 < s then θν1 = θν2 and the observations for these controls are equal,Xθν1 = Xθν2 . Rigorously we have:

Definition 58. Given a non-anticipative controlled stopping time θν , a family of random variablesXνν∈U is called a controlled observation associated with θν if Xν ∈ Fθν , and, for every ν1, ν2 ∈ U ,τ ∈ T[t,T ] such that

ν1 ≡ ν2 on [t, τ ],

we have

Xν11θν1<τ = Xν21θν2<τ.

Example 59. Let θν be a non-anticipative controlled stopping time, ν ∈ U . The following arecontrolled observations associated with θν :

• θν ;

• ψ(Xν), where Xν is any controlled observation associated with θν and ψ is measurable;

• Xν(θν), where Xν is a right-continuous non-anticipative controlled process. Indeed, if

ν1 ≡ ν2 on [t, τ ],

then we have, by the non-anticipativity of Xν and by Remark 29, that

P(Xν1(s) = Xν2(s) : ∀s ∈ [t, τ)) = 1.

Thus

Xν1(θν1)1θν1<τ = Xν2(θν2)1θν2<τ.

• Xν(θν), where Xν is a left-continuous non-anticipative controlled process such that Xν1(t) =Xν2(t), for all ν1, ν2 ∈ U . Like the previous example, this is also a consequence of Remark 29.

We can now extend Proposition 52 in the following way:

Proposition 60. Let θa be a non-anticipative controlled stopping time and βii≥1 ⊂ ∆(t) be suchthat ∀a∈Atβi[a] ≡ βj [a] on [t, θa]. If for each a ∈ At, Λai ni=1 ⊂ Fθa forms a partition of Ω, and foreach 1 ≤ i ≤ n, 1Λa

ia∈At is a controlled observation associated with θa, then β, defined by

β[a] :=n∑i=1

1Λaiβi[a],

is in ∆(t).

Page 59: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.3. THE MARKOVIAN SCENARIO 47

Proof. Given a ∈ At, β[a] is in Bt by Lemma 32.We just need to prove the non-anticipativity property. As usual, we start by considering a1, a2 ∈

At, τ ∈ T[t,T ] such that

a1 ≡ a2 on [t, τ ].

Then, for each i, we have

βi[a1] ≡ βi[a2] on [t, τ ].

Thus, for any i,

β[a1] ≡θa1≥τ βi[a1] ≡θa1≥τ βi[a2] ≡θa1≥τ∩θa2≥τ β[a2] on [t, τ ].

Because θa is a non-anticipative controlled stopping time,

1θa1≥τ∩θa2≥τ = 1θa1≥τ = 1θa2≥τ, P− a.s..

Thus we conclude that

β[a1] ≡θa1≥τ β[a2] on [t, τ ]. (4.16)

On the other hand, because 1Λai

is a controlled observation,

βi[a1]1Λa1i

1θa1<τ ≡ βi[a2]1Λa2i

1θa2<τ on [t, τ ],

hence, by definition of β, we get

β[a1] ≡θa1<τ β[a2] on [t, τ ]. (4.17)

Combining (4.16) and (4.17) we conclude that

β[a1] ≡ β[a2] on [t, τ ].

The next Proposition gives a natural construction for strategies inDelta(s) from strategies in ∆(t), for s ≥ t.

Proposition 61. For t ≤ s ≤ T and θ ∈ T[s,T ] let β ∈ ∆(t), a ∈ At be fixed. Define β by

β[a] := β[a⊕θ a],

for all a ∈ As. Then β ∈ ∆(s).

Proof. β is well defined because if a ∈ As then, by Lemma 32, a⊕θ a ∈ At, and because Bt ⊂ Bs.We need to prove the non-anticipativity property. As usual, we consider a1, a2 ∈ As, τ ∈ T[t,T ]

such that

a1 ≡ a2 on [s, τ ].

Then

a⊕θ a1 ≡ a⊕θ a2 on [t, τ ].

From the non-anticipativity of β we conclude that

β[a⊕θ a1] ≡ β[a⊕θ a2] on [t, τ ].

Thus,

β[a1] ≡ β[a2] on [s, τ ].

Page 60: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

48 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

4.4 Properties of the value function

4.4.1 Non-randomness

Recall the definitions of lower and upper values:

V (t, x) = essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a]),

U(t, x) = esssupα∈Γ(t)

essinfb∈Bt

J(t, x;α[b], b).

In this Section we prove that these value functions, which are a priori random variables, are infact deterministic. More precisely, we show that

V (t, x) = E[V (t, x)],U(t, x) = E[U(t, x)]. (4.18)

This result was first proven in [10] and follows from the invariance of the value function withrespect to the Girsanov transformation:

ρhω(.) = ω(.) +∫ .

0

h(s)ds,

where h ∈ L2([0, T ]; RN ) is arbitrary.We start by deducing the law of ρh.

Lemma 62. Define

Zh := exp

(∫ T

0

h(s)dWs −12

∫ T

0

|h(s)|2ds

).

Then the law of ρh is given by P (ρh)−1 = Zh.P. More precisely, for any random variable X, wehave

E [X ρh] = E[XZh].

Proof. Let W := W ρh. Then Wt = Wt +∫ t0h(s)ds, hence by Girsanov’s Theorem (Theorem 141),

W is a Q−Brownian motion, where

dQdP

= exp

(−∫ T

0

h(s)dWs −12

∫ T

0

|h(s)|2ds

).

Thus

EP [1Wρh∈Λ]

= EQ[dPdQ

1W∈Λ

]= EQ

[exp

(∫ T

0

h(s)dWs +12

∫ T

0

|h(s)|2ds

)1W∈Λ

]

= EQ

[exp

(∫ T

0

h(s)dWs −12

∫ T

0

|h(s)|2ds

)1W∈Λ

]

= EP

[exp

(∫ T

0

h(s)dWs −12

∫ T

0

|h(s)|2ds

)1W∈Λ

]= EP[Zh1W∈Λ].

Page 61: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.4. PROPERTIES OF THE VALUE FUNCTION 49

As a consequence of the previous Lemma we have the following Corollary:

Corollary 63. Let X be a random variable. Then

E[X ρh|Ft] = E[Z−h|Ft](E[XZh|Ft] ρh). (4.19)

Furthermore, if Zh is Ft−measurable then

E[X ρh|Ft] = E[X|Ft] ρh. (4.20)

Proof. We just need to check that for any Λ ∈ Ft

E[X ρh1Λ] = E [E[Z−h|Ft](E [XZh|Ft] ρh)1Λ] .

This is just a simple computation:

E[X ρh1Λ] = E[(X1ρh(Λ)

) ρh]

= E[ZhX1ρh(Λ)]= E[E[ZhX|Ft]1ρh(Λ)]= E[(E[ZhX|Ft] ρh1Λ) ρ−h]= E[Z−h(E[ZhX|Ft] ρh)1Λ]= E[E[Z−h|Ft](E[ZhX|Ft] ρh)1Λ].

To obtain (4.20) we first notice that

Zh ρh = exp

(∫ T

0

h(s)dWs −12

∫ T

0

|h(s)|2ds

)

= exp

(∫ T

0

h(s)dWs +12

∫ T

0

|h(s)|2ds

)= (Z−h)−1.

Thus, if Zh is Ft measurable,

E[X ρh|Ft] = E[Z−h|Ft](E[XZh|Ft] ρh)= Z−h(Zh ρh)(E[X|Ft] ρh)= E[X|Ft] ρh.

We can now proceed to the proof of the main result of this section:

Theorem 64. The random variables V (t, x) and U(t, x) are constant, i.e.,

V (t, x) = E[V (t, x)],U(t, x) = E[U(t, x)].

Proof. The result for the lower value is proved in detail and is a consequence of the next two Propo-sitions. The proof for the upper value is analogous.

Proposition 65. V (t, x) is invariant by ρh, that is

V (t, x) ρh = V (t, x).

Page 62: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

50 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Proof. We consider a fixed (t, x) ∈ S and for each h we define ht := 1[0,t]h. The proof is now dividedinto several steps.

Step 1: J(t, x; a, b)(ρh) = J(t, x; a(ρht), b(ρht)).

First we notice that (∫ s

t

ψrdWr

)(ρht) =

∫ s

t

ψr(ρht)dWr,

because ht(r) = 0, for r ∈ [t, T ], and so dWr(ρht) = dWr for all r ∈ [t, T ].Thus,

Xa,bt,x (s)(ρht) = x+

(∫ s

t

µ(r,Xa,b

t,x (r); ar, br)dr

)(ρht) +

(∫ s

t

σ(r,Xa,b

t,x (r); ar, br)dWr

)(ρht)

= x+∫ s

t

µ(r,Xa,b

t,x (r)(ρht); ar(ρht), br(ρht))dr +

+∫ s

t

σ(r,Xa,b

t,x (r)(ρht); ar(ρht), br(ρht))dWr,

which implies that Xa,bt,x (ρht) is a solution of (4.1) with a, b replaced by a(ρht), b(ρht). By uniqueness

of solution, we conclude that

Xa,bt,x (ρht) = X

a(ρht ),b(ρht )t,x .

Since Zht is Ft−measurable, we can use (4.20) to get

J(t, x; a(ρht), b(ρht)) = E[f(Xa(ρht ),b(ρht )t,x (T )

) ∣∣∣Ft]= E

[f(Xa,bt,x (T )

) ρht

∣∣∣Ft]= E

[f(Xa,bt,x (T )

) ∣∣∣Ft] ρht

= J(t, x; a, b)(ρht).

Furthermore, since J(t, x; a, b) is Ft−measurable it will only depend on ω through ω|[0,t], thus

J(t, x; a, b)(ρht) = J(t, x; a, b)(ρh).

Step 2: Given β ∈ ∆(t), we define βh by βh[a] := β[a(ρ−h)](ρh). Then we claim that βh ∈ ∆(t).

Note that βh is well defined and βh[a] ∈ Bt. We just need to verify the non-anticipativity property.If a1, a2 ∈ At and τ ∈ T[t,T ] are such that a1 ≡ a2 on [t, τ ], then we have

a1(ρ−h) ≡ a2(ρ−h) on [t, τ(ρ−h)].

Since τ(ρ−h) is a stopping time we get by the non-anticipativity of β that

β[a1(ρ−h)] ≡ β[a2(ρ−h)] on [t, τ(ρ−h)],

hence

β[a1(ρ−h)](ρh) ≡ β[a2(ρ−h)](ρh) on [t, τ ].

Step 3:(

esssupa∈At

J(t, x; a, β[a]))

(ρh) = esssupa∈At

(J(t, x; a, β[a])(ρh)

).

Page 63: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.4. PROPERTIES OF THE VALUE FUNCTION 51

We have, for all a ∈ At,

esssupa∈At

J(t, x; a, β[a]) ≥ J(t, x; a, β[a]),

hence (esssupa∈At

J(t, x; a, β[a]))

(ρh) ≥ J(t, x; a, β[a])(ρh),

and since a is arbitrary we conclude that(esssupa∈At

J(t, x; a, β[a]))

(ρh) ≥ esssupa∈At

(J(t, x; a, β[a])(ρh)

).

Similarly, we deduce that(esssupa∈At

(J(t, x; a, β[a])(ρh)

))(ρ−h) ≥ esssup

a∈At

J(t, x; a, β[a]),

that is,

esssupa∈At

(J(t, x; a, β[a])(ρh)

)≥(

esssupa∈At

J(t, x; a, β[a]))

(ρh).

Thus (esssupa∈At

J(t, x; a, β[a]))

(ρh) = esssupa∈At

(J(t, x; a, β[a])(ρh)

).

Note that repeating the same argument yields(essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a]))

(ρh) = essinfβ∈∆(t)

esssupa∈At

(J(t, x; a, β[a])(ρh)

).

Step 4: V (t, x)(ρh) = V (t, x).

Using the previous properties we compute:

V (t, x)(ρh) =(

essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a]))

(ρh)

= essinfβ∈∆(t)

esssupa∈At

(J(t, x; a, β[a])(ρh))

= essinfβ∈∆(t)

esssupa∈At

J(t, x; a(ρht), βh

t

[a(ρht)])

= essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a])

= V (t, x).

Proposition 66. Let X be a random variable such that X(ρh) = X, for any h ∈ L2([0, T ]; RN ).Then X = E[X].

Proof. For any Borel set O ⊂ R,

E[1X∈O] = E[1ρhX∈O ρh

]= E

[Zh1ρhX∈O

]= E

[Zh1X(ρ−h)∈O

]= E

[Zh1X∈O

].

Page 64: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

52 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Thus, by definition of Zh, we deduce that

E

[exp

(∫ T

0

h(s)dWs

)1X∈O

]= E

[exp

(12

∫ T

0

|h(s)|2ds

)]E[1X∈O]

= E

[exp

(∫ T

0

h(s)dWs

)]E[1X∈O],

where the second equality follows from E[Zh] = 1.Since O and h are arbitrary we conclude that X ⊥⊥W and hence X ⊥⊥ FT , which is only possible

if X is constant.

By the previous result and from this point onwards, V (t, x) can denote both a random variable ora function of (t, x), without any possible ambiguity. Obviously the same applies to the upper valuefunction, U .

The previous result reveals a connection between our value functions and the ones defined byFleming and Souganidis in their original article [2]. This connection is explored in the next Proposition.

Proposition 67. Suppose that the essential infimum and essential supremum in the definition of Vare achieved in an uniform way. More precisely assume that:

∀ε>0 ∃βε∈∆(t) V (t, x) ≥ esssupa∈At

J(t, x; a, βε[a])− ε,

∀β∈∆(t) ∀ε>0 ∃aβ,ε∈Atesssupa∈At

J(t, x; a, β[a]) ≤ J(t, x; aβ,ε, β[aβ,ε

]) + ε.

Then

V (t, x) = infβ∈∆(t)

supa∈At

E[f(Xa,β[a]t,x (T )

)]. (4.21)

Proof. On one hand we have

V (t, x) = E[V (t, x)]

= E[essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a])]

≤ infβ∈∆(t)

E[esssupa∈At

J(t, x; a, β[a])]

≤ infβ∈∆(t)

E[J(t, x; aβ,ε, β

[aβ,ε

])]+ ε

≤ infβ∈∆(t)

supa∈At

E [J(t, x; a, β[a])] + ε.

On the other hand,

V (t, x) = E[V (t, x)]

= E[essinfβ∈∆(t)

esssupa∈At

J(t, x; a, β[a])]

≥ E[esssupa∈At

J(t, x; a, βε[a])]− ε

≥ supa∈At

E [J(t, x; a, βε[a])]− ε

≥ infβ∈∆(t)

supa∈At

E [J(t, x; a, β[a])]− ε.

Thus,

infβ∈∆(t)

supa∈At

E [J(t, x; a, β[a])]− ε ≤ V (t, x) ≤ infβ∈∆(t)

supa∈At

E [J(t, x; a, β[a])] + ε,

Page 65: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.4. PROPERTIES OF THE VALUE FUNCTION 53

and, since ε is arbitrary, we conclude that

V (t, x) = infβ∈∆(t)

supa∈At

E[f(Xa,β[a]t,x (T )

)].

The difference to [2] is that in (4.21) we allow controls and strategies which are dependent on thepast, that is, we allow controls and strategies which are not independent of Ft.

On the previous Proposition we assume the existence of strategies that achieve the infimum inthe definition of V in an uniform way. On the next Proposition we give sufficient conditions for suchstrategies to exist.

Proposition 68. Let Λ ∈ Ft, with P(Λ) > 0, and β ∈ ∆(t) be such that

1ΛV (t, x) ≥ 1Λ

(esssupa∈At

J(t, x; a, β[a])− ε

). (4.22)

Suppose that one of the following holds:

(i) For each h ∈ L2, β[a ρht ] = β[a] ρht ;

(ii) supa∈At‖β[a]‖H∞ <∞.

Then there exists βε ∈ ∆(t) such that

V (t, x) ≥ esssupa∈At

J(t, x; a, βε[a])− 2ε.

Proof. Suppose that (i) holds. Then(esssupa∈At

J(t, x; a, β[a])) ρh = esssup

a∈At

(J(t, x; a, β[a]) ρh)

= esssupa∈At

J(t, x; a ρht , β[a] ρht)

= esssupa∈At

J(t, x; a ρht , β[a ρht ])

= esssupa∈At

J(t, x; a, β[a]).

Thus, by Proposition 66, esssupa∈At

J(t, x; a, β[a]) is a constant random variable. Since V (t, x) is also

constant we deduce from (4.22) that

V (t, x) ≥ esssupa∈At

J(t, x; a, β[a])− ε.

Thus we can take βε := β.

Now suppose that (ii) holds. From (4.22) it follows that

1ρ−h(Λ)V (t, x) = (1ΛV (t, x)) ρh

≥(1Λ

(esssupa∈At

J(t, x; a, β[a])− ε

)) ρh

= 1ρ−h(Λ)

(esssupa∈At

J(t, x; a ρht , β[a] ρht)− ε

)= 1ρ−h(Λ)

(esssupa∈At

J(t, x; a ρht , βh

t

[a ρht ])− ε

)= 1ρ−h(Λ)

(esssupa∈At

J(t, x; a, βh

t

[a])− ε

)≥ 1ρ−h(Λ)

(essinfβ∈B

esssupa∈At

J(t, x; a, β[a])− ε

),

Page 66: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

54 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

where

B :=βh

t

: h ∈ L2.

Since h ∈ L2 is arbitrary we deduce that

1ΛV (t, x) ≥ 1Λ

(essinfβ∈B

esssupa∈At

J(t, x; a, β[a])− ε

), (4.23)

where

Λ =⋃h∈L2

ρ−h(Λ).

Now we remark that

1Λ ρh = 1ρ−h(Λ) = 1Λ.

Thus, by Proposition 66,

1Λ = E [1Λ] = P(Λ) > 0,

which implies that 1Λ = 1. From (4.23) we then deduce that

V (t, x) ≥ essinfβ∈B

esssupa∈At

J(t, x; a, β[a])− ε,

By Theorem 40, there is a sequence βi := βhti ⊂ B such that

essinfβ∈B

esssupa∈At

J(t, x; a, β[a]) = infi

esssupa∈At

J(t, x; a, βi[a]).

Let

Λi :=

esssupa∈At

J(t, x; a, βi[a]) ≤ essinfβ∈B

esssupa∈At

J(t, x; a, β[a]) + ε

∈ Ft,

and define Λ1 := Λ1, Λi+1 := Λi+1 \⋃ik=1 Λk. Then Λi ⊂ Ft forms a partition, modulo null sets, of

Ω.We now remark that, for each a ∈ At,

‖βi[a]‖H∞ = ‖βhti [a]‖H∞ = ‖β[a ρ−ht

i] ρht

i‖H∞ = ‖β[a ρ−ht

i]‖H∞ ≤ sup

a∈At

‖β[a]‖H∞ ,

and hence

supi‖βi[a]‖H∞ <∞.

Thus, we can apply Lemma 32 to conclude that βε, defined by

βε[a] :=∑i

1Λiβi[a],

is a well defined strategy. Furthermore,

esssupa∈At

J(t, x; a, βε[a]) =∑i

1Λiesssupa∈At

J(t, x; a, βi[a])

≤ essinfβ∈B

esssupa∈At

J(t, x; a, β[a]) + ε

≤ V (t, x) + 2ε.

Remark 69. Notice that, on the previous Proposition, if β is a Markov control policy as in Example38, then condition (i) is satisfied.

Page 67: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.4. PROPERTIES OF THE VALUE FUNCTION 55

4.4.2 Growth rate

In this section we prove that the value function has polynomial growth. This will be important inthe discussion of uniqueness of solution of the HJBI equation.

Recall Remark 47, where ‖β‖p is defined by (4.11) when A is compact.

Proposition 70. Let A be compact and suppose that for each (t, x) and for each ε there existsβε(t,x) ∈ ∆(t) such that

V (t, x) ≥ esssupa∈At

J(t, x; a, βε(t,x)[a])− ε, P− a.s.,

and

‖βε(t,x)‖p ≤ C(1 + |x|m), (4.24)

for some C,m. Here p is the growth power of f , as in (4.8).Then V has polynomial growth, i.e.,

|V (t, x)| ≤ C(1 + |x|p + |x|pm), (4.25)

for some constant C.

Proof. Consider the collection βε(t,x) in the hypothesis of the Proposition. Then

V (t, x) ≥ esssupa∈At

J(t, x; a, βε(t,x)[a])− ε, P− a.s..

Now we estimate J :

|J(t, x; a, βε(t,x)[a])| ≤ E[∣∣∣f (Xa,βε

(t,x)[a]

t,x (T ))∣∣∣∣∣∣Ft]

≤ E[C

(1 +

∣∣∣Xa,βε(t,x)[a]

t,x (T )∣∣∣p)∣∣∣Ft]

≤ C(1 + |x|p + ‖βε(t,x)[a]‖

pHp,∞

t

)≤ C

(1 + |x|p + (T − t)‖βε(t,x)[a]‖

pH∞t,p

)≤ C (1 + |x|p + (T − t)|x|pm) ,

where the second inequality follows by (4.8), the third inequality follows by (4.4) and the fact thatA is compact, in the fourth inequality we used inequality (ii) of Lemma 22, and the last inequalityfollows from (4.24).

Thus (4.25) holds, for some constant C.

Remark 71. If B is compact then we may use m = 0 on the previous Proposition, thus getting thatV has at most the same growth as f .

4.4.3 Continuity in the space variable

In this Section we show that if f is globally Lipschitz then the value function is also Lipschitz inthe space variable.

Proposition 72. Suppose f is Lipschitz. Then

|V (t, x)− V (t, x′)| ≤ K|x− x′|

Page 68: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

56 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Proof. We start with the following estimate:

(J(t, x; a, b)− J(t, x′; a, b))2 ≤ E[(f(Xa,bt,x (T )

)− f

(Xa,bt,x′(T )

))2 ∣∣∣Ft]≤ K2E

[∣∣∣Xa,bt,x (T )−Xa,b

t,x′(T )∣∣∣2 ∣∣∣Ft]

≤ CK2|x− x′|2,

where the last inequality follows from Lemma 148.The Lipschitz continuity of V now follows easily from the relations

essinfa,β

(J − J ′) ≤ essinfβ

esssupa

J − essinfβ

esssupa

J ′ ≤ esssupa,β

(J − J ′),

and the fact that C does not depend on a, b.

If A,B are compact and f is locally Lipschitz we get that for all t, V (t, .) is continuous. Theproof of this fact uses an estimate for |J(t, x; a, b)− J(t, x′; a, b)| that is analogous to the one used inthe proof of Proposition 85. Thus we omit the proof of this fact and, instead, we refer the reader toRemark 87.

Page 69: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.5. WEAK DYNAMIC PROGRAMMING PRINCIPLE 57

4.5 Weak dynamic programming principle

This Section is the core of this Chapter. We prove a weak version of the dynamic programmingprinciple (DPP) that will be used later to derive the HJBI equation. The original version of the weakDPP is due to Touzi and Bouchard and was proved in [1] in the context of stochastic control.

This weak version of the DPP avoids measurability problems by considering weaker inequalitiesinvolving test functions for the value function. This is enough, however, to prove that the valuefunction is a viscosity solution of the HJBI equation. Indeed, when working in the viscosity sense,the value function is replaced by a smooth test function. Therefore it is enough to prove a dynamicprogramming principle that gives information on these functions.

There are two main differences between the weak and the traditional versions of the dynamicprogramming principle. One, already mentioned, is that in the weak version we use a test functionmore regular than the value function. The other is that in this version we will consider an intermediatetime that is a stopping time, θ ∈ T[t,T ], instead of a deterministic time, t + δ. Considering stoppingtimes will be useful in the proof of the HJBI equation because there we will be interested in boundingthe state process to use the martingale properties of the stochastic integral.

There will be two main assumptions in the proof of the weak DPP. One has to do with thecontinuity of the reward function while the other is technical. We will try to motivate both beforestating and proving the Theorem.

When proving the traditional DPP we follow an optimal strategy up to a fixed time t + δ andthen we concatenate it with a strategy which is optimal from that time onwards. In the stochasticsetting at time t + δ the state process will end up randomly in an uncountable number of positions.We can not consider an uncountable number of strategies to patch while preserving for instancemeasurability. Therefore we must make a small error by choosing only one fixed strategy for eachgiven neighborhood, instead of for each point. To control the error made by choosing a single strategyfor an entire neighborhood we must have some continuity in the space variable of the reward function,that is, require continuity of J(t+ δ, x; a, b) in x.

In this version of the DPP we use stopping times, hence, we will switch strategies at a randomtime and, again, we will have to make an error. To control this error in time we will need continuity ofthe reward function in the time variable. Thus we must show that J(t, x; a, b) is continuous in (t, x).Furthermore, because we have 2 players and 2 controls, we do not want to allow one player to ruinthe other player’s patching of strategies by augmenting too much the error of the patching. Thus, wewill have to require continuity of J(.; a, β[a]) uniformly in a.

One question that arises naturally in this context is how we should interpret the continuity ofJ(.; a, b). Indeed, J is a random variable, and even worse, J(t, .; a, b) is Ft−measurable while J(t +δ, .; a, b) is Ft+δ−measurable. It should be clear that we can not ask for continuity J(.; a, b) uniformlyin ω. Indeed, even if δ is very small, there will certainly exist histories ω for which the state processchanges drastically from t to t + δ thus changing the reward function as well. The solution to thisproblem is averaging: by averaging through all the histories from t to t+ δ we can obtain continuity.More precisely:

Definition 73. We say that J(t, x; a, b) is continuous in x and from the left in t, uniformly withrespect to ω if for each ε > 0 and (t, x) ∈ S there is ra,b(t,x) > 0 such that

|E[J(t, x; a, b)|Ft′ ]− J(t′, x′; a, b)| < ε, P− a.s.,

for all (t′, x′) ∈ B(t, x; ra,b(t,x)

), where B(t, x; r) := [t− r, t]×Br(x).

If we can find a radius rb(t,x) that is valid for all a ∈ A then we say that the continuity is uniformwith respect to a ∈ A. Analogously, we can define uniform continuity with respect to t, x, b.

Usually, to simplify the text, we will just say that J(.; a, b) is left-continuous uniformly with respectto ω.

To obtain continuity uniformly with respect to a and to avoid other technicalities we will assumethat A is compact.

Page 70: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

58 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

The other assumption that will be made is technical in nature and has to do with the existence ofa sequence of strategies that approximate the essential infimum of the reward function in an uniformway. More precisely, we make the following definition:

Definition 74. βε(t,x) ∈ ∆(t) is an uniformly ε−optimal strategy for V (t, x) if

V (t, x) ≥ esssupa∈At

J(t, x; a, βε(t,x)[a])− ε, P− a.s..

The definition of uniformly ε−optimal strategy for U(t, x) is analogous.

We will assume that for each (t, x) and each ε there is an uniformly ε−optimal strategy βε(t,x) ∈∆(t). Furthermore, to be able to use the uniform continuity of the reward function, we will needto bound these optimal strategies by requiring ‖βε(t,x)‖p+δ to be locally bounded as a function of(t, x), for some δ > 0, where p it the growth power of f . Here we recall that, since A is compact,‖β‖p+δ := supa∈At

‖β[a]‖H∞t,p+δis well defined for all β ∈ ∆(t).

We are now ready to state and prove the weak dynamic programming principle.

Theorem 75 (Weak dynamic programming principle). Let p be the power growth of f , as in (4.8),and consider δ > 0 such that p+ δ ≥ 2.

Suppose that A is compact and for every M, J(t, x; a, b) is left-continuous in (t, x) uniformly withrespect to t ∈ [0, T ], a ∈ At, b ∈ b ∈ Bt : ‖b‖H∞t,p+δ

≤M, ω ∈ Ω.Suppose also that, for each (t, x) and each ε, there exists an uniformly ε−optimal strategy for

V (t, x), βε(t,x) ∈ ∆(t), such that ‖βε(t,x)‖p+δ is locally bounded as a function of (t, x).Let φ : S → R be a continuous function3 and θa,b be a non-anticipative controlled stopping time

such that

W (t, x; a, b) := E[φ(θa,b, Xa,b

t,x (θa,b)) ∣∣∣Ft]

makes sense for every a ∈ At, b ∈ Bt.It follows that:

1. If φ ≥ V and φ(t, x) ≥ −C(1 + |x|m) for some C,m, then

V (t, x) ≤ essinfβ∈∆(t)

esssupa∈At

W (t, x; a, β[a]). (4.26)

2. If φ ≤ V then

V (t, x) ≥ essinfβ∈∆(t)

esssupa∈At

W (t, x; a, β[a]). (4.27)

Proof. We start by defining

W (t, x;β) := esssupa∈At

W (t, x; a, β[a]),

W (t, x) := essinfβ∈∆(t)

W (t, x;β),

and fixing (t, x).Notice that due to the fact that θa,b is a non-anticipative controlled stopping time, W (t, x; a, b)

satisfies the independence of irrelevant alternatives as J(t, x; a, b), see Lemma 35.The proof of the Theorem is divided into Lemmas 76, 78, which establish (4.26) and (4.27),

respectively.

Lemma 76. In the conditions of Theorem 75, (4.26) is valid.

3In practice, later we will use the weak dynamic programming principle on functions φ ∈ C1,2(S).

Page 71: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.5. WEAK DYNAMIC PROGRAMMING PRINCIPLE 59

Proof. Let ε > 0. We will proceed in several steps:

Step 1: Find βε,m ∈ ∆(t) and Ft 3 Λm Ω such that 1ΛmW (t, x) ≥ 1Λm(W (t, x;βε,m)− ε).

There is a countable sequence (βi) ⊂ ∆(t) such that

W (t, x) = infi≥1

W (t, x;βi), P− a.s.

We consider Λi := W (t, x)−W (t, x;βi) ≥ −ε ∈ Ft and define Λ1 := Λ1, Λi+1 := Λi+1 \⋃ij=1 Λi.

Then we note that P(⋃

i≥1 Λi)

= 1 and Λi ∩ Λj = ∅ thus, by Proposition 52,

βε,m :=m∑i=1

1Λiβi + 1(Λm)cβ1 ∈ ∆(t),

where Λm =⋃mi=1 Λi. Besides, due to the independence of irrelevant alternatives of W we have4

1ΛmW (t, x;βε,m) = 1Λmesssupa∈At

W

(t, x; a,

m∑i=1

1Λiβi[a] + 1(Λm)cβ1[a]

)

= esssupa∈At

m∑i=1

1ΛiW (t, x; a, βi[a])

=m∑i=1

1ΛiW (t, x;βi)

≤ 1Λm (W (t, x) + ε) ,

that is,

1ΛmW (t, x) ≥ 1Λm(W (t, x;βε,m)− ε), ,P− a.s. (4.28)

Step 2: Fix m and define βε := βε,m. Find ε−optimal strategies for V , βεi , to patch with βε after thestopping time.

By hypothesis, there is a family(βε(s,y)

)(s,y)

such that βε(s,y) ∈ ∆(s) and

φ(s, y) ≥ V (s, y) ≥ J(s, y; βε(s,y)

)− ε

≥ J(s, y; a, βε(s,y)[a]

)− ε, P− a.s., (4.29)

for any a ∈ As.Furthermore, ‖βε(s,y)‖p+δ is locally bounded. We define

βε(s,y) := βε ⊕s βε(s,y) ∈ ∆(t),

and remark that ‖βε(s,y)‖p+δ is also locally bounded. Hence, for each n, there is Mn such that∀y∈Bn(0) ‖βε(s,y)‖p+δ ≤Mn.

Since φ is continuous and, for each n, Bn(0) is compact and J(t, x; a, b) is left-continuous in (t, x),uniformly with respect to t ∈ [0, T ], a ∈ At and b ∈ b ∈ Bt : ‖b‖H∞t,p+δ

≤ Mn, there is also a family(rn) such that

φ(s, y)− φ(s′, y′) ≤ ε, (4.30)

E[J(s, y; a, βε(s,y)[a]

) ∣∣∣Fs′]− J(s′, y′; a, βε(s,y)[a]

)≥ −ε, (4.31)

4We use the fact that if Λii is a collection of disjoint sets then esssupa

Pi 1Λi

Xa =P

i 1ΛiesssupaXa.

Page 72: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

60 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

for all a ∈ At, (s, y) ∈ [t, T ]× Bn(0), (s′, y′) ∈ B(s, y; rn) ∩ [t, T ]× Rd.Since Brn(y) : y ∈ Bn(0) is an open covering of Bn(0), then, for each n, we can find a finite

collection (si, yi) such that Oni := B(si, yi; rn) is a covering of [t, T ] × Bn(0). We define On1 :=On1 , O

ni+1 := Oni+1 \

⋃ik=1 O

nk and On =

⋃i≥1O

ni .

Combining (4.29), (4.30), (4.31) we conclude that if (s′, y′) ∈ Oni , a ∈ At then

J(s′, y′; a, βεi [a]) ≤ φ(s′, y′) + 3ε, P− a.s. (4.32)

where βεi := βε(si,yi)∈ ∆(si). Indeed,

J(s′, y′; a, βεi [a]) ≤ E[J(si, yi; a, βε[a]⊕si β

ε(si,yi)

[a]) ∣∣∣Fs′]+ ε

= E[J(si, yi; a, βε(si,yi)

[a]) ∣∣∣Fs′]+ ε

≤ φ(si, yi) + 2ε, using (4.29),≤ φ(s′, y′) + 3ε.

Step 3: We will now patch the strategies together to get a sequence of almost optimal strategies forplayer 2, βn.

Given a ∈ At, we then define βn[a] by

βn[a] :=∑i≥1

1Πa,niβεi [a] + 1(Πa,n)cβε[a], (4.33)

where Πa,ni =

(θa, X

a,βε[a]t,x (θa)

)∈ Oni

∈ Fθa , Πa,n = ∪i≥1Π

a,ni ∈ Fθa and θa := θa,β

ε[a]. The proofthat βn ∈ ∆(t) is done in Lemma 77.

By definition of βn, we have thatβn[a] ≡ βε[a] on [t, θa]βn[a] ≡Πa,n

iβεi [a] on [t, T ]. (4.34)

Then we notice that for all a ∈ At and for almost all ω ∈ Πa,ni ,

E[f(Xa,βn[a]t,x (T )

) ∣∣∣Fθa

](ω) = E

[f

(Xa,βε

i [a]

θa(ω),Xa,βε[a]t,x (θa(ω))

(T )) ∣∣∣Fθa(ω)

](ω)

= J(θa(ω), Xa,βε[a]

t,x (θa(ω)); a, βεi [a])

(ω)

≤ φ(θa(ω), Xa,βε[a]

t,x (θa(ω)))

+ 3ε,

where the first equality follows from the flow property for solutions of SDE’s5 and (4.34), and theinequality follows from (4.32). Thus,

E[f(Xa,βn[a]t,x (T )

) ∣∣∣Fθa

]1Πa,n ≤

(φ(θa, X

a,βε[a]t,x (θa)

)+ 3ε

)1Πa,n . (4.35)

Step 4: Finally, we take the limit as n→∞.

By the tower property of expectations and (4.35), we deduce that,

J(t, x; a, βn[a]) = E[f(Xa,βn[a]t,x (T )

)1Πa,n

∣∣∣Ft]+ E[f(Xa,βn[a]t,x (T )

)1(Πa,n)c

∣∣∣Ft]= E

[E[f(Xa,βn[a]t,x (T )

)|Fθa

]1Πa,n

∣∣∣Ft]+ E[f(Xa,βε[a]t,x (T )

)1(Πa,n)c

∣∣∣Ft]≤ E

[(φ(θa, X

a,βε[a]t,x (θa)

)+ 3ε

)1Πa,n

∣∣∣Ft]+ E[f(Xa,βε[a]t,x (T )

)1(Πa,n)c

∣∣∣Ft]5Here we refer to the following property: Xt,x(T ) = Xs,Xt,x(s)(T ), where t ≤ s ≤ T .

Page 73: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.5. WEAK DYNAMIC PROGRAMMING PRINCIPLE 61

In the following we simplify the notation and let Xs := Xa,βε[a]t,x (s). We recall that, by hypothesis,

φ(x) + C(1 + |x|m) ≥ 0. Hence,

E [φ(θa, Xθa)1Πa,n |Ft] = E [(φ(θa, Xθa) + C (1 + |Xθa |m))1Πa,n |Ft]− E [C (1 + |Xθa |m)1Πa,n |Ft]≤ E [(φ(θa, Xθa) + C (1 + |Xθa |m)) |Ft]− E [C (1 + |Xθa |m)1Πa,n |Ft]= E

[φ(θa, Xθa) + C (1 + |Xθa |m)1(Πa,n)c |Ft

]→ E [φ(θa, Xθa)|Ft] ,

where the convergence as n → ∞ is uniform with respect to a and ω, and follows from dominatedconvergence and the following facts:

• By (4.6), E[sups∈[t,T ]

∣∣∣Xa,βε[a]t,x (s)

∣∣∣m ∣∣∣Ft] is bounded uniformly in a;

• 1(Πa,n)c ≤ 1nsups∈[t,T ]

˛X

a,βn[a]t,x (s)

˛>n

o → 0 uniformly with respect to a.

Using the same convergence arguments and the fact that f has polynomial growth, we deduce that

E[f(Xa,βε[a]t,x (T )

)1(Πa,n)c

∣∣∣Ft]→ 0,

uniformly in a.Thus, for n sufficiently large, the following holds, for all a ∈ At,

J(t, x; a, βn[a]) ≤W (t, x; a, βε[a]) + 4ε (4.36)

Now we recall that, by (4.28) and the definition of W (t, x;βε),

1ΛmW (t, x) ≥ 1Λm(W (t, x;βε)− ε) ≥ 1Λm(W (t, x; a, βε[a])− ε). (4.37)

Combining (4.36) and (4.37) we get

1ΛmJ(t, x; a, βn[a]) ≤ 1Λm(W (t, x) + 5ε),

and since a, ε,m are arbitrary we conclude that

V (t, x) ≤W (t, x).

Lemma 77. Let βn be defined by (4.33). Then βn ∈ ∆(t).

Proof. Notice that we have

βn[a]s =∑i≥1

βi,ε[a]s1Πa,ni

+ 1(Πa,n)cβε[a]s,

where

βi,ε := βε ⊕si∨θa βε(si,yi).

Since θa,b is a non-anticipative stopping time and βε is non-anticipative we conclude that θa :=θa,β

ε[a] is also a non-anticipative stopping time and thus by Proposition 57 we conclude that βi,ε ∈∆(t).

On the other hand

βi,ε[a] ≡ βj,ε[a] ≡ βε[a] on [t, θa],

and

1Πa,ni

= 1[si−rn,si](θa)1Brn (yi)

(Xa,βε[a]t,x (θa)

).

Page 74: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

62 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Because Xa,bt,x is a non-anticipative controlled process then so is Xa,βε[a]

t,x . Thus, by Example 59,

Xa,βε[a]t,x (θa) is a non-anticipative controlled observation associated with θa.

We conclude that 1Πa,ni

is a non-anticipative controlled observation associated with θa, hence, byProposition 60 βn ∈ ∆(t).

Lemma 78. In the conditions of Theorem 75, (4.27) is valid.

Proof. This proof is similar to the one of Lemma 76, hence we will only outline the main points andthe ones that are different. Once again we proceed by steps:

Step 1: Find the strategy for player 2, βε, and the initial reply of player 1, aε.

By hypothesis there is βε ∈ ∆(t) such that

V (t, x) ≥ J(t, x;βε)− ε, P− a.s.. (4.38)

On the other hand there is a family (ai) ⊂ At such that

W (t, x) ≤W (t, x;βε) = supi≥1

W (t, x; ai, βε[ai]), P− a.s..

Consider Λi := W (t, x; ai, βε[ai]) −W (t, x;βε) ≥ −ε ∈ Ft and define Λ1 = Λ1, Λi+1 = Λi+1 \⋃ik=1 Λk. Then Λii is a partition of Ω, modulo null sets, thus, by Lemma 32 and because A is

compact,

aε :=∑i≥1

1Λiai ∈ At. (4.39)

Moreover, by the independence of irrelevant alternatives for W and βε, we have

W (t, x; aε, βε[aε]) = W

t, x;∑i≥1

1Λiai, βε

∑i≥1

1Λiai

= W

t, x;∑i≥1

1Λiai,∑i≥1

1Λiβε [ai]

=

∑i≥1

1ΛiW (t, x; ai, βε [ai])

≥ W (t, x)− ε,

Thus, for aε given by (4.39), we have

W (t, x) ≤W (t, x; aε, βε[aε]) + ε, P− a.s.. (4.40)

Step 2: Find ε−optimal controls, ai,ε, for player 1 to continue the game after the stopping time andthe neighborhoods, Ai, where they shall be used.

Define, for each s ≥ t, βεs by

βεs [a] := βε[aε ⊕s a].

We have βεs ∈ ∆(s) by Proposition 61. In addition, we have that βεs ∈ ∆(r) for each r ∈ [t, s], becauseβε ∈ ∆(t). Notice that, by Proposition 43,

βεs [a] = βε[aε]⊕s βεs [a].

As in step 1, for each (s, y) we can find aε(s,y) ∈ As such that

φ(s, y) ≤ V (s, y) ≤ J(s, y; aε(s,y), β

εs [a

ε(s,y)]

)+ ε, ,P− a.s. (4.41)

Page 75: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.5. WEAK DYNAMIC PROGRAMMING PRINCIPLE 63

Because J(.; a, b)(ω) is left-continuous and φ is continuous there is also a family (r(s,y))(s,y) such that

φ(s, y)− φ(s′, y′) ≥ −ε, (4.42)

E[J(s, y; aε(s,y), β

εs [a

ε(s,y)]

) ∣∣∣Fs′]− J(s′, y′; aε(s,y), β

εs [a

ε(s,y)]

)≤ ε, (4.43)

for all (s′, y′) ∈ B(s, y; r(s,y)), where aε(s,y) := aε ⊕s aε(s,y) ∈ At.Since B(s, y; r) : (s, y) ∈ S, 0 < r ≤ r(s,y) forms a Vitali covering of S, we can find a countable

sequence (si, yi, ri) such that B(si, yi; ri)i forms a partition of S and 0 < ri ≤ r(si,yi). For the notionof Vitali covering we refer the reader to [6, p. 158].

Combining (4.41), (4.42), (4.43) we conclude that if (s′, y′) ∈ B(si, yi; ri), then

J(s′, y′; aεi , β

εsi

[aεi ])≥ φ(s′, y′)− 3ε, P− a.s. (4.44)

where aεi := aε(si,yi). Indeed,

J(s′, y′; aεi , β

εsi

[aεi ])

≥ E[J(si, yi; aε ⊕si a

εi , β

εsi

[aε ⊕si aεi ]) ∣∣∣Fs′]− ε

= E[J(si, yi; aεi , β

εsi

[aεi ]) ∣∣∣Fs′]− ε

≥ φ(si, yi)− 2ε, using (4.41),≥ φ(s′, y′)− 3ε.

Step 3: Patching the controls together to get the reply of player 1, a, to βε.

We define a ∈ At by

a :=∑i≥1

aεi1Πi=∑i≥1

aε ⊕si aεi1Πi ,

where Πi =(θ,X

aε,βε[aε]t,x (θ)

)∈ B(si, yi; ri)

∈ Fθ and θ := θa

ε,βε

. Then, by Corollary 45 andProposition 43, we have

βε[a] =∑i≥1

1Πiβε[aε]⊕si β

ε[aε ⊕si aεi ]

=∑i≥1

1Πiβε[aε]⊕si β

εsi

[aεi ]

=∑i≥1

1Πi βεsi

[aεi ].

Thus, a ≡Πi a

εi

βε[a] ≡Πi βεsi

[aεi ](4.45)

By (4.44), (4.45), we have for almost all ω ∈ Πi

E[f(Xa,βε[a]t,x (T )

) ∣∣∣Fθ] (ω) = E[f

(Xa,βε[a]

θ(ω),Xaε,βε[aε]t,x (θ(ω))

(T )) ∣∣∣Fθ(ω)

](ω)

= J(θ(ω), Xaε,βε[aε]

t,x (θ(ω)); aεi , βεsi

[aεi ])

(ω)

≥ φ(θ(ω), Xaε,βε[aε]

t,x (θ(ω)))− 3ε.

Thus,

E[f(Xa,βε[a]t,x (T )

) ∣∣∣Fθ]1Πi ≥(φ(θ,X

aε,βε[aε]t,x (θ)

)− 3ε

)1Πi , P− a.s.. (4.46)

Page 76: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

64 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Step 4: Combining a and βε.

By (4.38) and the definition of J(t, x;βε), we have that

V (t, x) ≥ J(t, x;βε)− ε ≥ J(t, x; a, βε[a])− ε, P− a.s.. (4.47)

Also, we have, by the tower property of conditional expectation and (4.46), that

J(t, x; a, βε[a]) =∑i≥1

E[f(Xa,βε[a]t,x (T )

)1Πi

∣∣∣Ft]=

∑i≥1

E[E[f(Xa,βε[a]t,x (T )

)|Fθ]1Πi

∣∣∣Ft]≥

∑i≥1

E[(φ(θ,X

aε,βε[aε]t,x (θ)

)− 3ε

)1Πi

∣∣∣Ft]= W (t, x; aε, βε[aε])− 3ε. (4.48)

By (4.47), (4.48), (4.40), we deduce that

V (t, x) ≥ J(t, x; a, βε[a])− ε ≥W (t, x; aε, βε[aε])− 4ε ≥W (t, x)− 5ε,

and, since ε is arbitrary, we conclude that

V (t, x) ≥W (t, x).

Remark 79. Since V (t, x) is deterministic, and if we assume additionally that it is continuous andof polynomial growth, then we can take φ(t, x) = V (t, x) in both inequalities, thus getting a version ofthe traditional dynamic programming principle with controlled stopping times.

Remark 80. If B is bounded then the assumption on the existence of uniformly ε−optimal strategies,βε(t,x), is satisfied. Indeed, we know that there is a sequence (βi) such that

V (t, x) = infi

esssupa∈At

J(t, x; a, βi[a]).

We can now consider Λi = V (t, x) − esssupa∈AtJ(t, x; a, βi[a]) ≥ −ε ∈ Ft and define Λ1 := Λ1,

Λi+1 := Λi+1\⋃ik=1 Λk. Then Λi is a partition of Ω and because B is bounded we have by Proposition

52 that

βε :=∑i≥1

1Λiβi ∈ ∆(t).

Moreover, by the property of independence of irrelevant alternatives of J , we have

esssupa∈At

J(t, x; a, βε[a]) =∑i≥1

1Λiesssupa∈At

J(t, x; a, βi[a])

≤ V (t, x) + ε.

Furthermore, the condition of local boundedness of ‖βε(t,x)‖p+δ with respect to (t, x) is immediatesince B is bounded.

If B is not bounded and the essential infimum is not achieved uniformly with respect to ω then werun into problems in Lemma 76 when searching for the countable sequence of strategies βεi to patch.This is due to the fact that in this case we are only able to find strategies βε(s,y) which approximateV (s, y) in Ω \ Λ(s,y). When we consider all the points (s, y) we may miss an important part of Ω.

Page 77: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.5. WEAK DYNAMIC PROGRAMMING PRINCIPLE 65

Remark 81. We now make a remark on the need of uniform continuity for J in the proof of the weakDPP.

First we notice that for the second inequality we only need continuity for each a, b fixed insteadof uniformly in a, b. The reason is that for the second inequality we need to construct carefully thecontrol for the first player in order to approximate the value function and this is done after player 2chooses his strategy, hence the second player can not perturb the approximation made by player 1.

On the other hand, on the proof of the first inequality, uniform continuity seems to be fundamental.The reason is that for this inequality we are constructing the strategy for the second player before player1 chooses his control. Thus we must make a choice that ensures that our approximation of the valuefunction is good enough independently of the first player’s control.

We end this Section by stating the corresponding Theorem for the upper value function. The proofis completely analogous.

Theorem 82. Let p be the power growth of f , as in (4.8), and consider δ > 0 such that p+ δ ≥ 2.Suppose that B is compact and for every M, J(t, x; a, b) is left-continuous in (t, x) uniformly with

respect to t ∈ [0, T ], a ∈ a ∈ At : ‖a‖H∞t,p+δ≤M, b ∈ Bt, ω ∈ Ω.

Suppose also that, for each (t, x) and each ε, there exists an uniformly ε−optimal strategy forU(t, x), αε(t,x) ∈ Γ(t), such that ‖αε(t,x)‖p+δ is locally bounded as a function of (t, x).

Let φ : S → R be a continuous function and θa,b be a non-anticipative controlled stopping timesuch that

W (t, x; a, b) := E[φ(θa,b, Xa,b

t,x (θa,b)) ∣∣∣Ft]

makes sense for every a ∈ At, b ∈ Bt.It follows that:

1. If φ ≤ U and φ(t, x) ≤ C(1 + |x|m) for some C,m, then

U(t, x) ≥ esssupα∈Γ(t)

essinfb∈Bt

W (t, x;α[b], b).

2. If φ ≥ U then

U(t, x) ≤ esssupα∈Γ(t)

essinfb∈Bt

W (t, x;α[b], b).

4.5.1 Optimal stochastic control as a particular case

We now consider the particular case of Theorem 75 in the context of stochastic control. For thatwe set A = a which is obviously compact. In this case player 1 has no choices, hence we have onlyone player, player two, like in stochastic control.

In this situation we have

V (t, x) = essinfb∈Bt

J(t, x; a, b),

and Theorem 75 reads as

Theorem 83 (Weak DPP for stochastic control 1). Let p be the power growth of f , as in (4.8), andconsider δ > 0 such that p+ δ ≥ 2.

Suppose that for each (t, x) and each ε there exists bε(t,x) ∈ Bt such that

V (t, x) ≥ J(t, x; a, bε(t,x)

)− ε, P− a.s.,

and ‖bε(t,x)‖H∞t,p+δis locally bounded as a function of (t, x).

Page 78: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

66 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

Let φ : S → R be a continuous function and θb be a non-anticipative controlled stopping time suchthat

W (t, x; b) := E[φ(θb, Xa,b

t,x (θb)) ∣∣∣Ft]

makes sense for every b ∈ Bt.It follows that:

1. If φ ≥ V and φ(t, x) ≥ −C(1 + |x|m) for some C,m, then

V (t, x) ≤ essinfb∈Bt

W (t, x; b). (4.49)

2. If φ ≤ V then

V (t, x) ≥ essinfb∈Bt

W (t, x; b). (4.50)

In the previous version of the Theorem we have already lifted most of the assumptions that are nolonger needed because we no longer need uniform continuity with respect to a. Yet we can still adjusta few details namely:

• Since, in this case, the condition (ii) of Proposition 68 is immediately satisfied, we conclude thatfor each (t, x) there exists bε(t,x) ∈ Bt such that

V (t, x) ≥ J(t, x; a, bε(t,x)

)− ε, P− a.s..

• The local boundedness of ‖bε(t,x)‖H∞t,p+δand the condition φ(t, x) ≥ −C(1 + |x|m) were there to

ensure uniform convergence with respect to a. Thus, they are no longer required;

• Since now we do not need strategies then there is no need to require θb to be non-anticipative;

• In this context the second inequality is easier to obtain and does not require much regularity ofφ. In fact we only need φ to be measurable. Indeed:

V (t, x) = essinfb∈Bt

E[f(Xa,bt,x (T )

) ∣∣∣Ft]= essinf

b∈Bt

E[E[f(Xa,bt,x (T )

) ∣∣∣Fθb

] ∣∣∣Ft]= essinf

b∈Bt

E[E[f

(Xa,b

θb,Xa,bt,x (θb)

(T )) ∣∣∣Fθb

] ∣∣∣Ft] ,and

E[f

(Xa,b

θb,Xa,bt,x (θb)

(T )) ∣∣∣Fθb

](ω) ≥ essinf

b∈Bθb(ω)

E[f

(Xa,b

θb,Xa,bt,x (θb)

(T )(ω)) ∣∣∣Fθb

](ω)

= V(θb(ω), Xa,b

t,x (θb)(ω))

≥ φ(θb(ω), Xa,b

t,x (θb)(ω)).

Thus,

V (t, x) ≥ essinfb∈Bt

φ(θb, Xa,b

t,x (θb)).

• Due to the previous observation, the regularity of φ now is only needed to prove the firstinequality of the weak dynamic programming principle. In this case φ needs not be continuousbut just lower semi-continuous.

Page 79: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.5. WEAK DYNAMIC PROGRAMMING PRINCIPLE 67

By the previous fourth remark we can, in particular, consider the lower semi-continuous envelope of v,v∗, as φ in the second inequality. Indeed, v∗ is lower semi-continuous, hence measurable, and v∗ ≤ v.For the definition of semi-continuous envelope we refer the reader to Definition 113, in the Appendix.

Taking in account all the previous remarks we get the following version of the dynamic program-ming principle for stochastic control problems:

Theorem 84 (Weak DPP for stochastic control 2). Let φ : S → R and θb be a family of stoppingtimes such that

W (t, x; b) := E[φ(θb, Xa,b

t,x (θb)) ∣∣∣Ft]

makes sense for every b ∈ Bt.It follows that:

1. If φ ∈ LSC(S), φ ≥ V , then

V (t, x) ≤ essinfb∈Bt

W (t, x; b).

2.

V (t, x) ≥ essinfb∈Bt

E[v∗

(θb, Xa,b

t,x (θb)) ∣∣∣Ft] .

This version of the DPP is analogous to the one in [1] for diffusion processes, when controls candepend on the past.

4.5.2 Continuity of the reward function

We will now study the continuity of the reward function J in the context of Definition 73. Moreprecisely we will find conditions under which we can deduce that J has the uniform continuity requiredfor the proof of the weak dynamic programming principle.

Proposition 85. Suppose that A is compact and f is locally Lipschitz. Then J(t, x; a, b) is left-continuous uniformly with respect to t ∈ [0, T ], a ∈ At, b ∈ b ∈ Bt : ‖b‖H∞t,p+δ

≤M, and ω. Here p isthe growth power of f as in (4.8), and δ > 0 is such that p+ δ ≥ 2.

Proof. Consider x, x′ ∈ Rd and t, t′ ∈ [0, T ] such that t′ ≤ t. Let a ∈ At′ and b ∈ b ∈ Bt : ‖b‖H∞t,p+δ≤

M be arbitrary. We use the following notation:

Xs := Xa,bt,x (s), X ′

s := Xa,bt′,x′(s).

Since f is locally Lipschitz there is Kn such that:

|f(x)− f(x′)| ≤ Kn|x− x′|, if |x|, |x′| ≤ n.

Using Holder’s inequality we have

E[|f(XT )− f(X ′

T )|1|XT |>n|Ft′]≤[|f(XT )− f(X ′

T )|p+δ

p

∣∣∣Ft′] pp+δ

E[1|XT |>n|Ft′

] δp+δ .

On one hand we have, by (4.8) and by Jensen’s inequality, that

|f(XT )− f(X ′T )|

p+δp ≤ C

(1 + |XT |p+δ + |X ′

T |p+δ),

and hence, by (4.4) we conclude that[|f(XT )− f(X ′

T )|p+δ

p

∣∣∣Ft′] ≤ C(1 + |x|p+δ + |x′|p+δ +Mp+δ

).

Page 80: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

68 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

On the other hand, we have by Chebyshev’s inequality that

E[1|XT |>n|Ft′

]≤ 1

n2E[|XT |2|Ft′

]≤ C

n2

(1 + |x|2 +M2

)≤ C

n2

(1 + |x|p+δ + |x′|p+δ +Mp+δ

) 2p+δ

≤ C

n2

(1 + |x|p+δ + |x′|p+δ +Mp+δ

).

Thus we have

E[|f(XT )− f(X ′

T )|1|XT |>n|Ft′]≤ C

n2δ

p+δ

(1 + |x|p+δ + |x′|p+δ +Mp+δ

),

and we have an analogous inequality for E[|f(XT )− f(X ′

T )|1|X′T |>n|Ft′

].

Now recall Definition 73. We then have

|E[J(t, x; a, b)|Ft′ ]− J(t′, x′; a, b)| = |E [f(XT )− f(X ′T )|Ft′ ] |

≤ E [|f(XT )− f(X ′T )| |Ft′ ]

= E[|f(XT )− f(X ′

T )|1|XT |,|X′T |≤n|Ft′

]+

+E[|f(XT )− f(X ′

T )|1|XT |,|X′T |≤nc |Ft′

]≤ KnE [|XT −X ′

T | |Ft′ ] +

+C

n2δ

p+δ

(1 + |x|p+δ + |x′|p+δ +Mp+δ

)≤ Kn

(E[|XT −X ′

T |2∣∣∣Ft′]) 1

2+ε

2,

for a fixed n large enough, depending only on K,T,M, |x|, |x′|.By Lemma 148, there is r, such that

∀(t′,x′)∈B(t,x;r|x|) KnE[|XT −X ′

T |2∣∣∣Ft′] 1

2 ≤ ε

2.

Since n depends only on K,T,M, |x|, |x′| we conclude that r depends only on K,T,M, |x|. Thus,J(t, x; a, b) is left-continuous uniformly with respect to t ∈ [0, T ], a ∈ At, b ∈ b ∈ Bt : ‖b‖H∞t,p+δ

≤M,and ω ∈ Ω.

Remark 86. We remark that on the previous Proposition even if t′ = t the rate of convergence

E[|f(XT )− f(X ′

T )|1|XT |,|X′T |≤nc |Ft′

]→ 0

may depend on M . Thus if f is not globally Lipschitz, i.e. if Kn →∞, the rate of convergence of

|E[J(t, x; a, b)|Ft′ ]− J(t′, x′; a, b)| → 0

will in general depend on M as well.If this did not happen, then the inequality

|J(t, x; a, b)− J(t, x′; a, b)| ≤ ε

would be true for all a ∈ At, b ∈ Bt. Thus, as a consequence of the inequality

essinfa,β

(J − J ′) ≤ essinfβ

esssupa

J − essinfβ

esssupa

J ′ ≤ esssupa,β

(J − J ′),

we would get that V (t, .) would be continuous.

Remark 87. If B is compact then we can forget the dependence on M on Proposition 85. Thus, bythe previous Remark, we conclude that if A,B are compact and f is locally Lipschitz then V (t, .) iscontinuous.

Page 81: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.6. HAMILTON-JACOBI-BELLMAN-ISAACS EQUATION 69

4.6 Hamilton-Jacobi-Bellman-Isaacs equation

We know from Theorem 64 that V (t, x) is deterministic, that is, we can think of it as a functionV : S → R. In this section we give a PDE characterization for V by means of a HJBI equation.

We introduce the Hamiltonians H± : S× Rd × Rd×d → R by

H−(t, x, p,X) = infa∈A

supb∈B

Ha,b(t, x, p,X),

H+(t, x, p,X) = supb∈B

infa∈A

Ha,b(t, x, p,X),

where

Ha,b(t, x, p,X) = −〈µ(t, x; a, b), p〉 − 12Tr((σσT

)(t, x; a, b)X

).

Throughout this section we assume that both µ, σ are continuous in all variables. This impliesthat Ha,b(t, x, p,X) is continuous in a, b, t, x, p,X.

Remark 88. If A,B are compact then we know that H+,H− are continuous functions. However ifeither A or B fails to be compact then both the Hamiltonians can be discontinuous. In that case wecan only ensure semi-continuity. Indeed, if A is compact then H− is lower semi-continuous (see [5,p. 148]) and H+ is also lower semi-continuous because it is the supremum of continuous functions.Analogously, if B is compact then H− and H+ are upper semi-continuous.

The main result of this section is the next Theorem which states that V is a discontinuous viscositysolution of the following Hamilton-Jacobi-Bellman-Isaacs equation:

−∂tV +H−(., DV,D2V ) = 0, on (0, T )× Rd−V = −f on T × Rd. (4.51)

We need to consider the notion of discontinuous viscosity solutions because we did not prove yetthat V is a continuous function. For the notion of discontinuous viscosity solutions we refer the readerto the Appendix, section A.3. This notion makes use of the concepts of semi-continuous envelopes,defined in Definition 113 of the Appendix.

Theorem 89. Suppose A is compact. Then

1. V ∗ is a viscosity subsolution of−∂tV ∗ +H−(., DV ∗, D2V ∗) ≤ 0, on (0, T )× Rd−V ∗ ≤ −f on T × Rd;

2. V∗ is a viscosity supersolution of−∂tV∗ + (H−)∗(., DV∗, D2V∗) ≥ 0, on (0, T )× Rd−V∗ ≥ −f on T × Rd.

Proof of 1. We start with the subsolution property. For the boundary condition we just notice that

V ∗(T, x) ≥ V (T, x) = f(x).

In the interior of the domain we proceed by contradiction, that is, we suppose that there is (t0, x0)and φ smooth satisfying

(V ∗ − φ)(t, x) < (V ∗ − φ)(t0, x0) = 0, for all (t, x) ∈ (0, T )× Rd, (t, x) 6= (t0, x0),

such that (−∂tφ+H−(., Dφ,D2φ)

)(t0, x0) ≥ 3δ,

Page 82: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

70 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

for some δ > 0.

Step 1: Find a strategy β that contradicts the weak DPP.

We consider ϕ(t, x) = φ(t, x) + |t− t0|2 + |x− x0|4 and notice that(−∂tϕ+H−(., Dϕ,D2ϕ)

)(t0, x0) ≥ 3δ.

Thus, for each a ∈ A, there is ba ∈ B such that(−∂tϕ+Ha,ba(., Dϕ,D2ϕ)

)(t0, x0) ≥ 2δ,

and since H .,b(t, x) is continuous, for each a, there is a ra such that, for all a ∈ Bra(a),(−∂tϕ+H a,ba(., Dϕ,D2ϕ)

)(t0, x0) ≥ δ.

Since Bra(a) : a ∈ A is an open covering of A, we can find a finite family ai such that⋃i≥1

Bri(ai) = A,

where ri := rai.

Let Λ1 := Br1(a1), Λi+1 := Bri+1(ai+1) \⋃ik=1 Λk, and define

ψ(a) :=∑i≥1

bi1Λi(a),

where bi := bai .Then ψ : A→ B is measurable, bounded and, for all a ∈ A,(

−∂tϕ+Ha,ψ(a)(., Dϕ,D2ϕ))

(t0, x0) ≥ δ. (4.52)

With ψ we can define a strategy for player 2 by β[a]s := ψ(as). We have seen in Example 38 thatβ ∈ ∆(t).

Step 2: Find a non-anticipative controlled stopping time, θa.

Due to (4.52), and since A and ψ(A) are compact, we conclude, by the continuity of Ha,b(.), thatthere is R such that, for all a ∈ A,(

−∂tϕ+Ha,ψ(a)(., Dϕ,D2ϕ))

(t, x) ≥ 0, for all (t, x) ∈ BR(t0, x0). (4.53)

Let (tn, xn) be a sequence in BR(t0, x0) converging to (t0, x0) such that V (tn, xn) → V ∗(t0, x0)and let Xa,n

. := Xa,β[a]tn,xn

(.). For each n, we consider the family of stopping times

θan := infs ≥ tn : (s,Xa,ns ) /∈ BR(t0, x0) < +∞.

Then, for each n, by Example 55, θana∈Atnis a non-anticipative controlled stopping time.

Step 3: Getting the contradiction.

By Ito’s formula, we have

ϕ(θan, X

a,nθa

n

)− ϕ(tn, xn) =

∫ θan

tn

(∂tϕ−Has,β[a]s(., Dϕ,D2ϕ)

)(s,Xa,n

s ) ds+

+∫ θa

n

tn

(Dϕσ(.; as, β[a]s)) (s,Xa,ns ) dWs.

Page 83: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.6. HAMILTON-JACOBI-BELLMAN-ISAACS EQUATION 71

Since σ,Dϕ are continuous and (s,Xa,ns ) is bounded for s ∈ [tn, θan], we have that

E

[∫ θan

tn

(Dϕσ(.; as, β[a]s)) (s,Xa,ns ) dWs

∣∣∣Ftn]

= 0.

On the other hand, β[a]s = ψ(as), hence, by taking expectations, we get

ϕ(tn, xn) = E

[ϕ(θan, X

a,nθa

n

)−∫ θa

n

tn

(∂tϕ−Has,ψ(as)(., Dϕ,D2ϕ)

)(s,Xa,n

s )ds∣∣∣Ftn

]≥ E

[ϕ(θan, X

a,nθa

n

) ∣∣∣Ftn] , (4.54)

where the inequality follows from (4.53).Now we notice that Xa,n

θan∈ ∂BR(t0, x0), hence there is η > 0 such that

ϕ(θan, X

a,nθa

n

)≥ φ

(θan, X

a,nθa

n

)+ 2η. (4.55)

Since ϕ(tn, xn) → ϕ(t0, x0) = V ∗(t0, x0) and V (tn, xn) → V ∗(t0, x0) we conclude that, for n suffi-ciently large,

V (tn, xn) ≥ ϕ(tn, xn)− η. (4.56)

Combining (4.54), (4.55) and (4.56), we conclude that

V (tn, xn) ≥ E[φ(θan, X

a,nθa

n

) ∣∣∣Ftn]+ η.

Since a ∈ Atn is arbitrary we deduce that

V (tn, xn) ≥ esssupa∈Atn

E[φ(θan, X

a,θa

n

) ∣∣∣Ftn]+ η,

hence

V (tn, xn) ≥ essinfβ∈∆(tn)

esssupa∈Atn

E[φ(θan, X

a,β[a]tn,xn

(θan)) ∣∣∣Ftn]+ η,

which contradicts the inequality (4.26) of the weak DPP.

Proof of 2. We now give the proof of the supersolution property. The boundary condition is completelyanalogous to that of the subsolution. In the interior of the domain we proceed again by contradiction,that is, we suppose that there is (t0, x0) and φ smooth satisfying

(V∗ − φ)(t, x) > (V∗ − φ)(t0, x0) = 0, for all (t, x) ∈ [0, T )× Rd, (t, x) 6= (t0, x0),

such that (−∂tφ+ (H−)∗(., Dφ,D2φ)

)(t0, x0) ≤ −3δ,

for some δ > 0.

Step 1: Find a control a that contradicts the weak DPP.

Consider ϕ(t, x) = φ(t, x)− |t− t0|2 − |x− x0|4. Then(−∂tϕ+ (H−)∗(., Dϕ,D2ϕ)

)(t0, x0) ≤ −3δ.

Since (H−)∗ is upper semi-continuous, there is R such that(−∂tϕ+ (H−)∗(., Dϕ,D2ϕ)

)(t, x) ≤ −2δ, for all (t, x) ∈ BR(t0, x0).

Page 84: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

72 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

By (H−)∗ ≥ H−, we then conclude that(−∂tϕ+H−(., Dϕ,D2ϕ)

)(t, x) ≤ −2δ, for all (t, x) ∈ BR(t0, x0).

We now remark that supb∈B Ha,b(., Dϕ,D2ϕ)(t, x) is lower semi-continuous in a and (t, x) because itis the supremum of continuous functions. Taking in account this fact we deduce that

F :=

(t, x; a) ∈ BR(t0, x0)×A :(−∂tϕ+ sup

b∈BHa,b(., Dϕ,D2ϕ)

)(t, x) ≤ −δ

is a closed set and, by the previous inequality, π

BR(t0,x0)(F ) = BR(t0, x0).

Thus, by Proposition 151, there is a measurable function a∗ : BR(t0, x0) → A such that, for allb ∈ B, (

−∂tϕ+Ha∗(t,x),b(., Dϕ,D2ϕ))

(t, x) ≤ −δ ≤ 0, for all (t, x) ∈ BR(t0, x0). (4.57)

We then have by Example 33 that as := a∗(s,Xs) defines an admissible control.

Step 2: Find a non-anticipative controlled stopping time, θb.

Consider

η :=12

min(t,x)∈∂BR(t0,x0)

(φ(t, x)− ϕ(t, x)) > 0.

Let (tn, xn) be a sequence in BR(t0, x0) converging to (t0, x0) such that V (tn, xn) → V∗(t0, x0).Then ϕ(tn, xn) → ϕ(t0, x0) = V∗(t0, x0), hence, for a fixed n sufficiently large,

V (tn, xn) ≤ ϕ(tn, xn) + η. (4.58)

We now define (t, x) := (tn, xn), Xb. := Xa,b

t,x (.) and

θb := infs ≥ t : (s,Xbs) /∈ BR(t0, x0) < +∞.

Then, by Example 55, θbb∈Btis a non-anticipative controlled stopping time.

Step 3: Getting the contradiction.

Let β ∈ ∆(t). By Ito’s formula, we have

ϕ(θβ[a], X

β[a]

θβ[a]

)− ϕ(t, x) =

∫ θβ[a]

t

(∂tϕ−Has,β[a]s(., Dϕ,D2ϕ)

)(s,Xβ[a]

s

)ds+

+∫ θβ[a]

t

(Dϕσ(.; as, β[a]s))(s,Xβ[a]

s

)dWs,

Since σ,Dϕ are continuous and(s,X

β[a]s

)is bounded for s ∈

[t, θβ[a]

], we have that

E

[∫ θβ[a]

t

(Dϕσ(.; as, β[a]s))(s,Xβ[a]

s

)dWs

∣∣∣Ft] = 0.

On the other hand as = a∗, hence, by taking expectations we get

ϕ(t, x) = E

[ϕ(θβ[a], X

β[a]

θβ[a]

)−∫ θβ[a]

t

(∂tϕ−Ha∗(.),β[a]s(., Dϕ,D2ϕ)

)(s,Xβ[a]

s

)ds∣∣∣Ft]

≤ E[ϕ(θβ[a], X

β[a]

θβ[a]

) ∣∣∣Ft] , (4.59)

Page 85: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.6. HAMILTON-JACOBI-BELLMAN-ISAACS EQUATION 73

where the inequality follows from (4.57).Since Xβ[a]

θβ[a] ∈ ∂BR(t0, x0) we have, by definition of η, that

ϕ(θβ[a], X

β[a]

θβ[a]

)≤ φ

(θβ[a], X

β[a]

θβ[a]

)− 2η. (4.60)

Combining (4.58), (4.59) and (4.60), we conclude that

V (t, x) ≤ E[φ(θβ[a], X

β[a]

θβ[a]

) ∣∣∣Ft]− η ≤ esssupa∈At

E[φ(θβ[a], X

β[a]

θβ[a]

) ∣∣∣Ft]− η,

and, since β ∈ ∆(tn) is arbitrary, we deduce that

V (t, x) ≤ essinfβ∈∆(t)

esssupa∈At

E[φ(θβ[a], X

β[a]

θβ[a]

),∣∣∣Ft]− η.

which contradicts the inequality (4.27) of the weak DPP.

Analogously we obtain a PDE characterization for the upper value:

Theorem 90. Suppose B is compact. Then

1. U∗ is a viscosity subsolution of−∂tU∗ +H+

∗ (., DU∗, D2U∗) ≤ 0, on (0, T )× Rd−U∗ ≤ −f on T × Rd;

2. U∗ is a viscosity supersolution of−∂tU∗ +H+(., DU∗, D2U∗) ≥ 0, on (0, T )× Rd−U∗ ≥ −f on T × Rd.

4.6.1 Uniqueness

In this Section we prove uniqueness of solution for (4.51) in the case where both A,B are compact.The procedure will be similar to that of the Laplace equation: we will prove that the difference ofsolutions is a subsolution of an equation for which there is a maximum principle. From that we extractthat the supremum is attained at the boundary and hence is zero.

We assume throughout this Section that A,B are compact. In that case we know that H− iscontinuous, hence H−

∗ = (H−)∗ = H−.

Lemma 91. Suppose that A,B are compact. If v1, v2 are respectively a viscosity subsolution andsupersolution of (4.51) then w = v1 − v2 is a viscosity subsolution of

−∂tw + infa∈A, b∈B Ha,b(.;Dw,D2w) ≤ 0, on (0, T )× Rdw ≤ 0 on T × Rd (4.61)

Proof. Let φ ∈ C1,2([0, T ] × Rd) and fix (t0, x0) ∈ argmax (w − φ). By Remark 101 we can supposethat (t0, x0) is actually a strict maximum, that is:

(w − φ)(t, x) < (w − φ)(t0, x0) = 0, for all (t, x) ∈ BR(t0, x0), (t, x) 6= (t0, x0).

We then consider the function

ψε(t, x, y) = v1(t, x)− v2(t, y)−|x− y|2

ε2− φ(t, x),

and the points (depending on ε)

(t, x, y) ∈ argmax[t0−R,t0+R]×BR(x0)×BR(x0)

ψε(t, x, y).

Since (t0, x0) is the global maximum of w − φ we know, by Lemma 109, that

Page 86: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

74 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

• (t, x, y) → (t0, x0, x0) as ε→ 0;

• |x−y|2ε2 is bounded and converges to 0 as ε→ 0.

It follows that for ε sufficiently small, (t, x, y) ∈ (t0−R, t0+R)×BR(x0)×BR(x0) is a local maximum ofψε, hence by the parabolic version of Ishii’s Lemma, Lemma 119, there exist q1, q2 ∈ R, and X,Y ∈ Sdsuch that

(q1 + ∂tφ(t, x), p+Dφ(t, x), X) ∈ D(2,1),+v1(t, x),(q2, p, Y ) ∈ D(2,1),−v2(t, y),

q1 − q2 = 0,(X 00 −Y

)≤ 3ε2

(I −I−I I

)+(D2φ(t, x) 0

0 0

), (4.62)

where

p =2(x− y)

ε2

Applying vectors of the form (x, x) to the quadratic forms of inequality (4.62) we conclude that

X ≤ Y +D2φ(t, x).

Since v1, v2 are, respectively, a subsolution and a supersolution of (4.51), we have

−q1 − ∂tφ(t, x) +H−(t, x, p+Dφ(t, x), X) ≤ 0,−q2 +H−(t, y, p, Y ) ≥ 0.

Subtracting the 2 inequalities we get

− ∂tφ(t, x) +H−(t, x, p+Dφ(t, x), X)−H−(t, y, p, Y ) ≤ 0. (4.63)

Now we notice that infa supbHa,b − infa supbH ′a,b ≥ infa,b(Ha,b −H ′a,b). Indeed, choose aε such

that infa supbHa,b ≥ supbHaε,b − ε, and choose bε such that supbH ′aε,b ≤ H ′aε,bε + ε. Then:

infa

supbHa,b − inf

asupbH ′a,b ≥ sup

bHaε,b − ε− sup

bH ′aε,b

≥ Haε,bε −H ′aε,bε − 2ε≥ inf

a,b

(Ha,b −H ′a,b)− 2ε.

Applying this inequality on (4.63) we get

−∂tφ(t, x) + infa∈A,b∈B

(Ha,b(t, x, p+Dφ(t, x), X)−Ha,b(t, y, p, Y )

)≤ 0.

We now need to estimate the difference Ha,b(t, x, p+Dφ(t, x), X)−Ha,b(t, y, p, Y ). Following Example112 we conclude that, since σ(t, x; a, b), µ(t, x; a, b) are Lipschitz continuous in x uniformly with respectto t, a, b, then

Ha,b(t, x, p+Dφ(t, x), X)−Ha,b(t, y, p, Y ) ≥ Ha,b(., Dφ,D2φ)(t, x)− C|x− y|2

ε2,

where C is a constant depending only on K.Thus,

−∂tφ(t, x) + infa∈A,b∈B

Ha,b(t, x, Dφ(t, x), D2φ(t, x)

)− C

|x− y|2

ε2≤ 0.

Since A,B are compact then infa,bHa,b(.) is continuous, hence we can let ε→ 0 to conclude that

−∂tφ(t0, x0) + infa∈A,b∈B

Ha,b(t0, x0, Dφ(t0, x0), D2φ(t0, x0)) ≤ 0.

Page 87: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.6. HAMILTON-JACOBI-BELLMAN-ISAACS EQUATION 75

Since the state space S is unbounded in the space variable, we need to impose growth conditionsto prove uniqueness. Thus, we will prove uniqueness for (4.51) in the space of functions:

Θ =w : S → R : ∃C > 0 such that lim

|x|→+∞w(t, x) exp

(−CΨ(x)

)= 0, uniformly in t

,

where

Ψ(x) :=(log(|x|2 + 1) + 1

)2.

We remark that this space of functions contains the functions with polynomial growth.Before proving the maximum principle for (4.61) we need the following auxiliary Lemma:

Lemma 92. For any C, there exists C1 > 0 such that, for all t2, the function

χ(t, x) := exp((C1(t2 − t) + C)Ψ(x)

)satisfies

−∂tχ+ infa∈A,b∈B

Ha,b(., Dχ,D2χ) > 0 in [t1, t2]× Rd,

where t1 := t2 − CC1

.

Proof. By a direct calculation we have

DΨ(x) = 4log(|x|2 + 1) + 1

|x|2 + 1xT ,

D2Ψ(x) = −8log(|x|2 + 1)(|x|2 + 1)2

xxT + 4log(|x|2 + 1) + 1

|x|2 + 1I.

Thus,

|DΨ(x)| < C

√Ψ(x)

1 + |x|, |D2Ψ(x)| < C

√Ψ(x)

1 + |x|2.

Let C1 > 0 be arbitrary and t1 := t2 − CC1

. Then, for t ∈ [t1, t2],

|Dχ(t, x)| = χ(t, x)(C1(t2 − t) + C)|DΨ(x)|≤ 2Cχ(t, x)|DΨ(x)|

< Cχ(t, x)

√Ψ(x)

1 + |x|,

and, similarly,

|D2χ(t, x)| =∣∣∣(C1(t2 − t) + C)

((DΨ(x))TDχ(t, x) + χ(t, x)D2Ψ(x)

)∣∣∣≤ Cχ(t, x)

Ψ(x)1 + |x|2

.

Notice that the constant C never depends on C1 by virtue of the choice of t1. On the other hand, ∂tχdepends on C1 in a crucial way:

∂tχ(t, x) = −C1Ψ(x)χ(t, x).

By the growth conditions on µ, σ, we then conclude that

−∂tχ(t, x) + infa∈A,b∈B

Ha,b(t, x,Dχ,D2χ) ≥ (C1 − C)Ψ(t, x)χ(t, x)

> 0,

for C1 > C large enough.

Page 88: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

76 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

We can now prove the maximum principle for (4.61).

Theorem 93. Suppose that A,B are compact and let w ∈ Θ be a continuous subsolution of (4.61).Then

sup[0,T ]×Rd

w = supT×Rd

w.

Proof. Since w ∈ Θ there is C such that

lim|x|→∞

w(t, x) exp(−CΨ(x)

)= 0.

Let t2 := T , and set t1 and χ as in Lemma 92. Then, for all ε,

Mε := max[t1,T ]×Rd

w − εχ

is attained at some point (tε, xε).Suppose that there is ε such that tε < T . We will proceed towards a contradiction. By the

definition of (tε, xε) we have,

w(t, x) ≤ φ(t, x) := εχ(t, x) + (w − εχ)(tε, xε),

hence (w− φ)(t, x) ≤ 0 = (w− φ)(tε, xε) for all t ∈ [t1, T ]. Since w is a viscosity subsolution of (4.61)and tε ∈ [t1, T ) we conclude that(

−∂tφ+ infa∈A,b∈B

Ha,b(., Dφ,D2φ))

(tε, xε) ≤ 0.

Notice that to obtain the previous inequality we had to extend the viscosity property of subsolutionsup to the initial time t1. This is made possible by Proposition 117 since w is continuous.

But since ∂tφ = ε∂tχ, Dφ = εDχ, D2φ = εD2χ and tε ∈ [t1, T ] we have by Lemma 92 that(−∂tφ+ inf

a∈A,b∈BHa,b(., Dφ,D2φ)

)(tε, xε) = ε

(−∂tχ+ inf

a∈A,b∈BHa,b(., Dχ,D2χ)

)(tε, xε) > 0,

which is a contradiction.We conclude that for all ε, tε = T and hence

(w − εχ)(t, x) ≤ (w − εχ)(tε, xε) ≤ supT×Rd

w,

for all t ∈ [t1, T ].Thus, taking ε→ 0, we conclude that

sup[t1,T ]×Rd

w ≤ supT×Rd

w.

Repeating the same argument with t2 := t1 we conclude that

supht1− C

C1,t1

i×Rd

w ≤ supt1×Rd

w ≤ sup[t1,T ]×Rd

w ≤ supT×Rd

w,

and, by iteration, we get

sup[0,T ]×Rd

w ≤ supT×Rd

w.

Page 89: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

4.6. HAMILTON-JACOBI-BELLMAN-ISAACS EQUATION 77

As an immediate Corollary of Lemma 91 and Theorem 93 we get the comparison principle forsolutions of (4.51).

Corollary 94 (Comparison principle for the HJBI equation). Suppose A,B are compact. If v1, v2 ∈ Θare continuous functions such that v1 is a viscosity subsolution and v2 is a viscosity supersolution of(4.51) then v1 ≤ v2.

From the comparison result we are able to prove that there is a unique viscosity solution in Θ to(4.51).

Theorem 95. Suppose A,B are compact. Then there is an unique solution v ∈ Θ to (4.51).

Proof. Suppose there exist two solutions v1, v2 to (4.51). Then each one is continuous and both areviscosity subsolutions and supersolutions, hence

v1 ≤ v2,

v2 ≤ v1.

Since, by Remark 71, the value function V (t, x) has polynomial growth then V ∈ Θ. Furthermore,in the case of A,B compact we have by Remark 87 that V is continuous. Thus, it is the uniquesolution to (4.51).

As a Corollary of the comparison principle we also conclude that V ≤ U .

Corollary 96. Suppose A,B are compact. Then

V ≤ U.

Proof. We have seen that V is a solution of (4.51). Since H− ≥ H+, we conclude that V is asubsolution to

∂tV +H+(., DV,D2V ) ≤ 0.

Since U is a solution to the above equation, we conclude by the comparison principle that V ≤ U .

By uniqueness of solution we get, under the Isaacs’ condition, the existence of value for thedifferential game.

Corollary 97 (Isaacs’ condition). Suppose A,B are compact. If H+ = H− then V = U , that is, thegame has a value.

Page 90: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

78 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

4.7 Merton’s optimal portfolio: worst-case approach

In this Section we consider a worst-case approach on Merton’s optimal portfolio problem in thesetting of stochastic parameters. We will do this by transforming this problem into a differential game.

Recall Merton’s problem, studied in Section 2.7. To convert it into a differential game we considertwo players: the first one controls π and wants to maximize the utility of the portfolio’s terminal valuewhile the second controls µ, σ and wants to minimize it. We suppose that the interest rate, r, is fixed.More precisely, our state variable has dynamics given by (4.1) with coefficients6

µ(t, x;π, µ, σ) := x(πµ+ (1− π)r),σ(t, x;π, µ, σ) := xπσ,

and the control spaces are At = H∞(t, T ; [π0, π1]), Bt = H∞(t, T ;B), with B a compact set to bespecified later. The reward function is

J(t, x;π, µ, σ) := E[f(Xπ,(µ,σ)t,x (T )

) ∣∣∣Ft] ,where f(x) = xp is the power utility function (0 < p < 1).

For the worst-case approach we consider the upper value of this stochastic differential game,

U(t, x) := esssupπ(.)∈Γ(t)

essinf(µ,σ)∈Bt

J(t, x;π(µ, σ), µ, σ),

which we know already to be a deterministic function. Moreover, we know that U is the uniqueviscosity solution to

−∂tU + sup(µ,σ)∈B

infπ∈[π0,π1]

Hπ,(µ,σ)(., DU,D2U) = 0,

where

Hπ,(µ,σ)(., DU,D2U) = −µ(.;π, µ, σ)DU − 12σ(.;π, µ, σ)2D2U.

Using the same arguments as in Section 2.7, we conclude that U(t, x) = f(x)U(t, 1) and

infπ∈[π0,π1]

Hπ,(µ,σ)(., DU,D2U) = Hπ∗(µ,σ),(µ,σ)(., DU,D2U),

where

π∗(µ, σ) = min(π1,

(µ− r)(1− p)σ2

∨ π0

).

It remains to obtain h(t) := U(t, 1) which, due to the previous considerations, is a solution to

−h′ − p(r − λ∗)h = 0,

where

λ∗ = sup(µ,σ)∈B

−(µ− r)π∗(µ, σ) + (1− p)σ2

2(π∗(µ, σ))2.

Thus,

u(t, x) = xpep(T−t)(r−λ∗).

6To avoid the use of many letters we will use the same letter for the controls and the values that they take. Forexample we use π for an element of H∞(t, T ; [π0, π1]) and also for an element of [π0, π1].


In the following we assume π0 ≤ 0 ≤ π1. To compute λ∗ we notice that

λ∗ = max(λ1, λ2, λ3, λ4),

where

λ1 = sup_{(µ,σ)∈B : (µ−r)/((1−p)σ²) ≤ π0} [ −(µ − r)π0 + ((1 − p)σ²/2) π0² ]
   ≤ sup_{(µ,σ)∈B : (µ−r)/((1−p)σ²) ≤ π0} [ −((µ − r)/2) π0 ] ≤ 0,

λ2 = sup_{(µ,σ)∈B : π0 ≤ (µ−r)/((1−p)σ²) ≤ 0} [ −(µ − r)² / (2(1 − p)σ²) ] ≤ 0,

λ3 = sup_{(µ,σ)∈B : 0 ≤ (µ−r)/((1−p)σ²) ≤ π1} [ −(µ − r)² / (2(1 − p)σ²) ] ≤ 0,

λ4 = sup_{(µ,σ)∈B : (µ−r)/((1−p)σ²) ≥ π1} [ −(µ − r)π1 + ((1 − p)σ²/2) π1² ]
   ≤ sup_{(µ,σ)∈B : (µ−r)/((1−p)σ²) ≥ π1} [ −((µ − r)/2) π1 ] ≤ 0.

Thus λ* ≤ 0. To proceed any further we need to consider a particular set B. We take B to be a rectangle in the product space, that is, B = [µ0, µ1] × [σ0, σ1]. It is then easy to see that the optimal (µ, σ) must be such that |µ − r| is minimal and σ is maximal.
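This computation of λ* on a rectangle is easy to carry out numerically. The following Python sketch (an illustration added here, with assumed parameter values; it is not part of the original derivation) evaluates λ* by brute force over a grid on B and compares it with the candidate (µ, σ) described above:

import numpy as np

# Illustrative parameters (assumptions, not from the thesis).
p, r = 0.5, 0.02                          # utility exponent, interest rate
pi0, pi1 = -0.5, 1.5                      # portfolio bounds, pi0 <= 0 <= pi1
mu0, mu1, s0, s1 = 0.00, 0.08, 0.1, 0.4   # B = [mu0, mu1] x [s0, s1]

def pi_star(mu, sigma):
    # Optimal portfolio: projection of the Merton ratio onto [pi0, pi1].
    return min(pi1, max(pi0, (mu - r) / ((1 - p) * sigma**2)))

def integrand(mu, sigma):
    ps = pi_star(mu, sigma)
    return -(mu - r) * ps + 0.5 * (1 - p) * sigma**2 * ps**2

# Brute-force sup over a grid approximating B.
grid_mu = np.linspace(mu0, mu1, 201)
grid_s = np.linspace(s0, s1, 201)
lam_star = max(integrand(m, s) for m in grid_mu for s in grid_s)

# Worst case: |mu - r| minimal and sigma maximal, as argued above.
mu_w = min(max(r, mu0), mu1)
print(lam_star, integrand(mu_w, s1))      # the two agree (both <= 0)

def U(t, x, T=1.0):
    # Closed-form upper value U(t, x) = x^p exp(p (T - t)(r - lam_star)).
    return x**p * np.exp(p * (T - t) * (r - lam_star))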


4.8 Conclusions and further research

We have extended the weak dynamic programming principle of [1] to the context of two-player zero-sum stochastic differential games. We used this principle to show, under weaker assumptions than those existing in the literature, that both value functions are viscosity solutions of the associated Hamilton-Jacobi-Bellman-Isaacs equations.

In this Section we compare the assumptions made to deduce our main results with the ones typically made in the literature, in particular those of the pioneering paper by Fleming and Souganidis [2] and those of the more recent paper by Buckdahn and Li [10]. We also compare this version of the weak dynamic programming principle with the original one, obtained for the stochastic optimal control problem. Our objective in this comparison is to stress the main differences and the difficulties encountered when extending the principle. Finally, we indicate some directions in which further research can be carried out to complete this work.

A more general scenario

There are two points where the setting considered in this thesis is more general than the one in [2, 10]: we consider a larger control space for one of the players and a larger set of payoff functions. Indeed, to derive that the lower value function is a viscosity solution of the HJBI equation, the set B is allowed to be unbounded and the payoff function f only needs to be locally Lipschitz with polynomial growth (as opposed to globally Lipschitz).

If this approach is to be applied to both the lower and upper values at the same time, then stronger conditions must be considered. Indeed, for our results to apply to the lower value we need A compact, while for the upper value we need B compact. Thus, to apply our results to the lower and upper values simultaneously, both A and B need to be compact sets. However, there are situations where only one of the value functions is of interest, as in worst-case approaches such as the one studied in Section 4.7. In such a situation our approach provides a more general framework.

Comparison with stochastic optimal control

In Section 4.5.1 we view the stochastic optimal control problem as a particular case of stochastic differential games. There we see that many assumptions used to deduce the weak dynamic programming principle are no longer necessary.

In fact, differential games are much more delicate than optimal control problems. The main reason for this is the existence, in a differential game, of two optimization problems: a minimization and a maximization. Because of this, none of the inequalities in the dynamic programming principle is deduced "for free". Furthermore, since the two optimization problems have opposite objectives, we need extra regularity on the reward function in order to control the effect that the actions of one player might have upon the other player's choices.

Another factor that makes differential games more difficult than optimal control is the non-anticipativity property. Indeed, when proving the dynamic programming principle we need to construct strategies that are non-anticipative, which makes the proof more technical. For example, just to formulate the weak dynamic programming principle correctly in this new context, we had to introduce the notion of non-anticipative controlled stopping time. Introducing this and other related notions, and studying some of their properties, took up a large part of Section 4.3.6.

Some directions of further research

We indicate four main directions in which further research can be taken:

• Regarding the existence of uniformly ε−optimal strategies, βε(t,x) (recall Definition 74). One question that remains open is whether A compact automatically implies the existence of such strategies. Some steps we took in this direction can be found in Proposition 68, where sufficient conditions were deduced. One of these conditions implies, in particular, that if there are ε−optimal Markov control policies for the player allowed to use strategies, then there exist uniformly ε−optimal strategies.

Another question to be answered has to do with the norm of these strategies, ‖βε(t,x)‖p. Indeed, in the proof of the weak dynamic programming principle we assume that ‖βε(t,x)‖p is locally bounded as a function of (t, x). Finding sufficient conditions for this property to hold is thus an important issue. In fact, we would like to find conditions for the stronger property

∃ C, m : ‖βε(t,x)‖p ≤ C(1 + |x|^m)

to hold. We assumed this result to prove that the value function has polynomial growth.

• Extending the result to the case where the set A is unbounded. We assume that A is compact so as to ensure that the reward function has the required continuity, which must be uniform with respect to the control of the first player. We discuss the need for this condition in Remark 81. It would be interesting to lift this assumption so as to be able to consider both A and B unbounded. If this is not possible, it would then be interesting to find counterexamples.

• Considering even larger control spaces. The space of admissible controls considered in this thesis is the space of essentially bounded controls, H∞. We chose this space so as to be able to use Lemma 148. As stated in a footnote at the beginning of Section 4.3.2, we could have considered Hp,∞ for p > 2 (even though in that case we would have to restrict the set of considered payoff functions). A question that remains open is whether we can take p = 2 and consider H2,∞.

In the literature on stochastic optimal control a weaker integrability condition on the controls is often imposed. For example, in the reference for stochastic optimal control [4, p. 153], the admissible controls must be such that all their moments are finite.

• Proving uniqueness of solution for the HJBI equation when A or B is not compact. Regarding uniqueness, in this thesis we only considered the case where both A and B are compact. If one of these sets fails to be compact then the Hamiltonian can be discontinuous, and the arguments in the proof of uniqueness need to be modified.

If a comparison result is proved for the HJBI equation with discontinuous Hamiltonian, then we would have as a consequence

V(t, x) ≤ V^*(t, x) ≤ V_*(t, x) ≤ V(t, x),

forcing equality throughout, which proves that V is continuous.

Page 94: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

82 CHAPTER 4. STOCHASTIC DIFFERENTIAL GAMES

4.9 Notation

In this Section we review the main notation used throughout this Chapter.

Ω - Space of continuous functions from [0, T ] to RN starting from 0;

ω - Denotes a generic element ω ∈ Ω;

F - Borel σ−algebra in Ω;

Λ,Π - Generic elements of F ;

P - Wiener measure in Ω;

NP - P−null sets;

W. - N−dimensional Brownian motion in (Ω, F, P) corresponding to the coordinate process, Ws(ω) = ωs;

F - Natural filtration induced by W augmented with the P−null sets, F = {Fs, 0 ≤ s ≤ T};

T - Collection of all stopping times in F;

τ, θ - Generic elements of T ;

S - State space, S = [0, T ]× Rd;

(t, x), (t′, x′), (s, y) - Generic elements of S;

A,B - Sets where the controls of players 1 and 2, respectively, take values;

At,Bt - Space of controls, starting at time t, for players 1 and 2, respectively;

a, b - Denote, respectively, generic elements of both A,B and At,Bt;

Γ(t),∆(t) - Space of strategies, starting at time t, for players 1 and 2, respectively;

α, β - Denote, respectively, generic elements of Γ(t),∆(t);

Xa,bt,x (.) - Controlled state process taking values in Rd;

µ(t, x; a, b) - Growth rate of the state process, µ : S×A×B → Rd;

σ(t, x; a, b) - Volatility of the state process, σ : S×A×B → Rd×N ;

K - Lipschitz constant;

C - Generic constant depending at most on K, T;

f(x) - Payoff function, f : Rd → R;

p - Growth power of f , i.e., p is such that |f(x)| ≤ C(1 + |x|p).

J(t, x; a, b) - Reward function, J(t, x; a, b) := E[ f(X^{a,b}_{t,x}(T)) | Ft ];

V(t, x), U(t, x) - Lower and upper values of the stochastic differential game:

V(t, x) := essinf_{β∈∆(t)} esssup_{a∈At} J(t, x; a, β[a]),
U(t, x) := esssup_{α∈Γ(t)} essinf_{b∈Bt} J(t, x; α[b], b).

U - Generic space of controls. Typically, U = At, U = Bt or U = At × Bt;


ν - Generic element ν ∈ U ;

Br(x) - Ball of radius r centered at x ∈ Rd;

B(t, x; r) := [t− r, t]×Br(x);

φ, ϕ - C1,2 test functions;

Θ - Space of functions where uniqueness for the HJBI equation will hold;

I - Identity matrix of dimensions d× d;

USC(Q), LSC(Q) - Spaces of functions which are, respectively, upper semi-continuous or lower semi-continuous in Q ⊂ S.

graph(ψ) - Denotes the graph of a real-valued function ψ : S → R, that is,

graph(ψ) := {(x, ψ(x)) : x ∈ S} ⊂ S × R.

πA(F ) - Denotes the projection of F ⊂ A×B in A;

t1 ∧ t2, t1 ∨ t2 - Denotes, respectively, min(t1, t2) and max(t1, t2);

X =Λ Y - Abbreviation for 1ΛX = 1ΛY , where X,Y are random variables and Λ ∈ F ;

X ≡Λ Y on [τ1, τ2] - Abbreviation for P(1ΛXs = 1ΛYs; s a.e. on [τ1, τ2]) = 1, where X, Y are stochastic processes, τ1, τ2 ∈ T, and Λ ∈ F;

X ≡ Y on [τ1, τ2] - Abbreviation for X ≡Ω Y on [τ1, τ2];

ν1 ⊕θ ν2 - Denotes the concatenation of controls ν1, ν2 ∈ U at a stopping time θ ∈ T , that is

(ν1 ⊕θ ν2)s := (ν1)s1[t,θ](s) + (ν2)s1(θ,T ](s).


Appendix A

Viscosity solutions

In this Appendix we briefly survey important definitions and some key results of the theory of viscosity solutions of second order partial differential equations. The exposition follows closely [11].

A.1 Notion of viscosity solution

The theory of viscosity solutions applies to second order partial differential equations of the type

H(x, v,Dv,D2v) = 0, (A.1)

where H(x, r, p,X) is a continuous function that satisfies the following monotonicity condition:

H(x, r, p,X) ≤ H(x, s, p, Y ), when r ≤ s, X ≥ Y.

When this condition is satisfied we say that H is proper.

Example 98. To illustrate the scope of the theory we present some examples:

• First order equations, H(x, v,Dv) = 0, where H(x, r, p) is nondecreasing in r ∈ R.

• Hamilton-Jacobi-Bellman equations of first and second order:

H(x, r, p, X) = sup_α ( −〈µ(x; α), p〉 − Tr((σσᵀ)(x; α)X) + c(x; α)r ),

where c(x; α) ≥ 0.

Remark 99. If for each t, H(t, x, r, p, X) is proper, then so is the associated parabolic equation

−∂tv + H(.; v, Dv, D²v) = 0,    (A.2)

which we write as F(t, x, v, ∂tv, Dv, D²v) = 0.

In general, possibly degenerate elliptic or parabolic equations such as (A.1) or (A.2) do not admit smooth solutions. For example, using the method of characteristics it is easy to construct instances of (A.1) that do not admit classical solutions; see [12].

Through the use of test functions we will be able to consider merely continuous (or even discontinuous) solutions to these equations, in the following sense:

Definition 100. Let Q ⊂ Rd. Then v is a viscosity subsolution of (A.1) in Q if v ∈ USC(Q) and, for each ϕ ∈ C²(Q),

H(., v, Dϕ, D²ϕ)(x) ≤ 0,    (A.3)

at every x ∈ Q such that v − ϕ has a local maximum at x on Q.


Similarly, v is a viscosity supersolution of (A.1) in Q if v ∈ LSC(Q) and, for each ϕ ∈ C²(Q),

H(., v, Dϕ, D²ϕ)(x) ≥ 0,    (A.4)

at every x ∈ Q such that v − ϕ has a local minimum at x on Q.

We say that v is a viscosity solution of (A.1) if it is both a viscosity subsolution and a viscosity supersolution.
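Example (a standard illustration added here; it is not in the original text). Consider the eikonal equation |Dv| − 1 = 0 on Q = (−1, 1), which is proper, and take v(x) = 1 − |x|. Away from x = 0, v is smooth and |v′| = 1. At x = 0, if v − ϕ has a local maximum for some ϕ ∈ C²(Q) then necessarily |Dϕ(0)| ≤ 1, so the subsolution inequality |Dϕ(0)| − 1 ≤ 0 holds; and no ϕ ∈ C²(Q) can touch the concave kink of v from below at 0, so the supersolution condition is vacuous there. Hence v is a viscosity solution. By contrast, ṽ(x) = |x| − 1 is not a supersolution: taking ϕ ≡ −1, ṽ − ϕ has a local minimum at 0 while |Dϕ(0)| − 1 = −1 < 0.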

We now make a few remarks on the definition of subsolution. The same remarks also apply tosupersolutions.

Remark 101. Let ϕ be a test function, and x̄ ∈ argmax_Q(v − ϕ). Let R be such that

(v − ϕ)(x) ≤ (v − ϕ)(x̄), for all x ∈ BR(x̄).

We make some observations regarding the nature of this maximizer and the definition of subsolution:

• For the definition of subsolution we can consider that x̄ is a strict maximizer. Indeed, v − ϕ̄ has a strict local maximum at x̄, where ϕ̄(x) := ϕ(x) + |x − x̄|⁴. Furthermore, since Dϕ̄(x̄) = Dϕ(x̄) and D²ϕ̄(x̄) = D²ϕ(x̄), the viscosity property (A.3) is valid for ϕ iff it is valid for ϕ̄.

• Similarly, we can consider that v(x̄) = ϕ(x̄). If not, we can take the test function to be ϕ̄(x) := ϕ(x) + v(x̄) − ϕ(x̄).

• If v is bounded above by some positive function f ∈ C²(Q), we can consider that x̄ is a global maximizer. Indeed, let η_{r1,r2} denote a C² cutoff function such that η_{r1,r2} = 1 on B_{r1}(x̄) and η_{r1,r2} = 0 on (B_{r2}(x̄))^c. Define

ϕ̄(x) := η_{R/2,R}(x)ϕ(x) + (1 − η_{R/2,R}(x))f(x).

By the previous observation we can assume that ϕ(x̄) = v(x̄). Then, it is immediate that

(v − ϕ̄)(x) ≤ 0 = (v − ϕ̄)(x̄),

hence x̄ is a global maximizer of v − ϕ̄. Since the derivatives of ϕ, ϕ̄ at x̄ are equal, we conclude that ϕ satisfies the viscosity property (A.3) iff ϕ̄ does.

Remark 102. In Definition 100 we have taken the space of test functions to be C²(Q). Due to the continuity of H, instead of C²(Q) we can consider any dense subspace of C²(Q) (with respect to the C² norm). In particular, we can take C∞(Q) as the space of test functions.

Indeed, if v − ϕ has a strict local maximum at x̄ and ϕn → ϕ in C², then there is a sequence xn → x̄ such that v(xn) → v(x̄) and v − ϕn has a local maximum at xn. Thus, by continuity of H, ϕ and upper semi-continuity of v, we get

H(x̄, v(x̄), Dϕ(x̄), D²ϕ(x̄)) = lim H(xn, v(xn), Dϕn(xn), D²ϕn(xn)) ≤ 0.

We want the definition of viscosity solution to be compatible with that of classical solution. More precisely, if v ∈ C²(Q) is a solution of (A.1) in the classical sense, then it should also be a solution in the viscosity sense. If Q is an open set then this is indeed true.

Proposition 103. Suppose Q is an open set. If v ∈ C²(Q) is a classical solution of (A.1) then it is also a viscosity solution.

Proof. Let v ∈ C²(Q) be a classical solution of (A.1). We prove that v is a viscosity subsolution of (A.1); similarly, one proves that v is also a viscosity supersolution.

Consider ϕ ∈ C²(Q) and x̄ ∈ argmax(v − ϕ). Then, since v, ϕ ∈ C²(Q) and Q is an open set, the following hold:

Dv(x̄) = Dϕ(x̄),    D²v(x̄) ≤ D²ϕ(x̄).


Since H is proper, we then get

H(., v, Dϕ, D²ϕ)(x̄) ≤ H(., v, Dv, D²v)(x̄) = 0.

Thus, v is a viscosity subsolution of (A.1).

Using these test functions we can give a sense to (Dv, D²v) for a non-differentiable function v. This goes as follows:

Definition 104. Let v be upper semi-continuous in Q. The second-order superdifferential of v at x is the set

D^{2,+}_Q v(x) := { (Dϕ(x), D²ϕ(x)) : ϕ ∈ C²(Q) such that v − ϕ has a local maximum at x }.

Let v be lower semi-continuous in Q. The second-order subdifferential of v at x is the set D^{2,−}_Q v(x) := −D^{2,+}_Q(−v)(x).

The definition of viscosity solutions can then be made in terms of the superdifferentials and subdifferentials.

Remark 105. If x is an interior point of Q then D^{2,+}_Q v(x) does not depend on Q. More precisely, if x ∈ int Q ∩ int Q′ then D^{2,+}_Q v(x) = D^{2,+}_{Q′} v(x). In this case we define D^{2,+}v(x) := D^{2,+}_Q v(x).

Remark 106. If v − ϕ has a local maximum at x̄ then v(x) ≤ ϕ(x) + v(x̄) − ϕ(x̄) and, by the Taylor expansion of ϕ, we get

v(x) ≤ v(x̄) + 〈Dϕ(x̄), x − x̄〉 + (1/2)〈D²ϕ(x̄)(x − x̄), x − x̄〉 + o(|x − x̄|²).

Thus,

D^{2,+}_Q v(x̄) ⊂ { (p, X) : lim sup_{Q∋x→x̄} [ v(x) − v(x̄) − 〈p, x − x̄〉 − (1/2)〈X(x − x̄), x − x̄〉 ] / |x − x̄|² ≤ 0 }.

In fact, the other inclusion is also true, hence we get an alternative definition for D^{2,+}_Q v.

Remark 107. If v ∈ C² then D^{2,+}v(x) = { (Dv(x), D²v(x) + X) : X ≥ 0 }.

A.2 Uniqueness for the Dirichlet problem

In this Section we suppose that Q is a compact set. We will study uniqueness of solutions of (A.1) in the case of a Dirichlet boundary condition on ∂Q.

We start by recalling the argument which establishes a comparison principle between classical subsolutions and supersolutions.

Proposition 108 (Classic comparison principle). Suppose that H(x, r, p, A) is strictly increasing in r. Let u, v be, respectively, a subsolution and a supersolution of (A.1) such that u|∂Q = v|∂Q. Then

u ≤ v.

Proof. Let w := u − v. The result follows from the following maximum principle:

max_Q w = max_{∂Q} w = 0.    (A.5)

To establish (A.5) let x̄ ∈ argmax_Q w and suppose, by contradiction, that w(x̄) > 0. Then x̄ ∈ int Q and

D(u − v)(x̄) = 0,    D²(u − v)(x̄) ≤ 0.


By the properness of H, we then get

H(., u, Dv, D²v)(x̄) ≤ H(., u, Du, D²u)(x̄) ≤ 0 ≤ H(., v, Dv, D²v)(x̄).

Since H(x, r, p, A) is strictly increasing in r and we are assuming that w(x̄) = u(x̄) − v(x̄) > 0, we also have

H(., v, Dv, D²v)(x̄) < H(., u, Dv, D²v)(x̄),

which is a contradiction. Thus, we conclude that max_Q w ≤ 0 = max_{∂Q} w.

In the case where u, v are merely semi-continuous we cannot replicate the argument of the previous Proposition because we cannot differentiate w. To get around the problem we double the number of variables and penalize the doubling. More precisely, we consider

Mα := max_{Q×Q} [ u(x) − v(y) − (α/2)|x − y|² ],    (A.6)

which is attained at some point (xα, yα). By penalization we mean taking α → ∞, which implies that we approximate the maximization of u − v in Q. More precisely, we have the following Lemma:

Lemma 109. Suppose u, −v are upper semi-continuous. Let Mα, xα, yα be defined as before, and suppose that xα → x̄ as α → ∞. Then, as α → ∞,

α|xα − yα|² → 0,
Mα → max_Q (u(x) − v(x)),
x̄ ∈ argmax_Q (u(x) − v(x)).

Proof. We start by noting that Mα ≥ max_Q (u(x) − v(x)). This implies that α|xα − yα|² remains bounded. Hence we have

|xα − yα| → 0.

Since xα → x̄, we must also have yα → x̄.

Since u(x) − v(y) is upper semi-continuous and (xα, yα) → (x̄, x̄), we conclude that, for α large enough,

Mα ≤ u(x̄) − v(x̄) + ε − (α/2)|xα − yα|² ≤ max_Q (u(x) − v(x)) + ε.

Thus, due to the arbitrariness of ε, we conclude that

Mα → max_Q (u(x) − v(x)).

Furthermore, for α large enough,

(α/2)|xα − yα|² = u(xα) − v(yα) − Mα ≤ max_Q (u(x) − v(x)) + ε − Mα → ε.

Thus, α|xα − yα|² → 0.

It remains to show that x̄ ∈ argmax_Q(u − v). For that we recall that, for α large enough,

u(x̄) − v(x̄) ≥ Mα − ε → max_Q (u(x) − v(x)) − ε.

Thus, u(x̄) − v(x̄) = max_Q (u(x) − v(x)).
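A simple numerical experiment (an illustration added here, not part of the original text) makes the behaviour of the penalization visible. With u(x) = x and v(y) = y on Q = [−1, 1] we have max_Q(u − v) = 0, while the penalized maximizers separate: xα = 1, yα = 1 − 1/α and Mα = 1/(2α).

import numpy as np

# Penalization (A.6) for u(x) = x, v(y) = y on Q = [-1, 1].
xs = np.linspace(-1.0, 1.0, 1001)
for alpha in [1.0, 10.0, 100.0]:
    diff = xs[:, None] - xs[None, :]
    vals = diff - 0.5 * alpha * diff**2     # u(x) - v(y) - alpha/2 |x - y|^2
    i, j = np.unravel_index(np.argmax(vals), vals.shape)
    # M_alpha -> max_Q(u - v) = 0 and alpha |x_a - y_a|^2 -> 0, as in Lemma 109.
    print(alpha, vals[i, j], alpha * (xs[i] - xs[j])**2)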


If u, v were both C² then we could compare the derivatives of u, v and (α/2)|x − y|² at (xα, yα) to get Du(xα) = Dv(yα) = α(xα − yα) and

[ Xα  0 ; 0  −Yα ] ≤ α [ I  −I ; −I  I ],

where Xα = D²u(xα), Yα = D²v(yα) and I stands for the identity matrix. The previous inequality implies in particular that Xα ≤ Yα.

For u, −v merely upper semi-continuous it is still possible to obtain a perturbed version of the previous inequality. This result is due to Ishii and is a key step in the proof of uniqueness of viscosity solutions.

Lemma 110 (Ishii's Lemma). Let Q be a locally compact set, u, −v ∈ USC(Q), and ϕ be twice differentiable in a neighborhood of Q × Q. Suppose that u(x) − v(y) − ϕ(x, y) has a local maximum at (x̄, ȳ). Then for each ε > 0 there exist X, Y such that

(Dxϕ(x̄, ȳ), X) ∈ D^{2,+}_Q u(x̄),
(−Dyϕ(x̄, ȳ), Y) ∈ D^{2,−}_Q v(ȳ),

and

[ X  0 ; 0  −Y ] ≤ A + εA²,

where A := D²ϕ(x̄, ȳ).

If in the previous Lemma we consider ϕ(x, y) := (α/2)|x − y|², (x̄, ȳ) = (xα, yα) and ε = 1/α, we get

Dxϕ(xα, yα) = −Dyϕ(xα, yα) = α(xα − yα),

[ Xα  0 ; 0  −Yα ] ≤ 3α [ I  −I ; −I  I ].    (A.7)

In particular, as in the case of u, v twice differentiable, we have Xα ≤ Yα.

We can now establish the comparison principle for viscosity solutions, under two additional assumptions.

Theorem 111 (Comparison principle for viscosity solutions). Let u, v be continuous functions and, respectively, a subsolution and a supersolution of (A.1) such that u|∂Q = v|∂Q. Let xα, yα be defined as in (A.6). Consider the Xα, Yα given by Ishii's lemma, satisfying (A.7).

Suppose that there exist γ > 0 and a function ω with ω(0+) = 0 such that

H(x, r, p, A) − H(x, s, p, A) ≥ γ(r − s),
H(yα, r, pα, Yα) − H(xα, r, pα, Xα) ≤ ω(α|xα − yα|² + |xα − yα|²),    (A.8)

where pα := α(xα − yα). Then

u ≤ v.

Proof. Because Q is compact we can suppose that xα → x̄ ∈ argmax_Q(u − v). We know by Ishii's lemma that

(pα, Xα) ∈ D^{2,+}u(xα),
(pα, Yα) ∈ D^{2,−}v(yα).

Thus, since u, v are, respectively, a subsolution and a supersolution of (A.1), we have

H(xα, u(xα), pα, Xα) ≤ 0 ≤ H(yα, v(yα), pα, Yα).


We then get

γ(u(xα) − v(yα)) ≤ H(xα, u(xα), pα, Xα) − H(xα, v(yα), pα, Xα)
= H(xα, u(xα), pα, Xα) − H(yα, v(yα), pα, Yα) + H(yα, v(yα), pα, Yα) − H(xα, v(yα), pα, Xα)
≤ H(yα, v(yα), pα, Yα) − H(xα, v(yα), pα, Xα)
≤ ω(α|xα − yα|² + |xα − yα|²),

and, taking α → ∞, we conclude that u(x̄) − v(x̄) ≤ 0. Since x̄ maximizes u − v over Q, it follows that

u ≤ v.

Now that we have established a comparison principle for (A.1), the uniqueness of solution will follow. To show that conditions (A.8) are not too restrictive, we now give examples where the second condition is satisfied. Regarding the first condition, we will see in the next Section that it poses no problems in the parabolic case.

Example 112. We see in this example that if

H(x, r, p, X) = −〈µ(x), p〉 − Tr((σσᵀ)(x)X),

then the second condition in (A.8) is satisfied. For that purpose we suppose that µ, σ are Lipschitz. Then we have

|〈pα, µ(yα)〉 − 〈pα, µ(xα)〉| ≤ C|pα||yα − xα| = Cα|yα − xα|²,

since pα = α(xα − yα), and, if Xα, Yα satisfy (A.7),

−Tr((σσᵀ)(yα)Yα) + Tr((σσᵀ)(xα)Xα) ≤ Cα|yα − xα|²,

where the second inequality follows from multiplying both sides of (A.7) by the positive semi-definite matrix

[ (σσᵀ)(xα)  σ(xα)σᵀ(yα) ; σ(yα)σᵀ(xα)  (σσᵀ)(yα) ],

and taking traces.

A.3 Discontinuous viscosity solutions

In the definition of viscosity solution we require the solution to be continuous. In addition, in the elliptic equation (A.1) we suppose that H is continuous.

In this Section we extend the definition of viscosity solutions so as to be able to consider discontinuous solutions of elliptic equations of the form (A.1) with H not necessarily continuous. For that purpose we need to introduce the notion of semi-continuous envelope:

Definition 113. Given a function v, the lower semi-continuous envelope of v is the function v_* defined by

v_*(x) := lim inf_{x′→x} v(x′).

The upper semi-continuous envelope of v, v^*, is defined as v^* := −(−v)_*.
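For instance (an illustration added here, not in the original text), for v = 1_{(0,∞)} on R we have v_*(0) = 0 and v^*(0) = 1, while v_* = v^* = v away from the discontinuity; in general v_* ≤ v ≤ v^*, with equality exactly at the points where v is continuous.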


The definition of discontinuous viscosity solution is then as follows:

Definition 114. A function v is a discontinuous viscosity subsolution of (A.1) in Q if v^* is a viscosity subsolution of

H_*(., v^*, Dϕ, D²ϕ)(x) ≤ 0,

for all x ∈ Q.

Analogously, v is a discontinuous viscosity supersolution of (A.1) in Q if v_* is a viscosity supersolution of

H^*(., v_*, Dϕ, D²ϕ)(x) ≥ 0,

for all x ∈ Q.

We say that v is a discontinuous viscosity solution of (A.1) if it is both a discontinuous viscosity subsolution and a discontinuous viscosity supersolution.

A.4 Parabolic equations

We now consider the particular case of parabolic equations:

F (t, x, v, ∂tv,Dv,D2v) := −∂tv +H(t, x, v,Dv,D2v) = 0. (A.9)

In these equations we have d + 1 variables (t, x), but the unknown v is differentiated only once in the time variable. Therefore we do not require the test functions to have more than one derivative in time. We consider domains of the form Q := ]t, T[ × O.

Definition 115. A function v is a viscosity subsolution of (A.2) in Q if v ∈ USC(Q) and, for each ϕ ∈ C^{1,2}(Q),

F(., v, ∂tϕ, Dϕ, D²ϕ)(t, x) ≤ 0,

at every (t, x) ∈ Q such that v − ϕ has a local maximum at (t, x) on Q. The definitions of viscosity supersolution and viscosity solution are then obvious.

The definition of superdifferential is adapted as well:

Definition 116. Let v be upper semi-continuous in Q. The parabolic second-order superdifferential of v at (t, x) is the set

D^{(1,2),+}_Q v(t, x) := { (∂tϕ, Dϕ, D²ϕ)(t, x) : ϕ ∈ C^{1,2}(Q) such that v − ϕ has a local maximum at (t, x) }.

The parabolic second-order subdifferential of v at (t, x) is then defined as

D^{(1,2),−}_Q v(t, x) := −D^{(1,2),+}_Q(−v)(t, x).

We can extend the viscosity property to the initial time, in the following sense:

Proposition 117. Suppose v ∈ C([t, T[ × O) is a viscosity solution of (A.2) in ]t, T[ × O. Then v is a viscosity solution of (A.2) in [t, T[ × O.

Proof. Let ϕ ∈ C^{1,2}([t, T[ × O) and let (t, x0) ∈ argmax(v − ϕ). We assume, without loss of generality, that (t, x0) is a strict maximizer. For some fixed r, consider

(tε, xε) ∈ argmax_{(s,x)∈(t,t+r]×Br(x0)} ( v(s, x) − ϕ(s, x) − ε/(s − t) ).


Since (tε, xε) ∈ [t, t + r] × B̄r(x0), we can suppose that (tε, xε) → (t̄, x̄). By definition of (tε, xε), we have that, for all (s, x) ∈ (t, T) × O,

v(s, x) − ϕ(s, x) ≤ v(tε, xε) − ϕ(tε, xε) − ε/(tε − t) + ε/(s − t)
≤ v(tε, xε) − ϕ(tε, xε) + ε/(s − t)
→ v(t̄, x̄) − ϕ(t̄, x̄),

where the convergence, as ε → 0, follows from the continuity of v and ϕ.

Because v − ϕ is continuous in [t, T[ × O, we can extend the previous inequality to (s, x) ∈ [t, T) × O, getting

v(s, x) − ϕ(s, x) ≤ v(t̄, x̄) − ϕ(t̄, x̄), for all (s, x) ∈ [t, T) × O.

Since (t, x0) is a strict maximizer of v − ϕ in [t, T) × O, we then get that (t̄, x̄) = (t, x0). This implies in particular that, for ε small enough, (tε, xε) ∈ (t, T) × Br(x0).

Thus, since v is a viscosity subsolution to (A.2) in (t, T) × O, we conclude that

ε/(tε − t)² − ∂tϕ(tε, xε) + H(., v, Dϕ, D²ϕ)(tε, xε) ≤ 0.

The previous inequality implies that

−∂tϕ(tε, xε) + H(., v, Dϕ, D²ϕ)(tε, xε) ≤ 0.

Taking ε → 0, we conclude that

−∂tϕ(t, x0) + H(., v, Dϕ, D²ϕ)(t, x0) ≤ 0.

Thus v is a viscosity subsolution of (A.2) in [t, T) × O. The supersolution property is proved in a similar way.

For parabolic equations, we may consider the following convenient particular case of Ishii’s Lemma.

Lemma 118. Let Q = ]t, T[ × O be a locally compact set, u, −v ∈ USC(Q), and ϕ(t, s, x, y) be twice differentiable in a neighborhood of Q × Q such that ∂tDxϕ = ∂sDxϕ = ∂tDyϕ = ∂sDyϕ = 0. Suppose that u(t, x) − v(s, y) − ϕ(t, s, x, y) has a local maximum at (t̄, s̄, x̄, ȳ). Then, for each ε > 0, there exist X, Y such that

(∂tϕ(t̄, s̄, x̄, ȳ), Dxϕ(t̄, s̄, x̄, ȳ), X) ∈ D^{(1,2),+}_Q u(t̄, x̄),
(−∂sϕ(t̄, s̄, x̄, ȳ), −Dyϕ(t̄, s̄, x̄, ȳ), Y) ∈ D^{(1,2),−}_Q v(s̄, ȳ),

and

[ X  0 ; 0  −Y ] ≤ A + εA²,

where A := D²_{x,y}ϕ(t̄, s̄, x̄, ȳ).

We can apply the previous Lemma together with a doubling of variables argument, doubling (t, x) into (t, x, s, y), to establish uniqueness. However, if we do this, then to verify the second condition of (A.8) we need stronger regularity of H in the time variable. To avoid this, we need a parabolic analogue to Ishii's Lemma. This parabolic analogue can be found in [11, p. 50] and is the following:

Lemma 119 (Ishii's Lemma for parabolic equations). Let Q = ]t, T[ × O be a locally compact set, u, −v ∈ USC(Q) and ϕ(t, x, y) be a function defined in a neighborhood of ]t, T[ × O × O, once continuously differentiable in t and twice continuously differentiable in (x, y).

Suppose that u(t, x) − v(t, y) − ϕ(t, x, y) has a local maximum at (t̄, x̄, ȳ). Assume, moreover, that there is r > 0 such that, for every M > 0, there is C ∈ R such that

q1 ≥ C, whenever (q1, p1, X) ∈ D^{(1,2),+}_Q u(t, x), |x − x̄| + |t − t̄| ≤ r and |u(t, x)| + |p1| + |X| ≤ M;
q2 ≤ C, whenever (q2, p2, Y) ∈ D^{(1,2),−}_Q v(t, y), |y − ȳ| + |t − t̄| ≤ r and |v(t, y)| + |p2| + |Y| ≤ M.    (A.10)

Then, for each ε > 0, there exist q1, q2, X, Y such that

(i) (q1, Dxϕ(t̄, x̄, ȳ), X) ∈ D^{(1,2),+}_Q u(t̄, x̄), (q2, −Dyϕ(t̄, x̄, ȳ), Y) ∈ D^{(1,2),−}_Q v(t̄, ȳ),

(ii) q1 − q2 = ∂tϕ(t̄, x̄, ȳ),

(iii) [ X  0 ; 0  −Y ] ≤ A + εA²,

where A := D²_{x,y}ϕ(t̄, x̄, ȳ).

We remark that condition (A.10) is guaranteed by having u, v, respectively, a viscosity subsolution and a viscosity supersolution of (A.2).

We end this Section by noticing that in the parabolic case the first condition in (A.8) is not satisfied if H(t, x, r, p, X) = H(t, x, p, X) does not depend on r (which is the case of the HJB equation). However, it is easy to work around this issue. Indeed, if H(t, x, p, X) is homogeneous in (p, X), i.e. if H(t, x, λp, λX) = λH(t, x, p, X), and u is a subsolution to (A.2), then ū(t, x) := e^{−(T−t)}u(t, x) is a subsolution to

F̄(t, x, ū, ∂tū, Dū, D²ū) := ū + F(t, x, ∂tū, Dū, D²ū) ≤ 0,

and

F̄(t, x, r, q, p, X) − F̄(t, x, s, q, p, X) = r − s.

Thus, instead of applying the maximum principle to u − v, we should apply the maximum principle to e^{−(T−t)}(u − v).


Appendix B

Stochastic calculus

In this Appendix we review some basic definitions and results from stochastic calculus. These results are standard in the literature and can be found for instance in [13, 14].

B.1 Preliminaries

B.1.1 Brownian motion and filtration

Consider a probability space (Ω, F, P), where Ω is the sample space, F is a σ−algebra on Ω and P is a probability measure, i.e., P(Ω) = 1.

Definition 120. Given a probability space (Ω,F ,P), a stochastic process W is a Brownian motion if:

• W0 = 0 and t ↦ Wt is a continuous function a.s.;

• Wt has independent increments, i.e.,

0 ≤ t1 ≤ t2 ≤ t3 ≤ t4 ⇒ W_{t4} − W_{t3} ⊥⊥ W_{t2} − W_{t1};

• Wt has normally distributed increments, i.e.,

0 ≤ t1 ≤ t2 ⇒ W_{t2} − W_{t1} ∼ N(0, t2 − t1).
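These properties are easy to check empirically; the following Python sketch (an illustration added here, with assumed parameters; it is not part of the original text) simulates discretized Brownian paths and verifies the distribution of the increments:

import numpy as np

rng = np.random.default_rng(0)
T, n, paths = 1.0, 1000, 20000
dt = T / n
# Brownian increments ~ N(0, dt); cumulative sums give W on the grid.
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.cumsum(dW, axis=1)

# Empirical checks: W_T ~ N(0, T) and disjoint increments are uncorrelated.
print(W[:, -1].mean(), W[:, -1].var())     # approx 0 and T
incr1 = W[:, n // 2] - W[:, n // 4]
incr2 = W[:, -1] - W[:, 3 * n // 4]
print(np.corrcoef(incr1, incr2)[0, 1])     # approx 0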

As stated in Section 4.2, the probability space we consider in this thesis is the classical Wiener space, and in it we consider the standard N−dimensional Brownian motion corresponding to the coordinate process, Ws(ω) = ωs.

Definition 121. A filtration is an increasing collection of σ−algebras {Ft}t≥0, that is, s < t ⇒ Fs ⊂ Ft.

We consider in the probability space (Ω, F, P) a filtration F = {Fs : 0 ≤ s ≤ T}. The filtration we consider is the one induced naturally by W augmented with the P−null sets, that is,

Fs = σ(Wr : r ≤ s) ∨ NP.

We prove a useful property of this probability space:

Lemma 122. Let Λ ∈ Ft and 0 < ε < P(Λ). Then there is Λ̄ ⊂ Λ such that Λ̄ ∈ Ft and P(Λ̄) = ε.

Proof. This result is a consequence of the fact that we can partition Ω into Ft−measurable sets of arbitrarily small measure. Indeed, if we consider

Γ^δ_n := { |Wt| ∈ [nδ, (n + 1)δ) },

then Γ^δ_n ∈ Ft, {Γ^δ_n}_{n≥0} is a partition of Ω and

P(Γ^δ_n) ≤ P(Γ^δ_0) → 0 as δ → 0.

Using these partitions and considering a sequence δi → 0, we can construct Λi ⊂ Λ such that

Λi ∈ Ft,    Λi ⊂ Λi+1,    ε − δi ≤ P(Λi) ≤ ε.

Thus Λ̄ = ⋃_{i≥1} Λi is Ft−measurable and satisfies P(Λ̄) = ε.

B.1.2 Stopping times and progressive measurability

The σ−algebra Ft of the considered filtration intuitively represents the information acquired up to time t. In this subsection we define useful objects that respect the information accumulated up to that time.

Definition 123. A random variable τ ≥ 0 is a stopping time with respect to a filtration F if {τ ≤ t} ∈ Ft, for all t ≥ 0.

Definition 124. The σ−algebra induced by a stopping time is

Fτ = {A ∈ F : A ∩ {τ ≤ t} ∈ Ft, ∀ t ≥ 0}.

Definition 125. Consider a stochastic process X:

• X is F−adapted if, for all t, Xt is Ft−measurable.

• X is F−progressively measurable if, for all t, the map

[0, t] × Ω ∋ (s, ω) ↦ Xs(ω)

is B([0, t]) ⊗ Ft−measurable.

Clearly, a progressively measurable process is adapted. A sufficient condition for an adaptedprocess to be progressively measurable is that the sample paths be right continuous (or left continuous),see for instance [13, p. 5].

B.1.3 Martingales and local martingales

Definition 126. Let (Ω, F, P, F) be a probability space with a filtration, and X an F−adapted process such that E[|Xt|] < ∞. Then:

• Xt is a martingale if E[Xt|Fs] = Xs;

• Xt is a supermartingale if E[Xt|Fs] ≤ Xs;

• Xt is a submartingale if E[Xt|Fs] ≥ Xs.

Using Jensen’s inequality for conditional expectations, we get the following result:

Proposition 127. Let X be a martingale and ψ a convex function such that E[|ψ(Xt)|] < ∞. Then ψ(X.) is a submartingale.

The previous Proposition implies, in particular, that, if X is a martingale, then |X| is a submartingale.

Regarding convergence of submartingales we have the next Theorem. Obviously, an analogousresult holds for supermartingales.

Page 109: A weak dynamic programming principle for zero-sum stochastic ... · A weak dynamic programming principle for zero-sum stochastic differential games Pedro Miguel Almeida Serra Costa

B.1. PRELIMINARIES 97

Theorem 128 (Convergence of submartingales). Let X be a right-continuous submartingale such that sup_t E[Xt] < ∞. Then X∞ = lim_t Xt exists a.s. and E[|X∞|] < ∞.

The optional sampling theorem extends the inequalities in the definition of martingale to stopping times.

Theorem 129 (Optional sampling theorem). Let X be a right-continuous submartingale such that X∞ = lim_t Xt exists a.s., E[|X∞|] < ∞ and E[X∞|Ft] ≥ Xt. Then, given two stopping times τ1 ≤ τ2, we have

E[X_{τ2} | F_{τ1}] ≥ X_{τ1}.

We can deduce analogous results for supermartingales and martingales. To get estimates on martingales the following inequality is frequently used:

Theorem 130 (Doob's maximal inequality). Let p > 1 and X ≥ 0 be a right-continuous submartingale such that E[|Xt|^p] < ∞. Then

E[ | sup_{t1≤t≤t2} Xt |^p ] ≤ (p/(p − 1))^p E[ |X_{t2}|^p ].
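As a numerical illustration (added here, with assumed parameters; not part of the original text), the inequality can be checked by Monte Carlo for the non-negative submartingale Xt = |Wt| with p = 2, for which the constant (p/(p − 1))^p equals 4:

import numpy as np

rng = np.random.default_rng(1)
T, n, paths = 1.0, 1000, 20000
dW = rng.normal(0.0, np.sqrt(T / n), size=(paths, n))
W = np.cumsum(dW, axis=1)

# X_t = |W_t| is a non-negative submartingale; take p = 2, [t1, t2] = [0, T].
lhs = (np.abs(W).max(axis=1) ** 2).mean()   # E[ (sup_{t<=T} |W_t|)^2 ]
rhs = 4.0 * (W[:, -1] ** 2).mean()          # (p/(p-1))^p E[|W_T|^2], approx 4T
print(lhs, rhs)                             # Doob: lhs <= rhs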

We use in this work a conditional version of Doob's maximal inequality, which we prove next.

Theorem 131 (Doob's maximal inequality for conditional expectations). Let p > 1 and X ≥ 0 be a right-continuous submartingale such that E[|Xt|^p] < ∞. Then

E[ | sup_{t1≤t≤t2} Xt |^p | F_{t1} ] ≤ (p/(p − 1))^p E[ |X_{t2}|^p | F_{t1} ].

Proof. We use Doob's maximal inequality to prove that, for each Λ ∈ F_{t1},

E[ E[ | sup_{t1≤t≤t2} Xt |^p | F_{t1} ] 1Λ ] ≤ (p/(p − 1))^p E[ E[ |X_{t2}|^p | F_{t1} ] 1Λ ].

Since E[ | sup_{t1≤t≤t2} Xt |^p | F_{t1} ] and E[ |X_{t2}|^p | F_{t1} ] are F_{t1}−measurable, the result follows.

To proceed we notice that, if X is a non-negative right-continuous submartingale, then so is X̄ := X1Λ. Furthermore,

( sup_{t1≤t≤t2} Xt ) 1Λ = sup_{t1≤t≤t2} X̄t.

Thus, we can apply Doob's maximal inequality to get:

E[ E[ | sup_{t1≤t≤t2} Xt |^p | F_{t1} ] 1Λ ] = E[ | sup_{t1≤t≤t2} X̄t |^p ]
≤ (p/(p − 1))^p E[ |X̄_{t2}|^p ]
= (p/(p − 1))^p E[ E[ |X_{t2}|^p | F_{t1} ] 1Λ ].

Local martingales

Definition 132. X is a local martingale if there is a sequence of stopping times τn → ∞ such that X_{.∧τn} is a martingale.


Using the dominated convergence theorem we can easily give a condition for a local martingale to be a martingale.

Lemma 133. Let M be a local martingale such that E[|M|*_t] < ∞, where |M|*_t := sup_{s∈[0,t]} |Ms|. Then M is a martingale.

Proof. Indeed, we have

E[M_{t∧τn} | Fs] = M_{s∧τn} →_{a.s.} Ms.

On the other hand, |M_{t∧τn}| ≤ |M|*_t and |M|*_t is integrable so, by the dominated convergence theorem, we have

E[M_{t∧τn} | Fs] →_{a.s.} E[Mt | Fs].

Thus, E[Mt | Fs] = Ms.

Similarly, we can use Fatou's lemma to give a condition for a local martingale to be a supermartingale.

Lemma 134. Let M^l be a local martingale such that M^l is bounded from below by a martingale M. Then M^l is a supermartingale.

Proof. We want to show that E[M^l_t | Fs] ≤ M^l_s. By the bounding condition we have M^l_t − M_t ≥ 0.

Since M^l is a local martingale:

E[M^l_{t∧τn} − M_{t∧τn} | Fs] = M^l_{s∧τn} − M_{s∧τn} →_{a.s.} M^l_s − M_s.

On the other hand, by Fatou's lemma,

E[ lim inf_n (M^l_{t∧τn} − M_{t∧τn}) | Fs ] ≤ lim inf_n E[M^l_{t∧τn} − M_{t∧τn} | Fs] = M^l_s − M_s,

and we know that

lim inf_n (M^l_{t∧τn} − M_{t∧τn}) = M^l_t − M_t.

Thus,

E[M^l_t − M_t | Fs] ≤ M^l_s − M_s.

Because M is a martingale, we conclude that E[M^l_t | Fs] ≤ M^l_s.

B.2 Stochastic integral and Ito’s formula

The stochastic integral ∫_0^t ψs dWs is defined for processes ψ : [0, T] × Ω → R^{d×N} in

H²loc = { ψ : F−adapted process with ∫_0^T |ψs|² ds < ∞ a.s. }.

However, in the smaller space

H² = { ψ : F−adapted process with E[ ∫_0^T |ψs|² ds ] < ∞ },

we can prove additional results, since this space is a Hilbert space when equipped with the norm

‖ψ‖_{H²} = ( E[ ∫_0^T |ψs|² ds ] )^{1/2}.


Remark 135. We recall that W is an N−dimensional Brownian motion. Thus, I := ∫_0^T ψs dWs is an abbreviation for the vector (I^i)_{i=1,...,d} such that

I^i := Σ_{j=1}^{N} ∫_0^T ψ^{i,j}_s dW^j_s,

where ψ is a (d × N)−dimensional process. The norm of ψ is the Frobenius norm:

|ψ| = ( Tr(ψψᵀ) )^{1/2}.

In the following Proposition we list some of the properties of the stochastic integral:

Proposition 136 (Properties of the stochastic integral). Let ψ ∈ H²loc and It := ∫_0^t ψs dWs. Then:

• It has continuous sample paths and I0 = 0;

• It is a local martingale; if ψ ∈ H² then It is a martingale;

• If ψ ∈ H² then we have the so-called Ito isometry:

E[ (It − Is)² | Fs ] = E[ ∫_s^t |ψr|² dr | Fs ].

B.2.1 Ito processes

Definition 137. An Ito process, X, is a continuous-time process defined by

Xt := X0 + ∫_0^t µs ds + ∫_0^t σs dWs,    t ≥ 0,

where µ, σ are F−adapted processes satisfying ∫_0^t (|µs| + |σs|²) ds < ∞.

Ito processes are frequently written in differential notation as

dXt = µt dt + σt dWt.

We will use this notation often, especially in the context of stochastic differential equations.

B.2.2 Ito’s formula

Ito's formula can be seen as the chain rule of stochastic calculus. It tells us how stochastic differentials change under composition.

Theorem 138 (Ito's formula). Consider a function f ∈ C^{1,2}([0, T] × R^d) and an Ito process,

Xt = X0 + ∫_0^t µs ds + ∫_0^t σs dWs.

Then, with probability 1,

f(t, Xt) = f(0, X0) + ∫_0^t [ ∂tf(s, Xs) + 〈Df(s, Xs), µs〉 + (1/2)Tr(σsσsᵀ D²f(s, Xs)) ] ds + ∫_0^t Df(s, Xs)σs dWs,

for all t ∈ [0, T].


In differential form Ito's formula is written as

df(t, Xt) = ( ∂tf(t, Xt) + 〈Df(t, Xt), µt〉 + (1/2)Tr(σtσtᵀ D²f(t, Xt)) ) dt + Df(t, Xt)σt dWt.
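For a quick sanity check (an illustrative sketch added here, not part of the original text), take f(t, x) = x² and X = W (so µ ≡ 0, σ ≡ 1): Ito's formula gives d(Wt²) = dt + 2Wt dWt, which can be verified pathwise with left-endpoint Riemann sums:

import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 100000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

# Ito: W_T^2 = T + 2 * int_0^T W_s dW_s (the integrand is evaluated at the
# left endpoint, which is what makes the stochastic integral a martingale).
ito_rhs = T + 2.0 * np.sum(W[:-1] * dW)
print(W[-1] ** 2, ito_rhs)   # the two agree up to discretization error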

As an application of Ito's formula, we obtain the next martingale inequality, similar to Ito's isometry, that will be useful to obtain estimates on SDE's. We will obtain this inequality for conditional expectations. The same result for expected values can be found in [15, p. 80] or [13, p. 163].

Lemma 139. Let ψ ∈ H²loc and q ≥ 1. Then

E[ | ∫_s^t ψr dWr |^{2q} | Fs ] ≤ (t − s)^{q−1} (2q − 1)^q E[ ∫_s^t |ψr|^{2q} dr | Fs ].

Proof. Let Xt := ∫_s^t ψr dWr. We apply Ito's formula to |Xt|^{2q} to obtain

|Xt|^{2q} = ∫_s^t [ 2q(q − 1)|ψrᵀXr|²|Xr|^{2q−4} + q|ψr|²|Xr|^{2q−2} ] dr + ∫_s^t 2q|Xr|^{2q−2} Xrᵀ ψr dWr.

Since the Frobenius norm is consistent, we have

|ψrᵀXr|² ≤ |ψr|²|Xr|².

For now, we suppose that Xt is bounded for t ∈ [s, T]. Then the conditional expectation of the stochastic integral is zero, and we get

E[ |Xt|^{2q} | Fs ] ≤ ∫_s^t q(2q − 1) E[ |ψr|²|Xr|^{2q−2} | Fs ] dr.

Applying Holder's inequality for conditional expectations, we conclude that

E[ |Xt|^{2q} | Fs ] ≤ ∫_s^t q(2q − 1) E[ |ψr|^{2q} | Fs ]^{1/q} E[ |Xr|^{2q} | Fs ]^{1−1/q} dr.

Thus, if we denote

g(t, ω) := q(2q − 1) E[ |ψt|^{2q} | Fs ]^{1/q}(ω),    f(t, ω) := E[ |Xt|^{2q} | Fs ](ω),

we have that, for almost all ω ∈ Ω, f(., ω) verifies the integral inequality

f(t, ω) ≤ ∫_s^t g(r, ω) f(r, ω)^{1−1/q} dr.

This implies that (see [16, p. 561])

f(t, ω) ≤ ( (1/q) ∫_s^t g(r, ω) dr )^q.

Thus,

E[ |Xt|^{2q} | Fs ] ≤ (2q − 1)^q ( ∫_s^t E[ |ψr|^{2q} | Fs ]^{1/q} dr )^q.

Using Jensen's inequality, we then conclude that

E[ |Xt|^{2q} | Fs ] ≤ (2q − 1)^q (t − s)^{q−1} ∫_s^t E[ |ψr|^{2q} | Fs ] dr.


If Xt is not bounded, then we consider the exit time, τn, of X from Bn(0). Then X̄t := X_{t∧τn} is bounded and

X̄t = ∫_s^t 1_{{ sup_{h∈[s,r]} |Xh| < n }} ψr dWr.

Thus, we can apply the inequality for bounded processes to get

E[ |X_{t∧τn}|^{2q} | Fs ] ≤ (2q − 1)^q (t − s)^{q−1} E[ ∫_s^{t∧τn} |ψr|^{2q} dr | Fs ].

Using Fatou's lemma on the left term, monotone convergence on the right term and the fact that τn → ∞, we conclude that

E[ |Xt|^{2q} | Fs ] ≤ (2q − 1)^q (t − s)^{q−1} E[ ∫_s^t |ψr|^{2q} dr | Fs ].

B.3 Martingale representation

Theorem 140 (Martingale representation). If M is a local martingale then there exists an adapted process H ∈ H²loc such that

Mt = M0 + ∫_0^t Hs dWs, for all t ∈ [0, T].

Furthermore, if M is square integrable, H ∈ H² is unique.

B.4 Girsanov’s Theorem

Now we recall Girsanov's Theorem, which tells us how the distribution of a Brownian motion changes when we change from measure P to Q.

We specify the new measure Q by its Radon-Nikodym derivative with respect to P:

dQ/dP = ZT := e^{ −∫_0^T ψs dWs − (1/2)∫_0^T ψ_s² ds },

where ψ ∈ H². This defines a measure, but not necessarily a probability measure; thus we must require that E[ZT] = 1.

Girsanov's Theorem then tells us that we can compensate the drift of a Brownian motion by changing the measure:

Theorem 141 (Girsanov). Suppose E[ZT] = 1. Then

Bt := Wt + ∫_0^t ψs ds

is a Brownian motion under Q, where dQ/dP = ZT.

A condition ensuring that E[ZT] = 1 is given by Novikov's criterion:

Theorem 142 (Novikov). If

E[ e^{(1/2)∫_0^T ψ_s² ds} ] < ∞,

then E[ZT] = 1.
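For a constant ψ ≡ c Novikov's criterion holds trivially, and the effect of the change of measure can be seen numerically (an illustrative sketch added here, with assumed parameters; not part of the original text): after reweighting by ZT, the drifted process Bt = Wt + ct behaves like a Brownian motion.

import numpy as np

rng = np.random.default_rng(3)
T, n, paths, c = 1.0, 500, 200000, 0.8
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W_T = dW.sum(axis=1)

# Radon-Nikodym weights for constant psi = c: Z_T = exp(-c W_T - c^2 T / 2).
Z = np.exp(-c * W_T - 0.5 * c**2 * T)
B_T = W_T + c * T                      # drifted process at time T

print(Z.mean())                        # approx 1 (Q is a probability measure)
print(np.mean(Z * B_T))                # approx 0: B is driftless under Q
print(np.mean(Z * B_T**2))             # approx T: B_T ~ N(0, T) under Q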


B.5 Stochastic differential equations

In this Section we give a meaning to the stochastic differential equation

dXs = µ(s, Xs) ds + σ(s, Xs) dWs,    Xt = x.    (SDE)

Notice that we only consider deterministic initial conditions.

Definition 143. A strong solution to (SDE) is an adapted process X with continuous sample paths such that

• Xt = x;

• ∫_t^T (|µ(s, Xs)| + |σ(s, Xs)|²) ds < ∞, P−a.s.;

• Xs = x + ∫_t^s µ(r, Xr) dr + ∫_t^s σ(r, Xr) dWr, for all s ∈ [t, T].

The next Theorem gives sufficient conditions for the existence and uniqueness of strong solutions of (SDE).

Theorem 144. Suppose µ, σ satisfy the global Lipschitz and linear growth conditions

|µ(t, x) − µ(t, y)| + |σ(t, x) − σ(t, y)| ≤ K|x − y|,
|µ(t, x)| + |σ(t, x)| ≤ K(1 + |x|).

Then there exists a unique strong solution, X ∈ H², to (SDE). Moreover, for each p ≥ 2, there exists a constant C depending only on K, T, p such that

E[ sup_{t≤s≤T} |Xs|^p | Ft ] ≤ C(1 + |x|^p).
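Strong solutions are routinely approximated by the Euler-Maruyama scheme. The sketch below (an illustration added here, with assumed coefficients; not part of the original text) treats dXs = µXs ds + σXs dWs, whose coefficients are globally Lipschitz with linear growth, and compares the scheme with the known closed-form solution:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, x0 = 0.05, 0.3, 1.0         # illustrative coefficients
t, T, n = 0.0, 1.0, 1000
dt = (T - t) / n

X = np.empty(n + 1)
X[0] = x0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for k in range(n):
    # Euler-Maruyama step: X_{k+1} = X_k + mu(.) dt + sigma(.) dW.
    X[k + 1] = X[k] + mu * X[k] * dt + sigma * X[k] * dW[k]

# For geometric Brownian motion the strong solution is known in closed form,
# which allows a pathwise comparison with the scheme.
W_T = dW.sum()
X_exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)
print(X[-1], X_exact)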

In the next Section we obtain the estimate of the previous Theorem in the context of controlled diffusions.

B.6 Controlled diffusions

We now consider controlled processes.

Definition 145. Given a set of controls U, a controlled process is a mapping

(t, x, ν) ∈ [0, T] × R^d × U ↦ X^ν_{t,x} ∈ H⁰_rcll(R^d).

The controlled processes we consider are given as solutions of a controlled diffusion:

dX^ν_s = µ(s, X^ν_s; νs) ds + σ(s, X^ν_s; νs) dWs,    X^ν_t = x,    (CSDE)

where ν ∈ Ut and Ut is a suitable space of controls taking values in a set U, such that Ut ⊂ Us whenever t ≤ s. We assume that µ, σ satisfy the following global Lipschitz and linear growth conditions:

|µ(t, x; u) − µ(t, y; u)| + |σ(t, x; u) − σ(t, y; u)| ≤ K|x − y|,
|µ(t, x; u)| + |σ(t, x; u)| ≤ K(1 + |x| + |u|).    (B.1)

Remark 146. Notice that, by (B.1) and Jensen's inequality, we have, for p ≥ 1:

|σ(t, x; u)|^p ≤ K^p(1 + |x| + |u|)^p ≤ 3^{p−1}K^p(1 + |x|^p + |u|^p) ≤ (3K)^p(1 + |x|^p + |u|^p).

Thus, renaming the constant K, we can restate the growth condition of (B.1) as

|µ(t, x; u)|^p + |σ(t, x; u)|^p ≤ K^p(1 + |x|^p + |u|^p), for all p ≥ 1.    (B.2)


In this case we can still prove existence and uniqueness.

Theorem 147. Let p ≥ 2. For each ν ∈ Ut ∩ H^p(t, T; U) there exists a unique strong solution, X^ν ∈ H^p(t, T; R^d), to (CSDE).

Moreover, there exists a constant C depending only on K, T, p such that

E[ sup_{t≤s≤T} |X^ν_s|^p | Ft ] ≤ C( 1 + |x|^p + E[ ∫_t^T |νs|^p ds | Ft ] ).    (B.3)

Proof. Here we just give the proof of the estimate. We abbreviate the notation, writing µs := µ(s, X^ν_s; νs) and σs := σ(s, X^ν_s; νs). Then

E[ sup_{t≤r≤s} |X^ν_r|^p | Ft ] = E[ sup_{t≤r≤s} | x + ∫_t^r µu du + ∫_t^r σu dWu |^p | Ft ]
≤ 3^{p−1} ( |x|^p + T^{p−1} E[ ∫_t^s |µu|^p du | Ft ] + E[ sup_{t≤r≤s} | ∫_t^r σu dWu |^p | Ft ] )
≤ 3^{p−1} ( |x|^p + T^{p−1} E[ ∫_t^s |µu|^p du | Ft ] + (p/(p − 1))^p T^{p/2−1}(p − 1)^{p/2} E[ ∫_t^s |σu|^p du | Ft ] ),

where the first inequality follows from Jensen's inequality and the second from Doob's maximal inequality and Lemma 139.

Thus, by (B.2), we conclude that there is C such that

E[ sup_{t≤r≤s} |X^ν_r|^p | Ft ] ≤ C( 1 + |x|^p + E[ ∫_t^T |νu|^p du | Ft ] + ∫_t^s E[ |X^ν_u|^p | Ft ] du )
≤ C( 1 + |x|^p + E[ ∫_t^T |νu|^p du | Ft ] + ∫_t^s E[ sup_{t≤r≤u} |X^ν_r|^p | Ft ] du ).

Applying Gronwall's Lemma and considering s = T, we get the desired result.

We also have continuity in the initial conditions in the following sense:

Lemma 148. Consider t′ ≤ t, ν ∈ Ut′ ⊂ Ut and let X^ν_{t,x} be the strong solution of (CSDE). Suppose that either:

(i) ‖ν‖_{H^∞_{t′,p}} ≤ M for some p ≥ 2, or

(ii) ‖ν‖_{H^{p,∞}_{t′}} ≤ M for some p > 2.

Then, for all x′ ∈ B1(x) and for all s ∈ [t, T],

E[ |X^ν_{t,x}(s) − X^ν_{t′,x′}(s)|^p | Ft′ ] ≤ Cx,M (|x − x′|^p + |t − t′|^γ),    (B.4)

where Cx,M is a constant depending only on K, T, p, x, M and

γ := p/2 if (i) holds,    γ := p/2 − 1 if (ii) holds.

Proof. We use the following notation:

Xs := X^ν_{t,x}(s),    X′s := X^ν_{t′,x′}(s),
µs := µ(s, Xs; νs),    µ′s := µ(s, X′s; νs),
σs := σ(s, Xs; νs),    σ′s := σ(s, X′s; νs),
dx′ := |x′ − x|,    dt′ := |t′ − t|.


First we notice that, by (B.3), we have, for all r ∈ [t′, T] and for all x′ ∈ B1(x),

E[ |X^ν_{t′,x′}(r)|^p | Ft′ ] ≤ CM (1 + |x′|^p) ≤ CM (1 + |x|^p),

where CM depends only on K, T, p, M.

Suppose that (i) in the hypothesis of the Lemma holds. Then

E[ |Xs − X′s|^p | Ft′ ]
= E[ | x + ∫_t^s µr dr + ∫_t^s σr dWr − x′ − ∫_{t′}^s µ′r dr − ∫_{t′}^s σ′r dWr |^p | Ft′ ]
≤ C E[ | ∫_t^s (µr − µ′r) dr |^p + | ∫_{t′}^t µ′r dr |^p + | ∫_t^s (σr − σ′r) dWr |^p + | ∫_{t′}^t σ′r dWr |^p + d^p_{x′} | Ft′ ]
≤ C E[ ∫_t^s |µr − µ′r|^p dr + d^{p−1}_{t′} ∫_{t′}^t |µ′r|^p dr + ∫_t^s |σr − σ′r|^p dr + d^{p/2−1}_{t′} ∫_{t′}^t |σ′r|^p dr + d^p_{x′} | Ft′ ]
≤ C E[ d^{p/2−1}_{t′} ∫_{t′}^t (1 + |X′r|^p + |νr|^p) dr + d^p_{x′} + ∫_t^s |Xr − X′r|^p dr | Ft′ ]
≤ C̄x,M(dt′, dx′) + C ∫_t^s E[ |Xr − X′r|^p | Ft′ ] dr,

where in the second inequality we used Lemma 139, and

C̄x,M(dt′, dx′) := CM d^{p/2}_{t′} (1 + |x|^p + M^p) + C d^p_{x′} ≤ Cx,M ( d^{p/2}_{t′} + d^p_{x′} ).

Thus, we conclude by Gronwall's Lemma that

E[ |Xs − X′s|^p | Ft′ ] ≤ C̄x,M(dt′, dx′) e^{C(s−t)} ≤ Cx,M ( d^{p/2}_{t′} + d^p_{x′} ) e^{CT}.

If we assume that (ii) holds, then we instead define C̄ by

C̄x,M(dt′, dx′) := CM d^{p/2−1}_{t′} (1 + |x|^p + M^p) + C d^p_{x′}.

Remark 149. In this thesis we are interested in the case where U = A × B, Ut = At × Bt.

In that case we remark that, for p ≥ 1 and (a, b) ∈ U,

|(a, b)|^p ≤ 2^{p−1}(|a|^p + |b|^p).

As a consequence we also have that, for p ≥ 1 and (a, b) ∈ Ut′,

‖(a, b)‖_{H^∞_{t′,p}} ≤ 2^{p−1}( ‖a‖_{H^∞_{t′,p}} + ‖b‖_{H^∞_{t′,p}} ),
‖(a, b)‖_{H^{p,∞}_{t′}} ≤ 2^{p−1}( ‖a‖_{H^{p,∞}_{t′}} + ‖b‖_{H^{p,∞}_{t′}} ).


Appendix C

A measure result

In this Appendix we prove a measure result useful in the proof of existence of solution for the HJBI equation.

Lemma 150. Consider X ⊂ R^n, A ⊂ R and F ⊂ X × A such that A is compact, F is closed and

πX(F) = X.

Then there is a measurable function ψ : X → A such that (x, ψ(x)) ∈ F.

Proof. Consider ψ(x) := max{a ∈ A : (x, a) ∈ F}. Since A is compact, ψ : X → A is a well defined function and (x, ψ(x)) ∈ F. It remains to see that ψ is measurable, which we will prove by approximating ψ by an increasing sequence of measurable functions fn.

We can suppose, without loss of generality, that A ⊂ [0, 1]. We then define

fn(x) := Σ_{i=1}^{2^n−1} 2^{−n} 1_{πX(F ∩ (X×[i/2^n, 1]))}(x)

and notice that fn is measurable because, for each i, πX(F ∩ (X × [i/2^n, 1])) is a closed subset of X. Furthermore, it is easy to see that fn is an increasing sequence and that

fn(x) = Σ_{i=0}^{2^n−1} (i/2^n) 1_{{x : i/2^n ≤ ψ(x) < (i+1)/2^n}}(x),

which implies that

fn(x) ↗ ψ(x).

Thus ψ(x) is measurable.
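On a grid the construction above is easy to mimic; the following Python sketch (an illustration added here, with an assumed closed set F; it is not part of the original proof) computes the selection ψ(x) = max{a ∈ A : (x, a) ∈ F} and its dyadic approximants fn:

import numpy as np

# Discretized version of the selection psi(x) = max{a in A : (x, a) in F},
# with X = [0, 1], A = [0, 1] and F = {(x, a) : a <= g(x)} for a continuous g.
g = lambda x: 0.5 + 0.4 * np.sin(3 * x)
xs = np.linspace(0.0, 1.0, 11)
As = np.linspace(0.0, 1.0, 1001)

def psi(x):
    feasible = As[As <= g(x)]           # the slice {a : (x, a) in F}
    return feasible.max()               # compact slice: the max is attained

def f_n(x, n):
    # Dyadic approximant from the proof: sum of 2^-n over i with psi(x) >= i/2^n.
    return sum(2.0**-n for i in range(1, 2**n) if psi(x) >= i / 2.0**n)

for x in xs:
    # f_n increases to psi: f_n <= psi <= f_n + 2^-n.
    assert f_n(x, 8) <= psi(x) <= f_n(x, 8) + 2.0**-8 + 1e-12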

Using the previous Lemma, we get:

Proposition 151. Consider X ⊂ R^n, A ⊂ R^m and F ⊂ X × A such that A is compact, F is closed and

πX(F) = X.

Then there is a measurable function ψ : X → A such that (x, ψ(x)) ∈ F.

Proof. We will make the proof by induction. By the previous Lemma the result is valid for m = 1. Now suppose it is valid for m − 1.

We can suppose that A = A1 × A2, where A1, A2 are compact and A2 ⊂ R. Thus, we will think of F as a subset of X × A1 × A2.


Consider F1 = π_{X×A1}(F), which is a closed subset of X × A1. Then πX(F1) = X, hence, by the induction hypothesis, there is a measurable function ψ1 : X → A1 such that

(x, ψ1(x)) ∈ F1.

Now define the closed sets X̄ := cl(graph(ψ1)) ⊂ X × A1 (the closure of the graph of ψ1) and F2 := (X̄ × R) ∩ F. It is clear that F2 ⊂ X̄ × A2.

We will prove later that for each (x, a1) ∈ X̄ there is a2 ∈ A2 such that (x, a1, a2) ∈ F. Assuming this claim for now, we get

π_{X̄}(F2) = X̄.

We can then apply the previous Lemma to obtain a measurable function ψ2 : X̄ → A2 such that

(x̄, ψ2(x̄)) ∈ F2 ⊂ F, for every x̄ ∈ X̄.

We now define ψ : X → A1 × A2 by

ψ(x) := (ψ1(x), ψ2(x, ψ1(x))).

It is then obvious that graph(ψ) ⊂ F. Furthermore, ψ is measurable because it is a composition of measurable functions.

We finally prove the claim that for each (x, a1) ∈ X̄ there is a2 ∈ A2 such that (x, a1, a2) ∈ F. Indeed, if (x, a1) ∈ X̄ then there is a sequence (x^n, a1^n) ⊂ graph(ψ1) such that (x^n, a1^n) → (x, a1). Since graph(ψ1) ⊂ F1 = π_{X×A1}(F), for each n there is also a2^n ∈ A2 such that (x^n, a1^n, a2^n) ∈ F. Since A2 is compact there are a2 ∈ A2 and a subsequence (nk) such that a2^{nk} → a2. Then (x^{nk}, a1^{nk}, a2^{nk}) → (x, a1, a2) and, since F is closed, we conclude that (x, a1, a2) ∈ F, as desired.


Bibliography

[1] B. Bouchard and N. Touzi. Weak Dynamic Programming Principle for Viscosity Solutions.Preprint, 2009. Available from: http://www.ceremade.dauphine.fr/∼bouchard/pdf/BT09.pdf.

[2] W. H. Fleming and P. E. Souganidis. On the existence of value functions of two-player, zero-sum stochastic differential games. Indiana Univ. Math. J., 38(2):293–314, 1989. Available from:http://dx.doi.org/10.1512/iumj.1989.38.38015, doi:10.1512/iumj.1989.38.38015.

[3] Martino Bardi and Italo Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Systems & Control: Foundations & Applications. Birkhauser BostonInc., Boston, MA, 1997. With appendices by Maurizio Falcone and Pierpaolo Soravia. Availablefrom: http://dx.doi.org/10.1007/978-0-8176-4755-1, doi:10.1007/978-0-8176-4755-1.

[4] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions,volume 25 of Stochastic Modelling and Applied Probability. Springer, New York, second edition,2006.

[5] Dimitri P. Bertsekas and Steven E. Shreve. Stochastic optimal control, volume 139 of Mathematicsin Science and Engineering. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], NewYork, 1978. The discrete time case.

[6] J. L. Doob. Measure theory, volume 143 of Graduate Texts in Mathematics. Springer-Verlag,New York, 1994.

[7] Nicole El Karoui, D. Huu Nguyen, and Monique Jeanblanc-Picque. Compactification methods inthe control of degenerate diffusions: existence of an optimal control. Stochastics, 20(3):169–219,1987.

[8] Rufus Isaacs. Differential games. A mathematical theory with applications to warfare and pursuit,control and optimization. John Wiley & Sons Inc., New York, 1965.

[9] Avner Friedman. Differential games. Wiley-Interscience [A division of John Wiley & Sons, Inc.],New York-London, 1971. Pure and Applied Mathematics, Vol. XXV.

[10] Rainer Buckdahn and Juan Li. Stochastic differential games and viscosity solutions of Hamilton-Jacobi-Bellman-Isaacs equations. SIAM J. Control Optim., 47(1):444–475, 2008. Available from:http://dx.doi.org/10.1137/060671954, http://dx.doi.org/10.1137/060671954 doi:10.1137/060671954.

[11] Michael G. Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User’s guide to viscosity so-lutions of second order partial differential equations. Bull. Amer. Math. Soc. (N.S.),27(1):1–67, 1992. Available from: http://dx.doi.org/10.1090/S0273-0979-1992-00266-5,http://dx.doi.org/10.1090/S0273-0979-1992-00266-5 doi:10.1090/S0273-0979-1992-00266-5.

[12] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies in Mathematics.American Mathematical Society, Providence, RI, second edition, 2010.


[13] Ioannis Karatzas and Steven E. Shreve. Brownian motion and stochastic calculus, volume 113 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 1991.

[14] Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion, volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third edition, 1999.

[15] N. V. Krylov. Controlled diffusion processes, volume 14 of Applications of Mathematics. Springer-Verlag, New York, 2008. Reprint of the 1980 edition.

[16] Byung-Il Kim. On some Gronwall type inequalities for a system integral equation. Bull. Korean Math. Soc., 42(4):789–805, 2005.

[17] P. Tankov and N. Touzi. Calcul Stochastique en Finance (polycopié). École Polytechnique, 2009. Available from: http://www.cmap.polytechnique.fr/~touzi/Poly-MAP552.pdf.


Index

Adapted process, 96
Brownian motion, 95
    Filtration induced by the Brownian motion, 32, 95
    standard Brownian motion, 32
Càdlàg, 33
Comparison principle
    for classic solutions, 87
    for the HJBI equation, 77
    for viscosity solutions, 89
Controlled diffusion, 102
Controlled observation, 46
Controlled process, 102
    Non-anticipative controlled process, 43
Controls
    Admissible controls, 4, 35
    Concatenation of controls, 5, 13, 36
    Equivalent controls, 36
    Markov control policies, 37
    Patching of controls, 36
Differential notation, 99
Discontinuous viscosity solution, subsolution, supersolution, 91
Doob's maximal inequality, 97
    for conditional expectations, 97
Dynamic programming principle, 5, 13, 23
    Weak dynamic programming principle, 58
Essential infimum, 32
    with respect to an index, 39
Essential supremum, 32
    with respect to an index, 39
Filtration, 95
    Filtration induced by the Brownian motion, 32, 95
Flow property
    for ODE's, 5
    for SDE's, 13, 60
Frobenius norm, 34, 99
Girsanov's Theorem, 101
Hamilton-Jacobi-Bellman equation, 6, 15, 85
Hamilton-Jacobi-Bellman-Isaacs equation, 25, 69
Independence of irrelevant alternatives, 38, 42
Isaacs' condition, 28, 77
Ishii's Lemma, 89
    for parabolic equations, 92
Itô process, 99
Itô's formula, 99
Itô's isometry, 99
Martingale, 96
    Local martingale, 97
    Martingale representation Theorem, 101
    Submartingale, 96
    Supermartingale, 96
Maximum principle, 87
Merton's optimal portfolio problem, 19
    Worst-case approach, 78
Non-anticipativity
    for controlled processes, 43
    for stopping times, 44
    for strategies, 23, 38
Novikov's criterion, 101
Optimal control
    Deterministic optimal control, 3
    Stochastic optimal control, 9
Optional sampling Theorem, 97
Partial Differential Equation
    Parabolic PDE, 85, 91
    Proper PDE, 85
Payoff function, 5, 11, 23, 37
Probability space, 95
Progressively measurable process, 96
Running cost, 38
Self-financing conditions, 19
Semi-continuous envelope, 90
    Lower semi-continuous envelope, 90
    Upper semi-continuous envelope, 90
State space, 4, 10, 22, 34
Stochastic differential equation, 102
    Strong solution, 102
Stochastic integral, 98
    Itô's isometry, 99
    Properties, 99
Stopping time, 96
    σ-algebra induced by a stopping time, 96
    Non-anticipative controlled stopping time, 44
        Controlled observation associated with, 46
Strategies, 23, 38
    Concatenation of strategies, 44, 45
    Independence of irrelevant alternatives, 42
    Markov control policies, 39
    Non-anticipative strategy, 23, 38
    Patching of strategies, 44, 46
Strategy
    Uniformly ε-optimal strategy, 58
Subdifferential
    of second order, 87
    Parabolic second-order subdifferential, 91
Superdifferential
    of second order, 87
    Parabolic second-order superdifferential, 91
Terminal reward, 4, 11, 22, 37
    Continuity of the terminal reward, 57
    Independence of irrelevant alternatives, 38
Value
    Lower static value, 23, 30
    Lower value, 23, 40
    Upper static value, 23, 30
    Upper value, 23, 40
    Value function, 5, 11, 23
    Value of a game, 28, 40
Viscosity solution, subsolution, supersolution, 85
    for parabolic equations, 91
Wiener space, 32
Zero-sum stochastic differential games, 34