
Page 1: A class of stochastic optimal control problems in Hilbert ...tubaro/articles/fuhrman2003.pdf · Ornstein-Uhlenbeck processes in a Hilbert space, and some subsequent applications ([36],

A class of stochastic optimal control problems in Hilbert spaces:

BSDEs and optimal control laws,

state constraints, conditioned processes

Marco Fuhrman
Dipartimento di Matematica, Politecnico di Milano
piazza Leonardo da Vinci 32, 20133 Milano, Italy

e-mail: [email protected]

Abstract

We consider a nonlinear controlled stochastic evolution equation in a Hilbert space, with a Wiener process affecting the control, assuming Lipschitz conditions on the coefficients. We take a cost functional quadratic in the control term, but otherwise with general coefficients that may even take infinite values. Under a mild finiteness condition, and after appropriate formulation, we prove existence and uniqueness of the optimal control. We construct the optimal feedback law by means of an associated backward stochastic differential equation. In this Hilbert space setting we are able to treat some state constraints and in some cases to recover conditioned processes as optimal trajectories of appropriate optimal control problems. Applications to optimal control of stochastic partial differential equations are also given.

1 Introduction

We consider a stochastic optimal control problem for a system governed by a state equation of the form:

dX^u_t = A X^u_t dt + F(t, X^u_t) dt + G(t, X^u_t) u_t dt + G(t, X^u_t) dW_t,   t ∈ [0, T],
X^u_0 = x,   (1.1)

on a bounded and fixed time interval [0, T]. The solution X^u corresponding to the control u is a process in a Hilbert space H starting from x ∈ H. The Wiener process W and the control process u take values in another Hilbert space U. The linear operator A generates a strongly continuous semigroup on H, and the coefficients F and G, defined on [0, T] × H, are assumed to satisfy Lipschitz conditions with respect to appropriate norms.

We wish to minimize a cost functional of the form:

J(u) = E ∫_0^T [ (1/2)|u_t|^2 + q(t, X^u_t) ] dt + E r(X^u_T),   (1.2)

over all admissible controls. Here |·| is the norm in the space U, and q and r are functions on [0, T] × H and H respectively, with nonnegative values, the value +∞ being allowed. As the class of admissible controls we take square summable U-valued adapted controls. We note that this formulation of the optimal control problem includes state constraints: indeed, a large class of constraints on the trajectories of the state equation can be expressed by choosing q and r to equal +∞ on some prescribed sets.
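To make these objects concrete, here is a minimal sketch of the finite-dimensional analogue (H = U = R, so A is just a scalar and the semigroup plays no role): an Euler-Maruyama discretization of the state equation (1.1) together with a Monte Carlo estimate of the cost (1.2). All coefficient choices below (F, G, q, r and the feedback u) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Euler-Maruyama simulation of a finite-dimensional toy version (H = U = R)
# of the state equation (1.1), plus a Monte Carlo estimate of the cost (1.2).
# All coefficients are illustrative choices, not taken from the paper.
rng = np.random.default_rng(0)

T, N, M = 1.0, 200, 2000      # horizon, time steps, Monte Carlo samples
dt = T / N
A, x0 = -1.0, 0.5             # "operator" A (a scalar here) and initial state
F = lambda t, x: 0.0          # drift nonlinearity (zero for simplicity)
G = lambda t, x: 1.0          # operator acting on both control and noise
q = lambda t, x: x**2         # running state cost
r = lambda x: x**2            # terminal cost
u = lambda t, x: -x           # an arbitrary feedback control to be evaluated

cost = np.zeros(M)
X = np.full(M, x0)
for k in range(N):
    t = k * dt
    Ut = u(t, X)
    cost += (0.5 * Ut**2 + q(t, X)) * dt
    dW = rng.normal(0.0, np.sqrt(dt), size=M)
    X = X + (A * X + F(t, X) + G(t, X) * Ut) * dt + G(t, X) * dW
cost += r(X)
J_estimate = float(cost.mean())
```

Note how the control enters through the same operator G as the noise, which is exactly the structural assumption discussed below.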


The occurrence of the operator G in the control term of the state equation and the occurrence of the quadratic term |u_t|^2 in the cost functional are essential for the results that follow: these structural assumptions are the main restriction imposed by our techniques. However, note the following natural interpretations: the term u_t dt + dW_t in the state equation can be considered as a control affected by noise, and the term E ∫_0^T |u_t|^2 dt in the cost functional can be interpreted as the energy of the control u.

Our main results are the following. First we prove an existence and uniqueness result for the solution of the state equation (Proposition 2.3). The solution is understood in the so called mild sense, customary in the theory of both deterministic and stochastic evolution equations. The conditions we impose on the coefficients of the state equation are standard, and they are satisfied in a large number of applications (see e.g. [6], [7]). To overcome difficulties due to the possible unboundedness of u we introduce a localization technique designed to deal with mild solutions, which is also useful as a technical tool for subsequent results.

Then we prove the so called fundamental relation (Theorem 4.2). We assume that the solution X^0 corresponding to u = 0 satisfies

∫_0^T q(σ, X^0_σ) dσ < ∞,   r(X^0_T) < ∞,   (1.3)

with strictly positive probability, and we show that there exist a real constant J̄ and a function ζ : [0, T] × H → U such that

J(u) = J̄ + (1/2) E ∫_0^T |ζ(t, X^u_t) − u_t|^2 dt   (1.4)

for every admissible control u satisfying J(u) < ∞.

Next (Proposition 5.1 and Theorem 5.2) we show that the so called closed loop equation:

dX_t = AX_t dt + F(t, X_t) dt + G(t, X_t) ζ(t, X_t) dt + G(t, X_t) dW_t,   t ∈ [0, T],   X_0 = x,

admits a mild solution, possibly on a different probability space, unique in law. As a consequence of (1.4) we show that the process defined by

u_t = ζ(t, X_t),   (1.5)

is an optimal control with optimal cost J(u) = J̄. The function ζ is called the optimal feedback law. Moreover, it turns out that the law of X is absolutely continuous with respect to the law of X^0, and we obtain an explicit expression of the density (Corollary 5.3).

Finally (section 6) we consider special choices of q and r in some detail. When these functions take only the values 0 or +∞, the optimal control problem is equivalent to minimizing the energy E ∫_0^T |u_t|^2 dt of the control u under the constraint that the trajectories of the controlled system remain in a prescribed set. In this case it turns out that the law of the optimal trajectory X is obtained by conditioning the paths of X^0 to remain in the given set. In particular, if r = 0 on a set B, r = +∞ on the complement of B and q = 0, then the law of X is obtained by conditioning X^0 to belong to B at the final time, and our construction of the process X is shown to be equivalent to the h-transformation of Doob.
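As a one-dimensional illustration of this conditioning mechanism (a hypothetical toy example, not the paper's Hilbert-space setting), take X^0 a standard Brownian motion started at 0 and condition it to satisfy X_T ≥ 1. Doob's h-transform with h(t, x) = P(x + W_{T−t} ≥ 1) adds the drift ∂_x log h(t, x) to the dynamics; simulating the resulting SDE produces paths whose endpoints lie in the target set.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
T, N, M = 1.0, 500, 1000
dt = T / N

def h_drift(t, x):
    # drift added by Doob's h-transform, d/dx log h(t, x),
    # where h(t, x) = P(x + W_{T-t} >= 1) for a standard Brownian motion W
    s = math.sqrt(T - t)
    z = (x - 1.0) / s
    if z < -8.0:
        return -z / s  # Mills-ratio asymptotics; avoids underflow of pdf/cdf
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return pdf / (s * cdf)

X = np.zeros(M)                      # X^0 starts at 0
for k in range(N):
    t = k * dt
    drift = np.array([h_drift(t, x) for x in X])
    X = X + drift * dt + rng.normal(0.0, math.sqrt(dt), size=M)

frac_in_target = float((X >= 0.9).mean())  # allow a little Euler error
mean_endpoint = float(X.mean())
```

The drift ∂_x log h plays the role of the optimal feedback law ζ in this toy case.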

We point out some features of the results just outlined. The state, the control and the noise are processes in infinite dimensional spaces. The solution of the state equation is taken in the mild sense mentioned above: this allows better solvability results, in particular the presence of an unbounded linear drift A, at the expense of more severe technical difficulties. The control is not required to take values in a bounded set, as sometimes happens in the theory of nonlinear optimal control. The cost functions q and r may take the value +∞: this allows us to treat control problems with some state constraints, but of course causes additional difficulties; in particular, the cost itself may be infinite. We also note that no nondegeneracy assumption of any kind is imposed on the diffusion coefficient G in the state equation, and state dependence (i.e. dependence on the space variable x) is allowed. Besides the Lipschitz conditions, no further regularity is assumed for the coefficients F and G. Because of these features we believe that the results mentioned above are considerably general and improve existing results in the literature.

Among the large number of results relevant to the problems we have addressed, we wish to mention those which are most closely connected with ours, leaving aside results on specific models (as a general reference for the finite dimensional case the reader may refer to [14]).

Existence of an optimal control for stochastic systems in infinite dimensions has been proved in [1], [2], [3], [18], [19]. In these papers the associated Hamilton-Jacobi-Bellman equation on a Hilbert space is considered, and the optimal feedback law is obtained from its solution; see [8] as a general reference for this approach. The results of these papers cannot be applied to our case; in particular, in these papers no constraint is considered, and the coefficient G is assumed to be constant and to satisfy some nondegeneracy conditions.

The notion of viscosity solution has been successfully applied to Hamilton-Jacobi-Bellman equations and subsequently to stochastic control problems. Concerning equations on an infinite-dimensional Hilbert space, relevant references are [4], [25], [26], [27], [38], [39], [20]. Generally speaking, the class of control problems that can be treated by this method is much more general than the one considered in this paper: for example, it includes control-dependent coefficients F and G, and Lipschitz conditions are not required. However, none of the results we know of is directly applicable to our situation, either because some boundedness conditions are not fulfilled in the presence of an unbounded operator A, or because G is required to take values in the space of Hilbert-Schmidt operators, or because other specific properties are required. We also notice that in all the cited references on viscosity solutions a characterization of the optimal control through a feedback law is not available.

The connection between energy-minimizing problems and processes conditioned at final time is classical: see e.g. [14], section VI.4. There is a significant generalization of the classical results in the paper [5], where the solutions of a class of stochastic optimal control problems are identified with the so called reciprocal processes. In [5] a finite entropy condition is assumed for the existence of the required optimal control: this should be compared with our finiteness assumption (1.3), or more precisely (4.2) below. However, in these references the cost function q is zero, and the analysis is carried out in the finite dimensional case with substantial use of elliptic regularity results, which in turn require nondegeneracy of the diffusion coefficient G. The same remarks apply to the classical h-transformation of Doob, which we generalize to diffusion equations in infinite dimensions.

A related problem is a control-theoretic interpretation of tied-down processes, or bridges. Bridges can be formally viewed as optimal trajectories which correspond to controls with infinite energy (see e.g. [14], section VI.4). In some cases bridges can be rigorously shown to be solutions of optimal control problems with appropriate (finite) cost: see [13]. Essential use of densities with respect to the Lebesgue measure and gaussian-type estimates is required, and we have not addressed the possible generalization to the infinite dimensional case, which we will pursue in the future. We mention however the results of [35] on bridges constructed from a class of Ornstein-Uhlenbeck processes in a Hilbert space, and some subsequent applications ([36], [37]).
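For intuition, in a hypothetical one-dimensional toy (not part of the paper), the Brownian bridge from 0 to 0 on [0, T] solves dX_t = −X_t/(T − t) dt + dW_t; the drift can be read as a feedback control u_t = −X_t/(T − t) whose expected energy E ∫_0^T |u_t|^2 dt diverges, matching the formal picture of a bridge as an infinite-energy optimal trajectory. A short Euler simulation shows both effects:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, M = 1.0, 1000, 2000
dt = T / N

X = np.zeros(M)
energy = np.zeros(M)
for k in range(N):
    t = k * dt
    u = -X / (T - t)               # bridge feedback: steer the state to 0 at T
    energy += 0.5 * u**2 * dt      # running control energy (1/2)|u_t|^2 dt
    X = X + u * dt + rng.normal(0.0, np.sqrt(dt), size=M)

mean_abs_endpoint = float(np.abs(X).mean())  # paths are pinned near 0 at time T
mean_energy = float(energy.mean())           # grows like log(1/dt) as dt -> 0
```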

Our approach to the control problem is based on backward stochastic differential equations (BSDEs). The subject of (general nonlinear) BSDEs originated in the paper [31]: see [28], [11] as general references. Applications of BSDEs to control problems are well known, see e.g. [32], [12] or [11] Part III, and extensions to infinite dimensional control problems can be found in [15], [16], [17]. In this paper BSDEs are used as a tool to define the optimal control law ζ. The equations we will consider are very elementary, and consequently our exposition will not rely on previous results of this theory. Although existence and uniqueness of the solutions of the BSDEs we will consider is immediate, we need to prove some nontrivial properties of the solutions which were not known so far, at least to our knowledge. Since the definition of ζ is rather indirect, we will now explain the idea of our approach informally, following [15] and assuming H = U = R^n

for simplicity.

For an arbitrary admissible control u we consider the process

W^1_t = W_t + ∫_0^t u_s ds,   t ∈ [0, T],

and we associate to the state equation the following BSDE:

dY_t = Z_t dW^1_t + (1/2)|Z_t|^2 dt − q(t, X^u_t) dt,   Y_T = r(X^u_T).   (1.6)

Suppose that we can find a pair of adapted processes (Y, Z) in R × R^n solving this equation. In most cases (for instance if W^1 is a Wiener process under another probability measure, as a consequence of the Girsanov theorem) it turns out that Y_0 is deterministic, and it is a functional of q, r and the coefficients of the state equation. Writing (1.6) in terms of W:

r(X^u_T) − Y_0 = ∫_0^T Z_t dW_t + ∫_0^T Z_t u_t dt + (1/2) ∫_0^T |Z_t|^2 dt − ∫_0^T q(t, X^u_t) dt,

and taking expectation we obtain, after rearranging terms and recalling the cost functional (1.2),

J(u) = Y_0 + E ∫_0^T [ (1/2)|Z_t|^2 + Z_t u_t + (1/2)|u_t|^2 ] dt = Y_0 + (1/2) E ∫_0^T |Z_t + u_t|^2 dt.

Under suitable regularity assumptions it is known that the process Z has indeed the form Z_t = −ζ(t, X^u_t) for some function ζ : [0, T] × R^n → R^n. This way we get the fundamental relation (1.4) with J̄ = Y_0. Solvability of (1.6) follows from the results in [21] or [24] if the terminal condition r(X^u_T) is bounded. Since this need not be the case, we may consider solving the BSDE by a change of variable (suggested in [21], Example 1): we set

y_t = exp( −Y_t − ∫_0^t q(σ, X^u_σ) dσ ),   z_t = −y_t Z_t,   (1.7)

and by the Ito formula the equation (1.6) becomes

dy_t = z_t dW^1_t,   y_T = exp( −r(X^u_T) − ∫_0^T q(σ, X^u_σ) dσ ),   (1.8)

which is immediately solvable, since y is simply a martingale whose final value is given, and z is the process provided by the well known martingale representation theorem. Note that the prescribed value y_T is bounded even if the (positive) functions r and q are not. Our strategy consists in solving (1.8) first, and then deducing a fundamental relation from it. In particular we find J̄ = −log y_0 as the optimal cost. We never need to introduce the process (Y, Z). While deriving these results we allow more generality, as explained above; in particular, applications of the Girsanov theorem are never immediate, due to the possible unboundedness of the control process u, which is only assumed to be square integrable (and adapted). While existence of a solution of (1.8) is trivial, in deducing (1.4) we need to prove that there exists a function ζ : [0, T] × R^n → R^n such that z_t = ζ(t, X^u_t) y_t. This is a nontrivial problem, not previously considered, and complicated by the fact that y is not necessarily strictly positive, for instance if r or q take the value +∞. This proof of the existence of ζ constitutes a novel approach to the definition of the optimal feedback law, based on BSDEs.
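The strategy can be watched numerically in a trivially solvable case (a hypothetical example with H = U = R, A = 0, F = 0, G = 1, q = 0 and r(x) = x^2, none of which comes from the paper). Here X^0_t = x + W_t, the martingale y of (1.8) has y_0 = E exp(−r(X^0_T)), and a Gaussian computation gives the closed form y_0 = (1 + 2T)^{−1/2} exp(−x^2/(1 + 2T)); a Monte Carlo estimate reproduces it, and the candidate optimal cost is J̄ = −log y_0.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
T, x0, M = 1.0, 0.5, 100_000

# y_0 = E exp(-r(X^0_T)) with r(x) = x^2 and X^0_T = x0 + W_T
XT = x0 + rng.normal(0.0, math.sqrt(T), size=M)
y0_mc = float(np.exp(-XT**2).mean())

# closed form: E exp(-(x0 + sqrt(T) Z)^2) = (1 + 2T)^{-1/2} exp(-x0^2 / (1 + 2T))
y0_exact = (1.0 + 2.0 * T) ** -0.5 * math.exp(-x0**2 / (1.0 + 2.0 * T))

J_bar = -math.log(y0_mc)   # optimal cost of the associated control problem
```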

In this paper we have not addressed a detailed study of regularity properties of the control law, which may eventually lead to further improvements of our results.

One final comment on the change of unknown processes (1.7) seems in order. This is in fact a probabilistic counterpart of the so called Cole-Hopf transformation, otherwise called the Fleming logarithmic transformation: see [14], chapter VI. Indeed, denoting by V(t, x), t ∈ [0, T], x ∈ H, the usual value function of the control problem, the Hamilton-Jacobi-Bellman equation of dynamic programming is

∂_t V(t, x) + L_t V(t, x) + q(t, x) = (1/2) |∇_x V(t, x) G(t, x)|^2,   V(T, x) = r(x),

where L_t denotes the generator of the Markov process associated with the state equation (with u = 0), namely

L_t f(x) = (1/2) Trace( G(t, x) G(t, x)* ∇^2_x f(x) ) + ⟨Ax, ∇_x f(x)⟩ + ⟨∇_x f(x), F(t, x)⟩,

for every smooth function f : H → R. This expression is formal, as we do not specify the domain of L_t. Under appropriate assumptions the optimal control law is ζ(t, x) = −G(t, x)* ∇_x V(t, x)* and the optimal cost is V(0, x). The logarithmic transformation consists in defining the function v(t, x) = exp(−V(t, x)) and noting that the equation for v deduced from the Hamilton-Jacobi-Bellman equation is the linear equation

∂_t v(t, x) + L_t v(t, x) = q(t, x) v(t, x),   v(T, x) = exp(−r(x)).   (1.9)

By the Feynman-Kac formula and equation (1.7) we obtain y_0 = v(0, x), and the optimal value V(0, x) = −log v(0, x) = −log y_0 coincides with the one found above.
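For the record, the formal computation behind this linearization (under the smoothness assumed in this paragraph, and written in finite-dimensional notation) is a short application of the chain rule:

```latex
% chain rule for v = e^{-V}:
\partial_t v = -v\,\partial_t V, \qquad
\nabla_x v = -v\,\nabla_x V, \qquad
\nabla_x^2 v = v\,\nabla_x V\otimes\nabla_x V - v\,\nabla_x^2 V .
% substituting into \partial_t v + L_t v, the quadratic terms combine:
\partial_t v + L_t v
  = -v\left(\partial_t V + L_t V - \tfrac{1}{2}\,\bigl|\nabla_x V\, G\bigr|^2\right)
  = -v\,(-q) = q\,v ,
% using the Hamilton-Jacobi-Bellman equation in the last step;
% the terminal condition becomes v(T,x) = e^{-r(x)}, which is (1.9).
```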

The logarithmic transformation has been generalized to the infinite dimensional case only when the diffusion coefficient G is constant: see the references above (some partial results with variable but nondegenerate G can be found in [9]). Our results are more general since we also allow r and q to be infinite: in this case v takes the value 0, V may be infinite or nonsmooth, and the feedback law cannot be directly defined in terms of ∇V. Using our definition of the feedback law based on BSDEs these difficulties can be overcome.

The plan of the paper is as follows. In section 2 we state the main assumptions, formulate the control problem and show well-posedness of the state equation. In section 3 we study the BSDE (1.8) and define the function ζ that will eventually turn out to be the optimal control law. In section 4 we prove the fundamental relation (1.4) and draw the first consequences on the solvability of the control problem. In section 5 we restate the control problem in the framework of admissible control systems (see e.g. [14]), which allows us to show existence of an optimal control system and to determine uniquely the law of the optimal trajectory. In section 6 constrained problems are considered, and relationships with conditioned processes and the Doob h-transformation are investigated. Finally, in section 7 some applications are given in order to illustrate the applicability of our results to several concrete cases, including some controlled stochastic partial differential equations with finite or infinite dimensional noise.


2 The optimal control problem: strong formulation

This section is devoted to the formulation of the control problem and to some preliminaries. Existence of an optimal control will be discussed later.

We start with some notation. Throughout the paper, H and U denote real separable Hilbert spaces, with scalar products (·, ·)_H, (·, ·)_U. We use the symbol |·| to denote the norm in various spaces, with a subscript if necessary. For any element z ∈ U*, we denote by z* the element of U corresponding to z by the Riesz isometry U* → U, i.e. satisfying zu = (z*, u)_U for u ∈ U. The space of bounded linear operators from U to H, with the usual operator norm, is denoted L(U, H); the subspace of Hilbert-Schmidt operators, with the Hilbert-Schmidt norm, is denoted L_2(U, H).

On a probability space (Ω, F, P) with a filtration F_t, t ≥ 0, a cylindrical P-Wiener process W_t, t ≥ 0, with respect to F_t, taking values in a Hilbert space U, is a family of mappings W_t : U → L^2(Ω, F, P) such that the process W_t ξ, ξ ∈ U, t ≥ 0, is gaussian centered with covariance given by E[(W_t ξ)(W_s ξ′)] = (ξ, ξ′)_U (t ∧ s) for t, s ≥ 0, ξ, ξ′ ∈ U; moreover, for every ξ ∈ U the process W_t ξ, t ≥ 0, is a real (continuous) Wiener process. Such a process can be constructed for instance starting from a P-Wiener process W̄_t, t ≥ 0, in a Hilbert space Ū, letting U ⊂ Ū denote the Cameron-Martin space of the law of W̄_1, and defining W_t ξ = (W̄_t, ξ)_U for t ≥ 0, ξ ∈ U.

The Ito stochastic integral process I_T = ∫_0^T Φ_t dW_t, T ≥ 0, with respect to a cylindrical Wiener process can be defined for predictable integrand processes Φ_t, t ≥ 0, with values in L_2(U, H) satisfying, P-a.s., ∫_0^T |Φ_t|^2_{L_2(U,H)} dt < ∞ for every T > 0. The process I is a continuous local martingale in H, and the Ito isometry E|I_T|^2 = E ∫_0^T |Φ_t|^2_{L_2(U,H)} dt holds provided the right-hand side is finite for every T > 0; in this case I is a square-integrable martingale in H.

For these preliminaries we refer the reader to [6], [7], or [34]. We will often consider filtrations or processes defined only on a bounded interval [0, T].
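A finite truncation of a cylindrical Wiener process is easy to simulate: fix an orthonormal basis {e_i} and independent scalar Brownian motions β^i, and set W_t ξ = Σ_i β^i_t (e_i, ξ)_U. The sketch below (U = R^5, an assumed toy choice) checks the covariance identity E[(W_t ξ)(W_s ξ′)] = (ξ, ξ′)_U (t ∧ s) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(4)
d, M = 5, 200_000          # truncation dimension, Monte Carlo samples
s, t = 0.3, 0.7            # two fixed times, s < t

xi = np.array([1.0, 0.0, 2.0, 0.0, -1.0]); xi /= np.linalg.norm(xi)
eta = np.array([0.0, 1.0, 1.0, 0.0, 1.0]); eta /= np.linalg.norm(eta)

# Brownian motion in R^d sampled at times s and t (standard basis e_i)
Bs = rng.normal(0.0, np.sqrt(s), size=(M, d))
Bt = Bs + rng.normal(0.0, np.sqrt(t - s), size=(M, d))

Wt_xi = Bt @ xi            # W_t xi  = sum_i beta^i_t (e_i, xi)
Ws_eta = Bs @ eta          # W_s eta = sum_i beta^i_s (e_i, eta)

cov_mc = float((Wt_xi * Ws_eta).mean())
cov_exact = float(xi @ eta) * min(s, t)   # (xi, eta)_U (t ^ s)
```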

We consider a stochastic differential equation describing the evolution of the state X^u corresponding to the control u:

dX^u_t = A X^u_t dt + F(t, X^u_t) dt + G(t, X^u_t) u_t dt + G(t, X^u_t) dW_t,   t ∈ [0, T],
X^u_0 = x ∈ H.   (2.1)

X^u takes values in a Hilbert space H. We consider a cost functional of the form:

J(u) = E ∫_0^T [ (1/2)|u_t|^2 + q(t, X^u_t) ] dt + E r(X^u_T),   (2.2)

where q and r are functions on [0, T] × H and H respectively, with nonnegative values. Our purpose is to minimize the functional J over all admissible controls.

(Ω, F, P) is a given complete probability space, with a right-continuous and P-complete filtration F_t, t ∈ [0, T]. W_t, t ∈ [0, T], is a cylindrical P-Wiener process with respect to the filtration F_t, taking values in a Hilbert space U. We call admissible control any F_t-predictable process u_t, t ∈ [0, T], with values in U and satisfying E ∫_0^T |u_t|^2 dt < ∞. The precise notion of solution X^u to equation (2.1) will be given below.

We make the following assumptions on the coefficients of the cost functional.

Hypothesis 2.1 The functions r : H → [0, ∞] and q : [0, T] × H → [0, ∞] are measurable.

Notice that we allow infinite values for q and r. On the coefficients A, F, G of the state equation we assume the following:


Hypothesis 2.2 (i) The operator A is the generator of a strongly continuous semigroup e^{tA}, t ≥ 0, of bounded linear operators in the Hilbert space H.

(ii) The mapping F : [0, T] × H → H is measurable and satisfies, for some constant L > 0,

|F(t, x) − F(t, y)| ≤ L |x − y|,   t ∈ [0, T], x, y ∈ H.

(iii) G is a mapping [0, T] × H → L(U, H) such that for every v ∈ U the map Gv : [0, T] × H → H is measurable and bounded on bounded sets, e^{sA} G(t, x) ∈ L_2(U, H) for every s > 0, t ∈ [0, T] and x ∈ H, and

|e^{sA} G(t, x)|_{L_2(U,H)} ≤ L s^{−γ} (1 + |x|),
|e^{sA} G(t, x) − e^{sA} G(t, y)|_{L_2(U,H)} ≤ L s^{−γ} |x − y|,   s > 0, t ∈ [0, T], x, y ∈ H,   (2.3)

for some constants L > 0 and γ ∈ [0, 1/2).

We note that G is also bounded on bounded sets as a mapping [0, T] × H → L(U, H), i.e. with respect to the operator norm, due to the Banach-Steinhaus theorem.
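Hypothesis 2.2(iii) is satisfied, for instance, by the Dirichlet heat semigroup on (0, 1) with G the identity (a standard example, sketched here as an assumption rather than a claim about the paper's applications in section 7): the eigenvalues of A are −(kπ)^2, so |e^{sA}|^2_{L_2} = Σ_k e^{−2s(kπ)^2} behaves like c s^{−1/2} as s ↓ 0, giving γ = 1/4 < 1/2. The check below verifies numerically that s^{1/4} |e^{sA}|_{L_2} stays bounded.

```python
import numpy as np

# Hilbert-Schmidt norm of e^{sA} for A the Dirichlet Laplacian on (0,1):
# |e^{sA}|_{HS}^2 = sum_k exp(-2 s (k pi)^2), truncated at K terms
K = 5000
k = np.arange(1, K + 1)

ratios = []
for s in np.logspace(-4, 0, 50):
    hs_norm = np.sqrt(np.exp(-2.0 * s * (np.pi * k) ** 2).sum())
    ratios.append(s ** 0.25 * hs_norm)   # should stay bounded: gamma = 1/4

max_ratio = float(max(ratios))
```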

We say that X^u is the trajectory corresponding to u if it is a continuous, F_t-adapted process with values in H satisfying, P-a.s.,

X^u_t = e^{tA} x + ∫_0^t e^{(t−σ)A} F(σ, X^u_σ) dσ + ∫_0^t e^{(t−σ)A} G(σ, X^u_σ) u_σ dσ + ∫_0^t e^{(t−σ)A} G(σ, X^u_σ) dW_σ,   t ∈ [0, T].   (2.4)

We say that X^u is the mild solution of equation (2.1).
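In the scalar linear case dX_t = aX_t dt + dW_t (an assumed illustration with H = U = R, F = 0, G = 1, u = 0), the mild/variation-of-constants formula (2.4) reduces to X_t = e^{at} x + ∫_0^t e^{a(t−σ)} dW_σ. The sketch below discretizes this formula and checks that it agrees, on the same Brownian increments, with the Euler scheme for the differential form of the equation.

```python
import numpy as np

rng = np.random.default_rng(5)
a, x0, T, N = -1.0, 1.0, 1.0, 2000
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), size=N)
t_grid = dt * np.arange(1, N + 1)

# discretized mild formula: X_t = e^{at} x0 + sum_{sigma_k < t} e^{a(t - sigma_k)} dW_k
X_mild = np.empty(N)
stoch = 0.0
for n in range(N):
    stoch += np.exp(-a * (n * dt)) * dW[n]      # running sum of e^{-a sigma_k} dW_k
    X_mild[n] = np.exp(a * t_grid[n]) * (x0 + stoch)

# Euler scheme for dX = a X dt + dW on the same increments
X_euler = np.empty(N)
x = x0
for n in range(N):
    x = x + a * x * dt + dW[n]
    X_euler[n] = x

max_diff = float(np.abs(X_mild - X_euler).max())
```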

Proposition 2.3 Under the assumptions of Hypothesis 2.2, for every admissible control there exists a unique trajectory X^u.

The proposition is not an immediate consequence of well-known results on stochastic evolution equations, since an admissible control u is just a square-summable predictable process. In the proof we will proceed by localization, stopping the control process at the time when ∫_0^t |u_s|^2 ds first exits a ball of radius n. Technical difficulties arise at this point: for instance, we note that the process (X_{t∧S}), where S is a stopping time, is not a mild solution of the equation with coefficients A, F, G replaced by A 1_{[0,S]}, F 1_{[0,S]}, G 1_{[0,S]}, simply because A 1_{[0,S]} is not the generator of a semigroup (a corresponding notion of mild solution can be defined, see [23], but it involves several additional difficulties). This indicates that the localization procedure needs to be adapted to deal with mild solutions.

For any stopping time S with values in [0, T] and any continuous adapted process Y in H we define the process Y^{S,A} by setting Y^{S,A}_t = e^{(t−S)^+ A} Y_{t∧S}, i.e.

Y^{S,A}_t = Y_t if t ≤ S,
Y^{S,A}_t = e^{(t−S)A} Y_S if t > S.
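On a discrete grid the operation Y ↦ Y^{S,A} is elementary; the snippet below (scalar A, so e^{tA} is just e^{ta}, and a deterministic S — a hypothetical illustration) freezes the path at S, lets the semigroup act afterwards, and checks the two-case description above.

```python
import numpy as np

a = -0.5                        # scalar "generator": e^{tA} = exp(t a)
T, N = 1.0, 100
t = np.linspace(0.0, T, N + 1)

rng = np.random.default_rng(6)
Y = np.cumsum(rng.normal(0.0, 0.1, size=N + 1))   # some continuous-ish path
S = 0.4                                           # (deterministic) stopping time

# Y^{S,A}_t = e^{(t-S)^+ a} Y_{t ^ S}
Y_SA = np.exp(np.maximum(t - S, 0.0) * a) * np.interp(np.minimum(t, S), t, Y)

before = t <= S
YS = np.interp(S, t, Y)
ok_before = bool(np.allclose(Y_SA[before], Y[before]))               # Y_t for t <= S
ok_after = bool(np.allclose(Y_SA[~before],
                            np.exp((t[~before] - S) * a) * YS))      # e^{(t-S)A} Y_S
```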

Lemma 2.4 Let X be a continuous adapted process in H. Defining

Y_t = ∫_0^t e^{(t−σ)A} G(σ, X_σ) dW_σ,   t ∈ [0, T],

then Y has a continuous modification and, P-a.s.,

Y^{S,A}_t = ∫_0^t 1_{[0,S]}(σ) e^{(t−σ)A} G(σ, X_σ) dW_σ,   t ∈ [0, T].   (2.5)


Proof. We first note that the stochastic integrals in the statement of the Lemma are well defined, since by our assumptions and the continuity of X we have, P-a.s.,

∫_0^t |e^{(t−σ)A} G(σ, X_σ)|^2_{L_2(U,H)} dσ ≤ L ∫_0^t (t − σ)^{−2γ} (1 + |X_σ|)^2 dσ ≤ L T^{1−2γ} (1 − 2γ)^{−1} sup_{σ∈[0,T]} (1 + |X_σ|)^2 < ∞.

Next we prove that Y has a continuous modification. Indeed, let T_k = inf{t ∈ [0, T] : |X_t| > k}, with T_k = T if this set is empty, and let

Y^k_t = ∫_0^{t∧T_k} e^{(t−σ)A} G(σ, X_σ) dW_σ = ∫_0^t e^{(t−σ)A} G(σ, X_σ) 1_{[0,T_k]}(σ) dW_σ.

The estimate

|e^{(t−σ)A} G(σ, X_σ) 1_{[0,T_k]}(σ)|_{L_2(U,H)} ≤ L (t − σ)^{−γ} (1 + |X_σ|) 1_{[0,T_k]}(σ) ≤ L (t − σ)^{−γ} (1 + k)

shows that Y^k is well defined, and an application of the factorization method (see e.g. [7], Theorems 5.2.5 and 5.2.6) yields the continuity of the process Y^k. Since P(T_k < T) → 0 as k → ∞ and since, on the set {T_k = T}, we have, P-a.s., Y_t = Y^k_t for every t, it follows that a continuous modification of Y exists. In a similar way one proves the existence of a continuous modification of the right-hand side of (2.5).

In order to prove equality (2.5), assume first that A is a bounded linear operator and U = R^d. Denote by I_t the right-hand side of (2.5). Then the Ito stochastic differential of I is easily computed: dI_t = A I_t dt + 1_{[0,S]}(t) G(t, X_t) dW_t. Similarly we have dY_t = A Y_t dt + G(t, X_t) dW_t and therefore dY_{t∧S} = 1_{[0,S]}(t) A Y_t dt + 1_{[0,S]}(t) G(t, X_t) dW_t. Since d e^{(t−S)^+ A} h = 1_{(S,T]}(t) A e^{(t−S)A} h dt for every h ∈ H, it follows from the Ito formula that dY^{S,A}_t = A Y^{S,A}_t dt + 1_{[0,S]}(t) G(t, X_t) dW_t. We conclude that d(Y^{S,A}_t − I_t) = A(Y^{S,A}_t − I_t) dt and therefore Y^{S,A} = I.

To prove the general case we let A_n = nA(nI − A)^{−1} ∈ L(H) denote the Yosida approximations of A. Then we take a basis {e_k} of U, we denote by P_N the orthogonal projection onto the space spanned by e_1, . . . , e_N, and we set W^N = P_N W and

Y^{(n,N)}_t = ∫_0^t e^{(t−σ)A_n} G(σ, X_σ) dW^N_σ,   Y^{(N)}_t = ∫_0^t e^{(t−σ)A} G(σ, X_σ) dW^N_σ.

Y^{(n,N)}_t is well defined since, P-a.s.,

∫_0^t |e^{(t−σ)A_n} G(σ, X_σ) P_N|^2_{L_2(U,H)} dσ = Σ_{i=1}^N ∫_0^t |e^{(t−σ)A_n} G(σ, X_σ) e_i|^2 dσ < ∞,

since we assume that X is continuous and G e_i is bounded on bounded sets. Let S_k be a decreasing sequence of stopping times with values in [0, T], converging to S P-a.s., such that each S_k takes only a finite number of values. The special case proved above implies

e^{(t−S_k)^+ A_n} Y^{(n,N)}_{t∧S_k} = ∫_0^t 1_{[0,S_k]}(σ) e^{(t−σ)A_n} G(σ, X_σ) dW^N_σ.

By a standard localization procedure, it can be proved that for every t, Y^{(n,N)}_t → Y^{(N)}_t in probability as n → ∞ and Y^{(N)}_t → Y_t in probability as N → ∞. Since S_k takes only a finite number of values, for every t, Y^{(n,N)}_{t∧S_k} → Y^{(N)}_{t∧S_k} in probability as n → ∞ and Y^{(N)}_{t∧S_k} → Y_{t∧S_k} in probability as N → ∞. It follows that, for every t,

Y^{S_k,A}_t = ∫_0^t 1_{[0,S_k]}(σ) e^{(t−σ)A} G(σ, X_σ) dW_σ,   P-a.s.

Letting k → ∞, the right-hand side converges in probability to the right-hand side of (2.5), and Y^{S_k,A}_t → Y^{S,A}_t, P-a.s. Since both sides of (2.5) are continuous processes, the Lemma is completely proved.

Corollary 2.5 Let X be a mild solution of (2.1) and let S be a stopping time with values in [0, T]. Then the process X^{S,A} is a mild solution of the equation:

dX^{S,A}_t = A X^{S,A}_t dt + F(t, X^{S,A}_t) 1_{[0,S]}(t) dt + G(t, X^{S,A}_t) 1_{[0,S]}(t) [u_t dt + dW_t],   t ∈ [0, T],

with X^{S,A}_0 = x.

Proof. We define Y_t = ∫_0^t e^{(t−σ)A} G(σ, X_σ) dW_σ and

V_t = e^{tA} x + ∫_0^t e^{(t−σ)A} F(σ, X_σ) dσ + ∫_0^t e^{(t−σ)A} G(σ, X_σ) u_σ dσ,

so that X = V + Y. Then it is readily checked that

V^{S,A}_t = e^{tA} x + ∫_0^t e^{(t−σ)A} F(σ, X_σ) 1_{[0,S]}(σ) dσ + ∫_0^t e^{(t−σ)A} G(σ, X_σ) 1_{[0,S]}(σ) u_σ dσ,

and the conclusion follows from Lemma 2.4.

The following simple uniqueness lemma will be used several times.

Lemma 2.6 Let S be a stopping time with values in [0, T], u a predictable process such that ∫_0^T |u_t|^2 dt is bounded P-a.s., and let X be a bounded mild solution of the equation

dX_t = A X_t dt + F(t, X_t) 1_{[0,S]}(t) dt + G(t, X_t) 1_{[0,S]}(t) [u_t dt + dW_t],   t ∈ [0, T],

with X_0 = x. If X′ is another bounded mild solution of the same equation with X′_0 = x, then X = X′.

Proof. We have, denoting by C a constant that may vary from line to line,

E |∫_0^t e^{(t−σ)A} [G(σ, X_σ) − G(σ, X′_σ)] 1_{[0,S]}(σ) dW_σ|^2 = E ∫_0^t |e^{(t−σ)A} [G(σ, X_σ) − G(σ, X′_σ)]|^2_{L_2(U,H)} 1_{[0,S]}(σ) dσ ≤ C ∫_0^t (t − σ)^{−2γ} E |X_σ − X′_σ|^2 dσ.

Similarly,

|∫_0^t e^{(t−σ)A} [G(σ, X_σ) − G(σ, X′_σ)] u_σ 1_{[0,S]}(σ) dσ|^2 ≤ C |∫_0^t (t − σ)^{−γ} |X_σ − X′_σ| |u_σ| dσ|^2 ≤ C ∫_0^t (t − σ)^{−2γ} |X_σ − X′_σ|^2 dσ ∫_0^t |u_σ|^2 dσ ≤ C ∫_0^t (t − σ)^{−2γ} |X_σ − X′_σ|^2 dσ,

and

|∫_0^t e^{(t−σ)A} [F(σ, X_σ) − F(σ, X′_σ)] 1_{[0,S]}(σ) dσ|^2 ≤ C ∫_0^t |X_σ − X′_σ|^2 dσ.

Setting v(t) = E |Xt −X ′t|2 we conclude that v(t) ≤ C

∫ t0 (t− σ)−2γv(σ) dσ and hence v(t) = 0

for all t, by a variant of the Gronwall lemma.Proof of Proposition 2.3. Step 1. We first assume that u = 0. For fixed p ∈ (2,∞)

let us denote by H_p the space of continuous adapted processes X, with values in H, satisfying E sup_{t∈[0,T]} |X_t|^p < ∞. It can be proved that there exists a unique mild solution in H_p: see e.g. [7], Theorem 5.3.1, or [15], Proposition 3.2. An outline of the proof, which we need to recall in order to proceed further, is as follows: one proves that the mapping Φ : H_p → H_p given by the formula

Φ(X)_t = e^{tA}x + ∫_0^t e^{(t−σ)A} F(σ,X_σ) dσ + ∫_0^t e^{(t−σ)A} G(σ,X_σ) dW_σ, t ∈ [0,T], X ∈ H_p,

is well defined and is a contraction in the space H_p, endowed with the norm ‖X‖^p = E sup_{t∈[0,T]} e^{−βpt} |X_t|^p, provided β > 0 is chosen sufficiently large. The unique fixed point is the required solution. In particular the solution X satisfies

‖X‖ ≤ C(1 + |x|),   (2.6)

for some constant C depending only on p, γ, T, L and on sup_{t∈[0,T]} |e^{tA}|.

Step 2. Now we assume that u is an admissible control satisfying the additional condition:

P(∫_0^T |u_t|² dt ≤ c₀²) = 1 for some constant c₀ > 0.

We define a mapping Γ : H_p → H_p by the formula

Γ(X)_t = ∫_0^t e^{(t−σ)A} G(σ,X_σ) u_σ dσ, t ∈ [0,T], X ∈ H_p.

Then we obtain

|Γ(X)_t| ≤ L ∫_0^t (t−σ)^{−γ} (1 + |X_σ|) |u_σ| dσ
  ≤ L (∫_0^t (t−σ)^{−2γ} (1 + |X_σ|)² dσ)^{1/2} (∫_0^t |u_σ|² dσ)^{1/2}
  ≤ L c₀ (sup_{t∈[0,T]} e^{−βt} (1 + |X_t|)) (∫_0^t e^{2βσ} (t−σ)^{−2γ} dσ)^{1/2}.

So we have

e^{−βt} |Γ(X)_t| ≤ L c₀ (1 + sup_{t∈[0,T]} e^{−βt} |X_t|) (∫_0^t e^{−2β(t−σ)} (t−σ)^{−2γ} dσ)^{1/2}

and we finally obtain

‖Γ(X)‖ ≤ L c₀ (1 + ‖X‖) (∫_0^T e^{−2βσ} σ^{−2γ} dσ)^{1/2}.

This shows that Γ is a well-defined mapping on H_p. If X, X¹ are processes in H_p, similar passages show that

‖Γ(X) − Γ(X¹)‖ ≤ L c₀ ‖X − X¹‖ (∫_0^T e^{−2βσ} σ^{−2γ} dσ)^{1/2}.


so that, for β sufficiently large, the mapping Φ + Γ is a contraction. We conclude that there exists a unique mild solution in the space H_p.

Step 3. (Uniqueness.) Let X be the trajectory corresponding to an arbitrary admissible control u. Let us define the stopping times

T_n = inf{t ∈ [0,T] : ∫_0^t |u_s|² ds > n or |X_t| > n},

with the convention that T_n = T if this set is empty. Since X has continuous paths, and since the requirement E ∫_0^T |u_t|² dt < ∞ implies in particular that P(∫_0^T |u_t|² dt < ∞) = 1, it follows that for P-almost all ω there exists n(ω) such that T_n(ω) = T for n > n(ω). By the definition of T_n the process X^{T_n,A} is bounded, and by Corollary 2.5 it is a mild solution of

dX^{T_n,A}_t = AX^{T_n,A}_t dt + F(t,X^{T_n,A}_t) 1_{[0,T_n]}(t) dt + G(t,X^{T_n,A}_t) 1_{[0,T_n]}(t) [u_t dt + dW_t], t ∈ [0,T].

We note that the process 1_{[0,T_n]} u satisfies ∫_0^T 1_{[0,T_n]}(t) |u_t|² dt ≤ n, P-a.s. So if Y is another trajectory corresponding to the same control, Lemma 2.6 implies that X^{T_n,A} = Y^{T_n,A} and therefore X = Y.

Step 4. (Existence.) Let us define the stopping times T_n = inf{t ∈ [0,T] : ∫_0^t |u_s|² ds > n}, let us set u^n := u 1_{[0,T_n]} and let us denote by X^n the mild solution of (2.1) corresponding to the control u^n. The existence of X^n follows from Step 2. Since u^{n+1} 1_{[0,T_n]} = u^n, from Corollary 2.5 and the uniqueness property already proved we deduce that (X^{n+1})^{T_n,A} = (X^n)^{T_n,A}. In particular, X^{n+1}_t = X^n_t for t ≤ T_n, and so there exists a continuous adapted process X such that X_t = X^n_t for t ≤ T_n and for all n. Since X^{T_n,A} = (X^n)^{T_n,A}, the process X^{T_n,A} is the mild solution of

dX^{T_n,A}_t = AX^{T_n,A}_t dt + F(t,X^{T_n,A}_t) 1_{[0,T_n]}(t) dt + G(t,X^{T_n,A}_t) 1_{[0,T_n]}(t) [u_t dt + dW_t], t ∈ [0,T].

Denoting

V_t = e^{tA}x + ∫_0^t e^{(t−σ)A} F(σ,X_σ) dσ + ∫_0^t e^{(t−σ)A} G(σ,X_σ) u_σ dσ, Y_t = ∫_0^t e^{(t−σ)A} G(σ,X_σ) dW_σ,

it follows from Lemma 2.4 that X^{T_n,A} = V^{T_n,A} + Y^{T_n,A}, which implies that X = V + Y and shows that X is the required solution.
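The fixed-point construction of Steps 1–2 can be visualised in a finite-dimensional toy model. The sketch below (all model choices are illustrative assumptions, not from the paper: H = U = ℝ, A = −1, F(t,x) = sin x, G ≡ 0.4, u = 0) discretises the Picard map Φ on a time grid and iterates it; the gap between successive iterates shrinks geometrically, mirroring the contraction property.

```python
import numpy as np

# Toy 1-d sketch of the Picard/contraction construction of the mild solution
# (Steps 1-2 of the proof of Proposition 2.3). Illustrative assumptions:
# H = U = R, A = -1, F(t,x) = sin(x), G = 0.4, u = 0.
rng = np.random.default_rng(0)
T, n = 1.0, 200
dt = T / n
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(dt), n)   # Wiener increments on the grid
x0, A, G = 1.0, -1.0, 0.4

def Phi(X):
    """Discretised Picard map: (Phi X)_t = e^{tA} x0
    + int_0^t e^{(t-s)A} F(s, X_s) ds + int_0^t e^{(t-s)A} G dW_s."""
    out = np.empty(n + 1)
    for k in range(n + 1):
        kernel = np.exp((t[k] - t[:k]) * A)
        out[k] = (np.exp(t[k] * A) * x0
                  + np.sum(kernel * np.sin(X[:k]) * dt)
                  + np.sum(kernel * G * dW[:k]))
    return out

X = np.full(n + 1, x0)   # start the iteration from the constant path
gaps = []
for _ in range(25):
    X_new = Phi(X)
    gaps.append(float(np.max(np.abs(X_new - X))))
    X = X_new
```

With these constants the Lipschitz bound gives a contraction factor of roughly 1 − e^{−T} ≈ 0.63 per step in the sup norm, so 25 iterations stabilise the path well below 10⁻³.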

Remark 2.7 The reader may note that Proposition 2.3 still holds, with the same proof, if we merely require that the predictable process u satisfies P(∫_0^T |u_t|² dt < ∞) = 1 instead of E ∫_0^T |u_t|² dt < ∞.

3 The uncontrolled process, a special backward equation and the definition of the feedback law

Our purpose is to define a function ζ that, under appropriate assumptions, turns out to be the optimal feedback law. The definition is rather indirect, and requires several preliminary results.


3.1 The uncontrolled process.

In this section we study more carefully the trajectories of the state equation corresponding to u = 0. The resulting solution will be denoted by X and called the uncontrolled process. To allow more generality, we solve the equation on an arbitrary interval [t,T] ⊂ [0,T]. It is also convenient to reformulate the assumptions on the noise in a slightly different way.

Let W¹_t, t ∈ [0,T], be a cylindrical Wiener process with values in a Hilbert space U, defined on a probability space (Ω,F,P). We assume that F is P-complete and denote by N the family of P-null sets of F. For an arbitrary interval [s,t] ⊂ [0,T] we denote by F¹_{[s,t]} the σ-algebra generated by N and by the random variables W¹_r − W¹_s, r ∈ [s,t]. We set F¹_t = F¹_{[0,t]} and call F¹_t, t ∈ [0,T], the filtration generated by the Wiener process W¹; it is well known that F¹_t is right-continuous. We denote by E^{F¹_t} the conditional expectation with respect to F¹_t.

For t ∈ [0,T] and x ∈ H let us consider the equation:

dX_τ = AX_τ dτ + F(τ,X_τ) dτ + G(τ,X_τ) dW¹_τ, τ ∈ [t,T], X_t = x.   (3.1)

We assume that A, F, G satisfy the assumptions of Hypothesis 2.2. By a mild solution of equation (3.1) we mean an F¹_t-predictable process X_τ, τ ∈ [t,T], with continuous paths in H such that, P-a.s.,

X_τ = e^{(τ−t)A}x + ∫_t^τ e^{(τ−σ)A} F(σ,X_σ) dσ + ∫_t^τ e^{(τ−σ)A} G(σ,X_σ) dW¹_σ, τ ∈ [t,T].   (3.2)

We have the following result:

Proposition 3.1 Under the assumptions of Hypothesis 2.2, there exists a unique process X solution of (3.2). Moreover, for every p ∈ (2,∞),

E sup_{τ∈[t,T]} |X_τ|^p ≤ C(1 + |x|)^p,   (3.3)

for some constant C depending only on p, γ, T, L and on sup_{t∈[0,T]} |e^{tA}|.

This result is known: see e.g. [7], Theorem 5.3.1, or [15], Proposition 3.2. An outline of the proof was given in Step 1 of the proof of Proposition 2.3; the estimate (3.3) coincides with (2.6). We denote by X(τ,t,x), τ ∈ [t,T], the solution, to stress the dependence on the parameters t and x.

Most of the results in this section do not depend on the equation used to construct the process X. We collect below those well-known properties enjoyed by the process X that are relevant in what follows.

(1) (Ω,F,P) is a complete probability space and W¹_t, t ∈ [0,T], is a cylindrical Wiener process with values in a Hilbert space U. We define F¹_{[s,t]} and F¹_t as before.

(2) X = X(ω,τ,t,x), ω ∈ Ω, 0 ≤ t ≤ τ ≤ T, x ∈ H, is a stochastic process with values in a Hilbert space H, measurable with respect to F × B(∆) × B(H) and B(H) respectively (here by ∆ we denote the set {(t,τ) : 0 ≤ t ≤ τ ≤ T} and by B(Λ) the Borel σ-algebra of any topological space Λ).

(3) For every t ∈ [0,T] and x ∈ H, the process X(τ,t,x), τ ∈ [t,T], has continuous paths and is adapted to the filtration F¹_{[t,τ]}, τ ∈ [t,T].


(4) For 0 ≤ t ≤ s ≤ T and x ∈ H we have, P-a.s.,

X(t,t,x) = x, X(τ,s,X(s,t,x)) = X(τ,t,x), τ ∈ [s,T].   (3.4)

We recall that, for any z ∈ U*, we denote by z* the element of U corresponding to z by the Riesz isometry U* → U.

Proposition 3.2 Assume the properties (1)–(4) above. Suppose that:

(1) v = v(ω,τ,t,x), ω ∈ Ω, 0 ≤ t ≤ τ ≤ T, x ∈ H, is a stochastic process with values in U*, measurable with respect to F × B(∆) × B(H) and B(U*) respectively.

(2) For every t ∈ [0,T] and x ∈ H, the process v(τ,t,x), τ ∈ [t,T], is predictable with respect to the filtration F¹_{[t,τ]}, τ ∈ [t,T].

(3) For 0 ≤ t ≤ s ≤ T and x ∈ H we have, P-a.s.,

v(τ,s,X(s,t,x)) = v(τ,t,x), for a.a. τ ∈ [s,T].   (3.5)

Then there exists a Borel measurable function ζ : [0,T] × H → U such that, for t ∈ [0,T] and x ∈ H, we have P-a.s.

v(τ,t,x)* = ζ(τ,X(τ,t,x)), for a.a. τ ∈ [t,T].   (3.6)

Proof. Let {e_i} be a basis of U and let us define v^{i,N} = ((v e_i) ∧ N) ∨ (−N). Then v^{i,N} also satisfies the assumptions of the proposition, with U* replaced by ℝ, and moreover v^{i,N} is bounded. Let us define

ζ^{i,N}(t,x) = liminf_{n→∞} n ∫_t^{t+1/n} E v^{i,N}(τ,t,x) dτ, t ∈ [0,T], x ∈ H.

Clearly ζ^{i,N} : [0,T] × H → ℝ is a Borel function. We fix x and 0 ≤ t ≤ s ≤ T. For τ ∈ [s,T] we denote by E[v^{i,N}(τ,s,y)]|_{y=X(s,t,x)} the random variable obtained by composing X(s,t,x) with the map y ↦ E[v^{i,N}(τ,s,y)]. Since v^{i,N}(τ,s,y) is F¹_{[s,τ]}-measurable and X(s,t,x) is F¹_s-measurable, and these σ-algebras are independent, we obtain

E^{F¹_s} v^{i,N}(τ,s,X(s,t,x)) = E[v^{i,N}(τ,s,y)]|_{y=X(s,t,x)}, P-a.s.,

and by (3.5) it follows that

E^{F¹_s} v^{i,N}(τ,t,x) = E[v^{i,N}(τ,t,y)]|_{y=X(s,t,x)}, P-a.s. for a.a. τ ∈ [s,T].

An application of the Fubini theorem shows that ∫_s^{s+1/n} E[v^{i,N}(τ,t,y)]|_{y=X(s,t,x)} dτ is a version of the conditional expectation of ∫_s^{s+1/n} v^{i,N}(τ,t,x) dτ given F¹_s. We conclude that

ζ^{i,N}(s,X(s,t,x)) = liminf_{n→∞} n E^{F¹_s} ∫_s^{s+1/n} v^{i,N}(τ,t,x) dτ, P-a.s.

Now we fix x and t. Recalling that v^{i,N} is bounded, we note that P-a.s. the equality

lim_{n→∞} n ∫_s^{s+1/n} v^{i,N}(τ,t,x) dτ = v^{i,N}(s,t,x)   (3.7)


holds for almost all s ∈ [t,T], by the Lebesgue theorem on differentiation. Thus, for almost all s ∈ [t,T], (3.7) holds P-a.s. and therefore in the L¹(Ω,P)-norm, again by the boundedness of v^{i,N}. It follows that ζ^{i,N}(s,X(s,t,x)) = E^{F¹_s} v^{i,N}(s,t,x) = v^{i,N}(s,t,x), P-a.s.

So far we have proved that, for every x, t,

ζ^{i,N}(τ,X(τ,t,x)) = v^{i,N}(τ,t,x), P-a.s. for a.a. τ ∈ [t,T],   (3.8)

for every i, N. Now let C ⊂ [0,T] × H denote the set of pairs (t,x) such that lim_{N→∞} ζ^{i,N}(t,x) exists and the series ∑_{i=1}^∞ (lim_{N→∞} ζ^{i,N}(t,x)) e_i converges in U. Let us define

ζ(t,x) = ∑_{i=1}^∞ (lim_{N→∞} ζ^{i,N}(t,x)) e_i, (t,x) ∈ C; ζ(t,x) = 0, (t,x) ∉ C.

Since the process v satisfies

v(ω,τ,t,x)* = ∑_{i=1}^∞ (lim_{N→∞} v^{i,N}(ω,τ,t,x)) e_i,

for every ω, τ, t, x, it follows from (3.8) that for every x, t we have (τ,X(τ,t,x)) ∈ C P-a.s. for almost all τ ∈ [t,T], and that (3.6) holds.

Remark 3.3 It follows from the proof of Proposition 3.2, in particular the definition of ζ^{i,N}, that the function ζ constructed above is uniquely determined by the law of v, or even by the law of ∫_t^τ v^{i,N}(s,t,x) ds, τ ∈ [t,T].

3.2 A backward equation and the feedback law.

We are still aiming at defining the function ζ that will eventually turn out to be the optimal feedback law. This is done by means of an auxiliary backward equation that we now introduce.

We assume given Ω, F, P, W¹_t, X(τ,t,x), satisfying the properties (1)–(4) listed above. We recall the following well-known representation theorem (see e.g. [30], [29]), which is the starting point for many basic results on backward equations.

Proposition 3.4 For arbitrary F¹_T-measurable η : Ω → ℝ satisfying E|η|² < ∞ there exists an F¹_t-predictable process z in U* such that E ∫_0^T |z_σ|² dσ < ∞ and η = E η + ∫_0^T z_σ dW¹_σ, P-a.s. In particular, for every τ ∈ [0,T],

E^{F¹_τ} η = η − ∫_τ^T z_σ dW¹_σ, P-a.s.

We remark that the Hilbert space U*, where z takes values, coincides with L₂(U,ℝ). Proposition 3.4 immediately implies solvability of the following backward equation of particularly simple form: given η as in the proposition, we look for a pair of F¹_t-predictable processes (y,z) in ℝ × U* satisfying, for almost every τ ∈ [0,T], P-a.s.,

y_τ + ∫_τ^T z_σ dW¹_σ = η.   (3.9)

Clearly, setting

y_τ = E^{F¹_τ} η,   (3.10)


one has the required solution. We also note that the process y is a martingale and has a continuous version; we will always consider this version, and we can assume that, P-a.s., equality (3.9) holds for every τ ∈ [0,T]. It is also easy to prove that y is unique up to indistinguishability and z is unique up to modification. Equation (3.9) will be written in the differential form

dy_τ = z_τ dW¹_τ, y_T = η.

We will study the following special case. For t ∈ [0,T] and x ∈ H, P-a.s.,

dy_τ = z_τ dW¹_τ, τ ∈ [t,T], y_T = φ(X(T,t,x)) exp(−∫_t^T q(σ,X(σ,t,x)) dσ).   (3.11)

On the functions φ and q we make the following assumption:

φ : H → [0,∞) and q : [0,T] × H → [0,∞] are measurable and φ is bounded.   (3.12)

Note that we allow infinite values for q; however, adopting the obvious convention exp(−∞) = 0, the right-hand side of (3.11) is well defined and even bounded. Consequently the solution (y,z) exists and y is a nonnegative bounded process, by formula (3.10).

We will denote by y(τ,t,x), z(τ,t,x), τ ∈ [t,T], the solution, in order to stress the dependence on the parameters t ∈ [0,T] and x ∈ H.

Let us define

χ(t,x) = E[ φ(X(T,t,x)) exp(−∫_t^T q(σ,X(σ,t,x)) dσ) ].   (3.13)

Clearly, χ(t,x) is the same as y(t,t,x) and depends only on q, φ and the law of X.

Remark 3.5 In particular, if X is defined as the solution of equation (3.1) then χ is determined by the operator A and the functions F, G, q, φ.

By the Markov property of X, for τ ∈ [t,T],

y(τ,t,x) = χ(τ,X(τ,t,x)) exp(−∫_t^τ q(σ,X(σ,t,x)) dσ), P-a.s.   (3.14)
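To make (3.13)–(3.14) concrete, here is a Monte-Carlo sketch of χ(t,x) for an illustrative one-dimensional model; the choices A = −1, F = 0, G = 1, q(σ,x) = x², φ ≡ 1 are toy assumptions, not taken from the paper.

```python
import numpy as np

# Monte-Carlo sketch of chi(t,x) = E[ phi(X(T,t,x)) exp(-int_t^T q(s,X_s) ds) ]
# from (3.13), for a toy 1-d model: dX = -X ds + dW, q(s,x) = x^2, phi = 1.
rng = np.random.default_rng(1)
T, n, paths = 1.0, 100, 20000
dt = T / n
x0 = 0.5

X = np.full(paths, x0)
int_q = np.zeros(paths)                 # accumulates int_t^T q(s, X_s) ds
for _ in range(n):
    int_q += X**2 * dt
    X += -X * dt + np.sqrt(dt) * rng.normal(size=paths)

chi = float(np.mean(np.exp(-int_q)))    # estimate of chi(0, x0)
```

Since q ≥ 0 and φ ≤ 1 in this toy model, the estimate necessarily lies in (0,1), consistent with the bound 0 ≤ y ≤ 1 exploited later.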

For arbitrary ξ ∈ U let us consider the real Wiener process W¹ξ = {W¹_t ξ, t ∈ [0,T]} and its joint quadratic variation with y(·,t,x) on the interval [t,τ], which we denote ⟨y(·,t,x), W¹ξ⟩_{[t,τ]}. It follows directly from the backward equation that ⟨y(·,t,x), W¹ξ⟩_{[t,τ]} = ∫_t^τ z_s ξ ds. Consequently, denoting by ∆ a subdivision t = t₀ < t₁ < … < t_n = τ, we have

∫_t^τ z_s ξ ds = lim ∑_{i=1}^n (y(t_i,t,x) − y(t_{i−1},t,x))(W¹_{t_i}ξ − W¹_{t_{i−1}}ξ),   (3.15)

in probability, as sup_i (t_i − t_{i−1}) → 0. From (3.13), (3.14), (3.15) it is easy to deduce that the law of (y(τ,t,x), ∫_t^τ z(s,t,x) ds) is uniquely determined by the law of (X(τ,t,x), W¹_τ − W¹_t).

Lemma 3.6 Assume that q is bounded. Then for 0 ≤ t ≤ s ≤ T we have, P-a.s.,

y(τ,s,X(s,t,x)) = exp(∫_t^s q(σ,X(σ,t,x)) dσ) y(τ,t,x), for τ ∈ [s,T],

z(τ,s,X(s,t,x)) = exp(∫_t^s q(σ,X(σ,t,x)) dσ) z(τ,t,x), for a.a. τ ∈ [s,T].


Proof. We set X_τ = X(τ,t,x) for short. We define, for τ ∈ [s,T],

y′_τ = exp(∫_t^s q(σ,X_σ) dσ) y(τ,t,x), z′_τ = exp(∫_t^s q(σ,X_σ) dσ) z(τ,t,x).

Clearly, dy′_τ = z′_τ dW¹_τ on [s,T]. Next we note that the pair of processes (y(·,s,X_s), z(·,s,X_s)) also solves the backward equation on [s,T]. Finally, we note that

y′_T = φ(X_T) exp(−∫_s^T q(σ,X_σ) dσ)

and so y′_T coincides with

y(T,s,X_s) = φ(X(T,s,X_s)) exp(−∫_s^T q(σ,X(σ,s,X_s)) dσ)

by (3.4). We have shown that the processes (y(·,s,X_s), z(·,s,X_s)) and (y′,z′) are solutions of the backward equation on [s,T] with the same terminal condition. The conclusion of the lemma follows from the uniqueness property.

Proposition 3.7 Assume that the properties (1)–(4) listed above and condition (3.12) hold. Then there exists a measurable function ζ : [0,T] × H → U such that, for every t ∈ [0,T], x ∈ H, we have P-a.s.,

z(τ,t,x)* = y(τ,t,x) ζ(τ,X(τ,t,x)) for a.a. τ ∈ [t,T],   (3.16)

where (y,z) is the solution to (3.11). Moreover, ζ can be chosen to satisfy the requirement that ζ(t,x) = 0 for all (t,x) such that χ(t,x) = 0.

Proof. First we assume that q is bounded and φ ≥ ε for some ε > 0. We then have y(τ,t,x) ≥ ε and, setting v(τ,t,x) = z(τ,t,x)/y(τ,t,x), it follows from Lemma 3.6 that for every t ∈ [0,T] and x ∈ H,

v(τ,s,X(s,t,x)) = v(τ,t,x), P-a.s. for a.a. τ ∈ [t,T].

The existence of the required function ζ follows from Proposition 3.2.

Now we remove the restriction on q and φ. We define φ_n = φ + 1/n, q_n = q ∧ n; we note that φ_n ≥ 1/n and q_n is bounded, and therefore there exist corresponding measurable functions ζ_n : [0,T] × H → U satisfying the assertions of the proposition. Thus, if we fix t and x and set X_τ = X(τ,t,x), y_τ = y(τ,t,x), z_τ = z(τ,t,x) for short, and if we denote by (y^n, z^n) the solutions to

dy^n_τ = z^n_τ dW¹_τ, τ ∈ [t,T], y^n_T = [φ(X_T) + 1/n] exp(−∫_t^T [q(σ,X_σ) ∧ n] dσ), n = 1, 2, …,

then we have

(z^n_τ)* = y^n_τ ζ_n(τ,X_τ), P-a.s. for a.a. τ ∈ [t,T].

We let C = {(t,x) : lim_n ζ_n(t,x) exists} and we define ζ(t,x) = lim_n ζ_n(t,x) for (t,x) ∈ C, ζ(t,x) = 0 for (t,x) ∉ C. Since the sequence (y^n_T) is uniformly bounded and converges to y_T, and since the processes y^n are martingales, it follows that one can extract a subsequence such that, P-a.s., y^n_τ → y_τ uniformly on [t,T]. Similarly, since E ∫_t^T |z^n_τ − z_τ|² dτ → 0, one can find a subsequence such that, P-a.s., z^n_τ → z_τ for a.a. τ ∈ [t,T].


Next we define the stopping time R = inf{τ ∈ [t,T] : y_τ = 0}, with the usual convention that R = T if this set is empty. Since y is a nonnegative continuous martingale, it is well known that, P-a.s., y vanishes on (R,T] (see e.g. [33], Proposition II.3.4). Thus, P-a.s., y_τ = y_{τ∧R} = y_τ 1_{[t,R]}(τ) for τ ∈ [t,R]. It follows that dy_τ = dy_{τ∧R} = z_τ 1_{[t,R]}(τ) dW¹_τ, and so the process z 1_{[t,R]} is also a solution of the backward equation. From uniqueness we conclude that z 1_{[t,R]} = z, up to modification.

Now, P-a.s. for a.a. τ, if τ ∈ [t,R) then y_τ > 0, and consequently y^n_τ > 0 for large n, (τ,X_τ) ∈ C and ζ_n(τ,X_τ) → z*_τ/y_τ. So, passing to the limit as n → ∞, we obtain, P-a.s.,

z*_τ 1_{[t,R)}(τ) = y_τ 1_{[t,R)}(τ) ζ(τ,X_τ), for a.a. τ ∈ [t,T],

which is equivalent to z*_τ = y_τ ζ(τ,X_τ) for a.a. τ ∈ [t,T]. The proposition is now proved, except for the last assertion.

Let us denote by N the Borel set in [0,T] × H where χ = 0. If (τ,X(τ,t,x)) ∈ N then y(τ,t,x) = 0 by equality (3.14) and therefore also z(τ,t,x) = 0 by (3.16). It follows that if we modify the definition of ζ, setting ζ = 0 on N, then equality (3.16) remains true. After the indicated modification the function ζ satisfies all the requirements of the proposition.

We recall that the law of (y(τ,t,x), ∫_t^τ z(s,t,x) ds) is uniquely determined by the law of (X(τ,t,x), W¹_τ − W¹_t). It follows from Remark 3.3 and the proof of Proposition 3.7 that the function ζ depends only on q, φ and the law of (X, W¹).

Remark 3.8 In particular, if X is defined as the solution of equation (3.1) then ζ is determined by the operator A and the functions F, G, q, φ.

4 The fundamental relation for the optimal control problem

We come back to the control problem: we assume that Hypotheses 2.1 and 2.2 hold, (Ω,F,P), F_t and W are given as in section 2, and we try to minimize the cost J(u) in (2.2) for the state equation (2.1) over all admissible controls.

We denote by X(τ,t,x), x ∈ H, 0 ≤ t ≤ τ ≤ T, the uncontrolled process, solution to (3.1). Still adopting the convention exp(−∞) = 0 we define φ = exp(−r) and

χ(t,x) = E exp(−r(X(T,t,x)) − ∫_t^T q(σ,X(σ,t,x)) dσ), t ∈ [0,T], x ∈ H,   (4.1)

in agreement with formula (3.13). We denote by y(τ,t,x), z(τ,t,x), x ∈ H, 0 ≤ t ≤ τ ≤ T, the solution of equation (3.11) with φ = exp(−r). Finally, we denote by ζ : [0,T] × H → U the function constructed in Proposition 3.7.

As noticed in Remarks 3.5 and 3.8, the functions χ and ζ are determined by the operator A and the functions F, G, q, r.

Remark 4.1 In some statements below we will impose the requirement χ(0,x) > 0. This condition can be interpreted as follows. Let X⁰ denote the trajectory of the control system starting from x at time 0, corresponding to the null control u = 0. X⁰ has the same law as X(·,0,x), so in particular

χ(0,x) = E exp(−r(X⁰_T) − ∫_0^T q(σ,X⁰_σ) dσ).


It follows that χ(0,x) is strictly positive if and only if the trajectory X⁰ satisfies

∫_0^T q(σ,X⁰_σ) dσ < ∞, r(X⁰_T) < ∞,

with strictly positive probability.

With this notation we can now state the main result of this section:

Theorem 4.2 We assume that Hypotheses 2.1 and 2.2 hold and that

χ(0,x) > 0.   (4.2)

Then for every admissible control u satisfying J(u) < ∞ we have

J(u) = −log χ(0,x) + ½ E ∫_0^T |ζ(t,X^u_t) − u_t|² dt.   (4.3)

Consequently, we have J(u) ≥ −log χ(0,x), and equality holds if and only if the following feedback law is verified by u and X^u:

u_t = ζ(t,X^u_t), P-a.s. for a.a. t ∈ [0,T].   (4.4)

Equality (4.3) is known as the fundamental relation. We note that it implies all the remaining assertions of the theorem. The following corollary is also an immediate consequence.

Corollary 4.3 Under the assumptions of Theorem 4.2, if the closed-loop equation

dX_t = AX_t dt + F(t,X_t) dt + G(t,X_t) ζ(t,X_t) dt + G(t,X_t) dW_t, t ∈ [0,T], X_0 = x,

admits a mild solution such that, defining u_t := ζ(t,X_t), P-a.s. for a.a. t ∈ [0,T], we have

E ∫_0^T |u_t|² dt < ∞ and J(u) < ∞,

then the pair (u,X) is optimal for the control problem.

This corollary is not very useful as it stands, since we are not able to give conditions ensuring solvability of the closed-loop equation in the mild sense, due to the lack of regularity of the feedback law ζ. However, in the following section, we will show that the closed-loop equation admits a weak solution. After an appropriate reformulation of the optimal control problem this will lead to existence of the optimal control and the validity of the feedback law for an optimal pair.

We conclude this section with the proof of Theorem 4.2.

Proof. It is enough to prove (4.3). Let us define the stopping times T_n = inf{t ∈ [0,T] : ∫_0^t |u_s|² ds > n}, with the convention that T_n = T if this set is empty. Let us set u^n := u 1_{[0,T_n]} and let us denote by X^n the trajectory of the control system corresponding to the control u^n. Then we have X^n 1_{[0,T_n]} = X^u 1_{[0,T_n]}.

Let us define a measure P¹ on F_T, equivalent to P, setting

dP¹/dP = exp(−∫_0^T (u^n_t)* dW_t − ½ ∫_0^T |u^n_t|² dt).


Here by u* we denote the element of U* corresponding to u ∈ U by the Riesz isometry. P¹ clearly depends on u and n, but we will omit this dependence in the notation. Since ∫_0^T |u^n_s|² ds ≤ n, the Novikov condition

E exp(½ ∫_0^T |u^n_t|² dt) < ∞

clearly holds, so P¹ is in fact a probability measure. By the Girsanov theorem, the process

W¹_t = W_t + ∫_0^t u^n_s ds, t ∈ [0,T],

is a P¹-cylindrical Wiener process with respect to F_t. Now we consider the (complete) probability space (Ω, F_T, P¹), we denote by N the family of its P¹-null sets and, for an arbitrary interval [s,t] ⊂ [0,T], we denote by F¹_{[s,t]} the σ-algebra generated by N and by the random variables W¹_r − W¹_s, r ∈ [s,t]. Finally, we set F¹_t = F¹_{[0,t]}. X^n is the mild solution of

dX^n_t = AX^n_t dt + F(t,X^n_t) dt + G(t,X^n_t) dW¹_t, t ∈ [0,T], X^n_0 = x,

which is an instance of equation (3.1), the equation satisfied by the uncontrolled process, with t = 0.
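The Girsanov step can be sanity-checked numerically: for a bounded control the density exp(−∫u*dW − ½∫|u|²dt) averages to 1 under P, which is what makes P¹ a probability measure. The following sketch uses an illustrative one-dimensional setting with a constant control (toy assumptions, not from the paper).

```python
import numpy as np

# Monte-Carlo check that rho = exp(-int_0^T u dW - 1/2 int_0^T u^2 dt) has
# P-expectation 1 for a bounded deterministic control (toy 1-d setting).
rng = np.random.default_rng(2)
T, n, paths = 1.0, 50, 100000
dt = T / n
u = 0.7                                  # constant control: Novikov is trivial

dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
stoch_int = u * dW.sum(axis=1)           # int_0^T u dW for constant u
density = np.exp(-stoch_int - 0.5 * u**2 * T)
mean_density = float(np.mean(density))
```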

We consider the backward equation: P¹-a.s.,

dy_τ = z_τ dW¹_τ, τ ∈ [0,T], y_T = exp(−r(X^n_T) − ∫_0^T q(σ,X^n_σ) dσ),   (4.5)

which is an instance of equation (3.11) with φ = exp(−r) and t = 0. The results of the previous section apply here: in particular, for all t ∈ [0,T],

y_t = χ(t,X^n_t) exp(−∫_0^t q(σ,X^n_σ) dσ), P¹-a.s.,   (4.6)

by (3.14), and

z*_t = y_t ζ(t,X^n_t), P¹-a.s. for a.a. t ∈ [0,T],   (4.7)

by Proposition 3.7. The indicated equalities also hold P-a.s.

We recall that y is a P¹-martingale, and since 0 ≤ y_T ≤ 1 it follows that 0 ≤ y_t ≤ 1 for all t ∈ [0,T]. Applying the Itô formula we obtain, for all ε > 0,

d log(y_t + ε) = z_t/(y_t + ε) dW¹_t − ½ |z_t/(y_t + ε)|² dt,

or, in terms of the process W,

log(y_t + ε) − log(y_0 + ε) = ∫_0^t z_s/(y_s + ε) dW_s + ∫_0^t z_s/(y_s + ε) u^n_s ds − ½ ∫_0^t |z_s/(y_s + ε)|² ds, t ∈ [0,T].   (4.8)
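The Itô computation behind these displays is elementary; with f(y) = log(y + ε) and dy_t = z_t dW¹_t it reads:

```latex
% f(y) = \log(y+\varepsilon):\quad f'(y) = (y+\varepsilon)^{-1},\quad
% f''(y) = -(y+\varepsilon)^{-2},\quad d\langle y\rangle_t = |z_t|^2\,dt
d\log(y_t+\varepsilon)
 = \frac{dy_t}{y_t+\varepsilon}
   - \frac{1}{2}\,\frac{d\langle y\rangle_t}{(y_t+\varepsilon)^2}
 = \frac{z_t}{y_t+\varepsilon}\,dW^1_t
   - \frac{1}{2}\left|\frac{z_t}{y_t+\varepsilon}\right|^2 dt .
```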

We claim that ∫_0^t z_s/(y_s + ε) dW_s is a P-martingale with respect to F_t. Let p ∈ (2,∞) be arbitrary. First we note that E¹ (∫_0^T |z_t|² dt)^p < ∞, as follows from an application of the Burkholder–Davis–Gundy inequalities to the bounded P¹-martingale y. Next, denoting by ρ the density dP/dP¹, we have

ρ = exp(∫_0^T (u^n_t)* dW¹_t − ½ ∫_0^T |u^n_t|² dt)


and since ∫_0^T |u^n_t|² dt ≤ n we obtain

E ρ^p = E exp(∫_0^T (p u^n_t)* dW¹_t − ½ ∫_0^T |p u^n_t|² dt + p(p−1)/2 ∫_0^T |u^n_t|² dt)
  ≤ E exp(∫_0^T (p u^n_t)* dW¹_t − ½ ∫_0^T |p u^n_t|² dt) exp(n p(p−1)/2),

which implies E ρ^p ≤ exp(n p(p−1)/2) < ∞. It follows that

E ∫_0^t |z_s/(y_s + ε)|² ds ≤ (1/ε²) E ∫_0^T |z_s|² ds = (1/ε²) E¹[ρ ∫_0^T |z_s|² ds] < ∞,

which proves the claim.

Stopping the processes in (4.8) at T_n and taking expectation, we obtain

E log(y_{T_n} + ε) − E log(y_0 + ε) = E ∫_0^{T_n} z_t/(y_t + ε) u^n_t dt − ½ E ∫_0^{T_n} |z_t/(y_t + ε)|² dt
  = ½ E ∫_0^{T_n} |u^n_t|² dt − ½ E ∫_0^{T_n} |z*_t/(y_t + ε) − u^n_t|² dt.
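The second equality is the pointwise completing-the-square identity, applied with a_t = z_t/(y_t + ε):

```latex
a_t\,u^n_t \;-\; \tfrac12\,|a_t|^2
 \;=\; \tfrac12\,|u^n_t|^2 \;-\; \tfrac12\,\bigl|a_t^{\,*}-u^n_t\bigr|^2,
\qquad a_t=\frac{z_t}{y_t+\varepsilon},
```

integrated in t over [0,T_n] and in ω; the stochastic-integral term in (4.8) disappears after taking expectation, by the martingale property just established.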

Since X^n_t = X^u_t and u^n_t = u_t for t ≤ T_n we have, recalling (4.6), (4.7),

E log( χ(T_n,X^u_{T_n}) exp(−∫_0^{T_n} q(σ,X^u_σ) dσ) + ε ) − log(χ(0,x) + ε)
  = ½ E ∫_0^{T_n} |u_t|² dt − ½ E ∫_0^{T_n} | ζ(t,X^u_t) χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) / [χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) + ε] − u_t |² dt.

Now we let n → ∞. Since we are assuming that u is admissible then, for P-almost all ω, there exists n(ω) such that T_n(ω) = T for n > n(ω). So the left-hand side converges by the dominated convergence theorem, since 0 ≤ χ ≤ 1. Convergence of the right-hand side holds by monotone convergence. Recalling that χ(T,x) = exp(−r(x)) we obtain

E log( exp(−r(X^u_T) − ∫_0^T q(σ,X^u_σ) dσ) + ε ) − log(χ(0,x) + ε)
  = ½ E ∫_0^T |u_t|² dt − ½ E ∫_0^T | ζ(t,X^u_t) χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) / [χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) + ε] − u_t |² dt.   (4.9)

Now we let ε ↓ 0. We first note that

−log(1 + ε) ≤ −log( exp(−r(X^u_T) − ∫_0^T q(σ,X^u_σ) dσ) + ε ) ↑ r(X^u_T) + ∫_0^T q(σ,X^u_σ) dσ;

this implies, by monotone convergence,

−E log( exp(−r(X^u_T) − ∫_0^T q(σ,X^u_σ) dσ) + ε ) → E[ r(X^u_T) + ∫_0^T q(σ,X^u_σ) dσ ],

which is finite, since we are assuming that J(u) < ∞. Since we also assume that χ(0,x) > 0, the left-hand side of (4.9) tends to a finite limit, and obviously so does the right-hand side.


Now we note that, P-a.s., ∫_0^T q(σ,X^u_σ) dσ < ∞ and therefore exp(−∫_0^T q(σ,X^u_σ) dσ) > 0. Next we recall that ζ = 0 on the set where χ = 0, as stated in Proposition 3.7. It follows that

lim_{ε→0} | ζ(t,X^u_t) χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) / [χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) + ε] − u_t | = |ζ(t,X^u_t) − u_t|,

and, by the Fatou lemma,

E ∫_0^T |ζ(t,X^u_t) − u_t|² dt ≤ liminf_{ε→0} E ∫_0^T | ζ(t,X^u_t) χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) / [χ(t,X^u_t) exp(−∫_0^t q(σ,X^u_σ) dσ) + ε] − u_t |² dt.

Since the right-hand side of (4.9) tends to a finite limit we conclude that E ∫_0^T |ζ(t,X^u_t) − u_t|² dt, and consequently E ∫_0^T |ζ(t,X^u_t)|² dt, are finite. This allows us to pass to the limit in (4.9) and finally gives

−E[ r(X^u_T) + ∫_0^T q(σ,X^u_σ) dσ ] − log χ(0,x) = ½ E ∫_0^T |u_t|² dt − ½ E ∫_0^T |ζ(t,X^u_t) − u_t|² dt,

which coincides with (4.3).
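As a numerical illustration of Theorem 4.2, the bound J(u) ≥ −log χ(0,x) can be checked for the null control u = 0, where it reduces to Jensen's inequality −log E exp(−I) ≤ E I with I = ∫_0^T q(σ,X⁰_σ) dσ + r(X⁰_T). The model below (A = −1, F = 0, G = 1, q(σ,x) = x², r = 0) is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Check of J(0) >= -log chi(0,x) (Theorem 4.2 with u = 0) in a toy 1-d model:
# dX = -X ds + dW, q(s,x) = x^2, r = 0.  For u = 0 the bound is exactly
# Jensen's inequality, so it also holds for the empirical (Monte-Carlo) measure.
rng = np.random.default_rng(3)
T, n, paths = 1.0, 100, 10000
dt = T / n
X = np.full(paths, 0.5)
I = np.zeros(paths)                      # int_0^T q(s, X_s) ds, path by path
for _ in range(n):
    I += X**2 * dt
    X += -X * dt + np.sqrt(dt) * rng.normal(size=paths)

J0 = float(np.mean(I))                       # cost of the null control (r = 0)
value = float(-np.log(np.mean(np.exp(-I))))  # Monte-Carlo -log chi(0, x)
```

Because the inequality is an exact convexity statement, it holds for the sampled paths themselves, not only in the limit of many samples.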

5 The optimal control problem: weak formulation

We now reformulate the optimal control problem in the weak sense, following the approach of [14]. The main advantage is that we will be able to solve the closed-loop equation, and hence to find an optimal control, although the feedback law ζ is non-smooth.

Again H, U denote real separable Hilbert spaces. We assume we are given A, F, G, q, r satisfying Hypotheses 2.1 and 2.2. We call (Ω,F,{F_t},P,W) an admissible set-up, or simply a set-up, if (Ω,F,P) is a complete probability space with a right-continuous and P-complete filtration {F_t, t ∈ [0,T]}, and {W_t, t ∈ [0,T]} is a cylindrical P-Wiener process with values in U, with respect to the filtration {F_t}.

An admissible control system (a.c.s.) is defined as (Ω,F,{F_t},P,W,u,X^u,x) where:

• (Ω,F,{F_t},P,W) is an admissible set-up;

• {u_t, t ∈ [0,T]} is an F_t-predictable process with values in U, satisfying E ∫_0^T |u_t|² dt < ∞;

• {X^u_t, t ∈ [0,T]} is an F_t-adapted continuous process with values in H, mild solution of the state equation (2.1) with initial condition X^u_0 = x.

By Proposition 2.3, on an arbitrary set-up the process X^u is uniquely determined by u and x, up to indistinguishability. To every a.c.s. we associate the cost:

J = E ∫_0^T [ ½|u_t|² + q(t,X^u_t) ] dt + E r(X^u_T).   (5.1)

Although (5.1) formally coincides with (2.2), it is important to note that J is a functional of the a.c.s., and not a functional of u alone. Our purpose is to minimize the functional J over all a.c.s. with a given initial condition x ∈ H.


Now recall the definition of the functions χ and ζ: see (4.1) and Proposition 3.7 respectively. As already noticed several times, χ and ζ are determined by A, F, G, q, r, and so they are the same for all a.c.s.

Our main result, Theorem 5.2 below, is based on the solvability of the closed-loop equation

dX_t = AX_t dt + F(t,X_t) dt + G(t,X_t) ζ(t,X_t) dt + G(t,X_t) dW_t, t ∈ [0,T], X_0 = x,   (5.2)

in the following sense: we say that X is a weak solution of (5.2) with given initial condition x ∈ H if there exist an admissible set-up (Ω,F,{F_t},P,W) and an F_t-adapted continuous process X with values in H, satisfying E ∫_0^T |ζ(t,X_t)|² dt < ∞, which solves the equation in the mild sense, namely: P-a.s.,

X_t = e^{tA}x + ∫_0^t e^{(t−σ)A} F(σ,X_σ) dσ + ∫_0^t e^{(t−σ)A} G(σ,X_σ) ζ(σ,X_σ) dσ + ∫_0^t e^{(t−σ)A} G(σ,X_σ) dW_σ, t ∈ [0,T].

Let us denote by W_H the space of continuous functions w : [0,T] → H, endowed with the usual topology of uniform convergence and the corresponding Borel σ-algebra. By the law of X we understand as usual the image measure of P under the mapping Ω → W_H given by ω ↦ X(ω) = (X_t(ω))_{t∈[0,T]}.

Proposition 5.1 We assume that Hypotheses 2.1 and 2.2 hold.

If X and Y are weak solutions (possibly on different set-ups) of the closed-loop equation, with the same initial condition x ∈ H, then the laws of X and Y coincide.

If the initial condition x ∈ H satisfies χ(0,x) > 0, then the closed-loop equation has a weak solution satisfying

E ∫_0^T [ |ζ(t,X_t)|² + q(t,X_t) ] dt + E r(X_T) < ∞.   (5.3)

We postpone the proof and first state the main result of this section:

Theorem 5.2 We assume that Hypotheses 2.1 and 2.2 hold and that x ∈ H satisfies χ(0,x) > 0. Then the infimum of J over all a.c.s. with initial condition x is equal to −log χ(0,x). Moreover there exists an a.c.s. (Ω,F,{F_t},P,W,u,X^u,x) for which J = −log χ(0,x) and the feedback law

u_t = ζ(t,X^u_t), P-a.s. for a.a. t ∈ [0,T],

is verified by u and X^u. Consequently, the optimal trajectory X^u is a weak solution of the closed-loop equation and its law is uniquely determined.

Proof. Let X be a weak solution of the closed-loop equation in an admissible set-up (Ω,F,{F_t},P,W), satisfying (5.3). Then, defining u_t := ζ(t,X_t), P-a.s. for a.a. t ∈ [0,T], we have E ∫_0^T |u_t|² dt < ∞, J < ∞, X = X^u, and all the required conclusions follow from Theorem 4.2 and Corollary 4.3.

Proof of Proposition 5.1. We first prove uniqueness. Assume that X is a weak solution of (5.2) on an admissible set-up (Ω,F,{F_t},P,W). For w ∈ W_H let us define

τ_n(w) = inf{t ∈ [0,T] : ∫_0^t |ζ(σ,w_σ)|² dσ ≥ n},


with the convention that τ_n(w) = T if the indicated set is empty. We denote by T_n the F_t-stopping time

    T_n = τ_n(X) = inf{ t ∈ [0, T] : ∫_0^t |ζ(σ, X_σ)|² dσ ≥ n },

and we use the notation X^{T_n,A}_t = e^{(t−T_n)^+ A} X_{t∧T_n}, introduced in Section 2. By Corollary 2.5 the process X^{T_n,A} is a mild solution of

    dX^{T_n,A}_t = A X^{T_n,A}_t dt + F(t, X^{T_n,A}_t) 1_{[0,T_n]}(t) dt + G(t, X^{T_n,A}_t) 1_{[0,T_n]}(t) [ ζ(t, X^{T_n,A}_t) 1_{[0,T_n]}(t) dt + dW_t ],   t ∈ [0, T].

We note that the process ζ(·, X^{T_n,A}) 1_{[0,T_n]} satisfies ∫_0^T 1_{[0,T_n]}(t) |ζ(t, X^{T_n,A}_t)|² dt ≤ n, P-a.s., and therefore the Novikov condition

    E exp( (1/2) ∫_0^T 1_{[0,T_n]}(t) |ζ(t, X^{T_n,A}_t)|² dt ) < ∞

is trivially verified and, by the Girsanov theorem, there exists a probability measure P_n on F_T, equivalent to P, given by the formula

    dP_n/dP = exp( − ∫_0^T 1_{[0,T_n]}(s) ζ(s, X^{T_n,A}_s)* dW_s − (1/2) ∫_0^T 1_{[0,T_n]}(s) |ζ(s, X^{T_n,A}_s)|² ds ),      (5.4)

such that the process W^n defined by

    W^n_t = W_t + ∫_0^t 1_{[0,T_n]}(s) ζ(s, X^{T_n,A}_s) ds,   t ∈ [0, T],

is a P_n-cylindrical Wiener process. Thus, (Ω, F_T, F_t, P_n, W^n) is an admissible set-up and X^{T_n,A} is a mild solution of

    dX^{T_n,A}_t = A X^{T_n,A}_t dt + F(t, X^{T_n,A}_t) 1_{[0,T_n]}(t) dt + G(t, X^{T_n,A}_t) 1_{[0,T_n]}(t) dW^n_t.

Keeping the same set-up (Ω, F_T, F_t, P_n, W^n), let us consider

    dZ_t = A Z_t dt + F(t, Z_t) dt + G(t, Z_t) dW^n_t,   Z_0 = x.

This equation has a unique mild solution Z, and the law of Z is uniquely determined by A, F, G and x. By Corollary 2.5, the process Z^{T_n,A} is a mild solution of

    dZ^{T_n,A}_t = A Z^{T_n,A}_t dt + F(t, Z^{T_n,A}_t) 1_{[0,T_n]}(t) dt + G(t, Z^{T_n,A}_t) 1_{[0,T_n]}(t) dW^n_t.

By Lemma 2.6 we conclude that Z^{T_n,A} = X^{T_n,A}. In particular we have Z 1_{[0,τ_n(X)]} = X 1_{[0,τ_n(X)]}, which implies, by the definition of τ_n, that τ_n(X) = τ_n(Z). It follows that Z_{τ_n(Z)} = X_{τ_n(X)} and finally

    X^{T_n,A}_t = Z_t 1_{[0,τ_n(Z)]}(t) + e^{(t−τ_n(Z))^+ A} Z_{τ_n(Z)} 1_{(τ_n(Z),T]}(t),   t ∈ [0, T].

Thus, the trajectories of X^{T_n,A} are obtained from those of Z by composition with the mapping

    w ↦ w 1_{[0,τ_n(w)]} + e^{(·−τ_n(w))^+ A} w_{τ_n(w)} 1_{(τ_n(w),T]},   w ∈ W_H.

It follows that the law of X^{T_n,A} under P_n, and also the joint law of X^{T_n,A} and W^n under P_n, are uniquely determined by τ_n, A, F, G and x. Now writing the density in (5.4) in terms of W^n (using dW_s = dW^n_s − 1_{[0,T_n]}(s) ζ(s, X^{T_n,A}_s) ds) and noting that T_n = τ_n(X^{T_n,A}), we obtain

    dP/dP_n = exp( ∫_0^T 1_{[0,τ_n(X^{T_n,A})]}(s) ζ(s, X^{T_n,A}_s)* dW^n_s − (1/2) ∫_0^T 1_{[0,τ_n(X^{T_n,A})]}(s) |ζ(s, X^{T_n,A}_s)|² ds ).


This formula shows that τ_n, A, F, G, x also determine the joint law of dP/dP_n and X^{T_n,A} under P_n, and therefore also the law of X^{T_n,A} under the original probability P.

Now if Y denotes another weak solution of the closed-loop equation (possibly on a different set-up) then, setting S_n = τ_n(Y), we conclude that the processes X^{T_n,A} and Y^{S_n,A} have the same law.

Coming back to the original set-up (Ω, F, F_t, P, W) of X, we note that the requirement E ∫_0^T |ζ(t, X_t)|² dt < ∞ in the definition of weak solution implies that for every t, X^{T_n,A}_t → X_t P-a.s. It follows that the finite-dimensional distributions of the processes X^{T_n,A} converge to the finite-dimensional distributions of X in the weak topology of measures. A similar result holds for the processes Y^{S_n,A} and Y, and therefore the laws of X and Y coincide. This completes the proof of the uniqueness part.

Now we prove existence. Let us start with a cylindrical Wiener process W^1_t, t ∈ [0, T], with values in U, defined on a complete probability space (Ω, F^1, P^1), and let F^1_t, t ∈ [0, T], be the filtration generated by the Wiener process W^1. Thus, F^1_t is the σ-algebra generated by the random variables W^1_r, r ∈ [0, t], and by the family of P^1-null sets of F^1. We denote by X the mild solution of the equation

    dX_t = AX_t dt + F(t, X_t) dt + G(t, X_t) dW^1_t,   t ∈ [0, T],   X_0 = x.      (5.5)

This equation is an instance of (3.2) and existence and uniqueness of X follow from Proposition 3.1. Next we solve the backward equation

    dy_t = z_t dW^1_t,   t ∈ [0, T],   y_T = exp( −r(X_T) − ∫_0^T q(σ, X_σ) dσ ),

which is an instance of (3.11) with φ = exp(−r), and we recall that z_t = y_t ζ(t, X_t), P^1-a.s. for a.a. t ∈ [0, T], where ζ : [0, T] × H → U is the function defined in Proposition 3.7.

The process y is a P^1-continuous martingale and we have y_0 = χ(0, x) > 0. For any P^1-continuous real local martingale m_t, t ∈ [0, T], with respect to F^1_t we denote by ⟨m, y⟩ the joint quadratic variation process of m and y. In particular, given ξ ∈ U we may choose m_t = W^1_t ξ, and in this case ⟨W^1 ξ, y⟩_t = ∫_0^t z_s ξ ds, as follows immediately from the backward equation.

Now we define a probability measure P on F^1_T setting

    dP/dP^1 = y_T / y_0,      (5.6)

so that the Radon-Nikodym derivative of P with respect to P^1 on each σ-algebra F^1_t is y_t/y_0. We note that, by the very definition of P, we have y_T > 0 P-a.s. and hence

    −log y_T = r(X_T) + ∫_0^T q(σ, X_σ) dσ < ∞,   P-a.s.      (5.7)

In addition, since y is P^1-a.s. continuous and nonnegative, by a classical result (see e.g. [33], Proposition VIII.1.2) P-almost all the trajectories of y are strictly positive. In particular the process y_0/y is well-defined, up to a P-null set. We note that in general y_T is not strictly positive P^1-a.s., so that P and P^1 are not equivalent on F^1_T. Nevertheless we can apply a general version of the Girsanov theorem in a way that we now describe (see [33], Theorem VIII.1.4; [10], nos. 45–50, especially no. 50-(c); [22]). Let us denote by F the P-completion of F^1_T and by F_t the usual augmentation of the filtration F^1_t with respect to (Ω, F, P). The general version of the Girsanov theorem states that for any P^1-continuous real local martingale m with respect to F^1_t the process

    m_t − ∫_0^t (1/y_s) d⟨m, y⟩_s,   t ∈ [0, T],


is a P-continuous real local martingale with respect to F_t. In particular, choosing m_t = W^1_t ξ as before, we deduce that the process

    W^1_t ξ − ∫_0^t (z_s ξ / y_s) ds,   t ∈ [0, T],      (5.8)

is a P-continuous real local martingale. Since its joint quadratic variation with W^1 ξ′, for any ξ′ ∈ U, is the process (ξ, ξ′)_U t, we conclude by the Lévy characterization theorem that the process defined in (5.8) is a Wiener process, and that if we set

    W_t := W^1_t − ∫_0^t (z_s / y_s) ds,   t ∈ [0, T],

then W is a P-cylindrical Wiener process in U, with respect to F_t. Since P is absolutely continuous with respect to P^1, we also have, for a.a. t ∈ [0, T], z_t = y_t ζ(t, X_t), P-a.s., and consequently W_t = W^1_t − ∫_0^t ζ(s, X_s) ds, P-a.s.; so, writing equation (5.5) with respect to W, it follows that X is a weak solution of the closed-loop equation on the set-up (Ω, F, F_t, P, W).

Now it remains to verify (5.3). Let us define T_n = inf{t ∈ [0, T] : y_t ≤ 1/n}, with the convention that T_n = T if this set is empty. The T_n are F^1_t- and F_t-stopping times. We define the stopped process y^{T_n} setting y^{T_n}_t = y_{t∧T_n}. Then dy^{T_n}_t = 1_{[0,T_n]}(t) z_t dW^1_t and, writing this equation in terms of W,

    dy^{T_n}_t = 1_{[0,T_n]}(t) z_t dW_t + 1_{[0,T_n]}(t) z_t ζ(t, X_t) dt.

Since y^{T_n} ≥ 1/n we can apply the Itô formula to the process log y^{T_n}. Since z_t = y_t ζ(t, X_t) and since y^{T_n}_t = y_t on [0, T_n], we obtain

    d log y^{T_n}_t = (1/y^{T_n}_t) dy^{T_n}_t − (1/2) 1_{[0,T_n]}(t) (|z_t|²/|y^{T_n}_t|²) dt
                    = 1_{[0,T_n]}(t) (z_t/y_t) dW_t + (1/2) 1_{[0,T_n]}(t) (|z_t|²/|y_t|²) dt
                    = 1_{[0,T_n]}(t) ζ(t, X_t) dW_t + (1/2) 1_{[0,T_n]}(t) |ζ(t, X_t)|² dt.

We claim that ∫_0^t 1_{[0,T_n]}(s) ζ(s, X_s) dW_s is a P-martingale. Indeed, using the definition of T_n and of the probability P, and recalling the backward equation, we have

    E ∫_0^{T_n} |ζ(t, X_t)|² dt = E ∫_0^{T_n} |z_t/y_t|² dt ≤ n E ∫_0^{T_n} (|z_t|²/y_t) dt = (n/y_0) E^1 ∫_0^{T_n} |z_t|² dt
        ≤ (n/y_0) E^1 ∫_0^T |z_t|² dt = (n/y_0) (E^1 |y_T|² − y_0²) < ∞.

Therefore, taking expectation with respect to P, we obtain

    (1/2) E ∫_0^{T_n} |ζ(t, X_t)|² dt − E log y_{T_n} = − log y_0.      (5.9)

We recall that y is a strictly positive process P-a.s. It follows that for P-almost all ω there exists n(ω) such that T_n(ω) = T for n > n(ω). Moreover, since y_0 = χ(0, x) > 0, for n > χ(0, x)^{−1} the random variables y_{T_n} form a decreasing sequence and we have, P-a.s., 1 ≥ y_{T_n} → y_T and consequently 0 ≤ −log y_{T_n} → −log y_T. By the monotone convergence theorem we can pass to the limit in (5.9), and taking into account (5.7) we conclude that

    (1/2) E ∫_0^T |ζ(t, X_t)|² dt + E ∫_0^T q(t, X_t) dt + E r(X_T) = − log y_0.

This shows that (5.3) holds and finishes the proof of Proposition 5.1.

In the following corollary we denote by P the law of the optimal trajectory and, for an arbitrary set-up, we denote by P^0 the law of the trajectory X^0 corresponding to u = 0.


Corollary 5.3 Under the assumptions of Theorem 5.2, P is absolutely continuous with respect to P^0 with density

    dP/dP^0 (w) = (1/Z) exp( −r(w_T) − ∫_0^T q(σ, w_σ) dσ ),   w ∈ W_H,

where Z is the normalizing constant

    Z = ∫_{W_H} exp( −r(w_T) − ∫_0^T q(σ, w_σ) dσ ) P^0(dw).

Proof. Let X be the process, in the probability space (Ω, F^1, P^1), defined as the mild solution of the equation (5.5) introduced in the proof of Theorem 5.2. Then the law of X under P^1 is P^0. The existence of the density and its formula is then an immediate consequence of (5.6).
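Numerically, Corollary 5.3 means that statistics of the optimal trajectory can be computed from uncontrolled samples by importance weighting with the unnormalized density exp(−r(w_T) − ∫_0^T q(σ, w_σ) dσ). The sketch below is an illustration with toy Gaussian choices, not taken from the paper: q = 0, r(w_T) = w_T², X^0_T = x + W_T ~ N(x, T), for which the reweighted mean of X_T has the closed form x/(1 + 2T).

```python
import math, random

# Importance-weighting illustration of Corollary 5.3 (toy Gaussian case,
# not from the paper): q = 0, r(w_T) = w_T^2, X^0_T = x + W_T ~ N(x, T).
# Reweighting N(x, T) by exp(-w^2) gives a Gaussian with mean x / (1 + 2T).
random.seed(1)
x, T, n_samples = 0.5, 1.0, 200_000

num = den = 0.0
for _ in range(n_samples):
    xT = x + random.gauss(0.0, math.sqrt(T))   # sample of X^0_T under P^0
    w = math.exp(-xT ** 2)                     # unnormalized density dP/dP^0
    num += w * xT
    den += w
mean_P = num / den            # E_P[X_T] via self-normalized weights
mean_exact = x / (1 + 2 * T)  # closed form for this example

print(mean_P, mean_exact)
```

The normalizing constant Z of the corollary is estimated implicitly by the weight sum `den / n_samples`, so no control has to be simulated at all.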

6 Constrained optimal control problems and conditioned processes

In this section we assume that we are given Hilbert spaces H, U and coefficients A, F, G satisfying Hypothesis 2.2. We adopt the weak formulation of the control problem for the state equation (2.1) and we define admissible control systems (a.c.s.) (Ω, F, F_t, P, W, u, X^u, x) as in the previous section. We will show that applying the previous results with special choices of the (possibly infinite) functions q and r occurring in the cost (5.1) leads to existence results for optimal control problems with state constraints.

For arbitrary sets V ⊂ Z we define the function I_V setting I_V(z) = 0 if z ∈ V and I_V(z) = +∞ if z ∈ Z \ V. Now we fix a Borel subset B ⊂ H and a (relatively) open subset C ⊂ [0, T] × H, and we denote C(t) = {x ∈ H : (t, x) ∈ C}. We consider the optimal control problem with the cost

    E ∫_0^T [ (1/2)|u_t|² + I_C(t, X^u_t) ] dt + E I_B(X^u_T).      (6.1)

Clearly, the problem of finding an optimal a.c.s. with a finite cost is equivalent to finding an a.c.s. minimizing the cost E ∫_0^T |u_t|² dt (sometimes called the energy of the control) and for which the constraints X^u_T ∈ B and X^u_t ∈ C(t) for a.a. t ∈ [0, T] hold with probability 1. Since we assume that C is open and X^u has continuous paths, the last requirement is equivalent to: X^u_t ∈ C(t) for all t ∈ [0, T], with probability 1.

We still denote by P^0 the law of the trajectory X^0 corresponding to u = 0. Let us denote by Θ the set of all w ∈ W_H such that w_T ∈ B and w_t ∈ C(t) for t ∈ [0, T]. The condition P^0(Θ) > 0, required in the following proposition, means that the trajectory X^0 belongs to Θ with positive probability. For this to happen it is necessary that the starting point x belongs to C(0) and that B ∩ C(T) ≠ ∅.

Proposition 6.1 Assume that Hypothesis 2.2 holds and consider the optimal control problem with the cost

    E ∫_0^T |u_t|² dt

and the constraints

    X^u_T ∈ B,   X^u_t ∈ C(t) for t ∈ [0, T],


where B is a Borel subset of H and C is a (relatively) open subset of [0, T] × H.
If P^0(Θ) > 0, then there exists an optimal a.c.s. satisfying the constraints. Moreover the law P of the corresponding optimal trajectory coincides with the law P^0 conditioned on Θ, i.e. for every Borel subset Γ ⊂ W_H

    P(Γ) = P^0(Γ ∩ Θ) / P^0(Θ).      (6.2)

Proof. Let us consider the cost J given by (6.1). Since we require C to be open, we clearly have

    1_Θ(w) = exp( −I_B(w_T) − ∫_0^T I_C(σ, w_σ) dσ ),   w ∈ W_H.

Since, in our special case, the constant χ(0, x) of Theorem 5.2 coincides with P^0(Θ), the existence of an optimal a.c.s. satisfying the constraints follows from Theorem 5.2. Since P^0(Θ) also equals the constant Z in Corollary 5.3, the formula for the density dP/dP^0 of Corollary 5.3 gives (6.2).

Choosing C = [0, T] × H, which implies I_C = 0, we have in particular the following corollary, which generalizes Theorem VI.4.1 in [14].

Corollary 6.2 Assume that Hypothesis 2.2 holds and consider the optimal control problem with the cost

    E ∫_0^T |u_t|² dt

and the constraint

    X^u_T ∈ B,

where B is a Borel subset of H.
If P^0({w : w_T ∈ B}) > 0, then there exists an optimal a.c.s. satisfying the constraint. Moreover the law P of the corresponding optimal trajectory coincides with the law P^0 conditioned on {w : w_T ∈ B}, i.e. for every Borel subset Γ ⊂ W_H

    P(Γ) = P^0(Γ ∩ {w : w_T ∈ B}) / P^0({w : w_T ∈ B}).      (6.3)

With an abuse of terminology, the optimal trajectory X is obtained by conditioning the process X^0 to reach the set B at the terminal time T. It is expected that this conditioning procedure is related to the h-transformation of Doob: this is indeed the case. Let us recall how the law P was constructed in the proof of Proposition 5.1. We started with an appropriate set-up (Ω, F^1, P^1, F^1_t, W^1), we solved the equation

    dX_t = AX_t dt + F(t, X_t) dt + G(t, X_t) dW^1_t,   t ∈ [0, T],   X_0 = x,

and the backward equation

    dy_t = z_t dW^1_t,   t ∈ [0, T],   y_T = exp(−I_B(X_T)) = 1_B(X_T);

we defined P on F^1_T setting dP/dP^1 = y_T/y_0 and, finally, P was defined as the law of the process X under P. Next let us recall the definition of χ, formula (4.1): since now q = 0 and r = I_B, and hence exp(−I_B) = 1_B, we see that χ(t, ·) = P_{tT} 1_B, where by P_{tT} we denote the transition operator of the uncontrolled process X(τ, t, x) over the interval [t, T]. Finally we recall that


y_t = χ(t, X_t), and it follows that the Radon-Nikodym derivative of P with respect to P^1 on each σ-algebra F^1_t is

    dP/dP^1 |_{F^1_t} = y_t/y_0 = χ(t, X_t)/χ(0, x).

This shows that P is obtained from P^0 by h-transformation via the function χ. Since χ(t, ·) = P_{tT} exp(−I_B), we see that χ formally coincides with the solution of equation (1.9) (with q = 0) obtained from the Hamilton-Jacobi-Bellman equation by the logarithmic transformation. As mentioned in the introduction, under regularity assumptions, in particular if the gradient ∇_x χ exists, the conditioned process X satisfies a closed-loop equation with feedback law given by

    u_t = G(t, X_t)* ∇_x χ(t, X_t)* / χ(t, X_t).

Under our weaker assumptions we could prove that the optimal feedback law is u_t = ζ(t, X_t), where ζ : [0, T] × H → U is the function defined in Proposition 3.7.
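A concrete way to check the h-transform structure numerically is to use the martingale property of y_t = χ(t, X_t) = P_{tT} 1_B(X_t): by the tower property, E^1 χ(t, X_t) = χ(0, x) for every t. The sketch below verifies this for an illustrative one-dimensional case not taken from the paper: X a standard Brownian motion from x and B = [a, ∞), for which χ(t, y) = Φ((y − a)/√(T − t)) with Φ the standard normal distribution function.

```python
import math, random

def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Illustrative case (not from the paper): X_t = x + W_t in R, B = [a, inf).
# Then chi(t, y) = P(y + W_{T-t} >= a) = Phi((y - a) / sqrt(T - t)),
# and y_t = chi(t, X_t) is a martingale: E chi(t, X_t) = chi(0, x).
random.seed(2)
x, a, T, t, n_samples = 0.0, 0.5, 1.0, 0.4, 200_000

chi0 = Phi((x - a) / math.sqrt(T))  # chi(0, x)

# Monte Carlo check of the martingale property at the intermediate time t.
acc = 0.0
for _ in range(n_samples):
    Xt = x + random.gauss(0.0, math.sqrt(t))
    acc += Phi((Xt - a) / math.sqrt(T - t))
mean_chi_t = acc / n_samples

print(mean_chi_t, chi0)
```

This is exactly the consistency that makes χ(t, X_t)/χ(0, x) a valid density process for the h-transform above.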

7 Examples

In this section we present some simple applications of the previous results. We do not aim at utmost generality, and several variants of the results below could easily be stated and proved. For brevity we also leave to the reader the precise formulation of the control problems, in particular concerning admissible set-ups and controls.

7.1 The finite dimensional case.

Let us consider the controlled stochastic equation in R^n

    dX^u_t = F(X^u_t) dt + G(X^u_t)[u_t dt + dW_t],   t ∈ [0, T],   X^u_0 = x,

where x ∈ R^n, W is a Wiener process in R^d, and the functions F : R^n → R^n and G : R^n → L(R^d, R^n) are assumed to be Lipschitz continuous. Let X^0 denote the solution corresponding to u = 0 and let τ = inf{t ≥ 0 : X^0_t ∉ O} be the exit time from an open set O ⊂ R^n. Suppose that P(τ > T) > 0 for some T > 0 (this can happen only if x ∈ O). Then there exists a square summable control u minimizing E ∫_0^T |u_t|² dt and such that, P-a.s., X^u_t ∈ O for every t ∈ [0, T]. Moreover, u_t = ζ(t, X^u_t) for a feedback law ζ.
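The condition P(τ > T) > 0 is easy to probe by simulation. The sketch below uses illustrative coefficients not taken from the paper (n = d = 1, F(x) = −x, G = 1, O = (−1, 1), x = 0) and an Euler scheme to estimate p = P(τ > T); by Theorem 5.2, the minimal value of the cost E ∫_0^T (1/2)|u_t|² dt over controls confining X^u to O is then −log p, up to time-discretization error.

```python
import math, random

# Illustrative 1-d instance of Section 7.1 (choices not from the paper):
# F(x) = -x, G = 1, O = (-1, 1), x = 0.  Euler estimate of
# p = P(tau > T); the minimal cost E int_0^T (1/2)|u_t|^2 dt over
# controls keeping X^u in O is -log p (modulo discretization error).
random.seed(3)
T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps

survived = 0
for _ in range(n_paths):
    X, alive = 0.0, True
    for _ in range(n_steps):
        X += -X * dt + math.sqrt(dt) * random.gauss(0.0, 1.0)
        if not -1.0 < X < 1.0:   # exit from O: path is killed
            alive = False
            break
    survived += alive
p = survived / n_paths
print(p, -math.log(p))  # estimated P(tau > T) and the minimal cost
```

Note that discrete monitoring slightly overestimates the survival probability, since excursions outside O between grid times are missed.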

7.2 A controlled heat equation.

We consider the heat equation with one-dimensional space variable ξ ∈ (0, 1) and unknown process y^u(t, ξ):

    dy^u(t, ξ) = (∂²/∂ξ²) y^u(t, ξ) dt + f(y^u(t, ξ)) dt + g(y^u(t, ξ))[u(t, ξ) dt + dW(t, ξ)],
    y^u(0, ξ) = x(ξ),   ξ ∈ (0, 1),
    y^u(t, 0) = y^u(t, 1) = 0,   t ≥ 0.      (7.1)

Here W(t, ξ) stands for the space-time white noise, the functions f : R → R and g : R → R are Lipschitz continuous, and g is bounded. We set H = U = L²(0, 1) and assume x ∈ H. For u ∈ U, y ∈ H we define F(y) ∈ H, G(y) ∈ L(U, H) setting F(y)(ξ) = f(y(ξ)), (G(y)u)(ξ) = g(y(ξ))u(ξ), ξ ∈ (0, 1). Next, if A denotes the operator ∂²/∂ξ² in L²(0, 1) with domain W^{2,2}(0, 1) ∩ W^{1,2}_0(0, 1), then (7.1) has the form (2.1) and Hypothesis 2.2 holds.


Denoting by y^0 the solution corresponding to u = 0, let T > 0, R > 0 satisfy

    P( ∫_0^1 y^0(T, ξ)² dξ < R ) > 0.

Then there exists a control u minimizing E ∫_0^T ∫_0^1 u(t, ξ)² dξ dt and such that

    P( ∫_0^1 y^u(T, ξ)² dξ < R ) = 1.

Moreover, u(t, ξ) = ζ(t, y^u(t, ·))(ξ) for a feedback law ζ : [0, T] × L²(0, 1) → L²(0, 1).
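A finite-difference sketch shows how (7.1) can be probed numerically, in particular how the condition P(∫_0^1 y^0(T, ξ)² dξ < R) > 0 can be estimated by Monte Carlo. All concrete choices below (f(y) = −y, g ≡ 0.5, grid size, T, R) are illustrative and not taken from the paper; space-time white noise is approximated by independent Gaussian increments of variance dt/dξ at each grid node.

```python
import math, random

# Finite-difference sketch of the uncontrolled heat equation (7.1):
# interior grid points xi_i on (0, 1), Dirichlet boundary conditions,
# explicit Euler in time.  Illustrative choices (not from the paper):
# f(y) = -y, g(y) = 0.5, x = 0; space-time white noise is approximated
# by independent N(0, dt/dxi) increments per node.
random.seed(4)
n = 20                       # interior grid points
dxi = 1.0 / (n + 1)
T = 0.1
dt = 0.4 * dxi ** 2          # explicit-scheme stability: dt < dxi^2 / 2
n_steps = int(T / dt)
n_paths, R = 500, 0.05

def simulate_l2_norm_sq():
    y = [0.0] * n
    for _ in range(n_steps):
        lap = [((y[i + 1] if i + 1 < n else 0.0)
                - 2 * y[i]
                + (y[i - 1] if i > 0 else 0.0)) / dxi ** 2 for i in range(n)]
        y = [y[i] + (lap[i] - y[i]) * dt
             + 0.5 * random.gauss(0.0, math.sqrt(dt / dxi))
             for i in range(n)]
    return sum(v * v for v in y) * dxi   # approximates int_0^1 y(T, xi)^2 dxi

p = sum(simulate_l2_norm_sq() < R for _ in range(n_paths)) / n_paths
print(p)  # Monte Carlo estimate of P( int_0^1 y^0(T, xi)^2 dxi < R )
```

A strictly positive estimate of p is numerical evidence that the hypothesis of the existence statement above holds for this (T, R).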

7.3 A controlled SPDE with degenerate noise.

We consider a stochastic partial differential equation of parabolic type for an unknown process y^u(t, ξ) in a bounded domain O ⊂ R^n with smooth boundary ∂O:

    dy^u(t, ξ) = Δ y^u(t, ξ) dt + f(y^u(t, ξ)) dt + Σ_{j=1}^d g_j(y^u(t, ξ))[u^j_t dt + dW^j_t],
    y^u(0, ξ) = x(ξ),   ξ ∈ O,
    y^u(t, ξ) = 0,   t ∈ [0, T],   ξ ∈ ∂O.      (7.2)

Here W = (W^1, …, W^d) is a standard Wiener process in R^d, the functions f : R → R and g_j : R → R are Lipschitz continuous, and the g_j are bounded. We set U = R^d, H = L²(O) and assume x ∈ H. For u = (u^1, …, u^d) ∈ U, y ∈ H we define F(y) ∈ H, G(y) ∈ L(U, H) setting

    F(y)(ξ) = f(y(ξ)),   (G(y)u)(ξ) = Σ_{j=1}^d g_j(y(ξ)) u^j,   ξ ∈ O.

Next, if A denotes the Laplace operator Δ in L²(O) with domain W^{2,2}(O) ∩ W^{1,2}_0(O), then (7.2) has the form (2.1) and Hypothesis 2.2 holds.

Let us consider the optimal control problem associated with the cost

    J = E ∫_0^T [ (1/2) Σ_{j=1}^d (u^j_t)² + ∫_O q(t, y^u(t, ξ)) dξ ] dt + E ∫_O r(y^u(T, ξ)) dξ,

where q : [0, T] × R → [0, ∞) and r : R → [0, ∞) are bounded measurable functions. Then there exists an optimal control u and, moreover, u_t = ζ(t, y^u(t, ·)) for a feedback law ζ : [0, T] × L²(O) → R^d.

Acknowledgments. I am grateful to Giuseppe Da Prato and Jerzy Zabczyk for discussionson conditioned processes, to Gianmario Tessitore for his help with backward equations and toPaolo Dai Pra for explanations about his own results in [5].

References

[1] V. Barbu, G. Da Prato, Hamilton-Jacobi equations in Hilbert spaces. Pitman Research Notes in Mathematics, 86. Pitman, 1983.

[2] P. Cannarsa, G. Da Prato, Second-order Hamilton-Jacobi equations in infinite dimensions. SIAM J. Control and Optimization 29 (1991), no. 2, 474–492.

[3] P. Cannarsa, G. Da Prato, Direct solution of a second-order Hamilton-Jacobi equation in Hilbert spaces. In: Stochastic partial differential equations and applications, eds. G. Da Prato, L. Tubaro, 72–85, Pitman Research Notes in Mathematics, 268. Pitman, 1992.

[4] M.G. Crandall, M. Kocan, A. Swiech, On partial sup-convolutions, a lemma of P. L. Lions and viscosity solutions in Hilbert spaces. Adv. Math. Sci. Appl. 3 (1993/94), Special Issue, 1–15.

[5] P. Dai Pra, A stochastic control approach to reciprocal diffusion processes. Appl. Math. Optim. 23 (1991), 313–329.

[6] G. Da Prato, J. Zabczyk, Stochastic equations in infinite dimensions. Encyclopedia of Mathematics and its Applications 44, Cambridge University Press, 1992.

[7] G. Da Prato, J. Zabczyk, Ergodicity for infinite-dimensional systems. London Mathematical Society Lecture Note Series, 229. Cambridge University Press, Cambridge, 1996.

[8] G. Da Prato, J. Zabczyk, Second order partial differential equations in Hilbert spaces. Book to appear.

[9] G. Da Prato, J. Zabczyk, Differentiability of the Feynman-Kac semigroup and a control application. Rend. Mat. Accad. Lincei 8 (1997), s. 9, 183–188.

[10] C. Dellacherie, P.A. Meyer, Probabilities and potential. B. Theory of martingales. North-Holland Mathematics Studies, 72. North-Holland Publishing Co., Amsterdam, 1982.

[11] Backward Stochastic Differential Equations. Eds. N. El Karoui, L. Mazliak, Pitman Research Notes in Mathematics Series 364, Longman, 1997.

[12] N. El Karoui, S. Peng, M.C. Quenez, Backward stochastic differential equations in finance. Mathematical Finance 7 (1997), no. 1, 1–71.

[13] W.H. Fleming, S.J. Sheu, Stochastic variational formula for fundamental solutions of parabolic PDE. Appl. Math. Optim. 13 (1985), 193–204.

[14] W.H. Fleming, H.M. Soner, Controlled Markov processes and viscosity solutions. Applications of Mathematics, 25. Springer-Verlag, New York, 1993.

[15] M. Fuhrman, G. Tessitore, Nonlinear Kolmogorov equations in infinite dimensional spaces: the backward stochastic differential equations approach and applications to optimal control. Ann. Probab. 30 (2002), no. 3, 1397–1465.

[16] M. Fuhrman, G. Tessitore, The Bismut-Elworthy formula for backward SDEs and applications to nonlinear Kolmogorov equations and control in infinite dimensional spaces. Stochastics and Stochastics Rep. 74 (2002), no. 1-2, 429–464.

[17] M. Fuhrman, G. Tessitore, Infinite horizon backward stochastic differential equations and elliptic equations in Hilbert spaces. Preprint, Dipartimento di Matematica del Politecnico di Milano, 498/P, February 2002.

[18] F. Gozzi, Regularity of solutions of second order Hamilton-Jacobi equations and application to a control problem. Comm. Partial Differential Equations 20 (1995), 775–826.

[19] F. Gozzi, Global regular solutions of second order Hamilton-Jacobi equations in Hilbert spaces with locally Lipschitz nonlinearities. J. Math. Anal. Appl. 198 (1996), 399–443.

[20] F. Gozzi, E. Rouy, A. Swiech, Second order Hamilton-Jacobi equations in Hilbert spaces and stochastic boundary control. SIAM J. Control Optim. 38 (2000), no. 2, 400–430.

[21] M. Kobylanski, Backward stochastic differential equations and partial differential equations with quadratic growth. Ann. Probab. 28 (2000), no. 2, 558–602.

[22] E. Lenglart, Transformation des martingales locales par changement absolument continu de probabilités. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 39 (1977), no. 1, 65–70.

[23] J. A. Leon, D. Nualart, Stochastic evolution equations with random generators. Ann. Probab. 26 (1998), no. 1, 149–186.

[24] J.-P. Lepeltier, J. San Martín, Existence for BSDE with superlinear-quadratic coefficient. Stochastics Stochastics Rep. 63 (1998), no. 3-4, 227–240.

[25] P.L. Lions, Viscosity solutions of fully nonlinear second-order equations and optimal stochastic control in infinite dimensions. I. The case of bounded stochastic evolutions. Acta Math. 161 (1988), no. 3-4, 243–278.

[26] P.L. Lions, Viscosity solutions of fully nonlinear second order equations and optimal stochastic control in infinite dimensions. II. Optimal control of Zakai's equation. In: Stochastic partial differential equations and applications, II, eds. G. Da Prato, L. Tubaro, 147–170, Lecture Notes in Mathematics 1390, Springer, 1989.

[27] P.L. Lions, Viscosity solutions of fully nonlinear second-order equations and optimal stochastic control in infinite dimensions. III. Uniqueness of viscosity solutions for general second-order equations. J. Funct. Anal. 86 (1989), no. 1, 1–18.

[28] J. Ma, J. Yong, Forward-backward stochastic differential equations and their applications. Lecture Notes in Mathematics 1702, Springer, 1999.

[29] M. Metivier, Stochastic integration with respect to Hilbert valued martingales, representation theorems and infinite-dimensional filtering. In: Measure theory applications to stochastic analysis (Proc. Conf., Res. Inst. Math., Oberwolfach, 1977), pp. 13–25, Lecture Notes in Math., 695, Springer, Berlin, 1978.

[30] J.Y. Ouvrard, Représentation de martingales vectorielles de carré intégrable à valeurs dans des espaces de Hilbert réels séparables. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 33 (1975/76), no. 3, 195–208.

[31] E. Pardoux, S. Peng, Adapted solution of a backward stochastic differential equation. Systems and Control Lett. 14 (1990), 55–61.

[32] S. Peng, A generalized dynamic programming principle and Hamilton-Jacobi-Bellman equation. Stochastics Stochastics Rep. 38 (1992), 119–134.

[33] D. Revuz, M. Yor, Continuous martingales and Brownian motion. Third edition. Grundlehren der Mathematischen Wissenschaften 293. Springer-Verlag, Berlin, 1999.

[34] B. L. Rozovskii, Stochastic evolution systems. Linear theory and applications to nonlinear filtering. Mathematics and its Applications (Soviet Series), 35. Kluwer, 1990.

[35] I. Simao, A conditioned Ornstein-Uhlenbeck process on a Hilbert space. Stochastic Anal. Appl. 9 (1991), 85–98.

[36] I. Simao, Regular fundamental solution for a parabolic equation on an infinite-dimensional space. Stochastic Anal. Appl. 11 (1993), 235–247.

[37] I. Simao, Regular transition densities for infinite dimensional diffusions. Stochastic Anal. Appl. 11 (1993), 309–336.

[38] A. Swiech, "Unbounded" second order partial differential equations in infinite-dimensional Hilbert spaces. Comm. Partial Differential Equations 19 (1994), no. 11-12, 1999–2036.

[39] A. Swiech, Viscosity solutions of fully nonlinear partial differential equations with "unbounded" terms in infinite dimensions. Ph.D. thesis, University of California at Santa Barbara, 1993.
