Page 1: GENERALIZATION OF AN INEQUALITY BY …cedricvillani.org/wp-content/uploads/2012/08/014.OV-Talagrand.pdfGENERALIZATION OF AN INEQUALITY BY TALAGRAND, AND LINKS WITH THE LOGARITHMIC

GENERALIZATION OF AN INEQUALITY BY TALAGRAND, AND LINKS WITH THE LOGARITHMIC SOBOLEV INEQUALITY

F. OTTO AND C. VILLANI

Abstract. We show that transport inequalities, similar to the one derived by Talagrand [30] for the Gaussian measure, are implied by logarithmic Sobolev inequalities. Conversely, Talagrand's inequality implies a logarithmic Sobolev inequality if the density of the measure is approximately log-concave, in a precise sense. All constants are independent of the dimension, and optimal in certain cases. The proofs are based on partial differential equations, and an interpolation inequality involving the Wasserstein distance, the entropy functional and the Fisher information.

Contents

1. Introduction
2. Main results
3. Heuristics
4. Proof of Theorem 1
5. Proof of Theorem 3
6. An application of Theorem 1
7. Linearizations
Appendix A. A nonlinear approximation argument
References

1. Introduction

Let $M$ be a smooth complete Riemannian manifold of dimension $n$, with the geodesic distance

$$ d(x, y) = \inf \left\{ \sqrt{\int_0^1 |\dot w(t)|^2 \, dt}\,;\ w \in C^1((0, 1); M),\ w(0) = x,\ w(1) = y \right\}. \tag{1} $$


We define the Wasserstein distance, or transportation distance with quadratic cost, between two probability measures $\mu$ and $\nu$ on $M$, by

$$ W(\mu, \nu) = \sqrt{T_2(\mu, \nu)} = \sqrt{\inf_{\pi \in \Pi(\mu,\nu)} \int_{M \times M} d(x, y)^2 \, d\pi(x, y)}, \tag{2} $$

where $\Pi(\mu, \nu)$ denotes the set of probability measures on $M \times M$ with marginals $\mu$ and $\nu$, i.e. such that for all bounded continuous functions $f$ and $g$ on $M$,

$$ \int_{M \times M} d\pi(x, y)\,\bigl[f(x) + g(y)\bigr] = \int_M f \, d\mu + \int_M g \, d\nu. $$

Equivalently,

$$ W(\mu, \nu) = \inf \left\{ \sqrt{\mathbb{E}\, d(X, Y)^2}\,;\ \mathrm{law}(X) = \mu,\ \mathrm{law}(Y) = \nu \right\}, $$

where the infimum is taken over arbitrary random variables $X$ and $Y$ on $M$. This infimum is finite as soon as $\mu$ and $\nu$ have finite second moments, which we shall always assume.
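As a concrete illustration of definition (2), here is a minimal Python sketch for the simplest possible case: two empirical measures on the real line with the same number of equally weighted atoms. It relies on the standard fact that, in dimension one and for the quadratic cost, the monotone (sorted) coupling is optimal. The function name `wasserstein2_1d` is ours and does not come from the paper.

```python
import math


def wasserstein2_1d(xs, ys):
    """W(mu, nu) for mu = (1/N) sum delta_{x_i}, nu = (1/N) sum delta_{y_i} on R.

    Assumption of this sketch: both samples have the same size N, so the
    optimal coupling pairs the i-th smallest x with the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Average quadratic cost of the monotone coupling, i.e. T_2(mu, nu).
    t2 = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(t2)
```

For instance, translating a sample by a constant $c$ moves it at Wasserstein distance $|c|$, which is what `wasserstein2_1d(xs, [x + c for x in xs])` returns up to rounding.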

The Wasserstein distance has a long history in probability theory and statistics, as a natural way to measure the distance between two probability measures in the weak sense. As a matter of fact, $W$ metrizes the weak-* topology on $P_2(M)$, the set of probability measures on $M$ with finite second moments. More precisely, if $(\mu_n)$ is a sequence of probability measures on $M$ such that for some (and thus any) $x_0 \in M$,

$$ \lim_{R \to \infty} \sup_n \int_{d(x_0,x) \ge R} d(x_0, x)^2 \, d\mu_n(x) = 0, $$

then $W(\mu_n, \mu) \to 0$ if and only if $\mu_n \to \mu$ in the weak measure sense.

Striking applications of this and related metrics were recently put forward in works by Marton [21] and Talagrand [30]. There, Talagrand shows how to obtain rather sharp concentration estimates in a Gaussian setting, with a completely elementary method, which runs as follows. Let

$$ d\gamma(x) = \frac{e^{-|x|^2/2}}{(2\pi)^{n/2}} \, dx $$

denote the standard Gaussian measure. Talagrand proved that for any probability measure $\mu$ on $\mathbb{R}^n$, with density $h = d\mu/d\gamma$ with respect to $\gamma$,

$$ W(\mu, \gamma) \le \sqrt{2 \int_{\mathbb{R}^n} h \log h \, d\gamma} = \sqrt{2 \int_{\mathbb{R}^n} \log h \, d\mu}. \tag{3} $$
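Inequality (3) can be checked by hand on one-dimensional Gaussian measures, for which both sides have closed forms. The sketch below assumes $\mu = N(m, s^2)$ and $\gamma = N(0,1)$ on $\mathbb{R}$, and uses the classical formulas $H(\mu|\gamma) = \tfrac12(m^2 + s^2 - 1) - \log s$ and $W(\mu,\gamma)^2 = m^2 + (s-1)^2$; these formulas and the function names are our additions, not part of the paper.

```python
import math


def entropy_vs_standard_gaussian(m, s):
    """Relative entropy H(mu|gamma), mu = N(m, s^2), gamma = N(0, 1), s > 0."""
    value = 0.5 * (m * m + s * s - 1.0) - math.log(s)
    return max(value, 0.0)  # guard against tiny negative rounding errors


def wasserstein_vs_standard_gaussian(m, s):
    """Quadratic Wasserstein distance W(mu, gamma) between 1D Gaussians."""
    return math.hypot(m, s - 1.0)


def talagrand_gap(m, s):
    """Right-hand side minus left-hand side of (3); nonnegative if (3) holds."""
    rhs = math.sqrt(2.0 * entropy_vs_standard_gaussian(m, s))
    return rhs - wasserstein_vs_standard_gaussian(m, s)
```

In this family the gap is $\sqrt{2H} - W$ with $2H - W^2 = 2(s - 1 - \log s) \ge 0$, so pure translations ($s = 1$) are equality cases, in line with the optimality of the constant 2.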


Now, let $B \subset \mathbb{R}^n$ be a measurable set with positive measure $\gamma(B)$, and for any $t > 0$ let

$$ B_t = \{ x \in \mathbb{R}^n;\ d(x, B) \le t \}. $$

Here $d(x, B) = \inf_{y \in B} \|x - y\|_{\mathbb{R}^n}$. Moreover, let $\gamma|_B$ denote the restriction of $\gamma$ to $B$, i.e. the measure $(1_B/\gamma(B))\,d\gamma$. A straightforward computation, using (3) and the triangle inequality for $W$, yields the estimate

$$ W\bigl(\gamma|_B, \gamma|_{\mathbb{R}^n \setminus B_t}\bigr) \le \sqrt{2 \log \frac{1}{\gamma(B)}} + \sqrt{2 \log \frac{1}{1 - \gamma(B_t)}}. $$

Since, obviously, this distance is bounded below by $t$, this entails

$$ \gamma(B_t) \ge 1 - \exp\left( -\frac12 \left( t - \sqrt{2 \log \frac{1}{\gamma(B)}} \right)^2 \right). \tag{4} $$

In words, the measure of $B_t$ goes rapidly to 1 as $t$ grows: this is a standard result in the theory of the concentration of measure in Gauss space, which can also be derived from the Gaussian isoperimetry.
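To get a feeling for (4), one can compare it with an explicit case. The following sketch assumes dimension one and the half-line $B = (-\infty, 0]$, so that $\gamma(B) = 1/2$ and $\gamma(B_t)$ is the standard normal distribution function evaluated at $t$. We read (4) only for $t \ge \sqrt{2 \log(1/\gamma(B))}$ and return the trivial bound 0 below that threshold; this convention and the function names are ours.

```python
import math


def standard_normal_cdf(t):
    """gamma((-infinity, t]) for the standard Gaussian measure on R."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def concentration_lower_bound(t, mass_of_B):
    """Right-hand side of (4), given gamma(B) = mass_of_B in (0, 1]."""
    threshold = math.sqrt(2.0 * math.log(1.0 / mass_of_B))
    if t < threshold:
        return 0.0  # below the threshold we only keep the trivial bound
    return 1.0 - math.exp(-0.5 * (t - threshold) ** 2)


def half_line_check(t):
    """Exact value of gamma(B_t) minus the bound (4), for B = (-infinity, 0]."""
    return standard_normal_cdf(t) - concentration_lower_bound(t, 0.5)
```

The difference `half_line_check(t)` stays nonnegative, and both quantities approach 1 at a Gaussian rate as `t` grows.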

Talagrand's proof of (3) is completely elementary; after establishing it in dimension 1, he proceeds by induction on the dimension, taking advantage of the tensorization properties of both the Gaussian measure and the entropy functional $E(h \log h)$. His proof is robust enough to yield a comparable result in the more delicate case of a tensor product of exponential measures, $e^{-\sum |x_i|} \, dx_1 \ldots dx_n$, with a complicated variant of the Wasserstein metric. Bobkov and Götze also recovered inequality (3) as a consequence of the Prékopa-Leindler inequality, and an argument due to Maurey [22].

In this paper, we shall give a new proof of inequality (3), and generalize it to a very wide class of probability measures: namely, all probability measures $\nu$ (on a Riemannian manifold $M$) satisfying a logarithmic Sobolev inequality, which means

$$ \int_M h \log h \, d\nu - \left( \int_M h \, d\nu \right) \log \left( \int_M h \, d\nu \right) \le \frac{1}{2\rho} \int_M \frac{|\nabla h|^2}{h} \, d\nu, \tag{5} $$

holding for all (reasonably smooth) functions $h$ on $M$, with some fixed $\rho > 0$. Let us recall that (5) is obviously equivalent, at least for smooth $h$, to the (maybe) more familiar form

$$ \int_M g^2 \log g^2 \, d\nu - \left( \int_M g^2 \, d\nu \right) \log \left( \int_M g^2 \, d\nu \right) \le \frac{2}{\rho} \int_M |\nabla g|^2 \, d\nu. $$

In the case $M = \mathbb{R}^n$, $\nu = \gamma$, $\rho = 1$, this is Gross's logarithmic Sobolev inequality, and we shall prove that it implies Talagrand's inequality (3).


As we realized after this study, this implication was conjectured by Bobkov and Götze in their recent work [5]. But we wish to emphasize the generality of our result: in fact we shall prove that (5) implies an inequality similar to (3), only with the coefficient 2 replaced by $2/\rho$. This result is in general optimal, as the example of the Gaussian measure shows. By known results on logarithmic Sobolev inequalities, it also entails immediately that inequalities similar to (3) hold for (not necessarily product) measures $e^{-\Psi(x) - \psi(x)} \, dx$ on $\mathbb{R}^n$ (resp. on a manifold $M$) such that $\psi$ is bounded and the Hessian $D^2\Psi$ is uniformly positive definite (resp. $D^2\Psi + \mathrm{Ric}$, where Ric stands for the Ricci curvature tensor on $M$).

This implication fits very well in the general picture of applications of logarithmic Sobolev inequalities to the concentration of measure, as developed for instance in [19].

Then, a natural question is the converse statement: does an inequality such as (3) imply (5)? The answer is known to be positive for measures on $\mathbb{R}^n$ that are log-concave, or approximately so: this was shown by Wang, using exponential integrability bounds. But we shall present a completely different proof, based on an information-theoretic interpolation inequality, which is apparently new and whose range of applications is certainly very broad. It was used by the first author in [26] for the study of the long-time behaviour of some nonlinear PDEs. One interest of this proof is to provide bounds which are dimension-free, and in fact optimal in certain regimes, thus qualitatively much better than those already known.

Our arguments are mainly based on partial differential equations. This point of view was already successfully used by Bakry and Émery [3] to derive simple sufficient conditions for logarithmic Sobolev inequalities (see also the recent exhaustive study by Arnold et al. [1]), and will appear very powerful here too; in fact, our proofs also imply the main results in [3].

Note added in proof: After our main results were announced, S. Bobkov and M. Ledoux gave alternative proofs of Theorem 1 below, based on an argument involving the Hamilton-Jacobi equation.

Acknowledgement: The second author thanks A. Arnold, F. Barthe and A. Swiech for discussions on related topics, and especially M. Ledoux for providing his lecture notes [19] (which motivated this work), as well as discussing the questions addressed here. Both authors gratefully acknowledge stimulating discussions with Y. Brenier and W. Gangbo. Part of this work was done when the second author was visiting the University of Santa Barbara, and part of it when he was in the University of Pavia; the main results were first announced in November, 1998, on the occasion of a seminar in Georgia Tech. It is a pleasure to thank all of these institutions for their kind hospitality. The first author also acknowledges support from the National Science Foundation and the A. P. Sloan Research Foundation.

2. Main results

We shall always deal with probability measures that are absolutely continuous w.r.t. the standard volume measure $dx$ on the (smooth, complete) manifold $M$, and sometimes identify them with their density. We shall fix a "reference" probability measure $d\nu = e^{-\Psi(x)} \, dx$, and assume enough smoothness on $\Psi$: say, $\Psi$ is twice differentiable. As far as we know, the most important cases of interest are (a) $M = \mathbb{R}^n$, (b) $M$ has finite volume, normalized to unity, and $d\nu = dx$ (so $\Psi = 0$). An interesting limit case of (a) is $d\nu = dx|_B$, where $B$ is a closed smooth subset of $\mathbb{R}^n$. Depending on the cases of study, many extensions are possible by approximation arguments.

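Let $d\mu = f \, dx$. We define its relative entropy with respect to $d\nu = e^{-\Psi} \, dx$ by

$$ H(\mu|\nu) = \int_M \log \frac{d\mu}{d\nu} \, d\mu = \int_M \frac{d\mu}{d\nu} \log \frac{d\mu}{d\nu} \, d\nu \tag{6} $$

or equivalently by

$$ H(f|e^{-\Psi}) = \int_M f (\log f + \Psi) \, dx. \tag{7} $$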

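Next, we define the relative Fisher information by

$$ I(\mu|\nu) = \int_M \left| \nabla \log \frac{d\mu}{d\nu} \right|^2 d\mu = 4 \int_M \left| \nabla \sqrt{\frac{d\mu}{d\nu}} \right|^2 d\nu \tag{8} $$

or equivalently by

$$ I(f|e^{-\Psi}) = \int_M f \, |\nabla(\log f + \Psi)|^2 \, dx. \tag{9} $$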

Here $|\cdot|^2$ denotes the square norm in the Riemannian structure on $M$, and $\nabla$ is the gradient on $M$. The relative Fisher information is well-defined in $[0, +\infty]$ by the expression in the right-hand side of (8). The relative entropy is also well-defined in $[0, +\infty]$, for instance by the expression

$$ \int_M \left( \frac{d\mu}{d\nu} \log \frac{d\mu}{d\nu} - \frac{d\mu}{d\nu} + 1 \right) d\nu, $$

which is the integral of a nonnegative function.

Definition 1. The probability measure $\nu$ satisfies a logarithmic Sobolev inequality with constant $\rho > 0$ (in short: LSI($\rho$)) if for all probability measures $\mu$ absolutely continuous w.r.t. $\nu$,

$$ H(\mu|\nu) \le \frac{1}{2\rho} I(\mu|\nu). \tag{10} $$

This definition is equivalent to (5), since here we restrict to measures $\mu = h\nu$ which are probability measures.

Definition 2. The probability measure $\nu$ satisfies a Talagrand inequality with constant $\rho > 0$ (in short: T($\rho$)) if for all probability measures $\mu$, absolutely continuous w.r.t. $\nu$, with finite moments of order 2,

$$ W(\mu, \nu) \le \sqrt{\frac{2 H(\mu|\nu)}{\rho}}. \tag{11} $$

By combining (10) and (11), we also naturally introduce the

Definition 3. The probability measure $\nu$ satisfies LSI+T($\rho$) if for all probability measures $\mu$, absolutely continuous w.r.t. $\nu$ and with finite moments of order 2,

$$ W(\mu, \nu) \le \frac{1}{\rho} \sqrt{I(\mu|\nu)}. \tag{12} $$

Our first result states that (10) is stronger than (11), and thus than (12) as well. Below, $I_n$ denotes the identity matrix of order $n$, and Ric stands for the Ricci curvature tensor on $M$.

Theorem 1. Let $d\nu = e^{-\Psi} \, dx$ be a probability measure with finite moments of order 2, such that $\Psi \in C^2(M)$ and $D^2\Psi + \mathrm{Ric} \ge -C I_n$, $C \in \mathbb{R}$. If $\nu$ satisfies LSI($\rho$) for some $\rho > 0$, then it also satisfies T($\rho$), and (obviously) LSI+T($\rho$).

Remark. The assumption on $D^2\Psi + \mathrm{Ric}$ is there only to avoid pathological situations, and ensure uniform bounds on the solution of a related PDE. For all the cases of interest known to the authors, it is not a restriction. The value of $C$ plays no role in the results.

We now recall a simple criterion for $\nu$ to satisfy a logarithmic Sobolev inequality. This is the celebrated result of Bakry and Émery.

Theorem 2. (Bakry and Émery [3]) Let $d\nu = e^{-\Psi} \, dx$ be a probability measure on $M$, such that $\Psi \in C^2(M)$ and $D^2\Psi + \mathrm{Ric} \ge \rho I_n$, $\rho > 0$. Then $d\nu$ satisfies LSI($\rho$).


As is well-known since the work of Holley and Stroock [16], if $\nu$ satisfies LSI($\rho$), and $\tilde\nu = e^{-\psi} \nu$ is a "bounded perturbation" of $\nu$ (this means $\psi \in L^\infty$, and $\tilde\nu$ is a probability measure), then $\tilde\nu$ satisfies LSI($\tilde\rho$) with $\tilde\rho = \rho e^{-\mathrm{osc}(\psi)}$, $\mathrm{osc}(\psi) = \sup \psi - \inf \psi$. This simple lemma considerably extends the range of probability measures which are known to satisfy a logarithmic Sobolev inequality.

Next, we are interested in the converse implication to Theorem 1. It will turn out that it is actually a corollary of a general "interpolation" inequality between the functionals $H$, $W$ and $I$.

Theorem 3. Let $d\nu = e^{-\Psi} \, dx$ be a probability measure on $\mathbb{R}^n$, with finite moments of order 2, such that $\Psi \in C^2(\mathbb{R}^n)$, $D^2\Psi \ge K I_n$, $K \in \mathbb{R}$ (not necessarily positive). Then, for all probability measures $\mu$ on $\mathbb{R}^n$, absolutely continuous w.r.t. $\nu$, the following "HWI inequality" holds:

$$ H(\mu|\nu) \le W(\mu, \nu) \sqrt{I(\mu|\nu)} - \frac{K}{2} W(\mu, \nu)^2. \tag{13} $$
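As a sanity check of (13), the sketch below evaluates the three functionals in closed form for one-dimensional Gaussians: $\mu = N(m, s^2)$ against the reference $\nu = N(0, 1)$, for which $\Psi(x) = x^2/2$ up to an additive constant, so that $K = 1$. The closed forms used for $H$, $W$ and $I$ are standard Gaussian computations that we supply ourselves; they are not stated in the paper, and the function names are hypothetical.

```python
import math


def gaussian_functionals(m, s):
    """Return (H, W, I) for mu = N(m, s^2) relative to nu = N(0, 1), s > 0."""
    H = 0.5 * (m * m + s * s - 1.0) - math.log(s)  # relative entropy, as in (6)
    W = math.hypot(m, s - 1.0)                     # Wasserstein distance, as in (2)
    I = m * m + (s - 1.0 / s) ** 2                 # relative Fisher information, as in (8)
    return H, W, I


def hwi_gap(m, s, K=1.0):
    """Right-hand side of (13) minus its left-hand side; K = 1 for N(0, 1)."""
    H, W, I = gaussian_functionals(m, s)
    return W * math.sqrt(I) - 0.5 * K * W * W - H
```

For pure translations ($s = 1$) the gap vanishes, and for centered Gaussians ($m = 0$) a short computation gives the gap $\log s - 1 + 1/s \ge 0$.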

Remarks.

(1) In particular, if $\Psi$ is convex, then

$$ H(\mu|\nu) \le W(\mu, \nu) \sqrt{I(\mu|\nu)}. \tag{14} $$

(2) Formally, it is not difficult to adapt our proof to a general Riemannian setting, with $D^2\Psi$ replaced by $D^2\Psi + \mathrm{Ric}$ in the assumptions of Theorem 3 (and of its corollaries stated below). However, a rigorous proof requires some preliminary work on the Wasserstein distance on manifolds, which is of independent interest and will therefore be examined elsewhere. At the moment, using the results of [12], we can only obtain the same results when $d\nu = e^{-\Psi} \, dx$ is a probability measure on the torus $T^n$, where $\Psi$ is the restriction to $T^n$ of a function $\tilde\Psi$ on $\mathbb{R}^n$, $D^2\tilde\Psi \ge K I_n$.

(3) By Young's inequality, if $K > 0$, inequality (13) implies LSI($K$). Thus, this inequality contains the Bakry-Émery result (at least in $\mathbb{R}^n$). Moreover, the cases of equality for (13) are the same as for LSI($K$). By the way, this shows that the constant 1 (in front of the right-hand side of (13)) is optimal for $K > 0$.

(4) In any case, we have, for any $\rho > 0$,

$$ H(\mu|\nu) \le \frac{1}{2\rho} I(\mu|\nu) + \frac{\rho - K}{2} W(\mu, \nu)^2. $$

This tells us that LSI($\rho$) is always satisfied (for any $\rho$), up to a "small error", i.e. an error term of second order in the weak topology.


Let us now enumerate some immediate consequences of Theorems 1, 2, 3. As a corollary of Theorem 1, we find

Corollary 1.1. Under the assumptions of Theorem 1, for every measurable set $B \subset M$ and $t \ge \sqrt{(2/\rho) \log(1/\nu(B))}$, one has

$$ \nu(B_t) \ge 1 - \exp\left( -\frac{\rho}{2} \left( t - \sqrt{\frac{2}{\rho} \log \frac{1}{\nu(B)}} \right)^2 \right). \tag{15} $$

This inequality was already obtained by Bobkov and Götze [5]. Next, as a corollary of Theorem 2 and Theorem 1, we obtain

Corollary 2.1. Let $d\nu = e^{-\Psi} \, dx$ be a probability measure on $M$ with finite moments of order 2, such that $\Psi \in C^2(M)$, $D^2\Psi + \mathrm{Ric} \ge \rho I_n$, $\rho > 0$. Then T($\rho$) holds.

And actually, using the Holley-Stroock perturbation lemma, we come up with the stronger statement

Corollary 2.2. Let $d\nu = e^{-\Psi - \psi} \, dx$ be a probability measure on $M$, with finite moments of order 2, such that $\Psi \in C^2(M)$, $D^2\Psi + \mathrm{Ric} \ge \rho I_n$, $\rho > 0$, $\psi \in L^\infty$. Then T($\tilde\rho$) holds, $\tilde\rho = \rho e^{-\mathrm{osc}(\psi)}$.

We now state two corollaries of Theorem 3. The first one means that under some "convexity" assumption, T $\Rightarrow$ LSI, maybe at the price of a degradation of the constants:

Corollary 3.1. Let $d\nu = e^{-\Psi} \, dx$ be a measure on $\mathbb{R}^n$, with $\Psi \in C^2(\mathbb{R}^n)$, $\int e^{-\Psi(x)} |x|^2 \, dx < +\infty$, and $D^2\Psi \ge K I_n$, $K \in \mathbb{R}$. Assume that $\nu$ satisfies T($\rho$) with $\rho \ge \max(0, -K)$. Then $\nu$ also satisfies LSI($\tilde\rho$) with

$$ \tilde\rho = \max\left[ \frac{\rho}{4} \left( 1 + \frac{K}{\rho} \right)^2, \ K \right]. $$

In particular, if $K > 0$, $\nu$ satisfies LSI($K$); and if $\Psi$ is convex, $\nu$ satisfies LSI($\rho/4$).

Remark. This result is sharp at least for $K > 0$.

Another variant is the implication LSI+T $\Rightarrow$ LSI, or LSI+T $\Rightarrow$ T. Quite surprisingly, these implications essentially always hold true, in fact as soon as $D^2\Psi$ is bounded below by any real number.

Corollary 3.2. Let $d\nu = e^{-\Psi} \, dx$ be a measure on $\mathbb{R}^n$, with $\Psi \in C^2(\mathbb{R}^n)$, $\int e^{-\Psi(x)} |x|^2 \, dx < +\infty$, and $D^2\Psi \ge K I_n$, $K \in \mathbb{R}$. Assume that $\nu$ satisfies LSI+T($\rho$) with $\rho \ge \max(K, 0)$. Then $\nu$ also satisfies LSI($\tilde\rho$), and thus also T($\tilde\rho$), with

$$ \tilde\rho = \frac{\rho}{2 - \frac{K}{\rho}}. $$


Remark. Again, this is sharp for ρ = K > 0.
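The constants in Corollaries 3.1 and 3.2 are easy to tabulate. The sketch below simply encodes the two formulas, $\max[(\rho/4)(1 + K/\rho)^2, K]$ and $\rho/(2 - K/\rho)$, together with the admissibility conditions on $\rho$ stated in the corollaries; the function names are ours.

```python
def lsi_constant_from_talagrand(rho, K):
    """Corollary 3.1: T(rho) and D^2 Psi >= K I_n give LSI with this constant."""
    if rho <= 0.0 or rho < -K:
        raise ValueError("need rho > 0 and rho >= max(0, -K)")
    return max(0.25 * rho * (1.0 + K / rho) ** 2, K)


def lsi_constant_from_lsi_plus_t(rho, K):
    """Corollary 3.2: LSI+T(rho) and D^2 Psi >= K I_n give LSI with this constant."""
    if rho <= 0.0 or rho < K:
        raise ValueError("need rho > 0 and rho >= max(K, 0)")
    return rho / (2.0 - K / rho)
```

In the convex case $K = 0$ the first function returns $\rho/4$ and the second $\rho/2$, while for $\rho = K > 0$ both return $\rho$, which is the sharp regime mentioned in the remarks.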

Proof of Corollaries 3.1 and 3.2. Let us use the shorthands $W = W(\mu, \nu)$, $H = H(\mu|\nu)$, $I = I(\mu|\nu)$, and assume that all these quantities are positive (if not, there is nothing to prove).

By direct resolution, and since $W$ cannot be negative, inequality (13) implies $W \ge (\sqrt{I} - \sqrt{I - 2KH})/K$ if $K \ne 0$ (and $I - 2KH$ has to be nonnegative if $K > 0$), and $W \ge H/\sqrt{I}$ if $K = 0$. In all the cases, using $W \le \sqrt{2H/\rho}$, this leads, if $\rho \ge -K$, to

$$ H \le \frac{2 I}{\rho \left( 1 + \frac{K}{\rho} \right)^2}, $$

which is the result of Corollary 3.1. (Of course, if $K \le \rho$, then $\tilde\rho$ is no improvement on $\rho$.)

The proof of Corollary 3.2 follows the same lines. □

Without any convexity assumption on $\Psi$, it seems likely that the implication T($\rho$) $\Rightarrow$ LSI($\tilde\rho$) fails, although we do not have a counterexample so far. On the other hand, as will be shown in the last section, the Talagrand inequality is still strong enough to imply a Poincaré inequality.

Let us comment further on our results. First, define the transportation distance with linear cost, or Monge-Kantorovich distance,

$$ T_1(\mu, \nu) = \inf_{\pi \in \Pi(\mu,\nu)} \int_{M \times M} d(x, y) \, d\pi(x, y). $$

By the Cauchy-Schwarz inequality, $T_1 \le \sqrt{T_2} = W$. Bobkov and Götze [5] prove, actually in a more general setting than ours, that an inequality (referred to below as T$_1$($\rho$)) of the form

$$ T_1(\mu, \nu) \le \sqrt{\frac{2 H(\mu|\nu)}{\rho}}, $$

holding for all probability measures $\mu$ absolutely continuous w.r.t. $\nu$ and with finite second moments, is equivalent to a concentration inequality of the form

$$ \int_M e^{\lambda F} \, d\nu \le e^{\lambda \int F \, d\nu + \lambda^2/(2\rho)}, \tag{16} $$

holding for all Lipschitz functions $F$ on $M$ with $\|F\|_{\mathrm{Lip}} \le 1$. Such a concentration inequality can also be seen as a consequence of the logarithmic Sobolev inequality LSI($\rho$). So our results extend theirs by showing the stronger inequality for $W$. In short, LSI($\rho$) $\Rightarrow$ T($\rho$) $\Rightarrow$ T$_1$($\rho$) $\Rightarrow$ (16) $\Rightarrow$ (15).

By the arguments of [5], another consequence of Theorem 1 is that if $\nu$ satisfies LSI($\rho$), then for all measurable functions $f$ on $M$,

$$ \int_M e^{\rho S f} \, d\nu \le e^{\rho \int f \, d\nu}, $$

with $Sf(x) \equiv \inf_{y \in M} \bigl[ f(y) + \tfrac12 d(x, y)^2 \bigr]$. Indeed, this is a consequence of the general identities (the first of which is a special case of the Kantorovich duality)

$$ \frac12 W(\mu, \nu)^2 = \sup_{f \in C_b(M)} \left[ \int Sf \, d\mu - \int f \, d\nu \right], $$

$$ H(\mu|\nu) = \sup_{\varphi \in C_b(M)} \left[ \int \varphi \, d\mu - \log \int e^{\varphi} \, d\nu \right]. $$

The proofs of [5] are essentially based on functional characterizations similar to the ones above, and thus completely different from our PDE tools. This explains our more restricted setting: we need a differentiable structure.

To conclude this presentation, we mention that, very recently, G. Blower communicated to us a direct (independent) proof of Corollary 2.1, in the Euclidean setting (this is part of a work in preparation, on the Gaussian isoperimetric inequality and its links with transportation). His argument involves neither logarithmic Sobolev inequalities nor partial differential equations. One drawback of this approach is that perturbation lemmas for Talagrand inequalities seem much more delicate to obtain than perturbation lemmas for logarithmic Sobolev inequalities (due to the nonlocal nature of the Wasserstein distance). In section 5, we briefly reinterpret Blower's proof within our framework.

3. Heuristics

In this section, we shall explain how the inequalities T, LSI, HWI and the Bakry–Émery condition have a simple and appealing interpretation in terms of a formal Riemannian setting. This formalism, which is somewhat reminiscent of Arnold's [2] geometric viewpoint of fluid mechanics, was developed by the first author in [26]. It gives precious methodological help in various situations. We wish to make it clear that we consider it only as a formal tool, and that the arguments in this section will not be used anywhere else in the paper. Hence, the reader interested only in the proofs (and not in the ideas) may skip this section.


Let $P$ denote the set of all probability measures on $M$. In the following heuristics, we shall do as if all probability measures encountered had smooth, positive and rapidly decaying density functions w.r.t. the volume element $dx$ on $M$.

We shall formally turn $P$ into a Riemannian manifold. Let us fix $\mu \in P$. Given a curve $(-\varepsilon, \varepsilon) \ni t \mapsto \mu_t \in P$ with $\mu_0 = \mu$, there exists a unique (up to additive constants) solution $\Phi$ on $M$ of the elliptic equation

$$ -\nabla \cdot (\mu \nabla \Phi) = \left. \frac{\partial \mu_t}{\partial t} \right|_{t=0}, \tag{17} $$

which we interpret in the sense of

$$ \int_M \nabla \zeta \cdot \nabla \Phi \, d\mu = \left. \frac{d}{dt} \right|_{t=0} \int_M \zeta \, d\mu_t $$

for all smooth functions $\zeta$ on $M$ with compact support.

On the other hand, for a given function $\Phi$ there exists a curve $(-\varepsilon, \varepsilon) \ni t \mapsto \mu_t \in P$ with $\mu_0 = \mu$ satisfying (17): just take the push forward of $\mu$ under the flow generated by the gradient field $\nabla \Phi$. Hence we have identified the tangent space $T_\mu P$ of $P$ at $\mu$ with the space of functions $\Phi$ on $M$ (modulo additive constants). We endow $T_\mu P$ with the scalar product

$$ \langle \Phi, \tilde\Phi \rangle = \int_M \nabla \Phi \cdot \nabla \tilde\Phi \, d\mu $$

and write

$$ \|\Phi\| = \sqrt{\langle \Phi, \Phi \rangle}. $$

This endows $P$ with a metric tensor and thus formally with a Riemannian structure.

Let us now give a heuristic argument why the geodesic distance dist induced by the above metric tensor is indeed the Wasserstein distance $W$. Different arguments were given in [26, section 4.3], and (implicitly) in Benamou and Brenier [4, Theorem 1.2]. Let $[0, 1] \ni t \mapsto \mu_t$ be a curve on $P$. According to the above definitions, its tangent vector field $[0, 1] \ni t \mapsto \Phi_t$ is given by the identity (holding in the weak sense)

$$ \frac{d}{dt} \int \zeta \, d\mu_t = \int \nabla \zeta \cdot \nabla \Phi_t \, d\mu_t $$

and its action $\frac12 \int_0^1 \|\Phi_t\|^2 \, dt$ by

$$ \int_0^1 \tfrac12 \|\Phi_t\|^2 \, dt = \int_0^1 \int \tfrac12 |\nabla \Phi_t|^2 \, d\mu_t \, dt. $$


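For any other vector field $[0, 1] \ni t \mapsto \tilde\Phi_t$ along the curve, we have

$$ \begin{aligned} \frac{d}{dt} \int \tilde\Phi_t \, d\mu_t &= \int \partial_t \tilde\Phi_t \, d\mu_t + \int \nabla \tilde\Phi_t \cdot \nabla \Phi_t \, d\mu_t \\ &= \int \left( \partial_t \tilde\Phi_t + \tfrac12 |\nabla \tilde\Phi_t|^2 \right) d\mu_t + \int \tfrac12 |\nabla \Phi_t|^2 \, d\mu_t - \int \tfrac12 |\nabla \tilde\Phi_t - \nabla \Phi_t|^2 \, d\mu_t. \end{aligned} $$

Integrating in $t$ and rearranging terms,

$$ \begin{aligned} \int_0^1 dt \int \tfrac12 |\nabla \Phi_t|^2 \, d\mu_t = {} & -\int_0^1 dt \int \left( \partial_t \tilde\Phi_t + \tfrac12 |\nabla \tilde\Phi_t|^2 \right) d\mu_t + \int \tilde\Phi_1 \, d\mu_1 - \int \tilde\Phi_0 \, d\mu_0 \\ & + \int_0^1 \int \tfrac12 |\nabla \tilde\Phi_t - \nabla \Phi_t|^2 \, d\mu_t \, dt. \end{aligned} $$

Hence

$$ \int_0^1 \tfrac12 \|\Phi_t\|^2 \, dt = \sup_{\tilde\Phi_t} \left\{ -\int_0^1 \int \left( \partial_t \tilde\Phi_t + \tfrac12 |\nabla \tilde\Phi_t|^2 \right) d\mu_t \, dt + \int \tilde\Phi_1 \, d\mu_1 - \int \tilde\Phi_0 \, d\mu_0 \right\}. $$

By definition of the induced geodesic distance,

$$ \begin{aligned} \tfrac12 \mathrm{dist}(\mu_0, \mu_1)^2 &= \inf_{\mu_t} \sup_{\tilde\Phi_t} \left( -\int_0^1 dt \int \left( \partial_t \tilde\Phi_t + \tfrac12 |\nabla \tilde\Phi_t|^2 \right) d\mu_t + \int \tilde\Phi_1 \, d\mu_1 - \int \tilde\Phi_0 \, d\mu_0 \right) \\ &= \sup_{\tilde\Phi_t} \inf_{\mu_t} \left( -\int_0^1 dt \int \left( \partial_t \tilde\Phi_t + \tfrac12 |\nabla \tilde\Phi_t|^2 \right) d\mu_t + \int \tilde\Phi_1 \, d\mu_1 - \int \tilde\Phi_0 \, d\mu_0 \right) \\ &= \sup \left\{ \int \Phi_1 \, d\mu_1 - \int \Phi_0 \, d\mu_0\,;\ \partial_t \Phi_t + \tfrac12 |\nabla \Phi_t|^2 \le 0 \right\}, \end{aligned} $$

where we have used the minimax principle (and, in the last line, dropped the tilde on the variable of the supremum). By standard considerations, the supremum in the last formula is obtained when $\Phi_t$ is the viscosity solution of the Hamilton-Jacobi equation

$$ \frac{\partial \Phi_t}{\partial t} + \tfrac12 |\nabla \Phi_t|^2 = 0, $$

and this solution is given by the (generalized) Hopf-Lax formula

$$ \Phi_t(x) = \inf_{x_0 \in M} \left( \Phi_0(x_0) + \frac{1}{2t} d(x, x_0)^2 \right). $$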
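The Hopf-Lax formula lends itself to a direct numerical illustration. The sketch below assumes the Euclidean line $M = \mathbb{R}$, so that $d(x, x_0) = |x - x_0|$, and replaces the infimum over $M$ by a minimum over a finite grid of candidate points $x_0$; both simplifications, and the name `hopf_lax`, are ours.

```python
def hopf_lax(phi0, grid, x, t):
    """Approximate Phi_t(x) = inf_{x0} [ Phi_0(x0) + |x - x0|^2 / (2 t) ] on R.

    phi0 : callable, the initial datum Phi_0
    grid : iterable of candidate points x0 (a finite stand-in for M = R)
    x, t : evaluation point and time, with t > 0
    """
    if t <= 0.0:
        raise ValueError("the Hopf-Lax formula is used here for t > 0 only")
    return min(phi0(x0) + (x - x0) ** 2 / (2.0 * t) for x0 in grid)


# Usage: for Phi_0(x) = x^2 / 2 the exact solution is Phi_t(x) = x^2 / (2 (1 + t)).
grid = [i / 100.0 for i in range(-500, 501)]
value = hopf_lax(lambda y: 0.5 * y * y, grid, 1.0, 1.0)
```

With this quadratic initial datum the minimizer is $x_0 = x/(1+t)$, so at $x = 1$, $t = 1$ the exact value is $1/4$, and the grid above contains the minimizer $x_0 = 1/2$.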


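Thus the last line of the preceding string of identities is identical to

$$ \sup_{\Phi_0, \Phi_1} \left\{ \int \Phi_1 \, d\mu_1 - \int \Phi_0 \, d\mu_0\,;\ \Phi_1(x_1) - \Phi_0(x_0) \le \tfrac12 d(x_0, x_1)^2 \right\}. $$

The Kantorovich duality principle asserts that this expression coincides with $\tfrac12 W(\mu_0, \mu_1)^2$. Indeed,

$$ \begin{aligned} & \sup \left\{ \int \Phi_1 \, d\mu_1 - \int \Phi_0 \, d\mu_0\,;\ \Phi_1(x_1) - \Phi_0(x_0) \le \tfrac12 d(x_0, x_1)^2 \right\} \\ &= \sup_{(\Phi_0, \Phi_1)} \inf_{\pi} \left( \int \left( \tfrac12 d(x_0, x_1)^2 + \Phi_0(x_0) - \Phi_1(x_1) \right) d\pi(x_0, x_1) + \int \Phi_1 \, d\mu_1 - \int \Phi_0 \, d\mu_0 \right) \\ &= \inf_{\pi} \sup_{(\Phi_0, \Phi_1)} \left( \int \tfrac12 d(x_0, x_1)^2 \, d\pi(x_0, x_1) + \int \Phi_1 \, d\mu_1 - \int \Phi_0 \, d\mu_0 - \int \bigl( \Phi_1(x_1) - \Phi_0(x_0) \bigr) \, d\pi(x_0, x_1) \right) \\ &= \inf \left\{ \int \tfrac12 d(x_0, x_1)^2 \, d\pi(x_0, x_1)\,;\ \pi \ge 0 \text{ with marginals } \mu_0, \mu_1 \right\} \\ &= \tfrac12 W(\mu_0, \mu_1)^2. \end{aligned} $$

This establishes that $W$ coincides with the induced geodesic distance on $P$.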

The Riemannian structure allows one to define the gradient $\mathrm{grad}\,E|_\mu$ and the Hessian $\mathrm{Hess}\,E|_\mu$ of a functional $E$ at a point $\mu$. Let us consider the entropy w.r.t. a reference measure $\nu$, that is

$$ E(\mu) = \int \log \frac{d\mu}{d\nu} \, d\mu. $$

We will now argue that

$$ \langle \mathrm{grad}\,E|_\mu, \Phi \rangle = \int \nabla \log \frac{d\mu}{d\nu} \cdot \nabla \Phi \, d\mu, $$

$$ \langle \mathrm{Hess}\,E|_\mu \Phi, \Phi \rangle = \int \left[ \mathrm{tr}\bigl( (D^2\Phi)^T D^2\Phi \bigr) + \nabla \Phi \cdot (\mathrm{Ric} + D^2\Psi) \nabla \Phi \right] d\mu. $$

We mention that a somewhat different heuristic justification can be found in [26, section 4.4]. We observe that if $t \mapsto \mu_t$ is a geodesic on $P$ with tangent vector field $t \mapsto \Phi_t$, then

$$ \langle \mathrm{grad}\,E|_{\mu_t}, \Phi_t \rangle = \frac{d}{dt} E(\mu_t), $$

$$ \langle \mathrm{Hess}\,E|_{\mu_t} \Phi_t, \Phi_t \rangle = \frac{d^2}{dt^2} E(\mu_t). $$

As can be seen from the minimax computation of the geodesic distance above, the geodesic equation is given by

$$ \begin{aligned} \partial_t \mu_t + \nabla \cdot (\mu_t \nabla \Phi_t) &= 0, \\ \partial_t \Phi_t + \tfrac12 |\nabla \Phi_t|^2 &= 0, \end{aligned} \tag{18} $$

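where the first equation is to be read as

$$ \int \zeta \, \partial_t \frac{d\mu_t}{d\nu} \, d\nu = \frac{d}{dt} \int \zeta \, d\mu_t = \int \nabla \zeta \cdot \nabla \Phi_t \, d\mu_t \tag{19} $$

for all smooth functions $\zeta$.

In order to compute the derivatives of $E(\mu_t)$ with respect to $t$, we shall use the integration by parts formula

$$ \int \nabla \Phi_1 \cdot \nabla \Phi_2 \, d\nu = \int (-\Delta \Phi_1 + \nabla \Psi \cdot \nabla \Phi_1) \, \Phi_2 \, d\nu, \tag{20} $$

where $d\nu = e^{-\Psi} \, dx$, and $dx$ is the standard volume on $M$. Since

$$ E(\mu_t) = \int \frac{d\mu_t}{d\nu} \log \frac{d\mu_t}{d\nu} \, d\nu $$

we obtain for the first derivative

$$ \begin{aligned} \frac{d}{dt} E(\mu_t) &= \int \left( 1 + \log \frac{d\mu_t}{d\nu} \right) \partial_t \frac{d\mu_t}{d\nu} \, d\nu \\ &\overset{(19)}{=} \int \nabla \log \frac{d\mu_t}{d\nu} \cdot \nabla \Phi_t \, d\mu_t \\ &= \int \nabla \Phi_t \cdot \nabla \frac{d\mu_t}{d\nu} \, d\nu \\ &\overset{(20)}{=} \int (-\Delta \Phi_t + \nabla \Psi \cdot \nabla \Phi_t) \, \frac{d\mu_t}{d\nu} \, d\nu \\ &= -\int \Delta \Phi_t \, d\mu_t + \int \nabla \Psi \cdot \nabla \Phi_t \, d\mu_t. \end{aligned} $$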

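As for the second derivative,

$$ \begin{aligned} \frac{d^2}{dt^2} E(\mu_t) \overset{(19)}{=} {} & \int \left( -\nabla \Phi_t \cdot \nabla \Delta \Phi_t + \Delta \tfrac12 |\nabla \Phi_t|^2 \right) d\mu_t \\ & + \int \left( \nabla \Phi_t \cdot \nabla (\nabla \Psi \cdot \nabla \Phi_t) - \nabla \Psi \cdot \nabla \tfrac12 |\nabla \Phi_t|^2 \right) d\mu_t, \end{aligned} $$

which turns into the desired expression with the help of the Riemannian geometry formulas (the first is the Bochner formula)

$$ -\nabla \Phi \cdot \nabla \Delta \Phi + \Delta \tfrac12 |\nabla \Phi|^2 = \mathrm{tr}\bigl( (D^2\Phi)^T D^2\Phi \bigr) + \nabla \Phi \cdot \mathrm{Ric}\, \nabla \Phi, $$

$$ \nabla \Phi \cdot \nabla (\nabla \Psi \cdot \nabla \Phi) - \nabla \Psi \cdot \nabla \tfrac12 |\nabla \Phi|^2 = \nabla \Phi \cdot D^2\Psi \, \nabla \Phi. $$

We observe that

$$ E(\mu) = H(\mu|\nu) \quad \text{and} \quad \|\mathrm{grad}\,E|_\mu\|^2 = I(\mu|\nu), $$

and that, with these correspondences,

$$ \begin{aligned} \text{Talagrand inequality} &\iff \tfrac{\rho}{2} \, \mathrm{dist}(\mu, \nu)^2 \le E(\mu), \\ \text{logarithmic Sobolev inequality} &\iff \rho \, E(\mu) \le \tfrac12 \|\mathrm{grad}\,E|_\mu\|^2, \\ \text{Bakry–Émery condition} &\implies \mathrm{Hess}\,E|_\mu \ge K \, \mathrm{Id}. \end{aligned} $$

Hence, the following three results are the exact analogues of Theorems 1, 2 and 3.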

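Proposition 1'. Assume that for some $\rho > 0$,

$$ \forall \mu \in P, \quad E(\mu) \le \frac{1}{2\rho} \|\mathrm{grad}\,E|_\mu\|^2. \tag{21} $$

Then,

$$ \forall \mu \in P, \quad \mathrm{dist}(\mu, \nu) \le \sqrt{\frac{2 E(\mu)}{\rho}}. $$

Proposition 2'. Assume that for some $\rho > 0$,

$$ \forall \mu \in P, \quad \mathrm{Hess}\,E|_\mu \ge \rho \, \mathrm{Id}. \tag{22} $$

Then,

$$ \forall \mu \in P, \quad E(\mu) \le \frac{1}{2\rho} \|\mathrm{grad}\,E|_\mu\|^2. $$

Proposition 3'. Assume that for some $K \in \mathbb{R}$,

$$ \forall \mu \in P, \quad \mathrm{Hess}\,E|_\mu \ge K \, \mathrm{Id}. \tag{23} $$

Then,

$$ \forall \mu \in P, \quad E(\mu) \le \|\mathrm{grad}\,E|_\mu\| \, \mathrm{dist}(\mu, \nu) - \frac{K}{2} \mathrm{dist}(\mu, \nu)^2. $$

In the remainder of this section, we briefly sketch the proof of these abstract results, by performing all calculations as if we were on a smooth, finite-dimensional manifold $P$. Moreover, we assume that $\nu$ is characterized by

$$ \begin{aligned} & E(\mu) \ge 0 \text{ with equality if } \mu = \nu, \\ & \|\mathrm{grad}\,E|_\mu\| \to 0 \iff E(\mu) \to 0 \iff \mathrm{dist}(\mu, \nu) \to 0. \end{aligned} \tag{24} $$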


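These proofs will be the guiding lines of our rigorous arguments for Theorems 1 and 3, in the next sections.

Proof of Proposition 1'. Assume that (21) holds. Fix a $\mu \in P$. The key ingredient is the gradient flow of $E$ with initial datum $\mu$:

$$ \frac{d\mu_t}{dt} = -\mathrm{grad}\,E|_{\mu_t}, \qquad \mu_0 = \mu. \tag{25} $$

From equation (25) one easily checks that

$$ \frac{d}{dt} E(\mu_t) = \left\langle \mathrm{grad}\,E|_{\mu_t}, \frac{d\mu_t}{dt} \right\rangle = -\|\mathrm{grad}\,E|_{\mu_t}\|^2, $$

and by assumption this quantity is bounded from above by $-2\rho E(\mu_t)$. Thus, as $t \to +\infty$, $E(\mu_t)$ converges exponentially fast towards 0, and $\mu_t$ converges to $\nu$.

Let us consider the real-valued function

$$ \varphi(t) \equiv \mathrm{dist}(\mu, \mu_t) + \sqrt{\frac{2 E(\mu_t)}{\rho}}. $$

Of course, $\varphi(0) = \sqrt{2 E(\mu)/\rho}$, and $\varphi(t) \to \mathrm{dist}(\mu, \nu)$ as $t \to \infty$.

We claim that this function is nonincreasing, which will end the proof. To prove that $\varphi$ is nonincreasing, we show that its right upper derivative, $\frac{d^+}{dt} \varphi(t) = \limsup_{h \downarrow 0} \frac1h \bigl( \varphi(t + h) - \varphi(t) \bigr)$, is nonpositive. In doing so, we may assume $\mu_t \ne \nu$, otherwise $\varphi(t + h) = \varphi(t)$ for all $h > 0$, and the upper derivative vanishes.

By the triangle inequality,

$$ |\mathrm{dist}(\mu, \mu_t) - \mathrm{dist}(\mu, \mu_{t+h})| \le \mathrm{dist}(\mu_t, \mu_{t+h}), $$

so that the right upper derivative of $t \mapsto \mathrm{dist}(\mu, \mu_t)$ is bounded above by

$$ \limsup_{h \downarrow 0} \frac1h \, \mathrm{dist}(\mu_t, \mu_{t+h}) = \|\mathrm{grad}\,E|_{\mu_t}\|. \tag{26} $$

On the other hand, since $\mu_t \ne \nu$,

$$ \frac{d}{dt} \sqrt{\frac{2 E(\mu_t)}{\rho}} = -\frac{\|\mathrm{grad}\,E|_{\mu_t}\|^2}{\sqrt{2 \rho E(\mu_t)}}. $$

Applying inequality (21), we find

$$ \frac{d}{dt} \sqrt{\frac{2 E(\mu_t)}{\rho}} \le -\|\mathrm{grad}\,E|_{\mu_t}\|, \tag{27} $$

and we conclude by grouping together (26) and (27). □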
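The monotonicity of $\varphi$ can be observed on an explicit example. The sketch below assumes the reference measure $\nu = N(0, 1)$ on $\mathbb{R}$ (so $\rho = 1$) and a Gaussian initial datum $\mu = N(m, s^2)$. In that case the gradient flow of the entropy is the Ornstein-Uhlenbeck evolution, which keeps the measures Gaussian with mean $m e^{-t}$ and variance $1 + (s^2 - 1) e^{-2t}$. These closed forms are standard facts that we supply ourselves, and all names below are hypothetical.

```python
import math


def flow(m, s, t):
    """Mean and standard deviation of mu_t when mu_0 = N(m, s^2), nu = N(0, 1)."""
    mean = m * math.exp(-t)
    variance = 1.0 + (s * s - 1.0) * math.exp(-2.0 * t)
    return mean, math.sqrt(variance)


def entropy(m, s):
    """E(mu) = H(mu|nu) for mu = N(m, s^2), nu = N(0, 1)."""
    return max(0.5 * (m * m + s * s - 1.0) - math.log(s), 0.0)


def dist(m1, s1, m2, s2):
    """Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) on R."""
    return math.hypot(m1 - m2, s1 - s2)


def phi(m, s, t, rho=1.0):
    """phi(t) = dist(mu, mu_t) + sqrt(2 E(mu_t) / rho) from the proof above."""
    mt, st = flow(m, s, t)
    return dist(m, s, mt, st) + math.sqrt(2.0 * entropy(mt, st) / rho)


# Sample phi along the flow started at N(0.5, 4), for t in [0, 5].
values = [phi(0.5, 2.0, 0.05 * k) for k in range(101)]
```

The sampled values decrease from $\varphi(0) = \sqrt{2 E(\mu)}$ towards $\mathrm{dist}(\mu, \nu)$, which is exactly the mechanism by which the proof yields $\mathrm{dist}(\mu, \nu) \le \sqrt{2 E(\mu)/\rho}$.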


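Proof of Proposition 2'. Of course Proposition 2' is immediate from Proposition 3'. However, we present an independent proof, where the reader may recognize the well-known argument of Bakry and Émery. We fix $\mu \in P$ and introduce the gradient flow $\mu_t$ of $E$ starting from $\mu$, as in the proof of Proposition 1'. Recall that

$$ \frac{d}{dt} E(\mu_t) = -\|\mathrm{grad}\,E|_{\mu_t}\|^2, \tag{28} $$

and

$$ \begin{aligned} \frac{d}{dt} \|\mathrm{grad}\,E|_{\mu_t}\|^2 &= 2 \left\langle \mathrm{grad}\,E|_{\mu_t}, \mathrm{Hess}\,E|_{\mu_t} \frac{d\mu_t}{dt} \right\rangle \\ &= -2 \left\langle \mathrm{grad}\,E|_{\mu_t}, \mathrm{Hess}\,E|_{\mu_t} \, \mathrm{grad}\,E|_{\mu_t} \right\rangle \\ &\overset{(22)}{\le} -2 \rho \, \|\mathrm{grad}\,E|_{\mu_t}\|^2. \end{aligned} $$

This differential inequality implies that

$$ \|\mathrm{grad}\,E|_{\mu_t}\|^2 \le e^{-2 \rho t} \, \|\mathrm{grad}\,E|_\mu\|^2. \tag{29} $$

In particular, as $t \to \infty$, $\|\mathrm{grad}\,E|_{\mu_t}\| \to 0$, and by assumption (24), $\mu_t \to \nu$. Combining (29) with (28) and integrating in time, we obtain

$$ E(\mu) - E(\mu_t) \le \left( \int_0^t e^{-2 \rho \tau} \, d\tau \right) \|\mathrm{grad}\,E|_\mu\|^2 \le \frac{1}{2\rho} \|\mathrm{grad}\,E|_\mu\|^2. \tag{30} $$

Since $E(\mu_t) \to E(\nu)$ as $t \to \infty$, the result follows. □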

Proof of Proposition 3’. Now we assume that (23) holds true. We fix $\mu$. Let $\mu_t$, $0 \le t \le 1$, be a curve of least energy joining $\mu$ for $t = 0$ and $\nu$ for $t = 1$. Hence it is a geodesic parametrized by arc length,

\[ \Bigl\|\frac{d\mu_t}{dt}\Bigr\| = \mathrm{dist}(\mu, \nu). \tag{31} \]

Let us write the Taylor formula

\[ E(\nu) = E(\mu) + \frac{d}{dt}\Big|_{t=0} E(\mu_t) + \int_0^1 (1-t)\,\frac{d^2}{dt^2}E(\mu_t)\,dt. \]


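To prove the result, it is sufficient to note that $E(\nu) = 0$ by assumption, and

\[ \Bigl|\frac{d}{dt}\Big|_{t=0} E(\mu_t)\Bigr| = \Bigl|\Bigl\langle \mathrm{grad}\,E|_{\mu},\ \frac{d\mu_t}{dt}\Big|_{t=0} \Bigr\rangle\Bigr| \le \|\mathrm{grad}\,E|_{\mu}\|\,\Bigl\|\frac{d\mu_t}{dt}\Big|_{t=0}\Bigr\| \overset{(31)}{=} \|\mathrm{grad}\,E|_{\mu}\|\,\mathrm{dist}(\mu, \nu), \]

\begin{align*}
\int_0^1 (1-t)\,\frac{d^2}{dt^2}E(\mu_t)\,dt &= \int_0^1 (1-t)\,\Bigl\langle \frac{d\mu_t}{dt},\ \mathrm{Hess}\,E|_{\mu_t}\,\frac{d\mu_t}{dt} \Bigr\rangle\,dt \\
&\overset{(23)}{\ge} K \int_0^1 (1-t)\,\Bigl\|\frac{d\mu_t}{dt}\Bigr\|^2\,dt \overset{(31)}{=} \frac{K}{2}\,\mathrm{dist}(\mu, \nu)^2.
\end{align*}

□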
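Inserting the two estimates into the Taylor formula gives $E(\mu) \le \|\mathrm{grad}\,E|_{\mu}\|\,\mathrm{dist}(\mu,\nu) - \frac{K}{2}\,\mathrm{dist}(\mu,\nu)^2$. As a hedged illustration, the sketch below checks the one-dimensional analogue $E(x) \le |E'(x)|\,|x| - \frac{K}{2}x^2$ for a convex function minimized at the origin; the specific energy, the value of `K` and the helper names are assumptions, not taken from the paper.

```python
K = 1.0  # assumed lower bound on E''


def energy(x):
    # Hypothetical energy with E'' = 1 + 3 x^2 >= K and minimizer x = 0, E(0) = 0.
    return 0.5 * x * x + 0.25 * x ** 4


def grad_energy(x):
    return x + x ** 3


def interpolation_gap(x):
    """|E'(x)| |x| - (K / 2) x^2 - E(x), where |x| is the distance to the minimizer."""
    return abs(grad_energy(x)) * abs(x) - 0.5 * K * x * x - energy(x)


grid = [k / 10.0 for k in range(-50, 51)]
gaps = [interpolation_gap(x) for x in grid]
```

Here the gap equals $\tfrac{3}{4}x^4$, which reflects the fact that the Hessian bound is saturated only at the origin.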

4. Proof of Theorem 1

In this section, we fix two probability measures $\mu$ and $\nu$ on $M$, satisfying the assumptions of Theorem 1. By an approximation argument which is sketched in the appendix, we may assume without loss of generality that the density $h = d\mu/d\nu$ satisfies

\[ \begin{cases} h \text{ is bounded away from } 0 \text{ and } \infty \text{ on } M, \\ h \text{ is smooth and } \tfrac{1}{2}|\nabla h|^2 \text{ is bounded on } M. \end{cases} \tag{32} \]

The main tool in the proof of Theorem 1 is the introduction of the probability diffusion semigroup $(\mu_t)_{t \ge 0}$ defined by the PDE

\[ \partial_t \mu_t = \nabla \cdot \Bigl( \mu_t \nabla \log \frac{d\mu_t}{d\nu} \Bigr); \qquad \mu_0 = \mu. \tag{33} \]

Here, $\nabla\cdot$ stands for the adjoint operator of the gradient. The semigroup $(\mu_t)$ provides a natural interpolation between $\mu$ and $\nu$. In the language of section 3, it is the gradient flow of the entropy functional $H(\mu|\nu)$. In the Euclidean case, the equation for the density $f_t$ of $\mu_t$,

\[ \partial_t f_t = \nabla \cdot (\nabla f_t + f_t \nabla\Psi), \]

is known in kinetic theory as the Fokker-Planck equation. It will be more adapted to our purposes (and more intrinsic) to use the equation for the density $h_t = d\mu_t/d\nu$:

\[ \partial_t h_t = \Delta h_t - \nabla\Psi \cdot \nabla h_t; \qquad h_0 = h. \tag{34} \]


Let us now argue that we can construct a solution of suitable regularity of the evolution problem (34). First note that, due to the maximum principle, we can find a priori estimates from above and below for $h_t$:

\[ \sup_M h_t \le \sup_M h, \qquad \inf_M h_t \ge \inf_M h. \tag{35} \]

Next, we give a bound on the gradient of $h_t$ by using Bernstein’s method. Let $g_t = |\nabla h_t|^2/2$; one checks that $g_t$ satisfies

\begin{align*}
\partial_t g_t - \Delta g_t + \nabla\Psi \cdot \nabla g_t &= -\mathrm{tr}\bigl((D^2 h_t)^T D^2 h_t\bigr) - \nabla h_t \cdot (\mathrm{Ric} + D^2\Psi)\nabla h_t \\
&\le 2\,C\,g_t.
\end{align*}

Here we have used the Riemannian geometry formulas

\begin{align*}
-\Delta g_t &= -\mathrm{tr}\bigl((D^2 h_t)^T D^2 h_t\bigr) - \nabla h_t \cdot \mathrm{Ric}\,\nabla h_t - \nabla h_t \cdot \nabla \Delta h_t, \\
\nabla\Psi \cdot \nabla g_t &= -\nabla h_t \cdot D^2\Psi\,\nabla h_t + \nabla h_t \cdot \nabla(\nabla\Psi \cdot \nabla h_t).
\end{align*}

Since by assumption $\mathrm{Ric} + D^2\Psi$ is bounded below by $-C$, we get by the maximum principle the a priori estimate

\[ \sup_M \tfrac{1}{2}|\nabla h_t|^2 \le e^{2Ct}\,\sup_M \tfrac{1}{2}|\nabla h|^2. \tag{36} \]

Combining our assumption (32) on the initial data, and standard theory for the heat equation, the two a priori estimates (35) and (36) are sufficient to guarantee the existence of a global solution to (34). Then, it suffices to set

\[ \mu_t = h_t\,\nu \]

to obtain a solution of (33) (in the weak sense).

We now begin to study several links between the semigroup $(\mu_t)$, the entropy, the Fisher information, and the Wasserstein distance. The reader can check that the next two lemmas are formally direct consequences of the abstract considerations developed in section 3.

Lemma 1.

\[ \frac{d}{dt}H(\mu_t|\nu) = -I(\mu_t|\nu). \tag{37} \]

This lemma is well-known: see in particular [3, 1]. We give a short proof for completeness.

Proof of Lemma 1. We observe that

\[ H(\mu_t|\nu) = \int h_t \log h_t \,d\nu. \]


Since $h_t$ is a smooth solution of (34) which is bounded away from 0,

\[ \partial_t(h_t \log h_t) - \Delta(h_t \log h_t) + \nabla\Psi \cdot \nabla(h_t \log h_t) = -\frac{1}{h_t}|\nabla h_t|^2. \]

Therefore, for any smooth function $\eta$ with compact support,

\[ \frac{d}{dt}\int (h_t \log h_t)\,\eta \,d\nu = -\int \nabla(h_t \log h_t) \cdot \nabla\eta \,d\nu - \int \frac{1}{h_t}|\nabla h_t|^2\,\eta \,d\nu. \]

Now, select a sequence $\eta_n$ of smooth functions with compact support, such that

$\eta_n$ is uniformly bounded and converges pointwise to 1,

$\nabla\eta_n$ is uniformly bounded and converges pointwise to 0.

Since $\tfrac{1}{2}|\nabla h_t|^2$ is bounded on $M$, locally uniformly for $t \in [0,\infty)$, we can pass to the limit, and get

\[ \frac{d}{dt}\int h_t \log h_t \,d\nu = -\int \frac{1}{h_t}|\nabla h_t|^2 \,d\nu = -I(\mu_t|\nu). \]

This proves (37). □

The next lemma is the real core of the proof.

Lemma 2.

\[ \frac{d}{dt}^{+} W(\mu, \mu_t) \le \sqrt{I(\mu_t|\nu)}. \tag{38} \]

(Here $(d/dt)^+$ stands for the upper right derivative.)

Proof. First of all, thanks to the triangle inequality for the Wasserstein metric (see [27] for instance),

\[ |W(\mu, \mu_{t+s}) - W(\mu, \mu_t)| \le W(\mu_t, \mu_{t+s}). \]

Thus we only need to show that for fixed $t \in [0,\infty)$,

\[ \limsup_{s \downarrow 0} \frac{1}{s}\,W(\mu_t, \mu_{t+s}) \le \sqrt{I(\mu_t|\nu)}. \tag{39} \]

Remark. Actually, there should be equality in formula (39), but (in a general Riemannian context) this requires some more work, and will be proved elsewhere. See section 7 for related statements.

The idea to prove (39) is to rewrite the nonlinear equation (33) as a linear transport equation, i.e. an equation of the form

\[ \partial_t \mu_t + \nabla \cdot (\mu_t \xi_t) = 0, \tag{40} \]

and then solve it (a posteriori) by using the well-known method of characteristics from the theory of linear transport equations.


To this end, we naturally introduce the vector field

\[ \xi_t = -\nabla \log h_t. \]

By our a priori estimates (35) and (36), this vector field is smooth and bounded on all of $M$, locally uniformly in $t \in [0,\infty)$. Hence there exists a smooth curve of diffeomorphisms (the characteristic trajectories) $[0,\infty) \ni s \mapsto \phi_s$ with

\[ \partial_s \phi_s = \xi_{t+s} \circ \phi_s \quad \text{and} \quad \phi_0 = \mathrm{id}. \]

Let us now prove that $\mu_{t+s}$ is the push-forward $\phi_s\#\mu_t$ of $\mu_t$ under $\phi_s$. This means

\[ \int \zeta \,d\mu_{t+s} = \int \zeta \circ \phi_s \,d\mu_t \quad \text{for all bounded and continuous } \zeta, \tag{41} \]

or equivalently

\[ \int \zeta \circ \phi_s^{-1} \,d\mu_{t+s} = \int \zeta \,d\mu_t. \tag{42} \]

It is obviously enough to prove (42) for an arbitrary smooth $\zeta$ with compact support. For such a $\zeta$, define $(\zeta_s)_{s \ge 0}$ by $\zeta_s = \zeta \circ \phi_s^{-1}$. Since $\zeta_s \circ \phi_s = \zeta$, it is readily checked that $(\zeta_s)$ solves (in the strong sense) the adjoint transport equation

\[ \partial_s \zeta_s + \xi_{t+s} \cdot \nabla\zeta_s = 0. \]

Since on the other hand

\[ \partial_s \mu_{t+s} + \nabla \cdot (\mu_{t+s}\,\xi_{t+s}) = 0, \]

this implies

\[ \frac{d}{ds}\int \zeta_s \,d\mu_{t+s} = 0, \]

and the identity (42) follows.

Now, define the measure $\pi_s$ on $M \times M$ by

\[ d\pi_s(x, y) = d\mu_t(x)\,\delta[y = \phi_s(x)]. \]

In other words,

\[ \int \zeta(x, y)\,d\pi_s(x, y) = \int \zeta(x, \phi_s(x))\,d\mu_t(x) \]

for all continuous and bounded $\zeta$.

According to (41), $\pi_s$ has marginals $\mu_t$ and $\mu_{t+s}$. Thus, by definition of the Wasserstein distance,

\[ W(\mu_t, \mu_{t+s}) \le \sqrt{\int d(x, \phi_s(x))^2 \,d\mu_t(x)} \]


or

\[ \frac{1}{s}\,W(\mu_t, \mu_{t+s}) \le \sqrt{\int \frac{d(x, \phi_s(x))^2}{s^2}\,d\mu_t(x)}. \]

Since $\xi_{t+s}$ is bounded on $M$ uniformly in $s \in [0, 1]$, also $d(x, \phi_s(x))/s$ is bounded in $x \in M$ and in $s \in [0, 1]$, and converges to $|\xi_t(x)|$ for $s \downarrow 0$. Therefore we obtain by dominated convergence

\[ \lim_{s \downarrow 0} \int \frac{d(x, \phi_s(x))^2}{s^2}\,d\mu_t(x) = \int |\xi_t|^2 \,d\mu_t = I(\mu_t|\nu). \]

This establishes (39) and achieves the proof of (38). □
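Both lemmas can be checked by hand in a case with closed-form solutions. The sketch below assumes $M = \mathbb{R}$, $\Psi(x) = x^2/2$ (so that $\nu$ is the standard Gaussian) and $\mu = N(m_0, 1)$; under (34) the measure $\mu_t$ is then $N(m_0 e^{-t}, 1)$, with $H(\mu_t|\nu) = m_t^2/2$, $I(\mu_t|\nu) = m_t^2$ and $W(\mu, \mu_t) = |m_0 - m_t|$. The value `M0`, the helper names and the finite-difference step are illustrative assumptions.

```python
import math

M0 = 1.5  # hypothetical initial mean; nu = N(0, 1), mu = N(M0, 1)


def mean(t):
    # Mean of mu_t under the Ornstein-Uhlenbeck flow (the variance stays 1).
    return M0 * math.exp(-t)


def entropy(t):
    # H(mu_t | nu) for N(m, 1) against N(0, 1).
    return 0.5 * mean(t) ** 2


def fisher(t):
    # I(mu_t | nu): integral of |grad log h_t|^2 d mu_t, with grad log h_t = m_t.
    return mean(t) ** 2


def wasserstein(t):
    # W(mu, mu_t) between N(M0, 1) and N(mean(t), 1).
    return abs(M0 - mean(t))


def ddt(fn, t, eps=1e-5):
    # Central finite difference in time.
    return (fn(t + eps) - fn(t - eps)) / (2.0 * eps)


times = [0.1 * k for k in range(1, 30)]
lemma1_error = max(abs(ddt(entropy, t) + fisher(t)) for t in times)
lemma2_gap = max(ddt(wasserstein, t) - math.sqrt(fisher(t)) for t in times)
```

In this example (38) holds with equality up to discretization error, which is consistent with the remark that equality is expected in (39).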

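The next lemma makes precise the sense in which $\mu_t$ converges towards $\nu$ as $t \to \infty$.

Lemma 3.

\begin{align*}
H(\mu_t|\nu) &\xrightarrow{\ t \uparrow \infty\ } 0, \tag{43} \\
W(\mu, \mu_t) &\xrightarrow{\ t \uparrow \infty\ } W(\mu, \nu). \tag{44}
\end{align*}

Proof. The proof of the first part of Lemma 3 has become standard. According to the assumption of Theorem 1,

\[ I(\mu_t|\nu) \ge 2\rho\,H(\mu_t|\nu), \]

so that together with (37)

\[ \frac{d}{dt}H(\mu_t|\nu) \le -2\rho\,H(\mu_t|\nu), \]

which implies (43).

Next, we prove (44). In view of the triangle inequality

\[ |W(\mu, \mu_t) - W(\mu, \nu)| \le W(\mu_t, \nu), \]

it is enough to show

\[ W(\mu_t, \nu) \xrightarrow{\ t \uparrow \infty\ } 0. \tag{45} \]

Using the well-known and elementary Csiszár-Kullback-Pinsker inequality [11, 17]

\[ \int |h_t - 1| \,d\nu \le \sqrt{2\int h_t \log h_t \,d\nu} = \sqrt{2\,H(\mu_t|\nu)}, \]

we obtain from (43) that

\[ \int |h_t - 1| \,d\nu \xrightarrow{\ t \uparrow \infty\ } 0. \]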
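For readers who want to see the Csiszár-Kullback-Pinsker inequality in action, here is a small numerical check on a finite state space, where $\int |h - 1| \,d\nu$ becomes $\sum_i |\mu_i - \nu_i|$ and $H(\mu|\nu)$ becomes the usual relative entropy. The two distributions are arbitrary illustrative choices, not data from the paper.

```python
import math


def relative_entropy(mu, nu):
    """H(mu | nu) = sum mu_i log(mu_i / nu_i) for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(mu, nu) if p > 0)


def l1_distance(mu, nu):
    """Discrete counterpart of the integral of |h - 1| d nu."""
    return sum(abs(p - q) for p, q in zip(mu, nu))


nu = [0.25, 0.25, 0.25, 0.25]  # hypothetical reference measure
mu = [0.4, 0.3, 0.2, 0.1]      # hypothetical perturbed measure

lhs = l1_distance(mu, nu)
rhs = math.sqrt(2.0 * relative_entropy(mu, nu))
```

With these numbers `lhs` is 0.4 up to rounding and `rhs` is slightly larger, so the bound is fairly tight here.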


On the other hand, $h_t$ is bounded on $M$, uniformly for $t \uparrow \infty$. Therefore, if $\zeta$ is a continuous function with at most quadratic growth at infinity, that is,

\[ |\zeta(x)| \le C\,(d(x_0, x)^2 + 1), \]

we have by dominated convergence

\[ \Bigl|\int \zeta \,d\mu_t - \int \zeta \,d\nu\Bigr| \le C \int |h_t - 1|\,(d(x_0, \cdot)^2 + 1)\,d\nu \xrightarrow{\ t \uparrow \infty\ } 0. \]

According to the continuity of the Wasserstein metric under weak convergence [27, Theorem 1.4.1], this implies (45). We note that due to the quadratic growth of the cost function, it would not have been sufficient to check only weak convergence in the sense of measures. □

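We can now complete the proof of Theorem 1, by the very same argument as in Proposition 1′. First, by Lemma 2,

\[ \frac{d}{dt}^{+} W(\mu, \mu_t) \le \sqrt{I(\mu_t|\nu)}. \]

But, since by assumption $\nu$ satisfies LSI($\rho$),

\[ \sqrt{I(\mu_t|\nu)} \le \frac{I(\mu_t|\nu)}{\sqrt{2\rho\,H(\mu_t|\nu)}} \]

(if $\mu_t \neq \nu$). Then, applying Lemma 1,

\[ \frac{I(\mu_t|\nu)}{\sqrt{2\rho\,H(\mu_t|\nu)}} = -\frac{d}{dt}\sqrt{\frac{2H(\mu_t|\nu)}{\rho}}. \]

Thus,

\[ \frac{d}{dt}^{+}\varphi(t) \equiv \frac{d}{dt}^{+}\Bigl[ W(\mu, \mu_t) + \sqrt{\frac{2H(\mu_t|\nu)}{\rho}} \Bigr] \le 0 \]

(and this relation is also clearly true if $\mu_t = \nu$, since this will imply $\mu_{t+s} = \nu$ for all $s > 0$).

Therefore,

\[ \lim_{t \to +\infty} \varphi(t) \le \varphi(0) = \sqrt{\frac{2H(\mu|\nu)}{\rho}}. \]

But by Lemma 3,

\[ \varphi(t) \xrightarrow{\ t \to +\infty\ } W(\mu, \nu). \]

This concludes the proof of inequality (11), and of Theorem 1.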
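The resulting inequality $W(\mu,\nu) \le \sqrt{2H(\mu|\nu)/\rho}$ can be illustrated with one-dimensional Gaussians, for which both sides have closed forms. The sketch assumes $\nu = N(0,1)$, which satisfies LSI(1), and $\mu = N(m, s^2)$, so that $H(\mu|\nu) = \tfrac{1}{2}(s^2 + m^2 - 1) - \log s$ and $W(\mu,\nu)^2 = m^2 + (s-1)^2$. The parameter grid and function names are arbitrary illustrative choices.

```python
import math

RHO = 1.0  # logarithmic Sobolev constant of the standard Gaussian


def entropy(m, s):
    """H(N(m, s^2) | N(0, 1))."""
    return 0.5 * (s * s + m * m - 1.0) - math.log(s)


def wasserstein(m, s):
    """Quadratic Wasserstein distance between N(m, s^2) and N(0, 1)."""
    return math.sqrt(m * m + (s - 1.0) ** 2)


cases = [(m / 4.0, s / 4.0) for m in range(-8, 9) for s in range(1, 13)]
slack = [math.sqrt(2.0 * entropy(m, s) / RHO) - wasserstein(m, s)
         for m, s in cases]
```

The slack vanishes exactly when $s = 1$, that is, when $\mu$ is a translate of $\nu$; in that case the inequality reduces to $\log s \le s - 1$ with equality.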


5. Proof of Theorem 3

We first mention that this theorem is proven in a slightly different context in [26].

In this section, we fix $\mu_0 = \mu$, $\mu_1 = e^{-\Psi}\,dx$, and denote by $f_0$, $f_1$ their respective densities w.r.t. Lebesgue measure. According to the results of Brenier [6], extended in McCann [23], there exists a ($d\mu_0$-a.e.) unique gradient of a convex function, $\nabla\varphi(x) = x + \nabla\Phi(x)$, such that

\[ \nabla\varphi\#\mu_0 = \mu_1. \]

In terms of the densities,

\[ f_0(x) = f_1(x + \nabla\Phi(x))\,\det D(x + \nabla\Phi(x)). \]

We mention that this characterization of the “optimal transference plan” has been recently extended by McCann [25] to the case of a smooth manifold (see also [12] for the torus), with a suitable generalization of the class of gradients of convex functions.

We immediately note that

\[ W(\mu_0, \mu_1) = \sqrt{\int |\nabla\Phi|^2 \,d\mu_0}. \]

We shall prove

\[ H(\mu_0|\mu_1) \le -\int \nabla \log\frac{d\mu_0}{d\mu_1} \cdot \nabla\Phi \,d\mu_0 - \frac{K}{2}\int |\nabla\Phi|^2 \,d\mu_0, \tag{46} \]

and by the Cauchy-Schwarz inequality, it will follow that

\begin{align*}
H(\mu_0|\mu_1) &\le \sqrt{\int \Bigl|\nabla \log\frac{d\mu_0}{d\mu_1}\Bigr|^2 d\mu_0}\ \sqrt{\int |\nabla\Phi|^2 \,d\mu_0} - \frac{K}{2}\int |\nabla\Phi|^2 \,d\mu_0 \\
&= \sqrt{I(\mu_0|\mu_1)}\ W(\mu_0, \mu_1) - \frac{K}{2}\,W(\mu_0, \mu_1)^2.
\end{align*}

To prove (46), we introduce a convenient interpolation flow between $\mu_0$ and $\mu_1$. For the heuristic reasons exposed in section 3, the natural interpolation is given by the following construction, due to McCann [23] and Benamou & Brenier [4]. Define a family of measures $(\mu_t)_{0 \le t \le 1}$, interpolating between $\mu_0$ and $\mu_1$, by

\[ \mu_t = (\mathrm{id} + t\nabla\Phi)\#\mu_0. \]

This interpolation is natural if one thinks of the transport problem as a problem of transporting units of mass (or particles) at least cost: it corresponds to a transport of particles along straight lines, with


constant velocity, from their initial location to their final destination. At the level of the densities $f_t$ of the measures $\mu_t$,

\[ f_0(x) = f_t(x + t\nabla\Phi(x))\,\det D(x + t\nabla\Phi(x)). \tag{47} \]

Note that $x + t\nabla\Phi(x)$ is the gradient of a strictly convex function for $t < 1$, so that this formula indeed defines $f_t$. Also this implies, by the Knott-Smith-Brenier characterization theorem,

\[ W(\mu_0, \mu_t) = t\,\sqrt{\int |\nabla\Phi|^2 \,d\mu_0} = t\,W(\mu_0, \mu_1), \qquad 0 \le t \le 1. \]
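The identity $W(\mu_0, \mu_t) = t\,W(\mu_0, \mu_1)$ is easy to verify between one-dimensional Gaussians, where the optimal map is affine. The sketch below assumes $\mu_0 = N(m_0, s_0^2)$ and $\mu_1 = N(m_1, s_1^2)$, for which $\mu_t = N((1-t)m_0 + t m_1, ((1-t)s_0 + t s_1)^2)$; the numerical values and helper names are hypothetical.

```python
import math


def interpolate(m0, s0, m1, s1, t):
    """Mean and standard deviation of mu_t = ((1 - t) id + t T)#mu_0, where
    T(x) = m1 + (s1 / s0) (x - m0) is the optimal map between the Gaussians."""
    return (1.0 - t) * m0 + t * m1, (1.0 - t) * s0 + t * s1


def w2(m0, s0, m1, s1):
    """Quadratic Wasserstein distance between N(m0, s0^2) and N(m1, s1^2)."""
    return math.hypot(m0 - m1, s0 - s1)


m0, s0, m1, s1 = 1.0, 2.0, 0.0, 1.0  # illustrative parameters
times = (0.0, 0.25, 0.5, 1.0)
total = w2(m0, s0, m1, s1)
ratios = [w2(m0, s0, *interpolate(m0, s0, m1, s1, t)) / total for t in times]
```

Each entry of `ratios` should match the corresponding time in `times`, reflecting that $(\mu_t)$ is a constant-speed geodesic for $W$.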

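Turning this formulation into an Eulerian formulation yields precisely the equations (18) for $(\mu_t, \Phi_t)$, with initial value $\Phi_0 = \Phi$. The field $\nabla\Phi_t$ is now the “velocity field” of the “particles” in an Eulerian description, and coincides with the actual velocities of particles only at time $t = 0$.

Formally, (46) is a very elementary consequence of the equations (18). Indeed, by simple calculations they imply (as in the example given in section 3)

\begin{align*}
\frac{d}{dt}H(\mu_t|\mu_1) &= \int \nabla \log\frac{d\mu_t}{d\mu_1} \cdot \nabla\Phi_t \,d\mu_t; \\
\frac{d}{dt}\int \nabla \log\frac{d\mu_t}{d\mu_1} \cdot \nabla\Phi_t \,d\mu_t &= \sum_{ij}\int (\partial_{ij}\Phi_t)^2 \,d\mu_t + \int \langle D^2\Psi \cdot \nabla\Phi_t, \nabla\Phi_t \rangle \,d\mu_t; \tag{48} \\
\frac{d}{dt}\int |\nabla\Phi_t|^2 \,d\mu_t &= 0.
\end{align*}

Then the third equation yields

\[ \int |\nabla\Phi_t|^2 \,d\mu_t = \int |\nabla\Phi_0|^2 \,d\mu_0 = W(\mu_0, \mu_1)^2. \]

Combining this with the second equation in (48), and using the assumption on $D^2\Psi$, we get

\begin{align*}
\int \nabla \log\frac{d\mu_t}{d\mu_1} \cdot \nabla\Phi_t \,d\mu_t &\ge \int \nabla \log\frac{d\mu_0}{d\mu_1} \cdot \nabla\Phi_0 \,d\mu_0 + K\int_0^t ds \int |\nabla\Phi_s|^2 \,d\mu_s \\
&= \int \nabla \log\frac{d\mu_0}{d\mu_1} \cdot \nabla\Phi_0 \,d\mu_0 + K t \int |\nabla\Phi_0|^2 \,d\mu_0.
\end{align*}

To obtain (46) it suffices to integrate this in $t$, from $t = 0$ to $t = 1$, and use the first equation in (48).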


But some comments have to be made on the rigor in dealing with solutions of (18). According to the results of Caffarelli [7, 8] and Urbas [31] on the Monge-Ampère equation, $\Phi$ in (47) is of class $C^{2,\alpha}(\Omega)$ if $f_0$ and $f_1$ are strictly positive smooth functions on $\Omega$, where $\Omega$ is a smooth bounded convex set of $\mathbb{R}^n$. It was pointed out to the second author by A. Swiech that it is possible to extend these results to the case when $f_0$ and $f_1$ are positive Hölder-continuous functions on the whole of $\mathbb{R}^n$. This makes it possible to avoid the unpleasant boundary terms that would appear in a truncation argument (with a boundary depending on $t$, since the support of $\mu_t$ may be strictly smaller than the support of $\mu_0$, $\mu_1$), and to render the argument above rigorous.

We do not expose here the generalization to $\mathbb{R}^n$ of the regularity results, but rather detail an alternative, direct argument relying only on formula (47). This one is less elementary but easier to justify. Its main advantage is that it does not invoke the difficult results of Caffarelli and Urbas, and thus may be generalized to the study of arbitrary smooth manifolds.

In the formulas below we shall abuse notation by identifying measures with their densities. Recall that $H(f|e^{-\Psi}) = \int f \log f + \int f\Psi$.

In the sequel we fix $\Psi$, but (by density) we replace $f_0$ and $f_1$ by smooth functions with common compact support (so that the relation $\Psi = -\log f_1$ is no longer true). We then construct $(f_t)_{0 \le t \le 1}$ as in (47). Since $f_0$ and $f_1$ are compactly supported, we know that on the support of $f_0$, $\nabla\Phi \in L^\infty$ (more precisely, $\nabla\Phi$ takes its values in $\mathrm{supp}(f_1) - \mathrm{supp}(f_0)$).

The results of McCann [24] assert that the change of variables given by $x \to x + t\nabla\Phi(x)$ is legitimate. Replacing $f_t(x + t\nabla\Phi(x))$ by $f_0(x)/\det(I_n + tD^2\Phi(x))$, this entails

\[ H(f_t|e^{-\Psi}) = \int f_0 \log f_0 - \int f_0 \log\det(I_n + tD^2\Phi) + \int f_0\,\Psi(x + t\nabla\Phi(x))\,dx. \tag{49} \]

Here $D^2\Phi$ is to be understood in the following sense. By a result of Aleksandrov, since $|x|^2/2 + \Phi$ is a convex function, $\Phi$ has a second derivative almost everywhere; this means that for a.a. $x \in \mathbb{R}^n$,

\[ \Phi(x + h) = \Phi(x) + \nabla\Phi(x) \cdot h + \tfrac{1}{2}\langle D^2\Phi(x) \cdot h, h \rangle + o(|h|^2). \]

We refer to [13] for a proof of this and all the related facts that we shall use about second derivatives in the sense of Aleksandrov.

The log-concavity of the determinant application ensures that the second term on the right of (49) is a convex function of $t$. Next, using


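again the fact that $\nabla\Phi$ is $L^\infty$, we see that the last term on the right is $C^2$, and its second derivative is

\[ \int f_0\,\langle D^2\Psi(x + t\nabla\Phi(x)) \cdot \nabla\Phi(x), \nabla\Phi(x) \rangle \ge K \int f_0\,|\nabla\Phi(x)|^2. \]

Since $\int f_0 |\nabla\Phi|^2 = W(f_0, f_1)^2$, this implies that

\[ \frac{d^2}{dt^2}H(f_t|e^{-\Psi}) \ge K\,W(f_0, f_1)^2. \tag{50} \]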
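Inequality (50) says that the relative entropy is $K$-convex along the displacement interpolation. A hedged numerical illustration: assume $\Psi(x) = x^2/2$ up to an additive constant (so $K = 1$ and $e^{-\Psi}$ is the standard Gaussian density), take $f_0 = N(1, 2^2)$ and $f_1 = N(0, 1)$, and estimate the second derivative of $t \mapsto H(f_t|e^{-\Psi})$ by a centered second difference. All parameter values and function names are assumptions for illustration.

```python
import math

K = 1.0  # D^2 Psi = 1 for the assumed potential Psi(x) = x^2 / 2 + const


def entropy(m, s):
    """H(N(m, s^2) | N(0, 1))."""
    return 0.5 * (s * s + m * m - 1.0) - math.log(s)


def entropy_along(t, m0=1.0, s0=2.0, m1=0.0, s1=1.0):
    # Displacement interpolation between 1D Gaussians stays Gaussian.
    return entropy((1.0 - t) * m0 + t * m1, (1.0 - t) * s0 + t * s1)


def second_difference(fn, t, eps=1e-3):
    return (fn(t + eps) - 2.0 * fn(t) + fn(t - eps)) / (eps * eps)


w_squared = (1.0 - 0.0) ** 2 + (2.0 - 1.0) ** 2  # W(f_0, f_1)^2
curvature = [second_difference(entropy_along, t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

In this example the exact second derivative is $W^2 + (s_1 - s_0)^2/s_t^2$; the extra term comes from the determinant part of (49), which the proof only uses through its convexity.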

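Next, we need to show that

\begin{align*}
\lim_{t \to 0^+} \frac{H(f_t|e^{-\Psi}) - H(f_0|e^{-\Psi})}{t} &\ge \int f_0\,\nabla \log\frac{f_0}{e^{-\Psi}} \cdot \nabla\Phi \tag{51} \\
&= \int \nabla f_0 \cdot \nabla\Phi + \int f_0\,\nabla\Psi \cdot \nabla\Phi.
\end{align*}

The term $\int f_0\,\nabla\Psi \cdot \nabla\Phi$ is clearly obtained by differentiating $\int f_0\,\Psi(x + t\nabla\Phi(x))$, and we only consider the more delicate term with the determinant.

By Aleksandrov’s theorem, as $t \to 0$, $-\frac{1}{t}\log\det(I_n + tD^2\Phi(x))$ converges a.e. to $-\Delta\Phi(x)$ (again, considered in the Aleksandrov sense). Moreover, since $-\log\det(I_n + tD^2\Phi(x))$ is a convex function of $t$, the convergence is monotone decreasing, and we can apply Lebesgue’s monotone convergence theorem to find, for the determinant term,

\[ \lim_{t \to 0^+} \Bigl( -\frac{1}{t}\int f_0 \log\det(I_n + tD^2\Phi) \Bigr) = -\int f_0\,\Delta\Phi. \]

But, as is well-known, $\Delta\Phi$ [Aleksandrov] $\le$ $\Delta\Phi$ [distributional] (in the sense of measures). This implies

\[ \lim_{t \to 0^+} \Bigl( -\frac{1}{t}\int f_0 \log\det(I_n + tD^2\Phi) \Bigr) \ge \int \nabla f_0 \cdot \nabla\Phi = \int f_0\,\nabla \log f_0 \cdot \nabla\Phi. \]

The combination of (50) and (51) entails (as previously seen)

\begin{align*}
H(f_1|e^{-\Psi}) &\ge H(f_0|e^{-\Psi}) + \int f_0\,\nabla \log\frac{f_0}{e^{-\Psi}} \cdot \nabla\Phi + \frac{K}{2}\int f_0\,|\nabla\Phi|^2 \\
&\ge H(f_0|e^{-\Psi}) - \sqrt{I(f_0|e^{-\Psi})}\ W(f_0, f_1) + \frac{K}{2}\,W(f_0, f_1)^2.
\end{align*}

To conclude, it suffices to let $f_1$ approach $e^{-\Psi}$.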
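The inequality just obtained, $H \le W\sqrt{I} - \frac{K}{2}W^2$, can be tested on Gaussians. The sketch below assumes $e^{-\Psi}$ is the standard Gaussian density on $\mathbb{R}$ (so $K = 1$) and $f_0 = N(m, s^2)$, for which $I(f_0|e^{-\Psi}) = m^2 + (s - 1/s)^2$; the grid of parameters and the function names are illustrative assumptions.

```python
import math

K = 1.0  # D^2 Psi = 1 for the assumed potential Psi(x) = x^2 / 2 + const


def entropy(m, s):
    """H(N(m, s^2) | N(0, 1))."""
    return 0.5 * (s * s + m * m - 1.0) - math.log(s)


def fisher(m, s):
    """I(N(m, s^2) | N(0, 1)): second moment of the gradient of the log density ratio."""
    return m * m + (s - 1.0 / s) ** 2


def wasserstein(m, s):
    return math.sqrt(m * m + (s - 1.0) ** 2)


def hwi_slack(m, s):
    """W sqrt(I) - (K / 2) W^2 - H, expected to be nonnegative."""
    w = wasserstein(m, s)
    return w * math.sqrt(fisher(m, s)) - 0.5 * K * w * w - entropy(m, s)


cases = [(m / 4.0, s / 4.0) for m in range(-8, 9) for s in range(1, 13)]
slacks = [hwi_slack(m, s) for m, s in cases]
```

The slack is zero for pure translates ($s = 1$), which matches the discussion of equality cases in the remarks that follow.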

Remarks.


(1) We proved the general inequality

\[ H(\mu_0|e^{-\Psi}) \le H(\mu_1|e^{-\Psi}) + W(\mu_0, \mu_1)\,\sqrt{I(\mu_0|e^{-\Psi})} - \frac{K}{2}\,W(\mu_0, \mu_1)^2, \]

valid for arbitrary $\mu_0$, $\mu_1$ as soon as $D^2\Psi \ge K I_n$. If in this last formula, instead of $d\mu_1 = e^{-\Psi}\,dx$, we choose $d\mu_0 = e^{-\Psi}\,dx$, we recover

\[ H(\mu_1|e^{-\Psi}) \ge \frac{K}{2}\,W(\mu_1, e^{-\Psi})^2. \]

Though this may not be immediately apparent because of our differential argument, this is actually the principle of Blower’s proof of our Corollary 2.1 in $\mathbb{R}^n$.

(2) What are the cases of equality for (13)? Recall the formal equality ($d\mu_1 = e^{-\Psi}\,dx$)

\begin{align*}
H(\mu_0|\mu_1) = {} & -\int \nabla \log\frac{d\mu_0}{d\mu_1} \cdot \nabla\Phi_0 \,d\mu_0 - \int_0^1 dt\,(1-t)\int \langle D^2\Psi \cdot \nabla\Phi_t, \nabla\Phi_t \rangle \,d\mu_t \tag{52} \\
& - \sum_{ij}\int_0^1 dt\,(1-t)\int (\partial_{ij}\Phi_t)^2 \,d\mu_t.
\end{align*}

If this formula could be rigorously proven for arbitrary probability distributions $\mu_0$ on $\mathbb{R}^n$, this would easily imply the cases of equality in Theorem 3. Indeed, on the support of $\mu_0$, $D^2\Phi_t$ must vanish, whence $\nabla\Phi_t = a = \nabla\Phi_0$ for some fixed vector $a$. Thus the density $f_0$ of $\mu_0$ has to be a translate of $e^{-\Psi}$, i.e.

\[ f_0(x) = e^{-\Psi(x + a)}. \]

Moreover, $\nabla \log(d\mu_0/d\mu_1) = \nabla\Psi(x + a) - \nabla\Psi(x)$ has to be collinear to $a$, for all $x$, so that

\[ \nabla\Psi(x + a) = \nabla\Psi(x) + \lambda a \]

for some real number $\lambda$. Also we should have

\[ \langle D^2\Psi(x) \cdot a, a \rangle = K|a|^2 \tag{53} \]

for all $x \in \mathbb{R}^n$, and some $K \in \mathbb{R}$ (obviously nonnegative if $a \neq 0$). Roughly speaking, this means that $\Psi$ has to be of the form $K|a|^2 + \psi(y_2(x), \ldots, y_n(x))$, where $(a, y_2(x), \ldots, y_n(x))$ is a system of Cartesian coordinates of $\mathbb{R}^n$. In short, equality should hold if and only if $f_0$ is a translate of $e^{-\Psi}$, in a direction in which the potential $\Psi$ is quadratic.

In fact, this is precisely the condition of equality for the logarithmic Sobolev inequality in the case when $D^2\Psi \ge K > 0$


(see [1] for precise statements). Since in this case the inequality (13) implies LSI($K$), we can conclude without further justification that the former formal argument yields the correct answer in the case $K > 0$.

This formal argument also suggests that there are no nontrivial cases of equality if $K < 0$.

In order to do the same proof on a manifold, we need several auxiliary results: identification of the relevant interpolation between $\mu_0$ and $\mu_1$, semiconcavity of the optimal transportation map (and existence of a second derivative almost everywhere), change of variables formula similar to the one of McCann, etc. In order to limit the size of this paper, and because they are of independent interest, these questions will be addressed in a separate work.

6. An application of Theorem 1

Let $\nu$ denote the uniform measure on a Riemannian manifold $M$, with unit volume: $\nu(M) = 1$. Let us assume that $\nu$ satisfies LSI($\rho$) for some $\rho > 0$, and let $A$ be any measurable subset of $M$, and $f = 1_A/\nu(A)$. As a consequence of Theorem 1, we have

\[ W(f\,d\nu, d\nu) \le \sqrt{\frac{2\int f \log f \,d\nu}{\rho}} = \sqrt{\frac{2}{\rho}\log\frac{1}{\nu(A)}}. \tag{54} \]

We shall use this inequality to give a (slightly) simplified proof of a theorem by Ledoux [19], which establishes a partial converse to the statement (due to Rothaus) that compact manifolds always satisfy logarithmic Sobolev inequalities. In fact, Saloff-Coste [29] and Ledoux [19] proved that if the Ricci curvature tensor of $M$ is bounded below, then LSI implies the compactness of $M$.

In view of the remark by Talagrand (inequality (4)), the proof that we present is quite close in spirit to the one by Ledoux. Yet the use of transport instead of concentration may appear more natural, and leads to much simpler numerical constants.

Theorem 4. (Ledoux) Let $M$ be a smooth complete Riemannian manifold of dimension $n$, with uniform measure $\nu$, $\nu(M) = 1$. Assume that $\nu$ satisfies LSI($\rho$) for some $\rho > 0$, and that the Ricci curvature tensor of $M$ is bounded below:

\[ \forall x \in M, \quad \mathrm{Ric}(x) \ge -R\,I_n, \qquad R > 0. \]


Then $M$ has finite diameter $D$, with

\[ D \le C\sqrt{n}\,\max\Bigl( \frac{1}{\sqrt{\rho}}, \frac{\sqrt{R}}{\rho} \Bigr), \tag{55} \]

where $C$ is numerical.

Proof. We shall admit the fact (proven in [19]) that $D < \infty$, and prove directly the estimate (55). Let $B(x, r)$ denote the geodesic open ball of radius $r$, center $x$, and let $\nu B(x, r) = \nu(B(x, r))$ denote its volume. By continuity of the distance mapping $(x, y) \mapsto d(x, y)$ on $M \times M$, there exist $x_0, y_0 \in M$ such that $d(x_0, y_0) = D$. Clearly, $B(x_0, D/2) \cap B(y_0, D/2) = \emptyset$, whence we can assume without loss of generality that $\nu B(x_0, D/2) \le 1/2$.

On the other hand, since $M = B(x_0, D)$, it follows from the Riemannian volume comparison theorem that

\[ \frac{1}{\nu B(x_0, D/4)} = \frac{\nu B(x_0, D)}{\nu B(x_0, D/4)} \le 4^n e^{\sqrt{(n-1)R}\,D}. \tag{56} \]

By (54) and (56),

\[ W\bigl( \nu|_{B(x_0, D/4)}, \nu \bigr)^2 \le \frac{2}{\rho}\bigl( n \log 4 + \sqrt{(n-1)R}\,D \bigr). \]

But since $\nu({}^cB(x_0, D/2)) \ge 1/2$ and $d(B(x_0, D/4), {}^cB(x_0, D/2)) \ge D/4$, obviously

\[ W\bigl( \nu|_{B(x_0, D/4)}, \nu \bigr)^2 \ge \frac{1}{2} \times \frac{D^2}{16}. \]

Thus

\[ \frac{D^2}{32} - \frac{2\sqrt{(n-1)R}\,D}{\rho} - \frac{2n \log 4}{\rho} \le 0, \tag{57} \]

and the conclusion follows. □

Following Ledoux [20], we note that if $\mathrm{Ric} \ge R > 0$, then it is known [3, 18] that $\rho \ge R\,n/(n-1)$, and (57) becomes

\[ D \le 8\sqrt{\log 4}\,\sqrt{\frac{n-1}{R}}. \]

Replacing $D/4$ by $\lambda D/2$ in the proof above, we can change the constant $8\sqrt{\log 4}$ to $4\inf_\lambda \sqrt{\log(2/\lambda)}/(1 - \lambda) \simeq 7.6$, which is only about twice the optimal constant $\pi$ (given by the Bonnet-Myers theorem).
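The numerical constants above are easy to reproduce. The sketch below is a minimal illustration: `diameter_bound` solves the quadratic inequality (57) for the largest admissible $D$, and `constant` evaluates $4\sqrt{\log(2/\lambda)}/(1-\lambda)$, which is then minimized over a grid of $\lambda \in (0,1)$. The function names, the grid resolution and the sample parameters are assumptions made for illustration.

```python
import math


def diameter_bound(n, rho, R):
    """Largest D satisfying D^2/32 - 2 sqrt((n-1) R) D / rho - 2 n log 4 / rho <= 0."""
    b = 2.0 * math.sqrt((n - 1) * R) / rho
    c = 2.0 * n * math.log(4.0) / rho
    # D^2 - 32 b D - 32 c <= 0  <=>  D <= 16 b + sqrt((16 b)^2 + 32 c)
    return 16.0 * b + math.sqrt((16.0 * b) ** 2 + 32.0 * c)


def constant(lam):
    """Constant obtained when the ball of radius D/4 is replaced by radius lam * D / 2."""
    return 4.0 * math.sqrt(math.log(2.0 / lam)) / (1.0 - lam)


best = min(constant(k / 1000.0) for k in range(1, 1000))
sample = diameter_bound(3, 1.0, 1.0)  # hypothetical values of n, rho, R
```

The choice $\lambda = 1/2$ recovers $8\sqrt{\log 4} \approx 9.42$, while the grid minimum is close to 7.57, attained for $\lambda$ a little below 0.2.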


7. Linearizations

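It is well-known since Rothaus [28] that a linearized version of the logarithmic Sobolev inequality

\[ H(\mu|\nu) \le \frac{1}{2\rho}\,I(\mu|\nu) \tag{58} \]

is the Poincaré inequality P($\rho$):

\[ \Bigl[ \int_M f \,d\nu = 0 \Bigr] \Longrightarrow \|f\|^2_{L^2(d\nu)} \le \frac{1}{\rho}\,\|f\|^2_{H^1(d\nu)}, \tag{59} \]

where

\[ \|f\|^2_{H^1(d\nu)} = \int_M |\nabla f|^2 \,d\nu. \]

Indeed, if one chooses a smooth $f$ such that $\int f \,d\nu = 0$, and sets $\mu = \mu_\varepsilon = (1 + \varepsilon f)\nu$, then, as $\varepsilon$ goes to 0,

\[ \frac{H(\mu_\varepsilon|\nu)}{\varepsilon^2} \longrightarrow \frac{1}{2}\,\|f\|^2_{L^2(d\nu)}, \qquad \frac{I(\mu_\varepsilon|\nu)}{\varepsilon^2} \longrightarrow \|f\|^2_{H^1(d\nu)}. \tag{60} \]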
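The first limit in (60) only involves the expansion $(1+u)\log(1+u) = u + u^2/2 + O(u^3)$ and can be illustrated on a finite state space, where no gradient is needed. In the sketch below the three-point measure `nu` and the mean-zero function `f` are arbitrary illustrative choices; the Fisher information half of (60) has no direct analogue in this discrete toy setting and is not checked.

```python
import math

nu = [0.2, 0.3, 0.5]   # hypothetical reference probability measure on three points
f = [1.0, 1.0, -1.0]   # perturbation with sum(nu_i * f_i) = 0


def entropy(eps):
    """H(mu_eps | nu) for mu_eps = (1 + eps f) nu."""
    return sum(w * (1.0 + eps * v) * math.log(1.0 + eps * v)
               for w, v in zip(nu, f))


l2_squared = sum(w * v * v for w, v in zip(nu, f))
eps = 1e-3
ratio = entropy(eps) / eps ** 2
```

For small `eps` the value of `ratio` should be close to `0.5 * l2_squared`, in line with the first limit in (60).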

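We now argue that the linearization of Talagrand’s inequality

\[ W(\mu, \nu)^2 \le \frac{2H(\mu|\nu)}{\rho} \tag{61} \]

is also the Poincaré inequality P($\rho$). In short, LSI($\rho$) ⇒ T($\rho$) ⇒ P($\rho$). We thank D. Bakry for asking us whether this implication was true.

Here is a general and simple argument. As before, we consider $\mu_\varepsilon = (1 + \varepsilon f)\nu$ ($f$ smooth, compactly supported, $\int f \,d\nu = 0$). By the Taylor formula at order 2, there is a constant $C$ such that for all $x, y \in M$,

\[ f(x) - f(y) \le |\nabla f(y)|\,d(x, y) + C\,d(x, y)^2. \tag{62} \]

Let $\pi_\varepsilon$ be an optimal transference plan in the definition of the Wasserstein distance between $\mu_\varepsilon$ and $\nu$. This means

\[ \pi_\varepsilon \in \Pi(\mu_\varepsilon, \nu), \qquad W(\mu_\varepsilon, \nu)^2 = \int_{M \times M} d(x, y)^2 \,d\pi_\varepsilon(x, y). \]

By definition of $\Pi(\mu_\varepsilon, \nu)$,

\[ \int f^2 \,d\nu = \int f \,d\Bigl( \frac{\mu_\varepsilon - \nu}{\varepsilon} \Bigr) = \frac{1}{\varepsilon}\int_{M \times M} \bigl[ f(x) - f(y) \bigr]\,d\pi_\varepsilon(x, y). \]

Using (62),

\[ \int f^2 \,d\nu \le \frac{1}{\varepsilon}\int_{M \times M} |\nabla f(y)|\,d(x, y)\,d\pi_\varepsilon(x, y) + \frac{C}{\varepsilon}\int_{M \times M} d(x, y)^2 \,d\pi_\varepsilon(x, y) \]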


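\begin{align*}
&\le \frac{1}{\varepsilon}\sqrt{\int |\nabla f(y)|^2 \,d\pi_\varepsilon(x, y)}\ \sqrt{\int d(x, y)^2 \,d\pi_\varepsilon(x, y)} + \frac{C}{\varepsilon}\int d(x, y)^2 \,d\pi_\varepsilon(x, y) \\
&= \sqrt{\int |\nabla f|^2 \,d\nu}\ \frac{W(\mu_\varepsilon, \nu)}{\varepsilon} + \frac{C}{\varepsilon}\,W(\mu_\varepsilon, \nu)^2.
\end{align*}

Using now inequality T($\rho$),

\[ \int f^2 \,d\nu \le \sqrt{\frac{2}{\rho}}\ \sqrt{\int |\nabla f|^2 \,d\nu}\ \sqrt{\frac{H(\mu_\varepsilon|\nu)}{\varepsilon^2}} + \frac{C}{\varepsilon}\,H(\mu_\varepsilon|\nu). \]

By (60), we can pass to the limit as $\varepsilon \to 0$ and recover

\[ \|f\|^2_{L^2(d\nu)} \le \frac{1}{\sqrt{\rho}}\,\|f\|_{H^1(d\nu)}\,\|f\|_{L^2(d\nu)}. \tag{63} \]

After simplification by $\|f\|_{L^2(d\nu)}$, this is the Poincaré inequality P($\rho$) (proven for smooth functions, and immediately extended to all of $H^1(d\nu)$ by density).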

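The above proof is simple and holds in full generality. Yet, in order to help understand how the Wasserstein distance behaves in the limit $\varepsilon \to 0$, and why the preceding result is natural, we present another line of reasoning, which we make explicit only in the Euclidean case. Closely related considerations appear in [14].

Let $\nabla\varphi_\varepsilon$ be an optimal gradient of a convex function transporting $\nu$ onto $\mu_\varepsilon$ (see [23] for instance). This means

\[ \int |x - \nabla\varphi_\varepsilon(x)|^2 \,d\nu(x) = W(\mu_\varepsilon, \nu)^2. \]

In particular,

\[ \frac{1}{\varepsilon^2}\int |x - \nabla\varphi_\varepsilon(x)|^2 \,d\nu(x) = \frac{1}{\varepsilon^2}\,W(\mu_\varepsilon, \nu)^2 \le \frac{2}{\rho}\,\frac{H(\mu_\varepsilon|\nu)}{\varepsilon^2} \longrightarrow \frac{\|f\|^2_{L^2(d\nu)}}{\rho}. \tag{64} \]

This implies that (up to extraction of a subsequence), as $\varepsilon \to 0$, $\nabla\varphi_\varepsilon$ converges towards $\nabla\varphi_0 = \mathrm{id}$ in $L^2(d\nu)$ and a.e., and $\nabla(\varphi_\varepsilon - |x|^2/2)/\varepsilon \longrightarrow \nabla F$, weakly in $L^2$, for some $F \in H^1(d\nu)$.

This $F$ is actually a well-known object. Indeed, if we let $\zeta$ be a smooth test-function, we see that (by definition of $\nabla\varphi_\varepsilon$),

\begin{align*}
\int \zeta f \,d\nu = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\int \zeta \,d(\mu_\varepsilon - \nu) &= \lim_{\varepsilon \to 0} \int \frac{\zeta \circ \nabla\varphi_\varepsilon - \zeta}{\varepsilon}\,d\nu \tag{65} \\
&= \int \nabla\zeta \cdot \nabla F \,d\nu,
\end{align*}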


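where the limit is easily justified by using the a.e. convergence of $\nabla\varphi_\varepsilon$, and weak convergence of $(\nabla\varphi_\varepsilon - x)/\varepsilon$. Thus, $F$ solves the elliptic equation

\[ -LF = f, \]

where $L$ is the linear, $L^2(d\nu)$-self-adjoint operator defined by

\[ (LF)\,\nu = -\nabla \cdot (\nu\nabla F), \tag{66} \]

or more explicitly

\[ LF = e^{\Psi}\,\nabla \cdot (e^{-\Psi}\nabla F) = \Delta F - \nabla\Psi \cdot \nabla F. \]

This solution is unique up to a constant, because $(F, LF)_\nu = -(\nabla F, \nabla F)_\nu$, and $\nu$ is supported on the whole of $M$.

Now, since

\begin{align*}
\|f\|^2_{L^2(d\nu)} = -\int f\,LF \,d\nu &= \int \nabla f \cdot \nabla F \,d\nu \\
&\le \sqrt{\int |\nabla F|^2 \,d\nu}\ \sqrt{\int |\nabla f|^2 \,d\nu},
\end{align*}

we can define

\[ \int |\nabla F|^2 \,d\nu = \|f\|^2_{H^{-1}(d\nu)}, \]

and the following interpolation inequality holds,

\[ \|f\|^2_{L^2(d\nu)} \le \|f\|_{H^1(d\nu)}\,\|f\|_{H^{-1}(d\nu)}. \tag{67} \]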
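Inequality (67) becomes a plain Cauchy-Schwarz inequality between Fourier coefficients in the simplest periodic setting. As a hedged illustration, take $\nu$ to be the normalized uniform measure on the circle (so $\Psi$ is constant and $L = \Delta$) and $f(x) = \sum_k a_k \cos(kx)$; then $\|f\|^2_{L^2} = \sum a_k^2/2$, $\|f\|^2_{H^1} = \sum k^2 a_k^2/2$ and $\|f\|^2_{H^{-1}} = \sum a_k^2/(2k^2)$. The coefficients below are arbitrary, and the circle is an assumed example rather than the Euclidean setting of the text.

```python
import math


def norms_squared(coeffs):
    """Squared L2, H1 and H^-1 norms of f(x) = sum a_k cos(k x) on the circle,
    with respect to the normalized uniform measure; coeffs maps k >= 1 to a_k."""
    l2 = sum(a * a / 2.0 for a in coeffs.values())
    h1 = sum(k * k * a * a / 2.0 for k, a in coeffs.items())
    hm1 = sum(a * a / (2.0 * k * k) for k, a in coeffs.items())
    return l2, h1, hm1


l2, h1, hm1 = norms_squared({1: 0.7, 2: -0.4, 5: 0.25})  # hypothetical coefficients
single_l2, single_h1, single_hm1 = norms_squared({3: 1.0})
```

Equality in (67) holds for a single Fourier mode, as the second call shows, and fails as soon as several frequencies are present.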

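By weak convergence and convexity of the square norm,

\[ \|f\|^2_{H^{-1}(d\nu)} = \int |\nabla F|^2 \,d\nu \le \lim_{\varepsilon \to 0} \int \Bigl|\frac{\nabla\varphi_\varepsilon - \nabla\varphi_0}{\varepsilon}\Bigr|^2 d\nu = \lim_{\varepsilon \to 0} \frac{W(\mu_\varepsilon, \nu)^2}{\varepsilon^2}, \tag{68} \]

and we conclude by using (67) and (60). Actually, the limit of T($\rho$) is precisely the Poincaré inequality in dual formulation; and it is natural that the infinitesimal optimal displacement be given by the equation $-LF = f$, because (by (66)) this ensures that $\mu_\varepsilon$ is an approximate solution of the transport equation

\[ \frac{\partial\mu_\varepsilon}{\partial\varepsilon} + \nabla \cdot (\mu_\varepsilon \nabla F) = 0. \]

The same considerations show that the linearization of the (HWI) inequality is

\[ \Bigl[ \int f \,d\nu = 0 \Bigr] \Longrightarrow \|f\|^2_{L^2(d\nu)} \le 2\,\|f\|_{H^{-1}(d\nu)}\,\|f\|_{H^1(d\nu)} - K\,\|f\|^2_{H^{-1}(d\nu)}, \]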


which holds if $d\nu = e^{-\Psi}\,dx$, $D^2\Psi \ge K I_n$. By Young’s inequality, it implies the Poincaré inequality P($K$) if $K > 0$. And if $K = 0$ (convexity of $\Psi$), we recover the interpolation inequality (67), only up to a multiplicative factor 2. (The formal resemblance between inequality (HWI) and inequality (67) was first pointed out to us by P.L. Lions.)

Thus, inequality (HWI) should really be considered as a nonlinear interpolation inequality (recall that $H$ plays the role of a strong norm, by Csiszár-Kullback-Pinsker, while $W$ is a weak distance, and $I$ involves gradients). The heuristic considerations of section 3 suggest that there is no nonlinear generalization of the linear interpolation inequality (67).

Appendix A. A nonlinear approximation argument

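In this appendix, we display the density argument that enables us to recover Theorem 1 in full generality, after proving it only for those $\mu = h\nu$ that satisfy (32).

Let us first show that it is sufficient to prove Theorem 1 in the case when $h$ is bounded from above, and bounded away from 0. We consider an arbitrary $h \in L^1(d\nu)$, such that $\mu = h\,\nu$ has finite second moments. Then, for any $n \in \mathbb{N}$, we define

\[ \begin{aligned}
&\tilde h_n = \begin{cases} \frac{1}{n} & \text{on } h < \frac{1}{n}, \\ n & \text{on } h > n, \\ h & \text{elsewhere,} \end{cases} \quad \text{and} \\
&\mu_n = h_n\,\nu \quad \text{where } h_n = \frac{1}{\alpha_n}\,\tilde h_n \text{ and } \alpha_n = \int \tilde h_n \,d\nu.
\end{aligned} \tag{69} \]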
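On a finite state space the construction (69) amounts to clamping the density values and renormalizing. The following sketch is only meant to make the two steps concrete; the four-point measure `nu`, the density `h`, the cutoff `n = 4` and the function name are hypothetical.

```python
def truncate_and_normalize(h, nu, n):
    """Clamp the density values h to [1/n, n], then renormalize against nu.

    Returns the normalized density h_n and the normalizing constant alpha_n."""
    clamped = [min(max(v, 1.0 / n), float(n)) for v in h]
    alpha = sum(c * w for c, w in zip(clamped, nu))
    return [c / alpha for c in clamped], alpha


nu = [0.25, 0.25, 0.25, 0.25]  # hypothetical reference measure
h = [0.0, 0.4, 1.6, 2.0]       # density with sum(h_i * nu_i) = 1, vanishing at one point
h_n, alpha_n = truncate_and_normalize(h, nu, 4)
```

The resulting `h_n` is bounded away from 0 and from above, and still integrates to 1 against `nu`; as the cutoff grows, `alpha_n` tends to 1.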

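We observe that

\[ H(\mu_n|\nu) = \int h_n \log h_n \,d\nu = \frac{1}{\alpha_n}\int \tilde h_n \log \tilde h_n \,d\nu - \log\alpha_n. \]

Since

\[ \alpha_n \longrightarrow 1, \qquad \tilde h_n \log \tilde h_n \longrightarrow h \log h \ \text{ a.e.}, \qquad -\frac{1}{e} \le \tilde h_n \log \tilde h_n \le \max\{h \log h, 0\}, \]

we obtain by dominated convergence

\[ H(\mu_n|\nu) \longrightarrow \int h \log h \,d\nu = H(\mu|\nu). \tag{70} \]

Furthermore, if $\zeta$ is a continuous function with at most quadratic growth at infinity, that is,

\[ |\zeta(x)| \le C\,(d(x_0, x)^2 + 1), \]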


then

\begin{align*}
\Bigl|\int \zeta \,d\mu_n - \int \zeta \,d\mu\Bigr| &\le C \int |h_n - h|\,(d(x_0, \cdot)^2 + 1)\,d\nu \\
&\le C\,\Bigl( \frac{1}{\alpha_n}\int_{\{h < \frac{1}{n}\} \cup \{h > n\}} (d(x_0, \cdot)^2 + 1)\,d\mu + \Bigl|\frac{1}{\alpha_n} - 1\Bigr| \int (d(x_0, \cdot)^2 + 1)\,d\mu \Bigr).
\end{align*}

Hence, by dominated convergence again,

\[ \int \zeta \,d\mu_n \longrightarrow \int \zeta \,d\mu \]

for all continuous functions $\zeta$ which grow at most quadratically. According to the continuity of the Wasserstein metric under weak convergence, this implies as desired

\[ W(\mu_n, \nu) \longrightarrow W(\mu, \nu). \tag{71} \]

The argument to show that one may also assume the second part of (32) is more standard. Given a $\mu = h\,\nu$ which satisfies the first part of (32), it suffices to construct by a standard linear regularization argument a sequence $\tilde h_n$ such that

$\tilde h_n$ is smooth and $\tfrac{1}{2}|\nabla \tilde h_n|^2$ is bounded on $M$ for all $n$,

$\tilde h_n$ is bounded away from 0 and $\infty$ on $M$ uniformly in $n$,

$\tilde h_n \longrightarrow h$ in $L^1(d\nu)$.

We then define $\mu_n$ as in the second line of (69). The argument that (70) and (71) hold for this approximating sequence is even easier than in the first approximation step.

References

[1] Arnold, A., Markowich, P., Toscani, G., and Unterreiter, A. On logarithmic Sobolev inequalities and the rate of convergence to equilibrium for Fokker-Planck type equations. To appear in Comm. P.D.E.

[2] Arnold, V. I., and Khesin, B. A. Topological methods in hydrodynamics. Springer-Verlag, New York (1998).

[3] Bakry, D., and Emery, M. Diffusions hypercontractives. In Sem. Proba. XIX, LNM, 1123, Springer (1985), 177–206.

[4] Benamou, J.-D., and Brenier, Y. A numerical method for the optimal time-continuous mass transport problem and related problems. In Monge Ampère equation: applications to geometry and optimization, Contemp. Math., 226, Amer. Math. Soc., Providence, RI (1999), 1–11.


[5] Bobkov, S., and Gotze, F. Exponential integrability and transportationcost related to logarithmic sobolev inequalities. J. Funct. Anal. 163, 1 (1999),1–28.

[6] Brenier, Y. Polar factorization and monotone rearrangement of vector-valuedfunctions. Comm. Pure Appl. Math. 44, 4 (1991), 375–417.

[7] Caffarelli, L. A. The regularity of mappings with a convex potential. J.Amer. Math. Soc. 5, 1 (1992), 99–104.

[8] Caffarelli, L. A. Boundary regularity of maps with convex potentials.Comm. Pure Appl. Math. 45 (1992), 1141–1151.

[9] Carlen, E. Superadditivity of Fisher’s information and logarithmic Sobolevinequalities. J. Funct. Anal. 101, 1 (1991), 194–211.

[10] Carlen, E., and Soffer, A. Entropy production by block variable summa-tion and central limit theorems. Comm. Math. Phys. 140 (1991), 339–371.

[11] Csiszar, I. Information-type measures of difference of probability distribu-tions and indirect observations. Stud. Sci. Math. Hung. 2 (1967), 299–318.

[12] Cordero-Erausquin, D. Sur le transport de mesures priodiques. C.R. Acad.Sci. Paris, I, 329 (1999), 199–202.

[13] Evans, L.C., and Gariepy, R.F. Measure theory and fine properties offunctions. Studies in Advanced Mathematics, CRC Press (1992).

[14] Gangbo, W. An elementary proof of the polar factorization of vector-valuedfunctions. Arch. Rat. Mech. Anal. 128 (1994), 381–399.

[15] Jordan, R., Kinderlehrer, D., and Otto, F. The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal. 29, 1 (1998), 1–17 (electronic).

[16] Holley, R., and Stroock, D. Logarithmic Sobolev inequalities and stochastic Ising models. J. Stat. Phys. 46, 5–6 (1987), 1159–1194.

[17] Kullback, S. A lower bound for discrimination information in terms of variation. IEEE Trans. Info. 4 (1967), 126–127.

[18] Ledoux, M. On an integral criterion for hypercontractivity of diffusion semigroups and extremal functions. J. Funct. Anal. 105, 2 (1992), 444–465.

[19] Ledoux, M. Concentration of measure and logarithmic Sobolev inequalities. Lectures in Berlin (1997).

[20] Ledoux, M. The geometry of Markov processes. Lectures in Zurich (1998).

[21] Marton, K. A measure concentration inequality for contracting Markov chains. Geom. Funct. Anal. 6 (1996), 556–571.

[22] Maurey, B. Some deviation inequalities. Geom. Funct. Anal. 1, 2 (1991), 188–197.

[23] McCann, R. J. Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80, 2 (1995), 309–323.

[24] McCann, R. J. A convexity principle for interacting gases. Adv. Math. 128, 1 (1997), 153–179.

[25] McCann, R. J. Polar factorization on Riemannian manifolds. Preprint (1999).

[26] Otto, F. The geometry of dissipative evolution equations: the porous medium equation. To appear in Comm. P.D.E.

[27] Rachev, S., and Rüschendorf, L. Mass transportation problems. Probability and its applications, Springer-Verlag (1998).

[28] Rothaus, O. Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. J. Funct. Anal. 42 (1981), 102–109.

[29] Saloff-Coste, L. Convergence to equilibrium and logarithmic Sobolev constant on manifolds with Ricci curvature bounded below. Colloquium Math. 67 (1994), 109–121.

[30] Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6, 3 (1996), 587–600.

[31] Urbas, J. On the second boundary value problem for equations of Monge-Ampère type. J. Reine Angew. Math. 487 (1997), 115–124.

Department of Mathematics, University of California, Santa Barbara, CA 93106, USA. e-mail [email protected]

École Normale Supérieure, DMA, 45 rue d’Ulm, 75230 Paris Cedex 05, FRANCE. e-mail [email protected]