TRANSCRIPT
Quantitative Stability of Optimal Transport Maps and Linearization of the 2-Wasserstein Space
Alex Delalande*,+ with Quentin Mérigot* and Frédéric Chazal+
*Laboratoire de Mathématiques d’Orsay, Université Paris-Sud
+DataShape team, INRIA Saclay
AISTATS 2020
Introduction
- Numerous problems involve the comparison of point clouds / probability measures (e.g. in astronomy, shape recognition, image processing, generative modelling, large-scale learning, etc.).
Figure 1: Point cloud comparison may appear in astronomy, 3D shape recognition or color transfer (from [Yu et al., 2012, Wu et al., 2015, Paty et al., 2019]).
- Wasserstein distances, provided by Optimal Transport (OT), are natural to perform these comparisons.
Wasserstein distances.
- X, Y compact and convex subsets of R^d.
- α, β probability measures on X, Y respectively.

  W_p^p(α, β) := min_π { ∫_{X×Y} ‖x − y‖^p dπ(x, y) | π ∈ U(α, β) },

where U(α, β) = { π ∈ P(X × Y) | (P_X)#π = α and (P_Y)#π = β }, with P_X(x, y) = x and P_Y(x, y) = y.
They enable geometric notions on measures, such as barycenters.
Introduction
- In practice: for α_n = (1/n) ∑_{i=1}^n δ_{x_i} and β_m = (1/m) ∑_{j=1}^m δ_{y_j}, computing W_p^p(α_n, β_m) corresponds to solving an LP problem.
Several algorithms, e.g.:

  Algorithm                     Complexity
  Network Simplex               O(n^3 log(n)^2)
  Auction                       O(n^3)
  Sinkhorn (τ-approximate OT)   O(n^2 log(n) τ^{-3})

[Peyré and Cuturi, 2019]
Solving a single OT problem =⇒ high computational cost.
To get the distance matrix of k point clouds, O(k^2) OT problems must be solved: this can be prohibitive for large values of k.
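For two uniform point clouds of the same size, the LP above reduces to an optimal assignment problem. A minimal sketch with NumPy/SciPy (the point clouds here are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_uniform(x, y):
    """W_2^2 between two uniform n-point clouds alpha_n and beta_n.

    For equal-size uniform discrete measures, the Kantorovich LP has an
    optimal solution supported on a permutation (Birkhoff's theorem),
    so it reduces to an assignment problem.
    """
    # Cost matrix C[i, j] = ||x_i - y_j||^2.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(C)      # optimal permutation
    return C[rows, cols].mean()                # (1/n) * sum of matched costs

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = x + np.array([3.0, 0.0])                   # same cloud, translated
print(np.sqrt(w2_uniform(x, y)))               # W2 equals the shift length, 3.0
```

`linear_sum_assignment` runs the Hungarian-type algorithm, matching the cubic-in-n complexities listed in the table above.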
Introduction
- In practice: solving numerous OT problems can be computationally prohibitive.
Wasserstein spaces are curved, hence nonlinear.
Figure 2: (P2,W2) is curved.
No "closed-form" notions of sum or mean in (P_p, W_p) (e.g. Wasserstein barycenters are defined as arg min’s).
In ML, it can be interesting to have a Hilbertian structure: this is impossible in Wasserstein spaces.
- Here: we propose a measure embedding into a Hilbert space that conserves some of the (unregularized) Wasserstein geometry.
Proposed workaround: Monge embedding
Let ρ be a fixed a.c. measure on X .
By [Brenier, 1991], for any µ ∈ P(Y),
  min_π { ∫_{X×Y} ‖x − y‖^2 dπ(x, y) | π ∈ U(ρ, µ) }
    = min_T { ∫_X ‖x − T(x)‖^2 dρ(x) | T : X → Y, T#ρ = µ },

where T#ρ is s.t. for all B ⊆ Y, T#ρ(B) = ρ(T^{-1}(B)).
[Figure: the reference measure ρ is transported onto µ by Tµ and onto ν_N = (1/N) ∑_i δ_{y_i} by T_{ν_N}.]
A solution Tµ always exists and is uniquely defined as the gradient Tµ = ∇φµ of a convex function φµ that minimizes the Kantorovich functional

  K(φµ) = ∫_X φµ dρ + ∫_Y ψµ dµ,

where ψµ = φµ* is the Legendre transform of φµ.
Definition (Monge embedding). For all µ ∈ P(Y), denote by Tµ the solution of Monge’s OT problem between ρ and µ for the squared Euclidean cost. The Monge embedding is the mapping

  P(Y) → L2(ρ, R^d), µ ↦ Tµ.
Remarks
- When µ is discrete: semi-discrete OT. Efficiently solved in low dimensions with second-order methods [Kitagawa et al., 2019] and in higher dimensions with stochastic optimization methods [Genevay et al., 2016].
- L2(ρ)-distance matrix of k point clouds =⇒ k OT problems to solve + O(k^2) distance computations on Hilbertian/Euclidean data.
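A rough way to play with the embedding numerically (not the paper's semi-discrete solver of [Kitagawa et al., 2019]): discretize ρ by n samples and approximate Tµ by the optimal assignment from those samples to an n-point discretization of µ. All names, measures and sample sizes below are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def monge_embedding(rho_pts, mu_pts):
    """Approximate Monge embedding mu -> T_mu on n samples of rho.

    T_mu is approximated by the optimal assignment from n samples of the
    fixed reference measure rho to an n-point discretization of mu;
    the returned array stacks T_mu(x_1), ..., T_mu(x_n).
    """
    C = ((rho_pts[:, None, :] - mu_pts[None, :, :]) ** 2).sum(-1)
    _, cols = linear_sum_assignment(C)
    return mu_pts[cols]

def embedding_dist(T1, T2):
    """||T_mu - T_nu||_{L2(rho)}, Monte-Carlo average over the rho samples."""
    return np.sqrt(((T1 - T2) ** 2).sum(-1).mean())

rng = np.random.default_rng(0)
rho = rng.random((200, 2))            # samples of rho = uniform on [0, 1]^2
mu = rng.normal([0.2, 0.2], 0.05, size=(200, 2))
nu = rng.normal([0.8, 0.8], 0.05, size=(200, 2))
T_mu, T_nu = monge_embedding(rho, mu), monge_embedding(rho, nu)
print(embedding_dist(T_mu, T_nu))     # >= W2(mu, nu) by the lower bound below
```

Note that (Tµ, Tν)#ρ_n is an admissible coupling between the two empirical measures, so the lower bound W2 ≤ ‖Tµ − Tν‖ holds exactly even in this discretized sketch.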
Main result: bi-Hölder behavior of µ ↦ Tµ

  ∀ µ, ν ∈ P(Y),  W2(µ, ν) ≤ ‖Tµ − Tν‖L2(ρ) ≤ C_{d,X,Y} W2(µ, ν)^{2/15}.
Geometric interpretation of the Monge embedding

Monge embedding as a logarithm map (similar construction and interpretation in the Linear Optimal Transportation framework of [Wang et al., 2013]).
[Figure: µ and ν in (P(Y), W2) are mapped to Tµ and Tν in the tangent space L2(ρ) at ρ (with T_ρ = Id); W2(µ, ν) is compared to W2,ρ(µ, ν).]

where W2,ρ(µ, ν) := ‖Tµ − Tν‖L2(ρ).
  Riemannian geometry                                            Optimal transport
  Point x ∈ M                                                    µ ∈ P2(R^d)
  Geodesic distance d_g(x, y)                                    W2(µ, ν)
  Tangent space T_ρM                                             T_ρP2(R^d) ≈ L2(ρ)
  Inverse exponential map exp^{-1}(x) ∈ T_ρM                     Tµ ∈ L2(ρ)
  Distance in tangent space ‖exp^{-1}(x) − exp^{-1}(y)‖_{g(ρ)}   ‖Tµ − Tν‖L2(ρ)
What amount of the Wasserstein geometry is preserved by the embedding µ ↦ Tµ?
Immediate properties of the Monge embedding
µ ↦ Tµ is discriminative
- µ ↦ Tµ is injective (by definition of the push-forward operator).
- µ ↦ Tµ is reverse-Lipschitz: ‖Tµ − Tν‖L2(ρ) ≥ W2(µ, ν) (γ := (Tµ, Tν)#ρ defines an admissible coupling between µ and ν).
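The parenthetical coupling argument, spelled out:

```latex
\[
W_2^2(\mu,\nu)
\;\le\; \int_{\mathcal{Y}\times\mathcal{Y}} \|y - y'\|^2 \,\mathrm{d}\gamma(y, y')
\;=\; \int_{\mathcal{X}} \|T_\mu(x) - T_\nu(x)\|^2 \,\mathrm{d}\rho(x)
\;=\; \|T_\mu - T_\nu\|_{L^2(\rho)}^2,
\]
```

since γ := (Tµ, Tν)#ρ has marginals (Tµ)#ρ = µ and (Tν)#ρ = ν, i.e. γ ∈ U(µ, ν).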
µ ↦ Tµ is not better than 1/2-Hölder
- Take ρ := (1/π) Leb_{B(0,1)} on R^2 and µ_θ := (δ_{x_θ} + δ_{x_{θ+π}})/2 with x_θ = (cos(θ), sin(θ)). Then

    ‖T_{µ_θ} − T_{µ_{θ+δ}}‖^2_{L2(ρ)} ≥ Cδ while W2(µ_θ, µ_{θ+δ}) ≤ Cδ.
Theorem (1/2-Hölder continuity near a regular measure, similar to a result of [Gigli, 2011]). Let µ, ν ∈ P(Y) and assume that Tµ is K-Lipschitz. Then

  ‖Tµ − Tν‖L2(ρ) ≤ 2 √(M_X K) W1(µ, ν)^{1/2},

where M_X is s.t. X ⊂ B(0, M_X).
Very strong hypothesis on µ.
Theorem (General Hölder continuity as a corollary of [Berman, 2018], Proposition 3.4). If ρ ≡ 1 on X with |X| = 1, then for any measures µ and ν in P(Y),

  ‖Tµ − Tν‖L2(ρ) ≤ C_{d,X,Y} W1(µ, ν)^{1/((d+2)2^{d−1})}.
High dependence on the ambient dimension. Optimal exponent?
Stability of the Monge embedding
Theorem (Dimension-independent Hölder continuity). If ρ ≡ 1 on X convex with |X| = 1, then for any measures µ and ν in P(Y),

  ‖Tν − Tµ‖L2(X) ≤ C_{d,X,Y} W1(µ, ν)^{2/15}.

Remarks
- Hölder exponent independent of the ambient dimension.
- No hypothesis on µ and ν except that they are compactly supported.
- No regularization.
- Optimality: the best exponent belongs to [2/15, 1/2].
Proof ingredients:
- Discrete µ and ν (general case by density).
- A discrete Poincaré–Wirtinger inequality =⇒ local estimate of the strong convexity of the Kantorovich functional (non-trivial because of the lack of regularization). Gives:

    ‖ψν − ψµ‖L2(µ+ν) ≤ C W1(µ, ν)^{1/3}.

- An inverse Poincaré–Wirtinger inequality then gives:

    ‖Tν − Tµ‖L2(ρ) = ‖∇φν − ∇φµ‖L2(ρ) ≤ C W1(µ, ν)^{2/15}.
Numerical illustrations
- Let X = Y = [0, 1]^2, ρ ≡ 1 on X.
- Project L2(ρ, R^2) onto a finite-dimensional space.
Distance approximation: W2,ρ vs. W2
[Figure: scatter plots of ‖T_{µ_M} − T_{ν_N}‖L2(ρ) against W2(µ_M, ν_N), with the line y = x. W2,ρ vs. W2 between point clouds sampled from a Gaussian, a mixture of 4 Gaussians and a uniform distribution.]
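A small self-contained sketch of this comparison, using assignment-based OT on synthetic clipped-Gaussian clouds (the clouds and sizes are illustrative, not the figure's exact setup):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign(src, tgt):
    """Optimal assignment src -> tgt for the squared Euclidean cost."""
    C = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
    r, c = linear_sum_assignment(C)
    return c, C[r, c].mean()          # permutation and mean matched squared cost

rng = np.random.default_rng(1)
rho = rng.random((300, 2))            # discretization of rho = uniform on [0, 1]^2
for _ in range(3):
    mu = np.clip(rng.normal(rng.random(2), 0.1, size=(300, 2)), 0, 1)
    nu = np.clip(rng.normal(rng.random(2), 0.1, size=(300, 2)), 0, 1)
    T_mu = mu[assign(rho, mu)[0]]     # Monge embedding of mu (assignment-based)
    T_nu = nu[assign(rho, nu)[0]]     # Monge embedding of nu
    w2 = np.sqrt(assign(mu, nu)[1])   # W2(mu, nu)
    w2_rho = np.sqrt(((T_mu - T_nu) ** 2).sum(-1).mean())
    print(f"W2 = {w2:.3f}   W2,rho = {w2_rho:.3f}")   # always W2 <= W2,rho
```

Each printed pair sits on or above the diagonal y = x, as in the figure, since W2,ρ upper-bounds W2 by the reverse-Lipschitz property.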
Numerical illustrations
Wasserstein barycenter [Agueh and Carlier, 2011] approximation

Approximate argmin_µ ∑_{s=1}^S λ_s W2^2(µ, µ_s) with µ = (∑_{s=1}^S λ_s T_{µ_s})#ρ.

Barycenters of 4 point clouds. Weights (λ_s)_s are bilinear w.r.t. the corners of the square.
Push-forwards of the 20 centroids after clustering of the Monge embeddings of the MNISTtraining set.
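The barycenter recipe above, sketched with two synthetic Gaussian clouds and assignment-based embeddings (all names, weights and parameters are illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def monge_embedding(rho_pts, mu_pts):
    """Assignment-based approximation of the Monge embedding T_mu."""
    C = ((rho_pts[:, None, :] - mu_pts[None, :, :]) ** 2).sum(-1)
    return mu_pts[linear_sum_assignment(C)[1]]

rng = np.random.default_rng(2)
n = 200
rho = rng.random((n, 2))              # reference rho = uniform on [0, 1]^2
# Two input clouds and barycentric weights lambda_s.
clouds = [rng.normal([0.2, 0.5], 0.05, size=(n, 2)),
          rng.normal([0.8, 0.5], 0.05, size=(n, 2))]
weights = [0.5, 0.5]
# Approximate barycenter: push rho forward by the averaged embeddings,
# i.e. the point cloud (sum_s lambda_s T_{mu_s})(x_i), i = 1..n.
bary = sum(w * monge_embedding(rho, c) for w, c in zip(weights, clouds))
print(bary.mean(axis=0))              # centroid lands between the two clouds
```

The averaging happens in the linear space L2(ρ), so no arg min has to be solved: one OT problem per input measure, then a plain weighted sum.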
Conclusion
µ ↦ Tµ is a Hilbert space embedding that:
- Linearizes to some extent the 2-Wasserstein space.
- Is bi-Hölder continuous w.r.t. W2.
- Allows for the direct use of generic ML algorithms on measure data thanks to the linearity and Hilbertian structure of L2(ρ).

Future work:
- Non-compactly supported target measures.
- More general source measures.
- Statistical properties: concentration and sample complexity of the defined distances.
- Applications: compact encoding of Tµ ∈ L2(ρ) that scales well to high dimensions.
References I
Agueh, M. and Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924.

Berman, R. J. (2018). Convergence rates for discretized Monge–Ampère equations and quantitative stability of optimal transport. arXiv preprint arXiv:1803.00785.

Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4):375–417.

Genevay, A., Cuturi, M., Peyré, G., and Bach, F. (2016). Stochastic optimization for large-scale optimal transport. In Advances in Neural Information Processing Systems, pages 3440–3448.

Gigli, N. (2011). On Hölder continuity-in-time of the optimal transport map towards measures along a curve. Proceedings of the Edinburgh Mathematical Society, 54(2):401–409.
References II
Kitagawa, J., Mérigot, Q., and Thibert, B. (2019). Convergence of a Newton algorithm for semi-discrete optimal transport. Journal of the European Mathematical Society.

Paty, F.-P., d’Aspremont, A., and Cuturi, M. (2019). Regularity as regularization: Smooth and strongly convex Brenier potentials in optimal transport.

Peyré, G. and Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6):355–607.

Wang, W., Slepčev, D., Basu, S., Ozolek, J. A., and Rohde, G. K. (2013). A linear optimal transportation framework for quantifying and visualizing variations in sets of images. International Journal of Computer Vision, 101(2):254–269.

Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, pages 1912–1920. IEEE Computer Society.
References III
Yu, L., Efstathiou, K., Isenberg, P., and Isenberg, T. (2012). Efficient structure-aware selection techniques for 3D point cloud visualizations with 2DOF input. IEEE Transactions on Visualization and Computer Graphics, 18(12):2245–2254.