
Page 1:

A geometric integration approach to non-smooth and non-convex optimisation

Martin Benning1, Matthias Ehrhardt2, GRW Quispel3, Erlend Skaldehaug Riis4, Torbjørn Ringholm5, Carola-Bibiane Schönlieb4

1 Queen Mary University of London, UK
2 University of Bath, UK
3 La Trobe University, Australia
4 University of Cambridge, UK

5Norwegian University of Science and Technology, Norway

Variational Methods and Optimization in Imaging, IHP, Paris

Erlend S. Riis Nonsmooth, nonconvex optimisation IHP, February, 2019 1 / 26

Page 2:

Outline

1 Geometric numerical integration and the discrete gradient method

2 The DG method for nonsmooth, nonconvex optimisation

3 Beyond gradient flow


Page 3:

Geometric numerical integration and the discrete gradient method


Page 4:

Optimisation and numerical integration

min_{x∈R^n} V(x),    ẋ(t) = −∇V(x(t))    (gradient flow)

Forward Euler → x_{k+1} = x_k − τ∇V(x_k),
Backward Euler → x_{k+1} = x_k − τ∇V(x_{k+1}).
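The two discretisations behave very differently for stiff problems. A minimal sketch on a toy quadratic (V, L and τ are our own choices, not from the slides): forward Euler diverges once τ > 2/L, while the implicit backward Euler step dissipates V for every τ > 0.

```python
# Forward vs backward Euler for the gradient flow x'(t) = -V'(x(t)),
# with the toy quadratic V(x) = L*x**2/2 (our choice, not from the
# slides). For this V the implicit backward-Euler equation
# x_new = x - tau*L*x_new solves in closed form: x_new = x/(1 + tau*L).
L = 10.0                      # Lipschitz constant of V' for this V

def forward_euler(x, tau, n):
    for _ in range(n):
        x = x - tau * L * x   # explicit step: diverges if tau > 2/L
    return x

def backward_euler(x, tau, n):
    for _ in range(n):
        x = x / (1.0 + tau * L)   # implicit step: decays for every tau > 0
    return x

tau = 0.3                     # larger than 2/L = 0.2
print(abs(forward_euler(1.0, tau, 50)))    # explodes
print(abs(backward_euler(1.0, tau, 50)))   # tends to 0
```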

Numerical integration and analysis of ODEs

Optimisation scheme → ODE
Accelerated gradient descent → ẍ + (3/t)ẋ = −∇V(x) [Su, Boyd, Candès (2016)]

Numerical integration tools to improve efficiency
Runge-Kutta methods for stiff ODEs → larger time steps [Eftekhari, Vandereycken, Vilmart, Zygalakis (2018)]

Discretisation methods for structure preservation of ODEs
Symmetry preservation → acceleration phenomenon [Betancourt, Jordan, Wilson (2018)]


Page 5:

Geometric integration and discrete gradients

Geometric numerical integration
ODEs have structure (conservation laws, symplectic structure).
Aim: preserve this structure when numerically solving ODEs.

Discrete gradient (DG) method
Preserves first integrals; energy conservation and dissipation laws; Lyapunov functions1

Optimisation methods
Want to solve min_{x∈R^n} V(x).
Apply the DG method to gradient flow to preserve its dissipative structure.

Figure: Dahlby, Owren, Yaguchi (2011).

‘Why geometric numerical integration?’ [Iserles, Quispel (2018)]
1 McLachlan, Quispel, and Robidoux (1999).


Page 6:

Discrete gradient method

Definition

For a smooth function V: R^n → R, a discrete gradient ∇V(x, y) satisfies

(i) lim_{y→x} ∇V(x, y) = ∇V(x) (consistency),

(ii) ⟨∇V(x, y), y − x⟩ = V(y) − V(x) (mean value).

Discrete gradient method for min V: R^n → R:

x_{k+1} = x_k − τ∇V(x_k, x_{k+1})

Dissipative:

(V(x_{k+1}) − V(x_k))/τ = (1/τ)⟨x_{k+1} − x_k, ∇V(x_k, x_{k+1})⟩
= −‖∇V(x_k, x_{k+1})‖²
= −‖(x_{k+1} − x_k)/τ‖².

Gradient flow.

ẋ(t) = −∇V(x(t)).

(d/dt) V(x(t)) = ⟨ẋ(t), ∇V(x(t))⟩
= −‖∇V(x(t))‖²
= −‖ẋ(t)‖².
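The discrete dissipation identity can be checked numerically. In 1D the mean-value axiom forces ∇V(x, y) = (V(y) − V(x))/(y − x), so one implicit DG step reduces to a scalar root-finding problem; the sketch below (V, x and τ are toy choices of ours) finds the root by bisection and verifies the identity.

```python
# One step of the DG method in 1D: the implicit update
# y = x - tau*DV(x, y) with DV(x, y) = (V(y) - V(x))/(y - x)
# is a root y != x of g(y) = (y - x)**2 + tau*(V(y) - V(x)),
# found here by bisection. V, x, tau are toy choices of ours.
V = lambda z: z**2 / 2.0
x, tau = 2.0, 1.0
g = lambda y: (y - x)**2 + tau * (V(y) - V(x))

lo, hi = -1.0, 1.9        # g(lo) > 0 > g(hi); bracket excludes the trivial root y = x
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
y = 0.5 * (lo + hi)

print(round(y, 6))        # 2/3 for this quadratic
# Exact dissipation identity from the slide, for any time step tau > 0:
print(V(y) - V(x) + (y - x)**2 / tau)   # ~0
```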


Page 7:

Convergence theorem for DG method

Theorem1

Suppose V is C1-smooth, coercive, and bounded below, 0 < c < τ < C, and x_{k+1} = x_k − τ∇V(x_k, x_{k+1}). Then

∇V (xk)→ 0,

‖xk+1 − xk‖ → 0,

(xk) has an accumulation point x∗, and it satisfies ∇V (x∗) = 0.

Figure: Inpainting with Itoh-Abe DG method1

1 Grimm, McLachlan, McLaren, Quispel and Schönlieb (2017).

Page 8:

Examples of discrete gradients

Gonzalez (midpoint) DG1:

∇V(x, y) = ∇V((x+y)/2) + [(V(y) − V(x) − ⟨∇V((x+y)/2), y − x⟩)/‖y − x‖²] (y − x).

Mean value DG2:

∇V(x, y) = ∫₀¹ ∇V((1−s)x + sy) ds.

Itoh-Abe (coordinate increment) DG3: the i-th component of ∇V(x, y) is

(V(y_1, …, y_i, x_{i+1}, …, x_n) − V(y_1, …, y_{i−1}, x_i, …, x_n)) / (y_i − x_i).

1 Gonzalez (1996). 2 Celledoni, Grimm, et al. (2012). 3 Itoh and Abe (1988).
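All three constructions can be coded in a few lines and checked against the mean-value axiom ⟨∇V(x, y), y − x⟩ = V(y) − V(x); the sketch below uses a polynomial test function of our own choosing (the mean value DG is evaluated with Gauss-Legendre quadrature, exact for this V).

```python
import numpy as np

# Sketch of the three discrete gradients from the slide, for a smooth
# test function V of our choice. Each must satisfy the mean-value axiom
# <DV(x, y), y - x> = V(y) - V(x).
V = lambda z: 0.25 * np.sum(z**4)
gradV = lambda z: z**3

def dg_gonzalez(x, y):
    m = gradV((x + y) / 2)
    r = y - x
    return m + (V(y) - V(x) - m @ r) / (r @ r) * r

def dg_mean_value(x, y, n=5):
    s, w = np.polynomial.legendre.leggauss(n)   # nodes/weights on [-1, 1]
    s, w = (s + 1) / 2, w / 2                   # rescale to [0, 1]
    return sum(wi * gradV((1 - si) * x + si * y) for si, wi in zip(s, w))

def dg_itoh_abe(x, y):
    d = np.empty_like(x)
    z = x.copy()
    for i in range(len(x)):                     # coordinate increments
        v0 = V(z)
        z[i] = y[i]
        d[i] = (V(z) - v0) / (y[i] - x[i])
    return d

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(4)
for dg in (dg_gonzalez, dg_mean_value, dg_itoh_abe):
    assert abs(dg(x, y) @ (y - x) - (V(y) - V(x))) < 1e-10
```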

Page 9:

Itoh-Abe discrete gradient (IADG) method

Applications

Image inpainting with Euler’s elastica (nonconvex).1

Successive-over-relaxation (SOR) and the Gauss-Seidel method, for linear systems Ax = b.2

Kaczmarz methods (by extension)

Figure: Inpainting with Itoh-Abe DG method1.

1 Ringholm, Lasic and Schönlieb (2017). 2 Miyatake, Sogabe, Zhang (2017).

Page 10:

Itoh–Abe method for nonsmooth, nonconvex optimisation

Rewrite IADG method as

xk+1 = xk − αdk , s.t. α = τkV (xk − αdk)− V (xk)

α,

dk ∈ Sn−1 := {d ∈ Rn : ‖d‖ = 1}.(1)

Derivative-free gradient flow dissipative structure

V(x_{k+1}) − V(x_k) = −(1/τ_k)‖x_{k+1} − x_k‖² = −τ_k ((V(x_{k+1}) − V(x_k))/‖x_{k+1} − x_k‖)²

Well-defined for nonsmooth functions; computationally tractable

Descends along directions (d_k)_{k∈N}
Standard IADG: let (d_k) cycle through the coordinates e_i
Can also randomise: draw d_k randomly from (e_i)_{i=1}^n or from S^{n−1}
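A single derivative-free update can be sketched directly from the scalar equation. Writing β = −α for the actual step, β is a nonzero root of β² + τ(V(x + βd) − V(x)) = 0; only function values are used, so a nonsmooth V poses no difficulty. The objective, direction, step size and bracket below are toy choices of ours.

```python
import numpy as np

# One derivative-free update along a unit direction d. With beta = -alpha
# (slide notation x_{k+1} = x_k - alpha*d_k), beta is a nonzero root of
#   h(beta) = beta**2 + tau*(V(x + beta*d) - V(x)),
# found here by bisection using function values only, so the nonsmooth
# l1 objective is unproblematic. V, x, d, tau, bracket: toy choices.
V = lambda z: np.abs(z).sum()
x = np.array([0.3, -1.0])
d = np.array([1.0, 0.0])
tau = 1.0

h = lambda b: b**2 + tau * (V(x + b * d) - V(x))
lo, hi = -2.0, -1e-12          # h(lo) > 0 > h(hi) for this example
for _ in range(80):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
beta = 0.5 * (lo + hi)
x_new = x + beta * d

# Dissipative structure from the slide: V drops by |x_new - x|^2 / tau.
print(beta)                    # about -0.422 here
print(V(x_new) - V(x))         # negative
```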


Page 11:

Motivations

When the function is nonsmooth, nonconvex, and black-box

Bilevel optimisation of variational regularisation problems
Parameter optimisation of model simulations

‘Optimal camera placement to measure distances regarding static and dynamic obstacles’ [Hanel et al. (2012)]

When gradients are computationally expensive

When problem is poorly conditioned or stiff.


Page 12:

The DG method for nonsmooth, nonconvex optimisation


Page 13:

Clarke subdifferential for nonsmooth, nonconvex analysis

The Clarke subdifferential1 ∂V (x) introduced by F. Clarke (1973).

∂V(x) = co{p : ∃ x_k → x with ∇V(x_k) → p}  (convex hull of limiting gradients).

Generalises gradient, and classical subdifferential for convex functions.

Nice analytical properties for locally Lipschitz continuous functions (outer semicontinuity; mean value theorem; convex, compact, non-empty sets).

x is Clarke stationary when 0 ∈ ∂V (x).

Clarke directional derivative:

V°(x; d) := limsup_{y→x, λ↓0} (V(y + λd) − V(y))/λ

0 ∈ ∂V (x) ⇐⇒ V o(x ; d) ≥ 0 for all d ∈ Sn−1

1 Clarke (1990).

Page 14:

Ingredients of proof

Want to prove accumulation points of (xk)k∈N are Clarke stationary.

Local Lipschitz continuity of V ⇒ upper semicontinuity of V°(· ; ·) (closely related to outer semicontinuity of ∂V).

‖x_{k+1} − x_k‖ → 0,  (V(x_{k+1}) − V(x_k))/‖x_{k+1} − x_k‖ → 0.

(x_{k_j}, d_{k_j}) → (x*, d*) and liminf_{j→∞} V°(x_{k_j}; d_{k_j}) ≥ 0  ⇒  V°(x*; d*) ≥ 0.

Need to ensure there exists a dense subsequence (d_{k_j}) corresponding to x_{k_j} → x*.

For random (d_k)_{k∈N}, this is equivalent to the distribution of d_k having full support on S^{n−1}. For deterministic (d_k), it is equivalent to “cyclical” density.


Page 15:

Main theorem

Theorem (Ehrhardt, Riis, Quispel, Schönlieb (2018))

Let V: R^n → R be a locally Lipschitz continuous, coercive function that is bounded below. Suppose (x_k) are the iterates of the generalised Itoh-Abe DG method with an appropriate sequence of directions (d_k)_{k∈N}. Then

The iterates converge to a nonempty, connected, compact set of accumulation points.

All accumulation points are Clarke stationary.

V takes the same value at all accumulation points.


Page 16:

Other properties of DG method

When V is C1-smooth, the DG methods inherit properties from gradient descent/flow:

Convergence rates:1 V (xk)− V ∗ → 0.

O(1/k) if V is convex.
Linear if V is a Polyak–Łojasiewicz function / strongly convex.
The Itoh–Abe method has a marginally better convergence rate than coordinate descent.

Kurdyka–Łojasiewicz inequality ⇒ (x_k)_{k∈N} converges.

Properties hold for all time steps 0 < c < τ_k < C.

1 Ehrhardt, Riis, Ringholm, Schönlieb (2018).

Page 17:

Rosenbrock function

[Figure: contour plot of the Rosenbrock function with iterates in (x1, x2) (left); relative objective versus number of function evaluations for the standard Itoh-Abe, rotated Itoh-Abe, and random pursuit methods (right).]
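The standard (cyclic) Itoh-Abe method in this experiment can be sketched as below; the step size, tolerance and iteration count are our own choices, not those behind the figure.

```python
import numpy as np

# Sketch of the cyclic (standard) Itoh-Abe DG method on the Rosenbrock
# function, in the spirit of the slide's experiment. Each coordinate
# update solves beta**2 + tau*(V(x + beta*e_i) - V(x)) = 0 for a
# nonzero root by bracketing and bisection, using function values only.
# tau, eps and the iteration count are our own choices.
V = lambda z: (1.0 - z[0])**2 + 100.0 * (z[1] - z[0]**2)**2

def iadg_step(V, x, i, tau, eps=1e-8):
    e = np.zeros_like(x); e[i] = 1.0
    phi = lambda b: b**2 + tau * (V(x + b * e) - V(x))
    s = -np.sign(V(x + eps * e) - V(x))       # move downhill along e_i
    if s == 0.0 or phi(s * eps) >= 0.0:
        return x                              # coordinate (near) stationary
    lo, hi = s * eps, s * 1.0
    while phi(hi) < 0.0:                      # expand until sign change
        hi *= 2.0
    for _ in range(100):                      # bisection: phi(lo) < 0 <= phi(hi)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) < 0.0 else (lo, mid)
    return x + 0.5 * (lo + hi) * e

x, tau = np.array([-1.0, 1.5]), 0.1
for _ in range(2000):
    for i in range(2):
        x = iadg_step(V, x, i, tau)

print(x)      # tends toward the minimiser (1, 1)
print(V(x))   # each accepted step decreases V by construction
```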


Page 18:

Beyond gradient flow


Page 19:

Bregman distance and inverse scale space flow

Let J : Rn → R be convex, e.g.

J(x) = γ‖x‖1 or γ‖x‖1 + ‖x‖2/2 or γ TV(x).

Define the Bregman distance (notion of distance induced by J)

D^p_J(x, y) = J(y) − J(x) − ⟨p, y − x⟩ ≥ 0,  p ∈ ∂J(x).

For the inverse problem b = Ax, we want to solve

min_x J(x) s.t. b = Ax,  or  min_x V(x) + λJ(x),

where V(x) = ‖Ax − b‖²/2.

Consider the inverse scale space (ISS) flow1

∂_t p(t) = −∇V(x(t)),  p(t) ∈ ∂J(x(t)).

1 Burger, Gilboa, Osher, Xu (2006).

Page 20:

Discretisations of inverse scale space

∂_t p(t) = −∇V(x(t)),  p(t) ∈ ∂J(x(t)).

Backward Euler → Bregman iterations1:

p_{k+1} = p_k − τ_k ∇V(x_{k+1}),  p_{k+1} ∈ ∂J(x_{k+1})

⇐⇒ x_{k+1} = arg min_x τ_k V(x) + D^{p_k}_J(x_k, x).

Forward Euler → Linearised Bregman iterations1:

p_{k+1} = p_k − τ_k ∇V(x_k),  p_{k+1} ∈ ∂J(x_{k+1})

⇐⇒ x_{k+1} = arg min_x τ_k (V(x_k) + ⟨∇V(x_k), x − x_k⟩) + D^{p_k}_J(x_k, x).

DG method → Bregman DG method:

p_{k+1} = p_k − τ_k ∇V(x_k, x_{k+1}),  p_{k+1} ∈ ∂J(x_{k+1}).

1 Benning, Burger (2018).
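For the linearised Bregman iteration, the update is fully explicit once J is chosen so that p ∈ ∂J(x) can be inverted in closed form. Below is a minimal sketch for V(x) = ‖Ax − b‖²/2 and the strongly convex J(x) = γ‖x‖₁ + ‖x‖²/2, for which the inversion is soft-thresholding, x = shrink(p, γ); all problem data are synthetic choices of ours.

```python
import numpy as np

# Minimal sketch of the linearised Bregman iteration (forward Euler on
# the ISS flow) for V(x) = ||Ax - b||^2/2 and the strongly convex
# J(x) = gamma*||x||_1 + ||x||^2/2. For this J, the inclusion
# p in dJ(x) inverts in closed form by soft-thresholding.
# Problem data below are synthetic choices of ours, not from the slides.
rng = np.random.default_rng(1)
m, n, gamma, tau = 30, 50, 1.0, 0.1
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[3, 17, 40]] = [2.0, -1.5, 1.0]       # sparse ground truth
b = A @ x_true

shrink = lambda p, g: np.sign(p) * np.maximum(np.abs(p) - g, 0.0)

x, p = np.zeros(n), np.zeros(n)
for _ in range(5000):
    p = p - tau * (A.T @ (A @ x - b))        # p_{k+1} = p_k - tau*grad V(x_k)
    x = shrink(p, gamma)                     # x_{k+1} from p_{k+1} in dJ(x_{k+1})

print(np.linalg.norm(A @ x - b))             # residual becomes small
print(np.count_nonzero(np.abs(x) > 1e-3))    # iterate stays sparse
```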

Page 21:

Bregman Itoh–Abe method

p^{k+1}_i = p^k_i − τ^k_i (V(x^{k+1}_1, …, x^{k+1}_i, x^k_{i+1}, …, x^k_n) − V(x^{k+1}_1, …, x^{k+1}_{i−1}, x^k_i, …, x^k_n)) / (x^{k+1}_i − x^k_i).

If V is continuous, and J is continuous and strongly convex, then the updates are well-defined.

If V is convex, updates are unique.

Iterates (x_k)_{k∈N} converge to Clarke stationary points (under a regularity assumption).

Implications:

Can adapt pre-existing methods to incorporate bias (e.g. sparsity, variational regularisation problems)
Handles nonsmoothness better (e.g. kinks of ‖·‖1)


Page 22:

Example: Bregman Itoh–Abe for linear systems (SOR) (1/2)

Figure: Top left: ground truth. Top right: normalised residual of the data fidelity. Bottom: error in the support of the iterate, supp(x_k), relative to the support of the ground truth, supp(x*).


Page 23:

Example: Bregman Itoh–Abe for linear systems (SOR) (2/2)

Figure: Top left: ground truth. Top right: normalised residual of the data fidelity. Bottom: error in the support of the iterate, supp(x_k), relative to the support of the ground truth, supp(x*).


Page 24:

Outlook

Accelerate Itoh–Abe DG method.

Apply discrete gradient methods to gradient flow under different metrics (e.g. optimal transport)


Page 25:

Thank you for your attention!

Relevant papers

1 Riis, Ehrhardt, Quispel, Schönlieb. A geometric integration approach to nonsmooth, nonconvex optimisation. (2018, preprint)

2 Ehrhardt, Riis, Ringholm, Schönlieb. A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method. (2018, preprint)

3 Grimm, McLachlan, McLaren, Quispel, Schönlieb. Discrete gradient methods for solving variational image regularisation models. (2017)

4 Ringholm, Lazic, Schönlieb. Variational image regularization with Euler’s elastica using a discrete gradient scheme. (2017)
