
A geometric integration approach to non-smooth and non-convex optimisation

Martin Benning¹, Matthias Ehrhardt², G.R.W. Quispel³, Erlend Skaldehaug Riis⁴, Torbjørn Ringholm⁵, Carola-Bibiane Schönlieb⁴

¹ Queen Mary University of London, UK; ² University of Bath, UK

³ La Trobe University, Australia; ⁴ University of Cambridge, UK

⁵ Norwegian University of Science and Technology, Norway

Variational Methods and Optimization in Imaging, IHP, Paris


Outline

1 Geometric numerical integration and the discrete gradient method

2 The DG method for nonsmooth, nonconvex optimisation

3 Beyond gradient flow


Geometric numerical integration and the discrete gradient method


Optimisation and numerical integration

$\min_{x \in \mathbb{R}^n} V(x), \qquad \dot x(t) = -\nabla V(x(t)) \quad$ (gradient flow)

Forward Euler → $x^{k+1} = x^k - \tau \nabla V(x^k)$,
Backward Euler → $x^{k+1} = x^k - \tau \nabla V(x^{k+1})$.
(A small numerical sketch of these two schemes appears at the end of this slide.)

Numerical integration and analysis of ODEs

Optimisation scheme → ODE
Accelerated gradient descent → $\ddot x + \frac{3}{t}\,\dot x = -\nabla V(x)$ [Su, Boyd, Candès (2016)]

Numerical integration tools to improve efficiency
Runge–Kutta methods for stiff ODEs → larger time steps [Eftekhari, Vandereycken, Vilmart, Zygalakis (2018)]

Discretisation methods for structure preservation of ODEs
Symmetry preservation → acceleration phenomenon [Betancourt, Jordan, Wilson (2018)]
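To make the forward/backward Euler correspondence concrete, here is a minimal Python sketch (not from the slides; the quadratic $V$, the matrix A, and the step size tau are hypothetical). Forward Euler is exactly gradient descent, while the backward Euler step is implicit and, for a quadratic, reduces to a linear solve.

```python
import numpy as np

# Minimal sketch: forward vs. backward Euler for the gradient flow
# x'(t) = -grad V(x(t)) with the quadratic test function V(x) = 0.5 x^T A x,
# whose gradient is A x.  For this V the implicit (backward Euler) step can be
# solved exactly as a linear system.
A = np.array([[3.0, 1.0], [1.0, 2.0]])      # hypothetical SPD matrix
grad_V = lambda x: A @ x
tau = 0.1                                    # hypothetical step size

def forward_euler(x):
    # x^{k+1} = x^k - tau * grad V(x^k)   (explicit: gradient descent)
    return x - tau * grad_V(x)

def backward_euler(x):
    # x^{k+1} = x^k - tau * grad V(x^{k+1})  <=>  (I + tau*A) x^{k+1} = x^k
    return np.linalg.solve(np.eye(2) + tau * A, x)

x_fe = x_be = np.array([1.0, -1.0])
for _ in range(50):
    x_fe, x_be = forward_euler(x_fe), backward_euler(x_be)
print(x_fe, x_be)   # both approach the minimiser x* = 0
```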


Geometric integration and discrete gradients

Geometric numerical integration
ODEs have structure (conservation laws, symplectic structure, ...)
Aim: Preserve structure when numerically solving ODEs

Discrete gradient (DG) method
Preserves first integrals; energy conservation and dissipation laws, Lyapunov functions¹

Optimisation methods
Want to solve $\min_{x\in\mathbb{R}^n} V(x)$.
Apply DG method to gradient flow to preserve dissipative structure

Figure: Dahlby, Owren, Yaguchi (2011).

‘Why geometric numerical integration?’ [Iserles, Quispel (2018)]
¹ McLachlan, Quispel, and Robidoux (1999).


Discrete gradient method

Definition

For a smooth function $V : \mathbb{R}^n \to \mathbb{R}$, a discrete gradient $\overline\nabla V(x, y)$ satisfies

(i) $\lim_{y \to x} \overline\nabla V(x, y) = \nabla V(x)$ (consistency),

(ii) $\langle \overline\nabla V(x, y), y - x \rangle = V(y) - V(x)$ (mean value).

Discrete gradient method (for $\min_{x\in\mathbb{R}^n} V(x)$, $V : \mathbb{R}^n \to \mathbb{R}$):

$$x^{k+1} = x^k - \tau\, \overline\nabla V(x^k, x^{k+1})$$

Dissipative:

$$\frac{V(x^{k+1}) - V(x^k)}{\tau} = \left\langle \frac{x^{k+1} - x^k}{\tau},\, \overline\nabla V(x^k, x^{k+1}) \right\rangle = -\big\|\overline\nabla V(x^k, x^{k+1})\big\|^2 = -\left\|\frac{x^{k+1} - x^k}{\tau}\right\|^2.$$

Gradient flow:

$$\dot x(t) = -\nabla V(x(t)).$$

$$\frac{d}{dt} V(x(t)) = \langle \dot x(t), \nabla V(x(t)) \rangle = -\|\nabla V(x(t))\|^2 = -\|\dot x(t)\|^2.$$
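As a sanity check of the dissipation identity, here is a minimal Python sketch (assumption: a quadratic test function $V(x) = \tfrac12 x^\top A x$ with $A$ symmetric positive definite, for which the mean-value discrete gradient has the closed form $\overline\nabla V(x,y) = A(x+y)/2$). The implicit DG update becomes a linear solve, and the identity holds to machine precision for any step size.

```python
import numpy as np

# Minimal sketch: DG method for the quadratic V(x) = 0.5 x^T A x.  For this V
# the mean-value discrete gradient is DG(x, y) = A (x + y) / 2, so the update
#   x^{k+1} = x^k - tau * DG(x^k, x^{k+1})
# reduces to  (I + tau*A/2) x^{k+1} = (I - tau*A/2) x^k,  and the discrete
# dissipation identity  V(x^{k+1}) - V(x^k) = -(1/tau)*||x^{k+1} - x^k||^2
# can be verified numerically.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # hypothetical SPD matrix
V = lambda x: 0.5 * x @ A @ x
tau, n = 0.5, A.shape[0]

def dg_step(x):
    return np.linalg.solve(np.eye(n) + 0.5 * tau * A,
                           (np.eye(n) - 0.5 * tau * A) @ x)

x = np.array([1.0, -2.0])
for _ in range(5):
    x_new = dg_step(x)
    lhs = V(x_new) - V(x)                              # actual decrease
    rhs = -np.linalg.norm(x_new - x) ** 2 / tau        # predicted by identity
    print(f"decrease {lhs:+.3e}   -(1/tau)||dx||^2 {rhs:+.3e}")
    x = x_new
```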


Convergence theorem for DG method

Theorem¹

Suppose V is $C^1$-smooth, coercive, and bounded below, $0 < c < \tau < C$, and $x^{k+1} = x^k - \tau \overline\nabla V(x^k, x^{k+1})$. Then

$\nabla V(x^k) \to 0$,

$\|x^{k+1} - x^k\| \to 0$,

$(x^k)$ has an accumulation point $x^*$, and it satisfies $\nabla V(x^*) = 0$.

Figure: Inpainting with the Itoh–Abe DG method.¹

¹ Grimm, McLachlan, McLaren, Quispel and Schönlieb (2017).

Examples of discrete gradients

Gonzalez (midpoint) DG¹:

$$\overline\nabla V(x, y) = \nabla V\!\left(\frac{x+y}{2}\right) + \frac{V(y) - V(x) - \left\langle \nabla V\!\left(\frac{x+y}{2}\right),\, y - x \right\rangle}{\|x - y\|^2}\,(y - x).$$

Mean value DG²:

$$\overline\nabla V(x, y) = \int_0^1 \nabla V\big((1-s)x + sy\big)\, ds.$$

Itoh–Abe (coordinate increment) DG³:

$$\overline\nabla V(x, y) = \begin{pmatrix} \dfrac{V(y_1, x_2, \ldots, x_n) - V(x_1, x_2, \ldots, x_n)}{y_1 - x_1} \\[2ex] \dfrac{V(y_1, y_2, x_3, \ldots, x_n) - V(y_1, x_2, x_3, \ldots, x_n)}{y_2 - x_2} \\[1ex] \vdots \\[1ex] \dfrac{V(y_1, \ldots, y_n) - V(y_1, \ldots, y_{n-1}, x_n)}{y_n - x_n} \end{pmatrix}.$$
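The Itoh–Abe discrete gradient only needs function evaluations, so it is easy to implement directly. Below is a minimal Python sketch (not from the slides; the test function is hypothetical) that builds the coordinate-increment vector and checks the mean value property, which holds exactly by a telescoping sum.

```python
import numpy as np

# Minimal sketch: the Itoh-Abe (coordinate increment) discrete gradient, plus a
# numerical check of the mean value property  <DG(x, y), y - x> = V(y) - V(x).
# Assumes y_i != x_i for every coordinate i.
def itoh_abe_dg(V, x, y):
    g = np.empty_like(x)
    z = x.astype(float).copy()
    for i in range(len(x)):
        v_old = V(z)              # V(y_1, ..., y_{i-1}, x_i, ..., x_n)
        z[i] = y[i]
        g[i] = (V(z) - v_old) / (y[i] - x[i])
    return g

# Hypothetical nonsmooth test function (no gradients needed).
V = lambda x: np.abs(x[0] - 1.0) + 0.5 * (x[0] ** 2 + x[1] ** 2) + np.abs(x[1])
x = np.array([0.3, -1.2])
y = np.array([1.1, 0.4])
g = itoh_abe_dg(V, x, y)
print(g @ (y - x), V(y) - V(x))   # the two numbers agree
```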

¹ Gonzalez (1996). ² Celledoni, Grimm, et al. (2012). ³ Itoh and Abe (1988).

Itoh-Abe discrete gradient (IADG) method

Applications

Image inpainting with Euler's elastica (nonconvex).¹

Successive over-relaxation (SOR) and the Gauss–Seidel method, for linear systems $Ax = b$.²

Kaczmarz methods (by extension)

Figure: Inpainting with the Itoh–Abe DG method.¹

¹ Ringholm, Lazić and Schönlieb (2017). ² Miyatake, Sogabe, Zhang (2017).

Itoh–Abe method for nonsmooth, nonconvex optimisation

Rewrite the IADG method as

$$x^{k+1} = x^k - \alpha d^k, \quad \text{where } \alpha \text{ solves } \ \alpha = -\tau_k\, \frac{V(x^k - \alpha d^k) - V(x^k)}{\alpha}, \qquad d^k \in S^{n-1} := \{d \in \mathbb{R}^n : \|d\| = 1\}. \qquad (1)$$

(A code sketch of one such update appears after the bullet points below.)

Derivative-free gradient flow dissipative structure

$$V(x^{k+1}) - V(x^k) = -\frac{1}{\tau_k}\,\|x^{k+1} - x^k\|^2 = -\tau_k \left(\frac{V(x^{k+1}) - V(x^k)}{\|x^{k+1} - x^k\|}\right)^2$$

Well-defined for nonsmooth functions; computationally tractable

Descends along directions $(d^k)_{k\in\mathbb{N}}$.
Standard IADG: let $(d^k)$ cycle through the coordinates $e^i$.

Can also randomise: draw $d^k$ randomly from $(e^i)_{i=1}^n$ or from $S^{n-1}$.
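A possible implementation of update (1): for a fixed direction $d^k$, the step length is a nonzero root of the scalar function $\varphi(\alpha) = \alpha^2 + \tau_k\,\big(V(x^k - \alpha d^k) - V(x^k)\big)$. The Python sketch below (not the authors' reference code; the bracketing strategy and the test function are ad hoc assumptions) brackets such a root and solves it with Brent's method.

```python
import numpy as np
from scipy.optimize import brentq

# Minimal sketch: one generalised Itoh-Abe update x^{k+1} = x^k - alpha*d along
# a unit direction d.  A nonzero alpha solving
#   alpha = -tau*(V(x - alpha*d) - V(x))/alpha
# is a root of phi(alpha) = alpha**2 + tau*(V(x - alpha*d) - V(x)).
def itoh_abe_step(V, x, d, tau, a_min=1e-8, a_max=1.0, expansions=40):
    Vx = V(x)
    phi = lambda a: a * a + tau * (V(x - a * d) - Vx)
    for sign in (1.0, -1.0):              # try the direction and its negation
        lo, hi = sign * a_min, sign * a_max
        for _ in range(expansions):       # grow the bracket until a sign change
            if phi(lo) * phi(hi) < 0:
                alpha = brentq(phi, min(lo, hi), max(lo, hi))
                return x - alpha * d
            hi *= 2.0
    return x                              # no admissible step; leave x unchanged

# Usage on a hypothetical nonsmooth, nonconvex function (no gradients needed).
V = lambda z: np.abs(z[0]) + 0.1 * np.sin(5 * z[1]) + 0.5 * np.dot(z, z)
x, tau = np.array([2.0, -1.5]), 0.5
for k in range(20):
    d = np.eye(2)[k % 2]                  # cycle through coordinate directions
    x = itoh_abe_step(V, x, d, tau)
print(x, V(x))
```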


Motivations

When the function is nonsmooth, nonconvex, and black-box

Bilevel optimisation of variational regularisation problems
Parameter optimisation of model simulations

‘Optimal camera placement to measure distances regarding staticand dynamic obstacles’ [Hanel et al. (2012)]

When gradients are computationally expensive

When the problem is poorly conditioned or stiff.


The DG method for nonsmooth, nonconvex optimisation


Clarke subdifferential for nonsmooth, nonconvex analysis

The Clarke subdifferential¹ $\partial V(x)$ was introduced by F. Clarke (1973).

$$\partial V(x) = \mathrm{co}\big\{p : \exists\, x^k \to x \text{ such that } \nabla V(x^k) \to p\big\} \qquad \text{(convex hull of limiting gradients).}$$

Generalises the gradient, and the classical subdifferential for convex functions.

Nice analytical properties for locally Lipschitz continuous functions (outer semicontinuity; mean value theorem; convex, compact, non-empty sets).

x is Clarke stationary when 0 ∈ ∂V (x).

Clarke directional derivative:

$$V^{o}(x; d) := \limsup_{y \to x,\ \lambda \downarrow 0} \frac{V(y + \lambda d) - V(y)}{\lambda}$$

$$0 \in \partial V(x) \iff V^{o}(x; d) \geq 0 \ \text{ for all } d \in S^{n-1}$$
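A quick worked example of the limiting-gradient construction (not on the slide): for the absolute value function,

$$V(x) = |x|: \qquad \nabla V(x) = \operatorname{sign}(x) \ \ (x \neq 0), \qquad \partial V(0) = \operatorname{co}\{-1, +1\} = [-1, 1] \ni 0,$$

so $x = 0$ is Clarke stationary, consistent with $V^{o}(0; d) = |d| \geq 0$ for all directions $d$.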

¹ Clarke (1990).

Ingredients of proof

Want to prove that accumulation points of $(x^k)_{k\in\mathbb{N}}$ are Clarke stationary.

Local Lipschitz continuity of V ⇒ upper semicontinuity of $V^{o}(\cdot\,;\cdot)$ (closely related to outer semicontinuity of $\partial V$).

$\|x^{k+1} - x^k\| \to 0$, $\quad \dfrac{V(x^{k+1}) - V(x^k)}{\|x^{k+1} - x^k\|} \to 0$.

$(x^{k_j}, d^{k_j}) \to (x^*, d^*)$ and $\liminf_{j\to\infty} V^{o}(x^{k_j}; d^{k_j}) \geq 0$ ⇒ $V^{o}(x^*; d^*) \geq 0$.

Need to ensure there exists a dense subsequence $(d^{k_j})$ corresponding to $x^{k_j} \to x^*$.

For random $(d^k)_{k\in\mathbb{N}}$: equivalent to full support of the distribution of $d^k$ on $S^{n-1}$.
For deterministic $(d^k)$: equivalent to "cyclical" density.


Main theorem

Theorem (Ehrhardt, Riis, Quispel, Schönlieb (2018))

Let $V : \mathbb{R}^n \to \mathbb{R}$ be a locally Lipschitz continuous, coercive function that is bounded below. Suppose $(x^k)$ are the iterates from the generalised Itoh–Abe DG method with an appropriate sequence of directions $(d^k)_{k\in\mathbb{N}}$. Then

The iterates converge to a nonempty, connected, compact set of accumulation points.

All accumulation points are Clarke stationary.

V takes the same value at all accumulation points.


Other properties of DG method

When V is $C^1$-smooth, the DG methods inherit properties from gradient descent/flow:

Convergence rates:¹ $V(x^k) - V^* \to 0$,

$O(1/k)$ if V is convex.
Linear if V is a Polyak–Łojasiewicz function / strongly convex.
The Itoh–Abe method has a marginally better convergence rate than coordinate descent.

Kurdyka–Łojasiewicz inequality ⟹ $(x^k)_{k\in\mathbb{N}}$ converges.

Properties hold for all time steps $c < \tau_k < C$, for constants $c, C > 0$.

¹ Ehrhardt, Riis, Ringholm, Schönlieb (2018).

Rosenbrock function

Figure: The Rosenbrock function. Left: the function over $(x_1, x_2)$. Right: relative objective against the number of function evaluations (up to about 12000) for the Standard Itoh–Abe, Rotated Itoh–Abe, and Random pursuit methods.


Beyond gradient flow


Bregman distance and inverse scale space flow

Let $J : \mathbb{R}^n \to \mathbb{R}$ be convex, e.g.

$J(x) = \gamma\|x\|_1$ or $\gamma\|x\|_1 + \|x\|^2/2$ or $\gamma\,\mathrm{TV}(x)$.

Define the Bregman distance (a notion of distance induced by J; a small numerical sketch follows at the end of this slide):

$$D_J^p(x, y) = J(y) - J(x) - \langle p, y - x \rangle \geq 0, \qquad p \in \partial J(x).$$

For the inverse problem $b = Ax$, want to solve

$$\min_x J(x) \ \text{ s.t. } \ b = Ax \qquad \text{or} \qquad \min_x V(x) + \lambda J(x),$$

where $V(x) = \|Ax - b\|^2/2$.

Consider the inverse scale space (ISS) flow¹

$$\partial_t p(t) = -\nabla V(x(t)), \qquad p(t) \in \partial J(x(t)).$$
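For concreteness, a minimal Python sketch of the Bregman distance (not from the slides; the weight gamma and the particular subgradient choice are assumptions) for the strongly convex example $J(x) = \gamma\|x\|_1 + \tfrac12\|x\|^2$:

```python
import numpy as np

# Minimal sketch: the Bregman distance D_J^p(x, y) for the elastic-net-type
# choice J(x) = gamma*||x||_1 + 0.5*||x||^2, using the particular subgradient
# p = gamma*sign(x) + x, which is one valid element of partial J(x).
gamma = 0.8                                   # hypothetical regularisation weight
J = lambda x: gamma * np.sum(np.abs(x)) + 0.5 * np.dot(x, x)
subgrad_J = lambda x: gamma * np.sign(x) + x

def bregman_distance(x, y):
    # D_J^p(x, y) = J(y) - J(x) - <p, y - x>  >= 0 for any p in partial J(x)
    p = subgrad_J(x)
    return J(y) - J(x) - p @ (y - x)

x = np.array([1.0, 0.0, -2.0])
y = np.array([0.5, 0.3, -2.0])
print(bregman_distance(x, y), bregman_distance(y, x))  # both nonnegative
```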

¹ Burger, Gilboa, Osher, Xu (2006).

Discretisations of inverse scale space

$$\partial_t p(t) = -\nabla V(x(t)), \qquad p(t) \in \partial J(x(t)).$$

Backward Euler → Bregman iterations¹:

$$p^{k+1} = p^k - \tau_k \nabla V(x^{k+1}), \quad p^{k+1} \in \partial J(x^{k+1})$$
$$\iff \ x^{k+1} = \arg\min_x\ \tau_k V(x) + D_J^{p^k}(x^k, x).$$

Forward Euler → Linearised Bregman iterations¹ (a Python sketch for a specific J follows below):

$$p^{k+1} = p^k - \tau_k \nabla V(x^k), \quad p^{k+1} \in \partial J(x^{k+1})$$
$$\iff \ x^{k+1} = \arg\min_x\ \tau_k\big(V(x^k) + \langle \nabla V(x^k), x - x^k \rangle\big) + D_J^{p^k}(x^k, x).$$

DG method → Bregman DG method:

$$p^{k+1} = p^k - \tau_k \overline\nabla V(x^k, x^{k+1}), \quad p^{k+1} \in \partial J(x^{k+1}).$$
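As referenced above, here is a minimal Python sketch of linearised Bregman iterations (not from the slides; the problem sizes, gamma, and the step-size choice are hypothetical) for $V(x) = \tfrac12\|Ax - b\|^2$ and $J(x) = \gamma\|x\|_1 + \tfrac12\|x\|^2$, for which recovering $x^{k+1}$ from $p^{k+1} \in \partial J(x^{k+1})$ is explicit soft-thresholding.

```python
import numpy as np

# Minimal sketch: linearised Bregman iterations for V(x) = 0.5*||A x - b||^2
# and J(x) = gamma*||x||_1 + 0.5*||x||^2.  For this J the inversion of
# p^{k+1} in partial J(x^{k+1}) is the soft-thresholding
#   x^{k+1} = sign(p^{k+1}) * max(|p^{k+1}| - gamma, 0).
rng = np.random.default_rng(0)
n, m, gamma = 50, 200, 1.0                   # hypothetical problem sizes
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)          # sparse ground truth
b = A @ x_true
tau = 1.0 / np.linalg.norm(A, 2) ** 2        # conservative step size

x, p = np.zeros(n), np.zeros(n)
for k in range(2000):
    p = p - tau * A.T @ (A @ x - b)          # p^{k+1} = p^k - tau * grad V(x^k)
    x = np.sign(p) * np.maximum(np.abs(p) - gamma, 0.0)   # x^{k+1} from p^{k+1}
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))      # relative residual shrinks
```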

¹ Benning, Burger (2018).

Bregman Itoh–Abe method

$$p^{k+1}_i = p^k_i - \tau^k_i\, \frac{V(x^{k+1}_1, \ldots, x^{k+1}_i, x^k_{i+1}, \ldots, x^k_n) - V(x^{k+1}_1, \ldots, x^{k+1}_{i-1}, x^k_i, \ldots, x^k_n)}{x^{k+1}_i - x^k_i}.$$

If V is continuous, and J is continuous and strongly convex, then updates are well-defined.

If V is convex, updates are unique.

Iterates $(x^k)_{k\in\mathbb{N}}$ converge to Clarke stationary points (under regularity assumption).

Implications:

Can adapt pre-existing methods to incorporate bias (e.g. sparsity, variational regularisation problems)
Handles nonsmoothness better (e.g. $\|\cdot\|_1$ kinks)


Example: Bregman Itoh–Abe for linear systems (SOR) (1/2)

Figure: Top left: Ground truth. Top right: Normalised residual of the data fidelity. Bottom: Error in support of the iterate, $\mathrm{supp}(x^k)$, relative to the support of the ground truth, $\mathrm{supp}(x^*)$.


Example: Bregman Itoh–Abe for linear systems (SOR) (2/2)

Figure: Top left: Ground truth. Top right: Normalised residual of the data fidelity. Bottom: Error in support of the iterate, $\mathrm{supp}(x^k)$, relative to the support of the ground truth, $\mathrm{supp}(x^*)$.


Outlook

Accelerate Itoh–Abe DG method.

Apply discrete gradient methods to gradient flow under different metrics (e.g. optimal transport)


Thank you for your attention!

Relevant papers

1 Riis, Ehrhardt, Quispel, Schönlieb. A geometric integration approach to nonsmooth, nonconvex optimisation. (2018, preprint)

2 Ehrhardt, Riis, Ringholm, Schönlieb. A geometric integration approach to smooth optimisation: Foundations of the discrete gradient method. (2018, preprint)

3 Grimm, McLachlan, McLaren, Quispel, Schönlieb. Discrete gradient methods for solving variational image regularisation models. (2017)

4 Ringholm, Lazić, Schönlieb. Variational image regularization with Euler's elastica using a discrete gradient scheme. (2017)

