A geometric integration approach to non-smooth and non-convex optimisation
Martin Benning¹, Matthias Ehrhardt², G.R.W. Quispel³, Erlend Skaldehaug Riis⁴, Torbjørn Ringholm⁵, Carola-Bibiane Schönlieb⁴
¹Queen Mary University of London, UK  ²University of Bath, UK  ³La Trobe University, Australia  ⁴University of Cambridge, UK  ⁵Norwegian University of Science and Technology, Norway
Variational Methods and Optimization in Imaging, IHP, Paris
Erlend S. Riis, Nonsmooth, nonconvex optimisation, IHP, February 2019
Outline
1 Geometric numerical integration and the discrete gradient method
2 The DG method for nonsmooth, nonconvex optimisation
3 Beyond gradient flow
Geometric numerical integration and the discrete gradient method
Optimisation and numerical integration
min_{x∈R^n} V(x),   ẋ(t) = −∇V(x(t))   (gradient flow)
Forward Euler → x_{k+1} = x_k − τ∇V(x_k)
Backward Euler → x_{k+1} = x_k − τ∇V(x_{k+1})
Numerical integration and analysis of ODEs
Optimisation scheme → ODE
Accelerated gradient descent → ẍ + (3/t)ẋ = −∇V(x) [Su, Boyd, Candès (2016)]
Numerical integration tools to improve efficiency
Runge-Kutta methods for stiff ODEs → larger time steps [Eftekhari, Vandereycken, Vilmart, Zygalakis (2018)]
Discretisation methods for structure preservation of ODEs
Symmetry preservation → acceleration phenomenon [Betancourt, Jordan, Wilson (2018)]
Geometric integration and discrete gradients
Geometric numerical integration
ODEs have structure (conservation laws, symplectic structure).
Aim: preserve this structure when numerically solving the ODE.
Discrete gradient (DG) method
Preserves first integrals: energy conservation and dissipation laws, Lyapunov functions.1
Optimisation methods
Want to solve min_{x∈R^n} V(x).
Apply the DG method to gradient flow to preserve its dissipative structure.
Figure: Dahlby, Owren, Yaguchi (2011).
‘Why geometric numerical integration?’ [Iserles, Quispel (2018)]
1 McLachlan, Quispel, Robidoux (1999).
Discrete gradient method
Definition
For a smooth function V: R^n → R, a discrete gradient ∇V(x, y) satisfies
(i) lim_{y→x} ∇V(x, y) = ∇V(x) (consistency),
(ii) 〈∇V(x, y), y − x〉 = V(y) − V(x) (mean value).
To solve min V: R^n → R:
Discrete gradient method.
x_{k+1} = x_k − τ∇V(x_k, x_{k+1})
Dissipative:
(V(x_{k+1}) − V(x_k))/τ = 〈(x_{k+1} − x_k)/τ, ∇V(x_k, x_{k+1})〉
= −‖∇V(x_k, x_{k+1})‖²
= −‖(x_{k+1} − x_k)/τ‖².
Gradient flow.
ẋ(t) = −∇V(x(t)).
(d/dt) V(x(t)) = 〈ẋ(t), ∇V(x(t))〉
= −‖∇V(x(t))‖²
= −‖ẋ(t)‖².
Convergence theorem for DG method
Theorem1
Suppose V is C^1-smooth, coercive, and bounded below, that 0 < c < τ < C, and x_{k+1} = x_k − τ∇V(x_k, x_{k+1}). Then
∇V (xk)→ 0,
‖xk+1 − xk‖ → 0,
(xk) has an accumulation point x∗, and it satisfies ∇V (x∗) = 0.
Figure: Inpainting with Itoh-Abe DG method1
1 Grimm, McLachlan, McLaren, Quispel, Schönlieb (2017).
Examples of discrete gradients
Gonzalez (midpoint) DG1:
∇V(x, y) = ∇V((x + y)/2) + [(V(y) − V(x) − 〈∇V((x + y)/2), y − x〉)/‖x − y‖²] (y − x).
Mean value DG2:
∇V(x, y) = ∫₀¹ ∇V((1 − s)x + sy) ds.
Itoh-Abe (coordinate increment) DG3:
∇V(x, y) = ( (V(y_1, x_2, …, x_n) − V(x_1, x_2, …, x_n))/(y_1 − x_1),
(V(y_1, y_2, x_3, …, x_n) − V(y_1, x_2, x_3, …, x_n))/(y_2 − x_2),
…,
(V(y_1, …, y_n) − V(y_1, …, y_{n−1}, x_n))/(y_n − x_n) ).
1 Gonzalez (1996). 2 Celledoni, Grimm, et al. (2012). 3 Itoh, Abe (1988).
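The mean-value property (ii) holds for the Itoh-Abe formula exactly, by telescoping, and this is easy to check numerically. A minimal Python sketch (an illustration added here, not code from the talk; the test function V below is an arbitrary smooth, nonconvex choice):

```python
import math

def itoh_abe_dg(V, x, y):
    """Itoh-Abe (coordinate increment) discrete gradient of V at (x, y).
    Entry i uses values of V in which coordinates 1..i are taken from y and
    i+1..n from x.  Assumes y_i != x_i (a zero increment would need the
    partial derivative instead; not handled in this sketch)."""
    g = []
    for i in range(len(x)):
        upper = y[: i + 1] + x[i + 1 :]   # (y_1,...,y_i, x_{i+1},...,x_n)
        lower = y[:i] + x[i:]             # (y_1,...,y_{i-1}, x_i,...,x_n)
        g.append((V(upper) - V(lower)) / (y[i] - x[i]))
    return g

# Mean-value check on an arbitrary smooth, nonconvex V:
V = lambda z: z[0] ** 4 + z[0] * z[1] + math.sin(z[1])
x, y = [1.0, 2.0], [0.5, -1.0]
g = itoh_abe_dg(V, x, y)
lhs = sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))
rhs = V(y) - V(x)   # the sum of increments telescopes to exactly this
```

The consecutive differences telescope, so 〈∇V(x, y), y − x〉 = V(y) − V(x) holds up to floating-point rounding for any V.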
Itoh-Abe discrete gradient (IADG) method
Applications
Image inpainting with Euler’s elastica (nonconvex).1
Successive over-relaxation (SOR) and the Gauss-Seidel method, for linear systems Ax = b.2
Kaczmarz methods (by extension)
Figure: Inpainting with Itoh-Abe DG method1.
1 Ringholm, Lazic, Schönlieb (2017). 2 Miyatake, Sogabe, Zhang (2017).
Itoh–Abe method for nonsmooth, nonconvex optimisation
Rewrite IADG method as
x_{k+1} = x_k − α d_k,   s.t.   α = −τ_k (V(x_k − α d_k) − V(x_k))/α,
d_k ∈ S^{n−1} := {d ∈ R^n : ‖d‖ = 1}.   (1)
Derivative-free; retains the dissipative structure of gradient flow:
V(x_{k+1}) − V(x_k) = −(1/τ_k)‖x_{k+1} − x_k‖² = −τ_k ((V(x_{k+1}) − V(x_k))/‖x_{k+1} − x_k‖)².
Well-defined for nonsmooth functions; computationally tractable.
Descends along directions (d_k)_{k∈N}.
Standard IADG: let (d_k) cycle through the coordinates e^i.
Can also randomise: draw d_k randomly from (e^i)_{i=1}^n or from S^{n−1}.
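One update of scheme (1) can be sketched in a few lines of Python (an illustration added here, not code from the talk: the secant solver for the scalar equation and the quadratic test function are assumptions). Each accepted step satisfies V(x_{k+1}) − V(x_k) = −α²/τ_k ≤ 0, so the recorded values decrease monotonically:

```python
def iadg_step(V, x, d, tau, tol=1e-12, max_iter=50):
    """One Itoh-Abe step along direction d: find a nonzero root alpha of
    f(alpha) = alpha + tau*(V(x - alpha*d) - V(x))/alpha by the secant
    method, then return x - alpha*d.  A root alpha ~ 0 means x is
    stationary along d, in which case x is returned unchanged."""
    Vx = V(x)
    f = lambda a: a + tau * (V([xi - a * di for xi, di in zip(x, d)]) - Vx) / a
    a0, a1 = 1e-8, 1e-2                  # two nonzero initial guesses
    f0, f1 = f(a0), f(a1)
    for _ in range(max_iter):
        if abs(f1 - f0) < tol or abs(f1) < tol:
            break
        a0, a1, f0 = a1, a1 - f1 * (a1 - a0) / (f1 - f0), f1
        if abs(a1) < tol:                # stationary along d: zero step
            return list(x)
        f1 = f(a1)
    return [xi - a1 * di for xi, di in zip(x, d)]

# Cycle through coordinate directions on a smooth quadratic test function.
V = lambda z: 2.0 * z[0] ** 2 + 0.5 * z[1] ** 2
x = [1.0, -1.0]
vals = [V(x)]
for k in range(100):
    d = [1.0, 0.0] if k % 2 == 0 else [0.0, 1.0]
    x = iadg_step(V, x, d, tau=0.2)
    vals.append(V(x))
```

Note that only function values of V are used: the scheme is derivative-free, which is the point of the rewriting above.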
Motivations
When the function is nonsmooth, nonconvex, and black-box
Bilevel optimisation of variational regularisation problems
Parameter optimisation of model simulations
‘Optimal camera placement to measure distances regarding static and dynamic obstacles’ [Hanel et al. (2012)]
When gradients are computationally expensive
When the problem is poorly conditioned or stiff.
The DG method for nonsmooth, nonconvex optimisation
Clarke subdifferential for nonsmooth, nonconvex analysis
The Clarke subdifferential1 ∂V(x) was introduced by F. Clarke (1973):
∂V(x) = co{p : ∃ x_k → x with ∇V(x_k) → p}   (convex hull of limiting gradients).
Generalises the gradient, and the classical subdifferential of convex functions.
Nice analytical properties for locally Lipschitz continuous functions
(outer semicontinuity; mean value theorem; convex, compact, non-empty sets).
x is Clarke stationary when 0 ∈ ∂V (x).
Clarke directional derivative:
V°(x; d) := limsup_{y→x, λ↘0} (V(y + λd) − V(y))/λ
0 ∈ ∂V(x) ⟺ V°(x; d) ≥ 0 for all d ∈ S^{n−1}
1 Clarke (1990).
Ingredients of proof
Want to prove accumulation points of (xk)k∈N are Clarke stationary.
Local Lipschitz continuity of V ⇒ upper semicontinuity of V°(·;·) (closely related to outer semicontinuity of ∂V).
‖x_{k+1} − x_k‖ → 0,   (V(x_{k+1}) − V(x_k))/‖x_{k+1} − x_k‖ → 0.
If (x_{k_j}, d_{k_j}) → (x*, d*) and liminf_{j→∞} V°(x_{k_j}; d_{k_j}) ≥ 0, then V°(x*; d*) ≥ 0.
Need to ensure there is a dense subsequence of directions (d_{k_j}) corresponding to x_{k_j} → x*.
For random (d_k)_{k∈N}: equivalent to full support of the distribution of d_k on S^{n−1}.
For deterministic (d_k): equivalent to “cyclical” density.
Erlend S. Riis Nonsmooth, nonconvex optimisation IHP, February, 2019 14 / 26
Main theorem
Theorem (Ehrhardt, Riis, Quispel, Schönlieb (2018))
Let V: R^n → R be a locally Lipschitz continuous, coercive function that is bounded below. Suppose (x_k) are the iterates of the generalised Itoh-Abe DG method with an appropriate sequence of directions (d_k)_{k∈N}. Then
The iterates converge to a nonempty, connected, compact set of accumulation points.
All accumulation points are Clarke stationary.
V takes the same value at all accumulation points.
Other properties of DG method
When V is C^1-smooth, the DG methods inherit properties from gradient descent/flow:
Convergence rates:1 V(x_k) − V* → 0.
O(1/k) if V is convex.
Linear if V is a Polyak-Łojasiewicz function / strongly convex.
The Itoh-Abe method has a marginally better convergence rate than coordinate descent.
Kurdyka-Łojasiewicz inequality ⇒ (x_k)_{k∈N} converges.
Properties hold for all time steps c < τ_k < C, with c, C > 0.
1 Ehrhardt, Riis, Ringholm, Schönlieb (2018).
Rosenbrock function
Figure: Contour plot of the Rosenbrock function (axes x_1, x_2), and relative objective versus number of function evaluations (log scale, 10^{-10} to 10^0) for Standard Itoh-Abe, Rotated Itoh-Abe, and Random pursuit.
Beyond gradient flow
Bregman distance and inverse scale space flow
Let J : Rn → R be convex, e.g.
J(x) = γ‖x‖1 or γ‖x‖1 + ‖x‖2/2 or γ TV(x).
Define the Bregman distance (a notion of distance induced by J):
D_J^p(x, y) = J(y) − J(x) − 〈p, y − x〉 ≥ 0,   p ∈ ∂J(x).
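As a quick illustration (a sketch added here, assuming a differentiable J so that ∂J(x) = {∇J(x)}): for J(x) = ‖x‖²/2 the Bregman distance reduces to the squared Euclidean distance ‖y − x‖²/2.

```python
def bregman_distance(J, gradJ, x, y):
    """D_J^p(x, y) = J(y) - J(x) - <p, y - x>, with p = gradJ(x)
    (for differentiable J the subdifferential is a single gradient)."""
    p = gradJ(x)
    return J(y) - J(x) - sum(pi * (yi - xi) for pi, yi, xi in zip(p, y, x))

# J(x) = ||x||^2/2 gives the squared Euclidean distance ||y - x||^2 / 2.
J = lambda z: 0.5 * sum(zi ** 2 for zi in z)
gradJ = lambda z: list(z)
x, y = [1.0, 2.0], [3.0, -1.0]
d = bregman_distance(J, gradJ, x, y)   # = 0.5*(2**2 + 3**2) = 6.5
```

For a nonsmooth J such as γ‖x‖₁ one would instead pass in an explicit subgradient p, since ∂J(x) is then set-valued.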
For the inverse problem b = Ax, want to solve
min_x J(x) s.t. b = Ax,   or   min_x V(x) + λJ(x),
where V(x) = ‖Ax − b‖²/2.
Consider the inverse scale space (ISS) flow1:
∂_t p(t) = −∇V(x(t)),   p(t) ∈ ∂J(x(t)).
1 Burger, Gilboa, Osher, Xu (2006).
Discretisations of inverse scale space
∂_t p(t) = −∇V(x(t)),   p(t) ∈ ∂J(x(t)).
Backward Euler → Bregman iterations1:
p_{k+1} = p_k − τ_k∇V(x_{k+1}),   p_{k+1} ∈ ∂J(x_{k+1})
⟺ x_{k+1} = argmin_x τ_k V(x) + D_J^{p_k}(x_k, x).
Forward Euler → Linearised Bregman iterations1:
p_{k+1} = p_k − τ_k∇V(x_k),   p_{k+1} ∈ ∂J(x_{k+1})
⟺ x_{k+1} = argmin_x τ_k (V(x_k) + 〈∇V(x_k), x − x_k〉) + D_J^{p_k}(x_k, x).
DG method → Bregman DG method:
p_{k+1} = p_k − τ_k∇V(x_k, x_{k+1}),   p_{k+1} ∈ ∂J(x_{k+1}).
1 Benning, Burger (2018).
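The linearised Bregman iteration is straightforward to sketch in Python for the elastic-net-type choice J(x) = γ‖x‖₁ + ‖x‖²/2, for which p ∈ ∂J(x) inverts to the soft-shrinkage x = shrink(p, γ). Everything below (the identity A, the sparse b) is a toy assumption for illustration, not code from the talk:

```python
def shrink(t, g):
    """Soft-shrinkage: argmin_x g*|x| + (x - t)**2 / 2."""
    return (abs(t) - g) * (1.0 if t > 0 else -1.0) if abs(t) > g else 0.0

def linearised_bregman(A, b, gamma, tau, iters):
    """p_{k+1} = p_k - tau*A^T(A x_k - b),  x_{k+1} = shrink(p_{k+1}, gamma)."""
    m, n = len(A), len(A[0])
    p, x = [0.0] * n, [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # Ax - b
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]         # A^T r
        p = [pj - tau * gj for pj, gj in zip(p, g)]
        x = [shrink(pj, gamma) for pj in p]
    return x

# Toy sparse recovery: A = I, b sparse; x converges to b, zeros stay zero.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [2.0, 0.0, -1.0]
x = linearised_bregman(A, b, gamma=1.0, tau=0.5, iters=100)
```

Coordinates where b_i = 0 are never activated, which is exactly the sparsity-promoting bias the ℓ₁ part of J is meant to introduce.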
Bregman Itoh–Abe method
p_i^{k+1} = p_i^k − τ_i^k (V(x_1^{k+1}, …, x_i^{k+1}, x_{i+1}^k, …, x_n^k) − V(x_1^{k+1}, …, x_{i−1}^{k+1}, x_i^k, …, x_n^k)) / (x_i^{k+1} − x_i^k).
If V is continuous, and J is continuous and strongly convex, then the updates are well-defined.
If V is convex, the updates are unique.
Iterates (x_k)_{k∈N} converge to Clarke stationary points (under a regularity assumption).
Implications:
Can adapt pre-existing methods to incorporate bias (e.g. sparsity, variational regularisation problems).
Handles nonsmoothness better (e.g. ‖·‖₁ kinks).
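A toy sketch of the Bregman Itoh-Abe update (all choices here, V(x) = ‖x − b‖²/2, J(x) = γ‖x‖₁ + ‖x‖²/2, the bisection solver and its bracket, are illustrative assumptions, not the talk's implementation). For this separable V the coordinate difference quotient equals (y + x_i)/2 − b_i exactly, so each implicit update reduces to a monotone scalar equation in y = x_i^{k+1}:

```python
def shrink(t, g):
    """Soft-shrinkage; inverts p in dJ(x) for J(x) = g*||x||_1 + ||x||^2/2."""
    return (abs(t) - g) * (1.0 if t > 0 else -1.0) if abs(t) > g else 0.0

def bregman_itoh_abe(b, gamma, tau, sweeps):
    """Bregman Itoh-Abe sweeps for the toy objective V(x) = ||x - b||^2/2.
    Coordinate update: y = shrink(p_i - tau*((y + x_i)/2 - b_i), gamma),
    solved by bisection (left side minus right side is increasing in y)."""
    n = len(b)
    x, p = [0.0] * n, [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            g = lambda y: y - shrink(p[i] - tau * ((y + x[i]) / 2 - b[i]), gamma)
            lo, hi = x[i] - 10.0, x[i] + 10.0      # assumed root bracket
            for _ in range(80):                    # bisection: g(lo) < 0 < g(hi)
                mid = (lo + hi) / 2
                if g(mid) < 0:
                    lo = mid
                else:
                    hi = mid
            y = (lo + hi) / 2
            p[i] -= tau * ((y + x[i]) / 2 - b[i])  # dual (subgradient) update
            x[i] = y
    return x

# Zero coordinates of b are never activated; nonzero ones converge to b_i.
x = bregman_itoh_abe(b=[2.0, 0.0, -1.0], gamma=0.5, tau=1.0, sweeps=30)
```

As in the linearised Bregman sketch, the iterates stay sparse until a coordinate's dual variable escapes the interval [−γ, γ], illustrating the bias the method incorporates.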
Example: Bregman Itoh–Abe for linear systems (SOR) (1/2)
Figure: Top left: Ground truth. Top right: Normalised residual of data fidelity. Bottom: Error in support of iterate, supp(x^k), relative to support of ground truth, supp(x*).
Example: Bregman Itoh–Abe for linear systems (SOR) (2/2)
Figure: Top left: Ground truth. Top right: Normalised residual of data fidelity. Bottom: Error in support of iterate, supp(x^k), relative to support of ground truth, supp(x*).
Outlook
Accelerate Itoh–Abe DG method.
Apply discrete gradient methods to gradient flow under different metrics (e.g. optimal transport).
Thank you for your attention!
Relevant papers
1 Riis, Ehrhardt, Quispel, Schönlieb. A geometric integration approach to nonsmooth, nonconvex optimisation. (2018, preprint)
2 Ehrhardt, Riis, Ringholm, Schönlieb. A geometric integration approach to smooth optimisation: foundations of the discrete gradient method. (2018, preprint)
3 Grimm, McLachlan, McLaren, Quispel, Schönlieb. Discrete gradient methods for solving variational image regularisation models. (2017)
4 Ringholm, Lazic, Schönlieb. Variational image regularization with Euler's elastica using a discrete gradient scheme. (2017)