
Page 1:

Introductory Course on Non-smooth Optimisation

Lecture 03 - Krasnosel’skiĭ-Mann iteration

Jingwei Liang

Department of Applied Mathematics and Theoretical Physics

Page 2:

Table of contents

1 Method of alternating projection

2 Monotone and non-expansive mappings

3 Krasnosel’skiĭ-Mann iteration

4 “Accelerated” Krasnosel’skiĭ-Mann iteration

Page 3:

Recap

Recap of descent methods

include gradient descent, proximal gradient descent.

convergence (rate) properties
– objective function value
◦ O(1/k) convergence rate.
◦ optimal O(1/k²) convergence rate.
– sequence
◦ O(1/√k) convergence rate.
◦ optimal O(1/k) convergence rate.
– linear convergence under e.g. strong convexity.

NB: end of happiness: most of the above results, especially those for objective function values, will not be true for non-descent-type methods.

Page 4:

Operator splitting

Consider the problem
min_{x∈Rn} µ1||x||1 + µ2||∇x||1 + (1/2)||Ax − f||².

In 1D, both prox_{γ||·||1}(·) and prox_{γ||∇·||1}(·) have closed-form solutions. However, there is no closed form for

prox_{γ(||·||1+||∇·||1)}(·).

Operator splitting designs a properly structured scheme such that

the proximal mapping of each non-smooth function is evaluated separately.

gradient descent is applied to the smooth part.
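As a concrete instance, the 1D ℓ1 prox mentioned above is soft-thresholding; a minimal NumPy sketch with illustrative inputs:

```python
import numpy as np

def prox_l1(x, gamma):
    # Closed-form proximal mapping of gamma * ||.||_1: soft-thresholding.
    # Entries with |x_i| <= gamma are set to zero; the rest shrink by gamma.
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
```

By contrast, the prox of the sum ||·||1 + ||∇·||1 has no such closed form, which is exactly why splitting schemes evaluate each prox separately.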

Page 5:

Outline

1 Method of alternating projection

2 Monotone and non-expansive mappings

3 Krasnosel’skiĭ-Mann iteration

4 “Accelerated” Krasnosel’skiĭ-Mann iteration

Page 6:

Feasibility problem

Feasibility problem
Consider finding a common point

find x ∈ X ∩ Y,

where X, Y ⊆ Rn are two closed and convex sets.

Jingwei Liang, DAMTP Introduction to Non-smooth Optimisation November 20, 2019

Page 7:

Method of alternating projection

Equivalent formulation
min_{x∈Rn} ιX(x) + ιY(x).

Method of alternating projection (MAP)
initial: x0 ∈ X;
repeat:

1. Projection onto Y: yk = PY(xk)

2. Projection onto X: xk+1 = PX(yk)

until: stopping criterion is satisfied.

The projections onto the two sets are computed separately.

Stopping criterion: ||xk − xk−1|| ≤ ε.
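The scheme above can be sketched numerically; the two sets, their projectors and the tolerance below are illustrative choices, not from the slides:

```python
import numpy as np

def alternating_projection(proj_X, proj_Y, x0, eps=1e-10, max_iter=1000):
    # MAP: x_{k+1} = P_X(P_Y(x_k)), stopped when ||x_{k+1} - x_k|| <= eps.
    x = x0
    for _ in range(max_iter):
        y = proj_Y(x)       # 1. projection onto Y
        x_new = proj_X(y)   # 2. projection onto X
        if np.linalg.norm(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

# Example: X = {x : x[0] = 0} (a hyperplane), Y = the closed unit ball.
proj_X = lambda x: np.array([0.0, x[1]])
proj_Y = lambda x: x / max(1.0, np.linalg.norm(x))
x_star = alternating_projection(proj_X, proj_Y, np.array([2.0, 2.0]))
# x_star lies (up to eps) in the intersection X ∩ Y.
```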


Page 8:

Convergence analysis

MAP: xk+1 = PX ◦ PY(xk).

Convergence properties

convergence result for the objective function value?

convergence of the sequences {xk}k∈N, {yk}k∈N?


Page 9:

Outline

1 Method of alternating projection

2 Monotone and non-expansive mappings

3 Krasnosel’skiĭ-Mann iteration

4 “Accelerated” Krasnosel’skiĭ-Mann iteration

Page 10:

Notations

Given two non-empty sets X, U ⊆ Rn, A : X ⇒ U is called a set-valued operator if A maps every point in X to a subset of U, i.e.

A : X ⇒ U, x ∈ X ↦ A(x) ⊆ U.

The graph of A is defined by

gra(A) def= {(x, u) ∈ X × U : u ∈ A(x)}.

The domain and range of A are

dom(A) def= {x ∈ X : A(x) ≠ ∅}, ran(A) def= A(X).

The inverse of A is defined through its graph

gra(A−1) def= {(u, x) ∈ U × X : u ∈ A(x)}.

The set of zeros of A is

zer(A) def= A−1(0) = {x ∈ X : 0 ∈ A(x)}.


Page 11:

Monotone operator

Monotone operator
Let X, U ⊆ Rn be two non-empty convex sets. A : X ⇒ U is monotone if

〈x − y, u − v〉 ≥ 0, ∀(x, u), (y, v) ∈ gra(A).

It is moreover maximal monotone if gra(A) is not strictly contained in the graph of any other monotone operator.

A is called κ-strongly monotone for some κ > 0 if

〈x − y, u − v〉 ≥ κ||x − y||², ∀(x, u), (y, v) ∈ gra(A).

Lemma
Let R ∈ Γ0, then ∂R is maximal monotone.


Page 12:

Cocoercive operator

Cocoercive operator
An operator B : Rn → Rn is called β-cocoercive if there exists β > 0 such that

β||B(x) − B(y)||² ≤ 〈B(x) − B(y), x − y〉, ∀x, y ∈ Rn.

The above inequality implies that B is (1/β)-Lipschitz continuous.

Baillon–Haddad theorem
Let F ∈ C1_L, then ∇F is (1/L)-cocoercive.

Lemma
Let C : Rn ⇒ Rn be β-strongly monotone, then its inverse C−1 is β-cocoercive.


Page 13:

Resolvent of monotone operator

Resolvent
Let A : Rn ⇒ Rn be a maximal monotone operator. The resolvent of A is defined by

JA def= (Id + A)−1.

The reflection of JA is defined by

RA def= 2JA − Id.

Given a function R ∈ Γ0 and its sub-differential ∂R,

proxR = J∂R.

The set of fixed points x = proxR(x) satisfies

fix(proxR) = fix(J∂R) = zer(∂R).


Page 14:

Yosida approximation

Yosida approximation
Let A : Rn ⇒ Rn be a maximal monotone operator and γ > 0. The Yosida approximation of A with γ is

γA def= (1/γ)(Id − JγA) = (γId + A−1)−1 = JA−1/γ(·/γ).

Moreover,

Id = JγA(·) + γJA−1/γ(·/γ).

γA is γ-cocoercive.
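As an illustration (not on the slides): for A = ∂|·| on R, J_{γA} is soft-thresholding, so the Yosida approximation reduces to a clipping map, a single-valued Lipschitz surrogate for the set-valued subdifferential:

```python
import numpy as np

def yosida_abs(x, gamma):
    # Yosida approximation of A = subdifferential of |.|:
    # (1/gamma) * (Id - J_{gamma A})(x), with J_{gamma A} = soft-thresholding.
    prox = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
    return (x - prox) / gamma

# For |x| <= gamma the result is x/gamma, otherwise sign(x);
# i.e. the map equals clip(x/gamma, -1, 1).
```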


Page 15:

Non-expansive operator

Non-expansive operator
An operator T : Rn → Rn is called non-expansive if it is 1-Lipschitz continuous, i.e.

||T(x) − T(y)|| ≤ ||x − y||, ∀x, y ∈ Rn.

For any α ∈ ]0, 1[, T is α-averaged if there exists a non-expansive operator T′ such that

T = αT′ + (1 − α)Id.

A(α) denotes the class of α-averaged operators on Rn.

A(1/2) is the class of firmly non-expansive operators.


Page 16:

Properties: α-averaged operators

Lemma
Let T : Rn → Rn be non-expansive and α ∈ ]0, 1[. The following statements are equivalent:

T is α-averaged non-expansive.

The operator (1 − 1/α)Id + (1/α)T is non-expansive.

For any x, y ∈ Rn,

||T(x) − T(y)||² ≤ ||x − y||² − ((1 − α)/α)||(Id − T)(x) − (Id − T)(y)||².


Page 17:

Properties: α-averaged operators

A(α) is closed under relaxations, convex combinations and compositions.

Lemma
Let m ∈ N+, {Ti}i∈{1,...,m} be non-expansive operators on Rn, (ωi)i ∈ ]0, 1]m with ∑i ωi = 1, and (αi)i ∈ ]0, 1]m such that Ti ∈ A(αi), i ∈ {1, ...,m}. Then,

Id + λi(Ti − Id) ∈ A(λiαi), for λi ∈ ]0, 1/αi[ and i ∈ {1, ...,m}.

∑i ωiTi ∈ A(α) with α = maxi αi.

T1 ◦ · · · ◦ Tm ∈ A(α) with α = m/(m − 1 + 1/maxi∈{1,...,m} αi).

Remark For the composition of two averaged operators, a sharper bound on α can be obtained:

α = (α1 + α2 − 2α1α2)/(1 − α1α2) ∈ ]0, 1[.


Page 18:

Properties: firmly non-expansive operators

Lemma
Let T : Rn → Rn be non-expansive. The following statements are equivalent:

T is firmly non-expansive.

Id− T is firmly non-expansive.

2T − Id is non-expansive.

||T (x)− T (y)||2 ≤ 〈T (x)− T (y), x− y〉,∀x, y ∈ Rn.

T is the resolvent of a maximal monotone operator A, i.e. T = JA.

Lemma
Let B : Rn → Rn be β-cocoercive for some β > 0. Then

βB ∈ A(1/2), i.e. βB is firmly non-expansive.

Id − γB ∈ A(γ/(2β)) for γ ∈ ]0, 2β[.


Page 19:

Outline

1 Method of alternating projection

2 Monotone and non-expansive mappings

3 Krasnosel’skiĭ-Mann iteration

4 “Accelerated” Krasnosel’skiĭ-Mann iteration

Page 20:

Fixed point

Fixed point
Let T : Rn → Rn be a non-expansive operator. A point x ∈ Rn is called a fixed point of T if

x = T(x).

The set of fixed points of T is denoted by fix(T).

fix(T) may be empty, e.g. for a translation by a non-zero vector.

Lemma
Let X be a non-empty bounded closed convex subset of Rn and T : X → X be a non-expansive operator, then fix(T) ≠ ∅.

Lemma
Let X be a non-empty closed convex subset of Rn and T : X → X be a non-expansive operator, then fix(T) is closed and convex.


Page 21:

Krasnosel’skiĭ-Mann iteration

Fixed-point iteration
xk+1 = T(xk).

However, it may not converge, e.g. T = −Id.

Krasnosel’skiĭ-Mann iteration
Let T : Rn → Rn be a non-expansive operator such that fix(T) ≠ ∅. Let λk ∈ [0, 1] and choose x0 arbitrarily from Rn, then the Krasnosel’skiĭ-Mann iteration of T reads

xk+1 = xk + λk(T(xk) − xk).

If T ∈ A(α), then λk ∈ [0, 1/α] is allowed.
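A tiny numerical illustration of the point above: for T = −Id the plain iteration xk+1 = T(xk) oscillates forever, while averaging with λ ∈ ]0, 1[ drives the iterates to fix(T) = {0}:

```python
def km(T, x0, lam, n_iter):
    # Krasnosel'skii-Mann: x_{k+1} = x_k + lam * (T(x_k) - x_k).
    x = x0
    for _ in range(n_iter):
        x = x + lam * (T(x) - x)
    return x

T = lambda x: -x                # non-expansive, fix(T) = {0}
plain = km(T, 1.0, 1.0, 5)      # lam = 1 is the plain iteration: oscillates
averaged = km(T, 1.0, 0.5, 5)   # lam = 1/2: reaches 0 in one step
```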


Page 22:

Fejér monotonicity

Fejér monotonicity
Let S ⊆ Rn be a non-empty set and {xk}k∈N be a sequence in Rn. Then

{xk}k∈N is Fejér monotone with respect to S if

||xk+1 − x|| ≤ ||xk − x||, ∀x ∈ S, ∀k ∈ N.

{xk}k∈N is quasi-Fejér monotone with respect to S if there exists a summable sequence {εk}k∈N ∈ ℓ1+ such that

||xk+1 − x|| ≤ ||xk − x|| + εk, ∀x ∈ S, ∀k ∈ N.

Example Let X ⊆ Rn be a non-empty convex set, and T : X → X be a non-expansive operator such that fix(T) ≠ ∅. The sequence {xk}k∈N generated by

xk+1 = T(xk)

is Fejér monotone with respect to fix(T).


Page 23:

Convergence

Lemma
Let S ⊆ Rn be a non-empty set and {xk}k∈N be a sequence in Rn. Assume that {xk}k∈N is quasi-Fejér monotone with respect to S. Then the following hold:

{xk}k∈N is bounded.

||xk − x|| is bounded for any x ∈ S.

{dist(xk,S)}k∈N is decreasing and convergent.

If every sequential cluster point of {xk}k∈N belongs to S, then {xk}k∈N converges to a point in S.

NB: in a general real Hilbert space, the convergence is weak.


Page 24:

Convergence

Convergence
Let T : Rn → Rn be a non-expansive operator such that fix(T) ≠ ∅. Consider the Krasnosel’skiĭ-Mann iteration of T, and choose λk ∈ [0, 1] such that

∑k∈N λk(1 − λk) = +∞.

Then the following hold:

{xk}k∈N is Fejér monotone with respect to fix(T).

{xk − T(xk)}k∈N converges strongly to 0.

{xk}k∈N converges to a point in fix(T).

Remark When T is α-averaged, choose λk ∈ [0, 1/α] such that ∑k∈N λk(1/α − λk) = +∞.


Page 25:

Preliminary

Krasnosel’skiĭ-Mann iteration with constant relaxation
xk+1 = xk + λ(T(xk) − xk) = ((1 − λ)Id + λT)(xk).

Denote Tλ = (1 − λ)Id + λT, and define the residual

ek = (Id − T)(xk) = (1/λ)(xk − xk+1).

– Tλ ∈ A(λ) if λ ∈ ]0, 1[. If T ∈ A(α), then Tλ ∈ A(λα).
– For any x⋆ ∈ Rn,
x⋆ ∈ fix(T) ⇔ x⋆ ∈ fix(Tλ) ⇔ x⋆ ∈ zer(Id − T).
– If λ ∈ [ε, 1 − ε], ε ∈ ]0, 1/2]:
◦ ek converges to 0;
◦ {xk}k∈N is quasi-Fejér monotone with respect to fix(T), and converges to a point x⋆ ∈ fix(T).


Page 26:

Pointwise convergence rate

Rate of ||ek||2:

For residual||ek+1||2 ≤ ||ek||2 − 1− λ

λ||ek − ek+1||2.

Tλ ∈ A(λ), τ = λ(1− λ)

||xk+1 − x?||2 ≤ ||xk − x?||2 − τ ||ek||2.

Summation(k + 1)||ek||2 ≤ τ

∑ki=0 ||ei||

2 ≤ ||x0 − x?||2 − ||xk+1 − x?||2.

Rate||ek||2 ≤ ||x0 − x?||2

k+ 1 .

NB: if T ∈ A(α), then the above holds for λ ∈ [ε, 1/α− ε].
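A quick numerical check of this O(1/k) rate, in the form ||ek||² ≤ ||x0 − x⋆||²/(τ(k + 1)) with τ = λ(1 − λ), using a plane rotation as T: an isometry, hence non-expansive, with fix(T) = {0}. The choice of T, λ and x0 is illustrative:

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees, fix = {0}
lam = 0.5
tau = lam * (1.0 - lam)
x = np.array([1.0, 0.0])
d0_sq = float(x @ x)                     # ||x0 - x*||^2 with x* = 0

bound_holds = True
for k in range(100):
    e = x - R @ x                        # residual e_k = (Id - T)(x_k)
    if float(e @ e) > d0_sq / (tau * (k + 1)) + 1e-12:
        bound_holds = False
    x = x + lam * (R @ x - x)            # KM step
```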


Page 27:

Ergodic convergence rate

Define ek = 1k+1

∑ki=0 ei.

Boundedness||xk+1 − x?|| = ||Tλ(xk)− Tλ(x?)|| ≤ ||xk − x?||

≤ ||x0 − x?||.

λek = xk − xk+1

||ek|| = 1k+ 1 ||

∑ki=0 ei|| = 1

λ(k+ 1) ||∑k

i=0 (xi − xi+1)||

= 1λ(k+ 1) ||x0 − xk+1||

≤ 1λ(k+ 1) (||x0 − x?||+ ||xk+1 − x?||)

≤ 2||x0 − x?||λ(k+ 1) .

NB: both rates (pointwise and ergodic) can be extended to the inexact case...


Page 28:

Metric sub-regularity

Metric sub-regularity
A set-valued mapping A : Rn ⇒ Rn is called metrically sub-regular at x̄ for ū ∈ A(x̄) if there exists κ ≥ 0 along with a neighbourhood X of x̄ such that

dist(x, A−1(ū)) ≤ κ dist(ū, A(x)), ∀x ∈ X.

The infimum of all κ such that the above holds is called the modulus of metric sub-regularity, denoted by subreg(A; x̄|ū).

Example Let F ∈ S1_{α,L} (α-strongly convex with L-Lipschitz gradient) and A = γ∇F with γ ≤ 1/L. Take x̄ = argmin_{Rn} F and ū = 0; then

dist(ū, A(x)) = ||γ∇F(x) − γ∇F(x̄)|| ≥ γα||x − x̄|| ≥ γα dist(x, A−1(0)),

so A is metrically sub-regular at x̄ for 0 with κ = 1/(γα).


Page 29:

(Local) linear convergence

Let x? ∈ fix(T ), suppose T ′ def= Id− T is metrically sub-regular at x? with neighbourhood X of x?,

let κ > subreg(T ′; x?|0):

0 = T ′(x?), T ′−1(0) = fix(T )

dist(x, fix(T )) ≤ κdist(0, T ′(x)) = κ||x− T (x)||.

Denote dk = dist(xk, fix(T )), x ∈ fix(T ) such that dk = ||xk+1 − x||,

d2k+1 ≤ ||xk+1 − x||2 ≤ ||xk − x||2 − τ ||T ′(xk)− T ′(x)||2

≤ d2k − τκ2 d

2k

=(1− τ

κ2)d2k.

NB: As metric sub-regularity is a local propery, the linear convergence will happen only when xk isclose enough to fix(T ).
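A scalar sanity check (the setup is mine, not from the slides): T(x) = x/2 is firmly non-expansive with fix(T) = {0}, and T′ = Id − T gives |x| ≤ κ|T′(x)| with κ = 2, so with λ = 1/2 the distance dk should shrink at least by the factor √(1 − τ/κ²) per step:

```python
import math

T = lambda x: 0.5 * x       # firmly non-expansive, fix(T) = {0}
lam = 0.5
tau = lam * (1.0 - lam)     # tau = 1/4
kappa = 2.0                 # dist(x, fix(T)) = |x| <= kappa * |x - T(x)|
factor = math.sqrt(1.0 - tau / kappa**2)

x = 5.0
linear = True
for _ in range(50):
    x_next = x + lam * (T(x) - x)        # here x_next = 0.75 * x
    if abs(x_next) > factor * abs(x) + 1e-12:
        linear = False
    x = x_next
```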


Page 30:

Optimal relaxation parameter?

Consider λk ∈ [0, 1] and xk+1 = (1− λk)xk + λkT (xk). Then

||xk+1 − x?||2 = ||(1− λk)(xk − x?) + λk(T (xk)− x?)||2

= (1− λk)||xk − x?||2 + λk||T (xk)− x?||2

− λk(1− λk)||xk − T (xk)||2

= λ2k||xk − T (xk)||2

− λk(||xk − x?||2 − ||T (xk)− x?||2 + ||xk − T (xk)||2

)+ ||xk − x?||2

which is a quadratic fucntion of λk, and minimises at

λ =12 +

||xk − x?||2 − ||T (xk)− x?||2

2||xk − T (xk)||2.

Approximation:

λ =12 +

||xk − T (xk)||2 − ||T (xk)− T 2(xk)||2

2||(xk − T (xk))− (T (xk)− T 2(xk))||2.
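A sketch of the approximate rule (the helper name is mine): the unknown differences to x⋆ are replaced by one-step residuals, so everything is computable from xk alone:

```python
import numpy as np

def lambda_approx(x, T):
    # Approximate optimal relaxation from the last formula above.
    r1 = x - T(x)            # stands in for x_k - x*
    r2 = T(x) - T(T(x))      # stands in for T(x_k) - x*
    d = r1 - r2
    return 0.5 + (float(r1 @ r1) - float(r2 @ r2)) / (2.0 * float(d @ d))

# For T = -Id the rule gives 1/2, the relaxation reaching fix(T) in one step.
lam = lambda_approx(np.array([1.0, 2.0]), lambda x: -x)
```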


Page 31:

Outline

1 Method of alternating projection

2 Monotone and non-expansive mappings

3 Krasnosel’skiĭ-Mann iteration

4 “Accelerated” Krasnosel’skiĭ-Mann iteration

Page 32:

Inertial Krasnosel’skiĭ-Mann iteration

An inertial Krasnosel’skiĭ-Mann iteration
Initial: x0 ∈ Rn, x−1 = x0;

yk = xk + ak(xk − xk−1), ak ∈ [0, 1],

zk = xk + bk(xk − xk−1), bk ∈ [0, 1],

xk+1 = (1 − λk)yk + λkT(zk), λk ∈ [0, 1].

Covers the heavy-ball method, Nesterov’s scheme, inertial FB and FISTA.

Convergence analysis is much harder than for the inertial versions of descent methods.

No convergence rate.

May perform very poorly in practice, i.e. slower than the original scheme.
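A minimal sketch with constant coefficients and ak = bk (so yk = zk); the operator T(x) = x/2 and the parameter values are illustrative:

```python
def inertial_km(T, x0, a=0.3, lam=0.5, n_iter=200):
    # One-step inertial KM with constant a_k = b_k = a, so y_k = z_k.
    x_prev, x = x0, x0                   # x_{-1} = x_0
    for _ in range(n_iter):
        y = x + a * (x - x_prev)         # inertial extrapolation
        x_prev, x = x, (1.0 - lam) * y + lam * T(y)
    return x

x_star = inertial_km(lambda x: 0.5 * x, 10.0)   # converges to fix(T) = {0}
```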


Page 33:

A multi-step inertial scheme

A multi-step inertial Krasnosel’skiĭ-Mann iteration
Initial: x0 ∈ Rn, x−1 = x0 and γ ∈ ]0, 2/L[;

yk = xk + a0,k(xk − xk−1) + a1,k(xk−1 − xk−2) + · · · ,

zk = xk + b0,k(xk − xk−1) + b1,k(xk−1 − xk−2) + · · · ,

xk+1 = (1 − λk)yk + λkT(zk), λk ∈ [0, 1].

Even harder to analyse convergence.

No rate.

However, it can outperform the original scheme.


Page 34:

Convergence

Conditional convergence: for i = 0, 1, ...,

∑_{k∈N} max{ max_i |ai,k|, max_i |bi,k| } ∑_i ||xk−i − xk−i−1|| < +∞.

Online updating rule
ai,k = min{ ai, ci,k }, where

ci,k = ci/(k^(1+δ) ∑_i ||xk−i − xk−i−1||), δ > 0.
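A sketch of the online rule (argument names are mine): the coefficient is capped by a fixed value and forced to decay with k, which keeps the series in the conditional-convergence test finite:

```python
def online_coeff(a_bar, c, k, diffs, delta=0.1):
    # a_{i,k} = min{ a_bar, c / (k^(1+delta) * sum_i ||x_{k-i} - x_{k-i-1}||) }.
    denom = (k ** (1.0 + delta)) * sum(diffs)
    return min(a_bar, c / denom)

early = online_coeff(0.3, 1.0, 1, [1.0])     # cap is active: 0.3
late = online_coeff(0.3, 1.0, 1000, [1.0])   # forced well below the cap
```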


Page 35:

Reference

H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011.

P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.

J. Liang, J. Fadili, and G. Peyré. Convergence rates with inexact non-expansive operators. Mathematical Programming, 159(1–2):403–434, 2016.