TRANSCRIPT
Introductory Course on Non-smooth Optimisation
Lecture 03 - Krasnosel’skiĭ-Mann iteration
Jingwei Liang
Department of Applied Mathematics and Theoretical Physics
Table of contents
1 Method of alternating projection
2 Monotone and non-expansive mappings
3 Krasnosel’skiĭ-Mann iteration
4 “Accelerated” Krasnosel’skiĭ-Mann iteration
Recap
Recap of descent methods
Descent methods include gradient descent and proximal gradient descent.
Convergence (rate) properties:
– objective function value:
◦ O(1/k) convergence rate;
◦ optimal O(1/k²) convergence rate.
– sequence:
◦ O(1/√k) convergence rate;
◦ optimal O(1/k) convergence rate.
– linear convergence under e.g. strong convexity.
NB: end of happiness: most of the above results, especially those for objective function values, will not hold for non-descent-type methods.
Operator splitting
Consider the problem
min_{x∈R^n} µ₁||x||₁ + µ₂||∇x||₁ + (1/2)||Ax − f||².
In 1D, both prox_{γ||·||₁}(·) and prox_{γ||∇·||₁}(·) have closed-form solutions. However, prox_{γ(||·||₁+||∇·||₁)}(·) does not.
Operator splitting designs a properly structured scheme such that:
– the proximal mappings of the non-smooth functions are evaluated separately;
– gradient descent is applied to the smooth part.
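To make the splitting concrete, here is a minimal Python sketch (assuming NumPy; the names prox_l1 and prox_grad_step are illustrative, not from the lecture): the ℓ₁-prox is a closed-form soft-threshold evaluated on its own, while the smooth data term only contributes a gradient step. The ||∇x||₁ term would be handled by its own prox in a further splitting.

```python
import numpy as np

def prox_l1(x, gamma):
    """prox of gamma*||.||_1: elementwise soft-thresholding (closed form)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_grad_step(x, A, f, mu1, gamma):
    """One proximal-gradient step for mu1*||x||_1 + (1/2)||Ax - f||^2."""
    grad = A.T @ (A @ x - f)                       # gradient of the smooth part
    return prox_l1(x - gamma * grad, gamma * mu1)  # prox of the non-smooth part
```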
Outline
1 Method of alternating projection
2 Monotone and non-expansive mappings
3 Krasnosel’skiĭ-Mann iteration
4 “Accelerated” Krasnosel’skiĭ-Mann iteration
Feasibility problem
Consider finding a common point:
find x ∈ X ∩ Y,
where X, Y ⊆ R^n are two closed and convex sets.
Method of alternating projection
Equivalent formulation
min_{x∈R^n} ι_X(x) + ι_Y(x).
Method of alternating projection (MAP)
initial: x_0 ∈ X;
repeat:
1. Projection onto Y: y_k = P_Y(x_k)
2. Projection onto X: x_{k+1} = P_X(y_k)
until: stopping criterion is satisfied.
The projections onto the two sets are computed separately.
Stopping criterion: ||x_k − x_{k−1}|| ≤ ε.
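As an illustration, here is a minimal MAP sketch in Python, assuming (purely for the example) that X is a Euclidean ball and Y a halfspace, so both projections have closed form; the helper names are hypothetical.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Projection onto X = {x : ||x|| <= radius}."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

def project_halfspace(x, a, b):
    """Projection onto Y = {x : <a, x> <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - (viol / (a @ a)) * a

def alternating_projection(x0, a, b, eps=1e-8, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        y = project_halfspace(x, a, b)        # 1. projection onto Y
        x_new = project_ball(y)               # 2. projection onto X
        if np.linalg.norm(x_new - x) <= eps:  # stopping criterion
            return x_new
        x = x_new
    return x

x_common = alternating_projection(np.array([3.0, 2.0]), np.array([1.0, 0.0]), 0.5)
```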
Convergence analysis
MAP
x_{k+1} = P_X ◦ P_Y(x_k).
Convergence properties
– Is there a convergence result for the objective function value?
– Do the sequences {x_k}_{k∈N} and {y_k}_{k∈N} converge?
Outline
1 Method of alternating projection
2 Monotone and non-expansive mappings
3 Krasnosel’skiĭ-Mann iteration
4 “Accelerated” Krasnosel’skiĭ-Mann iteration
Notations
Given two non-empty sets X, U ⊆ R^n, A: X ⇒ U is called a set-valued operator if A maps every point in X to a subset of U, i.e.
A: X ⇒ U, x ∈ X ↦ A(x) ⊆ U.
The graph of A is defined by
gra(A) def= {(x, u) ∈ X × U : u ∈ A(x)}.
The domain and range of A are
dom(A) def= {x ∈ X : A(x) ≠ ∅}, ran(A) def= A(X).
The inverse of A is defined through its graph:
gra(A⁻¹) def= {(u, x) ∈ U × X : u ∈ A(x)}.
The set of zeros of A is
zer(A) def= A⁻¹(0) = {x ∈ X : 0 ∈ A(x)}.
Monotone operator
Let X, U ⊆ R^n be two non-empty convex sets. A: X ⇒ U is monotone if
⟨x − y, u − v⟩ ≥ 0, ∀(x, u), (y, v) ∈ gra(A).
It is moreover maximal monotone if gra(A) is not strictly contained in the graph of any other monotone operator.
A is called κ-strongly monotone for some κ > 0 if
⟨x − y, u − v⟩ ≥ κ||x − y||², ∀(x, u), (y, v) ∈ gra(A).
Lemma
Let R ∈ Γ₀, then ∂R is maximal monotone.
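A quick numerical illustration of the lemma (not a proof): for R = |·| on R, the selection sign(·) ∈ ∂R satisfies the monotonicity inequality at random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2)
    u, v = np.sign(x), np.sign(y)   # subgradients of |.| at x and y
    assert (x - y) * (u - v) >= 0   # <x - y, u - v> >= 0
```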
Cocoercive operator
An operator B: R^n → R^n is called β-cocoercive if there exists β > 0 such that
β||B(x) − B(y)||² ≤ ⟨B(x) − B(y), x − y⟩, ∀x, y ∈ R^n.
The above inequality implies that B is (1/β)-Lipschitz continuous.
Baillon-Haddad theorem
Let F ∈ C¹_L, then ∇F is (1/L)-cocoercive.
Lemma
Let C: R^n ⇒ R^n be β-strongly monotone, then its inverse C⁻¹ is β-cocoercive.
Resolvent of monotone operator
Let A: R^n ⇒ R^n be a maximal monotone operator and γ > 0. The resolvent of A is defined by
J_A def= (Id + A)⁻¹.
The reflection of J_A is defined by
R_A def= 2J_A − Id.
Given a function R ∈ Γ₀ and its sub-differential ∂R,
prox_R = J_{∂R}.
Set of fixed points, x = prox_R(x):
fix(prox_R) = fix(J_{∂R}) = zer(∂R).
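For R = γ|·| the resolvent relation can be verified directly: p = prox_R(x) solves the inclusion p + γ∂|p| ∋ x. A small sanity check under this one-dimensional assumption:

```python
import numpy as np

def prox_abs(x, gamma):
    """prox of gamma*|.|, i.e. the resolvent of gamma*d|.| (soft-threshold)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

gamma = 0.5
for x in np.linspace(-2.0, 2.0, 401):
    p = prox_abs(x, gamma)
    if p != 0:
        assert np.isclose(p + gamma * np.sign(p), x)  # p + gamma*sign(p) = x
    else:
        assert abs(x) <= gamma + 1e-12                # 0 solves it iff |x| <= gamma
```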
Yosida approximation
Let A: R^n ⇒ R^n be a maximal monotone operator and γ > 0. The Yosida approximation of A with γ is
γA def= (1/γ)(Id − J_{γA}) = (γId + A⁻¹)⁻¹ = J_{A⁻¹/γ}(·/γ).
Moreover,
Id = J_{γA}(·) + γJ_{A⁻¹/γ}(·/γ).
γA is γ-cocoercive.
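For A = ∂|·| the Yosida approximation is explicit (it is the derivative of the Huber function), so the claimed γ-cocoercivity is easy to test numerically; a sketch under that assumption:

```python
import numpy as np

def yosida_abs(x, gamma):
    """(1/gamma)*(Id - J_{gamma*A}) for A = d|.|; equals clip(x/gamma, -1, 1)."""
    prox = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)  # J_{gamma*A}(x)
    return (x - prox) / gamma

rng = np.random.default_rng(1)
gamma = 0.7
for _ in range(1000):
    x, y = 3.0 * rng.normal(size=2)
    u, v = yosida_abs(x, gamma), yosida_abs(y, gamma)
    assert gamma * (u - v) ** 2 <= (u - v) * (x - y) + 1e-12  # gamma-cocoercivity
```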
Non-expansive operator
An operator T: R^n → R^n is called non-expansive if it is 1-Lipschitz continuous, i.e.
||T(x) − T(y)|| ≤ ||x − y||, ∀x, y ∈ R^n.
For any α ∈ ]0, 1[, T is α-averaged if there exists a non-expansive operator T′ such that
T = αT′ + (1 − α)Id.
A(α) denotes the class of α-averaged operators on R^n.
A(1/2) is the class of firmly non-expansive operators.
Properties: α-averaged operators
Lemma
Let T: R^n → R^n be non-expansive and α ∈ ]0, 1[. The following statements are equivalent:
– T is α-averaged non-expansive.
– The operator (1 − 1/α)Id + (1/α)T is non-expansive.
– For any x, y ∈ R^n,
||T(x) − T(y)||² ≤ ||x − y||² − ((1 − α)/α)||(Id − T)(x) − (Id − T)(y)||².
Properties: α-averaged operators
A(α) is closed under relaxations, convex combinations and compositions.
Lemma
Let m ∈ N₊, {T_i}_{i∈{1,...,m}} be non-expansive operators on R^n, (ω_i)_i ∈ ]0, 1]^m with Σ_i ω_i = 1, and (α_i)_i ∈ ]0, 1]^m such that T_i ∈ A(α_i), i ∈ {1, ..., m}. Then:
– Id + λ_i(T_i − Id) ∈ A(λ_i α_i) for λ_i ∈ ]0, 1/α_i[ and i ∈ {1, ..., m}.
– Σ_i ω_i T_i ∈ A(α) with α = max_i α_i.
– T_1 ◦ · · · ◦ T_m ∈ A(α) with α = m / (m − 1 + 1/max_{i∈{1,...,m}} α_i).
Remark
For the composition of two averaged operators, a sharper bound on α can be obtained:
α = (α₁ + α₂ − 2α₁α₂) / (1 − α₁α₂) ∈ ]0, 1[.
Properties: firmly non-expansive operators
Lemma
Let T: R^n → R^n be non-expansive. The following statements are equivalent:
– T is firmly non-expansive.
– Id − T is firmly non-expansive.
– 2T − Id is non-expansive.
– ||T(x) − T(y)||² ≤ ⟨T(x) − T(y), x − y⟩, ∀x, y ∈ R^n.
– T is the resolvent of a maximal monotone operator A, i.e. T = J_A.
Lemma
Let B: R^n → R^n be β-cocoercive for some β > 0. Then:
– βB ∈ A(1/2), i.e. βB is firmly non-expansive.
– Id − γB ∈ A(γ/(2β)) for γ ∈ ]0, 2β[.
Outline
1 Method of alternating projection
2 Monotone and non-expansive mappings
3 Krasnosel’skiĭ-Mann iteration
4 “Accelerated” Krasnosel’skiĭ-Mann iteration
Fixed point
Let T: R^n → R^n be a non-expansive operator. x ∈ R^n is called a fixed point of T if
x = T(x).
The set of fixed points of T is denoted fix(T).
fix(T) may be empty, e.g. translation by a non-zero vector.
Lemma
Let X be a non-empty bounded closed convex subset of R^n and T: X → X be a non-expansive operator, then fix(T) ≠ ∅.
Lemma
Let X be a non-empty closed convex subset of R^n and T: X → R^n be a non-expansive operator, then fix(T) is closed and convex.
Krasnosel’skiĭ-Mann iteration
Fixed-point iteration
x_{k+1} = T(x_k).
However, it may not converge, e.g. T = −Id.
Krasnosel’skiĭ-Mann iteration
Let T: R^n → R^n be a non-expansive operator such that fix(T) ≠ ∅. Let λ_k ∈ [0, 1] and choose x_0 arbitrarily from R^n, then the Krasnosel’skiĭ-Mann iteration of T reads
x_{k+1} = x_k + λ_k(T(x_k) − x_k).
If T ∈ A(α), then λ_k ∈ [0, 1/α].
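A minimal generic implementation (the function name km is illustrative). For T = −Id, which defeats the plain fixed-point iteration above, any λ ∈ ]0, 1[ restores convergence to fix(T) = {0}.

```python
import numpy as np

def km(T, x0, lam=0.5, eps=1e-10, max_iter=100_000):
    """Krasnosel'skii-Mann iteration x_{k+1} = x_k + lam*(T(x_k) - x_k)."""
    x = x0
    for _ in range(max_iter):
        x_new = x + lam * (T(x) - x)
        if np.linalg.norm(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

x_star = km(lambda x: -x, np.array([1.0, -2.0]))   # converges to 0
```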
Fejér monotonicity
Let S ⊆ R^n be a non-empty set and {x_k}_{k∈N} be a sequence in R^n. Then
– {x_k}_{k∈N} is Fejér monotone with respect to S if
||x_{k+1} − x|| ≤ ||x_k − x||, ∀x ∈ S, ∀k ∈ N.
– {x_k}_{k∈N} is quasi-Fejér monotone with respect to S if there exists a summable sequence {ε_k}_{k∈N} ∈ ℓ¹₊ such that
||x_{k+1} − x|| ≤ ||x_k − x|| + ε_k, ∀x ∈ S, ∀k ∈ N.
Example
Let X ⊆ R^n be a non-empty convex set, and T: X → X be a non-expansive operator such that fix(T) ≠ ∅. The sequence {x_k}_{k∈N} generated by
x_{k+1} = T(x_k)
is Fejér monotone with respect to fix(T).
Convergence
Lemma
Let S ⊆ R^n be a non-empty set and {x_k}_{k∈N} be a sequence in R^n. Assume that {x_k}_{k∈N} is quasi-Fejér monotone with respect to S, then the following hold:
– {x_k}_{k∈N} is bounded.
– ||x_k − x|| is bounded for any x ∈ S.
– {dist(x_k, S)}_{k∈N} is decreasing and convergent.
– If every sequential cluster point of {x_k}_{k∈N} belongs to S, then {x_k}_{k∈N} converges to a point in S.
NB: in a general real Hilbert space, the convergence is weak.
Convergence
Let T: R^n → R^n be a non-expansive operator such that fix(T) ≠ ∅. Consider the Krasnosel’skiĭ-Mann iteration of T, and choose λ_k ∈ [0, 1] such that
Σ_{k∈N} λ_k(1 − λ_k) = +∞,
then the following hold:
– {x_k}_{k∈N} is Fejér monotone with respect to fix(T).
– {x_k − T(x_k)}_{k∈N} converges strongly to 0.
– {x_k}_{k∈N} converges to a point in fix(T).
Remark
When T is α-averaged, choose λ_k ∈ [0, 1/α] such that Σ_{k∈N} λ_k(1/α − λ_k) = +∞.
Preliminary
Krasnosel’skiĭ-Mann iteration with constant relaxation
x_{k+1} = x_k + λ(T(x_k) − x_k) = ((1 − λ)Id + λT)(x_k).
Denote T_λ = (1 − λ)Id + λT, and define the residual
e_k = (Id − T)(x_k) = (1/λ)(x_k − x_{k+1}).
– T_λ ∈ A(λ) if λ ∈ ]0, 1[. If T ∈ A(α), then T_λ ∈ A(λα).
– For any x* ∈ fix(T),
x* ∈ fix(T) ⇔ x* ∈ fix(T_λ) ⇔ x* ∈ zer(Id − T).
– If λ ∈ [ε, 1 − ε] with ε ∈ ]0, 1/2]:
◦ e_k converges to 0;
◦ {x_k}_{k∈N} is quasi-Fejér monotone with respect to fix(T), and converges to a point x* ∈ fix(T).
Pointwise convergence rate
Rate of ||e_k||²:
– Residual: ||e_{k+1}||² ≤ ||e_k||² − ((1 − λ)/λ)||e_k − e_{k+1}||², hence ||e_k|| is non-increasing.
– Since T_λ ∈ A(λ), with τ = λ(1 − λ):
||x_{k+1} − x*||² ≤ ||x_k − x*||² − τ||e_k||².
– Summation:
τ(k + 1)||e_k||² ≤ τ Σ_{i=0}^{k} ||e_i||² ≤ ||x_0 − x*||² − ||x_{k+1} − x*||².
– Rate:
||e_k||² ≤ ||x_0 − x*||² / (τ(k + 1)).
NB: if T ∈ A(α), then the above holds for λ ∈ [ε, 1/α − ε].
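The bound can be checked empirically; below T is a composition of two projections (hence non-expansive, with 0 ∈ fix(T)) and λ = 1/2. The set choices are illustrative only.

```python
import numpy as np

a = np.array([1.0, 1.0])
P_line = lambda x: x - ((a @ x) / (a @ a)) * a                 # onto {x : <a,x> = 0}
P_half = lambda x: x if x[0] <= 1 else np.array([1.0, x[1]])   # onto {x : x_1 <= 1}
T = lambda x: P_line(P_half(x))

lam = 0.5
tau = lam * (1 - lam)
x0 = np.array([5.0, -3.0])
x, x_star = x0.copy(), np.zeros(2)                 # 0 is in fix(T)
for k in range(200):
    e = x - T(x)                                   # residual e_k = (Id - T)(x_k)
    assert e @ e <= (x0 - x_star) @ (x0 - x_star) / (tau * (k + 1)) + 1e-12
    x = x + lam * (T(x) - x)                       # KM step
```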
Ergodic convergence rate
Define ē_k = (1/(k + 1)) Σ_{i=0}^{k} e_i.
– Boundedness:
||x_{k+1} − x*|| = ||T_λ(x_k) − T_λ(x*)|| ≤ ||x_k − x*|| ≤ ||x_0 − x*||.
– Since λe_i = x_i − x_{i+1}, the sum telescopes:
||ē_k|| = (1/(k + 1))||Σ_{i=0}^{k} e_i|| = (1/(λ(k + 1)))||Σ_{i=0}^{k} (x_i − x_{i+1})||
= (1/(λ(k + 1)))||x_0 − x_{k+1}||
≤ (1/(λ(k + 1)))(||x_0 − x*|| + ||x_{k+1} − x*||)
≤ 2||x_0 − x*|| / (λ(k + 1)).
NB: both rates (pointwise and ergodic) can be extended to the inexact case.
Metric sub-regularity
A set-valued mapping A: R^n ⇒ R^n is called metrically sub-regular at x̄ for ū ∈ A(x̄) if there exists κ ≥ 0 along with a neighbourhood X of x̄ such that
dist(x, A⁻¹(ū)) ≤ κ dist(ū, A(x)), ∀x ∈ X.
The infimum of all κ such that the above holds is called the modulus of metric sub-regularity, denoted subreg(A; x̄|ū).
Example
Let F ∈ S¹_{α,L} and A = γ∇F with γ ≤ 1/L. Take x̄ = argmin_{R^n} F and ū = 0. By strong monotonicity of ∇F,
dist(ū, A(x)) = ||γ∇F(x) − γ∇F(x̄)|| ≥ γα||x − x̄|| = γα dist(x, A⁻¹(ū)),
so A is metrically sub-regular at x̄ for 0 with modulus at most 1/(γα).
(Local) linear convergence
Let x* ∈ fix(T), and suppose T′ def= Id − T is metrically sub-regular at x* for 0, with neighbourhood X of x*. Let κ > subreg(T′; x*|0). Then
0 = T′(x*), T′⁻¹(0) = fix(T),
dist(x, fix(T)) ≤ κ dist(0, T′(x)) = κ||x − T(x)||, ∀x ∈ X.
Denote d_k = dist(x_k, fix(T)), and let x̄ ∈ fix(T) be such that d_k = ||x_k − x̄||. Then
d_{k+1}² ≤ ||x_{k+1} − x̄||² ≤ ||x_k − x̄||² − τ||T′(x_k) − T′(x̄)||²
≤ d_k² − (τ/κ²) d_k²
= (1 − τ/κ²) d_k².
NB: as metric sub-regularity is a local property, the linear convergence happens only once x_k is close enough to fix(T).
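An illustration with a gradient step on a strongly convex quadratic, for which T′ = Id − T = γQ is metrically sub-regular everywhere (the choices of Q, γ, λ are illustrative):

```python
import numpy as np

Q = np.diag([1.0, 4.0])            # strong convexity alpha = 1, Lipschitz L = 4
gamma = 1.0 / 4.0
T = lambda x: x - gamma * (Q @ x)  # non-expansive, fix(T) = {0}

lam = 0.5
tau = lam * (1 - lam)
kappa = 1.0 / (gamma * 1.0)        # dist(x, fix(T)) <= kappa * ||x - T(x)||
rho = 1.0 - tau / kappa**2         # predicted contraction factor for d_k^2

x = np.array([3.0, -2.0])
d_prev = np.linalg.norm(x)         # d_k = dist(x_k, fix(T)) = ||x_k||
for _ in range(50):
    x = x + lam * (T(x) - x)       # KM step
    d = np.linalg.norm(x)
    assert d**2 <= rho * d_prev**2 + 1e-12
    d_prev = d
```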
Optimal relaxation parameter?
Consider λ_k ∈ [0, 1] and x_{k+1} = (1 − λ_k)x_k + λ_k T(x_k). Then
||x_{k+1} − x*||² = ||(1 − λ_k)(x_k − x*) + λ_k(T(x_k) − x*)||²
= (1 − λ_k)||x_k − x*||² + λ_k||T(x_k) − x*||² − λ_k(1 − λ_k)||x_k − T(x_k)||²
= λ_k²||x_k − T(x_k)||² − λ_k(||x_k − x*||² − ||T(x_k) − x*||² + ||x_k − T(x_k)||²) + ||x_k − x*||²,
which is a quadratic function of λ_k, minimised at
λ = 1/2 + (||x_k − x*||² − ||T(x_k) − x*||²) / (2||x_k − T(x_k)||²).
Approximation (x* is unknown, so the differences are estimated from the successive iterates T(x_k), T²(x_k)):
λ = 1/2 + (||x_k − T(x_k)||² − ||T(x_k) − T²(x_k)||²) / (2||(x_k − T(x_k)) − (T(x_k) − T²(x_k))||²).
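A sketch of the resulting adaptive rule (the clipping to [0, 1] is an illustrative safeguard, not part of the derivation):

```python
import numpy as np

def adaptive_relaxation(x, T):
    """Approximate optimal lambda with x* replaced via T(x) and T(T(x))."""
    Tx = T(x)
    r1 = x - Tx                    # x_k - T(x_k)
    r2 = Tx - T(Tx)                # T(x_k) - T^2(x_k)
    denom = 2.0 * float((r1 - r2) @ (r1 - r2))
    if denom == 0.0:
        return 1.0                 # residuals agree: any lambda is optimal
    lam = 0.5 + float(r1 @ r1 - r2 @ r2) / denom
    return float(np.clip(lam, 0.0, 1.0))
```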
Outline
1 Method of alternating projection
2 Monotone and non-expansive mappings
3 Krasnosel’skiĭ-Mann iteration
4 “Accelerated” Krasnosel’skiĭ-Mann iteration
Inertial Krasnosel’skiĭ-Mann iteration
An inertial Krasnosel’skiĭ-Mann iteration
Initial: x_0 ∈ R^n, x_{−1} = x_0;
y_k = x_k + a_k(x_k − x_{k−1}), a_k ∈ [0, 1],
z_k = x_k + b_k(x_k − x_{k−1}), b_k ∈ [0, 1],
x_{k+1} = (1 − λ_k)y_k + λ_k T(z_k), λ_k ∈ [0, 1].
– Covers the heavy-ball method, Nesterov's scheme, inertial FB and FISTA (a sketch of the scheme follows the remarks below).
– Convergence analysis is much harder than for the inertial versions of descent methods.
– No convergence rate is available.
– May perform very poorly in practice, slower than the original scheme.
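A sketch with constant parameters a, b and λ (illustrative defaults):

```python
import numpy as np

def inertial_km(T, x0, a=0.3, b=0.3, lam=0.5, n_iter=1000):
    """Inertial KM: extrapolate along the previous step, then relax towards T."""
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iter):
        y = x + a * (x - x_prev)   # inertial point for the identity part
        z = x + b * (x - x_prev)   # inertial point fed into T
        x_prev, x = x, (1 - lam) * y + lam * T(z)
    return x
```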
A multi-step inertial scheme
A multi-step inertial Krasnosel’skiĭ-Mann iteration
Initial: x_0 ∈ R^n, x_{−1} = x_0 and γ ∈ ]0, 2/L[;
y_k = x_k + a_{0,k}(x_k − x_{k−1}) + a_{1,k}(x_{k−1} − x_{k−2}) + · · ·,
z_k = x_k + b_{0,k}(x_k − x_{k−1}) + b_{1,k}(x_{k−1} − x_{k−2}) + · · ·,
x_{k+1} = (1 − λ_k)y_k + λ_k T(z_k), λ_k ∈ [0, 1].
– Even harder to analyse convergence.
– No rate.
– However, it can outperform the original scheme.
Convergence
Conditional convergence: for i = 0, 1, ..., require
Σ_{k∈N} max{max_i |a_{i,k}|, max_i |b_{i,k}|} Σ_i ||x_{k−i} − x_{k−i−1}|| < +∞.
Online updating rule:
a_{i,k} = min{a_i, c_{i,k}}, where
c_{i,k} = c_i / (k^{1+δ} Σ_i ||x_{k−i} − x_{k−i−1}||), δ > 0.
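A sketch of the online rule, reading a_i as a fixed nominal inertial parameter being capped (an assumption; the helper name is illustrative):

```python
import numpy as np

def online_inertia(k, diffs, a_nominal, c, delta=0.1):
    """a_{i,k} = min(a_i, c_i / (k^{1+delta} * sum_i ||x_{k-i} - x_{k-i-1}||)).

    diffs[i] holds x_{k-i} - x_{k-i-1}; a_nominal and c are per-index constants.
    """
    s = sum(np.linalg.norm(d) for d in diffs)
    if s == 0.0:
        return list(a_nominal)     # steps vanished; the condition holds trivially
    return [min(ai, ci / (k ** (1 + delta) * s)) for ai, ci in zip(a_nominal, c)]
```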
References
– H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, 2011.
– P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
– J. Liang, J. Fadili, and G. Peyré. Convergence rates with inexact non-expansive operators. Mathematical Programming, 159(1-2):403–434, 2016.