TRANSCRIPT
On the Acceleration of First-order Methods and a New One
Jingwei LIANG
Institute of Natural Sciences, Shanghai Jiao Tong University
Joint work with: Clarice Poon (Bath UK)
Outline
1 Introduction
2 Trajectory of first-order methods
3 Adaptive acceleration for FoM
4 Relation with previous work
5 Numerical experiments
6 Conclusions
Non-smooth optimization
Example - Constrained non-smooth optimization
min_{x ∈ ℝⁿ} F(x) + Σ_{j=1}^m R_j(K_j x),
where
F: smooth and differentiable with L-Lipschitz continuous gradient.
R_j: proper and lower semi-continuous.
K_j: bounded linear mapping.
Applications: signal/image processing, inverse problems, machine learning, data
science, statistics...
Challenges: non-smooth, (non-convex), composite, high dimension...
Hope: well structured.
Jingwei LIANG A2FoM Introduction 2/26
Example: blind image deconvolution
Figure: NGC 224 by the Hubble Space Telescope, from Wikipedia.
Forward model
A = y ⊛ x + w.
Example - blind image deconvolution
min_{x ∈ ℝ^{n×n}, y ∈ ℝ^{r×r}} ½‖A − y ⊛ x‖² + μ R(x) s.t. 0 ⪯ x ⪯ 1, 0 ⪯ y ⪯ 1, ‖y‖₁ = 1.
Example: sparse non-negative matrix factorization
Decomposition model
A = xy + w.
Example - sparse NMF
min_{x ∈ ℝ^{m×r}, y ∈ ℝ^{r×n}} ½‖A − xy‖² s.t. x, y ≥ 0, ‖x_i‖₀ ≤ s_x, i = 1, ..., r.
First-order methods: two basic ingredients
Gradient descent [Cauchy '1847]
min_x F(x)
where F is convex, smooth and differentiable with ∇F being L-Lipschitz.
Explicit Euler scheme:
x_{k+1} = x_k − γ_k ∇F(x_k), γ_k ∈ ]0, 2/L[.
Proximal point algorithm [Rockafellar '76]
min_x R(x)
with R being proper, closed and convex. Define the proximity operator by
prox_{γR}(v) ≜ argmin_x γR(x) + ½‖x − v‖².
Implicit Euler scheme:
x_{k+1} = prox_{γ_k R}(x_k), γ_k > 0.
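Both ingredients fit in a few lines. A minimal sketch, assuming a toy quadratic F(x) = ½‖x − b‖² (so ∇F(x) = x − b and L = 1) and R = ‖·‖₁, whose proximity operator is soft-thresholding; the specific data are illustrative assumptions:

```python
import numpy as np

def grad_descent(b, x0, gamma=1.0, iters=100):
    # Explicit Euler: x_{k+1} = x_k - gamma * grad F(x_k), gamma in ]0, 2/L[
    x = x0.copy()
    for _ in range(iters):
        x = x - gamma * (x - b)          # grad F(x) = x - b for F = 0.5*||x - b||^2
    return x

def prox_l1(v, gamma):
    # prox_{gamma ||.||_1}(v) = argmin_x gamma*||x||_1 + 0.5*||x - v||^2
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def proximal_point(x0, gamma=0.5, iters=200):
    # Implicit Euler for min ||x||_1: x_{k+1} = prox_{gamma R}(x_k)
    x = x0.copy()
    for _ in range(iters):
        x = prox_l1(x, gamma)
    return x

print(grad_descent(np.array([1.0, -2.0, 3.0]), np.zeros(3)))   # converges to b, the minimizer of F
print(proximal_point(np.array([5.0, -0.2])))                   # converges to 0, the minimizer of ||x||_1
```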
First-order methods
A rich class of first-order methods
F + R: Forward–Backward splitting [Lions & Mercier '79]
R_1 + R_2: Douglas–Rachford splitting [Douglas & Rachford '56; Lions & Mercier '79]; ADMM [Glowinski & Marrocco '75; Gabay & Mercier '76]...
F + R(K·): Primal–Dual splitting methods [Arrow, Hurwicz & Uzawa '58; Esser, Zhang & Chan '10; Chambolle & Pock '11]
F + Σ_j R_j: Generalized Forward–Backward splitting [Raguet, Fadili & Peyré '13]
⋯
Origins from numerical PDE back to the 1950s; now ubiquitous in signal/image processing, inverse problems, data science, statistics, machine learning...
Fixed-point formulation
Let F : H → H be non-expansive with fix(F) ≜ {z ∈ H | z = F(z)} ≠ ∅,
z_{k+1} = F(z_k).
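A minimal sketch of the fixed-point formulation; the averaged affine map below is an assumption for illustration (any non-expansive F with a fixed point works the same way):

```python
import numpy as np

def fixed_point(F, z0, iters=500):
    # Plain fixed-point iteration z_{k+1} = F(z_k)
    z = z0.copy()
    for _ in range(iters):
        z = F(z)
    return z

# Averaged affine map F(z) = 0.5*(z + (M z + b)) with ||M||_2 = 0.5;
# its fixed point solves z = M z + b, i.e. (I - M) z = b.
M = np.array([[0.0, 0.5], [-0.5, 0.0]])
b = np.array([1.0, 1.0])
F = lambda z: 0.5 * (z + (M @ z + b))
z_star = fixed_point(F, np.zeros(2))
print(z_star)   # close to np.linalg.solve(np.eye(2) - M, b)
```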
Accelerations: two approaches
Definition - Inertial technique [Polyak '64, Nesterov '83, Beck & Teboulle '09]
x̄_k = x_k + a_k(x_k − x_{k−1}),
x_{k+1} = F(x̄_k).
Figure: the inertial step extrapolates from x_{k−1} and x_k before applying F.
Accelerations: two approaches
Definition - Successive over-relaxation [Richardson '1911, Young '50]
x_{k+1} = (1 − λ_k)x_k + λ_k F(x_k)  —[a_k = λ_k − 1]⟶  x̄_k = x_k + a_k(x_k − x_{k−1}),
x_{k+1} = F(x̄_k).
Figure: over-relaxation extrapolates along the direction from x_k towards F(x_k).
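The inertial step can be sketched as follows; the contraction F and the choice a = 0.3 are assumptions for illustration:

```python
import numpy as np

def inertial_fp(F, x0, a=0.3, iters=200):
    # One-step inertial acceleration of a fixed-point iteration:
    #   y_k = x_k + a*(x_k - x_{k-1});  x_{k+1} = F(y_k)
    x_prev, x = x0, x0
    for _ in range(iters):
        y = x + a * (x - x_prev)          # inertial extrapolation
        x_prev, x = x, F(y)
    return x

F = lambda x: 0.9 * x + 1.0               # toy contraction, fixed point x* = 10
print(inertial_fp(F, 0.0))                # close to 10
```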
A brief history of inertial acceleration
Figure: timeline — 1964 heavy-ball method; 1985 Nesterov's acceleration; 2001 inertial PPA; 2009 FISTA; 2014 dynamical system of Nesterov's acceleration; 2017 dynamical system of maximal monotone operator — along the two threads "inertial methods" and "dynamical systems".
A brief timeline of inertial acceleration.
A glimpse of references: [Polyak '64], [Polyak '87], [Nesterov '07], [Nesterov '83], [Nesterov '04], [Alvarez '00], [Attouch, Goudou & Redont '00], [Alvarez & Attouch '01], [Attouch, Peypouquet & Redont '14], [Attouch & Peypouquet '16], [Attouch & Peypouquet '19], [Adly & Attouch '20], [Moudafi & Oliny '03], [Maingé '08], [Beck & Teboulle '09], [O'Donoghue '12], [Su, Boyd & Candès '14], [Su, Boyd & Candès '16], [Chambolle & Dossal '15], [Aujol & Dossal '15], [Lorenz & Pock '14], [Boţ, Csetnek & Hendrich '15], [Boţ & Csetnek '16], [Liang, Fadili & Gabriél '17], [Apidopoulos, Aujol & Dossal '20], [França, Robinson & Vidal '18a], [França, Robinson & Vidal '18b], [Calatroni & Chambolle '19], [Chambolle & Tovey '21] and many others...
A brief history of inertial acceleration
[Polyak '64] Consider minimizing F ∈ C¹_L(ℝⁿ):
x̄_k = x_k + a_k(x_k − x_{k−1}),
x_{k+1} = x̄_k − γ∇F(x_k).
If F ∈ C²_{L,μ}(ℝⁿ), the optimal rate can be achieved with a* = (1 − √(μ/L)) / (1 + √(μ/L)).
Heavy-ball dynamics with friction:
ẍ(t) + γẋ(t) = −∇F(x(t)).
A brief history of inertial acceleration
[Nesterov '83] Consider minimizing F ∈ C¹_L(ℝⁿ), with γ ≤ 1/L:
x̄_k = x_k + (k − 1)/(k + 2) (x_k − x_{k−1}),
x_{k+1} = x̄_k − γ∇F(x̄_k).
Convex case: O(1/k) → O(1/k²).
If F ∈ C¹_{L,μ}(ℝⁿ): 1 − μ/L → 1 − √(μ/L).
A brief history of inertial acceleration
[Alvarez & Attouch '01] Consider the maximal monotone inclusion problem 0 ∈ A(x):
ẍ(t) + γẋ(t) ∈ −A(x(t)). No co-coercivity.
Guaranteed convergence for a_k < 1/3 with
x_{k+1} = (Id + λ_k A)^{−1}(x_k + a_k(x_k − x_{k−1})).
A brief history of inertial acceleration
[Beck & Teboulle '09][Chambolle & Dossal '15] F ∈ C¹_L(ℝⁿ), R ∈ Γ₀(ℝⁿ): γ ∈ ]0, 1/L], d > 2:
x̄_k = x_k + (k − 1)/(k + d) (x_k − x_{k−1}),
x_{k+1} = prox_{γR}(x̄_k − γ∇F(x̄_k)).
Rate of convergence: sequence o(1/√k) → o(1/k), objective O(1/k) → O(1/k²).
Convergence of the sequence.
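A FISTA-style sketch for min F + R with F(x) = ½‖Kx − f‖² and R = μ‖·‖₁; the synthetic data, μ and d are assumptions for illustration:

```python
import numpy as np

def soft(v, t):
    # prox of t*||.||_1: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(K, f, mu, iters=300, d=3.0):
    L = np.linalg.norm(K, 2) ** 2               # Lipschitz constant of grad F
    gamma = 1.0 / L                             # gamma in ]0, 1/L]
    x = x_prev = np.zeros(K.shape[1])
    for k in range(1, iters + 1):
        y = x + (k - 1) / (k + d) * (x - x_prev)        # inertial step, d > 2
        g = K.T @ (K @ y - f)                           # grad F at y
        x_prev, x = x, soft(y - gamma * g, gamma * mu)  # proximal gradient step
    return x

rng = np.random.default_rng(0)
K = rng.standard_normal((20, 8))
f = K @ np.array([1.0, 0, 0, -2.0, 0, 0, 0, 0.5])       # noiseless sparse signal
x = fista(K, f, mu=0.01)
print(x)   # approximately sparse, close to the ground truth
```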
A brief history of inertial acceleration
[Su, Boyd & Candès '14] A differential equation for Nesterov's acceleration:
ẍ(t) + (α/t)ẋ(t) = −∇F(x(t)).
A brief history of inertial acceleration
[Attouch & Peypouquet '19] Yosida regularization:
A_λ = (1/λ)(Id − (Id + λA)^{−1}).
Damped inertial dynamics:
ẍ(t) + γ(t)ẋ(t) = −A_{λ(t)}(x(t)), with γ(t) = α/t and λ(t)γ²(t) > 1.
Sequence rate of convergence: o(1/√k) → o(1/k).
Are we blessed with guaranteed acceleration?
Problem -
min_{x ∈ ℝⁿ} J(x) + R(x).
Let γ > 0 and define
F_DR ≜ ½(Id + (2prox_{γR} − Id)(2prox_{γJ} − Id)).
Douglas–Rachford splitting [Douglas & Rachford '56]
z_{k+1} = F_DR(z_k).
Sequence o(1/√k); objective: not available.
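A sketch of the DR iteration on a toy feasibility instance, with projections onto two lines in ℝ² playing the role of the two proximity operators (the two lines are assumptions for illustration):

```python
import numpy as np

def proj_line(direction):
    # Orthogonal projector onto the line spanned by `direction`
    d = direction / np.linalg.norm(direction)
    return lambda z: d * (d @ z)

P1 = proj_line(np.array([1.0, 0.0]))        # the x-axis, plays prox_{gR}
P2 = proj_line(np.array([1.0, 1.0]))        # the diagonal, plays prox_{gJ}

def F_DR(z):
    # F_DR = 0.5*(Id + (2 P1 - Id)(2 P2 - Id))
    r = 2 * P2(z) - z                       # reflect through T2
    return 0.5 * (z + (2 * P1(r) - r))      # reflect through T1, then average

z = np.array([3.0, 2.0])
for _ in range(100):
    z = F_DR(z)
print(P2(z))   # the shadow sequence converges to T1 ∩ T2 = {0}
```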
Inertial Douglas–Rachford [Boţ, Csetnek & Hendrich '15]
z̄_k = z_k + a_k(z_k − z_{k−1}),
z_{k+1} = F_DR(z̄_k).
No rates available; may fail to provide acceleration.
Regularized inertial Douglas–Rachford: let J_A = (Id + A)^{−1} = F_DR, α > 2 and β > 0:
y_k = x_k + (k − 1)/(k + α) (x_k − x_{k−1}),
x_{k+1} = J_{γ_k A}(y_k).
Let γ_k = (1 + ε)(β/α²)k² with ε > 2/(α − 2); then
‖x_k − x_{k−1}‖ = O(1/k), x_k ⇀ x* ∈ zer(A).
Are we blessed with guaranteed acceleration?
Problem - Feasibility problem in ℝ²
Let T₁, T₂ ⊂ ℝ² be two subspaces such that T₁ ∩ T₂ ≠ ∅,
Find x ∈ ℝ² such that x ∈ T₁ ∩ T₂.
Define
F_DR ≜ ½(Id + (2P_{T₁} − Id)(2P_{T₂} − Id)).
Inertial Douglas–Rachford:
z̄_k = z_k + a(z_k − z_{k−1}),
z_{k+1} = F_DR(z̄_k).
1-step inertial: a = 0.3.
2-step inertial: a = 0.6, b = −0.3.
Figure: error versus iterations for normal DR and the inertial variants.
Regularized inertial Douglas–Rachford:
y_k = x_k + (k − 1)/(k + α) (x_k − x_{k−1}),
x_{k+1} = J_{γ_k A}(y_k).
Figure: error versus iterations for the regularized inertial DR.
Are we blessed with guaranteed acceleration?
Consider minimizing
min_{x ∈ ℝⁿ} R(x) + J(z) s.t. z = Ax.
Proposition - Dynamics of ADMM [França et al '18a]
Let V(x) = R(x) + J(Ax),
(AᵀA)(ẍ(t) + (α/t)ẋ(t)) = −∇V(x(t)).
Smoothness is required of both R and J.
Improves the objective rate of convergence from O(1/t) to O(1/t²).
With a non-smooth objective, Yosida regularization can be applied [França et al '18b].
No results available in the discrete setting...
Are we blessed with guaranteed acceleration?
Some remarks
Nesterov/FISTA provide the optimal convergence rate.
Generalization of the inertial technique to first-order methods, or in general to fixed-point iterations, is achievable:
• Guaranteed sequence convergence.
• NO acceleration guarantees, unless stronger assumptions are imposed, e.g. strong convexity or Lipschitz smoothness.
For a given method, e.g. Douglas–Rachford, the outcome of its inertial/SOR versions is problem- and parameter-dependent.
A general acceleration framework with acceleration guarantees is missing!
Acceleration of First-order Methods
Trajectory of first-order methods
Straight line, logarithmic/elliptical spirals
What makes the difference?
Figure: trajectory of {x_k} for gradient descent versus Nesterov/FISTA; trajectory of {z_k} for normal DR versus 1-step inertial DR; projection of an elliptical spiral in ℝ⁴ onto ℝ².
What makes the difference?
Non-smooth optimization —(structure)⟶ FoM —(geometry)⟶ trajectory of the sequence
However, FoM are non-linear in general...
Partial smoothness
Definition - Partly smooth function [Lewis '03]
R is partly smooth at x relative to a set M_x containing x if ∂R(x) ≠ ∅ and:
Smoothness: M_x is a C²-manifold and R restricted to M_x is C² near x.
Sharpness: the tangent space T_{M_x}(x) is T_x ≜ par(∂R(x))^⊥.
Continuity: ∂R : ℝⁿ ⇉ ℝⁿ is continuous along M_x near x.
par(C): the subspace parallel to C, where C ⊂ ℝⁿ is a non-empty convex set.
Examples:
ℓ₁-, ℓ₁,₂-, ℓ∞-norm
Nuclear norm
Total variation
Partly smooth sets...
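Finite identification can be observed numerically: for R = μ‖·‖₁ the manifold M_x is the set of points sharing the support of x, and Forward–Backward fixes this support after finitely many iterations. A sketch on synthetic data (all problem data below are assumptions for illustration):

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(1)
K = rng.standard_normal((30, 10))
x_true = np.zeros(10); x_true[[1, 4]] = [2.0, -1.5]     # support = {1, 4}
f = K @ x_true                                          # noiseless observation
mu, gamma = 0.1, 1.0 / np.linalg.norm(K, 2) ** 2

x = np.zeros(10)
supports = []
for _ in range(400):
    # Forward--Backward: gradient step on 0.5*||Kx - f||^2, then prox of mu*||.||_1
    x = soft(x - gamma * (K.T @ (K @ x - f)), gamma * mu)
    supports.append(tuple(np.flatnonzero(x)))
print(supports[-1])   # the support eventually stops changing: M is identified
```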
Figure: a partly smooth function f(x) and its manifold M_x.
Trajectory of first-order methods
Framework for analyzing the local trajectory of FoM:
First-order method (non-linear)
⇓ Convergence & non-degeneracy: finite identification of M
⇓ Local linearization along M: matrix M (linear)
⇓ Spectral properties of M
Local trajectory
Trajectory of first-order methods
Forward–Backward: M is similar to a symmetric matrix with real eigenvalues in ]−1, 1].
Douglas–Rachford/ADMM: when both functions are polyhedral, M is normal with complex eigenvalues of the form cos(θ)e^{±iθ}.
Primal–Dual: when both functions are polyhedral, M is block diagonalizable.
NB: For DR/ADMM, if smoothness or (local) strong convexity is imposed, a straight-line trajectory can be obtained under proper parameters.
Acceleration of First-order Methods
Adaptive acceleration for FoM
Trajectory-following linear prediction
Linear prediction: illustration
Idea: Given past points {z_{k−i}}_{i=0}^{q+1}, how to predict z_{k+1}?
Define {v_{k−i} ≜ z_{k−i} − z_{k−i−1}}_{i=0}^{q}.
Fit v_k using past directions v_{k−1}, …, v_{k−q}:
c_k ≜ argmin_{c ∈ ℝ^q} ‖Σ_{i=1}^q c_i v_{k−i} − v_k‖².
Suppose z_{k+1} were given; we could compute c_{k+1} as above...
Approximation:
v̄_{k+1} = Σ_i c_{k+1,i} v_{k+1−i} ≈ Σ_i c_{k,i} v_{k+1−i}.
Approximation: z_{k+1} ≈ z̄_{k,1} ≜ z_k + Σ_{i=1}^q c_{k,i} v_{k−i+1}.
Repeat on {z_{k−i}}_{i=0}^{q} ∪ {z̄_{k,1}}, and so on...
Repeat s times: z_{k+s} ≈ z̄_{k,s} = z_k + Σ_{j=1}^s v̄_{k+j}.
Let s → +∞; if the series converges,
z* ≈ z̄_{k,+∞} = z_k + Σ_{j=1}^{+∞} v̄_{k+j}.
The s-step extrapolation is z̄_{k,s} = z_k + E_{s,q,k}, where E_{s,q,k} = Σ_{i=1}^q e_i v_{k−i+1} and
e ≜ (Σ_{j=1}^s H(c_k)^j)_{(:,1)}, with
H(c_k) ≜
[ c_1      1
  c_2         1
  ⋮              ⋱
  c_{q−1}           1
  c_q      0   …    0 ].
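The whole construction can be sketched as follows: fit c by least squares, build the companion-type matrix H(c), and sum its powers — the Neumann series (Id − H)^{−1} − Id when s = +∞. The affine test sequence is an assumption for illustration:

```python
import numpy as np

def lp_extrapolate(Z, s=None):
    # Z: columns are the q + 2 most recent points z_{k-q-1}, ..., z_k
    V = np.diff(Z, axis=1)                        # differences v_{k-q}, ..., v_k
    q = V.shape[1] - 1
    # fit v_k from past directions v_{k-1}, ..., v_{k-q}
    c, *_ = np.linalg.lstsq(V[:, q - 1::-1], V[:, q], rcond=None)
    H = np.zeros((q, q))
    H[:, 0] = c                                   # first column: the fitted c
    H[np.arange(q - 1), np.arange(1, q)] = 1.0    # superdiagonal of ones
    if s is None:                                 # s = +infinity via Neumann series
        S = np.linalg.inv(np.eye(q) - H) - np.eye(q)
    else:
        S = sum(np.linalg.matrix_power(H, j) for j in range(1, s + 1))
    e = S[:, 0]
    return Z[:, -1] + V[:, :0:-1] @ e             # z_k + sum_i e_i v_{k-i+1}

# On a linear iteration z_{k+1} = M z_k + b, q = 2 and s = +infinity recover
# the fixed point (Id - M)^{-1} b from just 4 points.
M = np.array([[0.5, 0.2], [-0.1, 0.3]])
b = np.array([1.0, 2.0])
Z = np.zeros((2, 4))
for j in range(1, 4):
    Z[:, j] = M @ Z[:, j - 1] + b
print(lp_extrapolate(Z), np.linalg.solve(np.eye(2) - M, b))
```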
Adaptive acceleration for FoM (A2FoM)
Given a first-order method
z_{k+1} = F(z_k).
Algorithm - A2FoM via linear prediction
Let q ≥ 1, s ≥ 1 be integers. Let z_0 ∈ ℝⁿ and z̄_0 = z_0; set D = 0 ∈ ℝ^{n×(q+1)}:
For k ≥ 1: z_k = F(z̄_{k−1}), v_k = z_k − z̄_{k−1}, D = [v_k, D(:, 1:q)].
If mod(k, q+2) = 0: compute c and H_k;
  if ρ(H_k) < 1: z̄_k = z_k + V_k (Σ_{j=1}^s H_k^j)_{(:,1)};
  else: z̄_k = z_k.
If mod(k, q+2) ≠ 0: z̄_k = z_k.
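A minimal sketch of the algorithm above; the affine map F is an assumption for illustration, and the guard skips the extrapolation whenever ρ(H_k) ≥ 1:

```python
import numpy as np

def a2fom(F, z0, q=2, s=20, iters=30):
    z_bar = z0.copy()
    D = []                                   # most recent differences, v_k first
    for k in range(1, iters + 1):
        z = F(z_bar)
        D = [z - z_bar] + D[:q]              # D = [v_k, v_{k-1}, ..., v_{k-q}]
        z_bar = z
        if k % (q + 2) == 0 and len(D) == q + 1:
            V = np.column_stack(D)
            # fit v_k from the q previous directions
            c, *_ = np.linalg.lstsq(V[:, 1:], V[:, 0], rcond=None)
            H = np.zeros((q, q))
            H[:, 0] = c
            H[np.arange(q - 1), np.arange(1, q)] = 1.0
            if max(abs(np.linalg.eigvals(H))) < 1:        # rho(H_k) < 1
                e = sum(np.linalg.matrix_power(H, j) for j in range(1, s + 1))[:, 0]
                z_bar = z + V[:, :q] @ e     # extrapolated point fed back to F
    return z_bar

M = np.array([[0.6, 0.2], [-0.2, 0.4]])
b = np.ones(2)
z_hat = a2fom(lambda z: M @ z + b, np.zeros(2))
print(z_hat, np.linalg.solve(np.eye(2) - M, b))   # close to the fixed point
```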
Adaptive acceleration for FoM (A2FoM)
Remarks
Every (q + 2) iterations we apply one step of LP.
Only apply the linear prediction when ρ(H_k) < 1.
Extra memory cost: n × (q + 1) (the difference-vector matrix). Usually q ≤ 10.
Extra computation cost: O(q²n), from forming and solving the least-squares fit.
Conditional convergence can be obtained by treating LP as a perturbation error:
z_{k+1} = F(z_k + ε_k).
Weighted LP:
z̄_k = z_k + θ_k V_k (Σ_{j=1}^s H_k^j)_{(:,1)},
with θ_k updated online.
If ρ(H_k) < 1, the Neumann series is convergent:
Σ_{j=0}^{+∞} H_k^j = (Id − H_k)^{−1}.
Example: Douglas–Rachford, continued
Figure: error versus iterations for normal DR and for LP with s = 4, s = 25 and s = +∞.
Acceleration of First-order Methods
Relation with previous work
Polynomial extrapolation, regularized non-linear acceleration
Convergence acceleration
Given a sequence {z_k}_{k∈ℕ} which converges to z*, can we generate another sequence {z̄_k}_{k∈ℕ} such that ‖z̄_k − z*‖ = o(‖z_k − z*‖)?
This is called convergence acceleration and is well established in numerical analysis:
1927: Aitken's Δ²-process.
1965: Anderson acceleration.
1970s: Vector extrapolation techniques such as minimal polynomial extrapolation (MPE) and reduced rank extrapolation (RRE) [Sidi '17].
Recent: Regularized non-linear acceleration (RNA), a regularised version of RRE introduced by [Scieur, d'Aspremont & Bach '16].
Vector extrapolation techniques
Polynomial extrapolation [Cabay & Jackson '76]
Consider z_{k+1} = Mz_k + b with ρ(M) < 1, so that z_k → z*:
z_k − z* = M(z_{k−1} − z*) = M^k(z_0 − z*).
If p(σ) = Σ_{i=0}^m c_i σ^i is the minimal polynomial of M w.r.t. z_0 − z*, that is,
p(M)(z_0 − z*) = Σ_{i=0}^m c_i M^i (z_0 − z*) = 0,
then z* = (Σ_{i=0}^m c_i z_i) / (Σ_i c_i).
The coefficients c can be computed without knowledge of z*:
V_m c_{(0:m−1)} = −v_{m+1},
where V_m = [v_1|v_2|⋯|v_m] and v_i = z_i − z_{i−1}.
Vector extrapolation techniques
Vector Extrapolation Methods with Applications (SIAM, 2017) by Avram Sidi.
Minimal polynomial extrapolation (MPE)
Let z_0 = z:
S.1 Generate points {z_i}_{i=0}^{m+1} and let v_i = z_i − z_{i−1}.
S.2 Let c ∈ ℝ^{m+1} be s.t. c_m = 1 and V_m c_{(0:m−1)} = −v_{m+1}, where V_m = [v_1|⋯|v_m]. For i ∈ [0, m], ĉ_i ≜ c_i / (Σ_{j=0}^m c_j).
S.3 z̄ ≜ Σ_{i=0}^m ĉ_i z_i.
Reduced rank extrapolation (RRE)
[Anderson '65; Kaniel & Stein '74; Eddy '79; Mešina '77] Replace step S.2 by
c ∈ argmin_c ‖V_{m+1} c‖ subject to 1ᵀc = 1.
LP is equivalent to MPE with S.3 replaced by z̄ ≜ Σ_{i=0}^m ĉ_i z_{i+1}.
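MPE and RRE can be sketched side by side; the affine test sequence is an assumption for illustration, and RRE's constrained least-squares step is solved here through its KKT system:

```python
import numpy as np

def mpe(Z):
    # Z: columns z_0, ..., z_{m+1}
    V = np.diff(Z, axis=1)                        # v_1, ..., v_{m+1}
    m = V.shape[1] - 1
    c = np.ones(m + 1)                            # S.2 with c_m = 1
    c[:m] = np.linalg.lstsq(V[:, :m], -V[:, m], rcond=None)[0]
    return Z[:, :m + 1] @ (c / c.sum())           # S.3: weighted average of z_0..z_m

def rre(Z):
    V = np.diff(Z, axis=1)                        # v_1, ..., v_{m+1}
    n = V.shape[1]
    # minimize ||V c||^2 subject to 1^T c = 1, via the KKT system
    K = np.block([[V.T @ V, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    c = np.linalg.solve(K, np.r_[np.zeros(n), 1.0])[:n]
    return Z[:, :-1] @ c

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
M = 0.5 * A / np.linalg.norm(A, 2)                # rho(M) <= 0.5
b = rng.standard_normal(5)
Z = np.zeros((5, 7))
for j in range(1, 7):
    Z[:, j] = M @ Z[:, j - 1] + b
z_star = np.linalg.solve(np.eye(5) - M, b)
print(np.linalg.norm(mpe(Z) - z_star), np.linalg.norm(rre(Z) - z_star))
```

With 7 points in ℝ⁵ the minimal polynomial has full degree, so both methods recover the fixed point up to numerical conditioning.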
Relationship between MPE and LP
Our derivation
is based on the sequence trajectory.
motivates checking ρ(H_k) < 1.
Example: solve with DR
min_x R(x) subject to Ax = f.
Figure: error versus iterations for DR, 1-iDR, 2-iDR, DR+LP (finite and infinite), MPE and RRE; left: ℓ₁-norm, right: nuclear norm.
Acceleration guarantees
When z_{k+1} − z_k = M(z_k − z_{k−1}),
‖z̄_{k,s} − z*‖ ≤ ‖z_{k+s} − z*‖ + B ε_k,
where ε_k = ‖V_{k−1} c − v_k‖ and B ≜ Σ_{ℓ=1}^s ‖M^ℓ‖ |Σ_{j=0}^{ℓ−1} (H^j)_{(1,1)}|.
Asymptotic bound (k → ∞):
ε_k = O(|η_{q+1}|^k),
where η_{q+1} is the (q+1)-th largest eigenvalue. Without extrapolation, we just have O(|η_1|^k).
Non-asymptotic bound: If λ(M) ⊂ [α, β] with −1 < α < β < 1, then
B ε_k ≤ K β^{k−q} ((√κ − 1)/(√κ + 1))^q, where κ = (1 − α)/(1 − β).
Similar error bounds also hold for MPE and RRE [Sidi '98].
For PD and DR with polyhedral functions, guaranteed acceleration with q = 2.
Regularised non-linear acceleration (RNA): regularised RRE
[Scieur, d'Aspremont & Bach '16] studied RRE for the case of
z_{k+1} − z* = A(z_k − z*) + O(‖z_k − z*‖²),
where A is symmetric with 0 ⪯ A ⪯ σId, σ < 1.
To deal with the possible ill-conditioning of V_m, regularise with λ > 0:
c ∈ Argmin_c cᵀ(V_mᵀV_m + λId)c subject to 1ᵀc = 1.
In practice, grid search for the optimal λ ∈ [λ_min, λ_max]; potentially many evaluations of the objective.
The angle between z_k − z_{k−1} and z_{k+1} − z_k converges to zero; intuitively, this is the regime where standard inertia works well...
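The regularised coefficient step has a closed form: minimizing cᵀ(V_mᵀV_m + λId)c subject to 1ᵀc = 1 gives c proportional to (V_mᵀV_m + λId)^{−1}1. A sketch on gradient descent for a quadratic, which supplies the symmetric linearization A assumed by the theory (Q, γ and λ are assumptions for illustration):

```python
import numpy as np

def rna(Z, lam=1e-10):
    V = np.diff(Z, axis=1)
    G = V.T @ V
    # closed-form regularised solve: c proportional to (G + lam*I)^{-1} 1
    w = np.linalg.solve(G + lam * np.linalg.norm(G) * np.eye(G.shape[0]),
                        np.ones(G.shape[0]))
    c = w / w.sum()                       # enforce 1^T c = 1
    return Z[:, 1:] @ c                   # weighted combination of the iterates

# Gradient descent x_{k+1} = x_k - gamma*(Q x_k - b): the linearization is
# A = Id - gamma*Q, symmetric with 0 <= A < Id for gamma < 1/L.
Q = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
gamma = 0.2
Z = np.zeros((4, 5))
for j in range(1, 5):
    Z[:, j] = Z[:, j - 1] - gamma * (Q @ Z[:, j - 1] - b)
x_star = np.linalg.solve(Q, b)
print(np.linalg.norm(Z[:, -1] - x_star), np.linalg.norm(rna(Z) - x_star))
```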
LASSO-type problems with FB
NB: Recall the straight-line trajectory of FB.
Figure: error versus iterations for Forward–Backward, FISTA, restarting FISTA, LP (finite and infinite step), MPE, RRE and RNA; left to right: ℓ₁-norm, ℓ₁,₂-norm, nuclear norm.
Note the performance of restarted FISTA [O'Donoghue & Candès '12]!
Acceleration of First-order Methods
Numerical experiments
Some more results
Experiment: 2 non-smooth terms
Basis pursuit type problem with ๐บ โ {๐ฅ โ โ๐ โถ ๐พ๐ฅ = ๐}:
min๐ฅ,๐ฆโโ๐ ๐ (๐ฅ) + ๐๐บ(๐ฆ) such that ๐ฅ โ ๐ฆ = 0.
Figure: for each regulariser (โ1-norm, โ1,2-norm, nuclear norm), the angle between successive directions (top) and ||๐ฅ๐ โ ๐ฅโ|| (bottom) against iteration.
Jingwei LIANG A2FoM Numerical experiments 22/26
For the โ1-norm case, inertial ADMM is slower than plain ADMM, as the eventual trajectory is a spiral.
Experiment: LASSO
The LASSO problem
min๐ฅ,๐ฆโโ๐ ๐ (๐ฅ) + ยฝ||๐พ๐ฆ โ ๐||ยฒ such that ๐ฅ โ ๐ฆ = 0.
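This splitting is exactly the form ADMM handles: a prox step in ๐ฅ, a linear solve in ๐ฆ, and a dual update. A minimal sketch for ๐  = ๐‖·‖₁ (illustrative only; `admm_lasso`, `rho` and the iteration count are assumptions, not the talk's A3DMM scheme):

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: prox of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(K, f, mu, rho=1.0, iters=500):
    """Scaled ADMM for  min_x mu*||x||_1 + 0.5*||K y - f||^2  s.t.  x = y."""
    n = K.shape[1]
    x = np.zeros(n); y = np.zeros(n); u = np.zeros(n)
    M = K.T @ K + rho * np.eye(n)      # y-subproblem matrix (cached)
    Kf = K.T @ f
    for _ in range(iters):
        x = soft(y - u, mu / rho)                   # prox of the l1 term
        y = np.linalg.solve(M, Kf + rho * (x + u))  # quadratic subproblem
        u = u + x - y                               # scaled dual update
    return x
```

With ๐พ = Id the minimiser is the soft-thresholding of ๐, which gives a quick sanity check of the iteration.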
Figure: ||๐ฅ๐ โ ๐ฅโ|| against iteration on three datasets: covtype, ijcnn1, phishing.
Jingwei LIANG A2FoM Numerical experiments 23/26
On covtype, inertial ADMM does accelerate, but A3DMM is significantly faster.
Experiment: Total variation based image inpainting
Let ๐บ โ {๐ฅ โ โ๐ร๐ โถ ๐D(๐ฅ) = ๐}, where ๐D randomly sets 50% of the pixels to zero, and consider
min๐ฅโโ๐ร๐,๐ฆ ||๐ฆ||1 + ๐๐บ(๐ฅ) such that โ๐ฅ โ ๐ฆ = 0.
Figure: angle {๐๐} (left), ||๐ฅ๐ โ ๐ฅโ|| (middle) and PSNR (right) against iteration.
Both functions are polyhedral; the trajectory is eventually a spiral.
Inertial ADMM is slower than plain ADMM.
Jingwei LIANG A2FoM Numerical experiments 24/26
Experiment: Total variation based image inpainting
Original image; ADMM, PSNR = 26.5448; inertial ADMM, PSNR = 26.1096.
Corrupted image; A3DMM (๐ = 100), PSNR = 27.0402; A3DMM (๐ = +โ), PSNR = 27.0402.
Take-away messages
Trajectory of FoM: for the fixed-point sequence {๐ง๐}๐โโ,
Linearization of FoM.
Locally, different FoM demonstrate distinct trajectories: straight line, (logarithmic and elliptic) spirals.
An adaptive acceleration for FoM
Though motivated by local trajectory, linear prediction works globally.
For polyhedral functions, guaranteed local acceleration for DR/PD using 4 past points.
For FB, the trajectory is eventually a straight line, and one can guarantee local acceleration by extrapolating from 3 past points under our framework; but one could just apply restarted FISTA...
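In the straight-line regime, linear prediction reduces to estimating a single contraction factor ๐ from the last three points and summing the geometric tail. A toy sketch of this idea (assumptions: the scalar model ๐ฃ๐ โ ๐๐ฃ๐โ1; the actual scheme fits higher-order coefficients from more past points):

```python
import numpy as np

def trajectory_extrapolate(zs, s=np.inf):
    """Straight-line linear prediction from the last 3 points (sketch).

    zs : list of iterates, oldest first.
    s  : number of extrapolated steps (np.inf sums the full geometric tail).
    """
    z_km2, z_km1, z_k = zs[-3], zs[-2], zs[-1]
    v_prev = z_km1 - z_km2
    v = z_k - z_km1
    # least-squares fit of v ~ theta * v_prev
    theta = float(v @ v_prev) / float(v_prev @ v_prev)
    theta = min(theta, 0.999)            # guard against a divergent sum
    if np.isinf(s):
        gain = theta / (1.0 - theta)     # theta + theta^2 + ...
    else:
        gain = sum(theta ** j for j in range(1, int(s) + 1))
    return z_k + gain * v
```

On an exactly geometric sequence ๐ง๐ = ๐งโ + ๐^๐ ๐, the infinite-step prediction returns the limit ๐งโ.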
Reference
Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration, NeurIPS 2019.
Geometry of First-Order Methods and Adaptive Acceleration, arXiv:2003.03910.
Jingwei LIANG A2FoM Conclusions 25/26
Many thanks for your time!
https://jliang993.github.io/
Jingwei LIANG A2FoM Conclusions 26/26