
Page 1: On the Acceleration of First-order Methods - and a New One

On the Acceleration of First-order Methods and a New One

Jingwei LIANG

Institute of Natural Sciences, Shanghai Jiao Tong University

Joint work with: Clarice Poon (Bath UK)

Page 2: On the Acceleration of First-order Methods - and a New One

Outline

1 Introduction

2 Trajectory of first-order methods

3 Adaptive acceleration for FoM

4 Relation with previous work

5 Numerical experiments

6 Conclusions

Page 3: On the Acceleration of First-order Methods - and a New One

Non-smooth optimization

Example - Constrained non-smooth optimization

min_{x∈ℝ^n} F(x) + ∑_{i=1}^r R_i(K_i x),

where

F: smooth and differentiable with L-Lipschitz continuous gradient.

R_i: proper and lower semi-continuous.

K_i: bounded linear mapping.

Applications: signal/image processing, inverse problems, machine learning, data science, statistics...

Challenges: non-smooth, (non-convex), composite, high dimension...

Hope: well structured.

Jingwei LIANG A2FoM Introduction 2/26

Page 4: On the Acceleration of First-order Methods - and a New One

Example: blind image deconvolution

Figure: NGC 224 by Hubble Space Telescope, from Wikipedia.

Forward model

A = y ⊙ x + ε.

Example - blind image deconvolution

min_{x∈ℝ^{p×r}, y∈ℝ^{s×s}} (1/2)‖A − y ⊙ x‖² + λR(x)  s.t.  0 ⪯ x ⪯ 1, 0 ⪯ y ⪯ 1, ‖y‖₁ = 1.

Jingwei LIANG A2FoM Introduction 3/26


Page 6: On the Acceleration of First-order Methods - and a New One

Example: sparse non-negative matrix factorization


Decomposition model

A = xy + ε.

Example - sparse NMF

min_{x∈ℝ^{p×r}, y∈ℝ^{r×q}} (1/2)‖A − xy‖²  s.t.  x, y ≥ 0, ‖x_i‖₀ ≤ κ_x, i = 1, …, r.

Jingwei LIANG A2FoM Introduction 4/26


Page 8: On the Acceleration of First-order Methods - and a New One

First-order methods: two basic ingredients

Gradient descent [Cauchy '1847]

min_x F(x)

where F is convex, smooth and differentiable with ∇F being L-Lipschitz.

Explicit Euler scheme:

x_{k+1} = x_k − γ_k ∇F(x_k),  γ_k ∈ ]0, 2/L[.

Proximal point algorithm [Rockafellar '76]

min_x R(x)

with R being proper, closed and convex. Define the "proximity operator" by

prox_{γR}(v) ≜ argmin_x γR(x) + (1/2)‖x − v‖².

Implicit Euler scheme:

x_{k+1} = prox_{γ_k R}(x_k),  γ_k > 0.

Jingwei LIANG A2FoM Introduction 5/26
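The two ingredients can be sketched numerically. A minimal toy illustration (assumed example, not the speaker's code): a least-squares F for the explicit scheme, and R = ‖·‖₁ for the implicit one, whose proximity operator is soft-thresholding.

```python
import numpy as np

# Explicit Euler / gradient descent on the smooth toy objective
# F(x) = 0.5 * ||A x - b||^2, with grad F(x) = A^T (A x - b) and
# Lipschitz constant L = ||A||_2^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
L = np.linalg.norm(A, 2) ** 2
gamma = 1.0 / L                      # any step size in ]0, 2/L[ is admissible

x = np.zeros(5)
for _ in range(2000):
    x = x - gamma * A.T @ (A @ x - b)

# Implicit Euler / proximal point on the non-smooth R = ||.||_1, whose
# proximity operator prox_{gamma R} is soft-thresholding.
def prox_l1(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

z = np.array([3.0, -0.2, 1.0])
for _ in range(5):
    z = prox_l1(z, 0.5)              # each step shrinks toward argmin ||.||_1 = 0
```

Gradient descent lands on the least-squares solution; the proximal point iterates shrink componentwise toward the minimizer of the ℓ₁-norm.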


Page 13: On the Acceleration of First-order Methods - and a New One

First-order methods

A rich class of first-order methods

F + R: Forward–Backward splitting [Lions & Mercier '79].

R₁ + R₂: Douglas–Rachford splitting [Douglas & Rachford '56; Lions & Mercier '79]; ADMM [Glowinski & Marrocco '75; Gabay & Mercier '76]...

F + R(K·): Primal–Dual splitting methods [Arrow, Hurwicz & Uzawa '58; Esser, Zhang & Chan '10; Chambolle & Pock '11].

F + ∑_i R_i: Generalized Forward–Backward splitting [Raguet, Fadili & Peyré '13].

⋯

Originating in numerical PDEs back in the 1950s, these methods are now ubiquitous in signal/image processing, inverse problems, data science, statistics, machine learning...

Jingwei LIANG A2FoM Introduction 6/26

Page 14: On the Acceleration of First-order Methods - and a New One

First-order methods


Fixed-point formulation

Let F : H → H be non-expansive with fix(F) ≜ {z ∈ H | z = F(z)} ≠ ∅, and iterate

z_{k+1} = F(z_k).

Jingwei LIANG A2FoM Introduction 6/26
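A minimal sketch of the fixed-point viewpoint (assumed toy F: an affine contraction, which is in particular non-expansive with a unique fixed point):

```python
import numpy as np

# Fixed-point iteration z_{k+1} = F(z_k) for an affine contraction
# F(z) = M z + d with ||M||_2 < 1, so fix(F) = {(I - M)^{-1} d}.
M = np.array([[0.5, 0.2],
              [0.1, 0.4]])
d = np.array([1.0, -1.0])
F = lambda z: M @ z + d

z = np.zeros(2)
for _ in range(200):
    z = F(z)
```

All the splitting schemes listed above are instances of this template, with F built from gradient and proximity operators.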

Page 15: On the Acceleration of First-order Methods - and a New One

Accelerations: two approaches

Definition - Inertial technique [Polyak โ€™64, Nesterov โ€™83, Beck & Teboulle โ€™09]

x̃_k = x_k + a_k(x_k − x_{k−1}),
x_{k+1} = F(x̃_k).


Jingwei LIANG A2FoM Introduction 7/26

Page 16: On the Acceleration of First-order Methods - and a New One

Accelerations: two approaches

Definition - Successive over-relaxation [Richardson โ€™1911, Young โ€™50]

๐‘ฅ๐‘˜+1 = (1 โˆ’ ๐œ†๐‘˜)๐‘ฅ๐‘˜ + ๐œ†๐‘˜F(๐‘ฅ๐‘˜)๐‘Ž๐‘˜=๐œ†๐‘˜โˆ’1

=======โ‡’ โŒŠ๐‘ฅ๐‘˜ = ๐‘ฅ๐‘˜ + ๐‘Ž๐‘˜(๐‘ฅ๐‘˜ โˆ’ ๐‘ฅ๐‘˜โˆ’1),

๐‘ฅ๐‘˜+1 = F( ๐‘ฅ๐‘˜).


Jingwei LIANG A2FoM Introduction 7/26

Page 17: On the Acceleration of First-order Methods - and a New One

A brief history of inertial acceleration

Timeline (1964 → 1985 → 2001 → 2009 → 2014 → 2017 → now): heavy-ball method, Nesterov's acceleration, inertial PPA, FISTA, dynamical system of Nesterov's acceleration, dynamical system of maximal monotone operators, and today's inertial methods and dynamical systems.

A brief timeline of inertial acceleration.

A glimpse of references: [Polyak '64], [Polyak '87], [Nesterov '07], [Nesterov '83], [Nesterov '04], [Alvarez '00], [Attouch, Goudou & Redont '00], [Alvarez & Attouch '01], [Attouch, Peypouquet & Redont '14], [Attouch & Peypouquet '16], [Attouch & Peypouquet '19], [Adly & Attouch '20], [Moudafi & Oliny '03], [Maingé '08], [Beck & Teboulle '09], [O'Donoghue '12], [Su, Boyd & Candès '14], [Su, Boyd & Candès '16], [Chambolle & Dossal '15], [Aujol & Dossal '15], [Lorenz & Pock '14], [Boţ, Csetnek & Hendrich '15], [Boţ & Csetnek '16], [Liang, Fadili and Gabriél '17], [Apidopoulos, Aujol & Dossal '20], [França, Robinson and Vidal '18a], [França, Robinson and Vidal '18b], [Calatroni & Chambolle '19], [Chambolle & Tovey '21] and many others...

Jingwei LIANG A2FoM Introduction 8/26

Page 18: On the Acceleration of First-order Methods - and a New One


[Polyak '64] Consider minimizing F ∈ C¹_L(ℝⁿ):

x̃_k = x_k + a_k(x_k − x_{k−1}),
x_{k+1} = x̃_k − γ∇F(x_k).

If F ∈ C²_{L,μ}(ℝⁿ), the optimal rate ρ* = (1 − √(μ/L))/(1 + √(μ/L)) can be achieved.

Heavy-ball dynamics with friction:

ẍ(t) + γẋ(t) = −∇F(x(t)).

Jingwei LIANG A2FoM Introduction 8/26
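A runnable sketch of the heavy-ball iteration on an assumed strongly convex quadratic, with the classical parameter choice that attains the optimal rate ρ*:

```python
import numpy as np

# Heavy-ball on F(x) = 0.5 x^T Q x with mu = 1, L = 100.  The classical choice
#   gamma = 4 / (sqrt(L) + sqrt(mu))^2,
#   a = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))^2
# attains rho* = (1 - sqrt(mu/L)) / (1 + sqrt(mu/L)).
Q = np.diag([1.0, 10.0, 100.0])
mu, L = 1.0, 100.0
gamma = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
a = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_prev = x = np.array([1.0, 1.0, 1.0])
for _ in range(400):
    # x_{k+1} = x_k + a (x_k - x_{k-1}) - gamma * grad F(x_k)
    x_prev, x = x, x + a * (x - x_prev) - gamma * (Q @ x)
```

The iterates converge to the minimizer 0 at the accelerated rate, roughly ρ* per iteration instead of 1 − μ/L.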

Page 19: On the Acceleration of First-order Methods - and a New One


[Nesterov '83] Consider minimizing F ∈ C¹_L(ℝⁿ), with γ ≤ 1/L:

x̃_k = x_k + ((k − 1)/(k + 2))(x_k − x_{k−1}),
x_{k+1} = x̃_k − γ∇F(x̃_k).

Convex case: o(1/k) → o(1/k²).

If F ∈ C¹_{L,μ}(ℝⁿ): 1 − μ/L → 1 − √(μ/L).

Jingwei LIANG A2FoM Introduction 8/26
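A sketch of Nesterov's scheme on an assumed least-squares toy problem (γ = 1/L, momentum (k − 1)/(k + 2)):

```python
import numpy as np

# Nesterov's accelerated gradient: extrapolate with (k-1)/(k+2), then take a
# gradient step at the extrapolated point y, with gamma <= 1/L.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
L = np.linalg.norm(A, 2) ** 2
gamma = 1.0 / L
grad = lambda v: A.T @ (A @ v - b)

x_prev = x = np.zeros(10)
for k in range(1, 2001):
    y = x + (k - 1) / (k + 2) * (x - x_prev)    # inertial extrapolation
    x_prev, x = x, y - gamma * grad(y)          # gradient step at y
```

The only change relative to plain gradient descent is where the gradient is evaluated, yet the objective gap improves from o(1/k) to o(1/k²).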

Page 20: On the Acceleration of First-order Methods - and a New One


[Alvarez & Attouch '01] Consider the maximal monotone inclusion problem 0 ∈ A(x):

ẍ(t) + γẋ(t) ∈ −A(x(t)).  No co-coercivity.

Guaranteed convergence for a_k < 1/3 with

x_{k+1} = (Id + λ_k A)^{−1}(x_k + a_k(x_k − x_{k−1})).

Jingwei LIANG A2FoM Introduction 8/26

Page 21: On the Acceleration of First-order Methods - and a New One


[Beck & Teboulle '09], [Chambolle & Dossal '15]: F ∈ C¹_L(ℝⁿ), R ∈ Γ₀(ℝⁿ), γ ∈ ]0, 1/L], d > 2:

x̃_k = x_k + ((k − 1)/(k + d))(x_k − x_{k−1}),
x_{k+1} = prox_{γR}(x̃_k − γ∇F(x̃_k)).

Rate of convergence: sequence o(1/√k) → o(1/k), objective o(1/k) → o(1/k²).

Convergence of the sequence.

Jingwei LIANG A2FoM Introduction 8/26
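A runnable sketch of FISTA with the (k − 1)/(k + d) rule on an assumed lasso toy problem (forward step at the extrapolated point, then the prox/backward step):

```python
import numpy as np

# FISTA for the lasso  min_x 0.5||Ax - b||^2 + lam ||x||_1,
# with the inertial rule (k-1)/(k+d), d > 2, as on the slide.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 15))
b = rng.standard_normal(40)
lam = 0.5
L = np.linalg.norm(A, 2) ** 2
gamma, dpar = 1.0 / L, 3.0

prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x_prev = x = np.zeros(15)
for k in range(1, 3001):
    y = x + (k - 1) / (k + dpar) * (x - x_prev)            # inertial step
    x_prev, x = x, prox_l1(y - gamma * A.T @ (A @ y - b),  # forward-backward
                           gamma * lam)
```

At convergence, x is a fixed point of the forward-backward map, which characterizes lasso minimizers.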

Page 22: On the Acceleration of First-order Methods - and a New One


[Su, Boyd & Candès '14] A differential equation for Nesterov's acceleration:

ẍ(t) + (α/t)ẋ(t) = −∇F(x(t)).

Jingwei LIANG A2FoM Introduction 8/26

Page 23: On the Acceleration of First-order Methods - and a New One


[Attouch & Peypouquet '19] Yosida regularization:

A_λ = (1/λ)(Id − (Id + λA)^{−1}).

Damped inertial dynamics:

ẍ(t) + γ(t)ẋ(t) = −A_{λ(t)}(x(t)),  with γ(t) = α/t and λ(t)γ²(t) > 1.

Sequence rate of convergence: o(1/√k) → o(1/k).

Jingwei LIANG A2FoM Introduction 8/26

Page 24: On the Acceleration of First-order Methods - and a New One

Are we blessed with guaranteed acceleration?

Problem -

min_{x∈ℝⁿ} J(x) + R(x).

Let γ > 0 and define

F_DR ≜ (1/2)(Id + (2 prox_{γR} − Id)(2 prox_{γJ} − Id)).

Jingwei LIANG A2FoM Introduction 9/26

Page 25: On the Acceleration of First-order Methods - and a New One


Douglas–Rachford splitting [Douglas & Rachford '56]:

z_{k+1} = F_DR(z_k).

Sequence rate: o(1/√k); objective: N/A.

Jingwei LIANG A2FoM Introduction 9/26

Page 26: On the Acceleration of First-order Methods - and a New One


Inertial Douglas–Rachford [Boţ, Csetnek & Hendrich '15]:

z̃_k = z_k + a_k(z_k − z_{k−1}),
z_{k+1} = F_DR(z̃_k).

No rates available; it may fail to provide acceleration.

Jingwei LIANG A2FoM Introduction 9/26

Page 27: On the Acceleration of First-order Methods - and a New One


Regularized inertial Douglas–Rachford: let J_A = (Id + A)^{−1} = F_DR, α > 2 and β > 0:

y_k = x_k + ((k − 1)/(k + α))(x_k − x_{k−1}),
x_{k+1} = J_{β A_{γ_k}}(y_k).

With γ_k = (1 + ε)(β/α²)k² and ε > 2/(α − 2):

‖x_k − x_{k−1}‖ = o(1/k),  x_k → x* ∈ zer(A).

Jingwei LIANG A2FoM Introduction 9/26

Page 28: On the Acceleration of First-order Methods - and a New One

Are we blessed with guaranteed acceleration?

Problem - Feasibility problem in ℝ²

Let T₁, T₂ ⊂ ℝ² be two subspaces such that T₁ ∩ T₂ ≠ ∅.

Find x ∈ ℝ² such that x ∈ T₁ ∩ T₂.

Define

F_DR ≜ (1/2)(Id + (2P_{T₁} − Id)(2P_{T₂} − Id)).

Jingwei LIANG A2FoM Introduction 9/26

Page 29: On the Acceleration of First-order Methods - and a New One


Inertial Douglas–Rachford:

z̃_k = z_k + a(z_k − z_{k−1}),
z_{k+1} = F_DR(z̃_k).

1-step inertial: a = 0.3.

2-step inertial: a = 0.6, b = −0.3.

(Figure: ‖z_k − z*‖ versus iteration, from 10⁰ down to 10⁻¹⁵ over 1000 iterations, comparing normal DR with the inertial variants.)

Jingwei LIANG A2FoM Introduction 9/26

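The two-subspace experiment can be reproduced in a few lines. A toy sketch (assumed setup: two lines through the origin in ℝ², so T₁ ∩ T₂ = {0} and F_DR is linear), comparing normal DR with the 1-step inertial variant a = 0.3:

```python
import numpy as np

# Douglas-Rachford for the feasibility problem of two lines in R^2 through the
# origin, plus its 1-step inertial variant (a = 0 recovers normal DR).
def proj(theta):
    u = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(u, u)            # orthogonal projector onto span{u}

P1, P2 = proj(0.0), proj(np.pi / 3)
I = np.eye(2)
F_DR = 0.5 * (I + (2 * P1 - I) @ (2 * P2 - I))   # linear here, so a matrix

def run(a, iters=300):
    z_prev = z = np.array([1.0, 2.0])
    for _ in range(iters):
        y = z + a * (z - z_prev)     # inertial extrapolation
        z_prev, z = z, F_DR @ y
    return z

z_plain = run(0.0)
z_inertial = run(0.3)
```

Both variants drive the shadow sequence P₂ z_k to the intersection point 0, but (as on the slide) inertia does not necessarily accelerate: the eigenvalues of F_DR here are cos(θ)e^{±iθ}, and momentum can enlarge the contraction factor.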

Page 31: On the Acceleration of First-order Methods - and a New One


Regularized inertial Douglas–Rachford:

y_k = x_k + ((k − 1)/(k + α))(x_k − x_{k−1}),
x_{k+1} = J_{β A_{γ_k}}(y_k).


Jingwei LIANG A2FoM Introduction 9/26

Page 32: On the Acceleration of First-order Methods - and a New One

Are we blessed with guaranteed acceleration?

Consider minimizing

min_{x∈ℝⁿ} R(x) + J(z)  s.t.  z = Ax.

Proposition - Dynamics of ADMM [França et al '18a]

Let V(x) = R(x) + J(Ax). Then

(AᵀA)(ẍ(t) + (α/t)ẋ(t)) = −∇V(x(t)).

Smoothness is required of both R and J.

Improves the objective rate of convergence from O(1/t) to O(1/t²).

With a non-smooth objective, Yosida regularization can be applied [França et al '18b].

No result is available in the discrete setting...

Jingwei LIANG A2FoM Introduction 9/26

Page 33: On the Acceleration of First-order Methods - and a New One

Are we blessed with guaranteed acceleration?

Some remarks

Nesterov/FISTA provide the optimal convergence rate.

Generalizing the inertial technique to first-order methods, or to fixed-point iterations in general, is achievable:

• Guaranteed sequence convergence.

• NO acceleration guarantees, unless stronger assumptions are imposed, e.g. strong convexity or Lipschitz smoothness.

For a given method, e.g. Douglas–Rachford, the outcome of its inertial/SOR versions is problem and parameter dependent.

A general acceleration framework with acceleration guarantees is missing!

Jingwei LIANG A2FoM Introduction 9/26

Page 34: On the Acceleration of First-order Methods - and a New One

Acceleration of First-order Methods

Trajectory of first-order methods

Straight line, logarithmic/elliptical spirals

Page 35: On the Acceleration of First-order Methods - and a New One

What makes the difference?

(Figures: trajectories of {x_k}_{k∈ℕ} for gradient descent versus Nesterov/FISTA, and of {z_k}_{k∈ℕ} for normal DR versus 1-step inertial DR; the latter is the projection of an elliptical spiral in ℝ⁴ onto ℝ².)

Jingwei LIANG A2FoM Trajectory of first-order methods 10/26

Page 36: On the Acceleration of First-order Methods - and a New One

What makes the difference?

Non-smooth opt. ==structure==⇒ FoM ==geometry==⇒ trajectory of the sequence.

(Figures: gradient descent versus Nesterov/FISTA, and normal DR versus 1-step inertial DR.)

Jingwei LIANG A2FoM Trajectory of first-order methods 10/26


Page 39: On the Acceleration of First-order Methods - and a New One

What makes the difference?


However, FoM are non-linear in general...


Jingwei LIANG A2FoM Trajectory of first-order methods 10/26

Page 40: On the Acceleration of First-order Methods - and a New One

Partial smoothness

Definition - Partly smooth function [Lewis '03]

R is partly smooth at x relative to a set M_x containing x if ∂R(x) ≠ ∅ and:

Smoothness: M_x is a C²-manifold and R restricted to M_x is C² near x.

Sharpness: the tangent space T_{M_x}(x) is T_x ≜ par(∂R(x))^⊥.

Continuity: ∂R : ℝⁿ ⇉ ℝⁿ is continuous along M_x near x.

Here par(C) is the subspace parallel to C, for a non-empty convex set C ⊂ ℝⁿ.

Examples:

ℓ₁, ℓ₁,₂, ℓ∞-norms

Nuclear norm

Total variation

Partly smooth sets...

Jingwei LIANG A2FoM Trajectory of first-order methods 11/26


Page 42: On the Acceleration of First-order Methods - and a New One

Trajectory of first-order methods

Framework for analyzing the local trajectory of FoM

First-order method (non-linear)
⇓ Convergence & non-degeneracy: finite identification of the manifold M
⇓ Local linearization along M: a matrix M (linear)
⇓ Spectral properties of M
⇓ Local trajectory

Jingwei LIANG A2FoM Trajectory of first-order methods 12/26

Page 43: On the Acceleration of First-order Methods - and a New One

Trajectory of first-order methods

Forward–Backward: M is similar to a symmetric matrix with real eigenvalues in ]−1, 1].

Douglas–Rachford/ADMM: when both functions are polyhedral, M is normal, with complex eigenvalues of the form cos(θ)e^{±iθ}.

Primal–Dual: when both functions are polyhedral, M is block diagonalizable.

NB: For DR/ADMM, if smoothness or (local) strong convexity is imposed, a straight-line trajectory can be obtained under proper parameters.

Jingwei LIANG A2FoM Trajectory of first-order methods 12/26

Page 44: On the Acceleration of First-order Methods - and a New One

Acceleration of First-order Methods

Adaptive acceleration for FoM

Trajectory following linear prediction

Page 45: On the Acceleration of First-order Methods - and a New One

Linear prediction: illustration

Idea: given past points {z_{k−j}}_{j=0}^{q+1}, how to predict z_{k+1}?

Define {v_{k−j} ≜ z_{k−j} − z_{k−j−1}}_{j=0}^{q}.

Fit v_k using the past directions v_{k−1}, …, v_{k−q}:

c_k ≜ argmin_{c∈ℝ^q} ‖∑_{j=1}^q c_j v_{k−j} − v_k‖².

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26

Page 46: On the Acceleration of First-order Methods - and a New One


Suppose z_{k+1} were given; then c_{k+1} could be computed as above...

Approximation:

v_{k+1} = ∑_j c_{k+1,j} v_{k+1−j} ≈ ∑_j c_{k,j} v_{k+1−j}.

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26

Page 47: On the Acceleration of First-order Methods - and a New One


Approximation: z_{k+1} ≈ z̄_{k,1} ≜ z_k + ∑_{j=1}^q c_{k,j} v_{k−j+1}.

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26

Page 48: On the Acceleration of First-order Methods - and a New One


Repeat on {z_{k−j}}_{j=0}^{q} ∪ {z̄_{k,1}}, and so on...

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26

Page 49: On the Acceleration of First-order Methods - and a New One


Repeat s times: z_{k+s} ≈ z̄_{k,s} = z_k + ∑_{j=1}^s v̄_{k+j} (predicted differences).

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26

Page 50: On the Acceleration of First-order Methods - and a New One


Letting s → +∞, if the series converges:

z* ≈ z̄_{k,+∞} = z_k + ∑_{j=1}^{+∞} v̄_{k+j}.

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26

Page 51: On the Acceleration of First-order Methods - and a New One


The s-step extrapolation is z̄_{k,s} = z_k + E_{s,q,k}, where E_{s,q,k} = ∑_{j=1}^q c̄_j v_{k−j+1} with

c̄ ≜ (∑_{i=1}^s H(c_k)^i)_{(:,1)},

and H(c_k) is the q × q companion-type matrix whose first column is c_k and which carries ones on the superdiagonal:

H(c_k) ≜
[ c₁  1
  c₂     1
  ⋮          ⋱
  c_{q−1}       1
  c_q  0  ⋯  0 ].

Jingwei LIANG A2FoM Adaptive acceleration for FoM 13/26
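The fit-then-extrapolate mechanics can be checked end to end on an assumed linear toy iteration, where the differences satisfy v_{k+1} = M v_k exactly and hence the prediction is exact:

```python
import numpy as np

# Linear prediction on the toy iteration z_{k+1} = M z_k + d (rho(M) < 1).
# With q = 2 (the dimension), the least-squares fit c is exact, and so is the
# s-step extrapolation z_{k,s} = z_k + V_k (sum_{i=1}^s H(c)^i)_{(:,1)}.
M = np.array([[0.6, -0.3],
              [0.2,  0.5]])
d = np.array([1.0, 0.5])
step = lambda z: M @ z + d

q, s = 2, 5
z = [np.zeros(2)]
for _ in range(q + 2):
    z.append(step(z[-1]))
v = [z[i] - z[i - 1] for i in range(1, len(z))]   # v_1, ..., v_{q+2}

# fit the newest difference against the q previous ones
V = np.column_stack([v[-2], v[-3]])               # [v_{k-1}, v_{k-2}]
c = np.linalg.lstsq(V, v[-1], rcond=None)[0]

# H(c): first column c, ones on the superdiagonal
H = np.zeros((q, q))
H[:, 0] = c
H[:-1, 1:] = np.eye(q - 1)

Vk = np.column_stack([v[-1], v[-2]])              # [v_k, v_{k-1}]
S = sum(np.linalg.matrix_power(H, i) for i in range(1, s + 1))
z_pred = z[-1] + Vk @ S[:, 0]                     # s-step extrapolation

# s -> +infinity via the Neumann series: sum_{i>=1} H^i = (I - H)^{-1} H
z_inf = z[-1] + Vk @ np.linalg.solve(np.eye(q) - H, H)[:, 0]
```

Here z_pred coincides with the iterate obtained by actually running s more steps, and z_inf recovers the fixed point (I − M)⁻¹d.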

Page 52: On the Acceleration of First-order Methods - and a New One

Adaptive acceleration for FoM (A2FoM)

Given first-order method

๐‘ง๐‘˜+1 = F(๐‘ง๐‘˜).

Algorithm - A2FoM via linear prediction

Let s ≥ 1 and q ≥ 1 be integers. Let z₀ ∈ ℝⁿ, z̄₀ = z₀, and set D = 0 ∈ ℝ^{n×(q+1)}.

For k ≥ 1:
  z_k = F(z̄_{k−1}),  v_k = z_k − z̄_{k−1},  D = [v_k, D(:, 1:q)].
  If mod(k, q+2) = 0: compute c and H_c;
    if ρ(H_c) < 1: z̄_k = z_k + V_k (∑_{i=1}^s H_c^i)_{(:,1)};
    else: z̄_k = z_k.
  If mod(k, q+2) ≠ 0: z̄_k = z_k.

Jingwei LIANG A2FoM Adaptive acceleration for FoM 14/26
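Putting the pieces together, a compact sketch of the A2FoM loop on an assumed affine contraction F (the toy is linear, so with q equal to the dimension the prediction is essentially exact; this is an illustration of the scheme, not the authors' implementation):

```python
import numpy as np

# A2FoM sketch: iterate z_k = F(z~_{k-1}), keep the last q+1 differences in D,
# and every q+2 iterations fit c, build H(c), and extrapolate if rho(H(c)) < 1.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
M = 0.9 * B / np.linalg.norm(B, 2)       # contraction: ||M||_2 = 0.9
dvec = rng.standard_normal(4)
F = lambda z: M @ z + dvec

n = q = 4
s = 100
z_tilde = np.zeros(n)
D = np.zeros((n, q + 1))                 # columns: [v_k, v_{k-1}, ..., v_{k-q}]
for k in range(1, 41):
    z = F(z_tilde)
    v = z - z_tilde
    D = np.column_stack([v, D[:, :q]])
    z_tilde = z
    if k % (q + 2) == 0:
        # fit v_k against [v_{k-1}, ..., v_{k-q}] and build H(c)
        c = np.linalg.lstsq(D[:, 1:], D[:, 0], rcond=None)[0]
        H = np.zeros((q, q))
        H[:, 0] = c
        H[:-1, 1:] = np.eye(q - 1)
        if max(abs(np.linalg.eigvals(H))) < 1:    # safeguard: rho(H(c)) < 1
            S = sum(np.linalg.matrix_power(H, i) for i in range(1, s + 1))
            z_tilde = z + D[:, :q] @ S[:, 0]      # linear-prediction jump
```

Each accepted prediction jumps roughly s iterations ahead, while the spectral-radius safeguard keeps the scheme from extrapolating along a divergent fit.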

Page 53: On the Acceleration of First-order Methods - and a New One

Adaptive acceleration for FoM (A2FoM)

Remarks

Every (q + 2) iterations we apply a 1-step LP.

The linear prediction is applied only when ρ(H_c) < 1.

Extra memory cost: n × (q + 1) (the difference-vector matrix). Usually q ≤ 10.

Extra computation cost: q²n, from V⁺_{k−1}.

Conditional convergence can be obtained by treating LP as a perturbation error:

z_{k+1} = F(z_k + ε_k).

Weighted LP:

z̄_k = z_k + a_k V_k (∑_{i=1}^s H_c^i)_{(:,1)},

with a_k updated online.

If ρ(H_c) < 1, the Neumann series converges:

∑_{i=0}^{+∞} H_c^i = (Id − H_c)^{−1}.

Jingwei LIANG A2FoM Adaptive acceleration for FoM 14/26
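A quick numerical check of the Neumann-series identity that licenses the closed form for s = +∞ (assumed small H with ρ(H) < 1):

```python
import numpy as np

# For rho(H) < 1 the Neumann series converges: sum_{i>=0} H^i = (I - H)^{-1}.
H = np.array([[0.4, 1.0],
              [-0.2, 0.0]])            # companion-type, rho(H) = sqrt(0.2) < 1
S = sum(np.linalg.matrix_power(H, i) for i in range(200))
```

Truncating the series at a moderate order already matches the inverse to machine precision.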



Page 57: On the Acceleration of First-order Methods - and a New One

Example: Douglas–Rachford, continued

(Figure: normal DR versus LP-accelerated DR with s = 4, s = 25, and s = +∞.)

Jingwei LIANG A2FoM Adaptive acceleration for FoM 15/26

Page 58: On the Acceleration of First-order Methods - and a New One

Acceleration of First-order Methods

Relation with previous work

Polynomial extrapolation, regularized non-linear acceleration

Page 59: On the Acceleration of First-order Methods - and a New One

Convergence acceleration

Given a sequence {z_k}_{k∈ℕ} which converges to z⋆, can we generate another sequence {z̄_k}_{k∈ℕ} such that ‖z̄_k − z⋆‖ = o(‖z_k − z⋆‖)?

This is called convergence acceleration and is well established in numerical analysis:

1927 Aitken's Δ²-process.

1965 Anderson's acceleration.

1970s Vector extrapolation techniques such as minimal polynomial extrapolation (MPE) and reduced rank extrapolation (RRE) [Sidi '17].

Recent Regularized non-linear acceleration (RNA), a regularised version of RRE introduced by [Scieur, D'Aspremont, Bach '16].
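For intuition, Aitken's Δ²-process on a scalar sequence is a one-liner; a textbook sketch (the function name is ours):

```python
def aitken(z):
    """Aitken's Delta^2-process on a scalar sequence z:
    z_bar_k = z_k - (dz_k)^2 / (d2z_k)."""
    return [
        z[k] - (z[k + 1] - z[k]) ** 2 / (z[k + 2] - 2 * z[k + 1] + z[k])
        for k in range(len(z) - 2)
    ]
```

On an exactly geometric sequence z_k = z⋆ + r^k it recovers z⋆ immediately; the vector extrapolation techniques below generalise this idea.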

Page 61: On the Acceleration of First-order Methods - and a New One

Vector extrapolation techniques

Polynomial extrapolation [Cabay & Jackson โ€™76]

Consider z_{k+1} = M z_k + d with ρ(M) < 1, so that z_k → z⋆:

z_k − z⋆ = M(z_{k−1} − z⋆) = ⋯ = M^k(z_0 − z⋆).

If P(λ) = ∑_{j=0}^{q} c_j λ^j is the minimal polynomial of M w.r.t. z_0 − z⋆, that is,

P(M)(z_0 − z⋆) = ∑_{j=0}^{q} c_j M^j (z_0 − z⋆) = 0,

then z⋆ = (∑_{j=0}^{q} c_j z_j) / (∑_j c_j).

The coefficients c can be computed without knowledge of z⋆:

V_q c_{(0:q−1)} = −v_{q+1},

where V_q = [v_1 | v_2 | ⋯ | v_q] and v_j = z_j − z_{j−1}.

Page 64: On the Acceleration of First-order Methods - and a New One

Vector extrapolation techniques

Vector extrapolation methods with applications (SIAM, 2017) by Avram Sidi.

Minimal polynomial extrapolation (MPE)

Let z_0 = z:

S.1 Generate points {z_j}_{j=0}^{q+1} and let v_j = z_j − z_{j−1}.

S.2 Let c ∈ ℝ^{q+1} be such that c_q = 1 and V_q c_{(0:q−1)} = −v_{q+1}, where V_q = [v_1 | ⋯ | v_q]. For j ∈ [0, q], set c_j ≔ c_j / (∑_{j=0}^{q} c_j).

S.3 z̄ ≔ ∑_{j=0}^{q} c_j z_j.

Reduced rank extrapolation (RRE)

[Anderson '65; Kaniel & Stein '74; Eddy '79; Mešina '77] Replace step S.2 by

c ∈ argmin_c ‖V_{q+1} c‖ subject to 1^T c = 1.

LP is equivalent to MPE with S.3 replaced by z̄ ≔ ∑_{j=0}^{q} c_j z_{j+1}.
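The three MPE steps translate almost line for line into code; a sketch in which S.2 is solved by least squares (RRE would replace that step with the constrained problem above):

```python
import numpy as np

def mpe(Z):
    """Minimal polynomial extrapolation.
    Z = [z_0 | z_1 | ... | z_{q+1}] holds q+2 iterates as columns."""
    V = np.diff(Z, axis=1)                                 # S.1: v_j = z_j - z_{j-1}
    Vq, v_last = V[:, :-1], V[:, -1]                       # V_q and v_{q+1}
    c_head, *_ = np.linalg.lstsq(Vq, -v_last, rcond=None)  # S.2
    c = np.append(c_head, 1.0)                             # c_q = 1
    c = c / c.sum()                                        # normalise: sum_j c_j = 1
    return Z[:, :-1] @ c                                   # S.3: sum_j c_j z_j
```

For a linear iteration z_{k+1} = M z_k + d whose minimal polynomial has degree q, MPE from q+2 iterates returns the fixed point exactly.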

Page 65: On the Acceleration of First-order Methods - and a New One

Relationship between MPE and LP

Our derivation

is based on sequence trajectory.

motivates checking ๐œŒ(๐ป๐‘) < 1.

Example: solve with DR

min_x R(x) subject to Ax = b.

[Figure: convergence comparison of DR, 1-iDR, 2-iDR, DR+LP (finite and infinite step), MPE and RRE; left: ℓ1-norm, right: nuclear norm.]

Page 67: On the Acceleration of First-order Methods - and a New One

Acceleration guarantees

When z_{k+1} − z_k = M(z_k − z_{k−1}),

‖z̄_{k,s} − z⋆‖ ≤ ‖z_{k+s} − z⋆‖ + B ε_k,

where ε_k = ‖V_{k−1} c − v_k‖ and B ≔ ∑_{ℓ=1}^{s} ‖M^ℓ‖ |∑_{i=0}^{s−ℓ} (H_c^i)_{(1,1)}|.

Asymptotic bound (k → ∞):

ε_k = O(|λ_{q+1}|^k),

where λ_{q+1} is the (q+1)-th largest eigenvalue. Without extrapolation, we just have O(|λ_1|^k).

Non-asymptotic bound: if Σ(M) ⊂ [α, β] with −1 < α < β < 1, then

B ε_k ≤ K β^{k−q} ((√η − 1)/(√η + 1))^q, where η = (1 − α)/(1 − β).

Similar error bounds also hold for MPE and RRE [Sidi '98].

For PD, DR with polyhedral functions, guaranteed acceleration with q = 2.
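The non-asymptotic factor ((√η − 1)/(√η + 1))^q is easy to evaluate; a quick sketch (function name ours) for seeing how the contraction improves with q:

```python
import math

def lp_bound_factor(alpha, beta, q):
    """Factor ((sqrt(eta) - 1) / (sqrt(eta) + 1))^q
    with eta = (1 - alpha) / (1 - beta)."""
    eta = (1 - alpha) / (1 - beta)
    return ((math.sqrt(eta) - 1) / (math.sqrt(eta) + 1)) ** q
```

For example with Σ(M) ⊂ [0, 0.99] we get η = 100, so each extra order q multiplies the bound by 9/11 ≈ 0.82, on top of the β^{k−q} rate.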

Page 72: On the Acceleration of First-order Methods - and a New One

Regularised non-linear acceleration (RNA): regularised RRE

[Scieur, D'Aspremont, Bach '16] studied RRE for the case

z_{k+1} − z⋆ = A(z_k − z⋆) + O(‖z_k − z⋆‖²),

where A is symmetric with 0 ⪯ A ⪯ σ Id, σ < 1.

To deal with the possible ill-conditioning of V_q, regularise with λ > 0:

c ∈ Argmin_c c^T (V_q^T V_q + λ Id) c subject to 1^T c = 1.

In practice, grid search for the optimal λ ∈ [λ_min, λ_max], which potentially requires many evaluations of the objective.

The angle between z_k − z_{k−1} and z_{k+1} − z_k converges to zero; intuitively, this is the regime where standard inertial acceleration works well...
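The regularised problem has a closed-form solution through its stationarity condition, (V_q^T V_q + λ Id) c = μ 1, so each candidate λ costs one small linear solve; a sketch (names ours):

```python
import numpy as np

def rna_coefficients(V, lam):
    """Minimise c^T (V^T V + lam * Id) c subject to 1^T c = 1.
    Stationarity gives (V^T V + lam * Id) c = mu * 1, so c is the
    normalised solution of a single linear system."""
    G = V.T @ V + lam * np.eye(V.shape[1])
    y = np.linalg.solve(G, np.ones(V.shape[1]))
    return y / y.sum()
```

Large λ pushes the coefficients toward the uniform average of the iterates; small λ approaches the (possibly ill-conditioned) RRE solution.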

Page 75: On the Acceleration of First-order Methods - and a New One

LASSO-type problems with FB

NB: Recall the straight-line trajectory of FB.

[Figure: ℓ1-norm, ℓ1,2-norm and nuclear norm; comparison of Forward–Backward, FISTA, restarting FISTA, LP (finite and infinite step), MPE, RRE and RNA.]

Note the performance of restarted-FISTA [Oโ€™Donoghue & Candรจs โ€™12]!

Page 76: On the Acceleration of First-order Methods - and a New One

Acceleration of First-order Methods

Numerical experiments

Some more results

Page 77: On the Acceleration of First-order Methods - and a New One

Experiment: 2 non-smooth terms

Basis pursuit type problem with Ω ≔ {x ∈ ℝ^n : Kx = f}:

min_{x,y∈ℝ^n} R(x) + ι_Ω(y) such that x − y = 0.

[Figure: convergence plots for the ℓ1-norm, ℓ1,2-norm and nuclear norm cases.]

Page 78: On the Acceleration of First-order Methods - and a New One

Experiment: 2 non-smooth terms

[Figure: ℓ1-norm case.]

Inertial ADMM is slower than ADMM, as the eventual trajectory is a spiral.

Page 79: On the Acceleration of First-order Methods - and a New One

Experiment: LASSO

The LASSO problem

min๐‘ฅ,๐‘ฆโˆˆโ„๐‘›

๐‘…(๐‘ฅ) + 12 ||๐พ๐‘ฆ โˆ’ ๐‘“||2 such that ๐‘ฅ โˆ’ ๐‘ฆ = 0.

0 2000 4000 6000 8000 10000

10-10

10-8

10-6

10-4

10-2

100

102

0 200 400 600 800 1000 1200

10-10

10-8

10-6

10-4

10-2

100

102

0 500 1000 1500 2000 2500

10-10

10-8

10-6

10-4

10-2

100

102

covtype ijcnn1 phishing

Page 80: On the Acceleration of First-order Methods - and a New One

Experiment: LASSO

[Figure: covtype dataset.]

Inertial ADMM does accelerate, but A3DMM is significantly faster.

Page 81: On the Acceleration of First-order Methods - and a New One

Experiment: Total variation based image inpainting

Let Ω ≔ {x ∈ ℝ^{n×n} : P_D(x) = f}, where P_D randomly sets 50% of the pixels to zero, and consider

min_{x,y} ‖y‖₁ + ι_Ω(x) such that ∇x − y = 0.

[Figure: angle {θ_k}, ‖x_k − x⋆‖ and PSNR.]

Both functions are polyhedral; the trajectory is a spiral.

Inertial ADMM is slower than ADMM.

Page 82: On the Acceleration of First-order Methods - and a New One

Experiment: Total variation based image inpainting

[Figure: original and corrupted images; ADMM (PSNR 26.5448), inertial ADMM (PSNR 26.1096), A3DMM with s = 100 and s = +∞ (both PSNR 27.0402).]

Page 83: On the Acceleration of First-order Methods - and a New One

Take-away messages

Trajectory of FoM. For a fixed-point sequence {z_k}_{k∈ℕ}:

Linearization of FoM.

Locally, different FoM exhibit distinct trajectories: straight line, (logarithmic and elliptic) spirals.

An adaptive acceleration for FoM

Though motivated by the local trajectory, linear prediction works globally.

For polyhedral functions, guaranteed local acceleration for DR/PD using 4 past points.

For FB, the trajectory is eventually a straight line and one can guarantee local acceleration by extrapolating from 3 past points under our framework, but one could just apply restarted FISTA...

Reference

Trajectory of Alternating Direction Method of Multipliers and Adaptive Acceleration, NeurIPS 2019.

Geometry of First-Order Methods and Adaptive Acceleration, arXiv:2003.03910.

Page 86: On the Acceleration of First-order Methods - and a New One

Many thanks for your time! https://jliang993.github.io/
