
Preconditioners for Sparse Approximations and Big Data Optimization

The University of Edinburgh

Jacek Gondzio

thanks to my collaborator:

K. Fountoulakis

Padova, February 18th, 2020


Introduction

New applications:

• signal/image processing (compression, reconstruction, deblurring)

• modern computational statistics (inverse problems, classification, machine learning)

Overarching mathematical problem:

• dimension reduction

Optimization faces new challenges:

huge scale of data available needs new processing algorithms


Outline

• Homotopy: “replace a difficult problem with a sequence of (easy to solve) auxiliary problems”
  – Interior Point Methods
  – Big Data optimization

• 2nd-order methods for optimization
  – Inexact Newton method
  – Preconditioners needed
  – Matrix-free methods

• The same idea of preconditioner in IPMs and BD

• Big Data problem (2^40 ≈ 10^12 variables)

• Conclusions


Interior Point Methods (IPMs)


Primal-Dual Pair of Quadratic Programs

Primal:  min  cᵀx + ½ xᵀQx          Dual:  max  bᵀy − ½ xᵀQx
         s.t. Ax = b,  x ≥ 0;              s.t. Aᵀy + s − Qx = c,  s ≥ 0.

Lagrangian: L(x, y, s) = cᵀx + ½ xᵀQx − yᵀ(Ax − b) − sᵀx.

Optimality Conditions

Ax = b,
Aᵀy + s − Qx = c,
XSe = 0   (i.e., x_j · s_j = 0 ∀j),
(x, s) ≥ 0,

where X = diag{x_1, …, x_n}, S = diag{s_1, …, s_n}, e = (1, …, 1) ∈ Rⁿ.


Interior-Point Framework

The log barrier −log x_j “replaces” the inequality x_j ≥ 0.

[Figure: plot of −ln x]

We derive the first order optimality conditions for the primal barrier problem:

Ax = b,
−Qx + Aᵀy + s = c,
XSe = µe,

and apply the Newton method to solve this system of (nonlinear) equations.


First Order Opt Conditions for QP

Ax = b,
Aᵀy + s − Qx = c,
XSe = 0,
(x, s) ≥ 0.

First Order Opt Conditions for Barrier QP

Ax = b,
Aᵀy + s − Qx = c,
XSe = µe,
(x, s) > 0.


Homotopy/Continuation

LP/QP is about “solving” the following equation

x_j · s_j = 0,   ∀ j = 1, 2, …, n.

Do not guess! Replace this equation with

x_j · s_j = µ,   ∀ j = 1, 2, …, n,

and decrease µ gradually to zero.

Let the intelligent IPMs find out what the optimal partition is.


The pronunciation of Greek letter µ

Robert De Niro, Taxi Driver (1976)


Big Data Optimization


Sparse Approximation

• Machine Learning: Classification with SVMs

• Statistics: Estimate x from observations

• Wavelet-based signal/image reconst. & restoration

• Compressed Sensing (Signal Processing)

All such problems lead to the same dense, potentially very large QP.


Binary Classification

min_x  τ‖x‖_1 + Σ_{i=1}^m log(1 + e^(−b_i·xᵀa_i))        min_x  τ‖x‖_2² + Σ_{i=1}^m log(1 + e^(−b_i·xᵀa_i))
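As an illustration of the ℓ1-regularized formulation above, here is a minimal NumPy sketch of the logistic loss and one of its subgradients. The data (A, b) and all helper names are made up for the example; this is not the software discussed later in the talk.

```python
import numpy as np

def logistic_l1_objective(x, A, b, tau):
    """tau*||x||_1 + sum_i log(1 + exp(-b_i * a_i^T x))."""
    margins = b * (A @ x)                       # b_i * a_i^T x, one entry per sample
    loss = np.sum(np.logaddexp(0.0, -margins))  # log(1 + exp(-t)) computed stably
    return tau * np.linalg.norm(x, 1) + loss

def logistic_l1_subgradient(x, A, b, tau):
    """One subgradient: tau*sign(x) - A^T (b * sigma(-b * A x))."""
    margins = b * (A @ x)
    sig = 1.0 / (1.0 + np.exp(margins))         # sigma(-margins)
    return tau * np.sign(x) - A.T @ (b * sig)

# tiny synthetic instance (illustrative only)
rng = np.random.default_rng(0)
m, n = 50, 20
A = rng.standard_normal((m, n))
b = np.sign(rng.standard_normal(m))
x = np.zeros(n)
print(logistic_l1_objective(x, A, b, tau=0.1))  # equals m*log(2) at x = 0
```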


Bayesian Statistics Viewpoint

Estimate x from observations

y = Ax+ e,

where y are observations and e is the Gaussian noise.

→   min_x ‖y − Ax‖_2²

If the prior on x is Laplacian (log p(x) = −λ‖x‖_1 + K), then

min_x  τ‖x‖_1 + ‖Ax − b‖_2²

Tibshirani, J. of Royal Statistical Soc. B 58 (1996) 267–288.


ℓ1-regularization

min_x  τ‖x‖_1 + φ(x).

Unconstrained optimization ⇒ easy

Serious issue: nondifferentiability of ‖·‖_1

Two possible tricks:

• Splitting x = u − v with u, v ≥ 0

• Smoothing with the pseudo-Huber approximation, which replaces ‖x‖_1 with

  ψ_µ(x) = Σ_{i=1}^n ( √(µ² + x_i²) − µ )

  (a minimal sketch follows below)
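A minimal NumPy sketch of the pseudo-Huber smoothing just defined (illustrative only; the function names are ours, not taken from the authors' codes):

```python
import numpy as np

def pseudo_huber(x, mu):
    """psi_mu(x) = sum_i ( sqrt(mu^2 + x_i^2) - mu ): smooth approximation of ||x||_1."""
    return np.sum(np.sqrt(mu**2 + x**2) - mu)

def pseudo_huber_grad(x, mu):
    """Gradient: x_i / sqrt(mu^2 + x_i^2) -- a smoothed sign(x_i)."""
    return x / np.sqrt(mu**2 + x**2)

def pseudo_huber_hess_diag(x, mu):
    """The Hessian is diagonal: mu^2 / (mu^2 + x_i^2)^(3/2).
    It tends to 0 where |x_i| >> mu and to 1/mu where x_i is close to 0."""
    return mu**2 / (mu**2 + x**2) ** 1.5

x = np.array([-2.0, -0.01, 0.0, 0.5, 3.0])
for mu in (1e-1, 1e-2, 1e-3):
    print(mu, pseudo_huber(x, mu), np.linalg.norm(x, 1))  # approaches ||x||_1 as mu -> 0
```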


Huber:


Continuation

Embed the inexact Newton method into a homotopy approach:

• Inequalities u ≥ 0, v ≥ 0 −→ use IPM:
  replace z ≥ 0 with −µ log z and drive µ to zero.

• Pseudo-Huber regression −→ use continuation:
  replace |x_i| with µ( √(1 + x_i²/µ²) − 1 ) and drive µ to zero.

Questions:

• Theory?

• Practice?


Main Tool: Inexact Newton Method

Replace an exact Newton direction

∇²f(x)∆x = −∇f(x)

with an inexact one:

∇²f(x)∆x = −∇f(x) + r,

where the error r is small: ‖r‖ ≤ η‖∇f(x)‖, η ∈ (0,1).

The NLP community usually writes it as:

‖∇²f(x)∆x + ∇f(x)‖_2 ≤ η‖∇f(x)‖_2,   η ∈ (0,1).

Dembo, Eisenstat & Steihaug, SIAM J. on Numerical Analysis 19 (1982) 400–408.
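A minimal sketch of an inexact Newton iteration in this spirit, applied to the smoothed problem f_µ(x) = τψ_µ(x) + ½‖Ax − b‖_2²: the inner CG is stopped as soon as the residual meets the forcing condition above. The data, the backtracking rule and all names are illustrative assumptions, not the pdNCG implementation.

```python
import numpy as np

def cg(hess_vec, rhs, tol, max_iter=200):
    """Plain CG for H d = rhs, stopped once ||rhs - H d|| <= tol (the inexactness)."""
    d = np.zeros_like(rhs)
    r = rhs.copy()                     # residual rhs - H d (d = 0 initially)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def f_mu(x, A, b, tau, mu):
    return tau * np.sum(np.sqrt(mu**2 + x**2) - mu) + 0.5 * np.sum((A @ x - b) ** 2)

def inexact_newton(A, b, tau, mu, eta=0.1, iters=25):
    """Inexact Newton on f_mu(x) = tau*psi_mu(x) + 0.5*||Ax - b||^2."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        w = np.sqrt(mu**2 + x**2)
        grad = tau * x / w + A.T @ (A @ x - b)
        hdiag = tau * mu**2 / w**3                   # diagonal Hessian of tau*psi_mu
        hess_vec = lambda v: hdiag * v + A.T @ (A @ v)
        # forcing condition: ||H dx + grad|| <= eta * ||grad||
        dx = cg(hess_vec, -grad, tol=eta * np.linalg.norm(grad))
        t = 1.0                                      # simple backtracking line search
        for _ in range(30):
            if f_mu(x + t * dx, A, b, tau, mu) <= f_mu(x, A, b, tau, mu) + 1e-4 * t * (grad @ dx):
                break
            t *= 0.5
        x = x + t * dx
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true
x = inexact_newton(A, b, tau=0.1, mu=1e-4)
print(np.round(x[:8], 2))                            # compare with x_true[:8]
```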


Inexact Primal-Dual IPM:
G., Convergence Analysis of an Inexact Feasible IPM for Convex QP, SIAM J. Opt. 23 (2013), No. 3, 1510–1527.
G., Matrix-Free Interior Point Method, Comput. Opt. and Appls. 51 (2012), 457–480.

Primal-Dual Newton Conjugate Gradient Method:
Fountoulakis and G., A Second-order Method for Strongly Convex ℓ1-regularization Problems, Mathematical Programming 156 (2016), 189–219.
Dassios, Fountoulakis and G., A Preconditioner for a Primal-Dual Newton Conjugate Gradient Method for Compressed Sensing Problems, SIAM J. on Sci. Comput. 37 (2015), A2783–A2812.


Compressed Sensing and Continuation

Replace

  min_x  f(x) = τ‖W*x‖_1 + ½‖Ax − b‖_2²,      −→  x_τ

with

  min_x  f_µ(x) = τψ_µ(W*x) + ½‖Ax − b‖_2²,   −→  x_{τ,µ}

Solve approximately a family of problems for a (short) decreasing sequence of µ's: µ_0 > µ_1 > µ_2 > ···

Theorem (Brief description)

There exists a μ̄ such that for all µ ≤ μ̄ the difference of the two solutions satisfies

‖x_{τ,µ} − x_τ‖_2 = O(µ^(1/2))   ∀ τ, µ
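A schematic of the continuation loop just described, in Python. The inner solver here is an off-the-shelf L-BFGS call standing in for pdNCG; the µ schedule, the data and all names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def continuation(solve_smoothed, x0, mu0=1.0, mu_min=1e-4, factor=0.1):
    """Approximately solve a short sequence of smoothed problems,
    mu0 > mu1 > ... >= mu_min, warm-starting each from the previous solution."""
    x, mu = x0, mu0
    while mu >= mu_min:
        x = solve_smoothed(x, mu)   # approximate minimizer of f_mu, started at x
        mu *= factor                # shrink the smoothing parameter
    return x

# illustrative inner solver: L-BFGS on f_mu(x) = tau*psi_mu(x) + 0.5*||Ax - b||^2
rng = np.random.default_rng(2)
A, tau = rng.standard_normal((30, 60)), 0.1
x_true = np.zeros(60); x_true[:4] = 1.0
b = A @ x_true

def solve_smoothed(x, mu):
    f = lambda z: tau * np.sum(np.sqrt(mu**2 + z**2) - mu) + 0.5 * np.sum((A @ z - b) ** 2)
    g = lambda z: tau * z / np.sqrt(mu**2 + z**2) + A.T @ (A @ z - b)
    return minimize(f, x, jac=g, method="L-BFGS-B").x

x = continuation(solve_smoothed, np.zeros(60))
print(np.round(x[:6], 2))           # compare with x_true[:6]
```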


Convergence of the primal-dual Newton CG

Use inexact Newton directions:

‖∇²f_µ(x)∆x + ∇f_µ(x)‖_2 ≤ η‖∇f_µ(x)‖_2,   η ∈ (0,1)

computed by the Newton CG method.

Theorem (Primal convergence)

Let {x_k}_{k=0}^∞ be a sequence generated by pdNCG. Then the sequence {x_k}_{k=0}^∞ converges to the primal (perturbed) solution x_{τ,µ}.

Theorem (Rate of convergence)

If the forcing factor η_k satisfies lim_{k→∞} η_k = 0, then pdNCG converges superlinearly.


Continuation

(three examples of ℓ1-regularization)


Three examples of ℓ1-regularization

• Compressed Sensing (with K. Fountoulakis and P. Zhlobich)

  min_x  τ‖x‖_1 + ½‖Ax − b‖_2²,   A ∈ R^{m×n}

• Compressed Sensing, Coherent and Redundant Dictionaries (with I. Dassios and K. Fountoulakis)

  min_x  τ‖W*x‖_1 + ½‖Ax − b‖_2²,   W ∈ C^{n×l}, A ∈ R^{m×n}

  think of Total Variation

• Big Data optimization, Machine Learning (with K. Fountoulakis)


Example 1: Compressed Sensing

with K. Fountoulakis and P. Zhlobich

Large dense quadratic optimization problem:

min_x  τ‖x‖_1 + ½‖Ax − b‖_2²,

where A ∈ R^{m×n} is a very special matrix.

Fountoulakis, G., Zhlobich, Matrix-free IPM for Compressed Sensing Problems, Math. Prog. Computation 6 (2014), pp. 1–31.

Software available at http://www.maths.ed.ac.uk/ERGO/


Restricted Isometry Property (RIP)

• rows of A are orthogonal to each other (A is built of a subset of rows of an orthonormal matrix U ∈ R^{n×n}):

  AAᵀ = I_m.

• small subsets of columns of A are nearly orthogonal to each other: the Restricted Isometry Property (RIP)

  ‖ĀᵀĀ − (m/n) I_k‖ ≤ δ_k ∈ (0,1),

  where Ā is any submatrix of k columns of A.

Candes, Romberg & Tao, Comm. on Pure and Appl. Maths 59 (2005) 1207–1233.


Restricted Isometry Property

Matrix Ā ∈ R^{m×k} (k ≪ n) is built of a subset of columns of A ∈ R^{m×n}.

[Figure: picking k columns of A to form Ā]

ĀᵀĀ ≈ (m/n) I_k.

This yields a very well conditioned optimization problem.


Problem Reformulation

min_x  τ‖x‖_1 + ½‖Ax − b‖_2²

Replace x = x⁺ − x⁻ to be able to use |x| = x⁺ + x⁻.
Use |x_i| = z_i + z_{i+n} to replace ‖x‖_1 with ‖x‖_1 = 1ᵀ_{2n} z.

(This increases the problem dimension from n to 2n.)

min_{z≥0}  cᵀz + ½ zᵀQz,

where

Q = [  Aᵀ ] [ A  −A ] = [  AᵀA   −AᵀA ]
    [ −Aᵀ ]             [ −AᵀA    AᵀA ]   ∈ R^{2n×2n}
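The 2n×2n matrix Q never needs to be formed: a product Qz costs one multiplication with A and one with Aᵀ. A small sketch (names and data are illustrative):

```python
import numpy as np

def Q_matvec(A, z):
    """Compute Q z for Q = [[A^T A, -A^T A], [-A^T A, A^T A]] without forming Q.
    With z = [u; v] of length 2n:  Q z = [A^T A (u - v); -A^T A (u - v)]."""
    n = A.shape[1]
    u, v = z[:n], z[n:]
    w = A.T @ (A @ (u - v))        # one matvec with A, one with A^T
    return np.concatenate([w, -w])

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 8))
z = rng.standard_normal(16)
# check against the explicit Q on this tiny example
AtA = A.T @ A
Q = np.block([[AtA, -AtA], [-AtA, AtA]])
print(np.allclose(Q @ z, Q_matvec(A, z)))   # True
```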


Preconditioner

Approximate

M = [  AᵀA   −AᵀA ]   [ Θ_1⁻¹        ]
    [ −AᵀA    AᵀA ] + [        Θ_2⁻¹ ]

with

P = (m/n) [  I_n   −I_n ]   [ Θ_1⁻¹        ]
          [ −I_n    I_n ] + [        Θ_2⁻¹ ].

We expect (optimal partition):

• k entries of Θ⁻¹ → 0, k ≪ 2n,

• 2n − k entries of Θ⁻¹ → ∞.
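Because P couples only the pairs (z_i, z_{n+i}), applying P⁻¹ amounts to n independent 2×2 solves. A sketch, with ρ = m/n and theta1, theta2 denoting the diagonal entries of Θ_1, Θ_2 (names and test data are illustrative):

```python
import numpy as np

def apply_P_inv(r, rho, theta1, theta2):
    """Solve P z = r for P = rho*[[I,-I],[-I,I]] + diag(1/theta1, 1/theta2).
    P decouples into n independent 2x2 systems, one per index pair (i, n+i)."""
    n = theta1.size
    r1, r2 = r[:n], r[n:]
    a = rho + 1.0 / theta1              # diagonal of the (1,1) block
    d = rho + 1.0 / theta2              # diagonal of the (2,2) block
    det = a * d - rho**2                # determinant of each 2x2 block (positive)
    z1 = (  d * r1 + rho * r2) / det
    z2 = (rho * r1 +   a * r2) / det
    return np.concatenate([z1, z2])

rng = np.random.default_rng(4)
n, rho = 6, 0.5
theta1, theta2 = rng.uniform(0.1, 10.0, n), rng.uniform(0.1, 10.0, n)
P = rho * np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), np.eye(n)]]) \
    + np.diag(np.concatenate([1 / theta1, 1 / theta2]))
r = rng.standard_normal(2 * n)
print(np.allclose(np.linalg.solve(P, r), apply_P_inv(r, rho, theta1, theta2)))  # True
```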


Spectral Properties of P⁻¹M

Theorem

• Exactly n eigenvalues of P⁻¹M are 1.

• The remaining n eigenvalues satisfy

  |λ(P⁻¹M) − 1| ≤ δ_k + (n/m) δ_k L,

  where δ_k is the RIP constant, and L is a threshold of “large” (Θ_1 + Θ_2)⁻¹.

Fountoulakis, G., Zhlobich, Matrix-free IPM for Compressed Sensing Problems, Math. Prog. Computation 6 (2014), pp. 1–31.


Preconditioning

[Figure: matrix-vector products per CG/PCG call for unpreconditioned vs preconditioned CG, and the spread of λ(M) vs λ(P⁻¹M) per CG/PCG call]

−→ good clustering of eigenvalues

mf-IPM compares favourably with NestA on easy problems (NestA: Becker, Bobin and Candes).


SPARCO problems

Comparison on 18 out of 26 classes of problems (all but 6 complex and 2 installation-dependent ones).

Solvers compared:
PDCO, Saunders and Kim, Stanford,
ℓ1−ℓs, Kim, Koh, Lustig, Boyd, Gorinevsky, Stanford,
FPC-AS-CG, Wen, Yin, Goldfarb, Zhang, Rice,
SPGL1, Van Den Berg, Friedlander, Vancouver, and
mf-IPM, Fountoulakis, G., Zhlobich, Edinburgh.

On 36 runs (noisy and noiseless problems), mf-IPM:
• is the fastest on 11,
• is the second best on 14, and
• overall is very robust.


Linear Algebra Perspective

Convert a difficulty into an advantage

Interestingly the same trick works:in IPMs and in Big Data!


Linear Algebra of IPMs for LP/QP

Newton direction

[ A    0   0 ] [ ∆x ]   [ ξ_p ]
[ −Q   Aᵀ  I ] [ ∆y ] = [ ξ_d ]
[ S    0   X ] [ ∆s ]   [ ξ_µ ].

Eliminate ∆s from the second equation and get

[ −Q − Θ⁻¹   Aᵀ ] [ ∆x ]   [ ξ_d − X⁻¹ξ_µ ]
[  A          0 ] [ ∆y ] = [ ξ_p          ],

where Θ = XS⁻¹ is a diagonal scaling matrix.

Eliminate ∆x from the first equation and get

(A (Q + Θ⁻¹)⁻¹ Aᵀ) ∆y = g.
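On a tiny dense example the elimination steps above can be checked directly; a matrix-free IPM would instead apply A(Q + Θ⁻¹)⁻¹Aᵀ through matrix-vector products inside CG. A minimal sketch with made-up data (the right-hand side g below is the one produced by the elimination):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 9
A = rng.standard_normal((m, n))
Q = 0.5 * np.eye(n)                               # any symmetric positive semidefinite Q
x, s = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
Theta = x / s                                     # Theta = X S^{-1}
xi_p, xi_d, xi_mu = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal(n)

# augmented system: [-Q - Theta^{-1}, A^T; A, 0] [dx; dy] = [xi_d - X^{-1} xi_mu; xi_p]
K = np.block([[-Q - np.diag(1 / Theta), A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([xi_d - xi_mu / x, xi_p])
dx, dy = np.split(np.linalg.solve(K, rhs), [n])

# normal equations: A (Q + Theta^{-1})^{-1} A^T dy = g,  g = xi_p + A (Q+Theta^{-1})^{-1}(xi_d - X^{-1} xi_mu)
M = Q + np.diag(1 / Theta)
g = xi_p + A @ np.linalg.solve(M, xi_d - xi_mu / x)
dy2 = np.linalg.solve(A @ np.linalg.solve(M, A.T), g)
print(np.allclose(dy, dy2))                       # True: both routes give the same dy
```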


IPM Linear Algebra: Splitting Preconditioner

For “basic” variables, x_B → x*_B > 0 and s_B → s*_B = 0, hence

Θ_j = x_j/s_j ≈ x_j²/(x_j s_j) = O(µ⁻¹)   ∀ j ∈ B.

For “non-basic” variables, x_N → x*_N = 0 and s_N → s*_N > 0, hence

Θ_j = x_j/s_j ≈ (x_j s_j)/s_j² = O(µ)   ∀ j ∈ N.

Convert a difficulty into an advantage → exploit the property:

As µ → 0:   Θ_B → ∞,  Θ_B⁻¹ → 0;   Θ_N → 0,  Θ_N⁻¹ → ∞.


Oliveira & Sorensen, LAA 394 (2005) 1-24.

Guess a basic/nonbasic partition A = [B | N], with invertible B.

E⁻¹ H E⁻ᵀ =

[ Θ_B^{1/2}   0           Θ_B^{−1/2} B⁻¹ ] [ −Θ_B⁻¹   0        Bᵀ ] [ Θ_B^{1/2}        0           Θ_B^{1/2} ]
[ 0           Θ_N^{1/2}   0              ] [  0      −Θ_N⁻¹    Nᵀ ] [ 0                Θ_N^{1/2}   0         ]
[ Θ_B^{1/2}   0           0              ] [  B       N        0  ] [ B⁻ᵀ Θ_B^{−1/2}   0           0         ]

  [ I_m   Wᵀ          0    ]
= [ W     −I_{n−m}    0    ] ,   where W = Θ_N^{1/2} Nᵀ B⁻ᵀ Θ_B^{−1/2}.
  [ 0      0         −I_m  ]

With µ → 0 we have Θ_B⁻¹ → 0, Θ_N → 0, hence

W = Θ_N^{1/2} Nᵀ B⁻ᵀ Θ_B^{−1/2} ≈ 0.


Property of Sparse Approximations

min_x  f(x) = τ‖x‖_1 + ‖Ax − b‖_2²

Assume a sparse solution exists, x = [x_B | x_Z] with x_Z = 0.
Partition matrix A = [A_B | A_Z] accordingly.

Then only a small subset of the Hessian AᵀA is “relevant”:

AᵀA = [ A_BᵀA_B   A_BᵀA_Z ]
      [ A_ZᵀA_B   A_ZᵀA_Z ]


Splitting x = u − v and IPMs

There is a need to solve equations with AᵀA + Θ⁻¹, and

• Θ_j⁻¹ → 0, for j in the sparse part B (x_j > 0),

• Θ_j⁻¹ → ∞, for j in the zero part Z (x_j ≈ 0).

Then

AᵀA + Θ⁻¹ = [ A_BᵀA_B + Θ_B⁻¹   A_BᵀA_Z          ]  ≈  [ A_BᵀA_B         ]
            [ A_ZᵀA_B            A_ZᵀA_Z + Θ_Z⁻¹ ]     [           Θ_Z⁻¹ ]

From RIP: A_BᵀA_B ≈ (m/n) I.


Smoothing with pseudo-Huber approximation

There is a need to solve equations with AᵀA + ∇²ψ_µ(x), and

• ψ_µ(x_j) ≫ 0 and ∇²ψ_µ(x_j) → 0, for j in part B,

• ψ_µ(x_j) ≈ 0 and ∇²ψ_µ(x_j) → 1/µ, for j in part Z.

Then

AᵀA + ∇²ψ_µ(x) = [ A_BᵀA_B + ∇²ψ_µ(x_B)   A_BᵀA_Z                ]  ≈  [ A_BᵀA_B          ]
                 [ A_ZᵀA_B                 A_ZᵀA_Z + ∇²ψ_µ(x_Z)  ]     [           (1/µ)I ]


Example 2: CS, Coherent & Redundant Dict.

with I. Dassios and K. Fountoulakis.

Large dense quadratic optimization problem:

min_x  τ‖W*x‖_1 + ½‖Ax − b‖_2²,

where A ∈ R^{m×n} and W ∈ C^{n×l} is a dictionary.

Dassios, Fountoulakis and G., A Preconditioner for a Primal-Dual Newton Conjugate Gradient Method for Compressed Sensing Problems, SIAM J. on Sci. Comput. 37 (2015) A2783–A2812.

Software available at http://www.maths.ed.ac.uk/ERGO/


W-Restricted Isometry Property (W-RIP)

• rows of A are nearly orthogonal to each other, i.e., there exists a small constant δ such that

  ‖AAᵀ − I_m‖ ≤ δ.

• W-Restricted Isometry Property (W-RIP): there exists a constant δ_q such that

  (1 − δ_q)‖Wz‖_2² ≤ ‖AWz‖_2² ≤ (1 + δ_q)‖Wz‖_2²

  for all at most q-sparse z ∈ Cⁿ.

Candes, Eldar & Needell, Appl. and Comp. Harmonic Anal. 31 (2011) 59–73.


Preconditioner

Approximate

H = τ∇²ψ_µ(W*x) + AᵀA

with

P = τ∇²ψ_µ(W*x) + ρ I_n.

We expect (optimal partition):

• k entries of W*x ≫ 0, k ≪ l, with ∇²ψ_µ(W*x)_j → 0,

• l − k entries of W*x ≈ 0, with ∇²ψ_µ(W*x)_j → 1/µ.

The preconditioner approximates well the 2nd derivative of the pseudo-Huber regularization.
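In the special case W = I the preconditioner P = τ∇²ψ_µ(x) + ρI is diagonal and trivially invertible. The sketch below runs preconditioned CG with it on a synthetic system; the choice ρ = m/n, the data and all names are illustrative assumptions, not the setting of the paper.

```python
import numpy as np

def pcg(hess_vec, rhs, precond_diag, tol=1e-8, max_iter=500):
    """Preconditioned CG for H d = rhs with a diagonal preconditioner."""
    d = np.zeros_like(rhs)
    r = rhs.copy()
    z = r / precond_diag
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(rhs):
            break
        Hp = hess_vec(p)
        alpha = rz / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        z = r / precond_diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return d

rng = np.random.default_rng(6)
m, n, tau, mu = 60, 200, 0.1, 1e-3
A = np.linalg.qr(rng.standard_normal((n, m)))[0].T       # m x n with orthonormal rows
x = np.zeros(n); x[:5] = 1.0                             # current iterate: 5 "large" entries
hpsi = tau * mu**2 / (mu**2 + x**2) ** 1.5               # diagonal of tau * Hessian of psi_mu
hess_vec = lambda v: hpsi * v + A.T @ (A @ v)            # H v, applied matrix-free
precond = hpsi + m / n                                   # P = tau*Hess psi_mu + rho*I, rho = m/n
rhs = rng.standard_normal(n)
d = pcg(hess_vec, rhs, precond)
print(np.linalg.norm(hess_vec(d) - rhs) / np.linalg.norm(rhs))   # small relative residual
```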


Spectral Properties of P⁻¹H

Theorem

• The eigenvalues of P⁻¹H satisfy

  |λ(P⁻¹H) − 1| ≤ η(δ, δ_q, ρ) / ρ,

  where δ_q is the W-RIP constant, δ is another small constant, and η(δ, δ_q, ρ) is some simple function.

Dassios, Fountoulakis and G., A Preconditioner for a Primal-Dual Newton Conjugate Gradient Method for Compressed Sensing Problems, SIAM J. on Sci. Comput. 37 (2015) A2783–A2812.


CS: Coherent and Redundant Dictionaries

−→ good clustering of eigenvalues

pdNCG outperforms TFOCS on several examples (TFOCS: Becker, Candes and Grant).


Example 3: Big Data and Optimization

with K. Fountoulakis.

Large dense quadratic optimization problem:

min_x  τ‖x‖_1 + ½‖Ax − b‖_2²,

where A ∈ R^{m×n}.

Fountoulakis and G., Performance of First- and Second-Order Methods for ℓ1-regularized Least Squares Problems, COAP 65 (2016) 605–635.

Software available at http://www.maths.ed.ac.uk/ERGO/trillion/


Simple example for ℓ1-regularization

min_x  τ‖x‖_1 + ‖Ax − b‖_2²

Special matrix given in SVD form A = QΣGᵀ.

Matlab generator: http://www.maths.ed.ac.uk/ERGO/trillion/

The user controls:

• the condition number κ(A),

• the sparsity of matrix A.


Simple example for ℓ1-regularization

Suppose m ≥ n. Write A in the SVD form:

A = Q [ diag{σ_1, σ_2, …, σ_n} ] Gᵀ,
      [           0            ]

where Q is an m×m orthogonal matrix, σ_1, σ_2, …, σ_n are the singular values of A, and G is a product of Givens rotations

G = G(i_1, j_1, θ_1) G(i_2, j_2, θ_2) … G(i_K, j_K, θ_K),

hence it is an orthonormal matrix. Observe that

AᵀA = G diag{σ_1², σ_2², …, σ_n²} Gᵀ.


Excessive Computational Tests (4 months of CPU)

• FISTA (Fast Iterative Shrinkage-Thresholding Alg)

• PCDM (Parallel Coordinate Descent Method)

• PSSgb (Projected Scaled Subgrad, Gafni-Bertsekas)

• pdNCG (primal-dual Newton Conjugate Gradients)

The 1st-order methods:

• work well if the condition number κ(A) ≤ 10²,

• struggle when κ(A) ≥ 10³,

• stall when κ(A) ≥ 10⁴.

The 2nd-order method (pdNCG, diagonal preconditioner):

• works well if the condition number κ(A) ≤ 10⁶.


Let us go big: a trillion (2^40) variables

n (billions)   Processors   Memory (TB)   time (s)
         1            64         0.192       1923
         4           256         0.768       1968
        16          1024         3.072       1986
        64          4096        12.288       1970
       256         16384        49.152       1990
     1,024         65536       196.608       2006

ARCHER (ranked 25 on top500.com, 11 March 2015)
Linpack Performance (Rmax): 1,642.54 TFlop/s
Theoretical Peak (Rpeak): 2,550.53 TFlop/s

Fountoulakis and G., Performance of First- and Second-Order Methods for ℓ1-regularized Least Squares Problems, COAP 65 (2016) 605–635.


Conclusions

2nd-order methods for optimization:

• employ inexact Newton method
• rely on preconditioners
• enjoy matrix-free implementation

Trick:

• find the “essential subspace” and
• exploit it to simplify the linear algebra

– works in IPMs for LP
– works in Newton CG for ℓ1-regularization

Simple, reliable test example for ℓ1-regularization:
http://www.maths.ed.ac.uk/ERGO/trillion/


Baby test example: RCDC vs pdNCG

[Figure: objective function value vs CPU time for pdNCG and RCDC on two test instances]

Dimensions: m = 4×10³, n = 2×10³. x* has 50 non-zero elements, randomly positioned.

RCDC interrupted after 10⁹ iterations, 31 hours.


 ID   rhs   Accuracy    mfIPM   ℓ1−ℓs    pdco    fpc_as_cg   spgl1
  2    b    3.0e-04        61      48     687            9   40000
       b    1.0e-11        65      98   40007        40002      22
  3    b    7.0e-04       241     462    4941          106   40000
       b    1.0e-08       415    1612   40157          212     148
  5    b    2.0e-03      5991    9842   28203          521   40000
       b    2.0e-05      7953   19684   41283          874    2567
 10    b    1.0e-03      4775    8529    6203        40002   40000
       b    9.0e-10      4567    8192   41227        40161   40000
701    b    2.0e-02       947    1794    5967         1049   40000
       b    7.0e-09      1341    2656   42041        40017   15239
702    b    4.0e-03       809    1574    3341        40001   40000
       b    1.0e-07      1123    3030   49563        40157   11089


A better linearization

τ Dx + Aᵀ(Ax − b) = 0,       (the underbraced term Dx is ∇ψ_µ(x))

where D := diag(D_1, …, D_n) with D_i := (µ² + x_i²)^(−1/2), ∀ i = 1, …, n.

Set g = Dx. Use the easier form of the equations.

Difficult:
τg + Aᵀ(Ax − b) = 0,
g = Dx.

Easy:
τg + Aᵀ(Ax − b) = 0,
D⁻¹g = x.

Chan, Golub, Mulet, SIAM J. on Sci. Comput. 20 (1999) 1964–1977.


A better linearization. Example: g_i = 0.99

bad: g_i = D_i x_i        good: D_i⁻¹ g_i = x_i
