fast first-order methods for convex optimization with line search · 2016-04-11 · at the core of...

46
Fast first-order methods for convex optimization with line search Katya Scheinberg Lehigh University (joint work with X. Bai, D. Goldfarb and S. Ma) 6/26/12 ERGO seminar, The University of Ediburgh

Upload: others

Post on 07-Apr-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Fast first-order methods for convex optimization

with line search

Katya Scheinberg Lehigh University

(joint work with X. Bai, D. Goldfarb and S. Ma)

TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAAAAAAAAA

6/26/12 ERGO seminar, The University of Ediburgh

Page 2: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  The field of convex optimization has been extensively developed since Khachian showed in 1979 that ellipsoid method has polynomial complexity when applied to LP.

l  General theory of interior point algorithms for convex optimization was developed by Nesterov and Nemirovskii.

l  Any convex optimization problem can be solved in polynomial time by an IPM. For some known classes (LP, QP, SDP) the IPMs are readily available.

l  For decades optimization methods relied of the fact that the problem data, when large, is typically sparse.

l  Second-order methods (IPM) have good convergence rate, but high per iteration complexity. They exploit sparsity structure to facilitate linear algebra.

l  First-order methods (gradient based) have slow convergence and were considered inefficient.

Introduction

6/26/12 ERGO seminar, The University of Ediburgh

Page 3: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  At the core of many statistical machine learning problems lies an optimization problem, often convex, from a well-studied class (LP, QP, SDP).

l  These problems are very large and dense in terms of data. l  IPMs are often too expensive to use. ML community initially

assumed that traditional optimization methods have to be abandoned.

l  However, often structure (sparsity) is present in the solution. l  This structure can be well exploited by first-order

approaches to convex optimization. l  Recent advances in complexity results give rise to very

significant interest in first-order methods.

Introduction

6/26/12 ERGO seminar, The University of Ediburgh

Page 4: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Problem:

l  Assumptions:

Problem under consideration

Many applications involve optimization of this form

6/26/12 ERGO seminar, The University of Ediburgh

Page 5: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Breast cancer diagnostics l  Test results of a group of patients, some have been

diagnosed with cancer, other do not have it. Find how the test can predict high risk patients.

l  Spam filter l  From a list of spam and nonspam labeled emails learn to

detect spam automatically.

l  Genetic disease l  find away to identify high risk individuals based on gene

expression data. l  Target customer groups

l  By demographic data and past purchases find customers most likely to buy certain products.

Support vector machine classification

6/26/12 ERGO seminar, The University of Ediburgh

Page 6: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Problem:

Support Vector Machines

+ -

6/26/12 ERGO seminar, The University of Ediburgh

Page 7: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Problem:

Linear Support Vector Machines

+ -

6/26/12 ERGO seminar, The University of Ediburgh

Page 8: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Compresses sensing, MRI l  Recover sparse signal x, which satisfies Ax=b.

l  Sparse least square regression (Lasso) l  Find linear regression models while selecting important

features.

l  Regression models using polynomials with variable selection l  birthweight dataset from Hosmer and

Lemeshow (1989), weight of 189 babies and 8 variables per mother. Predictive models for birthweight.

Sparse models

6/26/12 ERGO seminar, The University of Ediburgh

Page 9: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Problem:

Lasso regression

6/26/12 ERGO seminar, The University of Ediburgh

Page 10: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Problem:

Group Lasso regression

•  Assume that columns of A form groups of correlated features. •  Find sparse vector x where nonzeros are selected according to groups •  xi is a subvector of x corresponding to the i-th group of features.

6/26/12 ERGO seminar, The University of Ediburgh

Page 11: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

FMRI Analysis and schizophrenia prediction

Network Extracted

Correlation Matrix

(N2=2x1010)Thresholded Matrix

MR

Signal

M1

V1

PP

1 N1

N

-0.5

0

0.5

1

1 N1

N

Measuring blood oxigination in voxels of the brain.

Construct predictive models based on FMRI data, use to predict/diagnose schizophrenia or classify “states of mind”.

6/26/12 ERGO seminar, The University of Ediburgh

Page 12: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Problem:

l  Formulation:

Sparse Inverse Covariance Selection

6/26/12 ERGO seminar, The University of Ediburgh

Page 13: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Lasso or CS:

l  Group Lasso or MMV

l  Matrix Completion

l  Robust PCA

l  SICS

Summary and add’l examples

6/26/12 ERGO seminar, The University of Ediburgh

Page 14: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

First-order methods applied to problems of the form f(x)+g(x)

6/26/12 ERGO seminar, The University of Ediburgh

Page 15: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Consider:

l  Quadratic upper approximation

Prox method with nonsmooth term

Assume that g(y) is such that the above function is easy to optimize over y

6/26/12 ERGO seminar, The University of Ediburgh

Page 16: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Minimize quadratic upper approximation on each iteration

l  O(L/²) complexity: If ¯/L· µ· 1/L then in k iterations finds solution

ISTA/prox gradient projection

6/26/12 ERGO seminar, The University of Ediburgh

Page 17: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Minimize upper approximation at a “shifted” point.

l  complexity: If ¯/L·µ· 1/L then in k iterations finds solution

Fast first-order method Nesterov, Beck & Teboulle

6/26/12 ERGO seminar, The University of Ediburgh

Page 18: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Specifically for CS setting and Lasso

f(x) g(x)

2 matrix/vector multiplications + shrinkage operator per iteration

iteration bound

6/26/12 ERGO seminar, The University of Ediburgh

Page 19: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Choosing prox parameter via backtracking

6/26/12 ERGO seminar, The University of Ediburgh

Page 20: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

•  Minimize quadratic upper relaxation on each iteration

•  Using backtracking find µk such that

•  In k iterations finds solution

Beck&Teboulle, Tseng, Auslender&Teboulle, 2008

Iterative Shrinkage Threshholding Algorithm (ISTA)

6/26/12 ERGO seminar, The University of Ediburgh

Page 21: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Minimize quadratic upper relaxation on each iteration

Using backtracking find µk · µk-1 such that

In k iterations finds solution

Fast Iterative Shrinkage Threshholding Algorithm (FISTA)

Beck&Teboulle, Tseng, 2008 6/26/12 ERGO seminar, The University of Ediburgh

Page 22: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Minimize quadratic upper relaxation on each iteration

Using backtracking find µk · µk-1 such that

In k iterations finds solution

Fast Iterative Shrinkage Threshholding Algorithm (FISTA)

6/26/12 ERGO seminar, The University of Ediburgh

Page 23: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Find µk · µk-1 such that

Cycle to find µk

Convergence rate:

Need to compute Ax-b

6/26/12 ERGO seminar, The University of Ediburgh

Page 24: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Find µk such that

Goldfarb & S. 2010

To allow for larger µk we need to

reduce and vice versa

6/26/12 ERGO seminar, The University of Ediburgh

Page 25: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Find µk such that

Goldfarb & S. 2010

FISTA with full backtracking

Cycle to find µ and t

6/26/12 ERGO seminar, The University of Ediburgh

Page 26: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Find µk such that

Goldfarb & S. 2010

FISTA with full backtracking

Cycle to find µ and t

6/26/12 ERGO seminar, The University of Ediburgh

Bla-bla-bla….

Page 27: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Computational Results

6/26/12 ERGO seminar, The University of Ediburgh

Page 28: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Computational results

6/26/12 ERGO seminar, The University of Ediburgh

Page 29: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Computational results

6/26/12 ERGO seminar, The University of Ediburgh

Page 30: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Computational results

6/26/12 ERGO seminar, The University of Ediburgh

Page 31: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Computational results

6/26/12 ERGO seminar, The University of Ediburgh

Page 32: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Computational results

6/26/12 ERGO seminar, The University of Ediburgh

Page 33: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Complexity bounds on alternating linearization methods

6/26/12 ERGO seminar, The University of Ediburgh

Page 34: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

l  Consider:

l  Relax constraints via Augmented Lagrangian technique

Alternating directions method

Assume that f(x) and g(y) are both such that the above functions are easy to optimize in x or y

6/26/12 ERGO seminar, The University of Ediburgh

Page 35: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Sparse Inverse Covariance Selection

Shrinkage O(n2) ops

Eigenvalue decomposition O(n3) ops. Same as one gradient of f(X)

f(x) g(x)

6/26/12 ERGO seminar, The University of Ediburgh

Page 36: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Lasso or group Lasso

Shrinkage O(n2) ops

Matrix inverse, can take O(n3) ops. But can also be O(n ln n) for special A.

f(x) g(x)

6/26/12 ERGO seminar, The University of Ediburgh

Page 37: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Alternating direction method (ADM)

Widely used method without complexity bounds

6/26/12 ERGO seminar, The University of Ediburgh

Page 38: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

A slight modification of ADM

This turns out to be equivalent to……

Goldfarb, Ma, S, ’09-’10

6/26/12 ERGO seminar, The University of Ediburgh

Page 39: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Alternating linearization method (ALM) Goldfarb, Ma, S, ’09-’10

6/26/12 ERGO seminar, The University of Ediburgh

Page 40: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Fast ALM (FALM) Goldfarb, Ma, S, ’09-’10

6/26/12 ERGO seminar, The University of Ediburgh

Page 41: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Complexity results

FISTA

FALM

6/26/12 ERGO seminar, The University of Ediburgh

Page 42: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Experiments on SICS

6/26/12 ERGO seminar, The University of Ediburgh

Gene expression networks using the five data sets from Li and Toh(2010) (1) Lymph node status (2) Estrogen receptor; (3) Arabidopsis thaliana; (4) Leukemia; (5) Hereditary breast cancer.

PSM by Duchi et al (2008) and VSM by Lu (2009)

Page 43: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Experiments in CS

6/26/12 ERGO seminar, The University of Ediburgh

Comparison of algorithms on image recovery problem. Here matrix inverse take O(n ln n) ops, as do mat-vec multiplications.

PSM by Duchi et al (2008) and VSM by Lu (2009)

Page 44: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

FALM with backtracking Goldfarb, S, ‘10

6/26/12 ERGO seminar, The University of Ediburgh

Page 45: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Conclusion and Future work

l  Performing backtracking carefully is possible and desirable in accelerated first order methods.

l  The trade-offs are different and need to be explored for particular applications beyond CS.

l  Accelerated alternating direction methods can utilize the same ideas.

l  Careful implementation is being considered. l  Combining backtracking with inexact evaluations

may be beneficial. l  Seeking problems where average behavior differs

greatly from the worst case. 6/26/12 ERGO seminar, The University of Ediburgh

Page 46: Fast first-order methods for convex optimization with line search · 2016-04-11 · At the core of many statistical machine learning problems lies an optimization problem, often convex,

Thank you!

6/26/12 ERGO seminar, The University of Ediburgh