dc programming: a brief tutorial. - nyu courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · dc...

46
DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical Sciences [email protected] Tuesday, November 12, 2013

Upload: others

Post on 01-Jul-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DC Programming:A brief tutorial.

Andres Munoz MedinaCourant Institute of Mathematical Sciences

[email protected]

Tuesday, November 12, 2013

Page 2: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Difference of Convex Functions

• Definition: A function is said to be DC if there exists convex functions such that

• Definition: A function is locally DC if for every there exists a neighborhood such that is DC.

fg, h f = g − h

xf |UU

Tuesday, November 12, 2013

Page 3: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Contents

• Motivation.

• DC functions and properties.

• DC programming and DCA.

• CCCP and convergence results.

• Global optimization algorithms.

Tuesday, November 12, 2013

Page 4: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Motivation

• Not all machine learning problems are convex anymore

• Transductive SVM’s [Wang et. al 2003]

• Kernel learning [Argyriou et. al 2004]

• Structure prediction [Narasinham 2012]

• Auction mechanism design [MM and Muñoz]

Tuesday, November 12, 2013

Page 5: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Notation

• Let be a convex function. The conjugate of is defined as

• For , denotes the subdifferential of at , i.e.

• will denote the exact subdifferential.

gg

g

g∗

∂�g(x0)� > 0

∂�g(x0) = {v ∈ Rn|g(x) ≥ g(x0) + �x− x0, v� − �

g∗(y) = supx�y, x� − g(x)

�x0

∂g(x0)

Tuesday, November 12, 2013

Page 6: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DC functions • A function is DC iff is the

integral of a function of bounded variation. [Hartman 59]

• A locally DC function is globally DC. [Hartman 59]

• All twice continuously differentiable functions are DC.

• Closed under sum, negation, supremum and products.

f : R → R

Tuesday, November 12, 2013

Page 7: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DC programming

• DC programming refers to optimization problems of the form.

• More generally, for DC functions

minx

g(x)− h(x)

minx

g(x)− h(x)

subject to fi(x) ≤ 0

fi(x)

Tuesday, November 12, 2013

Page 8: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Global optimality conditions.

• A point is a global solution if and only if

• Let , then a point is a global solution if and only if

x∗

x∗w∗ = infx

g(x)− h(x)

0 = infx{−h(x) + t|g(x)− t ≤ w∗}

∂�h(x∗) ⊂ ∂�g(x

∗)

Tuesday, November 12, 2013

Page 9: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Global optimality conditions.

• A point is a global solution if and only if

• Let , then a point is a global solution if and only if

x∗

x∗w∗ = infx

g(x)− h(x)

0 = infx{−h(x) + t|g(x)− t ≤ w∗}

∂�h(x∗) ⊂ ∂�g(x

∗)

Tuesday, November 12, 2013

Page 10: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Global optimality conditions.

• A point is a global solution if and only if

• Let , then a point is a global solution if and only if

x∗

x∗w∗ = infx

g(x)− h(x)

0 = infx{−h(x) + t|g(x)− t ≤ w∗}

∂�h(x∗) ⊂ ∂�g(x

∗)

Tuesday, November 12, 2013

Page 11: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Local optimality conditions

• If verifies , then is a strict local minimizer of x∗ ∂h(x∗) ⊂ int∂g(x∗)

g − h

Tuesday, November 12, 2013

Page 12: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DC duality

• By definition of conjugate function

• This is the dual of the original problem

infx

g(x)− h(x) = infx

g(x)− (supy�x, y� − h∗(y))

= infx

infyh∗(y) + g(x)− �x, y�

= infyh∗(y)− g∗(y)

Tuesday, November 12, 2013

Page 13: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DC algorithm.

• We want to find a sequence that decreases the function at every step.

• Use duality. If then

h∗(y)− g∗(y) = �x0, y� − h(x0)− supx(�x, y� − g(x))

= infx

g(x)− h(x) + �x0 − x, y�

≤ g(x0)− h(x0)

y ∈ ∂h(x0)

xk

Tuesday, November 12, 2013

Page 14: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DC algorithm

• Solve the partial problems

S(x∗) inf{h∗(y)− g∗(y) : y ∈ ∂h(x∗)}T (y∗) inf{g(x)− h(x) : x ∈ ∂g∗(y∗)}

• Choose and .• Solve concave minimization problems.• Simplified DCA

yk ∈ S(xk) xk+1 ∈ T (yk)

yk ∈ ∂h(xk) xk ∈ ∂g∗(yk)

Tuesday, November 12, 2013

Page 15: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DCA as CCCP

• If the function is differentiable, the simplified DCA becomes and

• Equivalent to

yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)

Tuesday, November 12, 2013

Page 16: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DCA as CCCP

• If the function is differentiable, the simplified DCA becomes and

• Equivalent to

yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)

Tuesday, November 12, 2013

Page 17: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DCA as CCCP

• If the function is differentiable, the simplified DCA becomes and

• Equivalent to

yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)

xk+1 ∈ argmin g(x)− �x,∇h(xk)�

Tuesday, November 12, 2013

Page 18: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

DCA as CCCP

• If the function is differentiable, the simplified DCA becomes and

• Equivalent to

yk = ∇h(xk)xk+1 ∈ ∂g∗(yk)

xk+1 ∈ argmin g(x)− �x,∇h(xk)�

xk+1 ∈ argmin g(x)− h(xk)− �x− xk,∇h(xk)�

Tuesday, November 12, 2013

Page 19: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

CCCP as a majorization minimization algorithm

• To minimize , MM algorithms build a majorization function such that

• Do coordinate descent on

• In our scenario

f(x) ≤ F (x, y) ∀x, yf(x) = F (x, x) ∀x

F

Ff

F (x, y) = g(x)− h(y)− �x− y,∇h(y)�

Tuesday, November 12, 2013

Page 20: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Convergence results

• Unconstrained DC functions: Convergence to a local minimum (no rate of convergence). Bound depends on moduli of convexity. [PD Tao, LT Hoai An 97]

• Unconstrained smooth optimization: Linear or almost quadratic convergence depending on curvature [Roweis et. al 03]

• Constrained smooth optimization: Convergence without rate using Zangwill’s theory. [Lanckriet, Sriperumbudur 09]

Tuesday, November 12, 2013

Page 21: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Global convergence

• NP-hard in general: Minimizing a quadratic function with one negative eigenvalue with linear constraints. [Pardalos 91]

• Mostly branch and bound methods and cutting plane methods [H. Tuy 03, Horst and Thoai 99]

• Some successful results with low rank functions.

Tuesday, November 12, 2013

Page 22: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 23: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 24: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 25: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x0, t0)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 26: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x0, t0)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 27: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x0, t0)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 28: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 29: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x1, t1)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 30: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x1, t1)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 31: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x1, t1)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 32: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 33: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x2, t2)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 34: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x2, t2)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 35: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x2, t2)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 36: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 37: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x3, t3)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 38: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x3, t3)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 39: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Cutting plane algorithm[H. Tuy 03]

(x3, t3)

•All limit points of this algorithm are global minimizers• Finite convergence for

piecewise linear functions (Conjecture).•Keeps track of exponentially

many vertices.

Tuesday, November 12, 2013

Page 40: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Branch and bound.[Hoarst and Thoai 99]

• Main idea is to keep upper and lower bounds of on simplices .

• Upper bound: Evaluate for . Use DCA as subroutine for better bounds.

• Lower bound: If are the vertices of and . Solve

Sk

g(xk)− h(xk)xk ∈ Sk

g − h

(vi)n+1i=1 Sk

x =n+1�

i=1

αivi

minα

g��

i

αivi�−

�αih(vi)

Tuesday, November 12, 2013

Page 41: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Low rank optimization and polynomial guarantees[Goyal and Ravi 08]

• A function has rank if there exists and such that

• Most examples in Economy literature.

• For a quasi-concave function we want to solve .

• Can always transform DC programs to this type of problem.

f : Rn → R k � ng : Rk → R α1, . . . ,αk ∈ Rn

f(x) = g(α1 · x, . . . ,αk · x)

minx∈C

f(x)f

Tuesday, November 12, 2013

Page 42: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Algorithm

• Let satisfy the following conditions.

‣ The gradient

‣ for all and some

‣ for all

• There is an algorithm that finds with . in

∇g(y) ≥ 0

g(λy) ≤ λcg(y) λ > 1 c

αi · x > 0 x ∈ P

�xf(�x) ≤ (1 + �)f(x∗) O

�ck

�k

g

Tuesday, November 12, 2013

Page 43: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Further reading

• Farkas type results and duality for DC programs with convex constraints. [Dinh et. al 13]

• On DC functions and mappings [Duda et. al 01]

Tuesday, November 12, 2013

Page 44: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

Open problems

• Local rate of convergence for constrained DC programs.

• Is there a condition under which DCA finds global optima. For instance might not be convex but might.

• Finite convergence of cutting plane methods.

g − hh∗ − g∗

Tuesday, November 12, 2013

Page 45: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

References‣ Dinh, Nguyen, Guy Vallet, and T. T. A. Nghia. "Farkas-type results and

duality for DC programs with convex constraints." (2006).

‣ Goyal, Vineet, and R. Ravi. "An FPTAS for minimizing a class of low-rank quasi-concave functions over a convex set." Operations Research Letters 41.2 (2013): 191-196.

‣ Tao, Pham Dinh, and Le Thi Hoai An. "Convex analysis approach to DC programming: theory, algorithms and applications." Acta Mathematica Vietnamica 22.1 (1997): 289-355.

‣ Horst, R., and Ng V. Thoai. "DC programming: overview." Journal of Optimization Theory and Applications 103.1 (1999): 1-43.

‣ Tuy, H. "On global optimality conditions and cutting plane algorithms." Journal of optimization theory and applications 118.1 (2003): 201-216.

‣ Yuille, Alan L., and Anand Rangarajan. "The concave-convex procedure."Neural Computation 15.4 (2003): 915-936.

Tuesday, November 12, 2013

Page 46: DC Programming: A brief tutorial. - NYU Courantmunoz/slides/dc_slides.pdf · 2014. 12. 2. · DC Programming: A brief tutorial. Andres Munoz Medina Courant Institute of Mathematical

References

‣ Salakhutdinov, Ruslan, Sam Roweis, and Zoubin Ghahramani. "On the convergence of bound optimization algorithms." Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002.

‣ Hartman, Philip. "On functions representable as a difference of convex functions." Pacific J. Math 9.3 (1959): 707-713.

‣ Hiriart-Urruty, J-B. "Generalized Differentiability/Duality and Optimization for Problems Dealing with Differences of Convex Functions." Convexity and duality in optimization. Springer Berlin Heidelberg, 1985. 37-70.

Tuesday, November 12, 2013