agenda - statweb.stanford.edustatweb.stanford.edu/.../lectures/cone_programming.pdf · agenda 1...

Agenda

1 Cone programming

2 Convex cones

3 Generalized inequalities

4 Linear programming (LP)

5 Second-order cone programming (SOCP)

6 Semidefinite programming (SDP)

7 Examples

Optimization problem in standard form

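$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p
\end{array}
$$

$x \in \mathbb{R}^n$

$f_0 : \mathbb{R}^n \to \mathbb{R}$ (objective or cost function)

$f_i : \mathbb{R}^n \to \mathbb{R}$ (inequality constraint functionals)

$h_i : \mathbb{R}^n \to \mathbb{R}$ (equality constraint functionals)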

Terminology

x is feasible if x obeys the constraints

feasible set C: set of all feasible points

optimal value: $p^\star = \inf\{f_0(x) : x \in C\}$

can be $-\infty$; e.g. $\min \log(x)$, $x > 0$

by convention, $p^\star = \infty$ if $C = \emptyset$ (problem infeasible)

optimal solution: $x^\star \in C$ s.t. $f_0(x^\star) = p^\star$

there may be no optimal solution: e.g. $\min \log(x)$, $x > 0$

optimal set: $\{x \in C : f_0(x) = p^\star\}$

Convex optimization problem

Convex optimization problem in standard form

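$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& a_i^T x = b_i, \quad i = 1, \ldots, p
\end{array}
$$

$f_0, f_1, \ldots, f_m$ convex

affine equality constraints $Ax = b$, $A \in \mathbb{R}^{p \times n}$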

feasible set is convex

Abstract convex optimization problem

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & x \in C
\end{array}
$$

f0 convex

C convex

Why convexity?

A convex function has no local minimum that is not global

[Figure: a convex function next to a non-convex function]

A convex set is connected and has feasible directions at any point

[Figure: a convex set with its feasible directions next to a non-convex set]

A convex function is continuous on the interior of its domain and has some differentiability properties

Convex functions arise prominently in duality

Cone programming I

LP

$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Fx + g \ge 0 \\
& Ax = b
\end{array}
$$

Nonlinear programming → nonlinear constraints

Express nonlinearity via generalized inequalities

Orderings of Rn and convex cones

K is a convex cone if

(i) K is convex

(ii) K is a cone (i.e. $x \in K \implies \lambda x \in K$ for all $\lambda \ge 0$)

K is pointed if

(iii) $x \in K$ and $-x \in K \implies x = 0$ (K does not contain a straight line through the origin)

Example: $K = \{x \in \mathbb{R}^n : x \ge 0\}$ is a pointed convex cone

Two additional properties of $\mathbb{R}^n_+$

(iv) $\mathbb{R}^n_+$ is closed

(v) $\mathbb{R}^n_+$ has a nonempty interior

Implication: ordering

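$$
a \succeq_K b \iff a - b \in K
$$

(i) - (iii) ensure that this is a good ordering

1 reflexivity: $a \succeq a$ follows from $0 \in K$

2 antisymmetry: $a \succeq b$, $b \succeq a \implies a = b$ (since K is pointed)

3 transitivity: $a \succeq b$, $b \succeq c \implies a \succeq c$ (since K is convex and a cone)

→ compatibility with linear operations

$a \succeq b$ & $\lambda \ge 0 \implies \lambda a \succeq \lambda b$

$a \succeq b$ & $c \succeq d \implies a + c \succeq b + d$

Good properties of LPs come from these properties

4 closedness: $a_i \succeq b_i$, $a_i \to a$, $b_i \to b \implies a \succeq b$

5 nonempty interior allows us to define strict inequalities: $a \succ b \iff a - b \in \operatorname{int}(K)$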
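The ordering properties above can be checked numerically. The sketch below assumes K is the positive semidefinite cone and uses a hypothetical helper `succeq` (not from the slides) to test a ⪰ b through the smallest eigenvalue of a − b. It also illustrates that the ordering is only partial: two matrices may be incomparable.

```python
import numpy as np

def succeq(a, b, tol=1e-9):
    """a >=_K b for K = PSD cone: a - b has no eigenvalue below -tol."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    return bool(np.linalg.eigvalsh(diff).min() >= -tol)

A = np.diag([2.0, 0.0])
B = np.diag([0.0, 1.0])
I = np.eye(2)
Z = np.zeros((2, 2))

print(succeq(A, A))                  # reflexivity: 0 is in K
print(succeq(A, B), succeq(B, A))    # neither holds: A and B are incomparable
print(succeq(3 * I, A), succeq(A, Z), succeq(3 * I, Z))  # a transitivity instance
```

Here A − B = diag(2, −1) and B − A = diag(−2, 1) both have a negative eigenvalue, so neither A ⪰ B nor B ⪰ A holds.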

Examples of cones

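Nonnegative orthant

$$
\mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}
$$

Second-order (or Lorentz or ice cream) cone

$$
\left\{x \in \mathbb{R}^{n+1} : \sqrt{x_1^2 + \ldots + x_n^2} \le x_{n+1}\right\}
$$

Positive semidefinite cone

$$
\{X \in S^n : X \succeq 0\}
$$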
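As a minimal sketch, membership in each of the three cones can be tested with a few lines of NumPy. The function names and the tolerance `tol` are illustrative choices, not part of the slides.

```python
import numpy as np

def in_nonnegative_orthant(x, tol=1e-9):
    """True if every entry of x is >= 0 (up to tol)."""
    return bool(np.all(np.asarray(x, dtype=float) >= -tol))

def in_second_order_cone(x, tol=1e-9):
    """True if ||x[:-1]||_2 <= x[-1] (up to tol)."""
    x = np.asarray(x, dtype=float)
    return bool(np.linalg.norm(x[:-1]) <= x[-1] + tol)

def in_psd_cone(X, tol=1e-9):
    """True if X is symmetric with no eigenvalue below -tol."""
    X = np.asarray(X, dtype=float)
    return bool(np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol)

print(in_nonnegative_orthant([1.0, 0.0, 2.0]))   # all entries nonnegative
print(in_second_order_cone([3.0, 4.0, 5.0]))     # sqrt(9 + 16) = 5 <= 5
print(in_psd_cone([[2.0, -1.0], [-1.0, 2.0]]))   # eigenvalues 1 and 3
print(in_psd_cone([[1.0, 2.0], [2.0, 1.0]]))     # eigenvalues -1 and 3
```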

Cone programming II

$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Fx + g \succeq_K 0 \\
& Ax = b
\end{array}
$$

$K = \mathbb{R}^n_+ \implies$ linear programming

Minimize linear functional over an affine slice of a cone

Very fruitful point of view

useful theory (duality)

useful algorithms (interior point methods)

Linear programming (LP)

$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Fx + g \ge 0 \\
& Ax = b
\end{array}
$$

Linear objective

Linear equality and inequality constraints

Feasible set is a polyhedron

[Figure: polyhedral feasible set with cost vector $c$, level sets $c^T x = \text{constant}$, and the optimal point $x^\star$]

Many problems can be formulated as LP’s

Example: Chebyshev approximation

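$A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$

$$
\text{minimize } \|Ax - b\|_\infty \iff \text{minimize } \max_{i = 1, \ldots, m} |a_i^T x - b_i|
$$

Different from LS problem: minimize $\|Ax - b\|_2$

LP formulation (epigraph trick)

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & |a_i^T x - b_i| \le t \quad \forall i
\end{array}
\iff
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & -t \le a_i^T x - b_i \le t \quad \forall i
\end{array}
$$

optimization variables $(x, t) \in \mathbb{R}^{n+1}$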
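The epigraph LP can be handed to a generic LP solver. The following is a sketch using `scipy.optimize.linprog`; the helper name `chebyshev_fit` and the toy data are assumptions made for illustration. Note that `linprog` bounds variables below by zero by default, so the bounds are set to free explicitly.

```python
import numpy as np
from scipy.optimize import linprog

def chebyshev_fit(A, b):
    """Minimize ||Ax - b||_inf via the epigraph LP in the variables (x, t)."""
    m, n = A.shape
    cost = np.r_[np.zeros(n), 1.0]            # objective: t
    ones = np.ones((m, 1))
    #   a_i^T x - b_i  <= t   ->    A x - t <=  b
    # -(a_i^T x - b_i) <= t   ->   -A x - t <= -b
    A_ub = np.block([[A, -ones], [-A, -ones]])
    b_ub = np.r_[b, -b]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1), method="highs")
    return res.x[:n], res.x[n]

# Illustrative data: fit a single constant to the values 0, 1, 4.
A = np.ones((3, 1))
b = np.array([0.0, 1.0, 4.0])
x, t = chebyshev_fit(A, b)
print(x, t)   # the midrange 2, with maximum deviation 2
```

For this toy problem the best constant in the ∞-norm is the midrange of the data, whereas least squares would return the mean 5/3.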

Example: basis pursuit

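$A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$

$$
\begin{array}{ll}
\text{minimize} & \|x\|_1 \\
\text{subject to} & Ax = b
\end{array}
$$

LP formulations:

(a)

$$
\begin{array}{ll}
\text{minimize} & \sum_i t_i \\
\text{subject to} & -t_i \le x_i \le t_i \\
& Ax = b
\end{array}
$$

optimization variables $(x, t) \in \mathbb{R}^{2n}$

(b)

$$
\begin{array}{ll}
\text{minimize} & \sum_i x_i^+ + \sum_i x_i^- \\
\text{subject to} & A(x^+ - x^-) = b \\
& x^+, x^- \ge 0
\end{array}
$$

optimization variables $(x^+, x^-) \in \mathbb{R}^{2n}$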
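Formulation (b) maps directly onto an equality-constrained LP with nonnegative variables. The sketch below uses `scipy.optimize.linprog`; the helper name `basis_pursuit`, the random data, and the 2-sparse vector are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Minimize ||x||_1 s.t. Ax = b using the split x = x_plus - x_minus."""
    m, n = A.shape
    cost = np.ones(2 * n)                  # sum(x_plus) + sum(x_minus)
    A_eq = np.hstack([A, -A])              # A (x_plus - x_minus) = b
    res = linprog(cost, A_eq=A_eq, b_eq=b,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 30))
x_sparse = np.zeros(30)
x_sparse[[3, 17]] = [1.5, -2.0]            # a 2-sparse vector consistent with b
b = A @ x_sparse
x_hat = basis_pursuit(A, b)
print(np.abs(A @ x_hat - b).max(), np.abs(x_hat).sum())
```

Since `x_sparse` is feasible, the returned solution has ℓ1 norm no larger than that of `x_sparse`; whether it coincides with `x_sparse` depends on A and is not claimed here.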

Second-order cone programming (SOCP)

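$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & \|F_i x + g_i\|_2 \le c_i^T x + d_i, \quad i = 1, \ldots, m \\
& Ax = b
\end{array}
$$

$$
\|F_i x + g_i\|_2 \le c_i^T x + d_i \iff \begin{bmatrix} F_i x + g_i \\ c_i^T x + d_i \end{bmatrix} \in L_i = \{(y_i, t) : \|y_i\|_2 \le t\}
$$

(hence the name)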

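$$
\text{SOCP} \iff
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & \begin{bmatrix} F_i \\ c_i^T \end{bmatrix} x + \begin{bmatrix} g_i \\ d_i \end{bmatrix} \in L_i, \quad i = 1, \ldots, m \\
& Ax = b
\end{array}
$$

affine mapping

$$
Fx + g = \left( \begin{bmatrix} F_i \\ c_i^T \end{bmatrix} x + \begin{bmatrix} g_i \\ d_i \end{bmatrix} \right)_{i = 1, \ldots, m}
$$

cone product

$$
K = L_1 \times L_2 \times \cdots \times L_m
$$

$$
\begin{bmatrix} F_i \\ c_i^T \end{bmatrix} x + \begin{bmatrix} g_i \\ d_i \end{bmatrix} \in L_i \quad \forall i \iff Fx + g \in K
$$

$$
\therefore \ \text{SOCP} \iff
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Fx + g \in K \\
& Ax = b
\end{array}
$$

this is a cone program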
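The stacking of the m second-order cone constraints into one affine map and one product cone can be sketched as follows. The helper names `stack_socp` and `in_product_cone` and the two example constraints are hypothetical, chosen only to illustrate the construction.

```python
import numpy as np

def stack_socp(blocks):
    """blocks: list of (F_i, g_i, c_i, d_i). Returns stacked F, g and block sizes."""
    F = np.vstack([np.vstack([Fi, ci[None, :]]) for Fi, gi, ci, di in blocks])
    g = np.concatenate([np.append(gi, di) for Fi, gi, ci, di in blocks])
    sizes = [Fi.shape[0] + 1 for Fi, gi, ci, di in blocks]
    return F, g, sizes

def in_product_cone(y, sizes, tol=1e-9):
    """True if each block (y_i, t_i) of y satisfies ||y_i||_2 <= t_i."""
    start = 0
    for k in sizes:
        blk = y[start:start + k]
        if np.linalg.norm(blk[:-1]) > blk[-1] + tol:
            return False
        start += k
    return True

# Two illustrative constraints on x in R^2: ||x|| <= 2 and |x_1| <= x_2 + 1.
blocks = [
    (np.eye(2), np.zeros(2), np.zeros(2), 2.0),
    (np.array([[1.0, 0.0]]), np.zeros(1), np.array([0.0, 1.0]), 1.0),
]
F, g, sizes = stack_socp(blocks)
print(in_product_cone(F @ np.array([1.0, 0.5]) + g, sizes))    # both constraints hold
print(in_product_cone(F @ np.array([1.0, -0.5]) + g, sizes))   # second constraint fails
```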

Example: support vector machines

n pairs $(x_i, y_i)$

$x_i \in \mathbb{R}^p$: feature/explanatory variables

$y_i \in \{-1, 1\}$: response/class label

Examples

$x_i$: infrared blood absorption spectrum

$y_i$: person is diabetic or not

SVM model: SVM as a penalized fitting procedure

$$
\min_\beta \ \sum_{i=1}^n [1 - y_i f(x_i)]_+ + \lambda \|\beta\|^2
$$

$f(x) = x^T \beta$

sometimes $f(x) = x^T \beta + \beta_0$ and same minimum

[Figure: hinge loss $[1 - y f(x)]_+$ plotted against $y f(x)$, with ticks at 0 and 1]

SVM: formulation as an SOCP

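Variables $(\beta, t) \in \mathbb{R}^p \times \mathbb{R}^n$

$$
\begin{array}{ll}
\text{minimize} & \sum_i t_i + \lambda \|\beta\|^2 \\
\text{subject to} & [1 - y_i f(x_i)]_+ \le t_i
\end{array}
\iff
\begin{array}{ll}
\text{minimize} & \sum_i t_i + \lambda \|\beta\|^2 \\
\text{subject to} & y_i f(x_i) \ge 1 - t_i \\
& t_i \ge 0
\end{array}
$$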

this is an SOCP, since SOCPs are more general than QPs and QCQPs

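Equivalence

$$
\begin{array}{ll}
\text{minimize} & \sum_i t_i + \lambda u \\
\text{subject to} & \|\beta\|^2 \le u \\
& y_i f(x_i) \ge 1 - t_i \\
& t_i \ge 0
\end{array}
$$

$$
\|\beta\|^2 \le u \iff \|\beta\|^2 \le \left(\frac{u + 1}{2}\right)^2 - \left(\frac{u - 1}{2}\right)^2 \iff \left\| \begin{bmatrix} \beta \\ \frac{u - 1}{2} \end{bmatrix} \right\| \le \frac{u + 1}{2}
$$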
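The equivalence between the quadratic constraint on β and the second-order cone constraint can be spot-checked numerically. This is only a sanity check on random samples (a sketch, with illustrative function names), not a proof.

```python
import numpy as np

def quad_constraint(beta, u):
    """Direct test of ||beta||^2 <= u."""
    return bool(float(beta @ beta) <= u)

def soc_constraint(beta, u):
    """Same constraint written as ||(beta, (u - 1)/2)||_2 <= (u + 1)/2."""
    lhs = np.linalg.norm(np.append(beta, (u - 1.0) / 2.0))
    return bool(lhs <= (u + 1.0) / 2.0)

rng = np.random.default_rng(1)
agree = True
for _ in range(1000):
    beta = rng.standard_normal(3)
    u = rng.uniform(-1.0, 6.0)         # include negative u, where both tests fail
    agree = agree and (quad_constraint(beta, u) == soc_constraint(beta, u))
print(agree)
```

The identity behind the check is that the square of (u + 1)/2 minus the square of (u − 1)/2 equals u, so squaring the cone constraint recovers the quadratic one whenever (u + 1)/2 is nonnegative.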

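QP ⊂ SOCP ($\implies$ LP ⊂ SOCP)

QCQP

$$
\begin{array}{ll}
\text{minimize} & \frac{1}{2} x^T P_0 x + q_0^T x + r_0 \\
\text{subject to} & \frac{1}{2} x^T P_i x + q_i^T x + r_i \le 0 \\
& P_0, P_i \succeq 0
\end{array}
$$

QCQP ⊂ SOCP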

quadratic convex inequalities are SOCP-representable
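One way to see why a convex quadratic inequality is SOCP-representable is sketched below. The factor $M_i$ with $P_i = M_i^T M_i$ (which exists because $P_i \succeq 0$) and the auxiliary quantity $u_i$ are notation introduced here for illustration; they do not appear in the slides.

```latex
\begin{align*}
\tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0
  &\iff \|M_i x\|^2 \le u_i, \qquad u_i := -2\,(q_i^T x + r_i) \ \text{(affine in } x\text{)} \\
  &\iff \|M_i x\|^2 \le \left(\frac{u_i + 1}{2}\right)^2 - \left(\frac{u_i - 1}{2}\right)^2 \\
  &\iff \left\| \begin{bmatrix} M_i x \\ \frac{u_i - 1}{2} \end{bmatrix} \right\|_2 \le \frac{u_i + 1}{2}.
\end{align*}
```

The quadratic objective can be treated the same way after an epigraph step: minimize $s$ subject to $\frac{1}{2} x^T P_0 x + q_0^T x + r_0 \le s$, which is again a convex quadratic inequality in $(x, s)$.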

Example: total-variation denoising

Observe $b_{ij} = f_{ij} + \sigma z_{ij}$, $0 < i, j < n$

f is original image

b is a noisy version

Problem: recover original image (de-noise)

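Min-TV solution

$$
\begin{array}{ll}
\text{minimize} & \|x\|_{TV} \\
\text{subject to} & \|x - b\|_2 \le \delta
\end{array}
$$

TV norm

$$
\|x\|_{TV} = \sum_{i,j} \|D_{ij} x\|_2, \qquad D_{ij} x = \begin{bmatrix} x_{i+1,j} - x_{i,j} \\ x_{i,j+1} - x_{i,j} \end{bmatrix}
$$

Formulation as an SOCP

$$
\begin{array}{ll}
\text{minimize} & \sum_{i,j} t_{ij} \\
\text{subject to} & \|D_{ij} x\|_2 \le t_{ij} \\
& \|x - b\|_2 \le \delta
\end{array}
$$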
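The TV norm itself is straightforward to evaluate. The sketch below computes it with forward differences in NumPy; the boundary rule (differences reaching past the last row or column are set to zero) is an assumption, since the slides do not specify one.

```python
import numpy as np

def tv_norm(x):
    """Isotropic TV: sum over pixels of ||(x[i+1,j]-x[i,j], x[i,j+1]-x[i,j])||_2.

    Assumption: differences that would reach past the last row or column
    are set to zero.
    """
    x = np.asarray(x, dtype=float)
    dv = np.zeros_like(x)
    dh = np.zeros_like(x)
    dv[:-1, :] = x[1:, :] - x[:-1, :]     # x[i+1, j] - x[i, j]
    dh[:, :-1] = x[:, 1:] - x[:, :-1]     # x[i, j+1] - x[i, j]
    return float(np.sqrt(dv**2 + dh**2).sum())

flat = np.ones((4, 4))
step = np.zeros((4, 4))
step[:, 2:] = 1.0                          # one vertical edge of height 1
print(tv_norm(flat), tv_norm(step))        # 0.0 and 4.0
```

A constant image has zero TV, and the step image contributes one unit jump per row, which is why piecewise-constant images are cheap under this norm.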

Semidefinite programming (SDP)

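$$
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & F(x) = x_1 F_1 + \ldots + x_n F_n - F_0 \succeq 0
\end{array}
$$

$F_i \in S^p$ ($p \times p$ symmetric matrices)

linear matrix inequality (LMI): $F(x) \succeq 0$

multiple LMIs can be combined into one:

$$
F_i(x) \succeq 0, \ i = 1, \ldots, m \iff \begin{bmatrix} F_1(x) & & \\ & \ddots & \\ & & F_m(x) \end{bmatrix} \succeq 0
$$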

SOCP ⊂ SDP (but the converse is not true!)

For $(x, t) \in \mathbb{R}^{n+1}$:

$$
\|x\| \le t \iff \begin{bmatrix} t I_n & x \\ x^T & t \end{bmatrix} \succeq 0
$$

SOCP constraints are LMI’s
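A quick numerical illustration of the second-order cone constraint as an LMI: the smallest eigenvalue of the arrow matrix with tI in the top-left block, x in the last column and row, and t in the corner equals t − ‖x‖. The helper `arrow_matrix` and the test vector are illustrative.

```python
import numpy as np

def arrow_matrix(x, t):
    """Build [[t*I, x], [x^T, t]] for x in R^n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    M = t * np.eye(n + 1)
    M[:n, n] = x
    M[n, :n] = x
    return M

x = np.array([3.0, 4.0])                   # ||x|| = 5
for t in (4.0, 5.0, 6.0):
    lam_min = np.linalg.eigvalsh(arrow_matrix(x, t)).min()
    print(t, lam_min)                      # smallest eigenvalue is t - ||x||
```

So the matrix is positive semidefinite exactly for t ≥ 5 in this example, matching ‖x‖ ≤ t.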

Hierarchy: LP ⊂ SOCP ⊂ SDP

Many nonlinear convex problems can be cast as SDP’s

Example: minimum-norm problem

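$$
\begin{array}{ll}
\text{minimize} & \|A(x)\| \\
\text{subject to} & A(x) = x_1 A_1 + \ldots + x_n A_n - B
\end{array}
$$

with $A_i \in \mathbb{R}^{p_1 \times p_2}$, is equivalent to

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \|A(x)\| \le t
\end{array}
$$

$$
\|A(x)\| \le t \iff \begin{bmatrix} t I_{p_1} & A(x) \\ A^T(x) & t I_{p_2} \end{bmatrix} \succeq 0
$$

Why? Eigenvalues of this matrix are $t \pm \sigma_i(A(x))$ (and $t$ itself when $p_1 \ne p_2$), so the matrix is positive semidefinite exactly when $\sigma_{\max}(A(x)) \le t$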
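The eigenvalue claim can be verified numerically. In this sketch a random square matrix stands in for A(x) at a fixed x (an illustrative assumption; with p1 = p2 there are no extra eigenvalues equal to t).

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, t = 3, 3, 2.0
A = rng.standard_normal((p1, p2))          # stands in for A(x) at a fixed x

M = np.block([[t * np.eye(p1), A],
              [A.T, t * np.eye(p2)]])
sigma = np.linalg.svd(A, compute_uv=False)

eigs = np.sort(np.linalg.eigvalsh(M))
expected = np.sort(np.r_[t - sigma, t + sigma])
print(np.allclose(eigs, expected))         # eigenvalues are t - sigma_i and t + sigma_i
print(eigs.min() >= 0, sigma.max() <= t)   # the two tests agree
```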

Example: nuclear-norm minimization

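$$
\begin{array}{ll}
\text{minimize} & \|X\|_* = \sum_i \sigma_i(X) \\
\text{subject to} & X_{ij} = B_{ij}, \quad (i, j) \in \Omega \subset [p_1] \times [p_2]
\end{array}
$$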

This is an SDP (proof, later)

Stability analysis for dynamical systems

Linear system

$$
\frac{dv}{dt} = \dot v(t) = Q v(t), \qquad Q \in \mathbb{R}^{n \times n}
$$

Main question: is this system stable? i.e. do all trajectories tend to zero as $t \to \infty$?

Simple sufficient condition: existence of a quadratic Lyapunov function

(i) $L(v) = v^T X v$, $X \succ 0$

(ii) $\dot L = \frac{d}{dt} L(v(t)) \le -\alpha L(v(t))$ ($\alpha > 0$) for any trajectory

This condition gives $L(v(t)) = v^T(t) X v(t) \le \exp(-\alpha t) L(v(0))$ (Gronwall's inequality), whence

$$
X \succ 0 \implies v(t) \to 0 \text{ as } t \to \infty
$$

Existence of $X \succ 0$ and $\alpha > 0$ provides a certificate of stability

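$$
\frac{dv}{dt} = \dot v(t) = Q v(t), \qquad L(v) = v^T X v, \quad X \succ 0
$$

$$
\dot L = \frac{d}{dt}\left[v^T(t) X v(t)\right] = \dot v^T X v + v^T X \dot v = v^T (Q^T X + X Q) v
$$

i.e.

$$
\dot L \le -\alpha L \iff v^T (Q^T X + X Q + \alpha X) v \le 0 \ \ \forall v \iff Q^T X + X Q + \alpha X \preceq 0
$$

Conclusion: to certify stability, it suffices to find X obeying

$$
X \succ 0, \qquad Q^T X + X Q \prec 0
$$

(if $Q^T X + X Q \prec 0$, then $Q^T X + X Q + \alpha X \preceq 0$ for small enough $\alpha > 0$)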

If the optimal value of SDP

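$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & \begin{bmatrix} X + tI & 0 \\ 0 & -(Q^T X + X Q) + tI \end{bmatrix} \succeq 0
\end{array}
$$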

is negative, then the system is stable
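For a single stable Q, a certificate X can also be produced without an SDP solver. The sketch below swaps in a different technique: it solves the Lyapunov equation whose left side is Q^T X + X Q and whose right side is −I, as a plain linear system, and then checks the two definiteness conditions. The helper name and the example matrix are illustrative assumptions; this is not the SDP from the slides.

```python
import numpy as np

def lyapunov_certificate(Q):
    """Solve Q^T X + X Q = -I for X by vectorizing the linear equation."""
    n = Q.shape[0]
    I = np.eye(n)
    # Column-major vec: vec(Q^T X) = (I kron Q^T) vec(X), vec(X Q) = (Q^T kron I) vec(X)
    K = np.kron(I, Q.T) + np.kron(Q.T, I)
    rhs = -I.flatten(order="F")
    X = np.linalg.solve(K, rhs).reshape((n, n), order="F")
    return 0.5 * (X + X.T)                 # symmetrize against round-off

Q = np.array([[-1.0, 2.0],
              [0.0, -3.0]])                # eigenvalues -1 and -3: stable
X = lyapunov_certificate(Q)
print(np.linalg.eigvalsh(X))               # all positive: X is positive definite
print(np.linalg.eigvalsh(Q.T @ X + X @ Q)) # all negative: Q^T X + X Q is negative definite
```

The SDP formulation becomes necessary once X must satisfy several such inequalities simultaneously, where no single linear equation is available.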

Extension

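$$
\dot v(t) = Q(t) v(t)
$$

$Q(t) \in \operatorname{conv}\{Q_1, \ldots, Q_k\}$ time-varying

$L(v) = v^T X v$ ($X \succ 0$) s.t. $\dot L \le -\alpha L \implies$ stability

Similar calculations show that, for all $v$,

$$
\begin{aligned}
v^T (Q^T(t) X + X Q(t) + \alpha X) v \le 0
&\iff Q^T(t) X + X Q(t) + \alpha X \preceq 0, \quad \forall Q(t) \in \operatorname{conv}\{Q_1, \ldots, Q_k\} \\
&\iff Q_i^T X + X Q_i + \alpha X \preceq 0, \quad \forall i = 1, \ldots, k
\end{aligned}
$$

If we can find X such that

$$
X \succ 0 \quad \& \quad Q_i^T X + X Q_i \prec 0 \quad \forall i = 1, \ldots, k
$$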

then we have stability

This is an SDP!

