
Page 1:

Convex Optimization and Applications

Lin Xiao

Center for the Mathematics of Information, California Institute of Technology

Acknowledgment:

Stephen Boyd (Stanford) and Lieven Vandenberghe (UCLA) and their research groups

CS286a: Mathematics of Information seminar, 10/15/04

Page 2:

Two problems

polyhedron P described by linear inequalities, a_i^T x ≤ b_i, i = 1, ..., L

[figure: polyhedron P]

Problem 1: find minimum volume ellipsoid ⊇ P

Problem 2: find maximum volume ellipsoid ⊆ P

are these (computationally) difficult? or easy?

Page 3:

problem 1 is very difficult

• in practice

• in theory (NP-hard)

problem 2 is very easy

• in practice (readily solved on small computer)

• in theory (polynomial complexity)

Page 4:

Moral

very difficult and very easy problems can look quite similar

. . . unless we are trained to recognize the difference

Page 5:

Linear program (LP)

minimize   c^T x
subject to a_i^T x ≤ b_i, i = 1, ..., m

c, a_i ∈ R^n are parameters; x ∈ R^n is the variable

• easy to solve, in theory and practice

• can easily solve dense problems with n = 1000 variables, m = 10000 constraints; far larger for sparse or structured problems (see the sketch below)
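As a concrete (and hypothetical) illustration of the scale claim, here is a minimal sketch that builds a random dense LP and solves it with scipy.optimize.linprog; all data and sizes are made up, and the box bounds are only there to keep the random instance bounded.

```python
import numpy as np
from scipy.optimize import linprog

# Minimal LP sketch: minimize c^T x subject to a_i^T x <= b_i (random data).
rng = np.random.default_rng(0)
n, m = 100, 1000                                 # variables, constraints
A = rng.standard_normal((m, n))                  # rows are the a_i
b = A @ rng.standard_normal(n) + rng.random(m)   # b chosen so a point is strictly feasible
c = rng.standard_normal(n)

# box bounds keep this random instance bounded
res = linprog(c, A_ub=A, b_ub=b, bounds=(-10, 10))
print(res.status, res.fun)                       # status 0 means an optimal point was found
```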

Page 6:

Polynomial minimization

minimize p(x)

p is a polynomial of degree d; x ∈ R^n is the variable

• except for special cases (e.g., d = 2) this is a very difficult problem

• even sparse problems with size n = 20, d = 10 are essentially intractable

• all algorithms known to solve this problem require effort exponential in n

Page 7:

Moral

• a problem can appear∗ hard, but be easy

• a problem can appear∗ easy, but be hard

∗ if we are not trained to recognize them

Page 8:

What makes a problem easy or hard?

classical view:

• linear is easy

• nonlinear is hard(er)

Page 9:

What makes a problem easy or hard?

emerging (and correct) view:

. . . the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity.

— R. Rockafellar, SIAM Review 1993

Page 10:

Convex optimization

minimize   f_0(x)
subject to f_1(x) ≤ 0, ..., f_m(x) ≤ 0,  Ax = b

x ∈ R^n is the optimization variable; f_i : R^n → R are convex:

f_i(λx + (1−λ)y) ≤ λ f_i(x) + (1−λ) f_i(y)

for all x, y and 0 ≤ λ ≤ 1

• includes least-squares, linear programming, maximum volume ellipsoid in polyhedron, and many others

• convex problems are fundamentally tractable

Page 11:

Example: Robust LP

minimize   c^T x
subject to Prob(a_i^T x ≤ b_i) ≥ η, i = 1, ..., m

coefficient vectors a_i IID, a_i ~ N(ā_i, Σ_i); η is the required reliability

• for fixed x, a_i^T x is N(ā_i^T x, x^T Σ_i x)

• so for η = 50%, robust LP reduces to the LP

minimize   c^T x
subject to ā_i^T x ≤ b_i, i = 1, ..., m

and so is easily solved

• what about other values of η, e.g., η = 10%? η = 90%?

Page 12:

constraint Prob(a_i^T x ≤ b_i) ≥ η is equivalent to

ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖_2 − b_i ≤ 0

Φ is the CDF of the unit Gaussian

is LHS a convex function?

Page 13:

Hint

{x | Prob(a_i^T x ≤ b_i) ≥ η, i = 1, ..., m}

[figure: this feasible set for η = 10%, η = 50%, and η = 90%]

Page 14:

That’s right

robust LP with reliability η = 90% is convex, and very easily solved

robust LP with reliability η = 10% is not convex, and extremely difficult

Page 15:

Maximum volume ellipsoid in polyhedron

• polyhedron: P = {x | a_i^T x ≤ b_i, i = 1, ..., m}
• ellipsoid: E = {By + d | ‖y‖ ≤ 1}, with B = B^T ≻ 0

[figure: ellipsoid E with center d inside polyhedron P]

maximum volume E ⊆ P, as a convex problem in variables B, d:

maximize   log det B
subject to B = B^T ≻ 0,  ‖B a_i‖ + a_i^T d ≤ b_i, i = 1, ..., m
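For illustration only, this problem transcribes almost verbatim into cvxpy; the sketch below uses a made-up polyhedron (the unit box), for which the maximum volume inscribed ellipsoid should be the unit ball (B ≈ I, d ≈ 0).

```python
import numpy as np
import cvxpy as cp

# Maximum volume ellipsoid E = {By + d : ||y|| <= 1} inside P = {x : a_i^T x <= b_i}.
# Hypothetical data: P is the unit box |x_i| <= 1.
n = 2
A = np.vstack([np.eye(n), -np.eye(n)])   # rows are the a_i
b = np.ones(2 * n)

B = cp.Variable((n, n), symmetric=True)  # B = B^T; log_det keeps B positive definite
d = cp.Variable(n)
cons = [cp.norm(B @ A[i]) + A[i] @ d <= b[i] for i in range(2 * n)]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), cons)
prob.solve()
print(np.round(B.value, 3), np.round(d.value, 3))   # expect B ~ I, d ~ 0
```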

Page 16:

Moral

• it’s not easy to recognize convex functions and convex optimization problems

• huge benefit, though, when we do

Page 17:

Convex Analysis and Optimization

Page 18:

Convex analysis & optimization

nice properties of convex optimization problems known since the 1960s

• local solutions are global
• duality theory, optimality conditions
• simple solution methods like alternating projections

convex analysis well developed by the 1970s (Rockafellar)

• separating & supporting hyperplanes
• subgradient calculus

Page 19:

What’s new (since 1990 or so)

• primal-dual interior-point (IP) methods
  extremely efficient, handle nonlinear large-scale problems, polynomial-time complexity results, software implementations

• new standard problem classes
  generalizations of LP, with theory, algorithms, software

• extension to generalized inequalities
  semidefinite and cone programming

Page 20:

Applications and uses

• lots of applications
  control, combinatorial optimization, signal processing, circuit design, communications, machine learning, ...

• robust optimization
  robust versions of LP, least-squares, and other problems

• relaxations and randomization
  provide bounds and heuristics for solving hard (e.g., combinatorial optimization) problems

Page 21:

Recent history

• 1984–97: interior-point methods for LP
  – 1984: Karmarkar’s interior-point LP method
  – theory: Ye, Renegar, Kojima, Todd, Monteiro, Roos, ...
  – practice: Wright, Mehrotra, Vanderbei, Shanno, Lustig, ...

• 1988: Nesterov & Nemirovsky’s self-concordance analysis

• 1989–: LMIs and semidefinite programming in control

• 1990–: semidefinite programming in combinatorial optimization (Alizadeh, Goemans, Williamson, Lovász & Schrijver, Parrilo, ...)

• 1994: interior-point methods for nonlinear convex problems (Nesterov & Nemirovsky, Overton, Todd, Ye, Sturm, ...)

• 1997–: robust optimization (Ben Tal, Nemirovsky, El Ghaoui, ...)

Page 22:

New Standard Convex Problem Classes

Page 23:

Some new standard convex problem classes

• second-order cone program (SOCP)
• geometric program (GP) (and entropy problems)
• semidefinite program (SDP)

all these new problem classes have

• complete duality theory, similar to LP
• good algorithms, and robust, reliable software
• wide variety of new applications

Page 24:

Second-order cone program

second-order cone program (SOCP) has form

minimize   c_0^T x
subject to ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i, i = 1, ..., m

with variable x ∈ R^n

• includes LP and QP as special cases

• nondifferentiable when A_i x + b_i = 0

• new IP methods can solve (almost) as fast as LPs

Page 25:

Example: robust linear program

minimize   c^T x
subject to Prob(a_i^T x ≤ b_i) ≥ η, i = 1, ..., m

where a_i ~ N(ā_i, Σ_i)

equivalent to

minimize   c^T x
subject to ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖_2 ≤ b_i, i = 1, ..., m

where Φ is the (unit) normal CDF

robust LP is an SOCP for η ≥ 0.5 (Φ^{-1}(η) ≥ 0); a sketch follows
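A hedged cvxpy sketch of this SOCP, with fabricated data (ā_i, Σ_i^{1/2}, b, c are random, and a norm ball is added only to keep the toy instance bounded); Φ^{-1} is scipy.stats.norm.ppf.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

# Robust LP as an SOCP for eta >= 0.5 (hypothetical random data).
rng = np.random.default_rng(0)
n, m, eta = 5, 20, 0.9
abar = rng.standard_normal((m, n))                     # the mean vectors abar_i
Sig_half = [np.diag(rng.random(n)) for _ in range(m)]  # Sigma_i^{1/2}
b = abar @ np.ones(n) + 5.0                            # x = 1 is strictly feasible
c = rng.standard_normal(n)
kappa = norm.ppf(eta)                                  # Phi^{-1}(eta) >= 0 here

x = cp.Variable(n)
cons = [abar[i] @ x + kappa * cp.norm(Sig_half[i] @ x) <= b[i] for i in range(m)]
cons.append(cp.norm(x) <= 10)                          # keeps the toy instance bounded
cp.Problem(cp.Minimize(c @ x), cons).solve()
print(x.value)
```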

Page 26:

Geometric program (GP)

log-sum-exp function:

lse(x) = log(e^{x_1} + ··· + e^{x_n})

. . . a smooth convex approximation of the max function

geometric program:

minimize   lse(A_0 x + b_0)
subject to lse(A_i x + b_i) ≤ 0, i = 1, ..., m

A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i}; variable x ∈ R^n
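Two small illustrations under made-up data (not from the talk): scipy's logsumexp showing lse as a smooth max, and cvxpy's log_sum_exp atom expressing a tiny convex-form GP that is constructed to be feasible and bounded.

```python
import numpy as np
import cvxpy as cp
from scipy.special import logsumexp

# lse is a smooth convex upper bound on max: max(x) <= lse(x) <= max(x) + log(n)
x = np.array([1.0, 2.0, 3.0])
print(x.max(), logsumexp(x))          # 3.0 vs about 3.408

# tiny convex-form GP with one lse constraint (fabricated data)
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((3, n))
A0, b0 = np.vstack([M, -M]), rng.standard_normal(6)      # +/- rows keep objective bounded below
A1, b1 = rng.standard_normal((5, n)), -3.0 * np.ones(5)  # y = 0 is strictly feasible

y = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.log_sum_exp(A0 @ y + b0)),
                  [cp.log_sum_exp(A1 @ y + b1) <= 0])
prob.solve()
print(prob.value)
```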

Page 27:

Posynomial form geometric program

x = (x_1, ..., x_n): vector of positive variables

a function f of the form

f(x) = Σ_{k=1}^t c_k x_1^{α_{1k}} x_2^{α_{2k}} ··· x_n^{α_{nk}}

with c_k ≥ 0, α_{ik} ∈ R, is called a posynomial

like a polynomial, but

• coefficients must be positive
• exponents can be fractional or negative

Page 28:

Posynomial form geometric program

posynomial form GP:

minimize   f_0(x)
subject to f_i(x) ≤ 1, i = 1, ..., m

f_i are posynomials; x_i > 0 are the variables

to convert to (convex form) GP, substitute x = e^y and express as

minimize   log f_0(e^y)
subject to log f_i(e^y) ≤ 0, i = 1, ..., m

objective and constraints then have the form lse(A_i y + b_i)

Page 29:

Entropy problems

the unnormalized negative entropy is a convex function:

−entr(x) = Σ_{i=1}^n x_i log(x_i / 1^T x)

defined for x_i ≥ 0, 1^T x > 0

entropy problem:

minimize   −entr(A_0 x + b_0)
subject to −entr(A_i x + b_i) ≤ 0, i = 1, ..., m

A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i}

Page 30:

Solving GPs (and entropy problems)

• GP and entropy problems are duals (if we solve one, we solve the other)

• new IP methods can solve large-scale GPs (and entropy problems) almost as fast as LPs

• applications in many areas:
  – information theory, statistics
  – communications, wireless power control
  – digital and analog circuit design

Page 31:

Generalized inequalities

with a proper convex cone K ⊆ R^k we associate the generalized inequality

x ⪯_K y  ⟺  y − x ∈ K

convex optimization problem with generalized inequalities:

minimize   f_0(x)
subject to f_1(x) ⪯_{K_1} 0, ..., f_L(x) ⪯_{K_L} 0,  Ax = b

f_i : R^n → R^{k_i} are K_i-convex: for all x, y and 0 ≤ λ ≤ 1,

f_i(λx + (1−λ)y) ⪯_{K_i} λ f_i(x) + (1−λ) f_i(y)

Page 32:

Semidefinite program

semidefinite program (SDP):

minimize   c^T x
subject to x_1 A_1 + ··· + x_n A_n ⪯ B

• B, A_i are symmetric matrices; the variable is x ∈ R^n

• ⪯ is the matrix inequality; the constraint is a linear matrix inequality (LMI)

• SDP can be expressed as a convex problem via

λ_max(x_1 A_1 + ··· + x_n A_n − B) ≤ 0

or handled directly as a cone problem (see the sketch below)
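A minimal cvxpy sketch of an SDP in exactly this LMI form; the matrices are fabricated, B is shifted so x = 0 is feasible for this data, and the norm ball only keeps the random instance bounded.

```python
import numpy as np
import cvxpy as cp

# SDP sketch: minimize c^T x subject to x_1 A_1 + ... + x_n A_n <= B (an LMI).
rng = np.random.default_rng(0)
n, k = 3, 4                                            # n variables, k-by-k matrices
sym = lambda M: (M + M.T) / 2
A = [sym(rng.standard_normal((k, k))) for _ in range(n)]
B = sym(rng.standard_normal((k, k))) + 5 * np.eye(k)   # shifted so x = 0 is feasible
c = rng.standard_normal(n)

x = cp.Variable(n)
lmi = sum(x[i] * A[i] for i in range(n)) << B          # the linear matrix inequality
cp.Problem(cp.Minimize(c @ x), [lmi, cp.norm(x) <= 10]).solve()
print(x.value)
```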

Page 33:

Semidefinite programming

• nearly complete duality theory, similar to LP

• interior-point algorithms that are efficient in theory & practice

• applications in many areas:
  – control theory
  – combinatorial optimization & graph theory
  – structural optimization
  – statistics
  – signal processing
  – circuit design
  – geometrical problems
  – communications and information theory
  – quantum computing
  – algebraic geometry
  – machine learning

Page 34:

Continuous time symmetric Markov chain on a graph

• connected graph G = (V, E), V = {1, ..., n}, no self-loops

• Markov process on V, with symmetric rate matrix Q
  – Q_ij = 0 for (i,j) ∉ E, i ≠ j
  – Q_ij ≥ 0 for (i,j) ∈ E
  – Q_ii = −Σ_{j≠i} Q_ij

• eigenvalues of Q ordered as 0 = λ_1(Q) > λ_2(Q) ≥ ··· ≥ λ_n(Q)

• state distribution given by π(t) = e^{tQ} π(0)

• distribution converges to uniform with rate determined by λ_2:

‖π(t) − 1/n‖_tv ≤ (√n/2) e^{λ_2(Q) t}

Page 35:

Fastest mixing Markov chain (FMMC) problem

minimize   λ_2(Q)
subject to Q1 = 0, Q = Q^T
           Q_ij = 0 for (i,j) ∉ E, i ≠ j
           Q_ij ≥ 0 for (i,j) ∈ E
           Σ_{(i,j)∈E} d_ij^2 Q_ij ≤ 1

[figure: example graph with one rate per edge]

• variable is the matrix Q; problem data are the graph and the constants d_ij on E

• need to add a constraint on the rates since λ_2 is homogeneous

• optimal Q gives the fastest diffusion process on the graph (subject to the rate constraint)

Page 36:

Fast diffusion processes

[figure: two fast diffusion analogies on the same 6-node graph: an electrical network with conductances g_12, g_13, g_23, g_34, g_45, g_46, and a mechanical spring network with stiffnesses k_12, k_13, k_23, k_34, k_45, k_46]

Page 37:

Swap objective and constraint

• λ_2(Q) and Σ_{(i,j)∈E} d_ij^2 Q_ij are homogeneous

• hence, can just as well minimize the weighted rate sum Σ_{(i,j)∈E} d_ij^2 Q_ij subject to a bound on λ_2(Q):

minimize   Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to Q1 = 0, Q = Q^T
           Q_ij = 0 for (i,j) ∉ E, i ≠ j
           Q_ij ≥ 0 for (i,j) ∈ E
           λ_2(Q) ≤ −1

Page 38:

SDP formulation of FMMC

using Q1 = 0, we have

λ_2(Q) ≤ −1  ⟺  Q − 11^T/n ⪯ −I

so FMMC reduces to the SDP

minimize   Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to Q1 = 0, Q = Q^T
           Q_ij = 0 for (i,j) ∉ E, i ≠ j
           Q_ij ≥ 0 for (i,j) ∈ E
           Q − 11^T/n ⪯ −I

hence: can solve efficiently, duality theory, ... (a sketch follows)
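As a hedged illustration, the swapped formulation transcribes directly into cvxpy; the sketch below assumes a small path graph with d_ij = 1 on every edge (made-up data).

```python
import numpy as np
import cvxpy as cp

# FMMC as an SDP on a path graph with n = 5 nodes and d_ij = 1 (toy data).
n = 5
edges = [(i, i + 1) for i in range(n - 1)]

Q = cp.Variable((n, n), symmetric=True)
cons = [Q @ np.ones(n) == 0,                          # Q1 = 0
        Q - np.ones((n, n)) / n << -np.eye(n)]        # lambda_2(Q) <= -1
for i in range(n):
    for j in range(i + 1, n):
        if (i, j) in edges:
            cons.append(Q[i, j] >= 0)                 # edge rates are nonnegative
        else:
            cons.append(Q[i, j] == 0)                 # no rate off the edge set
obj = cp.Minimize(sum(Q[i, j] for (i, j) in edges))   # weighted rate sum, d_ij = 1
cp.Problem(obj, cons).solve()
print(np.round(Q.value, 3))
```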

Page 39:

Robust Optimization

Page 40:

Robust optimization problem

robust optimization problem:

minimize   max_{a∈A} f_0(a, x)
subject to max_{a∈A} f_i(a, x) ≤ 0, i = 1, ..., m

• x is the optimization variable

• a is an uncertain parameter

• A is the parameter set (e.g., box, polyhedron, ellipsoid)

heuristics (detuning, sensitivity penalty term) have been used since the beginning of optimization

Page 41:

Robust optimization via convex optimization

El Ghaoui, Ben Tal & Nemirovsky, . . .

• robust versions of LP, QP, SOCP problems, with ellipsoidal or polyhedral uncertainty, can be formulated as SDPs (or simpler)

• other robust problems (e.g., SDP) are intractable, but there are good convex approximations

Page 42:

Robust least-squares

robust LS problem with uncertain parameters v_1, ..., v_p:

minimize   sup_{‖v‖≤1} ‖(A_0 + v_1 A_1 + ··· + v_p A_p) x − b‖_2

equivalent SDP (variables x, t_1, t_2):

minimize   t_1 + t_2
subject to [ I        P(x)    q(x) ]
           [ P(x)^T   t_1 I   0    ]  ⪰ 0
           [ q(x)^T   0       t_2  ]

where P(x) = [A_1 x  A_2 x  ···  A_p x] ∈ R^{m×p},  q(x) = A_0 x − b
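This block-matrix SDP maps onto cvxpy's bmat atom; the sketch below uses small fabricated data and symmetrizes M before the PSD constraint (a no-op here, since M is symmetric by construction).

```python
import numpy as np
import cvxpy as cp

# Robust LS via the SDP above (toy random data; variables x, t1, t2).
rng = np.random.default_rng(0)
m, n, p = 6, 3, 2
A0, b = rng.standard_normal((m, n)), rng.standard_normal(m)
As = [0.1 * rng.standard_normal((m, n)) for _ in range(p)]

x = cp.Variable(n)
t1, t2 = cp.Variable(), cp.Variable()
P = cp.hstack([cp.reshape(Ai @ x, (m, 1)) for Ai in As])   # P(x) = [A1 x ... Ap x]
q = cp.reshape(A0 @ x - b, (m, 1))                         # q(x) = A0 x - b
M = cp.bmat([[np.eye(m),  P,                q                     ],
             [P.T,        t1 * np.eye(p),   np.zeros((p, 1))      ],
             [q.T,        np.zeros((1, p)), cp.reshape(t2, (1, 1))]])
prob = cp.Problem(cp.Minimize(t1 + t2), [(M + M.T) / 2 >> 0])
prob.solve()
print(x.value, prob.value)
```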

Page 43:

example: minimize sup_{‖v‖≤ρ} ‖(A_0 + v_1 A_1 + v_2 A_2) x − b‖_2

[figure: worst-case residual versus ρ ∈ [0, 1] for the LS solution and the robust LS solution (‖A_0‖ = 10, ‖A_1‖ = ‖A_2‖ = 1)]

Page 44:

example: minimize sup_{‖v‖≤1} ‖(A_0 + v_1 A_1 + v_2 A_2) x − b‖_2

[figure: frequency distributions of the residual ‖(A_0 + v_1 A_1 + v_2 A_2) x − b‖_2 for x_ls, x_tych, and x_rls, assuming v is uniformly distributed]

Page 45:

Relaxations & Randomization

Page 46:

Relaxations & randomization

convex optimization is increasingly used

• to find good bounds for hard (i.e., nonconvex) problems, via relaxation

• as a heuristic for finding good suboptimal points, often via randomization

Page 47:

Example: Boolean least-squares

Boolean least-squares problem:

minimize   ‖Ax − b‖^2
subject to x_i^2 = 1, i = 1, ..., n

• basic problem in digital communications
• could check all 2^n possible values of x ...
• an NP-hard problem, and very hard in practice
• many heuristics for approximate solution

Page 48:

Boolean least-squares as matrix problem

‖Ax − b‖^2 = x^T A^T A x − 2 b^T A x + b^T b
           = Tr(A^T A X) − 2 b^T A x + b^T b

where X = xx^T

hence can express BLS as

minimize   Tr(A^T A X) − 2 b^T A x + b^T b
subject to X_ii = 1,  X ⪰ xx^T,  rank(X) = 1

... still a very hard problem

Page 49:

SDP relaxation for BLS

ignore the rank-one constraint, and use

X ⪰ xx^T  ⟺  [ X    x ]
              [ x^T  1 ]  ⪰ 0

to obtain the SDP relaxation (with variables X, x)

minimize   Tr(A^T A X) − 2 b^T A x + b^T b
subject to X_ii = 1,  [ X    x ]
                      [ x^T  1 ]  ⪰ 0

• optimal value of the SDP gives a lower bound for BLS
• if the optimal matrix is rank one, we're done

Page 50:

Interpretation via randomization

• can think of the variables X, x in the SDP relaxation as defining a normal distribution z ~ N(x, X − xx^T), with E z_i^2 = 1

• the SDP objective is E ‖Az − b‖^2

suggests a randomized method for BLS (sketched below):

• find X⋆, x⋆ optimal for the SDP relaxation

• generate z from N(x⋆, X⋆ − x⋆ x⋆^T)

• take x = sgn(z) as an approximate solution of BLS (can repeat many times and take the best one)
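A compact sketch of the whole pipeline on fabricated data: the SDP relaxation is written with a single (n+1)×(n+1) PSD variable Z that packs X and x together (one standard way to impose the LMI), followed by Gaussian rounding.

```python
import numpy as np
import cvxpy as cp

# SDP relaxation of Boolean LS + randomized rounding (toy random data).
rng = np.random.default_rng(0)
m, n = 15, 10
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

Z = cp.Variable((n + 1, n + 1), PSD=True)   # Z = [[X, x], [x^T, 1]]
X, x = Z[:n, :n], Z[:n, n]
obj = cp.trace(A.T @ A @ X) - 2 * (A.T @ b) @ x + b @ b
prob = cp.Problem(cp.Minimize(obj), [cp.diag(X) == 1, Z[n, n] == 1])
prob.solve()
lower_bound = prob.value                    # lower bound on the BLS optimum

# sample z ~ N(x*, X* - x* x*^T), round with sgn, keep the best
Sigma = X.value - np.outer(x.value, x.value)
best = np.inf
for _ in range(1000):
    z = rng.multivariate_normal(x.value, Sigma, check_valid="ignore")
    best = min(best, np.linalg.norm(A @ np.sign(z) - b) ** 2)
print(lower_bound, best)
```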

Page 51:

Example

• (randomly chosen) parameters A ∈ R^{150×100}, b ∈ R^150

• x ∈ R^100, so the feasible set has 2^100 ≈ 10^30 points

LS approximate solution: minimize ‖Ax − b‖ s.t. ‖x‖^2 = n, then round

yields objective 8.7% over the SDP relaxation bound

randomized method: (using SDP optimal distribution)

• best of 20 samples: 3.1% over SDP bound

• best of 1000 samples: 2.6% over SDP bound

Page 52:

[figure: frequency distribution of ‖Ax − b‖/(SDP bound) for the randomized samples, with the SDP bound and the LS solution marked]

Page 53:

Calculus of Convex Functions

Page 54:

Approach

• basic examples or atoms

• calculus rules or transformations that preserve convexity

Page 55:

Convex functions: Basic examples

• x^p for p ≥ 1 or p ≤ 0; −x^p for 0 ≤ p ≤ 1

• e^x, −log x, x log x

• x^T x;  x^T x / y (for y > 0);  (x^T x)^{1/2}

• ‖x‖ (any norm)

• max(x_1, ..., x_n),  log(e^{x_1} + ··· + e^{x_n})

• −log Φ(x) (Φ is the Gaussian CDF; Φ is log-concave, so −log Φ is convex)

• log det X^{-1} (for X ≻ 0)

Page 56:

Calculus rules

• convexity is preserved under sums and nonnegative scaling

• if f is cvx, then g(x) = f(Ax + b) is cvx

• pointwise sup: if f_α is cvx for each α ∈ A, then g(x) = sup_{α∈A} f_α(x) is cvx

• minimization: if f(x, y) is cvx, then g(x) = inf_y f(x, y) is cvx

• composition rules: if h is cvx & increasing and f is cvx, then g(x) = h(f(x)) is cvx

• perspective transformation: if f is cvx, then g(x, t) = t f(x/t) is cvx for t > 0

. . . and many, many others

Page 57:

More examples

• λ_max(X) (for X = X^T)

• f(x) = x_[1] + ··· + x_[k] (sum of the largest k elements of x)

• −Σ_{i=1}^m log(−f_i(x)) (on {x | f_i(x) < 0}; f_i cvx)

• f(x) = −log Prob(x + z ∈ C) (C convex, z ~ N(0, Σ); the probability is log-concave, so f is convex)

• x^T Y^{-1} x is cvx in (x, Y) for Y = Y^T ≻ 0

Page 58:

Duality

Page 59:

Lagrangian and dual function

primal problem:

minimize   f_0(x)
subject to f_i(x) ≤ 0, i = 1, ..., m

Lagrangian: L(x, λ) = f_0(x) + Σ_{i=1}^m λ_i f_i(x)

dual function: g(λ) = inf_x L(x, λ)

Page 60:

Lower bound property

• for any primal feasible x and any λ ≥ 0, we have g(λ) ≤ f_0(x)

• hence for any λ ≥ 0, g(λ) ≤ p⋆ (the optimal value of the primal problem)

• the duality gap of x, λ is defined as f_0(x) − g(λ) (the gap is nonnegative and bounds the suboptimality of x)

Page 61:

Dual problem

find the best lower bound on p⋆:

maximize   g(λ)
subject to λ_i ≥ 0, i = 1, ..., m

p⋆ is the optimal value of the primal, and d⋆ is the optimal value of the dual

• weak duality: even when the primal is not convex, d⋆ ≤ p⋆

• for a convex primal problem, we have strong duality: d⋆ = p⋆ (provided a technical condition holds)

Page 62:

Example: linear program

primal:
minimize   c^T x
subject to Ax ⪯ b

dual:
maximize   −b^T λ
subject to A^T λ + c = 0, λ ⪰ 0

(a numerical check is sketched below)
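A quick numerical check of this pair (a sketch under random data): the dual is rewritten in linprog's minimization form, and strong duality shows up as the two optimal values agreeing.

```python
import numpy as np
from scipy.optimize import linprog

# primal: min c^T x s.t. Ax <= b;  dual: max -b^T lam s.t. A^T lam + c = 0, lam >= 0
rng = np.random.default_rng(0)
m, n = 40, 5
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + rng.random(m)      # primal strictly feasible
c = rng.standard_normal(n)

primal = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))
# the dual, as a minimization: min b^T lam s.t. A^T lam = -c, lam >= 0
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=(0, None))
print(primal.fun, -dual.fun)                        # strong duality: the values agree
```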

Page 63:

Example: unconstrained geometric program

primal:
minimize   log(Σ_{i=1}^m exp(a_i^T x + b_i))

dual:
maximize   b^T ν − Σ_{i=1}^m ν_i log ν_i
subject to 1^T ν = 1, A^T ν = 0, ν ⪰ 0

. . . an entropy maximization problem

Page 64:

Example: duality between FMMC and MVU

FMMC problem

minimize   Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to Q1 = 0, Q = Q^T
           Q_ij = 0 for (i,j) ∉ E, i ≠ j
           Q_ij ≥ 0 for (i,j) ∈ E
           Q − 11^T/n ⪯ −I

dual of the FMMC problem:

maximize   Tr X
subject to X_ii + X_jj − X_ij − X_ji ≤ d_ij^2, (i,j) ∈ E
           X1 = 0, X ⪰ 0

Page 65:

FMMC dual as maximum-variance unfolding

• use variables x_1, ..., x_n, with X = [x_1 ··· x_n]^T [x_1 ··· x_n], i.e., X_ij = x_i^T x_j

• the dual FMMC problem becomes

maximize   Σ_{i=1}^n ‖x_i‖^2
subject to Σ_i x_i = 0,  ‖x_i − x_j‖ ≤ d_ij, (i,j) ∈ E

• position n points in R^n to maximize variance, respecting local distance constraints, i.e., the maximum-variance unfolding problem

Page 66:

Semidefinite embedding

• similar to the semidefinite embedding for unsupervised learning of manifolds (which has distance equality constraints) (Weinberger & Saul 2003)

• surprise: fastest diffusion on a graph and max-variance unfolding are duals

Page 67:

Interior-Point Methods

Page 68:

Interior-point methods

• handle linear and nonlinear convex problems (Nesterov & Nemirovsky)

• based on Newton's method applied to ‘barrier’ functions that trap x in the interior of the feasible region (hence the name IP)

• worst-case complexity theory: # Newton steps ~ √(problem size)

• in practice: # Newton steps between 20 & 50 (!) — over a wide range of problem dimensions, type, and data

• 1000 variables, 10000 constraints feasible on a PC; far larger if structure is exploited

• readily available (commercial and noncommercial) packages

Page 69:

Log barrier

for the convex problem

minimize   f_0(x)
subject to f_i(x) ≤ 0, i = 1, ..., m

we define the logarithmic barrier as

φ(x) = −Σ_{i=1}^m log(−f_i(x))

• φ is convex, smooth on the interior of the feasible set

• φ → ∞ as x approaches the boundary of the feasible set

Page 70:

Central path

the central path is the curve

x⋆(t) = argmin_x (t f_0(x) + φ(x)),  t ≥ 0

• x⋆(t) is strictly feasible, i.e., f_i(x⋆(t)) < 0

• x⋆(t) can be computed by, e.g., Newton's method

• intuition suggests x⋆(t) converges to optimal as t → ∞

• using duality, can prove x⋆(t) is m/t-suboptimal

Page 71:

Central path & duality

from

∇_x (t f_0(x⋆) + φ(x⋆)) = t ∇f_0(x⋆) + Σ_{i=1}^m (1 / (−f_i(x⋆))) ∇f_i(x⋆) = 0

we find that

λ_i⋆(t) = 1 / (−t f_i(x⋆)),  i = 1, ..., m

is dual feasible, with g(λ⋆(t)) = f_0(x⋆) − m/t

• the duality gap associated with the pair x⋆(t), λ⋆(t) is m/t

• hence, x⋆(t) is m/t-suboptimal

Page 72:

Example: central path for LP

x⋆(t) = argmin_x ( t c^T x − Σ_{i=1}^6 log(b_i − a_i^T x) )

[figure: central path for a small LP with 6 constraints, showing the objective direction c and the points x⋆(0), x⋆(10), and x_opt]

Page 73:

Barrier method

a.k.a. the path-following method

given strictly feasible x, t > 0, µ > 1

repeat
1. compute x := x⋆(t) (using Newton's method, starting from x)
2. exit if m/t < tol
3. t := µt

the duality gap is reduced by a factor µ at each outer iteration (a sketch follows)
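A self-contained sketch of the barrier method for an LP (minimize c^T x subject to Ax ≤ b) with damped Newton centering; the data, tolerances, and Armijo parameters are all made up for illustration.

```python
import numpy as np

def newton_center(A, b, c, x, t, tol=1e-8):
    """Minimize t*c^T x - sum(log(b - Ax)) by damped Newton, from strictly feasible x."""
    f = lambda x: t * (c @ x) - np.sum(np.log(b - A @ x))
    for _ in range(50):
        r = b - A @ x                              # slacks, must stay > 0
        g = t * c + A.T @ (1 / r)                  # gradient
        H = A.T @ ((1 / r**2)[:, None] * A)        # Hessian
        dx = np.linalg.solve(H, -g)
        lam2 = -g @ dx                             # Newton decrement squared
        if lam2 / 2 < tol:
            break
        s = 1.0
        while np.min(b - A @ (x + s * dx)) <= 0:   # stay strictly feasible
            s *= 0.5
        while f(x + s * dx) > f(x) - 0.25 * s * lam2:  # backtracking (Armijo)
            s *= 0.5
        x = x + s * dx
    return x

def barrier_lp(A, b, c, x0, t=1.0, mu=20.0, tol=1e-6):
    x, m = x0, len(b)
    while m / t >= tol:                            # m/t bounds the duality gap at x*(t)
        x = newton_center(A, b, c, x, t)
        t *= mu
    return x

# demo on a random LP where x0 = 0 is strictly feasible (b > 0)
rng = np.random.default_rng(0)
m, n = 100, 10
A, c = rng.standard_normal((m, n)), rng.standard_normal(n)
b = rng.random(m) + 0.1
print(c @ barrier_lp(A, b, c, np.zeros(n)))
```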

Page 74:

Trade-off in choice of µ

a large µ means

• fast duality gap reduction (fewer outer iterations), but

• many Newton steps to compute x⋆(µt) (more Newton steps per outer iteration)

total effort is measured by the total number of Newton steps

Page 75:

Typical example

GP with n = 50 variables, m = 100 constraints, m_i = 5

• wide range of µ works well
• very typical behavior (even for large m, n)

[figure: duality gap versus Newton iterations for µ = 2, µ = 50, and µ = 150]

Page 76:

Effect of µ

[figure: total Newton iterations versus µ (up to µ = 200)]

the barrier method works well for µ in a large range

Page 77:

Typical convergence of IP method

[figure: duality gap versus # Newton steps for LP, GP, SOCP, and SDP instances with 100 variables]

Page 78:

Typical effort versus problem dimensions

• LPs with n variables, 2n constraints

• 100 instances for each of 20 problem sizes

• average & standard deviation shown

[figure: Newton steps versus n (log scale, n from 10^1 to 10^3); the count stays roughly between 15 and 35]

Page 79:

Complexity analysis

• based on self-concordance (Nesterov & Nemirovsky, 1988)

• for any choice of µ, # steps is O(m log(1/ε)), where ε is the final accuracy

• to optimize the complexity bound, can take µ = 1 + 1/√m, which yields # steps O(√m log(1/ε))

• in any case, IP methods work extremely well in practice

Page 80:

Computational effort per Newton step

• Newton step effort dominated by solving linear equations to find the primal-dual search direction

• the equations inherit structure from the underlying problem (e.g., sparsity, symmetry, Toeplitz, circulant, Hankel, Kronecker)

• equations same as for least-squares problem of similar size and structure

conclusion:

we can solve a convex problem with about the same effort as solving 20–50 least-squares problems

Page 81:

Other interior-point methods

more sophisticated IP algorithms

• primal-dual, incomplete centering, infeasible start
• use the same ideas, e.g., central path, log barrier
• readily available (commercial and noncommercial) packages

typical performance: 20–50 Newton steps (!) — over a wide range of problem dimensions, problem type, and problem data

Page 82:

Exploiting structure

sparsity

• well developed, since the late 1970s

• direct (sparse factorizations) and iterative methods (CG, LSQR)

• standard in general-purpose LP, QP, GP, SOCP implementations

• can solve problems with 10^5, 10^6 variables and constraints (depending on the sparsity pattern)

symmetry

• reduce number of variables

• reduce size of matrices (particularly helpful for SDP)

Page 83:

A return to subgradient-type methods

• very large-scale problems: 10^5–10^7 variables

• applications: medical imaging, shape design of mechanical structures, machine learning and data mining, ...

• IP methods out of consideration (even a single iteration is prohibitive); can use only first-order information (function values and subgradients)

• a reasonable algorithm should have
  – computational effort per iteration at most linear in the design dimension
  – potential to obtain (and willingness to accept) medium-accuracy solutions
  – an error reduction factor essentially independent of problem dimension

Ben Tal & Nemirovsky, Nesterov, . . .

Page 84:

Conclusions

Page 85:

Conclusions

convex optimization

• theory fairly mature; practice has advanced tremendously over the last decade

• qualitatively different from general nonlinear programming

• cost only 30× more than least-squares, but far more expressive

• lots of applications still to be discovered

Page 86:

Some references

• Convex Optimization, Boyd & Vandenberghe, 2004, www.stanford.edu/~boyd/cvxbook.html (pdf of full text on the web)

• Introductory Lectures on Convex Optimization, Nesterov, 2003

• Lectures on Modern Convex Optimization, Ben Tal & Nemirovsky, 2001

• Interior-Point Polynomial Algorithms in Convex Programming, Nesterov & Nemirovsky, 1994

• Linear Matrix Inequalities in System and Control Theory, Boyd, El Ghaoui, Feron, & Balakrishnan, 1994
