a ktec center of excellence 1 convex optimization: part 1 of chapter 7 discussion presenter: brian...

Post on 17-Dec-2015

216 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

A KTEC Center of Excellence 1

Convex Optimization: Part 1 of

Chapter 7 Discussion

Presenter: Brian Quanz

A KTEC Center of Excellence 2

About today’s

discussion…•Chapter 7 – no separate discussion

of convex optimization• Discusses with SVM problems

• Instead: • Today: Discuss convex optimization

• Next Week: Discuss some specific convex optimization

problems (from text), e.g. SVMs

A KTEC Center of Excellence 3

About today’s

discussion…•Mostly follow alternate text:

• Convex Optimization, Stephen Boyd and Lieven

Vandenberghe

– Borrowed material from book and related course notes

– Some figures and equations shown here

• Available online: http://www.stanford.edu/~boyd/cvxbook/

• Nice course lecture videos available from Stephen Boyd

online: http://www.stanford.edu/class/ee364a/

• Corresponding convex optimization tool (discuss later) -

CVX: http://www.stanford.edu/~boyd/cvx/

A KTEC Center of Excellence 4

Overview Why convex? What is convex?

Key examples of linear and quadratic programming

Key mathematical ideas to discuss: ->Lagrange Duality->KKT conditions

Brief concept of interior point methods

CVX – convex opt. made easy

A KTEC Center of Excellence 5

Mathematical

Optimization• All learning is some optimization problem

-> Stick to canonical form

• x = (x1, x2, …, xp ) – opt. variables ; x*

• f0 : Rp -> R – objective function

• fi : Rp -> R – constraint function

A KTEC Center of Excellence 6

Optimization Example•Well familiar with: regularized

regression• Least squares

• Add some constraints, ridge, lasso

A KTEC Center of Excellence 7

Why convex

optimization?• Can’t solve most OPs

• E.g. NP Hard, even high polynomial time too slow

• Convex OPs• (Generally) No analytic solution

• Efficient algorithms to find (global) solution

• Interior point methods (basically Iterated Newton) can be

used:

– ~[10-100]*max{p3 , p2m, F} ; F cost eval. obj. and constr. f

• At worst solve with general IP methods (CVX), faster

specialized

A KTEC Center of Excellence 8

What is Convex

Optimization?•OP with convex objective and

constraint functions

• f0 , … , fm are convex = convex OP

that has an efficient solution!

A KTEC Center of Excellence 9

Convex Function•Definition: the weighted mean of

function evaluated at any two points

is greater than or equal to the

function evaluated at the weighted

mean of the two points

A KTEC Center of Excellence 10

Convex Function• What does definition mean?

• Pick any two points x, y and evaluate along the

function, f(x), f(y)

• Draw the line passing through the two points

f(x) and f(y)

• Convex if function evaluated on any point along

the line between x and y is below the line

between f(x) and f(y)

A KTEC Center of Excellence 11

Convex Function

A KTEC Center of Excellence 12

Convex Function

Convex!

A KTEC Center of Excellence 13

Convex Function

Not Convex!!!

A KTEC Center of Excellence 14

Convex Function• Easy to see why convexity allows

for efficient solution

• Just “slide” down the objective

function as far as possible and will

reach a minimum

A KTEC Center of Excellence 15

Local Optima is Global (simple proof)

A KTEC Center of Excellence 16

Convex vs. Non-convex

Ex.

•Convex, min. easy to find

Affine – border case of convexity

A KTEC Center of Excellence 17

Convex vs. Non-convex

Ex.

• Non-convex, easy to get stuck in a local min.

• Can’t rely on only local search techniques

A KTEC Center of Excellence 18

Non-convex• Some non-convex problems highly multi-

modal, or NP hard

• Could be forced to search all solutions, or

hope stochastic search is successful

• Cannot guarantee best solution, inefficient

• Harder to make performance guarantees

with approximate solutions

A KTEC Center of Excellence 19

Determine/Prove

Convexity• Can use definition (prove holds) to prove• If function restricted to any line is convex, function is convex

• If 2X differentiable, show hessian >= 0

• Often easier to:• Convert to a known convex OP

– E.g. QP, LP, SOCP, SDP, often of a more general form

• Combine known convex functions (building blocks) using

operations that preserve convexity

– Similar idea to building kernels

A KTEC Center of Excellence 20

Some common convex

OPs•Of particular interest for this book

and chapter: • linear programming (LP) and quadratic programming

(QP)

• LP: affine objective function, affine

constraints

-e.g. LP SVM, portfolio management

A KTEC Center of Excellence 21

LP VisualizationNote: constraints form feasible set-for LP, polyhedra

A KTEC Center of Excellence 22

Quadratic Program• QP: Quadratic objective, affine constraints

• LP is special case

• Many SVM problems result in QP, regression

• If constraint functions quadratic, then

Quadratically Constrained Quadratic Program

(QCQP)

A KTEC Center of Excellence 23

QP Visualization

A KTEC Center of Excellence 24

Second Order Cone

Program

• Ai = 0 - results in LP

• ci = 0 - results in QCQP

•Constraint requires the affine functions

to lie in 2nd order

cone

A KTEC Center of Excellence 25

Second Order Cone (Boundary)

in R3

A KTEC Center of Excellence 26

Semidefinite

Programming

• Linear matrix inequality (LMI)

constraints

• Many problems can be expressed using

LMIs

• LP and SOCP

A KTEC Center of Excellence 27

Semidefinite

Programming

A KTEC Center of Excellence 28

Building Convex

Functions• From simple convex functions to

complex: some operations that

preserve complexity• Nonnegative weighted sum

• Composition with affine function

• Pointwise maximum and supremum

• Composition

• Minimization

• Perspective ( g(x,t) = tf(x/t) )

A KTEC Center of Excellence 29

Verifying Convexity

Remarks• For more detail and expansion, consult the

referenced text, Convex Optimization

• Geometric Programs also convex, can be

handled with a series of SDPs (skipped

details here)

• CVX converts the problem either to SOCP

or SDM (or a series of) and uses efficient

solver

A KTEC Center of Excellence 30

Lagrangian• Standard form:

• Lagrangian L:

• Lambda, nu, Lagrange multipliers (dual variables)

A KTEC Center of Excellence 31

Lagrange Dual Function

• Lagrange Dual found by minimizing

L with respect to primal variables • Often can take gradient of L w.r.t. primal var.’s and

set = 0 (SVM)

A KTEC Center of Excellence 32

Lagrange Dual Function•Note: Lagrange dual function is the

point-wise infimum of family of

affine functions of (lambda, nu)

• Thus, g is concave even if problem

is not convex

A KTEC Center of Excellence 33

Lagrange Dual Function• Lagrange Dual provides lower

bound on objective value at solution

A KTEC Center of Excellence 34

Lagrangian as Linear Approximation,

Lower Bound

• Simple interpretation of Lagrangian

• Can incorporate the constraints into

objective as indicator functions • Infinity if violated, 0 otherwise:

• In Lagrangian we use a “soft” linear

approximation to the indicator functions; under-

estimator since

0

A KTEC Center of Excellence 35

Lagrange Dual Problem• Why not make the lower bound best possible?

• Dual problem:

• Always convex opt. problem (even when

primal is non-convex)

• Weak Duality: d* <= p* (have already

seen this)

A KTEC Center of Excellence 36

Strong Duality• If d* = p*, strong duality holds

• Does not hold in general

• Slater’s Theorem: If convex problem,

and strictly feasible point exists, then

strong duality holds! (proof too involved, refer to text)

• => For convex problems, can use dual problem to find

solution

A KTEC Center of Excellence 37

Complementary

Slackness• When strong duality holds

• Sandwiched between f0(x), last 2

inequalities are equalities, simple!

(definition)

(since constraints satisfied at x*)

A KTEC Center of Excellence 38

Complementary

Slackness• Which means:

• Since each term is non-positive, we have

complementary slackness:

• Whenever constraint is non-active,

corresponding multiplier is zero

A KTEC Center of Excellence 39

Complementary

Slackness• This can also be described by

• Since usually only a few active

constraints at solution (see geometry),

the dual variable lambda is often sparse • Note: In general no guarantee

A KTEC Center of Excellence 40

Complementary

Slackness• As we will see, this is why support vector

machines result in solution with only key

support vectors• These come from the dual problem, constraints correspond to points,

and complementary slackness ensures only the “active” points are

kept

A KTEC Center of Excellence 41

Complementary

Slackness• However, avoid common misconceptions

when it comes to SVM and complementary

slackness!

• E.g. if Lagrange multiplier is 0, constraint

could still be active! (not bijection!)

• This means:

A KTEC Center of Excellence 42

KKT Conditions • The KKT conditions are then just what

we call that set of conditions required

at the solution (basically list what we

know)

• KKT conditions play important role• Can sometimes be used to find solution analytically

• Otherwise can think of many methods as ways of solving

KKT conditions

A KTEC Center of Excellence 43

KKT Conditions• Again given strong duality and assuming

differentiable, since

gradient must be 0 at x*

• Thus, putting it all together, for non-

convex problems we have

A KTEC Center of Excellence 44

KKT Conditions – non-

convex

• Necessary conditions

A KTEC Center of Excellence 45

KKT Conditions – convex

• Also sufficient

conditions: • 1+2 -> xt is feasible.

• 3 -> L(x,lt,nt) is convex

• 5 -> xt minimizes

L(x,lt,nt) so g(lt,nt) =

L(xt,lt,nt)

A KTEC Center of Excellence 46

Brief description of interior point

method• Solve a series of equality constrained

problems with Newton’s method

• Approximate constraints with log-barrier

(approx. of indicator)

A KTEC Center of Excellence 47

Brief description of interior point

method

• As t gets larger, approximation becomes better

A KTEC Center of Excellence 48

Central Path Idea

A KTEC Center of Excellence 49

CVX: Convex Optimization Made

Easy• CVX is a Matlab toolbox• Allows you to flexibly express convex optimization problems

• Translates these to a general form and uses efficient solver

(SOCP, SDP, or a series of these)

• http://www.stanford.edu/~boyd/cvx/

• All you have to do is design the convex

optimization problem• Plug into CVX, a first version of algorithm implemented

• More specialized solver may be necessary for some

applications

A KTEC Center of Excellence 50

CVX - Examples•Quadratic program: given H, f, A,

and b• cvx_begin

variable x(n)

minimize (x’*H*x + f’*x)

subject to

A*x >= b

cvx_end

A KTEC Center of Excellence 51

CVX - Examples• SVM-type formulation with L1 norm

• cvx_begin variable w(p)

variable b(1) variable e(n) expression by(n) by = train_label.*b; minimize( w'*(L + I)*w + C*sum(e) +

l1_lambda*norm(w,1) ) subject to

X*w + by >= a - e;e >= ec;

cvx_end

A KTEC Center of Excellence 52

CVX - Examples• More complicated terms built with expressions

• cvx_begin variable w(p+1+n); expression q(ec); for i =1:p for j =i:p if(A(i,j) == 1) q(ct) = max(abs(w(i))/d(i),abs(w(j))/d(j)); ct=ct+1; end end end minimize( f'*w + lambda*sum(q) ) subject to X*w >= a;cvx_end

A KTEC Center of Excellence 53

Questions•Questions, Comments?

A KTEC Center of Excellence 54

Extra proof

top related