A KTEC Center of Excellence 1 Convex Optimization: Part 1 of Chapter 7 Discussion Presenter: Brian Quanz

Upload: patience-conley

Post on 17-Dec-2015



Convex Optimization: Part 1 of Chapter 7 Discussion

Presenter: Brian Quanz


About today's discussion…

• Chapter 7: no separate discussion of convex optimization
  - Discussed together with SVM problems
• Instead:
  - Today: discuss convex optimization
  - Next week: discuss some specific convex optimization problems (from the text), e.g. SVMs


About today's discussion…

• Mostly follows an alternate text: Convex Optimization, Stephen Boyd and Lieven Vandenberghe
  - Borrowed material from the book and related course notes
  - Some figures and equations shown here
  - Available online: http://www.stanford.edu/~boyd/cvxbook/
• Nice course lecture videos from Stephen Boyd available online: http://www.stanford.edu/class/ee364a/
• Corresponding convex optimization tool (discussed later), CVX: http://www.stanford.edu/~boyd/cvx/


Overview

• Why convex? What is convex?
• Key examples of linear and quadratic programming
• Key mathematical ideas to discuss: Lagrange duality -> KKT conditions
• Brief concept of interior point methods
• CVX: convex optimization made easy


Mathematical Optimization

• All learning is some optimization problem
  - Stick to canonical form
• x = (x1, x2, …, xp): optimization variables; x* denotes the optimal solution
• f0 : R^p -> R: objective function
• fi : R^p -> R, i = 1, …, m: constraint functions
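The canonical form mentioned above is presumably the one used by Boyd and Vandenberghe. A sketch under that assumption (the bounds b_i and the constraint count m are taken from that text, not from the slide):

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le b_i, \qquad i = 1, \dots, m
\end{aligned}
```

A point x^* is optimal if it has the smallest objective value among all x that satisfy the constraints.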


Optimization Example

• Well familiar with: regularized regression
  - Least squares
  - Add some constraints: ridge, lasso
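As an illustration, these three problems can be written in constrained form as below. The symbols are assumptions made for this sketch: X is the n-by-p data matrix, y the response vector, w the coefficient vector, and s a user-chosen bound.

```latex
\begin{aligned}
\text{least squares:}\quad & \min_{w}\ \|Xw - y\|_2^2 \\
\text{ridge:}\quad         & \min_{w}\ \|Xw - y\|_2^2 \quad \text{subject to}\quad \|w\|_2^2 \le s \\
\text{lasso:}\quad         & \min_{w}\ \|Xw - y\|_2^2 \quad \text{subject to}\quad \|w\|_1 \le s
\end{aligned}
```

In each case the objective and the constraint function are convex in w.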


Why convex optimization?

• Can't solve most OPs
  - E.g. NP-hard; even high-order polynomial time is too slow
• Convex OPs
  - (Generally) no analytic solution
  - Efficient algorithms to find the (global) solution
  - Interior point methods (basically iterated Newton) can be used: roughly [10-100] * max{p^3, p^2 m, F} operations, where F is the cost of evaluating the objective and constraint functions
  - At worst, solve with general IP methods (CVX); specialized methods can be faster


What is Convex Optimization?

• OP with convex objective and constraint functions
• f0, …, fm convex => convex OP, which has an efficient solution!


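Convex Function

• Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points
  - f(theta*x + (1 - theta)*y) <= theta*f(x) + (1 - theta)*f(y) for all x, y and all 0 <= theta <= 1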


Convex Function

• What does the definition mean?
• Pick any two points x, y and evaluate the function there: f(x), f(y)
• Draw the line segment between the two points (x, f(x)) and (y, f(y))
• Convex if the function evaluated at any point between x and y lies on or below that line segment
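The chord test described above can be tried numerically. The following is a minimal sketch in plain Python; the helper name `violates_convexity`, the grid sizes, and the two test functions are choices made for this illustration. A grid check like this can only refute convexity, it cannot prove it.

```python
import math

def violates_convexity(f, lo, hi, samples=50, tol=1e-12):
    """Return True if the chord inequality fails somewhere on a grid over [lo, hi]."""
    pts = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    thetas = [j / 10 for j in range(11)]
    for x in pts:
        for y in pts:
            for th in thetas:
                z = th * x + (1 - th) * y            # weighted mean of the points
                chord = th * f(x) + (1 - th) * f(y)  # weighted mean of the values
                if f(z) > chord + tol:               # function lies above the chord
                    return True
    return False

# x^2 passes the check on [-3, 3]; sin fails it on [0, 2*pi].
square_ok = not violates_convexity(lambda x: x * x, -3.0, 3.0)
sine_ok = not violates_convexity(math.sin, 0.0, 2 * math.pi)
```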


Convex Function


Convex Function (figure): Convex!


Convex Function (figure): Not convex!


Convex Function

• Easy to see why convexity allows for an efficient solution
• Just “slide” down the objective function as far as possible and you will reach a minimum
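The "slide down" idea can be sketched with plain gradient descent. The objective f(x) = (x - 3)^2 + 1, the step size, and the starting points below are assumptions chosen for this illustration only.

```python
def gradient_descent(grad, x0, step=0.1, iters=200):
    """Repeatedly move against the gradient, i.e. slide downhill."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Convex example: f(x) = (x - 3)^2 + 1, so f'(x) = 2 (x - 3).
grad_f = lambda x: 2.0 * (x - 3.0)

# Every starting point slides to the same minimizer x = 3.
finals = [gradient_descent(grad_f, x0) for x0 in (-10.0, 0.0, 25.0)]
```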


Local Optimum is Global (simple proof)
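The proof shown on the slide is not reproduced in this transcript. A sketch of the standard argument from Boyd and Vandenberghe, with R, y, z, and theta introduced here for illustration, runs as follows:

```latex
\begin{aligned}
&\text{Let } x \text{ be locally optimal: } x \text{ is feasible and } f_0(x) \le f_0(z)
  \text{ for all feasible } z \text{ with } \|z - x\|_2 \le R. \\
&\text{Suppose } x \text{ is not globally optimal: some feasible } y \text{ has } f_0(y) < f_0(x),
  \text{ so } \|y - x\|_2 > R. \\
&\text{Take } z = (1 - \theta)\,x + \theta\,y \text{ with } \theta = \frac{R}{2\,\|y - x\|_2} \in (0, 1). \\
&\text{The feasible set is convex, so } z \text{ is feasible, and } \|z - x\|_2 = R/2 < R. \\
&\text{By convexity of } f_0:\quad f_0(z) \le (1 - \theta)\,f_0(x) + \theta\,f_0(y) < f_0(x), \\
&\text{which contradicts local optimality of } x.
\end{aligned}
```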


Convex vs. Non-convex Ex.

• Convex, min. easy to find
• Affine: border case of convexity


Convex vs. Non-convex Ex.

• Non-convex, easy to get stuck in a local min.
• Can't rely on only local search techniques
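To illustrate getting stuck, the sketch below runs the same local search from two starting points on a non-convex function. The function f(x) = x^4 - 3x^2 + x, the step size, and the starting points are assumptions chosen for this example.

```python
def f(x):
    return x**4 - 3.0 * x**2 + x

def grad_f(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def gradient_descent(grad, x0, step=0.01, iters=5000):
    """Plain local search: follow the negative gradient from x0."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

left = gradient_descent(grad_f, -2.0)   # ends in the deeper valley (x < 0)
right = gradient_descent(grad_f, 2.0)   # ends in a shallower local minimum (x > 0)

# Both end points are stationary, but only `left` is the global minimizer.
stuck = f(right) > f(left)
```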


Non-convex

• Some non-convex problems are highly multi-modal, or NP-hard
• Could be forced to search all solutions, or hope stochastic search is successful
• Cannot guarantee best solution, inefficient
• Harder to make performance guarantees with approximate solutions


Determine/Prove Convexity

• Can use the definition (prove that it holds)
  - If the function restricted to any line is convex, the function is convex
  - If twice differentiable, show the Hessian is positive semidefinite (Hessian >= 0)
• Often easier to:
  - Convert to a known convex OP, e.g. QP, LP, SOCP, SDP, often of a more general form
  - Combine known convex functions (building blocks) using operations that preserve convexity; similar idea to building kernels
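The Hessian test can be sketched for functions of two variables, where the eigenvalues of a symmetric 2x2 matrix have a closed form. The helper names and the two quadratic functions below are assumptions made for this illustration.

```python
import math

def min_eig_sym2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.hypot((a - c) / 2, b)

def is_psd_sym2(a, b, c, tol=1e-12):
    """A symmetric matrix is positive semidefinite iff its smallest eigenvalue is >= 0."""
    return min_eig_sym2(a, b, c) >= -tol

# f(x1, x2) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]] (eigenvalues 1, 3).
convex_f = is_psd_sym2(2.0, 1.0, 2.0)

# g(x1, x2) = x1^2 + 3*x1*x2 + x2^2 has constant Hessian [[2, 3], [3, 2]] (eigenvalues -1, 5).
convex_g = is_psd_sym2(2.0, 3.0, 2.0)
```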


Some common convex OPs

• Of particular interest for this book and chapter: linear programming (LP) and quadratic programming (QP)
• LP: affine objective function, affine constraints
  - e.g. LP SVM, portfolio management
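For reference, a general LP can be written as below. The symbols c, d, G, h, A, b follow Boyd and Vandenberghe and are assumptions here, since the slide's equation is not in the transcript.

```latex
\begin{aligned}
\text{minimize}\quad   & c^T x + d \\
\text{subject to}\quad & G x \preceq h \\
                       & A x = b
\end{aligned}
```

Here the inequality is componentwise, so each row of G x <= h is one affine constraint.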


LP Visualization

Note: the constraints form the feasible set; for an LP, it is a polyhedron


Quadratic Program

• QP: quadratic objective, affine constraints
• LP is a special case
• Many SVM problems result in a QP, as does regression
• If the constraint functions are quadratic, then Quadratically Constrained Quadratic Program (QCQP)
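A sketch of the two problem classes, using the notation of Boyd and Vandenberghe as an assumption (P, q, r, G, h, A, b are not defined on the slide):

```latex
\begin{aligned}
\textbf{QP:}\quad   & \text{minimize}\ \ \tfrac{1}{2} x^T P x + q^T x + r
                      \quad \text{subject to}\ \ G x \preceq h,\ \ A x = b,
                      \qquad P \succeq 0 \\
\textbf{QCQP:}\quad & \text{minimize}\ \ \tfrac{1}{2} x^T P_0 x + q_0^T x + r_0 \\
                    & \text{subject to}\ \ \tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0,
                      \quad i = 1, \dots, m,\qquad A x = b,
                      \qquad P_0, \dots, P_m \succeq 0
\end{aligned}
```

Setting P = 0 in the QP recovers an LP, and setting P_1 = … = P_m = 0 in the QCQP recovers a QP.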


QP Visualization


Second Order Cone Program

• A_i = 0: results in LP
• c_i = 0: results in QCQP
• Constraint requires the affine functions to lie in the second-order cone
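The SOCP formula itself is missing from the transcript. Assuming the form in Boyd and Vandenberghe, which matches the A_i and c_i named above (f, b_i, d_i are additional symbols from that text), it reads:

```latex
\begin{aligned}
\text{minimize}\quad   & f^T x \\
\text{subject to}\quad & \|A_i x + b_i\|_2 \le c_i^T x + d_i, \qquad i = 1, \dots, m
\end{aligned}
\qquad\qquad
\text{second-order cone: } \{(u, s) : \|u\|_2 \le s\}
```

With A_i = 0 each constraint becomes the affine inequality c_i^T x + d_i >= ||b_i||_2, which gives an LP. With c_i = 0 each constraint can be squared to ||A_i x + b_i||_2^2 <= d_i^2, which gives a QCQP.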


Second Order Cone (Boundary) in R^3


Semidefinite Programming

• Linear matrix inequality (LMI) constraints
• Many problems can be expressed using LMIs
  - e.g. LP and SOCP
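A sketch of the SDP form with one LMI constraint, following Boyd and Vandenberghe (the symmetric matrices F_1, …, F_p, G and the data c, A, b are assumptions taken from that text):

```latex
\begin{aligned}
\text{minimize}\quad   & c^T x \\
\text{subject to}\quad & x_1 F_1 + x_2 F_2 + \dots + x_p F_p + G \preceq 0 \\
                       & A x = b
\end{aligned}
```

Here the inequality means the matrix on the left is negative semidefinite. If F_1, …, F_p and G are all diagonal, the LMI reduces to a set of ordinary affine inequalities, i.e. an LP.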


Semidefinite Programming


Building Convex Functions

• From simple convex functions to complex: some operations that preserve convexity
  - Nonnegative weighted sum
  - Composition with affine function
  - Pointwise maximum and supremum
  - Composition
  - Minimization
  - Perspective ( g(x,t) = t f(x/t) )


Verifying Convexity Remarks

• For more detail and expansion, consult the referenced text, Convex Optimization
• Geometric programs are also convex (after a change of variables) and can be handled with a series of SDPs (details skipped here)
• CVX converts the problem to either an SOCP or an SDP (or a series of these) and uses an efficient solver


Lagrangian

• Start from the standard form of the problem
• Form the Lagrangian L
• lambda, nu: Lagrange multipliers (dual variables)
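The two formulas are missing from the transcript. A sketch following Boyd and Vandenberghe, where the equality constraint functions h_j and their count k are assumed names:

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \qquad i = 1, \dots, m \\
                       & h_j(x) = 0,  \qquad j = 1, \dots, k
\end{aligned}
\qquad\qquad
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{k} \nu_j h_j(x)
```

Each inequality constraint gets a multiplier lambda_i and each equality constraint gets a multiplier nu_j.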


Lagrange Dual Function

• Lagrange dual found by minimizing L with respect to the primal variables
  - Often can take the gradient of L w.r.t. the primal variables and set it to 0 (SVM)


Lagrange Dual Function

• Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (lambda, nu)
• Thus, g is concave even if the problem is not convex


Lagrange Dual Function

• The Lagrange dual provides a lower bound on the objective value at the solution
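The lower bound can be derived in a few lines. This sketch assumes the standard form with inequality constraints f_i(x) <= 0 and equality constraints h_j(x) = 0 (h_j, k, and the feasible point x-tilde are names assumed here), and writes p^* for the optimal objective value.

```latex
\begin{aligned}
g(\lambda, \nu) &= \inf_x \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{k} \nu_j h_j(x) \Big) \\
&\le f_0(\tilde{x}) + \sum_{i=1}^{m} \lambda_i f_i(\tilde{x}) + \sum_{j=1}^{k} \nu_j h_j(\tilde{x})
  && \text{for any feasible } \tilde{x} \\
&\le f_0(\tilde{x})
  && \text{since } \lambda_i \ge 0,\ f_i(\tilde{x}) \le 0,\ h_j(\tilde{x}) = 0
\end{aligned}
```

Taking the infimum over all feasible x-tilde gives g(lambda, nu) <= p^* for every lambda >= 0.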


Lagrangian as Linear Approximation, Lower Bound

• Simple interpretation of the Lagrangian
• Can incorporate the constraints into the objective as indicator functions
  - Infinity if violated, 0 otherwise
• In the Lagrangian we use a “soft” linear approximation to the indicator functions; it is an under-estimator, since lambda_i * u <= I(u) for every u when lambda_i >= 0, where I(u) is 0 for u <= 0 and infinity otherwise


Lagrange Dual Problem

• Why not make the lower bound the best possible?
• Dual problem: maximize g(lambda, nu) subject to lambda >= 0, with optimal value d*
• Always a convex optimization problem (even when the primal is non-convex)
• Weak duality: d* <= p* (have already seen this)


Strong Duality

• If d* = p*, strong duality holds
• Does not hold in general
• Slater's Theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved, refer to text)
• => For convex problems, can use the dual problem to find the solution


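Complementary Slackness

• When strong duality holds, f0(x*) = g(lambda*, nu*), and expanding g gives a chain of inequalities:
  - the first step is the definition of g
  - the last step holds since the constraints are satisfied at x*
• The chain is sandwiched between f0(x*) on both ends, so the last 2 inequalities are equalities, simple!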
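The chain of inequalities itself is not in the transcript. A reconstruction following Boyd and Vandenberghe, with equality constraints written as h_j(x) = 0 for j = 1, …, k (assumed names):

```latex
\begin{aligned}
f_0(x^*) &= g(\lambda^*, \nu^*) \\
&= \inf_x \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i^* f_i(x) + \sum_{j=1}^{k} \nu_j^* h_j(x) \Big)
  && \text{(definition)} \\
&\le f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{j=1}^{k} \nu_j^* h_j(x^*) \\
&\le f_0(x^*)
  && \text{(since constraints satisfied at } x^*\text{)}
\end{aligned}
```

Because the first and last expressions are equal, both inequalities hold with equality, which forces the sum of lambda_i^* f_i(x^*) to be zero.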


Complementary Slackness

• Which means: sum_i lambda_i* f_i(x*) = 0
• Since each term is non-positive, we have complementary slackness: lambda_i* f_i(x*) = 0 for i = 1, …, m
• Whenever a constraint is non-active, the corresponding multiplier is zero


Complementary Slackness

• This can also be described by: lambda_i* > 0 => f_i(x*) = 0, or equivalently f_i(x*) < 0 => lambda_i* = 0
• Since there are usually only a few active constraints at the solution (see geometry), the dual variable lambda is often sparse
  - Note: in general no guarantee


Complementary Slackness

• As we will see, this is why support vector machines result in a solution with only key support vectors
  - These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the “active” points are kept


Complementary Slackness

• However, avoid common misconceptions when it comes to SVM and complementary slackness!
• E.g. if a Lagrange multiplier is 0, the constraint could still be active! (not a bijection!)
• This means: f_i(x*) < 0 => lambda_i* = 0, but lambda_i* = 0 does not imply f_i(x*) < 0


KKT Conditions

• The KKT conditions are then just what we call the set of conditions required at the solution (basically a list of what we know)
• KKT conditions play an important role
  - Can sometimes be used to find the solution analytically
  - Otherwise, can think of many methods as ways of solving the KKT conditions


KKT Conditions

• Again given strong duality and assuming differentiable functions: since x* minimizes L(x, lambda*, nu*) over x, the gradient of L with respect to x must be 0 at x*
• Thus, putting it all together, for non-convex problems we have


KKT Conditions: non-convex

• Necessary conditions
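The list of conditions is missing from the transcript. The numbering below follows Boyd and Vandenberghe, which appears to match the condition numbers cited on the next slide; the equality constraints h_j(x) = 0, j = 1, …, k, are assumed names.

```latex
\begin{aligned}
&1.\quad f_i(x^*) \le 0, && i = 1, \dots, m \\
&2.\quad h_j(x^*) = 0, && j = 1, \dots, k \\
&3.\quad \lambda_i^* \ge 0, && i = 1, \dots, m \\
&4.\quad \lambda_i^* f_i(x^*) = 0, && i = 1, \dots, m \\
&5.\quad \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + \sum_{j=1}^{k} \nu_j^* \nabla h_j(x^*) = 0
\end{aligned}
```

Conditions 1 and 2 are primal feasibility, 3 is dual feasibility, 4 is complementary slackness, and 5 is stationarity of the Lagrangian.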


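KKT Conditions: convex

• Also sufficient conditions (x~, lambda~, nu~ denote any points that satisfy the numbered KKT conditions):
  - 1+2 -> x~ is feasible
  - 3 -> L(x, lambda~, nu~) is convex in x
  - 5 -> x~ minimizes L(x, lambda~, nu~), so g(lambda~, nu~) = L(x~, lambda~, nu~)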
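A small worked example may help here. The sketch below checks the KKT conditions on a toy convex problem chosen for this illustration: minimize x^2 subject to 1 - x <= 0, whose solution is x* = 1 with multiplier lambda* = 2. The function names are hypothetical, and the dual function g(lambda) = lambda - lambda^2/4 was obtained by minimizing x^2 + lambda*(1 - x) over x by hand.

```python
def kkt_residuals(x, lam):
    """KKT residuals for: minimize x^2 subject to f1(x) = 1 - x <= 0."""
    f1 = 1.0 - x
    return {
        "primal": max(f1, 0.0),             # violation of f1(x) <= 0
        "dual": max(-lam, 0.0),             # violation of lam >= 0
        "comp_slack": abs(lam * f1),        # lam * f1(x) should be 0
        "stationarity": abs(2.0 * x - lam), # d/dx [x^2 + lam*(1 - x)] should be 0
    }

def dual_function(lam):
    """g(lam) = inf_x [x^2 + lam*(1 - x)], attained at x = lam / 2."""
    return lam - lam * lam / 4.0

x_star, lam_star = 1.0, 2.0
res = kkt_residuals(x_star, lam_star)        # every residual is zero
gap = x_star**2 - dual_function(lam_star)    # zero duality gap

# A feasible but non-optimal point fails stationarity.
res_bad = kkt_residuals(2.0, 0.0)
```

Since the problem is convex, zero residuals certify that (x_star, lam_star) is a primal-dual optimal pair, which the zero duality gap confirms.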


Brief description of interior point method

• Solve a series of equality constrained problems with Newton's method
• Approximate constraints with a log-barrier (approx. of indicator)
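The barrier problem is not shown in the transcript. A sketch following Boyd and Vandenberghe, where affine equality constraints A x = b and the barrier parameter t > 0 are assumptions from that text:

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) + \frac{1}{t} \sum_{i=1}^{m} -\log\big(-f_i(x)\big) \\
\text{subject to}\quad & A x = b
\end{aligned}
```

Each term -(1/t) log(-u) is a smooth stand-in for the indicator that is 0 for u <= 0 and infinity otherwise. It is finite only when f_i(x) < 0 and grows without bound as f_i(x) approaches 0 from below.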


Brief description of interior point method

• As t gets larger, the approximation becomes better
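The effect of increasing t can be seen on a toy problem. The sketch below applies a log-barrier to: minimize x^2 subject to x >= 1 (solution x* = 1). The problem, the function name `center`, the starting point, and the factor of 10 between stages are all assumptions chosen for this illustration, not a general-purpose solver.

```python
import math

def center(t, x, tol=1e-12, max_iter=100):
    """Damped Newton on phi(x) = t*x**2 - log(x - 1), starting from a point x > 1."""
    for _ in range(max_iter):
        grad = 2.0 * t * x - 1.0 / (x - 1.0)
        hess = 2.0 * t + 1.0 / (x - 1.0) ** 2
        dx = -grad / hess
        if abs(dx) < tol:
            break
        step = 1.0
        while x + step * dx <= 1.0:   # halve the step until strictly feasible
            step *= 0.5
        x += step * dx
    return x

path = []
x = 2.0                               # strictly feasible starting point
for k in range(7):
    t = 10.0 ** k
    x = center(t, x)                  # warm start from the previous stage
    path.append((t, x))               # points along the central path
```

For this problem the minimizer of each stage has the closed form x(t) = (1 + sqrt(1 + 2/t)) / 2, so the recorded points move toward x* = 1 as t grows, and the objective gap x(t)^2 - 1 stays below 1/t.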


Central Path Idea


CVX: Convex Optimization Made Easy

• CVX is a Matlab toolbox
  - Allows you to flexibly express convex optimization problems
  - Translates these to a general form and uses an efficient solver (SOCP, SDP, or a series of these)
  - http://www.stanford.edu/~boyd/cvx/
• All you have to do is design the convex optimization problem
  - Plug into CVX, and a first version of the algorithm is implemented
  - A more specialized solver may be necessary for some applications


CVX - Examples

• Quadratic program: given H, f, A, and b

    cvx_begin
        variable x(n)
        % H must be positive semidefinite for the objective to be convex
        minimize( x'*H*x + f'*x )
        subject to
            A*x >= b
    cvx_end


CVX - Examples

• SVM-type formulation with L1 norm

    cvx_begin
        variable w(p)
        variable b(1)
        variable e(n)
        expression by(n)
        by = train_label.*b;
        minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
        subject to
            X*w + by >= a - e;
            e >= ec;
    cvx_end


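CVX - Examples

• More complicated terms built with expressions

    % Note: the counter ct is not initialized in this snippet;
    % it is presumably set (e.g. to 1) before cvx_begin.
    cvx_begin
        variable w(p+1+n);
        expression q(ec);
        for i = 1:p
            for j = i:p
                if (A(i,j) == 1)
                    % one penalty term per pair (i,j) flagged in A
                    q(ct) = max(abs(w(i))/d(i), abs(w(j))/d(j));
                    ct = ct + 1;
                end
            end
        end
        minimize( f'*w + lambda*sum(q) )
        subject to
            X*w >= a;
    cvx_end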


Questions

• Questions, comments?


Extra proof