a ktec center of excellence 1 convex optimization: part 1 of chapter 7 discussion presenter: brian...

A KTEC Center of Excellence 1

Convex Optimization: Part 1 of

Chapter 7 Discussion

Presenter: Brian Quanz

About today’s

discussion…•Chapter 7 – no separate discussion

of convex optimization• Discusses with SVM problems

• Instead: • Today: Discuss convex optimization

• Next Week: Discuss some specific convex optimization

problems (from text), e.g. SVMs

About today’s

discussion…•Mostly follow alternate text:

• Convex Optimization, Stephen Boyd and Lieven

Vandenberghe

– Borrowed material from book and related course notes

– Some figures and equations shown here

• Available online: http://www.stanford.edu/~boyd/cvxbook/

• Nice course lecture videos available from Stephen Boyd

online: http://www.stanford.edu/class/ee364a/

• Corresponding convex optimization tool (discuss later) -

CVX: http://www.stanford.edu/~boyd/cvx/

Overview Why convex? What is convex?

Key examples of linear and quadratic programming

Key mathematical ideas to discuss: ->Lagrange Duality->KKT conditions

Brief concept of interior point methods

CVX – convex opt. made easy

Mathematical

Optimization• All learning is some optimization problem

-> Stick to canonical form

• x = (x1, x2, …, xp ) – opt. variables ; x*

• f0 : Rp -> R – objective function

• fi : Rp -> R – constraint function

Optimization Example•Well familiar with: regularized

regression• Least squares

• Add some constraints, ridge, lasso

Why convex

optimization?• Can’t solve most OPs

• E.g. NP Hard, even high polynomial time too slow

• Convex OPs• (Generally) No analytic solution

• Efficient algorithms to find (global) solution

• Interior point methods (basically Iterated Newton) can be

– ~[10-100]*max{p3 , p2m, F} ; F cost eval. obj. and constr. f

• At worst solve with general IP methods (CVX), faster

specialized

What is Convex

Optimization?•OP with convex objective and

constraint functions

• f0 , … , fm are convex = convex OP

that has an efficient solution!

Convex Function•Definition: the weighted mean of

function evaluated at any two points

is greater than or equal to the

function evaluated at the weighted

mean of the two points

Convex Function• What does definition mean?

• Pick any two points x, y and evaluate along the

function, f(x), f(y)

• Draw the line passing through the two points

f(x) and f(y)

• Convex if function evaluated on any point along

the line between x and y is below the line

between f(x) and f(y)

Convex Function

Convex!

Convex Function

Not Convex!!!

Convex Function• Easy to see why convexity allows

for efficient solution

• Just “slide” down the objective

function as far as possible and will

reach a minimum

Local Optima is Global (simple proof)

Convex vs. Non-convex

•Convex, min. easy to find

Affine – border case of convexity

Convex vs. Non-convex

• Non-convex, easy to get stuck in a local min.

• Can’t rely on only local search techniques

Non-convex• Some non-convex problems highly multi-

modal, or NP hard

• Could be forced to search all solutions, or

hope stochastic search is successful

• Cannot guarantee best solution, inefficient

• Harder to make performance guarantees

with approximate solutions

Determine/Prove

Convexity• Can use definition (prove holds) to prove• If function restricted to any line is convex, function is convex

• If 2X differentiable, show hessian >= 0

• Often easier to:• Convert to a known convex OP

– E.g. QP, LP, SOCP, SDP, often of a more general form

• Combine known convex functions (building blocks) using

operations that preserve convexity

– Similar idea to building kernels

Some common convex

OPs•Of particular interest for this book

and chapter: • linear programming (LP) and quadratic programming

• LP: affine objective function, affine

constraints

-e.g. LP SVM, portfolio management

LP VisualizationNote: constraints form feasible set-for LP, polyhedra

Quadratic Program• QP: Quadratic objective, affine constraints

• LP is special case

• Many SVM problems result in QP, regression

• If constraint functions quadratic, then

Quadratically Constrained Quadratic Program

(QCQP)

QP Visualization

Second Order Cone

Program

• Ai = 0 - results in LP

• ci = 0 - results in QCQP

•Constraint requires the affine functions

to lie in 2nd order

Second Order Cone (Boundary)

Semidefinite

Programming

• Linear matrix inequality (LMI)

constraints

• Many problems can be expressed using

• LP and SOCP

Semidefinite

Programming

Building Convex

Functions• From simple convex functions to

complex: some operations that

preserve complexity• Nonnegative weighted sum

• Composition with affine function

• Pointwise maximum and supremum

• Composition

• Minimization

• Perspective ( g(x,t) = tf(x/t) )

Verifying Convexity

Remarks• For more detail and expansion, consult the

referenced text, Convex Optimization

• Geometric Programs also convex, can be

handled with a series of SDPs (skipped

details here)

• CVX converts the problem either to SOCP

or SDM (or a series of) and uses efficient

solver

Lagrangian• Standard form:

• Lagrangian L:

• Lambda, nu, Lagrange multipliers (dual variables)

Lagrange Dual Function

• Lagrange Dual found by minimizing

L with respect to primal variables • Often can take gradient of L w.r.t. primal var.’s and

set = 0 (SVM)

Lagrange Dual Function•Note: Lagrange dual function is the

point-wise infimum of family of

affine functions of (lambda, nu)

• Thus, g is concave even if problem

is not convex

Lagrange Dual Function• Lagrange Dual provides lower

bound on objective value at solution

Lagrangian as Linear Approximation,

Lower Bound

• Simple interpretation of Lagrangian

• Can incorporate the constraints into

objective as indicator functions • Infinity if violated, 0 otherwise:

• In Lagrangian we use a “soft” linear

approximation to the indicator functions; under-

estimator since

Lagrange Dual Problem• Why not make the lower bound best possible?

• Dual problem:

• Always convex opt. problem (even when

primal is non-convex)

• Weak Duality: d* <= p* (have already

seen this)

Strong Duality• If d* = p*, strong duality holds

• Does not hold in general

• Slater’s Theorem: If convex problem,

and strictly feasible point exists, then

strong duality holds! (proof too involved, refer to text)

• => For convex problems, can use dual problem to find

solution

Complementary

Slackness• When strong duality holds

• Sandwiched between f0(x), last 2

inequalities are equalities, simple!

(definition)

(since constraints satisfied at x*)

Complementary

Slackness• Which means:

• Since each term is non-positive, we have

complementary slackness:

• Whenever constraint is non-active,

corresponding multiplier is zero

Complementary

Slackness• This can also be described by

• Since usually only a few active

constraints at solution (see geometry),

the dual variable lambda is often sparse • Note: In general no guarantee

Complementary

Slackness• As we will see, this is why support vector

machines result in solution with only key

support vectors• These come from the dual problem, constraints correspond to points,

and complementary slackness ensures only the “active” points are

Complementary

Slackness• However, avoid common misconceptions

when it comes to SVM and complementary

slackness!

• E.g. if Lagrange multiplier is 0, constraint

could still be active! (not bijection!)

• This means:

KKT Conditions • The KKT conditions are then just what

we call that set of conditions required

at the solution (basically list what we

• KKT conditions play important role• Can sometimes be used to find solution analytically

• Otherwise can think of many methods as ways of solving

KKT conditions

KKT Conditions• Again given strong duality and assuming

differentiable, since

gradient must be 0 at x*

• Thus, putting it all together, for non-

convex problems we have

KKT Conditions – non-

convex

• Necessary conditions

KKT Conditions – convex

• Also sufficient

conditions: • 1+2 -> xt is feasible.

• 3 -> L(x,lt,nt) is convex

• 5 -> xt minimizes

L(x,lt,nt) so g(lt,nt) =

L(xt,lt,nt)

Brief description of interior point

method• Solve a series of equality constrained

problems with Newton’s method

• Approximate constraints with log-barrier

(approx. of indicator)

Brief description of interior point

method

• As t gets larger, approximation becomes better

Central Path Idea

CVX: Convex Optimization Made

Easy• CVX is a Matlab toolbox• Allows you to flexibly express convex optimization problems

• Translates these to a general form and uses efficient solver

(SOCP, SDP, or a series of these)

• http://www.stanford.edu/~boyd/cvx/

• All you have to do is design the convex

optimization problem• Plug into CVX, a first version of algorithm implemented

• More specialized solver may be necessary for some

applications

CVX - Examples•Quadratic program: given H, f, A,

and b• cvx_begin

variable x(n)

minimize (x’*H*x + f’*x)

subject to

A*x >= b

cvx_end

CVX - Examples• SVM-type formulation with L1 norm

• cvx_begin variable w(p)

variable b(1) variable e(n) expression by(n) by = train_label.*b; minimize( w'*(L + I)*w + C*sum(e) +

l1_lambda*norm(w,1) ) subject to

X*w + by >= a - e;e >= ec;

cvx_end

CVX - Examples• More complicated terms built with expressions

• cvx_begin variable w(p+1+n); expression q(ec); for i =1:p for j =i:p if(A(i,j) == 1) q(ct) = max(abs(w(i))/d(i),abs(w(j))/d(j)); ct=ct+1; end end end minimize( f'*w + lambda*sum(q) ) subject to X*w >= a;cvx_end

Questions•Questions, Comments?

Extra proof

a ktec center of excellence 1 convex optimization: part 1 of chapter 7 discussion presenter: brian...

convex function convex

convex function slide

ktec center of excellence

convex objective

convex function easy

convex function definition

convex optimization

slow convex ops

Documents

convex hulls - tulane university• but convex hulls are...

geometric algorithms joão comba. example: convex hull...

convex / convex pulse - ceaweld

1 theory of convex functions - princeton...

convex hull smallest convex set containing all the points

polygons can be concave or convex convex concave

a ktec center of excellence 1 basis expansions and...

convex and non-convex worlds in machine...

+ convex functions, convex sets and quadratic programs...

introduction, convex...

lecture 3: a brief overview on convex analysis and...

practical session on convex optimization: convex...

lecture notes 7: convex optimization - nyu...

lecture 4 convex programming and lagrange duality...lecture...

a ktec center of excellence 1 pattern analysis using convex...

convex optimization - mark schmidt - cmpt...

ects.ogu.edu.tr°statistik doktora... · web...

regularized bundle methods for convex and non-convex risks

der 30u30-wettbewerb. neumann-quanz: beproud

polygons - concave or convexpolygons - concave & convex...