MOSEK Modeling Cookbook
Release 3.2.2
MOSEK ApS
18 March 2021
Contents

1 Preface
2 Linear optimization
  2.1 Introduction
  2.2 Linear modeling
  2.3 Infeasibility in linear optimization
  2.4 Duality in linear optimization
3 Conic quadratic optimization
  3.1 Cones
  3.2 Conic quadratic modeling
  3.3 Conic quadratic case studies
4 The power cone
  4.1 The power cone(s)
  4.2 Sets representable using the power cone
  4.3 Power cone case studies
5 Exponential cone optimization
  5.1 Exponential cone
  5.2 Modeling with the exponential cone
  5.3 Geometric programming
  5.4 Exponential cone case studies
6 Semidefinite optimization
  6.1 Introduction to semidefinite matrices
  6.2 Semidefinite modeling
  6.3 Semidefinite optimization case studies
7 Practical optimization
  7.1 Conic reformulations
  7.2 Avoiding ill-posed problems
  7.3 Scaling
  7.4 The huge and the tiny
  7.5 Semidefinite variables
  7.6 The quality of a solution
  7.7 Distance to a cone
8 Duality in conic optimization
  8.1 Dual cone
  8.2 Infeasibility in conic optimization
  8.3 Lagrangian and the dual problem
  8.4 Weak and strong duality
  8.5 Applications of conic duality
  8.6 Semidefinite duality and LMIs
9 Mixed integer optimization
  9.1 Integer modeling
  9.2 Mixed integer conic case studies
10 Quadratic optimization
  10.1 Quadratic objective
  10.2 Quadratically constrained optimization
  10.3 Example: Factor model
11 Bibliographic notes
12 Notation and definitions
Bibliography
Index
Chapter 1
Preface
This cookbook is about model building using convex optimization. It is intended as a modeling guide for the MOSEK optimization package. However, the style is intentionally quite generic without specific MOSEK commands or API descriptions.
There are several excellent books available on this topic, for example the books by Ben-Tal and Nemirovski [BenTalN01] and Boyd and Vandenberghe [BV04], which have both been a great source of inspiration for this manual. The purpose of this manual is to collect the material which we consider most relevant to our users and to present it in a practical self-contained manner; however, we highly recommend the books as a supplement to this manual.
Some textbooks on building models using optimization (or mathematical programming) introduce various concepts through practical examples. In this manual we have chosen a different route, where we instead show the different sets and functions that can be modeled using convex optimization, which can subsequently be combined into realistic examples and applications. In other words, we present simple convex building blocks, which can then be combined into more elaborate convex models. We call this approach extremely disciplined modeling. With the advent of more expressive and sophisticated tools like conic optimization, we feel that this approach is better suited.
Content
We begin with a comprehensive chapter on linear optimization, including modeling examples, duality theory and infeasibility certificates for linear problems. Linear problems are optimization problems of the form
minimize ๐๐๐ฅsubject to ๐ด๐ฅ = ๐,
๐ฅ โฅ 0.
Conic optimization is a generalization of linear optimization which handles problems of the form:
minimize ๐๐๐ฅsubject to ๐ด๐ฅ = ๐,
๐ฅ โ ๐พ,
where ๐พ is a convex cone. Various families of convex cones allow formulating different typesof nonlinear constraints. The following chapters present modeling with four types of convexcones:
โข quadratic cones ,
โข power cone,
โข exponential cone,
โข semidefinite cone.
It is โwell-knownโ in the convex optimization community that this family of cones issufficient to express almost all convex optimization problems appearing in practice.
Next we discuss issues arising in practical optimization, and we wholeheartedly recommend this short chapter to all readers before moving on to implementing mathematical models with real data.
Following that, we present a general duality and infeasibility theory for conic problems. Finally we diverge slightly from the topic of conic optimization and introduce the language of mixed-integer optimization and we discuss the relation between convex quadratic optimization and conic quadratic optimization.
Chapter 2
Linear optimization
In this chapter we discuss various aspects of linear optimization. We first introduce the basic concepts of linear optimization and discuss the underlying geometric interpretations. We then give examples of the most frequently used reformulations or modeling tricks used in linear optimization, and finally we discuss duality and infeasibility theory in some detail.
2.1 Introduction
2.1.1 Basic notions
The most basic type of optimization is linear optimization. In linear optimization we minimize a linear function given a set of linear constraints. For example, we may wish to minimize a linear function
๐ฅ1 + 2๐ฅ2 โ ๐ฅ3
under the constraints that
๐ฅ1 + ๐ฅ2 + ๐ฅ3 = 1, ๐ฅ1, ๐ฅ2, ๐ฅ3 โฅ 0.
The function we minimize is often called the objective function; in this case we have a linear objective function. The constraints are also linear and consist of both linear equalities and inequalities. We typically use more compact notation
minimize ๐ฅ1 + 2๐ฅ2 โ ๐ฅ3
subject to ๐ฅ1 + ๐ฅ2 + ๐ฅ3 = 1,๐ฅ1, ๐ฅ2, ๐ฅ3 โฅ 0,
(2.1)
and we call (2.1) a linear optimization problem. The domain where all constraints are satisfied is called the feasible set; the feasible set for (2.1) is shown in Fig. 2.1.
For this simple problem we see by inspection that the optimal value of the problem is −1, obtained by the optimal solution
(๐ฅโ1, ๐ฅ
โ2, ๐ฅ
โ3) = (0, 0, 1).
Fig. 2.1: Feasible set for 𝑥1 + 𝑥2 + 𝑥3 = 1 and 𝑥1, 𝑥2, 𝑥3 ≥ 0.
Linear optimization problems are typically formulated using matrix notation. The standard form of a linear minimization problem is:
minimize ๐๐๐ฅsubject to ๐ด๐ฅ = ๐,
๐ฅ โฅ 0.(2.2)
For example, we can pose (2.1) in this form with
๐ฅ =
โกโฃ ๐ฅ1
๐ฅ2
๐ฅ3
โคโฆ , ๐ =
โกโฃ 12
โ1
โคโฆ , ๐ด =[๏ธ
1 1 1]๏ธ.
There are many other formulations for linear optimization problems; we can have different types of constraints,
๐ด๐ฅ = ๐, ๐ด๐ฅ โฅ ๐, ๐ด๐ฅ โค ๐, ๐๐ โค ๐ด๐ฅ โค ๐ข๐,
and different bounds on the variables
๐๐ฅ โค ๐ฅ โค ๐ข๐ฅ
or we may have no bounds on some 𝑥𝑖, in which case we say that 𝑥𝑖 is a free variable. All these formulations are equivalent in the sense that by simple linear transformations and introduction of auxiliary variables they represent the same set of problems. The important feature is that the objective function and the constraints are all linear in 𝑥.
2.1.2 Geometry of linear optimization
A hyperplane is a subset of R𝑛 defined as {𝑥 | 𝑎ᵀ(𝑥 − 𝑥0) = 0} or equivalently {𝑥 | 𝑎ᵀ𝑥 = 𝛾} with 𝑎ᵀ𝑥0 = 𝛾, see Fig. 2.2.
Fig. 2.2: The dashed line illustrates a hyperplane {𝑥 | 𝑎ᵀ𝑥 = 𝛾}.
Thus a linear constraint
๐ด๐ฅ = ๐
with ๐ด โ R๐ร๐ represents an intersection of ๐ hyperplanes.Next, consider a point ๐ฅ above the hyperplane in Fig. 2.3. Since ๐ฅ โ ๐ฅ0 forms an acute
angle with ๐ we have that ๐๐ (๐ฅ โ ๐ฅ0) โฅ 0, or ๐๐๐ฅ โฅ ๐พ. The set {๐ฅ | ๐๐๐ฅ โฅ ๐พ} is called ahalfspace. Similarly the set {๐ฅ | ๐๐๐ฅ โค ๐พ} forms another halfspace; in Fig. 2.3 it correspondsto the area below the dashed line.
Fig. 2.3: The grey area is the halfspace {𝑥 | 𝑎ᵀ𝑥 ≥ 𝛾}.
A set of linear inequalities
๐ด๐ฅ โค ๐
corresponds to an intersection of halfspaces and forms a polyhedron, see Fig. 2.4.

The polyhedral description of the feasible set gives us a very intuitive interpretation of linear optimization, which is illustrated in Fig. 2.5. The dashed lines are normal to the objective 𝑐 = (−1, 1), and to minimize 𝑐ᵀ𝑥 we move as far as possible in the opposite direction of 𝑐, to the furthest position where a dashed line intersects the polyhedron; an optimal solution is therefore always attained at a vertex of the polyhedron, and an entire facet of the polyhedron may be optimal.
Fig. 2.4: A polyhedron formed as an intersection of halfspaces.
xโ
a1
a2a3
a4
a5
c
Fig. 2.5: Geometric interpretation of linear optimization. The optimal solution ๐ฅโ is at apoint where the normals to ๐ (the dashed lines) intersect the polyhedron.
The polyhedron shown in the figure is nonempty and bounded, but this is not always the case for polyhedra arising from linear inequalities in optimization problems. In such cases the optimization problem may be infeasible or unbounded, which we will discuss in detail in Sec. 2.3.
2.2 Linear modeling
In this section we present useful reformulation techniques and standard tricks which allow constructing more complicated models using linear optimization. It is also a guide through the types of constraints which can be expressed using linear (in)equalities.
2.2.1 Maximum
The inequality ๐ก โฅ max{๐ฅ1, . . . , ๐ฅ๐} is equivalent to a simultaneous sequence of ๐ inequalities
๐ก โฅ ๐ฅ๐, ๐ = 1, . . . , ๐
and similarly ๐ก โค min{๐ฅ1, . . . , ๐ฅ๐} is the same as
๐ก โค ๐ฅ๐, ๐ = 1, . . . , ๐.
Of course the same reformulation applies if each 𝑥𝑖 is not a single variable but a linear expression. In particular, we can consider convex piecewise-linear functions 𝑓 : R𝑛 ↦→ R defined as the maximum of affine functions (see Fig. 2.6):

𝑓(𝑥) := max𝑖=1,...,𝑚 {𝑎𝑖ᵀ𝑥 + 𝑏𝑖}.
Fig. 2.6: A convex piecewise-linear function (solid lines) of a single variable 𝑥. The function is defined as the maximum of 3 affine functions.
The epigraph ๐(๐ฅ) โค ๐ก (see Sec. 12) has an equivalent formulation with ๐ inequalities:
๐๐๐ ๐ฅ + ๐๐ โค ๐ก, ๐ = 1, . . . ,๐.
Piecewise-linear functions have many uses in linear optimization; either we have a convex piecewise-linear formulation from the onset, or we may approximate a more complicated (nonlinear) problem using piecewise-linear approximations, although with modern nonlinear optimization software it is becoming both easier and more efficient to directly formulate and solve nonlinear problems without piecewise-linear approximations.
2.2.2 Absolute value
The absolute value of a scalar variable is a special case of maximum
|๐ฅ| := max{๐ฅ,โ๐ฅ},
so we can model the epigraph |๐ฅ| โค ๐ก using two inequalities
โ๐ก โค ๐ฅ โค ๐ก.
2.2.3 The โ1 norm
All norms are convex functions, but the ℓ1 and ℓ∞ norms are of particular interest for linear optimization. The ℓ1 norm of a vector 𝑥 ∈ R𝑛 is defined as
โ๐ฅโ1 := |๐ฅ1| + |๐ฅ2| + ยท ยท ยท + |๐ฅ๐|.
To model the epigraph
โ๐ฅโ1 โค ๐ก, (2.3)
we introduce the following system
|๐ฅ๐| โค ๐ง๐, ๐ = 1, . . . , ๐,๐โ๏ธ
๐=1
๐ง๐ = ๐ก, (2.4)
with additional (auxiliary) variable 𝑧 ∈ R𝑛. Clearly (2.3) and (2.4) are equivalent, in the sense that they have the same projection onto the space of 𝑥 and 𝑡 variables. Therefore, we can model (2.3) using linear (in)equalities
โ๐ง๐ โค ๐ฅ๐ โค ๐ง๐,๐โ๏ธ
๐=1
๐ง๐ = ๐ก, (2.5)
with auxiliary variables 𝑧. Similarly, we can describe the epigraph of the norm of an affine function of 𝑥,

‖𝐴𝑥 − 𝑏‖1 ≤ 𝑡

as

−𝑧𝑖 ≤ 𝑎𝑖ᵀ𝑥 − 𝑏𝑖 ≤ 𝑧𝑖, 𝑖 = 1, . . . , 𝑛,    𝑧1 + · · · + 𝑧𝑛 = 𝑡,

where 𝑎𝑖 is the 𝑖-th row of 𝐴 (taken as a column-vector).
Example 2.1 (Basis pursuit). The ℓ1 norm is overwhelmingly popular as a convex approximation of the cardinality (i.e., number of nonzero elements) of a vector 𝑥. For example, suppose we are given an underdetermined linear system
๐ด๐ฅ = ๐
where ๐ด โ R๐ร๐ and ๐ โช ๐. The basis pursuit problem
minimize โ๐ฅโ1subject to ๐ด๐ฅ = ๐,
(2.6)
uses the ℓ1 norm of 𝑥 as a heuristic for finding a sparse solution (one with many zero elements) to 𝐴𝑥 = 𝑏, i.e., it aims to represent 𝑏 as a linear combination of few columns of 𝐴. Using (2.5) we can pose the problem as a linear optimization problem,
minimize ๐๐ ๐งsubject to โ๐ง โค ๐ฅ โค ๐ง,
๐ด๐ฅ = ๐,(2.7)
where ๐ = (1, . . . , 1)๐ .
2.2.4 The โโ norm
The โโ norm of a vector ๐ฅ โ R๐ is defined as
โ๐ฅโโ := max๐=1,...,๐
|๐ฅ๐|,
which is another example of a simple piecewise-linear function. Using Sec. 2.2.2 we model
โ๐ฅโโ โค ๐ก
as
โ๐ก โค ๐ฅ๐ โค ๐ก, ๐ = 1, . . . , ๐.
Again, we can also consider affine functions of 𝑥, i.e.,

‖𝐴𝑥 − 𝑏‖∞ ≤ 𝑡,

which can be described as

−𝑡 ≤ 𝑎𝑖ᵀ𝑥 − 𝑏𝑖 ≤ 𝑡, 𝑖 = 1, . . . , 𝑛.
Example 2.2 (Dual norms). It is interesting to note that the ℓ1 and ℓ∞ norms are dual. For any norm ‖ · ‖ on R𝑛, the dual norm ‖ · ‖* is defined as
โ๐ฅโ* = max{๐ฅ๐๐ฃ | โ๐ฃโ โค 1}.
Let us verify that the dual of the โโ norm is the โ1 norm. Consider
โ๐ฅโ*,โ = max{๐ฅ๐๐ฃ | โ๐ฃโโ โค 1}.
Obviously the maximum is attained for
๐ฃ๐ =
{๏ธ+1, ๐ฅ๐ โฅ 0,โ1, ๐ฅ๐ < 0,
i.e., โ๐ฅโ*,โ =โ๏ธ
๐ |๐ฅ๐| = โ๐ฅโ1. Similarly, consider the dual of the โ1 norm,
โ๐ฅโ*,1 = max{๐ฅ๐๐ฃ | โ๐ฃโ1 โค 1}.
To maximize 𝑥ᵀ𝑣 subject to |𝑣1| + · · · + |𝑣𝑛| ≤ 1 we simply pick the element of 𝑥 with largest absolute value, say |𝑥𝑘|, and set 𝑣𝑘 = ±1, so that ‖𝑥‖*,1 = |𝑥𝑘| = ‖𝑥‖∞. This illustrates a more general property of dual norms, namely that ‖𝑥‖** = ‖𝑥‖.
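The dual-norm computation in this example is itself a linear program over a box, so it can be checked numerically; a small sketch with SciPy's `linprog` (an assumption, any LP solver works):

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([3.0, -1.0, 0.5, -2.5])

# ||x||_{*,inf} = max { x^T v : -1 <= v_i <= 1 }, a box-constrained LP
res = linprog(c=-x, bounds=[(-1, 1)] * len(x), method="highs")
dual_norm = -res.fun

print(dual_norm)         # 7.0
print(np.abs(x).sum())   # 7.0, i.e. ||x||_1, as the example predicts
```

The optimal 𝑣 is the sign pattern of 𝑥, exactly as argued above.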
2.2.5 Homogenization
Consider the linear-fractional problem
minimize ๐๐ ๐ฅ+๐๐๐ ๐ฅ+๐
subject to ๐๐๐ฅ + ๐ > 0,๐น๐ฅ = ๐.
(2.8)
Perhaps surprisingly, it can be turned into a linear problem if we homogenize the linear constraint, i.e. replace it with 𝐹𝑦 = 𝑔𝑧 for a single variable 𝑧 ∈ R. The full new optimization problem is
minimize ๐๐๐ฆ + ๐๐งsubject to ๐๐๐ฆ + ๐๐ง = 1,
๐น๐ฆ = ๐๐ง,๐ง โฅ 0.
(2.9)
If ๐ฅ is a feasible point in (2.8) then ๐ง = (๐๐๐ฅ + ๐)โ1, ๐ฆ = ๐ฅ๐ง is feasible for (2.9) with thesame objective value. Conversely, if (๐ฆ, ๐ง) is feasible for (2.9) then ๐ฅ = ๐ฆ/๐ง is feasible in (2.8)and has the same objective value, at least when ๐ง ฬธ= 0. If ๐ง = 0 and ๐ฅ is any feasible pointfor (2.8) then ๐ฅ + ๐ก๐ฆ, ๐ก โ +โ is a sequence of solutions of (2.8) converging to the value of(2.9). We leave it for the reader to check those statements. In either case we showed anequivalence between the two problems.
Note that, as the sketch of proof above suggests, the optimal value in (2.8) may not be attained, even though the one in the linear problem (2.9) always is. For example, consider a pair of problems constructed as above:
minimize ๐ฅ1/๐ฅ2
subject to ๐ฅ2 > 0,๐ฅ1 + ๐ฅ2 = 1.
minimize ๐ฆ1subject to ๐ฆ1 + ๐ฆ2 = ๐ง,
๐ฆ2 = 1,๐ง โฅ 0.
Both have an optimal value of โ1, but on the left we can only approach it arbitrarily closely.
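The pair can be checked numerically; below is a sketch (SciPy's `linprog` assumed) solving the homogenized problem and showing the fractional objective approaching, but never reaching, the same value:

```python
from scipy.optimize import linprog

# homogenized problem: minimize y1  s.t.  y1 + y2 = z,  y2 = 1,  z >= 0
# variables (y1, y2, z) with y1, y2 free
res = linprog(c=[1.0, 0.0, 0.0],
              A_eq=[[1.0, 1.0, -1.0], [0.0, 1.0, 0.0]], b_eq=[0.0, 1.0],
              bounds=[(None, None), (None, None), (0, None)], method="highs")
print(res.fun)  # -1.0, attained at z = 0

# the fractional objective x1/x2 with x1 + x2 = 1 only approaches -1
for x2 in [10.0, 100.0, 1000.0]:
    print((1.0 - x2) / x2)  # -0.9, -0.99, -0.999
```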
2.2.6 Sum of largest elements
Suppose ๐ฅ โ R๐ and that ๐ is a positive integer. Consider the problem
minimize ๐๐ก +โ๏ธ
๐ ๐ข๐
subject to ๐ข๐ + ๐ก โฅ ๐ฅ๐, ๐ = 1, . . . , ๐,๐ข๐ โฅ 0, ๐ = 1, . . . , ๐,
(2.10)
with new variables 𝑡 ∈ R, 𝑢 ∈ R𝑛. It is easy to see that fixing a value for 𝑡 determines the rest of the solution. For the sake of simplifying notation let us assume for a moment that 𝑥 is sorted:
๐ฅ1 โฅ ๐ฅ2 โฅ ยท ยท ยท โฅ ๐ฅ๐.
If ๐ก โ [๐ฅ๐, ๐ฅ๐+1) then ๐ข๐ = 0 for ๐ โฅ ๐ + 1 and ๐ข๐ = ๐ฅ๐ โ ๐ก for ๐ โค ๐ in the optimal solution.Therefore, the objective value under the assumption ๐ก โ [๐ฅ๐, ๐ฅ๐+1) is
obj๐ก = ๐ฅ1 + ยท ยท ยท + ๐ฅ๐ + ๐ก(๐โ ๐)
which is a linear function of 𝑡, minimized at one of the endpoints of [𝑥𝑖+1, 𝑥𝑖]. Now we can compute

obj𝑥𝑖+1 − obj𝑥𝑖 = (𝑖 − 𝑚)(𝑥𝑖 − 𝑥𝑖+1).

It follows that obj𝑥𝑖 has a minimum for 𝑖 = 𝑚, and therefore the optimal value of (2.10) is simply

𝑥1 + · · · + 𝑥𝑚.
Since the assumption that 𝑥 is sorted was only a notational convenience, we conclude that in general the optimization model (2.10) computes the sum of 𝑚 largest entries in 𝑥. In Sec. 2.4 we will show a conceptual way of deriving this model.
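A numerical sanity check of this claim, assuming SciPy's `linprog` is available (not part of the text): solve (2.10) for random data and compare with a direct sort.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 10, 3
x = rng.standard_normal(n)

# variables (t, u): minimize m*t + sum(u)  s.t.  -t - u_i <= -x_i,  u >= 0
res = linprog(c=np.r_[[float(m)], np.ones(n)],
              A_ub=np.hstack([-np.ones((n, 1)), -np.eye(n)]), b_ub=-x,
              bounds=[(None, None)] + [(0, None)] * n, method="highs")

print(res.fun)                # optimal value of (2.10)
print(np.sort(x)[-m:].sum())  # sum of the m largest entries: same number
```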
2.3 Infeasibility in linear optimization
In this section we discuss the basic theory of primal infeasibility certificates for linear problems. These ideas will be developed further after we have introduced duality in the next section.
One of the first issues one faces when presented with an optimization problem is whether it has any solutions at all. As we discussed previously, for a linear optimization problem

minimize    𝑐ᵀ𝑥
subject to  𝐴𝑥 = 𝑏,
            𝑥 ≥ 0,        (2.11)
the feasible set
โฑ๐ = {๐ฅ โ R๐ | ๐ด๐ฅ = ๐, ๐ฅ โฅ 0}
is a convex polytope. We say the problem is feasible if ℱ𝑝 ≠ ∅ and infeasible otherwise.
Example 2.3 (Linear infeasible problem). Consider the optimization problem:
minimize 2๐ฅ1 + 3๐ฅ2 โ ๐ฅ3
subject to ๐ฅ1 + ๐ฅ2 + 2๐ฅ3 = 1,โ 2๐ฅ1 โ ๐ฅ2 + ๐ฅ3 = โ0.5,โ ๐ฅ1 + 5๐ฅ3 = โ0.1,
๐ฅ๐ โฅ 0.
This problem is infeasible. We see it by taking a linear combination of the constraints with coefficients 𝑦 = (1, 2, −1)ᵀ:
๐ฅ1 + ๐ฅ2 + 2๐ฅ3 = 1, / ยท1โ 2๐ฅ1 โ ๐ฅ2 + ๐ฅ3 = โ0.5, / ยท2โ ๐ฅ1 + 5๐ฅ3 = โ0.1, / ยท(โ1)โ 2๐ฅ1 โ ๐ฅ2 โ ๐ฅ3 = 0.1.
This clearly proves infeasibility: the left-hand side is negative and the right-hand side ispositive, which is impossible.
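The certificate can be verified mechanically; a small NumPy sketch (the vector 𝑦 below is the one from the example):

```python
import numpy as np

A = np.array([[ 1.0,  1.0, 2.0],
              [-2.0, -1.0, 1.0],
              [-1.0,  0.0, 5.0]])
b = np.array([1.0, -0.5, -0.1])
y = np.array([1.0, 2.0, -1.0])   # certificate of infeasibility

print(A.T @ y)   # [-2. -1. -1.]: all coefficients nonpositive
print(b @ y)     # approximately 0.1: strictly positive
```

These are exactly the conditions 𝐴ᵀ𝑦 ≤ 0, 𝑏ᵀ𝑦 > 0 formalized in the next subsection.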
2.3.1 Farkasโ lemma
In the last example we proved infeasibility of the linear system by exhibiting an explicit linear combination of the equations, such that the right-hand side (constant) is positive while on the left-hand side all coefficients are negative or zero. In matrix notation, such a linear combination is given by a vector 𝑦 such that 𝐴ᵀ𝑦 ≤ 0 and 𝑏ᵀ𝑦 > 0. The next lemma shows that infeasibility of (2.11) is equivalent to the existence of such a vector.
Lemma 2.1 (Farkas’ lemma). Given 𝐴 and 𝑏 as in (2.11), exactly one of the two statements is true:
1. There exists ๐ฅ โฅ 0 such that ๐ด๐ฅ = ๐.
2. There exists ๐ฆ such that ๐ด๐๐ฆ โค 0 and ๐๐๐ฆ > 0.
Proof. Let 𝑎1, . . . , 𝑎𝑛 be the columns of 𝐴. The set {𝐴𝑥 | 𝑥 ≥ 0} is a closed convex cone spanned by 𝑎1, . . . , 𝑎𝑛. If this cone contains 𝑏 then we have the first alternative. Otherwise the cone can be separated from the point 𝑏 by a hyperplane passing through 0, i.e. there exists 𝑦 such that 𝑦ᵀ𝑏 > 0 and 𝑦ᵀ𝑎𝑖 ≤ 0 for all 𝑖. This is equivalent to the second alternative. Finally, 1. and 2. are mutually exclusive, since otherwise we would have

0 < 𝑦ᵀ𝑏 = 𝑦ᵀ𝐴𝑥 = (𝐴ᵀ𝑦)ᵀ𝑥 ≤ 0.
Farkasโ lemma implies that either the problem (2.11) is feasible or there is a certificateof infeasibility ๐ฆ. In other words, every time we classify model as infeasible, we can certifythis fact by providing an appropriate ๐ฆ, as in Example 2.3.
2.3.2 Locating infeasibility
As we already discussed, the infeasibility certificate 𝑦 gives coefficients of a linear combination of the constraints which is infeasible “in an obvious way”, that is positive on one side and negative on the other. In some cases, 𝑦 may be very sparse, i.e. it may have very few nonzeros, which means that already a very small subset of the constraints is the root cause of infeasibility. This may be interesting if, for example, we are debugging a large model which we expected to be feasible and infeasibility is caused by an error in the problem formulation. Then we only have to consider the sub-problem formed by constraints with index set {𝑖 | 𝑦𝑖 ≠ 0}.
Example 2.4 (All constraints involved in infeasibility). As a cautionary note considerthe constraints
0 โค ๐ฅ1 โค ๐ฅ2 โค ยท ยท ยท โค ๐ฅ๐ โค โ1.
Any problem with those constraints is infeasible, but dropping any one of the inequalitiescreates a feasible subproblem.
2.4 Duality in linear optimization
Duality is a rich and powerful theory, central to understanding infeasibility and sensitivity issues in linear optimization. In this section we only discuss duality in linear optimization at a descriptive level suited for practitioners; we refer to Sec. 8 for a more in-depth discussion of duality for general conic problems.
2.4.1 The dual problem
Primal problem
We consider as always a linear optimization problem in standard form:
minimize ๐๐๐ฅsubject to ๐ด๐ฅ = ๐,
๐ฅ โฅ 0.(2.12)
We denote the optimal objective value in (2.12) by ๐โ. There are three possibilities:
โข The problem is infeasible. By convention ๐โ = +โ.
โข ๐โ is finite, in which case the problem has an optimal solution.
โข ๐โ = โโ, meaning that there are feasible solutions with ๐๐๐ฅ decreasing to โโ, inwhich case we say the problem is unbounded.
Lagrange function
We associate with (2.12) a so-called Lagrangian function 𝐿 : R𝑛 × R𝑚 × R𝑛+ → R that augments the objective with a weighted combination of all the constraints,

𝐿(𝑥, 𝑦, 𝑠) = 𝑐ᵀ𝑥 + 𝑦ᵀ(𝑏 − 𝐴𝑥) − 𝑠ᵀ𝑥.
The variables 𝑦 ∈ R𝑚 and 𝑠 ∈ R𝑛+ are called Lagrange multipliers or dual variables. For any feasible 𝑥* ∈ ℱ𝑝 and any (𝑦*, 𝑠*) ∈ R𝑚 × R𝑛+ we have

𝐿(𝑥*, 𝑦*, 𝑠*) = 𝑐ᵀ𝑥* + (𝑦*)ᵀ · 0 − (𝑠*)ᵀ𝑥* ≤ 𝑐ᵀ𝑥*.
Note that we used the nonnegativity of 𝑠*, or in general of any Lagrange multiplier associated with an inequality constraint. The dual function is defined as the minimum of 𝐿(𝑥, 𝑦, 𝑠) over 𝑥. Thus the dual function of (2.12) is
๐(๐ฆ, ๐ ) = min๐ฅ
๐ฟ(๐ฅ, ๐ฆ, ๐ ) = min๐ฅ
๐ฅ๐ (๐โ ๐ด๐๐ฆ โ ๐ ) + ๐๐๐ฆ =
{๏ธ๐๐๐ฆ, ๐โ ๐ด๐๐ฆ โ ๐ = 0,โโ, otherwise.
Dual problem
For every (𝑦, 𝑠) the value of 𝑔(𝑦, 𝑠) is a lower bound for 𝑝⋆. To get the best such bound we maximize 𝑔(𝑦, 𝑠) over all (𝑦, 𝑠) and get the dual problem:
maximize ๐๐๐ฆsubject to ๐โ ๐ด๐๐ฆ = ๐ ,
๐ โฅ 0.(2.13)
The optimal value of (2.13) will be denoted 𝑑⋆. As in the case of (2.12) (which from now on we call the primal problem), the dual problem can be infeasible (𝑑⋆ = −∞), have an optimal solution (−∞ < 𝑑⋆ < +∞) or be unbounded (𝑑⋆ = +∞). Note that the roles of −∞ and +∞ are now reversed because the dual is a maximization problem.
Example 2.5 (Dual of basis pursuit). As an example, let us derive the dual of the basis pursuit formulation (2.7). It would be possible to add auxiliary variables and constraints to force that problem into the standard form (2.12) and then just apply the dual transformation as a black box, but it is both easier and more instructive to directly write the Lagrangian:
๐ฟ(๐ฅ, ๐ง, ๐ฆ, ๐ข, ๐ฃ) = ๐๐ ๐ง + ๐ข๐ (๐ฅโ ๐ง) โ ๐ฃ๐ (๐ฅ + ๐ง) + ๐ฆ๐ (๐โ ๐ด๐ฅ)
where ๐ = (1, . . . , 1)๐ , with Lagrange multipliers ๐ฆ โ R๐ and ๐ข, ๐ฃ โ R๐+. The dual function
๐(๐ฆ, ๐ข, ๐ฃ) = min๐ฅ,๐ง
๐ฟ(๐ฅ, ๐ง, ๐ฆ, ๐ข, ๐ฃ) = min๐ฅ,๐ง
๐ง๐ (๐โ ๐ขโ ๐ฃ) + ๐ฅ๐ (๐ขโ ๐ฃ โ ๐ด๐๐ฆ) + ๐ฆ๐ ๐
is only bounded below if ๐ = ๐ข + ๐ฃ and ๐ด๐๐ฆ = ๐ขโ ๐ฃ, hence the dual problem is
maximize ๐๐๐ฆsubject to ๐ = ๐ข + ๐ฃ,
๐ด๐๐ฆ = ๐ขโ ๐ฃ,๐ข, ๐ฃ โฅ 0.
(2.14)
It is not hard to observe that an equivalent formulation of (2.14) is simply
maximize ๐๐๐ฆsubject to โ๐ด๐๐ฆโโ โค 1,
(2.15)
which should be compared with the duality between norms discussed in Example 2.2.
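Under the assumption that an LP solver is available (SciPy's `linprog` here), the equality of the optimal values of the primal formulation (2.7) and the dual (2.15) can be observed on random data:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 3, 7
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
I = np.eye(n)

# primal (2.7): variables (x, z), minimize sum(z), -z <= x <= z, A x = b
primal = linprog(c=np.r_[np.zeros(n), np.ones(n)],
                 A_ub=np.block([[I, -I], [-I, -I]]), b_ub=np.zeros(2 * n),
                 A_eq=np.hstack([A, np.zeros((m, n))]), b_eq=b,
                 bounds=[(None, None)] * n + [(0, None)] * n, method="highs")

# dual (2.15): maximize b^T y  s.t.  -1 <= (A^T y)_j <= 1, y free
dual = linprog(c=-b, A_ub=np.vstack([A.T, -A.T]), b_ub=np.ones(2 * n),
               bounds=[(None, None)] * m, method="highs")

print(primal.fun, -dual.fun)  # the two optimal values coincide
```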
Example 2.6 (Dual of a maximization problem). We can similarly derive the dual of problem (2.13). If we write it simply as

maximize    𝑏ᵀ𝑦
subject to  𝑐 − 𝐴ᵀ𝑦 ≥ 0,
then the Lagrangian is
๐ฟ(๐ฆ, ๐ข) = ๐๐๐ฆ + ๐ข๐ (๐โ ๐ด๐๐ฆ) = ๐ฆ๐ (๐โ ๐ด๐ข) + ๐๐๐ข
with ๐ข โ R๐+, so that now ๐ฟ(๐ฆ, ๐ข) โฅ ๐๐๐ฆ for any feasible ๐ฆ. Calculating min๐ข max๐ฆ ๐ฟ(๐ฆ, ๐ข)
is now equivalent to the problem
minimize ๐๐๐ขsubject to ๐ด๐ข = ๐,
๐ข โฅ 0,
so, as expected, the dual of the dual recovers the original primal problem.
2.4.2 Weak and strong duality
Suppose ๐ฅ* and (๐ฆ*, ๐ *) are feasible points for the primal and dual problems (2.12) and (2.13),respectively. Then we have
๐๐๐ฆ* = (๐ด๐ฅ*)๐๐ฆ* = (๐ฅ*)๐ (๐ด๐๐ฆ*) = (๐ฅ*)๐ (๐โ ๐ *) = ๐๐๐ฅ* โ (๐ *)๐๐ฅ* โค ๐๐๐ฅ*
so the dual objective value is a lower bound on the objective value of the primal. In particular, any dual feasible point (𝑦*, 𝑠*) gives a lower bound:

𝑏ᵀ𝑦* ≤ 𝑝⋆,
and we immediately get the next lemma.
Lemma 2.2 (Weak duality). ๐โ โค ๐โ.
It follows that if 𝑏ᵀ𝑦* = 𝑐ᵀ𝑥* then 𝑥* is optimal for the primal, (𝑦*, 𝑠*) is optimal for the dual and 𝑏ᵀ𝑦* = 𝑐ᵀ𝑥* is the common optimal objective value. This way we can use the optimal dual solution to certify optimality of the primal solution and vice versa.
The remarkable property of linear optimization is that 𝑑⋆ = 𝑝⋆ holds in the most interesting scenario, namely when the primal problem is feasible and bounded. It means that the certificate of optimality mentioned in the previous paragraph always exists.
Lemma 2.3 (Strong duality). If at least one of ๐โ, ๐โ is finite then ๐โ = ๐โ.
Proof. Suppose −∞ < 𝑝⋆ < ∞; the proof in the dual case is analogous. For any 𝜀 > 0 consider the feasibility problem with variable 𝑥 ≥ 0 and constraints

𝑐ᵀ𝑥 = 𝑝⋆ − 𝜀,    𝐴𝑥 = 𝑏,

i.e. the stacked system with coefficient matrix formed by the row −𝑐ᵀ on top of 𝐴 and right-hand side (−𝑝⋆ + 𝜀, 𝑏).
Optimality of 𝑝⋆ implies that the above problem is infeasible. By Lemma 2.1 applied to the stacked system there exists (𝑦0, 𝑦) such that

−𝑐𝑦0 + 𝐴ᵀ𝑦 ≤ 0    and    (−𝑝⋆ + 𝜀)𝑦0 + 𝑏ᵀ𝑦 > 0.

If 𝑦0 = 0 then 𝐴ᵀ𝑦 ≤ 0 and 𝑏ᵀ𝑦 > 0, which by Lemma 2.1 again would mean that the original primal problem was infeasible, which is not the case. Hence we can rescale so that 𝑦0 = 1 and then we get

𝑐 − 𝐴ᵀ𝑦 ≥ 0    and    𝑏ᵀ𝑦 ≥ 𝑝⋆ − 𝜀.
The first inequality above implies that 𝑦 is feasible for the dual problem. By letting 𝜀 → 0 we obtain 𝑑⋆ ≥ 𝑝⋆.
We can exploit strong duality to freely choose between solving the primal or dual version of any linear problem.
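A small randomized illustration of Lemma 2.3 (a sketch; SciPy's `linprog` is an assumption). The data is constructed so that both (2.12) and (2.13) are feasible, hence both are bounded and the two optimal values must agree:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
m, n = 4, 9
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.1, 1.0, n)                              # primal feasible
c = A.T @ rng.standard_normal(m) + rng.uniform(0.1, 1.0, n)   # dual feasible

# primal (2.12): minimize c^T x  s.t.  A x = b, x >= 0
p = linprog(c=c, A_eq=A, b_eq=b, bounds=[(0, None)] * n, method="highs")
# dual (2.13): maximize b^T y  s.t.  A^T y <= c, i.e. s = c - A^T y >= 0
d = linprog(c=-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * m, method="highs")

print(p.fun, -d.fun)  # p* and d* coincide
```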
Example 2.7 (Sum of largest elements). Suppose that 𝑥 is now a constant vector. Consider the following problem with variable 𝑧:

maximize    𝑥ᵀ𝑧
subject to  𝑧1 + · · · + 𝑧𝑛 = 𝑚,
            0 ≤ 𝑧 ≤ 1.
The maximum is attained when 𝑧 indicates the positions of 𝑚 largest entries in 𝑥, and the objective value is then their sum. This formulation, however, cannot be used when 𝑥 is another variable, since then the objective function is no longer linear. Let us derive the dual problem. The Lagrangian is
๐ฟ(๐ง, ๐ , ๐ก, ๐ข) = ๐ฅ๐ ๐ง + ๐ก(๐โ ๐๐ ๐ง) + ๐ ๐ ๐ง + ๐ข๐ (๐โ ๐ง) == ๐ง๐ (๐ฅโ ๐ก๐ + ๐ โ ๐ข) + ๐ก๐ + ๐ข๐ ๐
with ๐ข, ๐ โฅ 0. Since ๐ ๐ โฅ 0 is arbitrary and not otherwise constrained, the equality๐ฅ๐ โ ๐ก + ๐ ๐ โ ๐ข๐ = 0 is the same as ๐ข๐ + ๐ก โฅ ๐ฅ๐ and for the dual problem we get
minimize ๐๐ก +โ๏ธ
๐ ๐ข๐
subject to ๐ข๐ + ๐ก โฅ ๐ฅ๐, ๐ = 1, . . . , ๐,๐ข๐ โฅ 0, ๐ = 1, . . . , ๐,
which is exactly the problem (2.10) we studied in Sec. 2.2.6. Strong duality now implies that (2.10) computes the sum of 𝑚 biggest entries in 𝑥.
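When 𝑥 is data, the maximization formulation above is directly solvable; a brief sketch (SciPy's `linprog` assumed) confirming it picks out the 𝑚 largest entries:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, m = 8, 3
x = rng.standard_normal(n)

# maximize x^T z  s.t.  z1 + ... + zn = m,  0 <= z <= 1   (x is constant data)
res = linprog(c=-x, A_eq=np.ones((1, n)), b_eq=[float(m)],
              bounds=[(0, 1)] * n, method="highs")

print(-res.fun)               # sum of the m largest entries of x
print(np.sort(x)[-m:].sum())  # the same value
```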
2.4.3 Duality and infeasibility: summary
We can now expand the discussion of infeasibility certificates in the context of duality. Farkas’ lemma (Lemma 2.1) can be dualized and the two versions can be summarized as follows:
Lemma 2.4 (Primal and dual Farkas’ lemma). For a primal-dual pair of linear problems we have the following equivalences:
1. The primal problem (2.12) is infeasible if and only if there is 𝑦 such that 𝐴ᵀ𝑦 ≤ 0 and 𝑏ᵀ𝑦 > 0.
2. The dual problem (2.13) is infeasible if and only if there is 𝑥 ≥ 0 such that 𝐴𝑥 = 0 and 𝑐ᵀ𝑥 < 0.
Weak and strong duality for linear optimization now lead to the following conclusions:
โข If the problem is primal feasible and has finite objective value (โโ < ๐โ < โ) thenso is the dual and ๐โ = ๐โ. We sometimes refer to this case as primal and dual feasible.The dual solution certifies the optimality of the primal solution and vice versa.
โข If the primal problem is feasible but unbounded (๐โ = โโ) then the dual is infeasible(๐โ = โโ). Part (ii) of Farkasโ lemma provides a certificate of this fact, that is a
17
vector ๐ฅ with ๐ฅ โฅ 0, ๐ด๐ฅ = 0 and ๐๐๐ฅ < 0. In fact it is easy to give this statement ageometric interpretation. If ๐ฅ0 is any primal feasible point then the infinite ray
๐ก โ ๐ฅ0 + ๐ก๐ฅ, ๐ก โ [0,โ)
belongs to the feasible set โฑ๐ because ๐ด(๐ฅ0 + ๐ก๐ฅ) = ๐ and ๐ฅ0 + ๐ก๐ฅ โฅ 0. Along this raythe objective value is unbounded below:
๐๐ (๐ฅ0 + ๐ก๐ฅ) = ๐๐๐ฅ0 + ๐ก(๐๐๐ฅ) โ โโ.
โข If the primal problem is infeasible (๐โ = โ) then a certificate of this fact is providedby part (i). The dual problem may be unbounded (๐โ = โ) or infeasible (๐โ = โโ).
Example 2.8 (Primal-dual infeasibility). Weak and strong duality imply that the only case when 𝑝⋆ ≠ 𝑑⋆ is when both the primal and the dual problem are infeasible (𝑝⋆ = ∞, 𝑑⋆ = −∞), for example:
minimize ๐ฅsubject to 0 ยท ๐ฅ = 1.
2.4.4 Dual values as shadow prices
Dual values are related to shadow prices, as they measure, under some nondegeneracy assumption, the sensitivity of the objective value to a change in the constraint. Consider again the primal and dual problem pair (2.12) and (2.13) with feasible sets ℱ𝑝 and ℱ𝑑 and with a primal-dual optimal solution (𝑥*, 𝑦*, 𝑠*).
Suppose we change one of the values in 𝑏 from 𝑏𝑖 to 𝑏′𝑖. This corresponds to moving one of the hyperplanes defining ℱ𝑝, and in consequence the optimal solution (and the objective value) may change. On the other hand, the dual feasible set ℱ𝑑 is not affected. Assuming that the solution (𝑦*, 𝑠*) was a unique vertex of ℱ𝑑, this point remains optimal for the dual after a sufficiently small change of 𝑏. But then the change of the dual objective is
๐ฆ*๐ (๐โฒ๐ โ ๐๐)
and by strong duality the primal objective changes by the same amount.
Example 2.9 (Student diet). An optimization student wants to save money on the diet while remaining healthy. A healthy diet requires at least 𝑃 = 6 units of protein, 𝐶 = 15 units of carbohydrates, 𝐹 = 5 units of fats and 𝑉 = 7 units of vitamins. The student can choose from the following products:
            P    C    F    V    price
takeaway    3    3    2    1    5
vegetables  1    2    0    4    1
bread       0.5  4    1    0    2
The problem of minimizing cost while meeting dietary requirements is
minimize 5๐ฅ1 + ๐ฅ2 + 2๐ฅ3
subject to 3๐ฅ1 + ๐ฅ2 + 0.5๐ฅ3 โฅ 6,3๐ฅ1 + 2๐ฅ2 + 4๐ฅ3 โฅ 15,2๐ฅ1 + ๐ฅ3 โฅ 5,๐ฅ1 + 4๐ฅ2 โฅ 7,๐ฅ1, ๐ฅ2, ๐ฅ3 โฅ 0.
If ๐ฆ1, ๐ฆ2, ๐ฆ3, ๐ฆ4 are the dual variables associated with the four inequality constraints thenthe (unique) primal-dual optimal solution to this problem is approximately:
(๐ฅ, ๐ฆ) = ((1, 1.5, 3), (0.42, 0, 1.78, 0.14))
with optimal cost 𝑝⋆ = 12.5. Note 𝑦2 = 0 indicates that the second constraint is not binding. Indeed, we could increase 𝐶 to 18 without affecting the optimal solution. The remaining constraints are binding.
Improving the intake of protein by 1 unit (increasing 𝑃 to 7) will increase the cost by 0.42, while doing the same for fat will cost an extra 1.78 per unit. If the student had extra money to improve one of the parameters then the best choice would be to increase the intake of vitamins, with a shadow price of just 0.14.
If one month the student only had 12 units of money and was willing to relax one of the requirements then the best choice is to save on fats: the necessary reduction of 𝐹 is smallest, namely 0.5 · 1.78⁻¹ ≈ 0.28. Indeed, with the new value of 𝐹 = 4.72 the same problem solves to 𝑝⋆ = 12 and 𝑥 = (1.08, 1.48, 2.56).
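The diet example can be reproduced with any LP solver; here is a sketch using SciPy's `linprog` (an assumption, the text itself is solver-independent). With the HiGHS methods in recent SciPy versions, the dual values of the inequality constraints are reported in `res.ineqlin.marginals`; absolute values are taken below to sidestep sign conventions:

```python
import numpy as np
from scipy.optimize import linprog

c = [5.0, 1.0, 2.0]                    # prices: takeaway, vegetables, bread
N = np.array([[3.0, 1.0, 0.5],         # protein per unit of each product
              [3.0, 2.0, 4.0],         # carbohydrates
              [2.0, 0.0, 1.0],         # fats
              [1.0, 4.0, 0.0]])        # vitamins
req = np.array([6.0, 15.0, 5.0, 7.0])  # requirements P, C, F, V

# N x >= req is rewritten as -N x <= -req for linprog
res = linprog(c=c, A_ub=-N, b_ub=-req, bounds=[(0, None)] * 3, method="highs")

print(res.fun)                        # 12.5 (up to numerical precision)
print(res.x)                          # approximately [1.  1.5  3.]
print(np.abs(res.ineqlin.marginals))  # shadow prices, approx [0.43 0. 1.79 0.14]
```

The exact shadow prices are (3/7, 0, 25/14, 1/7), matching the rounded values quoted in the text.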
We stress that a truly balanced diet problem should also include upper bounds.
Chapter 3
Conic quadratic optimization
This chapter extends the notion of linear optimization with quadratic cones. Conic quadratic optimization, also known as second-order cone optimization, is a straightforward generalization of linear optimization, in the sense that we optimize a linear function under linear (in)equalities with some variables belonging to one or more (rotated) quadratic cones. We discuss the basic concept of quadratic cones, and demonstrate the surprisingly large flexibility of conic quadratic modeling.
3.1 Cones
Since this is the first place where we introduce a non-linear cone, it seems suitable to make our most important definition:
A set $K \subseteq \mathbb{R}^n$ is called a convex cone if

- for every $x, y \in K$ we have $x + y \in K$,

- for every $x \in K$ and $\alpha \geq 0$ we have $\alpha x \in K$.
A linear subspace of $\mathbb{R}^n$, the positive orthant $\mathbb{R}^n_{\geq 0}$ and any ray (half-line) starting at the origin are all examples of convex cones. We leave it for the reader to check that the intersection of convex cones is a convex cone; this property enables us to assemble complicated optimization models from individual conic bricks.
3.1.1 Quadratic cones
We define the $n$-dimensional quadratic cone as
\[ \mathcal{Q}^n = \left\{ x \in \mathbb{R}^n \mid x_1 \geq \sqrt{x_2^2 + x_3^2 + \cdots + x_n^2} \right\}. \qquad (3.1) \]
The geometric interpretation of a quadratic (or second-order) cone is shown in Fig. 3.1 for a cone with three variables, and illustrates how the boundary of the cone resembles an ice-cream cone. The 1-dimensional quadratic cone simply states nonnegativity $x_1 \geq 0$.
Fig. 3.1: Boundary of the quadratic cone $x_1 \geq \sqrt{x_2^2 + x_3^2}$ and the rotated quadratic cone $2x_1x_2 \geq x_3^2$, $x_1, x_2 \geq 0$.
3.1.2 Rotated quadratic cones
An $n$-dimensional rotated quadratic cone is defined as
\[ \mathcal{Q}_r^n = \left\{ x \in \mathbb{R}^n \mid 2x_1x_2 \geq x_3^2 + \cdots + x_n^2,\ x_1, x_2 \geq 0 \right\}. \qquad (3.2) \]
As the name indicates, there is a simple relationship between quadratic and rotated quadratic cones. Define an orthogonal transformation
\[
T_n := \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & 0 & I_{n-2} \end{bmatrix}. \qquad (3.3)
\]
Then it is easy to verify that
\[ x \in \mathcal{Q}^n \iff T_nx \in \mathcal{Q}_r^n, \]
and since $T_n$ is orthogonal we call $\mathcal{Q}_r^n$ a rotated cone; the transformation corresponds to a rotation of $\pi/4$ in the $(x_1, x_2)$ plane. For example if $x \in \mathcal{Q}^3$ and
\[
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} =
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} \frac{1}{\sqrt{2}}(x_1 + x_2) \\ \frac{1}{\sqrt{2}}(x_1 - x_2) \\ x_3 \end{bmatrix}
\]
then
\[ 2z_1z_2 \geq z_3^2,\ z_1, z_2 \geq 0 \implies (x_1^2 - x_2^2) \geq x_3^2,\ x_1 \geq 0, \]
and similarly we see that
\[ x_1^2 \geq x_2^2 + x_3^2,\ x_1 \geq 0 \implies 2z_1z_2 \geq z_3^2,\ z_1, z_2 \geq 0. \]
Thus, one could argue that we only need quadratic cones $\mathcal{Q}^n$, but there are many examples where using an explicit rotated quadratic cone $\mathcal{Q}_r^n$ is more natural, as we will see next.
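The relationship $x \in \mathcal{Q}^n \iff T_nx \in \mathcal{Q}_r^n$ is easy to confirm numerically. The following sketch (an illustration, not the book's own code) implements membership tests for both cones and checks the equivalence on random boundary points:

```python
import numpy as np

def in_quadratic_cone(x, tol=1e-9):
    """x_1 >= ||(x_2, ..., x_n)||_2, i.e. membership in Q^n up to tolerance."""
    x = np.asarray(x, dtype=float)
    return x[0] >= np.linalg.norm(x[1:]) - tol

def in_rotated_cone(x, tol=1e-9):
    """2 x_1 x_2 >= x_3^2 + ... + x_n^2 with x_1, x_2 >= 0."""
    x = np.asarray(x, dtype=float)
    return (x[0] >= -tol and x[1] >= -tol
            and 2 * x[0] * x[1] >= np.sum(x[2:] ** 2) - tol)

def T(n):
    """The orthogonal matrix T_n of (3.3)."""
    M = np.eye(n)
    s = 1 / np.sqrt(2)
    M[:2, :2] = [[s, s], [s, -s]]
    return M

# Random points on the boundary of Q^5 map into Q_r^5 under T_5.
rng = np.random.default_rng(0)
for _ in range(100):
    v = rng.standard_normal(4)
    x = np.concatenate([[np.linalg.norm(v)], v])   # x on the boundary of Q^5
    assert in_quadratic_cone(x) and in_rotated_cone(T(5) @ x)
```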
3.2 Conic quadratic modeling
In the following we describe several convex sets that can be modeled using conic quadratic formulations or, as we call them, are conic quadratic representable.
3.2.1 Absolute values
In Sec. 2.2.2 we saw how to model $|x| \leq t$ using two linear inequalities, but in fact the epigraph of the absolute value is just the definition of a two-dimensional quadratic cone, i.e.,
\[ |x| \leq t \iff (t, x) \in \mathcal{Q}^2. \]
3.2.2 Euclidean norms
The Euclidean norm of $x \in \mathbb{R}^n$,
\[ \|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}, \]
essentially defines the quadratic cone, i.e.,
\[ \|x\|_2 \leq t \iff (t, x) \in \mathcal{Q}^{n+1}. \]
The epigraph of the squared Euclidean norm can be described as the intersection of a rotated quadratic cone with an affine hyperplane,
\[ x_1^2 + \cdots + x_n^2 = \|x\|_2^2 \leq t \iff (1/2, t, x) \in \mathcal{Q}_r^{n+2}. \]
3.2.3 Convex quadratic sets
Assume $Q \in \mathbb{R}^{n \times n}$ is a symmetric positive semidefinite matrix. The convex inequality
\[ (1/2)x^TQx + c^Tx + r \leq 0 \]
may be rewritten as
\[ t + c^Tx + r = 0, \qquad x^TQx \leq 2t. \qquad (3.4) \]
Since $Q$ is symmetric positive semidefinite, the epigraph
\[ x^TQx \leq 2t \qquad (3.5) \]
is a convex set and there exists a matrix $F \in \mathbb{R}^{k \times n}$ such that
\[ Q = F^TF \qquad (3.6) \]
(see Sec. 6 for properties of semidefinite matrices). For instance $F$ could be the Cholesky factorization of $Q$. Then
\[ x^TQx = x^TF^TFx = \|Fx\|_2^2 \]
and we have an equivalent characterization of (3.5) as
\[ (1/2)x^TQx \leq t \iff (t, 1, Fx) \in \mathcal{Q}_r^{2+k}. \]
Frequently $Q$ has the structure
\[ Q = I + F^TF, \]
where $I$ is the identity matrix, so
\[ x^TQx = x^Tx + x^TF^TFx = \|x\|_2^2 + \|Fx\|_2^2 \]
and hence
\[ (f, 1, x) \in \mathcal{Q}_r^{2+n}, \qquad (h, 1, Fx) \in \mathcal{Q}_r^{2+k}, \qquad f + h = t \]
is a conic quadratic representation of (3.5) in this case.
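The factorization step is easy to verify numerically. A small sketch (illustrative, not the book's code) builds $F$ from a Cholesky factor and checks the identity $x^TQx = \|Fx\|_2^2$ underlying the conic representation; note that `numpy.linalg.cholesky` returns a lower-triangular $L$ with $Q = LL^T$, so $F = L^T$ matches the convention $Q = F^TF$ of (3.6):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)        # symmetric positive definite

# numpy's cholesky returns lower-triangular L with Q = L L^T, so F = L^T.
F = np.linalg.cholesky(Q).T

x = rng.standard_normal(n)
lhs = x @ Q @ x
rhs = np.linalg.norm(F @ x) ** 2   # ||Fx||_2^2

# Conic characterization: (1/2) x^T Q x <= t  <=>  (t, 1, Fx) in Q_r^{2+k},
# i.e. 2 * t * 1 >= ||Fx||^2; the smallest feasible t is lhs / 2.
t = lhs / 2
```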
3.2.4 Second-order cones
A second-order cone is occasionally specified as
\[ \|Ax + b\|_2 \leq c^Tx + d \qquad (3.7) \]
where $A \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^n$. The formulation (3.7) is simply
\[ (c^Tx + d,\ Ax + b) \in \mathcal{Q}^{m+1} \qquad (3.8) \]
or equivalently
\[ s = Ax + b, \qquad t = c^Tx + d, \qquad (t, s) \in \mathcal{Q}^{m+1}. \qquad (3.9) \]
As will be explained in Sec. 8, we refer to (3.8) as the dual form and (3.9) as the primal form. An alternative characterization of (3.7) is
\[ \|Ax + b\|_2^2 - (c^Tx + d)^2 \leq 0, \qquad c^Tx + d \geq 0, \qquad (3.10) \]
which shows that certain quadratic inequalities are conic quadratic representable.
3.2.5 Simple sets involving power functions
Some power-like inequalities are conic quadratic representable, even though it need not be obvious at first glance. For example, we have
\[ |t| \leq \sqrt{x},\ x \geq 0 \iff (x, 1/2, t) \in \mathcal{Q}_r^3, \]
or in a similar fashion
\[ t \geq \frac{1}{x},\ x \geq 0 \iff (x, t, \sqrt{2}) \in \mathcal{Q}_r^3. \]
For a more complicated example, consider the constraint
\[ t \geq x^{3/2}, \qquad x \geq 0. \]
This is equivalent to a statement involving two cones and an extra variable $s$,
\[ (s, t, x) \in \mathcal{Q}_r^3, \qquad (x, 1/8, s) \in \mathcal{Q}_r^3, \]
because
\[ 2st \geq x^2, \quad 2 \cdot \tfrac{1}{8}x \geq s^2 \implies 4s^2t^2 \cdot \tfrac{1}{4}x \geq x^4 \cdot s^2 \implies t \geq x^{3/2}. \]
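A quick numerical check of this two-cone construction (an illustrative sketch, not the book's code): with the tight auxiliary choice $s = \sqrt{x}/2$, the pair of rotated-cone constraints is feasible exactly when $t \geq x^{3/2}$.

```python
import numpy as np

def in_Qr3(x1, x2, x3, tol=1e-9):
    """Membership in the rotated cone Q_r^3: 2*x1*x2 >= x3^2, x1, x2 >= 0."""
    return x1 >= -tol and x2 >= -tol and 2 * x1 * x2 >= x3 ** 2 - tol

def power32_feasible(t, x):
    """Check t >= x^(3/2) via (s, t, x) and (x, 1/8, s) in Q_r^3, using the
    tight choice s = sqrt(x)/2 that makes the second cone an equality."""
    s = np.sqrt(x) / 2
    return in_Qr3(s, t, x) and in_Qr3(x, 0.125, s)

assert power32_feasible(8.0, 4.0)        # 8 = 4^(3/2): boundary case
assert power32_feasible(8.1, 4.0)        # slack is fine
assert not power32_feasible(7.9, 4.0)    # violates t >= x^(3/2)
```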
In practice power-like inequalities representable with similar tricks can often be expressed much more naturally using the power cone (see Sec. 4), so we will not dwell on these examples much longer.
3.2.6 Rational functions
A general non-degenerate rational function of one variable $f(x) = \frac{ax+b}{cx+d}$ can always be written in the form $f(x) = p + \frac{q}{cx+d}$, and when $q > 0$ it is convex on the set where $cx + d > 0$. The conic quadratic model of
\[ t \geq p + \frac{q}{cx+d}, \qquad cx + d > 0, \qquad q > 0 \]
with variables $t, x$ can be written as
\[ (t - p,\ cx + d,\ \sqrt{2q}) \in \mathcal{Q}_r^3. \]
3.2.7 Harmonic mean
Consider next the hypograph of the harmonic mean,
\[ \left( \frac{1}{n} \sum_{i=1}^n x_i^{-1} \right)^{-1} \geq t \geq 0, \qquad x \geq 0. \]
It is not obvious that the inequality defines a convex set, nor that it is conic quadratic representable. However, we can rewrite it in the form
\[ \sum_{i=1}^n \frac{t^2}{x_i} \leq nt, \]
which suggests the conic representation:
\[ 2x_iz_i \geq t^2, \qquad x_i, z_i \geq 0, \qquad 2\sum_{i=1}^n z_i = nt. \qquad (3.11) \]
Alternatively, in the case $n = 2$, the hypograph of the harmonic mean can be represented by a single quadratic cone:
\[ (x_1 + x_2 - t/2,\ x_1,\ x_2,\ t/2) \in \mathcal{Q}^4. \qquad (3.12) \]
Since (3.12) extends the harmonic mean to all points with $x_1 + x_2 \geq 0$, the variable bounds $x \geq 0$ and $t \geq 0$ have to be added separately if so desired.
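The single-cone representation (3.12) can be checked directly: expanding the squared norm shows it is equivalent to $t(x_1 + x_2) \leq 2x_1x_2$, i.e. $t$ bounded by the harmonic mean. A small numeric sketch (illustrative, not the book's code):

```python
import numpy as np

def harmonic_mean(x):
    x = np.asarray(x, dtype=float)
    return len(x) / np.sum(1.0 / x)

def feasible_n2(t, x1, x2, tol=1e-9):
    """Single-cone representation (3.12): the point
    (x1 + x2 - t/2, x1, x2, t/2) must lie in Q^4."""
    head = x1 + x2 - t / 2
    return head >= np.sqrt(x1 ** 2 + x2 ** 2 + (t / 2) ** 2) - tol

h = harmonic_mean([1.0, 3.0])          # 2 / (1 + 1/3) = 1.5
assert feasible_n2(h, 1.0, 3.0)        # tight at t = harmonic mean
assert feasible_n2(1.0, 1.0, 3.0)      # any smaller t is feasible
assert not feasible_n2(1.6, 1.0, 3.0)  # larger t is not
```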
3.2.8 Quadratic forms with one negative eigenvalue
Assume that $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix with exactly one negative eigenvalue, i.e., $A$ has a spectral factorization (eigenvalue decomposition)
\[ A = Q\Lambda Q^T = -\alpha_1 q_1q_1^T + \sum_{i=2}^n \alpha_i q_iq_i^T, \]
where $Q^TQ = I$, $\Lambda = \mathrm{Diag}(-\alpha_1, \alpha_2, \ldots, \alpha_n)$, $\alpha_i \geq 0$. Then
\[ x^TAx \leq 0 \]
is equivalent to
\[ \sum_{i=2}^n \alpha_i (q_i^Tx)^2 \leq \alpha_1 (q_1^Tx)^2. \qquad (3.13) \]
This shows $x^TAx \leq 0$ to be a union of two convex subsets, each representable by a quadratic cone. For example, assuming $q_1^Tx \geq 0$, we can characterize (3.13) as
\[ (\sqrt{\alpha_1}\,q_1^Tx,\ \sqrt{\alpha_2}\,q_2^Tx,\ \ldots,\ \sqrt{\alpha_n}\,q_n^Tx) \in \mathcal{Q}^n. \qquad (3.14) \]
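The spectral construction is easy to exercise numerically. The sketch below (illustrative only) builds a symmetric matrix with one negative eigenvalue from a random orthogonal factor, picks an $x$ on the $q_1^Tx \geq 0$ branch with $x^TAx \leq 0$, and verifies that the point of (3.14) lies in $\mathcal{Q}^n$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# Build a symmetric matrix with exactly one negative eigenvalue.
Qmat = np.linalg.qr(rng.standard_normal((n, n)))[0]     # orthogonal factor
alpha = np.array([2.0, 0.5, 1.0, 3.0, 0.1])             # all positive
A = Qmat @ np.diag(np.r_[-alpha[0], alpha[1:]]) @ Qmat.T

# Pick x with q_1^T x >= 0 and x^T A x <= 0, mostly along the negative direction.
q = Qmat                                # columns are eigenvectors q_1, ..., q_n
x = q[:, 0] + 0.1 * q[:, 1]
assert q[:, 0] @ x >= 0 and x @ A @ x <= 0

point = np.sqrt(alpha) * (q.T @ x)      # (sqrt(alpha_i) q_i^T x)_i as in (3.14)
# Membership in Q^n: the first entry dominates the norm of the rest.
assert point[0] >= np.linalg.norm(point[1:]) - 1e-9
```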
3.2.9 Ellipsoidal sets
The set
\[ \mathcal{E} = \{ x \in \mathbb{R}^n \mid \|P(x - c)\|_2 \leq 1 \} \]
describes an ellipsoid centred at $c$. It has a natural conic quadratic representation:
\[ x \in \mathcal{E} \iff (1, P(x - c)) \in \mathcal{Q}^{n+1}. \]
3.3 Conic quadratic case studies
3.3.1 Quadratically constrained quadratic optimization
A general convex quadratically constrained quadratic optimization problem can be written as
\[
\begin{array}{ll}
\mbox{minimize}   & (1/2)x^TQ_0x + c_0^Tx + r_0 \\
\mbox{subject to} & (1/2)x^TQ_ix + c_i^Tx + r_i \leq 0, \quad i = 1, \ldots, m,
\end{array} \qquad (3.15)
\]
where all $Q_i \in \mathbb{R}^{n \times n}$ are symmetric positive semidefinite. Let
\[ Q_i = F_i^TF_i, \qquad i = 0, \ldots, m, \]
where $F_i \in \mathbb{R}^{k_i \times n}$. Using the formulations in Sec. 3.2.3 we then get an equivalent conic quadratic problem
\[
\begin{array}{ll}
\mbox{minimize}   & t_0 + c_0^Tx + r_0 \\
\mbox{subject to} & t_i + c_i^Tx + r_i = 0, \quad i = 1, \ldots, m, \\
                  & (t_i, 1, F_ix) \in \mathcal{Q}_r^{k_i+2}, \quad i = 0, \ldots, m.
\end{array} \qquad (3.16)
\]
Assume next that $k_i$, the number of rows in $F_i$, is small compared to $n$. Storing $Q_i$ requires about $n^2/2$ space whereas storing $F_i$ requires only $nk_i$ space. Moreover, the amount of work required to evaluate $x^TQ_ix$ is proportional to $n^2$ whereas the work required to evaluate $x^TF_i^TF_ix = \|F_ix\|_2^2$ is proportional to $nk_i$ only. In other words, if the $Q_i$ have low rank, then (3.16) will require much less space and time to solve than (3.15). We will study the reformulation (3.16) in much more detail in Sec. 10.
3.3.2 Robust optimization with ellipsoidal uncertainties
Often in robust optimization some of the parameters in the model are not known exactly, but there is a simple set describing the uncertainty. For example, for a standard linear optimization problem we may wish to find a robust solution for all objective vectors $c$ in an ellipsoid
\[ \mathcal{E} = \{ c \in \mathbb{R}^n \mid c = Fy + g,\ \|y\|_2 \leq 1 \}. \]
A common approach is then to optimize for the worst-case scenario for $c$, so we get a robust version
\[
\begin{array}{ll}
\mbox{minimize}   & \sup_{c \in \mathcal{E}} c^Tx \\
\mbox{subject to} & Ax = b, \\
                  & x \geq 0.
\end{array} \qquad (3.17)
\]
The worst-case objective can be evaluated as
\[ \sup_{c \in \mathcal{E}} c^Tx = g^Tx + \sup_{\|y\|_2 \leq 1} y^TF^Tx = g^Tx + \|F^Tx\|_2, \]
where we used that $\sup_{\|u\|_2 \leq 1} v^Tu = (v^Tv)/\|v\|_2 = \|v\|_2$. Thus the robust problem (3.17) is equivalent to
\[
\begin{array}{ll}
\mbox{minimize}   & g^Tx + \|F^Tx\|_2 \\
\mbox{subject to} & Ax = b, \\
                  & x \geq 0,
\end{array}
\]
which can be posed as a conic quadratic problem
\[
\begin{array}{ll}
\mbox{minimize}   & g^Tx + t \\
\mbox{subject to} & Ax = b, \\
                  & (t, F^Tx) \in \mathcal{Q}^{n+1}, \\
                  & x \geq 0.
\end{array} \qquad (3.18)
\]
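The closed form for the worst case, $\sup_{c \in \mathcal{E}} c^Tx = g^Tx + \|F^Tx\|_2$, can be verified empirically: the maximizer is $y = F^Tx / \|F^Tx\|_2$ and no other unit vector does better. A sampling sketch (illustrative data, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
F = rng.standard_normal((n, n))
g = rng.standard_normal(n)
x = np.abs(rng.standard_normal(n))           # some feasible x >= 0

# The maximizing y over the unit ball is F^T x / ||F^T x||.
v = F.T @ x
y_star = v / np.linalg.norm(v)
worst_direct = g @ x + y_star @ v            # value attained at y_star
worst_formula = g @ x + np.linalg.norm(v)    # closed form from the text

# No admissible unit-norm y does better (sampled check).
samples = rng.standard_normal((1000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
assert np.all(g @ x + samples @ v <= worst_formula + 1e-9)
```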
3.3.3 Markowitz portfolio optimization
In classical Markowitz portfolio optimization we consider investment in $n$ stocks or assets held over a period of time. Let $x_j$ denote the amount we invest in asset $j$, and assume a stochastic model where the return of the assets is a random variable $r$ with known mean
\[ \mu = \mathbf{E}r \]
and covariance
\[ \Sigma = \mathbf{E}(r - \mu)(r - \mu)^T. \]
The return of our investment is also a random variable $y = r^Tx$ with mean (or expected return)
\[ \mathbf{E}y = \mu^Tx \]
and variance (or risk)
\[ \mathbf{E}(y - \mathbf{E}y)^2 = x^T\Sigma x. \]
We then wish to rebalance our portfolio to achieve a compromise between risk and expected return, e.g., we can maximize the expected return given an upper bound $\gamma$ on the tolerable risk and a constraint that our total investment is fixed,
\[
\begin{array}{ll}
\mbox{maximize}   & \mu^Tx \\
\mbox{subject to} & x^T\Sigma x \leq \gamma, \\
                  & e^Tx = 1, \\
                  & x \geq 0.
\end{array} \qquad (3.19)
\]
Suppose we factor $\Sigma = GG^T$ (e.g., using a Cholesky or an eigenvalue decomposition). We then get a conic formulation
\[
\begin{array}{ll}
\mbox{maximize}   & \mu^Tx \\
\mbox{subject to} & (\sqrt{\gamma}, G^Tx) \in \mathcal{Q}^{n+1}, \\
                  & e^Tx = 1, \\
                  & x \geq 0.
\end{array} \qquad (3.20)
\]
In practice both the average return and covariance are estimated using historical data. A recent trend is then to formulate a robust version of the portfolio optimization problem to combat the inherent uncertainty in those estimates, e.g., we can constrain $\mu$ to an ellipsoidal uncertainty set as in Sec. 3.3.2.
It is also common that the data for a portfolio optimization problem is already given in the form of a factor model $\Sigma = F^TF$ or $\Sigma = I + F^TF$, and a conic quadratic formulation as in Sec. 3.3.1 is then most natural. For more details see Sec. 10.3.
3.3.4 Maximizing the Sharpe ratio
Continuing the previous example, the Sharpe ratio defines an efficiency metric of a portfolio as the expected return per unit risk, i.e.,
\[ S(x) = \frac{\mu^Tx - r_f}{(x^T\Sigma x)^{1/2}}, \]
where $r_f$ denotes the return of a risk-free asset. We assume that there is a portfolio with $\mu^Tx > r_f$, so maximizing the Sharpe ratio is equivalent to minimizing $1/S(x)$. In other words, we have the following problem
\[
\begin{array}{ll}
\mbox{minimize}   & \frac{\|G^Tx\|}{\mu^Tx - r_f} \\
\mbox{subject to} & e^Tx = 1, \\
                  & x \geq 0.
\end{array}
\]
The objective has the same nature as the quotient of two affine functions we studied in Sec. 2.2.5. We reformulate the problem in a similar way, introducing a scalar variable $z \geq 0$ and a variable transformation
\[ y = zx. \]
Since a positive $z$ can be chosen arbitrarily and $(\mu - r_fe)^Tx > 0$, we can without loss of generality assume that
\[ (\mu - r_fe)^Ty = 1. \]
Thus, we obtain the following conic problem for maximizing the Sharpe ratio,
\[
\begin{array}{ll}
\mbox{minimize}   & t \\
\mbox{subject to} & (t, G^Ty) \in \mathcal{Q}^{n+1}, \\
                  & e^Ty = z, \\
                  & (\mu - r_fe)^Ty = 1, \\
                  & y, z \geq 0,
\end{array}
\]
and we recover $x = y/z$.
3.3.5 A resource constrained production and inventory problem
The resource constrained production and inventory problem [Zie82] can be formulated as follows:
\[
\begin{array}{ll}
\mbox{minimize}   & \sum_{j=1}^n (b_jx_j + c_j/x_j) \\
\mbox{subject to} & \sum_{j=1}^n a_jx_j \leq b, \\
                  & x_j \geq 0, \quad j = 1, \ldots, n,
\end{array} \qquad (3.21)
\]
where $n$ denotes the number of items to be produced, $b$ denotes the amount of common resource, and $a_j$ is the consumption of the limited resource to produce one unit of item $j$. The objective function represents inventory and ordering costs. Let $c_j^h$ and $c_j^r$ denote the holding cost per unit of product $j$ and the rate of holding costs, respectively. Further, let
\[ b_j = \frac{c_j^hc_j^r}{2} \]
so that
\[ b_jx_j \]
is the average holding cost for product $j$. If $D_j$ denotes the total demand for product $j$ and $c_j^o$ the ordering cost per order of product $j$ then let
\[ c_j = c_j^oD_j \]
and hence
\[ \frac{c_j}{x_j} = \frac{c_j^oD_j}{x_j} \]
is the average ordering cost for product $j$. In summary, the problem is to find the optimal batch sizes such that the inventory and ordering costs are minimized while satisfying the constraints on the common resource. Given $b_j, c_j \geq 0$, problem (3.21) is equivalent to the conic quadratic problem
\[
\begin{array}{ll}
\mbox{minimize}   & \sum_{j=1}^n (b_jx_j + c_jt_j) \\
\mbox{subject to} & \sum_{j=1}^n a_jx_j \leq b, \\
                  & (t_j, x_j, \sqrt{2}) \in \mathcal{Q}_r^3, \quad j = 1, \ldots, n.
\end{array}
\]
It is not always possible to produce a fractional number of items. In such cases $x_j$ should be constrained to be integer. See Sec. 9.
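Without the shared-resource constraint, each term $b_jx_j + c_j/x_j$ of the objective decouples and is minimized at $x_j = \sqrt{c_j/b_j}$, the classical economic order quantity. The sketch below uses made-up coefficients (the names `b`, `c` mirror the text; none of the numbers come from the book) to illustrate this unconstrained optimum:

```python
import numpy as np

# Hypothetical data for three products: holding coefficients b_j and
# ordering coefficients c_j = c_j^o * D_j (all made-up numbers).
b = np.array([2.0, 0.5, 1.0])
c = np.array([8.0, 2.0, 9.0])

def cost(x):
    return np.sum(b * x + c / x)

# Each term is minimized where the derivative b_j - c_j/x_j^2 vanishes,
# i.e. at x_j = sqrt(c_j / b_j) -- the classical EOQ batch size.
x_star = np.sqrt(c / b)

# Any (multiplicative) perturbation increases the cost.
rng = np.random.default_rng(4)
for _ in range(100):
    x = x_star * np.exp(0.1 * rng.standard_normal(3))
    assert cost(x) >= cost(x_star) - 1e-12
```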
Chapter 4
The power cone
So far we studied quadratic cones and their applications in modeling problems involving, directly or indirectly, quadratic terms. In this part we expand the quadratic and rotated quadratic cone family with power cones, which provide a convenient language to express models involving powers other than 2. We must stress that although the power cones include the quadratic cones as special cases, at the current state-of-the-art they require more advanced and less efficient algorithms.
4.1 The power cone(s)
The $n$-dimensional power cones form a family of convex cones parametrized by a real number $0 < \alpha < 1$:
\[ \mathcal{P}_n^{\alpha,1-\alpha} = \left\{ x \in \mathbb{R}^n : x_1^{\alpha}x_2^{1-\alpha} \geq \sqrt{x_3^2 + \cdots + x_n^2},\ x_1, x_2 \geq 0 \right\}. \qquad (4.1) \]
The constraint in the definition of $\mathcal{P}_n^{\alpha,1-\alpha}$ can be expressed as a composition of two constraints, one of which is a quadratic cone:
\[ x_1^{\alpha}x_2^{1-\alpha} \geq |z|, \qquad z \geq \sqrt{x_3^2 + \cdots + x_n^2}, \qquad (4.2) \]
which means that the basic building block we need to consider is the three-dimensional power cone
\[ \mathcal{P}_3^{\alpha,1-\alpha} = \left\{ x \in \mathbb{R}^3 : x_1^{\alpha}x_2^{1-\alpha} \geq |x_3|,\ x_1, x_2 \geq 0 \right\}. \qquad (4.3) \]
More generally, we can also consider power cones with "long left-hand side". That is, for $m < n$ and a sequence of exponents $\alpha_1, \ldots, \alpha_m$ with $\alpha_1 + \cdots + \alpha_m = 1$, we have the most general power cone object defined as
\[ \mathcal{P}_n^{\alpha_1,\ldots,\alpha_m} = \left\{ x \in \mathbb{R}^n : \prod_{i=1}^m x_i^{\alpha_i} \geq \sqrt{\sum_{j=m+1}^n x_j^2},\ x_1, \ldots, x_m \geq 0 \right\}. \qquad (4.4) \]
The left-hand side is nothing but the weighted geometric mean of the $x_i$, $i = 1, \ldots, m$ with weights $\alpha_i$. As we will see later, this most general cone can also be modeled as a composition of three-dimensional cones $\mathcal{P}_3^{\alpha,1-\alpha}$, so in a sense that is the basic object of interest.
There are some notable special cases we are familiar with. If we let $\alpha \to 0$ then in the limit we get $\mathcal{P}_n^{0,1} = \mathbb{R}_+ \times \mathcal{Q}^{n-1}$. If $\alpha = \frac12$ then we have a rescaled version of the rotated quadratic cone, precisely:
\[ (x_1, x_2, \ldots, x_n) \in \mathcal{P}_n^{\frac12,\frac12} \iff (x_1/\sqrt{2},\ x_2/\sqrt{2},\ x_3, \ldots, x_n) \in \mathcal{Q}_r^n. \]
A gallery of three-dimensional power cones for varying $\alpha$ is shown in Fig. 4.1.
4.2 Sets representable using the power cone
In this section we give basic examples of constraints which can be expressed using power cones.
4.2.1 Powers
For all values of $p \neq 0, 1$ we can bound $x^p$ depending on the convexity of $f(x) = x^p$.

- For $p > 1$ the inequality $t \geq |x|^p$ is equivalent to $t^{1/p} \geq |x|$ and hence corresponds to
\[ t \geq |x|^p \iff (t, 1, x) \in \mathcal{P}_3^{1/p,1-1/p}. \]
For instance $t \geq |x|^{1.5}$ is equivalent to $(t, 1, x) \in \mathcal{P}_3^{2/3,1/3}$.

- For $0 < p < 1$ the function $f(x) = x^p$ is concave for $x \geq 0$ and so we get
\[ |t| \leq x^p,\ x \geq 0 \iff (x, 1, t) \in \mathcal{P}_3^{p,1-p}. \]

- For $p < 0$ the function $f(x) = x^p$ is convex for $x > 0$ and in this range the inequality $t \geq x^p$ is equivalent to
\[ t \geq x^p \iff t^{1/(1-p)}x^{-p/(1-p)} \geq 1 \iff (t, x, 1) \in \mathcal{P}_3^{1/(1-p),-p/(1-p)}. \]
For example $t \geq \frac{1}{\sqrt{x}}$ is the same as $(t, x, 1) \in \mathcal{P}_3^{2/3,1/3}$.
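All three cases reduce to a membership test in the three-dimensional power cone, which is easy to check numerically. A small sketch (illustrative, not the book's code) exercises each bullet above:

```python
import numpy as np

def in_pow3(x1, x2, x3, alpha, tol=1e-9):
    """Membership in P_3^{alpha,1-alpha}: x1^a * x2^(1-a) >= |x3|, x1, x2 >= 0."""
    if x1 < -tol or x2 < -tol:
        return False
    return x1 ** alpha * x2 ** (1 - alpha) >= abs(x3) - tol

# p > 1:  t >= |x|^p  <=>  (t, 1, x) in P^{1/p, 1-1/p}; here p = 3/2.
assert in_pow3(8.0, 1.0, 4.0, alpha=2/3)          # 8 = 4^1.5: boundary
assert not in_pow3(7.9, 1.0, 4.0, alpha=2/3)

# 0 < p < 1:  |t| <= x^p  <=>  (x, 1, t) in P^{p, 1-p}; here p = 1/2.
assert in_pow3(9.0, 1.0, 3.0, alpha=0.5)          # 3 <= sqrt(9)

# p < 0 (here p = -1/2):  t >= 1/sqrt(x)  <=>  (t, x, 1) in P^{2/3, 1/3}.
assert in_pow3(0.5, 4.0, 1.0, alpha=2/3)          # 0.5 = 4^(-1/2): boundary
assert not in_pow3(0.4, 4.0, 1.0, alpha=2/3)
```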
4.2.2 ๐-norm cones
Let $p \geq 1$. The $p$-norm of a vector $x \in \mathbb{R}^n$ is $\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$ and the $p$-norm ball of radius $t$ is defined by the inequality $\|x\|_p \leq t$. We take the $p$-norm cone (in dimension $n+1$) to be the convex set
\[ \left\{ (t, x) \in \mathbb{R}^{n+1} : t \geq \|x\|_p \right\}. \qquad (4.5) \]
For $p = 2$ this is precisely the quadratic cone. We can model the $p$-norm cone by writing the inequality $t \geq \|x\|_p$ as:
\[ t \geq \sum_i |x_i|^p/t^{p-1} \]
Fig. 4.1: The boundary of $\mathcal{P}_3^{\alpha,1-\alpha}$ seen from a point inside the cone for $\alpha = 0.1, 0.2, 0.35, 0.5$.
and bounding each summand with a power cone. This leads to the following model:
\[ r_it^{p-1} \geq |x_i|^p \quad \big((r_i, t, x_i) \in \mathcal{P}_3^{1/p,1-1/p}\big), \qquad \sum_i r_i = t. \qquad (4.6) \]
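Model (4.6) is feasible exactly when $t \geq \|x\|_p$: with the tight choice $r_i = |x_i|^p/t^{p-1}$ each power cone holds with equality and $\sum_i r_i = \|x\|_p^p/t^{p-1}$, which is at most $t$ iff $t \geq \|x\|_p$. A numeric sketch of this feasibility test (illustrative, not the book's code):

```python
import numpy as np

def pnorm_decomposition_feasible(t, x, p, tol=1e-9):
    """Check (4.6) with the tight choice r_i = |x_i|^p / t^(p-1), which
    makes every power cone an equality; feasibility reduces to sum r_i <= t."""
    x = np.asarray(x, dtype=float)
    r = np.abs(x) ** p / t ** (p - 1)
    # Each (r_i, t, x_i) lies in P^{1/p,1-1/p} by construction:
    assert np.all(r ** (1 / p) * t ** (1 - 1 / p) >= np.abs(x) - tol)
    return np.sum(r) <= t + tol

x = np.array([1.0, -2.0, 3.0])
p = 3
norm = np.sum(np.abs(x) ** p) ** (1 / p)
assert pnorm_decomposition_feasible(norm, x, p)        # t = ||x||_p: tight
assert pnorm_decomposition_feasible(norm + 0.5, x, p)  # larger t: feasible
assert not pnorm_decomposition_feasible(norm - 0.1, x, p)
```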
When $0 < p < 1$ or $p < 0$ the formula for $\|x\|_p$ gives a concave, rather than convex, function on $\mathbb{R}_+^n$ and in this case it is possible to model the set
\[ \left\{ (t, x) : 0 \leq t \leq \Big( \sum_i x_i^p \Big)^{1/p},\ x_i \geq 0 \right\}, \qquad p < 1,\ p \neq 0. \]
We leave it as an exercise (see the previous subsection). The case $p = -1$ appears in Sec. 3.2.7.
4.2.3 The most general power cone
Consider the most general version of the power cone with "long left-hand side" defined in (4.4). We show that it can be expressed using the basic three-dimensional cones $\mathcal{P}_3^{\alpha,1-\alpha}$. Clearly it suffices to consider a short right-hand side, that is, to model the cone
\[ \mathcal{P}_{m+1}^{\alpha_1,\ldots,\alpha_m} = \left\{ x_1^{\alpha_1}x_2^{\alpha_2} \cdots x_m^{\alpha_m} \geq |z|,\ x_1, \ldots, x_m \geq 0 \right\}, \qquad (4.7) \]
where $\sum_i \alpha_i = 1$. Denote $s = \alpha_1 + \cdots + \alpha_{m-1}$. We split (4.7) into two constraints
\[ x_1^{\alpha_1/s} \cdots x_{m-1}^{\alpha_{m-1}/s} \geq |t|,\ x_1, \ldots, x_{m-1} \geq 0, \qquad t^sx_m^{\alpha_m} \geq |z|,\ x_m \geq 0, \qquad (4.8) \]
and this way we have expressed $\mathcal{P}_{m+1}^{\alpha_1,\ldots,\alpha_m}$ using the two power cones $\mathcal{P}_m^{\alpha_1/s,\ldots,\alpha_{m-1}/s}$ and $\mathcal{P}_3^{s,\alpha_m}$. Proceeding by induction gives the desired splitting.
4.2.4 Geometric mean
The power cone $\mathcal{P}_{n+1}^{1/n,\ldots,1/n}$ is a direct way to model the inequality
\[ |z| \leq (x_1x_2 \cdots x_n)^{1/n}, \qquad (4.9) \]
which corresponds to maximizing the geometric mean of the variables $x_i \geq 0$. In this special case tracing through the splitting (4.8) produces an equivalent representation of (4.9) using three-dimensional power cones as follows:
\[
\begin{array}{rcl}
x_1^{1/2}x_2^{1/2} & \geq & |t_3|, \\
t_3^{1-1/3}x_3^{1/3} & \geq & |t_4|, \\
& \cdots & \\
t_{n-1}^{1-1/(n-1)}x_{n-1}^{1/(n-1)} & \geq & |t_n|, \\
t_n^{1-1/n}x_n^{1/n} & \geq & |z|.
\end{array}
\]
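Running the tower with every inequality tight recovers exactly the geometric mean, since $t_k = (x_1 \cdots x_{k-1})^{1/(k-1)}$ at each stage. A sketch verifying this (illustrative, not the book's code):

```python
import numpy as np

def geomean_tower(x):
    """Run the tower of 3D power cones with every inequality tight and
    return the largest |z| it certifies: t_3 = (x1 x2)^(1/2),
    t_{k+1} = t_k^(1-1/k) * x_k^(1/k)."""
    x = np.asarray(x, dtype=float)
    t = np.sqrt(x[0] * x[1])                         # t_3
    for k in range(3, len(x) + 1):
        t = t ** (1 - 1 / k) * x[k - 1] ** (1 / k)   # t_{k+1}, ending with z
    return t

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = geomean_tower(x)
assert np.isclose(z, np.prod(x) ** (1 / len(x)))     # the geometric mean
```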
4.2.5 Non-homogeneous constraints

Every constraint of the form
\[ x_1^{\alpha_1}x_2^{\alpha_2} \cdots x_n^{\alpha_n} \geq |z|^{\beta},\ x_i \geq 0, \]
where $\sum_i \alpha_i < \beta$ and $\alpha_i > 0$, is equivalent to
\[ (x_1, x_2, \ldots, x_n, 1, z) \in \mathcal{P}_{n+2}^{\alpha_1/\beta,\ldots,\alpha_n/\beta,s} \]
with $s = 1 - \sum_i \alpha_i/\beta$.
4.3 Power cone case studies
4.3.1 Portfolio optimization with market impact
Let us go back to the Markowitz portfolio optimization problem introduced in Sec. 3.3.3, where we now ask to maximize expected profit subject to bounded risk in a long-only portfolio:
\[
\begin{array}{ll}
\mbox{maximize}   & \mu^Tx \\
\mbox{subject to} & \sqrt{x^T\Sigma x} \leq \gamma, \\
                  & x_i \geq 0, \quad i = 1, \ldots, n.
\end{array} \qquad (4.10)
\]
In a realistic model we would have to consider transaction costs which decrease the expected return. In particular, if a really large volume is traded then the trade itself will affect the price of the asset, a phenomenon called market impact. It is typically modeled by decreasing the expected return of the $i$-th asset by a slippage cost proportional to $x_i^{\beta}$ for some $\beta > 1$, so that the objective function changes to
\[ \mbox{maximize} \quad \mu^Tx - \sum_i \delta_ix_i^{\beta}. \]
A popular choice is $\beta = 3/2$. This objective can easily be modeled with a power cone as in Sec. 4.2:
\[
\begin{array}{ll}
\mbox{maximize}   & \mu^Tx - \delta^Tt \\
\mbox{subject to} & (t_i, 1, x_i) \in \mathcal{P}_3^{1/\beta,1-1/\beta} \quad (t_i \geq x_i^{\beta}), \\
                  & \cdots
\end{array}
\]
In particular, if $\beta = 3/2$ the inequality $t_i \geq x_i^{3/2}$ has the conic representation $(t_i, 1, x_i) \in \mathcal{P}_3^{2/3,1/3}$.
4.3.2 Maximum volume cuboid
Suppose we have a convex, conic representable set $K \subseteq \mathbb{R}^n$ (for instance a polyhedron, a ball, intersections of those and so on). We would like to find a maximum volume axis-parallel cuboid inscribed in $K$. If we denote by $p \in \mathbb{R}^n$ the leftmost corner and by $x_1, \ldots, x_n$ the edge lengths, then this problem is equivalent to
\[
\begin{array}{ll}
\mbox{maximize}   & t \\
\mbox{subject to} & t \leq (x_1 \cdots x_n)^{1/n}, \\
                  & x_i \geq 0, \\
                  & (p_1 + e_1x_1, \ldots, p_n + e_nx_n) \in K, \quad \forall e_1, \ldots, e_n \in \{0, 1\},
\end{array} \qquad (4.11)
\]
where the last constraint states that all vertices of the cuboid are in $K$. The optimal volume is then $v = t^n$. Modeling the geometric mean with power cones was discussed in Sec. 4.2.4.
Maximizing the volume of an arbitrary (not necessarily axis-parallel) cuboid inscribed in $K$ is no longer a convex problem. However, it can be solved by maximizing the solution to (4.11) over all sets $T(K)$ where $T$ is a rotation (orthogonal matrix) in $\mathbb{R}^n$. In practice one can approximate the global solution by sampling sufficiently many rotations $T$ or using more advanced methods of optimization over the orthogonal group.
Fig. 4.2: The maximal volume cuboid inscribed in the regular icosahedron takes up approximately 0.388 of the volume of the icosahedron (the exact value is $3(1+\sqrt{5})/25$).
4.3.3 $p$-norm geometric median
The geometric median of a sequence of points $x_1, \ldots, x_k \in \mathbb{R}^n$ is defined as
\[ \mathop{\mathrm{argmin}}_{y \in \mathbb{R}^n} \sum_{i=1}^k \|y - x_i\|, \]
that is, a point which minimizes the sum of distances to all the given points. Here $\|\cdot\|$ can be any norm on $\mathbb{R}^n$. The most classical case is the Euclidean norm, where the geometric median is the solution to the basic facility location problem minimizing total transportation cost from one depot to given destinations.
For a general $p$-norm $\|x\|_p$ with $1 \leq p < \infty$ (see Sec. 4.2.2) the geometric median is the solution to the obvious conic problem:
\[
\begin{array}{ll}
\mbox{minimize}   & \sum_i t_i \\
\mbox{subject to} & t_i \geq \|y - x_i\|_p, \\
                  & y \in \mathbb{R}^n.
\end{array} \qquad (4.12)
\]
In Sec. 4.2.2 we showed how to model the $p$-norm bound using $n$ power cones.

The Fermat-Torricelli point of a triangle is the Euclidean geometric median of its vertices, and a classical theorem in planar geometry (due to Torricelli, posed by Fermat) states that it is the unique point inside the triangle from which each edge is visible at an angle of $120^{\circ}$ (or a vertex if the triangle has an angle of $120^{\circ}$ or more). Using (4.12) we can compute the $p$-norm analogues of the Fermat-Torricelli point. Some examples are shown in Fig. 4.3.
Fig. 4.3: The geometric median of three triangle vertices in various $p$-norms.
4.3.4 Maximum likelihood estimator of a convex density function
In [TV98] the problem of estimating a density function that is known in advance to be convex is considered. Here we will show that this problem can be posed as a conic optimization problem. Formally the problem is to estimate an unknown convex density function $g : \mathbb{R}_+ \to \mathbb{R}_+$ given an ordered sample $y_1 < y_2 < \cdots < y_n$ of $n$ outcomes of a distribution with density $g$.
The estimator $f \geq 0$ is a piecewise linear function
\[ f : [y_1, y_n] \to \mathbb{R}_+ \]
with break points at $(y_i, x_i)$, $i = 1, \ldots, n$, where the variables $x_i > 0$ are estimators for $g(y_i)$. The slope of the $i$-th linear segment of $f$ is
\[ \frac{x_{i+1} - x_i}{y_{i+1} - y_i}. \]
Hence the convexity requirement leads to the constraints
\[ \frac{x_{i+1} - x_i}{y_{i+1} - y_i} \leq \frac{x_{i+2} - x_{i+1}}{y_{i+2} - y_{i+1}}, \qquad i = 1, \ldots, n-2. \]
Recall that the area under the density function must be 1. Hence,
\[ \sum_{i=1}^{n-1} (y_{i+1} - y_i)\left( \frac{x_{i+1} + x_i}{2} \right) = 1 \]
must hold. Therefore, the problem to be solved is
\[
\begin{array}{ll}
\mbox{maximize}   & \prod_{i=1}^n x_i \\
\mbox{subject to} & \frac{x_{i+1}-x_i}{y_{i+1}-y_i} - \frac{x_{i+2}-x_{i+1}}{y_{i+2}-y_{i+1}} \leq 0, \quad i = 1, \ldots, n-2, \\
                  & \sum_{i=1}^{n-1} (y_{i+1} - y_i)\left( \frac{x_{i+1}+x_i}{2} \right) = 1, \\
                  & x \geq 0.
\end{array}
\]
Maximizing the likelihood $\prod_{i=1}^n x_i$ or the geometric mean $(\prod_{i=1}^n x_i)^{\frac{1}{n}}$ will produce the same optimal solutions. Using that observation we can model the objective as shown in Sec. 4.2.4. Alternatively, one can use $\sum_i \log x_i$ as the objective and model it using the exponential cone as in Sec. 5.2.
Chapter 5
Exponential cone optimization
So far we discussed optimization problems involving the major "polynomial" families of cones: linear, quadratic and power cones. In this chapter we introduce a single new object, namely the three-dimensional exponential cone, together with examples and applications. The exponential cone can be used to model a variety of constraints involving exponentials and logarithms.
5.1 Exponential cone
The exponential cone is a convex subset of $\mathbb{R}^3$ defined as
\[ K_{\mathrm{exp}} = \left\{ (x_1, x_2, x_3) : x_1 \geq x_2e^{x_3/x_2},\ x_2 > 0 \right\} \cup \left\{ (x_1, 0, x_3) : x_1 \geq 0,\ x_3 \leq 0 \right\}. \qquad (5.1) \]
Thus the exponential cone is the closure in $\mathbb{R}^3$ of the set of points which satisfy
\[ x_1 \geq x_2e^{x_3/x_2}, \qquad x_1, x_2 > 0. \qquad (5.2) \]
When working with logarithms, a convenient reformulation of (5.2) is
\[ x_3 \leq x_2\log(x_1/x_2), \qquad x_1, x_2 > 0. \qquad (5.3) \]
Alternatively, one can write the same condition as
\[ x_1/x_2 \geq e^{x_3/x_2}, \qquad x_1, x_2 > 0, \]
which immediately shows that $K_{\mathrm{exp}}$ is in fact a cone, i.e. $\alpha x \in K_{\mathrm{exp}}$ for $x \in K_{\mathrm{exp}}$ and $\alpha \geq 0$. Convexity of $K_{\mathrm{exp}}$ follows from the fact that the Hessian of $f(x, y) = y\exp(x/y)$, namely
\[ D^2(f) = e^{x/y}\begin{bmatrix} y^{-1} & -xy^{-2} \\ -xy^{-2} & x^2y^{-3} \end{bmatrix}, \]
is positive semidefinite for $y > 0$.
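The Hessian above has zero determinant and nonnegative trace for $y > 0$, so it is rank-one positive semidefinite. This is easy to confirm numerically (an illustrative sketch, not the book's code):

```python
import numpy as np

def hessian(x, y):
    """Hessian of f(x, y) = y * exp(x / y) for y > 0."""
    e = np.exp(x / y)
    return e * np.array([[1 / y, -x / y ** 2],
                         [-x / y ** 2, x ** 2 / y ** 3]])

rng = np.random.default_rng(5)
for _ in range(200):
    x = rng.uniform(-3, 3)
    y = rng.uniform(0.5, 5)
    H = hessian(x, y)
    # Positive semidefinite: both eigenvalues >= 0 (up to rounding) ...
    assert np.all(np.linalg.eigvalsh(H) >= -1e-9 * np.linalg.norm(H))
    # ... and rank one: the determinant vanishes identically.
    assert abs(np.linalg.det(H)) <= 1e-9 * np.linalg.norm(H) ** 2
```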
Fig. 5.1: The boundary of the exponential cone $K_{\mathrm{exp}}$. The red isolines are graphs of $x_2 \mapsto x_2\log(x_1/x_2)$ for fixed $x_1$, see (5.3).
5.2 Modeling with the exponential cone
Extending the conic optimization toolbox with the exponential cone leads to new types of constraint building blocks and new types of representable sets. In this section we list the basic operations available using the exponential cone.
5.2.1 Exponential
The epigraph $t \geq e^x$ is a section of $K_{\mathrm{exp}}$:
\[ t \geq e^x \iff (t, 1, x) \in K_{\mathrm{exp}}. \qquad (5.4) \]
5.2.2 Logarithm
Similarly, we can express the hypograph $t \leq \log x$, $x \geq 0$:
\[ t \leq \log x \iff (x, 1, t) \in K_{\mathrm{exp}}. \qquad (5.5) \]
5.2.3 Entropy
The entropy function $H(x) = -x\log x$ can be maximized using the following representation, which follows directly from (5.3):
\[ t \leq -x\log x \iff t \leq x\log(1/x) \iff (1, x, t) \in K_{\mathrm{exp}}. \qquad (5.6) \]
5.2.4 Relative entropy
The relative entropy or Kullback-Leibler divergence of two probability distributions is defined in terms of the function $D(x, y) = x\log(x/y)$. It is convex, and the inequality $t \geq D(x, y)$ is equivalent to
\[ t \geq D(x, y) \iff -t \leq x\log(y/x) \iff (y, x, -t) \in K_{\mathrm{exp}}. \qquad (5.7) \]
Because of this reparametrization the exponential cone is also referred to as the relative entropy cone, leading to a class of problems known as REPs (relative entropy problems). Having the relative entropy function available makes it possible to express epigraphs of other functions appearing in REPs, for instance:
\[ x\log(1 + x/y) = D(x + y, y) + D(y, x + y). \]
5.2.5 Softplus function
In neural networks the function $s(x) = \log(1 + e^x)$, known as the softplus function, is used as an analytic approximation to the rectifier activation function $r(x) = x^+ = \max(0, x)$. The softplus function is convex and we can express its epigraph $t \geq \log(1 + e^x)$ by combining two exponential cones. Note that
\[ t \geq \log(1 + e^x) \iff e^{x-t} + e^{-t} \leq 1 \]
and therefore $t \geq \log(1 + e^x)$ is equivalent to the following set of conic constraints:
\[ u + v \leq 1, \qquad (u, 1, x - t) \in K_{\mathrm{exp}}, \qquad (v, 1, -t) \in K_{\mathrm{exp}}. \qquad (5.8) \]
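The equivalence behind (5.8) rests on $e^{x-t} + e^{-t} = e^{-t}(1 + e^x)$, which equals 1 exactly at $t = \log(1 + e^x)$. A quick numeric sketch (illustrative, not the book's code) with the tight choices $u = e^{x-t}$, $v = e^{-t}$:

```python
import numpy as np

def softplus_conic_feasible(t, x, tol=1e-12):
    """t >= log(1 + e^x) modeled as u + v <= 1 with the tight choices
    u = e^(x-t), v = e^(-t) that put both cone constraints at equality."""
    u = np.exp(x - t)
    v = np.exp(-t)
    return u + v <= 1 + tol

for x in [-3.0, 0.0, 2.5]:
    t = np.log1p(np.exp(x))                  # exact softplus value
    assert softplus_conic_feasible(t, x)     # boundary: u + v = 1
    assert softplus_conic_feasible(t + 0.1, x)
    assert not softplus_conic_feasible(t - 0.1, x)
```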
5.2.6 Log-sum-exp
We can generalize the previous example to a log-sum-exp (logarithm of a sum of exponentials) expression
\[ t \geq \log(e^{x_1} + \cdots + e^{x_n}). \]
This is equivalent to the inequality
\[ e^{x_1-t} + \cdots + e^{x_n-t} \leq 1, \]
and so it can be modeled as follows:
\[ \sum_i u_i \leq 1, \qquad (u_i, 1, x_i - t) \in K_{\mathrm{exp}},\ i = 1, \ldots, n. \qquad (5.9) \]
5.2.7 Log-sum-inv
The following type of bound has applications in capacity optimization for wireless network design:
\[ t \geq \log\left( \frac{1}{x_1} + \cdots + \frac{1}{x_n} \right), \qquad x_i > 0. \]
Since the logarithm is increasing, we can model this using a log-sum-exp and an exponential as:
\[ t \geq \log(e^{y_1} + \cdots + e^{y_n}), \qquad x_i \geq e^{-y_i},\ i = 1, \ldots, n. \]
Alternatively, one can also rewrite the original constraint in the equivalent form
\[ e^{-t} \leq s \leq \left( \frac{1}{x_1} + \cdots + \frac{1}{x_n} \right)^{-1}, \qquad x_i > 0, \]
and then model the right-hand side inequality using the technique from Sec. 3.2.7. This approach requires only one exponential cone.
5.2.8 Arbitrary exponential
The inequality
\[ t \geq a_1^{x_1}a_2^{x_2} \cdots a_n^{x_n}, \]
where the $a_i$ are arbitrary positive constants, is of course equivalent to
\[ t \geq \exp\left( \sum_i x_i\log a_i \right) \]
and therefore to $(t, 1, \sum_i x_i\log a_i) \in K_{\mathrm{exp}}$.
5.2.9 Lambert W-function
The Lambert function $W : \mathbb{R}_+ \to \mathbb{R}_+$ is the unique function satisfying the identity
\[ W(x)e^{W(x)} = x. \]
It is the real branch of a more general function which appears in applications such as diode modeling. The $W$ function is concave. Although there is no explicit analytic formula for $W(x)$, the hypograph $\{(x, t) : 0 \leq x,\ 0 \leq t \leq W(x)\}$ has an equivalent description:
\[ x \geq te^t = te^{t^2/t} \]
and so it can be modeled with a mix of exponential and quadratic cones (see Sec. 3.1.2):
\[ (x, t, u) \in K_{\mathrm{exp}} \quad (x \geq t\exp(u/t)), \qquad (1/2, u, t) \in \mathcal{Q}_r^3 \quad (u \geq t^2). \qquad (5.10) \]
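At the tight point $t = W(x)$, $u = t^2$, both constraints in (5.10) hold with equality and reproduce the defining identity $te^t = x$. A sketch checking this against SciPy's Lambert W implementation (illustrative, not the book's code; `scipy.special.lambertw` returns a complex number, so we take the real principal branch):

```python
import numpy as np
from scipy.special import lambertw

def w_conic_point(x):
    """For x > 0, the tight point of (5.10): t = W(x), u = t^2, so that
    x = t * exp(u / t) and u = t^2 hold with equality."""
    t = lambertw(x).real      # principal (real) branch
    return t, t ** 2

for x in [0.5, 1.0, np.e, 10.0]:
    t, u = w_conic_point(x)
    assert abs(t * np.exp(t) - x) < 1e-9          # defining identity of W
    assert x >= t * np.exp(u / t) - 1e-9          # (x, t, u) in K_exp
    assert 2 * 0.5 * u >= t ** 2 - 1e-12          # (1/2, u, t) in Q_r^3
```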
5.2.10 Other simple sets
Here are a few more typical sets which can be expressed using the exponential and quadratic cones. The representations should be self-explanatory; we leave the simple verifications to the reader.

Table 5.1: Sets representable with the exponential cone

  Set                                      Conic representation
  $t \geq (\log x)^2$, $0 < x \leq 1$      $(\tfrac12, t, u) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$, $x \leq 1$
  $t \leq \log\log x$, $x > 1$             $(u, 1, t) \in K_{\mathrm{exp}}$, $(x, 1, u) \in K_{\mathrm{exp}}$
  $t \geq (\log x)^{-1}$, $x > 1$          $(u, t, \sqrt{2}) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$
  $t \leq \sqrt{\log x}$, $x > 1$          $(\tfrac12, u, t) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$
  $t \leq \sqrt{x\log x}$, $x > 1$         $(x, u, \sqrt{2}t) \in \mathcal{Q}_r^3$, $(x, 1, u) \in K_{\mathrm{exp}}$
  $t \leq \log(1 - 1/x)$, $x > 1$          $(x, u, \sqrt{2}) \in \mathcal{Q}_r^3$, $(1 - u, 1, t) \in K_{\mathrm{exp}}$
  $t \geq \log(1 + 1/x)$, $x > 0$          $(x + 1, u, \sqrt{2}) \in \mathcal{Q}_r^3$, $(1 - u, 1, -t) \in K_{\mathrm{exp}}$
5.3 Geometric programming
Geometric optimization problems form a family of optimization problems with objective and constraints in a special polynomial form. It is a rich class of problems solved by reformulating in logarithmic-exponential form, and thus a major area of applications for the exponential cone $K_{\mathrm{exp}}$. Geometric programming is used in circuit design, chemical engineering, mechanical engineering and finance, just to mention a few applications. We refer to [BKVH07] for a survey and extensive bibliography.
5.3.1 Definition and basic examples
A monomial is a real valued function of the form
\[ f(x_1, \ldots, x_n) = cx_1^{a_1}x_2^{a_2} \cdots x_n^{a_n}, \qquad (5.11) \]
where the exponents $a_i$ are arbitrary real numbers and $c > 0$. A posynomial (positive polynomial) is a sum of monomials. Thus the difference between a posynomial and the standard notion of a multivariate polynomial known from algebra or calculus is that (i) posynomials can have arbitrary exponents, not just integers, but (ii) they can only have positive coefficients.
For example, the following functions are monomials (in variables $x, y, z$):
\[ xy, \qquad 2x^{1.5}y^{-1}x^{0.3}, \qquad 3\sqrt{xy/z}, \qquad 1 \qquad (5.12) \]
and the following are examples of posynomials:
\[ 2x + yz, \qquad 1.5x^3z + 5/y, \qquad (x^2y^2 + 3z^{-0.3})^4 + 1. \qquad (5.13) \]
A geometric program (GP) is an optimization problem of the form
\[
\begin{array}{ll}
\mbox{minimize}   & f_0(x) \\
\mbox{subject to} & f_i(x) \leq 1, \quad i = 1, \ldots, m, \\
                  & x_j > 0, \quad j = 1, \ldots, n,
\end{array} \qquad (5.14)
\]
where $f_0, \ldots, f_m$ are posynomials and $x = (x_1, \ldots, x_n)$ is the variable vector.
A geometric program (5.14) can be modeled in exponential conic form by making the substitution
\[ x_j = e^{y_j}, \qquad j = 1, \ldots, n. \]
Under this substitution a monomial of the form (5.11) becomes
\[ ce^{a_1y_1}e^{a_2y_2} \cdots e^{a_ny_n} = \exp(a_*^Ty + \log c) \]
for $a_* = (a_1, \ldots, a_n)$. Consequently, the optimization problem (5.14) takes the equivalent form
\[
\begin{array}{ll}
\mbox{minimize}   & t \\
\mbox{subject to} & \log\big(\sum_k \exp(a_{0,k,*}^Ty + \log c_{0,k})\big) \leq t, \\
                  & \log\big(\sum_k \exp(a_{i,k,*}^Ty + \log c_{i,k})\big) \leq 0, \quad i = 1, \ldots, m,
\end{array} \qquad (5.15)
\]
where $a_{i,k} \in \mathbb{R}^n$ and $c_{i,k} \in \mathbb{R}$ for all $i, k$. These are now log-sum-exp constraints we already discussed in Sec. 5.2.6. In particular, the problem (5.15) is convex, as opposed to the posynomial formulation (5.14).
Example
We demonstrate this reduction on a simple example. Take the geometric problem
\[
\begin{array}{ll}
\mbox{minimize} & x + y^2z \\
\mbox{subject to} & 0.1\sqrt{x} + 2y^{-1} \leq 1, \\
& z^{-1} + yx^{-2} \leq 1.
\end{array}
\]
By substituting $x=e^u$, $y=e^v$, $z=e^w$ we get
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & \log(e^u+e^{2v+w}) \leq t, \\
& \log(e^{0.5u+\log 0.1}+e^{-v+\log 2}) \leq 0, \\
& \log(e^{-w}+e^{v-2u}) \leq 0,
\end{array}
\]
and using the log-sum-exp reduction from Sec. 5.2.6 we write an explicit conic problem:
\[
\begin{array}{lll}
\mbox{minimize} & t & \\
\mbox{subject to} & (p_1,1,u-t),\ (q_1,1,2v+w-t) \in K_{\exp}, & p_1+q_1 \leq 1, \\
& (p_2,1,0.5u+\log 0.1),\ (q_2,1,-v+\log 2) \in K_{\exp}, & p_2+q_2 \leq 1, \\
& (p_3,1,-w),\ (q_3,1,v-2u) \in K_{\exp}, & p_3+q_3 \leq 1.
\end{array}
\]
Solving this problem yields (๐ฅ, ๐ฆ, ๐ง) โ (3.14, 2.43, 1.32).
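As a quick cross-check, the convex log-transformed problem above can also be handed to a general-purpose NLP solver instead of an exponential-cone solver. A minimal sketch using scipy's SLSQP (starting point and tolerances are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize

# Stable log-sum-exp of a small vector of affine expressions.
def lse(terms):
    return np.log(np.sum(np.exp(terms)))

# Minimizing log(x + y^2 z) is equivalent to minimizing x + y^2 z.
def obj(yv):
    u, v, w = yv
    return lse(np.array([u, 2*v + w]))

# SLSQP inequality constraints are g(y) >= 0, so we negate the
# log-sum-exp constraints log(...) <= 0.
cons = [
    {"type": "ineq",
     "fun": lambda yv: -lse(np.array([0.5*yv[0] + np.log(0.1),
                                      -yv[1] + np.log(2.0)]))},
    {"type": "ineq",
     "fun": lambda yv: -lse(np.array([-yv[2], yv[1] - 2*yv[0]]))},
]

res = minimize(obj, x0=np.zeros(3), constraints=cons, method="SLSQP")
x, y, z = np.exp(res.x)
print(x, y, z)       # close to the quoted (3.14, 2.43, 1.32)
```

Undoing the substitution $x=e^u$, $y=e^v$, $z=e^w$ recovers a feasible point of the original posynomial problem.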
5.3.2 Generalized geometric models
In this section we briefly discuss more general types of constraints which can be modeled with geometric programs.
Monomials
If $m(x)$ is a monomial then the constraint $m(x)=c$ is equivalent to the two posynomial inequalities $m(x)c^{-1}\leq 1$ and $m(x)^{-1}c\leq 1$, so it can be expressed in the language of geometric programs. In practice it should be added to the model (5.15) as a linear constraint
\[ \sum_j a_jy_j = \log c. \]
Monomial inequalities $m(x)\leq c$ and $m(x)\geq c$ should similarly be modeled as linear inequalities in the $y_j$ variables.
In a similar vein, if $f(x)$ is a posynomial and $m(x)$ is a monomial then $f(x)\leq m(x)$ is still a posynomial inequality because it can be written as $f(x)m(x)^{-1}\leq 1$.

It also means that we can add lower and upper variable bounds: $0<b_1\leq x\leq b_2$ is equivalent to $b_1x^{-1}\leq 1$ and $b_2^{-1}x\leq 1$.
Products and positive powers
Expressions involving products and positive powers (possibly iterated) of posynomials can again be modeled with posynomials. For example, a constraint such as
\[ \left((xy^2+z)^{0.3}+y\right)\left(1/x+1/y\right)^{2.2} \leq 1 \]
can be replaced with
\[ xy^2+z\leq t, \quad t^{0.3}+y\leq u, \quad x^{-1}+y^{-1}\leq v, \quad uv^{2.2}\leq 1. \]
Other transformations and extensions
โข If $f_1,f_2$ are already expressed by posynomials then $\max\{f_1(x),f_2(x)\}\leq t$ is clearly equivalent to $f_1(x)\leq t$ and $f_2(x)\leq t$. Hence we can add the maximum operator to the list of building blocks for geometric programs.
โข If $f,g$ are posynomials, $m$ is a monomial and we know that $m(x)\geq g(x)$ then the constraint $\frac{f(x)}{m(x)-g(x)}\leq t$ is equivalent to $t^{-1}f(x)m(x)^{-1}+g(x)m(x)^{-1}\leq 1$.
โข The objective function of a geometric program (5.14) can be extended to include other terms, for example:
\[ \mbox{minimize} \quad f_0(x) + \sum_k \log g_k(x) \]
where $g_k(x)$ are monomials. After the change of variables $x=e^y$ we get a slightly modified version of (5.15):
\[
\begin{array}{ll}
\mbox{minimize} & t + c^Ty \\
\mbox{subject to} & \sum_k \exp(a_{0,k}^Ty+\log c_{0,k}) \leq t, \\
& \log\left(\sum_k \exp(a_{i,k}^Ty+\log c_{i,k})\right) \leq 0, \quad i=1,\ldots,m,
\end{array}
\]
(note the lack of one logarithm) which can still be expressed with exponential cones.
5.3.3 Geometric programming case studies
Frobenius norm diagonal scaling
Suppose we have a matrix $M\in\mathbb{R}^{n\times n}$ and we want to rescale the coordinate system using a diagonal matrix $D=\mathbf{Diag}(d_1,\ldots,d_n)$ with $d_i>0$. In the new basis the linear transformation given by $M$ will now be described by the matrix $DMD^{-1}=(M_{ij}d_id_j^{-1})_{i,j}$. To choose $D$ which leads to a "small" rescaling we can for example minimize the Frobenius norm
\[ \|DMD^{-1}\|_F^2 = \sum_{i,j}\left((DMD^{-1})_{ij}\right)^2 = \sum_{i,j} M_{ij}^2 d_i^2 d_j^{-2}. \]
Minimizing the last sum is an example of a geometric program with variables $d_i$ (and without constraints).
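In log-variables $y_i=\log d_i$ the sum becomes $\sum_{i,j}M_{ij}^2\exp(2(y_i-y_j))$, an unconstrained convex problem that any smooth solver can handle. A minimal sketch with a random matrix $M$ (the data is made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))

# Squared Frobenius norm of D M D^-1 in log-variables y = log d:
# sum_{i,j} M_ij^2 exp(2(y_i - y_j)), convex in y.
def frob2(y):
    d = np.exp(y)
    return np.sum((M * np.outer(d, 1.0 / d))**2)

res = minimize(frob2, np.zeros(5), method="BFGS")
print(frob2(res.x), np.sum(M**2))  # scaled vs. original squared norm
```

Since $y=0$ gives $D=I$, the minimized value can never exceed the original $\|M\|_F^2$.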
Maximum likelihood estimation
Geometric programs appear naturally in connection with maximum likelihood estimation of parameters of random distributions. Consider a simple example. Suppose we have two biased coins, with head probabilities $\theta$ and $2\theta$, respectively. We toss both coins and count the total number of heads. Given that, in the long run, we observed $i$ heads $n_i$ times for $i=0,1,2$, estimate the value of $\theta$.
The probability of obtaining the given outcome equals
\[ \binom{n_0+n_1+n_2}{n_0}\binom{n_1+n_2}{n_1} (\theta\cdot 2\theta)^{n_2} \left(\theta(1-2\theta)+2\theta(1-\theta)\right)^{n_1} \left((1-\theta)(1-2\theta)\right)^{n_0}, \]
and, up to constant factors, maximizing that expression is equivalent to solving the problem
\[
\begin{array}{ll}
\mbox{maximize} & \theta^{2n_2+n_1} r^{n_1} p^{n_0} q^{n_0} \\
\mbox{subject to} & p \leq 1-\theta, \\
& q \leq 1-2\theta, \\
& r \leq 3-4\theta,
\end{array}
\]
or, as a geometric problem:
\[
\begin{array}{ll}
\mbox{minimize} & \theta^{-2n_2-n_1} r^{-n_1} p^{-n_0} q^{-n_0} \\
\mbox{subject to} & p+\theta \leq 1, \\
& q+2\theta \leq 1, \\
& \frac13 r + \frac43 \theta \leq 1.
\end{array}
\]
For example, if $(n_0,n_1,n_2)=(30,53,16)$ then the above problem solves with $\theta=0.29$.
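Because the model has a single parameter, the same estimate can be cross-checked by maximizing the log-likelihood directly over $\theta\in(0,\tfrac12)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

n0, n1, n2 = 30, 53, 16

# Negative log-likelihood of the two-coin model, up to constants:
# L(theta) ~ (2 theta^2)^n2 (theta(1-2theta)+2theta(1-theta))^n1
#            ((1-theta)(1-2theta))^n0.
def nll(theta):
    return -(n2 * np.log(2 * theta**2)
             + n1 * np.log(theta * (1 - 2*theta) + 2 * theta * (1 - theta))
             + n0 * np.log((1 - theta) * (1 - 2*theta)))

res = minimize_scalar(nll, bounds=(1e-6, 0.5 - 1e-6), method="bounded")
print(res.x)   # about 0.29
```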
An Olympiad problem
The 26th Vojtฤch Jarnรญk International Mathematical Competition, Ostrava 2016. Let $a,b,c$ be positive real numbers with $a+b+c=1$. Prove that
\[ \left(\frac1a+\frac{1}{bc}\right)\left(\frac1b+\frac{1}{ac}\right)\left(\frac1c+\frac{1}{ab}\right) \geq 1728.
\]
Using the tricks introduced in Sec. 5.3.2 we formulate this problem as a geometric program:
\[
\begin{array}{ll}
\mbox{minimize} & pqr \\
\mbox{subject to} & a^{-1}p^{-1} + b^{-1}c^{-1}p^{-1} \leq 1, \\
& b^{-1}q^{-1} + a^{-1}c^{-1}q^{-1} \leq 1, \\
& c^{-1}r^{-1} + a^{-1}b^{-1}r^{-1} \leq 1, \\
& a+b+c \leq 1.
\end{array}
\]
Unsurprisingly, the optimal value of this program is 1728, achieved for $(a,b,c,p,q,r)=\left(\frac13,\frac13,\frac13,12,12,12\right)$.
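A direct numerical sanity check of the inequality: the product equals exactly $12^3=1728$ at $a=b=c=\tfrac13$, and random points on the simplex never fall below that value.

```python
import numpy as np

# The left-hand side of the Olympiad inequality.
def F(a, b, c):
    return (1/a + 1/(b*c)) * (1/b + 1/(a*c)) * (1/c + 1/(a*b))

print(F(1/3, 1/3, 1/3))   # each factor is 3 + 9 = 12, product 1728

# Sample random positive triples with a + b + c = 1.
rng = np.random.default_rng(1)
vals = [F(*rng.dirichlet([1.0, 1.0, 1.0])) for _ in range(1000)]
print(min(vals))          # never below 1728
```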
Power control and rate allocation in wireless networks
We consider a basic wireless network power control problem. In a wireless network with $n$ logical transmitter-receiver pairs, if the power output of transmitter $j$ is $p_j$ then the power received by receiver $i$ is $G_{ij}p_j$, where $G_{ij}$ models path gain and fading effects. If the $i$-th receiver's own noise is $\sigma_i$ then the signal-to-interference-plus-noise ratio (SINR) of receiver $i$ is given by
\[ S_i = \frac{G_{ii}p_i}{\sigma_i + \sum_{k\neq i} G_{ik}p_k}. \qquad (5.16) \]
Maximizing the minimal SINR over all receivers ($\max\min_i S_i$), subject to bounded power output of the transmitters, is equivalent to the geometric program with variables $p_1,\ldots,p_n,t$:
\[
\begin{array}{ll}
\mbox{minimize} & t^{-1} \\
\mbox{subject to} & p_{\min} \leq p_i \leq p_{\max}, \quad i=1,\ldots,n, \\
& t\left(\sigma_i + \sum_{k\neq i} G_{ik}p_k\right) G_{ii}^{-1} p_i^{-1} \leq 1, \quad i=1,\ldots,n.
\end{array} \qquad (5.17)
\]
In the low-SNR regime the problem of system rate maximization is approximated by the problem of maximizing $\sum_i \log S_i$, or equivalently minimizing $\prod_i S_i^{-1}$. This is a geometric problem with variables $p_i, S_i$:
\[
\begin{array}{ll}
\mbox{minimize} & S_1^{-1}\cdots S_n^{-1} \\
\mbox{subject to} & p_{\min} \leq p_i \leq p_{\max}, \quad i=1,\ldots,n, \\
& S_i\left(\sigma_i + \sum_{k\neq i} G_{ik}p_k\right) G_{ii}^{-1} p_i^{-1} \leq 1, \quad i=1,\ldots,n.
\end{array} \qquad (5.18)
\]
For more information and examples see [BKVH07].
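As an aside, for fixed $t$ the SINR constraints in (5.17) are linear in $p$, so max-min SINR can also be found by bisection on $t$ with a linear feasibility problem in each step โ a classical alternative to solving the GP directly. A minimal sketch with made-up gains $G$, noise $\sigma$ and power limits:

```python
import numpy as np
from scipy.optimize import linprog

G = np.array([[1.0, 0.1, 0.2],
              [0.1, 1.0, 0.1],
              [0.3, 0.1, 1.0]])
sigma = np.array([0.1, 0.1, 0.1])
pmin, pmax = 0.01, 1.0
n = len(sigma)

def feasible(t):
    # t*(sigma_i + sum_{k != i} G_ik p_k) - G_ii p_i <= 0 for all i,
    # with pmin <= p <= pmax; zero objective = pure feasibility check.
    A = t * (G - np.diag(np.diag(G))) - np.diag(np.diag(G))
    b = -t * sigma
    res = linprog(np.zeros(n), A_ub=A, b_ub=b,
                  bounds=[(pmin, pmax)] * n)
    return res.status == 0

lo, hi = 0.0, 100.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if feasible(mid) else (lo, mid)
print(lo)   # best achievable min-SINR, up to bisection tolerance
```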
5.4 Exponential cone case studies
In this section we introduce some practical optimization problems where the exponential cone comes in handy.
5.4.1 Risk parity portfolio
Consider a simple version of the Markowitz portfolio optimization problem introduced in Sec. 3.3.3, where we simply ask to minimize the risk $\sigma(x)=\sqrt{x^T\Sigma x}$ of a fully-invested long-only portfolio:
\[
\begin{array}{ll}
\mbox{minimize} & \sqrt{x^T\Sigma x} \\
\mbox{subject to} & \sum_{i=1}^n x_i = 1, \\
& x_i \geq 0, \quad i=1,\ldots,n,
\end{array} \qquad (5.19)
\]
where $\Sigma$ is a symmetric positive definite covariance matrix. We can derive from the first-order optimality conditions that the solution to (5.19) satisfies $\frac{\partial\sigma}{\partial x_i}=\frac{\partial\sigma}{\partial x_j}$ whenever $x_i,x_j>0$, i.e. marginal risk contributions of positively invested assets are equal. In practice this often leads to concentrated portfolios, whereas it would benefit diversification to consider portfolios where all assets have the same total contribution to risk:
\[ x_i\frac{\partial\sigma}{\partial x_i} = x_j\frac{\partial\sigma}{\partial x_j}, \quad i,j=1,\ldots,n. \qquad (5.20) \]
We call (5.20) the risk parity condition. It indeed models equal risk contribution from all the assets, because as one can easily check $\frac{\partial\sigma}{\partial x_i}=\frac{(\Sigma x)_i}{\sqrt{x^T\Sigma x}}$ and
\[ \sigma(x) = \sum_i x_i\frac{\partial\sigma}{\partial x_i}. \]
Risk parity portfolios satisfying condition (5.20) can be found with an auxiliary optimization problem:
\[
\begin{array}{ll}
\mbox{minimize} & \sqrt{x^T\Sigma x} - c\sum_i \log x_i \\
\mbox{subject to} & x_i > 0, \quad i=1,\ldots,n,
\end{array} \qquad (5.21)
\]
for any $c>0$. More precisely, the gradient of the objective function in (5.21) is zero when $\frac{\partial\sigma}{\partial x_i}=c/x_i$ for all $i$, implying the parity condition (5.20) holds. Since (5.20) is scale-invariant, we can rescale any solution of (5.21) and get a fully-invested risk parity portfolio with $\sum_i x_i=1$.
The conic form of problem (5.21) is:
\[
\begin{array}{lll}
\mbox{minimize} & t - c\,e^Ts & \\
\mbox{subject to} & (t,\Sigma^{1/2}x) \in \mathcal{Q}^{n+1}, & (t \geq \sqrt{x^T\Sigma x}), \\
& (x_i,1,s_i) \in K_{\exp}, & (s_i \leq \log x_i).
\end{array} \qquad (5.22)
\]
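Problem (5.21) is smooth on the positive orthant, so it can also be solved directly in log-variables $y=\log x$ (which enforce $x>0$ automatically). A minimal sketch with a made-up covariance matrix; the risk contributions $x_i(\Sigma x)_i$ of the rescaled solution come out (nearly) equal:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Sigma = B @ B.T + 5 * np.eye(5)   # a positive definite "covariance"
c = 1.0

# Objective of (5.21) in log-variables y = log x (unconstrained).
def f(y):
    x = np.exp(y)
    return np.sqrt(x @ Sigma @ x) - c * np.sum(y)

res = minimize(f, np.zeros(5), method="BFGS")
x = np.exp(res.x)
x /= x.sum()                      # rescale to a fully-invested portfolio

contrib = x * (Sigma @ x)         # proportional to x_i * dsigma/dx_i
print(contrib / contrib.sum())    # roughly equal entries
```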
5.4.2 Entropy maximization
A general entropy maximization problem has the form
\[
\begin{array}{ll}
\mbox{maximize} & -\sum_i p_i\log p_i \\
\mbox{subject to} & \sum_i p_i = 1, \\
& p_i \geq 0, \\
& p \in \mathcal{I},
\end{array} \qquad (5.23)
\]
where $\mathcal{I}$ defines additional constraints on the probability distribution $p$ (these are known as prior information). In the absence of complete information about $p$ the maximum entropy principle of Jaynes posits to choose the distribution which maximizes uncertainty, that is entropy, subject to what is known. Practitioners think of the solution to (5.23) as the most random or most conservative of distributions consistent with $\mathcal{I}$.
Maximization of the entropy function $H(x)=-x\log x$ was explained in Sec. 5.2.

Often one has an a priori distribution $q$, and one tries to minimize the distance between $p$ and $q$, while remaining consistent with $\mathcal{I}$. In this case it is standard to minimize the Kullback-Leibler divergence
\[ D_{KL}(p\|q) = \sum_i p_i\log(p_i/q_i), \]
which leads to an optimization problem
\[
\begin{array}{ll}
\mbox{minimize} & \sum_i p_i\log(p_i/q_i) \\
\mbox{subject to} & \sum_i p_i = 1, \\
& p_i \geq 0, \\
& p \in \mathcal{I}.
\end{array} \qquad (5.24)
\]
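A concrete instance of (5.24), with a hypothetical prior set $\mathcal{I}$ given by a single mean constraint $\sum_i a_ip_i=m$ (the data below is made up). Since the feasible set is small and smooth, a generic solver suffices for illustration:

```python
import numpy as np
from scipy.optimize import minimize

q = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # prior distribution
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # outcome values
m = 2.5                                   # required mean under p

def kl(p):
    return np.sum(p * np.log(p / q))

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
        {"type": "eq", "fun": lambda p: a @ p - m}]
res = minimize(kl, q.copy(), constraints=cons,
               bounds=[(1e-9, 1.0)] * 5, method="SLSQP")
p = res.x
print(p, a @ p)
```

The optimality conditions give the familiar exponential tilting $p_i \propto q_i e^{\lambda a_i}$ for some multiplier $\lambda$.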
5.4.3 Hitting time of a linear system
Consider a linear dynamical system
\[ \mathbf{x}'(t) = A\mathbf{x}(t), \qquad (5.25) \]
where we assume for simplicity that $A=\mathbf{Diag}(\lambda_1,\ldots,\lambda_n)$ with $\lambda_i<0$, with initial condition $\mathbf{x}(0)_i=x_i$. The resulting dynamical system $\mathbf{x}(t)=\mathbf{x}(0)\exp(At)$ converges to $0$ and one can ask, for instance, for the time it takes to approach the limit up to distance $\epsilon$. The resulting optimization problem is
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & \sqrt{\sum_i (x_i\exp(\lambda_it))^2} \leq \epsilon,
\end{array}
\]
with the following conic form, where the variables are $t,q_1,\ldots,q_n$:
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & (\epsilon,x_1q_1,\ldots,x_nq_n) \in \mathcal{Q}^{n+1}, \\
& (q_i,1,\lambda_it) \in K_{\exp}, \quad i=1,\ldots,n.
\end{array}
\]
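For this particular problem the distance $\|\mathbf{x}(t)\|$ is strictly decreasing, so the optimal $t$ is simply the root of $\|\mathbf{x}(t)\|-\epsilon$; this gives an easy one-dimensional cross-check of the conic model, using the data of Fig. 5.2:

```python
import numpy as np
from scipy.optimize import brentq

lam = np.array([-0.3, -0.06])    # diagonal of A
x0 = np.array([2.2, 1.3])        # starting point x(0)
eps = 0.5

# ||x(t)|| decreases monotonically, so the hitting time is the
# unique root of ||x(t)|| - eps on [0, 100].
def dist(t):
    return np.linalg.norm(x0 * np.exp(lam * t)) - eps

t_hit = brentq(dist, 0.0, 100.0)
print(t_hit)   # about 15.936, matching Fig. 5.2
```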
See Fig. 5.2 for an example. Other criteria for the target set of the trajectories are also possible. For example, polyhedral constraints
\[ c^T\mathbf{x} \leq d, \quad c\in\mathbb{R}^n_+,\ d\in\mathbb{R}_+ \]
are also expressible in exponential conic form for starting points $\mathbf{x}(0)\in\mathbb{R}^n_+$, since they correspond to log-sum-exp constraints of the form
\[ \log\left(\sum_i \exp(\lambda_it + \log(c_ix_i))\right) \leq \log d. \]
For a robust version involving uncertainty on $A$ and $\mathbf{x}(0)$ see [CS14].
Fig. 5.2: With $A=\mathbf{Diag}(-0.3,-0.06)$ and starting point $\mathbf{x}(0)=(2.2,1.3)$ the trajectory reaches distance $\epsilon=0.5$ from the origin at time $t\approx 15.936$.
5.4.4 Logistic regression
Logistic regression is a technique of training a binary classifier. We are given a training set of examples $x_1,\ldots,x_n\in\mathbb{R}^d$ together with labels $y_i\in\{0,1\}$. The goal is to train a classifier capable of assigning new data points to either class 0 or 1. Specifically, the labels should be assigned by computing
\[ h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)} \qquad (5.26) \]
and choosing label $y=0$ if $h_\theta(x)<\frac12$ and $y=1$ for $h_\theta(x)\geq\frac12$. Here $h_\theta(x)$ is interpreted as the probability that $x$ belongs to class 1. The optimal parameter vector $\theta$ should be learned from the training set, so as to maximize the likelihood function:
\[ \prod_i h_\theta(x_i)^{y_i}(1-h_\theta(x_i))^{1-y_i}. \]
By taking logarithms and adding regularization with respect to $\theta$ we reach an unconstrained optimization problem
\[ \mathop{\mbox{minimize}}_{\theta\in\mathbb{R}^d}\quad \lambda\|\theta\|_2 + \sum_i \left(-y_i\log(h_\theta(x_i)) - (1-y_i)\log(1-h_\theta(x_i))\right). \qquad (5.27) \]
Problem (5.27) is convex, and can be more explicitly written as
\[
\begin{array}{lll}
\mbox{minimize} & \sum_i t_i + \lambda r & \\
\mbox{subject to} & t_i \geq -\log(h_\theta(x_i)) = \log(1+\exp(-\theta^Tx_i)) & \mbox{if } y_i=1, \\
& t_i \geq -\log(1-h_\theta(x_i)) = \log(1+\exp(\theta^Tx_i)) & \mbox{if } y_i=0, \\
& r \geq \|\theta\|_2,
\end{array}
\]
involving softplus type constraints (see Sec. 5.2) and a quadratic cone. See Fig. 5.3 for an example.
Fig. 5.3: Logistic regression example with none, medium and strong regularization (small, medium, large $\lambda$). The two-dimensional dataset was converted into a feature vector $x\in\mathbb{R}^{28}$ using monomial coordinates of degrees at most 6. Without regularization we get obvious overfitting.
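The unconstrained form (5.27) can be minimized directly with a smooth solver, using `np.logaddexp(0, s)` as a numerically stable softplus $\log(1+e^s)$. A minimal sketch on a made-up, nearly separable dataset:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = (X @ true_theta + 0.3 * rng.standard_normal(200) > 0).astype(float)
lam = 0.1

# Regularized negative log-likelihood of (5.27); logaddexp(0, s)
# computes log(1 + e^s) without overflow.
def loss(theta):
    s = X @ theta
    return (np.sum(y * np.logaddexp(0.0, -s)
                   + (1 - y) * np.logaddexp(0.0, s))
            + lam * np.linalg.norm(theta))

theta = minimize(loss, 0.01 * np.ones(3), method="BFGS").x
acc = np.mean((X @ theta > 0) == (y > 0.5))
print(acc)   # high training accuracy on this data
```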
Chapter 6
Semidefinite optimization
In this chapter we extend the conic optimization framework introduced before with symmetric positive semidefinite matrix variables.
6.1 Introduction to semidefinite matrices
6.1.1 Semidefinite matrices and cones
A symmetric matrix $X\in\mathcal{S}^n$ is called symmetric positive semidefinite if
\[ z^TXz \geq 0, \quad \forall z\in\mathbb{R}^n. \]
We then define the cone of symmetric positive semidefinite matrices as
\[ \mathcal{S}^n_+ = \{X\in\mathcal{S}^n \mid z^TXz\geq 0,\ \forall z\in\mathbb{R}^n\}. \qquad (6.1) \]
For brevity we will often use the shorter notion semidefinite instead of symmetric positive semidefinite, and we will write $X\succeq Y$ ($X\preceq Y$) as shorthand notation for $(X-Y)\in\mathcal{S}^n_+$ ($(Y-X)\in\mathcal{S}^n_+$). As inner product for semidefinite matrices, we use the standard trace inner product for general matrices, i.e.,
\[ \langle A,B\rangle := \mathrm{tr}(A^TB) = \sum_{ij} a_{ij}b_{ij}. \]
It is easy to see that (6.1) indeed specifies a convex cone; it is pointed (with origin $X=0$), and $X,Y\in\mathcal{S}^n_+$ implies that $(\alpha X+\beta Y)\in\mathcal{S}^n_+$ for $\alpha,\beta\geq 0$. Let us review a few equivalent definitions of $\mathcal{S}^n_+$. It is well-known that every symmetric matrix $A$ has a spectral factorization
\[ A = \sum_{i=1}^n \lambda_i q_iq_i^T, \]
where $q_i\in\mathbb{R}^n$ are the (orthogonal) eigenvectors and $\lambda_i$ are eigenvalues of $A$. Using the spectral factorization of $A$ we have
\[ x^TAx = \sum_{i=1}^n \lambda_i(x^Tq_i)^2, \]
which shows that $x^TAx\geq 0$ $\Leftrightarrow$ $\lambda_i\geq 0$, $i=1,\ldots,n$. In other words,
\[ \mathcal{S}^n_+ = \{X\in\mathcal{S}^n \mid \lambda_i(X)\geq 0,\ i=1,\ldots,n\}. \qquad (6.2) \]
Another useful characterization is that $A\in\mathcal{S}^n_+$ if and only if it is a Grammian matrix $A=V^TV$. Here $V$ is called the Cholesky factor of $A$. Using the Grammian representation we have
\[ x^TAx = x^TV^TVx = \|Vx\|_2^2, \]
i.e., if $A=V^TV$ then $x^TAx\geq 0$ for all $x$. On the other hand, from the positive spectral factorization $A=Q\Lambda Q^T$ we have $A=V^TV$ with $V=\Lambda^{1/2}Q^T$, where $\Lambda^{1/2}=\mathbf{Diag}(\sqrt{\lambda_1},\ldots,\sqrt{\lambda_n})$. We thus have the equivalent characterization
\[ \mathcal{S}^n_+ = \{X\in\mathcal{S}^n \mid X=V^TV \mbox{ for some } V\in\mathbb{R}^{n\times n}\}. \qquad (6.3) \]
In a completely analogous way we define the cone of symmetric positive definite matrices as
\[
\begin{array}{rcl}
\mathcal{S}^n_{++} & = & \{X\in\mathcal{S}^n \mid z^TXz>0,\ \forall z\in\mathbb{R}^n\setminus\{0\}\} \\
& = & \{X\in\mathcal{S}^n \mid \lambda_i(X)>0,\ i=1,\ldots,n\} \\
& = & \{X\in\mathcal{S}^n \mid X=V^TV \mbox{ for some } V\in\mathbb{R}^{n\times n},\ \mathrm{rank}(V)=n\},
\end{array}
\]
and we write $X\succ Y$ ($X\prec Y$) as shorthand notation for $(X-Y)\in\mathcal{S}^n_{++}$ ($(Y-X)\in\mathcal{S}^n_{++}$).
The one-dimensional cone $\mathcal{S}^1_+$ simply corresponds to $\mathbb{R}_+$. Similarly consider
\[ X = \begin{bmatrix} x_1 & x_3 \\ x_3 & x_2 \end{bmatrix} \]
with determinant $\det(X)=x_1x_2-x_3^2=\lambda_1\lambda_2$ and trace $\mathrm{tr}(X)=x_1+x_2=\lambda_1+\lambda_2$. Therefore $X$ has nonnegative eigenvalues if and only if
\[ x_1x_2 \geq x_3^2, \quad x_1,x_2\geq 0, \]
which characterizes a three-dimensional scaled rotated cone, i.e.,
\[ \begin{bmatrix} x_1 & x_3 \\ x_3 & x_2 \end{bmatrix} \in \mathcal{S}^2_+ \iff (x_1,x_2,x_3\sqrt{2})\in\mathcal{Q}^3_r. \]
More generally we have
\[ x\in\mathbb{R}^n_+ \iff \mathbf{Diag}(x)\in\mathcal{S}^n_+ \]
and
\[ (t,x)\in\mathcal{Q}^{n+1} \iff \begin{bmatrix} t & x^T \\ x & tI \end{bmatrix} \in \mathcal{S}^{n+1}_+, \]
where the latter equivalence follows immediately from Lemma 6.1. Thus both the linear and quadratic cone are embedded in the semidefinite cone. In practice, however, linear and quadratic cones should never be described using semidefinite constraints, which would result in a large performance penalty by squaring the number of variables.
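The quadratic-cone embedding is easy to check numerically: the "arrow" matrix has eigenvalues $t\pm\|x\|_2$ and $t$ (with multiplicity $n-1$), so it is PSD exactly when $t\geq\|x\|_2$. A small check with an arbitrary random vector:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
n = len(x)

def arrow(t, x):
    # Builds [[t, x^T], [x, t*I]].
    M = t * np.eye(n + 1)
    M[0, 1:] = x
    M[1:, 0] = x
    return M

t_ok = np.linalg.norm(x) + 0.1    # t >= ||x||: PSD
t_bad = np.linalg.norm(x) - 0.1   # t < ||x||: not PSD
print(np.linalg.eigvalsh(arrow(t_ok, x)).min())
print(np.linalg.eigvalsh(arrow(t_bad, x)).min())
```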
Example 6.1. As a more interesting example, consider the symmetric matrix
\[ A(x,y,z) = \begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix} \qquad (6.4) \]
parametrized by $(x,y,z)$. The set
\[ S = \{(x,y,z)\in\mathbb{R}^3 \mid A(x,y,z)\in\mathcal{S}^3_+\} \]
(shown in Fig. 6.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite representable set which cannot be represented using (finitely many) linear or quadratic cones. To gain a geometric intuition of $S$, we note that
\[ \det(A(x,y,z)) = -(x^2+y^2+z^2-2xyz-1), \]
so the boundary of $S$ can be characterized as
\[ x^2+y^2+z^2-2xyz = 1, \]
or equivalently as
\[ \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 1 & -z \\ -z & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1-z^2. \]
For $z=0$ this describes a circle in the $(x,y)$-plane, and for $-1\leq z\leq 1$ it characterizes an ellipse (for a fixed $z$).
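Membership in the spectrahedron can be tested by computing eigenvalues, and the determinant formula above pins down the boundary. A few spot checks:

```python
import numpy as np

def A(x, y, z):
    return np.array([[1.0, x, y],
                     [x, 1.0, z],
                     [y, z, 1.0]])

def in_S(x, y, z):
    return np.linalg.eigvalsh(A(x, y, z)).min() >= -1e-12

# det A = -(x^2 + y^2 + z^2 - 2xyz - 1); at z = 0 the boundary is
# the unit circle x^2 + y^2 = 1.
print(in_S(0.0, 0.0, 0.0))              # interior point
print(in_S(0.9, 0.9, 0.0))              # outside the unit circle
print(np.linalg.det(A(0.6, 0.8, 0.0)))  # boundary point, det = 0
```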
6.1.2 Properties of semidefinite matrices
Many useful properties of (semi)definite matrices follow directly from the definitions (6.1)-(6.3) and their definite counterparts.
โข The diagonal elements of $A\in\mathcal{S}^n_+$ are nonnegative. Let $e_i$ denote the $i$th standard basis vector (i.e., $[e_i]_j=0$ for $j\neq i$, $[e_i]_i=1$). Then $A_{ii}=e_i^TAe_i$, so (6.1) implies that $A_{ii}\geq 0$.
โข A block-diagonal matrix $A=\mathbf{Diag}(A_1,\ldots,A_p)$ is (semi)definite if and only if each diagonal block $A_i$ is (semi)definite.
โข Given a quadratic transformation $M:=B^TAB$, $M\succ 0$ if and only if $A\succ 0$ and $B$ has full rank. This follows directly from the Grammian characterization $M=(VB)^T(VB)$. For $M\succeq 0$ we only require that $A\succeq 0$. As an example, if $A$ is (semi)definite then so is any permutation $P^TAP$.
Fig. 6.1: Plot of spectrahedron ๐ = {(๐ฅ, ๐ฆ, ๐ง) โ R3 | ๐ด(๐ฅ, ๐ฆ, ๐ง) โชฐ 0}.
โข Any principal submatrix of $A\in\mathcal{S}^n_+$ ($A$ restricted to the same set of rows as columns) is positive semidefinite; this follows by restricting the Grammian characterization $A=V^TV$ to a submatrix of $V$.
โข The inner product of positive (semi)definite matrices is positive (nonnegative). For any $A,B\in\mathcal{S}^n_{++}$ let $A=U^TU$ and $B=V^TV$ where $U$ and $V$ have full rank. Then
\[ \langle A,B\rangle = \mathrm{tr}(U^TUV^TV) = \|UV^T\|_F^2 > 0, \]
where strict positivity follows from the assumption that $U$ has full column-rank, i.e., $UV^T\neq 0$.
โข The inverse of a positive definite matrix is positive definite. This follows from the positive spectral factorization $A=Q\Lambda Q^T$, which gives us
\[ A^{-1} = Q\Lambda^{-1}Q^T, \]
where $\Lambda_{ii}>0$. If $A$ is semidefinite then the pseudo-inverse $A^\dagger$ of $A$ is semidefinite.
โข Consider a matrix $X\in\mathcal{S}^n$ partitioned as
\[ X = \begin{bmatrix} A & B^T \\ B & C \end{bmatrix}. \]
Let us find necessary and sufficient conditions for $X\succ 0$. We know that $A\succ 0$ and $C\succ 0$ (since any principal submatrix must be positive definite). Furthermore, we can simplify the analysis using a nonsingular transformation
\[ L = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix} \]
to diagonalize $X$ as $LXL^T=D$, where $D$ is block-diagonal. Note that $\det(L)=1$, so $L$ is indeed nonsingular. Then $X\succ 0$ if and only if $D\succ 0$. Expanding $LXL^T=D$, we get
\[ \begin{bmatrix} A & AF^T+B^T \\ FA+B & FAF^T+FB^T+BF^T+C \end{bmatrix} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}. \]
Since $\det(A)\neq 0$ (by assuming that $A\succ 0$) we see that $F=-BA^{-1}$ and direct substitution gives us
\[ \begin{bmatrix} A & 0 \\ 0 & C-BA^{-1}B^T \end{bmatrix} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}. \]
In the last part we have thus established the following useful result.
Lemma 6.1 (Schur complement lemma). A symmetric matrix
\[ X = \begin{bmatrix} A & B^T \\ B & C \end{bmatrix} \]
is positive definite if and only if
\[ A\succ 0, \quad C-BA^{-1}B^T\succ 0. \]
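The lemma is easy to verify numerically on random well-conditioned data (the construction below is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
B = rng.standard_normal((m, n))
C = rng.standard_normal((m, m)); C = C @ C.T + 10 * np.eye(m)

X = np.block([[A, B.T], [B, C]])
S = C - B @ np.linalg.solve(A, B.T)   # Schur complement C - B A^-1 B^T

# Lemma 6.1: X is positive definite iff A and S are.
print(np.linalg.eigvalsh(X).min(),
      np.linalg.eigvalsh(A).min(),
      np.linalg.eigvalsh(S).min())
```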
6.2 Semidefinite modeling
Having discussed different characterizations and properties of semidefinite matrices, we next turn to different functions and sets that can be modeled using semidefinite cones and variables. Most of those representations involve semidefinite matrix-valued affine functions, which we discuss next.
6.2.1 Linear matrix inequalities
Consider an affine matrix-valued mapping $A:\mathbb{R}^n\mapsto\mathcal{S}^m$:
\[ A(x) = A_0 + x_1A_1 + \cdots + x_nA_n. \qquad (6.5) \]
A linear matrix inequality (LMI) is a constraint of the form
\[ A_0 + x_1A_1 + \cdots + x_nA_n \succeq 0 \qquad (6.6) \]
in the variable $x\in\mathbb{R}^n$ with symmetric coefficients $A_i\in\mathcal{S}^m$, $i=0,\ldots,n$. As a simple example consider the matrix in (6.4),
\[ A(x,y,z) = A_0 + xA_1 + yA_2 + zA_3 \succeq 0 \]
with
\[ A_0 = I, \quad A_1 = \begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0&0&1 \\ 0&0&0 \\ 1&0&0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0&0&0 \\ 0&0&1 \\ 0&1&0 \end{bmatrix}. \]
Alternatively, we can describe the linear matrix inequality $A(x,y,z)\succeq 0$ as
\[ X\in\mathcal{S}^3_+, \quad x_{11}=x_{22}=x_{33}=1, \]
i.e., as a semidefinite variable with fixed diagonal; these two alternative formulations illustrate the difference between primal and dual form of semidefinite problems, see Sec. 8.6.
6.2.2 Eigenvalue optimization
Consider a symmetric matrix $A\in\mathcal{S}^n$ and let its eigenvalues be denoted by
\[ \lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_n(A). \]
A number of different functions of $\lambda_i$ can then be described using a mix of linear and semidefinite constraints.
Sum of eigenvalues
The sum of the eigenvalues corresponds to
\[ \sum_{i=1}^n \lambda_i(A) = \mathrm{tr}(A). \]
Largest eigenvalue
The largest eigenvalue can be characterized in epigraph form $\lambda_1(A)\leq t$ as
\[ tI - A \succeq 0. \qquad (6.7) \]
To verify this, suppose we have a spectral factorization $A=Q\Lambda Q^T$ where $Q$ is orthogonal and $\Lambda$ is diagonal. Then $t$ is an upper bound on the largest eigenvalue if and only if
\[ Q^T(tI-A)Q = tI-\Lambda \succeq 0. \]
Thus we can minimize the largest eigenvalue of $A$.
Smallest eigenvalue
The smallest eigenvalue can be described in hypograph form $\lambda_n(A)\geq t$ as
\[ A \succeq tI, \qquad (6.8) \]
i.e., we can maximize the smallest eigenvalue of $A$.
Eigenvalue spread
The eigenvalue spread can be modeled in epigraph form
\[ \lambda_1(A)-\lambda_n(A) \leq t \]
by combining the two linear matrix inequalities in (6.7) and (6.8), i.e.,
\[ zI \preceq A \preceq sI, \quad s-z \leq t. \qquad (6.9) \]
Spectral radius
The spectral radius $\rho(A):=\max_i|\lambda_i(A)|$ can be modeled in epigraph form $\rho(A)\leq t$ using two linear matrix inequalities
\[ -tI \preceq A \preceq tI. \]
Condition number of a positive definite matrix
Suppose now that $A\in\mathcal{S}^n_{++}$. The condition number of a positive definite matrix can be minimized by noting that $\lambda_1(A)/\lambda_n(A)\leq t$ if and only if there exists a $\mu>0$ such that
\[ \mu I \preceq A \preceq \mu tI, \]
or equivalently if and only if $I\preceq\mu^{-1}A\preceq tI$. If $A=A(x)$ is represented as in (6.5) then a change of variables $z:=x/\mu$, $s:=1/\mu$ leads to a problem of the form
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & I \preceq sA_0 + \sum_{i=1}^n z_iA_i \preceq tI,
\end{array} \qquad (6.10)
\]
from which we recover the solution $x=z/s$. In essence, we first normalize the spectrum by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear matrix inequality. Compare Sec. 2.2.5.
6.2.3 Log-determinant
Consider again a symmetric positive-definite matrix $A\in\mathcal{S}^n_{++}$. The determinant
\[ \det(A) = \prod_{i=1}^n \lambda_i(A) \]
is neither convex nor concave, but $\log\det(A)$ is concave and we can write the inequality
\[ t \leq \log\det(A) \]
in the form of the following problem:
\[
\begin{array}{l}
\begin{bmatrix} A & Z \\ Z^T & \mathrm{diag}(Z) \end{bmatrix} \succeq 0, \\
Z \mbox{ is lower triangular}, \\
t \leq \sum_i \log z_{ii}.
\end{array} \qquad (6.11)
\]
The equivalence of the two problems follows from Lemma 6.1 and superadditivity of the determinant for semidefinite matrices. That is:
\[ 0 \leq \det(A - Z\,\mathrm{diag}(Z)^{-1}Z^T) \leq \det(A) - \det(Z\,\mathrm{diag}(Z)^{-1}Z^T) = \det(A) - \det(Z). \]
On the other hand the optimal value $\det(A)$ is attained for $Z=LD$ if $A=LDL^T$ is the LDL factorization of $A$.

The last inequality in problem (6.11) can of course be modeled using the exponential cone as in Sec. 5.2. Note that we can replace that bound with $t\leq\left(\prod_i z_{ii}\right)^{1/n}$ to get instead the model of $t\leq\det(A)^{1/n}$ using Sec. 4.2.4.
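The role of the triangular factor is easy to illustrate numerically: with the Cholesky factorization $A=LL^T$ (a special case of the factorizations above), $\log\det(A)=2\sum_i\log L_{ii}$, which matches numpy's `slogdet`:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)          # positive definite

L = np.linalg.cholesky(A)        # A = L L^T, L lower triangular
logdet = 2.0 * np.sum(np.log(np.diag(L)))

sign, ref = np.linalg.slogdet(A)
print(logdet, ref)               # the two values agree
```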
6.2.4 Singular value optimization
We next consider a non-square matrix $A\in\mathbb{R}^{m\times n}$. Assume $m\leq n$ and denote the singular values of $A$ by
\[ \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_m(A) \geq 0. \]
The singular values are connected to the eigenvalues of $A^TA$ via
\[ \sigma_i(A) = \sqrt{\lambda_i(A^TA)}, \qquad (6.12) \]
and if $A$ is square and symmetric then $\sigma_i(A)=|\lambda_i(A)|$. We show next how to optimize several functions of the singular values.
Largest singular value
The epigraph $\sigma_1(A)\leq t$ can be characterized using (6.12) as
\[ A^TA \preceq t^2I, \]
which from Schur's lemma is equivalent to
\[ \begin{bmatrix} tI & A \\ A^T & tI \end{bmatrix} \succeq 0. \qquad (6.13) \]
The largest singular value $\sigma_1(A)$ is also called the spectral norm or the $\ell_2$-norm of $A$, $\|A\|_2:=\sigma_1(A)$.
Sum of singular values
The trace norm or the nuclear norm of $X$ is the dual of the $\ell_2$-norm:
\[ \|X\|_* = \sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^TZ). \qquad (6.14) \]
It turns out that the nuclear norm corresponds to the sum of the singular values,
\[ \|X\|_* = \sum_{i=1}^m \sigma_i(X) = \sum_{i=1}^m \sqrt{\lambda_i(X^TX)}, \qquad (6.15) \]
which is easy to verify using the singular value decomposition $X=U\Sigma V^T$. We have
\[ \sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^TZ) = \sup_{\|Z\|_2\leq 1} \mathrm{tr}(\Sigma U^TZV) = \sup_{\|Y\|_2\leq 1} \mathrm{tr}(\Sigma Y) = \sup_{|y_i|\leq 1} \sum_{i=1}^m \sigma_iy_i = \sum_{i=1}^m \sigma_i, \]
which shows (6.15). Alternatively, we can express (6.14) as the solution to
\[
\begin{array}{ll}
\mbox{maximize} & \mathrm{tr}(X^TZ) \\
\mbox{subject to} & \begin{bmatrix} I & Z^T \\ Z & I \end{bmatrix} \succeq 0,
\end{array} \qquad (6.16)
\]
with the dual problem (see Example 8.8)
\[
\begin{array}{ll}
\mbox{minimize} & \mathrm{tr}(U+V)/2 \\
\mbox{subject to} & \begin{bmatrix} U & X^T \\ X & V \end{bmatrix} \succeq 0.
\end{array} \qquad (6.17)
\]
In other words, using strong duality we can characterize the epigraph $\|A\|_*\leq t$ with
\[ \begin{bmatrix} U & A^T \\ A & V \end{bmatrix} \succeq 0, \quad \mathrm{tr}(U+V)/2 \leq t. \qquad (6.18) \]
For a symmetric matrix the nuclear norm corresponds to the sum of absolute values of eigenvalues, and for a semidefinite matrix it simply corresponds to the trace of the matrix.
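A feasible certificate for (6.18) can be built explicitly from the SVD: if $A=W\Sigma V^T$ then $U=V\Sigma V^T$ and $V'=W\Sigma W^T$ make the block matrix PSD with $\mathrm{tr}(U+V')/2$ equal to the sum of singular values. A numpy check on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))

W, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = W diag(s) V^T
nuclear = s.sum()

# Feasible point of (6.18) attaining tr(U + Vp)/2 = ||A||_*.
U = Vt.T @ np.diag(s) @ Vt
Vp = W @ np.diag(s) @ W.T
M = np.block([[U, A.T], [A, Vp]])
print(np.linalg.eigvalsh(M).min())        # PSD
print((np.trace(U) + np.trace(Vp)) / 2)   # equals nuclear norm
```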
6.2.5 Matrix inequalities from Schurโs Lemma
Several quadratic or quadratic-over-linear matrix inequalities follow immediately from Schur's lemma. Suppose $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{m\times m}$ are matrix variables with $B\succ 0$. Then
\[ A^TB^{-1}A \preceq C \]
if and only if
\[ \begin{bmatrix} C & A^T \\ A & B \end{bmatrix} \succeq 0. \]
6.2.6 Nonnegative polynomials
We next consider characterizations of polynomials constrained to be nonnegative on the real axis. To that end, consider a polynomial basis function
\[ v(t) = \left(1,t,\ldots,t^{2n}\right). \]
It is then well-known that a polynomial $f:\mathbb{R}\mapsto\mathbb{R}$ of even degree $2n$ is nonnegative on the entire real axis
\[ f(t) := x^Tv(t) = x_0 + x_1t + \cdots + x_{2n}t^{2n} \geq 0, \quad \forall t \qquad (6.19) \]
if and only if it can be written as a sum of squared polynomials of degree $n$ (or less), i.e., for some $q_1,q_2\in\mathbb{R}^{n+1}$
\[ f(t) = (q_1^Tu(t))^2 + (q_2^Tu(t))^2, \quad u(t):=(1,t,\ldots,t^n). \qquad (6.20) \]
It turns out that an equivalent characterization of $\{x \mid x^Tv(t)\geq 0,\ \forall t\}$ can be given in terms of a semidefinite variable $X$,
\[ x_i = \langle X,H_i\rangle, \quad i=0,\ldots,2n, \quad X\in\mathcal{S}^{n+1}_+, \qquad (6.21) \]
where $H_i^{n+1}\in\mathbb{R}^{(n+1)\times(n+1)}$ are Hankel matrices
\[ [H_i]_{kl} = \begin{cases} 1, & k+l=i, \\ 0, & \mbox{otherwise.} \end{cases} \]
When there is no ambiguity, we drop the superscript on $H_i$. For example, for $n=2$ we have
\[ H_0 = \begin{bmatrix} 1&0&0 \\ 0&0&0 \\ 0&0&0 \end{bmatrix}, \quad H_1 = \begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&0 \end{bmatrix}, \quad \ldots, \quad H_4 = \begin{bmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&1 \end{bmatrix}. \]
To verify that (6.19) and (6.21) are equivalent, we first note that
\[ u(t)u(t)^T = \sum_{i=0}^{2n} H_iv_i(t), \]
i.e.,
\[ \begin{bmatrix} 1 \\ t \\ \vdots \\ t^n \end{bmatrix} \begin{bmatrix} 1 \\ t \\ \vdots \\ t^n \end{bmatrix}^T = \begin{bmatrix} 1 & t & \ldots & t^n \\ t & t^2 & \ldots & t^{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ t^n & t^{n+1} & \ldots & t^{2n} \end{bmatrix}. \]
Assume next that $f(t)\geq 0$. Then from (6.20) we have
\[
\begin{array}{rcl}
f(t) & = & (q_1^Tu(t))^2 + (q_2^Tu(t))^2 \\
& = & \langle q_1q_1^T + q_2q_2^T, u(t)u(t)^T\rangle \\
& = & \sum_{i=0}^{2n} \langle q_1q_1^T + q_2q_2^T, H_i\rangle v_i(t),
\end{array}
\]
i.e., we have $f(t)=x^Tv(t)$ with $x_i=\langle X,H_i\rangle$, $X=(q_1q_1^T+q_2q_2^T)\succeq 0$. Conversely, assume that (6.21) holds. Then
\[ f(t) = \sum_{i=0}^{2n} \langle H_i,X\rangle v_i(t) = \left\langle X, \sum_{i=0}^{2n} H_iv_i(t)\right\rangle = \langle X, u(t)u(t)^T\rangle \geq 0 \]
since $X\succeq 0$. In summary, we can characterize the cone of nonnegative polynomials over the real axis as
\[ K^n_\infty = \{x\in\mathbb{R}^{2n+1} \mid x_i=\langle X,H_i\rangle,\ i=0,\ldots,2n,\ X\in\mathcal{S}^{n+1}_+\}. \qquad (6.22) \]
Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility problem.
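A tiny worked instance: $p(t)=(t^2-1)^2 = 1 - 2t^2 + t^4$ is nonnegative, and the Gram matrix $X=qq^T$ with $q=(-1,0,1)$ certifies it via the Hankel equations (6.21):

```python
import numpy as np

# Coefficients of p(t) = 1 + 0 t - 2 t^2 + 0 t^3 + t^4.
x = np.array([1.0, 0.0, -2.0, 0.0, 1.0])

# SOS certificate: p(t) = (q^T u(t))^2, q = (-1, 0, 1), u = (1, t, t^2).
q = np.array([-1.0, 0.0, 1.0])
X = np.outer(q, q)                 # PSD Gram matrix

# Hankel matrices H_i: [H_i]_kl = 1 iff k + l = i.
H = [np.array([[1.0 if k + l == i else 0.0 for l in range(3)]
               for k in range(3)]) for i in range(5)]

recovered = np.array([np.sum(X * Hi) for Hi in H])
print(recovered)                   # reproduces the coefficients x
print(np.linalg.eigvalsh(X).min())
```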
Nonnegativity on a finite interval
As an extension we consider a basis function of degree $n$,
\[ v(t) = (1,t,\ldots,t^n). \]
A polynomial $f(t):=x^Tv(t)$ is then nonnegative on a subinterval $I=[a,b]\subset\mathbb{R}$ if and only if $f(t)$ can be written as a sum of weighted squares,
\[ f(t) = w_1(t)(q_1^Tu_1(t))^2 + w_2(t)(q_2^Tu_2(t))^2, \]
where $w_i(t)$ are polynomials nonnegative on $[a,b]$. To describe the cone
\[ K^n_{a,b} = \{x\in\mathbb{R}^{n+1} \mid f(t)=x^Tv(t)\geq 0,\ \forall t\in[a,b]\} \]
we distinguish between polynomials of odd and even degree.
โข Even degree. Let $n=2m$ and denote
\[ u_1(t) = (1,t,\ldots,t^m), \quad u_2(t) = (1,t,\ldots,t^{m-1}). \]
We choose $w_1(t)=1$ and $w_2(t)=(t-a)(b-t)$ and note that $w_2(t)\geq 0$ on $[a,b]$. Then $f(t)\geq 0$, $\forall t\in[a,b]$ if and only if
\[ f(t) = (q_1^Tu_1(t))^2 + w_2(t)(q_2^Tu_2(t))^2 \]
for some $q_1,q_2$, and an equivalent semidefinite characterization can be found as
\[ K^n_{a,b} = \{x\in\mathbb{R}^{n+1} \mid x_i = \langle X_1,H_i^m\rangle + \langle X_2,(a+b)H_{i-1}^{m-1} - abH_i^{m-1} - H_{i-2}^{m-1}\rangle,\ i=0,\ldots,n,\ X_1\in\mathcal{S}^{m+1}_+,\ X_2\in\mathcal{S}^m_+\}. \qquad (6.23) \]
โข Odd degree. Let $n=2m+1$ and denote $u(t)=(1,t,\ldots,t^m)$. We choose $w_1(t)=(t-a)$ and $w_2(t)=(b-t)$. We then have that $f(t)=x^Tv(t)\geq 0$, $\forall t\in[a,b]$ if and only if
\[ f(t) = (t-a)(q_1^Tu(t))^2 + (b-t)(q_2^Tu(t))^2 \]
for some $q_1,q_2$, and an equivalent semidefinite characterization can be found as
\[ K^n_{a,b} = \{x\in\mathbb{R}^{n+1} \mid x_i = \langle X_1,H_{i-1}^m - aH_i^m\rangle + \langle X_2,bH_i^m - H_{i-1}^m\rangle,\ i=0,\ldots,n,\ X_1,X_2\in\mathcal{S}^{m+1}_+\}. \qquad (6.24) \]
6.2.7 Hermitian matrices
Semidefinite optimization can be extended to complex-valued matrices. To that end, let $\mathcal{H}^n$ denote the cone of Hermitian matrices of order $n$, i.e.,
\[ X\in\mathcal{H}^n \iff X\in\mathbb{C}^{n\times n},\ X^H=X, \qquad (6.25) \]
where the superscript $H$ denotes Hermitian (or complex) transposition. Then $X\in\mathcal{H}^n_+$ if and only if
\[ z^HXz = (\Re z - i\Im z)^T(\Re X + i\Im X)(\Re z + i\Im z) = \begin{bmatrix} \Re z \\ \Im z \end{bmatrix}^T \begin{bmatrix} \Re X & -\Im X \\ \Im X & \Re X \end{bmatrix} \begin{bmatrix} \Re z \\ \Im z \end{bmatrix} \geq 0, \quad \forall z\in\mathbb{C}^n. \]
In other words,
\[ X\in\mathcal{H}^n_+ \iff \begin{bmatrix} \Re X & -\Im X \\ \Im X & \Re X \end{bmatrix} \in\mathcal{S}^{2n}_+. \qquad (6.26) \]
Note that (6.25) implies skew-symmetry of $\Im X$, i.e., $\Im X = -\Im X^T$.
6.2.8 Nonnegative trigonometric polynomials
As a complex-valued variation of the sum-of-squares representation we consider trigonometric polynomials; optimization over cones of nonnegative trigonometric polynomials has several important engineering applications. Consider a trigonometric polynomial evaluated on the complex unit circle
\[ f(z) = x_0 + 2\Re\left(\sum_{i=1}^n x_iz^{-i}\right), \quad |z|=1, \qquad (6.27) \]
parametrized by $x\in\mathbb{R}\times\mathbb{C}^n$. We are interested in characterizing the cone of trigonometric polynomials that are nonnegative on the angular interval $[0,\pi]$,
\[ K^n_{0,\pi} = \left\{x\in\mathbb{R}\times\mathbb{C}^n \,\middle|\, x_0 + 2\Re\left(\sum_{i=1}^n x_iz^{-i}\right) \geq 0,\ \forall z=e^{jt},\ t\in[0,\pi]\right\}. \]
Consider a complex-valued basis function
\[ v(z) = (1,z,\ldots,z^n). \]
The Riesz-Fejรฉr Theorem states that a trigonometric polynomial $f(z)$ in (6.27) is nonnegative (i.e., $x\in K^n_{0,\pi}$) if and only if for some $q\in\mathbb{C}^{n+1}$
\[ f(z) = |q^Hv(z)|^2. \qquad (6.28) \]
Analogously to Sec. 6.2.6 we have a semidefinite characterization of the sum-of-squares representation, i.e., $f(z)\geq 0$, $\forall z=e^{jt}$, $t\in[0,2\pi]$ if and only if
\[ x_i = \langle X,T_i\rangle, \quad i=0,\ldots,n, \quad X\in\mathcal{H}^{n+1}_+, \qquad (6.29) \]
where $T_i^{n+1}\in\mathbb{R}^{(n+1)\times(n+1)}$ are Toeplitz matrices
\[ [T_i]_{kl} = \begin{cases} 1, & k-l=i, \\ 0, & \mbox{otherwise,} \end{cases} \quad i=0,\ldots,n. \]
When there is no ambiguity, we drop the superscript on $T_i$. For example, for $n=2$ we have
\[ T_0 = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix}, \quad T_1 = \begin{bmatrix} 0&0&0 \\ 1&0&0 \\ 0&1&0 \end{bmatrix}, \quad T_2 = \begin{bmatrix} 0&0&0 \\ 0&0&0 \\ 1&0&0 \end{bmatrix}. \]
To prove correctness of the semidefinite characterization, we first note that
\[ v(z)v(z)^H = I + \sum_{i=1}^n (T_iv_i(z)) + \sum_{i=1}^n (T_iv_i(z))^H, \]
i.e.,
\[ \begin{bmatrix} 1 \\ z \\ \vdots \\ z^n \end{bmatrix} \begin{bmatrix} 1 \\ z \\ \vdots \\ z^n \end{bmatrix}^H = \begin{bmatrix} 1 & z^{-1} & \ldots & z^{-n} \\ z & 1 & \ldots & z^{1-n} \\ \vdots & \vdots & \ddots & \vdots \\ z^n & z^{n-1} & \ldots & 1 \end{bmatrix}. \]
Next assume that (6.28) is satisfied. Then
\[
\begin{array}{rcl}
f(z) & = & \langle qq^H, v(z)v(z)^H\rangle \\
& = & \langle qq^H, I\rangle + \left\langle qq^H, \sum_{i=1}^n T_iv_i(z)\right\rangle + \left\langle qq^H, \sum_{i=1}^n T_i^T\overline{v_i(z)}\right\rangle \\
& = & \langle qq^H, I\rangle + \sum_{i=1}^n \langle qq^H,T_i\rangle v_i(z) + \sum_{i=1}^n \langle qq^H,T_i^T\rangle\overline{v_i(z)} \\
& = & x_0 + 2\Re\left(\sum_{i=1}^n x_iv_i(z)\right)
\end{array}
\]
with $x_i=\langle qq^H,T_i\rangle$. Conversely, assume that (6.29) holds. Then
\[ f(z) = \langle X,I\rangle + \sum_{i=1}^n \langle X,T_i\rangle v_i(z) + \sum_{i=1}^n \langle X,T_i^T\rangle\overline{v_i(z)} = \langle X, v(z)v(z)^H\rangle \geq 0. \]
In other words, we have shown that
\[ K^n_{0,\pi} = \{x\in\mathbb{R}\times\mathbb{C}^n \mid x_i=\langle X,T_i\rangle,\ i=0,\ldots,n,\ X\in\mathcal{H}^{n+1}_+\}. \qquad (6.30) \]
Nonnegativity on a subinterval
We next sketch a few useful extensions. An extension of the Riesz-Fejรฉr Theorem states that a trigonometric polynomial $f(z)$ of degree $n$ is nonnegative on $I(a,b)=\{z \mid z=e^{jt},\ t\in[a,b]\subseteq[0,\pi]\}$ if and only if it can be written as a weighted sum of squared trigonometric polynomials
\[ f(z) = |f_1(z)|^2 + g(z)|f_2(z)|^2, \]
where $f_1,f_2,g$ are trigonometric polynomials of degree $n$, $n-d$ and $d$, respectively, and $g(z)\geq 0$ on $I(a,b)$. For example $g(z)=z+z^{-1}-2\cos\alpha$ is nonnegative on $I(0,\alpha)$, and it can be verified that $f(z)\geq 0$, $\forall z\in I(0,\alpha)$ if and only if
\[ x_i = \langle X_1,T_i^{n+1}\rangle + \langle X_2,T_{i+1}^n\rangle + \langle X_2,T_{i-1}^n\rangle - 2\cos(\alpha)\langle X_2,T_i^n\rangle \]
for $X_1\in\mathcal{H}^{n+1}_+$, $X_2\in\mathcal{H}^n_+$, i.e.,
\[ K^n_{0,\alpha} = \{x\in\mathbb{R}\times\mathbb{C}^n \mid x_i = \langle X_1,T_i^{n+1}\rangle + \langle X_2,T_{i+1}^n\rangle + \langle X_2,T_{i-1}^n\rangle - 2\cos(\alpha)\langle X_2,T_i^n\rangle,\ X_1\in\mathcal{H}^{n+1}_+,\ X_2\in\mathcal{H}^n_+\}. \qquad (6.31) \]
Similarly $f(z)\geq 0$, $\forall z\in I(\alpha,\pi)$ if and only if
\[ x_i = \langle X_1,T_i^{n+1}\rangle - \langle X_2,T_{i+1}^n\rangle - \langle X_2,T_{i-1}^n\rangle + 2\cos(\alpha)\langle X_2,T_i^n\rangle, \]
i.e.,
\[ K^n_{\alpha,\pi} = \{x\in\mathbb{R}\times\mathbb{C}^n \mid x_i = \langle X_1,T_i^{n+1}\rangle - \langle X_2,T_{i+1}^n\rangle - \langle X_2,T_{i-1}^n\rangle + 2\cos(\alpha)\langle X_2,T_i^n\rangle,\ X_1\in\mathcal{H}^{n+1}_+,\ X_2\in\mathcal{H}^n_+\}. \qquad (6.32) \]
6.3 Semidefinite optimization case studies
6.3.1 Nearest correlation matrix
We consider the set
\[ S = \{X\in\mathcal{S}^n_+ \mid X_{ii}=1,\ i=1,\ldots,n\} \]
(shown in Fig. 6.1 for $n=3$). For $A\in\mathcal{S}^n$ the nearest correlation matrix is
\[ X^\star = \arg\min_{X\in S} \|A-X\|_F, \]
i.e., the projection of $A$ onto the set $S$. To pose this as a conic optimization problem we define the linear operator
\[ \mathrm{svec}(U) = (U_{11},\sqrt{2}U_{21},\ldots,\sqrt{2}U_{n1},U_{22},\sqrt{2}U_{32},\ldots,\sqrt{2}U_{n2},\ldots,U_{nn}), \]
which extracts and scales the lower-triangular part of $U$. We then get a conic formulation of the nearest correlation problem exploiting symmetry of $A-X$,
\[
\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & \|z\|_2 \leq t, \\
& \mathrm{svec}(A-X) = z, \\
& \mathrm{diag}(X) = e, \\
& X \succeq 0.
\end{array} \qquad (6.33)
\]
This is an example of a problem with both conic quadratic and semidefinite constraints in primal form. We can add different constraints to the problem, for example a bound $\gamma$ on the smallest eigenvalue by replacing $X\succeq 0$ with $X\succeq\gamma I$.
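Without a conic solver at hand, a point of $S$ close to $A$ can also be produced by alternating projections in the spirit of Higham's method โ project onto the PSD cone by clipping negative eigenvalues, then restore the unit diagonal. This plain version (without Dykstra's correction) is a heuristic and need not give the exact projection, but it lands in the set $S$:

```python
import numpy as np

def nearest_corr(A, iters=200):
    # Alternate between the PSD cone and the unit-diagonal affine set.
    X = (A + A.T) / 2
    for _ in range(iters):
        w, Q = np.linalg.eigh(X)
        X = Q @ np.diag(np.clip(w, 0.0, None)) @ Q.T   # PSD projection
        np.fill_diagonal(X, 1.0)                        # unit diagonal
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
X = nearest_corr(A)
print(np.diag(X))                      # all ones
print(np.linalg.eigvalsh(X).min())     # numerically nonnegative
```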
6.3.2 Extremal ellipsoids
Given a polytope we can find the largest ellipsoid contained in the polytope, or the smallest ellipsoid containing the polytope (for certain representations of the polytope).
Maximal inscribed ellipsoid
Consider a polytope

$$P = \{x \in \mathbb{R}^n \mid a_i^T x \leq b_i,\ i = 1,\dots,m\}.$$

The ellipsoid

$$\mathcal{E} := \{x \mid x = Cu + d,\ \|u\|_2 \leq 1\}$$

is contained in $P$ if and only if

$$\max_{\|u\|_2 \leq 1} a_i^T(Cu + d) \leq b_i, \quad i = 1,\dots,m,$$

or equivalently, if and only if

$$\|Ca_i\|_2 + a_i^T d \leq b_i, \quad i = 1,\dots,m.$$

Since $\mathrm{Vol}(\mathcal{E}) \approx \det(C)^{1/n}$ the maximum-volume inscribed ellipsoid is the solution to

$$\begin{array}{ll}
\text{maximize} & \det(C)\\
\text{subject to} & \|Ca_i\|_2 + a_i^T d \leq b_i,\ i = 1,\dots,m,\\
& C \succeq 0.
\end{array}$$

In Sec. 6.2.2 we show how to maximize the determinant of a positive definite matrix.
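The containment condition $\|Ca_i\|_2 + a_i^T d \leq b_i$ is easy to check numerically for any candidate ellipsoid; a small sketch (the function name is ours):

```python
import numpy as np

def ellipsoid_in_polytope(C, d, A, b, tol=1e-9):
    # E = {Cu + d : ||u||_2 <= 1} lies in {x : Ax <= b}
    # iff ||C^T a_i||_2 + a_i^T d <= b_i for every row a_i of A.
    lhs = np.linalg.norm(A @ C, axis=1) + A @ d
    return bool(np.all(lhs <= b + tol))
```

For example, with the unit box $|x_1|, |x_2| \leq 1$, the unit disk passes the test while a disk of radius $1.5$, or a half-radius disk shifted by $0.6$, does not.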
Minimal enclosing ellipsoid
Next consider a polytope given as the convex hull of a set of points,

$$P' = \mathrm{conv}\{x_1, x_2, \dots, x_m\}, \quad x_i \in \mathbb{R}^n.$$

The ellipsoid

$$\mathcal{E}' := \{x \mid \|P(x - c)\|_2 \leq 1\}$$

has $\mathrm{Vol}(\mathcal{E}') \approx \det(P)^{-1/n}$, so the minimum-volume enclosing ellipsoid is the solution to

$$\begin{array}{ll}
\text{maximize} & \det(P)\\
\text{subject to} & \|P(x_i - c)\|_2 \leq 1,\ i = 1,\dots,m,\\
& P \succeq 0.
\end{array}$$
Fig. 6.2: Example of inner and outer ellipsoidal approximations of a pentagon in $\mathbb{R}^2$.
6.3.3 Polynomial curve-fitting
Consider a univariate polynomial of degree $n$,

$$f(t) = x_0 + x_1 t + x_2 t^2 + \cdots + x_n t^n.$$
Often we wish to fit such a polynomial to a given set of measurements or control points

$$\{(t_1, y_1), (t_2, y_2), \dots, (t_m, y_m)\},$$

i.e., we wish to determine coefficients $x_i$, $i = 0,\dots,n$ such that

$$f(t_j) \approx y_j, \quad j = 1,\dots,m.$$

To that end, define the Vandermonde matrix

$$A = \begin{bmatrix}
1 & t_1 & t_1^2 & \dots & t_1^n\\
1 & t_2 & t_2^2 & \dots & t_2^n\\
\vdots & \vdots & \vdots & & \vdots\\
1 & t_m & t_m^2 & \dots & t_m^n
\end{bmatrix}.$$
We can then express the desired curve-fit compactly as

$$Ax \approx y,$$

i.e., as a linear expression in the coefficients $x$. When the number of measurements equals the number of coefficients, $m = n + 1$, the matrix $A$ is square and non-singular (provided the points $t_i$ are distinct), so we can solve

$$Ax = y$$

to find a polynomial that passes through all the control points $(t_i, y_i)$. Similarly, if $m < n + 1$ there are infinitely many solutions satisfying the underdetermined system $Ax = y$. A typical choice in that case is the least-norm solution

$$x_{\mathrm{ln}} = \arg\min_{Ax = y} \|x\|_2,$$

which (assuming again that the points $t_i$ are distinct) equals

$$x_{\mathrm{ln}} = A^T(AA^T)^{-1}y.$$

On the other hand, if $m > n + 1$ we generally cannot find a solution to the overdetermined system $Ax = y$, and we typically resort to a least-squares solution

$$x_{\mathrm{ls}} = \arg\min \|Ax - y\|_2,$$

which is given by the formula (see Sec. 8.5.1)

$$x_{\mathrm{ls}} = (A^TA)^{-1}A^Ty.$$
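In floating point, the least-squares formula is best evaluated with a QR/SVD-based routine rather than by forming $A^TA$ explicitly. For instance, with NumPy and some illustrative data of our own:

```python
import numpy as np

t = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])   # sample points (made up)
y = np.array([1.0, 0.0, 1.0, 4.0, 9.5])    # measurements (made up)

n = 2                                       # polynomial degree
A = np.vander(t, n + 1, increasing=True)    # Vandermonde: columns 1, t, t^2
x_ls, res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
```

Here `x_ls` contains the least-squares coefficients $(x_0, x_1, x_2)$; for $m \leq n + 1$, `lstsq` returns the least-norm solution instead.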
In the following we discuss how the semidefinite characterizations of nonnegative polynomials (see Sec. 6.2.6) lead to more advanced and useful polynomial curve-fitting constraints.
• Nonnegativity. One possible constraint is nonnegativity on an interval,

$$f(t) := x_0 + x_1 t + \cdots + x_n t^n \geq 0, \quad \forall t \in [a,b],$$

with a semidefinite characterization embedded in $x \in K^n_{a,b}$, see (6.23).

• Monotonicity. We can ensure monotonicity of $f(t)$ by requiring that $f'(t) \geq 0$ (or $f'(t) \leq 0$), i.e.,

$$(x_1, 2x_2, \dots, nx_n) \in K^{n-1}_{a,b}$$

or

$$-(x_1, 2x_2, \dots, nx_n) \in K^{n-1}_{a,b},$$

respectively.

• Convexity or concavity. Convexity (or concavity) of $f(t)$ corresponds to $f''(t) \geq 0$ (or $f''(t) \leq 0$), i.e.,

$$(2x_2, 6x_3, \dots, (n-1)nx_n) \in K^{n-2}_{a,b}$$

or

$$-(2x_2, 6x_3, \dots, (n-1)nx_n) \in K^{n-2}_{a,b},$$

respectively.
As an example, we consider fitting a smooth polynomial

$$f_n(t) = x_0 + x_1 t + \cdots + x_n t^n$$

to the points $\{(-1,1), (0,0), (1,1)\}$, where smoothness is implied by bounding $|f'_n(t)|$. More specifically, we wish to solve the problem

$$\begin{array}{ll}
\text{minimize} & z\\
\text{subject to} & |f'_n(t)| \leq z, \quad \forall t \in [-1,1],\\
& f_n(-1) = 1,\ f_n(0) = 0,\ f_n(1) = 1,
\end{array}$$

or equivalently

$$\begin{array}{ll}
\text{minimize} & z\\
\text{subject to} & z - f'_n(t) \geq 0, \quad \forall t \in [-1,1],\\
& f'_n(t) + z \geq 0, \quad \forall t \in [-1,1],\\
& f_n(-1) = 1,\ f_n(0) = 0,\ f_n(1) = 1.
\end{array}$$

Finally, we use the characterizations $K^n_{a,b}$ to get a conic problem

$$\begin{array}{ll}
\text{minimize} & z\\
\text{subject to} & (z - x_1, -2x_2, \dots, -nx_n) \in K^{n-1}_{-1,1},\\
& (x_1 + z, 2x_2, \dots, nx_n) \in K^{n-1}_{-1,1},\\
& f_n(-1) = 1,\ f_n(0) = 0,\ f_n(1) = 1.
\end{array}$$
Fig. 6.3: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through $\{(-1,1), (0,0), (1,1)\}$. The higher-degree polynomials are increasingly smoother on $[-1,1]$.
In Fig. 6.3 we show the graphs of the resulting polynomials of degree 2, 4 and 8, respectively. The second-degree polynomial is uniquely determined by the three constraints $f_2(-1) = 1$, $f_2(0) = 0$, $f_2(1) = 1$, i.e., $f_2(t) = t^2$. Also, we obviously have a lower bound on the largest derivative, $\max_{t \in [-1,1]} |f'_n(t)| \geq 1$. The computed fourth-degree polynomial is given by

$$f_4(t) = \frac{3}{2}t^2 - \frac{1}{2}t^4$$

after rounding coefficients to rational numbers. Furthermore, the largest derivative is given by

$$f'_4(1/\sqrt{2}) = \sqrt{2},$$

and $f''_4(t) < 0$ on $(1/\sqrt{2}, 1]$, so, although not visibly clear, the polynomial is nonconvex on $[-1,1]$. In Fig. 6.4 we show the graphs of the corresponding polynomials where we added a convexity constraint $f''_n(t) \geq 0$, i.e.,

$$(2x_2, 6x_3, \dots, (n-1)nx_n) \in K^{n-2}_{-1,1}.$$

In this case, we get

$$f_4(t) = \frac{6}{5}t^2 - \frac{1}{5}t^4$$

and the largest derivative increases to $8/5$.
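The stated properties of the two quartics are easy to verify numerically, e.g. with NumPy:

```python
import numpy as np

P = np.polynomial.Polynomial
f4  = P([0, 0, 3/2, 0, -1/2])   # smooth fit from the text
f4c = P([0, 0, 6/5, 0, -1/5])   # fit with the convexity constraint

ts = np.linspace(-1, 1, 2001)
max_slope   = np.abs(f4.deriv()(ts)).max()    # close to sqrt(2)
max_slope_c = np.abs(f4c.deriv()(ts)).max()   # equals 8/5
```

Both polynomials interpolate the three control points, the first has negative second derivative near $t = 1$, and the second is convex on the whole interval.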
6.3.4 Filter design problems
Filter design is an important application of optimization over trigonometric polynomials in signal processing. We consider a trigonometric polynomial

$$H(\omega) = x_0 + 2\Re\left(\sum_{k=1}^n x_k e^{-j\omega k}\right) = a_0 + 2\sum_{k=1}^n \left(a_k \cos(\omega k) + b_k \sin(\omega k)\right),$$
Fig. 6.4: Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through $\{(-1,1), (0,0), (1,1)\}$. The polynomials all have positive second derivative (i.e., they are convex) on $[-1,1]$ and the higher-degree polynomials are increasingly smoother on that interval.
where $a_k := \Re(x_k)$, $b_k := \Im(x_k)$. If the function $H(\omega)$ is nonnegative we call it a transfer function, and it describes how different harmonic components of a periodic discrete signal are attenuated when a filter with transfer function $H(\omega)$ is applied to the signal.

We often wish to design a transfer function where $H(\omega) \approx 1$ for $0 \leq \omega \leq \omega_p$ and $H(\omega) \approx 0$ for $\omega_s \leq \omega \leq \pi$, for given constants $\omega_p, \omega_s$. One possible formulation for achieving this is

$$\begin{array}{lll}
\text{minimize} & t &\\
\text{subject to} & 0 \leq H(\omega), & \forall \omega \in [0,\pi],\\
& 1 - \delta \leq H(\omega) \leq 1 + \delta, & \forall \omega \in [0,\omega_p],\\
& H(\omega) \leq t, & \forall \omega \in [\omega_s,\pi],
\end{array}$$

which corresponds to minimizing $H(\omega)$ on the interval $[\omega_s, \pi]$ while allowing $H(\omega)$ to depart from unity by a small amount $\delta$ on the interval $[0, \omega_p]$. Using the results from Sec. 6.2.8 (in particular (6.30), (6.31) and (6.32)), we can pose this as a conic optimization problem
$$\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & x \in K^n_{0,\pi},\\
& (x_0 - (1-\delta), x_{1:n}) \in K^n_{0,\omega_p},\\
& -(x_0 - (1+\delta), x_{1:n}) \in K^n_{0,\omega_p},\\
& -(x_0 - t, x_{1:n}) \in K^n_{\omega_s,\pi},
\end{array} \tag{6.34}$$
which is a semidefinite optimization problem. In Fig. 6.5 we show $H(\omega)$ obtained by solving (6.34) for $n = 10$, $\delta = 0.05$, $\omega_p = \pi/4$ and $\omega_s = \omega_p + \pi/8$.
6.3.5 Relaxations of binary optimization
Semidefinite optimization is also useful for computing bounds on difficult non-convex or combinatorial optimization problems. For example, consider the binary linear optimization
Fig. 6.5: Plot of lowpass filter transfer function.
problem

$$\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Ax = b,\\
& x \in \{0,1\}^n.
\end{array} \tag{6.35}$$

In general, problem (6.35) is a very difficult non-convex problem where we may have to explore $2^n$ possible solutions. Alternatively, we can use semidefinite optimization to get a lower bound on the optimal solution with polynomial complexity. We first note that

$$x_i \in \{0,1\} \iff x_i^2 = x_i,$$

which is, in fact, equivalent to a rank constraint on a semidefinite variable,

$$X = xx^T, \quad \mathrm{diag}(X) = x.$$

By relaxing the rank-1 constraint on $X$ we get a semidefinite relaxation of (6.35),

$$\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Ax = b,\\
& \mathrm{diag}(X) = x,\\
& X \succeq xx^T,
\end{array} \tag{6.36}$$
where we note that

$$X \succeq xx^T \iff \begin{bmatrix} 1 & x^T\\ x & X \end{bmatrix} \succeq 0.$$

Since (6.36) is a semidefinite optimization problem, it can be solved very efficiently. Suppose $x^\star$ is an optimal solution for (6.35); then $(x^\star, x^\star(x^\star)^T)$ is also feasible for (6.36), but the feasible set for (6.36) is larger than the feasible set for (6.35), so in general the optimal value of (6.36) serves as a lower bound. However, if the optimal solution $X^\star$ of (6.36) has
rank 1 we have found a solution to (6.35) as well. The semidefinite relaxation can also be used in a branch-and-bound algorithm for the exact solution of (6.35).
We can tighten (or improve) the relaxation (6.36) by adding other constraints that cut away parts of the feasible set, without excluding rank-1 solutions. By tightening the relaxation, we reduce the gap between the optimal values of the original problem and the relaxation. For example, we can add the constraints

$$\begin{array}{ll}
0 \leq X_{ij} \leq 1, & i = 1,\dots,n,\ j = 1,\dots,n,\\
X_{ii} \geq X_{ij}, & i = 1,\dots,n,\ j = 1,\dots,n,
\end{array}$$

and so on. This will usually have a dramatic impact on solution times and memory requirements. Already constraining a semidefinite matrix to be doubly nonnegative ($X_{ij} \geq 0$) introduces an additional $n^2$ linear inequality constraints.
6.3.6 Relaxations of boolean optimization
Similarly to Sec. 6.3.5 we can use semidefinite relaxations of boolean constraints $x \in \{-1,+1\}^n$. To that end, we note that

$$x \in \{-1,+1\}^n \iff X = xx^T, \quad \mathrm{diag}(X) = e, \tag{6.37}$$

with a semidefinite relaxation $X \succeq xx^T$ of the rank-1 constraint.

As a (standard) example of a combinatorial problem with boolean constraints, we consider an undirected graph $G = (V,E)$ described by a set of vertices $V = \{v_1,\dots,v_n\}$ and a set of edges $E = \{(v_i,v_j) \mid v_i, v_j \in V,\ i \neq j\}$, and we wish to find the cut of maximum capacity. A cut $C$ partitions the nodes $V$ into two disjoint sets $S$ and $T = V \setminus S$, and the cut-set $I$ is the set of edges with one node in $S$ and another in $T$, i.e.,

$$I = \{(u,v) \in E \mid u \in S,\ v \in T\}.$$

The capacity of a cut is then defined as the number of edges in the cut-set, $|I|$.
Fig. 6.6: Undirected graph. The cut $\{v_2, v_4, v_5\}$ has capacity 9 (thick edges).
To maximize the capacity of a cut we define the symmetric adjacency matrix $A \in \mathcal{S}^n$,

$$[A]_{ij} = \begin{cases} 1, & (v_i, v_j) \in E,\\ 0, & \text{otherwise}, \end{cases}$$

where $n = |V|$, and let

$$x_j = \begin{cases} +1, & v_j \in S,\\ -1, & v_j \notin S. \end{cases}$$

Suppose $v_i \in S$. Then $1 - x_i x_j = 0$ if $v_j \in S$ and $1 - x_i x_j = 2$ if $v_j \notin S$, so we get an expression for the capacity as

$$c(x) = \frac{1}{4}\sum_{ij}(1 - x_i x_j)A_{ij} = \frac{1}{4}e^T A e - \frac{1}{4}x^T A x,$$

and discarding the constant term $e^T A e$ gives us the MAX-CUT problem

$$\begin{array}{ll}
\text{minimize} & x^T A x\\
\text{subject to} & x \in \{-1,+1\}^n,
\end{array} \tag{6.38}$$

with a semidefinite relaxation

$$\begin{array}{ll}
\text{minimize} & \langle A, X \rangle\\
\text{subject to} & \mathrm{diag}(X) = e,\\
& X \succeq 0.
\end{array} \tag{6.39}$$
Chapter 7
Practical optimization
In this chapter we discuss various practical aspects of creating optimization models.
7.1 Conic reformulations
Many nonlinear constraint families were reformulated to conic form in previous chapters, and these examples are often sufficient to construct conic models for the optimization problems at hand. In case you do need to go beyond the cookbook examples, however, a few composition rules are worth knowing to guarantee correctness of your reformulation.
7.1.1 Perspective functions
The perspective of a function $f(x)$ is defined as $sf(x/s)$ for $s > 0$. From any conic representation of $t \geq f(x)$ one can reach a representation of $t \geq sf(x/s)$ by exchanging all constants $c$ with their homogenized counterpart $cs$.
Example 7.1. In Sec. 5.2.6 it was shown that the logarithm of a sum of exponentials,

$$t \geq \log(e^{x_1} + \cdots + e^{x_n}),$$

can be modeled as:

$$\sum u_i \leq 1, \quad (u_i, 1, x_i - t) \in K_{\exp}, \quad i = 1,\dots,n.$$

It follows immediately that its perspective,

$$t \geq s\log(e^{x_1/s} + \cdots + e^{x_n/s}),$$

can be modeled as:

$$\sum u_i \leq s, \quad (u_i, s, x_i - t) \in K_{\exp}, \quad i = 1,\dots,n.$$
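The perspective in Example 7.1 is easy to sanity-check numerically with a stable log-sum-exp helper (the names are ours):

```python
import numpy as np

def lse(x):
    # numerically stable log(sum(exp(x_i)))
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def persp_lse(x, s):
    # perspective s * lse(x / s), defined for s > 0
    return s * lse(x / s)
```

For instance, $\mathrm{persp\_lse}(\lambda x, \lambda) = \lambda\,\mathrm{lse}(x)$ by homogeneity, and as $s \to 0^+$ the perspective tends to $\max_i x_i$.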
Proof. If $[t \geq f(x)] \iff [t \geq c(x,r) + c_0,\ A(x,r) + b \in K]$ for some linear functions $c$ and $A$, a cone $K$, and artificial variable $r$, then $[t \geq sf(x/s)] \iff [t \geq s\left(c(x/s,r) + c_0\right),\ A(x/s,r) + b \in K] \iff [t \geq c(x,\bar r) + c_0 s,\ A(x,\bar r) + bs \in K]$, where $\bar r$ is taken as a new artificial variable in place of $rs$.
7.1.2 Composite functions
In representing composites $f(g(x))$, it is often helpful to move the inner function $g(x)$ to a separate constraint. That is, we would like $t \geq f(g(x))$ to be reformulated as $t \geq f(r)$ for a new variable $r$ constrained as $r = g(x)$. This reformulation works directly as stated if $g$ is affine; that is, $g(x) = a^T x + b$. Otherwise, one of two possible relaxations may be considered:

1. If $f$ is convex and $g$ is convex, then $t \geq f(r)$, $r \geq g(x)$ is equivalent to

$$t \geq \begin{cases} f(g(x)) & \text{if } g(x) \in \mathcal{F}_+,\\ f_{\min} & \text{otherwise}, \end{cases}$$

2. If $f$ is convex and $g$ is concave, then $t \geq f(r)$, $r \leq g(x)$ is equivalent to

$$t \geq \begin{cases} f(g(x)) & \text{if } g(x) \in \mathcal{F}_-,\\ f_{\min} & \text{otherwise}, \end{cases}$$

where $\mathcal{F}_+$ (resp. $\mathcal{F}_-$) is the domain on which $f$ is nondecreasing (resp. nonincreasing), and $f_{\min}$ is the minimum of $f$, if finite, and $-\infty$ otherwise. Note that Relaxation 1 provides an exact representation if $g(x) \in \mathcal{F}_+$ for all $x$ in the domain, and likewise for Relaxation 2 if $g(x) \in \mathcal{F}_-$ for all $x$ in the domain.
Proof. Consider Relaxation 1. Intuitively, if $r \in \mathcal{F}_+$, you can expect clever solvers to drive $r$ towards $-\infty$ in an attempt to relax $t \geq f(r)$, and this push continues until $r \geq g(x)$ becomes active or else, in case $g(x) \notin \mathcal{F}_+$, until $f_{\min}$ is reached. On the other hand, if $r \in \mathcal{F}_-$ (and thus $r \geq g(x) \in \mathcal{F}_-$) then solvers will drive $r$ towards $+\infty$ until $f_{\min}$ is reached.
Example 7.2. Here are some simple substitutions.

• The inequality $t \geq \exp(g(x))$ can be rewritten as $t \geq \exp(r)$ and $r \geq g(x)$ for all convex functions $g(x)$. This is because $f(r) = \exp(r)$ is nondecreasing everywhere.

• The inequality $2st \geq g(x)^2$ can be rewritten as $2st \geq r^2$ and $r \geq g(x)$ for all nonnegative convex functions $g(x)$. This is because $f(r) = r^2$ is nondecreasing on $r \geq 0$.

• The nonconvex inequality $t \geq (x^2 - 1)^2$ can be relaxed as $t \geq r^2$ and $r \geq x^2 - 1$. This relaxation is exact on $x^2 - 1 \geq 0$ where $f(r) = r^2$ is nondecreasing, i.e., on the disjoint domain $|x| \geq 1$. On the domain $|x| \leq 1$ it represents the strongest possible convex cut given by $t \geq f_{\min} = 0$.
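The behaviour in the last bullet can be traced numerically: for a fixed $x$, the smallest feasible $t$ in the relaxation is obtained by pushing $r$ down to $\max(x^2 - 1, 0)$, exactly as the proof above describes.

```python
def relaxed_t(x):
    # minimal t subject to t >= r^2, r >= x^2 - 1:
    # a solver drives r down to max(x^2 - 1, 0)
    r = max(x * x - 1.0, 0.0)
    return r * r

def exact_t(x):
    # the original nonconvex right-hand side (x^2 - 1)^2
    return (x * x - 1.0) ** 2
```

The two agree for $|x| \geq 1$; inside $|x| \leq 1$ the relaxation yields the convex cut $t \geq 0$.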
Example 7.3. Some amount of work is generally required to find the correct reformulation of a nested nonlinear constraint. For instance, suppose that for $x, y, z \geq 0$ with $xyz > 1$ we want to write

$$t \geq \frac{1}{xyz - 1}.$$

A natural first attempt is:

$$t \geq \frac{1}{r}, \quad r \leq xyz - 1,$$

which corresponds to Relaxation 2 with $f(r) = 1/r$ and $g(x,y,z) = xyz - 1 > 0$. The function $f$ is indeed convex and nonincreasing on the range of $g(x,y,z)$, and the inequality $tr \geq 1$ is moreover representable with a rotated quadratic cone. Unfortunately $g$ is not concave. We know that a monomial like $xyz$ appears in connection with the power cone, but that requires a homogeneous constraint such as $xyz \geq u^3$. This gives us an idea to try

$$t \geq \frac{1}{r^3}, \quad r^3 \leq xyz - 1,$$

which is $f(r) = 1/r^3$ and $g(x,y,z) = (xyz - 1)^{1/3} > 0$. This provides the right balance: all conditions for validity and exactness of Relaxation 2 are satisfied. Introducing another variable $u$ we get the following model:

$$tr^3 \geq 1, \quad xyz \geq u^3, \quad u \geq (r^3 + 1^3)^{1/3}.$$

We refer to Sec. 4 to verify that all the constraints above are representable using power cones. We leave it as an exercise to find other conic representations, based on other transformations of the original inequality.
7.1.3 Convex univariate piecewise-defined functions
Consider a univariate function with $k$ pieces:

$$f(x) = \begin{cases}
f_1(x) & \text{if } x \leq \alpha_1,\\
f_i(x) & \text{if } \alpha_{i-1} \leq x \leq \alpha_i, \text{ for } i = 2,\dots,k-1,\\
f_k(x) & \text{if } \alpha_{k-1} \leq x.
\end{cases}$$

Suppose $f(x)$ is convex (in particular each piece is convex by itself) and $f_i(\alpha_i) = f_{i+1}(\alpha_i)$. In representing the epigraph $t \geq f(x)$ it is helpful to proceed via the equivalent representation:

$$\begin{array}{l}
t = \sum_{i=1}^k t_i - \sum_{i=1}^{k-1} f_i(\alpha_i),\\
x = \sum_{i=1}^k x_i - \sum_{i=1}^{k-1} \alpha_i,\\
t_i \geq f_i(x_i) \quad \text{for } i = 1,\dots,k,\\
x_1 \leq \alpha_1, \quad \alpha_{i-1} \leq x_i \leq \alpha_i \text{ for } i = 2,\dots,k-1, \quad \alpha_{k-1} \leq x_k.
\end{array}$$
Proof. In the special case when $k = 2$ and $\alpha_1 = f_1(\alpha_1) = f_2(\alpha_1) = 0$ the epigraph of $f$ is equal to the Minkowski sum $\{(t_1 + t_2, x_1 + x_2) : t_i \geq f_i(x_i)\}$. In general the epigraph over two consecutive pieces is obtained by shifting them to $(0,0)$, computing the Minkowski sum and shifting back. Finally, more than two pieces can be joined by continuing this argument by induction.

As the reformulation grows in size with the number of pieces, it is preferable to keep this number low. Trivially, if $f_i(x) = f_{i+1}(x)$, these two pieces can be merged. Substituting $f_i(x)$ and $f_{i+1}(x)$ for $\max(f_i(x), f_{i+1}(x))$ is sometimes an invariant change that facilitates this merge. For instance, it always works for affine functions $f_i(x)$ and $f_{i+1}(x)$. Finally, if $f(x)$ is symmetric around some point $\alpha$, we can represent its epigraph via a piecewise function, with only half the number of pieces, defined in terms of a new variable $z = |x - \alpha| + \alpha$.
Example 7.4 (Huber loss function). The Huber loss function

$$f(x) = \begin{cases}
-2x - 1 & \text{if } x \leq -1,\\
x^2 & \text{if } -1 \leq x \leq 1,\\
2x - 1 & \text{if } 1 \leq x,
\end{cases}$$

is convex and its epigraph $t \geq f(x)$ has an equivalent representation

$$\begin{array}{l}
t \geq t_1 + t_2 + t_3 - 2,\\
x = x_1 + x_2 + x_3,\\
t_1 = -2x_1 - 1, \quad t_2 \geq x_2^2, \quad t_3 = 2x_3 - 1,\\
x_1 \leq -1, \quad -1 \leq x_2 \leq 1, \quad 1 \leq x_3,
\end{array}$$

where $t_2 \geq x_2^2$ is representable by a rotated quadratic cone. No two pieces of $f(x)$ can be merged to reduce the size of this formulation, but the loss function does satisfy a simple symmetry; namely $f(x) = f(-x)$. We can thus represent its epigraph by $t \geq g(z)$ and $z \geq |x|$, where

$$g(z) = \begin{cases}
z^2 & \text{if } z \leq 1,\\
2z - 1 & \text{if } 1 \leq z.
\end{cases}$$

In this particular example, however, unless the absolute value $z$ from $z \geq |x|$ is used elsewhere, the cost of introducing it does not outweigh the savings achieved by going from three pieces in $x$ to two pieces in $z$.
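A direct implementation of the piecewise definition is handy for checking any epigraph model against function values:

```python
def huber(x):
    # Huber loss as defined above: quadratic near zero, linear in the tails
    ax = abs(x)
    return x * x if ax <= 1 else 2 * ax - 1
```

The pieces meet continuously at $x = \pm 1$ (both give the value $1$), which is exactly the condition $f_i(\alpha_i) = f_{i+1}(\alpha_i)$ required by the epigraph construction.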
7.2 Avoiding ill-posed problems
For a well-posed continuous problem a small change in the input data should induce a small change of the optimal solution. A problem is ill-posed if small perturbations of the problem data result in arbitrarily large perturbations of the solution, or even change the feasibility status of the problem. In such cases small rounding errors, or solving the problem on a different computer, can result in different or wrong solutions.
In fact, from an algorithmic point of view, even computing a wrong solution is numerically difficult for ill-posed problems. A rigorous definition of the degree of ill-posedness is possible by defining a condition number. This is an attractive, but not very practical, metric, as its evaluation requires solving several auxiliary optimization problems. Therefore we only make the modest recommendation to avoid problems
• that are nearly infeasible,

• with constraints that are linearly dependent, or nearly linearly dependent.
Example 7.5 (Near linear dependencies). To give some idea about the perils of near linear dependence, consider this pair of optimization problems:

$$\begin{array}{ll}
\text{maximize} & x\\
\text{subject to} & 2x - y \geq -1,\\
& 2.0001x - y \leq 0,
\end{array}$$

and

$$\begin{array}{ll}
\text{maximize} & x\\
\text{subject to} & 2x - y \geq -1,\\
& 1.9999x - y \leq 0.
\end{array}$$

The first problem has a unique optimal solution $(x,y) = (10^4, 2\cdot 10^4 + 1)$, while the second problem is unbounded. This is caused by the fact that the hyperplanes (here: straight lines) defining the constraints in each problem are almost parallel. Moreover, we can consider another modification:

$$\begin{array}{ll}
\text{maximize} & x\\
\text{subject to} & 2x - y \geq -1,\\
& 2.001x - y \leq 0,
\end{array}$$

with optimal solution $(x,y) = (10^3, 2\cdot 10^3 + 1)$, which shows how a small perturbation of the coefficients induces a large change of the solution.
Typical examples of problems with nearly linearly dependent constraints are discretizations of continuous processes, where the constraints invariably become more correlated as we make the discretization finer; as such there may be nothing wrong with the discretization or problem formulation, but we should expect numerical difficulties for sufficiently fine discretizations.

One should also be careful not to specify problems whose optimal value is only achieved in the limit. A trivial example is

$$\text{minimize } e^x, \quad x \in \mathbb{R}.$$
The infimum value 0 is not attained for any finite $x$, which can lead to unspecified behaviour of the solver. More examples of ill-posed conic problems are discussed in Fig. 7.1 and Sec. 8. In particular, Fig. 7.1 depicts some troublesome problems in two dimensions. In (a) the minimum of $y \geq 1/x$ on the nonnegative orthant is unattained, approached only as $x \to \infty$. In (b) the only feasible point is $(0,0)$, but objectives maximizing $x$ are not blocked by the purely vertical normal vectors of active constraints at this point, falsely suggesting local progress could be made. In (c) the intersection of the two subsets is empty, but the distance between them is zero. Finally, in (d) minimizing $x$ on $y \geq x^2$ is unbounded at minus infinity for $y \to \infty$, but there is no improving ray to follow. Although seemingly unrelated, the four cases are actually primal-dual pairs; (a) with (b) and (c) with (d). In fact, the missing normal vector property in (b), desired to certify optimality, can be attributed to (a) not attaining the best among objective values at distance zero to feasibility, and the missing positive distance property in (c), desired to certify infeasibility, is because (d) has no improving ray.
Fig. 7.1: Geometric representations of some ill-posed situations.
7.3 Scaling
Another difficulty encountered in practice involves models that are badly scaled. Loosely speaking, we consider a model to be badly scaled if

• variables are measured on very different scales,

• constraints or bounds are measured on very different scales.
For example if one variable $x_1$ has units of molecules and another variable $x_2$ measures temperature, we might expect an objective function such as

$$x_1 + 10^{12}x_2,$$

and in finite precision the second term dominates the objective and renders the contribution from $x_1$ insignificant and unreliable.
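The absorption effect of mismatched scales can be reproduced directly in double-precision arithmetic; this is a generic illustration of floating point behaviour, independent of any particular solver:

```python
# doubles carry about 16 significant digits; once the scale gap
# exceeds that, the smaller term vanishes entirely
small, huge = 1.0, 1e17
assert small + huge == huge        # the unit contribution is lost
# even at a 10^12 gap, the small term loses relative accuracy
print((1e12 + 0.1) - 1e12)         # close to, but not exactly, 0.1
```

Solvers additionally apply feasibility and optimality tolerances that are relative to the largest coefficients, so the practical loss of accuracy sets in well before outright absorption.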
A similar situation (from a numerical point of view) is encountered when using penalization or big-$M$ strategies. Assume we have a standard linear optimization problem (2.12) with the additional constraint that $x_1 = 0$. We may eliminate $x_1$ completely from the model, or we might add an additional constraint, but suppose we choose to formulate a penalized problem instead:

$$\begin{array}{ll}
\text{minimize} & c^T x + 10^{12}x_1\\
\text{subject to} & Ax = b,\\
& x \geq 0,
\end{array}$$

reasoning that the large penalty term will force $x_1 = 0$. However, if $\|c\|$ is small we have the exact same problem, namely that in finite precision the penalty term will completely dominate the objective and render the contribution $c^T x$ insignificant or unreliable. Therefore, the penalty term should be chosen carefully.
Example 7.6 (Risk-return tradeoff). Suppose we want to solve an efficient frontier variant of the optimal portfolio problem from Sec. 3.3.3 with an objective of the form

$$\text{maximize } \mu^T x - \frac{1}{2}\gamma x^T \Sigma x,$$

where the parameter $\gamma$ controls the tradeoff between return and risk. The user might choose a very small $\gamma$ to discount the risk term. This, however, may lead to numerical issues caused by a very large solution. To see why, consider a simple one-dimensional, unconstrained analogue:

$$\text{maximize } x - \frac{1}{2}\gamma x^2, \quad x \in \mathbb{R},$$

and note that the maximum is attained for $x = 1/\gamma$, which may be very large.
Example 7.7 (Explicit redundant bounds). Consider again the problem (2.12), but with additional (redundant) constraints $x_i \leq \gamma$. This is a common approach for some optimization practitioners. The problem we solve is then

$$\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Ax = b,\\
& x \geq 0,\\
& x \leq \gamma e,
\end{array}$$

with a dual problem

$$\begin{array}{ll}
\text{maximize} & b^T y - \gamma e^T z\\
\text{subject to} & A^T y + s - z = c,\\
& s, z \geq 0.
\end{array}$$

Suppose we do not know a priori an upper bound on $\|x\|_\infty$, so we choose a very large $\gamma = 10^{12}$, reasoning that this will not change the optimal solution. Note that the large variable bound becomes a penalty term in the dual problem; in finite precision such a large bound will effectively destroy accuracy of the solution.
Example 7.8 (Big-$M$). Suppose we have a mixed integer model which involves a big-$M$ constraint (see Sec. 9), for instance

$$a^T x \leq b + M(1 - z),$$

as in Sec. 9.1.3. The constant $M$ must be big enough to guarantee that the constraint becomes redundant when $z = 0$, or we risk shrinking the feasible set, perhaps to the point of creating an infeasible model. On the other hand, the user might set $M$ to a huge, safe value, say $M = 10^{12}$. While this is mathematically correct, it will lead to a model with poorer linear relaxations and will most likely increase solution time (sometimes dramatically). It is therefore very important to put some effort into estimating a reasonable value of $M$ which is safe, but not too big.
7.4 The huge and the tiny
Some types of problems and constraints are especially prone to behave badly with very large or very small numbers. We discuss such examples briefly in this section.
Sum of squares
The sum of squares $x_1^2 + \cdots + x_n^2$ can be bounded above using the rotated quadratic cone:

$$t \geq x_1^2 + \cdots + x_n^2 \iff (\tfrac{1}{2}, t, x) \in \mathcal{Q}^{n+2}_{\mathrm{r}},$$

but in most cases it is better to bound the square root with a quadratic cone:

$$t' \geq \sqrt{x_1^2 + \cdots + x_n^2} \iff (t', x) \in \mathcal{Q}^{n+1}.$$

The latter has slightly better numerical properties: $t'$ is roughly comparable with $x$ and is measured on the same scale, while $t$ grows as the square of $x$, which will sooner lead to numerical instabilities.
Exponential cone
Using the function $e^x$ for $x \leq -30$ or $x \geq 30$ would be rather questionable if $e^x$ is supposed to represent any realistic value. Note that $e^{30} \approx 10^{13}$ and that $e^{30.0000001} - e^{30} \approx 10^6$, so a tiny perturbation of the exponent produces a huge change of the result. Ideally, $x$ should have a single-digit absolute value. For similar reasons, it is even more important to have well-scaled data. For instance, in floating point arithmetic we have the "equality"

$$e^{20} + e^{-20} = e^{20},$$

since the smaller summand disappears in comparison to the other one, $10^{17}$ times bigger.

For the same reason it is advised to replace explicit inequalities involving $\exp(x)$ with log-sum-exp variants (see Sec. 5.2.6). For example, suppose we have a constraint such as

$$t \geq e^{x_1} + e^{x_2} + e^{x_3}.$$

Then after a substitution $t = e^{t'}$ we have

$$1 \geq e^{x_1 - t'} + e^{x_2 - t'} + e^{x_3 - t'},$$

which has the advantage that $t'$ is of the same order of magnitude as $x_i$, and that the exponents $x_i - t'$ are likely to have much smaller absolute values than simply $x_i$.
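The benefit of the substitution can be seen by evaluating both sides in floating point; choosing $t'$ near $\max_i x_i$ keeps every exponent small. A plain-Python illustration with made-up data:

```python
import math

x = [15.0, 18.0, 20.0]
naive = math.log(sum(math.exp(v) for v in x))      # exponents up to e^20 ~ 5e8
tp = max(x)                                        # plays the role of t'
shifted = tp + math.log(sum(math.exp(v - tp) for v in x))
# the shifted exponents v - tp lie in [-5, 0], far from overflow
```

For larger entries, say $x_i \approx 600$, the naive form overflows outright while the shifted form remains perfectly well behaved.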
Power cone
The power cone is not reliable when one of the exponents is very small. For example, consider the function $f(x) = x^{0.01}$, which behaves almost like the indicator function of $x > 0$ in the sense that

$$f(0) = 0, \quad \text{and} \quad f(x) > 0.5 \text{ for } x > 10^{-30}.$$

Now suppose $x = 10^{-20}$. Is the constraint $f(x) > 0.5$ satisfied? In principle yes, but in practice $x$ could have been obtained as a solution to an optimization problem and it may in fact represent $0$ up to numerical error. The function $f(x)$ is sensitive to changes in $x$ well below standard numerical accuracy. The point $x = 0$ does not satisfy $f(x) > 0.5$ but it is only $10^{-30}$ from doing so.
7.5 Semidefinite variables
Special care should be given to models involving semidefinite matrix variables (see Sec. 6), otherwise it is easy to produce an unnecessarily inefficient model. The general rule of thumb is:

• having many small matrix variables is more efficient than one big matrix variable,

• efficiency drops with a growing number of semidefinite terms involved in linear constraints. This can have a much bigger impact on the solution time than increasing the dimension of the semidefinite variable.
Let us consider a few examples.
Block matrices
Given two matrix variables $X, Y \succeq 0$ do not assemble them in a block matrix and write the constraint as

$$\begin{bmatrix} X & 0\\ 0 & Y \end{bmatrix} \succeq 0.$$

This increases the dimension of the problem and, even worse, introduces unnecessary constraints for a large portion of entries of the block matrix.
Schur complement
Suppose we want to model a relaxation of a rank-one constraint:

$$\begin{bmatrix} X & x\\ x^T & 1 \end{bmatrix} \succeq 0,$$

where $x \in \mathbb{R}^n$ and $X \in \mathcal{S}^n_+$. The correct way to do this is to set up a matrix variable $Z \in \mathcal{S}^{n+1}_+$ with only a linear number of constraints:

$$\begin{array}{l}
Z_{i,n+1} = x_i, \quad i = 1,\dots,n,\\
Z_{n+1,n+1} = 1,\\
Z \succeq 0,
\end{array}$$

and use the upper-left $n \times n$ part of $Z$ as the original $X$. Going the other way around, i.e. starting with a variable $X$ and aligning it with the corner of another, bigger semidefinite matrix $Z$, introduces $n(n+1)/2$ equality constraints and will quickly lead to formidable solution times.
Sparse LMIs
Suppose we want to model a problem with a sparse linear matrix inequality (see Sec. 6.2.1) such as:

$$\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & A_0 + \sum_{i=1}^n A_i x_i \succeq 0,
\end{array}$$

where $x \in \mathbb{R}^n$ and $A_i$ are symmetric $m \times m$ matrices. The representation of this problem in primal form is:

$$\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & A_0 + \sum_{i=1}^n A_i x_i = X,\\
& X \succeq 0,
\end{array}$$

and the linear constraint requires a full set of $m(m+1)/2$ equalities, many of which are just $X_{k,l} = 0$, regardless of the sparsity of $A_i$. However the dual problem (see Sec. 8.6) is:

$$\begin{array}{ll}
\text{maximize} & -\langle A_0, Z \rangle\\
\text{subject to} & \langle A_i, Z \rangle = c_i, \quad i = 1,\dots,n,\\
& Z \succeq 0,
\end{array}$$

and the number of nonzeros in the linear constraints is just the total number of nonzeros in the matrices $A_i$. This means that large, sparse LMIs should almost always be dualized and entered in that form for efficiency reasons.
7.6 The quality of a solution
In this section we will discuss how to validate an obtained solution. Assume we have a conic model with continuous variables only and that the optimization software has reported an optimal primal and dual solution. Given such a solution, we might ask how to verify that it is indeed feasible and optimal.
To that end, consider a simple model:

$$\begin{array}{ll}
\text{minimize} & -x_2\\
\text{subject to} & x_1 + x_2 \leq 3,\\
& x_2^2 \leq 2,\\
& x_1, x_2 \geq 0,
\end{array} \tag{7.1}$$

where a solver might approximate the solution as

$$x_1 = 0.0000000000000000 \quad \text{and} \quad x_2 = 1.4142135623730951,$$

and therefore the approximate optimal objective value is

$$-1.4142135623730951.$$

The true objective value is $-\sqrt{2}$, so the approximate objective value is wrong by the amount

$$1.4142135623730951 - \sqrt{2} \approx 10^{-16}.$$

Most likely this difference is irrelevant for all practical purposes. Nevertheless, in general a solution obtained using floating point arithmetic is only an approximation. Most (if not all) commercial optimization software uses double precision floating point arithmetic, implying that about 16 digits are correct in the computations performed internally by the software. This also means that irrational numbers such as $\sqrt{2}$ and $\pi$ can only be stored accurately within 16 digits.
Verifying feasibility
A good practice after solving an optimization problem is to evaluate the reported solution. At the very least this process should be carried out during the initial phase of building a model, or if the reported solution is unexpected in some way. The first step in that process is to check that the solution is feasible; in case of the small example (7.1) this amounts to checking that:

$$\begin{array}{ll}
x_1 + x_2 = 0.0000000000000000 + 1.4142135623730951 \leq 3, &\\
x_2^2 = 2.0000000000000004 \leq 2, & (?)\\
x_1 = 0.0000000000000000 \geq 0, &\\
x_2 = 1.4142135623730951 \geq 0, &
\end{array}$$

which demonstrates that one constraint is slightly violated due to computations in finite precision. It is up to the user to assess the significance of a specific violation; for example a violation of one unit in

$$x_1 + x_2 \leq 1$$

may or may not be more significant than a violation of one unit in

$$x_1 + x_2 \leq 10^9.$$

The right-hand side of $10^9$ may itself be the result of a computation in finite precision, and may only be known with, say, 3 digits of accuracy. Therefore, a violation of 1 unit is not significant since the true right-hand side could just as well be $1.001 \cdot 10^9$. Certainly a violation of $4 \cdot 10^{-16}$ as in our example is within the numerical tolerance for zero and we would accept the solution as feasible. In practice it would be standard to see violations of order $10^{-8}$.
Verifying optimality
Another question is how to verify that a feasible approximate solution is actually optimal. This is answered by duality theory, which is discussed in Sec. 2.4 for linear problems and in Sec. 8 in full generality. Following Sec. 8 we can derive an equivalent version of the dual problem as:

$$\begin{array}{ll}
\text{maximize} & -3y_1 - y_2 - y_3\\
\text{subject to} & 2y_2 y_3 \geq (y_1 + 1)^2,\\
& y_1 \geq 0.
\end{array}$$

This problem has a feasible point $(y_1, y_2, y_3) = (0, 1/\sqrt{2}, 1/\sqrt{2})$ with objective value $-\sqrt{2}$, matching the objective value of the primal problem. By duality theory this proves that both the primal and dual solutions are optimal. Again, in finite precision the dual objective value may be reported as, for example,

$$-1.4142135623730950,$$

leading to a duality gap of about $10^{-16}$ between the approximate primal and dual objective values. It is up to the user to assess the significance of any duality gap and accept or reject the solution as a sufficiently good approximation of the optimal value.
7.7 Distance to a cone
The violation of a linear constraint $a^Tx\leq b$ under a given solution $x^*$ is obviously
\[ \max(0,\ a^Tx^* - b). \]
It is less obvious how to assess violations of conic constraints. To this end suppose we have a convex cone $K$. The (Euclidean) projection of a point $x$ onto $K$ is defined as the solution of the problem
\[
\begin{array}{ll}
\mbox{minimize}   & \|p - x\|_2 \\
\mbox{subject to} & p \in K.
\end{array}
\quad (7.2)
\]
We denote the projection by $\mathrm{proj}_K(x)$. The distance of $x$ to $K$,
\[ \mathrm{dist}(x, K) = \|x - \mathrm{proj}_K(x)\|_2, \]
i.e. the objective value of (7.2), measures the violation of a conic constraint involving $K$. Obviously
\[ \mathrm{dist}(x, K) = 0 \iff x \in K. \]
This distance measure is attractive because it depends only on the set $K$ and not on any particular representation of $K$ using (in)equalities.
Example 7.9 (Surprisingly small distance). Let $K = \mathcal{P}_3^{0.1,0.9}$ be the power cone defined by the inequality $x^{0.1}y^{0.9}\geq|z|$. The point
\[ x^* = (x, y, z) = (0, 10000, 500) \]
is clearly not in the cone, in fact $x^{0.1}y^{0.9} = 0$ while $|z| = 500$, so the violation of the conic inequality is seemingly large. However, the point
\[ p = (10^{-8}, 10000, 500) \]
belongs to $\mathcal{P}_3^{0.1,0.9}$ (since $(10^{-8})^{0.1}\cdot 10000^{0.9}\approx 630$), hence
\[ \mathrm{dist}(x^*, \mathcal{P}_3^{0.1,0.9})\leq\|x^* - p\|_2 = 10^{-8}. \]
Therefore the distance of $x^*$ to the cone is actually very small.
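Example 7.9 is easy to reproduce numerically; the membership test below just restates the defining inequality $x^{0.1}y^{0.9}\geq|z|$, $x, y\geq 0$:

```python
def in_power_cone(x, y, z, a=0.1):
    # Membership test for P3^{a,1-a}: x^a * y^(1-a) >= |z|, x, y >= 0.
    return x >= 0 and y >= 0 and x**a * y**(1 - a) >= abs(z)

x_star = (0.0, 10000.0, 500.0)
p = (1e-8, 10000.0, 500.0)

print(in_power_cone(*x_star))   # False: 0^0.1 * 10000^0.9 = 0 < 500
print(in_power_cone(*p))        # True: (1e-8)^0.1 * 10000^0.9 is about 630

# The distance from x* to the cone is at most ||x* - p||_2 = 1e-8.
dist_bound = sum((a - b) ** 2 for a, b in zip(x_star, p)) ** 0.5
print(dist_bound)               # ~1e-08
```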
Distance to certain cones
For some basic cone types the projection problem (7.2) can be solved analytically. Denoting $[x]_+ = \max(0, x)$ we have that:

• For $x\in\mathbb{R}$ the projection onto the nonnegative half-line is $\mathrm{proj}_{\mathbb{R}_+}(x) = [x]_+$.

• For $(t, x)\in\mathbb{R}\times\mathbb{R}^{n-1}$ the projection onto the quadratic cone is
\[
\mathrm{proj}_{\mathcal{Q}^n}(t, x) =
\left\{
\begin{array}{ll}
(t, x) & \mbox{if } t\geq\|x\|_2, \\
\frac12\left(\frac{t}{\|x\|_2} + 1\right)(\|x\|_2, x) & \mbox{if } -\|x\|_2 < t < \|x\|_2, \\
0 & \mbox{if } t\leq -\|x\|_2.
\end{array}
\right.
\]

• For $X = \sum_{i=1}^n\lambda_iq_iq_i^T$ the projection onto the semidefinite cone is
\[
\mathrm{proj}_{\mathcal{S}_+^n}(X) = \sum_{i=1}^n[\lambda_i]_+q_iq_i^T.
\]
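The half-line and quadratic-cone formulas translate directly into code; the semidefinite case additionally requires an eigenvalue decomposition and is omitted. A minimal sketch:

```python
import math

def proj_halfline(x):
    # Projection onto R_+: [x]_+ = max(0, x).
    return max(0.0, x)

def proj_quadratic_cone(t, x):
    # Projection of (t, x) onto Q^n = {(t, x) : t >= ||x||_2}, using the
    # three-case formula above.
    nx = math.sqrt(sum(v * v for v in x))
    if t >= nx:
        return t, list(x)
    if t <= -nx:
        return 0.0, [0.0] * len(x)
    c = 0.5 * (t / nx + 1.0)
    return c * nx, [c * v for v in x]

# A point outside the cone: t = 0 < ||(3, 4)||_2 = 5.
pt, px = proj_quadratic_cone(0.0, [3.0, 4.0])
print(pt, px)   # 2.5 [1.5, 2.0] -- the projection lies on the boundary
```

The returned point satisfies $t = \|x\|_2$ exactly, i.e. it lies on the boundary of the cone, as expected for the middle case.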
Chapter 8
Duality in conic optimization
In Sec. 2 we introduced duality and related concepts for linear optimization. Here we present a more general version of this theory for conic optimization and we illustrate it with examples. Although this chapter is self-contained, we recommend familiarity with Sec. 2, of which some of this material is a natural extension.

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problems.
From now on we consider a conic problem in standard form
\[
\begin{array}{ll}
\mbox{minimize}   & c^Tx \\
\mbox{subject to} & Ax = b, \\
                  & x\in K,
\end{array}
\quad (8.1)
\]
where $K$ is a convex cone. In practice $K$ would likely be defined as a product
\[ K = K_1\times\cdots\times K_m \]
of smaller cones corresponding to actual constraints in the problem formulation. The abstract formulation (8.1) encompasses such special cases as $K = \mathbb{R}^n_+$ (linear optimization), $K_i = \mathbb{R}$ (unconstrained variable) or $K_i = \mathcal{S}_+^n$ ($\frac12 n(n+1)$ variables representing the upper-triangular part of a matrix).

The feasible set for (8.1):
\[ \mathcal{F}_p = \{x\in\mathbb{R}^n \mid Ax = b\}\cap K \]
is a section of $K$. We say the problem is feasible if $\mathcal{F}_p\neq\emptyset$ and infeasible otherwise. The value of (8.1) is defined as
\[ p^* = \inf\{c^Tx : x\in\mathcal{F}_p\}, \]
allowing the special cases of $p^* = +\infty$ (the problem is infeasible) and $p^* = -\infty$ (the problem is unbounded). Note that $p^*$ may not be attained, i.e. the infimum is not necessarily a minimum, although such problems are ill-posed from a practical point of view (see Sec. 7).
Example 8.1 (Unattained problem value). The conic quadratic problem
\[
\begin{array}{ll}
\mbox{minimize}   & x \\
\mbox{subject to} & (x, y, 1)\in\mathcal{Q}_r^3,
\end{array}
\]
with constraint equivalent to $2xy\geq 1$, $x, y\geq 0$, has $p^* = 0$, but this optimal value is not attained by any point in $\mathcal{F}_p$.
8.1 Dual cone
Suppose $K\subseteq\mathbb{R}^n$ is a closed convex cone. We define the dual cone $K^*$ as
\[ K^* = \{y\in\mathbb{R}^n : y^Tx\geq 0\ \forall x\in K\}. \quad (8.2) \]
For simplicity we write $y^Tx$, denoting the standard Euclidean inner product, but everything in this section applies verbatim to the inner product of matrices in the semidefinite context. In other words $K^*$ consists of vectors which form an acute (or right) angle with every vector in $K$. We see easily that $K^*$ is in fact a convex cone. The main example, associated with linear optimization, is the dual of the positive orthant:

Lemma 8.1 (Dual of linear cone).
\[ (\mathbb{R}^n_+)^* = \mathbb{R}^n_+. \]

Proof. This is obvious for $n = 1$, and then we use the simple fact
\[ (K_1\times\cdots\times K_m)^* = K_1^*\times\cdots\times K_m^*, \]
which we invite the reader to prove.
All cones studied in this cookbook can be explicitly dualized as we show next.
Lemma 8.2 (Duals of nonlinear cones). We have the following:

• The quadratic, rotated quadratic and semidefinite cones are self-dual:
\[ (\mathcal{Q}^n)^* = \mathcal{Q}^n,\quad (\mathcal{Q}_r^n)^* = \mathcal{Q}_r^n,\quad (\mathcal{S}_+^n)^* = \mathcal{S}_+^n. \]

• The dual of the power cone (4.4) is
\[
(\mathcal{P}_n^{\alpha_1,\dots,\alpha_m})^* = \left\{y\in\mathbb{R}^n : \left(\frac{y_1}{\alpha_1},\dots,\frac{y_m}{\alpha_m}, y_{m+1},\dots,y_n\right)\in\mathcal{P}_n^{\alpha_1,\dots,\alpha_m}\right\}.
\]

• The dual of the exponential cone (5.2) is the closure
\[
(K_{\mathrm{exp}})^* = \mathrm{cl}\left\{y\in\mathbb{R}^3 : y_1\geq -y_3e^{y_2/y_3 - 1},\ y_1 > 0,\ y_3 < 0\right\}.
\]
Proof. We only sketch some parts of the proof and indicate geometric intuitions. First, let us show $(\mathcal{Q}^n)^* = \mathcal{Q}^n$. For $(t, x), (s, y)\in\mathcal{Q}^n$ we have
\[ st + y^Tx\geq\|y\|_2\|x\|_2 + y^Tx\geq 0 \]
by the Cauchy-Schwarz inequality $|u^Tv|\leq\|u\|_2\|v\|_2$, therefore $\mathcal{Q}^n\subseteq(\mathcal{Q}^n)^*$. Geometrically we are just saying that the quadratic cone $\mathcal{Q}^n$ has a right angle at the apex, so any two vectors inside the cone form at most a right angle. Now suppose $(s, y)$ is a point outside $\mathcal{Q}^n$. If $(-s, -y)\in\mathcal{Q}^n$ then $-s^2 - y^Ty < 0$ and we showed $(s, y)\not\in(\mathcal{Q}^n)^*$. Otherwise let $(t, x) = \mathrm{proj}_{\mathcal{Q}^n}(s, y)$ be the projection (see Sec. 7.7); note that $(s, y)$ and $(t, -x)\in\mathcal{Q}^n$ form an obtuse angle.

For the semidefinite cone we use the property $\langle X, Y\rangle\geq 0$ for $X, Y\in\mathcal{S}_+^n$ (see Sec. 6.1.2). Conversely, assume that $Y\not\in\mathcal{S}_+^n$. Then there exists a $w$ satisfying $\langle ww^T, Y\rangle = w^TYw < 0$, so $Y\not\in(\mathcal{S}_+^n)^*$.

As our last exercise let us check that vectors in $K_{\mathrm{exp}}$ and $(K_{\mathrm{exp}})^*$ form acute angles. By definition of $K_{\mathrm{exp}}$ and $(K_{\mathrm{exp}})^*$ we take $x, y$ such that:
\[ x_1\geq x_2\exp(x_3/x_2),\quad y_1\geq -y_3\exp(y_2/y_3 - 1),\quad x_1, x_2, y_1 > 0,\ y_3 < 0, \]
and then we have
\[
\begin{array}{rcl}
y^Tx = x_1y_1 + x_2y_2 + x_3y_3 & \geq & -x_2y_3\exp\left(\frac{x_3}{x_2} + \frac{y_2}{y_3} - 1\right) + x_2y_2 + x_3y_3 \\
& \geq & -x_2y_3\left(\frac{x_3}{x_2} + \frac{y_2}{y_3}\right) + x_2y_2 + x_3y_3 = 0,
\end{array}
\]
using the inequality $\exp(t)\geq 1 + t$. We refer to the literature for full proofs of all the statements.
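The acute-angle property can also be spot-checked numerically. The sampling scheme below is only an assumption made for illustration; it draws points satisfying the defining inequalities of $K_{\mathrm{exp}}$ and its dual with some multiplicative slack, and verifies nonnegativity of the inner product:

```python
import math
import random

random.seed(0)

def sample_kexp():
    # Point with x1 >= x2 * exp(x3 / x2), x2 > 0; the multiplier adds slack.
    x2 = random.uniform(0.5, 2.0)
    x3 = random.uniform(-2.0, 2.0)
    x1 = x2 * math.exp(x3 / x2) * random.uniform(1.0, 2.0)
    return (x1, x2, x3)

def sample_dual():
    # Point with y1 >= -y3 * exp(y2 / y3 - 1), y3 < 0.
    y3 = -random.uniform(0.5, 2.0)
    y2 = random.uniform(-2.0, 2.0)
    y1 = -y3 * math.exp(y2 / y3 - 1) * random.uniform(1.0, 2.0)
    return (y1, y2, y3)

def acute(x, y):
    # Nonnegative inner product, up to a small relative rounding allowance.
    terms = [a * b for a, b in zip(x, y)]
    return sum(terms) >= -1e-9 * (1.0 + sum(abs(t) for t in terms))

ok = all(acute(sample_kexp(), sample_dual()) for _ in range(1000))
print(ok)   # True
```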
Finally it is nice to realize that $(K^*)^* = K$ and that the cones $\{0\}$ and $\mathbb{R}$ are each other's duals. We leave it to the reader to check these facts.
8.2 Infeasibility in conic optimization
We can now discuss infeasibility certificates for conic problems. Given an optimization problem, the first basic question we are interested in is its feasibility status. The theory of infeasibility certificates for linear problems, including the Farkas lemma (see Sec. 2.3), extends almost verbatim to the conic case.
Lemma 8.3 (Farkas' lemma, conic version). Suppose we have the conic optimization problem (8.1). Exactly one of the following statements is true:

1. Problem (8.1) is feasible.

2. Problem (8.1) is infeasible, but there is a sequence $x_i\in K$ such that $\|Ax_i - b\|\to 0$.

3. There exists $y$ such that $-A^Ty\in K^*$ and $b^Ty > 0$.
Problems which fall under option 2. (limit-feasible) are ill-posed: an arbitrarily small perturbation of input data puts the problem in either category 1. or 3. This fringe case should therefore not appear in practice, and if it does, it signals issues with the optimization model which should be addressed.
Example 8.2 (Limit-feasible model). Here is an example of an ill-posed limit-feasible model created by fixing one of the root variables of a rotated quadratic cone to 0.
\[
\begin{array}{ll}
\mbox{minimize}   & u \\
\mbox{subject to} & (u, v, w)\in\mathcal{Q}_r^3, \\
                  & v = 0, \\
                  & w\geq 1.
\end{array}
\]
The problem is clearly infeasible, but the sequence $(u_i, v_i, w_i) = (i, 1/i, 1)\in\mathcal{Q}_r^3$ with $(v_i, w_i)\to(0, 1)$ makes it limit-feasible as in alternative 2. of Lemma 8.3. There is no infeasibility certificate as in alternative 3.
Having cleared this detail we continue with the proof and example for the actually useful part of the conic Farkas' lemma.
Proof. Consider the set
\[ A(K) = \{Ax : x\in K\}. \]
It is a convex cone. Feasibility is equivalent to $b\in A(K)$. If $b\not\in A(K)$ but $b\in\mathrm{cl}(A(K))$ then we have the second alternative. Finally, if $b\not\in\mathrm{cl}(A(K))$ then the closed convex cone $\mathrm{cl}(A(K))$ and the point $b$ can be strictly separated by a hyperplane passing through the origin, i.e. there exists $y$ such that
\[ b^Ty > 0,\quad (Ax)^Ty\leq 0\ \forall x\in K. \]
But then $0\leq -(Ax)^Ty = (-A^Ty)^Tx$ for all $x\in K$, so by definition of $K^*$ we have $-A^Ty\in K^*$, and we showed $y$ satisfies the third alternative. Finally, 1. and 3. are mutually exclusive, since otherwise we would have
\[ 0\leq(-A^Ty)^Tx = -y^TAx = -y^Tb < 0, \]
and the same argument works in the limit if $x_i$ is a sequence as in 2. That proves the lemma.

Therefore Farkas' lemma implies that (up to certain ill-posed cases) either the problem (8.1) is feasible (first alternative) or there is a certificate of infeasibility $y$ (last alternative). In other words, every time we classify a model as infeasible, we can certify this fact by providing an appropriate $y$. Note that when $K = \mathbb{R}^n_+$ we recover precisely the linear version, Lemma 2.1.
Example 8.3 (Infeasible conic problem). Consider a minimization problem:
\[
\begin{array}{ll}
\mbox{minimize}   & x_1 \\
\mbox{subject to} & -x_1 + x_2 - x_4 = 1, \\
                  & 2x_3 - 3x_4 = -1, \\
                  & x_1\geq\sqrt{x_2^2 + x_3^2}, \\
                  & x_4\geq 0.
\end{array}
\]
It can be expressed in the standard form (8.1) with
\[
A = \left[\begin{array}{rrrr}-1 & 1 & 0 & -1 \\ 0 & 0 & 2 & -3\end{array}\right],\quad
b = \left[\begin{array}{r}1 \\ -1\end{array}\right],\quad
K = \mathcal{Q}^3\times\mathbb{R}_+,\quad
c = [1, 0, 0, 0]^T.
\]
A certificate of infeasibility is $y = [1, 0]^T$. Indeed, $b^Ty = 1 > 0$ and $-A^Ty = [1, -1, 0, 1]\in K^* = \mathcal{Q}^3\times\mathbb{R}_+$. The certificate indicates that the first linear constraint alone causes infeasibility, which is indeed the case: the first equality together with the conic constraints yield a contradiction:
\[ x_2 - 1\geq x_2 - 1 - x_4 = x_1\geq|x_2|. \]
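The certificate of Example 8.3 can be verified mechanically; a small sketch:

```python
import math

# Certificate data from Example 8.3.
A = [[-1, 1, 0, -1],
     [ 0, 0, 2, -3]]
b = [1, -1]
y = [1, 0]

# Condition 1: b^T y > 0.
bTy = sum(bi * yi for bi, yi in zip(b, y))
print(bTy)   # 1

# Condition 2: -A^T y lies in K* = Q^3 x R_+.
s = [-sum(A[i][j] * y[i] for i in range(2)) for j in range(4)]
print(s)     # [1, -1, 0, 1]

# Membership: first coordinate dominates the norm of the next two,
# and the last coordinate is nonnegative.
in_dual = s[0] >= math.hypot(s[1], s[2]) and s[3] >= 0
print(in_dual)   # True
```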
8.3 Lagrangian and the dual problem
Classical Lagrangian
In general constrained optimization we consider an optimization problem of the form
\[
\begin{array}{ll}
\mbox{minimize}   & f_0(x) \\
\mbox{subject to} & f(x)\leq 0, \\
                  & h(x) = 0,
\end{array}
\quad (8.3)
\]
where $f_0:\mathbb{R}^n\mapsto\mathbb{R}$ is the objective function, $f:\mathbb{R}^n\mapsto\mathbb{R}^m$ encodes inequality constraints, and $h:\mathbb{R}^n\mapsto\mathbb{R}^p$ encodes equality constraints. Readers familiar with the method of Lagrange multipliers or penalization will recognize the Lagrangian for (8.3), a function $L:\mathbb{R}^n\times\mathbb{R}^p\times\mathbb{R}^m\mapsto\mathbb{R}$ that augments the objective with a weighted combination of all the constraints,
\[ L(x, y, s) = f_0(x) + y^Th(x) + s^Tf(x). \quad (8.4) \]
The variables $y\in\mathbb{R}^p$ and $s\in\mathbb{R}^m_+$ are called Lagrange multipliers or dual variables. The Lagrangian has the property that $L(x, y, s)\leq f_0(x)$ whenever $x$ is feasible for (8.3) and $s\in\mathbb{R}^m_+$. The optimal point satisfies the first-order optimality condition $\nabla_xL(x, y, s) = 0$. Moreover, the dual function $g(y, s) = \inf_xL(x, y, s)$ provides a lower bound for the optimal value of (8.3) for any $y\in\mathbb{R}^p$ and $s\in\mathbb{R}^m_+$, which leads to considering the dual problem of maximizing $g(y, s)$.
Lagrangian for a conic problem
We next set up an analogue of the Lagrangian theory for the conic problem (8.1)
\[
\begin{array}{ll}
\mbox{minimize}   & c^Tx \\
\mbox{subject to} & Ax = b, \\
                  & x\in K,
\end{array}
\]
where $x, c\in\mathbb{R}^n$, $b\in\mathbb{R}^m$, $A\in\mathbb{R}^{m\times n}$ and $K\subseteq\mathbb{R}^n$ is a convex cone. We associate with (8.1) a Lagrangian of the form $L:\mathbb{R}^n\times\mathbb{R}^m\times K^*\to\mathbb{R}$,
\[ L(x, y, s) = c^Tx + y^T(b - Ax) - s^Tx. \]
For any feasible $x^*\in\mathcal{F}_p$ and any $(y^*, s^*)\in\mathbb{R}^m\times K^*$ we have
\[ L(x^*, y^*, s^*) = c^Tx^* + (y^*)^T\cdot 0 - (s^*)^Tx^*\leq c^Tx^*. \quad (8.5) \]
Note that we used the definition of the dual cone to conclude that $(s^*)^Tx^*\geq 0$. The dual function is defined as the minimum of $L(x, y, s)$ over $x$. Thus the dual function of (8.1) is
\[
g(y, s) = \min_x L(x, y, s) = \min_x x^T(c - A^Ty - s) + b^Ty =
\left\{
\begin{array}{ll}
b^Ty, & c - A^Ty - s = 0, \\
-\infty, & \mbox{otherwise}.
\end{array}
\right.
\]
Dual problem
From (8.5) every $(y, s)\in\mathbb{R}^m\times K^*$ satisfies $g(y, s)\leq p^*$, i.e. $g(y, s)$ is a lower bound for $p^*$. To get the best such bound we maximize $g(y, s)$ over all $(y, s)\in\mathbb{R}^m\times K^*$ and get the dual problem:
\[
\begin{array}{ll}
\mbox{maximize}   & b^Ty \\
\mbox{subject to} & c - A^Ty = s, \\
                  & s\in K^*,
\end{array}
\quad (8.6)
\]
or simply:
\[
\begin{array}{ll}
\mbox{maximize}   & b^Ty \\
\mbox{subject to} & c - A^Ty\in K^*.
\end{array}
\quad (8.7)
\]
The optimal value of (8.6) will be denoted $d^*$. As in the case of (8.1) (which from now on we call the primal problem), the dual problem can be infeasible ($d^* = -\infty$), have an optimal solution ($-\infty < d^* < +\infty$) or be unbounded ($d^* = +\infty$). As before, the value $d^*$ is defined as a supremum of $b^Ty$ and may not be attained. Note that the roles of $-\infty$ and $+\infty$ are now reversed because the dual is a maximization problem.
Example 8.4 (More general constraints). We can just as easily derive the dual of a problem with more general constraints, without necessarily having to transform the problem to standard form beforehand. Imagine, for example, that some solver accepts conic problems of the form:
\[
\begin{array}{ll}
\mbox{minimize}   & c^Tx + c^f \\
\mbox{subject to} & l^c\leq Ax\leq u^c, \\
                  & l^x\leq x\leq u^x, \\
                  & x\in K.
\end{array}
\quad (8.8)
\]
Then we define a Lagrangian with one set of dual variables for each constraint appearing in the problem:
\[
\begin{array}{rcl}
L(x, s_l^c, s_u^c, s_l^x, s_u^x, s_n^x) & = & c^Tx + c^f - (s_l^c)^T(Ax - l^c) - (s_u^c)^T(u^c - Ax) \\
& & -\,(s_l^x)^T(x - l^x) - (s_u^x)^T(u^x - x) - (s_n^x)^Tx \\
& = & x^T(c - A^Ts_l^c + A^Ts_u^c - s_l^x + s_u^x - s_n^x) \\
& & +\,(l^c)^Ts_l^c - (u^c)^Ts_u^c + (l^x)^Ts_l^x - (u^x)^Ts_u^x + c^f
\end{array}
\]
and that gives a dual problem
\[
\begin{array}{ll}
\mbox{maximize}   & (l^c)^Ts_l^c - (u^c)^Ts_u^c + (l^x)^Ts_l^x - (u^x)^Ts_u^x + c^f \\
\mbox{subject to} & c + A^T(-s_l^c + s_u^c) - s_l^x + s_u^x - s_n^x = 0, \\
                  & s_l^c, s_u^c, s_l^x, s_u^x\geq 0, \\
                  & s_n^x\in K^*.
\end{array}
\]
Example 8.5 (Dual of simple portfolio). Consider a simplified version of the portfolio optimization problem, where we maximize expected return subject to an upper bound on the risk and no other constraints:
\[
\begin{array}{ll}
\mbox{maximize}   & \mu^Tx \\
\mbox{subject to} & \|Fx\|_2\leq\gamma,
\end{array}
\]
where $x\in\mathbb{R}^n$ and $F\in\mathbb{R}^{m\times n}$. The conic formulation is
\[
\begin{array}{ll}
\mbox{maximize}   & \mu^Tx \\
\mbox{subject to} & (\gamma, Fx)\in\mathcal{Q}^{m+1},
\end{array}
\quad (8.9)
\]
and we can directly write the Lagrangian
\[ L(x, v, w) = \mu^Tx + v\gamma + w^TFx = x^T(\mu + F^Tw) + v\gamma \]
with $(v, w)\in(\mathcal{Q}^{m+1})^* = \mathcal{Q}^{m+1}$. Note that we chose signs to have $L(x, v, w)\geq\mu^Tx$ since we are dealing with a maximization problem. The dual is now determined by the conditions $F^Tw + \mu = 0$ and $v\geq\|w\|_2$, so it can be formulated as
\[
\begin{array}{ll}
\mbox{minimize}   & \gamma\|w\|_2 \\
\mbox{subject to} & F^Tw = -\mu.
\end{array}
\quad (8.10)
\]
Note that it is actually more natural to view problem (8.9) as the dual form and problem (8.10) as the primal. Indeed we can write the constraint in (8.9) as
\[
\left[\begin{array}{c}\gamma \\ 0\end{array}\right] - \left[\begin{array}{c}0 \\ -F\end{array}\right]x\in(\mathcal{Q}^{m+1})^*,
\]
which fits naturally into the form (8.7). Having done this we can recover the dual as a minimization problem in the standard form (8.1). We leave it to the reader to check that we get the same answer as above.
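For a one-dimensional instance the primal-dual pair (8.9)/(8.10) can be solved by hand, which makes a convenient check that the two optimal values coincide. The data below is an assumed toy example with scalar $F$:

```python
mu, f, gamma = 2.0, 0.5, 3.0   # assumed toy data; F is the scalar f

# Primal (8.9): maximize mu*x subject to |f*x| <= gamma.
# For mu > 0 the optimum is at the boundary x* = gamma / f.
x_opt = gamma / f
primal = mu * x_opt

# Dual (8.10): minimize gamma*|w| subject to f*w = -mu, so w* = -mu / f.
w_opt = -mu / f
dual = gamma * abs(w_opt)

print(primal, dual)   # 12.0 12.0 -- no duality gap
```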
8.4 Weak and strong duality
Weak duality
Suppose $x^*$ and $(y^*, s^*)$ are feasible points for the primal and dual problems (8.1) and (8.6), respectively. Then we have
\[ b^Ty^* = (Ax^*)^Ty^* = (x^*)^T(A^Ty^*) = (x^*)^T(c - s^*) = c^Tx^* - (s^*)^Tx^*\leq c^Tx^*, \quad (8.11) \]
so the dual objective value is a lower bound on the objective value of the primal. In particular, any dual feasible point $(y^*, s^*)$ gives a lower bound:
\[ b^Ty^*\leq p^*, \]
and we immediately get the next lemma.

Lemma 8.4 (Weak duality). $d^*\leq p^*$.

It follows that if $b^Ty^* = c^Tx^*$ then $x^*$ is optimal for the primal, $(y^*, s^*)$ is optimal for the dual and $b^Ty^* = c^Tx^*$ is the common optimal objective value. This way we can use the optimal dual solution to certify optimality of the primal solution and vice versa.
Complementary slackness
Moreover, (8.11) asserts that $b^Ty^* = c^Tx^*$ is equivalent to orthogonality
\[ (s^*)^Tx^* = 0, \]
i.e. complementary slackness. It is not hard to verify what complementary slackness means for particular types of cones, for example:

• for $s, x\in\mathbb{R}_+$ we have $sx = 0$ iff $s = 0$ or $x = 0$,

• vectors $(s_1, \tilde{s}), (x_1, \tilde{x})\in\mathcal{Q}^{n+1}$ are orthogonal iff $(s_1, \tilde{s})$ and $(x_1, -\tilde{x})$ are parallel,

• vectors $(s_1, s_2, \tilde{s}), (x_1, x_2, \tilde{x})\in\mathcal{Q}_r^{n+2}$ are orthogonal iff $(s_1, s_2, \tilde{s})$ and $(x_2, x_1, -\tilde{x})$ are parallel.
One implicit case is worth special attention: complementary slackness for a linear inequality constraint $a^Tx\leq b$ with a non-negative dual variable $y$ asserts that $(a^Tx^* - b)y^* = 0$. This can be seen by directly writing down the appropriate Lagrangian for this type of constraint. Alternatively, we can introduce a slack variable $u = b - a^Tx$ with a conic constraint $u\geq 0$ and let $y$ be the dual conic variable. In particular, if a constraint is non-binding in the optimal solution ($a^Tx^* < b$) then the corresponding dual variable $y^* = 0$. If $y^* > 0$ then it can be related to a shadow price, see Sec. 2.4.4 and Sec. 8.5.2.
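The quadratic-cone case above is easy to check on concrete vectors: a boundary point paired with its sign-flipped-tail partner is orthogonal, and breaking the parallelism breaks orthogonality. A small sketch:

```python
# (x1, x~) on the boundary of Q^3: x1 = ||(3, 4)||_2 = 5.
x = (5.0, 3.0, 4.0)
s = (5.0, -3.0, -4.0)    # (x1, -x~), also in Q^3
d1 = sum(a * b for a, b in zip(s, x))
print(d1)   # 0.0: 25 - 9 - 16, so s and x are orthogonal

# A permuted tail stays in Q^3 but is not parallel to (x1, -x~),
# and orthogonality is lost.
s2 = (5.0, -4.0, -3.0)
d2 = sum(a * b for a, b in zip(s2, x))
print(d2)   # 1.0: 25 - 12 - 12
```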
Strong duality
The obvious question is now whether $p^* = d^*$, that is if optimality of a primal solution can always be certified by a dual solution with matching objective value, as for linear programming. This turns out not to be the case for general conic problems.
Example 8.6 (Positive duality gap). Consider the problem
\[
\begin{array}{ll}
\mbox{minimize}   & x_3 \\
\mbox{subject to} & x_1\geq\sqrt{x_2^2 + x_3^2}, \\
                  & x_2\geq x_1,\ x_3\geq -1.
\end{array}
\]
The only feasible points are $(x, x, 0)$, so $p^* = 0$. The dual problem is
\[
\begin{array}{ll}
\mbox{maximize}   & -y_2 \\
\mbox{subject to} & y_1\geq\sqrt{y_1^2 + (1 - y_2)^2},
\end{array}
\]
with feasible points $(y, 1)$, hence $d^* = -1$.

Similarly, we consider a problem
\[
\begin{array}{ll}
\mbox{minimize}   & x_1 \\
\mbox{subject to} &
\left[\begin{array}{ccc}0 & x_1 & 0 \\ x_1 & x_2 & 0 \\ 0 & 0 & 1 + x_1\end{array}\right]\in\mathcal{S}_+^3,
\end{array}
\]
with feasible set $\{x_1 = 0,\ x_2\geq 0\}$ and optimal value $p^* = 0$. The dual problem can be formulated as
\[
\begin{array}{ll}
\mbox{maximize}   & -z_2 \\
\mbox{subject to} &
\left[\begin{array}{ccc}z_1 & (1 - z_2)/2 & 0 \\ (1 - z_2)/2 & 0 & 0 \\ 0 & 0 & z_2\end{array}\right]\in\mathcal{S}_+^3,
\end{array}
\]
which has a feasible set $\{z_1\geq 0,\ z_2 = 1\}$ and dual optimal value $d^* = -1$.
To ensure strong duality for conic problems we need an additional regularity assumption. As with the conic version of Farkas' lemma, Lemma 8.3, we stress that this is a technical condition to eliminate ill-posed problems which should not be formed in practice. In particular, we invite the reader to think that strong duality holds for all well formed conic problems one is likely to come across in applications, and that having a duality gap signals issues with the model formulation.
We say problem (8.1) is very nicely posed if for all values of $c_0$ the feasibility problem
\[ c^Tx = c_0,\quad Ax = b,\quad x\in K \]
satisfies either the first or third alternative in Lemma 8.3.

Lemma 8.5 (Strong duality). Suppose that (8.1) is very nicely posed and $p^*$ is finite. Then $p^* = d^*$.

Proof. For any $\varepsilon > 0$ consider the feasibility problem with variable $x$ and constraints
\[
\left[\begin{array}{c}-c^T \\ A\end{array}\right]x = \left[\begin{array}{c}-p^* + \varepsilon \\ b\end{array}\right],\quad x\in K,
\]
that is
\[ c^Tx = p^* - \varepsilon,\quad Ax = b,\quad x\in K. \]
Optimality of $p^*$ implies that the above problem is infeasible. By Lemma 8.3 and because we assumed very-nice-posedness there exists $\hat{y} = [y_0\ y]^T$ such that
\[ \left[c, -A^T\right]\hat{y}\in K^* \quad\mbox{and}\quad \left[-p^* + \varepsilon, b^T\right]\hat{y} > 0. \]
If $y_0 = 0$ then $-A^Ty\in K^*$ and $b^Ty > 0$, which by Lemma 8.3 again would mean that the original problem was infeasible, which is not the case. Hence we can rescale so that $y_0 = 1$ and then we get
\[ c - A^Ty\in K^* \quad\mbox{and}\quad b^Ty\geq p^* - \varepsilon. \]
The first constraint means that $y$ is feasible for the dual problem. By letting $\varepsilon\to 0$ we obtain $d^*\geq p^*$.
There are more direct conditions which guarantee strong duality, such as the one below.

Lemma 8.6 (Slater constraint qualification). Suppose that (8.1) is strictly feasible: there exists a point $x\in\mathrm{int}(K)$ in the interior of $K$ such that $Ax = b$. Then strong duality holds if $p^*$ is finite. Moreover, if both primal and dual problem are strictly feasible then $p^*$ and $d^*$ are attained.

We omit the proof, which can be found in standard texts. Note that the first problem from Example 8.6 does not satisfy the Slater constraint qualification: the only feasible points lie on the boundary of the cone (we say the problem has no interior). That problem is not very nicely posed either: the point $(x_1, x_2, x_3) = (0.5\varepsilon_0^2\varepsilon^{-1} + \varepsilon,\ 0.5\varepsilon_0^2\varepsilon^{-1},\ \varepsilon_0)\in\mathcal{Q}^3$ violates the inequality $x_2\geq x_1$ by an arbitrarily small $\varepsilon$, so the problem is infeasible but limit-feasible (second alternative in Lemma 8.3).
8.5 Applications of conic duality
8.5.1 Linear regression and the normal equation
Least-squares linear regression is the problem of minimizing $\|Ax - b\|_2^2$ over $x\in\mathbb{R}^n$, where $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$ are fixed. This problem can be posed in conic form as
\[
\begin{array}{ll}
\mbox{minimize}   & t \\
\mbox{subject to} & (t, Ax - b)\in\mathcal{Q}^{m+1},
\end{array}
\]
and we can write the Lagrangian
\[ L(t, x, u, v) = t - tu - v^T(Ax - b) = t(1 - u) - x^TA^Tv + b^Tv, \]
so the constraints in the dual problem are:
\[ u = 1,\quad A^Tv = 0,\quad (u, v)\in\mathcal{Q}^{m+1}. \]
The problem exhibits strong duality with both the primal and dual values attained in the optimal solution. The primal solution clearly satisfies $t = \|Ax - b\|_2$, and so complementary slackness for the quadratic cone implies that the vectors $(u, -v) = (1, -v)$ and $(t, Ax - b)$ are parallel. As a consequence the constraint $A^Tv = 0$ becomes $A^T(Ax - b) = 0$, or simply
\[ A^TAx = A^Tb, \]
so if $A^TA$ is invertible then $x = (A^TA)^{-1}A^Tb$. This is the so-called normal equation for least-squares regression, which we now obtained as a consequence of strong duality.
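A tiny worked instance confirms the derivation: solve $A^TAx = A^Tb$ directly and check that the residual is orthogonal to the columns of $A$. The data is an assumed toy example with two parameters:

```python
A = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]
b = [1.0, 2.0, 2.0]

# Form the normal equation A^T A x = A^T b.
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
Atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

# Solve the 2x2 system by the explicit inverse.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]
print(x)   # approximately [1.1667, 0.5], i.e. (7/6, 1/2)

# The residual r = Ax - b satisfies A^T r = 0 (up to rounding).
r = [sum(A[k][i] * x[i] for i in range(2)) - b[k] for k in range(3)]
Atr = [sum(A[k][i] * r[k] for k in range(3)) for i in range(2)]
print(all(abs(v) < 1e-12 for v in Atr))   # True
```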
8.5.2 Constraint attribution
Consider again a portfolio optimization problem with mean-variance utility function, vector of expected returns $\alpha$ and covariance matrix $\Sigma = F^TF$:
\[
\begin{array}{ll}
\mbox{maximize}   & \alpha^Tx - \frac{c}{2}x^T\Sigma x \\
\mbox{subject to} & Ax\leq b,
\end{array}
\quad (8.12)
\]
where the linear part represents any set of additional constraints: total budget, sector constraints, diversification constraints, individual relations between positions etc. In the absence of additional constraints the solution to the unconstrained maximization problem is easy to derive using basic calculus and equals
\[ \hat{x} = c^{-1}\Sigma^{-1}\alpha. \]
We would like to understand the difference $x^* - \hat{x}$, where $x^*$ is the solution of (8.12), and in particular to measure which of the linear constraints actually cause $x^*$ to deviate from $\hat{x}$ and to what degree. This can be quantified using the dual variables.

The conic version of problem (8.12) is
\[
\begin{array}{ll}
\mbox{maximize}   & \alpha^Tx - cr \\
\mbox{subject to} & Ax\leq b, \\
                  & (1, r, Fx)\in\mathcal{Q}_r
\end{array}
\]
with dual
\[
\begin{array}{ll}
\mbox{minimize}   & b^Ty + s \\
\mbox{subject to} & A^Ty = \alpha + F^Tu, \\
                  & (s, c, u)\in\mathcal{Q}_r, \\
                  & y\geq 0.
\end{array}
\]
Suppose we have a primal-dual optimal solution $(x^*, r^*, y^*, s^*, u^*)$. Complementary slackness for the rotated quadratic cone implies
\[ (s^*, c, u^*) = \beta(r^*, 1, -Fx^*), \]
which leads to $\beta = c$ and
\[ A^Ty^* = \alpha - cF^TFx^*, \]
or equivalently
\[ x^* = \hat{x} - c^{-1}\Sigma^{-1}A^Ty^*. \]
This equation splits the difference between the constrained and unconstrained solutions into contributions from individual constraints, where the weights are precisely the dual variables $y^*$. For example, if a constraint is not binding ($a_i^Tx^* - b_i < 0$) then by complementary slackness $y_i^* = 0$ and, indeed, a non-binding constraint has no effect on the change in solution.
8.6 Semidefinite duality and LMIs
The general theory of conic duality applies in particular to problems with semidefinite variables, so here we just state it in the language familiar to SDP practitioners. Consider for simplicity a primal semidefinite optimization problem with one matrix variable
\[
\begin{array}{ll}
\mbox{minimize}   & \langle C, X\rangle \\
\mbox{subject to} & \langle A_i, X\rangle = b_i,\ i = 1,\dots,m, \\
                  & X\in\mathcal{S}_+^n.
\end{array}
\quad (8.13)
\]
We can quickly repeat the derivation of the dual problem in this notation. The Lagrangian is
\[
\begin{array}{rcl}
L(X, y, S) & = & \langle C, X\rangle - \sum_iy_i(\langle A_i, X\rangle - b_i) - \langle S, X\rangle \\
           & = & \langle C - \sum_iy_iA_i - S,\ X\rangle + b^Ty,
\end{array}
\]
and we get the dual problem
\[
\begin{array}{ll}
\mbox{maximize}   & b^Ty \\
\mbox{subject to} & C - \sum_{i=1}^my_iA_i\in\mathcal{S}_+^n.
\end{array}
\quad (8.14)
\]
The dual contains an affine matrix-valued function with coefficients $C, A_i\in\mathcal{S}^n$ and variable $y\in\mathbb{R}^m$. Such a matrix-valued affine inequality is called a linear matrix inequality (LMI). In Sec. 6 we formulated many problems as LMIs, that is in the form more naturally matching the dual.
From a modeling perspective it does not matter whether constraints are given as linear matrix inequalities or as an intersection of affine hyperplanes; one formulation is easily converted to the other using auxiliary variables and constraints, and this transformation is often done transparently by optimization software. Nevertheless, it is instructive to study an explicit example of how to carry out this transformation. A linear matrix inequality
\[ A_0 + x_1A_1 + \cdots + x_nA_n\succeq 0, \]
where $A_i\in\mathcal{S}^m$, is converted to a set of linear equality constraints using a slack variable
\[ A_0 + x_1A_1 + \cdots + x_nA_n = S,\quad S\succeq 0. \]
Apart from introducing an explicit semidefinite variable $S\in\mathcal{S}_+^m$ we also added $m(m+1)/2$ equality constraints. On the other hand, a semidefinite variable $X\in\mathcal{S}_+^n$ can be rewritten as a linear matrix inequality with $n(n+1)/2$ scalar variables
\[
X = \sum_{i=1}^ne_ie_i^Tx_{ii} + \sum_{i=1}^n\sum_{j=i+1}^n(e_ie_j^T + e_je_i^T)x_{ij}\succeq 0.
\]
Obviously we should only use these transformations when necessary; if we have a problem that is more naturally interpreted in either primal or dual form, we should be careful to recognize that structure.
Example 8.7 (Dualization and efficiency). Consider the problem:
\[
\begin{array}{ll}
\mbox{minimize}   & e^Tz \\
\mbox{subject to} & A + \mathrm{Diag}(z) = X, \\
                  & X\succeq 0,
\end{array}
\]
with the variables $X\in\mathcal{S}_+^n$ and $z\in\mathbb{R}^n$. This is a problem in primal form with $n(n+1)/2$ equality constraints, but they are more naturally interpreted as a linear matrix inequality
\[ A + \sum_ie_ie_i^Tz_i\succeq 0. \]
The dual problem is
\[
\begin{array}{ll}
\mbox{maximize}   & -\langle A, Z\rangle \\
\mbox{subject to} & \mathrm{diag}(Z) = e, \\
                  & Z\succeq 0,
\end{array}
\]
in the variable $Z\in\mathcal{S}_+^n$. The dual problem has only $n$ equality constraints, which is a vast improvement over the $n(n+1)/2$ constraints in the primal problem. See also Sec. 7.5.
Example 8.8 (Sum of singular values revisited). In Sec. 6.2.4, and specifically in (6.16), we expressed the problem of minimizing the sum of singular values of a nonsymmetric matrix $X$. Problem (6.16) can be written as an LMI:
\[
\begin{array}{ll}
\mbox{maximize}   & \sum_{i,j}x_{ij}z_{ij} \\
\mbox{subject to} & I - \sum_{i,j}z_{ij}\left[\begin{array}{cc}0 & e_je_i^T \\ e_ie_j^T & 0\end{array}\right]\succeq 0.
\end{array}
\]
Treating this as the dual and going back to the primal form we get:
\[
\begin{array}{ll}
\mbox{minimize}   & \mathrm{Tr}(S) + \mathrm{Tr}(T) \\
\mbox{subject to} & W = -\frac12X, \\
                  & \left[\begin{array}{cc}S & W^T \\ W & T\end{array}\right]\succeq 0,
\end{array}
\]
which is equivalent to the claimed (6.17). The dual formulation has the advantage of being linear in $X$.
Chapter 9
Mixed integer optimization
In other chapters of this cookbook we have considered different classes of convex problems with continuous variables. In this chapter we consider a much wider range of non-convex problems by allowing integer variables. This technique is extremely useful in practice, and already for linear programming it covers a vast range of problems. We introduce different building blocks for integer optimization, which make it possible to model useful non-convex dependencies between variables in conic problems. It should be noted that mixed integer optimization problems are very hard (technically NP-hard), and for many practical cases an exact solution may not be found in reasonable time.
9.1 Integer modeling
A general mixed integer conic optimization problem has the form
\[
\begin{array}{ll}
\mbox{minimize}   & c^Tx \\
\mbox{subject to} & Ax = b, \\
                  & x\in K, \\
                  & x_i\in\mathbb{Z},\ \forall i\in\mathcal{I},
\end{array}
\quad (9.1)
\]
where $K$ is a cone and $\mathcal{I}\subseteq\{1,\dots,n\}$ denotes the set of variables that are constrained to be integers.

Two major techniques are typical for mixed integer optimization. The first one is the use of binary variables, also known as indicator variables, which only take values 0 and 1, and indicate the absence or presence of a particular event or choice. This restriction can of course be modeled in the form (9.1) by writing:
\[ 0\leq x\leq 1 \quad\mbox{and}\quad x\in\mathbb{Z}. \]
The other, known as big-M, refers to the fact that some relations can only be modeled linearly if one assumes some fixed bound $M$ on the quantities involved, and this constant enters the model formulation. The choice of $M$ can affect the performance of the model, see Example 7.8.
9.1.1 Implication of positivity
Often we have a real-valued variable $x\in\mathbb{R}$ satisfying $0\leq x < M$ for a known upper bound $M$, and we wish to model the implication
\[ x > 0\implies z = 1. \quad (9.2) \]
Making $z$ a binary variable, we can write (9.2) as a linear inequality
\[ x\leq Mz,\quad z\in\{0, 1\}. \quad (9.3) \]
Indeed $x > 0$ excludes the possibility of $z = 0$, hence forces $z = 1$. Since a priori $x\leq M$, there is no danger that the constraint accidentally makes the problem infeasible. A typical use of this trick is to model fixed setup costs.

Example 9.1 (Fixed setup cost). Assume that production of a specific item $i$ costs $u_i$ per unit, but there is an additional fixed charge of $w_i$ if we produce item $i$ at all. For instance, $w_i$ could be the cost of setting up a production plant, initial cost of equipment etc. Then the cost of producing $x_i$ units of product $i$ is given by the discontinuous function
\[
c_i(x_i) = \left\{\begin{array}{ll}w_i + u_ix_i, & x_i > 0, \\ 0, & x_i = 0.\end{array}\right.
\]
If we let $M$ denote an upper bound on the quantities we can produce, we can then minimize the total production cost of $n$ products under some affine constraint $Ax = b$ with
\[
\begin{array}{ll}
\mbox{minimize}   & u^Tx + w^Tz \\
\mbox{subject to} & Ax = b, \\
                  & x_i\leq Mz_i,\ i = 1,\dots,n, \\
                  & x\geq 0, \\
                  & z\in\{0, 1\}^n,
\end{array}
\]
which is a linear mixed-integer optimization problem. Note that by minimizing the production cost, we drive $z_i$ to 0 when $x_i = 0$, so setup costs are indeed included only for products with $x_i > 0$.
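The behavior of the model can be checked by brute force on a tiny assumed instance: two products, a single budget constraint $x_1 + x_2 = 4$, and $M = 4$. Enumerating the binary choices over a coarse production grid recovers the cheapest plan, which correctly pays only one setup cost:

```python
from itertools import product

u, w, M = [1.0, 3.0], [10.0, 1.0], 4.0   # assumed toy unit/setup costs

best = None
for z in product([0, 1], repeat=2):
    # Scan a coarse grid of productions satisfying x1 + x2 = 4.
    for x1 in [0.0, 1.0, 2.0, 3.0, 4.0]:
        x2 = 4.0 - x1
        if x1 <= M * z[0] and x2 <= M * z[1]:
            cost = u[0] * x1 + u[1] * x2 + w[0] * z[0] + w[1] * z[1]
            if best is None or cost < best:
                best = cost

# Producing only item 2 (cost 3*4 + setup 1 = 13) beats producing only
# item 1 (cost 1*4 + setup 10 = 14), despite the higher unit cost.
print(best)   # 13.0
```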
9.1.2 Semi-continuous variables
We can also model semi-continuity of a variable $x\in\mathbb{R}$,
\[ x\in 0\cup[a, b], \quad (9.4) \]
where $0 < a\leq b$, using a double inequality
\[ az\leq x\leq bz,\quad z\in\{0, 1\}. \]
Fig. 9.1: Production cost with fixed setup cost $w_i$ (total cost vs. quantity produced; the jump at zero has height $w_i$ and the slope is $u_i$).
9.1.3 Indicator constraints
Suppose we want to model the fact that a certain linear inequality must be satisfied when some other event occurs. In other words, for a binary variable $z$ we want to model the implication
\[ z = 1\implies a^Tx\leq b. \]
Suppose we know in advance an upper bound $a^Tx - b\leq M$. Then we can write the above as a linear inequality
\[ a^Tx\leq b + M(1 - z). \]
Now if $z = 1$ then we forced $a^Tx\leq b$, while for $z = 0$ the inequality is trivially satisfied and does not enforce any additional constraint on $x$.
9.1.4 Disjunctive constraints
With a disjunctive constraint we require that at least one of the given linear constraints is satisfied, that is
\[ (a_1^Tx\leq b_1)\vee(a_2^Tx\leq b_2)\vee\cdots\vee(a_k^Tx\leq b_k). \]
Introducing binary variables $z_1,\dots,z_k$, we can use Sec. 9.1.3 to write a linear model
\[
\begin{array}{l}
z_1 + \cdots + z_k\geq 1, \\
z_1,\dots,z_k\in\{0, 1\}, \\
a_i^Tx\leq b_i + M(1 - z_i),\ i = 1,\dots,k.
\end{array}
\]
Note that $z_i = 1$ implies that the $i$-th constraint is satisfied, but not vice-versa. Achieving that effect is described in the next section.
9.1.5 Constraint satisfaction
Say we want to define an optimization model that will behave differently depending on which of the inequalities
\[ a^Tx\leq b \quad\mbox{or}\quad a^Tx\geq b \]
is satisfied. Suppose we have lower and upper bounds for $a^Tx - b$ in the form of $m\leq a^Tx - b\leq M$. Then we can write a model
\[ b + mz\leq a^Tx\leq b + M(1 - z),\quad z\in\{0, 1\}. \quad (9.5) \]
Now observe that $z = 0$ implies $b\leq a^Tx\leq b + M$, of which the right-hand inequality is redundant, i.e. always satisfied. Similarly, $z = 1$ implies $b + m\leq a^Tx\leq b$. In other words $z$ is an indicator of whether $a^Tx\leq b$.

In practice we would relax one inequality using a small amount of slack, i.e.,
\[ b + (m - \varepsilon)z + \varepsilon\leq a^Tx, \quad (9.6) \]
to avoid issues with classifying the equality $a^Tx = b$.
9.1.6 Exact absolute value
In Sec. 2.2.2 we showed how to model |๐ฅ| โค ๐ก as two linear inequalities. Now suppose weneed to model an exact equality
|๐ฅ| = ๐ก. (9.7)
It defines a non-convex set, hence it is not conic representable. If we split ๐ฅ into positive andnegative part ๐ฅ = ๐ฅ+ โ ๐ฅโ, where ๐ฅ+, ๐ฅโ โฅ 0, then |๐ฅ| = ๐ฅ+ + ๐ฅโ as long as either ๐ฅ+ = 0or ๐ฅโ = 0. That last alternative can be modeled with a binary variable, and we get a modelof (9.7):
๐ฅ = ๐ฅ+ โ ๐ฅโ,๐ก = ๐ฅ+ + ๐ฅโ,0 โค ๐ฅ+, ๐ฅโ,
๐ฅ+ โค ๐๐ง,๐ฅโ โค ๐(1 โ ๐ง),๐ง โ {0, 1},
(9.8)
where the constant 𝑀 is an a priori known upper bound on |𝑥| in the problem.
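Model (9.8) can be checked exhaustively, since fixing the binary 𝑧 forces one of the two parts to zero through the big-M bounds, leaving a unique candidate split. A plain-Python sketch (the function name is our own):

```python
def abs_model_feasible(x, t, M, tol=1e-9):
    # brute-force feasibility of (9.8): for each z the big-M bounds
    # determine the split x = x_plus - x_minus uniquely
    for z in (0, 1):
        x_plus = x if z == 1 else 0.0    # z = 0 forces x_plus = 0
        x_minus = 0.0 if z == 1 else -x  # z = 1 forces x_minus = 0
        if (min(x_plus, x_minus) >= -tol and max(x_plus, x_minus) <= M + tol
                and abs(x_plus + x_minus - t) <= tol):
            return True
    return False
```

For any 𝑥 with |𝑥| ≤ 𝑀 the model is feasible exactly when 𝑡 = |𝑥|, mirroring (9.7).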
9.1.7 Exact 1-norm
We can use the technique above to model the exact ℓ1-norm equality constraint

|𝑥1| + · · · + |𝑥𝑛| = 𝑐, (9.9)
where 𝑥 ∈ R𝑛 is a decision variable and 𝑐 is a constant. Such constraints arise for instance in fully invested portfolio optimization scenarios (with short-selling). As before, we split 𝑥 into a positive and a negative part, using a sequence of binary variables to guarantee that at most one of them is nonzero:
𝑥 = 𝑥⁺ − 𝑥⁻,
0 ≤ 𝑥⁺, 𝑥⁻,
𝑥⁺ ≤ 𝑀𝑧,
𝑥⁻ ≤ 𝑀(𝑒 − 𝑧),
∑𝑖 𝑥𝑖⁺ + ∑𝑖 𝑥𝑖⁻ = 𝑐,
𝑧 ∈ {0, 1}𝑛,  𝑥⁺, 𝑥⁻ ∈ R𝑛. (9.10)
9.1.8 Maximum
The exact equality 𝑡 = max{𝑥1, . . . , 𝑥𝑛} can be expressed by introducing a sequence of mutually exclusive indicator variables 𝑧1, . . . , 𝑧𝑛, with the intention that 𝑧𝑖 = 1 picks the variable 𝑥𝑖 which actually achieves the maximum. Choosing a safe bound 𝑀 we get a model:
𝑥𝑗 ≤ 𝑡 ≤ 𝑥𝑗 + 𝑀(1 − 𝑧𝑗),  𝑗 = 1, . . . , 𝑛,
𝑧1 + · · · + 𝑧𝑛 = 1,
𝑧 ∈ {0, 1}𝑛. (9.11)
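Model (9.11) can be verified by enumerating the one-hot choices of 𝑧; it is feasible exactly when 𝑡 equals the maximum. A small self-contained sketch (the function name is our own):

```python
def max_model_feasible(t, xs, M, tol=1e-9):
    # brute-force feasibility of (9.11): try each one-hot choice of z
    n = len(xs)
    for i in range(n):  # z_i = 1, all other z_j = 0
        if all(xs[j] - tol <= t <= xs[j] + M * (0 if j == i else 1) + tol
               for j in range(n)):
            return True
    return False
```

With 𝑧𝑖 = 1 the constraints read 𝑡 ≥ 𝑥𝑗 for all 𝑗 and 𝑡 ≤ 𝑥𝑖, so a feasible 𝑡 must equal both 𝑥𝑖 and the maximum.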
9.1.9 Boolean operators
Typically an indicator variable 𝑧 ∈ {0, 1} represents a boolean value (true/false). In this case the standard boolean operators can be implemented as linear inequalities. In the table below we assume all variables are binary.
Table 9.1: Boolean operators

𝑧 = 𝑥 OR 𝑦:  𝑥 ≤ 𝑧, 𝑦 ≤ 𝑧, 𝑧 ≤ 𝑥 + 𝑦
𝑧 = 𝑥 AND 𝑦:  𝑥 ≥ 𝑧, 𝑦 ≥ 𝑧, 𝑧 + 1 ≥ 𝑥 + 𝑦
𝑧 = NOT 𝑥:  𝑧 = 1 − 𝑥
𝑥 ⟹ 𝑦:  𝑥 ≤ 𝑦
At most one of 𝑧1, . . . , 𝑧𝑘 holds (SOS1, set-packing):  ∑𝑖 𝑧𝑖 ≤ 1
Exactly one of 𝑧1, . . . , 𝑧𝑘 holds (set-partitioning):  ∑𝑖 𝑧𝑖 = 1
At least one of 𝑧1, . . . , 𝑧𝑘 holds (set-covering):  ∑𝑖 𝑧𝑖 ≥ 1
At most 𝑘 of 𝑧1, . . . , 𝑧𝑛 hold (cardinality):  ∑𝑖 𝑧𝑖 ≤ 𝑘
9.1.10 Fixed set of values
We can restrict a variable 𝑥 to take on only values from a specified finite set {𝑎1, . . . , 𝑎𝑛} by writing

𝑥 = ∑𝑖 𝑧𝑖𝑎𝑖,
𝑧 ∈ {0, 1}𝑛,  ∑𝑖 𝑧𝑖 = 1. (9.12)
In (9.12) we essentially defined 𝑧𝑖 to be the indicator variable of whether 𝑥 = 𝑎𝑖. In some circumstances there may be more efficient representations of a restricted set of values, for example:
• (sign) 𝑥 ∈ {−1, 1} ⟺ 𝑥 = 2𝑧 − 1, 𝑧 ∈ {0, 1},

• (modulo) 𝑥 ∈ {1, 4, 7, 10} ⟺ 𝑥 = 3𝑧 + 1, 0 ≤ 𝑧 ≤ 3, 𝑧 ∈ Z,

• (fraction) 𝑥 ∈ {0, 1/3, 2/3, 1} ⟺ 3𝑥 = 𝑧, 0 ≤ 𝑧 ≤ 3, 𝑧 ∈ Z,

• (gap) 𝑥 ∈ (−∞, 𝑝] ∪ [𝑞, ∞) ⟺ 𝑞 − 𝑀(1 − 𝑧) ≤ 𝑥 ≤ 𝑝 + 𝑀𝑧, 𝑧 ∈ {0, 1} for sufficiently large 𝑀.
In a very similar fashion we can restrict a variable 𝑥 to a finite union of intervals ⋃𝑖 [𝑎𝑖, 𝑏𝑖]:

∑𝑖 𝑧𝑖𝑎𝑖 ≤ 𝑥 ≤ ∑𝑖 𝑧𝑖𝑏𝑖,
𝑧 ∈ {0, 1}𝑛,  ∑𝑖 𝑧𝑖 = 1. (9.13)
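The compact encodings above are easy to confirm by enumerating the integer variable and comparing with the target set; a minimal Python check:

```python
# enumerate each integer encoding and compare with the intended value set
assert {2 * z - 1 for z in (0, 1)} == {-1, 1}           # sign
assert {3 * z + 1 for z in range(4)} == {1, 4, 7, 10}   # modulo
assert {z / 3 for z in range(4)} == {0, 1/3, 2/3, 1}    # fraction (3x = z)
```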
9.1.11 Alternative as a bilinear equality
In the literature one often finds models containing a bilinear constraint
𝑥𝑦 = 0.
This constraint is just an algebraic formulation of the alternative 𝑥 = 0 or 𝑦 = 0 and therefore it can be written as a mixed-integer linear problem:
|𝑥| ≤ 𝑀𝑧,
|𝑦| ≤ 𝑀(1 − 𝑧),
𝑧 ∈ {0, 1},
for a suitable constant 𝑀. The absolute value can be omitted if both 𝑥, 𝑦 are nonnegative. Otherwise it can be modeled as in Sec. 2.2.2.
9.1.12 Equal signs
Suppose we want to say that the variables 𝑥1, . . . , 𝑥𝑛 must either be all nonnegative or all nonpositive. This can be achieved by introducing a single binary variable 𝑧 ∈ {0, 1} which picks one of these two alternatives:
−𝑀(1 − 𝑧) ≤ 𝑥𝑖 ≤ 𝑀𝑧,  𝑖 = 1, . . . , 𝑛.
Indeed, if 𝑧 = 0 then we have −𝑀 ≤ 𝑥𝑖 ≤ 0 and if 𝑧 = 1 then 0 ≤ 𝑥𝑖 ≤ 𝑀.
9.1.13 Continuous piecewise-linear functions
Consider a continuous, univariate, piecewise-linear, non-convex function 𝑓 : [𝛼1, 𝛼5] ↦→ R shown in Fig. 9.2. On the interval [𝛼𝑖, 𝛼𝑖+1], 𝑖 = 1, 2, 3, 4, we can describe the function as
𝑓(𝑥) = 𝜆𝑖𝑓(𝛼𝑖) + 𝜆𝑖+1𝑓(𝛼𝑖+1)
where 𝜆𝑖𝛼𝑖 + 𝜆𝑖+1𝛼𝑖+1 = 𝑥 and 𝜆𝑖 + 𝜆𝑖+1 = 1. If we add a constraint that only two (adjacent) variables 𝜆𝑖, 𝜆𝑖+1 can be nonzero, we can characterize every value 𝑓(𝑥) over the entire interval [𝛼1, 𝛼5] as some convex combination,
𝑓(𝑥) = ∑𝑖 𝜆𝑖𝑓(𝛼𝑖).
Fig. 9.2: A univariate piecewise-linear non-convex function.
The condition that only two adjacent variables can be nonzero is sometimes called an SOS2 constraint. If we introduce indicator variables 𝑧𝑖 for each pair of adjacent variables (𝜆𝑖, 𝜆𝑖+1), we can model an SOS2 constraint as:
𝜆1 ≤ 𝑧1,  𝜆2 ≤ 𝑧1 + 𝑧2,  𝜆3 ≤ 𝑧2 + 𝑧3,  𝜆4 ≤ 𝑧3 + 𝑧4,  𝜆5 ≤ 𝑧4,
𝑧1 + 𝑧2 + 𝑧3 + 𝑧4 = 1,  𝑧 ∈ {0, 1}4,
so that we have 𝑧𝑖 = 1 ⟹ 𝜆𝑗 = 0 for 𝑗 ∉ {𝑖, 𝑖 + 1}. Collectively, we can then model the epigraph 𝑓(𝑥) ≤ 𝑡 as
𝑥 = ∑𝑖 𝜆𝑖𝛼𝑖,
∑𝑖 𝜆𝑖𝑓(𝛼𝑖) ≤ 𝑡,
𝜆1 ≤ 𝑧1,  𝜆𝑖 ≤ 𝑧𝑖−1 + 𝑧𝑖 for 𝑖 = 2, . . . , 𝑛 − 1,  𝜆𝑛 ≤ 𝑧𝑛−1,
𝜆 ≥ 0,  𝜆1 + · · · + 𝜆𝑛 = 1,
𝑧1 + · · · + 𝑧𝑛−1 = 1,  𝑧 ∈ {0, 1}𝑛−1, (9.14)
for a piecewise-linear function 𝑓(𝑥) with 𝑛 terms. This approach is often called the lambda-method.
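The interpolation underlying the lambda-method can be demonstrated directly: for a point 𝑥 in a segment, the two adjacent 𝜆's (the SOS2 pattern) recover both 𝑥 and 𝑓(𝑥). A plain-Python sketch with a hypothetical `pwl_value` helper:

```python
def pwl_value(alphas, fvals, x):
    # lambda-method: find the segment containing x, put weight on the two
    # adjacent breakpoints (SOS2 pattern), all other lambdas zero
    for i in range(len(alphas) - 1):
        a, b = alphas[i], alphas[i + 1]
        if a <= x <= b:
            lam = (b - x) / (b - a)  # weight on alpha_i
            lambdas = [0.0] * len(alphas)
            lambdas[i], lambdas[i + 1] = lam, 1.0 - lam
            # x is recovered as sum_i lambda_i * alpha_i
            assert abs(sum(l * a_ for l, a_ in zip(lambdas, alphas)) - x) < 1e-9
            return sum(l * f for l, f in zip(lambdas, fvals))
    raise ValueError("x outside breakpoint range")
```

The data need not be convex; the SOS2 restriction is what keeps the interpolation on the graph of 𝑓 rather than in its convex hull.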
For the function in Fig. 9.2 we can reduce the number of integer variables by using a Gray encoding of the intervals [𝛼𝑖, 𝛼𝑖+1], assigning the codes 00, 10, 11, 01 to the four intervals in left-to-right order, and an indicator variable 𝑦 ∈ {0, 1}2 to represent the four different values of the Gray code. We can then describe the constraints on 𝜆 using only two indicator variables,
(𝑦1 = 0) ⟹ 𝜆3 = 0,
(𝑦1 = 1) ⟹ 𝜆1 = 𝜆5 = 0,
(𝑦2 = 0) ⟹ 𝜆4 = 𝜆5 = 0,
(𝑦2 = 1) ⟹ 𝜆1 = 𝜆2 = 0,
which leads to a more efficient characterization of the epigraph 𝑓(𝑥) ≤ 𝑡,
𝑥 = 𝜆1𝛼1 + · · · + 𝜆5𝛼5,
𝜆1𝑓(𝛼1) + · · · + 𝜆5𝑓(𝛼5) ≤ 𝑡,
𝜆3 ≤ 𝑦1,  𝜆1 + 𝜆5 ≤ 1 − 𝑦1,  𝜆4 + 𝜆5 ≤ 𝑦2,  𝜆1 + 𝜆2 ≤ 1 − 𝑦2,
𝜆 ≥ 0,  𝜆1 + · · · + 𝜆5 = 1,  𝑦 ∈ {0, 1}2.
The lambda-method can also be used to model multivariate continuous piecewise-linear non-convex functions, specified on a set of polyhedra. For example, for the function shown in Fig. 9.3 we can model the epigraph 𝑓(𝑥) ≤ 𝑡 as
𝑥 = 𝜆1𝑣1 + · · · + 𝜆6𝑣6,
𝜆1𝑓(𝑣1) + · · · + 𝜆6𝑓(𝑣6) ≤ 𝑡,
𝜆1 ≤ 𝑧1 + 𝑧2,  𝜆2 ≤ 𝑧1,  𝜆3 ≤ 𝑧2 + 𝑧3,
𝜆4 ≤ 𝑧1 + 𝑧2 + 𝑧3 + 𝑧4,  𝜆5 ≤ 𝑧3 + 𝑧4,  𝜆6 ≤ 𝑧4,
𝜆 ≥ 0,  𝜆1 + · · · + 𝜆6 = 1,
𝑧1 + 𝑧2 + 𝑧3 + 𝑧4 = 1,  𝑧 ∈ {0, 1}4. (9.15)
Note, for example, that 𝑧2 = 1 implies that 𝜆2 = 𝜆5 = 𝜆6 = 0 and 𝑥 = 𝜆1𝑣1 + 𝜆3𝑣3 + 𝜆4𝑣4.
Fig. 9.3: A multivariate continuous piecewise-linear non-convex function.
9.1.14 Lower semicontinuous piecewise-linear functions
The ideas in Sec. 9.1.13 can be applied to lower semicontinuous piecewise-linear functions as well. For example, consider the function shown in Fig. 9.4. If we denote the one-sided limits by 𝑓−(𝑎) := lim𝑥↑𝑎 𝑓(𝑥) and 𝑓+(𝑎) := lim𝑥↓𝑎 𝑓(𝑥), respectively, then we can describe the epigraph 𝑓(𝑥) ≤ 𝑡 for the function in Fig. 9.4 as
𝑥 = 𝜆1𝛼1 + (𝜆2 + 𝜆3 + 𝜆4)𝛼2 + 𝜆5𝛼3,
𝜆1𝑓(𝛼1) + 𝜆2𝑓−(𝛼2) + 𝜆3𝑓(𝛼2) + 𝜆4𝑓+(𝛼2) + 𝜆5𝑓(𝛼3) ≤ 𝑡,
𝜆1 + 𝜆2 ≤ 𝑧1,  𝜆3 ≤ 𝑧2,  𝜆4 + 𝜆5 ≤ 𝑧3,
𝜆 ≥ 0,  𝜆1 + · · · + 𝜆5 = 1,
𝑧1 + 𝑧2 + 𝑧3 = 1,  𝑧 ∈ {0, 1}3, (9.16)
where we have a different decision variable for the intervals [𝛼1, 𝛼2), [𝛼2, 𝛼2], and (𝛼2, 𝛼3]. As a special case this gives us an alternative characterization of the fixed charge models considered in Sec. 9.1.1.
Fig. 9.4: A univariate lower semicontinuous piecewise-linear function.
9.2 Mixed integer conic case studies
9.2.1 Wireless network design
The following problem arises in wireless network design and some other applications. We want to serve 𝑚 clients with 𝑛 transmitters. A transmitter with range 𝑟𝑗 can serve all clients within Euclidean distance 𝑟𝑗 from its position 𝑝𝑗. The power consumption of a transmitter with range 𝑟 is proportional to 𝑟𝛼 for some fixed 1 ≤ 𝛼 < ∞. The goal is to assign locations and ranges to transmitters minimizing the total power consumption while providing coverage of all 𝑚 clients.
Denoting by 𝑥1, . . . , 𝑥𝑚 ∈ R2 the locations of the clients, we can model this problem using binary decision variables 𝑧𝑘𝑗 indicating if the 𝑘-th client is covered by the 𝑗-th transmitter. This leads to a mixed-integer conic problem:
minimize ∑𝑗 𝑟𝑗^𝛼
subject to 𝑟𝑗 ≥ ‖𝑝𝑗 − 𝑥𝑘‖₂ − 𝑀(1 − 𝑧𝑘𝑗),  𝑗 = 1, . . . , 𝑛, 𝑘 = 1, . . . , 𝑚,
∑𝑗 𝑧𝑘𝑗 ≥ 1,  𝑘 = 1, . . . , 𝑚,
𝑝𝑗 ∈ R2,  𝑧𝑘𝑗 ∈ {0, 1}. (9.17)
The objective can be realized by summing power bounds 𝑡𝑗 ≥ 𝑟𝑗^𝛼 or by directly bounding the 𝛼-norm of (𝑟1, . . . , 𝑟𝑛). The latter approach would be recommended for large 𝛼.
This is a type of clustering problem. For 𝛼 = 1, 2, respectively, we are minimizing the perimeter and area of the covered region. In practical applications the power exponent 𝛼 can be as large as 6 depending on various factors (for instance terrain). In the linear cost model (𝛼 = 1) typical solutions contain a small number of huge disks covering most of the clients. For increasing 𝛼 large ranges are penalized more heavily and the disks tend to be more balanced.
9.2.2 Avoiding small trades
The standard portfolio optimization model admits a number of mixed-integer extensions aimed at avoiding solutions with very small trades. To fix attention consider the model
maximize 𝜇ᵀ𝑥
subject to (𝛾, 𝐺ᵀ𝑥) ∈ 𝒬𝑘+1,
𝑒ᵀ𝑥 = 𝑤 + 𝑒ᵀ𝑥0,
𝑥 ≥ 0, (9.18)
with initial holdings 𝑥0, initial cash amount 𝑤, expected returns 𝜇, risk bound 𝛾 and decision variable 𝑥. Here 𝑒 is the all-ones vector. Let Δ𝑥𝑗 = 𝑥𝑗 − 𝑥0𝑗 denote the change of position in asset 𝑗.
Transaction costs
A transaction cost involved with nonzero Δ𝑥𝑗 could be modeled as

𝑇𝑗(𝑥𝑗) = 0  if Δ𝑥𝑗 = 0,
𝑇𝑗(𝑥𝑗) = 𝛼𝑗Δ𝑥𝑗 + 𝛽𝑗  if Δ𝑥𝑗 ≠ 0,
similarly to the problem from Example 9.1. Including transaction costs will now lead to the model:
maximize 𝜇ᵀ𝑥
subject to (𝛾, 𝐺ᵀ𝑥) ∈ 𝒬𝑘+1,
𝑒ᵀ𝑥 + 𝛼ᵀ𝑥 + 𝛽ᵀ𝑧 = 𝑤 + 𝑒ᵀ𝑥0,
𝑥 − 𝑥0 ≤ 𝑀𝑧,  𝑥0 − 𝑥 ≤ 𝑀𝑧,
𝑥 ≥ 0,  𝑧 ∈ {0, 1}𝑛, (9.19)
where the binary variable 𝑧𝑗 is an indicator of Δ𝑥𝑗 ≠ 0. Here 𝑀 is a sufficiently large constant, for instance 𝑀 = 𝑤 + 𝑒ᵀ𝑥0 will do.
Cardinality constraints
Another option is to fix an upper bound 𝐾 on the number of nonzero trades. The meaning of 𝑧 is the same as before:
maximize 𝜇ᵀ𝑥
subject to (𝛾, 𝐺ᵀ𝑥) ∈ 𝒬𝑘+1,
𝑒ᵀ𝑥 = 𝑤 + 𝑒ᵀ𝑥0,
𝑥 − 𝑥0 ≤ 𝑀𝑧,  𝑥0 − 𝑥 ≤ 𝑀𝑧,
𝑒ᵀ𝑧 ≤ 𝐾,
𝑥 ≥ 0,  𝑧 ∈ {0, 1}𝑛. (9.20)
Trading size constraints
We can also demand a lower bound on nonzero trades, that is |Δ𝑥𝑗| ∈ {0} ∪ [𝑎, 𝑏] for all 𝑗. To this end we combine the techniques from Sec. 9.1.6 and Sec. 9.1.2, writing 𝑦𝑗, 𝑧𝑗 for the indicators of Δ𝑥𝑗 > 0 and Δ𝑥𝑗 < 0, respectively:
maximize 𝜇ᵀ𝑥
subject to (𝛾, 𝐺ᵀ𝑥) ∈ 𝒬𝑘+1,
𝑒ᵀ𝑥 = 𝑤 + 𝑒ᵀ𝑥0,
𝑥 − 𝑥0 = 𝑥⁺ − 𝑥⁻,
𝑥⁺ ≤ 𝑀𝑦,  𝑥⁻ ≤ 𝑀𝑧,
𝑎(𝑦 + 𝑧) ≤ 𝑥⁺ + 𝑥⁻ ≤ 𝑏(𝑦 + 𝑧),
𝑦 + 𝑧 ≤ 𝑒,
𝑥, 𝑥⁺, 𝑥⁻ ≥ 0,  𝑦, 𝑧 ∈ {0, 1}𝑛. (9.21)
9.2.3 Convex piecewise linear regression
Consider the problem of approximating the data (𝑥𝑖, 𝑦𝑖), 𝑖 = 1, . . . , 𝑛, by a piecewise linear convex function of the form
𝑓(𝑥) = max{𝑎𝑗𝑥 + 𝑏𝑗, 𝑗 = 1, . . . , 𝑘},
where 𝑘 is the number of segments we want to consider. The quality of the fit is measured with least squares as

∑𝑖 (𝑓(𝑥𝑖) − 𝑦𝑖)².
Note that we do not specify the locations of the nodes (breakpoints), i.e. points where 𝑓(𝑥) changes slope. Finding them is part of the fitting problem.
As in Sec. 9.1.8 we introduce binary variables 𝑧𝑖𝑗 indicating that 𝑓(𝑥𝑖) = 𝑎𝑗𝑥𝑖 + 𝑏𝑗, i.e. it is the 𝑗-th linear function that achieves the maximum at the point 𝑥𝑖. Following Sec. 9.1.8 we now have a mixed integer conic quadratic problem
minimize ‖𝑦 − 𝑓‖₂
subject to 𝑎𝑗𝑥𝑖 + 𝑏𝑗 ≤ 𝑓𝑖,  𝑖 = 1, . . . , 𝑛, 𝑗 = 1, . . . , 𝑘,
𝑓𝑖 ≤ 𝑎𝑗𝑥𝑖 + 𝑏𝑗 + 𝑀(1 − 𝑧𝑖𝑗),  𝑖 = 1, . . . , 𝑛, 𝑗 = 1, . . . , 𝑘,
∑𝑗 𝑧𝑖𝑗 = 1,  𝑖 = 1, . . . , 𝑛,
𝑧𝑖𝑗 ∈ {0, 1},  𝑖 = 1, . . . , 𝑛, 𝑗 = 1, . . . , 𝑘, (9.22)
with variables 𝑎, 𝑏, 𝑓, 𝑧, where 𝑀 is a sufficiently large constant.
Fig. 9.5: Convex piecewise linear fit with 𝑘 = 2, 3, 4 segments.
Frequently an integer model will have properties which formally follow from the problem's constraints, but may be very hard or impossible for a mixed-integer solver to automatically deduce. It may dramatically improve efficiency to explicitly add some of them to the model. For example, we can enhance (9.22) with all inequalities of the form
𝑧𝑖,𝑗+1 + 𝑧𝑖+𝑖′,𝑗 ≤ 1,
which indicate that each linear segment covers a contiguous subset of the sample, and additionally force these segments to come in the order of increasing 𝑗 as 𝑖 increases from left to right. The last statement is an example of symmetry breaking.
Chapter 10
Quadratic optimization
In this chapter we discuss convex quadratic and quadratically constrained optimization. Our discussion is fairly brief compared to the previous chapters, for three reasons: (i) convex quadratic optimization is a special case of conic quadratic optimization, (ii) for most convex problems it is actually more computationally efficient to pose the problem in conic form, and (iii) duality theory (including infeasibility certificates) is much simpler for conic quadratic optimization. Therefore, we generally recommend a conic quadratic formulation; see Sec. 3 and especially Sec. 3.2.3.
10.1 Quadratic objective
A standard (convex) quadratic optimization problem
minimize 12๐ฅ๐๐๐ฅ + ๐๐๐ฅ
subject to ๐ด๐ฅ = ๐,๐ฅ โฅ 0,
(10.1)
with 𝑄 ∈ 𝒮𝑛+ is conceptually a simple extension of a standard linear optimization problem with a quadratic term 𝑥ᵀ𝑄𝑥. Note the important requirement that 𝑄 is symmetric positive semidefinite; otherwise the objective function would not be convex.
10.1.1 Geometry of quadratic optimization
Quadratic optimization has a simple geometric interpretation: we minimize a convex quadratic function over a polyhedron; see Fig. 10.1. It is intuitively clear that the following different cases can occur:
• The optimal solution 𝑥⋆ is at the boundary of the polyhedron (as shown in Fig. 10.1). At 𝑥⋆ one of the hyperplanes is tangential to an ellipsoidal level curve.
• The optimal solution is inside the polyhedron; this occurs if the unconstrained minimizer arg min𝑥 (1/2)𝑥ᵀ𝑄𝑥 + 𝑐ᵀ𝑥 = −𝑄†𝑐 (i.e., the center of the ellipsoidal level curves) is inside the polyhedron. From now on 𝑄† denotes the pseudoinverse of 𝑄; in particular 𝑄† = 𝑄⁻¹ if 𝑄 is positive definite.
• If the polyhedron is unbounded in the opposite direction of 𝑐, and if the ellipsoidal level curves are degenerate in that direction (i.e., 𝑄𝑐 = 0), then the problem is unbounded. If 𝑄 ∈ 𝒮𝑛++, then the problem cannot be unbounded.
• The problem is infeasible, i.e., {𝑥 | 𝐴𝑥 = 𝑏, 𝑥 ≥ 0} = ∅.
Fig. 10.1: Geometric interpretation of quadratic optimization. At the optimal point 𝑥⋆ the hyperplane {𝑥 | 𝑎1ᵀ𝑥 = 𝑏} is tangential to an ellipsoidal level curve.
Possibly because of its simple geometric interpretation and similarity to linear optimization, quadratic optimization has been more widely adopted by optimization practitioners than conic quadratic optimization.
10.1.2 Duality in quadratic optimization
The Lagrangian function (Sec. 8.3) for (10.1) is

𝐿(𝑥, 𝑦, 𝑠) = (1/2)𝑥ᵀ𝑄𝑥 + 𝑥ᵀ(𝑐 − 𝐴ᵀ𝑦 − 𝑠) + 𝑏ᵀ𝑦 (10.2)
with Lagrange multipliers 𝑠 ∈ R𝑛+, and from ∇𝑥𝐿(𝑥, 𝑦, 𝑠) = 0 we get the necessary first-order optimality condition

𝑄𝑥 = 𝐴ᵀ𝑦 + 𝑠 − 𝑐,
i.e. (𝐴ᵀ𝑦 + 𝑠 − 𝑐) ∈ ℛ(𝑄). Then
arg min𝑥 𝐿(𝑥, 𝑦, 𝑠) = 𝑄†(𝐴ᵀ𝑦 + 𝑠 − 𝑐),
which can be substituted into (10.2) to give a dual function
𝑔(𝑦, 𝑠) = 𝑏ᵀ𝑦 − (1/2)(𝐴ᵀ𝑦 + 𝑠 − 𝑐)ᵀ𝑄†(𝐴ᵀ𝑦 + 𝑠 − 𝑐)  if (𝐴ᵀ𝑦 + 𝑠 − 𝑐) ∈ ℛ(𝑄),
𝑔(𝑦, 𝑠) = −∞  otherwise.
Thus we get a dual problem
maximize 𝑏ᵀ𝑦 − (1/2)(𝐴ᵀ𝑦 + 𝑠 − 𝑐)ᵀ𝑄†(𝐴ᵀ𝑦 + 𝑠 − 𝑐)
subject to (𝐴ᵀ𝑦 + 𝑠 − 𝑐) ∈ ℛ(𝑄),
𝑠 ≥ 0, (10.3)
or alternatively, using the optimality condition 𝑄𝑥 = 𝐴ᵀ𝑦 + 𝑠 − 𝑐, we can write
maximize 𝑏ᵀ𝑦 − (1/2)𝑥ᵀ𝑄𝑥
subject to 𝑄𝑥 = 𝐴ᵀ𝑦 − 𝑐 + 𝑠,
𝑠 ≥ 0. (10.4)
Note that this is an unusual dual problem in the sense that it involves both primal and dual variables.
Weak duality, strong duality under the Slater constraint qualification, and Farkas infeasibility certificates work similarly as in Sec. 8. In particular, note that the constraints in both (10.1) and (10.4) are linear, so Lemma 2.4 applies and we have:
1. The primal problem (10.1) is infeasible if and only if there is 𝑦 such that 𝐴ᵀ𝑦 ≤ 0 and 𝑏ᵀ𝑦 > 0.
2. The dual problem (10.4) is infeasible if and only if there is 𝑥 ≥ 0 such that 𝐴𝑥 = 0, 𝑄𝑥 = 0 and 𝑐ᵀ𝑥 < 0.
10.1.3 Conic reformulation
Suppose we have a factorization 𝑄 = 𝐹ᵀ𝐹 where 𝐹 ∈ R𝑘×𝑛, which is most interesting when 𝑘 ≪ 𝑛. Then 𝑥ᵀ𝑄𝑥 = 𝑥ᵀ𝐹ᵀ𝐹𝑥 = ‖𝐹𝑥‖₂² and the conic quadratic reformulation of (10.1) is
minimize 𝑡 + 𝑐ᵀ𝑥
subject to 𝐴𝑥 = 𝑏,
𝑥 ≥ 0,
(1, 𝑡, 𝐹𝑥) ∈ 𝒬r𝑘+2, (10.5)
with dual problem

maximize 𝑏ᵀ𝑦 − 𝑢
subject to −𝐹ᵀ𝑣 = 𝐴ᵀ𝑦 − 𝑐 + 𝑠,
𝑠 ≥ 0,
(𝑢, 1, 𝑣) ∈ 𝒬r𝑘+2. (10.6)
Note that in an optimal primal-dual solution we have 𝑡 = (1/2)‖𝐹𝑥‖₂², hence the complementary slackness for 𝒬r𝑘+2 demands 𝑣 = −𝐹𝑥 and −𝐹ᵀ𝑣 = 𝑄𝑥, as well as 𝑢 = (1/2)‖𝑣‖₂² = (1/2)𝑥ᵀ𝑄𝑥. This justifies why some of the dual variables in (10.6) and (10.4) have the same names: they are in fact equal, and so both the primal and dual solution to the original quadratic problem can be recovered from the primal-dual solution to the conic reformulation.
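The identity behind the reformulation, 𝑥ᵀ𝑄𝑥 = ‖𝐹𝑥‖₂² for 𝑄 = 𝐹ᵀ𝐹, is easy to verify numerically; the following NumPy sketch uses randomly generated data of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 5))   # k = 3 rows, n = 5 columns, k << n
Q = F.T @ F                       # Q = F^T F is positive semidefinite
x = rng.standard_normal(5)

# the quadratic term equals the squared norm entering the rotated cone
assert np.isclose(x @ Q @ x, np.linalg.norm(F @ x) ** 2)
# with t = (1/2) x^T Q x the rotated-cone bound 2*t*1 >= ||Fx||^2 is tight
t = 0.5 * x @ Q @ x
assert np.isclose(2 * t * 1, np.linalg.norm(F @ x) ** 2)
```

When 𝑘 ≪ 𝑛 the conic model carries only the 𝑘-dimensional vector 𝐹𝑥 instead of the dense 𝑛 × 𝑛 matrix 𝑄, which is the practical payoff of the reformulation.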
10.2 Quadratically constrained optimization
A general convex quadratically constrained quadratic optimization problem is

minimize (1/2)𝑥ᵀ𝑄0𝑥 + 𝑐0ᵀ𝑥 + 𝑟0
subject to (1/2)𝑥ᵀ𝑄𝑖𝑥 + 𝑐𝑖ᵀ𝑥 + 𝑟𝑖 ≤ 0,  𝑖 = 1, . . . , 𝑚, (10.7)
where 𝑄𝑖 ∈ 𝒮𝑛+. This corresponds to minimizing a convex quadratic function over an intersection of convex quadratic sets such as ellipsoids or affine halfspaces. Note the important requirement 𝑄𝑖 ⪰ 0 for all 𝑖 = 0, . . . , 𝑚, so that the objective function is convex and the constraints characterize convex sets. For example, neither of the constraints
(1/2)𝑥ᵀ𝑄𝑖𝑥 + 𝑐𝑖ᵀ𝑥 + 𝑟𝑖 = 0,
(1/2)𝑥ᵀ𝑄𝑖𝑥 + 𝑐𝑖ᵀ𝑥 + 𝑟𝑖 ≥ 0
characterizes a convex set, and therefore such constraints cannot be included.
10.2.1 Duality in quadratically constrained optimization
The Lagrangian function for (10.7) is

𝐿(𝑥, 𝜆) = (1/2)𝑥ᵀ𝑄0𝑥 + 𝑐0ᵀ𝑥 + 𝑟0 + ∑𝑖 𝜆𝑖((1/2)𝑥ᵀ𝑄𝑖𝑥 + 𝑐𝑖ᵀ𝑥 + 𝑟𝑖)
        = (1/2)𝑥ᵀ𝑄(𝜆)𝑥 + 𝑐(𝜆)ᵀ𝑥 + 𝑟(𝜆),
where

𝑄(𝜆) = 𝑄0 + ∑𝑖 𝜆𝑖𝑄𝑖,  𝑐(𝜆) = 𝑐0 + ∑𝑖 𝜆𝑖𝑐𝑖,  𝑟(𝜆) = 𝑟0 + ∑𝑖 𝜆𝑖𝑟𝑖.
From the Lagrangian we get the first-order optimality conditions
𝑄(𝜆)𝑥 = −𝑐(𝜆), (10.8)
and similar to the case of quadratic optimization we get a dual problem

maximize −(1/2)𝑐(𝜆)ᵀ𝑄(𝜆)†𝑐(𝜆) + 𝑟(𝜆)
subject to 𝑐(𝜆) ∈ ℛ(𝑄(𝜆)),
𝜆 ≥ 0, (10.9)
or equivalently
maximize −(1/2)𝑤ᵀ𝑄(𝜆)𝑤 + 𝑟(𝜆)
subject to 𝑄(𝜆)𝑤 = −𝑐(𝜆),
𝜆 ≥ 0. (10.10)
Using a general version of the Schur Lemma for singular matrices, we can also write (10.9) as an equivalent semidefinite optimization problem,
maximize 𝑡
subject to [ 2(𝑟(𝜆) − 𝑡)  𝑐(𝜆)ᵀ
             𝑐(𝜆)         𝑄(𝜆) ] ⪰ 0,
𝜆 ≥ 0. (10.11)
Feasibility in quadratically constrained optimization is characterized by the following conditions (assuming the Slater constraint qualification or other conditions to exclude ill-posed problems):
• Either the primal problem (10.7) is feasible or there exists 𝜆 ≥ 0, 𝜆 ≠ 0, satisfying

∑𝑖 𝜆𝑖 [ 2𝑟𝑖  𝑐𝑖ᵀ
        𝑐𝑖   𝑄𝑖 ] ≻ 0. (10.12)
• Either the dual problem (10.10) is feasible or there exists 𝑥 ∈ R𝑛 satisfying

𝑄0𝑥 = 0,  𝑐0ᵀ𝑥 < 0,  𝑄𝑖𝑥 = 0,  𝑐𝑖ᵀ𝑥 = 0,  𝑖 = 1, . . . , 𝑚. (10.13)
To see why the certificate proves infeasibility, suppose for instance that (10.12) and (10.7) are simultaneously satisfied. Then

0 < ∑𝑖 𝜆𝑖 [1, 𝑥]ᵀ [2𝑟𝑖 𝑐𝑖ᵀ; 𝑐𝑖 𝑄𝑖] [1, 𝑥] = 2 ∑𝑖 𝜆𝑖 ((1/2)𝑥ᵀ𝑄𝑖𝑥 + 𝑐𝑖ᵀ𝑥 + 𝑟𝑖) ≤ 0

and we have a contradiction, so (10.12) certifies infeasibility.
10.2.2 Conic reformulation
If 𝑄𝑖 = 𝐹𝑖ᵀ𝐹𝑖 for 𝑖 = 0, . . . , 𝑚, where 𝐹𝑖 ∈ R𝑘𝑖×𝑛, then we get a conic quadratic reformulation of (10.7):

minimize 𝑡0 + 𝑐0ᵀ𝑥 + 𝑟0
subject to (1, 𝑡0, 𝐹0𝑥) ∈ 𝒬r𝑘0+2,
(1, −𝑐𝑖ᵀ𝑥 − 𝑟𝑖, 𝐹𝑖𝑥) ∈ 𝒬r𝑘𝑖+2,  𝑖 = 1, . . . , 𝑚. (10.14)
The primal and dual solution of (10.14) recovers the primal and dual solution of (10.7) similarly as for quadratic optimization in Sec. 10.1.3. Let us see, for example, how a (conic) primal infeasibility certificate for (10.14) implies an infeasibility certificate in the form (10.12). Infeasibility in (10.14) involves only the last set of conic constraints. We can derive the infeasibility certificate for those constraints from Lemma 8.3 or by directly writing the Lagrangian

𝐿 = −∑𝑖 (𝑢𝑖 + 𝑣𝑖(−𝑐𝑖ᵀ𝑥 − 𝑟𝑖) + 𝑤𝑖ᵀ𝐹𝑖𝑥)
  = 𝑥ᵀ(∑𝑖 𝑣𝑖𝑐𝑖 − ∑𝑖 𝐹𝑖ᵀ𝑤𝑖) + (∑𝑖 𝑣𝑖𝑟𝑖 − ∑𝑖 𝑢𝑖).
The dual maximization problem is unbounded (i.e. the primal problem is infeasible) if we have

∑𝑖 𝑣𝑖𝑐𝑖 = ∑𝑖 𝐹𝑖ᵀ𝑤𝑖,
∑𝑖 𝑣𝑖𝑟𝑖 > ∑𝑖 𝑢𝑖,
2𝑢𝑖𝑣𝑖 ≥ ‖𝑤𝑖‖₂²,  𝑢𝑖, 𝑣𝑖 ≥ 0.
We claim that 𝜆 = 𝑣 is an infeasibility certificate in the sense of (10.12). We can assume 𝑣𝑖 > 0 for all 𝑖, as otherwise 𝑤𝑖 = 0 and we can take 𝑢𝑖 = 0 and skip the 𝑖-th coordinate. Let 𝑆 denote the matrix appearing in (10.12) with 𝜆 = 𝑣. We show that 𝑆 is positive definite:

[𝑦, 𝑥]ᵀ𝑆[𝑦, 𝑥] = ∑𝑖 (𝑣𝑖𝑥ᵀ𝑄𝑖𝑥 + 2𝑣𝑖𝑦𝑐𝑖ᵀ𝑥 + 2𝑣𝑖𝑟𝑖𝑦²)
              > ∑𝑖 (𝑣𝑖‖𝐹𝑖𝑥‖₂² + 2𝑦𝑤𝑖ᵀ𝐹𝑖𝑥 + 2𝑢𝑖𝑦²)
              ≥ ∑𝑖 𝑣𝑖⁻¹‖𝑣𝑖𝐹𝑖𝑥 + 𝑦𝑤𝑖‖₂² ≥ 0.
10.3 Example: Factor model
Recall from Sec. 3.3.3 that the standard Markowitz portfolio optimization problem is

maximize 𝜇ᵀ𝑥
subject to 𝑥ᵀΣ𝑥 ≤ 𝛾,
𝑒ᵀ𝑥 = 1,
𝑥 ≥ 0, (10.15)
where 𝜇 ∈ R𝑛 is a vector of expected returns for 𝑛 different assets and Σ ∈ 𝒮𝑛+ denotes the corresponding covariance matrix. Problem (10.15) maximizes the expected return of an investment given a budget constraint and an upper bound 𝛾 on the allowed risk. Alternatively, we can minimize the risk given a lower bound 𝛿 on the expected return of investment, i.e., we can solve the problem
minimize 𝑥ᵀΣ𝑥
subject to 𝜇ᵀ𝑥 ≥ 𝛿,
𝑒ᵀ𝑥 = 1,
𝑥 ≥ 0. (10.16)
Both problems (10.15) and (10.16) are equivalent in the sense that they describe the same Pareto-optimal trade-off curve by varying 𝛿 and 𝛾.
Next consider a factorization
Σ = 𝑉ᵀ𝑉 (10.17)
for some 𝑉 ∈ R𝑘×𝑛. We can then rewrite both problems (10.15) and (10.16) in conic quadratic form as

maximize 𝜇ᵀ𝑥
subject to (1/2, 𝛾, 𝑉𝑥) ∈ 𝒬r𝑘+2,
𝑒ᵀ𝑥 = 1,
𝑥 ≥ 0, (10.18)
and
minimize 𝑡
subject to (1/2, 𝑡, 𝑉𝑥) ∈ 𝒬r𝑘+2,
𝜇ᵀ𝑥 ≥ 𝛿,
𝑒ᵀ𝑥 = 1,
𝑥 ≥ 0, (10.19)
respectively. Given Σ ≻ 0, we may always compute a factorization (10.17) where 𝑉 is upper-triangular (Cholesky factor). In this case 𝑘 = 𝑛, i.e., 𝑉 ∈ R𝑛×𝑛, so there is little difference in complexity between the conic and quadratic formulations. However, in practice, better choices of 𝑉 are either known or readily available. We mention two examples.
Data matrix
Σ might be specified directly in the form (10.17), where 𝑉 is a normalized data matrix with 𝑇 observations of market data (for example daily returns) of the 𝑛 assets. When the observation horizon 𝑇 is shorter than 𝑛, which is typically the case, the conic representation is both more parsimonious and has better numerical properties.
Factor model
For a factor model we have

Σ = 𝐷 + 𝐵𝑆𝐵ᵀ

where 𝐷 = Diag(𝑤) is a diagonal matrix, 𝐵 ∈ R𝑛×𝑘 represents the exposure of assets to risk factors and 𝑆 ∈ R𝑘×𝑘 is the covariance matrix of factors. Importantly, we normally have a small number of factors (𝑘 ≪ 𝑛), so it is computationally much cheaper to find a Cholesky decomposition 𝑆 = 𝐹ᵀ𝐹 with 𝐹 ∈ R𝑘×𝑘. Combined, this gives us Σ = 𝑉ᵀ𝑉 for

𝑉 = [ 𝐷1/2
      𝐹𝐵ᵀ ]

of dimensions (𝑛 + 𝑘) × 𝑛. The dimensions of 𝑉 are larger than the dimensions of the Cholesky factors of Σ, but 𝑉 is very sparse, which usually results in a significant reduction of solution time. The resulting risk minimization conic problem can ultimately be written as:
minimize 𝑝 + 𝑡
subject to (1/2, 𝑡, 𝐹𝐵ᵀ𝑥) ∈ 𝒬r𝑘+2,
(1/2, 𝑝, √𝑤1 𝑥1, . . . , √𝑤𝑛 𝑥𝑛) ∈ 𝒬r𝑛+2,
𝜇ᵀ𝑥 ≥ 𝛿,
𝑒ᵀ𝑥 = 1,
𝑥 ≥ 0. (10.20)
Chapter 11
Bibliographic notes
The material on linear optimization is very basic, and can be found in any textbook. For further details we suggest a few standard references, [Chv83], [BT97] and [PS98], which all cover much more than discussed here. [NW06] gives a more modern treatment of both theory and algorithmic aspects of linear optimization.
Material on conic quadratic optimization is based on the paper [LVBL98] and the books [BenTalN01], [BV04]. The papers [AG03], [ART03] contain additional theoretical and algorithmic aspects.
For more theory behind the power cone and the exponential cone we recommend the thesis [Cha09].
Much of the material about semidefinite optimization is based on the paper [VB96] and the books [BenTalN01], [BKVH07]. The section on optimization over nonnegative polynomials is based on [Nes99], [Hac03]. We refer to [LR05] for a comprehensive survey on semidefinite optimization and relaxations in combinatorial optimization.
The chapter on conic duality follows the exposition in [GartnerM12]. Mixed-integer optimization is based on the books [NW88], [Wil93]. Modeling of piecewise linear functions is described in the survey paper [VAN10].
Chapter 12
Notation and definitions
R and Z denote the sets of real numbers and integers, respectively. R𝑛 denotes the set of 𝑛-dimensional vectors of real numbers (and similarly for Z𝑛 and {0, 1}𝑛); in most cases we denote such vectors by lower case letters, e.g., 𝑎 ∈ R𝑛. A subscripted value 𝑎𝑖 then refers to the 𝑖-th entry in 𝑎, i.e.,

𝑎 = (𝑎1, 𝑎2, . . . , 𝑎𝑛).
The symbol 𝑒 denotes the all-ones vector 𝑒 = (1, . . . , 1)ᵀ, whose length always follows from the context.
All vectors are interpreted as column-vectors. For 𝑎, 𝑏 ∈ R𝑛 we use the standard inner product,
⟨𝑎, 𝑏⟩ := 𝑎1𝑏1 + 𝑎2𝑏2 + · · · + 𝑎𝑛𝑏𝑛,
which we also write as 𝑎ᵀ𝑏 := ⟨𝑎, 𝑏⟩. We let R𝑚×𝑛 denote the set of 𝑚 × 𝑛 matrices, and we use upper case letters to represent them, e.g., 𝐵 ∈ R𝑚×𝑛 is organized as
𝐵 = [ 𝑏11 𝑏12 · · · 𝑏1𝑛
      𝑏21 𝑏22 · · · 𝑏2𝑛
       ⋮    ⋮   ⋱   ⋮
      𝑏𝑚1 𝑏𝑚2 · · · 𝑏𝑚𝑛 ].

For matrices 𝐴, 𝐵 ∈ R𝑚×𝑛 we use the inner product

⟨𝐴, 𝐵⟩ := ∑𝑖 ∑𝑗 𝑎𝑖𝑗𝑏𝑖𝑗.
For a vector 𝑥 ∈ R𝑛 we have

Diag(𝑥) := [ 𝑥1  0 · · · 0
             0  𝑥2 · · · 0
             ⋮   ⋮  ⋱   ⋮
             0   0 · · · 𝑥𝑛 ],
i.e., a square matrix with 𝑥 on the diagonal and zero elsewhere. Similarly, for a square matrix 𝑋 ∈ R𝑛×𝑛 we have
diag(𝑋) := (𝑥11, 𝑥22, . . . , 𝑥𝑛𝑛).
A set 𝑆 ⊆ R𝑛 is convex if and only if for any 𝑥, 𝑦 ∈ 𝑆 and 𝜃 ∈ [0, 1] we have
𝜃𝑥 + (1 − 𝜃)𝑦 ∈ 𝑆.
A function 𝑓 : R𝑛 ↦→ R is convex if and only if its domain dom(𝑓) is convex and for all 𝜃 ∈ [0, 1] we have
𝑓(𝜃𝑥 + (1 − 𝜃)𝑦) ≤ 𝜃𝑓(𝑥) + (1 − 𝜃)𝑓(𝑦).
A function 𝑓 : R𝑛 ↦→ R is concave if and only if −𝑓 is convex. The epigraph of a function 𝑓 : R𝑛 ↦→ R is the set
epi(𝑓) := {(𝑥, 𝑡) | 𝑥 ∈ dom(𝑓), 𝑓(𝑥) ≤ 𝑡},
shown in Fig. 12.1.
Fig. 12.1: The shaded region is the epigraph of the function 𝑓(𝑥) = − log(𝑥).
Thus, minimizing over the epigraph
minimize 𝑡
subject to 𝑓(𝑥) ≤ 𝑡
is equivalent to minimizing 𝑓(𝑥). Furthermore, 𝑓 is convex if and only if epi(𝑓) is a convex set. Similarly, the hypograph of a function 𝑓 : R𝑛 ↦→ R is the set
hypo(𝑓) := {(𝑥, 𝑡) | 𝑥 ∈ dom(𝑓), 𝑓(𝑥) ≥ 𝑡}.
Maximizing 𝑓 is equivalent to maximizing over the hypograph
maximize 𝑡
subject to 𝑓(𝑥) ≥ 𝑡,
and 𝑓 is concave if and only if hypo(𝑓) is a convex set.
Bibliography
[AG03] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Programming,95(1):3โ51, 2003.
[ART03] E. D. Andersen, C. Roos, and T. Terlaky. On implementing a primal-dual interior-point method for conic quadratic optimization. Math. Programming, February 2003.
[BT97] D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization. Athena Scientific, 1997.
[BKVH07] S. Boyd, S.J. Kim, L. Vandenberghe, and A. Hassibi. A Tutorial on Geometric Programming. Optimization and Engineering, 8(1):67–127, 2007. Available at http://www.stanford.edu/~boyd/gp_tutorial.html.
[BV04] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press,2004. http://www.stanford.edu/~boyd/cvxbook/.
[CS14] Venkat Chandrasekaran and Parikshit Shah. Conic geometric programming. In Information Sciences and Systems (CISS), 2014 48th Annual Conference on, 1–4. IEEE, 2014.
[Cha09] Peter Robert Chares. Cones and interior-point algorithms for structured convex optimization involving powers and exponentials. PhD thesis, École polytechnique de Louvain, Université catholique de Louvain, 2009.
[Chv83] V. Chvรกtal. Linear programming. W.H. Freeman and Company, 1983.
[GartnerM12] B. Gärtner and J. Matousek. Approximation algorithms and semidefinite programming. Springer Science & Business Media, 2012.
[Hac03] Y. Hachez. Convex optimization over non-negative polynomials: structured algorithms and applications. PhD thesis, Université Catholique de Louvain, 2003.
[LR05] M. Laurent and F. Rendl. Semidefinite programming and integer programming.Handbooks in Operations Research and Management Science, 12:393โ514, 2005.
[LVBL98] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra Appl., 284:193–228, November 1998.
[NW88] G. L. Nemhauser and L. A. Wolsey. Integer programming and combinatorial opti-mization. John Wiley and Sons, New York, 1988.
[Nes99] Yu. Nesterov. Squared functional systems and optimization problems. In H. Frenk, K. Roos, T. Terlaky, and S. Zhang, editors, High Performance Optimization. Kluwer Academic Publishers, 1999.
[NW06] J. Nocedal and S. Wright. Numerical optimization. Springer Science, 2nd edition,2006.
[PS98] C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms andcomplexity. Dover publications, 1998.
[TV98] T. Terlaky and J.-Ph. Vial. Computing maximum likelihood estimators of convex density functions. SIAM J. Sci. Statist. Comput., 19(2):675–694, 1998.
[VB96] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Rev., 38(1):49โ95,March 1996.
[VAN10] J. P. Vielma, S. Ahmed, and G. Nemhauser. Mixed-integer models for nonseparable piecewise-linear optimization: unifying framework and extensions. Operations Research, 58(2):303–315, 2010.
[Wil93] H. P. Williams. Model building in mathematical programming. John Wiley and Sons,3rd edition, 1993.
[Zie82] H Ziegler. Solving certain singly constrained convex optimization problems in pro-duction planning. Operations Research Letters, 1982. URL: http://www.sciencedirect.com/science/article/pii/016763778290030X.
[BenTalN01] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS/SIAM Series on Optimization. SIAM, 2001.
125
Index
Aabsolute value, 8, 22, 105adjacency matrix, 72
Bbasis pursuit, 9, 15big-M, 81, 102binary optimization, 70, 72binary variable, 102Boolean operator, 106
Ccardinality constraint, 112Cholesky factorization, 23, 27, 52complementary slackness, 95condition number, 57cone
convex, 20distance to, 85dual, 89exponential, 37p-order, 31power, 29product, 88projection, 85quadratic, 19, 52rotated quadratic, 20, 52second-order, 19self-dual, 89semidefinite, 51
conic quadratic optimization, 19constraint, 3
disjunctive, 104indicator, 104redundant, 80
constraint attribution, 98constraint satisfaction, 105
convexcone, 20function, 123set, 123
covariance matrix, 27, 119CQO, 19curve fitting, 66cut, 72
Ddeterminant, 58disjunctive constraint, 104dual
cone, 89function, 14, 93norm, 10problem, 14, 93, 116
dualityconic, 87gap, 96linear, 13, 18quadratic, 115, 117strong, 16, 96weak, 16, 95
dynamical system, 48
Eeigenvalue optimization, 56ellipsoid, 25, 26, 65, 114entropy, 39
maximization, 48relative, 40, 48
epigraph, 123exponential
cone, 37function, 39optimization, 37
126
Ffactor model, 23, 28, 120Farkas lemma, 12, 17, 90, 116, 118feasibility, 88feasible set, 3, 12, 88Fermat-Torricelli point, 36filter design, 69function
concave, 123convex, 123dual, 14, 93entropy, 39exponential, 39, 41Lagrange, 14, 92, 93, 115, 117Lambert W, 41logarithm, 39, 42lower semicontinuous, 109piecewise linear, 7, 108power, 23, 31, 44, 110sigmoid, 49softplus, 40
Ggeometric mean, 33, 34, 36geometric median, 35geometric programming, 42GP, 42Grammian matrix, 52
Hhalfspace, 5harmonic mean, 24Hermitian matrix, 62hitting time, 48homogenization, 10, 28Huber loss, 77hyperplane, 4hypograph, 123
Iill-posed, 77, 79, 88, 91indicator
constraint, 104variable, 102
infeasibility, 88
certificate, 12, 90, 92, 116, 118locating, 13
inner product, 122matrix, 51
integer variable, 102inventory, 29
KKullback-Leiber divergence, 40, 48
L
Lagrange function, 14, 92, 93, 115, 117
Lambert W function, 41
limit-feasibility, 91
linear
  matrix inequality, 55, 83, 99
  near dependence, 78
  optimization, 2
linear matrix inequality, 55, 83, 99
linear-fractional problem, 10
LMI, 55, 83, 99
LO, 2
log-determinant, 58
log-sum-exp, 40
log-sum-inv, 41
logarithm, 39, 42
logistic regression, 49
lowpass filter, 69

M
Markowitz model, 27
matrix
  adjacency, 72
  correlation, 65
  covariance, 27, 119
  Grammian, 52
  Hermitian, 62
  inner product, 51
  positive definite, 51
  pseudo-inverse, 53, 114
  semidefinite, 51, 114, 117
  variable, 51
MAX-CUT, 73
maximum, 7, 9, 44, 106
maximum likelihood, 36, 45
mean
  geometric, 33
  harmonic, 24
MIO, 101
monomial, 42

N
nearest correlation matrix, 65
network
  design, 110
  wireless, 46, 110
norm
  1-norm, 8, 105
  2-norm, 22
  dual, 10
  Euclidean, 22
  Frobenius, 45
  infinity norm, 9
  nuclear, 59
  p-norm, 31, 35
normal equation, 98
O
objective, 114
objective function, 3
optimal value, 3, 88
  unattainment, 10, 88
optimization
  binary, 70
  boolean, 72
  eigenvalue, 56
  exponential, 37
  linear, 2
  mixed integer, 101
  p-order cone, 31
  power cone, 29
  practical, 73
  quadratic, 113
  robust, 26
  semidefinite, 50
overflow, 81
P
penalization, 80
perturbation, 77
piecewise linear
  function, 7
  regression, 112
pOCO, 31
polyhedron, 5, 35, 114
polynomial
  curve fitting, 66
  nonnegative, 60, 61, 67
  trigonometric, 62, 64, 69
portfolio optimization
  cardinality constraint, 112
  constraint attribution, 98
  covariance matrix, 27, 119
  duality, 94
  factor model, 23, 28, 120
  fully invested, 47, 105
  market impact, 34
  Markowitz model, 27
  risk factor exposure, 120
  risk parity, 47
  Sharpe ratio, 28
  trading size, 112
  transaction costs, 111
posynomial, 42
power, 23, 44
power cone optimization, 29
power control, 46
precision, 82, 84
principal submatrix, 53
pseudo-inverse, 53, 54

Q
QCQO, 26, 117
QO, 22, 113
quadratic
  cone, 19
  duality, 115, 117
  optimization, 113
  rotated cone, 20
quadratic optimization, 22, 26
R
rate allocation, 46
redundant constraints, 80
regression
  linear, 98
  logistic, 49
  piecewise linear, 112
  regularization, 49
regularization, 49
relative entropy, 40, 48
relaxation
  semidefinite, 70, 72, 83
Riesz-Fejér Theorem, 63
risk parity, 47
robust optimization, 26
rotated quadratic cone, 20
S
scaling, 79
Schur complement, 54, 60, 83, 117
SDO, 50
second-order cone, 19, 23
semicontinuous variable, 103
semidefinite
  cone, 51
  optimization, 50
  relaxation, 70, 72, 83
set
  covering, 106
  packing, 106
  partitioning, 106
setup cost, 103
shadow price, 18
Sharpe ratio, 28
sigmoid, 49
signal processing, 69
signal-to-noise, 46
singular value, 58, 101
Slater constraint qualification, 97, 116, 118
SOCO, 19
softplus, 40
SOS1, 106
SOS2, 108
spectrahedron, 53
spectral factorization, 25, 51, 54
spectral radius, 57
sum of squares, 60, 63
symmetry breaking, 113

T
trading size, 112
transaction costs, 111
trigonometric polynomial, 62, 64, 69
U
unattainment, 78

V
variable
  binary, 102
  indicator, 102
  integer, 102
  matrix, 51
  semicontinuous, 103
verification, 84
violation, 84
volume, 34, 66