Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 11:
SemiDefinite Programming
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
SDP: SemiDefinite Prog
• Generalization of LP.
• Deals with symmetric positive semidefinite
matrices (Linear Matrix Inequalities, LMI).
• Solved with IPMs.
• Numerous applications:
eigenvalue optimization problems, LP, quasiconvex
programs, convex quadratically constrained
optimization, robust mathematical programming,
matrix norm minimization, combinatorial
optimization, control theory, statistics.
This lecture is based on two survey papers:
• L. Vandenberghe and S. Boyd, Semidefinite
Programming, SIAM Review 38 (1996) pp. 49–95.
• M.J. Todd, Semidefinite Optimization,
Acta Numerica 10 (2001) pp. 515–560.
The Internet resources:
• http://www-unix.mcs.anl.gov/otc/InteriorPoint/
• http://www.zib.de/helmberg/semidef.html
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
SDP: Background
Def. A matrix H ∈ R^{n×n} is positive semidefinite
if x^T H x ≥ 0 for all x ∈ R^n. We write H ⪰ 0.
Def. A matrix H ∈ R^{n×n} is positive definite if
x^T H x > 0 for all x ≠ 0. We write H ≻ 0.
We denote by SR^{n×n} (SR^{n×n}_+) the set of symmetric
positive semidefinite (positive definite) matrices.
Let U, V ∈ SR^{n×n}. We define the inner product
between U and V as U • V = trace(U^T V), where
trace(H) = Σ_{i=1}^n h_ii. The associated norm is the
Frobenius norm, written ‖U‖_F = (U • U)^{1/2} (or
just ‖U‖).
Def. Linear Matrix Inequalities
Let U, V ∈ SR^{n×n}.
We write U ⪰ V iff U − V ⪰ 0.
We write U ≻ V iff U − V ≻ 0.
We write U ⪯ V iff U − V ⪯ 0.
We write U ≺ V iff U − V ≺ 0.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Properties
1. If P ∈ R^{m×n} and Q ∈ R^{n×m}, then
trace(PQ) = trace(QP).
2. If U, V ∈ SR^{n×n} and Q ∈ R^{n×n} is orthogonal
(i.e. Q^T Q = I), then U • V = (Q^T U Q) • (Q^T V Q).
More generally, if P is nonsingular, then
U • V = (P U P^T) • (P^{-T} V P^{-1}).
3. Every U ∈ SR^{n×n} can be written as U = QΛQ^T,
where Q is orthogonal and Λ is diagonal. Then
UQ = QΛ. In other words, the columns of Q are the
eigenvectors and the diagonal entries of Λ the
corresponding eigenvalues of U.
4. If U ∈ SR^{n×n} and U = QΛQ^T, then
trace(U) = trace(Λ) = Σ_i λ_i.
5. For U ∈ SR^{n×n}, the following are equivalent:
(i) U ⪰ 0 (U ≻ 0);
(ii) x^T U x ≥ 0 for all x ∈ R^n (x^T U x > 0 for all 0 ≠ x ∈ R^n);
(iii) if U = QΛQ^T, then Λ ⪰ 0 (Λ ≻ 0);
(iv) U = P^T P for some matrix P (U = P^T P for
some square nonsingular matrix P).
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Properties (continued)
6. Every U ∈ SR^{n×n} has a square root U^{1/2} ∈ SR^{n×n}.
Proof: By Property 3 and Property 5 (iii), U = QΛQ^T
with Λ ⪰ 0. Take U^{1/2} = QΛ^{1/2}Q^T, where Λ^{1/2}
is the diagonal matrix whose diagonal contains the
(nonnegative) square roots of the eigenvalues of U,
and verify that U^{1/2}U^{1/2} = U.
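A small numerical illustration of this construction (a NumPy sketch, not part of the original slides; the helper name psd_sqrt is ours):

import numpy as np

def psd_sqrt(U):
    # square root via the spectral decomposition U = Q diag(lam) Q^T (Property 3)
    lam, Q = np.linalg.eigh(U)          # eigenvalues and orthonormal eigenvectors
    lam = np.clip(lam, 0.0, None)       # guard against tiny negative round-off
    return Q @ np.diag(np.sqrt(lam)) @ Q.T

P = np.random.randn(5, 5)
U = P.T @ P                             # PSD by Property 5 (iv)
R = psd_sqrt(U)
assert np.allclose(R @ R, U)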
7. Suppose
U = [ A  B^T ]
    [ B  C   ],
where A and C are symmetric and A ≻ 0. Then
U ⪰ 0 (U ≻ 0) iff C − B A^{-1} B^T ⪰ 0 (≻ 0).
The matrix C − B A^{-1} B^T is called the Schur
complement of A in U.
Proof: follows easily from the factorization:
[ A  B^T ]   [ I        0 ] [ A  0                ] [ I  A^{-1}B^T ]
[ B  C   ] = [ BA^{-1}  I ] [ 0  C − BA^{-1}B^T   ] [ 0  I         ].
8. If U ∈ SR^{n×n} and x ∈ R^n, then x^T U x = U • xx^T.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Primal and Dual SDPs
Consider a primal SDP
min  C • X
s.t. A_i • X = b_i,  i = 1..m,
     X ⪰ 0,
where all A_i ∈ SR^{n×n}, b ∈ R^m and C ∈ SR^{n×n} are
given, and X ∈ SR^{n×n} is the variable.
The associated dual SDP:
max  b^T y
s.t. Σ_{i=1}^m y_i A_i + S = C,
     S ⪰ 0,
where y ∈ R^m and S ∈ SR^{n×n} are the variables.
An equivalent dual SDP has the form
max  b^T y
s.t. Σ_{i=1}^m y_i A_i ⪯ C.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Weak Duality in SDP
Theorem (Weak Duality)
If X is feasible in the primal and (y, S) in the dual, then
C • X − b^T y = X • S ≥ 0.
Proof:
C • X − b^T y = (Σ_{i=1}^m y_i A_i + S) • X − b^T y
             = Σ_{i=1}^m (A_i • X) y_i + S • X − b^T y
             = S • X = X • S.
Further, since X is positive semidefinite, it has a
square root X^{1/2} (Property 6), and so
X • S = trace(XS) = trace(X^{1/2} X^{1/2} S) = trace(X^{1/2} S X^{1/2}) ≥ 0.
Here we use Property 1 and the fact that S and X^{1/2}
are positive semidefinite, hence X^{1/2} S X^{1/2} is
positive semidefinite and its trace is nonnegative.
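A quick numerical check of the weak duality identity (a NumPy sketch, not part of the original slides; the random construction of a feasible pair is ours):

import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
A = [(lambda M: (M + M.T) / 2)(rng.standard_normal((n, n))) for _ in range(m)]

X = (lambda P: P @ P.T)(rng.standard_normal((n, n)))   # primal feasible: X PSD
y = rng.standard_normal(m)
S = (lambda P: P @ P.T)(rng.standard_normal((n, n)))   # pick S PSD first ...
C = sum(yi * Ai for yi, Ai in zip(y, A)) + S           # ... then define C so (y, S) is dual feasible
b = np.array([np.trace(Ai @ X) for Ai in A])           # b := A X makes X primal feasible

gap = np.trace(C @ X) - b @ y
assert gap >= -1e-10 and np.isclose(gap, np.trace(X @ S))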
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
SDP Example 1
Minimize the Maximum Eigenvalue
We wish to choose x ∈ R^k to minimize the maximum
eigenvalue of A(x) = A_0 + x_1 A_1 + ... + x_k A_k,
where A_i ∈ R^{n×n} and A_i = A_i^T.
Observe that
λ_max(A(x)) ≤ t
if and only if
λ_max(A(x) − tI) ≤ 0,
or equivalently, if and only if
λ_min(tI − A(x)) ≥ 0.
This holds iff
tI − A(x) ⪰ 0.
So we get the SDP in the dual form:
max  −t
s.t. tI − A(x) ⪰ 0,
where the variable is y := (t, x).
Application: this problem arises, for example, in
stabilizing a differential equation.
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
SDP Example 2
Logarithmic Chebyshev Approximation
Suppose we wish to solve Ax ≈ b approximately, where
A = [a_1 ... a_n]^T ∈ R^{n×k} and b ∈ R^n. In Chebyshev
approximation we minimize the ℓ∞-norm of the residual,
i.e., we solve
min  max_i |a_i^T x − b_i|.
This can be cast as a linear program with x and an
auxiliary variable t:
min  t
s.t. −t ≤ a_i^T x − b_i ≤ t,  i = 1..n.
In some applications b_i has the dimension of a power
or intensity, and it is typically expressed on a
logarithmic scale. In such cases the more natural
optimization problem is
min  max_i |log(a_i^T x) − log(b_i)|
(assuming a_i^T x > 0 and b_i > 0).
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Log Chebyshev Approximation
The logarithmic Chebyshev approximation problem can be
cast as a semidefinite program. To see this, note that
|log(a_i^T x) − log(b_i)| = log max(a_i^T x / b_i, b_i / a_i^T x).
Hence the problem can be rewritten as the following
nonlinear program
min  t
s.t. 1/t ≤ a_i^T x / b_i ≤ t,  i = 1..n,
or,
min  t
s.t. [ t − a_i^T x / b_i   0                0 ]
     [ 0                   a_i^T x / b_i    1 ]  ⪰ 0,  i = 1..n,
     [ 0                   1                t ]
which is a semidefinite program.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
SDP Example 3
Stats: Minimum Trace Factor Analysis
Assume x ∈ R^n is a random vector with mean x̄ and
covariance matrix Σ. We take a large number of samples
y = x + w, where the measurement noise w has zero mean,
is uncorrelated with x, and has an unknown but diagonal
covariance matrix D. It follows that Σ̂ = Σ + D, where
Σ̂ denotes the covariance matrix of y.
We assume that we can estimate Σ̂ with high confidence,
i.e., we consider it a known, constant matrix.
We do not know Σ, the covariance of x, or D, the
covariance matrix of the measurement noise. But they
are both positive semidefinite, so we know that Σ lies
in the convex set
C = { Σ̂ − D | Σ̂ − D ⪰ 0, D ⪰ 0, D diagonal }.
We can derive bounds on linear functions of Σ by
solving SDPs over the set C.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Min Trace Factor Analysis
Consider the sum of components of x, i.e., eT x.The variance of eTx is given by
eTΣ e = eT Σ e − trace(D).
Since we do not know Σ, we cannot say exactly
what eTΣ e is. But we can compute its bounds.
The upper bound is eT Σ e.A lower bound can be obtained by solving SDP
maxn∑
i=1di
s.t. Σ − diag(d) � 0,d ≥ 0.
Fletcher interpreted this as the educational test-
ing problem. The vector y gives the scores of a
random student on a series of n tests, and eTygives the total score. One considers the test to
be reliable if the variance of the measured total
scores eTy is close to the variance of eTx over
the entire population. The quantity
ρ = eTΣ e/eT Σ e
is called the reliability of the test. SDP provides
a lower bound for ρ.12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Logarithmic Barrier Function
Define the logarithmic barrier function for the cone
SR^{n×n}_+ of positive definite matrices:
f : SR^{n×n}_+ → R,
f(X) = −ln det X   if X ≻ 0,
       +∞          otherwise.
Let us evaluate its derivatives.
Let X ≻ 0 and H ∈ SR^{n×n}. Then
f(X+αH) = −ln det[X(I + αX^{-1}H)]
        = −ln det X − ln(1 + α trace(X^{-1}H) + O(α^2))
        = f(X) − α X^{-1} • H + O(α^2),
so that f′(X) = −X^{-1} and Df(X)[H] = −X^{-1} • H.
Similarly,
f′(X+αH) = −[X(I + αX^{-1}H)]^{-1}
         = −[I − αX^{-1}H + O(α^2)] X^{-1}
         = f′(X) + α X^{-1} H X^{-1} + O(α^2),
so that f″(X)[H] = X^{-1} H X^{-1}
and D^2 f(X)[H, G] = X^{-1} H X^{-1} • G.
Finally,
f‴(X)[H, G] = −X^{-1} H X^{-1} G X^{-1} − X^{-1} G X^{-1} H X^{-1}.
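The first derivative formula is easy to verify numerically (a NumPy sketch, not part of the original slides; tolerances are ours):

import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((5, 5))
X = P @ P.T + 5 * np.eye(5)                      # safely positive definite
H = (lambda M: (M + M.T) / 2)(rng.standard_normal((5, 5)))

f = lambda M: -np.linalg.slogdet(M)[1]           # -ln det, numerically stable
a = 1e-6
fd = (f(X + a * H) - f(X - a * H)) / (2 * a)     # central difference for Df(X)[H]
exact = -np.trace(np.linalg.inv(X) @ H)          # -X^{-1} • H
assert abs(fd - exact) < 1e-5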
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Logarithmic Barrier Function
Theorem:
f(X) = −ln det X is a convex barrier for SR^{n×n}_+.
Proof:
Define φ(α) = f(X + αH). We know that f is convex if,
for every X ∈ SR^{n×n}_+ and every H ∈ SR^{n×n}, φ(α)
is convex in α.
Consider the set of α such that X̄ = X + αH ≻ 0. On
this set
φ″(α) = D^2 f(X̄)[H, H] = X̄^{-1} H X̄^{-1} • H.
Since X̄ ≻ 0, its square root exists; set V = X̄^{-1/2}
(Property 6), and
φ″(α) = V^2 H V^2 • H = trace(V^2 H V^2 H)
      = trace((VHV)(VHV)) = ‖VHV‖_F^2 ≥ 0.
So φ is convex.
When X ≻ 0 approaches a singular matrix, its
determinant approaches zero and f(X) → +∞.
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Simplified Notation
Define A : SR^{n×n} → R^m,
A X = (A_i • X)_{i=1}^m ∈ R^m.
Note that, for any X ∈ SR^{n×n} and y ∈ R^m,
(A X)^T y = Σ_{i=1}^m (A_i • X) y_i = (Σ_{i=1}^m y_i A_i) • X,
so the adjoint of A is given by
A* y = Σ_{i=1}^m y_i A_i.
A* is a mapping from R^m to SR^{n×n}.
With this notation the primal SDP becomes
min  C • X
s.t. A X = b,
     X ⪰ 0,
where X ∈ SR^{n×n} is the variable.
The associated dual SDP reads
max  b^T y
s.t. A* y + S = C,
     S ⪰ 0,
where y ∈ R^m and S ∈ SR^{n×n} are the variables.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Solving SDPs with IPMs
Replace the primal SDP
min  C • X
s.t. A X = b,
     X ⪰ 0,
with the primal barrier SDP
min  C • X + µ f(X)
s.t. A X = b
(with a barrier parameter µ > 0).
Formulate the Lagrangian
L(X, y, S) = C • X + µ f(X) − y^T (A X − b),
with y ∈ R^m, and write the first order conditions
(FOC) for a stationary point of L:
C + µ f′(X) − A* y = 0.
Use f(X) = −ln det(X) and f′(X) = −X^{-1}.
Denote S = µ X^{-1}, i.e., X S = µI.
For a positive definite matrix X its inverse is also
positive definite. The FOC now become:
A X = b,
A* y + S = C,
X S = µI,
with X ≻ 0 and S ≻ 0.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Newton direction
We derive the Newton direction for the system:
A X = b,
A* y + S = C,
−µ X^{-1} + S = 0.
Recall that the variables in the FOC are (X, y, S),
where X, S ∈ SR^{n×n}_+ and y ∈ R^m.
Hence we look for a direction (ΔX, Δy, ΔS), where
ΔX, ΔS ∈ SR^{n×n} and Δy ∈ R^m.
The differentiation in the above system is a
nontrivial operation.
The direction is the solution of the system:
[ A                   0   0 ] [ ΔX ]   [ ξ_b ]
[ 0                   A*  I ] [ Δy ] = [ ξ_C ]
[ µ(X^{-1} ⊙ X^{-1})  0   I ] [ ΔS ]   [ ξ_µ ].
Here we introduce a useful notation P ⊙ Q for n×n
matrices P and Q. This is an operator from SR^{n×n}
to SR^{n×n} defined by
(P ⊙ Q) U = (1/2)(P U Q^T + Q U P^T).
17
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 12:
Linear Algebra Issues
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Linear Algebra of IPM for LP
First order optimality conditions:
Ax = b,
A^T y + s = c,
XSe = µe.
Newton direction:
[ A  0    0 ] [ Δx ]   [ ξ_p ]
[ 0  A^T  I ] [ Δy ] = [ ξ_d ]
[ S  0    X ] [ Δs ]   [ ξ_µ ],
where
ξ_p = b − Ax,
ξ_d = c − A^T y − s,
ξ_µ = µe − XSe.
Use the third equation to eliminate
Δs = X^{-1}(ξ_µ − SΔx) = −X^{-1}SΔx + X^{-1}ξ_µ
from the second equation and get
[ −Θ^{-1}  A^T ] [ Δx ]   [ ξ_d − X^{-1}ξ_µ ]
[ A        0   ] [ Δy ] = [ ξ_p             ],
where Θ = XS^{-1} is a diagonal scaling matrix.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Weighted Least Squares
Augmented system (symmetric but indefinite):
[ −Θ^{-1}  A^T ] [ Δx ]   [ r ]
[ A        0   ] [ Δy ] = [ h ],
where
[ r ]   [ ξ_d − X^{-1}ξ_µ ]
[ h ] = [ ξ_p             ].
Eliminate
Δx = ΘA^T Δy − Θr
to get the normal equations (a symmetric, positive
definite system)
(AΘA^T) Δy = g = AΘr + h.
The matrix AΘA^T always has the same sparsity
structure (only Θ changes in subsequent iterations).
Two step solution method:
• factorization to LDL^T form,
• backsolve to compute the direction Δy.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Linear Algebra of IPMs
Cholesky factorization
LDL^T = AΘA^T.
Cholesky factorization is nothing else but the
Gaussian Elimination process that exploits two
properties of the matrix:
• symmetry;
• positive definiteness.
Def. A matrix H ∈ R^{m×m} is symmetric if H = H^T.
Example: H1 is symmetric, H2 is not.
H1 = [ 5  3 ]       H2 = [ 5  3 ]
     [ 3  8 ]  and       [ 2  8 ].
Def. A matrix H ∈ R^{m×m} is positive definite if
x^T H x > 0 for any x ≠ 0.
Example: H3 is positive definite, H4 is not.
H3 = [ 5  3 ]       H4 = [ 4  3 ]
     [ 3  2 ]  and       [ 3  2 ].
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Pivots from the diagonal
Lemma 1: If H = [ p  a^T ]
                [ a  W   ] ∈ R^{m×m} is positive
definite, then W − (1/p) a a^T is also positive definite.
Proof: Observe that p > 0. Otherwise, for x = e_1 ∈ R^m
we would have x^T H x = p ≤ 0.
We prove the main result by contradiction. Suppose
W − (1/p) a a^T is not positive definite, that is,
there exists x̄ ∈ R^{m−1}, x̄ ≠ 0, such that
x̄^T (W − p^{-1} a a^T) x̄ ≤ 0.
Define x = (α, x̄^T)^T ∈ R^m, where α = −x̄^T a / p,
and compute:
x^T H x = [α, x̄^T] [ p  a^T ] [ α ]
                   [ a  W   ] [ x̄ ]
        = α^2 p + 2α x̄^T a + x̄^T W x̄
        = x̄^T (W − (1/p) a a^T) x̄ + (1/p)(αp + x̄^T a)^2
        = x̄^T (W − (1/p) a a^T) x̄ ≤ 0,
which contradicts the positive definiteness of H.
Corollary.
If GE is applied to a positive definite matrix, then
all pivots can be chosen from the diagonal.
Diagonal pivoting preserves symmetry.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Cholesky Factorization
Use the Cholesky factorization
LDL^T = AΘA^T,
where:
L is a lower triangular matrix; and
D is a diagonal matrix.
Hence replace the difficult equation
(AΘA^T) · Δy = g
with a sequence of easy equations:
L · u = g,
D · v = u,
L^T · Δy = v.
Note that
g = Lu = L(Dv) = LD(L^T Δy) = (LDL^T)Δy = (AΘA^T)Δy.
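The two-step scheme in a few lines (a SciPy sketch, not part of the original slides; the random data is ours):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
m, n = 4, 9
A = rng.standard_normal((m, n))
theta = rng.uniform(0.1, 10.0, n)       # Theta = X S^{-1}, a positive diagonal
g = rng.standard_normal(m)

M = (A * theta) @ A.T                   # A Theta A^T without forming diag(theta)
c, low = cho_factor(M)                  # factorization (done once per IPM iteration)
dy = cho_solve((c, low), g)             # backsolve (cheap, reusable for many rhs)
assert np.allclose(M @ dy, g)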
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Existence of LDLT factors
Lemma 2: The decomposition H = LDL^T with d_ii > 0 for
all i exists iff H is positive definite (PD).
Proof:
Part 1 (⇒) Let H = LDL^T with d_ii > 0.
Take any x ≠ 0 and let u = L^T x. Since L is a unit
lower triangular matrix, it is nonsingular, so u ≠ 0 and
x^T H x = x^T L D L^T x = u^T D u = Σ_{i=1}^m d_ii u_i^2 > 0.
Part 2 (⇐) Proof by induction on the dimension of H.
For m = 1: H = h_11 = d_11 > 0 iff H is PD.
Assume the result is true for m = k − 1 ≥ 1.
Let H = [ W    a ]
        [ a^T  q ] ∈ R^{k×k} be a given k × k positive
definite matrix with W ∈ R^{(k−1)×(k−1)}, a ∈ R^{k−1}
and q ∈ R. Note first that since H is PD, W is also PD.
Indeed, for any (x, 0) ∈ R^k we have
[x^T, 0] [ W    a ] [ x ]
         [ a^T  q ] [ 0 ] = x^T W x > 0  for all x ∈ R^{k−1}, x ≠ 0.
7
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Existence of LDLT factors
From the inductive hypothesis we know that W = LDL^T
with d_ii > 0. Let
[ W    a ]   [ L    0 ] [ D  0 ] [ L^T  l ]
[ a^T  q ] = [ l^T  1 ] [ 0  d ] [ 0    1 ],
where l is the solution of the equation (LD) l = a
(it is well defined since L and D are nonsingular)
and d is given by d = q − l^T D l.
Hence the matrix H = [ W    a ]
                     [ a^T  q ] has an LDL^T
decomposition. It remains to prove that d > 0.
Consider the vector
x = [ −L^{-T} l ]
    [ 1        ].
Since H is positive definite, we get
0 < x^T H x
  = [−l^T L^{-1}, 1] [ L    0 ] [ D  0 ] [ L^T  l ] [ −L^{-T} l ]
                     [ l^T  1 ] [ 0  d ] [ 0    1 ] [ 1         ]
  = [0, 1] [ D  0 ] [ 0 ]
           [ 0  d ] [ 1 ] = d,
which completes the proof.
8
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Linear Equation Systems
Let H ∈ R^{n×n} be a nonsingular matrix
    [ h_11  h_12  ...  h_1n ]
H = [ h_21  h_22  ...  h_2n ]
    [ ...   ...   ...  ...  ]
    [ h_n1  h_n2  ...  h_nn ].
The general linear equation system
H x = b
is difficult to solve.
Suppose we represent H in the following form:
    [ 1     0     ...  0 ] [ u_11  u_12  ...  u_1n ]
H = [ l_21  1     ...  0 ] [ 0     u_22  ...  u_2n ]
    [ ...   ...   ...  . ] [ ...   ...   ...  ...  ]
    [ l_n1  l_n2  ...  1 ] [ 0     0     ...  u_nn ].
The system H x = b can then be solved as a sequence of
two (easy) triangular systems:
L z = b,
U x = z.
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Triangular Equation Systems
The system with a unit lower triangular matrix
[ 1     0     ...  0 ] [ z_1 ]   [ b_1 ]
[ l_21  1     ...  0 ] [ z_2 ] = [ b_2 ]
[ ...   ...   ...  . ] [ ... ]   [ ... ]
[ l_n1  l_n2  ...  1 ] [ z_n ]   [ b_n ]
can be solved as follows:
for i = 1,2,...,n
    z_i := b_i
    for j = i+1,i+2,...,n
        b_j := b_j − l_ji z_i
The system with an upper triangular matrix
[ u_11  u_12  ...  u_1n ] [ x_1 ]   [ z_1 ]
[ 0     u_22  ...  u_2n ] [ x_2 ] = [ z_2 ]
[ ...   ...   ...  ...  ] [ ... ]   [ ... ]
[ 0     0     ...  u_nn ] [ x_n ]   [ z_n ]
can be solved as follows:
for i = n,n-1,...,1
    x_i := z_i / u_ii
    for j = 1,2,...,i-1
        z_j := z_j − u_ji x_i
10
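The two loops above, written out as runnable code (a NumPy sketch, not part of the original slides; function names are ours):

import numpy as np

def solve_unit_lower(L, b):
    # forward substitution with a unit lower triangular L (diagonal = 1)
    z = b.astype(float).copy()
    for i in range(len(z)):
        z[i + 1:] -= L[i + 1:, i] * z[i]     # eliminate z_i from later equations
    return z

def solve_upper(U, z):
    # backward substitution with an upper triangular U
    x = z.astype(float).copy()
    for i in range(len(x) - 1, -1, -1):
        x[i] /= U[i, i]
        x[:i] -= U[:i, i] * x[i]             # update the remaining right-hand side
    return x

L = np.array([[1., 0.], [2., 1.]]); U = np.array([[3., 1.], [0., 4.]])
b = np.array([5., 14.])
x = solve_upper(U, solve_unit_lower(L, b))
assert np.allclose(L @ U @ x, b)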
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Gaussian Elimination
The matrix A and the right-hand side at the beginning
of step k of Gaussian Elimination:
        [ a_11  a_12      ...  a_1,k-1           a_1,k           ...  a_1,n           ]
        [ 0     a^(2)_22  ...  a^(2)_2,k-1       a^(2)_2,k       ...  a^(2)_2,n       ]
A^(k) = [ ...   ...       ...  ...               ...             ...  ...             ]
        [ 0     0         ...  a^(k-1)_k-1,k-1   a^(k-1)_k-1,k   ...  a^(k-1)_k-1,n   ]
        [ 0     0         ...  0                 a^(k)_k,k       ...  a^(k)_k,n       ]
        [ ...   ...       ...  ...               ...             ...  ...             ]
        [ 0     0         ...  0                 a^(k)_n,k       ...  a^(k)_n,n       ],
b^(k) = ( b_1, b^(2)_2, ..., b^(k-1)_k-1, b^(k)_k, ..., b^(k)_n )^T.
Elimination operations:
for i = k+1,k+2,...,n
    m_ik := a^(k)_ik / a^(k)_kk
    for j = k+1,k+2,...,n
        a^(k+1)_ij := a^(k)_ij − m_ik a^(k)_kj
    b^(k+1)_i := b^(k)_i − m_ik b^(k)_k
After step k of Gaussian Elimination:
          [ a_11  a_12      ...  a_1,k       a_1,k+1           ...  a_1,n           ]
          [ 0     a^(2)_22  ...  a^(2)_2,k   a^(2)_2,k+1       ...  a^(2)_2,n       ]
A^(k+1) = [ ...   ...       ...  ...         ...               ...  ...             ]
          [ 0     0         ...  a^(k)_k,k   a^(k)_k,k+1       ...  a^(k)_k,n       ]
          [ 0     0         ...  0           a^(k+1)_k+1,k+1   ...  a^(k+1)_k+1,n   ]
          [ ...   ...       ...  ...         ...               ...  ...             ]
          [ 0     0         ...  0           a^(k+1)_n,k+1     ...  a^(k+1)_n,n     ],
b^(k+1) = ( b_1, b^(2)_2, ..., b^(k)_k, b^(k+1)_k+1, ..., b^(k+1)_n )^T.
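The whole elimination, collected into an LU routine (a NumPy sketch, not part of the original slides; no pivoting, as in the loop above):

import numpy as np

def lu_no_pivoting(A):
    # returns unit lower triangular L and upper triangular U with A = L U
    A = A.astype(float).copy()
    n = A.shape[0]
    for k in range(n - 1):
        A[k + 1:, k] /= A[k, k]                                     # multipliers m_ik
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])   # Schur complement update
    return np.tril(A, -1) + np.eye(n), np.triu(A)

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
L, U = lu_no_pivoting(A)
assert np.allclose(L @ U, A)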
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Elementary Operations in GE
Step k of Gaussian Elimination can be viewed as
multiplying A^(k) and b^(k) by the following n×n matrix
      [ 1                              ]
      [    ...                         ]
M_k = [         1                      ]
      [         −m_k+1,k  1            ]
      [         ...           ...      ]
      [         −m_n,k             1   ].
After n − 1 steps,
                                    [ a_11  a_12      ...  a_1n      ]
A^(n) = M_{n−1} M_{n−2} ... M_1 A = [ 0     a^(2)_22  ...  a^(2)_2n  ]
                                    [ ...   ...       ...  ...       ]
                                    [ 0     0         ...  a^(n)_nn  ]
is an upper triangular matrix.
Gaussian Elimination implicitly transforms the matrix A
into the product A = LU. Normally, the matrix L is not
stored. However, it is possible to save the elementary
row multipliers m_ik and obtain an explicit form of the
matrix L.
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Accuracy of Computations
When you compute
S = πr^2,
you probably set π = 3.14.
Beware of the finite precision of the computer.
IEEE standard relative precision:
practically all computers in the world use
ε = 2.2 × 10^{-16}.
This means that, in floating-point arithmetic,
1 + ε > 1
but
1 + (1/2)ε = 1.
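This is easy to observe directly (a Python sketch, not part of the original slides):

import sys

eps = sys.float_info.epsilon            # ~2.2e-16 on IEEE double precision
print(1.0 + eps > 1.0)                  # True: eps is still representable
print(1.0 + eps / 2 == 1.0)             # True: half of it is rounded away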
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Accuracy and Gaussian Elim.
General idea: avoid divisions by small numbers, because
they cause growth of numerical errors.
Partial Pivoting
Choose the element with the largest absolute value in a
column as the pivot.
Complete Pivoting
Choose the element with the largest absolute value in
the active submatrix as the pivot.
Complete pivoting ensures excellent accuracy.
Partial pivoting provides acceptable accuracy.
No pivoting at all is extremely dangerous.
Example
GE without pivoting applied to the matrix
[ 1  1  1 ]
[ 1  1  2 ]
[ 3  4  5 ]
gives in the second step the following 2×2 Schur
complement
[ 0  1 ]
[ 1  2 ],
so there is a zero in the place of the pivot.
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Computational Complexity
A flop is a floating point operation:
x := x + a · b.
The LU decomposition of an unsymmetric n × n matrix
needs (1/3)n^3 flops.
Sketch of the proof.
At iteration k of the LU decomposition, one flop is
executed for every element of the active submatrix of
size (n − k) × (n − k). Thus the whole decomposition
requires
Σ_{k=1}^{n−1} (n − k)^2 ≈ (1/3)n^3
flops.
Solving an equation system with an existing LU
decomposition of an unsymmetric n × n matrix needs
n^2 flops.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Additional Cost of Pivoting
Partial Pivoting
At step k we choose max_{i≥k} |a^(k)_ik|, so overall we add
Σ_{k=1}^{n−1} (n − k) ≈ (1/2)n^2
comparisons of real numbers.
Complete Pivoting
At step k we choose max_{i,j≥k} |a^(k)_ij|, so overall we add
Σ_{k=1}^{n−1} (n − k)^2 ≈ (1/3)n^3
comparisons of real numbers.
A comparison of two real numbers has a cost comparable
to a single flop. Thus partial pivoting adds very
little to the cost of the LU decomposition; complete
pivoting, however, doubles the cost of the
decomposition. The guarantee of accuracy is costly!
GE with complete pivoting is two times more expensive
than GE with partial pivoting.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Symmetric Gaussian Elim.
Let H ∈ R^{m×m} be a symmetric positive definite matrix
    [ h_11  h_12  ...  h_1m ]
H = [ h_21  h_22  ...  h_2m ]
    [ ...   ...   ...  ...  ]
    [ h_m1  h_m2  ...  h_mm ].
By applying Gaussian Elimination to it, we can
represent it in the following form:
[ 1     0     ...  0 ] [ d_11  0     ...  0    ] [ 1  l_21  ...  l_m1 ]
[ l_21  1     ...  0 ] [ 0     d_22  ...  0    ] [ 0  1     ...  l_m2 ]
[ ...   ...   ...  . ] [ ...   ...   ...  ...  ] [ .  ...   ...  ...  ]
[ l_m1  l_m2  ...  1 ] [ 0     0     ...  d_mm ] [ 0  0     ...  1    ]
Example 1:
[  1  −1  2 ]   [  1  0  0 ] [ 1  0  0 ] [ 1  −1  2 ]
[ −1   3  0 ] = [ −1  1  0 ] [ 0  2  0 ] [ 0   1  1 ]
[  2   0  9 ]   [  2  1  1 ] [ 0  0  3 ] [ 0   0  1 ].
Example 2:
[  1  1  −1 ]   [  1  0  0 ] [ 1  0  0 ] [ 1  1  −1 ]
[  1  5   7 ] = [  1  1  0 ] [ 0  4  0 ] [ 0  1   2 ]
[ −1  7  22 ]   [ −1  2  1 ] [ 0  0  5 ] [ 0  0   1 ].
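Symmetric Gaussian Elimination in code, checked against Example 1 (a NumPy sketch, not part of the original slides; valid for positive definite H by Lemma 2):

import numpy as np

def ldlt(H):
    # LDL^T with diagonal pivots taken in the natural order
    H = H.astype(float).copy()
    n = H.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for k in range(n):
        d[k] = H[k, k]
        L[k + 1:, k] = H[k + 1:, k] / d[k]
        H[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], H[k, k + 1:])
    return L, d

H = np.array([[1., -1., 2.], [-1., 3., 0.], [2., 0., 9.]])   # Example 1 above
L, d = ldlt(H)
assert np.allclose(L @ np.diag(d) @ L.T, H) and np.allclose(d, [1, 2, 3])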
17
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 13:
Linear Algebra Issues (cont’d)
Definite & Indefinite Systems
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Definite & Indefinite Systems
Cholesky factorization fails for an indefinite matrix.
Example 1: Negative pivot d_22 < 0.
[ 3  2 ]   [ 1    0 ] [ 3  0    ] [ 1  2/3 ]
[ 2  1 ] = [ 2/3  1 ] [ 0  −1/3 ] [ 0  1   ].
Example 2: d_11 = 0. The elimination cannot start.
[ 0  2 ]
[ 2  5 ] = ???
IPMs:
For the indefinite augmented system
[ −Θ^{-1}  A^T ] [ Δx ]   [ r ]
[ A        0   ] [ Δy ] = [ h ]
one needs to use some special tricks.
For the positive definite normal equations
(AΘA^T) Δy = g
one can compute the Cholesky factorization.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Symmetric Factorization
Two step solution method:
• factorization to LDLT form,
• backsolve to compute direction ∆y.
A symmetric nonsingular matrix H is factorizable if
there exist a diagonal matrix D and a unit lower
triangular matrix L such that H = LDL^T.
A symmetric matrix H is strongly factorizable if for
any permutation matrix P a factorization PHP^T = LDL^T
exists.
A general symmetric indefinite matrix is not
necessarily factorizable.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Factorizing Indefinite Matrix
Two options are possible:
1. Replace the diagonal matrix D with a block-diagonal
one and allow 2 × 2 (indefinite) pivots
[ 0  a ]       [ 0  a ]
[ a  0 ]  and  [ a  d ].
Hence obtain a decomposition H = LDL^T with
block-diagonal D.
2. Regularize the indefinite matrix to produce a
quasidefinite matrix
K = [ −E  A^T ]
    [ A   F   ],
where
E ∈ R^{n×n} is positive definite,
F ∈ R^{m×m} is positive definite, and
A ∈ R^{m×n} has full row rank.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Using 2 × 2 Pivots
Replace the diagonal matrix D with a block-diagonal one
and allow 2 × 2 (indefinite) pivots
[ 0  a ]       [ 0  a ]
[ a  0 ]  and  [ a  d ].
Hence obtain a decomposition H = LDL^T with
block-diagonal D.
Drawback:
One cannot predict when a zero pivot will appear.
Consequently, the pivots have to be chosen dynamically
(one cannot reorder the matrix beforehand and in that
way separate the symbolic and numerical factorization
phases).
This considerably slows down the computations.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Quasidefinite Matrices
A symmetric matrix is called quasidefinite if
K = [ −E  A^T ]
    [ A   F   ],
where E ∈ R^{n×n} and F ∈ R^{m×m} are positive definite,
and A ∈ R^{m×n} has full row rank.
A symmetric nonsingular matrix K is factorizable if
there exist a diagonal matrix D and a unit lower
triangular matrix L such that K = LDL^T.
The symmetric matrix K is strongly factorizable if for
any permutation matrix P a factorization PKP^T = LDL^T
exists.
Vanderbei (1995) proved that
symmetric quasidefinite matrices are strongly factorizable.
For any quasidefinite matrix there exists a
Cholesky-like factorization
K = LDL^T,
where D is diagonal but not positive definite:
it has n negative pivots and m positive pivots.
6
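A numerical illustration of Vanderbei's result (a NumPy sketch, not part of the original slides; the naive in-place elimination is ours):

import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
E = (lambda P: P @ P.T + np.eye(n))(rng.standard_normal((n, n)))
F = (lambda P: P @ P.T + np.eye(m))(rng.standard_normal((m, m)))
A = rng.standard_normal((m, n))
K = np.block([[-E, A.T], [A, F]])

# LDL^T with pivots taken in the given order: no pivoting is needed
H = K.copy(); N = n + m; d = np.zeros(N)
for k in range(N):
    d[k] = H[k, k]
    H[k + 1:, k + 1:] -= np.outer(H[k + 1:, k], H[k, k + 1:]) / d[k]
assert (d[:n] < 0).all() and (d[n:] > 0).all()   # n negative, m positive pivots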
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Quasidefinite Matrix
Def:
An indefinite matrix
H = [ −E  A^T ]
    [ A   F   ],
where
E ∈ R^{n×n} is positive definite, and
F ∈ R^{m×m} is positive definite,
is called quasidefinite.
Lemma 1: If G ∈ R^{n×n} is positive definite and p > 0,
then for any a ∈ R^n the matrix G + (1/p) a a^T is
positive definite.
Proof: Take any x ∈ R^n, x ≠ 0, and observe that
x^T (G + (1/p) a a^T) x = x^T G x + (1/p)(x^T a)^2 > 0.
Lemma 2: If G ∈ R^{n×n} is negative definite and p < 0,
then for any a ∈ R^n the matrix G + (1/p) a a^T is
negative definite.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Factorizing Quasidefinite Mtx
Lemma 3:
A quasidefinite matrix is strongly factorizable.
Sketch of the proof.
We show that the elimination of any diagonal pivot in a
quasidefinite matrix preserves quasidefiniteness in the
Schur complement.
We consider two cases, depending on where the first
pivot comes from (the −E part or the F part).
Suppose the first pivot is from the −E part:
[ p  e^T   u^T   ]
[ e  −E_1  A_1^T ]
[ u  A_1   F     ],
where p < 0 is the pivot element and E_1 is a positive
definite matrix. After the elimination of the pivot, we
obtain the following Schur complement
[ −E_1 − (1/p) e e^T   A_1^T − (1/p) e u^T ]
[ A_1 − (1/p) u e^T    F − (1/p) u u^T     ].   (1)
The matrix −E_1 − (1/p) e e^T is negative definite
(Lecture 12, Lemma 1) and the matrix F − (1/p) u u^T is
positive definite (Lemma 1), so (1) is quasidefinite.
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
From Indefinite to Quasidef.
Indefinite matrix
H = [ −Q − Θ^{-1}  A^T ]
    [ A            0   ].
Vanderbei (1995): replace Ax = b with Ax ≤ b,
      [ −Θ_s^{-1}  0            I   ]
H_V = [ 0          −Q − Θ^{-1}  A^T ]
      [ I          A            0   ],
and eliminate Θ_s^{-1}:
K = [ −Q − Θ^{-1}  A^T ]
    [ A            Θ_s ].
Saunders (1996):
H_S = [ −Q − Θ^{-1}  A^T ]   [ −γ^2 I_n  0        ]
      [ A            0   ] + [ 0         δ^2 I_m  ],
where γ, δ ≥ √ε = 10^{-8}.
Altman & Gondzio (1999): use dynamic regularization
H = [ −Q − Θ^{-1}  A^T ]   [ −R_p  0   ]
    [ A            0   ] + [ 0     R_d ],
where R_p ∈ R^{n×n} and R_d ∈ R^{m×m} are primal and
dual regularizations.
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
From Indefinite to Quasidef.
The indefinite matrix
H = [ −Q − Θ^{-1}  A^T ]
    [ A            0   ]
arising in IPMs can be converted to a quasidefinite
one. Regularize the indefinite matrix to produce a
quasidefinite matrix, using the dynamic regularization
H = [ −Q − Θ^{-1}  A^T ]   [ −R_p  0   ]
    [ A            0   ] + [ 0     R_d ],
where
R_p ∈ R^{n×n} is a primal regularization,
R_d ∈ R^{m×m} is a dual regularization.
For any quasidefinite matrix there exists a
Cholesky-like factorization
H = LDL^T,
where D is diagonal but not positive definite:
it has n negative pivots and m positive pivots.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Primal Regularization
Primal barrier problem:
min  z_P = c^T x + (1/2) x^T Q x − µ Σ_{j=1}^n (ln x_j + ln s_j)
s.t. Ax = b,
     x + s = u,
     x, s > 0.
[ −Q − Θ^{-1}  A^T ] [ Δx ]   [ f ]
[ A            0   ] [ Δy ] = [ h ].
Primal regularized barrier problem:
min  z_P + (1/2)(x − x_0)^T R_p (x − x_0)
s.t. Ax = b,
     x + s = u,
     x, s > 0.
[ −Q − Θ^{-1} − R_p  A^T ] [ Δx ]   [ f′ ]
[ A                  0   ] [ Δy ] = [ h  ],
where
f′ = f − R_p (x − x_0).
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dual Regularization
Dual barrier problem:
max  z_D = b^T y − u^T w − (1/2) x^T Q x + µ Σ_{j=1}^n (ln z_j + ln w_j)
s.t. A^T y + z − w − Qx = c,
     x ≥ 0,  z, w > 0.
[ −Q − Θ^{-1}  A^T ] [ Δx ]   [ f ]
[ A            0   ] [ Δy ] = [ h ].
Dual regularized barrier problem:
max  z_D − (1/2)(y − y_0)^T R_d (y − y_0)
s.t. A^T y + z − w − Qx = c,
     x ≥ 0,  z, w > 0.
[ −Q − Θ^{-1}  A^T ] [ Δx ]   [ f  ]
[ A            R_d ] [ Δy ] = [ h′ ],
where
h′ = h − R_d (y − y_0).
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Large Problems are Sparse
Large problems are always sparse.
Linear programs are perfect examples of that.
Exploiting sparsity in computations leads to huge
savings. Exploiting sparsity mainly means avoiding
useless computations: the computations for which the
result is known, as for example multiplications by zero.
Example: Consider the multiplication
     [ 2  1  0   4  0   0 ] [  2 ]
Ax = [ 0  2  0  −1  5  −1 ] [  0 ]
     [ 3  0  3   8  0   5 ] [  5 ]
                            [  0 ]
                            [  0 ]
                            [ −2 ].
It requires computing
2 · A.1 + 5 · A.3 − 2 · A.6
and involves only five multiplications and five
additions. We say that this matrix-vector
multiplication needs 5 flops.
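The same idea in compressed-column storage (a SciPy sketch, not part of the original slides; the explicit loop mirrors the column combination above):

import numpy as np
from scipy.sparse import csc_matrix

A = csc_matrix(np.array([[2, 1, 0, 4, 0, 0],
                         [0, 2, 0, -1, 5, -1],
                         [3, 0, 3, 8, 0, 5]], dtype=float))
x = np.array([2., 0., 5., 0., 0., -2.])

y = np.zeros(A.shape[0])
for j in np.nonzero(x)[0]:              # only columns 1, 3 and 6 contribute
    start, end = A.indptr[j], A.indptr[j + 1]
    y[A.indices[start:end]] += x[j] * A.data[start:end]
assert np.allclose(y, A @ x)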
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Sparsity Issues in QP
Example
    [ 1  1          ]
    [ 1  2  1       ]
M = [    1  2  1    ]
    [       1  2  1 ]
    [          1  2 ]
factorizes into two bidiagonal factors, M = L L^T:
    [ 1             ] [ 1  1          ]
    [ 1  1          ] [    1  1       ]
M = [    1  1       ] [       1  1    ]
    [       1  1    ] [          1  1 ]
    [          1  1 ] [             1 ],
hence its inverse is the product of two alternating-sign
triangular matrices,
         [ 1 −1  1 −1  1 ] [  1             ]
         [    1 −1  1 −1 ] [ −1  1          ]
M^{-1} = [       1 −1  1 ] [  1 −1  1       ]
         [          1 −1 ] [ −1  1 −1  1    ]
         [             1 ] [  1 −1  1 −1  1 ]
         [  5 −4  3 −2  1 ]
         [ −4  4 −3  2 −1 ]
       = [  3 −3  3 −2  1 ]
         [ −2  2 −2  2 −1 ]
         [  1 −1  1 −1  1 ].
Conclusion:
the inverse of a sparse matrix may be dense.
IPMs for QP:
Do not explicitly invert the matrix Q + Θ^{-1}
in the matrix A(Q + Θ^{-1})^{-1} A^T.
Use the augmented system instead.
14
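A quick check of this conclusion (a NumPy sketch, not part of the original slides):

import numpy as np

M = np.diag([1., 2., 2., 2., 2.]) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(np.count_nonzero(np.linalg.inv(M)))   # 25: every entry of the inverse is nonzero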
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs: LP vs QP
Augmented system in LP:
[ −Θ^{-1}  A^T ] [ Δx ]   [ ξ_d − X^{-1}ξ_µ ]
[ A        0   ] [ Δy ] = [ ξ_p             ].
Eliminate Δx from the first equation and get the
normal equations
(AΘA^T) Δy = g.
Augmented system in QP:
[ −Q − Θ^{-1}  A^T ] [ Δx ]   [ ξ_d − X^{-1}ξ_µ ]
[ A            0   ] [ Δy ] = [ ξ_p             ].
Eliminate Δx from the first equation and get the
normal equations
(A(Q + Θ^{-1})^{-1} A^T) Δy = g.
One can use the normal equations in LP, but not in QP.
The normal equations in QP may become almost completely
dense even for sparse matrices A and Q. Thus, in QP,
the indefinite augmented system form is usually used.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Quasidefinite System in NLP
Lemma 4:
If f : R^n → R is strictly convex, g : R^n → R^m is
convex, both f and g are twice differentiable, and A(x)
has full row rank for any x, then the augmented system
matrix
H = [ Q(x, y)  A(x)^T    ]
    [ A(x)     −Z Y^{-1} ]
is quasidefinite for any x and any z, y > 0.
Proof:
From the strict convexity of f and the convexity of g,
the matrix Q(x, y) is positive definite. Since
z, y > 0, the matrix Z Y^{-1} is also positive definite.
Hence H is quasidefinite.
Remark 1:
Note that if f is convex but not strictly convex, then
a regularization is needed to make
Q̄(x, y) = Q(x, y) + R_p positive definite.
Remark 2:
The assumption that A(x) has full row rank for any x is
an example of a regularity condition.
16
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 14:
Linear Algebra Issues (cont’d)
Exploiting Sparsity
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Large Problems are Sparse
Large problems are always sparse.
Linear programs are perfect examples of that.
Exploiting sparsity in computations leads to huge
savings. Exploiting sparsity mainly means avoiding
useless computations: the computations for which the
result is known, as for example multiplications by zero.
Example: Consider the multiplication
     [ 2  1  0   4  0   0 ] [  2 ]
Ax = [ 0  2  0  −1  5  −1 ] [  0 ]
     [ 3  0  3   8  0   5 ] [  5 ]
                            [  0 ]
                            [  0 ]
                            [ −2 ].
It requires computing
2 · A.1 + 5 · A.3 − 2 · A.6
and involves only five multiplications and five
additions. We say that this matrix-vector
multiplication needs 5 flops.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Sparse Systems of Equations
A dense system of linear equations,
 x_1 + x_2 + x_3       + x_5  = 11
2x_1 − x_2 + x_3 − x_4 + 2x_5 = 9
3x_1 − x_2 − x_3 − x_4 + 2x_5 = 4
 x_1 − x_2 + 3x_3 + x_4 − x_5 = 7
2x_1       + x_3 − x_4 + 7x_5 = 36,
needs GE, i.e., computing the LU decomposition of the
matrix involved.
What do you do for a sparse system of linear equations?
x_1 + x_2 + x_3        + x_5 = 11
      x_2 − x_3              = −1
x_1       + x_3 + x_4  + x_5 = 13
      x_2 + 2x_3             = 8
                 4x_4 − 2x_5 = 6
Use common sense:
• try to simplify,
• solve for "easy" variables.
You might be unaware of it... but you exploit sparsity.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
LU factors of Sparse Matrix
The best pivot candidates (in terms of sparsity) are
singleton rows/columns.
Example (6×6 sparsity patterns; x = nonzero, p = pivot):
[pattern with columns ordered 1 2 3 4 5 6; the pivot p
sits in position (5,2), a singleton row]
pivot: a_52,  rows: 5−1,  columns: 2−1
[pattern with columns reordered 2 1 3 4 5 6; pivot p in
position (6,4)]
pivot: a_64,  rows: 6−2,  columns: 4−1
[pattern with columns reordered 2 4 3 1 5 6; pivot p in
position (1,1)]
pivot: a_11,  rows: 1−3,  columns: 1−3
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
LU factors of Sparse Matrix
[pattern with columns reordered 2 4 1 3 5 6; pivot p in
position (3,5)]
pivot: a_35,  rows: 3−4,  columns: 5−3
[pattern with columns reordered 2 4 1 5 3 6; pivot p in
position (2,3)]
pivot: a_23,  rows: 2−4,  columns: OK
[final pattern with rows ordered 5, 6, 1, 3, 2, 4 and
columns 2 4 1 5 3 6]
triangular matrix
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Sparse Equation Systems
The original linear system was
 x_1 + 2x_2       − x_4        = 1
       x_2 + x_3  − x_4 + x_5  = 6
3x_1 + 3x_2             − x_5  = 4
             x_3        + 2x_6 = 15
      2x_2                     = 4
       x_2        + x_4        = 6.
One should start from equation 5, then 6, etc.
The reordered linear system is
2x_2                           = 4
 x_2 + x_4                     = 6
2x_2 − x_4 + x_1               = 1
3x_2       + 3x_1 − x_5        = 4
 x_2 − x_4        + x_5 + x_3  = 6
                    x_3 + 2x_6 = 15,
and the corresponding sparse matrix (rows ordered
5, 6, 1, 3, 2, 4 and columns 2, 4, 1, 5, 3, 6) is
lower triangular.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
General Sparse Systems
A single step of Gaussian Elimination applied to
A = [ p  v^T ]
    [ u  A_1 ]
produces the following Schur complement
A_1 − p^{-1} u v^T.
The effect on the sparsity pattern (x = nonzero,
p = pivot, f = fill-in):
[8×8 pattern before the elimination: row 1 and column 1
hold the pivot p and several nonzeros x]
[8×8 pattern after the elimination: fill-in entries f
appear in every row and column that had a nonzero in
the pivot column and row, respectively]
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Markowitz Pivot Choice
Let r_i and c_i, i = 1,2,...,n, be the numbers of
nonzero entries in row and column i, respectively.
The elimination of the pivot a_ij needs
f_ij = (r_i − 1)(c_j − 1)
flops to be made. This step creates at most f_ij new
nonzero entries in the Schur complement.
Markowitz: choose the pivot with min_{i,j} f_ij.
[8×8 sparsity patterns before and after elimination of
the pivot p chosen in row 6, which has few nonzeros;
only two fill-in entries f are created]
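The Markowitz rule in code (a NumPy sketch, not part of the original slides; the helper name markowitz_pivot is ours):

import numpy as np

def markowitz_pivot(pattern):
    # return (i, j) minimizing (r_i - 1)(c_j - 1) over the nonzeros
    # of a boolean sparsity pattern (ties broken arbitrarily)
    r = pattern.sum(axis=1)             # nonzeros per row
    c = pattern.sum(axis=0)             # nonzeros per column
    best, arg = None, None
    for i, j in zip(*np.nonzero(pattern)):
        f = (r[i] - 1) * (c[j] - 1)     # Markowitz merit: bound on fill-in
        if best is None or f < best:
            best, arg = f, (i, j)
    return arg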
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Sparsity in Cholesky Factors
Example: Consider a symmetric 4 × 4 matrix
    [ x  x  x  x ]
H = [ x  x       ]
    [ x     x    ]
    [ x        x ],
where x denotes a nonzero entry and empty spaces denote
zeros. Direct application of the Cholesky factorization
would produce a completely dense lower triangular factor
    [ x          ]
L = [ x  x       ]
    [ x  x  x    ]
    [ x  x  x  x ].
However, it suffices to reorder (symmetrically) the rows
and columns of H,
        [ x           x ]
PHP^T = [    x        x ]
        [       x     x ]
        [ x  x  x     x ],
to obtain a sparse Cholesky factor
    [ x          ]
L = [    x       ]
    [       x    ]
    [ x  x  x  x ].
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Minimum Degree Ordering
How to permute the rows and columns of H to get the
sparsest possible Cholesky factor?
This is a difficult problem, but heuristics can help.
Example: Consider a symmetric 6×6 matrix H whose first
row and column contain several nonzeros, while row 2
contains only two.
[6×6 sparsity pattern of H]
Suppose h_11 is the first pivot: the elimination
creates fill-in f in every row and column coupled to
row/column 1.
[pattern after eliminating h_11, with fill-in f]
Suppose instead h_22 is the first pivot: swap rows 1
and 2 and columns 1 and 2; no fill-in is created.
[pattern after swapping rows/columns 1 and 2, with
pivot p = h_22]
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Minimum Degree Ordering
Recall the Markowitz pivot choice:
Let r_i and c_i, i = 1,2,...,n, be the numbers of
nonzero entries in row and column i, respectively.
The elimination of the pivot a_ij needs
f_ij = (r_i − 1)(c_j − 1)
flops to be made.
Choose the pivot with min_{i,j} f_ij.
[8×8 sparsity pattern with the pivot p chosen in row 6
and the fill-in f it creates]
In the symmetric positive definite case:
pivots are chosen from the diagonal and r_i = c_i,
hence choose the pivot with min_i r_i.
Minimum degree ordering: choose the diagonal element
with the minimum number of nonzeros in its row.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Nested Dissection
Remove a few nodes to disconnect the graph.
Consider a matrix H with the 11×11 sparsity pattern
below and its adjacency graph on the nodes 1..11.
[11×11 sparsity pattern of H]
[adjacency graph of H on nodes 1..11; nodes 4 and 7
separate the graph into independent pieces]
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Nested Dissection (cont’d)
Remove (permute to the end) nodes 4 and 7.
[reordered 11×11 sparsity pattern of PHP^T, with rows
and columns ordered 1 2 3 5 6 8 9 10 11 4 7]
[the corresponding graph: with nodes 4 and 7 removed,
the remaining nodes split into disconnected components]
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Linear Algebra of IPMs
Cholesky factorization
LDL^T = AΘA^T.
It involves a preparation step:
• minimum degree ordering
(reduces the number of nonzeros of L);
• symbolic factorization
(predicts the sparsity structure of L).
Computational complexity of the different steps:
• minimum degree ordering:  O(Σ_i n_i^2)
• numerical factorization:  O(Σ_i n_i^2)
• symbolic factorization:   O(Σ_i n_i)
• backsolve:                O(Σ_i n_i)
where n_i is the number of nonzero entries in column i
of L (the column L.i).
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Linear Algebra: SM vs IPM
Suppose an LP of dimension m × n is solved.
Iterations to reach the optimum:
                 Theory          Practice
Simplex Method   nonpolynomial   O(m + n)
IPM              O(√n)           O(log_10 n)
But...
one iteration of the simplex method is usually
significantly less expensive.
The simplex method solves an equation with the basis
matrix:
[ B  N       ] [ x_B ]   [ b ]
[ 0  I_{n−m} ] [ x_N ] = [ 0 ],
which reduces to
B x_B = b.
The IPM solves an equation with the matrix AΘA^T:
(AΘA^T) Δy = g.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
References:
Gondzio
Implementing Cholesky Factorization for Interior
Point Methods of Linear Programming, Opti-
mization, 27 (1993) pp. 121–140.
Andersen, Gondzio, Meszaros and Xu
Implementation of Interior Point Methods for
Large Scale Linear Programming, in: Interior
Point Methods in Mathematical Programming,
T Terlaky (ed.), Kluwer Academic Publishers,
1996, pp. 189–252.
Altman and Gondzio
Regularized Symmetric Indefinite Systems in Interior
Point Methods for Linear and Quadratic Optimization,
Optimization Methods and Software, 11-12 (1999),
pp. 275–302.
16
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 15:
Tree Sparse Problems
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Block-Angular LPs
Primal Block-Angular Structure
    [ A_1                          ]
    [      A_2                     ]
A = [           ...                ]
    [                A_n           ]
    [ B_1  B_2  ...  B_n  B_{n+1}  ].
Dual Block-Angular Structure
    [ A_1            C_1 ]
A = [      A_2       C_2 ]
    [           ...  ... ]
    [           A_n  C_n ].
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Primal block-angular structure
    [ A_1                          ]
    [      A_2                     ]
A = [           ...                ]
    [                A_n           ]
    [ B_1  B_2  ...  B_n  B_{n+1}  ].
Normal-equations matrix:
       [ A_1 A_1^T                             A_1 B_1^T ]
       [            A_2 A_2^T                  A_2 B_2^T ]
AA^T = [                       ...             ...       ]
       [                       A_n A_n^T       A_n B_n^T ]
       [ B_1 A_1^T  B_2 A_2^T  ...  B_n A_n^T  C         ],
where
C = Σ_{i=1}^{n+1} B_i B_i^T.
Implicit inverse:
A_i A_i^T = L_i L_i^T,
B̃_i = B_i A_i^T L_i^{-T},
S = C − Σ_{i=1}^n B̃_i B̃_i^T = L_S L_S^T.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dual block-angular structure
    [ A_1            C_1 ]
A = [      A_2       C_2 ]
    [           ...  ... ]
    [           A_n  C_n ].
Normal-equations matrix:
       [ A_1 A_1^T                ]
AA^T = [            A_2 A_2^T     ] + CC^T,
       [                  ...     ]
       [                A_n A_n^T ]
where C ∈ R^{m×k} defines a rank-k corrector.
Implicit inverse (Sherman-Morrison-Woodbury):
A_i A_i^T = L_i L_i^T,
L L^T = diag(A_i A_i^T) = diag(L_i L_i^T),
C̃_i = L_i^{-1} C_i,
S = I_k + Σ_{i=1}^n C̃_i^T C̃_i = L_S L_S^T,
(AA^T)^{-1} = (LL^T + CC^T)^{-1}
            = (LL^T)^{-1} − (LL^T)^{-1} C S^{-1} C^T (LL^T)^{-1}.
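The Sherman-Morrison-Woodbury solve in code (a SciPy sketch, not part of the original slides; the helper name smw_solve and the random data are ours):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def smw_solve(LLt_factor, C, rhs):
    # solve (L L^T + C C^T) x = rhs; LLt_factor is a cho_factor of L L^T
    inv = lambda v: cho_solve(LLt_factor, v)
    S = np.eye(C.shape[1]) + C.T @ inv(C)        # small k x k Schur complement
    y = inv(rhs)
    return y - inv(C @ np.linalg.solve(S, C.T @ y))

rng = np.random.default_rng(4); m, k = 8, 2
D = (lambda P: P @ P.T + np.eye(m))(rng.standard_normal((m, m)))
C = rng.standard_normal((m, k)); b = rng.standard_normal(m)
x = smw_solve(cho_factor(D), C, b)
assert np.allclose((D + C @ C.T) @ x, b)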
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Stochastic LP with recourse
The two-stage stochastic program:
min_{x∈X}  c^T x + E_ξ { q^T y(ξ) }
s.t.       T(ξ) x + W y(ξ) = h(ξ),
           x ≥ 0,  y(ξ) ≥ 0,  ∀ξ ∈ Ξ.
Assume the random data has a joint finite discrete
distribution {(ξ_k, p_k), k = 1..N} with Σ_k p_k = 1.
We obtain the stochastic program with fixed recourse:
min_{x∈X}  c^T x + Σ_{k=1}^N p_k q_k^T y_k
s.t.       T(ξ_k) x + W y_k = h(ξ_k),  k = 1..N,
           x ≥ 0,  y_k ≥ 0,  k = 1..N.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Stochastic LPs (cont’d)
The deterministic equivalent formulation:
min_{x∈X}  c^T x + p_1 q_1^T y_1 + p_2 q_2^T y_2 + ... + p_N q_N^T y_N
s.t.       T_1 x + W y_1                       = h_1
           T_2 x         + W y_2               = h_2
           ...                   ...           ...
           T_N x                       + W y_N = h_N
           x ≥ 0,  y_1 ≥ 0,  y_2 ≥ 0,  ...,  y_N ≥ 0.
This exhibits a dual block-angular structure.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Multi-stage Stoch. Programs
[Figure: a three-period event tree with nodes 1..7
branching into Scenario 1..4 across Period 1, Period 2
and Period 3.]
[Figure: the corresponding structured constraint
matrix.]
A symmetric event tree with p realizations at each node
and T + 1 periods corresponds to p^T scenarios.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Real-life Stochastic Programs
[Figure: the constraint matrix of a multistage
stochastic linear program.]
[Figure: the reordered matrix of the multistage SLP.]
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Exploiting Structure in IPMs
Interior Point Methods:
• are well-suited to large-scale optimization
• can take advantage of the parallelism
Large problems are “structured”:
• partial separability
• spatial distribution
• dynamics
• uncertainty
• etc.
Object-Oriented Parallel Solver (OOPS)
http://www.maths.ed.ac.uk/~gondzio/
parallel/solver.html
• Exploits structure
• Runs in parallel
• Solves problems with millions of variables
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Data Structure
Linear Algebra Module:
• Given A, x and Θ, compute Ax, A^T x and AΘA^T.
• Given AΘA^T, compute the Cholesky factorization
AΘA^T = L L^T.
• Given L and r, solve Lx = r and L^T x = r.
Common choice: A single data structure to com-
pute the general sparse operations
How can we deal with block-structures?
Blocks may be
• general sparse matrices
• dense matrices
• very particular matrices
(identity, projection, GUB)
• block-structured
It is worth considering many data structures.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
MDO for Sparse Matrix
[6×6 sparsity pattern of H: row 1 contains four
nonzeros, row 2 only two]
Pivot h_11:
[pattern after eliminating h_11: fill-in entries f
appear in every row and column coupled to row/column 1]
Pivot h_22:
[pattern after choosing h_22 (minimum degree) instead:
no fill-in is created]
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
MDO for Block-Sparse Matrix
[Figure: the same minimum degree strategy applied at
the block level: choosing pivot block H_11 versus pivot
block H_22.]
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Primal block-angular
[Figure: sparsity patterns of Q, A and A^T and of the
augmented system matrix H for a primal block-angular
problem.]
Reorder blocks: {1,3; 2,4; 5}.
[Figure: pattern of the reordered matrix PHP^T.]
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dual block-angular
[Figure: sparsity patterns of Q, A and A^T and of the
augmented system matrix H for a dual block-angular
problem.]
Reorder blocks: {1,4; 2,5; 3}.
[Figure: pattern of the reordered matrix PHP^T.]
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Block Tree Structure
Structured Matrix:
[Figure: a nested block-structured matrix built of the
blocks Q_10, Q_11, Q_12, Q_20, ..., Q_32, D_10, ...,
D_30, B_11, ..., B_23, C_31 and C_32.]
Associated Tree:
[Figure: the block elimination tree of the matrix; the
root is a dual block-angular structure whose children
D_1 and D_2 are primal block-angular structures with
leaves D_11, D_12, D_10, B_11, B_12 and D_21, D_22,
D_23, D_20, B_21, B_22, B_23, plus the border blocks
C_31 and C_32.]
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Reordered Augmented Matrix
[Figure: the augmented system matrix of the block tree
structure above and its reordered form; the reordering
groups the diagonal blocks D and Q first and moves the
border blocks B, C̃ and Q̃ to the bottom and right
borders.]
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
OOPS
• Every node in the block elimination tree
has its own linear algebra implementation
(depending on its type)
• Each implementation is a realisation of an
abstract linear algebra interface.
• Different implementations are available for
different structures
[Figure: the abstract Matrix interface (Factorize,
SolveL, SolveLt, y = Mx, y = M^T x) and its
implementations: general sparse linear algebra
(SparseMatrix), general dense linear algebra
(DenseMatrix), and implicit factorizations for the
structured classes PrimalBlockAng, DualBlockAng,
BorderedBlockDiag and RankCorrector.]
⇒ Rebuild block elimination tree
with matrix interface structures
17
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
References:
Gondzio and Sarkissian
Parallel Interior Point Solver for Structured Lin-
ear Programs, Mathematical Programming 96
(2003) 561-584.
Gondzio and Grothey
Reoptimization with the Primal-Dual Interior
Point Method, SIAM J. on Optimization 13
(2003) 842-864.
Gondzio and Grothey
Direct Solution of Linear Systems of Size 109
Arising in Optimization with Interior Point Meth-
ods, in: Parallel Processing and Applied Math-
ematics, R. Wyrzykowski et al. (eds), Lecture
Notes in Computer Science 3911 (2006) pp. 513–
525
Gondzio and Grothey
A New Unblocking Technique to Warmstart Inte-
rior Point Methods based on Sensitivity Analysis,
SIAM J. on Optimization (accepted).
18
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 16:
Markowitz Portfolio
Optimization
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
The Markowitz Model
The material given in this lecture comes from the book
of R. J. Vanderbei: Linear Programming: Foundations
and Extensions, Kluwer, Boston, 1997.
Suppose a collection of potential investments
I = {1,2,...,n} is given. Let R_i denote the return in
the next time period on investment i ∈ I.
R_i is a random variable.
A portfolio is a vector x ∈ R^n whose component x_i,
i ∈ I, specifies the fraction of the assets put into
investment i. The portfolio satisfies Σ_{i∈I} x_i = 1
and has the return
R = Σ_{i∈I} x_i R_i,
which is uncertain. The reward associated with such a
portfolio is the expected return:
E R = Σ_{i∈I} x_i E R_i.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
The Markowitz Model
There is a risk associated with this investment.
It is measured by the variance of the return:
Var(R) = E (R − E R)^2
       = E ( Σ_{i∈I} x_i R_i − E ( Σ_{i∈I} x_i R_i ) )^2
       = E ( Σ_{i∈I} x_i R_i − Σ_{i∈I} x_i E R_i )^2
       = E ( Σ_{i∈I} x_i (R_i − E R_i) )^2
       = E ( Σ_{i∈I} x_i R̄_i )^2,
where
R̄_i = R_i − E R_i.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
The Markowitz Model
The investor has two major concerns:
• maximizing the reward E R; and
• minimizing the associated risk Var(R).
These objectives usually contradict each other.
There is no chance to increase the reward without
exposing yourself to a higher risk.
Conversely, if you don't want to take more risk, you
have to accept a lower reward.
In the Markowitz model (1959), these two objectives
are combined and a quadratic optimization problem is
solved:
min  −Σ_{i∈I} x_i E R_i + λ E ( Σ_{i∈I} x_i R̄_i )^2
s.t. Σ_{i∈I} x_i = 1,
     x_i ≥ 0,
where λ is a positive parameter that represents the
investor's attitude to risk. The higher the value of
λ, the less risk the investor is ready to incur.
Harry Markowitz received the 1990 Nobel prize for this
work.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
The Markowitz Model (cont’d)
The quadratic program has an equivalent formulation
min  c^T x + λ x^T Q x
s.t. e^T x = 1,
     x ≥ 0,
where c ∈ R^n with c_i = −E R_i is the negated expected
return of asset i, and Q ∈ R^{n×n} with
Q_ij = E (R̄_i R̄_j) is the covariance of R_i and R_j.
Although this QP has only one linear constraint, its
solution for larger n is not trivial because the matrix
Q is often completely dense.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Estimates of IER and Var(R)
The Markowitz model requires knowledge of the joint
distribution of the variables R_i.
These distributions are not known but may be estimated
using historical data. Suppose, for simplicity, that
only four classes of assets are considered for possible
investments: long-term government bonds (LTB), S&P 500,
NASDAQ Composite and Gold.
The historical data for the last 10 years for these
assets may look as follows:
Year   LTB    S&P    NASDAQ  Gold
1990   1.084  0.961  0.824   0.925
1991   1.054  1.302  1.605   0.957
1992   1.072  1.056  1.065   1.064
1993   1.107  1.106  0.978   1.123
1994   1.047  1.024  0.952   0.992
1995   1.074  1.109  1.205   0.971
1996   1.036  1.063  1.453   1.063
1997   1.046  0.934  1.056   1.087
1998   1.051  0.928  1.105   1.073
1999   1.071  1.314  1.513   0.945
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Estimate of IER
A way to estimate E R_i is to take the average of the
historical returns:
E R_i = (1/T) Σ_{t=1}^T R_i(t).
However, this is a very poor estimate. If in two
subsequent years the returns were 2.0 and 0.5,
respectively, then this estimate would give
E R_i = (2.0 + 0.5)/2 = 1.25, although an investment in
this particular asset made two years ago would have had
the return 2.0 · 0.5 = 1.
A better way is to use the geometric mean:
E R_i = ( Π_{t=1}^T R_i(t) )^{1/T}.
By taking logarithms of both sides, we obtain:
log E R_i = (1/T) Σ_{t=1}^T log R_i(t).
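Both estimates on the LTB column of the table above (a NumPy sketch, not part of the original slides):

import numpy as np

ltb = np.array([1.084, 1.054, 1.072, 1.107, 1.047,
                1.074, 1.036, 1.046, 1.051, 1.071])

arithmetic = ltb.mean()
geometric = np.exp(np.log(ltb).mean())      # equals (prod ltb)**(1/T)
print(arithmetic, geometric)                # the geometric mean never exceeds the arithmetic one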
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Estimate of IER (cont’d)
Additionally, one should pay more attention to the most
recent data. Indeed, the 1999 return is more likely to
repeat in 2000 than the 1990 one. Use a discounted sum
of logarithms:
log E R_i = ( 1 / Σ_{t=1}^T s^{T−t} ) Σ_{t=1}^T s^{T−t} log R_i(t),
i.e., weight observation t with the discount factor
s(t) = s^{T−t} / Σ_{t=1}^T s^{T−t},  t = 1,2,...,T.
Indeed, £1 today is worth more than £1 next year.
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Estimate of Var(R)
Recall that
Var(R) = E ( Σ_{i∈I} x_i R̄_i )^2,
where
R̄_i = R_i − E R_i.
Thus Var(R) is a quadratic function of x:
Var(R) = x^T Q x,
where
Q_ij = E (R̄_i R̄_j).
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Estimate of Var(R) (cont’d)
Suppose a discounted sum is used to estimate E R_i,
i.e.,
E R_i = Σ_{t=1}^T s(t) R_i(t),
where
s(t) = s^{T−t} / Σ_{t=1}^T s^{T−t},  t = 1,2,...,T.
Analogously, we use a discounted sum to estimate
Var(R), i.e.,
Var(R) = E ( Σ_{i∈I} x_i R̄_i )^2
       = Σ_{t=1}^T s(t) ( Σ_{i∈I} x_i R̄_i(t) )^2,
where
R̄_i(t) = R_i(t) − E R_i = R_i(t) − Σ_{t=1}^T s(t) R_i(t).
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
How to represent Var(R) ?
There are two ways of handling the double summation in
the definition of Var(R):
Var(R) = Σ_{t=1}^T s(t) ( Σ_{i∈I} x_i R̄_i(t) )^2.
Possibility 1.
Write:
Var(R) = x^T Q x,
where the matrix Q ∈ R^{n×n} has the coefficients
Q_ij = Σ_{t=1}^T s(t) R̄_i(t) R̄_j(t),  i, j = 1,2,...,n.
This gives a dense matrix Q and a very difficult
nonseparable QP problem to solve.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Another representation Var(R)
Possibility 2.
Write:
Var(R) = u^T D u,
where the diagonal matrix D ∈ R^{T×T} has the
coefficients
D_tt = s(t),  t = 1,2,...,T,
and u ∈ R^T is defined as follows:
u_t = Σ_{i∈I} x_i R̄_i(t),  t = 1,2,...,T.
We rewrite the last equation in the form
F x = u,
where the matrix F ∈ R^{T×n} is built of the
coefficients
f_ti = R̄_i(t),  t = 1,...,T,  i = 1,...,n.
This formulation leads to a separable QP that is easier
to solve with an IPM.
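Building the data of both representations from a return history, and checking they agree (a NumPy sketch, not part of the original slides; the helper name separable_qp_data is ours):

import numpy as np

def separable_qp_data(R, s):
    # c, F and D of formulation (2) from a T x n return matrix R
    # and discount weights s (s >= 0, sum(s) == 1)
    mu = s @ R                    # discounted estimate of E R_i, one per asset
    c = -mu                       # expected-return part of the objective
    F = R - mu                    # f_ti = R_i(t) - E R_i  (centered returns)
    D = np.diag(s)                # Var(R) = u^T D u with u = F x
    return c, F, D

rng = np.random.default_rng(5); T, n = 10, 4
R = 1 + 0.1 * rng.standard_normal((T, n))
s = np.full(T, 1.0 / T)
c, F, D = separable_qp_data(R, s)
x = rng.dirichlet(np.ones(n))            # a feasible portfolio, e^T x = 1
Q = F.T @ D @ F                          # the dense Q of Possibility 1
assert np.isclose(x @ Q @ x, (F @ x) @ D @ (F @ x))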
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs and the Markowitz Model
The original Markowitz quadratic program with the
matrix Q ∈ R^{n×n},
min  c^T x + (1/2) x^T Q x
s.t. e^T x = 1,                     (1)
     x ≥ 0,
can thus be replaced by the following equivalent
separable one:
min  c^T x + (1/2) u^T D u
s.t. e^T x = 1,                     (2)
     F x − u = 0,
     x ≥ 0.
Suppose the number of assets n is much larger than the
number of time periods T:
T ≪ n.
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs/M. Model (details)
The primal barrier QP:
min  c^T x + (1/2) u^T D u − µ Σ_{j=1}^n ln x_j
s.t. e^T x = 1,
     F x − u = 0.
Lagrangian:
L(x, u, y, z, µ) = c^T x + (1/2) u^T D u − y (e^T x − 1)
                   − z^T (F x − u) − µ Σ_{j=1}^n ln x_j,
where y ∈ R and z ∈ R^T are the dual variables
associated with the linear constraints e^T x = 1 and
F x − u = 0, respectively.
The conditions for a stationary point:
 ∇_x L = c − y e − F^T z − µ X^{-1} e = 0,
 ∇_u L = D u + z = 0,
−∇_y L = e^T x − 1 = 0,
−∇_z L = F x − u = 0.
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs/M. Model (details)
We denote
s = µ X^{-1} e,  i.e.  X S e = µ e,
and get the first order optimality conditions:
e^T x = 1,
F x − u = 0,
F^T z + e y + s = c,
D u + z = 0,
X S e = µ e.
We rewrite the FOC as
[ e^T  0  ] [ x ]   [ 1 ]
[ F    −I ] [ u ] = [ 0 ],
[ e  F^T ] [ y ]   [ s ]   [ 0  0 ] [ x ]   [ c ]
[ 0  −I  ] [ z ] + [ 0 ] − [ 0  D ] [ u ] = [ 0 ],
X S e = µ e.
Finally, we derive the Newton equation system and
reduce it to the following form:
[ −Θ^{-1}  0   e  F^T ] [ Δx ]   [ r_x ]
[ 0        −D  0  −I  ] [ Δu ] = [ r_u ]
[ e^T      0   0  0   ] [ Δy ]   [ r_y ]
[ F        −I  0  0   ] [ Δz ]   [ r_z ],
where Θ^{-1} = X^{-1} S.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs: separable form of MM
Although problem (2) has n + T variables (x, u) and
T + 1 linear constraints (while (1) had n variables and
only one constraint), for small T, (2) is usually much
easier to solve than (1).
Indeed, in this case, the Newton equation system can be
reduced to the following normal equations system
( A (D + Θ^{-1})^{-1} A^T ) Δy = g,
where
A = [ e^T  0  ] ∈ R^{(T+1)×(n+T)},
    [ F    −I ]
D + Θ^{-1} = [ 0  0 ]   [ Θ^{-1}  0 ]
             [ 0  D ] + [ 0       0 ] ∈ R^{(n+T)×(n+T)}.
Observe that D + Θ^{-1} is a diagonal matrix.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs: separable form of MM
Summing up, when (2) is solved, we deal with a normal
equations system of dimension T + 1.
The augmented system corresponding to the nonseparable
formulation (1) would be a completely dense indefinite
system of dimension n + 1. Since T ≪ n, the separable
QP is much easier to solve.
17
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Solution of the M. Model
Optimal portfolios for different λ:
λ        LTB    S&P    NASDAQ  Gold
0.0      0.020  0.056  0.924   0.000
0.5      0.061  0.084  0.840   0.015
1.0      0.248  0.214  0.516   0.022
10.0     0.431  0.239  0.213   0.117
100.0    0.749  0.103  0.103   0.045
1000.0   0.890  0.040  0.015   0.055
Efficient frontier:
[Figure: risk versus return for λ ranging from 0 to
1000; the return axis spans roughly 1.05 to 1.14 and
the risk axis roughly 2 to 20.]
18
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Solution of the M. Model (cnt’d)
The smaller the λ, the more risky investments are
present in the optimal portfolio.
The border of the green-shaded area is called the
efficient frontier. For any portfolio inside the
shaded area one can find a better portfolio that
either:
• offers the same reward with less risk; or
• gives a larger reward with the same risk.
In real-life problems one considers many possible
assets (more than the 4 classes in our example). The
resulting quadratic programming problem then becomes
much larger.
19
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 17:
Financial Planning Problems
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Modeling Multistage SLPs
Stochastic programming models can be applied to
multistage decision problems under uncertainty
satisfying the following two conditions:
• the underlying stochastic process is discrete and
independent of the decisions;
• the dynamics is represented by an equation
z_{t+1} = G_t(z_t, x_t, ξ_t),
where z_t ∈ Z_t and x_t ∈ X_t are the state and the
decision (control) of the dynamic system at stage t,
respectively, and ξ_t is the realization of the
stochastic process.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Modeling Multistage SLPs
With the stochastic program we associate an event tree.
Decision variables are indexed by the nodes of the
tree. The transition equation relates variables
associated with a given node and its predecessor in the
event tree. With every arc from a node to its
successor we associate a unique realization of the
random variable.
[Figure: a symmetric event tree and an unsymmetric
event tree.]
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Event Tree
Every node of the event tree is indexed by the pair
(t, n), where t ∈ {1,...,T} and n ∈ {1,...,N(t)};
T is the planning horizon (number of stages) and N(t)
is the number of nodes in stage t.
The immediate ancestor (predecessor) of node (t, n) is
(t−1, a(t, n)), where a(t, n) is a mapping from the set
of nodes onto itself. The probabilistic event that led
from (t−1, a(t, n)) to its son (t, n) is denoted by
s(t, n).
The probability π(t, n) that the stochastic process
visits node (t, n) is given by the recurrence formula
π(t, n) = P(s(t, n) | (t−1, a(t, n))) · π(t−1, a(t, n)),
where P(s(t, n) | (t−1, a(t, n))) is the conditional
probability of the transition from (t−1, a(t, n)) to
(t, n).
The variables (state and control) are indexed by node.
The dynamics is defined at each node (t, n) by
z(t, n) = G_(t,n)( z(t−1, a(t, n)), x(t, n), ξ(t−1, s(t, n)) ).
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Description of Event Tree
The mappings a(t, n) and s(t, n) enable the
mathematical programming formulation of many multistage
decision problems under uncertainty.
[Figure: a three-stage event tree with nodes (1,1);
(2,1), (2,2), (2,3); (3,1), ..., (3,6); here N(1) = 1,
N(2) = 3, N(3) = 6 and Z(1) = 3, Z(2) = 2.]
Assume that:
the number of possible events at node (t, n) is the
same for all nodes at stage t and equal to Z(t). The
conditional probabilities P(s(t, n) | (t−1, a(t, n)))
depend on the time (stage) t but not on the particular
node in this stage. Consequently, for any t, we have
N(t) = Z(t−1) · N(t−1).
Now a(t, n) and s(t, n) can be computed recursively:
a(t, n) = ⌈ n / Z(t−1) ⌉,
s(t, n) = n − (a(t, n) − 1) · Z(t−1).
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Event Tree: Example
Consider the following event tree:
[Figure: the three-stage event tree above, with
realizations ξ_1 and ξ_2 at the branching nodes.]
Consider the nodes of stage 3. Their ancestors belong
to stage 2, at which Z(3−1) = 2, hence
a(3,3) = ⌈ 3 / Z(3−1) ⌉ = 2,
a(3,4) = ⌈ 4 / Z(3−1) ⌉ = 2.
Nodes (3,3) and (3,4) are sons of node (2,2).
The indices of the sons are the following:
s(3,3) = 3 − (a(3,3) − 1) · Z(3−1) = 1,
s(3,4) = 4 − (a(3,4) − 1) · Z(3−1) = 2.
6
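The two recursions in code, checked against this example (a Python sketch, not part of the original slides; function names are ours):

from math import ceil

def ancestor(t, n, Z):
    # a(t, n) for a symmetric event tree; Z[t] = number of branches at stage t
    return ceil(n / Z[t - 1])

def son_index(t, n, Z):
    # s(t, n): which branch of its ancestor node (t, n) is
    return n - (ancestor(t, n, Z) - 1) * Z[t - 1]

Z = {1: 3, 2: 2}                  # the tree above: Z(1) = 3, Z(2) = 2
assert ancestor(3, 3, Z) == 2 and ancestor(3, 4, Z) == 2
assert son_index(3, 3, Z) == 1 and son_index(3, 4, Z) == 2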
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Node Labels in the Event Tree
There are two "natural" node labeling rules:
- breadth-first search; and
- depth-first search.
Breadth-first search
[Figure: the event tree with nodes labeled 1..10 level
by level: (1,1) → 1; (2,1), (2,2), (2,3) → 2, 3, 4;
(3,1), ..., (3,6) → 5, ..., 10.]
Recursively defined mapping:
node (t, n) has the label
f_b(t, n) = f_b(t−1, N(t−1)) + n,
with f_b(1,1) = 1.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Labels in the Event Tree (cnt’d)
Depth-first search
[Figure: the event tree with nodes labeled 1..10
subtree by subtree: (1,1) → 1; the subtree of (2,1) →
2, 3, 4; the subtree of (2,2) → 5, 6, 7; the subtree of
(2,3) → 8, 9, 10. Here h(3) = 1, h(2) = 3, h(1) = 10.]
Define a function h(t) that gives the number of nodes
of any subtree with root node at time t. Use the
backward recursive formula
h(t−1) = h(t) · Z(t−1) + 1,
with h(T) = 1.
We get h(3) = 1, h(2) = 3 and h(1) = 10.
Recursively defined mapping:
node (t, n) has the label
f_d(t, n) = f_d(t−1, a(t, n)) + h(t) · (s(t, n) − 1) + 1,
with f_d(1,1) = 1.
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Financial Planning Problem
Consider a multi-period financial planning problem. At
every stage t = 0,...,T−1 we can buy or sell different
assets from the set J = {1,...,J} (e.g. bonds, stock,
real estate), and we can lend money to other parties or
borrow it from the bank.
The return of asset j at stage t is uncertain.
We have an initial sum S_0 to invest and we want to
maximize the expected final wealth S_T
(or to maximize its expected utility U(S_T)).
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Asset Liability Management
Asset Liability Modeling
Suppose that at every stage t we contribute a certain
amount of cash C_t to the portfolio and we pay a
certain liability L_t.
Such a financial planning problem is called the asset
liability management problem.
This problem is of crucial importance to pension funds
and insurance companies.
Note the dynamic aspect of the decisions to be taken:
the portfolio is to be re-balanced at every stage.
Note the stochastic aspect of the problem:
the returns of the assets are uncertain.
We model this problem using an event tree
[Figure: a node (t, n), its ancestor (t−1, a(n)) and
its sibling (t, n−1), with realizations ξ_1 and ξ_2]
and decision variables associated with its nodes
(t, n). We assume that the mappings a(t, n) and
s(t, n), which define the ancestor and the son of node
(t, n), respectively, are known.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Asset Liability Management
With asset j ∈ J at node (t, n) we associate:
x_{j,t,n}    the position in asset j at node (t, n);
x^b_{j,t,n}  the amount of asset j bought at (t, n);
x^s_{j,t,n}  the amount of asset j sold at (t, n).
For any t, 1 ≤ t ≤ T, we write the inventory equation
for asset j at node (t, n):
x_{j,t,n} = (1 + r_{j,t,n}) · x_{j,t−1,a(t,n)} + x^b_{j,t,n} − x^s_{j,t,n},
where r_{j,t,n} is the return of asset j corresponding
to moving from node (t−1, a(t, n)) to node (t, n) in
the event tree.
Initial inventory equation for asset j:
x_{j,0,1} = x^initial_j + x^b_{j,0,1} − x^s_{j,0,1}.
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Cash Balance
Let P_j be the initial price of asset j. We assume
that the transaction costs are proportional to the
value of the assets bought or sold. To buy x^b_{j,t,n}
of asset j at stage t we have to pay
(1 + γ_b) · P_j · x^b_{j,t,n}.
Analogously, for selling x^s_{j,t,n} of asset j at
stage t we get
(1 − γ_s) · P_j · x^s_{j,t,n}.
Let m_{t,n}, m^b_{t,n} and m^l_{t,n} be the cash held,
borrowed and lent at time t at node n, respectively.
The borrowing and lending have return rates r^b_{t,n}
and r^l_{t,n}, respectively. For example, for the
money m^l_{t−1,a(t,n)} lent at stage t−1, we receive
back (1 + r^l_{t,n}) · m^l_{t−1,a(t,n)} at stage t.
The cash balance equation states that the cash inflow
is equal to the cash outflow:
C_{t,n} + m^b_{t,n} + (1 + r^l_{t,n}) m^l_{t−1,a(t,n)} + Σ_{j=1}^J (1 − γ_s) P_j x^s_{j,t,n}
= L_{t,n} + m^l_{t,n} + (1 + r^b_{t,n}) m^b_{t−1,a(t,n)} + Σ_{j=1}^J (1 + γ_b) P_j x^b_{j,t,n}
at any node (t, n), t = 1,...,T, n = 1,...,N(t).
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Special Cash Balance Equations
Initial cash balance
S_0 + C_{0,1} + m^b_{0,1} + Σ_{j=1}^{J} (1 − γ_s) P_j x^s_{j,0,1}
= L_{0,1} + m^l_{0,1} + Σ_{j=1}^{J} (1 + γ_b) P_j x^b_{j,0,1},
i.e., we assume that there was an initial portfolio
of assets, so we can also sell at stage t = 0.
Final cash balance
C_{T,n} + m^b_{T,n} + (1 + r^l_{T,n}) m^l_{T−1,a(T,n)} + Σ_{j=1}^{J} (1 − γ_s) P_j x^s_{j,T,n}
= S_T + L_{T,n} + m^l_{T,n} + (1 + r^b_{T,n}) m^b_{T−1,a(T,n)} + Σ_{j=1}^{J} (1 + γ_b) P_j x^b_{j,T,n},
i.e., we assume that the final portfolio can in-
clude assets, so we can also buy at stage T .
These two constraints may be used, for example,
in the asset liability management problem for a
pension fund or an insurance company.
(Indeed, life does not end at time T .)
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Additional Constraints
Policy restrictions for the asset mix:
w^{lo}_j · Σ_{i=1}^{J} x_{i,t,n} ≤ x_{j,t,n} ≤ w^{up}_j · Σ_{i=1}^{J} x_{i,t,n}, ∀j, t, n.
Also the contributions are bounded:
C^{lo}_{t,n} ≤ C_{t,n} ≤ C^{up}_{t,n}, ∀t, n.
The total asset value at the end of period t,
A_{t,n} = Σ_{j=1}^{J} (1 + r_{j,t,n}) P_j x_{j,t−1,a(t,n)} + (1 + r^l_{t,n}) m^l_{t−1,a(t,n)} − (1 + r^b_{t,n}) m^b_{t−1,a(t,n)},
should not fall below the minimum funding ratio
F^{min} applied to the liabilities of this period.
To get more flexibility in modeling (to ensure
complete recourse), we allow a deficit Z_{t,n}:
A_{t,n} ≥ F^{min} · L_{t,n} − Z_{t,n}, ∀t, n,
for which we shall penalize in the objective.
The final value of assets at the end of the planning
horizon should cover the final liabilities and
ensure a final wealth of at least S_T, hence
A_{T,n} ≥ F^{end} · L_{T,n} + S_{T,n}, ∀n.
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Objective
In a simple financial planning problem we usually
maximize the expected value of the final portfolio
converted into cash.
In asset liability management problem we may be
more flexible:
• we accept (small) deficits, Zt,n;
• we can increase the contributions, Ct,n;
• we can borrow cash, m^b_{t,n}, etc.
Suppose we:
• penalize for deficits; and
• minimize contributions.
Hence we get the following objective
minT−1∑
t=0
N(t)∑
n=1
πt,nCt,n + λT
∑
t=1
N(t)∑
n=1
πt,nZt,n
Lt,n.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Fin. Planning Prob: Example
Consider a two-year planning problem with the
initial portfolio in cash S0=1000. The portfolio
is re-balanced at the beginning of each year. Any
transaction incurs a proportional cost of 2%.
There are two assets to choose from. Their val-
ues at t = 0 are 50 and 80, respectively.
In year 1, assets A and B have two possible re-
turns: (-6%,-4%) and (+12%,+8%) with prob-
abilities 0.2 and 0.8, respectively. In year 2, these
returns are (-8%,-6%) and (+12%,+10%) with
probabilities 0.4 and 0.6, respectively.
At the end of year 2 (or at the beginning of year
3) all assets are sold and converted into cash.
The objective of the manager is to maximize the
expected value of the terminal wealth in cash.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example cont’d: Event Tree
We use an event tree with two possible events
in year 1, and 4 events in year 2. (t, n) denotes
node n in year t and π_{t,n} is the probability of
reaching node (t, n).
[Figure: the event tree. Node (0,1) branches to (1,1) with returns (−6%, −4%) and π_{1,1} = 0.2, and to (1,2) with returns (+12%, +8%) and π_{1,2} = 0.8. Each year-1 node branches to a down node with returns (−8%, −6%) and an up node with returns (+12%, +10%), giving π_{2,1} = 0.08, π_{2,2} = 0.12, π_{2,3} = 0.32, π_{2,4} = 0.48.]
With every node (t, n) we associate 6 variables.
Let x_{A,t,n}, x^b_{A,t,n} and x^s_{A,t,n} be the position, the
amount purchased and the amount sold of asset
A, respectively. The variables x_{B,t,n}, x^b_{B,t,n} and
x^s_{B,t,n} correspond to asset B.
Overall, we have 7 × 6 = 42 variables. However,
some of them must be zero:
x^s_{A,0,1} = x^s_{B,0,1} = 0.
17
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example cont'd: Equations
The initial inventory equations:
x_{A,0,1} = x^b_{A,0,1} and x_{B,0,1} = x^b_{B,0,1},
and
50(1 + γ) · x^b_{A,0,1} + 80(1 + γ) · x^b_{B,0,1} = S_0.
The inventory equations in year 1:
x_{A,1,1} = 0.94 · x_{A,0,1} + x^b_{A,1,1} − x^s_{A,1,1},
x_{B,1,1} = 0.96 · x_{B,0,1} + x^b_{B,1,1} − x^s_{B,1,1},
x_{A,1,2} = 1.12 · x_{A,0,1} + x^b_{A,1,2} − x^s_{A,1,2},
x_{B,1,2} = 1.08 · x_{B,0,1} + x^b_{B,1,2} − x^s_{B,1,2}.
The inventory equations in year 2:
x_{A,2,1} = 0.92 · x_{A,1,1} + x^b_{A,2,1} − x^s_{A,2,1},
x_{B,2,1} = 0.94 · x_{B,1,1} + x^b_{B,2,1} − x^s_{B,2,1},
etc.
Cash balance at every node:
50(1 − γ) x^s_{A,t,n} + 80(1 − γ) x^s_{B,t,n} = 50(1 + γ) x^b_{A,t,n} + 80(1 + γ) x^b_{B,t,n}
for t = 1, 2, and n = 1, ..., N(t).
18
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example cont’d: Objective
At the beginning of year 3 (or at the end of year
2), we sell all assets to convert our portfolio into
cash.
The objective is to maximize:
50(1 − γ) · Σ_{n=1}^{4} π_{2,n} · x_{A,2,n} + 80(1 − γ) · Σ_{n=1}^{4} π_{2,n} · x_{B,2,n}.
All variables must satisfy
x_{j,t,n} ≥ 0, x^b_{j,t,n} ≥ 0, x^s_{j,t,n} ≥ 0,
for j ∈ {A, B}, t = 0, 1, 2, and n = 1, ..., N(t).
for j ∈ {A, B}, t = 0,1,2, and n = 1, ..., N(t).
Overall, we have:
42-2 = 40 non-negative decision variables; and
3+4+8+6 = 21 constraints.
19
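As a hedged illustration, the whole example fits in a few lines of Python with cvxpy (the library and all variable names are our choices, not part of the lecture); it encodes exactly the inventory, cash balance and nonnegativity constraints above and maximizes the expected liquidation value:

import cvxpy as cp

P = {'A': 50.0, 'B': 80.0}        # initial asset prices
gamma = 0.02                      # proportional transaction cost
S0 = 1000.0                       # initial cash to invest

# node -> (ancestor, probability, gross returns of A and B on the way in)
tree = {
    (1, 1): ((0, 1), 0.20, {'A': 0.94, 'B': 0.96}),
    (1, 2): ((0, 1), 0.80, {'A': 1.12, 'B': 1.08}),
    (2, 1): ((1, 1), 0.08, {'A': 0.92, 'B': 0.94}),
    (2, 2): ((1, 1), 0.12, {'A': 1.12, 'B': 1.10}),
    (2, 3): ((1, 2), 0.32, {'A': 0.92, 'B': 0.94}),
    (2, 4): ((1, 2), 0.48, {'A': 1.12, 'B': 1.10}),
}
nodes = [(0, 1)] + list(tree)
assets = ['A', 'B']

# position, amount bought, amount sold per asset and node (all nonnegative)
x  = {(j, v): cp.Variable(nonneg=True) for j in assets for v in nodes}
xb = {(j, v): cp.Variable(nonneg=True) for j in assets for v in nodes}
xs = {(j, v): cp.Variable(nonneg=True) for j in assets for v in nodes}

cons = []
for j in assets:   # initial inventory; nothing can be sold at t = 0
    cons += [x[j, (0, 1)] == xb[j, (0, 1)], xs[j, (0, 1)] == 0]
cons += [sum((1 + gamma) * P[j] * xb[j, (0, 1)] for j in assets) == S0]

for v, (anc, pi, R) in tree.items():
    for j in assets:   # inventory: position = return * ancestor position + buys - sells
        cons += [x[j, v] == R[j] * x[j, anc] + xb[j, v] - xs[j, v]]
    cons += [sum((1 - gamma) * P[j] * xs[j, v] for j in assets)
             == sum((1 + gamma) * P[j] * xb[j, v] for j in assets)]

# maximize expected cash after liquidating everything at the final nodes
wealth = sum(pi * (1 - gamma) * P[j] * x[j, v]
             for v, (anc, pi, R) in tree.items() if v[0] == 2 for j in assets)
lp = cp.Problem(cp.Maximize(wealth), cons)
lp.solve()
print('expected terminal wealth:', round(lp.value, 2))

Any LP solver would do here; the point is that the event tree maps one-to-one onto index sets for the variables and constraints.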
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
ALM: Extensions
Introduce two more (nonnegative) variables per
final node (scenario) i ∈ Lt to model the positive
and negative variation from the mean
(1 − c_t) Σ_{j=1}^{J} v_j x^h_{i,j} + s^+_i − s^−_i = y.
Since s^+_i and s^−_i cannot both be positive, the
variance is expressed as
Var(X) = Σ_{i∈L_t} p_i (s^+_i − s^−_i)² = Σ_{i∈L_t} p_i ((s^+_i)² + (s^−_i)²).
We model downside risk using a semi-variance
IE[(X − IEX)²_−] = Σ_{i∈L_t} p_i (s^+_i)².
Downside risk can be taken into account
• in the objective, or
• as a constraint.
20
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
ALM: Extensions
Concave utility function (risk-aversion).
Suppose an initial sum of b = 10000 is to be invested
and a liability of L = 11000 is to be paid at the end of
the planning period. If the final wealth W exceeds the
liability L, a return of 4% per year from the excess W − L
is ensured. Otherwise, if W is lower than the liability L,
the deficit L − W will have to be covered by a loan that
costs 10% per year.
[Figure: piecewise linear utility of wealth, with slope 10% below the liability 11,000 (deficit) and slope 4% above it (excess).]
The investor’s utility is proportional to the inter-
est payment resulting from the excess or deficit.
u(w) = 0.04 · (w − 11000) if w ≥ 11000,
u(w) = 0.10 · (w − 11000) if w < 11000.
log(w − l) is a commonly used concave utility function.
21
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
ALM: Extensions
xi = (xi,1, ..., xi,J) denotes the portfolio in node i
Standard Markowitz formulation:
max y − σ Σ_{i∈L_t} p_i (d_i^T x_i − y)²
s.t.
(C1) Σ_{i∈L_t} p_i d_i^T x_i − y = 0
(C2) B x_{a(i)} − A x_i = 0, i ≠ 0
(C3) A x_0 = b
Downside risk constrained:
max y
s.t. Σ_{i∈L_t} p_i (s^+_i)² ≤ ρ
d_i^T x_i + s^+_i − s^−_i − y = 0, i ∈ L_t
(C1) − (C3)
22
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
ALM: Extensions
Nonlinear utility function:
max Σ_{i∈L_t} p_i log(d_i^T x_i)
s.t. Σ_{i∈L_t} p_i (s^+_i)² ≤ ρ
d_i^T x_i + s^+_i − s^−_i − y = 0, i ∈ L_t
(C1) − (C3)
Skewness formulation:
max y + γ Σ_{i∈L_t} p_i (s^+_i − s^−_i)³
s.t. Σ_{i∈L_t} p_i (s^+_i)² ≤ ρ
d_i^T x_i + s^+_i − s^−_i − y = 0, i ∈ L_t
(C1) − (C3)
23
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
References:
Gondzio and Kouwenberg
High performance computing for asset liability
management, Operations Research, 49 (2001)
pp. 879–891.
Gondzio and Grothey
Parallel interior point solver for structured quadratic
programs: Application to financial planning
problems, Annals of Operations Research 152
(2007) pp. 319–339.
Gondzio and Grothey
Solving nonlinear portfolio optimization problems
with the primal-dual interior point method,
European Journal of Operational Research 181
(2007) pp. 1019–1029.
24
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Classification
We consider a set of points X = {x1, x2, . . . , xn}, xi ∈ Rℓ, to be classified into two subsets of
"good" and "bad" ones: X = G ∪ B and G ∩ B = ∅.
We look for a function f : X 7→ R such that f(x) ≥ 0 if x ∈ G and f(x) < 0 if x ∈ B.
Linear Classification
We consider a case when f is a linear function:
f(x) = wTx + b,
where w ∈ Rℓ and b ∈ R.
In other words we look for a hyperplane which separates “good” points from “bad” ones.
In such a case the decision rule is given by d = sgn(f(x)).
If f(xi) ≥ 0, then di = +1 and xi ∈ G. If f(xi) < 0, then di = −1 and xi ∈ B.
We say that there is a linearly separable training sample
S = ((x1, d1), (x2, d2), . . . , (xn, dn)).
3
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
How does it work?
Given a linearly separable database (training sample)
S = ((x1, d1), (x2, d2), . . . , (xn, dn))
find a separating hyperplane
wTx + b = 0,
which satisfies
di(wTxi + b) ≥ 1, ∀i = 1, 2, . . . , n.
Given a new (unclassified) point z, compute
dz = sgn(wTz + b)
to decide whether z is “good” or “bad”.
4
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Lecture 18:
Machine Learning and Data Mining
with Support Vector Machines
• V.N. Vapnik: Statistical Learning Theory, John Wiley & Sons, New York, 1998.
• N. Cristianini and J. Shawe-Taylor: An Introduction to Support Vector Machines and Other
Kernel Based Learning Methods, Cambridge University Press, 2000.
• J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor and J. Vandewalle, Least
Squares Support Vector Machines, World Scientific, 2002.
• M.C. Ferris, T.S. Munson: Interior-Point Methods for Massive Support Vector Machines,
SIAM Journal on Optimization 13 (2003), pp 783–804.
• J. Ma, J. Theiler, S. Perkins: Accurate Online Support Vector Regression, Neural Compu-
tation, 15 (2003), pp 2683-2703. (Tech Report, Los Alamos National Lab, Los Alamos, NM
87545, USA.)
1
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Support Vector Machines (SVMs) are a standard
method used to separate points belonging to two (or
more) sets in n-dimensional space by a linear or
nonlinear surface (Vapnik, Cristianini and Shawe-
Taylor, Suykens et al.).
[Figure: a scatter of "good" (G) and "bad" (B) points separated by a hyperplane.]
SVMs have numerous applications:
• Finance: detecting fraud transactions with credit cards,
• Finance: credit scoring,
• CS: pattern recognition, image recognition, hand-written digit recognition,
• Bioinformatics: finding the functions of particular genes,
• Medicine: diagnosing patients,
• Telecommunications: detecting customers who will switch to another operator,
• and many others ...
2
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
QP Formulation
Finding a separating hyperplane can be formulated as a quadratic programming problem:
min (1/2) w^T w
s.t. d_i(w^T x_i + b) ≥ 1, ∀i = 1, 2, . . . , n.
In this formulation the Euclidean norm of w is minimized.
This is clearly a convex optimization problem.
(We can minimize ‖w‖1 or ‖w‖∞ and then the problem can be reformulated as an LP.)
Two major difficulties:
• Clusters may not be separable at all
−→ minimize the error of misclassifications;
• Clusters may be separable by a nonlinear manifold
−→ find the right feature map.
7
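As a sketch of how this QP looks in practice (toy data and cvxpy as the QP solver, both our choices, not part of the lecture), the separable primal can be solved directly:

import cvxpy as cp
import numpy as np

# toy linearly separable sample: rows of X are the points, d the +/-1 labels
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
margins = cp.multiply(d, X @ w + b)          # d_i (w^T x_i + b)
qp = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), [margins >= 1])
qp.solve()
print('w =', w.value, 'b =', b.value)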
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Difficult Cases
Nonseparable clusters:
[Figure: two clusters that cannot be separated exactly; ξ1 and ξ2 mark the errors when defining clusters of good and bad points.]
Minimize the global error of misclassifications: ξ1 +ξ2.
Use nonlinear feature map:
[Figure: a nonlinear feature map Φ turns clusters that are separable only by a nonlinear curve in the input space into linearly separable clusters in the feature space.]
8
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Example
Consider a database of mortgages in a bank.
For each mortgage (customer) i = 1, 2, . . . , n we know xi = (ai, li, pi, si, ci,mi) ∈ R6, where:
ai is the age of the customer,
li is the number of loans this customer has,
pi is the post code of the customer’s address,
si is the salary of the customer,
ci is the type of customer’s contract (part-time, temporary, open-ended, etc),
mi is the outstanding balance on the mortgage.
For each of these mortgages we also know di which indicates if the mortgage is “good” or “bad”.
We find a hyperplane
wTx + b = 0,
where w ∈ R6 and b ∈ R which separates “good” and “bad” mortgages.
Suppose a new application for a mortgage has been made.
We collect information z = (az, lz, pz, sz, cz,mz) ∈ R6 for the applicant and compute
dz = sgn(wTz + b)
to decide whether the mortgage should be given or not.
5
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Separating Hyperplane
To guarantee a nonzero margin of separation we look for a hyperplane
wTx + b = 0,
such that
wTxi + b ≥ 1 for “good” points;
wTxi + b ≤ −1 for “bad” points.
This is equivalent to:
(w^T x_i)/‖w‖ + b/‖w‖ ≥ 1/‖w‖ for "good" points;
(w^T x_i)/‖w‖ + b/‖w‖ ≤ −1/‖w‖ for "bad" points.
In this formulation the normal vector of the separating hyperplane, w/‖w‖, has unit length.
In this case the margin between "good" and "bad" points is measured by 2/‖w‖.
We would like this margin to be maximised.
This can be achieved by minimising the norm ‖w‖.
6
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Dual Quadratic Problem (continued)
Observe that the dual problem has a neat formulation in which only dual variables y are present.
(The primal variables (w, b, ξ) do not appear in the dual.)
Define a matrix Q ∈ R^{n×n} such that q_{ij} = d_i d_j (x_i^T x_j).
Rewrite the (dual) quadratic program:
max e^T y − (1/2) y^T Q y,
s.t. d^T y = 0,
0 ≤ y ≤ λe,
where e is the vector of ones in R^n.
The matrix Q corresponds to a specific linear kernel function.
11
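A minimal sketch of the dual on the same kind of toy data (cvxpy again, our choice): since Q = Z Z^T, the quadratic term y^T Q y equals ‖Σ_i y_i d_i x_i‖², which keeps the model convex by construction; the stationarity condition w = Σ_i y_i d_i x_i recovers w, and b comes from any support vector strictly between its bounds:

import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
lam = 10.0                                    # upper bound on the duals

y = cp.Variable(len(d))
# y^T Q y = || sum_i y_i d_i x_i ||^2 because Q = Z Z^T
obj = cp.Maximize(cp.sum(y) - 0.5 * cp.sum_squares(X.T @ cp.multiply(d, y)))
cp.Problem(obj, [d @ y == 0, y >= 0, y <= lam]).solve()

w = X.T @ (y.value * d)                       # w = sum_i y_i d_i x_i
free = (y.value > 1e-6) & (y.value < lam - 1e-6)
sv = int(np.argmax(free))                     # a support vector off its bounds
b = d[sv] - w @ X[sv]                         # its margin constraint is active
print('w =', w, 'b =', b)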
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Dual Quadratic Problem (continued)
The primal problem is convex, hence the dual problem must be well defined too.
The dual problem is to maximise a concave function. We can prove this directly.
Lemma
The matrix Q is positive semidefinite.
Proof:
Define
Z = [d_1 x_1 | d_2 x_2 | . . . | d_n x_n]^T ∈ R^{n×ℓ}
and observe that
Q = Z Z^T (i.e., q_{ij} = d_i d_j (x_i^T x_j)).
For any u ∈ R^n we have
u^T Q u = (u^T Z)(Z^T u) = ‖Z^T u‖² ≥ 0,
hence Q is positive semidefinite.
12
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Linearly nonseparable case
If perfect linear separation is impossible, then for each misclassified data point we introduce a slack
variable ξ_i which measures the distance between the hyperplane and that point.
Finding the best hyperplane can be formulated as a quadratic programming problem:
min (1/2) w^T w + λ Σ_{i=1}^{n} ξ_i
s.t. d_i(w^T x_i + b) + ξ_i ≥ 1, ∀i = 1, 2, . . . , n,
ξ_i ≥ 0, ∀i = 1, 2, . . . , n,
where λ (λ > 0) controls the penalisation for misclassifications.
We will derive the dual quadratic problem.
We associate Lagrange multipliers y ∈ Rn (y≥0) and s ∈ Rn (s≥0)
with the constraints di(wTxi + b) + ξi ≥ 1 and ξ ≥ 0, and write the Lagrangian
L(w, b, ξ, y, s) = (1/2) w^T w + λ Σ_{i=1}^{n} ξ_i − Σ_{i=1}^{n} y_i (d_i(w^T x_i + b) + ξ_i − 1) − s^T ξ.
9
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Dual Quadratic Problem
Stationarity conditions (with respect to all primal variables):
∇_w L(w, b, ξ, y, s) = w − Σ_{i=1}^{n} y_i d_i x_i = 0
∇_{ξ_i} L(w, b, ξ, y, s) = λ − y_i − s_i = 0
∇_b L(w, b, ξ, y, s) = Σ_{i=1}^{n} y_i d_i = 0.
Substituting these equations into the Lagrangian function we get
L(w, b, ξ, y, s) = Σ_{i=1}^{n} y_i − (1/2) Σ_{i,j=1}^{n} d_i d_j y_i y_j (x_i^T x_j).
Hence the dual problem has the form:
max Σ_{i=1}^{n} y_i − (1/2) Σ_{i,j=1}^{n} d_i d_j y_i y_j (x_i^T x_j)
s.t. Σ_{i=1}^{n} d_i y_i = 0,
0 ≤ y_i ≤ λ, ∀i = 1, 2, . . . , n.
10
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Support Vector Machines
Two step procedure:
Use nonlinear mapping to transform data into a feature space F ;
Use linear machine to classify objects in the feature space.
Requirement: Only the inner products of data with the new point can be used.
Kernel Function
A kernel is a function K, such that for all x, z ∈ X
K(x, z) = 〈φ(x), φ(z)〉,
where φ is a mapping from X to an (inner product) feature space F .
We use 〈., .〉 to denote a scalar product.
Linear Kernel: K(x, z) = x^T z.
Polynomial Kernel: K(x, z) = (x^T z + 1)^d.
Gaussian Kernel: K(x, z) = exp(−‖x − z‖²/σ²).
15
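The three kernels are one-liners; a sketch (the default parameter values are our hypothetical choices):

import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, deg=3):           # deg is a hypothetical default
    return (x @ z + 1.0) ** deg

def gaussian_kernel(x, z, sigma=1.0):         # sigma is a hypothetical default
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

# kernelizing the dual amounts to replacing q_ij = d_i d_j x_i^T x_j
# by q_ij = d_i d_j K(x_i, x_j)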
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Question
Consider a particular case:
Database of mortgages for a big lender:
n = 1000000 mortgages;
ℓ = 6 (or, more realistically, ℓ = 20).
Which problem is easier to solve:
the Primal (page 9) or the Dual (page 11) ?
16
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Definitions
The original quantities are called attributes.
The quantities introduced to describe the data are called features.
The task of choosing a suitable representation is called feature selection.
X ⊂ Rℓ is the input space.
F ⊂ Rk is the feature space.
The function φ : X 7→ F is called feature map.
We would like to transform input space into feature space
and see a clear separation (preferably linear) in the feature space.
13
IPMs for LP, QP, NLP J. Gondzio, Turin 2008
Example: Nonlinear Feature Map
[Figure: a feature map Φ sends points lying in an elliptical ring (semi-axes 2p and 2q) in the (x1, x2)-plane to points in the (r, θ)-plane, where the "good" and "bad" points become linearly separable.]
Use polar coordinates:
Define φ : R² 7→ R² such that (x_{i1}, x_{i2}) 7→ (r_i, θ_i),
where r_i = ((x_{i1} − a)²/p² + (x_{i2} − b)²/q²)^{1/2} and θ_i = tan⁻¹((x_{i2} − b) p / ((x_{i1} − a) q)).
Attributes: (xi1, xi2)
Features: (ri, θi)
Feature map: φ
14
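A sketch of this feature map (we use arctan2 rather than tan⁻¹ of the ratio, to keep the angle well defined in all quadrants; a, b, p, q are the centre and semi-axes of the ellipse):

import numpy as np

def phi(x1, x2, a=0.0, b=0.0, p=1.0, q=1.0):
    """Map attributes (x1, x2) to features (r, theta) relative to the
    ellipse centred at (a, b) with semi-axes p and q."""
    r = np.sqrt((x1 - a) ** 2 / p ** 2 + (x2 - b) ** 2 / q ** 2)
    theta = np.arctan2((x2 - b) * p, (x1 - a) * q)
    return r, theta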
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 19:
Nonlinear Programming
Linesearch Methods
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Unconstrained Nonlin. Opt.
Consider an unconstrained nonlinear optimiza-
tion problem
min_x f(x)
where x ∈ Rn, and f : Rn 7→ R is a twice differ-
entiable but not necessarily convex function.
Lemma 1
First Order Necessary Conditions.
If x is a local minimizer and f is continuously dif-
ferentiable in an open neighbourhood of x, then
∇f(x) = 0.
Proof (by contradiction)
Suppose ∇f(x) ≠ 0. Define p = −∇f(x) and
note that ∇f(x)^T p = −‖∇f(x)‖² < 0. Since ∇f
is continuous near x, there exists a scalar a > 0
such that ∇f(x + αp)^T p < 0 for all α ∈ [0, a].
Take any α ∈ (0, a]. From Taylor's theorem we get
f(x + αp) = f(x) + α ∇f(x + βp)^T p,
for some β ∈ (0, α]. Thus f(x + αp) < f(x) for
all α ∈ (0, a]. Hence x is not a local minimizer.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Optimality Conditions
Lemma 2
Second Order Necessary Conditions.
If x is a local minimizer and ∇2f is continuous
in an open neighbourhood of x, then ∇f(x) = 0
and ∇2f(x) is positive semidefinite.
Proof
From Lemma 1, ∇f(x) = 0. Suppose ∇2f(x) is
not positive semidefinite. Then we can choose
a vector p such that pT∇2f(x)p < 0, and from
continuity of ∇2f around x, there exists a scalar
a such that pT∇2f(x+αp)p < 0 for all α ∈ (0, a].
From the second-order Taylor expansion around
x, we then have for any α ∈ (0, a]
f(x + αp) = f(x) + α ∇f(x)^T p + (1/2) α² p^T ∇²f(x + βp) p
for some β ∈ (0, α). Since ∇f(x) = 0 and
pT∇2f(x + βp)p < 0, then f(x + αp) < f(x).
Hence x is not a local minimizer.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Optimality Conditions (cont’d)
Lemma 3
Second Order Sufficient Conditions.
Suppose ∇²f is continuous in an open neighbourhood
of x, ∇²f(x) is positive definite, and ∇f(x) = 0.
Then x is a strict local minimizer of f.
Proof
Since ∇²f is continuous and positive definite at x,
it remains positive definite in some neighbourhood
of x. Let B = {z : ‖z − x‖ < r} be such a neighbourhood.
For any nonzero vector p such that ‖p‖ < r, we have x + p ∈ B and so
f(x + p) = f(x) + ∇f(x)^T p + (1/2) p^T ∇²f(z) p
         = f(x) + (1/2) p^T ∇²f(z) p,
where z = x + αp for some α ∈ (0, 1). Since
z ∈ B, we have p^T ∇²f(z) p > 0, and therefore
f(x + p) > f(x).
Hence x is a strict local minimizer.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Quadratic Model of the Function
Let f : Rn 7→ R be twice continuously differen-
tiable at x0.
From Taylor's theorem we can replace (locally) f
with its second order approximation
f(x0 + p) = f(x0) + ∇f(x0)^T p + (1/2) p^T ∇²f(x0) p + r3(p),
where the remainder satisfies:
lim_{p→0} r3(p)/‖p‖² = 0.
The quadratic model
m(x0 + p) = f(x0) + ∇f(x0)^T p + (1/2) p^T ∇²f(x0) p
is a good approximation of f in the neighbourhood
of x0, i.e. if p is small.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Rosenbrock Function
Example: f(x) = 10(x2 − x1²)² + (1 − x1)².
[Figure: level sets of the Rosenbrock function; the minimizer is at (1, 1).]
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Descent Directions
Steepest Descent Direction
p = −∇f(x0).
Indeed, for small α that satisfies
α ∇f(x0)^T ∇²f(x0) ∇f(x0) / (2‖∇f(x0)‖²) < 1
we have
m(x0 + αp) = f(x0) − α ‖∇f(x0)‖² + (1/2) α² ∇f(x0)^T ∇²f(x0) ∇f(x0)
            < f(x0).
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Descent Directions (cont’d)
Newton Direction
This direction points to the minimum of the
quadratic model.
The quadratic model has a minimum at the point
x0 + p such that
∇m(x0 + p) = ∇f(x0) + ∇²f(x0) p = 0,
i.e.
p = −(∇²f(x0))⁻¹ ∇f(x0).
If ∇²f(x0) is positive definite, then (∇²f(x0))⁻¹
is also positive definite, hence
∇f(x0)^T p = −∇f(x0)^T (∇²f(x0))⁻¹ ∇f(x0) < 0.
Thus, for α ∈ (0, 1] we get
m(x0 + αp) = f(x0) − α(1 − α/2) ∇f(x0)^T (∇²f(x0))⁻¹ ∇f(x0)
            ≤ f(x0) − (α/2) ∇f(x0)^T (∇²f(x0))⁻¹ ∇f(x0)
            < f(x0).
8
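For the Rosenbrock function from a few slides back, the Newton direction is one linear solve; a sketch (the point x0 is our choice, picked where the Hessian is positive definite):

import numpy as np

def grad(x):
    """Gradient of f(x) = 10 (x2 - x1^2)^2 + (1 - x1)^2."""
    x1, x2 = x
    return np.array([-40 * x1 * (x2 - x1 ** 2) - 2 * (1 - x1),
                     20 * (x2 - x1 ** 2)])

def hess(x):
    x1, x2 = x
    return np.array([[120 * x1 ** 2 - 40 * x2 + 2, -40 * x1],
                     [-40 * x1, 20.0]])

x0 = np.array([1.2, 1.0])                    # here the Hessian is positive definite
p = -np.linalg.solve(hess(x0), grad(x0))     # Newton direction
assert grad(x0) @ p < 0                      # hence a descent direction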
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Other Descent Directions
Let B be a symmetric positive definite matrix.
Define
p = −B−1∇f(x0).
Since B is positive definite, its inverse is also
positive definite and
∇f(x0)^T p = −∇f(x0)^T B⁻¹ ∇f(x0) < 0.
Thus, for small α that satisfies
α ∇f(x0)^T B⁻¹ ∇²f(x0) B⁻¹ ∇f(x0) / (2 ∇f(x0)^T B⁻¹ ∇f(x0)) < 1
we get
m(x0 + αp) = f(x0) − α ∇f(x0)^T B⁻¹ ∇f(x0) + (1/2) α² ∇f(x0)^T B⁻¹ ∇²f(x0) B⁻¹ ∇f(x0)
            < f(x0).
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Descent Directions: Summary
The steepest descent direction is easy to com-
pute (only the gradient of f). There is a guar-
antee that for sufficiently small α some progress
in the objective can be achieved. Although the
algorithms that use the steepest descent direc-
tions are globally convergent, they may be very
slow in practice.
If ∇2f(x0) is positive definite, the Newton direc-
tion can be used. The computation of this direc-
tion needs the evaluation of the second derivative
of f as well as the solution of the system of linear
equations:
(∇2f(x0)) p = −∇f(x0).
This may be quite a significant effort.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Approximate Hessian
If
• ∇2f(x0) is not positive definite, or
• ∇2f(x0) is very expensive to compute, or
• the linear system with ∇2f(x0) is too hard,
then instead of using exact ∇2f(x0), we can use
its positive definite approximation B.
Example:
Quasi-Newton Methods
• DFP: Davidon, Fletcher, Powell
• BFGS: Broyden, Fletcher, Goldfarb, Shanno
Low-rank approximations of the Hessian matrix.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Line Search
The material given in this lecture comes from
the book of J. Nocedal & S. Wright:
Numerical Optimization,
Springer Series in Operations Research,
Springer-Verlag Telos, 1999.
To analyse the behaviour of function f along di-
rection p, define φ : R 7→ R as
φ(α) = f(x0 + αp).
If p is a descent direction for f , then for suffi-
ciently small α
φ(α) < φ(0).
However, a tiny step size α would mean that only
tiny progress towards optimality is made.
We want α to be reasonably large.
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Two Conds for the Step Size
Sufficient Decrease Condition
f(x0 + αp) ≤ f(x0) + c1α∇f(x0)T p,
for some constant c1 ∈ (0,1).
Curvature Condition
∇f(x0 + αp)T p ≥ c2∇f(x0)T p,
for some constant c2 ∈ (c1,1).
The sufficient decrease condition requires the re-
duction in the objective to be proportional to
the step size α and the directional derivative
∇f(x0)T p.
The curvature condition can be expressed as
φ′(α) ≥ c2 φ′(0).
Thus it requires the slope of φ at α to be no
smaller than c2 times the (negative) slope at zero.
In other words, if the slope φ′(α) is strongly
negative, then we should increase α. However, if
it is only slightly negative or even positive, then
little or no progress can be expected from moving
further along direction p.
13
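A sketch of a step-size search returning an α that satisfies both conditions (a standard bracketing/bisection scheme, not an algorithm from the lecture; f and grad are assumed to be NumPy-compatible callables):

import numpy as np

def wolfe_step(f, grad, x, p, c1=1e-4, c2=0.9, max_iter=50):
    """Bracketing/bisection search for a step satisfying both Wolfe
    conditions (a sketch; c1, c2 are typical but arbitrary choices)."""
    alpha, lo, hi = 1.0, 0.0, np.inf
    f0, g0 = f(x), grad(x) @ p                         # phi(0), phi'(0) < 0
    for _ in range(max_iter):
        if f(x + alpha * p) > f0 + c1 * alpha * g0:    # sufficient decrease fails
            hi = alpha
        elif grad(x + alpha * p) @ p < c2 * g0:        # curvature fails
            lo = alpha
        else:
            return alpha
        alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha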
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Two Conds for the Step Size
[Figure: two plots of φ(α) = f(xk + αpk) together with the line l(α) from the sufficient decrease condition; the intervals of acceptable step sizes are marked.]
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Good step size α exists
Lemma 4
Let f : Rn 7→ R be continuously differentiable at
x0. Let p be a descent direction at x0, such that
f is bounded below along a ray {x0+αp|α > 0}.
If 0 < c1 < c2 < 1, then there exist intervals of
step lengths that satisfy Wolfe’s conditions:
f(x0 + αp) ≤ f(x0) + c1α∇f(x0)T p,
∇f(x0 + αp)T p ≥ c2∇f(x0)T p.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Good step size α exists
Proof. Since φ(α)=f(x0+αp) is bounded below
∀α > 0 and 0 < c1 < 1, the line l(α) = f(x0)+
c1α∇f(x0)T p must intersect the graph of φ at
least once. Let α0 be the smallest intersecting
value of α, that is
f(x0 + α0p) = f(x0) + c1α0∇f(x0)T p.
The sufficient decrease condition holds for
α ∈ [0, α0].
By the mean value theorem, ∃αx ∈ (0, α0):
f(x0 + α0p) − f(x0) = α0∇f(x0 + αxp)T p.
By combining this and the earlier equality:
∇f(x0 + αx p)^T p = c1 ∇f(x0)^T p > c2 ∇f(x0)^T p,
since c1 < c2 and ∇f(x0)T p < 0.
Therefore αx satisfies Wolfe’s conditions. Since
both these conditions are satisfied as strict in-
equalities, there exists a neighbourhood of αx in
which the Wolfe’s inequalities hold.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Goldstein Conditions
An alternative to Wolfe’s conditions are the Gold-
stein Conditions:
f(x0 + αp) ≤ f(x0) + c α∇f(x0)T p,
f(x0 + αp) ≥ f(x0) + (1 − c)α∇f(x0)T p,
with 0 < c < 1/2.
[Figure: the Goldstein conditions; φ(α) = f(xk + αpk) must lie between the lines with slopes c ∇f_k^T p_k and (1 − c) ∇f_k^T p_k; the acceptable intervals are marked.]
17
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Convergence of Line Search M.
Def. A function f is Lipschitz continuous on D
if there exists a constant L such that
‖f(x) − f(y)‖ ≤ L‖x − y‖ for all x, y ∈ D.
Theorem (Zoutendijk)
Consider an iteration of the form
xk+1 = xk + αkpk,
where pk is a descent direction and αk satis-
fies the Wolfe’s conditions. Let θk be the angle
between pk and the steepest descent direction
−∇fk, i.e.
cos θ_k = −∇f_k^T p_k / (‖∇f_k‖ ‖p_k‖).
Assume f is bounded below, continuously dif-
ferentiable on an open set D, and the initial
point x0 ∈ D is given. Assume also that ∇f is
Lipschitz continuous on D and the level set
L = {x : f(x) < f(x0)} ⊂ D.
Then
Σ_{k≥0} cos²θ_k ‖∇f_k‖² < ∞.
18
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Proof of Zoutendijk Theorem
Proof. From the second Wolfe’s condition:
(∇fk+1 −∇fk)T pk ≥ (c2 − 1)∇fT
k pk.
Lipschitz continuity of ∇f implies that
(∇fk+1 −∇fk)T pk ≤ αkL‖pk‖
2.
By combining these two inequalities, we get
αk ≥c2 − 1
L·∇fT
k pk
‖pk‖2
.
Hence from the first Wolfe’s condition we get
fk+1 ≤ fk − c1 ·1 − c2
L·(∇fT
k pk)2
‖pk‖2
.
Let c = c11−c2
L> 0. From the definition of cos θk,
we getfk+1 ≤ fk − c · cos2θk‖∇fk‖
2.
Summing this inequalities for j = 0,1, · · · , k, we
obtainfk+1 ≤ f0 − c ·
k∑
j=0
cos2θj‖∇fj‖2.
Since f is bounded below, for any k, f0− fk+1 is
less than some positive constant. Hence∑
k≥0
cos2θk‖∇fk‖2 < ∞.
19
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Global Convergence
Corollary 1
lim_{k→∞} cos θ_k ‖∇f_k‖ = 0.
Corollary 2
If for all k, cos²θ_k ≥ δ > 0, then
lim_{k→∞} ‖∇f_k‖ = 0.
Thus it suffices that the angle between pk and
−∇fk does not get too close to π/2. Then
the descent algorithm with line search satisfying
Wolfe’s conditions is globally convergent.
The convergence is global but it may be slow.
[Figure: a zig-zagging steepest descent trajectory x0, x1, x2, . . .]
20
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 20:
Nonlinear Programming
Trust Region Methods
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Trust Region Nonlinear Opt.
The material given in this lecture comes from
the book of J. Nocedal & S. Wright:
Numerical Optimization,
Springer Series in Operations Research,
Springer-Verlag Telos, 1999.
Consider an unconstrained nonlinear optimization
problem
min_x f(x)
where x ∈ Rn, and f : Rn 7→ R is a twice differentiable
but not necessarily convex function.
The quadratic model
m_k(x_k + p) = f(x_k) + ∇f(x_k)^T p + (1/2) p^T B_k p
is a good approximation of f in the neighbourhood
of x_k, i.e. if p is small.
It is natural to make a step from x_k to the new
x_{k+1} in a direction that minimizes the quadratic
model. The quadratic model is valid for small p,
say ‖p‖ ≤ Δ. Thus the direction p_k ∈ Rn is the
solution of the following problem
min m_k(p) = f_k + ∇f_k^T p + (1/2) p^T B_k p
s.t. ‖p‖ ≤ Δ_k.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Trust Region: Illustration
[Figure: trust regions of radii Δ1 and Δ2 around x_k, contours of the model and the level set of f; the minimizer of the model changes with the radius.]
Trust region around the point xk.
Two possible directions depending on the size of
the trust region ∆1 and ∆2.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Measuring the Progress
The radius of the trust region is chosen at each
iteration. This choice is based on the agreement
of the model function mk and the true objective
function f .
Given a step pk we define the ratio
ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)).
The numerator is called the actual reduction,
and the denominator is the predicted reduction.
Since the step pk is obtained by minimizing the
model mk over the region that includes p = 0,
the predicted reduction will always be nonnega-
tive.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Measuring the Progress
If ρk < 0, the new objective value f(xk + pk) is
greater than the current value f(xk), so the step
must be rejected and the trust region should be
reduced.
On the other hand, if ρk is close to 1, there is
a good agreement between the model mk and
the function f , so it is safe to expand the trust
region for the next iteration.
If ρk > 0 but it is not close to 1, we do not alter
the trust region. If ρk is close to zero, we shrink
the trust region.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Trust Region Algorithm
Given Δ̄ > 0, Δ0 ∈ (0, Δ̄) and η ∈ [0, 1/4):
for k = 0, 1, 2, · · ·
  Find pk (by solving the trust region problem);
  Evaluate ρk;
  if ρk < 1/4
    Δ_{k+1} = (1/4)‖pk‖
  else
    if ρk > 3/4 and ‖pk‖ = Δk
      Δ_{k+1} = min{2Δk, Δ̄}
    else
      Δ_{k+1} = Δk;
  if ρk > η
    x_{k+1} = xk + pk
  else
    x_{k+1} = xk;
end (for).
6
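A sketch of this loop in Python (names are ours; subproblem(g, B, delta) stands for any approximate minimizer of the model within the trust region, e.g. the Cauchy point or the dogleg step from the following slides):

import numpy as np

def trust_region(f, grad, hess, x0, delta_bar=1.0, delta0=0.5, eta=0.1,
                 subproblem=None, max_iter=100, tol=1e-8):
    """Sketch of the trust region algorithm above."""
    x, delta = np.asarray(x0, float), delta0
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        p = subproblem(g, B, delta)
        pred = -(g @ p + 0.5 * p @ B @ p)            # m(0) - m(p) >= 0
        rho = (f(x) - f(x + p)) / pred               # actual / predicted reduction
        if rho < 0.25:
            delta = 0.25 * np.linalg.norm(p)
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), delta):
            delta = min(2 * delta, delta_bar)
        if rho > eta:
            x = x + p                                # accept the step
    return x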
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
The Cauchy Point
Find the direction p^S_k that solves the linear version
of the trust region subproblem:
min f_k + ∇f_k^T p
s.t. ‖p‖ ≤ Δ_k.
Find the step size τ_k > 0 that minimizes m_k(τ p^S_k)
within the trust region, i.e. solve:
min_{τ>0} m_k(τ p^S_k)
s.t. ‖τ p^S_k‖ ≤ Δ_k.
Set p^C_k = τ_k p^S_k.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
The Cauchy Point (cont’d)
[Figure: the Cauchy point p^C_k along the steepest descent direction p^S_k, inside the trust region around x_k; contours of the model.]
It is possible to write a closed-form definition of
the Cauchy point. Indeed, the first minimization
gives
p^S_k = −(Δ_k/‖∇f_k‖) ∇f_k.
It is the steepest descent direction and the max-
imum step size within the trust region is made
in this direction.
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Cauchy Point: Explicit Formula
The problem of the line search along p^S_k can also
be solved explicitly. We consider two cases:
Case 1: ∇f_k^T B_k ∇f_k ≤ 0.
The function m_k(τ p^S_k) decreases monotonically
with τ, so τ_k is the largest value that satisfies the
trust region bound, namely τ_k = 1.
Case 2: ∇f_k^T B_k ∇f_k > 0.
The function m_k(τ p^S_k) is a convex quadratic in
τ, so τ_k is either the unconstrained minimizer
of this quadratic, ‖∇f_k‖³/(Δ_k ∇f_k^T B_k ∇f_k), or the
boundary value 1, whichever comes first.
Summing up, p^C_k = −τ_k (Δ_k/‖∇f_k‖) ∇f_k, where
τ_k = 1 if ∇f_k^T B_k ∇f_k ≤ 0;
τ_k = min{ ‖∇f_k‖³/(Δ_k ∇f_k^T B_k ∇f_k), 1 } otherwise.
9
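The closed-form Cauchy point translates directly into code; a sketch:

import numpy as np

def cauchy_point(g, B, delta):
    """Closed-form Cauchy point p^C = -tau * (delta / ||g||) * g."""
    gBg = g @ B @ g
    if gBg <= 0:                  # Case 1: model decreases monotonically
        tau = 1.0
    else:                         # Case 2: convex quadratic in tau
        tau = min(np.linalg.norm(g) ** 3 / (delta * gBg), 1.0)
    return -tau * (delta / np.linalg.norm(g)) * g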
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Improvement on the C. Point
If we choose the Cauchy point then we actually
make step in the steepest descent direction. (We
just use the specific step size.) We thus make
negligible use of the second-order information.
Being a variant of the steepest descent method,
an algorithm that always makes the step towards
the Cauchy point is globally convergent but very
slow in practice.
We should make better use of the second order
information.
To simplify the notation from now on we drop
the subscripts k from pk, fk,∇fk and Bk.
Additionally, we denote g = ∇fk.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Direction p depends on ∆
Examine the influence of ∆ on the solution of the
trust region subproblem. For very small ∆, the
step in the steepest descent direction −g should
be made:
p(Δ) ≈ −Δ g/‖g‖.
For very large ∆ (and positive definite B), the
full step in the Newton-like direction should be
made:
p(∆) = pB = −B−1g.
[Figure: the trajectory p(Δ) starting from x_k along −g, bending through p^U towards the Newton-like step p^B.]
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dogleg Method
For the intermediate values of ∆, the solution
p(∆) typically follows the curved trajectory in
the Figure.
We can approximate the curved trajectory with
two line segments. The first runs from the origin
to the unconstrained minimizer along the steep-
est descent direction
p^U = −((g^T g)/(g^T B g)) g.
The second line segment runs from pU to pB.
The following is a parametric description of the
two-segment trajectory:
p(τ) = τ p^U, for 0 ≤ τ ≤ 1,
p(τ) = p^U + (τ − 1)(p^B − p^U), for 1 < τ ≤ 2.
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dogleg Method (cont’d)
Lemma 1 Let B be positive definite. Then
(i) ‖p(τ)‖ is an increasing function of τ ,
(ii) m(p(τ)) is a decreasing function of τ .
Proof: Both (i) and (ii) hold for any τ ∈ [0,1].
Suppose τ ∈ (1,2]. Let τ = 1 + α.
For (i), define h(α) and show that h′(α) ≥ 0 for
α ∈ (0, 1):
h(α) = (1/2) ‖p^U + α(p^B − p^U)‖²
     = (1/2) ‖p^U‖² + α (p^U)^T (p^B − p^U) + (1/2) α² ‖p^B − p^U‖².
13
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Proof of Lemma 1 (cont’d)
h′(α) = −(p^U)^T (p^U − p^B) + α ‖p^B − p^U‖²
      ≥ −(p^U)^T (p^U − p^B)
      = ((g^T g)/(g^T B g)) g^T ( −((g^T g)/(g^T B g)) g + B⁻¹ g )
      = ((g^T g)(g^T B⁻¹ g)/(g^T B g)) ( 1 − (g^T g)² / ((g^T B g)(g^T B⁻¹ g)) )
      ≥ 0.
Substitute u = B^{1/2} g and v = B^{−1/2} g and use
|u^T v| ≤ ‖u‖ ‖v‖.
Substitute u = B1/2g and v = B−1/2g and use
|uTv| ≤ ‖u‖‖v‖.
For (ii), define
h(α) = m(p(1 + α)) = f(x_k) + (p^U + α(p^B − p^U))^T g + (1/2) (p^U + α(p^B − p^U))^T B (p^U + α(p^B − p^U)),
and show that h′(α) ≤ 0 for α ∈ (0, 1):
h′(α) = (p^B − p^U)^T (g + B p^U) + α (p^B − p^U)^T B (p^B − p^U)
      ≤ (p^B − p^U)^T (g + B p^U + B(p^B − p^U))
      = (p^B − p^U)^T (g + B p^B) = 0.
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dogleg Method (continued)
From Lemma 1 we conclude that the path p(τ) has
to intersect the trust region boundary ‖p‖ = ∆
at exactly one point if ‖pB‖ ≥ ∆, and nowhere
otherwise.
Since m is decreasing along the path, the optimal
solution of the trust region subproblem (along
the path) is either pB (if ‖pB‖ ≤ ∆) or the point
of intersection of the path and the trust region
boundary. In the latter case, we compute τ by
solving the following scalar quadratic equation
‖p^U + (τ − 1)(p^B − p^U)‖² = Δ².
15
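A sketch of the dogleg step for positive definite B, including the scalar quadratic for the intersection with the boundary:

import numpy as np

def dogleg(g, B, delta):
    """Dogleg step for positive definite B (a sketch of the method above)."""
    pB = -np.linalg.solve(B, g)                  # full Newton-like step
    if np.linalg.norm(pB) <= delta:
        return pB
    pU = -(g @ g) / (g @ B @ g) * g              # minimizer along -g
    if np.linalg.norm(pU) >= delta:
        return -(delta / np.linalg.norm(g)) * g  # truncated steepest descent
    # second segment: solve ||pU + s (pB - pU)||^2 = delta^2 for s in (0, 1]
    d = pB - pU
    a, b, c = d @ d, 2.0 * (pU @ d), pU @ pU - delta ** 2
    s = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return pU + s * d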
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Two-D. Subspace Minimization
The dogleg method can be made more sophis-
ticated by allowing the search in the entire two-
dimensional subspace spanned by vectors g and
B−1g instead of the search along the path p.
The trust region subproblem is then replaced by
min m(p) = f + g^T p + (1/2) p^T B p
s.t. ‖p‖ ≤ Δ,
p ∈ span[g, B⁻¹g].
Clearly, the Cauchy point pC is feasible for this
subproblem so the optimal solution yields at least
as much reduction in m as the Cauchy point,
resulting in the global convergence of the algo-
rithm. Since the space span[g, B−1g] contains
the whole dogleg trajectory, the two-dimensional
subspace search is an obvious extension of the
dogleg method.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Two-D. Subspace Minimization
An important advantage of the two-dimensional
subspace minimization approach is that it can
easily handle the case of indefinite B.
When B is indefinite, the two-dimensional sub-
space is replaced by span[g, (B +λI)−1g], where
λ > 0 is chosen such that the matrix B + λI is
positive definite.
If B is indefinite, then at least one of its eigen-
values is negative. Let λ1 be the most negative
eigenvalue of B. It suffices to take λ > −λ1 to
make B + λI positive definite.
There exist efficient methods that find the most
negative eigenvalue of the matrix.
17
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example
Consider the Rosenbrock function:
f(x) = 100(x2 − x1²)² + (1 − x1)².
[Figure: level sets of the Rosenbrock function; the minimizer is at (1, 1).]
The function has a deep "valley" around the
points that satisfy the equation x2 = x1².
Build the quadratic model at point (0,1) and
solve the trust region subproblem for different
values of ∆: ∆1 = 1/4 and ∆2 = 1/2.
18
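As a quick check (our computation): at (0,1) the gradient is (−2, 200) and the Hessian diag(−398, 200) is indefinite, so the dogleg sketch above does not apply and we fall back on the Cauchy point sketch from the earlier slide:

import numpy as np

g = np.array([-2.0, 200.0])                    # gradient of f at (0, 1)
B = np.array([[-398.0, 0.0], [0.0, 200.0]])    # Hessian of f at (0, 1)
for delta in (0.25, 0.5):
    p = cauchy_point(g, B, delta)              # sketch from the Cauchy point slide
    print(delta, p, np.linalg.norm(p))
# Here g^T B g > 0 but the unconstrained minimizer lies outside the region,
# so tau = 1 for both radii: the step runs along -g all the way to the
# boundary, with ||p|| = 1/4 and ||p|| = 1/2 respectively.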