Introduction to Global Optimization
Fabio Schoen, 2008
http://gol.dsi.unifi.it/users/schoen
Global Optimization Problems
min_{x ∈ S ⊆ R^n} f(x)

What is meant by global optimization? Of course we would like to find

f* = min_{x ∈ S ⊆ R^n} f(x)

and

x* = arg min f(x), i.e. f(x*) ≤ f(x) ∀ x ∈ S
This definition is unsatisfactory:

the problem is “ill posed” in x (two objective functions which differ only slightly might have global optima which are arbitrarily far apart)

it is however well posed in the optimal values: ‖f − g‖ ≤ δ ⇒ |f* − g*| ≤ ε
Quite often we are satisfied with looking for f* and searching for one or more feasible solutions such that

f(x) ≤ f(x*) + ε

Frequently, however, even this is too ambitious a task!
Research in Global Optimization
the problem is highly relevant, especially in applications

the problem is very hard (perhaps too hard) to solve

there are plenty of publications on global optimization algorithms for specific problem classes

there are only relatively few papers with relevant theoretical content

often elegant theories have produced weak algorithms and, vice versa, the best computational methods often lack a sound theoretical support
many global optimization papers get published in applied research journals

Bazaraa, Sherali, Shetty, “Nonlinear Programming: Theory and Algorithms”, 1993: the words “global optimum” appear for the first time on page 99, the second time on page 132, then on page 247:

“A desirable property of an algorithm for solving [an optimization] problem is that it generates a sequence of points converging to a global optimal solution. In many cases, however, we may have to be satisfied with less favorable outcomes.”

After this (in 638 pages) it never appears again. “Global optimization” is never cited.
A similar situation in Bertsekas, “Nonlinear Programming” (1999): 777 pages, but only the definition of global minima and maxima is given!

Nocedal & Wright, “Numerical Optimization”, 2nd edition, 2006: “Global solutions are needed in some applications, but for many problems they are difficult to recognize and even more difficult to locate. . . many successful global optimization algorithms require the solution of many local optimization problems, to which the algorithms described in this book can be applied.”
Complexity
Global optimization is “hopeless”: without “global” information no algorithm will find a certifiable global optimum unless it generates a dense sample. There exists a rigorous definition of “global” information – some examples:

the number of local optima

the global optimum value

for global optimization problems over a box, (an upper bound on) the Lipschitz constant:

|f(y) − f(x)| ≤ L ‖x − y‖  ∀ x, y

concavity of the objective function + convexity of the feasible region

an explicit representation of the objective function as the difference between two convex functions (+ convexity of the feasible set)
Complexity
Global optimization is computationally intractable also according to classical complexity theory. Special cases: quadratic programming,

min_{l ≤ Ax ≤ u} (1/2) x^T Q x + c^T x

is NP-hard [Sahni, 1974] and, when considered as a decision problem, NP-complete [Vavasis, 1990].
Many special cases are still NP-hard:

norm maximization on a parallelotope:

max ‖x‖  s.t.  b ≤ Ax ≤ c

quadratic optimization on a hyper-rectangle (A = I), even when only one eigenvalue of Q is negative

quadratic minimization over a simplex:

min_{x ≥ 0} (1/2) x^T Q x + c^T x  s.t.  ∑_j x_j = 1

Even checking that a point is a local optimum is NP-hard.
Applications of global optimization
concave minimization – quantity discounts, scale economies

fixed charge problems

combinatorial optimization – binary linear programming:

min c^T x + K x^T (1 − x)
Ax = b
x ∈ [0, 1]

or:

min c^T x
Ax = b
x ∈ [0, 1]
x^T (1 − x) = 0
Minimization of cost functions which are neither convex nor concave. E.g.: finding the minimum energy conformation of complex molecules – Lennard-Jones micro-clusters, protein folding, protein–ligand docking.

Example: Lennard-Jones. The pair potential due to two atoms at X1, X2 ∈ R^3 is

v(r) = 1/r^12 − 2/r^6

where r = ‖X1 − X2‖. The total energy of a cluster of N atoms located at X1, ..., XN ∈ R^3 is defined as

∑_{i=1,...,N} ∑_{j<i} v(‖Xi − Xj‖)

This function has a number of local (non-global) minima which grows like exp(N).
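As a concrete illustration (a Python sketch, not from the slides), the cluster energy in this reduced form is only a few lines; the pair potential attains its minimum value −1 at r = 1:

```python
import numpy as np

def lj_energy(X):
    """Total Lennard-Jones energy of a cluster.

    X: (N, 3) array of atom positions. Uses the reduced pair
    potential v(r) = r**-12 - 2*r**-6, minimized (value -1) at r = 1.
    """
    N = X.shape[0]
    E = 0.0
    for i in range(N):
        for j in range(i):
            r = np.linalg.norm(X[i] - X[j])
            E += r**-12 - 2.0 * r**-6
    return E

# Two atoms at unit distance attain the pair minimum:
print(lj_energy(np.array([[0.0, 0.0, 0.0],
                          [1.0, 0.0, 0.0]])))   # -1.0
```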
Lennard-Jones potential

[figure: plot of the attractive term, the repulsive term and the total Lennard-Jones pair potential v(r) for r ∈ [0.5, 5]]
Protein folding and docking

Potential energy model: E = El + Ea + Ed + Ev + Ee, where:

El = ∑_{i∈L} (1/2) K^b_i (r_i − r_i^0)²   (contribution of pairs of bonded atoms)

Ea = ∑_{i∈A} (1/2) K^θ_i (θ_i − θ_i^0)²   (angle between 3 bonded atoms)

Ed = ∑_{i∈T} (1/2) K^φ_i [1 + cos(n φ_i − γ)]   (dihedrals)
Ev = ∑_{(i,j)∈C} ( A_ij / R_ij^12 − B_ij / R_ij^6 )   (van der Waals)

Ee = (1/2) ∑_{(i,j)∈C} q_i q_j / (ε R_ij)   (Coulomb interaction)
Docking
Given two macro-molecules M1, M2, find their minimal energy coupling. If no bonds are changed, then to find the optimal docking it is sufficient to minimize:

Ev + Ee = ∑_{i∈M1, j∈M2} ( A_ij / R_ij^12 − B_ij / R_ij^6 ) + (1/2) ∑_{i∈M1, j∈M2} q_i q_j / (ε R_ij)
Main algorithmic strategies
Two main families:

1. with global information (“structured problems”)
2. without global information (“unstructured problems”)

Structured problems ⇒ stochastic and deterministic methods
Unstructured problems ⇒ typically stochastic algorithms

Every global optimization method should try to find a balance between

exploration of the feasible region
approximation of the optimum
Example: Lennard Jones
LJ_N = min LJ(X) = min ∑_{i=1}^{N−1} ∑_{j=i+1}^{N} ( 1/‖Xi − Xj‖^12 − 2/‖Xi − Xj‖^6 )

This is a highly structured problem. But is it easy/convenient to use its structure? And how?
LJ
The map

F1 : R^{3N} → R^{N(N−1)/2}_+
F1(X1, ..., XN) = ( ‖X1 − X2‖², ..., ‖X_{N−1} − X_N‖² )

is convex, and the function

F2 : R^{N(N−1)/2}_+ → R
F2(r_12, ..., r_{N−1,N}) = ∑ 1/r_ij^6 − 2 ∑ 1/r_ij^3

is the difference between two convex functions. Thus LJ(X) can be seen as the difference between two convex functions (a d.c. programming problem).
NB: every C² function is d.c., but often its d.c. decomposition is not known. D.C. optimization is very elegant and there exists a nice duality theory, but algorithms are typically very inefficient.
A primal method for d.c. optimization
A “cutting plane” method (just an example, not particularly efficient, useless for high dimensional problems). Any unconstrained d.c. problem can be represented as an equivalent problem with a linear objective, a convex constraint and a reverse convex constraint. If g, h are convex, then min g(x) − h(x) is equivalent to:

min z
g(x) − h(x) ≤ z

which is equivalent to

min z
g(x) ≤ w
h(x) + z ≥ w
D.C. canonical form
min c^T x
g(x) ≤ 0
h(x) ≥ 0

where h, g are convex. Let

Ω = {x : g(x) ≤ 0},  C = {x : h(x) ≤ 0}

Assumptions: 0 ∈ int Ω ∩ int C and c^T x > 0 ∀ x ∈ Ω \ int C.

Fundamental property: if a D.C. problem admits an optimum, at least one optimum belongs to ∂Ω ∩ ∂C.
Discussion of the assumptions
g(0) < 0, h(0) < 0, c^T x > 0 for all feasible x. Let x̄ be a solution to the convex problem

min c^T x  s.t.  g(x) ≤ 0

If h(x̄) ≥ 0 then x̄ solves the d.c. problem. Otherwise c^T x > c^T x̄ for all feasible x. Coordinate transformation y = x − x̄:

min c^T y
ḡ(y) ≤ 0
h̄(y) ≥ 0

where ḡ(y) = g(y + x̄). Then c^T y > 0 for all feasible solutions and h̄(0) > 0; by continuity it is possible to choose x̄ so that ḡ(0) < 0.
[figure: the convex set Ω, the reverse convex set C containing the origin, and the level line c^T x = 0]
Let x̄ be the best known solution. Let

D(x̄) = {x ∈ Ω : c^T x ≤ c^T x̄}

If D(x̄) ⊆ C then x̄ is optimal. Check: a polytope P (with known vertices) is built which contains D(x̄). If all vertices of P are in C ⇒ optimal solution. Otherwise let v be the best feasible vertex; the intersection of the segment [0, v] with ∂C (if feasible) is an improving point x̄. Otherwise a cut is introduced in P which is tangent to Ω in x̄.
[figure: the set D(x̄) = {x ∈ Ω : c^T x ≤ c^T x̄} inside Ω, together with C and the level line c^T x = 0]
Initialization
Given a feasible solution x̄, take a polytope P such that

P ⊇ D(x̄)

i.e.

y : c^T y ≤ c^T x̄, y feasible ⇒ y ∈ P

If P ⊂ C, i.e. if y ∈ P ⇒ h(y) ≤ 0, then x̄ is optimal. Checking this is easy if we know the vertices of P.
[figure: a polytope P ⊇ D(x̄) with vertices V1, ..., Vk; V⋆ := arg max_j h(Vj)]
Step 1
Let V⋆ be the vertex with largest h(·) value. Surely h(V⋆) > 0 (otherwise we stop with an optimal solution). Moreover h(0) < 0 (0 is in the interior of C). Thus the segment from V⋆ to 0 must intersect the boundary of C. Let x_k be the intersection point. It might be feasible (⇒ improving) or not.
[figure: the intersection point x_k = ∂C ∩ [V⋆, 0]]
[figure: if x_k ∈ Ω, set x̄ := x_k]
[figure: otherwise, if x_k ∉ Ω, the polytope is divided]
Duality for d.c. problems
min_{x∈S} g(x) − h(x)

where g, h are convex. Let

h⋆(u) := sup {u^T x − h(x) : x ∈ R^n}
g⋆(u) := sup {u^T x − g(x) : x ∈ R^n}

be the conjugate functions of h and g. The problem

inf {h⋆(u) − g⋆(u) : h⋆(u) < +∞}

is the Fenchel–Rockafellar dual. If min g(x) − h(x) admits an optimum, then the Fenchel dual is a strong dual.
If x⋆ ∈ arg min g(x) − h(x), then

u⋆ ∈ ∂h(x⋆)

(∂ denotes the subdifferential) is dual optimal, and if u⋆ ∈ arg min h⋆(u) − g⋆(u), then

x⋆ ∈ ∂g⋆(u⋆)

is an optimal primal solution.
A primal/dual algorithm
P_k : min g(x) − (h(x_k) + (x − x_k)^T y_k)

and

D_k : min h⋆(y) − (g⋆(y_{k−1}) + x_k^T (y − y_{k−1}))
Exact Global Optimization
GlobOpt - relaxations
Consider the global optimization problem (P):

min f(x)
x ∈ X

and assume the min exists and is finite, and that we can use a relaxation (R):

min g(y)
y ∈ Y

Usually both X and Y are subsets of the same space R^n.

Recall: (R) is a relaxation of (P) iff:

X ⊆ Y
g(x) ≤ f(x) for all x ∈ X
Branch and Bound
1. Solve the relaxation (R) and let L be its (global) optimum value (assume (R) is feasible)

2. (Heuristically) solve the original problem (P) (or, more generally, find a “good” feasible solution to (P) in X). Let U be the best feasible function value known

3. If U − L ≤ ε then stop: U is a certified ε-optimum for (P)

4. Otherwise split X and Y into two parts and apply the same method to each of them (a sketch of the whole scheme follows)
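A minimal Python sketch of this scheme. The helpers `lower_bound(box)` (optimal value of a relaxation on the box) and `heuristic(box)` (a feasible point in the box) are assumptions supplied by the user, not something from the slides:

```python
import heapq

def branch_and_bound(f, lower_bound, heuristic, box, eps=1e-6, max_nodes=100000):
    """Generic box-based branch and bound.

    f: objective; lower_bound(box) -> relaxation value L on the box;
    heuristic(box) -> a feasible point in the box; box: list of (lo, hi).
    """
    U = f(heuristic(box))                      # best known upper bound
    heap = [(lower_bound(box), box)]           # nodes ordered by lower bound
    nodes = 0
    while heap and nodes < max_nodes:
        L, b = heapq.heappop(heap)
        if U - L <= eps:                       # certified eps-optimum
            break
        nodes += 1
        # split the box along its longest edge
        i = max(range(len(b)), key=lambda k: b[k][1] - b[k][0])
        lo, hi = b[i]
        mid = 0.5 * (lo + hi)
        for part in (b[:i] + [(lo, mid)] + b[i+1:],
                     b[:i] + [(mid, hi)] + b[i+1:]):
            U = min(U, f(heuristic(part)))     # update the incumbent
            Lp = lower_bound(part)
            if Lp < U - eps:                   # keep only promising nodes
                heapq.heappush(heap, (Lp, part))
    return U
```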
Tools
“good relaxations”: easy yet accurate
good upper bounding, i.e., good heuristics for (P)
Good relaxations can be obtained, e.g., through:
convex relaxations
domain reduction
Convex relaxations
Assume X is convex and Y = X. If g is the convex envelope of f on X, then solving the convex relaxation (R) gives, in one step, the certified global optimum of (P).

g(x) is a convex under-estimator of f on X if:

g(x) is convex
g(x) ≤ f(x) ∀ x ∈ X

g is the convex envelope of f on X if:

g is a convex under-estimator of f
g(x) ≥ h(x) ∀ x ∈ X, for every convex under-estimator h of f
A 1-D example

[figure]

Convex under-estimator

[figure]

Branching

[figure]

Bounding

[figure: branch-and-bound tree with the upper bound, the lower bounds and the fathomed nodes]
Relaxation of the feasible domain
Let

min_{x∈S} f(x)

be a GlobOpt problem where f is convex, while S is non-convex. A relaxation (outer approximation) is obtained by replacing S with a larger set Q. If Q is convex ⇒ convex optimization problem. If the optimal solution of

min_{x∈Q} f(x)

belongs to S ⇒ it is an optimal solution of the original problem.
Example
min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

[figure: the box [0, 5] × [0, 3] and the non-convex feasible region xy ≤ 3]
Relaxation
min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

We know that

(x + y)² = x² + y² + 2xy

thus

xy = ((x + y)² − x² − y²)/2

and, as x and y are non-negative, x² ≤ 5x and y² ≤ 3y; thus a (convex) relaxation of xy ≤ 3 is

(x + y)² − 5x − 3y ≤ 6

(a convex constraint)
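A quick numerical check of this relaxed problem (a sketch using scipy, not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Convex relaxation: min -x - 2y
#   s.t. (x+y)^2 - 5x - 3y <= 6, x in [0,5], y in [0,3]
res = minimize(
    lambda v: -v[0] - 2 * v[1],
    x0=np.array([1.0, 1.0]),
    bounds=[(0, 5), (0, 3)],
    constraints=[{"type": "ineq",
                  "fun": lambda v: 6 - (v[0] + v[1])**2 + 5*v[0] + 3*v[1]}],
)
print(res.x, res.fun)   # approximately (2, 3), value -8
```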
Relaxation
[figure: the convex relaxed feasible region]

Optimal solution of the relaxed convex problem: (2, 3) (value: −8)
Stronger Relaxation
min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

Since x ≤ 5 and y ≤ 3:

(5 − x)(3 − y) ≥ 0 ⇒ 15 − 3x − 5y + xy ≥ 0 ⇒ xy ≥ 3x + 5y − 15

Thus a (convex) relaxation of xy ≤ 3 is

3x + 5y − 15 ≤ 3,  i.e.  3x + 5y ≤ 18
Relaxation
[figure: the linear relaxation 3x + 5y ≤ 18 over the box]

The optimal solution of the convex (linear) relaxation is (1, 3), which is feasible ⇒ optimal for the original problem.
Convex (concave) envelopes
How to build convex envelopes of a function, or how to relax a non-convex constraint?

Convex envelopes ⇒ lower bounds
Convex envelopes of −f(x) ⇒ upper bounds

Constraint g(x) ≤ 0 ⇒ if h(x) is a convex underestimator of g, then h(x) ≤ 0 is a convex relaxation.
Constraint g(x) ≥ 0 ⇒ if h(x) is concave and h(x) ≥ g(x), then h(x) ≥ 0 is a “convex” constraint.
Convex envelopes
Definition: a function is polyhedral if it is the pointwise maximum of a finite number of linear functions. (NB: in general, the convex envelope is the pointwise supremum of affine minorants.)

The generating set X of a function f over a convex set P is the set

X = {x ∈ R^n : (x, f(x)) is a vertex of epi(conv_P(f))}

I.e., given f we first build its convex envelope on P and then take the epigraph of the envelope, {(x, y) : x ∈ P, y ≥ conv_P f(x)}. This is a convex set whose extreme points can be denoted by V; X consists of the x coordinates of V.
Generating sets

[figures: one-dimensional examples of generating sets of convex envelopes]
Characterization
Let f(x) be continuously differentiable on a polytope P. The convex envelope of f on P is polyhedral if and only if

X(f) = Vert(P)

(the generating set is the vertex set of P).

Corollary: let f1, ..., fm ∈ C¹(P) and let ∑_i fi(x) possess a polyhedral convex envelope on P. Then

Conv(∑_i fi(x)) = ∑_i Conv fi(x)

iff the generating set of ∑_i Conv(fi(x)) is Vert(P).
Characterization
If f(x) is such that Conv f(x) is polyhedral, then an affine function h(x) such that

1. h(x) ≤ f(x) for all x ∈ Vert(P)

2. there exist n + 1 affinely independent vertices of P, V1, ..., V_{n+1}, such that f(Vi) = h(Vi), i = 1, ..., n + 1

belongs to the polyhedral description of Conv f(x), and

h(x) = conv f(x)

for any x ∈ Conv(V1, ..., V_{n+1}).
Characterization
The condition may be reversed: given m affine functions h1, ..., hm such that, for each of them,

1. hj(x) ≤ f(x) for all x ∈ Vert(P)

2. there exist n + 1 affinely independent vertices of P, V1, ..., V_{n+1}, such that f(Vi) = hj(Vi), i = 1, ..., n + 1

then the function ψ(x) = max_j hj(x) is the (polyhedral) convex envelope of f iff

the generating set of ψ is Vert(P)
for every vertex Vi we have ψ(Vi) = f(Vi)
Sufficient condition
If f(x) is lower semi-continuous on P and for all x ∉ Vert(P) there exists a line ℓx such that x is in the interior of P ∩ ℓx and f is concave in a neighborhood of x on ℓx, then Conv f(x) is polyhedral.

Application: let

f(x) = ∑_{i,j} αij xi xj

The sufficient condition holds for f on [0, 1]^n ⇒ bilinear forms are polyhedral on a hypercube.
Application: a bilinear term
(Al-Khayyal, Falk (1983)): let x ∈ [ℓx, ux], y ∈ [ℓy, uy]. Then the convex envelope of xy on [ℓx, ux] × [ℓy, uy] is

φ(x, y) = max{ℓy x + ℓx y − ℓx ℓy; uy x + ux y − ux uy}

In fact, φ(x, y) is an under-estimate of xy:

(x − ℓx)(y − ℓy) ≥ 0 ⇒ xy ≥ ℓy x + ℓx y − ℓx ℓy

and analogously for xy ≥ uy x + ux y − ux uy.
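A small Python sketch of this envelope, with illustrative bounds (not from the slides):

```python
def bilinear_convex_envelope(x, y, lx, ux, ly, uy):
    """Convex envelope of x*y on the box [lx, ux] x [ly, uy]
    (Al-Khayyal & Falk): max of the two supporting planes."""
    return max(ly * x + lx * y - lx * ly,
               uy * x + ux * y - ux * uy)

# On [0, 1] x [0, 1] the envelope matches xy at the vertices:
for (x, y) in [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]:
    print((x, y), x * y, bilinear_convex_envelope(x, y, 0, 1, 0, 1))
# At (0.5, 0.5): xy = 0.25 but the envelope gives 0.0 (strict underestimate)
```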
Bilinear terms
xy ≥ φ(x, y) = max{ℓy x + ℓx y − ℓx ℓy; uy x + ux y − ux uy}

No other (polyhedral) function underestimating xy is tighter. In fact, ℓy x + ℓx y − ℓx ℓy belongs to the convex envelope: it underestimates xy and coincides with it at 3 vertices, (ℓx, ℓy), (ℓx, uy), (ux, ℓy). Analogously for the other affine function. All vertices are interpolated by these 2 underestimating hyperplanes ⇒ they form the convex envelope of xy.
All easy then?
Of course not! Many things can go wrong . . .

It is true that, on the hypercube, a bilinear form ∑_{i<j} αij xi xj is polyhedral (easy to see), but we cannot guarantee in general that the generating set of the envelope is the vertex set of the hypercube (in particular, when the α's have opposite signs).

If the set is not a hypercube, even a single bilinear term might be non-polyhedral: e.g. xy on the triangle 0 ≤ x ≤ y ≤ 1.

Finding the (polyhedral) convex envelope of a bilinear form on a generic polytope P is NP-hard!
Fractional terms
A convex underestimate of a fractional term x/y over a box can be obtained through

w ≥ ℓx/y + x/uy − ℓx/uy              if ℓx ≥ 0
w ≥ x/uy − ℓx y/(ℓy uy) + ℓx/ℓy      if ℓx < 0
w ≥ ux/y + x/ℓy − ux/ℓy              if ℓx ≥ 0
w ≥ x/ℓy − ux y/(ℓy uy) + ux/uy      if ℓx < 0

(a better underestimate exists)
Univariate concave terms
If f(x), x ∈ [ℓx, ux], is concave, then the convex envelope is simply its linear interpolation at the extremes of the interval:

f(ℓx) + ((f(ux) − f(ℓx))/(ux − ℓx)) (x − ℓx)
Underestimating a general nonconvex function
Let f(x) ∈ C² be a general non-convex function. Then a convex underestimate on a box can be defined as

φ(x) = f(x) − ∑_{i=1}^n αi (xi − ℓi)(ui − xi)

where the αi > 0 are parameters. The Hessian of φ is

∇²φ(x) = ∇²f(x) + 2 diag(α)

φ is convex iff ∇²φ(x) is positive semi-definite on the box.
How to choose the αi's? One possibility is the uniform choice αi = α. In this case convexity of φ is obtained iff

α ≥ max{0, −(1/2) min_{x∈[ℓ,u]} λmin(x)}

where λmin(x) is the minimum eigenvalue of ∇²f(x). (A sketch of the resulting underestimator follows.)
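A Python sketch of the resulting underestimator, assuming α has already been computed (e.g. from the eigenvalue bound above); the test function is an assumption chosen only for illustration:

```python
import numpy as np

def box_quadratic_underestimator(f, alpha, lo, hi):
    """Return phi(x) = f(x) - sum_i alpha_i (x_i - l_i)(u_i - x_i),
    a convex underestimator of f on the box [lo, hi] when each
    alpha_i >= -lambda_min(Hessian of f)/2 over the box."""
    lo, hi, alpha = map(np.asarray, (lo, hi, alpha))
    def phi(x):
        x = np.asarray(x, dtype=float)
        return f(x) - np.sum(alpha * (x - lo) * (hi - x))
    return phi

# Hypothetical example: f(x) = sin(x) on [0, 2*pi]; f'' = -sin(x) >= -1,
# so alpha = 0.5 suffices for convexity of phi.
phi = box_quadratic_underestimator(lambda x: np.sin(x[0]),
                                   [0.5], [0.0], [2 * np.pi])
print(phi([np.pi]))   # well below f(pi) = 0
```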
Key properties
φ(x) ≤ f(x)
φ interpolates f at all vertices of [ℓ, u]
φ is convex
Maximum separation:

max (f(x) − φ(x)) = (1/4) α ∑_i (ui − ℓi)²

Thus the error in underestimation decreases when the box is split.
Estimation of α
Compute an interval Hessian [H] on [ℓ, u], with [H(x)]ij = [h^L_ij(x), h^U_ij(x)]. Find α such that [H] + 2 diag(α) is positive semi-definite.

Gerschgorin's theorem for real matrices:

λmin ≥ min_i ( hii − ∑_{j≠i} |hij| )

Extension to interval matrices:

λmin ≥ min_i ( h^L_ii − ∑_{j≠i} max{|h^L_ij|, |h^U_ij|} (uj − ℓj)/(ui − ℓi) )
Improvements
New relaxation functions (other than quadratic). Example:

Φ(x; γ) = −∑_{i=1}^n (1 − e^{γi(xi−ℓi)})(1 − e^{γi(ui−xi)})

gives a tighter underestimate than the quadratic function.

Partitioning: partition the domain into a small number of regions (hyper-rectangles); evaluate a convex underestimator in each region; join the underestimators to form a single convex function on the whole domain.
Domain (range) reduction
Techniques for cutting the feasible region without cutting the global optimum solution. Simplest approaches: feasibility-based and optimality-based range reduction (RR). Let the problem be:

min_{x∈S} f(x)

Feasibility-based RR asks for solving

ℓi = min{xi : x ∈ S},  ui = max{xi : x ∈ S}

for all i ∈ {1, ..., n} and then adding the constraints x ∈ [ℓ, u] to the problem (or to the sub-problems generated during Branch & Bound).
Feasibility Based RR
If S is a polyhedron, RR requires the solution of LPs:

[ℓi, ui] = min / max xi
s.t. Ax ≤ b, x ∈ [L, U]

“Poor man's” LP-based RR: from every constraint ∑_j aij xj ≤ bi in which the coefficient aik of some variable xk is positive,

xk ≤ (1/aik) ( bi − ∑_{j≠k} aij xj )
⇒
xk ≤ (1/aik) ( bi − ∑_{j≠k} min{aij Lj, aij Uj} )

(see the sketch below)
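A Python sketch of this “poor man's” bound tightening (function and variable names are illustrative):

```python
import numpy as np

def poor_mans_rr(A, b, L, U):
    """Tighten upper bounds using each row of Ax <= b.

    For a variable k with A[i, k] > 0:
      x_k <= (b_i - sum_{j != k} min(A[i,j]*L_j, A[i,j]*U_j)) / A[i,k]
    """
    L, U = np.array(L, float), np.array(U, float)
    m, n = A.shape
    for i in range(m):
        # interval lower bound of each term A[i, j] * x_j
        t = np.minimum(A[i] * L, A[i] * U)
        for k in range(n):
            if A[i, k] > 0:
                new_u = (b[i] - (t.sum() - t[k])) / A[i, k]
                U[k] = min(U[k], new_u)
    return L, U

# Example: x1 + x2 <= 3 with x in [0, 5]^2 tightens U to (3, 3)
print(poor_mans_rr(np.array([[1.0, 1.0]]), np.array([3.0]), [0, 0], [5, 5]))
```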
Optimality Based RR
Given an incumbent solution x̄ ∈ S, ranges are updated by solving the sequence:

ℓi = min{xi : f̃(x) ≤ f(x̄), x ∈ S},  ui = max{xi : f̃(x) ≤ f(x̄), x ∈ S}

where f̃(x) is a convex underestimate of f on the current domain. RR can be applied iteratively (i.e., at the end of a complete RR sequence, we might start a new one using the new bounds).
Generalization

Let

min_{x∈X} f(x)   (P)
s.t. g(x) ≤ 0

be a (non-convex) problem and let

min_{x∈X} f̃(x)   (R)
s.t. g̃(x) ≤ 0

be a convex relaxation of (P):

{x ∈ X : g(x) ≤ 0} ⊆ {x ∈ X : g̃(x) ≤ 0}  and
x ∈ X, g(x) ≤ 0 ⇒ f̃(x) ≤ f(x)
R.H.S. perturbation
Let

φ(y) = min_{x∈X} f̃(x)   (Ry)
s.t. g̃(x) ≤ y

be a perturbation of (R). (R) convex ⇒ (Ry) convex for any y. Let x̄ be an optimal solution of (R) and assume that the i-th constraint is active:

g̃i(x̄) = 0

Then, if x̄y is an optimal solution of (Ry), the constraint g̃i(x) ≤ yi is active at x̄y if yi ≤ 0.
Duality
Assume (R) has a finite optimum at x̄ with value φ(0) and Lagrange multipliers μ. Then the hyperplane

H(y) = φ(0) − μ^T y

is a supporting hyperplane of the graph of φ(y) at y = 0, i.e.

φ(y) ≥ φ(0) − μ^T y  ∀ y ∈ R^m
Main result
If (R) is convex with optimum value φ(0), constraint i is active at the optimum and its Lagrange multiplier is μi > 0, then, if U is an upper bound for the original problem (P), the constraint

g̃i(x) ≥ −(U − L)/μi

(where L = φ(0)) is valid for the original problem (P), i.e. it does not exclude any feasible solution with value better than U.
Proof

Problem (Ry) can be seen as a convex relaxation of the perturbed non-convex problem

Φ(y) = min_{x∈X} f(x)
s.t. g(x) ≤ y

and thus φ(y) ≤ Φ(y); underestimating (Ry) produces an underestimate of Φ(y). Let y := ei yi. From duality:

L − μ^T ei yi ≤ φ(ei yi) ≤ Φ(ei yi)

If yi < 0 then U is an upper bound also for Φ(ei yi), thus L − μi yi ≤ U. But if yi < 0 then constraint i is active. For any feasible x there exists a yi < 0 such that g̃i(x) ≤ yi is active ⇒ we may substitute yi with g̃i(x) and deduce L − μi g̃i(x) ≤ U.
Applications
Range reduction: let x ∈ [ℓ, u] in the convex relaxed problem. If variable xi is at its upper bound in the optimal solution, then we can deduce

xi ≥ max{ℓi, ui − (U − L)/λi}

where λi is the optimal multiplier associated with the i-th upper bound. Analogously for active lower bounds:

xi ≤ min{ui, ℓi + (U − L)/λi}

(a sketch follows)
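A Python sketch of this multiplier-based tightening; all names are illustrative, and it simply applies the two formulas above to variables whose bound constraints are active in the relaxation:

```python
def tighten_from_multipliers(l, u, x_opt, lam_upper, lam_lower, U, L, tol=1e-9):
    """Tighten bounds of variables at an active bound in the relaxed
    optimum, using the bound multipliers lam and the gap U - L."""
    gap = U - L
    l, u = list(l), list(u)
    for i in range(len(l)):
        if abs(x_opt[i] - u[i]) < tol and lam_upper[i] > 0:
            l[i] = max(l[i], u[i] - gap / lam_upper[i])   # x_i >= u_i - gap/lam
        elif abs(x_opt[i] - l[i]) < tol and lam_lower[i] > 0:
            u[i] = min(u[i], l[i] + gap / lam_lower[i])   # x_i <= l_i + gap/lam
    return l, u
```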
Let the constraint

a_i^T x ≤ bi

be active at an optimal solution of the convex relaxation (R). Then we can deduce the valid inequality

a_i^T x ≥ bi − (U − L)/μi
Methods based on “merit functions”
Bayesian algorithm: the objective function is considered as a realization of a stochastic process

f(x) = F(x; ω)

A loss function is defined, e.g.:

L(x1, ..., xn; ω) = min_{i=1,...,n} F(xi; ω) − min_x F(x; ω)

and the next point to sample is placed in order to minimize the expected loss (or risk):

x_{n+1} = arg min E( L(x1, ..., xn, x_{n+1}) | x1, ..., xn )
        = arg min E( min(F(x_{n+1}; ω) − F(x; ω)) | x1, ..., xn )
Radial basis method
Given k observations (x1, f1), ..., (xk, fk), an interpolant is built:

s(x) = ∑_{i=1}^k λi Φ(‖x − xi‖) + p(x)

where p is a polynomial of a (prefixed) small degree m and Φ is a radial function like, e.g.:

Φ(r) = r          linear
Φ(r) = r³         cubic
Φ(r) = r² log r   thin plate spline
Φ(r) = e^{−γr²}   Gaussian

The polynomial p is necessary to guarantee the existence of a unique interpolant (i.e. when the matrix Φij = Φ(‖xi − xj‖) is singular).
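A minimal sketch of fitting such an interpolant, assuming one-dimensional data, the cubic radial function and a linear polynomial tail (numpy only; an illustration, not a reference implementation):

```python
import numpy as np

def rbf_fit(x, f):
    """Fit s(t) = sum_i lam_i |t - x_i|^3 + a + b*t to data (x_i, f_i).
    The linear tail plus the orthogonality conditions sum lam_i = 0 and
    sum lam_i x_i = 0 make the augmented linear system nonsingular."""
    x, f = np.asarray(x, float), np.asarray(f, float)
    k = len(x)
    Phi = np.abs(x[:, None] - x[None, :]) ** 3
    P = np.column_stack([np.ones(k), x])          # linear polynomial basis
    A = np.block([[Phi, P], [P.T, np.zeros((2, 2))]])
    rhs = np.concatenate([f, np.zeros(2)])
    sol = np.linalg.solve(A, rhs)
    lam, coef = sol[:k], sol[k:]
    return lambda t: np.abs(t - x) ** 3 @ lam + coef[0] + coef[1] * t

s = rbf_fit([0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 0.5, 2.0])
print(s(1.0))   # reproduces the data value 0.0 (up to rounding)
```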
“Bumpiness”
Let f⋆_k be an estimate of the value of the global optimum after k observations. Let s^y_k be the (unique) interpolant of the data points

(xi, fi), i = 1, ..., k, plus (y, f⋆_k)

Idea: the most likely location of y is such that the resulting interpolant has minimum “bumpiness”. Bumpiness measure:

σ(s_k) = (−1)^{m+1} ∑ λi s^y_k(xi)
TO BE DONE
Stochastic methods
Pure Random Search: random uniform sampling over the feasible region

Best start: like Pure Random Search, but a local search is started from the best observation

Multistart: local searches started from randomly generated starting points (a sketch follows)
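A compact Python sketch of Multistart using scipy's local optimizer; the multimodal test function is an arbitrary illustration:

```python
import numpy as np
from scipy.optimize import minimize

def multistart(f, bounds, n_starts=50, seed=0):
    """Multistart: run a local search from each uniform random point
    and keep the best local optimum found."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        res = minimize(f, x0, bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

# A 1-D multimodal example: many local minima in [0, 5]
f = lambda x: np.sin(5 * x[0]) + 0.1 * (x[0] - 2.5) ** 2
print(multistart(f, [(0.0, 5.0)]).x)
```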
[figures: sample points on a multimodal one-dimensional function and the local optima reached from them]
Clustering methods
Given a uniform sample, evaluate the objective function

Sample transformation (or concentration): either a fraction of the “worst” points is discarded, or a few steps of a gradient method are performed

The remaining points are clustered

From the best point in each cluster a single local search is started
Uniform sample

[figure: a uniform sample of points over the contour lines of the objective in [0, 5]²]
Sample concentration

[figure: the sample after concentration, with points moved towards local minima]
Clustering

[figure: the concentrated sample grouped into clusters, with the best point of each cluster highlighted]
Local optimization

[figure: local searches started from the best point of each cluster]
Clustering: MLSL
Sampling proceeds in batches of N points. Given sample points X1, ..., Xk ∈ [0, 1]^n, label Xj as “clustered” iff ∃ Y ∈ {X1, ..., Xk}:

‖Xj − Y‖ ≤ Δk := (1/√(2π)) ( σ Γ(1 + n/2) (log k)/k )^{1/n}

and

f(Y) ≤ f(Xj)
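A Python sketch of this labeling rule, following the Δk formula above (the sample, σ and the toy objective are illustrative assumptions):

```python
import numpy as np
from scipy.special import gamma

def mlsl_clustered(X, fvals, sigma=4.0):
    """Label sample points as 'clustered' when a better point lies
    within the critical distance Delta_k."""
    k, n = X.shape
    dk = (1.0 / np.sqrt(2.0 * np.pi)) * \
         (sigma * gamma(1.0 + n / 2.0) * np.log(k) / k) ** (1.0 / n)
    labels = np.zeros(k, dtype=bool)
    for j in range(k):
        d = np.linalg.norm(X - X[j], axis=1)
        near_better = (d <= dk) & (fvals <= fvals[j])
        near_better[j] = False          # a point does not cluster to itself
        labels[j] = near_better.any()
    return labels

rng = np.random.default_rng(0)
X = rng.random((100, 2))                  # uniform sample in [0, 1]^2
fv = np.sum((X - 0.5) ** 2, axis=1)       # toy objective
print(mlsl_clustered(X, fv).sum(), "points labeled as clustered")
```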
Simple Linkage
A sequential sample is generated (batches consist of a single observation). A local search is started only from the last sampled point (i.e. there is no “recall”), unless there exists a sufficiently near sampled point with a better function value.
Smoothing methods
Given f : R^n → R, the Gaussian transform is defined as:

⟨f⟩_λ(x) = (1/(π^{n/2} λ^n)) ∫_{R^n} f(y) exp(−‖y − x‖²/λ²) dy

When λ is sufficiently large ⇒ ⟨f⟩_λ is convex. Idea: starting with a large enough λ, minimize the smoothed function and slowly decrease λ towards 0.
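A Monte Carlo sketch of this transform: with the normalization above, substituting y = x + λu with u ~ N(0, I/2) gives ⟨f⟩_λ(x) = E[f(x + λZ/√2)] for standard normal Z. (The test function is illustrative.)

```python
import numpy as np

def gaussian_transform(f, x, lam, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian transform <f>_lam(x)
    via <f>_lam(x) = E[ f(x + lam * Z / sqrt(2)) ], Z standard normal."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    Z = rng.standard_normal((n_samples, x.size))
    return float(np.mean([f(x + lam * z / np.sqrt(2.0)) for z in Z]))

f = lambda v: np.sin(10.0 * v[0]) + v[0] ** 2
print(gaussian_transform(f, [0.5], 0.01))   # close to f(0.5)
print(gaussian_transform(f, [0.5], 2.0))    # oscillations averaged away
```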
Smoothing methods

[figures: a multimodal function on [−10, 10]² and its Gaussian transforms for increasing λ, becoming progressively smoother]
Transformed function landscape
Elementary idea: local optimization smooths out many “high frequency” oscillations.
[figures: a multimodal one-dimensional function on [0, 10] and its landscape as transformed by local optimization]
Monotonic Basin-Hopping
k := 0; f⋆ := +∞
while k < MaxIter do
    Xk := random initial solution
    X⋆k := arg min f(x; Xk)   (local minimization started at Xk)
    fk := f(X⋆k)
    if fk < f⋆ then f⋆ := fk
    NoImprove := 0
    while NoImprove < MaxNoImprove do
        X := random perturbation of Xk
        Y := arg min f(x; X)   (local minimization started at X)
        if f(Y) < f⋆ then Xk := Y; NoImprove := 0; f⋆ := f(Y)
        else NoImprove++
    end while
    k := k + 1
end while
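A runnable Python rendering of this scheme using scipy for the local searches; a close variant in which acceptance is tested against the current restart's best value, with illustrative parameters:

```python
import numpy as np
from scipy.optimize import minimize

def monotonic_basin_hopping(f, bounds, max_iter=10, max_no_improve=20,
                            step=0.3, seed=0):
    """Monotonic Basin-Hopping: repeatedly perturb the current local
    minimum and accept a move only if the new local minimum improves."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    f_best, x_best = np.inf, None
    for _ in range(max_iter):
        res = minimize(f, rng.uniform(lo, hi), bounds=bounds)  # random restart
        x, fx = res.x, res.fun
        no_improve = 0
        while no_improve < max_no_improve:
            x_pert = np.clip(x + step * rng.standard_normal(lo.size), lo, hi)
            res = minimize(f, x_pert, bounds=bounds)
            if res.fun < fx:                    # monotonic acceptance
                x, fx, no_improve = res.x, res.fun, 0
            else:
                no_improve += 1
        if fx < f_best:
            f_best, x_best = fx, x
    return x_best, f_best

f = lambda v: np.sin(10 * v[0]) + 0.3 * (v[0] - 5) ** 2   # multimodal test
print(monotonic_basin_hopping(f, [(0.0, 10.0)]))
```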
[figures: successive iterations of Monotonic Basin-Hopping on the one-dimensional test function]