Optimization theory and methods: from convex to nonconvex
Yong Xia
School of Mathematical Sciences, Beihang University
May 27, 2020
(Beihang University) 1 / 79
Outline
1 Theory: Hidden Convexity
2 Theory: Optimality Conditions
3 Optimality conditions for local and global solutions
4 Theory: Complexity
5 Methods: Approximation Algorithm
6 Methods: Global Optimization Algorithm
7 Conclusions
Optimization problem
min f(x)
s.t. x ∈ Ω.
Convex optimization: Ω is a convex set and f : R^n → R is a convex function. Applications: machine learning.
Non-convex optimization: either Ω or f : R^n → R is non-convex. For example, deep learning.
Non-convex Optimization
Problem 2016 (Y. Xia et al. 2020)

min_{x ∈ [−1,1]^2}  100x_1^2 + x_2^2 + (x_1 + x_2)/(2.016 + x_1 + x_2).

Figure: surface plot of the objective function value over (x_1, x_2) ∈ [−1, 1]^2.
Linear Programming
Ax = b
Ax = b, x ≥ 0.
min c^T x
s.t. Ax = b, x ≥ 0.
Duality theory
The optimal value of the primal problem

min c^T x
s.t. Ax = b, x ≥ 0,

is equal to the optimal value of the dual problem (strong duality):

max b^T y
s.t. A^T y ≤ c.
How to check the insolvability of {x : Ax ≤ b, x ≥ 0}?

min {c^T x : Ax = b, x ≥ 0} = max {t : {x : c^T x < t, Ax = b, x ≥ 0} = ∅}.
Classical Farkas Lemma
Theorem
The following two statements are equivalent:
(i) The system Ax ≤ b, x ≥ 0 has no solution.
(ii) There exists a y ≥ 0 such that A^T y ≥ 0 and b^T y < 0.

J. Farkas, Über die Theorie der einfachen Ungleichungen, Journal für die reine und angewandte Mathematik, 124, 1–27, 1902
Nonlinear Farkas Lemma
Theorem 21.1 in [2]
Let f, g_1, …, g_m : R^n → R be convex functions and let C ⊆ R^n be a convex set. Assume that the Slater condition holds for g_1, …, g_m, i.e., there exists an x̄ ∈ rel int C such that g_j(x̄) < 0, j = 1, …, m. The following two statements are equivalent:
(i) The system f(x) < 0, g_i(x) ≤ 0, i = 1, …, m, x ∈ C is not solvable.
(ii) There exist λ_i ≥ 0, i = 1, …, m, such that

f(x) + ∑_{i=1}^m λ_i g_i(x) ≥ 0 for all x ∈ C.

R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, N.J., 1970
From convex to nonconvex: S-lemma
Theorem (Yakubovich 1971)
Let f, g : R^n → R be quadratic functions and suppose that there is an x̄ ∈ R^n such that g(x̄) < 0. Then the following two statements are equivalent.
(i) There is no x ∈ R^n such that f(x) < 0, g(x) ≤ 0.
(ii) There is a number y ≥ 0 such that f(x) + y g(x) ≥ 0 for all x ∈ R^n.

V.A. Yakubovich, S-procedure in nonlinear control theory, Vestnik Leningrad. Univ., 1, 62–71, 1971 (in Russian)

Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418, 2007
Applications of S-lemma
The well-known trust-region subproblem (TRS) [5]:
(TRS)  min {x^T Ax + b^T x : x^T x ≤ δ},

where δ is a positive scalar. In the case that A ⋡ 0 (A is not positive semidefinite), (TRS) is a nonconvex optimization problem. However, we can show that (TRS) is equivalent to a convex optimization problem in the sense that they share the same optimal solution.

D.M. Gay, Computing optimal locally constrained steps. SIAM J. Sci. Stat. Comput. 2(2), 186–197, 1981
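The hidden convexity also shows up computationally: in the easy case the global solution is x(λ) = −(1/2)(A + λI)^{−1}b for the unique multiplier λ ≥ max{0, −λ_min(A)} with ‖x(λ)‖^2 = δ (or λ = 0 when the unconstrained minimizer is interior), and ‖x(λ)‖^2 is decreasing in λ. A minimal sketch, assuming the easy case (b not orthogonal to the eigenspace of λ_min(A)); the function name and tolerances are mine, not from the talk:

```python
import numpy as np

def trs_solve(A, b, delta, tol=1e-10):
    """Solve min x^T A x + b^T x  s.t.  ||x||^2 <= delta (easy case only)
    by bisection on the Lagrange multiplier lam >= max(0, -lam_min(A))."""
    n = len(b)
    lam_min = np.linalg.eigvalsh(A)[0]

    def x_of(lam):
        # stationarity: 2(A + lam*I)x + b = 0
        return np.linalg.solve(2.0 * (A + lam * np.eye(n)), -b)

    if lam_min > 0:
        x0 = x_of(0.0)
        if x0 @ x0 <= delta:
            return x0, 0.0            # unconstrained minimizer is feasible
    lo = max(0.0, -lam_min) + 1e-12   # A + lam*I must be positive definite
    hi = lo + 1.0
    while x_of(hi) @ x_of(hi) > delta:
        hi = 2.0 * hi + 1.0           # expand until the norm drops below delta
    while hi - lo > tol:              # ||x(lam)||^2 is decreasing in lam
        mid = 0.5 * (lo + hi)
        if x_of(mid) @ x_of(mid) > delta:
            lo = mid
        else:
            hi = mid
    return x_of(hi), hi

# nonconvex instance: A has a negative eigenvalue
A = np.diag([-2.0, 1.0])
b = np.array([1.0, 1.0])
x, lam = trs_solve(A, b, 1.0)
```

For this instance the returned multiplier satisfies λ > −λ_min(A) = 2, so A + λI is positive definite, which is exactly the global optimality condition recalled later in the talk.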
Applications of S-lemma
Strong duality holds for (TRS):

v(TRS) = min_x {x^T Ax + b^T x : x^T x ≤ δ}
= sup_t {t : {x : t > x^T Ax + b^T x, x^T x ≤ δ} = ∅}
= sup_t {t : ∃ λ ≥ 0 : x^T Ax + b^T x − t + λ(x^T x − δ) ≥ 0, ∀x ∈ R^n}
= sup_{t,λ} {t : [ A + λI    b/2
                   b^T/2     −t − λδ ] ⪰ 0, λ ≥ 0},

which is a semidefinite program (SDP).
Extensions of S-lemma: S-lemma with equality
Theorem
Let f(x) := x^T Ax + 2a^T x + c, h(x) := x^T Bx + 2b^T x + d, where A, B ∈ R^{n×n} are symmetric matrices. Suppose there are x′, x″ ∈ R^n such that h(x′) < 0 < h(x″). Then, except in the case that A has exactly one negative eigenvalue, B = 0, b ≠ 0 and

[ V^T AV             V^T (Ax_0 + a)
  (x_0^T A + a^T)V   f(x_0)         ] ⪰ 0,

where x_0 = −(d/(2b^T b)) b and V ∈ R^{n×(n−1)} is the matrix basis of N(b), the system f(x) < 0, h(x) = 0 is unsolvable if and only if there exists a real number µ such that f(x) + µh(x) ≥ 0, ∀x ∈ R^n.

Y. Xia, S. Wang, R.-L. Sheu, S-Lemma with Equality and Its Applications, Mathematical Programming, Ser. A, 156(1-2), 513–547, 2016
Application: GPS localization problem
d_i ≈ ‖x − a_i‖ − r,  i = 1, …, m,

where x is the user's (unknown) location, r is the (unknown) bias caused by the user clock error, a_1, …, a_m are the satellites' known locations, and d_1, …, d_m are measured pseudoranges.

The squared least squares model

min_{x ∈ R^n, r ∈ R} ∑_{i=1}^m (‖x − a_i‖^2 − (r + d_i)^2)^2

can be reformulated as a QP with an equality constraint:

min_{x ∈ R^n, r, t ∈ R} { ∑_{i=1}^m (t − 2a_i^T x − 2d_i r + a_i^T a_i − d_i^2)^2 : x^T x − r^2 = t }.
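The substitution behind this reformulation is t = x^T x − r^2: expanding ‖x − a_i‖^2 − (r + d_i)^2 term by term gives exactly the inner expression of the constrained problem. A quick numerical sanity check of that identity (random data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                  # unknown location
r = 1.7                                 # clock-error bias
a = rng.normal(size=(5, 3))             # satellite positions
d = rng.normal(size=5)                  # pseudoranges (arbitrary here)

t = x @ x - r**2                        # the substituted variable
lhs = np.linalg.norm(x - a, axis=1)**2 - (r + d)**2
rhs = t - 2.0 * a @ x - 2.0 * d * r + np.sum(a**2, axis=1) - d**2
same = np.allclose(lhs, rhs)            # the two residual vectors coincide
```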
A new application
Total least squares:
min_{x ∈ R^n} ‖Ax − b‖^2 / (‖x‖^2 + 1).

Total least squares with Tikhonov identical regularization:

(TI)  min_{x ∈ R^n} ‖Ax − b‖^2 / (‖x‖^2 + 1) + ρ‖x‖^2.
Non-convexity
A two-dimensional example:

z = 100((x + 2)^2 + 3y^2) / (x^2 + y^2 + 1) + x^2 + y^2

Figure: surface plot of z over the (x, y)-plane.
Reformulation of (TI)

(TI)  min_{x ∈ R^n} ‖Ax − b‖^2 / (‖x‖^2 + 1) + ρ‖x‖^2
= min_{x ∈ R^n} (‖Ax − b‖^2 + ρ‖x‖^4 + ρ‖x‖^2) / (‖x‖^2 + 1)
=: min_{x ∈ R^n} f(x)/g(x).

Proposition
min_{x ∈ R^n} f(x)/g(x) = λ is equivalent to the equation

min_{x ∈ R^n} {f(x) − λ g(x)} = 0.

(TI): finding λ such that min_x {f(x) − λ g(x)} = 0.
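The proposition turns the fractional program into root finding in λ, which suggests a Dinkelbach-type iteration: minimize f − λg, then update λ = f(x)/g(x). A sketch on a toy one-dimensional fractional program (the instance and all names are mine, not the slides' (TI) data):

```python
import math

def dinkelbach(f, g, argmin_shifted, lam0=0.0, tol=1e-12, max_iter=100):
    """Solve min f(x)/g(x) with g > 0 by finding lam such that
    min_x [f(x) - lam*g(x)] = 0 (Dinkelbach's scheme)."""
    lam = lam0
    for _ in range(max_iter):
        x = argmin_shifted(lam)             # minimizer of f - lam*g
        if abs(f(x) - lam * g(x)) < tol:    # min value hit zero: optimal
            break
        lam = f(x) / g(x)                   # Dinkelbach update
    return lam, x

# toy instance: min (x^2 + 1)/(x + 2) over x in [0, 10]
f = lambda x: x * x + 1.0
g = lambda x: x + 2.0
# minimizer of x^2 + 1 - lam*(x + 2) over [0, 10] is x = lam/2, clipped
argmin_shifted = lambda lam: min(max(lam / 2.0, 0.0), 10.0)
lam, x = dinkelbach(f, g, argmin_shifted)
```

For this toy problem the optimal value is 2√5 − 4 at x = √5 − 2, which the iteration reaches in a handful of updates.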
Hidden Convexity of (TI)
min_{x ∈ R^n} ‖Ax − b‖^2 / (‖x‖^2 + 1) + ρ‖x‖^2
= min_{x,t} { (‖Ax − b‖^2 + ρt^2 + ρt)/(t + 1) : ‖x‖^2 = t }
= sup {λ : {(x, t) : λ ≥ (‖Ax − b‖^2 + ρt^2 + ρt)/(t + 1), ‖x‖^2 = t} = ∅}
= sup {λ : {(x, t) : ‖Ax − b‖^2 + ρt^2 + ρt − λ(t + 1) < 0, ‖x‖^2 = t} = ∅}
= sup {λ : ∃ µ : ‖Ax − b‖^2 + ρt^2 + ρt − λ(t + 1) + µ(‖x‖^2 − t) ≥ 0, ∀x, t}
= sup {λ : [ A^T A + µI      0               −A^T b
             0               ρ               (ρ − λ − µ)/2
             −b^T A          (ρ − λ − µ)/2   b^T b − λ      ] ⪰ 0},

which is a semidefinite program (SDP).
The Simplified SDP
sup {λ : [ A^T A + µI      0               −A^T b
           0               ρ               (ρ − λ − µ)/2
           −b^T A          (ρ − λ − µ)/2   b^T b − λ      ] ⪰ 0}

⇐⇒

sup {λ : [ Λ + µI   0   0
           0        ρ   0
           0        0   −(ρ − λ − µ)^2/(4ρ) + ‖b‖^2 − λ − b̄^T (Λ + µI)^{−1} b̄ ] ⪰ 0},

where A^T A = UΛU^T, U is orthogonal and b̄ = −U^T A^T b,

⇐⇒

sup {λ : Λ_i + µ ≥ 0 for all i,  −(ρ − λ − µ)^2/(4ρ) + ‖b‖^2 − λ − b̄^T (Λ + µI)^{−1} b̄ ≥ 0}.
Reducing SDP to solving a smooth univariate equation
By eliminating λ, (SDP) is reduced to

max T(µ) := −µ − ρ + 2ρ √( µ/ρ − (1/ρ) ∑_{i=1}^n b̄_i^2/(Λ_i + µ) + (1/ρ) b^T b )
s.t. µ ≥ max_i (−Λ_i).

T(µ) is a strictly concave function, so the classical Newton's method is applicable, with a quadratic convergence rate.
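For a smooth strictly concave univariate function, Newton's method applied to the stationarity equation T′(µ) = 0 converges quadratically from a reasonable start. A generic sketch on a toy strictly concave model (the model T(µ) = −µ − 1 + 2√µ and the names are mine, not the slides' actual T):

```python
import math

def newton_max(dT, d2T, mu0, tol=1e-12, max_iter=50):
    """Maximize a smooth, strictly concave univariate function by
    Newton's method on dT(mu) = 0 (d2T < 0 everywhere)."""
    mu = mu0
    for _ in range(max_iter):
        step = dT(mu) / d2T(mu)
        mu -= step
        if abs(step) < tol:
            break
    return mu

# toy model: T(mu) = -mu - 1 + 2*sqrt(mu), maximized at mu = 1
dT = lambda mu: -1.0 + 1.0 / math.sqrt(mu)
d2T = lambda mu: -0.5 * mu ** (-1.5)
mu_star = newton_max(dT, d2T, mu0=0.5)
```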
Extensions of S-lemma: S-lemma with interval bounds
Theorem
Let f(x) := x^T Ax + 2a^T x + c and h(x) := x^T Bx + 2b^T x + d as above. Assume there is an x ∈ R^n such that −∞ < α < h(x) < β < ∞. Then, except in the case that A has exactly one negative eigenvalue, B = 0, b ≠ 0 and there exists a ν ≥ 0 such that

[ V^T AV                (1/(2b^T b)) V^T Ab                   V^T a
  (1/(2b^T b)) b^T AV   b^T Ab/(2b^T b)^2 + ν                 a^T b/(2b^T b) − (ν/2)(α + β − 2d)
  a^T V                 a^T b/(2b^T b) − (ν/2)(α + β − 2d)    c + ν(α − d)(β − d)                 ] ⪰ 0,

where V ∈ R^{n×(n−1)} is the matrix basis of N(b), the system

f(x) < 0,  α ≤ h(x) ≤ β

is unsolvable if and only if there is a number µ ∈ R such that

f(x) + µ₋(h(x) − β) + µ₊(α − h(x)) ≥ 0, ∀x ∈ R^n,

where µ₊ = max{µ, 0}, µ₋ = min{µ, 0}.

S. Wang, Y. Xia, Strong duality for generalized trust region subproblem: S-lemma with interval bounds. Optim. Lett. 9, 1063–1073, 2015
Positive Lagrangian duality gap
Identical regularized total least squares problem (TLS) (Beck et al. 2006):
(ITLS)  min { ‖Ax − b‖^2 / (x^T x + 1) : α ≤ x^T x ≤ β },

where A ∈ R^{m×n} (m ≥ n) and 0 ≤ α ≤ β < +∞ is assumed. Our necessary and sufficient condition for the strong duality:

Theorem (Yang & Xia, 2000)
The Lagrangian duality gap for (ITLS) is positive if and only if

Ā := A^T A − λ_min( [ A^T A   −A^T b
                      −b^T A  b^T b  ] ) · I ≻ 0   and   α − b^T A Ā^{−2} A^T b > 0.

Corollary
Strong Lagrangian duality holds for (ITLS) with α = 0.
Closing Lagrangian duality gap
(G−ITLS)  min { f_1(x)/f_2(x) : α ≤ f_3(x) ≤ β },

where f_i(x) are quadratic functions for i = 1, 2, 3 and f_2 > 0.

Theorem
There is no Lagrangian duality gap for the scaled (G−ITLS):

min { f_1(x)/f_2(x) : α/f_2(x) ≤ f_3(x)/f_2(x) ≤ β/f_2(x) }.
Let’s look back
Nonlinear Farkas Lemma: convex;
S-lemma: quadratic nonconvex.
Unifying Farkas and S-lemma
Theorem
(U-lemma) Suppose f(x) and g(x) are two quadratic functions. Let q_0(z), …, q_m(z) be convex functions defined on a convex set Ω ⊆ R^n. Assume that there exist x̄ ∈ R^n and z̄ ∈ relint(Ω) such that g(x̄) + q_1(z̄) < 0 and q_i(z̄) < 0, i = 2, …, m. Then the following two statements are equivalent:

(i) The system

f(x) + q_0(z) < 0, g(x) + q_1(z) ≤ 0, q_i(z) ≤ 0, i = 2, …, m, z ∈ Ω

has no solution (x, z).

(ii) There exist µ_i ≥ 0, i = 1, 2, …, m, such that

f(x) + µ_1 g(x) + q_0(z) + ∑_{i=1}^m µ_i q_i(z) ≥ 0, ∀ x ∈ R^n, z ∈ Ω.
Application 1.
Solving the p-regularized subproblems for p > 2:
(p−RS):  min_x x^T Ax + b^T x + ‖x‖^p  (p > 2)

The p-regularized subproblem (p-RS) is a regularization technique for computing a Newton-like step in unconstrained optimization.
When p = 3: the Nesterov-Polyak subproblem.
When p = 4: the double-well potential minimization problem.

Y. Xia, R.-L. Sheu, Y.-x. Yuan, Theory and application of p-regularized subproblems for p > 2, Optimization Methods & Software, 32(5), 1059–1077, 2017
Application 2.
A common way of seeking a good answer in numerical analysis is to find a method of calculation that minimizes the backward error of the problem. The basic idea of the backward error analysis technique is to show that the solution obtained is the exact solution of a nearby problem, by use of floating-point arithmetic error bounds. One of the cost functions is as below:

min_x ‖Ax − b‖ / (‖A‖‖x‖ + ‖b‖).   (1)

K.E. Schubert, A new look at robust estimation and identification, Ph.D. Dissertation, Univ. of California, Santa Barbara, CA, September 2003.
Trust-region subproblem (TRS)
(TRS)  min_{x^T x = 1} (1/2) x^T Qx + c^T x.

Figure: An example where a local non-global minimizer exists (global minimizer, global maximizer, local non-global minimizer, and local non-global maximizer marked).
Problem simplification and the KKT condition
Let Q = U^T ΛU, Λ = Diag(λ_1, …, λ_n), U = [u_1, …, u_n], with

(0 >) λ_1 ≤ λ_2 ≤ … ≤ λ_n.

Introducing y = Ux and d = Uc, we have

v(TE) = min_{y^T y = 1} f(y) = (1/2) y^T Λy + d^T y.

We call (y, µ) a KKT point if:

(Λ + µI_n) y + d = 0,
y^T y − 1 = 0,
µ ∈ R.
Optimality conditions for local minimizers
Lemma
(1) (Second-order necessary condition.) If y is a local minimizer and µ is the associated Lagrangian multiplier, then (y, µ) is a KKT point and
vT (Λ + µIn)v ≥ 0, ∀v ∈ Rn such that vT y = 0.
(2) (Second-order sufficient condition.) Suppose (y, µ) is a KKT pointand
vT (Λ + µIn)v > 0, ∀v ∈ Rn such that vT y = 0, v 6= 0.
Then y is a strict local minimizer.
D. G. Luenberger, Linear and Nonlinear Programming, 2nd ed.,Addison-Wesley, Reading, MA, 1984
Necessary and sufficient condition for global minimizer
Theorem (Gay 1981; Sorensen 1982; Moré and Sorensen 1983)
y is a global minimizer if and only if (y, µ) is a KKT point and
µ ≥ −λ1(> 0).
D. M. Gay, Computing optimal locally constrained steps. SIAM J. Sci.Stat. Comput. 2(2): 186-197, 1981
D. C. Sorensen, Newton’s method with a model trust region modification.SIAM J. Numer. Anal. 19(2): 409–426, 1982
J.J. Moré and D.C. Sorensen, Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3): 553–572, 1983
Characterizations of local non-global minimizer
Lemma (Martínez, 1994)
If y is a local non-global minimizer, then (y, µ) is a KKT point, λ_1 < λ_2, and µ ∈ (−λ_2, −λ_1).

Define the so-called secular function:

ϕ(µ) = ∑_{i=1}^n d_i^2/(λ_i + µ)^2 − 1.

µ must be a zero point of ϕ(µ).

J.M. Martínez, Local minimizers of quadratic functions on Euclidean balls and spheres, SIAM J. Optim. 4: 159–176, 1994
Secular function
ϕ(µ) = ∑_{i=1}^n d_i^2/(λ_i + µ)^2 − 1,
ϕ′(µ) = −2 ∑_{i=1}^n d_i^2/(λ_i + µ)^3,
ϕ″(µ) = 6 ∑_{i=1}^n d_i^2/(λ_i + µ)^4.

ϕ″(µ) > 0, so ϕ(µ) is strongly convex.

If d_1 ≠ 0, lim_{µ→−λ_1} ϕ(µ) = +∞; if d_1 = 0, lim_{µ→−λ_1} ϕ(µ) = α.
If d_2 ≠ 0, lim_{µ→−λ_2} ϕ(µ) = +∞; if d_2 = 0, lim_{µ→−λ_2} ϕ(µ) = β.

ϕ(µ) has at most two zero points in (−λ_2, −λ_1).
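These properties pin the candidate multipliers down numerically: when d_1, d_2 ≠ 0, ϕ tends to +∞ at both ends of (−λ_2, −λ_1), so once an interior point with ϕ < 0 is found there is exactly one zero on each side of it. A sketch with toy data (the numbers and helper names are mine):

```python
import numpy as np

def phi(mu, lam, d):
    """Secular function phi(mu) = sum_i d_i^2/(lam_i + mu)^2 - 1."""
    return np.sum(d**2 / (lam + mu)**2) - 1.0

def bisect_zero(f, lo, hi, tol=1e-12):
    """Bisection for a zero of f, assuming a sign change on [lo, hi]."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == (flo > 0):
            lo = mid          # same sign as the left end: move left end
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = np.array([-2.0, 1.0])   # eigenvalues lam_1 < lam_2
d = np.array([0.1, 1.0])      # d_1, d_2 both nonzero
f = lambda mu: phi(mu, lam, d)
# phi -> +inf at both ends of (-lam_2, -lam_1) = (-1, 2), and phi(0.9) < 0,
# so there is one zero on each side of mu = 0.9
z1 = bisect_zero(f, -1.0 + 1e-9, 0.9)
z2 = bisect_zero(f, 0.9, 2.0 - 1e-9)
```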
Optimality conditions for local non-global minimizer
Theorem (Martínez, 1994)
(1) If either λ_1 = λ_2 or d_1 = 0, there is no local non-global minimizer.
(2) There is at most one local non-global minimizer.
(3) (Necessary condition.) If y is a local non-global minimizer, then (y, µ) is a KKT point, µ ∈ (−λ_2, −λ_1) and ϕ′(µ) ≥ 0.
(4) (Sufficient condition.) If (y, µ) is a KKT point, µ ∈ (−λ_2, −λ_1) and ϕ′(µ) > 0, then y is the unique local non-global minimizer.
An example where ϕ′(µ) = 0

Figure: A two-dimensional example with no local non-global minimizer/maximizer (global minimizer, global maximizer, and second-order KKT point marked).
Necessary and sufficient condition for local non-global minimizer

Theorem (Wang & Xia 2020)
y is the unique local non-global minimizer if and only if (y, µ) is a KKT point, µ ∈ (−λ_2, −λ_1) and ϕ′(µ) > 0.

Theorem (Second-order necessary and sufficient condition)
x is a local non-global minimizer of (TE) if and only if there exists a unique Lagrangian multiplier µ such that

(Q + µI_n) x + c = 0,
x^T x = 1,
v^T (Q + µI_n) v > 0, ∀v ∈ R^n such that v^T x = 0, v ≠ 0,
Q + µI_n ⋡ 0.

J. Wang, Y. Xia, Closing the gap between necessary and sufficient conditions for local non-global minimizer of trust region subproblem, SIAM J. Optim. 2020, accepted
Application
With the help of the local non-global minimizer, one can globally solve the extended trust-region subproblem:

min (1/2) x^T Qx + c^T x
s.t. x^T x ≤ 1,
     a_i^T x − b_i ≤ 0, i = 1, …, m.
NP=P ?
Chebyshev center problem
Finding the smallest ball enclosing the convex set Ω:

min_z max_{x ∈ Ω} ‖x − z‖^2.

Easy cases:
Ω: a given set of finitely many points.
Ω: the intersection of two ellipsoids in the complex domain.
Chebyshev center of intersection of balls
Finding the smallest ball enclosing the intersection of the given balls:

(CCB)  min_{z ∈ R^n} max_{‖x − a_i‖ ≤ r_i, i = 1, 2, …, p} ‖x − z‖^2,

where ‖·‖ is the ℓ_2-norm.
Applications
Example
Robust estimation. Suppose y_k, k = 1, …, p are the p measurements of the unknown x with bounded noises, i.e., ‖y_k − x‖ ≤ ρ for some ρ > 0. Then a robust recovery of x can be obtained by solving (CCB).

Example
In non-cooperative wireless network positioning, the region for a target to communicate with some reference nodes is an intersection of many balls. The position error is usually bounded by the diameter of the smallest ball enclosing this region. This leads to a direct application of (CCB) with n = 2 or 3.
Known Results
Theorem (Beck 2007)
v(CCB) = v(SDP) as long as p ≤ n− 1.
Theorem (Beck 2009)
v(CCB) = v(SDP) as long as p ≤ n.
No complexity results when p > n.
What happens when p = 3 > n = 2?
Our complexity results
Theorem
The complexity of (CCB) when n = 2 is at most O(p^2).
Theorem
(CCB) is NP-hard.
Theorem
Suppose either n is fixed or p = n + q with a fixed integer q; then (CCB) is polynomially solvable in at most (p choose n) · O(n^3) time.

Y. Xia, M. Yang, S. Wang, Chebyshev Center of the Intersection of Balls: Complexity, Relaxation and Approximation, Mathematical Programming, 2020, https://doi.org/10.1007/s10107-020-01479-0
Chebyshev center of the intersection of two ellipsoids
(CC)  min_z max_{x ∈ Ω} ‖x − z‖^2,

where

Ω := {x ∈ R^n : ‖F_i x + g_i‖^2 ≤ 1, i = 1, 2},

and F_i ∈ R^{m_i×n}, g_i ∈ R^{m_i}.

Beck, A., Eldar, Y.: Regularization in regression with bounded noise: A Chebyshev center approach. SIAM Journal on Matrix Analysis and Applications 29(2), 606–625, 2007
Application in bounded error estimation
Linear regression model Ax ≈ b, where the input data matrix A is ill-conditioned.
Admissible solutions to the linear system:
F = {x ∈ R^n : ‖Lx‖^2 ≤ η, ‖Ax − b‖^2 ≤ ρ},

where the regularization constraint ‖Lx‖^2 ≤ η stabilizes x and ‖Ax − b‖^2 ≤ ρ is the error bound constraint.

A robust approximation of the true solution x is given by the Chebyshev center of F.

Milanese, M., Vicino, A.: Optimal estimation theory for dynamic systems with set membership uncertainty: an overview. Automatica 27(6), 997–1009, 1991
An Example
Figure: An example in two dimensions where the input ellipse is plotted in a solid line. The dotted and dashed circles are the Chebyshev solution and the SDP approximation, respectively. The Chebyshev center and the corresponding SDP approximation are marked by ∗ and +, respectively.
Complexity for (CC)
Theorem
For any ε > 0, (CC) can be solved in

O(n^8 log log u^{−1}) · log( (6R + 4‖z_c‖) σ_max(F_1) / σ_min^2(F_1) · 1/ε )

time.

X. Cen, Y. Xia et al., On Chebyshev Center of the Intersection of Two Ellipsoids, WCGO 2019
Weighted Maximin Dispersion Problem
Consider the following weighted maximin dispersion problem
(MaxMin)  max_{x ∈ Ω} min_{i=1,…,m} ω_i ‖x − x^i‖^2,

where Ω = {x : ‖x‖_p ≤ 1} with p ≥ 2, x^1, …, x^m ∈ R^n are m given points, ω_i > 0 for i = 1, …, m, and ‖x‖ = √(x^T x) is the Euclidean norm.
On the objective function
nonsocial transient behavior
E.C. Kim, Nonsocial Transient Behavior: Social Disengagement on theGreyhound Bus, Symbolic Interaction (2012), 35(3) pp. 267–283.
An Application
m cities: x1, . . . , xm.
Choosing a location in the region χ for the facility such that the amount of pollution reaching any city is minimized.
Complexity
p = +∞: (MaxMin) is NP-hard (Haines et al., SIAM J. Optim. 2013).
p = 2: (MaxMin) is NP-hard. When m ≤ n − 1, it is polynomially solvable (Wang & Xia, SIAM J. Optim. 2016).
2 < p < +∞: unknown!

S. Wang and Y. Xia, On the Ball-Constrained Weighted Maximin Dispersion Problem, SIAM J. Optim. 26(3), 1565–1588, 2016
Approximation algorithm for (CCB)
(CCB)  min_{z ∈ R^n} f(z) = max_{‖x − a_i‖ ≤ r_i, i = 1, 2, …, p} ‖x − z‖^2.
Standard quadratic programming relaxation
Using the SDP relaxation for the inner maximization:

(DCC)  min_z v(SDP(z)) + ‖z‖^2
= min_{y,λ,z} y + ‖z‖^2
s.t. [ (−1 + ∑_{i=1}^p λ_i) I_n        z − ∑_{i=1}^p λ_i a_i
       (z − ∑_{i=1}^p λ_i a_i)^T       y + ∑_{i=1}^p λ_i(‖a_i‖^2 − r_i^2) ] ⪰ 0,
     λ_i ∈ R_+, i = 1, …, p,

which reduces to the standard quadratic program:

(SQP)  min_λ ∑_{i=1}^p λ_i(r_i^2 − ‖a_i‖^2) + ‖ ∑_{i=1}^p λ_i a_i ‖^2
s.t. ∑_{i=1}^p λ_i = 1, λ_i ≥ 0, i = 1, …, p.
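Because (SQP) is a convex quadratic program over the simplex, cheap first-order methods apply. A minimal Frank-Wolfe sketch with exact line search (the two-ball instance and all names are mine; the candidate center is z = ∑_i λ_i a_i):

```python
import numpy as np

def frank_wolfe_simplex_qp(Q, c, iters=1000):
    """Minimize lam^T Q lam + c^T lam over the simplex {lam >= 0, sum lam = 1}
    by the Frank-Wolfe method with exact line search (Q PSD)."""
    p = len(c)
    lam = np.full(p, 1.0 / p)
    for _ in range(iters):
        grad = 2.0 * Q @ lam + c
        s = np.zeros(p)
        s[np.argmin(grad)] = 1.0        # linear minimization oracle on the simplex
        d = s - lam
        gd = grad @ d
        if gd >= -1e-14:                # Frank-Wolfe gap ~ 0: stationary
            break
        curv = d @ Q @ d
        gamma = 1.0 if curv <= 0.0 else min(1.0, -gd / (2.0 * curv))
        lam = lam + gamma * d           # exact minimizer along the segment
    return lam

# toy (SQP) instance from the balls B((0,0), 1) and B((2,0), 2)
a = np.array([[0.0, 0.0], [2.0, 0.0]])
r = np.array([1.0, 2.0])
Q = a @ a.T
c = r**2 - np.sum(a**2, axis=1)
lam = frank_wolfe_simplex_qp(Q, c)
z = lam @ a                             # candidate Chebyshev center
```

For this toy instance the candidate z = (0.25, 0) coincides with the Chebyshev center of the lens formed by the two balls.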
Approximation of the SQP
Theorem (Xia-Yang-Wang 2020)
The candidate solution z = ∑_{i=1}^p λ_i* a_i satisfies

((√2 + γ)/(1 − γ))^2 v(CCB) ≥ f(z) ≥ v(CCB),

where γ equals the optimal value of

min_{x ∈ R^n} max_{i=1,…,p} ‖x − a_i‖/r_i.

Moreover, let d_max = max_{i,j=1,…,p} ‖a_i − a_j‖ and r_min = min_i r_i; then

γ ≤ √(n/(2(n + 1))) · d_max/r_min < d_max/(√2 r_min).
Weighted Maximin Dispersion Problem
(MaxMin)  max_{‖x‖_p ≤ 1} f(x) := min_{i=1,…,m} ω_i ( x^T x − 2(x^i)^T x + (x^i)^T x^i ).

Lifting xx^T to a matrix X yields

(SDP)  max min_{i=1,…,m} ω_i ( Tr(X) − 2(x^i)^T x + (x^i)^T x^i )
s.t. X_{11}^{p/2} + X_{22}^{p/2} + … + X_{nn}^{p/2} ≤ 1,
     [ X     x
       x^T   1 ] ⪰ 0.

v(SDP) ≥ v(MaxMin).
General Approximation Algorithm (Haines et al. 2013)
An approximation algorithm for (MaxMin)

1. Input ρ ∈ (0, 1) and x^i for i = 1, …, m. Let α = √(2 ln(m/ρ)).
2. Solve (SDP) and return the optimal solution Z* ∈ S^{n+1}. Set b^i = (√(Z*_{11}) x^i_1, …, √(Z*_{nn}) x^i_n)^T for i = 1, …, m.
3. Repeatedly generate ξ = (ξ_1, …, ξ_n)^T with independent ξ_i taking the values ±1 with equal probability until (b^i)^T ξ < α‖b^i‖ for i = 1, …, m.
4. Output x̃ = ( √(Z*_{11}) ξ_1/√(Z*_{n+1,n+1}), √(Z*_{22}) ξ_2/√(Z*_{n+1,n+1}), …, √(Z*_{nn}) ξ_n/√(Z*_{n+1,n+1}) )^T.
Approximation Algorithm without SDP Relaxation
A new algorithm for (MaxMin)

1. Input ρ ∈ (0, 1) and x^i for i = 1, …, m. Let α = √(2 ln(m/ρ)).
2. Set b^i = x^i/n^{1/p} for i = 1, …, m (no SDP is solved).
3. Repeatedly generate ξ = (ξ_1, …, ξ_n)^T with independent ξ_i taking the values ±1 with equal probability until (b^i)^T ξ < α‖b^i‖ for i = 1, …, m.
4. Output x̃ = n^{−1/p} ξ.
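The four steps above cost essentially nothing beyond the rejection sampling. A stdlib-only sketch of the SDP-free variant (the three-point instance is mine):

```python
import math
import random

def maximin_dispersion_point(points, p, rho=0.5, seed=0):
    """SDP-free randomized sketch: set b^i = x^i / n^{1/p}, draw +-1 signs xi
    until (b^i)^T xi < alpha*||b^i|| for every i, return n^{-1/p} * xi."""
    rng = random.Random(seed)
    m, n = len(points), len(points[0])
    alpha = math.sqrt(2.0 * math.log(m / rho))
    scale = n ** (1.0 / p)
    b = [[xj / scale for xj in x] for x in points]
    while True:
        xi = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        if all(
            sum(bi[j] * xi[j] for j in range(n))
            < alpha * math.sqrt(sum(v * v for v in bi))
            for bi in b
        ):
            return [v / scale for v in xi]

x_out = maximin_dispersion_point([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], p=2)
```

By construction the output is always feasible: ∑_j |ξ_j/n^{1/p}|^p = n · n^{−1} = 1, i.e., ‖x̃‖_p = 1.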
Our New Approximation Bound for (P)
Theorem
For the solution x̃ returned by the new algorithm, we have

v(MaxMin) ≥ f(x̃) > (1 − √(2 ln(m/ρ)/n)) / 2 · v(MaxMin).
Figure: Numerical comparison between our new approximation algorithm (Algorithm 2) and the algorithm proposed in [3] (Algorithm 1): the returned objective values as m ranges from 8 to 32.
Global Optimization for Biconvex Program
For any fixed x (resp. y), f(x, y) is a convex function of y (resp. x); for example, the bilinear function f(x, y) = x^T y.

Many nonconvex programming problems can be reformulated as a biconvex program.

Our algorithm scheme: biconvexify + branch-and-bound.
Nonconvex QP
The “simplest” nonconvex QP is NP-hard:

(NQP)  min f(x) = x^T Q_+ x + q^T x − (c^T x)^2
s.t. x ∈ X,

where X is a polytope and Q_+ ⪰ 0.

Bi-convex reformulation:

min_{x ∈ X, t ∈ R} x^T Q_+ x + q^T x − (c^T x)^2 + (t − c^T x)^2
= min_{x ∈ X, t ∈ R} x^T Q_+ x + q^T x + t^2 − 2t c^T x.
New Global Algorithm for solving nonconvex QP
min_{t ∈ [t_min, t_max]} g(t) = min_{x ∈ X} x^T Q_+ x + q^T x − 2t c^T x + t^2,

where t_min = min_{x ∈ X} c^T x and t_max = max_{x ∈ X} c^T x.

Branch-and-bound algorithm for minimizing g(t).

The key idea of the algorithm is to estimate a lower bound of g(t) over any interval [t_1, t_2] ⊆ [t_min, t_max] after calculating the objective values g(t_1) and g(t_2).
Novel underestimation
min_{t ∈ [t_1, t_2]} g(t) = min_{t ∈ [t_1, t_2]} min_{x ∈ X} x^T Q_+ x + q^T x − 2t c^T x + t^2
≥ min_{t ∈ [t_1, t_2]} min_{y_1, y_2 ∈ R} { y_1 + t y_2 + t^2 : y_1 + t_1 y_2 + t_1^2 ≥ g(t_1), y_1 + t_2 y_2 + t_2^2 ≥ g(t_2) }
= min_{t ∈ [t_1, t_2]} max_µ { µ_1(g(t_1) − t_1^2) + µ_2(g(t_2) − t_2^2) + t^2 : µ_1 + µ_2 = 1; µ_1, µ_2 ≥ 0; µ_1 t_1 + µ_2 t_2 = t }
= min_{t ∈ [t_1, t_2]} t^2 + bt + d,

where b = (g(t_2) − g(t_1) + t_1^2 − t_2^2)/(t_2 − t_1) and d = (t_2(g(t_1) − t_1^2) − t_1(g(t_2) − t_2^2))/(t_2 − t_1). The optimal t* is used for the next subdivision.
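The closed-form bound is easy to implement: g(t) − t^2 is a minimum of functions affine in t, hence concave, so replacing it by its chord over [t_1, t_2] underestimates g, and the remaining quadratic t^2 + bt + d is minimized at its (clipped) vertex. A sketch with a toy g whose g(t) − t^2 = −|t| is concave (the toy instance and names are mine):

```python
def interval_lower_bound(g, t1, t2):
    """Lower bound of g over [t1, t2], valid when g(t) - t^2 is concave:
    replace g(t) - t^2 by its chord and minimize t^2 + b*t + d exactly."""
    g1, g2 = g(t1), g(t2)
    b = (g2 - g1 + t1**2 - t2**2) / (t2 - t1)
    d = (t2 * (g1 - t1**2) - t1 * (g2 - t2**2)) / (t2 - t1)
    t_star = min(max(-b / 2.0, t1), t2)   # vertex of t^2 + b*t + d, clipped
    return t_star**2 + b * t_star + d, t_star

g = lambda t: t * t - abs(t)              # g(t) - t^2 = -|t| is concave
lb, t_star = interval_lower_bound(g, -1.0, 1.0)
```

Here the bound is −1 at t* = 0, safely below the true minimum −1/4; the branch-and-bound loop would subdivide at t*.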
Numerical Results
Table: Results of B&B, CPLEX, and BARON for problem (P1)

m/n         B&B #val    time    #iter   CPLEX #val   time     BARON #val   time
10/5        -0.6382     0.06    4.6     -0.6382      0.12     -0.6382      0.23
10/20       -3.0354     0.05    3.8     -3.0354      0.17     -3.0354      0.93
20/25       -2.9929     0.07    4.8     -2.9929      0.16     -2.9929      1.11
25/50       -5.6372     0.06    4.4     -5.6372      0.20     -5.6372      103.69
50/100      -10.8368    0.10    4.2     -10.8368     1.18     -10.8368     889.29
100/200     -19.8384    0.20    4.0     -19.8384     4.30     -            -
100/500     -43.9165    0.51    3.8     -43.9165     10.85    -            -
200/1000    -87.6703    1.51    3.8     -87.6703     106.78   -            -
300/1500    -133.3308   2.87    3.0     *            *        -            -
400/2000    -177.0847   5.84    3.8     *            *        -            -
500/2500    -216.0523   10.03   3.8     *            *        -            -
600/3000    -264.7072   15.00   3.6     *            *        -            -
700/3500    -301.3223   25.68   4.0     *            *        -            -
800/4000    -338.7711   34.11   3.8     *            *        -            -
900/4500    -387.8831   46.00   3.8     *            *        -            -
1000/5000   -439.0814   57.28   3.6     *            *        -            -
Regularized Total Least Squares
min_{x ∈ R^n} ‖Ax − b‖^2 / (‖x‖^2 + 1) + ρ‖Lx‖^2.

Bi-hidden-convexifying:

min_{α ≥ 1} G(α) := min_{‖x‖^2 = α − 1} ‖Ax − b‖^2/α + ρ‖Lx‖^2.
Lower bound
The lower bound of G(α) over α ∈ [α_i, α_{i+1}]:

min_{α ∈ [α_i, α_{i+1}]} c_1 α + c_2/α + c_3,

where

c_1 = (α_{i+1} λ(α_{i+1}) − α_i λ(α_i)) / (α_{i+1} − α_i),
c_2 = α_i α_{i+1} ( c_1 − (G(α_{i+1}) − G(α_i)) / (α_{i+1} − α_i) ),
c_3 = (α_{i+1} G(α_{i+1}) − α_i G(α_i)) / (α_{i+1} − α_i) − c_1(α_{i+1} + α_i).
Computational Complexity
Theorem
Under some mild assumptions, the new global optimization algorithm requires at most

⌈ 2U √(α_max) (α_max − α_min) / (α_min √ε) ⌉

iterations, where U = max_{α ∈ [α_min, α_max]} λ(α) + αλ′(α) is a well-defined finite number and λ(α) is the λ-solution of the KKT system.
Numerical results for image deblurring with different noise (n = 1024)

       Bisection        Sawtooth           Ours
σ      time    #iter    time      #iter    time    #iter
0.01   9.46    17.2     477.20    749.6    8.60    14.4
0.03   12.33   21.9     2673.61   4195.1   10.11   16.6
0.05   8.91    16.2     5071.03   7780.0   10.52   17.0
0.08   10.04   18.0     4986.66   7670.6   10.68   17.0
0.1    11.25   19.8     5261.11   7958.7   11.56   18.4
0.3    17.65   29.4     5764.56   8520.1   10.94   17.0
0.5    18.59   30.8     6106.53   9023.9   11.19   17.4
Ball-constrained Sum-of-two-ratios
(P)  max f(x) = x^T Bx / (x^T Wx) + x^T Dx
s.t. ‖x‖ = 1,

which is a normalization of (SRQ):

(SRQ)  max_{x ≠ 0} x^T Bx / (x^T Wx) + x^T Dx / (x^T Vx).
Reformulation of (P)
(Pα)  max_{α ∈ [λ_1, λ_n]} G(α) := max_{x^T Wx = α, x ∈ S} x^T Bx/α + x^T Dx,

where λ_1 = λ_min(W), λ_n = λ_max(W).
Novel overestimation
Theorem
Let

c_1 = (α_{i+1} ν_1(α_{i+1}) − α_i ν_1(α_i)) / (α_{i+1} − α_i),
c_2 = α_i α_{i+1} ( c_1 − (G(α_{i+1}) − G(α_i)) / (α_{i+1} − α_i) ),
c_3 = (α_{i+1} G(α_{i+1}) − α_i G(α_i)) / (α_{i+1} − α_i) − c_1(α_{i+1} + α_i).

If c_1 < 0, c_2 < 0, and ᾱ := √(c_2/c_1) ∈ (α_i, α_{i+1}), then

max_{α ∈ [α_i, α_{i+1}]} G(α) ≤ max_{α ∈ [α_i, α_{i+1}]} Ḡ(α) = 2√(c_1 c_2) + c_3,

where Ḡ(α) := c_1 α + c_2/α + c_3. Otherwise, one of the two endpoints is optimal.
Complexity
Theorem
Suppose B, W and D are all diagonal; then (P) can be solved in O(n^2) time.
Theorem
The total complexity of the new branch-and-bound algorithm for approximately solving (Pα) is linear in terms of N, an upper bound for the number of all non-zero entries in B, D and W.

L. Wang, Y. Xia, A linear-time algorithm for globally maximizing the sum of a generalized Rayleigh quotient and a quadratic form on the unit sphere, SIAM J. Optim. 29(3), 1844–1869, 2019
Numerical results for small-size problems
Table: Numerical comparison among BARON, Algorithm Lipschitz-BB and the new BB.

     BARON      Lipschitz-BB        New BB
n    time       time      #iter     time   #iter
3    1.07       28.87     140.0     0.03   43.3
4    5.23       22.19     104.7     0.03   44.8
5    17.75      23.58     105.2     0.03   43.8
6    290.36     18.52     85.0      0.03   44.2
7    1199.74    24.70     114.3     0.04   44.4
8    —          22.35     100.3     0.05   44.9
Table: Numerical comparison between Algorithms L-BB and new-BB for solving (P) with η = 1.

      L-BB                 New-BB
n     time      #iter      time   #iter
30    683.88    2704.0     0.22   39.8
50    1034.22   3390.4     0.44   40.4
80    1825.18   3778.2     0.94   40.1
100   —         —          1.88   43.2
Table: Numerical comparison between Algorithms L-BB and new-BB for solving (P) with η = 10.

      L-BB                New-BB
n     time      #iter     time    #iter
30    31.01     124.4     0.22    42.2
50    53.63     182.1     0.45    42.6
80    77.67     167.2     1.06    44.8
100   137.11    217.3     1.87    44.7
120   171.04    193.1     2.68    44.5
150   270.26    185.7     4.13    45.0
180   528.61    226.8     5.87    45.9
200   667.49    216.6     7.03    45.2
220   874.41    217.4     8.56    46.0
250   1439.84   245.2     11.13   46.1
280   2186.28   263.9     14.02   46.1
300   2941.02   284.2     15.95   46.8
320   —         —         17.90   45.9
General Nonconvex QP
Nonconvex QP with multiple negative eigenvalues:

min_{x ∈ X} x^T Q_+ x + q^T x − ‖Cx‖^2
= min_{t ∈ Δ} g(t) = min_{x ∈ X} x^T Q_+ x + q^T x − 2t^T Cx + ‖t‖^2
≥ min_{t ∈ Δ} min_{y_1 ∈ R, y_2 ∈ R^r} y_1 + t^T y_2 + ‖t‖^2
  s.t. y_1 + (t^i)^T y_2 + ‖t^i‖^2 ≥ g(t^i), i = 1, …, r + 1
= min_{t ∈ Δ} max_u { ∑_{i=1}^{r+1} u_i(g(t^i) − ‖t^i‖^2) + ‖t‖^2 : ∑_{i=1}^{r+1} u_i = 1; ∑_{i=1}^{r+1} u_i t^i = t; u ≥ 0 }
= min_{α ∈ Δ} α^T (g − diag(R^T R)) + α^T R^T R α,

where R = (t^1, …, t^{r+1}), g = (g(t^1), …, g(t^{r+1}))^T.
Conclusions
When solving a nonconvex optimization, we have the following steps:
1. Is there any hidden convex structure? If not, go to Step 2.
2. Is it NP-hard?
3. Establish necessary and/or sufficient conditions for global/local minimizers.
4. Is there a polynomial-time approximation algorithm?
5. If the problem is not far away from convex optimization, global optimization may be a practical choice.
Thank you for your time!
yxia@buaa.edu.cn