
Optimization theory and methods: from convex to nonconvex

Yong Xia

School of Mathematical Sciences, Beihang University

May 27, 2020

(Beihang University) 1 / 79

Outline

1 Theory: Hidden Convexity

2 Theory: Optimization Conditions

3 Optimality conditions for local and global solutions

4 Theory: Complexity

5 Methods: Approximation Algorithm

6 Methods: Global Optimization Algorithm

7 Conclusions


Optimization problem

min f(x)

s.t. x ∈ Ω.

Convex Optimization: $\Omega$ is a convex set and $f : \mathbb{R}^n \to \mathbb{R}$ is a convex function. Applications: machine learning.

Non-convex Optimization: either $\Omega$ or $f : \mathbb{R}^n \to \mathbb{R}$ is non-convex. For example, deep learning.


Non-convex Optimization

Problem 2016 (Y. Xia et al. 2020)

$$\min_{x \in [-1,1]^2}\ 100x_1^2 + x_2^2 + \frac{x_1 + x_2}{2.016 + x_1 + x_2}.$$

[Figure: surface plot of the function value over $(x_1, x_2) \in [-1,1]^2$.]


Linear Programming

Ax = b

Ax = b, x ≥ 0.

$$\min\ c^Tx \quad \text{s.t.}\quad Ax = b,\ x \ge 0.$$


Duality theory

The optimal value of the primal problem

$$\min\ c^Tx \quad \text{s.t.}\quad Ax = b,\ x \ge 0,$$

is equal to the optimal value of the dual problem (strong duality)

$$\max\ b^Ty \quad \text{s.t.}\quad A^Ty \le c.$$


How to check the unsolvability of $\{x : Ax \le b,\ x \ge 0\}$?

$$\min\{c^Tx : Ax = b,\ x \ge 0\} = \max\{t : \{x : c^Tx < t,\ Ax = b,\ x \ge 0\} = \emptyset\}.$$


Classical Farkas Lemma

Theorem

The following two statements are equivalent:
(i) The system $Ax \le b$, $x \ge 0$ has no solution.
(ii) There exists a $y \ge 0$ such that $A^Ty \ge 0$ and $b^Ty < 0$.

J. Farkas, Über die Theorie der einfachen Ungleichungen, Journal für die reine und angewandte Mathematik, 124, 1-27, 1902
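Statement (ii) is easy to verify numerically once a candidate $y$ is available. The following is a minimal sketch in plain Python; the function name `is_farkas_certificate` and the toy data are illustrative assumptions, not part of the original slides.

```python
def is_farkas_certificate(A, b, y, tol=1e-12):
    """Return True if y certifies that {x : A x <= b, x >= 0} is empty.

    The certificate conditions are y >= 0, A^T y >= 0 and b^T y < 0.
    """
    m, n = len(A), len(A[0])
    if len(y) != m or any(yi < -tol for yi in y):
        return False
    # Column-wise products give the entries of A^T y.
    aty = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
    bty = sum(b[i] * y[i] for i in range(m))
    return all(v >= -tol for v in aty) and bty < -tol


# Infeasible toy system: x1 + x2 <= -1 with x1, x2 >= 0.
A = [[1.0, 1.0]]
b = [-1.0]
certificate_ok = is_farkas_certificate(A, b, [1.0])
# Feasible toy system (x = 0 works), so the same y is not a certificate.
no_certificate = is_farkas_certificate(A, [1.0], [1.0])
```

The check only validates a given certificate; finding $y$ in general requires solving a linear program.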


Nonlinear Farkas Lemma

Theorem 21.1 in [2]

Theorem

Let $f, g_1, \dots, g_m : \mathbb{R}^n \to \mathbb{R}$ be convex functions and let $C \subseteq \mathbb{R}^n$ be a convex set. Assume that the Slater condition holds for $g_1, \dots, g_m$, i.e., there exists an $x \in \operatorname{rel\,int} C$ such that $g_j(x) < 0$, $j = 1, \dots, m$. The following two statements are equivalent:
(i) The system $f(x) < 0$, $g_i(x) \le 0$, $i = 1, \dots, m$, $x \in C$ is not solvable.
(ii) There exist $\lambda_i \ge 0$, $i = 1, \dots, m$, such that

$$f(x) + \sum_{i=1}^m \lambda_i g_i(x) \ge 0 \quad \text{for all } x \in C.$$

R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, N.J., 1970


From convex to nonconvex: S-lemma

Theorem (Yakubovich 1971)

Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be quadratic functions and suppose that there is an $x \in \mathbb{R}^n$ such that $g(x) < 0$. Then the following two statements are equivalent.
(i) There is no $x \in \mathbb{R}^n$ such that $f(x) < 0$, $g(x) \le 0$.
(ii) There is a nonnegative number $y \ge 0$ such that $f(x) + y\,g(x) \ge 0$, $\forall x \in \mathbb{R}^n$.

V.A. Yakubovich, S-procedure in nonlinear control theory, Vestnik Leningrad. Univ., 1, 62-71, 1971 (in Russian)

I. Pólik, T. Terlaky, A survey of the S-lemma. SIAM Rev. 49(3), 371-418, 2007


Applications of S-lemma

The well-known trust-region subproblem (TRS) [5]:

$$(\mathrm{TRS})\quad \min\{x^TAx + b^Tx : x^Tx \le \delta\},$$

where $\delta$ is a positive scalar. In the case that $A \not\succeq 0$, (TRS) is a nonconvex optimization problem. However, we can show that (TRS) is equivalent to a convex optimization problem in the sense that they share the same optimal solution.

D.M. Gay, Computing optimal locally constrained steps. SIAM J. Sci. Stat. Comput. 2(2), 186-197, 1981


Applications of S-lemma

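Strong duality holds for (TRS):

$$
\begin{aligned}
(\mathrm{TRS}) &= \min_x \{x^TAx + b^Tx : x^Tx \le \delta\} \\
&= \sup_t \{t : \{x : t > x^TAx + b^Tx,\ x^Tx \le \delta\} = \emptyset\} \\
&= \sup_t \{t : \exists \lambda \ge 0 : x^TAx + b^Tx - t + \lambda(x^Tx - \delta) \ge 0,\ \forall x \in \mathbb{R}^n\} \\
&= \sup_t \left\{t : \begin{pmatrix} A + \lambda I & \frac{b}{2} \\ \frac{b^T}{2} & -t - \lambda\delta \end{pmatrix} \succeq 0,\ \lambda \ge 0\right\},
\end{aligned}
$$

which is a semidefinite program (SDP).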
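The chain of equalities above can be checked on a scalar instance, where the linear matrix inequality reduces (by the Schur complement, for $a + \lambda > 0$) to $t \le -\lambda\delta - b^2 / (4(a + \lambda))$. The sketch below is an illustration under that assumption; it uses plain grid search rather than an SDP solver, and the function names and search width are hypothetical choices.

```python
def trs_primal_1d(a, b, delta, grid=200001):
    """Brute-force min of a*x^2 + b*x over x^2 <= delta on a uniform grid."""
    r = delta ** 0.5
    best = float("inf")
    for k in range(grid):
        x = -r + 2.0 * r * k / (grid - 1)
        best = min(best, a * x * x + b * x)
    return best


def trs_dual_1d(a, b, delta, width=50.0, grid=200001):
    """Maximize t(lam) = -lam*delta - b^2 / (4*(a + lam)) over lam >= 0, a + lam > 0."""
    lo = max(0.0, -a)
    best = float("-inf")
    for k in range(1, grid):  # skip k = 0 so that a + lam > 0 holds strictly
        lam = lo + width * k / (grid - 1)
        best = max(best, -lam * delta - b * b / (4.0 * (a + lam)))
    return best


# Nonconvex instance: a < 0, so the objective is concave in x.
primal = trs_primal_1d(-1.0, 1.0, 1.0)
dual = trs_dual_1d(-1.0, 1.0, 1.0)
```

For this instance the primal minimum is attained at the boundary point $x = -1$, and the dual maximum is attained at $\lambda = 1.5$; both values agree.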


Extensions of S-lemma: S-lemma with equality

Theorem

Let $f(x) := x^TAx + 2a^Tx + c$ and $h(x) := x^TBx + 2b^Tx + d$, where $A, B \in \mathbb{R}^{n \times n}$ are symmetric matrices. Suppose there are $x', x'' \in \mathbb{R}^n$ such that $h(x') < 0 < h(x'')$. Then, except in the case that $A$ has exactly one negative eigenvalue, $B = 0$, $b \ne 0$ and

$$\begin{pmatrix} V^TAV & V^T(Ax_0 + a) \\ (x_0^TA + a^T)V & f(x_0) \end{pmatrix} \succeq 0,$$

where $x_0 = -\frac{d}{2b^Tb}\,b$ and $V \in \mathbb{R}^{n \times (n-1)}$ is the matrix basis of $\mathcal{N}(b)$, the system $f(x) < 0$, $h(x) = 0$ is unsolvable if and only if there exists a real number $\mu$ such that $f(x) + \mu h(x) \ge 0$, $\forall x \in \mathbb{R}^n$.

Y. Xia, S. Wang, R.-L. Sheu, S-Lemma with Equality and Its Applications, Mathematical Programming, Ser. A, 156(1-2), 513-547, 2016


Application: GPS localization problem

$$d_i \approx \|x - a_i\| - r, \quad i = 1, \dots, m,$$

where $x$ is the user's (unknown) location, $r$ is the (unknown) bias caused by the user clock error, $a_1, \dots, a_m$ are the satellites' known locations, and $d_1, \dots, d_m$ are measured pseudoranges.

The squared least squares model

$$\min_{x \in \mathbb{R}^n,\ r \in \mathbb{R}}\ \sum_{i=1}^m \left(\|x - a_i\|^2 - (r + d_i)^2\right)^2$$

can be reformulated as a QP with an equality constraint

$$\min_{x \in \mathbb{R}^n,\ r, t \in \mathbb{R}} \left\{ \sum_{i=1}^m \left(t - 2a_i^Tx - 2d_i r + a_i^Ta_i - d_i^2\right)^2 : x^Tx - r^2 = t \right\}.$$
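The reformulation rests on the identity $\|x - a_i\|^2 - (r + d_i)^2 = (x^Tx - r^2) - 2a_i^Tx - 2d_i r + a_i^Ta_i - d_i^2$, with $t$ standing for $x^Tx - r^2$. A small numerical check of this identity, with randomly generated data that are purely illustrative, could look as follows.

```python
import random


def residual_original(x, r, a, d):
    """||x - a||^2 - (r + d)^2 for one satellite."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a)) - (r + d) ** 2


def residual_lifted(x, r, t, a, d):
    """t - 2 a^T x - 2 d r + a^T a - d^2, valid when t = x^T x - r^2."""
    ax = sum(ai * xi for ai, xi in zip(a, x))
    aa = sum(ai * ai for ai in a)
    return t - 2.0 * ax - 2.0 * d * r + aa - d * d


rng = random.Random(0)  # fixed seed for reproducibility
x = [rng.uniform(-1, 1) for _ in range(3)]
a = [rng.uniform(-1, 1) for _ in range(3)]
r, d = rng.uniform(-1, 1), rng.uniform(0, 2)
t = sum(xi * xi for xi in x) - r * r  # the equality constraint
gap = abs(residual_original(x, r, a, d) - residual_lifted(x, r, t, a, d))
```

The two residuals agree up to rounding error, so the objective becomes quadratic in $(x, r, t)$ and all nonconvexity is moved into the single quadratic equality constraint.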


A new application

Total least squares:

$$\min_{x \in \mathbb{R}^n}\ \frac{\|Ax - b\|^2}{\|x\|^2 + 1}.$$

Total least squares with Tikhonov identical regularization:

$$(\mathrm{TI})\quad \min_{x \in \mathbb{R}^n}\ \frac{\|Ax - b\|^2}{\|x\|^2 + 1} + \rho\|x\|^2.$$


Non-convexity

A two-dimensional example:

$$z = \frac{100 \times ((x + 2)^2 + 3y^2)}{x^2 + y^2 + 1} + x^2 + y^2$$

[Figure: surface plot of $z$ over $(x, y)$.]


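Reformulation of (TI)

$$
\begin{aligned}
(\mathrm{TI})\quad & \min_{x \in \mathbb{R}^n}\ \frac{\|Ax - b\|^2}{\|x\|^2 + 1} + \rho\|x\|^2 \\
=\ & \min_{x \in \mathbb{R}^n}\ \frac{\|Ax - b\|^2 + \rho\|x\|^4 + \rho\|x\|^2}{\|x\|^2 + 1} \\
=:\ & \min_{x \in \mathbb{R}^n}\ \frac{f(x)}{g(x)}.
\end{aligned}
$$

Proposition

$\min_{x \in \mathbb{R}^n} \frac{f(x)}{g(x)} = \lambda$ is equivalent to the quadratic equation

$$\min_{x \in \mathbb{R}^n}\ f(x) - \lambda g(x) = 0.$$

(TI) Finding $\lambda$ such that $\min_x f(x) - \lambda g(x) = 0$.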
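One classical way to find such a $\lambda$ is a Dinkelbach-type iteration, which repeatedly solves the parametric problem and resets $\lambda$ to the current ratio. The sketch below is not taken from the slides: it treats the scalar case $n = 1$ with hypothetical data $a = b = \rho = 1$ and uses a grid search over an assumed box $[-3, 3]$ as the inner solver.

```python
def f(x, a=1.0, b=1.0, rho=1.0):
    """Numerator of (TI) for n = 1: (a x - b)^2 + rho x^4 + rho x^2."""
    return (a * x - b) ** 2 + rho * x ** 4 + rho * x ** 2


def g(x):
    """Denominator of (TI) for n = 1: x^2 + 1 > 0."""
    return x * x + 1.0


# Hypothetical search box [-3, 3] discretized with step 1e-4.
GRID = [-3.0 + 6.0 * k / 60000 for k in range(60001)]


def dinkelbach(lam=0.0, max_iter=50, tol=1e-12):
    """Iterate lam <- f(x)/g(x) with x = argmin f - lam*g until the minimum is ~0."""
    for _ in range(max_iter):
        x = min(GRID, key=lambda z: f(z) - lam * g(z))
        if abs(f(x) - lam * g(x)) <= tol:
            break
        lam = f(x) / g(x)
    return lam, x


lam_star, x_star = dinkelbach()
brute = min(f(z) / g(z) for z in GRID)  # direct minimization of the ratio
```

On the grid, the value of $\lambda$ at termination coincides with the directly computed minimum ratio, which illustrates the proposition.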


Hidden Convexity of (TI)

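$$
\begin{aligned}
& \min_{x \in \mathbb{R}^n}\ \frac{\|Ax - b\|^2}{\|x\|^2 + 1} + \rho\|x\|^2 \\
=\ & \min_{x \in \mathbb{R}^n,\ t}\ \left\{ \frac{\|Ax - b\|^2 + \rho t^2 + \rho t}{t + 1} : \|x\|^2 = t \right\} \\
=\ & \sup\left\{\lambda : \left\{x \in \mathbb{R}^n : \lambda > \frac{\|Ax - b\|^2 + \rho t^2 + \rho t}{t + 1},\ \|x\|^2 = t\right\} = \emptyset\right\} \\
=\ & \sup\left\{\lambda : \left\{x \in \mathbb{R}^n : \|Ax - b\|^2 + \rho t^2 + \rho t - \lambda(t + 1) < 0,\ \|x\|^2 = t\right\} = \emptyset\right\} \\
=\ & \sup\left\{\lambda : \exists \mu : \|Ax - b\|^2 + \rho t^2 + \rho t - \lambda(t + 1) + \mu(\|x\|^2 - t) \ge 0,\ \forall x, t\right\} \\
=\ & \sup\left\{\lambda : \begin{pmatrix} A^TA + \mu I & 0 & -A^Tb \\ 0 & \rho & \frac{\rho - \lambda - \mu}{2} \\ -b^TA & \frac{\rho - \lambda - \mu}{2} & b^Tb - \lambda \end{pmatrix} \succeq 0\right\},
\end{aligned}
$$

which is a semidefinite program (SDP).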


The Simplified SDP

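$$\sup\left\{\lambda : \begin{pmatrix} A^TA + \mu I & 0 & -A^Tb \\ 0 & \rho & \frac{\rho - \lambda - \mu}{2} \\ -b^TA & \frac{\rho - \lambda - \mu}{2} & b^Tb - \lambda \end{pmatrix} \succeq 0\right\}$$

$\Longleftrightarrow$

$$\sup\left\{\lambda : \begin{pmatrix} \Lambda + \mu I & 0 & 0 \\ 0 & \rho & 0 \\ 0 & 0 & -\frac{(\rho - \lambda - \mu)^2}{4\rho} + \|b\|^2 - \lambda - \bar{b}^T(\Lambda + \mu I)^{-1}\bar{b} \end{pmatrix} \succeq 0\right\}$$

where $A^TA = U\Lambda U^T$, $U$ is orthogonal and $\bar{b} = -U^TA^Tb$.

$\Longleftrightarrow$

$$\sup\left\{\lambda : \Lambda_i + \mu \ge 0,\ -\frac{(\rho - \lambda - \mu)^2}{4\rho} + \|b\|^2 - \lambda - \bar{b}^T(\Lambda + \mu I)^{-1}\bar{b} \ge 0\right\}$$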


Reducing SDP to solving a smooth univariate equation

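By eliminating $\lambda$, (SDP) is reduced to

$$\max\ T(\mu) := -\mu - \rho + 2\rho\sqrt{\frac{\mu}{\rho} - \frac{1}{\rho}\sum_{i=1}^n \frac{\bar{b}_i^2}{\Lambda_i + \mu} + \frac{1}{\rho}b^Tb} \quad \text{s.t.}\quad \mu \ge \max_i(-\Lambda_i).$$

$T(\mu)$ is a strictly concave function. The classical Newton's method is workable with a quadratic convergence rate.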
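As a sanity check of the reduction, the scalar case $n = 1$ can be evaluated directly: there $\Lambda = a^2$ and the transformed right-hand side is $-ab$. The sketch below compares the maximum of $T(\mu)$ with a brute-force minimum of (TI). It is only an illustration with assumed data $a = b = \rho = 1$, and it uses plain grid search in place of Newton's method to keep the code short.

```python
import math


def ti_objective(x, a, b, rho):
    """(TI) for n = 1: (a x - b)^2 / (x^2 + 1) + rho x^2."""
    return (a * x - b) ** 2 / (x * x + 1.0) + rho * x * x


def T(mu, a, b, rho):
    """T(mu) for n = 1 (Lambda = a^2, transformed b = -a b); -inf outside the domain."""
    if a * a + mu <= 0.0:
        return float("-inf")
    radicand = mu / rho - (a * b) ** 2 / (rho * (a * a + mu)) + b * b / rho
    if radicand < 0.0:
        return float("-inf")
    return -mu - rho + 2.0 * rho * math.sqrt(radicand)


a, b, rho = 1.0, 1.0, 1.0
# Primal: grid over x in [-5, 5]; dual: grid over mu in (-a^2, -a^2 + 20].
primal = min(ti_objective(-5.0 + k * 1e-4, a, b, rho) for k in range(100001))
dual = max(T(-a * a + k * 1e-4, a, b, rho) for k in range(1, 200001))
```

Both values agree to grid accuracy, which is consistent with the hidden convexity of (TI).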


Extensions of S-lemma: S-lemma with interval bounds

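Theorem

Assume there is an $x \in \mathbb{R}^n$ such that $-\infty < \alpha < h(x) < \beta < \infty$. Then, except in the case that $A$ has exactly one negative eigenvalue, $B = 0$, $b \ne 0$ and there exists a $\nu \ge 0$ such that

$$\begin{pmatrix}
V^TAV & \frac{1}{2b^Tb}V^TAb & V^Ta \\
\frac{1}{2b^Tb}b^TAV & \frac{b^TAb}{(2b^Tb)^2} + \nu & \frac{a^Tb}{2b^Tb} - \frac{\nu}{2}(\alpha + \beta - 2d) \\
a^TV & \frac{a^Tb}{2b^Tb} - \frac{\nu}{2}(\alpha + \beta - 2d) & c + \nu(\alpha - d)(\beta - d)
\end{pmatrix} \succeq 0,$$

where $V \in \mathbb{R}^{n \times (n-1)}$ is the matrix basis of $\mathcal{N}(b)$, the system

$$f(x) < 0, \quad \alpha \le h(x) \le \beta$$

is unsolvable if and only if there is a number $\mu \in \mathbb{R}$ such that

$$f(x) + \mu_-(h(x) - \beta) + \mu_+(\alpha - h(x)) \ge 0, \quad \forall x \in \mathbb{R}^n,$$

where $\mu_+ = \max\{\mu, 0\}$, $\mu_- = \min\{\mu, 0\}$.

S. Wang, Y. Xia, Strong duality for generalized trust region subproblem: S-lemma with interval bounds. Optim. Lett. 9, 1063-1073, 2015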


Positive Lagrangian duality gap

Identical regularized total least squares problem (TLS) (Beck et al. 2006):

$$(\mathrm{ITLS})\quad \min\left\{\frac{\|Ax - b\|^2}{x^Tx + 1} : \alpha \le x^Tx \le \beta\right\},$$

where $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) and $0 \le \alpha \le \beta < +\infty$ is assumed. Our necessary and sufficient condition for strong duality:

Theorem (Yang & Xia, 2000)

The Lagrangian duality gap for (ITLS) is positive if and only if

$$\widetilde{A} := A^TA - \lambda_{\min}\begin{pmatrix} A^TA & -A^Tb \\ -b^TA & b^Tb \end{pmatrix} \cdot I \succ 0,$$

$$\alpha - b^TA\widetilde{A}^{-2}A^Tb > 0.$$

Corollary

Strong Lagrangian duality holds for (ITLS) with $\alpha = 0$.


Closing Lagrangian duality gap

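$$(\mathrm{G\text{-}ITLS})\quad \min\left\{\frac{f_1(x)}{f_2(x)} : \alpha \le f_3(x) \le \beta\right\},$$

where $f_i(x)$ are quadratic functions for $i = 1, 2, 3$ and $f_2 > 0$.

Theorem

There is no Lagrangian duality gap for the scaled (G-ITLS):

$$\min\left\{\frac{f_1(x)}{f_2(x)} : \frac{\alpha}{f_2(x)} \le \frac{f_3(x)}{f_2(x)} \le \frac{\beta}{f_2(x)}\right\}.$$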


Let’s look back

Nonlinear Farkas Lemma: convex;

S-lemma: quadratic nonconvex.


Unifying Farkas and S-lemma

Theorem

(U-lemma) Suppose $f(x)$ and $g(x)$ are two quadratic functions. Let $q_0(z), \dots, q_m(z)$ be convex functions defined in a convex set $\Omega \subseteq \mathbb{R}^n$. Assume that there exist $x \in \mathbb{R}^n$ and $z \in \operatorname{relint}(\Omega)$ such that $g(x) + q_1(z) < 0$, $q_i(z) < 0$, $i = 2, \dots, m$. Then the following two statements are equivalent:

(i) The system

$$f(x) + q_0(z) < 0, \quad g(x) + q_1(z) \le 0, \quad q_i(z) \le 0,\ i = 2, \dots, m, \quad z \in \Omega$$

has no solution $(x, z)$.

(ii) There exist $\mu_i \ge 0$, $i = 1, 2, \dots, m$, such that

$$f(x) + \mu_1 g(x) + q_0(z) + \sum_{i=1}^m \mu_i q_i(z) \ge 0, \quad \forall x \in \mathbb{R}^n,\ z \in \Omega.$$


Application 1.

Solving the p-regularized subproblems for $p > 2$:

$$(p\text{-RS}):\quad \min_x\ x^TAx + b^Tx + \|x\|^p \quad (p > 2)$$

The p-regularized subproblem (p-RS) is a regularization technique for computing a Newton-like step in unconstrained optimization.
When $p = 3$: Nesterov-Polyak subproblem.
When $p = 4$: double-well potential minimization problem.

Y. Xia, R.-L. Sheu, Y.-x. Yuan, Theory and application of p-regularized subproblems for p > 2, Optimization Methods & Software, 32(5), 1059-1077, 2017


Application 2.

A common way of seeking a good answer in numerical analysis is to find a method of calculation that minimizes the backward error of the problem. The basic idea of the backward error analysis technique is to show that the solution obtained is the exact solution of a nearby problem, by use of floating-point arithmetic error bounds. One of the cost functions is as below:

$$\min_x\ \frac{\|Ax - b\|}{\|A\|\|x\| + \|b\|}. \qquad (1)$$

K.E. Schubert, A new look at robust estimation and identification, Ph.D. Dissertation, Univ. of California, Santa Barbara, CA, September 2003.


Trust-region subproblem (TRS)

$$(\mathrm{TRS})\quad \min_{x^Tx = 1}\ \frac{1}{2}x^TQx + c^Tx.$$

[Figure: surface plot over $(y_1, y_2)$ marking the global minimizer, the global maximizer, the local non-global minimizer and the local non-global maximizer.]

Figure: An example in which a local non-global minimizer exists


Problem simplification and the KKT condition

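Let $Q = U^T\Lambda U$, $\Lambda = \mathrm{Diag}(\lambda_1, \dots, \lambda_n)$, and $U = [u_1, \dots, u_n]$, with

$$(0 >)\ \lambda_1 \le \lambda_2 \le \dots \le \lambda_n.$$

Introducing $y = Ux$ and $d = Uc$, we have

$$v(\mathrm{TE}) = \min_{y^Ty = 1}\ f(y) = \frac{1}{2}y^T\Lambda y + d^Ty.$$

We call $(y, \mu)$ a KKT point if:

$$(\Lambda + \mu I_n)y + d = 0, \qquad y^Ty - 1 = 0, \qquad \mu \in \mathbb{R}.$$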


Optimality conditions for local minimizers

Lemma

(1) (Second-order necessary condition.) If $y$ is a local minimizer and $\mu$ is the associated Lagrangian multiplier, then $(y, \mu)$ is a KKT point and

$$v^T(\Lambda + \mu I_n)v \ge 0, \quad \forall v \in \mathbb{R}^n \text{ such that } v^Ty = 0.$$

(2) (Second-order sufficient condition.) Suppose $(y, \mu)$ is a KKT point and

$$v^T(\Lambda + \mu I_n)v > 0, \quad \forall v \in \mathbb{R}^n \text{ such that } v^Ty = 0,\ v \ne 0.$$

Then $y$ is a strict local minimizer.

D. G. Luenberger, Linear and Nonlinear Programming, 2nd ed., Addison-Wesley, Reading, MA, 1984


Necessary and sufficient condition for global minimizer

Theorem (Gay 1981; Sorensen 1982; Moré and Sorensen 1983)

$y$ is a global minimizer if and only if $(y, \mu)$ is a KKT point and

$$\mu \ge -\lambda_1\ (> 0).$$

D. M. Gay, Computing optimal locally constrained steps. SIAM J. Sci. Stat. Comput. 2(2): 186-197, 1981

D. C. Sorensen, Newton's method with a model trust region modification. SIAM J. Numer. Anal. 19(2): 409-426, 1982

J. J. Moré and D. C. Sorensen, Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3): 553-572, 1983
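The global optimality condition suggests a simple numerical recipe in the diagonalized variables: find $\mu > -\lambda_1$ with $\|y(\mu)\| = 1$, where $y_i(\mu) = -d_i / (\lambda_i + \mu)$. The sketch below does this by bisection and compares the result with a brute-force search on the unit circle. It assumes sorted eigenvalues and $d_1 \ne 0$ (the so-called easy case); the function names and the data are illustrative only.

```python
import math


def phi(mu, lam, d):
    """Secular function: sum_i d_i^2 / (lam_i + mu)^2 - 1."""
    return sum(di * di / (li + mu) ** 2 for li, di in zip(lam, d)) - 1.0


def global_minimizer(lam, d, iters=200):
    """Bisection for the root of phi on (-lam_1, inf); assumes sorted lam and d_1 != 0."""
    norm_d = math.sqrt(sum(di * di for di in d))
    lo, hi = -lam[0], -lam[0] + norm_d  # phi -> +inf at lo, phi(hi) <= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid, lam, d) > 0.0:
            lo = mid
        else:
            hi = mid
    mu = hi
    y = [-di / (li + mu) for li, di in zip(lam, d)]  # KKT: (Lambda + mu I) y + d = 0
    return y, mu


def f(y, lam, d):
    """Objective 0.5 * y^T Lambda y + d^T y."""
    return 0.5 * sum(li * yi * yi for li, yi in zip(lam, y)) + sum(
        di * yi for di, yi in zip(d, y))


lam, d = [-2.0, 1.0], [0.5, 0.5]
y_star, mu_star = global_minimizer(lam, d)
# Brute force over the unit circle y = (cos(theta), sin(theta)).
brute = min(
    f([math.cos(2 * math.pi * k / 100000), math.sin(2 * math.pi * k / 100000)], lam, d)
    for k in range(100000))
```

The KKT point found with $\mu \ge -\lambda_1$ attains the brute-force minimum up to discretization error.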


Characterizations of local non-global minimizer

Lemma (Martínez, 1994)

If $y$ is a local non-global minimizer, then $(y, \mu)$ is a KKT point, $\lambda_1 < \lambda_2$ and $\mu \in (-\lambda_2, -\lambda_1)$.

Define the so-called secular function:

$$\varphi(\mu) = \sum_{i=1}^n \frac{d_i^2}{(\lambda_i + \mu)^2} - 1.$$

$\mu$ must be a zero point of $\varphi(\mu)$.

J. M. Martínez, Local minimizers of quadratic functions on Euclidean balls and spheres, SIAM J. Optim. 4: 159-176, 1994


Secular function

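$$\varphi(\mu) = \sum_{i=1}^n \frac{d_i^2}{(\lambda_i + \mu)^2} - 1, \qquad
\varphi'(\mu) = -2\sum_{i=1}^n \frac{d_i^2}{(\lambda_i + \mu)^3}, \qquad
\varphi''(\mu) = 6\sum_{i=1}^n \frac{d_i^2}{(\lambda_i + \mu)^4}.$$

$\varphi''(\mu) > 0$, so $\varphi(\mu)$ is strongly convex.

$(d_1 \ne 0)$: $\lim_{\mu \to -\lambda_1} \varphi(\mu) = +\infty$; $(d_1 = 0)$: $\lim_{\mu \to -\lambda_1} \varphi(\mu) = \alpha$.

$(d_2 \ne 0)$: $\lim_{\mu \to -\lambda_2} \varphi(\mu) = +\infty$; $(d_2 = 0)$: $\lim_{\mu \to -\lambda_2} \varphi(\mu) = \beta$.

$\varphi(\mu)$ has at most two zero points in $(-\lambda_2, -\lambda_1)$.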
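Because the secular function is convex on $(-\lambda_2, -\lambda_1)$, its zeros in that interval can be located by first finding its minimizer (a root of the increasing derivative) and then bisecting on each side. The following is a minimal sketch under the assumptions $\lambda_1 < \lambda_2$, $d_1 \ne 0$ and $d_2 \ne 0$; the helper names and the example data are hypothetical.

```python
def phi(mu, lam, d):
    """Secular function: sum_i d_i^2 / (lam_i + mu)^2 - 1."""
    return sum(di * di / (li + mu) ** 2 for li, di in zip(lam, d)) - 1.0


def dphi(mu, lam, d):
    """First derivative of the secular function."""
    return -2.0 * sum(di * di / (li + mu) ** 3 for li, di in zip(lam, d))


def bisect(fun, lo, hi, iters=200):
    """Root of fun on [lo, hi], assuming fun(lo) and fun(hi) have opposite signs."""
    sign_lo = fun(lo) > 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (fun(mid) > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def zeros_in_gap(lam, d, eps=1e-9):
    """Zeros of phi in (-lam_2, -lam_1); assumes lam_1 < lam_2, d_1 != 0, d_2 != 0."""
    lo, hi = -lam[1] + eps, -lam[0] - eps
    # phi is convex on the gap, so phi' is increasing: bisect phi' for the minimizer.
    mu_min = bisect(lambda m: dphi(m, lam, d), lo, hi)
    if phi(mu_min, lam, d) >= 0.0:
        return []  # no sign change; the tangent case is ignored in this sketch
    return [bisect(lambda m: phi(m, lam, d), lo, mu_min),
            bisect(lambda m: phi(m, lam, d), mu_min, hi)]


lam, d = [-2.0, 1.0], [0.5, 0.5]
roots = zeros_in_gap(lam, d)
```

For this data the function has two zeros in the interval; the derivative is negative at the smaller one and positive at the larger one.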


Optimality conditions for local non-global minimizer

Theorem (Martınez,1994)

(1) If either $\lambda_1 = \lambda_2$ or $d_1 = 0$, then there is no local non-global minimizer.

(2) There is at most one local non-global minimizer.

(3) (Necessary condition.) If $y$ is a local non-global minimizer, then $(y, \mu)$ is a KKT point, $\mu \in (-\lambda_2, -\lambda_1)$ and

$$\varphi'(\mu) \ge 0.$$

(4) (Sufficient condition.) If $(y, \mu)$ is a KKT point, $\mu \in (-\lambda_2, -\lambda_1)$ and

$$\varphi'(\mu) > 0,$$

then $y$ is the unique local non-global minimizer.


An example that ϕ′(µ) = 0

[Figure: surface plot over $(y_1, y_2)$ marking the global minimizer, the global maximizer and a second-order KKT point.]

Figure: A two-dimensional example that has no local non-global minimizer/maximizer.


Necessary and sufficient condition for local non-global minimizer

Theorem (Wang & Xia 2020)

$y$ is the unique local non-global minimizer if and only if $(y, \mu)$ is a KKT point, $\mu \in (-\lambda_2, -\lambda_1)$ and $\varphi'(\mu) > 0$.

Theorem (Second-order necessary and sufficient condition)

$x$ is a local non-global minimizer of (TE) if and only if there exists a unique Lagrangian multiplier $\mu$ such that

$$(Q + \mu I_n)x + c = 0,$$

$$x^Tx = 1,$$

$$v^T(Q + \mu I_n)v > 0, \quad \forall v \in \mathbb{R}^n \text{ such that } v^Tx = 0,\ v \ne 0,$$

$$Q + \mu I_n \not\succeq 0.$$

J. Wang, Y. Xia, Closing the gap between necessary and sufficient conditions for local non-global minimizer of trust region subproblem, SIAM J. Optim. 2020, accepted

Application

With the help of the local non-global minimizer, one can globally solve the extended trust-region subproblem:

$$\min\ \frac{1}{2}x^TQx + c^Tx \quad \text{s.t.}\quad x^Tx \le 1, \quad a_i^Tx - b_i \le 0,\ i = 1, \dots, m.$$


NP=P ?


Chebyshev center problem

Finding the smallest ball enclosing the convex set $\Omega$:

$$\min_z\ \max_{x \in \Omega}\ \|x - z\|^2.$$

Easy cases:

$\Omega$: a given set of finite points.

$\Omega$: intersection of two ellipsoids in the complex domain.


Chebyshev center of intersection of balls

Finding the smallest ball enclosing the intersection of the given balls:

$$(\mathrm{CCB})\quad \min_{z \in \mathbb{R}^n}\ \max_{\|x - a_i\| \le r_i,\ i = 1, 2, \dots, p}\ \|x - z\|^2,$$

$\|\cdot\|$: $\ell_2$-norm.


Applications

Example

Robust estimation. Suppose $y_k$, $k = 1, \dots, p$, are the $p$ measurements of the unknown $x$ with bounded noises, i.e., $\|y_k - x\| \le \rho$ for some $\rho > 0$. Then, a robust recovery of $x$ can be obtained by solving (CCB).

Example

In non-cooperative wireless network positioning, the region for a target to communicate with some reference nodes is an intersection of many balls. The position error is usually bounded by the diameter of the smallest ball enclosing this region. This leads to a direct application of (CCB) with $n = 2$ or $3$.


Known Results

Theorem (Beck 2007)

v(CCB) = v(SDP) as long as p ≤ n− 1.

Theorem (Beck 2009)

v(CCB) = v(SDP) as long as p ≤ n.

No complexity results when p > n.


What happens when p = 3 > n = 2?


Our complexity results

Theorem

The complexity of (CCB) when $n = 2$ is at most $O(p^2)$.

Theorem

(CCB) is NP-hard.

Theorem

Suppose either $n$ is fixed or $p = n + q$ with a fixed integer $q$. Then (CCB) is polynomially solvable in at most $\binom{p}{n} \cdot O(n^3)$ time.

Y. Xia, M. Yang, S. Wang, Chebyshev Center of the Intersection of Balls: Complexity, Relaxation and Approximation, Mathematical Programming, 2020, https://doi.org/10.1007/s10107-020-01479-0


Chebyshev center of the intersection of two ellipsoids

$$(\mathrm{CC})\quad \min_z\ \max_{x \in \Omega}\ \|x - z\|^2,$$

where

$$\Omega := \{x \in \mathbb{R}^n : \|F_ix + g_i\|^2 \le 1,\ i = 1, 2\},$$

and $F_i \in \mathbb{R}^{m_i \times n}$, $g_i \in \mathbb{R}^{m_i}$.

A. Beck, Y. Eldar, Regularization in regression with bounded noise: A Chebyshev center approach. SIAM Journal on Matrix Analysis and Applications 29(2), 606-625, 2007


Application in bounded error estimation

Linear regression model $Ax \approx b$, where the input data matrix $A$ is ill-conditioned.

Admissible solutions to the linear system:

$$F = \{x \in \mathbb{R}^n : \|Lx\|^2 \le \eta,\ \|Ax - b\|^2 \le \rho\},$$

where the regularization constraint $\|Lx\|^2 \le \eta$ is to stabilize $x$ and $\|Ax - b\|^2 \le \rho$ is the error bound constraint.

A robust approximation of the true solution $x$ is given by the Chebyshev center of $F$.

M. Milanese, A. Vicino, Optimal estimation theory for dynamic systems with set membership uncertainty: an overview. Automatica 27(6), 997-1009, 1991


An Example

Figure: An example in two dimensions where the input ellipse is plotted in solid line. The dotted and dashed circles are the Chebyshev solutions and the SDP approximation, respectively. Chebyshev centers and the corresponding SDP approximation are marked by ∗ and +, respectively.


Complexity for (CC)

Theorem

For any $\varepsilon > 0$, (CC) can be solved in

$$O(n^8 \log\log u^{-1})\, \log\left(\frac{(6R + 4\|z_c\|) \cdot \sigma_{\max}(F_1)}{\sigma_{\min}^2(F_1)} \cdot \frac{1}{\varepsilon}\right).$$

X. Cen, Y. Xia et al., On Chebyshev Center of the Intersection of Two Ellipsoids, WCGO 2019


Weighted Maximin Dispersion Problem

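Consider the following weighted maximin dispersion problem

$$(\mathrm{MaxMin})\quad \max_{x \in \Omega}\ \min_{i = 1, \dots, m}\ \omega_i\|x - x^i\|^2,$$

where $\Omega = \{x \mid \|x\|_p \le 1\}$ with $p \ge 2$, $x^1, \dots, x^m \in \mathbb{R}^n$ are $m$ given points, $\omega_i > 0$ for $i = 1, \dots, m$, and $\|x\| = \sqrt{x^Tx}$ is the Euclidean norm.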


On the objective function

nonsocial transient behavior

E.C. Kim, Nonsocial Transient Behavior: Social Disengagement on theGreyhound Bus, Symbolic Interaction (2012), 35(3) pp. 267–283.


An Application

m cities: x1, . . . , xm.

Choosing a location in the region χ for the facility such that the amountof pollution reaching any city is minimized.


Complexity

$p = +\infty$. (MaxMin) is NP-hard. (Haines et al., SIAM J. Optim. 2013).

$p = 2$. (MaxMin) is NP-hard. When $m \le n - 1$, it is polynomially solvable. (Wang & Xia, SIAM J. Optim. 2016).

$2 < p < +\infty$. Unknown!

S. Wang and Y. Xia, On the Ball-Constrained Weighted Maximin Dispersion Problem, SIAM J. Optim. 26(3), 1565-1588, 2016


Approximation algorithm for (CCB)

$$(\mathrm{CCB})\quad \min_{z \in \mathbb{R}^n}\ \left\{ f(z) = \max_{\|x - a_i\| \le r_i,\ i = 1, 2, \dots, p}\ \|x - z\|^2 \right\}.$$


Standard quadratic programming relaxation

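Using SDP relaxation for the inner maximization

$$
\begin{aligned}
(\mathrm{DCC})\quad \min_z\ v(\mathrm{SDP}(z)) + \|z\|^2 =\ & \min_{y, \lambda, z}\ y + \|z\|^2 \\
& \text{s.t.}\ \begin{pmatrix} \left(-1 + \sum_{i=1}^p \lambda_i\right) I_n & z - \sum_{i=1}^p \lambda_i a_i \\ z^T - \sum_{i=1}^p \lambda_i a_i^T & y + \sum_{i=1}^p \lambda_i(\|a_i\|^2 - r_i^2) \end{pmatrix} \succeq 0, \\
& \phantom{\text{s.t.}}\ \lambda_i \in \mathbb{R}_+,\ i = 1, \dots, p,
\end{aligned}
$$

which reduces to the standard quadratic program:

$$(\mathrm{SQP})\quad \min_\lambda\ \sum_{i=1}^p \lambda_i(r_i^2 - \|a_i\|^2) + \left\|\sum_{i=1}^p \lambda_i a_i\right\|^2 \quad \text{s.t.}\quad \sum_{i=1}^p \lambda_i = 1,\ \lambda_i \ge 0,\ i = 1, \dots, p.$$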


Approximation of the SQP

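Theorem (Xia-Yang-Wang 2020)

The candidate solution $z = \sum_{i=1}^p \lambda_i^* a_i$ satisfies

$$\left(\frac{\sqrt{2} + \gamma}{1 - \gamma}\right)^2 v(\mathrm{CCB}) \ge f(z) \ge v(\mathrm{CCB}),$$

where $\gamma$ is equal to the optimal value of

$$\min_{x \in \mathbb{R}^n}\ \max_{i = 1, \dots, p}\ \frac{\|x - a_i\|}{r_i}.$$

Moreover, let $d_{\max} = \max_{i,j = 1, \dots, p} \|a_i - a_j\|$ and $r_{\min} = \min_i r_i$. Then

$$\gamma \le \sqrt{\frac{n}{2(n + 1)}} \cdot \frac{d_{\max}}{r_{\min}} < \frac{d_{\max}}{\sqrt{2}\, r_{\min}}.$$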


Weighted Maximin Dispersion Problem

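$$(\mathrm{MaxMin})\quad \max_{\|x\|_p \le 1}\ f(x) := \min_{i = 1, \dots, m}\ \omega_i\left(x^Tx - 2(x^i)^Tx + (x^i)^Tx^i\right).$$

Lifting $xx^T$ to a matrix $X$ yields

$$
\begin{aligned}
(\mathrm{SDP})\quad \max\ & \min_{i = 1, \dots, m}\ \omega_i\left(\mathrm{Tr}(X) - 2(x^i)^Tx + (x^i)^Tx^i\right) \\
\text{s.t.}\ & X_{11}^{p/2} + X_{22}^{p/2} + \dots + X_{nn}^{p/2} \le 1, \\
& \begin{bmatrix} X & x \\ x^T & 1 \end{bmatrix} \succeq 0.
\end{aligned}
$$

$v(\mathrm{SDP}) \ge v(\mathrm{MaxMin})$.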


General Approximation Algorithm (Haines et al. 2013)

An approximation algorithm for (MaxMin)

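1. Input $\rho \in (0, 1)$ and $x^i$ for $i = 1, \dots, m$. Let $\alpha = \sqrt{2\ln(m/\rho)}$.

2. Solve (SDP) and return the optimal solution $Z^* \in S^{n+1}$. Set $b^i = \left(\sqrt{Z^*_{11}}\,x^i_1, \dots, \sqrt{Z^*_{nn}}\,x^i_n\right)^T$ for $i = 1, \dots, m$.

3. Repeatedly generate $\xi = (\xi_1, \dots, \xi_n)^T$ with independent $\xi_i$ taking the value $\pm 1$ with equal probability until $(b^i)^T\xi < \alpha\|b^i\|$ for $i = 1, \dots, m$.

4. Output $x = \left(\frac{\sqrt{Z^*_{11}}\,\xi_1}{\sqrt{Z^*_{n+1,n+1}}}, \frac{\sqrt{Z^*_{22}}\,\xi_2}{\sqrt{Z^*_{n+1,n+1}}}, \dots, \frac{\sqrt{Z^*_{nn}}\,\xi_n}{\sqrt{Z^*_{n+1,n+1}}}\right)^T$.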


Approximation Algorithm without SDP Relaxation

A new algorithm for (MaxMin)

1. Input $\rho \in (0, 1)$ and $x^i$ for $i = 1, \dots, m$. Let $\alpha = \sqrt{2\ln(m/\rho)}$.

2. Set $b^i = x^i / n^{1/p}$ for $i = 1, \dots, m$. (This replaces the SDP step of the general algorithm.)

3. Repeatedly generate $\xi = (\xi_1, \dots, \xi_n)^T$ with independent $\xi_i$ taking the value $\pm 1$ with equal probability until $(b^i)^T\xi < \alpha\|b^i\|$ for $i = 1, \dots, m$.

4. Output $x = n^{-1/p}\,\xi$.


Our New Approximation Bound for (P)

Theorem

For the solution x returned by the new Algorithm, we have

v(MaxMin) ≥ f(x) >1−

√2n ln(m/ρ)

2· v(MaxMin).


[Figure: the returned objective values plotted against $m$ (from 8 to 32) for our new Algorithm 2 and for Algorithm 1.]

Figure: Numerical comparison between our new approximation algorithm (Algorithm 2) and the algorithm proposed in [3] (Algorithm 1).


Global Optimization for Biconvex Program

For any fixed $x$ ($y$), $f(x, y)$ is a convex function of $y$ ($x$); for example, the bilinear function $f(x, y) = x^Ty$.

Many nonconvex programming problems can be reformulated as a biconvex program.

Our algorithm scheme: biconvexify + branch-and-bound.


Nonconvex QP

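The "simplest" nonconvex QP is NP-hard:

$$(\mathrm{NQP})\quad \min\ f(x) = x^TQ_+x + q^Tx - (c^Tx)^2 \quad \text{s.t.}\quad x \in X,$$

where $X$ is a polytope, and $Q_+ \succeq 0$.

Bi-convex:

$$
\begin{aligned}
& \min_{x \in X,\ t \in \mathbb{R}}\ x^TQ_+x + q^Tx - (c^Tx)^2 + (t - c^Tx)^2 \\
=\ & \min_{x \in X,\ t \in \mathbb{R}}\ x^TQ_+x + q^Tx + t^2 - 2t\,c^Tx.
\end{aligned}
$$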


New Global Algorithm for solving nonconvex QP

$$\min_{t \in [t_{\min}, t_{\max}]}\ \left\{ g(t) = \min_{x \in X}\ x^TQ_+x + q^Tx - 2t\,c^Tx + t^2 \right\},$$

where $t_{\min} = \min_{x \in X} c^Tx$ and $t_{\max} = \max_{x \in X} c^Tx$.

Branch-and-bound algorithm for minimizing $g(t)$.

The key idea of the algorithm is to estimate a lower bound of $g(t)$ over any interval $[t_1, t_2] \subseteq [t_{\min}, t_{\max}]$ after calculating the objective values $g(t_1)$ and $g(t_2)$.


Novel underestimation

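$$
\begin{aligned}
& \min_{t \in [t_1, t_2]}\ \left\{ g(t) = \min_{x \in X}\ x^TQ_+x + q^Tx - 2t\,c^Tx + t^2 \right\} \\
\ge\ & \min_{t \in [t_1, t_2]}\ \min_{y_1, y_2 \in \mathbb{R}}\ y_1 + t y_2 + t^2 \quad \text{s.t.}\ \ y_1 + t_1 y_2 + t_1^2 \ge g(t_1),\ \ y_1 + t_2 y_2 + t_2^2 \ge g(t_2) \\
=\ & \min_{t \in [t_1, t_2]}\ \max_\mu\ \mu_1(g(t_1) - t_1^2) + \mu_2(g(t_2) - t_2^2) + t^2 \quad \text{s.t.}\ \ \mu_1 + \mu_2 = 1;\ \mu_1, \mu_2 \ge 0;\ \mu_1 t_1 + \mu_2 t_2 = t \\
=\ & \min_{t \in [t_1, t_2]}\ t^2 + bt + d,
\end{aligned}
$$

where

$$b = \frac{g(t_2) - g(t_1) + t_1^2 - t_2^2}{t_2 - t_1}, \qquad d = \frac{t_2(g(t_1) - t_1^2) - t_1(g(t_2) - t_2^2)}{t_2 - t_1}.$$

The optimal $t^*$ is used for the next subdivision.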
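The bound is valid because $g(t) - t^2$ is a pointwise minimum of functions affine in $t$, hence concave, so it lies above its chord on $[t_1, t_2]$. A minimal sketch of one bounding step is given below; the toy instance ($Q_+ = 0.5$, $q = 0$, $c = 1$, $X = [0, 1]$, so that $g$ has a closed form) and the function names are assumptions made for illustration.

```python
def g(t):
    """g(t) = min_{x in [0,1]} 0.5 x^2 - 2 t x + t^2 (toy data: Q+ = 0.5, q = 0, c = 1)."""
    x = min(1.0, max(0.0, 2.0 * t))  # clipped unconstrained minimizer x = 2t
    return 0.5 * x * x - 2.0 * t * x + t * t


def interval_lower_bound(t1, t2, g1, g2):
    """Minimize t^2 + b t + d over [t1, t2]; returns (t_star, bound)."""
    b = (g2 - g1 + t1 * t1 - t2 * t2) / (t2 - t1)
    d = (t2 * (g1 - t1 * t1) - t1 * (g2 - t2 * t2)) / (t2 - t1)
    t_star = min(t2, max(t1, -b / 2.0))  # vertex of the parabola, clipped to the interval
    return t_star, t_star * t_star + b * t_star + d


t1, t2 = 0.0, 1.0
t_star, bound = interval_lower_bound(t1, t2, g(t1), g(t2))
# Reference value: minimum of g on a fine grid of [t1, t2].
true_min = min(g(t1 + (t2 - t1) * k / 10000) for k in range(10001))
```

Here the bound is $-0.5625$ at $t^* = 0.75$, while the true minimum of $g$ on $[0, 1]$ is $-0.5$; a branch-and-bound scheme would split the interval at $t^*$ and repeat.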


Numerical Results

Table: Results of B&B, CPLEX, and BARON for problem (P1)

m / n         B&B val     B&B time   B&B #iter   CPLEX val   CPLEX time   BARON val   BARON time
10 / 5        -0.6382     0.06       4.6         -0.6382     0.12         -0.6382     0.23
10 / 20       -3.0354     0.05       3.8         -3.0354     0.17         -3.0354     0.93
20 / 25       -2.9929     0.07       4.8         -2.9929     0.16         -2.9929     1.11
25 / 50       -5.6372     0.06       4.4         -5.6372     0.20         -5.6372     103.69
50 / 100      -10.8368    0.10       4.2         -10.8368    1.18         -10.8368    889.29
100 / 200     -19.8384    0.20       4.0         -19.8384    4.30         -           -
100 / 500     -43.9165    0.51       3.8         -43.9165    10.85        -           -
200 / 1000    -87.6703    1.51       3.8         -87.6703    106.78       -           -
300 / 1500    -133.3308   2.87       3.0         *           *            -           -
400 / 2000    -177.0847   5.84       3.8         *           *            -           -
500 / 2500    -216.0523   10.03      3.8         *           *            -           -
600 / 3000    -264.7072   15.00      3.6         *           *            -           -
700 / 3500    -301.3223   25.68      4.0         *           *            -           -
800 / 4000    -338.7711   34.11      3.8         *           *            -           -
900 / 4500    -387.8831   46.00      3.8         *           *            -           -
1000 / 5000   -439.0814   57.28      3.6         *           *            -           -


Regularized Total Least Squares

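$$\min_{x \in \mathbb{R}^n}\ \frac{\|Ax - b\|^2}{\|x\|^2 + 1} + \rho\|Lx\|^2.$$

bi-hidden-convexifying

$$\min_{\alpha \ge 1}\ \left\{ G(\alpha) := \min_{\|x\|^2 = \alpha - 1}\ \frac{\|Ax - b\|^2}{\alpha} + \rho\|Lx\|^2 \right\}$$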

Lower bound

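The lower bound of $G(\alpha)$ over $\alpha \in [\alpha_i, \alpha_{i+1}]$:

$$\min_{\alpha \in [\alpha_i, \alpha_{i+1}]}\ c_1\alpha + \frac{c_2}{\alpha} + c_3,$$

where

$$c_1 = \frac{\alpha_{i+1}\lambda(\alpha_{i+1}) - \alpha_i\lambda(\alpha_i)}{\alpha_{i+1} - \alpha_i},$$

$$c_2 = \alpha_i\alpha_{i+1}\left(c_1 - \frac{G(\alpha_{i+1}) - G(\alpha_i)}{\alpha_{i+1} - \alpha_i}\right),$$

$$c_3 = \frac{\alpha_{i+1}G(\alpha_{i+1}) - \alpha_iG(\alpha_i)}{\alpha_{i+1} - \alpha_i} - c_1(\alpha_{i+1} + \alpha_i).$$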


Computational Complexity

Theorem

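Under some mild assumptions, the new global optimization algorithm requires at most

$$\left\lceil \frac{2U\sqrt{\alpha_{\max}}\,(\alpha_{\max} - \alpha_{\min})}{\alpha_{\min} \cdot \sqrt{\varepsilon}} \right\rceil$$

iterations, where $U = \max_{\alpha \in [\alpha_{\min}, \alpha_{\max}]} \lambda(\alpha) + \alpha\lambda'(\alpha)$ is a well-defined finite number and $\lambda(\alpha)$ is the $\lambda$-solution of the KKT system.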


Numerical results for image deblurring with different noise(n = 1024)

σ      Bisection time   Bisection # iter.   Sawtooth time   Sawtooth # iter.   Ours time   Ours # iter.
0.01   9.46             17.2                477.20          749.6              8.60        14.4
0.03   12.33            21.9                2673.61         4195.1             10.11       16.6
0.05   8.91             16.2                5071.03         7780.0             10.52       17.0
0.08   10.04            18.0                4986.66         7670.6             10.68       17.0
0.1    11.25            19.8                5261.11         7958.7             11.56       18.4
0.3    17.65            29.4                5764.56         8520.1             10.94       17.0
0.5    18.59            30.8                6106.53         9023.9             11.19       17.4


Ball-constrained Sum-of-two-ratios

$$(\mathrm{P})\quad \max\ f(x) = \frac{x^TBx}{x^TWx} + x^TDx \quad \text{s.t.}\quad \|x\| = 1,$$

which is a normalization of (SRQ):

$$(\mathrm{SRQ})\quad \max_{x \ne 0}\ \frac{x^TBx}{x^TWx} + \frac{x^TDx}{x^TVx}.$$


Reformulation of (P)

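$$(\mathrm{P}_\alpha)\quad \max_{\alpha \in [\lambda_1, \lambda_n]}\ \left\{ G(\alpha) := \max_{x^TWx = \alpha,\ x \in S}\ \frac{x^TBx}{\alpha} + x^TDx \right\},$$

where $\lambda_1 = \lambda_{\min}(W)$, $\lambda_n = \lambda_{\max}(W)$.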


Novel overestimation

Theorem

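$$c_1 = \frac{\alpha_{i+1}\nu_1(\alpha_{i+1}) - \alpha_i\nu_1(\alpha_i)}{\alpha_{i+1} - \alpha_i},$$

$$c_2 = \alpha_i\alpha_{i+1}\left(c_1 - \frac{G(\alpha_{i+1}) - G(\alpha_i)}{\alpha_{i+1} - \alpha_i}\right),$$

$$c_3 = \frac{\alpha_{i+1}G(\alpha_{i+1}) - \alpha_iG(\alpha_i)}{\alpha_{i+1} - \alpha_i} - c_1(\alpha_{i+1} + \alpha_i).$$

If $c_1 < 0$, $c_2 < 0$, $\tilde{\alpha} := \sqrt{c_2 / c_1} \in (\alpha_i, \alpha_{i+1})$, then

$$\max_{\alpha \in [\alpha_i, \alpha_{i+1}]} G(\alpha) \le \max_{\alpha \in [\alpha_i, \alpha_{i+1}]} \overline{G}(\alpha) = 2\sqrt{c_1c_2} + c_3.$$

Otherwise, one of the two endpoints is optimal.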


Complexity

Theorem

Suppose $B$, $W$ and $D$ are all diagonal. Then (P) can be solved in $O(n^2)$ time.

Theorem

The total complexity of the new branch-and-bound algorithm for approximately solving $(\mathrm{P}_\alpha)$ is linear-time in terms of $N$, an upper bound for the number of all non-zero entries in $B$, $D$ and $W$.

L. Wang, Y. Xia, A linear-time algorithm for globally maximizing the sum of a generalized Rayleigh quotient and a quadratic form on the unit sphere, SIAM J. Optim. 29(3), 1844-1869, 2019


Numerical results for small-size problems

Table: Numerical comparison among BARON, Algorithm Lipschitz-BB and the new BB.

n   BARON time   Lipschitz-BB time   Lipschitz-BB # iter.   New BB time   New BB # iter.
3   1.07         28.87               140.0                  0.03          43.3
4   5.23         22.19               104.7                  0.03          44.8
5   17.75        23.58               105.2                  0.03          43.8
6   290.36       18.52               85.0                   0.03          44.2
7   1199.74      24.70               114.3                  0.04          44.4
8   -            22.35               100.3                  0.05          44.9


Table: Numerical comparison between Algorithms L-BB and new-BB for solving (P) with η = 1.

n     L-BB time   L-BB # iter.   New-BB time   New-BB # iter.
30    683.88      2704.0         0.22          39.8
50    1034.22     3390.4         0.44          40.4
80    1825.18     3778.2         0.94          40.1
100   -           -              1.88          43.2


Table: Numerical comparison between Algorithms L-BB and new-BB for solving (P) with η = 10.

n     L-BB time   L-BB # iter.   New-BB time   New-BB # iter.
30    31.01       124.4          0.22          42.2
50    53.63       182.1          0.45          42.6
80    77.67       167.2          1.06          44.8
100   137.11      217.3          1.87          44.7
120   171.04      193.1          2.68          44.5
150   270.26      185.7          4.13          45.0
180   528.61      226.8          5.87          45.9
200   667.49      216.6          7.03          45.2
220   874.41      217.4          8.56          46.0
250   1439.84     245.2          11.13         46.1
280   2186.28     263.9          14.02         46.1
300   2941.02     284.2          15.95         46.8
320   -           -              17.90         45.9


General Nonconvex QP

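Nonconvex QP with multiple negative eigenvalues:

$$
\begin{aligned}
& \min_{x \in X}\ x^TQ_+x + q^Tx - \|Cx\|^2 \\
=\ & \min_{t \in \Delta}\ \left\{ g(t) = \min_{x \in X}\ x^TQ_+x + q^Tx - 2t^TCx + \|t\|^2 \right\} \\
\ge\ & \min_{t \in \Delta}\ \min_{y_1 \in \mathbb{R},\ y_2 \in \mathbb{R}^r}\ y_1 + t^Ty_2 + \|t\|^2 \quad \text{s.t.}\ \ y_1 + (t^i)^Ty_2 + \|t^i\|^2 \ge g(t^i),\ i = 1, \dots, r + 1 \\
=\ & \min_{t \in \Delta}\ \max_u\ \sum_{i=1}^{r+1} u_i\left(g(t^i) - \|t^i\|^2\right) + \|t\|^2 \quad \text{s.t.}\ \ \sum_{i=1}^{r+1} u_i = 1;\ \sum_{i=1}^{r+1} u_i t^i = t;\ u \ge 0 \\
=\ & \min_{\alpha \in \Delta}\ \alpha^T\left(g - \mathrm{diag}(R^TR)\right) + \alpha^TR^TR\alpha,
\end{aligned}
$$

where $R = (t^1, \dots, t^{r+1})$ and $g = (g(t^1), \dots, g(t^{r+1}))^T$.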

Conclusions

When solving a nonconvex optimization problem, we have the following steps:

1. Is there any hidden convex structure? If not, go to Step 2.

2. Is it NP-hard?

3. Necessary and/or sufficient conditions for global/local minimizers.

4. Is there a polynomial-time approximation algorithm?

5. If the problem is not far away from convex optimization, global optimization may be a practical choice.


Thank you for your time!

[email protected]

