
Optimization theory and methods: from convex to nonconvex

Yong Xia

School of Mathematical Sciences, Beihang University

May 27, 2020

(Beihang University) 1 / 79

Outline

1 Theory: Hidden Convexity

2 Theory: Optimality Conditions

3 Optimality conditions for local and global solutions

4 Theory: Complexity

5 Methods: Approximation Algorithm

6 Methods: Global Optimization Algorithm

7 Conclusions


Optimization problem

min f(x)

s.t. x ∈ Ω.

Convex Optimization: Ω is a convex set and f : Rn → R is a convex function. Applications: machine learning.

Non-convex Optimization: either Ω or f : Rn → R is non-convex. For example, deep learning.


Non-convex Optimization

Problem 2016 (Y. Xia et al. 2020)

min_{x∈[−1,1]²} 100x1² + x2² + (x1 + x2)/(2.016 + x1 + x2).

Figure: surface plot of the objective function value over [−1, 1]² (axes x1 and x2).

Linear Programming

Ax = b

Ax = b, x ≥ 0.

min cTx

s.t. Ax = b, x ≥ 0.


Duality theory

min cTx

s.t. Ax = b, x ≥ 0,

has the same optimal value as its dual problem (strong duality):

max bT y

s.t. AT y ≤ c.


How to check the infeasibility of the system x : Ax ≤ b, x ≥ 0?

min {cTx : Ax = b, x ≥ 0} = max {t : {x : cTx < t, Ax = b, x ≥ 0} = ∅}.

Classical Farkas Lemma

Theorem

The following two statements are equivalent:
(i) The system Ax ≤ b, x ≥ 0 has no solution.
(ii) There exists a y ≥ 0 such that ATy ≥ 0 and bTy < 0.

J. Farkas, Über die Theorie der einfachen Ungleichungen, Journal für die reine und angewandte Mathematik, 124, 1-27, 1902
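A tiny sanity check of the lemma (a toy system of my own, not from the talk): x ≤ 1 together with −x ≤ −2 (i.e. x ≥ 2) has no solution with x ≥ 0, and y = (1, 1) is a certificate of the form in (ii).

```python
import numpy as np

# Infeasible system Ax <= b, x >= 0:  x <= 1  and  -x <= -2 (i.e. x >= 2).
A = np.array([[1.0], [-1.0]])
b = np.array([1.0, -2.0])

# Farkas certificate: y >= 0 with A^T y >= 0 and b^T y < 0.
y = np.array([1.0, 1.0])
assert np.all(y >= 0)
assert np.all(A.T @ y >= 0)   # A^T y = [0]
assert b @ y < 0              # b^T y = -1
```

By the lemma, exhibiting such a y proves infeasibility without searching for candidate solutions.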


Nonlinear Farkas Lemma

Theorem (Theorem 21.1 in [2])

Let f, g1, . . . , gm : Rn → R be convex functions and let C ⊆ Rn be a convex set. Assume that the Slater condition holds for g1, . . . , gm, i.e., there exists an x ∈ rel int C such that gj(x) < 0, j = 1, . . . , m. The following two statements are equivalent:
(i) The system f(x) < 0, gi(x) ≤ 0, i = 1, . . . , m, x ∈ C is not solvable.
(ii) There exist λi ≥ 0, i = 1, . . . , m, such that

f(x) + ∑_{i=1}^m λi gi(x) ≥ 0 for all x ∈ C.

R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970

From convex to nonconvex: S-lemma

Theorem (Yakubovich 1971)

Let f, g : Rn → R be quadratic functions and suppose that there is an x̄ ∈ Rn such that g(x̄) < 0. Then the following two statements are equivalent:
(i) There is no x ∈ Rn such that f(x) < 0, g(x) ≤ 0.
(ii) There is a number y ≥ 0 such that f(x) + y g(x) ≥ 0, ∀x ∈ Rn.

V.A. Yakubovich, S-procedure in nonlinear control theory, Vestnik Leningrad. Univ., 1, 62-71, 1971 (in Russian)

Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371-418, 2007


Applications of S-lemma

The well-known trust-region subproblem (TRS) [5]:

(TRS) min {xTAx + bTx : xTx ≤ δ},

where δ is a positive scalar. In the case that A ⋡ 0 (A is not positive semidefinite), (TRS) is a nonconvex optimization problem. However, we can show that (TRS) is equivalent to a convex optimization problem in the sense that they share the same optimal solution.

D.M. Gay, Computing optimal locally constrained steps. SIAM J. Sci. Stat. Comput. 2(2), 186-197, 1981


Applications of S-lemma

Strong duality holds for (TRS):

v(TRS) = min_x {xTAx + bTx : xTx ≤ δ}

= sup_t {t : {x : t > xTAx + bTx, xTx ≤ δ} = ∅}

= sup_t {t : ∃λ ≥ 0 : xTAx + bTx − t + λ(xTx − δ) ≥ 0, ∀x ∈ Rn}   (by the S-lemma)

= sup_{t,λ} { t :
[ A + λI    b/2      ]
[ bT/2      −t − λδ  ] ⪰ 0, λ ≥ 0 },

which is a semidefinite program (SDP).
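Small (TRS) instances can also be solved without SDP machinery, using the global optimality condition µ ≥ max{0, −λmin(A)} stated later in the talk. A sketch of my own (it ignores the degenerate "hard case" in which b is orthogonal to the bottom eigenspace of A):

```python
import numpy as np

def solve_trs(A, b, delta):
    """Globally solve min x^T A x + b^T x s.t. x^T x <= delta (easy case).
    Optimality: x = -(1/2)(A + lam*I)^{-1} b with A + lam*I PSD, lam >= 0
    and lam*(x^T x - delta) = 0; bisect on lam for the boundary solution."""
    n = len(b)
    lam_min = np.linalg.eigvalsh(A)[0]
    if lam_min > 0:                      # try the unconstrained minimizer
        x = np.linalg.solve(A, -b / 2.0)
        if x @ x <= delta:
            return x
    x_of = lambda lam: np.linalg.solve(A + lam * np.eye(n), -b / 2.0)
    lo = max(0.0, -lam_min) + 1e-12      # ||x(lam)||^2 decreases in lam
    hi = lo + 1.0
    while x_of(hi) @ x_of(hi) > delta:   # grow until feasible
        hi *= 2.0
    for _ in range(200):                 # bisect for ||x(lam)||^2 = delta
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if x_of(mid) @ x_of(mid) > delta else (lo, mid)
    return x_of(hi)

# a nonconvex instance: A has a negative eigenvalue
x = solve_trs(np.diag([-1.0, 2.0]), np.array([2.0, 0.0]), 1.0)
# the global minimizer is approximately (-1, 0), with multiplier lam = 2
```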

Extensions of S-lemma: S-lemma with equality

Theorem

Let f(x) := xTAx + 2aTx + c, h(x) := xTBx + 2bTx + d, where A, B ∈ Rn×n are symmetric matrices. Suppose there are x′, x′′ ∈ Rn such that h(x′) < 0 < h(x′′). Then, except in the case that A has exactly one negative eigenvalue, B = 0, b ≠ 0 and

[ VTAV            VT(Ax0 + a) ]
[ (x0TA + aT)V    f(x0)       ] ⪰ 0,

where x0 = −(d/(2bTb)) b and V ∈ Rn×(n−1) is a matrix basis of N(b), the system f(x) < 0, h(x) = 0 is unsolvable if and only if there exists a real number µ such that f(x) + µh(x) ≥ 0, ∀x ∈ Rn.

Y. Xia, S. Wang, R.-L. Sheu, S-Lemma with Equality and Its Applications, Mathematical Programming, Ser. A, 156(1-2), 513-547, 2016


Application: GPS localization problem

di ≈ ‖x − ai‖ − r, i = 1, · · · , m,

where x is the user's (unknown) location, r is the (unknown) bias caused by the user clock error, a1, · · · , am are the satellites' known locations, and d1, . . . , dm are measured pseudoranges.

The squared least squares model

min_{x∈Rn, r∈R} ∑_{i=1}^m (‖x − ai‖² − (r + di)²)²

can be reformulated as a QP with an equality constraint:

min_{x∈Rn, r,t∈R} { ∑_{i=1}^m (t − 2aiTx − 2di r + aiTai − di²)² : xTx − r² = t }.

A new application

Total least squares:

min_{x∈Rn} ‖Ax − b‖² / (‖x‖² + 1).

Total least squares with Tikhonov identical regularization:

(TI) min_{x∈Rn} ‖Ax − b‖² / (‖x‖² + 1) + ρ‖x‖².

Non-convexity

A two-dimensional example:

z = 100((x + 2)² + 3y²) / (x² + y² + 1) + x² + y²

Figure: surface plot of z over the (x, y) plane.

Reformulation of (TI)

(TI) min_{x∈Rn} ‖Ax − b‖² / (‖x‖² + 1) + ρ‖x‖²

= min_{x∈Rn} (‖Ax − b‖² + ρ‖x‖⁴ + ρ‖x‖²) / (‖x‖² + 1)

=: min_{x∈Rn} f(x)/g(x)

Proposition

min_{x∈Rn} f(x)/g(x) = λ is equivalent to the equation min_{x∈Rn} f(x) − λg(x) = 0.

(TI): find λ such that min_x f(x) − λg(x) = 0.
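The proposition suggests a Dinkelbach-type iteration: alternately solve the parametric problem min_x f(x) − λg(x) and update λ = f(x)/g(x). A generic sketch on a toy one-dimensional fraction (my own example; `argmin_aux` is an assumed oracle for the parametric subproblem):

```python
import math

def dinkelbach(f, g, argmin_aux, lam=0.0, tol=1e-12, max_iter=100):
    """Find lam with min_x f(x) - lam*g(x) = 0, i.e. lam = min f/g (g > 0):
    solve x = argmin_x f(x) - lam*g(x), then update lam = f(x)/g(x)."""
    for _ in range(max_iter):
        x = argmin_aux(lam)
        new_lam = f(x) / g(x)
        if abs(new_lam - lam) < tol:
            return x, new_lam
        lam = new_lam
    return x, lam

# toy instance: min (x^2 + 1)/(x + 2) over x in [0, 5]
f = lambda x: x * x + 1.0
g = lambda x: x + 2.0
# minimizer of x^2 + 1 - lam*(x + 2) over [0, 5] is lam/2, clipped to the box
argmin_aux = lambda lam: min(max(lam / 2.0, 0.0), 5.0)

x_star, val = dinkelbach(f, g, argmin_aux)   # converges to x* = sqrt(5) - 2
```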

Hidden Convexity of (TI)

min_{x∈Rn} ‖Ax − b‖² / (‖x‖² + 1) + ρ‖x‖²

= min_{x,t} { (‖Ax − b‖² + ρt² + ρt)/(t + 1) : ‖x‖² = t }

= sup { λ : {(x, t) : (‖Ax − b‖² + ρt² + ρt)/(t + 1) < λ, ‖x‖² = t} = ∅ }

= sup { λ : {(x, t) : ‖Ax − b‖² + ρt² + ρt − λ(t + 1) < 0, ‖x‖² = t} = ∅ }

= sup { λ : ∃µ : ‖Ax − b‖² + ρt² + ρt − λ(t + 1) + µ(‖x‖² − t) ≥ 0, ∀x, t }   (S-lemma with equality)

= sup { λ : ∃µ :
[ ATA + µI    0            −ATb       ]
[ 0           ρ            (ρ−λ−µ)/2  ]
[ −bTA        (ρ−λ−µ)/2    bTb − λ    ] ⪰ 0 },

which is a semidefinite program (SDP).

The Simplified SDP

sup { λ :
[ ATA + µI    0            −ATb       ]
[ 0           ρ            (ρ−λ−µ)/2  ]
[ −bTA        (ρ−λ−µ)/2    bTb − λ    ] ⪰ 0 }

⇐⇒

sup { λ :
[ Λ + µI   0   0                                              ]
[ 0        ρ   0                                              ]
[ 0        0   −(ρ−λ−µ)²/(4ρ) + ‖b‖² − λ − b̄T(Λ + µI)⁻¹b̄     ] ⪰ 0 },

where ATA = UΛUT, U is orthogonal and b̄ = −UTATb,

⇐⇒

sup { λ : Λi + µ ≥ 0 for all i, −(ρ−λ−µ)²/(4ρ) + ‖b‖² − λ − b̄T(Λ + µI)⁻¹b̄ ≥ 0 }

Reducing SDP to solving a smooth univariate equation

By eliminating λ, (SDP) reduces to

max T(µ) := −µ − ρ + 2ρ √( µ/ρ − (1/ρ) ∑_{i=1}^n b̄i²/(Λi + µ) + (1/ρ) bTb )

s.t. µ ≥ max_i (−Λi).

T(µ) is a strictly concave function, so the classical Newton's method is workable, with a quadratic convergence rate.
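A minimal sketch of that Newton iteration, run on a stand-in T(µ) = −µ − 1 + 4√µ (a toy function of my own with the same concave shape; the talk's T(µ) would supply dT and d2T from the formula above):

```python
import math

def newton_max(dT, d2T, mu, tol=1e-12, max_iter=100):
    """Newton's method for maximizing a strictly concave smooth T:
    solve T'(mu) = 0 via mu <- mu - T'(mu)/T''(mu) (locally quadratic rate)."""
    for _ in range(max_iter):
        step = dT(mu) / d2T(mu)
        mu -= step
        if abs(step) < tol:
            break
    return mu

# toy strictly concave instance: T(mu) = -mu - 1 + 4*sqrt(mu) on mu > 0
dT = lambda mu: -1.0 + 2.0 / math.sqrt(mu)    # T'(mu)
d2T = lambda mu: -mu ** (-1.5)                # T''(mu) < 0
mu_star = newton_max(dT, d2T, mu=1.0)         # maximizer is mu = 4
```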

Extensions of S-lemma: S-lemma with interval bounds

Theorem

Assume there is an x ∈ Rn such that −∞ < α < h(x) < β < ∞. Then, except in the case that A has exactly one negative eigenvalue, B = 0, b ≠ 0 and there exists a ν ≥ 0 such that

[ VTAV            VTAb/(2bTb)                      VTa                            ]
[ bTAV/(2bTb)     bTAb/(2bTb)² + ν                 aTb/(2bTb) − (ν/2)(α + β − 2d) ]
[ aTV             aTb/(2bTb) − (ν/2)(α + β − 2d)   c + ν(α − d)(β − d)            ] ⪰ 0,

where V ∈ Rn×(n−1) is a matrix basis of N(b), the system

f(x) < 0, α ≤ h(x) ≤ β

is unsolvable if and only if there is a number µ ∈ R such that

f(x) + µ−(h(x) − β) + µ+(α − h(x)) ≥ 0, ∀x ∈ Rn,

where µ+ = max{µ, 0} and µ− = min{µ, 0}.

S. Wang, Y. Xia, Strong duality for generalized trust region subproblem: S-lemma with interval bounds. Optim. Lett. 9, 1063-1073, 2015

Positive Lagrangian duality gap

The identically regularized total least squares problem (Beck et al. 2006):

(ITLS) min { ‖Ax − b‖² / (xTx + 1) : α ≤ xTx ≤ β },

where A ∈ Rm×n (m ≥ n) and 0 ≤ α ≤ β < +∞ is assumed. Our necessary and sufficient condition for strong duality:

Theorem (Yang & Xia, 2020)

The Lagrangian duality gap for (ITLS) is positive if and only if

Ā := ATA − λmin([ ATA  −ATb ; −bTA  bTb ]) · I ≻ 0   and   α − bTA Ā⁻² ATb > 0.

Corollary

Strong Lagrangian duality holds for (ITLS) with α = 0.

Closing Lagrangian duality gap

(G-ITLS) min { f1(x)/f2(x) : α ≤ f3(x) ≤ β },

where fi(x), i = 1, 2, 3, are quadratic functions and f2 > 0.

Theorem

There is no Lagrangian duality gap for the scaled (G-ITLS):

min { f1(x)/f2(x) : α/f2(x) ≤ f3(x)/f2(x) ≤ β/f2(x) }.

Let’s look back

Nonlinear Farkas Lemma: convex;

S-lemma: quadratic nonconvex.


Unifying Farkas and S-lemma

Theorem (U-lemma)

Suppose f(x) and g(x) are two quadratic functions. Let q0(z), . . . , qm(z) be convex functions defined on a convex set Ω ⊆ Rn. Assume that there exist x ∈ Rn and z ∈ relint(Ω) such that g(x) + q1(z) < 0 and qi(z) < 0, i = 2, . . . , m. Then the following two statements are equivalent:

(i) The system

f(x) + q0(z) < 0, g(x) + q1(z) ≤ 0, qi(z) ≤ 0, i = 2, . . . , m, z ∈ Ω

has no solution (x, z).

(ii) There exist µi ≥ 0, i = 1, 2, . . . , m, such that

f(x) + µ1 g(x) + q0(z) + ∑_{i=1}^m µi qi(z) ≥ 0, ∀x ∈ Rn, z ∈ Ω.

Application 1.

Solving the p-regularized subproblems for p > 2:

(p-RS) min_x xTAx + bTx + ‖x‖^p (p > 2)

The p-regularized subproblem (p-RS) is a regularization technique for computing a Newton-like step in unconstrained optimization.
When p = 3: the Nesterov-Polyak subproblem.
When p = 4: the double-well potential minimization problem.

Y. Xia, R.-L. Sheu, Y.-x. Yuan, Theory and application of p-regularized subproblems for p > 2, Optimization Methods & Software, 32(5), 1059-1077, 2017

Application 2.

A common way of seeking a good answer in numerical analysis is to find a method of calculation that minimizes the backward error of the problem. The basic idea of the backward error analysis technique is to show that the solution obtained is the exact solution of a nearby problem, by use of floating-point arithmetic error bounds. One such cost function is

min_x ‖Ax − b‖ / (‖A‖‖x‖ + ‖b‖). (1)

K.E. Schubert, A new look at robust estimation and identification, Ph.D. Dissertation, Univ. of California, Santa Barbara, CA, September 2003.

Trust-region subproblem (TRS)

(TRS) min_{xTx=1} (1/2) xTQx + cTx.

Figure: An example in which a local non-global minimizer exists; the global minimizer, global maximizer, local non-global minimizer and local non-global maximizer are marked.

Problem simplification and the KKT condition

Let Q = UTΛU, Λ = Diag(λ1, . . . , λn), U = [u1, . . . , un], and

(0 >) λ1 ≤ λ2 ≤ . . . ≤ λn.

Introducing y = Ux and d = Uc, we have

v(TE) = min_{yTy=1} f(y) = (1/2) yTΛy + dTy.

We call (y, µ) a KKT point if

(Λ + µIn)y + d = 0,
yTy − 1 = 0,
µ ∈ R.

Optimality conditions for local minimizers

Lemma

(1) (Second-order necessary condition.) If y is a local minimizer and µ is the associated Lagrangian multiplier, then (y, µ) is a KKT point and

vT(Λ + µIn)v ≥ 0, ∀v ∈ Rn such that vTy = 0.

(2) (Second-order sufficient condition.) Suppose (y, µ) is a KKT point and

vT(Λ + µIn)v > 0, ∀v ∈ Rn such that vTy = 0, v ≠ 0.

Then y is a strict local minimizer.

D. G. Luenberger, Linear and Nonlinear Programming, 2nd ed., Addison-Wesley, Reading, MA, 1984


Necessary and sufficient condition for global minimizer

Theorem (Gay 1981; Sorensen 1982; Moré and Sorensen 1983)

y is a global minimizer if and only if (y, µ) is a KKT point and

µ ≥ −λ1(> 0).

D. M. Gay, Computing optimal locally constrained steps. SIAM J. Sci.Stat. Comput. 2(2): 186-197, 1981

D. C. Sorensen, Newton’s method with a model trust region modification.SIAM J. Numer. Anal. 19(2): 409–426, 1982

J. J. Moré and D. C. Sorensen, Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3), 553-572, 1983


Characterizations of local non-global minimizer

Lemma (Martínez, 1994)

If y is a local non-global minimizer, then (y, µ) is a KKT point, λ1 < λ2, and µ ∈ (−λ2, −λ1).

Define the so-called secular function:

ϕ(µ) = ∑_{i=1}^n di²/(λi + µ)² − 1.

µ must be a zero point of ϕ(µ).

J. M. Martínez, Local minimizers of quadratic functions on Euclidean balls and spheres, SIAM J. Optim. 4, 159-176, 1994

Secular function

ϕ(µ) = ∑_{i=1}^n di²/(λi + µ)² − 1,

ϕ′(µ) = −2 ∑_{i=1}^n di²/(λi + µ)³,

ϕ′′(µ) = 6 ∑_{i=1}^n di²/(λi + µ)⁴.

Since ϕ′′(µ) > 0, ϕ(µ) is strictly convex on (−λ2, −λ1).

(d1 ≠ 0) lim_{µ→−λ1} ϕ(µ) = +∞; (d1 = 0) lim_{µ→−λ1} ϕ(µ) = α.

(d2 ≠ 0) lim_{µ→−λ2} ϕ(µ) = +∞; (d2 = 0) lim_{µ→−λ2} ϕ(µ) = β.

Hence ϕ(µ) has at most two zero points in (−λ2, −λ1).
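These properties are easy to check numerically. A sketch on a toy instance (eigenvalues and coefficients of my own choosing), bracketing the two possible zeros of ϕ in (−λ2, −λ1) = (−1, 2) by bisection:

```python
import numpy as np

lam = np.array([-2.0, 1.0, 3.0])   # lambda_1 <= lambda_2 <= lambda_3
d = np.array([0.1, 0.5, 0.5])      # d_1 and d_2 nonzero

def phi(mu):
    return float(np.sum(d**2 / (lam + mu)**2) - 1.0)

def bisect(a, b, iters=200):
    """Bisection for a root of phi in [a, b], assuming a sign change."""
    sign_a = phi(a) > 0
    for _ in range(iters):
        m = 0.5 * (a + b)
        if (phi(m) > 0) == sign_a:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

# phi -> +inf at both endpoints of (-1, 2) and is strictly convex there,
# so it has zero or two roots; here phi(0.5) < 0, giving exactly two.
mu_left = bisect(-1.0 + 1e-9, 0.5)
mu_right = bisect(0.5, 2.0 - 1e-9)
```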

Optimality conditions for local non-global minimizer

Theorem (Martínez, 1994)

(1) If either λ1 = λ2 or d1 = 0, then there is no local non-global minimizer.

(2) There is at most one local non-global minimizer.

(3) (Necessary condition.) If y is a local non-global minimizer, then (y, µ) is a KKT point, µ ∈ (−λ2, −λ1), and

ϕ′(µ) ≥ 0.

(4) (Sufficient condition.) If (y, µ) is a KKT point, µ ∈ (−λ2, −λ1), and

ϕ′(µ) > 0,

then y is the unique local non-global minimizer.


An example that ϕ′(µ) = 0

Figure: A two-dimensional example with ϕ′(µ) = 0 that has no local non-global minimizer/maximizer; the global minimizer, the global maximizer and a second-order KKT point are marked.


Necessary and sufficient condition for local non-global minimizer

Theorem (Wang & Xia 2020)

y is the unique local non-global minimizer if and only if (y, µ) is a KKT point, µ ∈ (−λ2, −λ1) and ϕ′(µ) > 0.

Theorem (Second-order necessary and sufficient condition)

x is a local non-global minimizer of (TE) if and only if there exists a unique Lagrangian multiplier µ such that

(Q + µIn)x + c = 0,
xTx = 1,
vT(Q + µIn)v > 0, ∀v ∈ Rn such that vTx = 0, v ≠ 0,
Q + µIn ⋡ 0.

J. Wang, Y. Xia, Closing the gap between necessary and sufficient conditions for local non-global minimizer of trust region subproblem, SIAM J. Optim., 2020, accepted

Application

With the help of the local non-global minimizer, one can globally solve the extended trust-region subproblem:

min (1/2) xTQx + cTx
s.t. xTx ≤ 1,
     aiTx − bi ≤ 0, i = 1, · · · , m.

NP=P ?


Chebyshev center problem

Finding the smallest ball enclosing the convex set Ω:

minz

maxx∈Ω‖x− z‖2.

Easy cases:

Ω : a given set of finite points.

Ω : intersection of twoellipsoids in the complex domain.
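For the first easy case (a finite point set), even a very simple first-order scheme approximates the Chebyshev center. A Bădoiu–Clarkson-style sketch of my own (not the method of the talk), which repeatedly steps toward the current farthest point:

```python
import numpy as np

def cheb_center_points(pts, iters=20000):
    """Approximate the Chebyshev center min_z max_i ||x_i - z||^2 of a
    finite point set: step toward the farthest point with a diminishing
    step size (a Badoiu-Clarkson-style iteration)."""
    z = pts.mean(axis=0)
    for k in range(1, iters + 1):
        far = pts[np.argmax(np.sum((pts - z)**2, axis=1))]
        z = z + (far - z) / (k + 1.0)   # subgradient-style step on the max
    return z

# the smallest ball enclosing the corners of a square is centered at (1, 1)
z = cheb_center_points(np.array([[0.0, 0.0], [2.0, 0.0],
                                 [0.0, 2.0], [2.0, 2.0]]))
```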


Chebyshev center of intersection of balls

Finding the smallest ball enclosing the intersection of the given balls:

(CCB) min_{z∈Rn} max_{‖x−ai‖≤ri, i=1,2,...,p} ‖x − z‖²,

where ‖ · ‖ denotes the ℓ2-norm.


Applications

Example

Robust estimation. Suppose yk, k = 1, . . . , p, are p measurements of the unknown x with bounded noise, i.e., ‖yk − x‖ ≤ ρ for some ρ > 0. Then a robust recovery of x can be obtained by solving (CCB).

Example

In non-cooperative wireless network positioning, the region for a target to communicate with some reference nodes is an intersection of many balls. The position error is usually bounded by the diameter of the smallest ball enclosing this region. This leads to a direct application of (CCB) with n = 2 or 3.


Known Results

Theorem (Beck 2007)

v(CCB) = v(SDP) as long as p ≤ n− 1.

Theorem (Beck 2009)

v(CCB) = v(SDP) as long as p ≤ n.

No complexity results were known when p > n.


What happens when p = 3 > n = 2?


Our complexity results

Theorem

The complexity of (CCB) when n = 2 is at most O(p²).

Theorem

(CCB) is NP-hard.

Theorem

Suppose either n is fixed or p = n + q with a fixed integer q. Then (CCB) is polynomially solvable, in at most C(p, n) · O(n³) time, where C(p, n) is the binomial coefficient "p choose n".

Y. Xia, M. Yang, S. Wang, Chebyshev Center of the Intersection of Balls: Complexity, Relaxation and Approximation, Mathematical Programming, 2020, https://doi.org/10.1007/s10107-020-01479-0


Chebyshev center of the intersection of two ellipsoids

(CC) min_z max_{x∈Ω} ‖x − z‖²,

where

Ω := {x ∈ Rn : ‖Fix + gi‖² ≤ 1, i = 1, 2},

and Fi ∈ Rmi×n, gi ∈ Rmi.

Beck, A., Eldar, Y.: Regularization in regression with bounded noise: A Chebyshev center approach. SIAM Journal on Matrix Analysis and Applications, 29(2), 606-625, 2007


Application in bounded error estimation

Linear regression model Ax ≈ b, where the input data matrix A isill-conditioned.

Admissible solutions to the linear system:

F = {x ∈ Rn : ‖Lx‖² ≤ η, ‖Ax − b‖² ≤ ρ},

where the regularization constraint ‖Lx‖² ≤ η stabilizes x and ‖Ax − b‖² ≤ ρ is the error bound constraint.

A robust approximation of the true solution x is given by the Chebyshev center of F.

Milanese, M., Vicino, A.: Optimal estimation theory for dynamic systems with set membership uncertainty: an overview. Automatica 27(6), 997-1009, 1991


An Example

Figure: An example in two dimensions, where the input ellipse is plotted as a solid line. The dotted and dashed circles are the Chebyshev solution and the SDP approximation, respectively; the Chebyshev center and the corresponding SDP approximation are marked by ∗ and +, respectively.


Complexity for (CC)

Theorem

For any ε > 0, (CC) can be solved in

O(n⁸ log log u⁻¹) · log( (6R + 4‖zc‖) · σmax(F1)/σ²min(F1) · (1/ε) )

time.

X. Cen, Y. Xia et al., On Chebyshev Center of the Intersection of Two Ellipsoids, WCGO 2019


Weighted Maximin Dispersion Problem

Consider the following weighted maximin dispersion problem

(MaxMin) max_{x∈Ω} min_{i=1,...,m} ωi‖x − xi‖²,

where Ω = {x : ‖x‖p ≤ 1} with p ≥ 2, x1, . . . , xm ∈ Rn are m given points, ωi > 0 for i = 1, . . . , m, and ‖x‖ = √(xTx) is the Euclidean norm.


On the objective function

nonsocial transient behavior

E.C. Kim, Nonsocial Transient Behavior: Social Disengagement on the Greyhound Bus, Symbolic Interaction, 35(3), 267-283, 2012.


An Application

m cities: x1, . . . , xm.

Choosing a location in the region χ for the facility such that the amount of pollution reaching any city is minimized.


Complexity

p = +∞: (MaxMin) is NP-hard (Haines et al., SIAM J. Optim. 2013).

p = 2: (MaxMin) is NP-hard; when m ≤ n − 1, it is polynomially solvable (Wang & Xia, SIAM J. Optim. 2016).

2 < p < +∞. Unknown!

S. Wang and Y. Xia, On the Ball-Constrained Weighted Maximin Dispersion Problem, SIAM J. Optim. 26(3), 1565-1588, 2016


Approximation algorithm for (CCB)

(CCB) min_{z∈Rn} f(z) := max_{‖x−ai‖≤ri, i=1,2,...,p} ‖x − z‖².


Standard quadratic programming relaxation

Using the SDP relaxation for the inner maximization:

(DCC) min_z v(SDP(z)) + ‖z‖²

= min_{y,λ,z} y + ‖z‖²

s.t. [ (−1 + ∑_{i=1}^p λi) In      z − ∑_{i=1}^p λi ai               ]
     [ (z − ∑_{i=1}^p λi ai)T      y + ∑_{i=1}^p λi(‖ai‖² − ri²)     ] ⪰ 0,

     λi ≥ 0, i = 1, . . . , p,

which reduces to the standard quadratic program:

(SQP) min_λ ∑_{i=1}^p λi(ri² − ‖ai‖²) + ‖∑_{i=1}^p λi ai‖²

s.t. ∑_{i=1}^p λi = 1, λi ≥ 0, i = 1, . . . , p.
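Since the quadratic term of (SQP) is ‖∑i λi ai‖² ⪰ 0, the problem is a convex QP over the simplex, so small instances can even be handled by a plain projected-gradient sketch (toy data and step size of my own choosing; not the solver used in the paper):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {lam : lam >= 0, sum(lam) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def solve_sqp(a, r, steps=5000, lr=0.05):
    """Projected gradient for
    min_lam sum_i lam_i (r_i^2 - ||a_i||^2) + ||sum_i lam_i a_i||^2
    over the simplex; returns lam and the candidate center z."""
    c = r**2 - np.sum(a**2, axis=1)            # linear coefficients
    lam = np.full(len(r), 1.0 / len(r))
    for _ in range(steps):
        grad = c + 2.0 * (a @ (a.T @ lam))     # gradient of the objective
        lam = project_simplex(lam - lr * grad)
    return lam, a.T @ lam

# two balls in the plane: centers (-1, 0) and (1, 0), radii 3 and 2
lam, z = solve_sqp(np.array([[-1.0, 0.0], [1.0, 0.0]]), np.array([3.0, 2.0]))
```

On this instance the iteration converges to λ = (0, 1), i.e. the candidate center z = a2 = (1, 0).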


Approximation of the SQP

Theorem (Xia-Yang-Wang 2020)

The candidate solution z = ∑_{i=1}^p λi* ai satisfies

((√2 + γ)/(1 − γ))² · v(CCB) ≥ f(z) ≥ v(CCB),

where γ equals the optimal value of

min_{x∈Rn} max_{i=1,...,p} ‖x − ai‖/ri.

Moreover, letting dmax = max_{i,j=1,...,p} ‖ai − aj‖ and rmin = min_i ri,

γ ≤ √(n/(2(n + 1))) · dmax/rmin < dmax/(√2 rmin).


Weighted Maximin Dispersion Problem

(MaxMin) max_{‖x‖p≤1} f(x) := min_{i=1,...,m} ωi(xTx − 2(xi)Tx + (xi)Txi).

Lifting xxT to a matrix X yields

(SDP) max min_{i=1,...,m} ωi(Tr(X) − 2(xi)Tx + (xi)Txi)

s.t. X11^{p/2} + X22^{p/2} + · · · + Xnn^{p/2} ≤ 1,

[ X    x ]
[ xT   1 ] ⪰ 0.

v(SDP) ≥ v(MaxMin).


General Approximation Algorithm (Haines et al. 2013)

An approximation algorithm for (MaxMin)

1. Input ρ ∈ (0, 1) and xi for i = 1, . . . , m. Let α = √(2 ln(m/ρ)).

2. Solve (SDP) and return the optimal solution Z* ∈ S^{n+1}. Set bi = (√(Z*11) xi1, . . . , √(Z*nn) xin)T for i = 1, . . . , m.

3. Repeatedly generate ξ = (ξ1, . . . , ξn)T with independent ξi taking the values ±1 with equal probability, until (bi)Tξ < α‖bi‖ for i = 1, . . . , m.

4. Output x = (√(Z*11) ξ1/√(Z*n+1,n+1), √(Z*22) ξ2/√(Z*n+1,n+1), . . . , √(Z*nn) ξn/√(Z*n+1,n+1))T.


Approximation Algorithm without SDP Relaxation

A new algorithm for (MaxMin), replacing the SDP-based Steps 2 and 4 of the algorithm above

1. Input ρ ∈ (0, 1) and xi for i = 1, . . . , m. Let α = √(2 ln(m/ρ)).

2. Set bi = xi/n^{1/p} for i = 1, . . . , m (no SDP is solved).

3. Repeatedly generate ξ = (ξ1, . . . , ξn)T with independent ξi taking the values ±1 with equal probability, until (bi)Tξ < α‖bi‖ for i = 1, . . . , m.

4. Output x = n^{−1/p} ξ.


Our New Approximation Bound for (P)

Theorem

For the solution x returned by the new algorithm, we have

v(MaxMin) ≥ f(x) > ((1 − √(2 ln(m/ρ)/n)) / 2) · v(MaxMin).


Figure: Numerical comparison between our new approximation algorithm (Algorithm 2) and the algorithm proposed in [3] (Algorithm 1): the returned objective values plotted against m.


Global Optimization for Biconvex Program

For any fixed x (resp. y), f(x, y) is a convex function of y (resp. x); for example, the bilinear function f(x, y) = xTy.

Many nonconvex programming problems can be reformulated as a biconvex program.

Our algorithm scheme: biconvexify + branch-and-bound.


Nonconvex QP

The "simplest" nonconvex QP is NP-hard:

(NQP) min f(x) = xTQ+x + qTx − (cTx)²
s.t. x ∈ X,

where X is a polytope and Q+ ⪰ 0.

Bi-convex reformulation:

min_{x∈X, t∈R} xTQ+x + qTx − (cTx)² + (t − cTx)²

= min_{x∈X, t∈R} xTQ+x + qTx + t² − 2t cTx.


New Global Algorithm for solving nonconvex QP

min_{t∈[tmin,tmax]} g(t) := min_{x∈X} xTQ+x + qTx − 2t cTx + t²,

where tmin = min_{x∈X} cTx and tmax = max_{x∈X} cTx.

A branch-and-bound algorithm is applied to minimize g(t). The key idea of the algorithm is to estimate a lower bound of g(t) over any interval [t1, t2] ⊆ [tmin, tmax] after calculating the objective values g(t1) and g(t2).


Novel underestimation

min_{t∈[t1,t2]} g(t) = min_{t∈[t1,t2]} min_{x∈X} xTQ+x + qTx − 2t cTx + t²

≥ min_{t∈[t1,t2]} min_{y1,y2∈R} { y1 + t y2 + t² : y1 + t1 y2 + t1² ≥ g(t1), y1 + t2 y2 + t2² ≥ g(t2) }

= min_{t∈[t1,t2]} max_µ { µ1(g(t1) − t1²) + µ2(g(t2) − t2²) + t² : µ1 + µ2 = 1; µ1, µ2 ≥ 0; µ1 t1 + µ2 t2 = t }

= min_{t∈[t1,t2]} t² + bt + d,

where b = (g(t2) − g(t1) + t1² − t2²)/(t2 − t1) and d = (t2(g(t1) − t1²) − t1(g(t2) − t2²))/(t2 − t1). The optimal t* is used for the next subdivision.
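The bound is cheap: two evaluations g(t1), g(t2) give a convex quadratic minorant. A sketch of my own on the toy g(t) = min_{x∈[−1,1]} (x − t)², which is exactly the g(t) above with Q+ = 1, q = 0, c = 1, X = [−1, 1], so g(t) − t² is concave as required:

```python
def under_bound(g, t1, t2):
    """Two-evaluation lower bound for min g(t) on [t1, t2], valid when
    g(t) - t^2 is concave: g(t) >= t^2 + b*t + d, equality at t1, t2."""
    b = (g(t2) - g(t1) + t1**2 - t2**2) / (t2 - t1)
    d = (t2 * (g(t1) - t1**2) - t1 * (g(t2) - t2**2)) / (t2 - t1)
    t_star = min(max(-b / 2.0, t1), t2)   # minimizer of the underestimator
    return t_star, t_star**2 + b * t_star + d

def g(t):
    # min over x in [-1, 1] of (x - t)^2, i.e. squared distance to the box
    return 0.0 if abs(t) <= 1.0 else (abs(t) - 1.0)**2

t_star, lb = under_bound(g, -2.0, 2.0)   # t_star = 0, lb = -3
```

Here the first bound over [−2, 2] is −3, loose compared with min g = 0; the branch-and-bound tightens it by subdividing at t* = 0.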


Numerical Results

Table: Results of B&B, CPLEX, and BARON for problem (P1)

             B&B                        CPLEX               BARON
m / n        #val       time   #iter    #val      time      #val      time
10 / 5       -0.6382    0.06   4.6      -0.6382   0.12      -0.6382   0.23
10 / 20      -3.0354    0.05   3.8      -3.0354   0.17      -3.0354   0.93
20 / 25      -2.9929    0.07   4.8      -2.9929   0.16      -2.9929   1.11
25 / 50      -5.6372    0.06   4.4      -5.6372   0.20      -5.6372   103.69
50 / 100     -10.8368   0.10   4.2      -10.8368  1.18      -10.8368  889.29
100 / 200    -19.8384   0.20   4.0      -19.8384  4.30      -         -
100 / 500    -43.9165   0.51   3.8      -43.9165  10.85     -         -
200 / 1000   -87.6703   1.51   3.8      -87.6703  106.78    -         -
300 / 1500   -133.3308  2.87   3.0      *         *         -         -
400 / 2000   -177.0847  5.84   3.8      *         *         -         -
500 / 2500   -216.0523  10.03  3.8      *         *         -         -
600 / 3000   -264.7072  15.00  3.6      *         *         -         -
700 / 3500   -301.3223  25.68  4.0      *         *         -         -
800 / 4000   -338.7711  34.11  3.8      *         *         -         -
900 / 4500   -387.8831  46.00  3.8      *         *         -         -
1000 / 5000  -439.0814  57.28  3.6      *         *         -         -


Regularized Total Least Squares

min_{x∈Rn} ‖Ax − b‖² / (‖x‖² + 1) + ρ‖Lx‖².

Bi-hidden-convexifying:

min_{α≥1} G(α) := min_{‖x‖²=α−1} ‖Ax − b‖²/α + ρ‖Lx‖²


Lower bound

The lower bound of G(α) over α ∈ [αi, αi+1]:

min_{α∈[αi,αi+1]} c1 α + c2/α + c3,

where

c1 = (αi+1 λ(αi+1) − αi λ(αi)) / (αi+1 − αi),

c2 = αi αi+1 (c1 − (G(αi+1) − G(αi)) / (αi+1 − αi)),

c3 = (αi+1 G(αi+1) − αi G(αi)) / (αi+1 − αi) − c1(αi+1 + αi).


Computational Complexity

Theorem

Under some mild assumptions, the new global optimization algorithm requires at most

⌈ 2U √αmax (αmax − αmin) / (αmin √ε) ⌉

iterations, where U = max_{α∈[αmin,αmax]} λ(α) + αλ′(α) is a well-defined finite number and λ(α) is the λ-solution of the KKT system.


Numerical results for image deblurring with different noise (n = 1024)

        Bisection          Sawtooth              Ours
σ       time    # iter.    time      # iter.    time    # iter.
0.01    9.46    17.2       477.20    749.6      8.60    14.4
0.03    12.33   21.9       2673.61   4195.1     10.11   16.6
0.05    8.91    16.2       5071.03   7780.0     10.52   17.0
0.08    10.04   18.0       4986.66   7670.6     10.68   17.0
0.1     11.25   19.8       5261.11   7958.7     11.56   18.4
0.3     17.65   29.4       5764.56   8520.1     10.94   17.0
0.5     18.59   30.8       6106.53   9023.9     11.19   17.4


Ball-constrained Sum-of-two-ratios

(P) max f(x) = xTBx/(xTWx) + xTDx
s.t. ‖x‖ = 1,

which is a normalization of (SRQ):

(SRQ) max_{x≠0} xTBx/(xTWx) + xTDx/(xTVx).


Reformulation of (P)

(Pα) max_{α∈[λ1,λn]} G(α) := max_{xTWx=α, x∈S} xTBx/α + xTDx,

where λ1 = λmin(W), λn = λmax(W), and S = {x : ‖x‖ = 1}.


Novel overestimation

Theorem

Let

c1 = (αi+1 ν1(αi+1) − αi ν1(αi)) / (αi+1 − αi),

c2 = αi αi+1 (c1 − (G(αi+1) − G(αi)) / (αi+1 − αi)),

c3 = (αi+1 G(αi+1) − αi G(αi)) / (αi+1 − αi) − c1(αi+1 + αi).

If c1 < 0, c2 < 0 and ᾱ := √(c2/c1) ∈ (αi, αi+1), then

max_{α∈[αi,αi+1]} G(α) ≤ max_{α∈[αi,αi+1]} (c1α + c2/α + c3) = 2√(c1c2) + c3.

Otherwise, one of the two endpoints is optimal.


Complexity

Theorem

Suppose B, W and D are all diagonal, (P) can be solved in O(n2) time.

Theorem

The total complexity of the new branch-and-bound algorithm for approximately solving (Pα) is linear in terms of N, an upper bound for the number of all non-zero entries in B, D and W.

L. Wang, Y. Xia, A linear-time algorithm for globally maximizing the sum of a generalized Rayleigh quotient and a quadratic form on the unit sphere, SIAM J. Optim. 29(3), 1844-1869, 2019


Numerical results for small-size problems

Table: Numerical comparison among BARON, Algorithm Lipschitz-BB and thenew BB.

       BARON      Lipschitz-BB           New BB
n      time       time      # iter.     time    # iter.
3      1.07       28.87     140.0       0.03    43.3
4      5.23       22.19     104.7       0.03    44.8
5      17.75      23.58     105.2       0.03    43.8
6      290.36     18.52     85.0        0.03    44.2
7      1199.74    24.70     114.3       0.04    44.4
8      —          22.35     100.3       0.05    44.9


Table: Numerical comparison between Algorithms L-BB and new-BB for solving(P) with η = 1.

       L-BB                   New-BB
n      time       # iter.    time    # iter.
30     683.88     2704.0     0.22    39.8
50     1034.22    3390.4     0.44    40.4
80     1825.18    3778.2     0.94    40.1
100    —          —          1.88    43.2


Table: Numerical comparison between Algorithms L-BB and new-BB for solving(P) with η = 10.

       L-BB                   New-BB
n      time       # iter.    time     # iter.
30     31.01      124.4      0.22     42.2
50     53.63      182.1      0.45     42.6
80     77.67      167.2      1.06     44.8
100    137.11     217.3      1.87     44.7
120    171.04     193.1      2.68     44.5
150    270.26     185.7      4.13     45.0
180    528.61     226.8      5.87     45.9
200    667.49     216.6      7.03     45.2
220    874.41     217.4      8.56     46.0
250    1439.84    245.2      11.13    46.1
280    2186.28    263.9      14.02    46.1
300    2941.02    284.2      15.95    46.8
320    —          —          17.90    45.9


General Nonconvex QP

Nonconvex QP with multiple negative eigenvalues:

min_{x∈X} xTQ+x + qTx − ‖Cx‖²

= min_{t∈∆} g(t) := min_{x∈X} xTQ+x + qTx − 2tTCx + ‖t‖²

≥ min_{t∈∆} min_{y1∈R, y2∈Rr} { y1 + tTy2 + ‖t‖² : y1 + (ti)Ty2 + ‖ti‖² ≥ g(ti), i = 1, · · · , r + 1 }

= min_{t∈∆} max_u { ∑_{i=1}^{r+1} ui(g(ti) − ‖ti‖²) + ‖t‖² : ∑_{i=1}^{r+1} ui = 1; ∑_{i=1}^{r+1} ui ti = t; u ≥ 0 }

= min_{α∈∆} αT(g − diag(RTR)) + αTRTRα,

where R = (t1, · · · , tr+1) and g = (g(t1), · · · , g(tr+1))T.

Conclusions

When solving a nonconvex optimization, we have the following steps:

1. Is there any hidden convex structure? If not, go to Step 2.

2. Is it NP-hard?

3. Establish necessary and/or sufficient conditions for the global/local minimizer.

4. Is there a polynomial-time approximation algorithm?

5. If the problem is not far from convex optimization, global optimization may be a practical choice.


Thank you for your time!

yxia@buaa.edu.cn

