
Page 1

Algorithms for Minimizing Differences of Convex Functions and Applications to Facility Location and Clustering

Mau Nam Nguyen (joint work with D. Giles and R. B. Rector)

Fariborz Maseeh Department of Mathematics and Statistics, Portland State University

Mathematics and Statistics Seminar, WSU Vancouver, April 12, 2017

Page 2

Convex Functions

Definition. Let I ⊂ ℝ be an interval and let f : I → ℝ be a function. The function f is said to be CONVEX on I if
$$f(\lambda x + (1-\lambda)u) \le \lambda f(x) + (1-\lambda)f(u)$$
for all x, u ∈ I and λ ∈ (0, 1). The function f is called CONCAVE if −f is convex.

Figure: Convex Function and Nonconvex Function.

Page 3

Convex Functions

Example. Let I = ℝ = (−∞, ∞) and let f(x) = |x| for x ∈ ℝ. Then f is a convex function. Indeed, for any x, u ∈ ℝ and λ ∈ (0, 1),
$$f(\lambda x + (1-\lambda)u) = |\lambda x + (1-\lambda)u| \le |\lambda x| + |(1-\lambda)u| = \lambda|x| + (1-\lambda)|u| = \lambda f(x) + (1-\lambda)f(u).$$

Figure: Convex Function and Nonconvex Function.

Page 4

Characterizations for Convexity

Theorem. Let f : I → ℝ be a differentiable function, where I is an open interval in ℝ. Then f is convex on I if and only if f′ is monotone increasing on I.

Theorem. Let f : I → ℝ be a twice differentiable function, where I is an open interval in ℝ. Then f is convex on I if and only if f″(x) ≥ 0 for all x ∈ I.

Example. The following functions are convex:
(i) f(x) = ax² + bx + c, x ∈ ℝ, a > 0.
(ii) f(x) = eˣ, x ∈ ℝ.
(iii) f(x) = −ln(x), x ∈ (0, ∞).
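For instance, item (iii) can be verified at once with the second-derivative characterization:
$$f(x) = -\ln(x), \qquad f'(x) = -\frac{1}{x}, \qquad f''(x) = \frac{1}{x^{2}} > 0 \quad \text{for all } x \in (0, \infty).$$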

Page 5

Convex Sets

Definition. A subset Ω of ℝⁿ is CONVEX if [a, b] ⊂ Ω whenever a, b ∈ Ω. Equivalently, Ω is convex if λa + (1 − λ)b ∈ Ω for all a, b ∈ Ω and λ ∈ [0, 1].

Figure: Convex set and nonconvex set.

Page 6

Convex Functions

Definition. Let f : Ω → ℝ be defined on a convex set Ω ⊂ ℝⁿ. The function f is said to be CONVEX on Ω if
$$f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$$
for all x, y ∈ Ω and λ ∈ (0, 1). The function f is called CONCAVE if −f is convex.

Figure: Convex Function and Nonconvex Function.

Page 7

Convex Functions

Theorem

Let fᵢ : ℝⁿ → ℝ be convex functions for all i = 1, …, m. Then the following functions are convex as well:
(i) The multiplication by scalars λf for any λ > 0.
(ii) The sum function Σᵢ₌₁ᵐ fᵢ.
(iii) The maximum function max₁≤ᵢ≤ₘ fᵢ.
Moreover, if f : ℝⁿ → ℝ is convex and ϕ : ℝ → ℝ is convex and nondecreasing, then the composition ϕ ∘ f is convex; and if B : ℝⁿ → ℝᵖ is an affine mapping and f : ℝᵖ → ℝ is a convex function, then the composition f ∘ B is convex.

Page 8

Relationship between Convex Functions and Convex Sets

Theorem. A function f : ℝⁿ → ℝ is convex if and only if its epigraph epi f := {(x, t) ∈ ℝⁿ × ℝ : t ≥ f(x)} is a convex subset of the product space ℝⁿ × ℝ.

Figure: Epigraphs of convex function and nonconvex function.

Page 9

Pierre de Fermat

Pierre de Fermat (1601-1665) was a French lawyer at the Parlement of Toulouse, France, and an amateur mathematician who is given credit for early developments that led to calculus.

Page 10

Fermat-Torricelli Problem

In the early 17th century, at the end of his book Treatise on Minima and Maxima, the French mathematician Fermat (1601-1665) proposed the following problem: given three points in the plane, find a point such that the sum of its distances to the three given points is minimal. In its general form: given a¹, a², …, aᵐ ∈ ℝⁿ, find x ∈ ℝⁿ which minimizes the function
$$f(x) = \sum_{i=1}^{m} \|x - a^{i}\|.$$

Page 11

Weiszfeld’s Algorithm

The gradient of the objective function φ(x) = Σᵢ₌₁ᵐ ‖x − aⁱ‖ is
$$\nabla \varphi(x) = \sum_{i=1}^{m} \frac{x - a^{i}}{\|x - a^{i}\|}, \qquad x \notin \{a^{1}, a^{2}, \dots, a^{m}\}.$$
Solving the equation ∇φ(x) = 0 gives
$$x = \frac{\sum_{i=1}^{m} a^{i}/\|x - a^{i}\|}{\sum_{i=1}^{m} 1/\|x - a^{i}\|} =: F(x).$$
For continuity, define F(x) := x for x ∈ {a¹, a², …, aᵐ}. Weiszfeld's algorithm: choose a starting point x⁰ ∈ ℝⁿ and define
$$x^{k+1} = F(x^{k}) \quad \text{for } k \in \mathbb{N}.$$
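A minimal NumPy sketch of this fixed-point iteration (not from the slides; the anchor points and starting point are made up for the demo):

```python
import numpy as np

def weiszfeld(points, x0, iters=1000, eps=1e-12):
    """Weiszfeld's fixed-point iteration x_{k+1} = F(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)   # distances ||x - a_i||
        if np.any(d < eps):                      # x hit an anchor: F(x) := x
            break
        w = 1.0 / d
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:      # fixed point reached
            return x_new
        x = x_new
    return x

# Three non-collinear points; the limit is the classical Fermat point.
a = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
print(weiszfeld(a, x0=a.mean(axis=0)))
```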

Page 12

Kuhn’s Convergence Theorem

Weiszfeld also claimed that if x⁰ ∉ {a¹, a², …, aᵐ}, where the points aⁱ, i = 1, …, m, are not collinear, then (xᵏ) converges to the unique optimal solution of the problem. A counterexample, together with a corrected statement and a proof of convergence, was given by Kuhn in 1972.

Theorem. Let (xᵏ) be the sequence generated by Weiszfeld's algorithm. Suppose that xᵏ ∉ {a¹, a², …, aᵐ} for all k ≥ 0. Then (xᵏ) converges to the optimal solution of the problem.

Page 13

C¹ Functions

A function f : ℝⁿ → ℝ is called a C¹ function if it has all partial derivatives ∂f/∂xᵢ for i = 1, …, n, and all these partial derivatives are continuous functions. The gradient of f at a point x is denoted by
$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_{1}}(x), \dots, \frac{\partial f}{\partial x_{n}}(x)\Big).$$

Example. Let f(x) = ‖x‖² = Σᵢ₌₁ⁿ xᵢ². Then f is a C¹ function with
$$\frac{\partial f}{\partial x_{i}}(x) = 2x_{i} \quad \text{for } x = (x_{1}, \dots, x_{n}),$$
and ∇f(x) = (2x₁, …, 2xₙ) = 2x.

Page 14

C¹ Convex Functions

Theorem

Let f : ℝⁿ → ℝ be a C¹ function. Then f is convex if and only if
$$f(\bar{x}) + \langle \nabla f(\bar{x}), x - \bar{x} \rangle \le f(x) \quad \text{for all } x, \bar{x} \in \mathbb{R}^{n}.$$

Figure: C1 Convex Functions.

Page 15

Lipschitz Continuous Functions and C1,1 Functions

Definition. A function g : ℝⁿ → ℝᵐ is said to be Lipschitz continuous if there exists a constant ℓ ≥ 0 such that
$$\|g(x) - g(u)\| \le \ell\|x - u\| \quad \text{for all } x, u \in \mathbb{R}^{n}.$$
A C¹ function f : ℝⁿ → ℝ is called a C^{1,1} function if its gradient is Lipschitz continuous, i.e., there exists a constant ℓ > 0 such that
$$\|\nabla f(x) - \nabla f(u)\| \le \ell\|x - u\| \quad \text{for all } x, u \in \mathbb{R}^{n}.$$

Page 16

C1,1 Functions

Theorem

Let f : ℝⁿ → ℝ be a C^{1,1} function whose gradient is Lipschitz continuous with Lipschitz constant ℓ. Then
$$f(x) \le f(\bar{x}) + \langle \nabla f(\bar{x}), x - \bar{x} \rangle + \frac{\ell}{2}\|x - \bar{x}\|^{2} \quad \text{for all } x, \bar{x} \in \mathbb{R}^{n}.$$

Figure: C1 Convex Functions.

Page 17

The Gradient Method

Let f : ℝⁿ → ℝ be a C¹ function and let (αₖ) be a sequence of positive real numbers called the sequence of step sizes. A sequence (xᵏ) is generated as follows:
- Choose x⁰ ∈ ℝⁿ.
- Define x^{k+1} = xᵏ − αₖ∇f(xᵏ) for k ≥ 0.

Given a C¹ function f : ℝⁿ → ℝ and α > 0, define g : ℝⁿ → ℝⁿ by g(x) = x − α∇f(x). Then the gradient method with constant step size α is given by:
- Choose x⁰ ∈ ℝⁿ.
- Define x^{k+1} = xᵏ − α∇f(xᵏ) = g(xᵏ) for k ≥ 0.
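A short sketch of the constant-step scheme (the quadratic test function and step size below are illustrative, not from the slides):

```python
import numpy as np

def gradient_method(grad, x0, alpha, iters=1000):
    """Gradient method with constant step: x_{k+1} = x_k - alpha * grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * grad(x)
    return x

# Example: f(x) = ||x - c||^2 has gradient 2(x - c) and Lipschitz constant l = 2,
# so the step size alpha = 1/l = 0.5 reaches the minimizer c in a single step.
c = np.array([1.0, -2.0])
print(gradient_method(lambda x: 2.0 * (x - c), x0=np.zeros(2), alpha=0.5))
```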

Page 18

The Gradient Method

Theorem

Let f : ℝⁿ → ℝ be a C¹ convex function whose gradient is Lipschitz continuous with constant ℓ > 0. Consider the gradient method with constant step size α = 1/ℓ. Suppose that f has an absolute minimum at x*. Then (f(xᵏ)) is monotone decreasing and
$$0 \le f(x^{k}) - f(x^{*}) \le \frac{\|x^{0} - x^{*}\|^{2}}{2k\alpha} \quad \text{for all } k \ge 1.$$

Page 19

Nesterov’s Accelerated Gradient Method

Let f : ℝⁿ → ℝ be a C¹ convex function whose gradient is Lipschitz continuous with constant ℓ > 0, and let α = 1/ℓ.
- Choose y¹ = x⁰ ∈ ℝⁿ and t₀ = 0.
- Define xᵏ = yᵏ − α∇f(yᵏ) = g(yᵏ) for k ≥ 1.
- Update
$$t_{k} = \frac{1 + \sqrt{1 + 4t_{k-1}^{2}}}{2} \quad \text{for } k \ge 1.$$
- Define
$$y^{k+1} = x^{k} + \frac{t_{k} - 1}{t_{k+1}}\,(x^{k} - x^{k-1}) \quad \text{for } k \ge 1.$$
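A sketch of the accelerated scheme with the indexing above (the quadratic test function is an assumption for the demo, not part of the talk):

```python
import numpy as np

def nesterov_agm(grad, x0, alpha, iters=500):
    """Nesterov's accelerated gradient method with constant step alpha = 1/l."""
    x_prev = np.asarray(x0, dtype=float)    # x^0
    y = x_prev.copy()                       # y^1 = x^0
    t = 0.0                                 # t_0 = 0
    for _ in range(iters):
        x = y - alpha * grad(y)                             # gradient step at y^k
        t_k = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0       # t_k
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t_k**2)) / 2.0  # t_{k+1}
        y = x + (t_k - 1.0) / t_next * (x - x_prev)         # extrapolation step
        x_prev, t = x, t_k
    return x_prev

c = np.array([3.0, 1.0])   # minimize f(x) = ||x - c||^2, so l = 2 and alpha = 0.5
print(nesterov_agm(lambda x: 2.0 * (x - c), x0=np.zeros(2), alpha=0.5))
```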

Page 20

Nesterov's Accelerated Gradient Method

Theorem

Let f : ℝⁿ → ℝ be a C¹ convex function whose gradient is Lipschitz continuous with constant ℓ > 0. Consider Nesterov's accelerated gradient method with constant step size α = 1/ℓ. Suppose that f has an absolute minimum at x*. Then
$$0 \le f(x^{k}) - f(x^{*}) \le \frac{2\|x^{0} - x^{*}\|^{2}}{\alpha (k+1)^{2}} \quad \text{for all } k \ge 1.$$

Page 21

Distance Functions

Given a nonempty set Ω ⊂ ℝⁿ, the distance function associated with Ω is defined by
$$d(x; \Omega) := \inf\{\|x - \omega\| \mid \omega \in \Omega\}.$$
For each x ∈ ℝⁿ, the Euclidean projection from x to Ω is defined by
$$P(x; \Omega) := \{\omega \in \Omega \mid \|x - \omega\| = d(x; \Omega)\}.$$

Figure: Distance function.

Page 22

Squared Distance Functions

Theorem. Let Ω be a nonempty, closed, convex subset of ℝⁿ. Then:
- For each x ∈ ℝⁿ, the Euclidean projection P(x; Ω) is a singleton.
- The projection mapping is nonexpansive:
$$\|P(x_{1}; \Omega) - P(x_{2}; \Omega)\| \le \|x_{1} - x_{2}\| \quad \text{for all } x_{1}, x_{2} \in \mathbb{R}^{n}.$$
In addition, the squared distance function ϕ(x) = [d(x; Ω)]² for x ∈ ℝⁿ is a C^{1,1} function with
$$\nabla \varphi(x) = 2[x - P(x; \Omega)].$$
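For a concrete instance, take Ω = IB, the closed unit ball, whose projection has the closed form P(x; IB) = x / max(1, ‖x‖). The sketch below (illustrative, not from the slides) evaluates the gradient formula numerically:

```python
import numpy as np

def proj_ball(x):
    """Euclidean projection of x onto the closed unit ball IB."""
    return x / max(1.0, np.linalg.norm(x))

def grad_sq_dist(x):
    """Gradient of phi(x) = d(x; IB)^2, namely 2[x - P(x; IB)]."""
    return 2.0 * (x - proj_ball(x))

x = np.array([3.0, 4.0])                       # ||x|| = 5, so d(x; IB) = 4
print(np.linalg.norm(x - proj_ball(x)) ** 2)   # phi(x) = 16.0
print(grad_sq_dist(x))                         # [4.8, 6.4]
```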

Page 23

Nesterov’s Smoothing Technique

Theorem (Nesterov¹). Given any a ∈ ℝⁿ and µ > 0, a Nesterov smoothing approximation of ϕ(x) := ‖x − a‖ has the representation
$$f_{\mu}(x) = \frac{\|x - a\|^{2}}{2\mu} - \frac{\mu}{2}\Big[d\Big(\frac{x - a}{\mu};\, IB\Big)\Big]^{2},$$
where IB is the closed unit ball of ℝⁿ. In addition,
$$\nabla f_{\mu}(x) = P\Big(\frac{x - a}{\mu};\, IB\Big).$$
The gradient ∇fµ is Lipschitz continuous with constant ℓµ = 1/µ.

¹Y. Nesterov: Smooth minimization of non-smooth functions. Math. Program., Ser. A 103 (2005), 127-152.
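A small sketch of the smoothed norm and its gradient; for IB both the distance and the projection are available in closed form (the data below are made up):

```python
import numpy as np

def proj_ball(x):
    return x / max(1.0, np.linalg.norm(x))

def f_mu(x, a, mu):
    """Nesterov smoothing of ||x - a||: quadratic near a, nearly linear far away."""
    z = (x - a) / mu
    return (np.dot(x - a, x - a) / (2.0 * mu)
            - (mu / 2.0) * np.linalg.norm(z - proj_ball(z)) ** 2)

def grad_f_mu(x, a, mu):
    """The gradient is the projection of (x - a)/mu onto the unit ball."""
    return proj_ball((x - a) / mu)

a, mu = np.array([1.0, 0.0]), 0.1
x = np.array([4.0, 4.0])
# Uniform approximation bound: f_mu(x) <= ||x - a|| <= f_mu(x) + mu/2.
print(f_mu(x, a, mu), np.linalg.norm(x - a), f_mu(x, a, mu) + mu / 2.0)
```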

Page 24

Generalized Fermat-Torricelli Problems

Consider the following version of the Fermat-Torricelli problem:
$$\text{minimize } F(x) := \sum_{i=1}^{m} \|x - a^{i}\| \quad \text{subject to } x \in \mathbb{R}^{n}.$$
A smooth approximation of F is given by
$$F_{\mu}(x) = \sum_{i=1}^{m}\Big(\frac{\|x - a^{i}\|^{2}}{2\mu} - \frac{\mu}{2}\Big[d\Big(\frac{x - a^{i}}{\mu};\, IB\Big)\Big]^{2}\Big).$$

Page 25

Generalized Fermat-Torricelli Problems

The function Fµ is continuously differentiable on ℝⁿ with gradient
$$\nabla F_{\mu}(x) = \sum_{i=1}^{m} P\Big(\frac{x - a^{i}}{\mu};\, IB\Big).$$
The gradient is Lipschitz continuous with constant
$$L_{\mu} = \frac{m}{\mu}.$$
Moreover, one has the estimate
$$F_{\mu}(x) \le F(x) \le F_{\mu}(x) + \frac{m\mu}{2} \quad \text{for all } x \in \mathbb{R}^{n}.$$
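Putting the pieces together, a minimal sketch that minimizes F_µ with the constant-step gradient method, step 1/L_µ = µ/m (the data points are made up):

```python
import numpy as np

def proj_ball(x):
    return x / max(1.0, np.linalg.norm(x))

def grad_F_mu(x, pts, mu):
    """Gradient of the smoothed Fermat-Torricelli objective: a sum of projections."""
    return sum(proj_ball((x - a) / mu) for a in pts)

pts = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
mu = 0.01
x = pts.mean(axis=0)            # start at the centroid
alpha = mu / len(pts)           # constant step 1/L_mu with L_mu = m/mu
for _ in range(5000):
    x = x - alpha * grad_F_mu(x, pts, mu)
print(x)                        # approaches the Fermat-Torricelli point
```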

Page 26

Generalized Fermat-Torricelli Problems

To demonstrate the method, let us consider the numerical example below.

Example. The longitude/latitude coordinates in decimal format of US cities are recorded, for example, at http://www.realestate3d.com/gps/uslatlongdegmin.htm. Our goal is to find a point that minimizes the sum of distances to the given points representing the cities. If we consider the Euclidean norm, an approximate optimal value is V* = 23409.33 and a suboptimal solution is x* = (38.63, 97.35). If we consider the ℓ₁-norm, the algorithm presented in this section allows us to find an approximate optimal value V* = 28724.68 and a suboptimal solution x* = (39.48, 97.22).

Page 27

Generalized Fermat-Torricelli Problems

Example. The graphs show the convergence of the algorithms for different norms (function value versus iteration k, for the sum norm, the max norm, and the Euclidean norm).

Figure: A generalized Fermat-Torricelli problem with different norms.

Page 28

Differences of Convex Functions

Consider the problem
$$\text{minimize } f(x) := g(x) - h(x), \quad x \in \mathbb{R}^{n},$$
where g : ℝⁿ → ℝ and h : ℝⁿ → ℝ are convex. We call g − h a DC decomposition of f.

Example: g(x) = x⁴, h(x) = 3x² + x, f = g − h.

Page 29

Examples of DC Programming

Fermat-Torricelli:
$$f(x) = \sum_{i=1}^{m} c_{i}\|x - a^{i}\|$$

Clustering:
$$f(x^{1}, \dots, x^{k}) = \sum_{i=1}^{m} \min_{\ell=1,\dots,k} \|x^{\ell} - a^{i}\|^{2}$$

Multifacility Location:
$$f(x^{1}, \dots, x^{k}) = \sum_{i=1}^{m} \min_{\ell=1,\dots,k} \|x^{\ell} - a^{i}\|$$

Page 30

The DCA for Clustering

Problem formulation¹: Let aⁱ for i = 1, …, m be m data points in ℝⁿ.
$$\text{Minimize } f(x^{1}, \dots, x^{k}) := \sum_{i=1}^{m} \min\{\|x^{\ell} - a^{i}\|^{2} : \ell = 1, \dots, k\} \quad \text{over } x^{\ell} \in \mathbb{R}^{n},\ \ell = 1, \dots, k.$$

¹L.T.H. An, M.T. Belghiti, P.D. Tao: A new efficient algorithm based on DC programming and DCA for clustering. J. Glob. Optim. 37 (2007), 593-608.

Page 31

k-Means Clustering

Let x¹, x², …, xᵐ be the data points and let c¹, …, cᵏ denote the centers.
- Randomly select k cluster centers.
- Assign each data point to the nearest center.
- Compute the average of the data points assigned to each center.
- Repeat from the second step with the new centers obtained in the third step, until the centroids no longer move.

Although k-means clustering is effective in many situations, it also has some disadvantages (see the sketch below):
- The k-means algorithm does not necessarily find the optimal solution.
- The algorithm is sensitive to the initially selected cluster centers.
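A minimal sketch of these steps (a fixed seed stands in for the random initialization; letting an empty cluster keep its old center is one common convention, assumed here):

```python
import numpy as np

def kmeans(data, k, iters=100, seed=0):
    """Plain k-means; the result depends on the randomly selected initial centers."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assign each data point to the nearest center.
        labels = np.linalg.norm(data[:, None] - centers[None], axis=2).argmin(axis=1)
        # Average the points assigned to each center (empty clusters keep theirs).
        new_centers = np.array([data[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):    # centroids no longer move
            break
        centers = new_centers
    return centers, labels

data = np.random.default_rng(1).normal(size=(200, 2))
centers, labels = kmeans(data, k=3)
```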

Page 32

DCA for Clustering and k-Means²

Both DCA1 and DCA2 are better than k-means: the objective values given by DCA1 and DCA2 are much smaller than those computed by k-means. DCA2 is the best among the three algorithms: it provides the best solution in the shortest time. DCA2 is very fast and can therefore handle large-scale problems.
$$f(x^{1}, \dots, x^{k}) = \sum_{i=1}^{m} \min_{\ell=1,\dots,k} \|x^{\ell} - a^{i}\|$$
This is a nonsmooth nonconvex program for which efficient solution algorithms are rare, especially in the large-scale setting.

²L.T.H. An, M.T. Belghiti, P.D. Tao: A new efficient algorithm based on DC programming and DCA for clustering. J. Glob. Optim. 37 (2007), 593-608.

Page 33

Subgradients and Fenchel Conjugates of Convex Functions

Definition. Let f : ℝⁿ → ℝ be a convex function. A subgradient of f at x̄ is any v ∈ ℝⁿ such that
$$\langle v, x - \bar{x} \rangle \le f(x) - f(\bar{x}) \quad \text{for all } x \in \mathbb{R}^{n}.$$
The subdifferential ∂f(x̄) of f at x̄ is the set of all subgradients of f at x̄.

Definition. Let f : ℝⁿ → ℝ be a function. The Fenchel conjugate of f is defined by
$$f^{*}(x) = \sup_{u \in \mathbb{R}^{n}}\{\langle x, u \rangle - f(u)\}, \quad x \in \mathbb{R}^{n}.$$

Page 34

DC Algorithm-The DCA

Consider the problem
$$\text{minimize } f(x) := g(x) - h(x), \quad x \in \mathbb{R}^{n},$$
where g : ℝⁿ → ℝ and h : ℝⁿ → ℝ are convex.

The DCA³:
INPUT: x¹ ∈ ℝⁿ, N ∈ ℕ
for k = 1, …, N do
    Find yᵏ ∈ ∂h(xᵏ)
    Find x^{k+1} ∈ ∂g*(yᵏ)
end for
OUTPUT: x^{N+1}

³P.D. Tao, L.T.H. An: A d.c. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8 (1998), 476-505.

Page 35

An Example of the DCA

minimize f(x) = x⁴ − 3x² − x, x ∈ ℝ.
Then f(x) = g(x) − h(x), where g(x) = x⁴ and h(x) = 3x² + x. We have
$$\partial h(x) = \{6x + 1\}, \qquad \partial g^{*}(y) = \operatorname*{argmin}_{x \in \mathbb{R}}\,(x^{4} - yx) = \Big\{\sqrt[3]{\frac{y}{4}}\Big\},$$
so the DCA iteration reads
$$x_{k+1} = \sqrt[3]{\frac{6x_{k} + 1}{4}}.$$
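This one-dimensional iteration is easy to run; a short sketch (the starting point is arbitrary):

```python
import numpy as np

def dca_step(x):
    """One DCA step for f(x) = x^4 - 3x^2 - x with g = x^4, h = 3x^2 + x."""
    y = 6.0 * x + 1.0            # y_k in dh(x_k)
    return np.cbrt(y / 4.0)      # x_{k+1} in dg*(y_k)

x = 0.5
for _ in range(50):
    x = dca_step(x)
print(x, x**4 - 3 * x**2 - x)    # the limit satisfies f'(x) = 4x^3 - 6x - 1 = 0
```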

Page 36

DC Algorithm-The DCA

Definition. A function h : ℝⁿ → ℝ is called γ-convex (γ ≥ 0) if the function defined by k(x) := h(x) − (γ/2)‖x‖², x ∈ ℝⁿ, is convex. If there exists γ > 0 such that h is γ-convex, then h is called strongly convex with parameter γ.

Theorem. Consider the sequence (xᵏ) generated by the DCA. Suppose that g is γ₁-convex and h is γ₂-convex. Then
$$f(x^{k}) - f(x^{k+1}) \ge \frac{\gamma_{1} + \gamma_{2}}{2}\,\|x^{k+1} - x^{k}\|^{2} \quad \text{for all } k \in \mathbb{N}.$$

Page 37

DC Algorithm-The DCA

Definition. We say that an element x̄ ∈ ℝⁿ is a stationary point of the function f = g − h if ∂g(x̄) ∩ ∂h(x̄) ≠ ∅. In the case where g and h are differentiable, x̄ is a stationary point of f if and only if ∇f(x̄) = ∇g(x̄) − ∇h(x̄) = 0.

Theorem. Consider the sequence (xᵏ) generated by the DCA. Then (f(xᵏ)) is a decreasing sequence. Suppose further that f is bounded from below, and that g is γ₁-convex and h is γ₂-convex with γ₁ + γ₂ > 0. If (xᵏ) is bounded, then every subsequential limit of the sequence (xᵏ) is a stationary point of f.

Page 38

DC Decomposition

$$f(x^{1}, \dots, x^{k}) = \sum_{i=1}^{m} \min_{\ell=1,\dots,k} \|x^{\ell} - a^{i}\|$$
We will utilize the fact that the minimum of k numbers equals their total sum minus the largest sum of k − 1 of them:
$$f(x^{1}, \dots, x^{k}) = \sum_{i=1}^{m}\Big[\sum_{\ell=1}^{k} \|x^{\ell} - a^{i}\| - \max_{r=1,\dots,k} \sum_{\substack{\ell=1 \\ \ell \ne r}}^{k} \|x^{\ell} - a^{i}\|\Big] = \sum_{i=1}^{m}\Big(\sum_{\ell=1}^{k} \|x^{\ell} - a^{i}\|\Big) - \sum_{i=1}^{m} \max_{r=1,\dots,k} \sum_{\substack{\ell=1 \\ \ell \ne r}}^{k} \|x^{\ell} - a^{i}\|.$$

Page 39

DC Decomposition

We obtain the µ-smoothing approximation fµ = gµ − hµ, where
$$g_{\mu}(x^{1}, \dots, x^{k}) = \frac{1}{2\mu} \sum_{i=1}^{m} \sum_{\ell=1}^{k} \|x^{\ell} - a^{i}\|^{2},$$
$$h_{\mu}(x^{1}, \dots, x^{k}) = \frac{\mu}{2} \sum_{i=1}^{m} \sum_{\ell=1}^{k} \Big[d\Big(\frac{x^{\ell} - a^{i}}{\mu};\, IB\Big)\Big]^{2} + \sum_{i=1}^{m} \max_{r=1,\dots,k} \sum_{\substack{\ell=1 \\ \ell \ne r}}^{k} \Big(\frac{1}{2\mu}\|x^{\ell} - a^{i}\|^{2} - \frac{\mu}{2}\Big[d\Big(\frac{x^{\ell} - a^{i}}{\mu};\, IB\Big)\Big]^{2}\Big).$$
To implement the DCA, we need ∂g*µ and ∂hµ…

Page 40

∂gµ

Using the Frobenius norm on the space of matrices, with inner product ⟨A, B⟩ = Σₗ₌₁ᵏ Σⱼ₌₁ⁿ aₗⱼbₗⱼ, we express gµ as
$$G_{\mu}(X) = \frac{m}{2\mu}\|X\|^{2} - \frac{1}{\mu}\langle X, B \rangle + \frac{k}{2\mu}\|A\|^{2},$$
where
- X is the k × n matrix with rows x¹, …, xᵏ,
- A is the m × n matrix with rows a¹, …, aᵐ, and
- B is the k × n matrix whose every row is the sum a¹ + ⋯ + aᵐ.

Then
$$\nabla G_{\mu}(X) = \frac{m}{\mu}X - \frac{1}{\mu}B, \qquad X \in \partial G_{\mu}^{*}(Y) \iff Y \in \partial G_{\mu}(X), \qquad \nabla G_{\mu}^{*}(Y) = \frac{1}{m}(B + \mu Y).$$

Page 41

∂hµ

$$h_{\mu}(x^{1}, \dots, x^{k}) = \underbrace{\frac{\mu}{2} \sum_{i=1}^{m} \sum_{\ell=1}^{k} \Big[d\Big(\frac{x^{\ell} - a^{i}}{\mu};\, IB\Big)\Big]^{2}}_{H_{1}(X)} + \underbrace{\sum_{i=1}^{m} \max_{r=1,\dots,k} \sum_{\substack{\ell=1 \\ \ell \ne r}}^{k} \Big(\frac{1}{2\mu}\|x^{\ell} - a^{i}\|^{2} - \frac{\mu}{2}\Big[d\Big(\frac{x^{\ell} - a^{i}}{\mu};\, IB\Big)\Big]^{2}\Big)}_{H_{2}(X)}$$

Page 42

∂hµ

$$\nabla H_{1}(X) = \begin{pmatrix} \sum_{i=1}^{m}\Big[\dfrac{x^{1} - a^{i}}{\mu} - P\Big(\dfrac{x^{1} - a^{i}}{\mu};\, IB\Big)\Big] \\ \vdots \\ \sum_{i=1}^{m}\Big[\dfrac{x^{k} - a^{i}}{\mu} - P\Big(\dfrac{x^{k} - a^{i}}{\mu};\, IB\Big)\Big] \end{pmatrix}$$
For each i = 1, …, m, there is some Rᵢ such that the Rᵢ-excluded sum is maximal. If we call this sum F_{Rᵢ}, then ∇F_{Rᵢ} is a k × n matrix whose ℓ-th row, for ℓ ≠ Rᵢ, is P((xˡ − aⁱ)/µ; IB), and whose Rᵢ-th row is 0. A subgradient of H₂ is then given by
$$V \in \partial H_{2}(X) \quad \text{for} \quad V = \sum_{i=1}^{m} \nabla F_{R_{i}}.$$

Page 43

Multifacility Location Algorithm

INPUT: X₁ ∈ dom g, N ∈ ℕ
for k = 1, …, N do
    Compute Yₖ = ∇H₁(Xₖ) + Vₖ, with Vₖ ∈ ∂H₂(Xₖ)
    Compute X_{k+1} = (1/m)(B + µYₖ)
end for
OUTPUT: X_{N+1}
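A sketch of this loop in NumPy under the notation above (X holds the k centers as rows; the data and parameters are made up for the demo):

```python
import numpy as np

def proj_ball_rows(Z):
    """Project each row of Z onto the closed unit ball IB."""
    norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1.0)
    return Z / norms

def dca_step(X, A, mu):
    """One step: Y_k = grad H1(X_k) + V_k, then X_{k+1} = (B + mu*Y_k)/m."""
    m = len(A)
    B = np.tile(A.sum(axis=0), (len(X), 1))  # every row is a^1 + ... + a^m
    grad_H1 = np.zeros_like(X)
    V = np.zeros_like(X)
    for a in A:
        Z = (X - a) / mu                     # rows (x_l - a_i)/mu
        P = proj_ball_rows(Z)
        grad_H1 += Z - P
        # The R_i-excluded sum is maximal when the smallest smoothed term,
        # i.e. the center closest to a_i, is excluded.
        terms = (mu / 2.0) * (np.linalg.norm(Z, axis=1) ** 2
                              - np.linalg.norm(Z - P, axis=1) ** 2)
        r = terms.argmin()
        F_r = P.copy()
        F_r[r] = 0.0                         # row R_i of grad F_{R_i} is 0
        V += F_r
    return (B + mu * (grad_H1 + V)) / m

A = np.random.default_rng(1).normal(size=(40, 2))   # made-up data points
X = A[:3].copy()                                    # three initial centers
for _ in range(200):
    X = dca_step(X, A, mu=0.05)
print(X)
```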

Page 44

Clustering

Page 45

Clustering

Page 46

Clustering

Page 47

Results

The plots show the objective value f(Xₖ) versus iteration k for five test problems: Fermat-Torricelli in ℝ¹⁰ (Algorithm 4); Eight Rings in ℝ² (Algorithm 5); Multifacility in ℝ² (Algorithm 5); Multifacility in ℝ¹⁰ (Algorithm 5); and Multifacility Sets in ℝ² (Algorithm 7).

Page 48

References

[1] An, Belghiti, Tao: A new efficient algorithm based on DC programming and DCA for clustering. J. Glob. Optim. 37 (2007), 593-608.

[2] Nam, Rector, Giles: Minimizing differences of convex functions and applications to facility location and clustering. arXiv:1511.07595 (2015).

[3] Nesterov: Smooth minimization of non-smooth functions. Math. Program., Ser. A 103 (2005), 127-152.

[4] Rockafellar: Convex Analysis. Princeton University Press, Princeton, NJ, 1970.