majorization-minimization algorithm€¦ · distributed algorithm for nonlinear programming...

67
Majorization-Minimization Algorithm Ying Sun and Prof. Daniel P. Palomar ELEC5470/IEDA6100A - Convex Optimization The Hong Kong University of Science and Technology (HKUST) Fall 2019-20

Upload: others

Post on 25-Aug-2020

33 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Majorization-Minimization Algorithm

Ying Sun and Prof. Daniel P. Palomar

ELEC5470/IEDA6100A - Convex OptimizationThe Hong Kong University of Science and Technology (HKUST)

Fall 2019-20

Page 2: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Acknowledgment

Slides of this lecture are majorly based on the following works:[Hun-Lan’J04] D. R. Hunter, and K. Lange, "A Tutorial on MMAlgorithms", Amer. Statistician, pp. 30-37, 2004.[Raz-Hon-Luo’J13] M. Razaviyayn, M. Hong, and Z. Luo, "A UnifiedConvergence Analysis of Block Successive Minimization Methods forNonsmooth Optimization", SIAM J. Optim., pp. 1126-1153, 2013.[Scu-Fac-Son-Pal-Pan’J14] G. Scutari, F. Facchinei, Peiran Song,D.P. Palomar, and Jong-Shi Pang, "Decomposition by PartialLinearization: Parallel Optimization of Multi-Agent Systems", IEEETrans. Signal Process., vol. 62, no. 3, pp. 641-656, Feb. 2014.

[Sun-Bab-Pal’J17] Y. Sun, P. Babu, and D. P. Palomar,“Majorization-Minimization Algorithms in Signal Processing,Communications, and Machine Learning”, IEEE Trans. SignalProcess, vol. 65, no. 3, pp. 794-816, Feb. 2017.

Page 3: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 4: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 5: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Problem Statement

Consider the following optimization problem

minimizex

f (x)

subject to x ∈X ,

with X being a closed convex set and f (x) being continuous.f (x) is too complicated to manipulate.

Idea: successively minimize an approximating function u(x,xk

)xk+1 = arg min

x∈Xu(x,xk

),

hoping the sequence of minimizers{xk}will converge to

optimal x?.

Question: how to construct u(x,xk

)?

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 5 / 65

Page 6: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Problem Statement

Consider the following optimization problem

minimizex

f (x)

subject to x ∈X ,

with X being a closed convex set and f (x) being continuous.f (x) is too complicated to manipulate.

Idea: successively minimize an approximating function u(x,xk

)xk+1 = arg min

x∈Xu(x,xk

),

hoping the sequence of minimizers{xk}will converge to

optimal x?.

Question: how to construct u(x,xk

)?

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 5 / 65

Page 7: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Problem Statement

Consider the following optimization problem

minimizex

f (x)

subject to x ∈X ,

with X being a closed convex set and f (x) being continuous.f (x) is too complicated to manipulate.

Idea: successively minimize an approximating function u(x,xk

)xk+1 = arg min

x∈Xu(x,xk

),

hoping the sequence of minimizers{xk}will converge to

optimal x?.

Question: how to construct u(x,xk

)?

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 5 / 65

Page 8: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Terminology

Distance from a point to a set:

d (x,S ) = infs∈S‖x− s‖ .

Directional derivative:

f ′ (x;d) , liminfλ↓0

f (x+ λd)− f (x)

λ.

Stationary point: x is a stationary point if

f ′ (x;d)≥ 0, ∀d such that x+d ∈X .

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 6 / 65

Page 9: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Majorization-Minimization

Construction rule:

u (y,y) = f (y) , ∀y ∈X (A1)u (x,y)≥ f (x) , ∀x,y ∈X (A2)u′ (x,y;d)

∣∣x=y = f ′ (y;d) , ∀d with y +d ∈X (A3)

u (x,y) is continuous in x and y (A4)

Pictorially:

f (x)

xk

u(x,xk

)

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 7 / 65

Page 10: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Algorithm

Majorization-Minimization (Successive Upper-Bound Minimization):1: Find a feasible point x0 ∈X and set k = 02: repeat3: xk+1 = arg minx∈X u

(x,xk

)(global minimum)

4: k ← k +15: until some convergence criterion is met

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 8 / 65

Page 11: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Convergence

Under assumptions A1-A4, every limit point of the sequence{xk}is a stationary point of the original problem.

If further assume that the level set X 0 ={x|f (x)≤ f

(x0)}

iscompact, then

limk→∞

d(xk ,X ?

)= 0,

where X ? is the set of stationary points.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 9 / 65

Page 12: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 13: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Construct Surrogate Function

The performance of Majorization-Minimization algorithmdepends crucially on the surrogate function u

(x,xk

).

Guideline: the global minimizer of u(x,xk

)should be easy to

find.

Suppose f (x) = f1 (x) + κ (x), where f1 (x) is some “nice”function and κ (x) is the one needed to be approximated.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 11 / 65

Page 14: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Construction by Convexity

Suppose κ (t) is convex, then

κ

(∑i

αi ti

)≤∑

i

αiκ (ti )

with αi ≥ 0 and ∑αi = 1.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 12 / 65

Page 15: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

For example:

κ

(wTx

)= κ

(wT(x−xk

)+wTxk

)= κ

(∑i

αi

(wi

(xi −xki

)αi

+wTxk))

≤∑i

αiκ

(wi

(xi −xki

)αi

+wTxk)

If further assume that w and x are positive (αi = wixki /w

Txk):

κ

(wTx

)≤∑

i

wixki

wTxkκ

(wTxk

xkixi

)The surrogate functions are separable (parallel algorithm).

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 13 / 65

Page 16: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Construction by Taylor Expansion

Suppose κ (x) is concave and differentiable, then

κ (x)≤ κ

(xk)

+ ∇κ

(xk)(

x−xk),

which is a linear upper-bound.Suppose κ (x) is convex and twice differentiable, then

κ (x)≤ κ

(xk)

+∇κ

(xk)T (

x−xk)

+12

(x−xk

)TM(x−xk

)if M−∇2κ (x)� 0, ∀x.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 14 / 65

Page 17: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Construction by Inequalities

Arithmetic-Geometric Mean Inequality:(n

∏i=1

xi

)1/n

≤ 1n

n

∑i=1

xi

Cauchy-Schwartz Inequality:

‖x‖ ≥ xTxk

‖xk‖

Jensen’s Inequality:

κ (Ex)≤ Eκ (x)

with κ (·) being convex.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 15 / 65

Page 18: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 19: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

EM Algorithm

Assume the complete data set {x,z} consists of observedvariable x and latent variable z.Objective: estimate parameter θ ∈Θ from x.Maximum likelihood estimator: θ = arg minθ∈Θ− logp (x|θ)

EM (Expectation Maximization) algorithm:

E-step: evaluate p(z|x,θk

)“guess” z from current estimate of θ

M-step: update θ as θk+1 = arg minθ∈Θ u(θ ,θk

), where

u(

θ ,θk)

=−Ez|x,θk logp (x,z|θ)

update θ from “guessed” complete data set

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 17 / 65

Page 20: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

An MM Interpretation of EM

The objective function can be written as

− logp (x|θ)

=− log Ez|θp (x|z,θ)

=− log Ez|θ

(p(z|x,θk

)p (x|z,θ)

p (z|x,θk)

)

=− log Ez|x,θk

(p (x|z,θ)

p (z|x,θk)p (z|θ)

)≤−Ez|x,θk log

(p (x|z,θ)

p (z|x,θk)p (z|θ)

)(Jensen’s Inequality)

=−Ez|x,θk logp (x,z|θ)︸ ︷︷ ︸u(θ ,θk)

+ Ez|x,θkp(z|x,θk

)

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 18 / 65

Page 21: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Proximal Minimization

f (x) is convex. Solve minx f (x) by solving the equivalentproblem

minimizex∈X ,y∈X

f (x) + 12c ‖x−y‖2 .

Objective function is strongly convex in both x and y.Algorithm:

xk+1 = arg minx∈X

{f (x) +

12c

∥∥∥x−yk∥∥∥2}

yk+1 = xk+1.

An MM interpretation:

xk+1 = arg minx∈X

{f (x) +

12c

∥∥∥x−xk∥∥∥2}

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 19 / 65

Page 22: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

DC Programming

Consider the unconstrained problem

minimizex∈Rn

f (x) ,

where f (x) = g (x) +h (x) with g (x) convex and h (x) concave.DC (Difference of Convex) Programming generates

{xk}by

solving∇g(xk+1

)=−∇h

(xk).

An MM interpretation:

xk+1 = arg minx

{g (x) + ∇h

(xk)T (

x−xk)}

.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 20 / 65

Page 23: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 24: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Power Control by GP

[Chi-Tan-Pal-O’Ne-Jul’J07] Problem: maximize systemthroughput. Essentially we need to solve the following problem:

minimizeP∈P

∑j 6=i GijPj+ni∑j GijPj+ni

Objective function is the ratio of two posynomials.Minorize a posynomial, denoted by g (x) = ∑i mi (x), bymononial:

g (x)≥∏i

(mi (x)

αi

)αi

where αi =mi(xk)g(xk)

. (Arithmetic-Geometric Mean Inequality)

Solution: approximate the denominator posynomial∑j GijPj +ni by monomial.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 22 / 65

Page 25: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Reweighted `1-norm

Sparsity signal recovery problem

minimizex

‖x‖0subject to Ax = b

`1-norm approximation

minimizex

‖x‖1subject to Ax = b

General formminimize

x∑ni=1 φ (|xi |)

subject to Ax = b

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 23 / 65

Page 26: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

[Can-Wak-Boy’J08] Assume φ (t) is concave nondecreasing, atxki , φ (|xi |) is majorized by wk

i |xi | with wki = φ ′ (t)|t=|xi |.

At each iteration a weighted `1-norm is solved

minimizex

∑wki |xi |

subject to Ax = b

0x

x

1x

1

log 1x

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 24 / 65

Page 27: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Sparse Generalized Eigenvalue Problem

`0-norm regularized generalized eigenvalue problem

maximizex

xTAx−ρ ‖x‖0subject to xTBx = 1.

Replace ‖xi‖0 by some nicely behaved function gp (xi )

|xi |p, 0< p ≤ 1log (1+ |xi |/p)/ log (1+1/p), p > 01− e−|xi |/p, p > 0.

Take gp (xi ) = |xi |p for example.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 25 / 65

Page 28: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

[Son-Bab-Pal’J15a] Majorize gp (xi ) at xki by quadratic functionwki x

2i + cki .

The surrogate function for gp (xi ) = |xi |p is defined as

u(xi ,x

ki

)=

p

2

∣∣∣xki ∣∣∣p−2 x2i +(1− p

2

)∣∣∣xki ∣∣∣p .Solve at each iteration the following GEVP:

maximizex

xTAx−ρxTdiag(wk)x

subject to xTBx = 1

However, as |xi | → 0, wi →+∞...

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 26 / 65

Page 29: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Smooth approximation of gp (x):

g εp (x) =

{ p2εp−2x2, |x | ≤ ε

|x |p−(1− p

2

)εp, |x |> ε

When |x | ≤ ε , w remains to be a constant.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 27 / 65

Page 30: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

0 10 20 30 40 50 60

0

0.5

1

1.5

2

2.5

3

iterations

obje

ctive v

alu

e

IRQM (proposed)

DC−SGEP

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 28 / 65

Page 31: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Sequence Design

Complex unimodular sequence {xn ∈ C}Nn=1.Autocorrelation: rk = ∑

Nn=k+1 xnx

∗n−k = r∗−k , k = 0, . . . ,N−1.

Integrated sidelobe level (ISL):

ISL =N−1

∑k=1|rk |2 .

Problem formulation:

minimize{xn}Nn=1

ISL

subject to |xn|= 1, n = 1, . . . ,N.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 29 / 65

Page 32: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

By Fourier transform:

ISL ∝

2N

∑p=1

[∣∣∣aHp x∣∣∣2−N

]2with x = [x1, . . . ,xN ]T , ap =

[1,e jωp , . . . ,e jωp(N−1)

]Tand

ωp = 2π

2N (p−1).Equivalent problem:

minimizex

∑2Np=1(aHp xxHap

)2subject to |xn|= 1, ∀n.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 30 / 65

Page 33: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

[Son-Bab-Pal’J15b] Define A = [a1, . . . ,a2N ],

pk =[∣∣aH1 xk

∣∣2 , . . . , ∣∣aH2Nxk∣∣2]T , A = A

(diag

(pk)−pkmaxI

)AH .

Quadratic surrogate function:

������

�:const.pkmaxx

HAAHx +2Re(

xH(

A−2N2xk(xk)H)

xk)

Equivalent tominimize

x‖x−y‖2

subject to |xn|= 1, ∀n

with

y =−(

A−2N2xk(xk)H)

xk

Closed-form solution: xn = e j arg(yn).

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 31 / 65

Page 34: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

0 20 40 60 80 10030

35

40

45

50

55

iteration

Inte

gra

ted s

idelo

be level (d

B)

benchmark

proposed method

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 32 / 65

Page 35: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

Covariance Estimation

xi ∼ elliptical(0,Σ)

Fitting normalized sample si = xi‖xi‖2

to Angular CentralGaussian distribution

f (si ) ∝ det(Σ)−1/2(sTi Σ

−1si)−K/2

[Sun-Bab-Pal’J14] Shrinkage penalty

h (Σ) = log det(Σ) +Tr(Σ−1T

)Solve the following problem:

minimizeΣ

log det(Σ) + KN ∑ log

(xTi Σ

−1xi)

+ αh (Σ)

subject to Σ� 0

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 33 / 65

Page 36: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

At Σk , the objective function is majorized by

(1+ α) log det(Σ) +K

N

N

∑i=1

xTi Σ−1xi

xTi(Σk)−1

xi+ αTr

(Σ−1T

)

Surrogate function is convex in Σ−1.Setting the gradient to zero leads to the weighted sampleaverage

Σk+1 =1

1+ α

K

N ∑xixTi

xTi(Σk)−1

xi+

α

1+ αT

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 34 / 65

Page 37: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionConstruction TechniquesExample AlgorithmsApplications

1 6 11 168

10

12

14

16

18

20

Iteration

Obj

ectiv

e F

unct

ion

Val

ue

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 35 / 65

Page 38: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 39: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Problem Statement

Consider the following problem

minimizex∈X

f (x)

Set X possesses Cartesian product structure X = ∏mi=1Xi .

Observation: the problem

minimizexi∈Xi

f(x01, . . . ,x

0i−1,xi ,x

0i+1, . . . ,x

0m

)with x0−i taking some feasible value, is easy to solve.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 37 / 65

Page 40: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 41: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Block Coordinate Descent (BCD)

Denote x , (x1, . . . ,xm),f(x01, . . . ,x

0i−1,xi ,x

0i+1, . . . ,x

0m

), f

(xi ,x0

)Block Coordinate Descent (nonlinear Gauss-Seidel)1: Initialize x0 ∈X and set k = 0.2: repeat3: k = k +1, i = (k mod n) +14: xki = arg minxi∈Xi

f(xi ,xk−1

)5: xki ← xk−1i , ∀k 6= i6: until some convergence criterion is met

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 39 / 65

Page 42: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Convergence

[Ber’B99] Assume that

f (x) is continuously differentiable over the set X .xki = arg minxi∈Xi

f(xi ,xk−1

)has a unique solution.

Then every limit point of the sequence{xk}is a stationary

point.[Gri-Sci’J00] Generalizations

globally convergent for m = 2.f is component-wise strictly quasi-convex w.r.t. m−2components.f is pseudo-convex.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 40 / 65

Page 43: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 44: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

BS-MM Algorithm

Combination of MM and BCDBlock Succesive Majorization-Minimization (BS-MM):1: Initialize x0 ∈X and set k = 0.2: repeat3: k = k +1, i = (k mod n) +14: X k = arg minxi∈Xi

ui(xi ,xk−1

)5: Set xki to be an arbitrary element in X k

6: xki ← xk−1i , ∀k 6= i7: until some convergence criterion is met

Generalization of BCD

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 42 / 65

Page 45: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Convergence

Surrogate function ui (·, ·) satisfies the following assumptions

ui (yi ,y) = f (y) , ∀y ∈X ,∀i (B1)ui (xi ,y)≥ f (y1, . . . ,yi−1,xi ,yi+1, . . . ,yn) ,

∀xi ∈Xi ,∀y ∈X ,∀i (B2)u′i (xi ,y;di )

∣∣xi=yi

= f ′ (y;d) ,

∀d = (0, . . . ,di , . . . ,0) such that yi +di ∈Xi ,∀i (B3)ui (xi ,y) is continuous in (xi ,y) , ∀i (B4)

In short, ui(xi ,xk

)majorizes f (x) on the ith block.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 43 / 65

Page 46: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Under assumptions B1-B4, for simplicity additionally assumethat f is continuously differentiable,

ui (xi ,y) is quasi-convex in xi ,each subproblem minxi∈Xi

ui(xi ,xk−1

)has a unique solution

for any xk−1 ∈X ,then every limit point of

{xk}is a stationary point.

level set X 0 ={x|f (x)≤ f

(x0)}

is compact,each subproblem minxi∈Xi

ui(xi ,xk−1

)has a unique solution

for any xk−1 ∈X for at least m−1 blocks,then limk→∞ d

(xk ,X ?

)= 0.

More restrictive assumption than MM due to the cyclic updatebehavior.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 44 / 65

Page 47: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 48: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Alternating Proximal Minimization

Consider the problem

minimizex

f (x1, . . . ,xm)

subject to xi ∈Xi ,

with f (·) being convex in each block.The convergence of BCD is not easy to establish since eachsubproblem may have multiple solutions.Alternating Proximal Minimization solves

minimizexi

f(xk1 , . . . ,x

ki−1,xi ,x

ki+1, . . . ,x

km

)+ 1

2c

∥∥xi −xki∥∥2

subject to xi ∈Xi

Strictly convex objective→unique minimizer

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 46 / 65

Page 49: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Proximal Splitting Algorithm

Consider the following problem

minimizex

∑mi=1 fi (xi ) + fm+1 (x1, . . . ,xm)

subject to xi ∈Xi , i = 1, . . . ,m

with fi convex and lower semicontinuous, fm+1 convex and

‖∇fm+1 (x)−∇fm+1 (y)‖ ≤ βi ‖xi −yi‖

Cyclically update:

xk+1i = proxγfi

(xki − γ∇xi fm+1

(xk))

,

with the proximity operator defined as

proxf (x) = arg miny∈X

f (y) +12‖x−y‖2 .

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 47 / 65

Page 50: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Proximal Splitting Algorithm

BS-MM interpretation:

ui

(xi ,xk

)= fi (xi ) +

12γ

∥∥∥xi −xki∥∥∥2

+ ∇xi fm+1

(xk)T (

xi −xki)

+ ∑j 6=i

fj

(xkj)

+ fm+1

(xk−i ,xi

).

Check:

fm+1

(xk)

+12γ

∥∥∥xi −xki∥∥∥2

+ ∇xi fm+1

(xk)T (

xi −xki)

≥fm+1

(xk)

+βi

2

∥∥∥xi −xki∥∥∥2

+ ∇xi fm+1

(xk)T (

xi −xki)

≥fm+1

(xk−i ,xi

)(Descent lemma)

with γ ∈ [εi ,2/βi − εi ] and εi ∈ (0,min{1,1/βi}).Ying Sun and Prof. Daniel P. Palomar MM Algorithm 48 / 65

Page 51: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 52: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

Robust Estimation of Location and Scatter

xi ∼ elliptical(µ,R)

[Sun-Bab-Pal’J15] Fitting xi to a Cauchy distribution with pdf

f (x) ∝ det(R)−1/2(1+ (xi −µ)T R−1 (xi −µ)

)−(K+1)/2

Shrinkage penalty

h (t,T) =K log(Tr(R−1T

))+log det(R)+log

(1+ (t−µ)T R−1 (t−µ)

)Solve the following problem:

minimizeµ,R�0

log det(R) + K+1N ∑

Ni=1 log

(1+ (xi −µ)T R−1 (xi −µ)

)+αh (t,T)

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 50 / 65

Page 53: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

BS-MM Algorithm update:

µt+1 =(K +1)∑

Ni=1wi (µt ,Rt)xi +Nαwt (µt ,Rt)t

(K +1)∑Ni=1wi (µt ,Rt)+Nαwt (µt ,Rt)

Rt+1 =K +1N+Nα

N

∑i=1

wi

(µt+1,Rt

)(xi −µt+1

)(xi −µt+1

)T+

N+Nαwt(µt+1,Rt

)(µt+1− t

)(µt+1− t

)T+

NαK

N+Nα

T

Tr(R−1t T

)where

wi (µ,R) =1

1+(xi −µ)T R−1 (xi −µ)

wt (µ,R) =1

1+(t−µ)T R−1 (t−µ).

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 51 / 65

Page 54: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

IntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

1 2 3 4 5 6 7 8 9 10 11 12 13 14 152000

2500

3000

3500

4000

4500

5000

Iterations

Obj

ectiv

e fu

nctio

n va

lue

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 52 / 65

Page 55: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 56: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Problem Statement

Consider the following problem:

minimizex

f (x)

subject to xi ∈Xi

where the Xi ’s are closed and convex sets,f (x) = ∑

Ll=1 fl (x1, . . . ,xm).

Conditional gradient update (Frank-Wolfe):

xk+1 = xk + γkdk

direction dk , xk −xk with

xki = arg minxi∈Xi

∇xi f(xk)T (

xi −xki)

step-size γk ∈ (0,1], chosen to guarantee convergence.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 54 / 65

Page 57: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Exact Jacobi SCA Algorithm

Idea:

Conditional gradient update linearize all the fl ’s at xk .Each function fl might be convex w.r.t. some block xi .We want to preserve the convex property of fl

(xi ,xk−i

).

Solution: keep the convex fl(xi ,xk−i

)’s and linearize the others.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 55 / 65

Page 58: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Define Ci as the set of indices of l such that fl(xi ,xk−i

)is

convex.Approximate f (x) on the ith block at point xk :

fi

(xi ,xk

)= ∑

l∈Ci

fl

(xi ,xk−i

)︸ ︷︷ ︸

convex terms

+π i

(xk)T (

xi −xki)

︸ ︷︷ ︸linear approx.

non-convex terms

+τi

2

(xi −xki

)THi

(xk)(

xi −xki)

︸ ︷︷ ︸proximal term

,

withπ i

(xk)= ∑

l /∈Ci

∇xi fl

(xk)

and Hi

(xk)� cHi

I.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 56 / 65

Page 59: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Exact Jacobi SCA update:

xi(xk ,τi

)= arg min

xi∈Xi

fi

(xi ,xk

)xk+1 = xk + γ

k(x−xk

)Step-size rule

constant step-size that depends on the Lipschitz constant of∇fdiminishing step-size

Remark: update of the blocks can be done sequentially(Gauss-Seidel SCA Algorithm)

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 57 / 65

Page 60: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Outline

1 The Majorization-Minimization AlgorithmIntroductionConstruction TechniquesExample AlgorithmsApplications

2 Block Successive Majorization-MinimizationIntroductionBlock Coordinate DescentBlock Successive Majorization-MinimizationExample AlgorithmsApplications

3 Distributed Algorithm for Nonlinear ProgrammingExact Jacobi Successive Convex ApproximationExtensions

Page 61: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Extensions

FLEXA

non-smooth objective functioninexact update directionflexible block update choice

HyFLEXA

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 59 / 65

Page 62: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Comparison

BS-MM FLEXAconvergence stationary point stationary point

objective functioncontinuous continuous

may not be smooth may not be smoothconstraint set Cartesian Cartesian & convexupdate rule sequential sequential or parallel

approx. functionglobal upper-bound local approximationunique minimizer not requiredcan be non-convex convex approx.

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 60 / 65

Page 63: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

The Majorization-Minimization AlgorithmBlock Successive Majorization-Minimization

Distributed Algorithm for Nonlinear Programming

Exact Jacobi Successive Convex ApproximationExtensions

Summary

We have studied

Majorization-Minimization algorithmBlock Coordinate Descent algorithmBlock Successive Majorization-Minimization algorithm

We have briefly introduced

Distributed Successive Convex Approximation algorithm

Ying Sun and Prof. Daniel P. Palomar MM Algorithm 61 / 65

Page 64: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

References I

D. R. Hunter and K. Lange.A tutorial on MM algorithms.Amer. Statistician, volume 58, pp. 30–37, 2004.

M. Razaviyayn, M. Hong, and Z. Luo.A unified convergence analysis of block successive minimization methods fornonsmooth optimization.SIAM J. Optim., volume 23, no. 2, pp. 1126–1153, 2013.

G. Scutari, F. Facchinei, P. Song, D. Palomar, and J.-S. Pang.Decomposition by partial linearization: Parallel optimization of multi-agentsystems.IEEE Trans. Signal Process., volume 62, no. 3, pp. 641–656, 2014.

Y. Sun, P. Babu, and D. Palomar.Majorization-minimization algorithms in signal processing, communications, andmachine learning.IEEE Transactions on Signal Processing, volume 65, no. 3, pp. 794–816, 2017.

M. Chiang, C. W. Tan, D. Palomar, D. O’Neill, and D. Julian.Power control by geometric programming.IEEE Trans. Wireless Commun, volume 6, no. 7, pp. 2640–2651, 2007.

Page 65: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

References II

E. J. Candes, M. Wakin, and S. Boyd.Enhancing sparsity by reweighted l1 minimization.J. Fourier Anal. Appl., volume 14, no. 5-6, pp. 877–905, 2008.

J. Song, P. Babu, and D. P. Palomar.Sparse generalized eigenvalue problem via smooth optimization.IEEE Trans. Signal Process., volume 63, no. 7, pp. 1627–1642, 2015.

—.Optimization methods for designing sequences with low autocorrelationsidelobes.IEEE Trans. Signal Process., volume 63, no. 15, pp. 3998–4009, 2015.

Y. Sun, P. Babu, and D. P. Palomar.Regularized Tyler’s scatter estimator: Existence, uniqueness, and algorithms.IEEE Trans. Signal Processing, volume 62, no. 19, pp. 5143–5156, 2014.

D. P. Bertsekas.Nonlinear Programming.Athena Scientific, 1999.

L. Grippo and M. Sciandrone.On the convergence of the block nonlinear gauss–seidel method under convexconstraints.Oper. Res. Lett., volume 26, no. 3, pp. 127–136, 2000.

Page 66: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

References III

Y. Sun, P. Babu, and D. P. Palomar.Regularized robust estimation of mean and covariance matrix under heavy-taileddistributions.IEEE Transactions on Signal Processing, volume 63, no. 12, pp. 3096–3109,2015.

Page 67: Majorization-Minimization Algorithm€¦ · Distributed Algorithm for Nonlinear Programming Introduction Construction Techniques Example Algorithms Applications EM Algorithm Assumethecompletedatasetfx;zgconsistsofobserved

Thanks

For more information visit:

https://www.danielppalomar.com