multiple gradient descent algorithm

Outline Introduction Multiple Gradient Descent Algorithm MGDA for non-smooth convex problems

Multiple Gradient Descent Algorithm

Outi Wilppu

March 1, 2013

Outi Wilppu Multiple Gradient Descent Algorithm

1 Introduction

2 Multiple Gradient Descent AlgorithmSteepest Descent MethodTheoretical backroundAlgorithm

3 MGDA for non-smooth convex problemsTheoretical backroundAlgorithmExampleFuture work

Motivation

Many real life problems have multiobjective nature

• Problem has several goals such that they should all be asgood as possible

• Goals are conflicting and have to make compromises

• For example in economics and engineering

• Many problems have also non-smooth nature

Problem

Definition

We consider a multiobjective problem of form

min f1(x), . . . , fk(x) (1)

s. t. x ∈ S

• Objective functions fi (x) : Rn −→ R• Set S ⊆ Rn is a set of feasible solutions

Pareto optimality

Definition

A solution x∗ ∈ S is Pareto optimal if there is no other solutionx ∈ S such that fi (x) ≤ fi (x

∗) for all i = 1, . . . , k andfi (x) < fi (x

∗) for at least one index i .

• Usually there exists a lot of Pareto optimal solutions, calledPareto optimal set

• All Pareto optimal solution are mathematically equally good

Steepest Descent Method

• A method to solve smooth unconstrained optimizationproblem with single objective function

min f (x), s. t. x ∈ Rn

• Search direction d = − ∇f (x)

‖ ∇f (x) ‖, which is the steepest

descent direction at the point x

• Stepsize λ can be found with line search

• A new point is of the form x+ = x + λd, wheref (x+) < f (x) until the optimum is reached

About MGDA

• Assume that in problem (1) we have

– All objective functions fi are continiously differentiable– Problem has no constrains and feasible set is Rn

• The main idea is to find a common descent direction for allobjective functions by defining the convex hull of gradients ofobjective functions and finding the minimum norm element ofthe convex hull

• With this method we obtain one Pareto stationary solution

Pareto stationary

Definition

Objective functions fi are said to be Pareto stationary at the pointx∗ if

k∑i=1

αi∇fi (x∗) = 0, αi ≥ 0 ∀i ,

k∑i=1

αi = 1.

• Pareto stationarity is a necessary condition for Paretooptimality

Theorem

Let objective funtions fi , i = 1, . . . , k be continiously differentiableand

U = {w ∈ Rn | w =k∑

αi∇fi (x), αi ≥ 0 ∀ i ,k∑

αi = 1}

Let vector w∗ be a minimum norm element of convex hull of U i.e.w∗ = argmin{‖ w ‖ |w ∈ conv U}. Then

1 either w∗ = 0 and objective functions fi , i = 1, . . . , k arePareto stationary at the point x∗

2 or w∗ 6= 0 and −w∗ is common descent direction for allobjective functions. Additionally, if w∗ ∈ U thenuTw∗ =‖ w∗ ‖2 for all u ∈ conv U.

Search direction

• To find the vector w we need to solve the problem

min ‖k∑

αi∇fi (x)‖2

s. t. αi ≥ 0∀ i ,k∑

αi = 1.

• In case of two objective functions we have

‖v‖2−uTv

‖u‖2+‖v‖2−2uTv, if uTv < min{‖u‖2, ‖v‖2}

0, if min{‖u‖, ‖v‖} = ‖v‖1, if min{‖u‖, ‖v‖} = ‖u‖

where u = ∇f1(x), v = ∇f2(x) and α2 = 1− α1.

Stepsize

• Stepsize λ is the largest strictly positive real number such thatall the functions gi (t) = fi (x− tw) are monotonicallydecreasing over the interval [0, t].

• A new point x+ is x+ = x− λw• Algorithm stops when w = 0

About MGDA for non-smooth convex problems

• Assume that in the problem (1)

– Objective functions fi are convex functions, not necessarilydifferentiable

– Problem has no constrains and the feasible set is Rn

– The subdifferentials of objective functions fi are known

• The basic idea is the same as before but now instead ofgradients we use the steepest descent subgradients

Subdifferential and subgradient

Definition

Let function f be convex. A subdifferential of f at the point x is aset

∂f (x) = {ξ ∈ Rn | f ′(x;d) ≥ ξTd}

for all d ∈ Rn. A vector ξ ∈ ∂f (x) is called subgradient offunction f at the point x.

Theorem

Let function f : Rn −→ R be convex and differentiable at the pointx ∈ Rn. Then

∂f (x) = {∇f (x)}.

Steepest descent direction

• To determine the steepest descent direction we need to solvethe problem

min f ′(x;d) s. t. ‖d‖ = 1 (2)

• The problem (2) can be written

min ‖ξ‖ s. t. ξ ∈ ∂f (x)

• Thus the steepest descent direction at the point x is thesmallest norm subgradient in the subdifferential ∂f (x)

Algorithm

• To obtain a search direction we need to define the steepestdescent subgradient for every objective function at the pointx. After that we can solve problem

min ‖k∑

αiξi‖2

s. t. αi ≥ 0 ∀ i ,k∑

αi = 1

where ξi ∈ ∂fi (x) is the steepest descent subgradient.

• Stepsize and new point are determined as in smooth case

• With this method we obtain one Pareto stationary solutionsince

k∑i=1

αiξi = 0, αi ≥ 0 and

k∑i=1

αi = 1.

• In convex case Pareto stationarity is necessary and sufficientcondition to Pareto optimality

⇒ We obtain one Pareto optimal solution

Example

Consider the following problem:

min {f1(x), f2(x)}s. t. x ∈ R2

f1(x) = max{x21 + (x2 − 1)2, (x1 + 1)2}

f2(x) = max{2x1 − x2, x21 + x2}

and starting point x0 = (1, 0.5).

The steepest descent subgradients at the point x0:

– ξ1 = (4, 0)

– To determine ξ2 we need to find ρ ∈ [0, 1] such that we findthe minimum norm of subdifferential

∂f2(x0) = ρ(2,−1) + (1− ρ)(2, 1) = (2, 1− 2ρ).

To obtain the steepest descent subgradient ρ =1

ξ2 = (2, 0).

A common descent search direction:

– w = αξ1 + (1− α)ξ2

– We obtain α from formula

‖ξ2‖2−ξT1 ξ2

‖ξ1‖2+‖ξ2‖2−2ξT1 ξ2, if ξT1 ξ2 < min{‖ξ1‖2, ‖ξ2‖2}

0, if min{‖ξ1‖, ‖ξ2‖} = ‖ξ2‖1, if min{‖ξ1‖, ‖ξ2‖} = ‖ξ1‖

– Thus w = (2, 0)

Stepsize:

– Function g1(t) = f1(x0 − tw) is descent if t ∈ [0, 0.6875]

– Function g2(t) = f2(x0 − tw) is descent if t ∈ [0, 0.5]

⇒ Stepsize λ = 0.5

New point:

– x1 = x0 − λw = (0, 0.5)

k ξ1 ξ2 α w λ xk+1

0 (4, 0) (2, 0) 0 (2, 0) 0.5 (0, 0.5)

1 (2, 0) (0, 1) 0.2 (0.4, 0.8) 0.403 (−0.161, 0.178)

2 (1.678, 0) (−0.322, 1) 0.329 (0.336; 0.671)

Solution: x∗ = (−0.161, 0.178), f1(x∗) ≈ 0.704 andf2(x∗) ≈ 0.203

Future work

• The biggest drawback is that we need to know wholesubdifferential

• The aim is to find an effective way to get a common descentdirection for all objective functions

• We do not need more than one arbitrary subgradient offunction

Thank you for your attention!

multiple gradient descent algorithm

Documents

1 lecture 10: descent methods gradient descent (reminder)

a fast stochastic error-descent algorithm for supervised...

pegasos: primal estimated sub-gradient solver for...

proximal gradient descent › ~aarti › class ›...

linear regression & gradient descent

multileave gradient descent for fast online learning to...

a new backpropagation algorithm without gradient descent ·...

(sub)gradient descent - umd...•gradient descent –a...

how the backpropagation algorithm...

deep learning i: gradient descentgd.pdf · roadmap...

lecture 6: stochastic gradient descent sanjeev arora elad...

algorithm 851: cg descent, a conjugate gradient method with

gradient descent optimization

learning to learn by gradient descent by gradient...

boosting algorithms as gradient descent in function...

learning to learn by gradient descent by gradient...

gradient descent - pomona college

gasgd: stochastic gradient descent for distributed … ·...

overview of stochastic gradient descent...

the gradient descent algorithm