Transcript
Page 1: Nonsmooth Optimization


Page 2: Nonsmooth Optimization

Preliminaries

• Rn, n-dimensional real Euclidean space and x, y ∈ Rn

• Usual inner product (x, y) = xTy = ∑_{i=1}^{n} xi yi

• Euclidean norm ‖x‖ = √(x, x) = (xTx)^{1/2}

• f : O → R is smooth (continuously differentiable) if the gradient ∇f : O → Rn is defined and continuous on an open set O ⊆ Rn:

∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn)T


Page 3: Nonsmooth Optimization

Smooth Functions - Directional Derivative

• Directional derivatives f′(x; u), f′(x; −u) of f at x ∈ O, in the direction of u ∈ Rn:

f′(x; u) := lim_{α→+0} [f(x + αu) − f(x)]/α = (∇f(x), u),

• f′(x; e1), f′(x; e2), . . . , f′(x; en), where ei (i = 1, 2, . . . , n) are the unit coordinate vectors

• (∇f(x), e1) = fx1, (∇f(x), e2) = fx2, . . . , (∇f(x), en) = fxn, i.e., the partial derivatives.

• Note that f ′(x;u) = −f ′(x;−u).
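As a quick numerical illustration (not from the slides), the Python sketch below compares the forward difference quotient with the inner product (∇f(x), u) for an arbitrarily chosen smooth quadratic; the function f, the point x and the direction u are assumptions made only for demonstration.

import numpy as np

# Smooth test function f(x) = x1^2 + 3*x2^2 and its gradient (arbitrary example).
def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -2.0])
u = np.array([0.5, 0.5])

# Forward difference quotient approximating the directional derivative f'(x; u).
alpha = 1e-6
quotient = (f(x + alpha * u) - f(x)) / alpha

print(quotient, grad_f(x) @ u)   # both values are close to (grad f(x), u)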


Page 4: Nonsmooth Optimization

Smooth Functions - 1st order approximation

• A first-order approximation of f near x ∈ O by means of the Taylor series with remainder term:

f(x + δ) = f(x) + (∇f(x), δ) + ox(δ) (x + δ ∈ O),

where lim_{α→0} ox(αδ)/α = 0 and δ ∈ Rn is small enough.

• Thus a smooth function can be locally replaced by a “simple” linear approximation of it.
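A minimal numerical check of this approximation (again with an arbitrary smooth function, chosen only for illustration): the remainder f(x + αδ) − f(x) − (∇f(x), αδ), divided by α, should tend to 0 as α → 0.

import numpy as np

def f(x):
    return np.sin(x[0]) + x[1]**2

def grad_f(x):
    return np.array([np.cos(x[0]), 2.0 * x[1]])

x = np.array([0.3, 1.0])
delta = np.array([1.0, -1.0])

# The remainder o_x(alpha*delta) divided by alpha should tend to 0 as alpha -> 0.
for alpha in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = f(x + alpha * delta) - f(x) - grad_f(x) @ (alpha * delta)
    print(alpha, remainder / alpha)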


Page 5: Nonsmooth Optimization

Smooth Functions - Optimality Conditions

First-order necessary conditions for an extremum:

• For x∗ ∈ O to be a local minimizer of f on Rn, it is necessary

that ∇f(x∗) = 0n,

• For x∗ ∈ O to be a local maximizer of f on Rn, it is necessary

that ∇f(x∗) = 0n.


Page 6: Nonsmooth Optimization

Smooth Functions - Descent/Ascent Directions

Directions of steepest descent and ascent if x is not a stationary point:

• the unit steepest descent direction ud of the function f at a point x: ud(x) = −∇f(x)/‖∇f(x)‖,

• the unit steepest ascent direction ua of the function f at a point x: ua(x) = ∇f(x)/‖∇f(x)‖.

• There is only one steepest descent direction and only one steepest ascent direction, and ud(x) = −ua(x).


Page 7: Nonsmooth Optimization

Smooth Functions - Chain Rule

• Chain rule: Let f : Rn → R, g : Rn → R, h : Rn → Rn.

• If f ∈ C1(O), g ∈ C1(O) and f(x) = g(h(x)), then ∇Tf(x) = ∇Tg(h(x)) ∇h(x)

• ∇h(x) = [∂hj(x)/∂xi]_{i,j=1,2,...,n} is an n × n matrix.
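The chain rule can be checked numerically; the sketch below uses an arbitrary inner map h and outer function g (both hypothetical examples), builds ∇h(x) entry-wise as ∂hj/∂xi exactly as on the slide, and compares the product ∇h(x) ∇g(h(x)) with a finite-difference gradient of f = g ◦ h.

import numpy as np

# Inner map h: R^2 -> R^2 and outer function g: R^2 -> R (arbitrary smooth examples).
def h(x):
    return np.array([x[0] * x[1], x[0] + x[1]**2])

def grad_h(x):
    # Entry (i, j) is dh_j / dx_i, matching the slide's convention.
    return np.array([[x[1], 1.0],
                     [x[0], 2.0 * x[1]]])

def g(y):
    return np.sin(y[0]) + y[1]**2

def grad_g(y):
    return np.array([np.cos(y[0]), 2.0 * y[1]])

def f(x):
    return g(h(x))

x = np.array([0.7, -1.2])

# Chain rule: grad f(x) = grad_h(x) @ grad_g(h(x)).
chain = grad_h(x) @ grad_g(h(x))

# Finite-difference gradient of f for comparison.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x)) / eps for e in np.eye(2)])
print(chain, fd)   # the two vectors agree up to the finite-difference error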


Page 8: Nonsmooth Optimization

Nonsmooth Optimization

• Deals with nondifferentiable functions

• The problem is to find a proper replacement for the concept

of gradient

• Different research groups work on nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems

• Tools replacing the gradient


Page 9: Nonsmooth Optimization

Keywords of Nonsmooth Optimization

• Convex Functions, Lipschitz Continuous Functions

• Generalized directional derivatives, Generalized Derivatives

• Subgradient method, Bundle method, Discrete Gradient Algorithm

• Asplund Spaces


Page 10: Nonsmooth Optimization

Convex Functions

• O ⊆ Rn is a nonempty convex set if αx + (1 − α)y ∈ O for all x, y ∈ O, α ∈ [0, 1]

• f : O → R̄, R̄ := [−∞, ∞], is convex if

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for any x, y ∈ O, λ ∈ [0, 1].


Page 11: Nonsmooth Optimization

Convex Functions

• Every local minimum is a global minimum

• ξ is a subgradient of f at a nondifferentiable point x ∈ dom f if it satisfies the subgradient inequality, i.e.,

f(y) ≥ f(x) + (ξ, y − x) for all y.

• The set of subgradients of f at x is called the subdifferential ∂f(x):

∂f(x) := {ξ ∈ Rn | f(y) ≥ f(x) + (ξ, y − x) ∀y ∈ Rn}.
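To make the subgradient inequality concrete, here is a small sketch (illustrative only) for the convex function f(x) = |x|, which is nondifferentiable at x = 0: every ξ ∈ [−1, 1] satisfies the inequality there, so ∂f(0) = [−1, 1].

import numpy as np

# Convex, nondifferentiable at 0: f(x) = |x|.  Any xi in [-1, 1] is a subgradient at x = 0.
f = abs
x = 0.0

for xi in [-1.0, -0.3, 0.0, 0.5, 1.0]:
    ys = np.linspace(-2.0, 2.0, 401)
    # The subgradient inequality f(y) >= f(x) + xi*(y - x) must hold for every y.
    holds = np.all(np.abs(ys) >= f(x) + xi * (ys - x))
    print(xi, holds)   # True for every xi in [-1, 1]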


Page 12: Nonsmooth Optimization

Convex Functions

• The subgradients at a point can be characterized by the directional derivative: f′(x; u) = sup_{ξ∈∂f(x)} (ξ, u).

• If x is in the interior of dom f, the subdifferential ∂f(x) is compact and the directional derivative is finite.

• The subdifferential in relation to the directional derivative:

∂f(x) = {ξ ∈ Rn | f′(x; u) ≥ (ξ, u) ∀u ∈ Rn}.


Page 13: Nonsmooth Optimization

Lipschitz Continuous Functions

• f : O → R is Lipschitz continuous with some constant K if for all y, z in the open set O: |f(y) − f(z)| ≤ K‖y − z‖

• Such functions are differentiable almost everywhere (Rademacher's theorem)

• Clarke subdifferential ∂Cf(x) of a Lipschitz continuous f at x:

∂Cf(x) = co{ξ ∈ Rn | ξ = lim_{k→∞} ∇f(xk), xk → x, xk ∈ D},

where D is the set where the function is differentiable.
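The definition can be illustrated with the standard example f(x) = |x| (chosen here for illustration, not taken from the slides): f is differentiable everywhere except at 0 with gradient sign(x), the gradients sampled along sequences xk → 0 have limits −1 and +1, and their convex hull [−1, 1] is ∂Cf(0).

import numpy as np

# f(x) = |x| is Lipschitz and differentiable for x != 0 with gradient sign(x).
def grad_abs(x):
    return float(np.sign(x))

# A sequence x_k -> 0 approaching 0 from both sides.
points = [(-1.0)**k / (k + 1) for k in range(10)]
limit_gradients = sorted({grad_abs(x) for x in points})

print(limit_gradients)   # [-1.0, 1.0]
print("Clarke subdifferential at 0 = co{-1, +1} = [-1, 1]")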


Page 14: Nonsmooth Optimization

Lipschitz Continuous Functions

• Mean Value Theorem for Clarke subdifferentials: f(b) − f(a) = (ξ, b − a) for some ξ ∈ ∂Cf(c), where c lies on the open segment between a and b

• Nonsmooth chain rule with respect to the Clarke subdifferential:

∂C(g ◦ F)(x) ⊆ co{∑_{i=1}^{m} ξi µi | ξ = (ξ1, ξ2, . . . , ξm) ∈ ∂Cg(F(x)), µi ∈ ∂Cfi(x) (i = 1, 2, . . . , m)}

• F(·) = (f1(·), f2(·), . . . , fm(·)) is a vector-valued function, g : Rm → R and g ◦ F : Rn → R are Lipschitz continuous


Page 15: Nonsmooth Optimization

Regular Functions

• A locally Lipschitz function is regular at x if its directional derivative exists and f′C(x; u) = f′(x; u) for all u

• Ex: Semismooth functions: f : Rn → R is semismooth at x ∈ Rn if it is locally Lipschitz and for every u ∈ Rn the following limit exists:

lim_{ξ∈∂f(x+αv), v→u, α→+0} (ξ, u)


Page 16: Nonsmooth Optimization

Max- and Min-type Functions

• f(x) = max {f1(x), f2(x), . . . , fm(x)}, fi : Rn → R (i = 1,2, . . . ,m)

• ∂Cf(x) ⊆ co ⋃_{i∈J(x)} ∂Cfi(x), where J(x) := {i = 1, 2, . . . , m | f(x) = fi(x)}

• Ex: f(x) = max {f1(x), f2(x)}
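A small sketch of this example (with two arbitrary smooth pieces f1, f2 chosen only for illustration): it determines the active index set J(x) and collects the gradients whose convex hull contains ∂Cf(x).

import numpy as np

# Max-type function f(x) = max(f1(x), f2(x)) with two smooth pieces (arbitrary example).
def f1(x):
    return x[0]**2 + x[1]

def f2(x):
    return x[0] + x[1]**2

grads = {1: lambda x: np.array([2.0 * x[0], 1.0]),
         2: lambda x: np.array([1.0, 2.0 * x[1]])}

def active_set(x, tol=1e-12):
    vals = {1: f1(x), 2: f2(x)}
    fmax = max(vals.values())
    return [i for i, v in vals.items() if abs(v - fmax) <= tol]

x = np.array([1.0, 1.0])         # here f1(x) = f2(x) = 2, so both pieces are active
J = active_set(x)
print(J)                          # [1, 2]
print([grads[i](x) for i in J])   # the Clarke subdifferential lies in the convex hull of these gradients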


Page 17: Nonsmooth Optimization

Quasidifferentiable Functions

• f : Rn → R is quasidifferentiable at x if f′(x; u) exists finitely for every direction u ∈ Rn and there exists a pair [∂f(x), ∂̄f(x)] such that

• f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) + min_{φ∈∂̄f(x)} (φ, u)

• [∂f(x), ∂̄f(x)] is the quasidifferential, ∂f(x) the subdifferential, ∂̄f(x) the superdifferential


Page 18: Nonsmooth Optimization

Directional Derivatives

f : O → R, O ⊂ Rn, x ∈ O in the direction u ∈ Rn

• Dini Directional Derivative

• Hadamard Directional Derivative

• Clarke Directional Derivative

• Michel-Penot Directional Derivative


Page 19: Nonsmooth Optimization

Dini Directional Derivative

• upper Dini directionally differentiable

f′D(x; u) := lim sup_{α→+0} [f(x + αu) − f(x)]/α

• lower Dini directionally differentiable

f′D(x; −u) := lim inf_{α→+0} [f(x + αu) − f(x)]/α

• Dini subdifferentiable: f′D(x; u) = f′D(x; −u)
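A numerical illustration (the function is a standard textbook-style example, not from the slides): for f(x) = x sin(1/x), f(0) = 0, the difference quotient at x = 0 in the direction u = 1 oscillates, so the upper and lower Dini derivatives differ (+1 and −1).

import numpy as np

# f(x) = x*sin(1/x), f(0) = 0: the quotient [f(0 + a*u) - f(0)]/a = sin(1/a) oscillates.
def f(x):
    return x * np.sin(1.0 / x) if x != 0.0 else 0.0

x, u = 0.0, 1.0
alphas = np.logspace(-1, -6, 2000)          # step sizes alpha -> +0
quotients = np.array([(f(x + a * u) - f(x)) / a for a in alphas])

print("upper Dini ~", quotients.max())      # close to +1
print("lower Dini ~", quotients.min())      # close to -1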


Page 20: Nonsmooth Optimization

Hadamard Directional Derivative

• upper Hadamard directionally differentiable

f′H(x; u) := lim sup_{α→+0, v→u} [f(x + αv) − f(x)]/α

• lower Hadamard directionally differentiable

f′H(x; −u) := lim inf_{α→+0, v→u} [f(x + αv) − f(x)]/α

• Hadamard subdifferentiable: f′H(x; u) = f′H(x; −u)


Page 21: Nonsmooth Optimization

Clarke Directional Derivative

• upper Clarke directionally differentiable

f′C(x; u) := lim sup_{y→x, α→+0} [f(y + αu) − f(y)]/α

• lower Clarke directionally differentiable

f′C(x; −u) := lim inf_{y→x, α→+0} [f(y + αu) − f(y)]/α

• Clarke subdifferentiable: f′C(x; u) = f′C(x; −u)
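A numerical sketch (the choice f(x) = −|x| is an illustrative assumption) showing why the extra limit over y → x matters: at x = 0 in direction u = 1 the ordinary directional derivative is −1, while the upper Clarke derivative, which takes a lim sup over nearby base points y, equals +1.

import numpy as np

# f(x) = -|x|: ordinary derivative at 0 in direction u = 1 is -1,
# but limsup_{y->0, a->+0} [f(y + a*u) - f(y)]/a equals +1.
def f(x):
    return -abs(x)

u = 1.0
best = -np.inf
for y in np.linspace(-1e-3, 1e-3, 201):       # base points y near x = 0
    for a in np.logspace(-4, -8, 41):         # step sizes a -> +0
        best = max(best, (f(y + a * u) - f(y)) / a)

print("ordinary derivative ~", (f(1e-8 * u) - f(0.0)) / 1e-8)   # close to -1
print("upper Clarke deriv. ~", best)                            # close to +1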


Page 22: Nonsmooth Optimization

Michel-Penot Directional Derivative

• upper Michel-Penot directionally differentiable

f′MP(x; u) := sup_{v∈Rn} lim sup_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α

• lower Michel-Penot directionally differentiable

f′MP(x; −u) := inf_{v∈Rn} lim inf_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α

• Michel-Penot subdifferentiable: f′MP(x; u) = f′MP(x; −u)


Page 23: Nonsmooth Optimization

Subdifferentials and Optimality Conditions

• f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) ∀u ∈ Rn

• For a point x∗ to be a minimizer, it is necessary that 0n ∈ ∂f(x∗)

• A point x∗ satisfying 0n ∈ ∂f(x∗) is called a stationary point


Page 24: Nonsmooth Optimization

Nonsmooth Optimization Methods

• Subgradient Algorithm (and ε-Subgradient Methods)

• Bundle Methods

• Discrete Gradients


Page 25: Nonsmooth Optimization

Descent Methods

• min f(x) subject to x ∈ Rn

• The objective is to find dk such that f(xk + dk) < f(xk),

• i.e., min f(xk + d) − f(xk) subject to d ∈ Rn.

• For f(x) twice continuously differentiable, expanding f(xk + d):

f(xk + d) − f(xk) = f′(xk; d) + ‖d‖ε(d), where ε(d) → 0 as ‖d‖ → 0


Page 26: Nonsmooth Optimization

Descent Methods

• We know f′(xk; d) = ∇f(xk)Td

• min_{d∈Rn} ∇f(xk)Td subject to ‖d‖ ≤ 1.

• The search direction in descent is obtained as dk = −∇f(xk)/‖∇f(xk)‖

• To find xk+1, a line search is performed along dk to obtain a step size t, from which the next point xk+1 = xk + t dk is computed
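The scheme can be summarized in a minimal sketch, assuming a smooth objective and using a backtracking (Armijo) line search to pick t; the quadratic, the starting point and the constants below are arbitrary choices, not from the slides.

import numpy as np

# Minimal steepest-descent sketch with a backtracking (Armijo) line search.
def f(x):
    return x[0]**2 + 10.0 * x[1]**2           # arbitrary smooth test function

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([3.0, 1.0])
for k in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:
        break
    d = -g / np.linalg.norm(g)                # unit steepest descent direction
    t = 1.0
    # Backtracking: shrink t until a sufficient decrease is obtained.
    while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
        t *= 0.5
    x = x + t * d

print(x, f(x))   # close to the minimizer (0, 0)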


Page 27: Nonsmooth Optimization

Subgradient Algorithm

• Developed for minimizing convex functions

• min f(x) subject to x ∈ Rn

• Given x0, the method generates a sequence {xk}∞_{k=0} according to xk+1 = xk − αkvk, vk ∈ ∂f(xk)

• Simple generalization of a descent method with line search

• The opposite direction of a subgradient is not necessarily a descent direction, so a line search cannot be used
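A minimal sketch of the iteration xk+1 = xk − αk vk (the convex test function and the diminishing step rule αk = 1/(k+1) are assumptions chosen for illustration; the slides do not prescribe them). Since f need not decrease at every step, the best value seen so far is tracked.

import numpy as np

# Subgradient iteration for the convex function f(x) = |x1| + 2|x2|.
def f(x):
    return abs(x[0]) + 2.0 * abs(x[1])

def subgrad(x):
    # sign(0) = 0 is a valid subgradient of |.| at 0.
    return np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])

x = np.array([5.0, -3.0])
best = f(x)
for k in range(2000):
    v = subgrad(x)
    x = x - (1.0 / (k + 1)) * v        # diminishing step size alpha_k = 1/(k+1)
    best = min(best, f(x))             # f need not decrease monotonically

print(best)   # approaches the minimum value 0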


Page 28: Nonsmooth Optimization

Subgradient Algorithm

• Need not converge to a stationary point

• Special rules for computation of a step size

• Theorem (Shor, N.Z.): Let S∗ be the set of minimum points of f and let {xk} be generated with step αk := α/‖vk‖. Then for any ε > 0 and any x∗ ∈ S∗, one can find a k = k̄ such that the point x̄ := xk̄ satisfies

‖x̄ − x∗‖ < α(1 + ε)/2


Page 29: Nonsmooth Optimization

Bundle Method

• At the current iterate xk, we have trial points yj ∈ Rn (j ∈ Jk ⊂ {1, 2, . . . , k})

• Idea: underestimate f by using a piecewise linear function

• Subdifferential of f at x: ∂f(x) = {v ∈ Rn | (v, z − x) ≤ f(z) − f(x) ∀z ∈ Rn}

• f̂k(x) = max_{j∈Jk} {f(yj) + (vj, x − yj)}, where vj ∈ ∂f(yj)

• f̂k(x) ≤ f(x) ∀x ∈ Rn and f̂k(yj) = f(yj), j ∈ Jk
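The cutting-plane model f̂k can be written down directly; the sketch below builds it for the convex function f(x) = x² from a few hypothetical trial points yj (all choices are illustrative assumptions) and shows that f̂k underestimates f with equality at the trial points.

# Piecewise linear lower model fhat_k(x) = max_j { f(y_j) + (v_j, x - y_j) }.
def f(x):
    return x**2

def subgrad(x):
    return 2.0 * x          # for this differentiable convex f the gradient is the only subgradient

trial_points = [-2.0, 0.5, 1.0]                          # y_j, j in J_k (hypothetical)
bundle = [(y, f(y), subgrad(y)) for y in trial_points]   # stored bundle information

def fhat(x):
    return max(fy + v * (x - y) for y, fy, v in bundle)

for x in [-2.0, 0.0, 0.5, 2.0]:
    print(x, fhat(x), f(x))   # fhat(x) <= f(x), with equality at the trial points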


Page 30: Nonsmooth Optimization

Bundle Method

• Serious step: xk+1 := yk+1 := xk + t dk, t > 0, in case a sufficient decrease is achieved at xk+1

• Null step: xk+1 := xk, in case no sufficient decrease is achieved; the gradient information is enriched by adding the new subgradient vk+1 ∈ ∂f(yk+1) to the bundle.
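Schematically, the serious/null decision of one iteration might look like the sketch below; the function name bundle_step, the descent-test constant m and the quantity predicted_decrease (which in a real bundle method would come from the QP over the cutting-plane model) are assumptions, not the slides' notation.

# Schematic serious/null step decision of one bundle iteration (illustrative only).
def bundle_step(x_k, y_next, f, subgrad, predicted_decrease, bundle, m=0.1):
    if f(y_next) <= f(x_k) - m * predicted_decrease:
        # Serious step: sufficient decrease achieved, move to the trial point.
        x_next = y_next
    else:
        # Null step: stay at x_k; the new subgradient still enriches the bundle.
        x_next = x_k
    bundle.append((y_next, f(y_next), subgrad(y_next)))
    return x_next, bundle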


Page 31: Nonsmooth Optimization

Bundle Method

• Standard concepts: serious step and null step

• The convergence problem is avoided by making sure that they are descent methods.

• The descent direction is found by solving a QP involving the cutting-plane approximation of the function over a bundle of subgradients.

• Utilize the information from the previous iterations by storing the subgradient information in a bundle.


Page 32: Nonsmooth Optimization

Asplund Spaces

• “Nonsmooth” usually refers to functions, but spaces can also be referred to as nonsmooth

• Banach spaces: complete normed vector spaces

• Fréchet derivative, Gâteaux derivative

• f is Fréchet differentiable on an open set U ⊂ V if its Gâteaux derivative is linear and bounded at each point of U and the Gâteaux derivative is a continuous map U → L(V, W).

• Asplund spaces: Banach spaces in which every convex continuous function is generically Fréchet differentiable


Page 33: Nonsmooth Optimization

References

Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.

Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).

Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.

Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.

Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.


Page 34: Nonsmooth Optimization

Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang publishing house, Frankfurt a.M., Bern, New York, pp. 519-538.

