Transcript
Page 1: Nonsmooth Optimization


Page 2: Nonsmooth Optimization

Preliminaries

• Rn, n-dimensional real Euclidean space and x, y ∈ Rn

• Usual inner product (x, y) = xTy = ∑_{i=1}^{n} xi yi

• Euclidean norm ‖x‖ = √(x, x) = (xTx)^{1/2}

• f : O → R is smooth (continuously differentiable) if the gradient ∇f : O → Rn is defined and continuous on an open set O ⊆ Rn:

∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn)T


Page 3: Nonsmooth Optimization

Smooth Functions - Directional Derivative

• Directional derivatives f′(x; u), f′(x; −u) of f at x ∈ O, in the direction of u ∈ Rn:

f′(x; u) := lim_{α→+0} [f(x + αu) − f(x)]/α = (∇f(x), u),

• f′(x; e1), f′(x; e2), . . . , f′(x; en), where ei (i = 1, 2, . . . , n) are the unit coordinate vectors

• (∇f(x), e1) = fx1, (∇f(x), e2) = fx2, . . . , (∇f(x), en) = fxn, i.e., the partial derivatives.

• Note that f ′(x;u) = −f ′(x;−u).
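As a quick numerical illustration (not from the slides), the Python sketch below compares the forward difference quotient with the inner product (∇f(x), u) for an arbitrarily chosen smooth quadratic; the function f, the point x and the direction u are assumptions made only for demonstration.

import numpy as np

# Smooth test function f(x) = x1^2 + 3*x2^2 and its gradient (arbitrary example).
def f(x):
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, -2.0])
u = np.array([0.5, 0.5])

# Forward difference quotient approximating the directional derivative f'(x; u).
alpha = 1e-6
quotient = (f(x + alpha * u) - f(x)) / alpha

print(quotient, grad_f(x) @ u)   # both values are close to (grad f(x), u)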


Page 4: Nonsmooth Optimization

Smooth Functions - 1st order approximation

• A first-order approximation of f near x ∈ O by means of the Taylor series with remainder term:

f(x + δ) = f(x) + (∇f(x), δ) + ox(δ) (x + δ ∈ O),

where lim_{α→0} ox(αδ)/α = 0 and δ ∈ Rn is small enough.

• Thus a smooth function can be locally replaced by a “simple” linear approximation of it.
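A minimal numerical check of this approximation (again with an arbitrary smooth function, chosen only for illustration): the remainder f(x + αδ) − f(x) − (∇f(x), αδ), divided by α, should tend to 0 as α → 0.

import numpy as np

def f(x):
    return np.sin(x[0]) + x[1]**2

def grad_f(x):
    return np.array([np.cos(x[0]), 2.0 * x[1]])

x = np.array([0.3, 1.0])
delta = np.array([1.0, -1.0])

# The remainder o_x(alpha*delta) divided by alpha should tend to 0 as alpha -> 0.
for alpha in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = f(x + alpha * delta) - f(x) - grad_f(x) @ (alpha * delta)
    print(alpha, remainder / alpha)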


Page 5: Nonsmooth Optimization

Smooth Functions - Optimality Conditions

First-order necessary conditions for an extremum:

• For x∗ ∈ O to be a local minimizer of f on Rn, it is necessary

that ∇f(x∗) = 0n,

• For x∗ ∈ O to be a local maximizer of f on Rn, it is necessary

that ∇f(x∗) = 0n.


Page 6: Nonsmooth Optimization

Smooth Functions - Descent/Ascent Directions

Directions of steepest descent and ascent if x is not a stationary point:

• the unit steepest descent direction ud of the function f at a point x: ud(x) = −∇f(x)/‖∇f(x)‖,

• the unit steepest ascent direction ua of the function f at a point x: ua(x) = ∇f(x)/‖∇f(x)‖.

• There is only one steepest descent direction and only one steepest ascent direction, and ud(x) = −ua(x).


Page 7: Nonsmooth Optimization

Smooth Functions - Chain Rule

• Chain rule: Let f : Rn → R, g : Rn → R, h : Rn → Rn.

• If f ∈ C1(O), g ∈ C1(O) and f(x) = g(h(x)), then ∇Tf(x) = ∇Tg(h(x)) ∇h(x)

• ∇h(x) = [∂hj(x)/∂xi]_{i,j=1,2,...,n} is an n × n matrix.
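The chain rule can be checked numerically; the sketch below uses an arbitrary inner map h and outer function g (both hypothetical examples), builds ∇h(x) entry-wise as ∂hj/∂xi exactly as on the slide, and compares the product ∇h(x) ∇g(h(x)) with a finite-difference gradient of f = g ◦ h.

import numpy as np

# Inner map h: R^2 -> R^2 and outer function g: R^2 -> R (arbitrary smooth examples).
def h(x):
    return np.array([x[0] * x[1], x[0] + x[1]**2])

def grad_h(x):
    # Entry (i, j) is dh_j / dx_i, matching the slide's convention.
    return np.array([[x[1], 1.0],
                     [x[0], 2.0 * x[1]]])

def g(y):
    return np.sin(y[0]) + y[1]**2

def grad_g(y):
    return np.array([np.cos(y[0]), 2.0 * y[1]])

def f(x):
    return g(h(x))

x = np.array([0.7, -1.2])

# Chain rule: grad f(x) = grad_h(x) @ grad_g(h(x)).
chain = grad_h(x) @ grad_g(h(x))

# Finite-difference gradient of f for comparison.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x)) / eps for e in np.eye(2)])
print(chain, fd)   # the two vectors agree up to the finite-difference error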


Page 8: Nonsmooth Optimization

Nonsmooth Optimization

• Deals with nondifferentiable functions

• The problem is to find a proper replacement for the concept

of gradient

• Different research groups work on nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems

• Tools replacing the gradient


Page 9: Nonsmooth Optimization

Keywords of Nonsmooth Optimization

• Convex Functions, Lipschitz Continuous Functions

• Generalized directional derivatives, Generalized Derivatives

• Subgradient method, Bundle method, Discrete Gradient Algorithm

• Asplund Spaces


Page 10: Nonsmooth Optimization

Convex Functions

• O ⊆ Rn is a nonempty convex set if αx + (1 − α)y ∈ O for all x, y ∈ O, α ∈ [0, 1]

• f : O → R̄, R̄ := [−∞, ∞], is convex if

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for any x, y ∈ O, λ ∈ [0, 1].


Page 11: Nonsmooth Optimization

Convex Functions

• Every local minimum is a global minimum

• ξ is a subgradient of f at a nondifferentiable point x ∈ dom f if it satisfies the subgradient inequality, i.e.,

f(y) ≥ f(x) + (ξ, y − x) for all y.

• The set of subgradients of f at x is called the subdifferential ∂f(x):

∂f(x) := {ξ ∈ Rn | f(y) ≥ f(x) + (ξ, y − x) ∀y ∈ Rn}.
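To make the subgradient inequality concrete, here is a small sketch (illustrative only) for the convex function f(x) = |x|, which is nondifferentiable at x = 0: every ξ ∈ [−1, 1] satisfies the inequality there, so ∂f(0) = [−1, 1].

import numpy as np

# Convex, nondifferentiable at 0: f(x) = |x|.  Any xi in [-1, 1] is a subgradient at x = 0.
f = abs
x = 0.0

for xi in [-1.0, -0.3, 0.0, 0.5, 1.0]:
    ys = np.linspace(-2.0, 2.0, 401)
    # The subgradient inequality f(y) >= f(x) + xi*(y - x) must hold for every y.
    holds = np.all(np.abs(ys) >= f(x) + xi * (ys - x))
    print(xi, holds)   # True for every xi in [-1, 1]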


Page 12: Nonsmooth Optimization

Convex Functions

• The subgradients at a point can be characterized by the directional derivative: f′(x; u) = sup_{ξ∈∂f(x)} (ξ, u).

• If x is in the interior of dom f, the subdifferential ∂f(x) is compact and the directional derivative is finite.

• The subdifferential in relation to the directional derivative:

∂f(x) = {ξ ∈ Rn | f′(x; u) ≥ (ξ, u) ∀u ∈ Rn}.


Page 13: Nonsmooth Optimization

Lipschitz Continuous Functions

• f : O → R is Lipschitz continuous with some constant K if for all y, z in the open set O: |f(y) − f(z)| ≤ K‖y − z‖

• Such functions are differentiable almost everywhere (Rademacher's theorem)

• Clarke subdifferential ∂Cf(x) of a Lipschitz continuous f at x:

∂Cf(x) = co{ξ ∈ Rn | ξ = lim_{k→∞} ∇f(xk), xk → x, xk ∈ D},

where D is the set where the function is differentiable.
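The definition can be illustrated with the standard example f(x) = |x| (chosen here for illustration, not taken from the slides): f is differentiable everywhere except at 0 with gradient sign(x), the gradients sampled along sequences xk → 0 have limits −1 and +1, and their convex hull [−1, 1] is ∂Cf(0).

import numpy as np

# f(x) = |x| is Lipschitz and differentiable for x != 0 with gradient sign(x).
def grad_abs(x):
    return float(np.sign(x))

# A sequence x_k -> 0 approaching 0 from both sides.
points = [(-1.0)**k / (k + 1) for k in range(10)]
limit_gradients = sorted({grad_abs(x) for x in points})

print(limit_gradients)   # [-1.0, 1.0]
print("Clarke subdifferential at 0 = co{-1, +1} = [-1, 1]")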


Page 14: Nonsmooth Optimization

Lipschitz Continuous Functions

• Mean Value Theorem for Clarke subdifferentials: f(b) − f(a) = (ξ, b − a) for some ξ ∈ ∂Cf(c), where c lies on the open segment between a and b

• Nonsmooth chain rule with respect to the Clarke subdifferential:

∂C(g ◦ F)(x) ⊆ co{∑_{i=1}^{m} ξi µi | ξ = (ξ1, ξ2, . . . , ξm) ∈ ∂Cg(F(x)), µi ∈ ∂Cfi(x) (i = 1, 2, . . . , m)}

• F(·) = (f1(·), f2(·), . . . , fm(·)) is a vector-valued function, g : Rm → R and g ◦ F : Rn → R are Lipschitz continuous


Page 15: Nonsmooth Optimization

Regular Functions

• A locally Lipschitz function is regular at x if its directional derivative exists and f′C(x; u) = f′(x; u) for all u

• Ex: Semismooth functions: f : Rn → R is semismooth at x ∈ Rn if it is locally Lipschitz and for every u ∈ Rn the following limit exists:

lim_{ξ∈∂f(x+αv), v→u, α→+0} (ξ, u)


Page 16: Nonsmooth Optimization

Max- and Min-type Functions

• f(x) = max {f1(x), f2(x), . . . , fm(x)}, fi : Rn → R (i = 1,2, . . . ,m)

• ∂Cf(x) ⊆ co ⋃_{i∈J(x)} ∂Cfi(x), where J(x) := {i = 1, 2, . . . , m | f(x) = fi(x)}

• Ex: f(x) = max {f1(x), f2(x)}
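A small sketch of this example (with two arbitrary smooth pieces f1, f2 chosen only for illustration): it determines the active index set J(x) and collects the gradients whose convex hull contains ∂Cf(x).

import numpy as np

# Max-type function f(x) = max(f1(x), f2(x)) with two smooth pieces (arbitrary example).
def f1(x):
    return x[0]**2 + x[1]

def f2(x):
    return x[0] + x[1]**2

grads = {1: lambda x: np.array([2.0 * x[0], 1.0]),
         2: lambda x: np.array([1.0, 2.0 * x[1]])}

def active_set(x, tol=1e-12):
    vals = {1: f1(x), 2: f2(x)}
    fmax = max(vals.values())
    return [i for i, v in vals.items() if abs(v - fmax) <= tol]

x = np.array([1.0, 1.0])         # here f1(x) = f2(x) = 2, so both pieces are active
J = active_set(x)
print(J)                          # [1, 2]
print([grads[i](x) for i in J])   # the Clarke subdifferential lies in the convex hull of these gradients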


Page 17: Nonsmooth Optimization

Quasidifferentiable Functions

• f : Rn → R is quasidifferentiable at x if f′(x; u) exists finitely for every direction u ∈ Rn and there exists a pair [∂f(x), ∂̄f(x)] such that

• f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) + min_{φ∈∂̄f(x)} (φ, u)

• [∂f(x), ∂̄f(x)] is the quasidifferential, ∂f(x) the subdifferential, ∂̄f(x) the superdifferential


Page 18: Nonsmooth Optimization

Directional Derivatives

f : O → R, O ⊂ Rn, x ∈ O in the direction u ∈ Rn

• Dini Directional Derivative

• Hadamard Directional Derivative

• Clarke Directional Derivative

• Michel-Penot Directional Derivative


Page 19: Nonsmooth Optimization

Dini Directional Derivative

• upper Dini directionally differentiable

f′D(x; u) := lim sup_{α→+0} [f(x + αu) − f(x)]/α

• lower Dini directionally differentiable

f′D(x; −u) := lim inf_{α→+0} [f(x + αu) − f(x)]/α

• Dini subdifferentiable: f′D(x; u) = f′D(x; −u)
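A numerical illustration (the function is a standard textbook-style example, not from the slides): for f(x) = x sin(1/x), f(0) = 0, the difference quotient at x = 0 in the direction u = 1 oscillates, so the upper and lower Dini derivatives differ (+1 and −1).

import numpy as np

# f(x) = x*sin(1/x), f(0) = 0: the quotient [f(0 + a*u) - f(0)]/a = sin(1/a) oscillates.
def f(x):
    return x * np.sin(1.0 / x) if x != 0.0 else 0.0

x, u = 0.0, 1.0
alphas = np.logspace(-1, -6, 2000)          # step sizes alpha -> +0
quotients = np.array([(f(x + a * u) - f(x)) / a for a in alphas])

print("upper Dini ~", quotients.max())      # close to +1
print("lower Dini ~", quotients.min())      # close to -1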


Page 20: Nonsmooth Optimization

Hadamard Directional Derivative

• upper Hadamard directionally differentiable

f′H(x; u) := lim sup_{α→+0, v→u} [f(x + αv) − f(x)]/α

• lower Hadamard directionally differentiable

f′H(x; −u) := lim inf_{α→+0, v→u} [f(x + αv) − f(x)]/α

• Hadamard subdifferentiable: f′H(x; u) = f′H(x; −u)


Page 21: Nonsmooth Optimization

Clarke Directional Derivative

• upper Clarke directionally differentiable

f′C(x; u) := lim sup_{y→x, α→+0} [f(y + αu) − f(y)]/α

• lower Clarke directionally differentiable

f′C(x; −u) := lim inf_{y→x, α→+0} [f(y + αu) − f(y)]/α

• Clarke subdifferentiable: f′C(x; u) = f′C(x; −u)
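A numerical sketch (the choice f(x) = −|x| is an illustrative assumption) showing why the extra limit over y → x matters: at x = 0 in direction u = 1 the ordinary directional derivative is −1, while the upper Clarke derivative, which takes a lim sup over nearby base points y, equals +1.

import numpy as np

# f(x) = -|x|: ordinary derivative at 0 in direction u = 1 is -1,
# but limsup_{y->0, a->+0} [f(y + a*u) - f(y)]/a equals +1.
def f(x):
    return -abs(x)

u = 1.0
best = -np.inf
for y in np.linspace(-1e-3, 1e-3, 201):       # base points y near x = 0
    for a in np.logspace(-4, -8, 41):         # step sizes a -> +0
        best = max(best, (f(y + a * u) - f(y)) / a)

print("ordinary derivative ~", (f(1e-8 * u) - f(0.0)) / 1e-8)   # close to -1
print("upper Clarke deriv. ~", best)                            # close to +1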


Page 22: Nonsmooth Optimization

Michel-Penot Directional Derivative

• upper Michel-Penot directionally differentiable

f′MP(x; u) := sup_{v∈Rn} lim sup_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α

• lower Michel-Penot directionally differentiable

f′MP(x; −u) := inf_{v∈Rn} lim inf_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α

• Michel-Penot subdifferentiable: f′MP(x; u) = f′MP(x; −u)


Page 23: Nonsmooth Optimization

Subdifferentials and Optimality Conditions

• f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) ∀u ∈ Rn

• For a point x∗ to be a minimizer, it is necessary that 0n ∈ ∂f(x∗)

• A point x∗ satisfying 0n ∈ ∂f(x∗) is called a stationary point


Page 24: Nonsmooth Optimization

Nonsmooth Optimization Methods

• Subgradient Algorithm (and ε-Subgradient Methods)

• Bundle Methods

• Discrete Gradients


Page 25: Nonsmooth Optimization

Descent Methods

• min f(x) subject to x ∈ Rn

• The objective is to find dk such that f(xk + dk) < f(xk),

• i.e., min f(xk + d) − f(xk) subject to d ∈ Rn.

• For f(x) twice continuously differentiable, expanding f(xk + d):

f(xk + d) − f(xk) = f′(xk; d) + ‖d‖ε(d), where ε(d) → 0 as ‖d‖ → 0


Page 26: Nonsmooth Optimization

Descent Methods

• We know f′(xk; d) = ∇f(xk)Td

• min_{d∈Rn} ∇f(xk)Td subject to ‖d‖ ≤ 1.

• The search direction in descent is obtained as dk = −∇f(xk)/‖∇f(xk)‖

• To find xk+1, a line search is performed along dk to obtain a step size t, from which the next point xk+1 = xk + t dk is computed
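The scheme can be summarized in a minimal sketch, assuming a smooth objective and using a backtracking (Armijo) line search to pick t; the quadratic, the starting point and the constants below are arbitrary choices, not from the slides.

import numpy as np

# Minimal steepest-descent sketch with a backtracking (Armijo) line search.
def f(x):
    return x[0]**2 + 10.0 * x[1]**2           # arbitrary smooth test function

def grad_f(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([3.0, 1.0])
for k in range(100):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:
        break
    d = -g / np.linalg.norm(g)                # unit steepest descent direction
    t = 1.0
    # Backtracking: shrink t until a sufficient decrease is obtained.
    while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
        t *= 0.5
    x = x + t * d

print(x, f(x))   # close to the minimizer (0, 0)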


Page 27: Nonsmooth Optimization

Subgradient Algorithm

• Developed for minimizing convex functions

• min f(x) subject to x ∈ Rn

• Given x0, the method generates a sequence {xk}∞_{k=0} according to xk+1 = xk − αkvk, vk ∈ ∂f(xk)

• Simple generalization of a descent method with line search

• The opposite direction of a subgradient is not necessarily a descent direction, so a line search cannot be used
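A minimal sketch of the iteration xk+1 = xk − αk vk (the convex test function and the diminishing step rule αk = 1/(k+1) are assumptions chosen for illustration; the slides do not prescribe them). Since f need not decrease at every step, the best value seen so far is tracked.

import numpy as np

# Subgradient iteration for the convex function f(x) = |x1| + 2|x2|.
def f(x):
    return abs(x[0]) + 2.0 * abs(x[1])

def subgrad(x):
    # sign(0) = 0 is a valid subgradient of |.| at 0.
    return np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])

x = np.array([5.0, -3.0])
best = f(x)
for k in range(2000):
    v = subgrad(x)
    x = x - (1.0 / (k + 1)) * v        # diminishing step size alpha_k = 1/(k+1)
    best = min(best, f(x))             # f need not decrease monotonically

print(best)   # approaches the minimum value 0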


Page 28: Nonsmooth Optimization

Subgradient Algorithm

• Need not converge to a stationary point

• Special rules for computation of a step size

• Theorem (Shor, N.Z.): Let S∗ be the set of minimum points of f and let {xk} be generated with step αk := α/‖vk‖. Then for any ε > 0 and any x∗ ∈ S∗, one can find a k = k̄ such that the point x̄ := xk̄ satisfies

‖x̄ − x∗‖ < α(1 + ε)/2


Page 29: Nonsmooth Optimization

Bundle Method

• At the current iterate xk, we have trial points yj ∈ Rn (j ∈ Jk ⊂ {1, 2, . . . , k})

• Idea: underestimate f by using a piecewise linear function

• Subdifferential of f at x: ∂f(x) = {v ∈ Rn | (v, z − x) ≤ f(z) − f(x) ∀z ∈ Rn}

• f̂k(x) = max_{j∈Jk} {f(yj) + (vj, x − yj)}, where vj ∈ ∂f(yj)

• f̂k(x) ≤ f(x) ∀x ∈ Rn and f̂k(yj) = f(yj), j ∈ Jk
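The cutting-plane model f̂k can be written down directly; the sketch below builds it for the convex function f(x) = x² from a few hypothetical trial points yj (all choices are illustrative assumptions) and shows that f̂k underestimates f with equality at the trial points.

# Piecewise linear lower model fhat_k(x) = max_j { f(y_j) + (v_j, x - y_j) }.
def f(x):
    return x**2

def subgrad(x):
    return 2.0 * x          # for this differentiable convex f the gradient is the only subgradient

trial_points = [-2.0, 0.5, 1.0]                          # y_j, j in J_k (hypothetical)
bundle = [(y, f(y), subgrad(y)) for y in trial_points]   # stored bundle information

def fhat(x):
    return max(fy + v * (x - y) for y, fy, v in bundle)

for x in [-2.0, 0.0, 0.5, 2.0]:
    print(x, fhat(x), f(x))   # fhat(x) <= f(x), with equality at the trial points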


Page 30: Nonsmooth Optimization

Bundle Method

• Serious step: xk+1 := yk+1 := xk + t dk, t > 0, in case a sufficient decrease is achieved at xk+1

• Null step: xk+1 := xk, in case no sufficient decrease is achieved; the gradient information is enriched by adding the new subgradient vk+1 ∈ ∂f(yk+1) to the bundle.
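Schematically, the serious/null decision of one iteration might look like the sketch below; the function name bundle_step, the descent-test constant m and the quantity predicted_decrease (which in a real bundle method would come from the QP over the cutting-plane model) are assumptions, not the slides' notation.

# Schematic serious/null step decision of one bundle iteration (illustrative only).
def bundle_step(x_k, y_next, f, subgrad, predicted_decrease, bundle, m=0.1):
    if f(y_next) <= f(x_k) - m * predicted_decrease:
        # Serious step: sufficient decrease achieved, move to the trial point.
        x_next = y_next
    else:
        # Null step: stay at x_k; the new subgradient still enriches the bundle.
        x_next = x_k
    bundle.append((y_next, f(y_next), subgrad(y_next)))
    return x_next, bundle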


Page 31: Nonsmooth Optimization

Bundle Method

• Standard concepts: serious step and null step

• The convergence problem is avoided by making sure that they are descent methods.

• The descent direction is found by solving a QP involving the cutting-plane approximation of the function over a bundle of subgradients.

• Utilize the information from the previous iterations by storing the subgradient information in a bundle.


Page 32: Nonsmooth Optimization

Asplund Spaces

• “Nonsmooth” usually refers to functions, but spaces can also be referred to as nonsmooth

• Banach spaces: complete normed vector spaces

• Fréchet derivative, Gâteaux derivative

• f is Fréchet differentiable on an open set U ⊂ V if its Gâteaux derivative is linear and bounded at each point of U and the Gâteaux derivative is a continuous map U → L(V, W).

• Asplund spaces: Banach spaces in which every convex continuous function is generically Fréchet differentiable


Page 33: Nonsmooth Optimization

References

Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.

Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).

Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.

Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.

Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.


Page 34: Nonsmooth Optimization

Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang publishing house, Frankfurt a.M., Bern, New York, pp. 519-538.

