Semismooth Newton Methods in Function Space
Theoretical Foundations and Applications
Part II
Michael Ulbrich
Technische Universität München
42nd Woudschoten Conference, October 4–6, 2017
Supported by the DFG and the Munich Centre of Advanced Computing (MAC)
Includes joint work with Christian Böhm, Daniela Bratzke, Michael Hintermüller, Moritz Keuthen, Andre Milzarek, Stefan Ulbrich
Outline of Part II
Sufficient Conditions for Regularity
Moreau-Yosida Regularization for State-Constrained and Related Problems
Application to 3D Elastic Contact Problems (Multigrid Preconditioner)
Globalization
Semismooth Newton for Nonsmooth Minimization using the Proximal Operator
Application to Seismic Tomography
Mesh-Independence of Semismooth Newton
Michael Ulbrich | Semismooth Newton Methods in Function Space: Theory and Applications | 5.10.2017 2
Sufficient Conditions for Regularity
Sufficient Conditions for Regularity
S. Ulbrich, M.U. ’00; M.U. ’01, ’11
Sufficient conditions for regularity,
$\|M^{-1}\|_{Z \to W} \le C \quad \forall\, M \in \partial H(w),\ \forall\, w \in B_W(\bar w, \delta),$
can be derived from second-order type optimality conditions.
For reformulated complementarity problems in $L^2$, the central ingredients of sufficient regularity conditions are:
$(v, F'(w) v)_{L^2(\Omega)} \ge \nu \|v\|_{L^2(\Omega)}^2 \quad \forall\, v \in L^2(\Omega)$ with $v F(w) = 0$ a.e. on $\Omega$.
$F$ has the structure $F = \gamma I + G$ with $\gamma > 0$, $G : L^2(\Omega) \to L^p(\Omega)$, $p > 2$.
Some further technical requirements.
State-Constrained and Related Problems
Nonsmooth Reformulation Beyond the L2 Setting
Consider the obstacle-type problem
$\min_{w \in W} J(w)$ s.t. $w \le \beta$,
with, e.g., $W = H_0^1(\Omega)$ or $C(\Omega)$, or $H_0^1(\Omega) \cap H^2(\Omega)$ and $\beta \in W$.
Then the optimality conditions assume the form
$\bar w \le \beta, \quad \langle J'(\bar w), w - \bar w \rangle_{W^*,W} \ge 0 \quad \forall\, w \in W,\ w \le \beta.$
It is no longer possible to write this in pointwise form
$\bar w - P_{(-\infty,\beta]}(\bar w - \tau J'(\bar w)) = 0,$
since $J'(\bar w) \in W^*$ is not a pointwise a.e. defined function.
Hence, nonsmooth Newton methods for such problems are difficult.
Standard approach: Regularize the problem to recover an $L^p$ setting.
Moreau-Yosida Regularization
Ito, Kunisch ’03, ’08; Hintermüller, Kunisch ’06; Meyer, Yousept ’09; Neitzel, Tröltzsch ’08; Hintermüller, Schiela, Wollner ’12; M.U. ’02, ’11; Keuthen, M.U. ’15; Böhm, M.U. ’15; M.U., S. Ulbrich, Bratzke ’17
Moreau-Yosida regularized problem:
$\min_w\ J(w) + \frac{1}{2\theta} \big\| [\lambda + \theta(w - \beta)]_+ \big\|_{L^2(\Omega)}^2,$
where $\theta > 0$ is a penalty parameter and $[t]_+ = \max\{0, t\}$.
$\lambda \in L^2_+(\Omega)$ is a shift parameter (often: $\lambda = 0$).
Optimality conditions: $J'(w) + [\lambda + \theta(w - \beta)]_+ = 0$.
Observation: Semismooth Newton methods are applicable, since
$w \in W \subset L^p(\Omega) \mapsto \lambda + \theta(w - \beta) \in L^2(\Omega) \subset L^{p'}(\Omega) \subset W^*, \quad p' = \frac{p}{p-1},$
is continuous affine linear with suitable $p > 2$ (Sobolev embedding).
Results on convergence rates w.r.t. $\theta$ and continuation methods are available.
Moreau-Yosida Regularization – Semismooth Newton
Ito, Kunisch ’03, ’08; Hintermüller, Kunisch ’06; Hintermüller, Schiela, Wollner ’12; M.U. ’02, ’11; Böhm, M.U. ’15; Keuthen, M.U. ’15; M.U., S. Ulbrich, Bratzke ’17
As observed, with $p > 2$ such that $W \subset L^p(\Omega)$, the operator
$w \in W \subset L^p(\Omega) \mapsto [\lambda + \theta(w - \beta)]_+ \in L^{p'}(\Omega) \subset W^*$
is semismooth. The generalized differential of $[\cdot]_+$ at $\lambda + \theta(w - \beta)$ consists of all operators $M \in L(W, W^*)$, $M : h \mapsto g\,h$, where $g \in L^\infty(\Omega)$ and
$g(x) \in \begin{cases} \{0\} & \text{if } \lambda(x) + \theta(w(x) - \beta(x)) < 0, \\ \{1\} & \text{if } \lambda(x) + \theta(w(x) - \beta(x)) > 0, \\ [0,1] & \text{if } \lambda(x) + \theta(w(x) - \beta(x)) = 0. \end{cases}$
The semismooth Newton system thus reads
$[J''(w) + \theta\, g \cdot I]\, s = -J'(w) - [\lambda + \theta(w - \beta)]_+.$
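As a rough illustration of the Newton system above, the following sketch applies the iteration to a finite-difference discretization of a 1D obstacle-type problem. The quadratic objective $J(w) = \tfrac12 w^T A w - f^T w$, the grid, the data, and the names `my_semismooth_newton`, `A`, `f`, `beta` are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def my_semismooth_newton(A, f, beta, theta, lam=None, tol=1e-8, max_iter=50):
    """Semismooth Newton for F(w) = A w - f + [lam + theta (w - beta)]_+ = 0."""
    n = len(f)
    lam = np.zeros(n) if lam is None else lam
    w = np.zeros(n)
    for k in range(max_iter + 1):
        z = lam + theta * (w - beta)
        F = A @ w - f + np.maximum(z, 0.0)      # residual J'(w) + [z]_+
        res = np.linalg.norm(F)
        if res <= tol or k == max_iter:
            return w, k, res
        g = (z > 0).astype(float)               # choice g = 1 where z > 0, else 0
        # Newton system [J''(w) + theta g I] s = -F(w)
        s = np.linalg.solve(A + theta * np.diag(g), -F)
        w = w + s

# Hypothetical test problem: -w'' = 10 on (0, 1), w(0) = w(1) = 0, obstacle w <= 1.
n = 50
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = 10.0 * np.ones(n)
beta = np.ones(n)
w, iters, res = my_semismooth_newton(A, f, beta, theta=1e4)
print(iters, res, np.max(w - beta))
```

The regularized solution violates the bound slightly; in this example the violation is of order $1/\theta$, which is the usual behavior of a penalty-type regularization.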
Moreau-Yosida – Some Extensions
Constrained problem: $\min_w J(w)$ s.t. $c(w) \in C$.
$J : W \to \mathbb{R}$ and $c : W \to V$ are $C^1$; $C \subset V$ is closed and convex.
Choose a Hilbert space $V_0$ (often $V_0 = L^2$) with $V \subset V_0$ densely.
Moreau-Yosida regularized problem:
$\min_w\ J_\theta(w) := J(w) + \frac{\theta}{2}\, \mathrm{dist}_{V_0}^2\big(c(w) + \theta^{-1}\lambda,\ C\big),$
where $\lambda \in V_0$ and $\mathrm{dist}_{V_0}(\cdot, C)$ measures the distance from $C$ in $V_0$.
Using proximal theory, $J_\theta$ is $C^1$ with
$J_\theta'(w) = J'(w) + \theta\, c'(w)^* R\big(c(w) + \theta^{-1}\lambda - P_C^{V_0}(c(w) + \theta^{-1}\lambda)\big).$
Here, $R : V_0 \to V_0^*$, $Rw = (w, \cdot)_{V_0}$, is the Riesz map.
Example: $C = \{v \in V ;\ v(x) \in \mathcal{C} \text{ a.e. on } \Omega\}$; $\mathcal{C} \subset \mathbb{R}^n$ closed, convex. Then $P_C^{L^2(\Omega)^n}(v)(\cdot) = P_{\mathcal{C}}(v(\cdot))$ is a superposition operator.
State Constraints
A particularly involved situation arises for state constraints:
$\min_{y \in Y,\, u \in U} J(y, u)$ s.t. $e(y, u) = 0$, $y \le \beta$.
Then, in general, CQs are more demanding: Usually, $Y \subset C(\Omega)$ is required.
As a consequence, the multiplier is a measure: $\lambda \in M(\Omega) = C(\Omega)^*$.
Moreau-Yosida-Regularized Problem:
$\min_{y \in Y,\, u \in U} J(y, u) + \frac{1}{2\theta} \big\| [\lambda + \theta(y - \beta)]_+ \big\|_{L^2(\Omega)}^2$ s.t. $e(y, u) = 0$.
Moreau-Yosida Optimality System:
$J_y(y, u) + [\lambda + \theta(y - \beta)]_+ + e_y(y, u)^* q = 0,$
$J_u(y, u) + e_u(y, u)^* q = 0,$
$e(y, u) = 0.$
Semismooth Newton methods are applicable to the MY optimality system.
State-Constrained Problem – Optimal State
State constraint: $y \le 0.1$ in $\Omega$.
Semismooth Newton requires 20 iterations.
Nested iteration reduces fine grid iterations to 4.
[Figure: surface plot of the optimal state over the unit square; vertical axis from −1.6 to 0.2.]
State-Constrained Problem – Optimal Multiplier
The Lagrange multiplier for the state constraint is very irregular (a measure).
This makes state constraints a challenging problem class.
[Figure: surface plot of the optimal multiplier over the unit square; vertical axis from 0 to 300.]
State-Constrained Problem – Optimal Control
[Figure: surface plot of the optimal control over the unit square; vertical axis from −100 to 50.]
Application to 3D Elastic Contact Problems
M.U., S. Ulbrich, D. Bratzke ’17
Elastic 3D Contact Problem
[Figure: elastic body $\Omega$ with contact boundary $\Gamma_C$ (normal $n$), Neumann boundary $\Gamma_N$, and Dirichlet boundary $\Gamma_D$.]
Elastic 3D Contact Problem
Elastic 3D Contact Problem as Optimization Problem (P):
$\min_{u \in U} J(u) := \int_\Omega \big( \mu\, \varepsilon(u) : \varepsilon(u) + \tfrac{\lambda}{2} \operatorname{div}(u)^2 - f_V^T u \big)\, dx - \int_{\Gamma_N} f_S^T u \, dS(x)$
s.t. $u^T n \le \beta$ on $\Gamma_C$
$\Omega \subset \mathbb{R}^3$: reference domain of an elastic body,
$\Gamma_D, \Gamma_N \subset \partial\Omega$: Dirichlet boundary, Neumann boundary,
$\Gamma_C \subset \partial\Omega$: possible contact boundary on $\Omega$,
$u \in U$: displacement, $U = \{u \in H^1(\Omega)^3 ;\ u|_{\Gamma_D} = 0\}$,
$\varepsilon(u) = \tfrac12 (\nabla u + \nabla u^T)$: strain,
$\lambda, \mu$: Lamé material constants,
$u^T n$: normal displacement on $\Gamma_C$,
$\beta \in H^{1/2}(\Gamma_C)$: normal distance of the body to the obstacle,
$f_V \in L^2(\Omega)^3$, $f_S \in L^2(\Gamma_N)^3$: volume / surface forces.
Possible Generalizations
Our theory and methods also work for other $C^2$-functions $J : U \to \mathbb{R}$.
The following structure is required:
$J(u) = \int_\Omega \big( \tfrac12 (C \nabla u) : \nabla u + D(u) : \nabla u + e(u) - f_V^T u \big)\, dx - \int_{\Gamma_N} f_S^T u \, dS(x),$
where $C$, $D$, and $e$ have suitable properties.
For error estimates we need that $J : U \to \mathbb{R}$ is strongly convex in a neighborhood of the solution.
For the analysis of the multigrid semismooth Newton method we require that $J : U \to \mathbb{R}$ is strongly convex in a neighborhood of the Moreau-Yosida-regularized solution.
Related Work
Semismooth Newton methods for contact problems: Bratzke, Christensen, Hoppe, Hüeber, Ito, Kunisch, Pang, Stadler, M.U., S. Ulbrich, Wohlmuth, ...
Multilevel methods for contact problems: Dostal, Hüeber, Kornhuber, Krause, Schöberl, Stadler, Oosterlee, Vollebregt, Wohlmuth, Zhao, ...
Abstract multilevel theory (only the references we built on): Bornemann, Yserentant (... and many more)
Multilevel trust region methods: Gratton, von Loesch, Toint, ...
Regularization of obstacle and state-constrained problems: Hintermüller, Ito, Kunisch, Meyer, Prüfert, Rösch, Schiela, Tröltzsch, M.U., Weiser, ...
KKT-System of the Elastic Contact Problem
We define $a : U \times U \to \mathbb{R}$, $A \in L(U, U^*)$, $N \in L(U, H^{1/2}(\Gamma_C))$, $f \in U^*$ by
$a(v, w) = \langle v, Aw \rangle_{U,U^*} = \int_\Omega \big( 2\mu\, \varepsilon(v) : \varepsilon(w) + \lambda \operatorname{div}(v) \operatorname{div}(w) \big)\, dx,$
$Nu = u^T n|_{\Gamma_C}, \quad \langle f, u \rangle_{U^*,U} = \int_\Omega f_V^T u \, dx + \int_{\Gamma_N} f_S^T u \, dS(x).$
(P) $\min_{u \in U} \tfrac12 a(u, u) - \langle f, u \rangle_{U^*,U}$ s.t. $Nu \le \beta$.
The problem is uniformly convex and quadratic. Also, $N$ is onto (= CQ).
Optimality Conditions:
$u \in U$ solves (P) if and only if there exists $z \in H^{1/2}(\Gamma_C)^*$ such that
$Au - f + N^* z = 0,$
$z \ge 0, \quad Nu - \beta \le 0, \quad \langle z, Nu - \beta \rangle_{(H^{1/2})^*, H^{1/2}} = 0.$
Here, $z \ge 0$ means $\langle z, v \rangle_{(H^{1/2})^*, H^{1/2}} \ge 0 \ \ \forall\, v \in H^{1/2}(\Gamma_C),\ v \ge 0$.
Moreau-Yosida-Regularized Problem
Moreau-Yosida-Regularized Elastic Contact Problem
$\min_{u \in U}\ \tfrac12 \langle Au, u \rangle_{U^*,U} - \langle f, u \rangle_{U^*,U} + \frac{1}{2\theta} \big\| [z + \theta(Nu - \beta)]_+ \big\|_{L^2(\Gamma_C)}^2$
Here, $\theta > 0$ is a penalty parameter and $z \in L^2(\Gamma_C)_+$ is a shift parameter.
The optimality condition is a semismooth system:
$A u_\theta - f + N^* [z + \theta(N u_\theta - \beta)]_+ = 0$
The operator in the semismooth Newton system is boundedly invertible:
$A + \theta N^* M N$, with $M d = 1_{\{z + \theta(Nu - \beta) \ge 0\}}\, d$.
Thus, the semismooth Newton method converges locally superlinearly.
We apply a multigrid-preconditioned semismooth Newton CG method.
Error Estimates
M.U., S. Ulbrich, Bratzke ’17
Let $\bar u$ be the solution of (P) with corresponding Lagrange multiplier $\bar z$.
Let $u_\theta$ be the solution of the regularized problem (P$_\theta$) and $z_\theta := [\hat z + \theta(N u_\theta - \beta)]_+$, where $\hat z$ denotes the shift parameter of (P$_\theta$).
Regularity results (e.g., Nečas ’75 or Kinderlehrer ’81) can be used to obtain improved regularity of $\bar u$ and $\bar z$.
For $\bar z \in L^2(\Gamma_C)$, we can show for $\theta \to \infty$:
$\|u_\theta - \bar u\|_{H^1} = o(\theta^{-1/2}),$
$\|z_\theta - \bar z\|_{(H^{1/2})^*} = o(\theta^{-1/2}).$
For $\bar z - \hat z \in H^s(\Gamma_C)$, $0 < s \le \tfrac12$, we can show for $\theta \to \infty$:
$\|u_\theta - \bar u\|_{H^1} = O(\theta^{-s-1/2}),$
$\|z_\theta - \bar z\|_{(H^{1/2})^*} = O(\theta^{-s-1/2}),$
$\|z_\theta - \bar z\|_{L^2} = O(\theta^{-s}).$
Multigrid-Preconditioned Semismooth Newton PCG Method
M.U., S. Ulbrich, Bratzke ’17
In a recent paper we propose and analyze a multigrid preconditioner for the MY-regularized semismooth Newton system:
The underlying operator is $A + \theta N^* M N$.
Large $\theta$ generates a strong algebraic (0th order) coupling supported on the approximate contact boundary.
This requires special care in the multigrid method.
A suitable discretization yields a hierarchy of discretized semismooth Newton systems with the same structure.
We developed a multigrid preconditioner and proved a contraction rate that is independent of the number of grid levels and uniform for all sufficiently large regularization parameters $\theta$.
3D Hertzian Contact Problem
[Figure: 3D meshes with axes x, y, z, each ranging from 0 to 20; coarsest mesh: 3993 elements, finest mesh: 4 120 119 elements.]
3D Hertzian Contact Problem
left: Maximal contact normal stresses on levels 0,. . . ,6
right: Normal contact stress distribution in the x-y plane
3D Hertzian Contact Problem
contact zone von Mises stress distribution
3D Hertzian Contact Problem
Final $\theta = 10^8$

                         eps_pcg = 10^-2        eps_pcg = 10^-4        eps_pcg = 10^-8
l   n_l      n_C,l     it_Newt  avg-it_pcg    it_Newt  avg-it_pcg    it_Newt  avg-it_pcg
0   922      69        3        1.00          3        1.00          3        1.00
1   1793     245       6        2.33          4        4.00          4        7.50
2   4827     929       5        2.40          4        5.00          3        8.667
3   16456    3621      5        3.00          4        6.25          3        10.67
4   61711    14257     5        3.76          4        7.00          4        11.75
5   237300   56612     5        3.80          4        7.50          4        12.75
6   928152   225563    5        4.00          4        7.75          4        13.75

Convergence history of the semismooth Newton method with pcg-multigrid solver:
l: level, n_l: number of grid points, n_C,l: number of contact nodes,
it_Newt: number of semismooth Newton iterations,
avg-it_pcg: average number of pcg iterations per Newton iteration
Globalization
Globalization
Achieving global convergence of Newton-type methods requires additional measures. We address two variants (further options exist).
Globalization using a merit function: Choose a suitable merit function and enforce (nonmonotone) descent to achieve convergence to stationarity for this merit function.
Globalization by path following: Generate a one-parameter family of problems with (P$_{\mu_0}$) easy to solve and (P$_0$) the original system. Follow the path for $\mu \searrow 0$. Examples are interior-point and smoothing methods.
If $H(w) = 0$ expresses optimality conditions, then globalization techniques for the underlying optimization problem can be used.
Central for globalizations of Newton-type methods is transition to fast local convergence (“undamped” Newton) under realistic conditions.
Globalization Based on Merit Functions
Globalization based on a merit function $\varphi$ enforces convergence to stationarity of the following auxiliary problem:
(P$_{glob}$) $\min_{w \in W} \varphi(w)$ s.t. $w \in S$.
$S \subset W$ is a closed convex set containing the relevant roots of $H$.
$\varphi : U \to \mathbb{R}$ is a continuous (preferably $C^1$) function defined on $U \supset S$.
The problem is chosen such that solutions of $H(w) = 0$ are stationary points of (P$_{glob}$), ideally with a 1-to-1 correspondence.
Stationarity is often expressed by a continuous criticality measure $\chi : W \to \mathbb{R}_+$ with $\chi(w) = 0$ iff $w$ is a stationary point of (P$_{glob}$).
Global convergence comes in different flavors, such as:
$\liminf_{k \to \infty} \chi(w_k) = 0$ or, stronger, $\lim_{k \to \infty} \chi(w_k) = 0$.
Sometimes, $\varphi$ depends on (e.g., penalty) parameters that are adapted.
Globalization Based on Merit Functions (2)
If $Z = Z^*$ is a Hilbert space, a canonical merit function is
$\varphi_2(w) = \tfrac12 \|H(w)\|_Z^2.$
If $H$ is differentiable then $\varphi_2'(w) = H'(w)^* H(w)$ and the Newton step $s_k = -H'(w_k)^{-1} H(w_k)$ is a descent direction:
$\langle \varphi_2'(w_k), s_k \rangle_{W^*,W} = \langle H'(w_k)^* H(w_k), -H'(w_k)^{-1} H(w_k) \rangle_{W^*,W} = -\|H(w_k)\|_Z^2.$
If $H$ is nonsmooth, then $\varphi_2$ is usually nonsmooth, too.
Thus, there arise the following tasks:
• Finding a (preferably $C^1$) merit function $\varphi$
• Showing that semismooth Newton steps are sufficient descent directions for $\varphi$, at least close to a “nice” solution.
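The descent identity for the smooth case can be checked numerically. The sketch below uses a small, made-up smooth map `H` on $\mathbb{R}^2$ (an assumption for illustration only) and verifies that the directional derivative of $\varphi_2$ along the Newton step equals $-\|H(w_k)\|^2$.

```python
import numpy as np

def H(w):
    # Hypothetical smooth residual H: R^2 -> R^2
    return np.array([w[0] + w[1] ** 2 - 1.0, np.sin(w[0]) + 2.0 * w[1]])

def H_prime(w):
    # Jacobian of the residual above
    return np.array([[1.0, 2.0 * w[1]], [np.cos(w[0]), 2.0]])

w_k = np.array([0.3, 0.4])
Hk, Jk = H(w_k), H_prime(w_k)
s_k = -np.linalg.solve(Jk, Hk)      # Newton step s_k = -H'(w_k)^{-1} H(w_k)
grad_phi2 = Jk.T @ Hk               # phi_2'(w_k) = H'(w_k)^* H(w_k)
slope = grad_phi2 @ s_k             # <phi_2'(w_k), s_k>
print(slope, -Hk @ Hk)
```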
Globalization Based on Merit Functions (3)
For complementarity problems (bilateral bounds can also be handled)
$w \ge 0, \quad F(w) \ge 0, \quad (w, F(w))_W = 0$
in either $W = \mathbb{R}^n$ or $W = L^2(\Omega)$ one can use a reformulation
$H_{FB}(w) := \phi_{FB}(w, F(w)) = 0$
component-wise in $\mathbb{R}^n$, pointwise a.e. in $L^2(\Omega)$.
$\phi_{FB}(a, b) = a + b - \|(a, b)\|_2$ is the Fischer-Burmeister function.
We then have $H_{FB} : W \to Z := W$.
Although $H_{FB}$ is nonsmooth, one can show that $\phi_{FB}^2$ is $C^1$ with
$\nabla(\phi_{FB}^2)(a, b) = 2 \phi_{FB}(a, b)\, g$ for all $g^T \in \partial\phi_{FB}(a, b)$.
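A short numerical sketch of these two properties of the Fischer-Burmeister function follows. The function names are hypothetical; the gradient formula is the one stated above, with $g = (1 - a/r,\ 1 - b/r)$, $r = \|(a,b)\|_2$, away from the origin, and the value $0$ at the origin because $\phi_{FB}(0,0) = 0$.

```python
import numpy as np

def phi_fb(a, b):
    """Fischer-Burmeister function phi_FB(a, b) = a + b - ||(a, b)||_2."""
    return a + b - np.hypot(a, b)

def grad_phi_fb_sq(a, b):
    """Gradient of phi_FB^2, using 2 * phi_FB * g with g from the differential."""
    r = np.hypot(a, b)
    if r == 0.0:
        return np.zeros(2)          # phi_FB(0, 0) = 0, so 2 * phi_FB * g = 0 for any g
    g = np.array([1.0 - a / r, 1.0 - b / r])
    return 2.0 * phi_fb(a, b) * g

def fd_grad(a, b, h=1e-6):
    """Central finite-difference gradient of phi_FB^2 for comparison."""
    da = (phi_fb(a + h, b) ** 2 - phi_fb(a - h, b) ** 2) / (2 * h)
    db = (phi_fb(a, b + h) ** 2 - phi_fb(a, b - h) ** 2) / (2 * h)
    return np.array([da, db])

# phi_FB vanishes exactly on the complementarity set a >= 0, b >= 0, a*b = 0.
print(phi_fb(0.0, 3.0), phi_fb(2.0, 0.0), phi_fb(1.0, 1.0), phi_fb(-1.0, 2.0))
print(grad_phi_fb_sq(0.5, -1.2), fd_grad(0.5, -1.2))
```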
Globalization Based on Merit Functions (4)
Due to the $C^1$-smoothness of $\phi_{FB}^2$, in both cases $W = L^2(\Omega)$ or $W = \mathbb{R}^n$, it can be proved that if $F$ is $C^1$, then $\varphi_2$ is $C^1$ with
$\varphi_2'(w) = M^* H_{FB}(w) \quad \forall\, M \in \partial H_{FB}(w).$
For $W = \mathbb{R}^n$, all $M \in \partial H_{FB}(w) \subset \mathbb{R}^{n \times n}$ have the form
$M = \mathrm{Diag}(g^a) + \mathrm{Diag}(g^b) F'(w),$
$(g^a_j, g^b_j) \in \partial\phi_{FB}(w_j, F_j(w)) \quad (1 \le j \le n).$
In the case $W = L^2(\Omega)$, all $M \in \partial H_{FB}(w) \subset L(W, W)$ have the form
$M d = g^a d + g^b F'(w) d \quad \forall\, d \in L^2(\Omega),$
$g^a, g^b \in L^\infty(\Omega), \quad (g^a(x), g^b(x)) \in \partial\phi_{FB}(w(x), F(w)(x))$ for a.a. $x \in \Omega$.
Globalization Based on Merit Functions (5)
In the case $\mathbb{R}^n$, one can devise globally, superlinearly convergent algorithms that use $H_{FB}$ and $\varphi_2$.
In the case $L^2(\Omega)$, $\varphi_2$ is a $C^1$-function, but $H_{FB}$ is not semismooth.
In fact, the typical lifting property $F = \gamma I + G$, where $G : L^2 \to L^p$, $p > 2$, cannot be exploited to achieve that $w \mapsto (w, F(w))$ maps to some $L^p(\Omega)^2$, $p > 2$.
In M.U. ’02, ’11, smoothing (or lifting) steps are proposed to close the $L^2$-$L^p$ norm gap; we do not go into these technicalities.
We see that globalizing semismooth Newton methods in function space is delicate.
In practice, semismooth Newton methods combined with nested iteration over a grid hierarchy usually require globalization only on coarser grids.
Globalization Based on Merit Functions (6)
M.U. ’11; Milzarek, M.U. ’14; Milzarek ’16
We used and analyzed the following globalization in several contexts:
Choose an auxiliary problem and a globally convergent method for it.
Use semismooth Newton steps whenever they are admissible for the globalization method or if they satisfy certain other acceptance conditions.
Examples for acceptance conditions:
• Nonmonotone filter (filter globalization goes back to Fletcher ’96)
• Sufficient residual reduction between Newton steps.
The acceptance conditions are such that if infinitely many Newton steps satisfy the condition, then $\|H(w_k)\|_Z \to 0$ on this subsequence.
These conditions are satisfied for full Newton steps in a neighborhood of a “reasonably nice” solution.
Globalization Based on Path Following
Continuation w.r.t. a parameter can be used for globalization in various ways:
Nested iteration over a grid hierarchy often requires globalization only on the coarse grids (→ finite dimensional theory applies).
Moreau-Yosida regularization (state constraints and related problems) forms a basis for path following, which can be used for globalization.
Smoothing methods introduce a smoothed approximation $H_\varepsilon$ of $H$.
• Smoothing Newton system:
$H'_{\varepsilon_k}(w_k)\, s_k = -H(w_k).$
• The merit function $\varphi_k(w) = \tfrac12 \|H_{\varepsilon_k}(w)\|_Z^2$ is used.
• In $\mathbb{R}^n$ global and fast local convergence of smoothing methods is shown, e.g., in Chen, Qi, Sun ’98.
• In a current draft paper, we extend the convergence theory of smoothing methods to an $L^2$-setting.
Semismooth Newton for Nonsmooth Minimization Using the Proximal Operator
A Composite Nonsmooth Optimization Problem
We consider a composite nonsmooth optimization problem:
$\min_w\ g(w) + h(w).$
$W$ is a Hilbert space and $g : W \to \mathbb{R}$ is $C^1$.
$h : W \to \mathbb{R} \cup \{+\infty\}$ is proper, convex, lower semicontinuous.
For convenience we choose $W^* = W$, $\langle \cdot, \cdot \rangle_{W^*,W} = (\cdot, \cdot)_W$.
Problems of this form arise in big data, compressed sensing, image analysis, sparse control, ...
Optimality condition of the composite nonsmooth problem:
$0 \in \nabla g(w) + \partial h(w),$
where $\partial h$ is the subdifferential of convex analysis.
Goal: Reformulate this generalized equation as a nonsmooth equation.
Composite Nonsmooth Optimization Problems: Examples
Example 1: $h$ can represent constraints:
$h(w) = \iota_{W_{ad}}(w) := \begin{cases} 0 & (w \in W_{ad}), \\ \infty & (w \notin W_{ad}), \end{cases}$
where $W_{ad} \subset W$ is closed, convex.
Then the problem is equivalent to $\min_{w \in W_{ad}} g(w)$.
Example 2: Sparse optimization and related problems:
$h(w) = \|w\|_{\tilde W}$ with $W \subset \tilde W$.
Important in compressed sensing and sparse control:
$h(w) = \|w\|_{L^1}$ or $h(w) = \|w\|_{\ell^1}$ are sparsity promoting.
Proximal Operator
We introduce the Proximal Problem:
$\min_{y \in W}\ f(y) + \tfrac12 \|y - w\|_W^2.$
$f : W \to \mathbb{R} \cup \{\infty\}$ is proper, lower semicontinuous, and convex.
$W$ is a Hilbert space; we work with $W^* = W$, $\langle \cdot, \cdot \rangle_{W^*,W} = (\cdot, \cdot)_W$.
The proximal problem is strongly convex. Hence, for every $w \in W$, the proximal problem has a unique solution.
The unique solution $y$ is denoted by $\mathrm{prox}_f(w)$ and defines the proximal operator $\mathrm{prox}_f : W \to W$.
$\mathrm{prox}_f$ is firmly non-expansive, i.e., for all $w_1, w_2 \in W$:
$\|\mathrm{prox}_f(w_1) - \mathrm{prox}_f(w_2)\|_W \le (\mathrm{prox}_f(w_1) - \mathrm{prox}_f(w_2),\, w_1 - w_2)_W^{1/2} \le \|w_1 - w_2\|_W.$
Proximal Operator (2)
Optimality condition of the prox problem: $y = \mathrm{prox}_f(w)$ satisfies
$0 \in \partial f(y) + y - w.$
The optimal value function $e_f : W \to \mathbb{R}$,
$e_f(w) := \min_{y \in W} f(y) + \tfrac12 \|y - w\|_W^2 = f(\mathrm{prox}_f(w)) + \tfrac12 \|\mathrm{prox}_f(w) - w\|_W^2,$
is called Moreau envelope or Moreau-Yosida regularization.
$e_f$ is convex and continuously differentiable with
$\nabla e_f(w) = w - \mathrm{prox}_f(w).$
Much more could be said, cf., e.g., Bauschke, Combettes ’11.
Proximal Operator – Example
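Example: $f(w) = \mu |w|$, $W = \mathbb{R}$, $\mu > 0$.
Proximal operator (shrinkage/thresholding):
$\mathrm{prox}_{\mu|\cdot|}(w) = \begin{cases} 0 & (|w| \le \mu), \\ w - \mu\, \mathrm{sgn}(w) & (|w| \ge \mu). \end{cases}$
Moreau envelope (Huber function):
$e_{\mu|\cdot|}(w) = \begin{cases} \tfrac12 w^2 & (|w| \le \mu), \\ \mu \big(|w| - \tfrac{\mu}{2}\big) & (|w| \ge \mu). \end{cases}$
Gradient of Moreau envelope:
$\nabla e_{\mu|\cdot|}(w) = \begin{cases} w & (|w| \le \mu), \\ \mu\, \mathrm{sgn}(w) & (|w| \ge \mu) \end{cases} \;=\; w - \mathrm{prox}_{\mu|\cdot|}(w).$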
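The three formulas of this example can be cross-checked numerically. In the sketch below (function names are hypothetical), the Moreau envelope is evaluated from its definition via the prox and compared with the Huber function, and $w - \mathrm{prox}_{\mu|\cdot|}(w)$ is compared with the clipped identity, which is the piecewise gradient formula above.

```python
import numpy as np

def prox_abs(w, mu):
    """Soft thresholding: prox of mu * |.|."""
    return np.sign(w) * np.maximum(np.abs(w) - mu, 0.0)

def moreau_env_abs(w, mu):
    """Moreau envelope from its definition: f(prox(w)) + 0.5 * (prox(w) - w)^2."""
    p = prox_abs(w, mu)
    return mu * np.abs(p) + 0.5 * (p - w) ** 2

def huber(w, mu):
    """Closed form of the envelope (Huber function)."""
    return np.where(np.abs(w) <= mu, 0.5 * w**2, mu * (np.abs(w) - 0.5 * mu))

mu = 0.7
w = np.linspace(-3.0, 3.0, 121)
env = moreau_env_abs(w, mu)
grad_env = w - prox_abs(w, mu)      # gradient of the envelope
print(np.max(np.abs(env - huber(w, mu))))
print(np.max(np.abs(grad_env - np.clip(w, -mu, mu))))
```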
Sketches for the Example f (w) = µ|w |
left: $\mathrm{prox}_{\mu|\cdot|}$, middle: $e_{\mu|\cdot|}$, right: $\nabla e_{\mu|\cdot|}$.
Proximal Operator – Example (2)
Example: $f(w) = \iota_{W_{ad}}(w)$, $W = W^*$ Hilbert space, $W_{ad} \subset W$ closed and convex.
Proximal operator:
$\mathrm{prox}_{\iota_{W_{ad}}}(w) = P_{W_{ad}}(w)$ = projection onto $W_{ad}$.
Moreau envelope:
$e_{\iota_{W_{ad}}}(w) = \tfrac12 \|w - P_{W_{ad}}(w)\|_W^2 = \tfrac12\, \mathrm{dist}_W^2(w, W_{ad}).$
Gradient of Moreau envelope:
$\nabla e_{\iota_{W_{ad}}}(w) = w - \mathrm{prox}_{\iota_{W_{ad}}}(w) = w - P_{W_{ad}}(w).$
Proximal Operator – Example (3)
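Example: $f(w) = \mu \|w\|_{L^1} + \iota_{W_{ad}}(w)$, $W = L^2(\Omega)$, $W_{ad} = \{w ;\ \alpha \le w \le \beta \text{ a.e. in } \Omega\}$.
Proximal problem:
$\min_y\ \mu \|y\|_{L^1} + \tfrac12 \|y - w\|_{L^2}^2$ s.t. $\alpha \le y \le \beta$ a.e. in $\Omega$
One can show (we assume $\alpha < 0 < \beta$):
$\mathrm{prox}_f(w)(x) = P_{[\alpha,\beta]}\big(w(x) - P_{[-\mu,\mu]}(w(x))\big), \quad x \in \Omega.$
From our results it follows that this superposition operator is semismooth from $L^p(\Omega)$ to $L^r(\Omega)$, $1 \le r < p \le \infty$.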
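The pointwise formula can be sanity-checked in the scalar case by comparing it with a brute-force minimization of the proximal problem on a fine grid. The sketch below is only such a check; the function names, the parameter values, and the grid resolution are assumptions for illustration.

```python
import numpy as np

def prox_l1_box(w, mu, alpha, beta):
    """Pointwise formula P_[alpha,beta](w - P_[-mu,mu](w))."""
    return np.clip(w - np.clip(w, -mu, mu), alpha, beta)

def prox_brute_force(w, mu, alpha, beta, num=30001):
    """Minimize mu*|y| + 0.5*(y - w)^2 over a uniform grid on [alpha, beta]."""
    y = np.linspace(alpha, beta, num)
    return y[np.argmin(mu * np.abs(y) + 0.5 * (y - w) ** 2)]

mu, alpha, beta = 0.5, -1.0, 2.0
samples = np.linspace(-4.0, 4.0, 17)
errors = [abs(prox_l1_box(w, mu, alpha, beta) - prox_brute_force(w, mu, alpha, beta))
          for w in samples]
print(max(errors))   # should be of the order of the grid spacing (1e-4) or smaller
```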
Proximal Operator of $\mu|\cdot| + \iota_{[\alpha,\beta]}$
Prox-based Equation Reformulation
Optimization problem: $\min_w g(w) + h(w)$.
Optimality condition:
(1) $0 \in \nabla g(w) + \partial h(w)$.
Optimality condition of the prox problem for $h$ at the point $w - \nabla g(w)$:
(2) $0 \in y - w + \nabla g(w) + \partial h(y)$.
We now show: (1) $\iff$ (3) with
(3) $w = \mathrm{prox}_h(w - \nabla g(w))$.
“$\Longrightarrow$”: If $w$ satisfies (1) then $y = w$ satisfies (2) and thus
$\mathrm{prox}_h(w - \nabla g(w)) = y = w$.
“$\Longleftarrow$”: If $w$ satisfies (3) then $y = w$ satisfies (2) and inserting $y = w$ into (2) shows that $w$ satisfies (1).
Prox-Based Equation Reformulation (2)
We can replace $g$ and $h$ by $\tau g$ and $\tau h$.
This yields that the first order optimality system
(1) $0 \in \nabla g(w) + \partial h(w)$
is equivalent to
(3)$_\tau$ $w = \mathrm{prox}_{\tau h}(w - \tau \nabla g(w))$.
We thus arrive at a nonsmooth system of equations.
If $\nabla g$ and $\mathrm{prox}_{\tau h}$ are semismooth, then we can apply semismooth Newton methods.
An example, discussed on the next slide, is $L^1$-regularization plus pointwise bound constraints.
Prox-Based Equation Reformulation: Example
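$L^1$-regularization plus pointwise bound constraints result in
$h(w) = \mu \|w\|_{L^1(\Omega)} + \iota_{W_{ad}}(w),$
where $W_{ad} = \{w \in L^2(\Omega) ;\ \alpha \le w \le \beta \text{ a.e. in } \Omega\}$.
As shown,
$\mathrm{prox}_{\tau h}(w)(x) = P_{[\alpha,\beta]}\big(w(x) - P_{[-\tau\mu,\tau\mu]}(w(x))\big), \quad x \in \Omega,$
is semismooth from $L^p(\Omega)$, $p > 2$, to $L^2(\Omega)$.
If $g : L^2(\Omega) \to \mathbb{R}$ has the structure
$\nabla g = \gamma I + G$ with $G : L^2(\Omega) \to L^p(\Omega)$
we achieve with $\tau = 1/\gamma$:
$H(w) := w - \mathrm{prox}_{\tau h}(w - \tau \nabla g(w)) = w - \mathrm{prox}_{\gamma^{-1} h}(-\gamma^{-1} G(w))$
is semismooth from $L^2(\Omega)$ to $L^2(\Omega)$.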
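A finite-dimensional sketch of a semismooth Newton iteration for $H(w) = 0$ is given below. It assumes a quadratic $g(w) = \tfrac12 w^T A w - b^T w$ on $\mathbb{R}^4$ with $A = \gamma I + G$; the matrix, the data, and all function names are made up for illustration. The generalized derivative used is $M = I - D(I - \tau A)$, where $D$ is diagonal with entry 1 where the prox has slope 1 and 0 elsewhere. No globalization is included, so this is a local method only.

```python
import numpy as np

def prox_tau_h(v, tau, mu, alpha, beta):
    """Componentwise prox of tau * (mu * ||.||_1 + indicator of [alpha, beta])."""
    return np.clip(v - np.clip(v, -tau * mu, tau * mu), alpha, beta)

def residual(w, A, b, tau, mu, alpha, beta):
    """H(w) = w - prox_{tau h}(w - tau * grad g(w)) with grad g(w) = A w - b."""
    return w - prox_tau_h(w - tau * (A @ w - b), tau, mu, alpha, beta)

def my_prox_newton(A, b, tau, mu, alpha, beta, tol=1e-12, max_iter=30):
    n = len(b)
    w = np.zeros(n)
    for k in range(max_iter):
        Hw = residual(w, A, b, tau, mu, alpha, beta)
        if np.linalg.norm(Hw) <= tol:
            return w, k
        v = w - tau * (A @ w - b)
        shrunk = v - np.clip(v, -tau * mu, tau * mu)
        # d_i = 1 where the prox has slope 1: outside the threshold, strictly inside the box
        d = ((np.abs(v) > tau * mu) & (shrunk > alpha) & (shrunk < beta)).astype(float)
        M = np.eye(n) - d[:, None] * (np.eye(n) - tau * A)
        w = w + np.linalg.solve(M, -Hw)
    return w, max_iter

gamma, mu, alpha, beta = 2.0, 1.0, -1.0, 1.0
G = 0.1 * (np.eye(4, k=1) + np.eye(4, k=-1))       # small symmetric coupling
A = gamma * np.eye(4) + G
b = np.array([4.0, 0.2, -2.0, -6.0])
w_star, iters = my_prox_newton(A, b, 1.0 / gamma, mu, alpha, beta)
print(w_star, iters)
```

In this example the solution has one component at the upper bound, one at the lower bound, one equal to zero (sparsity), and one free. Since (1) is equivalent to (3)$_\tau$ for every $\tau > 0$, the computed point should also give a vanishing residual for a different step size, e.g. `residual(w_star, A, b, 0.1, mu, alpha, beta)`.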
Application to Seismic Tomography
C. Boehm, M.U. ’15
Introduction
Seismic Inversion: Given a set of seismograms and a description of the seismic sources, determine the material parameters of the Earth.
A better knowledge of the structure of the Earth’s subsurface can help to
explain geodynamic processes,
support the search for natural resources,
identify areas of potential geological hazards, ...
Related Work
Full-Waveform Inversion in the Time Domain: Tromp, Tape, Liu ’05; Epanomeritakis, Akçelik, Ghattas, Bielak ’08; Fichtner, Kennett, Igel, Bunge ’09; Wilcox, Stadler, Burstedde, Ghattas ’10; Fichtner, Trampert ’11 ...
Properties of Parameter-to-State Operator: Stolk ’00; Blazek, Stolk, Symes ’13; Kirsch, Rieder ’13 ...
Randomized Source Sampling, Mini-Batch Hessian: Krebs, Anderson, et al. ’09; Byrd, Chin, Neveitt, Nocedal ’11; Aravkin, Friedlander, Herrmann, Leeuwen ’12; Byrd, Chin, Nocedal, Wu ’12; Haber, Chung, Herrmann ’12; Schiemenz, Igel ’13 ...
Paper: Boehm, M.U., SISC, 2015
Elastic Wave Equation
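Bounded domain $\Omega \subset \mathbb{R}^d$ ($d = 2, 3$), time interval $I = (0, T)$:
$\rho u_{tt} - \nabla \cdot (\Psi : \varepsilon(u)) = f$ on $\Omega \times I$,
$u(0) = 0$ on $\Omega$,
$u_t(0) = 0$ on $\Omega$,
$(\Psi : \varepsilon(u)) \cdot \vec n = 0$ on $\partial\Omega \times I$.
$u$: displacement field; $\varepsilon(u)$: strain tensor ($= \tfrac12 (\nabla u + \nabla u^T)$)
$\rho$: density; $\Psi$: 4th-order material tensor ($\Psi_{ijkl}$)
Particularly important is Lamé material:
$\Psi_{ijkl} = \lambda \delta_{ij} \delta_{kl} + \mu (\delta_{ik} \delta_{jl} + \delta_{il} \delta_{jk})$ with parameters $\lambda(x)$, $\mu(x)$.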
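For a single point $x$ the Lamé tensor can be assembled directly from the index formula. The sketch below (names and numerical values are hypothetical) builds $\Psi$ with `numpy.einsum` and checks that the contraction $\Psi : \varepsilon(u)$ reduces to the familiar isotropic stress $\lambda\, \mathrm{tr}(\varepsilon) I + 2\mu\, \varepsilon$.

```python
import numpy as np

def lame_tensor(lam, mu, d=3):
    """Psi_ijkl = lam * d_ij d_kl + mu * (d_ik d_jl + d_il d_jk)."""
    I = np.eye(d)
    return (lam * np.einsum("ij,kl->ijkl", I, I)
            + mu * (np.einsum("ik,jl->ijkl", I, I) + np.einsum("il,jk->ijkl", I, I)))

def strain(grad_u):
    """Symmetric gradient eps(u) = 0.5 * (grad u + grad u^T)."""
    return 0.5 * (grad_u + grad_u.T)

lam, mu = 2.0, 1.5                                  # made-up Lame parameters
grad_u = np.array([[0.1, 0.2, 0.0],
                   [-0.3, 0.05, 0.4],
                   [0.0, 0.1, -0.2]])               # made-up displacement gradient
eps = strain(grad_u)
stress = np.einsum("ijkl,kl->ij", lame_tensor(lam, mu), eps)   # Psi : eps(u)
print(stress)
```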
Suitable Parameter Spaces
Parameterization: $\Psi(m)(x) = \bar\Psi(x) + \Phi(m(x))$
$\bar\Psi \in L^\infty(\Omega)^{d^4}$ = reference model that captures major discontinuities.
Unknown $m$ parameterizes model variations.
$\Phi : \mathbb{R}^r \to \mathbb{R}^{d^4}$ is sufficiently smooth.
Here for simplicity: $\Phi$ linear.
Hilbert space of model variations: $M \subset\subset L^\infty(\Omega)^r$ (compactly).
Admissible set: $M_{ad} = M \cap M^\infty_{ad}$ with
$M^\infty_{ad} = \{ m \in L^\infty(\Omega)^r : \psi_a \le Sm \le \psi_b \}$,
$S \in L(M, Q)$, $Q \subset L^q(\Omega)^n$ for some $q > 2$,
$\psi_a, \psi_b \in Q$, $\psi_a \le 0 < \psi_b$.
Parameter Identification Problem
Input:
$n_s$ seismic sources $f_i$ (with location and source time function).
Observations $u_i^\delta(t, x)$ for every source in $\Omega_{obs} \times I$, $\Omega_{obs} \subset \Omega$.
Reference model $\bar\Psi \in L^\infty(\Omega)^{d^4}$.
Seismic Inverse Problem (SIP)
$\min_{m \in M_{ad}}\ j(m) = J(u(m), m) \overset{\text{def}}{=} \sum_{i=1}^{n_s} J_{fit}(u_i, u_i^\delta) + \alpha J_{reg}(m)$
where the displacements $u(m) = (u_i(m))_{1 \le i \le n_s}$ solve the elastic wave PDEs
$E(u_i, m) = f_i, \quad u_i(0) = 0, \quad (u_i)_t(0) = 0 \quad (1 \le i \le n_s).$
Here (e.g.): $J_{fit}(u_i, u_i^\delta) = \tfrac12 \|u_i - u_i^\delta\|_{L^2(\Omega_{obs} \times I)}^2$, $J_{reg}(m) = \tfrac12 \|m\|_M^2$.
Existence and Differentiability for the Seismic Inverse Problem
Let $V = H^1(\Omega)^d$. The following can be shown:
Existence and uniqueness of (very) weak solutions $u(m) \in C(I; L^2(\Omega)^d) \cap C^1(I; V^*)$ to the elastic wave equation for fixed $m \in M$, sufficiently regular $\rho$ and $f \in L^2(I; V^*)$.
Fréchet-differentiability of the parameter-to-state operator: Let $f \in H^{k+l}(I; V^*)$, $k \ge 2$, $l \ge 0$, and $f(t) = 0$ near $t = 0$. Then the solution operator
$m \in L^\infty(\Omega)^r \mapsto u(m) \in C^l(I; V)$
is $(k-2)$-times Lipschitz continuously Fréchet-differentiable.
Existence of a solution to the regularized seismic inverse problem.
Böhm, M.U. ’15; Lions, Magenes ’72; Lasiecka, Triggiani ’90
Moreau-Yosida Regularization
Moreau-Yosida-Regularized Problem
For $\theta \in (0, \infty)$ define
$\min_{m \in M}\ j_\theta(m) \overset{\text{def}}{=} j(m) + \theta \phi(m),$
with the penalty function
$\phi(m) \overset{\text{def}}{=} \tfrac12 \big( \|[Sm - \psi_b]_+\|_{L^2(\Omega)^n}^2 + \|[\psi_a - Sm]_+\|_{L^2(\Omega)^n}^2 \big).$
First order optimality conditions:
$j'(m) + \theta S^* \big( [Sm - \psi_b]_+ - [\psi_a - Sm]_+ \big) = 0$ in $M^*$.
If $j$ is twice continuously differentiable, then this is a semismooth operator equation since $[\,\cdot\,]_+$ is semismooth as a map
$Q \subset L^q(\Omega)^n \to L^{\frac{q}{q-1}}(\Omega)^n \subset Q^*, \quad q > 2.$
We can also prove error estimates for the regularized solution in terms of $\theta$.
Outline of the Optimization Algorithm
(Outer loop: Moreau-Yosida penalty method. Inner loop: trust-region method.)
Choose $\theta_0 > 0$, an initial parameter $m^0_{\theta_0}$ and $\varepsilon > 0$.
for $k = 0, 1, 2, \ldots$ do
    Obtain an approximate solution $m^*_{\theta_k}$ of (P$_{\theta_k}$):
    for $i = 0, 1, 2, \ldots$ do
        Obtain iterates $m^{i+1}_{\theta_k}$ by solving a trust-region subproblem with a matrix-free Newton-PCG method.
    end for
    if (violation of feasibility & optimality at $m^*_{\theta_k}$) $< \varepsilon$ then
        Stop with $m^* = m^*_{\theta_k}$.
    else
        Choose $\theta_{k+1} > \theta_k$.
        Set $m^0_{\theta_{k+1}} = m^*_{\theta_k}$.
    end if
end for
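The structure of the outer penalty loop with warm starts can be mimicked on a scalar toy problem. The sketch below is not the seismic algorithm: it assumes $j(m) = \tfrac12 (m - 2)^2$ with the single bound $m \le 1$, replaces the trust-region Newton-PCG inner solver by a plain semismooth Newton iteration, and uses hypothetical names throughout. For this toy problem the regularized solution is known in closed form, $m_\theta = (2 + \theta)/(1 + \theta)$.

```python
def inner_newton(m, theta, tol=1e-10, max_iter=20):
    """Solve j_theta'(m) = (m - 2) + theta * [m - 1]_+ = 0 by semismooth Newton."""
    for _ in range(max_iter):
        F = (m - 2.0) + theta * max(m - 1.0, 0.0)
        if abs(F) <= tol:
            break
        dF = 1.0 + (theta if m > 1.0 else 0.0)   # element of the generalized derivative
        m -= F / dF
    return m

def my_penalty_method(theta0=1.0, m0=0.0, eps=1e-4, growth=10.0, max_outer=20):
    """Outer Moreau-Yosida loop: increase theta until the bound violation is below eps."""
    theta, m = theta0, m0
    for _ in range(max_outer):
        m = inner_newton(m, theta)               # warm start from the previous solution
        if max(m - 1.0, 0.0) < eps:
            return m, theta
        theta *= growth
    return m, theta

m_final, theta_final = my_penalty_method()
print(m_final, theta_final)
```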
Simultaneous Sources – Motivation
$\min_{m \in M_{ad}}\ \tfrac12 \sum_{i=1}^{n_s} \|u_i(m) - u_i^\delta\|_{L^2(\Omega_{obs} \times I)}^2 + \alpha J_{reg}(m)$
s.t. $E(u_i, m) = f_i, \quad u_i(0) = 0, \quad (u_i)_t(0) = 0 \quad (1 \le i \le n_s).$
Required number of simulations:
objective: $n_s$
gradient: $+\, n_s$
Newton step: $+\, 2 n_s \cdot it_{cg}$
Idea: Replace individual seismic events by simultaneous “super-shots”.
$\Longrightarrow$ Source-encoding: For $w \in \mathbb{R}^{n_s}$ define $u(m; w)$ as the solution to:
$E(u, m) = \sum_{i=1}^{n_s} w_i f_i, \quad u(0) = 0, \quad u_t(0) = 0.$
Sample Average Approximation
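Define
$J_{fit,w}(m) := \tfrac12 \Big\| u(m; w) - \sum_{i=1}^{n_s} w_i u_i^\delta \Big\|_{L^2(\Omega_{obs} \times I)}^2$
and for $W_K = (w_1, \ldots, w_K) \in \mathbb{R}^{n_s \times K}$ consider
$\min_{m \in M_{ad}}\ j(m; W_K) := \frac{1}{K} \sum_{k=1}^{K} J_{fit,w_k}(m) + \frac{\alpha}{2} \|m\|_M^2.$
+ Requires only $2K$ simulations for objective and gradient.
− Possible loss of information due to interference.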
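Why random source encoding is reasonable can be seen in a small linear-algebra sketch. It assumes that the state depends linearly on the source for fixed $m$, so that the encoded residual is $R w$, where the columns of the (made-up) matrix `R` stand for the individual residuals $u_i(m) - u_i^\delta$. For Rademacher weights, $\mathbb{E}[w w^T] = I$, so the expected encoded misfit equals the full misfit; the code verifies this by averaging over all sign vectors and also evaluates one random sample average.

```python
import itertools
import numpy as np

def encoded_misfit(R, w):
    """0.5 * || sum_i w_i r_i ||^2 for residual columns r_i of R."""
    return 0.5 * np.linalg.norm(R @ w) ** 2

# Made-up residuals for n_s = 4 sources (one column per source).
R = np.array([[1.0, -2.0, 0.5, 3.0],
              [0.0, 1.0, -1.0, 2.0],
              [2.0, 0.5, 1.5, -1.0]])
n_s = R.shape[1]
full_misfit = 0.5 * np.sum(R ** 2)                 # 0.5 * sum_i ||r_i||^2

# Exact expectation: average over all 2^n_s Rademacher vectors.
signs = [np.array(s) for s in itertools.product([-1.0, 1.0], repeat=n_s)]
mean_encoded = np.mean([encoded_misfit(R, w) for w in signs])

# Sample average approximation with K i.i.d. super-shots.
rng = np.random.default_rng(0)
K = 8
W_K = rng.choice([-1.0, 1.0], size=(n_s, K))
saa = np.mean([encoded_misfit(R, W_K[:, k]) for k in range(K)])
print(full_misfit, mean_encoded, saa)
```

The sample average with small $K$ is only an estimate of the full misfit; the cross terms between different sources (the interference mentioned above) cancel in expectation but not for an individual sample.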
Marmousi Test Data
Synthetic data set provided by Institut Français du Pétrole.
2D domain, 9.2km × 3.1km,
190 seismic sources at 36m depth,
384 receivers, equidistant at 100m depth,
perfectly elastic, isotropic material with constant Poisson’s ratio.
m: 50k nodes, u: 800k nodes, 4k time steps.
[Figure panels: geophysical exploration; displacement field (snapshot).]
Sample Average Gradient
$W_K \in \{-1, 1\}^{n_s \times K}$, i.i.d. Rademacher distributed.
[Figure: four panels of the sample average gradient for K = 1 super-shot, K = 8 super-shots, 8 single sources, and all 190 sources; axes: length (km) from 0 to 8, depth (km) from 0 to 3.]
Reconstruction
[Figure: reconstructions of $\lambda$ (difference from initial model) for 1 super-shot, 8 super-shots, and 16 super-shots.]
Reconstruction - Computational Effort
K super-shots, WK ∈ −1, 1ns×K , i.i.d. Rademacher distributed.
Ktol = 10−3 tol = 10−6
iter. avg. cg it # PDEs iter. avg. cg it # PDEs
1 24 14.3 810 41 25.0 22552 25 15.3 1784 46 26.6 53544 28 18.8 4796 38 24.4 81968 24 17.0 7504 30 21.6 1158416 23 15.6 13376 31 21.9 24256
Can we do better?
Approximation of the Hessian
$j''(m; W_K) = \frac{1}{K} \sum_{k=1}^{K} j''(m; w_k).$
Idea: Use mini-batch to generate curvature information.
In every Newton iteration: Choose $S \subset \{w_1, \dots, w_K\}$ and approximate
$j''(m; W_K) \approx \frac{1}{|S|} \sum_{w_\kappa \in S} j''(m; w_\kappa).$
+ Reduces number of PDE solves by a factor of K/|S|.
+ Reduces memory requirements by a factor of K/|S|.
− Only approximation to the true Hessian.
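The mini-batch idea can be sketched in Python on a toy problem. The sketch assumes hypothetical per-sample least-squares terms $j(m; w_k) = \tfrac12 \|A_k m - b_k\|^2 + \tfrac{\alpha}{2}\|m\|^2$ with nearly identical $A_k$, and uses direct solves where the talk uses cg with PDE solves; `gradient`, `hessian`, and `newton_minibatch` are illustrative names. The gradient is averaged over all K samples, while the Hessian is taken from a single sample chosen cyclically, so the method converges linearly rather than in one Newton step.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, alpha = 5, 8, 1e-2

# Hypothetical sample data: the A_k are small perturbations of 2*I,
# so each single-sample Hessian is close to the averaged one.
A = [2.0 * np.eye(n) + 0.02 * rng.standard_normal((n, n)) for _ in range(K)]
b = [rng.standard_normal(n) for _ in range(K)]

def gradient(m):
    # Full sample-average gradient over all K samples.
    g = sum(Ak.T @ (Ak @ m - bk) for Ak, bk in zip(A, b)) / K
    return g + alpha * m

def hessian(batch):
    # Average of the per-sample Hessians over the index set `batch`
    # (independent of m because the toy objective is quadratic).
    Hs = sum(A[k].T @ A[k] for k in batch) / len(batch)
    return Hs + alpha * np.eye(n)

def newton_minibatch(m0, iters=40):
    m = m0.copy()
    for it in range(iters):
        batch = [it % K]                      # S = {w_k}, k chosen cyclically
        m = m - np.linalg.solve(hessian(batch), gradient(m))
    return m

m_mb = newton_minibatch(np.zeros(n))

# Reference: minimizer obtained with the full averaged Hessian.
rhs = sum(Ak.T @ bk for Ak, bk in zip(A, b)) / K
m_full = np.linalg.solve(hessian(range(K)), rhs)
print(np.linalg.norm(m_mb - m_full))
```

Each step here needs one Hessian sample instead of K, mirroring the K/|S| savings listed above, at the price of more outer iterations.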
Computational Effort with Mini-Batch Hessian
$W_K \in \{-1, 1\}^{n_s \times K}$, i.i.d. Rademacher distributed,
K = 8 super-shots for objective function and gradient,
Hessian approx. with $S = \{w_k\}$ (k chosen cyclically).
[Figure: two panels, rel. misfit (linear scale, 0 to 1) and rel. optim. (log scale, $10^{-6}$ to $10^{0}$), each versus PDE simulations (0 to 10000); curves: full Hessian, L-BFGS, mini-batch Hessian]
Comparison: Unconstrained vs. Bounds on Wave Velocities
$W_K \in \{-1, 1\}^{n_s \times K}$, i.i.d. Rademacher distributed,
unconstrained vs. lower bound on P-wave velocity: 1450 m/s,
K super-shots, Hessian approx. with $S = \{w_k\}$ (k chosen cyclically).
          w/o constraints               with constraints
  K   iter.  avg. cg it  # PDEs   iter.  avg. cg it  # PDEs
  8    65       23.2       5058    66       24.0       5092
 16    62       22.5       6112    65       24.4       6358
Total no. of PDE solves is less than 16 cg iterations without super-shots!
Parallel Scaling Statistics - Multiple Events
#cores              32      64     128     256     512    1024
#events             32      32      32      32      32      32
#cores / event       1       2       4       8      16      32
total time (s)   265.3   130.1    66.6    33.7    18.5     9.4
speed-up           1.0    2.04    3.99    7.88   14.36   28.27
par. efficiency  1.000   1.020   0.996   0.985   0.898   0.883
Strong scaling results: 2d elastic wave equation, 32 events
Discretization: 12,288 elements, 197,633 dofs, 6,000 time steps
all computations carried out on Piz Daint (Cray XC30, Xeon E5), Swiss National Supercomputing Centre
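The speed-up and efficiency rows follow from the timings by simple arithmetic, sketched below in Python. The formulas (speed-up relative to the 32-core run, efficiency as speed-up divided by the increase in cores) are the usual strong-scaling definitions and are assumed here, not stated on the slide. Because the tabulated times are rounded, the recomputed values can differ from the table in the last digit.

```python
cores = [32, 64, 128, 256, 512, 1024]
times = [265.3, 130.1, 66.6, 33.7, 18.5, 9.4]   # total time (s) from the table

# Speed-up relative to the smallest run.
speedup = [times[0] / t for t in times]

# Parallel efficiency: achieved speed-up over the ideal one (cores / 32).
efficiency = [s / (c / cores[0]) for s, c in zip(speedup, cores)]

for c, s, e in zip(cores, speedup, efficiency):
    print(f"{c:5d} cores: speed-up {s:6.2f}, efficiency {e:.3f}")
```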
Parallel Scaling Statistics
#cpu cores               8       64      512       4096
#elements            8,000   64,000  512,000  4,096,000
total time (s)        16.7     17.0     17.3       17.9
scaling efficiency   1.000    0.979    0.963      0.935
Weak scaling results: 3d elastic wave equation
Discretization: 68,921 dofs per core, 1,000 time steps
all computations carried out on Piz Daint (Cray XC30, Xeon E5), Swiss National Supercomputing Centre
Mesh-Independence of Semismooth Newton
Challenges due to Nonsmoothness
Parametric stability of the radius of fast convergence, mesh-independence results, implicit function theorems, and other related topics are significantly more challenging in nonsmooth settings than in the smooth case:
In the smooth case, the Jacobian H′(w) at w induces a good linear model H(w) + H′(w)d of H(w + d) in a neighborhood of w.
If H is C1, then the model varies continuously with w.
In the nonsmooth case, however, the linear operator M(w + d) in the “linear” model has to be chosen depending on w + d, not w: H(w) + M(w + d)d.
In finite dimensions, upper semicontinuity and compact-valuedness of ∂H are helpful and yield, e.g., dist(M(w + d), ∂H(w)) → 0 as d → 0.
In infinite dimensions, such properties of ∂H are not available.
Structure of a Semismooth Newton Mesh-Independence Result
Accuracy of discretization measured by mesh size $h \in (0, h_0]$, $h_0 > 0$.
Discrete spaces: $W_h \subset W$, $Z_h \subset Z$.
Equation: $H(w) = 0$, $H : W \to Z$
Semismooth Newton: $w^{k+1} = w^k + s^k$, $M_k s^k = -H(w^k)$.
Discretized Equation: $H_h(w_h) = 0$, $H_h : W_h \to Z_h$
Discrete Semismooth Newton: $w_h^{k+1} = w_h^k + s_h^k$, $M_{h,k} s_h^k = -H_h(w_h^k)$.
Let solutions $\bar w$ and $\bar w_h$, $h \in (0, h_0)$, be given with
$\|\bar w_h - \bar w\|_W \to 0$ as $h \to 0$.
Our mesh-independence results have the following flavor:
For all $\eta \in (0, 1)$, there exist $h_1 \le h_0$, $\delta > 0$, such that $\forall\, h \in (0, h_1]$:
$\|w^k - \bar w\|_W < \delta,\ \|w_h^k - \bar w_h\|_W < \delta \;\Rightarrow\; \|w^{k+1} - \bar w\|_W \le \eta \|w^k - \bar w\|_W,\ \|w_h^{k+1} - \bar w_h\|_W \le \eta \|w_h^k - \bar w_h\|_W.$
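Structure of a Semismooth Newton Mesh-Independence Result (2)
Proving mesh-independence can be split into two parts:
There exist $h_1 \le h_0$, $\delta_0 > 0$, $C_1 > 0$ such that for all $h \in (0, h_1]$ there holds:
Uniform Regularity Condition:
$\|M^{-1}\|_{Z \to W} \le C_1 \quad \forall\, M \in \partial H(w),\ w \in B_W(\bar w, \delta_0),$
$\|M_h^{-1}\|_{Z_h \to W_h} \le C_1 \quad \forall\, M_h \in \partial H_h(w_h),\ w_h \in B_{W_h}(\bar w_h, \delta_0).$
Mesh-Independent Semismoothness:
For all $\eta \in (0, 1)$ there exists $\delta \in (0, \delta_0)$:
$\forall\, w \in B_W(\bar w, \delta),\ M \in \partial H(w),\ w_h \in B_{W_h}(\bar w_h, \delta),\ M_h \in \partial H_h(w_h)$:
$\|H(w) - H(\bar w) - M(w - \bar w)\|_Z \le \eta \|w - \bar w\|_W,$
$\|H_h(w_h) - H_h(\bar w_h) - M_h(w_h - \bar w_h)\|_Z \le \eta \|w_h - \bar w_h\|_W.$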
Mesh-Independence Result for VIs in L2
We consider the following complementarity problem:
Find $w \in W := L^2(\Omega)$ such that
$w \ge 0, \quad F(w) := G(w) + \gamma w \ge 0, \quad w\,(G(w) + \gamma w) = 0,$
where $G : W \to L^p(\Omega)$, $p > 2$, is $C^1$.
Semismooth Reformulation:
(P) $H(w) := w - [-\gamma^{-1} G(w)]_+ = 0.$
Suitable FE discretization (e.g., piecewise constant finite elements for $w_h$) results in the complementarity problem of finding $w_h \in W_h \subset W$ with
$w_h \ge 0, \quad G_h(w_h) + \gamma w_h \ge 0, \quad w_h\,(G_h(w_h) + \gamma w_h) = 0.$
The corresponding semismooth reformulation is given by
(Ph) $H_h(w_h) := w_h - [-\gamma^{-1} G_h(w_h)]_+ = 0.$
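A semismooth Newton iteration for a reformulation of the form (Ph) can be sketched in Python as follows. The sketch assumes a hypothetical affine map $G_h(w) = Aw - b$ with a tridiagonal matrix $A$ and $\gamma = 1$; these choices, the function name `semismooth_newton`, and the rule for picking an element of the generalized derivative (entry 1 where $-\gamma^{-1} G_h(w) > 0$, else 0) are illustrative assumptions rather than the setup used in the talk.

```python
import numpy as np

def semismooth_newton(A, b, gamma, w0, tol=1e-10, max_iter=100):
    """Solve H(w) = w - max(0, -(A w - b)/gamma) = 0; returns (w, iterations)."""
    n = len(b)
    w = w0.copy()
    for k in range(max_iter):
        G = A @ w - b
        H = w - np.maximum(0.0, -G / gamma)
        if np.linalg.norm(H) <= tol:
            return w, k
        # Element of the generalized derivative: M = I + gamma^{-1} D A,
        # where D is diagonal with entry 1 where the max picks -G/gamma.
        d = (-G / gamma > 0).astype(float)
        M = np.eye(n) + (d[:, None] / gamma) * A
        # Newton step: M s = -H(w).
        w = w + np.linalg.solve(M, -H)
    return w, max_iter

# Toy data (assumed): 1D second-difference matrix and a sign-changing right-hand side.
n, gamma = 50, 1.0
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.sin(2.0 * np.pi * x)

w, its = semismooth_newton(A, b, gamma, np.zeros(n))
F = A @ w - b + gamma * w          # F(w) = G(w) + gamma * w
print(its, w.min(), F.min(), abs(w @ F))
```

A zero of $H_h$ satisfies the three complementarity conditions, so the printed minimum of `w`, minimum of `F`, and product `w @ F` serve as a quick consistency check.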
Assumptions
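For brevity, let h = 0 correspond to the continuous problem: (P0) = (P).
Furthermore, we define $B_h(\bar w_h, \delta) := \{ w_h \in W_h \,;\, \|w_h - \bar w_h\|_{L^2} < \delta \}$.
Assumptions:
There exist $h_0, \delta_0 > 0$, $p > 2$ and $L > 0$ such that for all $h \in [0, h_0]$:
$\bar w = \bar w_0$ solves (P) and $\bar w_h$ solves (Ph).
$\|\bar w_h - \bar w\|_{L^2} \to 0$ as $h \to 0^+$.
$\|G_h(\bar w_h) - G(\bar w)\|_{L^p} \to 0$ as $h \to 0^+$.
$G_h : W_h \to W_h$ is $C^1$ on $B_h(\bar w_h, \delta_0)$.
$\|G_h(w_h^1) - G_h(w_h^2)\|_{L^p} \le L_G \|w_h^1 - w_h^2\|_{L^2} \quad \forall\, w_h^i \in B_h(\bar w_h, \delta_0).$
$\|G_h'(w_h^1) - G_h'(w_h^2)\|_{W_h \to W_h} \le L_{G'} \|w_h^1 - w_h^2\|_{L^2} \quad \forall\, w_h^i \in B_h(\bar w_h, \delta_0).$
Strict complementarity: $\mathrm{meas}(\{x \in \Omega \,;\, G(\bar w)(x) = 0\}) = 0.$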
A Mesh-Independence Result
cf. Hintermüller, M.U. ’04; M.U. ’11
Under these assumptions, there holds (see M.U. ’11 for refinements):
Mesh-Independent Semismoothness:
For all $\eta \in (0, 1)$ there exist $\delta \in (0, \delta_0]$ and $h' \in (0, h_0]$ such that, for all $h \in [0, h']$, all $s_h \in B_h(0, \delta)$, and all $M_h \in \partial H_h(\bar w_h + s_h)$:
$\|H_h(\bar w_h + s_h) - H_h(\bar w_h) - M_h s_h\|_{L^2} \le \eta \|s_h\|_{L^2}.$
Mesh-Independence Result:
Let $\|M_h^{-1}\|_{W_h \to W_h} \le C_M \quad \forall\, M_h \in \partial H_h(w_h),\ w_h \in B_h(\bar w_h, \delta_0),\ h \le h_0.$
Then for all $\eta \in (0, 1)$ there exist $\delta \in (0, \delta_1]$ and $h' \in (0, h_1]$ such that, for all $h \in [0, h']$ and all $w_h^0$ with $\|w_h^0 - \bar w_h\|_{L^2} < \delta$, the semismooth Newton method for (Ph) converges to $\bar w_h$ with at least q-linear rate $\eta$:
$\|w_h^{k+1} - \bar w_h\|_{L^2} \le \eta \|w_h^k - \bar w_h\|_{L^2} \quad \forall\, k \ge 0.$
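Mesh-independent behavior can be observed numerically with a small Python experiment. The sketch assumes a hypothetical smoothing operator, $G_h(w) = K_h w - g_h$ with $K_h$ the discrete inverse of $-d^2/dx^2$ on (0, 1), $g(x) = \sin(2\pi x)$, and $\gamma = 1$; it does not verify the assumptions listed above (for instance strict complementarity) and is not one of the experiments from the talk. It runs the semismooth Newton method for $H_h(w_h) = w_h - [-\gamma^{-1} G_h(w_h)]_+$ from $w_h^0 = 0$ on several grids and records the iteration counts, using a discrete $L^2$ norm for the stopping test.

```python
import numpy as np

def newton_iterations(n, gamma=1.0, tol=1e-10, max_iter=50):
    """Iterations needed on a uniform grid with n interior points."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # Assumed smoothing operator: inverse of the 1D finite-difference Laplacian.
    L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    K = np.linalg.inv(L)
    g = np.sin(2.0 * np.pi * x)
    w = np.zeros(n)
    for k in range(max_iter):
        G = K @ w - g
        H = w - np.maximum(0.0, -G / gamma)
        # Discrete L2 norm, so the tolerance has the same meaning on every mesh.
        if np.sqrt(h) * np.linalg.norm(H) <= tol:
            return k
        d = (-G > 0).astype(float)
        M = np.eye(n) + (d[:, None] / gamma) * K
        w = w + np.linalg.solve(M, -H)
    return max_iter

counts = {n: newton_iterations(n) for n in (25, 50, 100, 200)}
print(counts)
```

If the iteration behaves mesh-independently, the counts should stay roughly constant as n grows instead of increasing with the number of unknowns.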
Mesh-Independent Order of Semismoothness
One can find examples showing that order of semismoothness at a point wis not stable w.r.t. perturbations of w .
This is bad news for mesh-independent order of semismoothness.
In M.U. ’11 two solutions are provided:
If $G_h'$ are locally Hölder continuous at $\bar w_h$, $h \in [0, h']$, then we can show a uniform order $\alpha > 0$ of semismoothness for all $h \in [0, h']$ and all $s_h \in B_h(0, \delta)$ under the following
Uniform Growth Condition for Complementarity:
There exist constants $C > 0$, $\kappa > 0$, and $\tau > 0$ with
$\mathrm{meas}(\{x \,;\, 0 < |G_h(\bar w_h)(x)| < t\}) \le C t^\kappa$ for all $t \in (0, \tau]$, $h \in [0, h']$.
Mesh-Independent Order of Semismoothness (2)
Alternatively, if the above uniform growth condition for complementarity is replaced by a condition only for h = 0,
$\mathrm{meas}(\{x \,;\, |G(\bar w)(x)| < t\}) \le C t^\kappa$ for all $t \in (0, \tau]$,
we can show a bit less:
Mesh-Independent Order of Semismoothness Result:
There exist $\delta \in (0, \delta_0]$, $h' \in (0, h_0]$, and $C' > 0$ such that, for all $h \in [0, h']$, all $s_h \in B_h(0, \delta)$, and all $M_h \in \partial H_h(\bar w_h + s_h)$:
$\|H_h(\bar w_h + s_h) - H_h(\bar w_h) - M_h s_h\|_{L^2} \le C' \max\{\|s_h\|_{L^2}^\alpha,\ \|G_h(\bar w_h) - G(\bar w)\|_{L^2}^\alpha\}\, \|s_h\|_{L^2}$
with $\alpha = \frac{(p-2)\kappa}{2(\kappa + p)}$.
Corresponding mesh-independent q-orders of local convergence for semismooth Newton methods can be shown.
Conclusions and Final Remarks
Semismooth Newton methods are an efficient tool for handling inequality constraints, variational inequalities, and structured nonsmooth optimization problems.
They have been successfully applied in many fields.
A quite comprehensive theory on semismoothness and semismooth Newton methods is available and is still developing.
You should consider making them part of your toolbox!
In other fields as well, e.g., machine learning and big data, second-order methods are becoming increasingly important.
Many thanks for your attention!