
Proximal Point Methods
for Variational Problems

R. Tichatschke


Preface

The subject of variational problems, in particular variational inequalities, has its origin in the calculus of variations associated with the minimization of functionals in infinite-dimensional spaces. The systematic study of the subject began in the early 1960s with the seminal work of the Italian mathematician Guido Stampacchia and his collaborators, who used variational inequalities (VI’s) as an analytic tool for studying free boundary problems defined by nonlinear partial differential operators arising from unilateral problems in elasticity and plasticity theory and in mechanics. Some of the earliest papers on variational inequalities are Lions and Stampacchia [267], Mancino and Stampacchia [283] and Stampacchia [381]. In particular, the first theorem on existence and uniqueness of solutions of VI’s was proved in [381]. The books by Baiocchi and Capelo [28] and Kinderlehrer and Stampacchia [234] provide a thorough introduction to the application of VI’s in infinite-dimensional function spaces. The book by Glowinski, Lions and Trémolières [135] is among the earliest references to give a detailed numerical treatment of such VI’s. Nowadays there is a huge literature on the subject of infinite-dimensional VI’s and related problems.

The development of mathematical programming and control theory has proceeded almost contemporaneously with the systematic investigation of ill-posed problems and their numerical treatment. It was clear from the very beginning that the main classes of extremal problems include ill-posed problems. Among the variational inequalities having important applications in different fields of physics there are ill-posed problems, too. Nevertheless, up to now, in the development of numerical methods for finite- and infinite-dimensional extremal problems, ill-posedness has not been a major point of consideration. As a rule, conditions ensuring convergence of a method include assumptions on the problem which warrant its well-posedness in a certain sense. Moreover, in many papers exact input data and exact intermediate calculations are assumed. Therefore, the use of standard optimization and discretization methods often proves to be unsuccessful for the treatment of ill-posed problems.

In the first methods dealing with ill-posed linear programming and optimal control problems, suggested by Tikhonov [393, 394], the problem under consideration was regularized by means of a sequence of well-posed problems involving a regularized objective and preserving the constraints of the original problem.
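As a toy illustration of this regularization idea (a sketch with hypothetical data, not an example from the book): an ill-conditioned linear problem reacts wildly to tiny data perturbations, while its Tikhonov-regularized version, which modifies only the objective and keeps the constraints of the original problem (here: none), is stable.

```python
import numpy as np

# Hypothetical ill-conditioned problem A x = b: the rows of A are
# almost linearly dependent, so the solution is unstable.
A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-8]])
b = np.array([2.0, 2.0])
b_noisy = b + np.array([0.0, 1e-8])   # data perturbation of size 1e-8

def tikhonov_solve(A, b, chi):
    """Minimize ||A x - b||^2 + chi * ||x||^2 (regularized objective)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + chi * np.eye(n), A.T @ b)

# Unregularized solutions differ by O(1) although the data differ by 1e-8:
print(np.linalg.solve(A, b), np.linalg.solve(A, b_noisy))

# Regularized solutions remain stable under the same perturbation:
print(tikhonov_solve(A, b, 1e-6), tikhonov_solve(A, b_noisy, 1e-6))
```

The regularization parameter chi trades approximation accuracy against stability; in Tikhonov-type schemes it is driven to zero along a sequence of well-posed problems.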

Essential progress in the development of solution methods for ill-posed variational inequalities was initiated by a paper of Mosco [296]. Based on Tikhonov’s principle, he investigated a stable scheme for the sequential approximation of variational inequalities, where the regularization is performed simultaneously with an approximation of the objective functional and the feasible set. Later on, analogous approaches became known as iterative regularization.

A method using the stabilizing properties of the proximal mapping (see Moreau [292]) was introduced by Martinet [287] for the unconstrained minimization of a convex functional in a Hilbert space. Rockafellar [351, 352] laid the theoretical foundation for the further advances in iterative proximal point regularization for ill-posed VI’s with monotone operators and convex optimization problems. These results attracted the attention of numerical analysts, and the number of papers in this field has increased rapidly during the last decades. With regard to the stability of the regularized problems, which have to be solved throughout the iteration process, proximal point methods (PPM) are superior to other approaches.
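The basic proximal point iteration behind these methods, uᵏ = argmin_v { f(v) + (χ/2)‖v − uᵏ⁻¹‖² }, can be sketched on a small quadratic model problem (hypothetical data, not from the book): the singular Hessian makes the unregularized problem ill-posed, while every proximal subproblem is strongly convex and well conditioned.

```python
import numpy as np

# Minimal sketch of the proximal point method for min f(u) with
# f(u) = 0.5 u^T Q u - b^T u, where Q is positive semidefinite and
# SINGULAR, so the set of minimizers is a whole affine subspace.
Q = np.array([[2.0, 0.0], [0.0, 0.0]])   # rank-deficient Hessian
b = np.array([2.0, 0.0])                 # any (1, t) is a minimizer

def prox_step(u, chi):
    """One exact proximal step: argmin_v f(v) + (chi/2)*||v - u||^2.
    For this quadratic f it reduces to a well-conditioned linear solve."""
    n = len(u)
    return np.linalg.solve(Q + chi * np.eye(n), b + chi * u)

u = np.array([0.0, 0.0])
for k in range(50):
    u = prox_step(u, chi=1.0)   # each subproblem is strongly convex
print(u)   # approaches the minimum-norm minimizer (1, 0)
```

Note that every subproblem is uniquely solvable even though the original problem is not; this is the stabilizing effect the text refers to.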

In the monograph “Stable Methods for Ill-posed Variational Problems” [215], the purpose was to give a general framework for the investigation of the convergence and the rate of convergence of iterative proximal point regularization (IPR) when applied to ill-posed convex variational problems. The iterative regularization principle was generalized by several schemes, and the approaches presented there also allow one to analyze discretization and regularization procedures in connection with specific optimization methods, i.e., the behavior of the solution process has been studied as a whole.

Now, I am pleased to prepare an extended edition of the monograph, since it has apparently been received reasonably well by the optimization community. The objective in putting the present book together is to detail the major ideas in regularizing different classes of ill-posed variational problems that have evolved in the past twenty years, not only theoretically but also computationally.

While I may have overlooked the importance of some very recent developments, I think that the major new breakthroughs on the two fronts that interest me, theory and methods, have occurred mainly in the direction of computational performance since the monograph [215] was published. As a consequence, I have not restricted myself to a thorough reworking and facelift of previous results but have also taken the opportunity to focus on some new topics, such as the work on elliptic regularization, the regularization of control problems, regularization with non-quadratic distance functions, the consideration of the Auxiliary Problem Principle in the context of regularization, as well as several numerical experiments on the regularization of certain classes of problems. Most of the results described here were produced in a long-standing, fruitful cooperation with Alexander Kaplan, and I regret that he now feels unable to work on this project.

The book consists of ten chapters and an appendix. Each chapter provides motivation at the beginning and throughout, and each concludes with notes and comments which furnish credits and references, together with a historical perspective on how the ideas gradually took shape. They also give the reader a broader orientation and special hints for further investigation. Sometimes figures and examples are provided, giving the reader a clear understanding of the peculiarities of the material. The necessary theoretical background from functional analysis, finite element methods and optimization theory is briefly described in the Appendix.

In the first two chapters several stability concepts in optimization and other fields of mathematics are discussed. Classical approaches to the solution of ill-posed problems are considered, and the regularizing properties of different optimization methods are investigated.

Chapter 3 is devoted to IPR for finite-dimensional convex optimization problems, containing some proximal variants of projected gradient methods, penalty methods and augmented Lagrangian methods. In particular, the application of proximal point ideas in other fields of numerical analysis and the usage of IPR in nonconvex programming are discussed.

Chapters 4 and 5 deal with a general framework for the investigation of IPR for infinite-dimensional convex variational problems, including a new class of multi-step regularization methods, regularization on subspaces as well as regularization in a weaker norm, which permit a more efficient use of rough approximations of the original problem and therefore may lead to an essential acceleration of the numerical process. Conditions are established concerning the coordinated update of discretization and regularization parameters, together with stopping parameters arising in stopping rules for the solution of the auxiliary problems. These conditions provide weak or strong convergence of the methods. Asymptotically exact estimates of the rate of convergence are obtained for a broad class of ill-posed convex variational problems, and linear convergence is proved for a particular class.

In the next four chapters the general framework is applied and modified for the treatment of some specific variational problems.

The subject of Chapter 6 is the description of regularized penalty and barrier methods for convex semi-infinite and convex parametric semi-infinite problems. In particular, the numerical behavior of regularized interior point methods for convex semi-infinite problems is studied. In these cases the use of IPR enables us to reduce the dimension of the auxiliary problems substantially by means of special deleting rules for the discretized constraints.

Chapter 7 is devoted to the stable solution of ill-posed control problems. A significant peculiarity of this approach is that regularization is applied only to the control variables, not to the state variables. This is a natural application of partial regularization on a subspace, introduced in Chapter 4. For elliptic control problems with mixed boundary conditions, the discretization of the state and control spaces and the convergence of the inexact multi-step regularization approach are described, and the numerical behavior is analyzed for a particular class of such problems, the so-called chattering problems.

Chapter 8 reflects the application of the general framework to ill-posed monotone variational inequalities, including contact problems of elasticity theory and some obstacle problems. Several variants of multi-step methods (with weak regularization and regularization on a subspace) are considered which allow us to use efficiently the monotonicity reserve of the corresponding differential operators, in particular the structure of their kernels. These regularization procedures are coupled with finite element methods used for the discretization of the VI’s.

Chapter 9 deals with the solution of VI’s with multi-valued maximal monotone operators. Starting from the idea of a relaxation of IPR methods and the use of enlargements of maximal monotone operators, this approach is suggested for solving non-smooth minimization problems. The arising numerical aspects and computational results are investigated in detail. In this chapter interior point methods using non-quadratic distance functions are also considered. The main motivation for such proximal methods is not only to guarantee that the iterates stay in the interior of the feasible set of the VI but also to preserve the main merits of the classical IPR: good stability of the auxiliary problems and convergence of the whole sequence of iterates to a solution of the original problem. In particular, Bregman functions for convex and non-polyhedral sets are also investigated, and the standard requirement of the strict convexity of the Bregman function is weakened. This leads to an analogy of the methods with weak regularization and regularization on a subspace developed on the basis of the classical IPR in Chapter 8.
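As a minimal sketch of the Bregman-distance idea (illustrative only; the kernel, data and closed-form step below are my assumptions, not taken from the book): with the negative entropy as distance kernel, a proximal step for a linear objective has a multiplicative closed form whose iterates automatically stay in the interior of the positive orthant, with no explicit constraint handling.

```python
import numpy as np

# Bregman distance induced by the negative entropy h(x) = sum x_i log x_i
# on the positive orthant: D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>.
def bregman_entropy(x, y):
    return np.sum(x * np.log(x / y) - x + y)

# For a linear objective f(x) = <c, x>, the interior proximal step
#   argmin_x  <c, x> + chi * D_h(x, y)
# has the closed form x = y * exp(-c / chi), obtained by setting the
# gradient c + chi * log(x / y) to zero; iterates stay componentwise
# positive, i.e., interior to the feasible set.
def entropy_prox_step(y, c, chi):
    return y * np.exp(-c / chi)

y = np.array([1.0, 1.0])          # interior starting point
c = np.array([1.0, -1.0])         # hypothetical linear objective
x = entropy_prox_step(y, c, chi=2.0)
print(x, bregman_entropy(x, y))
```

The quadratic kernel h(x) = ‖x‖²/2 recovers the classical proximal step as a special case; non-quadratic kernels like the entropy adapt the step to the geometry of the feasible set.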

Finally, Chapter 10 deals with extensions of the auxiliary problem principle in order to solve VI’s with non-symmetric multi-valued operators in Hilbert spaces. The idea of taking, in the solution scheme, an additive component of the main operator of the VI at the sought iterate leads to new methods which combine the advantages of the auxiliary problem principle with those of proximal-like methods.

The bibliographic data given in this book are not primarily intended to document original publications and sources or to reflect the historical development of the research as a whole. Also, the list of references makes no attempt at being complete; we have listed mainly the papers which are discussed in the text. Often we refer to more comprehensive texts. Except for the cases where reference is given to explicit statements or techniques, it was our aim to draw the reader’s attention to the basic approaches and fields of investigation.

Both authors have benefitted from the fruitful collaboration with their colleagues and doctoral students; in particular, we mention L. Abbe, E. Huebner, V. Kustova, R. Namm, S. Rotin, H. Schmitt and T. Voetmann. We thank them all. We would also like to express our gratitude to the German Research Foundation, whose continuous financial support of a series of projects in this field has kept us hopeful that optimization and regularization is still an important subject. We highly appreciate all the support and help we got from the wonderful colleagues we have met and worked with at some point or another during the past twenty years.

Trier, December 2010                                    R. Tichatschke


Contents

Nomenclature ix

1 ILL-POSED PROBLEMS AND THEIR STABILIZATION 1
1.1 Concepts of Well-Posedness – Examples of Ill-Posed Problems . . 1
1.2 Well- and Ill-posed Variational Problems . . . . . . . . . . . . . . 5
1.2.1 Notations and definitions . . . . . . . . . . . . . . . . . . 5
1.2.2 Some conditions of well-posedness . . . . . . . . . . . . . 9
1.3 Approaches of Stabilizing Ill-posed Problems . . . . . . . . . . . 14
1.3.1 Stabilizers and quasi-solutions . . . . . . . . . . . . . . . 14
1.3.2 Classical approaches for stabilization of ill-posed problems 15
1.3.3 Tikhonov regularization . . . . . . . . . . . . . . . . . . . 20
1.3.4 Proximal point regularization . . . . . . . . . . . . . . . . 22
1.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 STABLE DISCRETIZATION AND SOLUTION METHODS 29
2.1 Stabilizing Properties of Numerical Methods . . . . . . . . . . . . 29
2.1.1 Stabilizing properties of gradient type methods . . . . . . 29
2.1.2 Stabilizing properties of penalty methods . . . . . . . . . 32
2.2 Stabilization of Problems and Methods . . . . . . . . . . . . . . . 36
2.2.1 Structure of regularizing methods . . . . . . . . . . . . . . 37
2.2.2 Iterative one-step regularization . . . . . . . . . . . . . . . 38
2.2.3 Iterative multi-step regularization . . . . . . . . . . . . . 38
2.3 Iterative Tikhonov Regularization . . . . . . . . . . . . . . . . . . 39
2.3.1 Regularized subgradient projection methods . . . . . . . . 39
2.3.2 Regularized penalty methods . . . . . . . . . . . . . . . . 42
2.4 Mosco’s Approximation Scheme . . . . . . . . . . . . . . . . . . . 47
2.4.1 Approximation of well-posed problems . . . . . . . . . . . 48
2.4.2 Mosco’s scheme for ill-posed problems . . . . . . . . . . . 49
2.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3 PPR FOR FINITE-DIMENSIONAL PROBLEMS 55
3.1 Properties of the Proximal-Point-Mapping . . . . . . . . . . . . . 55
3.2 PPR for Convex Problems . . . . . . . . . . . . . . . . . . . . . . 60
3.2.1 Proximal gradient methods . . . . . . . . . . . . . . . . . 60
3.2.2 Penalty methods with iterative PPR . . . . . . . . . . . . 60
3.2.3 Augmented Lagrangian methods with iterative PPR . . . 73
3.3 PPR in Methods of Numerical Analysis . . . . . . . . . . . . . . 77
3.3.1 Implicit Euler method . . . . . . . . . . . . . . . . . . . . 77


3.3.2 Levenberg-Marquardt method . . . . . . . . . . . . . . . . 78
3.3.3 Linearization method . . . . . . . . . . . . . . . . . . . . 79
3.4 PPR for Nonconvex Problems . . . . . . . . . . . . . . . . . . . . 79
3.4.1 Convexification of unconstrained problems . . . . . . . . . 82
3.4.2 Convexification of constrained problems . . . . . . . . . . 87
3.4.3 Proximity control in trust region methods . . . . . . . . . 90
3.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

4 PPR FOR INFINITE-DIMENSIONAL PROBLEMS 97
4.1 Preliminary Results . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.2 One-Step Regularization Methods . . . . . . . . . . . . . . . . . . 101
4.2.1 Description of OSR-methods . . . . . . . . . . . . . . . . 101
4.2.2 Modifications of OSR-methods . . . . . . . . . . . . . . . 109
4.3 Multi-Step Regularization Methods . . . . . . . . . . . . . . . . . 112
4.3.1 Description of MSR-methods . . . . . . . . . . . . . . . . 112
4.3.2 Convergence of the MSR-scheme . . . . . . . . . . . . . . 114
4.3.3 MSR-method with regularization on subspaces . . . . . . 124
4.4 Choice of Controlling Parameters . . . . . . . . . . . . . . . . . . 131
4.4.1 On adaptive controlling procedures for parameters . . . . 131
4.4.2 Stopping rules for inner and outer loops . . . . . . . . . . 137
4.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

5 RATE OF CONVERGENCE FOR PPR 139
5.1 Rate of Convergence for MSR-Methods . . . . . . . . . . . . . . 139
5.1.1 Convergence results for the general case . . . . . . . . . . 139
5.1.2 Asymptotic properties of MSR-methods . . . . . . . . . . 148
5.1.3 Linear convergence of MSR-methods . . . . . . . . . . . . 152
5.2 Rate of Convergence for OSR-Methods . . . . . . . . . . . . . . . 159
5.3 Rate of Convergence for Regularized Penalty Methods . . . . . . 162
5.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

6 PPR FOR CONVEX SEMI-INFINITE PROBLEMS 167
6.1 Introduction to Numerical Analysis of SIP . . . . . . . . . . . . . 167
6.1.1 Conceptual exchange approach . . . . . . . . . . . . . . . 168
6.1.2 Conceptual discretization approach . . . . . . . . . . . . . 168
6.1.3 Conceptual reduction approach . . . . . . . . . . . . . . . 170
6.2 Stable Methods for Ill-posed Convex SIP . . . . . . . . . . . . . . 171
6.2.1 Behavior of ill-posed SIP under discretization . . . . . . . 171
6.2.2 Preliminary results . . . . . . . . . . . . . . . . . . . . . . 175
6.2.3 Regularized penalty methods for SIP . . . . . . . . . . . . 181
6.3 Convergence of Regularized Penalty Methods for SIP . . . . . . . 186
6.3.1 Basic theorems on convergence . . . . . . . . . . . . . . . 186
6.3.2 Efficiency of the deleting rule . . . . . . . . . . . . . . . . 197
6.3.3 MSR-approach revisited: Deleting rules for constraints . . 202
6.4 Regularized Interior Point Methods . . . . . . . . . . . . . . . . . 203
6.4.1 Regularized logarithmic barrier methods . . . . . . . . . . 204
6.4.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . 209
6.4.3 Application to model examples . . . . . . . . . . . . . . . 229
6.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241


7 PARTIAL PPR IN CONTROL PROBLEMS 243
7.1 Non-coercive Elliptic Control Problems . . . . . . . . . . . . . . . 243
7.1.1 Distributed control problems . . . . . . . . . . . . . . . . 245
7.1.2 Regularized penalty methods . . . . . . . . . . . . . . . . 247
7.1.3 A simple example . . . . . . . . . . . . . . . . . . . . . . 259
7.2 Elliptic Control with Mixed Boundary Conditions . . . . . . . . . 261
7.2.1 Discretization in state and control space . . . . . . . . . . 261
7.2.2 Convergence of inexact MSR-methods . . . . . . . . . . . 268
7.2.3 Numerical results . . . . . . . . . . . . . . . . . . . . . . . 278
7.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281

8 PPR FOR VARIATIONAL INEQUALITIES 283
8.1 Approximation of Variational Inequalities . . . . . . . . . . . . . 283
8.2 Contact Problems Without Friction . . . . . . . . . . . . . . . . 288
8.2.1 Formulation of the problems . . . . . . . . . . . . . . . . 289
8.2.2 Finite element approximation of contact problems . . . . 294
8.2.3 Three variants of MSR-methods . . . . . . . . . . . . . . 297
8.2.4 On the choice of variants of IPR-methods . . . . . . . . . 306
8.2.5 Special case: Inner approximation of the set K . . . . . . 308
8.2.6 Exactness of approximations of feasible sets . . . . . . . . 315
8.2.7 Numerical results . . . . . . . . . . . . . . . . . . . . . . . 320
8.3 Solution of Contact Problems With Friction . . . . . . . . . . . . 328
8.3.1 Signorini problem with friction . . . . . . . . . . . . . . . 329
8.3.2 Algorithm of alternating iterations . . . . . . . . . . . . . 329
8.3.3 Numerical results . . . . . . . . . . . . . . . . . . . . . . . 333
8.4 Regularization of Well-posed Problems . . . . . . . . . . . . . . . 335
8.4.1 Linear rate of convergence of MSR-methods . . . . . . . . 336
8.5 Elliptic Regularization . . . . . . . . . . . . . . . . . . . . . . . . 339
8.5.1 A generalized proximal point method . . . . . . . . . . . 340
8.5.2 Convergence analysis of the method . . . . . . . . . . . . 343
8.5.3 Application to minimal surface problems . . . . . . . . . . 350
8.5.4 Application to convection-diffusion problems . . . . . . . 357
8.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

9 PPR FOR VI’s WITH MULTI-VALUED OPERATORS 365
9.1 PPR with Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . 365
9.1.1 Relaxed PPR with enlargements of maximal monotone operators . . 367
9.1.2 Application to non-smooth minimization problems . . . . 371
9.1.3 Numerical aspects and computational results . . . . . . . 375
9.1.4 Choice of relaxation parameters . . . . . . . . . . . . . . . 379
9.2 Interior PPR on Non-Polyhedral Sets . . . . . . . . . . . . . . . . 387
9.2.1 Bregman function based PPR . . . . . . . . . . . . . . . . 388
9.2.2 Bregman functions with non-polyhedral zones . . . . . . . 394
9.2.3 Embedding of original Bregman-function-based methods . 401
9.3 PPR with Generalized Distance Functionals . . . . . . . . . . . . 403
9.3.1 Generalized proximal point method . . . . . . . . . . . . . 405
9.3.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . 410
9.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417


10 THE AUXILIARY PROBLEM PRINCIPLE 419
10.1 Extended Auxiliary Problem Principle . . . . . . . . . . . . . . . 420
10.1.1 On merging of APP and IPR . . . . . . . . . . . . . . . . 420
10.1.2 PAP-method and its convergence analysis . . . . . . . . . 423
10.2 Modifications and Applications of the PAP-method . . . . . . . . 437
10.2.1 Extension of PAP-method to different types of VI’s . . . . 437
10.2.2 Extension of PAP-method to inclusions . . . . . . . . . . 440
10.2.3 Decomposition properties of PAP-method . . . . . . . . . 440
10.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441

11 APPENDIX 443
A1 Some Preliminary Results in Functional Analysis . . . . . . . . . 443
A1.1 Norm and convergence in Hilbert spaces . . . . . . . . . . 443
A1.2 Functionals in Hilbert spaces . . . . . . . . . . . . . . . . 445
A1.3 Some Hilbert spaces and Sobolev spaces . . . . . . . . . . 446
A1.4 Differentiation of operators and functionals . . . . . . . . 449
A1.5 Convexity of sets and functionals . . . . . . . . . . . . . . 450
A1.6 Monotone operators and monotone variational inequalities 461
A1.7 Convex minimization problems in Hilbert spaces . . . . . 466
A2 Approximation Techniques and Estimates of Solutions . . . . . . 470
A2.1 Some examples of monotone variational inequalities . . . 470
A2.2 Finite element methods . . . . . . . . . . . . . . . . . . . 471
A2.3 FEM-approximation of feasible sets . . . . . . . . . . . . . 477
A2.4 FEM-approximation of variational problems . . . . . . . . 478
A3 Selected Methods of Convex Minimization . . . . . . . . . . . . . 485
A3.1 Notions, definitions and convergence statements . . . . . 485
A3.2 Methods of unconstrained minimization . . . . . . . . . . 487
A3.3 Minimization methods subject to simple constraints . . . 501
A3.4 Methods for convex minimization problems . . . . . . . . 506

Bibliography 519

Index 547


Nomenclature

Mathematical Symbols

N the set of natural numbers

R the set of real numbers

R̄ the extended real line, R̄ := R ∪ {+∞}

R+ the set of non-negative real numbers

1n the n-vector of all ones (subscript often omitted)

lim inf; lim sup the lower (resp. upper) limit

A⇒ B A implies B

A ⇔ B A implies B and B implies A

∀ for all (universal quantifier)

∃ there is (existential quantifier)

iff if and only if

Sets

| S | the cardinality of a set S

measS the measure of a set S

bd(S); ∂S the topological boundary of a set S

int(S); ri(S) the topological interior (resp. relative interior) of a set S

cl(S) the topological closure of a set S

NS(x) the normal cone of a set S at x

aff(S) the affine hull of a set S

conv(S) the convex hull of a set S


cone(S) the conical hull of a set S

S⊥ the orthogonal complement of a set S

Bρ(x) the (closed) ball with radius ρ and center x

Sρ(x) the sphere with radius ρ and center x

U(x) the neighborhood of a point x

2S the power set (set of all subsets) of a set S

[a, b]; (a, b) the closed (resp. open) real interval

[a, b); (a, b] the half-open real interval

ρ(x,Q) the distance of the point x from the set Q

ρ(S,Q) ρ(S,Q) := sup{ρ(s,Q) : s ∈ S}

ρH(S,Q) the Hausdorff distance between S and Q

Ω an open and connected domain

Γ; ∂Ω the boundary of a domain Ω

Th a triangulation of a domain Ω

Tk finite hk-grid on T ⊂ Rm

Functionals and Operators

f : X → Y a function mapping from X into Y

G : D → R a mapping with domain D and range R

T : V → V ′ an operator mapping from V into V ′

T ′ the conjugate operator to an operator T , T ′ : V ′ → V

T −1 the inverse operator to an operator T

suppf the support of function f

lsc; usc lower (resp. upper) semicontinuous

wlsc; wusc weakly lower (resp. weakly upper) semicontinuous function

D(f); D(T ) definition domain of a functional f (resp. operator T )

dom(f); dom(T ) the effective domain of a function f (resp. operator T )

gph(f); gph(T ) the graph of a function f (resp. operator T )

epi(f) the epigraph of a function f


f ∘ g the composition of two functions, (f ∘ g)(u) = f(g(u))

γϕ the trace of a function ϕ on a boundary Γ

uI,h the interpolant of a function u on a finite element Th

ind(· | K) the indicator functional of the set K

ker T the kernel of an operator T

K the kernel of a bilinear form

rankA the rank of a matrix A

I the identity operator

a(u, v) a symmetrical bilinear form

NK(·) the normality operator of a set K

ΠK(u) the ortho-projector, projecting u onto a set K

∇f(u) the gradient of a functional f at the point u

∇2f(u) the second derivative of a functional f at the point u,Hessian of f

∂f(u) the subdifferential of a functional f at the point u

G′(u) the Frechet- or Gateaux derivative of G at the point u

G′′(u) the Frechet- or Gateaux derivative of G′ at the point u

f ′(u; v) directional derivative of the function f at the point u indirection v

Dα Dα := D1^{α1} D2^{α2} · · · Dn^{αn}, the derivative in multi-index notation, Di = ∂/∂xi

∂/∂ν the derivative in the direction of the outer normal ν

∆u the Laplace operator at the point u

εkl(u) the tensor field of strains

τkl(u) the tensor field of stress

Spaces and Norms

Rn the real n-dimensional vector space

Rn+ the n-dimensional nonnegative orthant

Rn++ the n-dimensional positive orthant

H; V a Hilbert space

V ′ the conjugate (or dual) space to the space V


Vh a finite dimensional approximation of the space V

〈·, ·〉V a duality pairing between V and V ′

〈u, v〉 a scalar product in Hilbert spaces

uk; uki a sequence (resp. subsequence)

uk → u the norm convergence: ‖ uk − u ‖→ 0

uk ⇀ u the weak convergence: ∀ ℓ ∈ V ′ : ℓ(uk) → ℓ(u)

dimL the dimension of a linear subspace L

‖ u ‖V the norm of an element u in the space V

|‖ u ‖| the norm of u in a space of vector functions

| u |V the semi-norm of an element u in the space V

‖ T ‖V the norm of an operator T in the space V

L (X,Y ) the space of linear continuous operators from X into Y

Cm(Ω) the space of m-times continuously differentiable func-tions f on Ω

C∞(Ω) C∞(Ω) := ∩m≥0 Cm(Ω)

Cm,λ(Ω) the space of functions in Cm(Ω) whose derivatives of order m satisfy a Hölder condition with exponent λ

D(Ω) D(Ω) := {v ∈ C∞(Ω) : supp v is compact in Ω}

L2(Ω) the Lebesgue space of square-integrable functions on Ω

Hm(Ω) {v ∈ L2(Ω) : Dαv ∈ L2(Ω) for all α with |α| ≤ m}

[Hm(Ω)]n the space of n-dimensional vector functions with com-ponents in Hm(Ω)

Hm0 (Ω) the closure of D(Ω) in Hm(Ω)

H1/2(Γ) the space of the traces on Γ of functions from H1(Ω)

H−1/2(Γ) the dual space to H1/2(Γ)

〈u, v〉m,Ω the scalar product in Hm(Ω)

〈u, v〉m,Ω,0 the scalar product in Hm0 (Ω)

‖ u ‖m,Ω the norm in Hm(Ω)

|‖ u ‖|m,Ω the norm in [Hm(Ω)]n

|‖ u ‖|m,Ω,0 the norm in Hm0 (Ω)

X → Y X is continuously embedded into Y


Xc→ Y X is compactly embedded into Y

Minimization and Regularization

K; Kk the feasible set of a minimizing problem

J ; Jk the objective functional of a minimizing problem

infu∈K J(u) the infimum of the functional J on the set K

u∗ := arg minu∈K J(u), the unique minimizer of J over K

U∗ := Arg minu∈K J(u), the set of minimizers of J over K

J∗ := J(u∗), u∗ ∈ U∗, the optimal value of the objective functional J

L; LA the Lagrangian function; the augmented Lagrangian function

T (u∗) := {t ∈ T : g(u∗, t) = maxτ∈T g(u∗, τ)}, the set of active parameters in SIP

Proxf,C(·) := arg minv∈C {f(v) + (χ/2)‖v − ·‖²}, the proximal point mapping

η(·) := minv∈C {f(v) + (χ/2)‖v − ·‖²}, the Yosida regularization

Ψχ,u := f(·) + (χ/2)‖· − u‖², a regularized functional

Ψk := Jk(·) + ‖· − uk−1‖²

Ψk := J(·) + ‖· − uk−1‖²

Ψk,i := Jk(·) + ‖· − uk,i−1‖²

Ψk,i := J(·) + ‖· − uk,i−1‖²

φ(·) a penalty function

Fk,i := Jk(·) + ϕk(·) + ‖· − uk,i−1‖²

Jχ,Q(·) := (I + χQ)−1, the resolvent operator

ω(·) the stabilizer

Dh(x, y) the Bregman distance

Gu(K) the set of feasible directions at u in K

rk the penalty parameter

χk the regularization parameter


Problem Classes and Fundamental Objects

APP auxiliary problem principle

BPPA Bregman-function-based proximal point algorithm

IPR iterative proximal point regularization

PPR proximal point regularization

PAP proximal auxiliary problem method

SIP semi-infinite programming problem

(P k) regularized auxiliary problem at step k

CP(f,K) convex minimization problem of f on K

IP(Q,K) inclusion defined by the set K and the operator Q

IP(T ,Ξ,K) IP(Q,K) with the operator Q := T + Ξ

HVI(Q, f,K) hemi-variational inequality defined by the set K, theoperator Q and the functional f

VI(Q,K) variational inequality defined by the set K and the op-erator Q

VI(F ,P,K) VI(Q,K) with the operator Q := F + P

VI(T , ∂f,K) VI(Q,K) with the operator Q := T + ∂f

SOL(Q,K) solution set of VI(Q,K)

Miscellaneous

o(·) any function such that lim_{t↓0} o(t)/t = 0

O(·) any function such that lim sup_{t↓0} |O(t)|/t < ∞

:= the assignment operator

□ end of a proof

♦ end of an example, remark, etc.


Chapter 1

ILL-POSED PROBLEMS AND THEIR STABILIZATION

1.1 Concepts of Well-Posedness – Examples of Ill-Posed Problems

We start with the consideration of some ill-posed problems in classical domains of mathematics, for instance the solution of differential and integral equations, the computation of the value of an unbounded operator and others, which on the one hand have important applications in the natural sciences and on the other hand are frequently used as model problems in theoretical and numerical studies of ill-posed problems.

At first sight these examples seem far away from the variational problems studied in this book. However, we will see later on that many numerical approaches are based on variational formulations which allow the application of optimization techniques. This is not surprising in view of the broad reach of variational methods in mathematics. Although we are mainly concerned with problems in Hilbert spaces, in this chapter we study the problems in Banach spaces to get a more fundamental insight into the class of ill-posed problems. Later on we consider variational problems immediately given in a Hilbert space and discard the examples described in more general spaces.

The following concept of well-posedness of a problem is convenient for a preliminary analysis of different types of problems from a common point of view.

We consider a model describing the dependence of the "solution"

u = G(z) ∈ RG ⊂ U

on some "input data" z ∈ DG ⊂ Z, where U and Z are Banach spaces with metrics ρU and ρZ, respectively. Here DG and RG denote the domain of definition and the range of the mapping G.

1.1.1 Definition. The problem of computing u = G(z) is called (conditionally)well-posed on the set D ⊂ DG if


(i) the mapping G is single-valued on D and

(ii) G is continuous on D as a mapping from Z into U .

We call G(D) ⊂ U the set of well-posedness of the considered problem. If at least one of the conditions (i) or (ii) fails to hold, the problem is termed ill-posed on D. ♦

1.1.2 Remark. For the time being we do not consider the question of how to select the set D in order to take into account actual information on the problem. However, in applications it is important to know, for instance, whether the approximate data are contained in D. We emphasize that the notions of well- and ill-posedness depend on the chosen pair of spaces U and Z. ♦

An essentially more general concept will be considered later on for variational problems. In particular, set-valued mappings G will be permitted.

Our first example is a classical ill-posed problem considered by Hadamard. It should be noted that Hadamard [162, 163] was the first to give a description of the class of well-posed problems.

1.1.3 Example. (Cauchy Problem for the Laplace equation.) A solution u of the equation

∆u(x1, x2) = 0

is sought on the domain Ω = {(x1, x2) ∈ R2 : x2 > 0} subject to the boundary conditions

u(x1, 0) = 0,  ∂u/∂x2 |_{x2=0} = ϕ(x1),  −∞ < x1 < ∞,

where ϕ is a given function. Here u = G(ϕ) describes the solution in dependence on the boundary function ϕ.

For ϕ(x1) ≡ ϕ1(x1) = a−1 sin ax1, a > 0, the function

u1(x1, x2) = a−2 sin ax1 sinh ax2

is the unique solution of the Cauchy problem, whereas in the case ϕ(x1) ≡ ϕ2(x1) = 0 the solution is

u2(x1, x2) = 0.

Taking Z = C(R), U = C(Ω), we get ρZ(ϕ1, ϕ2) = a−1 and, for arbitrary x2,

ρU(u1, u2) ≥ sup_{x1∈R} |u1(x1, x2) − u2(x1, x2)| = sup_{x1∈R} |a−2 sin ax1 sinh ax2| = a−2 sinh ax2.

Obviously, by means of the choice of a, the value of ρZ(ϕ1, ϕ2) can be made arbitrarily small while at the same time ρU(u1, u2) becomes arbitrarily large for a fixed x2 > 0.

This shows that the Cauchy Problem for the Laplace equation considered on the pair of spaces U = C(Ω) and Z = C(R) is ill-posed on the set

D = {ϕ ∈ Z : ‖ϕ‖ ≤ c0}

for any c0 > 0. ♦
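The blow-up in Example 1.1.3 can be checked numerically. A minimal sketch (the specific values of a and x2 below are arbitrary choices, not from the text): the data error ρZ = 1/a shrinks while the solution error at a fixed height grows like sinh(a·x2)/a².

```python
import math

# Example 1.1.3: data error rho_Z = 1/a, solution error rho_U = sinh(a*x2)/a**2
x2 = 1.0                          # arbitrary fixed height
for a in (1.0, 10.0, 50.0):
    data_err = 1.0 / a
    sol_err = math.sinh(a * x2) / a ** 2
    print(f"a = {a:4.0f}: rho_Z = {data_err:.2e}, rho_U = {sol_err:.2e}")
```

For a = 50 the data distance is 0.02 while the solution distance exceeds 10^17, matching the statement that ρZ can be made arbitrarily small while ρU becomes arbitrarily large.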


1.1.4 Example. (Differentiation of an approximately given function.) Let u1 be the derivative of a function z1 on [0, 1] and

z2(t) = z1(t) + q sin at.

For Z = C[0, 1] we obtain ρZ(z1, z2) ≤ |q|,

and for U = C[0, 1] and u2 = dz2/dt the distance between u1 and u2 is

ρU(u1, u2) = max_{t∈[0,1]} |qa cos at| = |qa|.

Choosing q > 0 arbitrarily small and letting a → ∞, we conclude that the problem is ill-posed for the pair of spaces U = Z = C[0, 1].

It is easy to see that the problem becomes well-posed if we choose Z = C1[0, 1] and U = C[0, 1]. ♦
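A numerical sketch of Example 1.1.4 (the values of q and a below are arbitrary): the data perturbation q sin(at) stays uniformly small on [0, 1], while the perturbation of the derivative, qa cos(at), is of size qa.

```python
import math

q, a = 1e-3, 1e4                  # small amplitude, high frequency
ts = [i / 1000.0 for i in range(1001)]

# perturbation of the data z2 - z1 = q*sin(a t): uniformly small on [0, 1]
data_err = max(abs(q * math.sin(a * t)) for t in ts)
# perturbation of the derivative u2 - u1 = q*a*cos(a t): of size q*a = 10
deriv_err = max(abs(q * a * math.cos(a * t)) for t in ts)

print(data_err, deriv_err)
```

A data error of order 10⁻³ thus propagates to a derivative error of order 10, and the ratio qa can be made arbitrarily bad.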

It should be noted that in applications the spaces Z and U are in general not at our disposal, but are prescribed by the physical properties of the model involved. For example, the assumption that an approximately given function is close to an unknown exact function in the norm of the space C1[0, 1] is in general not a natural hypothesis. In models for the geophysical detection of minerals, leading to a Cauchy problem for the Laplace equation, the solution is of interest just for the pair of spaces considered in Example 1.1.3.

1.1.5 Example. (Fredholm integral equation of the first kind.) On the pair of spaces U = C[0, 1], Z = L2(0, 1) the solution of the integral equation

Au(x) ≡ ∫₀¹ K(x, s)u(s) ds = z(x)

is sought, with K a kernel continuous on [0, 1] × [0, 1]. Obviously, the functions z ∈ DG = RA must be continuous. Let u1 be a solution for the fixed function z1 ∈ DG. Then

u2(x) = u1(x) + q sin ax

will be the solution of the equation corresponding to the function

z2(x) = z1(x) + q ∫₀¹ K(x, s) sin as ds.

The distances are

ρU(u1, u2) = |q| max_{s∈[0,1]} |sin as|,

ρZ(z1, z2) = |q| (∫₀¹ [∫₀¹ K(x, s) sin as ds]² dx)^{1/2}.

For arbitrary fixed q and a → ∞ we get

ρU(u1, u2) → |q| and ρZ(z1, z2) → 0.

If U = L2(0, 1) is chosen, then ρU(u1, u2) → |q|/√2, i.e., the problem is ill-posed for the pair U = C[0, 1], Z = L2(0, 1) as well as for the pair U = Z = L2(0, 1). ♦
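The Riemann–Lebesgue effect behind Example 1.1.5 can be illustrated numerically. A sketch under an assumed kernel K(x, s) = e^{−xs} (chosen for illustration only, not from the text); for this kernel the inner integral has a closed form, and the L2 distance of the data decays while the solution error stays of size |q|.

```python
import math

# Assumed kernel K(x, s) = exp(-x*s); then
#   int_0^1 exp(-x*s)*sin(a*s) ds = (a - exp(-x)*(x*sin(a) + a*cos(a))) / (x*x + a*a)
def inner(x, a):
    return (a - math.exp(-x) * (x * math.sin(a) + a * math.cos(a))) / (x * x + a * a)

def rho_Z(a, q=1.0, n=2000):
    # L2(0,1) norm of z2 - z1 = q * inner(x, a), midpoint rule in x
    total = sum((q * inner((i + 0.5) / n, a)) ** 2 for i in range(n))
    return math.sqrt(total / n)

for a in (10.0, 100.0, 1000.0):
    print(a, rho_Z(a))            # tends to 0, while rho_U(u1, u2) stays about |q|
```

The closed form of the inner integral is the standard antiderivative of e^{ps} sin(qs); the midpoint rule is used only to evaluate the outer L2 norm.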


In the examples above, ill-posedness of the problems is due to the unboundedness of the operator G. However, in the latter problem, if λ = 0 is an eigenvalue of the operator A, then the solution will be non-unique if it exists at all.

So-called inverse problems are other prominent examples of ill-posedness. They are, for instance, connected with the reconstruction of the physical environment by means of observation of the generated physical outcome. Important problems belong to this class, like the analysis of physical experiments, the interpretation of observational data, tomography models and many others.

Inverse problems described by differential equations usually require determining unknown operator coefficients on the basis of information about the solution of the equation itself.

We close this introductory section with two simple examples of convex variational problems which show that linear programming problems and linear semi-infinite programming problems are ill-posed in general, too.

1.1.6 Example. (Linear programming problem.) The function J(u) = u2 has to be minimized subject to

0 ≤ u1 ≤ 2, 0 ≤ u2 ≤ 2,

g1(u) = u1 + u2 − 1 ≤ 0,

g2(u) = −u1 − u2 + 1 ≤ 0.

Obviously, the problem has the unique solution u∗ = (1, 0). Now we consider the dependence of the solution u ∈ U = R2 on g2 ∈ Z = C[0, 2] near the given function g2.

Replacing the constraint g2(u) ≤ 0 by

gσ2 (u) = −(1 + σ)u1 − u2 + 1 ≤ 0,

we get

ρZ(g2, gσ2) = 2|σ| → 0 for σ → 0.

However, for σ ∈ (−1, 0), the perturbed problem has the unique solution u∗σ = (0, 1), hence ρU(u∗, u∗σ) = √2.

If, instead of Z = C[0, 2], the set Z = R2×2 is considered to describe the perturbations of the coefficients of the functions g1, g2, the result will be the same independently of the norm chosen in Z. ♦
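The jump of the solution in Example 1.1.6 can be reproduced with a small vertex-enumeration LP solver (a sketch; `solve_lp` is an illustrative helper of ours, not from the text).

```python
from itertools import combinations

def solve_lp(c, cons, tol=1e-9):
    """Minimize c[0]*u1 + c[1]*u2 over {u : a1*u1 + a2*u2 <= b, (a1, a2, b) in cons}
    by enumerating intersections of pairs of constraint boundaries (vertices)."""
    best, argbest = None, None
    for (a1, a2, b), (d1, d2, e) in combinations(cons, 2):
        det = a1 * d2 - a2 * d1
        if abs(det) < tol:
            continue                       # parallel boundaries
        u = ((b * d2 - a2 * e) / det, (a1 * e - b * d1) / det)
        if all(p * u[0] + q * u[1] <= r + tol for p, q, r in cons):
            val = c[0] * u[0] + c[1] * u[1]
            if best is None or val < best:
                best, argbest = val, u
    return argbest

box = [(-1, 0, 0), (1, 0, 2), (0, -1, 0), (0, 1, 2)]   # 0 <= u1, u2 <= 2
g1 = (1, 1, 1)                      # u1 + u2 - 1 <= 0
g2 = (-1, -1, -1)                   # -u1 - u2 + 1 <= 0

u_star = solve_lp((0, 1), box + [g1, g2])          # original problem: (1, 0)
sigma = -0.01
g2_sigma = (-(1 + sigma), -1, -1)   # perturbed constraint g2^sigma
u_sigma = solve_lp((0, 1), box + [g1, g2_sigma])   # jumps to (0, 1)
print(u_star, u_sigma)
```

Although ρZ(g2, g2^σ) = 2|σ| = 0.02 is small, the minimizer moves by √2, exactly as stated in the example.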

1.1.7 Example. (Linear semi-infinite programming problem.) Consider the linear semi-infinite problem (LSIP)

min{−u1 : −u2 ≤ 0, u1 − tu2 ≤ 0 for all t ∈ T},

with T = [0, 1] ⊂ R. Let us investigate its well-posedness subject to the variation of the set T, which is estimated in the Hausdorff metric ρH, i.e., Z is the space of subsets of R with this metric.
Obviously, the solutions of the original problem are given by the points (0, a), a ≥ 0. If Tδ ⊂ (0, 1] is a closed set with ρH(Tδ, T) ≤ δ, the perturbed problem

min{−u1 : −u2 ≤ 0, u1 − tu2 ≤ 0 for all t ∈ Tδ},


has no solution:

inf J(u) = inf(−u1) = −∞ for arbitrarily small δ > 0. ♦
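A quick numerical confirmation (the choice δ = 10⁻³ is arbitrary): for Tδ = [δ, 1] the constraints reduce to u1 ≤ δu2, so every point on the ray u = (δs, s), s ≥ 0, is feasible and the objective −u1 = −δs is unbounded below.

```python
# For T_delta = [delta, 1] the constraints u1 - t*u2 <= 0, t in T_delta,
# reduce to u1 <= delta*u2 (and u2 >= 0), so the ray u = (delta*s, s) is feasible.
delta = 1e-3
objective = []
for s in (1.0, 1e3, 1e6):
    u1, u2 = delta * s, s
    assert u2 >= 0 and all(u1 - t * u2 <= 1e-9 for t in (delta, 0.5, 1.0))
    objective.append(-u1)
print(objective)                  # J(u) = -u1 decreases without bound along the ray
```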

The question of well-posedness of convex, and in particular of linear, semi-infinite problems with respect to the variation of the set T is closely related to the question whether they can be approximated in a stable way by finite-dimensional convex problems. Many numerical methods for solving semi-infinite programming problems are based on such approximations.

Examples of ill-posed elliptic variational inequalities will be considered in Sections 8.1 and 8.2.

The investigation of well-posedness in the sense of Definition 1.1.1 requires the transformation of the original problem into a model u = G(z). In the examples considered above this can be done without difficulties. However, for specific classes of problems, using their natural formulations, it is possible to obtain more convenient definitions which widen the notion of well-posed problems for a given class.

In particular, the assumption of single-valuedness of the mapping G, usually obtained for linear problems by a suitable factorization, is very restrictive for nonlinear problems, because in the case of non-unique solvability their solution sets are in general not affine manifolds.

Moreover, in Example 1.1.4 we have seen that any problem of the corresponding class is ill-posed, whereas Example 1.1.6 shows that within the class of linear programming problems ill-posedness is rather an exceptional case depending on the particular numerical data.

1.2 Well- and Ill-posed Variational Problems

1.2.1 Notations and definitions

Throughout this book, within the class of convex functions we will always consider proper convex functions.

In the literature different notions of well-posedness of variational problems can be found (cf. Vasiljev [408], Karmanov [232]). We introduce here a definition which is sufficiently flexible for our further considerations.

First we deal with the problem

minimize J(u), subject to u ∈ K = {v ∈ U0 : B(v) ≤ 0},   (1.2.1)

where J : U → R is a convex, continuous functional on a Banach space U; B : U → Y is a convex, continuous mapping into a Banach space Y; and U0 is a convex and closed subset of U.

Let U∗ = Arg min{J(u) : u ∈ K} be the set of solutions of Problem (1.2.1) and J∗ = J(u∗), u∗ ∈ U∗, the optimal value of the problem.
In the case that U is a Hilbert space and U0 = U, we obtain the convex variational problem (A1.5.11), (A1.7.40) (see Appendix), and for U0 = U, B : U → Rm, a problem of type (A1.7.35) is given.

For a fixed δ > 0 we define the set of variations

Φδ = {ϕδ ≡ (Jδ, Bδ) : ‖J − Jδ‖C(U0) ≤ δ, max_{u∈U0} ‖B(u) − Bδ(u)‖Y ≤ δ},   (1.2.2)

where Jδ : U → R and Bδ : U → Y are assumed to be continuous. With an element ϕδ ∈ Φδ the perturbed problem

minimize Jδ(u), subject to u ∈ U0, Bδ(u) ≤ 0   (1.2.3)

can be established. The feasible and optimal sets are denoted by

U(ϕδ) = {u ∈ U0 : Bδ(u) ≤ 0}

and

U∗(ϕδ) = {u∗ ∈ U(ϕδ) : Jδ(u∗) = inf_{u∈U(ϕδ)} Jδ(u)},

respectively.

1.2.1 Definition. Problem (1.2.1) is called weakly well-posed if

(i) the solution set U∗ is non-empty;

(ii) there exists a constant δ0 > 0 such that for any δ ∈ (0, δ0) and any ϕδ ∈ Φδ the perturbed solution set U∗(ϕδ) is non-empty;

(iii) lim_{δ↓0} sup_{u∈U∗(ϕδ)} ρ(u, U∗) = 0 holds for every ϕδ ∈ Φδ.

Problem (1.2.1) is said to be well-posed if, moreover,

(iv) the solution set U∗ is a singleton, i.e. U∗ = {u∗}.

If any of the conditions (i)-(iv) fails to hold, then the problem is considered as an ill-posed one. Hence, if condition (iv) is violated, weakly well-posed problems also belong to the class of ill-posed problems.

The perturbation class Φδ is related in particular to the convex problem

minimize J(u), subject to u ∈ K,   (1.2.4)

where J : V → R is a convex, lower semicontinuous functional, K ⊂ V is a convex, closed set and K ∩ dom J ≠ ∅.

Naturally, the distance between the functions Jδ and J, as well as between Bδ and B, can also be measured in other metrics.

When defining Φδ in (1.2.2) it is often important to specify the perturbations of the original information in view of the common methods applied for the resolution of the perturbed problems. For instance, in the case of convex variational problems convexity of the approximate problem is usually assumed, i.e. in the definition of Φδ it is supposed that Jδ and Bδ have to be convex.


1.2.2 Remark. Sometimes it may be convenient to replace U0 by a suitable subset Ū0 ⊂ U0 (with U∗ ∩ Ū0 ≠ ∅), implying an enlargement of Φδ. An analogous effect in other definitions of well-posedness is induced by a relaxation of the inequalities in (1.2.2) (cf. Vasiljev [408], Tikhonov / Arsenin [395]):

|J(u)− Jδ(u)| ≤ δ(1 + ω(u)), ‖B(u)−Bδ(u)‖Y ≤ δ(1 + ω(u)), (1.2.5)

where u ∈ U0 and ω : U0 → R is a non-negative functional such that its level set {u : ω(u) ≤ c} is weakly compact for each constant c. ♦

When working with experimental data, well-posedness of variational problems has to allow an efficient estimation of the error propagated into the solution by the error bound of the data.

For problems with exact data the arising perturbations are due to the numerical solution methods. In order to prove convergence, it is usually assumed that all the data are given with arbitrary accuracy.

Now we will investigate well-posedness of convex semi-infinite problems

(SIP)  minimize J(u), subject to u ∈ K ≡ {v ∈ Rn : g(v, t) ≤ 0 ∀ t ∈ T},   (1.2.6)

where T is a compact subset of a normed space; J and gt : u → g(u, t), t ∈ T, are convex, finite-valued functions; and gu : t → g(u, t) is a continuous function on T for any u ∈ K.

The problem is studied in detail in the Appendix (see (A1.7.42)). Setting either B : u → max_{t∈T} g(u, t) or B : u → g(u, ·), we can apply Definition 1.2.1 (cf. also Corollary 1.2.7). However, when considering perturbations of the set T within the whole class Φδ, the verification of the inclusion ϕδ ∈ Φδ often causes difficulties. On the other hand, as numerical methods always require a discretization of the set T, the study of well-posedness of Problem (1.2.6) subject to an approximation, here considered as a perturbation of the set T, is especially important.

Here we investigate well-posedness of convex SIP, when the class of per-turbations of the set T is given in the form

𝒯δ = {Tδ ⊂ T : ρH(T, Tδ) ≤ δ},   (1.2.7)

and everything else is given exactly. Thus, for an arbitrary δ > 0 and Tδ ∈ 𝒯δ, we consider the problem

minimize J(u), subject to u ∈ Rn, g(u, t) ≤ 0 ∀ t ∈ Tδ.   (1.2.8)

Let U∗(Tδ) be the set of optimal solutions of (1.2.8).

1.2.3 Definition. Problem (1.2.6) is called weakly T-well-posed if

(i) U∗ ≠ ∅;


(ii) there exists a constant δ0 > 0 such that for each δ ∈ (0, δ0) and each Tδ ∈ 𝒯δ the set U∗(Tδ) is non-empty;

(iii) lim_{δ↓0} sup_{u∈U∗(Tδ)} ρ(u, U∗) = 0 holds for arbitrarily chosen Tδ ∈ 𝒯δ.

The problem is said to be T-well-posed if, moreover,

(iv) U∗ = {u∗}.

If any of the conditions (i)-(iv) fails to hold, then Problem (1.2.6) is considered as ill-posed.

1.2.4 Remark. The introduction of these or other weaker definitions of well-posedness for different problems aims at separating an essential part of ill-posed problems. Mostly, if a convex SIP is ill-posed under a possible perturbation of all data, then it is usually ill-posed in the sense of Definition 1.2.3. ♦

1.2.5 Remark. We comment on some alterations in the definition of well-posedness for variational problems given in the literature.

Mosco [296] has investigated well-posedness of variational inequalities with monotone operators, assuming that the sequence of feasible sets Ki converges to the original feasible set K in the sense that

{v : vi → v for vi ∈ Ki} = {w : wj ⇀ w for wj ∈ Kij, {Kij} ⊂ {Ki}} = K.

Lemaire [260], considering the original and the perturbed problems as unconstrained minimization problems, has described the relation between the data of these problems in the framework of the Moreau-Yosida approximation. Indeed, for the problems

min{ϕ(u) : u ∈ V} and min{ϕ̄(u) : u ∈ V},

with convex functions ϕ, ϕ̄ : V → R on the Hilbert space V, as a measure of the distance between ϕ and ϕ̄ the value

sup_{‖u‖≤ρ} |ϕλ(u) − ϕ̄λ(u)|

is introduced by means of the Moreau-Yosida functional

ϕλ(u) = inf_{v∈V} {(1/(2λ))‖v − u‖² + ϕ(v)},

with λ > 0, ρ > 0 suitably chosen.
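For a concrete one-dimensional illustration of the Moreau-Yosida functional (a sketch; the choice ϕ(v) = |v| and λ = 1/2 is ours, not from the text), the infimum can be approximated by grid search and compared with the known closed form, the Huber function.

```python
def moreau(phi, u, lam=0.5, lo=-5.0, hi=5.0, n=100001):
    # grid-search approximation of inf_v { |v - u|^2 / (2*lam) + phi(v) }
    return min((v - u) ** 2 / (2 * lam) + phi(v)
               for v in (lo + (hi - lo) * i / (n - 1) for i in range(n)))

def huber(u, lam=0.5):
    # closed form of the Moreau-Yosida regularization of phi(v) = |v|
    return u * u / (2 * lam) if abs(u) <= lam else abs(u) - lam / 2

for u in (-2.0, 0.3, 1.0):
    assert abs(moreau(abs, u) - huber(u)) < 1e-4
print("grid search matches the Huber function")
```

The smoothing effect is visible in the closed form: ϕλ is differentiable at 0 although ϕ(v) = |v| is not.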

Instead of the notion "weak well-posedness", in the literature often the notion well-posedness with respect to the argument is used.

Because sometimes only the optimal value J∗ = inf_{u∈K} J(u) may be of interest, the notion well-posedness with respect to the objective is considered, i.e., for any choice of ϕδ ∈ Φδ (cf. (1.2.2)),

inf_{u∈U(ϕδ)} Jδ(u) → J∗ as δ ↓ 0.


In some papers the notions stable and well-posed are used synonymously. However, when speaking about non-stable problems, one usually has in mind a gap in the mapping G given in Definition 1.1.1.

We shall apply the notion "stable" only in connection with numerical methods. ♦

1.2.2 Some conditions of well-posedness

Now we consider Problem (1.2.1) with B : U → C(T) ≡ Y given by B(u) = g(u, ·). The function g is assumed to be continuous on U × T and convex with respect to u for every t ∈ T, and T is a compact subset of a Banach space. Depending on the cardinality of T, problems with a finite or infinite number of functional constraints are included.
Let us consider variations Bδ, Jδ of B and J, respectively, subject to

max_{u∈U0} ‖B(u) − Bδ(u)‖Y ≤ δ,  ‖J − Jδ‖C(U0) ≤ δ.   (1.2.9)

If gδ is a continuous function on U × T satisfying

max_{u∈U0} ‖g(u, ·) − gδ(u, ·)‖C(T) ≤ δ,

then, for Bδ : u → gδ(u, ·), inequality (1.2.9) holds.
Denote

g(u) = max{g(u, t) : t ∈ T}.

1.2.6 Theorem. In the problem defined as above, let U be a reflexive Banach space and let Slater's condition be satisfied, i.e., there exists a point ū ∈ U0 such that g(ū) < 0. Assume further that the functionals Jδ, Bδ(·)(t), t ∈ T, are weakly lower semi-continuous, U∗ is non-empty and bounded, and

lim_{δ↓0} sup_{u∈Wδ} ρ(u, U∗) = 0,   (1.2.10)

where Wδ = {u ∈ U0 : J(u) ≤ J∗ + δ, g(u) ≤ δ}.
Then the problem is weakly well-posed in the class of variations (1.2.2).

Proof: We first prove that, for

J(δ) = inf{J(u) : u ∈ U0, g(u) ≤ −δ},

the relation lim_{δ↓0} J(δ) = J∗ is valid. This is obviously true if g(u∗) < 0 for some u∗ ∈ U∗.
If U∗ ∩ int K = ∅, then for arbitrarily chosen u∗ ∈ U∗ and uα = ū + (1 − α)(u∗ − ū) with α ∈ [0, 1], we obtain in view of the convexity of g that

g(uα) ≤ g(u∗) + α[g(ū) − g(u∗)] = αg(ū).

Hence, g(uα) ≤ −δ if α = −δ/g(ū) ≤ 1, and for δ ↓ 0 the point uα tends to u∗. The continuity of J ensures that, indeed, J(δ) → J∗.


Now we fix δ0 > 0 such that Wδ0 is bounded (because of (1.2.10) and the boundedness of U∗ such a δ0 exists) and choose δ̄ < δ0 such that

J(δ̄) − J∗ + 3δ̄ < δ0,   (1.2.11)

and δ̄ ∈ (0, min_{t∈T} |g(ū, t)|).
For δ ∈ (0, δ̄) and any ϕδ ∈ Φδ the set {u ∈ U0 : Bδ(u) ≤ 0} is non-empty (obviously, Bδ(ū) < 0) and is included in the set {u ∈ U0 : g(u) ≤ δ} on account of inequality (1.2.9).
According to the definition of J(δ) a point u ∈ U0 exists with

J(u) ≤ J(δ) + δ and g(u) ≤ −δ,

hence

Jδ(u) ≤ J(u) + δ ≤ J(δ) + 2δ and Bδ(u) ≤ 0

proving that the system of relations

Jδ(u) ≤ J(δ) + 2δ, Bδ(u) ≤ 0, u ∈ U0

is solvable. Obviously, its solution set is contained in

Uδ = {u ∈ U0 : J(u) ≤ J(δ) + 3δ, g(u) ≤ δ},

and Uδ ⊂ Wδ0 holds in view of (1.2.11) and the inequality J(δ) + 3δ ≤ J(δ̄) + 3δ̄, which follows from the definition of J(δ).
Therefore, Uδ is bounded and even weakly compact due to the reflexivity of U and the convexity and continuity of the functionals J and g. Having this, in view of the lower semicontinuity of the functionals Jδ and Bδ(·)(t), t ∈ T, one can conclude that the functional Jδ attains its minimum on the set {u ∈ Uδ : Bδ(u) ≤ 0}. Because of the equality

inf{Jδ(u) : u ∈ U0, Bδ(u) ≤ 0} = min{Jδ(u) : u ∈ Uδ, Bδ(u) ≤ 0}

and the inclusion Uδ ⊂ U0, we obtain that the infimum on the left-hand side is attained, proving that U∗(ϕδ) ≠ ∅ and U∗(ϕδ) ⊂ Uδ ⊂ Wδ0. Finally, due to condition (1.2.10) it follows that

lim_{δ↓0} sup_{u∈U∗(ϕδ)} ρ(u, U∗) = 0.

If U is a finite-dimensional space, then condition (1.2.10) is a consequence of the remaining assumptions in Theorem 1.2.6.

1.2.7 Corollary. Assume that for the convex SIP (1.2.6) the following conditions hold:

(i) there exists a Slater point;

(ii) U∗ is non-empty and bounded;


(iii) there exist positive constants L and α such that

sup_{u∈Uα} |g(u, t′) − g(u, t′′)| ≤ L‖t′ − t′′‖Y ∀ t′, t′′ ∈ T

holds for Uα = {u : ρ(u, U∗) ≤ α}.

Then SIP (1.2.6) is weakly well-posed for the class of variations 𝒯δ given in (1.2.7) as well as for the class

{ϕδ = (Jδ, gδ, Tδ) : ‖J − Jδ‖C(Rn) ≤ δ, max_{t∈T} ‖g(·, t) − gδ(·, t)‖C(Rn) ≤ δ, Tδ ∈ 𝒯δ},

with continuous functions Jδ and gδ.

In this context the following lemma is easy to prove.

1.2.8 Lemma. Assume that U0 is convex and closed, the functionals J and g(·, t) are convex and continuous, Slater's condition is satisfied, and the set Wδ0 is bounded for some δ0 > 0. Then

lim_{δ↓0} sup_{u∈Wδ} ρ(u, K) = 0.

If, moreover, inf_{u∈U0} J(u) < J∗, then for S = {u ∈ U0 : J(u) ≤ J∗} the relation

lim_{δ↓0} sup_{u∈Wδ} ρ(u, S) = 0

holds true.

Nevertheless, the following example in infinite-dimensional spaces shows that

lim_{δ↓0} sup_{u∈Wδ} ρ(u, U∗) ≠ 0

is possible.

1.2.9 Example. In Problem (1.2.1) let

U = U0 = ℓ2,  J(u) = ∑_{i=1}^∞ i^{−1}u_i^2,

B(u) = max{g1(u), g2(u)} with g1(u) = u1 + 1, g2(u) = ∑_{i=1}^∞ u_i^2 − 2.

Obviously, the solution set is U∗ = {(−1, 0, . . . , 0, . . .)} and the assumptions of Theorem 1.2.6 are satisfied with the exception of (1.2.10). The set Wδ is bounded for any δ > 0 and inf{J(u) : u ∈ U0} = 0 < J∗. However, for the sequence of points u^k = (−1, 0, . . . , 1, 0, . . .) ∈ W_{δk} (with the entry 1 in position k) and δk = 1/k we have

ρ(u^k, K) = 0,  ρ(u^k, S) → 0,

whereas lim_{k→∞} ρ(u^k, U∗) = 1. ♦
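The computations of Example 1.2.9 can be verified directly (a sketch using a sparse dictionary representation of points in ℓ2):

```python
# points of l2 as sparse dicts {index: value}, indices starting at 1
def J(u):
    return sum(v * v / i for i, v in u.items())

def dist(u, w):
    idx = set(u) | set(w)
    return sum((u.get(i, 0.0) - w.get(i, 0.0)) ** 2 for i in idx) ** 0.5

u_star = {1: -1.0}                        # the unique solution, J* = 1
for k in (10, 100, 1000):
    u_k = {1: -1.0, k: 1.0}               # entry 1 in position k
    assert u_k[1] + 1 <= 0                                 # g1(u^k) <= 0
    assert sum(v * v for v in u_k.values()) - 2 <= 0       # g2(u^k) <= 0
    assert J(u_k) <= J(u_star) + 1.0 / k + 1e-12           # u^k lies in W_{1/k}
    assert abs(dist(u_k, u_star) - 1.0) < 1e-12            # yet rho(u^k, U*) = 1
print("u^k stays at distance 1 from U*")
```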


It should be noted that the verification of condition (1.2.10) is a common step in the investigation of the convergence of a generalized minimizing sequence (cf. Vainberg [405], Levitin / Polyak [263]).

Now we give a sufficient condition which is important for the analysis of weak well-posedness of variational inequalities in infinite-dimensional spaces.

1.2.10 Lemma. Let V1 be a finite-dimensional subspace of a Hilbert space V, Π1 : V → V1 the orthoprojector onto V1 (i.e. Π1u = arg min{‖u − v‖ : v ∈ V1} for any u ∈ V), and Π2 = I − Π1, where I denotes the identity operator on V. Assume moreover that, for any u1, u2 ∈ V and any λ ∈ [0, 1], the functional J in Problem (1.2.1) (with U = V) satisfies the inequality

J(λu1 + (1 − λ)u2) ≤ λJ(u1) + (1 − λ)J(u2) − λ(1 − λ)q(‖Π2u1 − Π2u2‖),   (1.2.12)

where q : R+ → R+ is a strictly increasing function with q(0) = 0.
Then relation (1.2.10) is a consequence of the remaining assumptions in Theorem 1.2.6.

Proof: At first we verify that, for some δ0 > 0, the set Wδ0 is bounded. Due to the Slater condition there exists a constant a > 0 (cf. Remark A3.4.44) such that for

ψ(u) = J(u) + a max[0, g(u)]

the relations

Arg min_{u∈V} ψ(u) = U∗,  min_{u∈V} ψ(u) = J∗   (1.2.13)

hold. Denote Γ = {u : ρ(u, U∗) = c} with c > 0 fixed and suppose that

inf_{u∈Γ} ψ(u) = J∗ + c1, with c1 > 0.

Because of the convexity of ψ the inclusion

{u : ψ(u) ≤ J∗ + c1} ⊂ {u : ρ(u, U∗) ≤ c}

is satisfied, and with δ0 = c1/(1 + a) it follows that

Wδ0 ⊂ {u : ψ(u) ≤ J∗ + c1}.

Now assume that

inf_{u∈Γ} ψ(u) = J∗   (1.2.14)

would be true. Obviously, inequality (1.2.12) is fulfilled for the functional ψ instead of J. Hence, for u∗ = arg min_{u∈V} ψ(u), we obtain

ψ(u) − ψ(u∗) ≥ q(‖Π2u − Π2u∗‖).   (1.2.15)

Choosing a weakly convergent sequence {v^i} ⊂ Γ with

lim_{i→∞} ψ(v^i) = J∗,   (1.2.16)

then, due to (1.2.15), it follows that

ψ(v^i) − ψ(u∗) ≥ q(‖Π2v^i − Π2u∗‖),

and in view of (1.2.16) and the properties of the function q the relation

lim_{i→∞} ‖Π2v^i − Π2u∗‖ = 0

holds. On account of the weak convergence of {v^i} and the finite dimensionality of V1 we conclude that

lim_{i→∞} ‖v^i − u∗‖ = 0,

in contradiction to the choice of the sequence {v^i} ⊂ Γ. Consequently, the case (1.2.14) is impossible, and thus boundedness of Wδ0 has been shown.
Let δi ↓ 0 and u^i ∈ Wδi. Since Wδ0 is bounded, the sequence {u^i} is also bounded, and without loss of generality we can assume that u^i ⇀ v. Due to its convexity and continuity, the functional ρ(·, K) is weakly lower semi-continuous, and from

lim_{δ↓0} sup_{u∈Wδ} ρ(u, K) = 0

we obtain

lim_{i→∞} ρ((u^i + v)/2, K) = lim_{i→∞} ρ(u^i, K) = ρ(v, K) = 0.

Hence, J(v) ≥ J∗. Since J is a weakly lower semicontinuous functional, we get

lim inf_{i→∞} J(u^i) ≥ J(v),  lim inf_{i→∞} J((u^i + v)/2) ≥ J(v).

However, from the definition of Wδi it follows that lim sup_{i→∞} J(u^i) ≤ J∗. Thus, J(v) = J∗, and from

J((u^i + v)/2) ≤ (1/2)J(u^i) + (1/2)J(v) − (1/4)q(‖Π2u^i − Π2v‖)

we conclude that

‖Π2u^i − Π2v‖ → 0,

and finally, in view of the finite dimensionality of V1, the relation

‖Π1u^i − Π1v‖ → 0

holds; since v ∈ U∗, this yields (1.2.10).

The verification of the assumptions of Theorem 1.2.6 shows that in Example 1.1.6 the Slater condition fails to hold, whereas in Example 1.1.7 the required boundedness of U∗ is violated. Moreover, both problems are not weakly well-posed with respect to variations of the data in the class Φδ. From the analysis of Example 1.2.9 it is clear that condition (1.2.10) is not satisfied. Indeed, taking δk = 1/k and

Jδk(u) = ∑_{i=1, i≠k}^∞ i^{−1}u_i^2 + max{(1/k)u_k^2, 1/k},

it is easy to see that this problem is not weakly well-posed, too.


1.3 Approaches of Stabilizing Ill-posed Problems

1.3.1 Stabilizers and quasi-solutions

The idea of stabilizing functionals or stabilizers is a fundamental tool for solving ill-posed problems. Their formal introduction does not require a specification of the problems under consideration. We only need a metric space U in which the solution is sought and some information about the solutions. The content of this information is more or less reflected in the definition of the stabilizer.

In the sequel some standard notations and properties of Sobolev spaces will be used (cf. Section A1.3 in the Appendix). The symbols 〈·, ·〉m,Ω and ‖ · ‖m,Ω (respectively 〈·, ·〉m,Ω,0 and ‖ · ‖m,Ω,0) denote the scalar product and the norm of the spaces Hm(Ω) and Hm0(Ω), respectively.

1.3.1 Definition. A functional ω defined on the set Dω ⊂ U is called a stabilizer of a given problem if

(i) ω(u) ≥ 0 for all u ∈ Dω;

(ii) the set ωc = {u ∈ U : ω(u) ≤ c} is a compact subset of the metric space U for any c ≥ 0;

(iii) U∗ω ≡ Dω ∩ U∗ ≠ ∅, where U∗ is the solution set of the problem.

The functional ω is said to be a weak stabilizer, if condition (ii) is replaced by

(ii)’ set ωc is a weakly compact subset of the space U for any c ≥ 0.

It should be noted that in this context the lower semi-continuity of the functional ω follows from assumption (ii), and weak lower semi-continuity of ω can be concluded from (ii)'.

A simple example of a stabilizer in the Euclidean space Rn is given by the functional ω(u) = ‖u − ū‖2, where ū ∈ Rn is an arbitrary element.

In the Banach space C[a, b], assuming that U∗ ∩ H1(a, b) ≠ ∅, the functional ω(u) = ‖u − ū‖²_{1,(a,b)} is a stabilizer with any ū ∈ H1(a, b). In the case Ω ⊂ Rn this follows immediately from the compactness of the embedding of the spaces Hm(Ω) in C(Ω) for n < 2m (cf. Appendix, Section A1.3).

Due to the compactness of the embedding of Hm+1(Ω) in Hm(Ω) (independently of the dimension of Ω), the functional ω(u) = ‖u − ū‖m+1,Ω is a stabilizer in the space Hm(Ω) if U∗ ∩ Hm+1(Ω) ≠ ∅ and ū ∈ Hm+1(Ω).

In a reflexive Banach space U, in view of the weak compactness of a closed ball, the functional ω(u) = ‖u − ū‖2U satisfies the properties of a weak stabilizer.

Analyzing ill-posed equations of the type

Au = z₀,

one is faced with the difficulty that small perturbations of the function z₀ may exceed the range of the operator A, so that classical solutions no longer exist for the perturbed equation.

In order to deal with this difficulty, Ivanov [195] and Ivanov et al. [196] suggested the following generalization of the notion of a solution.


1.3. APPROACHES OF STABILIZING ILL-POSED PROBLEMS 15

1.3.2 Definition. An element u attaining the minimum of the residue ρ_Z(Au, z₀) on a set M is called a quasi-solution of the equation Au = z₀ on M. ♦

Obviously, if z₀ ∈ AM, then the corresponding quasi-solution coincides with the solution in the classical sense.

The assumption that ρ_Z attains its minimum on the set M is an essential restriction with respect to the choice of M. In some papers the condition that M is the set of well-posedness of the considered problem is part of the definition of the quasi-solution.
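For a concrete illustration of Definition 1.3.2, consider the following sketch (our own toy example, not taken from the text): the operator A : R → R², Au = (u, u), the perturbed datum z_δ = (1, 1.2), which lies outside the range of A, and the compact set M = [−2, 2] are all ad hoc choices. Since Au = z_δ has no classical solution, the quasi-solution minimizes the residue over M.

```python
# Hypothetical toy example (not from the text): quasi-solution in the sense
# of Definition 1.3.2 for A: R -> R^2, Au = (u, u).  The datum z = (1.0, 1.2)
# lies outside the range of A, so no classical solution of Au = z exists;
# the quasi-solution minimizes the residue ||Au - z|| over M = [-2, 2].

def residue(u, z):
    # Euclidean residue ||Au - z|| with Au = (u, u)
    return ((u - z[0]) ** 2 + (u - z[1]) ** 2) ** 0.5

def quasi_solution(z, M=(-2.0, 2.0), n=40001):
    # brute-force minimization of the residue over a grid on M
    a, b = M
    grid = (a + (b - a) * k / (n - 1) for k in range(n))
    return min(grid, key=lambda u: residue(u, z))

u_q = quasi_solution((1.0, 1.2))   # minimizer of (u-1)^2 + (u-1.2)^2: u = 1.1
```

If the datum happens to lie in AM (e.g. z = (0.5, 0.5)), the quasi-solution coincides with the classical solution, in line with the remark above.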

1.3.2 Classical approaches for stabilization of ill-posed problems

In order to introduce the basic ideas of the classical methods for solving ill-posed problems, we choose as an initial model the operator equation of the first kind

Au = z0, (1.3.1)

with A a continuous operator. The integral equation in Example 1.1.5 is a representative of such a problem.

We assume that the operator A is given exactly and that Problem (1.3.1) is solvable for the exact right-hand side z₀. Instead of z₀ an approximation z_δ is considered such that

ρ_Z(z₀, z_δ) ≤ δ.

Given (z_δ, A, δ), an element u_δ is sought such that for δ ↓ 0 we get convergence of u_δ to the solution u* (or to the solution set U*) of (1.3.1) in a suitable sense.

The following three variational approaches for constructing such u_δ are well-known and guarantee the required properties.

Quasi-solution method:
Minimize the residue ρ_Z(Au, z_δ) on a set M ⊂ U selected such that it contains a solution of Problem (1.3.1). For the moment let M be compact.

If ω is a stabilizer and {u ∈ U : ω(u) ≤ c} ∩ U* ≠ ∅, a possible choice of M is the set ω_c = {u ∈ U : ω(u) ≤ c}.

Residual method:
Minimize the stabilizer ω on the set

{u ∈ D_ω : ρ_Z(Au, z_δ) ≤ ϕ(δ)},

where ϕ(δ) ≥ δ, ϕ(δ) → 0 for δ ↓ 0.

Tikhonov regularization method:
Minimize the functional ρ²_Z(Au, z_δ) + αω(u) on its definition domain for a suitably chosen value α(δ) > 0 of the parameter α.

Obviously, in order to get a solution of the original Problem (1.3.1), it is necessary that δ tends to zero.
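The interplay of the noise level δ and the regularization parameter α in the Tikhonov method can be made concrete by a small numerical sketch (our own illustration; the nearly singular 2×2 matrix, the noise level and the a priori rule α = δ are ad hoc assumptions):

```python
# Hypothetical toy problem: A is nearly singular, so solving A u = z_delta
# directly is unstable, while minimizing ||A u - z_delta||^2 + alpha ||u||^2
# (Tikhonov regularization) stays close to the exact solution u* = (1, 1).

def solve2(a11, a12, a21, a22, b1, b2):
    # Cramer's rule for a 2x2 linear system
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

A = ((1.0, 1.0), (1.0, 1.0001))           # nearly singular operator
z0 = (2.0, 2.0001)                         # exact datum, z0 = A(1, 1)
delta = 1e-5
z_delta = (z0[0] + delta, z0[1] - delta)   # perturbed right-hand side

def tikhonov(z, alpha):
    # normal equations (A^T A + alpha I) u = A^T z of the regularized problem
    m11 = A[0][0] ** 2 + A[1][0] ** 2 + alpha
    m12 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
    m22 = A[0][1] ** 2 + A[1][1] ** 2 + alpha
    b1 = A[0][0] * z[0] + A[1][0] * z[1]
    b2 = A[0][1] * z[0] + A[1][1] * z[1]
    return solve2(m11, m12, m12, m22, b1, b2)

u_naive = solve2(A[0][0], A[0][1], A[1][0], A[1][1], *z_delta)  # unstable
u_reg = tikhonov(z_delta, alpha=delta)     # a priori rule alpha ~ delta
```

Here the naive solution lies far from u* although the data error is only 10⁻⁵, while the regularized one is accurate to roughly the noise level; this stabilizing effect is exactly what the three methods above are designed to provide.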

Due to the continuity and non-negativity of the function ρ_Z and in view of the compactness of the sets M and ω_c, in each of the three methods above it holds that for any z_δ and any c there exists an element u_δ. Moreover, if the operator A is linear and a weak stabilizer ω is used, then the transformed problems have a solution, too.

For the investigation of the convergence of u_δ, δ ↓ 0, the following result is important.

1.3.3 Proposition. Let U and Z be Banach spaces, let M ⊂ D_T ⊂ U be compact, and let the continuous operator T : U → Z be an injection from M onto TM. Then the inverse operator T⁻¹ is continuous on TM.

1.3.4 Remark. If the operator T : U → Z is linear, continuous and injective and M is a composition such that

M = M′ + M′′ ⊂ D_T ⊂ U, M′ ∩ M′′ = ∅ or M′ ∩ M′′ = {0},

with M′ a compact set and M′′ a finite-dimensional subspace of U, then T⁻¹ is continuous on TM. ♦

In particular, if the operator A in (1.3.1) satisfies the hypotheses of Proposition 1.3.3, we obtain immediately the convergence of the quasi-solution method. However, if in addition it is supposed that A is linear, the set M is convex, and the space Z is strongly convex (which is the case if Z is a Hilbert space), then the problem

min_{u∈M} ρ_Z(Au, z)

is well-posed for any z.

Now we come back to the variational problem (1.2.1) and the class of variations Ψ_δ given by (1.2.2). Denote by ind(·|Q) the indicator functional of Q, i.e.,

ind(u|Q) = 0 if u ∈ Q, +∞ if u ∉ Q.

By substituting formally J_δ(u) + ind(u|U(ϕ_δ)) for ρ_Z(Au, z_δ), the quasi-solution and regularization methods can be formulated analogously for Problem (1.2.1), whereas, for an analogue of the residual method, we have to replace ρ_Z(Au, z_δ) by

J_δ(u) + ind(u|U(ϕ_δ)) − J*.

It should be noted that if A is a linear continuous operator, then the functionals ‖Au − z‖ and ‖Au − z‖² are convex and continuous, and the results obtained below for Problem (1.2.1) can be re-translated to the equation Au = z.

To make use of the generally accepted notion of solution methods for ill-posed problems in the approaches considered above, it should be taken into account that, in fact, we deal only with stable approximations of ill-posed problems. This enables us to use numerical methods which are designed for well-posed problems.

1.3.5 Proposition. For the variational problem (1.2.1) assume that the optimal set U* is non-empty, that the constraint u ∈ K is given exactly and that the approximation J_δ of the objective functional J satisfies ‖J − J_δ‖_{C(U₀)} ≤ δ. Moreover, suppose that the parameters α and δ in the Tikhonov regularization method are chosen such that δ/α ≤ c, with fixed c and α ↓ 0. Then


(i) for any stabilizer ω each of the three methods is defined, the corresponding sequence {u_δ}, δ ↓ 0, belongs to some compact set and any cluster point of this sequence belongs to U*;

(ii) if ω is a weak stabilizer and J_δ is a weakly lower semicontinuous functional, then {u_δ} is contained in a weakly compact set and the weak cluster points belong to U*.

Proof: Here we do not use the detailed description of the feasible set K given in (1.2.1). It suffices to assume that K is a convex, closed set and that

‖J − J_δ‖_{C(K̄)} ≤ δ,

where K̄ is some set containing K. Let ω be a stabilizer according to Definition 1.3.1.

For the quasi-solution method, due to the definition of u_δ, one gets u_δ ∈ M and J_δ(u_δ) ≤ J_δ(u*) with u* ∈ M ∩ U*. Hence, since ‖J − J_δ‖_{C(U₀)} ≤ δ, it holds that

J(u_δ) ≤ J* + 2δ, u_δ ∈ M. (1.3.2)

For the residual method, in view of the inequalities

δ ≤ ϕ(δ), ‖J − J_δ‖_{C(U₀)} ≤ δ,

it follows that

J_δ(u*) ≤ J* + ϕ(δ) for any δ and u* ∈ U* ∩ D_ω.

Consequently, observing the definition of u_δ, all points u_δ have to belong to the compact set ω* = {u : ω(u) ≤ ω(u*)}. On account of J_δ(u_δ) ≤ J* + ϕ(δ) we have

J(u_δ) ≤ J* + ϕ(δ) + δ, u_δ ∈ ω*. (1.3.3)

Finally, consider the case that u_δ is defined by means of the Tikhonov regularization method. Let Q = {u : ω(u) ≤ ω(u*) + 2δ/α}. For any point u ∈ K \ Q, due to

J_δ(u) − J_δ(u*) ≥ −2δ for u ∈ K, (1.3.4)

we have

J_δ(u) + αω(u) > J_δ(u*) + αω(u*).

Hence, u_δ ∈ Q. Because 2δ/α ≤ 2c, the sequence {u_δ} belongs to the set

ω*_{2c} = {u : ω(u) ≤ ω(u*) + 2c}.

Together with the definition of u_δ this yields

J(u_δ) + αω(u_δ) ≤ J(u*) + αω(u*) + 2δ, u_δ ∈ ω*_{2c}. (1.3.5)

Now, from the relations (1.3.2), (1.3.3) and (1.3.5) it follows that for each of the methods the sequence {u_δ} is contained in some compact set. Choosing convergent subsequences and taking limits in the inequalities (1.3.2), (1.3.3) and (1.3.5), respectively, the assertion follows.

In the case that ω is a weak stabilizer, the arguments are analogous. The weak lower semicontinuity of J_δ yields the existence of u_δ.

Under the assumptions on the stabilizer made so far, stronger results on the convergence cannot be expected, since we cannot exclude that the set

U* ∩ {u : ω(u) ≤ ω(u*)}

contains more than one point, and it is even possible that

U* ⊂ {u : ω(u) ≤ ω(u*)}.

But if the hypotheses of Theorem 1.2.6 hold and, additionally, the functional J_δ is convex and the stabilizer ω is strongly convex and weakly lower semicontinuous, then, due to Theorem 1.2.6, the existence of a value δ₀ > 0 is ensured such that, for δ ∈ (0, δ₀], the problem of finding an element u_δ, determined by the residual as well as by the regularization method, is well-posed.

The following result shows some relations between the iterates of the considered methods. We emphasize that convexity properties of the original problem are not used and that the result can be proved under rather weak assumptions on the functional J_δ (cf. Liskovec [275], Chapt. 2).

1.3.6 Proposition. Denote

u^α_δ = arg min_{u∈U} {J_δ(u) + ind(u|K) + αω(u)}.

Then u^α_δ is a quasi-solution for M = ω_c with c = ω(u^α_δ). The same solution u^α_δ can also be obtained by means of the residual method if ϕ(δ) = J_δ(u^α_δ) − J* is chosen.

If u_δ is uniquely determined by means of the residual method, then this point is also a quasi-solution for M = ω_c with c = ω(u_δ).

1.3.7 Remark. One should be aware that the set M ∩ U* with

M = {u : ω(u) ≤ ω(u^α_δ)}

may be empty. Indeed, in the example with

J(u) = (u − 1)², K = R, ω(u) = u², J_δ = J,

we get u^α_δ = 1/(1 + α), and M does not contain the point u* = 1, which is the unique solution of the original problem.

This example also demonstrates that the point u_δ is not always obtainable by means of the quasi-solution method. However, u^α_δ and u_δ are quasi-solutions in the sense of Definition 1.3.2. ♦
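The closed form u^α_δ = 1/(1 + α) from the remark is easy to verify numerically; the following sketch (our own check, with α = 0.1 as an arbitrary choice) confirms the minimizer by brute force and shows that u* = 1 indeed lies outside M = {u : ω(u) ≤ ω(u^α_δ)}.

```python
# Numerical check of the example in Remark 1.3.7: J(u) = (u - 1)^2, K = R,
# omega(u) = u^2, J_delta = J.  Stationarity of (u - 1)^2 + alpha * u^2,
# i.e. 2(u - 1) + 2 alpha u = 0, gives u_alpha = 1 / (1 + alpha).

def tikhonov_minimizer(alpha):
    return 1.0 / (1.0 + alpha)

alpha = 0.1
u_alpha = tikhonov_minimizer(alpha)

# brute-force confirmation on a grid over [-2, 2]
grid = [k / 10000.0 for k in range(-20000, 20001)]
u_grid = min(grid, key=lambda u: (u - 1.0) ** 2 + alpha * u ** 2)

# the solution u* = 1 is NOT contained in M = {u : u^2 <= u_alpha^2}
in_M = (1.0 ** 2 <= u_alpha ** 2)
```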

Therefore, from Proposition 1.3.6 we can conclude that the quasi-solution method is the most general one, and this is emphasized in a number of papers.

The question of the practical performance of the described methods is more important in our context. The regularization method seems to be the simplest one. In the other two methods, in addition to the condition u ∈ K, restrictions of the type u ∈ M or J_δ(u) − J* ≤ ϕ(δ) are imposed.

The choice of a suitable stabilizer ω or of a compact set M which contains a solution point is usually very difficult in practice. For example, in some elliptic variational inequalities with a linear differential operator of the second order (cf. Appendix A2.1 and Chapter 8), we have to minimize a quadratic functional

J(u) = ½〈∇u, ∇u〉_{0,Ω} − 〈f, u〉_{0,Ω}

(or an even more general functional J) on a closed and convex set K ⊂ H¹(Ω). For such a problem usually the stabilizer ω(u) = ‖u − ū‖²_{2,Ω} is used, and there is no hope that another stabilizer could be chosen which imposes weaker conditions on the smoothness of the function u. In this case the order of the differential operator corresponding to the functional to be minimized in the regularization or residual method is higher than the order of the initial differential operator, and a solution has to be sought in the function space H²(Ω), not in H¹(Ω), where the initial problem is given. In particular, this excludes the use of piecewise affine approximations in the finite element method and requires higher order approximations, leading to a greater numerical expense.
One should also be aware, however, that due to the lower order of smoothness of the solution of the variational inequality, the use of higher order elements will not lead to an improvement of the asymptotical behavior of the error (cf. the end of Appendix A2.2 and Chapter 8).

Similar difficulties arise in quasi-solution methods if the constraint ‖u − ū‖_{2,Ω} ≤ c is used for the construction of the set M.

It should be noted that for some other classes of problems, like integral equations, differentiation of functions, spline approximation etc., results are known (cf. Lavrentjev / Romanov / Shishatski [259], Hofmann [185]) which enable us to construct stabilizers or compact sets of well-posedness on the basis of additional a priori information. In this way the structure of the auxiliary problems, which have to be solved by applying the above methods, is not essentially more complicated than that of the original problem.

In the sequel we analyze Tikhonov's regularization method based on the use of weak stabilizers. In Section 1.3.3 more restrictive rules for the adaptation of the parameters α and δ will be given such that convergence in the metric ρ_U of the minimizing sequence {u_δ} to the optimal set U* or to a point in U* can be proved.

In our opinion, for convex variational problems the quasi-solution and residual methods are less efficient; this conclusion may be precipitate, but it is based on the insufficient algorithmic development of these methods for nonlinear problems.
As mentioned above, in the quasi-solution method, using a weakly compact set M, we can only expect weak convergence of {u_δ}. In Vasiljev [408], Chapt. 2, Sect. 7, a modification with successive adaptation of the set M is given which enforces convergence in the metric ρ_U. However, the numerical procedure is essentially more complicated in comparison with the original version.
In the residual method the value J* has to be known a priori or has to be specified during the solution process, which leads to additional expense. However, for the solution of equations and some special variational problems the value J* is mostly known.


1.3.3 Tikhonov regularization

In Proposition 1.3.9 below convergence of the regularization method will be stated for Problem (1.2.1) under the assumption that the approximations of the objective functional, the constraints and the weak stabilizer ω are successively improved within the solution process.

Problem (1.2.1) is considered under the additional assumption that U is a Hilbert space and that B maps into R^m. The class of approximate data {J_i}, {B_i} and {ω_i} is defined by

‖J − J_i‖_{C(D_ω)} ≤ δ_i, (1.3.6)

max_{u∈D_ω} |B(u) − B_i(u)|₁ ≤ σ_i ≤ δ_i, (1.3.7)

‖ω − ω_i‖_{C(D_ω)} ≤ δ_i, (1.3.8)

where |v|₁ = max_{1≤k≤m} |v_k| and {σ_i}, {δ_i} are sequences of non-negative numbers converging to 0 with sup δ_i < 1. Let

K_i = {u ∈ D_ω : B_i(u) − τ_i 1I ≤ 0}, (1.3.9)

θ_i(u) = J_i(u) + α_i ω_i(u), (1.3.10)

with 1I = (1, ..., 1)^T ∈ R^m, τ_i = σ_i/(1 − δ_i), α_i > 0, α_i ↓ 0.

It is easy to see that K_i ⊃ K ∩ D_ω.
Fixing a sequence {ε_i} with ε_i > 0, ε_i ↓ 0, the sequence of iterates {u^i} is defined by

u^i ∈ K_i, θ_i(u^i) ≤ inf_{u∈K_i} θ_i(u) + ε_i, i = 1, 2, ... (1.3.11)
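The mechanism of the scheme (1.3.11) can be illustrated by a simplified sketch (our own toy instance, with exact data, exact minimization and K = R², so that K_i = K): the objective J(u) = (u₁ + u₂ − 2)² has the whole line u₁ + u₂ = 2 as its solution set, and with ω(u) = ‖u‖² the minimizers of θ_i = J + α_i ω approach the minimum-norm solution (1, 1) as α_i ↓ 0.

```python
# Hypothetical toy instance of scheme (1.3.11) with exact data:
# J(u) = (u1 + u2 - 2)^2, omega(u) = u1^2 + u2^2, alpha_i -> 0.
# Stationarity of theta(u) = J(u) + alpha * omega(u), namely
# 2(u1 + u2 - 2) + 2 alpha u_k = 0 for k = 1, 2, forces u1 = u2 = t
# with t = 2 / (2 + alpha).

def regularized_minimizer(alpha):
    t = 2.0 / (2.0 + alpha)
    return (t, t)

alphas = [2.0 ** (-i) for i in range(1, 21)]        # alpha_i decreasing to 0
iterates = [regularized_minimizer(a) for a in alphas]
u_last = iterates[-1]    # close to the minimum-norm solution (1, 1)
```

The distance of the i-th minimizer to (1, 1) equals α_i/(2 + α_i) and decreases monotonically; convergence to the ω-normal solution is what Proposition 1.3.9 below establishes in general.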

1.3.8 Definition. A point u*_ω ∈ U*_ω ≡ D_ω ∩ U* is called an ω-normal solution of Problem (1.2.1) if ω(u*_ω) = inf{ω(u) : u ∈ U*_ω}. ♦

The set of ω-normal solutions will be denoted by U**_ω. Since U* is weakly closed, because of its convexity and closedness, and the set

ω* = {u : ω(u) ≤ ω(u*)}

is weakly compact for a fixed u* ∈ U*, the set U* ∩ ω* is weakly compact. Together with the weak lower semicontinuity of ω this proves that U**_ω ≠ ∅.

1.3.9 Proposition. Let the original assumptions concerning Problem (1.2.1) be fulfilled, where U is a Hilbert space and Y = R^m. Assume further that the Slater condition is satisfied, U* ≠ ∅ and that ω is strongly convex (with a constant κ > 0), non-negative and continuous on D_ω = U₀. Moreover, the approximations J_i, ω_i, B_i are assumed to be weakly lower semicontinuous, and (1.3.6) – (1.3.8) hold together with

lim_{i→∞} (δ_i + α_i + ε_i) = 0, lim_{i→∞} (ε_i + δ_i)/α_i = 0. (1.3.12)

Then the iterates u^i, defined by (1.3.9) – (1.3.11), exist for all i and the sequence {u^i} converges in the norm of the space U to the unique ω-normal solution u*_ω of Problem (1.2.1).


For the sake of simplicity, we will prove this statement only for the case that the feasible set K is given exactly. For a complete proof of Proposition 1.3.9 the reader is referred to Vasiljev [408], Thm. 2.8.1.

Proof: Since ω is strongly convex and weakly lower semicontinuous, the set ω_d = {u : ω(u) ≤ d} is bounded and weakly closed for all d. Hence, ω is a weak stabilizer. In view of σ_i ≡ 0 and (1.3.11) we have

u^i ∈ K, θ_i(u^i) ≤ inf_{u∈K} θ_i(u) + ε_i, i = 1, 2, ... (1.3.13)

Due to the inequalities (1.3.6) – (1.3.8), we obtain for arbitrary u ∈ K

θ_i(u) ≥ J(u) + α_i ω(u) − δ_i(1 + α_i) ≥ J* − δ_i(1 + α_i); (1.3.14)

consequently, the iterates u^i exist for all i. Since ω is a strongly convex functional, the ω-normal solution of Problem (1.2.1) is unique.
Using the non-negativity of ω, the properties of the parameters {α_i}, {ε_i}, {δ_i} and the relations (1.3.13), (1.3.14), one gets

J(u*_ω) ≤ J(u^i) ≤ J(u^i) + α_i ω(u^i) ≤ θ_i(u^i) + δ_i(1 + α_i)
≤ inf_{u∈K} θ_i(u) + ε_i + δ_i(1 + α_i) ≤ θ_i(u*_ω) + ε_i + δ_i(1 + α_i)
≤ J(u*_ω) + α_i ω(u*_ω) + 2δ_i(1 + α_i) + ε_i. (1.3.15)

Thus,

ω(u^i) ≤ ω(u*_ω) + (1/α_i)[2δ_i(1 + α_i) + ε_i] =: ω(u*_ω) + γ_i, (1.3.16)

J(u^i) ≤ J* + μ_i, (1.3.17)

with γ_i = (1/α_i)[2δ_i(1 + α_i) + ε_i], μ_i = α_i ω(u*_ω) + 2δ_i(1 + α_i) + ε_i, and on account of (1.3.12),

lim_{i→∞} γ_i = 0, lim_{i→∞} μ_i = 0. (1.3.18)

Using the definition of a weak stabilizer and the relations (1.3.16), (1.3.18), boundedness of the sequence {u^i} can be concluded. Hence, one can select a weakly convergent subsequence {u^{i_k}}, u^{i_k} ⇀ v*, and in view of the closedness of K, v* ∈ K.
Taking limits in the inequalities (1.3.16), (1.3.17) and due to the uniqueness of the ω-normal solution, we conclude that v* = u*_ω.
Obviously, the sequence ½(u^{i_k} + u*_ω) also converges weakly to u*_ω. According to Definition A1.5.20, strong convexity of ω leads to

(κ/4)‖u^{i_k} − u*_ω‖² ≤ ½[ω(u^{i_k}) + ω(u*_ω)] − ω(½(u^{i_k} + u*_ω)) (1.3.19)

and, since ω is weakly lower semicontinuous,

lim inf_{k→∞} ω(½(u^{i_k} + u*_ω)) ≥ ω(u*_ω), lim inf_{k→∞} ω(u^{i_k}) ≥ ω(u*_ω).

The latter inequality together with (1.3.16) implies

lim_{k→∞} ω(u^{i_k}) = ω(u*_ω).


Now, in view of (1.3.19), we obtain

lim_{k→∞} ‖u^{i_k} − u*_ω‖ = 0,

and in order to get

lim_{i→∞} ‖u^i − u*_ω‖ = 0,

it is sufficient to note that {u^{i_k}} was an arbitrarily chosen weakly convergent subsequence.

Even if we assume that the problem data are given exactly, we cannot avoid applying special methods designed for ill-posed problems. Indeed, by means of the examples given in Section 6.1 we will see that methods which are suitable for well-posed problems are not practicable for ill-posed ones, even if we suppose in addition that the computations are performed exactly.

1.3.4 Proximal point regularization

Now we turn to the method mainly investigated and propagated in this book for solving ill-posed variational problems, the so-called proximal point method (PPM). This method was suggested by Martinet [287, 288] and became the starting point for a new regularization approach.
The PPM is based on the proximal point mapping

Prox_{f,C} u := arg min_{v∈C} {f(v) + (χ_k/2)‖v − u‖²}, u ∈ V,

where C is supposed to be a non-empty, closed and convex set in a Hilbert space V, f : V → R is a lower semi-continuous functional and

0 < χ̲ ≤ χ_k ≤ χ̄

is a sequence of regularization parameters.

In comparison to the Tikhonov regularization approach, the main advantage of the PPM consists in the fact that the regularized functional

Ψ_{χ_k,u^k}(·) := f(·) + (χ_k/2)‖· − u^k‖²

is strongly convex for all k and its regularizing effect does not vanish for k → ∞. For more properties of this mapping see Section 3.1.

In the following chapters iterative proximal point regularization (PPR) will be studied in more detail for different classes of problems. In this section we restrict our attention to some regularizing properties of these methods, taking the same point of view as in the study of the methods in the foregoing sections.

1.3.10 Remark. As long as the regularization parameter χ_k stays away from zero, the convergence theory of the PPM is not affected by the value of χ_k. Therefore, when describing basic properties of the proximal point approach, we sometimes set for the sake of simplicity χ_k ≡ 2 for all k.
However, as we will see in the numerical examples, a correctly chosen value of χ_k, although in general unknown, influences the performance of the method and has to be properly adapted when solving particular problems. ♦


We consider again Problem (1.2.1) with U = V being a Hilbert space and the feasible set K being given exactly. Let {δ_i}, {γ_i} and {ε_i} be non-negative sequences converging to 0, γ₀ = sup γ_i, K̄ = {u ∈ V : ρ(u, K) ≤ γ₀}, and let {J_i} be a family of continuous functionals satisfying the condition

sup_{u∈K̄} |J_i(u) − J(u)| ≤ δ_i, i = 1, 2, ... (1.3.20)

Proximal Point Regularization (PPR):
For an arbitrarily chosen u⁰ ∈ K̄ compute u^i according to

ρ(u^i, K) ≤ γ_i, (1.3.21)

Ψ_i(u^i) ≤ min_{u∈K̄} Ψ_i(u) + ε_i, (1.3.22)

with Ψ_i(u) = J_i(u) + ‖u − u^{i−1}‖².
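As a minimal illustration of the iteration (1.3.21) – (1.3.22), the following sketch (our own toy instance with exact data: J_i = J, γ_i = ε_i = δ_i = 0 and K = R²) applies exact prox steps to a degenerate convex objective, J(u) = (u₁ + u₂ − 2)², whose minimizers form a whole line. Each step has a closed form obtained from the stationarity conditions, and the iterates converge to the particular solution nearest the starting point, consistent with Proposition 1.3.11 below.

```python
# Hypothetical toy PPR run with exact data: each step minimizes
#   (v1 + v2 - 2)^2 + ||v - u_prev||^2,
# whose stationarity conditions (v1 + v2 - 2) + (v_k - u_prev_k) = 0, k = 1, 2,
# give v1 + v2 = (4 + u_prev_1 + u_prev_2) / 3 in closed form.

def prox_step(u):
    a, b = u
    s = (4.0 + a + b) / 3.0            # value of v1 + v2 at the minimizer
    return (a - (s - 2.0), b - (s - 2.0))

u = (0.0, 0.0)                          # starting point u^0
for _ in range(60):
    u = prox_step(u)
# each step preserves the difference v1 - v2, so the iterates converge to the
# point of the solution line u1 + u2 = 2 nearest the start, here (1, 1)
```

Unlike Tikhonov regularization, no vanishing parameter is needed here: the quadratic term keeps every subproblem strongly convex with a fixed modulus, which is the conditioning advantage discussed at the end of this section.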

1.3.11 Proposition. Let J : V → R be a convex functional satisfying a Lipschitz condition on V with constant L. Assume that K ⊂ V is a convex, closed set, U* = Arg min{J(u) : u ∈ K} is non-empty and {ε_i}, {γ_i}, {δ_i} are given such that

Σ_{i=1}^∞ √ε_i < ∞, Σ_{i=1}^∞ √γ_i < ∞, Σ_{i=1}^∞ √δ_i < ∞. (1.3.23)

Then the sequence {u^i} converges weakly to some element ū ∈ U*.

Proof: Denote Ψ_i(u) = J(u) + ‖u − u^{i−1}‖² and let v^i be the nearest point to u^i on K, i.e., v^i = arg min{‖u^i − v‖ : v ∈ K}. Due to (1.3.20) and (1.3.22) we obtain

Ψ_i(u^i) ≤ min_{u∈K̄} Ψ_i(u) + ε_i + 2δ_i ≤ Ψ_i(v^{i−1}) + ε_i + 2δ_i (1.3.24)

and, because of (1.3.21), the estimate ‖v^i − u^i‖ ≤ γ_i holds. Together with the Lipschitz property assumed for J, we get

Ψ_i(v^{i−1}) ≤ Ψ_i(u^{i−1}) + Lγ_{i−1} + γ²_{i−1}.

Therefore,

J(u^i) + ‖u^i − u^{i−1}‖² ≤ J(u^{i−1}) + ε_i + 2δ_i + Lγ_{i−1} + γ²_{i−1}, (1.3.25)

and finally

J(u^i) ≤ J(u^{i−1}) + ε_i + 2δ_i + Lγ_{i−1} + γ²_{i−1}.

Using Lemma A3.1.4 and (1.3.23) we conclude that the sequence {J(u^i)} converges, and inequality (1.3.25) leads to

lim_{i→∞} ‖u^i − u^{i−1}‖ = 0. (1.3.26)

In view of

|J(u^i) + ‖u^i − u^{i−1}‖² − J(v^i) − ‖v^i − u^{i−1}‖²| ≤ L‖u^i − v^i‖ + ‖u^i − u^{i−1} + v^i − u^{i−1}‖ · ‖u^i − v^i‖

and the relations (1.3.21), (1.3.23) and (1.3.26), it follows that

‖u^i − u^{i−1} + v^i − u^{i−1}‖ → 0.

Hence, for some κ > 0 we have

|Ψ_i(u^i) − Ψ_i(v^i)| ≤ (L + κ)γ_i for all i,

and this implies

Ψ_i(v^i) ≤ Ψ_i(u^i) + (L + κ)γ_i for all i.

Defining ū^i = arg min_{u∈K} Ψ_i(u), the latter inequality together with (1.3.20) and (1.3.22) yields

Ψ_i(u^i) ≤ Ψ_i(ū^i) + ε_i + 2δ_i (1.3.27)

and consequently,

Ψ_i(v^i) ≤ Ψ_i(ū^i) + ε_i + 2δ_i + (L + κ)γ_i. (1.3.28)

In view of the strong convexity of Ψ_i and the choice of the point ū^i, we also get the relation

Ψ_i(v^i) − Ψ_i(ū^i) ≥ 〈q, v^i − ū^i〉 + ¼‖v^i − ū^i‖² (1.3.29)

with 〈q, v^i − ū^i〉 ≥ 0 for some element q of the subdifferential ∂Ψ_i(ū^i).

Thus, from (1.3.28) and (1.3.29) the estimate

‖ū^i − v^i‖ ≤ 2(ε_i + 2δ_i + (L + κ)γ_i)^{1/2}

follows and, observing (1.3.21),

‖ū^i − u^i‖ ≤ 2(ε_i + 2δ_i + (L + κ)γ_i)^{1/2} + γ_i =: γ̄_i. (1.3.30)

Now, assumption (1.3.23) ensures that the series Σ_{i=1}^∞ γ̄_i converges.

Because the mapping

Prox_{J,K} : u → arg min_{v∈K} {J(v) + ‖v − u‖²}

is non-expansive (see Remark 3.1.2) and a fixed-point mapping, i.e.,

Prox_{J,K} u* = u* for u* ∈ U*,

we get ‖ū^{i+1} − u*‖ ≤ ‖u^i − u*‖. On account of (1.3.30) this gives

‖u^{i+1} − u*‖ ≤ ‖u^i − u*‖ + γ̄_{i+1}.

Now, applying Polyak's Lemma A3.1.4 and (1.3.23), we get convergence of the sequence {‖u^i − u*‖} and thus boundedness of {u^i}.
Since ū^{i+1} is the minimum point of the functional Ψ_{i+1} on K, Proposition A1.5.34 (inequality (A1.5.24)) with J₂ := J and J₁(u) := ‖u − u^i‖² implies that for fixed u* ∈ U* we have

J(u*) − J(ū^{i+1}) + 2〈ū^{i+1} − u^i, u* − ū^{i+1}〉 ≥ 0. (1.3.31)


With (1.3.26) and (1.3.30) it follows that

‖ū^{i+1} − u^i‖ → 0. (1.3.32)

Thus, relation (1.3.31) leads to

J(u*) ≥ lim sup_{i→∞} J(ū^i) ≥ lim inf_{i→∞} J(ū^i) ≥ J(u*),

i.e., lim_{i→∞} J(ū^i) = J(u*). By means of the Lipschitz property of J and (1.3.32) we conclude in addition that

lim_{i→∞} J(u^i) = J(u*). (1.3.33)

Now, due to the boundedness of {u^i}, there exists a subsequence {u^{i_j}} which converges weakly to some point ū. Moreover, because of the weak closedness of K and (1.3.33), we obtain ū ∈ U*. Observing that for any u* ∈ U* the sequence {‖u^i − u*‖} converges, in view of Opial's lemma (Proposition A1.1.3) the weak convergence of {u^i} to ū follows immediately.

It should be noted that, using the inequalities (1.3.23) and (1.3.30), the statement of Proposition 1.3.11 follows immediately from Theorem 1 in Rockafellar [352].

1.3.12 Remark. Proposition 1.3.11 remains true if the functionals J_i are convex and Gateaux-differentiable and, instead of (1.3.21) and (1.3.22), the condition

‖∇Ψ_i(u^i) − ∇Ψ_i(ū^i)‖ ≤ ε_i

on the gradients of Ψ_i is used, with ū^i = arg min_{u∈K} Ψ_i(u) (see the relations between (A1.5.28) and (A1.5.31)). ♦

1.3.13 Remark. Proposition 1.3.11 can easily be extended to the case where the feasible set K is bounded and, instead of K, closed and convex sets K_i are considered with ρ_H(K, K_i) ≤ σ_i. In this case the conditions (1.3.21) and (1.3.22) in the proximal method above have to be modified to

ρ(u^i, K_i) ≤ γ_i, (1.3.34)

Ψ_i(u^i) ≤ min_{u∈K_i} Ψ_i(u) + ε_i, (1.3.35)

and, in addition to (1.3.23), it must be assumed that Σ_{i=1}^∞ √σ_i < ∞.

In order to establish the convergence of this modified method, we show that (1.3.34) and (1.3.35) imply the validity of the inequalities (1.3.21) and (1.3.22) with modified constants γ_i and ε_i. To this end we prove that the estimate

min_{u∈K_i} Ψ_i(u) − min_{u∈K} Ψ_i(u) ≤ (L + 4r)σ_i (1.3.36)

holds true if

{u : ρ(u, K) ≤ sup_i (γ_i + σ_i)} ⊂ S_r(0).

Indeed, if min_{u∈K_i} Ψ_i(u) ≥ min_{u∈K} Ψ_i(u) (otherwise the estimate (1.3.36) holds trivially), setting

w^i = arg min_{u∈K_i} Ψ_i(u), ũ^i = arg min_{u∈K} Ψ_i(u), z^i = arg min_{v∈K_i} ‖v − ũ^i‖,

one can conclude that

Ψ_i(z^i) ≥ Ψ_i(w^i) ≥ Ψ_i(ũ^i)

and, in view of ‖z^i − ũ^i‖ ≤ σ_i and the Lipschitz property, it holds that

Ψ_i(z^i) − Ψ_i(ũ^i) ≤ (L + 4r)σ_i.

Hence,

Ψ_i(w^i) − Ψ_i(ũ^i) ≤ (L + 4r)σ_i.

Now, using (1.3.36), we obtain from (1.3.35) that

J_i(u^i) + ‖u^i − u^{i−1}‖² ≤ min_{u∈K} {J_i(u) + ‖u − u^{i−1}‖²} + (L + 4r)σ_i + 2δ_i + ε_i,

and, due to (1.3.34) and ρ_H(K, K_i) ≤ σ_i, it follows immediately that

ρ(u^i, K) ≤ γ_i + σ_i.

In Section 4.2 we drop the assumption that the feasible set K is bounded.

Comparing Propositions 1.3.9 and 1.3.11, we see with regard to Remark 1.3.13 that for the Tikhonov method a stronger statement on the convergence of the minimizing sequence is obtained than for the PPM. However, the PPM guarantees an essentially better conditioning of the auxiliary problems and, as a consequence, a better stability of the numerical process.

In a number of papers the PPM is considered in a slightly more general setting in the sense that in (1.3.22), instead of

J_i(u) + ‖u − u^{i−1}‖²,

the functionals

J_i(u) + (χ_i/2)‖u − u^{i−1}‖²

are used, where {χ_i} is a sequence of positive regularization parameters with sup_i χ_i < ∞. If, in addition, inf_i χ_i > 0, then Proposition 1.3.11 remains true and the proof requires only some trivial modifications.

1.4 Comments

Among the approximately fifty monographs known to us which are dedicated to the theory and solution of ill-posed problems, only a minor part is devoted to variational or extremal problems (see, for instance, Tikhonov and Arsenin [395], Liskovec [275], Bakushinski and Gonkharski [30]), and only the monographs of Vasiljev [408] and Alifanov, Artyukhin and Rumyantsev [7] deal with this subject exclusively. Together with the bibliographical survey by Liskovec [276] they give a sufficiently complete overview of the investigations of the topic, with the exception of the proximal point methods.

Page 43: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

1.4. COMMENTS 27

Section 1.1: An analysis of the notion of well-posedness (see Definition 1.1.1) is given in [30]. The first notion of well-posedness was introduced by Hadamard [162, 163]. For the equation of the first kind Au = z this corresponds to G = A⁻¹, D_G = D = Z. In fact, Definition 1.1.1 coincides with the concept of conditional well-posedness introduced by Lavrentjev [258]. A comparison of the different notions of well-posedness was performed by Dontchev and Zolezzi [92].

Investigations on the ill-posedness of the Cauchy problem for the Laplace equation can be found in the papers of Pucci [336], Lavrentjev [257], Liskovec [274] and Schaefer [361]. A large number of papers is dedicated to operator equations of the first kind; we refer especially to Douglas [93], Phillips [320], Ivanov, Vasin and Tanana [196] and Kress [249].

Various types of problems are analysed by Tikhonov and Arsenin [395], Hoffmann and Sprekels [184], Hofmann [185], Baumeister [34], Romanov [354], Louis [279] and Anger [11]. For the problem of computing values of unbounded operators cf. Morozov [295].
Also recommendable are the proceedings of a conference in Oberwolfach, edited by Hammerlin and Hoffmann [164], where problems of parameter identification, free boundary and inverse problems for differential equations and also integral equations of the first kind are studied, including applications in technical fields and in medicine.

Section 1.2: Well-posedness of finite-dimensional convex optimization problems according to Definition 1.2.1 was investigated by Karmanov [232]. For a notion of well-posedness of constraints see Levitin and Polyak [263].

An analogue of Theorem 1.2.6 for a slightly different definition of weak well-posedness (in particular, convexity of J_δ and g_δ is assumed) has been proved by Eremin and Astafjev [105].

Section 1.3: The notion of a stabilizer was created in the context of the analysis of the convergence of Tikhonov regularization and residual methods. There, for the first time, the functional ‖u − ū‖² was applied in order to obtain stability of the solution procedure. Numerous examples of stabilizers and weak stabilizers for different problems (in particular, also for control problems) and spaces are given by Vasiljev [408].

The quasi-solution method was introduced by Ivanov [195] for an equation of the first kind with an exactly given continuous operator. Its further development is essentially connected with a generalization of Proposition 1.3.3 (cf. Fuller [124], Liskovec [275], Bakushinski and Gonkharski [30]). A broad field of applications of quasi-solution methods is described by Ivanov, Vasin and Tanana [196].

The question of the choice of a suitable compact set of well-posedness is of special interest for some problems in order to establish error estimates for the solution under approximately given information. This has motivated research in different areas of applications (cf. John [199], Pucci [336], Lavrentjev, Romanov and Shishatski [259]). An iteration method with a successive enlargement of the compact set of well-posedness by means of a technique from parametric programming was introduced in Tichatschke and Hofmann [387].

The residual method was first applied by Phillips [320] for solving a Fredholm integral equation of the first kind. A rigorous theoretical foundation of this method was given by Morozov [293]. The generalization of the residual method to approximately given operators was considered by Goncharski, Leonov and Jagola [142] and Morozov [295].

One of the most important contributions to the field of ill-posed problems has been the regularization method introduced by Tikhonov [392]. In Tikhonov and Arsenin [395] the method was described for different types of ill-posed problems. Its two main variants differ in the choice of the regularization parameter, which is based on an a priori rule or performed according to the residual value and the accuracy of the approximate data. Both approaches have attracted wide publicity and consideration in the literature, cf. Liskovec [275], Groetsch [150], Morozov [295], Hofmann [185], Engl, Kunisch and Neubauer [104].

Systematic investigations of discretized schemes in connection with regularization methods can be found in Liskovec [275], Vainikko [406], Hofmann [185].

Some results modifying Proposition 1.3.5 to different types of ill-posedproblems are contained in Liskovec [275], and in Vasiljev [408] its general-ization to variational problems subject to inexact constraints is described.


Chapter 2

STABLE DISCRETIZATION AND SOLUTION METHODS

2.1 Stabilizing Properties of Numerical Methods

Using the regularization techniques described in Section 1.3, all the constraints of the original variational problem are included in the regularized problems. Moreover, in the limit of the parameter δ → 0, according to the family of perturbations in (1.2.3) (δ → 0, α → 0 in Tikhonov's regularization method), we formally return to the original problem, except for some additional restrictions in the cases of the residual and quasi-solution methods.

It should also be mentioned that, in principle, the controlling parameters (for instance ε, γ, δ in the proximal point method) are chosen a priori, and the convergence properties of the considered methods do not depend on the specific algorithms used to solve the regularized problems. Therefore, the regularization techniques can be seen independently of the numerical methods for solving the approximated and regularized problems and can be considered purely as approaches to stabilize the problems.

Now let us study some standard numerical methods which themselves have a regularizing property.

2.1.1 Stabilizing properties of gradient type methods

In this subsection we consider again Problem (1.2.4), i.e.,

minimize J(u), subject to u ∈ K, (2.1.1)

where J : V → R is a convex, lower semicontinuous functional, K ⊂ V is a convex, closed set and K ∩ dom J ≠ ∅,

and assume additionally that the objective functional J is Fréchet-differentiable and convex on the Hilbert space V and that ∇J satisfies a Lipschitz condition


on V with constant L. According to Riesz' Theorem A1.1.1, ∇J is considered as an element of the space V.

We consider the gradient projection method described in Subsection A3.3.2, assuming that the gradients are approximately determined within the iteration procedure.

Thus, starting with some u0 ∈ K and choosing

0 < α < 2/L, (2.1.2)

we calculate uk+1 according to

uk+1 := ΠK(uk − αpk), (2.1.3)

where ΠK is the orthoprojector onto the set K and pk is assumed to satisfy the inequality

‖pk −∇J(uk)‖ ≤ δk, δk ↓ 0. (2.1.4)

This corresponds to the case that the set K is given exactly and the error in the approximation of the functional J becomes small in the norm of the space C¹(V).
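As a purely illustrative finite-dimensional sketch (not taken from the monograph), the scheme (2.1.2)–(2.1.4) can be run for a least-squares objective on a box; the data A, b, the box K and the summable tolerance sequence δk are assumptions made for the example.

```python
import numpy as np

# Sketch of the inexact gradient projection method (2.1.2)-(2.1.4) for
# J(u) = 0.5*||Au - b||^2 on the box K = [0, 1]^5; A, b and the summable
# error sequence delta_k are illustrative assumptions.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
b = rng.standard_normal(8)

def grad_J(u):
    return A.T @ (A @ u - b)

def proj_K(u):
    # orthoprojector onto the box [0, 1]^5
    return np.clip(u, 0.0, 1.0)

L = np.linalg.norm(A.T @ A, 2)     # Lipschitz constant of grad J
alpha = 1.0 / L                    # step size in (0, 2/L), cf. (2.1.2)

u = np.zeros(5)
for k in range(500):
    delta_k = 1.0 / (k + 1) ** 2   # summable: sum_k delta_k < infinity
    e = rng.standard_normal(5)
    p = grad_J(u) + delta_k * e / np.linalg.norm(e)   # ||p - grad J(u)|| <= delta_k
    u = proj_K(u - alpha * p)      # step (2.1.3)
```

Since the error sequence is summable, the iterates settle at a point satisfying the fixed-point condition u = ΠK(u − α∇J(u)) up to a small residual.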

Now let us investigate the convergence of this gradient projection method.

2.1.1 Proposition. (cf. Proposition A3.3.28) Suppose that the set U∗ = Arg min_{u∈K} J(u) is non-empty. Then the sequence {uk}, generated by the gradient projection method (2.1.2)–(2.1.4) with ∑_{k=0}^∞ δk < ∞, converges weakly to some u∗ ∈ U∗.

Proof: Using inequality (A1.5.22) with u1 := uk and u2 := u∗, where u∗ ∈ U∗ is chosen arbitrarily, we get

〈∇J(uk) − ∇J(u∗), uk − u∗〉 ≥ (1/L)‖∇J(uk) − ∇J(u∗)‖². (2.1.5)

For ūk+1 := ΠK(uk − α∇J(uk)), due to (2.1.4) and the non-expansivity of the projection ΠK (cf. Corollary A1.5.36), the estimate

‖ūk+1 − uk+1‖ = ‖ΠK(uk − α∇J(uk)) − ΠK(uk − αpk)‖ ≤ α‖∇J(uk) − pk‖ ≤ αδk (2.1.6)

holds true. Due to u∗ ∈ U∗ one can show that

ΠK(u∗ − α∇J(u∗)) = u∗,

and applying the inequalities (2.1.5), (2.1.6), we infer

‖ūk+1 − u∗‖² = ‖ΠK(uk − α∇J(uk)) − ΠK(u∗ − α∇J(u∗))‖²
≤ ‖uk − u∗ − α(∇J(uk) − ∇J(u∗))‖²
= ‖uk − u∗‖² − 2α〈∇J(uk) − ∇J(u∗), uk − u∗〉 + α²‖∇J(uk) − ∇J(u∗)‖²
≤ ‖uk − u∗‖² − α(2/L − α)‖∇J(uk) − ∇J(u∗)‖².

Observing that α ∈ (0, 2/L), the estimate

‖ūk+1 − u∗‖ ≤ ‖uk − u∗‖

holds, and due to (2.1.6), this implies

‖uk+1 − u∗‖ ≤ ‖ūk+1 − u∗‖ + ‖uk+1 − ūk+1‖ ≤ ‖uk − u∗‖ + αδk. (2.1.7)

In view of (2.1.7), the assumption ∑_{k=0}^∞ δk < ∞ and Lemma A3.1.4, we conclude that the sequence {‖uk − u∗‖} converges for all u∗ ∈ U∗. Furthermore, uk+1 is defined by (2.1.3), i.e. it satisfies the variational inequality

〈uk+1 − uk + αpk, u− uk+1〉 ≥ 0, ∀ u ∈ K.

Consequently,

〈pk, u − uk+1〉 ≥ (1/α)〈uk − uk+1, u − uk+1〉 (2.1.8)

and inserting u := uk, we have

〈pk, uk − uk+1〉 ≥ (1/α)‖uk − uk+1‖².

By virtue of (2.1.4) this leads to

〈∇J(uk), uk − uk+1〉 ≥ (1/α)‖uk − uk+1‖² − 〈pk − ∇J(uk), uk − uk+1〉
≥ (1/α)‖uk − uk+1‖² − δk‖uk − uk+1‖. (2.1.9)

But inequality (A1.4.10) and the Lipschitz condition yield for any u, v ∈ V

J(u) − J(v) − 〈∇J(v), u − v〉 = ∫₀¹ 〈∇J(v + t(u − v)) − ∇J(v), u − v〉 dt ≤ (L/2)‖u − v‖²,

such that

J(uk+1) − J(uk) − 〈∇J(uk), uk+1 − uk〉 ≤ (L/2)‖uk+1 − uk‖².

This together with (2.1.9) ensures that

J(uk) − J(uk+1) ≥ (1/α − L/2)‖uk − uk+1‖² − δk‖uk − uk+1‖, (2.1.10)

with 1/α − L/2 > 0 according to the choice of α. Therefore,

J(uk+1) ≤ J(uk) + δk‖uk − uk+1‖,

and because of the boundedness of the sequence {‖uk‖} and ∑_{k=0}^∞ δk < ∞, we conclude from Lemma A3.1.4 that the sequence {J(uk)} is convergent. Taking into account relation (2.1.10), it follows that

‖uk − uk+1‖ → 0. (2.1.11)

Again, regarding the boundedness of the sequence {uk}, a subsequence {ukj} can be chosen converging weakly to some element ū. The convex set K is weakly


closed, hence ū ∈ K. In view of the convexity of the functional J we obtain from (2.1.8) for all u ∈ K that

J(u) − J(uk) ≥ 〈∇J(uk), u − uk〉
≥ (1/α)〈uk − uk+1, u − uk+1〉 − 〈pk − ∇J(uk), u − uk+1〉 + 〈∇J(uk), uk+1 − uk〉 (2.1.12)
≥ (1/α)〈uk − uk+1, u − uk+1〉 − ‖∇J(uk) − pk‖‖u − uk+1‖ − ‖∇J(uk)‖‖uk+1 − uk‖.

Taking the limit in (2.1.12) for k := kj, j → ∞, and observing (2.1.4), (2.1.11) as well as the lower semicontinuity of J, we get

J(u) ≥ J(ū), ∀ u ∈ K,

which implies that ū ∈ U∗. With regard to Opial's Lemma (cf. Proposition A1.1.3) this shows that uk ⇀ ū.

2.1.2 Remark. In the case V := Rn Proposition 2.1.1 leads immediately to

lim_{k→∞} ‖uk − ū‖ = 0.

For the cases K := V and K := V = Rn corresponding results on the convergence of the resulting gradient method can be obtained, showing that the gradient method has a regularizing behavior, too (cf. Polyak [330] and Gol'stein and Tretyakov [141]). ♦

Alifanov et al. [7] investigated the application of the gradient method in order to solve operator equations Au = f with A a linear continuous operator from a Hilbert space V into a Hilbert space F. Under the conditions that the data are approximately given according to

‖Ahk −A‖ ≤ hk, ‖fδk − f‖ ≤ δk, δk + hk → 0,

where Ahk is linear and continuous, it is proved that for a suitable choice of α the sequence

uk+1 := uk − αA∗hk(Ahkuk − fδk)

converges in the norm of the space V to a solution of the equation Au = f having a minimal distance to the starting point u0. The technique of the proof uses essentially the linearity of the operators A and Ahk.
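A finite-dimensional sketch of this Landweber-type iteration (our own construction, not taken from [7]) might look as follows; the concrete ill-conditioned matrix, the noise model realizing ‖Ahk − A‖ ≤ hk, ‖fδk − f‖ ≤ δk, and the step size are all assumptions for the illustration.

```python
import numpy as np

# Gradient iteration u^{k+1} = u^k - alpha * A_k^T (A_k u^k - f_k) for a
# linear operator equation Au = f; A is a small ill-conditioned symmetric
# matrix and the data errors h_k, delta_k tend to zero (assumptions).
rng = np.random.default_rng(1)
n = 6
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag([1.0, 0.5, 0.2, 0.1, 0.05, 0.01]) @ U.T   # ill-conditioned
u_true = rng.standard_normal(n)
f = A @ u_true

alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # suitable step size
u = np.zeros(n)                           # starting point u^0 = 0
for k in range(2000):
    h_k = d_k = 1.0 / (k + 1) ** 2        # vanishing data errors
    A_k = A + h_k * np.eye(n) / n         # ||A_k - A|| <= h_k
    f_k = f + d_k * np.ones(n) / np.sqrt(n)   # ||f_k - f|| <= d_k
    u = u - alpha * A_k.T @ (A_k @ u - f_k)
```

With vanishing perturbations the iterates approach the solution of Au = f closest to the starting point, so the residual ‖Au − f‖ becomes small.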

2.1.2 Stabilizing properties of penalty methods

Let us consider Problem (A1.7.35) under somewhat stronger conditions

minimize J(u), subject to u ∈ K,
K := {u ∈ V : gj(u) ≤ 0, j = 1, ..., m}, (2.1.13)

where J, gj : V → R are convex, continuous functionals on a Hilbert space V and U∗ = Arg min_{u∈K} J(u) is non-empty and bounded.


We consider a penalty function

φk(u) = ϕk(g(u)), (2.1.14)

where the family {ϕk} is subject to the conditions formulated in the following lemma.

2.1.3 Lemma. Let {ϕk}, ϕk : Rm → R, be a family of convex functions with

lim_{k→∞} ϕk(z) = 0 if max_{1≤j≤m} zj < 0, and lim_{k→∞} ϕk(z) = +∞ otherwise. (2.1.15)

Then, for arbitrary α1 < α2 < 0, α3 > 0 and δ > 0, the functions ϕk converge uniformly

to 0 on the set {z : α1 ≤ zj ≤ α2, j = 1, ..., m};
to +∞ on the set {z : max_{1≤j≤m} zj ≥ α3} ∩ Bδ(0),

with Bδ(0) := {z ∈ Rm : ‖z‖ ≤ δ}.

Proof: The first part of the statement is an immediate consequence of Proposition A1.5.19. In order to verify the second part, we can easily adapt the proof of Theorem A3.4.41 by taking n = m, H := Rm, K := G = {z ∈ Rm : z ≤ 0}, φk² := 0, φk¹ := ϕk, J := ρ²(z, Bδ∗(0)) with a suitable δ∗.

Now, according to (2.1.13) and (2.1.14), we define

φk(·) := ϕk(g(·)), Fk(·) := J(·) + φk(·), g(·) := sup_{1≤j≤m} gj(·);
Wτ := {u ∈ V : J(u) ≤ J∗ + τ, g(u) ≤ τ}

and calculate the points uk via

F̃k(uk) ≤ inf_{u∈V} F̃k(u) + εk, k = 1, 2, · · · , (2.1.16)

where F̃k are approximately given functions specified in the following theorem.

2.1.4 Theorem. Assume that lim_{τ↓0} sup_{u∈Wτ} ρ(u, U∗) = 0, that the ϕk satisfy the conditions of Lemma 2.1.3 together with

lim_{k→∞} ϕk(z) = 0 if max_{1≤j≤m} zj = 0, (2.1.17)

and that the φk are convex and non-negative on V. Moreover, let one of the following conditions be fulfilled:

(i) Fk, F̃k : V → R are continuous on Wτ for fixed τ > 0,

‖Fk − F̃k‖_{C(Wτ)} ≤ µk, (2.1.18)

and the F̃k are convex on V, too;

(ii) the functionals Fk, F̃k are continuous on V and

‖Fk − F̃k‖_{C(V)} ≤ µk. (2.1.19)


If µk ↓ 0, εk ↓ 0 and the points uk are generated according to (2.1.16), then lim_{k→∞} ρ(uk, U∗) = 0 holds true.

Proof: Assuming that Wτ̄ is bounded for a given τ̄, we fix some τ ∈ (0, τ̄). Since J and gj are weakly lower semicontinuous functionals, the set Wτ̄ is weakly compact. Let

Uτ := {u ∈ Wτ̄ : J(u) = J∗ + τ or g(u) = τ}.

On Wτ̄ the functionals J and gj attain their minima; consequently, for u ∈ Wτ̄ with g(u) = τ, we get

τ ≥ gj(u) ≥ q, j = 1, ..., m,

with some q > −∞ not depending on u. We choose some ū ∈ K such that

J(ū) < J∗ + (1/2)τ.

Then, from Lemma 2.1.3 and (2.1.17), we conclude that there exists some iteration number k0 = k(τ) such that for k ≥ k0

Fk(u) ≥ J∗ + τ, (2.1.20)
Fk(ū) < J∗ + (1/2)τ, (2.1.21)

where (2.1.20) holds true for all points u ∈ Wτ̄ satisfying the equation g(u) = τ. If u ∈ Wτ̄ and J(u) = J∗ + τ, then inequality (2.1.20) is true for all k due to the non-negativity of φk. Hence, on account of the convexity of Fk, it follows that inequality (2.1.20) must hold on V \ Wτ for all k ≥ k0.

If condition (i) is fulfilled, then in view of (2.1.18), (2.1.20) and (2.1.21), we get

F̃k(u) > F̃k(ū) + (1/2)τ − 2µk, ∀ u ∈ Uτ. (2.1.22)

But for sufficiently large k (k ≥ k1 ≥ k0) we have (1/2)τ − 2µk − εk > 0 and, with regard to the convexity of F̃k, inequality (2.1.22) is satisfied for all u ∈ V \ Wτ. Therefore, on account of (2.1.16), it follows that uk ∈ Wτ for k ≥ k1.

If condition (ii) is fulfilled, then one can conclude immediately from (2.1.16) and (2.1.19) that

Fk(uk) ≤ inf_{u∈V} Fk(u) + 2µk + εk. (2.1.23)

With regard to (2.1.20) and (2.1.21), the inequality (1/2)τ − 2µk − εk > 0 tells us that uk ∈ Wτ. The assertion follows now from the fact that τ ∈ (0, τ̄) was chosen arbitrarily and that lim_{τ↓0} sup_{u∈Wτ} ρ(u, U∗) = 0.

2.1.5 Remark. Note that the validity of the Slater condition was not assumed. If it is satisfied, then assumption (2.1.17) is superfluous. In this case, instead of (2.1.18), we require that

F̃k(u) ≥ Fk(u) − µk on dom φk, dom φk ⊃ dom Fk,

and

lim_{k→∞} (F̃k(u) − Fk(u)) = 0 for any u ∈ Wτ ∩ int K,

i.e., condition (i) can be weakened, too. ♦


In this way, assuming Slater's condition, the conclusion of Theorem 2.1.4 remains true for so-called barrier functions, especially for the function (A3.4.78), too.

Analyzing Theorem 2.1.4 and Remark 2.1.5, we can conclude that, for a wide class of convex variational problems, including the majority of ill-posed and even not weakly well-posed problems, an appropriate adaptation of the parameters (the penalty parameter and the parameter controlling the approximation of the input data) guarantees convergence of the iterates to the solution set of the original problem.

2.1.6 Remark. As an example, consider penalty function (A3.4.77) with s = 2. Assume that Jδk and gj,δk are convex and

‖J − Jδk‖_{C(Wτ̄)} ≤ δk, ‖gj − gj,δk‖_{C(Wτ̄)} ≤ δk.

Denote

F̃k(u) := Jδk(u) + rk ∑_{j=1}^m max[0, gj,δk(u)]²,

and c := sup_{u∈Wτ̄} max_{1≤j≤m} |gj(u)| < ∞.

Then, in order to satisfy condition (2.1.18) with µk ↓ 0, it is sufficient that

lim_{k→∞} rkδk = 0.

Indeed,

‖Fk − F̃k‖_{C(Wτ̄)} ≤ ‖J − Jδk‖_{C(Wτ̄)} + rk ∑_{j=1}^m ‖max[0, gj(·)]² − max[0, gj,δk(·)]²‖_{C(Wτ̄)}
≤ δk + rk(2c + δk)mδk =: µk,

and obviously, µk → 0 if rkδk → 0. ♦

Similar rules for the choice of the controlling parameters can be established for other types of penalty functions.
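To make the coupling rule rkδk → 0 concrete, here is a small numerical sketch (our own toy example, not from the text): a quadratic penalty method for min{(u1 − 2)² + (u2 − 2)² : u1 + u2 ≤ 2}, whose exact solution is (1, 1); the schedules rk = k, δk = k⁻² and the perturbation model are assumptions.

```python
import numpy as np

# Quadratic penalty method with perturbed data for
# min (u1-2)^2 + (u2-2)^2  s.t.  u1 + u2 <= 2   (solution (1, 1)).
# r_k -> inf and delta_k -> 0 are coupled so that r_k * delta_k -> 0.
def penalized_grad(u, r, delta):
    # gradient of the perturbed objective J_delta plus the quadratic penalty;
    # the term delta*(1,1) models inexact data of size O(delta) (assumption)
    g = u[0] + u[1] - 2.0
    grad = 2.0 * (u - 2.0) + delta * np.ones(2)
    if g > 0.0:
        grad += 2.0 * r * g * np.ones(2)   # grad of r * max(0, g)^2
    return grad

u = np.zeros(2)
for k in range(1, 200):
    r, delta = float(k), k ** -2.0
    step = 1.0 / (2.0 + 4.0 * r)           # 1/L for the penalized function
    for _ in range(200):                   # inner gradient descent, warm-started
        u = u - step * penalized_grad(u, r, delta)
```

The iterates end up near the constrained minimizer (1, 1), up to the usual O(1/r) penalty bias.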

Let us illustrate the application of Theorem 2.1.4 to Example 1.1.6, assuming that it is solved by means of the penalty function (A3.4.77) with s = 2. If the data approximation is measured in the norm of the space C(U0), where in the context of Remark 1.2.2 the set U0 is chosen as

U0 := {u ∈ R2 : |u1| ≤ 2, |u2| ≤ 2},

then we get

max_{u∈U0} |g2(u) − gσ2(u)| = 2|σ|,

hence, σk must be chosen according to 2|σk| ≤ δk. Concerning Example 1.1.6, it was mentioned that for any perturbation σ ∈ (−1, 0) the point u∗σ = (0, 1)T is the unique solution of the perturbed problem, but for the unperturbed problem we have U∗ = {(1, 0)T}.


Now, denote uk := arg min_{u∈R2} Fk(u), Fk = J + φσk, where φσk is the corresponding penalty function for the perturbed problem with σ = σk. Then we can verify immediately that for rkδk² → 0 and sufficiently large k

uk1 = (1 + 1/(2rk))(2 + δk) / (2 + 2σk + 2σk²), uk2 = 1 − (1 + 1/(2rk))(2 + 2σk + σk²) / (2 + 2σk + 2σk²), (2.1.24)

whereas for rkσk² → +∞ we get

uk1 = (2 + σk) / (2rkσk²), uk2 = 1 − 1/(4rk) − (σk + 2)² / (4rkσk²). (2.1.25)

In the first case, for large k, the parameter rk in (2.1.24) is found to be so small that the permissible perturbation in the component u1 (i.e. −(1 − σk) instead of −1 in the function gσ2) has no observable influence on the solution of the unconstrained (penalized) minimization problem and, therefore, the point uk

differs only slightly from the minimizer

(1 + 1/(2rk), −1/(2rk))T

of Fk on R2. On the other hand, if rkσk² → ∞, then the influence of the perturbation becomes large in (2.1.25) for large k and the sequence {uk} converges to the point u∗σ = (0, 1)T, which is the unique optimal solution for any problem perturbed in the described sense.

It should be noted that under the more restrictive requirement rkδk → 0, the convergence of uk to the optimal point u∗ = (1, 0)T follows from Theorem 2.1.4 for any perturbation of the initial data which, at the k-th step, does not exceed δk in the norm of C(U0).

2.2 Stabilization of Problems and Methods

In the subsequent analysis of the numerical methods we will consider the errors caused by replacing the given original problem by a family of auxiliary problems which are assumed to be solved approximately.

Errors appearing in the initial data can be treated in this context as contributions to the class of errors considered above, such that the convergence results obtained for exact initial data can be carried over to problems with inexact initial data.

In fact, we have frequently used this approach already. For instance, in the proof of Theorem 2.1.4, convergence of the sequence {uk}, constructed according to (2.1.16) with approximately given data, was established by means of inequality (2.1.23), where uk was characterized as an approximate solution of a problem with exact data (see also the proofs of Propositions 1.3.9 and 1.3.11). Hence, this approach enables us, in the sequel, to consider the initial data to be given exactly.

Now we briefly describe three main techniques to construct numerical methods for solving ill-posed convex variational problems. In all of them standard discretization and solution algorithms can be used, assuming that the latter


are suitable for the arising well-posed auxiliary problems. We refer to these solution algorithms as the basic algorithms (or basic methods) in the whole approach considered.

Let us take any suitable basic algorithm for solving variational problems of type (1.2.1). In case U and Y are finite-dimensional spaces, this could be one of the methods studied in Appendix A3.4, whereas in the general case combinations of discretization and minimization techniques have to be implemented as a whole.

In order to illustrate the techniques presented in this section, we consider the finite-dimensional problem (A3.4.56) and, as a basic algorithm for solving the auxiliary problems, we choose the penalty method with penalty function (A3.4.77) for s = 2. We refer to this approach as the illustrative model.

2.2.1 Structure of regularizing methods

Using one of the regularization approaches described in Section 1.3, we have to solve successively the auxiliary problems

min{Φk(u) : u ∈ K}, k = 1, 2, ..., (2.2.1)

by means of some basic method, where Φk stands for the Tikhonov or proximal point regularization, i.e.,

Φk(u) := J(u) + αkω(u) or Φk(u) := J(u) + ‖u− uk−1‖2,

respectively. In the context of our illustrative model, in order to solve the regularized auxiliary problem (2.2.1) for a fixed k (the k-th exterior step of the process), we construct a finite sequence of points uk,i (i = 1, ..., i(k)), where uk,i is an approximate minimizer (at the i-th interior step) of the function

Fk,i(u) := Φk(u) + rk,i ∑_{j=1}^m max[0, gj(u)]²,

such that

‖∇Fk,i(uk,i)‖ ≤ εk,i.

For each step k in this procedure the positive controlling parameters rk,i and εk,i are given such that

lim_{i→∞} 1/rk,i = lim_{i→∞} εk,i = 0,

and i(k) is that index i of the interior steps for which

Φk(uk,i) ≤ min_{u∈K} Φk(u) + εk,

where εk > 0 and lim_{k→∞} εk = 0. Afterwards the exterior iteration continues with step k + 1 by setting

uk := uk,i(k), k := k + 1, i := 1.

Obviously, in this way we obtain a rather complicated iterative procedure in comparison with the solution of a well-posed problem, which is treated by


means of some standard minimization technique. Moreover, this approach has an essential drawback, because the regularized auxiliary problems may have to be solved with high accuracy even in situations where the regularized solutions

ūk = arg min{Φk(u) : u ∈ K}

are still far from the solution set of the original problem.

2.2.2 Iterative one-step regularization

Currently more attention is devoted to diagonal schemes for solving ill-posed variational problems. They are characterized in the following way: for each external step k in Problem (2.2.1), only one interior step of the chosen basic method is performed, followed by an update of the controlling parameters according to an a priori rule or to the outcome of the previous iterations. Concerning Tikhonov regularization, the first methods of this type can be found in Bakushinskij and Polyak [31], whereas the proximal point regularization (PPR) was investigated by Rockafellar [351] and Antipin [13].

Iterative one-step regularization applied to our illustrative model reads as follows: for fixed sequences {εk}, {rk} of positive numbers with

lim_{k→∞} εk = 0, lim_{k→∞} rk = ∞,

in the k-th external step an approximate minimizer uk of the functional

Fk(u) := Φk(u) + rk ∑_{j=1}^m max[0, gj(u)]²

is calculated such that

‖∇Fk(uk)‖ ≤ εk, k = 1, 2, · · · .

Thereafter step k + 1 is executed.

It should be noted that the straightforward application of the penalty method to well-posed problems of the type (A3.4.56) yields an iterative scheme of the same structure:

‖∇Fk(uk)‖ ≤ εk, k = 1, 2, · · · ,

with Fk(u) := J(u) + rk ∑_{j=1}^m max[0, gj(u)]².

Diagonal schemes of this type for solving ill-posed problems are usually called methods of iterative regularization. They can be understood as a regularization of a basic method.
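The one-step scheme can be sketched numerically (our own toy stand-in for (A3.4.56), not from the text) on min{(u1 − 2)² + (u2 − 2)² : u1 + u2 ≤ 2}, with exact solution (1, 1), using the proximal point choice Φk(u) = J(u) + ‖u − uk−1‖²; the schedules rk, εk are illustrative assumptions.

```python
import numpy as np

# One-step iterative PPR with quadratic penalty for
# min (u1-2)^2 + (u2-2)^2  s.t.  u1 + u2 <= 2   (solution (1, 1)).
def grad_F(u, u_prev, r):
    g = u[0] + u[1] - 2.0
    grad = 2.0 * (u - 2.0) + 2.0 * (u - u_prev)   # grad of J + prox term
    if g > 0.0:
        grad += 2.0 * r * g * np.ones(2)          # grad of r * max(0, g)^2
    return grad

u = np.zeros(2)
for k in range(1, 400):
    r_k, eps_k = 0.5 * k, 1.0 / k                 # r_k -> inf, eps_k -> 0
    step = 1.0 / (4.0 + 4.0 * r_k)                # 1/L for F_k
    u_prev, v = u.copy(), u.copy()
    # compute an approximate minimizer with ||grad F_k(u^k)|| <= eps_k
    while np.linalg.norm(grad_F(v, u_prev, r_k)) > eps_k:
        v = v - step * grad_F(v, u_prev, r_k)
    u = v                                         # one external step, then update r_k, eps_k
```

The external iterates drift toward (1, 1) as the penalty parameter grows and the accuracy requirement tightens; no auxiliary problem is ever solved exactly.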

2.2.3 Iterative multi-step regularization

Convergence results for iterative regularization methods (see Sections 2.3.2, 3.2 and 4.2) require an update of the controlling parameters (εk and rk in our illustrative model; in addition αk if we deal with Tikhonov regularization) such that an essential increase of the expense for one iteration may occur.


To overcome this difficulty, Kaplan and Tichatschke [213], [216] proposed a multi-step proximal point regularization (PPR), which permits a significant improvement of the approximate solutions on a given external iteration level k.

This scheme is carried out as follows: at a given external iteration k, fixing the values of the controlling parameters for the internal iterations, we calculate successively the iterates

uk,1, uk,2, ..., uk,i(k),

where uk,i is calculated by performing one step of the respective basic method applied to the problem

min{J(u) + ‖u − uk,i−1‖² : u ∈ K}.

This internal iteration with respect to i has to be continued as long as "sufficient progress" is achieved from step i to step i + 1.

For example, in our illustrative model, at the k-th external step, we have to compute successively the approximate minimizers uk,1, uk,2, ... of the functions Fk,1, Fk,2, ..., such that

‖∇Fk,i(uk,i)‖ ≤ εk,

with

Fk,i(u) := J(u) + ‖u − uk,i−1‖² + rk ∑_{j=1}^m max[0, gj(u)]².

An appropriate stopping criterion for the internal iterations could be

‖uk,i − uk,i−1‖ ≤ δk, (2.2.2)

with {δk} a given sequence of positive numbers, not necessarily tending to zero. If uk,i satisfies (2.2.2), we set

i(k) := i, uk+1,0 := uk,i(k)

and continue with the next external iteration.
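The multi-step organization can likewise be sketched (again on our own toy problem min{(u1 − 2)² + (u2 − 2)² : u1 + u2 ≤ 2}, solution (1, 1), not from the text): at level k the parameters rk, εk are frozen, PPR steps are repeated, and the internal loop stops once the progress test (2.2.2) is met; the schedules and the fixed tolerance δk are assumptions.

```python
import numpy as np

# Multi-step PPR sketch: frozen parameters per external level k,
# internal PPR steps until the stopping criterion (2.2.2) holds.
def grad_F(u, u_prev, r):
    g = u[0] + u[1] - 2.0
    grad = 2.0 * (u - 2.0) + 2.0 * (u - u_prev)
    if g > 0.0:
        grad += 2.0 * r * g * np.ones(2)
    return grad

def prox_step(u_prev, r, eps):
    # approximately minimize J(.) + ||. - u_prev||^2 + penalty (one PPR step)
    v, step = u_prev.copy(), 1.0 / (4.0 + 4.0 * r)
    while np.linalg.norm(grad_F(v, u_prev, r)) > eps:
        v = v - step * grad_F(v, u_prev, r)
    return v

u = np.zeros(2)
for k in range(1, 40):
    r_k, eps_k, delta_k = 10.0 * k, 1.0 / (k * k), 1e-3
    while True:                       # internal PPR iterations at level k
        u_new = prox_step(u, r_k, eps_k)
        moved = np.linalg.norm(u_new - u)
        u = u_new
        if moved <= delta_k:          # stopping criterion (2.2.2)
            break
```

Note that δk stays fixed here, as the text allows: the internal loop simply runs until one PPR step no longer makes sufficient progress, then the parameters are updated.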

The following chapters deal with the investigation of one-step and multi-step regularization using different basic algorithms.

2.3 Iterative Tikhonov Regularization

In this section we consider the stabilization of numerical methods, in particular the regularization of gradient projection and penalty methods by means of the Tikhonov approach. Concerning other basic methods combined with this type of regularization we refer to Antipin [12], Vasiljev [408], Bakushinski and Goncharski [30].

2.3.1 Regularized subgradient projection methods

We consider Problem (1.2.1) in a Hilbert space V where the constraint set U0 ≡ V. Let the optimal set U∗ of this problem be non-empty and the weak stabilizer ω be a strongly convex (with constant κ), continuous functional on V. Subgradients of the objective functional J and of ω at the point u are identified


with elements of the space V, denoted by q(u) ∈ ∂J(u) and s(u) ∈ ∂ω(u). Then the regularized subgradient projection method in the sense of Tikhonov for solving Problem (1.2.1) reads as follows:

uk+1 := ΠK(uk − γk(q(uk) + αks(uk))), ∀ k ∈ N, (2.3.1)

with {αk}, {γk} given sequences of positive numbers satisfying the conditions

lim_{k→∞} αk = 0, lim_{k→∞} γk/αk = 0, lim_{k→∞} |αk − αk+1|/(γkαk²) = 0, ∑_{k=1}^∞ αkγk = ∞.

(2.3.2)

Here ΠK denotes the orthogonal projector onto the set K, and u0 ∈ K is an arbitrary starting point. We suppress the question of how the projection can be carried out; we only suppose that this can be done efficiently. Hence, at this point we do not assume a specific structure of the feasible set K.

Concerning a suitable choice of the parameters αk and γk satisfying (2.3.2), for instance one can set αk := (1 + k)^{−1/3}, γk := (1 + k)^{−1/2}.
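For intuition, here is a small numerical sketch (our own example, not from the text) of the scheme (2.3.1) with ω(u) = ‖u‖² and one admissible parameter choice for (2.3.2). The problem J(u) = (u1 + u2 − 2)² on K = [0, 3]² has a whole segment of minimizers, so regularization matters: the iterates single out the ω-normal solution (1, 1). The concrete objective, box and schedules are assumptions.

```python
import numpy as np

# Tikhonov-regularized (sub)gradient projection, cf. (2.3.1)-(2.3.2),
# for J(u) = (u1 + u2 - 2)^2 on K = [0, 3]^2 with omega(u) = ||u||^2.
def q(u):
    # (sub)gradient of J
    return 2.0 * (u[0] + u[1] - 2.0) * np.ones(2)

def s(u):
    # gradient of the strongly convex stabilizer omega
    return 2.0 * u

u = np.array([3.0, 0.0])                     # deliberately asymmetric start
for k in range(1, 50001):
    alpha = (1.0 + k) ** (-1.0 / 3.0)        # regularization weight
    gamma = (1.0 + k) ** (-1.0 / 2.0)        # step size
    u = np.clip(u - gamma * (q(u) + alpha * s(u)), 0.0, 3.0)   # step (2.3.1)
```

Because αk decays more slowly than γk while ∑ αkγk diverges, the asymmetry of the start is damped out and u approaches (1, 1), the minimizer of smallest norm.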

2.3.1 Proposition. Suppose there exists a constant L such that for arbitrary u ∈ V and q(u) ∈ ∂J(u), s(u) ∈ ∂ω(u) the estimates

‖q(u)‖ ≤ L(1 + ‖u‖), ‖s(u)‖ ≤ L(1 + ‖u‖) (2.3.3)

hold. Then the sequence {uk}, generated by Method (2.3.1), (2.3.2), converges to the ω-normal solution u∗ω ∈ U∗ of Problem (1.2.1).

Proof: Denote

θk(u) := J(u) + αkω(u),

ξk := q(uk) + αks(uk),

vk := arg min{θk(u) : u ∈ K}.

Proposition 1.3.9 tells us that lim_{k→∞} ‖vk − u∗ω‖ = 0. For the iterate uk+1, being the orthogonal projection of uk − γkξk on K, we get

‖uk+1 − vk+1‖ ≤ ‖uk − γkξk − vk‖+ ‖vk − vk+1‖. (2.3.4)

In view of the definition of vk the subgradients q(vk) ∈ ∂J(vk), s(vk) ∈ ∂ω(vk) can be chosen such that

〈uk − vk, q(vk) + αks(vk)〉 ≥ 0.

Now, using Corollary A1.5.32 (formula (A1.5.21)) the inequality

〈uk − vk, ξk〉 ≥ 〈uk − vk, ξk − q(vk)− αks(vk)〉 ≥ 2καk‖uk − vk‖2

can be established. Therefore, on account of (2.3.3) we get

‖uk − γkξk − vk‖² = ‖uk − vk‖² + γk²‖ξk‖² − 2γk〈uk − vk, ξk〉
≤ ‖uk − vk‖²(1 − 4καkγk) + γk²‖ξk‖²
≤ ‖uk − vk‖²(1 − 4καkγk) + γk²(‖q(uk)‖ + αk‖s(uk)‖)²
≤ ‖uk − vk‖²(1 − 4καkγk) + γk²(1 + αk)²L²(1 + ‖uk − vk‖ + ‖vk‖)²
≤ ‖uk − vk‖²(1 − 4καkγk) + 2γk²(1 + αk)²L²(‖uk − vk‖² + (1 + ‖vk‖)²).


Due to ‖vk − u∗ω‖ → 0 and (2.3.2), there exist constants c1 and c2 such that for all k we have

‖uk − γkξk − vk‖² ≤ ‖uk − vk‖²(1 − 4καkγk + c2γk²) + c1γk². (2.3.5)

Now we are looking for an upper bound of ‖vk − vk+1‖. Let the subgradients q(vk), q(vk+1) and s(vk), s(vk+1) be chosen such that

〈q(vk) + αks(vk), vk+1 − vk〉 ≥ 0,

〈q(vk+1) + αk+1s(vk+1), vk − vk+1〉 ≥ 0.

Then, summing up both inequalities,

2καk‖vk+1 − vk‖² ≤ 〈q(vk+1) + αks(vk+1) − q(vk) − αks(vk), vk+1 − vk〉
≤ 〈q(vk+1) + αks(vk+1), vk+1 − vk〉
≤ 〈q(vk+1) + αks(vk+1) − q(vk+1) − αk+1s(vk+1), vk+1 − vk〉
= (αk − αk+1)〈s(vk+1), vk+1 − vk〉
≤ |αk − αk+1|‖s(vk+1)‖‖vk+1 − vk‖ ≤ c3|αk − αk+1|‖vk+1 − vk‖,

i.e.,

‖vk+1 − vk‖ ≤ c4 |αk − αk+1|/αk, k = 1, 2, · · · . (2.3.6)

In view of the inequalities (2.3.4), (2.3.5) and (2.3.6) we have

‖uk+1 − vk+1‖ ≤ (‖uk − vk‖²(1 − 4καkγk + c2γk²) + c1γk²)^{1/2} + c4 |αk − αk+1|/αk. (2.3.7)

Taking into consideration the elementary relation

(a + b)² ≤ (1 + καkγk)a² + (1 + 1/(καkγk))b²,

by virtue of (2.3.7) and the boundedness of αk and γk, we conclude that

‖uk+1 − vk+1‖² ≤ ‖uk − vk‖²(1 − 4καkγk + c2γk²)(1 + καkγk)
+ c1γk²(1 + καkγk) + c4²(1 + 1/(καkγk)) |(αk − αk+1)/αk|²
≤ ‖uk − vk‖² [1 − κ²αk²γk² − 3καkγk(1 + καkγk) + c2γk²(1 + καkγk)]
+ c1γk²(1 + καkγk) + c4²(1 + 1/(καkγk)) |(αk − αk+1)/αk|² (2.3.8)
≤ ‖uk − vk‖²(1 − 3καkγk + c5γk²) + c6γk² + c4²(1 + 1/(καkγk)) |(αk − αk+1)/αk|².

Inequality (2.3.8) has a similar structure as inequality (A3.1.6). Therefore, to finish the proof we can apply Polyak's Lemma A3.1.8, setting

µk := 3καkγk − c5γk², βk := c6γk² + c4²(1 + 1/(καkγk)) |(αk − αk+1)/αk|².


Therefore, in view of the conditions (2.3.2), one can conclude that

lim_{k→∞} ‖uk − vk‖ = 0,

and, due to lim_{k→∞} ‖vk − u∗ω‖ = 0, it follows that lim_{k→∞} ‖uk − u∗ω‖ = 0.

2.3.2 Remark. If the functionals J and ω are Fréchet-differentiable, one can prove, similarly as above, the convergence of the method

uk+1 := ΠK(uk − γk(pk + αk∇ω(uk))), k = 1, 2, · · · , (2.3.9)

with ‖pk − ∇J(uk)‖ ≤ δk. If, in addition to the conditions (2.3.2), (2.3.3), it is required that

δk ≥ 0, lim_{k→∞} δk/αk = 0,

then one can infer that lim_{k→∞} ‖uk − u∗ω‖ = 0. Method (2.3.9) is a regularized gradient projection method with inexactly calculated gradients; however, the conditions for the controlling parameters differ from those for Method (2.1.3). ♦

2.3.2 Regularized penalty methods

As in Section 2.1.2, let us consider Problem (2.1.13) under the additional assumption that the existence of a saddle point of the Lagrangian is guaranteed. To describe Tikhonov's regularization approach coupled with penalty methods, we choose the penalty function

φk(u) = rk ∑_{j=1}^m max[0, gj(u)]², rk > 0, lim_{k→∞} rk = ∞,

and positive controlling sequences {αk}, {µk} and {εk} with

lim_{k→∞} αk = lim_{k→∞} µk = lim_{k→∞} εk = 0.

Again we define

θk := J + αkω, Fk := θk + φk,

where ω is a strongly convex (with constant κ), continuous and non-negative functional on the Hilbert space V.

Let {F̃k} be a sequence of approximately given continuous functionals on V with

‖Fk − F̃k‖_{C(V)} ≤ µk, k = 1, 2, · · · , (2.3.10)

then a regularized penalty method can be considered which generates a sequence of iterates {uk} such that

F̃k(uk) ≤ inf_{u∈V} F̃k(u) + εk, k = 1, 2, · · · . (2.3.11)

Now we will investigate the convergence of this method. Let (ūk, λk) be a saddle point of the Lagrangian corresponding to the regularized problem

min{θk(u) : u ∈ K}. (2.3.12)

Obviously, because of the strong convexity of θk, the first component ūk of the saddle point is the unique solution of (2.3.12).


2.3.3 Proposition. Under the conditions

lim_{k→∞} εk/αk = 0, lim_{k→∞} µk/αk = 0, lim_{k→∞} ‖λk‖²/(rkαk) = 0, (2.3.13)

the sequence {uk}, generated by Method (2.3.11), converges to the normal solution u∗ω of Problem (2.1.13).

Proof: Denote for k = 1, 2, ⋯

θ*_k := min{θ_k(u) : u ∈ K},  v^k := arg min{F_k(v) : v ∈ V}.

In view of (2.3.10) and (2.3.11) the point ū^k satisfies

F_k(ū^k) ≤ F_k(v^k) + ε_k + 2µ_k,  k = 1, 2, ⋯ ,

and since the functional F_k is strongly convex (with constant κα_k) we have

‖ū^k − v^k‖ ≤ √((ε_k + 2µ_k)/(κα_k)).  (2.3.14)

Using the definition of a saddle point, it follows for arbitrary u ∈ V that

F_k(u) − θ*_k = θ_k(u) − θ*_k + r_k ∑_{j=1}^m max[0, g_j(u)]²
  ≥ −∑_{j=1}^m λ^k_j g_j(u) + r_k ∑_{j=1}^m max[0, g_j(u)]²
  ≥ −∑_{j=1}^m λ^k_j max[0, g_j(u)] + r_k ∑_{j=1}^m max[0, g_j(u)]²
  ≥ ∑_{j=1}^m min_{t_j≥0} (−λ^k_j t_j + r_k t_j²)
  = −(1/(4r_k)) ∑_{j=1}^m (λ^k_j)² = −‖λ^k‖²/(4r_k),

i.e., due to F_k(u^k) = θ*_k, we obtain F_k(u) − F_k(u^k) ≥ −‖λ^k‖²/(4r_k). Hence,

F_k(u^k) − F_k(v^k) ≤ ‖λ^k‖²/(4r_k),

and, with regard to the definition of v^k and the strong convexity of F_k, we infer

‖u^k − v^k‖ ≤ ‖λ^k‖/(2√(κ r_k α_k)).  (2.3.15)

Owing to (2.3.14) and (2.3.15) it follows that

‖ū^k − u^k‖ ≤ √((ε_k + 2µ_k)/(κα_k)) + ‖λ^k‖/(2√(κ r_k α_k))

and, in view of (2.3.13), lim_{k→∞} ‖ū^k − u^k‖ = 0. But Proposition 1.3.9 says that lim_{k→∞} u^k = u*_ω, hence lim_{k→∞} ū^k = u*_ω.  □
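The mechanism of Proposition 2.3.3 can be observed on a small quadratic program whose solution set is a whole half-line, so that the problem is ill-posed without regularization. Everything in the following sketch (the data J, g, ω = ½‖·‖², the schedules α_k = k^(−1/2), r_k = k², and the exact case analysis used to solve the inner problem) is an illustrative assumption, not taken from the book:

```python
# Tikhonov-regularized penalty method, cf. (2.3.11)-(2.3.13), on a toy
# ill-posed QP (all data assumed for illustration):
#   J(u) = (u1 + u2 - 2)^2,  g(u) = u1 - 0.5 <= 0,  omega(u) = ||u||^2 / 2,
# whose optimal set is a half-line; the normal solution is u*_omega = (0.5, 1.5).
# Schedules: alpha_k = k^(-1/2), r_k = k^2, hence r_k*alpha_k -> infinity.
# The strongly convex subproblem min J + alpha*omega + r*max(0, g)^2 is
# solved exactly via its two linear optimality systems (penalty inactive/active).

def solve2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] x = (e, f) by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def regularized_penalty_min(alpha, r):
    # penalty inactive (u1 <= 0.5): stationarity of J + alpha*omega
    u1, u2 = solve2x2(2 + alpha, 2, 2, 2 + alpha, 4, 4)
    if u1 <= 0.5:
        return u1, u2
    # penalty active (u1 > 0.5): add the term 2*r*(u1 - 0.5) to the u1-equation
    return solve2x2(2 + alpha + 2 * r, 2, 2, 2 + alpha, 4 + r, 4)

errors = []
for k in (1, 10, 100, 2000):
    u1, u2 = regularized_penalty_min(k ** -0.5, float(k ** 2))
    errors.append(max(abs(u1 - 0.5), abs(u2 - 1.5)))

assert errors == sorted(errors, reverse=True)  # monotone approach ...
assert errors[-1] < 0.05                       # ... to the normal solution
```

The observed error behaves like α_k, in accordance with the estimates (2.3.14)/(2.3.15).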

Because the values ‖λ^k‖ in (2.3.13) are unknown, it is essential to know whether there exists an upper bound for the sequence {λ^k} of Lagrange multipliers. To investigate this, we suppose that the feasible set in Problem (2.1.13) has the following structure:

K := {u ∈ V : g_j(u) ≤ 0, j ∈ I = {1, ..., m}},  I = I_1 ∪ I_2,

where the constraint functions g_j (j ∈ I_1) are nonlinear, convex and continuous on V and g_j (j ∈ I_2) are affine, i.e.,

g_j(u) = ⟨a_j, u⟩ − b_j,  j ∈ I_2 ⊂ I,

with a_j ∈ V, b_j ∈ ℝ.

2.3.4 Lemma. Let {J_i} be a sequence of convex, continuous functionals bounded from above at any point u ∈ V. Assume that each J_i attains its minimum on the set K and

d_0 = inf_i min_{u∈K} J_i(u) > −∞.

Furthermore, assume that the following regularity condition is fulfilled: there exists a point ū such that

g_j(ū) < 0 if j ∈ I_1 = {1, ..., m} \ I_2,  g_j(ū) ≤ 0 if j ∈ I_2.

Then for each problem

min{J_i(u) : u ∈ K}  (2.3.16)

there exists a Lagrange multiplier vector η^i such that sup_i ‖η^i‖ < ∞.

Proof: The assumptions of the lemma ensure that for each fixed i there is a saddle point (z^i, η^i) of the Lagrange function corresponding to (2.3.16). Thus, we get in particular

g_j(z^i) ≤ 0,  η^i_j g_j(z^i) = 0,  j = 1, ..., m.

If η^i_{j_0} = 0 for some j_0 ∈ I_2, then the point

(z^i, η^i_1, ..., η^i_{j_0−1}, η^i_{j_0+1}, ..., η^i_m)

is a saddle point of a problem arising from (2.3.16) by deleting the constraint g_{j_0}(u) ≤ 0. Now, we drop all such constraints and consider the resulting problem with index set I⁺_2(i) := {j ∈ I_2 : η^i_j > 0} instead of I_2. Then for the pairs (z^i, η^i_j|_{j∈I_1∪I⁺_2(i)}) the relation

J_i(u) + ∑_{j∈I_1} η^i_j g_j(u) + ∑_{j∈I⁺_2(i)} η^i_j g_j(u) ≥ J_i(z^i)
  ≥ J_i(z^i) + ∑_{j∈I_1} λ_j g_j(z^i) + ∑_{j∈I⁺_2(i)} λ_j g_j(z^i)  (2.3.17)

is true for any u ∈ V, λ_j ≥ 0 (j ∈ I_1 ∪ I⁺_2(i)); moreover,

η^i_j > 0,  g_j(z^i) = 0  ∀ j ∈ I⁺_2(i).

If the vectors a_j ∈ V, j ∈ I⁺_2(i), are linearly dependent, it is possible to reduce this vector system to a system of linearly independent vectors (see, for instance, the proof of Caratheodory's theorem in Rockafellar [348]). Hence, we identify a subsystem of linearly independent vectors a_j, j ∈ Ī_2(i) ⊂ I⁺_2(i), such that

∑_{j∈I⁺_2(i)} η^i_j a_j = ∑_{j∈Ī_2(i)} µ^i_j a_j

with µ^i_j > 0. Furthermore, for any u ∈ V,

∑_{j∈I⁺_2(i)} η^i_j g_j(u) = ∑_{j∈I⁺_2(i)} η^i_j [⟨a_j, u⟩ − ⟨a_j, z^i⟩]
  = ∑_{j∈Ī_2(i)} µ^i_j [⟨a_j, u⟩ − ⟨a_j, z^i⟩] = ∑_{j∈Ī_2(i)} µ^i_j g_j(u),

and because of g_j(z^i) = 0 for j ∈ I⁺_2(i), we see that relation (2.3.17) is equivalent to

J_i(u) + ∑_{j∈I_1} η^i_j g_j(u) + ∑_{j∈Ī_2(i)} µ^i_j g_j(u)
  ≥ J_i(z^i) + ∑_{j∈I_1} η^i_j g_j(z^i) + ∑_{j∈Ī_2(i)} µ^i_j g_j(z^i)  (2.3.18)
  ≥ J_i(z^i) + ∑_{j∈I_1} λ_j g_j(z^i) + ∑_{j∈Ī_2(i)} µ_j g_j(z^i)

for all λ_j ≥ 0 (j ∈ I_1), µ_j ≥ 0 (j ∈ Ī_2(i)). Therefore, (z^i, η^i_j|_{j∈I_1}, µ^i_j|_{j∈Ī_2(i)}) is a saddle point of the problem

min{J_i(u) : g_j(u) ≤ 0, j ∈ I_1 ∪ Ī_2(i)}.

Setting µ^i_j := 0 for j ∈ I_2 \ Ī_2(i), the point (z^i, η^i_j|_{j∈I_1}, µ^i_j|_{j∈I_2}) is a saddle point for Problem (2.3.16).

Now, we show the existence of a constant γ_0 < 0 such that the system of inequalities

g_j(u) ≤ γ_0,  ∀ j ∈ I_1 ∪ S,

is solvable for all S ∈ Σ, with Σ the set of all subsets of I_2 for which the corresponding families of vectors {a_j} are linearly independent. Indeed, for S ∈ Σ let aff{a_j : j ∈ S} be the affine hull of the points a_j, j ∈ S. In view of the linear independence of the vectors a_j we have 0 ∉ aff{a_j : j ∈ S}. Let ξ be the point of aff{a_j : j ∈ S} closest to zero; then ⟨a_j − ξ, ξ⟩ = 0 for j ∈ S, hence ⟨a_j, −ξ⟩ < 0. For u_S := ū − γ(S)ξ, due to the continuity of the g_j and the assumed regularity condition, it follows that for a suitable γ(S) > 0 we get

g_j(u_S) < 0,  ∀ j ∈ I_1 ∪ S.

Now, because Σ is a finite set, one can choose

γ_0 := max_{S∈Σ} max_{j∈I_1∪S} g_j(u_S).

In the remainder of the proof we consider the points u_S to be fixed. In view of

∑_{j∈I_1} η^i_j g_j(z^i) = 0 and ∑_{j∈Ī_2(i)} µ^i_j g_j(z^i) = 0,

using the left part of inequality (2.3.18) with u := u_{S_i}, S_i = Ī_2(i), we get

∑_{j∈I_1} η^i_j + ∑_{j∈I_2} µ^i_j = ∑_{j∈I_1} η^i_j + ∑_{j∈Ī_2(i)} µ^i_j
  ≤ (J_i(u_{S_i}) − J_i(z^i)) / (−max_{j∈I_1∪S_i} g_j(u_{S_i}))
  ≤ −(1/γ_0) (sup_i J_i(u_{S_i}) − d_0).

Hence, for the Lagrange multiplier vector η^i = (η^i_j|_{j∈I_1}, µ^i_j|_{j∈I_2}) of Problem (2.3.16) the estimate

‖η^i‖ ≤ −(1/γ_0) (sup_i J_i(u_{S_i}) − d_0)

holds true uniformly for all i. To finish the proof it remains to note that the set Σ is finite and sup_i J_i(u_{S_i}) < ∞, since {J_i(u)} was supposed to be bounded from above at any u ∈ V.  □

2.3.5 Corollary. If the constraints in Problem (2.1.13) satisfy the regularity condition of Lemma 2.3.4, then there exists a Lagrange multiplier vector λ^k for each of the Problems (2.3.12) such that sup_k ‖λ^k‖ < ∞ and

‖λ^k‖/√(r_k α_k) → 0 for r_k α_k → ∞.

2.3.6 Remark. If the Slater condition is fulfilled for Problem (2.1.13), then the proof of Lemma 2.3.4 implies that for any Lagrange multiplier vector λ^k of Problem (2.3.12) and any k the estimate

‖λ^k‖ ≤ (J(ū) + α_0 ω(ū) − J*) (−max_{1≤j≤m} g_j(ū))^{−1}

holds, with α_0 = sup_k α_k and ū the Slater point. ♦

Under the Slater condition the proof of Proposition 2.3.3 can be extended easily to other penalty functions, for instance, to barrier functions of the type (A3.4.78). If the choice of φ_k guarantees that v^k ∈ K, then, in order to estimate ‖ū^k − u^k‖, we can immediately make use of the estimates for the rate of convergence of the penalty method; see, for instance, Proposition A3.4.42. Indeed, these are of the form

θ_k(v^k) − θ*_k ≤ η(α_k, r_k, ε_k)

and, if v^k ∈ K, due to the strong convexity of θ_k,

‖u^k − v^k‖ ≤ √(η(α_k, r_k, ε_k)/(κα_k)).

We are done if it is possible to establish that the controlling parameters α_k, r_k, ε_k can be chosen such that

η(α_k, r_k, ε_k)/α_k → 0.

This, however, is not difficult. Grossmann and Kaplan [151] show that one can bound η(α_k, r_k, ε_k) from above, uniformly with respect to α_k, such that

η(α_k, r_k, ε_k) ≤ η̄(r_k, ε_k)

with η̄(r_k, ε_k) → 0 for r_k → ∞ and ε_k → 0.

2.4 Mosco’s Approximation Scheme

In this section we investigate an approach, suggested by Mosco [296], to establish stable sequential approximation methods for ill-posed variational inequalities with monotone, in general non-potential, operators.

This approach, based on a concept of convergence of sets and functionals, is convenient for the study of standard approximations of elliptic variational inequalities. We start with the description of Mosco's scheme for the case that Problem (2.1.1) is well-posed with respect to the class of data perturbations defined in terms of Mosco's concept. After that we turn to the ill-posed case.

Because this technique will not be used later on, all results are given without proofs. Moreover, some notions and statements are formulated only for our particular purpose.

Let V be a Hilbert space and {C_n} be a sequence of convex, closed subsets of V. Denote by

Lims C_n : the set of all points v ∈ V for which there is a sequence {v_n}, v_n ∈ C_n ∀ n ∈ ℕ, with v_n → v;

Limw C_n : the set of all points v ∈ V such that there is a sequence {v_k}, v_k ∈ C_{n_k}, k = 1, 2, ⋯, with v_k ⇀ v, where {C_{n_k}} is a subsequence of {C_n}.

2.4.1 Definition. A sequence {C_n} of convex closed sets is called M-convergent to the set C in V if

Lims C_n = Limw C_n = C.

In this case we write C = Lim C_n. ♦

It is obvious that the limit C is a convex and closed set in V.

2.4.2 Definition. A sequence {f_n} of convex and lower-semicontinuous functionals f_n : V → ℝ is called M-convergent to the functional f on V if

epi f = Lim (epi f_n) in V ⊗ ℝ,

where V ⊗ ℝ is the space V × ℝ endowed with the scalar product generated by the scalar product in V, inducing the norm

‖(v, β)‖ = √(‖v‖² + |β|²),  (v, β) ∈ V × ℝ.

Here we write f = Lim f_n. ♦


It is easy to verify that in the limit the functional f is convex and lower-semicontinuous.

2.4.3 Proposition. The relation f = Lim f_n is valid iff for any v ∈ V

(i) there exists a sequence {v_n} ⊂ V with

v_n → v,  lim sup_{n→∞} f_n(v_n) ≤ f(v),

and

(ii) for any subsequence {f_{n_k}} ⊂ {f_n} and any weakly convergent sequence {v_k} the following implication holds:

v_k ⇀ v ⇒ lim inf_{k→∞} f_{n_k}(v_k) ≥ f(v).

In finite-dimensional spaces M-convergence of functionals is equivalent to the notion of epi-limits; see, for instance, Rockafellar and Wets [353].

Now, let us consider Mosco’s approach for the problem mentioned:

minimize J(u),subject to u ∈ K, (2.4.1)

where J : V → R is a convex, lower semicontinuous functional,K ⊂ V is a convex, closed set.

This problem has to be approximated by a sequence of auxiliary problems

minJi(u) : u ∈ Ki, ∀ i ∈ N, (2.4.2)

where the functionals Ji and sets Ki are supposed to satisfy the same conditionsas J and K in (2.4.1).

2.4.1 Approximation of well-posed problems

Suppose that the following assumptions are satisfied for the Problems (2.4.1) and (2.4.2), respectively.

2.4.4 Assumption.

(i) dom J ⊃ K, dom J_i ⊃ K_i ∀ i ∈ ℕ, and J = Lim J_i;

(ii) K = Lim K_i;

(iii) for any u ∈ K there exists a continuous, strictly increasing function γ : ℝ₊ → ℝ₊ with γ(0) = 0 and

γ(‖v − u‖) ≤ lim inf_{i→∞} |J_i(v) − J(u)|,

the latter inequality being supposed to hold uniformly for arbitrary v ∈ W for any bounded W ⊂ V;


(iv) there exists a function ζ : ℝ₊ → ℝ₊ with ζ(r) → ∞ for r → ∞ such that for any sequence {v_i}, v_i ∈ K_i, there is a bounded sequence {z_i}, z_i ∈ K_i, with

ζ(‖v_i − z_i‖) ≤ |J_i(v_i) − J_i(z_i)| / ‖v_i − z_i‖;

(v) the Problems (2.4.2) are solvable for sufficiently large i.

Conditions (iv) and (v) ensure not only solvability of the Problems (2.4.2) for large i, but also the uniform boundedness of their optimal sets. Taking this into consideration, due to Assumption 2.4.4 (i) and (ii), the solvability of Problem (2.4.1) can be concluded and, in case the solution u* of (2.4.1) is unique, weak convergence of {u^i} to u* can be proved, with u^i any solution of the i-th auxiliary problem (2.4.2). Finally, in view of Assumption 2.4.4 (iii), u^i ⇀ u* implies strong convergence u^i → u* in the norm of V.

2.4.5 Theorem. Suppose that the Problems (2.4.1) and (2.4.2) satisfy all the conditions required, in particular Assumption 2.4.4 (i)–(v). Then the auxiliary problems (2.4.2) are solvable for sufficiently large i ≥ i_0 and their optimal sets U*_i are uniformly bounded. Moreover, Problem (2.4.1) is also solvable, and if its solution u* is unique, then any sequence {u^i}, u^i ∈ U*_i, i ≥ i_0, converges to u* in the norm of the space V.

If J is a strictly convex functional on dom J, then it immediately follows that Problem (2.4.1) is well-posed with respect to the class of perturbations {J_i, K_i} defined by the conditions (i)–(v) in Assumption 2.4.4.

2.4.6 Remark. Formally, the separate consideration of the sets K and K_i is not necessary. Taking

ϕ(·) = J(·) + ind(·|K),  ϕ_i(·) = J_i(·) + ind(·|K_i),

with ind the indicator functional of the corresponding set, the Problems (2.4.1) and (2.4.2) are equivalent to the unconstrained minimization of ϕ and ϕ_i, respectively. Besides, the functionals ϕ, ϕ_i are of the same class as J and J_i (see the Corollary of Theorem B in [296]). However, in view of a numerical treatment, it is more convenient to establish estimates of the approximation for the objective functional as well as for the feasible set separately. ♦

2.4.2 Mosco’s scheme for ill-posed problems

Now, let us suppose that the conditions (iii)–(v) in Assumption 2.4.4 are not satisfied and Problem (2.4.1) may have more than one solution. In this case, in order to obtain stable approximations of Problem (2.4.1), a sequence of regularized auxiliary problems

min{J_i(u) + α_i‖u‖² : u ∈ K_i},  ∀ i ∈ ℕ,  (2.4.3)

is considered, where J_i, K_i satisfy the assumptions required for J and K in (2.4.1) and α_i ↓ 0. Here, to obtain convergence of the sequence {u^i} of solutions of these auxiliary problems, more restrictive assumptions on the approximations J_i and K_i are required.


2.4.7 Definition. A sequence {C_n} of convex and closed sets C_n ⊂ V is called convergent of order ≥ σ (σ ≥ 0 fixed) to a convex closed set C ⊂ V (written n^σ[C_n − C] → 0) if for any v ∈ C there exist v_n ∈ C_n such that

n^σ(v_n − v) → 0 in V as n → ∞,

and for any weakly convergent sequence {v_k}, v_k ∈ C_{n_k}, there exist subsequences {v_{k_j}} ⊂ {v_k} and {w_j} ⊂ C such that

n^σ_{k_j}(v_{k_j} − w_j) → 0 in V as j → ∞,

with {C_{n_k}} a subsequence of {C_n}. ♦

2.4.8 Definition. For a fixed σ ≥ 0 a sequence {f_n}, f_n : V → ℝ, of convex and lower-semicontinuous functionals is called convergent of order ≥ σ to a convex, lsc functional f on V (written n^σ[f_n − f] → 0) if

n^σ[epi f_n − epi f] → 0 in V ⊗ ℝ

in the sense of Definition 2.4.7. ♦

2.4.9 Proposition. The functional convergence n^σ[f_n − f] → 0 holds true iff for any v ∈ dom f there is a sequence {v_n} ⊂ V such that

n^σ(v_n − v) → 0 in V as n → ∞,  lim sup_{n→∞} n^σ(f_n(v_n) − f(v)) ≤ 0,

and from any weakly convergent {v_k} ⊂ V with lim sup_{k→∞} f_{n_k}(v_k) < +∞ for a chosen {f_{n_k}}, a subsequence {v_{k_j}} can be selected such that for some {w_j} ⊂ V the relations

n^σ_{k_j}(v_{k_j} − w_j) → 0 in V as j → ∞,

lim inf_{j→∞} n^σ_{k_j}(f_{n_{k_j}}(v_{k_j}) − f(w_j)) ≥ 0

are fulfilled.

Now, suppose that for a fixed σ > 0 the approximation of Problem (2.4.1) by means of the sequence of Problems (2.4.3) satisfies the following requirements:

2.4.10 Assumption.

(i) dom J ⊃ K, dom J_i ⊃ K_i ∀ i ∈ ℕ;

(ii) i^σ[J_i − J] → 0;

(iii) i^σ[K_i − K] → 0;

(iv) for any sequence {v_i}, v_i ∈ K_i, with ‖v_i‖ → ∞ as i → ∞ there is a sequence {z_i} ⊂ K such that

lim sup_{i→∞} i^σ(J(z_i) − J_i(v_i)) / ‖v_i‖ < ∞.


2.4.11 Theorem. Suppose that the Problems (2.4.1) and (2.4.3) satisfy all the conditions required, in particular Assumption 2.4.10 (i)–(iv) with some σ > 0. Moreover, the optimal set U* of Problem (2.4.1) is supposed to be non-empty and in the regularized problems (2.4.3) the regularization parameter α_i := i^{−σ} is chosen. Then the uniquely determined sequence {u^i} of solutions of Problems (2.4.3) converges strongly to the norm-minimal solution u* of Problem (2.4.1), i.e., to the solution with ‖u*‖ ≤ ‖u‖ ∀ u ∈ U*.

In [296] the approach described above is applied to variational inequalities of the type VI(Q, K):

u ∈ K : ⟨Qu, v − u⟩ ≥ 0,  ∀ v ∈ K,  (2.4.4)

and hemi-variational inequalities of the type HVI(Q, f; K):

u ∈ V : ⟨Qu, v − u⟩ + f(v) − f(u) ≥ 0,  ∀ v ∈ K,  (2.4.5)

where Q : V → V′ is a monotone and hemi-continuous operator, f is a convex, lsc functional on V and K ⊂ V is a convex and closed set. For the definition of a hemi-continuous operator see, for instance, Definition A1.6.38. In the case of well-posed problems the corresponding auxiliary problems have the form

u^i ∈ K_i : ⟨Q_i u^i, v − u^i⟩ ≥ 0,  ∀ v ∈ K_i,  (2.4.6)

u^i ∈ V : ⟨Q_i u^i, v − u^i⟩ + f_i(v) − f_i(u^i) ≥ 0,  ∀ v ∈ K_i,  (2.4.7)

respectively, and in the case of ill-posedness

u^i ∈ K_i : ⟨(Q_i + i^{−σ}T)u^i, v − u^i⟩ ≥ 0,  ∀ v ∈ K_i,  (2.4.8)

u^i ∈ K_i : ⟨(Q_i + i^{−σ}T)u^i, v − u^i⟩ + f_i(v) − f_i(u^i) ≥ 0,  ∀ v ∈ K_i.  (2.4.9)

Here Q_i and f_i satisfy the same properties as Q and f, and the operator T belongs to a sufficiently broad class of operators performing a Tikhonov regularization of Problem (2.4.1). For the minimization problem (2.4.3) just Tu := u has to be used. The proofs of the Theorems B and D in Mosco's paper [296] and their corollaries about the convergence of the approaches considered can be adapted to our case if we set Q = Q_i := 0 and f := J + ind(·|K), f_i := J_i + ind(·|K_i) and apply the transition from inequality (7.4) to inequality (7.5) given in Section 0.9 of the paper mentioned.

Mosco's approach for solving Problem (2.4.1) can be considered, on the one hand, as a regularization of this problem if J_i and K_i are interpreted as approximately given data and, on the other hand, as a regularization of the discretization method if, for instance, J_i and K_i are derived by some standard discretization of the original problem (2.4.1). Such a dual interpretation is also possible in a number of other approaches for solving ill-posed problems; i.e., there is no strict distinction between classical regularization methods and methods of iterative regularization.


2.5 Comments

Section 2.1: For a long time, investigations of optimization methods for solving ill-posed convex variational problems were restricted merely to proving convergence towards the optimal set. The first result comprising stronger convergence properties was established for the gradient method with a constant step-size. Such a statement, similar to Proposition 2.1.1 (with K = V and δ_k ≡ 0), was formulated without proof by Polyak [328]. Later on a proof was given by Gol'shtein and Tretyakov [139]. In establishing here the convergence of the gradient method with inexact computation of the gradients, we basically followed the proof given in [141].

Proposition 2.1.1 shows the stability of the gradient and gradient projection methods, applied to ill-posed convex variational problems, under sufficiently smooth data and small perturbations of the objective functional with respect to the norm in C¹(V). In the infinite-dimensional setting, stability is to be understood in terms of weak convergence.

Lemma A1.5.33, often used to investigate optimization methods, was proved by Gol'shtein and Tretyakov [139]. A proof of Theorem 2.1.4 can be found in Kaplan [205]. Here the rule for adapting the controlling parameters in the penalty methods (see also Remark 2.1.6) corresponds to the idea of regularization by means of a timely termination of the iteration procedure at each approximation level. This idea was exploited for linear problems by many authors; see, for instance, Lavrentyev [258], Bakushinski [29], Emelin and Krasnoselski [103] and, in connection with the method of feasible directions for the solution of elliptic variational inequalities of the obstacle problem, Kirsten and Tichatschke [235]. An approach towards regularization via discretization was made by Natterer [303].

Section 2.2: see the comments to Section 3.2.

Section 2.3: In our description of regularized subgradient projection methods we follow the papers of Bakushinski and Polyak [31] and Bakushinski and Goncharski [30]. Tikhonov-regularized penalty methods with various penalty functions are investigated by Vasiljev [407, 408]. In his first paper a statement similar to Proposition 2.3.3 is proved under a weaker regularity condition. In our opinion, Lemma 2.3.4 gives a new result which could be useful in connection with the uniform estimation of the Lagrange multiplier vectors of a family of regularized problems. By the way, a Tikhonov-regularized Newton method for solving variational inequalities has been established in [30]; see also Friedrich et al. [121].

Section 2.4: Executable versions of Mosco's scheme are developed in a number of papers mainly dealing with the discrete approximation of problems in mathematical physics. Discrete approximations of ill-posed problems were investigated by Douglas [93] in connection with the solution of integral equations of the first kind. Among the succeeding papers we mention the monograph by Vainikko [406], where an abstract framework is given for investigating methods based on discrete approximations, including those with iterative regularization. Also the monograph of Hofmann [185] should be mentioned, dealing with applications of regularized discretization methods. We further refer to a number of publications on the relations between Mosco-approximation and Moreau-Yosida-approximation; see, for instance, Attouch [16] and Lemaire [260, 261]. In Tossings [399] Mosco's scheme is suggested for solving ill-posed problems, where the discretized problems are solved approximately.


Chapter 3

PPR FOR FINITE-DIMENSIONAL PROBLEMS

3.1 Properties of the Proximal-Point-Mapping

Let C be a non-empty, closed and convex subset of a Hilbert space V and f : V → ℝ be a lower semi-continuous convex functional. We recall that only proper convex functionals are considered in this book. For arbitrary u ∈ V the proximal-point-mapping is defined by

Prox_{f,C} u := arg min_{v∈C} {f(v) + (χ/2)‖v − u‖²},  χ > 0 fixed.  (3.1.1)

Due to the strong convexity and lower semi-continuity of

Ψ_{χ,u}(·) := f(·) + (χ/2)‖· − u‖²,

the proximal-point-mapping (prox-mapping for short) is always single-valued. Obviously, the set of fixed points of this mapping coincides with the set C* of minimizers of the function f on C, i.e.,

u* ∈ C* ⇔ Prox_{f,C} u* = u*,  ∀ χ > 0.

Moreover, it holds that

Prox_{f,C} u = Prox_{f̄,V} u

for f̄ := f + ind(·|C) (with ind(·|C) the indicator functional of C).
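For a concrete feeling for (3.1.1): with the illustrative choice f(v) = |v| and C = ℝ (an assumption for this sketch, not an example from the book), the prox-mapping has the closed "soft-thresholding" form, its unique fixed point is exactly the minimizer of f, and the firm non-expansivity established in Proposition 3.1.1 below can be checked numerically:

```python
# Prox-mapping (3.1.1) for the assumed case f(v) = |v|, C = R:
# minimizing |v| + (chi/2)(v - u)^2 over R gives soft-thresholding.
import math

def prox_abs(u, chi):
    """Prox of f(v) = |v|: sign(u) * max(0, |u| - 1/chi)."""
    return math.copysign(max(0.0, abs(u) - 1.0 / chi), u)

chi = 2.0
# the unique minimizer u* = 0 of |.| is the fixed point of the prox-mapping:
assert prox_abs(0.0, chi) == 0.0
# firm non-expansivity at a few sample points:
for x, y in [(3.0, -1.0), (0.2, 0.1), (-5.0, 4.0)]:
    px, py = prox_abs(x, chi), prox_abs(y, chi)
    assert (px - py) ** 2 <= (x - y) ** 2 - ((x - y) - (px - py)) ** 2 + 1e-12
```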

3.1.1 Proposition. The operator Prox_{f,C} is firmly non-expansive, i.e., for arbitrary x, y ∈ V the following inequality is fulfilled:

‖Prox_{f,C} x − Prox_{f,C} y‖² ≤ ‖x − y‖² − ‖x − y − (Prox_{f,C} x − Prox_{f,C} y)‖².


Proof: From Proposition A1.5.34 we get for v := Prox_{f,C} x and w := Prox_{f,C} y:

f(u) − f(v) + χ⟨v − x, u − v⟩ ≥ 0  ∀ u ∈ C,
f(z) − f(w) + χ⟨w − y, z − w⟩ ≥ 0  ∀ z ∈ C;  (3.1.2)

hence, setting u := w, z := v, and adding both inequalities, it follows that

⟨y − x, w − v⟩ ≥ ‖w − v‖²

and

‖v − w‖² ≤ ⟨y − x, w − v⟩
  ≤ 2⟨y − x, w − v⟩ − ‖w − v‖²
  = ‖x − y‖² − ‖x − y − (v − w)‖².  □

3.1.2 Remark. Proposition 3.1.1 implies, in particular, that the prox-mapping is non-expansive, i.e.,

‖Prox_{f,C} x − Prox_{f,C} y‖ ≤ ‖x − y‖,  ∀ x, y ∈ V.  ♦

3.1.3 Proposition. Let y ∈ C be an arbitrary point. Then for any v⁰ ∈ V and v¹ := Prox_{f,C} v⁰ it holds that

‖v¹ − y‖² − ‖v⁰ − y‖² ≤ −‖v¹ − v⁰‖² + (2/χ)[f(y) − f(v¹)],  (3.1.3)

‖v¹ − y‖ ≤ ‖v⁰ − y‖ + { 0 if f(y) ≤ f(v¹);  √((2/χ)[f(y) − f(v¹)]) if f(y) > f(v¹) }.  (3.1.4)

Proof: Substituting in inequality (3.1.2) x := v⁰, v := v¹, u := y, we get

f(y) − f(v¹) + χ⟨v¹ − v⁰, y − v¹⟩ ≥ 0.

Hence,

‖v¹ − y‖² − ‖v⁰ − y‖² = −‖v¹ − v⁰‖² + 2⟨v¹ − v⁰, v¹ − y⟩
  ≤ −‖v¹ − v⁰‖² + (2/χ)[f(y) − f(v¹)],

and inequality (3.1.4) can be deduced immediately from here.  □

Without assuming differentiability of the function f, the so-called Moreau-Yosida regularization

η(u) := min_{v∈C} {f(v) + (χ/2)‖v − u‖²}

has the following properties:


3.1.4 Proposition. The functional

η(u) := min{f(v) + (χ/2)‖v − u‖² : v ∈ C}

is convex and continuously Fréchet-differentiable on V, and

∇η(u) = χ(u − Prox_{f,C} u).

Proof: The proof of the convexity of η is straightforward. In order to estimate for arbitrary u, h ∈ V the absolute value of

q := min_{v∈C} {f(v) + (χ/2)‖v − (u+h)‖²} − min_{v∈C} {f(v) + (χ/2)‖v − u‖²} − χ⟨u − Prox_{f,C} u, h⟩,

we consider two cases.

(i) q ≥ 0: Then, estimating the first minimum from above by inserting v := Prox_{f,C} u,

|q| ≤ |f(Prox_{f,C} u) + (χ/2)‖Prox_{f,C} u − (u+h)‖² − f(Prox_{f,C} u) − (χ/2)‖Prox_{f,C} u − u‖² − χ⟨u − Prox_{f,C} u, h⟩|,

hence

|q| ≤ |⟨−χh, Prox_{f,C} u − u − h/2⟩ − χ⟨h, u − Prox_{f,C} u⟩| = (χ/2)‖h‖².  (3.1.5)

(ii) q < 0: Since Prox_{f,C}(u+h) is the minimizer of the first minimum, evaluating the second minimum at this point gives

q ≥ f(Prox_{f,C}(u+h)) + (χ/2)‖Prox_{f,C}(u+h) − (u+h)‖²
  − f(Prox_{f,C}(u+h)) − (χ/2)‖Prox_{f,C}(u+h) − u‖² − χ⟨u − Prox_{f,C} u, h⟩
  = ⟨−χh, Prox_{f,C}(u+h) − u − h/2⟩ − χ⟨h, u − Prox_{f,C} u⟩
  = −χ⟨h, Prox_{f,C}(u+h) − Prox_{f,C} u⟩ + (χ/2)‖h‖².

Hence, by the non-expansivity of the prox-mapping,

|q| = −q ≤ χ⟨h, Prox_{f,C}(u+h) − Prox_{f,C} u⟩ − (χ/2)‖h‖² ≤ χ‖h‖² − (χ/2)‖h‖² = (χ/2)‖h‖².  (3.1.6)

Now, the differentiability of η together with ∇η(u) = χ(u − Prox_{f,C} u) follows from (3.1.5) and (3.1.6), whereas the continuity of the gradient is a consequence of the non-expansivity of the prox-mapping:

‖∇η(u) − ∇η(v)‖ = χ‖u − Prox_{f,C} u − v + Prox_{f,C} v‖ ≤ 2χ‖u − v‖.  □

Next, let us consider some properties of the exact proximal point algorithm(PPA) applied to Problem (1.2.4).


3.1.5 Algorithm. (Proximal point algorithm)
Data: u⁰ ∈ K, χ_k ∈ [χ̲, χ̄] (χ̲ > 0, χ̄ < ∞);

S0: Set k := 0;

S1: If u^k solves the problem, stop;

S2: Compute

u^{k+1} := arg min_{u∈K} {J(u) + (χ_k/2)‖u − u^k‖²},  (3.1.7)

set k := k + 1 and go to S1.
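A minimal executable instance of the PPA (the data J(u) = (u − 2)², K = [0, 3] are assumptions for this sketch): in one dimension the constrained minimizer in step (3.1.7) is the unconstrained one clipped to K:

```python
# Exact PPA 3.1.5 on the assumed toy problem J(u) = (u-2)^2, K = [0, 3].
# Step (3.1.7): minimize (u-2)^2 + (chi/2)(u - u_k)^2 over [0, 3]; the
# unconstrained minimizer is (4 + chi*u_k)/(2 + chi), clipped to K.
def ppa(u0, chis):
    u = u0
    for chi in chis:
        u = min(3.0, max(0.0, (4.0 + chi * u) / (2.0 + chi)))
    return u

# chi_k only has to stay inside a fixed interval, cf. the Data of 3.1.5:
u_final = ppa(0.0, [1.0, 2.0] * 25)   # 50 iterations, alternating weights
assert abs(u_final - 2.0) < 1e-9      # u* = 2 = argmin over K
```

Each step contracts the distance to u* by the factor χ_k/(2 + χ_k), illustrating that bounded (not vanishing) χ_k suffices.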

The following result goes back to Guler [153].

3.1.6 Proposition. For any u ∈ K and any iteration n ≥ 1 of the PPA 3.1.5 it holds that

J(u^n) − J(u) ≤ (1/(2γ_n)) [‖u − u⁰‖² − ‖u − u^n‖²] − (γ_n χ_n²/2) ‖u^n − u^{n−1}‖²,  (3.1.8)

with γ_n := ∑_{k=1}^n 1/χ_k.

Proof: First we show that the sequence {χ_k‖u^{k−1} − u^k‖} is non-increasing. For the functional J̄ := J + ind(·|K), in view of the definition of u^k, we get the inclusions

χ_k(u^{k−1} − u^k) ∈ ∂J̄(u^k),  χ_{k+1}(u^k − u^{k+1}) ∈ ∂J̄(u^{k+1}),

and from the monotonicity of the mapping ∂J̄ we conclude that

⟨χ_{k+1}(u^k − u^{k+1}) − χ_k(u^{k−1} − u^k), u^{k+1} − u^k⟩ ≥ 0.

Hence,

⟨χ_{k+1}(u^k − u^{k+1}), χ_{k+1}(u^k − u^{k+1})⟩ ≤ ⟨χ_k(u^{k−1} − u^k), χ_{k+1}(u^k − u^{k+1})⟩

and

‖χ_k(u^{k−1} − u^k)‖ ≥ ‖χ_{k+1}(u^k − u^{k+1})‖.

Now, due to the convexity of the functional J̄,

(2/χ_k)(J̄(u) − J̄(u^k)) ≥ 2⟨u^{k−1} − u^k, u − u^k⟩ = ‖u^{k−1} − u^k‖² + ‖u − u^k‖² − ‖u − u^{k−1}‖²

and, for u ∈ K, it follows that

(2/χ_k)(J(u) − J(u^k)) ≥ ‖u^{k−1} − u^k‖² + ‖u − u^k‖² − ‖u − u^{k−1}‖².  (3.1.9)

Summing up these inequalities for k = 1, ..., n, we obtain

2γ_n J(u) − 2∑_{k=1}^n (1/χ_k) J(u^k) ≥ ∑_{k=1}^n ‖u^{k−1} − u^k‖² + ‖u − u^n‖² − ‖u − u⁰‖².  (3.1.10)

Inequality (3.1.9) with u := u^{k−1} tells us that

J(u^{k−1}) − J(u^k) ≥ χ_k ‖u^{k−1} − u^k‖²,  ∀ k ∈ ℕ,

hence,

γ_{k−1} J(u^{k−1}) − γ_k J(u^k) + (1/χ_k) J(u^k) ≥ γ_{k−1} χ_k ‖u^{k−1} − u^k‖²,  k = 1, 2, ⋯,

where γ_0 := 0 is taken. This yields

−γ_n J(u^n) + ∑_{k=1}^n (1/χ_k) J(u^k) ≥ ∑_{k=2}^n γ_{k−1} χ_k ‖u^{k−1} − u^k‖².

The latter inequality together with (3.1.10) leads to

2γ_n (J(u) − J(u^n)) ≥ ∑_{k=1}^n ‖u^{k−1} − u^k‖² + ‖u − u^n‖² − ‖u − u⁰‖² + 2∑_{k=2}^n γ_{k−1} χ_k ‖u^{k−1} − u^k‖²,

and using the monotonicity of the sequence {χ_k‖u^k − u^{k−1}‖}, we finally get

2γ_n (J(u) − J(u^n)) ≥ ‖u − u^n‖² − ‖u − u⁰‖² + (∑_{k=1}^n 1/χ_k² + 2∑_{k=2}^n γ_{k−1}/χ_k) χ_n² ‖u^{n−1} − u^n‖²
  = γ_n² χ_n² ‖u^{n−1} − u^n‖² + ‖u − u^n‖² − ‖u − u⁰‖²,

proving estimate (3.1.8).  □

3.1.7 Remark. Proposition 3.1.6 immediately implies the estimate

J(u^n) − J(u⁰) ≤ −(1/(2γ_n)) ‖u⁰ − u^n‖²,  ∀ n ∈ ℕ, u⁰ ∈ K,

and in case U* = Arg min_{u∈K} J(u) ≠ ∅ we get

J(u^n) − min_{u∈K} J(u) ≤ (1/(2γ_n)) ρ²(u⁰, U*),  ∀ n ∈ ℕ, u⁰ ∈ K,

with ρ(u⁰, U*) the distance from u⁰ to U*. ♦

3.1.8 Proposition. If U* ≠ ∅, then the sequence {u^k}, generated by the PPA 3.1.5, converges weakly to some element u* ∈ U*.

This result was established by Martinet [287] in the first publication dedicated explicitly to proximal point methods. It should be mentioned that Proposition 3.1.8 (with χ_k ≡ 2) could also have been obtained as a by-product of Proposition 1.3.11.

As we will see in detail later on, convergence still holds if the prox-point is only approximated with sufficient accuracy. This leads to the so-called inexact proximal point methods:

u^{k+1} ≈ Prox_{J,K} u^k.


Rockafellar [351] gives various conditions for the accuracy of this approximation, of which the most important in the case of convex minimization is

‖u^{k+1} − Prox_{J,K} u^k‖ ≤ ε_k,  ∑_{k=1}^∞ ε_k/χ_k < ∞.

The fact that the regularization parameter χ_k is not required to tend to zero, but only to be confined to an upper bound, is an important feature of the PPR in contrast to Tikhonov regularization; this possibly slows down the convergence, but it keeps the auxiliary objective functions strongly convex and well-defined.
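The summability criterion can be exercised on a toy instance (all data assumed for illustration: J(u) = (u − 2)², K = [0, 3], χ_k ≡ 1, errors ε_k = 2^(−k)); the perturbed iteration still converges:

```python
# Inexact PPA for the assumed data J(u) = (u-2)^2, K = [0, 3], chi_k = 1:
# the exact prox step is perturbed by eps_k = 2^(-k), which satisfies the
# summability condition sum eps_k/chi_k < infinity.
def inexact_ppa(u0, n):
    u = u0
    for k in range(1, n + 1):
        prox = min(3.0, max(0.0, (4.0 + u) / 3.0))  # exact prox, chi = 1
        u = prox + 0.5 ** k                          # summable inexactness
    return u

assert abs(inexact_ppa(0.0, 60) - 2.0) < 1e-6        # still converges to u* = 2
```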

3.2 PPR for Convex Problems

3.2.1 Proximal gradient methods

In the sequel we make use of the essential fact that the basic algorithms employed in the proximal point regularization (PPR) successively approximate the original convex minimization problem by means of unconstrained extremal problems. In doing so, the sequence of functions to be minimized converges to a function possessing an unconstrained minimizer which is, at the same time, a solution of the original problem.

The finite-dimensional setting of the problems enables us to work with weak requirements on the convergence of the auxiliary functions to be minimized (cf. conditions (i) and (ii) in Theorem 3.2.1, as well as conditions (i') and (ii') in Theorem 3.2.3).

3.2.2 Penalty methods with iterative PPR

Let us recall the finite-dimensional Problem (A3.4.56)

minimize J(u),subject to u ∈ K,

K := v ∈ Rn : gj(v) ≤ 0, j = 1, ...,m,

J, gj : Rn → R are convex, differentiable functions,Slater’s condition is fulfilled and U∗ 6= ∅

and assume additionally that the optimal set U∗ is bounded.Let φk, φk ∈ C1(Rn), be a system of convex penalty functions, εk and χkbe sequences of positive controlling numbers such that

limk→∞ εk = 0,   0 < χ ≤ χk ≤ χ̄ < ∞,   k = 1, 2, · · · ,

and let the starting point u0 ∈ Rn be chosen arbitrarily. In a penalty method regularized by means of PPR, the iterates are determined via

uk ≈ arg minu∈Rn {J(u) + φk(u) + (χk/2)‖u − uk−1‖²}   (3.2.1)


such that
‖∇J(uk) + ∇φk(uk) + χk(uk − uk−1)‖ ≤ εk.   (3.2.2)

To establish convergence for this method we need some technical notions. Let

Q := {u ∈ K : J(u) ≤ J∗ + ϱ},   (ϱ > 0 fixed)

and let ū ∈ int Q. Further, we define

Fk(u) := J(u) + φk(u) + (χk/2)‖u − uk−1‖²,

vk := arg minu∈Rn Fk(u),

τk := ‖wk − vk‖,   δk := maxu∈Q φk(u),

wk := arg minw∈[ū,vk]∩Q ‖w − vk‖²,

i.e., wk is the point of the intersection of the interval [ū, vk] with the set Q that is located next to vk. From the inequalities (3.2.2), (A1.5.21) and (A1.5.32) it follows for k = 1, 2, · · ·

‖vk − uk‖ ≤ εk/χk,   Fk(uk) − Fk(u) ≤ ε²k/χk   ∀ u ∈ Rn.

In the following theorem and in Theorem 3.2.3 the specific structure of the feasible set K is of no importance.

3.2.1 Theorem. Assume that ∑∞k=1 εk < ∞ and that the penalty functions φk

satisfy the conditions

(i) limk→∞ φk(u) = 0 if u ∈ K, and limk→∞ φk(u) = +∞ if u ∉ K;

(ii) ‖∇φk(zk)‖ → ∞ if limk→∞ zk = u ∉ K;

(iii) there exists a constant c > 0 such that for all k ∈ N

φk+1(u) ≤ cφk(u), ∀ u ∈ Rn.

Then it holds

(a) the sequence {uk}, generated by Penalty Method (3.2.1), (3.2.2), is bounded;

(b) each of its cluster points belongs to K;

(c) limk→∞ ‖uk − uk−1‖ = 0;

(d) limk→∞ J(uk) = J∗.

If, moreover,

(iv) ∑∞k=1 √δk < ∞,   ∑∞k=1 √τk < ∞,

then

(e) limk→∞ uk = u∗ ∈ U∗.


Proof: The proof is carried out in three steps.
1. Proving the statements (a) and (b). We define for fixed constants θ > 0 and δ > 0 the sets

Q := {u ∈ K : J(u) ≤ J∗ + 2θ},   Qδ := {u ∈ Rn : ρ(u,Q) ≤ δ}.

Let Q′ ⊂ int Q be a convex and compact set such that int Q′ ≠ ∅ and

J(u) < J∗ + θ/2   ∀ u ∈ Q′.

Setting in the proof of Theorem A3.4.41

H := Rn, G := K, φ2k ≡ 0, ∀ k,

we conclude from condition (i) that the inequality

J(q) + φk(q) > J(v) + φk(v) + θ (3.2.3)

is satisfied for sufficiently large k (k ≥ k′) and all q ∈ ∂Qδ, v ∈ Q′. Now, take an arbitrary z ∈ Q′, an index k0 > k′ and define

ρ0 := maxu∈∂Qδ ‖z − u‖,   ρ := max[‖z − uk0−1‖, ρ0]

and Br(z) := {u ∈ Rn : ‖z − u‖ ≤ r}. If u ∉ Br(z), with r := ρ + (ρ0/θ)(ε²k0/χk0), then we choose (see Fig. 3.2.1)

z̄ := {u + λ(z − u) : 0 ≤ λ ≤ 1} ∩ ∂Qδ,   z̿ := {u + γ(z − u) : 0 ≤ γ ≤ 1} ∩ ∂Bρ(z).

Because for a convex function ϕ the inequality

ϕ(x+ t(x− x′)) ≥ ϕ(x) + t(ϕ(x)− ϕ(x′)), t ≥ 0, (3.2.4)

is true, we get with ϕ := J + φk0, x := z̄, x′ := z, t := ‖z̿ − z̄‖/‖z̄ − z‖, and in view of (3.2.3):

J(z̿) + φk0(z̿) = J(z̄ + t(z̄ − z)) + φk0(z̄ + t(z̄ − z))
≥ J(z̄) + φk0(z̄) + (‖z̿ − z̄‖/‖z̄ − z‖)(J(z̄) + φk0(z̄) − J(z) − φk0(z))
≥ J(z̄) + φk0(z̄) + ((ρ − ρ0)/ρ0) θ.

Analogously, if x := z̿, x′ := z, t := ‖z̿ − u‖/‖z̿ − z‖, then in view of (3.2.3) it follows

J(u) + φk0(u) ≥ J(z̿) + φk0(z̿) + t(J(z̿) + φk0(z̿) − J(z) − φk0(z))
≥ J(z̿) + φk0(z̿) + (ρ0ε²k0/(θρχk0))(J(z̿) + φk0(z̿) − J(z̄) − φk0(z̄) + θ)
≥ J(z̿) + φk0(z̿) + (ρ0ε²k0/(θρχk0))(((ρ − ρ0)/ρ0)θ + θ) = J(z̿) + φk0(z̿) + ε²k0/χk0.


Figure 3.2.1: Location of the points z, z̄, z̿, u and uk0−1, the boundaries ∂Q and ∂Qδ, the set K, the radii ρ and ρ0, and the level set J(u) = J∗ + 2θ.

Using the trivial inequality (see Fig. 3.2.1)
‖z̿ − uk0−1‖ < ‖u − uk0−1‖,
we can conclude further that
Fk0(u) > Fk0(z̿) + ε²k0/χk0,
and in view of (3.2.2) we get
Fk0(uk0) ≤ Fk0(z̿) + ε²k0/χk0.
Now it is obvious that uk0 ∈ Br(z) for r := ρ + (ρ0/θ)(ε²k0/χk0).

Moreover, for k ≥ k0, the inclusion uk ∈ Br′(z) holds with
r′ := ρ + (ρ0/θ) ∑k≥k0 ε²k/χk.

Hence, due to the assumptions on εk and χk, the sequence {uk} is bounded. From the relation (3.2.2) it follows that
‖∇φk(uk)‖ ≤ εk + χk‖uk − uk−1‖ + ‖∇J(uk)‖,
therefore, lim supk→∞ ‖∇φk(uk)‖ < ∞. With regard to condition (ii), this proves that limk→∞ ρ(uk, K) = 0.

2. Now, let us show (c) and (d). First we prove

limk→∞ φk(uk) = 0.   (3.2.5)


Because of the assumptions (i) and (iii) we have c > 1 and

φk(u) ≥ 0 ∀ u ∈ Rn, ∀ k.

Suppose (3.2.5) is wrong. Then for some σ > 0 the relation φki(uki) ≥ σ must be satisfied for a subsequence {uki}. In view of the boundedness of {uk} and {χk} we can assume without loss of generality that

limi→∞ uki = ũ,   limi→∞ uki−1 = ũ′,   limi→∞ χki = χ̃ > 0.

Taking the limit for i → ∞ in the relation
Fki(ũ) ≥ Fki(uki) − ε²ki/χki,
then, due to ũ ∈ K (cf. the first part of the proof), this yields
J(ũ) + (χ̃/2)‖ũ − ũ′‖² ≥ J(ũ) + σ + (χ̃/2)‖ũ − ũ′‖².

But this is impossible. Hence limk→∞ φk(uk) = 0 is true and, in view of the non-negativity of φk and condition (iii), the relation limk→∞ φk+1(uk) = 0 holds, too.

Next, we prove that lim supk→∞ J(uk) = J∗. Suppose this is wrong. Then
lim supk→∞ J(uk) − J∗ = θ0 > 0.

Choosing τ := θ0/3 and ε > 0, ε ≪ τ, there exists a number k0 such that the inequality
J(uk0−1) ≥ J∗ + τ
is satisfied and
φk(uk−1) < ε,   φk(u∗) < ε,   ε²k/χk < ε   (3.2.6)
is true for all k ≥ k0 and some u∗ ∈ U∗. For ukα := αu∗ + (1 − α)uk−1, α ∈ [0, 1], we get the inequality

J(ukα) + φk(ukα) + (χk/2)‖ukα − uk−1‖²
< α(J(u∗) − J(uk−1)) + (α²χk/2)‖u∗ − uk−1‖² + J(uk−1) + ε =: ηk(α).

It is easy to show that the minimum of ηk(·) on the interval [0, 1] is attained at the point
αk = min[ (J(uk−1) − J(u∗)) / (χk‖u∗ − uk−1‖²), 1 ],

hence,
ηk(αk) = J(uk−1) − (J(uk−1) − J(u∗))² / (2χk‖u∗ − uk−1‖²) + ε   if αk < 1,


and

ηk(αk) ≤ J(uk−1) − (1/2)(J(uk−1) − J(u∗)) + ε   if αk = 1.

Thus, we get
ηk(αk) ≤ J(uk−1) − τ²/ς + ε   if αk < 1,
ηk(αk) ≤ J(uk−1) − τ/2 + ε   if αk = 1,   (3.2.7)
with ς := supk≥k0 2χk‖u∗ − uk−1‖² < ∞.

Regarding the definition of the sequence uk, for any α ∈ [0, 1] the inequality

J(uk) + φk(uk) + (χk/2)‖uk − uk−1‖² ≤ J(ukα) + φk(ukα) + (χk/2)‖ukα − uk−1‖² + ε²k/χk

holds true, and by means of (3.2.7) and the non-negativity of φk we get

J(uk) < max[ J(uk−1) − τ²/ς, J(uk−1) − τ/2 ] + ε + ε²k/χk.   (3.2.8)

The latter inequality is also true for k1 > k if only J(uk1−1) ≥ J∗ + τ .

Assuming that ε ≪ τ²/ς, we conclude from inequality (3.2.8) that after a finite number of steps a point uk′ (k′ > k) can be obtained for which
J(uk′) < J∗ + τ.   (3.2.9)

But from
Fk(uk) ≤ Fk(uk−1) + ε²k/χk   (3.2.10)

which is true for any k ∈ N, it follows in view of (3.2.6) that

J(uk′+1) ≤ J(uk′) + ε + ε²k′+1/χk′+1,

i.e., due to (3.2.9) and ε + ε²k′+1/χk′+1 < τ, we have
J(uk′+1) < J∗ + 2τ.   (3.2.11)

Now, if J(uk′+1) ≥ J∗ + τ, then using (3.2.8) with k := k′ + 2 as well as (3.2.11), again we obtain
J(uk′+2) < J∗ + 2τ.

Hence,
J(uk) < J∗ + 2τ   ∀ k ≥ k′ + 1,

which contradicts the assumption
lim supk→∞ J(uk) − J∗ = θ0 = 3τ,

proving that lim supk→∞ J(uk) = J∗. On the other hand, due to condition (i), we have lim infk→∞ J(uk) ≥ J∗, hence limk→∞ J(uk) = J∗, and the statement
limk→∞ ‖uk − uk−1‖ = 0


follows immediately from (3.2.10).

3. Finally we prove (e). Recalling the notions defined before Theorem 3.2.1 and setting

yk := arg minu∈Q {J(u) + (χk/2)‖u − uk−1‖²},

we obtain in view of the choice of vk, wk and δk,

Fk(vk) ≤ Fk(yk) ≤ J(yk) + (χk/2)‖yk − uk−1‖² + δk,
J(yk) + (χk/2)‖yk − uk−1‖² ≤ J(wk) + (χk/2)‖wk − uk−1‖².

The last two inequalities, together with φk(vk) ≥ 0, lead to

J(vk) + (χk/2)‖vk − uk−1‖² − δk ≤ J(yk) + (χk/2)‖yk − uk−1‖² ≤ J(wk) + (χk/2)‖wk − uk−1‖².   (3.2.12)

In the first and second part of the proof it was shown that the sequences {vk} and {uk} are bounded and that their cluster points belong to U∗. Boundedness of the set Q, and consequently also of the sequence {wk}, follows from Proposition A1.7.55. Therefore, for k = 1, 2, · · · , the inequality
|J(wk) + (χk/2)‖wk − uk−1‖² − J(vk) − (χk/2)‖vk − uk−1‖²| ≤ c1‖wk − vk‖   (3.2.13)

can be established. From (3.2.12) and (3.2.13) it follows that
J(wk) + (χk/2)‖wk − uk−1‖² − J(yk) − (χk/2)‖yk − uk−1‖² ≤ c1τk + δk,

and with regard to the strong convexity of Ψk(·) := J(·) + (χk/2)‖· − uk−1‖² and the choice of yk and wk, one can conclude that
‖wk − yk‖ ≤ √((2/χk)(c1τk + δk)).

Non-expansivity of the prox-mapping leads to

‖yk − u‖ ≤ ‖uk−1 − u‖ for any u ∈ U∗,

and using the relations ‖uk − vk‖ ≤ εk/χk and
‖uk − u‖ ≤ ‖uk − vk‖ + ‖vk − wk‖ + ‖wk − yk‖ + ‖yk − u‖,

we obtain
‖uk − u‖ ≤ εk/χk + τk + √((2/χk)(c1τk + δk)) + ‖uk−1 − u‖.

Now, the assumptions of the theorem ensure the convergence of the series
∑∞k=1 ( εk/χk + τk + √((2/χk)(c1τk + δk)) ),

and due to Lemma A3.1.4 the sequence {‖uk − u‖} converges for any u ∈ U∗. Taking into account that every cluster point of {uk} belongs to U∗, one easily sees that {uk} converges to a point of the optimal set U∗.


3.2.2 Remark. Let us analyze the condition (iv) in Theorem 3.2.1. Suppose the feasible set K is given as in Problem (A3.4.56). For sufficiently large k we conclude from the second part of the proof that J(vk) < J∗ + ϱ, hence wk = vk (and τk = 0) or g(wk) = 0, with g(·) := max1≤j≤m gj(·). In the latter case, from (3.2.4) with

ϕ := g,   x′ := ū,   x := wk,   t := ‖vk − wk‖/‖wk − ū‖,

it follows

g(vk) ≥ g(wk) + t(g(wk) − g(ū)),
g(vk) ≥ |g(ū)| ‖vk − wk‖/‖wk − ū‖ ≥ c2‖wk − vk‖,   (3.2.14)
with c2 := |g(ū)| / maxz∈∂Q ‖z − ū‖.

For the majority of penalty functions satisfying the conditions (i)-(iii) of Theorem 3.2.1, the inequalities

∂φk(uk)/∂gj ≤ c   (hence ∂φk(vk)/∂gj ≤ c),   j = 1, ..., m, k = 1, 2, · · · ,   (3.2.15)

hold (see Proposition 3.2.5 below). If, for instance,

φk(u) := rk ∑mj=1 max[0, gj(u)]²,   limk→∞ rk = +∞,

due to (3.2.15), we conclude immediately that
max[0, gj(vk)] ≤ c/(2rk).

This together with (3.2.14) gives
τk := ‖vk − wk‖ ≤ c/(2c2rk)
if vk ≠ wk and k is large enough. Therefore,

∑∞k=1 1/√rk < ∞   ⇒   ∑∞k=1 √τk < ∞.

The condition ∑∞k=1 √δk < ∞ is trivially satisfied, because in that case we have δk = 0 ∀ k. For other penalty functions some alterations are necessary, which are connected with the estimation of g(vk) on the basis of inequality (3.2.15). ♦
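As an illustration of Method (3.2.1), (3.2.2) with the quadratic penalty just discussed, the following sketch (hypothetical toy data, not the book's code) solves min ‖u − (2, 2)‖² subject to u1 + u2 ≤ 2, whose solution is u∗ = (1, 1); each subproblem is minimized by gradient descent until the stopping test (3.2.2) holds:

```python
import numpy as np

target = np.array([2.0, 2.0])
c = np.array([1.0, 1.0])          # constraint g(u) = c.u - 2 <= 0

def grad_F(u, u_prev, r_k, chi_k):
    # gradient of F_k(u) = J(u) + r_k*max(0, g(u))^2 + (chi_k/2)||u - u_prev||^2
    g = c @ u - 2.0
    return (2.0 * (u - target)
            + 2.0 * r_k * max(0.0, g) * c
            + chi_k * (u - u_prev))

u, chi_k = np.zeros(2), 1.0
for k in range(1, 16):
    r_k = float(k**2)             # penalty parameters with sum 1/sqrt(r_k) < inf
    eps_k = 1.0 / k**2            # accuracies for the test (3.2.2)
    u_prev = u.copy()
    step = 1.0 / (2.0 + 2.0 * r_k * (c @ c) + chi_k)   # 1 / Lipschitz bound
    while np.linalg.norm(grad_F(u, u_prev, r_k, chi_k)) > eps_k:
        u = u - step * grad_F(u, u_prev, r_k, chi_k)

err = np.linalg.norm(u - np.array([1.0, 1.0]))
```

With rk = k² the series ∑ 1/√rk converges, so by Remark 3.2.2 condition (iv) holds and the iterates approach u∗.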

We emphasize that generally, for any penalty function satisfying the conditions (i)-(iii) in Theorem 3.2.1, condition (iv) can be fulfilled by a suitable choice of the penalty parameters.

In order to apply barrier instead of penalty functions, we consider convex, lower semicontinuous (lsc) functions φk : Rn → R ∪ {+∞} with a common effective domain D ⊃ int K and


φk|D ∈ C1(D) ∀ k ∈ N. To establish a similar convergence theorem for regularized barrier methods, we choose a sequence {σk}, σk > 0 ∀ k, and define the sets

Uk := {u ∈ Q : φk(u) ≤ σk}

and the values

ζk := minu∈Uk {J(u) + (χk/2)‖u − uk−1‖²} − minu∈Q {J(u) + (χk/2)‖u − uk−1‖²}.   (3.2.16)

Again we use the notions defined before Theorem 3.2.1.

Now, under the previously made assumptions on {χk} and {εk}, let us consider the convergence of a regularized barrier method (3.2.1), (3.2.2) starting with u0 ∈ D.

3.2.3 Theorem. Let εk be chosen such that ∑∞k=1 εk < ∞. Assume that the barrier functions φk have the properties mentioned before and the following conditions are fulfilled:

(i') limk→∞ φk(u) = 0 if u ∈ int K, and limk→∞ φk(u) = +∞ if u ∉ K;

(ii’) lim infu∈∂K∩D ‖∇φk(u)‖ =∞ (or ∂K ∩D = ∅);

(iii’) for all k ∈ N it holds

φk+1(u) ≤ φk(u), ∀ u ∈ K ∩D.

Then

(a) the sequence {uk}, generated by Barrier Method (3.2.1), (3.2.2), is bounded;

(b) uk ∈ intK for sufficiently large k;

(c) limk→∞ ‖uk − uk−1‖ = 0;

(d) limk→∞ J(uk) = J∗.

If, moreover,

(iv') ∑∞k=1 √σk < ∞,   ∑∞k=1 √ζk < ∞,

then

(e) limk→∞ uk = u∗ ∈ U∗.

Proof: The proof of the statements (a)-(d) is almost identical to that of Theorem 7.1 in Grossmann and Kaplan [145]. In particular, the existence of a number k′ is guaranteed such that for all k ≥ k′ − 1

uk ∈ int Q   and   vk := arg minu∈Rn Fk(u) ∈ int Q.

We prove now the statement (e). Let k ≥ k′. Setting again

yk := arg minu∈Q {J(u) + (χk/2)‖u − uk−1‖²},   (3.2.17)


in view of the obvious inequality

J(yk) + (χk/2)‖yk − uk−1‖² ≤ J(vk) + (χk/2)‖vk − uk−1‖²

and the choice of ζk, one can conclude that

minu∈Uk {J(u) + (χk/2)‖u − uk−1‖²} ≤ J(vk) + (χk/2)‖vk − uk−1‖² + ζk.   (3.2.18)

But due to the conditions (i') and (iii'), the functions φk are non-negative on K and, with regard to (3.2.18) and the definition of Uk and ζk, we obtain for xk := arg min{Fk(u) : u ∈ Uk}

Fk(xk) ≤ J(vk) + (χk/2)‖vk − uk−1‖² + σk + ζk ≤ Fk(vk) + σk + ζk,
J(xk) + (χk/2)‖xk − uk−1‖² ≤ J(yk) + (χk/2)‖yk − uk−1‖² + σk + ζk.

Now, due to the strong convexity of the functions Ψk(·) := J(·) + (χk/2)‖· − uk−1‖² and Fk as well as the choice of vk and yk, the inequalities

‖xk − vk‖ ≤ √((2/χk)(σk + ζk)),   ‖xk − yk‖ ≤ √((2/χk)(σk + ζk))

hold for k ≥ k′. To establish uk → u∗ ∈ U∗ we proceed now as in the proof of Theorem 3.2.1.

3.2.4 Remark. Let us analyze condition (iv') in Theorem 3.2.3 for barrier functions of the type

φk(u) := ∑mj=1 exp(rk gj(u)),   rk+1 ≥ rk,   limk→∞ rk = ∞.

For these functions the validity of the conditions (i')-(iii') is evident. If σk > 0 are chosen such that ∑∞k=1 √σk < ∞ and rk satisfies the condition

∑∞k=1 √(|ln σk|/rk) < ∞,

then one can show that ∑∞k=1 √ζk < ∞. Indeed, suppose that u ∈ Q fulfills the inequality
g(u) ≤ ln(σk/m)/rk   (3.2.19)

with g(·) := max1≤j≤m gj(·). Then, obviously, u belongs to Uk. We can assume that σk < 1 and that rk is chosen such that for some fixed ū ∈ int Q it holds
g(ū) ≤ ln(σk/m)/rk   ∀ k.

Then, for arbitrary z ∈ Q,
g(λū + (1 − λ)z) ≤ λg(ū)   ∀ λ ∈ [0, 1].


Setting
λ := ln(σk/m)/(rk g(ū)),

we get
g(λū + (1 − λ)z) ≤ ln(σk/m)/rk,

proving the inclusion λū + (1 − λ)z ∈ Uk for this choice of λ. Hence,

minu∈Uk ‖z − u‖ ≤ ‖z − (λū + (1 − λ)z)‖ = λ‖z − ū‖
≤ λ maxu∈∂Q ‖u − ū‖ = (ln(σk/m)/(rk g(ū))) maxu∈∂Q ‖u − ū‖.

For zk := arg minv∈Uk ‖v − yk‖, with yk via (3.2.17), we get

‖zk − yk‖ ≤ c |ln(σk/m)|/rk   ∀ k.   (3.2.20)

But

ζk ≤ J(zk) + (χk/2)‖zk − uk−1‖² − J(yk) − (χk/2)‖yk − uk−1‖²
≤ J(zk) − J(yk) + (χk/2)‖yk − zk‖ ‖zk + yk − 2uk−1‖   (3.2.21)

and because the sequences {uk}, {zk}, {yk} and {χk} are bounded, the required result follows immediately from (3.2.20) and (3.2.21). ♦

For this type of regularized penalty methods a statement analogous to Proposition A3.4.43 is true.

3.2.5 Proposition. Assume that in Method (3.2.1), (3.2.2) applied to Problem (A3.4.56) the convex penalty functions φk : Rn → R have the form

φk := ϕk(g(·)),

where g := (g1, ..., gm) and ϕk : Rm → R, ϕk ∈ C1(dom ϕk), and that either the conditions (i)-(iii) of Theorem 3.2.1 or (i')-(iii') of Theorem 3.2.3 are fulfilled. Moreover, let the functions

ηjk(·) := ∂ϕk(g(·))/∂gj

be non-negative on Rn, and for any sequence {zk} let the following implication hold:

lim infk→∞ gj(zk) > −∞,   lim supk→∞ gj(zk) < 0   ⇒   limk→∞ ηjk(zk) = 0.

Then

(i) the sequence (uk, λk), with λk = (λk1 , ..., λkm), λkj := ηjk(uk), is bounded,

(ii) each cluster point of the sequence λk is a solution of the dual problem,


(iii) in case uk ∈ K the estimate

J(uk) ≥ J∗ ≥ J(uk) + 〈λk, g(uk)〉 − (εk + χk‖uk − uk−1‖) ρ(uk, U∗)   (3.2.22)

holds true.

Two-sided estimates of the type (3.2.22) can be obtained also for infeasible iterates uk ∉ K (cf. (A3.4.83)). In case upper bounds for ρ(uk, U∗) are known, inequality (3.2.22) can be used as an efficient stopping rule.

Proposition 3.2.5 immediately follows from Proposition A3.4.43 if ϕk ∈ C1(Rm) is chosen. Indeed, in view of inequality (3.2.2) we see that

‖∇J(uk) +∇φk(uk)‖ ≤ εk + χk‖uk − uk−1‖.

Recall that the relation ‖uk − uk−1‖ → 0 was proved in Theorems 3.2.1 and 3.2.3 without the use of the conditions (iv) and (iv'), respectively. Thus, setting

ε′k = εk + χk‖uk − uk−1‖,

we obtain ε′k ↓ 0 and thus the sequence {uk} satisfies the stopping rule in Proposition A3.4.43.

For convex infinite-dimensional problems, Penalty Method (3.2.1), (3.2.2) will be studied in the Sections 6.2 and 6.3 in a general framework, choosing penalties of the type

φk(u) := (1/2) ∑mj=1 λkj ( gj(u) + √(gj²(u) + 1/rk) ),

with limk→∞ rk = +∞ and limk→∞ λkj = λj > λ∗j, where λ∗ = (λ∗1, ..., λ∗m) is a Lagrange multiplier vector. Moreover, in Section 5.3 estimates of the rate of convergence for Method (3.2.1), (3.2.2) with barrier functions of the type (A3.4.78) will be established.

In the remainder of this subsection we present some results obtained by Alart and Lemaire [5]. They proposed regularized penalty methods with non-smooth penalty functions which are applicable to more general problems than Problem (A3.4.56). For instance, it might happen that the objective function cannot be extended outside of its effective domain by a finite convex function defined on the whole space. In [5] the problem

min{J(u) : u ∈ K},

is considered with K := {v ∈ Rn : g(v) ≤ 0} and J : Rn → R convex and lower semi-continuous. The function g is supposed to be convex and finite-valued. Moreover, it is assumed that Slater's condition holds and U∗ ≠ ∅. Let M > supu∈K J(u). Introducing the functions

J̄(u) := J(u) if u ∈ K,   J̄(u) := M if u ∉ K,

and

Jk(u) := max[J̄(u), rk g(u) + M],   rk > 0,   limk→∞ rk = +∞,


the iterates uk are determined by

Jk(uk) + (χk/2)‖uk − uk−1‖² ≤ minu∈Rn {Jk(u) + (χk/2)‖u − uk−1‖²} + εk   (3.2.23)

with χk, εk chosen such that

0 < χ ≤ χk ≤ χ̄ < ∞,   εk ↓ 0.

In [5] it is shown that the functions Jk are finite-valued, convex and

limk→∞ Jk(u) = J(u) if u ∈ int K,   = M if u ∈ ∂K,   = +∞ if u ∉ K.

3.2.6 Proposition. (cf. [5]) Assume that the set K is bounded, rk ≤ rk+1 ∀ k, and that

∑∞k=1 √εk < ∞,   ∑∞k=1 1/√rk < ∞.

Then the sequence {uk}, defined by Method (3.2.23), converges to some point u∗ ∈ U∗ and Jk(uk) → J∗.

This statement was proved by means of the theory of variational convergence, developed by Attouch [16], Attouch and Wets [18] and, concerning proximal-point methods, by Lemaire [260].
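The following one-dimensional sketch (a hypothetical toy instance, not taken from [5]) runs Method (3.2.23) with J(u) = (u − 2)², K = {u : |u| − 1 ≤ 0}, so that U∗ = {1}, and M = 10 > supK J; the nonsmooth auxiliary functions are minimized by a simple grid search, which realizes the εk-accuracy in (3.2.23):

```python
import numpy as np

M, chi = 10.0, 2.0
grid = np.linspace(-3.0, 3.0, 120001)       # grid search with spacing 5e-5

# J_bar equals J on K and the constant M outside K
J_bar = np.where(np.abs(grid) <= 1.0, (grid - 2.0)**2, M)

u = 0.0
for k in range(1, 30):
    r_k = float(k**2)                       # nondecreasing, sum 1/sqrt(r_k) < inf
    J_k = np.maximum(J_bar, r_k * (np.abs(grid) - 1.0) + M)
    F = J_k + 0.5 * chi * (grid - u)**2     # regularized auxiliary function
    u = float(grid[np.argmin(F)])           # approximate global minimizer

err = abs(u - 1.0)
```

The minimizers of the auxiliary functions slide along the kink of Jk toward the boundary point u∗ = 1 as rk grows, in accordance with Proposition 3.2.6.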

3.2.7 Remark. Proposition 3.2.6 can be derived also from Theorems A3.4.41 and 3.2.3, taking into account the following facts: Theorem A3.4.41 remains true if, instead of (A3.4.70) and the convexity of φk, we assume that int K ≠ ∅ (i.e. H = Rn and φk := φ1k), that J + φk is convex and

limk→∞ (J(u) + φk(u)) = J(u) if u ∈ int K,   = +∞ if u ∉ K.

Theorem 3.2.3 remains valid if, instead of (ii') and the convexity of φk, we suppose that J + φk is convex and φk(u) ≥ c > 0 for all u ∈ ∂K and all k. In these cases we can also drop the assumption about the smoothness of J, g and φk and replace criterion (3.2.2) by

Fk(uk) − minu∈Rn Fk(u) ≤ ε²k/χk.

The alterations required in the proofs of these theorems are insignificant. In order to verify condition (iv') in Theorem 3.2.3, we note that
Uk = {u ∈ Q : max[J̄(u), rk g(u) + M] − J̄(u) ≤ σk} ⊃ {u ∈ Q : max[0, rk g(u) − J∗ + M] ≤ σk}.

For arbitrary z ∈ ∂Q and λ ∈ [0, 1] it follows that
g(λū + (1 − λ)z) ≤ λg(ū)   (ū ∈ int Q is fixed),


and hence,
λū + (1 − λ)z ∈ Uk   if rk λ g(ū) − J∗ + M ≤ σk,   λ ∈ [0, 1].

In particular, if k is large enough, this inclusion holds for
λ := (M − J∗ − σk)/(−g(ū) rk).

Thus, similarly to the deduction of (3.2.20), (3.2.21), we obtain ∑∞k=1 √ζk < ∞ if we assume that ∑∞k=1 √σk < ∞ and ∑∞k=1 1/√rk < ∞ hold. ♦

Sometimes we take the regularizing parameter χk := 2 ∀ k in order to simplify the description of the PPR. Some practical recommendations with respect to the choice of χk will be given in this book in particular applications for specific classes of problems, as well as in Chapter 7 (see also Rockafellar [352] and Güler [153]).

3.2.3 Augmented Lagrangian methods with iterative PPR

Here the PPR suggested by Rockafellar [351] and Antipin [13] for solving problems of type (A3.4.56) will be described by means of the augmented Lagrangian function (A3.4.85)

LA(u, λ) := J(u) + (1/(2r)) { ‖[λ + r g(u)]+‖² − ‖λ‖² },   (r > 0).

To establish this method we mainly follow the description in [13].

3.2.8 Algorithm. (Regularized augmented Lagrangian method)
Data: u0 ∈ K, λ0 ∈ Rm+, εk ↓ 0;

S0: Set k := 0;

S1: if (uk, λk) is a saddle point of Problem (A3.4.56), stop;

S2: compute uk+1 such that
‖∇Fk(uk+1)‖ ≤ εk,   (3.2.24)
with
Fk(u) := LA(u, λk) + ‖u − uk‖²;

S3: set

λk+1 := [λk + rg(uk+1)]+; (3.2.25)

set k := k + 1 and go to S1.

Now we are going to study the convergence of Algorithm 3.2.8 applied to Problem (A3.4.56).
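A small numerical run of Algorithm 3.2.8 can be sketched as follows (an illustration with hypothetical data, not the book's code): J(u) = ‖u − (2, 2)‖² with the single constraint g(u) = u1 + u2 − 2 ≤ 0, whose KKT pair is u∗ = (1, 1), λ∗ = 2; step S2 is realized by gradient descent on Fk until (3.2.24) holds:

```python
import numpy as np

target = np.array([2.0, 2.0])
c = np.array([1.0, 1.0])                 # g(u) = c.u - 2
r = 1.0

def grad_F(u, u_center, lam):
    # gradient of F_k(u) = L_A(u, lam) + ||u - u_center||^2
    g = c @ u - 2.0
    return (2.0 * (u - target)
            + max(0.0, lam + r * g) * c
            + 2.0 * (u - u_center))

u, lam = np.zeros(2), 0.0
step = 1.0 / (4.0 + r * (c @ c))         # 1 / Lipschitz bound of grad_F
for k in range(60):
    eps_k = 1.0 / (k + 1)**2             # accuracies with sum eps_k < inf
    u_center = u.copy()
    while np.linalg.norm(grad_F(u, u_center, lam)) > eps_k:   # S2, (3.2.24)
        u = u - step * grad_F(u, u_center, lam)
    lam = max(0.0, lam + r * (c @ u - 2.0))                   # S3, (3.2.25)

err = np.linalg.norm(u - np.ones(2)) + abs(lam - 2.0)
```

Both the primal iterates and the multipliers converge, as stated in Theorem 3.2.9 below.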


3.2.9 Theorem. Assume that the mapping u ↦ g(u) on Rn is Lipschitz-continuous with constant L and that the sequence {εk} of positive numbers satisfies
limk→∞ εk = 0,   ∑∞k=1 εk < ∞.

Then
limk→∞ uk = u∗ ∈ U∗,   limk→∞ λk = λ∗ ∈ Λ∗,

where Λ∗ denotes the set of optimal Lagrange multiplier vectors.

Proof: For the pair (uk−1, λk−1), computed by Algorithm 3.2.8, let (ūk, λ̄k) be the exactly calculated iterates, i.e.,
ūk = arg minu∈Rn Fk(u),   (3.2.26)
λ̄k = [λk−1 + r g(ūk)]+.   (3.2.27)

In view of the strong convexity of Fk and (3.2.24)-(3.2.27), we obtain
‖ūk − uk‖ ≤ εk/2,   (3.2.28)
and
‖λ̄k − λk‖ = ‖[λk−1 + r g(ūk)]+ − [λk−1 + r g(uk)]+‖ ≤ r‖g(ūk) − g(uk)‖ ≤ rLεk/2 =: c1εk.   (3.2.29)

Next we estimate the distance of (ūk, λ̄k) to an arbitrary but fixed point (ū, λ̄) ∈ U∗ × Λ∗. In view of (3.2.26) and Proposition A1.5.34 it follows that
2〈ūk − uk−1, u − ūk〉 + 〈∇g(ūk)T[λk−1 + r g(ūk)]+, u − ūk〉 + J(u) − J(ūk) ≥ 0   ∀ u ∈ Rn,   (3.2.31)
and
〈∇g(ū)T[λ̄ + r g(ū)]+, u − ū〉 + J(u) − J(ū) ≥ 0   ∀ u ∈ Rn.   (3.2.32)

Setting u := ū in (3.2.31) and u := ūk in (3.2.32) and combining these inequalities, we obtain
2〈ūk − uk−1, ū − ūk〉 + 〈∇g(ūk)T[λk−1 + r g(ūk)]+ − ∇g(ū)T[λ̄ + r g(ū)]+, ū − ūk〉 ≥ 0,
and the convexity of g yields
2〈ūk − uk−1, ū − ūk〉 + 〈[λk−1 + r g(ūk)]+ − [λ̄ + r g(ū)]+, g(ū) − g(ūk)〉 ≥ 0.   (3.2.33)

Now, using the obvious relations

〈[x]+, x〉 = 〈[x]+, [x]+〉, 〈[x]+, y〉 ≤ 〈[x]+, [y]+〉,


we conclude that
〈[λk−1 + r g(ūk)]+ − [λ̄ + r g(ū)]+, g(ūk) − g(ū)〉
= (1/r)〈[λk−1 + r g(ūk)]+ − [λ̄ + r g(ū)]+, (λk−1 + r g(ūk) − λk−1) − (λ̄ + r g(ū) − λ̄)〉
≥ (1/r)〈[λk−1 + r g(ūk)]+ − [λ̄ + r g(ū)]+, [λk−1 + r g(ūk)]+ − λk−1 − ([λ̄ + r g(ū)]+ − λ̄)〉.

Hence,
〈[λk−1 + r g(ūk)]+ − [λ̄ + r g(ū)]+, g(ūk) − g(ū)〉
≥ (1/r)‖([λk−1 + r g(ūk)]+ − λk−1) − ([λ̄ + r g(ū)]+ − λ̄)‖²
+ (1/r)〈λk−1 − λ̄, ([λk−1 + r g(ūk)]+ − λk−1) − ([λ̄ + r g(ū)]+ − λ̄)〉.

On account of (3.2.27) and the saddle point property we have
[λ̄ + r g(ū)]+ − λ̄ = 0,

and the latter inequality implies
〈[λk−1 + r g(ūk)]+ − [λ̄ + r g(ū)]+, g(ūk) − g(ū)〉 ≥ (1/r)‖λ̄k − λk−1‖² + (1/r)〈λk−1 − λ̄, λ̄k − λk−1〉 = (1/r)〈λ̄k − λ̄, λ̄k − λk−1〉.   (3.2.34)

Now from (3.2.33) and (3.2.34) we conclude that
〈ūk − uk−1, ū − ūk〉 ≥ (1/(2r))〈λ̄k − λ̄, λ̄k − λk−1〉,

and after a short calculation
‖ūk − ū‖² + (1/(2r))‖λ̄k − λ̄‖² + ‖ūk − uk−1‖² + (1/(2r))‖λ̄k − λk−1‖² ≤ ‖uk−1 − ū‖² + (1/(2r))‖λk−1 − λ̄‖².

Setting z̄ := (ū, (1/√(2r)) λ̄)T, z̄k := (ūk, (1/√(2r)) λ̄k)T, zk := (uk, (1/√(2r)) λk)T etc., the latter inequality can be rewritten in the form
‖z̄k − z̄‖² + ‖z̄k − zk−1‖² ≤ ‖zk−1 − z̄‖²   (3.2.35)
and consequently,
‖z̄k − z̄‖ ≤ ‖zk−1 − z̄‖.   (3.2.36)

In order to estimate ‖z̄k − z̄‖ from below, we infer from (3.2.28) and (3.2.29) that
‖z̄k − zk‖ ≤ √(1/4 + c1²/(2r)) εk =: c2εk,


hence,
‖z̄k − z̄‖² = ‖z̄k − zk + zk − z̄‖² ≥ ‖zk − z̄‖² − 2‖zk − z̄‖‖z̄k − zk‖ ≥ ‖zk − z̄‖² − 2c2εk‖zk − z̄‖.

Together with (3.2.35) this yields
‖zk − z̄‖² + ‖z̄k − zk−1‖² ≤ ‖zk−1 − z̄‖² + 2c2εk‖zk − z̄‖.   (3.2.37)

But from (3.2.36) it follows that
‖zk − z̄‖ ≤ ‖zk − z̄k‖ + ‖z̄k − z̄‖ ≤ c2εk + ‖zk−1 − z̄‖.

Regarding the structure of the latter inequality and Lemma A3.1.4, we obtain the convergence of the sequence {‖zk − z̄‖} because ∑∞k=1 εk < ∞. In connection with (3.2.37) this leads to ‖zk − zk−1‖ → 0, hence
‖uk − uk−1‖ → 0,   ‖λk − λk−1‖ → 0.   (3.2.38)

It is easy to verify that for λ ≥ 0 the equality
[g(u)]+ = [[(1/r)λ + g(u)]+ − (1/r)λ]+
is true.

is true. So in view of (3.2.27) and (3.2.38) we get

‖[g(uk)]+‖ ≤ ‖[1

rλk−1 + g(uk)]+ −

1

rλk−1‖ =

1

r‖λk − λk−1‖. (3.2.39)

The sequence {(uk, λk)} is bounded. Therefore a convergent subsequence {(ūki, λ̄ki)} can be chosen with limi→∞(ūki, λ̄ki) = (u∗, λ∗). Thus, with (3.2.38) we have
limi→∞(uki, λki) = (u∗, λ∗).

Observing the continuity of [g(·)]+ and relation (3.2.39), we can conclude that [g(u∗)]+ = 0, i.e. the point u∗ is feasible. From (3.2.39) it follows also that

[(1/r)λ∗ + g(u∗)]+ − (1/r)λ∗ = 0,   (3.2.40)

hence
〈λ∗, g(u∗)〉 = 0.   (3.2.41)

Inequality (3.2.31), considered for k := ki, together with (3.2.38), leads to
〈∇J(u∗) + ∇g(u∗)T[λ∗ + r g(u∗)]+, u − u∗〉 ≥ 0   ∀ u ∈ Rn,

and due to (3.2.40) we obtain

∇J(u∗) +∇g(u∗)Tλ∗ = 0. (3.2.42)

Since u∗ is a feasible point, the relations (3.2.41) and (3.2.42) establish that (u∗, λ∗) is a saddle point for Problem (A3.4.56). Finally, from the convergence of {‖zk − z̄‖} for each point of U∗ × Λ∗, we conclude the statement.

It should be noted that the assumption on the existence of a global Lipschitz constant for g on Rn can be omitted if we make sharper estimates in the final part of the proof.

In [351] the following estimates have been established additionally:

J(uk) − J∗ ≤ −〈λk, g(uk)〉 + (εk + 2‖uk − uk−1‖)‖uk − u∗‖,
J(uk) − J∗ ≤ 2‖λ∗‖‖uk − uk−1‖

(cf. also inequality (3.2.22) in Proposition 3.2.5). In [14] a dual method of iterative PPR has been investigated, using a Lagrangian function regularized with respect to the dual variables. The iteration proceeds as follows:

λk ≈ arg maxλ∈Rm+ { −(1/2)‖λ − λk−1‖² + D(uk−1, λ) },
uk ≈ uk−1 − αk(∇J(uk−1) + ∇g(uk−1)Tλk),

with D(u, λ) := J(u) + 〈λ, g(u)〉 − (α/2)‖∇J(u) + ∇g(u)Tλ‖², α > 0. The step-size αk has to be chosen as in Proposition A3.4.39. Conditions imposed on the smoothness of J and gj and on the controlling parameters ensure the convergence of {(uk, λk)} to some point of U∗ × Λ∗ even if Λ∗ is unbounded. The function D(uk−1, ·) is concave and quadratic. Thus, calculating λk, a simple quadratic programming problem has to be solved.

3.3 PPR in Methods of Numerical Analysis

In this section we briefly consider well-known numerical methods which can be described within the framework of proximal-point methods, even though they were invented from quite different points of view.

3.3.1 Implicit Euler method

The problem of unconstrained minimization of a convex differentiable function J : Rn → R can be formulated as the problem of finding a stationary point of the differential equation

du/dt = −∇J(u),   u(0) = u0.

It is obvious that the application of the explicit Euler method

(uk+1 − uk)/τ = −∇J(uk),   k = 1, 2, · · · ,

corresponds to the gradient method with step-length parameter τ. On the other hand, the use of the implicit Euler method

(uk+1 − uk)/τ = −∇J(uk+1),   k = 1, 2, · · · ,   (3.3.1)


is equivalent to the minimization of the functional J by means of the iterative proximal point method

uk+1 := arg minu∈Rn Ψk,χ(u),
with Ψk,χ(u) := J(u) + (χ/2)‖u − uk‖², χ := 1/τ, and the starting point u0 := u(0).

This is true because the optimality condition ∇Ψk,χ(uk+1) = 0 coincides with the iteration scheme (3.3.1). In the theory of finite-difference methods it is well known that the implicit Euler method provides better stability: it is absolutely stable, whereas the explicit scheme is only conditionally stable (cf. for instance [76, 368]). Of course, one iteration of the proximal method is more expensive than one step of the gradient method. Nevertheless, for stiff differential equations, which usually occur in connection with ill-posed optimization problems, the implicit scheme is preferable.
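The stability difference can be seen on a one-dimensional stiff example (illustrative, not from the book): for J(u) = (a/2)u² with a = 100 and step size τ = 0.05 the explicit Euler (gradient) step diverges, while the implicit Euler (proximal) step is stable:

```python
# Gradient flow u' = -a u for J(u) = (a/2) u^2 with a stiff coefficient.
a, tau = 100.0, 0.05
u_explicit = 1.0
u_implicit = 1.0
for _ in range(20):
    u_explicit -= tau * a * u_explicit   # explicit Euler: factor 1 - tau*a = -4
    u_implicit /= 1.0 + tau * a          # implicit Euler / prox step: factor 1/6

# |u_explicit| blows up like 4^k, |u_implicit| decays like 6^(-k)
```

This is precisely the absolute stability of the implicit scheme: the proximal factor 1/(1 + τa) lies in (0, 1) for every τ > 0, while the explicit factor 1 − τa does not.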

Solving in the Hilbert space V the operator equation

Au = f (3.3.2)

with a symmetric, positive semi-definite operator A : V → V, f ∈ V, Kryanev [250] has investigated the iteration scheme

(I + τA)uk+1 = uk + τf, k = 1, 2, · · · , (3.3.3)

with I the identity operator, τ > 0 and u0 ∈ V arbitrarily chosen. This method can also be interpreted as a proximal point regularization

uk+1 := arg minu∈V {J(u) + (1/(2τ))‖u − uk‖²}   (3.3.4)

for the unconstrained minimization of the quadratic functional

J(u) := (1/2)〈Au, u〉 − 〈f, u〉.

In this case strong convergence of the PPR is guaranteed, whereas for general convex variational problems only weak convergence can be established.
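In finite dimensions, scheme (3.3.3) can be sketched as follows (a hypothetical singular example: a plain solve of Au = f would fail, while the regularized iteration converges to the solution determined by the starting point):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.0]])       # symmetric, positive semi-definite, singular
f = np.array([4.0, 0.0])         # f lies in the range of A
tau = 1.0

u = np.array([5.0, 3.0])         # arbitrary starting point u^0
for _ in range(100):
    u = np.linalg.solve(np.eye(2) + tau * A, u + tau * f)   # scheme (3.3.3)

residual = np.linalg.norm(A @ u - f)
# the null-space component of u^0 (here the second coordinate) is unchanged
```

Note that the component of u0 in the null space of A is left untouched by every iteration, so the limit is the particular solution of Au = f singled out by u0.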

3.3.2 Levenberg-Marquardt method

The Levenberg-Marquardt method, originally developed for solving non-linear least-squares problems, has later been adapted to minimize functions J : Rn → R which are not necessarily convex. This method generates iterates according to

uk+1 := uk − (∇²J(uk) + χk+1 I)⁻¹ ∇J(uk).   (3.3.5)

Note that for χk := 0 we just get the Newton method, whereas for χk → ∞ the descent direction approaches the direction of the negative gradient.
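Iteration (3.3.5) can be sketched on a hypothetical nonconvex test function, f(u) = u1⁴/4 − u1² + u1u2 + u2², which has a saddle point at the origin; a fixed χk := 3 exceeds the largest negative curvature there (about 2.24), so ∇²J(uk) + χk I stays positive definite along the run:

```python
import numpy as np

def grad(u):
    return np.array([u[0]**3 - 2.0*u[0] + u[1], u[0] + 2.0*u[1]])

def hess(u):
    return np.array([[3.0*u[0]**2 - 2.0, 1.0],
                     [1.0, 2.0]])

chi = 3.0                                 # regularization, kept constant here
u = np.array([0.1, 0.0])                  # start near the saddle at the origin
for _ in range(150):
    # Levenberg-Marquardt / regularized Newton step (3.3.5)
    u = u - np.linalg.solve(hess(u) + chi * np.eye(2), grad(u))

stationarity = np.linalg.norm(grad(u))
```

The iterates escape the saddle point and settle at the stationary point with u1 = √2.5, illustrating how the χ-term keeps the Newton step well-defined despite the indefinite Hessian.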

It is easily seen that Method (3.3.5) can be rewritten equivalently in the form

uk+1 := arg minu∈Rn {Jk(u) + (χk+1/2)‖u − uk‖²}


with

Jk(u) := J(uk) + 〈∇J(uk), u − uk〉 + (1/2)〈∇²J(uk)(u − uk), u − uk〉,

i.e., this scheme can be interpreted as an iterative PPR of the classical Newton method.

It should be mentioned that another interpretation of the Levenberg-Marquard method became the starting point for the development of trust-regionalgorithms which have attracted considerable attention during the last decades(cf. Fletcher [117]). We come back to this point of view in Section 3.4.3.
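As a minimal illustration of iteration (3.3.5), consider the following Python sketch (the test function J(u) = (u₁² − 1)² + u₂² and the constant regularization parameter χ are our own choices, not taken from the text):

```python
import numpy as np

def grad(u):   # gradient of J(u) = (u1^2 - 1)^2 + u2^2  (illustrative nonconvex function)
    return np.array([4.0 * u[0] * (u[0]**2 - 1.0), 2.0 * u[1]])

def hess(u):   # Hessian of J; indefinite for |u1| < 1/sqrt(3)
    return np.array([[12.0 * u[0]**2 - 4.0, 0.0], [0.0, 2.0]])

u = np.array([1.5, 1.0])
chi = 1.0                                # fixed regularization parameter chi_{k+1}
for _ in range(100):
    # Levenberg-Marquardt step (3.3.5): u <- u - (Hess + chi I)^{-1} grad
    u = u - np.linalg.solve(hess(u) + chi * np.eye(2), grad(u))

print(u)   # approaches the minimizer (1, 0)
```

With χ = 0 the loop would be the plain Newton method; growing χ damps the step towards a short gradient step, as described above.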

3.3.3 Linearization method

Now we turn to the linearization method (A3.4.64) sketched briefly in Appendix A3.4.2. This method is also interesting in the context of this book, because it can be interpreted as an iterative PPR for constrained problems. To begin with, consider Problem (A3.4.56), linearized at the point u^{k−1}:

min ⟨∇J(u^{k−1}), u − u^{k−1}⟩ (3.3.6)

s.t. g_j(u^{k−1}) + ⟨∇g_j(u^{k−1}), u − u^{k−1}⟩ ≤ 0, j ∈ I_{k−1}(u^{k−1}). (3.3.7)

It should be noted that in the linearized problem some of the initial constraints may be deleted.

In Remark A3.4.40 it was mentioned that the fully linearized problem may not be solvable. Therefore, mere linearization is not suitable even as a standard method for solving well-posed problems.

In order to recapitulate Method (A3.4.64), let ū^k be the minimizer of the regularized function

⟨∇J(u^{k−1}), u − u^{k−1}⟩ + (χ/2)‖u − u^{k−1}‖², χ > 0,

subject to the constraints (3.3.7). Then the next iterate is calculated by

u^k := u^{k−1} + γ(ū^k − u^{k−1}),

with γ > 0 determined by some line search procedure. This method has the following essential advantage of iterative PPR: together with good stability of the auxiliary problems, it ensures convergence of a minimizing sequence to a solution of the original problem. Moreover, Proposition A3.4.39 guarantees that the method is stable under small data perturbations of the original problem in C¹(ℝⁿ).

3.4 PPR for Nonconvex Problems

In this section we deal with the application of proximal point methods to non-convex minimization problems

min{f(u) : u ∈ K}

with f : ℝⁿ → ℝ and K ⊂ ℝⁿ a closed set. As far as we know, the first direct application of the proximal point method to certain nonconvex unconstrained minimization problems was performed by Fukushima and Mine [123].

If we drop the convexity assumptions on the objective function f and/or the feasible set K, several problems arise (see [220]). For instance, the prox operator is possibly no longer well-defined, because it is not clear whether a unique global minimizer of the regularized objective function exists at all. Moreover, in general the prox operator might not be non-expansive anymore, even in arbitrarily small neighborhoods of local or global minima.

3.4.1 Example. Take K := ℝ² and f(u) := min[u₁², u₂²]. Then the optimal set is given by U* = {u ∈ ℝ² : u₁ = 0 ∨ u₂ = 0}. The prox points of u¹ = (2α, α)ᵀ and u² = (α, 2α)ᵀ, with an arbitrarily chosen α > 0 and χ := 2, are given by

Prox_{f,ℝⁿ} u¹ = (2α, α/2)ᵀ, Prox_{f,ℝⁿ} u² = (α/2, 2α)ᵀ,

thus

‖Prox_{f,ℝⁿ} u¹ − Prox_{f,ℝⁿ} u²‖ = √(9/2) α > √2 α = ‖u¹ − u²‖,

showing the destruction of non-expansivity. ♦
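The prox points of Example 3.4.1 can be checked by brute force (Python/NumPy grid minimization; the choice α := 1 and the grid resolution are ours):

```python
import numpy as np

alpha, chi = 1.0, 2.0
grid = np.linspace(-1.0, 3.0, 801)           # step 0.005, covers the relevant region
V1, V2 = np.meshgrid(grid, grid, indexing="ij")
f = np.minimum(V1**2, V2**2)                 # f(v) = min[v1^2, v2^2]

def prox(u):
    # minimize f(v) + (chi/2)|v - u|^2 over the grid
    psi = f + (chi / 2.0) * ((V1 - u[0])**2 + (V2 - u[1])**2)
    i, j = np.unravel_index(np.argmin(psi), psi.shape)
    return np.array([grid[i], grid[j]])

p1 = prox(np.array([2 * alpha, alpha]))      # expected (2a, a/2)
p2 = prox(np.array([alpha, 2 * alpha]))      # expected (a/2, 2a)
print(p1, p2)
print(np.linalg.norm(p1 - p2), np.linalg.norm(np.array([alpha, -alpha])))
```

The computed prox distance exceeds the distance of the arguments, which is exactly the loss of non-expansivity claimed above.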

As the following example shows, the prox iteration may terminate in a point which is a fixed point of the prox operator but neither a local minimum nor a stationary point of f, even if the regularized objective function is convex.

3.4.2 Example. Consider the feasible set

K := {u ∈ ℝ² : u₁ + u₂ ≤ 0, u₁ ≤ 0, −u₂ ≤ 1}

and the objective function f(u) := −u₁ − (1/2)u₂². Starting with u⁰ = (−1/4, 1/4)ᵀ and choosing χ := 2 we obtain

Prox_{f,K} u⁰ = (0, 0)ᵀ = Prox_{f,K} 0,

telling us that the point 0 = (0, 0)ᵀ is a fixed point of the prox mapping but not a local minimum of the initial problem. ♦
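Again this is easy to verify by brute force over the feasible set (Python/NumPy sketch; the grid bounds are our choice):

```python
import numpy as np

chi = 2.0
g = np.linspace(-1.5, 1.0, 501)              # step 0.005; covers the relevant part of K
U1, U2 = np.meshgrid(g, g, indexing="ij")
feas = (U1 + U2 <= 0) & (U1 <= 0) & (-U2 <= 1.0)
f = -U1 - 0.5 * U2**2

def prox_K(u):
    # minimize f(v) + (chi/2)|v - u|^2 over the feasible grid points only
    psi = np.where(feas, f + (chi / 2.0) * ((U1 - u[0])**2 + (U2 - u[1])**2), np.inf)
    i, j = np.unravel_index(np.argmin(psi), psi.shape)
    return np.array([g[i], g[j]])

p = prox_K(np.array([-0.25, 0.25]))
q = prox_K(np.array([0.0, 0.0]))
print(p, q)   # both are (0, 0): a fixed point of the prox mapping
```

Yet moving from (0, 0) along the negative u₂-axis stays feasible and strictly decreases f, so the fixed point is indeed no local minimum.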

Despite these obstacles there are situations in which the proximal point algorithm can be applied successfully. The following proposition shows that the set of proximal points at a given argument of the prox operator is never empty. Consequently, the proximal mapping is set-valued with dom(Prox_{f,K}) = ℝⁿ.

3.4.3 Proposition. Let f be continuous and inf_{v∈ℝⁿ} f(v) > −∞.

(i) Then for all χ > 0 and u ∈ ℝⁿ

Arg min_{v∈ℝⁿ} Ψ_{χ,ℝⁿ}(v) ≠ ∅,

with Ψ_{χ,ℝⁿ}(v) = f(v) + (χ/2)‖v − u‖².

(ii) For any χ > 0 and fixed u ∈ ℝⁿ let u*(χ) = u*(χ, u) be such that

u*(χ) ∈ Arg min_{v∈ℝⁿ} Ψ_{χ,ℝⁿ}(v).

Then lim_{χ→∞} u*(χ) = u.


Proof: Let u ∈ ℝⁿ and χ > 0 be arbitrary but fixed. Due to the boundedness of f from below we also have c := inf_{v∈ℝⁿ} Ψ_{χ,ℝⁿ}(v) > −∞. This together with the continuity of f guarantees the existence of a bounded sequence {v^k} with Ψ_{χ,ℝⁿ}(v^k) → c; unboundedness of {v^k} would result in unboundedness of {Ψ_{χ,ℝⁿ}(v^k)}, contradicting its convergence. Hence, there exists a convergent subsequence {v^{k_i}} with limit v̄ such that

Ψ_{χ,ℝⁿ}(v̄) = lim_{i→∞} Ψ_{χ,ℝⁿ}(v^{k_i}) = c,

and accordingly v̄ ∈ Arg min_{v∈ℝⁿ} Ψ_{χ,ℝⁿ}(v).

Again by assumption,

f(u*(χ)) + (χ/2)‖u*(χ) − u‖² = Ψ_{χ,ℝⁿ}(u*(χ)) ≤ Ψ_{χ,ℝⁿ}(u) = f(u),

showing that f(u*(χ)) ≤ f(u) for any χ > 0 and

‖u*(χ) − u‖² ≤ (2/χ)[f(u) − f(u*(χ))].

Now, taking the limit χ → ∞, due to the boundedness of [f(u) − f(u*(χ))] we get ‖u*(χ) − u‖ → 0.

The next example was considered by Voetmann [412], who calculated the prox points by a Matlab routine with the highest adjustable accuracy. In this example we are only interested in the prox points themselves.

3.4.4 Example. Consider the Rosenbrock function

f(u) := 100(u₂ − u₁²)² + (1 − u₁)²,

which is non-convex but has a unique global minimizer at u* = (1, 1)ᵀ and no further local minima. The regularized objective function Ψ_{χ,ℝⁿ}(v) also has a unique global minimum for all χ > 0 and u ∈ U⁰ := {v ∈ ℝ² : −2 ≤ v₁ ≤ 2, −2 ≤ v₂ ≤ 2}. Starting the proximal iterations with arbitrary but bounded 0 < χ_k < χ̄ in u⁰ = (−1.2, 1)ᵀ, the resulting sequence always converges to u*.

k    u^k_1       u^k_2          k    u^k_1       u^k_2
0   −1.2        1
1    0.518094   0.272061        8    0.972203   0.945065
2    0.720923   0.518498        9    0.980335   0.960977
3    0.821633   0.674301       10    0.986047   0.972232
4    0.881008   0.775668       11    0.990079   0.980216
5    0.918694   0.843659       12    0.992935   0.985892
6    0.943624   0.890193       13    0.994964   0.989934
7    0.960538   0.922472       14    0.996408   0.992814
...

Table 3.4.1: Iteration log of Example 3.4.4

Table 3.4.1 shows the computed iterates for χ_k := 100. Obviously convergence occurs, and from k = 3 on the iterates follow the bottom of the steep valley that is characteristic for the Rosenbrock function. ♦


Proposition 3.4.3 and Example 3.4.4 show that one can realistically expect that there are non-convex optimization problems which allow a successful convexification based on a suitable choice of the prox parameter χ_k.

3.4.1 Convexification of unconstrained problems

In the sequel, we deal with Problem

min_{u∈ℝⁿ} f(u) (3.4.1)

without convexity of the objective function f. However, the application of the exact proximal point method will be studied under the hypothesis that the regularized objective functions

Ψ_k(u) := f(u) + (χ_k/2)‖u − u^{k−1}‖² (3.4.2)

are strongly convex on some convex set Ω, which has to contain the sequence of proximal iterates {u^k}. Due to the evident non-increase of {f(u^k)}, Ω may be any convex set containing {u ∈ ℝⁿ : f(u) ≤ f(u⁰)}. In such a situation we can usually handle the regularized auxiliary problem

min_{v∈Ω} Ψ_k(v) (3.4.3)

as a convex minimization problem, and this is a very essential argument for applying proximal methods. But, of course, the assumption that the function Ψ_k(·) becomes convex under a suitable choice of χ_k > 0 is restrictive (see for instance the problem f(u) := min[|u|, |u + 1|] on ℝ¹).

Therefore, we start with a simple statement which establishes the possibility of convexification for an important class of non-convex functions.

3.4.5 Proposition. Let f : ℝⁿ → ℝ be of the form

f(u) := sup_{τ∈T} ϕ(u, τ),

with T ⊂ ℝˡ and ϕ(·, τ) differentiable for all τ ∈ T on a convex set Ω with Ω ∩ dom f ≠ ∅. Suppose that for each τ ∈ T the gradient ∇_u ϕ(·, τ) is Lipschitz continuous on Ω with constant L_τ, that L = sup_{τ∈T} L_τ < ∞, and that sup_{τ∈T} ‖∇_u ϕ(ū, τ)‖ < ∞ for some ū ∈ Ω ∩ dom f.

Then, for χ ≥ L, the function f(·) + (χ/2)‖·‖² is convex and finite on Ω.

Proof: For each τ ∈ T and arbitrary u, v ∈ Ω one gets

⟨∇_u ϕ(u, τ) + χu − ∇_u ϕ(v, τ) − χv, u − v⟩ ≥ −L‖u − v‖² + χ‖u − v‖².

Hence, ϕ(·, τ) + (χ/2)‖·‖² is convex on Ω if χ ≥ L.

Due to the properties of ϕ(·, τ), for any v, u ∈ Ω, we obtain also

ϕ(u, τ) = ϕ(v, τ) + ∫₀¹ ⟨∇_u ϕ(v + t(u − v), τ), u − v⟩ dt
        = ϕ(v, τ) + ∫₀¹ ⟨∇_u ϕ(v + t(u − v), τ) − ∇_u ϕ(v, τ), u − v⟩ dt + ⟨∇_u ϕ(v, τ), u − v⟩
        ≤ ϕ(v, τ) + (L_τ/2)‖v − u‖² + ⟨∇_u ϕ(v, τ), u − v⟩.

Setting v := ū in this inequality and taking into account that

sup_{τ∈T} ‖∇_u ϕ(ū, τ)‖ =: c < ∞,

one can conclude that

ϕ(u, τ) ≤ ϕ(ū, τ) + (L_τ/2)‖u − ū‖² + c‖u − ū‖,

hence,

sup_{τ∈T} ϕ(u, τ) + (χ/2)‖u‖² = sup_{τ∈T} {ϕ(u, τ) + (χ/2)‖u‖²} < ∞ ∀ u ∈ Ω.

In view of the convexity of ϕ(·, τ) + (χ/2)‖·‖² for each τ ∈ T, this ensures that the function f(·) + (χ/2)‖·‖² is convex and finite on Ω.
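Proposition 3.4.5 can be illustrated numerically (Python sketch; the particular pair ϕ₁ = −cos, ϕ₂ = sin, for which L = 1, and the sampling interval are our own choices):

```python
import math, random

def f(u):
    return max(-math.cos(u), math.sin(u))   # sup of smooth functions, gradients 1-Lipschitz (L = 1)

def g(u):
    return f(u) + 0.5 * u * u                # f + (chi/2)|.|^2 with chi = 1 >= L

# f itself is not convex: near u = pi/2 the active piece sin(u) is strictly concave.
a, b = math.pi / 2 - 0.3, math.pi / 2 + 0.3
assert f(0.5 * (a + b)) > 0.5 * (f(a) + f(b))

# The regularized function passes a randomized midpoint-convexity test.
random.seed(0)
violations = 0
for _ in range(10000):
    x, y = random.uniform(-4, 4), random.uniform(-4, 4)
    if g(0.5 * (x + y)) > 0.5 * (g(x) + g(y)) + 1e-12:
        violations += 1
print(violations)   # 0: g is (midpoint) convex
```

Indeed −cos u + u²/2 and sin u + u²/2 both have nonnegative second derivatives, so their pointwise maximum is convex, as the proposition asserts.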

Now, let f : ℝⁿ → ℝ be a given lower semicontinuous function, and suppose that the functions Ψ_k(·) for χ_k ∈ (χ̲, χ̄], χ̲ > 0, are strongly convex on Ω ⊃ {u ∈ ℝⁿ : f(u) ≤ f(ū)}. Then, starting the exact proximal point method

u^k = arg min_{u∈ℝⁿ} { f(u) + (χ_k/2)‖u − u^{k−1}‖² } (3.4.4)

with u⁰ : f(u⁰) ≤ f(ū), we have the known facts that

• u^k is uniquely defined;

• 0 ∈ ∂( f(u^k) + (χ_k/2)‖u^k − u^{k−1}‖² );

• f(u^k) + (χ_k/2)‖u^k − u^{k−1}‖² ≤ f(u^{k−1}).

If inf_{u∈ℝⁿ} f(u) > −∞, then the latter inequality yields

f(u^k) → f̄ > −∞ and ‖u^k − u^{k−1}‖ → 0.

Remember that a nonlinear programming problem

min{f₀(u) : f_j(u) ≤ 0, j = 1, ..., m}, (3.4.5)

whose Lagrangian function possesses a saddle point, can be transformed into the unconstrained problem

min{f(u) : u ∈ ℝⁿ},

with f(u) := max_{0≤j≤m} η_j(u), η₀ := f₀, η_j := f₀ + αf_j, j = 1, ..., m, and a sufficiently large α > 0.

Along with other applications, this motivates us to consider the proximal point method (3.4.4) for f(u) = max_{i∈I} ϕ_i(u), |I| < ∞, with differentiable functions ϕ_i. Concerning the ϕ_i, we suppose also that their gradients are Lipschitz continuous (with constants L_i) on some convex set Ω ⊃ {u : f(u) ≤ f(ū)}. Then, due to the finiteness of the index set I, the conditions of Proposition 3.4.5 are satisfied.

3.4.6 Theorem. Let the hypotheses made before on the ϕ_i be valid and let χ_k be chosen such that max_{i∈I} L_i < χ_k ≤ χ̄. Moreover, let inf_{u∈ℝⁿ} f(u) > −∞. Then, starting Method (3.4.4) with u⁰ : f(u⁰) ≤ f(ū), we obtain either


(i) u^{k−1} = u^k holds for some k, in which case u^k is a stationary point of f,

or

(ii) any accumulation point of {u^k} is a stationary point of f.

Proof: Due to the differentiability of the ϕ_i and the convexity of Ψ_k(·) := f(·) + (χ_k/2)‖· − u^{k−1}‖² on Ω, the subdifferential ∂Ψ_k(u^k) is the convex hull of the gradients at u^k of the functions ϕ_i(·) + (χ_k/2)‖· − u^{k−1}‖², i ∈ I(u^k), where I(u^k) = {i ∈ I : ϕ_i(u^k) = f(u^k)}.

Hence, for some λ_i^k ≥ 0, i ∈ I(u^k), such that Σ_{i∈I(u^k)} λ_i^k = 1, we obtain

0 = Σ_{i∈I(u^k)} λ_i^k ∇ϕ_i(u^k) + χ_k(u^k − u^{k−1}). (3.4.6)

If u^k = u^{k−1}, then

0 = Σ_{i∈I(u^k)} λ_i^k ∇ϕ_i(u^k),

thus, the point 0 is included in Clarke's subdifferential ∂_cl f(u^k), proving that u^k is a stationary point of f. If {u^k} is an infinite sequence and ũ is one of its accumulation points, then due to the finiteness of I and 0 ≤ λ_i^k ≤ 1 ∀ i ∈ I(u^k), ∀ k, we are able to choose a subsequence {k_l} such that

I(u^{k_1}) = I(u^{k_2}) = ... =: Ī,

and for l → ∞,

u^{k_l} → ũ and λ_i^{k_l} → λ̄_i ∀ i ∈ Ī.

Moreover, because of inf_{u∈ℝⁿ} f(u) > −∞, the relation

lim_{k→∞} ‖u^k − u^{k−1}‖ = 0

is valid. Now, taking the limit in (3.4.6) for k = k_l, l → ∞, one gets

0 = Σ_{i∈Ī} λ̄_i ∇ϕ_i(ũ), with λ̄_i ≥ 0, Σ_{i∈Ī} λ̄_i = 1.

On account of I(ũ) ⊃ Ī and the properties of Clarke's subdifferential, this means that 0 ∈ ∂_cl f(ũ), i.e. ũ is a stationary point of f.

3.4.7 Remark. If the set {u : f(u) ≤ f(u⁰)} is unbounded, the existence of accumulation points is not guaranteed in general. Moreover, in the case that Problem (3.4.1) is convex, it is known that ‖u^k‖ → ∞ if U* = ∅. ♦

3.4.8 Remark. If ũ = lim_{l→∞} u^{k_l} and f is convex in some neighborhood B_r(ũ) := {u : ‖u − ũ‖ ≤ r} with r > 0, then of course ũ is a local minimum of f. In this case, on account of ‖u^k − u^{k−1}‖ → 0, for sufficiently large l one can claim that ‖u^{k_l} − ũ‖ ≤ r/2 and

‖u^{k_l+s} − u^{k_l+s−1}‖ ≤ r/2, s = 1, 2, ...

Thus, u^{k_l+1} ∈ B_r(ũ), and hence u^{k_l+1} is the minimum point of the sum of the convex functions f and (χ_{k_l+1}/2)‖· − u^{k_l}‖² on B_r(ũ). From Proposition II.2.2 in [100] we obtain

f(ũ) − f(u^{k_l+1}) + χ_{k_l+1} ⟨u^{k_l+1} − u^{k_l}, ũ − u^{k_l+1}⟩ ≥ 0.

Together with the evident identity

‖u^{k_l+1} − ũ‖² − ‖u^{k_l} − ũ‖² = −‖u^{k_l+1} − u^{k_l}‖² + 2⟨u^{k_l+1} − u^{k_l}, u^{k_l+1} − ũ⟩

this leads to

‖u^{k_l+1} − ũ‖² − ‖u^{k_l} − ũ‖² ≤ −‖u^{k_l+1} − u^{k_l}‖² + (2/χ_{k_l+1})( f(ũ) − f(u^{k_l+1}) ),

yielding ‖u^{k_l+1} − ũ‖ ≤ ‖u^{k_l} − ũ‖ and, in general,

‖u^{k_l+s} − ũ‖ ≤ ‖u^{k_l+s−1} − ũ‖, s = 1, 2, ...

Hence, the sequence {‖u^k − ũ‖} converges, and in view of lim_{l→∞} u^{k_l} = ũ, one can conclude that lim_{k→∞} u^k = ũ. ♦

It should be noted that we did not suppose that ũ is the unique minimum in the neighborhood B_r(ũ).

The following result is of interest, in particular, for semi-infinite programming problems, which often can be reduced to the unconstrained minimization of functions of the type f(u) = sup_{τ∈T} ϕ(u, τ) with T a compact set.

3.4.9 Theorem. Let the function f be defined by f(u) := sup_{τ∈T} ϕ(u, τ), where T ⊂ ℝˡ is a compact set and ϕ is continuous on ℝⁿ × T. Moreover, suppose that inf_{u∈ℝⁿ} f(u) > −∞ and that ∇_u ϕ is continuous on Ω × T, where Ω is an open convex set containing {u : f(u) ≤ f(ū)}. Finally, let

‖∇_u ϕ(u′, τ) − ∇_u ϕ(u″, τ)‖ ≤ L‖u′ − u″‖ ∀ u′, u″ ∈ Ω, ∀ τ ∈ T.

Then the function f(·) + (χ/2)‖·‖² is strongly convex on Ω if χ > L, and for Method (3.4.4) with L < χ_k ≤ χ̄ and the starting point u⁰ : f(u⁰) ≤ f(ū) the conclusion of Theorem 3.4.6 remains true.

Proof: Due to the compactness of T and the continuity of ∇_u ϕ on Ω × T, finiteness of sup_{τ∈T} ‖∇_u ϕ(u, τ)‖ is guaranteed for each u ∈ Ω. Therefore, Proposition 3.4.5 implies strong convexity of f(·) + (χ/2)‖·‖² on Ω for χ > L.

Moreover, f is a locally Lipschitzian function and its upper Clarke derivative coincides with the directional derivative (see [89], Example 2.1.4). Therefore, the theorem about Clarke's subdifferential of a sum of functions (cf. [77]) permits us to conclude that, for u ∈ Ω,

∂_cl( f(u) + (χ_{k+1}/2)‖u − u^k‖² ) = ∂_cl f(u) + ∂_cl( (χ_{k+1}/2)‖u − u^k‖² ),

and regarding the convexity of f(·) + (χ_{k+1}/2)‖· − u^k‖² and of (χ_{k+1}/2)‖· − u^k‖², this leads to

∂( f(u) + (χ_{k+1}/2)‖u − u^k‖² ) = ∂_cl f(u) + χ_{k+1}(u − u^k).

Hence,

0 ∈ ∂( f(u^{k+1}) + (χ_{k+1}/2)‖u^{k+1} − u^k‖² ) = ∂_cl f(u^{k+1}) + χ_{k+1}(u^{k+1} − u^k), (3.4.7)

and if u^k = u^{k+1}, then u^{k+1} is a stationary point of f. If {u^k} is an infinite sequence and ũ = lim_{i→∞} u^{k_i}, then the inclusion 0 ∈ ∂_cl f(ũ) follows from (3.4.7), the relation ‖u^{k+1} − u^k‖ → 0 as well as the closedness of the mapping u → ∂_cl f(u).

In principle, Theorem 3.4.6 could be considered as a particular case of Theorem 3.4.9; however, we preferred to give a direct proof.

Now, let us consider a very special case in non-convex programming where it is possible to guarantee that the proximal point method generates a sequence {u^k} converging to a point u* ∈ Arg min{f(u) : u ∈ ℝⁿ}.

Let f : ℝⁿ → ℝ be a lower semicontinuous function with dom f ≠ ∅. With a given c > inf_{u∈ℝⁿ} f(u), define N_c = {u : f(u) ≤ c}. Again we want to solve the problem

min{f(u) : u ∈ ℝⁿ},

but now under the following

3.4.10 Assumption.

(i) U* = {u : f(u) = inf_{v∈ℝⁿ} f(v) =: f*} ≠ ∅;

(ii) N_c is contained in a convex set Ω, and for some χ > 0 the function Ψ_{χ,Ω}(u) = f(u) + (χ/2)‖u‖² is convex on Ω;

(iii) for some c₀ ∈ [f*, c) and χ chosen as in (ii) one has

‖y(u) − χu‖ > d > 0 ∀ u ∈ N_c \ N_{c₀}, ∀ y(u) ∈ Λ(u), (3.4.8)

where Λ(u) = ∂Ψ_{χ,Ω}(u);

(iv) N_{c₀} is convex, and f is convex on N_{c₀}.

3.4.11 Theorem. The proximal iterates in (3.4.4) with χ < χ_k ≤ χ̄ and arbitrary u⁰ ∈ N_c converge to a solution point u* ∈ U*.

Proof: Let u⁰ ∈ N_c \ N_{c₀}. Then it is clear that u^k ∈ N_c for all proximal steps k. Assume now that u^{k+1} ∈ N_c \ N_{c₀}. Due to

f(u) + (χ_{k+1}/2)‖u − u^k‖² = Ψ_{χ,Ω}(u) + ((χ_{k+1} − χ)/2)‖u‖² − χ_{k+1}⟨u, u^k⟩ + (χ_{k+1}/2)‖u^k‖²

and the definition of u^{k+1}, we obtain

0 ∈ ∂Ψ_{χ,Ω}(u^{k+1}) − χu^{k+1} + χ_{k+1}(u^{k+1} − u^k).

Observing (3.4.8), this leads to

‖u^{k+1} − u^k‖ > d/χ̄.

Now, taking into account the inequality

f(u^{k+1}) + (χ_{k+1}/2)‖u^{k+1} − u^k‖² ≤ f(u^k),

one can conclude that

f(u^{k+1}) < f(u^k) − (χ/(2χ̄²)) d². (3.4.9)

Estimate (3.4.9) shows that, after a finite number of steps, the prox-iterates fall into the set N_{c₀}. With regard to Assumption 3.4.10(iv) and the non-increase of {f(u^k)}, the use of the convergence results for convex problems (see, for instance, Proposition 3.1.8) ensures convergence of {u^k} to a solution u* ∈ U*. The same happens if u⁰ ∈ N_{c₀}.

3.4.12 Example. (Showing the fulfilment of Assumption 3.4.10.) Consider in ℝ² the function f(u) = u₁² + u₂² + 15u₁²u₂². Obviously, Assumption 3.4.10(i) is valid: u* = (0, 0)ᵀ is the unique minimum. The function f is non-convex, but it is strongly convex on the ball {u ∈ ℝ² : ‖u‖ ≤ 1/√15}. Outside of this ball we have ‖∇f(u)‖ > 2/√15. Therefore, Assumptions 3.4.10(iii) and (iv) are satisfied, for instance with c = 2 and c₀ = 1/15. As follows from Proposition 3.4.5, Assumption 3.4.10(ii) can also be satisfied for any convex compact set Ω containing N_c with c = 2. ♦
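For this function a run of the proximal point method is easy to sketch (pure Python; the value χ = 40 and the inner fixed-step gradient solver are our choices, and the starting point deliberately lies in the region where f is not convex):

```python
def grad_f(u):
    # f(u) = u1^2 + u2^2 + 15 u1^2 u2^2  (the function of Example 3.4.12)
    return [2 * u[0] * (1 + 15 * u[1]**2), 2 * u[1] * (1 + 15 * u[0]**2)]

def prox_step(uk, chi, iters=400, t=1.0 / 300.0):
    # approximately minimize f(v) + (chi/2)|v - uk|^2 by gradient descent
    v = uk[:]
    for _ in range(iters):
        g = grad_f(v)
        v = [v[0] - t * (g[0] + chi * (v[0] - uk[0])),
             v[1] - t * (g[1] + chi * (v[1] - uk[1]))]
    return v

u, chi = [1.0, 1.0], 40.0        # start outside the ball of strong convexity of f
for _ in range(300):
    u = prox_step(u, chi)
print(u)   # approaches the unique minimizer (0, 0)
```

Since (0, 0) is the only stationary point of f, the prox iterates drift through the non-convex region into the ball of strong convexity and then contract to u*.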

Insignificant modifications in the proofs of Theorems 3.4.6, 3.4.9 and 3.4.11 permit us to establish analogous statements for the inexact version of the proximal point method.

3.4.2 Convexification of constrained problems

We start this subsection by considering the constrained problem

min{f(u) : u ∈ K}, (3.4.10)

where now K ⊂ ℝⁿ is supposed to be a convex, closed set and f is twice differentiable on K. Let us introduce the operator

Tu := ∇f(u) + N_K(u) if u ∈ K, Tu := ∅ if u ∉ K,

with N_K(u) := {v ∈ ℝⁿ : ⟨v, z − u⟩ ≤ 0 ∀ z ∈ K} the normal cone of the set K at the point u ∈ K.

3.4.13 Assumption.

(i) For a given c > inf{f(u) : u ∈ K} the set Q = {u : 0 ∈ Tu, f(u) ≤ c} is non-empty, and if u ∈ Q, then u is a local minimum of f on K;

(ii) For some d > 0 the set {u : f(u) ≤ c, inf_{y∈Tu} ‖y‖ < d} is bounded.


3.4.14 Proposition. Suppose that Assumption 3.4.13 is satisfied for Problem (3.4.10). Then for each δ > 0 there exists d̄ ∈ (0, d) such that u belongs to the δ-neighborhood B_δ(ū) of some local minimum ū ∈ Q whenever u ∈ K, f(u) ≤ c and inf_{y∈Tu} ‖y‖ < d̄.

Proof: Suppose that for some δ > 0 such a constant d̄ does not exist. Denoting by B_δ the union of the δ-neighborhoods of all local minima in Q, we choose a sequence d_k → +0, d_k < d, and define w^k ∈ K \ B_δ, y^k ∈ Tw^k such that

f(w^k) ≤ c, ‖y^k‖ < d_k.

Due to Assumption 3.4.13(ii), without loss of generality one can assume that {w^k} converges to a point w̄. For this point we infer that

w̄ ∈ K and f(w̄) ≤ c. (3.4.11)

Because y^k = ∇f(w^k) + v^k for some v^k ∈ N_K(w^k), and ∇f(w^k) → ∇f(w̄) as well as ‖y^k‖ → 0, one gets v^k → v̄ = −∇f(w̄). But from the definition of the normal cone N_K(w^k), the inequality

⟨v^k, w^k − z⟩ ≥ 0 ∀ z ∈ K

holds true, and taking the limit, we infer

⟨v̄, w̄ − z⟩ ≥ 0 ∀ z ∈ K,

i.e. v̄ ∈ N_K(w̄). Therefore, 0 = ∇f(w̄) + v̄ ∈ Tw̄, and due to (3.4.11) and Assumption 3.4.13(i), w̄ is a local minimum, contradicting the fact that {w^k} ∩ B_δ = ∅.

Suppose now that for Problem (3.4.10) the set {u ∈ K : f(u) = inf_{v∈K} f(v)} is non-empty and that for some χ > 0 the function f(·) + (χ/2)‖·‖² is convex on K. Due to Proposition 3.4.5, these conditions can be fulfilled, in particular, if K is a non-empty and bounded set. Then, obviously, the function

ζ(u) := f(u) + (χ/2)‖u‖² if u ∈ K, ζ(u) := +∞ if u ∉ K,

is convex and lower semicontinuous, and ∂ζ(u) − χu = Tu. If, moreover, Assumption 3.4.13 is valid, then using Proposition 3.4.14 and the proof of Theorem 3.4.11, one can easily show that, for arbitrary δ > 0, the iterates {u^k} generated by the exact method (3.4.2) with χ < χ_k ≤ χ̄ and a starting point u⁰ ∈ K with f(u⁰) ≤ c belong to B_δ for sufficiently large k (k ≥ k(δ)).

3.4.15 Example. Let K := {u ∈ ℝ² : −1 ≤ u₁ ≤ 1, −1 ≤ u₂ ≤ 1} and f(u) := (1 − u₁²)u₂². Evidently, we have

U* = {(u₁, 0)ᵀ : −1 ≤ u₁ ≤ 1} ∪ {(1, u₂)ᵀ : −1 ≤ u₂ ≤ 1} ∪ {(−1, u₂)ᵀ : −1 ≤ u₂ ≤ 1}

and

{u ∈ K : ∇f(u) = 0} = {(u₁, 0)ᵀ : −1 ≤ u₁ ≤ 1} ⊂ U*.

Now, calculating ∇f on the boundary of K, it is easy to see that Assumption 3.4.13(i) is fulfilled for any c > 0, and Assumption 3.4.13(ii) follows from the boundedness of K. ♦


Of course, it should be emphasized that the verification of Assumptions 3.4.10(iii)-(iv) and 3.4.13(i) causes difficulties. However, these conditions are not binding for treating proximal point methods. Nevertheless, the complete validity of these assumptions ensures a much more comfortable result, namely, according to Theorems 3.4.6, 3.4.9 and 3.4.11, convergence to a local or global minimum.

Finally, we investigate the following non-convex problem

min{f(u) : h(u) = 0}, (3.4.12)

with twice differentiable, non-convex functions f and h_j, h(u) := (h₁(u), ..., h_m(u))ᵀ. The augmented Lagrangian function is defined by

L_A(u, λ, ρ) := L(u, λ) + (ρ/2)‖h(u)‖²,

with L(u, λ) := f(u) + ⟨λ, h(u)⟩.

An often used assumption to ensure convergence of the iterates of multiplier methods to local saddle points (u*, λ*) is the positive definiteness of the matrix

∇²_{uu} L_A(u*, λ*, ρ)

for sufficiently large ρ. Nash and Sofer [302] gave the following estimate:

3.4.16 Proposition. Let (u*, λ*) be a local solution of Problem (3.4.12), let H be the Hessian of L at (u*, λ*), and let Z be a basis matrix of the null-space of G = ∇h(u*)ᵀ. Assume that the matrix G has full rank, i.e. the linear independence constraint qualification is satisfied, and that Zᵀ∇²_{uu}L(u*, λ*)Z is positive definite, i.e. the strong second-order constraint qualification holds.

Then the matrix ∇²_{uu}L_A(u*, λ*, ρ) is positive definite for

ρ > ρ̄ := (1/δ)( β²γ²/α + γ ),

with

α := ℓ_min(ZᵀHZ), γ := ‖H‖₂, β := ‖Z‖₂, δ := ℓ_min(GᵀG),

where ℓ_min(·) denotes the smallest eigenvalue of the given matrix.

It is known that the conditioning of the matrix ∇²_{uu}L_A(u*, λ*, ρ) usually worsens with growing ρ if rank(G) < n, and although ρ is not required to go to infinity (in contrast to penalty methods), there is obviously a conflict between shifting the eigenvalues of ∇²_{uu}L_A(u*, λ*, ρ) to the positive axis by large values of ρ and good condition numbers requiring small values of ρ.

In order to circumvent this conflict we consider the primal-dual application of proximal point methods (cf. the dual methods described in Subsection A3.4.4). These are based on the regularized augmented Lagrangian function

Ψ_A(v, λ, ρ, χ, u) := L_A(v, λ, ρ) + (χ/2)‖v − u‖²

and lead to the so-called regularized method of multipliers:

u^{k+1} := arg min_{v∈ℝⁿ} Ψ_A(v, λ^k, ρ_k, χ_k, u^k),
λ^{k+1} := λ^k + ρ_k h(u^{k+1}). (3.4.13)
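A toy run of a scheme of type (3.4.13) can be sketched as follows (pure Python; the nonconvex example min{(u₁² − 1)² + u₂² : u₁ − 0.5 = 0}, with solution u* = (0.5, 0)ᵀ and λ* = 1.5, and all parameter values are our illustrative choices):

```python
def grad_psi(v, u_k, lam, rho, chi):
    # gradient of Psi_A(v) = f(v) + lam*h(v) + (rho/2) h(v)^2 + (chi/2)|v - u_k|^2
    # with f(v) = (v1^2 - 1)^2 + v2^2 (nonconvex) and h(v) = v1 - 0.5
    h = v[0] - 0.5
    return [4 * v[0] * (v[0]**2 - 1) + lam + rho * h + chi * (v[0] - u_k[0]),
            2 * v[1] + chi * (v[1] - u_k[1])]

def argmin_psi(u_k, lam, rho, chi, iters=2000, t=0.02):
    # inner proximal step: crude gradient descent on the regularized augmented Lagrangian
    v = u_k[:]
    for _ in range(iters):
        g = grad_psi(v, u_k, lam, rho, chi)
        v = [v[0] - t * g[0], v[1] - t * g[1]]
    return v

u, lam, rho, chi = [0.0, 0.0], 0.0, 10.0, 1.0
for _ in range(60):
    u = argmin_psi(u, lam, rho, chi)      # primal proximal step of (3.4.13)
    lam = lam + rho * (u[0] - 0.5)        # multiplier update
print(u, lam)   # u -> (0.5, 0), lam -> 1.5
```

Note that ρ stays fixed at a moderate value; the proximal term χ/2‖v − u^k‖² supplies the additional stabilization that penalty methods would have to buy with ρ → ∞.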

The counterpart of Proposition 3.4.16 now reads:


3.4.17 Proposition. Let the assumptions of Proposition 3.4.16 hold. Then ∇²_{uu}Ψ_A(u*, λ*, ρ, χ, u) is positive definite for all u ∈ ℝⁿ and

ρ > ρ̄ := (1/δ)( β²(γ + χ)²/(α − χβ²) + γ − χ ). (3.4.14)

The proof is based on the same principles as the proof of Proposition 3.4.16, simply regarding the proximal term as a part of the objective function.

The local convergence analysis can now be carried out in parallel with that of the classical method of multipliers, with the peculiarity that two parameters, the penalty ρ and the regularization parameter χ, are at hand to control the method.

3.4.3 Proximity control in trust region methods

In the previous subsections we analyzed the most commonly used convexification effect of the regularization term. Additionally, it can be interpreted as a penalty term for the step length. As this aspect is commented on less frequently in the literature, we want to analyze it in this subsection.

Just as step-size algorithms, trust-region methods can be seen as a globalization technique for locally fast convergent methods such as Newton or quasi-Newton methods (cf. Subsection A3.2.5). In contrast to those, though, step size and descent direction are determined simultaneously. To concentrate on the conceptual ideas we consider the unconstrained minimization problem (3.4.1), so that the prox operator can be simplified to Prox_{f,χ} ≡ Prox_{f,χ,ℝⁿ}.

At a given iterate u^k we solve the so-called trust-region subproblem

min{m_k(u) : ‖u − u^k‖ ≤ Δ_k}, (3.4.15)

where m_k(·) is a model of the objective function f in a neighborhood of the current iterate u^k. Generally the model is expected to be a good approximation only locally, which justifies the inclusion of the step-length constraint. Having determined a possibly approximate solution u^k + d^k of (3.4.15), we can assess the quality of the model m_k by relating the actual reduction to the reduction predicted by the model, i.e. we form the ratio

r_k := ( f(u^k) − f(u^k + d^k) ) / ( m_k(u^k) − m_k(u^k + d^k) ). (3.4.16)

Based on the value r_k, either the step d^k is accepted, i.e. u^{k+1} := u^k + d^k, and the radius Δ_k is increased if a sufficiently high reduction was achieved (successful step), or the step is rejected and the radius is decreased, indicating that the model cannot be trusted and should be improved (unsuccessful step).
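The acceptance test and radius update can be sketched as follows (Python/NumPy; the test function, the simple Cauchy-point solution of the subproblem and all threshold constants are our illustrative choices):

```python
import numpy as np

def f(u):   return (1 - u[0])**2 + 5.0 * (u[1] - u[0]**2)**2   # mildly nonconvex test function
def gf(u):  return np.array([-2 * (1 - u[0]) - 20 * u[0] * (u[1] - u[0]**2),
                             10 * (u[1] - u[0]**2)])
def Hf(u):  return np.array([[2 - 20 * u[1] + 60 * u[0]**2, -20 * u[0]],
                             [-20 * u[0], 10.0]])

def cauchy_step(g, H, delta):
    # minimize the quadratic model along -g inside the trust region
    gHg = g @ H @ g
    t = delta / np.linalg.norm(g)
    if gHg > 0:
        t = min(t, (g @ g) / gHg)
    return -t * g

u, delta = np.array([-1.0, 1.0]), 1.0
for _ in range(2000):
    g, H = gf(u), Hf(u)
    if np.linalg.norm(g) < 1e-8:
        break
    d = cauchy_step(g, H, delta)
    pred = -(g @ d + 0.5 * d @ H @ d)          # model reduction m_k(u) - m_k(u+d)
    r = (f(u) - f(u + d)) / pred               # ratio test (3.4.16)
    if r > 0.1:                                # successful step: accept
        u = u + d
        if r > 0.75:
            delta = min(2.0 * delta, 10.0)     # very good model: enlarge radius
    else:
        delta = 0.5 * delta                    # unsuccessful step: shrink radius
print(u)   # converges to the unique stationary point (1, 1)
```

Even this crudest variant exhibits the global convergence behaviour described above: far from the solution the radius governs the step, near the solution r_k ≈ 1 and the constraint becomes inactive.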

The basic scheme still has many degrees of freedom, allowing for a great variety of trust-region type algorithms that differ mainly in the choice of the model m_k, in the choice of the norm used for the step restriction, in the method of determining a global solution of the trust-region subproblem, and in the adjustment of the trust radius. For a comprehensive overview of trust-region methods we refer to Conn et al. [82].


The most advantageous property of this basic trust-region scheme is that global convergence to at least a stationary point can be shown under mild assumptions. Furthermore, in many algorithms necessary second-order conditions hold in the limit; see for instance [82, 117]. Traditionally, the most prominent model functions are quadratic approximations of f at u^k:

m_k(u) := f(u^k) + ⟨∇f(u^k), u − u^k⟩ + (1/2)⟨H_k(u − u^k), u − u^k⟩, (3.4.17)

with a symmetric matrix H_k which is an approximation of the Hessian of f at u^k.

Based on these observations we are going to have a look at the relationship between the trust-region problem

min{m_k(u) : ‖u − u^k‖² ≤ Δ_k²} (3.4.18)

and

min{ m_k(u) + (χ_k/2)‖u − u^k‖² } (3.4.19)

with varying exact and inexact model functions m_k(·). Here, instead of Problem (3.4.15), we formulate the equivalent restricted subproblem (3.4.18), avoiding peculiarities due to the non-differentiability of the constraint in (3.4.15). Of course, the resulting Lagrange multipliers differ between these problems.

Considering quadratic models of type (3.4.17), the advantage is that, when used in conjunction with a norm of type ‖x‖ = ⟨Ax, x⟩^{1/2} with a positive definite matrix A, the trust-region subproblem (3.4.18) is an all-quadratic problem whose solutions can be completely characterized. In the most important case A := I we have in particular:

3.4.18 Theorem. (see [126]) Consider Problem (3.4.18) with Δ_k > 0, the Euclidean norm ‖·‖₂ and the model function (3.4.17). Then u* is a global solution iff a unique λ* ≥ 0 exists such that the following holds:

(i) (H_k + 2λ*I)(u* − u^k) = −∇f(u^k);

(ii) ‖u* − u^k‖ ≤ Δ_k and λ*(‖u* − u^k‖ − Δ_k) = 0;

(iii) (H_k + 2λ*I) is positive semidefinite.

Conditions (i) and (ii) are the Kuhn-Tucker conditions for (3.4.18), but due to the special structure of the problem they are not only necessary but also sufficient. Likewise, condition (iii) is stronger than the general necessary second-order condition, as the latter only ensures positive semidefiniteness of the model Hessian on the set of feasible directions in u*. Theorem 3.4.18 shows that the global minimizers of (3.4.18) lie on the trajectory

{u(λ) : (H_k + 2λI)(u − u^k) = −∇f(u^k)}, λ ∈ (max[0, −ℓ_min/2], ∞), (3.4.20)

with ℓ_min the smallest eigenvalue of H_k.

It is furthermore well-known (see [372]) that the Lagrange multiplier is


monotonically decreasing with increasing trust-region radius. The most popular trust-region methods thus try to find λ* by solving the nonlinear equation

‖u(λ) − u^k‖ = Δ_k (3.4.21)

through a search along the solution trajectory (3.4.20). This approach can be motivated by a dual point of view. The dual problem of (3.4.18) is given by

max_{λ≥0} {q(λ) : λ ∈ dom(q)}

with q(λ) := inf_{u∈ℝⁿ} { m_k(u) + λ(‖u − u^k‖² − Δ_k²) }.

Obviously we have λ ∈ dom(q) iff (H_k + 2λI) is positive definite. Determining u(λ) by (3.4.20) is then equivalent to evaluating the dual function, and solving the nonlinear equation (3.4.21) is equivalent to finding the dual optimal λ*. Naturally we cannot rely on strong duality results. On the contrary, a duality gap must be expected if the matrix H_k is not positive semidefinite, i.e. if m_k is not convex. But due to the compactness and regularity of the feasible set we can work with local duality results in the sense of [302].
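The search along the trajectory (3.4.20) can be sketched as a bisection on the scalar equation (3.4.21) (Python/NumPy; the data H_k, ∇f(u^k) and Δ_k are invented, with H_k deliberately indefinite so that λ* > 0 is mandatory):

```python
import numpy as np

H = np.array([[1.0, 0.0], [0.0, -2.0]])      # indefinite model Hessian H_k
g = np.array([1.0, 1.0])                     # gradient of f at u^k
delta = 1.5                                  # trust radius Delta_k

def step_norm(lam):
    # |u(lam) - u^k| with (H + 2 lam I)(u - u^k) = -g, cf. (3.4.20)
    return np.linalg.norm(np.linalg.solve(H + 2 * lam * np.eye(2), -g))

lmin = np.linalg.eigvalsh(H).min()           # = -2
lo = max(0.0, -lmin / 2) + 1e-9              # H + 2 lam I positive definite for lam > lo
hi = 100.0
for _ in range(200):                          # |u(lam) - u^k| is decreasing in lam
    mid = 0.5 * (lo + hi)
    if step_norm(mid) > delta:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)
print(lam, step_norm(lam))   # the step length matches Delta_k, cf. (ii) of Theorem 3.4.18
```

The monotone decrease of ‖u(λ) − u^k‖ in λ is exactly the monotonicity of the multiplier in the radius quoted above, read backwards.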

Observing that the quadratic function m_k(u) + λ‖u − u^k‖² is strongly convex if (H_k + 2λI) is positive definite, we see that u* is also a global solution of (3.4.19), i.e.,

u* solves Problem (3.4.18) ⇔ u* = Prox_{m_k,χ_k}(u^k), χ_k := 2λ*. (3.4.22)

The following statement tells us how the regularization and trust-region parameters are coupled.

3.4.19 Proposition. (see [126])Let mk be differentiable and u∗ be a global minimum of (3.4.18). Then the uniqueLagrange multiplier λ∗ is given by

λ∗ := − 1

2∆2k

〈u∗ − uk,∇mk(u∗)〉. (3.4.23)

In the non-convex case methods of Levenberg-Marquard type on the otherhand choose a priori λk such that (Hk +λkI) is positive definite, then computeu(λk) as the respective point on the trajectory (3.4.20) and consequently deter-mine the trust-region radius implicitly through the chosen λk. In light of (3.4.22)we can conclude that this approach is essentially the proximal point algorithmwhere the exact prox point is approximated by only one (Quasi-)Newton stepyielding the usual convergence properties of trust-region methods without as-suming any convexity on the objective function f (cf. Subsection 3.3.2).

Summing up, the classical trust-region method can be regarded as a proximal point method where the regularization parameter may be chosen a priori (yielding a Levenberg-Marquardt type of method) or may be determined implicitly via the Lagrange multiplier of the trust-region subproblem. The proximal points are not calculated exactly but approximately, by one Newton step.

Concerning more general convex model functions mk(·), we can show a complete equivalence between computing a proximal iteration and solving a trust-region subproblem, because the respective solutions can be fully characterized analogously to Theorem 3.4.18 without the need to distinguish local and global solutions.

The above approach using (quadratic) convex model functions has been used specifically in the context of bundle methods for non-differentiable minimization problems. Given a non-differentiable, convex function f, Schramm and Zowe [366] consider the piece-wise linear model

mk(u) := max_{j∈Jk} {f(yj) + 〈∂f(yj), u − yj〉} (3.4.24)

constructed by means of a bundle {yj : j ∈ Jk} of previously computed trial points and subgradients ∂f(yj). A trial step is executed by minimizing the regularized bundle function

uk+1 := arg min_{u∈Rn} {mk(u) + (χk/2)‖u − uk‖²}.

The effect of adding the regularization term to mk is again that the function to be minimized is strongly convex, and the unique minimizer can be found by solving a quadratic auxiliary problem. As shown above, the trial step can be regarded as the solution of a trust-region subproblem, allowing it to be assessed by the ratio test (3.4.16), which yields a criterion for its acceptance as a new iterate and its admission into the bundle. The regularization parameter χk is decreased if the model is a good approximation of the function f, allowing for wider steps; otherwise it is increased to restrict the steps.
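A minimal one-dimensional sketch of such a trial step, with hypothetical bundle data for f(u) = |u| (real bundle codes solve the dual quadratic program instead of the ternary search used here):

```python
# Bundle of cuts (y_j, f(y_j), subgradient) for f(u) = |u|,
# hypothetical trial points y = -1 and y = 1; the resulting model
# m_k(u) = max(u, -u) reproduces f exactly in this tiny example.
bundle = [(-1.0, 1.0, -1.0), (1.0, 1.0, 1.0)]
chi, uk = 2.0, 0.8   # regularization parameter and current iterate

def regularized_model(u):
    m = max(fy + s * (u - y) for y, fy, s in bundle)  # piecewise-linear model
    return m + 0.5 * chi * (u - uk) ** 2

# ternary search; valid since the regularized model is strictly convex
lo, hi = -10.0, 10.0
for _ in range(300):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if regularized_model(m1) <= regularized_model(m2):
        hi = m2
    else:
        lo = m1
u_trial = 0.5 * (lo + hi)
print(u_trial)  # minimizer of |u| + (u - 0.8)**2, i.e. 0.3
```

The strong convexity supplied by the prox-term is exactly what makes the bracketing search (and, in general, the quadratic auxiliary problem) well posed.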

Kiwiel [239] describes a version of this bundle idea for the minimization of differentiable non-convex functions. By defining

α(u, y) := |f(u) − f(y) − 〈g(y), u − y〉|,
mk(u) := max_{j∈Jk} {f(uk) − α(uk, yj) + 〈g(yj), u − uk〉},

the model is again a piece-wise linear, but now possibly shifted, model of the objective function, with g(y) and g(yj) elements of a generalized gradient (due to the non-convexity it is of course not possible to make use of a subgradient; therefore

∂f(u) := conv {lim_{k→∞} ∇f(uk) : uk → u, ∇f(uk) exists ∀ k}

denotes a generalized gradient). In the case of convex f this bundle approximation coincides with (3.4.24). The new search direction is again determined by minimizing the regularized model, but this time an additional trust-region constraint is added:

uk+1 := arg min_{‖u−uk‖² ≤ ∆k²} {mk(u) + (χk/2)‖u − uk‖²}. (3.4.25)

This problem is once again uniquely solvable and well defined due to the strong convexity of the regularized model function.

In light of the above statements it may seem surprising that the auxiliary problems (3.4.25) are formulated with a regularized objective function and a trust-region constraint. As the approximation function is convex, only one of these modifications seems to be necessary: either the prox-term or the trust-region constraint captures the desired effect of proximity, and the respective other term seems to be redundant. But coupling both terms has the following advantages, compensating for the more involved problem structure. The iterations are controlled by two parameters assigned to different tasks:

(a) The radius ∆k controls the accuracy of mk. If the ratio test demands a reduction of mk and the discrepancy measured by α(uk, yj) is too large, the bundle is reexamined and points yj distant from the current iterate uk are eliminated from the bundle.

(b) The regularization parameter χk controls the decrease of mk. After a successful iteration (in view of the ratio test) χk is reduced, and vice versa.

Without the regularizing term the objective function of Problem (3.4.25) is piece-wise linear and one has to deal with possibly non-unique minimizers. By including the quadratic regularizing term, the subproblem consists of minimizing a strongly convex function subject to a convex constraint, such that the solution and the corresponding multiplier are uniquely determined and can be computed efficiently.

Summing up, the method can be formally described as a PPR, as at the end of a successful step we have:

ûk+1 := arg min_{u∈Rn} {mk(u) + ((χk + λk)/2)‖u − uk‖²},
uk+1 := uk + tk(ûk+1 − uk),

where λk is the optimal Lagrange multiplier associated with the solution of (3.4.25) and tk is a suitably chosen step-size. Additionally, it can be seen as a trust-region method making use of a regularized model function.

As Toint [398] points out, the choice of the function f itself as an exact model function is possible and leads to globally convergent algorithms. The subproblems to solve then have the form

min {f(u) : ‖u − uk‖² ≤ ∆k²}. (3.4.26)

It is not necessary to solve this problem exactly; an approximate solution with respect to a decrease condition is sufficient for convergence. Therefore the problem can be rewritten as

min f(u) + (χk/2)‖u − uk‖² − (χk/2)‖u − uk‖²
s.t. ‖u − uk‖² ≤ ∆k².

Now, if χk is chosen such that f(u) + (χk/2)‖u − uk‖² becomes convex on the set {u : ‖u − uk‖ ≤ ∆k}, the objective function can be seen as a DC-function, that is, a difference of two convex functions, and one can employ methods of DC-optimization to solve this problem, see Voetmann [412].
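A small numerical illustration of this convexification, with hypothetical data not taken from the text: for f(u) = u⁴ − u² one has f″(u) = 12u² − 2 ≥ −2, so χk = 2 already makes f(u) + (χk/2)u² convex (in fact equal to u⁴), exhibiting f as a DC function:

```python
# f(u) = u**4 - u**2 has f''(u) = 12*u**2 - 2 >= -2, so chi = 2 suffices:
# f(u) + (chi/2)*u**2 equals u**4, and f = u**4 - (chi/2)*u**2 is a DC split.
chi = 2.0
f = lambda u: u ** 4 - u ** 2
g = lambda u: f(u) + 0.5 * chi * u ** 2  # convex part of the split

# midpoint-convexity check of g on a grid over [-1, 1]
pts = [i / 10.0 for i in range(-10, 11)]
convex_ok = all(
    g(0.5 * (a + b)) <= 0.5 * (g(a) + g(b)) + 1e-12
    for a in pts for b in pts
)
print(convex_ok)
```

The quadratic term −(χk/2)u² forms the second convex function of the DC decomposition.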


3.5 Comments

Section 3.1: The concept of the proximal-point mapping was introduced by Moreau [292], including also the proofs of the main properties of this mapping, in particular, non-expansivity of Prox_{f,C} and differentiability of the Moreau-Yosida function η(u) := min_{v∈C} {f(v) + (χ/2)‖v − u‖²} (see Proposition 3.1.4).
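For the simple data f(v) = |v| and C = V = R the Moreau-Yosida function can be written down in closed form, which gives a quick illustration of its differentiability (a sketch under these assumptions, not notation from [292]): the prox-mapping is soft-thresholding and η is the differentiable Huber function.

```python
import math

chi = 2.0  # parameter in eta(u) = min_v { |v| + (chi/2)*(v - u)**2 }

def prox_abs(u):
    """Minimizer of |v| + (chi/2)*(v - u)**2: soft-thresholding of u."""
    return math.copysign(max(abs(u) - 1.0 / chi, 0.0), u)

def eta(u):
    v = prox_abs(u)
    return abs(v) + 0.5 * chi * (v - u) ** 2

# eta coincides with the Huber function: (chi/2)*u**2 for |u| <= 1/chi,
# |u| - 1/(2*chi) beyond; in particular eta is differentiable at 0,
# although f(v) = |v| is not.
for i in range(-20, 21):
    u = i / 10.0
    huber = 0.5 * chi * u * u if abs(u) <= 1.0 / chi else abs(u) - 0.5 / chi
    assert abs(eta(u) - huber) < 1e-12
```

The kink of |·| at the origin is smoothed out by the quadratic proximity term, which is the one-dimensional face of the differentiability statement above.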

Other variants of Proposition 3.1.3 can be found in [211] and [212].

Section 3.2: For the theoretical foundation of proximal point methods we refer to the fundamental paper of Rockafellar [352]. Some results of this paper have been extended by Ha [160] towards the regularization on a subspace of a Hilbert space. Rockafellar [351] and Antipin [13] comment on earlier applications of the PPR to convex optimization methods. In both papers the regularized Lagrangian method (cf. Algorithm 3.2.8) was considered under milder assumptions than required in Theorem 3.2.9: in [351] smoothness of the functionals J, g and φk is not required, and in [13] it is permitted that the Lagrange multipliers λk are calculated approximately. Later on, more general regularized multiplier methods were developed by Antipin [14].

For convex separable problems Spingarn [380] suggested a decomposition scheme, where in the framework of augmented Lagrangian methods iterative PPR is applied to each subproblem generated in this scheme of decomposition.

In the papers of Rockafellar [351] and Ibaraki et al. [190] special dual methods for convex optimization problems are considered, where the Lagrangian is regularized with respect to the primal and dual variables. Both approaches differ from each other with respect to the order of performing the min- and max-operations for finding the saddle point of the regularized Lagrangian. While Rockafellar's approach is sufficiently general, Ibaraki's is suited only for separable problems with linear constraints.

Kaplan investigated in [203, 204] penalty methods in the context of PPR for a wide class of penalty functions. In particular, in [204] a penalty function with a slow increase outside the feasible set is considered. In connection with regularized penalty methods he also proved in [205] their convergence, see Theorems 3.2.1 and 3.2.3. The papers of Auslender et al. [21], Mouallif [297], Mouallif and Tossings [299] and Alart and Lemaire [5] are mainly concerned with the use of specific classes of penalty functions, in particular those which are suitable for non-smooth problems.

Section 3.3: Concerning the results mentioned in Subsection 3.3.1 we refer additionally to Morosov [294], who has studied the iteration scheme

(AᵀA + αI)uk = Aᵀf + αuk−1, α > 0,

for solving a linear operator equation Au = f with a degenerate operator A. This can be interpreted as a PPR for the minimization of the quadratic functional J(u) = ‖Au − f‖².
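The regularizing effect of this scheme can be seen in a tiny sketch with hypothetical data: for the degenerate operator A = diag(1, 0) the functional J has a whole line of minimizers, each step solves only the well-conditioned system with AᵀA + αI, and the iterates converge to the minimizer singled out by the starting point.

```python
# (A^T A + alpha*I) u^k = A^T f + alpha*u^{k-1} for A = diag(1, 0), f = (1, 1):
# J(u) = ||A u - f||**2 is minimized on the whole line {(1, t)}; the second
# component is left unchanged by the iteration, the first converges to 1.
alpha = 0.5
u = [5.0, -3.0]   # hypothetical starting point u^0

for _ in range(200):
    u[0] = (1.0 + alpha * u[0]) / (1.0 + alpha)  # (1 + alpha)*u1 = f1 + alpha*u1_old
    # second row reads alpha*u2 = alpha*u2_old, so u[1] stays fixed
print(u)
```

This is the proximal point behaviour in miniature: the ill-posed direction is not distorted, only left where the previous iterate put it.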

Section 3.4: A retrospective analysis of some algorithms for non-convex minimization shows their relationship to PPR. This concerns, in particular, a series of linearization methods, see Pshenicnyj [339].


Spingarn [379] has developed the PPR for finding a zero of a maximal strictly hypomonotone operator T : Rn → 2^{Rn}. In this method, proximal iterates are calculated not with the operator T but with an auxiliary Lipschitz continuous operator constructed in a convex neighborhood U of a point ū with 0 ∈ T(ū). The model, adapted in [379] to the problem of the local minimization of a function f = g − h, with g lsc, convex and h ∈ C², turns into the usual proximal point method

uk+1 := arg min_{u∈U} {f(u) + (χk/2)‖u − uk‖²}.

If χk > 0 is chosen such that f(·) + (χk/2)‖·‖² is convex on U, u0 is close enough to a local minimum ū of f, and the mapping ∂f⁻¹ has a monotone derivative at (0, ū), then {uk} converges to ū linearly.

Numerical difficulties arise in the trust-region approach (3.4.18) with quadratic model functions mk via (3.4.17) when the Lagrange multiplier λ or the smallest eigenvalue ℓmin(Hk + λI) becomes small. The latter happens if ∇f(uk) lies orthogonal to the eigenspace of the smallest eigenvalues of Hk (the so-called hard case), see Fletcher [117]. Due to these disadvantages, methods with a direct choice of the radius ∆k are preferred in most practical algorithms. Bonnans and Launay [47] apply this technique of a regularized model function in an SQP-context in order to penalize large steps without including an extra trust-region constraint.


Chapter 4

PPR FOR INFINITE-DIMENSIONAL PROBLEMS

In Section 2.2 a preliminary introduction to One-Step (OSR) and Multi-Step (MSR) methods of iterative PPR was given. The subject of this chapter is a general framework for investigating methods of PPR.

Although this framework is rather abstract, it is suitable for constructing and analyzing special numerical methods for ill-posed, infinite as well as semi-infinite convex optimization problems, in particular also elliptic variational inequalities. The essential property of the related methods is their notable stability at all iteration levels, achieved by a coordinated control of the parameters of discretization and regularization as well as of those parameters arising in the basic optimization algorithms.

It should be noted that an OSR-method can be interpreted as a special case of an MSR-method. However, in order to demonstrate the peculiarities of OSR-methods, including the update of the controlling parameters and the estimates of the rate of convergence, we will investigate this class of methods separately.

In order to simplify the theoretical study of iterative PPR we assume here and in the following chapter that in the regularized functional

Ψk(u) := Jk(u) + (χk/2)‖u − uk‖²

and in related expressions the parameter χk in the proximity term is chosen equal to 2. However, from a numerical point of view this might not be a reasonable choice, see also Subsection 4.4.1.

4.1 Preliminary Results

Let us start with the Lipschitz property of some functionals on the ball Br := {u ∈ V : ‖u‖ ≤ r} in the Hilbert space V.


Obviously, for the functional

ϕz(u) := ‖u− z‖2

with fixed z ∈ Br it holds

|ϕz(u′)− ϕz(u′′)| ≤ 4r‖u′ − u′′‖, ∀ u′, u′′ ∈ Br, (4.1.1)

whereas for the functional

δ(u) := ρ2(u,Br1), (r1 < r),

the inequality

|δ(u′)− δ(u′′)| ≤ 2r‖u′ − u′′‖, ∀ u′, u′′ ∈ Br (4.1.2)

is true. The latter can be easily seen using the representation

δ(u) = (‖u‖ − r1)²  if ‖u‖ > r1,   δ(u) = 0  otherwise,

and observing, in view of the symmetry, the following three cases:

‖u′‖ ≤ r1, ‖u′′‖ ≤ r1,

‖u′‖ > r1, ‖u′′‖ ≤ r1,

‖u′‖ > r1, ‖u′′‖ > r1.
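For instance, in the mixed case ‖u′‖ > r1 ≥ ‖u′′‖ the estimate (4.1.2) follows from (a short verification added here; the remaining cases are analogous):

```latex
\delta(u') - \delta(u'') = \bigl(\|u'\| - r_1\bigr)^2
  \le \bigl(\|u'\| - \|u''\|\bigr)\bigl(\|u'\| - r_1\bigr)
  \le 2r\,\|u' - u''\|,
```

since 0 ≤ ‖u′‖ − r1 ≤ ‖u′‖ − ‖u′′‖ ≤ ‖u′ − u′′‖ and ‖u′‖ − r1 ≤ ‖u′‖ ≤ r ≤ 2r.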

Next, we need a technical lemma. Let L(r) be the Lipschitz constant of the functional J on the ball Br such that

sup { |J(u′) − J(u′′)| / ‖u′ − u′′‖ : u′, u′′ ∈ Br, u′ ≠ u′′ } ≤ L(r) < ∞, (4.1.3)

and denote J̄(u) := J(u) + ρ²(u, Br1).

4.1.1 Lemma. Let ϱ > 0 and r1 > 0 be fixed, define

r0 := r1 + L(r1) + √(ϱ + L(r1)(4r1 + L(r1))) (4.1.4)

and assume that J : V → R is a convex functional such that inequality (4.1.3) is fulfilled for r := r1. Then the inclusion

{u ∈ V : J̄(u) ≤ J(u0) + ϱ} ⊂ int Br0

holds for each u0 ∈ int Br1.

Proof: Due to the convexity of J, the Lipschitz property of J and u0 ∈ int Br1 (cf. Corollary A1.5.16), we can conclude that for J(u) < J(u0) the inequality

J(u) ≥ J(u0) − L(r1)‖u − u0‖

is fulfilled. Therefore, for every u with ‖u‖ ≥ r0 we obtain

J̄(u) = J(u) + (‖u‖ − r1)² ≥ J(u0) − |J(u) − J(u0)| + (‖u‖ − r1)²
  ≥ J(u0) − L(r1)‖u − u0‖ + (‖u‖ − r1)²
  ≥ J(u0) − 2L(r1)‖u‖ + (‖u‖ − r1)².


The function −2L(r1)t + (t − r1)² increases for t ≥ r1 + L(r1). Hence, for ‖u‖ ≥ r0, due to r0 > r1 + L(r1), we get

J̄(u) ≥ J(u0) − 2L(r1)‖u‖ + (‖u‖ − r1)² ≥ J(u0) − 2L(r1)r0 + (r0 − r1)².

But from (4.1.4) it follows that

ϱ = (r0 − r1)² − 2L(r1)r0 − 2L(r1)r1 < (r0 − r1)² − 2L(r1)r0.

This proves that J̄(u) > J(u0) + ϱ if ‖u‖ ≥ r0.

Now, suppose that f is a convex, continuous functional on the Hilbert space V and G ⊂ Br is a non-empty, convex and closed set. Denote

G∗ := {u ∈ G : f(u) = min_{v∈G} f(v)}

and let us consider the following iteration method:

Starting with an arbitrarily chosen ξ0 ∈ Br, a sequence {ξk} is generated such that

ξk ∈ Br, ‖ξk − Prox_{f,G} ξk−1‖ ≤ γk, k = 1, 2, · · · , (4.1.5)

with γk specified according to (4.1.6) below.

4.1.2 Lemma. Assume that f satisfies on Br the Lipschitz condition (4.1.3) and

∑_{k=1}^∞ γk < ∞. (4.1.6)

Then the sequence {ξk}, generated according to (4.1.5), converges weakly to some element ξ∗ ∈ G∗ and lim_{k→∞} f(ξk) = f(ξ∗).

This is a particular case of Theorem 1 in Rockafellar [352]. It can also be deduced from the second part of the proof of Proposition 1.3.11 (starting from relation (1.3.30)).

Under the same conditions as above the following method can be investigated: for an arbitrarily chosen ξ0 a sequence {ξk} is determined by

ρ(ξk, G) ≤ γ̄k, Ψk(ξk) ≤ min_{u∈G} Ψk(u) + γk, k = 1, 2, · · · , (4.1.7)

with Ψk(u) := f(u) + ‖u − ξk−1‖².

4.1.3 Lemma. Assume that {γk} and {γ̄k} are non-negative sequences with

∑_{k=1}^∞ √γk < ∞, ∑_{k=1}^∞ √γ̄k < ∞,


and that the functional f satisfies the Lipschitz condition (4.1.3) on a ball Bτ with

τ := r + sup_k γ̄k

(or τ := r if ξk ∈ Br for all k). Then ξk ⇀ ξ∗ ∈ G∗ and lim_{k→∞} f(ξk) = f(ξ∗).

This can be established by a trivial modification of the proof of Proposition 1.3.11, taking into account Remark 1.3.13.

4.1.4 Lemma. In addition to the assumptions of Lemma 4.1.2 (or Lemma 4.1.3) let the following inequality be satisfied:

f(λu′ + (1 − λ)u′′) ≤ λf(u′) + (1 − λ)f(u′′) − λ(1 − λ)κ‖Πu′ − Πu′′‖², ∀ u′, u′′ ∈ V, λ ∈ [0, 1], (4.1.8)

with some κ > 0, where Π denotes the ortho-projector onto the orthogonal complement V1⊥ of a finite-dimensional subspace V1 ⊂ V (cf. (1.2.12)). Then the sequence {ξk}, generated by (4.1.5) (or (4.1.7)), converges to some ξ∗ ∈ G∗ in the norm of V.

Proof: Inequality (4.1.8) is equivalent to

f(u′) − f(u′′) ≥ 〈q(u′′), u′ − u′′〉 + κ‖Πu′ − Πu′′‖², ∀ u′, u′′ ∈ V, (4.1.9)

with some subgradient q(u′′) ∈ ∂f(u′′) (see Statement 3.4.4 in [313]). Setting u′ := ξk, u′′ := ξ∗ in (4.1.9) and taking the limit as k → ∞, then, on account of Lemma 4.1.2 (or Lemma 4.1.3), we conclude that ‖Πξk − Πξ∗‖ → 0. On the other hand, ‖(I − Π)ξk − (I − Π)ξ∗‖ → 0 follows from the weak convergence of {ξk} and the finite dimensionality of V1.

The following result is useful for estimating the exactness of the iterates in numerical algorithms for the iterative PPR.

4.1.5 Lemma. If f is a convex, continuous functional, G ⊂ Br is a convex and closed set and for fixed u0 ∈ V the inequality

‖u1 − u0‖ ≤ ε

is fulfilled with u1 := Prox_{f,G} u0, then the estimate

f(u1) − min_{u∈G} f(u) ≤ 4rε (4.1.10)

holds.

Proof: Because of the definitions of u1 and u∗ := arg min_{u∈G} f(u), the relation

f(u1 + λ(u∗ − u1)) + ‖u1 + λ(u∗ − u1) − u0‖² ≥ f(u1) + ‖u1 − u0‖²

is satisfied for all λ ∈ (0, 1]. In view of the convexity of f we get

λf(u∗) + (1 − λ)f(u1) + λ²‖u∗ − u1‖² + 2λ〈u∗ − u1, u1 − u0〉 ≥ f(u1)

and

f(u1) − f(u∗) ≤ 2〈u1 − u0, u∗ − u1〉 + λ‖u∗ − u1‖².

For λ ↓ 0, this leads to

f(u1) − f(u∗) ≤ 2〈u1 − u0, u∗ − u1〉 ≤ 4rε.

To put it briefly, inequality (4.1.10) says: if two successive iterates are close, then the deviation of the current objective value from its optimal value is small, too.

4.2 One-Step Regularization Methods

We are going to describe one-step regularization methods (OSR-methods) for the following class of problems (cf. Problem (A1.5.11)):

minimize J(u), subject to u ∈ K, (4.2.1)

where J : V → R is a convex, lower semi-continuous functional, K ⊂ V is a convex, closed set with K ∩ dom J ≠ ∅, and U∗ := Arg min_{u∈K} J(u) ≠ ∅.

Let {Jk}, k = 1, 2, · · ·, be a sequence of convex, Gateaux-differentiable functionals on V and {Kk}, k = 1, 2, · · ·, be a sequence of convex, closed subsets of V approximating J and K, respectively. The question of how these approximations can be carried out will be answered in the next chapters when specific instances of this class are considered. In this way a sequence of approximate problems

min_{u∈Kk} Ψk(u) := Jk(u) + ‖u − uk−1‖², k = 1, 2, · · · , (4.2.2)

can be constructed.

4.2.1 Description of OSR-methods

Given a sequence {εk} of non-negative numbers with lim_{k→∞} εk = 0, we analyze the following algorithm.

4.2.1 Method. (OSR-Method)
Data: u0 ∈ V, εk ↓ 0;

S0: set k := 0;

S1: compute in Problem (4.2.2) an approximate solution uk such that

‖∇Ψk(uk) − ∇Ψk(ūk)‖V′ ≤ εk, (4.2.3)

where

ūk := arg min_{u∈Kk} Ψk(u); (4.2.4)


S2: set k := k + 1 and go to S1.

Due to the strong convexity of Ψk, the minimizer in (4.2.4) is uniquely defined. Now we are interested in conditions which ensure convergence of {uk} to some u∗ ∈ U∗ in the weak or strong sense.
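A toy one-dimensional instance of OSR-Method 4.2.1 (entirely hypothetical data: J(u) = u², K = [1, 2], exact approximations Jk := J, Kk := K, and inexact solves within the tolerance εk/2 of (4.2.12)) shows the iterates approaching u∗ = 1:

```python
# One-dimensional sketch of OSR-Method 4.2.1 with J(u) = u**2, K = [1, 2]:
# Psi_k(u) = u**2 + (u - u_prev)**2 has the unconstrained minimizer u_prev/2,
# and the exact subproblem solution (4.2.4) is its projection onto K.
def proj(v, lo=1.0, hi=2.0):
    return min(max(v, lo), hi)

u = 1.9                        # starting point u^0
for k in range(1, 60):
    u_exact = proj(u / 2.0)    # exact minimizer of the k-th subproblem
    eps_k = 1.0 / (k + 1) ** 3
    u = u_exact + eps_k / 2.0  # inexact iterate within the bound of (4.2.12)
print(u)
```

The summability of the εk is what keeps the accumulated inexactness finite, mirroring condition (4.2.7) below.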

Assume that the radius r∗ is fixed such that

U∗ ∩ Br∗/8 ≠ ∅,

and choose r ≥ r∗. Further denote

Q := K ∩ Br, Q∗ := U∗ ∩ Br∗, Qk := Kk ∩ Br.

Obviously, we also need some quantitative information about the approximation of J and K mentioned above.

4.2.2 Assumption. For given sequences σk ↓ 0 and µk ↓ 0 and for each k = 1, 2, · · · it holds

sup_{u∈Br} |J(u) − Jk(u)| ≤ σk, (4.2.5)

and

ρ(Qk, Q) ≤ µk, ρ(Q∗, Qk) ≤ µk. (4.2.6)

Moreover, according to Lemma 4.1.4, we consider

4.2.3 Assumption. The inequality

f(λu′ + (1 − λ)u′′) ≤ λf(u′) + (1 − λ)f(u′′) − λ(1 − λ)κ‖Πu′ − Πu′′‖², ∀ u′, u′′ ∈ V, ∀ λ ∈ [0, 1],

is satisfied with some κ > 0 and Π the ortho-projector onto the orthogonal complement V1⊥ of some finite-dimensional subspace V1 ⊂ V. ♦

Now we are able to formulate the convergence result.

4.2.4 Theorem. Suppose the radii r∗, r are chosen as before and Assumption 4.2.2 is fulfilled. Moreover, assume that the functional J is Lipschitz on Br according to (4.1.3), the starting point is chosen such that u0 ∈ Br∗/4, and the parameters are coordinated such that

∑_{k=1}^∞ (√(2L(r)µk + 2σk) + 2µk + εk/2) < r∗/2. (4.2.7)

Then for the sequence {uk}, generated by OSR-Method 4.2.1, it holds:

(i) ‖uk‖ < r∗;

(ii) uk ⇀ u∗ ∈ U∗.

If, additionally, Assumption 4.2.3 is fulfilled, then


(iii) limk→∞ ‖uk − u∗‖ = 0.

Proof: We deliver the proof in three phases.
Phase 1: Let u∗∗ ∈ U∗ ∩ Br∗/8 be fixed and ũk := arg min_{u∈Qk} Ψk(u). Due to u∗∗ ∈ Q∗ and condition (4.2.6), we can choose vk ∈ Qk, v̄k ∈ Q such that

‖vk − u∗∗‖ ≤ µk, ‖v̄k − ũk‖ ≤ µk, k = 1, 2, · · · . (4.2.8)

In view of (4.1.3) and (4.2.8),

J(v̄k) − J(ũk) ≤ L(r)µk, J(vk) − J(u∗∗) ≤ L(r)µk

hold true, and with regard to J(u∗∗) ≤ J(v̄k), we conclude that

J(vk) − J(ũk) ≤ 2L(r)µk

and, due to (4.2.5),

Jk(vk) − Jk(ũk) ≤ 2L(r)µk + 2σk, k = 1, 2, · · · . (4.2.9)

Using Proposition 3.1.3 with v0 := uk−1, y := vk, f := Jk, χ := 2 and C := Qk, we obtain

‖ũk − vk‖ ≤ ‖uk−1 − vk‖ + √(2L(r)µk + 2σk) (4.2.10)

and with (4.2.8) this leads to

‖ũk − u∗∗‖ ≤ ‖uk−1 − u∗∗‖ + √(2L(r)µk + 2σk) + 2µk. (4.2.11)

For k := 1, with regard to (4.2.7), (4.2.10) and the chosen u0, the estimate ‖ũ1‖ ≤ r∗ holds. Consequently, ũ1 ∈ int Br, and from the strong convexity of Ψ1 and the convexity of K1 one can conclude that

ũ1 = ū1 := arg min_{u∈K1} Ψ1(u).

But on account of (4.2.3), we get

‖uk − ūk‖ ≤ εk/2, ∀ k, (4.2.12)

and (4.2.11) yields

‖u1 − u∗∗‖ ≤ ‖u0 − u∗∗‖ + √(2L(r)µ1 + 2σ1) + 2µ1 + ε1/2,

hence ‖u1‖ ≤ r∗.

hence, ‖u1‖ ≤ r∗.Phase 2: Suppose now that the inequalities

‖uk‖ < r∗, ‖uk‖ < r∗

are true for k = 1, ..., s − 1. Then, we have uk = uk and from (4.2.8), (4.2.10)and (4.2.11) it follows that for k = 1, ..., s− 1

‖uk − u∗∗‖ ≤ ‖uk−1 − u∗∗‖+√

2L(r)µk + 2σk + 2µk,

‖uk − u∗∗‖ ≤ ‖uk−1 − u∗∗‖+√

2L(r)µk + 2σk + 2µk +εk2. (4.2.13)


Summing up (4.2.13) for k = 1, ..., s − 1 and taking into account that

‖ũs − u∗∗‖ ≤ ‖us−1 − u∗∗‖ + √(2L(r)µs + 2σs) + 2µs,

we obtain

‖ũs − u∗∗‖ ≤ ‖u0 − u∗∗‖ + ∑_{k=1}^s (√(2L(r)µk + 2σk) + 2µk) + (1/2) ∑_{k=1}^{s−1} εk.

Together with (4.2.7) this proves that ‖ũs‖ < r∗. Consequently, ũs = ūs, and observing (4.2.12), we finally get

‖us − u∗∗‖ ≤ ‖u0 − u∗∗‖ + ∑_{k=1}^s (√(2L(r)µk + 2σk) + 2µk) + (1/2) ∑_{k=1}^s εk

and ‖us‖ < r∗.

Phase 3: For an arbitrary element w ∈ Q∗ := U∗ ∩ Br∗ and elements vk ∈ Qk, v̄k ∈ Q, chosen such that

‖vk − w‖ ≤ µk, ‖v̄k − ũk‖ ≤ µk, k = 1, 2, · · · ,

we obtain analogously to (4.2.9) the estimate

Jk(vk) − Jk(ũk) ≤ 2L(r)µk + 2σk. (4.2.14)

Applying again Proposition 3.1.3, but with the new data

v0 := uk−1, y := vk, f := Jk, χ := 2, C := Qk,

one can conclude that

‖ũk − vk‖² ≤ ‖uk−1 − vk‖² + Jk(vk) − Jk(ũk), (4.2.15)

and, similarly to (4.2.11),

‖ũk − w‖ ≤ ‖uk−1 − w‖ + √(2L(r)µk + 2σk) + 2µk.

In view of (4.2.12), the latter inequality provides

‖uk − w‖ ≤ ‖uk−1 − w‖ + √(2L(r)µk + 2σk) + 2µk + εk/2, k = 1, 2, · · · .

Therefore, condition (4.2.7) and Lemma A3.1.4 imply that {‖ũk − w‖} converges for any w ∈ Q∗, and with regard to (4.2.12), {‖uk − w‖} converges to the same limit.

Due to the choice of the element vk the inequalities

‖uk−1 − vk‖² ≤ (‖uk−1 − w‖ + µk)², ‖ũk − w‖² ≤ (‖ũk − vk‖ + µk)²

are true. Together with (4.2.15) and the estimates

‖ũk‖ < r, ‖uk−1‖ < r, ‖vk‖ ≤ r, ‖w‖ ≤ r,


this leads to

‖uk−1 − w‖² − ‖ũk − w‖² ≥ Jk(ũk) − Jk(vk) − 8rµk − 2µk²

and, because of (4.2.5) and (4.1.3),

‖uk−1 − w‖² − ‖ũk − w‖² ≥ J(ũk) − J(w) − (L(r) + 8r)µk − 2µk² − 2σk. (4.2.16)

As shown in phase 2 of the proof, the sequence {ũk} is bounded. Hence, choosing in (4.2.16) a subsequence ũkj ⇀ û and taking the limit (for j → ∞), then in view of the weak lower semicontinuity of the functional J and the conditions

lim_{k→∞} σk = lim_{k→∞} µk = 0,

we conclude that

J(û) − J(w) ≤ 0.

But on account of (4.2.6), (4.2.7) and the convexity of K ∩ Br∗, the limit point û belongs to K ∩ Br∗, therefore û ∈ Q∗ := U∗ ∩ Br∗. Setting u∗ := û, Opial's Lemma A1.1.3 shows that ũk ⇀ u∗, which implies uk ⇀ u∗, too.

If, additionally, Assumption 4.2.3 is satisfied, then the strong convergence uk → u∗ holds due to Lemma 4.1.4.

4.2.5 Remark. If OSR-Method 4.2.1 is executed with two different starting points u0 and v0, then in view of the non-expansivity of the prox-mapping the following relation holds between the corresponding limit points u∗ and v∗:

‖u∗ − v∗‖ ≤ ‖u0 − v0‖ + ∑_{k=1}^∞ εk.

4.2.6 Remark. In order to satisfy the assumptions of Theorem 4.2.4 it is sufficient to choose the sequences µk and σk such that

µk ≤ (c1 + k)^{−2−c2}, σk ≤ (c1 + k)^{−1−c2}, k = 1, 2, · · · ,

where c2 > 0 is an arbitrarily small number and c1 > 0 a constant. For special problems and methods considered later in this book these conditions for updating the parameters are not too restrictive if c1 is not chosen too large. The value c1 can be tuned by means of variations of the radius r of the ball Br or suitable scalings of the objective functional J. However, these transformations may lead to an increase of the number of external iterations.

The choice of the controlling parameters will be discussed in more detail in connection with specific variational problems considered in the following chapters, see in particular Subsections 4.4 and 9.1.3. ♦

It should be noted that the statement of Theorem 4.2.4 remains true (with evident modifications in the proof) if one takes σk ≡ 0, i.e.,

Ψk(·) ≡ Ψ̄k(·) := J(·) + ‖· − uk−1‖², ∀ k,

and if, moreover, instead of (4.2.3) the condition

inf_{q(uk)∈∂Ψk(uk)} ‖q(uk) − q̄(ūk)‖V′ ≤ εk

is used with some q̄(ūk) ∈ ∂Ψk(ūk).

However, considering the case that J is non-differentiable and its approximation cannot be performed sufficiently smoothly, the following OSR-method without smoothing may be more convenient:

Generate uk by

Ψk(uk) ≤ min_{u∈Kk} Ψk(u) + εk, (4.2.17)

ρ(uk, Kk) ≤ εk. (4.2.18)

In that case we maintain the notation used for the description of OSR-Method 4.2.1, but modify only the conditions concerning the choice of the radii r∗ and r such that

U∗ ∩ Br∗/8 ≠ ∅, r ≥ 2r∗ + 2 sup_k εk, (4.2.19)

and

(r∗)² − (2(L(2r∗) + 8r∗) + 1) sup_k εk > 0, (4.2.20)

with L(2r∗) the Lipschitz constant of the functional J on B2r∗.

Further, denote ukQ := arg min_{u∈Q} Ψk(u) and Q := ∪_{k=1}^∞ {ukQ}.

4.2.7 Theorem. Assume that J fulfils the Lipschitz condition (4.1.3) on Br, that the estimates (4.2.6) as well as ρ(Q, Qk) ≤ µk are satisfied, and moreover, that u0 ∈ Br∗/4 and

∑_{k=1}^∞ (√(2L(r)µk) + 2µk + √((1 + L(r) + 4r)εk) + εk) < r∗/2. (4.2.21)

Then the sequence {uk}, generated by (4.2.17), (4.2.18), converges weakly to some solution u∗ of Problem (4.2.1). If, additionally, Assumption 4.2.3 holds, then lim_{k→∞} ‖uk − u∗‖ = 0.

Proof: Denote

ūk := arg min_{u∈Kk} Ψk(u), wk := arg min_{v∈Kk} ‖v − uk‖

and assume that

‖ūk‖ < r∗, ‖uk−1‖ < r∗, ∀ k ≤ s.

If s = 1, the second inequality is obvious and the first one was shown in phase 1 of the proof of Theorem 4.2.4. If us ∉ Br′ with r′ := 2r∗ + sup_k εk, then due to (4.2.18) we get ws ∉ B2r∗. For the elements

w̄s ∈ [ūs, ws] ∩ ∂B2r∗, ũs ∈ [ūs, us] ∩ ∂B2r∗

the inequality

‖w̄s − ũs‖ ≤ 2εs (4.2.22)

can be concluded by means of elementary geometrical observations, considering the pair of triangles with vertices (ūs, us, ws) and (ūs, ũs, w̄s), respectively, and choosing εs < r∗/16, which follows trivially from (4.2.20). From (4.1.1), (4.1.3) and (4.2.22) one obtains

Ψs(ũs) ≥ Ψs(w̄s) − 2(L(2r∗) + 8r∗)εs. (4.2.23)

But, regarding the strong convexity of Ψs, the inequality

Ψs(w̄s) − Ψs(ūs) ≥ 〈qs(ūs), w̄s − ūs〉 + ‖w̄s − ūs‖² (4.2.24)

is true for any subgradient qs(ūs) ∈ ∂Ψs(ūs). Moreover, by the definition of ūs, it holds for some qs(ūs)

〈qs(ūs), w̄s − ūs〉 ≥ 0.

Since w̄s ∈ ∂B2r∗ and ūs ∈ Br∗, we infer that

Ψs(w̄s) − Ψs(ūs) ≥ ‖w̄s − ūs‖² ≥ (r∗)².

Now, (4.2.23) and (4.2.20) yield

Ψs(ũs) ≥ Ψs(ūs) + (r∗)² − 2(L(2r∗) + 8r∗)εs > Ψs(ūs) + εs.

Because us = ũs + λ(ũs − ūs) for some λ > 0, we conclude from the convexity of Ψs and the latter inequality that

Ψs(us) > Ψs(ūs) + εs,

which contradicts condition (4.2.17). Hence, us ∈ Br′ ⊂ Br, and due to (4.2.18), this implies

ws ∈ Br. (4.2.25)

From (4.2.17) it is obvious that

Ψk(uk) ≤ min_{u∈Qk} Ψk(u) + εk, ∀ k ∈ N. (4.2.26)

Furthermore, the relations (4.2.18), (4.2.25) and ‖uk‖ < r∗ (for k < s) lead to

ρ(uk, Qk) ≤ εk, k = 1, ..., s. (4.2.27)

Taking into account (4.1.3), (4.2.26) and ‖uk − wk‖ ≤ εk, we infer that

Ψk(wk) ≤ min_{u∈Qk} Ψk(u) + (1 + L(r) + 4r)εk.

Hence, in view of the strong convexity of Ψk and (4.2.27), the inequalities

‖wk − ūk‖ ≤ √((1 + L(r) + 4r)εk),
‖uk − ūk‖ ≤ √((1 + L(r) + 4r)εk) + εk (4.2.28)


hold for k = 1, ..., s. Now, formula (4.2.11) can be applied with

u∗∗ ∈ U∗ ∩ Br∗/8, ũk := arg min_{u∈Qk} Ψk(u), σk := 0

(note that inequality (4.2.11) is true regardless of how uk−1 has been calculated). According to ũk = ūk for k ≤ s, we get

‖ūk − u∗∗‖ ≤ ‖uk−1 − u∗∗‖ + √(2L(r)µk) + 2µk,

and due to (4.2.28),

‖uk − u∗∗‖ ≤ ‖uk−1 − u∗∗‖ + θk, ∀ k = 1, ..., s, (4.2.29)

with

θk := √(2L(r)µk) + 2µk + √((1 + L(r) + 4r)εk) + εk.

Summing up the latter inequalities leads to

‖us − u∗∗‖ ≤ ‖u0 − u∗∗‖ + ∑_{k=1}^s θk. (4.2.30)

Hence, in view of the choice of u0, u∗∗ and (4.2.21), we conclude that ‖us‖ < r∗. Moreover, from (4.2.11) (with k := s + 1) and (4.2.30) it follows that

‖ũs+1 − u∗∗‖ ≤ ‖u0 − u∗∗‖ + ∑_{k=1}^{s+1} θk,

and as before this yields ‖ũs+1‖ < r∗, i.e.,

ūs+1 = ũs+1, ‖ūs+1‖ < r∗.

Therefore, the estimates

‖uk‖ < r∗, ‖ūk‖ < r∗, ∀ k ∈ N,

are true and we have shown that

ρ(uk, Qk) = ρ(uk, Kk), ∀ k.

Now, due to (4.2.26), (4.2.27), (4.2.6) and ρ(Q, Qk) ≤ µk, it follows that

Ψk(uk) ≤ min_{u∈Q} Ψk(u) + (L(r) + 4r)µk + εk,

ρ(uk, Q) ≤ εk + µk,

and all we have to do is to use Lemma 4.1.3.

As mentioned before, this variant of an OSR-method may be convenient in cases where methods of non-smooth optimization can solve Problem (4.2.1) efficiently.


4.2. ONE-STEP REGULARIZATION METHODS 109

4.2.2 Modifications of OSR-methods

Now, for Problem (4.2.1), we describe modifications of the OSR-methods which make use of an approximation of the original objective functional J different from the one suggested in OSR-Method 4.2.1. This leads to weaker conditions for the choice of the controlling parameters.

Let the sequences {Jk} and {Kk} satisfy the same assumptions as made at the beginning of Section 4.2.
For fixed r1 > 0 denote δ(u) := ρ²(u, Br1) and

Φk(u) := Jk(u) + δ(u) + ‖u − uk−1‖².

Now we deal with the following family of auxiliary problems:

min_{u∈Kk} Φk(u). (4.2.31)

4.2.8 Method. (Modified OSR-Method)
Data: u0 ∈ V, εk ↓ 0;

S0: Set k := 1;

S1: compute for Problem (4.2.31) an approximate solution uk such that

‖∇Φk(uk) − ∇Φk(ūk)‖V′ ≤ εk, (4.2.32)

where

ūk := arg min_{u∈Kk} Φk(u). (4.2.33)

S2: set k := k + 1 and go to S1.
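In a finite-dimensional setting, one step of this modified scheme can be sketched as follows. The quadratic objective, the box constraint and the projected-gradient inner solver are illustrative assumptions for the sketch, not part of the method's specification; the penalty dist(u, B_{r1})² plays the role of δ(u).

```python
import math

def prox_penalized_step(grad_J, project_K, u_prev, r1, step=0.2, iters=500):
    """One outer step of the modified OSR-method (sketch): approximately
    minimize Phi(u) = J(u) + dist(u, B_{r1})^2 + ||u - u_prev||^2 over K
    by projected gradient descent. grad_J, project_K and r1 are
    user-supplied problem data (hypothetical toy setting)."""
    u = list(u_prev)
    for _ in range(iters):
        nrm = math.sqrt(sum(x * x for x in u))
        # gradient of dist(u, B_{r1})^2; it vanishes inside the ball B_{r1}
        pen = 2.0 * max(0.0, nrm - r1)
        g_pen = [pen * x / nrm if nrm > 0 else 0.0 for x in u]
        g = [gj + gp + 2.0 * (x - xp)
             for gj, gp, x, xp in zip(grad_J(u), g_pen, u, u_prev)]
        u = project_K([x - step * gx for x, gx in zip(u, g)])
    return u

# toy data: J(u) = (u1-2)^2 + (u2-2)^2, K = [0,1]^2, r1 large (penalty inactive);
# the regularized minimizer over K is (1, 1)
grad_J = lambda u: [2.0 * (u[0] - 2.0), 2.0 * (u[1] - 2.0)]
project_K = lambda u: [min(1.0, max(0.0, x)) for x in u]
u1 = prox_penalized_step(grad_J, project_K, [0.0, 0.0], r1=10.0)
```

In this toy run the prox center u_prev = 0 pulls the minimizer of (u1−2)² + u1² to u1 = 1, which lies on the boundary of the box.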

To prove convergence of this method we need the following assumptions and notations. Suppose that the functional J satisfies the Lipschitz condition (4.1.3) on Br1. Further, for a given θ > 0 choose the radius r such that

r ≥ 2r1 + 2L(r1) + 2√(θ + L(r1)(4r1 + L(r1))) (4.2.34)

and denote

Q := K ∩ Br, Q∗ := U∗ ∩ Br, Qk := Kk ∩ Br, ∀ k,
Φ̄k(u) := J(u) + δ(u) + ‖u − uk−1‖².

Again, we need some quantitative information about the exactness of the approximations of J and K.

4.2.9 Assumption. For given sequences σk ↓ 0 and µk ↓ 0 and for each k = 1, 2, ... it holds

sup_{u∈Br} |J(u) − Jk(u)| ≤ σk, (4.2.35)

ρ(Qk, Qk+1) ≤ µk, ρ(Qk, Q) ≤ µk, ρ(Q∗, Qk) ≤ µk. (4.2.36)


4.2.10 Theorem. Suppose that the parameters r1, r and θ are chosen such that U∗ ∩ Br1 ≠ ∅, relation (4.2.34) holds and J satisfies the Lipschitz condition (4.1.3) on Br. Moreover, let Assumption 4.2.9 be fulfilled and

∑_{k=1}^{∞} √µk < ∞, ∑_{k=1}^{∞} √σk < ∞, ∑_{k=1}^{∞} εk < ∞, (4.2.37)

∑_{k=1}^{∞} ((L(r) + 6r)µk + ε²k/4 + 2σk) < θ. (4.2.38)

Then the sequence {uk}, starting with u0 ∈ int Br1 ∩ K1 and generated by Method 4.2.8, converges weakly to some u∗ ∈ U∗.
If, moreover, Assumption 4.2.3 is satisfied, then lim_{k→∞} ‖uk − u∗‖ = 0.

Proof: Let wk := arg min{Φ̄k(u) : u ∈ Qk−1} and set Q0 := Q1, µ0 := µ1. In view of (4.2.36) an element vk ∈ Qk can be chosen with ‖wk − vk‖ ≤ µk−1. The Lipschitz properties of the functionals J and δ (cf. (4.1.2)) imply

|J(vk) − J(wk)| ≤ L(r)µk−1,
|δ(vk) − δ(wk)| ≤ 2rµk−1,

and if ‖uk−1‖ ≤ r, we get in view of (4.1.1)

|‖vk − uk−1‖² − ‖wk − uk−1‖²| ≤ 4rµk−1.

Hence, if ‖uk−1‖ ≤ r, then

|Φ̄k(vk) − Φ̄k(wk)| ≤ (L(r) + 6r)µk−1. (4.2.39)

Now, assume that for k ≤ s the inequality

J(uk) + δ(uk) ≤ J(u0) + (L(r) + 6r) ∑_{j=1}^{k−1} µj + (1/4) ∑_{j=1}^{k−1} ε²j + 2 ∑_{j=1}^{k} σj (4.2.40)

is satisfied (for k = 1 this is trivial because

J(u1) + δ(u1) ≤ min_{u∈K1} {J1(u) + δ(u) + ‖u − u0‖²} + σ1
≤ J1(u0) + δ(u0) + σ1 ≤ J(u0) + δ(u0) + 2σ1

and, due to the choice of u0, it holds δ(u0) = 0).
Then, by means of the relations (4.2.38), (4.2.40) and Lemma 4.1.1, we can infer that ‖us‖ < r/2, meaning that us ∈ Qs.
The definition of wk implies

Φ̄s+1(ws+1) ≤ J(us) + δ(us) + ‖us − ūs‖²,

and with regard to (4.2.39),

Φ̄s+1(vs+1) ≤ J(us) + δ(us) + ‖us − ūs‖² + (L(r) + 6r)µs. (4.2.41)

But

‖uk − ūk‖ ≤ εk/2, ∀ k, (4.2.42)


and taking into account (4.2.40), (4.2.41) and the trivial inequality

Φ̄s+1(vs+1) ≥ J(us+1) + δ(us+1) − 2σs+1,

we get

J(us+1) + δ(us+1) ≤ J(us) + δ(us) + ‖us − ūs‖² + (L(r) + 6r)µs + 2σs+1
≤ J(u0) + (L(r) + 6r) ∑_{j=1}^{s} µj + (1/4) ∑_{j=1}^{s} ε²j + 2 ∑_{j=1}^{s+1} σj.

This induction allows us to assert that ‖uk‖ < r/2 holds for all k.
Due to (4.2.38), we have εk/2 < √θ; hence, the relations (4.2.34) and (4.2.42) yield

εk < r, ‖uk − ūk‖ < r/2,

and the estimate ‖ūk‖ < r is satisfied for all k ∈ ℕ.
Now we can make use of Theorem 4.2.4, because the modified OSR-Method 4.2.8 can be interpreted as OSR-Method 4.2.1 applied to the problem

min{J(u) + δ(u) : u ∈ K}.

Then, in the respective part of the proof of Theorem 4.2.4, instead of relation (4.2.7) only the conditions (4.2.37) have to be used.

In order to find suitable values µk according to Theorems 4.2.4, 4.2.7 or 4.2.10, we note that a sufficient accuracy of the successive approximation of the set Q is required; this is the crucial point in the realization of this approach.

Of course, we do not assume that the minimizers u_Q^k := arg min_{u∈Q} Ψk(u) and the sets Q∗ are known. For the elements u∗ ∈ U∗ and u_Q^k we only need some information about their smoothness (or regularity) which enables us to estimate the distance ρ(Q, Qk). This will be seen in Chapter 8, where specific variational inequalities in Sobolev spaces are investigated.

In order to generate a sequence {uk}, condition (4.2.3) (as well as the conditions (4.2.17), (4.2.18) or (4.2.32)) has to be translated into a practicable stopping rule according to the properties of the problems and the chosen basic algorithms for solving the discretized and regularized auxiliary problems.

It should be noted that the assumptions of Theorems 4.2.4 and 4.2.10 do not guarantee solvability of the approximate, non-regularized problems

min_{u∈Kk} Jk(u) or min_{u∈Kk} J(u).

This fact will be illustrated in Example 6.2.1 below in the context of the approximation of semi-infinite problems: there, the corresponding approximate problems have no solution at all, regardless of which approximation level k is used. However, the non-regularized problems

min_{u∈Kk} {Jk(u) + δ(u)},

corresponding to the modified OSR-method above, are solvable in any case.


4.3 Multi-Step Regularization Methods

We now describe a coupled (diagonal) scheme of approximation and iterative PPR applied to Problem (4.2.1), where on the k-th approximation level (external loop) the proximal method is applied to solve

min{Jk(u) : u ∈ Kk}.

On each external approximation level k the proximal iterations are performed as long as they prove to be sufficiently productive (internal loop), i.e., as long as the distance between successive iterates does not fall below a given threshold.

In comparison with OSR-methods, this approach permits one to handle coarse approximations on the external approximation levels more effectively, because on such levels one obtains better approximations of the solution of the original problem.

4.3.1 Description of MSR-methods

As before, let Jk, k = 1, 2, ..., be a sequence of convex and Gâteaux-differentiable functionals on V and Kk, k ∈ ℕ, a sequence of convex and closed subsets of V which approximate J and K, respectively.
In order to obtain a solution of the original Problem (4.2.1), a sequence

uk,i, (i = 0, 1, ..., i(k); k = 1, 2, ...)

is generated. The termination index i(k) of the internal loop is determined within the process, and the choice of the starting point u1,0 is explained more precisely in Lemma 4.3.5 below.

4.3.1 Method. (Multi-step regularization)
Data: u0 and sequences {δk}, {εk}, k = 1, 2, ..., such that

δk > 0, εk ≥ 0, lim_{k→∞} εk = 0.

Step k: Given uk−1.

(a) Set uk,0 := uk−1, i := 1.

(b) Given uk,i−1 and

Ψk,i(u) := Jk(u) + ‖u − uk,i−1‖², (4.3.1)

compute approximately uk,i according to

‖∇Ψk,i(uk,i) − ∇Ψk,i(ūk,i)‖V′ ≤ εk (4.3.2)

with

ūk,i := arg min{Ψk,i(u) : u ∈ Kk}. (4.3.3)

(c) If ‖uk,i − uk,i−1‖ > δk, set i := i + 1 and repeat (b).
Otherwise, set uk := uk,i, i(k) := i, and continue with Step k := k + 1.
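The diagonal structure of the method — proximal steps on a fixed approximation level until the step length falls to the threshold δk, then refinement of the level — can be sketched in a toy one-dimensional setting. The prox oracle and the data Jk used below are illustrative assumptions, not the method's general specification.

```python
def msr(solve_prox, u0, deltas):
    """Diagonal MSR scheme (sketch): on level k, repeat proximal steps
    u <- argmin{ J_k(u) + ||u - u_prev||^2 : u in K_k } until the step
    length drops to delta_k, then move to level k + 1. solve_prox(k, u)
    is an assumed user-supplied oracle returning the proximal point."""
    u, history = u0, []
    for k in range(1, len(deltas) + 1):
        i = 0
        while True:
            u_new = solve_prox(k, u)
            i += 1
            step = abs(u_new - u)  # ||u^{k,i} - u^{k,i-1}|| in 1-D
            u = u_new
            if step <= deltas[k - 1]:
                break
        history.append((k, i))  # i = i(k), the termination index
    return u, history

# toy data: J_k(u) = (u - c_k)^2 with c_k = 1 + 1/k and K_k = R;
# here argmin{(u - c_k)^2 + (u - v)^2} = (c_k + v)/2 in closed form
solve_prox = lambda k, v: ((1.0 + 1.0 / k) + v) / 2.0
u_fin, hist = msr(solve_prox, 0.0, [0.1] * 5)
```

Each inner loop halves the distance to the level-k minimizer c_k, so after the stopping test the iterate lies within δk of c_k; the last level has c_5 = 1.2.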


Before proving convergence of MSR-Method 4.3.1, we give a simple example which warns against a careless choice of the controlling parameters appearing in the method.

4.3.2 Example. The function J(u) := u2 has to be minimized on

K := {u = (u1, u2)ᵀ : 0 ≤ u1 ≤ 1, 0 ≤ u2 ≤ 1}.

Let Jk := J for all k and let the sets Kk be defined as follows (s ∈ ℤ+):

for k = 3s − 2: Kk := {u : 0 ≤ u1 ≤ 1, u2 ≤ 1, u1 + αs−1 u2 ≥ 1};
for k = 3s − 1: Kk := {u : 0 ≤ u1 ≤ 1, u2 ≤ 1,
  u1 − αs−1 u2 ≤ αs−1/(α²s−1 + 1) if u1 ≥ 1/2,
  u2 ≥ 1/(2αs−1) − 1/(α²s−1 + 1) if u1 ≤ 1/2};
for k = 3s: Kk := {u : 0 ≤ u1 ≤ 1, 1/(2αs) ≤ u2 ≤ 1};

where α0 > 1 is fixed and αs satisfies for s ≥ 1 the condition

αs ≥ αs−1(α²s−1 + 1)/(αs−1 − 1)².

In Figure 4.3.1 the first three sets Kk are drawn.

Figure 4.3.1: The sets K1, K2 and K3.

It is easily seen that, by increasing αs suitably, the approximation of K by means of Kk can be performed arbitrarily fast, but also extremely slowly if

αs = αs−1(α²s−1 + 1)/(αs−1 − 1)².

In each case lim_{s→∞} αs = ∞, hence ρ(K, Kk) → 0 as k → ∞.
Assume now that εk = 0 ∀ k, i.e., the regularized auxiliary problems

min{Ψk,i(u) : u ∈ Kk}


will be solved exactly. Then a straightforward calculation shows the following:

Starting with u0 := (1/2, 1/(2α0))ᵀ and the sequence {δk} chosen such that

δ3s−2 ≥ √( α²s−1/(4(α²s−1 + 1)²) + 1/(4(α²s−1 + 1)²) ) =: δ,
δ3s−1 > 0 (arbitrarily),
δ3s > 0 (arbitrarily),

we get i(3s − 2) = 1, i(3s − 1) = 1 or 2, i(3s) = 1 or 2, and

u3s−2,1 = (1/2 + αs−1/(2(α²s−1 + 1)), 1/(2αs−1) − 1/(2(α²s−1 + 1)))ᵀ = u3s−1,0,
u3s−1,i(3s−1) = (1/2, 1/(2αs−1) − 1/(α²s−1 + 1))ᵀ = u3s,0,
u3s,i(3s) = (1/2, 1/(2αs))ᵀ = u3s+1,0.

Hence, {uk,i} converges to the point u∗ = (1/2, 0)ᵀ ∈ U∗.
But if max[δ3s−2, δ3s−1] < δ and δ3s > 0 is chosen arbitrarily, then

u3s−2,i(3s−2) = (1, 0)ᵀ,
u3s−1,i(3s−1) = (1/2, 1/(2αs−1) − 1/(α²s−1 + 1))ᵀ,
u3s,i(3s) = (1/2, 1/(2αs))ᵀ, for i(3s) = 1 or 2.

In this case {uk,i(k)} has two limit points (1/2, 0)ᵀ and (1, 0)ᵀ. ♦
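The borderline growth rate of αs in this example — equality in the condition above — can be checked numerically. The starting value α0 = 2 below is an illustrative assumption; any α0 > 1 works.

```python
def alpha_sequence(alpha0, n):
    """alpha_s = alpha_{s-1}(alpha_{s-1}^2 + 1) / (alpha_{s-1} - 1)^2:
    the slowest growth admitted in Example 4.3.2 (requires alpha0 > 1)."""
    seq = [alpha0]
    for _ in range(n):
        a = seq[-1]
        seq.append(a * (a * a + 1.0) / (a - 1.0) ** 2)
    return seq

# with alpha0 = 2 the next term is 2 * 5 / 1 = 10; since
# (a^2 + 1)/(a - 1)^2 > 1 for a > 1, the sequence increases
# monotonically and diverges, so rho(K, K_k) -> 0
seq = alpha_sequence(2.0, 6)
```

The divergence of αs is exactly what drives ρ(K, Kk) → 0, however slowly the equality recursion lets it happen.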

4.3.2 Convergence of the MSR-scheme

Now we are going to prove convergence of MSR-Method 4.3.1 applied to Problem (4.2.1). Again, we first fix some notation. For a fixed radius r∗ > 0 and r ≥ r∗ let

Q := K ∩ Br, Qk := Kk ∩ Br, ∀ k ∈ ℕ;
Ψ̄k,i(u) := J(u) + ‖u − uk,i−1‖², u_Q^{k,i} := arg min_{u∈Q} Ψ̄k,i(u);
Q̄ := ∪_{k=1}^{∞} ∪_{i=1}^{i(k)} {u_Q^{k,i}}, Q∗ := (U∗ ∩ Br∗) ∪ (Q̄ ∩ Br). (4.3.4)

When working with Assumption 4.2.2 in the case of MSR-methods, we suppose that the set Q∗ appearing there is defined by (4.3.4).

Next we show under which conditions the internal loop of an MSR-method terminates after a finite number of steps.

4.3.3 Lemma. Suppose that r∗ is fixed such that U∗ ∩ Br∗ ≠ ∅, that r ≥ r∗, that the functional J is Lipschitz on Br according to (4.1.3), and that Assumption 4.2.2 is fulfilled.
Moreover, let the controlling parameters εk, δk, µk and σk be chosen such that

(1/4r)(2L(r)µk + 2σk − (δk − εk/2)²) + εk/2 < 0,
ε̄k := δk − εk/2 > 0. (4.3.5)


If, for a fixed k0, the iterates of MSR-Method 4.3.1 satisfy the inequalities

‖uk0,i‖ < r∗, ‖ūk0,i‖ < r∗ ∀ i,

then the termination index satisfies i(k0) < ∞.

Proof: Let u∗∗ ∈ U∗ ∩ Br∗ be fixed. Due to u∗∗ ∈ Q∗, uk0,i ∈ Qk0 and condition (4.2.6), one can choose vk0 ∈ Qk0 and vk0,i ∈ Q̄ such that

‖vk0 − u∗∗‖ ≤ µk0, ‖vk0,i − uk0,i‖ ≤ µk0, ∀ i = 1, ..., i(k0). (4.3.6)

On account of (4.1.3) and (4.3.6) the estimates

J(vk0,i) − J(uk0,i) ≤ L(r)µk0,
J(vk0) − J(u∗∗) ≤ L(r)µk0

are true, and with regard to J(u∗∗) ≤ J(vk0,i), we have

J(vk0) − J(uk0,i) ≤ 2L(r)µk0.

On account of (4.2.5),

Jk0(vk0) − Jk0(uk0,i) ≤ 2L(r)µk0 + 2σk0 =: τk0 (4.3.7)

holds true for all i = 1, ..., i(k0).
Using Proposition 3.1.3 with

C := Qk0, f := Jk0, v0 := uk0,i−1, y := vk0,

we obtain from (3.1.3)

‖ūk0,i − vk0‖² − ‖uk0,i−1 − vk0‖² ≤ −‖ūk0,i − uk0,i−1‖² + τk0. (4.3.8)

But in view of stopping rule (4.3.2), for each k and i the estimate

‖uk,i − ūk,i‖ ≤ εk/2 (4.3.9)

is true. For 1 ≤ i < i(k0), due to

‖uk0,i − uk0,i−1‖ > δk0 > εk0/2

and (4.3.8), (4.3.9), it follows that

‖ūk0,i − vk0‖² − ‖uk0,i−1 − vk0‖² ≤ −(‖uk0,i − uk0,i−1‖ − εk0/2)² + τk0,
‖ūk0,i − vk0‖² − ‖uk0,i−1 − vk0‖² ≤ −ε̄²k0 + τk0,

and from (4.3.5) we have τk0 < ε̄²k0. Hence, for 1 ≤ i < i(k0),

‖ūk0,i − vk0‖ − ‖uk0,i−1 − vk0‖ ≤ (−ε̄²k0 + τk0) / (‖ūk0,i − vk0‖ + ‖uk0,i−1 − vk0‖)
< (−ε̄²k0 + τk0) / (2‖uk0,i−1 − vk0‖). (4.3.10)


With regard to ‖vk0‖ ≤ r and ‖ūk0,i‖ < r ∀ i, the inequalities (4.3.9) and (4.3.10) lead to

‖uk0,i − vk0‖ < ‖uk0,i−1 − vk0‖ + (1/4r)(−ε̄²k0 + τk0) + εk0/2, 1 ≤ i < i(k0). (4.3.11)

Summing up these inequalities for i = 1, ..., imax, with imax ≤ i(k0) − 1 arbitrarily chosen, we get

‖uk0,imax − vk0‖ < ‖uk0,0 − vk0‖ + imax ((1/4r)(−ε̄²k0 + τk0) + εk0/2),

where, because of (4.3.5),

(1/4r)(−ε̄²k0 + τk0) + εk0/2 < 0.

Hence,

imax < ‖uk0,0 − vk0‖ / |(1/4r)(−ε̄²k0 + τk0) + εk0/2|,

showing that i(k0) < ∞.

It should be emphasized that the relations (4.2.5), (4.2.6) and (4.3.5) are used only for the fixed iteration index k0.

4.3.4 Corollary. If the assumptions of Lemma 4.3.3 are fulfilled and the iterates uk,i, ūk,i, generated according to MSR-Method 4.3.1, satisfy the inequalities

‖uk,i‖ < r∗, ‖ūk,i‖ < r∗, ∀ k, ∀ i,

then i(k) < ∞ holds for every external iteration k, i.e., the MSR-method is well defined.

4.3.5 Lemma. Suppose that for given r∗ and r ≥ r∗ the functional J is Lipschitz on Br according to (4.1.3) and Assumption 4.2.2 is fulfilled. Moreover, assume that

U∗ ∩ Br∗/8 ≠ ∅, u1,0 := u0 ∈ Br∗/4, (4.3.12)

and that the controlling parameters εk, δk, µk and σk are chosen such that

(1/4r)(2L(r)µk + 2σk − (δk − εk/2)²) + εk/2 < 0, (4.3.13)

∑_{k=1}^{∞} (√(2L(r)µk + 2σk) + εk/2 + 2µk) < r∗/2. (4.3.14)

Then for MSR-Method 4.3.1 it holds:

(i) i(k) < ∞ ∀ k;

(ii) ‖uk,i‖ < r∗, ‖ūk,i‖ < r∗, ∀ k, ∀ i.


Proof: In view of (4.3.13) and (4.3.14) we have

2rεk < (δk − εk/2)², εk < r∗,

consequently

2ε²k < (δk − εk/2)², δk > εk/2.

Therefore,

ε̄k := δk − εk/2 > √(rεk) > 0. (4.3.15)

Now, let u∗∗ ∈ U∗ ∩ Br∗/8 be fixed. For the proof by induction we make the following assumptions: suppose that k0 and i0 are fixed, 0 ≤ i0 < i(k0), and

(a) i(k) < ∞ ∀ k < k0;

(b) ‖uk,i‖ < r∗, ‖ūk,i‖ < r∗, ∀ (k, i) ∈ I0,
with I0 := {(k′, i′) : k′ < k0, 0 < i′ ≤ i(k′)} ∪ {(k0, i′) : 0 < i′ ≤ i0}.

Denote

ũk,i := arg min{Ψk,i(u) : u ∈ Qk}.

For (k, i) ∈ I0 it obviously holds ũk,i = ūk,i.
Now we choose vk ∈ Qk such that

‖vk − u∗∗‖ ≤ µk, (4.3.16)

which is possible due to Assumption 4.2.2.
Similarly as in the proof of the previous lemma, we conclude that for (k, i) ∈ I0

Jk(vk) − Jk(uk,i) ≤ 2L(r)µk + 2σk =: τk. (4.3.17)

Now, if k < k0, 0 ≤ i < i(k) − 1 or k = k0, 0 ≤ i < i0, then using Proposition 3.1.3 with

C := Qk, f := Jk, v0 := uk,i, y := vk,

we get (cf. inequality (4.3.10)), in view of ũk,i+1 = ūk,i+1,

‖ūk,i+1 − vk‖ − ‖uk,i − vk‖ ≤ (−ε̄²k + τk) / (2‖uk,i − vk‖) ≤ (1/4r)(−ε̄²k + τk), (4.3.18)

and on account of (4.3.9) and (4.3.13),

‖uk,i+1 − vk‖ − ‖uk,i − vk‖ ≤ (1/4r)(−ε̄²k + τk) + εk/2 < 0. (4.3.19)

Analogously, for k < k0 and i = i(k) − 1,

‖uk,i(k) − vk‖ − ‖uk,i(k)−1 − vk‖ ≤ √τk + εk/2. (4.3.20)

Summing up the inequalities (4.3.19), (4.3.20) for a fixed k < k0 and 0 ≤ i ≤ i(k) − 1, we obtain

‖uk,i(k) − vk‖ − ‖uk,0 − vk‖ ≤ √τk + εk/2, ∀ k < k0,


hence, due to (4.3.16),

‖uk,i(k) − u∗∗‖ − ‖uk,0 − u∗∗‖ ≤ √τk + εk/2 + 2µk. (4.3.21)

Because of

Jk0(vk0) − Jk0(uk0,i0+1) ≤ τk0

and Proposition 3.1.3, it follows that

‖ūk0,i0+1 − vk0‖ ≤ ‖uk0,i0 − vk0‖ + √τk0, (4.3.22)

and summing up the inequalities (4.3.19) for k = k0, 0 ≤ i < i0, and (4.3.22), this leads to

‖ūk0,i0+1 − vk0‖ ≤ ‖uk0,0 − vk0‖ + √τk0,

consequently,

‖ūk0,i0+1 − u∗∗‖ ≤ ‖uk0,0 − u∗∗‖ + √τk0 + 2µk0. (4.3.23)

Now, in view of uk,i(k) := uk+1,0 := uk+1, the inequalities (4.3.21) and (4.3.23) imply

‖ūk0,i0+1 − u∗∗‖ ≤ ‖u1,0 − u∗∗‖ + ∑_{j=1}^{k0−1} (√τj + εj/2 + 2µj) + √τk0 + 2µk0. (4.3.24)

Due to (4.3.12), the chosen element u∗∗ and (4.3.14), we obtain

‖ūk0,i0+1‖ < r∗ − εk0/2,

hence ũk0,i0+1 = ūk0,i0+1 and

‖ūk0,i0+1‖ < r∗ − εk0/2. (4.3.25)

Inequality (4.3.25) together with (4.3.9) provides

‖uk0,i0+1‖ < r∗. (4.3.26)

This enables us to conclude from condition (b) that

‖uk0,i‖ < r∗, ‖ūk0,i‖ < r∗, ∀ i, (4.3.27)

and Lemma 4.3.3 guarantees that i(k0) < ∞.
In a similar way, by means of Proposition 3.1.3, the inequalities

‖u1,1‖ < r∗, ‖ū1,1‖ < r∗

can be verified. Hence, the estimates

‖u1,i‖ < r∗, ‖ū1,i‖ < r∗, ∀ 0 < i ≤ i(1)

are valid, and again Lemma 4.3.3 implies that i(1) < ∞. Thus the induction is complete.

Having proved the previous lemmata, we are now able to state convergence of the MSR-methods.


4.3.6 Theorem. Suppose that the assumptions of Lemma 4.3.3 are fulfilled and that uk,i, ūk,i, generated by MSR-Method 4.3.1, satisfy the inequalities

‖uk,i‖ < r∗, ‖ūk,i‖ < r∗, ∀ k, ∀ i. (4.3.28)

Moreover, let

∑_{k=1}^{∞} √µk < ∞, �∑_{k=1}^{∞} √σk < ∞, ∑_{k=1}^{∞} εk < ∞. (4.3.29)

Then uk,i ⇀ u∗ ∈ U∗.
If, in addition, Assumption 4.2.3 holds, then uk,i → u∗ ∈ U∗ in the norm of the space V.

Proof: Well-definedness of MSR-Method 4.3.1, i.e. i(k) < ∞ ∀ k, has been noted in Corollary 4.3.4.
Again, let u∗∗ ∈ U∗ ∩ Br∗ be arbitrarily chosen but fixed and

u_Q^{k,i+1} := arg min_{u∈Q} Ψ̄k,i+1(u).

For i ≤ i(k) − 1 the inequality

0 ≤ 2⟨u_Q^{k,i+1} − uk,i, u∗∗ − u_Q^{k,i+1}⟩ + J(u∗∗) − J(u_Q^{k,i+1})

follows from Proposition A1.5.34.
For i := i(k) all considerations with respect to u_Q^{k,i+1} can be performed completely analogously.
Consequently, due to (3.1.3),

‖uk,i − u∗∗‖² − ‖u_Q^{k,i+1} − u∗∗‖² ≥ J(u_Q^{k,i+1}) − J(u∗∗) + ‖u_Q^{k,i+1} − uk,i‖². (4.3.30)

From (4.2.5), (4.2.6) and (4.1.3) the inequalities

min_{u∈Qk} Ψ̄k,i+1(u) − min_{u∈Q} Ψ̄k,i+1(u) ≤ (L(r) + 4r)µk,
min_{u∈Qk} Ψk,i+1(u) − min_{u∈Q} Ψ̄k,i+1(u) ≤ (L(r) + 4r)µk + σk

can be deduced, hence

Ψk,i+1(ũk,i+1) − Ψ̄k,i+1(u_Q^{k,i+1}) ≤ (L(r) + 4r)µk + σk. (4.3.31)

Denote ûk,i := arg min_{v∈Q} ‖v − uk,i‖.
Due to (4.2.6), the Lipschitz property of Ψ̄k,i+1 and

|Ψ̄k,i+1(ūk,i+1) − Ψ̄k,i+1(ûk,i+1)| ≤ |Ψk,i+1(ūk,i+1) − Ψk,i+1(ûk,i+1)| + 2σk,

we conclude that

Ψ̄k,i+1(ūk,i+1) ≤ Ψ̄k,i+1(ũk,i+1) + (L(r) + 4r)µk + 2σk,

which together with (4.3.31) leads to

Ψ̄k,i+1(ūk,i+1) − Ψ̄k,i+1(u_Q^{k,i+1}) ≤ 2(L(r) + 4r)µk + 3σk,
Ψ̄k,i+1(ûk,i+1) − Ψ̄k,i+1(u_Q^{k,i+1}) ≤ 2(L(r) + 4r)µk + 4σk.


By virtue of the strong convexity of Ψ̄k,i+1, we have

‖ûk,i+1 − u_Q^{k,i+1}‖ ≤ √(2(L(r) + 4r)µk + 4σk).

Finally, because of (4.2.6) and (4.3.9), the estimate

‖uk,i+1 − u_Q^{k,i+1}‖ ≤ √(2(L(r) + 4r)µk + 4σk) + µk + εk/2 =: θk (4.3.32)

is true. From (4.3.28) and the choice of u∗∗ and u_Q^{k,i}, the above result yields

‖uk,i+1 − u∗∗‖² − ‖u_Q^{k,i+1} − u∗∗‖² ≤ 4rθk,

and in view of (4.3.30),

‖uk,i − u∗∗‖² − ‖uk,i+1 − u∗∗‖² ≥ ‖uk,i − u_Q^{k,i+1}‖² + J(u_Q^{k,i+1}) − J(u∗∗) − 4rθk.

In particular, for i := i(k) − 1 the inequality

‖uk,i(k)−1 − u∗∗‖² − ‖uk+1,0 − u∗∗‖²
≥ ‖uk,i(k)−1 − u_Q^{k+1,0}‖² + J(u_Q^{k+1,0}) − J(u∗∗) − 4rθk (4.3.33)

holds true. But for i < i(k) and

vk ∈ Qk : ‖vk − u∗∗‖ ≤ µk, (4.3.34)

following the proof of inequality (4.3.11), we get

‖uk,i − vk‖ < ‖uk,i−1 − vk‖. (4.3.35)

Hence,

‖uk,i − vk‖ < ‖uk,0 − vk‖,
‖uk,i − u∗∗‖ < ‖uk,0 − u∗∗‖ + 2µk.

Now, taking into account that ‖uk,0‖ < r and ‖u∗∗‖ ≤ r∗, it follows for i := i(k) − 1 that

‖uk,i − u∗∗‖² < ‖uk,0 − u∗∗‖² + 8rµk + 4µ²k.

Due to (4.3.33) and the definitions of u_Q^{k,i} and u∗∗, the latter inequality provides

‖uk,0 − u∗∗‖² − ‖uk+1,0 − u∗∗‖²
> ‖uk,i(k)−1 − u_Q^{k+1,0}‖² + J(u_Q^{k+1,0}) − J(u∗∗) − 4r(θk + 2µk) − 4µ²k,

therefore,

‖uk+1,0 − u∗∗‖² < ‖uk,0 − u∗∗‖² + 4r(θk + 2µk) + 4µ²k.

But by virtue of (4.3.29) the relations

∑_{k=1}^{∞} µk < ∞, ∑_{k=1}^{∞} θk < ∞


are fulfilled; therefore Lemma A3.1.4 ensures convergence of the sequence

{‖uk,0 − u∗∗‖} for any u∗∗ ∈ U∗ ∩ Br∗.

Due to Proposition 3.1.3 it holds

‖ūk,i(k) − vk‖ ≤ ‖uk,i(k)−1 − vk‖ + √τk,

hence,

‖uk,i(k) − vk‖ ≤ ‖uk,i(k)−1 − vk‖ + √τk + εk/2 (4.3.36)

and, together with (4.3.34) and (4.3.35), this leads to

‖uk,i − u∗∗‖ ≥ ‖uk+1,0 − u∗∗‖ − √τk − εk/2 − 2µk. (4.3.37)

Using (4.3.35), (4.3.37) and (4.3.29), one can conclude that the sequences

{‖uk,i − u∗∗‖}, {‖u_Q^{k,i} − u∗∗‖}

converge to the same limit as {‖uk,0 − u∗∗‖} does (the order of the elements is defined by the method).
Thus, (4.3.30) and (4.3.32) yield

lim_{k→∞} max_{1≤i≤i(k)} |J(u_Q^{k,i}) − J(u∗∗)| = 0,
lim_{k→∞} max_{1≤i≤i(k)} |J(uk,i) − J(u∗∗)| = 0.

Now, in order to finish the proof, we can follow the final part of the proof of Proposition 1.3.11 and the proof of Lemma 4.1.4.

4.3.7 Corollary. Theorem 4.3.6 remains true if, instead of condition (4.3.28), it is assumed that

Kk ⊂ int Brk with rk := r∗ − εk/2, k = 1, 2, ... . (4.3.38)

Applying Lemma 4.3.5 and Theorem 4.3.6, we immediately conclude convergence of the MSR-methods.

4.3.8 Theorem. Suppose the assumptions of Lemma 4.3.5 are fulfilled.
Then MSR-Method 4.3.1 applied to Problem (4.2.1) is well-defined, i.e. i(k) < ∞, k = 1, 2, ..., and uk,i ⇀ u∗ ∈ U∗.
If, in addition, Assumption 4.2.3 holds true, then ‖uk,i − u∗‖V → 0.

A comparison between Theorem 4.3.8 on the one hand and Theorem 4.3.6 together with Corollary 4.3.7 on the other hand shows: if the sets Kk are uniformly bounded, then the requirements on the controlling parameters which ensure convergence of MSR-Method 4.3.1 are essentially weaker than in Theorem 4.3.8. In fact, condition (4.3.14) may be replaced by (4.3.29). In particular, this enables us to start the iteration with a coarser approximation of the original problem.

Knowing a radius r∗ with U∗ ∩ Br∗ ≠ ∅ allows us to add a constraint g0(u) ≤ r0 to the original problem. The choice of the functional g0 and of the constant r0 (for example g0(·) := ‖·‖ and r0 := r∗) has to guarantee that the set {u : g0(u) ≤ r0} is bounded and that

U∗ ∩ {u : g0(u) ≤ r0} ≠ ∅.

Obviously, in this way approximations of the transformed problem can be constructed such that the family {Kk} is bounded. This artificial approach may be efficient, for instance, for solving convex semi-infinite problems with non-linear constraints. However, for specific variational inequalities, as considered in Section 8.2, it may lead to essentially more complicated approximate problems on each discretization level. Moreover, the estimation of ρ(Q∗, Qk), which is necessary for controlling the discretization procedure, becomes more difficult.

4.3.9 Remark. If the assumptions of Lemma 4.3.3 or Lemma 4.3.5 are fulfilled for a given sequence {δk}, then they are also satisfied for any sequence {δ̃k} with δ̃k ≥ δk, while the other parameters remain unchanged. In particular, for δk > 2r∗ ∀ k ∈ ℕ, MSR-Method 4.3.1 turns into an OSR-method, i.e., the termination index satisfies i(k) = 1 ∀ k. In this case condition (4.3.13) is a consequence of (4.3.14). Thus, Lemma 4.3.5 and Theorem 4.3.8 provide a convergence result for an OSR-method similar to Theorem 4.2.4. An analogous investigation shows that Theorem 4.3.6 also remains true for the modified OSR-Method 4.2.8. In this situation condition (4.3.5) is superfluous.
On the other hand, one can formally consider MSR-Method 4.3.1 as an OSR-method if one takes

Kk,i := Kk, Jk,i := Jk, for 1 ≤ i ≤ i(k),

with a priori unknown i(k). However, a straightforward application of Theorem 4.2.4 leads to essentially stronger sufficient convergence conditions than in Theorem 4.3.8. Indeed, instead of (4.3.14) the relation

∑_{k=1}^{∞} i(k) (√(2L(r)µk + 2σk) + εk/2 + 2µk) < r∗/2

has to be required. ♦

4.3.2.1 Estimates of objective functional values in MSR-methods

Now let us consider some estimates of the values of the objective functionals appearing in MSR-methods, i.e., we are interested in upper and lower bounds for the values

Jk(uk,i(k)) − inf_{u∈Qk} Jk(u)

and

J(uk,i(k)) − inf_{u∈K} J(u).

4.3.10 Theorem. Under the assumptions of Lemma 4.3.5 the following estimates are true for each exterior step k of MSR-Method 4.3.1:

Jk(uk,i(k)) − inf_{u∈Qk} Jk(u) < 4r(δk + εk/2) + (1/2)L(r)εk + 2σk, (4.3.39)

J(uk,i(k)) − inf_{u∈K} J(u) < 4rδk + (1/2)(4r + L(r))εk + L(r)µk + 2σk. (4.3.40)


Proof: Since Jk is supposed to be weakly lower semicontinuous on Qk, there exists

vk := arg min_{v∈Qk} Jk(v).

By the definition of ūk,i(k), for any λ ∈ (0, 1] it holds

Jk(ūk,i(k) + λ(vk − ūk,i(k))) + ‖ūk,i(k) + λ(vk − ūk,i(k)) − uk,i(k)−1‖²
≥ Jk(ūk,i(k)) + ‖ūk,i(k) − uk,i(k)−1‖²,

and in view of the convexity of Jk and the positivity of λ, we obtain

Jk(ūk,i(k)) ≤ λJk(vk) + (1 − λ)Jk(ūk,i(k)) + λ²‖vk − ūk,i(k)‖² + 2λ⟨vk − ūk,i(k), ūk,i(k) − uk,i(k)−1⟩,

hence,

Jk(ūk,i(k)) − Jk(vk) ≤ 2⟨ūk,i(k) − uk,i(k)−1, vk − ūk,i(k)⟩ + λ‖vk − ūk,i(k)‖².

For λ ↓ 0 we get

Jk(ūk,i(k)) − Jk(vk) ≤ 2⟨ūk,i(k) − uk,i(k)−1, vk − ūk,i(k)⟩
≤ 2‖vk − ūk,i(k)‖ (‖ūk,i(k) − uk,i(k)‖ + ‖uk,i(k) − uk,i(k)−1‖),

and, due to the stopping rule for the interior loop, one has

Jk(ūk,i(k)) − Jk(vk) ≤ 4r(δk + εk/2). (4.3.41)

Because of

|Jk(uk,i) − Jk(ūk,i)| ≤ |J(uk,i) − J(ūk,i)| + 2σk ≤ (1/2)L(r)εk + 2σk

for 1 ≤ i ≤ i(k), this yields (4.3.39).
For a fixed u∗ ∈ U∗ ∩ Br∗ and wk := arg min_{v∈Qk} ‖u∗ − v‖ we conclude

J(uk,i(k)) − J(u∗) ≤ J(ūk,i(k)) − J(u∗) + L(r)εk/2
≤ J(ūk,i(k)) − J(wk) + L(r)(εk/2 + µk)
≤ Jk(ūk,i(k)) − Jk(wk) + L(r)(εk/2 + µk) + 2σk
≤ Jk(ūk,i(k)) − Jk(vk) + L(r)(εk/2 + µk) + 2σk,

and, due to (4.3.41), estimate (4.3.40) is true.

4.3.11 Remark. Using Assumption 4.2.2, Lemma 4.3.5 and the inequalities (4.1.3) and (4.3.9), we immediately obtain the following lower estimates for each k and 1 ≤ i ≤ i(k):

Jk(uk,i) − Jk(vk) ≥ −L(r)εk/2 − 2σk

and

J(uk,i) − inf_{u∈K} J(u) ≥ −L(r)(εk/2 + µk), (4.3.42)

with vk := arg min_{v∈Qk} Jk(v). ♦


4.3.3 MSR-method with regularization on subspaces

Let V be decomposed into two subspaces such that V1 + V2 = V, V1 ∩ V2 = {0}, and let Π := Π2 be the orthogonal projector onto the subspace V2, whereas Π1 := I − Π projects V onto the subspace V1.

Suppose now that the functional J in Problem (4.2.1) has the property

J(u) − J(v) ≥ ⟨q(v), u − v⟩ + κ‖Πu − Πv‖², ∀ u, v ∈ V, (4.3.43)

where κ > 0 and q(v) ∈ ∂J(v).
Concerning the approximation of Problem (4.2.1), we assume that the requirements made with respect to {Kk}, {Jk} and the parameters δk and εk are the same as before. Additionally, we assume that condition (4.3.43) holds for the family {Jk} with the common constant κ > 0.
Now we consider the following regularized functional:

Ψk,i(u) := Jk(u) + ‖Π1u − Π1uk,i−1‖² + α‖Πu − Πuk,i−1‖², (4.3.44)

with a given constant α ≥ 0.
In this case we suggest an MSR-method which carries out regularization on subspaces.

4.3.12 Method. (MSR-method on subspaces)
Data: u0 and sequences {δk}, {εk}, k = 1, 2, ..., such that

δk > 0, εk ≥ 0, lim_{k→∞} εk = 0.

Step k: Given uk−1.

(a) Set uk,0 := uk−1, i := 1.

(b) Given uk,i−1, compute approximately uk,i according to

‖∇Ψk,i(uk,i) − ∇Ψk,i(ūk,i)‖V′ ≤ ε′k (4.3.45)

with

ūk,i := arg min{Ψk,i(u) : u ∈ Kk}, ε′k ≤ εk min[1, α + κ] / √(max[1, α + κ]). (4.3.46)

(c) If ‖Π1uk,i − Π1uk,i−1‖ > δk, set i := i + 1 and repeat (b).
Otherwise, set uk := uk,i, i(k) := i, and continue with Step k := k + 1.
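The split regularization term in (4.3.44) can be sketched for a coordinate splitting of a finite-dimensional V. The functional J, the mask describing V2 and the parameter values below are illustrative assumptions.

```python
def subspace_prox_objective(J, u, v_prev, mask2, alpha):
    """Value of Psi(u) = J(u) + ||Pi_1(u - v_prev)||^2 + alpha*||Pi(u - v_prev)||^2
    for a coordinate splitting: mask2[j] is True iff the j-th coordinate
    spans V2 (the subspace Pi projects onto); hypothetical toy setting."""
    pen1 = sum((x - y) ** 2 for x, y, m in zip(u, v_prev, mask2) if not m)
    pen2 = sum((x - y) ** 2 for x, y, m in zip(u, v_prev, mask2) if m)
    return J(u) + pen1 + alpha * pen2

# alpha = 0 removes the proximal term on V2 entirely, while alpha = 1
# recovers the ordinary proximal term ||u - v_prev||^2
J = lambda u: 0.0
val0 = subspace_prox_objective(J, [1.0, 1.0], [0.0, 0.0], [False, True], 0.0)
val1 = subspace_prox_objective(J, [1.0, 1.0], [0.0, 0.0], [False, True], 1.0)
```

The design point is visible already in this toy evaluation: only the V1-component is always regularized, which is why the stopping test in step (c) measures ‖Π1uk,i − Π1uk,i−1‖ rather than the full step length.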

We emphasize that, in comparison with Assumption 4.2.3, finite-dimensionality of V1 := (I − Π)V has not been assumed here.

Setting α := 0 in (4.3.44), an essentially different MSR-method arises, whereas in the case α := 1 we deal with the previously described MSR-method.

In order to obtain sufficient convergence conditions for MSR-Method 4.3.12, we adapt Lemma 4.3.5 and Theorem 4.3.8 to the new situation.

Denote by |‖·‖| the norm induced by the scalar product

⟨⟨u, v⟩⟩ := ⟨Π1u, v⟩ + (α + κ)⟨Πu, v⟩,

where ⟨·, ·⟩ is the duality pairing between V and its dual space V′.

Let us prove the following lemma which is similar to Proposition 3.1.3:

4.3.13 Lemma. Assume that the functional J in Problem (4.2.1) satisfies condition (4.3.43), that v0 ∈ V is arbitrarily chosen, and that

v1 := arg min_{u∈K} {J(u) + ‖Π1u − Π1v0‖² + α‖Πu − Πv0‖²} (α ≥ 0 fixed).

Then the following estimates hold for each u ∈ K:

|‖v1 − u‖|² ≤ |‖v0 − u‖|² − ‖Π1v0 − Π1v1‖²
− α‖Πv0 − Πv1‖² − κ‖Πv0 − Πu‖² + J(u) − J(v1),

|‖v1 − u‖| ≤ |‖v0 − u‖| + √(|J(u) − J(v1)|). (4.3.47)

If, moreover, J(u) − J(v1) − ‖Π1v0 − Π1v1‖² < 0, then

|‖v1 − u‖| − |‖v0 − u‖| < (J(u) − J(v1) − ‖Π1v0 − Π1v1‖²) / (2|‖v0 − u‖|). (4.3.48)

Proof: Because of the definition of v1, the inequality

⟨q, u − v1⟩ + 2⟨Π1v1 − Π1v0 + α(Πv1 − Πv0), u − v1⟩ ≥ 0

is fulfilled for some q ∈ ∂J(v1) and all u ∈ K.
From the latter inequality together with (4.3.43) one can conclude that

0 ≤ 2⟨Π1v1 − Π1v0 + α(Πv1 − Πv0), u − v1⟩ + J(u) − J(v1) − κ‖Πu − Πv1‖²,

and, because Π and Π1 := I − Π are self-adjoint operators with Π² = Π, Π1² = Π1,

|‖v1 − u‖|² − |‖v0 − u‖|² = −‖Π1v0 − Π1v1‖² − α‖Πv0 − Πv1‖²
+ 2⟨Π1v1 − Π1v0, v1 − u⟩ + 2α⟨Πv1 − Πv0, v1 − u⟩
− κ(‖Πv0‖² − ‖Πv1‖² − 2⟨Πv0, u⟩ + 2⟨Πv1, u⟩)
≤ −‖Π1v0 − Π1v1‖² − α‖Πv0 − Πv1‖² − κ‖Πv0 − Πu‖² + J(u) − J(v1).

Thus, relation (4.3.47) and

|‖v1 − u‖|² − |‖v0 − u‖|² ≤ −‖Π1v0 − Π1v1‖² + J(u) − J(v1)

hold true. If

J(u) − J(v1) − ‖Π1v0 − Π1v1‖² < 0,

then inequality (4.3.48) follows immediately.


Note that the norms ‖ · ‖ and |‖ · ‖| are equivalent, i.e.,

min[1, α + κ] ‖u‖² ≤ |‖u‖|² ≤ max[1, α + κ] ‖u‖².

Since the functional Ψk,i in (4.3.44) is strongly convex (with respect to the norm ‖ · ‖) with constant min[1, α + κ], the latter relation together with (4.3.45) and (4.3.46) leads to

|‖u^{k,i} − ū^{k,i}‖| ≤ εk/2.

Now, let Br := {u ∈ V : |‖u‖| ≤ r} and

Ψ̄k,i(u) := J(u) + ‖Π1u − Π1u^{k,i−1}‖² + α‖Πu − Πu^{k,i−1}‖²,

and let the terms L(r), r∗, Q, Qk, Q∗ and σk be defined as before, but assigned now to the new norm |‖ · ‖| and the new functionals Ψk,i and Ψ̄k,i.

With these substitutions, Assumption 4.2.2, Lemma 4.3.5 and Theorem 4.3.8 can be maintained also for MSR-Method 4.3.12 with only one modification: ‖ · ‖ has to be replaced by |‖ · ‖|. Now, using Lemma 4.3.13, the proofs of the statements in Section 4.3.2 can be repeated for this case with evident corrections.

If α + κ = 1, then ‖u‖ = |‖u‖|; hence, the terms L(r), r∗, Q, Qk and σk coincide with the former ones. In principle, the choice α + κ = 1 can always be guaranteed by means of a suitable scaling of the objective functional.

All results described above concerning convergence of OSR- and MSR-methods, except for Theorem 4.2.7, permit us to choose r∗ = r. However, as we will see in Section 8.2, this choice may be inappropriate for specific problems.

Now we are going to present two simple examples. The first one illustrates the advantages that are possible in applying MSR-methods in comparison with OSR-methods. The second one gives a look at the possible gain of the method with regularization on a subspace in comparison with the main variants of OSR- and MSR-methods.

Concerning the latter issue we refer also to Chapter ??, where the idea of regularization on a subspace is applied in a natural manner to control problems, i.e., regularization is performed only with respect to the control variables but not to the space variables.

4.3.14 Example. The function J(u) := (1/10)(u2 + u3 − 1)² + (1/12)u4 has to be minimized on

K := {u ∈ U0 : u1 sin(πt/2) + u2 cos(πt/2) + u3 ≤ 1 ∀ t ∈ [0, 1]}, (4.3.49)

with

U0 := {u ∈ R⁴ : u1 ≥ 0, u2 ≥ 0, 0 ≤ u3 ≤ 1, −1 ≤ u4 ≤ 0}.

Obviously,

U∗ := {u ∈ R⁴ : u1 = 0, u2 + u3 = 1, 0 ≤ u3 ≤ 1, u4 = −1}

is the optimal set. In the methods considered we take εk := 0, Jk := J ∀ k, hence σk := 0 ∀ k. Approximating the set T = [0, 1] by means of a sequence of grids

Tk := {0, hk, 2hk, ..., (n(k) − 1)hk, 1},


with n(k) > 0 integer, lim_{k→∞} n(k) = ∞ and hk := 1/n(k), we obtain the following sequence of sets

Kk := {u ∈ U0 : u1 sin(πt/2) + u2 cos(πt/2) + u3 ≤ 1 ∀ t ∈ Tk}

approximating the original feasible set K. It is easy to prove that ρ(Kk, K) ≤ hk² and that for r := 3/2 and hk ≤ 1/4 the inclusions K ⊂ Br, Kk ⊂ Br are valid, i.e., Q := K and Qk := Kk. This allows us to choose the controlling parameters of the OSR- and MSR-methods according to Theorem 4.3.6 (see Remark 4.3.9).

The function J satisfies the Lipschitz condition (4.1.3) on Br with constant L(r) := L(3/2) = 4/3; thus, the assumptions of Theorem 4.3.6 are fulfilled for MSR-Method 4.3.1 if we set

μk := hk², δk² > (8/3)μk, Σ_{k=1}^∞ hk < ∞.

Convergence of the OSR-Method 4.2.1 is guaranteed if Σ_{k=1}^∞ hk < ∞.

Now we choose the controlling parameters as follows:

in the MSR-method: δk := 0.8 · 3/(k + 2) for k ≥ 1, i.e., δ1 := 0.8, δ2 := 0.6, δ3 := 0.48, ...; hk := 1/(k(k − 1) + 4) for k ≥ 1, i.e., h1 := 1/4, h2 := 1/6, h3 := 1/10, ...;

in the OSR-method: hs := 1/(s(s − 1) + 4) for s ≥ 1

(in order to compare both methods we use the notation s and hs instead of k and hk).

From the discretization point of view this updating of hk and hs is not fast compared with implementations in practice.

Starting with u0 := (0, 0, 0, 0)ᵀ, one can verify that the iterates u^k of both methods satisfy the relations

sup_{t∈T} { u^k_1 sin(πt/2) + u^k_2 cos(πt/2) + u^k_3 } < 1, u^k_2 > 0, 0 < u^k_3 < 1, (4.3.50)

and that u^k_1 = 0 holds for all k. Therefore, in the OSR-method we obtain for all s ≥ 1:

u^s_1 = 0,

(1/2) ∂Ψs(u^s)/∂u2 = (1/10)(u^s_2 + u^s_3 − 1) + (u^s_2 − u^{s−1}_2) = 0, (4.3.51)

(1/2) ∂Ψs(u^s)/∂u3 = (1/10)(u^s_2 + u^s_3 − 1) + (u^s_3 − u^{s−1}_3) = 0, (4.3.52)

∂Ψs(u^s)/∂u4 = 1/12 + 2(u^s_4 − u^{s−1}_4) = 0 if u^s_4 ≥ −1. (4.3.53)

Due to (4.3.51) and (4.3.52) we get

u^s_2 − u^s_3 = u^{s−1}_2 − u^{s−1}_3,


and from u^0_2 = u^0_3 = 0 it follows that u^s_2 = u^s_3 with

u^s_2 = (5/6)u^{s−1}_2 + 1/12, u^s_3 = (5/6)u^{s−1}_3 + 1/12.

Relation (4.3.53) now implies

u^s_4 = max[−1, −s/24].

Hence,

u^{s+1}_2 − u^s_2 = (5/6)(u^s_2 − u^{s−1}_2) = (5/6)^s (u^1_2 − u^0_2) = (1/12)(5/6)^s,

u^{s+1}_3 − u^s_3 = (1/12)(5/6)^s,

u^{s+1}_4 − u^s_4 = −1/24 if s ≤ 23, and = 0 if s ≥ 24,

and

‖u^{s+1} − u^s‖ = √( (1/72)(5/6)^{2s} + 1/576 ) if s ≤ 23, and = (√2/12)(5/6)^s if s ≥ 24. (4.3.54)

By an MSR-method a sequence {u^{k,i}} is generated which corresponds to the sequence {u^s} such that

u^{k,i} = u^s with s = Σ_{j=1}^{k−1} i(j) + i, 1 ≤ i ≤ i(k).

The first iteration steps in the MSR-method lead to

i(1) := 3, i(2) := 3, i(3) := 3, i(4) := 15,

and the iterate

u^{4,15} := (0, 0.4937, 0.4937, −1)ᵀ

corresponds to the discretization parameter h4 := 1/16.

The unique cluster point of these iterates is

u∗ = (0, 0.5, 0.5, −1)ᵀ ∈ U∗.

In an OSR-method the same point (0, 0.4937, 0.4937, −1)ᵀ will be obtained after 24 iterations, and the corresponding discretization parameter is h24 := 1/556. Thus, in the MSR-method the semi-infinite constraint (4.3.49) generates 16 constraints (i.e., 16 exterior steps), whereas the OSR-method needs 556 constraints in order to reach the same level of accuracy at the termination point. ♦

4.3.15 Example. The function

J(u) := u1² + u2 + u3 (4.3.55)


has to be minimized on

K := {u ∈ R³ : u2 + u3 ≥ 0}.

Obviously, the optimal set is

U∗ := {u ∈ R³ : u1 = 0, u2 + u3 = 0}.

Now we choose

Jk := J, Kk := K, εk = μk = σk := 0 ∀ k ∈ N,

and, in the case of the MSR-method, δk > 0 is chosen arbitrarily. Then the sequences generated according to the exact Proximal Point Algorithm 3.1.5 (with χk := 2) coincide for both the OSR- and the MSR-method if they start from the same point u0. Their convergence to some u∗ ∈ U∗ follows in particular from Proposition 3.1.8 or Theorem 4.2.4.

Let us focus on the sequence {u^k} obtained by means of the OSR-method when starting with some u0 > 0:

u^{k+1}_1 := (1/2)u^k_1, k = 1, 2, ...;

u^{k+1}_2 := u^k_2 − 1/2, u^{k+1}_3 := u^k_3 − 1/2, while u^k_2 + u^k_3 − 1 > 0; (4.3.56)

u^{k+j}_2 := (1/2)(u^k_2 − u^k_3), u^{k+j}_3 := (1/2)(u^k_3 − u^k_2), if u^k_2 + u^k_3 − 1 ≤ 0, (4.3.57)

for j = 1, 2, ...

Consequently,

u^k_2 = u^{k+1}_2 = ... = u∗_2, u^k_3 = u^{k+1}_3 = ... = u∗_3 for k > u^0_2 + u^0_3.

Starting from an index k := k0 > u^0_2 + u^0_3, linear convergence of the sequence holds, i.e.,

‖u^{k+j} − u∗‖ = (1/2)^j ‖u^k − u∗‖ ∀ j ∈ N.

Now, let us turn to the regularization on a subspace applied to the same problem. For the function J, condition (4.3.43) holds with the ortho-projector

Π : u ↦ (u1, 0, 0)ᵀ and κ := 1,

hence Π1 = I − Π : u ↦ (0, u2, u3)ᵀ. Taking α := 0 in MSR-Method 4.3.12, we have to minimize the functions

Ψk(u) := u1² + u2 + u3 + (u2 − u^{k−1}_2)² + (u3 − u^{k−1}_3)²

successively on K. In this situation the behavior of the method is again independent of the choice of {δk}; therefore, the iteration procedure can be interpreted as an OSR-method.

Obviously, we get the sequence {u^k} with u^k_1 := 0 for k > 1 and u^k_2, u^k_3 according to (4.3.56) and (4.3.57). Hence, this method generates the solution u∗ of the example in a finite number of steps.

If, instead of (4.3.55), the function

J(u) := u1² + a √( (u2 + u3)³ )

has to be minimized (with arbitrary a > 0 and u0 > 0), then MSR-Method 4.3.12, applied with the ortho-projector Π1 given above and α := 0, also converges faster than a usual MSR-method, but not in a finite number of steps. In this case we get quadratic convergence, whereas the basic variants of the OSR- and MSR-methods provide only linear convergence. ♦

Of course, these examples show only some possible effects of particularly chosen regularization methods. In the following chapters we will find more arguments answering the question which PPR-method should be chosen for which specific problem.

Condition (4.3.13), responsible for a slow decrease of δk (in comparison with the other parameters), could give rise to the expectation that MSR-Method 4.3.1 turns into an OSR-method after some iterations. The next example shows that this expectation may be wrong.

4.3.16 Example. The function J(u) := u2² + u3 has to be minimized on

K := {u ∈ R³ : 0 ≤ uj ≤ 1, j = 1, 2, 3}.

Consider the approximations

Kk := {u ∈ R³ : 0 ≤ u1 ≤ 1, 1/2^{k(k+1)+5} ≤ u2 ≤ 1, 0 ≤ u3 ≤ 1},

with

Jk := J, εk := 0, k = 1, 2, ...; r = r∗ := 2.

In this situation we get

Q := K, Qk := Kk ∀ k ∈ N; L(r) := 5.

Assumption 4.2.2 is fulfilled with μk := 1/2^{k(k+1)+5}, and the conditions in Lemma 4.3.5 are satisfied if

δk := 1/√(2^{k(k+1)+1}), u^{1,0} := (0, 1/2, 0)ᵀ.

We observe immediately that, applying MSR-methods, the resulting minimizing sequence is

(0, 1/2, 0)ᵀ, (0, 1/4, 0)ᵀ, (0, 1/8, 0)ᵀ, (0, 1/16, 0)ᵀ, ...,

and for the termination index of the interior loop it holds that i(k) := k, k = 1, 2, ... (see Figure 4.3.2).

Indeed, the distance between successive iterates decreases, and

‖u^{k,i(k)−1} − u^{k,i(k)−2}‖ = 1/√(2^{k(k+1)}) > δk,

‖u^{k,i(k)} − u^{k,i(k)−1}‖ = 1/( 2√(2^{k(k+1)}) ) < δk.


Figure 4.3.2: The minimizing sequence (0, 1/2, 0)ᵀ, (0, 1/4, 0)ᵀ, (0, 1/8, 0)ᵀ, (0, 1/16, 0)ᵀ, ... and the corresponding interior levels k := 1, k := 2, k := 3, ...

This example shows that the number of interior iterations (for a fixed exterior step k) may even tend to infinity as k → ∞. Of course, this effect is not typical for well-organized methods. Usually, the transition k → k + 1 leads to a better approximation of the problem. Hence, the sequence {δk} has to be chosen such that for sufficiently large k the approximate solution becomes more exact. ♦
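The inner-loop behavior of Example 4.3.16 can be replayed numerically. Assuming the halving update of u2 and the stopping test ‖u^{k,i} − u^{k,i−1}‖ < δk described above (the helper name is ours), one recovers i(k) = k:

```python
import math

# Inner loop of Example 4.3.16: u2 is halved at every inner step; the loop
# at exterior level k stops once the step length drops below
# delta_k = 1 / sqrt(2**(k*(k+1) + 1)).
def inner_counts(levels):
    u2, counts = 0.5, []
    for k in range(1, levels + 1):
        delta_k = 1.0 / math.sqrt(2.0 ** (k * (k + 1) + 1))
        i = 0
        while True:
            step = u2 / 2.0          # length of the next halving step
            u2 /= 2.0
            i += 1
            if step < delta_k:       # stopping test of the interior loop
                break
        counts.append(i)
    return counts
# inner_counts(4) -> [1, 2, 3, 4], i.e. i(k) = k grows without bound
```

So the interior loops never collapse to a single step, confirming that the MSR-method does not degenerate into an OSR-method here.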

It seems promising to investigate MSR-methods also for finite-dimensional problems. This is especially of interest for the penalty methods considered in Section 3.2, because with an increase of the penalty parameter the unconstrained auxiliary problems become worse conditioned. This investigation can be carried out on the basis of the results described above and will be performed in Section 6.2 in the context of convex semi-infinite problems.

4.4 Choice of Controlling Parameters

We do not intend to give in each case a strict formalization and justification of the recommendations contained in this section. Sometimes we will only sketch how an improvement of the numerical process can be obtained.

4.4.1 On adaptive controlling procedures for parameters

Let

θk := Σ_{s=1}^{k} ( √(2L(r)μs + 2σs) + 2μs + εs/2 ). (4.4.1)
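As an illustration, with made-up summable parameter sequences, the partial sums (4.4.1) can be tabulated as follows; for the procedure below only their monotonicity and boundedness matter:

```python
import math

# Partial sums theta_k from (4.4.1); mu, sigma, eps are the controlling
# parameter sequences (0-based lists) and L_r the Lipschitz constant.
# The concrete geometric choices are purely illustrative.
def theta(k, mu, sigma, eps, L_r):
    return sum(math.sqrt(2.0 * L_r * mu[s] + 2.0 * sigma[s])
               + 2.0 * mu[s] + eps[s] / 2.0
               for s in range(k))

mu = [4.0 ** -(s + 1) for s in range(50)]   # sqrt(mu_k) is summable
sigma = list(mu)
eps = list(mu)
# theta(k, ...) increases in k but stays bounded, as needed for (4.4.2)
```

With L(r) := 1 and these sequences, θ1 = √(4·1/4) + 2/4 + 1/8 = 1.625, and the whole series stays below 3.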

From the proof of Theorem 4.2.4 we see that the conclusions of this theorem remain true if

(i) for an arbitrarily chosen k, in the (k + 1)-th iteration, instead of the iterate u^k, an arbitrary point ū^k ∈ B_{r∗/4+θk} is used for the construction of the function Ψk+1;

(ii) thereafter the iteration process is carried on under the condition

Σ_{s=k+1}^{∞} ( √(2L(r)μs + 2σs) + 2μs + εs/2 ) < r∗/2 − θk. (4.4.2)

This observation enables us to ensure convergence of the OSR-methods by controlling the parameters in the following way (cf. (4.2.2), (4.2.5) and (4.2.6)): let the sequences {μk}, {σk} and {εk} be chosen according to Theorem 4.2.4. Additionally, let positive sequences

{μ̄k}, {σ̄k}, {ε̄k} (4.4.3)

be given such that

Σ_{k=1}^∞ √μ̄k < ∞, Σ_{k=1}^∞ √σ̄k < ∞, Σ_{k=1}^∞ √ε̄k < ∞, (4.4.4)

and set

ϱk := √(2L(r)μ̄k + 2σ̄k) + 2μ̄k + ε̄k/2.

Starting with u0 ∈ B_{r∗/8}, in step (k + 1) of the OSR-Method 4.2.1 the inequality

‖u^k‖ ≤ r∗/4 + θ_{k+1} − ϱ_{k+1} (4.4.5)

has to be checked. If it is violated, we make a step using the parameters μ_{k+1}, σ_{k+1} and ε_{k+1} in order to compute u^{k+1}. If inequality (4.4.5) is true, u^{k+1} has to be computed using the parameters μ̄_{k+1}, σ̄_{k+1} and ε̄_{k+1}. In particular, this means that

sup_{u∈Br} |J(u) − J_{k+1}(u)| ≤ σ̄_{k+1}, ρ(Q_{k+1}, Q) ≤ μ̄_{k+1}, ρ(Q∗, Q_{k+1}) ≤ μ̄_{k+1}

have to be satisfied for the given k. For efficiency reasons the sequences {μk}, {σk}, {εk} and {μ̄k}, {σ̄k}, {ε̄k} should be coordinated; otherwise the numerical expense of solving the next auxiliary problem may increase essentially.

Two substantial peculiarities of the procedure above should be noted. Firstly, its application will not disturb convergence of the OSR-method: if the conditions (4.4.2) and (4.4.4) are valid, then the corresponding sequence {u^k} belongs to B_{r∗} and converges weakly to a solution u∗ ∈ U∗ of the problem under consideration. Moreover, lim_{k→∞} ‖u^k − u∗‖ = 0 holds true if the functional J satisfies Assumption 4.2.3. Secondly, an analysis of the properties of the proximal mapping, as well as numerical experiments, indicate that (4.4.5) can be maintained for the majority of real-life problems if the sequence {μ̄k} harmonizes with the discretization strategies applied to semi-infinite problems and variational inequalities and if, moreover, the sequences {σ̄k} and {ε̄k} are chosen according to the accuracy of the discretization. In that case the controlling of the method can be performed by means of the sequences (4.4.3) only.

4.4.1 Remark. In order to ensure a start with μ̄1, σ̄1, ε̄1 and substantial progress in the initial iterations, where the values of the controlling parameters are still rather large, it would be useful to require in addition that

r∗/8 + θ1 − ϱ1 > r0

with some r0 > 0. ♦

We assume that the reader is familiar with the basic features of finite element methods, including the efficient handling of grid sequences. Probably, in the successive approximation of variational inequalities it makes sense to decrease the triangulation parameter more slowly than in multi-grid methods. The reason is that, on the one hand, the favorable properties of multi-grid methods for linear problems are not guaranteed for variational inequalities in general and, on the other hand, it is important to keep the dimension of the discretized and regularized auxiliary problems low as long as the iterates are far from the solution set U∗. This is especially important for OSR-methods.

An a priori choice of the sequences (4.4.3) is not obligatory. They can be corrected during the execution of the OSR-method, particularly in dependence on the values ‖u^k − u^{k−1}‖ and r∗/4 + θ_{k+1} − ‖u^k‖. It may be helpful to choose μ̄_{k+1}, σ̄_{k+1} and ε̄_{k+1} close to μ̄k, σ̄k and ε̄k if the distance ‖u^k − u^{k−1}‖ is large (of order O(√(2L(r)μk + 2σk))) and to accelerate the change of the parameters in case r∗/4 + θ_{k+1} − ‖u^k‖ is small (close to ϱk).

This consideration can be extended to MSR-methods if, in the determination of the next parameter δk according to (4.3.13), the triple (μk, σk, εk) (respectively (μ̄k, σ̄k, ε̄k)) is used which belongs to the k-th exterior level. Here the choice of the parameters (4.4.3) requires in the k-th level only to consider the points u^{k,0}; in particular, (4.4.5) takes the form

‖u^{k,0}‖ ≤ r∗/4 + θ_{k+1} − ϱ_{k+1}.

This conclusion is obvious if we use inequality (4.3.9) and the relation

‖u^{k,i} − u∗∗‖ ≤ ‖u^{1,0} − u∗∗‖ + Σ_{s=1}^{k−1} ( √(2L(r)μs + 2σs) + 2μs + εs/2 ), (4.4.6)

which follows from the proof of Lemma 4.3.5 for u∗∗ ∈ U∗ ∩ B_{r∗/8} and all k, i < i(k).

4.4.1.1 Controlling in case of unknown Lipschitz constants

If the calculation of a Lipschitz constant L(r) is difficult, it is recommendable to choose the sequences {μk}, {σk} and {εk} in Method 4.2.1 such that

Σ_{k=1}^∞ √μk < ∞, Σ_{k=1}^∞ √σk < ∞, Σ_{k=1}^∞ √εk < ∞, (4.4.7)

max_k ( √μk + √σk + εk ) ≪ r∗ (4.4.8)

and to verify the inclusion u^k ∈ Br. In case

u^k ∈ Br ∀ k, (4.4.9)

convergence is guaranteed due to phase 3 in the proof of Theorem 4.2.4. We emphasize that in this part of the proof the actual value of L(r) is not important; it is only essential that the functional J is Lipschitz continuous. Nevertheless, the validity of relation (4.4.9) depends on the value of L(r).

Another possibility is to treat the equivalent problem

min{ z : J(u) − z ≤ 0, u ∈ K } (4.4.10)

in the space V × R. In this case the new objective functional satisfies the Lipschitz condition with L ≡ 1 on the whole space V × R.


However, for several reasons this transformation may not be appropriate. If we deal, for instance, with the minimization of a quadratic functional subject to linear constraints, the regularized problems belong to the same class and can be solved by special algorithms, including finite ones. In contrast, the application of iterative regularization methods to Problem (4.4.10) needs universal algorithms of convex optimization. Concerning other transformations of the original problem see Remark 4.2.6.

4.4.1.2 Effective use of rough approximations

As mentioned in Section 2.2, in order to apply MSR-methods it is always important to perform the iterations on rough approximation levels efficiently. To this end the following procedure may be successful: at a fixed exterior iteration level k, if i(k) > 1, compute

bk := − Σ_{i=1}^{i(k)−1} ( ‖u^{k,i} − u^{k,i−1}‖ − εk/2 )² + 2(i(k) − 1)( L(r)μk + σk + r εk ).

This value is negative due to (4.3.13). Then the iterations in the inner loop have to be continued as long as the inequalities

‖u^{k,i} − u^{k,i−1}‖ > εk/2, J(u^{k,i}) < J(u^{k,i−1}) − εk²/4 (4.4.11)

and

Σ_{s=i(k)}^{i} ( ‖u^{k,s} − u^{k,s−1}‖ − εk/2 )² − 2(i − i(k) + 1)( L(r)μk + σk + r εk ) > θ bk (4.4.12)

are satisfied for a chosen θ ∈ (0, 1).
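The continuation test (4.4.11)–(4.4.12) is straightforward to express in code. The following sketch assumes the step lengths ‖u^{k,i} − u^{k,i−1}‖ and the objective values J(u^{k,i}) of the inner loop are recorded in lists (all helper names are ours):

```python
# Continuation test for the inner loop at a fixed exterior level k.
# steps[i]  = ||u^{k,i} - u^{k,i-1}||,  J_vals[i] = J(u^{k,i}),
# i_k = i(k) from (4.3.13), b_k as defined above, theta in (0, 1).
def continue_inner(i, i_k, steps, J_vals, eps_k, L_r, mu_k, sigma_k,
                   r, theta, b_k):
    cost = L_r * mu_k + sigma_k + r * eps_k
    # both inequalities of (4.4.11)
    test_4411 = (steps[i] > eps_k / 2.0 and
                 J_vals[i] < J_vals[i - 1] - eps_k ** 2 / 4.0)
    # left-hand side of (4.4.12)
    lhs = sum((steps[s] - eps_k / 2.0) ** 2 for s in range(i_k, i + 1)) \
          - 2.0 * (i - i_k + 1) * cost
    return test_4411 and lhs > theta * b_k
```

Since bk < 0, condition (4.4.12) tolerates a limited accumulated deficit before the loop is stopped.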

This modification of the MSR-method will not disturb convergence. Indeed, following the proof of Lemma 4.3.5, relation (4.3.27) can be established by induction for i ≤ i(k0). Then Lemma 4.3.3 immediately gives i(k0) < ∞, because, instead of (4.3.19), the improved relations

‖u^{k0,i+1} − v^{k0}‖ − ‖u^{k0,i} − v^{k0}‖ ≤ (1/(4r)) ( −( ‖u^{k0,i+1} − u^{k0,i}‖ − ε_{k0}/2 )² + 2L(r)μ_{k0} + 2σ_{k0} ) + ε_{k0}/2

have to be used for i < i(k0) − 1. Summing up these inequalities for i = 0, ..., i(k0) − 1, we obtain

‖u^{k0,i(k0)−1} − u∗∗‖ ≤ ‖u^{k0,0} − u∗∗‖ + b_{k0}/(4r). (4.4.13)

In view of (4.4.12), (4.4.13) and the first inequality in (4.4.11), the estimates (4.3.27) can be obtained for i = i(k0) + 1, i(k0) + 2, ..., as long as the conditions (4.4.11) and (4.4.12) hold true. Finiteness of the inner loop for a fixed exterior level k results immediately from (4.3.27) and the second inequality in (4.4.11). The remaining modifications in the proofs of Lemma 4.3.5 and Theorem 4.3.6 are trivial.


4.4.1.3 Choice of regularization parameters

In order to simplify the study of IPR, we have assumed since Section 4.1 that in the regularized functional

Ψk(u) := Jk(u) + (χk/2)‖u − u^k‖²

and in related expressions χk := 2 ∀ k. However, the efficiency of the methods depends substantially on the chosen values of χk.

values of χk.The theoretical requirements open a great variety for that choice. For the

exact proximal algorithm 3.1.5 weak convergence of the iterates to an elementof the optimal solution set has been established by Brezis and Lions [52] incase

χk > 0,

∞∑k=1

1

χk=∞,

whereas Guler [153] has proved this convergence under the assumption

∞∑k=1

1√χk

=∞.

In simple situations, when OSR-methods are applied with Kk ≡ K, Jk ≡ J and also χk ≡ χ, one can expect the following behavior, based on estimates in [352] and confirmed numerically: usually, a decrease of χ will lead to increasing ill-conditioning of the regularized auxiliary problems, and as a result we get a higher numerical expense for the outer loop; but in order to obtain a sufficiently exact solution of the original problem, a relatively small number of outer iterations is required. Ibaraki et al. [190] have done numerical experiments with their own method and with a primal-dual proximal point algorithm suggested by Rockafellar [351]. They solved a number of network flow problems (with at most 400 nodes and at most 4000 arcs) with quadratic cost functionals, including well-conditioned, ill-conditioned and ill-posed problems. Resulting from this analysis, the following strategy for the choice of χk proved to be efficient:

χ1 := 1, χk := χ_{k−1}/2 if χ_{k−1} > 0.01, and χk := 0.01 otherwise.

In [165] a class of self-adaptive proximal point methods is proposed by Hager and Zhang. If the regularization parameter has the form χ(u) := β‖∇J(u)‖^η, where η ∈ [0, 2) and β > 0 is a constant, they obtain convergence to the set of minimizers that is linear for η := 0 and β sufficiently small, superlinear for η ∈ (0, 1), and at least quadratic for η ∈ [1, 2).
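The qualitative effect of such an adaptive parameter is visible already in one dimension: for J(u) = u²/2 an exact proximal step u⁺ = argmin{J(v) + (χ/2)(v − u)²} contracts u by the factor χ/(1 + χ), so χ(u) := β|∇J(u)|^η = β|u|^η yields contraction factors that themselves tend to zero. The following is a toy sketch of this mechanism (not the algorithm of [165]):

```python
# Toy illustration: exact prox step for J(u) = u^2/2 with chi(u) = beta*|u|**eta.
# Closed form: argmin_v { v^2/2 + chi/2*(v - u)^2 } = u * chi / (1 + chi).
def adaptive_prox(u0, beta, eta, steps):
    u, errs = u0, [abs(u0)]
    for _ in range(steps):
        chi = beta * abs(u) ** eta   # gradient of J is u itself here
        u = u * chi / (1.0 + chi)
        errs.append(abs(u))
    return errs

errs = adaptive_prox(0.5, 0.5, 1.0, 5)
# the ratios errs[i+1]/errs[i] shrink toward 0: faster-than-linear decay
```

For η = 1 the step satisfies |u⁺| ≈ β|u|², i.e., quadratic decay, consistent with the cited rate for η ∈ [1, 2).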

Using IPR for solving variational inequalities, it may be important to coordinate χk with κ, the modulus of strong convexity of the energy functional J on the orthogonal complement of the kernel of the corresponding bilinear form. The goal is to make the regularized functional Ψk strongly convex on the whole space V with a constant which is close to κ.

Variation of χk makes sense, since the value κ is usually calculated by means of rough estimates. In the numerical experiments in Chapters 6 and 7, carried out for SIP, for contact problems in elasticity theory and also for control problems, the values of χk change considerably, and the ratio of the maximal to the minimal value of χk is comparatively big. In this sense OSR- and MSR-methods prove to be rather sensitive with respect to the choice of χk.

4.4.1.4 Choice of penalty parameters

Concerning the penalty methods described in Chapters 3 and 4, the choice of a starting value r1 of the penalty parameter should be related to the starting point u0 (or u^{1,0} in MSR-methods, respectively). Otherwise it could happen that in the initial iterations we move into "irregular directions" with a substantial increase of the values of the objective function. This behavior can be demonstrated, for example, for the penalty method (3.2.1) applied with the parameters χk := 2, εk := 0 for all k, and the data

J(u) := −10u, φk(u) := rk (max[0, u])² (corresponding to the restriction u ≤ 0),

r1 := 1, rk := 2r_{k−1}, k = 2, 3, ..., u0 = 0.1.
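For this one-dimensional example the penalized proximal steps can be written in closed form. A sketch (assuming the step u^k minimizes J + φk + ‖· − u^{k−1}‖², which corresponds to χk := 2) shows the initial jump away from the feasible region u ≤ 0:

```python
# Penalized prox steps for J(u) = -10u, phi_k(u) = r_k*max(0, u)^2.
# For u > 0 the stationarity condition -10 + 2*r*u + 2*(u - u_prev) = 0
# gives the minimizer u = (5 + u_prev) / (r + 1).
def penalty_iterates(n):
    u, r, us = 0.1, 1.0, [0.1]
    for _ in range(n):
        u = (5.0 + u) / (r + 1.0)   # closed-form minimizer (positive here)
        us.append(u)
        r *= 2.0                    # r_k := 2 r_{k-1}
    return us

us = penalty_iterates(10)
# first step jumps from 0.1 to 2.55, far into the infeasible region u > 0;
# only afterwards the iterates drift back toward the solution u = 0
```

The first iterate is the largest of the whole sequence, which is exactly the undesirable initial behavior described above.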

In fact, for infinite and semi-infinite problems these undesirable iterations lead to an increase of the dimension of the approximate auxiliary problems. The same happens in the case of MSR-methods.

In the sequel we describe a procedure for choosing a starting value of the penalty parameter, suggested by Grossmann and Kaplan [151].

If, for instance, Problem (A3.4.56) is solved by the penalty method (3.2.1) with χk := 2 and penalty function (A3.4.78), then the point u1 can be determined by an inexact minimization of the function

d(u) := − 1/( J(u) + ‖u − u0‖² − t ) − Σ_{j=1}^m 1/gj(u)

on the open set

{ u ∈ int K : J(u) + ‖u − u0‖² < t },

where t > J(u0) is chosen close to J(u0). Obviously, this point can be computed by means of the same algorithm

for unconstrained minimization which is also used to solve the auxiliary problems in the penalty method. Computing u1 in this way, the condition

‖∇d(u)‖ ≤ ε1/c0²,

with c0 a lower bound of min_{u∈R^n} { J(u) + ‖u − u0‖² − t }, can be used as a stopping criterion. Then the following relations are satisfied:

J(u1) + ‖u1 − u0‖² < t

and

‖ ∇J(u1) + 2(u1 − u0) + ( J(u1) + ‖u1 − u0‖² − t )² Σ_{j=1}^m ∇gj(u1)/gj²(u1) ‖ ≤ ε1.

Page 153: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

4.4. CHOICE OF CONTROLLING PARAMETERS 137

Therefore, u1 is an approximate minimizer of the function

F1(u) := J(u) + φ1(u) + ‖u − u0‖²

and satisfies the condition

‖∇F1(u1)‖ ≤ ε1,

with

φ1(u) := −(1/r1) Σ_{j=1}^m 1/gj(u) if u ∈ int K, and φ1(u) := +∞ if u ∉ K,

a penalty function of type (A3.4.78), and

r1 := 1/( J(u1) + ‖u1 − u0‖² − t )².

Now, choosing r2 ≥ r1, we can continue with method (3.2.1).

4.4.2 Stopping rules for inner and outer loops

Simple examples show that, in general, when the exact proximal point method 3.1.5 is terminated at a given distance ‖u^k − u^{k−1}‖ between two consecutive iterates, or at a given distance |J(u^k) − J∗| between the current function value and J∗ (see Lemma 4.1.5), then we cannot conclude that the iterates u^k are close to the solution set in the metric of the initial space. This situation is typical for certain methods applied to ill-posed problems (see also Polyak [331], Chapters 1 and 6).

Nevertheless, in IPR-methods we obtain a stable sequence of iterates towards some solution point by using a stopping rule which is based on the distance between function values. This is often decisive for the quality of the approximate solutions.

If Assumption 5.1.4 below (or 5.1.6) is fulfilled and if the controlling parameters decrease in geometric progression, one may expect a rate of convergence like O(1/k²), or even a linear rate, with respect to the iterates u^k. More precisely, the stopping rule for the outer loop in MSR-Method 4.3.1 is based on Theorem 4.3.10 and Remark 4.3.11, and the algorithm terminates at a point u^{k,i(k)} where

4rδk + ((4r + L(r))/2) εk + L(r)μk + 2δk < ε, (4.4.14)

with ε suitably chosen. In that case, due to (4.3.40) and (4.3.42), the estimate

|J(u^{k,i(k)}) − J∗| < ε

is true. For OSR-methods, taking into consideration (4.3.42) and the relation

J(u^k) − J∗ ≤ 4r‖u^k − u^{k−1}‖ + ((4r + L(r))/2) εk + L(r)μk + 2σk,

the stopping condition is

4r‖u^k − u^{k−1}‖ + ((4r + L(r))/2) εk + L(r)μk + 2σk < ε. (4.4.15)
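The stopping test (4.4.15) is a directly computable bound on J(u^k) − J∗. A minimal sketch (with illustrative values for the constants r and L(r)):

```python
# Stopping test (4.4.15) for the OSR outer loop.
# step_norm = ||u^k - u^{k-1}||; eps_k, mu_k, sigma_k are the current
# controlling parameters; eps is the required accuracy for J(u^k) - J*.
def osr_stop(step_norm, eps_k, mu_k, sigma_k, r, L_r, eps):
    bound = 4.0 * r * step_norm + (4.0 * r + L_r) / 2.0 * eps_k \
            + L_r * mu_k + 2.0 * sigma_k
    return bound < eps

# terminate once the computable bound falls below eps
```

With tight parameters (all of order 10⁻⁶, r = 2, L(r) = 5) the test fires for ε = 10⁻³; with a large step norm it does not.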

Page 154: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

138 CHAPTER 4. PPR FOR INFINITE-DIMENSIONAL PROBLEMS

Usually, in order to solve well-posed problems, the accuracy in solving the approximate problems has to be chosen of the same order as the accuracy of the approximation of the original problem. But, when solving ill-posed problems by means of IPR, such a strong rule is superfluous up to a decisive moment. Indeed, as mentioned, the influence of the regularization at the beginning of the solution process may drive us far away from the sought solution even if the approximation of the data is sufficiently precise.

Hence, at the beginning of the solution process the choice of εk should be oriented mainly towards the convergence conditions of the method considered. However, we have to take into account that the choice of εk influences the values of δk in MSR-methods (see condition (4.3.13)) and that, with an increase of εk, the number of iterations in the interior loop at the k-th level is cut down. This reduces the efficiency of the method. In the final stage of the MSR-algorithm we recommend choosing εk of the same order as μk and σk.

Concerning the solution of variational inequalities, in the starting phase, especially for MSR-methods, an opposite strategy may be profitable: if the discretization, initially on rough grids, generates a quadratic programming problem with simple constraints, the application of exact methods is often more efficient. Moreover, the choice of small or zero values for εk enables us to use smaller values for δk, and hence we obtain better approximate solutions of the original problem on these rough grids.

4.5 Comments

Section 4.2: The results of this section are based on the papers of Kaplan and Tichatschke [208, 209, 211, 213]. Theorem 4.2.7 is new.

Lemaire [261] used another framework for investigating iterative PPM's. He considered the unconstrained problems

min_{u∈V} J(u), min_{u∈V} { Jk(u) + ‖u − u^{k−1}‖² },

and the distance between the convex, lsc functionals J and Jk is measured in terms of Moreau-Yosida approximations. In his paper the assumptions on the accuracy of the approximation are not tailored to the case in which finite element methods are applied to variational inequalities with differential operators.

Section 4.3: Apparently, the first results concerning MSR-methods are given in [213], where two variants of MSR-methods are suggested for solving convex semi-infinite problems and the penalty method is used as a basic algorithm. Most of the results of this section are proved for the first time in [214], except for the MSR-methods with regularization on a subspace, which have not been published before.

Page 155: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

Chapter 5

RATE OF CONVERGENCE FOR PPR

In this chapter we analyze the rate of convergence of MSR-Method 4.3.1. It turns out that for a wide class of ill-posed convex variational problems a linear rate of convergence can be established for the functional values as well as for the iterates. Moreover, for Problem (4.2.1), which is quite general, estimates of the objective values will be obtained that cannot be improved substantially.

In Section 5.2 this analysis will be adapted to OSR-Method 4.2.8. In Section 5.3 it is shown, by means of the regularized penalty method (3.2.2), how the approach developed in Section 4.1 may be used in situations where the conditions on the approximation (cf. Assumption 4.2.2) are violated.

5.1 Rate of Convergence for MSR-Methods

5.1.1 Convergence results for the general case

To start with, we take in MSR-Method 4.3.1

σ0 := σ1, μ0 := μ1, ε0 := ε1, u^{k,0} := u^{k−1,i(k−1)} (for k > 1).

For the reader's convenience, in accordance with the notation introduced in Subsection 4.3.2, we use the following abbreviations:

ξ^k_1 := (1/2)(L(r) + 4r) εk + 2σk,

ξ^k_2 := L(r)( μk + εk/2 ),


ζ′k := 6σk + L(r)( 3μk + εk/2 ) + r( 2μk + εk/2 ) + ( 2μk + εk/2 )² + ξ^k_2 + ξ^k_1,

ζ″k := (1/(8r²)) ( 2σk + L(r)μk + ξ^k_2 )( 2rL(r) + ξ^k_2 ) + 4σk + (L(r) + r)( 2μk + εk/2 ) + ( 2μk + εk/2 )² + ξ^k_1,

ζk := max[ ζ′k, ζ″k ],

ϱ_{k,0} := J(u^{k,0}) − J∗ + ξ^{k−1}_2, (5.1.1)

ϱ_{k,i} := J(u^{k,i}) − J∗ + ξ^k_2, 0 < i ≤ i(k), (5.1.2)

ī(k) := i(k) + max[0, i(k) − 3].

Note that the notation σ0, μ0 and ε0 is used in order to define formally, for k := 0, the constant ξ^0_2 in (5.1.1) and some other values which will be introduced later on. Concerning the rate of convergence of the objective values of the iterates in MSR-Method 4.3.1, the following result can be proved.

5.1.1 Theorem. Suppose the assumptions of Lemma 4.3.5 are fulfilled, u0 := u^{1,0} ∈ K and, moreover,

(i) σ_{k+1} ≤ σk, ε_{k+1} ≤ εk, μ_{k+1} ≤ μk, ρ(Qk, Q_{k+1}) ≤ 2μk, k = 1, 2, ...; ε1 < 1, μ1 < 1, J(u^{1,0}) > J∗;

(ii) the constant β is fixed such that

0 < β < (1/4) min[ 1/( L(r)(2r + μ1 + ε1/2) ), 1/(8r²) ]; (5.1.3)

(iii) the constants ζk satisfy

ζ0 = ζ1 ≤ (β/4) ( ϱ_{1,0} / (1 + 3β ϱ_{1,0}) )²,

ζ_{k−1} + 2ζk ≤ ( δk − εk/2 )² =: ε̄k², (5.1.4)

ζk ≤ (β/4) ( ϱ_{1,0} / ( 1 + β(Σ_{j=1}^{k−1} ī(j) + 3) ϱ_{1,0} ) )² for k ≥ 2.

Then, for MSR-Method 4.3.1, it holds that

ϱ_{k,1} < ϱ_{1,0} / ( 1 + β(Σ_{j=1}^{k−1} ī(j) + 1) ϱ_{1,0} ), k = 1, 2, .... (5.1.5)

Proof: Due to (4.1.3), (4.2.5), (4.3.9), Lemma 4.3.5 and the strong convexity of the functionals $\Psi_{k,i}$, we get for all $k$ and $0 \le i \le i(k) - 1$

$\Psi_{k,i+1}(u^{k,i+1}) - \Psi_{k,i+1}(\bar u^{k,i+1}) \le \xi_1^k,$   (5.1.6)


and from (4.3.42)

$J(u^{k,i+1}) - J^* \ge -\xi_2^k.$   (5.1.7)

Let

$v^{k,i} := \arg\min\{\|u^{k,i} - v\| : v \in Q^*\}, \quad Q^* := U^* \cap \bar B_{r^*},$
$\bar v^{k,i} := \arg\min\{\|v^{k,i} - v\| : v \in Q_k\}, \quad Q_k := K_k \cap \bar B_r,$
$w^{1,0} := \arg\min\{\|u^{1,0} - w\| : w \in Q_1\},$
$w^{k,0} := \arg\min\{\|u^{k,0} - w\| : w \in Q_k\} \quad \text{for } k > 1,$

$\eta_{k,i}(\lambda) := \lambda J_k(\bar v^{k,i}) + (1 - \lambda)J_k(u^{k,i}) + \lambda^2\|\bar v^{k,i} - u^{k,i}\|^2.$

Due to (5.1.6), the convexity of $J_k$, $Q_k$ and the definition of $u^{k,i}$ in (4.3.3), we have for $1 \le i < i(k)$ and $\lambda \in [0,1]$

$J_k(u^{k,i+1}) + \|u^{k,i+1} - u^{k,i}\|^2$
$\le \lambda J_k(\bar v^{k,i}) + (1-\lambda)J_k(\bar u^{k,i}) + \|\lambda \bar v^{k,i} + (1-\lambda)\bar u^{k,i} - u^{k,i}\|^2 + \xi_1^k$
$\le \lambda J_k(\bar v^{k,i}) + (1-\lambda)J_k(u^{k,i}) + (1-\lambda)\left(J_k(\bar u^{k,i}) - J_k(u^{k,i})\right) + \lambda^2\|\bar v^{k,i} - u^{k,i}\|^2 + (1-\lambda)^2\|\bar u^{k,i} - u^{k,i}\|^2 + 2\lambda(1-\lambda)\|\bar v^{k,i} - u^{k,i}\|\,\|\bar u^{k,i} - u^{k,i}\| + \xi_1^k.$

With regard to $\lambda(1-\lambda) \le \frac{1}{4}$ and the relations (4.1.3), (4.2.5), (4.3.9), the estimation above can be continued such that

$J_k(u^{k,i+1}) + \|u^{k,i+1} - u^{k,i}\|^2 \le \lambda J_k(\bar v^{k,i}) + (1-\lambda)J_k(u^{k,i}) + \lambda^2\|\bar v^{k,i} - u^{k,i}\|^2 + |J_k(\bar u^{k,i}) - J_k(u^{k,i})| + \frac{1}{2}\|\bar v^{k,i} - u^{k,i}\|\,\|\bar u^{k,i} - u^{k,i}\| + \|\bar u^{k,i} - u^{k,i}\|^2 + \xi_1^k \le \eta_{k,i}(\lambda) + \xi_3^k,$   (5.1.8)

with

$\xi_3^k := 2\sigma_k + \frac{1}{2}(L(r) + r)\varepsilon_k + \frac{\varepsilon_k^2}{4} + \xi_1^k.$

For $k > 1$, $i = 0$ we notice that $u^{k,0} \in Q_{k-1}$, but $u^{k,1} \in Q_k$. Using (4.1.3), (4.3.9) and Assumption 4.2.2, this leads to

$J_k(u^{k,1}) + \|u^{k,1} - u^{k,0}\|^2$
$\le \lambda J_k(\bar v^{k,0}) + (1-\lambda)J_k(w^{k,0}) + \|\lambda \bar v^{k,0} + (1-\lambda)w^{k,0} - u^{k,0}\|^2 + \xi_1^k$
$\le \lambda J_k(\bar v^{k,0}) + (1-\lambda)J_k(u^{k,0}) + \lambda^2\|\bar v^{k,0} - u^{k,0}\|^2 + |J_k(u^{k,0}) - J_k(w^{k,0})| + \frac{1}{2}\|\bar v^{k,0} - u^{k,0}\|\,\|w^{k,0} - u^{k,0}\| + \|w^{k,0} - u^{k,0}\|^2 + \xi_1^k$
$\le \eta_{k,0}(\lambda) + \xi_4^k,$   (5.1.9)

with

$\xi_4^k := 2\sigma_k + (L(r) + r)\left(2\mu_{k-1} + \frac{\varepsilon_{k-1}}{2}\right) + \left(2\mu_{k-1} + \frac{\varepsilon_{k-1}}{2}\right)^2 + \xi_1^k.$


Finally, for $k = 1$, $i = 0$ we get

$J_1(u^{1,1}) + \|u^{1,1} - u^{1,0}\|^2$
$\le \lambda J_1(\bar v^{1,0}) + (1-\lambda)J_1(w^{1,0}) + \|\lambda \bar v^{1,0} + (1-\lambda)w^{1,0} - u^{1,0}\|^2 + \xi_1^1$
$\le \eta_{1,0}(\lambda) + |J_1(u^{1,0}) - J_1(w^{1,0})| + \frac{1}{2}\|\bar v^{1,0} - u^{1,0}\|\,\|w^{1,0} - u^{1,0}\| + \|w^{1,0} - u^{1,0}\|^2 + \xi_1^1$
$\le \eta_{1,0}(\lambda) + 2\sigma_1 + L(r)\mu_0 + r\mu_0 + \mu_0^2 + \xi_1^1$
$\le \eta_{1,0}(\lambda) + \xi_4^1.$

In this way, for each $k$ and $0 \le i < i(k)$, we conclude that

$J_k(u^{k,i+1}) + \|u^{k,i+1} - u^{k,i}\|^2 \le \eta_{k,i}(\lambda) + \xi_0^k,$   (5.1.10)

with $\xi_0^k := \xi_3^k$ for $i > 0$ and $\xi_0^k := \xi_4^k$ for $i = 0$.

Minimizing the quadratic function $\eta_{k,i}$ on the interval $[0,1]$, the following three cases have to be analyzed:

(a) $\lambda_{k,i} = \arg\min \eta_{k,i}(\lambda) = 0$. Hence $\eta_{k,i}(\lambda_{k,i}) = J_k(u^{k,i})$ and $J_k(u^{k,i}) \le J_k(\bar v^{k,i})$. Therefore, the following estimates are true:

$J_k(u^{k,i+1}) + \|u^{k,i+1} - u^{k,i}\|^2 \le J_k(u^{k,i}) + \xi_0^k \le J_k(\bar v^{k,i}) + \xi_0^k,$
$J(u^{k,i+1}) + \|u^{k,i+1} - u^{k,i}\|^2 \le J(\bar v^{k,i}) + 2\sigma_k + \xi_0^k,$

and

$J(u^{k,i+1}) - J^* + \xi_2^k \le J(\bar v^{k,i}) - J^* + \xi_0^k + \xi_2^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2 = J(\bar v^{k,i}) - J(v^{k,i}) + \xi_0^k + \xi_2^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2.$

Due to (4.2.6), (5.1.1) and (5.1.2), this leads to

$\varrho_{k,i+1} \le \xi_2^k + \xi_0^k + L(r)\mu_k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2.$   (5.1.11)

(b) $\lambda_{k,i} = 1$ implies $\eta_{k,i}(\lambda_{k,i}) = J_k(\bar v^{k,i}) + \|\bar v^{k,i} - u^{k,i}\|^2$, and

$\frac{d}{d\lambda}\eta_{k,i}(\lambda)\Big|_{\lambda=1} \le 0$

is obvious, i.e.,

$\frac{J_k(u^{k,i}) - J_k(\bar v^{k,i})}{2\|\bar v^{k,i} - u^{k,i}\|^2} \ge 1 = \lambda_{k,i}.$

Thus

$\eta_{k,i}(\lambda_{k,i}) \le \frac{1}{2}\left(J_k(u^{k,i}) + J_k(\bar v^{k,i})\right)$

is valid, and with regard to $J(\bar v^{k,i}) \le J^* + L(r)\mu_k$ we conclude that

$\varrho_{k,i+1} \le \frac{1}{2}\left(J_k(u^{k,i}) + J^* + L(r)\mu_k\right) - J^* + \xi_0^k + \xi_2^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2.$


In order to combine the cases $i = 0$ and $i > 0$, we utilize that $\xi_2^k \le \xi_2^{k-1}$; hence for $0 \le i < i(k)$ the inequalities

$\varrho_{k,i} \ge J(u^{k,i}) - J^* + \xi_2^k$

and

$\varrho_{k,i+1} \le \frac{1}{2}\varrho_{k,i} + \frac{1}{2}L(r)\mu_k + \frac{1}{2}\xi_2^k + \xi_0^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2$   (5.1.12)

are true.

(c) If $0 < \lambda_{k,i} < 1$, then

$\lambda_{k,i} = \frac{J_k(u^{k,i}) - J_k(\bar v^{k,i})}{2\|\bar v^{k,i} - u^{k,i}\|^2},$

and in view of (5.1.10) we have

$J_k(u^{k,i+1}) \le J_k(u^{k,i}) - \frac{\left(J_k(u^{k,i}) - J_k(\bar v^{k,i})\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2} + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2.$   (5.1.13)

But

$J_k(u^{k,i}) - J_k(\bar v^{k,i}) \ge J(u^{k,i}) - J(v^{k,i}) - 2\sigma_k - L(r)\mu_k,$

and we have to distinguish two cases:

(c1) If $J(u^{k,i}) - J(v^{k,i}) > 2\sigma_k + L(r)\mu_k$, then

$\left(J_k(u^{k,i}) - J_k(\bar v^{k,i})\right)^2 \ge \left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right)^2 - 2\left(2\sigma_k + L(r)\mu_k + \xi^k\right)\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right) + \left(2\sigma_k + L(r)\mu_k + \xi^k\right)^2 > 0$

holds with $\xi^k := \xi_2^k$ for $i > 0$ and $\xi^k := \xi_2^{k-1}$ for $i = 0$.

Due to (4.1.3) and Lemma 4.3.5, the latter inequality leads to

$\frac{\left(J_k(u^{k,i}) - J_k(\bar v^{k,i})\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2} > \frac{1}{16r^2}\left[\varrho_{k,i}^2 - 2\left(2\sigma_k + L(r)\mu_k + \xi^k\right)\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right) + \left(2\sigma_k + L(r)\mu_k + \xi^k\right)^2\right] > \frac{1}{16r^2}\left[\varrho_{k,i}^2 - 2\left(2\sigma_k + L(r)\mu_k + \xi^k\right)\left(2L(r)r + \xi^k\right)\right].$   (5.1.14)

However, (5.1.13) implies

$J(u^{k,i+1}) - J^* + \xi_2^k \le J(u^{k,i}) - J^* + \xi_2^k - \frac{\left(J_k(u^{k,i}) - J_k(\bar v^{k,i})\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2} + \xi_0^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2,$

i.e.,

$\varrho_{k,i+1} \le \varrho_{k,i} - \frac{\left(J_k(u^{k,i}) - J_k(\bar v^{k,i})\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2} + \xi_0^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2,$

and with regard to (5.1.14)

$\varrho_{k,i+1} < \varrho_{k,i} - \frac{\varrho_{k,i}^2}{16r^2} + \frac{2}{16r^2}\left(2\sigma_k + L(r)\mu_k + \xi^k\right)\left(2L(r)r + \xi^k\right) + 2\sigma_k + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2.$   (5.1.15)

(c2) If $J(u^{k,i}) - J(v^{k,i}) \le 2\sigma_k + L(r)\mu_k$, then using inequality (5.1.10) for $\lambda = 0$, we get

$J(u^{k,i+1}) + \|u^{k,i+1} - u^{k,i}\|^2 \le J(v^{k,i}) + 4\sigma_k + L(r)\mu_k + \xi_0^k,$

which leads immediately to

$\varrho_{k,i+1} \le 4\sigma_k + L(r)\mu_k + \xi_2^k + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2.$   (5.1.16)

Because of $\xi_2^k \le \xi_2^{k-1}$ and $\xi_2^0 = \xi_2^1$, the inequality

$\varrho_{k,i} < L(r)\left(2r + \mu_{k-1} + \frac{\varepsilon_{k-1}}{2}\right)$

is obvious and, due to the monotonicity assumed for $\mu_k$ and $\varepsilon_k$, we obtain

$\varrho_{k,i} < L(r)\left(2r + \mu_1 + \frac{\varepsilon_1}{2}\right), \quad \forall k, \forall i.$

Together with (5.1.3) this leads to

$2\beta\varrho_{k,i} < \frac{\varrho_{k,i}}{2L(r)\left(2r + \mu_1 + \frac{\varepsilon_1}{2}\right)} < \frac{1}{2},$

hence

$\frac{1}{2}\varrho_{k,i} - 2\beta\varrho_{k,i}^2 \ge 0.$   (5.1.17)

Taking into account the definition of $\zeta_k$, it can easily be seen that for corresponding pairs $(k,i)$ the inequalities

$\varrho_{k,i+1} \le \varrho_{k,i} - 2\beta\varrho_{k,i}^2 + \zeta_k - \|u^{k,i+1} - u^{k,i}\|^2 \quad (i > 0)$   (5.1.18)

and

$\varrho_{k,1} \le \varrho_{k,0} - 2\beta\varrho_{k,0}^2 + \zeta_{k-1} - \|u^{k,1} - u^{k,0}\|^2$   (5.1.19)

are conclusions from any of the inequalities (5.1.11), (5.1.12), (5.1.15) and (5.1.16) above. Thus, estimate (5.1.18) holds for each $k$ and $0 < i \le i(k) - 1$, and estimate (5.1.19) is satisfied for each $k$.

It should be noted that the inequality $\zeta_k \ge \zeta'_k$ was used for the transition from (5.1.11), (5.1.12) to (5.1.18), (5.1.19), whereas for the transition from the estimates (5.1.15), (5.1.16) the inequality $\zeta_k \ge \zeta''_k$ was applied.

Due to assumption (iii) and $J(u^{1,0}) > J^*$, we get from (5.1.19)

$\varrho_{1,1} \le \varrho_{1,0} - 2\beta\varrho_{1,0}^2 + \frac{\beta}{4}\left(\frac{\varrho_{1,0}}{1 + 3\beta\varrho_{1,0}}\right)^2 - \|u^{1,1} - u^{1,0}\|^2$
$< \varrho_{1,0}(1 - \beta\varrho_{1,0}) - \beta\varrho_{1,0}^2 + \frac{\beta}{4}\varrho_{1,0}^2 - \|u^{1,1} - u^{1,0}\|^2$
$< \frac{\varrho_{1,0}}{1 + \beta\varrho_{1,0}} - \|u^{1,1} - u^{1,0}\|^2.$

Continuing the proof, we assume that, for fixed $k$ and an integer $\gamma > 0$, the inequalities

$\varrho_{k,0} \le \frac{\varrho_{1,0}}{1 + \beta(\gamma - 1)\varrho_{1,0}},$   (5.1.20)

$\varrho_{k,1} < \frac{\varrho_{1,0}}{1 + \beta\gamma\varrho_{1,0}} - \|u^{k,1} - u^{k,0}\|^2$   (5.1.21)

hold and, moreover, that

$\zeta_k \le \frac{\beta}{4}\left(\frac{\varrho_{1,0}}{1 + \beta(\gamma + 2)\varrho_{1,0}}\right)^2.$

(1) In case $i(k) = 1$ we have $\varrho_{k+1,0} = \varrho_{k,i(k)} = \varrho_{k,1}$. Then

$\varrho_{k+1,1} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1)\varrho_{1,0}} - \|u^{k+1,1} - u^{k+1,0}\|^2$

is true. Indeed, if $\zeta_k \ge \beta\varrho_{k+1,0}^2$, i.e.

$\frac{\beta}{4}\left(\frac{\varrho_{1,0}}{1 + \beta(\gamma + 2)\varrho_{1,0}}\right)^2 \ge \beta\varrho_{k+1,0}^2,$

then

$\varrho_{k+1,0} \le \frac{\varrho_{1,0}}{2(1 + \beta(\gamma + 2)\varrho_{1,0})}$

holds and, due to (5.1.19),

$\varrho_{k+1,1} \le \varrho_{k+1,0} + \zeta_k - \|u^{k+1,1} - u^{k+1,0}\|^2$
$\le \frac{\varrho_{1,0}}{2(1 + \beta(\gamma + 2)\varrho_{1,0})} + \frac{\beta}{4}\left(\frac{\varrho_{1,0}}{1 + \beta(\gamma + 2)\varrho_{1,0}}\right)^2 - \|u^{k+1,1} - u^{k+1,0}\|^2$
$< \frac{\varrho_{1,0}}{2(1 + \beta(\gamma + 2)\varrho_{1,0})} + \frac{1}{4(\gamma + 2)}\cdot\frac{\varrho_{1,0}}{1 + \beta(\gamma + 2)\varrho_{1,0}} - \|u^{k+1,1} - u^{k+1,0}\|^2$
$< \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1)\varrho_{1,0}} - \|u^{k+1,1} - u^{k+1,0}\|^2.$


If $\zeta_k < \beta\varrho_{k+1,0}^2$, then with regard to (5.1.19)

$\varrho_{k+1,1} < \varrho_{k+1,0} - \beta\varrho_{k+1,0}^2 - \|u^{k+1,1} - u^{k+1,0}\|^2.$

For $0 \le \beta t \le \frac{1}{2}$ the function $t - \beta t^2$ increases, and $\beta\varrho_{1,0} \le \frac{1}{4}$ holds according to the choice of $\beta$ in (5.1.3). Therefore, in view of $\varrho_{k+1,0} = \varrho_{k,1}$ and (5.1.21),

$\varrho_{k+1,1} < \frac{\varrho_{1,0}}{1 + \beta\gamma\varrho_{1,0}} - \beta\left(\frac{\varrho_{1,0}}{1 + \beta\gamma\varrho_{1,0}}\right)^2 - \|u^{k+1,1} - u^{k+1,0}\|^2$
$< \frac{\varrho_{1,0}}{1 + \beta\gamma\varrho_{1,0}}\left(1 - \frac{\beta\varrho_{1,0}}{1 + \beta(\gamma + 1)\varrho_{1,0}}\right) - \|u^{k+1,1} - u^{k+1,0}\|^2$
$= \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1)\varrho_{1,0}} - \|u^{k+1,1} - u^{k+1,0}\|^2.$

(2) If $i(k) = 2$, inequality (5.1.18) implies

$\varrho_{k+1,0} \equiv \varrho_{k,2} \le \varrho_{k,1} - 2\beta\varrho_{k,1}^2 + \zeta_k.$

Proceeding analogously to case (1), we obtain at first

$\varrho_{k+1,0} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1)\varrho_{1,0}}$

and afterwards, according to the restriction for $\zeta_k$ and the validity of

$\varrho_{k+1,1} \le \varrho_{k+1,0} - 2\beta\varrho_{k+1,0}^2 + \zeta_k - \|u^{k+1,1} - u^{k+1,0}\|^2,$

we infer

$\varrho_{k+1,1} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 2)\varrho_{1,0}} - \|u^{k+1,1} - u^{k+1,0}\|^2.$

(3) But if $i(k) > 2$, the following inequalities are true:

$\varrho_{k,1} \le \varrho_{k,0} - 2\beta\varrho_{k,0}^2 + \zeta_{k-1} - \|u^{k,1} - u^{k,0}\|^2,$
$\varrho_{k,2} \le \varrho_{k,1} - 2\beta\varrho_{k,1}^2 + \zeta_k - \|u^{k,2} - u^{k,1}\|^2,$
$\vdots$
$\varrho_{k,i(k)-1} \le \varrho_{k,i(k)-2} - 2\beta\varrho_{k,i(k)-2}^2 + \zeta_k - \|u^{k,i(k)-1} - u^{k,i(k)-2}\|^2,$   (5.1.22)
$\varrho_{k+1,0} \equiv \varrho_{k,i(k)} \le \varrho_{k,i(k)-1} - 2\beta\varrho_{k,i(k)-1}^2 + \zeta_k,$   (5.1.23)
$\varrho_{k+1,1} \le \varrho_{k+1,0} - 2\beta\varrho_{k+1,0}^2 + \zeta_k - \|u^{k+1,1} - u^{k+1,0}\|^2.$   (5.1.24)

Using the first inequality of this row together with

$\zeta_{k-1} - \|u^{k,1} - u^{k,0}\|^2 < 0,$

which follows from

$\zeta_{k-1} + 2\zeta_k \le \bar\varepsilon_k^2, \quad i(k) > 2,$

the estimate of $\varrho_{k,1}$ can be improved. Indeed, applying (5.1.20) together with the trivial inequality

$q - 2\beta q^2 < \frac{q}{1 + 2\beta q} \quad (q > 0,\ \beta > 0),$   (5.1.25)

we conclude that

$\varrho_{k,1} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1)\varrho_{1,0}}.$

Due to (5.1.4) and $\|u^{k,i} - u^{k,i-1}\| > \delta_k$ for $i < i(k)$ (cf. substep (c) in MSR-Method 4.3.1), we infer

$\varrho_{k,i(k)-2} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1 + 2(i(k) - 3))\varrho_{1,0}}$   (5.1.26)

and

$\varrho_{k,i(k)-1} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1 + 2(i(k) - 2))\varrho_{1,0}}.$

But taking into account the relations (5.1.22), (5.1.23), one gets

$\varrho_{k+1,0} \equiv \varrho_{k,i(k)} \le \varrho_{k,i(k)-2} - 2\beta\varrho_{k,i(k)-2}^2 - 2\beta\varrho_{k,i(k)-1}^2 + 2\zeta_k - \|u^{k,i(k)-1} - u^{k,i(k)-2}\|^2.$

Because of (5.1.4) we have

$2\zeta_k - \|u^{k,i(k)-1} - u^{k,i(k)-2}\|^2 < 0,$

hence

$\varrho_{k+1,0} < \varrho_{k,i(k)-2} - 2\beta\varrho_{k,i(k)-2}^2.$

As beforehand, using (5.1.25), inequality (5.1.26) leads to

$\varrho_{k+1,0} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1 + 2(i(k) - 2))\varrho_{1,0}} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 2(i(k) - 2))\varrho_{1,0}}.$

Analogously, from (5.1.4) and (5.1.22)-(5.1.24) we infer

$\varrho_{k+1,1} \le \varrho_{k,i(k)-1} - 2\beta\varrho_{k,i(k)-1}^2 - 2\beta\varrho_{k+1,0}^2 + 2\zeta_k - \|u^{k+1,1} - u^{k+1,0}\|^2$
$\le \varrho_{k,i(k)-2} - 2\beta\varrho_{k,i(k)-2}^2 - 2\beta\varrho_{k,i(k)-1}^2 - 2\beta\varrho_{k+1,0}^2 + 3\zeta_k - \|u^{k,i(k)-1} - u^{k,i(k)-2}\|^2 - \|u^{k+1,1} - u^{k+1,0}\|^2.$

Thus,

$\varrho_{k+1,1} < \varrho_{k,i(k)-2} - 2\beta\varrho_{k,i(k)-2}^2 - \|u^{k+1,1} - u^{k+1,0}\|^2$

and

$\varrho_{k+1,1} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + 1 + 2(i(k) - 2))\varrho_{1,0}} - \|u^{k+1,1} - u^{k+1,0}\|^2.$

Finally, in each of the cases (1)-(3) the estimate

$\varrho_{k+1,1} < \frac{\varrho_{1,0}}{1 + \beta(\gamma + \bar\imath(k))\varrho_{1,0}}$

is true. Hence, if for $2 \le n \le k$ the values $\zeta_n$ fulfil the condition

$\zeta_n \le \frac{\beta}{4}\left(\frac{\varrho_{1,0}}{1 + \beta\left(\sum_{j=1}^{n-1}\bar\imath(j) + 3\right)\varrho_{1,0}}\right)^2,$

then we derive

$\varrho_{k,1} < \frac{\varrho_{1,0}}{1 + \beta\left(\sum_{j=1}^{k-1}\bar\imath(j) + 1\right)\varrho_{1,0}}.$

5.1.2 Remark. The proof of Theorem 5.1.1 implies that the values $\varrho_{k,i}$ decrease monotonically for fixed $k$ and $0 \le i < i(k)$ and, obviously, $\varrho_{k,i} < \varrho_{1,0}$ holds if $(k,i) \ne (1,0)$.

In order to simplify the procedure of choosing the parameters, we note that estimate (5.1.5) remains true if the term $\zeta_k$ is replaced everywhere in Theorem 5.1.1 by $\tau'_k := c(\sigma_k + \mu_k + \varepsilon_k)$ with $\tau'_k \ge \zeta_k$. The value of the constant $c$ can easily be calculated. In particular, if $r > 1$, then it is possible to choose

$c := 8 + (r + L(r))\left(\frac{L(r)}{r} + 4\right).$
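As an illustration of this simplified rule (added here; the decay rate, the function name and the concrete values below are hypothetical, not from the text), one can generate geometrically decreasing parameters and derive $\delta_k$ by mimicking condition (5.1.4) with $\zeta_k$ replaced by $\tau'_k$:

```python
import math

def schedule(r, L, k_max, decay=0.5):
    # tau'_k = c*(sigma_k + mu_k + eps_k) with c = 8 + (r + L)*(L/r + 4), r > 1;
    # delta_k is chosen so that (delta_k - eps_k/2)^2 >= tau'_{k-1} + 2*tau'_k,
    # mimicking condition (5.1.4) with zeta_k replaced by tau'_k.
    c = 8 + (r + L) * (L / r + 4)
    tau = lambda k: c * 3 * decay**max(k, 1)    # sigma_k = mu_k = eps_k = decay^k
    params = []
    for k in range(1, k_max + 1):
        eps = decay**k
        delta = eps / 2 + math.sqrt(tau(k - 1) + 2 * tau(k))
        params.append((decay**k, decay**k, eps, delta))
    return params

# e.g. the first exterior steps for r = 2, L(r) = 1:
for sigma, mu, eps, delta in schedule(2.0, 1.0, 3):
    print(round(sigma, 4), round(delta, 4))
```

The point of the sketch is only that all quantities in the rule are explicitly computable from $\sigma_k$, $\mu_k$, $\varepsilon_k$ once $r$ and $L(r)$ are known.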

5.1.2 Asymptotic properties of MSR-methods

Now we establish an asymptotic rate of convergence for the sequence $\{\varrho_{k,1}\}$ studied in the previous subsection. To do so, we assume in addition that

$\lim_{k\to\infty} \max_{1\le i\le i(k)} \rho(u^{k,i}, U^*) = 0.$

It should be noted that the inequalities (5.1.14) and (5.1.15) remain true if, in each step $k$, the term $\frac{1}{r^2}$ is replaced by

$\frac{4}{\left(\max_{n\ge k}\max_{0\le i\le i(n)-1}\|\bar v^{n,i} - u^{n,i}\|\right)^2},$

with $\bar v^{k,i} := \arg\min_{v\in Q_k}\|v^{k,i} - v\|$ as beforehand. Provided with these replacements, both inequalities will be denoted from here on as the modified inequalities (5.1.14)' and (5.1.15)'. The same replacement will be used in the definition of $\zeta''_k$, and in this case we re-denote $\zeta_k := \max[\zeta'_k, \zeta''_k]$ by $\zeta(k)$.

5.1.3 Theorem. Suppose the assumptions of Lemma 4.3.5 and assumption (i) of Theorem 5.1.1 are fulfilled, the parameters $\sigma_k$, $\mu_k$ and $\varepsilon_k$ decrease sufficiently fast, and $\delta_k$ is chosen such that

$\left(\delta_k - \frac{\varepsilon_k}{2}\right)^2 \ge \zeta(k-1) + 2\zeta(k).$

Moreover, let

$\lim_{k\to\infty} \max_{1\le i\le i(k)} \rho(u^{k,i}, U^*) = 0.$

Then it holds

$\lim_{k\to\infty} \varrho_{k,1}\sum_{j=1}^{k-1}\bar\imath(j) = 0.$

Proof: Indeed, for $1 \le i \le i(k)$ we make use of

$\varrho_{k,i} = J(u^{k,i}) - J^* + L(r)\left(\mu_k + \frac{\varepsilon_k}{2}\right)$

instead of

$\varrho_{k,i} \le L(r)\left(2r + \mu_1 + \frac{\varepsilon_1}{2}\right),$

and of the inequality

$\frac{\left(J_k(u^{k,i}) - J_k(\bar v^{k,i})\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2} > \frac{\varrho_{k,i}^2 - 2\left(2\sigma_k + L(r)\mu_k + \xi^k\right)\left(2L(r)r + \xi^k\right)}{4\|\bar v^{k,i} - u^{k,i}\|^2}$

instead of inequality (5.1.14). Proceeding analogously as in the proof of Theorem 5.1.1 (cf. (5.1.17)), the following conclusion is easy to verify: if $0 < \beta_k \le q(k)$, with

$q(k) := \frac{1}{4}\min\left[\frac{1}{\max_{n\ge k}\max_{1\le i\le i(n)}\left(J(u^{n,i}) - J^* + L(r)\left(\mu_n + \frac{\varepsilon_n}{2}\right)\right)}, \ \frac{1}{2\left(\max_{n\ge k}\max_{0\le i\le i(n)-1}\|\bar v^{n,i} - u^{n,i}\|\right)^2}\right],$

then

$\frac{1}{2}\varrho_{k,i} - 2\beta_k\varrho_{k,i}^2 \ge 0, \quad 1 \le i \le i(k).$

Hence, for the corresponding steps $k$ and $i$ the inequalities

$\varrho_{k,i+1} \le \varrho_{k,i} - 2\beta_k\varrho_{k,i}^2 + \zeta(k) - \|u^{k,i+1} - u^{k,i}\|^2 \quad (i > 0),$
$\varrho_{k,1} \le \varrho_{k,0} - 2\beta_{k-1}\varrho_{k,0}^2 + \zeta(k-1) - \|u^{k,1} - u^{k,0}\|^2$

are conclusions from each of the inequalities (5.1.11), (5.1.12), (5.1.16) and of the modified inequality (5.1.15)'. As beforehand, we assume that $\sigma_0 = \sigma_1$, $\mu_0 = \mu_1$, $\varepsilon_0 = \varepsilon_1$ and that $\beta$ satisfies (5.1.3). Let $\beta_0 = \beta_1 = \beta$ and

$\zeta(k) \le \frac{\beta_{k-1}\varrho_{1,0}^2}{4\left(1 + \left(\beta_0 + \sum_{j=1}^{k-1}\beta_j\bar\imath(j) + 2\beta_{k-1}\right)\varrho_{1,0}\right)^2},$   (5.1.27)

with $0 < \beta_j \le q(j)$ for $j > 1$. Then, according to the proof of estimate (5.1.5), we obtain

$\varrho_{k,1} < \frac{\varrho_{1,0}}{1 + \left(\beta_0 + \sum_{j=1}^{k-1}\beta_j\bar\imath(j)\right)\varrho_{1,0}}.$

In particular, for $\beta_k \ge \frac{1}{2}q(k)$,

$\lim_{k\to\infty} \frac{\sum_{j=1}^{k-1}\bar\imath(j)}{1 + \left(\beta_0 + \sum_{j=1}^{k-1}\beta_j\bar\imath(j)\right)\varrho_{1,0}} = 0,$

which shows an asymptotic rate of the estimate. In order to verify this, it suffices to prove that

$\lim_{k\to\infty} \frac{\sum_{j=1}^k \beta_j\bar\imath(j)}{\sum_{j=1}^k \bar\imath(j)} = \infty$

or, because of $i(k) \le \bar\imath(k) < 2i(k)$,

$\lim_{k\to\infty} \frac{\sum_{j=1}^k \beta_j i(j)}{\sum_{j=1}^k i(j)} = \infty.$

Let $j(k)$ be the maximal index for which

$\sum_{j=1}^{j(k)} i(j) \le \sum_{j=j(k)+1}^k i(j).$

It is obvious that $j(k)$ may be equal to zero at the beginning, but there exists a number $k_0$ such that $j(k) > 0$ for all $k > k_0$; moreover, $j(k) \to \infty$ as $k \to \infty$. Furthermore, due to the definition of $q(k)$, it follows that

$q(k+1) \ge q(k) \ \forall k, \qquad \lim_{k\to\infty} q(k) = \infty,$

hence

$\lim_{k\to\infty} q(j(k)) = \infty.$

Now, in view of

$\beta_k \ge \frac{1}{2}q(k) \ge \frac{1}{2}q(j(k) + 1)$

and the definition of $j(k)$, we obtain for $k > k_0$

$\frac{\sum_{j=1}^k \beta_j i(j)}{\sum_{j=1}^k i(j)} \ge \frac{\sum_{j=1}^{j(k)}\beta_j i(j) + \sum_{j=j(k)+1}^k \beta_j i(j)}{2\sum_{j=j(k)+1}^k i(j)} > \frac{1}{4}q(j(k) + 1),$

and the fact that $\lim_{k\to\infty} q(j(k)) = \infty$ completes the proof.

We emphasize that the assumptions made here on the control parameters do not contradict the assumptions of Theorem 5.1.1.

In the case of ill-posed problems, however, a suitable estimate of $q(k)$ from below cannot be expected and, consequently, we do not have appropriate advice for choosing the parameters $\sigma_k$, $\mu_k$ and $\varepsilon_k$ in correspondence with condition (5.1.27).

In view of $i(k) \le \bar\imath(k) < 2i(k)$, the asymptotic estimate indicates that

$\varrho_{k,1} = o\left(\frac{1}{s_k}\right), \quad s_k := \sum_{j=1}^{k-1}\bar\imath(j) + 1,$

with $s_k$ the number of prox-iterations needed for the transition from the iterate $u^{1,0}$ to $u^{k,1}$. Due to (5.1.5), the estimate

$\varrho_{k,1} < \frac{\varrho_{1,0}}{1 + \beta\left(\sum_{j=1}^{k-1}\bar\imath(j) + 1\right)\varrho_{1,0}} < \frac{1}{\beta s_k}$

is also true.

Now we are going to establish that, on the class of problems considered, both estimates are sharp in the sense that it is impossible to replace $s_k$ by $s_k^{1+\tau}$ with $\tau > 0$ in the inequality above. This remains impossible even if $\tau > 0$ is chosen arbitrarily small and $\beta$ is replaced by an arbitrarily large constant, while the controlling parameters of the method are decreased as fast as we want.

In order to show this, we consider MSR-Method 4.3.1 with $\sigma_k := 0$, $\varepsilon_k := 0$ and $K_k := K$ for all $k$. In that case, obviously, the method turns into the exact proximal point algorithm

$u^k = \mathrm{Prox}_{J,K}\, u^{k-1},$   (5.1.28)

regardless of how the sequence $\{\delta_k\}$ is chosen. For any $\tau > 0$ we construct a problem for which the inequality

$J(u^k) - J^* \le \frac{c}{k^{1+\tau}}$

cannot be fulfilled for each $k$, even if $c$ is chosen arbitrarily large. Indeed, fixing $\tau > 0$, we take an even number $m$ such that $m > 4\frac{1+\tau}{\tau}$ and consider the problem

$\min_{u\in\mathbb{R}^1} J(u), \quad J(u) := 2u^m.$

Starting with $u^0 > 0$, iteration (5.1.28) leads to

$m(u^k)^{m-1} + u^k - u^{k-1} = 0,$

i.e.,

$u^k = \frac{u^{k-1}}{m(u^k)^{m-2} + 1}, \quad \forall k.$

Iterating the latter recursion for $k = 1, \ldots, n$, we get

$u^n = \frac{u^0}{\left(m(u^1)^{m-2} + 1\right)\cdots\left(m(u^n)^{m-2} + 1\right)},$

and because $u^n \to 0$, we have

$\lim_{n\to\infty} \prod_{k=1}^n\left(m(u^k)^{m-2} + 1\right) = \infty,$

hence

$\lim_{n\to\infty} \sum_{k=1}^n (u^k)^{m-2} = \infty.$

Suppose that the inequality

$J(u^k) - J^* = J(u^k) \le \frac{c}{k^{1+\tau}}$

is true for some $c > 0$ and all $k$. Then, obviously,

$(u^k)^m \le \frac{c}{2k^{1+\tau}}, \qquad (u^k)^{m-2} \le \frac{\left(\frac{c}{2}\right)^{\frac{m-2}{m}}}{k^{(1+\tau)\frac{m-2}{m}}}.$

With regard to the choice of $m$ we have

$(1+\tau)\frac{m-2}{m} > 1 + \frac{\tau}{2},$

thus

$(u^k)^{m-2} \le \frac{\left(\frac{c}{2}\right)^{\frac{m-2}{m}}}{k^{1+\frac{\tau}{2}}}, \quad \forall k.$

But $\sum_{k=1}^\infty k^{-(1+\frac{\tau}{2})}$ converges, and this contradicts the fact that

$\lim_{n\to\infty}\sum_{k=1}^n (u^k)^{m-2} = \infty.$
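This slow decay can also be observed numerically. The sketch below (an illustration added here, not part of the original text) runs the exact prox iteration $m(u^k)^{m-1} + u^k - u^{k-1} = 0$ for $J(u) = 2u^m$, solving each step by bisection, and checks that $k^{1+\tau}J(u^k)$ keeps growing, so that no bound $c/k^{1+\tau}$ can hold:

```python
def prox_step(u_prev, m, tol=1e-14):
    # Solve m*u^(m-1) + u - u_prev = 0 for u in (0, u_prev) by bisection;
    # the left-hand side is increasing in u, negative at 0, positive at u_prev.
    lo, hi = 0.0, u_prev
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m * mid**(m - 1) + mid - u_prev > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

tau = 0.25
m = 24                      # even, and m > 4*(1+tau)/tau = 20
u, vals = 1.0, []
for k in range(1, 2001):
    u = prox_step(u, m)
    vals.append(2 * u**m)   # J(u^k), here with J* = 0

# J(u^k) decreases monotonically, but k^(1+tau) * J(u^k) grows:
print(vals[99] * 100**(1 + tau) < vals[1999] * 2000**(1 + tau))
```

Heuristically, the continuum limit of the recursion gives $u^k \sim (m(m-2)k)^{-1/(m-2)}$, hence $J(u^k) \sim k^{-m/(m-2)}$, which decays slower than $k^{-(1+\tau)}$ precisely because $m > 4\frac{1+\tau}{\tau}$.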

5.1.3 Linear convergence of MSR-methods

In order to continue the analysis of convergence rates with respect to the values of the objective functional, we make use of the following assumption.

5.1.4 Assumption. There exists a constant $\theta > 0$ such that, for each $c \in \left(0, \mu_1 + \frac{\varepsilon_1}{2}\right]$,

$\inf_{u\in\Omega_c} \frac{J(u) - J^* + L(r)c}{\rho^2(u, U^*\cap \bar B_{r^*})} \ge \theta,$   (5.1.29)

with

$\Omega_c := \{u \in \bar B_r : \rho(u, Q) \le c, \ J(u) > J^*\}.$

This assumption can be understood as a certain growth condition for the objective functional $J$ around $U^*\cap \bar B_{r^*}$. We choose a constant $\zeta_k$ such that

$\zeta_k \ge 6\sigma_k + (L(r)+r)\left(2\mu_k + \frac{\varepsilon_k}{2}\right) + \left(2\mu_k + \frac{\varepsilon_k}{2}\right)^2 + \xi_1^k + \max\left[L(r)\mu_k + \xi_2^k, \ \max\left[L(r), \frac{\varrho_{1,0}}{2}\right]\left(\left(2\sigma_k + 2L(r)\mu_k + L(r)\frac{\varepsilon_k}{2}\right)^{1/3} + \mu_k\right)\right],$   (5.1.30)

with $\xi_1^k$ and $\xi_2^k$ defined as at the beginning of Subsection 5.1.1.


5.1.5 Theorem. Suppose the following conditions are fulfilled:

(i) Assumption 5.1.4;

(ii) the assumptions of Lemma 4.3.5 and assumption (i) in Theorem 5.1.1.

Moreover, let $\zeta_k$ and $\delta_k$ be chosen such that

$\zeta_{k-1} + 2\zeta_k \le \left(\delta_k - \frac{\varepsilon_k}{2}\right)^2 =: \bar\varepsilon_k^2,$   (5.1.31)

$\zeta_k \le \frac{\left(1 - \frac{\theta'}{16}\right)^2}{1 + \frac{\theta'}{16} - \left(\frac{\theta'}{16}\right)^2}\,\zeta_{k-1}, \qquad \zeta_k < \frac{\theta'}{32}\varrho_{1,0}q^{s_k+2},$   (5.1.32)

with

$\theta' := \min[\theta, 8], \quad s_k := \sum_{j=1}^{k-1}\bar\imath(j), \quad q \in \left[1 - \frac{\theta'}{32}, 1\right).$

Then it holds

$\varrho_{k,i} \le \varrho_{1,0}q^{s_k+i}, \quad k = 1, 2, \cdots, \ 0 \le i \le i(k).$   (5.1.33)

Proof: Let us return to case (c) in the proof of Theorem 5.1.1 and assume that

$J(u^{k,i}) - J(v^{k,i}) > 2\sigma_k + L(r)\mu_k.$

If

$\|u^{k,i} - v^{k,i}\| \le \left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3} + \mu_k,$

then we obtain immediately from (5.1.13), (4.1.3) and (4.2.5)

$\varrho_{k,i+1} \le L(r)\left(\left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3} + \mu_k\right) + \xi_2^k + \xi_0^k + 2\sigma_k - \|u^{k,i+1} - u^{k,i}\|^2.$   (5.1.34)

Now, let

$\|u^{k,i} - v^{k,i}\| > \left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3} + \mu_k.$

Then, obviously, $\|u^{k,i} - v^{k,i}\| > \|\bar v^{k,i} - v^{k,i}\|$; consequently,

$\|\bar v^{k,i} - u^{k,i}\|^2 \le 2\left(\|\bar v^{k,i} - v^{k,i}\|^2 + \|v^{k,i} - u^{k,i}\|^2\right) < 4\|v^{k,i} - u^{k,i}\|^2$

is true, and in view of (5.1.29) and $\rho(u^{k,i}, Q_k) \le \frac{\xi^k}{L(r)}$ it follows that

$\frac{J(u^{k,i}) - J(v^{k,i}) + \xi^k}{4\|\bar v^{k,i} - u^{k,i}\|^2} > \frac{1}{16}\cdot\frac{J(u^{k,i}) - J(v^{k,i}) + \xi^k}{\|v^{k,i} - u^{k,i}\|^2} \ge \frac{\theta'}{16}.$   (5.1.35)

Furthermore,

$\|u^{k,i} - \bar v^{k,i}\| \ge \|u^{k,i} - v^{k,i}\| - \|\bar v^{k,i} - v^{k,i}\|,$

and because $\|\bar v^{k,i} - v^{k,i}\| \le \mu_k$, we obtain in this case

$\|u^{k,i} - \bar v^{k,i}\| > \left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3}.$   (5.1.36)


Inequality (5.1.13) leads to

$J_k(u^{k,i+1}) \le J_k(u^{k,i}) + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2 - \frac{\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k + \left(J_k(u^{k,i}) - J_k(\bar v^{k,i}) - J(u^{k,i}) + J(v^{k,i}) - \xi^k\right)\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2}$
$\le J_k(u^{k,i}) + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2 - \frac{\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right)^2}{4\|\bar v^{k,i} - u^{k,i}\|^2} + \frac{J(u^{k,i}) - J(v^{k,i}) + \xi^k}{2\|\bar v^{k,i} - u^{k,i}\|^2}\left|J_k(u^{k,i}) - J(u^{k,i}) - \left(J_k(\bar v^{k,i}) - J(\bar v^{k,i})\right) - \left(J(\bar v^{k,i}) - J(v^{k,i})\right) - \xi^k\right|.$

Using (5.1.35) and (5.1.36), we continue with

$J_k(u^{k,i+1}) < J_k(u^{k,i}) + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2 - \frac{\theta'}{16}\varrho_{k,i} + \frac{1}{2}\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right)\frac{2\sigma_k + L(r)\mu_k + \xi^k}{\|\bar v^{k,i} - u^{k,i}\|^2}$
$< J_k(u^{k,i}) + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2 - \frac{\theta'}{16}\varrho_{k,i} + \frac{1}{2}\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right)\left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3}.$

Therefore, regarding Remark 5.1.2,

$\varrho_{k,i+1} < \left(1 - \frac{\theta'}{16}\right)\varrho_{k,i} + 2\sigma_k + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2 + \frac{1}{2}\left(J(u^{k,i}) - J(v^{k,i}) + \xi^k\right)\left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3}$
$\le \left(1 - \frac{\theta'}{16}\right)\varrho_{k,i} + 2\sigma_k + \xi_0^k - \|u^{k,i+1} - u^{k,i}\|^2 + \frac{1}{2}\varrho_{1,0}\left(2\sigma_k + L(r)\mu_k + \xi^k\right)^{1/3}.$   (5.1.37)

From the definition of $\zeta_k$ and $\theta' \le 8$ we conclude that for the corresponding $k$ and $i$ the estimates

$\varrho_{k,i+1} \le \varrho_{k,i}\left(1 - \frac{\theta'}{16}\right) + \zeta_k - \|u^{k,i+1} - u^{k,i}\|^2,$   (5.1.38)

$\varrho_{k,1} \le \varrho_{k,0}\left(1 - \frac{\theta'}{16}\right) + \zeta_{k-1} - \|u^{k,1} - u^{k,0}\|^2,$   (5.1.39)

result from any of the inequalities (5.1.11), (5.1.12), (5.1.16), (5.1.34) and (5.1.37).

Assume now that $k$ and $i$ are fixed, $i < i(k)$, and that inequality (5.1.33) is fulfilled with some $q \in \left(1 - \frac{\theta'}{32}, 1\right)$ for $j < k$, $0 \le i' \le i(j)$, and for $j = k$, $i' \le i$.

(1) If $i + 1 < i(k)$, then due to

$\varrho_{k,i+1} \le \varrho_{k,i}\left(1 - \frac{\theta'}{16}\right) + \zeta_{k-1} - \|u^{k,i+1} - u^{k,i}\|^2,$

which follows from (5.1.38) and $\zeta_{k-1} \ge \zeta_k$, and on account of

$\bar\varepsilon_k^2 < \|u^{k,i+1} - u^{k,i}\|^2, \quad \bar\varepsilon_k^2 \ge \zeta_{k-1} + 2\zeta_k,$

it holds

$\varrho_{k,i+1} < \varrho_{k,i}\left(1 - \frac{\theta'}{16}\right).$

Consequently,

$\varrho_{k,i+1} < \varrho_{1,0}q^{s_k+i+1}.$

(2) Let $i > 0$, $i + 1 = i(k)$. In this case, from

$\varrho_{k,i+1} \le \varrho_{k,i}\left(1 - \frac{\theta'}{16}\right) + \zeta_k - \|u^{k,i+1} - u^{k,i}\|^2,$
$\varrho_{k,i} \le \varrho_{k,i-1}\left(1 - \frac{\theta'}{16}\right) + \zeta_{k-1} - \|u^{k,i} - u^{k,i-1}\|^2,$

and $\|u^{k,i} - u^{k,i-1}\|^2 > \bar\varepsilon_k^2$, we conclude that

$\varrho_{k,i+1} < \varrho_{k,i-1}\left(1 - \frac{\theta'}{16}\right)^2 + \zeta_{k-1}\left(1 - \frac{\theta'}{16}\right) + \zeta_k - \left(1 - \frac{\theta'}{16}\right)\bar\varepsilon_k^2.$

With regard to $\bar\varepsilon_k^2 \ge \zeta_{k-1} + 2\zeta_k$ and $\theta' \le 8$,

$\varrho_{k,i+1} < \varrho_{k,i-1}\left(1 - \frac{\theta'}{16}\right)^2;$

thus, (5.1.33) is true.

(3) If $i = 0$, $i(k) = 1$ and $i(k-1) = 1$ or $i(k-1) = 2$, then, due to $s_k \le s_{k-1} + 2$, (5.1.39) and the second inequality in (5.1.32), we get

$\varrho_{k,1} \le \varrho_{1,0}q^{s_k}\left(1 - \frac{\theta'}{16}\right) + \frac{\theta'}{32}\varrho_{1,0}q^{s_{k-1}+2} \le \varrho_{1,0}q^{s_k}\left(1 - \frac{\theta'}{32}\right) \le \varrho_{1,0}q^{s_k+1}.$

(4) Finally, let $i = 0$, $i(k) = 1$ and $i(k-1) > 2$. Multiplying the two inequalities

$\varrho_{k-1,i(k-1)-1} \le \varrho_{k-1,i(k-1)-2}\left(1 - \frac{\theta'}{16}\right) + \zeta_{k-1} - \|u^{k-1,i(k-1)-1} - u^{k-1,i(k-1)-2}\|^2,$
$\varrho_{k-1,i(k-1)} \le \varrho_{k-1,i(k-1)-1}\left(1 - \frac{\theta'}{16}\right) + \zeta_{k-1} - \|u^{k-1,i(k-1)} - u^{k-1,i(k-1)-1}\|^2,$

by $\left(1 - \frac{\theta'}{16}\right)^2$ and $\left(1 - \frac{\theta'}{16}\right)$, respectively, and summing up the results together with

$\varrho_{k,1} \le \varrho_{k-1,i(k-1)}\left(1 - \frac{\theta'}{16}\right) + \zeta_{k-1} - \|u^{k,1} - u^{k-1,i(k-1)}\|^2,$

then, due to

$\|u^{k-1,i(k-1)-1} - u^{k-1,i(k-1)-2}\|^2 > \bar\varepsilon_{k-1}^2,$

we obtain

$\varrho_{k,1} < \varrho_{k-1,i(k-1)-2}\left(1 - \frac{\theta'}{16}\right)^3 + \zeta_{k-1}\left(1 + \left(1 - \frac{\theta'}{16}\right) + \left(1 - \frac{\theta'}{16}\right)^2\right) - \bar\varepsilon_{k-1}^2\left(1 - \frac{\theta'}{16}\right)^2.$

Now, because of (5.1.32) and $\bar\varepsilon_{k-1}^2 \ge \zeta_{k-2} + 2\zeta_{k-1}$, the estimate

$\varrho_{k,1} < \varrho_{k-1,i(k-1)-2}\left(1 - \frac{\theta'}{16}\right)^3$

holds and, consequently, estimate (5.1.33) is also true.
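The contraction mechanism behind (5.1.38) and (5.1.33) can be checked numerically. The sketch below (added for illustration; all concrete numbers are arbitrary) verifies over a grid that one step of the recursion $\varrho \mapsto (1 - \theta'/16)\varrho + \zeta$ with $\zeta \le \frac{\theta'}{32}\varrho_{1,0}q^{s+2}$ maps $\varrho \le \varrho_{1,0}q^{s}$ into $\varrho_{1,0}q^{s+1}$ whenever $q \ge 1 - \theta'/32$:

```python
# One induction step of the linear-rate argument:
# if rho <= rho10 * q**s and zeta <= (theta/32) * rho10 * q**(s+2),
# then (1 - theta/16)*rho + zeta <= rho10 * q**(s+1) for q in [1 - theta/32, 1).
ok = True
rho10 = 1.0
for theta in [0.5, 1.0, 2.0, 4.0, 8.0]:        # theta' = min(theta, 8)
    for q in [1 - theta / 32, 1 - theta / 64, 0.999]:
        for s in range(0, 50, 7):
            rho = rho10 * q**s                 # worst admissible value
            zeta = (theta / 32) * rho10 * q**(s + 2)
            ok &= (1 - theta / 16) * rho + zeta <= rho10 * q**(s + 1) + 1e-15
print(ok)   # prints True
```

Algebraically, the check reduces to $(1 - \theta'/16) + \frac{\theta'}{32}q^2 \le q$, which holds exactly on $q \in [1 - \theta'/32, 1)$.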

Note that, in deducing the estimates (5.1.5) and (5.1.33), we do not require the validity of Assumption 4.2.3; i.e., these estimates are true even if strong convergence of $\{u^{k,i}\}$ is not guaranteed.

Now we estimate $\|u^{k,i} - u^*\|$, with $u^*$ the strong limit of $\{u^{k,i}\}$. To do so, we need the following supposition.

5.1.6 Assumption. Assumption 5.1.4 is fulfilled with

$\Omega'_c := \{u \in \bar B_r \setminus U^* : \rho(u, Q) \le c\}$

instead of $\Omega_c$. ♦

Let $\hat\imath(k) := \frac{16r^2}{\delta_k^2} + 1$ and

$\gamma_k := \sum_{j=k}^\infty\left(\sqrt{\tau_j} + \frac{\varepsilon_j}{2} + 2\mu_j\right),$   (5.1.40)

with $\tau_j := 2L(r)\mu_j + 2\sigma_j$. Furthermore, let us keep the former definitions of $v^{k,i}$, $s_k$, $\theta'$ and $q$.

5.1.7 Theorem. Suppose Assumptions 4.2.3 and 5.1.6 and the hypotheses of Theorem 5.1.5, except for (i), are satisfied and

$\frac{1}{4r}\left(\tau_k - \left(\delta_k - \frac{\varepsilon_k}{2}\right)^2\right) + \frac{\varepsilon_k}{2} + \sqrt{\tau_k} + 2\mu_k < -\frac{\delta_k^2}{8r}, \quad \forall k \in \mathbb{N};$   (5.1.41)

$\gamma_k \le \sqrt{\frac{\varrho_{1,0}}{\theta'}}\sqrt{q^{s_k+\hat\imath(k)-1}}, \quad \forall k \in \mathbb{N}.$   (5.1.42)

Then it holds

$\|u^{k,i} - u^*\| \le 3\sqrt{\frac{\varrho_{1,0}}{\theta'}}\sqrt{q^{s_k+i}}, \quad \forall k \in \mathbb{N}, \ 0 \le i \le i(k) - 1,$   (5.1.43)

with $u^* = \lim_{k\to\infty} u^{k,1}$.


Proof: Convergence of $\{u^{k,i}\}$ to $u^* \in U^*$ holds in view of Theorem 4.3.8. For $0 < i \le i(k)$ and $c := \mu_1 + \frac{\varepsilon_1}{2}$, due to $u^{k,i} \in \Omega'_c$ and Assumption 5.1.6, we have

$\|u^{k,i} - v^{k,i}\| \le \sqrt{\frac{\varrho_{k,i}}{\theta'}}$   (5.1.44)

and, because of $u^{k,0} = u^{k-1,i(k-1)}$, the inequality

$\|u^{k,0} - v^{k,0}\| \le \sqrt{\frac{\varrho_{k,0}}{\theta'}}$   (5.1.45)

is also true.

is also true.Let (k0, i0) be fixed with 0 ≤ i0 < i(k0) and k′ > k0 + 1. Now we estimate‖uk′,0 − vk0,i0‖.The inequalities (4.3.35) and (4.3.36), established in the proof of Theorem 5.1.5,are obviously true under the hypothesizes of Lemma 4.3.5 and, regarding thedefinition of vk in (4.3.6), we get

‖uk+1,0 − u∗∗‖ ≤ ‖uk,0 − u∗∗‖+√τk +

εk2

+ 2µk,

with u∗∗ an arbitrary element of U∗ ∩ Br∗ .For k := j, u∗∗ := vk0,i0 the latter inequality reads as

‖uj+1,0 − vk0,i0‖ ≤ ‖uj,0 − vk0,i0‖+√τj +

εj2

+ 2µj ,

and an iteration of this inequality for j = k0 + 1, ..., k′ − 1 implies

‖uk′,0 − vk0,i0‖ ≤ ‖uk0+1,0 − vk0,i0‖+

k′−1∑j=k0+1

(√τj +

εj2

+ 2µj

). (5.1.46)

Due to (4.3.36) and the inequalities (4.3.11), which are also true in that case, we obtain for $0 \le i_0 < i(k_0) - 1$

$\|u^{k_0+1,0} - v^{k_0}\| \le \|u^{k_0,i_0} - v^{k_0}\| + \frac{1}{4r}\left(-\bar\varepsilon_{k_0}^2 + \tau_{k_0}\right) + \sqrt{\tau_{k_0}} + \frac{\varepsilon_{k_0}}{2},$

with $v^{k_0}$ chosen according to (4.3.6), and

$\|u^{k_0+1,0} - v^{k_0}\| \le \|u^{k_0,i(k_0)-1} - v^{k_0}\| + \sqrt{\tau_{k_0}} + \frac{\varepsilon_{k_0}}{2}.$

Setting $u^{**} = v^{k_0,i_0}$ in (4.3.34), these inequalities lead to

$\|u^{k_0+1,0} - v^{k_0,i_0}\| < \|u^{k_0,i_0} - v^{k_0,i_0}\| + \frac{1}{4r}\left(-\bar\varepsilon_{k_0}^2 + \tau_{k_0}\right) + \sqrt{\tau_{k_0}} + \frac{\varepsilon_{k_0}}{2} + 2\mu_{k_0}$

if $0 \le i_0 < i(k_0) - 1$, and

$\|u^{k_0+1,0} - v^{k_0,i(k_0)-1}\| < \|u^{k_0,i(k_0)-1} - v^{k_0,i(k_0)-1}\| + \sqrt{\tau_{k_0}} + \frac{\varepsilon_{k_0}}{2} + 2\mu_{k_0}.$

Now, (5.1.41), (5.1.46) and the latter inequalities give

$\|u^{k',0} - v^{k_0,i_0}\| \le \|u^{k_0,i_0} - v^{k_0,i_0}\| + \sum_{j=k_0}^{k'-1}\left(\sqrt{\tau_j} + \frac{\varepsilon_j}{2} + 2\mu_j\right), \quad 0 \le i_0 < i(k_0).$   (5.1.47)


Passing to the limit $k' \to \infty$ in (5.1.47), we conclude that

$\|u^* - v^{k_0,i_0}\| \le \|u^{k_0,i_0} - v^{k_0,i_0}\| + \sum_{j=k_0}^\infty\left(\sqrt{\tau_j} + \frac{\varepsilon_j}{2} + 2\mu_j\right).$

Hence,

$\|u^{k_0,i_0} - u^*\| \le \|u^{k_0,i_0} - v^{k_0,i_0}\| + \|v^{k_0,i_0} - u^*\| \le 2\|u^{k_0,i_0} - v^{k_0,i_0}\| + \gamma_{k_0},$

and, due to (5.1.44) and (5.1.45),

$\|u^{k_0,i_0} - u^*\| \le 2\sqrt{\frac{\varrho_{k_0,i_0}}{\theta'}} + \gamma_{k_0}.$   (5.1.48)

Taking into account the last inequality in the proof of Lemma 4.3.3, it follows immediately that

$i(k_0) < 2r\left(\frac{\left(\delta_{k_0} - \frac{\varepsilon_{k_0}}{2}\right)^2 - \tau_{k_0}}{4r} - \frac{\varepsilon_{k_0}}{2}\right)^{-1} + 1,$

and because of (5.1.41) we get $i(k_0) < \hat\imath(k_0)$ and

$\gamma_{k_0} \le \sqrt{\frac{\varrho_{1,0}}{\theta'}}\sqrt{q^{s_{k_0}+i(k_0)-1}}.$   (5.1.49)

Now, (5.1.48) together with (5.1.33) and (5.1.49) implies

$\|u^{k_0,i_0} - u^*\| \le 3\sqrt{\frac{\varrho_{1,0}}{\theta'}}\sqrt{q^{s_{k_0}+i_0}}.$

5.1.8 Remark. The conditions controlling the MSR-methods described in Theorem 4.3.6 and Theorem 4.3.8 enable us to choose the parameters a priori. However, in Theorems 5.1.1 and 5.1.5 on the rate of convergence, the sequences of controlling parameters can be generated within the methods. Regarding Remark 5.1.2, this procedure is not complicated. Indeed, at the beginning of the $k$-th exterior step, the values $\sigma_k$, $\varepsilon_k$, $\mu_k$ and $\delta_k$ are known, and they can be kept constant for all interior steps $i$. After termination of the interior loop the value of $i(k)$ is known, which allows us to calculate the necessary bounds for $\zeta_{k+1}$ (cf. the last inequalities in hypothesis (iii) of Theorem 5.1.1 or in (5.1.32); all other conditions can be satisfied a priori). Furthermore, in order to determine $\sigma_{k+1}$, $\varepsilon_{k+1}$ and $\mu_{k+1}$, we only have to observe upper bounds, including (4.3.14), and afterwards one can calculate $\delta_{k+1}$ in view of (4.3.13), (5.1.4) or (4.3.13), (5.1.31).

Questions on how to control the method efficiently will be considered in Section ??, too.

The updating rule for the controlling parameters ensuring linear convergence of $\{u^{k,i}\}$ to $u^*$ (cf. Theorem 5.1.7) is rather complicated. In fact, together with the choice of $\sigma_{k_0}$, $\varepsilon_{k_0}$ and $\mu_{k_0}$, we have to satisfy the inequalities (5.1.42) for all $k \le k_0 - 1$. From the practical point of view we cannot exclude that the discretization parameter has to be decreased inadmissibly fast. However, it should be taken into account that for ill-posed problems a qualified convergence of the methods is usually an important advantage, even if the corresponding assumptions on the controlling parameters and on the accuracy of the data approximation are idealized. Condition (5.1.42), which may seem strange at first glance, can be explained as follows: in general, the original problem has a non-unique solution, and to which element $u^*$ the method converges depends on the chosen sequences $\{\mu_k\}$, $\{\sigma_k\}$, $\{\varepsilon_k\}$ and $\{\delta_k\}$. ♦

Finally, we note that the rate of convergence of MSR-Method 4.3.12 can be investigated in the same way.

5.2 Rate of Convergence for OSR-Methods

Now we are going to investigate the rate of convergence of OSR-Method 4.2.1. This will be done by applying the techniques used in Section 5.1 for MSR-methods.

As in Subsection 5.1.1, we take $\sigma_0 = \sigma_1$, $\varepsilon_0 = \varepsilon_1$, $\mu_0 = \mu_1$ and use the former definitions of $\xi_1^k$, $\xi_2^k$, $\zeta'_k$, $\zeta''_k$ and $\zeta_k$ given there. Define

$\varrho_k := J(u^k) - J^* + \xi_2^k, \quad k = 1, 2, \cdots.$

5.2.1 Theorem. Suppose the hypotheses of Theorem 4.2.4 are fulfilled, $u^0 \in K$ is chosen and, moreover,

(i) $\sigma_{k+1} \le \sigma_k$, $\varepsilon_{k+1} \le \varepsilon_k$, $\mu_{k+1} \le \mu_k$, $\rho(Q_k, Q_{k+1}) \le 2\mu_k$, $\forall k$; $\varepsilon_1 < 1$, $\mu_1 < 1$ and $J(u^0) > J^*$;

(ii) the constant $\beta$ is fixed such that

$0 < \beta < \frac{1}{4}\min\left[\frac{1}{L(r)\left(2r + \mu_1 + \frac{\varepsilon_1}{2}\right)}, \ \frac{1}{8r^2}\right];$

(iii) the controlling parameters are chosen according to

$\zeta_k \le \frac{\beta}{4}\left(\frac{\varrho_0}{1 + (k+1)\beta\varrho_0}\right)^2, \quad k = 1, 2, \cdots.$   (5.2.1)

Then, for OSR-Method 4.2.1 it holds

$\varrho_k < \frac{\varrho_0}{1 + \beta k\varrho_0}.$   (5.2.2)

Proof: Obviously, $\varrho_k \ge 0$ holds for all $k$. Following completely the proof of relation (5.1.19) in Theorem 5.1.1, we establish

$\varrho_k \le \varrho_{k-1} - 2\beta\varrho_{k-1}^2 + \zeta_{k-1}, \quad k = 1, 2, \cdots.$   (5.2.3)

With regard to $\zeta_0 = \zeta_1$, (5.2.1) and (5.2.3), this leads to

$\varrho_1 < \frac{\varrho_0}{1 + \beta\varrho_0}.$   (5.2.4)


Due to assumptions (i) and (ii), %k ≤ 14β is true for all k. Because the function

t−2βt2 increases on the interval [0, 14β ], the inequalities (5.2.3) and (5.2.4) imply

immediately

%2 ≤%0

1 + β%0− 2β

(%0

1 + β%0)

)2

4

(%0

1 + 2β%0

)2

and

%2 <%0

1 + 2β%0.

In the sequel we follow part (1) of the proof of Theorem 5.1.1. Suppose that,for fixed k, inequality

%k <%0

1 + βk%0

holds true. If ζk ≥ β%2k, then according to (5.2.1),

%k ≤1

2· %0

1 + β(k + 1)%0

and using (5.2.3), we get

%k+1 ≤ %k + ζk

≤ 1

2· %0

1 + β(k + 1)%0+β

4

(%0

1 + β(k + 1)%0

)2

<%0

1 + β(k + 1)%0.

But if ζk < β%2k, then (5.2.3) implies

%k+1 ≤ %k − β%2k,

and again, in view of the monotonicity of the function t − 2βt2on [0, 14β ], one

can conclude that

%k+1 ≤ %0

1 + βk%0− β

(%0

1 + βk%0

)2

<%0

1 + βk%0· 1− β%0

1 + β(k + 1)%0=

%0

1 + β(k + 1)%0.
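The recursion (5.2.3) together with the bound (5.2.1) can also be tested numerically. The sketch below iterates (5.2.3) with equality (the worst case admitted by the inequality) and checks the $O(1/k)$ estimate (5.2.2); the values $\beta = 0.1$ and $\varrho_0 = 1$ are our own illustrative choices, selected so that $\varrho_0 \le \frac{1}{4\beta}$.

```python
# Numerical check of estimate (5.2.2) from recursion (5.2.3); beta and rho0
# are hypothetical values chosen so that rho_0 <= 1/(4*beta).
beta, rho0, K = 0.1, 1.0, 100
zeta = [beta / 4 * (rho0 / (1 + (k + 1) * beta * rho0)) ** 2 for k in range(K)]
zeta[0] = zeta[1]                                   # the proof takes zeta_0 = zeta_1
rho = rho0
for k in range(1, K + 1):
    rho = rho - 2 * beta * rho ** 2 + zeta[k - 1]   # (5.2.3), taken with equality
    assert rho < rho0 / (1 + beta * k * rho0)       # the claimed bound (5.2.2)
print("O(1/k) estimate verified up to k =", K)
```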

Now, for OSR-methods, linear convergence of the objective function values as well as of the iterates can be proved analogously to the statements of Theorems 5.1.5 and 5.1.7.

5.2.2 Theorem. Let the following hypotheses be fulfilled:

(i) Assumption 5.1.4;

(ii) all assumptions of Theorem 4.2.4 and assumption (i) of Theorem 5.2.1;

(iii) the constants $\zeta_k$ are chosen such that

\zeta_k < \frac{\theta'}{32}\,\varrho_0 q^k,

with $\theta' := \min[\theta, 8]$ and $q \in \left(1 - \frac{\theta'}{32}, 1\right)$.

Then, for OSR-Method 4.2.1 it holds

\varrho_k \le \varrho_0 q^k,  k = 1, 2, \dots   (5.2.5)

Proof: An OSR-method corresponds, in the context of an MSR-method, to part (3) of the proof of Theorem 5.1.5, i.e., in each exterior step $k$ only one interior step is performed. Thus $i(k) = 1$ for all $k$. Therefore, following the proofs of Theorems 5.1.1 and 5.1.5, we obtain

\varrho_k \le \left(1 - \frac{\theta'}{16}\right)\varrho_{k-1} + \zeta_{k-1},  \forall k > 0,   (5.2.6)

which corresponds to (5.1.39). Now assume that

\varrho_k \le \varrho_0 q^k   (5.2.7)

holds for a fixed $k$; for $k = 1$ this can be verified trivially. Then, due to assumption (iii) and (5.2.6), (5.2.7), we conclude

\varrho_{k+1} < \left(1 - \frac{\theta'}{16}\right)\varrho_0 q^k + \frac{\theta'}{32}\,\varrho_0 q^k = \left(1 - \frac{\theta'}{32}\right)\varrho_0 q^k \le \varrho_0 q^{k+1}.

5.2.3 Theorem. Suppose that Assumptions 4.2.3 and 5.1.6 and the hypotheses of Theorem 5.2.2, except for (i), are satisfied and that

\sum_{j=k+1}^{\infty}\left(\sqrt{\tau_j} + \frac{\varepsilon_j}{2} + 2\mu_j\right) \le \sqrt{\frac{\varrho_0}{\theta'}}\,\sqrt{q^k},  \forall k.   (5.2.8)

Then it holds

\|u^k - u^*\| \le 3\sqrt{\frac{\varrho_0}{\theta'}}\,\sqrt{q^k},   (5.2.9)

with $u^* = \lim_{k\to\infty} u^k$.

Proof: Convergence $u^k \to u^* \in U^*$ follows immediately from Theorem 4.2.4. Using Assumption 5.1.6 with $c := \mu_1 + \frac{\varepsilon_1}{2}$, and having in mind that $u^k \in \Omega_c'$, we get

\|u^k - w^k\| \le \sqrt{\frac{\varrho_k}{\theta'}},   (5.2.10)

with $w^k := \arg\min\{\|u^k - w\| : w \in U^* \cap B_{r^*}\}$. From (4.2.13), considered with fixed $k_0$ and $u^{**} := w^{k_0}$, it follows that

\|u^k - w^{k_0}\| \le \|u^{k-1} - w^{k_0}\| + \sqrt{\tau_k} + \frac{\varepsilon_k}{2} + 2\mu_k,  \forall k,

and iterating these inequalities for $k = k_0+1, \dots, k_0+j$, one gets

\|u^{k_0+j} - w^{k_0}\| \le \|u^{k_0} - w^{k_0}\| + \sum_{k=k_0+1}^{k_0+j}\left(\sqrt{\tau_k} + \frac{\varepsilon_k}{2} + 2\mu_k\right).

Passing to the limit $j \to \infty$, we have

\|u^* - w^{k_0}\| \le \|u^{k_0} - w^{k_0}\| + \sum_{k=k_0+1}^{\infty}\left(\sqrt{\tau_k} + \frac{\varepsilon_k}{2} + 2\mu_k\right).

Thus,

\|u^* - u^{k_0}\| \le 2\|u^{k_0} - w^{k_0}\| + \sum_{k=k_0+1}^{\infty}\left(\sqrt{\tau_k} + \frac{\varepsilon_k}{2} + 2\mu_k\right),

and in view of (5.2.10),

\|u^* - u^{k_0}\| \le 2\sqrt{\frac{\varrho_{k_0}}{\theta'}} + \sum_{k=k_0+1}^{\infty}\left(\sqrt{\tau_k} + \frac{\varepsilon_k}{2} + 2\mu_k\right).

The latter inequality together with (5.2.7) and (5.2.8) allows us to conclude that

\|u^* - u^{k_0}\| \le 3\sqrt{\frac{\varrho_0}{\theta'}}\,\sqrt{q^{k_0}}.

It should be noted that the estimates (5.2.2) and (5.2.5) also follow immediately, as particular cases, from Theorems 5.1.1 and 5.1.5, which correspond to MSR-methods. However, deducing them this way requires stronger conditions on the controlling parameters.

5.3 Rate of Convergence for Regularized Penalty Methods

In this section the approaches developed in the previous Sections 5.1 and 5.2 are applied in order to investigate the rate of convergence for the penalty methods studied in Section 3.2.

Considering Problem (A3.4.56) under the additional assumption that the optimal set $U^*$ is bounded, we analyze Method (3.2.2) with $\chi_k = 2$ $\forall k$ and penalty function (A3.4.78):

\phi_k(u) := \begin{cases} -\frac{1}{r_k}\sum_{j=1}^m \frac{1}{g_j(u)} & \text{if } u \in \operatorname{int} K, \\ +\infty & \text{if } u \notin \operatorname{int} K, \end{cases}   (5.3.1)

with $u^0 \in \operatorname{int} K$, $r_1 > 0$, $r_{k+1} \ge r_k$ $\forall k$ and $\lim_{k\to\infty} r_k = \infty$.

An immediate application of the abstract scheme used for the investigation of OSR- and MSR-methods is impossible here: in the normal case that $\inf_{u\in\mathbb{R}^n} J(u) < \inf_{u\in K} J(u)$, condition (4.2.5) cannot be fulfilled for

J_k(u) = J(u) + \phi_k(u)

in general, because on a subset of any neighborhood of the optimal set it holds $\lim_{k\to\infty} \phi_k(u) = \infty$.

The same situation occurs for other penalty functions satisfying the conditions of Theorems 3.2.1 and 3.2.3.

Due to Theorem 3.2.3, the sequence $\{u^k\}$ generated by Method (3.2.2) with penalty function (5.3.1) is bounded if $\sum_{k=1}^{\infty}\varepsilon_k < \infty$. Moreover,

\lim_{k\to\infty} \|u^k - u^{k-1}\| = 0,  \lim_{k\to\infty} J(u^k) = J^*.
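As a minimal illustration of this penalty scheme, consider the hypothetical one-dimensional problem $\min\{J(u) = u : u \ge 0\}$ (so $g_1(u) = -u$, $J^* = 0$), for which each penalized subproblem can be minimized in closed form; the $O(1/\sqrt{r_k})$ decay of the objective defect matches the term $c_1/\sqrt{r_k}$ derived in (5.3.6) below. The problem data are our own, not from the text:

```python
# Inverse-barrier penalty (5.3.1) on a toy problem of our own choosing:
# min{ J(u) = u : u >= 0 }, g_1(u) = -u, so phi_k(u) = 1/(r_k * u) on int K.
# The subproblem min_u u + 1/(r_k*u) has the closed-form minimizer u_k = r_k**-0.5.
import math

for rk in [1e2, 1e4, 1e6]:
    uk = 1.0 / math.sqrt(rk)                 # minimizer of the penalized objective
    Jk = uk + 1.0 / (rk * uk)                # penalized objective value = 2/sqrt(r_k)
    assert abs(Jk - 2.0 / math.sqrt(rk)) < 1e-12
print("objective defect J_k(u_k) - J* decays like O(1/sqrt(r_k))")
```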

Relation (3.2.2) implies

\|\nabla J_k(u^k)\| \le \varepsilon_k + 2\|u^k - u^{k-1}\| =: \bar\varepsilon_k,   (5.3.2)

and, in view of $\lim_{k\to\infty}\|u^k - u^{k-1}\| = 0$, we get $\lim_{k\to\infty}\bar\varepsilon_k = 0$. Choosing an arbitrary Slater point $\bar u$, we obtain from (5.3.2)

\left\langle \nabla J(u^k) + \frac{1}{r_k}\sum_{j=1}^m \frac{1}{g_j^2(u^k)}\nabla g_j(u^k),\; u^k - \bar u \right\rangle \le \bar\varepsilon_k\|u^k - \bar u\|

and, with regard to the convexity of $J$ and $g_j$,

\frac{1}{r_k}\sum_{j=1}^m \frac{1}{g_j^2(u^k)}\left[g_j(u^k) - g_j(\bar u)\right] \le \bar\varepsilon_k\|u^k - \bar u\| + J(\bar u) - J(u^k) \le \bar\varepsilon_k\|u^k - \bar u\| + J(\bar u) - J^*.   (5.3.3)

Choosing a convergent subsequence $\{u^{k_i}\}$, we have $\tilde u := \lim_{i\to\infty} u^{k_i} \in U^*$. Denote

I_0(\tilde u) := \{j : g_j(\tilde u) = 0\},  I_-(\tilde u) := \{j : g_j(\tilde u) < 0\},

\gamma := \max\left[\max_{j\in I_-(\tilde u)} g_j(\tilde u),\; \max_{j\in I_0(\tilde u)} g_j(\bar u)\right] < 0.

Due to the continuity of the functions $g_j$ and the definition of the index sets $I_0(\tilde u)$ and $I_-(\tilde u)$, we can conclude that, for some $i_0$ and all $i \ge i_0$,

g_j(u^{k_i}) < \frac{\gamma}{2} < 0  \text{ if } j \in I_-(\tilde u),

g_j(u^{k_i}) - g_j(\bar u) > -\frac{\gamma}{2}  \text{ if } j \in I_0(\tilde u).

Consequently, if $j \in I_-(\tilde u)$,

\frac{1}{r_{k_i} g_j^2(u^{k_i})} \le \frac{4}{\gamma^2 r_{k_i}},  i \ge i_0,   (5.3.4)

and, according to (5.3.3), this implies

-\frac{\gamma}{2 r_{k_i}}\sum_{j\in I_0(\tilde u)} \frac{1}{g_j^2(u^{k_i})} \le \bar\varepsilon_{k_i}\|u^{k_i} - \bar u\| + J(\bar u) - J^* - \frac{1}{r_{k_i}}\sum_{j\in I_-(\tilde u)} \frac{1}{g_j^2(u^{k_i})}\left[g_j(u^{k_i}) - g_j(\bar u)\right].   (5.3.5)

The estimates (5.3.4) and (5.3.5) ensure that

\frac{1}{r_{k_i} g_j^2(u^{k_i})} \le c,  \forall i \in \mathbb{N},

holds with some constant $c$ for all functions $g_j$. Arguing by contradiction, it is then not difficult to verify that

\frac{1}{r_k g_j^2(u^k)} \le c',  k = 1, 2, \dots,

is satisfied for some $c'$ and every $j = 1, \dots, m$. Thus,

\phi_k(u^k) = -\frac{1}{r_k}\sum_{j=1}^m \frac{1}{g_j(u^k)} \le \frac{m\sqrt{c'}}{\sqrt{r_k}} =: \frac{c_1}{\sqrt{r_k}}.   (5.3.6)

With $v^k := \arg\min\{\|u^k - v\| : v \in U^*\}$, in view of (A1.5.32), we have for any $\lambda \in [0,1]$:

J(u^{k+1}) + \phi_{k+1}(u^{k+1}) + \|u^{k+1} - u^k\|^2 \le \lambda J(v^k) + (1-\lambda)J(u^k) + \phi_{k+1}(\lambda v^k + (1-\lambda)u^k) + \lambda^2\|u^k - v^k\|^2 + \frac{\varepsilon_{k+1}^2}{2}.

But, due to

g_j(u^k + \lambda(v^k - u^k)) \le \lambda g_j(v^k) + (1-\lambda)g_j(u^k) \le (1-\lambda)g_j(u^k) < 0

and $r_{k+1} \ge r_k$, we obtain

\phi_{k+1}(\lambda v^k + (1-\lambda)u^k) \le \frac{1}{1-\lambda}\phi_{k+1}(u^k) \le \frac{1}{1-\lambda}\phi_k(u^k) \le \frac{c_1}{(1-\lambda)\sqrt{r_k}},

and hence the relations

J(u^{k+1}) \le \lambda J(v^k) + (1-\lambda)J(u^k) + \lambda^2\|u^k - v^k\|^2 + \frac{\varepsilon_{k+1}^2}{2} + \frac{c_1}{(1-\lambda)\sqrt{r_k}}

and

J(u^{k+1}) - J^* \le (1-\lambda)(J(u^k) - J^*) + \lambda^2\|u^k - v^k\|^2 + \frac{\varepsilon_{k+1}^2}{2} + \frac{c_1}{(1-\lambda)\sqrt{r_k}}   (5.3.7)

are true for any $\lambda \in [0,1]$. Now we choose

\lambda = \lambda_k := \min\left[\frac{J(u^k) - J^*}{2\|u^k - v^k\|^2},\; \frac{7}{8}\right].

If $\frac{J(u^k) - J^*}{2\|u^k - v^k\|^2} \ge \frac{7}{8}$, then (5.3.7) leads to

\varrho_{k+1} \le \frac{9}{16}\varrho_k + \frac{8c_1}{\sqrt{r_k}} + \frac{\varepsilon_{k+1}^2}{2},   (5.3.8)


with $\varrho_k := J(u^k) - J^*$. But if $\frac{J(u^k) - J^*}{2\|u^k - v^k\|^2} < \frac{7}{8}$, we get

\lambda_k := \frac{J(u^k) - J^*}{2\|u^k - v^k\|^2},

and (5.3.7) implies

\varrho_{k+1} \le \varrho_k - \frac{1}{4\|u^k - v^k\|^2}\varrho_k^2 + \frac{8c_1}{\sqrt{r_k}} + \frac{\varepsilon_{k+1}^2}{2}.   (5.3.9)

Comparing the relations (5.3.8), (5.3.9) with (5.1.12), (5.1.15) and proceeding as in the proof of Theorem 5.1.1 (cf. the transition from the inequalities (5.1.12), (5.1.15) to (5.1.18), (5.1.19)), we can establish that, for a sufficiently fast decrease of the controlling parameters, the estimate

\varrho_{k+1} \le \varrho_k - 2\beta\varrho_k^2 + c\left(\frac{1}{\sqrt{r_k}} + \varepsilon_{k+1}^2\right),  \beta > 0,

is a consequence of (5.3.8) and (5.3.9).

Furthermore, using the techniques for proving the rate of convergence for OSR-methods, statements analogous to Theorems 5.2.1 - 5.2.3 can be obtained. However, these statements have a non-constructive character if we do not know upper bounds for the value $c_1$ in (5.3.6) and for $\sup_k \|u^k - v^k\|$.

5.4 Comments

Section 5.1: In Rockafellar [349] an OSR-method for solving variational inequalities with maximal monotone operators $\mathcal{T}$ has been described. Furthermore, under the assumption that $\mathcal{T}^{-1}$ is Lipschitz continuous at zero, a linear convergence rate has been proved (cf. Theorem 2 there). This assumption implies uniqueness of the solution $u^*$. Moreover, solving Problem (4.2.1) in this context (with $\mathcal{T} := \partial J$), a growth condition

\liminf_{u\to u^*} \frac{J(u) - J(u^*)}{\|u - u^*\|^2} > 0

is required, which is stronger than (5.1.29). In Luque [281] this investigation has been continued and the Lipschitz continuity of $\mathcal{T}^{-1}$ has been dropped. In both papers referred to, an approximation of the problem data has not been taken into account, i.e., $J_k := J$, $K_k := K$, $\forall k$.

For the case $K_k := K = V$, Kaplan [206] has investigated the convergence rate of an OSR-method and assumed, for the approximation of $J$, a condition like (4.2.5).

Güler [153] has studied the asymptotic rate of convergence of the exact iteration

u^{k+1} := \arg\min_{u\in K}\left\{J(u) + \frac{1}{2\lambda_k}\|u - u^k\|^2\right\},  \lambda_k > 0,  \sum_{k=1}^{\infty}\lambda_k = \infty.
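For intuition, this exact iteration can be written down directly for a simple function whose proximal mapping is known in closed form; $J(u) = |u|$ on $K = \mathbb{R}$ with constant $\lambda_k$ is our own illustrative choice (its prox is the soft-thresholding map):

```python
# Exact proximal point iteration u^{k+1} = argmin_u { J(u) + (1/(2*lam))*(u - u^k)**2 }
# for J(u) = |u| on K = R; the minimizer is the soft-thresholding of u^k at level lam.
def prox_abs(v, lam):
    """argmin_u |u| + (1/(2*lam)) * (u - v)**2."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

u, lam = 5.0, 1.0
for k in range(10):
    u = prox_abs(u, lam)
print(u)   # the minimizer u* = 0 is reached after finitely many steps
```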

For $\lambda_k = \mathrm{const}$ this represents a particular case of an OSR-method.

Convergence rate estimates for MSR-methods in the framework of convex semi-infinite problems have been established in [214], and our presentation follows this paper.

Section 5.2: The results in this section are not yet published.

Section 5.3: These results can be found in [206].


Chapter 6

PPR FOR CONVEX SEMI-INFINITE PROBLEMS

6.1 Introduction to Numerical Analysis of SIP

In the space $V = \mathbb{R}^n$ we consider the following convex problem

\min\{J(u) : u \in K\},   (6.1.1)

with

K = \{u \in U_0 : g(u,t) \le 0, \; \forall t \in T\}.   (6.1.2)

Here the set $T$ is a compact subset of the space $\mathbb{R}^m$; $U_0$ is a closed convex subset of $\mathbb{R}^n$; $J$ and $g_t : u \mapsto g(u,t)$ are convex functions from $\mathbb{R}^n$ into $\mathbb{R}$ (see also Problem (A1.7.42)).

Theoretical and practical manifestations of this model are abundant, and an increasing number of papers shows a growing interest in semi-infinite programming problems during the last two decades. The notion semi-infinite programming (SIP) stems from the property that the set $K$ lies in a finite-dimensional space whereas $T$ is an infinite set; hence we are dealing with a continuum of inequality constraints.

Concerning the theory and properties of SIP we refer to the monographs of Blum and Oettli [46], Krabs [248], Hettich and Zenke [178], Glashoff and Gustafson [132] and Goberna and Lopez [137], and to the survey papers of Gustafson and Kortanek [155], Anderson and Philpott [10], Hettich and Kortanek [177] and Reemtsen and Ruckmann [344].

Since infinitely many constraints cannot be dealt with in a finite algorithm, certain approximations of the problem are always required. The algorithms to be considered generate finitely constrained auxiliary problems and, following [177], one can roughly distinguish three conceptual approaches: exchange methods (including cutting plane methods), discretization methods and methods based on local reduction.

In order to briefly sketch the ideas behind these approaches, we consider the following finitely constrained auxiliary problems

P(T_k):  \min\{J(u) : u \in K_k\},   (6.1.3)

with

K_k := \{u \in U_0 : g(u,t) \le 0, \; \forall t \in T_k\}.   (6.1.4)

If $U_0$ is compact and $K \ne \emptyset$, the approximate problem (6.1.3) has a solution for every finite set $T_k \subset T$. Sometimes compactness of $U_0$ is introduced artificially in order to ensure the solvability of these problems.

Omitting for a moment precise assumptions on the problems and methods involved, we emphasize in the sequel the main ideas of the numerical approaches mentioned above.

6.1.1 Conceptual exchange approach

The notion exchange method refers to the fact that in step $k$ the set $K_{k+1}$ is generated from $K_k$ by adding at least one new constraint and (in some algorithms) deleting some of the constraints of $K_k$, i.e., an exchange of constraints takes place.

6.1.1 Algorithm. (Exchange algorithm)
Step k: Given $T_k \subset T$, $|T_k| < \infty$;

(i) compute approximately a solution $u^k$ of $P(T_k)$ and some or all local solutions $t_1^k, \dots, t_{q_k}^k$ of the subproblem

Q(u^k):  \max\{g(u^k, t) : t \in T\};   (6.1.5)

(ii) stop if $g(u^k, t_j^k) \le 0$, $j = 1, \dots, q_k$; otherwise choose $T_{k+1}$ according to a rule guaranteeing the inclusions

T_{k+1} \subset T_k \cup \{t_1^k, \dots, t_{q_k}^k\}  \text{ and }  T_{k+1} \cap \{t_1^k, \dots, t_{q_k}^k\} \ne \emptyset.
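Algorithm 6.1.1 can be sketched on a toy one-dimensional SIP. The problem data below ($J(u) = (u-2)^2$, $g(u,t) = ut - 1$ on $T = [0,1]$, whose feasible set is $u \le 1$, so $u^* = 1$) and the grid-based solution of $Q(u^k)$ are our own illustrative assumptions, not taken from the text:

```python
# Exchange method (Algorithm 6.1.1) on a hypothetical 1-D SIP:
# min J(u) = (u - 2)^2  s.t.  g(u, t) = u*t - 1 <= 0 for all t in T = [0, 1].

def solve_P(Tk):
    """Solve P(T_k) analytically: project the unconstrained minimizer u = 2
    onto {u : u*t <= 1 for all t in T_k}."""
    ub = min((1.0 / t for t in Tk if t > 1e-12), default=float("inf"))
    return min(2.0, ub)

def exchange(T0=(0.0,), tol=1e-8, grid_size=101, max_iter=50):
    Tk = list(T0)
    u = solve_P(Tk)
    for _ in range(max_iter):
        u = solve_P(Tk)                                   # step (i): solve P(T_k)
        ts = [i / (grid_size - 1) for i in range(grid_size)]
        t_star = max(ts, key=lambda t: u * t - 1.0)       # subproblem Q(u^k) on a grid
        if u * t_star - 1.0 <= tol:                       # step (ii): feasible, stop
            break
        Tk.append(t_star)                                 # exchange in the new constraint
    return u, Tk

u, Tk = exchange()
print(u, Tk)   # converges to u* = 1 with the constraint t = 1 added to T_k
```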

6.1.2 Remark. Detailed algorithms differ mainly in the choice of $T_{k+1}$. Determining a global solution of the (in general) non-convex problem (6.1.5) is very costly, particularly if $\dim T \ge 2$. One can find algorithms, cf. [147, 173, 389], which have important additional features, for instance, the ability to add several constraints $g(u, t^k) \le 0$, where $t^k$ are some points with $g(u^k, t^k) > 0$, or efficient rules for deleting constraints. In [247] an algorithm with automatic grid refinement is suggested which ensures a linear rate of convergence with respect to the values of the objective functional. ♦

6.1.2 Conceptual discretization approach

Algorithms of this type compute a solution of Problem (6.1.1) by solving a sequence of problems $P(T_k)$, where $T_k$ is taken from a grid $\bar T_k$ of mesh-size $h_k$, i.e., a finite set $\bar T_k \subset T$ such that

\sup_{t\in T} \rho(t, \bar T_k) \le h_k.

In the $k$-th step of the algorithm a fixed (usually uniform) grid $\bar T_k$ is considered, generated by $h_k := \gamma_k h_{k-1}$, with $\gamma_k \in (0,1)$ chosen a priori or defined within the solution procedure. In step $k$ only subsets $T_k$ of $\bar T_k$ are used, and in some algorithms one has

\gamma_k = \frac{1}{n_{k-1}},  n_{k-1} \in \mathbb{N},  n_{k-1} \ge 2,  \bar T_{k-1} \subset \bar T_k  \forall k.

6.1.3 Algorithm. (Discretization algorithm)
Step k: Given $h_{k-1}$, $T_{k-1} \subset \bar T_{k-1}$ and a solution $u^{k-1}$ of $P(T_{k-1})$;

(i) choose $h_k := \gamma_k h_{k-1}$ and generate $\bar T_k$;

(ii) select $T_k \subset \bar T_k$;

(iii) compute a solution $\bar u$ of $P(T_k)$. If $\bar u$ is feasible for $P(\bar T_k)$ within a given accuracy, set $u^k := \bar u$, select $\gamma_{k+1}$ and continue with Step k+1; otherwise repeat (ii) with a new choice of $T_k$.
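A minimal sketch of the grid generation in Algorithm 6.1.3, combined with the $\alpha$-threshold selection rule $T_k \supset T_k^\alpha$ (cf. Remark 6.1.4); the concrete constraint-value function, $T = [0,1]$, $\gamma_k = 1/2$ and $\alpha = 0.01$ are our own assumptions:

```python
# Nested uniform grids on T = [0, 1] with h_k = h_{k-1}/2, plus alpha-threshold
# selection of the almost-active points; g_val plays the role of t -> g(u_bar, t)
# for a fixed iterate and is a hypothetical choice.

def uniform_grid(h):
    n = round(1.0 / h)
    return [i * h for i in range(n + 1)]

def select(grid, g_fun, alpha):
    """Keep the alpha-active points: T_k^alpha = {t in grid : g_fun(t) >= -alpha}."""
    return [t for t in grid if g_fun(t) >= -alpha]

def g_val(t):
    return -(t - 0.5) ** 2      # the constraint is active only near t = 0.5

grids, h = [], 0.5
for k in range(4):
    grids.append(uniform_grid(h))   # T_bar_k
    h /= 2.0                         # gamma_k = 1/2, hence T_bar_{k-1} subset T_bar_k
Tk = select(grids[-1], g_val, alpha=0.01)
print(len(grids[-1]), len(Tk))      # 17 grid points, 3 of them alpha-active
```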

6.1.4 Remark. Concerning efficiency, it is important to use as much information as possible from former grids when solving $P(T_k)$. Substantial differences among the algorithms of this type lie in the choice of $T_k$. An obvious criterion for selecting $T_k$ is to ensure that

T_k \supset T_k^{\alpha} := \{t \in \bar T_k : g(\bar u, t) \ge -\alpha\},

with $\alpha > 0$ some chosen threshold and $\bar u$ the foregoing inner iterate according to Step k(iii) of Algorithm 6.1.3.

The choice of $\alpha$ is crucial: a large value of $\alpha$ leads to many constraints in $P(T_k)$, while choosing $\alpha$ too small may have the same effect in subsequent problems, because parts of $T$ may be overlooked (cf. [174, 343]). Applying methods of feasible directions to SIP (6.1.1), in [390, 391] rules are described for coordinating the grids and search directions so as to preserve feasibility of the iterates. This requires information about the last $T_{k-1}$ and $u^{k-1}$ in order to choose $\gamma_k$ and the initial set $T_k$ at step $k$. In [325] a discretization is suggested which preserves the rate of convergence of the basic optimization algorithm for the whole solution procedure. In some publications, see for instance [325, 314] and the references quoted there, the stopping rule in Step k(iii) of Algorithm 6.1.3 is weakened. In some of these methods (cf. [326, 143]) the convergence results require a sufficiently dense initial grid. ♦

The methods sketched above include ideas which are also important for our abstract regularization scheme and for the algorithms considered in Section 6.2. In particular, this concerns the following questions: how to guarantee stability of the solutions of the auxiliary problems by means of stable approximations of the objective function; how to coordinate the data (accuracy of approximations and solutions) belonging to former steps with new grid parameters; and how to make use of the results of several previous iterations when constructing the next auxiliary problem.


6.1.3 Conceptual reduction approach

These methods are especially recommendable in the final stage of an iteration procedure in order to obtain high accuracy of the solution. They are constructed under assumptions allowing a local reduction of the SIP which ensure, in particular, that for each point $\bar u$ generated by the method the set

T(\bar u) := \{\tau \in T : g(\bar u, \tau) = \max_{t\in T} g(\bar u, t)\}

is finite, i.e., $T(\bar u) = \{t_1, \dots, t_q\}$. Moreover, these assumptions imply that, for $j = 1, \dots, q$, continuous mappings

t_j : U(\bar u) \to U(t_j) \cap T

exist on pairs of neighborhoods $U(\bar u)$, $U(t_j) \cap T$ such that on $U(\bar u)$ the infinitely many original constraints can be equivalently replaced by the finite number

G_j(u) := g(u, t_j(u)) \le 0,  j = 1, \dots, q.

6.1.5 Algorithm. (Reduction algorithm)
Step k: Given $u^{k-1}$ (not necessarily feasible);

(i) compute all local maxima $t_1^{k-1}, \dots, t_{q_{k-1}}^{k-1}$ of $Q(u^{k-1})$ (see (6.1.5)) for which $g(u^{k-1}, t_j^{k-1})$ is close to $\max_{t\in T} g(u^{k-1}, t)$;

(ii) execute some steps of a basic algorithm applied to the reduced finite-dimensional problem

\min\{J(u) : G_j(u) \le 0, \; j = 1, \dots, q_{k-1}\},

where $G_j(u) := g(u, t_j(u))$ and $t_j(\cdot)$ are defined according to $\bar u := u^{k-1}$ (let $\bar u^{k-1}$ be the last of these iterates);

(iii) set $u^k := \bar u^{k-1}$ and continue with Step k+1.

6.1.6 Remark. It should be noted that the violation of the assumptions required for the application of the reduction method, in particular the finiteness of the set $T(\bar u)$, can be considered a degenerate situation (cf. Jongen, Jonker and Twilt [200]). The existence of suitable functions $t_j(\cdot)$ in a neighborhood of a fixed $\bar u$ is closely related to the possibility of applying the implicit function theorem to the Karush-Kuhn-Tucker system of Problem $Q(\bar u)$ (see Hettich and Jongen [175]).

As mentioned in Remark 6.1.2, substep (i) of Algorithm 6.1.5 is very expensive, and it is desirable to reduce the number of its executions. If the functions $G_j$ are sufficiently smooth, methods with a superlinear rate of convergence should be applied in substep (ii) for solving the arising nonlinear programming problems. Sequential quadratic programming (SQP) methods have been used efficiently in this context (cf. [327, 383, 146]).

In the first instance, SQP-methods are locally convergent. Therefore, hybrid techniques combining robust, globally convergent descent algorithms with SQP-methods have been developed (cf. [155, 189]). Globalization techniques for SQP-methods were introduced by Han [167], who used for controlling the step-size an exact penalty function of the type

\phi(u) = d \sum_{j=1}^{q_k} \max[0, G_j(u)].

In [416, 83] and [384] alternative penalty functions have been suggested with the same goal. ♦

The idea of applying a penalty term in combination with discretization in order to solve SIP seems promising. Numerically, however, this may be rather difficult, because the condition number of the Hessians of the objective functions of the auxiliary problems depends not only on the penalty parameter but essentially also on the joint influence of the almost active constraints, whose number usually increases with the number of discretization points.

Hence, the use of exact penalty functions or of their smooth approximations has the advantage that it is possible to work with a fixed penalty parameter.

Exact penalty functions of integral type for SIP have been proposed in [81], but a suitable choice of the penalty parameter is not discussed there.

In Section 6.2 we will consider a different approach, based on uniform (with respect to successive discretizations) upper bounds for the Lagrange multipliers of the arising regularized auxiliary problems and a special deleting rule for the constraints. In this approach the strong convexity of the regularized objective functionals of the auxiliary problems is essential. How to obtain such upper bounds in a constructive manner is described in Remarks 6.2.9 and 6.2.16, respectively. This in turn facilitates the choice of penalty parameters which remain unchanged during the whole iteration procedure.

6.2 Stable Methods for Ill-posed Convex SIP

To motivate the application of regularization methods to SIP, we illustrate some peculiarities of ill-posed convex SIP under discretization.

6.2.1 Behavior of ill-posed SIP under discretization

Let us consider the following simple examples, which explain the typical behavior of SIP under discretization.

6.2.1 Example. Consider the linear SIP

J(u) := -u_1 \to \min
\text{s.t. } u \in U_0 := \{(v_1, v_2) : v_2 \ge 0\},
g(u,t) := u_1 - \left(t - \frac{1}{\sqrt{2}}\right)^2 u_2 \le 0,  \forall t \in T := [0,1].

Obviously, the solutions of this problem are the points $u^* = (0, a)^T$ for all $a \ge 0$. Let $T_k$ be a finite $h_k$-grid on $[0,1]$ with

t_0^k := \arg\min_{t\in T_k}\left|t - \frac{1}{\sqrt{2}}\right|,  \text{ but }  t_0^k \ne \frac{1}{\sqrt{2}}.

For the approximate problems, with $T_k$ instead of $T$, the whole ray

\left\{u \in \mathbb{R}^2 : u_1 = \left(t_0^k - \frac{1}{\sqrt{2}}\right)^2 u_2, \; u_2 \ge 0\right\}

is feasible, and on this ray $J(u) \to -\infty$. Choosing a suitable ball $B_r(0)$, we take

Q := \left\{u \in U_0 \cap B_r(0) : u_1 - \left(t - \frac{1}{\sqrt{2}}\right)^2 u_2 \le 0 \; \forall t \in T\right\},

Q_k := \left\{u \in U_0 \cap B_r(0) : u_1 - \left(t - \frac{1}{\sqrt{2}}\right)^2 u_2 \le 0 \; \forall t \in T_k\right\},

and obviously

Q = \{u \in B_r(0) : u_1 \le 0, \; u_2 \ge 0\},

Q_k = \left\{u \in B_r(0) : u_1 \le \left(t_0^k - \frac{1}{\sqrt{2}}\right)^2 u_2, \; u_2 \ge 0\right\}.

For arbitrary $u' \in Q_k \setminus Q$ one has $w' := (\min[0, u_1'], u_2')^T \in Q$ and

\|u' - w'\| \le \left(t_0^k - \frac{1}{\sqrt{2}}\right)^2 r,

hence $\rho_H(Q, Q_k) \le r h_k^2$. ♦
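The bound $\mu_k = r h_k^2$ from Example 6.2.1 is easy to confirm numerically; the uniform grids below are our own choice of $h_k$-grids:

```python
# Example 6.2.1 numerically: on a uniform h-grid of [0, 1] the nearest grid point
# t0 to 1/sqrt(2) is rational, hence t0 != 1/sqrt(2), and (t0 - 1/sqrt(2))^2 <= h^2,
# which yields rho_H(Q, Q_k) <= r * h^2.
import math

c = 1.0 / math.sqrt(2.0)
for n in [10, 100, 1000]:
    h = 1.0 / n
    grid = [i * h for i in range(n + 1)]
    t0 = min(grid, key=lambda t: abs(t - c))
    assert t0 != c and (t0 - c) ** 2 <= h ** 2
print("(t0 - 1/sqrt(2))^2 = O(h^2) confirmed on uniform grids")
```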

6.2.2 Example. Consider the linear SIP

J(u) := -u_1 \to \min
\text{s.t. } u \in U_0 := \{(v_1, v_2) : v_2 \ge 0\},
g(u,t) \le 0  \forall t \in T := [0,1],

with

g(u,t) := \max\left[u_1 - \left(t - \frac{1}{\sqrt{2}}\right)^2 u_2, \; u_1 - \left|t - \frac{1}{\sqrt{2}}\right|\right].

As before, the optimal solutions are $u^* = (0, a)^T$ for all $a \ge 0$. Choosing $T_k$ and $t_0^k$ as in Example 6.2.1, the points

u^{*,k} := \left(\left|t_0^k - \frac{1}{\sqrt{2}}\right|, \; b_k\right)  \text{ with }  b_k \ge \frac{1}{\left|t_0^k - \frac{1}{\sqrt{2}}\right|}

are solutions of the corresponding approximate problems, thus

\lim_{k\to\infty} \|u^{*,k}\| = +\infty.

For the analogous choice of $Q$ and $Q_k$ it is easy to show that

\rho_H(Q, Q_k) \le \min[h_k, r h_k^2].  ♦

Hence, if for the approximate problems considered in both examples we choose $r h_1 < 1$ and

J_k := J,  K_k := \{u \in U_0 : g(u,t) \le 0 \; \forall t \in T_k\},

then Assumption 4.2.2 is fulfilled with $\sigma_k := 0$, $\mu_k := r h_k^2$. Therefore, Theorems 4.2.4 and 4.3.8 deliver conditions on the controlling parameters which ensure convergence of the OSR- as well as the MSR-method when applied to these problems.

6.2.3 Example. Consider the linear SIP

J(u) := -u_1 - u_2 \to \min
\text{s.t. } u \in U_0 := \{(v_1, v_2) : v_1 \ge 0.1, \; v_2 \ge 0.1\},
g(u,t) \le 0  \forall t \in T := [0,1] \times [0,1],

with

g(u,t) := \left(1 - \left(\frac{1}{\sqrt{2}} - t_1\right)^2\right)u_1 + \left(1 - \left(\frac{1}{\sqrt{2}} - t_2\right)^2\right)u_2 - 1.

Obviously,

K := \{u \in \mathbb{R}^2 : u_1 + u_2 \le 1, \; u_1 \ge 0.1, \; u_2 \ge 0.1\},

and the constraint $u_1 + u_2 \le 1$ corresponds to the unique parameter value $t = (1/\sqrt{2}, 1/\sqrt{2})^T$. The optimal set of this problem is

U^* := \left\{u = \alpha\begin{pmatrix}0.9\\0.1\end{pmatrix} + (1-\alpha)\begin{pmatrix}0.1\\0.9\end{pmatrix} : 0 \le \alpha \le 1\right\}.

Now we choose an arbitrary sequence of grids $T_k$:

0 \le t_1^1 < t_1^2 < \dots < t_1^{q(k)} \le 1,
0 \le t_2^1 < t_2^2 < \dots < t_2^{q(k)} \le 1,

with rational numbers $t_1^s$, $t_2^j$. Then it is easy to verify that, for

a_1 := \min_{1\le s\le q(k)}\left|\frac{1}{\sqrt{2}} - t_1^s\right|,  a_2 := \min_{1\le j\le q(k)}\left|\frac{1}{\sqrt{2}} - t_2^j\right|,

the approximate problem has the unique solution

\left(\frac{1 - 0.1(1 - a_2^2)}{1 - a_1^2}, \; 0.1\right)  \text{ if }  a_1 < a_2

and

\left(0.1, \; \frac{1 - 0.1(1 - a_1^2)}{1 - a_2^2}\right)  \text{ if }  a_2 < a_1.

Of course, the cases $a_1 < a_2$ and $a_2 < a_1$ have the same probability and, if the discretization parameter $h_k \to 0$, in general two cluster points arise: $(0.9, 0.1)^T$ and $(0.1, 0.9)^T$. Thus discretization usually causes instability for this problem.

To apply the MSR-methods described in Section 4.3, we consider Theorem 4.3.6 and Corollary 4.3.7, because for the problem above the feasible set is bounded. Denoting again by $K_k$ the feasible set of the approximate problem with grid parameter $h_k$, we get

\rho_H(K_k, K) \le \frac{3}{4} h_k^2  \text{ if }  h_1 \le \frac{1}{2}.


Indeed,

K_k \subset K' := \{u \in \mathbb{R}^2 : (1 - h_k^2)u_1 + (1 - h_k^2)u_2 \le 1, \; u_1 \ge 0.1, \; u_2 \ge 0.1\},

and for each $u \in K' \setminus K$ the point $(1 - h_k^2)u$ belongs to $K$. But, due to $u \in K'$, we have $\|u\| \le \frac{1}{1 - h_k^2}$ and, hence,

\|u - (1 - h_k^2)u\| \le \frac{h_k^2}{1 - h_k^2}.

Thus, in the case $r := 2$, $\varepsilon_k \le \frac{1}{2}$, $h_k \le \frac{1}{2}$ $\forall k$, relation (4.3.38) is fulfilled. Therefore, Corollary 4.3.7 ensures convergence of the MSR-method if

\sum_{k=1}^{\infty} h_k < \infty,  \sum_{k=1}^{\infty} \varepsilon_k < \infty,  \sigma_k := 0,

and $\delta_k$ satisfies condition (4.3.5). ♦
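The two cluster points of Example 6.2.3 can be reproduced directly from the closed-form solution of the discretized problem; the particular values of $a_1$, $a_2$ below are hypothetical:

```python
# Unique solution of the discretized problem in Example 6.2.3 as a function of
# a1, a2 (distances of the coordinate grids to 1/sqrt(2)); it flips between two
# points depending on which distance is smaller.
def discrete_solution(a1, a2):
    if a1 < a2:
        return ((1 - 0.1 * (1 - a2 ** 2)) / (1 - a1 ** 2), 0.1)
    return (0.1, (1 - 0.1 * (1 - a1 ** 2)) / (1 - a2 ** 2))

u_a = discrete_solution(1e-6, 2e-6)   # a1 < a2: close to (0.9, 0.1)
u_b = discrete_solution(2e-6, 1e-6)   # a2 < a1: close to (0.1, 0.9)
print(u_a, u_b)   # arbitrarily fine grids still oscillate between the two points
```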

Hence, under relatively weak assumptions on the choice of the controlling parameters, the IPR-schemes described in Sections 4.2 and 4.3 lead to stable solution procedures for the three examples above.

6.2.4 Remark. The application of the exchange and discretization methods to Example 6.2.1 requires additional constraints ensuring boundedness of the feasible set. Otherwise the iterates would not be defined for any discretization $T_k$ with $1/\sqrt{2} \notin T_k$, because the resulting discretized problem is unsolvable.

Solutions obtained by discretization methods depend, in general, on these additional constraints. The same dependence may also arise in exchange methods because of the inexact computation of $\arg\max_{t\in T} g(u^k, t)$. In the case of Example 6.2.2 discretization methods will generate unbounded minimizing sequences, and in Examples 6.2.1 and 6.2.2 the assumptions required for reduction methods are violated; in particular, for the optimal point $u^* = (0,0)^T$ we have $T(u^*) = T$, i.e., $T(u^*)$ contains infinitely many points.

In the case of Example 6.2.3 all three approaches are applicable. Nevertheless, discretization as well as exchange methods prove to be unstable with respect to the argument. For discretization methods this follows immediately from the previous analysis. In exchange methods, if

\tau \approx \arg\max_{t\in T} g(u^k, t)

is computed such that

g(u^k, \tau) > 0  \text{ and }  \left|\tau_1 - 1/\sqrt{2}\right| < \left|\tau_2 - 1/\sqrt{2}\right|,

and if we insert this point $\tau$ into $T_{k+1}$ (assuming that $\tau$ is closer to $(1/\sqrt{2}, 1/\sqrt{2})^T$ than any other point of $T_{k+1}$), then the next iterate $u^{k+1}$ will be close to $(0.9, 0.1)^T$, whereas in the opposite case $\left|\tau_1 - 1/\sqrt{2}\right| > \left|\tau_2 - 1/\sqrt{2}\right|$ the point $u^{k+1}$ is located near $(0.1, 0.9)^T$.

For a similar problem, obtained by substituting the constraints

u_1 \ge 0,  u_2 \ge 0  \text{ for }  u_1 \ge 0.1,  u_2 \ge 0.1,

the assumptions required for reduction methods are violated at the optimal points $(0,1)^T$ and $(1,0)^T$. ♦


Although the examples considered above are quite simple, the behavior of more complicated SIP under discretization is similar, and one may expect that regularization methods are a natural way to solve them in a stable manner.

6.2.2 Preliminary results

In the sequel we consider two types of programs with infinitely many constraints:convex SIP and fully parameterized convex SIP.

The first one is given by

\text{minimize } J(u), \quad \text{subject to } u \in K;   (6.2.1)

K := \{u \in \mathbb{R}^n : g(u,t) \le 0, \; \forall t \in T\},   (6.2.2)

where $T$ is a compact subset of the space $\mathbb{R}^m$; $J$ and $g_t : u \mapsto g(u,t)$ are convex, differentiable functions on $\mathbb{R}^n$; $U^* \ne \emptyset$, and the Slater condition is fulfilled, i.e.,

\exists\, \bar u \in \mathbb{R}^n : \sup_{t\in T} g(\bar u, t) < 0.   (6.2.3)

The second problem can be described in the following way:

P(\gamma):  \text{minimize } J(u, \gamma), \quad \text{subject to } g(u, t, \gamma) \le 0 \; \forall t \in T;   (6.2.4)

where $\gamma \in \Gamma$; $\Gamma \subset \mathbb{R}^p$ is a connected and bounded set; $T \subset \mathbb{R}^m$ is a compact set; $J : \mathbb{R}^n \times \Gamma \to \mathbb{R}$ and $g : \mathbb{R}^n \times T \times \Gamma \to \mathbb{R}$ are convex, differentiable functions with respect to $u$ for each $t \in T$ and $\gamma \in \Gamma$, and continuous with respect to $(t, \gamma) \in T \times \Gamma$ for each $u \in \mathbb{R}^n$.

We assume that for some $\gamma^0 \in \Gamma$ an approximate solution of $P(\gamma^0)$ is known. Then, for a given $\bar\gamma \in \Gamma$, or for several values $\bar\gamma_i \in \Gamma$ ($i = 1, \dots, s$), an optimal solution of $P(\bar\gamma)$, respectively of $P(\bar\gamma_1), P(\bar\gamma_2), \dots, P(\bar\gamma_s)$, is sought.

Of course, Problem $P(\bar\gamma)$ could be attacked at once. However, we expect it to be more efficient to determine the solution of $P(\bar\gamma)$ by following a certain path from $\gamma^0$ to $\bar\gamma$. As a by-product we will see that one can obtain approximate solutions of the problems $P(\gamma)$ at the chosen intermediate values $\gamma$, together with two-sided error bounds for these solutions.

Sometimes it may be useful to artificially introduce a family of problems $P(\gamma)$ in order to solve a SIP of type (6.2.1). In this case the family $P(\gamma)$ has to be constructed such that Problem (6.2.1) corresponds to the parameter value $\gamma := \bar\gamma$. This approach makes sense if the starting problem $P(\gamma^0)$ can be solved easily and the path-following procedure is efficient. Path-following or continuation strategies have been used in finite-dimensional non-linear programming, for instance, by Gfrerer et al. [127], Guddat et al. [152] and Gollmer et al. [138]. Of course, when applying fast convergent methods, the choice of the parameters $\gamma^1, \gamma^2, \dots$ and the successive solution of the arising problems should be performed such that the last iterate of the previous problem serves as a suitable starting point for the next problem. The structural dependence of the feasible and optimal sets on the parameters, which has to be taken into account along the path, can be classified according to the structural theory developed by Jongen et al. [200], Kojima [243] and Kojima and Hirabayashi [244].

Concerning parametric SIP, a similar strategy has been studied by Rupp [358]. As in finite optimization, the path-following idea is applied to the corresponding Karush-Kuhn-Tucker system, which changes with the active set. A theoretical analysis of parametric SIP can be found in Brosowski [55, 56].

The initial situation, in which a family of Problems (6.2.4) is given, may be essential for analyzing the parametric problem. Indeed, often the value of the final parameter $\bar\gamma$ in which we are interested is a priori unknown and only tentatively determined. Therefore, bounds on the solutions of the type (4.3.39), (4.3.40), given for the intermediate problems $P(\bar\gamma_i)$, enable us to specify the sought value of $\bar\gamma$ (cf. Theorem 6.3.7 below).

Naturally, in order to study IPR for path-following methods, we need some additional requirements on the family P(γ), which will be formulated below.

Next we prove some auxiliary statements for SIP under weaker assumptions on the smoothness of the functions involved than those required in (6.2.1) - (6.2.4).

6.2.5 Lemma. Assume that f : Rn → R is a convex, lsc function with domf ⊃ Br, where r > 0 is fixed. Then, for any v ∈ Br, the function

    f(u) + ‖u − v‖²

attains its unconstrained minimum on the set Bρ with

    ρ := 2r + (1/r)(β − f(0)),   β := max_{u∈Br} f(u).

Proof: For any subgradient q ∈ ∂f(0), q ≠ 0, it holds

    f(u) − f(0) ≥ ⟨q, u⟩    (6.2.5)

and for u := r q/‖q‖ we have

    r‖q‖ ≤ f(r q/‖q‖) − f(0) ≤ β − f(0).    (6.2.6)

We verify that

    f(u) + ‖u − v‖² > f(0) + ‖v‖²,   ∀ v ∈ Br and u ∉ Bρ.

Indeed, in view of (6.2.5) this inequality is obvious if

    ⟨q, u⟩ + ‖u − v‖² > ‖v‖².    (6.2.7)

Rewriting (6.2.7) in the form

    ‖u − v + (1/2)q‖² − ‖v − (1/2)q‖² > 0,


6.2. STABLE METHODS FOR ILL-POSED CONVEX SIP 177

we conclude that (6.2.7) is fulfilled if ‖u − v + (1/2)q‖² > ‖v − (1/2)q‖² and, hence, if ‖u‖ > 2‖v − (1/2)q‖ is satisfied. But, due to the choice of the radius ρ, the point v and relation (6.2.6), the latter inequality holds for u ∉ Bρ.
The case ∂f(0) = {0} is trivial.
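Lemma 6.2.5 localizes the unconstrained minimizer of the regularized function inside a computable ball. The following sketch checks the bound numerically on a one-dimensional instance; all data (f(u) = |u|, r = 1) are my own illustrative choices, not taken from the text.

```python
import numpy as np

# Illustrative data (assumed): f(u) = |u| is convex and lsc with dom f = R.
f = np.abs
r = 1.0
beta = 1.0                              # beta = max_{u in B_r} f(u)
rho = 2 * r + (beta - f(0.0)) / r       # radius from Lemma 6.2.5, here rho = 3

u = np.linspace(-10.0, 10.0, 200001)    # coarse surrogate for the real line
for v in np.linspace(-r, r, 41):        # any prox center v in B_r
    u_min = u[np.argmin(f(u) + (u - v) ** 2)]
    # the unconstrained minimizer must lie in B_rho
    assert abs(u_min) <= rho, (v, u_min)
print("all minimizers lie in B_rho with rho =", rho)
```

For this particular f the minimizer is even a soft-thresholding of v and stays inside Br; the lemma only guarantees the cruder radius ρ.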

6.2.6 Corollary. Let the hypotheses of Lemma 6.2.5 be satisfied and assume that f̄ : Rn → R is a convex, lsc function such that

    sup_{u∈Br} |f(u) − f̄(u)| ≤ σ < ∞.    (6.2.8)

Then, for any v ∈ Br, the function

    f̄(u) + ‖u − v‖²

attains its unconstrained minimum on the set Bρ with

    ρ := 2r + (1/r)(β − f(0) + 2σ).

Indeed, it suffices to remark that for any q ∈ ∂f̄(0), with regard to (6.2.8), the following estimate holds:

    r‖q‖ ≤ f̄(r q/‖q‖) − f̄(0) ≤ β − f(0) + 2σ.

Now, for convex, lsc functions f, g, f̄, ḡ, mapping from Rn into R, we consider the problems

    min{f(u) + ‖u − v‖² : g(u) ≤ 0}    (6.2.9)

and

    min{f̄(u) + ‖u − v‖² : ḡ(u) ≤ 0}.    (6.2.10)

If condition (6.2.14) below is true, then both problems are solvable for any v. The following estimates for the corresponding Lagrange multipliers λ(v) and λ̄(v) of these problems can be established.

6.2.7 Lemma. Suppose that for the functions involved in Problems (6.2.9) and (6.2.10) the following relations are fulfilled for a fixed radius r > 0 and β := max_{u∈∂Br} f(u):

    domf̄ ⊃ Bρ, with ρ := 2r + (1/r)(β − f(0) + 2σ),    (6.2.11)

    sup_{u∈Br} |f(u) − f̄(u)| ≤ σ,    (6.2.12)

    sup_{u∈Bρ} |f(u) − f̄(u)| ≤ σ̄,    (6.2.13)

    max[ min_{u∈Br} g(u), min_{u∈Br} ḡ(u) ] < −α < 0.    (6.2.14)


Then, for each v ∈ Br, the Lagrange multiplier λ̄(v) of Problem (6.2.10) can be estimated by

    λ̄(v) ≤ (1/α)( f(ū) + (‖ū‖ + r)² + σ + σ̄ + r² − c ) =: θ    (6.2.15)

with

    c := min_{v̄∈Rn} { f(v̄) + (1/2)‖v̄‖² },   ū ∈ Br,  ḡ(ū) < −α.

Proof: Let u*(v) be an optimal solution of Problem (6.2.10). Due to the saddle point inequality

    f̄(u) + ‖u − v‖² + λ̄(v)ḡ(u) ≥ f̄(u*(v)) + ‖u*(v) − v‖²,   ∀ u ∈ Rn,    (6.2.16)

we get from (6.2.14) and (6.2.16)

    f̄(ū) + ‖ū − v‖² − λ̄(v)α ≥ f̄(w*(v)) + ‖w*(v) − v‖²,

with w*(v) := arg min_{u∈Rn} { f̄(u) + ‖u − v‖² }.
In view of (6.2.12) and Corollary 6.2.6 the inclusion w*(v) ∈ Bρ is true and, with regard to (6.2.12) and (6.2.13),

    f(ū) + (‖ū‖ + r)² − λ̄(v)α + σ ≥ f(w*(v)) + ‖w*(v) − v‖² − σ̄
                                   ≥ min_{v̄∈Rn} { f(v̄) + ‖v̄ − v‖² } − σ̄.    (6.2.17)

Taking into account that the quadratic function

    (1/2)t² − 2t‖v‖ + ‖v‖²

attains its minimum at the point t := 2‖v‖, we get

    ‖v̄ − v‖² ≥ ( (1/2)‖v̄‖² − 2‖v̄‖‖v‖ + ‖v‖² ) + (1/2)‖v̄‖²
             ≥ −‖v‖² + (1/2)‖v̄‖² ≥ (1/2)‖v̄‖² − r².

This inequality together with (6.2.17) leads to

    f(ū) + (‖ū‖ + r)² − λ̄(v)α + σ ≥ min_{v̄∈Rn} { f(v̄) + (1/2)‖v̄‖² } − r² − σ̄,

and the statement of the lemma follows immediately.

6.2.8 Remark. It should be noted that the right-hand side in (6.2.15) does not depend on v. Moreover, it is obvious that for the multiplier of Problem (6.2.9) it holds

    λ(v) ≤ (1/α)( f(ū) + (‖ū‖ + r)² + r² − c ).

This estimate follows formally from Lemma 6.2.7, too, if we take f̄ := f and σ = σ̄ := 0.


6.2.9 Remark. If in Lemma 6.2.7

    ḡ(·) := sup_{1≤j≤m} gj(·),

with gj convex, lsc functions, then estimate (6.2.15) is true for each Lagrange multiplier λj(v) of the problem

    min{ f̄(u) + ‖u − v‖² : gj(u) ≤ 0, j = 1, ..., m };

moreover, ∑_{j=1}^m λj(v) ≤ θ with θ from (6.2.15).
Indeed, it suffices to use instead of (6.2.16) the inequality

    f̄(u) + ‖u − v‖² + ḡ(u) ∑_{j=1}^m λj(v) ≥ f̄(u) + ‖u − v‖² + ∑_{j=1}^m λj(v)gj(u)
                                            ≥ f̄(u*(v)) + ‖u*(v) − v‖².

For the further description denote by ‖·‖p the Euclidean norm in the space Rp (for p = n we write, as before, ‖·‖).

Now, let us return to the parameterized SIP (6.2.4) and estimate the Hausdorff distance between the feasible sets of the original and the discretized problem, where the latter is approximated by means of a finite grid Th. Suppose that positive numbers h, σ, r are given, Th is a finite h-grid on T and the vector γσ ∈ Γ is chosen such that ‖γσ − γ̄‖p ≤ σ.

6.2.10 Lemma. Suppose that for each t ∈ T, γ ∈ Γ the function g(·, t, γ) : Rn → R is convex and lsc. Assume further that domg(·, t, γ) ⊃ Br and that for arbitrary t, t′ ∈ T, γ, γ′ ∈ Γ and some constant L > 0 the inequality

    sup_{u∈Br} |g(u, t′, γ′) − g(u, t, γ)| ≤ L(‖t − t′‖m + ‖γ′ − γ‖p)    (6.2.18)

is fulfilled, and there exists a point ū ∈ Br with

    sup_{t∈T} g(ū, t, γ̄) ≤ −α < 0.

Then, for h + σ < α/(2L), the estimate

    ρH(Q, Qh) ≤ (4/α)L(h + σ)r    (6.2.19)

holds, where

    Q := {u ∈ Br : sup_{t∈T} g(u, t, γ̄) ≤ 0},    (6.2.20)

    Qh := {u ∈ Br : sup_{t∈Th} g(u, t, γσ) ≤ 0},    (6.2.21)

with ‖γσ − γ̄‖p ≤ σ.


Proof: Denote

    g̃(u) := max[ sup_{t∈T} g(u, t, γ̄), sup_{t∈Th} g(u, t, γσ) ],
    ḡ(u) := sup_{t∈Th} g(u, t, γσ),
    Q̃ := {u ∈ Br : g̃(u) ≤ 0}.

In this proof we always assume that u ∈ Br. Because g̃(u) ≥ ḡ(u) ∀ u, the inclusion Q̃ ⊂ Qh holds and it is obvious that

    ρH(Q, Qh) ≤ max[ ρH(Q, Q̃), ρH(Qh, Q̃) ].    (6.2.22)

Now we are going to estimate both distances ρH(Q, Q̃) and ρH(Qh, Q̃). Denote Σ := (T × {γ̄}) ∪ (Th × {γσ}), and for arbitrary u ∈ Br we choose a point (τ(u), γ(u)) ∈ Σ such that g̃(u) = g(u, τ(u), γ(u)) and take

    τh(u) := arg min_{t∈Th} ‖t − τ(u)‖m.

With regard to the definition of ḡ the inequality ḡ(u) ≥ g(u, τh(u), γσ) is satisfied and, due to (6.2.18),

    g̃(u) − ḡ(u) ≤ g(u, τ(u), γ(u)) − g(u, τh(u), γσ)
               ≤ L( ‖τ(u) − τh(u)‖m + ‖γ(u) − γσ‖p ) ≤ L(h + σ).    (6.2.23)

Furthermore, from (6.2.18) we get

    sup_{t∈Th} g(ū, t, γσ) − sup_{t∈Th} g(ū, t, γ̄) ≤ Lσ,

hence,

    sup_{t∈Th} g(ū, t, γσ) − sup_{t∈T} g(ū, t, γ̄) ≤ Lσ

and

    ḡ(ū) = sup_{t∈Th} g(ū, t, γσ) ≤ −α + Lσ < −α/2.

If ḡ(u) ≤ −L(h + σ), then g̃(u) ≤ 0 holds in view of (6.2.23), consequently u ∈ Q̃.
The case 0 ≥ ḡ(u) > −L(h + σ) leads to g̃(u) > −L(h + σ) and, because of h + σ < α/(2L), the inequality ḡ(ū) < ḡ(u) is fulfilled.
Now, due to the convexity of ḡ, for the point

    z := ū + ((α − 2L(h + σ))/α)(u − ū),    (6.2.24)

we have

    ḡ(z) ≤ (2L(h + σ)/α) ḡ(ū) + ((α − 2L(h + σ))/α) ḡ(u) ≤ −L(h + σ).    (6.2.25)

Because the points u and ū belong to Br, also z ∈ Br, and the inequalities (6.2.23), (6.2.25) imply g̃(z) ≤ 0, thus z ∈ Q̃. But u ∈ Qh and, due to (6.2.24),

    ‖z − u‖ = (2L(h + σ)/α)‖u − ū‖ ≤ (4L(h + σ)/α) r


holds true; consequently,

    ρH(Qh, Q̃) ≤ (4L(h + σ)/α) r.

Using instead of ḡ the function max_{t∈T} g(u, t, γ̄), we obtain analogously

    ρH(Q, Q̃) ≤ (4L(h + σ)/α) r.

6.2.11 Corollary. If g is independent of γ and h < α/L, then

    ρH(Q, Qh) ≤ (2Lh/α) r

holds and the inclusion Q ⊂ Qh is obvious.

This result follows immediately from the proof above if we take σ := 0 and z := ū + ((α − Lh)/α)(u − ū).

6.2.3 Regularized penalty methods for SIP

According to the convergence results for MSR-methods in Section 4.3, in the methods considered in the sequel a successive approximation of the SIP and a regularization of the approximate auxiliary problems is performed. However, in comparison with the abstract scheme, where it is supposed that the solution of each regularized auxiliary problem is computed up to a given accuracy, here only one step of a certain penalty method is carried out. This requires a coordinated choice between the penalty and controlling parameters used in the general scheme of MSR-methods.

The first two algorithms described below are suggested for solving convex SIP (6.2.1), and the third algorithm solves parametric SIP of type P(γ) (cf. (6.2.4)).

Suppose the function g in Problem (6.2.1) satisfies the following Lipschitz condition:

    sup_{u∈Br} |g(u, t) − g(u, t′)| ≤ Lt(r)‖t − t′‖m    (6.2.26)

with some Lt(r) < ∞ (for arbitrary r > 0) and t, t′ ∈ T. In the sequel we usually omit the radius r in the notation of the Lipschitz constant if it is clear from the context.

Choosing any finite subset T′ ⊂ T and any v ∈ Br, due to Remarks 6.2.8 and 6.2.9, the following estimate of the Lagrange multipliers {λ(v, t)}_{t∈T′} of the problem

    min{ J(u) + ‖u − v‖² : g(u, t) ≤ 0 ∀ t ∈ T′ }    (6.2.27)

holds:

    max_{t∈T′} λ(v, t) ≤ (1/α)( J(ū) + (‖ū‖ + r)² + r² − c ) =: c(r),    (6.2.28)

with ū a Slater point of Problem (6.2.1),

    −α ≥ sup_{t∈T} g(ū, t)    (6.2.29)


and c := min_{v∈Rn} { J(v) + (1/2)‖v‖² }.
Because the right-hand side in (6.2.28) does not depend on v and T′, this estimate is uniform with respect to v ∈ Br and finite subsets T′ ⊂ T.

Regarding Remark A3.4.44, if for a penalty parameter d it holds d > c(r), then Problem (6.2.27) is equivalent to the unconstrained minimization problem

    min_{u∈Rn} { J(u) + (d/2) ∑_{t∈T′} ( g(u, t) + |g(u, t)| ) + ‖u − v‖² }.    (6.2.30)
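The equivalence between (6.2.27) and (6.2.30) hinges on d exceeding the multiplier bound c(r); since (d/2)(g + |g|) = d·max(0, g), too small a d leaves the penalized minimizer infeasible. A one-dimensional illustration (all data assumed, not from the text: J(u) = u², a single constraint g(u) = 1 − u, v = 0, multiplier λ = 4):

```python
import numpy as np

# Assumed toy data: J(u) = u^2, g(u) = 1 - u, v = 0.  The regularized
# problem min { J(u) + (u - v)^2 : g(u) <= 0 } has solution u* = 1
# with Lagrange multiplier lambda = 4.
u = np.linspace(-2.0, 3.0, 500001)
J, g = u ** 2, 1.0 - u

def penalized_argmin(d):
    # exact penalty of (6.2.30): (d/2) * (g + |g|) = d * max(0, g)
    return u[np.argmin(J + (d / 2.0) * (g + np.abs(g)) + u ** 2)]

assert abs(penalized_argmin(6.0) - 1.0) < 1e-3   # d > lambda: constrained solution
assert penalized_argmin(2.0) < 0.6               # d < lambda: infeasible minimizer
```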

Now we turn to the description of solution methods for Problem (6.2.1). In order to control such methods, we need several parameters: a discretization parameter hk for defining the grid Tk, a termination parameter δk for stopping the interior loop of the MSR-method, an accuracy parameter εk for the inexact solution of the regularized auxiliary problems and a smoothing parameter τk for approximating the exact penalty function in (6.2.30).
We take sequences {hk}, {δk}, {εk} and {τk} of positive numbers satisfying

    lim_{k→∞} hk = lim_{k→∞} εk = lim_{k→∞} τk = 0,   hk < α/Lt,

and choose some radius r > 0 such that

    U* ∩ B_{r/8} ≠ ∅.    (6.2.31)

Solving SIP (6.2.1), in each of the following methods a minimizing sequence

    {uk,i : i = 0, ..., i(k), k = 1, 2, ...}

has to be generated and, as usual for MSR-methods, the termination values i(k) of the interior loops are defined within the iterations.

For a chosen family {Tk} of finite hk-grids on T we consider the approximations

    Kk := {u ∈ Rn : g(u, t) ≤ 0 ∀ t ∈ Tk},  k = 1, 2, ...,    (6.2.32)

of the feasible set (6.2.2) and introduce the functions

    ϕk(u) := (d/2) ∑_{t∈Tk} ( g(u, t) + √(g²(u, t) + τk) ),    (6.2.33)

    Fk,i(u) := J(u) + ϕk(u) + ‖u − uk,i−1‖²,    (6.2.34)

with d > c(r) and c(r) defined by (6.2.28).
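The term g + √(g² + τk) in (6.2.33) is a smooth approximation of the exact penalty term g + |g| from (6.2.30). Its error is uniformly bounded: 0 ≤ √(g² + τ) − |g| = τ/(√(g² + τ) + |g|) ≤ √τ, with the worst case at g = 0 — the elementary bound behind the estimate (d/2)√τk|Tk,i| in (6.3.11). A quick numerical confirmation:

```python
import numpy as np

tau = 1e-4
g = np.linspace(-5.0, 5.0, 100001)                    # sample constraint values
err = (g + np.sqrt(g ** 2 + tau)) - (g + np.abs(g))   # smoothing error per point

assert np.all(err >= 0.0)                             # smoothing overestimates
assert np.all(err <= np.sqrt(tau) + 1e-15)            # error at most sqrt(tau)
assert abs(err[g == 0.0][0] - np.sqrt(tau)) < 1e-15   # worst case at g = 0
```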

6.2.12 Algorithm.

S0: Set k := 1, i := 0, i(1) := 1 and choose u1,0 ∈ K1 ∩ B_{r/4}.

S1: If i < i(k), set i := i + 1;
    if i = i(k), set uk+1,0 := uk,i(k), k := k + 1, i := 1.

S2: Compute uk,i such that ‖∇Fk,i(uk,i)‖ ≤ εk.

S3: If ‖uk,i − uk,i−1‖ > δk, set i(k) := i + 1 and go to S1;
    otherwise set i(k) := i and go to S1.
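A minimal, self-contained sketch of Algorithm 6.2.12 on a toy SIP (all data assumed for illustration, not from the text: J(u) = u², g(u, t) = t − u on T = [0, 1], whose solution is u* = 1 with multiplier 2, so any d > 2 works). Since the problem is one-dimensional and Fk,i is convex, the inner condition ‖∇Fk,i(uk,i)‖ ≤ εk is realized here by bisection on the derivative:

```python
import numpy as np

# Assumed toy SIP: min J(u) = u^2  s.t.  g(u, t) = t - u <= 0 for all t in [0, 1].
# Solution u* = 1 (active constraint t = 1, multiplier 2); penalty parameter d > 2.
d = 5.0

def dF(u, u_prev, grid, tau):
    # derivative of F(u) = J(u) + (d/2)*sum_t (g + sqrt(g^2 + tau)) + (u - u_prev)^2
    g = grid - u
    return 2.0 * u + (d / 2.0) * np.sum(-1.0 - g / np.sqrt(g ** 2 + tau)) \
           + 2.0 * (u - u_prev)

u_prev = 0.0                                  # u_{1,0}, feasible start
for k in range(1, 6):                         # exterior loop: h_k, tau_k, delta_k -> 0
    h, tau, delta = 0.5 ** k, 0.1 ** k, 0.5 ** (k + 2)
    grid = np.arange(0.0, 1.0 + 1e-12, h)     # finite h_k-grid T_k
    while True:                               # interior (prox) loop
        lo, hi = -5.0, 5.0                    # dF is increasing: bisect its root
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if dF(mid, u_prev, grid, tau) < 0.0:
                lo = mid
            else:
                hi = mid
        u = 0.5 * (lo + hi)
        small_step = abs(u - u_prev) <= delta # stopping rule of step S3
        u_prev = u                            # next prox center u_{k,i-1}
        if small_step:
            break

print("approximate solution:", u_prev)        # close to u* = 1
```

The last iterate of each interior loop serves as the starting point of the next level, exactly as in step S1; for this instance the iterates approach u* = 1 up to a small penalty-smoothing bias of order √τk.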


In this method each iterate is computed at the k-th approximation level with regard to all constraints describing the set Kk.

Now we are going to describe a modified algorithm possessing an adaptive deleting procedure for the constraints describing the set Kk. For this modification we additionally need a Lipschitz constant Lu such that

    sup_{t∈T} |g(u, t) − g(u′, t)| ≤ Lu‖u − u′‖,   ∀ u, u′ ∈ Br.    (6.2.35)

Let q(r) := (2/α)Lt r with Lt from (6.2.26), and just as in the general scheme (cf. (4.1.3)) we assume that

    sup_{u∈Br} ‖∇J(u)‖ ≤ L(r)    (6.2.36)

with a given non-decreasing function L(r) : R+ → R+.
For subsets Tk,i ⊂ Tk specified within the following algorithm, we define the sets

    Kk,i := {u ∈ Rn : g(u, t) ≤ 0 ∀ t ∈ Tk,i},  i = 1, ..., i(k), k = 1, 2, ...,    (6.2.37)

and the functions

    ϕk,i(u) := (d/2) ∑_{t∈Tk,i} ( g(u, t) + √(g²(u, t) + τk) ),   (d > c(r)),

    Fk,i(u) := J(u) + ϕk,i(u) + ‖u − uk,i−1‖².

6.2.13 Algorithm.

S0: Set k := 1, i := 1, T1,1 := T1 and choose u1,0 ∈ K1 ∩ B_{r/4}.
    Compute u1,1 such that ‖∇F1,1(u1,1)‖ ≤ ε1 and go to S3.

S1: Determine ε̄k,i := εk/2 + τk^{1/4} √( (d/2)|Tk,i| ).
    If i < i(k) for the given k, generate

        Tk,i+1 := { t ∈ Tk : g(uk,i, t) + Lu( ‖uk,i − uk,i−1‖ + ε̄k,i ) ≥ 0 }

    and set i := i + 1;
    if i = i(k), set

        uk+1,0 := uk,i(k),
        ε̂k+1,0 := √( (L(r) + 4r)q(r)hk ) + q(r)hk + ε̄k,i(k),
        Tk+1,1 := { t ∈ Tk+1 : g(uk+1,0, t) + Lu( ‖uk+1,0 − uk,i(k)−1‖ + ε̂k+1,0 ) ≥ 0 }

    and set k := k + 1, i := 1.

S2: Compute uk,i such that ‖∇Fk,i(uk,i)‖ ≤ εk.

S3: If ‖uk,i − uk,i−1‖ > δk, set i(k) := i + 1 and go to S1;
    otherwise set i(k) := i and go to S1.


The constants ε̄k,i and ε̂k+1,0 serve as deleting levels for the interior and exterior loops, respectively.

According to the general scheme in Subsection 4.3.2 denote

    Q := K ∩ Br,   Qk := Kk ∩ Br,  k = 1, 2, ....

6.2.14 Remark. In view of Tk ⊂ T we have an outer approximation Q ⊂ Qk for all k, and Corollary 6.2.11 provides, for k = 1, 2, ...,

    ρH(Qk, Q) ≤ q(r)hk,

    ρH(Qk, Qk+1) ≤ q(r) max[hk, hk+1].

However, a similar estimate for ρH(Q, Kk,i ∩ Br) does not hold; moreover, the sequence {Kk,i ∩ Br} does not converge to Q in general. ♦

The methods suggested below for solving Problem P(γ̄) will be considered under the following general assumptions on the family of Problems P(γ).

6.2.15 Assumption.

(i) The functions J(·, γ) and g(·, t, γ) are convex and belong to C1(Rn) for each γ ∈ Γ and each t ∈ T.

Starting with some r0 > 0, for r ≥ r0:

(ii) some solutions of P(γ0) and P(γ̄) belong to Br and

(iii) there exists ū ∈ Br and a constant a > 0 with sup_{t∈T} g(ū, t, γ̄) ≤ −a.

The following conditions are fulfilled for each r > 0:

(iv) sup_{u∈Br} ‖∇J(u, γ)‖ ≤ L(r) for a given function L(r) : R+ → R+;

(v) there is a constant L^J_r with

    sup_{u∈Br} |J(u, γ) − J(u, γ̄)| ≤ L^J_r ‖γ − γ̄‖p   ∀ γ ∈ Γ;

(vi) there is a constant L^g_r such that for all t, t′ ∈ T, γ ∈ Γ

    sup_{u∈Br} |g(u, t, γ) − g(u, t′, γ̄)| ≤ L^g_r ( ‖t − t′‖m + ‖γ − γ̄‖p );

(vii) for every u, u′ ∈ Br and some constant L̄^g_r

    sup_{t∈T} sup_{γ∈Γ} |g(u, t, γ) − g(u′, t, γ)| ≤ L̄^g_r ‖u − u′‖.

Now, let r ≥ r0 be chosen such that

    U*(γ̄) ∩ B_{r/8} ≠ ∅,   u1,0 ∈ B_{r/4},    (6.2.38)

with U*(γ̄) the optimal set of P(γ̄) and u1,0 := u*(γ0) a known solution of P(γ0).


In order to control the iteration procedure, we choose sequences {hk}, {δk}, {εk} and {τk} of positive numbers and a non-negative sequence {σk} such that

    lim_{k→∞} hk = lim_{k→∞} σk = lim_{k→∞} εk = lim_{k→∞} τk = 0,

    hk + σk < a/(2L^g_r),  k = 1, 2, ...,    (6.2.39)

where L^g_r is given according to Assumption 6.2.15(vi). For an arbitrary finite subset T′ ⊂ T and γk ∈ Γ such that ‖γk − γ̄‖p ≤ σk, the estimate

    sup_{t∈T′} g(ū, t, γk) ≤ −a + L^g_r σk < −a/2

was established in the proof of Lemma 6.2.10. Thus, if v ∈ Br, the assumptions of Lemma 6.2.7 are satisfied for the pair of problems

    min{ J(u, γ̄) + ‖u − v‖² : g(u, t, γ̄) ≤ 0 ∀ t ∈ T }

and

    min{ J(u, γk) + ‖u − v‖² : g(u, t, γk) ≤ 0 ∀ t ∈ T′ }    (6.2.40)

with

    f := J(·, γ̄),  f̄ := J(·, γk),  g := sup_{t∈T} g(·, t, γ̄),  ḡ := sup_{t∈T′} g(·, t, γk),

    α := a/2,  σ := L^J_r sup_k σk,  σ̄ := L^J_ρ sup_k σk,    (6.2.41)

and ρ defined by (6.2.11).

6.2.16 Remark. Taking into account Remark 6.2.9, the Lagrange multipliers which correspond to Problem (6.2.40) can be estimated by (6.2.15) uniformly with respect to v ∈ Br as well as to the iteration step k and the subset T′ ⊂ T. ♦

Now we are going to describe an MSR-method for Problem P(γ̄) working with a deleting rule for the discretized constraint set. In the sequel the penalty parameter d is chosen such that d > θ, where θ is calculated according to (6.2.11)–(6.2.15) with the data from (6.2.41).
In the k-th step we fix a finite hk-grid Tk on the compact set T and a parameter γk ∈ Γ such that ‖γk − γ̄‖p ≤ σk. Then the sets

    Kk := {u ∈ Rn : g(u, t, γk) ≤ 0 ∀ t ∈ Tk},  k = 1, 2, ...,    (6.2.42)

can be constructed.
Analogously to Algorithm 6.2.13, for subsets Tk,i ⊂ Tk described below, we define the sets

    Kk,i := {u ∈ Rn : g(u, t, γk) ≤ 0 ∀ t ∈ Tk,i}    (6.2.43)

and the functions

    ϕk,i(u) := (d/2) ∑_{t∈Tk,i} ( g(u, t, γk) + √(g²(u, t, γk) + τk) ),    (6.2.44)

    Fk,i(u) := J(u, γk) + ϕk,i(u) + ‖u − uk,i−1‖²    (6.2.45)


and set

    q(r) := (4/a) L^g_r r.    (6.2.46)

Now, starting with a known solution u*(γ0) of P(γ0), a minimizing sequence

    {uk,i : i = 0, ..., i(k), k = 1, 2, ...}

will be generated which solves P(γ̄).

6.2.17 Algorithm.

S0: Set k := 1, i := 1, T1,1 := T1, u1,0 := u*(γ0).
    Compute u1,1 such that ‖∇F1,1(u1,1)‖ ≤ ε1 and go to S3.

S1: Determine ε̄k,i := εk/2 + τk^{1/4} √( (d/2)|Tk,i| ).
    If i < i(k) for the given k, generate

        Tk,i+1 := { t ∈ Tk : g(uk,i, t, γk) + L̄^g_r( ‖uk,i − uk,i−1‖ + 2√(L^J_r σk) + ε̄k,i ) ≥ 0 }

    and set i := i + 1;
    if i = i(k), take

        uk+1,0 := uk,i(k),
        ε̂k+1,0 := √( 2(L(r) + 4r)q(r)(hk + σk) ) + 2q(r)(hk + σk) + 4√(L^J_r σk) + ε̄k,i(k),
        Tk+1,1 := { t ∈ Tk+1 : g(uk+1,0, t, γk+1) + L̄^g_r( ‖uk+1,0 − uk,i(k)−1‖ + ε̂k+1,0 ) ≥ 0 }

    and set k := k + 1, i := 1.

S2: Compute uk,i such that ‖∇Fk,i(uk,i)‖ ≤ εk.

S3: If ‖uk,i − uk,i−1‖ > δk, set i(k) := i + 1 and go to S1;
    otherwise set i(k) := i and go to S1.

As we will see in the next section, the feasibility of u1,0 for P(γ0) is required only in the convergence proof. However, for the efficiency of the method it is essential that u1,0 is a good approximation of the solution u*(γ0).

Algorithms 6.2.12 – 6.2.17 and the convergence results described in the next section can easily be generalized to problems having a finite number of semi-infinite constraints.

6.3 Convergence of Regularized Penalty Methods for SIP

6.3.1 Basic theorems on convergence

In principle, in case the functions J and g do not depend on the parameterγ, Algorithm 6.2.13 can be considered as a particular version of Algorithm6.2.17. However, the iterates of both algorithms coincide only under a special


coordination of the Lipschitz constants and the controlling parameters, and such an artificially forced coordination makes the iteration procedure more expensive in general.

The investigation of the convergence of Algorithm 6.2.17, carried out below, enables us also to establish convergence of Algorithm 6.2.13 by means of simple modifications. For the study of Algorithm 6.2.17 we need the following auxiliary statements.

Suppose Assumption 6.2.15 is fulfilled and the radius r ≥ r0 is fixed. Denote

    J(u) := J(u, γ̄),   Jk(u) := J(u, γk),    (6.3.1)

and

    Q := {u ∈ Br : sup_{t∈T} g(u, t, γ̄) ≤ 0},
    Qk := {u ∈ Br : sup_{t∈Tk} g(u, t, γk) ≤ 0},  k = 1, 2, ....    (6.3.2)

The inequalities (6.2.39) and Lemma 6.2.10 ensure that

    ρH(Q, Qk) ≤ q(r)(hk + σk)    (6.3.3)

and, if the parameters decrease monotonically, i.e. hk+1 ≤ hk, σk+1 ≤ σk, then

    ρH(Qk+1, Qk) ≤ 2q(r)(hk + σk).    (6.3.4)

6.3.1 Lemma. Suppose that Assumption 6.2.15 is satisfied and

    hk+1 ≤ hk,  σk+1 ≤ σk  for a given k.

Moreover, assume that for fixed v0 ∈ Rn, βk > 0 the points v1, ṽ1, v2, v̄2 are chosen such that

    v1 := Prox_{Jk,Qk} v0,   ‖v1 − ṽ1‖ ≤ βk,  ṽ1 ∈ Br,
    v2 := Prox_{Jk+1,Qk} v1,
    v̄2 := Prox_{Jk+1,Qk+1} v1.

Then the following estimates hold true:

    ‖v2 − ṽ1‖ ≤ ‖v1 − v0‖ + 2√(L^J_r σk) + βk,    (6.3.5)

    ‖v̄2 − ṽ1‖ < ‖v1 − v0‖ + β*_k,    (6.3.6)

with

    β*_k := √( 2(L(r) + 4r)q(r)(hk + σk) ) + 2q(r)(hk + σk) + 4√(L^J_r σk) + βk.

Proof: Let w1 := Prox_{Jk+1,Qk} v0. In view of the non-expansivity of the prox-mapping we get ‖v2 − w1‖ ≤ ‖v1 − v0‖. With regard to σk+1 ≤ σk and Assumption 6.2.15(v),

    −2L^J_r σk + Jk(w1) + ‖w1 − v0‖² ≤ Jk+1(w1) + ‖w1 − v0‖²
                                      ≤ Jk+1(v1) + ‖v1 − v0‖²
                                      ≤ Jk(v1) + ‖v1 − v0‖² + 2L^J_r σk,

hence

    Jk(w1) + ‖w1 − v0‖² ≤ Jk(v1) + ‖v1 − v0‖² + 4L^J_r σk.

Due to the strong convexity of Jk(·) + ‖· − v0‖² and the choice of v1 it follows immediately that

    ‖w1 − v1‖ ≤ 2√(L^J_r σk).

Consequently, using the triangle inequality

    ‖v2 − ṽ1‖ ≤ ‖v2 − w1‖ + ‖w1 − v1‖ + ‖v1 − ṽ1‖,

we obtain estimate (6.3.5).
Denote Ψk+1(u) := Jk+1(u) + ‖u − v1‖². Now, if Ψk+1(v2) ≥ Ψk+1(v̄2), then

    Ψk+1(w0) ≥ Ψk+1(v2) ≥ Ψk+1(v̄2),    (6.3.7)

with w0 := arg min_{v∈Qk} ‖v − v̄2‖.
Furthermore, denote Ψ̃(u) := J(u) + ‖u − v1‖² and c0 := 2(L(r) + 4r)q(r). Because of σk+1 ≤ σk as well as Assumption 6.2.15(iv),(v) and (6.3.4), we conclude that

    Ψk+1(w0) − Ψk+1(v̄2) ≤ Ψ̃(w0) − Ψ̃(v̄2) + 2L^J_r σk+1
                         ≤ 2(L(r) + 4r)q(r)(hk + σk) + 2L^J_r σk
                         = c0(hk + σk) + 2L^J_r σk.

This inequality together with (6.3.7) leads to

    Ψk+1(w0) − Ψk+1(v2) ≤ c0(hk + σk) + 2L^J_r σk,

hence,

    ‖w0 − v2‖ ≤ √( c0(hk + σk) + 2L^J_r σk ).    (6.3.8)

Due to (6.3.4), (6.3.5) and (6.3.8) we finally get

    ‖v̄2 − ṽ1‖ ≤ ‖v̄2 − w0‖ + ‖w0 − v2‖ + ‖v2 − ṽ1‖
              ≤ 2q(r)(hk + σk) + √( c0(hk + σk) + 2L^J_r σk ) + 2√(L^J_r σk) + βk + ‖v1 − v0‖
              ≤ 2q(r)(hk + σk) + √( c0(hk + σk) ) + 4√(L^J_r σk) + βk + ‖v1 − v0‖,

i.e., estimate (6.3.6) is satisfied.
But if the reverse inequality Ψk+1(v̄2) > Ψk+1(v2) holds, then we get

    Ψk+1(v2) < Ψk+1(v̄2) ≤ Ψk+1(w̄0),

with w̄0 := arg min_{u∈Qk+1} ‖u − v2‖, and acting as in the previous case, we again obtain (6.3.6).
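The key tool in the proof above is the non-expansivity of the prox-mapping v ↦ Prox_{f,C} v = arg min_{u∈C} { f(u) + ‖u − v‖² }. This property is easy to observe numerically; the sketch below uses my own illustrative data (f(u) = |u|, C = [−1, 2]) and a grid minimizer:

```python
import numpy as np

C = np.linspace(-1.0, 2.0, 300001)       # discretized feasible set (assumed)
f = np.abs                                # convex test function (assumed)

def prox(v):
    # Prox_{f,C} v = argmin_{u in C} f(u) + (u - v)^2
    return C[np.argmin(f(C) + (C - v) ** 2)]

rng = np.random.default_rng(0)
for _ in range(200):
    x, y = rng.uniform(-5.0, 5.0, size=2)
    # non-expansivity: |Prox x - Prox y| <= |x - y| (up to grid resolution)
    assert abs(prox(x) - prox(y)) <= abs(x - y) + 1e-4
```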

Now we consider the functions

    F̄k,i(u) := Jk(u) + (d/2) ∑_{t∈Tk,i} ( g(u, t, γk) + |g(u, t, γk)| ) + ‖u − uk,i−1‖²,

with d the same penalty parameter as in (6.2.44), and denote

    ūk,i := arg min_{u∈Rn} F̄k,i(u).

If uk,i−1 ∈ Br and Assumption 6.2.15 is fulfilled, then, due to Remarks 6.2.16 and A3.4.44, the identity

    ūk,i = arg min_{u∈Kk,i} Ψk,i(u)    (6.3.9)

holds with

    Ψk,i(u) := Jk(u) + ‖u − uk,i−1‖².    (6.3.10)

Moreover, for each u ∈ Rn and each pair of indices (k, i) we get

    0 ≤ Fk,i(u) − F̄k,i(u)
      = (d/2) ∑_{t∈Tk,i} ( √(g²(u, t, γk) + τk) − |g(u, t, γk)| )
      = (d/2) ∑_{t∈Tk,i} τk / ( √(g²(u, t, γk) + τk) + |g(u, t, γk)| )
      ≤ (d/2) √τk |Tk,i|.    (6.3.11)

In view of the choice of uk,i and the relations (6.3.9) and (6.3.11) it holds that

    ‖uk,i − ūk,i‖ < ε̄k,i    (6.3.12)

if ūk,i ∈ Br. Indeed, in view of (6.3.9) we have ūk,i ∈ Kk,i, consequently,

    F̄k,i(ūk,i) = Ψk,i(ūk,i).

For wk,i := arg min_{u∈Rn} Fk,i(u), using (6.3.11) and the obvious relations

    F̄k,i(ūk,i) ≤ F̄k,i(wk,i) < Fk,i(wk,i),

we obtain

    Fk,i(ūk,i) − Fk,i(wk,i) ≤ F̄k,i(ūk,i) − Fk,i(wk,i) + (d/2)√τk |Tk,i|
                            < (d/2)√τk |Tk,i|.

Taking into account the strong convexity of Fk,i(·) and the definition of wk,i, the latter inequality leads to

    ‖ūk,i − wk,i‖ < τk^{1/4} √( (d/2)|Tk,i| ),

and (A1.5.30) implies

    ‖uk,i − wk,i‖ ≤ εk/2,

proving (6.3.12).

Now, we are able to formulate the main result on convergence of Algorithm6.2.17.


6.3.2 Theorem. Let Assumption 6.2.15 and the relations (6.2.38), (6.2.39) be satisfied and suppose that

    hk+1 ≤ hk,  σk+1 ≤ σk  for each k.

Moreover, assume that

    (1/(4r)) ( 2L(r)q(r)(hk + σk) + 2L^J_r σk − (δk − ε̄k)² ) + ε̄k < 0    (6.3.13)

and

    ∑_{k=1}^∞ ( √( 2L(r)q(r)(hk + σk) + 2L^J_r σk ) + 2q(r)(hk + σk) + ε̄k ) < r/2    (6.3.14)

are fulfilled with ε̄k := τk^{1/4} √( (d/2)|Tk| ) + εk/2. Then it holds:

(i) i(k) < ∞  ∀ k;

(ii) ‖uk,i‖ < r  ∀ (k, i);

(iii) uk,i → u** ∈ U*(γ̄).

Proof: Suppose that k0 and i0 are fixed with 1 ≤ i0 ≤ i(k0) and that

(1) i(k) < ∞ for k < k0;

(2) the relations

    ūk,i := Prox_{Jk,Qk} uk,i−1,   ‖uk,i‖ < r,  ‖ūk,i‖ < r

are true for the pairs of indices

    (k, i) ∈ Θ0 := {(k′, i′) : k′ < k0, 1 ≤ i′ ≤ i(k′)} ∪ {(k0, i′) : 1 ≤ i′ < i0}.

In order to avoid repetitions, the conditions for the starting position u1,1, ū1,1 and for i(1) will be verified simultaneously within the main part of the proof by induction.
Obviously ε̄k,i ≤ ε̄k holds for all (k, i), and (6.3.13), (6.3.14) imply

    δk − ε̄k > √(r ε̄k) > 0.

Now, let u* be an arbitrary point of the set U*(γ̄) ∩ Br and

    ūk,i := arg min_{u∈Qk} Ψk,i(u)  for 0 < i ≤ i(k), k = 1, 2, ...

(formally, at the moment we are not yet convinced that i(k − 1) < ∞). Due to inequality (6.3.3), a point vk,i ∈ Q can be chosen such that

    ‖vk,i − ūk,i‖ ≤ q(r)(hk + σk),  ∀ (k, i) ∈ Θ0.    (6.3.15)

From Assumption 6.2.15(iv) we conclude that

    J(vk,i) − J(ūk,i) ≤ L(r)q(r)(hk + σk)


and, with regard to J(u*) ≤ J(vk,i),

    J(u*) − J(ūk,i) ≤ L(r)q(r)(hk + σk).    (6.3.16)

Let

    vk := arg min_{v∈Qk} ‖v − u*‖.    (6.3.17)

Then

    J(vk) − J(u*) ≤ L(r)q(r)(hk + σk)

holds, hence,

    J(vk) − J(ūk,i) ≤ 2L(r)q(r)(hk + σk)

and

    Jk(vk) − Jk(ūk,i) ≤ 2L(r)q(r)(hk + σk) + 2L^J_r σk =: ηk.

Now, exploiting Proposition 3.1.3 with C := Qk, f := Jk, v0 := uk,i−1, y := vk, we obtain

    ‖ūk,i − vk‖ ≤ ‖uk,i−1 − vk‖ + √ηk,    (6.3.18)

therefore,

    ‖ūk,i − u*‖ ≤ ‖uk,i−1 − u*‖ + √ηk + 2q(r)(hk + σk),  ∀ (k, i) ∈ Θ0.    (6.3.19)

Further, in view of assumption (2) and (6.3.12), the minimizer ūk,i from (2) coincides with the one in (6.3.9) and

    ‖uk,i − uk,i−1‖ > ‖ūk,i − uk,i−1‖ − ε̄k,i ≥ ‖ūk,i − uk,i−1‖ − ε̄k,  ∀ (k, i) ∈ Θ0.

From (6.3.13) it is obvious that ηk < (δk − ε̄k)²; hence Proposition 3.1.3, applied with the same data, leads to

    ‖ūk,i − vk‖ < ‖uk,i−1 − vk‖ + (1/(4r)) ( ηk − (δk − ε̄k)² ),  (k, i) ∈ Θ0, 1 ≤ i < i(k),

and, if k < k0,

    ‖ūk,i(k) − vk‖ ≤ ‖uk,i(k)−1 − vk‖ + √ηk.

Consequently,

    ‖uk,i − vk‖ ≤ ‖uk,i−1 − vk‖ + (1/(4r)) ( ηk − (δk − ε̄k)² ) + ε̄k
                < ‖uk,i−1 − vk‖,  ∀ (k, i) ∈ Θ0, i < i(k),    (6.3.20)

but in the case k < k0, i = i(k) the estimate

    ‖uk,i(k) − vk‖ ≤ ‖uk,i(k)−1 − vk‖ + √ηk + ε̄k    (6.3.21)

is satisfied.
Summing up the inequalities (6.3.20) and (6.3.21) for fixed k < k0 with respect to i = 1, ..., i(k), we obtain

    ‖uk+1,0 − vk‖ = ‖uk,i(k) − vk‖ ≤ ‖uk,0 − vk‖ + √ηk + ε̄k.


Hence, if k ≤ k0, the relation

    ‖uk,0 − u*‖ ≤ ‖u1,0 − u*‖ + ∑_{s=1}^{k−1} ωs    (6.3.22)

is fulfilled with ωs := √ηs + ε̄s + 2q(r)(hs + σs). Using (6.3.20) with k = k0, we conclude that

    ‖uk0,i0−1 − vk0‖ ≤ ‖uk0,0 − vk0‖,

and this together with (6.3.18) leads to

    ‖ūk0,i0 − vk0‖ ≤ ‖uk0,0 − vk0‖ + √ηk0,

thus

    ‖ūk0,i0 − u*‖ ≤ ‖uk0,0 − u*‖ + √ηk0 + 2q(r)(hk0 + σk0).    (6.3.23)

Now, (6.3.22) and (6.3.23) provide

    ‖ūk0,i0 − u*‖ ≤ ‖u1,0 − u*‖ + ∑_{s=1}^{k0−1} ωs + √ηk0 + 2q(r)(hk0 + σk0).    (6.3.24)

We recall that u* is an arbitrarily chosen point in U*(γ̄) ∩ Br. In view of (6.2.38), (6.3.14) and (6.3.24), taking for the moment u* ∈ U*(γ̄) ∩ B_{r/8}, we obtain ‖ūk0,i0‖ < r − ε̄k0.
In the sequel the cases i0 ≥ 2 and i0 = 1 will be investigated separately.

If i0 ≥ 2, the first statement of Lemma 6.3.1, used with the data

    k := k0,  v0 := uk0,i0−2,  v1 := ūk0,i0−1,
    ṽ1 := uk0,i0−1,  βk0 := ε̄k0,i0−1,  v2 := ūk0,i0,

implies

    ‖ūk0,i0 − uk0,i0−1‖ ≤ ‖ūk0,i0−1 − uk0,i0−2‖ + 2√(L^J_r σk0) + ε̄k0,i0−1.

With regard to Assumption 6.2.15(vii) it holds that

    g(ūk0,i0, t, γk0) ≤ g(uk0,i0−1, t, γk0)
                       + L̄^g_r ( ‖uk0,i0−1 − uk0,i0−2‖ + 2√(L^J_r σk0) + ε̄k0,i0−1 ),  ∀ t ∈ T;

consequently, from the definition of Tk,i in Algorithm 6.2.17 we get

    g(ūk0,i0, t, γk0) < 0  ∀ t ∈ Tk0 \ Tk0,i0.

This implies that ūk0,i0 coincides with the minimizer in (6.3.9) and that ‖ūk0,i0‖ < r − ε̄k0; then (6.3.12) yields ‖uk0,i0‖ < r.
The case i0 = 1 can be analyzed analogously by means of inequality (6.3.6) if we take in Lemma 6.3.1

    k := k0 − 1,  v0 := uk0−1,i(k0−1)−1,  v1 := ūk0−1,i(k0−1),
    ṽ1 := uk0−1,i(k0−1),  β*_{k0−1} := ε̂k0,0,  v̄2 := ūk0,1.


Thus, condition (2) can be continued to k := k0 and each i.
Now it is not difficult to verify that i(k0) < ∞. Indeed, using (6.3.20), we obtain for k = k0 and i < i(k0)

    ‖uk0,i − vk0‖ ≤ ‖uk0,0 − vk0‖ + i ( (1/(4r)) ( ηk0 − (δk0 − ε̄k0)² ) + ε̄k0 ),

hence

    i(k0) ≤ ‖uk0,0 − vk0‖ / ( −(1/(4r)) ( ηk0 − (δk0 − ε̄k0)² ) − ε̄k0 ) + 1.

In order to complete the induction, we note that due to (6.3.14) the estimate ‖ū1,1‖ < r − ε̄1 is satisfied for the starting position, and so is inequality (6.3.19) with k = 1, i = 1. Consequently, the identity

    ū1,1 = arg min_{u∈K1} { J1(u) + ‖u − u1,0‖² }

holds and, because of (6.3.9) and K1,1 = K1, this ensures that ū1,1 coincides with the minimizer in (6.3.9). Finally, from ‖u1,1 − ū1,1‖ ≤ ε̄1,1 we conclude that ‖u1,1‖ < r.
Therefore, ‖u1,i‖ < r holds for all i, and the finiteness of i(1) can be established as in the general scheme of the MSR-method in Section 4.3.1.

Now, let us prove convergence of the sequence uk,i (cf. also the proof ofTheorem 4.3.6).

For u_Q^{k,i} := arg min_{u∈Q} Ψ̃k,i(u) with

    Ψ̃k,i(u) := J(u) + ‖u − uk,i−1‖²,

and J(·) given in (6.3.1), the inequality

    J(u_Q^{k,i+1}) − J(u*) ≤ 2⟨u_Q^{k,i+1} − uk,i, u* − u_Q^{k,i+1}⟩

follows from Proposition A1.5.34, hence,

    ‖uk,i − u*‖² − ‖u_Q^{k,i+1} − u*‖² ≥ J(u_Q^{k,i+1}) − J(u*) + ‖u_Q^{k,i+1} − uk,i‖².    (6.3.25)

But, due to Assumption 6.2.15(iv),(v) and estimate (6.3.3), the inequalities

    min_{u∈Qk} Ψk,i+1(u) − min_{u∈Q} Ψ̃k,i+1(u) ≤ (L(r) + 4r)q(r)(hk + σk)

and (cf. (6.3.10))

    Ψ̃k,i+1(ūk,i+1) − Ψ̃k,i+1(u_Q^{k,i+1}) ≤ (L(r) + 4r)q(r)(hk + σk) + L^J_r σk    (6.3.26)

hold. For ũk,i+1 := arg min_{v∈Q} ‖v − ūk,i+1‖ we have

    Ψ̃k,i+1(ũk,i+1) − Ψ̃k,i+1(ūk,i+1) ≤ (L(r) + 4r)q(r)(hk + σk) + L^J_r σk,

and the latter inequality together with (6.3.26) leads to

    Ψ̃k,i+1(ũk,i+1) − Ψ̃k,i+1(u_Q^{k,i+1}) ≤ 2(L(r) + 4r)q(r)(hk + σk) + 2L^J_r σk.


Strong convexity of Ψ̃k,i+1(·) implies

    ‖ũk,i+1 − u_Q^{k,i+1}‖ ≤ √( 2(L(r) + 4r)q(r)(hk + σk) + 2L^J_r σk )    (6.3.27)

and

    ‖uk,i+1 − u_Q^{k,i+1}‖ ≤ √( 2(L(r) + 4r)q(r)(hk + σk) + 2L^J_r σk )
                            + q(r)(hk + σk) + ε̄k =: θ̄k.    (6.3.28)

Because ‖u^{k,i}‖ < r for all k and i ≤ i(k), we obtain

\[
\|u^{k,i+1} - u^*\|^2 - \|u_Q^{k,i+1} - u^*\|^2 \le 4r\theta_k,
\]

and together with (6.3.25),

\[
\|u^{k,i} - u^*\|^2 - \|u^{k,i+1} - u^*\|^2 \ge \|u_Q^{k,i+1} - u^{k,i}\|^2 + J(u_Q^{k,i+1}) - J(u^*) - 4r\theta_k.
\]

In particular, for i = i(k) − 1 the inequality

\[
\|u^{k,i(k)-1} - u^*\|^2 - \|u^{k+1,0} - u^*\|^2 \ge \|u_Q^{k+1,0} - u^{k,i(k)-1}\|^2 + J(u_Q^{k+1,0}) - J(u^*) - 4r\theta_k \tag{6.3.29}
\]

is satisfied. But, for 0 < i < i(k), estimate (6.3.20) and the choice of v^k in (6.3.17) imply

\[
\|u^{k,i} - v^k\| < \|u^{k,0} - v^k\|.
\]

Thus, including the case i(k) = 1, using the inequality

\[
\|u^{k,i(k)-1} - u^*\| < \|u^{k,0} - u^*\| + 2q(r)(h_k + \sigma_k)
\]

and the facts that ‖u^{k,0}‖ < r and ‖u^*‖ ≤ r, we infer that

\[
\|u^{k,i(k)-1} - u^*\|^2 < \|u^{k,0} - u^*\|^2 + 8r\,q(r)(h_k + \sigma_k) + 4\left(q(r)(h_k + \sigma_k)\right)^2.
\]

The latter inequality together with (6.3.29) leads to

\[
\begin{aligned}
\|u^{k+1,0} - u^*\|^2 &< \|u^{k,0} - u^*\|^2 - \|u^{k,i(k)-1} - u_Q^{k+1,0}\|^2 - \left(J(u_Q^{k+1,0}) - J(u^*)\right) \\
&\quad + 4r\theta_k + 8r\,q(r)(h_k + \sigma_k) + 4\left(q(r)(h_k + \sigma_k)\right)^2 \\
&\le \|u^{k,0} - u^*\|^2 + 4r\left(\theta_k + 2q(r)(h_k + \sigma_k)\right) + 4\left(q(r)(h_k + \sigma_k)\right)^2.
\end{aligned}
\]

From (6.3.14) we have \sum_{k=1}^\infty \theta_k < \infty; consequently,

\[
\sum_{k=1}^\infty h_k < \infty \quad\text{and}\quad \sum_{k=1}^\infty \sigma_k < \infty.
\]

Due to Lemma A3.1.4, {‖u^{k,0} − u^*‖} converges for each u^* ∈ U^*(γ) ∩ B_r. Hence, J(u_Q^{k,0}) → J(u^*), and with regard to (6.3.28),

\[
\lim_{k\to\infty} J(u^{k,0}) = J(u^*).
\]


6.3. CONVERGENCE OF REGULARIZED PENALTY METHODS FOR SIP

Now, for a fixed cluster point u^{**} of the sequence {u^{k,0}}, we conclude that u^{**} ∈ U^*(γ) ∩ B_r and that

\[
\lim_{k\to\infty}\|u^{k,0} - u^{**}\| = 0.
\]

Because

\[
\|u^{k,i} - u^*\| < \|u^{k,0} - u^*\| + 2q(r)(h_k + \sigma_k)
\]

holds for any u^* ∈ U^*(γ) ∩ B_r, the relation

\[
\lim_{k\to\infty}\,\max_{0\le i\le i(k)}\|u^{k,i} - u^{**}\| = 0
\]

is also true.

6.3.3 Remark. The controlling parameters h_k, σ_k, ε_k, δ_k and τ_k are mainly defined by the conditions (6.3.13), (6.3.14), which are not restrictive and enable us to change the parameters slowly, for instance like

\[
h_k \approx \sigma_k \approx \frac{1}{(c_1 + k)^{2+c_2}}, \qquad \sqrt{\tau_k} \approx \frac{1}{|T_k|(c_1 + k)^{2+c_2}}, \qquad \varepsilon_k \approx \frac{1}{(c_1 + k)^{1+c_2}},
\]

with fixed c_1 > 0 and arbitrarily small c_2 > 0. In practice the corresponding parameters in optimization and discretization methods usually decrease in geometric progression. However, if the value

\[
\max\left[L(r)q(r),\; L_r^J,\; L_r^g,\; \bar L_r^g\right]
\]

is large, then there is a danger that c_1 has to be chosen large in order to satisfy (6.3.14). In this connection it is important to note that, if the set

\[
K := \{u \in \mathbb{R}^n : g(u,t,\gamma) \le 0 \ \ \forall\, t \in T\}
\]

is bounded, then we are able to make use of Lemma 4.3.3 and Theorem 4.3.6 instead of Lemma 4.3.5 and Theorem 4.3.8. Choosing r such that K ⊂ B_{r/2}, requirement (6.3.14) can be replaced by

\[
q(r)(h_k + \sigma_k) + \varepsilon_k < \frac{r}{2}, \qquad \sum_{k=1}^\infty\sqrt{h_k} < \infty, \qquad \sum_{k=1}^\infty\sqrt{\sigma_k} < \infty, \qquad \sum_{k=1}^\infty\varepsilon_k < \infty.
\]

These conditions are weaker with respect to the choice of c_1 (see also Remark 4.2.6). ♦
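The summability requirements behind these schedules can be checked numerically. The following sketch is ours, not from the book; the function name is hypothetical, and c_2 = 2 is chosen only so that a finite partial sum settles quickly (the remark allows arbitrarily small c_2 > 0):

```python
import math

def schedules(k, c1=1.0, c2=2.0, grid_size=lambda k: 10 * k):
    """Controlling parameters following the pattern of Remark 6.3.3."""
    h = 1.0 / (c1 + k) ** (2.0 + c2)          # h_k, and sigma_k is taken equal to it
    sigma = h
    sqrt_tau = 1.0 / (grid_size(k) * (c1 + k) ** (2.0 + c2))
    eps = 1.0 / (c1 + k) ** (1.0 + c2)        # eps_k
    return h, sigma, sqrt_tau ** 2, eps

# Conditions of type (6.3.14) involve sums of sqrt(h_k); here
# sqrt(h_k) = (c1 + k)^{-(1 + c2/2)}, which is summable for any c2 > 0.
head = sum(math.sqrt(schedules(k)[0]) for k in range(1, 1000))
tail = sum(math.sqrt(schedules(k)[0]) for k in range(1000, 2000))
```

With c_2 = 2 the tail of the series is negligible after a thousand terms; with c_2 close to 0 the same sums still converge, but much more slowly, which is the practical reason for the warning above about having to choose c_1 large.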

On the basis of the given proof of convergence for Algorithm 6.2.17 we are now going to analyze convergence of the Algorithms 6.2.12 and 6.2.13.

6.3.4 Theorem. Suppose that the radius r is chosen such that the relations (6.2.31) and the Lipschitz condition (6.2.26) are fulfilled. Moreover, assume that

\[
h_{k+1} \le h_k \ \ \forall\, k; \qquad h_1 < \frac{\alpha}{L^t},
\]


that the inequalities

\[
\frac{1}{4r}\left(L(r)q(r)h_k - (\delta_k - \varepsilon_k)^2\right) + \varepsilon_k < 0, \tag{6.3.30}
\]

\[
\sum_{k=1}^\infty\left(\sqrt{L(r)q(r)h_k} + \varepsilon_k + q(r)h_k\right) < \frac{r}{2} \tag{6.3.31}
\]

hold and, in addition, that for Algorithm 6.2.13 condition (6.2.35) is true. Then for both Algorithms 6.2.12 and 6.2.13 it holds:

(i) i(k) < ∞ ∀ k,

(ii) ‖u^{k,i}‖ < r ∀ (k,i),

(iii) u^{k,i} → u^{**} ∈ U^*.

Sketch of Proof: In order to apply Lemma 6.3.1 and the proof of Theorem 6.3.2 to this situation, we use some stronger estimates than those following from the analysis of Algorithm 6.2.17 with σ_k := 0 and J_k := J for all k. Better estimates can be obtained if we investigate the approximation of Problem 6.2.1 directly. To this end we assume that α := a in (6.2.29), with a from Assumption 6.2.15(iii).

First, for both Algorithms 6.2.12 and 6.2.13 the inclusion K ⊂ K_k ∀ k is true, therefore due to Corollary 6.2.11

\[
\rho_H(Q_k, Q_{k+1}) \le q(r)h_k, \quad k = 1, 2, \dots,
\]

if h_k < α/L^t. Thus estimate (6.3.6) in Lemma 6.3.1 is true with

\[
\beta_k^* = \sqrt{(L(r) + 4r)\,q(r)h_k} + q(r)h_k + \beta_k.
\]

Second, between the constants q(r) and \bar q(r) the relation \bar q(r) = 2q(r) holds, and formula (6.2.28) provides a more precise upper bound c(r) for the Lagrange multipliers of the auxiliary problems than formula (6.2.15), namely c(r) = θ/2.

In order to adapt the proof of Theorem 6.3.2 to Algorithm 6.2.13, we take σ_k := 0, J_k := J for all k, and preserve all the other notations. Then, instead of inequality (6.3.16), we obtain

\[
J(u^*) - J(u^{k,i}) \le L(r)q(r)h_k,
\]

and because of Q ⊂ Q_k, Proposition 3.1.3 can be used with

\[
C := Q_k, \quad f := J, \quad v^0 := u^{k,i-1}, \quad y := u^*.
\]

For (k,i) ∈ Θ_0, 1 ≤ i < i(k), this leads to

\[
\|u^{k,i} - u^*\| \le \|u^{k,i-1} - u^*\| + \frac{1}{4r}\left(L(r)q(r)h_k - (\delta_k - \varepsilon_k)^2\right) \tag{6.3.32}
\]

and, if k < k_0,

\[
\|u^{k,i(k)} - u^*\| \le \|u^{k,i(k)-1} - u^*\| + \sqrt{L(r)q(r)h_k}. \tag{6.3.33}
\]


Moreover, the induction assumption in the proof of Theorem 6.3.2 shows that for (k,i) ∈ Θ_0 the exact solutions of the reduced and of the full auxiliary problems coincide, and due to (6.3.30), (6.3.31),

\[
\|\bar u^{k,i} - u^{k,i-1}\| \ge \|u^{k,i} - u^{k,i-1}\| - \varepsilon_k.
\]

Hence, taking into account (6.3.32), (6.3.33) and (6.3.13), we get

\[
\|u^{k,i} - u^*\| < \|u^{k,i-1} - u^*\| \quad \forall\,(k,i) \in \Theta_0,\ 1 \le i < i(k), \tag{6.3.34}
\]

and, if k < k_0,

\[
\|u^{k,i(k)} - u^*\| < \|u^{k,i(k)-1} - u^*\| + \sqrt{L(r)q(r)h_k} + \varepsilon_k. \tag{6.3.35}
\]

Finally, for k = k_0 and i = i_0, Proposition 3.1.3 enables us to conclude that

\[
\|u^{k_0,i_0} - u^*\| < \|u^{k_0,i_0-1} - u^*\| + \sqrt{L(r)q(r)h_{k_0}}, \tag{6.3.36}
\]

and adding up the inequalities (6.3.34)–(6.3.36) yields

\[
\|u^{k_0,i_0} - u^*\| < \|u^{1,0} - u^*\| + \sum_{s=1}^{k_0}\left(\sqrt{L(r)q(r)h_s} + \varepsilon_s\right).
\]

The further analysis requires only trivial modifications in the proof of Theorem 6.3.2.

It should be noted that Algorithm 6.2.12 is not a particular case of the Algorithms 6.2.13 or 6.2.17. Indeed, even if we choose a constant L_r^g large enough, we cannot be sure that T_{k,i} = T_k ∀ (k,i). Nevertheless, the idea of the proof of Theorem 6.3.4 carries over to Algorithm 6.2.12, and the complete proof of this theorem for both Algorithms 6.2.12 and 6.2.13 is published in Kaplan and Tichatschke [214].

6.3.2 Efficiency of the deleting rule

The efficiency of the deleting rule applied in the Algorithms 6.2.13 and 6.2.17is shown in the following statement.

6.3.5 Theorem. Suppose that the hypotheses of Theorem 6.3.4 are fulfilled and that for a point u^* with

\[
\lim_{k\to\infty}\,\sup_{1\le i\le i(k)}\|u^{k,i} - u^*\| = 0
\]

the set

\[
T(u^*) := \left\{t \in T : g(u^*,t) = \max_{\tau\in T} g(u^*,\tau)\right\} \tag{6.3.37}
\]

consists of only a finite number of elements. Then, for Algorithm 6.2.13, it holds that

\[
\sup_{1\le i\le i(k)} \frac{|T_{k,i}|}{|T_k|} \to 0 \quad \text{as } k \to \infty.
\]


Proof: Let T(u^*) = \{t_1, \dots, t_q\} and B_\rho(u^*) := \{u \in \mathbb{R}^n : \|u - u^*\| \le \rho\} with arbitrarily chosen ρ > 0. We fix a small δ > 0 and define

\[
T_\delta := \bigcup_{\nu=1}^{q}\{t \in T : \|t - t_\nu\|_m < \delta\}.
\]

Due to the continuity of g(u^*, ·) on the compact set T \setminus T_\delta, the inequality g(u^*, t) < −α(δ) is fulfilled with some α(δ) > 0 for all t ∈ T \setminus T_δ. In view of the convergence u^{k,i} → u^* and the uniform continuity of g on B_ρ(u^*) × (T \setminus T_δ), we obtain for sufficiently large k_0(δ) and k ≥ k_0(δ)

\[
g(u^{k,i}, t) < -\frac{\alpha(\delta)}{2} \quad \forall\, i = 1, \dots, i(k),\ \forall\, t \in T\setminus T_\delta. \tag{6.3.38}
\]

But because of

\[
\lim_{k\to\infty}\,\sup_{1\le i\le i(k)}\|u^{k,i} - u^*\| = 0
\]

and the conditions on the controlling parameters, we conclude that

\[
\lim_{k\to\infty}\,\sup_{1\le i\le i(k)-1} L_u\left(\|u^{k,i} - u^{k,i-1}\| + \hat\varepsilon_{k,i}\right) = 0,
\]

\[
\lim_{k\to\infty} L_u\left(\|u^{k+1,0} - u^{k,i(k)-1}\| + \hat\varepsilon_{k+1,0}\right) = 0.
\]

These relations together with (6.3.38) imply

\[
g(u^{k,i}, t) + L_u\left(\|u^{k,i} - u^{k,i-1}\| + \hat\varepsilon_{k,i}\right) < 0, \quad i = 1, \dots, i(k),
\]

and

\[
g(u^{k+1,0}, t) + L_u\left(\|u^{k+1,0} - u^{k,i(k)-1}\| + \hat\varepsilon_{k+1,0}\right) < 0
\]

for all t ∈ T \setminus T_δ and sufficiently large k. Hence, for such k, all points t ∈ T_k corresponding to the remaining constraints are contained in T_δ. But δ is arbitrarily chosen and

\[
\operatorname{meas} T_\delta \to 0 \quad \text{as } \delta \to 0.
\]

Reduction theorems (cf. Proposition A1.7.54; for details see also [178]) state the existence of a finite subset of T characterizing the solution of SIP. Finiteness of T(u^*) for each u^* ∈ U^* can be considered as a generic case for SIP. This was already remarked in Section 6.1.

For Algorithm 6.2.17 a result analogous to Theorem 6.3.5 can be established in a similar way. In this case, instead of (6.3.37), one has to deal with the set

\[
T(u^*(\gamma)) := \left\{t \in T : g(u^*(\gamma), t, \gamma) = \max_{\tau\in T} g(u^*(\gamma), \tau, \gamma)\right\},
\]

with u^*(γ) such that \lim_{k\to\infty}\sup_{1\le i\le i(k)}\|u^{k,i} - u^*(\gamma)\| = 0.

Theorem 6.3.5 says that if we know the Lipschitz constant L_u, then Algorithm 6.2.13 is preferable to Algorithm 6.2.12. However, the determination of such a constant may involve serious difficulties. For parametric SIP an analogue of Algorithm 6.2.12 can be investigated in the same manner. Taking into account Remark 4.3.9, convergence of one-step variants of these methods follows from Theorems 6.3.2 and 6.3.4. In this context the conditions (6.3.13) and (6.3.30) can be omitted.


6.3.6 Remark. For the sequences {u^{k,i}} and {ū^{k,i}} generated by the Algorithms 6.2.12, 6.2.13 and 6.2.17, the relations

\[
\bar u^{k,i} = \operatorname{Prox}_{Q_k,J_k} u^{k,i-1}, \qquad \|u^{k,i} - \bar u^{k,i}\| \le \varepsilon_{k,i} \le \varepsilon_k,
\]

\[
\|u^{k,i}\| < r, \qquad \|\bar u^{k,i}\| < r
\]

are true. Therefore, all results concerning the rate of convergence established in Section 5.1 carry over to these algorithms if, in the Theorems 5.1.1, 5.1.5 and 5.1.7, the triple

\[
\left\{\frac{\varepsilon_k}{2},\; \mu_k,\; \sigma_k\right\}
\]

is replaced by \{\varepsilon_k,\, q(r)h_k,\, 0\} (in the case of Algorithms 6.2.12 and 6.2.13) and by \{\varepsilon_k,\, q(r)(h_k + \sigma_k),\, L_r^J\sigma_k\} (for Algorithm 6.2.17), respectively. ♦

6.3.7 Theorem. Under the assumptions of Theorem 6.3.2 the following error bounds are true for each iteration k of Algorithm 6.2.17:

\[
-L(r)\varepsilon_{k,i(k)} - 2L_r^J\sigma_k \le J_k(u^{k,i(k)}) - \min_{u\in Q_k} J_k(u) \le 4r\delta_k + (4r + L(r))\varepsilon_{k,i(k)} + 2L_r^J\sigma_k, \tag{6.3.39}
\]

\[
-L(r)\left(q(r)(h_k + \sigma_k) + \varepsilon_{k,i(k)}\right) \le J(u^{k,i(k)}) - \min_{u\in K} J(u) \le 4r\delta_k + (4r + L(r))\varepsilon_{k,i(k)} + L(r)q(r)(h_k + \sigma_k) + 2L_r^J\sigma_k, \tag{6.3.40}
\]

\[
\inf_{v\in Q_k}\|v - u^{k,i(k)}\| \le \varepsilon_{k,i(k)}, \qquad \inf_{v\in K}\|v - u^{k,i(k)}\| \le \varepsilon_{k,i(k)} + q(r)(h_k + \sigma_k).
\]

The first two estimates are corollaries of Theorem 4.3.10 and Remark 4.3.11, whereas the latter two follow immediately from (6.3.12), u^{k,i} ∈ Q_k and (6.3.3). Estimate (6.3.39) quantifies the deviation between the computed value J_k(u^{k,i(k)}) and the optimal value of the objective function of Problem P(γ_k), provided some solution of this problem belongs to the ball B_r.

As we noted at the beginning of Section 6.2, such intermediate bounds are important for applied parametric problems in case an exact value of the parameter γ which is of interest is unknown beforehand. Indeed, with these bounds a more suitable value of γ can be determined, or its bounds can be improved.

The path-following from γ^0 to γ enables us to solve Problem P(γ) more efficiently. For instance, taking into account the fast convergence of conjugate direction methods and quasi-Newton methods in a neighborhood of the sought solution, we may expect these methods to be more efficient than if Problem P(γ) were tackled directly.

If the distance between γ^0 and γ is large, then in order to accelerate the solution procedure, we recommend choosing some intermediate values γ^1, \dots, γ^k and successively computing solutions of the problems P(γ^1), \dots, P(γ^k) with increasing accuracy. This can be done by means of the described procedure on each subinterval [γ^0, γ^1], \dots, [γ^k, γ].


6.3.8 Remark. The conditions (6.3.13) and (6.3.14) in Theorem 6.3.2 can be weakened if for each k and i, instead of δ_k, ε_k and \bar\varepsilon_k, the values

\[
\delta_{k,i}, \qquad \varepsilon_{k,i}, \qquad \bar\varepsilon_{k,i} := \frac{\varepsilon_{k,i}}{2} + 4\sqrt{\tau_k}\,\sqrt{\frac{d}{2}}\,|T_{k,i}|
\]

are used, respectively. More precisely, condition (6.3.13) has to be replaced by

\[
\delta_{k,i} > \bar\varepsilon_{k,i}, \qquad \frac{1}{4r}\left(2L(r)q(r)(h_k + \sigma_k) + 2L_r^J\sigma_k - (\delta_{k,i} - \bar\varepsilon_{k,i})^2\right) + \bar\varepsilon_{k,i} < 0 \tag{6.3.41}
\]

for all k and i, 1 ≤ i ≤ i(k), and in (6.3.14) the value \bar\varepsilon_{k,i(k)} should be used instead of \bar\varepsilon_k. In order to guarantee that i(k) < ∞ it is sufficient, for instance, to ensure that

\[
\frac{1}{4r}\left(\eta_k - (\delta_{k,i} - \bar\varepsilon_{k,i})^2\right) + \bar\varepsilon_{k,i} < -\frac{\alpha_k}{i},
\]

with α_k > 0 arbitrarily chosen. Using these modifications, the proof of Theorem 6.3.2 is completely analogous. Only in the case i_0 < i(k_0), in order to estimate ‖u^{k_0,i_0}‖, we have to strengthen the estimates (6.3.23) and (6.3.24). This is possible if, instead of inequality (6.3.18), the stronger one

\[
\|u^{k_0,i_0} - v^{k_0}\| < \|u^{k_0,i_0-1} - v^{k_0}\| - \bar\varepsilon_{k_0,i_0}
\]

is taken. The latter inequality can be obtained by means of the second statement in Proposition 3.1.3 and the modified relation (6.3.13).

Estimate (6.3.24) turns into

\[
\|u^{k_0,i_0} - u^*\| \le \|u^{1,0} - u^*\| + \sum_{s=1}^{k_0-1}\left(\sqrt{\eta_s} + \bar\varepsilon_{s,i(s)} + 2q(r)(h_s + \sigma_s)\right) + 2q(r)(h_{k_0} + \sigma_{k_0}) - \bar\varepsilon_{k_0,i_0},
\]

and this allows us to conclude that ‖u^{k_0,i_0}‖ < r − \bar\varepsilon_{k_0,i_0}. Of course, this is not an a priori choice of the controlling parameters; however, it can be done automatically within the solution process.

Due to the relations

\[
\|u^{k,i}\| < r, \qquad \|\bar u^{k,i}\| < r,
\]

and the coincidence of the exact solutions of the auxiliary problems established in Theorem 6.3.2, Assumption 6.2.15 can be reformulated for r chosen on account of (6.2.38). In particular, instead of condition (i) in Assumption 6.2.15 it is sufficient to assume that J(·,γ): \mathbb{R}^n → \mathbb{R} and g(·,t,γ): \mathbb{R}^n → \mathbb{R} are convex differentiable functions on B_r.

In order to choose the penalty parameter d, condition (6.2.15) can also be replaced by a weaker one, namely

\[
d > \frac{2}{a}\left(J(\bar u) + (\|\bar u\| + r)^2 + 2\sigma_0 + r^2 - c_0\right),
\]

with σ_0 := L_r^J \sup_k σ_k, c_0 := \min_{v\in\mathbb{R}^n}\left\{J(v) + \tfrac12\|v\|^2\right\}, and a defined according to Assumption 6.2.15(iii). Consequently, there is no need to evaluate L_\rho^g with ρ defined via (6.2.15) and (6.2.13). ♦


6.3.9 Remark. Now we consider the case where Slater's condition fails to hold for Problem (6.2.1). Assume that for a chosen discretization and a given radius r it is possible to estimate

\[
\rho_H(Q_k, Q) \le \mu(h_k), \quad k = 1, 2, \dots,
\]

by means of a monotonically increasing function µ: \mathbb{R}_+ → \mathbb{R}_+ with µ(0) = 0. Moreover, we assume that for each problem

\[
\min\{J(u) + \|u - v\|^2 : g(u,t) \le 0\ \ \forall\, t \in T_k\} \tag{6.3.42}
\]

with v ∈ B_r, a constant d(k) and a Lagrange multiplier vector \{\lambda(v; k, t)\}_{t\in T_k} exist such that

\[
\lambda(v; k, t) < d(k), \quad t \in T_k, \tag{6.3.43}
\]

holds uniformly with respect to v.

Then Algorithm 6.2.12 is well defined if the penalty parameter d in the function (6.2.33) is replaced by d(k). The corresponding statement on convergence is similar to Theorem 6.3.4; it only requires replacing the constant d by d(k) in the term ε_k, as well as q(r)h_k by µ(h_k) in the formulas (6.3.30) and (6.3.31).

With analogous substitutions Algorithm 6.2.13 is also practicable if the estimate (6.3.43) can be maintained for the problem

\[
\min\{J(u) + \|u - v\|^2 : g(u,t) \le 0\ \ \forall\, t \in T_{k,i}\},
\]

with T_{k,i} ⊂ T_k obtained from T_k by means of the deleting rule in S1. The alterations in the proofs of Lemma 6.3.1 and Theorem 6.3.4 are obvious in both cases. ♦

Considering an example introduced by Charnes, Cooper and Kortanek [72], we illustrate how problems for which Slater's condition fails to hold can be solved by means of the described methods.

6.3.10 Example. We consider the problem

\[
J(u) := -u_1 \to \min \quad \text{s.t.}\quad g(u,t) := u_1 t - u_2 t^2 \le 0 \quad \forall\, t \in T := [0,1]. \tag{6.3.44}
\]

Obviously, constraint (6.3.44) does not satisfy the Slater condition. It is easily seen that

\[
K = \{u \in \mathbb{R}^2 : u_1 \le 0,\ u_1 - u_2 \le 0\}, \qquad U^* = \{u \in \mathbb{R}^2 : u = (0,a)^T,\ a \ge 0\}.
\]

Let r > 0 and T_k := \{t_j = j/k : j = 1, \dots, k\}; then we get

\[
K_k = \left\{u \in \mathbb{R}^2 : \frac{j}{k}\,u_1 - \left(\frac{j}{k}\right)^2 u_2 \le 0,\ j = 1, \dots, k\right\},
\]

i.e., K_k can be represented as

\[
K_k = \left\{u \in \mathbb{R}^2 : u_1 - \frac{1}{k}\,u_2 \le 0,\ u_1 - u_2 \le 0\right\}.
\]


For an arbitrary point u ∈ Q_k \setminus Q we choose \bar u_1 := \min[u_1, 0], \bar u_2 := u_2, and obtain

\[
\rho_H(Q_k, Q) \le r h_k, \qquad h_k = \frac{1}{k}.
\]

Now, under this discretization a Slater point \bar u exists. In order to estimate the Lagrange multipliers λ(v; k, t) of the problems (6.3.42) with v ∈ B_r, we observe that

\[
\max_{t\in T_k} \lambda(v; k, t) \le \frac{J(\bar u) + (\|\bar u\| + r)^2 - c_0}{-\max_{t\in T_k} g(\bar u, t)}. \tag{6.3.45}
\]

This inequality follows from Lemma 6.2.7 and Remark 6.2.9 if we apply them to Problem 6.2.9 with f := J and g(·) := \max_{t\in T_k} g(·, t).

For \bar u := (-1, 0)^T we obtain -\max_{t\in T_k} g(\bar u, t) = \frac{1}{k} and, in view of (6.3.45),

\[
\max_{t\in T_k} \lambda(v; k, t) \le \left(1 + (1 + r)^2 - c_0\right) k =: c(r)\,k. \tag{6.3.46}
\]

Moreover,

\[
u_1 := \frac{1}{2(1 + k^2)}, \qquad u_2 := \frac{k}{2(1 + k^2)}
\]

is the solution of Problem (6.3.42) with v := 0, and

\[
\lambda\!\left(0;\, k,\, \frac{1}{k}\right) = \frac{k^3}{1 + k^2}
\]

is the unique non-zero Lagrange multiplier. Hence, estimate (6.3.46) is exact with respect to the order. Obviously, this estimate can be maintained also if a deleting rule is applied.

Therefore, if the controlling parameters are chosen according to Theorem 6.3.4 with the above-mentioned replacements of q(r)h_k by r/k and of d by d(k) := 2c(r)k, then convergence of the Algorithms 6.2.12 and 6.2.13 is ensured for the problem under consideration.

If the constraint in the example above is replaced by

\[
g(u,t) := u_1\left|t - \frac{1}{\sqrt{2}}\right| - u_2\left|t - \frac{1}{\sqrt{2}}\right|^2 \le 0 \quad \forall\, t \in [0,1],
\]

the same effect occurs, with a different function d(k), for the same uniform discretization T_k. ♦
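The closed-form solution of the subproblem with v := 0 in Example 6.3.10 can be verified directly against the KKT conditions; the following check is our own sketch (the helper name is hypothetical):

```python
def check_example_6_3_10(k):
    """Verify the stated solution of min{-u1 + ||u||^2 : u1*t - u2*t^2 <= 0, t in T_k}."""
    u1 = 1.0 / (2.0 * (1.0 + k * k))
    u2 = k / (2.0 * (1.0 + k * k))
    lam = k ** 3 / (1.0 + k * k)          # stated multiplier of the constraint at t = 1/k
    t = 1.0 / k
    # feasibility on the grid T_k = {j/k : j = 1, ..., k}
    feasible = all(u1 * (j / k) - u2 * (j / k) ** 2 <= 1e-12 for j in range(1, k + 1))
    # stationarity of the Lagrangian -u1 + u1^2 + u2^2 + lam*(u1*t - u2*t^2)
    grad = (-1.0 + 2.0 * u1 + lam * t, 2.0 * u2 - lam * t * t)
    # complementarity: the constraint at t = 1/k is active
    active_value = u1 * t - u2 * t * t
    return feasible, grad, active_value

feasible, grad, active_value = check_example_6_3_10(10)
```

For every k the gradient of the Lagrangian vanishes and only the constraint at t = 1/k is active; the multiplier k³/(1+k²) grows like k, matching the order of the bound (6.3.46).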

6.3.3 MSR-approach revisited: Deleting rules for constraints

Now, according to the general scheme of MSR-methods and the assumptions of Theorem 4.3.8, the solution of SIP (6.2.1) under the validity of the Lipschitz property (6.2.26) will be sketched. We assume that

\[
h_{k+1} \le h_k, \qquad h_1 \le \frac{\alpha}{L^t}, \qquad \sigma_k \equiv 0,
\]

with α given in (6.2.29). Then, due to Lemma 6.3.1, inequality (4.3.9) and Remark 6.2.14, we can conclude immediately that for each k

\[
\|u^{k,i} - u^{k,i-1}\| \le \|u^{k,i-1} - u^{k,i-2}\| + \frac{\varepsilon_k}{2}, \quad \text{if } 2 \le i \le i(k), \tag{6.3.47}
\]

\[
\|u^{k+1,1} - u^{k+1,0}\| \le \|u^{k+1,0} - u^{k,i(k)-1}\| + \varepsilon'_k, \tag{6.3.48}
\]


with

\[
\varepsilon'_k := \sqrt{(L(r) + 4r)\,q(r)h_k} + q(r)h_k + \frac{\varepsilon_k}{2}.
\]

Thus, if the function g satisfies the Lipschitz condition (6.2.35), then the inequalities (6.3.47) and (6.3.48) lead to

\[
g(u^{k,i}, t) \le g(u^{k,i-1}, t) + L_u\left(\|u^{k,i-1} - u^{k,i-2}\| + \frac{\varepsilon_k}{2}\right) \quad \text{for } 2 \le i \le i(k)
\]

and

\[
g(u^{k+1,1}, t) \le g(u^{k+1,0}, t) + L_u\left(\|u^{k+1,0} - u^{k,i(k)-1}\| + \varepsilon'_k\right).
\]

Hence, in the framework of general MSR-methods, instead of the auxiliary problems

\[
\min_{u\in K_k}\Psi_{k,i}(u),\ i = 2, \dots, i(k), \qquad\text{and}\qquad \min_{u\in K_{k+1}}\Psi_{k+1,1}(u),
\]

we solve (for each k) the problems

\[
\min\{\Psi_{k,i}(u) : g(u,t) \le 0\ \ \forall\, t \in T_{k,i}\}, \quad i = 2, \dots, i(k), \tag{6.3.49}
\]

\[
\min\{\Psi_{k+1,1}(u) : g(u,t) \le 0\ \ \forall\, t \in T_{k+1,1}\}, \tag{6.3.50}
\]

respectively, with

\[
T_{k,i} := \left\{t \in T_k : g(u^{k,i-1}, t) + L_u\left(\|u^{k,i-1} - u^{k,i-2}\| + \frac{\varepsilon_k}{2}\right) \ge 0\right\} \quad\text{for } 2 \le i \le i(k)
\]

and

\[
T_{k+1,1} := \left\{t \in T_{k+1} : g(u^{k+1,0}, t) + L_u\left(\|u^{k+1,0} - u^{k,i(k)-1}\| + \varepsilon'_k\right) \ge 0\right\}.
\]

In accordance with the starting phase of the method, we have to calculate

\[
u^{1,1} \approx \arg\min_{u\in K_1} J(u).
\]

The described transition from Tk to Tk,i should be considered as the firststage of the cleaning of a given grid. Thereafter, in order to solve Problem(6.3.49) or (6.3.50), those deleting rules can be applied which are included inthe solvers for these problems.
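As a minimal illustration (ours, with hypothetical names), the first-stage cleaning amounts to a single filter over the current grid; the `slack` argument stands for the computable bound L_u(‖u^{k,i−1} − u^{k,i−2}‖ + ε_k/2):

```python
def clean_grid(grid, g, u_prev, slack):
    """Keep only the parameters t whose constraint can still be active:
    g(u_prev, t) + slack >= 0; all other constraints are provably
    inactive at the next iterate and may be deleted."""
    return [t for t in grid if g(u_prev, t) + slack >= 0.0]

# Toy data: the constraint of Example 6.3.10, g(u, t) = u1*t - u2*t^2,
# evaluated at u_prev = (0, 1), so g(u_prev, t) = -t^2.
g = lambda u, t: u[0] * t - u[1] * t * t
T_k = [j / 10.0 for j in range(1, 11)]
T_ki = clean_grid(T_k, g, (0.0, 1.0), slack=0.05)   # keeps only t with t^2 <= 0.05
```

As the iterates settle down, `slack` shrinks, and only the constraints near the active set T(u^*) survive the filter, which is exactly the behavior quantified by Theorem 6.3.5.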

For instance, if a SIP is solved by means of a discretization approach (see Algorithm 6.1.3), then we replace T_k by T_{k,i} at the beginning of Step (ii). In Step (iii), the feasibility of the resulting point has to be verified for Problem P(T_{k,i}), but not for P(T_k). Nevertheless, at the end of step k a point u^k which is also feasible for Problem P(T_k) is obtained.

We emphasize that, due to the application of IPR, the objective functionsin the auxiliary problems P (Tk,i) are strongly convex. This is essential for theuse of efficient deleting rules.

In the same way it is possible to combine the described cleaning proce-dure with deleting rules which are part of exchange methods, like cutting planemethods with a deletion of constraints.

6.4 Regularized Interior Point Methods

In this section we mainly describe results obtained together with Abbe [2]. Starting with the introduction of logarithmic barrier methods and their regularization, the subsequent subsections contain corresponding convergence results and some numerical experience with MSR-methods for SIP.


6.4.1 Regularized logarithmic barrier methods

In the sequel we want to apply the classical logarithmic barrier method to convex semi-infinite problems of the form

\[
\text{minimize } f(x) \quad \text{s.t. } x \in \mathbb{R}^n,\ g(x,t) \le 0\ \ (t \in T). \tag{6.4.1}
\]

Without specifying any assumptions at this point, it turns out that the most difficult question for this transfer is how to choose a suitable barrier function. This is caused by the presence of infinitely many constraints.

Sonnevend [377] and Schättler [362] suggest using the integral barrier function

\[
-\int_T \ln(-g(x,t))\,dt. \tag{6.4.2}
\]

Of course, meas T > 0 is assumed in this case, so that in particular finite sets T are excluded. Nevertheless, let us have a closer look at some important details of the arising method.
Due to the smoothing effect of the integral, (6.4.2) need not possess the barrier property at all. That means it is possible that (6.4.2) remains bounded from above as one approaches the boundary of the feasible region. This fact is illustrated by the following example.

6.4.1 Example. Let T := [0,1] and consider the linearly constrained feasible set

\[
K := \left\{x \in \mathbb{R}^2 : g(x,t) := -\left(t - \frac12\right)^2 x_1 - x_2 \le 0\ \ \forall\, t \in T\right\}.
\]

Choosing x = (1, 0)^T, we have g(x,t) = -(t - \frac12)^2 \le 0 for all t ∈ [0,1]. Thus x ∈ K, but g(x,t) = 0 for t = \frac12 implies x ∉ int K. Furthermore, we conclude

\[
-\int_T \ln(-g(x,t))\,dt = -\int_0^1 \ln\left(\left(t - \frac12\right)^2\right)dt = -4\int_{1/2}^1 \ln\left(t - \frac12\right)dt = 2\ln 2 + 2 < \infty.
\]

Finally, let us have a look at (6.4.2) from the numerical point of view. Here we have the task of evaluating integrals of the form (6.4.2) at different points x. If x is not located near the boundary of the feasible region, this can be done with standard formulas for numerical integration. But if we evaluate this integral at a point near the boundary of the feasible region, the logarithm will have large absolute values for certain t. Consequently, standard formulas for numerical integration do not work very well in this area. Nevertheless, we have to be able to evaluate the barrier function (and therefore also the integrals) near the boundary of the feasible region, because optimal solutions are typically located on the boundary. Due to these problems, Schättler refers to special integration rules for evaluating these integrals. Lin et al. [266] suggest using Simpson's method to compute similar integrals. In order to achieve a suitable accuracy, they decomposed the interval [0,1] in some cases into 400000 subintervals. Thus, independent of the formula used, evaluating such integrals requires high computational effort.
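The finite value in Example 6.4.1 and the need for many quadrature nodes near the singular point can both be reproduced with a simple midpoint rule; this numerical sketch is ours:

```python
import math

def integral_barrier(n=200000):
    """Midpoint-rule value of -∫_0^1 ln((t - 1/2)^2) dt from Example 6.4.1.

    The integrand has an integrable logarithmic singularity at t = 1/2;
    an even n keeps all midpoints (i + 0.5)/n away from it.
    """
    h = 1.0 / n
    return -h * sum(math.log(((i + 0.5) * h - 0.5) ** 2) for i in range(n))

value = integral_barrier()
```

Even this one-dimensional case needs a large number of nodes before the computed value stabilizes near 2 ln 2 + 2 ≈ 3.386, which illustrates the computational-effort issue mentioned above.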


In our approach we consider the following reformulation of SIP (6.1.1), (6.1.2) with U_0 = \mathbb{R}^n:

\[
\text{minimize } f(x) \quad\text{s.t. } x \in \mathbb{R}^n,\ \max_{t\in T} g(x,t) \le 0. \tag{6.4.3}
\]

The theoretical properties as well as practical applications of this approach are extensively studied by Polak [323, 324]. The main advantage of the reformulation (6.4.3) is that we are now dealing with a single constraint. Consequently we get the barrier function

\[
f(x) - \mu\ln\left(-\max_{t\in T} g(x,t)\right). \tag{6.4.4}
\]

But now we face a non-differentiable barrier function, due to the max-term involved, and we have to solve the global auxiliary optimization problem

\[
\text{maximize } g(x,t) \quad\text{s.t. } t \in T \tag{6.4.5}
\]

in order to evaluate the barrier function at a given point x, which is in general a very hard task. Thus, except for special cases, we cannot assume that (6.4.5) is exactly solvable for any given x with acceptable computational effort. Accordingly, only an approximate maximizer of Problem (6.4.5) is available; hence the barrier function is only approximately computable. Consequently, we have to use a method for minimizing (6.4.4) which can cope with an approximately computable objective function.

Such a method, derived from a bundle method suggested by Kiwiel [238], will be presented in the next subsection. This method requires compact feasible sets. Nevertheless, having in mind that the barrier function (6.4.4) has to be minimized on open sets of the form

\[
\{x \in \mathbb{R}^n : g(x,t) < 0\ \ (t \in T)\},
\]

in order to use the proposed method we will approximately minimize the convex barrier function successively on compact sets like closed boxes or balls. Altogether we obtain the following conceptual algorithm for solving (6.4.3).

6.4.2 Algorithm.

Given µ_1 > 0.

For k = 1, 2, \dots:

• For i = 1, 2, \dots:

– Determine a nonempty compact set

\[
S_{k,i} \subset \left\{x \in \mathbb{R}^n : \max_{t\in T} g(x,t) < 0\right\}.
\]

– Compute an approximate minimizer x^{k,i} of (6.4.4) on S_{k,i}.

– If x^{k,i} is an approximate unconstrained minimizer of (6.4.4) of a certain accuracy, set x^k := x^{k,i} and leave the inner loop.

• Choose µ_{k+1} ∈ (0, µ_k). ♦
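To make the conceptual scheme concrete, here is a small self-contained sketch (ours, not from the book) on a one-dimensional instance with f(x) = (x − 2)² and g(x,t) = tx − 1 on T = [0,1], i.e. the single constraint x ≤ 1 with solution x* = 1; the inner minimizer uses ternary search on a compact box inside the strictly feasible set:

```python
import math

def barrier(x, mu):
    # f(x) - mu*ln(-max_t g(x,t)); here max_t g(x,t) = max(x - 1, -1)
    return (x - 2.0) ** 2 - mu * math.log(-max(x - 1.0, -1.0))

def minimize_on_box(mu, lo, hi, iters=200):
    """Inner loop: approximate minimizer of the (convex) barrier on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if barrier(m1, mu) < barrier(m2, mu):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x, mu = 0.0, 1.0
for k in range(20):                 # outer loop: drive the barrier parameter to 0
    x = minimize_on_box(mu, -1.0, 1.0 - 1e-12)
    mu *= 0.5
```

The iterates approach x* = 1 from the interior, roughly at distance µ/2. In the genuinely semi-infinite problems of this section the exact max-evaluation used here is unavailable, which is precisely why the inner step is replaced by the bundle method of the next subsection.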

In the following subsection we present a numerical method for minimizing the nondifferentiable barrier function (6.4.4) on S_{k,i} and give all necessary details to put this conceptual algorithm into an implementable form.


6.4.1.1 An implementable barrier algorithm for SIP

We make use of the following general assumptions under which the methodis considered.

6.4.3 Assumption. Suppose

(1) f: \mathbb{R}^n → \mathbb{R} is a convex function;

(2) T ⊂ \mathbb{R}^p is a compact set;

(3) g(·,t) is convex on \mathbb{R}^n for any t ∈ T;

(4) g(x,·) is continuous on T for any x ∈ \mathbb{R}^n;

(5) the set M^0 := \{x ∈ M : \max_{t∈T} g(x,t) < 0\} is nonempty;

(6) the set of optimal solutions M_{opt} := \{x ∈ M : f(x) = f^*\} of Problem (6.4.3) is nonempty;

(7) the set T_h (h > 0) is a finite h-grid on T (i.e., for each t ∈ T there exists t_h ∈ T_h with ‖t − t_h‖_2 ≤ h); in case h = 0 the sets T_h and T coincide;

(8) for each compact set S ⊂ \mathbb{R}^n there exists a constant L_S^t with

\[
|g(x,t_1) - g(x,t_2)| \le L_S^t \|t_1 - t_2\|_2 \tag{6.4.6}
\]

for all x ∈ S and all t_1, t_2 ∈ T;

(9) for each compact set S ⊂ M^0 a constant C_S < ∞ with

\[
C_S \ge \max_{x\in S}\left|\frac{1}{\max_{t\in T} g(x,t)}\right| \tag{6.4.7}
\]

can be computed such that S' ⊂ S ⊂ M^0 implies C_{S'} ≤ C_S;

(10) for each x ∈ \mathbb{R}^n and each t ∈ T an element of the subdifferential of f at x and an element of the subdifferential of g(·,t) at x can be computed.

Regarding Assumptions (1) and (3), it is ensured that we deal with a convex SIP of type (6.4.3). Furthermore, due to (2) and (4), the maximization problems (6.4.5) are solvable, and consequently the barrier functions (6.4.4) are computable, at least from the theoretical point of view. Moreover, (5) and (6) are motivated by theoretical results below, ensuring the existence of a minimizer of the barrier function (6.4.4) for any given µ > 0. Furthermore, the classical logarithmic barrier method with exact minimizers x^k leads to an optimal solution of SIP (6.4.3) in the sense that each accumulation point of {x^k} is an optimal solution and f(x^k) → f^* (cf. Fiacco [112]). Because we cannot expect the maximization problems (6.4.5) to be exactly solvable, we consider the further assumptions: (7) allows us to compute inexact maximizers, whose accuracy can be controlled with (8). The necessity of Assumption (9) will become clear in the further course, while by (10) we point out that the computation of subgradients is indeed necessary in the implementation of the method.


From the classical logarithmic barrier approach it is known that the barrier parameter has to converge to zero, e.g. by reducing it from step to step. But, due to the fact that the conditioning of the barrier problems worsens as the barrier parameter decreases, it makes sense to keep this parameter fixed for a couple of steps.

Again the idea of MSR-methods from Section 4.3 comes into play (see also [222]). In order to permit a dynamical control, the choice of the barrier parameter has to depend on the progress of the iterates. To avoid side effects which could influence this choice, we keep the prox-parameter χ constant as long as the barrier parameter is not changed. Merely the proximal point is updated more frequently.

6.4.4 Algorithm.

Given µ_1 > 0, x^0 ∈ M^0, σ_1 > 0 and χ_1 with 0 < \underline{\chi} \le χ_1 \le \bar{\chi}.

For k := 1, 2, \dots:

• Set x^{k,0} := x^{k-1}.

• For i := 1, 2, \dots:

– Set x^{k,i,0} := x^{k,i-1}, select ε_{k,i} > 0 and define F_{k,i}: M^0 → \mathbb{R} by

\[
F_{k,i}(x) := f(x) - \mu_k \ln\left(-\max_{t\in T} g(x,t)\right) + \frac{\chi_k}{2}\|x - x^{k,i-1}\|^2. \tag{6.4.8}
\]

– For s := 1, 2, \dots:

(a) Select r_{k,i,s} > 0 such that

\[
S_{k,i,s} := \{x \in \mathbb{R}^n : \|x - x^{k,i,s-1}\|_\infty \le r_{k,i,s}\} \subset M^0.
\]

(b) Select h_{k,i,s} ≥ 0 and define F_{k,i,s}: M^0 → \mathbb{R} by

\[
F_{k,i,s}(x) := f(x) - \mu_k \ln\left(-\max_{t\in T_{h_{k,i,s}}} g(x,t)\right) + \frac{\chi_k}{2}\|x - x^{k,i-1}\|^2,
\]

where T_{h_{k,i,s}} fulfils Assumption 6.4.3(7).

(c) Select β_{k,i,s} ≥ 0 and compute an approximate solution x^{k,i,s} of

\[
\text{minimize } F_{k,i}(x) \quad\text{s.t. } x \in S_{k,i,s} \tag{6.4.9}
\]

such that

\[
F_{k,i,s}(x^{k,i,s}) - \min_{x\in S_{k,i,s}} F_{k,i}(x) \le \frac{\varepsilon_{k,i}}{2} + \beta_{k,i,s} \tag{6.4.10}
\]

and F_{k,i,s}(x^{k,i,s}) ≤ F_{k,i,s}(x^{k,i,s-1}) are true.

(d) If

\[
F_{k,i,s}(x^{k,i,s-1}) - F_{k,i,s}(x^{k,i,s}) \le \frac{\varepsilon_{k,i}}{2}, \tag{6.4.11}
\]

then set x^{k,i} := x^{k,i,s-1}, S_{k,i} := S_{k,i,s}, r_{k,i} := r_{k,i,s} and stop the loop in s; otherwise continue with the loop in s.


– If ‖x^{k,i} − x^{k,i-1}‖ ≤ σ_k then set x^k := x^{k,i}, r_k := r_{k,i}, i(k) := i and stop the loop in i; otherwise continue with the loop in i.

• Select 0 < µ_{k+1} < µ_k, 0 < \underline{\chi} \le χ_{k+1} \le \bar{\chi} and σ_{k+1} > 0.

Aspects concerning the numerical realization of the inner loops will be discussedin Subsection 6.4.3.

Let us give some explanations for each particular step. In (a) we specifythe compact set Sk,i,s as a linearly bounded set. This decision is caused by thefact that linearly bounded sets are normally the simplest bounded structureson that minimization can be done. Consequently minimizing the barrier func-tion on the chosen compact set is normally easier than minimizing it on morecomplex structures like quadratically bounded sets such as balls or ellipsoids.Additionally, having a bundle method in mind, we point out that the decision tochoose linearly bounded sets is important because the auxiliary problems of thebundle method are linear and quadratic problems. Thus each of them should beefficiently solvable by standard algorithms. Furthermore, since M0 is an openset, step (a) is realizable.

In (b) we define the approximation of the regularized barrier function, while in (c) the approximate minimization of the regularized barrier function is carried out with a certain accuracy. Condition (6.4.10) is motivated by Corollary 3.6 in [238], which applies when solving (6.4.9) with the bundle method. Finally, in (d) the stopping criterion of the inner loop is given. It is divided into three parts, but essentially only two inequalities occur. The first checks whether a sufficient improvement of the approximate solution on the current box has been achieved with the selected accuracy. The second part of the criterion checks whether the accuracy parameter and the barrier parameter are in an appropriate relation; if this is not the case, the accuracy parameter is readjusted.

The critical point for a realization of the presented method is the question whether there exist approximate solutions of (6.4.9) which fulfil the demanded criteria. As stated above, (6.4.10) is motivated by a bundle method, and we therefore want to show that we can in fact use a bundle method for solving (6.4.9). We first notice that the sets $S_{k,i,s} \subset M^0$ are convex and compact by construction (for any given $r_{k,i,s} > 0$). Additionally, they are nonempty: $x^{k,i,s-1} \in M^0$ holds by construction, and due to the openness of $M^0$ there exists a radius $r_{k,i,s} > 0$ such that $S_{k,i,s} \subset M^0$ is fulfilled. Moreover, we have already remarked that the functions $f_k$ are continuous on $\operatorname{dom} f_k = M^0 = \operatorname{int}\operatorname{dom} f_k$ (this can be proved, for instance, by using the convexity of $f_k$ on $\operatorname{dom} f_k$ in combination with Theorem 2.35 in Rockafellar and Wets [353]). Consequently, in (6.4.9) we have to minimize a continuous function on a compact set, which is obviously solvable.

In [1], Lemma 5.3, it is shown that under Assumption 6.4.3 Kiwiel's proximal level bundle method in [238] can be applied to solve the auxiliary problems arising in Algorithm 6.4.4. Using the constants $L^t_S$ and $C_S$ from Assumption 6.4.3, the value $\beta_{k,i,s} := \mu_k C_{S_{k,i,s}} L^t_{S_{k,i,s}} h_{k,i,s}$ with predefined $h_{k,i,s}$ can be used as error level when applying a bundle method.
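As a complement to the verbal description of steps (a)–(d), the nesting of the three loops (the outer loop in $k$, the prox loop in $i$, and the accuracy loop in $s$) can be sketched in code. The following Python sketch is schematic only: it replaces the bundle method of step (c) by a plain grid search, collapses the box/accuracy management of steps (a) and (d) to fixed values, and uses a single-constraint toy problem; all helper names (`barrier_obj`, `inner_min`, `msr_barrier`) and all parameter rules are illustrative assumptions, not the method as specified in the book.

```python
import math

def barrier_obj(f, gmax, x, mu, chi, x_prox):
    """Regularized log-barrier objective, cf. F_{k,i} in step (b):
    f(x) - mu*ln(-max_t g(x,t)) + (chi/2)|x - x_prox|^2 (here 1-D)."""
    slack = -gmax(x)
    if slack <= 0.0:
        return math.inf                      # x lies outside the interior M0
    return f(x) - mu * math.log(slack) + 0.5 * chi * (x - x_prox) ** 2

def inner_min(f, gmax, center, r, mu, chi, x_prox, n=1001):
    """Approximate minimization over the box S = [center - r, center + r]
    (a grid search standing in for the bundle method of step (c))."""
    best_x = center
    best_v = barrier_obj(f, gmax, center, mu, chi, x_prox)
    for j in range(n):
        x = center - r + 2.0 * r * j / (n - 1)
        v = barrier_obj(f, gmax, x, mu, chi, x_prox)
        if v < best_v:
            best_x, best_v = x, v
    return best_x

def msr_barrier(f, gmax, x0, mu0=1.0, chi=1.0, r=0.5, outer=20, inner=20):
    """Schematic of Algorithm 6.4.4: shrink the barrier parameter mu_k outside,
    iterate prox steps inside, stop the i-loop once successive iterates are close."""
    x, mu = x0, mu0
    for k in range(outer):
        sigma = 0.5 * mu                     # illustrative choice of sigma_k
        for i in range(inner):               # loop in i
            x_new = inner_min(f, gmax, x, r, mu, chi, x)
            step = abs(x_new - x)
            x = x_new
            if step <= sigma:                # stopping rule of the loop in i
                break
        mu *= 0.5                            # select mu_{k+1} < mu_k
    return x

# toy surrogate of an SIP: minimize (x-2)^2 subject to x <= 1; solution x* = 1
x_star = msr_barrier(lambda x: (x - 2.0) ** 2, lambda x: x - 1.0, x0=0.0)
```

Note that for simplicity points outside $M^0$ are rejected via an infinite objective value instead of shrinking the box radius as step (a) prescribes.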


6.4. REGULARIZED INTERIOR POINT METHODS 209

6.4.2 Convergence analysis

In this subsection we want to show that Algorithm 6.4.4 leads to an optimal solution of SIP (6.4.3) under appropriate assumptions. Moreover, we investigate the rate of convergence of this method.

We start with a statement about the finiteness of the loop in s.

6.4.5 Lemma. Let Assumption 6.4.3 be fulfilled. Furthermore, let $k, i$ be fixed and $\delta_{k,i} > 0$, $q_{k,i} \in (0,1)$ be given. If

$$\mu_k L^t_{S_{k,i,s}} C_{S_{k,i,s}} h_{k,i,s} \le \beta_{k,i,s} \le q^k_{k,i}\,\delta_{k,i} \quad \forall\, s, \qquad (6.4.12)$$

then the loop in $s$ of Algorithm 6.4.4 terminates after a finite number of steps.

Proof: see Proposition 4.6 in [1].

At this point we want to analyze the consequences of the stopping criterion of the loop in $s$.

6.4.6 Lemma. Let Assumption 6.4.3 be fulfilled. Furthermore, let $k, i$ be fixed and $\tilde x$ be the unique optimal solution of

$$\text{minimize } F_{k,i}(x) \quad \text{s.t. } x \in M^0. \qquad (6.4.13)$$

Moreover, let $x^{k,i,s-1}, x^{k,i,s}$ be generated by Algorithm 6.4.4 and

$$\beta_{k,i,s} \ge \mu_k L^t_{S_{k,i,s}} C_{S_{k,i,s}} h_{k,i,s}$$

be valid. If inequality (6.4.11) is true, then

$$0 \le F_{k,i}(x^{k,i,s-1}) - F_{k,i}(\tilde x) \le \max\Big[1, \frac{\|x^{k,i,s-1} - \tilde x\|_\infty}{r_{k,i,s}}\Big](\varepsilon_{k,i} + 2\beta_{k,i,s}) \qquad (6.4.14)$$

and

$$\|x^{k,i,s-1} - \tilde x\|_\infty \le \|x^{k,i,s-1} - \tilde x\| \le \max\Big[\sqrt{\frac{2(\varepsilon_{k,i} + 2\beta_{k,i,s})}{\chi_k}},\ \frac{2(\varepsilon_{k,i} + 2\beta_{k,i,s})}{\chi_k r_{k,i,s}}\Big] \qquad (6.4.15)$$

hold.

Proof: First, let us remark that Problem (6.4.13) is uniquely solvable. Inequality (6.4.14) can be shown analogously to the proof of Lemma 4.5 in [1]. Thus, only the second inequality needs to be proven.

Due to the strong convexity of $F_{k,i}$ with modulus $\chi_k/2$ (in the sense of Definition A1.5.20) we have

$$F_{k,i}(\lambda x + (1-\lambda)y) \le \lambda F_{k,i}(x) + (1-\lambda)F_{k,i}(y) - \frac{\chi_k}{2}\lambda(1-\lambda)\|x-y\|^2$$

for all $\lambda \in [0,1]$ and all $x, y \in M^0$. Taking into account that

$$F_{k,i}(\tilde x) = \inf_{z \in M^0} F_{k,i}(z) \le F_{k,i}(\lambda x + (1-\lambda)y)$$

for all $\lambda \in [0,1]$ and all $x, y \in M^0$, it follows that

$$F_{k,i}(\tilde x) \le \lambda F_{k,i}(\tilde x) + (1-\lambda)F_{k,i}(x^{k,i,s-1}) - \frac{\chi_k}{2}\lambda(1-\lambda)\|\tilde x - x^{k,i,s-1}\|^2$$



and

$$(1-\lambda)F_{k,i}(\tilde x) \le (1-\lambda)F_{k,i}(x^{k,i,s-1}) - \frac{\chi_k}{2}\lambda(1-\lambda)\|\tilde x - x^{k,i,s-1}\|^2$$

are true for all $\lambda \in [0,1]$. Hence,

$$F_{k,i}(\tilde x) \le F_{k,i}(x^{k,i,s-1}) - \frac{\chi_k}{2}\lambda\|\tilde x - x^{k,i,s-1}\|^2$$

for all $\lambda \in [0,1)$, so that for $\lambda \uparrow 1$

$$\frac{\chi_k}{2}\|\tilde x - x^{k,i,s-1}\|^2 \le F_{k,i}(x^{k,i,s-1}) - F_{k,i}(\tilde x). \qquad (6.4.16)$$

Using (6.4.14) one obtains

$$\frac{\chi_k}{2}\|\tilde x - x^{k,i,s-1}\|^2 \le \max\Big[1, \frac{\|x^{k,i,s-1}-\tilde x\|_\infty}{r_{k,i,s}}\Big](\varepsilon_{k,i} + 2\beta_{k,i,s}).$$

At this point we distinguish two cases. We first suppose that $1 \ge \|x^{k,i,s-1}-\tilde x\|_\infty / r_{k,i,s}$. Then it holds that

$$\frac{\chi_k}{2}\|\tilde x - x^{k,i,s-1}\|^2 \le \varepsilon_{k,i} + 2\beta_{k,i,s}$$

and

$$\|x^{k,i,s-1}-\tilde x\|_\infty \le \|\tilde x - x^{k,i,s-1}\| \le \sqrt{\frac{2(\varepsilon_{k,i}+2\beta_{k,i,s})}{\chi_k}}. \qquad (6.4.17)$$

In the second case the inequality $1 < \|x^{k,i,s-1}-\tilde x\|_\infty / r_{k,i,s}$ is supposed to be true. Then

$$\frac{\chi_k}{2}\|\tilde x - x^{k,i,s-1}\|^2 \le \frac{\|x^{k,i,s-1}-\tilde x\|_\infty}{r_{k,i,s}}(\varepsilon_{k,i}+2\beta_{k,i,s}) \le \frac{\|x^{k,i,s-1}-\tilde x\|}{r_{k,i,s}}(\varepsilon_{k,i}+2\beta_{k,i,s})$$

is valid. We conclude that

$$\|\tilde x - x^{k,i,s-1}\|_\infty \le \|\tilde x - x^{k,i,s-1}\| \le \frac{2(\varepsilon_{k,i}+2\beta_{k,i,s})}{\chi_k r_{k,i,s}}. \qquad (6.4.18)$$

Combining (6.4.17) and (6.4.18) completes the proof.
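The key step (6.4.16), namely that for a strongly convex function the squared distance to the minimizer is controlled by the objective gap, can be checked numerically on a toy one-dimensional instance. The concrete choices below ($f(x)=|x|$, the quadratic prox term, the grid minimizer) are illustrative stand-ins, not part of the book's setting:

```python
chi = 2.0   # F below is strongly convex with modulus chi/2 (Definition A1.5.20)

def F(x):
    # F = f + (chi/2)(x - 1)^2 with convex f(x) = |x|; the exact minimizer is 0.5
    return abs(x) + 0.5 * chi * (x - 1.0) ** 2

xs = [-3.0 + 6.0 * j / 60000 for j in range(60001)]
xbar = min(xs, key=F)                  # grid stand-in for the exact minimizer

for x in [-2.0, -0.3, 0.1, 0.9, 2.5]:
    gap = F(x) - F(xbar)
    # inequality (6.4.16): (chi/2)|x - xbar|^2 <= F(x) - F(xbar)
    assert 0.5 * chi * (x - xbar) ** 2 <= gap + 1e-3
```

The tolerance only compensates for the grid approximation of the minimizer; for the exact minimizer the inequality holds with no slack (and with equality on the segments where $f$ is affine).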

In the following we denote the Euclidean ball with radius $\tau > 0$ around $x_c \in \mathbb{R}^n$ by $B_\tau(x_c)$, i.e.

$$B_\tau(x_c) := \{x \in \mathbb{R}^n : \|x - x_c\| \le \tau\}.$$

6.4.7 Theorem. Let Assumption 6.4.3 be fulfilled. Furthermore, let $\tau \ge 1$ and $x_c \in \mathbb{R}^n$ be chosen such that $M_{opt} \cap B_{\tau/8}(x_c) \ne \emptyset$. Let $x^* \in M_{opt} \cap B_{\tau/8}(x_c)$, $\bar x \in M^0 \cap B_\tau(x_c)$ and $x^0 \in M^0 \cap B_{\tau/4}(x_c)$ be fixed and $\delta_{k,i} > 0$, $q_{k,i} \in (0,1)$, $\alpha_k > 0$, $\bar t \in T(\bar x)$, $\bar v \in \partial g(\bar x, \bar t)$ as well as

$$c \ge \|\bar x - x^*\| \quad \text{and} \quad c_3 := f(\bar x) - f^- + c_0 + c_1$$



with

$$f^- \le \min_{x \in M} f(x), \qquad c_0 := \Big|\ln\Big(-\max_{t\in T} g(\bar x, t)\Big)\Big|, \qquad c_1 := \ln\Big(-\max_{t\in T} g(\bar x, t) + 2\|\bar v\|\Big)$$

be given. Moreover, assume that (6.4.12) is true for $k = 1,2,\dots$, $1 \le i \le i(k)$ and all $s$ occurring in the outer loop $(k,i)$, and that the controlling parameters of Algorithm 6.4.4 satisfy the following conditions:

$$\max\Big[\sqrt{\frac{2(\varepsilon_{k,i} + 2\delta_{k,i})}{\chi_k}},\ \frac{2(\varepsilon_{k,i} + 2\delta_{k,i})}{r_{k,i}\chi_k}\Big] \le \alpha_k, \qquad (6.4.19)$$

$$0 < \mu_{k+1} \le \mu_k < 1 \text{ for } k = 1,2,\dots, \qquad \mu_1 \le e^{-c_3}, \qquad (6.4.20)$$

$$\sum_{k=1}^\infty \Big[\sqrt{\frac{2\mu_k}{\chi_k}\big(2|\ln\mu_k| + \ln\tau\big)} + 2c\mu_k + \alpha_k\Big] < \frac{\tau}{2} \qquad (6.4.21)$$

and

$$\sigma_k > \sqrt{\frac{2\mu_k}{\chi_k}\big(2|\ln\mu_k| + \ln\tau\big) + 4\tau\alpha_k} + \alpha_k. \qquad (6.4.22)$$

Then it holds:

(1) the loop in $s$ is finite for each $(k,i)$;

(2) the loop in $i$ is finite for each $k$, i.e. $i(k) < \infty$;

(3) $\|x^{k,i} - x_c\| < \tau$ for all pairs $(k,i)$ with $0 \le i \le i(k)$;

(4) the sequence $\{x^{k,i}\} = \{x^{1,0}, \dots, x^{1,i(1)}, x^{2,0}, \dots, x^{2,i(2)}, x^{3,0}, \dots\}$ converges to an element $x^{**} \in M_{opt} \cap B_\tau(x_c)$.

Proof: The first statement follows immediately from Lemma 6.4.5. The other statements can be proven similarly to the proof of Theorem 1 in [222].

We first define $z^k = \mu_k\bar x + (1-\mu_k)x^*$. Due to $0 < \mu_k < 1$, $\bar x \in M^0 = \operatorname{int} M$, $x^* \in M$ and the fact that $M$ is convex, one can infer $z^k \in M^0$ by means of Theorem 6.1 in [348]. Then it follows that

$$-\mu_k \ln\Big(-\max_{t\in T} g(z^k,t)\Big) = -\mu_k \ln\Big(-\max_{t\in T} g(\mu_k\bar x + (1-\mu_k)x^*, t)\Big) \le -\mu_k \ln\Big(-\mu_k\max_{t\in T} g(\bar x,t) - (1-\mu_k)\max_{t\in T} g(x^*,t)\Big) \le -\mu_k \ln\Big(-\mu_k\max_{t\in T} g(\bar x,t)\Big) = -\mu_k\Big(\ln\mu_k + \ln\big(-\max_{t\in T} g(\bar x,t)\big)\Big) \le \mu_k\big(|\ln\mu_k| + c_0\big), \qquad (6.4.23)$$

because the max-function is convex and the logarithm increases monotonically. Furthermore, the estimates

$$\|z^k - x^*\| = \mu_k\|\bar x - x^*\| \le c\mu_k \qquad (6.4.24)$$

and

$$f(z^k) \le \mu_k f(\bar x) + (1-\mu_k)f(x^*) \le f(x^*) + \mu_k\big(f(\bar x) - f^-\big) \qquad (6.4.25)$$



are obviously true. Additionally, this yields

$$\max_{t\in T} g(x,t) \ge g(x,\bar t) \ge g(\bar x,\bar t) + \langle \bar v, x - \bar x\rangle = \max_{t\in T} g(\bar x,t) + \langle \bar v, x - \bar x\rangle$$

for all $x \in \mathbb{R}^n$, since $\bar v \in \partial g(\bar x,\bar t)$ and $\bar t \in T(\bar x)$. Consequently, using the Cauchy–Schwarz inequality and $\bar x \in B_\tau(x_c)$, we obtain

$$0 < -\max_{t\in T} g(x,t) \le -\max_{t\in T} g(\bar x,t) + 2\tau\|\bar v\| \quad \forall\, x \in B_\tau(x_c)\cap M^0.$$

Regarding the monotonicity of the logarithm function and $\tau \ge 1$, this leads to

$$\inf\Big\{-\mu_k \ln\Big(-\max_{t\in T} g(x,t)\Big) : x \in B_\tau(x_c)\cap M^0\Big\} \ge -\mu_k \ln\Big(-\max_{t\in T} g(\bar x,t) + 2\|\bar v\|\tau\Big) \ge -\mu_k(c_1 + \ln\tau). \qquad (6.4.26)$$

For $x \in M^0$ we introduce

$$f_k(x) := f(x) - \mu_k \ln\Big(-\max_{t\in T} g(x,t)\Big).$$

The inequalities (6.4.23), (6.4.25), (6.4.26) and the optimality of $x^*$ show that for all $x \in M^0 \cap B_\tau(x_c)$

$$f_k(z^k) \le f(x) - \mu_k\ln\Big(-\max_{t\in T} g(x,t)\Big) + \mu_k(c_1+\ln\tau) + \mu_k\big(f(\bar x) - f^-\big) + \mu_k\big(|\ln\mu_k| + c_0\big) = f_k(x) + \mu_k\big(c_3 + \ln\tau + |\ln\mu_k|\big). \qquad (6.4.27)$$

Combining this with (6.4.20) one gets

$$f_k(z^k) \le f_k(x) + \mu_k\big(2|\ln\mu_k| + \ln\tau\big) \quad \forall\, x \in M^0\cap B_\tau(x_c). \qquad (6.4.28)$$

Using the results above we can prove the second and third statements of the theorem by induction. To this end we assume:

(i) $k_0, i_0$ are kept fixed with $0 \le i_0 < i(k_0)$;

(ii) $i(k) < \infty$ if $k < k_0$;

(iii) denoting

$$\tilde x^{k,i} = \arg\min_{x\in M^0} F_{k,i}(x), \qquad \bar x^{k,i} = \arg\min_{x\in M}\Big\{f(x) + \frac{\chi_k}{2}\|x - x^{k,i-1}\|_2^2\Big\},$$

the following relations hold:

$$\bar x^{k,i} = \arg\min_{x\in M\cap B_\tau(x_c)}\Big\{f(x) + \frac{\chi_k}{2}\|x - x^{k,i-1}\|_2^2\Big\}, \qquad (6.4.29)$$

$$\|x^{k,i} - x_c\| < \tau, \quad \|\tilde x^{k,i} - x_c\| < \tau, \quad \|\bar x^{k,i} - x_c\| < \tau$$

for all pairs of indices

$$(k,i) \in Q_0 := \big\{(k',i') : k' < k_0,\ 0 < i' \le i(k')\big\} \cup \big\{(k',i') : k' = k_0,\ 0 < i' \le i_0\big\}.$$



Let us remark that $\tilde x^{k,i} \in M^0$ exists as minimizer of $F_{k,i}$. The existence of $\bar x^{k,i} \in M$ as minimizer of $f(x) + \frac{\chi_k}{2}\|x - x^{k,i-1}\|_2^2$ is ensured by the strong convexity and continuity of this function on the nonempty and closed set $M$. At this point we have to check (i)–(iii) for the starting values $k_0 = 1$, $i_0 = 0$, but this is easy: by construction $i(1) > 0$, so that $k_0 = 1$, $i_0 = 0$ fulfil the first assumption. The other two assumptions are also obvious by construction.

Using the stopping criterion of the loop in $s$ of Algorithm 6.4.4, (6.4.12), (6.4.15), (6.4.19) as well as the definition of $\tilde x^{k,i}$, we deduce

$$\|\tilde x^{k,i} - x^{k,i}\| \le \alpha_k. \qquad (6.4.30)$$

Furthermore, taking the definitions of $\tilde x^{k,i}$ and $\bar x^{k,i}$ into account, we can conclude

$$f(\tilde x^{k,i}) + \frac{\chi_k}{2}\|\tilde x^{k,i} - x^{k,i-1}\|^2 - f(\bar x^{k,i}) - \frac{\chi_k}{2}\|\bar x^{k,i} - x^{k,i-1}\|^2 \le \mu_k. \qquad (6.4.31)$$

Additionally one can establish

$$\frac{\chi_k}{2}\|\tilde x^{k,i} - \bar x^{k,i}\|^2 \le f(\tilde x^{k,i}) + \frac{\chi_k}{2}\|\tilde x^{k,i} - x^{k,i-1}\|^2 - f(\bar x^{k,i}) - \frac{\chi_k}{2}\|\bar x^{k,i} - x^{k,i-1}\|^2 \qquad (6.4.32)$$

in the same manner as (6.4.16). Combining (6.4.31) and (6.4.32) we see

$$\frac{\chi_k}{2}\|\tilde x^{k,i} - \bar x^{k,i}\|^2 \le \mu_k, \quad\text{so that}\quad \|\tilde x^{k,i} - \bar x^{k,i}\| \le \sqrt{\frac{2\mu_k}{\chi_k}}. \qquad (6.4.33)$$

Using this and (6.4.30) we obtain

$$\|\bar x^{k,i} - x^{k,i}\| \le \alpha_k + \sqrt{\frac{2\mu_k}{\chi_k}}. \qquad (6.4.34)$$

Due to $x^{k,i} \in M^0\cap B_\tau(x_c)$, estimate (6.4.28) implies

$$f_k(z^k) \le f_k(x^{k,i}) + \mu_k\big(2|\ln\mu_k| + \ln\tau\big) \quad \forall\, (k,i)\in Q_0. \qquad (6.4.35)$$

In the sequel we distinguish the following cases:

(a) $k < k_0$, $0 \le i < i(k)-1$, or $k = k_0$, $0 \le i < i_0$;

(b) $k < k_0$, $i = i(k)-1$;

(c) $k = k_0$, $i = i_0 + 1$.

ad (a): In this case we obtain

$$\|\tilde x^{k,i+1} - z^k\|_2^2 - \|x^{k,i} - z^k\|_2^2 \le -\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 + \frac{2}{\chi_k}\big(f_k(z^k) - f_k(\tilde x^{k,i+1})\big) \le -\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 + \frac{2\mu_k}{\chi_k}\big(2|\ln\mu_k| + \ln\tau\big) \qquad (6.4.36)$$



by using Proposition 3.1.3 and (6.4.35). Taking into account (6.4.22), (6.4.30) and the stopping criterion of the loop in $i$ of Algorithm 6.4.4, we conclude

$$\|\tilde x^{k,i+1} - x^{k,i}\| \ge \|x^{k,i+1} - x^{k,i}\| - \|x^{k,i+1} - \tilde x^{k,i+1}\| > \sigma_k - \alpha_k > 0 \qquad (6.4.37)$$

and we have

$$\|\tilde x^{k,i+1} - z^k\|_2^2 - \|x^{k,i} - z^k\|_2^2 \le -\varepsilon_k^2 + \gamma_k < 0 \qquad (6.4.38)$$

with $\varepsilon_k = \sigma_k - \alpha_k$ and $\gamma_k = 2\mu_k(2|\ln\mu_k| + \ln\tau)/\chi_k$.

Moreover, regarding $\|x^{k,i} - x_c\| < \tau$ and $\|z^k - x_c\| < \tau$, the estimate

$$\|\tilde x^{k,i+1} - z^k\| - \|x^{k,i} - z^k\| < \frac{-\varepsilon_k^2 + \gamma_k}{2\|x^{k,i} - z^k\|} \le \frac{1}{4\tau}\big(-\varepsilon_k^2 + \gamma_k\big) \qquad (6.4.39)$$

holds. Together with (6.4.22) and (6.4.30) we obtain

$$\|x^{k,i+1} - z^k\| - \|x^{k,i} - z^k\| < \frac{1}{4\tau}\big(-\varepsilon_k^2 + \gamma_k\big) + \alpha_k < 0. \qquad (6.4.40)$$

ad (b): Now we assume $k < k_0$, $i = i(k)-1$. In this case we can combine (6.4.28), (6.4.30) and the implications of Proposition 3.1.3 to see that

$$\|x^{k,i(k)} - z^k\| - \|x^{k,i(k)-1} - z^k\| \le \|\tilde x^{k,i(k)} - z^k\| - \|x^{k,i(k)-1} - z^k\| + \alpha_k \le \sqrt{\frac{2}{\chi_k}\big(f_k(z^k) - f_k(\tilde x^{k,i(k)})\big)} + \alpha_k \le \sqrt{\frac{2\mu_k}{\chi_k}\big(2|\ln\mu_k| + \ln\tau\big)} + \alpha_k = \sqrt{\gamma_k} + \alpha_k \qquad (6.4.41)$$

holds. Summing up the inequalities (6.4.40) w.r.t. $i = 0,1,\dots,i(k)-2$ for a fixed $k < k_0$ and adding (6.4.41) leads to

$$\|x^{k,i(k)} - z^k\| - \|x^{k,0} - z^k\| \le \sqrt{\gamma_k} + \alpha_k, \qquad (6.4.42)$$

and together with (6.4.24) one has

$$\|x^{k,i(k)} - x^*\| - \|x^{k,0} - x^*\| \le \sqrt{\gamma_k} + \alpha_k + 2c\mu_k. \qquad (6.4.43)$$

ad (c): Now we assume $k = k_0$, $i = i_0 + 1$. In this case we consider

$$\hat x^{k_0,i_0+1} := \arg\min_{x\in M\cap B_\tau(x_c)}\Big\{f(x) + \frac{\chi_{k_0}}{2}\|x - x^{k_0,i_0}\|_2^2\Big\}.$$

The non-expansivity of the prox-mapping (see Remark 3.1.2) yields

$$\|\hat x^{k_0,i_0+1} - x^*\| \le \|x^{k_0,i_0} - x^*\|. \qquad (6.4.44)$$

Using this, (6.4.24) and (6.4.40) for $k = k_0$, $0 \le i < i_0$, we obtain

$$\|\hat x^{k_0,i_0+1} - x^*\| \le \|x^{k_0,i_0} - z^{k_0}\| + c\mu_{k_0} \le \|x^{k_0,0} - z^{k_0}\| + c\mu_{k_0} \le \|x^{k_0,0} - x^*\| + 2c\mu_{k_0}.$$



If $k_0 > 1$ this leads to

$$\|\hat x^{k_0,i_0+1} - x^*\| \le \|x^{k_0-1,i(k_0-1)} - x^*\| + 2c\mu_{k_0}.$$

Now a successive application of (6.4.43) gives

$$\|\hat x^{k_0,i_0+1} - x^*\| \le \|x^{1,0} - x^*\| + \sum_{j=1}^{k_0-1}\big(\sqrt{\gamma_j} + \alpha_j + 2c\mu_j\big) + 2c\mu_{k_0}. \qquad (6.4.45)$$

Due to the assumptions $\|x^* - x_c\| \le \frac{\tau}{8}$ and $\|x^{1,0} - x_c\| < \frac{\tau}{4}$, we can now assemble (6.4.20), (6.4.21) and (6.4.45) to get

$$\|\hat x^{k_0,i_0+1} - x_c\| < \tau - \alpha_{k_0} - \sqrt{\gamma_{k_0}} < \tau - \alpha_{k_0} - \sqrt{\frac{2\mu_{k_0}}{\chi_{k_0}}}. \qquad (6.4.46)$$

We see that $\|\hat x^{k_0,i_0+1} - x_c\| < \tau$, and due to the strong convexity of $f + \frac{\chi_{k_0}}{2}\|\cdot - x^{k_0,i_0}\|_2^2$ we can deduce that $\hat x^{k_0,i_0+1}$ must actually be identical to $\bar x^{k_0,i_0+1}$. But then the estimates (6.4.33) and (6.4.34) imply $\|\tilde x^{k_0,i_0+1} - x_c\| < \tau$ and $\|x^{k_0,i_0+1} - x_c\| < \tau$ as well.

So far we have proven that assumption (iii) is also true for $k = k_0$, $i = i_0+1$. It still remains to prove that $i(k_0) < \infty$ holds. In order to do so, we sum up the inequalities (6.4.40) up to an arbitrary $i \le i(k_0)-1$ and obtain

$$\|x^{k_0,i} - z^{k_0}\| < \|x^{k_0,0} - z^{k_0}\| + i\Big(\frac{1}{4\tau}\big(-\varepsilon_{k_0}^2 + \gamma_{k_0}\big) + \alpha_{k_0}\Big),$$

giving an upper bound for $i$:

$$i < \frac{-\|x^{k_0,0} - z^{k_0}\|}{\frac{1}{4\tau}\big(-\varepsilon_{k_0}^2 + \gamma_{k_0}\big) + \alpha_{k_0}} < \infty. \qquad (6.4.47)$$

Thus we have shown the induction statements to hold for $(k_0, i_0+1)$ if $i_0 < i(k_0)$. But the case $i_0 = i(k_0)$ is equivalent to the case $(k_0+1, 0)$, and so the induction holds for all possible indices $(k_0,i_0)$. As a consequence, the second and third statements of the theorem are proven.

It remains to prove the convergence of the generated sequence $\{x^{k,i}\}$ to an optimal solution of SIP (6.4.3). Let $\hat x$ be an arbitrary element of $M_{opt}\cap B_\tau(x_c)$. Defining

$$\hat z^k := \bar x + (1-\mu_k)(\hat x - \bar x),$$

we can show $\|\hat z^k - \hat x\| \le 2\tau\mu_k$ similarly to (6.4.24), and results analogous to (6.4.42) and (6.4.43) hold with $\hat z^k$ instead of $z^k$ and $\hat x$ instead of $x^*$. Additionally, we obtain from (6.4.21)

$$\sum_{k=1}^\infty \sqrt{\gamma_k} < \infty, \qquad \sum_{k=1}^\infty \mu_k < \infty \qquad\text{and}\qquad \sum_{k=1}^\infty \alpha_k < \infty,$$

and the convergence of $\{\|x^{k,0} - \hat x\|\}$ is ensured by Lemma A3.1.7. Moreover, the results (6.4.27), (6.4.36), (6.4.40) and (6.4.41) remain true if we use $\hat z^k$ instead of $z^k$, and

$$\|x^{k,i} - \hat z^k\| < \|x^{k,0} - \hat z^k\|, \qquad \|x^{k+1,0} - \hat z^k\| \le \sqrt{\gamma_k} + \alpha_k + \|x^{k,i} - \hat z^k\|$$



for all $k$ and $0 < i < i(k)$. Since $\hat z^k, \hat x \in B_\tau(x_c)$, this leads to

$$\|x^{k+1,0} - \hat x\| - \sqrt{\gamma_k} - \alpha_k - 4\tau\mu_k \le \|x^{k,i} - \hat x\| < \|x^{k,0} - \hat x\| + 4\tau\mu_k,$$

hence the sequence $\{\|x^{k,i} - \hat x\|\}$ converges. Furthermore, regarding (6.4.30) and $\lim_{k\to\infty}\alpha_k = 0$, which is enforced by (6.4.21), it is clear that $\{\|\tilde x^{k,i} - \hat x\|\}$ converges to the same limit.

Due to $\bar x, \hat x \in B_\tau(x_c)$ and $0 < \mu_k < 1$ for $k = 1,2,\dots$, we have $\hat z^k \in B_\tau(x_c)$ for all $k \in \mathbb{N}$, as well as

$$\|x^{k,i-1} - \hat z^k\| \le \|x^{k,i-1} - \hat x\| + 2\tau\mu_k$$

for all pairs $(k,i)$ with $1 \le i \le i(k)$, and

$$\|x^{k,i} - \hat z^k\| \ge \|x^{k,i} - \hat x\| - 2\tau\mu_k$$

for all pairs $(k,i)$ with $0 \le i \le i(k)$. Consequently we obtain

$$\|x^{k,i-1} - \hat z^k\|^2 \le \|x^{k,i-1} - \hat x\|^2 + 8\tau^2\mu_k + 4\tau^2\mu_k^2$$

for all pairs $(k,i)$ with $1 \le i \le i(k)$, and

$$\|x^{k,i} - \hat z^k\|^2 \ge \|x^{k,i} - \hat x\|^2 - 8\tau^2\mu_k - 4\tau^2\mu_k^2$$

for all pairs $(k,i)$ with $0 \le i \le i(k)$. Additionally, the modified estimate (6.4.27),

$$f_k(\hat z^k) \le f_k(x) + \mu_k\big(c_3 + \ln\tau + |\ln\mu_k|\big) \quad \forall\, x\in M^0\cap B_\tau(x_c),$$

and the modified estimate (6.4.36),

$$\|x^{k,i} - \hat z^k\|^2 - \|x^{k,i-1} - \hat z^k\|^2 \le \frac{2}{\chi_k}\big(f_k(\hat z^k) - f_k(x^{k,i})\big),$$

allow us to infer

$$\|x^{k,i-1} - \hat x\|^2 - \|x^{k,i} - \hat x\|^2 \ge \frac{2}{\chi_k}\big(f_k(x^{k,i}) - f_k(x) - \mu_k(c_3+\ln\tau+|\ln\mu_k|)\big) - 16\tau^2\mu_k - 8\tau^2\mu_k^2$$

for all $x \in M^0\cap B_\tau(x_c)$. Then, regarding $x^{k,i}\in M^0\cap B_\tau(x_c)$ and estimate (6.4.26), we obtain

$$\|x^{k,i-1} - \hat x\|^2 - \|x^{k,i} - \hat x\|^2 \ge \frac{2}{\chi_k}\big(f(x^{k,i}) - f_k(x) - \mu_k(c_1+\ln\tau)\big) - \frac{2\mu_k}{\chi_k}\big(c_3+\ln\tau+|\ln\mu_k|\big) - 8\tau^2\mu_k(2+\mu_k). \qquad (6.4.48)$$

Furthermore, we have $\lim_{k\to\infty} f_k(x) = f(x)$ for each fixed $x\in M^0$; thus $\mu_k\to 0$ and $\chi_k \le \overline\chi$ give

$$\limsup_{k\to\infty}\Big(\max_{1\le i\le i(k)}\big(f(x^{k,i}) - f(x)\big)\Big) \le 0 \qquad (6.4.49)$$

Page 233: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

6.4. REGULARIZED INTERIOR POINT METHODS 217

for each fixed $x \in M^0\cap B_\tau(x_c)$.

Now let $x^{**}$ be an accumulation point of the sequence $\{x^{k,i}\}$. Such an accumulation point exists because $x^{k,i} \in B_\tau(x_c)\cap M$ for all pairs $(k,i)$. Regarding (6.4.30) and $\lim_{k\to\infty}\alpha_k = 0$, it follows that $x^{**}$ is also an accumulation point of the sequence $\{\tilde x^{k,i}\}$. Furthermore, we obtain $x^{**} \in M\cap B_\tau(x_c)$ because the sets $M$ and $B_\tau(x_c)$ are closed. For each $x \in M^0\cap B_\tau(x_c)$, estimate (6.4.49) establishes $f(x)$ as an upper bound for $f(x^{**})$, so that we deduce

$$f(x^{**}) \le \inf\{f(x) : x \in M^0\cap B_\tau(x_c)\}. \qquad (6.4.50)$$

Obviously $M\cap B_\tau(x_c)$ is the closure of $M^0\cap B_\tau(x_c)$. Furthermore, $x^* \in M_{opt}\cap B_\tau(x_c)$, so that (6.4.50) implies $f(x^{**}) \le f(x^*)$, i.e. $x^{**} \in M_{opt}\cap B_\tau(x_c)$. Consequently, since $\{\|x^{k,i} - \hat x\|\}$ converges for each $\hat x \in M_{opt}\cap B_\tau(x_c)$, the sequence $\{\|x^{k,i} - x^{**}\|\}$ converges to zero. Thus the sequence $\{x^{k,i}\}$ converges to $x^{**} \in M_{opt}$.
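The proof of case (c) rests on the non-expansivity of the prox-mapping (Remark 3.1.2). This property is easy to illustrate numerically on a one-dimensional toy instance; the choices $f(z)=|z|$, $M=[-1,2]$ and $\chi=1$ below are illustrative, and the closed-form soft-threshold-plus-clip expression is a standard 1-D fact, not the book's construction:

```python
def prox(x, chi=1.0, lo=-1.0, hi=2.0):
    """Prox-mapping of f(z) = |z| over M = [lo, hi] with parameter chi:
    argmin_z { |z| + (chi/2)(z - x)^2 : z in [lo, hi] }.
    In 1-D this is soft-thresholding followed by clipping to the interval."""
    t = 1.0 / chi
    s = max(abs(x) - t, 0.0)
    s = s if x >= 0.0 else -s
    return min(max(s, lo), hi)

pts = [i / 10.0 - 5.0 for i in range(101)]      # grid on [-5, 5]
for a in pts:
    for b in pts:
        # non-expansivity: |prox(a) - prox(b)| <= |a - b|
        assert abs(prox(a) - prox(b)) <= abs(a - b) + 1e-12
```

Both soft-thresholding and clipping are 1-Lipschitz, which is why their composition cannot expand distances.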

6.4.8 Remark. If $\max_{t\in T} g(\cdot,t)$ is bounded below on the feasible set $M$ of SIP (6.4.3), i.e. there exists a (non-positive) constant $d > -\infty$ with $d \le \max_{t\in T} g(x,t)$ for all $x \in M$, one obtains

$$\inf\Big\{-\mu_k\ln\Big(-\max_{t\in T} g(x,t)\Big) : x\in B_\tau(x_c)\cap M^0\Big\} \ge -\mu_k\ln(-d).$$

Consequently, using this estimate instead of (6.4.26) in the proof above, the conditions on the parameters of Algorithm 6.4.4 in Theorem 6.4.7 can be simplified in the considered case. In particular, $c_3$ can be changed into

$$c_3 := f(\bar x) - f^- + c_0 - \ln(-d),$$

and in (6.4.21) and (6.4.22) the term $\ln\tau$ can be dropped. Thus the left-hand side of the modified estimate (6.4.21) does not depend on (the unknown) $\tau$. Therefore one could choose the value of $\tau$ after determining $\{\mu_k\}$, $\{\alpha_k\}$ and $\{\chi_k\}$. Finally, the value of $\sigma_k$ can be fixed such that (6.4.22) holds. Altogether the described procedure is much easier than the simultaneous determination of all parameters in the general case. ♦

6.4.9 Remark. The conditions on the parameters of the method require their separate adjustment for each particular example, which can be a very fragile task when applying an MSR-method. In the case of an OSR-method, parameters according to Theorem 6.4.7 can easily be chosen. As we already know, an OSR-procedure is given if $i(k) = 1$ for each $k$, which can be ensured by choosing $\sigma_k$ sufficiently large; then (6.4.22) is automatically satisfied for each fixed $\tau \ge 1$. Of course, when applying an MSR-method, large values of $\sigma_k$ must be avoided. Furthermore, (6.4.21) holds for all sufficiently large values of $\tau$ if one guarantees that

$$\sum_{k=1}^\infty\sqrt{\frac{\mu_k|\ln\mu_k|}{\chi_k}} < \infty \qquad\text{and}\qquad \sum_{k=1}^\infty \alpha_k < \infty.$$

Consequently, (6.4.21) and (6.4.22) can be replaced by the conditions above, and $\tau$, $\sigma_k$ need not be specified explicitly. ♦
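The summability conditions of the remark are easy to satisfy with standard polynomial schedules. The following check uses one admissible choice, $\mu_k = k^{-4}$, $\chi_k \equiv 1$, $\alpha_k = k^{-2}$; these particular sequences are illustrative assumptions, not prescribed by the book:

```python
import math

# One admissible (illustrative) schedule: mu_k = k^-4, chi_k = 1, alpha_k = k^-2.
def sqrt_term(k):
    mu, chi = float(k) ** -4, 1.0
    return math.sqrt(mu * abs(math.log(mu)) / chi)   # term of the first series

s1 = sum(sqrt_term(k) for k in range(1, 200001))
tail1 = sum(sqrt_term(k) for k in range(100001, 200001))
s2 = sum(float(k) ** -2 for k in range(1, 200001))   # partial sum of alpha_k

assert tail1 < 1e-3          # the tail is already negligible: the series converges
assert s2 < math.pi ** 2 / 6 # partial sums of 1/k^2 stay below the classical limit
```

Here $\sqrt{\mu_k|\ln\mu_k|/\chi_k} = 2\sqrt{\ln k}\,/k^2$, so both series converge by comparison with $\sum k^{-3/2}$.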



At the end of this subsection an estimate of the difference between the current value of the objective function $f$ at each exterior step and its minimal value $f^*$ on $M$ is established (cf. also Theorem 4.3.10 and Theorem 6.3.7).

6.4.10 Lemma. Let the assumptions of Theorem 6.4.7 be satisfied and let $f$ be Lipschitz continuous with modulus $L$ on $B_\tau(x_c)$. Then it holds that

$$f(x^k) - f^* \le \Big(\frac{9}{8}\tau\chi_k + L\Big)\Big(\sqrt{\frac{2\mu_k}{\chi_k}} + \alpha_k\Big) + \frac{9}{8}\tau\chi_k\sigma_k, \qquad k = 1,2,\dots$$

Proof: Let $k$ be fixed and $0 \le i < i(k)$ be arbitrarily given. In the proof of Theorem 6.4.7 we already defined

$$\bar x^{k,i+1} := \arg\min_{x\in M}\Big\{f(x) + \frac{\chi_k}{2}\|x - x^{k,i}\|^2\Big\}.$$

The inclusion $\bar x^{k,i+1} \in B_\tau(x_c)$ has already been shown. Because $M\cap B_\tau(x_c)$ is convex, $(1-\lambda)\bar x^{k,i+1} + \lambda x^* \in M\cap B_\tau(x_c)$ for all $\lambda\in[0,1]$. The optimality of $\bar x^{k,i+1}$ gives

$$f\big((1-\lambda)\bar x^{k,i+1} + \lambda x^*\big) + \frac{\chi_k}{2}\big\|(1-\lambda)\bar x^{k,i+1} + \lambda x^* - x^{k,i}\big\|_2^2 \ge f(\bar x^{k,i+1}) + \frac{\chi_k}{2}\|\bar x^{k,i+1} - x^{k,i}\|_2^2,$$

so that, regarding the convexity of $f$, for all $\lambda\in(0,1]$

$$0 \le \lambda f(x^*) - \lambda f(\bar x^{k,i+1}) + \frac{\chi_k}{2}\lambda^2\|x^* - \bar x^{k,i+1}\|_2^2 + \lambda\chi_k\langle x^* - \bar x^{k,i+1},\ \bar x^{k,i+1} - x^{k,i}\rangle$$

and

$$f(\bar x^{k,i+1}) - f(x^*) \le \chi_k\langle x^* - \bar x^{k,i+1},\ \bar x^{k,i+1} - x^{k,i}\rangle + \frac{\chi_k}{2}\lambda\|x^* - \bar x^{k,i+1}\|_2^2.$$

Passing to the limit $\lambda\downarrow 0$, combined with the Cauchy–Schwarz inequality, leads to

$$f(\bar x^{k,i+1}) - f(x^*) \le \chi_k\langle x^* - \bar x^{k,i+1},\ \bar x^{k,i+1} - x^{k,i}\rangle \le \chi_k\|x^* - \bar x^{k,i+1}\|\big(\|\bar x^{k,i+1} - x^{k,i+1}\| + \|x^{k,i+1} - x^{k,i}\|\big),$$

and using the Lipschitz continuity of $f$,

$$f(x^{k,i+1}) - f(x^*) \le \|\bar x^{k,i+1} - x^{k,i+1}\|\big(L + \chi_k\|x^* - \bar x^{k,i+1}\|\big) + \chi_k\|x^* - \bar x^{k,i+1}\|\,\|x^{k,i+1} - x^{k,i}\|.$$

In view of (6.4.34) and

$$\|x^* - \bar x^{k,i+1}\| \le \|x^* - x_c\| + \|\bar x^{k,i+1} - x_c\| \le \frac{\tau}{8} + \tau = \frac{9}{8}\tau,$$

we obtain

$$f(x^{k,i+1}) - f(x^*) \le \Big(\frac{9}{8}\tau\chi_k + L\Big)\Big(\alpha_k + \sqrt{\frac{2\mu_k}{\chi_k}}\Big) + \frac{9}{8}\tau\chi_k\|x^{k,i+1} - x^{k,i}\|,$$

so that the statement follows by means of $\|x^{k,i(k)} - x^{k,i(k)-1}\| \le \sigma_k$, $x^k = x^{k,i(k)}$ and $f(x^*) = f^*$.
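The bound of Lemma 6.4.10 tends to zero whenever $\mu_k, \alpha_k, \sigma_k \to 0$ and $\chi_k$ stays bounded. This can be checked numerically; the values of $\tau$, $L$ and the schedules below are illustrative assumptions (in particular, they are not verified here against all conditions of Theorem 6.4.7):

```python
import math

def rhs(mu, chi, alpha, sigma, tau=2.0, L=1.0):
    """Right-hand side of the estimate in Lemma 6.4.10 (tau, L illustrative)."""
    return (9.0 / 8.0 * tau * chi + L) * (math.sqrt(2.0 * mu / chi) + alpha) \
           + 9.0 / 8.0 * tau * chi * sigma

# illustrative schedules: mu_k = 2^-k, chi_k = 1, alpha_k = sigma_k = k^-2
bounds = [rhs(2.0 ** -k, 1.0, float(k) ** -2, float(k) ** -2) for k in range(1, 40)]

assert all(b2 < b1 for b1, b2 in zip(bounds, bounds[1:]))   # the bound decreases in k
assert bounds[-1] < 1e-2                                    # and tends to zero
```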



6.4.2.1 Rate of convergence

In the following subsections we analyze further convergence properties of Algorithm 6.4.4 with regard to the rate of convergence, based on results of Kaplan and Tichatschke [222]. In doing so we consider the sequence $\{\tilde x^{k,i}\}$ instead of the generated sequence $\{x^{k,i}\}$, with

$$\tilde x^{k,i} = \arg\min_{x\in M^0} F_{k,i}(x)$$

defined as in the proof of Theorem 6.4.7. That means we consider the sequence of exact minimizers of $F_{k,i}$ instead of the computed approximate minimizers. However, based on results for the exact minima we can also obtain results for the approximate minima, e.g. by using (6.4.30).

First the value of $\max_{t\in T} g(\tilde x^{k,i}, t)$ is estimated.

6.4.11 Lemma. Let the assumptions of Theorem 6.4.7 be satisfied. Then for each $k$ and $1 \le i \le i(k)$

$$-\max_{t\in T} g(\tilde x^{k,i}, t) \ge c_4\mu_k$$

holds with

$$c_4 := \frac{-\max_{t\in T} g(\bar x, t)}{\mu_1 + f(\bar x) - f^* + 2\tau^2\overline\chi}$$

and $\bar x$ defined in Theorem 6.4.7.

Proof: Let $k, i$ be arbitrarily given with $1 \le i \le i(k)$. Due to $\tilde x^{k,i} \in M^0 = \operatorname{dom} F_{k,i}$, Theorem 23.1 in Rockafellar [348] ensures the existence of the directional derivative $F'_{k,i}(\tilde x^{k,i}; d)$ for each $d\in\mathbb{R}^n$. Since $F_{k,i}$ attains its minimal value at $\tilde x^{k,i}$ and since $M^0$ is open, we obtain $F'_{k,i}(\tilde x^{k,i}; d) \ge 0$ for each $d\in\mathbb{R}^n$, so that $0 \in \partial F_{k,i}(\tilde x^{k,i})$ follows from Theorem 23.2 in [348]. From subdifferential calculus we already know

$$\partial F_{k,i}(\tilde x^{k,i}) \supset \partial f_k(\tilde x^{k,i}) + \chi_k\big(\tilde x^{k,i} - x^{k,i-1}\big).$$

Regarding

$$\tilde x^{k,i} \in M^0 = \operatorname{ri}\operatorname{dom}(f_k) \cap \operatorname{ri}\operatorname{dom}\Big(\frac{\chi_k}{2}\|\cdot - x^{k,i-1}\|^2\Big),$$

Theorem 23.8 in [348] even leads to

$$\partial F_{k,i}(\tilde x^{k,i}) = \partial f_k(\tilde x^{k,i}) + \chi_k\big(\tilde x^{k,i} - x^{k,i-1}\big).$$

Moreover, one obtains

$$\partial f_k(\tilde x^{k,i}) = \partial f(\tilde x^{k,i}) + \mu_k\frac{1}{-\max_{t\in T} g(\tilde x^{k,i}, t)}\,\partial\Big(\max_{t\in T} g(\tilde x^{k,i}, t)\Big),$$

hence

$$\partial F_{k,i}(\tilde x^{k,i}) = \partial f(\tilde x^{k,i}) + \frac{\mu_k}{-\max_{t\in T} g(\tilde x^{k,i},t)}\,\partial\Big(\max_{t\in T} g(\tilde x^{k,i},t)\Big) + \chi_k\big(\tilde x^{k,i} - x^{k,i-1}\big),$$



i.e. there exist $u_f \in \partial f(\tilde x^{k,i})$ and $u_g \in \partial\big(\max_{t\in T} g(\tilde x^{k,i},t)\big)$ with

$$u_f - \frac{\mu_k}{\max_{t\in T} g(\tilde x^{k,i},t)}\,u_g + \chi_k\big(\tilde x^{k,i} - x^{k,i-1}\big) = 0,$$

and multiplication by $(\tilde x^{k,i} - \bar x)$ leads to

$$\langle u_f, \tilde x^{k,i} - \bar x\rangle + \frac{\mu_k}{-\max_{t\in T} g(\tilde x^{k,i},t)}\langle u_g, \tilde x^{k,i} - \bar x\rangle + \chi_k\langle \tilde x^{k,i} - x^{k,i-1},\ \tilde x^{k,i} - \bar x\rangle = 0.$$

Using the properties of the subgradients $u_f$ and $u_g$ as well as the convexity of the norm, we obtain

$$0 \ge f(\tilde x^{k,i}) - f(\bar x) + \frac{\mu_k}{-\max_{t\in T} g(\tilde x^{k,i},t)}\Big(\max_{t\in T} g(\tilde x^{k,i},t) - \max_{t\in T} g(\bar x,t)\Big) + \frac{\chi_k}{2}\Big(\|\tilde x^{k,i} - x^{k,i-1}\|^2 - \|\bar x - x^{k,i-1}\|^2\Big),$$

so that we can conclude

$$\frac{\mu_k}{-\max_{t\in T} g(\tilde x^{k,i},t)}\Big(\max_{t\in T} g(\tilde x^{k,i},t) - \max_{t\in T} g(\bar x,t)\Big) \le f(\bar x) - f^* + \frac{\chi_k}{2}\|\bar x - x^{k,i-1}\|^2,$$

and using $\bar x, x^{k,i-1} \in B_\tau(x_c)$ gives

$$\frac{\mu_k}{-\max_{t\in T} g(\tilde x^{k,i},t)}\Big(\max_{t\in T} g(\tilde x^{k,i},t) - \max_{t\in T} g(\bar x,t)\Big) \le f(\bar x) - f^* + 2\tau^2\overline\chi.$$

Now, regarding $\mu_k \le \mu_1$, it is obvious that

$$-\max_{t\in T} g(\tilde x^{k,i},t) \ge \mu_k\,\frac{-\max_{t\in T} g(\bar x,t)}{\mu_1 + f(\bar x) - f^* + 2\tau^2\overline\chi}$$

holds, and the proof is complete.

Denote

$$\Delta_{k,i} := f(\tilde x^{k,i}) - f^*$$

for each $k$ and $1 \le i \le i(k)$. In order to complete this definition for $i = 0$, we set $\tilde x^{k+1,0} := \tilde x^{k,i(k)}$ for each $k = 1,2,\dots$ as well as $\tilde x^{1,0} := x^{1,0} = x^0$, so that $\Delta_{k,0}$ can be defined as $\Delta_{k,i}$ above.

6.4.12 Theorem. Let the assumptions of Theorem 6.4.7 be satisfied. Moreover, assume that

$$\mu_1 \le -\frac{1}{c_4}\max_{t\in T} g(x^0, t) \qquad (6.4.51)$$

with $c_4$ given as in Lemma 6.4.11. Additionally, let a positive constant $\alpha$ be given with

$$\alpha \le \frac{1}{16\overline\chi\tau^2}, \qquad \alpha\,\sup_{k,i}\Delta_{k,i} \le \frac{7}{32}, \qquad (6.4.52)$$

and assume that for each $k$ the constant

$$\zeta_k := \mu_k(c_1 + \ln\tau) - \mu_k\ln\Big(\frac{1}{8}c_4\mu_k\Big) + \frac{\chi_k}{2}\alpha_k^2 + \frac{7}{4}\chi_k\alpha_k\tau \qquad (6.4.53)$$



satisfies

$$\zeta_k \le \alpha\Bigg(\frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}}\Bigg)^2, \qquad (6.4.54)$$

$$2\zeta_k \le \frac{\chi_k}{2}(\sigma_k - \alpha_k)^2 \qquad (6.4.55)$$

with $i_1(j) := \max\{1,\ 2i(j) - 2\}$. Then the estimate

$$\Delta_{k,i} \le \frac{\Delta_{1,0}}{1 + \alpha\big(2i + \sum_{j=1}^{k-1} i_1(j)\big)\Delta_{1,0}} \qquad (6.4.56)$$

is true for $k = 1,2,\dots$, $0 \le i < i(k)$, if $\Delta_{1,0} > 0$, i.e. $x^0 \notin M_{opt}$.

Proof: Denote $z^{k,i} := \arg\min_{z\in M_{opt}\cap B_\tau(x_c)}\|\tilde x^{k,i} - z\|$ for each $k$ and $0 \le i \le i(k)$. Then $z^{k,i}(\lambda) := \lambda z^{k,i} + (1-\lambda)\tilde x^{k,i} \in M^0\cap B_\tau(x_c)$ for all $\lambda\in[0,1)$ and all $k$, $0 \le i \le i(k)$, and we have

$$F_{k,i+1}(\tilde x^{k,i+1}) \le F_{k,i+1}\big(z^{k,i}(\lambda)\big)$$

for $\lambda\in[0,1)$ if $i < i(k)$. Taking into account the convexity of $f$ as well as (6.4.30), we obtain

$$F_{k,i+1}(\tilde x^{k,i+1}) = f(\tilde x^{k,i+1}) - \mu_k\ln\Big(-\max_{t\in T} g(\tilde x^{k,i+1},t)\Big) + \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|^2 \le F_{k,i+1}\big(z^{k,i}(\lambda)\big) \le \lambda f(z^{k,i}) + (1-\lambda)f(\tilde x^{k,i}) - \mu_k\ln\Big(-\max_{t\in T} g(z^{k,i}(\lambda),t)\Big) + \frac{\chi_k}{2}\big(\lambda\|z^{k,i} - \tilde x^{k,i}\| + \alpha_k\big)^2. \qquad (6.4.57)$$

Using the convexity of $\max_{t\in T} g(\cdot,t)$ and the inclusion $z^{k,i}\in M$, we have

$$\max_{t\in T} g\big(z^{k,i}(\lambda),t\big) \le (1-\lambda)\max_{t\in T} g(\tilde x^{k,i},t),$$

so that we infer

$$-\mu_k\ln\Big(-\max_{t\in T} g(z^{k,i}(\lambda),t)\Big) \le -\mu_k\ln\Big(-(1-\lambda)\max_{t\in T} g(\tilde x^{k,i},t)\Big)$$

for all pairs $(k,i)$ with $1 \le i \le i(k)$ and $\lambda\in[0,1)$. Applying Lemma 6.4.11 it follows that

$$-\mu_k\ln\Big(-\max_{t\in T} g(z^{k,i}(\lambda),t)\Big) \le -\mu_k\ln\big((1-\lambda)c_4\mu_k\big) \qquad (6.4.58)$$

for all pairs $(k,i)$ with $1 \le i \le i(k)$. But $\tilde x^{k,0} = \tilde x^{k-1,i(k-1)}$ and $i(k-1) > 0$ for each $k > 1$, so that

$$-\mu_k\ln\Big(-\max_{t\in T} g(z^{k,0}(\lambda),t)\Big) \le -\mu_k\ln\big((1-\lambda)c_4\mu_{k-1}\big)$$



follows by Lemma 6.4.11. The monotonic decrease of $\{\mu_k\}$ leads to (6.4.58) again, and regarding (6.4.51) the estimate now holds for all $k$ and $0 \le i \le i(k)$.

From the proof of Theorem 6.4.7 we know that $\tilde x^{k,i+1} \in M^0\cap B_\tau(x_c)$ for all $k$ and $0 \le i < i(k)$. Using (6.4.26) we therefore conclude

$$-\mu_k\ln\Big(-\max_{t\in T} g(\tilde x^{k,i+1},t)\Big) \ge -\mu_k(c_1 + \ln\tau).$$

This together with (6.4.57), (6.4.58) and $z^{k,i}\in M_{opt}$ yields

$$\Delta_{k,i+1} = f(\tilde x^{k,i+1}) - f^* \le \lambda\big(f(z^{k,i}) - f^*\big) + (1-\lambda)\big(f(\tilde x^{k,i}) - f^*\big) - \mu_k\ln\Big(-\max_{t\in T} g(z^{k,i}(\lambda),t)\Big) + \mu_k\ln\Big(-\max_{t\in T} g(\tilde x^{k,i+1},t)\Big) + \frac{\chi_k}{2}\big(\lambda\|z^{k,i} - \tilde x^{k,i}\| + \alpha_k\big)^2 - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|^2 \le (1-\lambda)\Delta_{k,i} - \mu_k\ln\big((1-\lambda)c_4\mu_k\big) + \mu_k(c_1 + \ln\tau) + \frac{\chi_k}{2}\lambda^2\|z^{k,i} - \tilde x^{k,i}\|^2 + \chi_k\alpha_k\lambda\|z^{k,i} - \tilde x^{k,i}\| + \frac{\chi_k}{2}\alpha_k^2 - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 \qquad (6.4.59)$$

for all $k$ and $0 \le i < i(k)$. In view of $z^{k,i}, \tilde x^{k,i} \in B_\tau(x_c)$ as well as the definition of $\zeta_k$, we obtain

$$0 \le \Delta_{k,i+1} \le (1-\lambda)\Delta_{k,i} + \zeta_k + \frac{\chi_k}{2}\lambda^2\|z^{k,i} - \tilde x^{k,i}\|^2 - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 \qquad (6.4.60)$$

if $\lambda \in [0, 7/8]$, which can be enforced by setting

$$\lambda := \lambda_{k,i} = \min\Big[\frac{\Delta_{k,i}}{\chi_k\|z^{k,i} - \tilde x^{k,i}\|^2},\ \frac{7}{8}\Big]. \qquad (6.4.61)$$

If $\lambda_{k,i}$ equals $7/8$, (6.4.61) immediately leads to

$$\chi_k\|z^{k,i} - \tilde x^{k,i}\|^2 \le \frac{8}{7}\Delta_{k,i},$$

and we can infer from (6.4.60)

$$\Delta_{k,i+1} \le \frac{\Delta_{k,i}}{8} + \zeta_k + \frac{7}{16}\Delta_{k,i} - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 = \frac{9}{16}\Delta_{k,i} + \zeta_k - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2. \qquad (6.4.62)$$

Due to the second part of (6.4.52) this allows us to conclude

$$\Delta_{k,i+1} \le \Delta_{k,i} - 2\alpha\Delta_{k,i}^2 + \zeta_k - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2. \qquad (6.4.63)$$

But if $\lambda_{k,i} < 7/8$ holds, we obtain

$$\Delta_{k,i+1} \le \Delta_{k,i} - \frac{\Delta_{k,i}^2}{2\chi_k\|z^{k,i} - \tilde x^{k,i}\|^2} + \zeta_k - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 \qquad (6.4.64)$$



and now, using the first part of (6.4.52), inequality (6.4.63) follows again. Consequently this estimate holds for all pairs $(k,i)$ with $0 \le i < i(k)$, making it the basis of the following induction proof.

Let us assume that (6.4.56) holds for a fixed pair $(k,i)$ with $i < i(k)$. This is obvious for the starting indices $k = 1$, $i = 0$. Now we distinguish three cases.

(a) We first consider $0 \le i < i(k)-1$. Then $i+1 < i(k)$, and due to (6.4.22) as well as (6.4.37) we have

$$\frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 > \frac{\chi_k}{2}(\sigma_k - \alpha_k)^2 \qquad (6.4.65)$$

so that (6.4.55) leads to

$$\zeta_k - \frac{\chi_k}{2}\|\tilde x^{k,i+1} - x^{k,i}\|_2^2 < 0.$$

Consequently, with (6.4.63) we obtain

$$\Delta_{k,i+1} \le \Delta_{k,i} - 2\alpha\Delta_{k,i}^2.$$

Moreover, the trivial inequality

$$y - \vartheta y^2 \le \frac{y}{1 + \vartheta y} \qquad (6.4.66)$$

is true for all $y \ge 0$ with fixed $\vartheta > 0$, and the function $y \mapsto \frac{y}{1+\vartheta y}$ increases monotonically for nonnegative $y$, so that one can set $\vartheta := 2\alpha$, $y := \Delta_{k,i}$. Then, with regard to the induction assumption, we infer

$$\Delta_{k,i+1} \le \frac{\Delta_{k,i}}{1 + 2\alpha\Delta_{k,i}} \le \frac{\Delta_{1,0}}{\Big(1 + \alpha\big(2i + \sum_{j=1}^{k-1} i_1(j)\big)\Delta_{1,0}\Big)\Big(1 + \frac{2\alpha\Delta_{1,0}}{1 + \alpha(2i + \sum_{j=1}^{k-1} i_1(j))\Delta_{1,0}}\Big)} = \frac{\Delta_{1,0}}{1 + \alpha\big(2i + 2 + \sum_{j=1}^{k-1} i_1(j)\big)\Delta_{1,0}}$$

and the induction statement holds.

(b) In the case $i = 0$, $i(k) = 1$ we have $i_1(k) = 1$, and (6.4.63) leads to

$$\Delta_{k+1,0} = \Delta_{k,1} \le \Delta_{k,0} - 2\alpha\Delta_{k,0}^2 + \zeta_k.$$

Furthermore, regarding the second part of (6.4.52), it holds that

$$\frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}} \le \Delta_{1,0} \le \frac{7}{32\alpha} < \frac{1}{4\alpha}.$$

Because the function $y \mapsto y - 2\alpha y^2$ increases monotonically if $y < \frac{1}{4\alpha}$, the induction assumption allows us to conclude

$$\Delta_{k+1,0} \le \frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}} - 2\alpha\Bigg(\frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}}\Bigg)^2 + \zeta_k$$



such that

$$\Delta_{k+1,0} \le \frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}} - \alpha\Bigg(\frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}}\Bigg)^2 \qquad (6.4.67)$$

follows from (6.4.54). Using (6.4.66) again, this time with $\vartheta := \alpha$ and $y := \frac{\Delta_{1,0}}{1+\alpha\sum_{j=1}^{k-1} i_1(j)\Delta_{1,0}}$, the combination with (6.4.67) and $i_1(k) = 1$ leads to

$$\Delta_{k+1,0} \le \frac{\Delta_{1,0}}{\Big(1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0}\Big)\Big(1 + \frac{\alpha\Delta_{1,0}}{1+\alpha\sum_{j=1}^{k-1} i_1(j)\Delta_{1,0}}\Big)} = \frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k-1} i_1(j)\,\Delta_{1,0} + \alpha\Delta_{1,0}} = \frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k} i_1(j)\,\Delta_{1,0}},$$

so that the induction statement holds.

(c) Finally, let us consider the case $i = i(k)-1$ with $i(k) > 1$. Then from (6.4.63) we have the inequalities

$$\Delta_{k,i(k)-1} \le \Delta_{k,i(k)-2} - 2\alpha\Delta_{k,i(k)-2}^2 + \zeta_k - \frac{\chi_k}{2}\|\tilde x^{k,i(k)-1} - x^{k,i(k)-2}\|_2^2,$$

$$\Delta_{k+1,0} = \Delta_{k,i(k)} \le \Delta_{k,i(k)-1} - 2\alpha\Delta_{k,i(k)-1}^2 + \zeta_k.$$

Coupling these, we infer

$$\Delta_{k+1,0} \le \Delta_{k,i(k)-2} - 2\alpha\Delta_{k,i(k)-2}^2 - 2\alpha\Delta_{k,i(k)-1}^2 + 2\zeta_k - \frac{\chi_k}{2}\|\tilde x^{k,i(k)-1} - x^{k,i(k)-2}\|_2^2,$$

so that together with (6.4.55), (6.4.65) and $\alpha\Delta_{k,i(k)-1}^2 \ge 0$ the estimate

$$\Delta_{k,i(k)} \le \Delta_{k,i(k)-2} - 2\alpha\Delta_{k,i(k)-2}^2$$

holds. Thus we can conclude, analogously to case (a), that

$$\Delta_{k+1,0} \le \frac{\Delta_{1,0}}{1 + \alpha\big(2i(k) - 2 + \sum_{j=1}^{k-1} i_1(j)\big)\Delta_{1,0}} \le \frac{\Delta_{1,0}}{1 + \alpha\sum_{j=1}^{k} i_1(j)\,\Delta_{1,0}}$$

since $i_1(k) = 2i(k)-2$ for $i(k) > 1$, and the induction is complete.

6.4.2.2 Linear convergence

Theorem 6.4.12 establishes the important estimate (6.4.56), which holds for any SIP satisfying Assumption 6.4.3. If we consider problems adhering to tighter assumptions, it is possible to prove linear convergence of the iterates as well as of the values of the objective function. The condition used for this purpose is the following growth condition:

\[
\inf_{x\in M}\frac{f(x)-f^*}{\rho^2(x,M_{\mathrm{opt}})} \ge \theta > 0 \qquad (6.4.68)
\]

with

\[
M := \bigl(M^0 \cap B_\tau(x^c)\bigr) \setminus M_{\mathrm{opt}}, \qquad \rho(x,M_{\mathrm{opt}}) := \min_{z\in M_{\mathrm{opt}}\cap B_\tau(x^c)} \|x-z\|.
\]

This growth condition generalizes that of Rockafellar [352], which occurs in the context of proving linear convergence of the iterates of an inexact proximal point method.
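For a concrete feeling for (6.4.68): take the model objective f(x) = (x₁ − x₂)² with optimal set M_opt = {x : x₁ = x₂} and f* = 0 (a hypothetical instance chosen only for illustration). The Euclidean distance is ρ(x, M_opt) = |x₁ − x₂|/√2, so the quotient (f(x) − f*)/ρ²(x, M_opt) equals 2 everywhere outside M_opt, and the growth condition holds with θ = 2. A minimal numerical check:

```python
import math, random

# f(x) = (x1 - x2)^2, Mopt = {x1 = x2}, f* = 0; the Euclidean distance of x to Mopt
# is |x1 - x2| / sqrt(2), hence (f(x) - f*) / rho^2 = 2 for every x outside Mopt.
def f(x):
    return (x[0] - x[1]) ** 2

def rho(x):                      # distance from x to the diagonal Mopt
    return abs(x[0] - x[1]) / math.sqrt(2.0)

random.seed(0)
qs = []
for _ in range(1000):
    x = (random.uniform(-10, 10), random.uniform(-10, 10))
    if rho(x) > 1e-9:            # exclude (near-)optimal points, as in the set M
        qs.append(f(x) / rho(x) ** 2)

assert all(abs(q - 2.0) < 1e-9 for q in qs)
print("growth condition (6.4.68) holds with theta = 2")
```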

If θ/χ_k < 7/8 is true, (6.4.61) admits λ_{k,i} = 7/8 as well as λ_{k,i} < 7/8 for all i with 0 ≤ i < i(k). If λ_{k,i} = 7/8, inequality (6.4.62) is true, while in the case λ_{k,i} < 7/8 estimate (6.4.64) follows. Then we have

\[
\lambda_{k,i} = \frac{\Delta_{k,i}}{\chi_k\|z^{k,i}-x^{k,i}\|^2}
\]

and one can conclude

\[
\Delta_{k,i+1} \le \Bigl(1-\frac{\theta}{2\chi_k}\Bigr)\Delta_{k,i} + \zeta_k - \frac{\chi_k}{2}\bigl\|x^{k,i+1}-x^{k,i}\bigr\|^2
\]

with regard to (6.4.68) and $\chi_k \le \bar\chi$.

But if θ/χ_k ≥ 7/8 is true, (6.4.61) only admits λ_{k,i} = 7/8 for all i with 0 ≤ i < i(k), which immediately leads to (6.4.62). Thus ∆_{k,i}, ∆_{k,i+1} always fulfil

\[
0 \le \Delta_{k,i+1} \le (1-\theta_1)\,\Delta_{k,i} + \zeta_k - \frac{\chi_k}{2}\bigl\|x^{k,i+1}-x^{k,i}\bigr\|^2, \qquad (6.4.69)
\]

if i < i(k), where $\theta_1 = \min\{7/16,\ \theta/(2\bar\chi)\}$.

Using these preliminary remarks, linear convergence of the sequence {∆_{k,i}} can be established under the given growth condition.

6.4.13 Theorem. Let the assumptions of Theorem 6.4.7 be satisfied. Moreover, let (6.4.51) as well as (6.4.68) be satisfied. Additionally assume that

\[
\zeta_k \le \frac{\chi_k(1-\theta_1)}{2(2-\theta_1)}(\sigma_k-\alpha_k)^2, \qquad \zeta_k < \frac{\theta_1}{2}\Delta_{1,0}\,q^{s_k} \qquad (6.4.70)
\]

with $s_k = \sum_{j=1}^{k-1} i(j)$, $q \in [1-\theta_1/2,\ 1)$. Then

\[
\Delta_{k,i} \le \Delta_{1,0}\,q^{s_k+i} \qquad (6.4.71)
\]

holds.
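The contraction (6.4.69) together with the coupling (6.4.70) of ζ_k to q^{s_k} is what produces the geometric bound (6.4.71). The arithmetic can be checked on a simplified single-loop model (one inner step per outer step, so s_k = k; the norm term is dropped and θ₁, ∆_{1,0} are hypothetical values):

```python
# Simplified model of (6.4.69)-(6.4.71): one inner step per outer step (s_k = k),
# negative norm term dropped, zeta_k chosen at its bound (theta1/2)*d0*q**k from (6.4.70).
theta1 = 7.0 / 16.0
q = 1.0 - theta1 / 2.0       # smallest q admitted by (6.4.70)
d0 = 1.0                     # hypothetical Delta_{1,0}

d = d0
for k in range(200):
    assert d <= d0 * q ** k + 1e-12, (k, d)
    zeta = (theta1 / 2.0) * d0 * q ** k
    d = (1.0 - theta1) * d + zeta      # recursion (6.4.69) without the norm term

print("linear rate Delta_k <= Delta_0 * q**k verified")
```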

Proof: The proof is again by induction. The statement is obvious for k = 1, i = 0. Thus we suppose that fixed k and i < i(k) are given such that

\[
\Delta_{\ell,i'} \le \Delta_{1,0}\,q^{s_\ell+i'} \qquad (6.4.72)
\]

holds for all ℓ < k, 0 ≤ i′ ≤ i(ℓ) and for ℓ = k, 0 ≤ i′ ≤ i. The condition i < i(k) is not a restriction, because in case i = i(k) we consider the equivalent pair k + 1, i = 0 < i(k + 1).

We distinguish three cases.


(a) Suppose i + 1 < i(k). Combining $\|x^{k,i+1}-x^{k,i}\| > \sigma_k-\alpha_k$, the first inequality in (6.4.70) and (6.4.69), we obtain

\[
\Delta_{k,i+1} < (1-\theta_1)\,\Delta_{k,i},
\]

and along with (6.4.72) this implies

\[
\Delta_{k,i+1} < \Delta_{1,0}\,q^{s_k+i+1}.
\]

(b) Suppose i > 0, i + 1 = i(k). Then the inequalities

\[
\Delta_{k,i+1} \le (1-\theta_1)\,\Delta_{k,i} + \zeta_k - \frac{\chi_k}{2}\bigl\|x^{k,i+1}-x^{k,i}\bigr\|^2,
\]
\[
\Delta_{k,i} \le (1-\theta_1)\,\Delta_{k,i-1} + \zeta_k - \frac{\chi_k}{2}\bigl\|x^{k,i}-x^{k,i-1}\bigr\|^2
\]

and $\|x^{k,i}-x^{k,i-1}\| > \sigma_k-\alpha_k$ hold. Substituting the second inequality into the first one gives

\[
\Delta_{k,i+1} < (1-\theta_1)^2\Delta_{k,i-1} + (2-\theta_1)\,\zeta_k - (1-\theta_1)\frac{\chi_k}{2}(\sigma_k-\alpha_k)^2,
\]

leading to

\[
\Delta_{k,i+1} < (1-\theta_1)^2\Delta_{k,i-1}
\]

if we consider the first inequality in (6.4.70). Hence,

\[
\Delta_{k,i+1} < \Delta_{1,0}\,q^{s_k+i+1}
\]

and the induction statement holds.

(c) Suppose i = 0, i(k) = 1. Taking (6.4.72) and the second inequality in (6.4.70) into account, we obtain

\[
\Delta_{k,1} \le (1-\theta_1)\,\Delta_{1,0}\,q^{s_k} + \frac{\theta_1}{2}\Delta_{1,0}\,q^{s_k} < \Delta_{1,0}\,q^{s_k+1}
\]

from (6.4.69), and the proof is complete.

If the considered problem fulfils the growth condition (6.4.68), we can additionally prove linear convergence of the sequence {x^{k,i}} to an element of M_opt.

For this purpose we define $\bar{i}(k) := \frac{16\tau^2}{(\sigma_k-\alpha_k)^2} + 1$ and

\[
\varsigma_k := \sum_{j=k}^{\infty}\bigl(\sqrt{\gamma_j} + \alpha_j + 4\tau\mu_j\bigr), \quad \forall\, k, \qquad (6.4.73)
\]

with $\gamma_j := \frac{2\mu_j}{\chi_j}\bigl(2|\ln\mu_j| + \ln\tau\bigr)$ as in the proof of Theorem 6.4.7.

6.4.14 Theorem. Let the assumptions of Theorem 6.4.13 be satisfied. Moreover, assume that

\[
\frac{1}{4\tau}\gamma_k + \alpha_k < \frac{1}{8\tau}(\sigma_k-\alpha_k)^2, \qquad (6.4.74)
\]
\[
\varsigma_k \le \sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_k+\bar{i}(k)}}, \quad \forall\, k. \qquad (6.4.75)
\]

Then it holds that

\[
\bigl\|x^{k,i}-x^{**}\bigr\| \le 3\sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_k+i}}, \quad \forall\, k,\ 0 \le i < i(k), \qquad (6.4.76)
\]

where x** := lim_{k→∞} x^{k,0} is an optimal solution of SIP (6.4.3).

Proof: Let $z^{k,i} = \arg\min_{z\in M_{\mathrm{opt}}\cap B_\tau(x^c)} \|x^{k,i}-z\|$ be given as in the proof of Theorem 6.4.12. The inequality

\[
\bigl\|x^{k,i}-z^{k,i}\bigr\| \le \sqrt{\frac{\Delta_{k,i}}{\theta}} \qquad (6.4.77)
\]

is obviously true if x^{k,i} ∈ M_opt; otherwise (6.4.77) holds due to (6.4.68).

In the sequel let k_0, i_0 be fixed with 0 ≤ i_0 < i(k_0). From the proof of Theorem 6.4.7 inequality (6.4.38) is known, i.e.

\[
\frac{\|x^{k,i+1}-\bar{z}^k\|^2 - \|x^{k,i}-\bar{z}^k\|^2}{2} \le -(\sigma_k-\alpha_k)^2 + \gamma_k
\]

holds for all k, 0 ≤ i < i(k) − 1, with $\bar{z}^k := \bar{x} + (1-\mu_k)(x^*-\bar{x})$. If we use $\bar{z}^k := \bar{x} + (1-\mu_k)(z^{k_0,i_0}-\bar{x})$ instead, we can conclude

\[
\frac{\|x^{k,i+1}-\bar{z}^k\|^2 - \|x^{k,i}-\bar{z}^k\|^2}{2} \le -(\sigma_k-\alpha_k)^2 + \gamma_k
\]

analogously for all k and 0 ≤ i < i(k) − 1. Furthermore, it yields (cf. (6.4.40))

\[
\|x^{k,i+1}-\bar{z}^k\| - \|x^{k,i}-\bar{z}^k\| < \frac{1}{4\tau}\bigl(-(\sigma_k-\alpha_k)^2 + \gamma_k\bigr) + \alpha_k < 0 \qquad (6.4.78)
\]

and (cf. (6.4.41))

\[
\bigl\|x^{k,i(k)}-\bar{z}^k\bigr\| - \bigl\|x^{k,i(k)-1}-\bar{z}^k\bigr\| \le \sqrt{\gamma_k} + \alpha_k \qquad (6.4.79)
\]

for all k and 0 ≤ i < i(k) − 1. Summing up these inequalities we obtain

\[
\bigl\|x^{k,i(k)}-\bar{z}^k\bigr\| - \bigl\|x^{k,0}-\bar{z}^k\bigr\| \le \sqrt{\gamma_k} + \alpha_k,
\]

and together with $\|\bar{z}^k - z^{k_0,i_0}\| \le 2\mu_k\tau$ and $x^{k+1,0} = x^{k,i(k)}$, the estimate

\[
\bigl\|x^{k+1,0}-z^{k_0,i_0}\bigr\| - \bigl\|x^{k,0}-z^{k_0,i_0}\bigr\| \le \sqrt{\gamma_k} + \alpha_k + 4\mu_k\tau
\]

follows. Summing these inequalities for j = k_0 + 1, …, k′ − 1 with k′ > k_0 + 1, we get

\[
\bigl\|x^{k',0}-z^{k_0,i_0}\bigr\| \le \bigl\|x^{k_0+1,0}-z^{k_0,i_0}\bigr\| + \sum_{j=k_0+1}^{k'-1}\bigl(\sqrt{\gamma_j} + \alpha_j + 4\mu_j\tau\bigr).
\]

In combination with (6.4.78) and (6.4.79) this leads to

\[
\bigl\|x^{k',0}-z^{k_0,i_0}\bigr\| \le \bigl\|x^{k_0,i_0}-z^{k_0,i_0}\bigr\| + \sum_{j=k_0}^{k'-1}\bigl(\sqrt{\gamma_j} + \alpha_j + 4\mu_j\tau\bigr).
\]


Moreover, lim_{k→∞} x^{k,0} = x** follows from (6.4.30), and lim_{k→∞} α_k = 0 is enforced by (6.4.21). Thus, taking the limit k′ → ∞, we can conclude

\[
\|x^{**}-z^{k_0,i_0}\| \le \|x^{k_0,i_0}-z^{k_0,i_0}\| + \sum_{j=k_0}^{\infty}\bigl(\sqrt{\gamma_j}+\alpha_j+4\mu_j\tau\bigr) = \|x^{k_0,i_0}-z^{k_0,i_0}\| + \varsigma_{k_0},
\]

leading to

\[
\|x^{k_0,i_0}-x^{**}\| \le \|x^{k_0,i_0}-z^{k_0,i_0}\| + \|z^{k_0,i_0}-x^{**}\| \le 2\|x^{k_0,i_0}-z^{k_0,i_0}\| + \varsigma_{k_0},
\]

and, in combination with (6.4.77), we obtain

\[
\|x^{k_0,i_0}-x^{**}\| \le 2\sqrt{\frac{\Delta_{k_0,i_0}}{\theta}} + \varsigma_{k_0}. \qquad (6.4.80)
\]

We recall that estimate (6.4.47) gives

\[
i(k_0) < \frac{2\tau}{\frac{1}{4\tau}\bigl((\sigma_{k_0}-\alpha_{k_0})^2-\gamma_{k_0}\bigr)-\alpha_{k_0}} + 1,
\]

so that $i(k_0) < \bar{i}(k_0)$ follows in view of (6.4.74). Then relation (6.4.75) gives

\[
\varsigma_{k_0} \le \sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_{k_0}+\bar{i}(k_0)}},
\]

and along with (6.4.71) and (6.4.80) we can deduce

\[
\|x^{k_0,i_0}-x^{**}\| \le 3\sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_{k_0}+i_0}}.
\]

Because k_0 and i_0 were chosen arbitrarily, the statement is proved.

6.4.15 Corollary. Under the assumptions of Theorem 6.4.14 it holds that

\[
\|x^{k,i}-x^{**}\| \le 4\sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_k+i}}, \quad \forall\, k,\ 1 \le i \le i(k),
\]

where x** = lim_{k→∞} x^{k,0} is an optimal solution of SIP (6.4.3).

Proof: Regarding (6.4.30) and (6.4.76) we see that

\[
\|x^{k,i}-x^{**}\| \le 3\sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_k+i}} + \alpha_k
\]

holds for all k and 1 ≤ i < i(k). Moreover, using the definition of ς_k in (6.4.73), (6.4.75) and 0 < q < 1, we infer

\[
\alpha_k \le \sqrt{\frac{\Delta_{1,0}}{\theta}}\,\sqrt{q^{\,s_k+i}}, \quad \forall\, k,\ 1 \le i \le i(k).
\]

Combining both estimates completes the proof.


6.4.3 Application to model examples

We present some numerical results computed by the proposed interior point methods. For that purpose the algorithms were implemented in the programming language C, using version 2.7.2.3 of the gcc compiler on a Pentium III/800 computer with the operating system Suse Linux 6.2. The linear programs involved are solved by the simplex method, while the quadratic problems are solved by a finite algorithm of Fletcher [115].

Before we have a closer look at the examples, some general numerical considerations are necessary.

The first question arising in the loops in s of Algorithm 6.4.4 is how we can determine a positive radius r_{k,i,s} such that the box S_{k,i,s} is completely contained in M^0. The simplest way to find such a radius is a trial-and-error strategy, whereby only the edges of the considered box have to be checked for feasibility. In particular, this requires that we can decide whether a given point x fulfils g(x,t) < 0 for all t ∈ T. Such a decision procedure can be very costly, especially if the exact evaluation of the constraint values is not possible. Therefore we offer another method in the following lemma. In order to formulate it, let $L^x_S$ denote a constant for a given nonempty set S ⊂ R^n with

\[
\sup_{z\in S}\,\sup_{t\in T}\,\sup_{v\in\partial g(z,t)} \|v\|_1 \le L^x_S \qquad (6.4.81)
\]

and the additional property that S′ ⊂ S implies $L^x_{S'} \le L^x_S$. Furthermore, let us define

\[
B_r(M) := \Bigl\{ z\in\mathbb{R}^n : \min_{v\in M}\|z-v\|_\infty \le r \Bigr\}
\]

for r > 0 and nonempty compact sets M ⊂ R^n.

6.4.16 Lemma. Let x ∈ M^0 and r̄ > 0 be given. Moreover, let h ≥ 0 be given such that

\[
-\max_{t\in T_h} g(x,t) - L^t_x h > 0
\]

holds, with $L^t_x$ fulfilling (6.4.6), i.e.,

\[
|g(x,t_1)-g(x,t_2)| \le L^t_x\|t_1-t_2\|, \quad \forall\, t_1,t_2\in T.
\]

Then the inclusion B_r(x) ⊂ M^0 is valid if

\[
0 < r < \min\left[\bar{r},\ \frac{-\max_{t\in T_h} g(x,t) - L^t_x h}{L^x_{B_{\bar r}(x)}}\right]. \qquad (6.4.82)
\]

Proof: Let z ∉ M^0 be given. One has to show ‖z − x‖_∞ > r. If ‖z − x‖_∞ ≥ r̄, this follows immediately. Thus, in the sequel we assume that ‖z − x‖_∞ < r̄ holds. Let t* ∈ T(z) be given such that g(z,t*) = max_{t∈T} g(z,t) ≥ 0. Then one can conclude

\[
0 > \max_{t\in T_h} g(x,t) + L^t_x h \ge \max_{t\in T} g(x,t) \ge g(x,t^*) - g(z,t^*) \ge \langle v,\, x-z\rangle
\]

with v ∈ ∂g(z,t*). Due to ‖z − x‖_∞ < r̄, the estimate

\[
-\max_{t\in T_h} g(x,t) - L^t_x h \le |\langle v,\, x-z\rangle| \le L^x_{B_{\bar r}(x)}\|x-z\|_\infty
\]

follows. Hence, in view of (6.4.82) we obtain ‖z − x‖_∞ > r, and the proof is complete.
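Lemma 6.4.16 can be turned directly into a computable feasibility certificate. The sketch below instantiates it for the constraint g(x,t) = x₁cos t + x₂sin t − 1 of Example 6.4.20 below, at the Slater point x = (0,0); the grid step, r̄ and the Lipschitz constants L^t_x = ‖x‖_∞ and L^x = √2 are the ones computed in that example, and all numerical values are illustrative.

```python
import math

# Certificate from Lemma 6.4.16 for g(x, t) = x1*cos t + x2*sin t - 1, T = [0, 1].
# At x = (0, 0): L^t_x = ||x||_inf = 0; L^x over any box is sqrt(2) (see Example 6.4.20).
def g(x, t):
    return x[0] * math.cos(t) + x[1] * math.sin(t) - 1.0

x = (0.0, 0.0)
h = 0.01
Th = [2.0 * h * j for j in range(int(1.0 / (2.0 * h)) + 1)]   # grid of step 2h in [0, 1]
Ltx = max(abs(x[0]), abs(x[1]))                               # = 0 here
rbar, Lx = 1.0, math.sqrt(2.0)

slack = -max(g(x, t) for t in Th) - Ltx * h                   # must be > 0
assert slack > 0.0
r = 0.9 * min(rbar, slack / Lx)                               # radius as in (6.4.82)

# spot check on the grid Th that the box B_r(x) (inf-norm) stays strictly feasible
for i in range(21):
    for j in range(21):
        z = (x[0] - r + i * r / 10.0, x[1] - r + j * r / 10.0)
        assert max(g(z, t) for t in Th) < 0.0

print("box of radius %.4f certified feasible" % r)
```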

6.4.17 Remark. In the case of more than one inequality constraint (i.e. l > 1 in (6.4.3)) or in the presence of linear equality constraints, one has to replace M^0 by {x ∈ R^n : g_ν(x,t) < 0 (t ∈ T^ν)} for each inequality constraint in the proposition of the lemma. In this way a feasible radius can be determined separately for each inequality constraint by Lemma 6.4.16. The smallest of these radii can then be used for fixing the box. ♦

After determination of the boxes, we have to select values for h_{k,i,s}. Normally they directly influence the costs of the maximization processes, so that we want to choose them as large as possible. Upper bounds for h_{k,i,s} are given by (6.4.12). But in order to use these upper bounds, the constant C_S fulfilling part (9) of Assumption 6.4.3 is needed.

6.4.18 Lemma. Let the assumptions of Lemma 6.4.16 be fulfilled. Furthermore, let r > 0 be given such that (6.4.82) is valid. Then

\[
C_S := \frac{1}{-\max_{t\in T_h} g(x,t) - L^t_x h - L^x_S r} \qquad (6.4.83)
\]

fulfils (6.4.7) with S := B_r(x).

Proof: From Lemma 6.4.16 it follows that S ⊂ M^0. Let x̃ ∈ S, t* ∈ T(x̃) and v(x̃,t*) ∈ ∂g(x̃,t*) be arbitrarily given. Then, due to the Lipschitz property of the function g, we infer

\[
-\max_{t\in T} g(\tilde{x},t) = -g(\tilde{x},t^*)
\ge -g(x,t^*) + \langle v(\tilde{x},t^*),\, x-\tilde{x}\rangle
\ge -\max_{t\in T} g(x,t) - L^x_S r
\ge -\max_{t\in T_h} g(x,t) - L^t_x h - L^x_S r.
\]

Moreover, using (6.4.82), we deduce

\[
-\max_{t\in T_h} g(x,t) - L^t_x h - L^x_S r > 0,
\]

so that we have

\[
\left|\frac{1}{\max_{t\in T} g(\tilde{x},t)}\right| = \frac{1}{-\max_{t\in T} g(\tilde{x},t)} \le \frac{1}{-\max_{t\in T_h} g(x,t) - L^t_x h - L^x_S r} = C_S.
\]

Consequently (6.4.7) holds, since x̃ ∈ S was chosen arbitrarily.
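Continuing the setting of the previous sketch (same constraint, center x = (0,0), h = 0.01, radius r = 0.9/√2 from Lemma 6.4.16, L^x_S = √2; all values illustrative), the bound C_S of (6.4.83) can be validated by sampling the box:

```python
import math

# Validate the constant C_S of (6.4.83) for g(x, t) = x1*cos t + x2*sin t - 1 on B_r(0).
def g(x, t):
    return x[0] * math.cos(t) + x[1] * math.sin(t) - 1.0

h = 0.01
Th = [2.0 * h * j for j in range(51)]          # grid of step 2h in [0, 1]
r = 0.9 / math.sqrt(2.0)                       # radius certified by Lemma 6.4.16
Lx = math.sqrt(2.0)

# (6.4.83) with -max_{Th} g(x, .) = 1 at x = (0, 0) and L^t_x * h = 0:
CS = 1.0 / (1.0 - Lx * r)                      # = 10 here

for i in range(21):
    for j in range(21):
        z = (-r + i * r / 10.0, -r + j * r / 10.0)
        m = max(g(z, t) for t in Th)           # < 0 inside M0
        assert abs(1.0 / m) <= CS + 1e-9       # bound (6.4.7)

print("C_S = %.2f bounds |1/max_t g| on the box" % CS)
```

The bound is nearly attained at the corner z = (r, r), which is why the 0.9 safety factor in the radius matters.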

6.4.19 Remark. The monotonicity property of C_S is automatically given if $L^x_S$ is chosen as in (6.4.81) and one computes C_S by (6.4.83). ♦


Now, the sets T_h ⊂ T are always determined as equidistant discretizations of T with step size 2h. Furthermore, the radii of the considered boxes are always computed as 9/10 of the maximal value allowed by Lemma 6.4.16, but the values of r̄ in formula (6.4.82) have to be adapted to each example; in particular, they are adapted to each step of the chosen algorithm. Finally, the values of C_S are always computed as suggested in (6.4.83).

Let us finish our general statements with a remark on the application of the several convergence results stated before. Each of them says that the algorithms generate sequences which converge to an optimal solution (in the case of Algorithm 6.4.4 or its generalization). Moreover, in each presented convergence theorem it is required that some positive sequences converge to zero. But, caused by the fact that we cannot generate complete sequences, in practice it is impossible to check these assumptions. Nevertheless, they are the basis of the practical parameter setting in the following sense: we choose the occurring finite values of each sequence which has to converge in such a way that they fulfil a geometric decrease condition.

Now let us start with a small academic example in two variables with an unbounded solution set.

6.4.20 Example. Consider the problem

\[
\begin{aligned}
\text{minimize } & f(x) := (x_1-x_2)^2\\
\text{s.t. } & g(x,t) := x_1\cos t + x_2\sin t - 1 \le 0 \quad \text{for all } t\in T := [0,1].
\end{aligned}
\]

The feasible set (filled area) and the solution set (dotted line) of this problem are illustrated in Figure 6.4.1 (both sets are shrunk to the presented clipped area).

Figure 6.4.1: The feasible and solution set of Example 6.4.20

The solution set is given by

\[
M_{\mathrm{opt}} = \Bigl\{ x\in\mathbb{R}^2 : x_1 = x_2 \le \tfrac{1}{2}\sqrt{2} \Bigr\}.
\]
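The characterization of M_opt can be verified directly: on the diagonal x₁ = x₂ = c the constraint reads c(cos t + sin t) ≤ 1 for all t ∈ [0,1], and since cos t + sin t attains its maximum √2 at t = π/4 ∈ [0,1], feasibility is exactly c ≤ √2/2. A short check:

```python
import math

# Feasibility of diagonal points x1 = x2 = c for g(x, t) = x1*cos t + x2*sin t - 1 <= 0,
# t in [0, 1]: max_t c*(cos t + sin t) = c*sqrt(2), attained at t = pi/4 in [0, 1].
def max_violation(c, n=20001):
    return max(c * (math.cos(t) + math.sin(t)) - 1.0
               for t in (j / (n - 1) for j in range(n)))

c_star = math.sqrt(2.0) / 2.0
assert max_violation(c_star) <= 1e-8          # boundary point of Mopt is feasible
assert max_violation(c_star - 0.01) < 0.0     # points below are strictly feasible
assert max_violation(c_star + 0.01) > 0.0     # points above are infeasible

print("Mopt boundary at c = sqrt(2)/2 confirmed")
```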

Thus we deal with an unbounded feasible set and an unbounded solution set, so that a non-regularized algorithm cannot be used to solve the problem. Therefore we want to show that Assumption 6.4.3 is fulfilled. Except for part (8), this is obvious if we regard our introductory remarks for (7) and (9). Part (8) is fulfilled with


constants $L^t_S$ defined by

\[
\max_{x\in S}\max_{t\in[0,1]}\left|\frac{\partial g}{\partial t}(x,t)\right|
= \max_{x\in S}\max_{t\in[0,1]} |-x_1\sin t + x_2\cos t|
\le \max_{x\in S}\|x\|_\infty \max_{t\in[0,1]} |\cos t - \sin t|
= \max_{x\in S}\|x\|_\infty =: L^t_S.
\]

Furthermore, for computing the values C_S and the radii, the constants $L^x_S$ are needed and can be given by

\[
L^x_S := \max_{t\in[0,1]} \left\|\begin{pmatrix}\cos t\\ \sin t\end{pmatrix}\right\|_1 = \sqrt{2}.
\]

Thus, Assumption 6.4.3 is completely fulfilled and we can use the regularized method for solving the problem.

Now we use Algorithm 6.4.4 for solving the given problem. Having Theorem 6.4.7 in mind, we must specify a few constants. First we use the standard parameters contained in Table 6.4.1.

parameter | start value | decreasing factor | lower bound
ε_{k,0}   | 0.001       | 0.06              | −
δ_k       | 0.001       | 0.06              | −
q_k       | 0.999       | −                 | −

Table 6.4.1: Example 6.4.20 - standard parameters

Furthermore, we set χ_1 := 1, χ_{k+1} := max{0.01, 0.2χ_k}, τ := 12, x^c := (−3.3, −1.7), x* := (−2.5, −2.5), x̄ := (0, 0) and x^0 := (−5, 0) in order to fulfil x* ∈ M_opt ∩ B_{τ/8}(x^c), x̄ ∈ M^0 ∩ B_τ(x^c) and x^0 ∈ M^0 ∩ B_{τ/4}(x^c). This leads to c = ‖x̄ − x*‖ = √12.5, f(x̄) = 0, f^− = 0, c_0 = |ln(1)| = 0, t̄ = 0, v̄ = (1, 0) and c_1 = ln(1 + 2) = ln 3, so that we have c_3 = ln 3, and μ_1 = 0.1 ≤ e^{−c_3} can be used. Setting the lower bound of the barrier parameter to 10^{−6} and σ_k as small as possible by (6.4.22), all assumptions of Theorem 6.4.7 are fulfilled, and we obtained the iteration process given in Table 6.4.2.

k,i | x^k_1     | x^k_2     | d_2(x^k, M_opt) | #LP | #QP | #BP | Time
1,1 | -3.013535 | -2.008399 | 7.11E-01        | 15  | 31  | 4   | 0.01
1,2 | -2.622698 | -2.414308 | 1.47E-01        | 11  | 33  | 2   | 0.02
2,1 | -2.531565 | -2.519300 | 8.67E-03        | 13  | 36  | 2   | 0.03
3,1 | -2.529324 | -2.528754 | 4.03E-04        | 12  | 24  | 2   | 0.04
4,1 | -2.529324 | -2.528754 | 4.03E-04        | 6   | 10  | 2   | 0.06
5,1 | -2.525245 | -2.524960 | 2.01E-04        | 10  | 33  | 2   | 0.12
6,1 | -2.525539 | -2.525535 | 2.69E-06        | 14  | 40  | 2   | 0.22

Table 6.4.2: Example 6.4.20 - Iterates computed by Algorithm 6.4.4

Remarkably, no restart procedure for adapting the accuracy parameter was required, since it turned out that the radius was always equal to the maximal possible value 0.9. Furthermore, the multi-step technique was in fact used in the first outer step.

6.4.21 Remark. In Abbe [1] the same example was used in order to show that the non-regularized MSR-algorithm does not converge in the case of an unbounded solution set. ♦

6.4.3.1 An application to the financial market

We present the application of our algorithms to a problem occurring in the field of finance. Our goal is to approximate the yield curve of an underlying asset. For that purpose we follow the considerations of Tichatschke et al. [388], which are based on the model of Vasicek [409]. Let y(t) be a given yield curve fulfilling the stochastic differential equation

dy = (α+ βy)dt+ σdZ,

with a Brownian motion Z and parameters α, β, σ. Now this yield curve y should be approximated on an interval T = [0, t̄] by a function r. Then r has to fulfil the initial value problem

\[
\begin{aligned}
& r' = \beta r + \alpha + \sigma w(t), \quad r(0) = r_0 \in [\underline{r}, \bar{r}],\\
& \underline{w} \le w(t) \le \bar{w}, \quad t\in T,
\end{aligned} \qquad (6.4.84)
\]

which can be derived from the stochastic differential equation stated above. Therein the Brownian motion Z is modeled by a piecewise continuous function w with bounds w̲, w̄. Additionally, the initial value r_0 is variable so far. The solution of (6.4.84) is given by

\[
r(t) = -\frac{\alpha}{\beta}\bigl(1-e^{\beta t}\bigr) + r_0 e^{\beta t} + \sigma\int_0^t e^{\beta(t-\tau)} w(\tau)\,d\tau \qquad (6.4.85)
\]
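That (6.4.85) indeed solves (6.4.84) is quickly verified for a constant control w(t) ≡ w, where the integral evaluates to σw(e^{βt} − 1)/β. The sketch compares a central finite difference of the closed form with the right-hand side βr + α + σw; the parameter values α, β, σ are the ones used later for the DAX examples, while r₀ and w are hypothetical.

```python
import math

# Closed form (6.4.85) for constant w: r(t) = -(a/b)(1 - e^{bt}) + r0 e^{bt} + s*w*(e^{bt}-1)/b.
# Check r'(t) = b*r(t) + a + s*w by a central finite difference.
a, b, s = 0.0154, -0.1779, 0.02      # alpha, beta, sigma (values used in Section 6.4.3.1)
r0, w = 1600.0, 3.0                  # hypothetical initial value and constant control

def r(t):
    return -(a / b) * (1.0 - math.exp(b * t)) + r0 * math.exp(b * t) \
           + s * w * (math.exp(b * t) - 1.0) / b

eps = 1e-6
for t in (0.1, 0.5, 0.9):
    lhs = (r(t + eps) - r(t - eps)) / (2.0 * eps)   # numerical r'(t)
    rhs = b * r(t) + a + s * w
    assert abs(lhs - rhs) < 1e-4 * max(1.0, abs(rhs))

print("closed-form solution satisfies the ODE")
```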

so that the approximation error can be minimized by solving the problem

\[
\begin{aligned}
\min\ & \max_{t\in T} |y(t)-r(t)|\\
\text{s.t. } & r_0 \in [\underline{r}, \bar{r}],\\
& \underline{w} \le w(t) \le \bar{w} \quad \text{for all } t\in T.
\end{aligned} \qquad (6.4.86)
\]

Since the feasible set is mainly described by the space of piecewise continuous functions

\[
\{ w : \underline{w} \le w(t) \le \bar{w},\ t\in T \},
\]

we deal with an infinite problem. In order to simplify the model, the admissible functions w are chosen from the class of piecewise constant functions, i.e. we set

\[
w(t) := w_i \quad \text{for all } t\in T^i,\ i = 1,\ldots,N,
\]

with $0 =: t_0 < t_1 < \ldots < t_{N-1} < t_N := \bar{t}$, $T^i := [t_{i-1}, t_i)$ for i ∈ {1, …, N − 1} and $T^N := [t_{N-1}, t_N]$. Then the integral term in (6.4.85) becomes

\[
\sigma\int_0^t e^{\beta(t-\tau)} w(\tau)\,d\tau = -\sum_{j=1}^{i-1} B_j e^{\beta t} w_j - \frac{\sigma}{\beta}\bigl(1-e^{\beta(t-t_{i-1})}\bigr) w_i
\]

for all t ∈ T^i, with

\[
B_j := \frac{\sigma}{\beta}\bigl(e^{-\beta t_j} - e^{-\beta t_{j-1}}\bigr), \quad \forall\, j = 1,\ldots,N-1.
\]
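The piecewise evaluation above can be checked against straightforward numerical quadrature (a midpoint rule on a fine grid); the interval partition and control values below are hypothetical:

```python
import math

b, s = -0.1779, 0.02                         # beta, sigma
ts = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]        # hypothetical partition t_0 < ... < t_N
w = [1.0, -2.0, 0.5]                         # hypothetical piecewise constant control
t, i = 0.8, 3                                # evaluation point t lies in T^3

# closed form: -sum_j B_j e^{bt} w_j - (s/b)(1 - e^{b(t - t_{i-1})}) w_i
B = [(s / b) * (math.exp(-b * ts[j]) - math.exp(-b * ts[j - 1])) for j in range(1, len(ts))]
closed = -sum(B[j] * math.exp(b * t) * w[j] for j in range(i - 1)) \
         - (s / b) * (1.0 - math.exp(b * (t - ts[i - 1]))) * w[i - 1]

# reference: midpoint rule for s * int_0^t e^{b(t - tau)} w(tau) dtau
n = 200000
dt = t / n
ref = 0.0
for k in range(n):
    tau = (k + 0.5) * dt
    wk = w[min(int(tau * 3.0), 2)]           # w is constant on each third of [0, 1]
    ref += s * math.exp(b * (t - tau)) * wk * dt

assert abs(closed - ref) < 1e-6
print("closed-form integral matches quadrature")
```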

Consequently, using

\[
f_k(r_0,w,t) := -\frac{\alpha}{\beta}\bigl(1-e^{\beta t}\bigr) + r_0 e^{\beta t} - \sum_{j=1}^{i-1} B_j e^{\beta t} w_j - \frac{\sigma}{\beta}\bigl(1-e^{\beta(t-t_{i-1})}\bigr) w_i
\]

for all t ∈ T^i, (6.4.86) can be rewritten as

\[
\begin{aligned}
\text{minimize } & f(r_0,w,\vartheta) := \vartheta\\
\text{s.t. } & g_i(r_0,w,\vartheta,t) := |\tilde{y}(t) - f_k(r_0,w,t)| - \vartheta \le 0 \quad \text{for all } t\in T^i\ (i=1,\ldots,N),\\
& g_{N+1}(r_0,w,\vartheta) := \max\Bigl\{\max_{i=1,\ldots,N}\{w_i-\bar{w},\ \underline{w}-w_i\},\ r_0-\bar{r},\ \underline{r}-r_0\Bigr\} \le 0
\end{aligned} \qquad (6.4.87)
\]

with an approximation ỹ of y constructed from observable values. Thus we now deal with a linear SIP with N + 1 constraints. The number of constraints in (6.4.87) is much smaller than in the formulation in [388], which is caused by the fact that we can treat non-differentiable constraint functions. Consequently, we can use larger barrier parameters in order to obtain similar accuracies.

We want to show that (a non-regularized variant of) Algorithm 6.4.4 can be used for solving (6.4.87) approximately. For that purpose we have to check the corresponding parts of Assumption 6.4.3. We first notice that some parts of this assumption do not have to hold for the last constraint g_{N+1}, since it does not depend on t. However, in practice g_{N+1} is treated as a constraint of type g(x,t) ≤ 0, t ∈ T, with a single-valued T. Additionally, we have to know something more about ỹ if we want to verify some parts of Assumption 6.4.3. Therefore we only consider the special case that ỹ = y_i is constant on each interval T^i.

We observe that the assumptions of the convexity of f and all g_i are fulfilled, since r_0, w, ϑ occur at most linearly in each constraint, and the absolute values of linear functions as well as the maxima of finitely many linear functions are convex. Part (2) of Assumption 6.4.3 is not fulfilled, since T^1, …, T^{N−1} are not closed. But it is possible to consider the closures of all sets T^i and the continuous extensions of all g_i in (6.4.87) without changing the feasible set. Then the continuity of all constraints w.r.t. t is obvious. Furthermore, part (5) of Assumption 6.4.3 is fulfilled if w̲ < w̄ and r̲ < r̄ are true. Moreover, we observe that lower level sets of our considered semi-infinite problem are bounded, since w, r_0 are bounded by the (N + 1)-th constraint and ϑ is bounded below by 0 and bounded above by the given level. Consequently, since we only deal with continuous functions, these level sets are compact. Thus, regarding that the given problem is feasible, the solution set M_opt has to be nonempty and compact. Part (7) is simply fulfilled if we choose equidistant finite grids for each constraint, as is done in the chapter before. Regarding Lemma 6.4.18, part (9) of Assumption 6.4.3 can be fulfilled, while subgradients of f and g_i(·,t) can easily be given if one regards that, excluding the absolute value or the maxima, the functions therein are differentiable. Thus it remains to determine the constants $L^t_{i,S}$ enforced by part (8) of our assumption, and $L^x_S$ for determining C_{i,S} and the radii. Regarding the differentiability of the function inside the absolute value in g_i, we can set

\[
L^t_{i,S} := \max_{(r_0,w,\vartheta)\in S}\ \sup_{t\in T^i}\left| \alpha e^{\beta t} + r_0\beta e^{\beta t} - \sum_{j=1}^{i-1} B_j\beta e^{\beta t} w_j + \sigma e^{\beta(t-t_{i-1})} w_i \right|
\]

for all i = 1, …, N. In addition to this,

\[
L^x_{i,S} := \max_{t\in T^i}\left( e^{\beta t} + \sum_{j=1}^{i-1} |B_j|\, e^{\beta t} + \left|\frac{\sigma}{\beta}\bigl(1-e^{\beta(t-t_{i-1})}\bigr)\right| + 1 \right)
\]

for all i = 1, …, N, and $L^x_{N+1,S} := 1$ are used. Consequently, Assumption 6.4.3 is completely fulfilled, so that Algorithm 6.4.4 can be used for solving (6.4.87).

To demonstrate this, we want to approximate the German stock index DAX in two time periods of 30 days each. The required data, consisting of the daily opening DAX prices, are given in Table 6.4.3. The first period represents a quite stable but slowly growing DAX, while the second period covers a big fluctuation in a short time interval. In addition, values for α, β and σ are needed. We used the setting

α := 0.0154, β := −0.1779 and σ := 0.02,

which is derived from the observation of US interest rates for government bonds within the years 1964 to 1989, because there was no similar investigation of the German stock exchange. Moreover, the German stock exchange tends to follow the US stock exchange, so this choice is not too bad.

Due to the fact that we only consider two different scenarios, we do not give a standard parameter setting; rather, we have a separate look at both situations. Nevertheless, there were some common settings. In both situations the considered time period was uniformly mapped onto the interval [0, 1], which implies that each trading day is represented by a subinterval of [0, 1] of length 1/30. Additionally we set q_k := 0.999 in each case and, regarding Remark 6.4.17, the radii were computed by Lemma 6.4.16. In fact we used

\[
r_{k,i} = 0.9\,\min\left\{\bar{r},\ \frac{-\max_{t\in T_h} g(x^{k,i-1},t) - L^t_{x^{k,i-1}}\, h}{L^x_{B_{\bar r}(x^{k,i-1})}}\right\} \qquad (6.4.88)
\]

to determine a radius for each constraint, with r̄ := min{1000, 2r_{k,i−1}} if k > 1 and r̄ := 1000 if k = 1 for all constraints, and h := h^ν_{k,i−1} if k > 1, h := 0.001 if k = 1, for ν = 1, …, 30. As a consequence, it was possible to compute C_{ν,S_{k,i}} by Lemma 6.4.18 for each constraint.


Bounds for Example DAX1: w̲ = −10^5, w̄ = 10^5, r̲ = 1000, r̄ = 2000.
Bounds for Example DAX2: w̲ = −10^6, w̄ = 10^6, r̲ = 4000, r̄ = 6000.

Example DAX1              Example DAX2
Trading Day  | Price      Trading Day  | Price
t^1_i        | y^1_i      t^2_i        | y^2_i
04.01.1993   | 1533.06    05.03.1998   | 4642.79
05.01.1993   | 1547.99    06.03.1998   | 4686.24
06.01.1993   | 1560.27    09.03.1998   | 4775.83
07.01.1993   | 1546.33    10.03.1998   | 4807.92
08.01.1993   | 1540.56    11.03.1998   | 4855.22
11.01.1993   | 1526.66    12.03.1998   | 4822.78
12.01.1993   | 1527.33    13.03.1998   | 4863.44
13.01.1993   | 1529.61    16.03.1998   | 4891.85
14.01.1993   | 1521.03    17.03.1998   | 4932.42
15.01.1993   | 1542.91    18.03.1998   | 4936.17
18.01.1993   | 1559.83    19.03.1998   | 4923.51
19.01.1993   | 1576.13    20.03.1998   | 4993.53
20.01.1993   | 1586.94    23.03.1998   | 5017.48
21.01.1993   | 1577.62    24.03.1998   | 5014.62
22.01.1993   | 1587.95    25.03.1998   | 5058.54
25.01.1993   | 1582.21    26.03.1998   | 5093.52
26.01.1993   | 1566.83    27.03.1998   | 5041.84
27.01.1993   | 1570.96    30.03.1998   | 5069.98
28.01.1993   | 1561.02    31.03.1998   | 5070.81
29.01.1993   | 1571.28    01.04.1998   | 5093.52
01.02.1993   | 1582.35    02.04.1998   | 5163.11
02.02.1993   | 1587.20    03.04.1998   | 5203.58
03.02.1993   | 1595.08    06.04.1998   | 5256.69
04.02.1993   | 1605.07    07.04.1998   | 5276.79
05.02.1993   | 1635.67    08.04.1998   | 5282.94
08.02.1993   | 1643.83    09.04.1998   | 5270.35
09.02.1993   | 1642.32    14.04.1998   | 5378.91
10.02.1993   | 1649.79    15.04.1998   | 5379.99
11.02.1993   | 1651.22    16.04.1998   | 5362.26
12.02.1993   | 1655.13    17.04.1998   | 5266.34

Table 6.4.3: DAX data


Figure 6.4.2: Example DAX1 - trajectory of the regularized solution

Figure 6.4.3: Example DAX1 - trajectory of the unregularized solution

Figure 6.4.4: Example DAX2 - trajectory of the regularized solution


Now considering the first example data DAX1, the starting point

r_0 := 1600, w_1 = … = w_30 := 0, ϑ := 3000

was used, and we set

ε_{1,0} := 0.001, ε_{k+1,0} := 0.7ε_k,
δ_1 := 10, δ_{k+1} := 0.7δ_k,
μ_1 := 10, μ_{k+1} := 0.8μ_k.

But, as in all examples before, restart procedures were applied to automatically adapt the accuracy parameter. The algorithm was stopped when the barrier parameter fell below 0.1. Although this stopping criterion seems to be very weak, Figure 6.4.2 shows that the tendency of the DAX curve is correctly reconstructed by our final approximate solution, which can be found in Table 6.4.4. Furthermore, the final approximation error of 15.65 is better than the 16.78 achieved in [411] and close to the correct minimal value 15.30, which is, due to the theory of continuous Chebyshev approximation, half of the maximal gap between two successive observed DAX values. As a comparison, the unregularized solution for DAX1 is depicted in Figure 6.4.3.
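The reference value 15.30 can be reproduced from the DAX1 prices of Table 6.4.3: a continuous approximant of a piecewise constant data curve cannot do better, in the Chebyshev norm, than half the largest jump, which here occurs between 04.02.1993 (1605.07) and 05.02.1993 (1635.67):

```python
# DAX1 opening prices from Table 6.4.3 (04.01.1993 - 12.02.1993)
dax1 = [1533.06, 1547.99, 1560.27, 1546.33, 1540.56, 1526.66, 1527.33, 1529.61,
        1521.03, 1542.91, 1559.83, 1576.13, 1586.94, 1577.62, 1587.95, 1582.21,
        1566.83, 1570.96, 1561.02, 1571.28, 1582.35, 1587.20, 1595.08, 1605.07,
        1635.67, 1643.83, 1642.32, 1649.79, 1651.22, 1655.13]

# A continuous r cannot beat half the largest jump of the piecewise constant curve y.
max_gap = max(abs(b - a) for a, b in zip(dax1, dax1[1:]))
lower_bound = max_gap / 2.0
assert abs(lower_bound - 15.30) < 1e-9
print("minimal value %.2f" % lower_bound)   # the 15.30 quoted in the text
```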

For the second example data DAX2 we used

r_0 := 5000, w_1 = … = w_30 := 30000, ϑ := 5000

as starting point,

ε_{1,0} := 0.005, ε_{k+1,0} := 0.6ε_k,
δ_1 := 50, δ_{k+1} := 0.6δ_k,
μ_1 := 100, μ_{k+1} := 0.7μ_k,

and, again, the restart procedure. The stopping criterion was now fulfilled when the barrier parameter reached 0.01. The resulting approximate solution is stated in Table 6.4.7, while our final approximation error of 54.30 is comparable with the result achieved in [411]. The optimal value is 54.28, so the more accurate stopping criterion also leads to a more accurate final solution in comparison with the first situation. Figure 6.4.4 shows again that the complete curve is correctly reconstructed, but, as in the first case, it can be observed that not every particular fluctuation is detected.

In the following tables, more technical details about the computations are listed.


n = 32

start vector: r_0 = 1600, w_1 = · · · = w_30 = 0, δ = 3000

approximate solution:
r_0 = 1525.52
w_1 = 36242.75    w_2 = 32815.99    w_3 = 13815.77
w_4 = 4020.24     w_5 = -6069.15    w_6 = 4758.85
w_7 = 14737.18    w_8 = 14112.17    w_9 = 18346.01
w_10 = 46544.21   w_11 = 33868.20   w_12 = 40881.96
w_13 = 14098.55   w_14 = 14098.55   w_15 = 15506.86
w_16 = -3195.49   w_17 = 7344.09    w_18 = 7793.20
w_19 = 13793.60   w_20 = 30348.69   w_21 = 30674.80
w_22 = 19024.95   w_23 = 26004.27   w_24 = 46345.79
w_25 = 49706.85   w_26 = 14621.09   w_27 = 17968.50
w_28 = 20163.07   w_29 = 19863.05   w_30 = 14764.52
δ = 15.65

Table 6.4.4: Example DAX1: Computed optimal solution

# of restarts of inner loops                                    8
# of solved linear programs                                   946
# of solved quadratic programs                              39337
# of constrained boxes                                       7508
t_LP: time in seconds for solving all LP                   633.58
t_QP: time in seconds for solving all QP                   137.58
T_tot: total time in seconds for complete iteration process 1429.63

Table 6.4.5: Example DAX1: Numerical effort

µ            1.15E-01
r            1.59E-01
h_min        2.13E-05
|T̃_h|/|T_h|  0.90

Table 6.4.6: Example DAX1: Final parameter values


n = 32

start vector: r_0 = 5000, w_1 = · · · = w_30 = 10000, δ = 5000

approximate solution:
r_0 = 4642.79
w_1 = 41296.84    w_2 = 174057.85   w_3 = 122032.14
w_4 = 113992.06   w_5 = 51575.18    w_6 = 51645.83
w_7 = 86537.16    w_8 = 103728.91   w_9 = 79851.57
w_10 = 43906.46   w_11 = 77530.96   w_12 = 132807.74
w_13 = 44629.71   w_14 = 73364.74   w_15 = 95344.36
w_16 = 43553.69   w_17 = 45089.31   w_18 = 47586.70
w_19 = 45104.08   w_20 = 131617.59  w_21 = 124602.13
w_22 = 120279.34  w_23 = 115373.19  w_24 = 52892.84
w_25 = 50427.28   w_26 = 110867.23  w_27 = 130648.27
w_28 = 47854.24   w_29 = -50973.20  w_30 = 46863.05
δ = 54.30

Table 6.4.7: Example DAX2: Computed optimal solution

# of restarts of inner loops                                    7
# of solved linear programs                                  1643
# of solved quadratic programs                              35399
# of constrained boxes                                       3366
t_LP: time in seconds for solving all LP                   848.61
t_QP: time in seconds for solving all QP                   153.91
T_tot: total time in seconds for complete iteration process 1711.77

Table 6.4.8: Example DAX2: Numerical effort

Table 6.4.8: Example DAX2: Numerical effort

µ            1.34E-02
r            1.17E-02
h_min        1.01E-07
|T̃_h|/|T_h|  0.78

Table 6.4.9: Example DAX2: Final parameter values


6.5 Comments

Section 6.1: This part reviews different approaches for the numerical solution of SIP. Apart from the papers mentioned in the text, we refer to a number of publications that gave major contributions to duality theory (cf. Charnes, Cooper and Kortanek [73], Ben-Tal, Ben-Israel and Rosinger [38]), optimality conditions (see Ben-Tal, Teboulle and Zowe [39], Ioffe [191], Jongen, Wetterling and Zwier [201]) and also numerical methods, in particular exchange algorithms (Gustafson and Kortanek [156], Gustafson [154]) or reduction methods (cf. Watson [416], Coope and Watson [83]).

Section 6.2–6.3: Path-following methods for solving parameter-dependent equations (see, for instance, Allgower and Georg [8]) have been applied to optimization problems in Guddat et al. [152] and Gollmer et al. [138].

The Algorithms 6.2.12–6.2.17 and their theoretical foundation are con-tained in the papers of Kaplan and Tichatschke [210, 212, 214, 218].

Section 6.4: The first algorithm for SIP in the context of interior point strategies was an extension of an affine-scaling algorithm to linear SIP, suggested by Ferris and Philpott [110, 111]. But it is not easily possible to extend each interior-point approach to SIP. For instance, Powell [334] showed that the application of Karmarkar's algorithm to linear SIP need not work. Additionally, a survey of interior-point approaches which can naturally be extended to SIP is given by Todd [396] and Tuncel and Todd [403]. A further approach originates from the method of analytic centers, which was introduced by Sonnevend [377] and extensively studied by Jarre [197] for finite convex problems. In order to tackle the SIP directly, Sonnevend [377, 378] and Schattler [362, 367] extended this approach to convex SIP by introducing an integral form of the logarithmic barrier. But, unfortunately, the barrier property may be lost due to the smoothing effect of the integral (cf., e.g., [403, 198]). Usually, boundedness (or in fact compactness) of the feasible set, or at least of the solution set, of the given problem is assumed in all interior point approaches for SIP mentioned above. Dropping this restrictive assumption, Kaplan and Tichatschke [226] suggested a combination of the logarithmic barrier method with a discretization procedure for the constraint set and the proximal point method. Furthermore, due to the regularization, this approach allows one to treat ill-posed SIP. A further advantage of the method proposed in [226] is the fact that convergence of the iterates can be established. This is not always clear if one applies pure interior-point methods to convex problems, except for linear and quadratic ones.


242 CHAPTER 6. PPR FOR CONVEX SEMI-INFINITE PROBLEMS


Chapter 7

PARTIAL PPR IN CONTROL PROBLEMS

A large number of interesting physical and technical problems give rise to optimal control models where the state of the system is governed by partial differential equations. The fundamental monograph of Lions [269] gives an excellent introduction to the mathematics of these models for various types of differential equations, boundary conditions, and controls. In this chapter we deal with problems whose states are described by second-order elliptic equations.

7.1 Non-coercive Elliptic Control Problems

Let Ω ⊂ Rⁿ be an open domain with a boundary Γ of class C², Ω̄ := Ω ∪ Γ. Then, with coefficients

a_{ij} ∈ C²(Ω̄), i, j = 1, …, n,  a₀ ∈ C²(Ω̄)   (7.1.1)

such that, for all x ∈ Ω̄, ξ ∈ Rⁿ and a constant c > 0,

a₀(x) > 0  and  ∑_{i,j=1}^n a_{ij}(x) ξ_i ξ_j ≥ c ∑_{i=1}^n ξ_i²,   (7.1.2)

we consider the elliptic second-order differential operator A defined by

Ay := − ∑_{i,j=1}^n ∂/∂x_i ( a_{ij} ∂y/∂x_j ) + a₀ y.   (7.1.3)
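To make the definition concrete, the following sketch assembles the one-dimensional analogue of (7.1.3), Ay = −(a y′)′ + a₀y on (0, 1) with homogeneous Dirichlet boundary values, by standard finite differences. This is purely illustrative and not part of the text; the coefficient functions and grid size are hypothetical placeholders.

```python
def apply_A(y, a, a0, h):
    """Apply the 1-D elliptic operator  Ay = -(a y')' + a0*y  to grid values
    y[0..n-1] at the interior nodes x_i = (i+1)*h, with y = 0 on the boundary."""
    n = len(y)
    out = []
    for i in range(n):
        x = (i + 1) * h
        yl = y[i - 1] if i > 0 else 0.0          # homogeneous Dirichlet data
        yr = y[i + 1] if i < n - 1 else 0.0
        a_l, a_r = a(x - h / 2), a(x + h / 2)    # midpoint coefficient values
        out.append((-a_r * (yr - y[i]) + a_l * (y[i] - yl)) / h ** 2
                   + a0(x) * y[i])
    return out

n = 9
h = 1.0 / (n + 1)
y = [((i + 1) * h) * (1.0 - (i + 1) * h) for i in range(n)]   # y(x) = x(1-x)
Ay = apply_A(y, a=lambda x: 1.0, a0=lambda x: 0.0, h=h)
# for a ≡ 1, a0 ≡ 0 the scheme is exact on quadratics: -y'' = 2
```

For constant coefficients the scheme reproduces −y″ exactly on quadratic functions, which is a convenient correctness check; the resulting operator is symmetric and positive definite, mirroring the ellipticity condition (7.1.2).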

Further, let H denote a Hilbert space and

Uad ⊂ H a non-empty, closed, convex set,   (7.1.4)

the so-called set of admissible controls.

In the case of distributed control, the state of the system is defined by

Ay = f + D₀u in Ω,  By = 0 on Γ,   (7.1.5)




with u ∈ Uad ⊂ H = L²(Ω), f ∈ L²(Ω), D₀ ∈ L(L²(Ω), L²(Ω)) a linear continuous operator, and B a boundary operator (of Dirichlet or Neumann type, for instance).

In the case of boundary control, the state of the system is

Ay = f in Ω,  By = g + D₁u on Γ,   (7.1.6)

with f ∈ L²(Ω), g ∈ L²(Γ), u ∈ Uad ⊂ H = L²(Γ), D₁ ∈ L(L²(Γ), L²(Γ)), and B as above.

Let us assume now that, for u ∈ Uad, y(u) is uniquely determined by (7.1.5) or (7.1.6), respectively, and that y(u) ∈ V, V a Hilbert space. For instance, in the case of (7.1.5) with B the trace operator and Γ sufficiently smooth, an appropriate choice could be V = H¹₀(Ω).

Given now Uad and a state equation as above, the problem is to minimize the functional

J(u) = ‖Cy(u) − κ_d‖²_ℋ + ((Nu, u))_H   (7.1.7)

subject to u ∈ Uad, where C ∈ L(V, ℋ), ℋ a Hilbert space, κ_d ∈ ℋ a given element, and N ∈ L(H, H) a positive semi-definite operator.

Throughout this chapter, ‖·‖_S and ((·,·))_S denote the norm and the scalar product in a space S, the subscript being omitted, i.e. ‖·‖, ((·,·)), in the case S = L²(Ω).

For further reference we define:

7.1.1 Problem. With a given admissible control set (7.1.4), a boundary problem (7.1.5) or (7.1.6), and an objective J via (7.1.7), consider the control problem

min{ J(u) : u ∈ Uad }.   (7.1.8)

In addition, a state constraint

y(u) ∈ G ⊂ V,  G closed, convex,   (7.1.9)

may be considered (cf. [41], [68]).

We note that the assumptions on the smoothness of the data and the type of the boundary conditions may differ essentially from the above (for instance, a_{ij}, a₀ ∈ L∞(Ω) instead of (7.1.1), or a non-homogeneous boundary condition in (7.1.5), etc.).

Particularly in numerical contributions it is common to assume the operator N to be positive definite, which implies that Problem 7.1.1 is well-posed. A frequent choice is N := λI, λ > 0. However, from the practical point of view, the meaning of the term ((Nu, u))_H, usually interpreted as a cost of the control, is not evident. On the one hand, it is not clear why this cost should depend quadratically on u and, on the other hand, (7.1.7) appears to be a rather arbitrary scalarization of a two-objective model. In the sequel we deal with the case N ≡ 0. We believe that in many practical cases this is a more natural and relevant model. It concentrates on the primary aim of the process, expressed by the first term in the objective (7.1.7), and leaves restrictions on the cost of control to the constraints defining Uad.

For N ≡ 0, however, the problem is likely to become ill-posed and more difficult to handle. For bounded sets Uad, a solution still exists (cf. [269]), but it may be non-unique unless rather unreasonable assumptions are imposed (e.g. that B and C be injective). In the case of an unbounded Uad it may occur that the set of optimal controls is empty or unbounded.

In this chapter, in order to deal with possibly ill-posed control problems in infinite-dimensional spaces, we consider penalty methods stabilized by means of iterative PPR (see also Hettich, Kaplan and Tichatschke [176]). To make them applicable to our optimal control Problem 7.1.1, a number of substantial modifications and supplements are necessary, due to the following circumstances:

• There are serious difficulties in estimating the closeness between the solutions of the original and the discretized problems.

• The objective J depends in an implicit way on the control u.

• In general, it is impossible to estimate the Lagrange multipliers of the discretized problems uniformly, as there are no suitable regularity conditions available.

The next Subsection 7.1.1 deals extensively, and in an exemplary manner, with the important case of distributed control, Dirichlet boundary conditions, and bounded Uad.

In [176], Part II, the results are extended to more general problems admitting distributed and boundary control with unbounded control sets and state constraints.

A significant peculiarity of our approach is that regularization is accomplished only with respect to the control variable, but not the state variable. Therefore, we deal with a natural application of a partial regularization – in the subspace of control variables – which has already been considered formally in Subsection 4.3.3.

7.1.1 Distributed control problems

Throughout this subsection we deal with the following instance of Problem 7.1.1 (cf. also [42, 68, 118]): With Ω ⊂ Rⁿ an open and bounded domain, Γ its boundary, and Ω̄ := Ω ∪ Γ, let

ℋ := H = L²(Ω),  V := H¹₀(Ω)   (7.1.10)

and A a second-order elliptic operator (7.1.3) with coefficients a₀, a_{ij} satisfying (7.1.1) and (7.1.2). The inequality a₀(x) > 0 may be relaxed to a₀(x) ≥ 0 on Ω̄.

For u ∈ Uad we consider the Dirichlet problem

S1(u):  Ay = f + u in Ω,  y = 0 on Γ,   (7.1.11)

with given f ∈ L²(Ω) and Uad a bounded, closed, and convex subset of L²(Ω).

Assuming that Γ belongs to the class C², it is well known (cf. [19, Theorem 7.11]) that, given u ∈ L²(Ω), S1(u) has a unique solution

y(u) ∈ H²(Ω) ∩ H¹₀(Ω)   (7.1.12)

Page 262: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed


such that the mapping

Tu := y(u),  (T : L²(Ω) → H²(Ω) ∩ H¹₀(Ω))   (7.1.13)

is well defined. By Y we denote the space of functions

Y := { y : y ∈ H¹₀(Ω), Ay ∈ L²(Ω) }.   (7.1.14)

Employing the Cauchy–Schwarz inequality and (7.1.1), it is immediate that, by

((y, z))_Y := ((Ay, Az)),  ‖y‖_Y := ‖Ay‖,   (7.1.15)

a scalar product and a norm are given on Y (recall that ((·,·)) and ‖·‖ denote the scalar product and the norm in L²(Ω)). Moreover, we have

7.1.2 Proposition. Y is a Hilbert space with scalar product and norm given by (7.1.15). Algebraically, Y coincides with the set of functions H²(Ω) ∩ H¹₀(Ω).

To establish the second part of Proposition 7.1.2, let y ∈ Y. Then f := Ay ∈ L²(Ω) and, of course, y solves

Ay = f in Ω,  y = 0 on Γ,

implying y ∈ H²(Ω) ∩ H¹₀(Ω). The opposite implication is obvious.

Further, we define

X := Y × L²(Ω)   (7.1.16)

with norm

‖(y, u)‖_X = (‖y‖²_Y + ‖u‖²)^{1/2} = (‖Ay‖² + ‖u‖²)^{1/2}.   (7.1.17)

Identifying the dual space (L²(Ω))′ with L²(Ω), we obtain X′ = Y′ × L²(Ω).

7.1.3 Proposition. [269] Under the assumptions on S1(u), every u ∈ Uad uniquely defines a process (y(u), u) ∈ X.

Finally, to specify the objective functional (see (7.1.10)), let

C ∈ L(H¹₀(Ω), L²(Ω)) = L(V, ℋ),  κ_d ∈ L²(Ω).   (7.1.18)

Then Control Problem 7.1.1 can be restated as

min J(u) := ∫_Ω (Cy(u) − κ_d)² dΩ
s.t. u ∈ Uad,  y(u) ∈ G ⊂ V,  G closed, convex.   (7.1.19)

We assume additionally that there exists a ũ ∈ Uad such that y(ũ) ∈ int G (in Y). Recall that y(u) is the unique solution of system (7.1.11).

7.1.4 Proposition. (cf. [269]) In case Uad is bounded, the set U* of optimal controls for Problem (7.1.19) is a non-empty, closed, convex subset of Uad.



7.1.2 Regularized penalty methods

Here we formulate two methods for solving Problem (7.1.19) and state their main properties. Following Subsection 2.2.3 (see also Subsection 4.3.1), we consider in this chapter one-step as well as multi-step regularization methods, performed by means of penalty algorithms. In the first method the penalty and regularization terms are updated synchronously (one-step regularization), whereas in the second method the proximal iterations continue with a fixed penalty term as long as reasonable progress is achieved (multi-step regularization). As mentioned, a further significant peculiarity is that we deal with a partial regularization, carried out only with respect to the control variables, but not the state variables. In this sense we deal with a particular case of regularization on a subspace.

In the spirit of [269], a penalty method for solving Problem (7.1.19) could proceed as follows:

Given a sequence {r_k}, r_k > 0, lim_{k→∞} r_k = 0, define the penalized functionals

J_k(y, u) := ∫_Ω (Cy − κ_d)² dΩ + (1/r_k) ∫_Ω (Ay − f − u)² dΩ,  k = 1, 2, …,   (7.1.20)

and compute a sequence of minimal points z^k = (y^k, u^k) of J_k on Y × Uad.

Then the question is whether {z^k} converges to an optimal process for Problem (7.1.19). To simplify the convergence analysis, we do not include a penalization of the state constraints, as is done by Neittaanmäki and Tiba [305]. In the case of strictly convex functionals J of type (7.1.7) with ℋ = H = L²(Ω), N := I and C : V → ℋ an embedding operator, a positive answer to this question is given in [42]. For possibly ill-posed problems similar results cannot be expected. Therefore, to enforce strict convexity of the auxiliary problems, we use IPR employing a regularization term with respect to the variable u, i.e., regularization on the subspace of control variables is carried out. Note that in (7.1.20), y and u are considered to be independent variables.

7.1.5 Method. (One-step regularization)
Choose positive sequences {r_k}, {ε_k} with lim_{k→∞} r_k = lim_{k→∞} ε_k = 0, sup_k r_k < 1, sup_k ε_k < 1, and u⁰ ∈ Uad.
Step k: Given u^{k−1} ∈ Uad. With J_k defined by (7.1.20), let

Ψ_k(y, u) := J_k(y, u) + ∫_Ω (u − u^{k−1})² dΩ   (7.1.21)

and

(ȳ^k, ū^k) := arg min{ Ψ_k(y, u) : (y, u) ∈ G × Uad }.   (7.1.22)

Compute an approximation (y^k, u^k) ∈ G × Uad of (ȳ^k, ū^k) such that

‖∇Ψ_k(y^k, u^k) − ∇Ψ_k(ȳ^k, ū^k)‖_{X′} ≤ ε_k.   (7.1.23)

Here, ∇Ψ_k ∈ X′ denotes the Gâteaux derivative of Ψ_k, which is easily seen to be given by

∇Ψ_k(y, u)(η, µ) = 2 ∫_Ω [ (Cy − κ_d)Cη + (1/r_k)(Ay − f − u)(Aη − µ) + (u − u^{k−1})µ ] dΩ   (7.1.24)



for (η, µ) ∈ X. Note that, for (ȳ^k, ū^k) to be a minimizer according to (7.1.22), it is necessary that

∇Ψ_k(ȳ^k, ū^k)(y − ȳ^k, u − ū^k) ≥ 0  ∀ (y, u) ∈ G × Uad.   (7.1.25)

7.1.6 Method. (Multi-step regularization)
Let {r_k}, {ε_k}, u⁰ ∈ Uad be as in Method 7.1.5, and {δ_k} a third positive sequence (not necessarily tending to 0).
Step k: Given u^{k−1} ∈ Uad.

(a) Set u^{k,0} := u^{k−1}, i := 1.

(b) Given u^{k,i−1}, let (with J_k defined by (7.1.20))

Ψ_{k,i}(y, u) := J_k(y, u) + ∫_Ω (u − u^{k,i−1})² dΩ   (7.1.26)

and

(ȳ^{k,i}, ū^{k,i}) := arg min{ Ψ_{k,i}(y, u) : (y, u) ∈ G × Uad }.   (7.1.27)

Compute an approximation (y^{k,i}, u^{k,i}) ∈ G × Uad of (ȳ^{k,i}, ū^{k,i}) such that

‖∇Ψ_{k,i}(y^{k,i}, u^{k,i}) − ∇Ψ_{k,i}(ȳ^{k,i}, ū^{k,i})‖_{X′} ≤ ε_k.   (7.1.28)

(c) If ‖u^{k,i} − u^{k,i−1}‖ > δ_k, set i := i + 1 and repeat (b). Otherwise, set u^k := u^{k,i}, i(k) := i, and continue with Step k + 1.
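The control flow of Methods 7.1.5 and 7.1.6 can be sketched on a heavily simplified finite-dimensional toy problem. In the sketch below the elliptic operator A is replaced by the identity (so the "state equation" is just y = f + u), C = I, and Uad is a box; each subproblem (7.1.26)–(7.1.27) is minimized by alternating exact coordinate steps, which is one possible inner solver, not the one prescribed in the text. All names, data, and parameter values are hypothetical.

```python
m = 3
f = [0.2, -0.1, 0.4]        # right-hand side (hypothetical data)
kappa = [1.0, 1.0, 1.0]     # desired state, stands for kappa_d
lo, hi = -0.3, 0.3          # box constraints defining U_ad

def prox_step(u_prev, rk, sweeps=100):
    """Approximately minimize Psi(y,u) = ||y-k||^2 + (1/rk)||y-f-u||^2 + ||u-u_prev||^2
    over y free and u in the box, by alternating closed-form coordinate steps."""
    u = u_prev[:]
    for _ in range(sweeps):
        # y-step: unconstrained quadratic minimization in y
        y = [(rk * kappa[i] + f[i] + u[i]) / (1.0 + rk) for i in range(m)]
        # u-step: separable quadratic over the box, then projection
        u = [min(hi, max(lo, ((y[i] - f[i]) / rk + u_prev[i]) / (1.0 / rk + 1.0)))
             for i in range(m)]
    return y, u

u, rk, delta = [0.0] * m, 0.5, 1e-4
for k in range(5):                      # outer loop: 1/rk plays the penalty role
    for _ in range(100):                # multi-step: prox iterations at fixed rk
        u_prev = u
        y, u = prox_step(u_prev, rk)
        if max(abs(u[i] - u_prev[i]) for i in range(m)) <= delta:
            break                       # progress below delta_k: leave inner loop
    rk *= 0.3                           # then decrease r_k (Method 7.1.5 would
                                        # do so after every single prox step)

residual = max(abs(y[i] - f[i] - u[i]) for i in range(m))   # state-equation gap
```

As r_k decreases, the state-equation residual is driven to zero while the proximal term keeps each subproblem strongly convex, matching the role of Ψ_{k,i} in the method.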

In order to prove convergence of both the one-step and the multi-step method, we have to take into account whether or not the problems possess state constraints.

1. The case of G = Y , i.e. no state constraints.

For ∇Ψ_{k,i}, relation (7.1.24) holds with u^{k,i−1} instead of u^{k−1}, and the necessary condition (7.1.25) applies accordingly.

7.1.7 Proposition. The functionals Ψ_k in (7.1.21) as well as Ψ_{k,i} in (7.1.26) are quadratic, continuous on X, and strongly convex, i.e., for some α > 0 we have, for all z = (y, u), z̃ = (ỹ, ũ) ∈ X,

Ψ_{k,i}(z) − Ψ_{k,i}(z̃) ≥ ⟨∇Ψ_{k,i}(z̃), z − z̃⟩ + α‖z − z̃‖²_X.

Proof: The fact that Ψ_k, Ψ_{k,i} are continuous, quadratic functionals on X is obvious. To show that Ψ_{k,i} is strongly convex (the proof for Ψ_k is analogous), it suffices to show that its quadratic part is a positive definite quadratic form. An elementary calculation gives

Ψ_{k,i}(y, u) = Q₁(y, u) + Q₂(y, u) + ℓ(y, u)



with an affine linear part ℓ,

Q₁(y, u) = ∫_Ω (Cy)² dΩ + (1/r_k − 1) ∫_Ω (Ay − u)² dΩ,

a positive semi-definite quadratic form (recall that r_k < 1), and

Q₂(y, u) = ∫_Ω (Ay − u)² dΩ + ∫_Ω u² dΩ.

We are done if we can show that Q₂ is positive definite. We calculate (cf. (7.1.17))

Q₂(y, u) = ∫_Ω ( √(2/3) Ay − √(3/2) u )² dΩ + (1/3)(‖Ay‖² + ‖u‖²) + (1/6)‖u‖²
  ≥ (1/3)‖(y, u)‖²_X.   (7.1.29)

Thus, Proposition 7.1.7 is proved.

From the proof above it follows that Q₂ defines another norm |·| on X:

|(y, u)|² = Q₂(y, u) = ‖Ay − u‖² + ‖u‖².   (7.1.30)

From (7.1.29) and a simple estimation we have

(1/3)‖(y, u)‖²_X ≤ |(y, u)|² ≤ 3‖(y, u)‖²_X.   (7.1.31)
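The pointwise identity underlying (7.1.29), and the resulting norm equivalence (7.1.31), can be sanity-checked numerically on random scalar samples, where a plays the role of Ay and u of the control value (a quick check, not part of the proof):

```python
import random

random.seed(0)
for _ in range(1000):
    a = random.uniform(-10.0, 10.0)      # stands for (Ay)(x)
    u = random.uniform(-10.0, 10.0)      # stands for u(x)
    q2 = (a - u) ** 2 + u ** 2           # integrand of Q2, cf. (7.1.30)
    # decomposition used in (7.1.29)
    decomp = ((2.0 / 3.0) ** 0.5 * a - (3.0 / 2.0) ** 0.5 * u) ** 2 \
             + (a ** 2 + u ** 2) / 3.0 + u ** 2 / 6.0
    assert abs(q2 - decomp) < 1e-9 * max(1.0, abs(q2))
    # norm equivalence (7.1.31): (1/3)||.||_X^2 <= |.|^2 <= 3||.||_X^2
    nx2 = a ** 2 + u ** 2
    assert nx2 / 3.0 - 1e-9 <= q2 <= 3.0 * nx2 + 1e-9
```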

For brevity, in the sequel we use the abbreviations

z = (y, u),  z* = (y*, u*),  z^{k,i} = (y^{k,i}, u^{k,i}),  etc.   (7.1.32)

for elements in X = Y × L²(Ω).

Proposition 7.1.7 ensures that the optimization problems (7.1.22), (7.1.27) are well-posed, so that their respective unique solutions (ȳ^k, ū^k) and (ȳ^{k,i}, ū^{k,i}) can be approximated by common finite element methods. Note, however, that Methods 7.1.5 and 7.1.6 are conceptual because

- no stopping rule is defined for the outer iteration,

- the stopping rules (7.1.23), (7.1.28) are not yet practicable, as we do not know the exact minimizers,

- the method for minimizing (approximately) Ψ_k and Ψ_{k,i}, respectively, is not specified.

Due to the strong convexity of Ψ_k and Ψ_{k,i}, one can use – in order to satisfy (7.1.23) or (7.1.28) – any basic method which enables one to define a point (y, u) ∈ G × Uad such that

Ψ_{k,i}(y, u) ≤ inf_{(y,u)∈G×Uad} Ψ_{k,i}(y, u) + µ

with given µ > 0 (analogously for Ψ_k). For instance, if Uad ⊂ L∞(Ω) and Uad and G are given by means of point-wise constraints, the usual discretization approach, like FEM, can be combined with (finite) conjugate direction methods or simple gradient projection methods.
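To illustrate the last remark, here is a generic gradient projection iteration for a strongly convex quadratic over a box; the quadratic and the box are hypothetical stand-ins for a discretized Ψ_{k,i} and Uad (a sketch, not the method used in the text):

```python
def grad_proj(grad, project, x0, step, iters=200):
    """Plain gradient projection:  x <- P(x - step * grad(x))."""
    x = x0[:]
    for _ in range(iters):
        g = grad(x)
        x = project([x[j] - step * g[j] for j in range(len(x))])
    return x

# minimize f(x) = (x1 - 2)^2 + 2(x2 + 1)^2 over the box [0,1] x [0,1];
# step <= 1/L with L = 4 the Lipschitz constant of the gradient
grad = lambda x: [2.0 * (x[0] - 2.0), 4.0 * (x[1] + 1.0)]
project = lambda x: [min(1.0, max(0.0, v)) for v in x]
x = grad_proj(grad, project, [0.5, 0.5], step=0.2)
# the unique minimizer over the box is (1, 0)
```

With a step size below the reciprocal Lipschitz constant of the gradient, the iteration converges linearly for strongly convex objectives, which is why such simple schemes suffice for the inner subproblems.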



7.1.8 Remark. Note that Method 7.1.5 can be considered as a special case of Method 7.1.6 by taking δ_k "sufficiently large", for instance

δ_k > d₀ := sup{ ‖u − v‖ : u, v ∈ Uad }.   (7.1.33)

Method 7.1.6 allows, for a fixed J_k, a more exact minimization. This gives rise to the hope that, in general, to obtain a certain accuracy, the value of the penalty parameter 1/r_k can be kept smaller than in Method 7.1.5. Therefore, the numerical behavior of Method 7.1.6 can be expected to be much better. ♦

To prove the next Proposition 7.1.10 we need an auxiliary estimate:

7.1.9 Lemma. Let z̄^{k,i}, z^{k,i} be as in Method 7.1.6, i.e. (cf. (7.1.28))

‖∇Ψ_{k,i}(z^{k,i}) − ∇Ψ_{k,i}(z̄^{k,i})‖_{X′} ≤ ε_k.

Then

‖z^{k,i} − z̄^{k,i}‖_X ≤ (3/2) ε_k.   (7.1.34)

The proof is immediate from (7.1.29) and the fact that Ψ_{k,i} − Q₂ is a convex, quadratic functional.

It should be emphasized that Lemma 7.1.9 is a consequence only of the properties of Ψ_{k,i} and (7.1.28), and is otherwise independent of the method.

Now we establish two auxiliary estimates which are in essence consequences of the properties of the operator A and the boundedness of Uad. In this description we do not focus on the explicit calculation of the various constants c_j, especially those connected with the estimation of solutions of boundary value problems and the norms of certain operators.

7.1.10 Proposition. Let (y*, u*) be an optimal process of Problem (7.1.19). With an arbitrarily chosen u^{k,i−1} ∈ Uad, let (ȳ^{k,i}, ū^{k,i}) and (y^{k,i}, u^{k,i}) be as in Substep (b) of Method 7.1.6. Then there exist constants d₁ and d₂, independent of u^{k,i−1}, ε_k, r_k, k, and i ≥ 1, such that

J_k(y*, u*) − J_k(ȳ^{k,i}, ū^{k,i}) < d₁ r_k   (7.1.35)

and

sup_{(k,i)} |(y^{k,i}, u^{k,i}) − (y*, u*)| ≤ d₂,   (7.1.36)

where |·| is the norm on X = Y × L²(Ω) defined by (7.1.30).

Proof: Let z̄^{k,i} = (ȳ^{k,i}, ū^{k,i}) be the unique minimal point of Ψ_{k,i} on Y × Uad and

q^{k,i} := (1/r_k)(Aȳ^{k,i} − f − ū^{k,i}).   (7.1.37)

Using (7.1.24) and the optimality condition (7.1.25) (with Ψ_{k,i} instead of Ψ_k), we obtain, due to the independence of y and u, that

∫_Ω (Cȳ^{k,i} − κ_d)(Cy − Cȳ^{k,i}) dΩ + ∫_Ω q^{k,i}(Ay − Aȳ^{k,i}) dΩ ≥ 0   (7.1.38)

for all y ∈ Y, and

∫_Ω (ū^{k,i} − u^{k,i−1})(u − ū^{k,i}) dΩ − ∫_Ω q^{k,i}(u − ū^{k,i}) dΩ ≥ 0   (7.1.39)



for all u ∈ Uad.

Furthermore, with

F_{k,i}(z) := ∫_Ω (Cy − κ_d)² dΩ + ∫_Ω (u − u^{k,i−1})² dΩ,   (7.1.40)

in view of the gradient inequality for convex functions, we obtain, for all y ∈ Y, u ∈ L²(Ω), the inequality

F_{k,i}(z) − F_{k,i}(z̄^{k,i}) ≥ 2 ∫_Ω (Cȳ^{k,i} − κ_d)(Cy − Cȳ^{k,i}) dΩ + 2 ∫_Ω (ū^{k,i} − u^{k,i−1})(u − ū^{k,i}) dΩ.   (7.1.41)

Taking y := y*, u := u* in (7.1.38)–(7.1.41) and observing that Ay* − f − u* = 0, we find

F_{k,i}(z*) ≥ (2/r_k) ∫_Ω (Aȳ^{k,i} − f − ū^{k,i})² dΩ.   (7.1.42)

Thus

((1/2) F_{k,i}(z*))^{1/2} ≥ ((r_k/2) F_{k,i}(z*))^{1/2} ≥ ‖Aȳ^{k,i} − f‖ − ‖ū^{k,i}‖.   (7.1.43)

Together with the boundedness of Uad, sup_k r_k < 1, sup_k ε_k < 1, and (7.1.34), this shows that there exists a constant c₁ such that

‖ȳ^{k,i}‖_Y < c₁,  ‖y^{k,i}‖_Y < c₁.   (7.1.44)

Now, ȳ^{k,i} and y^{k,i} solve the boundary value problems

Ay = Aȳ^{k,i}, y|_Γ = 0  and  Ay = Ay^{k,i}, y|_Γ = 0,

respectively. Therefore, a standard result from the theory of elliptic operators (see, for instance, [19]) gives the estimate ‖ȳ^{k,i}‖_{H²(Ω)} ≤ const · ‖Aȳ^{k,i}‖ or, due to (7.1.44), the existence of c₂ such that

‖ȳ^{k,i}‖_{H²(Ω)} < c₂ and, analogously, ‖y^{k,i}‖_{H²(Ω)} < c₂.   (7.1.45)

Let ẑ^{k,i} = (ŷ^{k,i}, û^{k,i}) be a feasible point (i.e. û^{k,i} ∈ Uad, ŷ^{k,i} = Tû^{k,i}) with minimal distance (with regard to ‖·‖_X) to z̄^{k,i}, i.e.

ẑ^{k,i} = arg min{ ‖z̄^{k,i} − z‖_X : z feasible }.   (7.1.46)

Now we estimate ‖ẑ^{k,i} − z̄^{k,i}‖_X. Inequality (7.1.42) shows that, due to the boundedness of Uad, there exists a constant c₃ such that

‖r_k q^{k,i}‖ = ‖Aȳ^{k,i} − f − ū^{k,i}‖ < c₃ √r_k.   (7.1.47)

Further, Tū^{k,i} solves the boundary value problem Ay = f + ū^{k,i}, y|_Γ = 0 (with T defined by (7.1.13)). Then

‖A(Tū^{k,i}) − Aȳ^{k,i}‖ = ‖Aȳ^{k,i} − f − ū^{k,i}‖ < c₃ √r_k

and hence

‖(ȳ^{k,i}, ū^{k,i}) − (Tū^{k,i}, ū^{k,i})‖_X < c₃ √r_k.



By the definition (7.1.46) of ẑ^{k,i}, this yields

‖z̄^{k,i} − ẑ^{k,i}‖_X < c₃ √r_k.   (7.1.48)

Next, with A* and C* the adjoint operators of A and C, let p^{k,i} be the solution of the adjoint problem

A*p^{k,i} = C*(Cȳ^{k,i} − κ_d),  p^{k,i}|_Γ = 0.   (7.1.49)

Because A* is again an elliptic operator of second order with coefficients in C²(Ω̄), we have p^{k,i} ∈ H²(Ω) ∩ H¹₀(Ω) and, due to (7.1.45), the existence of c₄ such that

‖p^{k,i}‖_{H²(Ω)} < c₄.   (7.1.50)

From

∫_Ω p^{k,i} A(y − ȳ^{k,i}) dΩ = ∫_Ω (y − ȳ^{k,i}) A*p^{k,i} dΩ = ∫_Ω (y − ȳ^{k,i}) C*(Cȳ^{k,i} − κ_d) dΩ = ∫_Ω (Cȳ^{k,i} − κ_d)(Cy − Cȳ^{k,i}) dΩ   (7.1.51)

and (7.1.38), (7.1.39), we get for (y, u) ∈ Y × Uad

∫_Ω (p^{k,i} + q^{k,i})(Ay − Aȳ^{k,i}) dΩ + ∫_Ω (u − ū^{k,i})(ū^{k,i} − u^{k,i−1} − q^{k,i}) dΩ ≥ 0.

Choosing u := ū^{k,i} and y := T(ū^{k,i} − q^{k,i}/‖q^{k,i}‖), with T the operator defined by (7.1.13) (thus Ay = ū^{k,i} − q^{k,i}/‖q^{k,i}‖ + f), this gives

−∫_Ω (p^{k,i} + q^{k,i}) ( q^{k,i}/‖q^{k,i}‖ + r_k q^{k,i} ) dΩ ≥ 0.   (7.1.52)

Together with (7.1.47), (7.1.50), and r_k < 1, we obtain, with the aid of the Cauchy–Schwarz inequality,

‖q^{k,i}‖ ≤ −∫_Ω p^{k,i} q^{k,i} ( r_k + 1/‖q^{k,i}‖ ) dΩ − r_k ‖q^{k,i}‖²
  ≤ ‖p^{k,i}‖ ‖q^{k,i}‖ ( r_k + 1/‖q^{k,i}‖ )
  = ‖p^{k,i}‖ + √r_k ‖p^{k,i}‖ · √r_k ‖q^{k,i}‖
  < c₄ + c₄c₃ =: c₅.   (7.1.53)

Therefore, we get the improved bounds

‖Aȳ^{k,i} − f − ū^{k,i}‖ < c₅ r_k   (7.1.54)

and, from here, according to the derivation of (7.1.48),

‖z̄^{k,i} − ẑ^{k,i}‖_X < c₅ r_k.   (7.1.55)

With an argument analogous to that used to derive (7.1.45), we obtain, with a constant c₆,

‖y‖ ≤ c₆ ‖y‖_Y  ∀ y ∈ Y.   (7.1.56)



Therefore, (7.1.55) gives

‖ŷ^{k,i} − ȳ^{k,i}‖ < c₆ c₅ r_k.   (7.1.57)

Furthermore, using (with J_k according to (7.1.20))

J_k(z*) = ∫_Ω (Cy* − κ_d)² dΩ ≤ ∫_Ω (Cŷ^{k,i} − κ_d)² dΩ = J_k(ẑ^{k,i}),

we obtain with (7.1.44) and (7.1.57)

J_k(z*) − J_k(z̄^{k,i}) = J_k(z*) − J_k(ẑ^{k,i}) + J_k(ẑ^{k,i}) − J_k(z̄^{k,i})
  ≤ ∫_Ω [ (Cŷ^{k,i} − κ_d)² − (Cȳ^{k,i} − κ_d)² ] dΩ
  = ∫_Ω (Cŷ^{k,i} − Cȳ^{k,i})(Cŷ^{k,i} + Cȳ^{k,i} − 2κ_d) dΩ
  ≤ ‖C‖ ( ‖C‖ ‖ŷ^{k,i} + ȳ^{k,i}‖ + 2‖κ_d‖ ) ‖ŷ^{k,i} − ȳ^{k,i}‖
  < ‖C‖ ( ‖C‖ (c₁ + sup_{u∈Uad} ‖Tu‖) + 2‖κ_d‖ ) · c₅c₆ · r_k,

proving (7.1.35).

The existence of a d₂ such that (7.1.36) holds is an easy consequence of (7.1.31), (7.1.44) and the boundedness of Uad. Thus the proposition is proved.

To prove the convergence of the methods considered, we need one more lemma. Its formulation requires some further notation.

Let Z be a Hilbert space, Z₁ a subspace, and Π : Z → Z₁ the orthogonal projection operator. Let a(·,·) be a continuous, symmetric, positive semi-definite bilinear form on Z × Z and ℓ a linear, continuous functional on Z. With K ⊂ Z convex and closed, consider the problem

min_{z∈K} φ(z) := a(z, z) − ℓ(z).   (7.1.58)

Let b(·,·) be another symmetric bilinear form on Z × Z such that

0 ≤ b(z, z) ≤ a(z, z)  ∀ z ∈ Z   (7.1.59)

and

b(z, z) + ‖Πz‖²_Z ≥ β‖z‖²_Z  ∀ z ∈ Z   (7.1.60)

with some β > 0. By

|z|² = b(z, z) + ‖Πz‖²_Z   (7.1.61)

another norm is defined on Z, equivalent to ‖·‖_Z according to the obvious relation

(M + 1)‖z‖²_Z ≥ |z|² ≥ β‖z‖²_Z   (7.1.62)

with M ≥ sup_{z≠0} b(z, z)/‖z‖²_Z.

7.1.11 Lemma. For each v⁰ ∈ Z and

v¹ := arg min_{z∈K} { φ(z) + ‖Πz − Πv⁰‖²_Z }

we have, for all z ∈ K, the inequalities

|v¹ − z|² − |v⁰ − z|² ≤ −‖Πv¹ − Πv⁰‖²_Z + φ(z) − φ(v¹)   (7.1.63)

and

|v¹ − z| ≤ |v⁰ − z| + η(z),   (7.1.64)

where

η(z) = (φ(z) − φ(v¹))^{1/2} if φ(z) > φ(v¹),  η(z) = 0 otherwise.

If, moreover, ‖Πv¹ − Πv⁰‖_Z ≥ δ ≥ η(z), then

|v¹ − z| ≤ |v⁰ − z| + (η²(z) − δ²) / (2|v⁰ − z|).   (7.1.65)

Proof: The relation ‖Πz‖_Z ≤ ‖z‖_Z shows the boundedness of the bilinear form ((Πz, Πw))_Z on Z × Z. Due to the optimality of v¹, we have for all z ∈ K

2a(v¹, z − v¹) − ℓ(z − v¹) + 2((Πv¹ − Πv⁰, Πz − Πv¹))_Z ≥ 0.   (7.1.66)

Taking account of (7.1.61), a simple calculation shows

|v¹ − z|² − |v⁰ − z|² = b(v¹, v¹) − 2b(v¹, z) + 2b(v⁰, z) − b(v⁰, v⁰) − ‖Πv¹ − Πv⁰‖²_Z + 2((Πv¹ − Πv⁰, Πv¹ − Πz))_Z.

Utilizing (7.1.66), (7.1.59) and the simple inequality

2a(v¹, z − v¹) ≤ a(z, z) − a(v¹, v¹),

a straightforward calculation leads to (7.1.63), and (7.1.64) follows immediately. In the case ‖Πv¹ − Πv⁰‖_Z ≥ δ ≥ η(z), estimate (7.1.63) gives

|v¹ − z|² − |v⁰ − z|² ≤ −δ² + η²(z) ≤ 0,

hence

|v¹ − z| ≤ |v⁰ − z|

and

0 ≥ −δ² + η²(z) ≥ (|v¹ − z| + |v⁰ − z|)(|v¹ − z| − |v⁰ − z|) ≥ 2|v⁰ − z|(|v¹ − z| − |v⁰ − z|),

proving (7.1.65).
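Lemma 7.1.11 can be checked numerically in a minimal setting: Z = R², Z₁ = span{e₂} (so Πz = (0, z₂)), K = Z, with a(z, z) = 2z₁² + z₂², b(z, z) = z₁² + 0.5z₂² (hence 0 ≤ b ≤ a and (7.1.60) holds with β = 1), and ℓ(z) = z₁ + z₂. This toy verification of (7.1.63) is illustrative only; all numbers are hypothetical.

```python
import random

def phi(z):                       # phi(z) = a(z,z) - l(z)
    return 2.0 * z[0] ** 2 + z[1] ** 2 - z[0] - z[1]

def tri_norm2(z):                 # |z|^2 = b(z,z) + ||Pz||^2
    return z[0] ** 2 + 0.5 * z[1] ** 2 + z[1] ** 2

v0 = [0.3, -0.7]
# v1 = argmin over R^2 of phi(z) + ||Pz - Pv0||^2; the optimality system decouples:
#   4 z1 - 1 = 0   and   2 z2 - 1 + 2 (z2 - v0[1]) = 0
v1 = [0.25, (1.0 + 2.0 * v0[1]) / 4.0]        # = [0.25, -0.1]

random.seed(1)
for _ in range(1000):
    z = [random.uniform(-3.0, 3.0), random.uniform(-3.0, 3.0)]
    lhs = tri_norm2([v1[0] - z[0], v1[1] - z[1]]) \
        - tri_norm2([v0[0] - z[0], v0[1] - z[1]])
    rhs = -(v1[1] - v0[1]) ** 2 + phi(z) - phi(v1)
    assert lhs <= rhs + 1e-9      # inequality (7.1.63)
```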

We note that in Lemma 7.1.11 the closedness of K is only required to ensure the existence of v¹ and could be replaced by the requirement that v¹ exists.

7.1.12 Theorem. Assume that the sequences {r_k}, {ε_k}, and {δ_k} in Method 7.1.6 satisfy the conditions

sup_k ε_k < 1,  sup_k r_k < 1,  ∑_{k=1}^∞ √r_k < ∞,  ∑_{k=1}^∞ ε_k < ∞   (7.1.67)



and, with d₁, d₂ from (7.1.35), (7.1.36),

(1/(2d₂)) ( d₁ r_k − (δ_k − (3/2)ε_k)² ) + (3√3/2) ε_k < 0,  δ_k > (3/2) ε_k.   (7.1.68)

Then, for arbitrary u⁰ ∈ Uad, Method 7.1.6 is well-defined, in particular i(k) < ∞ for every k, and {u^{k,i}}, {y^{k,i}} converge weakly in L²(Ω), Y to u′, y′, respectively, with (y′, u′) an optimal process for Problem (7.1.19).
For Method 7.1.5, condition (7.1.67) alone is already sufficient for the weak convergence of {(u^k, y^k)} to (u′, y′).

Proof: From (7.1.68), the definition of i(k) in Substep (c) of Method 7.1.6, and Lemma 7.1.9, we conclude

‖ū^{k,i} − u^{k,i−1}‖ ≥ ‖u^{k,i} − u^{k,i−1}‖ − ‖u^{k,i} − ū^{k,i}‖ > δ_k − (3/2)ε_k > 0

for all 1 ≤ i < i(k). Together with inequality (7.1.35) and d₁ r_k < (δ_k − (3/2)ε_k)², which follows from the first inequality in (7.1.68), this implies

‖ū^{k,i} − u^{k,i−1}‖² > J_k(z*) − J_k(z̄^{k,i}).

Let z^{1,0} = (Tu^{1,0}, u^{1,0}). Applying Lemma 7.1.11 (inequality (7.1.65)) with

Z := X,  Z₁ := { z = (y, u) ∈ X : y = 0 },  φ := J_k given by (7.1.20),
a(z, z̃) := ∫_Ω Cy Cỹ dΩ + (1/r_k) b(z, z̃)  ∀ z, z̃ ∈ X,
b(z, z̃) := ∫_Ω (Ay − u)(Aỹ − ũ) dΩ,
ℓ(z) := −J_k(z) + a(z, z),
K := Y × Uad,  δ := δ_k − (3/2)ε_k,  v⁰ := z^{k,i−1},

together with Proposition 7.1.10, we get for 1 ≤ i < i(k)

|z̄^{k,i} − z*| < |z^{k,i−1} − z*| + (1/(2d₂)) ( d₁ r_k − (δ_k − (3/2)ε_k)² )

and (cf. (7.1.64))

|z̄^{k,i(k)} − z*| < |z^{k,i(k)−1} − z*| + √(d₁ r_k).

Utilizing (7.1.31), (7.1.34), and (7.1.68), one obtains for 1 ≤ i < i(k)

|z^{k,i} − z*| − |z^{k,i−1} − z*| < (1/(2d₂)) ( d₁ r_k − (δ_k − (3/2)ε_k)² ) + (3√3/2) ε_k < 0   (7.1.69)

and

|z^{k,i(k)} − z*| − |z^{k,i(k)−1} − z*| < √(d₁ r_k) + (3√3/2) ε_k.   (7.1.70)

Inequality (7.1.69) proves i(k) < ∞, because, as long as ‖u^{k,i} − u^{k,i−1}‖ > δ_k, the reduction of |z^{k,i} − z*| is larger than a nonzero amount independent of i.



Polyak's Lemma A3.1.8 enables us to state that the sequence {|z^{k,i} − z*|} converges if ∑_{k=1}^∞ ε_k < ∞ and ∑_{k=1}^∞ √r_k < ∞. Due to (7.1.31) and (7.1.34), {|z̄^{k,i} − z*|} converges to the same limit.

Let {z^{k_j,i_j}}, with i_j > 0 for each j, be a weakly convergent subsequence in X, and z′ = (y′, u′) its weak limit. Observing (7.1.46), (7.1.48) and the convexity and closedness of {(y, u) : u ∈ Uad, y = Tu} (with T given by (7.1.13)) in X, we conclude that z′ is feasible.

By the definition of J_k and J_k(z*) = J(u*), the inequality

|z̄^{k,i} − z*|² − |z^{k,i−1} − z*|² ≤ J_k(z*) − J_k(z̄^{k,i})

yields

|z̄^{k,i} − z*|² − |z^{k,i−1} − z*|² ≤ J(u*) − ∫_Ω (Cȳ^{k,i} − κ_d)² dΩ.   (7.1.71)

Taking the limit in (7.1.71) for the subsequence {z^{k_j,i_j}}, the weak lower semicontinuity of ∫_Ω (Cy − κ_d)² dΩ on Y leads to

J(u*) ≥ ∫_Ω (Cy′ − κ_d)² dΩ.

Since z′ is feasible, this proves that u′ is optimal for Problem (7.1.19). Finally, Opial's Lemma A1.1.3 proves the weak convergence of both {z^{k,i}} and {z̄^{k,i}} to z′ in X.

According to Remark 7.1.8, Theorem 7.1.12 also establishes the convergence of Method 7.1.5, even without condition (7.1.68), which is fulfilled automatically for sufficiently large δ_k.

2. The case of state constraints

Now let a state constraint (7.1.9),

y(u) ∈ G ⊂ Y,

be given, with G convex, closed, and such that for some ũ ∈ Uad

y(ũ) ∈ int G,   (7.1.72)

as assumed in Problem (7.1.19).

The only relevant modification concerns the proof of Proposition 7.1.10. We show that the inequalities (7.1.48), (7.1.54), and (7.1.55) remain true with modified constants c_j. Then the other parts of the proofs of Proposition 7.1.10 and Theorem 7.1.12 remain unchanged.

In the sequel we give the necessary modifications of the proof of Proposition 7.1.10 between relations (7.1.47) and (7.1.55).

Even though Tū^{k,i} may not satisfy the state constraints, we still have, with ζ^{k,i} := (Tū^{k,i}, ū^{k,i}), that

‖z̄^{k,i} − ζ^{k,i}‖_X < c₃ √r_k.   (7.1.73)

As z̄^{k,i} ∈ G × Uad, this shows that

min_{v∈G×Uad} ‖ζ^{k,i} − v‖_X < c₃ √r_k.   (7.1.74)



With the abbreviation z(u) := (y(u), u), let, with ũ according to (7.1.72),

τ_min := min_{w∈∂G} ‖Tũ − w‖_Y,
τ_max := max_{u∈Uad} ‖z(u) − z(ũ)‖_X,
ω^{k,i} := arg min_{v∈G×Uad} ‖ζ^{k,i} − v‖_X.

In the case Tū^{k,i} ∉ G, ω^{k,i} ∉ { z(ũ) + λ(ζ^{k,i} − z(ũ)) : λ ≥ 0 }, let

h^{k,i} ∈ { z(ũ) + λ(ζ^{k,i} − z(ũ)) : λ ≥ 0 } ∩ (∂G × Uad),
b^{k,i} ∈ { z(ũ) + λ(ζ^{k,i} − ω^{k,i}) : λ ≥ 0 } ∩ { h^{k,i} + µ(h^{k,i} − ω^{k,i}) : µ ≥ 0 }

(h^{k,i} and b^{k,i} are uniquely defined by these relations, see Fig. 7.1.1).

Figure 7.1.1: The points ζ^{k,i}, ω^{k,i}, h^{k,i}, b^{k,i}, and z(ũ) relative to the set G × Uad.

Obviously,

‖ζ^{k,i} − ω^{k,i}‖_X / ‖z(ũ) − b^{k,i}‖_X = ‖ζ^{k,i} − h^{k,i}‖_X / ‖z(ũ) − h^{k,i}‖_X.

Due to the trivial implication

α/β = γ/δ, α/β ≠ −1  ⇒  α/(α + β) = γ/(γ + δ),

we obtain, together with (7.1.73),

‖ζ^{k,i} − h^{k,i}‖_X = ( ‖z(ũ) − ζ^{k,i}‖_X / ( ‖z(ũ) − b^{k,i}‖_X + ‖ζ^{k,i} − ω^{k,i}‖_X ) ) · ‖ζ^{k,i} − ω^{k,i}‖_X
  ≤ (τ_max/τ_min) ‖ζ^{k,i} − ω^{k,i}‖_X ≤ (τ_max/τ_min) c₃ √r_k.   (7.1.75)

If Tū^{k,i} ∉ G but ω^{k,i} ∈ { z(ũ) + λ(ζ^{k,i} − z(ũ)) : λ ≥ 0 }, this estimate holds true, too.

Let ẑ^{k,i} again be a feasible point closest to z̄^{k,i}. Then, because h^{k,i} is feasible, we get from (7.1.73) and (7.1.75)

‖z̄^{k,i} − ẑ^{k,i}‖_X ≤ ‖z̄^{k,i} − h^{k,i}‖_X ≤ ‖z̄^{k,i} − ζ^{k,i}‖_X + ‖ζ^{k,i} − h^{k,i}‖_X < ( τ_max/τ_min + 1 ) c₃ √r_k,   (7.1.76)



corresponding to (7.1.48).

With q^{k,i} and p^{k,i} given by (7.1.37), (7.1.49), we again have, for all (y, u) ∈ G × Uad, the inequality

∫_Ω (p^{k,i} + q^{k,i})(Ay − Aȳ^{k,i}) dΩ + ∫_Ω (u − ū^{k,i})(ū^{k,i} − u^{k,i−1} − q^{k,i}) dΩ ≥ 0.   (7.1.77)

With ũ according to (7.1.72), let w^{k,i} be given by

w^{k,i} = T( ũ − γ₀ q^{k,i}/‖q^{k,i}‖ ),   (7.1.78)

where γ₀ > 0 is chosen so small that w^{k,i} ∈ G for all (k, i). Such a γ₀ exists because the solution of S1(u) (cf. (7.1.11)) depends continuously on u ∈ L²(Ω) (with regard to ‖·‖_Y).

Setting u := ũ and y := w^{k,i}, inequality (7.1.77) leads to

‖q^{k,i}‖ ≤ γ₀⁻¹ [ ‖p^{k,i}‖ ( ‖ũ − ū^{k,i}‖ + γ₀ ) + √r_k ‖p^{k,i}‖ · √r_k ‖q^{k,i}‖ + ‖ũ − ū^{k,i}‖ ‖ū^{k,i} − u^{k,i−1}‖ ],   (7.1.79)

and, using (7.1.47), which is true in this case too, we obtain

‖Aȳ^{k,i} − f − ū^{k,i}‖ < c₅ r_k,
‖z̄^{k,i} − ẑ^{k,i}‖_X < ( τ_max/τ_min + 1 ) c₅ r_k,

analogously to (7.1.54), (7.1.55).

7.1.13 Remark. (On strong convergence)A thorough analysis of the previous results on the convergence of Methods 7.1.5and 7.1.6 enables us to made some additional statements on the convergenceof the sequences yi and yk,i. For Problem (7.1.19), because the operatorA is an isomorphism from H2(Ω) ∩ H1

0 (Ω) onto L2(Ω), weak convergence ofuk,i in L2(Ω) implies the same for yk,i in H2(Ω). Under the assumptionsof Theorem 9.2.11, due to the relations (7.1.48) and (7.1.34), this immediatelyproves convergence of the sequence yk,i to a solution of Problem (7.1.19) inthe norm of the space H1(Ω).

This remains true in the case that there are state constraints with (7.1.72) satisfied for some $\bar u \in U_{ad}$.

Now, let $Q(\cdot) := T(\cdot) - T(0)$, with $T$ defined by (7.1.13). Obviously, $Q$ is a linear operator. Let us assume that the operator $CQ$ has a finite-dimensional kernel $K$ in $L_2(\Omega)$ and that there exists a constant $\alpha > 0$ such that for each $u \in L_2(\Omega)$
\[
\|CQu\| \ge \alpha \|\Pi u\|, \tag{7.1.80}
\]
with $\Pi$ the operator of orthogonal projection onto the orthogonal complement of $K$. Then, under the assumptions of Theorem 9.2.11, the sequence $\{z^{k,i}\}$ (resp. $\{z^k\}$) converges strongly in the space $X$.


7.1. NON-COERCIVE ELLIPTIC CONTROL PROBLEMS 259

Indeed, we have
\begin{align*}
\|Cy(u) - \kappa_d\|^2 &= \|Cy(u) - Cy(0)\|^2 + \|Cy(0) - \kappa_d\|^2 + 2\int_\Omega (Cy(u) - Cy(0))(Cy(0) - \kappa_d)\,d\Omega \\
&= \|CQu\|^2 + 2\int_\Omega (Cy(0) - \kappa_d)\,CQu\,d\Omega + \|Cy(0) - \kappa_d\|^2.
\end{align*}
Now, observing assumption (7.1.80), strong convergence of $\{u^{k,i}\}$ in $L_2(\Omega)$ follows immediately from Lemma 4.1.4. Therefore, the estimates for $\|Ay^{k,i} - f - u^{k,i}\|$ ensure strong convergence of $\{y^{k,i}\}$ in $Y$. Finally, strong convergence of $\{z^{k,i}\}$ in $X$ follows from (7.1.34). ♦

We note that penalty methods in combination with PPR (with respect to the full space) have been applied to convex variational problems, for instance, in [5, 21, 218, 261]. In contrast to the considerations here, in the mentioned contributions Slater's condition with regard to the penalized constraints appears to be essential.

7.1.3 A simple example

To demonstrate the effect of regularization, let us consider the following example:

\[
\min \int_{-1}^{1} (y(0) - 1)^2\,dx \quad \text{s.t.} \quad -y'' = u, \; y(-1) = y(1) = 0,
\]
with
\[
u \in U_{ad} := \left\{v \in L_2(-1,1) : v(x) \le 0 \text{ a.e. on } (0,1),\; \int_{-1}^{0}\!\!\int_{-1}^{x} v(t)\,dt\,dx \ge 0,\; \int_{-1}^{0} v(t)\,dt = 0\right\}.
\]
The set $U_{ad}$ is a convex and closed subset of $L_2(-1,1)$ and the above objective corresponds to the choice $Cy = y(0)$, $\kappa_d \equiv 1$. The operator $C$ maps $H_0^1(-1,1)$ into $L_2(-1,1)$ and its boundedness is a consequence of the continuous embedding of $H_0^1(-1,1)$ into $C[-1,1]$.

For a given $u \in U_{ad}$ the function

\[
y'(x) = y'(-1) - \int_{-1}^{x} u(t)\,dt
\]
is absolutely continuous and
\[
y(x) = y'(-1)(x+1) - \int_{-1}^{x}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi
\]
solves the boundary value problem. Observing the latter formula, due to the boundary conditions, we conclude for $x = 1$ that
\[
y'(-1) = \frac{1}{2}\int_{-1}^{1}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi
\]


and for $x = 0$
\[
y(0) = y'(-1) - \int_{-1}^{0}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi.
\]

Taking into account that $u \in U_{ad}$, we get further
\begin{align*}
y'(-1) &= \frac{1}{2}\int_{-1}^{0}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi + \frac{1}{2}\int_{0}^{1}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi \\
&= \frac{1}{2}\int_{-1}^{0}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi + \frac{1}{2}\int_{0}^{1}\!\!\int_{0}^{\xi} u(t)\,dt\,d\xi \le \frac{1}{2}\int_{-1}^{0}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi.
\end{align*}

Hence, for every $u \in U_{ad}$, we infer
\[
y(0) = y'(-1) - \int_{-1}^{0}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi \le -\frac{1}{2}\int_{-1}^{0}\!\!\int_{-1}^{\xi} u(t)\,dt\,d\xi \le 0,
\]
and
\[
\int_{-1}^{1} (y(0) - 1)^2\,dx \ge 2.
\]

However, it is obvious that the process $(y^{(1)}, u^{(1)}) \equiv (0, 0)$ is feasible for the control problem and has the objective value $\int_{-1}^{1} (y^{(1)}(0) - 1)^2\,dx = 2$, i.e. $(y^{(1)}, u^{(1)})$ is optimal for the problem under consideration. It is easily seen that there are other solutions, in particular,

\[
y^{(2)}(x) := \begin{cases} \frac{1}{4\pi^2}(\cos 2\pi x - 1) & \text{for } x \in [-1, 0],\\ 0 & \text{for } x \in (0, 1], \end{cases} \qquad
u^{(2)}(x) := \begin{cases} \cos 2\pi x & \text{for } x \in [-1, 0],\\ 0 & \text{for } x \in (0, 1], \end{cases}
\]
and
\[
y^{(3)}(x) := \begin{cases} -\frac{x^2}{2} - x - \frac{1}{2} & \text{for } x \in [-1, -\frac{3}{4}],\\[0.5ex] \frac{x^2}{2} + \frac{x}{2} + \frac{1}{16} & \text{for } x \in (-\frac{3}{4}, -\frac{1}{4}],\\[0.5ex] -\frac{x^2}{2} & \text{for } x \in (-\frac{1}{4}, 0],\\[0.5ex] 0 & \text{for } x \in (0, 1], \end{cases} \qquad
u^{(3)}(x) := \begin{cases} +1 & \text{for } x \in [-1, -\frac{3}{4}],\\[0.5ex] -1 & \text{for } x \in (-\frac{3}{4}, -\frac{1}{4}],\\[0.5ex] +1 & \text{for } x \in (-\frac{1}{4}, 0],\\[0.5ex] 0 & \text{for } x \in (0, 1]. \end{cases}
\]

Moreover, it is not difficult to verify that for arbitrary $a \in \mathbb{R}$
\[
(a y^{(2)}, a u^{(2)}) \quad \text{and} \quad (a y^{(3)}, a u^{(3)})
\]
are again solutions, i.e. the optimal set $U^*$ is unbounded. Now, let us consider the penalized problem: minimize, with fixed $r_k > 0$,

\[
J_k(y,u) = \int_{-1}^{1} (y(0)-1)^2\,dx + \frac{1}{r_k}\int_{-1}^{1} (y'' + u)^2\,dx \quad \text{s.t. } y \in Y,\; u \in U_{ad},
\]
where, according to (7.1.14), $Y = H_0^1(-1,1) \cap H^2(-1,1)$.

No solution of this problem (if it is solvable) can satisfy the differentialequation −y′′ = u.


Indeed, the first variation of the functional $J_k$ at $(y,u) \in Y \times U_{ad}$ gives for $\eta \in Y$, $\nu \in U_{ad}$
\begin{align*}
&\frac{d}{d\alpha}\left[\int_{-1}^{1} \big((y(0)+\alpha\eta(0)-1)^2 - (y(0)-1)^2\big)\,dx + \frac{1}{r_k}\int_{-1}^{1} \big((y''+\alpha\eta''+u+\alpha\nu)^2 - (y''+u)^2\big)\,dx\right]\Bigg|_{\alpha=0} \\
&\qquad = 2\left[\int_{-1}^{1} (y(0)-1)\eta(0)\,dx + \frac{1}{r_k}\int_{-1}^{1} (y''+u)(\eta''+\nu)\,dx\right].
\end{align*}

If y′′+u = 0, then (y, u) is a feasible control, hence y(0)−1 ≤ −1, and choosingfor instance η(x) = x2 − 1 and ν(x) ≡ 0, we obtain a non-zero value of the firstvariation.

However, if $(y,u)$ is a solution of the penalized problem, then it is easily seen that for arbitrary $a$ the pairs
\[
(y + a y^{(2)}, u + a u^{(2)}) \quad \text{or} \quad (y + a y^{(3)}, u + a u^{(3)})
\]
are also solutions. Moreover, approximating $u$ by piecewise constant functions and $y$ by Hermitian cubic splines, one can verify that the approximate penalized problem is in general not uniquely solvable. For instance, if on the interval $[-1,1]$ a grid with step-size $h = \frac{1}{4}$ is chosen, then, with $(y_h, u_h)$ a solution of the approximate problem, the pair $(y_h + a y^{(3)}, u_h + a u^{(3)})$ is also a solution for arbitrary $a$ (this is true because the Hermite approximation of the function $y^{(3)}$ coincides with $y^{(3)}$).

Hence, the Hessian of the approximate penalty function is not regular.

In Method 7.1.6 we deal with regularized penalty problems:
\[
\min \Psi_{k,i}(y,u) = \int_{-1}^{1} (y(0)-1)^2\,dx + \frac{1}{r_k}\int_{-1}^{1} (y''+u)^2\,dx + \|u - u^{k,i-1}\|^2_{L_2(-1,1)} \quad \text{s.t. } y \in Y,\; u \in U_{ad},
\]
where the $\Psi_{k,i}$ are strongly convex functions on $X$ (cf. (7.1.16), (7.1.17)). Approximating these problems as mentioned above, we obtain quadratic programming problems with strongly convex objective functions in the corresponding finite-dimensional spaces. Hence, fast convergent methods can be applied for solving these finite-dimensional problems.

7.2 Elliptic Control with Mixed Boundary Conditions

7.2.1 Discretization in state and control space

As considered in the previous Section 7.1, penalization of the state equation permits us to handle the state variable $y$ and the control variable $u$ as independent. On the one hand this is an advantage; on the other hand it complicates the discretization process. For instance, applying finite element methods, one has to use elements of order higher than one (cf. Lasiecka [256]). In the following we renounce the use of penalty techniques. Considering the elliptic boundary value problem as a functional restriction in the process


space, the auxiliary problems in the MSR-method become simpler and the class of elliptic state equations can be widened.

In the previous chapter we did not focus on questions of discretization and numerical realization. In particular, we did not estimate the discretization and computational errors, and the question remained open why the regularizing parameter $\chi_k$ cannot be decreased arbitrarily in the course of switching from one discretization level to the next. Moreover, the stopping parameter $\delta_k$ of the inner prox-cycle in Algorithm 7.1.6 could not be computed because of the unknown constants $d_1$ and $d_2$. These questions will now be answered more thoroughly in this chapter.

Figure 7.2.2: Polygonal domain $\Omega$ (with coordinates $x_1, x_2$, facets $\Gamma_m$, $\Gamma_{m+1}$, outer normal $\nu_{m+1}$, edge $S_m$ and angle $\omega_m$)

Let us consider a convex, two-dimensional polygonal domain $\Omega$ with edges $S_m$, $m \in \mathcal{M} = \{1, \ldots, M\}$ (cf. Figure 7.2.2). It is supposed that the boundary of this domain can be partitioned into two parts: $\partial\Omega = \Gamma_D \cup \Gamma_N$. Each part can be described as a union of a certain number of facets $\Gamma_m$, $m \in \mathcal{M}$, of the polygon:
\[
\Gamma_D = \bigcup_{m \in \mathcal{M}_D} \Gamma_m, \qquad \Gamma_N = \bigcup_{m \in \mathcal{M}_N} \Gamma_m,
\]
with $\mathcal{M}_D \cup \mathcal{M}_N = \mathcal{M}$, $\mathcal{M}_D \ne \emptyset$. The facets and edges of the polygon are numbered corresponding to a positive orientation of the boundary (counter-clockwise). Hereby let $S_m$ be the last edge of $\Gamma_m$. Each edge is characterized by the magnitude of its angle $\omega_m \in (0, \pi]$, $m \in \mathcal{M}$.

On such polygonal domain we study the boundary value problem

−∆y(x) = f(x) in Ω,

∂y

∂νm(x) = gm(x) on Γm, m ∈MN , (7.2.1)

y(x) = 0 on Γm, m ∈MD,

where $f \in L_2(\Omega)$, $g_m \in L_2(\Gamma_m)$. With $s = k + \lambda > 0$, $k \in \mathbb{Z}_+$, $\lambda \in [0,1)$ and $M \subset \mathbb{R}^n$ a measurable set, $\operatorname{meas} M > 0$, we introduce the following norms (cf. [148] and [108]):


\[
\|u\|_{s,M} = \|u\|_{W^k(M)} = \left(\sum_{|\alpha| \le k} \int_M |D^\alpha u(x)|^2\,dx\right)^{1/2}, \quad \text{if } \lambda = 0,
\]
otherwise
\[
\|u\|_{s,M} = \left(\|u\|^2_{W^k(M)} + \sum_{|\alpha| = k} \iint_{M \times M} \frac{|D^\alpha u(x) - D^\alpha u(y)|^2}{|x-y|^{n+2\lambda}}\,dx\,dy\right)^{1/2},
\]
and
\[
\|f\|_{-s,M} = \sup_{\phi \in C^\infty(M) \setminus \{0\}} \frac{\langle f, \phi\rangle}{\|\phi\|_{s,M}}.
\]

In order to allow a weak formulation of Problem (7.2.1) we introduce thefollowing Hilbert space:

Y = y ∈W 1(Ω) : γ(m)0 y = 0 in H1/2(Γm), m ∈MD, (7.2.2)

which is considered in the sequel as the state space. The trace operators γ(m)0

in (7.2.2) correspond to those facets of the polygon which are associated withthe homogenous boundary conditions. Now, the variational problem consistsin finding an element y ∈ Y which satisfies for arbitrary y ∈ Y the followingequation: ∑

j=1,2

⟨Dxjy,Dxj y

⟩Ω

= 〈f, y〉Ω +∑

m∈MN

〈gm, γ(m)0 y〉Γm . (7.2.3)

Because Green's formula holds for bounded Lipschitz domains [148], each "classical" solution of (7.2.1) satisfies the variational equation (7.2.3). Equation (7.2.3) can be equivalently reformulated in the operator form
\[
-\Delta y = f + \sum_{m \in \mathcal{M}_N} g_m \delta_{\Gamma_m} \quad \text{in } \mathcal{M}', \tag{7.2.4}
\]
defining a generalized Laplace operator and the so-called delta layers [19]. It should be mentioned that the operator form (7.2.4) of the elliptic equation is not equivalent to the classical boundary value problem (7.2.1). In (7.2.1) it is supposed that all functions involved are measurable, whereas in (7.2.4) one deals with generalized distributions. The elements $f$ and $g_m$ are now permitted to lie outside of the space $L_2$. Thus, one has to find additional regularity conditions on the elements $g_m$, $m \in \mathcal{M}_N$, which guarantee the existence of a classical solution $y \in Y \cap W^2(\Omega)$ of (7.2.4) (cf. [148, Chap. 4]).

To each edge $S_m$ on the oriented boundary $\partial\Omega$ a periodic vector function $x_m \in [C^{0,1}(-\infty,\infty)]^2$ is assigned. Hereby it is agreed that the vector $x_m(s)$ circulates infinitely often along the boundary $\partial\Omega$, whereas the parameter $s$, defining the sign-oriented distance from the point $S_m$ along the curve $\partial\Omega$, varies from $-\infty$ to $\infty$. Now, for each edge $S_m$ we construct the following (bi)quadratic functionals on $L_2(\Gamma_m) \times L_2(\Gamma_{m+1})$:


(A) If $m \in \mathcal{M}_N$ and $(m+1) \in \mathcal{M}_D$, then
\[
T_m(g_m) = \int_0^{c_m} \frac{|g_m(x_m(-s))|^2}{s}\,ds;
\]
(B) if $m \in \mathcal{M}_D$ and $(m+1) \in \mathcal{M}_N$, then
\[
T_m(g_{m+1}) = \int_0^{c_m} \frac{|g_{m+1}(x_m(s))|^2}{s}\,ds;
\]
(C) if $m \in \mathcal{M}_N$ and $(m+1) \in \mathcal{M}_N$, then
\[
T_m(g_m, g_{m+1}) = \int_0^{c_m} \frac{|g_{m+1}(x_m(s)) - g_m(x_m(-s))|^2}{s}\,ds,
\]
with $c_m = \min[\operatorname{meas} \Gamma_m, \operatorname{meas} \Gamma_{m+1}]$. (Index $M+1$ is identified with index $1$, where $M$ denotes the number of facets of the polygon.) According to the classification of the edges introduced above we can determine corresponding subsets of index sets $\mathcal{M}_A \subset \mathcal{M}$, $\mathcal{M}_B \subset \mathcal{M}$ and $\mathcal{M}_C \subset \mathcal{M}$.

7.2.1 Theorem. Assume the system of elements $g_m$, $m \in \mathcal{M}_N$, in (7.2.4) satisfies the following conditions:

• $g_m \in H^{1/2}(\Gamma_m)$, $m \in \mathcal{M}_N$;

• $T_m(g_m) < \infty$, if $\omega_m = \pi/2$ and $m \in \mathcal{M}_A$;

• $T_m(g_{m+1}) < \infty$, if $\omega_m = \pi/2$ and $m \in \mathcal{M}_B$;

• $T_m(g_m, g_{m+1}) < \infty$, $m \in \mathcal{M}_C$;

and $\omega_m \le \pi/2$, $m \in \mathcal{M}_A \cup \mathcal{M}_B$.
Then the solution of the state equation (7.2.4) belongs to the space $W^2(\Omega)$.

The next two theorems describe some constants cF and cγ which are importantfor the estimates needed afterwards.

7.2.2 Theorem. For each element $y \in Y$ with $\mathcal{M}_D \ne \emptyset$ there exists a positive constant $c_F$ (independent of $y$) such that
\[
\|y\|_\Omega \le c_F \sqrt{\sum_{j=1,2} \|D_{x_j} y\|^2_\Omega}. \tag{7.2.5}
\]

7.2.3 Theorem. For each element $y \in W^1(\Omega)$ there exists a positive constant $c_\gamma$ (independent of $y$) such that
\[
\|\gamma_0 y\|_{\partial\Omega} \le c_\gamma \|y\|_{1,\Omega}. \tag{7.2.6}
\]


The formulas for the determination of $c_F$ and $c_\gamma$ by means of the geometric parameters of the polygon can be found in [357]. The corresponding proofs of Theorems 7.2.2 and 7.2.3 can be transferred to the $n$-dimensional case.

Only rarely does one succeed in obtaining a closed form of the solution of (7.2.4). Hence, in most cases approximate solutions are the only way to obtain information about the exact solution.

According to the finite element technique (see, for instance, [286], [291], [382]), we start with a triangulation of the polygon. In fact, we consider a sequence of nested triangulations. Each grid gets a number $k = 1, 2, \ldots$ and we determine the sets of all nodes $K^k_j$, $j \in \mathcal{J}^k$, and of all sub-domains $\Omega^k_j$, $j \in \mathcal{I}^k$, on the $k$-th level of discretization. The discretization parameter of the grid is then determined by
\[
h_k = \max_{j \in \mathcal{I}^k} \operatorname{diam}(\Omega^k_j),
\]
characterizing the coarseness of the triangulation. We single out a subset of indices $\mathcal{J}^k_0 \subset \mathcal{J}^k$ corresponding to those nodes which are not located on the Dirichlet boundary $\Gamma_D$. Dealing with a fixed level of discretization, we will omit the index $k$. To each node $K_j$, $j \in \mathcal{J}_0$, a "hat" function $\varphi_j(x)$ is assigned (cf. [135], [286], [382]) and the space of inner approximations can be defined (cf. [135]):
\[
Y_h = \Big\{y_h \in Y : y_h(x) = \sum_{j \in \mathcal{J}_0} y_j \varphi_j(x), \; y_j \in \mathbb{R}\Big\}.
\]

In this way we get the discretized minimization problem
\[
\min\{F(y_h) : y_h \in Y_h\}, \tag{7.2.7}
\]
with
\[
F(y_h) = \frac{1}{2}\sum_{j=1,2} \|D_{x_j} y_h\|^2_\Omega - \langle f, y_h\rangle_\Omega - \sum_{m \in \mathcal{M}_N} \langle g_m, \gamma_0^{(m)} y_h\rangle_{\Gamma_m}.
\]

Naturally the question arises how well the solution of Problem (7.2.7) approximates the corresponding exact solution of Problem (7.2.4). The known results in [286, 382] deal only with boundary conditions of the first, second and third type, but not with mixed boundary problems. So we are going to estimate these solutions in our case.

Let $y \in Y$ be the exact solution of Problem (7.2.4) and $y_h \in Y_h$ the exact solution of the discretized Problem (7.2.7); then it holds
\begin{align}
\|y - y_h\|_\Omega \le C_0 h^2 \Big(&\|f\|_\Omega + \sum_{m \in \mathcal{M}_N} \|g_m\|_{1/2,\Gamma_m} \tag{7.2.8}\\
&+ \sum_{m \in \mathcal{M}_A} T_m^{1/2}(g_m) + \sum_{m \in \mathcal{M}_B} T_m^{1/2}(g_{m+1}) + \sum_{m \in \mathcal{M}_C} T_m^{1/2}(g_m, g_{m+1})\Big). \nonumber
\end{align}

To study the behavior of the MSR-method we consider the following specialcase:


\[
\min\Big\{J(y,u) := \frac{1}{2}\|y\|^2_{\Omega'} : (y,u) \in Y \times U_{ad}\Big\} \tag{7.2.9}
\]
subject to
\[
-\Delta y = u + f + \sum_{m \in \mathcal{M}_N} g_m \delta_{\Gamma_m} \quad \text{in } Y',
\]

where the state space $Y$ is defined by (7.2.2). For the control space and the set of admissible controls we choose
\[
U = L_2(\Omega), \qquad U_{ad} = \{u \in U : \xi_0(x) \le u(x) \le \xi_1(x) \text{ a.e. in } \Omega\},
\]
with
\[
\xi_0 \in W^1(\Omega) \cap L_\infty(\Omega), \qquad \xi_1 \in W^1(\Omega) \cap L_\infty(\Omega).
\]
The domain $\Omega'$ can be chosen as an open measurable sub-domain of $\Omega$, keeping in mind that the smaller the sub-domain $\Omega'$, the more restricted is the space of observations $K = L_2(\Omega')$. From the previous chapter we learned that the absence of quadratic control costs in the objective functional causes the ill-posedness of Problem (7.2.9).

In the framework of this chapter we investigate the well-known chattering regimes [419, 420]. Our goal is to observe ill-posed behavior for problems with distributed controls on a two-dimensional polygon. Such problems correlate with problems considered in [176], Part II, where inhomogeneous boundary conditions of Neumann type are considered. Because the behavior of the solution is complicated anyway, we restrict ourselves to the study of the case where in (7.2.9) no state constraints are given: $Z_{ad} = Y \times U_{ad}$. In [356] the existence of a solution of a general problem is proved.

Let us consider the solution set $Z^*$ in the space $Z = Y \times U$, equipped with the norm
\[
\|z\|_Z = \sqrt{\|y\|^2_{1,\Omega} + \|u\|^2_\Omega}. \tag{7.2.10}
\]
Because of the smallness of the observation space one cannot expect a unique solution of (7.2.9).

In the spirit of Subsection 2.2.3 we construct the MSR-algorithm for solving Problem (7.2.9) in such a way that a small deviation of the approximate solution $\bar z^{k,i}$ from the exact solution $z^{k,i}$ of the auxiliary problem
\[
\min\Big\{\Psi_{k,i}(y,u) = \frac{1}{2}\|y\|^2_{\Omega'} + \frac{\chi_k}{2}\|u - \bar u^{k,i-1}\|^2_\Omega : (y,u) \in Z_{ad}\Big\} \tag{7.2.11}
\]
subject to
\[
-\Delta y = u + f + \sum_{m \in \mathcal{M}_N} g_m \delta_{\Gamma_m} \quad \text{in } Y'
\]
is permitted in the weaker norm (7.2.10), i.e., we are looking for an approximate solution of the regularized auxiliary Problem (7.2.11).

Next we explain in detail the components of the error arising in solving the auxiliary Problem (7.2.11). Altogether we are dealing with two components. We start with the investigation of the discretization error. To this end we need an inner approximation of $U$ and $U_{ad}$, analogously to the approximation of the state space $Y$. To each elementary triangle $\Omega_j$ a piecewise constant basis function


$\psi_j(x)$, $j \in \mathcal{I}$, is assigned (cf. [108], [286], [382]). Then the inner approximation is defined by
\[
U_h = \Big\{u_h \in L_2(\Omega) : u_h(x) = \sum_{j \in \mathcal{I}} u_j \psi_j(x), \; u_j \in \mathbb{R}\Big\}.
\]

To describe the inner approximation of the convex set $U_{ad}$ via [135], we introduce the following notation for the mean value of an arbitrary function $\psi \in L_2(\Omega)$ on the elementary triangle $\Omega_j$:
\[
M_j(\psi) = \frac{1}{\operatorname{meas} \Omega_j} \int_{\Omega_j} \psi\,dx. \tag{7.2.12}
\]
Then the set of inner approximations reads:
\[
U_h^{ad} = \{u_h \in U_h : M_j(\xi_0) \le u_h(x) \le M_j(\xi_1) \text{ a.e. in } \Omega_j, \; \forall j \in \mathcal{I}\}.
\]

It should be noted that, to simplify the notation, all approximated sets $Y_h$, $U_h$, $U_h^{ad}$ are constructed with the same discretization parameter $h_k$, whereas in [108] two different parameters – one for the state and one for the control functions – are used.

Let $z_h^{k,i} = (y_h^{k,i}, u_h^{k,i})$ be the exact solution of the discretized auxiliary problem
\[
\min\{\Psi_{k,i}(y_h, u_h) : (y_h, u_h) \in Y_h \times U_h^{ad}\} \tag{7.2.13}
\]
subject to
\[
\sum_{j=1,2} \langle D_{x_j} y_h, D_{x_j} \bar y_h\rangle_\Omega = \langle u_h + f, \bar y_h\rangle_\Omega + \sum_{m \in \mathcal{M}_N} \langle g_m, \gamma_0^{(m)} \bar y_h\rangle_{\Gamma_m} \quad \forall \bar y_h \in Y_h,
\]
where the state equation is described in variational form. Problem (7.2.13) is a finite-dimensional quadratic minimization problem, which can be solved by standard optimization algorithms (cf. Section A3.4). These algorithms also have an error, such that the solution $z_h^{k,i}$ can never be computed exactly. Thus, we have to deal with an inexact solution $(\bar y^{k,i}, \bar u^{k,i})$ of the auxiliary Problem (7.2.13). The MSR-method can now be described as follows:

7.2.4 Algorithm. (MSR-method)
Data: $\bar u^{0,0} \in U$, $\eta_k \downarrow 0$, $\delta_k \downarrow \delta$;

S0: Set $i(0) := 0$, $k := 1$;

S1: (a) set $\bar u^{k,0} := \bar u^{k-1,i(k-1)}$, $i := 0$;

(b) given $\bar u^{k,i} \in U$, set $i := i+1$; compute an approximation $\bar z^{k,i} = (\bar y^{k,i}, \bar u^{k,i}) \in Y_h \times U_h^{ad}$ of the exact solution $z_h^{k,i} = (y_h^{k,i}, u_h^{k,i}) \in Y_h \times U_h^{ad}$ of the discretized auxiliary Problem (7.2.13), such that
\[
\Psi_{k,i}(\bar y^{k,i}, \bar u^{k,i}) - \Psi_{k,i}(y_h^{k,i}, u_h^{k,i}) \le \eta_k, \tag{7.2.14}
\]
and
\[
\sum_{j=1,2} \langle D_{x_j} \bar y^{k,i}, D_{x_j} \bar y_h\rangle_\Omega = \langle \bar u^{k,i} + f, \bar y_h\rangle_\Omega + \sum_{m \in \mathcal{M}_N} \langle g_m, \gamma_0^{(m)} \bar y_h\rangle_{\Gamma_m}, \quad \forall \bar y_h \in Y_h;
\]
(c) if $\|\bar u^{k,i} - \bar u^{k,i-1}\|_\Omega \ge \delta_k$, go to (b); otherwise set $i(k) := i$, $k := k+1$ and go to (a).

The control parameter $\eta_k$ characterizes the computational error. The assumption that the discretized state equation can be solved exactly is acceptable because the state equation itself is well-posed and can therefore be solved with high accuracy.
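The two-loop structure of Algorithm 7.2.4 can be sketched as follows. In this illustrative Python skeleton, `aux_solve` is a hypothetical stand-in for the discretized auxiliary solve (7.2.13); here it is the exact prox step for a toy singular quadratic $\tfrac12 u^\top H u$, not the actual finite element problem:

```python
import numpy as np

# Schematic two-loop skeleton of the MSR-method 7.2.4 (illustrative names).
rng = np.random.default_rng(1)
B = rng.standard_normal((2, 4))
H = B.T @ B                                    # singular "objective" Hessian

def aux_solve(u_prev, chi):
    # argmin_u 0.5 u^T H u + (chi/2) ||u - u_prev||^2
    return np.linalg.solve(H + chi * np.eye(4), chi * u_prev)

def msr(u0, chi, delta, K):
    u, counts = u0, []
    for k in range(K):
        i = 0
        while True:                            # inner prox cycle, step S1(b)
            i += 1
            u_new = aux_solve(u, chi[k])
            done = np.linalg.norm(u_new - u) < delta[k]
            u = u_new
            if done:                           # stopping test S1(c)
                break
        counts.append(i)
    return u, counts

u_star, counts = msr(rng.standard_normal(4), [1.0] * 5, [1e-8] * 5, K=5)
assert np.linalg.norm(H @ u_star) < 1e-6       # limit minimizes 0.5 u^T H u
```

The stopping test itself yields the optimality residual: from $(H + \chi I)u_{\text{new}} = \chi u$ one gets $\|H u_{\text{new}}\| = \chi\|u - u_{\text{new}}\| < \delta_k$, mirroring the role of $\delta_k$ in the convergence analysis below.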

7.2.2 Convergence of inexact MSR-methods

The next goal is to prove the convergence of the MSR-method 7.2.4 (see Theorem 7.2.12 below). We start with the estimation of the computational error by means of the controlling parameters. In [357] the following lemma is proved:

7.2.5 Lemma. For the approximate solutions introduced in (7.2.14) and (7.2.13), respectively, it holds:
\[
\|\bar y^{k,i} - y_h^{k,i}\|_{1,\Omega} \le \sqrt{1 + c_F^2}\,\sqrt{\frac{2\eta_k}{\chi_k}}, \qquad \|\bar u^{k,i} - u_h^{k,i}\|_\Omega \le \sqrt{\frac{2\eta_k}{\chi_k}}. \tag{7.2.15}
\]

The next question we have to answer is how well the auxiliary Problem (7.2.13) approximates Problem (7.2.11). We fix the indices $k$ and $i$ in the MSR-method and will sometimes drop them if convenient. It should be noted that the proof techniques are similar to those in [108]. The state equations in (7.2.11) and (7.2.13) correspond to two affine mappings $y(u)$ and $y_h(u_h)$, satisfying (due to the first-order optimality conditions for Problems (7.2.11) and (7.2.13)) the following variational inequalities, correspondingly:

\[
\langle y(u), y(v) - y(u)\rangle_{\Omega'} + \chi_k \langle u - \bar u^{k,i-1}, v - u\rangle_\Omega \ge 0, \quad \forall v \in U_{ad}, \tag{7.2.16}
\]
\[
\langle y_h(u_h), y_h(v_h) - y_h(u_h)\rangle_{\Omega'} + \chi_k \langle u_h - \bar u^{k,i-1}, v_h - u_h\rangle_\Omega \ge 0, \quad \forall v_h \in U_h^{ad}. \tag{7.2.17}
\]
Adding up (7.2.16) and (7.2.17), we get
\[
a_y(v, v_h) + \chi_k a_u(v, v_h) \ge 0, \quad \forall v \in U_{ad}, \; \forall v_h \in U_h^{ad}, \tag{7.2.18}
\]

with
\begin{align}
a_u &= \langle u - \bar u^{k,i-1}, v - u\rangle_\Omega + \langle u_h - \bar u^{k,i-1}, v_h - u_h\rangle_\Omega \nonumber\\
&= -\|u - u_h\|^2_\Omega + \langle u, v - u_h\rangle_\Omega + \langle u, v_h - u\rangle_\Omega + \langle u_h - u, v_h - u\rangle_\Omega - \langle \bar u^{k,i-1}, (v - u_h) + (v_h - u)\rangle_\Omega \tag{7.2.19}
\end{align}
and
\begin{align}
a_y &= \langle y(u), y(v) - y(u)\rangle_{\Omega'} + \langle y_h(u_h), y_h(v_h) - y_h(u_h)\rangle_{\Omega'} \nonumber\\
&= -\|y(u) - y_h(u_h)\|^2_{\Omega'} + \langle y(u), (y(v) - y(u_h)) + (y(v_h) - y(u))\rangle_{\Omega'} \nonumber\\
&\quad + \langle y(u), (y(u_h) - y_h(u_h)) + (y_h(v_h) - y(v_h))\rangle_{\Omega'} \nonumber\\
&\quad + \langle y_h(u_h) - y(u), (y_h(v_h) - y(v_h)) + (y(v_h) - y(u))\rangle_{\Omega'}. \tag{7.2.20}
\end{align}


From (7.2.18) we infer
\begin{align}
\chi_k&\|u - u_h\|^2_\Omega + \|y(u) - y_h(u_h)\|^2_{\Omega'} \nonumber\\
&\le \chi_k \big(\langle \bar u^{k,i-1}, (v - u_h) + (v_h - u)\rangle_\Omega + \langle u, v - u_h\rangle_\Omega + \langle u, v_h - u\rangle_\Omega + \langle u_h - u, v_h - u\rangle_\Omega\big) \nonumber\\
&\quad + \langle y(u), (y(v) - y(u_h)) + (y(v_h) - y(u))\rangle_{\Omega'} \tag{7.2.21}\\
&\quad + \langle y(u), (y(u_h) - y_h(u_h)) + (y_h(v_h) - y(v_h))\rangle_{\Omega'} \nonumber\\
&\quad + \langle y_h(u_h) - y(u), (y_h(v_h) - y(v_h)) + (y(v_h) - y(u))\rangle_{\Omega'}. \nonumber
\end{align}

In Lemma 7.2.11 inequality (7.2.21) is basic for estimating the discretization error in the norm (7.2.10). Moreover, the terms in (7.2.21) require a computation of the approximation error of the control variables in the $L_2$-norm, which is only possible if an upper bound for $\|u\|_{1,\Omega_j}$, $j \in \mathcal{I}$, is known (cf. [286, 382]). These computations can be done via Lemma 7.2.6, which is a generalization of Lemma 2 in [108] to Problem (7.2.11). In particular, this requires that the prox-point $\bar u^{k,i-1}$ is discretized on the same or on the previous coarser grid. Hence, it is important that the grids are exactly nested one into the other, see Figure 7.2.3.

Figure 7.2.3: Nested triangulations (mesh sizes $h_k$ and $h_{k+1}$)

7.2.6 Lemma. If the function $\bar u^{k,i-1}$ belongs to the space $\oplus_{j \in \mathcal{I}} W^1(\Omega_j)$, then with (7.2.24) it holds¹
\[
\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j} \le \frac{2c_y^2(1 + c_F^2)}{\chi_k^2} + 2\|\bar u^{k,i-1}\|^2_\Omega + \|\xi_0\|^2_{1,\Omega} + \|\xi_1\|^2_{1,\Omega}.
\]

Proof: Regularity of the solution $u$ of the auxiliary Problem (7.2.11) can be shown by standard techniques (see Lions [269]). The function $u$ is continuous and

¹The sets $W^1(\Omega_1), W^1(\Omega_2), \ldots$ can be considered as subspaces of $U$ if the corresponding functions are extended by zero to the whole domain $\Omega$.


its graph consists of tree parts:

u(x) =

uk,i−1(x)− p(x)χk

if ξ0(x) ≤ uk,i−1(x)− p(x)χk≤ ξ1(x),

ξ0(x) if uk,i−1(x)− p(x)χk

< ξ0(x),

ξ1(x) if uk,i−1(x)− p(x)χk

> ξ1(x),

(7.2.22)

where $p(x)$ denotes the solution of the adjoint state equation. Because for Hilbert spaces the bidual space is the original space, i.e. $Y'' = Y$, and the generalized Laplace operator is self-adjoint, i.e. $(-\Delta)^* = -\Delta$ in $\mathcal{L}(Y, Y')$, the function $p(x)$ can be considered as a classical solution of the Poisson problem with mixed boundary conditions
\begin{align}
\Delta p(x) &= y(x) \quad \text{in } \Omega, \nonumber\\
\frac{\partial p}{\partial \nu_m}(x) &= 0 \quad \text{on } \Gamma_m, \; m \in \mathcal{M}_N, \tag{7.2.23}\\
p(x) &= 0 \quad \text{on } \Gamma_m, \; m \in \mathcal{M}_D. \nonumber
\end{align}

Obviously, by Theorem 7.2.1 it holds that $p \in Y \cap W^2(\Omega)$. Analogously to [108], the domain $\Omega$ can be decomposed according to (7.2.22) into three parts. We carry out this decomposition locally on each $\Omega_j$, $j \in \mathcal{I}$:
\[
\|u\|^2_{1,\Omega_j} \le \left\|\bar u^{k,i-1} - \frac{p}{\chi_k}\right\|^2_{1,\Omega_j} + \|\xi_0\|^2_{1,\Omega_j} + \|\xi_1\|^2_{1,\Omega_j}.
\]

Dealing with nested grids, the values $\|\bar u^{k,i-1}\|_{1,\Omega_j}$, $j \in \mathcal{I}$, exist, and because the distributed control is discretized by means of piecewise constant basis functions it holds $\|\bar u^{k,i-1}\|_{1,\Omega_j} = \|\bar u^{k,i-1}\|_{\Omega_j}$.

The estimate for the adjoint state function $p(x)$ can be taken from Lemma 7.2.7 below; hence the proof is complete.

7.2.7 Lemma. [357]
The functions $y(x)$ and $p(x)$ are bounded and it holds
\[
\|y\|_\Omega \le c_y, \qquad \|p\|_{1,\Omega} \le c_y \sqrt{1 + c_F^2},
\]
with
\[
c_y = c_F^2 \Big(\|f\|_\Omega + \sup_{v \in U_{ad}} \|v\|_\Omega + c_\gamma \sqrt{1 + c_F^{-2}} \sum_{m \in \mathcal{M}_N} \|g_m\|_{\Gamma_m}\Big). \tag{7.2.24}
\]

The next three lemmata correspond to Lemmata 3, 4, and 5 in [108]. These results are classical tools for analyzing inner approximations of convex sets in $L_2$-spaces. Because one deals only with the existence of certain constants, a single majorizing constant $C$ is used.

7.2.8 Lemma. Let $u$ be the solution of Problem (7.2.11) and let $v_h$ be given by
\[
v_h(x) = \sum_{j \in \mathcal{I}} M_j(u)\,\psi_j(x),
\]
with $M_j(\cdot)$ via (7.2.12). Then $v_h \in U_h^{ad}$ and there exists a constant $C$, independent of $u$ and $h$, such that
\begin{align}
\|u - v_h\|_\Omega &\le C h \sqrt{\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j}}, \tag{7.2.25}\\
\|u - v_h\|_{-1,\Omega} &\le C h^2 \sqrt{\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j}}, \tag{7.2.26}\\
\sqrt{\sum_{j \in \mathcal{I}} \|u - v_h\|^2_{-1,\Omega_j}} &\le C h^2 \sqrt{\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j}}. \tag{7.2.27}
\end{align}

7.2.9 Lemma. Let $v_h$ be an arbitrary element of $U_h^{ad}$; then there exists a function $v^* \in U_{ad} \cap \oplus_{j \in \mathcal{I}} W^1(\Omega_j)$ such that
\[
M_j(v^*) = M_j(v_h), \quad \forall j \in \mathcal{I},
\]
and
\[
\|v^*\|^2_{1,\Omega_j} \le \|\xi_0\|^2_{1,\Omega_j} + \|\xi_1\|^2_{1,\Omega_j}, \quad \forall j \in \mathcal{I}.
\]

7.2.10 Lemma. Let $u_h$ be the solution of Problem (7.2.13) and let $v$ correspond to $v^*$ in Lemma 7.2.9. Then there exists a constant $C$, independent of $\xi_0$, $\xi_1$ and $h$, such that
\begin{align}
\|u_h - v\|_{-1,\Omega} &\le C h^2 \sqrt{\|\xi_0\|^2_{1,\Omega} + \|\xi_1\|^2_{1,\Omega}}, \tag{7.2.28}\\
\sqrt{\sum_{j \in \mathcal{I}} \|u_h - v\|^2_{-1,\Omega_j}} &\le C h^2 \sqrt{\|\xi_0\|^2_{1,\Omega} + \|\xi_1\|^2_{1,\Omega}}. \tag{7.2.29}
\end{align}

The following lemma makes use of the previous Lemmata 7.2.8, 7.2.9 and 7.2.10.

7.2.11 Lemma. At each iteration of the MSR-method 7.2.4 the following relations are true:
\[
\|u^{k,i} - \bar u^{k,i}\|_\Omega \le \varepsilon_k, \tag{7.2.30}
\]
with
\[
\varepsilon_k = \sqrt{\frac{2 o_k}{\chi_k}} + \sqrt{\frac{2 \eta_k}{\chi_k}}, \tag{7.2.31}
\]


and
\begin{align}
o_k = \max_{1 \le i \le i(k)} \Big\{ & C \chi_k h_k^2 \Big(\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C_1\Big) \nonumber\\
&+ C \chi_k h_k^2 \Big(\|\bar u^{k,i-1}\|_\Omega + \sqrt{\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C_1} + \frac{C_2}{\chi_k}\Big) \nonumber\\
&\qquad \times \Big(\sqrt{C_1} + \sqrt{\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C_1}\Big) \tag{7.2.32}\\
&+ C h_k^4 \Big(\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C_1\Big) + \big(C_3 h_k^2\big)^2 + C_4 h_k^2 \Big\}. \nonumber
\end{align}
The constants $C, C_1, \ldots, C_4$ are independent of $\|\bar u^{k,i-1}\|_\Omega$, $\chi_k$, $\eta_k$ and $h_k$.

Proof: We have to analyze the size of the error (7.2.30) in dependence on the given approximation $\bar u^{k,i-1}$ and on the controlling parameters. Therefore, we continue with the investigation of inequality (7.2.21). Following the proof technique in [108], the first term on the right-hand side of (7.2.21) can be estimated via the generalized Cauchy–Schwarz inequality. For instance, it holds for any $j \in \mathcal{I}$
\[
\langle \bar u^{k,i-1}, v - u_h\rangle_{\Omega_j} \le \|\bar u^{k,i-1}\|_{1,\Omega_j} \|v - u_h\|_{-1,\Omega_j}.
\]

Adding up such inequalities, we get from (7.2.21) the following relation:
\begin{align}
\chi_k&\|u - u_h\|^2_\Omega + \|y(u) - y_h(u_h)\|^2_{\Omega'} \nonumber\\
&\le \chi_k \sum_{j \in \mathcal{I}} \big(\|\bar u^{k,i-1}\|_{1,\Omega_j} \|v - u_h\|_{-1,\Omega_j} + \|\bar u^{k,i-1}\|_{1,\Omega_j} \|v_h - u\|_{-1,\Omega_j}\big) \nonumber\\
&\quad + \chi_k \big(\langle u, v - u_h\rangle_\Omega + \langle u, v_h - u\rangle_\Omega + \langle u_h - u, v_h - u\rangle_\Omega\big) \nonumber\\
&\quad + \langle y(u), (y(v) - y(u_h)) + (y(v_h) - y(u))\rangle_{\Omega'} \tag{7.2.33}\\
&\quad + \langle y(u), (y(u_h) - y_h(u_h)) + (y_h(v_h) - y(v_h))\rangle_{\Omega'} \nonumber\\
&\quad + \langle y_h(u_h) - y(u), (y_h(v_h) - y(v_h)) + (y(v_h) - y(u))\rangle_{\Omega'}. \nonumber
\end{align}

The following estimates hold for the affine mapping $y(\cdot) : Y' \to Y$ for all $v \in U$ and all $u_h \in U_h$ (cf. [108], page 44):
\[
\|y(v) - y(u_h)\|_{1,\Omega} \le (1 + c_F^2)\|v - u_h\|_{-1,\Omega}, \qquad \|y(v_h) - y(u)\|_{1,\Omega} \le (1 + c_F^2)\|v_h - u\|_{-1,\Omega}.
\]

Using the inequalities
\[
2\langle u_h - u, v_h - u\rangle_\Omega \le \|u_h - u\|^2_\Omega + \|v_h - u\|^2_\Omega
\]
and
\[
2\langle y_h(u_h) - y(u), (y_h(v_h) - y(v_h)) + (y(v_h) - y(u))\rangle_{\Omega'} \le 2\|y_h(u_h) - y(u)\|^2_{\Omega'} + \|y_h(v_h) - y(v_h)\|^2_{\Omega'} + \|y(v_h) - y(u)\|^2_{\Omega'},
\]


we obtain from (7.2.33)
\begin{align}
\chi_k &\|u - u_h\|^2_\Omega + \|y(u) - y_h(u_h)\|^2_{\Omega'} \nonumber\\
&\le \chi_k \sum_{j \in \mathcal{I}} \|\bar u^{k,i-1}\|_{\Omega_j} \big(\|v - u_h\|_{-1,\Omega_j} + \|v_h - u\|_{-1,\Omega_j}\big) \nonumber\\
&\quad + \chi_k \sum_{j \in \mathcal{I}} \|u\|_{1,\Omega_j} \big(\|v - u_h\|_{-1,\Omega_j} + \|v_h - u\|_{-1,\Omega_j}\big) \nonumber\\
&\quad + \frac{\chi_k}{2}\big(\|u_h - u\|^2_\Omega + \|v_h - u\|^2_\Omega\big) \nonumber\\
&\quad + \|y(u)\|_\Omega \big(\|y(v) - y(u_h)\|_\Omega + \|y(v_h) - y(u)\|_\Omega\big) \tag{7.2.34}\\
&\quad + \|y_h(u_h) - y(u)\|^2_{\Omega'} + \frac{1}{2}\|y_h(v_h) - y(v_h)\|^2_{\Omega'} + \frac{1}{2}\|y(v_h) - y(u)\|^2_{\Omega'} \nonumber\\
&\quad + \langle y(u), (y(u_h) - y_h(u_h)) + (y_h(v_h) - y(v_h))\rangle_{\Omega'}. \nonumber
\end{align}

Now we have to bear in mind that the values $\|\bar u^{k,i-1}\|_{\Omega_j}$ and $\|\bar u^{k,i-1}\|_{1,\Omega_j}$ coincide. The first and second terms on the right-hand side of (7.2.34) allow us to apply the Cauchy–Schwarz inequality for finite sums. In the fourth term we observe that the $L_2$-norms can be majorized by $W^1$-norms. After all transformations we have for all $v \in U_{ad}$ and $v_h \in U_h^{ad}$:

\begin{align}
\frac{\chi_k}{2}\|u - u_h\|^2_\Omega &\le \frac{\chi_k}{2}\|v_h - u\|^2_\Omega + \Big(\chi_k\|\bar u^{k,i-1}\|_\Omega + \chi_k \sqrt{\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j}}\Big)\, T_\Delta \nonumber\\
&\quad + C\|y(u)\|_\Omega \big(\|v - u_h\|_{-1,\Omega} + \|v_h - u\|_{-1,\Omega}\big) \nonumber\\
&\quad + \frac{1}{2}\|y_h(v_h) - y(v_h)\|^2_\Omega + \frac{C^2}{2}\|v_h - u\|^2_{-1,\Omega} \tag{7.2.35}\\
&\quad + \|y(u)\|_\Omega \big(\|y_h(u_h) - y(u_h)\|_\Omega + \|y_h(v_h) - y(v_h)\|_\Omega\big) \nonumber
\end{align}
with
\[
T_\Delta = \sqrt{\sum_{j \in \mathcal{I}} \|v - u_h\|^2_{-1,\Omega_j}} + \sqrt{\sum_{j \in \mathcal{I}} \|v_h - u\|^2_{-1,\Omega_j}}.
\]

In (7.2.35) we set, due to Lemma 7.2.10, $v := v^*(u_h) \in U_{ad}$, and in view of Lemma 7.2.8
\[
v_h := \sum_{j \in \mathcal{I}} M_j(u)\,\psi_j \in U_h^{ad}.
\]

Applying to the last terms in (7.2.35) the estimates of the discretization error


of the state equation (7.2.8), we obtain

\begin{align}
\frac{\chi_k}{2}\|u - u_h\|^2_\Omega &\le \frac{\chi_k}{2} C h_k^2 \sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j} \nonumber\\
&\quad + \Big(\chi_k \|\bar u^{k,i-1}\|_\Omega + \chi_k \sqrt{\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j}} + C\|y\|_\Omega\Big) \nonumber\\
&\qquad \times \Big(C h_k^2 \sqrt{\|\xi_0\|^2_{1,\Omega} + \|\xi_1\|^2_{1,\Omega}} + C h_k^2 \sqrt{\sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j}}\Big) \nonumber\\
&\quad + C h_k^4 \sum_{j \in \mathcal{I}} \|u\|^2_{1,\Omega_j} + \big(C h_k^2 \|f + v_h\|_\Omega + C h_k^2 T_\Sigma\big)^2 \tag{7.2.36}\\
&\quad + \|y\|_\Omega\, C h_k^2 \big(\|f + v_h\|_\Omega + \|f + u_h\|_\Omega + 2T_\Sigma\big) \nonumber
\end{align}
with
\[
T_\Sigma = \sum_{m \in \mathcal{M}_N} \|g_m\|_{1/2,\Gamma_m} + \sum_{m \in \mathcal{M}_A} T_m^{1/2}(g_m) + \sum_{m \in \mathcal{M}_B} T_m^{1/2}(g_{m+1}) + \sum_{m \in \mathcal{M}_C} T_m^{1/2}(g_m, g_{m+1}).
\]

Analogously to [108], page 45, we get the estimates
\[
\|u_h\|_\Omega \le \sqrt{\|\xi_0\|^2_\Omega + \|\xi_1\|^2_\Omega}, \qquad \|v_h\|_\Omega \le \sqrt{\|\xi_0\|^2_\Omega + \|\xi_1\|^2_\Omega}.
\]

The terms $\|u\|_{1,\Omega_j}$ in (7.2.36) can be estimated via Lemma 7.2.6 and $\|y\|_\Omega$ via (7.2.24), respectively. Finally, with new constants $C$, $C'$, we obtain
\begin{align}
\frac{\chi_k}{2}\|u - u_h\|^2_\Omega &\le C \chi_k h_k^2 \Big(\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C\Big) \nonumber\\
&\quad + C h_k^2 \Big(\chi_k \|\bar u^{k,i-1}\|_\Omega + \chi_k \sqrt{\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C} + C c_y\Big) \nonumber\\
&\qquad \times \Big(\sqrt{C} + \sqrt{\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C}\Big) \tag{7.2.37}\\
&\quad + C h_k^4 \Big(\frac{C}{\chi_k^2} + \|\bar u^{k,i-1}\|^2_\Omega + C\Big) + \big(C' h_k^2\big)^2 + c_y C' h_k^2. \nonumber
\end{align}

It is not difficult to see that the definition of $o_k$ in (7.2.32) is based on (7.2.37). The term (7.2.31) for the error $\varepsilon_k$ in the auxiliary problem of the MSR-method can be calculated via Lemma 7.2.5 by means of the triangle inequality. From (7.2.37) it follows that
\[
\varepsilon_k = \sqrt{\frac{2 o_k}{\chi_k}} + \sqrt{\frac{2 \eta_k}{\chi_k}} \ge \|u^{k,i} - u_h^{k,i}\|_\Omega + \|\bar u^{k,i} - u_h^{k,i}\|_\Omega \ge \|u^{k,i} - \bar u^{k,i}\|_\Omega.
\]

Now all preliminary results are provided and we can formulate the convergence result for the MSR-method.


7.2.12 Theorem. The sequence of iterates $\bar z^{k,i} = (\bar y^{k,i}, \bar u^{k,i})$ in MSR-Algorithm 7.2.4 always has a finite number of elements in the internal loop $i$, i.e.,
\[
i(k) \le 2 d_2/\vartheta_k, \tag{7.2.38}
\]
and $\{\bar z^{k,i}\}$ converges weakly in the space $W^1(\Omega) \times L_2(\Omega)$ to some element of the solution set $Z^*$, if
\[
0 \le \underline{\chi} < \chi_k \le \bar\chi, \quad 0 < \vartheta_k \le \bar\vartheta, \quad \sum_{k=1}^\infty \sqrt{\frac{\eta_k}{\chi_k}} < \infty, \quad \sum_{k=1}^\infty \frac{h_k}{\chi_k} < \infty, \tag{7.2.39}
\]
\[
\delta_k \ge \varepsilon_k + \sqrt{2 (d_2 + \varepsilon_k)(\varepsilon_k + \vartheta_k)}, \tag{7.2.40}
\]
where $\varepsilon_k$ is defined via Lemma 7.2.11 and
\[
d_2 = 2 \sup_{v \in U_{ad}} \|v\|_\Omega. \tag{7.2.41}
\]

Proof: If the parameter δ_k in MSR-Algorithm 7.2.4 is chosen sufficiently large, then
\[
\|\bar u^{k,i} - u^{k,i-1}\|_\Omega \ge \|u^{k,i} - u^{k,i-1}\|_\Omega - \|u^{k,i} - \bar u^{k,i}\|_\Omega \ge \delta_k - \varepsilon_k > 0. \tag{7.2.42}
\]

We apply Lemma 8 in [357] to the k-th iteration of the MSR-method. As space X we choose the process space Z, i.e., ‖·‖_X := ‖·‖_Z. Moreover, the following substitutions are necessary:

\[
X_1 := \{\, z = (y,u) \in Z : y = 0 \text{ in } Y \,\}, \qquad \Phi(z) := 2J(z),
\]
\[
b(z_1, z_2) := 0, \qquad a(z_1, z_2) := \langle y_1, y_2\rangle_\Omega\,, \qquad \ell(z) := 0,
\]
\[
G := \Big\{\, (y,u) \in Y \times U_{ad} : -\Delta y = u + f + \sum_{m\in M_N} g_m\,\delta_{\Gamma_m} \text{ in } Y' \,\Big\},
\]
\[
a_0 := z^{k,i-1}, \quad a_1 := z^{k,i}, \quad x := z^*, \quad \delta := \delta_k - \varepsilon_k, \quad \chi := \chi_k.
\]

To apply this lemma, the closedness and convexity of G in the space X = Z = Y × U have to be shown. The state equation in G generates a closed affine manifold in Y × Y′, because the generalized Laplace operator defines an isometric isomorphism between Y and Y′. Therefore, the intersection of this manifold with the topologically stronger space Z in X is closed. The closedness of the whole set G follows from the additional intersection of this affine set with the set Y × U_{ad}, which is closed in X. Moreover, both sets are convex, hence G is convex. Due to [357], formula (2.124), we obtain for the half-norm
\[
|(y,u)|_X = \|u\|_\Omega\,. \tag{7.2.43}
\]

In this case the required equivalence between |·|_X and ‖·‖_X (cf. the previous chapter) is not necessary, because the convergence proof is carried out directly for the half-norm.


Since Φ(x) = 2J(z^*) ≤ 2J(z^{k,i}) = Φ(a_1), we have η(x) = 0 in Lemma 8 of [357]. In view of [357], formula (2.128), we have

\[
|\bar z^{k,i} - z^*|_X \le |z^{k,i-1} - z^*|_X - \frac{(\delta_k - \varepsilon_k)^2}{2\,(d_2 + \varepsilon_k)}\,, \tag{7.2.44}
\]
and hence
\[
|z^{k,i} - z^*|_X \le |z^{k,i-1} - z^*|_X + |z^{k,i} - \bar z^{k,i}|_X - \frac{(\delta_k - \varepsilon_k)^2}{2\,(d_2 + \varepsilon_k)}\,.
\]

The denominator in (7.2.44) can be estimated by means of (7.2.41) in the following way:
\[
|z^{k,i-1} - z^*|_X \le |z^{k,i-1} - \bar z^{k,i-1}|_X + |\bar z^{k,i-1} - z^*|_X \le \varepsilon_k + \sup_{(v_1,\,v_2)\in U_{ad}^2} \|v_1 - v_2\|_\Omega \le \varepsilon_k + d_2\,.
\]

With (7.2.43) and (7.2.44) one gets, for i = 1, \dots, i(k)-1,
\[
|z^{k,i} - z^*|_X - |z^{k,i-1} - z^*|_X \le |z^{k,i} - \bar z^{k,i}|_X - \frac{(\delta_k-\varepsilon_k)^2}{2\,(d_2+\varepsilon_k)}
\le \varepsilon_k - \frac{(\delta_k-\varepsilon_k)^2}{2\,(d_2+\varepsilon_k)} \le -\vartheta_k < 0. \tag{7.2.45}
\]
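For completeness, the final bound in (7.2.45) is a direct consequence of the choice (7.2.40) of δ_k: squaring (7.2.40) yields
\[
(\delta_k - \varepsilon_k)^2 \ge 2\,(d_2 + \varepsilon_k)(\varepsilon_k + \vartheta_k),
\]
and therefore
\[
\varepsilon_k - \frac{(\delta_k - \varepsilon_k)^2}{2\,(d_2 + \varepsilon_k)} \le \varepsilon_k - (\varepsilon_k + \vartheta_k) = -\vartheta_k\,.
\]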

Hence, the values of |z^{k,i} − z^*|_X can be majorized by a decreasing arithmetic series (in i). Because ϑ_k > 0 and |z^{k,i} − z^*|_X ≤ d_2 + ε_k, the number of i-steps is finite and estimate (7.2.38) holds true. At the same time, (7.2.45) yields a lower bound for δ_k, namely (7.2.40). In view of [357], formula (2.127), with η(x) = 0 it holds that

\[
|z^{k,i} - z^*|_X - |z^{k,i-1} - z^*|_X \le \varepsilon_k\,. \tag{7.2.46}
\]

Due to (7.2.45), |z^{k,i} − z^*|_X is monotonically decreasing for i = 0, \dots, i(k) − 1, and (7.2.46) can be carried on for all i = 1, \dots, i(k). Because of the identity z^{k+1,0} = z^{k,i(k)} for all k, we get the following estimate, which is independent of i:

\[
|z^{k+1,0} - z^*|_X - |z^{k,0} - z^*|_X \le \varepsilon_k\,. \tag{7.2.47}
\]

If ∑_{k=1}^∞ ε_k < ∞, then Polyak's Lemma A3.1.7 yields the convergence (in k) of |z^{k,0} − z^*|_X towards a finite value. The sequence |\bar z^{k,0} − z^*|_X of exact solutions converges to the same value, because ε_k → 0.

Let \{z^{k_s,0}\} be an arbitrary weakly convergent subsequence of \{z^{k,0}\} with weak limit point \tilde z \in G \subset Z.² Due to [357], formula (2.126), with i := i(k),

\[
|z^{k,i(k)} - z^*|_X^2 - |z^{k,i(k)-1} - z^*|_X^2 \le \frac{4}{\chi_k}\,\big( J(z^*) - J(z^{k,i(k)}) \big), \tag{7.2.48}
\]

and, after tracking the index i backwards as in (7.2.46) and using z^{k,0} = z^{k-1,i(k-1)},

2Here it is important that the set G is bounded in the space Z.


we get the following relations between negative values:

\[
\begin{aligned}
J(z^*) - J(z^{k+1,0}) &= J(z^*) - J(z^{k,i(k)}) \\
&\ge \frac{\chi_k}{4}\,\big(|z^{k,i(k)} - z^*|_X + |z^{k,i(k)-1} - z^*|_X\big)\,\big(|z^{k,i(k)} - z^*|_X - |z^{k,i(k)-1} - z^*|_X\big) \\
&\ge \frac{\bar\chi}{4}\,\big(|z^{k,i(k)} - z^*|_X + |z^{k,i(k)-1} - z^*|_X\big)\,\big(|z^{k+1,0} - z^*|_X - |z^{k,0} - z^*|_X\big) \\
&\ge \frac{\bar\chi}{4}\,(2\,d_2 + \varepsilon_k)\,\big(|z^{k+1,0} - z^*|_X - |z^{k,0} - z^*|_X\big).
\end{aligned}
\]

The terms in the last brackets become equal in the limit, and with k_s → ∞ we get J(z^*) ≥ J(\tilde z), because J is weakly lsc. Hence \tilde z = (\tilde y, \tilde u) \in Z^* = Y^* \times U^*. The subsequence \{\bar z^{k_s,0}\} converges weakly towards the same element \tilde z, because ε_{k_s} → 0.

Finally, we apply Opial's Lemma A1.1.3 to the sequence \{u^{k,0}\} and the set U^*, which yields the weak convergence of the control iterates of the MSR-algorithm towards \tilde u.

7.2.2.1 Strong convergence of the state component

Now we can prove strong convergence of the sequence \{y^{k,0}\} to \tilde y in the W^1-norm. The triangle inequality gives us
\[
\begin{aligned}
\|y^{k,0} - \tilde y\|_{1,\Omega} &\le \|y^{k,0} - \bar y^{k,0}\|_{1,\Omega} + \|y(\bar u^{k,0}) - y(\tilde u)\|_{1,\Omega} \\
&\le \|y(u) - y(u_h)\|_{1,\Omega} + \|y(u_h) - y_h(u_h)\|_{1,\Omega} + \|y^{k,i} - y_h^{k,i}\|_{1,\Omega} + \|y(\bar u^{k,0}) - y(\tilde u)\|_{1,\Omega} \\
&\le \sqrt{1+c_F^2}\;\|u - u_h\|_{-1,\Omega} + C_5\, h_k^2 + \sqrt{1+c_F^2}\,\sqrt{\frac{2\,\eta_k}{\chi_k}} + (1+c_F^2)\,\|\bar u^{k,0} - \tilde u\|_{-1,\Omega} \\
&\le C_6\,\sqrt{\frac{2\, o_k}{\chi_k}} + C_5\, h_k^2 + \sqrt{1+c_F^2}\,\sqrt{\frac{2\,\eta_k}{\chi_k}} + (1+c_F^2)^{3/2}\,\|u^{k,0} - \bar u^{k,0}\|_{\Omega} + (1+c_F^2)\,\|u^{k,0} - \tilde u\|_{-1,\Omega} \\
&\le C_6\,\sqrt{\frac{2\, o_k}{\chi_k}} + C_5\, h_k^2 + \sqrt{1+c_F^2}\,\sqrt{\frac{2\,\eta_k}{\chi_k}} + (1+c_F^2)^{3/2}\,\varepsilon_k + (1+c_F^2)\,\|u^{k,0} - \tilde u\|_{-1,\Omega}\,.
\end{aligned}
\]

The term C_5 h_k^2 results from (7.2.8).

It is sufficient to show that the last norm term in the chain of inequalities above tends to zero as k → ∞. To this end we define new elements \tilde y_k in W^1(\Omega) by means of the variational equation
\[
\langle \tilde y_k, \phi \rangle_{1,\Omega} = \langle u^{k,0} - \tilde u, \phi \rangle_\Omega \quad \forall\, \phi \in W^1(\Omega).
\]

Setting φ := \tilde y_k and passing to the limit k → ∞, we obtain
\[
\|\tilde y_k\|_{1,\Omega}^2 = \langle u^{k,0} - \tilde u,\, \tilde y_k \rangle_\Omega \to 0,
\]


because u^{k,0} \rightharpoonup \tilde u. This means

\[
\|u^{k,0} - \tilde u\|_{-1,\Omega} = \sup_{\phi \in C^\infty(\Omega)\setminus\{0\}} \frac{\langle u^{k,0} - \tilde u, \phi\rangle_\Omega}{\|\phi\|_{1,\Omega}} = \sup_{\phi \in C^\infty(\Omega)\setminus\{0\}} \frac{\langle \tilde y_k, \phi\rangle_{1,\Omega}}{\|\phi\|_{1,\Omega}} \le \|\tilde y_k\|_{1,\Omega} \to 0.
\]

To obtain the convergence conditions (7.2.39), we have to take the limit h_k → 0 in (7.2.31), (7.2.32). The weakest conditions result under the assumption \underline{\chi} = 0.

Using O-notation, let us study the terms in (7.2.31) and (7.2.32):

\[
\begin{aligned}
\varepsilon_k \;\le\;& O\Big(\sqrt{\frac{o_k}{\chi_k}}\Big) + O\Big(\sqrt{\frac{\eta_k}{\chi_k}}\Big) \\
=\;& O\Big(\sqrt{\frac{\eta_k}{\chi_k}}\Big) + \frac{1}{\sqrt{\chi_k}}\,\bigg[\, O\Big(\frac{h_k^2}{\chi_k}\Big) + O\big(\chi_k h_k^2\, \|u^{k,i-1}\|_\Omega^2\big) \\
&+ \Big( O\big(\chi_k h_k^2\, \|u^{k,i-1}\|_\Omega\big) + O\big(h_k^2\big) \Big)\, \Big( O\Big(\frac{1}{\chi_k}\Big) + O\big(\|u^{k,i-1}\|_\Omega\big) \Big) \\
&+ O\Big(\frac{h_k^4}{\chi_k^2}\Big) + O\big(h_k^4\, \|u^{k,i-1}\|_\Omega^2\big) + O\big(h_k^4\big) + O\big(h_k^2\big) \,\bigg]^{1/2}.
\end{aligned}
\]

It is easily seen that the critical terms are majorized and that (7.2.39) implies the convergence of ∑_{k=0}^∞ ε_k.

Theorem 7.2.12 gives an a priori criterion for the convergence of the inner approximation of the MSR-method with respect to the control space. At first glance this result may seem standard, but for the treatment of ill-posed control problems it is very important, because in that case there are no stopping criteria or other indicators at hand to verify convergence. Usually the solution set is quite unstable: the sequence of approximations may miss the right limit, or may not converge to any element at all. If the conditions of Theorem 7.2.12 hold, however, neither of these failures can occur, and one can therefore apply traditional criteria of strong convergence, such as the boundedness of the sum of distances between neighboring iterates in the norm of the process space. This also provides an a posteriori criterion for strong convergence.

For the weakest assumptions under which an a priori criterion for strong convergence can be formulated, we refer to [217].
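The role of the inner stopping rule and of the slowly decreasing proximal parameter χ_k can be illustrated on a small finite-dimensional analogue. The following Python sketch is only schematic and uses assumed toy data (the rank-deficient matrix `A` mimics ill-posedness, and `prox_step` solves the quadratic auxiliary problem exactly); it is not MSR-Algorithm 7.2.4 itself, which additionally couples χ_k with the discretization parameter h_k.

```python
import numpy as np

def prox_step(u, chi, A, b):
    """Exact solution of the auxiliary problem min_v ||A v - b||^2 + chi ||v - u||^2."""
    n = len(u)
    return np.linalg.solve(A.T @ A + chi * np.eye(n), A.T @ b + chi * u)

def msr_like(A, b, chis, deltas, max_inner=1000):
    """Outer loop over (chi_k, delta_k); inner loop until the step size is <= delta_k."""
    u = np.zeros(A.shape[1])
    for chi, delta in zip(chis, deltas):
        for _ in range(max_inner):
            u_new = prox_step(u, chi, A, b)
            step = np.linalg.norm(u_new - u)
            u = u_new
            if step <= delta:          # inner stopping rule
                break
    return u

# Rank-deficient least squares: the solution set is the whole line u1 + u2 = 1.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0])
u = msr_like(A, b, chis=[1.0, 0.1, 0.01], deltas=[1e-2, 1e-4, 1e-8])
# The proximal iteration started at 0 singles out the least-norm solution (0.5, 0.5).
```

The point of the sketch is that the proximal term selects a distinguished element of the unstable solution set; decreasing χ_k too quickly, or solving the inner problems too crudely, destroys this selection — which is exactly what conditions of the type (7.2.39)–(7.2.40) are designed to prevent.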

7.2.3 Numerical results

The numerical investigation of MSR-Algorithm 7.2.4 is performed with a simple model of type (7.2.9):

\[
\min_{(y,u)\in Y\times U_{ad}} J(y,u) = \frac12\,\|y\|_\Omega^2\,, \qquad -\Delta y = u + g\,\delta_\Gamma \ \text{ in } Y', \tag{7.2.49}
\]


with
\[
Y = \{\, y \in W^1(\Omega) : \gamma_0 y = 0 \text{ in } H^{1/2}(\Gamma_D) \,\},
\]
\[
K = U = L_2(\Omega), \qquad U_{ad} = \{\, u \in U : -1 \le u(x) \le 1 \text{ for almost all } x \in \Omega \,\},
\]

where Ω is a square domain and Γ = ∂Ω \ Γ_D is one side of the square. The sides with Dirichlet boundary conditions are drawn in Figure 7.2.4 with double lines.

Figure 7.2.4: The square domain Ω; the sides with Dirichlet conditions Γ_D are marked with double lines.

Here we are dealing with an "optimal reflection" of a delta layer by means of a distributed control u(x) whose amplitude is bounded from above and below. There exist various applications of this formulation. The simplest physical interpretation is the observation of a homogeneous square element Ω which is heated from one side Γ = ∂Ω \ Γ_D by a fixed heat source, while the temperature at the other sides Γ_D is fixed at the zero level. One looks for a cooling regime on the element Ω such that the deviation of the temperature potential on the square Ω from the zero level on the boundary Γ_D is as small as possible. The construction of the cooling system is determined only by the form of the switching lines of the optimal control function u^*(x).

Hence, Problem (7.2.49) describes a two-dimensional analogue of the classical Fuller problem, and one can expect an infinite number of switching lines in the optimal solution. Although this problem has a unique solution, it belongs to the class of ill-posed problems.

The numerical reconstruction of such a solution becomes very difficult because the smallest bang-bang formations in u^*(x) between neighboring switches become extremely unstable. An example of such a solution for g(x_2) = 0.7\cos \pi x_2 is shown in Figure 7.2.5.

The discrete auxiliary problems of type (7.2.13) can be formulated as quadratic minimization problems in finite-dimensional spaces. The variables to be optimized are the weights of the finite element basis functions. The


Figure 7.2.5: "Chattering regime" on a square for an optimal u^*(x_1, x_2).

Figure 7.2.6: Approximation of u^{k,i(k)}(x_1, x_2) on a sequence of grids: h = 2^{-6}, h = 2^{-7}, h = 2^{-8}.


Figure 7.2.8: MSR for h_i = 2^{-7}.

optimal weights determine the sought approximate solutions y^{k,i} and u^{k,i}.

The main idea of this approach is to exhibit the interaction of discretization and regularization. This can be observed clearly in Figure 7.2.6, which shows how the fourth unstable bang-bang formation is reproduced on a sequence of grids. Hereby the parameter χ_k has to be decreased slowly, following the updating rule (7.2.39). In the one-dimensional Figures 7.2.7 and 7.2.8 another phenomenon can be observed for Problem (7.2.49): on the same discretization level and under the same updating rule for the regularization parameter, the MSR-method delivers better results than Tikhonov regularization.

Figure 7.2.7: Tikhonov regularization for h_i = 2^{-7}.

7.3 Comments

In the customary approach to handling ill-posed optimal control problems, the object of the control is mostly an evolution system, and the non-uniqueness of the quasi-optimal control is believed to be an advantage. In contrast, the approaches described here require a careful analysis of convergence not only for the objective values but also for the state and control components of the solutions.

Usually the convergence of numerical algorithms for control problems is studied under the additional assumption that the objective functional is strongly convex with respect to the control variable and that the control function possesses a classical bang-bang behavior (cf., for instance, Alt and Mackenroth [9], Glashoff and Sachs [133], Hackbusch and Will [161], Knowles [242], Lasiecka [255]). The main result there is the proof of the convergence of the approximate solution to the unique solution of the original problem in some given Banach spaces. However, an analogous result for ill-posed control problems can be obtained only by using special regularization techniques.

The question of the existence of a solution is also difficult if a nonlinear state equation comes into play and non-quadratic control costs arise. This is the reason why we restricted our investigation to linear state equations. It should be mentioned that for nonlinear problems a number of results can be found, for instance, in Kelley and Sachs [233], Heinkenschloss and Sachs [171] and Kupfer and Sachs [251]. In this case the validity of second-order sufficient optimality conditions is essential.

The first regularized method for ill-posed linear control problems was suggested by Tikhonov [394]. Penalty techniques applied to control problems appeared first in Lions [269] and Balakrishnan [32]. By embedding the state equation into the penalty term, the control and state variables can be considered independent; hence the control problem can be reformulated as a variational inequality. Bergounioux [42, 43] applied this penalty technique to prove, under weak regularity conditions, the existence of Lagrange multipliers for elliptic and parabolic control problems with state constraints. In [221] a regularized penalty method is described for solving ill-posed parabolic control problems.


Chapter 8

PPR FOR VARIATIONAL INEQUALITIES

8.1 Approximation of Variational Inequalities

The following introduction is not required for understanding the main results of this chapter. Nevertheless, it acquaints the reader with some peculiarities of the finite-dimensional approximation of monotone (elliptic) variational inequalities stated in Hilbert spaces, which do not occur in linear problems of mathematical physics.

Finite-dimensional approximations of elliptic variational inequalities are mainly performed by means of finite element techniques. In this direction one can distinguish three approaches: approximation of the original (primal) problem, of the dual problem, or of the mixed variational formulation of the problem.

We briefly illustrate these approaches for the model problem (A2.1.6), known as the problem with an obstacle on the boundary. This problem will also be considered in detail later on.

For an introduction to Hilbert spaces and to the various notions of monotone variational inequalities considered in these spaces we refer to Sections A1 and A2 in the Appendix. In the sequel the symbols ((·, ·)) and ‖|·‖|, provided with corresponding indices, denote the scalar product and the norm of a given space of vector functions.

On a given open and bounded domain Ω ⊂ R^n with Lipschitz continuous boundary Γ, the space H^{1/2}(Γ) can be defined as the image of the space H^1(Ω) under the trace mapping γ:
\[
H^{1/2}(\Gamma) := \gamma\big( H^1(\Omega) \big),
\]
endowed with the norm
\[
\|u\|_{\frac12,\Gamma} := \inf\big\{\, \|v\|_{1,\Omega} : v \in H^1(\Omega),\ \gamma v = u \,\big\}.
\]
Here, γv denotes the trace of the function v on Γ. The space H^{1/2}(Γ) is a subspace of L_2(Γ), and H^{-1/2}(Γ) is the space of linear (continuous) functionals on H^{1/2}(Γ).


Model Problem (cf. (A2.1.6)):
\[
\text{minimize } J(u) \quad \text{subject to } u \in K, \tag{8.1.1}
\]
with
\[
J(u) := \frac12\,\||\nabla u\||_{0,\Omega}^2 - \langle f, u\rangle_{0,\Omega}\,, \tag{8.1.2}
\]
\[
K := \{\, u \in H^1(\Omega) : \gamma u \ge g \text{ on } \Gamma \,\}, \tag{8.1.3}
\]
where Ω ⊂ R^n is an open domain with sufficiently regular boundary (at least Lipschitz-continuous), f ∈ L_2(Ω), and g is the trace on Γ of a function g_0 ∈ H^2(Ω).

In case n = 3, under certain symmetry conditions, it is possible to reduce the problem to n = 2. Physically the problem describes a stationary fluid stream in the domain Ω, which is bounded by a semi-permeable membrane Γ. The given functions g and f define the fluid pressure on the boundary Γ and the fluid stream in Ω, respectively. The solution u represents the distribution of the pressure in Ω. Due to the semi-permeability of the boundary, on those parts of Γ where u(x) ≤ g(x) the fluid can flow only into Ω, but it is impossible for it to flow out of Ω.

It is easy to see that, on the space H^1(Ω), the kernel of the bilinear form
\[
a(u,v) := ((\nabla u, \nabla v))_{0,\Omega}\,,
\]
corresponding to the functional (8.1.2), consists only of the constant elements z = const. Solvability of Problem (8.1.1) is guaranteed under the condition
\[
\langle f, 1\rangle_{0,\Omega} \le 0.
\]
In case
\[
\langle f, 1\rangle_{0,\Omega} < 0, \tag{8.1.4}
\]
the problem has a unique solution (cf. Glowinski, Lions and Tremolieres [135]). Condition (8.1.4) is natural: indeed, if ⟨f, 1⟩_{0,Ω} > 0, then the volume of the fluid increases, and consequently an equilibrium of the pressure distribution is impossible.

However, if ⟨f, 1⟩_{0,Ω} = 0 and Problem (8.1.1) has a solution u^*, then its solution set has the structure
\[
U^* := \{\, u \in H^2(\Omega) : u = u^* + c,\ \gamma u^* + c \ge g \text{ on } \Gamma \,\}.
\]

In order to describe the above mentioned approximation approaches, for the sake of simplicity we restrict ourselves to the case where Ω is an open polyhedral set in R². In this situation, straightforward approximations of variational problems in H^1(Ω) by means of finite elements are presented in Appendix A2.4.

In order to formulate the dual problem to (8.1.1), we need the space
\[
H(\operatorname{div}; \Omega) := \{\, q \in [L_2(\Omega)]^n : \operatorname{div} q \in L_2(\Omega) \,\},
\]


with the divergence operator understood in the sense of distributions,
\[
((q, \nabla\varphi))_{0,\Omega} := -\langle \varphi, \operatorname{div} q\rangle_{0,\Omega} \quad \forall\, \varphi \in \mathcal D(\Omega).
\]
In H(div; Ω) the norm is given by
\[
\|q\|_{H(\operatorname{div};\Omega)}^2 := \||q\||_{0,\Omega}^2 + \|\operatorname{div} q\|_{0,\Omega}^2\,.
\]

For each q ∈ H(div; Ω) there is a unique linear functional q_ν ∈ H^{-1/2}(Γ) such that
\[
\langle q_\nu, w\rangle := ((q, \nabla v))_{0,\Omega} + \langle \operatorname{div} q, v\rangle_{0,\Omega}\,, \tag{8.1.5}
\]
where ⟨·, ·⟩ denotes the duality pairing between H^{1/2}(Γ) and H^{-1/2}(Γ) (see Lions and Magenes [272]), and the elements w ∈ H^{1/2}(Γ) and v ∈ H^1(Ω) are related by the trace mapping, γv = w. Non-negativity q_ν ≥ 0 on Γ means that
\[
\langle q_\nu, \gamma v\rangle \ge 0 \quad \forall\, v \in H^1(\Omega) \text{ with } \gamma v \ge 0.
\]

Now, the dual problem to (8.1.1) can be formulated as
\[
\min\{\, \mathcal J(q) : q \in \mathcal K \,\} \tag{8.1.6}
\]
with
\[
\mathcal J(q) := \frac12\,\||q\||_{0,\Omega}^2 - \langle q_\nu, g\rangle, \tag{8.1.7}
\]
\[
\mathcal K := \{\, q \in H(\operatorname{div};\Omega) : \operatorname{div} q + f = 0 \text{ in } \Omega,\ q_\nu \ge 0 \text{ on } \Gamma \,\}. \tag{8.1.8}
\]

The relation between the primal problem (8.1.1) and its dual (8.1.6) is clarified in the following statement.

8.1.1 Theorem. If the primal problem (8.1.1) is solvable, then there exists a unique solution q^* of the dual problem (8.1.6), and the relations
\[
q^* = \nabla u^*, \qquad J(u^*) + \mathcal J(q^*) = 0 \tag{8.1.9}
\]
hold for every solution u^* of Problem (8.1.1).

This theorem also holds true for more general variational problems with convex quadratic objective functionals and constraints of the form Bu ≤ b, with B a linear operator in a space of functions on Ω or Γ.

Let us sketch its proof. Introducing the new functions
\[
\zeta_k := \frac{\partial u}{\partial x_k}\,, \tag{8.1.10}
\]
Problem (8.1.1) can be reformulated as
\[
\min\Big\{\, \frac12\,\||\zeta\||_{0,\Omega}^2 - \langle f, u\rangle_{0,\Omega} \;:\; \zeta_k = \frac{\partial u}{\partial x_k},\ k = 1,\dots,n;\ u \in K \,\Big\}.
\]

By means of the Lagrangian function
\[
L(u, \zeta, q) := \frac12\,\||\zeta\||_{0,\Omega}^2 - \langle f, u\rangle_{0,\Omega} + \sum_{k=1}^n \Big\langle q_k,\, -\zeta_k + \frac{\partial u}{\partial x_k} \Big\rangle_{0,\Omega}\,,
\]


defined on the set K × [L_2(Ω)]^n × [L_2(Ω)]^n, we express the dual problem as
\[
\sup_{q \in [L_2(\Omega)]^n}\; \inf_{(u,\zeta) \in K \times [L_2(\Omega)]^n} L(u, \zeta, q).
\]
This can be put in the form (8.1.6) by an explicit calculation of
\[
\psi(q) := \inf_{(u,\zeta)\in K\times[L_2(\Omega)]^n} L(u,\zeta,q).
\]

Indeed, we can decompose ψ(q) = ψ_1(q) + ψ_2(q), where
\[
\psi_1(q) := \inf_{\zeta\in[L_2(\Omega)]^n} \Big\{\, \frac12\,\||\zeta\||_{0,\Omega}^2 - \sum_{k=1}^n \langle q_k, \zeta_k\rangle_{0,\Omega} \,\Big\},
\]
\[
\psi_2(q) := \inf_{u\in K} \Big\{\, \sum_{k=1}^n \Big\langle q_k, \frac{\partial u}{\partial x_k}\Big\rangle_{0,\Omega} - \langle f, u\rangle_{0,\Omega} \,\Big\}.
\]

A short calculation gives, for q ∈ \mathcal K,
\[
\psi_1(q) = -\frac12\,\||q\||_{0,\Omega}^2
\]
and, due to (8.1.8), (8.1.5),
\[
\psi_2(q) = \langle q_\nu, g\rangle.
\]
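The value of ψ_1 can be seen by completing the square: for every ζ ∈ [L_2(Ω)]^n,
\[
\frac12\,\||\zeta\||_{0,\Omega}^2 - \sum_{k=1}^n \langle q_k, \zeta_k\rangle_{0,\Omega}
= \frac12\,\||\zeta - q\||_{0,\Omega}^2 - \frac12\,\||q\||_{0,\Omega}^2
\;\ge\; -\frac12\,\||q\||_{0,\Omega}^2\,,
\]
with equality exactly for ζ = q, so the infimum is attained at ζ = q.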

Moreover, it is easily seen that ψ_2(q) = −∞ for q ∉ \mathcal K. As H(div; Ω) is a Hilbert space and the set \mathcal K is convex and closed, existence and uniqueness of the solution of the dual problem (8.1.6) follow from the strong convexity of the functional \frac12\||q\||_{0,\Omega}^2 on [L_2(Ω)]^n.

Concerning the duality relation (8.1.9), it should be noted that for Problem (8.1.1), as well as for many other convex variational problems, relations of that type are established according to a general framework developed by Ekeland and Temam [100].

Finally, the relation q^* = ∇u^* can be concluded from the uniqueness of q^* and the inclusion ∇u^* ∈ \mathcal K.

Essential for the application of finite element methods to the dual problem (8.1.6) is that the equation
\[
\operatorname{div} q + f = 0 \quad \text{in } \Omega \tag{8.1.11}
\]
is satisfied on each approximation level. This can be achieved by means of a suitable substitution of the unknown function q, which makes equation (8.1.11) homogeneous, and by the use of special solenoidal linear finite elements φ_i (see [87]). These elements satisfy the condition div φ_i = 0 on each triangle. Note that, for n = 2, we actually deal with two functions q_1 and q_2 related by (8.1.11); consequently, the finite element φ_i is a two-dimensional vector function. Concerning details of this approach, we refer to Hlavacek et al. [182].

The construction of finite-dimensional approximations of the dual problem of an elliptic variational inequality is more complicated. Moreover, in order to prove the convergence of the approximate solutions of these auxiliary problems, and also some rate of convergence, higher smoothness of the solution of the variational inequality is required. Nevertheless, this approach has the advantage that, in view of the strong convexity of the objective function, strong convergence of the minimizing sequence and an acceptable stability of the auxiliary problems can be established.

Note that for the model problem (8.1.1) the question of non-uniqueness of the solution is of minor importance. In the indefinite case ⟨f, 1⟩_{0,Ω} = 0, solutions of the problem can be obtained, provided they exist, by solving the corresponding Neumann problem.

However, for problems in elasticity theory, considered in the sequel, possible non-uniqueness of the solution originates from meaningful physical situations and may be a serious obstacle for the numerical treatment of such problems.

Problems in a mixed variational formulation are obtained in the form of saddle point problems for a Lagrangian function which is different from the one we used before. Such a function is introduced in order to eliminate unilateral constraints, i.e., inequality constraints like γu ≥ g on Γ as considered in (8.1.3). For details of this notion see Fichera [114] and Panagiotopoulos [316].

The saddle point problem related to (8.1.1) is given as: find a point (u^*, λ^*) ∈ H^1(Ω) × H_+^{-1/2}(Γ) such that
\[
L(u^*, \lambda) \le L(u^*, \lambda^*) \le L(u, \lambda^*) \quad \forall\, (u,\lambda) \in H^1(\Omega)\times H_+^{-1/2}(\Gamma), \tag{8.1.12}
\]
with
\[
L(u, \lambda) := J(u) + \langle \lambda,\, -\gamma u + g\rangle. \tag{8.1.13}
\]

Assuming solvability of the original problem (8.1.1), its equivalence (in the sense of Proposition A1.7.53) with Problem (8.1.12) can be established by arguments analogous to those used in the proofs of Kuhn-Tucker type theorems under Slater's condition.

8.1.2 Theorem. The point (u^*, λ^*) ∈ H^1(Ω) × H_+^{-1/2}(Γ) is a saddle point of the Lagrangian function (8.1.13) on the set H^1(Ω) × H_+^{-1/2}(Γ) iff u^* is a solution of Problem (8.1.1) and λ^* = ∂u^*/∂ν, with ν a unit vector of the outward normal to Γ.

The proof of this statement is almost literally the same as that in Cea [69] for the problem
\[
\frac12\,\|u\|_{1,\Omega}^2 - \langle f, u\rangle_{0,\Omega} \to \min \quad \text{s.t.}\ \ u \in K := \{\, u \in H^1(\Omega) : \gamma u \ge 0 \text{ on } \Gamma \,\}.
\]

In order to discretize problems in the mixed variational formulation, two systems of finite elements have to be introduced in order to approximate both the functions of the initial space and those of the space of Lagrange multipliers. For the model problem (8.1.1), functions in the initial space H^1(Ω) can be approximated by piecewise linear elements on a standard quasi-uniform triangulation of the domain Ω, see Definition A2.2.2. Functions from H_+^{-1/2}(Γ) have to be approximated by piecewise constant elements on Γ. The edges of the partition of Γ are usually determined by those vertices of the corresponding triangulation of Ω which are located on the boundary. However, such a close connection between the triangulations is not obligatory, see Haslinger and Lovisek [170].

In this way, for the saddle point problem (8.1.12), a sequence of approximate saddle point problems is constructed on V_{h_k} × Λ^+_{h_k}, where V_{h_k} and Λ_{h_k} are finite element spaces corresponding to a given triangulation \mathcal T_{h_k} of Ω, and Λ^+_{h_k} denotes the subset of non-negative functions in Λ_{h_k}.

Concerning the approach of discretizing the original problem (8.1.1), it should be noted that, in general, we do not have the inclusion K_{h_k} ⊂ K for the approximate auxiliary problems. This fact complicates the investigation of convergence and of the rate of convergence of the solution methods. The use of the mixed variational formulation of Problem (8.1.1) overcomes this disadvantage, because V_{h_k} ⊂ V and Λ^+_{h_k} ⊂ H_+^{-1/2}(Γ). This situation is typical for variational inequalities arising in real-life problems.

The advantage of the latter approach consists in the simultaneous computation of an approximate solution u^*_{h_k} and an approximate multiplier λ^*_{h_k}. For the problems considered in this chapter, the functionals λ^* have an explicit physical interpretation, which may be of interest in itself. Moreover, these functionals, computed via the problem in mixed variational formulation, are usually more accurate than those obtained by numerical differentiation of the approximate solutions of the original problem.

In some cases the mixed variational formulation enables us to replace the minimization of a non-smooth functional by solving a saddle point problem for a functional which is differentiable in both variables u and λ; see Glowinski et al. [135] for a model problem with friction.

However, the use of the mixed formulation does not eliminate the difficulties connected with the non-uniqueness of the solution of the original problem, because in that case the Lagrangian function has more than one saddle point. Moreover, the selection of particular algorithms for solving the corresponding approximate auxiliary problems is more restricted. On the other hand, for a straightforward numerical solution of the approximate auxiliary problems of the original problem, the typically simple structure of the feasible sets K allows us to apply fast-convergent methods.
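A classical iterative method for saddle point problems of type (8.1.12) is Uzawa's algorithm: minimize L(·, λ) for fixed λ, then perform a projected gradient-ascent step on the multiplier. The following finite-dimensional sketch uses assumed toy data (Q, f, B, g and the step size ρ are illustrative choices, not the discretized problem (8.1.1)):

```python
import numpy as np

# Saddle point of L(u, lam) = 1/2 u'Qu - f'u + lam'(g - Bu) with lam >= 0,
# a finite-dimensional analogue of (8.1.12)-(8.1.13).
Q = np.eye(2)                       # objective Hessian (toy choice)
f = np.zeros(2)
B = np.array([[1.0, 0.0]])          # one constraint: u_1 >= 1
g = np.array([1.0])

lam = np.zeros(1)
rho = 0.5                           # multiplier step size (must be small enough)
for _ in range(100):
    u = np.linalg.solve(Q, f + B.T @ lam)            # u = argmin_u L(u, lam)
    lam = np.maximum(0.0, lam + rho * (g - B @ u))   # projected ascent in lam
# KKT solution of this toy problem: u* = (1, 0), lam* = 1.
```

In the mixed finite element context, u corresponds to the coefficients in V_{h_k} and λ to the piecewise constant multipliers in Λ^+_{h_k}; the `maximum` operation realizes the projection onto the non-negative cone.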

8.2 Contact Problems Without Friction

In this chapter the abstract MSR-method described in Section 4.3 will be specified in order to obtain stable solution algorithms for some elliptic variational inequalities in elasticity theory. To this end a straightforward finite element approximation of the original problems will be used.

Our attention is mainly devoted to the contact problem of two elastic bodies (two-body contact problem) and to the equilibrium of an elastic body supported by an absolutely rigid surface (Signorini problem). Both problems will be considered in the plane and without friction. These plane problems have been investigated intensively, for instance, by Fichera [114] and Hlavacek et al. [182] (see also the references therein). As will be seen in the sequel, these problems are ill-posed.

In Section 8.3, in connection with an algorithm of alternating iterations suggested by Panagiotopoulos [315] for contact problems with friction, we shall also investigate the application of IPR to elliptic variational inequalities corresponding to the minimization of non-smooth energy functionals.

For notions of elasticity theory the reader is referred to the monographs of Hahn [166] and Gladwell [131].

We briefly review some results, given for instance in Duvaut and Lions [95], Panagiotopoulos [316] and Hlavacek and Lovisek [183], on the existence, uniqueness and characterization of the solutions. We mainly use the notation and terminology introduced in [182].

8.2.1 Formulation of the problems

All the problems studied here are considered in the framework of the theory of small deformations under the linear Hooke's law for anisotropic materials and a constant field of temperature. Moreover, it is assumed that the initial stresses and strains are equal to zero.

Contact problem: We first formulate the boundary value problem describing the contact of two elastic bodies, supposing that in the initial situation these bodies occupy bounded domains Ω′ ⊂ R² and Ω′′ ⊂ R² with Lipschitz-continuous boundaries Γ′ := ∂Ω′ and Γ′′ := ∂Ω′′. Throughout, the superscripts ′ and ′′ in the notation correspond to the bodies Ω′ and Ω′′, respectively. The bodies are in contact along a part Γ_c := ∂Ω′ ∩ ∂Ω′′ of the boundary, the so-called contact zone, and their shapes and mutual positions exclude an enlargement of the contact zone Γ_c during the deformation. The partitions of the boundaries
\[
\partial\Omega' := \Gamma_u \cup \Gamma'_\tau \cup \Gamma_c\,, \qquad \partial\Omega'' := \Gamma_0 \cup \Gamma''_\tau \cup \Gamma_c\,,
\]
with meas Γ_u > 0 and meas Γ_c > 0, are assumed to be known.

Figure 8.2.1: Two-body contact problem: bodies Ω′ and Ω′′, boundary parts Γ_u, Γ′_τ, Γ′′_τ, Γ_0, contact zone Γ_c, and external forces P′, P′′.

The body Ω′ is fixed on Γ_u, and external forces P′ and P′′ are given on Γ′_τ and Γ′′_τ, see Figure 8.2.1.

On the boundary Γ_0 a zero displacement u_ν in the direction of the outer normal ν and a zero tangential stress T_t are assumed. The case Γ_0 = ∅ is permitted. Typically, conditions of this type are required on the symmetry axis of the system and can be reformulated as boundary conditions if an equivalent problem for the "half-system" is considered.

Under the above assumptions, taking into account the influence of body forces F, we consider a vector field of displacements u = (u_1(x), u_2(x)) of the points x ∈ Ω′ ∪ Ω′′ corresponding to the equilibrium state, as well as the tensor field of strains
\[
\varepsilon_{kl}(u) := \frac12\,\Big( \frac{\partial u_k}{\partial x_l} + \frac{\partial u_l}{\partial x_k} \Big), \quad k, l = 1, 2. \tag{8.2.1}
\]
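Formula (8.2.1) states that the strain tensor is the symmetric part of the displacement gradient. As a quick numerical illustration (the affine displacement field and the evaluation point below are arbitrary choices, not data from the text), central finite differences reproduce this for u(x) = Mx:

```python
import numpy as np

# Strain tensor (8.2.1): eps_kl = (du_k/dx_l + du_l/dx_k) / 2,
# approximated here by central finite differences.
M = np.array([[0.10, 0.30],
              [-0.20, 0.05]])       # arbitrary displacement gradient
u = lambda x: M @ x                 # affine displacement field u(x) = M x

def strain(u, x, h=1e-6):
    J = np.zeros((2, 2))
    for l in range(2):
        e = np.zeros(2)
        e[l] = h
        J[:, l] = (u(x + e) - u(x - e)) / (2 * h)   # column l: du/dx_l
    return 0.5 * (J + J.T)          # symmetric part of the gradient

eps = strain(u, np.array([0.4, 0.7]))
# For an affine field the result coincides with 0.5 * (M + M.T).
```

For an affine field the central difference is exact up to rounding, so the computed strain agrees with sym(M) to machine accuracy.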

The components of the stress tensor τkl are given according to Hooke’s law:

τkl := aklpmεpm, k, l = 1, 2,

with aklpm the coefficients of elasticity. As usual in elasticity theory, here and in the sequel we follow Einstein's summation convention, i.e., summation is performed over repeated indices.
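For readers who want to experiment with these definitions, the following sketch evaluates the strain (8.2.1) and the stress τkl = aklpm εpm for the special isotropic case aklpm = λδklδpm + μ(δkpδlm + δkmδlp); the Lamé parameters λ, μ and the displacement gradient below are made-up illustration data, not part of the text.

```python
# Sketch: linearized strain (8.2.1) and Hooke's law tau_kl = a_klpm * eps_pm
# for an isotropic material with Lame parameters lam, mu (an assumed special
# case of the general anisotropic coefficients a_klpm of the text).

def strain(grad_u):
    """Linearized strain eps_kl = (d u_k/d x_l + d u_l/d x_k)/2 from the 2x2
    displacement gradient grad_u[k][l] = d u_k / d x_l."""
    return [[0.5 * (grad_u[k][l] + grad_u[l][k]) for l in range(2)]
            for k in range(2)]

def isotropic_a(lam, mu):
    """Coefficients a_klpm = lam*d_kl*d_pm + mu*(d_kp*d_lm + d_km*d_lp),
    which satisfy the symmetry relations (8.2.2)."""
    d = lambda i, j: 1.0 if i == j else 0.0
    return [[[[lam * d(k, l) * d(p, m)
               + mu * (d(k, p) * d(l, m) + d(k, m) * d(l, p))
               for m in range(2)] for p in range(2)]
             for l in range(2)] for k in range(2)]

def stress(a, eps):
    """Einstein summation over the repeated indices p, m."""
    return [[sum(a[k][l][p][m] * eps[p][m] for p in range(2) for m in range(2))
             for l in range(2)] for k in range(2)]

grad_u = [[1e-3, 2e-3], [0.0, -1e-3]]   # some small displacement gradient
eps = strain(grad_u)                     # symmetric strain tensor
tau = stress(isotropic_a(lam=1.0, mu=0.5), eps)
```

Since tr(eps) = 0 for this gradient, the stress reduces to 2μ·eps here; with anisotropic coefficients the full summation would contribute.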

The functions aklpm are assumed to be measurable and bounded on Ω = Ω′ ∪ Ω′′. Moreover, the symmetry property

aklpm = alkpm = apmkl (8.2.2)

as well as the ellipticity property have to be satisfied, i.e., there exists a positive constant c0 such that

aklpm(x)σklσpm ≥ c0σklσpm (8.2.3)

for all symmetric matrices (σkl)k,l=1,2 and almost every x ∈ Ω.

According to the linear elasticity theory the stress tensor has to fulfil the equilibrium conditions

∂τkl/∂xl + Fk = 0 in Ω, k = 1, 2. (8.2.4)

Body Ω′ being fixed on Γu means that

u = 0 on Γu (8.2.5)

and the two-sided contact conditions on Γ0 can be described by

uν = 0, Tt = 0 on Γ0, (8.2.6)

with uν := ukνk, Tt := τklνktl, t := (−ν2, ν1) or t := (ν2, −ν1), and ν a unit outward normal vector to Γ0.

On Γ′τ and Γ′′τ the conditions

τ′kl ν′l = P′k, τ′′kl ν′′l = P′′k (8.2.7)

have to be satisfied for k = 1, 2.

Finally, the following conditions are valid on Γc:

u′ν′ + u′′ν′′ ≤ 0, (8.2.8)


meaning that penetration of the points of one body into the other is impossible.

From Newton's third law (actio = reactio) and the absence of friction and stretching of the bodies, we get the conditions

T′t = T′′t = 0,

T′ν′ = T′′ν′′ ≤ 0, (8.2.9)

(u′ν′ + u′′ν′′) T′ν′ = 0,

where Tν denotes the normal component of the force T.

Condition (8.2.8) holds for the contact problem under the standard assumption that the radius of curvature of the arc Γc is large in comparison with its length.
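At each point of the contact zone the relations (8.2.8) and (8.2.9) form a complementarity system: the gap u′ν′ + u′′ν′′ is non-positive, the normal traction is non-positive, and their product vanishes. A minimal pointwise checker, with made-up sample values, can sketch this:

```python
# Sketch: pointwise check of the frictionless contact conditions (8.2.8)-(8.2.9).
# gap[i] stands for u'_nu'(x_i) + u''_nu''(x_i) and traction[i] for the normal
# traction T'_nu'(x_i) = T''_nu''(x_i); the numbers are made-up sample values.

def contact_violation(gap, traction):
    """Largest violation of non-penetration (gap <= 0), compression
    (traction <= 0) and complementarity (gap * traction = 0)."""
    viol = 0.0
    for g, t in zip(gap, traction):
        viol = max(viol, g, t, abs(g * t))
    return viol

gap      = [0.0, -1e-4, 0.0]     # contact closed, open, closed
traction = [-2.5,  0.0, -0.7]    # compression only where contact is closed
v = contact_violation(gap, traction)
```

A positive gap (penetration), a tensile traction, or a nonzero product on an open contact would all make the returned violation positive.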

Signorini problem: This problem can be represented as a special case of the two-body contact problem, where Ω′ is supposed to be an absolutely rigid body. The transition from the two-body contact problem to this formulation is not difficult; we only have to reformulate the conditions on the boundary Γc:

u′′ν′′ ≤ 0, T′′t = 0, T′′ν′′ ≤ 0, u′′ν′′ T′′ν′′ = 0 on Γc. (8.2.10)

In the sequel we formulate the Signorini problem so that the elastic body is denoted by Ω and Γc denotes the support part of the boundary. The primes ′′ will be omitted.

A vector-function u satisfying the equilibrium system (8.2.4) and the boundary conditions (8.2.5)–(8.2.9) can be interpreted as a classical solution of the two-body contact problem. We use this notion analogously for the Signorini problem.

8.2.1.1 Variational formulations of the problems

The variational formulation of these problems corresponds to the principle of minimal potential energy.

Concerning the two-body contact problem, we introduce the space of virtual displacements

V := {u = (u′, u′′) ∈ [H1(Ω′)]2 × [H1(Ω′′)]2 : u′ = 0 on Γu, u′′ν = 0 on Γ0}, (8.2.11)

with the norm

‖|u‖|1,Ω := √(‖|u′‖|²1,Ω′ + ‖|u′′‖|²1,Ω′′)

and the set of admissible displacements

K := {u ∈ V : u′ν + u′′ν ≤ 0 on Γc}. (8.2.12)

Variational formulation of the two-body contact problem:

Page 308: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

292 CHAPTER 8. PPR FOR VARIATIONAL INEQUALITIES

Minimize the functional of potential energy

J(u) := (1/2) a(u, u) − ℓ(u)

s.t. u ∈ K := {v ∈ V : v′ν + v′′ν ≤ 0 on Γc}, (8.2.13)

where

a(u, v) := ∫Ω aklpm εkl(u) εpm(v) dΩ, (8.2.14)

ℓ(v) := ∫Ω Fk vk dΩ + ∫Γτ Pk vk dΓ, Γτ := Γ′τ ∪ Γ′′τ. (8.2.15)

A minimizer u∗ of Problem (8.2.13) is called a weak solution of the two-body contact problem.

8.2.1 Theorem. Each classical solution of the two-body contact problem is a weak solution. If a weak solution is sufficiently smooth, then it is a solution in the classical sense.

Concerning the Signorini problem, we suppose that the body occupies in its initial state a bounded domain Ω with a fixed partition of the Lipschitz-continuous boundary Γ := ∂Ω:

Γ = Γ0 ∪ Γc ∪ Γτ.

Vector-functions of body forces F ∈ [L2(Ω)]2 and external forces P ∈ [L2(Γτ)]2 (acting on the boundary Γτ) are given. Under the previously made assumptions on the elasticity coefficients aklpm, we consider the following problem:

Variational formulation of the Signorini problem:

Minimize the functional of potential energy

J(u) := (1/2) a(u, u) − ℓ(u)

s.t. u ∈ K := {v ∈ [H1(Ω)]2 : vν = 0 on Γ0, vν ≤ 0 on Γc}, (8.2.16)

with

a(u, v) := ∫Ω aklpm εkl(u) εpm(v) dΩ, (8.2.17)

ℓ(v) := ∫Ω Fk vk dΩ + ∫Γτ Pk vk dΓ. (8.2.18)

The meaning of εkl and vν is the same as in Problem (8.2.13). The connection between the solutions of this problem and the classical one is analogous to Theorem 8.2.1.

8.2.1.2 Solvability of the problems

Now, we turn to the question of solvability and characterization of the optimal sets of the problems under consideration.


In the case of the two-body contact problem the kernel 𝒦 of the bilinear form (8.2.14) on the space Z := [H1(Ω′)]2 × [H1(Ω′′)]2 consists of the elements z = (z′, z′′), where z′ and z′′ are vector-functions with the components

z′1(x) := a′1 − b′x2, z′2(x) := a′2 + b′x1,

z′′1(x) := a′′1 − b′′x2, z′′2(x) := a′′2 + b′′x1,

with arbitrary coefficients a′1, a′2, b′ and a′′1, a′′2, b′′. (To distinguish it from the feasible set K, the kernel is denoted by 𝒦 in the sequel.) Since the body Ω′ is fixed on the part Γu of the boundary, its rigid displacement is impossible¹.

For ℓ defined according to (8.2.15) it is easy to see that

ℓ(y) ≤ 0 ∀ y ∈ K ∩ 𝒦.

This is a necessary condition for the existence of a solution of Problem (8.2.13).
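The kernel elements above are exactly the linearized rigid displacements. A quick numeric check, with an arbitrary rotation coefficient b chosen only for illustration, confirms that their symmetrized gradient, i.e. the strain (8.2.1), vanishes identically, so they contribute nothing to the bilinear form:

```python
# Sketch: the rigid displacement z1(x) = a1 - b*x2, z2(x) = a2 + b*x1 has
# vanishing linearized strain eps_kl(z) = (d z_k/d x_l + d z_l/d x_k)/2,
# hence it lies in the kernel of the bilinear form a(.,.).

def rigid_gradient(b):
    """Exact (constant) displacement gradient of the rigid field:
    grad z = [[0, -b], [b, 0]] is skew-symmetric."""
    return [[0.0, -b], [b, 0.0]]

def strain(grad_u):
    return [[0.5 * (grad_u[k][l] + grad_u[l][k]) for l in range(2)]
            for k in range(2)]

eps = strain(rigid_gradient(b=3.7))
# the symmetric part of a skew-symmetric gradient vanishes identically
```

The translation parts a1, a2 have zero gradient and drop out immediately, so only the rotation part needs checking.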

8.2.2 Theorem. If the conditions

ℓ(y) ≤ 0 ∀ y ∈ K ∩ 𝒦,

ℓ(y) < 0 ∀ y ∈ K ∩ 𝒦 with inf_{x∈Γc} (y′′ν′′(x) + y′ν′(x)) < 0

hold, then Problem (8.2.13) has at least one solution u∗. Moreover, the solution set has the structure

U∗ = {u∗ + y : y ∈ V ∩ 𝒦, u∗ + y ∈ K, ℓ(y) = 0},

with u∗ fixed.

If the boundary part Γ0 = ∅, then the dimension of the subspace V ∩ 𝒦 of virtual rigid displacements of the body Ω′′ is equal to three. In case meas Γ0 > 0 we have dim(V ∩ 𝒦) ≤ 1.

For the Signorini problem the kernel 𝒦 of the bilinear form (8.2.17) on the space [H1(Ω)]2 is a three-dimensional subspace consisting of the vector-functions

z1(x) = a1 − bx2, z2(x) = a2 + bx1.

If there exists a solution of Problem (8.2.16), then the functional (8.2.18) has to be non-positive on the set K ∩ 𝒦.

8.2.3 Theorem. Let for Problem (8.2.16) the following conditions be fulfilled:

ℓ(y) ≤ 0 ∀ y ∈ K ∩ 𝒦,

ℓ(y) < 0 ∀ y ∈ K ∩ 𝒦 satisfying inf_{x∈Γc} yν(x) < 0.

Then Problem (8.2.16) is solvable and, with a fixed solution u∗, the optimal set can be expressed by

U∗ = {u∗ + y : y ∈ V ∩ 𝒦, u∗ + y ∈ K, ℓ(y) = 0},

with

V := {u ∈ [H1(Ω)]2 : uν = 0 on Γ0}. (8.2.19)

¹A displacement is called rigid if the distance between any two points of the body does not change.


As in the two-body contact problem, dim(V ∩ 𝒦) = 3 for Γ0 = ∅, and this dimension equals zero or one in case meas Γ0 > 0.

Applying the finite element method to the variational formulations of both problems considered, convergence of the corresponding minimizing sequences in the norm of the space V can be established only in the following particular cases (i)–(ii):

(i) dim(V ∩ 𝒦) ≤ 1,

(ii) either Korn's inequality (see (8.2.55)) is fulfilled on V, or a subspace V1 ⊂ V can be chosen such that U∗ ∩ (K ∩ V1) ≠ ∅ and an analogue of Korn's inequality holds on V1 (cf. [182], [158]).

In case dim(V ∩ 𝒦) = 3 finite element approximations of the dual problem are commonly used. The corresponding Lagrange multipliers represent the components of the uniquely defined stress tensor. However, the description of the feasible set of the dual problem is considerably more complicated, so that sophisticated approximation techniques, based on equilibrium models for the finite elements, have to be applied.

8.2.2 Finite element approximation of contact problems

Now we consider the application of finite element methods to Problems (8.2.13) and (8.2.16). For the sake of simplicity, let us assume that the domains Ω′ and Ω′′ (or Ω in the Signorini problem) are bounded and polyhedral, so that we can apply the FEM scheme given in Appendix A2.2.

In this situation the contact boundary Γc of the two-body contact problem can be described as

Γc = ∪_{j=1}^{m} Γc,j, (8.2.20)

with Γc,j denoting closed straight-line intervals. Points at which the type of the boundary condition changes, and vertices belonging to the polyhedra Ω′ and Ω′′, have to be included in the vertex set of the corresponding triangulations T′h or T′′h for each triangulation parameter h. End-points of the intervals Γc,j have to be common vertices of both triangulations. Under these conditions we call the system of triangulations of each domain consistent with the partition of the boundary.

Moreover, vertices located on Γc are common vertices of T′h and T′′h, and the quasi-uniformity of the triangulation sequences {T′hk} and {T′′hk} must be ensured according to Definition A2.2.2. It is easy to see that in this case the sequence {Thk} with Thk := T′hk ∪ T′′hk forms a quasi-uniform triangulation of the domain Ω = Ω′ ∪ Ω′′.

Vertices belonging to T′h and T′′h will be denoted by πk. The index sets I′h, I′′h, Iuh, I0h and Ic,jh indicate that the points πk belong to the sets Ω′, Ω′′, Γu, Γ0 and Γc,j, respectively.

In order to simplify the description, we assume that Γ0 = ∅ or that Γ0 is an interval. In the latter case ν0 denotes a unit outward normal to Γ0, and νj is a unit normal to Γc,j, pointing to the exterior of Ω′.

For a fixed pair of triangulations T′h and T′′h the finite element space Vh of vector-functions

vh = (v′h, v′′h) ∈ ([C(Ω′)]2 × [C(Ω′′)]2) ∩ V


is defined by

v′h(x) := Σ_{k∈I′h\Iuh} αk ϕk(x), αk ∈ R2, (8.2.21)

v′′h(x) := Σ_{k∈I′′h} βk ηk(x), βk ∈ R2. (8.2.22)

Here, ϕk ∈ C(Ω′) and ηk ∈ C(Ω′′) are affine functions on each element of the corresponding triangulation and

ϕk(πk) = 1, ϕk(πj) = 0, j ≠ k, (k ∈ I′h),

ηk(πk) = 1, ηk(πj) = 0, j ≠ k, (k ∈ I′′h).

Due to (8.2.12), the coefficients βk have to satisfy

⟨ν0, βk⟩R2 = 0, k ∈ I0h. (8.2.23)

The set Kh approximating K on the space

Vh := {vh = (v′h, v′′h) : ⟨ν0, βk⟩R2 = 0, k ∈ I0h}, (8.2.24)

can be expressed by

Kh := {vh ∈ Vh : ⟨νj, v′h(πk) − v′′h(πk)⟩R2 ≤ 0, k ∈ Ic,jh, j = 1, ..., m} (8.2.25)

or

Kh := {vh ∈ Vh : ⟨νj, αk − βk⟩R2 ≤ 0, k ∈ Ic,jh, j = 1, ..., m}. (8.2.26)

In this case the inclusion Kh ⊂ K holds. To prove this fact it is sufficient to remark that, on the one hand, v′h, v′′h as well as the functions describing the boundary conditions of Problem (8.2.13) on Γu, Γ0, Γc are linear on each interval which connects neighboring vertices on the boundaries, and, on the other hand, that the functions v′h, v′′h satisfy the boundary conditions at the vertices of the triangulation.

The approximate problem in the space Vh can be formulated simply as

min{J(uh) : uh ∈ Kh}, (8.2.27)

where J is defined according to (8.2.13)–(8.2.15) and the feasible set Kh is given by relation (8.2.25).

Concerning the Signorini problem, we use the same representation of the contact zone Γc as in (8.2.20) and assume that Γ0 = ∅ or that Γ0 is an interval. Again ν0 and νj are unit outward normals to Γ0 and Γc,j, respectively, and again a sequence of quasi-uniform triangulations of the domain Ω has to be constructed which is consistent with the partition of the boundary. For fixed Th the vertices of the triangulation are denoted by πk, and Ih, I0h, Ic,jh represent the index sets of the vertices located on Ω, Γ0 and Γc,j, respectively. On Th a system of finite elements {ϕk} is introduced as above, and with regard to (8.2.16) and (8.2.19) the space Vh is defined by

Vh := {vh = Σ_{k∈Ih} αk ϕk(x) : αk ∈ R2, ⟨ν0, αk⟩R2 = 0, k ∈ I0h} (8.2.28)


and the set Kh by

Kh := {vh ∈ Vh : ⟨νj, vh(πk)⟩R2 ≤ 0, k ∈ Ic,jh, j = 1, ..., m}, (8.2.29)

with

〈νj , vh(πk)〉R2 = 〈νj , αk〉R2 .

In this way we obtain an approximate problem of the form (8.2.27), where the functional J is given by (8.2.16)–(8.2.18) and the set Kh by (8.2.29). Using analogous arguments as above, one can show that again the inclusion Kh ⊂ K holds.
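In nodal terms, membership in the discrete Signorini set (8.2.29) is just a sign condition per contact vertex. A minimal sketch, with made-up normals and nodal values chosen only for illustration:

```python
# Sketch: the discrete Signorini set K_h of (8.2.29) constrains the nodal
# values alpha_k at the contact vertices: <nu_j, alpha_k> <= 0. The normals
# and nodal displacements below are made-up illustration data.

def in_Kh(alphas, normals, tol=1e-12):
    """alphas[k] is the R^2 nodal value at a contact vertex pi_k, normals[k]
    the outward unit normal nu_j of the segment the vertex lies on."""
    return all(n[0] * a[0] + n[1] * a[1] <= tol
               for a, n in zip(alphas, normals))

normals = [(0.0, 1.0), (0.0, 1.0)]                 # contact segment, normal e2
ok  = in_Kh([(0.3, -0.1), (0.0, 0.0)], normals)    # moves away / just touches
bad = in_Kh([(0.3,  0.2), (0.0, 0.0)], normals)    # would penetrate
```

Collecting one such row per contact vertex yields exactly the sparse linear inequality system described in the text below.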

For a more detailed description of the approximate problems we refer to the finite element analysis of the Problems (A2.1.3) and (A2.1.6) in Appendix A2.4.

If the above described sufficient conditions of solvability of the problems hold, cf. Theorems 8.2.2 and 8.2.3, then solvability of the approximate problems (8.2.27) follows immediately from the property Kh ⊂ K.

The case where the contact boundary is described by a convex, continuous function has been investigated in [182]. Here the triangulations of Ω′ and Ω′′ contain curved triangles Tj along the contact boundary such that

∪_{Tj∈T′h} Tj = Ω′, ∪_{Tj∈T′′h} Tj = Ω′′,

and the vertices πk on this boundary (k ∈ Ich) are common vertices of both triangulation systems T′h and T′′h, which are assumed to be quasi-uniform. The space Vh can be defined according to (8.2.24) and

Kh := {vh ∈ Vh : ⟨ν(πk), v′h(πk) − v′′h(πk)⟩R2 ≤ 0, k ∈ Ich},

with ν(πk) a unit outward normal to Ω′.
Note that in this case the inclusion Kh ⊂ K is not valid in general.

Let us briefly consider the numerical solution methods which can be suggested for solving the approximate problems of type (8.2.27). We restrict ourselves to the case of a contact problem with two polyhedral bodies.
The approximate problem is a quadratic programming problem with a positive semi-definite Hessian of the objective functional, linear homogeneous inequality constraints (for πk ∈ Γc) and equality constraints (if πk ∈ Γ0). The coefficient matrix of the constraints has a special sparsity structure: the maximal number of non-zero elements does not exceed 4 in the rows and 2 in the columns corresponding to break-points of Γc, with entries 1 in the remaining columns.
The insertion of slack variables into the inequality constraints on Γc and a successive elimination of a part of the variables enable us to transform the problem into a convex quadratic program with simple constraints, such that finally only non-negativity of the slack variables is required.
Hence, either the special numerical methods described in Appendix A3.3 can be applied to the transformed problems with simple constraints, or standard quadratic programming algorithms are applicable to both the transformed and the non-transformed problems.
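The reduction described above ends in a convex QP whose only constraints are non-negativity bounds, for which projection-type methods suffice. The following sketch applies a projected gradient iteration to a tiny toy problem; Q and c are made-up data, not an assembled finite element system:

```python
# Sketch: after slack insertion and elimination the discrete contact problem
# becomes a convex QP with simple bounds, min 1/2 x^T Q x - c^T x s.t. x >= 0,
# which a projected gradient method handles. Q, c are toy data.

def projected_gradient(Q, c, x, step, iters):
    n = len(x)
    for _ in range(iters):
        g = [sum(Q[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        x = [max(0.0, x[i] - step * g[i]) for i in range(n)]  # project on x >= 0
    return x

Q = [[2.0, 0.0], [0.0, 1.0]]   # positive definite toy Hessian
c = [1.0, -1.0]                # unconstrained minimizer would be (0.5, -1)
x = projected_gradient(Q, c, [1.0, 1.0], step=0.4, iters=200)
# constrained minimizer is (0.5, 0): the second component hits the bound
```

With a merely positive semi-definite Q, as in the text, the iteration still decreases the objective but may drift along the kernel, which is precisely the motivation for the regularization discussed next.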


However, the numerical realization of these methods is connected with serious difficulties due to the inexact computation of the Hessian, which may lead to non-convex auxiliary problems. The influence of errors may be significant, in particular because in practice the discretization is often performed with considerably denser triangulations along the contact zone than in the other parts of the domain, i.e., one has to work with elements of very different size.

For solving such problems in a stable way, with an accuracy adjusted to the accuracy of the approximation, special quasi-Newton methods may be particularly suitable. In each iteration a (not strongly) convex objective functional has to be approximated by a quadratic functional with positive definite Hessian. Positive definiteness can be obtained by small variations of the diagonal elements in the framework of the usual quasi-Newton procedures.

However, this type of regularization, immanent in quasi-Newton methods applied to each approximate problem (8.2.27) (h fixed), seems less efficient than the use of an iterative regularization of the external iteration process (h ↓ 0), which also ensures strong convexity of the objective functional in the approximate problems.

8.2.3 Three Variants of MSR-Methods

In this subsection we apply the MSR-approach, described in Sections 4.3 and 5.1, to the solution of contact problems.

The family of sets Kk is assumed to be constructed by means of a finite element approximation of the feasible set K according to Subsection 8.2.2. For fixed triangulations T′hk and T′′hk (or Thk in the case of Signorini problems) we take Kk := Khk with a triangulation parameter hk.

Standard MSR-scheme:
According to the MSR-scheme in Section 4.3 we choose Jk := J, because the functionals (8.2.13) and (8.2.16) are differentiable. With regard to the form of the approximate problems (8.2.27), their regularization leads to the problems

min{J(u) + ‖u − uk,i−1‖²V : u ∈ Kk}. (8.2.30)
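The effect of the regularizing term in (8.2.30) can be seen on a two-dimensional toy functional whose Hessian is singular, its kernel playing the role of the rigid displacements: each auxiliary problem becomes strongly convex and the proximal iterates converge to one particular minimizer. Everything below is illustration data, not the contact problem itself.

```python
# Sketch: J(u) = 1/2*(u1 - u2 - 1)^2 has a positive semi-definite Hessian whose
# kernel mimics the rigid modes, so minimizers form a whole line; the proximal
# term ||u - u_prev||^2 of (8.2.30) makes each auxiliary problem strongly
# convex and selects one minimizer.

def prox_step(v):
    """Exact minimizer of 1/2*(u1 - u2 - 1)^2 + ||u - v||^2, obtained by
    setting the gradient to zero."""
    s = (1.0 + (v[0] - v[1])) / 2.0     # new value of the difference u1 - u2
    t = (s - 1.0) / 2.0
    return (v[0] - t, v[1] + t)

u = (5.0, 3.0)
for _ in range(60):
    u = prox_step(u)
# u1 - u2 -> 1 while u1 + u2 stays at its initial value 8
```

Each step preserves the component of the iterate in the kernel direction (1, 1) and contracts the residual u1 − u2 − 1 by the factor 1/2; the limit (4.5, 3.5) is the minimizer of J nearest to the starting point.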

In addition to this standard MSR-method the following two particular methods will be considered:

Regularization on the subspace:

V1 := {u = (u′, u′′) ∈ Z : u′ ≡ 0 in Ω′, u′′1(x) = a1 − bx2, u′′2(x) = a2 + bx1 in Ω′′}; (8.2.31)

(in the case of two-body contact problems) or

V1 := {u ∈ [H1(Ω)]2 : u1(x) = a1 − bx2, u2(x) = a2 + bx1 in Ω}; (8.2.32)

(in the case of Signorini problems), where the auxiliary problems are of the form

min{J(u) + ‖Π1u − Π1uk,i−1‖²V : u ∈ Kk}, (8.2.33)

with Π1 : V → V1 an ortho-projector according to the norm ‖|·‖|0,Ω.
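To make the ortho-projector Π1 concrete, the sketch below computes a least-squares fit of the rigid modes z(x) = (a1 − b x2, a2 + b x1) on a point cloud centered at the origin, so that the three modes are mutually orthogonal and the normal equations decouple. This only mimics an ‖|·‖|0,Ω-type projection on sampled data and is illustration, not the operator of the text.

```python
# Sketch: discrete stand-in for the orthoprojector Pi_1 onto the rigid modes
# z(x) = (a1 - b*x2, a2 + b*x1), as a least-squares fit on centered points.

def project_rigid(points, u):
    """points: sample points x_i centered at the origin, u: displacement
    samples u(x_i). Returns the fitted (a1, a2, b) and the fitted values."""
    n = len(points)
    a1 = sum(v[0] for v in u) / n                    # mean of u1
    a2 = sum(v[1] for v in u) / n                    # mean of u2
    b = (sum(-x[1] * v[0] + x[0] * v[1] for x, v in zip(points, u))
         / sum(x[0] ** 2 + x[1] ** 2 for x in points))
    return a1, a2, b, [(a1 - b * x[1], a2 + b * x[0]) for x in points]

pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
# a field that is exactly rigid (a1=2, a2=-1, b=0.5) is recovered exactly
u = [(2.0 - 0.5 * x2, -1.0 + 0.5 * x1) for (x1, x2) in pts]
a1, a2, b, fit = project_rigid(pts, u)
```

For a general sampled field the same formulas return its best rigid approximation, i.e. the component that the regularization term in (8.2.33) acts on.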


Regularization with a weaker norm:
Instead of Problem (8.2.30) we have to solve

min{J(u) + ‖|Π1u − Π1uk,i−1‖|²0,Ω : u ∈ Kk}, (8.2.34)

with

Π1(u(x)) := 0 for x ∈ Ω′, Π1(u(x)) := u′′(x) for x ∈ Ω′′,

in the case of two-body contact problems and

Π1u := u

for Signorini problems.

8.2.3.1 Convergence analysis

The three methods above can be studied as particular realizations of the following basic approach.

8.2.4 Assumption. Let Y and H be two Hilbert spaces and let Y be continuously embedded into H (see Definition A1.3.7). Furthermore, we assume that

(i) a(·, ·) and b(·, ·) are continuous, symmetric bilinear forms on Y × Y and

a(u, u) ≥ b(u, u) ≥ 0 on Y ;

(ii) the objective functional J has the form

J(u) := (1/2) a(u, u) − ⟨f, u⟩, (f ∈ Y′ fixed);

(iii) Π1 : Y → Y1 is an ortho-projector in the norm of Y or H, with Y1 a subspace of Y;

(iv) for some β > 0 it holds

(1/2) b(u, u) + ‖Π1u‖²H ≥ β‖u‖²Y, ∀ u ∈ Y. (8.2.35)

Under this assumption we introduce on Y another norm |·|, defined by

|u|² := (1/2) b(u, u) + ‖Π1u‖²H. (8.2.36)

The space Y equipped with this norm will be denoted by Ȳ and its conjugate by Ȳ′. The norms ‖·‖Y and |·| are equivalent. Indeed, there exist constants c and M such that

‖u‖H ≤ c‖u‖Y, |b(u, v)| ≤ M‖u‖Y ‖v‖Y,


and for the ortho-projector Π1, defined according to the norm ‖·‖Y or ‖·‖H, we have

‖Π1u‖H ≤ c‖u‖Y. (8.2.37)

Finally, the lower bound in the following equivalence relation is exactly inequality (8.2.35), while the upper bound follows from the boundedness of b and of Π1:

β‖u‖²Y ≤ |u|² ≤ (M/2 + c²) ‖u‖²Y, ∀ u ∈ Y. (8.2.38)

For a functional J, defined according to Assumption 8.2.4(i),(ii), we consider the generalized problem

min{J(u) : u ∈ K}, (8.2.39)

with K a convex, closed subset of Y.

The corresponding family of regularized and approximate auxiliary problems

Ψk,i(u) := J(u) + ‖Π1u − Π1uk,i−1‖²H → min, (8.2.40)

u ∈ Kk, (8.2.41)

is constructed with convex, closed sets Kk ⊂ Y and with an ortho-projector Π1 satisfying Assumption 8.2.4(iii),(iv).

Now, we modify the MSR-method for Problem (8.2.39) as follows:

8.2.5 Method. (inexact MSR-method)
Given sequences {Ψk,i(·)} and {Kk} via (8.2.40) and (8.2.41), respectively, and {δk} and {εk} with

δk > 0, εk ≥ 0, lim_{k→∞} εk = 0.

For fixed k and i > 0 the points uk,i are generated by

‖∇Ψk,i(uk,i) − ∇Ψk,i(ūk,i)‖Y ≤ ε′k, (8.2.42)

with

ε′k ≤ (β/√(M² + c²)) εk (8.2.43)

and

ūk,i := arg min_{u∈Kk} Ψk,i(u). (8.2.44)

If ‖Π1uk,i − Π1uk,i−1‖H > δk, set i(k) := i + 1 and compute uk,i+1;
otherwise set i(k) := i, uk+1,0 := uk,i = uk,i(k) and compute uk+1,1. ♦
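The inner/outer control of Method 8.2.5 can be illustrated in one dimension with J(u) = (1/2)(u − 3)², Π1 = I and exact inner solves: the auxiliary problem min J(u) + (u − v)² has the closed-form minimizer (3 + 2v)/3, and the inner loop runs until the step length falls below δk. The δk values below are arbitrary illustration data.

```python
# Sketch: drastically simplified Method 8.2.5 for J(u) = 1/2*(u - 3)^2 in R.
# The proximal auxiliary problem min J(u) + (u - v)^2 is solved exactly.

def msr(u0, deltas):
    """Outer loop over the tolerances delta_k: repeat the proximal step
    until the step length drops below delta_k, then tighten the tolerance."""
    u = u0
    history = []
    for dk in deltas:
        while True:
            u_new = (3.0 + 2.0 * u) / 3.0   # exact minimizer of the prox problem
            step = abs(u_new - u)
            u = u_new
            if step <= dk:                  # inner stopping rule of Method 8.2.5
                break
        history.append(u)
    return u, history

u, hist = msr(0.0, deltas=[0.5, 0.1, 0.01, 0.001])
# the iterates approach the minimizer u* = 3 as delta_k decreases
```

In the genuine method the inner solves are only approximate within ε′k and the step is measured through Π1 in the H-norm; this toy keeps the control structure but none of those complications.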

Later on, in the applications, the dependence between ε′k and εk has to be specified.

8.2.6 Remark. Identifying Problem (8.2.39) with the two-body contact problem (8.2.13) or the Signorini problem (8.2.16) and supposing that the sets Kk result from a finite element approximation, we can specify Method 8.2.5 in order to apply it to these model problems. Recall that for the contact problems the space V is defined according to (8.2.11) or (8.2.19).


Identifying Y = V, Y1 = V, H = V, Π1 = I (the identity operator in V), Method 8.2.5 turns into the standard MSR-method 4.3.1. Moreover, taking b(u, v) := 0, we deal with ε′k = εk in (8.2.43).

The choice Y = V, H = V, Y1 = V1, with V1 defined according to (8.2.31) or (8.2.32) and with Π1 the corresponding ortho-projector, leads to a particular example of the method with regularization on the subspace (cf. Section 4.3.3), where the auxiliary problems are of the form (8.2.33). Here we have to suppose that

b(u, v) := c0 ∫Ω εkl(u) εkl(v) dΩ

(concerning the constant c0 see (8.2.3)).

Finally, in the case Y = V, Y1 = V, H = [L2(Ω)]2, with Π1 the ortho-projector from (8.2.34), Method 8.2.5 corresponds to the method of weak regularization, in which the iterates are computed by means of Problem (8.2.34). Again we have to take

b(u, v) := c0 ∫Ω εkl(u) εkl(v) dΩ,

and the orthoprojector Π1 has to be considered as an operator mapping from V into [L2(Ω)]2.

However, the validity of condition (8.2.35) has to be verified in each of the latter two cases. ♦

Let us return to Problem (8.2.39). In order to prove the convergence of Method 8.2.5, we need the following statement, which turns out to be a generalization of Lemma 4.3.13.

8.2.7 Lemma. Suppose that Assumption 8.2.4 is fulfilled, that C is a convex, closed subset of the space Y and that z0 ∈ Y is an arbitrarily chosen point. Let

z1 := arg min_{u∈C} {J1(u) + ‖Π1u − Π1z0‖²H}, (8.2.45)

with J1(u) := J(u) + j(u) and j(·) a convex lsc functional on Y.
Then for each u ∈ C the estimates

|z1 − u|² − |z0 − u|² ≤ −‖Π1z1 − Π1z0‖²H + J1(u) − J1(z1) (8.2.46)

and

|z1 − u| ≤ |z0 − u| + η(u) (8.2.47)

hold, where the norm |·| is defined by (8.2.36) and

η(u) := 0 if J1(u) ≤ J1(z1), η(u) := √(J1(u) − J1(z1)) otherwise.

If, moreover, ‖Π1z1 − Π1z0‖H ≥ δ and δ ≥ η(u), then

|z1 − u| ≤ |z0 − u| + (η²(u) − δ²) / (2|z0 − u|). (8.2.48)


Proof: With regard to (8.2.37), the bilinear form ⟨Π1u, Π1v⟩H is bounded on the space Y × Y. Thus, due to the choice of z1, it holds

j(u) − j(z1) + a(z1, u − z1) − ⟨f, u − z1⟩ + 2⟨Π1u − Π1z1, Π1z1 − Π1z0⟩H ≥ 0, ∀ u ∈ C. (8.2.49)

Using the definition of the norm |·|,

|z1 − u|² − |z0 − u|² = (1/2) b(z1, z1) − b(z1, u) + b(z0, u) − (1/2) b(z0, z0) + ‖Π1z1‖²H − ‖Π1z0‖²H − 2⟨Π1z1, Π1u⟩H + 2⟨Π1z0, Π1u⟩H

= (1/2) b(z1, z1) − b(z1, u) + b(z0, u) − (1/2) b(z0, z0) − ‖Π1z1 − Π1z0‖²H + 2⟨Π1z1 − Π1z0, Π1z1 − Π1u⟩H,

and in view of (8.2.49) and

a(u, u) ≥ b(u, u) ≥ 0, ∀ u ∈ Y,

one can conclude that

|z1 − u|² − |z0 − u|² ≤ (1/2) b(z1, z1) − b(z1, u) + b(z0, u) − (1/2) b(z0, z0) + (1/2) b(u, u) − (1/2) b(u, u) + j(u) − j(z1) + a(z1, u) − a(z1, z1) − ⟨f, u − z1⟩ − ‖Π1z1 − Π1z0‖²H

= (1/2) b(z1 − u, z1 − u) − (1/2) b(z0 − u, z0 − u) + j(u) − j(z1) + a(z1, u) − a(z1, z1) − ⟨f, u − z1⟩ − ‖Π1z1 − Π1z0‖²H

≤ (1/2) a(z1 − u, z1 − u) − (1/2) b(z0 − u, z0 − u) + j(u) − j(z1) + a(z1, u) − a(z1, z1) − ⟨f, u − z1⟩ − ‖Π1z1 − Π1z0‖²H

≤ −(1/2) a(z1, z1) + (1/2) a(u, u) + j(u) − j(z1) − ⟨f, u − z1⟩ − ‖Π1z1 − Π1z0‖²H

= J1(u) − J1(z1) − ‖Π1z1 − Π1z0‖²H.

Now, the inequalities (8.2.47) and (8.2.48) follow immediately from the latter relation.

Similarly as in the method with regularization on the subspace described in Section 4.3.3, the basic statements about convergence of MSR-method 4.3.1 can be extended to Method 8.2.5.

Due to the proofs of the Lemmata 4.3.3, 4.3.5 and of Theorem 4.3.6, we get the following result by means of Lemma 8.2.7.

8.2.8 Corollary. Assume that

Ψk,i(u) = J(u) + ‖Π1u − Π1uk,i−1‖²H

and Br := {u ∈ Y : |u| ≤ r}. Let the terms L(r), r∗, Q, Qk and Q∗, appearing in (4.3.4), be defined according to Br, Ψk,i and |·|.
Moreover, let the distances between points and sets in Assumption 4.2.2 also be measured in the norm |·|.
Then the statements of the Lemmata 4.3.3, 4.3.5 and of the Theorems 4.3.6 and 4.3.8 remain true for Method 8.2.5 if the norm ‖·‖ is replaced by |·| everywhere. Strong convergence of {uk,i} holds if dim Y1 < ∞.

Comments on the proof: The only substantial modification in the proofs of the statements mentioned above consists in the use of Lemma 8.2.7 (with j(·) ≡ 0) wherever Proposition 3.1.3 was applied before.

Now we make some comments on those parts of the proofs whose adaptation might cause some difficulties.

(1) In (4.3.6), (4.3.16) and (4.3.34) the norm ‖·‖ has to be replaced by |·|, and the points ũk,i (see the proof of Theorem 4.3.6) have to be defined by

ũk,i := arg min_{v∈Q} |v − uk,i|.

(2) Instead of inequality (4.3.8) we now obtain

|uk0,i − vk0|² − |uk0,i−1 − vk0|² ≤ −‖Π1uk0,i − Π1uk0,i−1‖²H + τk0, (8.2.50)

and, with regard to σk0 = 0, the value τk0 coincides with 2L(r)µk0.
Because of (8.2.42) and the strong convexity of Ψk,i(·) (with constant β) on Y, the estimate

‖uk,i − ūk,i‖Y ≤ (1/(2β)) ε′k

can easily be established. In this way, due to (8.2.37) and (8.2.43), for each index pair (k, i) we obtain

‖Π1uk,i − Π1ūk,i‖H ≤ (c/(2β)) ε′k = (c/(2√(M² + c²))) εk ≤ (1/2) εk. (8.2.51)

For 1 ≤ i < i(k0), due to

‖Π1uk0,i − Π1uk0,i−1‖H > δk0 > (1/2) εk0

and (8.2.50) and (8.2.51), it follows that

|uk0,i − vk0|² − |uk0,i−1 − vk0|² ≤ −(‖Π1uk0,i − Π1uk0,i−1‖H − εk0/2)² + τk0

and, instead of (4.3.10), it holds

|uk0,i − vk0| − |uk0,i−1 − vk0| ≤ (−ε²k0 + τk0) / (2|uk0,i−1 − vk0|). (8.2.52)

Formula (4.3.18) has to be transformed analogously.
In the other places where (4.3.9) was formerly used, we should now apply the inequality

|uk,i − ūk,i| ≤ (1/2) εk,

which is a conclusion of the relations (8.2.38), (8.2.43) and

‖uk,i − ūk,i‖Y ≤ (1/(2β)) ε′k.

(3) Instead of the inequality (4.3.30) we have to use

|uk,i − u∗∗|² − |u_Q^{k,i+1} − u∗∗|² ≥ J(u_Q^{k,i+1}) − J(u∗∗) + ‖Π1u_Q^{k,i+1} − Π1uk,i‖²H, (8.2.53)

which follows from Lemma 8.2.7.

8.2.3.2 Verification of condition (8.2.35)

The validity of inequality (8.2.35) is obvious for the standard MSR-method. For the other two methods, the method with regularization on a subspace and the method of weak regularization (see Remark 8.2.6), in the case of the two-body contact problem we make use of the relation

∫Ω′ εkl(u) εkl(u) dΩ ≥ c1 ‖|u‖|²1,Ω′ (c1 > 0), (8.2.54)

which reflects the equivalence between the seminorm

[u] := √(∫Ω′ εkl(u) εkl(u) dΩ)

and the norm ‖|u‖|1,Ω′ on the space {u ∈ [H1(Ω′)]2 : u|Γu = 0} if meas Γu > 0 (cf. Duvaut and Lions [95], Chapt. 3, and Ciarlet [75], Sect. 1.2).

Together with the second Korn inequality, used on the set Ω′′,

∫Ω′′ εkl(u) εkl(u) dΩ + ∫Ω′′ uk uk dΩ ≥ c2 ‖|u‖|²1,Ω′′ (c2 > 0), (8.2.55)

see, for instance, Fichera [113], Chapt. 1, and [95], Chapt. 3, relation (8.2.54) enables us immediately to establish the validity of condition (8.2.35) with

β := min[c1, c2] min[c0/2, 1]

in case the method with weak regularization is considered. Indeed, due to (8.2.54) and (8.2.55) we get

(1/2) b(u, u) + ‖Π1u‖²H = (c0/2) ∫Ω εkl(u) εkl(u) dΩ + ∫Ω′′ uk uk dΩ

= (c0/2) ∫Ω′ εkl(u) εkl(u) dΩ + (c0/2) ∫Ω′′ εkl(u) εkl(u) dΩ + ∫Ω′′ uk uk dΩ

≥ (c0 c1 / 2) ‖|u‖|²1,Ω′ + min[c0/2, 1] c2 ‖|u‖|²1,Ω′′.

The corresponding analysis for the method with regularization on the subspace, which we carry out now, proves to be more complicated.


Let Θ1 be the ortho-projector mapping from [L2(Ω′′)]2 onto the linear set of kernel functions

𝒦′′ := {z : z1(x) = a1 − bx2, z2(x) = a2 + bx1 on Ω′′},

with a1, a2, b arbitrary real numbers, and let Θ := I − Θ1, with I the identity operator on [L2(Ω′′)]2.
Due to the projection, we get

∫Ω′′ εkl(u) εkl(u) dΩ = ∫Ω′′ εkl(Θu) εkl(Θu) dΩ, ∀ u ∈ [H1(Ω′′)]2. (8.2.56)

Now, let us show that for some c3 > 0 the following inequality is satisfied:

∫Ω′′ εkl(u) εkl(u) dΩ ≥ c3 ‖|Θu‖|²1,Ω′′ ∀ u ∈ [H1(Ω′′)]2. (8.2.57)

Supposing this is wrong, a sequence {uj} ⊂ [H1(Ω′′)]2 exists such that for vj := Θuj, due to (8.2.56), the relations

‖|vj‖|²1,Ω′′ = 1, (8.2.58)

lim_{j→∞} ∫Ω′′ εkl(vj) εkl(vj) dΩ = 0, (8.2.59)

lim_{j→∞} ∫Ω′′ εkl(uj) εkl(uj) dΩ = 0

are true. Without loss of generality, we assume here that the sequence {vj} converges weakly in [H1(Ω′′)]2.

Then, with regard to the compact embedding [H1(Ω′′)]2 ↪ [L2(Ω′′)]2, the sequence {vj} converges in the norm of the space [L2(Ω′′)]2 to some element v ∈ [H1(Ω′′)]2.
On account of the second Korn inequality (8.2.55), the estimate

‖|vj+p − vj‖|²1,Ω′′ ≤ (1/c2) (∫Ω′′ εkl(vj+p − vj) εkl(vj+p − vj) dΩ + ‖|vj+p − vj‖|²0,Ω′′) (8.2.60)

is fulfilled for all j and p.
But, due to (8.2.59) and the strong convergence of {vj} in [L2(Ω′′)]2, inequality (8.2.60) implies that {vj} converges to v in the norm of the space [H1(Ω′′)]2. Hence, according to (8.2.58), we get

‖|v‖|²1,Ω′′ = 1. (8.2.61)

On the one hand, the relation

∫Ω′′ εkl(v) εkl(v) dΩ = 0,

following from (8.2.59) and lim_{j→∞} ‖|vj − v‖|1,Ω′′ = 0, means that v ∈ 𝒦′′ (cf. Necas and Hlavacek [307]).
On the other hand, since lim_{j→∞} ‖|vj − v‖|0,Ω′′ = 0 and

((vj, z))0,Ω′′ = 0 ∀ z ∈ 𝒦′′,


we obtain

((v, z))0,Ω′′ = 0 ∀ z ∈ 𝒦′′.

Hence v = 0, but this contradicts (8.2.61).

Due to the definition of the projectors Π1 and Θ1,

Π1u|Ω′′ = Θ1u′′, Π1u|Ω′ = 0,

and with regard to (8.2.57), for each u ∈ [H1(Ω′′)]2 the inequality

(c0/4) ∫Ω′′ εkl(u) εkl(u) dΩ + ‖|Π1u‖|²1,Ω′′ ≥ (c0 c3 / 4) ‖|Θu‖|²1,Ω′′ + ‖|Π1u‖|²1,Ω′′

≥ (c0 c3 / 4) ‖|Θu‖|²0,Ω′′ + ‖|Θ1u‖|²0,Ω′′ ≥ min[c0 c3 / 4, 1] ‖|u‖|²0,Ω′′

holds true. Therefore,

(c0/2) ∫Ω′′ εkl(u) εkl(u) dΩ + ‖|Π1u‖|²1,Ω′′ ≥ (c0/4) ∫Ω′′ εkl(u) εkl(u) dΩ + min[c0 c3 / 4, 1] ‖|u‖|²0,Ω′′

and, in view of (8.2.55), we finally obtain

(c0/2) ∫Ω′′ εkl(u) εkl(u) dΩ + ‖|Π1u‖|²1,Ω′′ ≥ c4 ‖|u‖|²1,Ω′′ (c4 > 0).

Now, relation (8.2.35) follows immediately from the latter inequality together with (8.2.54).

The same arguments enable us also to conclude that inequality (8.2.35) holds true if the considered regularization methods are applied to the Signorini problem.

Let us turn to the question about the rate of convergence of Method 8.2.5. Regarding the proofs of the Theorems 5.1.1, 5.1.3, 5.1.5 and 5.1.7, we can establish the following result.

8.2.9 Corollary. The statements on the rate of convergence of the MSR-method 4.3.1 remain true also for Method 8.2.5 if

(i) the alterations made in Corollary 8.2.8 are maintained;

(ii) in the Assumptions 5.1.4, 5.1.6 and in condition (i) of Theorem 5.1.1 the distance function ρ is defined according to the norm of the space Y (cf. (8.2.36)).

Comments on the proof: We start with the beginning of the proof of Theorem 5.1.1. Let the points v^{k,i}, v̄^{k,i}, w^{k,0} be determined with respect to the norm |·| defined by (8.2.36), and let the function η_{k,i} be expressed by

    η_{k,i}(λ) = λJ(v^{k,i}) + (1 − λ)J(u^{k,i}) + λ² ‖Π₁v^{k,i} − Π₁u^{k,i}‖²_H.

Then, using the relations

    |ū^{k,i} − u^{k,i}| ≤ ε_k/2  and  ‖Π₁u‖_H ≤ |u|,


306 CHAPTER 8. PPR FOR VARIATIONAL INEQUALITIES

we can conclude (as in the previous argumentation) that the inequalities (5.1.11), (5.1.12), (5.1.15), (5.1.16), (5.1.34) and (5.1.37) remain true if the term ‖u^{k,i+1} − u^{k,i}‖² is replaced by ‖Π₁u^{k,i+1} − Π₁u^{k,i}‖²_H.

Now the continuation of the proof requires only formal corrections based on these replacements.

Assumption 5.1.6 and especially Assumption 5.1.4 seem to be natural for variational inequalities. However, their verification causes serious difficulties. We shall consider this question in Section 8.4.

8.2.4 On the choice of variants of IPR-methods

In Section 4.3, by means of Example 4.3.15, it was shown that one can accelerate OSR- or MSR-processes if a more suitable variant than the standard regularization is used. In this example, regularization on a subspace coinciding with the kernel of the bilinear form of the objective functional proves to be more efficient.

Now we analyze this question for variational inequalities, considering the three variants of MSR-methods described in Section 8.2.3. First, we notice that the order of the condition number of the Hessians in the regularized and discretized problems is of the same magnitude in all three variants if the approximation is performed by finite element methods with quasi-uniform triangulation.

Applying regularization on a subspace (see Method 8.2.5, with Π₁ an ortho-projector onto the kernel of the bilinear form of the objective functional J), on rough triangulation grids the sequence {(I − Π₁)u^{k,i}} approaches (I − Π₁)u*, u* ∈ U*, substantially faster than the corresponding projections of the solutions obtained by the other two methods. Moreover, the values of the objective functional decrease faster, too.

8.2.10 Remark. In order to understand the advantages of the regularization on the subspace, we suggest analyzing all three variants for solving the simple model problem

    min_{u ∈ H¹(Ω)} J(u) := (1/2) ‖|∇u‖|²_{0,Ω} − ⟨f, u⟩_{0,Ω},

assuming that ⟨f, 1⟩_{0,Ω} = 0, K_k = K ∀ k, and Π₁ is the ortho-projector onto the kernel in the space L²(Ω). Supposing that U* ≠ ∅, this example should be considered for the following three cases:

(i) ‖Π₁u* − Π₁u^{1,0}‖_{0,Ω} is large;

(ii) ‖Π₁u* − Π₁u^{1,0}‖_{0,Ω} ≈ 0, but ‖u* − u^{1,0}‖_{0,Ω} is large;

(iii) ‖Π₁u* − Π₁u^{1,0}‖_{0,Ω} = 0. ♦

We emphasize that the additional numerical expense connected with the projection onto the kernel proves to be insignificant for contact problems: due to the structure of the kernel K, each projection operation requires only the solution of a system of three linear equations with respect to a₁, a₂ and b.


However, at the final stage of MSR-methods, especially if fast convergent algorithms are used for the solution of the auxiliary problems, the other two variants may prove more efficient. In case K_k ⊂ K, the transition from one regularization variant to another during the solution procedure does not require a correction of the triangulation parameter h_k and the accuracy parameter ε_k, but δ_k has to satisfy condition (8.2.76) with the corresponding new values L(r) and μ̄_k. Recall that L(r) and μ̄_k depend on the new norm in the space V, introduced according to the chosen regularization approach (see Section 8.2.3.1).

As we shall see in several numerical examples described later on in Subsections 8.2.7 and 8.3.3, experiments with MSR- and OSR-methods for solving special classes of Problem (8.1.1) show that weak regularization is preferable to the standard variant. This can be expected taking into account the structure of the kernel of the bilinear form.

The following purely analytical investigation emphasizes this advantage.

8.2.11 Example. Let Ω := {(x, y) : −π/2 < x, y < π/2}, V := H¹(Ω). Consider the variational inequality

    find u ∈ K :  ⟨Q(u), u − v⟩ ≥ 0  ∀ v ∈ K,

with

    K := V,  Q : u ↦ −Δu − f,

where f is given by f(x, y) := 2 sin x sin y. Obviously this is equivalent to the Neumann problem

    −Δu = 2 sin x sin y,  ∂u/∂n|_{∂Ω} = 0,

whose solution set is

    U* = {sin x sin y + d : d ∈ ℝ}.

Applying the exact proximal point method 4.2.1 with u¹ := 0, χ_k ≡ 1, we obtain a sequence {u^k} ⊂ H²(Ω), where u^{k+1} is the unique solution of the boundary value problem

    −2Δu + u = 2 sin x sin y − Δu^k + u^k,  ∂u/∂n|_{∂Ω} = 0.    (8.2.62)

It is not difficult to verify (by means of an immediate substitution) that

    u^k = a_k sin x sin y,

where (1 − a_k) = (3/5)(1 − a_{k−1}), a₁ = 0. With u* = sin x sin y ∈ U* one gets

    u* − u^k = (3/5)^{k−1} sin x sin y.

But, replacing in this method the classical regularizing functional

    h : u ↦ (1/2)‖u‖²_{H¹(Ω)}  by  h : u ↦ (1/2)‖u‖²_{L²(Ω)},


a sequence {v^k} ⊂ H²(Ω), v¹ := u¹ = 0, is generated, where v^{k+1} is the unique solution of the problem

    −Δv + v = 2 sin x sin y + v^k,  ∂v/∂n|_{∂Ω} = 0.    (8.2.63)

Here

    v^k = b_k sin x sin y

holds, with (1 − b_k) = (1/3)(1 − b_{k−1}), b₁ := a₁ = 0. Hence,

    u* − v^k = (1/3)^{k−1} sin x sin y.

Comparing these solutions numerically, we have, for instance,

    ‖u* − v⁶‖ / ‖u* − u⁶‖ ≈ 0.053,  ‖u* − v¹¹‖ / ‖u* − u¹¹‖ ≈ 0.0028, etc.

Moreover, under identical finite element approximations of the problems (8.2.62) and (8.2.63), the conditioning of the discretized problems for (8.2.63) is at least not worse than for (8.2.62). ♦

8.2.5 Special Case: Inner approximation of the set K

Throughout this section we suppose that Assumption 8.2.4 is fulfilled and Problem (8.2.39) is considered in the space Y (cf. (8.2.36)), i.e., statements on convergence of the methods are studied in this space. However, from the numerical point of view it makes sense to preserve relation (8.2.42) in its original form, i.e., in (8.2.42) the norm of the dual space Y′ is used.

If the sets K_k in the auxiliary problems (8.2.40), (8.2.41) have the property K_k ⊂ K ∀ k, convergence of Method 8.2.5 can be established under essentially weaker conditions on the controlling parameters than under those in the modified Theorems 4.3.6 and 4.3.8 (cf. also Corollary 8.2.8).

8.2.12 Assumption. Fix a solution u** ∈ U* of Problem (8.2.39) and suppose:

(i) for each k = 1, 2, ... the inclusion K_k ⊂ K and the inequality

    ρ(u**, K_k) ≤ μ_k    (8.2.64)

hold, with μ_k ↓ 0;

(ii) for a given r₀ ≥ sup_k μ_k the functional J satisfies a Lipschitz condition with constant L₀ on B_{r₀}(u**);

(iii) Σ_{k=1}^∞ √μ_k < ∞ and Σ_{k=1}^∞ ε_k < ∞ hold true, where ε_k is defined according to Method 8.2.5.

8.2.13 Lemma. Let Assumption 8.2.12 be fulfilled, let the radius r* be chosen such that r* ≥ 8r₀ and

    Σ_{k=1}^∞ ( √(L₀μ_k) + ε_k/2 + 2μ_k ) < r*/2.    (8.2.65)


Moreover, assume that the parameter δ_k satisfies the condition

    (1/(2r*)) ( L₀μ_k − (δ_k − ε_k/2)² ) + ε_k/2 < 0,  k = 1, 2, ....    (8.2.66)

Then, for Method 8.2.5, starting with u^{1,0} ∈ B_{r*/4}(u**), the interior loop is finite, i.e., i(k) < ∞ for each k, and the inclusions

    u^{k,i} ∈ int B_{r*}(u**),  ū^{k,i} ∈ int B_{r*}(u**)

are valid for all pairs (k, i).

Proof: Consider a fixed pair (k, i) with i ≥ 1. In view of K_k ⊂ K and (8.2.44), the relation J(u**) ≤ J(u^{k,i}) holds. Choosing v^k ∈ K_k such that |v^k − u**| ≤ μ_k, Assumption 8.2.12(ii) provides that

    J(v^k) ≤ J(u**) + L₀μ_k,

hence

    J(v^k) ≤ J(u^{k,i}) + L₀μ_k.    (8.2.67)

Following the proof of Lemma 4.3.5, one can establish

    δ_k − ε_k/2 > √(L₀μ_k).

In case i < i(k), with regard to (8.2.51) (see the comment on the proof of Corollary 8.2.8), it is possible to conclude that

    ‖Π₁u^{k,i} − Π₁u^{k,i−1}‖_H ≥ ‖Π₁ū^{k,i} − Π₁u^{k,i−1}‖_H − ‖Π₁ū^{k,i} − Π₁u^{k,i}‖_H > √(L₀μ_k).

Taking into account the inequalities (8.2.67) and (8.2.48), the use of Lemma 8.2.7 with C := K_k, z⁰ := u^{k,i−1} and u := v^k leads to

    |u^{k,i} − v^k| ≤ |u^{k,i−1} − v^k| + ( L₀μ_k − (δ_k − ε_k/2)² ) / ( 2|u^{k,i−1} − v^k| )  for 1 ≤ i < i(k).    (8.2.68)

If i = i(k), Lemma 8.2.7 gives immediately

    |u^{k,i(k)} − v^k| < |u^{k,i(k)−1} − v^k| + √(L₀μ_k).    (8.2.69)

Hence, in view of

    |ū^{k,i} − u^{k,i}| ≤ ε_k/2,

the inequalities

    |ū^{k,i} − v^k| ≤ |u^{k,i−1} − v^k| + ( L₀μ_k − (δ_k − ε_k/2)² ) / ( 2|u^{k,i−1} − v^k| ) + ε_k/2  for 1 ≤ i < i(k)    (8.2.70)

and

    |ū^{k,i(k)} − v^k| ≤ |u^{k,i(k)−1} − v^k| + √(L₀μ_k) + ε_k/2    (8.2.71)


hold. Using the assumptions that u^{1,0} ∈ B_{r*/4}(u**) and r* ≥ 8r₀, in case i(1) > 1 we obtain from (8.2.68), (8.2.70) and (8.2.66) the estimate

    max[ |u^{1,1} − v¹|, |ū^{1,1} − v¹| ] ≤ |u^{1,0} − v¹| + (1/(2r*)) ( L₀μ₁ − (δ₁ − ε₁/2)² ) + ε₁/2
      < |u^{1,0} − v¹| < r*/2,    (8.2.72)

and in case i(1) = 1, due to (8.2.69), (8.2.71) and (8.2.65),

    max[ |u^{1,1} − v¹|, |ū^{1,1} − v¹| ] ≤ |u^{1,0} − v¹| + √(L₀μ₁) + ε₁/2 < r*.    (8.2.73)

Analogously to (8.2.72) we get

    |u^{1,i} − v¹| < |u^{1,0} − v¹| < r*/2  for i < i(1),    (8.2.74)

and the finiteness of i(1) can be established as in Lemma 4.3.3. In view of |v^k − u**| ≤ μ_k, the relations (8.2.72) and (8.2.73) lead to

    max[ |u^{1,i} − u**|, |ū^{1,i} − u**| ] ≤ |u^{1,0} − u**| + √(L₀μ₁) + ε₁/2 + 2μ₁ < r*  for 1 ≤ i ≤ i(1).

Now, for the starting point u^{2,0} on the level k = 2 the following inequality is valid:

    |u^{2,0} − v²| ≤ |u^{1,0} − u**| + √(L₀μ₁) + ε₁/2 + 2μ₁ + μ₂ < r*.    (8.2.75)

Continuation of this procedure for k = 2, 3, ... provides step by step the following estimates:

    i(k) < ∞;

for 1 ≤ i < i(k):

    max[ |u^{k,i} − v^k|, |ū^{k,i} − v^k| ] ≤ |u^{k,i−1} − v^k| + (1/(2r*)) ( L₀μ_k − (δ_k − ε_k/2)² ) + ε_k/2;

for 1 ≤ i ≤ i(k):

    max[ |u^{k,i} − v^k|, |ū^{k,i} − v^k| ] ≤ |u^{1,0} − u**| + Σ_{s=1}^{k−1} ( √(L₀μ_s) + ε_s/2 + 2μ_s ) + √(L₀μ_k) + ε_k/2 + μ_k;

for 1 ≤ i ≤ i(k):

    max[ |u^{k,i} − u**|, |ū^{k,i} − u**| ] ≤ |u^{1,0} − u**| + Σ_{s=1}^{k} ( √(L₀μ_s) + ε_s/2 + 2μ_s ) < r*;

    max[ |u^{k+1,0} − v^{k+1}|, |ū^{k+1,0} − v^{k+1}| ] ≤ |u^{1,0} − u**| + Σ_{s=1}^{k} ( √(L₀μ_s) + ε_s/2 + 2μ_s ) + μ_{k+1} < r*.

Hence, we can conclude that

    u^{k,i} ∈ int B_{r*}(u**),  ū^{k,i} ∈ int B_{r*}(u**)  for all k, i.

8.2.14 Theorem. Let r ≥ r* be fixed and assume that the following conditions are fulfilled:

(i) the hypotheses of Lemma 8.2.13;

(ii) ρ(Q*, Q_k) ≤ μ̄_k, k = 1, 2, ..., where μ̄_k ≤ c₀μ_k (with some constant c₀), Q* = U* ∩ B_{r*}(u**) and Q_k = K_k ∩ B_r(u**);

(iii) the functional J satisfies a Lipschitz condition with constant L(r) on B_r(u**);

(iv) the sequence {δ_k} is chosen such that

    (1/(4r)) ( L(r)μ̄_k − (δ_k − ε_k/2)² ) + ε_k/2 < 0.    (8.2.76)

Then the sequence {u^{k,i}}, generated by Method 8.2.5 with a starting point u^{1,0} ∈ B_{r*/4}(u**), converges weakly to some solution u* of Problem (8.2.39).

Proof: Under the obvious assumption L₀ ≤ L(r), condition (8.2.66) is an evident consequence of (8.2.76). Let w ∈ U* ∩ B_{r*}(u**) be chosen arbitrarily and let the point v^k ∈ Q_k be defined such that

    |v^k − w| ≤ μ̄_k.    (8.2.77)

Then, due to hypothesis (iii),

    J(v^k) ≤ J(u^{k,i}) + L(r)μ̄_k.

Lemma 8.2.13 ensures that u^{k,i} ∈ int B_{r*}(u**) for all pairs (k, i); consequently,

    |u^{k,i} − v^k| ≤ |u^{k,i} − u**| + |v^k − u**| < 2r.

For a fixed index k, using (8.2.76) and Lemma 8.2.7 with C := Q_k, u := v^k, z⁰ := u^{k,i−1}, we obtain, similarly as at the beginning of the proof of Lemma 8.2.13,

    |u^{k,i} − v^k| ≤ |u^{k,i−1} − v^k| + (1/(4r)) ( L(r)μ̄_k − (δ_k − ε_k/2)² ),  1 ≤ i < i(k),

and

    |u^{k,i(k)} − v^k| ≤ |u^{k,i(k)−1} − v^k| + √(L(r)μ̄_k).

This leads, in view of |ū^{k,i} − u^{k,i}| ≤ ε_k/2 and (8.2.76), to

    |ū^{k,i} − v^k| < |u^{k,i−1} − v^k|,  1 ≤ i < i(k),    (8.2.78)

and

    |ū^{k,i(k)} − v^k| ≤ |u^{k,i(k)−1} − v^k| + √(L(r)μ̄_k) + ε_k/2.    (8.2.79)


Therefore,

    |u^{k+1,0} − v^k| ≤ |u^{k,0} − v^k| + √(L(r)μ̄_k) + ε_k/2,

and, due to (8.2.77), the estimate

    |u^{k+1,0} − w| ≤ |u^{k,0} − w| + √(L(r)μ̄_k) + ε_k/2 + 2μ̄_k

is satisfied. Because of Σ_{k=1}^∞ √μ_k < ∞ and Σ_{k=1}^∞ ε_k < ∞, Lemma A3.1.4 ensures convergence of

    {|u^{k,0} − w|}  for each w ∈ U* ∩ B_{r*}(u**).

Now, from (8.2.78) and (8.2.79) we can conclude that

    −ε_k/2 − √(L(r)μ̄_k) + |u^{k+1,0} − v^k| ≤ |u^{k,i} − v^k| ≤ |u^{k,0} − v^k|,

and hence

    −ε_k/2 − √(L(r)μ̄_k) − 2μ̄_k + |u^{k+1,0} − w| < |u^{k,i} − w| < |u^{k,0} − w| + 2μ̄_k.

Thus, {|u^{k,i} − w|} converges for each w ∈ U* ∩ B_{r*}(u**), and it is obvious that {|ū^{k,i} − w|} converges to the same limit. Lemma 8.2.7, applied with the former data, gives also (cf. (8.2.46))

    |u^{k,i−1} − v^k|² − |u^{k,i} − v^k|² ≥ J(u^{k,i}) − J(v^k),    (8.2.80)

and using the inequalities

    |u^{k,i−1} − v^k| ≤ |u^{k,i−1} − w| + μ̄_k,  |u^{k,i} − w| ≤ |u^{k,i} − v^k| + μ̄_k,

we obtain from (8.2.80)

    |u^{k,i−1} − w|² − |u^{k,i} − w|² ≥ J(u^{k,i}) − J(w) − L(r)μ̄_k − 8rμ̄_k − 2μ̄_k².    (8.2.81)

Because the limits of {|u^{k,i} − w|} and {|ū^{k,i} − w|} coincide and J(u^{k,i}) ≥ J(w), inequality (8.2.81) ensures that

    lim_{k→∞} sup_{1≤i≤i(k)} ( J(u^{k,i}) − J(w) ) = 0.

Moreover, in a standard way we can establish that any weak cluster point of the sequence {u^{k,i}} belongs to K ∩ B_{r*}(u**). Hence, Opial's Lemma A1.1.3 yields that {u^{k,i}} and {ū^{k,i}} converge weakly to some u* ∈ U*.

8.2.15 Remark. In view of the finite dimensionality of the subspace (8.2.31) and the inequalities (8.2.54) and (8.2.57), which imply strong convexity of the objective functional on the orthogonal complement of this subspace, Theorem 8.2.14 guarantees strong convergence of the sequence {u^{k,i}} to some solution of the two-body contact problem. Fichera [114] has verified the validity of Assumption 4.2.3 for the Signorini problem with V₁ defined by (8.2.32). ♦


8.2.16 Remark. Comparing the results of this section with the general statement on convergence of MSR-methods, we emphasize that the sets Q_k, Q and the values μ̄_k are defined here somewhat differently than in Section 4.3.2. In comparison with the choice of the controlling parameters in (4.3.14), here we have essentially weaker requirements on μ_k and ε_k: in principle, it suffices to guarantee only convergence of the series Σ_{k=1}^∞ √μ_k and Σ_{k=1}^∞ ε_k. After choosing μ_k and ε_k, the Lipschitz constant L₀ has to be determined, and r*, r ≥ r* can be chosen (in contrast to (4.3.14)) such that

    max[ 2 Σ_{k=1}^∞ ( √(L₀μ_k) + ε_k/2 + 2μ_k ), 8r₀ ] < r*.

Note that the left-hand side of this inequality does not depend on r. Thereafter, δ_k has to be determined according to inequality (8.2.76).

In Theorem 8.2.14 the set Q* coincides with a part of the optimal set U*, whereas due to (4.3.4) Q* includes also the points

    u_Q^{k,i} := arg min_{u∈Q} Ψ_{k,i}(u).

Therefore, in general, more precise estimates in approximating Q* by Q_k or by K_k can be obtained from hypothesis (ii) in Theorem 8.2.14 than from Assumption 4.2.2 (cf. condition (4.2.6)). ♦

The choice of the sequences {μ_k} and {μ̄_k} according to

    ρ(u**, K_k) ≤ μ_k  and  ρ(Q*, Q_k) ≤ μ̄_k  or  ρ(Q*, K_k) ≤ μ̄_k

depends on the particular data of the problem and on the chosen metric in the space Y.

We study this question for two-body contact problems, restricting our consideration to the standard MSR-method 4.3.1. The corresponding analysis for the other two variants of MSR-methods, and also for the Signorini problem, can be carried out analogously. According to Remark 8.2.6, in this case one has to take Y := V with V defined by (8.2.11). The norm of u = (u′, u″) in the space [H^s(Ω′)]² × [H^s(Ω″)]² (s > 0 integer) is defined by

    ‖|u‖|_{s,Ω} := √( ‖|u′‖|²_{s,Ω′} + ‖|u″‖|²_{s,Ω″} ).    (8.2.82)

Assuming that a solution u of the two-body contact problem belongs to the space [H²(Ω′)]² × [H²(Ω″)]² and taking into account the structure of the solution set (cf. Theorem 8.2.2), any other solution ū is also contained in this space; moreover,

    ‖|u − ū‖|_{1,Ω} = ‖|u − ū‖|_{2,Ω}.    (8.2.83)

Therefore, for any radius r₁ the set U* ∩ B_{r₁}(u**) is also bounded in the space [H²(Ω′)]² × [H²(Ω″)]², i.e.,

    ‖|u‖|_{2,Ω} ≤ ‖|u**‖|_{2,Ω} + r₁  ∀ u ∈ U* ∩ B_{r₁}(u**).

Hence, on account of Theorem A2.2.5, the interpolant u_{I,h} (see Definition A2.2.4) of each function u ∈ U* ∩ B_{r₁}(u**) on the triangulation T_h satisfies

    ‖|u − u_{I,h}‖|_{1,Ω} ≤ c ( ‖|u**‖|_{2,Ω} + r₁ ) h,    (8.2.84)


with c independent of u and r₁.

But for h := h_k, due to (8.2.12), (8.2.25) and the construction of T_h, one can conclude that u_{I,h_k} ∈ K_k. This enables us to define immediately μ_k and μ̄_k according to Theorem 8.2.14. Indeed, if

    sup_k h_k ≤ r₁ / ( c(c₁ + r₁) ),

with an arbitrarily chosen radius r₁ and an upper bound c₁ for ‖|u**‖|_{2,Ω}, the sequence

    μ_k := c(c₁ + r₁)h_k,  k = 1, 2, ...,    (8.2.85)

satisfies the inequality μ_k ≤ r₁. Thus, we can identify the radius r₀ with r₁ (see Lemma 8.2.13), and Σ_{k=1}^∞ √h_k < ∞ ensures that Σ_{k=1}^∞ √μ_k < ∞.

Now we determine r* according to (8.2.65), r* > 8r₀, and take

    μ̄_k := c(c₁ + r*)h_k,  k = 1, 2, ....    (8.2.86)

Using (8.2.84) with r₁ := r*, h := h_k, we obtain

    ‖|u − u_{I,h_k}‖|_{1,Ω} ≤ μ̄_k    (8.2.87)

and, because of u_{I,h_k} ∈ K_k, the estimate ρ(u, K_k) ≤ μ̄_k is valid for any u ∈ U* ∩ B_{r*}(u**).

Summarizing this analysis, we have to choose the sequences {h_k}, {ε_k} such that Σ_{k=1}^∞ √h_k < ∞ and Σ_{k=1}^∞ ε_k < ∞, and thereafter δ_k has to be defined by (8.2.76), with r := r* + r̄ and r̄ ≥ sup_k μ̄_k. In this way the sequence {u^{k,i}}, calculated by the MSR-method 4.3.1, converges in the norm of the space V to some solution of the two-body contact problem.

The crucial point is relation (8.2.83). It makes sense to extend this consideration to other monotone variational inequalities, provided their solution sets fulfil a property similar to (8.2.83).

However, it should be remarked that the verification of the conditions ensuring the previously made assumption

    U* ⊂ [H²(Ω′)]² × [H²(Ω″)]²

is very complicated (cf. Fichera [114] on smoothness of the solutions of Signorini problems).

Kustova [252] has investigated an OSR-method for two-body contact problems under the assumption that the sought solutions have the form Σ γ_p η_p + ω, where the η_p are known functions belonging to [H¹(Ω′)]² × [H¹(Ω″)]² with compact supports in neighborhoods of break-points of the boundary and of points in which the type of the boundary condition changes; ω is supposed to belong to [H²(Ω′)]² × [H²(Ω″)]² (concerning such assumptions for problems in elasticity theory, see Vorovich et al. [414]). In this case the values ρ(Q*, Q_k) and ρ(Q_k, Q) are of order √h_k.


8.2.6 Exactness of approximations of feasible sets

If the inclusion K_k ⊂ K cannot be guaranteed, the application of the MSR-method 4.3.1 to variational inequalities is more complicated. In particular, this concerns the rule for changing the triangulation parameter in order to satisfy the conditions (4.2.6) and (4.3.14). Recall that for MSR-methods the set Q* in (4.2.6) is defined according to (4.3.4).

For certain variational inequalities a violation of the inclusion K_k ⊂ K can be caused not only by the geometry of the domain Ω and the chosen triangulation. For instance, in our Model Problem (8.1.1) this inclusion can be violated even if Ω is a rectangle and a uniform triangulation sequence is used. To be convinced of that, it suffices to consider an example where the function g in the boundary condition (8.1.3) is strictly concave on one side of Ω.

8.2.6.1 On the estimation of the value ρ(K_k, K)

Let us consider in Problem (8.1.1) the feasible set (8.1.3), where g is the trace of a function in the space H²(Ω). We assume that for the special case g = 0 the inclusion K_k ⊂ K holds. This always occurs if

    Ω is polyhedral and  ⋃_{T ∈ T_{h_k}} T = Ω̄  ∀ k.

Sometimes this inclusion holds also for domains with a more complicated geometry, for instance, if curved triangles near the boundary of Ω and isoparametric finite elements are used (for isoparametric elements see Ciarlet [75]). Under the above assumption, for an arbitrarily chosen function g₀ ∈ H²(Ω) and g := γg₀ (γ the trace operator), the estimate

    ρ(K_k, K) ≤ c ‖g₀‖_{2,Ω} h_k

can be proved for the corresponding sets K_k and K, with c independent of g₀ and h_k. Indeed, for each element v_h ∈ K_h the following relation with respect to the interpolant of g₀ on the triangulation T_h holds true:

    v_h ≥ (g₀)_{I,h} on Γ,

hence

    v_h − (g₀)_{I,h} ∈ K − g₀.

But, due to Theorem A2.2.5,

    ‖g₀ − (g₀)_{I,h}‖_{1,Ω} ≤ c ‖g₀‖_{2,Ω} h

is valid.

As mentioned before, in the case of two-body contact problems with a curved boundary the set K_h has the form

    K_h = { v_h ∈ V_h : ⟨ν(π_k), v′_h(π_k) − v″_h(π_k)⟩_{ℝ²} ≤ 0, k ∈ I_c^h },

with ν(x) a unit outward normal to Ω′ at x ∈ Γ_c. If ν is the trace on Γ_c of a three-times differentiable vector function defined on Ω′ ∪ Ω″, one can prove that there exists a constant h₀ > 0 such that for h ≤ h₀ and each v_h ∈ K_h the function

    v̄_h := ( v′_h, v″_h + h^{3/4} ν″_{I,h} )

satisfies the inequality

    ⟨ν(x), v̄′_h(π_k) − v̄″_h(π_k)⟩_{ℝ²} ≤ 0  ∀ k ∈ I_c^h, ∀ x ∈ Γ_c.

This means that for h_k ≤ h₀

    ρ(K_k, K) ≤ c h_k^{3/4}.
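The first-order interpolation bound ‖g₀ − (g₀)_{I,h}‖_{1,Ω} ≤ c‖g₀‖_{2,Ω}h used above can also be observed numerically. Here is a one-dimensional sketch; the test function g(x) = sin(πx) and the quadrature resolution are our illustrative choices, not from the text:

```python
import math

# H^1-seminorm error of the piecewise-linear interpolant of g(x) = sin(pi x)
# on a uniform grid of n cells over [0, 1]; halving h should roughly halve
# the error, i.e. the observed order should be close to 1.

def h1_seminorm_error(n, m=50):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a = i * h
        slope = (math.sin(math.pi * (a + h)) - math.sin(math.pi * a)) / h
        for j in range(m):                       # midpoint rule on each cell
            x = a + (j + 0.5) * h / m
            total += (math.pi * math.cos(math.pi * x) - slope) ** 2 * (h / m)
    return math.sqrt(total)

e16, e32 = h1_seminorm_error(16), h1_seminorm_error(32)
rate = math.log2(e16 / e32)   # observed order of convergence, close to 1
```

The first-order rate in the H¹-norm is exactly what enters the ρ(K_k, K) estimate above; the weaker order h^{3/4} for curved contact boundaries comes from the extra normal correction term.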

8.2.6.2 On the estimation of the value ρ(Q*, Q_k)

Now we consider the question of the choice of the sequence {μ_k} according to Assumption 4.2.2 in case a finite element approximation is used and an estimate

    ρ(K_h ∩ B_r, Q) ≤ c₁h^α  (α > 0)

is known. We assume that Ω is a plane domain with a sufficiently smooth boundary, V := [H¹(Ω)]^m, and that a : V × V → ℝ is a symmetric, continuous bilinear form. Moreover, let

    a(u, u) ≥ 0 ∀ u ∈ V,  f ∈ [L²(Ω)]^m,

and let K be a convex and closed subset of V. We consider the problem

    min_{u∈K} J(u) = (1/2) a(u, u) − ((f, u))_{0,Ω}.    (8.2.88)

8.2.17 Assumption.

(i) The optimal set U* of Problem (8.2.88) is non-empty and belongs to [H²(Ω)]^m; moreover, for each element u ∈ U* the estimate

    ‖|u‖|²_{2,Ω} ≤ c₂ ( ‖|f‖|²_{0,Ω} + ‖u‖²_V )    (8.2.89)

holds, with c₂ independent of u.

(ii) For every f̄ ∈ [L²(Ω)]^m the function

    ū := arg min_{u∈K} { (1/2) a(u, u) + ‖u‖²_V − ((f̄, u))_{0,Ω} }

belongs to [H²(Ω)]^m and

    ‖|ū‖|²_{2,Ω} ≤ c₂ ( ‖|f̄‖|²_{0,Ω} + ‖ū‖²_V ).    (8.2.90)


In order to solve Problem (8.2.88), we suppose that the MSR-method 4.3.1 is applied and that the approximation is performed by means of finite elements with piecewise affine basis functions on a quasi-uniform system of triangulations. Taking

    M ≥ sup_{‖u‖_V ≠ 0} a(u, u) / ‖u‖²_V,

we choose the radii r and r* such that

    r > 4‖|f‖|_{0,Ω},  r* < 4r / ( 33 + √(M + 34) ),  U* ∩ B_{r*/8} ≠ ∅,    (8.2.91)

with B_r := {u : ‖u‖_V ≤ r}. In the sequel the relation r* < r/8, being an obvious consequence of (8.2.91), will be used.

Let K_k := K_{h_k}; the choice of the sequence {h_k} will be explained below. We make use of the sets Q, Q_k and Q* according to the general framework of the investigation of MSR-methods described at the beginning of Section 4.3.2.

8.2.18 Assumption.

(i) For each k the interpolant u_{I,h_k} of an arbitrary function u ∈ K belongs to K_k.

(ii) ρ(Q_k, Q) ≤ c₁h_k^α, k = 1, 2, ..., with fixed constants c₁ > 0 and α > 0.

On account of Theorem A2.2.5, for a function v ∈ [H²(Ω)]^m we have

    ‖v − v_{I,h_k}‖_V ≤ c ‖|v‖|_{2,Ω} h_k,    (8.2.92)

and taking

    μ_k := μ(h_k) = max[ c₁h_k^α, c√(2c₂) r h_k ],  k = 1, 2, ...,

    L(r) := Mr + ‖|f‖|_{0,Ω}  (a Lipschitz constant of J on B_r),

the following result can be obtained.

8.2.19 Theorem. Suppose that the Assumptions 8.2.17, 8.2.18 and condition (8.2.91) are fulfilled, and let u^{1,0} ∈ B_{r*/4}. With σ_k := 0 ∀ k, let the controlling parameters μ_k, δ_k and ε_k of the MSR-method 4.3.1 be chosen according to (4.3.13) and (4.3.14). Then it holds that

    ρ(Q*, Q_k) ≤ μ_k,  k = 1, 2, ....

Proof: Note that in the proof of Lemma 4.3.5, instead of the requirement ρ(Q*, Q_k) ≤ μ_k, only ρ(u**, Q_k) ≤ μ_k was used, with u** ∈ U* ∩ B_{r*/8}. Now we are going to verify the latter estimate. In view of Assumption 8.2.17(i) the inequality

    ‖|u**‖|_{2,Ω} ≤ √( c₂ ( ‖|f‖|²_{0,Ω} + (r*/8)² ) ) < √(2c₂) r

holds, and applying (8.2.92), one can conclude that

    ‖u** − u**_{I,h_k}‖_V < c√(2c₂) r h_k ≤ μ(h_k) =: μ_k.

The relation μ_k < r*/4 follows from (4.3.14); consequently, ‖u**_{I,h_k}‖_V < r. But Assumption 8.2.18(i) ensures that u**_{I,h_k} ∈ K_k, hence

    u**_{I,h_k} ∈ Q_k,  ρ(u**, Q_k) < μ_k.

Thus, in view of Lemma 4.3.5, the estimates

    ‖u^{k,i}‖_V < r*,  ‖ū^{k,i}‖_V < r*  ∀ (k, i)

hold true. Now, due to

    J(u) + ‖u − u^{k,i−1}‖²_V ≤ (M/2 + 1)‖u‖²_V + ‖|f‖|_{0,Ω}‖u‖_V + ‖u^{k,i−1}‖²_V + 2‖u‖_V ‖u^{k,i−1}‖_V,

and the facts that u** ∈ K ∩ B_{r*/8} and ‖u^{k,i−1}‖_V < r*, we can conclude that

    min_{u∈Q} { J(u) + ‖u − u^{k,i−1}‖²_V } < (M/2 + 1)(r*)²/64 + (r*/8)‖|f‖|_{0,Ω} + (r*)² + (r*)²/4    (8.2.93)
      = (M/128 + 81/64)(r*)² + (r*/8)‖|f‖|_{0,Ω} < (M/128 + 81/64)(r*)² + rr*/32.

On the other hand, for ‖u‖_V ≥ r/2 > 4r* the inequality

    J(u) + ‖u − u^{k,i−1}‖²_V ≥ −‖|f‖|_{0,Ω}‖u‖_V + ( ‖u‖_V − ‖u^{k,i−1}‖_V )² ≥ −‖|f‖|_{0,Ω}‖u‖_V + ( ‖u‖_V − r* )²    (8.2.94)

is satisfied. In order to estimate the right-hand side of (8.2.94), keeping in mind (8.2.91) and ‖u‖_V ≥ r/2, we have to minimize the function

    ξ(t) := (t − r*)² − ‖|f‖|_{0,Ω} t

subject to (8.2.91) and t ≥ r/2. Obviously, ξ′(t) = 2(t − r*) − ‖|f‖|_{0,Ω} > 0 for t ≥ r/2; hence this minimum is attained at t = r/2. Therefore, continuation of estimate (8.2.94) leads, with ‖u‖_V ≥ r/2, to

    J(u) + ‖u − u^{k,i−1}‖²_V ≥ r²/4 − rr* − (1/2)‖|f‖|_{0,Ω} r + (r*)² ≥ r²/8 − rr* + (r*)².    (8.2.95)

A comparison of the estimates (8.2.93) and (8.2.95) shows: if the choices of r and r* obey the condition

    r²/8 − rr* + (r*)² > (M/128 + 81/64)(r*)² + rr*/32,    (8.2.96)


then u_Q^{k,i} := arg min{ J(u) + ‖u − u^{k,i−1}‖²_V : u ∈ Q } belongs to int B_{r/2}. But inequality (8.2.96) is just obtained from

    r* < 4r / ( 33 + √(M + 34) ).

Because u_Q^{k,i} ∈ int B_{r/2}, we get u_Q^{k,i} = arg min{ Ψ_{k,i}(u) : u ∈ K }. In the sequel, using the expression

    Ψ_{k,i}(u) := (1/2) a(u, u) + ‖u‖²_V − ((2u^{k,i−1} + f, u))_{0,Ω} + ‖u^{k,i−1}‖²_V,

we can apply Assumption 8.2.17(ii) with f̄ := f + 2u^{k,i−1}. Hence, with regard to ‖u^{k,i−1}‖_V < r* and (8.2.90), we get

    ‖|u_Q^{k,i}‖|_{2,Ω} ≤ √c₂ √( ( ‖|f‖|_{0,Ω} + 2r* )² + r²/4 )

and, due to (8.2.91),

    ‖|u_Q^{k,i}‖|_{2,Ω} ≤ √(2c₂) r.

Now, (8.2.92) leads to

    ‖u_Q^{k,i} − (u_Q^{k,i})_{I,h_k}‖_V < c√(2c₂) r h_k ≤ μ_k    (8.2.97)

and, taking into account that μ_k < r/2, one can conclude that

    ‖(u_Q^{k,i})_{I,h_k}‖_V < r.

Together with Assumption 8.2.18(i) this ensures that (u_Q^{k,i})_{I,h_k} ∈ Q_k. Furthermore, in view of (8.2.89) and (8.2.92), we verify analogously for functions u ∈ U* ∩ B_{r*} that

    ‖u − u_{I,h_k}‖_V < μ_k.    (8.2.98)

Because μ_k < r*/4, the relations

    ‖u_{I,h_k}‖_V < r/4,  u_{I,h_k} ∈ Q_k

are true. Now, the inequalities (8.2.97) and (8.2.98) imply that

    ρ(Q*, Q_k) ≤ μ_k  ∀ k.

The same analysis can be performed for the two-body contact problem if, instead of U* ⊂ [H²(Ω)]^m, the inclusion U* ⊂ [H²(Ω′)]² × [H²(Ω″)]² is used.

An analogous theorem is valid if the inequalities (8.2.89) and (8.2.90) are replaced by estimates of the type

    ‖|u‖|²_{2,Ω} ≤ c₂‖|f‖|²_{0,Ω} + c₀,  ‖|ū‖|²_{2,Ω} ≤ c₂‖|f̄‖|²_{0,Ω} + c₀.
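How the radius condition in (8.2.91) enforces inequality (8.2.96) can be spot-checked numerically; a quick sanity check with randomly sampled M and r (the sampling ranges are arbitrary):

```python
import math
import random

# Check that r* < 4r / (33 + sqrt(M + 34))  (condition (8.2.91)) implies
# inequality (8.2.96):  r^2/8 - r r* + (r*)^2 > (M/128 + 81/64)(r*)^2 + r r*/32.
random.seed(0)
ok = True
for _ in range(10000):
    M = random.uniform(0.0, 100.0)
    r = random.uniform(0.1, 10.0)
    r_star = 0.999 * 4 * r / (33 + math.sqrt(M + 34))
    lhs = r * r / 8 - r * r_star + r_star ** 2
    rhs = (M / 128 + 81 / 64) * r_star ** 2 + r * r_star / 32
    ok = ok and lhs > rhs
print(ok)  # True
```

In fact, writing t = r/r*, (8.2.96) reduces to 16t² − 132t − (M + 34) > 0, and the bound on r* keeps t strictly above the positive root.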


8.2.20 Remark. The usage of the radii r and r* in the statements on convergence of OSR- and MSR-methods is connected first of all with the technique of estimating ρ(Q*, Q_k) for variational inequalities. Usually, upper bounds for ρ(U*, K_k) can be obtained by means of an estimation of the distance between u ∈ U* and its interpolant. However, the norm of such an interpolant in V may be larger than the norm of the interpolated function, and in case r = r* we cannot guarantee that the interpolant of an arbitrary function u ∈ U* ∩ B_r belongs to Q_k, even if K_k ⊂ K. A suitable harmonization of r* and r ensures that for u ∈ U* ∩ B_{r*} the inclusion u_{I,h_k} ∈ B_r holds for all k (cf. Theorem 8.2.19).

From the numerical point of view it is important to specify the choice of the parameters r, r* and h_k for each particular problem. However, Theorem 8.2.19 gives only a qualitative characterization of the relations among these parameters.

Obviously, in the case of semi-infinite problems such questions do not arise, because the choice r = r* is the best one. ♦

8.2.7 Numerical results

The numerical examples described in this subsection have been investigated together with Voetmann [413].

8.2.21 Example. (Well-posed Signorini boundary obstacle problem)
We consider the two-dimensional Signorini boundary obstacle problem (A2.1.6) with homogeneous boundary values:

    −Δu = f  in Ω,
    u = 0  on Γ_D,
    ∂u/∂ν = 0  on Γ_N,    (8.2.99)
    u ≥ 0,  ∂u/∂ν ≥ 0,  u ∂u/∂ν = 0  on Γ_S,

with the data

    Ω = (0, 1) × (0, 1),
    Γ_S = {(x, y) ∈ Γ : y = 1},  Γ_D = Γ \ Γ_S,  Γ_N = ∅,
    f(x, y) = 4π² sin(2πy).

The domain Ω, the boundary conditions and the initial triangulation are shown in Figure 8.2.2. The triangulation is refined globally once at the start of each exterior step.


[Figure 8.2.2: Geometrical domain and data for Example 8.2.21: the unit square Ω with Γ_S on the side y = 1, Γ_D on the three remaining sides, and f(x, y) := 4π² sin(2πy).]

We note that the problem has a unique solution, since meas(Γ_D) > 0, although ⟨f, 1⟩_{L²(Ω)} = 0.
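The statement ⟨f, 1⟩_{L²(Ω)} = 0 is immediate, since ∫₀¹ sin(2πy) dy = 0; a quick numerical confirmation (quadrature resolution is our choice):

```python
import math

# <f, 1>_{L^2(Omega)} for f(x, y) = 4 pi^2 sin(2 pi y) on the unit square,
# via the midpoint rule; the integral vanishes because sin(2 pi y) has
# mean zero over [0, 1].
n = 200
integral = sum(4 * math.pi ** 2 * math.sin(2 * math.pi * (j + 0.5) / n) * (1 / n ** 2)
               for i in range(n) for j in range(n))
print(abs(integral) < 1e-9)  # True
```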

We apply the MSR-method 4.3.1 with strong regularization, i.e. regulariza-tion in the space V . Here the multi-step scheme is somewhat simpler performed,because the interior cycle is done by setting a maximum number of inner iter-ations imax. Inner steps are meant to be taken as long as a sufficiently largeprogress of the iterates indicates movement towards the solution on the givenapproximation level. They are stopped, if the iterations indicate that no moreprogress on the current approximation level can be expected. This motivated usto set

δk,i := δk‖uk,0‖, δk ↓ 0,

and the numerical results below show the effectiveness of even this simple approach.

The controlling parameters are chosen such that

χk := χ0 · χ̄^k, εk := ε0 · ε̄^k, δk := δ0 · δ̄^k,

with χ0 := 1.0, χ̄ := 0.9, ε0 := 0.001, ε̄ := 0.5 and δ0 := 0.01, δ̄ := 0.8. The arising regularized finite-dimensional auxiliary problems in each interior step are solved by a Newton method from the Matlab package.
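This is a plain geometric decrease of the controlling parameters; a minimal sketch, under the assumed reading χk = χ0 · χ̄^k, εk = ε0 · ε̄^k, δk = δ0 · δ̄^k (function name and structure are illustrative only):

```python
def controls(k, chi0=1.0, chi_bar=0.9, eps0=0.001, eps_bar=0.5,
             delta0=0.01, delta_bar=0.8):
    """Geometrically decreasing controlling parameters for exterior step k:
    chi_k = chi0*chi_bar**k, eps_k = eps0*eps_bar**k, delta_k = delta0*delta_bar**k."""
    return chi0 * chi_bar**k, eps0 * eps_bar**k, delta0 * delta_bar**k

# after one exterior step each parameter shrinks by its respective factor
chi1, eps1, delta1 = controls(1)
```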

Table 8.2.1 shows the iteration log of the MSR-method with strong regularization, and Figure 8.2.3 shows the computed solution.

k    dim    #reg  #nwt  err      EOC   cpu
1       13    5    3.4  1.24e-0   –      0.02
2       41    8    5    7.14e-1  0.81    0.10
3      145    8    5    3.82e-1  0.90    0.36
4      545    7    4    1.98e-1  0.95    1.09
5     2113    5    3    1.02e-1  0.95    4.91
6     8321    4    3    5.38e-2  0.92   33.26
7    33025    4    3    2.94e-2  0.87  216.27

Table 8.2.1: Iteration log of Example 8.2.21 with strong regularization


The first column shows the number of global refinements of the triangulation mesh; the second shows the resulting number of degrees of freedom, i.e. the number of variables to be optimized. The next three columns indicate the number of interior steps taken with the respective regularization parameter (fixed on each refinement level), the number of Newton steps, and the relative error distance of the computed iterate to the interpolant of the iterate from the preceding mesh after refinement. The norm used to measure this error distance is the H1-norm, as this is the underlying space. Finally, an estimated order of convergence and the cpu-time for the overall exterior step are given. The estimated order of convergence is based on a simple residual-type error estimator and provided by the finite element code ALBERT.
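Since the mesh width is halved at each global refinement, the logged EOC values are consistent with the elementary two-grid estimate p ≈ log₂(err_{k−1}/err_k); the formula below is this reading, not necessarily ALBERT's internal estimator:

```python
import math

def eoc(err_prev, err_cur):
    """Two-grid estimated order of convergence: with err ~ C*h^p and the
    mesh width h halved per refinement, p is approximately log2(err_prev/err_cur)."""
    return math.log2(err_prev / err_cur)

# rows k = 3 and k = 4 of Table 8.2.1 (rounded values from the log):
p3 = eoc(7.14e-1, 3.82e-1)  # close to the logged 0.90
p4 = eoc(3.82e-1, 1.98e-1)  # close to the logged 0.95
```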

Figure 8.2.3: Computed solution of Example 8.2.21 with strong regularization

It is seen that the partition of the Signorini boundary into the parts with Dirichlet and Neumann boundary conditions is accurately traced.

As a comparison, we now present in Table 8.2.2 and Figure 8.2.4 the results obtained with the same algorithm and the same controlling parameters but using weak regularization, i.e. with the L2-norm.

k    dim    #reg  #nwt  err      EOC   cpu
1       13    2    3.5  4.45e-1   –     0.02
2       41    4    5    1.26e-1  1.81   0.04
3      145    2    4.5  3.37e-2  1.90   0.08
4      545    2    3    8.71e-3  1.95   0.26
5     2113    2    3    2.25e-3  1.95   1.87
6     8321    2    2    5.95e-4  1.92  13.04
7    33025    1    2    1.63e-4  1.87  51.03

Table 8.2.2: Iteration log of Example 8.2.21 with weak regularization


Figure 8.2.4: Computed solution of Example 8.2.21 with weak regularization

We see that the number of interior iterations is significantly decreased. This is naturally reflected in the cpu-time, although the number of Newton steps per interior iteration does not change notably. Our primary motivation for introducing the weak regularization is therefore satisfied. We also observe that, as a price, the boundary partition is not traced as cleanly as with strong regularization.

The multi-step scheme was motivated by the desire not to let the mesh be refined too fast. An alternative approach is to use a one-step scheme without refining the mesh globally in each iteration. Based on an error estimator provided by the finite element code ALBERT, we present next the results of such an approach. After each exterior step the mesh is refined by the following simple maximum strategy.

Mesh refinement algorithm (maximum strategy): At each iteration k, given

a triangulation T^k and

a local error estimate ηT for all T ∈ T^k,

choose a maximum threshold γ ∈ (0, 1) and refine all elements S ∈ T^k for which

ηS > γ max_{T ∈ T^k} ηT.

Typically, a threshold γ := 0.5 is chosen.

Using this approach, the resulting iterations and the solution are displayed in Table 8.2.3 and Figure 8.2.5, with the corresponding meshes depicted in Figures 8.2.6 and 8.2.7.

k     dim   #reg  #nwt  err      EOC   cpu
10     179    1    3    3.01e-2   –    0.04
20     576    1    3    7.76e-3  1.95  0.16
30    1399    1    2    3.58e-3  1.12  0.37
40    2407    1    2    2.04e-3  0.81  0.97
50    5183    1    2    9.74e-4  1.07  3.16

Table 8.2.3: Iteration log of Example 8.2.21 with weak regularization and adaptive mesh refinement


Figure 8.2.5: Computed solution of Example 8.2.21 with weak regularization and adaptive mesh refinement

Figure 8.2.6: Adaptively refined meshes of Example 8.2.21: k = 10 and k = 20

Figure 8.2.7: Adaptively refined meshes of Example 8.2.21: k = 30 and k = 40

We see that the desired effect takes place: in each iteration only a small proportion of the mesh is refined. Due to the slowly increasing dimensions of the problems, the Newton iterations are very cheap with respect to cpu-time. Most importantly, the tracing of the partition is even sharper than with strong regularization.


8.2.22 Example. (Frictionless contact with a rigid foundation)

Figure 8.2.8: Simplified contact problem with rigid foundation (body Ω with boundary parts ΓD, ΓF and contact boundary ΓC; gap α(x1) to the rigid foundation, displacement u, loads f and g)

Now let the boundary be divided into three parts, Γ := ΓD ∪ ΓF ∪ ΓC, where ΓD ≠ ∅, ΓC ≠ ∅, and the three sets are open and mutually disjoint. ΓD is the boundary part with prescribed displacements, and ΓF is the boundary part where boundary forces apply. Additionally, we assume the existence of a rigid foundation and a parametrization α of the distance, in the x2-direction, of any boundary point on ΓC to the rigid foundation; i.e., α : I → R is a smooth real function on a bounded interval I ⊂ R such that for any x1 ∈ I the point (x1, α(x1))ᵀ belongs to ΓC and (x1, 0)ᵀ is a point on the rigid foundation (see Figure 8.2.8).

Denote by u(x1, x2) the displacement vector of a body particle x = (x1, x2)ᵀ, by

εij := (1/2) (∂ui/∂xj + ∂uj/∂xi), i, j = 1, 2,

the linearized strain tensor, and by

τij := aijkl εkl, i, j, k, l = 1, 2,

the linearized stress tensor, with aijkl the elasticity coefficients. Moreover, denote by Tν = τ(u) · ν the stress of the displacements in the direction of the normal vector ν, and by Tt = τ(u) · t the stress of the displacements in the tangential direction. Then the following simple one-sided contact problem can be posed:

−div τ(u) = f in Ω, (8.2.100)

u = d on ΓD, (8.2.101)

τ(u) · ν = g on ΓF , (8.2.102)

uν ≤ α on ΓC , (8.2.103)

Tν ≤ 0 on ΓC , (8.2.104)

(uν − α)Tν = 0 on ΓC , (8.2.105)

Tt = 0 on ΓC . (8.2.106)


The first three equations are equilibrium conditions: equation (8.2.100) describes the equilibrium of the stress of the displacements inside the body, (8.2.101) requires an equilibrium of the displacements on the Dirichlet boundary, and (8.2.102) is due to the equilibrium of the stress of the displacements in the direction of the normal vector on ΓF. Condition (8.2.103) prevents penetration into the rigid foundation, condition (8.2.104) says that the stress of the displacements on the contact zone acts towards the body, and (8.2.105) makes sure that where there is no contact, there is no stress of the displacements on ΓC. The last condition says that no friction appears in this simplified model.

The weak formulation of this model is described by

K := {u ∈ [H1(Ω)]² : γu|ΓD = d, γuν|ΓC ≤ α}

and

a(u, v) := ∫Ω aijkl εij(u) εkl(v) dΩ,

ℓ(u) := ∫Ω fi ui dΩ + ∫ΓF gi ui dΓ.

The energy functional J(u) = (1/2) a(u, u) − ℓ(u), which is to be minimized, is not coercive on V, but the second Korn inequality allows us to employ weak regularization.

In the sequel we show two simple examples of such a one-sided contact problem, which differ only in the elasticity of the material, expressed by the corresponding Young's modulus.

Deflection of a steel bar (cf. Figure 8.2.9). Data: Young's modulus E := 2.15 · 10¹¹, Poisson's coefficient π := 0.29 and g2(x) := −5.75 · 10⁸ Nm⁻².

Figure 8.2.9: Steel bar fastened on the left side of the rigid body

In Figure 8.2.10 and Table 8.2.4 the graph of the displacements and the iteration log are given.


Figure 8.2.10: Displacements of the steel bar

k    dim    #reg  #nwt  cpu
1       54    3    6.6    0.1
2      170    1    8      0.21
3      594    1    6      1.01
4     2210    1    5     10.81
5     8514    1    5     75.28
6    33410    1    5    332.52

Table 8.2.4: Iteration log: deflection of a steel bar

Deflection of an aluminium bar (cf. Figure 8.2.11). Data: Young's modulus E := 2.15, Poisson's coefficient π := 0.29 and g2(x) := −5.75 · 10⁸ Nm⁻².

Figure 8.2.11: Aluminium bar fastened on the left side of the rigid body


Figure 8.2.12: Displacements of the aluminium bar

In Figure 8.2.12 the graph of the displacements is depicted, and Table 8.2.5 gives some information about the iterations.

k    dim   #reg  #nwt  cpu
1      54   10    3     0.1
2     170    1    3     0.07
3     594    1    2     0.23
4    2210    1    2     1.75
5    8514    1    2    14.26

Table 8.2.5: Iteration log: deflection of an aluminium bar

8.3 Solution of Contact Problems With Friction

Our study of this class of problems is based on an algorithm of alternating iterations, suggested by Panagiotopoulos [315]. This algorithm, dealing with the alternate solution of variational inequalities of two types, does not have a strong theoretical foundation in the case of non-coercive operators. However, it is commonly used in practice.

To keep its description simple, we analyze this algorithm only for the Signorini problem with friction; its formal extension to the two-body contact problem is possible without difficulties. Concerning the solvability of contact problems with friction, only a result of Necas, Jarusek and Haslinger [308] is known, for the Signorini problem in the case that the friction coefficient is small and the body is fastened on a part of its boundary.

The main idea of the algorithm is as follows: in each iteration two convex auxiliary problems have to be solved, one of which is an analogue of the classical Signorini problem (without friction), while the other consists in the unconstrained minimization of a non-smooth, convex and continuous functional.

In the sequel we construct, by means of the PPR, stable solution procedures for these auxiliary problems, which are ill-posed in general.


8.3.1 Signorini problem with friction

Preserving the notations and assumptions made previously in Subsection 8.2.1 for the classical Signorini problem, we now take into account the friction arising on the contact boundary. Let the friction coefficient ρ belong to C2(Γc), and denote by Tν the normal component of the reaction of the boundary Γc (normal force). Considering Tν ≤ 0 as a fixed parameter, the following variational problem can be formulated:

min_{u∈K} (1/2) a(u, u) − ∫Ω Fk uk dΩ − ∫Γτ Pk uk dΓ + j(u; Tν),   (8.3.1)

where

K := {v ∈ [H1(Ω)]² : vν = 0 on Γ0, vν ≤ 0 on Γc},   (8.3.2)

with j(u; Tν) := ∫Γc ρ |Tν| |ut| dΓ, and ut denotes the tangential component of the displacement vector u. For fixed Tν, Problem (8.3.1) is called the Signorini problem with friction under given normal forces.

In realistic situations it is not natural to assume that the normal forces on the contact boundary are known. The equilibrium state corresponds to the relation

Tν = τkl(u(Tν)) νl νk,   (8.3.3)

understood in a generalized sense (see Duvaut and Lions [95], chapt. 3). Here u(Tν) describes the displacement of the points on Γc according to Problem (8.3.1), and τkl are the coefficients of the corresponding stress tensor. Problem (8.3.1)–(8.3.3) is called the Signorini problem with friction. Apparently, the question of its unique solvability is still open.

8.3.2 Algorithm of alternating iterations

In each step of this method the following two problems have to be solved one after the other:

Problem I: Given tangential forces Tt on the boundary Γc, minimize

(1/2) a(u, u) − ℓt(u)

subject to K, defined by (8.3.2), where

ℓt(u) := ∫Ω Fk uk dΩ + ∫Γτ Pk uk dΓ + ∫Γc Tt ut dΓ.

Problem II: Given normal forces Tν on the boundary Γc, minimize

(1/2) a(u, u) − ℓν(u) + j(u; Tν)

subject to V := {u ∈ [H1(Ω)]² : uν = 0 on Γ0}, where

ℓν(u) := ∫Ω Fk uk dΩ + ∫Γτ Pk uk dΓ + ∫Γc Tν uν dΓ.


Starting, for instance, with T⁰t := 0, at the beginning of the i-th step an approximate value T^{i−1}_t of the tangential forces has to be known. Then, at the i-th step, Problem I is first solved with Tt := T^{i−1}_t. This results in a vector u^{i−1/2}, characterizing the corresponding displacement. Now we are in the position to calculate an approximation of the normal forces via (8.3.3) by

T^i_ν := Tν(u^{i−1/2}) = τkl(u^{i−1/2}) νl νk.

Thereafter, solving Problem II subject to the normal forces T^i_ν, we find the vector u^i and compute the tangential forces:

(T^i_t)_k = τkl(u^i) νl − (τkl(u^i) νl νk) νk, k = 1, 2.

The symbol Tt is used both for the vector Tt and for the value of its tangential component. The iteration procedure terminates if the distances between the normal forces, as well as between the tangential forces, in two successive iterations become small.
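The loop structure of the alternating algorithm can be sketched abstractly. Here solve_I, solve_II, normal_force and tangential_force are hypothetical stand-ins for the auxiliary solvers and the force evaluations via (8.3.3), not the book's implementation; the toy maps below are chosen only so that the scheme visibly converges:

```python
def alternating_iterations(solve_I, normal_force, solve_II, tangential_force,
                           T_t0=0.0, tol=1e-8, max_iter=100):
    """Alternate Problem I (friction forces fixed) and Problem II (normal
    forces fixed); stop when both force updates become small."""
    T_t, T_nu, u = T_t0, None, None
    for _ in range(max_iter):
        u_half = solve_I(T_t)             # Signorini-type problem, Tt fixed
        T_nu_new = normal_force(u_half)   # approximation via (8.3.3)
        u = solve_II(T_nu_new)            # friction problem, Tnu fixed
        T_t_new = tangential_force(u)
        if T_nu is not None and abs(T_nu_new - T_nu) + abs(T_t_new - T_t) < tol:
            return u, T_nu_new, T_t_new
        T_nu, T_t = T_nu_new, T_t_new
    return u, T_nu, T_t

# Toy contractive stand-ins (purely illustrative, not elasticity):
u, T_nu, T_t = alternating_iterations(
    solve_I=lambda t: 0.5 * t + 1.0,
    normal_force=lambda w: 0.5 * w,
    solve_II=lambda n: 0.5 * n,
    tangential_force=lambda w: 0.5 * w,
)
```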

In the following we analyze Problems I and II under the assumptions that

Pk ∈ L∞(Γτ ), k = 1, 2; Tt ∈ L∞(Γc), Tν ∈ L∞(Γc),

maintaining the other suppositions made for Problem (8.2.16). Solvability of Problem I can be concluded from Theorem 8.2.3 by replacing the linear functional ℓ in (8.2.16) by ℓt (cf. [182]). This result says: even if Problem I is solvable, there exists, in general, no unique solution.

Concerning Problem II, in [95], chapt. 3, the following result is established.

8.3.1 Theorem. Denote by K the kernel of the bilinear form (8.2.17). For solvability of Problem II it is necessary that

|ℓν(y)| ≤ j(y; Tν) for any y ∈ K ∩ V.

A solution exists if for any non-zero vector y ∈ K ∩ V

|ℓν(y)| < j(y; Tν).

Uniqueness of the solution of Problem II is not known in the general case. Nevertheless, it is obvious that if there exists more than one solution, then their differences belong to K.

In the sequel we assume that the algorithm of alternating iterations is well-defined, i.e., at each step the arising auxiliary problems are solvable. We also suppose that the functions T^i_ν and T^i_t are sufficiently smooth.

We leave open the questions of the verification of these assumptions, as well as the study of the smoothness of the solutions of these auxiliary problems, because our main concern is to construct a stable solution process for these problems.

Problem I can be solved by any of the three MSR-methods described in Section 8.2.3, and the changing rule for the controlling parameters is completely analogous to those guaranteeing convergence of the corresponding method for


the classical Signorini problem. In particular, if the discretization secures the inclusion Kk ⊂ K, then the controlling parameters are determined by Theorem 8.2.14 and Remark 8.2.16, but in more general situations the results of Theorem 8.2.19 can be used.

However, in the framework of the algorithm of alternating iterations these rules are not strictly substantiated. Nevertheless, they ensure convergence of the iterates to some solution of Problem I and Problem II, respectively, in case the solutions of these problems are sufficiently smooth, say they belong to [H2(Ω)]².

Now we consider the numerical treatment of Problem II. Difficulties in minimizing the functional

J(u) := (1/2) a(u, u) − ℓν(u) + j(u; Tν)   (8.3.4)

on the space V under fixed Tν are primarily related to the non-differentiability of j(u; Tν). In order to use one of the MSR-methods mentioned above for solving this problem, a smoothing procedure for the objective functional (8.3.4) has to be included in the method. Therefore, we approximate j(u; Tν) by a sequence of functionals

jk(u; Tν) = ∫Γc ρ |Tν| √(ut² + τk) dΓ, τk ↓ 0.   (8.3.5)

In the sequel our consideration is concentrated on the method with weak regularization, which generates the following sequence of auxiliary problems:

min_{u∈Vk} { Jk(u) + ‖u − u^{k,i−1}‖²_{0,Ω} }, i = 1, ..., i(k); k = 1, 2, ...,   (8.3.6)

with

Jk(u) := (1/2) a(u, u) − ℓν(u) + jk(u; Tν),

and Vk := V_{hk} the finite element subspace of V, obtained by means of the triangulation T_{hk}. Thus, in Method 8.2.5 we take

Ψ_{k,i}(u) := Jk(u) + ‖u − u^{k,i−1}‖²_{0,Ω}.

Obviously, the following estimate holds:

Jk(u) − J(u) = ∫Γc ρ |Tν| ( √(ut² + τk) − |ut| ) dΓ
            = ∫Γc ρ |Tν| τk / ( √(ut² + τk) + |ut| ) dΓ
            ≤ c3 √τk, with c3 := ∫Γc ρ |Tν| dΓ.

Hence, for an arbitrary radius r, condition (4.2.5) in Assumption 4.2.2 is fulfilled with σk := c3 √τk.

8.3.2 Remark. If instead of ‖u − u^{k,i−1}‖²_{0,Ω} the regularization term ‖u − u^{k,i−1}‖²_V is used, then convergence follows immediately from Theorems 4.3.6 and 4.3.8. However, numerical experiments show that the method with weak regularization is more successful for such problems (see [365]). ♦


Now we study the convergence of the method with weak regularization applied to Problem II. This can be done similarly to the proofs of Lemma 8.2.13 and Theorem 8.2.14. It should be noted that in the given situation the relations Kk = Vk and K = V are true, i.e. the inclusion Vk ⊂ V means that Kk ⊂ K.

Assume that the solution set U∗ of Problem II is non-empty and belongs to the space [H2(Ω)]². On the space V we introduce the new norm (cf. (8.2.36))

|u|²_Ω := (1/2) b(u, u) + ‖u‖²_{0,Ω},

with b(u, v) := c0 ∫Ω εkl(u) εkl(v) dΩ and c0 > 0 chosen according to (8.2.3). The space equipped with this norm will be denoted by V. Equivalence between the norms ‖u‖_{1,Ω} and |u|_Ω can be established in the same way as was done for the two-body contact problem in Section 8.2.3.

In the sequel we consider Problem II in the space V. If u and ū are two solutions of this problem, then relation (8.2.83) holds true. Hence, for some fixed u∗∗ ∈ U∗ and any u ∈ U∗ ∩ B_{r1}(u∗∗), the estimate

‖u‖_{2,Ω} ≤ ‖u∗∗‖_{2,Ω} + c4 r1

is satisfied, with c4 independent of r1 and B_{r1}(u∗∗) ⊂ V.

We make use of the same finite element approximation as in the case of the classical Signorini problem and assume that hk ≥ h_{k+1} for all k. Due to the known estimate

‖v − v_{I,h}‖_{1,Ω} ≤ c ‖v‖_{2,Ω} h, v ∈ [H2(Ω)]²,

the equivalence between ‖v‖_{1,Ω} and |v|_Ω yields

|v − v_{I,h}|_Ω ≤ c ‖v‖_{2,Ω} h.   (8.3.7)

Choosing

M ≥ sup_{u≠0} a(u, u) / |u|²_Ω,

and denoting by M1(Tν) the Lipschitz constant of the functional

−ℓν(u) + ∫Γc ρ |Tν| |ut| dΓ on V,

then

L(r1) := M (|u∗∗|_Ω + r1) + M1(Tν)

is the Lipschitz constant of the functional (8.3.4) on B_{r1}(u∗∗). Due to (8.3.7),

|u∗∗ − u∗∗_{I,hk}|_Ω ≤ c ‖u∗∗‖_{2,Ω} hk.

Now we take

µk := c5 hk, µ̄k := c6 hk, k = 1, 2, ...,
r0 := c5 h1, L0 := M (|u∗∗|_Ω + r0) + M1(Tν),   (8.3.8)

where c5 ≥ c ‖u∗∗‖_{2,Ω} and c6 ≥ c (‖u∗∗‖_{2,Ω} + c4 r∗) are fixed, and the choice of r∗ will be specified below.


8.3.3 Theorem. Suppose that

Σ_{k=1}^∞ √µk < ∞, Σ_{k=1}^∞ √σk < ∞, Σ_{k=1}^∞ εk < ∞,

and assume further that r∗ and r satisfy the inequalities

r∗ > max [ 8 r0, 2 Σ_{k=1}^∞ ( √(L0 µk + 2σk) + εk/2 + 2 µ̄k ) ],   (8.3.9)

r ≥ r∗ + c (‖u∗∗‖_{2,Ω} + c4 r∗) h1,   (8.3.10)

and that the parameters δk are chosen such that

(1/(4r)) ( L(r) µk + 2σk − (δk + εk/2)² ) + εk/2 < 0, k = 1, 2, ...

Then, starting with u^{1,0} ∈ B_{r∗/4}(u∗∗), the method with weak regularization, with auxiliary problems given by (8.3.6), converges to some solution of Problem II.

Using (8.3.8) together with σk := c3 √τk, one can reformulate this statement in terms of the original controlling parameters hk, τk, εk and δk.

Comments on the proof: Finiteness of i(k) and the validity of the inclusions u^{k,i} ∈ int B_{r∗}(u∗∗) follow from the proof of Lemma 8.2.6 if we use, instead of (8.2.67), the estimate

Jk(vk) ≤ Jk(u^{k,i}) + L0 µk + 2σk

and thereafter apply Lemma 8.2.7 with

J1 := Jk, C := Kk, z0 := u^{k,i−1}, u := vk.

With analogous modifications in the proof of Theorem 8.2.14 we obtain, instead of (8.2.79), the inequality

|u^{k,i(k)} − vk|_Ω ≤ |u^{k,i(k)−1} − vk|_Ω + √(L(r) µk + 2σk) + εk/2,

which is used further, together with (8.2.77), in order to prove convergence of the sequence {|u^{k,0} − w|_Ω} for each w ∈ U∗ ∩ B_{r∗}(u∗∗). The remaining modifications are obvious.

We emphasize that the successive approximation of the objective functional J in (8.3.4) by means of a family of convex functionals Jk is an eminent feature of the general MSR-scheme. Here this approximation is carried out by replacing a non-smooth problem by a sequence of smooth problems, whereas in Section 6.2 this possibility was used to construct path-following strategies for solving parametric SIP.

8.3.3 Numerical results

Let us consider the following simple model with boundary friction:


8.3.4 Example. (Simplified Signorini problem with boundary friction)

u ∈ K : a(u, v − u) + j(v) − j(u) ≥ ⟨ℓ, v − u⟩ ∀ v ∈ K,

where Γ = ∂Ω = ΓD ∪ ΓF, f : Ω → R and

K := {u ∈ H1(Ω) : γ(u) = 0 on ΓD},

with

a(u, v) := µ ∫Ω u v dx + ∫Ω ∇u · ∇v dx,

⟨ℓ, u⟩ := ∫Ω f u dx,

j(u) := ρ ∫ΓF |u| dΓ.

The geometrical configuration, data and solution for Example 8.3.4 with friction coefficient ρ := 1.0 can be seen in Figure 8.3.13. One should notice that the irregularity of the displacements along ΓF is due to the influence of the friction on the one-dimensional friction domain ΓF.

Figure 8.3.13: Simplified Signorini problem with boundary friction (ρ := 1.0); data: Ω = (0, 1)², f = 10 and f = −10 on the two halves of Ω, friction boundary ΓF

A comparison of the computations for the weak and strong regularization techniques, depicted in Tables 8.3.6 and 8.3.7, shows the superiority of the weak regularization approach.

dim     #reg  err           cpu
386      3    1.000000e+00  1.28e+01
1545     2    9.288978e-02  9.34e+00
6179     2    4.887215e-02  4.15e+01
24711    2    2.470255e-02  7.45e+02
98831    2    1.244462e-02  6.63e+03

Table 8.3.6: Iteration log: weak regularization


dim     #reg  err           cpu
386     10    1.284012e+00  5.52e+01
1545     4    9.199661e-02  1.74e+01
6179     3    4.841953e-02  7.61e+01
24711    3    2.464253e-02  9.58e+02
98831    3    1.246926e-02  1.27e+04

Table 8.3.7: Iteration log: strong regularization

Figure 8.3.14 shows the behavior of the solution along ΓF for different friction coefficients ρ: the stronger the friction, the smaller the displacements along the frictional boundary. For ρ := 2.0 there are no displacements at all.

ρ = 1 ρ = 1.5 ρ = 2.0

Figure 8.3.14: Solution for different frictional coefficients ρ
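This qualitative behavior can be reproduced in a scalar caricature of Example 8.3.4: minimizing (a/2)u² − f·u + ρ|u| over u ∈ R yields the soft-thresholding formula, so larger ρ shrinks the displacement, and for ρ ≥ |f| the minimizer is exactly 0. The names below are illustrative, not the book's discretization:

```python
import math

def soft_threshold(f, rho, a=1.0):
    """Minimizer of (a/2)*u**2 - f*u + rho*abs(u): the friction term rho*|u|
    shrinks the unconstrained minimizer f/a towards 0, and kills it entirely
    once rho >= |f| ("no displacements at all")."""
    return math.copysign(max(abs(f) - rho, 0.0), f) / a

u_weak = soft_threshold(2.0, 0.5)   # small friction: displacement remains
u_stuck = soft_threshold(2.0, 2.0)  # friction dominates: zero displacement
```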

8.4 Regularization of Well-posed Problems

Let us return to Problem (8.1.1) and assume that ⟨f, 1⟩_{0,Ω} < 0. Then, as mentioned in (8.1.4), the problem has a unique solution u∗. If for each function u ∈ H1(Ω) the Poincaré inequality

‖∇u‖²_{0,Ω} + (∫Ω u dΩ)² ≥ A ‖u‖²_{1,Ω}   (A > 0 independent of u)

holds on the domain Ω, then each generalized minimizing sequence converges in the norm of the space H1(Ω) to u∗ (cf. Kaplan and Namm [207]). Thus, regularization is introduced first of all to reduce the influence of errors arising in the construction of discrete approximations of the problems (especially if the boundary is curved), and also to guarantee a stable solution process for the approximate problems, as well as a more efficient application of fast algorithms. For the case that a quasi-uniform triangulation sequence can be constructed


with

Kk ⊂ Kk+1 ⊂ K,

which is possible, for instance, if Ω is a polyhedral set and the function g in (8.1.3) is convex on each face of Ω, an OSR-method is considered in [207], where uk is generated by

uk ∈ Kk, ‖uk − ūk‖_{1,Ω} ≤ εk, with ūk := arg min_{u∈Kk} { J(u) + ‖u − u^{k−1}‖²_{0,Ω} }.

If the solution of the original problem belongs to the space H2(Ω), then convergence of {uk} to u∗ in the norm of H1(Ω) is shown under the conditions

εk ≤ √(hk³),   Σ_{k=1}^∞ √(hk³) < ∞,

which are in general weaker than those resulting from Theorem 8.2.14 for OSR-methods. The assumption u∗ ∈ H2(Ω) seems natural, taking into account the results of Lions [271] and Brezis [51] for the similar problem with

J(u) := (1/2) ‖u‖²_{1,Ω} − ⟨f, u⟩_{0,Ω}

instead of (8.1.2). These conditions for εk and hk are not surprising, because the problem is well-posed with respect to small perturbations of both the objective functional in the norm of the space C(V) and the feasible set in the metric of H1(Ω).

Numerical experiments on solving the regularized auxiliary problems with relaxation methods show good stability of the solution process with respect to discretization errors (cf. Namm [300]).
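The OSR step above, minimizing J(u) + ‖u − u^{k−1}‖² inexactly, is an inexact proximal point iteration. A toy one-dimensional sketch, with plain gradient descent as a hypothetical inner solver (not the relaxation methods mentioned above):

```python
def prox_step(grad_J, u_prev, alpha=1.0, iters=200, lr=0.1):
    """Approximate one proximal step: minimize J(u) + alpha*(u - u_prev)**2
    by gradient descent on the regularized objective (1-D toy sketch)."""
    u = u_prev
    for _ in range(iters):
        u -= lr * (grad_J(u) + 2.0 * alpha * (u - u_prev))
    return u

# J(u) = (u - 3)^2 has minimizer u* = 3; the outer proximal iterations
# converge to it even though each step only solves a regularized problem.
u = 0.0
for _ in range(30):
    u = prox_step(lambda x: 2.0 * (x - 3.0), u)
```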

8.4.1 Linear rate of convergence of MSR-methods

Now we concentrate our investigation on the verification of Assumptions 5.1.4 and 5.1.6, which guarantee a linear rate of convergence of MSR-methods by changing the controlling parameters in the framework of Theorems 5.1.5 and 5.1.7 and Corollary 8.2.9.

We assume that the finite element approximation of the considered variational inequality secures the inclusions Kk ⊂ K and that the chosen method generates iterates u^{k,i} ∈ K. Then Assumptions 5.1.4 and 5.1.6 can be transformed into a formally simpler condition if we take c := 0 in (5.1.29).

For Problem (8.1.1), and for a number of other variational problems where the kernel of the bilinear form on the space V is one-dimensional, the validity of the modified assumption follows from a result in [301], which we describe now.

We consider the problem

min_{u∈K} (1/2) a(u, u) − ℓ(u),   (8.4.1)

with K a convex, closed subset of a Hilbert space V, a(·, ·) a symmetric, continuous bilinear form on V × V, and ℓ : V → R a bounded linear functional.

8.4.1 Assumption.


(i) the kernel K of the bilinear form a(·, ·) on V is one-dimensional;

(ii) a(u, u) ≥ c0 ‖Πu‖² ∀ u ∈ V (c0 > 0), with Π the orthogonal projector onto the orthogonal complement K⊥ of the kernel K;

(iii) ℓ(z) ≠ 0 ∀ z ∈ K, z ≠ 0;

(iv) Problem (8.4.1) is solvable. ♦

This assumption ensures that the solution u∗ of Problem (8.4.1) is unique. We show now that for Problem (8.1.1) Assumption 8.4.1(ii) is fulfilled whenever the Poincaré inequality is valid. Indeed, for each u ∈ V := H1(Ω) the element v := (1/meas Ω) ∫Ω u dΩ belongs to K, and for u1 := u − v one has ∫Ω u1 dΩ = 0, hence

a(u, u) = ((∇u, ∇u))_{0,Ω} = ((∇(u1 + v), ∇(u1 + v)))_{0,Ω} = ((∇u1, ∇u1))_{0,Ω}

        = ((∇u1, ∇u1))_{0,Ω} + (∫Ω u1 dΩ)² ≥ A ‖u1‖²_{1,Ω} ≥ A ‖Πu1‖²_{1,Ω} = A ‖Πu‖²_{1,Ω}.
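A discrete analogue of this verification: for the energy a(u, u) := Σᵢ (u_{i+1} − u_i)², constant vectors form the one-dimensional kernel, and subtracting the mean (the element v above) leaves the energy unchanged. A quick check, illustrative only and not the book's discretization:

```python
def grad_energy(u):
    """Discrete analogue of a(u,u) = ||grad u||^2: sum of squared forward
    differences, which vanishes exactly on constant vectors (the kernel)."""
    return sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))

def remove_mean(u):
    """Projection u1 = u - v with v the mean of u, so that sum(u1) = 0."""
    m = sum(u) / len(u)
    return [x - m for x in u]

u = [3.0, 1.0, 4.0, 1.0, 5.0]
u1 = remove_mean(u)
# shifting by a constant does not change the energy: a(u,u) = a(u1,u1)
```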

Using the analysis given in [307] for problems in elasticity theory, we can also state that Assumption 8.4.1 holds true for two-body contact problems.

8.4.2 Theorem. Let Assumption 8.4.1 be fulfilled for Problem (8.4.1). Then there exists a constant c1 > 0 such that

J(u) − J(u∗) ≥ c1 min [ ‖u − u∗‖, ‖u − u∗‖² ] ∀ u ∈ K,

with J(u) := (1/2) a(u, u) − ℓ(u).

Proof: For arbitrarily chosen u ∈ V the following relation is obvious:

J(u) − J(u∗) = a(u∗, u − u∗) − ℓ(u − u∗) + (1/2) a(u − u∗, u − u∗).

We decompose the vector u − u∗ into the direct sum

u − u∗ = λ‖u − u∗‖v′ + µ‖u − u∗‖v′′, λ² + µ² = 1,

where v′ ∈ K and v′′ ∈ K⊥ are unit vectors, with ℓ(v′) < 0. Then

J(u) − J(u∗) = a(u∗, µ‖u − u∗‖v′′) − ℓ(λ‖u − u∗‖v′ + µ‖u − u∗‖v′′)
             + (1/2) a(µ‖u − u∗‖v′′, µ‖u − u∗‖v′′)
           = ‖u − u∗‖ ( −λℓ(v′) − µℓ(v′′) + µ a(u∗, v′′) )
             + (µ²/2) ‖u − u∗‖² a(v′′, v′′).   (8.4.2)

The variational inequality

a(u∗, u − u∗) − ℓ(u − u∗) ≥ 0 ∀ u ∈ K,

which corresponds to Problem (8.4.1), can be rewritten as

−λℓ(v′) + µ ( a(u∗, v′′) − ℓ(v′′) ) ≥ 0.   (8.4.3)


If λ < 0, then we obtain

−√(1 − µ²) ℓ(−v′) + µ ( a(u∗, v′′) − ℓ(v′′) ) ≥ 0   (8.4.4)

and, because of ℓ(−v′) > 0, relation (8.4.4) leads to

µ² ≥ (ℓ(−v′))² / [ ( a(u∗, v′′) − ℓ(v′′) )² + (ℓ(−v′))² ].

Choosing

M ≥ sup_{u≠0} a(u, u)/‖u‖², ℓ0 := ‖ℓ‖_{V′},

the inequality

µ² ≥ (ℓ(−v′))² / [ (M‖u∗‖ + ℓ0)² + (ℓ(−v′))² ] =: c2   (8.4.5)

is true. Assumption 8.4.1(ii) together with (8.4.2), (8.4.3) and (8.4.5) yields

J(u) − J(u∗) ≥ (c2 c0 / 2) ‖u − u∗‖².   (8.4.6)

If λ ≥ 0, then we have to face two possibilities:

|µ| > c3 > 0 and |µ| ≤ c3,

with

c3 := [ −c4 c5 + c5 √(4c4² + 3c5²) ] / [ 2 (c4² + c5²) ] < 1

and c4 := M‖u∗‖ + ℓ0, c5 := ℓ(−v′). In the first case the estimate

J(u) − J(u∗) ≥ (c3² c0 / 2) ‖u − u∗‖²   (8.4.7)

follows immediately from (8.4.2), (8.4.3) and Assumption 8.4.1(ii). In the second case the inequality

√(1 − µ²) c5 − |µ| c4 ≥ c5/2

can be easily verified. Thus

λℓ(−v′) + µ ( a(u∗, v′′) − ℓ(v′′) ) ≥ √(1 − µ²) c5 − |µ| c4 ≥ c5/2

holds true and, due to (8.4.2), we obtain

J(u)− J(u∗) ≥ c52‖u− u∗‖. (8.4.8)

Combining the inequalities (8.4.6), (8.4.7) and (8.4.8), we get the statement ofTheorem 8.4.2 with

c1 := min

[c0c2

2,c0c

23

2,c52

].
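The choice of c₃ in the case λ ≥ 0 can be checked numerically: c₃ as defined above is exactly the root in (0, 1) of √(1 − µ²)c₅ − µc₄ = c₅/2, so the inequality of the second case holds on all of [0, c₃]. The following sketch verifies this for random sample values of c₄, c₅ (the sampling ranges are illustrative assumptions, not from the text):

```python
import math, random

def c3(c4, c5):
    # the constant from the proof of Theorem 8.4.2; it is the root in (0, 1) of
    #   sqrt(1 - mu^2)*c5 - mu*c4 = c5/2
    return (-c4 * c5 + c5 * math.sqrt(4 * c4 * c4 + 3 * c5 * c5)) / (2 * (c4 * c4 + c5 * c5))

random.seed(0)
ok_root, ok_bound = True, True
for _ in range(200):
    c4, c5 = random.uniform(0.1, 10.0), random.uniform(0.1, 10.0)
    m = c3(c4, c5)
    # m lies in (0, 1) and attains equality in the key estimate
    ok_root = ok_root and 0.0 < m < 1.0 \
        and abs(math.sqrt(1.0 - m * m) * c5 - m * c4 - c5 / 2.0) < 1e-8
    # sqrt(1 - mu^2)*c5 - |mu|*c4 >= c5/2 holds for all |mu| <= c3
    for j in range(50):
        mu = m * j / 49.0
        ok_bound = ok_bound and math.sqrt(1.0 - mu * mu) * c5 - mu * c4 >= c5 / 2.0 - 1e-8
```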


Now, setting θ := c₁/(2r), the validity of Assumption 5.1.4 with c := 0 follows immediately from this statement.

It seems to be important to study proximal point methods for elliptic variational inequalities with non-linear operators, in particular, for the generalized Problem (A2.1.6) (see Gwinner [157], [159]), where, instead of the operator −∆u, the p-harmonic operator −div(‖∇u‖^{p−2}∇u) with p > 2 is considered. In case the differential operator is not strongly monotone, Lemma 3.1 in [157] gives a useful criterion for the well-posedness of the variational inequality.

8.5 Elliptic Regularization

This section is mainly focused upon the applications of proximal-like methods for solving variational inequalities with degenerate or singularly perturbed elliptic operators. The conditions on the data approximation in the IPR-method (cf. Assumptions 8.5.3(iii)-(v) below) are weaker than those arising from the theory of variational convergence for ill-posed problems, and at the same time they are well-coordinated with the estimates of finite element interpolation in Sobolev spaces (see the analysis of these conditions in Subsection 8.5.3 below).

For variational inequalities related to minimal surface problems and to convection-diffusion problems, considered in Subsection 8.5.3, the general framework studied here covers a new (proximal-based) elliptic regularization method, in which a successive approximation of the set K is performed by means of the finite element method on a sequence of triangulations.

The idea of elliptic regularization, proposed by Lions [268] and Olejnik [311], consists in an approximation of a degenerate elliptic problem (parabolic problems are treated as a special case) by a sequence of non-degenerate elliptic problems. In the classical scheme this is carried out by adding the term εT (T an appropriately chosen operator) to the operator of the original problem and by considering the sequence of solutions of the regularized problems for ε ↓ 0. Elliptic regularization is an efficient tool for the theoretical analysis of degenerate problems (cf. [234], [268], [271]) and serves as a basis for some numerical methods (see [106] and references therein). However, the necessity to reduce the parameter ε to 0 imposes hard requirements on the exactness of the discretization and causes ill-conditioning of the discretized problems.

Applying a proximal elliptic regularization, one can keep the regularization parameter bounded away from 0, and one obtains a sequence of uniformly elliptic auxiliary problems with a common constant of ellipticity.
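This advantage can be illustrated on a diagonal model problem. The sketch below is not from the text: A stands for a degenerate operator with eigenvalues {0, 1}; the classical regularization A + εI becomes ill-conditioned as ε ↓ 0, while the proximal variant A + χI with χ fixed away from 0 keeps a uniformly bounded condition number.

```python
def cond_diag(eigs):
    # condition number of a symmetric positive definite matrix via its eigenvalues
    return max(eigs) / min(eigs)

# degenerate elliptic model operator: eigenvalues 0 and 1 (not uniformly elliptic)
A = [0.0, 1.0]

# classical elliptic regularization A + eps*I: conditioning blows up as eps -> 0
classical = [cond_diag([lam + eps for lam in A]) for eps in (1e-1, 1e-3, 1e-5)]

# proximal regularization A + chi*I with chi fixed away from 0 (here chi = 1)
proximal = cond_diag([lam + 1.0 for lam in A])
```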

Noteworthy is that for the minimal surface problems (with and without obstacles) considered in the space H¹(Ω), the special analysis based on Theorem 8.5.13 establishes a new result on the convergence of the proximal elliptic regularization method in the space W^{1,1}(Ω), whereas the general theory of proximal point methods guarantees weak convergence in H¹(Ω) only (cf. Lemma 4.1.2). The proved strong convergence for the convection-diffusion problem is not so surprising, because the operator of the problem is strongly monotone although singularly perturbed, too.

In both cases regularization permits one to obtain “well-conditioned” discretized problems if the successive approximation of K is performed by means of standard finite element techniques.

8.5.1 A generalized proximal point method

Let (V, ‖ · ‖) be a Hilbert space with the topological dual V′. We consider the variational inequality

VI(Q, ϕ,K)  find u ∈ K and p ∈ ∂ϕ(u) :  〈Q(u) + p, v − u〉 ≥ 0, ∀ v ∈ K,

where K is a convex, closed subset of V; Q : V → V′ is a single-valued, monotone operator whose domain D(Q) contains K and which is hemicontinuous on K (see Definition A1.6.38); ϕ : V → R̄ := R ∪ {+∞} is a convex, proper and lower-semicontinuous functional, ∂ϕ denotes its subdifferential and D(∂ϕ) ⊃ K.

In this subsection we continue our general algorithmic framework of IPR, now for solving VI(Q, ϕ,K). As before, it joins iterative proximal regularization and data approximation in the manner of a diagonal process, i.e. the approximation of K and ∂ϕ is improved after each proximal iteration. If the operator Q possesses a certain reserve of monotonicity, as described in the Assumptions 8.5.1 and 8.5.3(ii) below, the conditions on the choice of the regularizing functional admit the application of weak proximal regularization as well as of regularization on a subspace, introduced in Section 8.2.3.

In the sequel the following assumption concerning VI(Q, ϕ,K) will be used.

8.5.1 Assumption.

(i) K ∩ int D(∂ϕ) ≠ ∅;

(ii) for a given linear continuous and monotone operator B : V → V′ with the symmetry property 〈Bu, v〉 = 〈Bv, u〉, the inequality

〈Q(u) − Q(v), u − v〉 ≥ 〈B(u − v), u − v〉, ∀ u, v ∈ D(Q)

is valid;

(iii) VI(Q, ϕ,K) is solvable. ♦

We denote the solution set of VI(Q, ϕ,K) by SOL(Q, ϕ,K), and we write VI(Q,K) and SOL(Q,K) in case ϕ ≡ 0.

With the normality operator

NK(u) := {z ∈ V′ : 〈z, u − v〉 ≥ 0 ∀ v ∈ K} if u ∈ K,  NK(u) := ∅ otherwise,

the maximal monotonicity of Q + NK follows from Theorem 1 in [350]. Hence, Assumption 8.5.1(i) provides that the operator Q + NK + ∂ϕ is maximal monotone, too (see [350], Theorem 3).

The IPR-framework studied here includes a successive approximation of the set K and of the functional ϕ by means of a sequence {Kk}, Kk ⊂ K, of convex closed sets and a sequence {ϕk}, ϕk : V → R, of convex functionals, respectively. Moreover, we suppose that ϕk is Gâteaux-differentiable on K and ∇ϕk is hemicontinuous on K.


Let % : V → R be a convex Gâteaux-differentiable functional such that

v ↦ %(v) + 〈Bv, v〉

is strongly convex on D(Q), where B satisfies Assumption 8.5.1(ii). As usual, we also make use of the controlling sequences {δk} and {χk} such that

δk ≥ 0,  lim_{k→∞} δk = 0,  0 < χk ≤ χ̄ < ∞. (8.5.1)

The choice of Kk, ϕk, % as well as of the sequences {δk} and {χk} will be specified in Assumption 8.5.3.

8.5.2 Method. (Generalized Proximal Point Method)

Data: u0 ∈ K; χ0 > 0, δ0 ≥ 0.

S1: Set k := 0.

S2: If uk is a solution of VI(Q, ϕ,K), stop.

S3: Find uk+1 ∈ Kk such that

(P k)  〈Q(uk+1) + ∇ϕk(uk+1) + χk(∇%(uk+1) − ∇%(uk)), v − uk+1〉 ≥ −δk‖v − uk+1‖, ∀ v ∈ Kk.

S4: Select χk+1, δk+1, set k := k + 1, go to S2. ♦
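Steps S1–S4 can be sketched in finite dimensions. The following toy example is an illustration under assumptions not taken from the text: an affine monotone (but not strongly monotone) operator Q on R², ϕ ≡ 0, Kk = K a box, the classical choice %(v) = (1/2)‖v‖², a constant χk, exact subproblems (δk = 0), and a fixed iteration budget in place of the stopping test S2.

```python
def proj_box(v, lo, hi):
    # Euclidean projection onto the box [lo, hi]^2 (this plays the role of K = K^k)
    return [min(max(x, lo), hi) for x in v]

def Q(u):
    # toy monotone (but not strongly monotone) operator Q(u) = A u + b,
    # with A = [[1, -1], [-1, 1]] positive semidefinite and b = (-1, 0)
    return [u[0] - u[1] - 1.0, u[1] - u[0]]

def prox_step(uk, chi, gamma=0.2, inner=2000):
    # step S3 with phi = 0 and rho(v) = 0.5*||v||^2: find u in K with
    #   <Q(u) + chi*(u - uk), v - u> >= 0  for all v in K.
    # The subproblem is strongly monotone, so projected iterations converge.
    u = uk[:]
    for _ in range(inner):
        g = Q(u)
        u = proj_box([u[i] - gamma * (g[i] + chi * (u[i] - uk[i]))
                      for i in range(2)], 0.0, 2.0)
    return u

def generalized_ppm(u0, chi=1.0, outer=60):
    # outer loop S2-S4 with a fixed iteration budget instead of the stopping test
    u = u0[:]
    for _ in range(outer):
        u = prox_step(u, chi)
    return u

u = generalized_ppm([0.0, 0.0])
```

The inner projected fixed-point loop is just one convenient way to solve the strongly monotone subproblem (P k); any solver for it could be substituted.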

8.5.3 Assumption.

(i) % : V → R is a convex functional and the mapping ∇% is Lipschitz continuous on D(Q);

(ii) with given constants χ ≥ 0, m > 0 and the operator B satisfying Assumption 8.5.1(ii), the inequality

(χ/2)〈B(u − v), u − v〉 + %(u) − %(v) − 〈∇%(v), u − v〉 ≥ m‖u − v‖²

holds for all u, v ∈ D(Q), and the choice of χ̄ in (8.5.1) provides 2χχ̄ < 1;

(iii) for each w ∈ K, there exists a sequence {wk}, wk ∈ Kk, such that

wk ⇀ w,  Q(wk) → Q(w),  lim_{k→∞} ϕk(wk) = ϕ(w);

(iv) ϕk(v) ≥ ϕ(v), ∀ v ∈ K, ∀ k;

(v) with given nonnegative constants c₁, c₂, c and sequences {hk}, {σk} satisfying

∑_{k=1}^∞ hk/χk < ∞,  ∑_{k=1}^∞ σk/χk < ∞, (8.5.2)


there exists a sequence {wk}, wk ∈ Kk, such that

‖wk − u∗‖ ≤ c₁hk,  ‖Q(wk) − Q(u∗)‖V′ ≤ c₂σk,  ϕk(wk) − ϕ(u∗) ≤ cσk (8.5.3)

for some u∗ ∈ SOL(Q, ϕ,K). ♦

For instance, for contact problems (with given friction), taking into account relation (8.2.3), the operator

B : 〈Bu, v〉 := c₀ ∫Ω εkl(u)εkl(v) dx, ∀ u, v ∈ V

satisfies Assumption 8.5.1(ii), and the second Korn inequality (see (8.2.55)) allows one to guarantee the validity of Assumption 8.5.3(ii) with the regularizing functional

% : u ↦ ‖u‖²_{[L²(Ω)]²}, (8.5.4)

which also meets Assumption 8.5.3(i). Of course, % : u ↦ ‖u‖² is also a possible choice, but (8.5.4) is preferable from the numerical point of view.

8.5.4 Remark. In Subsection 8.5.3.1 we meet a situation where the assumption Kk ⊂ K can be rather restrictive, especially if the sets Kk are constructed by means of standard finite element techniques. In the problems considered there, however, we have ϕ ≡ 0 and the operator Q possesses good additional properties like strict monotonicity and Lipschitz continuity on the whole space V. This permits one to replace the assumption Kk ⊂ K by the following weaker one:

(a) with a given c₃ > 0 and hk as in (8.5.2), for any sequence {vk}, vk ∈ Kk, there exists a sequence {zk} ⊂ K such that

‖zk − vk‖ ≤ c₃(‖vk − u∗‖² + 1)hk, ∀ k,

or

(b) with a given c₃ > 0 and {hk}, {vk} as in (a), there exists a sequence {zk} ⊂ K such that

〈Q(u∗), zk − vk〉 ≤ c₃(‖vk − u∗‖² + 1)hk, ∀ k;

moreover, each weak limit point of {vk} belongs to K.

All statements of Subsection 8.5.2 (except for Theorem 8.5.13 if the weaker assumption (b) is used) remain true; the technical modifications in the proofs can be carried out on the basis of the convergence analysis in [225]. ♦

8.5.5 Remark. If the Assumptions 8.5.1(ii), 8.5.3(i) and (ii) are valid, then for each k the operator

v ↦ Q(v) + ∇ϕk(v) + χk(∇%(v) − ∇%(uk)) + NKk(v)

is maximal monotone ([350], Theorem 3). Hence, each exact problem (P k) (with δk = 0) has a unique solution, and the solvability of the inexact problems (with δk > 0) is guaranteed. ♦


8.5.2 Convergence analysis of the method

In the sequel we need the following modification of Minty’s lemma [290].

8.5.6 Lemma. Let Assumption 8.5.1(i) be valid, and suppose that for some u ∈ K and for any v ∈ K there exists p(v) ∈ ∂ϕ(v) such that

〈Q(v) + p(v), v − u〉 ≥ 0. (8.5.5)

Then, with some p ∈ ∂ϕ(u), the inequality

〈Q(u) + p, v − u〉 ≥ 0 (8.5.6)

holds for all v ∈ K.

Proof: Denote

G : v ↦ Q(v) + ∂ϕ(v) + I(v − u),

where I : V → V′ is the canonical isometry operator. The operator Ḡ := G + NK is maximal monotone and strongly monotone. Therefore, there exists w ∈ K such that 0 ∈ Ḡ(w), and using the definition of NK, we infer from here that

〈g(w), v − w〉 ≥ 0, ∀ v ∈ K, (8.5.7)

holds with some g(w) ∈ G(w). In view of the convexity of the functional ϕ, the last inequality implies

〈Q(w) + I(w − u), v − w〉 + ϕ(v) − ϕ(w) ≥ 0, ∀ v ∈ K. (8.5.8)

If w = u, then of course g(w) ∈ Q(w) + ∂ϕ(w), and the conclusion of the lemma is obvious. Suppose now that w ≠ u. We make use of the relation

〈ḡ(v), v − u〉 ≥ 0, ∀ v ∈ K, (8.5.9)

which follows from (8.5.5) for ḡ(v) := g(v) + I(v − u) with an appropriate g(v) ∈ Q(v) + ∂ϕ(v). Take wλ = u + λ(w − u) for λ ∈ [0, 1). Obviously, wλ ∈ K, and according to (8.5.9), for each λ there exists g(wλ) ∈ G(wλ) satisfying

〈g(wλ), w − u〉 ≥ 0. (8.5.10)

Using again the convexity of ϕ, one can immediately conclude from (8.5.10) that

〈Q(wλ) + I(wλ − u), w − u〉 + (1/(1 − λ))[ϕ(w) − ϕ(u + λ(w − u))] ≥ 0. (8.5.11)

Passing to the limit in (8.5.11) for λ ↓ 0 and observing that the operator Q is hemicontinuous on K and the functional ϕ is lsc, we get

〈Q(u), w − u〉 + ϕ(w) − ϕ(u) ≥ 0. (8.5.12)

Inequality (8.5.8) (given with v = u) together with (8.5.12) leads to

〈Q(u) − Q(w), u − w〉 + 〈I(u − w), u − w〉 ≤ 0,

but this contradicts the monotonicity of Q.


8.5.7 Remark.

(a) The reverse conclusion, that (8.5.6) implies (8.5.5) (with any p(v) ∈ ∂ϕ(v)), follows immediately from the monotonicity of the operator Q + ∂ϕ. Using this fact and Lemma 8.5.6, one can easily show that SOL(Q, ϕ,K) is a convex closed set.

(b) Under Assumption 8.5.1(i) the following statements are equivalent:

(b1) u ∈ K and 〈Q(v), v − u〉+ ϕ(v)− ϕ(u) ≥ 0, ∀ v ∈ K;

(b2) u ∈ K and ∃ p ∈ ∂ϕ(u): 〈Q(u) + p, v − u〉 ≥ 0, ∀ v ∈ K.

Indeed, the implication (b2) ⇒ (b1) is evident. But if (b1) is fulfilled, then the monotonicity of Q and the convexity of ϕ yield

〈Q(v), v − u〉+ 〈p(v), v − u〉 ≥ 0, ∀ v ∈ K, ∀ p(v) ∈ ∂ϕ(v),

and Lemma 8.5.6 provides the validity of (b2). ♦

Now, with χ from Assumption 8.5.3(ii) we introduce the function

Γ : (u, v) ↦ χ〈B(v − u), v − u〉 + %(u) − %(v) − 〈∇%(v), u − v〉. (8.5.13)

For u∗ chosen as in Assumption 8.5.3(v), Γ(u∗, ·) plays the role of a Lyapunov function in this convergence analysis.

8.5.8 Lemma. Let the Assumptions 8.5.1(ii), (iii), 8.5.3(i), (ii), (iv) and (v) be satisfied and ∑_{k=1}^∞ δk/χk < ∞. Then it holds:

(a) the sequence {uk} generated by Method 8.5.2 is bounded;

(b) lim_{k→∞} Γ(uk+1, uk) = 0;

(c) lim_{k→∞} ‖uk+1 − uk‖ = 0;

(d) the sequence {Γ(u∗, uk)} converges.

Sketch of the proof: Let wk be chosen as in Assumption 8.5.3(v). Taking into account Assumption 8.5.3(iv), (v) and the convexity of ϕk, we obtain

〈∇ϕk(uk+1), wk − uk+1〉 ≤ ϕk(wk) − ϕk(uk+1)
  = [ϕk(wk) − ϕ(u∗)] + [ϕ(u∗) − ϕk(uk+1)]
  ≤ cσk + ϕ(u∗) − ϕ(uk+1). (8.5.14)

Because u∗ ∈ SOL(Q, ϕ,K) and uk+1 ∈ Kk ⊂ K, the inequality

〈Q(u∗), uk+1 − u∗〉 + ϕ(uk+1) − ϕ(u∗) ≥ 0 (8.5.15)

is valid. Applying (P k), (8.5.14) and (8.5.15) for the estimation of

〈∇%(uk) − ∇%(uk+1), wk − uk+1〉,

the rest of the proof is analogous to that of Lemma 2 in [225].


8.5.9 Lemma. Let the Assumptions 8.5.1(i), 8.5.3(i), (iii) and (iv) be fulfilled. Moreover, suppose that the sequence {uk} generated by Method 8.5.2 is bounded and

lim_{k→∞} ‖uk+1 − uk‖ = 0.

Then each weak limit point of {uk} is a solution of VI(Q, ϕ,K).

Proof: Let the subsequence {uk}_{k∈S} converge weakly to ū. Because Kk ⊂ K ∀ k and K is a closed convex set, one gets ū ∈ K, whereas lim_{k→∞} ‖uk+1 − uk‖ = 0 implies uk+1 ⇀ ū for k ∈ S, k → ∞. According to Assumption 8.5.3(iii), for each v ∈ K one can choose a sequence {vk}, vk ∈ Kk, such that vk ⇀ v for k → ∞ and

lim_{k→∞} ‖Q(vk) − Q(v)‖V′ = 0,  lim_{k→∞} ϕk(vk) = ϕ(v). (8.5.16)

By the definition of uk+1 (see (P k)) and the inclusion vk ∈ Kk, the inequality

〈Q(uk+1) + ∇ϕk(uk+1) + χk(∇%(uk+1) − ∇%(uk)), vk − uk+1〉 ≥ −δk‖vk − uk+1‖

holds for all k. Due to the monotonicity of Q, the convexity of ϕk and Assumption 8.5.3(iv), this leads to

〈Q(vk) + χk(∇%(uk+1) − ∇%(uk)), vk − uk+1〉 + ϕk(vk) − ϕ(uk+1) ≥ −δk‖vk − uk+1‖.

Now, passing to the limit in the latter inequality for k ∈ S, k → ∞, we obtain from (8.5.1), (8.5.16), Assumption 8.5.3(i), the relations

lim_{k→∞} ‖uk+1 − uk‖ = 0,  vk ⇀ v,  uk+1 ⇀ ū,

and the lower semicontinuity of ϕ that

〈Q(v), v − ū〉 + ϕ(v) − ϕ(ū) ≥ 0, ∀ v ∈ K.

Finally, Lemma 8.5.6 and Remark 8.5.7(b) enable us to conclude that ū ∈ SOL(Q, ϕ,K).

8.5.10 Theorem. Let the Assumptions 8.5.1 and 8.5.3 be fulfilled and ∑_{k=1}^∞ δk/χk < ∞. Then it holds:

(i) Problem (P k) is solvable for each k, the sequence {uk} generated by Method 8.5.2 is bounded and each weak limit point of {uk} is a solution of VI(Q, ϕ,K).

(ii) If, in addition, Assumption 8.5.3(v) holds for each u ∈ SOL(Q, ϕ,K) (the constants c₁, c₂, c may depend on u) and

vk ⇀ v in V, vk ∈ Kk implies ∇%(vk) ⇀ ∇%(v) in V′, (8.5.17)

then the whole sequence {uk} converges weakly to u∗ ∈ SOL(Q, ϕ,K).


(iii) If, moreover, there exists a linear compact operator B̄ : V → V′ such that B + B̄ is strongly monotone, then {uk} converges strongly to u∗ ∈ SOL(Q, ϕ,K).

Proof: Conclusion (i) follows immediately from Remark 8.5.5 and the Lemmata 8.5.8 and 8.5.9. The proof of conclusion (ii) is the same as in [225], Theorem 1. Thus, it remains to prove (iii) only. Let u∗ be the weak limit of {uk}. Choosing wk according to Assumption 8.5.3(v), we obtain from Assumption 8.5.1(ii)

〈B(uk+1 − u∗), uk+1 − u∗〉
  = 〈B(uk+1 − wk), uk+1 − wk〉 − 〈B(u∗ − wk), uk+1 − u∗〉 − 〈B(uk+1 − wk), u∗ − wk〉
  ≤ 〈Q(uk+1) − Q(wk), uk+1 − wk〉 − 〈B(uk+1 − u∗), u∗ − wk〉 − 〈B(uk+1 − wk), u∗ − wk〉. (8.5.18)

Now, we estimate the term 〈Q(uk+1), uk+1 − wk〉 by setting v = wk in (P k), and then insert this estimate in (8.5.18). Together with Assumption 8.5.3(iv) this yields

〈B(uk+1 − u∗), uk+1 − u∗〉
  ≤ 〈Q(wk), wk − uk+1〉 + 〈∇ϕk(uk+1), wk − uk+1〉 + χk〈∇%(uk+1) − ∇%(uk), wk − uk+1〉
    + δk‖wk − uk+1‖ + 〈Bu∗ + Bwk − 2Buk+1, u∗ − wk〉
  ≤ 〈Q(wk) − Q(u∗), wk − uk+1〉 + 〈Q(u∗), wk − u∗〉 + 〈Q(u∗), u∗ − uk+1〉
    + χk〈∇%(uk+1) − ∇%(uk), wk − uk+1〉 + δk‖wk − uk+1‖
    + 〈Bu∗ + Bwk − 2Buk+1, u∗ − wk〉 + (ϕk(wk) − ϕ(uk+1)). (8.5.19)

Taking into account that the sequences {wk} and {uk+1} are bounded, one can conclude that all terms on the right-hand side of (8.5.19) tend to zero for k → ∞. Indeed, there vanish:

– the first, second and sixth term in view of Assumption 8.5.3(v);

– the third term because uk ⇀ u∗;

– the fourth term due to ‖uk+1 − uk‖ → 0 and Assumption 8.5.3(i);

– the fifth term owing to δk → 0;

– the last term in view of Assumption 8.5.3(v), uk+1 ⇀ u∗ and the lower semicontinuity of ϕ.

Thus, (8.5.19) implies

lim_{k→∞} 〈B(uk − u∗), uk − u∗〉 = 0. (8.5.20)

At the same time,

lim_{k→∞} 〈B̄(uk − u∗), uk − u∗〉 = 0 (8.5.21)

follows from uk ⇀ u∗ and the compactness of B̄. Adding (8.5.20), (8.5.21) and using the strong monotonicity of B + B̄, we conclude finally that {uk} converges to u∗ strongly in V.


8.5.11 Remark. From the compactness of the canonical injection I : H¹(Ω) → L²(Ω) (Ω is here an open domain in R² with a Lipschitz continuous boundary) and the second Korn inequality, the existence of an operator B̄ satisfying condition (iii) of Theorem 8.5.10 can be shown, in particular, for ill-posed elliptic variational inequalities which describe the problem of linear elasticity with given friction and the two-body contact problem. We dealt with these problems in Sections 8.2 and 8.3. For the problem of linear elasticity, for example, the operator B̄ defined by

〈B̄u, v〉 = ∫Ω (u₁v₁ + u₂v₂) dx, ∀ u, v ∈ V := [H¹(Ω)]²

is appropriate. ♦

The following assumption serves to establish a more qualitative convergence of Method 8.5.2 in those situations when the conditions (ii) and (iii) of Theorem 8.5.10 are not guaranteed.

Let Y ⊃ V be a Banach space with the norm ‖ · ‖Y, and

distY(y, A) := inf_{z∈A} ‖y − z‖Y.

According to Lemma 8.5.8, there exists ρ > 0 such that, with Bρ := {v ∈ V : ‖v‖ ≤ ρ},

{uk} ⊂ Bρ and Bρ ∩ SOL(Q, ϕ,K) ≠ ∅.

Denote S∗ := SOL(Q, ϕ,K) ∩ Bρ.

8.5.12 Assumption. There exists a continuous function τ : [0,∞) → [0,∞), τ(0) = 0, τ(s) > 0 ∀ s > 0, such that

〈Q(v), v − u〉 + ϕ(v) − ϕ(u) ≥ τ(distY(v, S∗)), ∀ u ∈ S∗, ∀ v ∈ K ∩ Bρ. ♦

For a fixed u ∈ K ∩ Bρ, the function

ξ(·, u) : v ↦ 〈Q(v), v − u〉 + ϕ(v) − ϕ(u)

possesses the properties

ξ(v, u) ≥ 0 ∀ v ∈ K ⇔ u ∈ S∗,
ξ(v, u) = 0 if v ∈ SOL(Q, ϕ,K), u ∈ S∗.

Assumption 8.5.12 describes a growth condition for the function inf_{u∈S∗} ξ(·, u) on the set (K ∩ Bρ) \ S∗. In case Q = 0, K = V, Y = V and τ(s) = cs², Assumption 8.5.12 is closely related to a growth condition used by Kort and Bertsekas [246] for the quadratic method of multipliers in convex programming, which in fact is a proximal point method applied to the dual program.

8.5.13 Theorem. Let the conditions of Lemma 8.5.8 and Assumption 8.5.12 be fulfilled. Moreover, suppose that Assumption 8.5.3(v) is valid for each u∗ ∈ S∗ and that the operator Q is bounded on K ∩ Bρ. Then, for the sequence {uk} generated by Method 8.5.2, it holds

lim_{k→∞} distY(uk, S∗) = 0. (8.5.22)


Proof: With u∗ ∈ S∗ and wk chosen as in Assumption 8.5.3(v), one gets

Γ(u∗, uk+1) − Γ(u∗, uk) = −Γ(uk+1, uk) + 〈∇%(uk) − ∇%(uk+1), u∗ − wk + wk − uk+1〉 + 2χ〈B(uk − uk+1), u∗ − uk+1〉,

and applying (P k) to estimate the term 〈∇%(uk) − ∇%(uk+1), wk − uk+1〉, we obtain

Γ(u∗, uk+1) − Γ(u∗, uk) ≤ 2χ〈B(uk − uk+1), u∗ − uk+1〉 + 〈∇%(uk) − ∇%(uk+1), u∗ − wk〉
  + (δk/χk)‖wk − uk+1‖ + (1/χk)〈Q(uk+1), wk − uk+1〉 + (1/χk)〈∇ϕk(uk+1), wk − uk+1〉. (8.5.23)

Now, taking into account the convexity of ϕk and Assumption 8.5.3(iv), inequality (8.5.23) yields

Γ(u∗, uk+1) − Γ(u∗, uk) ≤ 2χ〈B(uk − uk+1), u∗ − uk+1〉 + 〈∇%(uk) − ∇%(uk+1), u∗ − wk〉
  + (δk/χk)‖wk − uk+1‖ + (1/χk)〈Q(uk+1), wk − u∗〉 + (1/χk)(ϕk(wk) − ϕ(u∗))
  + (1/χk)[〈Q(uk+1), u∗ − uk+1〉 + ϕ(u∗) − ϕ(uk+1)].

In view of Assumption 8.5.12 and χk ≤ χ̄, the last term can be estimated from above by

−(1/χ̄) τ(distY(uk+1, S∗)).

Finally, passing to the limit in the inequality so modified, and owing to Lemma 8.5.8, Assumption 8.5.3(i), (v), the boundedness of the operator Q on K ∩ Bρ, and ∑_{k=1}^∞ δk/χk < ∞, we immediately obtain

lim_{k→∞} τ(distY(uk+1, S∗)) = 0.

Now the properties of τ imply the validity of (8.5.22).

We conclude this section with a statement which can be useful for checking Assumption 8.5.12. It allows us to analyze a growth property of the function

v ↦ inf_{u∈S∗} [〈Q(v), v − u〉 + ϕ(v) − ϕ(u)]

in a Y-neighborhood of S∗ only. With ρ as above, a constant c such that

‖v‖Y ≤ c‖v‖ ∀ v ∈ V,

and a given δ ∈ (0, 2cρ), we consider the set

Kδ := {v ∈ K ∩ Bρ : distY(v, S∗) ≤ δ}.


8.5.14 Lemma. Suppose that τ : [0,∞) → [0,∞) is a nondecreasing function such that

〈Q(v), v − u〉 + ϕ(v) − ϕ(u) ≥ τ(distY(v, S∗)) (8.5.24)

is valid for any v ∈ Kδ, u ∈ S∗. Then the inequality

inf_{u∈S∗} [〈Q(v), v − u〉 + ϕ(v) − ϕ(u)] ≥ τ((δ/(2cρ)) distY(v, S∗)) (8.5.25)

holds for any v ∈ K ∩ Bρ.

Proof: Obviously, we need to check (8.5.25) for v ∈ (K ∩ Bρ) \ Kδ only. Take an arbitrary u ∈ S∗ and introduce the function

ψ(λ) := distY(λv + (1 − λ)u, S∗) for 0 < λ < 1.

In view of the convexity of S∗ (see Remark 8.5.7(a)), the function distY(·, S∗) is convex, and this implies the convexity of ψ. In turn, because dom ψ = (0, 1), ψ is continuous on (0, 1), and taking into account that u ∈ S∗ ⊂ Kδ, v ∉ Kδ, we conclude that there exists λ̄ ∈ (0, 1) such that

v̄ := λ̄v + (1 − λ̄)u ∈ Kδ and distY(v̄, S∗) = δ.

Now, using

v̄ − u = λ̄(v − u),  ((1 − λ̄)/λ̄)(v̄ − u) = v − v̄,

from the monotonicity of Q and the convexity of ϕ we obtain

((1 − λ̄)/λ̄)〈Q(v) − Q(v̄), v̄ − u〉 = 〈Q(v) − Q(v̄), v − v̄〉 ≥ 0

and

ϕ(v) − ϕ(u) ≥ (1/λ̄)(ϕ(v̄) − ϕ(u)).

Therefore,

〈Q(v), v − u〉 = (1/λ̄)〈Q(v), v̄ − u〉 ≥ (1/λ̄)〈Q(v̄), v̄ − u〉

and

〈Q(v), v − u〉 + ϕ(v) − ϕ(u) ≥ (1/λ̄)[〈Q(v̄), v̄ − u〉 + ϕ(v̄) − ϕ(u)]

hold, and inequality (8.5.24) yields

〈Q(v), v − u〉 + ϕ(v) − ϕ(u) ≥ (1/λ̄) τ(distY(v̄, S∗)) ≥ τ(distY(v̄, S∗)).

But distY(v, S∗) ≤ 2cρ follows from v ∈ Bρ, ‖v‖Y ≤ c‖v‖, and S∗ ⊂ Bρ. Hence

distY(v̄, S∗) = δ ≥ (δ/(2cρ)) distY(v, S∗),

and taking into account that τ is nondecreasing, we conclude that

〈Q(v), v − u〉 + ϕ(v) − ϕ(u) ≥ τ((δ/(2cρ)) distY(v, S∗)).


Because u ∈ S∗ is arbitrarily chosen, this leads to (8.5.25).

A growth condition like (8.5.24) with τ(s) = cs² and τ(s) = cs was introduced in [224] to investigate the rate of convergence of multi-step proximal regularization methods.

The result above will be applied in the next subsection to show W^{1,1}-convergence of the iterates of Method 8.5.2 for the minimal surface problem and related variational inequalities considered in the H¹-space.

8.5.3 Application to minimal surface problems

In the next two subsections we deal with elliptic variational problems in the space V = H¹(Ω), where Ω is an open domain in R² with a Lipschitz continuous boundary Γ. In this context the convex closed set K is defined as

K = {v ∈ V : v = g on Γ₁} (8.5.26)

or

K = {v ∈ V : v = g on Γ, v ≥ ψ a.e. on Ω}, (8.5.27)

where Γ₁ ⊆ Γ, meas Γ₁ > 0; g and ψ are sufficiently smooth functions on Ω̄ := Ω ∪ Γ and ψ ≤ g on Γ.

Applying Method 8.5.2 to these problems, a successive approximation of K by means of the finite element method on a sequence of triangulations {Tk} is performed. In accordance with the conditions on approximation formulated in Assumption 8.5.3, we will suppose that

– Ω is a polygonal domain;

– the solution of the problem is sufficiently smooth;

– a standard finite element method with piecewise linear basis functions on the regular sequence of triangulations {Tk} of Ω is applied.

Analogously to Section 8.1, denote by hk the characteristic triangulation parameter of Tk, i.e. hk is the length of the largest edge of the triangles T in Tk; Σk indicates the set of vertices of all triangles in Tk; Σk(Γ₁), Σk(Γ) are the sets of all vertices lying on Γ₁ and Γ, respectively; P₁ denotes the space of polynomials in two variables of degree ≤ 1.

Then, on the functional space

V k := {v ∈ C(Ω̄) : v|T ∈ P₁(T) ∀ T ∈ Tk}, (8.5.28)

the sets (8.5.26) and (8.5.27) are approximated by

Kk := {v ∈ V k : v(ai) = g(ai) ∀ ai ∈ Σk(Γ₁)} (8.5.29)

and

Kk := {v ∈ V k : v(ai) = g(ai) ∀ ai ∈ Σk(Γ), v(ai) ≥ ψ(ai) ∀ ai ∈ Σk}, (8.5.30)

respectively.
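In a one-dimensional analogue, membership in the set Kk of (8.5.30) is a condition on nodal values only: prescribe g at the boundary vertices and enforce v(ai) ≥ ψ(ai) at every vertex. The helper below is a hypothetical sketch (the mesh, data and function name are not from the text):

```python
def to_Kk(values, psi, g_left, g_right):
    # nodal analogue of (8.5.30): enforce v(a_i) >= psi(a_i) at every vertex
    # and v = g at the two boundary vertices (here psi <= g on the boundary)
    out = [max(v, p) for v, p in zip(values, psi)]
    out[0], out[-1] = g_left, g_right
    return out

v = to_Kk([0.0, -1.0, 0.5, -0.2, 0.0], [-0.5] * 5, 0.0, 0.0)
```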


8.5.3.1 Formulations and properties of minimal surface problems

The classical (non-parametric) minimal surface problem, considered in the space V := H¹(Ω), can be formulated as follows:

min{J(u) : u ∈ g + H₀¹(Ω)},  J(u) := ∫Ω √(1 + |∇u|²) dx (8.5.31)

(g ∈ V is given), i.e. among all functions u ∈ H¹(Ω), u = g on Γ, we are looking for a function which defines a surface z = u(x₁, x₂) with the smallest area. Introducing the operator Q : V → V′ defined by²

〈Q(u), v〉 = ∫Ω (∇u · ∇v)/√(1 + |∇u|²) dx ∀ u, v ∈ V (8.5.32)

and the affine set

K := {v ∈ V : v − g ∈ H₀¹(Ω)}, (8.5.33)

problem (8.5.31) can be rewritten as the variational equality

find u ∈ K : 〈Q(u), v〉 = 0 ∀ v ∈ H₀¹(Ω),

which in turn represents a weak formulation of the boundary value problem

−div(∇u/√(1 + |∇u|²)) = 0 on Ω,  u = g on Γ. (8.5.34)

Equation (8.5.34) is nothing else but the Euler equation for the classical minimal surface problem.

The non-homogeneous problem

−div(∇u/√(1 + |∇u|²)) = p on Ω,  u = g on Γ

is known as the Dirichlet problem for the equation of prescribed mean curvature. For the long history of and a survey of the numerous investigations connected with these two problems we refer to the monographs [309] and [130].

The variational inequality VI(Q,K) with K given by (8.5.27) corresponds to the minimal surface problem with an obstacle. Problems of this type were mainly investigated in non-reflexive Banach spaces, where the functional

J(u) = ∫Ω √(1 + |∇u|²) dx (8.5.35)

possesses better coercivity properties (see [234], Chapt. 3.4).

The problems considered here are not uniformly elliptic (see [253], Chapt. VI for the corresponding definition). Indeed, using the identity

∑_{i,j=1}^2 (∂²f(t)/∂tᵢ∂tⱼ) ξᵢξⱼ = (|ξ|² + (t₂ξ₁ − t₁ξ₂)²) / (1 + |t|²)^{3/2}

² Symbols a · b and |a| stand for the inner product and the Euclidean norm of vectors in R², respectively.


for f(t) = √(1 + t₁² + t₂²), we obtain

β(u)|ξ|² ≤ ∑_{i,j=1}^2 (∂²f(t)/∂tᵢ∂tⱼ)|_{t=∇u} ξᵢξⱼ ≤ |ξ|²/√(1 + |∇u|²), ∀ ξ ∈ R²,

with β(u) > 0; for instance, β(u) = (1 + |∇u|²)^{−3/2} is appropriate. But the right inequality shows that β(u) cannot be separated from 0 uniformly in u.
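The two-sided bound can be spot-checked numerically from the closed form of the Hessian quadratic form of f; the random sampling below is an illustrative sketch, not part of the text:

```python
import math, random

def hess_quadform(t, xi):
    # quadratic form of the Hessian of f(t) = sqrt(1 + |t|^2) at t, via the identity
    #   sum_{i,j} f_{t_i t_j}(t) xi_i xi_j = (|xi|^2 + (t2*xi1 - t1*xi2)^2) / (1 + |t|^2)^(3/2)
    t1, t2 = t
    x1, x2 = xi
    return (x1 * x1 + x2 * x2 + (t2 * x1 - t1 * x2) ** 2) / (1.0 + t1 * t1 + t2 * t2) ** 1.5

random.seed(0)
ok = True
for _ in range(1000):
    t = (random.uniform(-5, 5), random.uniform(-5, 5))
    xi = (random.uniform(-5, 5), random.uniform(-5, 5))
    q = hess_quadform(t, xi)
    n2 = xi[0] ** 2 + xi[1] ** 2
    s2 = 1.0 + t[0] ** 2 + t[1] ** 2
    lower = n2 / s2 ** 1.5        # beta(u)|xi|^2 with beta = (1 + |t|^2)^(-3/2)
    upper = n2 / math.sqrt(s2)    # |xi|^2 / sqrt(1 + |t|^2)
    ok = ok and lower - 1e-12 <= q <= upper + 1e-12
```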

The violation of uniform ellipticity causes serious difficulties in the theoretical and numerical analysis of these problems, including the investigation of their solvability.

8.5.15 Remark. The following facts point implicitly to the nature of these difficulties:

– For the minimal surface problem with

Ω := {x ∈ R² : 1 < ‖x‖ < 2},  g = 0 if ‖x‖ = 2, g = γ if ‖x‖ = 1,

there exists γ∗ such that the problem is solvable for γ ∈ [0, γ∗] and has no solution if γ > γ∗ (for this well-known example see, for instance, [100], Chapt. V);

– A necessary condition for the solvability of the Dirichlet problem for the equation of prescribed mean curvature is that

|∫ω p(x) dx| < mes ∂ω

holds for all proper subsets ω ⊂ Ω with Lipschitz continuous boundaries (mes ∂ω denotes the perimeter of ω), cf. [130]. ♦

The existence of a classical solution of problem (8.5.34) with continuous data was proved by Radó [342] in the case that Ω is a convex set. Conditions ensuring that the solution of (8.5.34) belongs to C^{2,1}(Ω) can be found in [253], Theorem IV.10.9.

For the minimal surface problem with an obstacle, but in the space V = H₀^{1,∞}(Ω), Lewy and Stampacchia [265] have shown that the solution is in W^{2,s}(Ω) ∩ C¹(Ω), 1 ≤ s < ∞, if ψ ∈ C²(Ω) and Ω is a convex set with a smooth boundary.

The uniqueness of a solution (if it exists) in case K is given by (8.5.33) or (8.5.27) is a rather evident corollary of the strict convexity of the functional (8.5.35). In turn, the strict convexity of J on K can be concluded by integration (over Ω) of the left inequality in (8.5.37) below, given with a = ∇u, b = ∇v, where u, v ∈ K (hence, u − v ∈ H₀¹(Ω)).

8.5.16 Proposition. The operator Q in (8.5.32) is Lipschitz continuous on V .


Proof: For any u, v, w ∈ V one gets

|∫Ω (1/√(1 + |∇u|²) − 1/√(1 + |∇v|²)) ∇u · ∇w dx|
  ≤ ∫Ω [ |(∇u − ∇v) · (∇u + ∇v)| / (√(1 + |∇u|²) √(1 + |∇v|²) (√(1 + |∇u|²) + √(1 + |∇v|²))) ] |∇u · ∇w| dx
  ≤ ∫Ω |∇u − ∇v| |∇w| dx ≤ ‖u − v‖ ‖w‖,

whereas the Cauchy–Schwarz inequality implies

|∫Ω ((∇u − ∇v)/√(1 + |∇v|²)) · ∇w dx| ≤ (∫Ω (|∇u − ∇v|²/(1 + |∇v|²)) dx)^{1/2} ‖w‖ ≤ ‖u − v‖ ‖w‖.

Thus, we have

|〈Q(u) − Q(v), w〉| ≤ 2‖u − v‖ ‖w‖, ∀ w ∈ V,

hence

‖Q(u) − Q(v)‖V′ ≤ 2‖u − v‖.
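The constant 2 in Proposition 8.5.16 can be made plausible pointwise: the integrand nonlinearity g(a) = a/√(1 + |a|²) satisfies |g(a) − g(b)| ≤ 2|a − b| (its Jacobian in fact has norm at most 1, so the two-term splitting used in the proof is not sharp). A random spot-check, illustrative only:

```python
import math, random

def g(a):
    # pointwise nonlinearity of the minimal surface operator: a -> a/sqrt(1 + |a|^2)
    s = math.sqrt(1.0 + a[0] ** 2 + a[1] ** 2)
    return (a[0] / s, a[1] / s)

random.seed(1)
ok = True
for _ in range(1000):
    a = (random.uniform(-10, 10), random.uniform(-10, 10))
    b = (random.uniform(-10, 10), random.uniform(-10, 10))
    ga, gb = g(a), g(b)
    lhs = math.hypot(ga[0] - gb[0], ga[1] - gb[1])
    rhs = 2.0 * math.hypot(a[0] - b[0], a[1] - b[1])
    ok = ok and lhs <= rhs + 1e-12
```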

8.5.17 Proposition. Suppose that the solution ū of VI(Q,K), with Q in (8.5.32) and K in (8.5.33) or (8.5.27), belongs to W^{1,∞}(Ω). Then Assumption 8.5.12 is valid with Y = W^{1,1}(Ω), arbitrary ρ > ‖ū‖ and τ(s) := c(ρ)s².

Proof: Let us recall that ū is the unique solution of VI(Q,K); hence S∗ = {ū}. The convexity of J implies

J(v) − J(ū) ≤ 〈Q(v), v − ū〉, ∀ v ∈ V,

and because 〈Q(ū), v − ū〉 ≥ 0 holds true for v ∈ K, we have for all v ∈ K

〈Q(v), v − ū〉 ≥ J(v) − J(ū) − 〈Q(ū), v − ū〉
  = ∫Ω [√(1 + |∇v|²) − √(1 + |∇ū|²) − (∇ū · (∇v − ∇ū))/√(1 + |∇ū|²)] dx. (8.5.36)

With a ∈ R², b ∈ R², one verifies directly the estimate

√(1 + |a|²) − √(1 + |b|²) − (b · (a − b))/√(1 + |b|²) ≥ |a − b|² / [√(1 + |b|²) (√(1 + |a|²) √(1 + |b|²) + 1 + b · a)],

and using the inequality

√(1 + |a|²) √(1 + |b|²) ≥ 1 + b · a,


this yields

√1 + |a|2 −

√1 + |b|2 − b · (a− b)√

1 + |b|2≥ |a− b|2

2 (1 + |b|2)√

1 + |a|2

≥ 1

2(1 + |b|2)

|a− b|2

1 + |a|2. (8.5.37)

From (8.5.37), given with a := ∇v, b := ∇u, and (8.5.36) we conclude that
\[
\langle Q(v), v-u\rangle \ge \frac{1}{2(1+M^2)} \int_\Omega \frac{|\nabla v - \nabla u|^2}{1+|\nabla v|^2}\, dx, \quad \forall\, v \in K, \qquad (8.5.38)
\]
where M := ‖u‖_{W^{1,∞}(Ω)}. But the Cauchy-Schwarz inequality
\[
\left| \int_\Omega zw\, dx \right| \le \left( \int_\Omega z^2 dx \right)^{1/2} \left( \int_\Omega w^2 dx \right)^{1/2},
\]
applied with z := |∇u − ∇v|/√(1+|∇v|²) and w := √(1+|∇v|²), implies
\[
\left( \int_\Omega |\nabla u - \nabla v|\, dx \right)^2 \le \int_\Omega \frac{|\nabla u - \nabla v|^2}{1+|\nabla v|^2}\, dx \cdot \int_\Omega (1+|\nabla v|^2)\, dx.
\]
Together with (8.5.38), the latter inequality leads to
\[
\langle Q(v), v-u\rangle \ge \frac{1}{2(1+M^2)} \cdot \frac{\left( \int_\Omega |\nabla u - \nabla v|\, dx \right)^2}{\int_\Omega (1+|\nabla v|^2)\, dx}, \quad \forall\, v \in K. \qquad (8.5.39)
\]
For v ∈ K, ‖v‖ ≤ ρ, inequality (8.5.39) gives
\[
\langle Q(v), v-u\rangle \ge \frac{1}{2(1+M^2)(\operatorname{meas}\Omega + \rho^2)} \left( \int_\Omega |\nabla u - \nabla v|\, dx \right)^2. \qquad (8.5.40)
\]
Now we use the fact that the standard norm and the seminorm of the space W^{1,1}(Ω) are equivalent on the subspace W^{1,1}_0(Ω). Because (u − v)|_Γ = 0 holds for any v ∈ K, this implies
\[
\exists\, c > 0: \quad \int_\Omega |\nabla u - \nabla v|\, dx \ge c\, \|u-v\|_{W^{1,1}(\Omega)}, \quad \forall\, v \in K, \qquad (8.5.41)
\]
and the conclusion of Proposition 8.5.17 follows from (8.5.40) and (8.5.41) with
\[
c(\rho) = \frac{c^2}{2(1+M^2)(\operatorname{meas}\Omega + \rho^2)}.
\]
It should be noted that the embedding H¹ ⊂ W^{1,1} is not a compact one.
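The elementary inequality (8.5.37) is easy to check numerically; the following sketch (an illustration added here, not part of the original text) samples random vectors a, b ∈ R² and verifies the weaker right-hand bound:

```python
import numpy as np

rng = np.random.default_rng(0)

def lhs(a, b):
    # left-hand side of (8.5.37):
    # sqrt(1+|a|^2) - sqrt(1+|b|^2) - b.(a-b)/sqrt(1+|b|^2)
    sb = np.sqrt(1.0 + b @ b)
    return np.sqrt(1.0 + a @ a) - sb - (b @ (a - b)) / sb

def rhs(a, b):
    # weaker right-hand bound: |a-b|^2 / (2 (1+|b|^2) (1+|a|^2))
    d = a - b
    return (d @ d) / (2.0 * (1.0 + b @ b) * (1.0 + a @ a))

for _ in range(10000):
    a = rng.normal(scale=5.0, size=2)
    b = rng.normal(scale=5.0, size=2)
    assert lhs(a, b) >= rhs(a, b) - 1e-12
print("(8.5.37) holds on 10000 random samples")
```

Note that both sides vanish for a = b, in accordance with the role of (8.5.37) as a strict-convexity estimate.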


8.5.3.2 Application of the general proximal point method to minimal surface problems

Considering the application of Method 8.5.2 to VI(Q,K) (with Q in (8.5.32) and K in (8.5.33) or (8.5.27)), we suppose, as already mentioned, that Ω is a convex polygonal domain, VI(Q,K) is solvable and its solution u* belongs to H²(Ω).

As a regularizing functional, ϱ: v ↦ ‖v‖² can be used (in case K_k ⊂ K ∀ k, the choice ϱ(v) = ‖v‖²_{H^1_0(Ω)} may be preferable). Obviously, these functionals possess the property (9.3.20). One can easily show that the operators
\[
v \mapsto Q(v) + \chi_k\big( \nabla\varrho(v) - \nabla\varrho(u^k) \big)
\]
in the subproblems (P^k) of Method 8.5.2 are uniformly elliptic, in distinction to the operator Q itself. This is the reason to speak of an elliptic regularization. Moreover, choosing a positive sequence {χ_k} bounded away from 0 (this is allowed by the conditions on the regularization parameters), the uniform ellipticity of these operators with a common ellipticity constant is guaranteed. This is an important advantage in comparison with the classical elliptic regularization approach.

Applying the finite element method as described at the beginning of Subsection 8.5.3, we deal here with sets K_k given by (8.5.29) (but with Γ₁ = Γ) or (8.5.30). The inclusion K_k ⊂ K is not very realistic in this case; therefore we have to check Assumption 8.5.3, modified as described in Remark 8.5.4.

The validity of Assumption 8.5.3(i) and (ii) (with B = 0, χ = 0) is obvious. To show Assumption 8.5.3(iii) for an arbitrary w ∈ K, if K is given by (8.5.27), one can rewrite K in the form
\[
K = \{ g + v :\ v \in H_0^1(\Omega),\ v \ge \psi - g \}
\]
and then follow the proof of Theorem 3.2 in [182], Sect. 1.2. This provides
\[
\exists\, w^k \in K_k: \quad \lim_{k\to\infty} \|w^k - w\| = 0. \qquad (8.5.42)
\]
If K is given by (8.5.33), the relation (8.5.42) is well known. The application of Proposition 8.5.16 and (8.5.42) yields
\[
\lim_{k\to\infty} \|Q(w^k) - Q(w)\|_{V'} = 0.
\]
Analogously, taking into account that u* ∈ H²(Ω), the validity of Assumption 8.5.3(v) with σ_k := h_k follows from the properties of finite element approximation; see Theorem 3.2.1 in [75] and Proposition 8.5.16.

Now we check the fulfillment of the conditions in Remark 8.5.4. Denote by φ^{I_k} the linear interpolant of a function φ on the triangulation T_k. At first, let K be given by (8.5.33). For an arbitrary v^k ∈ K_k take z^k(v^k) := v^k + g − g_k, where g_k := g^{I_k}. Obviously, z^k(v^k) ∈ K. From Theorem 3.2.1 in [75], already for g ∈ H²(Ω), the estimate
\[
\|g - g_k\| \le c\,\|g\|_{H^2(\Omega)}\, h_k
\]


holds with c independent of g, h_k and T_k. Hence,
\[
\|z^k(v^k) - v^k\| \le c\,\|g\|_{H^2(\Omega)}\, h_k,
\]
i.e. condition (a) in Remark 8.5.4 is guaranteed with c₃ = c‖g‖_{H²(Ω)}.

Now, let K be given by (8.5.27). In this case, for an arbitrary v^k ∈ K_k, take
\[
z^k(v^k) := \max\big[ v^k + g - g^{I_k},\ \psi \big].
\]
Then z^k(v^k) ∈ K, and with g_k := g^{I_k}, ψ_k := ψ^{I_k} the relation
\[
g - g_k \le z^k(v^k) - v^k \le \max\big[ g - g_k,\ \psi - \psi_k \big] \qquad (8.5.43)
\]
holds on Ω. If g − g_k ≥ ψ − ψ_k on Ω (in particular, if g − ψ is a concave function), then again z^k(v^k) = v^k + g − g_k holds true and condition (a) is also fulfilled. Otherwise, we are able to prove only the weaker condition (b) in Remark 8.5.4. Indeed, Green's formula yields
\[
\langle Q(u^*),\, z^k(v^k) - v^k \rangle = -\int_\Omega \operatorname{div}\frac{\nabla u^*}{\sqrt{1+|\nabla u^*|^2}}\, \big( z^k(v^k) - v^k \big)\, dx
+ \int_\Gamma \frac{\partial u^*/\partial n}{\sqrt{1+|\nabla u^*|^2}}\, \big( z^k(v^k) - v^k \big)\, d\Gamma, \qquad (8.5.44)
\]
where ∂/∂n denotes the normal derivative on Γ.

Assuming that g ∈ C²(Ω), ψ ∈ C²(Ω), Theorem 3.1 in [382] provides the estimates
\[
\|g - g_k\|_{C(\Omega)} \le c(g)\,h_k^2, \qquad \|\psi - \psi_k\|_{C(\Omega)} \le c(\psi)\,h_k^2. \qquad (8.5.45)
\]
From (8.5.43)-(8.5.45) we conclude the fulfillment of the first part of condition (b) in Remark 8.5.4. The second part follows from the proof of Theorem II.2.3 in [134].

Now we are ready to apply the convergence results from Section 8.5.2. Let us recall that Ω is a convex polygonal domain, the sets K_k are described by (8.5.29) (with Γ₁ = Γ) or (8.5.30), and the functions g, ψ are sufficiently smooth. The following statement is an immediate corollary of Theorems 8.5.10 and 8.5.13.

8.5.18 Theorem.

(i) The problems (P^k), corresponding to VI(Q,K) (with Q defined by (8.5.32) and K by (8.5.33) or (8.5.27)), are solvable.

(ii) Let the controlling sequences of Method 8.5.2 satisfy the conditions (9.3.1), (9.3.2) and ∑_{k=1}^∞ δ_k/χ_k < ∞.

a) If the solution u* belongs to H²(Ω), then u^k ⇀ u* weakly in V.

b) If assumption (a) from Remark 8.5.4 is valid³ and u* belongs to H²(Ω) ∩ W^{1,∞}(Ω), then lim_{k→∞} ‖u^k − u*‖_{W^{1,1}(Ω)} = 0.

³For instance, if the set K is given by (8.5.33).


8.5.19 Remark. A quite similar analysis can be performed for Method 8.5.2 applied to the variational formulation of the Dirichlet problem for the equation of prescribed mean curvature. Here the operator Q is defined by
\[
\langle Q(u), v \rangle = \int_\Omega \Big( \frac{\nabla u \cdot \nabla v}{\sqrt{1+|\nabla u|^2}} - p\,v \Big)\, dx, \quad \forall\, u, v \in V
\]
(cf. with (8.5.32)). Under the assumption that u* ∈ H²(Ω) ∩ W^{1,∞}(Ω), we also obtain
\[
\lim_{k\to\infty} \|u^k - u^*\|_{W^{1,1}(\Omega)} = 0.
\]

8.5.4 Application to convection-diffusion problems

These problems arise in many areas, such as the transport and diffusion of pollutants, the simulation of oil extraction from underground reservoirs, heat transport problems in the convection-dominated case, etc.

8.5.4.1 Formulations and properties of the problem

Again, let Ω be an open domain in R² with a Lipschitz continuous boundary Γ, which now is divided into disjoint connected pieces Γ₁ and Γ₂, meas Γ₁ > 0 (Γ₂ = ∅ is not excluded). We consider the convection-diffusion problem
\[
-\varepsilon\Delta u + b\cdot\nabla u + cu = f \ \text{ on } \Omega \qquad (8.5.46)
\]
with boundary conditions
\[
u = g_1 \ \text{ on } \Gamma_1, \qquad \frac{\partial u}{\partial n} = g_2 \ \text{ on } \Gamma_2. \qquad (8.5.47)
\]
The functions b = (b₁, b₂), c and f are supposed to be sufficiently smooth on Ω, g₁, g₂ ∈ H²(Ω); c ≥ 0 holds on Ω and ε is a small positive constant such that
\[
0 < \varepsilon \ll \|b\|_{[L^\infty(\Omega)]^2}. \qquad (8.5.48)
\]

The unknown function u may represent the concentration of a pollutant being transported along a stream moving at velocity b and also subject to diffusive effects. Alternatively, u may represent the temperature of a fluid moving along a heated wall. Relation (8.5.48) corresponds to the situation that the diffusion is a less significant physical effect than the convection. For instance, on a windy day a pollutant moves fast in the direction of the wind, whereas its spreading due to molecular diffusion remains small.

Condition (8.5.48) causes a so-called boundary layer: a fast variation of the gradient of the solution near a part of the boundary. Such problems are called singularly perturbed. This peculiarity is illustrated in the following slightly modified example from [102].

8.5.20 Example. The equation
\[
-\varepsilon\Delta u + \frac{\partial u}{\partial x_2} = 0 \ \text{ on } \Omega := (0,1)\times(-1,1)
\]
is considered subject to the Dirichlet boundary conditions
\[
u(x_1,-1) = x_1, \quad u(x_1,1) = 0, \quad u(0,x_2) = 0, \quad u(1,x_2) = \frac{1-\exp((x_2-1)/\varepsilon)}{1-\exp(-2/\varepsilon)}
\]
(i.e. u(1, x₂) ∈ [0,1] ∀ x₂ ∈ (−1,1)). The unique solution of this problem is
\[
u(x_1,x_2) = x_1\,\frac{1-\exp((x_2-1)/\varepsilon)}{1-\exp(-2/\varepsilon)}. \qquad (8.5.49)
\]

Figure 8.5.15: Graph of solution (8.5.49) for ε = 0.5 and ε = 0.005

One can easily see that, for small ε, this solution is very close to the function x₁ except near the boundary part x₂ = 1. But
\[
\frac{\partial u(x)}{\partial x_2} = -\frac{x_1}{\varepsilon}\,\frac{\exp((x_2-1)/\varepsilon)}{1-\exp(-2/\varepsilon)},
\]
and for any x₁ > 0, d > 0,
\[
\lim_{\varepsilon\to 0} \frac{\partial u(x)}{\partial x_2} = -\infty \quad \text{if } x_2 = 1 - d\varepsilon.
\]

In general, in most of the domain the solution of problem (8.5.46), (8.5.47) is close to the solution of the reduced (hyperbolic) equation
\[
b\cdot\nabla u + cu = f \qquad (8.5.50)
\]
with appropriate boundary conditions. If Γ⁻ ⊂ Γ₁, where Γ⁻ = {x ∈ Γ : b·n < 0} is the so-called inflow boundary (n denotes the outward-pointing unit vector normal to Γ), then this boundary condition is
\[
u = g_1 \ \text{ on } \Gamma^-. \qquad (8.5.51)
\]
Boundary layers arise near an outflow boundary Γ⁺ = {x ∈ Γ : b·n > 0} and a characteristic boundary Γ⁰ = {x ∈ Γ : b·n = 0}, where the solutions of the problems (8.5.46), (8.5.47) and (8.5.50), (8.5.51) can differ significantly, and the boundary layer functions (this is x₁ exp((x₂ − 1)/ε) in our example) characterize approximately the difference between these solutions.

The presence of boundary layers causes serious difficulties for the application of discretization techniques (finite-difference and finite element methods) to convection-diffusion problems. There are numerous publications dealing with special discretization procedures and special algorithms for solving discretized convection-diffusion problems (see [37], [102], [355], and references therein).

Introducing the space V = {v ∈ H¹(Ω) : v|_{Γ₁} = 0} with the norm
\[
\|v\| := \|\nabla v\|_{[L^2(\Omega)]^2}
\]
(see [75], Theorem 1.2.1, concerning the equivalence of this norm and the standard norm of H¹(Ω) in case meas Γ₁ > 0), one can describe a weak formulation of problem (8.5.46), (8.5.47), i.e. a variational equation of the convection-diffusion problem, as follows:

find u ∈ V such that
\[
\varepsilon\int_\Omega \nabla u\cdot\nabla v\, dx + \int_\Omega \big[ (b\cdot\nabla u)v + cuv \big]\, dx = \int_\Omega fv\, dx + \varepsilon\int_{\Gamma_2} gv\, d\Gamma, \quad \forall\, v \in V, \qquad (8.5.52)
\]
where f := f + εΔg₁ − b·∇g₁ − cg₁ and g := g₂ − ∂g₁/∂n.

Applying the trace inequality
\[
\|v\|_{L^2(\Gamma)} \le c_1\|v\|, \quad \forall\, v \in H^1(\Omega) \qquad (8.5.53)
\]
(see, for instance, [75], Section 1.2) to estimate the term ε∫_{Γ₂} gv dΓ, the continuity of the functional
\[
v \mapsto \int_\Omega fv\, dx + \varepsilon\int_{\Gamma_2} gv\, d\Gamma
\]
in the space V can be easily concluded, hence
\[
\exists\, l \in V': \quad \langle l, v \rangle = \int_\Omega fv\, dx + \varepsilon\int_{\Gamma_2} gv\, d\Gamma, \quad \forall\, v \in V. \qquad (8.5.54)
\]

The estimate
\[
\left| \int_\Omega (b\cdot\nabla u)v\, dx \right| \le d\, \sup_{x\in\Omega} |b(x)|\cdot \|u\|\,\|v\|
\]
(with d such that ‖v‖_{L²(Ω)} ≤ d‖v‖ ∀ v ∈ V) is proved by using the Cauchy-Schwarz inequality twice. Now the continuity of the bilinear form in (8.5.52) follows in a standard way. Thus, according to the Riesz theorem A1.1.1, there exists an operator A ∈ L(V, V′) such that
\[
\langle Au, v \rangle = \varepsilon\int_\Omega \nabla u\cdot\nabla v\, dx + \int_\Omega (b\cdot\nabla u)v\, dx + \int_\Omega cuv\, dx. \qquad (8.5.55)
\]
Its strong monotonicity can be shown under the additional assumption that
\[
c - \tfrac{1}{2}\operatorname{div} b \ge 0 \ \text{ on } \Omega, \quad \text{and} \quad \Gamma_2 \subset \Gamma^+,
\]


which we will suppose in the sequel. Indeed, applying Green's formula to
\[
\alpha(u,v) := \int_\Omega (b\cdot\nabla u)v\, dx,
\]
we obtain
\[
\begin{aligned}
\alpha(u,v) &= -\int_\Omega u\,\operatorname{div}(vb)\, dx + \int_{\Gamma_2} uv\,(b\cdot n)\, d\Gamma \\
&= -\int_\Omega uv\,\operatorname{div} b\, dx - \int_\Omega (b\cdot\nabla v)u\, dx + \int_{\Gamma_2} uv\,(b\cdot n)\, d\Gamma \\
&= -\int_\Omega uv\,\operatorname{div} b\, dx - \alpha(v,u) + \int_{\Gamma_2} uv\,(b\cdot n)\, d\Gamma.
\end{aligned}
\]
Thus
\[
\alpha(u,u) = -\frac12\int_\Omega u^2\operatorname{div} b\, dx + \frac12\int_{\Gamma_2} u^2 (b\cdot n)\, d\Gamma,
\]
and b·n > 0 on Γ₂ holds because of Γ₂ ⊂ Γ⁺. Now
\[
\alpha(u,u) + \int_\Omega cu^2\, dx \ge \int_\Omega \Big( c - \frac12\operatorname{div} b \Big) u^2\, dx \ge 0,
\]
and according to (8.5.55),
\[
\langle Au, u \rangle \ge \varepsilon\|u\|^2 \quad \forall\, u \in V. \qquad (8.5.56)
\]

From the monotonicity and continuity of the operator A it follows that A is maximal monotone, and together with (8.5.56) this guarantees the existence of a unique u* ∈ V such that
\[
\langle Au^* - l, v \rangle = 0 \quad \forall\, v \in V.
\]
Because α(u, v) ≠ α(v, u), the operator is not symmetric; hence problem (8.5.46), (8.5.47) cannot be transformed, at least not in a natural way, into an optimization problem.

Conditions on the data of problem (8.5.46), (8.5.47) which provide u* ∈ H²(Ω) can be found in [148], [253]. In particular, u* ∈ H²(Ω) holds in the case that Ω is a convex polygonal domain, Γ₁ = Γ, the functions b, c, g₁ are sufficiently smooth, and f ∈ L²(Ω) (see [253], Theorem III.9.1 and Remark III.9.4).

8.5.4.2 Application of the generalized IPR-method to convection-diffusion problems

Applying Method 8.5.2 with ϱ: v ↦ ‖v‖² and an appropriate parameter sequence {χ_k}, we approximate the singularly perturbed elliptic problem (8.5.46), (8.5.47) by a sequence of problems with unperturbed elliptic operators. Recall that this cannot be carried out by means of the classical approach of elliptic regularization. In this way, the boundary layers (also inner layers, if they exist; see [355] for this notion) will be accumulated gradually, because of the term −χ_k∇ϱ(u^k) in the operator of problem (P^k).


In particular, for problem (8.5.46), (8.5.47) with Γ₁ = Γ, g₁ ≡ 0, the exact problem (P^k) (with δ_k = 0) consists in finding a weak solution u ∈ H^1_0(Ω) of the equation
\[
-(\varepsilon + 2\chi_k)\Delta u + b\cdot\nabla u + cu = f - 2\chi_k\Delta u^k.
\]
The gradual accumulation of boundary and inner layers allows a more successful application of standard finite element methods, and with ε + 2χ_k in place of ε, we obtain better stability and conditioning of the discretized problems.
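A one-dimensional finite-difference sketch may make this concrete (purely illustrative: the coefficients, mesh and the fixed parameter χ below are ad hoc choices, not data from the text). The outer proximal iteration uses the uniformly elliptic operator −(ε + 2χ)u″ + b u′, yet its fixed point is the solution of the singularly perturbed problem:

```python
import numpy as np

# -eps*u'' + b*u' = f on (0,1), u(0) = u(1) = 0, solved via the outer iteration
#   -(eps + 2*chi)*u_{k+1}'' + b*u_{k+1}' = f - 2*chi*u_k''
# with exact subproblem solves (central finite differences).
n, eps, b, chi = 100, 0.01, 1.0, 0.05
h = 1.0 / (n + 1)
I = np.eye(n)
D2 = (np.diag(np.ones(n - 1), -1) - 2.0 * I + np.diag(np.ones(n - 1), 1)) / h**2
D1 = (np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / (2.0 * h)
f = np.ones(n)

A     = -eps * D2 + b * D1                  # singularly perturbed operator
A_reg = -(eps + 2.0 * chi) * D2 + b * D1    # uniformly elliptic regularized operator

u_direct = np.linalg.solve(A, f)
u = np.zeros(n)
for k in range(300):                        # outer proximal iterations
    u = np.linalg.solve(A_reg, f - 2.0 * chi * (D2 @ u))
print(np.max(np.abs(u - u_direct)))         # the iterates recover the direct solve
```

Here ε + 2χ = 0.11 replaces ε = 0.01 in each subproblem, so the subproblems are far better conditioned, while the outer iteration still converges to the discrete solution of the original problem.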

8.5.21 Remark. In [355] the authors analyze situations where boundary- and inner-layer functions can be defined a priori, sometimes in explicit form, by using a standard technique from singular perturbation theory. If such functions are known (exactly or approximately), they can be used to choose a starting point in Method 8.5.2 or to correct an approximate solution after a certain number of iterations. ♦

Now, we examine the application of the convergence results from Section 8.5.2 to Method 8.5.2 for solving the convection-diffusion problem in the variational formulation (8.5.52). As in the previous case, it is supposed that Ω is a convex polygonal set and that the solution u* belongs to H²(Ω). So, we deal with auxiliary problems (P^k) in the space V = {v ∈ H¹(Ω) : v|_{Γ₁} = 0}, in which
\[
Q: v \mapsto Av - l, \qquad \varphi_k \equiv 0, \qquad \varrho: v \mapsto \|v\|^2,
\]
and K_k are given by (8.5.29).

Obviously, in this case K_k ⊂ K := V, and Assumption 8.5.3(i) and (9.3.20) are satisfied. Assumption 8.5.1(ii) is valid with the operator B := −εΔ, and in Assumption 8.5.3(ii) one can take χ := 0. The relation
\[
\forall\, w \in K,\ \forall\, k,\ \exists\, w^k \in K_k: \quad \lim_{k\to\infty} \|w - w^k\| = 0
\]
follows immediately from the proof of Theorem 3.3 in [182], Section 1.1, and therefore the continuity of the operator A implies the validity of Assumption 8.5.3(iii).

Next, because u* ∈ H²(Ω), the estimate
\[
\|u^* - (u^*)^{I_k}\| \le c\,\|u^*\|_{H^2(\Omega)}\, h_k \quad \forall\, k
\]
is known, and taking into account that (u*)^{I_k} ∈ K_k and
\[
\|Q(u^*) - Q((u^*)^{I_k})\|_{V'} \le \|A\|\,\|u^* - (u^*)^{I_k}\|,
\]
Assumption 8.5.3(v) is valid with σ_k := h_k if ∑_{k=1}^∞ h_k/χ_k < ∞.

Finally, since the operator B = −ε∆ is strongly monotone on V , condition (iii)in Theorem 8.5.10 holds with B := 0.

Therefore, choosing the controlling parameters according to (9.3.1), (9.3.2) and ∑_{k=1}^∞ δ_k/χ_k < ∞, one can use Theorem 8.5.10, which guarantees that the iterates u^k of Method 8.5.2 converge to u* strongly in V.


8.6 Comments

Section 8.1: Various approaches to the solution of the problem with an obstacle on the boundary, including results on the convergence and the rate of convergence of finite element approximations, can be found in the extended edition of the monograph Glowinski, Lions and Tremolieres [135] and in the papers of Haslinger [168], Hlavacek [180, 181] and Scarpini and Vivaldi [360].

For coercive problems of this type see, for instance, Lions [271], Cea [69], Hlavacek et al. [182].

Numerical methods for elliptic variational inequalities with finite element approximations of the dual problem are considered in [135] and also in the papers of Fremond [119] and Haslinger [169].

Mixed variational formulations are investigated in the framework of the duality theory for variational inequalities, developed by Cea [69] and Ekeland and Temam [100]. The general approach to mixed methods (see Brezzi and Fortin [54]) is based, in particular, on investigations of Aubin [19], Babuska and Gatica [27], Lapin [254] and Wang and Yang [415].

Section 8.2: For a more detailed analysis of the two-body contact problem see Hlavacek et al. [182]. The Signorini problem was studied by Fichera [114]. Concerning the finite element approximation of the Signorini problem in the case of a curved boundary, we refer to Haslinger [169].

The results in Subsections 8.2.3-8.2.5 are now published in [219, 224]. For the analysis of the coercivity of the auxiliary problems in the methods with regularization on the subspace, a technique is used which has been developed by Fichera [114] and Panagiotopoulos [316].

Theorem 8.2.19 improves a corresponding statement of the paper [212].

Section 8.3: The basic algorithm of alternating iterations has been studied by Panagiotopoulos [315] for the discrete version of the Signorini problem with friction. In our description of the algorithm we follow Hlavacek et al. [182]. The main result about the convergence of the MSR-method with weak regularization for the non-smooth Problem II (Theorem 8.3.3) is new.

Section 8.4: In connection with the application of IPR-methods to well-posed problems, we emphasize that in Rockafellar's paper [352] the rate of convergence of the proximal point algorithm for an abstract variational inequality is studied under conditions which guarantee the well-posedness of the problem in a special sense (see the comments to Section 5.2).

Section 8.5: Typically, standard discretization methods in mathematical physics are not efficient when applied to degenerate and singularly perturbed elliptic variational inequalities, and there are numerous investigations addressed to the creation of special discretization procedures, for instance finite element methods with upwinding, streamline diffusion finite element methods, etc. [37, 102, 355]. Also special algorithms for solving the arising discretized problems are needed.

In [231], we develop a quite different idea, which may be presented as follows: using the proximal regularization, the original variational inequality is approximated by a sequence of uniformly elliptic problems, which can be treated with standard finite element techniques and standard solvers. Moreover, only a single discretization is used for each regularized problem, with a mild rule for decreasing the triangulation parameter in the outer process.

In various schemes of proximal point methods the conditions on data approximation are of Mosco's type with order σ > 0 (see Subsection 2.4.2 as well as [260, 5]), or outer approximations of the set K are used [226]. These conditions are certainly not suitable if we deal with problems in mathematical physics and use finite element or finite-difference methods. Indeed, in this case K_k ⊅ K, and for an arbitrary element u ∈ K at best the relation
\[
\lim_{k\to\infty}\, \min_{v\in K_k} \|u - v\| = 0
\]
can be concluded (without any estimate for the rate of convergence), i.e., the Mosco convergence with order σ > 0 cannot be guaranteed.

Abstract assumptions admitting the described peculiarity were introducedfirst in [224]. Here we deal with a weaker form of these assumptions.

Growth conditions used so far to obtain a more qualitative convergence of proximal point methods (see for instance [352, 281, 225]) are not fulfilled in the case of the operator Q in the minimal surface problem. Therefore, the known convergence results provide only weak convergence in H¹(Ω) of the proximal point methods when applied to this problem or related variational inequalities.

We conclude with the remark that an application of the elliptic proximal regularization method (without discretization) for solving parabolic variational inequalities was studied in [225].


Chapter 9

PPR FOR VI'S WITH MULTI-VALUED OPERATORS

For a given set-valued operator Q: Rⁿ → 2^{Rⁿ} and a closed convex subset K of Rⁿ the following variational inequality is considered:

VI(Q,K): find a pair x* ∈ K and q* ∈ Q(x*) such that
\[
\langle q^*, x - x^* \rangle \ge 0 \quad \forall\, x \in K.
\]

As before we make use of the notations and definitions commonly used in con-vex analysis, see for instance Rockafellar [353]. Concerning properties of VI’swith set-valued maximal monotone operators we also refer to Subsection A1.6.

It is well known that VI(Q,K) is closely related to the problem of finding a zero of a maximal monotone operator. Indeed, if the operator Q is maximal monotone and ri(dom(Q)) ∩ ri(K) ≠ ∅, then VI(Q,K) is equivalent to the inclusion

IP(T): 0 ∈ T(x*),

where T = Q + N_K and N_K denotes the normality operator of K.

9.1 PPR with Relaxation

The exact PPR for IP(T ) (with an arbitrary maximal monotone T ) is a fixedpoint iteration as follows: For any x0 ∈ Rn the sequence xk is generated bythe iteration

xk+1 := JχkT (xk), k = 0, 1, 2 . . . ,

where χk ⊂ (0,+∞) is a sequence of regularization parameters and

JχkT = (I + χkT )−1

is the resolvent operator associated with T . Due to Minty’s theorem [290], thisoperator is single-valued.
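As a one-dimensional illustration (added here, not taken from the text): for T = ∂|·| on R, which is maximal monotone and set-valued at 0, the resolvent (I + χT)⁻¹ is the single-valued soft-thresholding map:

```python
import numpy as np

def resolvent_abs(x, chi):
    # unique y with  x ∈ y + chi * ∂|y|  (soft thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - chi, 0.0)

print(resolvent_abs(np.array([-3.0, -0.2, 0.5, 2.0]), 1.0))  # shrinks toward 0

# the exact PPR x^{k+1} = J_{chi T}(x^k) drives x^k to the unique zero of T
xk = 3.7
for k in range(5):
    xk = resolvent_abs(xk, 1.0)
print(xk)   # 0.0
```

Although ∂|·|(0) is the whole interval [−1, 1], the resolvent value is unique for every input, in accordance with Minty's theorem.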


Gol'shtein and Tret'yakov [140] introduced a sequence of relaxation parameters {ρ_k} ⊂ (0, 2) and proved the convergence of the relaxed proximal point algorithm
\[
x^{k+1} := (1-\rho_k)x^k + \rho_k J_{\chi_k T}(x^k), \quad k = 0, 1, 2, \ldots,
\]
for χ_k ≡ χ > 0. Considering the same scheme, Eckstein and Bertsekas [98] allow the regularization parameter to vary with k. In the terminology of relaxation algorithms, the choices ρ_k ∈ (0, 1) are usually referred to as under-relaxation and ρ_k ∈ (1, 2) as over-relaxation, respectively.

Usually, the main goal for introducing a relaxation step in PPR is the acceleration of the convergence. However, in general one cannot expect that a relaxation step always leads to a reduction of the iteration number, as the following example shows.

9.1.1 Example. For the rotation operator T: R² → R²,
\[
T(x_1, x_2) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},
\]
which is monotone and continuous (and therefore maximal monotone), the relaxed proximal point algorithm generates the following sequence {x^k}:
\[
\begin{bmatrix} x_1^{k+1} \\ x_2^{k+1} \end{bmatrix} := \frac{1}{1+\chi_k^2} \begin{bmatrix} 1+(1-\rho_k)\chi_k^2 & \chi_k\rho_k \\ -\chi_k\rho_k & 1+(1-\rho_k)\chi_k^2 \end{bmatrix} \begin{bmatrix} x_1^k \\ x_2^k \end{bmatrix}, \quad k = 0, 1, 2, \ldots.
\]

The eigenvalues of the iteration matrix depend on the relaxation parameter ρ_k and can be evaluated explicitly:
\[
\mu_1(\rho_k) = 1 - \frac{\rho_k\chi_k^2}{1+\chi_k^2} - \frac{\rho_k\chi_k}{1+\chi_k^2}\sqrt{-1}, \qquad
\mu_2(\rho_k) = 1 - \frac{\rho_k\chi_k^2}{1+\chi_k^2} + \frac{\rho_k\chi_k}{1+\chi_k^2}\sqrt{-1},
\]
with
\[
|\mu_1(\rho_k)|^2 = |\mu_2(\rho_k)|^2 = 1 - \frac{\rho_k\chi_k^2(2-\rho_k)}{1+\chi_k^2}.
\]
Minimizing the norm of the eigenvalues with respect to the relaxation parameter, one obtains
\[
\rho_k^* = \arg\min_{\rho_k} |\mu_1(\rho_k)|^2 = \arg\min_{\rho_k} |\mu_2(\rho_k)|^2 = 1.
\]

The last two equations indicate two interesting facts. The first equation shows that the lower and upper bounds for the relaxation parameter ρ_k are sharp: the choice ρ_k = 0 or ρ_k = 2 leads to |μ₁(ρ_k)| = |μ₂(ρ_k)| = 1 and hence to the divergence of the relaxed PPR. The second equation says that the best choice of ρ_k is ρ_k* ≡ 1; the case ρ_k ≠ 1 leads to an increased number of iteration steps in both cases, under-relaxation as well as over-relaxation. ♦

Hence, there is no universally valid recommendation for the choice of the sequence of relaxation parameters {ρ_k}.
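The conclusions of Example 9.1.1 can be reproduced numerically; the following check (an added illustration, with χ fixed to 1) computes the spectral radius of the relaxed iteration matrix over a grid of relaxation parameters:

```python
import numpy as np

T = np.array([[0.0, -1.0], [1.0, 0.0]])      # rotation operator of Example 9.1.1

def iteration_matrix(rho, chi):
    J = np.linalg.inv(np.eye(2) + chi * T)   # resolvent J_{chi T}
    return (1.0 - rho) * np.eye(2) + rho * J

def spectral_radius(rho, chi=1.0):
    return np.max(np.abs(np.linalg.eigvals(iteration_matrix(rho, chi))))

rhos = np.linspace(0.0, 2.0, 201)
best = rhos[int(np.argmin([spectral_radius(r) for r in rhos]))]
print(best)                                  # 1.0: no relaxation is optimal
print(spectral_radius(1.0))                  # for chi = 1 this equals 1/sqrt(2)
print(spectral_radius(0.0), spectral_radius(2.0))  # both are (numerically) 1
```

The grid minimum at ρ = 1 and the value 1 at the endpoints ρ = 0, 2 match the formula |μ(ρ)|² = 1 − ρχ²(2 − ρ)/(1 + χ²).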


The evaluation of the resolvent operator J_{χ_k T} requires the most numerical effort in one iteration of the PPA. The exact evaluation of J_{χ_k T} is equivalent to the solution of a nontrivial auxiliary problem and might be very expensive in practice. Eckstein and Bertsekas [98] suggest using Rockafellar's error tolerance criterion, which permits to work with approximate solutions of the auxiliary problems in the following manner:
\[
x^{k+1} := (1-\rho_k)x^k + \rho_k y^k, \quad k = 0, 1, 2, \ldots,
\]
where
\[
\|y^k - J_{\chi_k T}(x^k)\| \le \delta_k
\]
and {δ_k} ⊂ [0, +∞) is a sequence of error tolerance parameters satisfying
\[
\sum_{k=0}^\infty \delta_k < \infty.
\]
Some other error tolerance criteria for the PPA can be found in [225] and [375]. However, from the numerical point of view the evaluation of the resolvent operator J_{χ_k T} is always a difficult task: it requires inverting the operator I + χ_k T, which depends on the nature of T and implies information about the whole image of the operator.

9.1.1 Relaxed PPR with enlargements of maximal monotone operators

Burachik et al. [61] try to overcome this difficulty and introduce an outer approximation, the so-called ε-enlargement T^ε of the operator T.

To start, we briefly repeat the definition and the main properties of the ε-enlargements of monotone operators. For a given monotone operator F: Rⁿ → 2^{Rⁿ} and ε ≥ 0, the ε-enlargement F^ε: Rⁿ → 2^{Rⁿ} is defined as follows:
\[
F^\varepsilon(x) = \{ u \in \mathbb{R}^n : \langle u - v, x - y \rangle \ge -\varepsilon \ \ \forall\, (y, v) \in \operatorname{gph}(F) \}.
\]

9.1.2 Proposition. ([61], Propositions 1, 2) Let F: Rⁿ → 2^{Rⁿ} be a monotone operator. Then

(a) gph(F) ⊆ gph(F^{ε₁}) ⊆ gph(F^{ε₂}) for any ε₂ ≥ ε₁ ≥ 0.

If additionally F is maximal monotone, then

(b) gph(F) = gph(F⁰);

(c) if dom(F) is closed, then dom(F) = dom(F^ε) for all ε ≥ 0.

Using the error tolerance criterion introduced in [225], we suggest the following version of the relaxed proximal point algorithm for solving VI(Q,K):

9.1.3 Algorithm. (Relaxed Proximal Point Algorithm (RPPA))

S0. Choose any x⁰ ∈ Rⁿ. Set k := 0.

S1. If x^k is a solution of VI(Q,K): STOP.

S2. Choose some χ_k > 0, δ_k ≥ 0, ε_k ≥ 0 and find y^{k+1} ∈ K, q^{k+1} ∈ Q^{ε_k}(y^{k+1}) such that
\[
\langle \chi_k q^{k+1} + y^{k+1} - x^k,\, y - y^{k+1} \rangle \ge -\delta_k \|y - y^{k+1}\| \quad \forall\, y \in K. \qquad (9.1.1)
\]


S3. Choose ρ_k ∈ (0, 2/(1+2δ_k)) and set
\[
x^{k+1} := (1-\rho_k)x^k + \rho_k y^{k+1}. \qquad (9.1.2)
\]
S4. Set k := k + 1 and go to S1. ♦
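A small numerical sketch of Algorithm 9.1.3 may be helpful (an added illustration: the operator Q(x) = Mx + c, the box K and all parameter values are ad hoc choices, and ε_k = δ_k = 0, so the auxiliary problem (9.1.1) is solved essentially exactly by an inner projected fixed-point loop):

```python
import numpy as np

M = np.array([[1.0, -1.0], [1.0, 1.0]])   # monotone: symmetric part is the identity
c = np.array([-1.0, 0.5])
Q = lambda x: M @ x + c                   # single-valued monotone operator
proj = lambda x: np.clip(x, 0.0, 1.0)     # projection onto K = [0,1]^2

def prox_subproblem(x, chi, t=0.2, inner=200):
    # solves (9.1.1) with eps_k = delta_k = 0: find y in K with
    #   <chi*Q(y) + y - x, z - y> >= 0  for all z in K,
    # via the contractive fixed-point map y -> P_K(y - t*(chi*Q(y) + y - x))
    y = x.copy()
    for _ in range(inner):
        y = proj(y - t * (chi * Q(y) + y - x))
    return y

x, chi, rho = np.array([5.0, -3.0]), 1.0, 1.0
for k in range(60):
    y = prox_subproblem(x, chi)
    x = (1.0 - rho) * x + rho * y         # relaxation step (9.1.2)

print(x)                                   # close to the solution (1, 0)
print(np.max(np.abs(x - proj(x - Q(x))))) # VI residual, close to 0
```

For this data the VI solution is the corner x* = (1, 0): there Q(x*) = (0, 1.5), so ⟨Q(x*), z − x*⟩ = 1.5 z₂ ≥ 0 for all z ∈ K.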

It should be noted that a solution of the auxiliary problem (9.1.1) can be found similarly to Eckstein [97]: find y^{k+1} ∈ Rⁿ and an error vector e^{k+1} such that
\[
e^{k+1} \in \chi_k Q^{\varepsilon_k}(y^{k+1}) + y^{k+1} - x^k + N_K(y^{k+1}), \qquad (9.1.3)
\]
\[
\|e^{k+1}\| \le \delta_k. \qquad (9.1.4)
\]
Indeed, if y^{k+1} and e^{k+1} satisfy (9.1.3), (9.1.4), then y^{k+1} ∈ K and the definition of the normality operator N_K yields
\[
\langle \chi_k q^{k+1} + y^{k+1} - x^k,\, y - y^{k+1} \rangle \ge -\langle e^{k+1}, y - y^{k+1} \rangle \quad \forall\, y \in K
\]
for some q^{k+1} ∈ Q^{ε_k}(y^{k+1}). Owing to the Cauchy-Schwarz inequality we can conclude that (y^{k+1}, q^{k+1}) satisfies (9.1.1).

9.1.4 Proposition. Let K ⊆ Rⁿ be a closed, convex set and Q: Rⁿ → 2^{Rⁿ} a maximal monotone operator with closed effective domain and ri(dom(Q)) ∩ ri(K) ≠ ∅. Then Algorithm 9.1.3 is well defined, i.e. for each k ∈ N and any χ_k > 0, ε_k ≥ 0, δ_k ≥ 0 there exists a solution of the auxiliary problem (9.1.1).

Proof: Because the restriction set K is closed and convex, the normality operator N_K is maximal monotone. Besides, due to [350], Theorem 2, the operator
\[
y \mapsto \chi_k Q(y) + y - x^k + N_K(y)
\]
is maximal monotone and strongly monotone for χ_k > 0. Using [353], Proposition 12.54, one can infer that the auxiliary problem (9.1.1) has a unique solution for ε_k = δ_k = 0, i.e. there exists y ∈ Rⁿ such that
\[
0 \in \chi_k Q(y) + y - x^k + N_K(y).
\]
Due to Proposition 9.1.2, the last relation yields immediately for any ε_k ≥ 0 that
\[
0 \in \chi_k Q^{\varepsilon_k}(y) + y - x^k + N_K(y),
\]
and thus y^{k+1} := y satisfies (9.1.1) for any δ_k ≥ 0.

9.1.5 Proposition. Assume that the solution set of VI(Q,K) is nonempty and the sequences of parameters in Algorithm 9.1.3 are chosen such that
\[
0 < \underline{\chi} \le \chi_k \le \overline{\chi} < \infty, \qquad 0 < \underline{\rho} \le \rho_k \le \frac{2\sigma}{1+2\delta_k} \quad \forall\, k \in \mathbb{N},
\]
\[
\delta_k \ge 0, \ \varepsilon_k \ge 0 \quad \forall\, k \in \mathbb{N}, \qquad \sum_{k=0}^\infty \delta_k < \infty, \qquad \sum_{k=0}^\infty \varepsilon_k < \infty,
\]
for some σ ∈ (0, 1). Then it holds:


(i) the sequence {x^k} generated by Algorithm 9.1.3 is bounded;

(ii) the sequence {‖x^k − x*‖} converges for any solution x* of VI(Q,K);

(iii) moreover,
\[
\lim_{k\to\infty} \|x^{k+1} - x^k\| = 0. \qquad (9.1.5)
\]

Proof: Let x* ∈ K be a solution of VI(Q,K). Then there exists q* ∈ Q(x*) such that
\[
\langle q^*, y - x^* \rangle \ge 0 \quad \forall\, y \in K. \qquad (9.1.6)
\]
For y = y^{k+1} in (9.1.6) and y = x* in (9.1.1) one gets
\[
\begin{aligned}
\frac12\|x^k - x^*\|^2 - \frac12\|x^{k+1} - x^*\|^2
&= \frac12\|x^{k+1} - x^k\|^2 + \langle x^{k+1} - x^k,\, x^* - x^{k+1} \rangle \\
&\ge \frac12\|x^{k+1} - x^k\|^2 + \langle x^{k+1} - x^k,\, x^* - x^{k+1} \rangle \\
&\quad + \rho_k\big( \langle \chi_k q^{k+1} + y^{k+1} - x^k,\, y^{k+1} - x^* \rangle - \delta_k\|x^* - y^{k+1}\| \big)
- \rho_k\chi_k \langle q^*,\, y^{k+1} - x^* \rangle.
\end{aligned}
\]

Relation (9.1.2) leads to
\[
y^{k+1} = x^k + \frac{1}{\rho_k}(x^{k+1} - x^k),
\]
and, due to q^{k+1} ∈ Q^{ε_k}(y^{k+1}), one can continue and conclude that
\[
\begin{aligned}
\frac12\|x^k - x^*\|^2 - \frac12\|x^{k+1} - x^*\|^2
&\ge \frac12\|x^{k+1} - x^k\|^2 + \langle x^{k+1} - x^k,\, x^* - x^{k+1} \rangle
+ \rho_k\chi_k \langle q^{k+1} - q^*,\, y^{k+1} - x^* \rangle \\
&\quad + \Big\langle x^{k+1} - x^k,\ x^k + \frac{1}{\rho_k}(x^{k+1} - x^k) - x^* \Big\rangle
- \rho_k\delta_k \Big\| x^* - x^k - \frac{1}{\rho_k}(x^{k+1} - x^k) \Big\| \\
&\ge \Big( \frac{1}{\rho_k} - \frac12 \Big) \|x^{k+1} - x^k\|^2 - \rho_k\chi_k\varepsilon_k
- \rho_k\delta_k \Big( \|x^* - x^k\| + \frac{1}{\rho_k}\|x^{k+1} - x^k\| \Big) \\
&\ge \Big( \frac{1}{\rho_k} - \frac12 \Big) \|x^{k+1} - x^k\|^2 - \rho_k\chi_k\varepsilon_k
- \rho_k\delta_k \Big( \|x^* - x^k\|^2 + \frac{1}{\rho_k}\|x^{k+1} - x^k\|^2 + 1 + \frac{1}{\rho_k} \Big).
\end{aligned}
\]
The last inequality follows from r ≤ r² + 1 ∀ r ∈ R. Rearranging the terms, it yields
\[
\frac12\|x^{k+1} - x^*\|^2 \le (1 + 2\delta_k\rho_k)\,\frac12\|x^k - x^*\|^2
- \Big( \frac{1}{\rho_k} - \frac12 - \delta_k \Big) \|x^{k+1} - x^k\|^2 + \rho_k\chi_k\varepsilon_k + \delta_k(\rho_k + 1).
\]


If ρk ≤ 2σ1+2δk

∀ k, then

1

2‖xk+1 − x∗‖2 ≤

(1 +

4δkσ

1 + 2δk

)1

2‖xk − x∗‖2

− (1 + 2δk)(1− σ)

2σ‖xk+1 − xk‖2 +

2σχkεk1 + 2δk

+ δk3 + 2δk1 + 2δk

. (9.1.7)

According to δk ≥ 0 for all k ∈ N and σ ∈ (0, 1), one can finally conclude that

‖xk+1 − x∗‖2 ≤ (1 + 4δk) ‖xk − x∗‖2 + 4χkεk + 2δk(3 + 2δk).

Due to the assumptions on the sequences $\{\chi_k\}$, $\{\varepsilon_k\}$ and $\{\delta_k\}$, together with Lemma A3.1.7, the convergence of $\{\|x^k - x^*\|\}$ is established. Hence the sequence $\{x^k\}$ is bounded; moreover, in view of (9.1.7), it follows that
$$\|x^{k+1} - x^k\| \to 0 \quad (k \to \infty).$$

In order to prove convergence of the sequence $\{x^k\}$ to a solution of VI$(Q,K)$, we make use of a gap function introduced in [59] (Lemma 3, Lemma 4): the function $\psi: \mathrm{dom}(Q) \cap K \to \mathbb{R} \cup \{+\infty\}$,
$$\psi(x) = \sup_{y \in K,\ (y,v) \in \mathrm{gph}(Q)} \langle v, x - y\rangle, \qquad (9.1.8)$$
is a convex gap function for VI$(Q,K)$, i.e., $\psi(x) \ge 0$ for all $x \in \mathrm{dom}(Q) \cap K$ and $\psi(x^*) = 0$ if and only if $x^*$ is a solution of VI$(Q,K)$.
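As a small illustration (ours, not from the monograph), the gap function (9.1.8) can be evaluated numerically for a simple single-valued monotone operator, here $Q(y) = 2y$ (the derivative of $y^2$) on $K = [-1,1]$, with the supremum approximated on a finite grid. All names in the sketch are hypothetical.

```python
def gap(x, Q, K_grid):
    # psi(x) = sup_{y in K} <Q(y), x - y>, cf. (9.1.8); Q is single-valued here,
    # so gph(Q) = {(y, Q(y))}, and the sup is approximated over a grid of K
    return max(Q(y) * (x - y) for y in K_grid)

Q = lambda y: 2.0 * y                                # monotone operator
K_grid = [-1.0 + j / 1000.0 for j in range(2001)]    # grid of K = [-1, 1]

psi_at_solution = gap(0.0, Q, K_grid)  # x* = 0 solves VI(Q, K)
psi_elsewhere = gap(0.5, Q, K_grid)    # positive at a non-solution
```

The grid evaluation confirms $\psi(0) = 0$ while $\psi(0.5) > 0$, matching the characterization of solutions via $\psi$.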

9.1.6 Theorem. Suppose the assumptions of Propositions 9.1.4 and 9.1.5 are fulfilled. Then the sequence $\{x^k\}$ generated by Algorithm 9.1.3 converges to a solution of VI$(Q,K)$.

Proof: Proposition 9.1.5, in particular (9.1.5), together with (9.1.2) and the assumption on the sequence $\{\rho_k\}$, guarantees that
$$\|y^{k+1} - x^k\| \to 0 \quad (k \to \infty). \qquad (9.1.9)$$
According to Proposition 9.1.5, the sequence $\{x^k\}$ is bounded, and therefore there exists a subsequence $\{x^{k_l}\}$ of $\{x^k\}$ having a limit point $\bar{x}$. Hence, from (9.1.9) it follows immediately that
$$y^{k_l+1} \to \bar{x} \quad (l \to \infty).$$
Inequality (9.1.1) guarantees that $y^{k_l+1} \in \mathrm{dom}(Q) \cap K$ for all $l \in \mathbb{N}$. The effective domain of $Q$ as well as the restriction set $K$ are closed, so we get $\bar{x} \in \mathrm{dom}(Q) \cap K$. Now, for any $l \in \mathbb{N}$, one can conclude from (9.1.1) that
$$\langle \chi_{k_l} q^{k_l+1} + y^{k_l+1} - x^{k_l},\, y - y^{k_l+1}\rangle \ge -\delta_{k_l}\|y - y^{k_l+1}\| \quad \forall\, y \in K,$$
where $q^{k_l+1} \in Q^{\varepsilon_{k_l}}(y^{k_l+1})$. The definition of the $\varepsilon$-enlargement yields, for any $y \in K$ and any $q \in Q(y)$,
$$\langle \chi_{k_l} q + y^{k_l+1} - x^{k_l},\, y - y^{k_l+1}\rangle \ge -\varepsilon_{k_l}\chi_{k_l} - \delta_{k_l}\|y - y^{k_l+1}\|.$$


Passing to the limit $l \to \infty$ in the last inequality, (9.1.9) and the assumptions on the sequences $\{\chi_k\}$, $\{\varepsilon_k\}$, $\{\delta_k\}$ lead to
$$\langle q, y - \bar{x}\rangle \ge 0 \quad \forall\, (y,q) \in \mathrm{gph}(Q),\ y \in K. \qquad (9.1.10)$$
Using the definition of the gap function $\psi$, one can conclude from (9.1.10) that
$$\psi(\bar{x}) = 0,$$
i.e., $\bar{x}$ is a solution of VI$(Q,K)$. It remains to show that the whole sequence $\{x^k\}$ converges to $\bar{x}$. In view of Proposition 9.1.5, the sequence $\{\|x^* - x^k\|\}$ converges for an arbitrary solution $x^*$ of VI$(Q,K)$; with $\bar{x}$ in place of $x^*$, the sequence $\{\|\bar{x} - x^k\|\}$ converges, too. Moreover, for the subsequence $\{x^{k_l}\}$ it holds $\|\bar{x} - x^{k_l}\| \to 0$ as $l \to \infty$. Thus the whole sequence $\{x^k\}$ converges to the limit point $\bar{x}$.

9.1.7 Remark. The parameter $\sigma$, introduced in Proposition 9.1.5, is a by-product of the proof and is used to obtain the convergence $\|x^{k+1} - x^k\| \to 0$. The parameter $\sigma$ can be selected arbitrarily from the interval $(0,1)$. However, from the numerical point of view it should be chosen close to 1, which allows us to perform larger steps in the relaxation step (9.1.2) of Algorithm 9.1.3.

The error tolerance parameter $\delta_k$ also has some influence on the step size in the relaxation step (9.1.2). If the inexact solution $y^{k+1}$ of the auxiliary problem (9.1.1) is coarse, which is usual for iterates far from the solution set, then one has to perform shorter steps. For $\delta_k = 0$, i.e., when the auxiliary problem is solved exactly, the relaxation parameter $\rho_k$ can be selected from the interval $(0,2)$. This selection corresponds to the classical result of Eckstein and Bertsekas [98]. It should be noted that the value of the error tolerance parameter in the scheme of Eckstein and Bertsekas has no consequences for the relaxation steps. ♦

Although we use an outer approximation of the set-valued operator $Q$, the auxiliary problem (9.1.1) is still a general variational inequality and not easier to solve than the original problem. In the next subsection we will show how the auxiliary problems in S2 of Algorithm 9.1.3 can be solved numerically for specific operators $Q$.

9.1.2 Application to non-smooth minimization problems

Here the consideration of convex non-smooth optimization problems serves as an application of Algorithm 9.1.3, because in this case the $\varepsilon$-enlargements can be computed relatively easily by bundle methods. As far as we know, the numerical treatment of $\varepsilon$-enlargements for more general multi-valued operators is not an easy task (cf., for instance, [62]).

Suppose that $K \subset \mathbb{R}^n$ is a closed, convex set and $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper, convex, lower semicontinuous function with $\mathrm{ri}(\mathrm{dom}(f)) \cap \mathrm{ri}(K) \ne \emptyset$. Then the optimization problem
$$\mathrm{CP}(f,K): \qquad \min_{x \in K} f(x)$$
is equivalent to the variational inequality VI$(\partial f, K)$.

In the sequel we discuss a possible implementation of Algorithm 9.1.3


for CP$(f,K)$. In particular, we are interested in an algorithmic approach for Step 1 and Step 2.

Using the error tolerance criterion (9.1.3)–(9.1.4) to solve the auxiliary problem (9.1.1) for $Q = \partial f$, (9.1.1) reduces to: for given $\chi_k > 0$, $\delta_k \ge 0$ and $\varepsilon_k \ge 0$, determine $y^{k+1} \in K$ and $e^{k+1} \in \mathbb{R}^n$ such that
$$e^{k+1} \in \chi_k(\partial f)^{\varepsilon_k}(y^{k+1}) + y^{k+1} - x^k + N_K(y^{k+1}), \qquad (9.1.11)$$
where $\|e^{k+1}\| \le \delta_k$.

The main numerical effort in the algorithm lies in solving the auxiliary problem (9.1.11). To choose a suitable solver, let us analyze the structure of (9.1.11). Note that for $\varepsilon_k = \delta_k = 0$ the inclusion (9.1.11) is equivalent to the minimization problem
$$\min_{y \in K}\Big\{f(y) + \frac{1}{2\chi_k}\|y - x^k\|^2\Big\}, \qquad (9.1.12)$$

where the objective function is obviously strongly convex. However, for a non-smooth function $f$, the latter problem may be hard to solve. For this reason we make use of a bundle method in order to find an approximate solution of (9.1.12) in accordance with (9.1.11).
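As a side remark (an illustration of ours, not from the text): for very simple $f$ the proximal subproblem (9.1.12) is solvable in closed form. For $f = |\cdot|$ on $K = \mathbb{R}$ it is the classical soft-thresholding operation; the bundle machinery below is needed precisely because such formulas are unavailable for general non-smooth $f$. A minimal Python sketch:

```python
def prox_abs(x, chi):
    # closed-form solution of min_y { |y| + (1/(2*chi)) * (y - x)**2 },
    # i.e. problem (9.1.12) with f = |.| and K = R: soft-thresholding
    if x > chi:
        return x - chi
    if x < -chi:
        return x + chi
    return 0.0
```

For example, `prox_abs(3.0, 1.0)` returns `2.0`: the quadratic term pulls the minimizer toward $x^k = 3$, while the non-smooth term shrinks it by $\chi_k = 1$.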

The suggested bundle strategy consists of two steps: first, replace $f$ by some convex function such that the modified problem can be treated more easily than (9.1.12); then show that the solution of the resulting problem can be considered as an approximate solution in the sense of (9.1.11). Assume that, for some fixed $k \in \mathbb{N}$ and $i \ge 0$, a convex function $f_{i+1}^k$ is constructed which approximates $f$ from below, i.e.,
$$f_{i+1}^k(y) \le f(y) \quad \forall\, y \in \mathrm{dom}(f). \qquad (9.1.13)$$

The rules defining $f_{i+1}^k$ will be discussed below. Replacing $f$ by $f_{i+1}^k$ in (9.1.12), one gets the convex optimization problem
$$\min_{y \in K}\Big\{f_{i+1}^k(y) + \frac{1}{2\chi_k}\|y - x^k\|^2\Big\}. \qquad (9.1.14)$$
Let $z_{i+1}^k$ be an approximate solution of (9.1.14), i.e., according to (9.1.11), there exists $e_{i+1}^k \in \mathbb{R}^n$ with a sufficiently small norm such that
$$e_{i+1}^k \in \partial f_{i+1}^k(z_{i+1}^k) + \frac{1}{\chi_k}(z_{i+1}^k - x^k) + N_K(z_{i+1}^k). \qquad (9.1.15)$$

For $e_{i+1}^k = 0$ the point $z_{i+1}^k$ solves (9.1.14) exactly.

Now we show that, under appropriate conditions on $f_{i+1}^k$ and $e_{i+1}^k$, the iterate $z_{i+1}^k$ solves (9.1.11). Indeed, by the definition of the subdifferential, we get immediately from the last inclusion
$$f_{i+1}^k(y) \ge f_{i+1}^k(z_{i+1}^k) + \Big\langle e_{i+1}^k - \frac{1}{\chi_k}(z_{i+1}^k - x^k) - w_{i+1}^k,\; y - z_{i+1}^k\Big\rangle \quad \forall\, y \in \mathbb{R}^n,$$
with $w_{i+1}^k \in N_K(z_{i+1}^k)$. Further, due to (9.1.13) and the property of $z_{i+1}^k$, the latter inequality leads to
$$f(y) \ge f(z_{i+1}^k) + \Big\langle e_{i+1}^k - \frac{1}{\chi_k}(z_{i+1}^k - x^k) - w_{i+1}^k,\; y - z_{i+1}^k\Big\rangle - \varepsilon_{k,i+1} \quad \forall\, y \in \mathbb{R}^n, \qquad (9.1.16)$$


with
$$\varepsilon_{k,i+1} = f(z_{i+1}^k) - f_{i+1}^k(z_{i+1}^k). \qquad (9.1.17)$$
Due to (9.1.13) it holds $\varepsilon_{k,i+1} \ge 0$. In view of the definition of the $\varepsilon$-subdifferential, (9.1.16) is equivalent to
$$e_{i+1}^k \in \partial_{\varepsilon_{k,i+1}} f(z_{i+1}^k) + \frac{1}{\chi_k}(z_{i+1}^k - x^k) + N_K(z_{i+1}^k). \qquad (9.1.18)$$
According to $\partial_\varepsilon f \subseteq (\partial f)^\varepsilon$ for all $\varepsilon \ge 0$ (cf., for instance, [61], Proposition 3) and observing that $\chi_k N_K = N_K$ for any $\chi_k > 0$, one finally obtains that $z_{i+1}^k$ satisfies the inclusion
$$\chi_k e_{i+1}^k \in \chi_k(\partial f)^{\varepsilon_{k,i+1}}(z_{i+1}^k) + z_{i+1}^k - x^k + N_K(z_{i+1}^k). \qquad (9.1.19)$$

If $\varepsilon_{k,i+1} \le \varepsilon_k$ and $\|e_{i+1}^k\| \le \delta_k/\chi_k$, then $z_{i+1}^k$ solves subproblem (9.1.11) and we can take $y^{k+1} := z_{i+1}^k$. If $\|e_{i+1}^k\| > \delta_k/\chi_k$, then we should try to solve (9.1.14) more precisely. If $\|e_{i+1}^k\| \le \delta_k/\chi_k$ but $\varepsilon_{k,i+1} > \varepsilon_k$, one has to enhance the function $f_{i+1}^k$ by some $f_{i+2}^k$. After replacing $f_{i+1}^k$ by $f_{i+2}^k$ in (9.1.14) and solving the problem again, the resulting enlargement parameter $\varepsilon_{k,i+2}$ is less than $\varepsilon_{k,i+1}$.

The choice of the sequence $\{f_i^k\}_{i \ge 1}$ is crucial for the determination of an approximate solution of (9.1.11). According to Correa and Lemaréchal [84], there are three conditions defining $f_i^k$. For our purpose we adapt condition (4.8) in [84] to allow an inexact solution of (9.1.14) and require for $\{f_i^k\}_{i \ge 1}$:

(CP1) $f_i^k(y) \le f(y) \quad \forall\, y \in \mathrm{dom}(f), \ \forall\, i \ge 1$;

(CP2) $f_{i+1}^k(y) \ge f_i^k(z_i^k) + \big\langle e_i^k - \frac{1}{\chi_k}(z_i^k - x^k),\, y - z_i^k\big\rangle \quad \forall\, y \in K, \ \forall\, i \ge 1$;

(CP3) $f_{i+1}^k(y) \ge f(z_i^k) + \langle s_i^k,\, y - z_i^k\rangle \quad \forall\, y \in \mathbb{R}^n, \ \forall\, i \ge 0$;

where the vectors $z_i^k$ and $e_i^k$ satisfy the inclusion (9.1.15) for all $i \ge 1$, and $s_i^k \in \partial f(z_i^k)$ ($i \ge 0$) are arbitrarily chosen. For $i := 0$ set $z_0^k := x^k$ in (CP3).

There are at least three possibilities to choose a sequence $\{f_i^k\}_{i \ge 1}$ satisfying (CP1)–(CP3). The first one is a choice of “maximal size”:
$$f_{i+1}^k(y) := \max_{j \in J_i^k}\big\{f(z_j^k) + \langle s_j^k,\, y - z_j^k\rangle\big\}, \qquad (9.1.20)$$
where $J_i^k := \{0, \ldots, i\}$. Due to $f_i^k(y) \le f_{i+1}^k(y)$ for any $y$ and all $i \ge 1$, the function $f_{i+1}^k$ given by (9.1.20) satisfies (CP1)–(CP3). The second one is a choice of “minimal size”:
$$f_{i+1}^k(y) := \max\big\{l_i^k(y),\; f(z_i^k) + \langle s_i^k,\, y - z_i^k\rangle\big\} \quad \forall\, i \ge 1, \qquad (9.1.21)$$
with
$$l_i^k(y) := f_i^k(z_i^k) + \Big\langle e_i^k - \frac{1}{\chi_k}(z_i^k - x^k),\; y - z_i^k\Big\rangle.$$
For $i := 0$ we set $f_1^k(y) := f(z_0^k) + \langle s_0^k,\, y - z_0^k\rangle$ and $z_0^k := x^k$. The validity of (CP2), (CP3) follows immediately from the definition; (CP1) is satisfied, since the two affine functions in (9.1.21) estimate $f$ from below. The third possibility is a combination of (9.1.20) and (9.1.21):
$$f_{i+1}^k(y) := \max\big\{l_i^k(y),\; f(z_j^k) + \langle s_j^k,\, y - z_j^k\rangle : j \in \hat{J}_i^k\big\}, \qquad (9.1.22)$$


where $\hat{J}_i^k \subset J_i^k$ is an arbitrary index set with $i \in \hat{J}_i^k$.
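To make the model constructions concrete, here is a small Python sketch (illustrative only; all names are ours) of the “maximal size” choice (9.1.20): the model is the pointwise maximum of the cutting planes collected in the bundle.

```python
def max_size_model(bundle):
    # bundle: list of triples (z_j, f_zj, s_j), where f_zj = f(z_j) and
    # s_j is a subgradient of f at z_j; returns the model (9.1.20)
    def model(y):
        return max(f_zj + s_j * (y - z_j) for (z_j, f_zj, s_j) in bundle)
    return model

# toy 1-D instance: f(y) = |y|, cuts taken at z = 2 (s = 1) and z = -1 (s = -1)
f = abs
bundle = [(2.0, f(2.0), 1.0), (-1.0, f(-1.0), -1.0)]
m = max_size_model(bundle)
```

Each cut underestimates $f$, so the model satisfies (CP1); for this particular instance the two cuts already reproduce $f$ exactly.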

The following proposition serves as a basis for the application of inexact solutions of the bundle subproblems.

9.1.8 Proposition. (Lemma 5.3.1 in [187]) Let $K$ be a convex and closed set. Assume that $f$ is a convex, lower semicontinuous function with $K \subset \mathrm{dom}(f)$ and that the subdifferential $\partial f$ is bounded on bounded subsets of $K$. Further, suppose that for any fixed $k \in \mathbb{N}$ the sequence $\{f_i^k\}_{i \ge 1}$ satisfies the conditions (CP1)–(CP3). If the solution set of CP$(f,K)$ is nonempty and
$$\sum_{i=1}^\infty \|e_i^k\| < +\infty,$$
then

(a) $\lim_{i \to \infty} \varepsilon_{k,i+1} = \lim_{i \to \infty}\big(f(z_{i+1}^k) - f_{i+1}^k(z_{i+1}^k)\big) = 0$;

(b) $\{z_i^k\}_{i \in \mathbb{N}}$ converges to the unique solution of (9.1.12).

It should be noted that $\partial f$ is bounded on bounded subsets of $K$ if $K \subset \mathrm{int}(\mathrm{dom}\, f) = \mathrm{int}(\mathrm{dom}(\partial f))$.

Now we are looking for a numerical realization of Step 1 in Algorithm 9.1.3 in order to decide whether the current iterate $x^k$ is a solution of CP$(f,K)$. First of all, one should be aware that $x^k$ may not always be feasible for CP$(f,K)$: in the case of over-relaxation in Step 3 it can happen that $x^k$ lies outside of $K$. On the other hand, $y^{k+1}$ is always feasible. Therefore, we test $y^{k+1}$ for optimality. If the conditions $\|e_{i_k+1}^k\| \le \delta_k/\chi_k$ and $\varepsilon_{k,i_k+1} \le \varepsilon_k$ hold for some $i_k \ge 0$, then according to (9.1.18) $y^{k+1} := z_{i_k+1}^k$ satisfies the condition
$$e_{i_k+1}^k \in \partial_{\varepsilon_{k,i_k+1}} f(y^{k+1}) + \frac{1}{\chi_k}(y^{k+1} - x^k) + N_K(y^{k+1}),$$
or
$$f(y) \ge f(y^{k+1}) + \Big\langle e_{i_k+1}^k - \frac{1}{\chi_k}(y^{k+1} - x^k),\; y - y^{k+1}\Big\rangle - \varepsilon_{k,i_k+1} \quad \forall\, y \in K.$$
If for some $\theta \ge 0$
$$\|e_{i_k+1}^k\| + \frac{1}{\chi_k}\|y^{k+1} - x^k\| \le \theta \quad \text{and} \quad \varepsilon_{k,i_k+1} \le \theta \qquad (9.1.23)$$
is true, then it follows immediately for $y^{k+1}$ that
$$f(y) \ge f(y^{k+1}) - \theta\|y - y^{k+1}\| - \theta \quad \forall\, y \in K.$$
The point $y^{k+1}$ is called $\theta$-optimal. Obviously, if $\theta := 0$, then $y^{k+1}$ is a solution of CP$(f,K)$.

Now we are able to formulate an implementable method for solving CP$(f,K)$ based on Algorithm 9.1.3.


9.1.9 Algorithm. (Bundle RPPA)

S0. Choose any $x^0 \in \mathbb{R}^n$ and $\theta > 0$. Set $k := 0$.

S1. Choose some $\chi_k > 0$, $\varepsilon_k > 0$, $\delta_k \ge 0$. Set $z_0^k := x^k$ and $i := -1$.

S1.1 Set $i := i + 1$ and construct $f_{i+1}^k$ satisfying (CP1)–(CP3).
S1.2 Find $(z_{i+1}^k, e_{i+1}^k)$ as a solution of (9.1.15) with $\|e_{i+1}^k\| \le \delta_k/\chi_k$.
S1.3 Evaluate $\varepsilon_{k,i+1} := f(z_{i+1}^k) - f_{i+1}^k(z_{i+1}^k)$.
S1.4 If $\varepsilon_{k,i+1} \le \varepsilon_k$, then make a Serious Step: set $y^{k+1} := z_{i+1}^k$ and go to S2.
S1.5 If $\varepsilon_{k,i+1} > \varepsilon_k$, then make a Null Step: go to S1.1.

S2. If $\|e_{i+1}^k\| + \frac{1}{\chi_k}\|y^{k+1} - x^k\| \le \theta$ and $\varepsilon_{k,i+1} \le \theta$: STOP, $y^{k+1}$ is a $\theta$-optimal point.

S3. Select $\rho_k \in (0, 2/(1+2\delta_k))$ and set
$$x^{k+1} := (1 - \rho_k)x^k + \rho_k y^{k+1}.$$

S4. Set $k := k + 1$ and go to S1. ♦
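A minimal runnable sketch of the outer loop (ours, not from the monograph): for $f(x) = |x|$ on $K = \mathbb{R}$ the inner bundle loop S1 collapses to an exact prox computation ($\delta_k = \varepsilon_k = 0$), so that, in the spirit of Remark 9.1.7, $\rho_k$ may be taken from $(0,2)$. All names and parameter values are illustrative assumptions.

```python
def prox_abs(x, chi):
    # exact solution of min_y { |y| + (1/(2*chi)) * (y - x)**2 }
    if x > chi:
        return x - chi
    if x < -chi:
        return x + chi
    return 0.0

def bundle_rppa_1d(x0, chi=0.5, rho=1.5, theta=1e-8, max_iter=500):
    # S1 replaced by an exact prox oracle (delta_k = eps_k = 0);
    # S2: stopping test in the spirit of (9.1.23) with e = 0; S3: relaxation
    x = x0
    for _ in range(max_iter):
        y = prox_abs(x, chi)              # serious step
        if abs(y - x) / chi <= theta:     # theta-optimality test of S2
            return y
        x = (1.0 - rho) * x + rho * y     # relaxation step S3
    return x

x_found = bundle_rppa_1d(5.0)   # the minimizer of |x| is 0
```

The over-relaxed iteration ($\rho_k = 1.5$) reaches the solution $x^* = 0$ in a few dozen steps.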

9.1.10 Theorem. Let $K \subset \mathbb{R}^n$ be a closed, convex set and $f: \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper, convex, lower semicontinuous function with $K \subset \mathrm{dom}(f)$. Further, let the subdifferential $\partial f$ be bounded on bounded subsets of $K$. If CP$(f,K)$ is solvable and the sequences $\{\varepsilon_k\} \subset (0,+\infty)$, $\{\delta_k\} \subset [0,+\infty)$, $\{\chi_k\}$ and $\{\rho_k\}$ satisfy
$$\sum_{k=0}^\infty \varepsilon_k < \infty, \quad \sum_{k=0}^\infty \delta_k < \infty, \quad 0 < \underline{\chi} \le \chi_k \le \overline{\chi} < \infty,$$
$$0 < \underline{\rho} \le \rho_k \le \frac{2\sigma}{1+2\delta_k} \quad \forall\, k \in \mathbb{N}, \ \text{for some } \sigma \in (0,1),$$
then Algorithm 9.1.9 generates a $\theta$-optimal point of CP$(f,K)$ after a finite number of iterations.

Proof: According to the assumption $\varepsilon_k > 0$ and in view of Proposition 9.1.8, for each $k \in \mathbb{N}$ there exists an $i_k \in \mathbb{N}$ such that $\varepsilon_{k,i_k+1} \le \varepsilon_k$. Hence, the inner iteration of Algorithm 9.1.9 – Step 1 – terminates after finitely many steps, i.e., the algorithm is well defined.

After Step 1, the solution $z_{i_k+1}^k$ of the inclusion (9.1.19) is an approximate solution of the auxiliary problem (9.1.11). Therefore, Theorem 9.1.6 guarantees the convergence of the sequence $\{x^k\}$ to a solution of CP$(f,K)$. Further, we can conclude via Theorem 9.1.6 that
$$\lim_{k\to\infty}\|y^{k+1} - x^k\| = 0, \qquad \lim_{k\to\infty}\varepsilon_{k,i_k+1} \le \lim_{k\to\infty}\varepsilon_k = 0.$$
Hence, the stopping criterion in Step 2 of Algorithm 9.1.9 will be satisfied for any $\theta > 0$ if $k$ is large enough.

9.1.3 Numerical aspects and computational results

The Bundle RPPA algorithm has been implemented and tested on a set of well-known problems in convex non-smooth optimization. All benchmark examples are of the following form:
$$\min_{x \in K} f(x), \qquad (9.1.24)$$
where
$$K = \{x \in \mathbb{R}^n : g_1(x) \le 0, \ldots, g_m(x) \le 0\}.$$
The functions $f$ and $g_j$ ($j = 1, \ldots, m$) are assumed to be convex, and $K$ has to satisfy a constraint qualification, for instance Slater's condition. The detailed structure of the problems is described by means of the test examples in Subsection 9.1.4.1.

The assumptions of Theorem 9.1.10 are satisfied for these examples, because in all cases $\mathrm{dom}(f) = \mathbb{R}^n$. Consequently, the objective function is continuous on the whole space and the subdifferential is bounded on any bounded set (cf. [348], Theorem 10.1, Theorem 23.4).

In the sequel, we discuss some aspects of the practical implementation of Algorithm 9.1.9. For key relations of the bundle methods we refer to [237] and [366].

9.1.3.1 The quadratic subproblem and the sequence $\{\delta_k\}$

The crucial point in the algorithm is the computation of the solution of the quadratic problem (9.1.14). For each fixed $k$ we have to construct a sequence $\{f_{i+1}^k\}$ satisfying (CP1)–(CP3). As described above, there are several possibilities to do this. We prefer (9.1.22), because it allows to control the number of elements included in the bundle. For $i \ge 1$ let $\hat{J}_i^k$ be a subset of $\{0, \ldots, i\}$ with $i \in \hat{J}_i^k$ (for $i := 0$ the choice of $f_{i+1}^k$ is obvious). With an additional variable $w \in \mathbb{R}$, problem (9.1.14) can be written equivalently as
$$\begin{aligned}
\min_{(w,y)\in\mathbb{R}^{n+1}} \quad & w + \frac{1}{2\chi_k}\|y - x^k\|^2 \\
\text{s.t.} \quad & f(z_j^k) + \langle s_j^k,\, y - z_j^k\rangle \le w \quad \forall\, j \in \hat{J}_i^k, \\
& f_i^k(z_i^k) + \langle \tilde{s}_i^k,\, y - z_i^k\rangle \le w, \\
& y \in K,
\end{aligned} \qquad (9.1.25)$$

or
$$\begin{aligned}
\min_{(v,y)\in\mathbb{R}^{n+1}} \quad & v + \frac{1}{2\chi_k}\|y - x^k\|^2 \\
\text{s.t.} \quad & -\alpha_{k,j} + \langle s_j^k,\, y - x^k\rangle \le v \quad \forall\, j \in \hat{J}_i^k, \\
& -\tilde{\alpha}_{k,i} + \langle \tilde{s}_i^k,\, y - x^k\rangle \le v, \\
& y \in K,
\end{aligned} \qquad (9.1.26)$$
with
$$\begin{aligned}
v &:= w - f(x^k), \\
\tilde{s}_i^k &:= -\frac{1}{\chi_k}(z_i^k - x^k), \\
\alpha_{k,j} &:= f(x^k) - \big(f(z_j^k) + \langle s_j^k,\, x^k - z_j^k\rangle\big) \quad \forall\, j \in \hat{J}_i^k, \\
\tilde{\alpha}_{k,i} &:= f(x^k) - \big(f_i^k(z_i^k) + \langle \tilde{s}_i^k,\, x^k - z_i^k\rangle\big).
\end{aligned}$$
Due to the convexity of $f$, the linearization errors $\alpha_{k,j}$ ($j \in \hat{J}_i^k$) and $\tilde{\alpha}_{k,i}$ are non-negative.

If $K$ is polyhedral, then (9.1.26) is a convex quadratic programming problem with linear constraints, for which efficient solvers are known. If some of the functions $g_j$ are nonlinear, then the choice of a suitable solution method depends on further properties of $g_j$. In order to design a universal solver, we suggest to eliminate the restriction set $K$ by using an exact penalty function. Therefore, if $m > 0$, we consider
$$F(x) := f(x) + c\sum_{i=1}^m \max\{g_i(x), 0\},$$
with some constant penalty parameter $c > 0$, and, instead of (9.1.24), we solve the unrestricted non-smooth problem
$$\min_{x\in\mathbb{R}^n} F(x).$$
Since we are dealing in (9.1.24) with a convex optimization problem satisfying a constraint qualification, an a priori choice of the penalty parameter $c$ is possible (see, for instance, [179], Chapter VII, Corollary 3.2.3). The modifications in the problem data of (9.1.26) are marginal: $f$ must be replaced by $F$, and consequently the function as well as the subgradient evaluations have to be changed.
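A toy illustration of the exact penalty approach (our sketch; the instance and the value of $c$ are hypothetical): for $\min\{x^2 : x \ge 1\}$ the penalized function $F$ has the same minimizer $x^* = 1$ once $c$ exceeds the optimal Lagrange multiplier (here $\lambda^* = 2$).

```python
def make_penalized(f, gs, c):
    # exact penalty F(x) = f(x) + c * sum_i max(g_i(x), 0)
    def F(x):
        return f(x) + c * sum(max(g(x), 0.0) for g in gs)
    return F

# instance: min x^2  s.t.  g(x) = 1 - x <= 0; unique solution x* = 1, f* = 1
F = make_penalized(lambda x: x * x, [lambda x: 1.0 - x], c=10.0)

# crude global minimization of the (now unconstrained) function on a grid
x_best = min((j / 1000.0 for j in range(-3000, 3001)), key=F)
```

The grid search recovers $x^* = 1$ exactly, and $F(x^*) = f(x^*) = 1$ because the penalty term vanishes at feasible points.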

The transition to the case $K := \mathbb{R}^n$ has a further advantage: the dual of (9.1.26) can be evaluated explicitly. Denoting by $(\nu_j : j \in \hat{J}_i^k;\ \nu_i)$ the Lagrange multipliers of the linear constraints, the dual problem has the form
$$\begin{aligned}
\min_{\nu \in \mathbb{R}^{|\hat{J}_i^k|+1}} \quad & \frac{1}{2}\Big\|\sum_{j \in \hat{J}_i^k} \nu_j s_j^k + \nu_i \tilde{s}_i^k\Big\|^2 + \frac{1}{\chi_k}\Big(\sum_{j \in \hat{J}_i^k} \nu_j \alpha_{k,j} + \nu_i \tilde{\alpha}_{k,i}\Big) \\
\text{s.t.} \quad & \sum_{j \in \hat{J}_i^k} \nu_j + \nu_i = 1, \\
& \nu_j \ge 0 \quad \forall\, j \in \hat{J}_i^k, \quad \nu_i \ge 0.
\end{aligned} \qquad (9.1.27)$$
Problem (9.1.27) is a quadratic programming problem. It has $|\hat{J}_i^k| + 1$ variables and $|\hat{J}_i^k| + 2$ constraints, in comparison to $n + 1$ variables and $|\hat{J}_i^k| + 1$ constraints in problem (9.1.26). Moreover, the constraints of (9.1.27) are structurally simpler than the constraints of (9.1.26). Further, the number of bundle elements $|\hat{J}_i^k| + 1$ can be chosen arbitrarily between 2 and $i + 1$, i.e., we can decide on the size of problem (9.1.27) depending on the numerical effort and the quality of the approximation of $f$ by $f_{i+1}^k$.

Now, if the solution $(\nu_j^k : j \in \hat{J}_i^k;\ \nu_i^k)$ of the dual problem (9.1.27) is calculated, the solution $(v_{i+1}^k, z_{i+1}^k)$ of (9.1.26) is given by
$$z_{i+1}^k := x^k - \chi_k\Big(\sum_{j \in \hat{J}_i^k} \nu_j^k s_j^k + \nu_i^k \tilde{s}_i^k\Big), \qquad (9.1.28)$$
$$v_{i+1}^k := -\frac{1}{\chi_k}\|z_{i+1}^k - x^k\|^2 - \Big(\sum_{j \in \hat{J}_i^k} \nu_j^k \alpha_{k,j} + \nu_i^k \tilde{\alpha}_{k,i}\Big). \qquad (9.1.29)$$
Observe that at the solution $(w_{i+1}^k, z_{i+1}^k)$ of (9.1.25) the relation $f_{i+1}^k(z_{i+1}^k) = w_{i+1}^k$ is true. Therefore, it holds
$$f_{i+1}^k(z_{i+1}^k) = v_{i+1}^k + f(x^k), \qquad (9.1.30)$$
and due to (9.1.17)
$$\varepsilon_{k,i+1} = f(z_{i+1}^k) - v_{i+1}^k - f(x^k). \qquad (9.1.31)$$
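To illustrate the dual solve and the recovery formulas (our toy computation; all data hypothetical): for $f(y) = |y|$, prox center $x = 3$ and $\chi = 1$, take two cuts with subgradients $s_0 = 1$, $s_1 = -1$ and linearization errors $\alpha_0 = 0$, $\alpha_1 = 6$. With only two multipliers $\nu = (1-t,\, t)$ on the unit simplex, the dual (9.1.27) becomes a one-dimensional quadratic in $t$, solved by clipping its unconstrained minimizer to $[0,1]$:

```python
x, chi = 3.0, 1.0
s = [1.0, -1.0]        # cut subgradients
alpha = [0.0, 6.0]     # linearization errors

# dual objective along nu = (1 - t, t):
#   0.5 * (s[0] + t*(s[1]-s[0]))**2 + (alpha[0] + t*(alpha[1]-alpha[0])) / chi
ds, da = s[1] - s[0], alpha[1] - alpha[0]
a, b = ds * ds, s[0] * ds + da / chi            # derivative is a*t + b
t = min(max(-b / a, 0.0), 1.0) if a > 0 else 0.0
nu = (1.0 - t, t)

g = nu[0] * s[0] + nu[1] * s[1]                 # aggregate subgradient
z = x - chi * g                                 # primal recovery (9.1.28)
v = -(z - x) ** 2 / chi - (nu[0] * alpha[0] + nu[1] * alpha[1])   # (9.1.29)
```

Here $\nu = (1,0)$, $z = 2$ and $v = -1$; indeed $z$ coincides with the exact prox of $|\cdot|$ at $x = 3$ (soft-thresholding gives $3 - 1 = 2$), and $v = f_{i+1}^k(z) - f(x) = 2 - 3$, consistent with (9.1.30).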


With $\varepsilon_{k,i+1}$ given by (9.1.31), the validity of the stopping criterion for the inner iteration in Algorithm 9.1.9 (see Step 1.4 and Step 1.5) can be checked. If the stopping criterion is not satisfied, then one has to update $\tilde{s}_{i+1}^k$ and $\tilde{\alpha}_{k,i+1}$ by
$$\tilde{s}_{i+1}^k := \sum_{j \in \hat{J}_i^k} \nu_j^k s_j^k + \nu_i^k \tilde{s}_i^k, \qquad \tilde{\alpha}_{k,i+1} := \sum_{j \in \hat{J}_i^k} \nu_j^k \alpha_{k,j} + \nu_i^k \tilde{\alpha}_{k,i},$$
and to solve the dual problem (9.1.27) for $i + 1$ again.

For the sake of simplicity we take here the accuracy parameter $\delta_k \equiv 0$ and assume $e_i^k = 0$ for all $i \in \mathbb{N}$ and all $k \in \mathbb{N}$ in (CP2). It is possible to carry out a deeper analysis and to consider an inexact solution of the dual problem (9.1.27). Nevertheless, for computational purposes this is not crucial and can be neglected.

9.1.3.2 Choice of the sequences $\{\chi_k\}$ and $\{\varepsilon_k\}$

By Theorem 9.1.10 the sequence $\{\chi_k\}$ must be bounded. Numerical experiments carried out by Kiwiel [237] and Schramm and Zowe [366] show that the choice of $\{\chi_k\}$ has a great influence on the convergence of the algorithm. Our numerical computations are executed with two alternative selections of this sequence: firstly, $\chi_k \equiv \chi > 0$, and secondly, $\chi_k$ updated by the weighting technique which can be found in [237].

The choice of $\{\varepsilon_k\}$ in Algorithm 9.1.9 determines the number of inner iterations of the algorithm and is therefore directly responsible for the numerical progress. Due to Theorem 9.1.10 there is only one requirement: the sequence $\{\varepsilon_k\}$ has to be summable. In order to avoid an a priori setting and to make this choice dependent on the progress of the algorithm, we suggest the following strategy:
$$\varepsilon_k := -\gamma_1 v_{i_k+1}^k \quad \forall\, k \in \mathbb{N}, \qquad (9.1.32)$$
where $\gamma_1 \in (0,1)$ is some constant and $i_k \in \mathbb{N}$ is the first index for which
$$\varepsilon_{k,i_k+1} \le -\gamma_1 v_{i_k+1}^k \qquad (9.1.33)$$
holds.

To show the existence of such an index $i_k$ in (9.1.33), we assume the opposite, i.e., let $\varepsilon_{k,i+1} > -\gamma_1 v_{i+1}^k$ for all $i \ge 0$. In view of (9.1.30) and (9.1.31), the latter inequality is equivalent to
$$\varepsilon_{k,i+1} > \gamma_1\big(f(x^k) - f_{i+1}^k(z_{i+1}^k)\big) = \gamma_1\big(f(x^k) - f(z_{i+1}^k) + \varepsilon_{k,i+1}\big) \quad \forall\, i \ge 0.$$
With Proposition 9.1.8 and the continuity of $f$, we get for $i \to \infty$
$$0 \ge \gamma_1\big(f(x^k) - f(z^k)\big).$$
Hence, due to $\gamma_1 > 0$, we infer $f(x^k) \le f(z^k)$. The point $z^k$ is optimal for problem (9.1.12), therefore
$$f(x^k) \ge f(z^k) + \frac{1}{\chi_k}\|z^k - x^k\|^2.$$


Combining the last two inequalities, we obtain $z^k = x^k$. Finally, the optimality condition for $z^k$ says that $0 \in \partial f(x^k)$, i.e., the point $x^k$ solves problem CP$(f,K)$.

Let us summarize: if (9.1.33) fails for infinitely many steps, then $x^k$ solves the original problem; if $x^k$ is not a solution of CP$(f,K)$, then (9.1.33) is satisfied after finitely many steps.

Assume now that the current iterate is not a solution of CP$(f,K)$. It remains to show that, if $\varepsilon_k$ is defined by (9.1.32), then $\{\varepsilon_k\}$ is summable. Indeed, after $i_k$ steps of the inner iteration (see Step 1.4),
$$f(z_{i_k+1}^k) \le f(x^k) + (1 - \gamma_1)v_{i_k+1}^k$$
is true. If the stopping criterion is valid, then setting $y^{k+1} := z_{i_k+1}^k$ one gets
$$f(y^{k+1}) \le f(x^k) + (1 - \gamma_1)v_{i_k+1}^k. \qquad (9.1.34)$$
If after the relaxation step (Step 3) the relation
$$f(x^{k+1}) \le f(y^{k+1}) - \gamma_2 v_{i_k+1}^k \qquad (9.1.35)$$
holds true for some $-\infty < \gamma_2 < 1 - \gamma_1$ (see the discussion in Subsection 9.1.4 below), then (9.1.34) and (9.1.35) lead to
$$f(x^{k+1}) \le f(x^k) + (1 - \gamma_1 - \gamma_2)v_{i_k+1}^k. \qquad (9.1.36)$$
With (9.1.36) and (9.1.32) we conclude that for any $N \in \mathbb{N}$
$$\sum_{k=0}^N \varepsilon_k = -\gamma_1\sum_{k=0}^N v_{i_k+1}^k \le \frac{\gamma_1\big(f(x^0) - f(x^{N+1})\big)}{1 - \gamma_1 - \gamma_2} \le \frac{\gamma_1\big(f(x^0) - f(x^*)\big)}{1 - \gamma_1 - \gamma_2},$$
where $x^*$ is an arbitrary solution of CP$(f,K)$. Taking the limit as $N \to \infty$ immediately yields $\sum_{k=0}^\infty \varepsilon_k < \infty$.

9.1.11 Remark. Inequality (9.1.35) serves as a measure for the quality of the relaxation step. With the help of the parameter $\gamma_2$, the selection of the points $x^{k+1}$ in question can be sharpened. If $\gamma_2 < 0$, then for the next iterate $x^{k+1}$ it holds $f(x^{k+1}) < f(y^{k+1})$. However, it is possible to have $\gamma_2 > 0$; in this case $f(x^{k+1}) > f(y^{k+1})$ is also feasible after the relaxation step. Hence, a simple choice of the parameter $\gamma_2$ could be $\gamma_2 \in [0, 1 - \gamma_1)$; in this case various relaxation parameters can be chosen, in particular $\rho_k := 1$ is possible.

Note that, according to the choice of $\varepsilon_k$ in (9.1.32), the stopping condition for a Serious Step in Algorithm 9.1.9 coincides with the typical condition for a serious step in minimization methods for convex non-differentiable functions (see [237], [262], and [366]). ♦

9.1.4 Choice of relaxation parameters

By Theorem 9.1.10 the relaxation parameter $\rho_k$ can be selected arbitrarily from the interval $(0, 2/(1+2\delta_k))$. To guarantee the summability of $\{\varepsilon_k\}$, in view of (9.1.35), a further condition on $\rho_k$ is needed. But it turns out that (9.1.35) is not a strong restriction. For example, it is trivially satisfied for $\gamma_2 = 0$ if $\rho_k := 1$ is chosen. For $\gamma_2 \in (0, 1 - \gamma_1)$ we are even allowed to choose $\rho_k$ such that $f(x^{k+1}) > f(y^{k+1})$. In any case, if (9.1.35) is true after the relaxation step, then $f(x^{k+1}) < f(x^k)$ holds for all $k \in \mathbb{N}$.

Certainly there are many possibilities to execute the relaxation step. It could be a line search procedure, where (9.1.35) plays the role of a stopping criterion. However, a line search usually requires a large number of function evaluations. Our goal is to implement Step 3 such that an adaptive choice of $\rho_k$ is possible and the number of function evaluations becomes minimal. For instance, this can be done by taking $\rho_k$ as the minimizer of a quadratic polynomial $p_k$ approximating $f(x^k + t(y^{k+1} - x^k))$ on $[0,2]$. In this case the detailed description of Step 3 in Algorithm 9.1.9 reads as follows:

S3. Initialization: relaxation := true, $i_r := 0$; choose $\gamma_2 \in (-\infty, 1 - \gamma_1)$.

S3.0 If relaxation = false, then set $x^{k+1} := y^{k+1}$ and go to S4.
S3.1 Approximate $f(x^k + t(y^{k+1} - x^k))$ on $[0,2]$ by $p_k(t) := a_0 + a_1 t + a_2 t^2$ with
$$a_0 := f(x^k), \qquad a_1 := -\tfrac{3}{2}f(x^k) + 2f(y^{k+1}) - \tfrac{1}{2}f(x^k + 2(y^{k+1} - x^k)),$$
$$a_2 := \tfrac{1}{2}f(x^k) - f(y^{k+1}) + \tfrac{1}{2}f(x^k + 2(y^{k+1} - x^k)).$$
S3.2 Set $\rho_k := \min\{-\frac{a_1}{2a_2},\, 1.99\}$ and compute
$$x^{k+1} := (1 - \rho_k)x^k + \rho_k y^{k+1}.$$
S3.3 If $f(x^{k+1}) > f(y^{k+1}) - \gamma_2 v_{i_k+1}^k$, set $x^{k+1} := y^{k+1}$ and go to S4.
S3.4 If $|1 - \rho_k| \le 0.1$, then set $i_r := i_r + 1$.
S3.5 If $i_r > 3$, then set relaxation := false and go to S4.
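The interpolation in S3.1–S3.2 uses only the three values $f(x^k)$, $f(y^{k+1})$ and $f(x^k + 2(y^{k+1} - x^k))$, i.e., $p_k(t)$ matches $f$ along the relaxation direction at $t = 0, 1, 2$. A small sketch of ours, including the fallback $\rho_k := 1.99$ for $a_2 \le 0$ (where $p_k$ has no interior minimizer):

```python
def relaxation_parameter(f0, f1, f2):
    # p(t) = a0 + a1*t + a2*t^2 interpolating (0, f0), (1, f1), (2, f2),
    # where f0 = f(x^k), f1 = f(y^{k+1}), f2 = f(x^k + 2*(y^{k+1} - x^k))
    a1 = -1.5 * f0 + 2.0 * f1 - 0.5 * f2
    a2 = 0.5 * f0 - f1 + 0.5 * f2
    if a2 <= 0.0 or -a1 / (2.0 * a2) >= 2.0:
        return 1.99                     # cap used in S3.2
    return -a1 / (2.0 * a2)
```

If $f$ itself is quadratic along the segment, the rule recovers the exact line minimizer: e.g., for $f(t) = (t - 0.8)^2$ the three values $(0.64,\, 0.04,\, 1.44)$ give $\rho_k = 0.8$.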

It can easily be verified that the polynomial $p_k(\cdot)$ attains its minimum at $-\frac{a_1}{2a_2}$. If $a_2 = 0$ or $-\frac{a_1}{2a_2} \ge 2$, we set $\rho_k := 1.99$. In Step 3.4 it is checked whether the new iterate $x^{k+1}$ is close to $y^{k+1}$: if $\|x^{k+1} - y^{k+1}\| \le 0.1\|y^{k+1} - x^k\|$, then the counter $i_r$ has to be increased. If $i_r$ is greater than some positive integer, e.g. $i_r > 3$, then the variable relaxation is set to false and no further relaxation steps are executed. With this strategy one can identify whether the relaxation step accelerates the convergence of the iteration process.

9.1.4.1 Numerical tests

Most of the benchmark examples listed here were taken from [237], [262] or [366] (see there for more details). First we briefly describe the problem structure and indicate the dimension $n$, the number $m$ of constraints, the starting point $x^0$ and the known solution $x^*$ of each test example. The case $m := 0$ corresponds to unconstrained problems, i.e., $K := \mathbb{R}^n$ in (9.1.24).

Test Example 1. (Demyanov)
$$f(x) := \max\{5x_1 + x_2,\; x_1^2 + x_2^2 + 4x_2,\; -5x_1 + x_2\},$$
$n := 2$, $m := 0$, $x^0 := (1,1)^T$, $f(x^0) = 6$, $x^* = (0,-3)^T$, $f(x^*) = -3$.

Test Example 2. (Mifflin)
$$f(x) := -x_1 + 20\max\{x_1^2 + x_2^2 - 1,\; 0\},$$
$n := 2$, $m := 0$, $x^0 := (0.8, 0.6)^T$, $f(x^0) = -0.8$, $x^* = (1,0)^T$, $f(x^*) = -1$.
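For concreteness, the first two objectives can be coded directly (our sketch; only the starting and optimal values stated above are checked):

```python
def demyanov(x):
    # Test Example 1
    return max(5 * x[0] + x[1],
               x[0] ** 2 + x[1] ** 2 + 4 * x[1],
               -5 * x[0] + x[1])

def mifflin(x):
    # Test Example 2
    return -x[0] + 20.0 * max(x[0] ** 2 + x[1] ** 2 - 1.0, 0.0)
```

Evaluating at the starting points reproduces $f(x^0) = 6$ and $f(x^0) = -0.8$ (the latter up to round-off), and at the reported minimizers $f(x^*) = -3$ and $-1$.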

Test Example 3.
$$f(x) := \max\Big\{x_1^2 + x_2^2 - \tfrac{16}{25},\; \tfrac{9}{25}x_1^2\Big\},$$
$n := 2$, $m := 0$, $x^0 := \big(\tfrac{\sqrt{7}}{4}, \tfrac{3}{5}\big)^T$, $f(x^0) = \tfrac{63}{400}$, $f(x^*) = 0$.

Test Example 4. (Rosen–Suzuki)
$$f(x) := x_1^2 + x_2^2 + 2x_3^2 + x_4^2 - 5x_1 - 5x_2 - 21x_3 + 7x_4,$$
$$g_1(x) := x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_1 - x_2 + x_3 - x_4 - 8,$$
$$g_2(x) := x_1^2 + 2x_2^2 + x_3^2 + 2x_4^2 - x_1 - x_4 - 10,$$
$$g_3(x) := x_1^2 + x_2^2 + x_3^2 + 2x_1 - x_2 - x_4 - 5,$$
$n := 4$, $m := 3$, $x^0 := 0$, $f(x^0) = 0$, $x^* = (0,1,2,-1)^T$, $f(x^*) = -44$.

Test Example 5. (Minimax location problem)
$$f(x) := \max_{1\le i\le 9}\big\{w_{i1}\|a_i - (x_1,x_2)\|_{p_{i1}},\; w_{i2}\|a_i - (x_3,x_4)\|_{p_{i2}},\; \|(x_1,x_2) - (x_3,x_4)\|\big\},$$
$$g_1(x) := \|(x_1,x_2) - a_1\|^2 - 144, \qquad g_2(x) := \|(x_1,x_2) - a_4\|^2 - 121,$$
$$g_3(x) := \|(x_3,x_4) - a_1\|^2 - 225, \qquad g_4(x) := \|(x_3,x_4) - a_2\|^2 - 144,$$
$n := 4$, $m := 4$, $a_i \in \mathbb{R}^2$ $\forall i$, $x^0 := (15, 22, 27, 11)$, $f(x^0) = 218.3607415$, $f(x^*) = 23.886767$.

Test Example 6. (Streit’s problem No. 1)
$$f(z) := \|Az - b\|_\infty,$$
$$g_i(z) := |(z - e)_i| - t_i, \quad i = 1, \ldots, d,$$
$$g_{d+i}(z) := |(Bz - g)_i| - c_i, \quad i = 1, \ldots, r,$$
where $A \in \mathbb{C}^{s\times d}$, $b \in \mathbb{C}^s$, $e \in \mathbb{C}^d$, $B \in \mathbb{C}^{r\times d}$, $g \in \mathbb{C}^r$. Setting $z_i := x_{2i-1} + \sqrt{-1}\,x_{2i}$, we get an optimization problem in $\mathbb{R}^{2d}$, considered for $d := 2$, $s := 5$, $r := 2$, $n := 4$, $m := 4$, $z^0 := 0$, $f(z^0) = \sqrt{2}$, $z^* = \big(\tfrac{-1+\sqrt{-1}}{2},\, 0\big)^T$, $f(z^*) = \tfrac{\sqrt{2}}{2}$.

Test Example 7. (Streit’s problem No. 2)

Same form as Streit’s problem No. 1, with $d := 2$, $s := 5$, $r := 2$, $n := 4$, $m := 4$, $z^0 := 0$, $f(z^0) = \sqrt{2}$, $f(z^*) = 1.0142136$.

Test Example 8. (Streit’s problem No. 3)

Same form as Streit’s problem No. 1, with $d := 3$, $s := 101$, $r := 0$, $n := 6$, $m := 6$, $z^0 := 0$, $f(z^0) = 1$, $f(z^*) = 0.01470631$.

Test Example 9. (Shor)
$$f(x) := \max_{1\le i\le 10} b_i \sum_{j=1}^5 (x_j - a_{ij})^2,$$
$n := 5$, $m := 0$, $x^0 := (1, \ldots, 1)^T$, $f(x^0) = 80$, $f(x^*) = 22.60016$.


Test Example 10. (Colville)
$$f(x) := \sum_{i=1}^5\sum_{j=1}^5 c_{ij}x_i x_j + \sum_{i=1}^5 d_i x_i^3 + \sum_{i=1}^5 e_i x_i,$$
$$g_i(x) := b_i - \sum_{j=1}^5 a_{ij}x_j, \quad i = 1, \ldots, 10,$$
$$g_{10+i}(x) := -x_i, \quad i = 1, \ldots, 10,$$
$n := 5$, $m := 20$, $x^0 := (0,0,0,0,1)^T$, $f(x^0) = 20$, $f(x^*) = -23.0448869$.

Test Example 11. (Love’s minimum location problem)
$$f(x) := \sum_{i=1}^3\sum_{j=1}^5 w_{ij}\|(x_{2i-1}, x_{2i}) - a_j\|_p + \sum_{1\le i<j\le 3}\|(x_{2i-1}, x_{2i}) - (x_{2j-1}, x_{2j})\|_p,$$
$$g_1(x) := x_5 + x_6 - 3.0,$$
$n := 6$, $m := 1$, $x^0 := 0$, $f(x^0) = 177.31$, $f(x^*) = 68.82856$.

Test Example 12. (Lemaréchal’s MaxQuad problem)
$$f(x) := \max_{1\le i\le 5}\big\{\langle A_i x, x\rangle - \langle b_i, x\rangle\big\},$$
$n := 10$, $m := 0$, $x^0 := (1, \ldots, 1)^T \in \mathbb{R}^{10}$, $f(x^0) = 5337.07$, $f(x^*) = -0.8414084$.

Test Example 13. (Constrained MaxQuad)
$$f(x) := \max_{1\le i\le 5}\big\{\langle A_i x, x\rangle - \langle b_i, x\rangle\big\},$$
$$g_i(x) := |x_i| - 0.05, \quad i = 1, \ldots, 10, \qquad g_{11}(x) := \sum_{i=1}^{10} x_i - 0.05,$$
$n := 10$, $m := 11$, $x^0 := 0$, $f(x^0) = 0$, $f(x^*) = -0.36816644175$.

Test Example 14. (Ill-conditioned LP)
$$f(x) := \sum_{i=1}^{30} c_i(x_i - 1), \qquad g_i(x) := \langle a_i, x\rangle - b_i, \quad i = 1, \ldots, 30,$$
where $a_{ij} := \frac{1}{i+j}$, $b_i := \sum_{j=1}^{30} a_{ij}$, $c_i := -b_i - \frac{1}{1+i}$; considered for $n := 30$, $m := 30$, $x^0 := 0$, $f(x^0) = 40.81$, $x^* = (1, \ldots, 1)^T$, $f(x^*) = 0$.

Test Example 15. (Lemaréchal’s problem TR48)
$$f(x) := \sum_{j=1}^{48} d_j \max_{1\le i\le 48}(x_i - a_{ij}) - \sum_{i=1}^{48} s_i x_i,$$
$n := 48$, $m := 0$, $x^0 := 0$, $f(x^0) = -464816$, $f(x^*) = -638565$.

Test Example 16. (Goffin)
$$f(x) := 50\max_{1\le i\le 50} x_i - \sum_{i=1}^{50} x_i,$$
$n := 50$, $m := 0$, $x_i^0 := i - \tfrac{51}{2}$ $(i = 1, \ldots, 50)$, $f(x^0) = 1225$, $x^* = 0$, $f(x^*) = 0$.


Test Example 17. (Polyhedral problem)
$$f(x) := \sum_{i=1}^{50}\left|\sum_{j=1}^{50}\frac{x_j - 1}{i+j-1}\right|,$$
$n := 50$, $m := 0$, $x^0 := 0$, $f(x^0) = 68.82$, $x^* = (1, \ldots, 1)^T$, $f(x^*) = 0$.

The aim of the numerical tests is to study the influence of the relaxation step in the framework of proximal point algorithms. For each test example, Table 9.1.1 and Table 9.1.2 indicate the number $k$ of iterations (number of serious steps), the total number #tot of steps (sum of serious steps and null steps), the function value $f(x^k)$ at the computed minimal point $x^k$, as well as the number #f of function evaluations. Mainly, we are interested in a comparison of the behavior of Algorithm 9.1.9 under the following three choices of $\rho_k$: $\rho_k \equiv 1$ (no relaxation), $\rho_k \equiv 1.2$ (over-relaxation) and an adaptive variation of $\rho_k$ according to Step 3.

As mentioned above, an adaptive choice of the relaxation parameter can be understood as a correction of the step length. Due to Step 1 in Algorithm 9.1.9, and in particular to (9.1.28), we observe that the regularization parameter $\chi_k$, and therefore the length of the iteration (serious) step, is fixed before the descent direction is found. In case the direction leads to a significant decrease of the objective function, it is natural to execute a larger step. Inequality (9.1.35), with some negative parameter $\gamma_2$, is a measure for the quality of the step. To make the influence of the relaxation parameter clearer, we distinguish between an algorithm with a fixed regularization parameter (cf. Table 9.1.1) and one with a regularization parameter adapted by the rules given in [237] (cf. Table 9.1.2).

Algorithm 9.1.9 was implemented in the programming language C.¹ In order to solve the quadratic problem (9.1.27), Schittkowski's QL solver [364] was used, translated from FORTRAN to C by f2c. All results summarized in Table 9.1.1 and Table 9.1.2 are calculated with the parameter settings γ_1 := 0.9, γ_2 := −10^{−2} and θ := 10^{−4}. The accuracy of the QL solver is set to 10^{−10}. For the test examples 4–8 and 11–14 the penalty coefficient c := 10 is chosen, whereas c := 50 is set for test example 10.

The first numerical experiences indicate that relaxation steps with an adapted relaxation parameter contribute to the acceleration of the convergence. In particular in Table 9.1.1, where the regularization parameter χ_k is fixed, one recognizes the positive influence of the relaxation parameter on the iteration process. The number of iterations (denoted by k) with an adapted ρ_k is smaller than or equal to that with ρ_k ≡ 1 in more than 60% of the test cases, and analogously smaller in more than 70% of the cases compared to the algorithm with constant over-relaxation. Comparing the function evaluations required for one relaxation step, it should be noted that the algorithm with constant over-relaxation needs only one additional function evaluation, as opposed to two additional evaluations for the adaptive relaxation. Nevertheless, the number of function evaluations with the adapted ρ_k is smaller in more than 70% of the test cases.

¹ Meanwhile, the code "bundle method" corresponding to this approach is available as an extension of the GSL library (GNU Scientific Library, www.gsl.org/software/gsl).


384 CHAPTER 9. PPR FOR VI’S WITH MULTI-VALUED OPERATORS

Choosing an adaptive relaxation parameter, the tuning of an optimal balance between the parameters χ_k and ρ_k is more complicated. However, we can observe that the total number of steps (denoted by #tot), and therefore the numerical effort for solving the quadratic problem (9.1.27), is smaller for adaptive ρ_k than for the other choices in more than 50% of the test cases. An adaptive choice of the relaxation parameter for each test problem separately could certainly enhance the results still further.


[Table 9.1.1: Algorithm 9.1.9 with χ_k ≡ 1 — per-example values of k, #tot, f(x^k) and #f for the three choices ρ_k ≡ 1, ρ_k ≡ 1.2 and adaptive ρ_k; the tabular data could not be recovered from the extraction.]


[Table 9.1.2: Algorithm 9.1.9 with variable χ_k — per-example values of k, #tot, f(x^k) and #f for the three choices ρ_k ≡ 1, ρ_k ≡ 1.2 and adaptive ρ_k; the tabular data could not be recovered from the extraction.]



9.2 Interior PPR on Non-Polyhedral Sets

As in Section 9.1 we deal here with the variational inequality

VI(Q,K) find x ∈ K, q ∈ Q(x) : 〈q, y − x〉 ≥ 0, ∀ y ∈ K,

where K is a convex, closed subset of ℝⁿ, but the operator Q is now composed such that

Q := ∂f + P,

where ∂f is the subdifferential of a proper convex, lower semi-continuous function f and P : ℝⁿ → 2^{ℝⁿ} is a maximal monotone operator. The application of the exact PPR can be described as follows.

Given v^1 ∈ K and a sequence {χ_k}, 0 < χ_k ≤ χ < ∞. With v^k ∈ K obtained in the previous step, define v^{k+1} ∈ K, q^{k+1} ∈ Q(v^{k+1}) such that

〈q^{k+1} + χ_k ∇₁D(v^{k+1}, v^k), v − v^{k+1}〉 ≥ 0 ∀ v ∈ K.

Here D : (x, y) ↦ ‖x − y‖² and ∇₁D denotes the partial gradient of D with respect to the first vector argument.
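For intuition, the exact PPR step can be sketched in the simplest setting K = ℝⁿ, Q = ∇f with a quadratic f (a minimal illustration of ours, not the monograph's algorithm). Since ∇₁D(x, y) = 2(x − y), the variational inequality over all of ℝⁿ reduces to a linear system:

```python
import numpy as np

# Exact PPR for K = R^n, Q = grad f, f(v) = 0.5 v^T A v - b^T v.
# The step condition becomes A v^{k+1} - b + 2 chi_k (v^{k+1} - v^k) = 0,
# i.e. (A + 2 chi_k I) v^{k+1} = b + 2 chi_k v^k.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite (our choice)
b = np.array([1.0, 1.0])
chi = 1.0                               # fixed regularization parameter chi_k
v = np.zeros(2)                         # starting point v^1
for _ in range(200):
    v = np.linalg.solve(A + 2.0 * chi * np.eye(2), b + 2.0 * chi * v)
print(v)  # approaches the unique solution of A v = b
```

Each step is strongly regularized by the proximal term, and the iterates contract toward the solution of the unregularized problem.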

For different modifications of the PPR, also with other quadratic distance functions D, we refer to [215, 224, 240] and the bibliographies therein. In the last decade, a new branch of PPR's has been intensively studied, dealing with the use of non-quadratic distance functions. The main motivation for this type of proximal method is the following: for certain classes of problems an appropriate choice of non-quadratic distance functions permits one to preserve the regularizing properties of the original version of the PPR and, at the same time, this choice guarantees that the iterates stay in the interior of the set K, i.e., with certain precaution, the regularized problems can be treated as unconstrained ones.

Usually, a Bregman distance, an entropic ϕ-divergence or a logarithmic-quadratic distance is applied to construct such interior proximal methods (see references in [64, 240, 386] and the recent papers [23, 25, 229]). However, up to now, distance functions providing an interior point effect have been created only for the case that K is a polyhedral convex set or a ball.
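To illustrate the interior-point effect of a non-quadratic distance, consider the Bregman distance generated by the negative entropy on ℝⁿ₊ (a standard example; the concrete h here is our illustrative choice, not prescribed by the text):

```python
import numpy as np

# Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y> for the
# negative entropy h(x) = sum_j x_j ln x_j, with zone int R^n_+.
def D_entropy(x, y):
    return np.sum(x * np.log(x / y) - x + y)

x = np.array([0.5, 0.5])
print(D_entropy(x, x))                       # 0.0: D_h(x, x) = 0
print(D_entropy(x, np.array([0.9, 0.1])))    # > 0 for x != y
# As y approaches the boundary of R^2_+, the distance blows up, which is
# what keeps interior proximal iterates strictly inside int K:
for eps in (1e-2, 1e-4, 1e-8):
    print(D_entropy(x, np.array([1.0 - eps, eps])))
```

The monotone blow-up near the boundary is precisely the mechanism behind conditions of the type (B4)(ii) below.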

Using a slightly modified definition of the class of Bregman functions (see Remark 9.2.3), we extend the Bregman-function-based interior proximal methods to solve VI(Q,K) on a non-polyhedral set

K := {x ∈ ℝⁿ : g_i(x) ≤ 0, i ∈ I₁ ∪ I₂},  (9.2.1)

where the Slater condition is supposed to be valid and

g_i (i ∈ I₁) are affine functions,  (9.2.2)

g_i (i ∈ I₂) are convex and continuously differentiable functions, and max_{i∈I₂} g_i is strictly convex on K.  (9.2.3)

Later on, in Subsection 9.2.2.1, we will show that condition (9.2.3) can be weakened substantially.

The modification in the definition of Bregman functions consists mainly in


a relaxation of the standard convergence sensing condition (see Stand-(B4) in Remark 9.2.3), which proves to be restrictive already in case K is a ball. The convergence analysis of the method studied admits a successive approximation of the operator Q by means of the ε-enlargement concept, as well as an inexact solution of the subproblems under a criterion of summability of the error vectors.

In Section 9.2.2 we check the validity of the modified requirements on Bregman functions for some particular functions and discuss the convergence of earlier developed Bregman-function-based methods under the modification mentioned.

9.2.1 Bregman-function-based PPR

Variational inequality VI(Q,K) is considered under the following basic assumptions:

9.2.1 Assumption.

(A1) K ⊂ ℝⁿ is a convex closed set;

(A2) dom Q ∩ int K ≠ ∅;

(A3) the solution set SOL(Q,K) of VI(Q,K) is non-empty;

(A4) (a) P : ℝⁿ → 2^{ℝⁿ} is a maximal monotone operator,

(b) lim_{k→∞} y^k = y ∈ K, p^k ∈ P(y^k) implies that {p^k} is a bounded sequence,

(c) ∂f + P is a paramonotone operator, see Definition A1.6.43.

Assumptions (A1), (A2) and (A4)(a) provide the maximal monotonicity of Q + N_K, where N_K is the normality operator of K (cf. [350], Theorem 1).

For some properties of paramonotone operators see [194]. In particular, because ∂f is paramonotone, we have paramonotonicity of Q. Paramonotonicity is a rather standard assumption in Bregman-function-based proximal methods. Such kind of methods for variational inequalities with non-paramonotone operators was studied in [228].

The composed operators Q = ∂f + P, where P is linear, paramonotone and non-symmetric and ∂f is the subdifferential of a proper convex, lower semi-continuous function f such that K \ dom f ≠ ∅, form a wide class of operators satisfying (A). Assumption (A4)(b) is weaker than an assumption on the boundedness of the whole operator Q. Indeed, because the operator ∂f is maximal monotone, the assumption K \ dom f ≠ ∅ means that the restriction of ∂f to K is an unbounded operator, and hence an assumption on the boundedness of the whole operator Q is stronger than required here.

Now, let h be a Bregman-type function with zone int K. According to the terminology of Bregman functions, under a zone of h one understands an open convex set S := int(dom h). More precisely, here we suppose:

9.2.2 Assumption.

(B1) dom h = K and h is continuous and strictly convex on K;


(B2) h is continuously differentiable on int K;

(B3) With the distance function

D_h : (x, y) ∈ K × int K ↦ h(x) − h(y) − 〈∇h(y), x − y〉,

for each x ∈ K there exist constants α(x) > 0 and c(x) such that

D_h(x, y) + c(x) ≥ α(x)‖x − y‖ ∀ y ∈ int K;

(B4) If {z^k} ⊂ int K converges to z, then at least one of the following properties is valid:

(i) lim_{k→∞} D_h(z, z^k) = 0, or

(ii) lim_{k→∞} D_h(z̄, z^k) = +∞ if z̄ ≠ z, z ∈ bd K;

(B5) (zone coercivity) ∇h(int K) = ℝⁿ. ♦

9.2.3 Remark. Assumption (B4) is evidently weaker than the standard convergence sensing condition:

Stand-(B4) If {z^k} ⊂ int K converges to z, then lim_{k→∞} D_h(z, z^k) = 0,

and it is closely related to condition (iv) in the definition of a generalized Bregman function introduced by Kiwiel, see [240], Definition 2.4.

Another standard condition

Stand-(B3) For any x ∈ K and any constant α the set L_α(x) := {y ∈ int K : D_h(x, y) ≤ α} is bounded,

follows immediately from (B3). Both (B3) and Stand-(B3) are obviously valid if K is a bounded set. ♦

Referring further on to standard conditions on Bregman functions, we have in mind the fulfillment of (B1), (B2), Stand-(B3), Stand-(B4), and one of the conditions (B5) or

(B6) (boundary coercivity) If {z^k} ⊂ int K, lim_{k→∞} z^k = z ∈ bd K, then for each x ∈ int K it holds that

lim_{k→∞} 〈∇h(z^k), x − z^k〉 = −∞.

9.2.4 Remark. To our knowledge, among the Bregman functions (under standard conditions) considered in the literature (see [70], [71], [193] and references therein), only the function

∑_{j=1}^{n} (x_j − x_j^β), 0 < β < 1,

on K := ℝⁿ₊ does not satisfy (B3). Assumption (B3) was introduced in [229] in order to weaken the stopping criteria in generalized proximal methods. ♦


In the sequel we make use of

9.2.5 Lemma. ([376], Theorem 2.4) Let h satisfy Assumptions 9.2.2 (B1) and (B2). If

{z^k} ⊂ K, {y^k} ⊂ int K, lim_{k→∞} D_h(z^k, y^k) = 0,

and one of these sequences converges, then the other one converges to the same limit, too.

9.2.6 Lemma. Let Assumptions 9.2.2 (B1), (B2), Stand-(B3) and (B4) be valid and {v^k} ⊂ int K. Moreover, suppose that C is a non-empty subset of K, that {D_h(x, v^k)} converges for each x ∈ C, and that each cluster point of {v^k} belongs to C. Then {v^k} converges to some v̄ ∈ C.

Proof: Because {D_h(x, v^k)} converges for x ∈ C and Stand-(B3) is valid, the sequence {v^k} is bounded. Take two subsequences {v^{l_k}} and {v^{n_k}} of {v^k} with

lim_{k→∞} v^{l_k} = v*,  lim_{k→∞} v^{n_k} = v̄.

If v̄ ∈ int K, then lim_{k→∞} D_h(v̄, v^{n_k}) = 0 follows from (B2), and since the whole sequence {D_h(v̄, v^k)} converges, one gets

lim_{k→∞} D_h(v̄, v^k) = 0.

In turn, applying Lemma 9.2.5 with z^k := v̄, y^k := v^{l_k}, we conclude v* = v̄.

Now let v̄ ∈ bd K. We make use of (B4), setting z := v*, z^k := v^{l_k} and z̄ := v̄. If (B4)(i) is valid, then lim_{k→∞} D_h(v*, v^{l_k}) = 0, hence

lim_{k→∞} D_h(v*, v^k) = 0.

Again, Lemma 9.2.5, now with z^k := v*, y^k := v^{n_k}, yields v* = v̄. But if (B4)(ii) holds, then v̄ ≠ v* is also impossible, taking into account that {D_h(v̄, v^k)} is a convergent sequence.

Let us recall the ε-enlargement of a maximal monotone operator Q:

Q^ε(x) = {u ∈ ℝⁿ : 〈u − v, x − y〉 ≥ −ε ∀ y ∈ dom Q, v ∈ Q(y)}.

For properties of the ε-enlargement, see [61], [63].
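For a feeling of how large Q^ε is, take the scalar operator Q(x) = x (a worked example of ours): with t = x − y the defining inequality reads t² + (u − x)t + ε ≥ 0 for all t, i.e. (u − x)² ≤ 4ε, so Q^ε(x) = [x − 2√ε, x + 2√ε]. A brute-force check on a grid:

```python
import numpy as np

# Membership test for the eps-enlargement of Q(x) = x on R: u is in
# Q^eps(x) iff (u - y)(x - y) >= -eps for all y (here v = Q(y) = y).
def in_enlargement(u, x, eps, grid=np.linspace(-100, 100, 200001)):
    y = x - grid
    return bool(np.all((u - y) * (x - y) >= -eps - 1e-9))

x, eps = 1.0, 0.25                      # so Q^eps(x) = [0, 2]
print(in_enlargement(1.9, x, eps))      # True  (inside the enlargement)
print(in_enlargement(2.1, x, eps))      # False (outside)
```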

Now we describe the method under consideration.

9.2.7 Method. (Bregman-function-based PPR) Starting with an arbitrary x^1 ∈ int K, the method generates two sequences {x^k} ⊂ ℝⁿ and {e^k} ⊂ ℝⁿ conforming to the recursion

e^{k+1} ∈ Q^k(x^{k+1}) + χ_k ∇₁D_h(x^{k+1}, x^k), (9.2.4)

with

0 < χ_k < χ (χ > 0 arbitrary), ε_k ≥ 0, ∑_{k=1}^{∞} ε_k/χ_k < ∞, ∑_{k=1}^{∞} ‖e^k‖/χ_k < ∞. (9.2.5)


Here h is a Bregman function satisfying (B1)–(B5) and Q^k is an approximation of Q such that

Q ⊂ Q^k ⊂ ∂_{ε_k}f + P^{ε_k},

where ∂_ε f denotes the ε-subdifferential of f.

Considering this method with e^k ≡ 0 instead of the last condition in (9.2.5), assumption (B3) can be replaced by Stand-(B3), with minor and evident modifications in the convergence analysis below.

According to [59], Lemma 1, the conditions (B1), (B2) and (B5) ensure that for each y ∈ int K

dom ∂₁D_h(·, y) = int K, (9.2.6)

and

∂₁D_h(x, y) = {∇h(x) − ∇h(y)} if x ∈ int K, ∂₁D_h(x, y) = ∅ otherwise (9.2.7)

(∂₁D_h denotes the partial subdifferential of D_h with respect to the first argument). Moreover, the conditions (A1), (A2), (B1), (B2) and (B5) imply that, for any e ∈ ℝⁿ, y ∈ int K and χ > 0, the inclusion

e ∈ Q(x) + χ ∂₁D_h(x, y)

has a unique solution belonging to int K (see [59], Theorem 1).

Thus, the mentioned assumptions guarantee that for any sequences {e^k} ⊂ ℝⁿ and {χ_k}, χ_k > 0, there exists a sequence {x^k} satisfying (9.2.4), and {x^k} ⊂ int K is a straightforward corollary of (9.2.6).
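A minimal numerical sketch of one coordinate of this scheme (our own illustration: e^k ≡ 0, ε_k ≡ 0, K = ℝ₊, Q = f′ for f(x) = ½(x − a)², and the entropy-type Bregman function h(x) = x ln x − x, so ∇h(x) = ln x) shows the iterates staying in int K:

```python
import math

# Each step solves (x - a) + chi * (ln x - ln x_prev) = 0 by bisection;
# the logarithm forces every iterate to remain strictly positive.
def step(x_prev, a, chi):
    g = lambda x: (x - a) + chi * (math.log(x) - math.log(x_prev))
    lo, hi = 1e-12, max(a, x_prev) + chi + 1.0   # g(lo) < 0 < g(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

a, chi, x = 2.0, 1.0, 0.1      # start in the interior of K = R_+
for _ in range(60):
    x = step(x, a, chi)
print(x)                        # tends to the minimizer a = 2 from inside int K
```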

9.2.8 Lemma. Suppose that the sequence {(x^k, e^k)} fulfils recursion (9.2.4), that {x^k} ⊂ int K, and that Assumptions 9.2.1 (A1)–(A3), Assumptions 9.2.2 (B1)–(B3) and (9.2.5) are valid. Then

(i) {D_h(x*, x^k)} is convergent for each x* ∈ SOL(Q,K);

(ii) {x^k} is bounded;

(iii) lim_{k→∞} D_h(x^{k+1}, x^k) = 0.

Proof: According to (9.2.4) there exists q^{k+1} ∈ Q^k(x^{k+1}) such that

e^{k+1} = q^{k+1} + χ_k ∇₁D_h(x^{k+1}, x^k).

From this equality and (9.2.7) we conclude that

〈q^{k+1} + χ_k(∇h(x^{k+1}) − ∇h(x^k)), x − x^{k+1}〉 ≥ −‖e^{k+1}‖‖x − x^{k+1}‖ (9.2.8)

holds for all x ∈ K. Together with the obvious identity

D_h(x, x^{k+1}) − D_h(x, x^k) = −D_h(x^{k+1}, x^k) + 〈∇h(x^k) − ∇h(x^{k+1}), x − x^{k+1}〉 (9.2.9)


this yields

D_h(x, x^{k+1}) − D_h(x, x^k) ≤ −D_h(x^{k+1}, x^k) + (1/χ_k)〈q^{k+1}, x − x^{k+1}〉 + (‖e^{k+1}‖/χ_k)‖x − x^{k+1}‖ ∀ x ∈ K. (9.2.10)

Choose x∗ ∈ SOL(Q,K) and q∗ ∈ Q(x∗) satisfying

〈q∗, x− x∗〉 ≥ 0, ∀ x ∈ K.

From the definition of P^ε and ∂_ε f ⊂ (∂f)^ε, the inclusions

Q ⊂ Q^k ⊂ ∂_{ε_k}f + P^{ε_k} ⊂ Q^{ε_k}

are true, and one gets

〈q^{k+1} − q*, x* − x^{k+1}〉 ≤ ε_k. (9.2.11)

Now we take (9.2.10) with x := x* and insert there the following two estimates:

〈q^{k+1}, x* − x^{k+1}〉 ≤ 〈q*, x* − x^{k+1}〉 + ε_k ≤ ε_k,

‖x* − x^{k+1}‖ ≤ (1/α(x*)) [D_h(x*, x^{k+1}) + c(x*)].

The first one is true because of (9.2.11) and x^{k+1} ∈ K, and the second one is a consequence of (B3). These insertions lead to

D_h(x*, x^{k+1}) − D_h(x*, x^k) ≤ −D_h(x^{k+1}, x^k) + (δ_k/α(x*)) D_h(x*, x^{k+1}) + δ_k (1 + c(x*)/α(x*)), (9.2.12)

where δ_k := χ_k^{−1} max{‖e^{k+1}‖, ε_k}.

Conditions (9.2.5) provide that δ_k/α(x*) < 1/2 for k ≥ k₀, with k₀ sufficiently large. Therefore,

1 ≤ (1 − δ_k/α(x*))^{−1} ≤ 1 + 2δ_k/α(x*) < 2 ∀ k ≥ k₀,

and (9.2.12) results in

D_h(x*, x^{k+1}) ≤ (1 + 2δ_k/α(x*)) D_h(x*, x^k) − D_h(x^{k+1}, x^k) + 2δ_k (1 + c(x*)/α(x*)). (9.2.13)

Taking into account that D_h is a non-negative function and ∑_{k=1}^{∞} δ_k < ∞, Lemma A3.1.7, applied to (9.2.13), guarantees that {D_h(x*, x^k)} converges and

lim_{k→∞} D_h(x^{k+1}, x^k) = 0.

Now, condition (B3) implies that the sequence {x^k} is bounded.


9.2.9 Lemma. Let Assumption 9.2.1 be valid, x* ∈ SOL(Q,K), x̄ ∈ K, and suppose that

〈p̄, x̄ − x*〉 + f(x̄) − f(x*) ≤ 0 holds for some p̄ ∈ P(x̄). (9.2.14)

Then x̄ ∈ SOL(Q,K).

Proof: Because x* ∈ SOL(Q,K), there are ℓ* ∈ ∂f(x*) and p* ∈ P(x*) satisfying

〈ℓ* + p*, x − x*〉 ≥ 0 ∀ x ∈ K. (9.2.15)

In view of the monotonicity of the operator P, (9.2.15) implies

〈p̄ + ℓ*, x̄ − x*〉 ≥ 0,

and using (9.2.14) we obtain

f(x̄) − f(x*) ≤ 〈ℓ*, x̄ − x*〉.

Thus, for any x ∈ ℝⁿ,

f(x) − f(x̄) = f(x) − f(x*) − (f(x̄) − f(x*)) ≥ f(x) − f(x*) − 〈ℓ*, x̄ − x*〉 ≥ 〈ℓ*, x − x*〉 − 〈ℓ*, x̄ − x*〉 = 〈ℓ*, x − x̄〉.

This indicates that ℓ* ∈ ∂f(x̄), whereas (9.2.14) and the obvious inequality

f(x̄) − f(x*) ≥ 〈ℓ*, x̄ − x*〉

yield

〈p̄ + ℓ*, x̄ − x*〉 ≤ 0.

Now, the paramonotonicity of ∂f + P provides that x̄ ∈ SOL(Q,K).

Assuming (A4), the following property (F) of a paramonotone operator A on a convex closed set C is decisive (cf. [194]):

(F) If u* solves the variational inequality

find u ∈ C, q ∈ A(u) : 〈q, v − u〉 ≥ 0 ∀ v ∈ C, (9.2.16)

and for some ū ∈ C there exists z ∈ A(ū) with

〈z, u* − ū〉 ≥ 0,

then ū is also a solution of (9.2.16).

9.2.10 Lemma. Let Assumption 9.2.1 and Assumptions 9.2.2 (B1)–(B3), (B5) be satisfied. Then each cluster point of the sequence {x^k}, generated by Method 9.2.7, belongs to SOL(Q,K).

Proof: According to Lemma 9.2.8, the sequence {x^k} is bounded; hence there exists a convergent subsequence {x^{j_k}} with lim_{k→∞} x^{j_k} = x̄. Because K is a closed set and x^k ∈ int K, one gets x̄ ∈ K. Taking into account conclusion (iii) of Lemma 9.2.8, the application of Lemma 9.2.5 with z^k := x^{j_k+1}, y^k := x^{j_k} yields lim_{k→∞} x^{j_k+1} = x̄. Moreover, using the identity (9.2.9) with x := x* ∈ SOL(Q,K), the relation

lim_{k→∞} χ_{j_k}〈∇h(x^{j_k+1}) − ∇h(x^{j_k}), x* − x^{j_k+1}〉 = 0 (9.2.17)


follows immediately from Lemma 9.2.8 and χ_k ∈ (0, χ] for all k.

Let us at first suppose that (A4) is valid with P ≡ 0. Due to the convexity of the function f and q^{k+1} ∈ Q^k(x^{k+1}) ⊂ ∂_{ε_k}f(x^{k+1}), relation (9.2.8), considered for x := x* and k := j_k, implies that

−f(x^{j_k+1}) + f(x*) + χ_{j_k}〈∇h(x^{j_k+1}) − ∇h(x^{j_k}), x* − x^{j_k+1}〉 ≥ −‖e^{j_k+1}‖‖x* − x^{j_k+1}‖ − ε_{j_k}. (9.2.18)

Passing to the limit in (9.2.18) for k → ∞, in view of (9.2.17), (9.2.5), χ_k ∈ (0, χ] and the lower semi-continuity of f, we obtain f(x̄) ≤ f(x*). Together with x̄ ∈ K and x* ∈ SOL(Q,K), this yields

0 ∈ ∂(f + δ(·|K))(x̄),

where δ(·|K) is the indicator function of K. In view of assumption (A2), Theorem 23.8 in [348] provides x̄ ∈ SOL(Q,K).

Now let us turn to the case that the operator Q ≡ P possesses property (A4). Owing to the Brøndsted–Rockafellar property of the ε-enlargement (cf. [63]) and the relation Q ⊂ Q^k ⊂ P^{ε_k}, there exist x̃^{j_k+1} and q(x̃^{j_k+1}) ∈ P(x̃^{j_k+1}) such that

‖x̃^{j_k+1} − x^{j_k+1}‖ ≤ √ε_{j_k},  ‖q^{j_k+1} − q(x̃^{j_k+1})‖ ≤ √ε_{j_k}  ∀ k. (9.2.19)

Hence, lim_{k→∞} x̃^{j_k+1} = x̄, and taking into account (A4)(b) with y^k := x̃^{j_k+1}, both sequences {q(x̃^{j_k+1})} and {q^{j_k+1}} are bounded. Together with the second inequality in (9.2.19), this allows us to conclude, without loss of generality, that

lim_{k→∞} q^{j_k+1} = q̄  and  lim_{k→∞} q(x̃^{j_k+1}) = q̄.

In turn, the maximal monotonicity of P ensures q̄ ∈ P(x̄).

Now we assemble both cases and recall Lemma 9.2.9. Inserting x := x* ∈ SOL(Q,K) into (9.2.8) and passing to the limit for k := j_k, k → ∞, we infer from (9.2.17) and (9.2.5) that

〈q̄, x* − x̄〉 ≥ 0.

Finally, property (F) of the paramonotone operator Q provides x̄ ∈ SOL(Q,K).

Now, from {x^k} ⊂ int K and Lemmata 9.2.6–9.2.10, the main convergence result follows immediately.

9.2.11 Theorem. Let Assumption 9.2.1 and Assumption 9.2.2 be valid. Then the sequence {x^k}, generated by Method 9.2.7, belongs to int K and converges to a solution of VI(Q,K).

9.2.2 Bregman functions with non-polyhedral zones

Bregman functions with zone S := int K are of main interest in applications. Exactly in this case we deal with interior proximal methods. However, as already mentioned, up to now such Bregman functions have been constructed only for linearly constrained sets K or in the case that K is a ball.


Usually, for linearly constrained K, condition Stand-(B4) (i.e., (B4)(i)) is supposed. In case K is a ball, the condition (B4)(ii) was introduced in [70] instead of Stand-(B4).

These convergence results are not applicable, for instance, if K is the intersection of a half-space and a ball. The reason is that neither condition (B4)(i) nor (B4)(ii), considered alone, is guaranteed in this case. This will be shown in Example 9.2.16 below.

In this section Method 9.2.7 is studied for the case

K := {x ∈ ℝⁿ : g_i(x) ≤ 0, i ∈ I}, I = I₁ ∪ I₂, (9.2.20)

where I is a finite index set, g_i (i ∈ I₁) are affine functions, g_i (i ∈ I₂) are convex, continuously differentiable functions, and max_{i∈I₂} g_i is supposed to be strictly convex on K. Further, Slater's condition

∃ x̄ : g_i(x̄) < 0 ∀ i ∈ I (9.2.21)

is supposed to be valid.

The following statement clarifies a choice of Bregman-like functions with zone int K.

9.2.12 Theorem. Let ϕ be a strictly convex, continuous and increasing function with dom ϕ = (−∞, 0], and let ϕ be continuously differentiable on (−∞, 0). Moreover, let

lim_{t↑0} ϕ′(t) t = 0, (9.2.22)

lim_{t↑0} ϕ′(t) = +∞. (9.2.23)

Then the function

h(x) := ∑_{i∈I} ϕ(g_i(x)) + θ ∑_{j=1}^{n} |x_j|^γ, γ > 1 fixed, (9.2.24)

with θ := 0 if K is bounded and θ := 1 if boundedness of K is unknown, satisfies Assumption 9.2.2.

As will be clear from the proof of this theorem, instead of ∑_{j=1}^{n} |x_j|^γ in (9.2.24) one can take any strictly convex and continuously differentiable function guaranteeing the fulfilment of (B3) for h.

Proof: Step by step we check that assumptions (B1)–(B5) are satisfied, so that the function h in (9.2.24) is a Bregman function.

• Because ϕ is a convex increasing function, the convexity of the composition ϕ∘g_i (and hence of h) on the set K is guaranteed. Thus, if θ := 1, the strict convexity of h is evident. But if θ := 0 (i.e., K is bounded), we have to consider two cases.

(a) I₂ ≠ ∅. Assume that ∑_{i∈I₂} ϕ∘g_i is not strictly convex on K. Then one can choose points x¹, x² ∈ K, x¹ ≠ x², and λ ∈ (0, 1) such that

∑_{i∈I₂} ϕ(g_i(λx¹ + (1−λ)x²)) = λ ∑_{i∈I₂} ϕ(g_i(x¹)) + (1−λ) ∑_{i∈I₂} ϕ(g_i(x²)).


Because each function ϕ∘g_i is convex, the last equality means that, for each i ∈ I₂,

ϕ(g_i(λx¹ + (1−λ)x²)) = λϕ(g_i(x¹)) + (1−λ)ϕ(g_i(x²)) (9.2.25)

is valid. Using that g_i is convex, ϕ is increasing and x¹, x² ∈ K, we obtain

ϕ(λg_i(x¹) + (1−λ)g_i(x²)) ≥ λϕ(g_i(x¹)) + (1−λ)ϕ(g_i(x²)), i ∈ I₂. (9.2.26)

The strict convexity of ϕ implies that only equality is possible in (9.2.26), and g_i(x¹) = g_i(x²) holds for each i ∈ I₂. Finally, in view of (9.2.25) and the fact that ϕ is increasing, one gets for i ∈ I₂

g_i(λx¹ + (1−λ)x²) = λg_i(x¹) + (1−λ)g_i(x²),

which contradicts the strict convexity of max_{i∈I₂} g_i.

(b) I₂ = ∅, i.e., all functions g_i are affine:

g_i(x) := 〈a_i, x〉 − b_i, i ∈ I₁ = I.

Then, due to Slater's condition and the boundedness of K, the rank of the matrix A = {a_i}_{i∈I} equals n. Using this fact, the strict convexity of the function h can be easily established following [289]. Thus, taking into account also the differentiability properties of g_i and ϕ, assumptions (B1) and (B2) for the function h are always valid.

• Of course, assumption (B3) is guaranteed if K is a bounded set. In case of an unbounded K, (B3) is evident if γ := 2 in (9.2.24); for arbitrary γ > 1 condition (B3) follows from Proposition 2 in [229] and

D_h(x, y) ≥ h₀(x) − h₀(y) − 〈∇h₀(y), x − y〉

(where h₀(x) := ∑_{j=1}^{n} |x_j|^γ).

• Now we check the validity of assumption (B4). Denote

I_<(y) := {i : g_i(y) < 0}, I_=(y) := {i : g_i(y) = 0},

and let {z^k} ⊂ int K, lim_{k→∞} z^k = z.

(a) At first, suppose that I_<(z) ⊃ I₂. For i ∈ I_<(z), one gets

lim_{k→∞} g_i(z^k) = g_i(z) < 0 and lim_{k→∞} 〈∇g_i(z^k), z − z^k〉 = 0,

whereas for i ∈ I_=(z) we have lim_{k→∞} g_i(z^k) = g_i(z) = 0, b_i = 〈a_i, z〉. Thus, for i ∈ I_=(z),

lim_{k→∞} ϕ′(〈a_i, z^k〉 − b_i)〈a_i, z − z^k〉 = lim_{k→∞} ϕ′(〈a_i, z^k − z〉)〈a_i, z − z^k〉 = 0

follows from 〈a_i, z^k − z〉 → 0 and (9.2.22). Using these relations and the identity

D_h(x, y) = h(x) − h(y) − ∑_{i∈I_=(y)} ϕ′(〈a_i, y〉 − b_i)〈a_i, x − y〉 − ∑_{i∈I_<(y)} ϕ′(g_i(y))〈∇g_i(y), x − y〉 − θ〈∇h₀(y), x − y〉,


one can easily conclude that lim_{k→∞} D_h(z, z^k) = 0.

(b) Now let g_{i₀}(z) = 0 be valid for some i₀ ∈ I₂. Denote

I₂(y) := {i ∈ I₂ : g_i(y) = max_{j∈I₂} g_j(y)}

and take z̄ ∈ K, z̄ ≠ z. In view of the convexity of the functions ϕ∘g_i and h₀ it holds that

D_h(z̄, z^k) ≥ ϕ(g_{i₀}(z̄)) − ϕ(g_{i₀}(z^k)) − ϕ′(g_{i₀}(z^k))〈∇g_{i₀}(z^k), z̄ − z^k〉. (9.2.27)

Obviously, relation (9.2.23) implies

lim_{k→∞} ϕ′(g_{i₀}(z^k)) = +∞, (9.2.28)

whereas

lim_{k→∞} 〈∇g_{i₀}(z^k), z̄ − z^k〉 = 〈∇g_{i₀}(z), z̄ − z〉. (9.2.29)

But, using the structure of the subdifferential of a max-function and the strict convexity of max_{i∈I₂} g_i, we obtain

max_{i∈I₂} g_i(z̄) − max_{i∈I₂} g_i(z) > 〈∇g_j(z), z̄ − z〉 ∀ j ∈ I₂(z).

Because z̄, z ∈ K and g_{i₀}(z) = 0, the last inequality yields

〈∇g_j(z), z̄ − z〉 < 0, j ∈ I₂(z). (9.2.30)

The relations (9.2.27)–(9.2.30) and the continuity of ϕ∘g_{i₀} ensure that

lim_{k→∞} D_h(z̄, z^k) = +∞. (9.2.31)

In fact, we have proved a stronger property than (B4)(ii): relation (9.2.31) holds for each z̄ ∈ K, z̄ ≠ z, whereas (B4)(ii) supposes lim_{k→∞} D_h(z̄, z^k) = +∞ only if z ∈ bd K, z̄ ≠ z.

• To prove the fulfillment of assumption (B5), we use Theorem 4.5 in [35], which states, in particular, that for a function f with properties (B1) and (B2), boundary coercivity implies zone coercivity if K is bounded or the super-coercivity condition

lim_{x∈K, ‖x‖→∞} f(x)/‖x‖ = +∞

is valid.

Let z ∈ bd K, z^k ∈ int K, lim_{k→∞} z^k = z and x ∈ int K. Then

lim_{k→∞} 〈∇g_i(z^k), x − z^k〉 = 〈∇g_i(z), x − z〉,

and for i ∈ I_=(z) the relations

〈∇g_i(z), x − z〉 < 0,  lim_{k→∞} ϕ′(g_i(z^k)) = +∞


are obvious. Hence, if i ∈ I_=(z), then

lim_{k→∞} ϕ′(g_i(z^k))〈∇g_i(z^k), x − z^k〉 = −∞, (9.2.32)

whereas for i ∈ I_<(z) it holds that

lim_{k→∞} ϕ′(g_i(z^k))〈∇g_i(z^k), x − z^k〉 = ϕ′(g_i(z))〈∇g_i(z), x − z〉. (9.2.33)

From (9.2.32), (9.2.33) and the continuous differentiability of the function h₀, we immediately conclude that the function h is boundary coercive; hence, according to Theorem 4.5 in [35], h satisfies assumption (B5).

Particular functions satisfying the conditions of Theorem 9.2.12 are

ϕ(t) := −(−t)^p, p ∈ (0, 1) arbitrarily chosen, (9.2.34)

ϕ(t) := −t ln(−t) + t if −1/2 ≤ t ≤ 0, ϕ(t) := −ln 2 · ln(−t + 1/2) − (1/2) ln 2 − 1/2 if t < −1/2, (9.2.35)

ϕ(t) := −t ln(−t) + t ln(−t + 1) + t, (9.2.36)

where by convention ϕ(0) := 0.

The principal idea for constructing function (9.2.36) consists in the following: constraints which turn out to be active at the limit point are handled mainly by the first term of ϕ, whereas those which become inactive at the limit point are accounted for by both terms of ϕ.

Combining Theorems 9.2.11 and 9.2.12 we immediately obtain the following result.

9.2.13 Theorem. Let the set K be described by (9.2.20), (9.2.21) and the function h be defined as in Theorem 9.2.12. Moreover, suppose that VI(Q,K) satisfies Assumption 9.2.1. Then the sequence {x^k}, generated by Method 9.2.7, belongs to int K and converges to some x̄ ∈ SOL(Q,K).

9.2.14 Corollary. Suppose that the hypotheses of Theorem 9.2.13 are fulfilled. If VI(Q,K) has more than one solution, then the sequence {x^k}, generated by Method 9.2.7 with Bregman function (9.2.24), converges to a solution x̄ such that g_i(x̄) < 0, i ∈ I₂.

Indeed, the opposite assumption, that g_i(x̄) = 0 holds for some i ∈ I₂, leads immediately to a contradiction between statement (i) of Lemma 9.2.8 and relation (9.2.31), given with z^k := x^k and z̄ ∈ SOL(Q,K), z̄ ≠ x̄.

9.2.15 Remark. If the functions g_i in (9.2.20) satisfy

g_i(x) ≥ −1 ∀ x ∈ K, ∀ i ∈ I₂,

one can quite similarly prove that the function h_l : K → ℝ, defined by

h_l(x) := ∑_{i∈I} [(−g_i(x)) ln(−g_i(x)) + g_i(x)] + θ ∑_{j=1}^{n} |x_j|^γ (9.2.37)

(with θ, γ as in (9.2.24)), possesses the properties (B1)–(B5), too. ♦


The following example indicates that assumption (B4) may be violated for function (9.2.24) if the function max_{i∈I₂} g_i is not strictly convex.

9.2.16 Example. Let K := {x ∈ R⁴ : x1² + x2² + x3 + x4 ≤ 1}. With g(x) := x1² + x2² + x3 + x4 − 1, take (9.2.24), (9.2.35), i.e.,

h(x) := −(−g(x))^p + ∑_{j=1}^4 xj²,  p ∈ (0, 1),

and {z^k} such that

if k := 2l − 1:  z1^k = √(1 − σk^{1/(1−p)} − (1 − σk^{1/(1−p)} − σk)²),  z2^k = 1 − σk^{1/(1−p)} − σk,  z3^k = z4^k = 0;

if k := 2l:  z1^k = 0,  z2^k = 1,  z3^k = −σk,  z4^k = 0,

with σk > 0, lim_{k→∞} σk = 0. Then z̄ = lim_{k→∞} z^k = (0, 1, 0, 0)^T and for large l

(−g(z^k))^{p−1} ⟨∇g(z^k), z̄ − z^k⟩ = −2,  if k = 2l − 1.

But, choosing z := (0, 1, −1, 1)^T, one gets

(−g(z^k))^{p−1} ⟨∇g(z^k), z − z^k⟩ = σk^p,  if k = 2l.

Thus, neither (B4)(i) nor (B4)(ii) is valid. ♦
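The two displayed values can be checked directly. The following sketch (plain Python, with the illustrative choice p = 1/2; not part of the original argument) evaluates the quantity (−g(z^k))^{p−1}⟨∇g(z^k), · − z^k⟩ along both subsequences:

```python
import math

p = 0.5                     # any p in (0, 1); illustrative value
sigmas = (1e-2, 1e-3, 1e-4)

def g(x):
    return x[0]**2 + x[1]**2 + x[2] + x[3] - 1.0

def grad_g(x):
    return [2.0*x[0], 2.0*x[1], 1.0, 1.0]

def b4_term(zk, z):
    # (-g(z^k))^(p-1) * <grad g(z^k), z - z^k>
    inner = sum(gi*(zi - zki) for gi, zi, zki in zip(grad_g(zk), z, zk))
    return (-g(zk))**(p - 1.0) * inner

def z_odd(s):               # k = 2l - 1
    t = s**(1.0/(1.0 - p))
    z2 = 1.0 - t - s
    return [math.sqrt(1.0 - t - z2**2), z2, 0.0, 0.0]

def z_even(s):              # k = 2l
    return [0.0, 1.0, -s, 0.0]

zbar = [0.0, 1.0, 0.0, 0.0]          # limit point of {z^k}
z = [0.0, 1.0, -1.0, 1.0]            # point chosen for the even subsequence

vals_odd = [b4_term(z_odd(s), zbar) for s in sigmas]   # each is -2 (up to rounding)
vals_even = [b4_term(z_even(s), z) for s in sigmas]    # each is s**p
```

Along the odd subsequence the value stays at −2 for every σk, while along the even one it equals σk^p and tends to 0, in agreement with the two displayed formulas.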

9.2.2.1 On the weakening of condition (9.2.20)

Now, instead of (9.2.20), we consider a condition saying that the part of the boundary of the feasible set described by the nonlinear constraints does not contain a line segment. More precisely, we assume that

gi (i ∈ I2) are convex, continuously differentiable functions, and

Kbd := {y ∈ K : max_{i∈I2} gi(y) = 0}   (9.2.38)

does not contain any line segment.

As we have seen in the proof of Theorem 9.2.13, Assumptions 9.2.2 (B1), (B2), (B3) and (B5) are fulfilled. That proof also establishes that the standard convergence sensing condition (B4)(i) is certainly valid if gi(z̄) < 0 ∀ i ∈ I2, i.e., if z̄ ∈ K \ Kbd. In order to check the convergence sensing condition (B4)(ii) in the case lim_{k→∞} z^k = z̄ ∈ Kbd, we need the following statement.

9.2.17 Lemma. The following statements are equivalent:

(i) assumption (9.2.38) is valid;

(ii) z ∈ Kbd implies

min_{i∈I2(z)} ⟨∇gi(z), x − z⟩ < 0  ∀ x ∈ bd K, x ≠ z,

where I2(z) := {i ∈ I2 : gi(z) = 0};


(iii) z ∈ Kbd implies

⟨∇gi(z), x − z⟩ < 0  ∀ x ∈ bd K, x ≠ z, ∀ i ∈ I2(z).

Proof: (iii) ⇒ (ii) is obvious.

(ii) ⇒ (i): Assume that Kbd contains a line segment [a, b]. Then, for z := ½(a + b) ∈ Kbd, due to b ∈ bd K, b ≠ z and (ii), we obtain

∃ i0 ∈ I2(z) : ⟨∇g_{i0}(z), b − z⟩ < 0.

This implies

0 < −⟨∇g_{i0}(z), b − z⟩ = ⟨∇g_{i0}(z), (a − b)/2⟩ = ⟨∇g_{i0}(z), a − z⟩,

hence

g_{i0}(a) − g_{i0}(z) > 0,

and g_{i0}(z) = 0 leads to g_{i0}(a) > 0, a contradiction to a ∈ Kbd.

(i) ⇒ (iii): Let z ∈ Kbd. If x ∈ Kbd, x ≠ z, then v := ½(z + x) ∈ K \ Kbd, and for arbitrary i ∈ I2(z) one gets

gi(v) − gi(z) ≥ ½⟨∇gi(z), x − z⟩.

Taking into account that gi(v) < 0 and gi(z) = 0, this implies

⟨∇gi(z), x − z⟩ < 0.

But, if x ∈ bd K \ Kbd, then gi(x) < 0 holds for all i ∈ I2, and the inequality

gi(x) − gi(z) ≥ ⟨∇gi(z), x − z⟩

also guarantees that

⟨∇gi(z), x − z⟩ < 0  ∀ i ∈ I2(z).

Now, let the sequence {z^k} ⊂ int K converge to z̄ ∈ Kbd and assume that condition (9.2.38) is valid. We show that

lim_{k→∞} Dh(z, z^k) = +∞   (9.2.39)

holds if z ≠ z̄ and z ∈ bd K. This immediately implies the fulfillment of the modified convergence sensing conditions (B4).

In view of z̄ ∈ Kbd, the equality g_{i0}(z̄) = 0 is valid for some i0 ∈ I2. From the convexity of the functions ϕ∘gi and x → ∑|xi|^γ it follows that

Dh(z, z^k) ≥ ϕ(g_{i0}(z)) − ϕ(g_{i0}(z^k)) − ϕ′(g_{i0}(z^k)) ⟨∇g_{i0}(z^k), z − z^k⟩.   (9.2.40)

Obviously, the relation lim_{t↑0} ϕ′(t) = +∞ provides

lim_{k→∞} ϕ′(g_{i0}(z^k)) = +∞,   (9.2.41)


whereas

lim_{k→∞} ϕ(g_{i0}(z^k)) = 0,  lim_{k→∞} ⟨∇g_{i0}(z^k), z − z^k⟩ = ⟨∇g_{i0}(z̄), z − z̄⟩.   (9.2.42)

But, according to Lemma 9.2.17,

⟨∇g_{i0}(z̄), z − z̄⟩ < 0,   (9.2.43)

and (9.2.40)-(9.2.43) immediately yield the fulfillment of (9.2.39).

Therefore, all the assumptions made on Bregman functions are valid. Hence, the convergence results proved there remain true under the weaker assumption (9.2.38) on the functions gi.

The following simple example shows that the conditions (9.2.38) on the functions gi (i ∈ I2) are indeed essentially weaker than the conditions assumed in (9.2.20).

9.2.18 Example. The set

K = {x ∈ R² : g(x) ≤ 0},  g(x) = −x1 + x2²,

satisfies Assumption (9.2.38), whereas the related condition (9.2.20) is evidently violated. Moreover, a comparison with Example 9.2.16 indicates that there is hardly any room for a further weakening of the conditions (9.2.38).

Assumption (9.2.38) does not entail any geometrical peculiarities of the function max_{i∈I2} gi (cf. (9.2.3)) in int K. In particular, considering

K = {x ∈ R² : gi(x) ≤ 0, i = 1, 2}

with

g1(x) = x1² + x2² − 1,  g2(x) = e^{x1−1} − 1,

we meet the situation that Assumption (9.2.38) is valid, but for arbitrary x⁰ ∈ int K and

ℓ :  max{g1(x⁰), g2(x⁰)} < ℓ < 0,

the boundary of the level set

{x ∈ R² : max{g1(x), g2(x)} = ℓ}

contains a line segment. ♦
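Both observations can be verified numerically. The sketch below (Python, with illustrative sample points and the illustrative value ℓ = −1/2) checks condition (iii) of Lemma 9.2.17 for the parabola of Example 9.2.18 and exhibits a vertical segment on the boundary of the level set just described:

```python
import math

def inner_parabola(z2, x2):
    # <grad g(z), x - z> for g(x) = -x1 + x2^2 and boundary points
    # z = (z2^2, z2), x = (x2^2, x2); algebraically this equals -(x2 - z2)^2
    return -(x2**2 - z2**2) + 2.0*z2*(x2 - z2)

parab_vals = [inner_parabola(z2, x2)
              for z2, x2 in [(-1.0, 0.5), (0.0, 2.0), (1.3, -0.7)]]
# each value is negative, as (iii) requires

# level set of max(g1, g2) for g1(x) = x1^2 + x2^2 - 1, g2(x) = exp(x1-1) - 1:
ell = -0.5                              # an illustrative level in (-1, 0)
x1 = 1.0 + math.log(1.0 + ell)          # on this vertical line g2 = ell
seg_vals = [max(x1**2 + t**2 - 1.0, math.exp(x1 - 1.0) - 1.0)
            for t in (-0.5, -0.25, 0.0, 0.25, 0.5)]
# all entries equal ell: the boundary of the level set contains a segment
```

On the vertical line x1 = 1 + ln(1 + ℓ) the exponential constraint is active and the quadratic one stays strictly below ℓ for the sampled x2, so the whole sampled segment lies on the level-set boundary.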

9.2.3 Embedding of original Bregman-function-based methods

Let us analyze the original proof technique of Bregman-function-based proximal methods designed for variational inequalities on polyhedral sets. Denote by D a distance function and by v^ℓ an iterate of such a method. To our knowledge, the original convergence results for these methods establish convergence of the sequence {D(x̄, v^ℓ)} for each solution x̄ without the use of assumption Stand-(B4). Hence, for interior proximal methods involving standard requirements on Bregman functions (see, in particular, [61, 97, 376]), the original convergence analysis can be preserved (with minor alterations in the final stage only) if we replace


Stand-(B4) by (B4) with z ∈ K instead of z ∈ bd K in (B4)(ii). Let us recall that, for K given by (9.2.20), (9.2.21), the conditions of Theorem 9.2.12 ensure the fulfillment of the assumption (B4) so modified.

Indeed, let a subsequence {v^{ℓ_k}} of {v^ℓ} with lim_{k→∞} v^{ℓ_k} = v̄ be chosen such that there exists x̄ ∈ SOL(Q,K), x̄ ≠ v̄ (if this is impossible, then clearly lim_{ℓ→∞} v^ℓ = v̄ and SOL(Q,K) = {v̄}). The convergence of {D(x̄, v^ℓ)} implies that condition (B4)(ii) with z := x̄, z̄ := v̄, z^k := v^{ℓ_k} is violated. Hence, (B4)(i) is valid, i.e. lim_{k→∞} D(v̄, v^{ℓ_k}) = 0, and (B4) turns into Stand-(B4). Referring now to the original convergence results, we immediately conclude that v̄ ∈ SOL(Q,K). Then, convergence of {D(v̄, v^ℓ)} implies lim_{ℓ→∞} D(v̄, v^ℓ) = 0, and Theorem 2.4 in [376] yields lim_{ℓ→∞} v^ℓ = v̄.

In this way, inserting the function (9.2.24) (for instance, with ϕ defined by (9.2.35) or (9.2.36)) into methods whose Bregman functions are described exactly by the standard conditions, one can extend these methods to solve VI(Q,K) on non-polyhedral sets K given by (9.2.20), (9.2.21).

Let us return to the convergence analysis in Section 9.2.2. For Method 9.2.7 without an approximation of the operator Q, i.e. with Q^k ≡ Q, the same arguments as given just above permit us to weaken Assumption 9.2.1 (A4), replacing (A4)(b) by the condition that Q is a pseudo-monotone operator in the sense of Brezis-Lions [270]; boundedness of the operator is included in this notion. With this alteration, the proof of Lemma 9.2.10 can be carried out quite similarly to the proof of case (c) in Lemma 9.3.10 in Subsection 9.3.2.

Moreover, for the exact version of Method 9.2.7, with Q^k ≡ Q and e^k ≡ 0, assumption (A4)(b) can be omitted entirely, according to the analysis of Solodov and Svaiter [376].

9.2.3.1 On entropy-like and logarithmic-quadratic PPR

We have tried to extend entropy-like and logarithmic-quadratic proximal methods (see [22, 25, 26, 386]) in a similar manner. Applications of these methods to the dual of linearly constrained programs or to variational inequalities on polyhedral convex sets provide attractive properties of the subproblems (for instance, C^∞ multiplier methods with bounded Hessians are constructed in this way in [25]). However, the basic requirements on the kernel functions in these methods do not seem to be appropriate for such an extension. Indeed, in the case K := R^m_+ the corresponding distance functions have the form

d(u, v) = ∑_{i=1}^m v_i^α [ ϕ(u_i/v_i) + β (u_i/v_i − 1)² ]   (9.2.44)

(α := 1, β := 0 in entropy-like methods and α := 2, β > 0 in logarithmic-quadratic methods). If

K := {x ∈ R^n : gi(x) := ⟨ai, x⟩ − bi ≤ 0, i = 1, ..., m},

the distance D is given by

D(x, y) := d(−g(x), −g(y)),  g = (g1, ..., gm)^T.   (9.2.45)


Concerning the kernel function ϕ, it is supposed, in particular, that dom ϕ ⊆ [0,+∞), ϕ is twice continuously differentiable on (0,+∞) and strictly convex on its domain, and

ϕ(1) = ϕ′(1) = 0,  ϕ″(1) > 0.   (9.2.46)

Already these conditions enforce that the function D(·, y) defined by (9.2.45) is, in general, non-convex for some y ∈ int K if at least one function gi is not affine, as the following example shows.

9.2.19 Example. Let a kernel function ϕ with the mentioned properties be fixed, and K := {x ∈ R¹ : g(x) ≤ 0}, where the choice of the convex and sufficiently smooth function g : R¹ → R¹ will be specified below. According to (9.2.44), (9.2.45),

D(x, y) = (−g(y))^α [ ϕ(g(x)/g(y)) + (β/2)(g(x)/g(y) − 1)² ].

Assume first that ϕ(1/2) ≤ ϕ(3/2) and take g with values

g(x1) := −1/2,  g(1) := −3/2,  g(x2) := −1,

where x1, x2 are given points such that (x1 − 1)(x2 − 1) < 0. Then, for y := x2, one gets

D(x1, y) = ϕ(1/2) + β/8,  D(1, y) = ϕ(3/2) + β/8,  D(x2, y) = 0.

Because of ϕ(1/2) ≤ ϕ(3/2) and ϕ(t) ≥ 0 ∀ t ∈ dom ϕ (the latter inequality follows from the convexity of ϕ and (9.2.46)), we conclude that the functions D(·, y) and ϕ∘(−g) are not convex.

But, if ϕ(1/2) > ϕ(3/2), the same conclusion holds true with y := x1 and

g : g(x1) = −1,  g(1) = −1/2,  g(x2) = −3/2. ♦
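The non-convexity can also be seen numerically. The following sketch (an illustration, not part of the original example) uses the entropy-type kernel ϕ(t) = t ln t − t + 1, which satisfies (9.2.46), with α := 1, β := 0 and the illustrative convex non-affine function g(x) = (x − 1)² − 3/2; the resulting D(·, y) violates the midpoint inequality:

```python
import math

def phi(t):
    # entropy-type kernel: phi(1) = phi'(1) = 0, phi''(1) = 1 > 0
    return t*math.log(t) - t + 1.0

def g(x):
    # convex, non-affine; g = -1 at x = 1 -/+ 1/sqrt(2), g(1) = -3/2
    return (x - 1.0)**2 - 1.5

alpha, beta = 1.0, 0.0                # entropy-like case of (9.2.44)
y = 1.0 + 1.0/math.sqrt(2.0)         # g(y) = -1

def D(x):
    r = g(x)/g(y)
    return (-g(y))**alpha * (phi(r) + 0.5*beta*(r - 1.0)**2)

a = 1.0 - 1.0/math.sqrt(2.0)
b = 1.0 + 1.0/math.sqrt(2.0)
mid = 0.5*(a + b)                    # = 1
# D(a) = D(b) = 0, but D(mid) = phi(3/2) > 0: D(., y) is not convex
```

At the two points where g equals g(y) the distance vanishes, while at their midpoint it equals ϕ(3/2) > 0, so no convex function can interpolate these values.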

9.3 PPR with Generalized Distance Functionals

The purpose of this subsection is a uniform approach for analyzing the convergence of proximal-like methods for solving variational inequalities in Hilbert spaces with non-quadratic distance functionals. In comparison with preceding publications dealing with such non-quadratic regularization terms, here the standard requirement of the strict monotonicity of the operator ∇1D(·, y) (usually formulated as the strict convexity of Bregman's or another function generating the distance D) is weakened. This leads to an analogue of the methods with weak regularization and regularization on a subspace developed on the basis of the classical PPR in Section 8.2.3. Besides, a successive approximation of the set K is included.


Let (V, ‖·‖) be a Hilbert space with the topological dual V′ and the duality pairing ⟨·, ·⟩ between V and V′. We consider the variational inequality

VI(Q,K): Find a pair x* ∈ K and q* ∈ Q(x*) such that

⟨q*, x − x*⟩ ≥ 0  ∀ x ∈ K,

where K ⊂ V is a convex closed set and Q : V → 2^{V′} is a maximal monotone operator. We generally suppose that VI(Q,K) is solvable and denote by SOL(Q,K) its solution set.

The exact proximal point method, applied to the variational inequality VI(Q,K), can be described as follows: Given x⁰ ∈ K and a sequence {χk}, 0 < χk ≤ χ̄ < ∞, the iterate x^{k+1} ∈ K is defined such that

∃ q(x^{k+1}) ∈ Q(x^{k+1}) :  ⟨q(x^{k+1}) + χk ∇1 D(x^{k+1}, x^k), y − x^{k+1}⟩ ≥ 0  ∀ y ∈ K,

where D(x, y) = ½‖x − y‖² and ∇1 is the partial gradient w.r.t. x.
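In finite dimensions with a single-valued monotone Q, each exact step is itself a strongly monotone variational inequality. A minimal Python sketch of this scheme, with the illustrative data Q(x) = Ax + b on K = R²_+ and each subproblem solved by a projected fixed-point iteration (all constants are assumptions made for the illustration):

```python
def Q(x):
    # illustrative monotone affine operator Q(x) = A x + b,
    # A = [[2, 1], [-1, 2]] (its symmetric part is 2*I), b = (-2, 1)
    return [2.0*x[0] + 1.0*x[1] - 2.0,
            -1.0*x[0] + 2.0*x[1] + 1.0]

def proj(x):
    # projection onto K = R^2_+
    return [max(xi, 0.0) for xi in x]

def prox_step(xk, chi, tau=0.2, iters=200):
    # solves the regularized subproblem VI(Q + chi*(. - x^k), K)
    # via the projected fixed-point iteration x <- proj(x - tau*F(x))
    x = list(xk)
    for _ in range(iters):
        q = Q(x)
        x = proj([x[i] - tau*(q[i] + chi*(x[i] - xk[i])) for i in range(2)])
    return x

x = [5.0, 5.0]
for _ in range(40):
    x = prox_step(x, chi=1.0)
# the iterates approach the unique solution x* = (1, 0) of VI(Q, K)
```

Since Q here is strongly monotone, the regularized subproblems are well posed and the outer iteration contracts linearly towards the unique solution.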

For different modifications of the PPR, also with other quadratic functionals D, we refer the reader to [224] and [241], where numerous references can be found.

The scheme studied here can be described as follows. Taking a linear monotone operator B : V → V′ such that Q − B is still monotone, we choose a convex continuous functional h : S → R so that

x → ½⟨Bx, x⟩ + h(x)

possesses properties like those usually required for Bregman functions (with zone S). At the k-th step, with x^k ∈ K^{k−1} ∩ S obtained at the previous iteration, the iterate x^{k+1} ∈ K^k ∩ S is defined such that

∃ q(x^{k+1}) ∈ Q(x^{k+1}) :

⟨q(x^{k+1}) + χk(∇h(x^{k+1}) − ∇h(x^k)), x − x^{k+1}⟩ ≥ −δk √Γ1(x, x^{k+1})  ∀ x ∈ K^k ∩ S.   (9.3.1)

Here {K^k} is a sequence of convex sets approaching K, χk is chosen as above, {δk}, δk ↓ 0, is a given non-negative sequence, and

Γ1(x, y) := min{α‖x − y‖², Γ(x, y) + 1},  α > 0 a constant,   (9.3.2)

with

Γ(x, y) := ½⟨B(x − y), x − y⟩ + h(x) − h(y) − ⟨∇h(y), x − y⟩   (9.3.3)

considered on dom Γ = S × D(∇h) and used below as a Lyapunov function.


This scheme and the conditions on h required in Subsection 9.3.1 below do not exclude the use of quadratic functionals h; in particular, the choice h(x) = ½‖x‖² leads to a perturbed version of the classical proximal point method (for this version with more general assumptions w.r.t. data approximation see [225]). Therefore, in the sequel the notion "non-quadratic" indicates the predominant aspect of this investigation.

9.3.1 Generalized proximal point method

In the sequel we make use of the following notation: S ⊂ V is an open convex set whose closure is denoted by cl S; {K^k} ⊂ V, K^k ⊃ K, is a family of convex closed sets approximating K;

N_K(y) := {z ∈ V′ : ⟨z, y − x⟩ ≥ 0 ∀ x ∈ K} if y ∈ K, and N_K(y) := ∅ otherwise,

is the normality operator for K. With B and h : cl S → R as introduced, we define the functional

η(x) := ½⟨Bx, x⟩ + h(x) if x ∈ cl S, and η(x) := +∞ otherwise.   (9.3.4)

Now the following basic assumptions on the successive approximation of VI(Q,K) and on the choice of the controlling parameters are considered.

9.3.1 Assumption.

(A1) For each k, the operator Q+NKk is maximal monotone;

(A2) S ∩D(Q) ∩Kk 6= ∅ ∀k;

(A3) For each k it holds that

⟨q(x) − q(y), x − y⟩ ≥ ⟨B(x − y), x − y⟩  ∀ x, y ∈ D(Q) ∩ K^k, ∀ q(·) ∈ Q(·),

where B : V → V′ is a given linear continuous and monotone operator with the symmetry property ⟨Bx, y⟩ = ⟨By, x⟩;

(A4) Any weak limit point of an arbitrary sequence {v^k}, v^k ∈ S ∩ D(Q) ∩ K^k, belongs to K ∩ D(Q);

(A5) The non-negative sequences {ϕk} (accuracy of approximation), {χk} (regularization parameters) and {δk} (exactness of solving the auxiliary problems) satisfy

0 < χk ≤ 1,  ∑_{k=1}^∞ ϕk/χk < ∞,  ∑_{k=1}^∞ δk/χk < ∞;

(A6) For some x* ∈ SOL(Q,K) ∩ cl S and q*(x*) ∈ Q(x*) satisfying

⟨q*(x*), y − x*⟩ ≥ 0  ∀ y ∈ K,

and for an arbitrary sequence {v^k}, v^k ∈ S ∩ D(Q) ∩ K^k, there exists a sequence {w^k(v^k)} ⊂ K ∩ S such that

⟨q*(x*), w^k(v^k) − v^k⟩ ≤ c (Γ(x*, v^k) + 1) ϕk  (c ≥ 0 a constant).   (9.3.5)


Condition (A6) seems to be rather artificial, especially due to the unknown element q*(x*). However, for a series of variational inequalities in mechanics and physics, helpful a priori information about q*(x*) is available. For instance, for the problem of a steady movement of a fluid in a domain Ω bounded by a semi-permeable membrane (see [135], Sect. 1), q*(x*) = 0 must hold.

In the general situation, one can replace (9.3.5) by

‖wk(vk)− vk‖ ≤ c1ϕk.

Because c in (9.3.5) is an arbitrary (non-negative) constant, this causes no alterations in the analysis below.

9.3.2 Remark. In [224] and [225], using the functional h such that

∃ m > 0 : Γ(x, y) ≥ m‖x− y‖2 ∀x, y,

more general approximations have been considered (K^k ⊃ K is not supposed), mainly inspired by finite element methods in mathematical physics. Here we renounce this generality in order to avoid too many technicalities. ♦

Now we are going to define the generalized distance functional Γ.

9.3.3 Assumption.

(B1) h : clS → R is a convex and continuous functional;

(B2) h is Gateaux-differentiable on S;

(B3) The functional η in (9.3.4) is strictly convex on clS;

(B4) SOL(Q,K) ∩ cl S ≠ ∅;

(B5) L1(x, δ) = {y ∈ S : Γ(x, y) ≤ δ} is bounded for each x ∈ cl S and each δ;

(B6) If {v^k} ⊂ S, {y^k} ⊂ S converge weakly to v and lim_{k→∞} Γ(v^k, y^k) = 0, then

lim_{k→∞} [Γ(v, v^k) − Γ(v, y^k)] = 0;

(B7) If {v^k} ⊂ S is bounded, {y^k} ⊂ S, y^k ⇀ y and lim_{k→∞} Γ(v^k, y^k) = 0, then

lim_{k→∞} ‖v^k − y^k‖ = 0;

(B8) If {v^k} ⊂ S, {y^k} ⊂ S, v^k ⇀ v, y^k ⇀ y and v ≠ y, then

lim inf_{k→∞} |⟨∇h(v^k) + Bv^k − ∇h(y^k) − By^k, v − y⟩| > 0;

(B9) ∀ z ∈ V′ ∃ x ∈ S : ∇h(x) + Bx = z. ♦


In [219], Sect. 5, for two problems in elasticity theory the chosen regularizing functionals satisfy Assumption 9.3.3 and Assumption 9.3.1 (A3), but they are not strictly convex (see also Method 8.2.5 (regularization on the kernel)).

As can be concluded from [65], Sect. 2.1.2, Assumption 9.3.3 (B7) is equivalent to the following sequential consistency property of the functional η on S: for any non-empty bounded subset E ⊂ S and any t > 0,

inf_{x∈E} inf{ η(y) − η(x) − ⟨∇η(x), y − x⟩ : y ∈ cl S, ‖y − x‖ = t } > 0.

For the case B = 0, the totality of Assumptions 9.3.3 (B1)-(B9) is similar to the system of hypotheses for Bregman functions considered by Burachik and Iusem [59]; only (B7) here is stronger than the corresponding assumption in the paper mentioned, where v^k ⇀ y stands in place of lim_{k→∞} ‖v^k − y^k‖ = 0. In the cases (b) and (c) (see Lemma 9.3.10 and Theorem 9.3.11 below), this assumption from [59] suffices if B is a compact operator. At the same time, the use of (B7) permits us, in particular, to extend the class of operators Q by including the case (a) (again, see Lemma 9.3.10 and Theorem 9.3.11 below).

If B := 0, V := R^n, Assumptions 9.3.3 (B1)-(B9) can be derived from the standard hypotheses for Bregman functions (see the analysis in [59], Sect. 7).

Assumptions 9.3.3 (B2) and (B3) ensure that Γ(x, y) > 0 and Γ1(x, y) > 0 hold for x ≠ y, and obviously Γ(x, x) = 0, Γ1(x, x) = 0.

The consideration of an approximation of K by K^k addresses, in particular, the situation when K is given in the form K = K1 ∩ K2 and we choose h by taking into account the set K1 only. In this case K^k = K1 ∩ K2^k is natural.

Let us give a simple example illustrating the choice of the functional h. Let V := R^n,

K := {x ∈ R^n : xj ≥ 0, j = 1, ..., n1;  ∑_{j=n1+1}^{n2} j|xj| ≤ 1}

with 0 < n1 < n2 < n, and

Q : x → (A(x1, ..., x_{n1}), x_{n1+1} − 1, ..., xn − 1),

where A : R^{n1} → R^{n1} is an arbitrary continuous and monotone operator such that the corresponding VI(Q,K) is solvable. Then, considering the approximation

K^k := {x ∈ R^n : xj ≥ 0, j = 1, ..., n1;  ∑_{j=n1+1}^{n2} j √(xj² + τk) ≤ 1 + √τk ∑_{j=n1+1}^{n2} j},

where τk ↓ 0, take

B : x → (0, ..., 0, x_{n1+1}, ..., xn).

Since

K^k ⊂ {x ∈ R^n : xj ≥ 0, j = 1, ..., n1;  ∑_{j=n1+1}^{n2} j|xj| ≤ 1 + √τk ∑_{j=n1+1}^{n2} j},


the estimate

min_{z∈K} ‖z − v^k‖ ≤ c1 √τk  ∀ v^k ∈ K^k

is evident. Now it is easy to verify that these choices of K^k,

h(x) := ∑_{j=1}^{n1} (xj ln xj − xj)  (with 0 · ln 0 = 0 by convention)

and

S := {x ∈ R^n : xj > 0, j = 1, ..., n1}

satisfy Assumptions 9.3.1 (A1)-(A4) and Assumptions 9.3.3 (B1)-(B9). Assumption 9.3.1 (A6) is fulfilled with ϕk = √τk, whereas the second condition in (A5) forces ∑_{k=1}^∞ √τk/χk < ∞.
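The inclusion K ⊂ K^k used here rests on the elementary bound √(xj² + τk) ≤ |xj| + √τk. A quick numerical sanity check for a small illustrative instance (n = 3, n1 = 1, n2 = 2, so the weighted constraint reads 2|x2| ≤ 1):

```python
import math, random

def in_K(x):
    # K for n = 3, n1 = 1, n2 = 2
    return x[0] >= 0.0 and 2.0*abs(x[1]) <= 1.0

def in_Kk(x, tau):
    # the smoothed approximation K^k with parameter tau
    return x[0] >= 0.0 and 2.0*math.sqrt(x[1]**2 + tau) <= 1.0 + 2.0*math.sqrt(tau)

random.seed(0)
samples = [[random.uniform(0.0, 2.0), random.uniform(-0.5, 0.5),
            random.uniform(-2.0, 2.0)] for _ in range(1000)]
ok = all(in_K(x) and in_Kk(x, tau)
         for x in samples for tau in (1.0, 1e-2, 1e-6))
# every sampled point of K also lies in K^k, for each tau
```

As τk ↓ 0 the right-hand side of the smoothed constraint shrinks back towards 1, which is exactly the approximation property quantified by the estimate above.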

Now, let us recall the method under consideration.

9.3.4 Method (Generalized proximal point method (GPPM)).
Given x¹ ∈ S and x^k ∈ K^{k−1} ∩ S (obtained at the (k−1)-th step), at the k-th step solve

(P^k_{δk}):  find x^{k+1} ∈ K^k ∩ cl S such that

∃ q(x^{k+1}) ∈ Q(x^{k+1}) :

⟨q(x^{k+1}) + χk(∇h(x^{k+1}) − ∇h(x^k)), x − x^{k+1}⟩ ≥ −δk √Γ1(x, x^{k+1})  ∀ x ∈ K^k ∩ cl S.   (9.3.6)

We denote by (P^k_0) Problem (9.3.6) with δk = 0 and by x̄^{k+1} its solution.
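For a single-valued Q, K := R_+, B := 0 and K^k ≡ K, with the entropy function h(t) = t ln t − t (so ∇h(t) = ln t), the exact step of (9.3.6) reduces to the scalar equation q(x) + χk(ln x − ln x^k) = 0, whose solution is automatically positive. A hedged sketch with the illustrative operator Q(x) = x − 2 and each subproblem solved by bisection:

```python
import math

chi = 1.0
def Q(x):                        # illustrative monotone operator, zero at x* = 2
    return x - 2.0

def step(xk):
    # exact step: solve Q(x) + chi*(ln x - ln x^k) = 0 on (0, inf);
    # the left-hand side is strictly increasing, so bisection applies
    lo, hi = 1e-12, 100.0
    for _ in range(200):
        mid = 0.5*(lo + hi)
        if Q(mid) + chi*(math.log(mid) - math.log(xk)) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5*(lo + hi)

x = 0.5                          # starting point x^1 in int K = (0, inf)
traj = [x]
for _ in range(60):
    x = step(x)
    traj.append(x)
# the iterates increase monotonically towards x* = 2, staying positive
```

The logarithmic term acts as an interior barrier: every iterate stays in int K without any explicit projection, which is the characteristic feature of such Bregman-type steps.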

However, the criterion for the approximate calculation of x^{k+1} in (P^k_{δk}) is not suitable for straightforward use, but it permits one to extend the convergence results obtained here to related algorithms with more reasonable criteria.

Eckstein [97] has analyzed different accuracy conditions for Bregman-function-based methods for the inclusion

find z ∈ R^n : 0 ∈ T z,   (9.3.7)

with T : R^n → 2^{R^n} a maximal monotone operator. The method, studied in [97] under rather standard assumptions on a Bregman function g, has the form

0 ∈ χk^{−1} T(x^{k+1}) + ∇g(x^{k+1}) − ∇g(x^k) + e^{k+1},   (9.3.8)

and solves VI(Q,K) if T := Q + N_K and Q + N_K is maximal monotone (here again, e^{k+1} is an error vector). A relaxation of the exact inclusion ((9.3.8) with e^{k+1} = 0) is considered in [97] as preferable for numerical implementations. Convergence of the iterates x^k generated by (9.3.8) to a solution of (9.3.7) is established under the following conditions on e^k:

∑_{k=1}^∞ ‖e^k‖ < ∞   (9.3.9)


and

∑_{k=1}^∞ ⟨e^k, x^k⟩ < ∞.   (9.3.10)

Considering in the sequel method (9.3.8) with T := Q + N_K on a pair (V, V′), we have to take ‖·‖_{V′} in (9.3.9). Now we show that condition (9.3.9) suffices to conclude that (9.3.6) is valid with ∑_{k=1}^∞ χk^{−1} δk < ∞.

If K ∩ cl S is a bounded set, then (9.3.10) follows immediately from (9.3.9). In this case, one can easily see that (9.3.8) implies the validity of (9.3.6) with an appropriate α in (9.3.2) and δk := α^{−1/2} χk ‖e^{k+1}‖_{V′}, such that (9.3.9) provides the condition ∑_{k=1}^∞ χk^{−1} δk < ∞ in (A5) (naturally, for this comparison, we assume that B := 0, K^k ≡ K and g := h; but the same arguments apply if g := h in (9.3.8) satisfies Assumption 9.3.1 (A3) and Assumptions 9.3.3 (B1)-(B3) with B ≠ 0).

Indeed, taking α < (2 diam(K ∩ cl S))^{−2}, the equality

Γ1(x, y) = α‖x − y‖²  ∀ x ∈ K ∩ cl S, y ∈ K ∩ S

follows from the definition of Γ1. Due to the definition of N_K, one can rewrite (9.3.8) as

⟨χk e^{k+1} + q(x^{k+1}) + χk(∇h(x^{k+1}) − ∇h(x^k)), x − x^{k+1}⟩ ≥ 0  ∀ x ∈ K,

with q(x^{k+1}) ∈ Q(x^{k+1}), hence

⟨q(x^{k+1}) + χk(∇h(x^{k+1}) − ∇h(x^k)), x − x^{k+1}⟩ ≥ −χk ‖e^{k+1}‖_{V′} ‖x − x^{k+1}‖  ∀ x ∈ K.

In view of the assumption {x^k} ⊂ S made in [97] (see also Subsection 9.3.2 below), Γ1(x, x^{k+1}) = α‖x − x^{k+1}‖² holds for x ∈ K ∩ cl S, and together with the last inequality this yields

⟨q(x^{k+1}) + χk(∇h(x^{k+1}) − ∇h(x^k)), x − x^{k+1}⟩ ≥ −α^{−1/2} χk ‖e^{k+1}‖_{V′} √Γ1(x, x^{k+1})  ∀ x ∈ K ∩ cl S.   (9.3.11)

Therefore, the claim above follows immediately.

Now let us trace the situation when K is not bounded but condition (9.3.9) nevertheless ensures (9.3.6) with ∑_{k=1}^∞ χk^{−1} δk < ∞ (hence, condition (9.3.10) is superfluous). In this case, however, the use of a suitable operator B ≠ 0 is essential. We suppose that V := H¹(Ω) (with Ω an open domain in R^n), that K ⊂ {x ∈ V : ‖x‖_{L²(Ω)} ≤ 1} is an unbounded set in V, and that B : V → V′ is given by ⟨Bx, x⟩ := ‖∇x‖²_{L²(Ω)}. A similar choice of B is quite realistic for elliptic problems. In this situation, for any functional h satisfying Assumptions 9.3.3 (B1)-(B3),

Γ(x, y) + 1 ≥ ¼‖x − y‖²_{L²(Ω)} + ½⟨B(x − y), x − y⟩ ≥ ¼‖x − y‖²_{H¹(Ω)}  ∀ x ∈ K ∩ cl S, y ∈ K ∩ S

is valid; hence, setting α ≤ 1/4 in (9.3.2), one gets

Γ1(x, y) = α‖x − y‖²_{H¹(Ω)}  ∀ x ∈ K ∩ cl S, y ∈ K ∩ S.


Now, the same arguments as in the case of a bounded set K enable us to conclude that (9.3.8) implies (9.3.6) with δk := α^{−1/2} χk ‖e^{k+1}‖_{V′}.

Returning to the general process, assume that g := h and that

Γ(x, y) ≥ m‖x − y‖²  ∀ x ∈ cl S, y ∈ S   (9.3.12)

is valid with some m > 0. Here, boundedness of K is not supposed. However, α ≤ m ensures

Γ1(x, y) = α‖x − y‖²,   (9.3.13)

and relation (9.3.11) follows as above. Hence, condition (9.3.9) suffices to conclude that (9.3.6) is valid with ∑_{k=1}^∞ χk^{−1} δk < ∞.

Note that in the particular case h(x) := ½‖x‖² we deal with the classical proximal point method, and (9.3.9) is nothing else than the known criterion (A') in [352]. Relation (9.3.13) also holds true if the functional h is chosen as in the methods with weak regularization or regularization on a subspace (see Subsection 8.2.3). Thus, the convergence results established below can be applied to these methods in the form (9.3.8), (9.3.9).

9.3.5 Remark. Eckstein explains the discrepancy between the convergence conditions (9.3.9) (for the classical method) and (9.3.9), (9.3.10) (for non-quadratic proximal methods) by the fact that "no triangle inequality applies" to Bregman distances D(x, y) := h(x) − h(y) − ⟨∇h(y), x − y⟩ with non-quadratic h. However, the analysis above shows that the "sufficiency" of criterion (9.3.9) depends rather on the fulfillment of relation (9.3.13) for some α > 0. ♦
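The quoted point is easy to make concrete: with the entropy function h(t) = t ln t − t on R_+ (an illustrative non-quadratic choice), the Bregman distance becomes D(x, y) = x ln(x/y) − x + y, and already for scalars the triangle inequality fails:

```python
import math

def bregman(x, y):
    # D(x, y) = h(x) - h(y) - h'(y)(x - y) for h(t) = t*ln t - t, h'(t) = ln t
    return x*math.log(x/y) - x + y

d13 = bregman(1.0, 4.0)
d12 = bregman(1.0, 2.0)
d23 = bregman(2.0, 4.0)
# d13 exceeds d12 + d23: the triangle inequality fails for this distance
```

Here d13 ≈ 1.61 while d12 + d23 ≈ 0.92, illustrating why arguments relying on a triangle inequality are unavailable for non-quadratic h.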

9.3.2 Convergence analysis

At first we show existence and uniqueness of a solution of Problem (P^k_0), as well as the inclusion x^{k+1} ∈ S for a solution of (P^k_{δk}).

According to Assumption 9.3.3 (B1), the subdifferential operator ∂η is maximal monotone. Assumptions 9.3.3 (B1)-(B3) and (B9) provide that D(∂η) = S. Indeed, the inclusion D(∂η) ⊃ S follows from (B2), and assuming that ∂η(x) ≠ ∅ holds for some x ∈ cl S \ S, in view of (B3) we obtain

⟨∇η(y) − ξ(x), y − x⟩ > 0  ∀ y ∈ S, ξ(x) ∈ ∂η(x).

But, for a fixed ξ(x) ∈ ∂η(x), due to (B9), there exists y ∈ S such that ∇η(y) = ξ(x), in contradiction to the last inequality. In turn, the conclusion D(∂η) = S means that D(∇h) = S, and both operators ∇η and ∇h are maximal monotone.

Thus, if Problem (P^k_0) is solvable, then it has a unique solution, here denoted by x̄^{k+1} (the strict monotonicity of Q + χk∇h on S ∩ K^k ∩ D(Q) follows immediately from Assumption 9.3.1 (A3) and Assumption 9.3.3 (B3)), and x̄^{k+1} ∈ S. Then, of course, a solution x^{k+1} of Problem (P^k_{δk}) exists, and D(∇h) = S provides x^{k+1} ∈ S.

Because the operator ∇h is maximal monotone and S is an open set, the maximal monotonicity of the operators Q + χk∇h + N_{K^k} and x → Q(x) + χk∇h(x) + N_{K^k}(x) − χk∇h(x^k) follows from Assumptions 9.3.1 (A1), (A2)


and (A5), according to Theorem 1 in [350]. Since K^k ∩ S ≠ ∅, the Moreau-Rockafellar theorem yields

N_{K^k ∩ cl S} = N_{K^k} + N_{cl S}.

Taking into account that x̄^{k+1} (if it exists) belongs to S, this permits us to transform Problem (P^k_0) into the inclusion

0 ∈ Q(x) + χk∇h(x) + N_{K^k}(x) − χk∇h(x^k)
  = Q(x) − χk Bx + N_{K^k}(x) + χk Bx^k + χk(∇η(x) − ∇η(x^k)),

and, with regard to Assumptions 9.3.1 (A1), (A3) and 0 < χk ≤ 1, the operator

x → Q(x) − χk Bx + N_{K^k}(x) + χk Bx^k

is maximal monotone (see Proposition 2.6 in [345]). Now, applying Lemma 5 in [59], one can conclude solvability of Problem (P^k_0). Thus, the following statement is proved.

9.3.6 Theorem. Let Assumptions 9.3.1 (A1)-(A3), (A5) and Assumptions 9.3.3 (B1)-(B3), (B9) be valid. Then Problem (P^k_0) is uniquely solvable (for each k), and the sequence {x^k} generated by Method 9.3.4 is well defined and contained in S.

9.3.7 Remark. Using instead of Assumption 9.3.3 (B9) the condition (see [193])

{v^k} ⊂ S, v^k → v ∈ cl S \ S  ⟹  lim_{k→∞} ⟨∇h(v^k), y − v^k⟩ = −∞  ∀ y ∈ S,

the conclusion D(∂η) = S can be obtained from Lemma 1 in [59], and a result on solvability like Theorem 2 in [59] also holds true. ♦

In the sequel we investigate the convergence of the method; for this we need the following assertion, proved first in [223].

9.3.8 Lemma. Let C ⊂ V be a convex closed set, let the operators A : V → 2^{V′} and A + N_C be maximal monotone, and let D(A) ∩ C be a convex set. Moreover, assume that the operator

A_C : v → A(v) if v ∈ C,  A_C : v → ∅ otherwise,

is locally hemi-bounded at each point v ∈ D(A) ∩ C (see Definition A1.6.38) and that, for some u ∈ D(A) ∩ C and each v ∈ D(A) ∩ C, there exists ζ(v) ∈ A(v) satisfying

⟨ζ(v), v − u⟩ ≥ 0.   (9.3.14)

Then, with some ζ̄ ∈ A(u), the inequality

⟨ζ̄, v − u⟩ ≥ 0

holds for all v ∈ C.


Proof: In view of the maximal monotonicity of A and A + N_C, the operators Ã : v → A(v) + I(v − u) and Ã1 := Ã + N_C (with I : V → V′ the canonical injection) are also maximal monotone. Moreover, they are strongly monotone. Therefore, there exists w ∈ D(A) ∩ C such that 0 ∈ Ã(w) + N_C(w), and due to the definition of the normality operator, this yields

⟨ζ̃(w), v − w⟩ ≥ 0  ∀ v ∈ C,   (9.3.15)

with some ζ̃(w) ∈ Ã(w).

If w = u, then, of course, ζ̃(u) ∈ Ã(u) = A(u), hence the conclusion of the lemma is valid. Otherwise, we use the relation

⟨ζ̃(v), v − u⟩ ≥ 0  ∀ v ∈ D(A) ∩ C,   (9.3.16)

which follows from (9.3.14) by taking ζ̃(v) = ζ(v) + I(v − u) ∈ Ã(v).

Let w_λ = u + λ(w − u) for λ ∈ (0, 1]. Obviously, w_λ ∈ D(A) ∩ C, and according to (9.3.16) there exists ζ̃(w_λ) ∈ Ã(w_λ) ensuring

⟨ζ̃(w_λ), w − u⟩ ≥ 0.

Because the operator A_C is locally hemi-bounded at u, the set {ζ̃(w_λ) : λ ∈ (0, λ0]} is bounded in V′ for sufficiently small λ0 > 0. Hence, if λ tends to 0 in an appropriate manner, the corresponding sequence {ζ̃(w_λ)} converges weakly in V′ to some ζ̄. Taking into account that lim_{λ→0} ‖w_λ − u‖ = 0 and that Ã is maximal monotone, one can conclude that ζ̄ ∈ Ã(u) and

0 ≤ lim ⟨ζ̃(w_λ), w − u⟩ = ⟨ζ̄, w − u⟩.

Combining this inequality with inequality (9.3.15) taken with v = u, we obtain

⟨ζ̄ − ζ̃(w), u − w⟩ ≤ 0,

but that contradicts the strong monotonicity of Ã.

9.3.9 Lemma. Let the sequence {x^k}, generated by Method 9.3.4, belong to S and assume that, for some x* ∈ SOL(Q,K) ∩ cl S, Assumption 9.3.1 (A6) is valid. Moreover, let Assumptions 9.3.1 (A3), (A5) and Assumptions 9.3.3 (B1), (B2), (B5) be fulfilled. Then

(i) {Γ(x*, x^k)} is convergent,

(ii) {x^k} is bounded, and

(iii) lim_{k→∞} Γ(x^{k+1}, x^k) = 0.

Proof: We rewrite

Γ(x*, x^{k+1}) − Γ(x*, x^k) = s1 + χk^{−1} s2 + s3

with

s1 := h(x^k) − h(x^{k+1}) + ⟨∇h(x^k), x^{k+1} − x^k⟩,
s2 := χk ⟨∇h(x^k) − ∇h(x^{k+1}), x* − x^{k+1}⟩,
s3 := ½⟨B(x^{k+1} − x*), x^{k+1} − x*⟩ − ½⟨B(x^k − x*), x^k − x*⟩.


In view of Γ1(x, y) ≤ Γ(x, y) + 1, (A5) and K^k ⊃ K, the inequality

⟨q(x^{k+1}) + χk(∇h(x^{k+1}) − ∇h(x^k)), x* − x^{k+1}⟩ ≥ −δk √(Γ(x*, x^{k+1}) + 1)

follows immediately from (9.3.6). Together with (A3), this yields

s2 ≤ ⟨q(x^{k+1}), x* − x^{k+1}⟩ + δk √(Γ(x*, x^{k+1}) + 1)
   ≤ ⟨q*(x*), x* − x^{k+1}⟩ − ⟨B(x* − x^{k+1}), x* − x^{k+1}⟩ + δk √(Γ(x*, x^{k+1}) + 1)
   ≤ ⟨q*(x*), w^{k+1} − x^{k+1}⟩ + ⟨q*(x*), x* − w^{k+1}⟩ − ⟨B(x* − x^{k+1}), x* − x^{k+1}⟩ + δk √(Γ(x*, x^{k+1}) + 1),   (9.3.17)

where we take q*(x*) and w^{k+1} := w^{k+1}(x^{k+1}) according to (A6). From the definition of x* and q*(x*), one gets

⟨q*(x*), x* − w^{k+1}⟩ ≤ 0,   (9.3.18)

and (9.3.17), (9.3.18) and (A6) permit us to conclude that

s2 ≤ c(Γ(x*, x^{k+1}) + 1)ϕk − ⟨B(x* − x^{k+1}), x* − x^{k+1}⟩ + (1 + Γ(x*, x^{k+1}))δk.

With regard to 0 < χk ≤ 1, this leads to

χk^{−1} s2 + s3 ≤ Γ(x*, x^{k+1}) (cϕk/χk + δk/χk) + cϕk/χk + δk/χk
  − ½⟨B(x* − x^{k+1}), x* − x^{k+1}⟩ − ½⟨B(x* − x^k), x* − x^k⟩.

But

½⟨B(x^{k+1} − x^k), x^{k+1} − x^k⟩ ≤ ⟨B(x* − x^{k+1}), x* − x^{k+1}⟩ + ⟨B(x* − x^k), x* − x^k⟩

is obvious, and, taking into account the definition of Γ and the convexity of h,

s1 + χk^{−1} s2 + s3 ≤ Γ(x*, x^{k+1}) (cϕk/χk + δk/χk) + cϕk/χk + δk/χk
  − ¼⟨B(x^{k+1} − x^k), x^{k+1} − x^k⟩ − [h(x^{k+1}) − h(x^k) − ⟨∇h(x^k), x^{k+1} − x^k⟩]
≤ Γ(x*, x^{k+1}) (cϕk/χk + δk/χk) + cϕk/χk + δk/χk − ½ Γ(x^{k+1}, x^k).

Hence,

Γ(x*, x^{k+1}) (1 − cϕk/χk − δk/χk) ≤ Γ(x*, x^k) + cϕk/χk + δk/χk − ½ Γ(x^{k+1}, x^k)   (9.3.19)


is valid. But, condition (A5) implies the existence of an index k0 such that

cϕk/χk + δk/χk ≤ 1/2 for k ≥ k0,

i.e.,

1 ≤ (1 − cϕk/χk − δk/χk)^{−1} ≤ 1 + 2(cϕk/χk + δk/χk) ≤ 2.

Thus, for k ≥ k0, we obtain from (9.3.19) and Γ(xk+1, xk) ≥ 0 that

Γ(x∗, xk+1) ≤ [1 + 2(cϕk/χk + δk/χk)]Γ(x∗, xk) + 2(cϕk/χk + δk/χk) − (1/2)Γ(xk+1, xk), (9.3.20)

and, in view of condition (A5), Lemma A3.1.7 provides that the sequence {Γ(x∗, xk)} is convergent. Now, the boundedness of {xk} follows from (B5), and using again (9.3.20) one gets limk→∞ Γ(xk+1, xk) = 0.
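The convergence step just used can be made tangible: Lemma A3.1.7 is of quasi-Fejér type, asserting (roughly) that a nonnegative sequence with ak+1 ≤ (1 + σk)ak + βk and summable σk, βk ≥ 0 converges. A minimal numerical illustration of this mechanism; the concrete sequences σk = βk = (k + 1)^{−2} are invented for the sketch:

```python
def quasi_fejer(a0, sigma, beta, n):
    """Iterate a_{k+1} = (1 + sigma(k)) * a_k + beta(k) and return the trajectory."""
    a = [a0]
    for k in range(n):
        a.append((1.0 + sigma(k)) * a[-1] + beta(k))
    return a

# Summable perturbations (assumption of this sketch): sigma_k = beta_k = 1/(k+1)^2.
traj = quasi_fejer(1.0, lambda k: (k + 1) ** -2, lambda k: (k + 1) ** -2, 2000)

# The trajectory settles: late increments become negligible.
print(abs(traj[-1] - traj[-2]))
```

With summable perturbations the trajectory stabilizes, exactly as required of Γ(x∗, xk) in the proof above.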

In the sequel, we deal with the case that, besides the usual property of maximal monotonicity, the operator Q is pseudo-monotone and paramonotone (see the Definitions A1.6.42 and A1.6.43).

In particular, we will use the following property of a paramonotone operator A in C (cf. [59]):

(F) If u∗ solves the variational inequality

〈A(u), v − u〉 ≥ 0 ∀v ∈ C (9.3.21)

and for u ∈ C there exists z ∈ A(u) with 〈z, u∗ − u〉 ≥ 0, then u is also a solution of (9.3.21).

9.3.10 Lemma. Let the assumptions of Lemma 9.3.9 as well as the Assumptions 9.3.1 (A1), (A4) and Assumptions 9.3.3 (B6), (B7) be valid. Moreover, suppose that one of the following assumptions is fulfilled:

(a) S ⊃ cl(D(Q)), ∇h is Lipschitz continuous on closed and bounded subsets of S, and the hypotheses of Lemma 9.3.8 hold with A := Q, C := K ∩ clS;

(b) Q := ∂f, with f a proper convex lower semicontinuous functional which is continuous at some x ∈ K;

(c) Q : D(Q) → 2V′ is pseudo-monotone and paramonotone on clS.

Then the sequence {xk}, generated by Method 9.3.4, is bounded and each weak limit point is a solution of VI(Q,K).

For a motivation of the inclusion S ⊃ cl(D(Q)), which excludes the usual choice of a function h leading to interior point methods, see [96]. In the case (b), condition (A4) can be weakened by assuming that each limit point of {vk} belongs to K (in place of K ∩ D(Q)).


Proof: According to Lemma 9.3.9, the sequence {xk} is bounded; hence, there exists a weakly convergent subsequence {xjk}, xjk ⇀ x̄ as k → ∞. In view of {xk} ⊂ S, (A4) and the convexity of S, the inclusion x̄ ∈ clS ∩ K ∩ D(Q) (respectively x̄ ∈ clS ∩ K in the case (b)) is valid. Due to limk→∞ Γ(xk+1, xk) = 0, one can use condition (B7) with vk := xjk+1, yk := xjk. This leads to

limk→∞ ‖xjk+1 − xjk‖ = 0. (9.3.22)

If (a) holds, then with regard to the boundedness of {xk}, {xk} ⊂ D(Q), (A5) and (9.3.22), the relation

limk→∞ χjk〈∇h(xjk+1) − ∇h(xjk), x − xjk+1〉 = 0 ∀x ∈ V (9.3.23)

follows immediately. Now, take (9.3.6) with an arbitrary x ∈ K ∩ clS and replace q(xk+1) by q(x) ∈ Q(x) (this is possible in view of the monotonicity of Q). Then, passing to the limit in the obtained inequality with k := jk, k → ∞, due to the boundedness of {xk}, the definition of Γ1, (A5) and (9.3.23), we obtain

〈q(x), x − x̄〉 ≥ 0 ∀x ∈ K ∩ clS.

The conditions (A1) and S ⊃ cl(D(Q)) guarantee the maximal monotonicity of the operator Q + NK∩clS (in fact, Q + NK∩clS coincides with Q + NK). Thus, we are able to apply Lemma 9.3.8 with C := K ∩ clS, A := Q, which ensures that

∃ q(x̄) ∈ Q(x̄) : 〈q(x̄), y − x̄〉 ≥ 0 ∀y ∈ K ∩ clS.

This yields

0 ∈ q(x̄) + NK∩clS(x̄) ⊂ Q(x̄) + NK∩clS(x̄),

hence 0 ∈ Q(x̄) + NK(x̄) holds, proving x̄ ∈ SOL(Q,K).

Suppose now that (b) is valid and take x∗ as in (A6). With regard to the symmetry of B, a straightforward calculation gives

−〈∇h(xk) − ∇h(xk+1), x∗ − xk+1〉 = Γ(x∗, xk) − Γ(x∗, xk+1)
      − Γ(xk+1, xk) − 〈B(xk+1 − xk), x∗ − xk+1〉. (9.3.24)

Using Lemma 9.3.9, (9.3.22) and 0 < χk ≤ 1, we infer from (9.3.24) that

limk→∞ χjk〈∇h(xjk+1) − ∇h(xjk), x∗ − xjk+1〉 = 0. (9.3.25)

But, relation (9.3.6), given with x = x∗ and k := jk, implies, due to the convexity of f, that

δjk√(Γ1(x∗, xjk+1)) + χjk〈∇h(xjk+1) − ∇h(xjk), x∗ − xjk+1〉 ≥ f(xjk+1) − f(x∗). (9.3.26)

Taking the limit in (9.3.26) as k → ∞, due to Lemma 9.3.9, (9.3.25), (A5) and the lower semicontinuity of f, one gets f(x̄) ≤ f(x∗). Thus, 0 ∈ ∂(f(x̄) + δ(x̄|K)),


with δ(·|K) the indicator functional of K, and the Moreau-Rockafellar theorem provides x̄ ∈ SOL(Q,K).

Finally, let us consider the case (c). Using equality (9.3.24) with k := jk and x̄ in place of x∗, from (9.3.22), Lemma 9.3.9 and condition (B6) for vk := xjk+1, yk := xjk, we conclude that

limk→∞ 〈∇h(xjk+1) − ∇h(xjk), x̄ − xjk+1〉 = 0.

Thus, (9.3.6) taken with x := x̄ implies

lim supk→∞ 〈q(xjk+1), xjk+1 − x̄〉 ≤ 0,

and the pseudo-monotonicity of Q provides the existence of q(x̄) ∈ Q(x̄) such that

〈q(x̄), x̄ − x∗〉 ≤ lim infk→∞ 〈q(xjk+1), xjk+1 − x∗〉.

Now, from (9.3.6) and relation (9.3.25), which is true also in this case, one gets

〈q(x̄), x̄ − x∗〉 ≤ 0.

Therefore, the above-mentioned property (F) of paramonotonicity permits us to conclude that x̄ ∈ SOL(Q,K).

9.3.11 Theorem. Let the Assumptions 9.3.1 (A1)-(A5) and Assumptions 9.3.3 (B1)-(B9) be valid, and let Assumption 9.3.1 (A6) hold for each x∗ ∈ SOL(Q,K) ∩ clS (the constant c in (A6) may depend on x∗). Moreover, let the operator Q possess one of the properties (a), (b) or (c) in Lemma 9.3.10. Then the sequence {xk}, generated by Method 9.3.4, converges weakly to a solution of VI(Q,K).

Proof: The existence of the sequence {xk} and the inclusion {xk} ⊂ S are guaranteed by Theorem 9.3.6. Denote

dk(x) := Γ(x, xk) − (1/2)〈Bx, x〉 − h(x).

According to Lemma 9.3.9, {Γ(x, xk)} converges for each x ∈ SOL(Q,K) ∩ clS; hence, the sequence {dk(x)} possesses the same property.

Boundedness of {xk} was proved in Lemma 9.3.9, and Lemma 9.3.10 yields that each weak limit point of {xk} belongs to SOL(Q,K) ∩ clS.

Assume that {xjk} and {xik} converge weakly to x̄ and x̃, respectively. Then it holds that x̄, x̃ ∈ SOL(Q,K) ∩ clS. Let

l1 := limk→∞ dk(x̄), l2 := limk→∞ dk(x̃).

Obviously,

l1 − l2 = limk→∞ (dk(x̄) − dk(x̃)) = limk→∞ 〈∇h(xk) + Bxk, x̃ − x̄〉.

Considering the latter equality now for the subsequences {xjk} and {xik}, one can conclude that

limk→∞ 〈∇h(xjk) + Bxjk − ∇h(xik) − Bxik, x̃ − x̄〉 = 0. (9.3.27)

A comparison of (9.3.27) and (B8) (given with vk := xjk, yk := xik) indicates x̄ = x̃, proving uniqueness of the weak limit point of {xk}.


9.3.12 Remark. Theorem 9.3.6 remains true if Assumption 9.3.3 (B9) is replaced by any other condition guaranteeing that {xk} is well defined and {xk} ⊂ S (see, in particular, Remark 9.3.7). If we replace S ⊃ cl(D(Q)) by the weaker requirement that

S ⊃ cl(D(Q) ∩ (∪k≥k0 Kk))

is valid for an arbitrarily large k0, then – with evident technical alterations – the proofs of Lemma 9.3.10 and Theorem 9.3.11 hold true.

9.4 Comments

Section 9.1: Using a relaxation factor ρk > 1, Bertsekas [44] reports on the improvement of the convergence rate of the multiplier method for constrained convex programming, which is – as we already mentioned – a well-known application of the proximal point algorithm. Facchinei and Pang [107] have shown that for a strongly monotone (with constant m) and Lipschitz continuous (with constant L > m) operator Q, the sequence {xk} generated by the relaxed PPR with ρk = 1 + m/(χkL²) > 1 satisfies

‖JχkQ(xk) − x∗‖ ≤ (1 + χkm)^{−1}‖xk − x∗‖,

‖xk+1 − x∗‖ ≤ (1 − m²/L²)^{1/2}(1 + χkm)^{−1}‖xk − x∗‖,

where x∗ is the unique solution of the equation 0 = Q(x). Due to the factor (1 − m²/L²) < 1 on the right-hand side of the last inequality, after one step of the relaxed proximal point algorithm the distance ‖xk+1 − x∗‖ is smaller than for the classical PPR.
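The first of these estimates is easy to check numerically in one dimension. The concrete operator Q below (with modulus m = 1 and Lipschitz constant L = 3) and the bisection solver are invented for this sketch; the resolvent JχQ(x) is the solution y of y + χQ(y) = x:

```python
import math

def Q(y):
    return 2.0 * y + math.sin(y)        # strongly monotone (m = 1), Lipschitz (L = 3)

def resolvent(x, chi):
    """J_{chi Q}(x): solve y + chi*Q(y) = x by bisection (strictly increasing in y)."""
    lo, hi = -abs(x) - 1.0, abs(x) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + chi * Q(mid) > x:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

m, chi, x = 1.0, 0.7, 5.0               # x* = 0 is the unique zero of Q
y = resolvent(x, chi)
assert abs(y) <= abs(x) / (1.0 + chi * m)   # ||J(x) - x*|| <= (1 + chi*m)^{-1} ||x - x*||
print(y)
```

The contraction holds for any starting point; here ‖x − x∗‖ shrinks from 5 to roughly 1.8 in a single resolvent evaluation.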

Eckstein and Bertsekas [98] observed that an acceleration of the convergence should only be possible in the case of over-relaxation in the PPR.

Based on Solodov and Svaiter's investigation in [374], Hue and Strodiot [186] considered the following version of the relaxed PPR: For some fixed relaxation parameter ρ ∈ (0, 2) and an error tolerance parameter δ ∈ [0, 1), find a triplet (yk, qk, εk), εk ≥ 0, such that

qk ∈ Qεk(yk), χkqk + yk − xk = rk, (9.4.28)

and

γ‖rk‖² + 2γεkχk + 2|1 − ρ|ρ‖rk‖‖yk − xk‖ ≤ δ²(2 − ρ)ρ‖yk − xk‖², (9.4.29)

where γ := max{1, ρ²}. Afterwards the authors execute the extra-gradient step

xk+1 := xk − ρχkqk.

For δ := 0 the algorithm of Hue and Strodiot coincides with the exact version of the relaxed proximal point algorithm of Eckstein and Bertsekas with ρk ≡ ρ. Hue and Strodiot report on first numerical experiences and obtain better convergence results in the case of over-relaxation, i.e., for ρ > 1. But they also mention (see [186], Sect. 5) that the behavior of the algorithm for small


and large values of ρ “becomes worse: the number of steps is increasing”. This fact is not surprising, because the proximal point algorithm is just a fixed point iteration for the resolvent operator JχkT, with T = Q + NK. A solution of the inclusion 0 ∈ T(x) is found if xk+1 := JχkT(xk) = xk. Therefore, if the current iterate xk is far from the solution set, then one can execute over-relaxation steps in the hope of accelerating the convergence. But if xk is in a neighborhood of the solution set, then it is natural to keep the relaxation parameter ρk ≈ 1. From this point of view it is important to vary the relaxation parameter ρk during the iteration process.
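The varying-relaxation heuristic just described can be sketched as follows; the model operator Q, the crude switching rule and all parameters are invented for this illustration (over-relax while the iterate is far away, fall back to ρk = 1 near the solution x∗ = 0):

```python
import math

def Q(y):                                # model operator: strongly monotone, Q(0) = 0
    return 2.0 * y + math.sin(y)

def resolvent(x, chi):
    """Solve y + chi*Q(y) = x by bisection; the left-hand side is strictly increasing."""
    lo, hi = -abs(x) - 1.0, abs(x) + 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + chi * Q(mid) > x:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def relaxed_ppr(x0, chi, rho_far, steps):
    """x_{k+1} = x_k + rho_k*(J(x_k) - x_k), with rho_k switched by distance to 0."""
    x = x0
    for _ in range(steps):
        rho = rho_far if abs(x) > 1.0 else 1.0   # crude switching rule (assumption)
        x = x + rho * (resolvent(x, chi) - x)
    return x

x_end = relaxed_ppr(10.0, 1.0, 1.8, 30)
print(abs(x_end))   # close to the solution x* = 0
```

Keeping ρk ≈ 1 near the solution avoids the overshooting that a fixed large ρ produces.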

The results of this chapter were first published by Huebner and Tichatschke [188]; see also the PhD thesis of Huebner [187].

Section 9.2: Bregman functions were introduced by Bregman [48] to find common points in the intersection of convex sets. Later on they were used by several authors (cf. Butnariu and Iusem [59], Chen and Teboulle [74], Eckstein [97], Solodov and Svaiter [376]) in order to generalize proximal point methods for problems with polyhedral feasibility sets.

The main motivation for such proximal methods is not only to guarantee that the iterates stay in the interior of the set K and to preserve the main merits of the classical PPR (good stability of the auxiliary problems and convergence of the whole sequence of iterates to a solution of the original problem); at the same time, the application of non-quadratic proximal techniques (as in Auslender, Teboulle and Ben-Tiba [25], Teboulle [385], Tseng and Bertsekas [402]) to the dual of a smooth convex program leads to multiplier methods with twice or higher differentiable augmented Lagrangian functionals. Moreover, in [25] the Hessians of these functionals are bounded.

The implementation of a Bregman-function-based proximal algorithm considered in this section is described in [187]. There, numerical examples can also be found showing the performance of the algorithm.

Section 9.3: More motivation for non-quadratic proximal methods can be found in Auslender and Haddou [22], Ben-Tal and Zibulevski [40], Eckstein [96], and Polyak and Teboulle [332]. For infinite-dimensional convex optimization problems such methods have been studied by Alber, Burachik and Iusem [6] and Butnariu and Iusem [65, 66]; for variational inequalities in Hilbert spaces see Burachik and Iusem [59].


Chapter 10

THE AUXILIARY PROBLEM PRINCIPLE

The auxiliary problem principle (APP), originally introduced by Cohen [78, 79] as a general framework to analyze optimization algorithms of gradient and subgradient types as well as decomposition algorithms, was later extended to different numerical methods for solving variational inequalities. In the majority of the papers dedicated to such extensions, the (k+1)-th auxiliary problem is constructed by applying the operator of the variational inequality (here called the main operator) to the k-th iterate. The idea to take an additive component of this operator at a variable point leads to a scheme which appears to be a generalization of proximal-like methods, too.

In order to illustrate this, let us consider the inclusion problem

IP(T, X) find x ∈ X : 0 ∈ T(x), (10.0.1)

with T a given multi-valued maximal monotone operator in a Hilbert space X. The APP is given in the form

find xk+1 ∈ X : 0 ∈ (1/χk)[Ξ(xk+1) − Ξ(xk)] + T(xk), (10.0.2)

with Ξ : X → X an auxiliary operator and χk > 0, and this iterative scheme can be modified as follows:

find xk+1 ∈ X : 0 ∈ (1/χk)[Ξ(xk+1) − Ξ(xk)] + T(xk+1), (10.0.3)

or

find xk+1 ∈ X : 0 ∈ (1/χ)[Ξ(xk+1) − Ξ(xk)] + T1(xk) + T2(xk+1), (10.0.4)

where χ > 0 and T is decomposed into a sum of a single-valued monotone operator T1 and a maximal monotone operator T2.

Then, in the case Ξ := I (I the identity operator), the following well-known methods arise:



Inclusion (10.0.2) leads to

xk+1 ∈ xk − χkT(xk),

which is an analogue of the subgradient method; on using (10.0.3) we obtain

xk+1 = (I + χkT)^{−1}(xk),

the proximal point method; finally, (10.0.4) produces

xk+1 = (I + χT2)^{−1}(I − χT1)(xk), (10.0.5)

a splitting algorithm, suggested by Lions and Mercier [273] (see also Gabay [125] and Passty [319]).

Obviously, the latter algorithm can be represented as

zk = xk − χT1(xk), xk+1 = (I + χT2)^{−1}(zk),

where xk+1 is calculated from zk by means of the resolvent, i.e., the proximal mapping. Tseng [400, 401] has used method (10.0.5) (with variable χ) as a basic process to investigate the convergence of several known but also new splitting methods for variational inequalities with separability properties (including linear complementarity problems) as well as for related convex optimization problems with linear constraints.
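As an illustration of (10.0.5), the following sketch applies the forward-backward step to the one-dimensional model T1(x) = x − a (the gradient of a smooth quadratic) and T2 = λ∂|·|, whose resolvent is the well-known soft-thresholding mapping; the data a, λ, χ are chosen arbitrarily for the example, and the zero of T1 + T2 is sign(a)·max{|a| − λ, 0}:

```python
def soft(z, t):
    """Resolvent of t*∂|.|: the soft-thresholding (proximal) mapping."""
    return max(abs(z) - t, 0.0) * (1.0 if z >= 0 else -1.0)

def forward_backward(a, lam, chi, x0, steps):
    """x_{k+1} = (I + chi*T2)^{-1}(I - chi*T1)(x_k), T1(x) = x - a, T2 = lam*∂|.|."""
    x = x0
    for _ in range(steps):
        z = x - chi * (x - a)        # explicit (forward) step on T1
        x = soft(z, chi * lam)       # implicit (backward) step on T2
    return x

# For this 1-D model the zero of T1 + T2 is sign(a)*max{|a| - lam, 0}.
x = forward_backward(a=3.0, lam=1.0, chi=0.8, x0=0.0, steps=100)
print(x)  # ≈ 2.0
```

The same two-line pattern (explicit step, then resolvent) carries over to Tseng-type variants with variable χ.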

10.1 Extended Auxiliary Problem Principle

10.1.1 On merging of APP and IPR

Throughout this chapter the APP is studied for variational inequalities of the form

VI(F,Q,K) find x∗ ∈ K and q∗ ∈ Q(x∗) :

〈F(x∗) + q∗, x − x∗〉 ≥ 0 ∀ x ∈ K,

with K a convex closed subset of a Hilbert space (X, ‖ · ‖), F a single-valued operator from X into the dual space X′ and Q : X → 2X′ a maximal monotone (in general, non-symmetric) operator. For a monotone operator, the notion symmetric means that the operator is the subdifferential of a convex functional.

The suggested scheme includes a successive approximation of K by means of a sequence {Kk} of convex closed sets. Taking a set

K̄ ⊃ K ∪ (∪∞k=1 Kk), K̄ ⊂ X,

auxiliary problems of the form

(P k) find xk+1 ∈ Kk, qk(xk+1) ∈ Qk(xk+1) :

〈F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)), x − xk+1〉 ≥ 0 ∀ x ∈ Kk,


are considered. Here h : X → (−∞,+∞] is a convex functional, Gateaux-differentiable on K̄; Qk is a monotone operator approximating Q; Lk : X → X′ is a monotone operator linked with F by a kind of pseudo-Dunn property, D(Lk) ⊃ K̄; xk is an approximate solution of the previous problem, and χk is a positive scalar. The sum Lk + χk∇h corresponds to the customary notion of an auxiliary operator Ξk.

For practical implementation, the cases when Qk ≡ Q (i.e., there is no approximation of Q) or when Qk is a single-valued approximation of the multi-valued Q are of most interest, and the scheme under consideration covers these cases.

To make the APP numerically tractable, we will investigate in this section the iteration process

(P kε ) find xk+1 ∈ Kk, qk(xk+1) ∈ Qk(xk+1) :

〈F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)), x − xk+1〉 ≥ −εk‖x − xk+1‖ ∀ x ∈ Kk,

where εk ↓ 0 is a given sequence and x1 ∈ K is an arbitrary starting point. We call this process the Proximal Auxiliary Problem Method (PAP-method). It is easy to see that, replacing Qk by F + Qk and F by 0 and taking Lk ≡ 0, the proximal-like methods considered in Section 9.1 can be derived from (P kε ).
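The role of the tolerance εk ↓ 0 can be mimicked in the unconstrained one-dimensional case: each proximal subproblem is solved only up to an accuracy εk. The model operator and the tolerance sequence below are invented for this sketch:

```python
import math

def Q(y):                        # model operator for the sketch; unique zero at y = 0
    return 2.0 * y + math.sin(y)

def inexact_step(x, chi, eps):
    """Approximate resolvent: bisect y + chi*Q(y) = x only until the error is below eps."""
    lo, hi = -abs(x) - 1.0, abs(x) + 1.0
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if mid + chi * Q(mid) > x:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x = 4.0
for k in range(1, 60):
    x = inexact_step(x, chi=1.0, eps=1.0 / k ** 2)   # decreasing tolerances (assumption)
print(abs(x))   # small: vanishing inexactness does not destroy convergence to 0
```

As long as the subproblem tolerances decrease (here they are even summable), the inexactly solved auxiliary problems still drive the iterates to the solution.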

The conditions concerning the approximation of Q by Qk and of K by Kk are of the type of Mosco-convergence (cf. Section 2.4 and [296]). For the exact conditions, which are supposed to be valid for both problems VI(F,Q,K) and (P k), see the Assumptions 10.1.1 and 10.1.4 below. They allow the solution set of VI(F,Q,K) to be infinite-dimensional and unbounded, i.e., we deal with essentially ill-posed problems.

Coming back to the sources of the PAP-method, we begin with a version of the APP for convex optimization problems, where a linearization of the Gateaux-differentiable term J1 of an objective functional J = J1 + J2 is used. This was already studied by Cohen in [78]. Taking this partial linearization, the term 〈∇J1(xk), · − xk〉 + J2(·) is inserted into the objective functional of the (k+1)-th auxiliary problem. Such an approach is of special interest for constructing decomposition methods. In fact, if the problem min{J2(x) : x ∈ K} splits up into independent subproblems, then the mentioned linearization permits one to provide the same splitting in the framework of the APP for the original problem min{J(x) : x ∈ K}, i.e., the corresponding auxiliary problems can be split up, too.
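A minimal sketch of this partial linearization with the quadratic kernel h = ‖·‖²/2: J1(x) = ½‖Ax − b‖² couples the variables, J2(x) = λ∑|xi| is separable, so each auxiliary problem min 〈∇J1(xk), x − xk〉 + J2(x) + (1/(2χ))‖x − xk‖² decomposes into independent scalar problems. The data A, b, λ, χ are invented for the illustration:

```python
def grad_J1(x, A, b):
    """Gradient of 0.5*||A x - b||^2, computed with plain lists."""
    r = [sum(A[i][j] * x[j] for j in range(len(x))) - b[i] for i in range(len(b))]
    return [sum(A[i][j] * r[i] for i in range(len(b))) for j in range(len(x))]

def soft(z, t):
    return max(abs(z) - t, 0.0) * (1.0 if z >= 0 else -1.0)

def app_step(x, A, b, lam, chi):
    """One auxiliary problem: linearize J1 at x, keep J2; it splits coordinate-wise."""
    g = grad_J1(x, A, b)
    return [soft(x[j] - chi * g[j], chi * lam) for j in range(len(x))]

A = [[1.0, 0.2], [0.2, 1.0]]
b = [1.0, -1.0]
x = [0.0, 0.0]
for _ in range(300):
    x = app_step(x, A, b, lam=0.1, chi=0.5)
print(x)  # approximate minimizer of J1 + J2
```

Because J2 and the regularization term are both separable, each coordinate update is an independent scalar problem – exactly the splitting effect described above.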

The general scheme, oriented to decomposition methods for hemi-variational inequalities of the form

HVI(T, f,K) find x∗ ∈ X :

〈T(x∗), x − x∗〉 + f(x) − f(x∗) ≥ 0 ∀ x ∈ X,

where T is a single-valued monotone operator and the functional f is convex, lower semicontinuous (lsc) and additive with respect to a Cartesian factorization of the space X, has been developed by Makler-Scheimberg, Nguyen and Strodiot in [282]. An extension of this scheme (in the case X := Rn and without


accentuating decomposition methods), described by Salmon, Nguyen and Strodiot in [359], is connected with a relaxation of the monotonicity condition for T and with the use of a wider class of auxiliary operators (as distinct from [282], these operators may be non-symmetric). The auxiliary problems in [359] can be written as

find xk+1 ∈ X :

〈T(xk) + Lk(xk+1) − Lk(xk) + χk(∇hk(xk+1) − ∇hk(xk)), x − xk+1〉 + fk(x) − fk(xk+1) ≥ 0 ∀ x ∈ X, (10.1.1)

and the conditions concerning Lk, hk and fk made there permit one to cover a lot of earlier versions of the APP and special algorithms.

Formally, HVI(T, f,K) and (10.1.1) correspond to VI(F,Q,K) and (P k), respectively, with

T := F + Q, f := ind(·|K)

in HVI(T, f,K) and

Lk := Lk + Q, hk := h, fk := ind(·|Kk)

in (10.1.1). However, the conditions on T, Lk and fk in [359] enforce that Q is a single-valued, continuous operator and Kk ⊂ K.

In Section 10.2 we will modify the PAP-method for hemi-variational inequalities, but with a multi-valued operator T. In some applications, this leads to more convenient conditions on data approximation than applying the basic process (P kε ) to the equivalent VI(F,Q,K).

Our PAP-method described here can be considered as a generalization of the method studied by Zhu and Marcotte in [422], Sect. 4.2. The auxiliary problems in [422] correspond to (P k) with X := Rn, Q the sum of a single-valued monotone operator and a symmetric, monotone operator, Kk := K, Qk := Q, Lk := 0, χk := χ, and with stronger assumptions on h and F (it is supposed that h is strongly convex and F possesses the Dunn property).

The paper [345] of Renaud and Cohen should also be mentioned, in which the APP is studied for the inclusion problem IP(T, X), where T : X → 2X is a maximal monotone operator. The corresponding auxiliary problems have the form (10.0.3) with a single-valued (in general, non-symmetric) auxiliary operator Ξ = Ξ1 + εΞ2. Here, Ξ1 is supposed to be the gradient of a strongly convex functional, and supposing the following relation between T and Ξ2:

∃ γ0 > 0 : 〈t(x) − t(y), x − y〉 ≥ γ0‖Ξ2(x) − Ξ2(y)‖²
∀ t(x) ∈ T(x), t(y) ∈ T(y), ∀ x, y ∈ domT, (10.1.2)

the operator Ξ2 is assumed to be hemicontinuous only.

In [345] the general convergence results for method (10.0.3) have also been adapted to prove convergence of a new algorithm for solving saddle-point problems with convex-concave functions on the product of convex sets. Depending on the decomposition of the related maximal monotone operator, the Arrow-Hurwicz algorithm and the proximal point method can be obtained as particular


cases.

The PAP-method studied in this section has the following distinguishing features: in comparison with [282, 359], the operator Q is not supposed to be symmetric, and, as distinct from [345], the main operator F + Q is not necessarily monotone. Approximations of Q and K are also included, and the auxiliary operator is allowed to vary after each step.

Besides, we weaken the standard (for the APP) assumption on the strong convexity of the auxiliary function h in the Problems (P kε ): h is supposed to be convex, and the operators Qk + ∇h have to be strongly monotone with a common modulus for all k.

Note that the conditions which link the main and auxiliary operators in our scheme (see Assumption 10.1.1(v) below) are similar to, but not the same as, those in [282, 345, 359].

10.1.2 PAP-method and its convergence analysis

In the sequel, for each x ∈ SOL(F,Q,K), the elements q∗ or q∗(x) belong to Q(x) and satisfy VI(F,Q,K).

We consider the variational inequality VI(F,Q,K) under the following basic assumptions.

10.1.1 Assumption. (Basic assumptions)

(i) K ⊂ X is a convex, closed set;

(ii) Q : X → 2X′ is a maximal monotone operator, D(Q) ∩ K is a non-empty, convex set, and

QK : y ↦ Q(y) if y ∈ K, ∅ otherwise,

is locally hemi-bounded at each point of D(Q) ∩ K;

(iii) the operator Q+NK is maximal monotone;

(iv) the operator F : X → X′ is single-valued and weakly continuous on K, and the functional x ↦ 〈F(x), x〉 is weakly lsc on K;

(v) given a family {Ly}, Ly : X → X′, of monotone operators on K̄, parameterized by y ∈ K̄; there exists γ > 0 (independent of x, y) such that, for x ∈ SOL(F,Q,K), y ∈ K̄, the inequality

〈F(y) − F(x) − Ly(y) + Ly(x), y − x〉 + 〈F(x) + q∗(x), z(y) − x〉 ≥ γ‖F(y) − Ly(y) − F(x) + Ly(x)‖²X′

is valid (here and in the sequel, z(·) := arg minv∈K ‖ · − v‖);

(vi) VI(F ,Q,K) is solvable. ♦

Let us discuss some notions and conditions given above.


Local hemi-boundedness of Q (see Definition A1.6.38) is used to provide (by means of Lemma 9.3.8) the following implication with a fixed x̄ ∈ K ∩ D(Q): if

∀ x ∈ K ∩ D(Q), ∃ q(x) ∈ Q(x) : 〈F(x) + q(x), x − x̄〉 ≥ 0,

then x̄ is a solution of VI(F,Q,K). This implication is crucial for proving that each weak limit point of the sequence {xk} generated by the APP-method is a solution of VI(F,Q,K) (see Lemma 10.1.8).

With Q a maximal monotone operator and K a convex closed set, the operator Q + NK is maximal monotone if, for instance, intD(Q) ∩ K ≠ ∅ or Q is locally bounded at some x ∈ K ∩ clD(Q) (see [350]). Local boundedness of Q at x means that Q carries some neighborhood of x into a bounded set.

By definition, F is weakly continuous on K if vn ⇀ v in X, vn ∈ K, v ∈ K imply F(vn) ⇀ F(v) in X′.

Assumption 10.1.1(iv) is valid, in particular, if F is a compact operator. Also, the second part of (iv) follows from the first part if F is a monotone operator. Indeed, let vn ⇀ v, vn ∈ K, v ∈ K. In case F is compact, this ensures that F(vn) converges to F(v) strongly in X′. Then, using the identity

〈F(vn), vn〉 = 〈F(vn) − F(v), vn〉 + 〈F(v), vn〉,

we obtain immediately

limn→∞ 〈F(vn), vn〉 = 〈F(v), v〉.

But also in the case that F is monotone, the relation

〈F(vn), vn〉 = 〈F(vn), vn − v〉 + 〈F(vn), v〉 ≥ 〈F(v), vn − v〉 + 〈F(vn), v〉

together with vn ⇀ v, F(vn) ⇀ F(v) yields

lim infn→∞ 〈F(vn), vn〉 ≥ 〈F(v), v〉.

Instead of Assumption 10.1.1(iv), one can assume that F is a single-valued, monotone, hemicontinuous and bounded operator on X, with D(F) = X (see [223]).

Assumption 10.1.1(v) is not as strange as it seems at first glance. For instance, it is certainly fulfilled if the operators F − Ly possess the Dunn property (are co-coercive) with a common modulus γ > 0, i.e.,

〈F(x) − Ly(x) − F(v) + Ly(v), x − v〉 ≥ γ‖F(x) − Ly(x) − F(v) + Ly(v)‖²X′ ∀ x, v ∈ K̄, ∀ y ∈ K̄,

which is a rather standard hypothesis for the APP.

On the other hand, let us consider the simple


10.1.2 Example.

X = R1, Q(x) = x + 4, K = [−1, 1], K̄ = [−2, 2], Ly ≡ 0

and

F(x) = x + 2 if x < −1, x² if x ≥ −1.

It illustrates the situation that Assumption 10.1.1(v) is fulfilled (with γ = 1), although the operator F does not possess the Dunn property and is not even pseudo-monotone. For the definition of pseudo-monotonicity see A1.6.42. It is also worth noting that in this example the operator F + Q is not monotone on K̄. ♦
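The failure of monotonicity claimed in the example can be confirmed by brute force; the sampling grid below is arbitrary:

```python
# Numerical check for Example 10.1.2: on [-2, 2] the operator G = F + Q,
# with Q(x) = x + 4, fails to be monotone (sampled pairs suffice to exhibit this).

def F(x):
    return x + 2.0 if x < -1.0 else x * x

def G(x):            # G = F + Q
    return F(x) + x + 4.0

pts = [-2.0 + 0.05 * i for i in range(81)]
worst = min((G(u) - G(v)) * (u - v) for u in pts for v in pts if u != v)
print(worst < 0.0)   # True: monotonicity (G(u) - G(v))(u - v) >= 0 is violated
```

The most negative sampled value of (G(u) − G(v))(u − v) is about −0.15, attained inside [−1, 0], where F(x) = x² is decreasing.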

Comparing Assumption 10.1.1(v) with the corresponding condition in [345] (see (10.1.2)), we have to take

X′ := X, Kk := K, K̄ := K, Ly := L

in Assumption 10.1.1(v) and to consider (10.0.4) and (10.1.2) with T = F + Q + NK, Ξ = ∇h + εL − εF. Then the condition in [345] can be rewritten in the form

∃ γ0 > 0 :

〈F(y) + qK(y) − F(x) − qK(x), y − x〉 ≥ γ0‖F(y) − L(y) − F(x) + L(x)‖²

∀ x, y ∈ K ∩ D(Q), qK(·) ∈ Q(·) + NK(·). (10.1.3)

From here it can be seen that the operator F + Q must be monotone on K, and hence condition (10.1.3) is not valid for Example 10.1.2. However, one can give an "opposite" example, where (10.1.3) is satisfied but our Assumption 10.1.1(v) is violated. Also, it should be mentioned that, like (v), condition (10.1.3) is valid if F − L possesses the Dunn property and L is monotone.

Let us underline that the auxiliary operator in [345] cannot vary during the iteration process. This excludes some traditional applications of the APP, in particular Newton and quasi-Newton methods.

Now, the method suggested reads as follows:

10.1.3 Method. (Proximal Auxiliary Problem method (PAP-method))
Starting with x1 ∈ K, the sequence {xk} is defined by solving successively the auxiliary problems

(P kε ) find xk+1 ∈ Kk, qk(xk+1) ∈ Qk(xk+1) :

〈F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)), x − xk+1〉 ≥ −εk‖x − xk+1‖ ∀ x ∈ Kk,

where εk ↓ 0 and Lk := Ly|y=xk are given for k = 1, 2, .... ♦

In the sequel we are going to study the convergence of the PAP-method using Assumption 10.1.1 and the following conditions on the data of the auxiliary problems (P kε ).


10.1.4 Assumption. (Assumptions on the auxiliary problems)

(i) The operators Ly, y ∈ K̄, are monotone and Lipschitz continuous on K̄, with common Lipschitz constant lL;

(ii) h is a convex functional on X and the mapping ∇h is Lipschitz continuous on K̄, with Lipschitz constant lh;

(iii) for each k, it holds that Kk ∩ D(Qk) ≠ ∅ and

〈qk(x) − qk(y), x − y〉 ≥ 〈B(x − y), x − y〉 ∀ x, y ∈ Kk ∩ D(Qk), ∀ qk(·) ∈ Qk(·),

where B : X → X′ is a given linear, continuous and monotone operator with the symmetry property 〈Bx, y〉 = 〈By, x〉;

(iv) with given constants χ̄ > 0, m > 0, the inequality

(1/(2χ̄))〈B(x − y), x − y〉 + h(x) − h(y) − 〈∇h(y), x − y〉 ≥ m‖x − y‖²

is valid for all x, y ∈ K̄;

(v) the regularization parameters satisfy 0 < χ̲ ≤ χk ≤ χk+1 ≤ χ̄ < ∞ ∀ k;

(vi) for all k and y ∈ K̄, the operators Qk + Ly + NKk + χk∇h are maximal monotone. ♦

Assumptions 10.1.4 (iii)-(vi) and the monotonicity of Ly provide the unique solvability of the auxiliary problems (P k). Moreover, the usage of the functional h with the properties (iii), (iv) corresponds to weak regularization or regularization on a subspace in proximal methods (see Section 8.2.3) and ensures the same kind of convergence as in the case of a strongly convex functional h(x) = ‖x‖².

The next group of conditions defines the successive approximation of the pair (Q,K) in the auxiliary problems (P kε ).

10.1.5 Assumption. (Assumptions on the approximations)

(i) for each w ∈ D(Q) ∩ K, there exists a sequence {wk}, wk ∈ D(Qk) ∩ Kk, such that

limk→∞ ‖wk − w‖ = 0, limk→∞ infζ∈Qk(wk) ‖ζ − q(w)‖X′ = 0,

with q(w) ∈ Q(w) (in general, q(w) depends on {wk});

(ii) for some pair x∗ and q∗ solving VI(F,Q,K), there exist a constant α > 1 and a sequence {wk}, wk ∈ D(Qk) ∩ Kk, such that

limk→∞ k^α‖wk − x∗‖ = 0, limk→∞ k^α[infζ∈Qk(wk) ‖ζ − q∗‖X′] = 0;


(iii) with x∗, q∗ and α as in (ii), for any sequence {vk}, vk ∈ Kk ∩ D(Qk), the relation

limk→∞ k^α max{0, 〈F(x∗) + q∗, z(vk) − vk〉} = 0

is valid;

(iv) each weak limit point of an arbitrary sequence {vk}, vk ∈ Kk ∩ D(Qk), belongs to K ∩ D(Q). ♦

If χk ≡ χ ∈ [χ̲, χ̄] and G is a symmetric monotone operator such that Ly − χG (taken as Ly) satisfies Assumption 10.1.4(i), we have a chance to weaken the Assumptions 10.1.1(v) and 10.1.4(iv) by considering Ly − χG as Ly and h + g as h, where G = ∇g.

Assumption 10.1.4(vi) is valid under the Assumptions 10.1.4(i) and (ii) if each operator Qk is maximal monotone and locally bounded at some x ∈ clD(Qk) ∩ Kk. Regarding also Assumption 10.1.4(iii), Assumption 10.1.4(vi) is also valid if D(Qk) ⊃ Kk and Qk is hemicontinuous on Kk (this follows from the Theorems 1 and 3 in [350]). Other conditions ensuring Assumption 10.1.4(vi) can be derived from the results about the maximality of the sum of two monotone operators in [17, 57, 350].

In case F is a compact operator, instead of the strong convergence of {wk} to w in Assumption 10.1.5(i), one can require that wk ⇀ w. Of course, both Assumptions 10.1.5(iii) and (iv) are fulfilled if

k^α(z(vk) − vk) ⇀ 0

holds for any sequence {vk}, vk ∈ Kk.

The Assumptions 10.1.5(i)-(iv) about the simultaneous approximation of Q and K are mainly inspired by the error estimation technique in finite element methods for elliptic variational inequalities (see Appendix A2). Constructing a sequence {Kk}, Kk = Khk, by means of a finite element method on a sequence of triangulations with a triangulation parameter hk → 0, we meet the following rather typical situation (see [75, 134] for detailed results):

- with an arbitrary element v ∈ K and vk := arg min z∈Kk ‖v − z‖, the asymptotic relation limk→∞ ‖v − vk‖ = 0 is known without any practicable estimate for the rate of convergence;

- according to theorems on the regularity of solutions of elliptic problems, for an important class of variational inequalities the solutions possess a better regularity than arbitrary elements from K, and then for v ∈ SOL(Q,K)

‖v − vk‖ ≤ c(v) h_k^{β1}

is guaranteed with some β1 > 0;

- for an arbitrary sequence {wk}, wk ∈ Kk, the estimate

min v∈K ‖v − wk‖ ≤ c h_k^{β2}

holds with some β2 > 0.
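The second estimate above can be observed already in the simplest one-dimensional analogue. The following sketch uses hypothetical data and the sup-norm instead of a finite element energy norm: for a C² function v, the piecewise linear interpolation error on a uniform mesh of width h behaves like c(v)h², i.e. β1 = 2 in this setting.

```python
import numpy as np

# Piecewise linear interpolation of v(x) = sin(pi x) on a uniform mesh of width h:
# the sup-norm error decays like c(v) * h^2, so halving h divides it by about 4.
v = lambda x: np.sin(np.pi * x)
xfine = np.linspace(0.0, 1.0, 100_001)          # fine grid for measuring the error

def interp_error(h):
    nodes = np.arange(0.0, 1.0 + h / 2, h)      # mesh nodes 0, h, 2h, ..., 1
    return float(np.max(np.abs(v(xfine) - np.interp(xfine, nodes, v(nodes)))))

errs = [interp_error(h) for h in (1 / 8, 1 / 16, 1 / 32)]
ratios = [errs[i] / errs[i + 1] for i in range(2)]
print(ratios)
```

Each halving of h reduces the error by a factor close to 2² = 4, which is the numerical signature of the exponent β1 = 2.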


428 CHAPTER 10. THE AUXILIARY PROBLEM PRINCIPLE

If the operator Q is Lipschitz continuous on K and Qk := Q ∀ k is taken, Assumptions 10.1.5(i)-(iv) follow from here immediately (under an appropriate variation of the triangulation parameter hk). The maintenance of these assumptions in a more general case, especially with multi-valued operators Q, depends essentially on the special structure of the operator (see also Remark 10.2.1 b) below).

As previously mentioned, the case Qk := Q ∀ k and the one where Qk is a single-valued approximation of Q are of most interest for applications. In Appendix II of [225], we analyze the fulfillment of conditions on the simultaneous approximation of a (multi-valued, non-symmetric) operator Q and a set K for particular examples.

10.1.6 Remark. The Assumptions 10.1.4(ii)-(v) provide strong monotonicity of the operator Qk + χk∇h on Kk ∩ D(Qk), and together with Assumption 10.1.1(i) and Assumption 10.1.4(i), this yields strong monotonicity of

Qk + Lk + NKk + χk∇h.

Moreover, according to Assumption 10.1.4(vi), this operator is maximal monotone. Hence, for each k, Problem (P k) is uniquely solvable. ♦

With x∗ and q∗ solving VI(F,Q,K), we define

Γk(x∗, x) := χ〈B(x − x∗), x − x∗〉 + h(x∗) − h(x) − 〈∇h(x), x∗ − x〉 + (1/χk)〈F(x∗) + q∗, z(x) − x∗〉,  (10.1.4)

where z(x) := arg min v∈K ‖x − v‖. For x ∈ K, under Assumptions 10.1.4(iv) and (v) it holds

Γk(x∗, x) ≥ m‖x∗ − x‖²  and  Γk+1(x∗, x) ≤ Γk(x∗, x).  (10.1.5)

Again, the sequence {Γk} plays the role of a Ljapunov function in the further analysis.

10.1.7 Lemma. Let Assumptions 10.1.1(i), (v), (vi), Assumptions 10.1.4(i)-(vi) as well as Assumptions 10.1.5(ii), (iii) be fulfilled. Moreover, let

∑_{k=1}^∞ εk < ∞  and  1/(4γm) < χ, 2χχ̄ < 1.  (10.1.6)

Then the sequence {xk} generated by Method 10.1.3 is bounded, limk→∞ ‖xk+1 − xk‖ = 0, and the sequence {Γk(x∗, xk)} converges.

This statement can be proved by modifying the proof of Theorem 2.1 in [359] in the following way.

Proof: In the sequel, we make use of the following inequalities, which are valid for arbitrary a, b, x ∈ X, p ∈ X′ and ε > 0:

〈p, a〉 ≤ (1/(2ε))‖p‖²X′ + (ε/2)‖a‖²,  (10.1.7)


10.1. EXTENDED AUXILIARY PROBLEM PRINCIPLE 429

〈B(a − b), a − b〉 ≤ (1 + ε)〈B(a − x), a − x〉 + ((1 + ε)/ε)〈B(b − x), b − x〉.  (10.1.8)

In order to estimate Γk+1(x∗, xk+1) − Γk(x∗, xk), where Γk is defined by (10.1.4), we obtain from (10.1.5) that

Γk+1(x∗, xk+1) − Γk(x∗, xk) ≤ Γk(x∗, xk+1) − Γk(x∗, xk)

(the existence of x∗ and xk for all k is guaranteed by Assumption 10.1.1(vi) and Remark 10.1.6). Denoting zl := arg min v∈K ‖xl − v‖, the right-hand side of this inequality can be decomposed as follows:

Γk(x∗, xk+1) − Γk(x∗, xk) = s1 + s2 + s3 + s4,

with

s1 := h(xk) − h(xk+1) + 〈∇h(xk), xk+1 − xk〉,  (10.1.9)

s2 := 〈∇h(xk) − ∇h(xk+1), wk − xk+1〉 + (1/χk)〈F(x∗) + q∗, zk+1 − zk〉,

s3 := 〈∇h(xk) − ∇h(xk+1), x∗ − wk〉

and

s4 := χ〈B(xk+1 − x∗), xk+1 − x∗〉 − χ〈B(xk − x∗), xk − x∗〉.  (10.1.10)

We suppose that the sequence {wk} satisfies Assumption 10.1.5(ii). Then Assumption 10.1.4(ii) and relation (10.1.7) yield

s3 ≤ (τ/2)‖xk+1 − xk‖² + (1/(2τ)) lh² ‖x∗ − wk‖²,  (10.1.11)

with an arbitrary τ > 0. Setting x := wk in Problem (P kε ), we obtain

s2 ≤ (1/χk)〈F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk), wk − xk+1〉 + (1/χk)〈F(x∗) + q∗, zk+1 − zk〉 + (εk/χk)‖wk − xk+1‖,

and due to Assumption 10.1.4(iii), for an arbitrarily chosen qk(wk) ∈ Qk(wk), it holds

χk s2 ≤ 〈F(xk) + qk(wk) + Lk(xk+1) − Lk(xk), wk − xk+1〉
  − 〈B(wk − xk+1), wk − xk+1〉 + 〈F(x∗) + q∗, zk+1 − zk〉 + εk‖wk − xk+1‖
= 〈F(xk) − F(x∗) − Lk(xk) + Lk(x∗), x∗ − xk〉 + 〈F(x∗) + q∗, x∗ − zk〉
  + 〈F(xk) − F(x∗) − Lk(xk) + Lk(x∗), wk − x∗ + xk − xk+1〉
  + 〈qk(wk) − q∗, x∗ − xk+1 + wk − x∗〉 + 〈F(x∗) + q∗, wk − x∗〉
  + 〈F(x∗) + q∗, zk+1 − xk+1〉 − 〈B(wk − xk+1), wk − xk+1〉
  + 〈Lk(x∗) − Lk(xk+1), xk+1 − wk〉 + εk‖wk − xk+1‖.  (10.1.12)


With Assumption 10.1.1(v) one can continue:

χk s2 ≤ − 〈B(wk − xk+1), wk − xk+1〉 − γ‖F(xk) − Lk(xk) − F(x∗) + Lk(x∗)‖²X′
  + 〈qk(wk) − q∗, x∗ − xk+1 + wk − x∗〉
  + 〈F(xk) − F(x∗) − Lk(xk) + Lk(x∗), wk − x∗ + xk − xk+1〉
  + 〈F(x∗) + q∗, wk − x∗〉 + max[0, 〈F(x∗) + q∗, zk+1 − xk+1〉]
  + 〈Lk(x∗) − Lk(xk+1), xk+1 − wk〉 + εk‖wk − xk+1‖.  (10.1.13)

Due to the monotonicity of Lk,

〈Lk(x∗) − Lk(xk+1), xk+1 − wk〉 ≤ 〈Lk(xk+1) − Lk(x∗), wk − x∗〉

is valid. Now, using the inequalities (10.1.7), (10.1.8) together with Assumption 10.1.4(i) (in order to estimate the right-hand side of (10.1.13) as well as the term 〈Lk(xk+1) − Lk(x∗), wk − x∗〉), we obtain

χk s2 ≤ ((μ + η − 2γ)/2)‖F(xk) − Lk(xk) − F(x∗) + Lk(x∗)‖²X′
  + (1/(2μ))‖xk − xk+1‖² + (1/(2η))‖wk − x∗‖²
  − (1/(1 + ε))〈B(xk+1 − x∗), xk+1 − x∗〉 + (1/ε)〈B(wk − x∗), wk − x∗〉
  + ‖F(x∗) + q∗‖X′ ‖wk − x∗‖ + lLθk‖wk − x∗‖²
  + max[0, 〈F(x∗) + q∗, zk+1 − xk+1〉] + εk‖wk − x∗‖ + εk(1/(4λ) + λ‖xk+1 − x∗‖²)
  + lLθk‖xk+1 − x∗‖² + (1/(2lLθk) + θk/(2lL))‖qk(wk) − q∗‖²X′,  (10.1.14)

with arbitrary positive μ, η, θk, λ and ε. Choosing

ε := 1/(2χχ̄) − 1,  μ ∈ (1/(2χm), 2γ),  τ := m − 1/(2χμ),  η ∈ (0, 2γ − μ),  (10.1.15)

one can conclude from Assumption 10.1.4(iv), (v) and relations (10.1.6) and (10.1.8) that

(χ − 1/((1 + ε)χk))〈B(xk+1 − x∗), xk+1 − x∗〉 − χ〈B(xk − x∗), xk − x∗〉
  − h(xk+1) + h(xk) + 〈∇h(xk), xk+1 − xk〉 + (1/(2χkμ) + τ/2)‖xk+1 − xk‖²

≤ −χ[〈B(xk+1 − x∗), xk+1 − x∗〉 + 〈B(xk − x∗), xk − x∗〉]
  − h(xk+1) + h(xk) + 〈∇h(xk), xk+1 − xk〉 + (1/(2χkμ) + m/2 − 1/(4χμ))‖xk+1 − xk‖²

≤ −(χ/2)〈B(xk+1 − xk), xk+1 − xk〉 − h(xk+1) + h(xk) + 〈∇h(xk), xk+1 − xk〉
  + (m/2 + 1/(4χμ))‖xk+1 − xk‖²

≤ (−m/2 + 1/(4χμ))‖xk+1 − xk‖²,  (10.1.16)

and that −m/2 + 1/(4χμ) < 0. Now, we sum up (10.1.9), (10.1.10) and the estimates for s2, s3 in (10.1.11), (10.1.14), and insert (10.1.15) and (10.1.16) into this sum. This yields

Γk+1(x∗, xk+1) − Γk(x∗, xk) ≤ (−m/2 + 1/(4χμ))‖xk+1 − xk‖²
  + (1/χk)((μ + η − 2γ)/2)‖F(xk) − Lk(xk) − F(x∗) + Lk(x∗)‖²X′
  + (1/χk)[(1/(2η) + lLθk + χ̄lh²/(2τ))‖x∗ − wk‖² + lLθk‖xk+1 − x∗‖²
  + (1/(2lLθk) + θk/(2lL))‖qk(wk) − q∗‖²X′
  + (1/ε)〈B(wk − x∗), wk − x∗〉 + ‖F(x∗) + q∗‖X′ ‖wk − x∗‖]
  + max[0, 〈F(x∗) + q∗, zk+1 − xk+1〉]
  + (εk/χk)‖wk − x∗‖ + (εk/χk)(1/(4λ) + λ‖xk+1 − x∗‖²).  (10.1.17)

But, from the first inequality in (10.1.5) one gets

‖xk+1 − x∗‖² ≤ (1/m)Γk+1(x∗, xk+1)


and (10.1.17) leads to

(1 − lLθk/(χm) − λεk/(χm))Γk+1(x∗, xk+1)
≤ Γk(x∗, xk) + (1/χ)[(1/(2η) + lLθk + χ̄lh²/(2τ))‖wk − x∗‖²
  + (1/(2lLθk) + θk/(2lL))‖qk(wk) − q∗‖²X′
  + (1/ε)〈B(wk − x∗), wk − x∗〉 + ‖F(x∗) + q∗‖X′ ‖wk − x∗‖
  + max[0, 〈F(x∗) + q∗, zk+1 − xk+1〉]]
  + (−m/2 + 1/(4χμ))‖xk+1 − xk‖² + (εk/χ)‖wk − x∗‖ + εk/(4λχ).  (10.1.18)

Choose λ ∈ (0, χm/(2 supk εk + 1)) and set θk := θk^{−α}, where θ ∈ (0, χm/(2lL)) and α > 1 is the same as in Assumption 10.1.5(ii). Then, with d0 := lLθ/(χm) and d1 := λ/(χm), we obtain that 1 − d0k^{−α} − d1εk > 1/2 − d0 > 0, and with regard to −m/2 + 1/(4χμ) < 0, one can conclude for k = 1, 2, ... that

Γk+1(x∗, xk+1) ≤ (1/(1 − d0k^{−α} − d1εk))Γk(x∗, xk)
  + (1/(χ(1/2 − d0)))[(1/(2η) + lLθk^{−α} + χ̄lh²/(2τ))‖wk − x∗‖²
  + (k^α/(2lLθ) + θ/(2lL))‖qk(wk) − q∗‖²X′
  + (1/ε)〈B(wk − x∗), wk − x∗〉 + ‖F(x∗) + q∗‖X′ ‖wk − x∗‖
  + max[0, 〈F(x∗) + q∗, zk − xk〉] + εk‖wk − x∗‖ + εk/(4λ)].  (10.1.19)

Taking into account that qk(wk) is an arbitrary element of Qk(wk), in view of

1/(1 − d0k^{−α} − d1εk) ≤ 1 + (d0k^{−α} + d1εk)/(1/2 − d0),  ∑_{k=1}^∞ εk < ∞,

and the Assumptions 10.1.5(ii) and (iii), convergence of the sequence {Γk(x∗, xk)} follows from Lemma A3.1.7, and the first inequality in (10.1.5) ensures the boundedness of the sequence {xk}. Passing to the limit in (10.1.18), one can immediately conclude that limk→∞ ‖xk+1 − xk‖ = 0.

10.1.8 Lemma. Let Assumptions 10.1.1(i)-(iv), (vi), Assumptions 10.1.4(i)-(iii) and Assumptions 10.1.5(i), (iv) be fulfilled, and let 0 < χk ≤ χ̄ hold for all k. Moreover, let the sequence {xk} generated by Method 10.1.3 be bounded and limk→∞ ‖xk+1 − xk‖ = 0. Then each weak limit point of {xk} is a solution of VI(F,Q,K).

Proof: Let S ⊂ {1, 2, ...}, let x be an arbitrary weak limit point of {xk}, and let {xk}k∈S converge weakly to x. Due to Assumption 10.1.5(iv), x belongs to K ∩ D(Q). Since limk→∞ ‖xk+1 − xk‖ = 0, one gets xk+1 ⇀ x for k ∈ S, k → ∞. According to Assumption 10.1.5(i), for each y ∈ D(Q) ∩ K one can choose a sequence {yk}, yk ∈ D(Qk) ∩ Kk, such that limk→∞ ‖yk − y‖ = 0, and

limk→∞ ‖qk(yk) − q(y)‖X′ = 0  (10.1.20)

holds true with some qk(yk) ∈ Qk(yk) and q(y) ∈ Q(y). By definition of xk+1, for suitably chosen qk(xk+1) ∈ Qk(xk+1),

〈F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)), yk − xk+1〉 ≥ −εk‖yk − xk+1‖

is valid, and the monotonicity of Qk (see Assumption 10.1.4(iii)) leads to

〈F(xk) + qk(yk) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)), yk − xk+1〉 ≥ −εk‖yk − xk+1‖.  (10.1.21)

From the identity

〈F(xk), yk − xk+1〉 = 〈F(xk), yk − y〉 + 〈F(xk), xk − xk+1〉 + 〈F(xk), y〉 − 〈F(xk), xk〉,

using the relations

limk→∞ ‖yk − y‖ = 0,  xk ⇀ x (k ∈ S),  limk→∞ ‖xk+1 − xk‖ = 0  (10.1.22)

and Assumption 10.1.1(iv), we obtain that

lim sup k∈S 〈F(xk), yk − xk+1〉 ≤ 〈F(x), y〉 + lim sup k∈S 〈F(xk), −xk〉 = 〈F(x), y〉 − lim inf k∈S 〈F(xk), xk〉 ≤ 〈F(x), y − x〉.  (10.1.23)

Now, passing to the limit for k ∈ S in (10.1.21), the relations (10.1.20), (10.1.23), 0 < χk ≤ χ̄, limk→∞ εk = 0 and the Assumptions 10.1.4(i), (ii) imply

〈F(x) + q(y), y − x〉 ≥ 0.

Moreover, due to Assumptions 10.1.1(ii), (iii), the operators

Q0 : y ↦ F(x) + Q(y)

and Q0 + NK are maximal monotone, and the operator

QK : y ↦ { Q0(y) if y ∈ K; ∅ otherwise }

is locally hemi-bounded at each point of K. Thus, the application of Lemma 9.3.8 with u = x, C = K and A0 = Q0 yields

〈F(x) + q(x), y − x〉 ≥ 0 ∀ y ∈ K,

where q(x) ∈ Q(x). Hence, x is a solution of VI(F,Q,K).


10.1.9 Lemma. Let S ⊂ {1, 2, ...}, assume that the sequence {vk}k∈S, vk ∈ Kk, converges weakly to some x∗ ∈ SOL(F,Q,K) and

limk∈S max[0, 〈F(x∗) + q∗, z(vk) − vk〉] = 0.  (10.1.24)

Then

limk∈S 〈F(x∗) + q∗, z(vk) − x∗〉 = 0  (10.1.25)

holds true.

Proof: Due to x∗ ∈ SOL(F,Q,K) and z(vk) ∈ K, one gets

〈F(x∗) + q∗, z(vk) − x∗〉 ≥ 0 ∀ k ∈ S.  (10.1.26)

The sequence {vk}k∈S, being weakly convergent, is bounded, and hence both sequences {z(vk)}k∈S and {z(vk) − vk}k∈S are bounded, too. Taking S0 ⊂ S such that the subsequence {z(vk)}k∈S0 is weakly convergent, we obtain from (10.1.26)

0 ≤ limk∈S0 〈F(x∗) + q∗, z(vk) − vk〉 + limk∈S0 〈F(x∗) + q∗, vk − x∗〉 = limk∈S0 〈F(x∗) + q∗, z(vk) − vk〉.

If

limk∈S0 〈F(x∗) + q∗, z(vk) − vk〉 > 0,

then obviously

0 < limk∈S0 max[0, 〈F(x∗) + q∗, z(vk) − vk〉],

but this contradicts (10.1.24). Hence

limk∈S0 〈F(x∗) + q∗, z(vk) − vk〉 = 0

must hold, and the relations

limk∈S 〈F(x∗) + q∗, z(vk) − vk〉 = 0

and (10.1.25) follow immediately.

Condition (10.1.24) is obviously guaranteed if Assumption 10.1.5(iii) is fulfilled.

10.1.10 Theorem. Let the Assumptions 10.1.1 and 10.1.4 and condition (10.1.6) be fulfilled, and let ∑_{k=1}^∞ εk < ∞. Then the following conclusions are true:

(i) Problem (P k) is uniquely solvable for each k, the sequence {xk} generated by Method 10.1.3 is bounded, and each weak limit point of {xk} is a solution of VI(F,Q,K);

(ii) if, in addition, the Assumptions 10.1.5(ii), (iii) (with common α > 1) are valid for all x ∈ SOL(F,Q,K) and

zk ⇀ z in X, zk ∈ Kk  ⟹  ∇h(zk) ⇀ ∇h(z) in X′,  (10.1.27)

then the whole sequence {xk} converges weakly to a solution of VI(F,Q,K);


(iii) if, moreover,

limk→∞ h(xk) = h(x∗)  (10.1.28)

holds with x∗ ∈ SOL(F,Q,K), then {xk} converges strongly to x∗ ∈ SOL(F,Q,K).

Proof: Conclusion (i) follows immediately from the Lemmata 10.1.7, 10.1.8 and Remark 10.1.6. To prove (ii), suppose that {xk}k∈S1 and {xk}k∈S2 are two subsequences converging weakly to x̄ and x̂, respectively. Then, according to (i), x̄ and x̂ belong to SOL(F,Q,K), and because the Assumptions 10.1.5(ii) and (iii) are valid for each x ∈ SOL(F,Q,K), Lemma 10.1.7 ensures that the sequences {Γk(x̄, xk)}_{k=1}^∞ and {Γk(x̂, xk)}_{k=1}^∞ are convergent (let us remind that the choice of q∗(·) in Γk satisfies Assumption 10.1.5(ii)).
Due to x̄ ∈ SOL(F,Q,K), z(x) ∈ K and the definition of q∗(x̄), the inequality

〈F(x̄) + q∗(x̄), z(x) − x̄〉 ≥ 0

holds true, whereas the monotonicity of B and Assumption 10.1.4(iv) provide that

h(x̄) − h(x̂) − 〈∇h(x̂), x̄ − x̂〉 + χ〈B(x̄ − x̂), x̄ − x̂〉 ≥ m‖x̄ − x̂‖².

From these inequalities and the symmetry property of the operator B, we obtain for x ∈ K

Γk(x̄, x) − Γk(x̂, x)
= (h(x̄) − h(x̂) − 〈∇h(x̂), x̄ − x̂〉) + 〈∇h(x̂) − ∇h(x), x̄ − x̂〉
  + (1/χk)〈F(x̄) + q∗(x̄), z(x) − x̄〉 − (1/χk)〈F(x̂) + q∗(x̂), z(x) − x̂〉
  + χ〈B(x̄ − x̂), x̄ − x̂〉 + 2χ〈B(x̄ − x̂), x̂ − x〉
≥ m‖x̄ − x̂‖² + 〈∇h(x̂) − ∇h(x), x̄ − x̂〉 − (1/χk)〈F(x̂) + q∗(x̂), z(x) − x̂〉 + 2χ〈B(x̄ − x̂), x̂ − x〉.  (10.1.29)

Inserting x := xk in (10.1.29) and passing to the limit for k ∈ S2, one can conclude from (10.1.27), Assumption 10.1.4(v), Assumption 10.1.5(iii) and Lemma 10.1.9 that

γ̄ − γ̂ ≥ m‖x̄ − x̂‖²,

where γ̄ := limk→∞ Γk(x̄, xk), γ̂ := limk→∞ Γk(x̂, xk). Obviously, in the same way the "symmetric" inequality

γ̂ − γ̄ ≥ m‖x̄ − x̂‖²

can be concluded, and therefore x̄ = x̂ is valid, proving the uniqueness of the weak limit point of {xk}.
Denoting this limit point by x∗, we now suppose additionally that relation (10.1.28) is fulfilled. Choosing wk according to Assumption 10.1.5(ii), qk(xk+1) as in Problem (P kε ) and an arbitrary qk(wk) ∈ Qk(wk), then Assumption


10.1.4(iii) yields

〈B(xk+1 − x∗), xk+1 − x∗〉
= 〈B(xk+1 − wk), xk+1 − wk〉 − 〈B(x∗ − wk), xk+1 − x∗〉 − 〈B(xk+1 − wk), x∗ − wk〉
≤ 〈qk(xk+1) − qk(wk), xk+1 − wk〉 − 〈B(xk+1 − x∗), x∗ − wk〉 − 〈B(xk+1 − wk), x∗ − wk〉.  (10.1.30)

To estimate the term 〈qk(xk+1), xk+1 − wk〉, we use Problem (P kε ). Together with (10.1.30) this gives

〈B(xk+1 − x∗), xk+1 − x∗〉 ≤ 〈F(xk) − F(x∗), wk − xk+1〉
  + χk〈∇h(xk+1) − ∇h(xk), wk − xk+1〉 + 〈Lk(xk+1) − Lk(xk), wk − xk+1〉
  + 〈F(x∗), wk − x∗〉 + 〈F(x∗), x∗ − xk+1〉 + 〈qk(wk) − q∗(x∗), wk − xk+1〉
  + 〈q∗(x∗), wk − x∗〉 + 〈q∗(x∗), x∗ − xk+1〉
  + 〈Bx∗ + Bwk − 2Bxk+1, x∗ − wk〉 + εk‖wk − xk+1‖.  (10.1.31)

Taking into account that the sequences {wk} and {xk} are bounded, one can derive that on the right-hand side of inequality (10.1.31) all terms except for the first one tend to 0 as k → ∞. Indeed, term 2 does so in view of Assumptions 10.1.4(ii), (v) and ‖xk+1 − xk‖ → 0; term 3 because of Assumption 10.1.4(i) and ‖xk+1 − xk‖ → 0; the terms 4, 6 and 7 due to Assumption 10.1.5(ii); the terms 5 and 8 owing to xk ⇀ x∗; term 9 on account of Assumption 10.1.5(ii) and the boundedness of B; and finally term 10 by εk → 0. But, due to Assumption 10.1.1(iv),

lim sup k→∞ 〈F(xk) − F(x∗), x∗ − xk〉
= limk→∞ 〈F(xk), x∗〉 − 〈F(x∗), x∗〉 + limk→∞ 〈F(x∗), xk〉 + lim sup k→∞ 〈F(xk), −xk〉
= 〈F(x∗), x∗〉 − lim inf k→∞ 〈F(xk), xk〉 ≤ 0,  (10.1.32)

and using the identity

〈F(xk) − F(x∗), wk − xk+1〉 = 〈F(xk) − F(x∗), wk − x∗ + xk − xk+1〉 + 〈F(xk) − F(x∗), x∗ − xk〉,

the inequality

lim sup k→∞ 〈F(xk) − F(x∗), wk − xk+1〉 ≤ 0  (10.1.33)

follows immediately from xk ⇀ x∗, ‖xk − xk+1‖ → 0, (10.1.32), Assumption 10.1.1(iv) and Assumption 10.1.5(ii). Now, the relation

limk→∞ 〈B(xk+1 − x∗), xk+1 − x∗〉 = 0

can be deduced from (10.1.31). Together with (10.1.28) and Assumption 10.1.4(iv) this provides conclusion (iii).


10.2. MODIFICATIONS AND APPLICATIONS OF THE PAP-METHOD 437

10.1.11 Remark. Obviously, the conditions (10.1.6) used in Lemma 10.1.7 and in Theorem 10.1.10 are compatible if and only if 2γm > χ̄. But they are certainly compatible, for instance, if the regularizing functional h is strongly convex. In this case, with m the modulus of strong convexity of h, an arbitrarily small χ > 0 is appropriate in Assumption 10.1.4(iv). Also, the inequality 2mγ > χ̄ can always be satisfied if F is a monotone operator and we deal with proximal-like methods, which correspond formally to the PAP-method by setting Qk := Qk + F, F := 0, Lk := 0 (of course, h has to obey Assumption 10.1.4(iv)). In this case, Assumption 10.1.1(v) is valid for arbitrarily large γ.
Working with conditions which are similar to Assumption 10.1.4(iii), (iv), proximal-like methods with weak regularization and regularization on a subspace have been developed in Chapters 8 and 7 for solving problems in elasticity theory and optimal control.
In general, one should choose the constants χ and m such that Assumption 10.1.4(iv) is satisfied and the ratio χ/m is as small as possible. ♦

10.1.12 Remark. If xk+1 ∈ Kk, qk(xk+1) ∈ Qk(xk+1) and a residual vector rk ∈ X′ satisfy

F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)) + NKk(xk+1) ∋ rk,  (10.1.34)

with ‖rk‖X′ ≤ εk, then the straightforward use of the definition of the normality operator shows that xk+1 is a solution of Problem (P kε ). This permits us to apply the convergence analysis of this section to methods working with summable error criteria (10.1.34) of Eckstein's type (cf. [97]). ♦

10.2 Modifications and Applications of the PAP-method

We now consider extensions of the PAP-method 10.1.3 to different types of variational inequalities.

10.2.1 Extension of PAP-method to different types of VI's

For the hemi-variational inequality

HVI(F,Q, f,X)  find x∗ ∈ X, q(x∗) ∈ Q(x∗) :

〈F(x∗) + q(x∗), x − x∗〉 + f(x) − f(x∗) ≥ 0 ∀ x ∈ X,

where f : X → R̄ ≡ R ∪ {+∞} is a convex lsc functional and the operators Q + ∂f (instead of Q) and F satisfy Assumption 10.1.1 with K = dom f, the studied scheme can be directly applied if we construct the auxiliary problems (P kε ) derived from the equivalent variational inequality:

VI(F,Q, dom f)  find x∗ ∈ dom f, s∗(x∗) ∈ Q(x∗) + ∂f(x∗) :

〈F(x∗) + s∗(x∗), x − x∗〉 ≥ 0 ∀ x ∈ dom f.


However, one can handle more convenient requirements for the approximation of f if the auxiliary problems have the form

(P̄ kε )  find xk+1 ∈ X, qk(xk+1) ∈ Qk(xk+1) :

〈F(xk) + qk(xk+1) + Lk(xk+1) − Lk(xk) + χk(∇h(xk+1) − ∇h(xk)), x − xk+1〉 + fk(x) − fk(xk+1) ≥ −εk‖x − xk+1‖ ∀ x ∈ X,

where fk : X → R̄ is a convex lsc functional. In this case, merging the convergence analysis from Salmon, Nguyen and Strodiot in [359] and in Subsection 10.1.2, Theorem 10.1.10 can be proved under the following modifications of the Assumptions 10.1.1, 10.1.4 and 10.1.5:

- in the Assumptions 10.1.1 and 10.1.4(i), (ii), (iv) replace K and K̄ by dom f, and NK by ∂f, respectively;

- Assumption 10.1.1(v)′: for y ∈ dom f, Ly is monotone on dom f and

〈F(y) − Ly(y) + Ly(x) + q(x), y − x〉 + f(y) − f(x) ≥ γ‖F(y) − Ly(y) − F(x) + Ly(x)‖²X′  (γ > 0 a constant)

holds true, whenever x ∈ D(Q) ∩ dom f and

〈F(x) + q(x), v − x〉 + f(v) − f(x) ≥ 0 ∀ v ∈ dom f;

- Assumption 10.1.4(iii)′:

〈qk(x) − qk(y), x − y〉 + fk(x) − fk(y) − 〈gk(y), x − y〉 ≥ 〈B(x − y), x − y〉
∀ x, y ∈ D(Qk) ∩ D(∂fk), ∀ qk(·) ∈ Qk(·), ∀ gk(y) ∈ ∂fk(y),

with B as in the former Assumption 10.1.4(iii);

- Assumption 10.1.4(vi)′: for all k and y ∈ dom f, the operators Qk + Ly + ∂fk + χk∇h are maximal monotone;

- Assumption 10.1.5(i)′: fk ≥ f, D(Qk) ∩ dom fk ⊂ D(Q) ∩ dom f, and for each w ∈ D(Q) ∩ dom f there exists a sequence {wk}, wk ∈ D(Qk) ∩ dom fk, such that

limk→∞ ‖wk − w‖ = 0,  limk→∞ fk(wk) = f(w),  limk→∞ inf ζ∈Qk(wk) ‖ζ − q(w)‖X′ = 0,

with q(w) ∈ Q(w);


- Assumption 10.1.5(ii)′: for some solution x∗ of HVI(F,Q, f,X) there exist a constant α > 1 and a sequence {wk}, wk ∈ D(Qk) ∩ dom fk, such that

limk→∞ k^α ‖wk − x∗‖ = 0,
limk→∞ k^α [inf ζ∈Qk(wk) ‖ζ − q(x∗)‖X′] = 0,
limk→∞ k^α max[fk(wk) − f(x∗), 0] = 0.

Here q(x∗) ∈ Q(x∗) has to satisfy

〈F(x∗) + q(x∗), x − x∗〉 + f(x) − f(x∗) ≥ 0 ∀ x ∈ dom f;

- the Assumptions 10.1.5(iii) and (iv) must be skipped.

10.2.1 Remark.

a) The version (10.1.1) of the APP-method developed in [359] deals with a finite-dimensional HVI(F,Q, f,X) where Q = 0. In that case Assumption 10.1.1(v)′ coincides with the condition linking F, L and f in the mentioned paper, see Theorem 2.1 there. Moreover, our convergence result for process (P̄ kε ) covers this theorem for method (10.1.1) with hk ≡ h.

b) Of course, the real possibilities to satisfy the conditions 10.1.5(i), (ii) or (i)′, (ii)′, respectively, are connected with mutual properties of Q and K or Q and f. The general way based on the Moreau-Yosida approximation is very expensive. However, the use of this approximation within a special algorithm of the APP in [101] and the proximal method in [298] seems to be promising. If Qk ≡ Q and Kk ⊃ K, the Assumptions 10.1.5(i), (ii) are automatically fulfilled.
A couple of problems in mathematical physics, for instance the problem of linear elasticity with friction, the Bingham problem (cf. [95, 134]), etc., take the form HVI(F,Q, f,X) with a single-valued operator Q. In this case a uniform approximation of f by a sequence of differentiable functionals is possible (see [134, 219, 316]) and there are no serious difficulties in satisfying Assumptions 10.1.5(i)′, (ii)′. ♦

A simultaneous approximation of X and f by a sequence of subspaces {Xk} and a sequence of functionals {f̃k} (in particular, with better smoothness properties) can be inserted in the scheme above by setting fk = f̃k + ind(·|Xk) in (P̄ kε ).

It should also be noted that VI(F,Q,K) in Section 10.1.1 takes the form of HVI(F,Q, f,X) with f = ind(·|K). The auxiliary problems (P kε ) are obtained from (P̄ kε ) by setting fk = ind(·|Kk). But the condition fk ≥ f in Assumption 10.1.5(i)′ allows us to consider only the case Kk ⊂ K.


10.2.2 Extension of PAP-method to inclusions

Theorem 10.1.10 can be extended to methods treating IP(T , X) by the use of the relationship

0 ∈ T(x) ⟺ x ∈ K and ∃ t(x) ∈ T(x) : 〈t(x), y − x〉 ≥ 0 ∀ y ∈ K,

where K is an arbitrary convex closed set such that K ⊃ D(T).
So, scheme (10.0.3), developed for this inclusion by Renaud and Cohen in [345], can be considered as a particular realization of Method 10.1.3 if we split the operator T into a sum F + Q maintaining the conditions for the operators F and Q in Assumption 10.1.1 (with a convex set K̄, int K̄ ⊃ D(Q) and K = K̄). Indeed, (10.0.3) corresponds to (P kε ) if one takes

X′ := X, Kk := K, Qk := Q, ∇h := Ξ1, Lk := Ξ2 + F, χk := 1/ε, εk := 0.

A partial case of scheme (10.0.3) with Ξ1 = I and Ξ2 = −F is known as the Lions-Mercier splitting algorithm (cf. [273]):

xk+1 = (I + εQ)^{−1}(I − εF)(xk).  (10.2.1)

Its convergence analysis can be found in [125, 319]. A straightforward adaptation of Theorem 10.1.10 yields new convergence results for the method (10.0.3) as well as for the Lions-Mercier algorithm.
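As a numerical illustration of (10.2.1), the following sketch uses an assumed special case, not an example from the text: Q = N_C, the normal cone of a box C (so the resolvent (I + εQ)⁻¹ is the metric projection onto C), and F = ∇φ for a strongly convex quadratic φ. The splitting step then reduces to a projected gradient step.

```python
import numpy as np

# Illustrative finite-dimensional instance of (10.2.1):
#   F(x) = A x - b  (gradient of a strongly convex quadratic),
#   Q    = N_C, the normal cone of the box C = [0,1]^n,
# so the resolvent (I + eps*Q)^{-1} is the metric projection onto C.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # symmetric positive definite => F strongly monotone
b = rng.standard_normal(n)
lo, hi = np.zeros(n), np.ones(n)

def F(x):
    return A @ x - b               # single-valued, Lipschitz-continuous part

def resolvent(y):
    return np.clip(y, lo, hi)      # (I + eps*N_C)^{-1} = projection onto the box

eps = 1.0 / np.linalg.norm(A, 2)   # step size 1/L, L the Lipschitz constant of F
x = np.zeros(n)
for _ in range(2000):
    x = resolvent(x - eps * F(x))  # x^{k+1} = (I + eps*Q)^{-1}(I - eps*F)(x^k)

# A fixed point solves the VI: <F(x), y - x> >= 0 for all y in C, which is
# equivalent to a vanishing projected-gradient residual.
residual = np.linalg.norm(x - resolvent(x - eps * F(x)))
print(residual)
```

For a general maximal monotone Q the projection is replaced by the corresponding resolvent; the strong monotonicity of F is what drives the contraction here.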

On the basis of Remark 10.1.12, Theorem 10.1.10 can be extended also to the inexact version of method (10.0.3):

find xk+1 ∈ X : T(xk+1) + (1/ε)[Ξ(xk+1) − Ξ(xk)] ∋ rk,

with residual vectors rk such that ∑_{k=1}^∞ ‖rk‖ < ∞. In [345] convergence has been studied for the exact method only.
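The effect of a summable error criterion can be sketched numerically in an assumed finite-dimensional setting (hypothetical data; Q taken as the normal cone of a box, so the resolvent is a projection): perturbing every step by a residual rk with ‖rk‖ = O(k⁻²), hence ∑‖rk‖ < ∞, leaves the computed limit essentially unchanged.

```python
import numpy as np

# Exact vs. inexact splitting iterations under a summable residual criterion
# (illustration only; F(x) = A x - b is strongly monotone, Q = N_{[0,1]^n}).
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
eps = 1.0 / np.linalg.norm(A, 2)

def proj(y):                        # resolvent of the normal-cone operator
    return np.clip(y, 0.0, 1.0)

def run(perturbed, iters=3000):
    x = np.full(n, 0.5)
    for k in range(1, iters + 1):
        y = x - eps * (A @ x - b)
        if perturbed:               # residual r^k with ||r^k|| = O(k^{-2}), summable
            y = y + 1e-2 * rng.standard_normal(n) / k**2
        x = proj(y)
    return x

x_exact = run(False)
x_inexact = run(True)
print(np.linalg.norm(x_exact - x_inexact))
```

The early (large) residuals are damped by the contraction, and the late residuals are already tiny, which is the intuition behind requiring ∑‖rk‖ < ∞ rather than εk → 0 alone.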

10.2.3 Decomposition properties of PAP-method

Due to the splitting of the main operator in VI(F,Q,K) into a sum F + Q, certain decomposition properties are already inherent in the auxiliary problem (P kε ), where xk+1 is calculated with F fixed at the point xk. Such a splitting can be motivated by several reasons, in particular:

- traditional APP-schemes usually assume that the operator in the variational inequality is single-valued, whereas, applying proximal methods, the operator is supposed to be monotone but not necessarily single-valued (concerning relaxations of these conditions see [79], Sect. 3, and [220, 379]). The class of problems admitting the use of Method 10.1.3 is wider, including variational inequalities with an operator F + Q whose "geometrical" properties are defined by Assumptions 10.1.1(ii), (v);


10.3. COMMENTS 441

- for some variational inequalities, the problems (P kε ) can be solved more easily than the auxiliary problems arising in proximal methods. This fact was the motivation for the Lions-Mercier algorithm (10.2.1) for Problem (10.0.3), although the conditions assumed for the operator T in [273, 125] permit a straightforward use of the proximal point method, too.

Concerning traditional applications of the APP to decomposition and linear approximation methods, we point out the papers [78, 223, 282] for general concepts and [101, 317, 400, 401] for special algorithms which can be incorporated into the APP.

10.3 Comments

Let us come back to the weakening of the standard (for the APP) assumption of strong convexity of the auxiliary function h in the Problems (P kε ): we supposed only that h is convex and that the operators Qk + ∇h are strongly monotone with a common modulus for all k. This enables us to extend the APP-scheme to proximal-like methods with weak regularization and regularization on a subspace (cf. Section 8.2.3), as well as to methods using Bregman functions (see Section 9.3), if the chosen Bregman function ensures the mentioned strong monotonicity of Qk + ∇h. In the latter case, however, substantial modifications in the basic assumptions as well as in the convergence analysis are needed. For instance, if Kk := K, K̄ := K and h is a Bregman function with a zone S = int K, complications emerge due to the non-differentiability of h on the boundary of K. This kind of convergence analysis is carried out in the papers [227, 230].
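A concrete, standard instance of a Bregman function with zone S = int K can be sketched as follows (an illustration with hypothetical data, not an algorithm from the text): take K the unit simplex and h the negative entropy. The Bregman proximal step for a linear cost then has a closed-form multiplicative update that keeps every iterate in int K, while ∇h becomes unbounded on the boundary of K, which is precisely the boundary complication mentioned above.

```python
import numpy as np

# Entropy-Bregman proximal step on the unit simplex:
#   x^{k+1} = argmin_x { <g, x> + (1/eps) * D_h(x, x^k) },  h(x) = sum_i x_i log x_i,
# with D_h(x, y) = sum_i x_i log(x_i / y_i).  The KKT conditions give the
# closed-form multiplicative update  x^{k+1}_i  proportional to  x^k_i * exp(-eps * g_i).
def bregman_prox_step(x, g, eps):
    y = x * np.exp(-eps * g)
    return y / y.sum()             # renormalization enforces the simplex constraint

# Minimize <c, x> over the simplex: the minimizer is the vertex e_j, j = argmin_i c_i,
# and the iterates approach it from the interior (the zone S = int K).
c = np.array([0.3, 0.1, 0.7])
x = np.full(3, 1.0 / 3.0)          # start inside the zone
for _ in range(300):
    x = bregman_prox_step(x, c, eps=1.0)
print(x)
```

No projection onto K is ever needed; the Bregman geometry replaces it, which is the practical appeal of such regularizations.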



Chapter 11

APPENDIX

Some results from the theory of Hilbert spaces and from nonlinear, in particular convex, analysis are presented here, predominantly without proofs. Some of them hold true in more general spaces, but we will not focus on this.

The following references may be useful for a more detailed study of the basic facts: in functional analysis – Kantorovich and Akilov [202], Edwards [99], Schwarz [370]; in Sobolev spaces – Adams [3], Schwarz [369]; in finite element methods and variational inequalities – Ciarlet [75], Glowinski, Lions and Tremolieres [135]; in convex analysis – Ekeland and Temam [100], Rockafellar [348], Rockafellar and Wets [353], Ioffe and Tichomirov [192]; in optimization methods – Fletcher [117], Polak [322], Polyak [330], Bazaraa, Sherali and Shetty [36], Dennis and Schnabel [90], Golshtein and Tretyakov [141] and Mangasarian [285].

A1 Some Preliminary Results in Functional Analysis

A1.1 Norm and convergence in Hilbert spaces

Let V be a real Hilbert space. The scalar product of two elements u and v in V is denoted by 〈u, v〉, or by 〈u, v〉V if it is convenient to emphasize the connection with a given space. The norm of an element u ∈ V is denoted by ‖u‖ or ‖u‖V. Unless stated otherwise, the norm is induced by the scalar product, i.e., ‖u‖ = √〈u, u〉.

Let us recall the Cauchy-Schwarz inequality

|〈u, v〉| ≤ ‖u‖ · ‖v‖ ∀ u, v ∈ V.

A functional | · | : V → R is called a seminorm if it has all the properties of a norm except for |u| > 0 ∀ u ≠ 0.
The set of linear continuous functionals on V forms the conjugate space V′, in which the norm of an element ℓ ∈ V′ is defined by

‖ℓ‖V′ = sup u≠0 |ℓ(u)| / ‖u‖V.

For the value of a linear functional ℓ at u we shall also use the notation 〈ℓ, u〉, in particular, if the mapping ℓ : u ↦ 〈ℓ, u〉 is understood as a duality pairing


444 CHAPTER 11. APPENDIX

between V and V ′.

A1.1.1 Theorem. (Riesz theorem)
For any functional ℓ ∈ V′ there exists a unique element v ∈ V such that

ℓ(u) = 〈v, u〉 ∀ u ∈ V.

Due to the Riesz theorem, the space V′ can be identified with V, and it is easily seen that a Hilbert space V is reflexive, i.e., V′′ = V.

In addition to the usual (strong) convergence of a sequence {vn} to v, denoted by ‖vn − v‖ → 0 or vn → v, we deal also with weakly convergent sequences.

A1.1.2 Definition. A sequence {vn} ⊂ V is called weakly convergent to v ∈ V (denoted by vn ⇀ v) if limn→∞ ℓ(vn) = ℓ(v) for each ℓ ∈ V′. The point v is said to be a weak limit of {vn}. ♦

Uniqueness of the weak limit of such sequences can be shown easily. The following result is a corollary of Lemma 1 in Opial [312].

A1.1.3 Proposition. (Opial's lemma)
Assume that a subset A of V and a sequence {un} ⊂ V are given such that

(i) ‖un − u‖ converges for each u ∈ A;

(ii) if uni ⇀ v, then v ∈ A.

Then the sequence {un} converges weakly.

Proof: In view of (i) the sequence {un} is bounded. Now, let unk ⇀ v, unj ⇀ v′, v′ ≠ v. Due to (ii), the inclusions v ∈ A and v′ ∈ A hold. From

‖unk − v′‖² = ‖unk − v‖² + ‖v′ − v‖² + 2〈unk − v, v − v′〉

and

‖unj − v‖² = ‖unj − v′‖² + ‖v′ − v‖² + 2〈unj − v′, v′ − v〉,

we obtain

limk→∞ ‖unk − v′‖² = limk→∞ ‖unk − v‖² + ‖v′ − v‖²,

limj→∞ ‖unj − v‖² = limj→∞ ‖unj − v′‖² + ‖v′ − v‖².

Since, by (i), the limits of ‖un − v‖ and ‖un − v′‖ do not depend on the subsequence, adding these equalities yields ‖v′ − v‖² = 0, which contradicts v′ ≠ v.

Obviously, if {vn} converges to v, then v is also a weak limit of {vn}. If vn ⇀ v and ‖vn‖ → ‖v‖, then strong convergence vn → v holds true.
In finite-dimensional spaces weak convergence implies strong convergence.
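The gap between weak and strong convergence can be made tangible in ℓ² with the orthonormal basis {e_n}: 〈ℓ, e_n〉 = ℓ_n → 0 for every ℓ ∈ ℓ², so e_n ⇀ 0, yet ‖e_n‖ = 1 for all n, so there is no strong convergence. A truncated numerical sketch (the dimension and the chosen ℓ are, of course, illustrative):

```python
import numpy as np

# In l^2, the orthonormal basis vectors e_n converge weakly to 0 (every pairing
# <l, e_n> = l_n tends to 0) but not strongly (||e_n - 0|| = 1 for all n).
N = 10_000
ell = 1.0 / np.arange(1.0, N + 1.0)     # a fixed element of l^2: coordinates 1/n

def e(n):                                # n-th basis vector (truncated to R^N)
    v = np.zeros(N)
    v[n] = 1.0
    return v

indices = (10, 100, 1000, 9999)
pairings = [float(ell @ e(n)) for n in indices]        # tends to 0: weak convergence
norms = [float(np.linalg.norm(e(n))) for n in indices] # constantly 1: not strong
print(pairings)
print(norms)
```

This also illustrates the preceding statement: since ‖e_n‖ = 1 does not converge to ‖0‖ = 0, the weak limit cannot be a strong limit.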

A1.1.4 Definition. A set B is called the closure (resp. weak closure) of a set A if each point of B is a limit (resp. weak limit) of some sequence in A. Correspondingly, a set A is called closed (resp. weakly closed) if it coincides with its closure (resp. weak closure). ♦


A1. SOME PRELIMINARY RESULTS IN FUNCTIONAL ANALYSIS 445

The closure of a set A is denoted by cl A, and a set A is called dense in B if A ⊂ B and cl A ⊃ B.

A subset A of V is said to be compact (resp. weakly compact) provided that each infinite sequence {un} ⊂ A has a subsequence which converges (resp. weakly converges) to some element of A.

It is known that a bounded, weakly closed subset of a reflexive space, hence of a Hilbert space, is weakly compact.

A1.2 Functionals in Hilbert spaces

Later on, extended real-valued functionals are considered which can attain the value +∞. The set dom f = {u ∈ V : f(u) < ∞} is called the effective domain of a functional f , and we always suppose that dom f ≠ ∅.

A1.2.5 Definition. A functional f : V → R̄ := R ∪ {+∞} is called lower semicontinuous (resp. weakly lower semicontinuous) if for each u ∈ V and un → u (resp. un ⇀ u)

liminf_{n→∞} f(un) ≥ f(u).

Accordingly, f is called upper semicontinuous (resp. weakly upper semicontinuous) if for each u ∈ V and un → u (resp. un ⇀ u)

limsup_{n→∞} f(un) ≤ f(u).

Sometimes we use the abbreviations lsc functional, wlsc functional, etc.

A functional f is continuous (resp. weakly continuous) if it is simultaneously lsc and usc (resp. wlsc and wusc). ♦

A1.2.6 Theorem. (Generalized Weierstrass theorem)
A weakly lower semicontinuous functional attains its infimum on a weakly compact subset of a Hilbert space V .

We recall that a functional f is Hölder continuous on a set G ⊂ V if

|f(u) − f(v)| ≤ L‖u − v‖^λ for some L, λ > 0 and all u, v ∈ G.

If λ = 1, then f is called Lipschitz continuous (Lipschitz for short) (with constant L).
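As a simple numeric illustration (ours, not from the book), f(t) = √|t| is Hölder continuous on R with exponent λ = 1/2 and constant L = 1, although it is not Lipschitz near 0:

```python
# Check |f(u) - f(v)| <= |u - v|^(1/2) for f(t) = sqrt(|t|) on a few pairs;
# the pairs below are arbitrary test points (assumption of this sketch).
import math

f = lambda t: math.sqrt(abs(t))
pairs = [(0.0, 1e-6), (0.25, 0.26), (-1.0, 2.0), (4.0, 4.1)]
holds = all(abs(f(u) - f(v)) <= abs(u - v) ** 0.5 + 1e-15 for u, v in pairs)
print(holds)   # True
```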

We will consider linear (continuous) operators from a Hilbert space V into another normed space W . The norm of such an operator T is defined by

‖T ‖ = sup_{u≠0} ‖T u‖_W / ‖u‖_V .

A mapping a : V × V → R is a bilinear form if both

a(u, ·) : V → R and a(·, u) : V → R

are linear functionals for each u ∈ V . In the sequel only symmetric and continuous bilinear forms are considered, i.e.,

a(u, v) = a(v, u) ∀ u, v ∈ V

Page 462: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

446 CHAPTER 11. APPENDIX

and a constant M is supposed to exist such that

|a(u, v)| ≤M‖u‖‖v‖ ∀ u, v ∈ V. (A1.2.1)

Since for each u ∈ V the mapping v ↦ a(u, v) is a linear functional, there exists a unique element Λu ∈ V ′ such that

a(u, v) = (Λu)(v) ∀ v ∈ V (A1.2.2)

and

‖Λu‖V ′ ≤M‖u‖V .

Hence, Λ : u ↦ Λu is a linear operator from V into V ′ and ‖Λ‖ ≤ M .

Of course, according to the Riesz theorem, for each u the mapping v ↦ a(u, v) can be identified with a unique element w ∈ V , and we can understand Λ as a linear operator from V to V defined by Λu = w. However, this identification is not convenient if the spaces V and V ′ are considered in the chain of inclusions, the so-called Gelfand triple

V ⊂ H ⊆ H ′ ⊂ V ′,

where H is another Hilbert space, especially if H is identified with its conjugate H ′. In the latter case, if V is a dense set in H and the embedding of V into H is continuous¹, then V and H can be identified with dense subspaces of V ′.

A1.3 Some Hilbert spaces and Sobolev spaces

Let Ω ⊂ R^n be an open set with boundary Γ. In order to classify the boundary Γ, following Adams [3] and Nečas [306], we suppose that there are positive constants β and δ, a finite number of local coordinate systems and local mappings ar, 1 ≤ r ≤ R, such that

Γ := ∪_{r=1}^{R} {(x^r_1, x̄^r) : x^r_1 = ar(x̄^r), |x̄^r| < δ},

{(x^r_1, x̄^r) : ar(x̄^r) < x^r_1 < ar(x̄^r) + β, |x̄^r| < δ} ⊂ Ω, 1 ≤ r ≤ R,

{(x^r_1, x̄^r) : ar(x̄^r) − β < x^r_1 < ar(x̄^r), |x̄^r| < δ} ∩ Ω = ∅, 1 ≤ r ≤ R,

with x̄^r = (x^r_2, ..., x^r_n) and |x̄^r| < δ ⇔ |x^r_i| < δ, 2 ≤ i ≤ n.

If the functions ar are Lipschitz (or belong to C^m) on their domains of definition {x̄^r : |x̄^r| ≤ δ}, then the boundary is called Lipschitz-continuous (Lipschitz for short) (or a boundary of class C^m).

For a function ϕ : Ω → R the closure of the set {x ∈ Ω : ϕ(x) ≠ 0} is called the support of ϕ.

Let D(Ω) be the space of infinitely differentiable functions with compact support in Ω. For a multi-index α with |α| := α1 + ... + αn (αi non-negative integers) let the differential operator

D^α := ∂^{|α|} / (∂x_1^{α1} · · · ∂x_n^{αn})

¹ For the notion of a continuous embedding cf. Definition A1.3.7.


be defined on the space D(Ω). If for functions f, g ∈ L2(Ω) and arbitrary ϕ ∈ D(Ω) the relation

∫_Ω g ϕ dΩ = (−1)^{|α|} ∫_Ω f D^α ϕ dΩ

is fulfilled, then g is called a generalized derivative of the function f :

g := D^α f.

The set of functions u ∈ L2(Ω) with D^α u ∈ L2(Ω) for all α : |α| ≤ m (m an integer) forms the Sobolev space H^m(Ω). It turns into a Hilbert space if the scalar product is defined by

〈u, v〉_{m,Ω} = Σ_{0≤|α|≤m} 〈D^α u, D^α v〉_{L2(Ω)}.   (A1.3.3)

We consider the space H^m(Ω) to be a Hilbert space with this scalar product and the corresponding norm

‖u‖_{m,Ω} = ( Σ_{0≤|α|≤m} ‖D^α u‖²_{L2(Ω)} )^{1/2}.   (A1.3.4)

An analogous notation is applied for H⁰(Ω) = L2(Ω), too.

If the smoothness of the boundary Γ is not too bad, for example if Γ is Lipschitz as we always suppose, then H^m(Ω) coincides with the closure of the set C∞(Ω̄) in the norm (A1.3.4).

We also need the Sobolev space H^m_0(Ω), which is obtained as the closure of the set D(Ω) in the norm (A1.3.4). The scalar product in H^m_0(Ω) is introduced by (A1.3.3) or

〈u, v〉_{m,Ω,0} = Σ_{|α|=m} 〈D^α u, D^α v〉_{0,Ω};   (A1.3.5)

moreover, due to Friedrichs' inequality

c‖u‖_{0,Ω} ≤ Σ_{|α|=1} ‖D^α u‖_{0,Ω} ∀ u ∈ H^1_0(Ω)   (A1.3.6)

(with c > 0 independent of u), the norms (A1.3.4) and

‖u‖_{m,Ω,0} = ( Σ_{|α|=m} ‖D^α u‖²_{0,Ω} )^{1/2}   (A1.3.7)

are equivalent.

The definitions above can easily be adapted to vector functions

u = (u1, ..., un) : u ∈ [Hm(Ω)]n if ui ∈ Hm(Ω), i = 1, ..., n,

and

‖u‖_{m,Ω} = ( Σ_{i=1}^{n} ‖ui‖²_{m,Ω} )^{1/2}.

Similarly, the spaces [H^m_0(Ω)]^n and norms ‖u‖_{m,Ω,0} are defined if ui ∈ H^m_0(Ω).

Now, we describe some facts of the embedding theory for Sobolev spaces.


A1.3.7 Definition. A normed space X is called continuously embedded into a normed space Y if X ⊂ Y holds in the algebraic sense and there exists a fixed constant c > 0 such that

‖x‖_Y ≤ c‖x‖_X ∀ x ∈ X.

Obviously, an operator T which performs this embedding is linear (continuous). An embedding is said to be compact if the operator T maps each bounded subset A of X into a relatively compact subset T A of Y , i.e. cl(T A) is compact in Y . ♦

The notation X → Y is used for a continuous embedding, whereas X →ᶜ Y stands for a compact one.

The set of functions in C^m(Ω̄) whose derivatives of order m satisfy the Hölder condition with exponent λ, equipped with the norm

‖u‖_{C^{m,λ}(Ω̄)} = ‖u‖_{C^m(Ω̄)} + max_{|α|=m} sup_{x,y∈Ω̄, x≠y} ( |D^α u(x) − D^α u(y)| / ‖x − y‖^λ_{R^n} ),

forms the Banach space C^{m,λ}(Ω̄).

According to the embedding theorems, in particular, the following results hold true with an arbitrary integer k ≥ 0:

H^{m+k}(Ω) → C^{k,λ}(Ω̄) if 0 < λ ≤ m − n/2 and 2(m − 1) < n < 2m;

H^{m+k}(Ω) → C^{k,λ}(Ω̄) if 0 < λ < 1 and n = 2m − 2;

H^{m+k}(Ω) → C^{k,1}(Ω̄) if n < 2m − 2.

The trivial continuous embedding H^{m+k}(Ω) → H^k(Ω) is compact except for the case m = 0. Moreover, the relations

H^{m+k}(Ω) →ᶜ C^k(Ω̄) if n < 2m;

H^{m+k}(Ω) →ᶜ C^{k,λ}(Ω̄) if 2(m − 1) ≤ n < 2m and 0 < λ < m − n/2

are satisfied.

All these embeddings remain true for the corresponding spaces H^k_0(Ω) with the norms (A1.3.4) or (A1.3.7) without any assumption about the smoothness of the boundary Γ.

We briefly deal with the trace of a function of the class H^m(Ω). As the boundary Γ is Lipschitz, a surface measure dγ can be defined on this boundary, and one can consider the space L2(Γ) with the norm ‖ · ‖_{L2(Γ)}. For functions in C∞(Ω̄) the inequality

‖u‖_{L2(Γ)} ≤ c‖u‖_{1,Ω}

is known, where c depends only on Ω. Taking into account that H^1(Ω) is the closure of C∞(Ω̄) in the norm ‖ · ‖_{1,Ω}, there exists a linear (continuous) mapping γ : u ∈ H^1(Ω) ↦ γu ∈ L2(Γ) such that γu = u|Γ for u ∈ C∞(Ω̄). The factor space

{w ∈ L2(Γ) : ∃ v ∈ H^1(Ω), γv = w}

is a Hilbert space H^{1/2}(Γ) with the norm

‖w‖_{H^{1/2}(Γ)} = inf{‖v‖_{1,Ω} : v ∈ H^1(Ω), γv = w on Γ}.


We note that

H^1_0(Ω) = {v ∈ H^1(Ω) : γv = 0}.

In view of the properties of Γ an outward normal ν = (ν1, ..., νn) exists almost everywhere on Γ, and assuming that ‖ν‖_{R^n} = 1, the normal derivative operator

∂/∂ν = Σ_{i=1}^{n} νi ∂/∂xi

can be extended to all functions in H²(Ω). In particular, we get

H²_0(Ω) = { u ∈ H²(Ω) : γu = 0, Σ_{i=1}^{n} νi γ(∂u/∂xi) = 0 on Γ },

with ∂u/∂xi a generalized derivative.

Sometimes we also write u|Γ instead of γu for a function u ∈ H1(Ω).

A1.4 Differentiation of operators and functionals

The facts on the differential calculus in function spaces presented here will be supplemented in Section A1.5, where convex functionals are considered.

Assume that X and Y are Banach spaces, that U(x) ⊂ X is a neighborhood of an arbitrary point x ∈ X and that F : U(x) → Y is a given mapping. The symbol L(X,Y ) indicates the space of all linear (continuous) operators mapping from X into Y .

A1.4.8 Definition. An operator T (x) ∈ L(X,Y ) is called a Gateaux-derivative of a mapping F at a point x if for each v ∈ X

lim_{t↓0} ‖ (F(x+ tv) − F(x))/t − T (x)(v) ‖_Y = 0.   (A1.4.8)

If the relation

lim_{‖v‖_X→0} ‖F(x+ v) − F(x) − T (x)(v)‖_Y / ‖v‖_X = 0   (A1.4.9)

is fulfilled, then T (x) is said to be a Frechet-derivative of F at x.

Accordingly, the operator F is called Gateaux- or Frechet-differentiable at x, and in both cases we write T (x) = F ′(x). ♦

If F has a Gateaux- or Frechet-derivative at every point of the set U(x), then F ′ can be considered as a mapping from U(x) into L(X,Y ). If the derivative of F ′ exists in the mentioned sense, then F ′′(x) is an element of the space L(X,L(X,Y )).

In the case X = V and F := J : V → R the adaptation of the definitions of Gateaux- and Frechet-derivatives is trivial: instead of the norms ‖ · ‖_Y , ‖ · ‖_X in (A1.4.8) and (A1.4.9) we write | · |, ‖ · ‖_V , respectively, and in that case

T (x) = J ′(x) ∈ V ′, J ′′(x) ∈ L(V, V ′).


Sometimes the Frechet-derivative J ′(x) is denoted by ∇J(x).

If V = R^n, using the Riesz theorem A1.1.1, we obtain

∇J(x) = ( ∂J/∂x1 (x), ..., ∂J/∂xn (x) ),

and instead of "Frechet-differentiable" and "Frechet-derivative" the notions differentiable and derivative, respectively, are used.

Relations between Gateaux- and Frechet-derivatives are described in the following statement.

A1.4.9 Proposition.

(i) If an operator F has a Frechet-derivative at a point x, then F is continuous and Gateaux-differentiable at x and both derivatives coincide.

(ii) If a Gateaux-derivative of F is continuous in some neighborhood of a point x, then the operator F has a Frechet-derivative at x.

Usually, the notations Fx and F(x) are used for the values of linear and nonlinear operators at the point x, respectively.

If an operator F is Frechet-differentiable on an interval [u, u+ v], then the integral representation

F(u+ v) = F(u) + ∫₀¹ F ′(u+ τv)(v) dτ   (A1.4.10)

holds true.

Concerning a Taylor formula for operators we refer, in particular, to Vainberg [405]. For a Frechet-differentiable functional J : V → R the mean value theorem

J(u+ v) = J(u) + 〈J ′(u+ τv), v〉 for some τ ∈ [0, 1]

holds, and if J is twice Frechet-differentiable on [u, u+ v], then the relations

J ′(u+ v)− J ′(u) = J ′′(u)(v) + o(‖v‖)

J(u+ v) = J(u) + 〈J ′(u), v〉 + (1/2)〈J ′′(u+ θv)(v), v〉 for some θ ∈ [0, 1]

are true.

If a(·, ·) is a symmetric, continuous bilinear form and J(u) := a(u, u), then, using the representation a(u, v) = (Λu)(v) (see (A1.2.2)) and

a(u+ τv, u+ τv) − a(u, u) = 2τa(u, v) + τ²a(v, v),

we obtain J ′(u) = 2Λu and J ′′(u) = 2Λ.
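The limit in (A1.4.8) can be checked numerically for this quadratic case. The following sketch (ours, not from the book; the matrix A and the points u, v are arbitrary choices) verifies in R² that the difference quotient of J(u) = a(u, u), with a(u, v) = u·Av and A symmetric, approaches J ′(u)(v) = 2(Au)·v:

```python
import numpy as np

# For J(u) = u·Au with A symmetric, (J(u+tv) - J(u))/t -> 2(Au)·v as t -> 0;
# the residual equals t·(v·Av), so it shrinks linearly in t.
A = np.array([[2.0, 1.0], [1.0, 3.0]])        # symmetric, so a(·,·) is symmetric
J = lambda u: u @ A @ u
u = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
grad_v = (2 * A @ u) @ v                       # J'(u)(v), i.e. (2Λu)(v)
errors = [abs((J(u + t * v) - J(u)) / t - grad_v) for t in (1e-1, 1e-3, 1e-5)]
print(errors)   # decreasing toward 0
```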

A1.5 Convexity of sets and functionals

We assume that the definitions of affine and convex sets, affine and convex hulls and convex cones, as well as some of their simple properties, are well known. For the reader who is not familiar with these notions we refer to Rockafellar and Wets [353] and Hiriart-Urruty and Lemarechal [179]. Concerning the convexity of particular sets, usually we shall not verify this fact in simple cases.

Now, we consider sets in a Hilbert space V .


A1.5.10 Definition. The set of all interior points of a convex set C is called the interior of C and denoted by int C. A point x of a convex set C is called relatively interior if

U(x) ∩ aff C ⊂ C,

with aff C the affine hull of C and U(x) some neighborhood of the point x. The set of all relatively interior points of a set C is denoted by ri C and forms the relative interior of C. ♦

A1.5.11 Proposition. For a convex set C the sets int C and ri C are also convex and

cl intC = clC.

If u ∈ intC, v ∈ clC, then [u, v) ⊂ intC, with

[u, v) := u+ λ(v − u) : λ ∈ [0, 1).

For a convex set its closure and weak closure coincide; hence, a closed convex set is also weakly closed.

Given a functional J : M ⊂ V → R̄ ≡ R ∪ {+∞}, the set

epi J = {(α, u) ∈ R×M : J(u) ≤ α}

is said to be the epigraph of J .

A functional J : V → R̄ is lower semicontinuous iff epi J is a closed set.

A1.5.12 Definition. A functional J : M → R is called convex on a convex setC if the set epiJ is convex. ♦

Usually, for a convex functional J : V → R̄ with

dom J := {u ∈ V : J(u) < ∞} ≠ ∅,

the notion of a proper convex functional is used. We omit "proper", because only such convex functionals will be considered in this book.

Obviously, J : C → R is convex on C iff the functional

J̄(u) := { J(u) if u ∈ C, +∞ if u ∈ V \ C }

is convex on V . In this case dom J̄ = dom J holds.

We can immediately conclude that the epigraph of a convex, lower semicontinuous functional J : V → R̄ is a convex, closed set, hence also weakly closed. Moreover, such a J is a weakly lower semicontinuous functional.

Often, the following property is considered as an equivalent definition of the convexity of a functional J on a convex set C: for two arbitrary points u, v of the convex set C ⊂ V and a scalar λ ∈ [0, 1] the inequality

J(λu+ (1− λ)v) ≤ λJ(u) + (1− λ)J(v)

holds. (We assume that an inequality f(x) ≤ g(y) is always fulfilled if g(y) = +∞.)

A functional J is called concave if −J is convex.


Convexity of a mapping F , acting from a convex set C ⊂ V into a vector space Y , is defined analogously: for arbitrary u, v ∈ C and λ ∈ [0, 1]

F(λu+ (1− λ)v)− λF(u)− (1− λ)F(v) ≤ 0

holds. Here the inequality a ≤ 0 on Y means that −a ∈ Y+, with Y+ a non-negative cone in Y .

It is easy to verify that a non-negative linear combination of a finite number of convex functionals and the supremum of any number of convex functionals are also convex functionals.

Now, we formulate some properties of convex functionals.

A1.5.13 Proposition. Let J : V → R be a convex functional.

(i) If (α, u) ∈ epiJ , then u ∈ domJ , hence, domJ is a convex set.

(ii) For arbitrary points u1, ..., un from V and scalars λi ≥ 0 (i = 1, ..., n) with Σ_{i=1}^{n} λi = 1, Jensen's inequality holds true:

J( Σ_{i=1}^{n} λi ui ) ≤ Σ_{i=1}^{n} λi J(ui).

(iii) For any constant c the level sets

{u ∈ V : J(u) ≤ c}, {u ∈ V : J(u) < c}

are convex, and if additionally J is lower semicontinuous, then the level set

{u ∈ V : J(u) ≤ c}

is closed.

(iv) The function η(λ) = J(λu+ (1− λ)v) is convex on R for arbitrary u, v ∈ V .

(v) If Λ ∈ L(Y, V ), with Y a Banach space, then ψ(y) = J(Λy) is a convex functional on Y .
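Jensen's inequality from (ii) is easy to test numerically. The following sketch (ours, not from the book; the functional J(u) = ‖u‖², the points and the weights are arbitrary choices) checks it in R³:

```python
import numpy as np

# Jensen's inequality for the convex functional J(u) = ||u||^2:
# J(sum λi ui) <= sum λi J(ui) for λi >= 0 with sum λi = 1.
rng = np.random.default_rng(1)
J = lambda u: float(np.dot(u, u))
points = rng.standard_normal((5, 3))        # u1, ..., u5 in R^3
lam = rng.random(5); lam /= lam.sum()       # λi >= 0, Σ λi = 1
lhs = J(lam @ points)                       # J(Σ λi ui)
rhs = sum(l * J(u) for l, u in zip(lam, points))
print(lhs <= rhs)   # True
```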

A1.5.14 Proposition. For convex functionals J : V → R̄ the following statements are equivalent:

(i) J is bounded from above in a neighborhood of some point u ∈ V ;

(ii) J is continuous at u ∈ V ;

(iii) int(epi J) ≠ ∅;

(iv) int(dom J) ≠ ∅ and J is continuous on int(dom J).

This proposition implies a number of non-trivial corollaries for convex functionals J : V → R̄.

A1.5.15 Corollary. In a finite-dimensional space a convex functional J is continuous on int(dom J).


A1.5.16 Corollary. If J is convex and bounded from above on an open, convex set C, then J is locally Lipschitz-continuous on int(dom J), i.e., Lipschitz-continuous in a neighborhood of each point of int(dom J).

A1.5.17 Corollary. If J is a convex, lower semicontinuous functional, then it is continuous on int(dom J).

For convex finite-valued functionals J : C → R the straightforward transfer of the properties formulated above is obvious.

We also recall the superposition of convex functionals.

A1.5.18 Proposition. Let M ⊂ V be a convex set. If J : M → C ⊂ R and ϕ : C → R are convex functionals and ϕ is non-decreasing on C, then ϕ ◦ J : u ↦ ϕ(J(u)) is a convex functional on M .

The following statement uses essentially the finite dimensionality of the space.

A1.5.19 Proposition. Let C ⊂ R^n be a relatively open convex set and let C′ ⊂ C be dense in C. If a sequence of finite-valued convex functions fi : C → R converges in each point of C′, then there exists a finite-valued convex function f such that

f(x) = lim_{i→∞} fi(x) ∀ x ∈ C;

moreover, the sequence {fi} converges uniformly to f on each compact subset of C.

Note that if C ⊂ R^n is open and convex and ϕ : C → R is convex and differentiable on C, then ϕ is continuously differentiable on C.

In this book we deal with particular cases of the following convex variational problem:

minimize J(u), subject to u ∈ K,   (A1.5.11)

where J : V → R̄ is a convex, lower semicontinuous functional, K ⊂ V is a convex, closed set and K ∩ dom J ≠ ∅.

The set K is called the feasible set and

U∗ := {u ∈ K : J(u) = inf_{v∈K} J(v) ≡ J∗}

is said to be the solution set or optimal set of the problem. The symbols Arg min_{u∈G} f(u) and arg min_{u∈G} f(u) denote the optimal set or an optimal point of the problem min{f(u) : u ∈ G}, respectively.

It is easy to verify that for convex variational problems the optimal set U∗ is closed and convex; however, U∗ = ∅ is not excluded. Moreover, any local minimizer of J on K is at the same time a (global) solution of Problem (A1.5.11).

Obviously, Problem (A1.5.11) can be reformulated as

min{J(u) : u ∈ K ∩ dom J}


and also as

min{ J |dom J(u) : u ∈ K ∩ dom J },   (A1.5.12)

with J |D : D → R, J |D(u) = J(u) ∀ u ∈ D.

If K ∩ dom J is bounded, then due to the generalized Weierstrass Theorem A1.2.6, U∗ ≠ ∅, i.e. Problem (A1.5.11) is solvable. Otherwise, in order to secure solvability of Problem (A1.5.11), we need some additional assumptions, for instance coercivity of J on K, i.e.,

lim_{u∈K, ‖u‖→∞} J(u) = +∞.

Unique solvability of Problem (A1.5.11) can be established by means of stronger convexity conditions on the functional J . Usually, such conditions are formulated for finite-valued functionals on a convex subset of the space V . Taking into account the equivalence between the Problems (A1.5.11) and (A1.5.12), these conditions can be applied quite simply.

A1.5.20 Definition. A functional J : C → R on a convex set C ⊂ V is said to be

(i) strictly convex if for any u, v ∈ C, u ≠ v, and 0 < λ < 1

J(λu+ (1− λ)v) < λJ(u) + (1− λ)J(v);

(ii) uniformly convex if for any u, v ∈ C and 0 ≤ λ ≤ 1

J(λu+ (1− λ)v) ≤ λJ(u) + (1− λ)J(v) − λ(1− λ)δ(‖u− v‖),

with δ : R+ → R+ a strictly increasing function, δ(0) = 0;

(iii) strongly convex if J is uniformly convex with δ(t) = κt², κ > 0. ♦

The constant κ is called the constant of strong convexity. Obviously, a uniformly or strongly convex functional is strictly convex.
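As a numeric illustration (ours, not from the book), J(u) = ‖u‖² is strongly convex with κ = 1; for this particular J the defining inequality of (iii) even holds with equality, a form of the parallelogram identity:

```python
import numpy as np

# For J(u) = ||u||^2 one has, identically,
# J(λu+(1-λ)v) = λJ(u) + (1-λ)J(v) - λ(1-λ)||u-v||^2,
# so the strong-convexity inequality with κ = 1 is tight.
rng = np.random.default_rng(2)
J = lambda u: float(np.dot(u, u))
u, v = rng.standard_normal(3), rng.standard_normal(3)
gaps = [lam * J(u) + (1 - lam) * J(v) - lam * (1 - lam) * J(u - v)
        - J(lam * u + (1 - lam) * v)
        for lam in (0.1, 0.5, 0.9)]
print(gaps)   # all ≈ 0
```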

A1.5.21 Proposition.

(i) If J is a strictly convex functional on dom J , then Problem (A1.5.11) has at most one solution.

(ii) If J is uniformly convex and lower semicontinuous on dom J , then there exists a unique solution of Problem (A1.5.11).

For the investigation of variational inequalities the coercivity of bilinear forms is essential, i.e., the bilinear form a(·, ·) : V × V → R is said to be coercive if there exists a constant α > 0 such that

a(u, u) ≥ α‖u‖2 ∀ u ∈ V.


A1.5.22 Corollary. Let a(·, ·) be a continuous, symmetric and coercive bilinear form on V × V and f ∈ V ′ be a given element. Then Problem (A1.5.11) with the quadratic functional

J(u) = (1/2) a(u, u) − 〈f, u〉

has a unique solution.
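In R^n with K = V this corollary can be checked directly. The sketch below (ours, not from the book; the matrix A and the vector f are arbitrary choices) takes a coercive symmetric bilinear form a(u, v) = u·Av with A positive definite; the unique minimizer is then u* = A⁻¹f:

```python
import numpy as np

# Minimize J(u) = 0.5 u·Au - f·u over R^4 with A symmetric positive
# definite (a(u,u) >= α||u||² with α >= 4 by construction): the minimizer
# solves Au = f, and every perturbation strictly increases J.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)            # coercive bilinear form
f = rng.standard_normal(4)
J = lambda u: 0.5 * u @ A @ u - f @ u
u_star = np.linalg.solve(A, f)
worse = all(J(u_star + d) > J(u_star) for d in rng.standard_normal((20, 4)))
print(worse)   # True
```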

For non-smooth convex problems we need the technique of directional derivatives and subdifferentials.

A1.5.23 Definition. Let J : V → R̄ be an arbitrary functional. If for fixed u ∈ dom J and d ≠ 0 the limit

lim_{t↓0} (J(u+ td) − J(u))/t =: J ′(u; d)

exists (J ′(u; d) = ±∞ is possible), then J ′(u; d) is called the directional derivative of J at the point u in the direction d. ♦

A1.5.24 Proposition. A convex functional J : V → R̄ has a directional derivative at each point u ∈ dom J in any direction. If u, v ∈ ri(dom J), v ≠ u, then the directional derivative of J at u in the direction v − u is finite.

For two points u, u+ v ∈ dom J , due to the convexity of J , the function

ϕ(t) = (J(u+ tv) − J(u))/t

increases on (0, 1]. Hence,

J(u+ v) − J(u) ≥ J ′(u; v) ∀ u ∈ dom J, v ∈ V.   (A1.5.13)

A1.5.25 Definition. Let J : V → R̄ be a convex functional. An element q ∈ V ′ is called a subgradient of J at the point u if

J(v) − J(u) ≥ 〈q, v − u〉 ∀ v ∈ V.

The set of all subgradients of J at the point u is called the subdifferential of J at the point u and is denoted by ∂J(u), i.e.,

∂J(u) = {q ∈ V ′ : J(v) − J(u) ≥ 〈q, v − u〉 ∀ v ∈ V }. ♦
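A concrete one-dimensional illustration (ours, not from the book): for J(u) = |u| on V = R the subdifferential at the kink is ∂J(0) = [−1, 1]. The sketch checks the subgradient inequality on a grid of test points:

```python
import numpy as np

# Every q in [-1, 1] satisfies J(v) - J(0) >= q*v for all v (so q ∈ ∂J(0)),
# while q = 1.5 violates the inequality (e.g. at v = 2).
J = abs
vs = np.linspace(-2.0, 2.0, 401)
inside = all(all(J(v) - J(0.0) >= q * v - 1e-12 for v in vs)
             for q in (-1.0, -0.3, 0.0, 0.7, 1.0))
outside = all(J(v) - J(0.0) >= 1.5 * v for v in vs)
print(inside, outside)   # True False
```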

The subdifferential ∂J(u) (possibly empty) is a convex and weakly closed subset of V ′. If ∂J(u) ≠ ∅, we say that J is subdifferentiable at the point u. Note that the relation

J ′(u; v) ≥ 〈q, v〉 ∀ q ∈ ∂J(u)   (A1.5.14)

always holds true.

In particular, the following connection between directional derivatives and subdifferentials is true.


A1.5.26 Proposition. Assume that the convex functional J : V → R̄ is continuous at u ∈ dom J . Then

∂J(w) ≠ ∅ ∀ w ∈ int(dom J)

and

sup_{q∈∂J(u)} 〈q, v〉 = J ′(u; v) < ∞.   (A1.5.15)

If V = R^n, then ∂J(v) ≠ ∅ ∀ v ∈ ri(dom J).

We recall some rules of the subdifferential calculus which follow immediately from the definition of the subdifferential:

if J1(u) = λJ(u) with λ > 0, then ∂J1(u) = λ∂J(u);

if J1(u) = J(λu) with λ > 0, then ∂J1(u) = λ∂J(λu).

A1.5.27 Theorem. (Moreau-Rockafellar)
Let J1, J2 : V → R̄ be convex functionals. Then

∂(J1 + J2)(u) ⊃ ∂J1(u) + ∂J2(u)

and if, moreover, J1 is continuous at u ∈ domJ1 ∩ domJ2, then

∂(J1 + J2)(u) = ∂J1(u) + ∂J2(u).

For fixed Λ ∈ L(X,V ), with X a Banach space, and for a convex functional J : V → R̄ the superposition J ◦ Λ is always a convex functional on X.

A1.5.28 Proposition. If J is continuous at some point u = Λx, then

∂(J ◦ Λ)(x) = Λ′∂J(Λx) ∀ x ∈ X,

with Λ′ : V ′ → X ′ the conjugate operator of Λ, i.e.,

〈Λ′f, x〉 = 〈f,Λx〉 ∀ f ∈ V ′, x ∈ X.

The following connection between subdifferentials and Gateaux-derivatives can easily be verified.

A1.5.29 Proposition. If a convex functional J : V → R is Gateaux-differentiable at u ∈ V , then J is subdifferentiable at this point and

∂J(u) = {J ′(u)}.

On the other hand, if a convex functional J is continuous at u and has a unique subgradient there, then J is Gateaux-differentiable at u.

Due to Definition A1.5.23 and Propositions A1.5.26 and A1.5.29 it follows immediately that the value J(u + λd) decreases for sufficiently small λ > 0 if 〈J ′(u), d〉 < 0.

A1.5.30 Proposition. If C ⊂ V is a convex set and J : C → R is Gateaux-differentiable on C, then convexity of J on C is equivalent to

J(v) ≥ J(u) + 〈J ′(u), v − u〉 ∀ u, v ∈ C. (A1.5.16)


A1.5.31 Proposition. If C ⊂ V is a convex set and the convex functional J : C → R is subdifferentiable on C, then strict (resp. strong) convexity of J guarantees that

J(v)− J(u) > 〈q, v − u〉 ∀ q ∈ ∂J(u), ∀ u, v ∈ C, u ≠ v,   (A1.5.17)

(resp. J(v)− J(u) ≥ 〈q, v − u〉+ κ‖u− v‖², κ > 0).   (A1.5.18)

On the other hand, if inequality (A1.5.17) (resp. (A1.5.18)) is fulfilled for some q ∈ ∂J(u) and each u, v ∈ C, u ≠ v, then J is strictly (resp. strongly) convex.

By means of Propositions A1.5.29 and A1.5.30, Proposition A1.5.31 can easily be reformulated for Gateaux-differentiable functionals.

A1.5.32 Corollary. Under the hypotheses of Proposition A1.5.30 or A1.5.31 the following necessary and sufficient conditions of convexity, strict and strong convexity of the functional J hold, respectively:

〈q(u)− q(v), u− v〉 ≥ 0 ∀ u, v ∈ C; (A1.5.19)

〈q(u)− q(v), u− v〉 > 0 ∀ u, v ∈ C, u 6= v; (A1.5.20)

〈q(u)− q(v), u− v〉 ≥ 2κ‖u− v‖2 ∀ u, v ∈ C; (A1.5.21)

with q(u) ∈ ∂J(u) if J is subdifferentiable and q(u) = J ′(u) if J is Gateaux-differentiable.

Inequality (A1.5.19) characterizes the subdifferentials and Gateaux-derivatives of convex functionals as monotone mappings.

A1.5.33 Lemma. Let J : V → R be a convex, Frechet-differentiable functional on the Hilbert space V , and assume that its gradient satisfies a Lipschitz condition on V with constant L. Then it holds

〈∇J(u1)−∇J(u2), u1 − u2〉 ≥ (1/L) ‖∇J(u1)−∇J(u2)‖² ∀ u1, u2 ∈ V.   (A1.5.22)

Proof: Introducing the functional

J̄(u) := J(u)− J(u2)− 〈∇J(u2), u− u2〉,

it is obvious that J̄ satisfies the assumptions of the lemma with the same constant L. Using formula (A1.4.10) and the Lipschitz property, we get for any u ∈ V

J̄(u) = J̄(u1) + 〈∇J̄(u1), u− u1〉 + ∫₀¹ 〈∇J̄(u1 + t(u− u1))−∇J̄(u1), u− u1〉 dt
     ≤ J̄(u1) + 〈∇J̄(u1), u− u1〉 + (L/2)‖u− u1‖².

Inserting u := u1 − (1/L)∇J̄(u1), we get

J̄(u1 − (1/L)∇J̄(u1)) ≤ J̄(u1) − (1/L)‖∇J̄(u1)‖² + (1/(2L))‖∇J̄(u1)‖² = J̄(u1) − (1/(2L))‖∇J̄(u1)‖².

Since u2 is a minimizer of the functional J̄ with J̄(u2) = 0, we conclude that

J̄(u1 − (1/L)∇J̄(u1)) ≥ 0 and J̄(u1) ≥ (1/(2L))‖∇J̄(u1)‖².

Using the definition of J̄ and ∇J̄(u1) = ∇J(u1)−∇J(u2), this leads to

J(u1)− J(u2)− 〈∇J(u2), u1 − u2〉 ≥ (1/(2L))‖∇J(u1)−∇J(u2)‖².

Exchanging the positions of u1 and u2 in the latter inequality, we also have

J(u2)− J(u1)− 〈∇J(u1), u2 − u1〉 ≥ (1/(2L))‖∇J(u2)−∇J(u1)‖²,

and inequality (A1.5.22) is obtained by summing up the latter two inequalities.
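The co-coercivity inequality (A1.5.22) can be tested numerically. The sketch below (ours, not from the book; the matrix A and the test points are arbitrary choices) uses J(u) = (1/2) u·Au with A symmetric positive semidefinite, for which ∇J(u) = Au and the Lipschitz constant of ∇J is the spectral norm of A:

```python
import numpy as np

# Check <∇J(u1)-∇J(u2), u1-u2> >= (1/L) ||∇J(u1)-∇J(u2)||² for
# J(u) = 0.5 u·Au, ∇J(u) = Au, L = ||A||_2 (spectral norm).
rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = B @ B.T                                  # A ⪰ 0, so J is convex
L = np.linalg.norm(A, 2)
checks = []
for _ in range(5):
    u1, u2 = rng.standard_normal(5), rng.standard_normal(5)
    g = A @ u1 - A @ u2                      # ∇J(u1) − ∇J(u2)
    checks.append(g @ (u1 - u2) >= (1.0 / L) * (g @ g) - 1e-9)
print(all(checks))   # True
```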

Now we return to Problem (A1.5.11) and consider the set of feasible directions on K at the point u, i.e.,

Cu(K) = {v ≠ 0 : u+ λv ∈ K for some λ > 0}.   (A1.5.23)

Due to Definition A1.5.23 and inequality (A1.5.13), the point u ∈ K ∩ dom J is a solution of Problem (A1.5.11) iff

J ′(u; v) ≥ 0 ∀ v ∈ V,

or

J ′(u; v) ≥ 0 ∀ v ∈ Cu(K).

Immediately from Definition A1.5.25 of the subdifferential we conclude that u ∈ U∗ iff 0 ∈ ∂J(u).

A1.5.34 Proposition. Let J = J1 + J2, with J1, J2 : V → R̄ convex, lsc functionals and J1 Gateaux-differentiable on V . Then, concerning Problem (A1.5.11), the following statements are equivalent for some u ∈ K:

(i) u is a solution of Problem (A1.5.11);

(ii) the following inequalities hold true:

〈J ′1(u), v − u〉 + J2(v) − J2(u) ≥ 0 ∀ v ∈ K;
〈J ′1(v), v − u〉 + J2(v) − J2(u) ≥ 0 ∀ v ∈ K.   (A1.5.24)

Proof: Let u be a solution of Problem (A1.5.11). Then, for λ ∈ (0, 1) and u1 = (1− λ)u+ λv, due to the convexity of J2,

J(u) ≤ J(u1) ≤ J1(u1) + (1− λ)J2(u) + λJ2(v)

is true for all v ∈ K, i.e.,

(1/λ)[J1((1− λ)u+ λv) − J1(u)] + J2(v) − J2(u) ≥ 0.   (A1.5.25)


Taking λ ↓ 0, the first relation in (ii) is proved.

Conversely, using (A1.5.16) for J1, from the first relation in (ii) we immediately obtain

J(v) ≥ J(u) ∀ v ∈ K.

Now, if the first inequality in (ii) holds, then the second one is a consequence of the monotonicity of J ′1,

〈J ′1(v)− J ′1(u), v − u〉 ≥ 0,

which follows from (A1.5.19) for q = J ′1.

Finally, let the second inequality in (ii) be fulfilled. For v = (1− λ)u+ λw with w ∈ K and λ ∈ (0, 1] we get

〈J ′1((1− λ)u+ λw), w − u〉+ J2(w)− J2(u) ≥ 0.

The functional

η(λ) := ηw(λ) = J1((1− λ)u+ λw) + λ(J2(w)− J2(u))

is convex and differentiable and, obviously,

η′(λ) = 〈J ′1((1− λ)u+ λw), w − u〉+ J2(w)− J2(u).

Consequently, η′(λ) ≥ 0 for all w ∈ K and λ ∈ (0, 1]. This leads to

η(1) ≥ lim_{λ↓0} η(λ) ≥ η(0),

and the inequality J(w) ≥ J(u) ∀ w ∈ K holds; the first inequality in (ii) is true as well.

In case J2 ≡ 0, we conclude from (A1.5.24) that

〈J ′(u), v − u〉 ≥ 0 ∀ v ∈ K. (A1.5.26)
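The criterion (A1.5.26) can be verified on a small example (ours, not from the book; the point c and the box K are arbitrary choices). Minimizing J(u) = ‖u − c‖² over the box K = [0, 1]² gives u* as the projection of c onto K, and u* satisfies the variational inequality:

```python
import numpy as np

# For J(u) = ||u - c||^2 and K = [0,1]^2 the minimizer is u* = clip(c),
# and <J'(u*), v - u*> >= 0 must hold for every feasible v (sampled here).
c = np.array([1.5, -0.3])
u_star = np.clip(c, 0.0, 1.0)          # projection of c onto the box K
grad = 2 * (u_star - c)                # J'(u*)
rng = np.random.default_rng(5)
vi_holds = all(grad @ (v - u_star) >= -1e-12 for v in rng.random((200, 2)))
print(vi_holds)   # True
```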

A1.5.35 Remark. If the convex functional J is only subdifferentiable on the set K, then a solution u of Problem (A1.5.11) is characterized by

〈q, v − u〉 ≥ 0 for some q ∈ ∂J(u), ∀ v ∈ K. (A1.5.27)

Relations of type (A1.5.24), (A1.5.26) or (A1.5.27) are called variational inequalities. Here they have been obtained as solution criteria for convex extremal problems. There exist a number of applied problems which can be described by more general variational inequalities, so-called hemi-variational inequalities,

〈Q(u), v − u〉 + ϕ(v) − ϕ(u) ≥ 0 ∀ v ∈ K,

with Q : V → V ′ a monotone (not necessarily potential) operator and ϕ : V → R̄ a convex functional.

From Proposition A1.5.34 the following result can be immediately obtained (cf. Kinderlehrer and Stampacchia [234], Chapter 1).


A1.5.36 Corollary. The operator

ΠK(u) = arg min_{z∈K} ‖u− z‖

of the orthogonal projection onto the convex, closed subset K ⊂ V is non-expansive, i.e.,

‖ΠK u−ΠK v‖ ≤ ‖u− v‖ ∀ u, v ∈ V.

In order to investigate the variational problem (A1.5.11) from the numerical point of view, often the following criterion for an approximate solution v is used:

inf_{q(u)∈∂J(u)} ‖q(v)− q(u)‖_{V ′} ≤ ε for some q(v) ∈ ∂J(v),   (A1.5.28)

with u a minimum point of J on K and ε > 0 a given tolerance.

In case J is Gateaux-differentiable on V , (A1.5.28) turns into

‖J ′(v)− J ′(u)‖_{V ′} ≤ ε.

If J is a strongly convex functional with the constant κ > 0 in someneighborhood U(u), then from (A1.5.28) we can estimate the distance ‖v − u‖for some v ∈ U(u). Indeed, let δ > 0 be an arbitrarily small number and q(u)be an element of the subdifferential ∂J(u) such that

‖q(v)− q(u)‖V ′ ≤ ε+ δ. (A1.5.29)

Then, using (A1.5.21), we get

‖q(v)− q(u)‖‖v − u‖ ≥ 2κ‖v − u‖2,

and because of (A1.5.29),

‖v − u‖ ≤ (ε + δ)/(2κ).

But δ > 0 is arbitrarily chosen, hence,

‖v − u‖ ≤ ε/(2κ). (A1.5.30)

Now, it becomes clear that it suffices to require strong convexity of J only on the set

{v : ‖v − u‖ ≤ ε/(2κ) + δ0}, with a small δ0 > 0.

If, moreover, J is Lipschitz-continuous with the constant L, the trivial estimate

|J(v) − J(u)| ≤ Lε/(2κ) (A1.5.31)

results from (A1.5.30).

In the case K ≡ V, considering instead of (A1.5.28) the criterion

‖q(v)‖V′ ≤ ε for some q(v) ∈ ∂J(v),

we obtain (without using the Lipschitz-continuity of J)

J(v) − J(u) ≤ ε²/(2κ). (A1.5.32)

Indeed, the convexity of J provides

J(v) − J(u) ≤ 〈q(v), v − u〉 ≤ ‖q(v)‖V′ ‖v − u‖V,

and it remains to apply inequality (A1.5.30).
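For illustration, the bounds (A1.5.30) and (A1.5.32) can be checked numerically. The following sketch (ours, with illustrative names, not from the text) takes V = Rⁿ, K = V and J(u) = κ‖u‖², which satisfies 〈J′(v) − J′(u), v − u〉 = 2κ‖v − u‖², i.e., it is strongly convex with the constant κ used above, and has the minimizer u = 0:

```python
import numpy as np

# Illustration: J(u) = kappa*||u||^2 on R^n is strongly convex with constant
# kappa in the sense used here, since <J'(v) - J'(u), v - u> = 2*kappa*||v-u||^2.
# The exact minimizer over K = V = R^n is u* = 0, with J'(u*) = 0.
kappa = 0.7
n = 5
rng = np.random.default_rng(0)

def J(u):
    return kappa * np.dot(u, u)

def grad_J(u):
    return 2.0 * kappa * u

u_star = np.zeros(n)
v = rng.standard_normal(n)            # an approximate solution
eps = np.linalg.norm(grad_J(v))       # the criterion ||q(v)||_{V'} <= eps is tight here

# (A1.5.30): distance bound ||v - u*|| <= eps / (2*kappa)
assert np.linalg.norm(v - u_star) <= eps / (2.0 * kappa) + 1e-12
# (A1.5.32): objective bound J(v) - J(u*) <= eps^2 / (2*kappa)
assert J(v) - J(u_star) <= eps**2 / (2.0 * kappa) + 1e-12
```

For this quadratic functional the distance bound (A1.5.30) is attained with equality, which shows that the estimate cannot be improved in general.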


A1.6 Monotone operators and monotone variational inequalities

For properties of monotone operators we refer, for instance, to Vainberg [405],Aubin and Ekeland [20], Burachik and Iusem [60] and Zeidler [418].

A1.6.37 Definition. A mapping T : V → 2^{V′} in a Hilbert space V is called a multi-valued operator, i.e., T maps a point x ∈ V to a set T(x) ⊂ V′. An operator T is called single-valued if the image T(x) contains at most one element. ♦

A single-valued operator T : V → V′ is called hemi-continuous if its restriction to segments or lines is continuous, i.e.,

∀ u, v ∈ V : T(u + α(v − u)) ⇀ T(v) as α → 1.

For multi-valued operators one defines the effective domain, the range and the graph, respectively, as

dom(T) := {x ∈ V : T(x) ≠ ∅},
rge(T) := ∪_{x∈dom(T)} T(x),
gph(T) := {(x, u) ∈ V × V′ : u ∈ T(x), x ∈ dom(T)}.

The inverse operator T⁻¹ of T is the multi-valued operator defined by the equivalence

x ∈ T⁻¹(y) ⇔ y ∈ T(x).

For two operators T1, T2 : V → 2^{V′} and two scalars α, β ∈ R one defines the sum αT1 + βT2 by

(αT1 + βT2)(x) = αT1(x) + βT2(x) if x ∈ dom(T1) ∩ dom(T2), and ∅ otherwise.
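The definitions of domain, range, inverse and sum can be made concrete on a toy finite model. The following sketch (illustrative only; a multi-valued operator is stored via its graph, a set of pairs (x, u)) transcribes them directly:

```python
# Toy finite model: a multi-valued operator represented by its graph,
# a set of pairs (x, u).  dom, rge, inverse and the sum alpha*T1 + beta*T2
# follow the definitions in the text.
def dom(T):
    return {x for (x, _) in T}

def rge(T):
    return {u for (_, u) in T}

def inverse(T):
    # x in T^{-1}(y)  <=>  y in T(x)
    return {(u, x) for (x, u) in T}

def image(T, x):
    return {u for (z, u) in T if z == x}

def op_sum(alpha, T1, beta, T2):
    # (alpha*T1 + beta*T2)(x) = alpha*T1(x) + beta*T2(x) on dom(T1) ∩ dom(T2),
    # and the empty set elsewhere.
    out = set()
    for x in dom(T1) & dom(T2):
        for u1 in image(T1, x):
            for u2 in image(T2, x):
                out.add((x, alpha * u1 + beta * u2))
    return out

T1 = {(0, 1), (0, 2), (1, 5)}          # T1(0) = {1, 2}, T1(1) = {5}
T2 = {(0, 10), (2, 3)}                 # dom(T2) = {0, 2}

assert dom(T1) == {0, 1} and rge(T2) == {10, 3}
assert image(inverse(T1), 2) == {0}    # 0 in T1^{-1}(2) since 2 in T1(0)
S = op_sum(1, T1, 2, T2)               # defined only at x = 0
assert dom(S) == {0} and image(S, 0) == {21, 22}
```

Note that the sum is empty outside dom(T1) ∩ dom(T2), exactly as in the definition above.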

A1.6.38 Definition. (Vainberg [405])
A multi-valued operator T : V → 2^{V′} is called locally bounded if for any x ∈ int(dom(T)) there exists a neighborhood U(x) such that the set ∪_{v∈U(x)} T(v) is bounded. It is called locally hemi-bounded at a point x0 ∈ D(T) ⊂ V if and only if for each x ∈ D(T), x ≠ x0, there exists a number t0(x0, x) > 0 such that x0 + t(x − x0) ∈ D(T) holds for 0 ≤ t ≤ t0(x0, x) and the set

∪_{0<t≤t0(x0,x)} T(x0 + t(x − x0)) is bounded in V′.

Here we use a weakened notion of local hemi-boundedness: the standard notion supposes boundedness of ∪_{0≤t≤t0(x0,x)} T(x0 + t(x − x0)).

Different monotonicity assumptions can be found when studying solution methods for variational inequalities.

A1.6.39 Definition. The multi-valued operator T : C → 2^{V′}, with C ⊂ V, is called monotone if

〈u − v, x − y〉 ≥ 0 ∀ x, y ∈ C, ∀ u ∈ T(x), v ∈ T(y);

Page 478: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

462 CHAPTER 11. APPENDIX

strictly monotone, if

〈u − v, x − y〉 > 0 ∀ x, y ∈ C, x ≠ y, ∀ u ∈ T(x), v ∈ T(y);

and strongly monotone if

〈u − v, x − y〉 ≥ κ‖x − y‖² ∀ x, y ∈ C, ∀ u ∈ T(x), v ∈ T(y),

with a constant κ > 0.

The operator T is called maximal monotone if its graph is not properly contained in the graph of any other monotone operator T′. ♦

If T1, T2 are monotone, then T1⁻¹, λT1 (λ ≥ 0) and T1 + T2 are monotone, too. A strongly monotone operator is obviously strictly monotone. For instance, the subdifferential of a proper lsc and strictly (strongly) convex functional is strictly (strongly) monotone.

If T is maximal monotone, then T⁻¹ and λT (λ > 0) are maximal monotone. Further properties of a maximal monotone operator T are:

T (x) is a convex and closed set for all x ∈ dom(T ) (cf. [418], Prop. 32.6);

gph(T ) is closed;

rge(T) = V if dom(T) is bounded (cf. [418], Prop. 32.35);

cl(dom(T)), ri(dom(T)), cl(rge(T)) and ri(rge(T)) are convex sets (cf. [24], Prop. 6.4.1);

T is locally bounded in int(dom(T)) (cf. [347], Thm. 1);

T is upper semi-continuous in int(dom(T)) (cf. [24], Prop. 6.6.8), i.e., for any x ∈ int(dom(T)) and any open set O ⊃ T(x) there exists a neighborhood U(x) such that T(y) ⊂ O for all y ∈ U(x). Thus, a single-valued, maximal monotone operator is continuous in the interior of its domain.

Important examples of maximal monotone operators are:

(i) For a proper, lsc convex functional f : V → R ∪ {+∞} the subdifferential ∂f : V → 2^{V′}, defined as

∂f(x) := {s ∈ V′ : f(y) ≥ f(x) + 〈s, y − x〉 ∀ y ∈ V},

is a maximal monotone operator (cf. [347], Thm. 12.17).

(ii) Single-valued operators that are monotone and continuous are maximal monotone (cf. [418], Prop. 32.7).

(iii) An affine operator T(x) = Ax + b with A ∈ R^{n×n} and b ∈ Rⁿ is maximal monotone if (A + Aᵀ)/2 is positive semi-definite (cf. [353], Example 12.2).

(iv) The normality operator at a point u of the convex closed set K,

NK(u) := {d ∈ V′ : 〈d, v − u〉 ≤ 0 ∀ v ∈ K},

is maximal monotone (cf. [353], Corollary 12.18).


The sum of maximal monotone operators is not necessarily maximal monotone.

A1.6.40 Theorem. ([350], Thm. 1)
Let T1 and T2 be maximal monotone operators on V such that

dom(T1) ∩ int(dom(T2)) 6= ∅.

Then T1 + T2 is maximal monotone.

A1.6.41 Theorem. ([350], Thm. 3)
Let K ⊂ V be a nonempty, closed and convex subset and T be a single-valued monotone operator (not necessarily maximal) such that K ⊂ dom(T) and T is hemi-continuous on K. Then T + NK is maximal monotone.

A1.6.42 Definition. (Lions [270])
The multi-valued operator Q : V → 2^{V′} is called pseudo-monotone if the following implication holds: If {vk} ⊂ D(Q) converges weakly to v ∈ D(Q) and

lim sup_{k→∞} 〈wk, vk − v〉 ≤ 0

holds with wk ∈ Q(vk), then for each y ∈ D(Q) there exists w ∈ Q(v) such that

lim inf_{k→∞} 〈wk, vk − y〉 ≥ 〈w, v − y〉.

There are several notions of pseudo-monotonicity. This notion was introduced by Brezis [50] and [270] for single-valued operators. It should not be confused with those used in a number of recent papers on variational inequalities (see, for instance, Crouzeix [85]). For multi-valued operators see, for instance, [318].

A1.6.43 Definition. (Censor et al. [70])
A multi-valued operator Q : V → 2^{V′} is called paramonotone on the set C ⊂ V if it is monotone and

〈z − z′, v − v′〉 = 0 with v, v′ ∈ C, z ∈ Q(v), z′ ∈ Q(v′) (A1.6.33)

implies

z ∈ Q(v′), z′ ∈ Q(v). (A1.6.34)

A1.6.44 Remark. The following property of paramonotone operators is crucial:

(*) If v∗ solves VI(Q,K) with a paramonotone Q and for v ∈ K there exists q ∈ Q(v) with

〈q, v∗ − v〉 ≥ 0,

then v is a solution of VI(Q,K), too.


Paramonotonicity is a property lying in between monotonicity and strict monotonicity, i.e., strictly monotone operators are paramonotone. If T is the subdifferential of a proper, lsc and convex functional, then T is paramonotone. The sum of two paramonotone operators is paramonotone, too (cf. Iusem [194]).
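The gap between monotonicity and paramonotonicity can be seen on the rotation operator. The following sketch (our example, not from the text) checks that T(x) = Ax with A skew-symmetric is monotone, yet the implication (A1.6.33) ⇒ (A1.6.34) fails:

```python
import numpy as np

# Sketch: the rotation T(x) = A x with A skew-symmetric is monotone
# but NOT paramonotone.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric
rng = np.random.default_rng(1)

# Monotone: <Ax - Ay, x - y> = 0 >= 0 for all x, y.
for _ in range(100):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    assert abs((A @ x - A @ y) @ (x - y)) < 1e-12

# (A1.6.33) holds for EVERY pair (x, y); for a single-valued operator,
# (A1.6.34) would require Ax = Ay, which fails below, so T is not paramonotone.
x, y = np.array([1.0, 0.0]), np.array([0.0, 0.0])
assert (A @ x - A @ y) @ (x - y) == 0.0   # (A1.6.33)
assert not np.allclose(A @ x, A @ y)      # (A1.6.34) violated
```

By contrast, the gradient of a convex function, e.g. T(x) = Dx with a diagonal D ≥ 0, is paramonotone as a subdifferential of a proper, lsc convex functional.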

By the way, the maximal monotone operator associated with the Lagrangian of a smooth convex programming problem, see (A1.7.37) below, is paramonotone only in the very special situation that no constraint is active.

A1.6.45 Definition. The operator T : V → V′ is called co-coercive on a set K ⊂ V with modulus γ > 0 if

〈T (x)− T (y), x− y〉 ≥ γ‖T (x)− T (y)‖2 ∀ x, y ∈ K.

Sometimes the latter relation is referred to as the Dunn property (cf. [94]).

Co-coercivity is a concept of generalized monotonicity for single-valued operators that lies strictly between simple and strict monotonicity. A co-coercive operator T with modulus γ is monotone and Lipschitz-continuous with constant 1/γ. Co-coercive operators are paramonotone but not necessarily strongly monotone.

A sufficient condition for an operator to be co-coercive is: Let T : Rⁿ → Rⁿ be Lipschitz-continuous with constant L and strongly monotone with modulus κ. Then T is co-coercive with modulus γ = κ/L² (cf. [421], Prop. 3.1).
For convex and differentiable functions f : Rⁿ → R, Lipschitz-continuity of ∇f with constant L is equivalent to ∇f being co-coercive with modulus L⁻¹ (cf. [421], Prop. 3.5).
Further, co-coercivity of an operator T is equivalent to the strong monotonicity of the possibly multi-valued operator T⁻¹ (cf. [421], Prop. 3.3).
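The sufficient condition γ = κ/L² can be checked numerically. The following sketch (ours) uses T(x) = Bx with a symmetric positive definite B, for which κ = λ_min(B) and L = λ_max(B), and verifies the Dunn property on random points:

```python
import numpy as np

# Numerical sketch of the statement above: a Lipschitz (constant L) and
# strongly monotone (modulus kappa) operator is co-coercive with modulus
# kappa / L^2.  Here T(x) = B x with B symmetric positive definite, so
# kappa = lambda_min(B) and L = lambda_max(B).
B = np.array([[2.0, 0.5], [0.5, 1.0]])
eigs = np.linalg.eigvalsh(B)
kappa, L = eigs[0], eigs[-1]
gamma = kappa / L**2

rng = np.random.default_rng(2)
for _ in range(200):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    lhs = (B @ x - B @ y) @ (x - y)
    rhs = gamma * np.linalg.norm(B @ x - B @ y) ** 2
    assert lhs >= rhs - 1e-10          # Dunn property with gamma = kappa/L^2
```

For symmetric B the sharper modulus 1/L would also work; κ/L² ≤ 1/L is the modulus guaranteed without symmetry.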

Notions that describe the behaviour of a monotone operator at infinity are useful to get statements about solvability of variational inequalities.

A1.6.46 Definition. (Zeidler [418])
A multi-valued operator T : V → 2^{V′} is said to be coercive if

lim_{‖x‖→∞} inf_{u∈T(x)} 〈u, x〉/‖x‖ = +∞

and weakly coercive if

lim_{‖x‖→∞} inf_{u∈T(x)} ‖u‖ = +∞,

with inf ∅ = +∞. ♦

It is clear that coercive operators are weakly coercive, and that a bounded effective domain of T is sufficient for T to be coercive. Further, strongly monotone operators are weakly coercive.

The connection between continuously differentiable mappings and their Jacobians is described by the following lemma.


A1.6.47 Lemma. Let F : D ⊂ Rⁿ → Rⁿ be continuously differentiable on the open set D. Then it holds:

(a) F is monotone on D ⇔ the Jacobian JF(x) is positive semi-definite ∀ x ∈ D;

(b) F is strictly monotone on D if JF(x) is positive definite ∀ x ∈ D;

(c) F is strongly monotone on D ⇔ JF(x) is uniformly positive definite ∀ x ∈ D, i.e., there exists c > 0 with

〈y, JF(x)y〉 ≥ c‖y‖² ∀ y ∈ Rⁿ, ∀ x ∈ D.
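Part (a) can be illustrated for an affine map, whose Jacobian is constant. The sketch below (ours) takes F(x) = Ax with a non-symmetric A whose symmetric part is positive semi-definite, and checks monotonicity on random pairs:

```python
import numpy as np

# Sketch of Lemma A1.6.47(a): F(x) = A x has constant Jacobian A, which is
# positive semi-definite exactly when (A + A^T)/2 is PSD; then F is monotone.
A = np.array([[1.0, 2.0], [0.0, 1.0]])   # (A + A^T)/2 has eigenvalues 0 and 2
sym_eigs = np.linalg.eigvalsh(0.5 * (A + A.T))
assert sym_eigs.min() >= -1e-12          # PSD, so F should be monotone

rng = np.random.default_rng(3)
for _ in range(200):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    assert (A @ x - A @ y) @ (x - y) >= -1e-10   # monotonicity check
```

Since the smallest eigenvalue of the symmetric part is 0, this F is monotone but not strongly monotone, in line with part (c).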

Now we describe some basic results about existence and uniqueness of solutions for variational inequalities, needed to prove well-definedness of solution methods.

Let there be given a set-valued operator Q : V → 2^{V′} and a closed convex subset K of V. Since the variational inequality

VI(Q,K): find a pair x∗ ∈ K and q∗ ∈ Q(x∗) such that

〈q∗, x − x∗〉 ≥ 0 ∀ x ∈ K,

is equivalent to the inclusion problem

IP(T, V): find x ∈ V : 0 ∈ T(x),

with operator T := Q + NK, existence and uniqueness results for inclusion problems can easily be transferred to VI's. Obviously, an inclusion problem IP(T, V) is solvable if and only if

0 ∈ rge(T ).

So, conditions on T guaranteeing that rge(T) = V are sufficient for the existence of a solution.
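The equivalence between VI(Q,K) and the inclusion 0 ∈ Q(x) + NK(x) can be made concrete on a small example. The sketch below (ours; the projected-gradient iteration is one standard way to compute the solution, not a method from this text) solves VI(Q,K) for the affine monotone Q(x) = Ax + b on the box K = [0,1]² and then verifies the VI condition:

```python
import numpy as np

# Illustrative sketch: for Q(x) = A x + b on the box K = [0,1]^2 the fixed
# point of x <- Pi_K(x - t*Q(x)) satisfies 0 in Q(x*) + N_K(x*), i.e.
# <Q(x*), v - x*> >= 0 for all v in K.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([-1.0, 1.0])

def Q(x):
    return A @ x + b

def proj_K(x):
    return np.clip(x, 0.0, 1.0)          # projection onto the box

x = np.array([0.5, 0.5])
t = 0.2                                   # t < 2/lambda_max(A): contraction
for _ in range(500):
    x = proj_K(x - t * Q(x))

q = Q(x)
rng = np.random.default_rng(4)
for _ in range(200):                      # check the VI on random points of K
    v = rng.uniform(0.0, 1.0, 2)
    assert q @ (v - x) >= -1e-6
```

For this data the fixed point is x∗ = (1/3, 0) with Q(x∗) = (0, 4/3), so −Q(x∗) lies in the normal cone of K at x∗.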

A1.6.48 Theorem. ([418], Corollary 32.35)
Let T : V → 2^{V′} be a maximal monotone and weakly coercive operator. Then rge(T) = V.

Of particular interest for us are inclusion problems with operators of special structure, like T + ∂f.

A1.6.49 Theorem. ([59], Prop. 3)
Let T : V → 2^{V′} be a monotone operator and f : V → R ∪ {+∞} a proper, lsc and convex functional. Suppose further that the following conditions are satisfied:

(i) dom(T) ∩ dom(∂f) ≠ ∅ and rge(∂f) = V;

(ii) T + ∂f is maximal monotone.

Then rge(T + ∂f) = V .

This theorem does not ensure uniqueness of the solution. Uniqueness is guaranteed if the operator T + ∂f is strictly monotone, as follows immediately from the definition. Strongly monotone operators are strictly monotone and weakly coercive and therefore ensure both existence and uniqueness.

A1.6.50 Theorem. ([353], Thm. 12.54)
For a maximal monotone and strongly monotone operator T : V → 2^{V′} the inclusion IP(T, V) has a unique solution.


A1.7 Convex minimization problems in Hilbert spaces

Now we consider the problem:

minimize J(u), subject to u ∈ K,
K := {v ∈ V : gj(v) ≤ 0, j = 1, ..., m}, (A1.7.35)

where J, gj : V → R are convex, lower semi-continuous functionals.

Due to Proposition A1.5.13(iii) the set K is convex and closed, hence Problem (A1.7.35) is a particular case of Problem (A1.5.11).

Together with this problem we consider the saddle point problem:

find (ū, λ̄) ∈ V × R^m_+ such that

L(u, λ̄) ≥ L(ū, λ̄) ≥ L(ū, λ) ∀ u ∈ V, ∀ λ ∈ R^m_+, (A1.7.36)

with

L(u, λ) = J(u) + Σ_{j=1}^{m} λj gj(u). (A1.7.37)

Function L is called the Lagrangian function of Problem (A1.7.35), and the pair (ū, λ̄) satisfying (A1.7.36) is called a saddle point of L, with λ̄ the Lagrange multiplier vector corresponding to ū.

For L̄(u) = sup_{λ≥0} L(u, λ) we obtain

L̄(u) = J(u) if u ∈ K, and +∞ otherwise.

Hence, one can reformulate Problem (A1.7.35) in the form

min{L̄(u) : u ∈ V}.

The problem

min{L(λ) : λ ∈ R^m_+}, (A1.7.38)

with L(λ) = inf_{u∈V} L(u, λ), is called the dual problem to Problem (A1.7.35). One can easily show that for any u ∈ V and λ ∈ R^m_+

L̄(u) ≥ L(λ),

and equality in the latter relation means that (u, λ) is a saddle point of the Lagrangian function. Hence, if (ū, λ̄) is a saddle point of (A1.7.36), then ū is a solution of Problem (A1.7.35).

Later on, we need the following constraint qualification for the constraints in Problem (A1.7.35):

Slater condition: there exists a point ū such that

gj(ū) < 0 for j = 1, ..., m.


A1.7.51 Theorem. (Kuhn–Tucker theorem)
Assume the constraints of Problem (A1.7.35) fulfill the Slater condition and ū is a solution of this problem. Then there exists a vector λ̄ ∈ R^m_+ such that (ū, λ̄) is a saddle point of Problem (A1.7.36).

Statements of this type under different constraint qualifications are the foundation of the theory of convex programming.

A1.7.52 Proposition. Assume the Slater condition is fulfilled for Problem (A1.7.35). Then ū ∈ K is a solution of this problem iff there exists a vector λ̄ ≥ 0 such that

0 ∈ ∂J(ū) + Σ_{j=1}^{m} λ̄j ∂gj(ū),
λ̄j gj(ū) = 0, j = 1, ..., m. (A1.7.39)
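The optimality system (A1.7.39) can be verified on a toy smooth problem. The following sketch (our example) minimizes J(u) = ‖u‖² subject to the single constraint g(u) = 1 − u₁ ≤ 0, whose solution is ū = (1, 0) with multiplier λ̄ = 2:

```python
import numpy as np

# Toy check of (A1.7.39): minimize J(u) = ||u||^2 subject to g(u) = 1 - u_1 <= 0.
# The solution is u_bar = (1, 0) with multiplier lambda_bar = 2.
u_bar = np.array([1.0, 0.0])
lam = 2.0

grad_J = 2.0 * u_bar                 # gradient of J at u_bar (here ∂J = {∇J})
grad_g = np.array([-1.0, 0.0])       # gradient of g
g = 1.0 - u_bar[0]                   # constraint value at u_bar

# Slater condition: g(2, 0) = -1 < 0.
assert 1.0 - 2.0 < 0.0
# Stationarity 0 = grad_J + lambda*grad_g and complementarity lambda*g = 0:
assert np.allclose(grad_J + lam * grad_g, 0.0)
assert lam * g == 0.0 and lam >= 0.0
```

Since J and g are differentiable here, the subdifferentials in (A1.7.39) reduce to singletons and the inclusion becomes the stationarity equation checked above.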

If in Problem (A1.5.11) the set K is defined by

K = {u ∈ V : B(u) ≤ 0}, (A1.7.40)

with B a convex mapping from V into a Banach space Y, then the following analogue of the Kuhn–Tucker theorem holds (see Krabs [248]). Denote by Y′+ the non-negative cone in the dual space Y′.

A1.7.53 Proposition. Assume that int Y+ ≠ ∅ and that for each λ ∈ Y′+, λ ≠ 0, a point u ∈ V exists such that

〈λ, B(u)〉_Y < 0. (A1.7.41)

Then ū is a solution of Problem (A1.5.11), (A1.7.40) iff there exists an element λ̄ ∈ Y′+ such that for all u ∈ V and λ ∈ Y′+

J(u) + 〈λ̄, B(u)〉_Y ≥ J(ū) + 〈λ̄, B(ū)〉_Y ≥ J(ū) + 〈λ, B(ū)〉_Y.

Now, let us consider the following convex semi-infinite problem

(SIP) minimize J(u) subject to u ∈ K ≡ {v ∈ Rⁿ : g(v, t) ≤ 0 ∀ t ∈ T}, (A1.7.42)

with T a compact subset of a normed space Y; J and gt : u → g(u, t) (t ∈ T) convex, finite-valued functions; gu : t → g(u, t) a continuous function on T for any u ∈ Rⁿ,

and assume that the set K fulfills the following Slater condition:

∃ ū : sup_{t∈T} g(ū, t) < 0. (A1.7.43)

Setting Y = C(T) and B : u → g(u, ·), Problem (A1.7.42) can be considered as a special case of Problem (A1.5.11), (A1.7.40). Since condition (A1.7.41) is a consequence of (A1.7.43), Proposition A1.7.53 is also true for Problem (A1.7.42).

However, in that case the space Y′ is a space of measures (containing all Dirac measures) and is very inconvenient with respect to an approximation.


The following statement characterizes the connections between convex semi-infinite and finite-dimensional problems (cf. Pshenicnyj [338]; for other variants of reduction theorems see Hettich and Jongen [175]).

Let T(u) = {t ∈ T : g(u, t) = max_{τ∈T} g(u, τ)}.

A1.7.54 Proposition. (Reduction theorem)
Assume that for Problem (A1.7.42) the Slater condition is fulfilled. Then ū is a minimizer of the problem considered iff there exist r ≤ n + 1 points ti ∈ T(ū) such that ū is a solution of the problem

min{J(u) : g(u, ti) ≤ 0, i = 1, ..., r}. (A1.7.44)

Usually, the set {t1, ..., tr} is called the reduction set. This proposition implies immediately that Problem (A1.7.44) is equivalent to (A1.7.42) if J is strictly convex and Arg min_{u∈K} J(u) ≠ ∅.
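The reduction theorem can be illustrated on a simple semi-infinite description of the unit disk. In the sketch below (our example) n = 2, g(u, t) = cos(t)u₁ + sin(t)u₂ − 1 with T = [0, 2π], so K is the closed unit disk; for J(u) = ‖u − (2, 0)‖² the minimizer is ū = (1, 0), the active set is T(ū) = {0}, and the single reduced constraint u₁ ≤ 1 already reproduces ū (r = 1 ≤ n + 1):

```python
import numpy as np

# Reduction-theorem illustration: K = {u : cos(t)*u1 + sin(t)*u2 <= 1 for all
# t in [0, 2*pi]} is the unit disk; minimize J(u) = ||u - c||^2 with c = (2,0).
c = np.array([2.0, 0.0])

# SIP solution: the projection of c onto the unit disk.
u_sip = c / np.linalg.norm(c)

# Reduced problem: minimize ||u - c||^2 subject to u_1 <= 1 only (t = 0).
u_red = c.copy()
u_red[0] = min(u_red[0], 1.0)

assert np.allclose(u_sip, np.array([1.0, 0.0]))
assert np.allclose(u_red, u_sip)            # one constraint suffices, r = 1

# t = 0 is indeed the only active index: g(u_bar, 0) = 0, g(u_bar, t) < 0 else.
ts = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
gvals = np.cos(ts) * u_sip[0] + np.sin(ts) * u_sip[1] - 1.0
assert abs(gvals[0]) < 1e-12 and (gvals[1:] < 0).all()
```

Since J is strictly convex here, the reduced problem (A1.7.44) is equivalent to the full semi-infinite problem, as stated above.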

Finally, in this section some estimates and techniques are established which are useful for the numerical analysis of the problems under consideration.

Let S, Q ⊂ V be arbitrary sets. With

ρ(u, Q) = inf_{v∈Q} ‖u − v‖, ρ(S, Q) = sup_{u∈S} ρ(u, Q),

the Hausdorff distance ρH between S and Q is defined by

ρH(S, Q) = max{ρ(S, Q), ρ(Q, S)}.
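For finite point sets these definitions can be transcribed directly; the following sketch (illustrative only) also shows why both one-sided distances are needed:

```python
import numpy as np

# Direct transcription of the definitions above for finite point sets in R^d.
def rho(u, Q):
    return min(np.linalg.norm(u - v) for v in Q)       # rho(u, Q)

def rho_one_sided(S, Q):
    return max(rho(u, Q) for u in S)                   # rho(S, Q)

def hausdorff(S, Q):
    return max(rho_one_sided(S, Q), rho_one_sided(Q, S))

S = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
Q = [np.array([0.0, 0.0])]
assert rho_one_sided(Q, S) == 0.0      # Q sits inside S ...
assert hausdorff(S, Q) == 1.0          # ... but S sticks out by 1
```

The one-sided distance ρ(Q, S) vanishes here although the sets differ, which is exactly why ρH takes the maximum of both.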

A1.7.55 Proposition. Let ψj : V → R (j = 1, ..., m) be convex, lsc functionals and for some c = (c1, ..., cm) let the set

Uc = {u ∈ V : ψj(u) ≤ cj, j = 1, ..., m}

be bounded. Moreover, assume that a point ū exists with

ψj(ū) < cj, j = 1, ..., m.

Then the set

Ud = {u ∈ V : ψj(u) ≤ dj, j = 1, ..., m}

is bounded for any d = (d1, ..., dm). If, moreover, dj ↓ cj, j = 1, ..., m, then ρ(Ud, Uc) → 0.

Proof: Without loss of generality we take cj = 0, j = 1, ..., m. Let δ = max_{1≤j≤m} dj and Uδ = {u ∈ V : ψj(u) ≤ δ, j = 1, ..., m}. Obviously, Ud ⊂ Uδ. Setting ψ(·) = max_{1≤j≤m} ψj(·), we reformulate

Uc ≡ U0 = {u ∈ V : ψ(u) ≤ 0}, Uδ = {u ∈ V : ψ(u) ≤ δ},

and because the ψj are lsc functionals, these sets are closed.

Let ρ0 = sup{‖u − ū‖ : u ∈ U0}. For arbitrary u ∈ ∂U0 (∂U0 is the boundary of U0) we define the point z = u + λ(u − ū), with λ = ρ0 δ (|ψ(ū)| · ‖u − ū‖)⁻¹. Then,

u = (1/(1 + λ)) z + (λ/(1 + λ)) ū

Page 485: Proximal Point Methods for Variational Problems · point regularization for ill-posed VI’s with monotone operators and convex opti- ... In the monograph "Stable Methods for Ill-posed

A1. SOME PRELIMINARY RESULTS IN FUNCTIONAL ANALYSIS 469

and, due to ψ(u) = 0 and the convexity of ψ,

0 = ψ(u) ≤ (1/(1 + λ)) ψ(z) + (λ/(1 + λ)) ψ(ū)

holds true, hence,

ψ(z) ≥ λ|ψ(ū)| = ρ0 δ ‖u − ū‖⁻¹ ≥ δ.

But z − ū = (1 + λ)(u − ū), therefore,

‖z − ū‖ ≤ (1 + ρ0 δ (|ψ(ū)| · ‖u − ū‖)⁻¹) ‖u − ū‖ ≤ ρ0 + ρ0 δ |ψ(ū)|⁻¹ ≡ r0.

Consequently, Uδ ⊂ B_{r0}(ū), with B_{r0}(ū) = {u ∈ V : ‖u − ū‖ ≤ r0}.

Now, let dj ↓ 0, j = 1, ..., m. This leads to δ ↓ 0 and, assuming that

lim sup_{δ↓0} ρ(Uδ, U0) = α > 0,

we have

lim_{δ↓0} ρ(Uδ, U0) = α > 0

for a monotone subsequence δ ↓ 0, which is used in the sequel instead of the initial sequence. Then

ρ(∩_{δ>0} Uδ, U0) = α > 0

and one can choose points w and w̄ such that

w ∈ (∩_{δ>0} Uδ) \ U0, w̄ ∈ [ū, w] ∩ ∂U0.

Due to the convexity of ψ on the interval [ū, w] and ψ(ū) < 0, ψ(w̄) = 0, we obtain ψ(w) = const > 0. But this is impossible, because w ∈ Uδ for all δ > 0 belonging to the considered monotone subsequence.

For a finite-dimensional space this result remains true without the assumption about the existence of a Slater point for the set Uc. Indeed, as before we can suppose that cj = 0, j = 1, ..., m, and because U0 ⊂ B_{ρ′}(v) with some v ∈ U0 and ρ′, the relation

min_{u∈∂B_{2ρ′}(v)} ψ(u) = c0 > 0

is true. Hence, U_{c0} ⊂ B_{2ρ′}(v) and ψj(v) < c0 for all j. Now, from Proposition A1.7.55 we conclude that Ud is bounded for any d.

Moreover, we suppose that dj ↓ 0, j = 1, ..., m, and that

lim_{δ↓0} ρ(Uδ, U0) > 0

for a monotone subsequence δ ↓ 0 (as before, δ = max_{1≤j≤m} dj). Then, choosing w ∈ (∩_{δ>0} Uδ) \ U0 and observing the continuity of ψ and the closedness of U0, we obtain ψ(w) > 0, but this is also impossible.

The following simple example shows that the Slater condition is essential in the infinite-dimensional case. Let V = ℓ², m = 1 and

ψ(u) = Σ_{i=1}^{∞} i⁻¹ u_i².

Obviously, {u : ψ(u) ≤ 0} = {0}, and for any d > 0 the set Ud = {u : ψ(u) ≤ d} contains the points u = (0, ..., 0, √(di), 0, ...), with √(di) the value of the i-th component. Consequently, Ud is unbounded.
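This counterexample can be checked numerically on truncated sequences; the sketch below (ours) confirms that the points √(di)·e_i all lie in Ud while their norms √(di) grow without bound:

```python
import numpy as np

# The l^2 example made concrete: psi(u) = sum_i u_i^2 / i has
# {u : psi(u) <= 0} = {0}, yet for every d > 0 the level set U_d contains
# u^(i) = sqrt(d*i) * e_i, whose norms sqrt(d*i) grow without bound.
d = 0.5
norms = []
for i in range(1, 6):
    u = np.zeros(10)
    u[i - 1] = np.sqrt(d * i)          # the i-th unit vector, scaled
    psi = sum(u[k] ** 2 / (k + 1) for k in range(10))
    assert abs(psi - d) < 1e-12        # psi(u^(i)) = d, so u^(i) lies in U_d
    norms.append(np.linalg.norm(u))

assert norms == sorted(norms) and norms[-1] > norms[0]   # norms increase
```

So U0 = {0} is bounded, yet Ud is unbounded for every d > 0: the conclusion of Proposition A1.7.55 indeed fails without a Slater point.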


A2 Approximation Techniques and Estimates of Solutions

Our investigation of variational problems is mainly concerned with monotone (elliptic) variational inequalities. To provide a finite-dimensional approximation of such problems, finite difference methods (FDM), wavelets and finite element methods (FEM) are applicable. The latter are most popular and we will use them here.

There are numerous publications, including a number of monographs, devoted to finite element approximations for linear as well as nonlinear problems of mathematical physics (see, in particular, Strang and Fix [382], Ciarlet [75], Glowinski, Lions and Tremolieres [135], Neittaanmaki, Sprekels and Tiba [304]). In this section finite element methods and some estimates of finite element approximations are briefly described.

A2.1 Some examples of monotone variational inequalities

In order to illustrate the peculiarity of the approximation of variational inequalities, we consider three examples. For simplicity, everywhere in this section the set Ω is supposed to be a polygonal domain in R² with boundary Γ.

Dirichlet Problem:

−∆u = f in Ω,
u = 0 on Γ. (A2.1.1)

Here and in the examples below the equalities or inequalities with respect to functions are to be understood in the corresponding functional spaces (see Kinderlehrer and Stampacchia [234], Chapt. 2). They can be considered formally until suitable assumptions or statements about the regularity of the solutions are made.

For fixed f ∈ L²(Ω) the Dirichlet problem has a unique weak solution u ∈ V ≡ H¹₀(Ω), defined by

〈∇u, ∇v〉0,Ω = 〈f, v〉0,Ω ∀ v ∈ V. (A2.1.2)

In particular, if Ω is a convex polygon, then u ∈ H²(Ω) ∩ H¹₀(Ω).

Obstacle Problem:

−∆u ≥ f, u ≥ ϕ, (−∆u − f)(u − ϕ) = 0 in Ω,
u = 0 on Γ,
u = ϕ, ∂u/∂ν = ∂ϕ/∂ν on Γ∗, (A2.1.3)

with f ∈ L²(Ω), ϕ ∈ H²(Ω), ϕ|Γ ≤ 0, Γ∗ the unknown "boundary" between

{x ∈ Ω : u(x) = ϕ(x)} and {x ∈ Ω : u(x) > ϕ(x)}


and ν a unit normal to Γ∗.

The relations (A2.1.3) characterize the unique solution u of the variational inequality

〈∇u,∇(v − u)〉0,Ω ≥ 〈f, v − u〉0,Ω ∀ v ∈ K, (A2.1.4)

K = {w ∈ V : w ≥ ϕ on Ω}, V = H¹₀(Ω). (A2.1.5)

Conditions ensuring that u ∈ H¹₀(Ω) ∩ H²(Ω) are described in Brezis and Stampacchia [53] and Lewy and Stampacchia [264]. For stronger foundations of the regularity of the solutions see Friedman [120].
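The structure of (A2.1.3) can be seen in a hedged one-dimensional analogue (ours, not from the text): discretizing −u″ ≥ f, u ≥ ϕ, u(0) = u(1) = 0 on a uniform grid yields a complementarity problem, which a projected Gauss–Seidel sweep solves:

```python
import numpy as np

# 1-d analogue of the obstacle problem: discretize -u'' >= f, u >= phi,
# u(0) = u(1) = 0, and solve the complementarity system by projected
# Gauss-Seidel (update, then cut off at the obstacle).
n = 50
h = 1.0 / (n + 1)
f = -8.0 * np.ones(n)                      # constant load pushing u down
phi = -0.05 * np.ones(n)                   # flat obstacle below zero

u = np.zeros(n)
for _ in range(5000):
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = max(phi[i], 0.5 * (h * h * f[i] + left + right))

# residual r_i ~ (-u'')_i - f_i must be >= 0, with r_i = 0 where u_i > phi_i
r = np.empty(n)
for i in range(n):
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < n - 1 else 0.0
    r[i] = (2 * u[i] - left - right) / h**2 - f[i]

assert (u >= phi - 1e-12).all()               # feasibility: u >= phi
assert (r >= -1e-6).all()                     # -u'' - f >= 0
assert (np.abs(r * (u - phi)) < 1e-6).all()   # complementarity
```

The three assertions mirror exactly the three relations u ≥ ϕ, −∆u − f ≥ 0 and (−∆u − f)(u − ϕ) = 0 in (A2.1.3).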

Problem with Obstacle on the Boundary (for short: Boundary-obstacle Problem):

−∆u = f in Ω,
u − ϕ0 ≥ 0, ∂u/∂ν ≥ 0, (u − ϕ0) ∂u/∂ν = 0 on Γ, (A2.1.6)

with f ∈ L²(Ω), ϕ0 the trace on Γ of some function ϕ ∈ H²(Ω) and ν a unit outward normal in a point of Γ. We also assume that (f, 1)0,Ω < 0.

If the solution of Problem (A2.1.6) belongs to H²(Ω), then the problem is equivalent to the variational inequality

(∇u, ∇(v − u))0,Ω ≥ (f, v − u)0,Ω ∀ v ∈ K, (A2.1.7)

K = {w ∈ V : w ≥ ϕ0 on Γ}, V = H¹(Ω). (A2.1.8)

Under the conditions described, all these problems can be reformulated as convex variational problems:

min_{u∈K} J(u), J(u) = (1/2) a(u, u) − 〈f, u〉, (A2.1.9)

with a(u, v) = (∇u, ∇v)0,Ω. For the Dirichlet Problem (A2.1.1) the set K has to be chosen as K = V, whereas in the Obstacle Problem (A2.1.3) and the Boundary-obstacle Problem (A2.1.6) the set K is defined by (A2.1.5) and (A2.1.8), respectively. Here 〈·, ·〉 is the duality pairing between V and V′, and 〈f, u〉 coincides with the scalar product (f, u)0,Ω for f ∈ L²(Ω).

The statements above on the equivalence between boundary value problems and variational inequalities remain also true for more general domains (see Friedman [120], Hlavacek et al. [182]). The Obstacle Problems (A2.1.3) and (A2.1.6) have a unique solution. Moreover, in (A2.1.3) the corresponding bilinear form a(·, ·) is coercive.

A2.2 Finite element methods

A2.2.1 Definition. A family {Vh} of finite-dimensional vector spaces is called an inner approximation of a Hilbert space V if

(i) Vh ⊂ V for all h, and

(ii) for each v ∈ V there exist elements vh ∈ Vh such that ‖vh − v‖V → 0 if h ↓ 0.


Sometimes it is more convenient to reformulate condition (ii) into

(ii)′ for every v ∈ V̄, with V̄ a dense set in V, there exist elements vh ∈ Vh such that ‖vh − v‖ → 0 if h ↓ 0.

We first describe a finite element method for polygonal domains Ω ⊂ R² and show afterwards its modification for more general domains.

A2.2.2 Definition. For a polygonal domain Ω a sequence of triangulations {Th} with parameter h ↓ 0 is called quasi-uniform if for fixed h the set Th consists of open triangles Tj such that:

(i) Ω = int(∪_{Tj⊂Th} T̄j) =: Ωh (T̄j the closure of Tj);

(ii) Ti ∩ Tj = ∅ for i ≠ j; the closures T̄i and T̄j have at most a common vertex or a common side;

(iii) the maximal length of a side of each triangle Tj is not larger than h;

(iv) the area of each triangle Tj ∈ Th is bounded from below by ch², with c > 0 independent of h and Tj.

Now we consider some special finite element spaces which are subspaces of H¹(Ω) or H¹₀(Ω). Usually, when solving boundary value problems with (essentially nonlinear) differential operators of second order on polygonal domains, approximations by elements of these subspaces are performed.

We further introduce the following notations:

Mh is the set of all vertices of the triangulation,
M⁰h := Mh ∩ Ωh, Γh := Ω̄h \ Ωh, Nh := Mh ∩ Γh, (A2.2.10)
Ih, I⁰h, I′h are index sets of the sets Mh, M⁰h, Nh, respectively.

Let Pi, i ∈ Ih, be the vertices of the triangles in a fixed triangulation Th. For each i ∈ Ih we define a function ϕi ∈ C(Ω) that is affine on each triangle T ∈ Th and satisfies

ϕi(Pi) = 1, ϕi(Pj) = 0 for j ≠ i.

Using {ϕi} as a basis, we construct the finite element spaces

Vh := {vh(x) = Σ_{i∈Ih} αi ϕi(x) : αi ∈ R}

and

V⁰h := {vh(x) = Σ_{i∈I⁰h} αi ϕi(x) : αi ∈ R}.

The families of spaces {Vh} and {V⁰h} build an inner approximation of the spaces V = H¹(Ω) and V = H¹₀(Ω), respectively. In order to verify this we recall that C^∞(Ω) and D(Ω) are dense in H¹(Ω) and H¹₀(Ω), respectively.


A2.2.3 Remark. We make use of an affine mapping of an arbitrary triangle T ⊂ Th onto a standard triangle ∆ ⊂ R² (with vertices (0, 0), (1, 0), (0, 1)). Such a mapping can be represented as a successive transformation of T onto the triangle ∆h (with vertices (0, 0), (h, 0), (0, h)) and of ∆h onto ∆. Due to quasi-uniformity of the triangulation, the affine mapping Q : T → ∆h and its inverse have Jacobians uniformly bounded away from 0 (with respect to h and T). Consequently, for functions v ∈ H¹(T) and vQ = v ∘ Q⁻¹ ∈ H¹(∆h) we get

c1 ‖v‖s,T ≤ ‖vQ‖s,∆h ≤ c2 ‖v‖s,T, (A2.2.11)

where s = 0, 1, and c1 > 0 and c2 are independent of h and T. ♦

A2.2.4 Definition. A function uI ∈ Vh(Ω) is called interpolant of a function u ∈ C(Ω) (on the triangulation Th) if

uI(Pi) = u(Pi) ∀ i ∈ Ih.

In the case Ω ⊂ R², regarding the continuous embedding H²(Ω) → C(Ω), the notion of an interpolant can be extended to functions u ∈ H²(Ω), too.

A2.2.5 Theorem. If u ∈ H²(Ω) (resp. u ∈ H²(Ω) ∩ H¹₀(Ω)), then the interpolant uI ∈ Vh(Ω) (resp. uI ∈ V⁰h(Ω)) satisfies the inequality

‖u − uI‖0,Ω + h‖∇(u − uI)‖0,Ω ≤ c h² |u|2,Ω, (A2.2.12)

with |u|2,Ω = (Σ_{|α|=2} ‖D^α u‖²)^{1/2}.

Proof: We will denote by c different constants which are independent of u, h and Th. Firstly, estimate (A2.2.12) is established on ∆h. Put

y1 := x1/h, y2 := x2/h and ū(y1, y2) := u(x1, x2), ūI(y1, y2) := uI(x1, x2).

Then, obviously,

ūI(y1, y2) = ū(0, 0) + [ū(1, 0) − ū(0, 0)] y1 + [ū(0, 1) − ū(0, 0)] y2

and, consequently,

‖ūI‖0,∆ ≤ 5 ‖ū‖C(∆).

Furthermore,

‖ū − ūI‖0,∆ ≤ ‖ū‖0,∆ + ‖ūI‖0,∆ ≤ ‖ū‖2,∆ + 5 ‖ū‖C(∆),

and using the continuous embedding of H²(∆) into C(∆), we get

‖ū − ūI‖0,∆ ≤ c ‖ū‖2,∆. (A2.2.13)

Since linear interpolants reproduce affine functions, it follows that

ūI − wI = ūI − w for any w(x1, x2) = α + βx1 + γx2.


Hence, in view of (A2.2.13),

‖ū − ūI‖0,∆ = ‖(ū − w) − (ūI − wI)‖0,∆ ≤ c ‖ū − w‖2,∆,

and because w is arbitrarily chosen,

‖ū − ūI‖0,∆ ≤ c inf_{z∈P1(∆)} ‖ū − z‖2,∆, (A2.2.14)

with P1(∆) the set of affine functions on ∆. Similarly we can obtain

‖∇(ū − ūI)‖0,∆ ≤ c inf_{z∈P1(∆)} ‖ū − z‖2,∆.

Now, it is easy to show that

inf_{z∈P1(∆)} ‖ū − z‖2,∆ ≤ c |ū|2,∆

and thus,

‖ū − ūI‖0,∆ ≤ c |ū|2,∆, (A2.2.15)
‖∇(ū − ūI)‖0,∆ ≤ c |ū|2,∆. (A2.2.16)

Using the transformation x1 = y1 h, x2 = y2 h, we get

‖ū − ūI‖0,∆ = h⁻¹ ‖u − uI‖0,∆h,
‖∇(ū − ūI)‖0,∆ = ‖∇(u − uI)‖0,∆h,
|ū|2,∆ = h |u|2,∆h,

and due to (A2.2.15), (A2.2.16),

‖u − uI‖0,∆h + h ‖∇(u − uI)‖0,∆h ≤ c h² |u|2,∆h. (A2.2.17)

From (A2.2.11) and (A2.2.17) it follows that for any T ∈ Th

‖u − uI‖0,T + h ‖∇(u − uI)‖0,T ≤ c h² |u|2,T. (A2.2.18)

Now, in order to obtain (A2.2.12), we square both sides of inequality (A2.2.18) and sum over all triangles T ∈ Th.
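The O(h²) behaviour of the interpolation error in (A2.2.12) can be observed numerically; the following sketch (ours, a one-dimensional analogue of the two-dimensional proof above) measures the discrete L² error of the piecewise linear interpolant of a smooth function on two meshes:

```python
import numpy as np

# 1-d analogue of (A2.2.12): the L^2 error of the piecewise linear
# interpolant of a smooth u decreases like h^2 under mesh refinement.
u = np.sin  # smooth test function on [0, 1]

def l2_interp_error(n):
    xs = np.linspace(0.0, 1.0, n + 1)          # mesh nodes, h = 1/n
    fine = np.linspace(0.0, 1.0, 20 * n + 1)   # evaluation grid
    uI = np.interp(fine, xs, u(xs))            # piecewise linear interpolant
    return np.sqrt(np.mean((u(fine) - uI) ** 2))   # discrete L^2 norm

e1, e2 = l2_interp_error(16), l2_interp_error(32)
rate = np.log2(e1 / e2)
assert 1.8 < rate < 2.2                        # observed order ~ h^2
```

Halving h reduces the error by roughly a factor of four, i.e., the observed convergence order is about 2, in agreement with the h² factor in the estimate.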

A2.2.6 Definition. For a domain Ω ⊂ R², not necessarily polygonal, a system of triangulations {Th} is called uniform if it is constructed as follows (see Fig. A2.2.1 below):

(i) Ω ⊂ R² is covered by a uniform grid (with step lengths hx and hy in direction of the corresponding axes);

(ii) the arising rectangles are split into triangles by diagonals forming acute angles with the x-axis;

(iii) Th is the set of all open triangles belonging to Ω;

(iv) the ratio (1/h) min{hx, hy} is uniformly bounded from below by a constant c > 0, with h := √(hx² + hy²), h ↓ 0.


Figure A2.2.1: A uniform triangulation: grid with step lengths hx, hy, boundary Γ of Ω and discrete boundary Γh.

Figure A2.2.2: The support Q = supp ϕ of the standard basis function ϕ in the (s, t)-plane, with the vertices (1, 0), (0, 1), (−1, 0), (0, −1) marked.

Obviously, for rectangular domains this triangulation Th satisfies the condition Ωh ≡ Ω and enables us in this case to approximate Ω with high accuracy.

For a uniform triangulation, basis functions ϕi can be constructed using the standard function

ϕ(s, t) := 1 − (1/2)(|s| + |t| + |s − t|) for (s, t) ∈ Q, and 0 otherwise,

with Q = {(s, t) : −1 ≤ s, t ≤ 1, |s − t| ≤ 1} (see Fig. A2.2.2). Indeed, for a vertex Pi = (k hx, l hy) (k, l integers), we get

ϕi(x, y) = ϕ(x/hx − k, y/hy − l).
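The nodal property of the standard function ϕ is easy to verify directly; the following sketch (illustrative) checks that ϕ equals 1 at the origin and vanishes at all neighbouring grid vertices:

```python
# Quick check of the standard basis function phi(s, t): it equals 1 at the
# origin and 0 at all neighbouring grid vertices, as required of a nodal basis.
def phi(s, t):
    val = 1.0 - 0.5 * (abs(s) + abs(t) + abs(s - t))
    return max(val, 0.0)           # vanishes outside Q = supp(phi)

assert phi(0, 0) == 1.0
for vertex in [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1)]:
    assert phi(*vertex) == 0.0     # zero at the six neighbouring vertices
assert phi(0.25, 0.25) > 0.0       # positive inside the support
```

This confirms the interpolation property ϕi(Pi) = 1, ϕi(Pj) = 0 for j ≠ i of the resulting basis functions.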

This representation essentially simplifies the discretization.

However, in general situations a uniform triangulation of the whole domain Ω leads to a distance of order h between the boundary Γ and the discrete


boundary Γh of the grid domain, i.e., large numerical errors may occur.

Now, let us consider a more effective approach to the triangulation, which guarantees a good accuracy of the approximation of the domain Ω, supposing that Γ is a piecewise curve of class C². We construct a closed broken line Γh with the following properties:

(i) the domain Ωh bounded by Γh is contained in Ω;

(ii) for the Hausdorff distance it holds ρH(Γ, Γh) ≤ δh², with δ independent of h;

(iii) the lengths of the segments of Γh are bounded from below by ch, with c > 0 independent of h;

(iv) the triangulation of Ωh is carried out in the same manner as for polygonal domains, preserving the quasi-uniformity for h ↓ 0 (cf. Definition A2.2.2).

A triangulation with these properties can always be constructed; moreover, it can be made uniform on the major part of the domain Ω, except near the boundary Γ_h.

A2.2.7 Remark. Sometimes it is more convenient to build a domain Ω_h with the property Ω_h ⊃ Ω, maintaining the conditions (ii)–(iv) above. ♦

The spaces V_h, which form an inner approximation of V, can be obtained by a suitable continuation of the functions

v_h(x) := Σ_{i∈I_h} α_i ϕ_i(x) on Ω \ Ω_h

or

v_h(x) := Σ_{i∈I_h^0} α_i ϕ_i(x) on Ω \ Ω_h.

For V := H^1_0(Ω) the space V_h^0 can be constructed by means of the following continuation of the functions v_h:

v_h(x) := Σ_{i∈I_h^0} α_i ϕ_i(x) on Ω_h,  v_h(x) := 0 on Ω \ Ω_h.

In more general cases these constructions can face serious difficulties. For different approaches see Skarpini and Vivaldy [360], Marchuk and Agoschkov [286] and the references quoted there.

A2.2.8 Proposition. For a boundary of class C², estimate (A2.2.12) remains true for u ∈ H²(Ω) ∩ H^1_0(Ω) if the spaces V_h (for V = H¹(Ω)) and V_h^0 are constructed as above.

A2.2.9 Remark. Estimate (A2.2.12) also holds true for some other techniques of continuation or restriction of v_h from Ω_h to Ω, including the case Ω_h ⊃ Ω. Due to (A2.2.12) we obtain immediately that

‖u − u^I‖_{0,Ω} ≤ c h² |u|_{2,Ω}  (A2.2.19)

and, in case Ω_h ⊃ Ω,

‖u − u^I‖_{1,Ω} ≤ c̄ h² |u|_{2,Ω},  (A2.2.20)

with c and c̄ independent of u, h and T_h. ♦


Usually, the solutions of variational inequalities arising from problems with differential operators of second order (for short: variational inequalities of second order) do not belong to the space C²(Ω) (and sometimes not even to H²(Ω)), whatever smoothness of the data we have. Therefore, in order to solve such variational inequalities, it makes sense to restrict ourselves to piecewise affine basis functions: in that case the use of smoother functions does not improve the approximation estimates, but it increases the computational expense.

For linear variational inequalities of second order, when the solution has a better smoothness, and especially for linear equations and variational inequalities with differential operators of higher order, smoother functions can be constructed. Here, curved "triangular" elements are implemented in order to obtain a sufficiently exact approximation along the boundary (cf. Ciarlet [75], Oden [310], Korneev [245]).

A2.3 FEM-approximation of feasible sets

Let us return to the variational problem (A1.5.11) and consider an exemplary finite element approximation of the feasible set of this problem.

A2.3.10 Definition. The sequence {K_h} of closed convex sets is called an inner approximation of the feasible set K if

(i) K_h ⊂ V_h and for every v ∈ K there exist elements v_h ∈ K_h such that ‖v_h − v‖_V → 0 for h ↓ 0;

(ii) u_h ∈ K_h and u_h ⇀ u in V imply that u ∈ K.

The notion "inner" does not mean that K_h ⊂ K. Similarly to the inner approximation of the space V, it is sufficient to verify condition (i) for elements v ∈ K ∩ V̄, with V̄ a dense subset of V.

The conditions in Definition A2.3.10 are fulfilled for the Problems (A2.1.3) and (A2.1.6) with

K_h := {u_h = Σ_{i∈I_h^0} α_i ϕ_i(x) : α_i ≥ ϕ(P_i) ∀ i ∈ I_h^0}  (A2.3.21)

and

K_h := {u_h = Σ_{i∈I_h'} α_i ϕ_i(x) : α_i ≥ ϕ_0(P_i) ∀ i ∈ I_h'},  (A2.3.22)

(I_h^0 and I_h' according to (A2.2.10)),

respectively, if the system of triangulations is quasi-uniform. Indeed, for each v ∈ K ∩ V̄, with V̄ = D(Ω) in the case of Problem (A2.1.3) and V̄ = C^∞(Ω̄) for Problem (A2.1.6), the interpolant v_h^I ∈ V_h belongs to K_h. Due to Theorem A2.2.5 it holds that ‖v_h^I − v‖ → 0 for h ↓ 0, i.e., condition (i) in Definition A2.3.10 is fulfilled.

To verify condition (ii) in Definition A2.3.10 for Problem (A2.1.3), we note that the inclusion u_h ∈ K_h implies u_h ≥ ϕ_h^I and, consequently,

⟨ℓ, u_h − ϕ_h^I⟩ ≥ 0 for any ℓ ∈ V'_+.  (A2.3.23)


Since ‖ϕ_h^I − ϕ‖_V → 0 and u_h ⇀ u, inequality (A2.3.23) leads to

⟨ℓ, u − ϕ⟩ ≥ 0 for any ℓ ∈ V'_+,

i.e., u ∈ K. Now we consider the situation for Problem (A2.1.6). Since ϕ_0 is the trace of ϕ ∈ H²(Ω) ↪ C(Ω̄) and ‖ϕ − ϕ_h^I‖_{C(Ω̄)} → 0, there exists a sequence of numbers c_h ↓ 0 such that u_h + c_h ∈ K. Since u_h ⇀ u, we obtain u_h + c_h ⇀ u and, due to the weak closedness of K, the inclusion u ∈ K holds.

It would be desirable to obtain approximations of K satisfying simultaneously K_h ⊂ K. Unfortunately, this cannot be performed effectively. For instance, in Problem (A2.1.3) we should take into account the constraints

v_h(x) = Σ_{i∈I_h^0} α_i ϕ_i(x) ≥ ϕ(x) a.e. in Ω,

i.e., each set K_h has to be described by an infinite number of restrictions.

A2.4 FEM-approximation of variational problems

Let us suppose that the objective functional J for the convex variational problem (A1.5.11) is continuous on V = H¹(Ω) or V = H^1_0(Ω), respectively. Implementing a finite element method, we are looking for an approximate solution of Problem (A1.5.11) in the form

u_h(x) := Σ_i α_i ϕ_i(x),  (A2.4.24)

summing over i ∈ I_h or i ∈ I_h^0, depending on the choice of the space V.

Hence, for h ↓ 0 a sequence of approximate problems

min{J(u_h) : u_h ∈ K_h}  (A2.4.25)

is obtained. Assuming that J = J_1 + J_2, with J_1, J_2 convex and J_1 Gâteaux-differentiable on V, Problem (A2.4.25) corresponds to the variational inequality

u_h ∈ K_h : ⟨J'_1(u_h), v_h − u_h⟩ + J_2(v_h) − J_2(u_h) ≥ 0 ∀ v_h ∈ K_h.  (A2.4.26)

Now we prove convergence of a finite element method for variational inequalities with strongly monotone operators, i.e., the objective functional J in (A2.4.25) is supposed to be strongly convex. Here we assume that the Problems (A2.4.25) are solved approximately.

A2.4.11 Theorem. Suppose that the family of spaces {V_h} forms an inner approximation of the space V, and {K_h} approximates K in the sense of Definition A2.3.10. Moreover, let J be a continuous and strongly convex functional on V (with constant κ), and let an element u_h^ε be computed such that

ρ(q(u_h^ε), ∂J(u_h)) ≤ ε_h, ε_h ↓ 0,  (A2.4.27)

with

u_h := arg min_{v_h∈K_h} J(v_h) and some q(u_h^ε) ∈ ∂J(u_h^ε).


Then

lim_{h↓0} ‖u_h^ε − u‖ = 0,

with u = arg min_{u∈K} J(u).

Proof: Using estimate (A1.5.30), we conclude from (A2.4.27) that

‖u_h^ε − u_h‖ ≤ ε_h/(2κ).

Hence, it suffices to show that lim_{h↓0} ‖u_h − u‖ = 0. For arbitrary v ∈ K, due to condition (i) in Definition A2.3.10, one can choose points v_h ∈ K_h such that lim_{h↓0} ‖v_h − v‖ = 0. On account of Proposition A1.5.31, the inequality

J(v_h) − J(u_h) ≥ ⟨q(u_h), v_h − u_h⟩ + κ‖v_h − u_h‖²

is fulfilled for all q(u_h) ∈ ∂J(u_h), and with regard to Remark A1.5.35,

⟨q(u_h), v_h − u_h⟩ ≥ 0

is valid for some q(u_h) ∈ ∂J(u_h). Consequently,

κ‖v_h − u_h‖² ≤ J(v_h) − J(u_h) ≤ J(v_h) − min_{v∈V} J(v),  (A2.4.28)

√κ ‖u_h − v‖ ≤ ( J(v_h) − min_{v∈V} J(v) )^{1/2} + √κ ‖v_h − v‖.

Hence, in view of lim_{h↓0} ‖v_h − v‖ = 0 and the continuity of J, the sequence {u_h} is bounded and we can choose a subsequence that converges weakly to some w ∈ V. Because J is convex and continuous, it is also a weakly lower semicontinuous functional on V, and with regard to J(u_h) ≤ J(v_h) we obtain J(w) ≤ J(v). But condition (ii) in Definition A2.3.10 implies that w ∈ K. Since v is an arbitrary element of K, one can conclude that w is a solution of the original problem. Hence w = u, and the sequence {u_h} converges weakly to u. Using condition (i) in Definition A2.3.10, we choose a sequence {v_h}, v_h ∈ K_h, with lim_{h↓0} ‖v_h − u‖ = 0. Then (A2.4.28) leads to lim_{h↓0} ‖u_h − v_h‖ = 0, and consequently lim_{h↓0} ‖u_h − u‖ = 0. ∎

On the next pages we explain the discretization procedure for the variational inequalities introduced above, the Dirichlet Problem (A2.1.1), the Obstacle Problem (A2.1.3) as well as the Boundary-obstacle Problem (A2.1.6), in more detail. Here the approximate solutions of the first two problems are sought in the form

u_h(x) := Σ_{i∈I_h^0} α_i ϕ_i(x)

and this leads to the finite-dimensional problems

φ_h(α) := (1/2) α^T A_h α − f_h^T α → min, α ∈ K_h.  (A2.4.29)

In case of the Dirichlet Problem (A2.1.1) we have

K_h := R^n,  (A2.4.30)


and in case of the Obstacle Problem (A2.1.3)

K_h := {α ∈ R^n : α_i ≥ ϕ(P_i), i ∈ I_h^0},  (A2.4.31)

(n = |I_h^0| is the cardinality of the set I_h^0 described in (A2.2.10)). For convenience of the description we take I_h^0 = {1, ..., n} and denote

A_h := (a_{ij})_{i,j=1}^n, a_{ij} = a(ϕ_i, ϕ_j);  f_h := (f_1, ..., f_n)^T, f_i = ⟨f, ϕ_i⟩.

For the Boundary-obstacle Problem (A2.1.6) the approximate solutions have the form

u_h = Σ_{i∈I_h} α_i ϕ_i(x),

thus we obtain the finite-dimensional minimization problems

φ̃_h(α) := (1/2) α^T Ã_h α − f̃_h^T α → min, α ∈ K̃_h,  (A2.4.32)

with

K̃_h := {α ∈ R^n : α_i ≥ ϕ_0(P_i), i ∈ I_h'},  I_h' := {1, ..., n},  Ã_h := (a_{ij})_{i,j=1}^n,  f̃_h := (f_1, ..., f_n)^T  (A2.4.33)

and a_{ij}, f_i defined as above.

Let P_{T_k} := (x_{T_k}, y_{T_k}) (k = 1, 2, 3) be the vertices of some triangle T ∈ T_h. Then an affine function η_T(x, y) having the value α_{T_k} at P_{T_k} can be described by

η_T(x, y) := (1/σ_T) Σ_{k=1}^{3} (β_{T_k} x + γ_{T_k} y + δ_{T_k}) α_{T_k},  (A2.4.34)

with

β_{T_1} = y_{T_2} − y_{T_3},  γ_{T_1} = x_{T_3} − x_{T_2},  δ_{T_1} = x_{T_2} y_{T_3} − x_{T_3} y_{T_2},
β_{T_2} = y_{T_3} − y_{T_1},  γ_{T_2} = x_{T_1} − x_{T_3},  δ_{T_2} = x_{T_3} y_{T_1} − x_{T_1} y_{T_3},  (A2.4.35)
β_{T_3} = y_{T_1} − y_{T_2},  γ_{T_3} = x_{T_2} − x_{T_1},  δ_{T_3} = x_{T_1} y_{T_2} − x_{T_2} y_{T_1},

and σ_T = Σ_{k=1}^{3} δ_{T_k}, i.e., |σ_T| = 2 m(T), with m(T) the area of the triangle T.
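These formulas are easy to check numerically; the following sketch (our code, with assumed names) verifies that η_T interpolates its nodal values, that |σ_T| = 2m(T), and that Σ_k β_{T_k} = Σ_k γ_{T_k} = 0 — the identity behind the degeneration for constant vectors in the Boundary-obstacle Problem:

```python
def triangle_coeffs(P):
    # beta, gamma, delta from (A2.4.35) for vertices P = [(x1,y1),(x2,y2),(x3,y3)]
    (x1, y1), (x2, y2), (x3, y3) = P
    beta  = [y2 - y3, y3 - y1, y1 - y2]
    gamma = [x3 - x2, x1 - x3, x2 - x1]
    delta = [x2 * y3 - x3 * y2, x3 * y1 - x1 * y3, x1 * y2 - x2 * y1]
    return beta, gamma, delta, sum(delta)       # sigma_T = sum of the delta's

def eta(P, alpha, x, y):
    # Affine function (A2.4.34) taking the value alpha[k] at the vertex P[k]
    beta, gamma, delta, sigma = triangle_coeffs(P)
    return sum((beta[k] * x + gamma[k] * y + delta[k]) * alpha[k]
               for k in range(3)) / sigma

P = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.25)]
beta, gamma, delta, sigma = triangle_coeffs(P)
assert abs(abs(sigma) - 2 * 0.0625) < 1e-15     # |sigma_T| = 2 m(T), m(T) = 0.0625
assert sum(beta) == 0.0 and sum(gamma) == 0.0   # constants give zero gradient terms
for k in range(3):                              # eta reproduces its nodal values
    alpha = [1.0 if j == k else 0.0 for j in range(3)]
    for m, (x, y) in enumerate(P):
        assert abs(eta(P, alpha, x, y) - (1.0 if m == k else 0.0)) < 1e-14
```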

Taking

θ_T(x, y) = 1 if (x, y) ∈ T,  θ_T(x, y) = 0 if (x, y) ∉ T,

we obtain the function

ũ_h(x, y) := Σ_{T∈T_h} (1/σ_T) ( Σ_{k=1}^{3} (β_{T_k} x + γ_{T_k} y + δ_{T_k}) α_{T_k} ) θ_T(x, y),

which coincides with u_h(x, y) on the domain Ω, except on the set of sides of all triangles in T_h.


In order to describe the objective function of the approximate Problems (A2.1.1) and (A2.1.3), we have to calculate, for any (x, y) not lying on a triangle side,

∂u_h/∂x (x, y) = Σ_{T∈T_h} (1/σ_T) ( Σ_{k=1}^{3} β_{T_k} α_{T_k} ) θ_T(x, y),

∂u_h/∂y (x, y) = Σ_{T∈T_h} (1/σ_T) ( Σ_{k=1}^{3} γ_{T_k} α_{T_k} ) θ_T(x, y).  (A2.4.36)

Therefore, in these cases the objective functions have the form

φ_h(α) := (1/8) Σ_{T∈T_h} (1/m(T)) [ ( Σ_{k=1}^{3} β_{T_k} α_{T_k} )² + ( Σ_{k=1}^{3} γ_{T_k} α_{T_k} )² ] − Σ_{i∈I_h^0} α_i ∫_{supp ϕ_i} f(x, y) ϕ_i(x, y) dx dy  (A2.4.37)

(α_i = 0 if i ∈ I_h'), and the feasible sets K_h are defined by (A2.4.30) and (A2.4.31), respectively. Analogously, for Problem (A2.1.6) one obtains

φ̃_h(α) := (1/8) Σ_{T∈T_h} (1/m(T)) [ ( Σ_{k=1}^{3} β_{T_k} α_{T_k} )² + ( Σ_{k=1}^{3} γ_{T_k} α_{T_k} )² ] − Σ_{i∈I_h} α_i ∫_{supp ϕ_i} f(x, y) ϕ_i(x, y) dx dy  (A2.4.38)

and K̃_h is defined by (A2.4.33). Note that

∫_{supp ϕ_i} f(x, y) ϕ_i(x, y) dx dy = Σ_{T∈T_i} ∫_T f(x, y) (b_{iT} x + c_{iT} y + d_{iT}) dx dy,

with T_i the set of triangles of T_h having the common vertex P_i. Here the coefficients b_{iT}, c_{iT}, d_{iT} are defined such that the function

b_{iT} x + c_{iT} y + d_{iT}

is equal to 1 at P_i, and 0 at all the other vertices of T.

Now, using the uniform triangulation, we give an illustrative description of the approximation of Problem (A2.1.3). The rectangular domain is drawn in Figure A2.4.3, where the mesh sizes are given by h_x = a/N_x and h_y = b/N_y (N_x, N_y are integers). The vertex (i, j) has the coordinates (i h_x, j h_y).

In this case the solution is sought in the form

u_h(x, y) := Σ_{j=0}^{N_y} Σ_{i=0}^{N_x} u_{ij} ϕ_{ij}(x, y),  (A2.4.39)


Figure A2.4.3: uniform triangulation of the rectangle [0, a] × [0, b] with mesh sizes h_x, h_y; the two triangles T′, T″ attached to the vertex (i, j).

with u_{0j} = u_{i0} = u_{N_x j} = u_{i N_y} = 0 for all i, j; ϕ_{ij} is the usual piecewise affine basis function with

ϕ_{ij}(i h_x, j h_y) = 1,  ϕ_{ij}(k h_x, l h_y) = 0 for (k, l) ≠ (i, j).

With a fixed vertex (i, j) (0 ≤ i ≤ N_x − 1, 0 ≤ j ≤ N_y − 1), two triangles T′ and T″ are connected as shown in Figure A2.4.3. Using the formulas (A2.4.35) for the description of the function u_h on the triangles T′ and T″, we obtain the following representation of the derivatives of u_h on T′ ∪ T″:

∂u_h/∂x (x, y) = (1/σ_{T′})(h_y u_{i,j} − h_y u_{i+1,j}) θ_{T′}(x, y) + (1/σ_{T″})(h_y u_{i+1,j+1} − h_y u_{i,j+1}) θ_{T″}(x, y),

∂u_h/∂y (x, y) = (1/σ_{T′})(−h_x u_{i+1,j+1} + h_x u_{i+1,j}) θ_{T′}(x, y) + (1/σ_{T″})(−h_x u_{i,j} + h_x u_{i,j+1}) θ_{T″}(x, y),

with |σ_{T′}| = |σ_{T″}| = h_x h_y. Hence, in order to describe the quadratic term of the objective function, we have to calculate

(1/2) ∫_{T′∪T″} [ (∂u_h/∂x)² + (∂u_h/∂y)² ] dx dy
= (1/4) h_x h_y ( (u_{i+1,j} − u_{i,j})²/h_x² + (u_{i+1,j+1} − u_{i+1,j})²/h_y² + (u_{i+1,j+1} − u_{i,j+1})²/h_x² + (u_{i,j+1} − u_{i,j})²/h_y² )


and

(1/2) ∫_Ω [ (∂u_h/∂x)² + (∂u_h/∂y)² ] dx dy
= (1/4) h_x h_y Σ_{i=0}^{N_x−1} Σ_{j=0}^{N_y−1} ( (u_{i+1,j} − u_{i,j})²/h_x² + (u_{i+1,j+1} − u_{i+1,j})²/h_y² + (u_{i+1,j+1} − u_{i,j+1})²/h_x² + (u_{i,j+1} − u_{i,j})²/h_y² )
= (1/4) h_x h_y Σ_{i=0}^{N_x−1} Σ_{j=0}^{N_y−1} ( (u_{i+1,j} − u_{i,j})²/h_x² + (u_{i+1,j+1} − u_{i,j+1})²/h_x² )
+ (1/4) h_x h_y Σ_{i=0}^{N_x−1} Σ_{j=0}^{N_y−1} ( (u_{i,j+1} − u_{i,j})²/h_y² + (u_{i+1,j+1} − u_{i+1,j})²/h_y² ).

Taking into account the zero values of u_{i,j} on the boundary (which make the two sums in each bracket above coincide after an index shift), we obtain

(1/2) ∫_Ω [ (∂u_h/∂x)² + (∂u_h/∂y)² ] dx dy = (1/2) h_x h_y Σ_{i=0}^{N_x−1} Σ_{j=0}^{N_y−1} ( (u_{i+1,j} − u_{i,j})²/h_x² + (u_{i,j+1} − u_{i,j})²/h_y² ).
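As a quick numerical cross-check (our code, not from the text), the four-difference formula above can be implemented directly; it is exact for piecewise affine interpolants, which an affine test function confirms:

```python
def dirichlet_energy(u, hx, hy):
    # (1/2) * integral of |grad u_h|^2 for the P1 interpolant on the uniform
    # triangulation, via the four-difference formula; u is a nodal array
    # u[i][j], i = 0..Nx, j = 0..Ny (no boundary assumption needed here).
    Nx, Ny = len(u) - 1, len(u[0]) - 1
    s = 0.0
    for i in range(Nx):
        for j in range(Ny):
            s += ((u[i+1][j] - u[i][j]) ** 2 / hx ** 2
                  + (u[i+1][j+1] - u[i+1][j]) ** 2 / hy ** 2
                  + (u[i+1][j+1] - u[i][j+1]) ** 2 / hx ** 2
                  + (u[i][j+1] - u[i][j]) ** 2 / hy ** 2)
    return 0.25 * hx * hy * s

# exact for affine functions: u = 2x + 3y on [0,1]x[0,1] has energy (1/2)(4 + 9)
Nx, Ny = 8, 5
hx, hy = 1.0 / Nx, 1.0 / Ny
u = [[2 * i * hx + 3 * j * hy for j in range(Ny + 1)] for i in range(Nx + 1)]
assert abs(dirichlet_energy(u, hx, hy) - 6.5) < 1e-12
```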

The linear part of the objective functional has the form

− Σ_{i=1}^{N_x−1} Σ_{j=1}^{N_y−1} u_{ij} ∫_{supp ϕ_{ij}} f(x, y) ϕ_{ij}(x, y) dx dy,

in which, according to Figure A2.4.4, we have

Figure A2.4.4: the six triangles 1–6 forming the support of ϕ_{ij} around the vertex (i, j).

ϕ_{ij}(x, y) :=
−y/h_y + (j + 1) on triangle 1,
x/h_x − y/h_y + 1 + (j − i) on triangle 2,
x/h_x + (1 − i) on triangle 3,
y/h_y + (1 − j) on triangle 4,
−x/h_x + y/h_y + 1 + (i − j) on triangle 5,
−x/h_x + (i + 1) on triangle 6,

and

K_h := {u = (u_{ij}), i = 0, ..., N_x, j = 0, ..., N_y : u_{0j} = u_{i0} = u_{N_x j} = u_{i N_y} = 0, u_{ij} ≥ ϕ(i h_x, j h_y) ∀ i, j}.
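To make the discrete Obstacle Problem concrete, here is a small illustrative solver (our code, not from the text): projected Gauss–Seidel sweeps for min{(1/2)α^T A_h α − f_h^T α : α ∈ K_h}, with the five-point stiffness matrix on the unit square and a lumped load vector f_i ≈ h² f(P_i), a simplification of the exact Galerkin right-hand side:

```python
def solve_obstacle(N, f, psi, sweeps=2000):
    # Projected Gauss-Seidel for the obstacle problem on the unit square,
    # h_x = h_y = 1/N: each coordinate is minimized exactly and then
    # projected onto the constraint u_ij >= psi(P_ij).
    h = 1.0 / N
    u = [[0.0] * (N + 1) for _ in range(N + 1)]   # zero boundary values included
    for _ in range(sweeps):
        for i in range(1, N):
            for j in range(1, N):
                v = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]
                            + h * h * f(i * h, j * h))
                u[i][j] = max(v, psi(i * h, j * h))
    return u

psi = lambda x, y: 0.25 - (x - 0.5) ** 2 - (y - 0.5) ** 2   # obstacle, <= 0 on the boundary
u = solve_obstacle(8, lambda x, y: 0.0, psi)
assert all(u[i][j] >= psi(i / 8, j / 8) - 1e-12
           for i in range(1, 8) for j in range(1, 8))        # iterates stay in K_h
assert abs(u[4][4] - 0.25) < 1e-9   # the membrane touches the obstacle at its top
```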


Note that in the case of a uniform triangulation the matrices A_h and Ã_h have a regular five-diagonal structure, and for h_x = h_y their elements do not depend on h. Concerning such matrices, see Young [417]. If, for example, a = b = 1 and h_x = h_y = h̄ = 1/N, then the matrix A_h has the eigenvalues

λ_{k,l} = 8 ( sin²(k h̄ π / 2) + sin²(l h̄ π / 2) ), k, l = 1, ..., N − 1.

Hence, its condition number is

cond(A_h) = (max_{k,l} λ_{k,l}) / (min_{k,l} λ_{k,l}) = sin²((N − 1)π/(2N)) / sin²(π/(2N)) ≈ 4 h̄^{−2} π^{−2} = 8 h^{−2} π^{−2}

(recall that h = √(h_x² + h_y²), i.e., h = √2 h̄).

In a more general situation, using quasi-uniform triangulations, we obtain for the Dirichlet Problem (A2.1.1) and the Obstacle Problem (A2.1.3) that

h² m ‖v‖² ≤ v^T A_h v ≤ M ‖v‖² ∀ v ∈ R^n,

with m > 0 and M independent of h; consequently,

cond(A_h) ≤ (M/m) h^{−2}.

Numerical experiments and the theoretical analysis enable us to consider these matrices as well-conditioned.

In the Boundary-obstacle Problem (A2.1.6) the set K contains constant functions, and for u = c = const, using the formulas (A2.4.35), (A2.4.36), we immediately obtain

α^T Ã_h α = 0 with α = (c, ..., c)^T ∈ R^n.

Hence, due to the Rayleigh principle,

cond^{−1}(Ã_h) = (min_{‖α‖=1} α^T Ã_h α) / (max_{‖α‖=1} α^T Ã_h α) = 0.

Of course, the degeneration of the approximate problems is a consequence of the degeneration of the quadratic form a(u, u) on V.

We also need the following result about the distance between the solution u of the Obstacle Problem (A2.1.3) and the finite element solutions u_h of the approximate problems (A2.4.29), (A2.4.31).

A2.4.12 Proposition. (Ciarlet [75]) Assume that the solution u of Problem (A2.1.3) belongs to H²(Ω). Then there exists a constant c(u, f, ϕ), independent of h, such that for every quasi-uniform family of triangulations

‖u − u_h‖_{1,Ω} ≤ c(u, f, ϕ) h.  (A2.4.40)

Assuming regularity of the Obstacle Problem, this estimate is also true if Ω is a convex set with a boundary of class C² (cf. Brezis and Stampacchia [53]).

Obviously, as a consequence of the proposition above, a similar estimate

‖u − u_h‖_{1,Ω} ≤ c(u, f) h

can be obtained for the Dirichlet Problem (A2.1.1).


A3 Selected Methods of Convex Minimization

In this section, standard methods of convex unconstrained and constrained minimization are briefly described in the form of conceptual algorithms. Their selection depends on specific properties of the investigated variational problems. In particular, the variational inequalities occurring in applications usually have simple constraints, and their discretization leads to finite-dimensional convex (often quadratic) minimization problems with linear constraints.

Moreover, some of the methods considered here are studied in the main part of the book in connection with ill-posed finite-dimensional problems.

The majority of the convergence results are stated without proofs; only arguments that help to understand the behavior of the applied procedures and methods will be given. For the proofs, the reader is referred to the literature.

Our aim is to consider numerical methods as a whole, starting with the discretization and including the elementary procedures within the optimization methods. We omit linear programming methods, although they are used, in particular, for solving the auxiliary problems in algorithms of feasible directions. Quadratic programming methods will be sketched only in a special setting.

Taking into account the object of our efforts, among the numerous publications related to convex optimization methods, the books of Polyak [330], Fletcher [117], Bazaraa, Sherali and Shetty [36], Bertsekas [41], Dennis and Schnabel [90], Gill, Murray and Wright [128], Mangasarian [285] and Geiger and Kanzow [126] are quite suitable. The monograph of Neittaanmäki, Sprekels and Tiba [304] is devoted, in particular, to optimization methods for elliptic systems.

A3.1 Notions, definitions and convergence statements

In order to investigate numerical methods we need the following notions, which characterize the behavior of iterative methods.

A3.1.1 Definition.

(i) A sequence {v^k} is convergent to a set Q if

lim_{k→∞} ρ(v^k, Q) = 0.

(ii) A sequence {v^k} is convergent to the optimal value (especially of Problem (A1.5.11)) if

lim_{k→∞} J(v^k) = min_{u∈K} J(u) and lim_{k→∞} ρ(v^k, K) = 0.

A3.1.2 Remark. In this case {v^k} is called a generalized minimizing sequence. The notion "generalized" will be omitted if, starting from some iteration number, v^k ∈ K holds true. ♦

A3.1.3 Definition. A sequence {v^k} converges to v


(i) linearly (with the rate of a geometric progression) if, for some c and q ∈ (0, 1),

‖v^k − v‖ ≤ c q^k ∀ k ∈ {1, 2, ...} = N;

(ii) superlinearly if

‖v^k − v‖ ≤ c q_1 q_2 ⋯ q_k ∀ k ∈ N, with q_k ∈ (0, 1), lim_{k→∞} q_k = 0;

(iii) quadratically if q ∈ (0, 1) and

‖v^k − v‖ ≤ c q^{2^k} ∀ k ∈ N.

Usually, the real possibility to estimate the values ‖v^k − v‖ is more important, i.e., knowledge of the particular values of c and q is required. In that case it makes sense to speak about different types of convergence rates on different large intervals of the iteration index k. Sometimes we make use only of the existence of such constants, without knowing their actual values.

In the following we recall some results about the convergence of sequences of numbers which are useful for proving convergence results.

A3.1.4 Lemma. Let {ν_k} be a sequence in R, bounded from below, such that

ν_{k+1} ≤ ν_k + β_k, β_k ≥ 0, Σ_{k=0}^{∞} β_k < ∞.  (A3.1.1)

Then {ν_k} is convergent.

Proof: From the first inequality in (A3.1.1) we get

ν_{i+1} ≤ ν_0 + Σ_{k=0}^{∞} β_k ∀ i.

Hence, {ν_k} is bounded. Let {ν_{k_j}} be a convergent subsequence with limit ν̄. For each γ > 0 there exists a number j_0 such that for j ≥ j_0

ν_{k_j} < ν̄ + γ/2 and Σ_{k=k_j}^{∞} β_k < γ/2.

Due to (A3.1.1), for i > k_{j_0}, the inequality

ν_i ≤ ν_{k_{j_0}} + Σ_{k=k_{j_0}}^{∞} β_k

holds; hence, ν_i < ν̄ + γ. Since γ is an arbitrarily chosen positive number, the sequence {ν_k} cannot have a limit point greater than ν̄, and because {ν_{k_j}} is an arbitrary convergent subsequence, ν̄ must be the unique limit. ∎

The following results can be found, for instance, in Polyak [330].


A3.1.5 Lemma. Let

ν_{k+1} ≤ q ν_k + µ, 0 ≤ q < 1, µ > 0.  (A3.1.2)

Then it holds that

ν_k ≤ µ/(1 − q) + ( ν_0 − µ/(1 − q) ) q^k.  (A3.1.3)
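Estimate (A3.1.3) is easy to check numerically in the extremal case of equality in (A3.1.2) (a throwaway sketch; the code and values below are ours):

```python
q, mu = 0.7, 0.3                 # then mu / (1 - q) = 1
nu = [5.0]
for _ in range(60):              # worst case: equality in (A3.1.2)
    nu.append(q * nu[-1] + mu)

bound = lambda k: mu / (1 - q) + (nu[0] - mu / (1 - q)) * q ** k
assert all(nu[k] <= bound(k) + 1e-12 for k in range(61))
assert abs(nu[60] - mu / (1 - q)) < 1e-8      # nu_k tends to mu / (1 - q)
```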

A3.1.6 Lemma. Let

ν_{k+1} ≤ q_k ν_k + µ_k  (A3.1.4)

with

0 ≤ q_k < 1, µ_k ≥ 0, Σ_{k=1}^{∞} (1 − q_k) = ∞, µ_k/(1 − q_k) → 0.

Then

lim sup_{k→∞} ν_k ≤ 0.

In particular, if ν_k ≥ 0, then ν_k → 0. Moreover, if in (A3.1.4)

q_k ≡ q < 1, µ_k → 0, ν_k ≥ 0,

then ν_k → 0.

A3.1.7 Lemma. Let {ν_k} be a sequence of positive numbers satisfying

ν_{k+1} ≤ (1 + µ_k) ν_k + β_k,  (A3.1.5)

with

µ_k ≥ 0, β_k ≥ 0, Σ_{k=0}^{∞} µ_k < ∞, Σ_{k=0}^{∞} β_k < ∞.

Then ν_k → ν̄ ≥ 0.

A3.1.8 Lemma. Let {ν_k} be a sequence of positive numbers satisfying

ν_{k+1} ≤ (1 − µ_k) ν_k + β_k,  (A3.1.6)

with

0 ≤ µ_k < 1, β_k ≥ 0, Σ_{k=1}^{∞} µ_k = ∞, β_k/µ_k → 0.

Then ν_k → 0.

A3.2 Methods of unconstrained minimization

A3.2.1 Minimization of convex univariate functions

We consider two variants of bisection methods which compute an approximate minimizer u* of a convex function J on an interval [a, b]. The first method makes use of function values only, whereas the second one is applicable if J is differentiable.

A3.2.9 Algorithm. (Bisection algorithm)


S0: Choose δ > 0 (desired accuracy of the minimizer u*);

S1: calculate J((a + b − δ)/2) and J((a + b + δ)/2);

S2: a) if J((a + b − δ)/2) < J((a + b + δ)/2), continue with S1 on the interval [a, (a + b + δ)/2];

b) if J((a + b − δ)/2) > J((a + b + δ)/2), continue with S1 on the interval [(a + b − δ)/2, b];

S3: stop if the obtained interval is less than δ;

S4: if J((a + b − δ)/2) = J((a + b + δ)/2), then u* := (a + b)/2.
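A direct transcription (our code): note that, because the two probes are offset by δ, the interval length converges to δ from above and can never fall below it, so this sketch stops as soon as the length drops below 2δ instead:

```python
def bisect_min(J, a, b, delta):
    # Bisection by function values (Algorithm A3.2.9) for convex J on [a, b];
    # the stopping test uses 2*delta, see the remark in the lead-in.
    while b - a >= 2 * delta:
        left, right = 0.5 * (a + b - delta), 0.5 * (a + b + delta)
        if J(left) < J(right):
            b = right
        elif J(left) > J(right):
            a = left
        else:                       # equal values: take the midpoint (step S4)
            return 0.5 * (a + b)
    return 0.5 * (a + b)

assert abs(bisect_min(lambda u: (u - 1.3) ** 2, -4.0, 6.0, 1e-6) - 1.3) < 1e-5
assert abs(bisect_min(abs, -1.0, 2.0, 1e-6)) < 1e-5   # works without smoothness
```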

A3.2.10 Algorithm. (Bisection algorithm using derivatives)

S0: Choose δ > 0 (desired accuracy of the minimizer);

S1: calculate J′(a) and J′(b);

S2: if J′(a) ≥ 0, set u* := a; if J′(b) ≤ 0, set u* := b; otherwise, compute J′((a + b)/2);

if J′((a + b)/2) < 0, continue with S2 on the interval [(a + b)/2, b];

if J′((a + b)/2) > 0, continue with S2 on [a, (a + b)/2];

S3: stop if the obtained interval is less than δ;

S4: if J′((a + b)/2) = 0, then u* := (a + b)/2.
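The derivative-based variant simply halves a bracketing interval of the root of J′ (our sketch, with assumed stopping conventions):

```python
def bisect_min_deriv(dJ, a, b, delta):
    # Bisection using derivatives (Algorithm A3.2.10): maintains dJ(a) < 0 < dJ(b).
    if dJ(a) >= 0:
        return a
    if dJ(b) <= 0:
        return b
    while b - a >= delta:
        m = 0.5 * (a + b)
        d = dJ(m)
        if d < 0:
            a = m
        elif d > 0:
            b = m
        else:                       # exact stationary point found (step S4)
            return m
    return 0.5 * (a + b)

# J(u) = (u + 0.7)^2, so J'(u) = 2(u + 0.7) and u* = -0.7
assert abs(bisect_min_deriv(lambda u: 2 * (u + 0.7), -3.0, 2.0, 1e-8) + 0.7) < 1e-7
assert bisect_min_deriv(lambda u: 2 * u, 1.0, 3.0, 1e-8) == 1.0   # J' >= 0 on [1, 3]
```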

A3.2.2 Line search methods

Our investigations are mainly concerned with smooth variational problems or with problems permitting the application of efficient smoothing procedures. Therefore, we will focus mainly on techniques of differentiable optimization. Only in connection with variational inequalities with multi-valued operators will we meet non-smooth techniques.

Now we consider unconstrained minimization methods for convex, differentiable functions, i.e., we deal with problems

min{J(x) : x ∈ R^n},

assuming, if necessary, the existence of derivatives of higher order. The common structure of these methods is as follows. Let the iterate u^k be calculated at the k-th step; then u^{k+1} is determined by moving from u^k in a suitable direction p^k such that

u^{k+1} := u^k + α_k p^k, k = 1, 2, ⋯.

The value α_k is called the step-length parameter. Particular methods differ in the choice of the direction vector p^k and the step length α_k. We consider only methods in which

⟨p^k, ∇J(u^k)⟩ < 0 for ∇J(u^k) ≠ 0.


If

p^k := −∇J(u^k) and α_k := arg min_{α≥0} J(u^k + α p^k),  (A3.2.7)

then we deal with a method of steepest descent, and the choice of the Newton direction

p^k := −[∇²J(u^k)]^{−1} ∇J(u^k), α_k := 1,  (A3.2.8)

leads to the classical Newton method. Since, in general, the calculation of u^{k+1} is difficult (especially with the Newton direction), there exists a number of modifications of these methods. Most of them can be embedded in the following type of algorithm:

A3.2.11 Method. (Gradient-type method)
Data: µ ∈ [−1, 0), u^0 ∈ R^n;

S0: Set k := 0;

S1: if ∇J(u^k) = 0, stop: u^k is a minimizer; otherwise, determine p^k ≠ 0 such that

⟨p^k, ∇J(u^k)⟩ ≤ µ ‖p^k‖ ‖∇J(u^k)‖;  (A3.2.9)

S2: find α_k > 0 such that

J(u^k + α_k p^k) < J(u^k),  (A3.2.10)

and set

u^{k+1} := u^k + α_k p^k;

S3: set k := k + 1 and go to Step S1.
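A minimal instance of Method A3.2.11 (our sketch, with assumed stopping tolerances): p^k = −∇J(u^k) satisfies (A3.2.9) with µ = −1, and a crude halving search realizes the strict descent condition (A3.2.10):

```python
def gradient_type(J, gradJ, u, tol=1e-8, max_iter=5000):
    # Method A3.2.11 with p^k = -grad J(u^k); the halving loop implements
    # S2 (strict descent), the gradient test implements S1.
    for _ in range(max_iter):
        g = gradJ(u)
        if sum(x * x for x in g) ** 0.5 <= tol:
            break                                  # S1: (near-)stationary point
        a = 1.0
        while a > 1e-14 and J([ui - a * gi for ui, gi in zip(u, g)]) >= J(u):
            a *= 0.5                               # S2: backtrack until descent
        if a <= 1e-14:
            break                                  # no further numerical decrease
        u = [ui - a * gi for ui, gi in zip(u, g)]
    return u

J  = lambda u: 0.5 * u[0] ** 2 + 0.3 * u[1] ** 2   # smooth convex test function
gJ = lambda u: [u[0], 0.6 * u[1]]
u = gradient_type(J, gJ, [3.0, -2.0])
assert J(u) < 1e-10
```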

A3.2.12 Proposition. Let U* = Arg min_{u∈R^n} J(u) be non-empty and bounded, and let the sequence {u^k} be generated by Method A3.2.11. Assume that a decreasing function δ : R_+ → R_− exists, with δ(t) < 0 for t > 0, such that

J(u^{k+1}) ≤ J(u^k) + δ(‖∇J(u^k)‖) ∀ k ∈ N.  (A3.2.11)

Then {u^k} is bounded and each cluster point of {u^k} belongs to U*.

Now, if we suppose that for some µ ∈ [−1, 0) an efficient method for determining p^k according to (A3.2.9) in step S1 is available, then, in order to prove convergence of the algorithm, it suffices to construct a suitable function δ, or at least to establish its existence.

In the sequel we investigate different line search methods. For this purpose the following statement is useful (see Demyanov and Malozemov [88]).

A3.2.13 Proposition. Let J be continuously differentiable on an open set Ω′ and let Ω ⊂ Ω′ be a compact set. Then there exists a function o : R_+ → R_+ with lim_{γ↓0} o(γ)/γ = 0 such that for all u ∈ Ω, p ∈ R^n and α ≥ 0 the inequality

J(u + αp) ≤ J(u) + α⟨p, ∇J(u)⟩ + o(α‖p‖)  (A3.2.12)

is true.


Goldstein Search:
With fixed σ ∈ (0, 1/2) and a given direction p^k, the step-length parameters α_k > 0 in (A3.2.10) are chosen such that

α_k (1 − σ) ⟨p^k, ∇J(u^k)⟩ ≤ J(u^k + α_k p^k) − J(u^k) ≤ α_k σ ⟨p^k, ∇J(u^k)⟩ ∀ k.  (A3.2.13)

This inequality means that the ratio q between the slope of the secant through the points (u^k, J(u^k)), (u^k + α_k p^k, J(u^k + α_k p^k)) and the slope of the tangent at u^k is bounded according to σ ≤ q ≤ 1 − σ.

If inequality (A3.2.9) holds, then the function δ in Proposition A3.2.12 exists. Indeed, since U* is non-empty and bounded, due to Proposition A1.7.55 the set {u : J(u) ≤ J(u^0)} is compact. In view of the differentiability of the objective function J, Proposition A3.2.13 implies the existence of a function o : R_+ → R_+ such that

lim_{γ↓0} o(γ)/γ = 0

and the relation

J(u^k + α_k p^k) ≤ J(u^k) + α_k ⟨p^k, ∇J(u^k)⟩ + o(α_k ‖p^k‖) ∀ k

is satisfied. Consequently, a non-increasing function ᾱ : R_− → R_+ exists, with ᾱ(γ) > 0 for γ < 0, such that

J(u^k + α p^k) < J(u^k) + α (1 − σ) ⟨p^k, ∇J(u^k)⟩  (A3.2.14)

holds for α‖p^k‖ ∈ ( 0, ᾱ(⟨p^k/‖p^k‖, ∇J(u^k)⟩) ).

Due to the left inequality in (A3.2.13), as well as the relations (A3.2.9), (A3.2.14) and the monotonicity of ᾱ, we obtain

α_k ‖p^k‖ ≥ ᾱ(µ ‖∇J(u^k)‖).  (A3.2.15)

Moreover, on account of (A3.2.9), (A3.2.15) and the right-hand side of (A3.2.13),

J(u^{k+1}) ≤ J(u^k) + (σ/‖p^k‖) ⟨p^k, ∇J(u^k)⟩ ᾱ(µ ‖∇J(u^k)‖) ≤ J(u^k) + σ µ ‖∇J(u^k)‖ ᾱ(µ ‖∇J(u^k)‖) ∀ k.

Thus, we can choose δ(t) = σ µ t ᾱ(µt).

Another often used procedure to calculate the step-length parameters α_k > 0 in (A3.2.10) is the so-called

Armijo Search:
Suppose that there exists a constant c > 0 such that

‖p^k‖ ≥ c ‖∇J(u^k)‖ ∀ k ∈ N.  (A3.2.16)

We note that this assumption holds for almost all methods used in practice, in particular if p^k is determined as a quasi-Newton direction, i.e.,

p^k := −H_k ∇J(u^k)


with

⟨y, H_k y⟩ ≥ m ‖y‖² ∀ y ∈ R^n

fulfilled with a constant m > 0 for all k.

Now the Armijo Search reads as follows: Choose some τ > 0, β ∈ (0, 1) and σ ∈ (0, 1). Define at the k-th step the smallest integer s_k ≥ 0 such that α_k := τ β^{s_k} satisfies the relation

J(u^k + α_k p^k) − J(u^k) ≤ α_k σ ⟨p^k, ∇J(u^k)⟩.  (A3.2.17)

As in the Goldstein Search, for α‖p^k‖ ∈ ( 0, ᾱ(⟨p^k/‖p^k‖, ∇J(u^k)⟩) ), and consequently for α‖p^k‖ ∈ ( 0, ᾱ(µ ‖∇J(u^k)‖) ), the inequality

J(u^k + α p^k) ≤ J(u^k) + α σ ⟨p^k, ∇J(u^k)⟩

holds. If s_k > 0 then, instead of (A3.2.13), the inequalities

τ β^{s_k−1} ⟨p^k, ∇J(u^k)⟩ < J(u^k + τ β^{s_k−1} p^k) − J(u^k),

J(u^k + τ β^{s_k} p^k) − J(u^k) ≤ σ τ β^{s_k} ⟨p^k, ∇J(u^k)⟩

can be used, and we obtain analogously

J(u^k + α_k p^k) ≤ J(u^k) + β σ µ ‖∇J(u^k)‖ ᾱ(µ ‖∇J(u^k)‖).

If s_k = 0 then, due to (A3.2.16) and (A3.2.17),

J(u^k + α_k p^k) ≤ J(u^k) + τ µ σ c ‖∇J(u^k)‖².

Consequently, (A3.2.11) is satisfied with

δ(t) = µ σ t min{c τ t, β ᾱ(µt)}.  (A3.2.18)
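The Armijo rule is purely algorithmic and directly implementable (our sketch; parameter values are illustrative only):

```python
def armijo_step(J, gradJ, u, p, tau=1.0, beta=0.5, sigma=0.1):
    # Armijo search (A3.2.17): the largest step of the form tau * beta**s
    # (s = 0, 1, ...) with sufficient decrease along the descent direction p.
    slope = sum(pi * gi for pi, gi in zip(p, gradJ(u)))   # <p, grad J(u)> < 0
    a = tau
    while J([ui + a * pi for ui, pi in zip(u, p)]) - J(u) > a * sigma * slope:
        a *= beta
    return a

J  = lambda u: u[0] ** 2 + 10 * u[1] ** 2
gJ = lambda u: [2 * u[0], 20 * u[1]]
u = [1.0, 1.0]
p = [-g for g in gJ(u)]                    # steepest-descent direction
a = armijo_step(J, gJ, u, p)
unew = [ui + a * pi for ui, pi in zip(u, p)]
assert a > 0 and J(unew) < J(u)            # guaranteed sufficient decrease
```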

Minimization along a direction:
In this method α_k is defined by a one-dimensional minimization problem along the given direction p^k:

α_k := arg min_{α≥0} J(u^k + α p^k).  (A3.2.19)

Obviously, (A3.2.11) holds with any function δ constructed above.

It is easy to verify that the existence of a suitable α_k is guaranteed in each of the methods described above if the optimal set U* is non-empty and bounded.

In the Goldstein Search the value α_k is any positive root of the equation ζ̃(α) = 0, with ζ̃ : R_+ → R defined by

ζ̃(α) = ζ(α) if ζ(α) > 0,  ζ̃(α) = ζ̄(α) if ζ̄(α) < 0,  ζ̃(α) = 0 otherwise,


and

ζ(α) = J(u^k + α p^k) − J(u^k) − α σ ⟨p^k, ∇J(u^k)⟩,
ζ̄(α) = J(u^k + α p^k) − J(u^k) − α (1 − σ) ⟨p^k, ∇J(u^k)⟩.

In Polak [322] a bisection procedure is described which solves the equation ζ̃(α) = 0 in a finite number of iterations. The Armijo Search for α_k already has an algorithmic structure, and for minimizing the univariate function J(u^k + α p^k) according to (A3.2.19), the methods considered in Section A3.2.1 can be used.
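A possible realization of such a bracketing/bisection procedure for the Goldstein test is sketched below (our code, not Polak's procedure verbatim; we pass the one-dimensional function φ(α) = J(u^k + αp^k) and the slope ⟨p^k, ∇J(u^k)⟩ directly):

```python
def goldstein_step(phi, slope, sigma=0.25):
    # Bracketing + bisection for a step satisfying (A3.2.13); phi(a) = J(u + a*p),
    # slope = <p, grad J(u)> < 0. Expands a until the upper test fails, then bisects.
    a, lo, hi = 1.0, 0.0, float("inf")
    while True:
        diff = phi(a) - phi(0.0)
        if diff > a * sigma * slope:            # not enough decrease: step too long
            hi = a
        elif diff < a * (1 - sigma) * slope:    # decrease too steep: step too short
            lo = a
        else:
            return a                            # both Goldstein inequalities hold
        a = 2 * a if hi == float("inf") else 0.5 * (lo + hi)

# J(u) = u^2 at u = 1 with p = -J'(1) = -2: phi(a) = (1 - 2a)^2, slope = -4;
# the Goldstein test accepts exactly a in [sigma, 1 - sigma] = [0.25, 0.75]
a = goldstein_step(lambda t: (1.0 - 2.0 * t) ** 2, -4.0)
assert 0.25 <= a <= 0.75
```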

A3.2.3 Gradient methods

Using formula (A1.5.15) in order to compute the directional derivative, one can define a direction of local steepest descent at a point u as

p := arg min_{‖y‖=1} ⟨∇J(u), y⟩ = −∇J(u)/‖∇J(u)‖.

Thus, setting p^k := −∇J(u^k), we obtain a gradient method. The conditions (A3.2.9) and (A3.2.16) are obviously satisfied. Using the step-size procedures considered above, convergence statements for gradient methods are corollaries of Proposition A3.2.12.

If ∇J satisfies a Lipschitz condition with a known constant L, then there exists a simpler rule to determine α_k. In that case one can choose α_k such that

0 < ε_1 ≤ α_k ≤ 2/L − ε_2  (A3.2.20)

with arbitrary ε_1 > 0, ε_2 > 0. In particular, one can take α_k = α ∀ k ∈ N, with α ∈ (0, 2/L). Indeed, in view of (A1.4.10), for u := u^k and w := −α_k ∇J(u^k) it follows that

J(uk+1) = J(uk)− αk‖∇J(uk)‖2

− αk∫ 1

0

〈∇J(uk − ταk∇J(uk))−∇J(uk),∇J(uk)〉dτ

≤ J(uk)− αk‖∇J(uk)‖2 + Lα2k‖∇J(uk)‖2

∫ 1

0

τdτ

= J(uk)− αk(

1− 1

2Lαk

)‖∇J(uk)‖2 ≤ J(uk)− 1

2Lε1ε2‖J(uk)‖2

and inequality (A3.2.11) is satisfied with δ(t) = − 12Lε1ε2t

2.

A3.2.14 Proposition. Let J be a convex, differentiable function and ∇J be Lipschitz-continuous (with constant L). Then, for αk := α ∀ k, α ∈ (0, 2/L), the gradient method converges to a minimizer u∗ of the function J on Rn.
If, in addition, J is strongly convex (with constant κ), then the convergence is linear, i.e.,

‖uk − u∗‖ ≤ cq^k, k = 1, 2, · · · ,

with c = √((1/κ)(J(u0) − J(u∗))) and q = √(1 − 4κα + 2Lκα²).

Obviously, q∗ = √(1 − 2κ/L) is minimal and is attained for α∗ = 1/L.
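For a quadratic objective the fixed-step gradient method of Proposition A3.2.14 can be sketched in a few lines; the test data are assumptions, and the gradient Au − f corresponds to J(u) = (1/2)〈Au, u〉 − 〈f, u〉:

```python
import numpy as np

def gradient_method(A, f, u0, alpha, iters=500):
    """Fixed-step gradient method u^{k+1} = u^k - alpha*(A u^k - f)
    for J(u) = 0.5<Au,u> - <f,u>; converges for alpha in (0, 2/L)."""
    u = np.asarray(u0, dtype=float)
    for _ in range(iters):
        u = u - alpha * (A @ u - f)
    return u

# hypothetical test data
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = np.array([1.0, 1.0])
L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of the gradient
u = gradient_method(A, f, np.zeros(2), alpha=1.0 / L)
```

The iterate approaches the minimizer A⁻¹f at the linear rate described in the proposition.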


A3. SELECTED METHODS OF CONVEX MINIMIZATION 493

A3.2.15 Proposition. Let J be twice-differentiable and

m〈v, v〉 ≤ 〈v, ∇²J(u)v〉 ≤ M〈v, v〉 (A3.2.21)

for some m > 0, M and any v ∈ Rn, u ∈ Rn. Then, for αk := γ ∀ k, γ ∈ (0, 2/M), the estimates

‖uk − u∗‖ ≤ q^k‖u0 − u∗‖, k = 1, 2, · · · (A3.2.22)

hold with q(γ) := max{|1 − γm|, |1 − γM|}.
The value q∗ := (M − m)/(M + m) becomes minimal if γ∗ := 2/(M + m).

Estimate (A3.2.22) is sharp in the following sense: it cannot be improved for any quadratic function

J(u) := (1/2)〈Au, u〉 − 〈f, u〉,

where A is a positive definite matrix with eigenvalues λmin(A) = m and λmax(A) = M.

Estimates of the rate of convergence give no reason to prefer any particular variant of the gradient methods. These methods converge slowly for ill-conditioned problems. If a suitable value of the Lipschitz constant L is known, then the choice of αk according to (A3.2.20) is more convenient. However, the determination of L is usually a challenge.

Nevertheless, gradient methods are simple and converge globally under relatively weak assumptions.

A3.2.4 Conjugate gradient methods

Methods of conjugate directions make use of successive quadratic approximations of convex functions, because convex quadratic functions can be minimized by means of these methods in a finite number of iterations.

First we consider a quadratic function

J(u) := (1/2)〈Au, u〉 − 〈f, u〉, (A3.2.23)

with a positive definite [n × n]-matrix A and recall that the minimization of J is equivalent to the solution of the equation Au = f.

A3.2.16 Definition. The pair of vectors pi, pj is called conjugate with respect to A, or A-orthogonal, if 〈Api, pj〉 = 0. ♦

If a system of pair-wise conjugate vectors pi (i = 0, ..., n − 1) is known, then the minimizer of J can be calculated easily. Indeed, using for the sought solution the representation

u := Σ_{i=0}^{n−1} αi pi,

the equation Au = f provides

Σ_{i=0}^{n−1} αi Api = f.


Multiplying this equation successively by p0, ..., pn−1, it follows that

αi = 〈f, pi〉 / 〈Api, pi〉.

Now, this determination of u can be described recurrently:

uk+1 := uk + αkpk, k = 0, ..., m (m ≤ n − 1),

with
αk := arg min_{α∈R} J(uk + αpk).

Moreover, uk+1 minimizes J on the affine set u0 + span(p0, ..., pk). This enables us to construct conjugate directions also by means of the recurrent formula described above.

Now we describe one of the variants of a conjugate gradient method (cg-method).

A3.2.17 Method. (Fletcher-Reeves cg-method)

S0: Choose an arbitrary u0 ∈ Rn and set

r0 := ∇J(u0), p0 := −r0, k := 0. (A3.2.24)

S1: Determine
αk := arg min_{α∈R+} J(uk + αpk). (A3.2.25)

S2: Set
uk+1 := uk + αkpk, (A3.2.26)
rk+1 := ∇J(uk+1), (A3.2.27)
βk+1 := ‖rk+1‖² / ‖rk‖², (A3.2.28)
pk+1 := −rk+1 + βk+1pk. (A3.2.29)

S3: Set k := k + 1 and go to S1.
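For the quadratic function (A3.2.23) with gradient r = Au − f, the line search (A3.2.25) has the closed form αk = 〈rk, rk〉/〈pk, Apk〉, and Method A3.2.17 can be sketched as follows (the test data are assumptions):

```python
import numpy as np

def fletcher_reeves(A, f, u0, tol=1e-12):
    """Method A3.2.17 for J(u) = 0.5<Au,u> - <f,u> (gradient r = Au - f);
    the exact step along p is alpha = <r,r>/<p,Ap>."""
    u = np.asarray(u0, dtype=float)
    r = A @ u - f                            # r^0 := grad J(u^0)
    p = -r                                   # p^0 := -r^0
    k = 0
    while np.linalg.norm(r) > tol:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)           # exact minimizer along p
        u = u + alpha * p                    # (A3.2.26)
        r_new = r + alpha * Ap               # gradient at u^{k+1}
        beta = (r_new @ r_new) / (r @ r)     # (A3.2.28)
        p = -r_new + beta * p                # (A3.2.29)
        r = r_new
        k += 1
    return u, k

# hypothetical test data
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = np.array([1.0, 1.0])
u, k = fletcher_reeves(A, f, np.zeros(2))
```

On an n-dimensional quadratic the loop terminates after at most n iterations, in agreement with Proposition A3.2.19 below.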

A3.2.18 Proposition. The gradients r0, r1, ... in Method A3.2.17 are pair-wise orthogonal, and the directions p0, p1, ... are pair-wise A-orthogonal.

Proof: Assume that for fixed k ≥ 2

〈ri, rj〉 = 0, 〈Api, pj〉 = 0 for i < j ≤ k and ri ≠ 0, i = 0, ..., k.

Obviously, ri = 0 implies the end of the iteration process. For k = 2 the first two relations can be verified immediately.
Then, using (A3.2.26) and (A3.2.29), we get

uk+1 = uk − αkrk + (αkβk/αk−1)(uk − uk−1)


and, consequently,

rk+1 = rk − 2αkArk + (αkβk/αk−1)(rk − rk−1).

Due to the choice of αi, it follows for all i that

0 = J′α(ui + αpi)|α=αi = 〈ri+1, pi〉, (A3.2.30)

and since rk ≠ 0,

〈rk, pk〉 = −‖rk‖² + βk〈rk, pk−1〉 < 0.

Thus, αk ≠ 0 and Ark is a linear combination of the vectors rk+1, rk and rk−1; henceforward we write Ark = span(rk+1, rk, rk−1). Analogously,

Ari = span(ri+1, ri, ri−1), 0 < i < k, (A3.2.31)

can be established. In view of (A3.2.31) and the induction assumption we obtain

〈Ari, rk〉 = 0 for i < k − 1,

hence,

〈rk+1, ri〉 = 〈rk − 2αkArk + (αkβk/αk−1)(rk − rk−1), ri〉 = 0 for i = 0, ..., k − 2.

But (A3.2.24), (A3.2.29) imply

pk = span(rk, rk−1, ..., r0) (A3.2.32)

and
〈rk+1, pi〉 = 0 for i = 0, ..., k − 2.

Furthermore,
rk+1 = rk + 2αkApk

and

〈rk+1, pk−1〉 = 〈rk, pk−1〉 + 2αk〈Apk, pk−1〉 = 2αk〈pk−1, Apk〉 = 0.

Now, due to (A3.2.29) and (A3.2.30), we obtain

〈rk+1, rk〉 = −〈rk+1, pk〉 + βk〈rk+1, pk−1〉 = 0 (A3.2.33)

and, analogously,

〈rk+1, rk−1〉 = 〈rk+1, −pk−1 + βk−1pk−2〉 = 0.

According to (A3.2.31) and (A3.2.32), for i ≤ k − 1 it follows that

Api = span(ri+1, ..., r0),

and using pk+1 := −rk+1 + βk+1pk together with the induction assumption, one can conclude that

〈Api, pk+1〉 = βk+1〈Api, pk〉 − 〈span(ri+1, ..., r0), rk+1〉 = 0 for i ≤ k − 1.


Finally, from (A3.2.26) we get

Apk = (1/(2αk))(rk+1 − rk),

and the relations (A3.2.29), (A3.2.33), (A3.2.30) and (A3.2.28) lead to

〈Apk, pk+1〉 = (1/(2αk))〈rk+1 − rk, −rk+1 + βk+1pk〉
= (1/(2αk))(−‖rk+1‖² − βk+1〈rk, pk〉)
= (1/(2αk))(−‖rk+1‖² − βk+1〈rk, −rk + βkpk−1〉)
= (1/(2αk))(−‖rk+1‖² + βk+1‖rk‖²) = 0.

This statement implies immediately the following result.

A3.2.19 Proposition. Method A3.2.17 defines a minimizer of the quadratic function (A3.2.23) in at most n iterations.

A3.2.20 Remark. Proposition A3.2.19 remains true if the matrix A is positive semi-definite and min_{u∈Rn} J(u) > −∞. The number of iterations required does not exceed q = rank(A). ♦

Method A3.2.17 can be applied without any change also to the minimization of convex, non-quadratic and differentiable functions J : Rn → R. Of course, in that case the determination of the solution requires infinitely many iterations. For corresponding convergence statements see Daniel [86].

In order to avoid an accumulation of errors, for non-quadratic functions Method A3.2.17 has to be slightly changed: a restart procedure has to be implemented with βk = 0 after some steps. Usually, instead of (A3.2.28), βk is determined by

βk := { 0 if k = n, 2n, ...;  ‖rk‖²/‖rk−1‖² if k ≠ n, 2n, .... } (A3.2.34)

A3.2.21 Proposition.

(i) Let J be a convex, differentiable function and the optimal set U∗ be non-empty and bounded. Then the sequence {uk}, generated by Method A3.2.17 with an infinite number of restarts via (A3.2.34), is bounded and each limit point belongs to U∗.

(ii) If, moreover, J is strongly convex and twice-continuously differentiable, then {uk} converges super-linearly. If the Hessian ∇²J(u∗) fulfills a Lipschitz condition in a neighborhood O(u∗), then the estimate

‖u(m+1)n − u∗‖ ≤ c‖umn − u∗‖² ∀ m ∈ N

holds in O(u∗).


Solving large-scale problems, an estimate of the distance between the iterates uk and the minimizer u∗ is of interest even in the case of quadratic functions for k < n.
For the quadratic function (A3.2.23) with eigenvalues λmin(A) > m > 0, λmax(A) < M, the estimate

‖uk − u∗‖ ≤ 2√(M/m) q^k ‖u0 − u∗‖

with

q = (√M − √m)/(√M + √m)

holds, and for m = λmin(A), M = λmax(A) this estimate cannot be improved.

There are a lot of cg-methods that differ mainly in the choice of βk (see, in particular, Fletcher [116], Gill, Murray and Wright [128], Al-Baali [4]). A sufficiently general approach for cg-methods is described by Pshenicnyj and Danilin [341].

For large-scale problems, in practice, the solution procedure becomes essentially faster in a neighborhood of the solution if the eigenvalues of the Hessian ∇²J(u∗) are clustered into groups of similar magnitude. This happens often when penalty methods are used, especially when the problem under consideration results from a discretization.

For the minimization of a quadratic function with a large number of variables, special preconditioning procedures have been developed in order to accelerate the cg-method (see Concus and Golub [80], Glowinski et al. [136] and Blaheta [45]). The large size and the conditioning of many technical or physical applications result in the need for efficient parallelization and preconditioning techniques for cg-methods; see for instance Basermann, Reichel and Schelthoff [33].

A3.2.5 Quasi-Newton methods

Suppose that the objective function J : Rn → R is strongly convex (with constant κ) and twice-differentiable.

First we consider Newton's method for solving unconstrained minimization problems. Starting with u0 ∈ Rn the method determines recursively

uk+1 := uk − [∇2J(uk)]−1∇J(uk), ∀ k ∈ N, (A3.2.35)

and can be considered as the classical (undamped) Newton method for solving the equation ∇J(u) = 0.

A3.2.22 Proposition. Let the Hessian ∇²J satisfy a Lipschitz condition (with constant L). If the starting point u0 satisfies the condition

q := (L/(2κ²))‖∇J(u0)‖ < 1,

then Method (A3.2.35) converges to the minimizer u∗ with a quadratic rate, i.e.,

‖uk − u∗‖ ≤ (2κ/L) q^(2^k).
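A minimal sketch of the undamped iteration (A3.2.35); the separable test function with gradient exp(ui) + ui is an assumption, chosen because ∇J(u) = 0 has no closed-form solution:

```python
import numpy as np

def newton(grad, hess, u0, iters=20):
    """Undamped Newton method (A3.2.35): u^{k+1} = u^k - [H(u^k)]^{-1} grad(u^k),
    implemented by solving the linear system H(u^k) s = -grad(u^k)."""
    u = np.asarray(u0, dtype=float)
    for _ in range(iters):
        s = np.linalg.solve(hess(u), -grad(u))
        u = u + s
    return u

# strongly convex test function J(u) = sum_i (exp(u_i) + 0.5 u_i^2)
grad = lambda u: np.exp(u) + u
hess = lambda u: np.diag(np.exp(u) + 1.0)
u = newton(grad, hess, np.zeros(2))
```

Each component converges quadratically to the root of exp(t) + t = 0.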


A3.2.23 Remark. For the damped Newton method

uk+1 := uk − αk[∇²J(uk)]⁻¹∇J(uk), ∀ k ∈ N, (A3.2.36)

with αk chosen according to the Armijo Search (with τ = 1 and σ ∈ (0, 1/2)), it is shown in Pshenicnyj and Danilin [341] that under the assumptions of Proposition A3.2.22 the damped Newton method converges super-linearly for an arbitrary starting point u0, and after some iterations it turns into the undamped Newton method, i.e., αk = 1 for k ≥ k′. ♦

The computation of the Hessians ∇²J(uk) and especially the solution of the system

∇²J(uk)(u − uk) = −∇J(uk)

are expensive procedures. Therefore, numerous modifications of Newton's method have been developed in which, instead of ∇²J(uk), a sequence of symmetric matrices Hk ∈ Rn×n with

‖Hk − [∇²J(uk)]⁻¹‖ → 0 for k → ∞ (A3.2.37)

is constructed in a simpler way. In the majority of such methods the current matrix Hk is efficiently used to calculate Hk+1. These so-called quasi-Newton methods have the form

uk+1 := uk − αkHk∇J(uk), k = 1, 2, · · · . (A3.2.38)

In order to give a reason for formula (A3.2.37) we consider the Taylor expansion of ∇J:

∇J(uk+1) − ∇J(uk) = ∇²J(uk)(uk+1 − uk) + o(‖uk+1 − uk‖).

If uk → u∗, then due to the continuity of ∇²J

‖∇²J(uk) − ∇²J(u∗)‖ → 0.

It is obvious that for arbitrary matrices Bk := Hk⁻¹ the relation

∇J(uk+1) − ∇J(uk) = Bk(uk+1 − uk) + o(‖uk+1 − uk‖) (A3.2.39)

cannot be guaranteed, because Hk is calculated before uk+1 is.
However, together with Hk → [∇²J(u∗)]⁻¹, a characteristic property of quasi-Newton methods is their finiteness for strongly convex, quadratic functions. Since for a quadratic J

∇J(uk+1) − ∇J(uk) = ∇²J(uk)(uk+1 − uk) = 2A(uk+1 − uk),

it is natural to preserve the validity of the so-called quasi-Newton condition

Hk+1yk = αkpk, (A3.2.40)

with yk := ∇J(uk+1) − ∇J(uk) and pk := −Hk∇J(uk). In other words, we "fulfil" condition (A3.2.39), in which the terms of higher order are deleted, with one modification: Bk+1 has to be used instead of Bk.

Moreover, it is important that these matrices Hk are symmetric and the


transition from Hk to Hk+1 is as simple as possible. In particular, if Hk+1 is defined such that

Hk+1 := Hk + ak u uᵀ... , i.e. Hk+1 := Hk + ak u vᵀ

with a rank-1-update, then symmetry of Hk+1 requires that v = u. Due to the quasi-Newton condition (A3.2.40) we get

Hkyk + ak u vᵀyk = αkpk.

Consequently, the vector u is proportional to αkpk − Hkyk, and inserting the coefficient of proportionality into ak, we take

u = αkpk − Hkyk, with ak := 〈u, yk〉⁻¹.

The corresponding formula

HBk+1 := Hk + ((αkpk − Hkyk)(αkpk − Hkyk)ᵀ) / ((αkpk − Hkyk)ᵀyk) (A3.2.41)

leads to the Broyden method and exploits the possibilities of rank-1-updates. Using a symmetric rank-2-update

Hk+1 := Hk + ak u uᵀ + bk v vᵀ,

the quasi-Newton condition

Hkyk + (ak u uᵀ + bk v vᵀ)yk = αkpk

defines a one-parametric family of formulas (Broyden class). In particular, for

u := αkpk, v := Hkyk, ak uᵀyk = 1, bk vᵀyk = −1,

the Davidon-Fletcher-Powell method

HDFPk+1 := Hk + αk (pk(pk)ᵀ)/((yk)ᵀpk) − (Hkyk(yk)ᵀHk)/((yk)ᵀHkyk) (A3.2.42)

can be obtained.
We refer also to another often used formula of this family, the Broyden-Fletcher-Goldfarb-Shanno method:

HBFGSk+1 := Hk + (1 + ((yk)ᵀHkyk)/(αk(pk)ᵀyk)) · (αk pk(pk)ᵀ)/((pk)ᵀyk) − (pk(yk)ᵀHk + Hkyk(pk)ᵀ)/((pk)ᵀyk). (A3.2.43)

Any formula of the Broyden family can be written as

Hξk+1 := (1 − ξ)HDFPk+1 + ξHBFGSk+1,

and for arbitrary ξ ≥ 0 the following properties can be guaranteed for an arbitrary starting point u0 and an arbitrary positive definite matrix H0:

(i) positive definiteness of the matrices Hk ∀ k;


(ii) super-linear convergence with respect to m of the sequence {‖umn − u∗‖} in a neighborhood of the minimizer u∗; in particular, for some methods quadratic convergence can be established in O(u∗) if ∇²J is Lipschitz-continuous.

Moreover, for quadratic functions (A3.2.23) the following also hold:

(iii) termination of the iteration after m ≤ n steps, and Hm = A⁻¹ if m = n;

(iv) independence of the sequence {uk} from the parameter ξ;

(v) coincidence of the iterates of these methods with the iterates of conjugate gradient methods if H0 := I is chosen.
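Properties (iii) and (v) can be checked numerically. The sketch below applies the BFGS update (A3.2.43), rewritten with sk = αkpk, to the quadratic (A3.2.23) with exact line search and H0 := I; the test data are assumptions, and the gradient Au − f corresponds to J(u) = (1/2)〈Au, u〉 − 〈f, u〉:

```python
import numpy as np

def bfgs_quadratic(A, f, u0):
    """BFGS in inverse-Hessian form, cf. (A3.2.43), with s^k = alpha_k p^k and
    y^k = grad J(u^{k+1}) - grad J(u^k); exact line search, H_0 = I."""
    n = len(f)
    u = np.asarray(u0, dtype=float)
    H = np.eye(n)
    g = A @ u - f
    for _ in range(n):
        p = -H @ g
        alpha = -(g @ p) / (p @ (A @ p))     # exact minimizer along p
        s = alpha * p
        u = u + s
        g_new = A @ u - f
        y = g_new - g
        sy = s @ y
        Hy = H @ y
        H = (H + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy
               - (np.outer(s, Hy) + np.outer(Hy, s)) / sy)
        g = g_new
    return u, H

# hypothetical test data
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = np.array([1.0, 1.0])
u, H = bfgs_quadratic(A, f, np.zeros(2))
```

After n = 2 steps, u is the minimizer and H coincides with A⁻¹, in agreement with property (iii).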

It is useful to consider quasi-Newton methods also from another point of view. If, together with the scalar product 〈·, ·〉 in Rn, a new scalar product 〈·, ·〉B is introduced by means of an arbitrary positive definite matrix B, i.e.

〈u, v〉B := 〈Bu, v〉,

then the gradient ∇BJ of the function J (on the space with this new scalar product) is related to ∇J by

∇BJ(u) := B⁻¹∇J(u).

In this new metric the gradient method reads as

uk+1 := uk − αkBk⁻¹∇J(uk), k = 1, 2, · · · . (A3.2.44)

It is easily seen that gradient methods are not invariant with respect to the choice of the metric, and in the case Bk := ∇²J(uk) an essentially faster gradient method is created. Hence, one can attempt to accelerate a gradient method by means of a suitable metric.

In order to obtain a faster decrease of the function J, the choice Bk := ∇²J(uk) can be considered as the best one. Therefore, quasi-Newton methods can be interpreted as gradient methods in which at each step a new metric is chosen as close as possible to the best one. Note that Newton's method and the quasi-Newton methods considered above, if H0 := [∇²J(u0)]⁻¹ is taken for the latter ones, are invariant with respect to the choice of the metric.

Hence, it makes sense that quasi-Newton methods are often called variable metric methods, but the latter class is substantially broader. For a review of such methods see Luksan and Spedicato [280]. Investigations of global and super-linear convergence of certain classes of variable metric methods can be found in Ritter [346]. In Vlcek and Luksan [410] variable metric methods with a limited-memory effect are studied for large-scale unconstrained optimization.

The number of publications concerning quasi-Newton methods and cg-methods is growing very quickly. Here we refer only to some monographs and special collections of papers (cf. Fletcher [116], Gill, Murray and Wright [128], Hestens [172], Dixon and Szego [91], Lootsma [278]). In Broyden and Vespucci [58] a brief review of cg-methods is given.

There are also important investigations taking into account inexact line search methods (cf. Powell [333], Al-Baali [4]), functions with sparse Hessians (cf. Toint [397], Griewank and Toint [149], Powell and Toint


[335]) and not strongly convex functions (cf. Powell [333], Pshenicnyj and Danilin [341], Dennis and Schnabel [90]).

Concerning a comparison of different methods from the numerical point of view, we refer to the analysis of Fletcher [117], Schittkowski [363] and Brezinski [49]. For ill-posed problems the cg-method is studied by Frommer and Maass [122].

For non-quadratic functions, numerical experiments show that quasi-Newton methods are more efficient. However, cg-methods require storing only some current n-dimensional vectors, whereas in quasi-Newton methods it is necessary to store and to update n × n-matrices. Thus, the latter methods may be inapplicable for large-scale problems which can still be solved quite successfully by cg-methods. However, there are recently some developments concerning inexact quasi-Newton and cg-methods which allow one to work in lower-dimensional spaces. These algorithms force the iterates to stay on an appropriate manifold, a so-called Krylov subspace, of a dimension much smaller than that of the Hessian itself, and reinitialize approximate curvature along directions off the manifold. In this way, typical difficulties associated with ill-conditioning can be overcome; see for instance Gill and Leonhard [129], Fasano [109].

A3.2.24 Remark. Quasi-Newton and variable metric methods can also be applied to the minimization of non-differentiable functions. Here the transition to a new metric is performed by stretching the space in the direction of some subgradient at the previous point or in the direction of the difference between some subgradients at the last two points; see, for instance, Shor [371]. These ideas have been further developed, for instance, by Uryas'ev [404], Kiwiel [236] and Lemarechal and Sagastizabal [262]. ♦

A3.3 Minimization methods subject to simple constraints

In this subsection we describe methods which are applicable for solving minimization problems arising from the discretization of elliptic variational inequalities.

They are destined to solve Problem (A1.5.11), with V = Rn and box-constraints like

K = ⊗_{i=1}^n Si, Si = [ai, bi]. (A3.3.45)

The cases ai = −∞ and bi = +∞ are not excluded.

Discretization of variational inequalities by means of finite element methods or finite difference methods leads to problems in which the Hessians of the objective functions have a special structure, i.e., non-zero elements are located only along some diagonals. Usually, the approximate problems of real-life applications have a large dimension, which essentially restricts the use of universal optimization methods.

Relaxation methods, based on coordinate descent, and gradient-type methods make effective use of the structure and sparseness of the Hessian of the objective function in the approximate problems.

A3.3.1 Method of coordinate descent

Now, we consider the following method to minimize a strictly convex and differentiable function J : Rn → R on box-constraints (A3.3.45). Assume that for


fixed u0 ∈ K the set

K̄ = K ∩ {u : J(u) ≤ J(u0)}

is compact.

A3.3.25 Method. (Coordinate descent method)

S0: Set k := 0, i := 0, u0,1 := u0.

S1: Denote u^{i,k+1} := (u_1^{k+1}, ..., u_i^{k+1}, u_{i+1}^k, ..., u_n^k), 0 ≤ i ≤ n − 1.

Compute

u_{i+1}^{k+1} := arg min_{v∈S_{i+1}} J(u_1^{k+1}, ..., u_i^{k+1}, v, u_{i+2}^k, ..., u_n^k) (A3.3.46)

and set

u^{i+1,k+1} := (u_1^{k+1}, ..., u_i^{k+1}, u_{i+1}^{k+1}, u_{i+2}^k, ..., u_n^k).

S2: If i < n − 1, set i := i + 1 and go to S1;
if i = n − 1, set k := k + 1, i := 0 and go to S1.
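For a quadratic J the one-dimensional problem (A3.3.46) has a closed-form solution: the unconstrained coordinate minimizer is clipped to [ai, bi]. A minimal sketch (the box and the data are assumptions):

```python
import numpy as np

def coordinate_descent(A, f, a, b, u0, sweeps=100):
    """Method A3.3.25 for J(u) = 0.5<Au,u> - <f,u> on the box a <= u <= b;
    step (A3.3.46) minimizes J exactly in the coordinate u_{i+1}."""
    u = np.asarray(u0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(f)):
            # unconstrained 1-D minimizer, then projection onto [a_i, b_i]
            v = (f[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]
            u[i] = min(max(v, a[i]), b[i])
    return u

# hypothetical test data; the box is chosen so that a constraint is active
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = np.array([1.0, 1.0])
a = np.array([0.25, 0.0])
b = np.array([1.0, 1.0])
u = coordinate_descent(A, f, a, b, np.zeros(2))
```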

Convergence of this method can be proved by means of the following property of strictly convex functions.

A3.3.26 Proposition. Let f : Rn → R be a strictly convex function and C be a convex, compact set. Then there exists a strictly increasing function τ : R+ → R+, with τ(0) = 0, such that for all u, v ∈ C and all q ∈ ∂f(v)

f(u) − f(v) ≥ 〈q, u − v〉 + τ(‖u − v‖).

A3.3.27 Proposition. If the objective function J in Problem (A1.5.11) is strictly convex and continuously differentiable and K̄ is a compact set (with K given by (A3.3.45)), then the iterates generated by Method A3.3.25 converge to

u∗ = arg min_{u∈K} J(u).

The idea of the proof consists in the following: Since J(uk+1) ≤ J(uk) holds for arbitrary k and K̄ is a compact set, there exists lim_{k→∞} J(uk) = J̄. From Proposition A3.3.26 we obtain

‖uk+1 − uk‖ → 0 for k → ∞. (A3.3.47)

If J̄ = J(u∗) + d, with d > 0, then due to (A3.3.47), the relation

−d ≥ J(u∗) − J(uk) ≥ 〈∇J(uk), u∗ − uk〉,

and the continuity of ∇J, it holds

lim inf_{k→∞} |u_{i0(k)}^{k+1} − u_{i0(k)}^k| > 0


for indices i0(k) for which

(∂J(uk)/∂u_{i0(k)}) (u∗_{i0(k)} − u^k_{i0(k)}) = min_{1≤i≤n} (∂J(uk)/∂u_i)(u∗_i − u^k_i).

But this contradicts (A3.3.47).

If J(u) = (1/2)〈Au, u〉 − 〈f, u〉 with a positive definite matrix A, then the method can be generalized in the following way. In order to determine u_{i+1}^{k+1}, instead of (A3.3.46), the procedure

u_{i+1}^{k+1/2} = arg min_{v∈R} J(u_1^{k+1}, ..., u_i^{k+1}, v, u_{i+2}^k, ..., u_n^k),
u_{i+1}^{k+1} = min{ max[a_{i+1}, (1 − ω)u_{i+1}^k + ω u_{i+1}^{k+1/2}], b_{i+1} } (A3.3.48)

with ω ∈ (0, 2) is implemented. The fixed value ω is called relaxation parameter. Method A3.3.25 is included in the generalized method (A3.3.48) if one takes ω = 1. In the case K = Rn and ω = 1 we obtain the Gauss-Seidel method. Proposition A3.3.27 remains true also for the generalized method (A3.3.48).
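A sketch of the relaxed variant (A3.3.48) for a quadratic J, where the half-step is the exact coordinate minimizer (a projected SOR sweep; the data and ω are assumptions):

```python
import numpy as np

def projected_sor(A, f, a, b, u0, omega=1.5, sweeps=200):
    """Generalized coordinate descent (A3.3.48): exact coordinate minimizer
    u^{k+1/2}, relaxation with omega in (0,2), then clipping to [a_i, b_i]."""
    u = np.asarray(u0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(f)):
            v = (f[i] - A[i] @ u + A[i, i] * u[i]) / A[i, i]   # u_i^{k+1/2}
            u[i] = min(max(a[i], (1.0 - omega) * u[i] + omega * v), b[i])
    return u

# hypothetical test data; box chosen so that one bound is active
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = np.array([1.0, 1.0])
u = projected_sor(A, f, a=np.array([0.25, 0.0]), b=np.array([1.0, 1.0]),
                  u0=np.zeros(2))
```

With ω = 1 the sweep reduces to Method A3.3.25.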

In order to compute the component u_{i+1}^{k+1} according to (A3.3.46), the methods of univariate minimization described in Subsection A3.2.1 can be used. If J is a quadratic functional, then u_{i+1}^{k+1/2} is calculated as the solution of the linear equation

(∂/∂v) J(u_1^{k+1}, ..., u_i^{k+1}, v, u_{i+2}^k, ..., u_n^k) = 0.

Note that a generalization of the method of coordinate descent towards a block-coordinate-wise descent is possible in which, instead of the scalar component u_i, a vector component u_i ∈ R^{mi} is considered, with Rn = ⊗_{i=1}^N R^{mi} (n = Σ_{i=1}^N mi), K = ⊗_{i=1}^N Si, and Si ⊂ R^{mi} are assumed to be convex and closed. In this case we deal with the method of block relaxation; see Glowinski, Lions and Tremolieres [135].

A3.3.2 Gradient projection methods

Assume that K ⊂ Rn is an arbitrary convex, closed set. Let

ΠK(u) = arg min{‖v − u‖ : v ∈ K}

be the projection of the point u onto the set K. Choosing u0 ∈ K, according to the gradient method we determine an intermediate iterate

u^{k+1/2} := uk − αk∇J(uk)

and then
u^{k+1} := ΠK(u^{k+1/2}). (A3.3.49)

In case K = ⊗_{i=1}^n Si, Si := [ai, bi], the projection in (A3.3.49) can be performed very easily:

u_i^{k+1} := min{ max[u_i^{k+1/2}, ai], bi }.

We formulate a convergence result for Method (A3.3.49) in the case αk = α, k = 1, 2, · · · .


A3.3.28 Proposition. Let K be a convex, closed subset of Rn. Assume that J : Rn → R is a convex, differentiable function and ∇J fulfils a Lipschitz condition on K with constant L. If the optimal set U∗ ≠ ∅, then the gradient projection method with α ∈ (0, 2/L) guarantees that uk → u∗ ∈ U∗.
If, moreover, J is twice-differentiable and

m〈v, v〉 ≤ 〈∇²J(u)v, v〉 for some m > 0 and any u ∈ K, v ∈ Rn,

then Method (A3.3.49) converges linearly, and for α = 1/L the factor of the geometric progression is q∗ = max{|1 − αm|, |1 − αL|}.

A3.3.3 Conjugate gradient method

As a last method in this subsection we consider a specification of the Fletcher-Reeves Method A3.2.17, which was developed by Polyak [329] in order to minimize a convex, differentiable function J : Rn → R on the set K given in (A3.3.45).

A3.3.29 Method. (Polyak’s conjugate gradient method)

S0: Choose an arbitrary u0 ∈ K = ⊗_{i=1}^n Si.
Define

I0 := {i : u_i^0 = ai, ∂J(u0)/∂ui > 0} ∪ {i : u_i^0 = bi, ∂J(u0)/∂ui < 0},

p_i^0 := { −∂J(u0)/∂ui if i ∉ I0;  0 if i ∈ I0 },

set p0 := (p_1^0, ..., p_n^0) and k := 0.

S1: Calculate

αk = arg min{J(uk + αpk) : α ≥ 0, a ≤ uk + αpk ≤ b}, (A3.3.50)

set uk+1 := uk + αkpk.

S2: If ∂J(uk+1)/∂ui = 0 ∀ i ∉ Ik, set

Ik+1 := {i : u_i^{k+1} = ai, ∂J(uk+1)/∂ui > 0} ∪ {i : u_i^{k+1} = bi, ∂J(uk+1)/∂ui < 0}

and go to S3; otherwise, set

Ik+1 := Ik ∪ {i : u_i^{k+1} = ai} ∪ {i : u_i^{k+1} = bi}.

S3: Compute

βk+1 := { Σ_{i∉Ik+1} (∂J(uk+1)/∂ui)² · (Σ_{i∉Ik+1} (∂J(uk)/∂ui)²)⁻¹ if Ik+1 = Ik;  0 if Ik+1 ≠ Ik },

p_i^{k+1} := { −∂J(uk+1)/∂ui + βk+1 p_i^k if i ∉ Ik+1;  0 if i ∈ Ik+1 }, (A3.3.51)

and set pk+1 := (p_1^{k+1}, ..., p_n^{k+1}).


S4: Set k := k + 1 and go to S1.

A3.3.30 Proposition. In order to minimize the quadratic function J in (A3.2.23) (with positive semi-definite matrix A) on the feasible set K given by (A3.3.45), Method A3.3.29 takes a finite number of iterations.

Note that if Ik = Ik+1 = ... = Ik+r, the method performs a conjugate direction search on a corresponding subspace. In the non-quadratic case one cannot exclude that r → ∞. The possible infiniteness of Method A3.3.29 makes it necessary to control the accuracy of the search on the hyperplane and to introduce restart procedures analogous to those in the Fletcher-Reeves Method A3.2.17.

Now, we briefly consider methods for solving quadratic programming problems

(QP) minimize J(u) = (1/2)〈Au, u〉 − 〈f, u〉
subject to u ∈ K := {v ∈ Rn : Bv ≤ d}, (A3.3.52)

with B ∈ Rm×n, d ∈ Rm, and A ∈ Rn×n is supposed to be positive definite.
Note that for problems with linear constraints the Kuhn-Tucker Theorem A1.7.51 remains true without the Slater condition. Using this statement, it is possible to solve the dual problem of (A3.3.52), which can be written in the form

min_{λ≥0} ψ(λ) = (1/2)〈A⁻¹(f − Bᵀλ), f − Bᵀλ〉 + 〈d, λ〉, (A3.3.53)

(cf. the transition from Problem (A3.4.56) to Problem (A3.4.67) below).
Thus, the methods described in Subsection A3.3 can be applied for solving Problem (A3.3.53). But, in general, this approach is not quite efficient, because we need to know the inverse matrix A⁻¹ or have to solve an equation system with matrix A in each step of the method considered. Indeed,

∇ψ(λ) = BA⁻¹Bᵀλ + d − BA⁻¹f,

with BA⁻¹Bᵀ a positive semi-definite matrix, and in order to compute ∇ψ(λ) we have to solve the systems

Ax = f and Ay = Bᵀλ. (A3.3.54)

The second system must be solved for fixed λ in each step.
However, quadratic problems resulting from finite element discretization of elliptic variational inequalities often have matrices with a diagonal and sparse structure, which enables us to solve the equation systems (A3.3.54) sufficiently fast. Here a sparse LU-factorization is used for the matrix A, and row and column updates are performed for the constraint matrix B; see, for instance, Gould [144]. Another approach to solving large-scale quadratic programming problems is the application of interior point methods, which are very efficient (cf. Cafieri et al. [67]).


A3.4 Methods for convex minimization problems

We start with the description of the cone of feasible directions for Problem (A1.7.35).

A3.4.31 Definition. Let K be a convex set in the Hilbert space V. The set

Cu(K) := {p ∈ V : p ≠ 0, u + γp ∈ K for some γ > 0}

is said to be the cone of feasible directions at a point u ∈ K, and its elements p are called feasible directions at the point u. ♦

In (A1.5.23) this set is used in order to characterize the solutions of Problem (A1.5.11). Obviously, Cu(K) is a convex cone. The closure of this cone can be described as follows.

A3.4.32 Proposition. Assume that the constraints of Problem (A1.7.35) fulfil the Slater condition and that the constraint functions gi are Gateaux-differentiable on V. Then for u ∈ K the relation

cl(Cu(K)) = {p ∈ V : 〈∇gj(u), p〉 ≤ 0, j ∈ I0(u)} (A3.4.55)

holds with I0(u) = {j : gj(u) = 0}.

Now we consider the finite-dimensional variant of Problem (A1.7.35):

minimize J(u), subject to u ∈ K := {v ∈ Rn : gj(v) ≤ 0, j = 1, ..., m}, (A3.4.56)

where J, gj : Rn → R are convex, differentiable functions, the Slater condition is fulfilled and U∗ ≠ ∅.

Obviously, Proposition A3.4.32 is applicable also to this problem.

A3.4.33 Remark. The Kuhn-Tucker Theorem A1.7.51 and Proposition A3.4.32 remain true for Problem (A3.4.56) if the Slater condition is weakened in the following way:

there exists a point ū ∈ K such that gj(ū) < 0 for those functions which are not affine.

Due to Propositions A1.5.34 (with J2 ≡ 0) and A3.4.32, for u ∈ K with J(u) > J∗ there exists a vector p′ such that

〈∇J(u), p′〉 < 0,
〈∇gj(u), p′〉 ≤ 0, ∀ j ∈ I0(u).

Taking p := p′ + α(ū − u), then for sufficiently small α > 0 we obtain

〈∇J(u), p〉 < 0


and

〈∇gj(u), p〉 = 〈∇gj(u), p′〉 + α〈∇gj(u), ū − u〉 ≤ α(gj(ū) − gj(u)) < 0 ∀ j ∈ I0(u).

Conversely, if for u ∈ K the inequalities

〈∇J(u), p〉 < 0,
〈∇gj(u), p〉 < 0, j ∈ I0(u)

are satisfied, then there exists a constant α > 0 such that

u + αp ∈ K and J(u + αp) < J(u).

Indeed, the inequalities

J(u + αp) < J(u),
gj(u + αp) < 0, j ∉ I0(u)

are obvious for sufficiently small α > 0, and for j ∈ I0(u) we get

gj(u + αp) = gj(u + αp) − gj(u) ≤ α〈∇gj(u + αp), p〉,

and it remains only to use the continuity of ∇gj.
This explains the choice of the directions p in the following methods.

A3.4.1 Feasible direction methods

A3.4.34 Method. (Method of feasible directions)
Data: u0 ∈ K, δ0 > 0.

S0: Set k := 0.

S1: Define Iδ(uk) := {j : gj(uk) ≥ −δk} and determine the vector (pk, σk) ∈ Rn+1 as a solution of the direction search problem

min σ,
〈∇J(uk), p〉 ≤ σ,
〈∇gj(uk), p〉 ≤ σ, ∀ j ∈ Iδ(uk), (A3.4.57)
〈p, p〉 ≤ 1 (or max_{1≤i≤n} |pi| ≤ 1). (A3.4.58)

S2: If −σk > δk, compute

αk ≈ arg min{J(uk + αpk) : uk + αpk ∈ K}, (A3.4.59)

uk+1 := uk + αkpk, δk+1 := δk and go to S3.

Otherwise set uk+1 := uk, δk+1 := δk/2.


508 CHAPTER 11. APPENDIX

S3: Set k := k + 1 and go to S1.

In (A3.4.59) the step-length αk can be chosen according to

αk := arg min{J(uk + αpk) : uk + αpk ∈ K} (A3.4.60)

or

αk : uk+1 := uk + αkpk ∈ K,
J(uk+1) ≤ inf{J(uk + αpk) : uk + αpk ∈ K} + εk, (A3.4.61)

with

Σ_{k=0}^∞ εk < ∞. (A3.4.62)

Also an Armijo search, described in Subsection A3.2.1, can be implemented with simple modifications guaranteeing that uk + αkpk ∈ K.

A3.4.35 Remark. The calculation of the direction pk can be improved if for affine functions gj the constraints 〈∇gj(uk), p〉 ≤ σ are replaced by the homogeneous ones 〈∇gj(uk), p〉 ≤ 0. ♦

A3.4.36 Proposition. Suppose the assumptions concerning Problem (A3.4.56) are fulfilled. Moreover, assume that the gradients ∇J and ∇gj (j = 1, ..., m) satisfy a Lipschitz condition on K and the set U∗ is non-empty and bounded. Then the sequence {uk}, generated by Method A3.4.34, converges to U∗ if

(i) αk is chosen according to (A3.4.60), or

(ii) αk is calculated by means of the conditions (A3.4.61), (A3.4.62).

The parameter δk in S1 has to be chosen such that too short steps are avoided. Using I0(uk) instead of Iδ(uk), examples are known which show that J(uk) → J̄ > J∗. Constraint (A3.4.58) is introduced to guarantee the solvability of the direction search problems in S1. Usually the constraint max_{1≤i≤n} |pi| ≤ 1 is chosen in order to deal in the direction search with a linear programming problem.

A3.4.37 Remark. Any method ensuring convergence of the sequence {uk} to U∗ for Problem (A3.4.56) determines a feasible starting point u0 after a finite number of steps if it is applied to the auxiliary problem

min{un+1 : gj(u) ≤ un+1, j = 1, ..., m, J(u) ≤ c}

with c sufficiently large. For this problem a feasible starting point can be found trivially. The constraint J(u) ≤ c becomes superfluous if the convergence conditions of the method used do not require boundedness of the optimal set, or if K itself is bounded. ♦

It is easily seen that if, in S1 of Method A3.4.34, there are no constraints of the original problem and the constraint 〈p, p〉 ≤ 1 is used, then the method of feasible directions turns into a gradient method.
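A compact numerical sketch of Method A3.4.34 on a hypothetical toy problem (the problem data, grids and tolerances are illustrative choices of mine; in practice the direction search (A3.4.57), (A3.4.58) with the box constraint is solved as a linear programming problem rather than by brute force):

```python
import numpy as np

# Toy problem:  minimize J(u) = (u1-2)^2 + (u2-2)^2
#               s.t.     g1(u) = u1 + u2 - 2 <= 0,
# with solution u* = (1,1), J* = 2.
J  = lambda u: (u[0] - 2)**2 + (u[1] - 2)**2
dJ = lambda u: np.array([2*(u[0] - 2), 2*(u[1] - 2)])
g  = [lambda u: u[0] + u[1] - 2]
dg = [lambda u: np.array([1.0, 1.0])]

def direction_search(u, active):
    """min sigma s.t. <dJ(u),p> <= sigma, <dg_j(u),p> <= sigma (j active),
    max_i |p_i| <= 1; solved here by brute force on a grid."""
    best_p, best_sigma = None, np.inf
    for p1 in np.linspace(-1, 1, 41):
        for p2 in np.linspace(-1, 1, 41):
            p = np.array([p1, p2])
            sigma = max([dJ(u) @ p] + [dg[j](u) @ p for j in active])
            if sigma < best_sigma:
                best_p, best_sigma = p, sigma
    return best_p, best_sigma

def feasible(u):
    return all(gj(u) <= 1e-12 for gj in g)

u, delta = np.array([0.0, 0.0]), 1.0          # u0 in K, delta0 > 0
for k in range(100):
    active = [j for j in range(len(g)) if g[j](u) >= -delta]    # I_delta(u^k)
    p, sigma = direction_search(u, active)
    if -sigma > delta:
        # approximate step-length rule (A3.4.59): best feasible alpha on a grid
        alphas = [a for a in np.linspace(0, 2, 201) if feasible(u + a*p)]
        u = u + min(alphas, key=lambda a: J(u + a*p)) * p
    else:
        delta = delta / 2                     # u^{k+1} := u^k, halve delta
    if delta < 1e-4:
        break

print(u, J(u))   # approximately (1, 1) and J* = 2
```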



A3.4.2 Linearization methods

The idea of linearizing the original problem is basic for many algorithms in convex programming. Now a method suggested by Pshenicnyj [337], [340] will be described, using a line search procedure different from the one given in the papers mentioned.

A3.4.38 Method. (Pshenicnyj's linearization method)
Data: u0 ∈ Rn, α > 0, ε > 0, εk ↓ 0.

S0: Set k := 0.

S1: Determine

Ik(uk) := {j : gj(uk) ≥ max[0, max_{1≤i≤m} gi(uk)] − ε}. (A3.4.63)

S2: Compute an approximate solution ūk of the quadratic programming problem

min ψk(u) := 〈∇J(uk), u − uk〉 + (1/2)‖u − uk‖²
s.t. gj(uk) + 〈∇gj(uk), u − uk〉 ≤ 0, ∀ j ∈ Ik(uk), (A3.4.64)

such that

‖∇ψk(ūk) − ∇ψk(ûk)‖ ≤ εk, (A3.4.65)

with ûk the exact solution of Problem (A3.4.64).

S3: Set uk+1 := uk + α(ūk − uk), k := k + 1 and go to S1.

A3.4.39 Proposition. Suppose that

(i) the assumptions for Problem (A3.4.56) are satisfied;

(ii) the gradients of the functions J, gj (j = 1, ..., m) obey a Lipschitz condition with constants L, Lj (j = 1, ..., m), respectively;

(iii) the step-length α is chosen such that

0 < α < 2 (L + 2 Σ_{j=1}^m Lj λ∗j)^{−1},

with λ∗ a Lagrange multiplier of Problem (A3.4.56);

(iv) the series Σ_{k=1}^∞ εk converges and ε > 0 is sufficiently small.

Then the sequence {uk}, generated by Method A3.4.38, converges to some element u∗ ∈ U∗. If, moreover, J is strongly convex and ūk is the exact solution of Problem (A3.4.64), then

‖uk − u∗‖ ≤ c q^k, q ∈ (0, 1).



In Polyak [330] it is noted that an analogous proposition is true for any ε > 0. The idea of decreasing the number of constraints of the subproblem (A3.4.64) by means of a small ε in (A3.4.63) seems to be promising. However, in [340] Pshenicnyj notes that the analysis of numerical experiments leads to the converse conclusion: ε should be chosen large enough in order to take as many constraints of the initial problem as possible into account.

Now we deal briefly with the numerical solution of the quadratic programming problem (A3.4.64). The feasible set of this problem contains the original feasible set K and, because ψk is strongly convex, the problem is uniquely solvable.

Due to the condition

∇uLk(u, λ) = 0, (A3.4.66)

with

Lk(u, λ) := ψk(u) + Σ_{j∈Ik(uk)} λj [gj(uk) + 〈∇gj(uk), u − uk〉]

the Lagrangian function of Problem (A3.4.64), we obtain

u = u(λ) := uk − ∇J(uk) − Σ_{j∈Ik(uk)} λj ∇gj(uk).

Hence, the determination of the Lagrange multiplier vector λk leads to the problem

max{Lk(u(λ), λ) : λj ≥ 0 ∀ j ∈ Ik(uk)},

which can be reformulated as

min (1/2) ‖∇J(uk) + Σ_{j∈Ik(uk)} λj ∇gj(uk)‖² − Σ_{j∈Ik(uk)} λj gj(uk)
s.t. λj ≥ 0, ∀ j ∈ Ik(uk). (A3.4.67)

In this way, in order to calculate λk, we deal with a quadratic programming problem with simple constraints. This problem can be solved in a finite number of iterations, for instance, by means of the conjugate gradient Method A3.3.29. Thus,

∇uLk(ūk, λk) = 0

implies

ūk := uk − [∇J(uk) + Σ_{j∈Ik(uk)} λkj ∇gj(uk)],

i.e., the iterate ūk can be simply calculated after the determination of the multiplier λk.
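For illustration, here is a sketch of this multiplier computation in which the simple-constrained dual QP (A3.4.67) is solved by projected gradient steps instead of the conjugate gradient Method A3.3.29 (the data dJ, G, gv form a hypothetical snapshot at some iterate uk; for this particular data the exact multiplier is λ = 3):

```python
import numpy as np

# Hypothetical snapshot at u^k: objective gradient dJ and one linearized
# constraint with gradient row G and value gv = g(u^k).
dJ = np.array([-4.0, -4.0])           # grad J(u^k)
G  = np.array([[1.0, 1.0]])           # rows: grad g_j(u^k)
gv = np.array([-2.0])                 # g_j(u^k)

def dual_qp(dJ, G, gv, steps=2000, t=0.1):
    """Projected gradient on (A3.4.67):
       min 0.5*||dJ + G^T lam||^2 - <lam, gv>   s.t.  lam >= 0."""
    lam = np.zeros(len(gv))
    for _ in range(steps):
        v = dJ + G.T @ lam            # v(lam) = grad J + sum_j lam_j grad g_j
        grad = G @ v - gv             # gradient of the dual objective
        lam = np.maximum(0.0, lam - t * grad)
    return lam

lam = dual_qp(dJ, G, gv)
u_bar_step = -(dJ + G.T @ lam)        # u_bar^k - u^k = -(grad J + sum lam_j grad g_j)
print(lam, u_bar_step)                # approx [3.] and [1., 1.]
```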

For unconstrained problems this linearization method (with εk = 0 ∀ k) turns into a gradient method with constant step-length.

A3.4.40 Remark. In order to simplify the linearization method it seems natural to omit the quadratic term of the objective in the quadratic program (A3.4.64). But usually the numerical algorithm breaks off, because the linear objective function can be unbounded on the polyhedral set. Some linearization methods are known in which solvability of the linearized auxiliary problems and convergence of the iterates are obtained by other ways and means (see Mangasarian [284], Pironneau and Polak [321]), which include methods with linear auxiliary problems, too. ♦

A3.4.3 Penalty methods

We pay special attention to this class of methods, because they are often used for the construction of stable algorithms for ill-posed problems. Here they are studied in a more general setting than given in Problem (A3.4.56). We consider the problem

min{J(u) : u ∈ K}, (A3.4.68)

with J : Rn → R a convex function and K = G ∩ H, where G ⊂ Rn is a convex, closed set and H ⊂ Rn is an affine set such that H ∩ int G ≠ ∅.

We introduce families {φ1k} and {φ2k} of convex functions mapping from Rn into R. Their properties will be specified in Theorem A3.4.41 below.

The principal idea of penalty methods consists in the following: choose a sequence εk ↓ 0, set

Fk := J + φ1k + φ2k,

and compute uk according to

Fk(uk) ≤ inf_{u∈Rn} Fk(u) + εk. (A3.4.69)

A3.4.41 Theorem. Assume that the solution set U∗ of Problem (A3.4.68) is non-empty and bounded and that the systems of convex functions {φ1k}, {φ2k} have the following properties:

lim_{k→∞} φ1k(u) = 0 if u ∈ int G, lim_{k→∞} φ1k(u) = +∞ if u ∉ G, (A3.4.70)

φ2k(u) ≥ 0, u ∈ Rn, lim_{k→∞} φ2k(u) = 0, u ∈ H, (A3.4.71)

lim_{k→∞} φ2k(vk) = +∞ if vk → u ∉ H. (A3.4.72)

Then, starting with some iteration, there exist points uk satisfying (A3.4.69); moreover, the sequence {uk} is bounded and each cluster point belongs to the optimal set U∗.

Proof: For fixed θ > 0 and δ > 0 we define the sets

K̄ := {u ∈ K : J(u) ≤ J∗ + θ} (A3.4.73)

and

K̄δ := {u ∈ Rn : ρ(u, K̄) ≤ δ},



which are bounded due to Proposition A1.7.55. Now we choose a convex, compact set S ⊂ int G ∩ int K̄δ such that H ∩ int S ≠ ∅ and for every u ∈ S the inequality J(u) ≤ J∗ + τ is true with some fixed τ ∈ (0, θ/4]. Let

z̄ ∈ H ∩ int S, w ∈ ∂K̄δ, z ∈ [z̄, w] ∩ int(K̄δ \ K̄), z ∉ ∂G ∪ S.

Then the following three cases are possible:

(a) z ∉ G: In this case, on the basis of (A3.4.70) and Proposition A1.5.19, starting with some iteration k1, we get

J(z) + φ1k(z) > J∗ + θ − τ

and

J(u) + φ1k(u) ≤ J∗ + 2τ, ∀ u ∈ S.

Consequently, for k ≥ k1 and u ∈ S the inequality

J(z) + φ1k(z) > J(u) + φ1k(u) + θ − 3τ (A3.4.74)

is satisfied. Since the set Sλ(z) := {z + λ(z − u) : u ∈ S, λ ≥ 0} contains the point w together with some neighborhood O(w), we obtain, with regard to the convexity of J + φ1k and (A3.4.74),

J(v) + φ1k(v) > J(z) + φ1k(z) + θ − 3τ, ∀ k ≥ k1, ∀ v ∈ O(w).

Hence, in view of (A3.4.71), (A3.4.72), there exists a number k2 ≥ k1 suchthat

Fk(v) > Fk(z̄) + θ − 4τ, ∀ k ≥ k2, ∀ v ∈ O(w). (A3.4.75)

(b) z ∉ H: Due to (A3.4.70) and Proposition A1.5.19 we can conclude that for sufficiently large k (k ≥ k3) and all u ∈ S the estimate

|φ1k(u)| < ρ1τ/(2ρ2)

is true, with

ρ1 := ρ(z̄, ∂S), ρ2 := max{‖z̄ − ξ‖ : ξ ∈ ∂K̄2δ}.

Hence, in view of the convexity of φ1k, for arbitrary q ∈ K̄2δ \ S and η ∈ [z̄, q] ∩ ∂S, we obtain for k ≥ k3

φ1k(q) ≥ (‖z̄ − q‖/‖η − z̄‖) φ1k(η) − (‖q − η‖/‖η − z̄‖) φ1k(z̄)
≥ (‖z̄ − q‖/‖η − z̄‖) (−ρ1τ/(2ρ2)) − (‖q − η‖/‖η − z̄‖) (ρ1τ/(2ρ2))
≥ 2 (‖z̄ − q‖/‖η − z̄‖) (−ρ1τ/(2ρ2)) ≥ −τ.



Since z ∉ H, it is obvious that w ∉ H. Hence, for a sufficiently small neighborhood O(w) ⊂ Sλ(z), the relation

cl(O(w)) ∩ H = ∅ (A3.4.76)

holds and

φ1k(v) ≥ −τ, ∀ v ∈ O(w), ∀ k ≥ k3.

Increasing k3 (if necessary), due to (A3.4.70), we convince ourselves that

φ1k(z̄) ≤ τ ∀ k ≥ k3.

The relations (A3.4.76), (A3.4.71), (A3.4.72) lead to

J(v) + φ2k(v) > J(z̄) + φ2k(z̄) + θ − 2τ, ∀ v ∈ O(w), ∀ k ≥ k4 ≥ k3.

Indeed, for γk := inf_{u∈O(w)} φ2k(u), we obtain from (A3.4.76) and (A3.4.72)

lim_{k→∞} γk = +∞,

and (A3.4.71) implies that lim_{k→∞} φ2k(z̄) = 0.

Consequently, inequality (A3.4.75) is satisfied for all v ∈ O(w) if k is chosen large enough.

(c) z ∈ G ∩ H: Due to the choice of z and (A3.4.73) the inequality

J(z) > J∗ + θ

holds, and there exists a number k5 such that for k ≥ k5

φ1k(z) ≥ −τ, φ1k(u) ≤ τ, ∀ u ∈ S.

Thus, we get

J(z) + φ1k(z) > J(u) + φ1k(u) + θ − 3τ, ∀ k ≥ k5, ∀ u ∈ S,

and the same inequality (A3.4.75) as in case (a) can be obtained. Therefore, inequality (A3.4.75) is true in any case.

Furthermore, varying the point z, we obtain a cover of the compact set ∂K̄δ by means of the neighborhoods O(w) determined above. Choosing a finite subcover, we can state that, starting with some iteration k′, inequality (A3.4.75) holds for all v ∈ ∂K̄δ. Since θ − 4τ ≥ 0, for all k ≥ k′ the functions Fk attain their minima on Rn only in points belonging to K̄δ.

Moreover, if θ and τ satisfy the relation θ − 4τ > Σ_{k≥k′} εk, then

uk ∈ K̄δ ∀ k ≥ k′.

But with regard to the arbitrarily chosen δ > 0, the cluster points of the sequence {uk} must belong to K̄ ⊂ K. Now, varying θ and τ such that θ − 4τ > 0, θ ↓ 0 (we emphasize that k′ depends on τ and θ, and k′ may tend to +∞ if τ ↓ 0), relation (A3.4.73) provides that lim_{k→∞} ρ(uk, U∗) = 0.



Assuming Fk ∈ C1(Rn), an analogous result is true if, instead of (A3.4.69), the stopping criterion

‖∇Fk(uk)‖ ≤ εk, εk ↓ 0

is used. Following the proof of Lemma 3.1 in Grossmann and Kaplan [151], it is easily seen that under the assumptions of Theorem A3.4.41 the condition

lim_{k→∞} ‖∇Fk(uk)‖ = 0

leads to

lim_{k→∞} [Fk(uk) − inf_{u∈Rn} Fk(u)] = 0.

Now let us specify the penalty functions φ1k and φ2k. Assume the feasible set K = G ∩ H is described by

G := {u ∈ Rn : gj(u) ≤ 0, j = 1, ..., m},
H := {u ∈ Rn : hi(u) = 0, i = 1, ..., m1},

with gj convex and hi affine functions. If, in addition, there exists a Slater point ū ∈ H with gj(ū) < 0 (j = 1, ..., m), then usually

φ2k(u) := rk Σ_{i=1}^{m1} [hi(u)]²

is chosen. As for the penalty function φ1k satisfying condition (A3.4.70), the following ones are often applied in practice:

φ1k(u) := rk Σ_{j=1}^m max[0, gj(u)]^s, s ≥ 1, (A3.4.77)

φ1k(u) := −(1/rk) Σ_{j=1}^m 1/gj(u) if u ∈ int G, φ1k(u) := +∞ otherwise. (A3.4.78)

Sometimes the latter one is called a barrier function. In order to satisfy the hypotheses of Theorem A3.4.41, one has to guarantee that rk → +∞.
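A minimal numerical sketch of the scheme (A3.4.69) with the quadratic penalty (A3.4.77), s = 2, on a hypothetical one-dimensional example (all data and parameters are illustrative choices of mine):

```python
# Toy problem:  min (u-2)^2  s.t.  g(u) = u - 1 <= 0,  so u* = 1, J* = 1.
# Penalized objective: F_r(u) = (u-2)^2 + r*max(0, u-1)^2.
def minimize_penalized(r, u=0.0, steps=20000):
    """Crude gradient descent on F_r; step kept below 2/L with L = 2 + 2r."""
    t = 0.4 / (1.0 + r)
    for _ in range(steps):
        grad = 2*(u - 2) + (2*r*(u - 1) if u > 1 else 0.0)
        u -= t * grad
    return u

u = 0.0
for r in [1.0, 10.0, 100.0, 1000.0]:     # r_k -> infinity
    u = minimize_penalized(r, u)
    print(r, u, (u - 2)**2)

# The unconstrained minimizer is u_k = 1 + 1/(1+r_k) -> u* = 1, and the
# constraint violation max(0, u_k - 1) = 1/(1+r_k) -> 0.
```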

In the sequel we suppose that there are no equality constraints and describe some estimates of the rate of convergence for the penalty method with penalty functions of the type (A3.4.77) and (A3.4.78).

A3.4.42 Proposition. For the penalty method the following estimates are true:

(i) Using penalty function (A3.4.77) with s = 2:

−‖λ‖ √(‖λ‖²/rk² + 2εk/rk) ≤ J(uk) − J∗ ≤ εk, (A3.4.79)

where λ is a Lagrange multiplier vector.



(ii) Using function (A3.4.78):

J(uk) − J∗ < (2/√rk) √(m σ0 [J(ū) − J∗]) + εk, (A3.4.80)

with

rk > m σ0 / (J(ū) − J∗), σ0 := 1 / min_{1≤j≤m} |gj(ū)|,

and ū a Slater point.

On the class of problems considered, both estimates cannot be improved with respect to the order. For general approaches to a priori estimates of the rate of convergence for penalty methods we refer to Lootsma [277], Skarin [373] and Grossmann and Kaplan [151].

It should be noted that the conditioning of the unconstrained minimization problems generated by penalty methods becomes worse as the number of iterations increases. Often these methods are used as an effective tool to compute a comparatively coarse solution of an optimization problem. However, if a more precise solution is required, then special modifications are needed.

The following a posteriori estimates of the rate of convergence provide a convenient stopping criterion for the penalty methods considered above, see [151].

A3.4.43 Proposition. Suppose that J ∈ C1(Rn), gj ∈ C1(Rn) (j = 1, ..., m), φk := ϕk(g(·)), with ϕk ∈ C1(Rm). Furthermore, assume that the hypotheses of Theorem A3.4.41 and the Slater condition are fulfilled. The functions

ηjk(·) := ∂ϕk(g(·))/∂gj

are assumed to be non-negative on Rn and, for each j and {zk} ⊂ Rn, the sequence {ηjk(zk)} converges to zero if

lim inf_{k→∞} gj(zk) > −∞, lim sup_{k→∞} gj(zk) < 0.

Then, for the iterates uk, generated according to

‖∇Fk(uk)‖ ≤ εk, εk ↓ 0,

the following holds true:

(i) setting λkj := ηjk(uk) and λk := (λk1, ..., λkm)T, the sequence {(uk, λk)} is bounded and each cluster point is a saddle point of the corresponding Lagrangian function;

(ii) the estimate

J∗ ≥ J(uk) + 〈λk, g(uk)〉 − εk ρ(uk, U∗) (A3.4.81)

is true.



If uk ∈ K, then in view of (A3.4.81) we get

J(uk) ≥ J∗ ≥ J(uk) + 〈λk, g(uk)〉 − εk ρ(uk, U∗), (A3.4.82)

and

〈λk, g(uk)〉 − εk ρ(uk, U∗) → 0, k → ∞.

Instead of the unknown value ρ(uk, U∗) an upper estimate can be applied. If uk ∉ K, then by means of a linear interpolation of the function

ḡ(·) := max_{1≤j≤m} gj(·)

on the interval [ū, uk], with ū a Slater point, a point zk ∈ K can easily be defined such that

J(zk) − J(uk) → 0,

and estimate (A3.4.82) can be replaced by

J(zk) ≥ J∗ ≥ J(uk) + 〈λk, g(uk)〉 − εk ρ(uk, U∗). (A3.4.83)

A3.4.44 Remark. Problem (A3.4.56) is equivalent to the unconstrained minimization of the non-differentiable function

F(u) := J(u) + r Σ_{j=1}^m max[0, gj(u)]

if r > max_{1≤j≤m} |λj|, with λ any Lagrange multiplier vector of the problem. Hence, using penalty function (A3.4.77) with s = 1, it is not necessary to increase the penalty parameter r. However, in that case special methods of non-differentiable minimization must be applied, see Remark A3.2.24. ♦
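The exactness stated in the remark can be checked numerically on a hypothetical example (toy data of my own; there λ∗ = 2, so any r > 2 works):

```python
import numpy as np

# Toy problem: J(u) = (u1-2)^2 + (u2-2)^2, g(u) = u1 + u2 - 2 <= 0, whose
# multiplier is lambda* = 2.  For r > 2 the exact penalty
# F(u) = J(u) + r*max(0, g(u)) attains its unconstrained minimum at the
# constrained solution u* = (1,1) with value J* = 2; no r -> infinity needed.
J = lambda u: (u[0] - 2)**2 + (u[1] - 2)**2
F = lambda u, r: J(u) + r * max(0.0, u[0] + u[1] - 2)

r = 3.0
grid = np.linspace(-1, 4, 251)
best = min((F((x, y), r), (x, y)) for x in grid for y in grid)
print(best)      # minimum approx 2.0, attained near (1.0, 1.0)
```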

A3.4.4 Dual methods

The statement of the Kuhn-Tucker Theorem A1.7.51 about the equivalence of convex optimization problems and corresponding saddle point problems is the foundation for a number of methods in which a saddle point of a convex-concave function is sought under simple constraints: only non-negativity of the dual variables is required.

Let us consider Problem (A1.7.35), given for V ≡ Rn, and the corresponding saddle point problem (A1.7.36), (A1.7.37).

One of the first multiplier methods for solving this kind of problems is the Arrow-Hurwicz-Uzawa method [15]:

uk+1 := uk − α∇uL(uk, λk),
λk+1 := [λk + α∇λL(uk, λk)]+, (A3.4.84)

with α > 0 suitably chosen ([y]+ := max[0, y]).

Hence, the pair of iterates (uk+1, λk+1) can be calculated by means of a gradient step with respect to the component u and a gradient projection step with respect to the component λ.
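A sketch of iteration (A3.4.84) on a hypothetical toy problem (data, step size and iteration count are illustrative choices of mine):

```python
import numpy as np

# Toy problem:  min (u1-2)^2 + (u2-2)^2  s.t.  g(u) = u1 + u2 - 2 <= 0,
# with saddle point u* = (1,1), lambda* = 2 of L(u,lam) = J(u) + lam*g(u).
J_grad = lambda u: np.array([2*(u[0] - 2), 2*(u[1] - 2)])
g      = lambda u: u[0] + u[1] - 2
dg     = np.array([1.0, 1.0])          # grad g (constant)

alpha = 0.02
u, lam = np.zeros(2), 0.0
for k in range(20000):
    u_new = u - alpha * (J_grad(u) + lam * dg)   # gradient step in u
    lam   = max(0.0, lam + alpha * g(u))         # gradient projection step in lam
    u = u_new
print(u, lam)    # approximately (1, 1) and 2
```

Since J here is strongly convex, the iteration contracts towards the saddle point for small α; without such assumptions the plain iteration may fail, which is one reason for the modifications discussed in the sequel.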



In the same paper another multiplier method is described which requires the full minimization of the Lagrangian function L(u, λk) instead of one gradient step in each iteration. In other words, a gradient projection method for solving the dual problem (cf. (A1.7.38))

L̄(λ) := inf_{u∈Rn} L(u, λ) → max_{λ≥0}

has to be considered.

The analysis of methods based on the implementation of Lagrangian functions shows slow convergence and a comparatively limited field of applications. Their convergence is ensured under sufficiently strict requirements on the initial problem, mainly caused by the instability of the set of saddle points of the Lagrangian function with respect to block relaxation as well as by the insufficient smoothness of the corresponding function L̄.

Indeed, without additional assumptions, for instance strict convexity of J, we have in general

Arg min_{u∈Rn} L(u, λ∗) ≠ U∗ (λ∗ a Lagrange multiplier vector),

and L̄ is not continuous and not differentiable on dom(−L̄), even if the functions involved are analytic.

More efficient multiplier methods can be constructed by means of modifications of the Lagrangian function that overcome the defects mentioned above. Often the augmented Lagrangian function LA : Rn × Rm → R,

LA(u, λ) := J(u) + (1/(2r)) (‖[λ + rg(u)]+‖² − ‖λ‖²), (A3.4.85)

is suggested, with r > 0 arbitrary but fixed. The set of saddle points of LA on Rn × Rm coincides with the set U∗ × Λ∗ of saddle points of the original Lagrangian function L on Rn × Rm+.

Moreover, the function LA has the following desirable properties:

(i) LA is convex with respect to u and concave with respect to λ;

(ii) Arg min_{u∈Rn} LA(u, λ∗) = U∗;

(iii) LM(λ) := inf_{u∈Rn} LA(u, λ) is concave and differentiable on Rm, and its gradient fulfills a Lipschitz condition with constant L = 1/r.

If we replace the original Lagrangian function by the modified one, then a similar multiplier method can be described:

uk+1 := uk − α∇uLA(uk, λk),
λk+1 := λk + α∇λLA(uk, λk). (A3.4.86)

If α := r, then λk+1 := [λk + rg(uk)]+ can be chosen.
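A sketch of the resulting multiplier method on a hypothetical toy problem (all data, step sizes and iteration counts are illustrative choices of mine; the inner minimization of LA(·, λk) is done approximately by gradient descent, and the multiplier update is λk+1 := [λk + r g(uk)]+):

```python
import numpy as np

# Toy problem:  min (u1-2)^2 + (u2-2)^2  s.t.  u1 + u2 - 2 <= 0,
# with solution u* = (1,1) and multiplier lambda* = 2.
J_grad = lambda u: np.array([2*(u[0] - 2), 2*(u[1] - 2)])
g      = lambda u: np.array([u[0] + u[1] - 2])
G      = np.array([[1.0, 1.0]])              # grad g (constant here)

r, lam, u = 1.0, np.array([0.0]), np.zeros(2)
for k in range(30):
    # inner loop: approximately minimize L_A(., lam);
    # grad_u L_A(u, lam) = grad J(u) + G^T [lam + r g(u)]_+
    for _ in range(2000):
        grad = J_grad(u) + G.T @ np.maximum(0.0, lam + r * g(u))
        u = u - 0.1 * grad
    lam = np.maximum(0.0, lam + r * g(u))    # multiplier update
print(u, lam)    # approximately (1, 1) and [2.]
```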

A3.4.45 Proposition. Let the assumptions for Problem (A3.4.56) be fulfilled. Assume further that the gradients of the functions J, gj satisfy a Lipschitz condition with a common Lipschitz constant L(δ) on the ball Bδ := {u ∈ Rn : ‖u‖ ≤ δ} for any δ > 0. Then, for

δ² ≥ ‖u0‖² + ‖λ0‖²,



there exists a constant ᾱ(δ) such that the sequence {(uk, λk)}, generated by Method (A3.4.86) with α ∈ (0, ᾱ(δ)], converges to some saddle point (u∗, λ∗) ∈ U∗ × Λ∗.

The constant ᾱ(δ) can be efficiently calculated if L(δ) is known (cf. Gol'stein and Tretyakov [141]).

If, for some solution u∗ of Problem (A3.4.56), the vectors {∇gj(u∗)}j∈I0(u∗), I0(u∗) := {i : gi(u∗) = 0}, are linearly independent, then the Lagrange multiplier vector λ∗ is unique. This is a conclusion of the Kuhn-Tucker theorem.

A3.4.46 Proposition. Suppose in addition to the assumptions of Proposition A3.4.45 that

(i) J and gj are twice differentiable on Rn and their second derivatives satisfy a Lipschitz condition;

(ii) J is strongly convex;

(iii) the vectors {∇gj(u∗)}j∈I0(u∗) are linearly independent and

λ∗j > 0 ∀ j ∈ I0(u∗).

Then, starting with an arbitrary point (u0, λ0), a constant ᾱ > 0 exists such that the iterates (uk, λk), generated by Method (A3.4.86) with α ∈ (0, ᾱ), converge linearly to (u∗, λ∗) ∈ U∗ × Λ∗.

Due to property (iii) of the augmented Lagrangian function LA, it is possible to use a gradient method to maximize LM. In this situation we have to calculate arg min_{u∈Rn} LA(u, λk) for each λk. Let us briefly describe this method under the assumption that arg min_{u∈Rn} LA(u, λk) is calculated approximately:

uk : LA(uk, λk) ≤ min_{u∈Rn} LA(u, λk) + min[δk, β ‖∇λLA(uk, λk)‖²],
λk+1 := λk + α∇λLA(uk, λk), (A3.4.87)

with α ∈ (0, 2r), β ≥ 0, δk ≥ 0 fixed and Σ_{k=1}^∞ δk < ∞.

A3.4.47 Proposition. Let the hypotheses of Proposition A3.4.46 be fulfilled. Then, for arbitrary λ0 ≥ 0 and sufficiently small β, the sequence {(uk, λk)}, generated by Method (A3.4.87), converges linearly to (u∗, λ∗) ∈ U∗ × Λ∗. Choosing α := r, the factor of the geometric progression is q = o(1/r).

Hence, the factor q can be made as small as desired by means of the chosen parameter r in function (A3.4.85). But one should be aware that an increase of r makes the computation of uk more difficult, because the condition number of the Hessian of LA(·, λ) becomes worse.
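The growth of the condition number with r is easy to see on a hypothetical example: for a quadratic objective with Hessian 2I and one affine constraint with gradient row G = (1, 1), the Hessian of LA(·, λ) on the region where the constraint is active is 2I + r GᵀG, with condition number 1 + r.

```python
import numpy as np

# Condition number of Hess_u L_A = 2*I + r*G^T*G for G = (1, 1):
# eigenvalues are 2 and 2 + 2r, so cond = 1 + r grows linearly with r.
G = np.array([[1.0, 1.0]])
for r in [1.0, 10.0, 100.0]:
    H = 2 * np.eye(2) + r * G.T @ G
    print(r, np.linalg.cond(H))      # approx 2, 11, 101
```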


Bibliography

[1] Abbe, L. A logarithmic barrier approach and its regularization applied to convex semi-infinite programming problems. PhD thesis, University of Trier, 2001.

[2] Abbe, L. Two logarithmic barrier methods for convex semi-infinite problems. In Proceedings of the Conference on Semi-infinite Programming, Alicante (2001), M. Goberna and M. Lopez, Eds., Kluwer Acad. Publ., pp. 169–195.

[3] Adams, R. Sobolev Spaces. Academic Press, New York, 1973.

[4] Al-Baali, M. Descent properties and global convergence of the Fletcher-Reeves method with inexact line search. IMA J. Numer. Anal. 5 (1985), 121–124.

[5] Alart, P., and Lemaire, B. Penalization in non-classical convex programming via variational convergence. Math. Programming 51 (1991), 307–331.

[6] Alber, Y., Burachik, R., and Iusem, A. A proximal method for non-smooth convex optimization problems in Banach spaces. Abstr. Appl. Anal. 2 (1-2) (1997), 97–120.

[7] Alifanov, O., Artyukhin, E., and Rumyanzev, S. Extremal Methods for the Solution of Incorrect Problems. Nauka, Moscow, 1988 (in Russian).

[8] Allgower, E., and Georg, K. Simplicial and continuation methods for approximating fixed points and solutions to systems of equations. SIAM Review 22 (1980), 28–85.

[9] Alt, W., and Mackenroth, U. Convergence of finite element approximations to state constrained convex parabolic boundary control problems. SIAM J. Contr. Opt. 27 (1989), 718–736.

[10] Anderson, E., and Philpott, A. Infinite Programming. Lecture Notes in Economics and Mathematical Systems, vol. 259, Springer-Verlag, Berlin - Heidelberg - New York - Toronto, 1985.

[11] Anger, G. Inverse Problems in Differential Equations. Akademie-Verlag, Berlin, 1990.

[12] Antipin, A. Regularization methods for convex programming problems. Ekonomika i Mat. Metody 11 (1975), 336–342 (in Russian).




[13] Antipin, A. On a method for convex programs using a symmetrical modification of the Lagrangian function. Ekonomika i Mat. Metody 12 (1976), 1164–1173 (in Russian).

[14] Antipin, A. Nonlinear programming methods using primal and dual modifications of the Lagrangian function. Preprint, VNIISI, Moscow, 1979 (in Russian).

[15] Arrow, K., Hurwicz, L., and Uzawa, H., Eds. Studies in Linear and Nonlinear Programming. Stanford University Press, Stanford, 1958.

[16] Attouch, H. Variational Convergence for Functions and Operators. Applicable Mathematics Series. Pitman, London, 1984.

[17] Attouch, H., Baillon, J. B., and Thera, M. Sum of maximal monotone operators revisited: The variational sum. J. of Convex Analysis 1 (1994), 1–29.

[18] Attouch, H., and Wets, R. Isometries for the Legendre-Fenchel transformation. Transactions of the Amer. Math. Soc. 296 (1984), 33–60.

[19] Aubin, J. P. Approximation of Elliptic Boundary Value Problems. Wiley - Interscience, New York - London - Sydney - Toronto, 1972.

[20] Aubin, J. P., and Ekeland, I. Applied Nonlinear Analysis. J. Wiley & Sons, Chichester, 1984.

[21] Auslender, A., Crouzeix, J., and Fedit, P. Penalty proximal methods in convex programming. J. Opt. Theory Appl. 55 (1987), 1–21.

[22] Auslender, A., and Haddou, M. An interior-proximal method for convex linearly constrained problems and its extension to variational inequalities. Math. Programming 71 (1995), 77–100.

[23] Auslender, A., and Teboulle, M. Entropic proximal decomposition methods for convex programs and variational inequalities. Math. Programming Ser. A 91 (2001), 33–47.

[24] Auslender, A., and Teboulle, M. Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer Monographs in Mathematics, Springer-Verlag, New York, 2003.

[25] Auslender, A., Teboulle, M., and Ben-Tiba, S. Interior proximal and multiplier methods based on second order homogeneous kernels. Mathematics of Oper. Res. 24 (1999), 645–668.

[26] Auslender, A., Teboulle, M., and Ben-Tiba, S. A logarithmic-quadratic proximal method for variational inequalities. Computational Optimization and Applications 12 (1999), 31–40.

[27] Babuska, I., and Gatica, G. N. On the mixed finite element method with Lagrange multipliers. Numer. Methods Partial Differ. Equations 19, 2 (2003), 192–210.



[28] Baiocchi, C., and Capelo, A. Variational and Quasi-Variational Inequalities: Applications to Free Boundary Problems. J. Wiley & Sons, Chichester, 1984.

[29] Bakushinski, A. A general approach for the construction of regularized algorithms for solving linear, incorrect equations in a Hilbert space. J. Vycisl. Mat. i Mat. Fiz. 7 (1967), 672–677 (in Russian).

[30] Bakushinski, A., and Goncharsky, A. Ill-posed Problems – Numerical Methods and Applications. Moscow University Press, Moscow, 1989 (in Russian).

[31] Bakushinski, A., and Polyak, B. About the solution of variational inequalities. Soviet Math. Doklady 15 (1974), 1705–1710.

[32] Balakrishnan, A. A new computing technique in optimal control. SIAM J. Control 6 (1968), 149–173.

[33] Basermann, A., Reichel, B., and Scheltoff, C. Preconditioned CG-methods for sparse matrices on massively parallel machines. Parallel Comput. 23 (1997), 381–398.

[34] Baumeister, J. Stable Solution of Inverse Problems. Vieweg Verlag, Wiesbaden - Braunschweig, 1987.

[35] Bauschke, H., and Borwein, J. Legendre functions and the method of random Bregman projections. J. of Convex Analysis 4 (1997), 27–67.

[36] Bazaraa, M., Sherali, H., and Shetty, C. Nonlinear Programming. Theory and Algorithms. J. Wiley & Sons, New York, 1993.

[37] Becker, R., and Vexler, B. Optimal control of the convection-diffusion equation using stabilized finite element methods. Numer. Math. (2007).

[38] Ben-Tal, A., Ben-Israel, A., and Rosinger, E. A Helly-type theorem and semi-infinite programming. In Constructive Approaches to Mathematical Models (1979), C. V. Coffman and G. J. Fix, Eds., Academic Press, New York, pp. 127–135.

[39] Ben-Tal, A., Teboulle, M., and Zowe, J. Second order necessary optimality conditions for semi-infinite programming. In Semi-infinite Programming (1979), R. Hettich, Ed., Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 127–135.

[40] Ben-Tal, A., and Zibulevski, M. Penalty-barrier methods for convex programming problems. SIAM J. Optim. 7 (1997), 347–366.

[41] Bergounioux, M. Etude de differents problemes de controle avec contraintes sur l'etat. Rapport de Recherche 89-1 (1989).

[42] Bergounioux, M. A penalization method for optimal control of elliptic problems with state constraints. SIAM J. Control Optim. 30 (1992), 305–323.



[43] Bergounioux, M. Augmented Lagrangian method for distributed optimal control problems with state constraints. J. Opt. Theory Appl. 78 (1993), 493–521.

[44] Bertsekas, D. P. Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics, Acad. Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1982.

[45] Blaheta, R. GPCG - generalized preconditioned CG method and its use with nonlinear and non-symmetric displacement decomposition preconditioners. Numer. Linear Algebra Appl. 9 (2002), 527–550.

[46] Blum, E., and Oettli, W. Mathematische Optimierung. Grundlagen und Verfahren. Springer-Verlag, Berlin, 1975.

[47] Bonnans, J., and Launay, G. Sequential quadratic programming with penalization of the displacement. SIAM Journal on Optim. 5 (1995), 792–812.

[48] Bregman, L. M. A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Z. Vycisl. Mat. i Mat. Fiz. 7 (1967), 620–631 (in Russian).

[49] Brezinski, C. A classification of quasi-Newton methods. Numer. Algorithms 33 (2003), 123–135.

[50] Brezis, H. Equations et inequations non lineaires dans les espaces vectoriels en dualite. Ann. Inst. Fourier 18 (1968), 115–175.

[51] Brezis, H. Problemes unilateraux. J. Math. Pures Appl. 72 (1971), 1–168.

[52] Brezis, H., and Lions, P. L. Produits infinis de resolvantes. Israel J. of Math. 29 (1978), 329–345.

[53] Brezis, H., and Stampacchia, G. Sur la regularite de la solution d'inequations elliptiques. Bull. Soc. Math. France 96 (1968), 153–180.

[54] Brezzi, F., and Fortin, M. Mixed and Hybrid Finite Element Methods. Springer-Verlag, Berlin - Heidelberg - New York, 1991.

[55] Brosowski, B. An optimality condition and its application to parametric semi-infinite optimization. Operations Research in Progress, Theory Decis. Libr. 32 (1982), 3–16.

[56] Brosowski, B. Parametric semi-infinite Programming. Methoden und Verfahren der Math. Physik. Verlag P. Lang, Frankfurt, 1982.

[57] Browder, F. E. Nonlinear maximal monotone operators in Banach spaces. Math. Annalen 175 (1968), 89–113.

[58] Broyden, C., and Vespucci, M. The CG-methods - a brief review. In Recent Trends in Numerical Analysis (2001), D. Trigiante, Ed., vol. 3 of Adv. Theory Comput. Math., Nova Science Publishers, Huntington, NY, pp. 71–80.


BIBLIOGRAPHY 523

[59] Burachik, R., and Iusem, A. A generalized proximal point algorithm for the variational inequality problem in Hilbert space. SIAM J. Optim. 8 (1998), 197–216.

[60] Burachik, R., and Iusem, A. Set-valued Mappings and Enlargements of Monotone Operators. Springer, New York, 2008.

[61] Burachik, R., Iusem, A., and Svaiter, B. Enlargements of maximal monotone operators with application to variational inequalities. Set-Valued Analysis 5 (1997), 159–180.

[62] Burachik, R., Sagastizabal, C., and Svaiter, B. Bundle methods for maximal monotone operators. In Ill-posed Variational Problems and Regularization Techniques (1999), M. Thera and R. Tichatschke, Eds., vol. 477 of LNEMS, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 49–64.

[63] Burachik, R., and Svaiter, B. ε-enlargement of maximal monotone operators in Banach spaces. Set-Valued Analysis 7 (1999), 117–132.

[64] Burachik, R., and Svaiter, B. A relative error tolerance for a family of generalized proximal point methods. Math. of Oper. Res. 26 (2001), 816–831.

[65] Butnariu, D., and Iusem, A. On a proximal point method for convex optimization in Banach spaces. Numer. Funct. Anal. and Optimiz. 18 (7-8) (1997), 723–744.

[66] Butnariu, D., and Iusem, A. Totally Convex Functions for Fixed Points Computation and Infinite Dimensional Optimization. Kluwer, Dordrecht - Boston - London, 2000.

[67] Cafieri, S., D'Apuzzo, M., Marino, M., Mucherino, A., and Toraldo, G. Interior-point solver for large-scale quadratic programming problems with bound constraints. J. Optim. Theory Appl. 129 (2006), 55–75.

[68] Casas, E. Control of elliptic problems with pointwise state constraints. SIAM J. Control Optim. 6 (1986), 1309–1322.

[69] Cea, J. Optimization. Dunod, Paris, 1971.

[70] Censor, Y., Iusem, A., and Zenios, S. A. An interior point method with Bregman functions for the variational inequality problem with paramonotone operators. Math. Programming 81 (1998), 373–400.

[71] Censor, Y., and Zenios, S. A. Proximal minimization algorithm with d-functions. J. Optim. Theory Appl. 73 (1992), 451–464.

[72] Charnes, A., Cooper, W., and Kortanek, K. On representations of semi-infinite programs which have no duality gaps. Management Science 9 (1965), 113–121.

[73] Charnes, A., Cooper, W., and Kortanek, K. On the theory of semi-infinite programs and some generalizations of Kuhn-Tucker saddle point theorems for arbitrary convex functions. Nav. Res. Log. Quart. 16 (1969), 41–51.

[74] Chen, G., and Teboulle, M. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3 (1993), 538–543.

[75] Ciarlet, P. G. The Finite Element Method for Elliptic Problems. North-Holland, Amsterdam, 1978.

[76] Ciarlet, P. G. Introduction to Numerical Linear Algebra and Optimization. Cambridge Univ. Press, Cambridge - New York - Melbourne - Sydney, 1989.

[77] Clarke, F. H. Optimization and Nonsmooth Analysis. Wiley, New York, 1983.

[78] Cohen, G. Auxiliary problem principle and decomposition of optimization problems. JOTA 32 (1980), 277–305.

[79] Cohen, G. Auxiliary problem principle extended to variational inequalities. JOTA 59 (1988), 325–333.

[80] Concus, P., Golub, G., and O'Leary, D. A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations. In Sparse Matrix Computation (1976), J. Bunch and D. Rose, Eds., Academic Press, New York, pp. 309–332.

[81] Conn, A. R., and Gould, N. I. M. An exact penalty function for semi-infinite programming. Math. Programming 37 (1987), 19–40.

[82] Conn, A. R., Gould, N. I. M., and Toint, P. Trust-region Methods. MPS-SIAM Series on Optimization 1, SIAM, Philadelphia, 2000.

[83] Coope, I. D., and Watson, G. A. A projected Lagrangian algorithm for semi-infinite programming. Math. Programming 32 (1985), 337–356.

[84] Correa, R., and Lemarechal, C. Convergence of some algorithms for convex minimization. Math. Programming 62 (1993), 261–275.

[85] Crouzeix, J.-P. Pseudo-monotone variational inequality problems: Existence of solutions. Math. Programming 78 (1997), 305–314.

[86] Daniel, J. The Approximate Minimization of Functionals. Prentice-Hall, Englewood Cliffs, 1971.

[87] de Veubeke, B. F., and Hogge, M. Dual analysis of heat conduction problems by finite elements. Intern. J. Num. Meth. Eng. 5 (1972), 65–82.

[88] Demyanov, V. F., and Malozemov, W. Introduction to Minimax. Nauka, Moscow, 1972 (in Russian).

[89] Demyanov, V. F., and Rubinov, A. M. Foundations of Nonsmooth Analysis and Quasidifferential Calculus. Nauka, Moscow, 1990 (in Russian).

[90] Dennis, J. E., and Schnabel, R. B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, 1983.

[91] Dixon, L., and Szego, G. Towards Global Optimization. North-Holland, Amsterdam, 1975.

[92] Dontchev, A. L., and Zolezzi, T. Well-posed Optimization Problems, Lecture Notes in Math., vol. 1543. Springer-Verlag, Berlin, 1993.

[93] Douglas, J. Mathematical programming and integral equations. In Sympos. Numer. Treatm. ODE, Integral and Integro-Diff. Equations (1960), Berlin - Stuttgart, pp. 269–274.

[94] Dunn, J. Convexity, monotonicity and gradient processes in Hilbert space. J. Math. Anal. Appl. 53, 1 (1976), 145–158.

[95] Duvaut, G., and Lions, J.-L. Les Inequations en Mecanique et en Physique. Dunod, Paris, 1972.

[96] Eckstein, J. Nonlinear proximal point algorithms using Bregman functions, with application to convex programming. Math. Oper. Res. 18 (1993), 202–226.

[97] Eckstein, J. Approximate iterations in Bregman-function-based proximal algorithms. Math. Programming 83 (1998), 113–123.

[98] Eckstein, J., and Bertsekas, D. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Programming 55 (1992), 293–318.

[99] Edwards, R. Functional Analysis - Theory and Applications. Holt, Rinehart and Winston, New York, 1965.

[100] Ekeland, I., and Temam, R. Convex Analysis and Variational Problems. North-Holland, Amsterdam, 1976.

[101] El-Farouq, N., and Cohen, G. Progressive regularization of variational inequalities and decomposition algorithms. JOTA 97 (1998), 407–433.

[102] Elman, H., Silvester, D., and Wathen, A. Finite Elements and Fast Iterative Solvers. Oxford University Press, N. Y., 2005.

[103] Emelin, I., and Krasnoselski, M. A. On the theory of ill-posed problems. Soviet Math. Doklady 20 (1979), 105–109.

[104] Engl, H., Kunisch, K., and Neubauer, A. Convergence rates for Tikhonov regularization of non-linear ill-posed problems. Inverse Problems 5 (1989), 523–540.

[105] Eremin, I., and Astafjev, N. Introduction to Linear and Convex Programming. Nauka, Moscow, 1976 (in Russian).

[106] Fabes, E., Luskin, M., and Sell, G. Construction of inertial manifolds by elliptic regularization. J. of Diff. Equations 89 (1991), 355–387.

[107] Facchinei, F., and Pang, J.-S. Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, New York, Berlin, Heidelberg, Hong Kong, London, Milan, Paris, Tokyo, 2003.

[108] Falk, R. Approximation of a class of optimal control problems with order of convergence estimates. J. Math. Anal. and Appl. 44 (1973), 28–47.

[109] Fasano, G. Planar-CG methods and matrix tridiagonalization in large scale unconstrained optimization. In High Performance Algorithms and Software for Nonlinear Optimization. Selected lectures presented at the workshop, Erice, Italy, June 2001 (Kluwer Acad. Publishers, Boston, MA, Appl. Optim., 2003), G. D. Pillo, Ed., vol. 82, pp. 243–263.

[110] Ferris, M., and Philpott, A. An interior point algorithm for semi-infinite linear programming. Math. Programming 43 (1989), 257–276.

[111] Ferris, M., and Philpott, A. On affine scaling and semi-infinite programming. Math. Programming 56 (1992), 361–364.

[112] Fiacco, A., and McCormick, G. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. J. Wiley, New York, 1968.

[113] Fichera, G. Problemi elastostatici con vincoli unilaterali: il problema di Signorini con ambigue condizioni al contorno. Mem. Accad. Naz. Lincei 7 (1964), 91–140.

[114] Fichera, G. Boundary Value Problems of Elasticity with Unilateral Constraints. Springer-Verlag, Berlin, 1972.

[115] Fletcher, R. A general quadratic algorithm. Journal of the Institute of Mathematics and its Applications 7 (1971), 76–91.

[116] Fletcher, R. Conjugate direction methods. In Numerical Methods for Unconstrained Optimization (Acad. Press, London, 1972), W. Murray, Ed.

[117] Fletcher, R. Practical Methods of Optimization. J. Wiley & Sons, Chichester - New York - Brisbane - Toronto - Singapore, 1990.

[118] Fortin, M., and Glowinski, R. Methodes de Lagrangien augmente. Collection Methodes Mathematiques pour l'Informatique, Dunod, Paris, 1982.

[119] Fremond, M. Dual formulations for potential and complementary energies: Unilateral boundary conditions. In The Mathematics of Finite Elements and Applications (1973), J. Whiteman, Ed., Acad. Press, London, pp. 175–188.

[120] Friedman, A. Variational Principles and Free-Boundary Problems. J. Wiley & Sons, Chichester - New York - Brisbane - Toronto - Singapore, 1982.

[121] Friedrich, V., Hofmann, B., and Tautenhahn, U. Moglichkeiten der Regularisierung bei der Auswertung von Messdaten, vol. 10. Wissenschaftliche Schriftenreihe der TH Karl-Marx-Stadt, 1979.

[122] Frommer, A., and Maass, P. Fast CG-based methods for Tikhonov-Phillips regularization. SIAM J. Sci. Comput. 20 (1999).

[123] Fukushima, M., and Mine, H. A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Systems Sci. 12 (1981), 989–1000.

[124] Fuller, R. Relations among continuous and various non-continuous functions. Pacific J. Math. 25 (1968), 495–509.

[125] Gabay, D. Applications of the method of multipliers to variational inequalities. In Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems (North-Holland, Amsterdam, 1983), M. Fortin and R. Glowinski, Eds., pp. 299–331.

[126] Geiger, C., and Kanzow, C. Theorie und Numerik restringierter Optimierungsaufgaben. Springer-Verlag, Berlin - Heidelberg - New York, 2002.

[127] Gfrerer, H., Guddat, J., Wacker, H., and Zulehner, W. Path-following methods for Kuhn-Tucker curves by an active set strategy. In Systems and Optimization (1985), A. Bagchi and H. T. Jongen, Eds., vol. 66 of Lecture Notes in Control and Information Science, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 111–131.

[128] Gill, P., Murray, W., and Wright, M. Practical Optimization. Academic Press, London, 1981.

[129] Gill, P. E., and Leonard, M. Reduced-Hessian quasi-Newton methods for unconstrained optimization. SIAM J. Optim. 12 (2001), 209–237.

[130] Giusti, E. Minimal Surfaces and Functions of Bounded Variation. Monographs in Mathem., v. 80, Birkhauser, 1984.

[131] Gladwell, G. Contact Problems in the Classical Theory of Elasticity. Sijthoff and Noordhoff, Alphen aan den Rijn, 1980.

[132] Glashoff, K., and Gustafson, S. A. Linear Optimization and Approximation. Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, 1983.

[133] Glashoff, K., and Sachs, E. On theoretical and numerical aspects of the bang-bang principle. Numer. Math. 29 (1977), 93–114.

[134] Glowinski, R. Numerical Methods for Nonlinear Variational Problems. Springer-Verlag, New York - Berlin - Heidelberg - Tokyo, 1984.

[135] Glowinski, R., Lions, J. L., and Tremolieres, R. Numerical Analysis of Variational Inequalities. North-Holland, Amsterdam, 1981.

[136] Glowinski, R., Rieder, A., Wells, R., and Zhou, X. A preconditioned CG-method for wavelet-Galerkin discretization of elliptic problems. Z. Angew. Math. Mech. 75 (1995), 683–684.

[137] Goberna, M. A., and Lopez, M. A. Linear Semi-infinite Optimization. J. Wiley & Sons, Chichester, 1998.

[138] Gollmer, R., Guddat, J., Guerra, F., Nowack, D., and Ruckmann, J. Path-following methods in nonlinear optimization I: Penalty embedding. In Ser. Approximation and Optimization (P. Lang Publ. House, Frankfurt, 1992), B. Brosowski and J. Guddat, Eds., vol. 3, pp. 163–214.

[139] Gol'shtein, E. G., and Tretyakov, N. V. Gradient method and algorithms of convex programs connected with modified Lagrangian functions. Ekonomika i Mat. Metody 11 (1975), 730–742 (in Russian).

[140] Gol'shtein, E. G., and Tretyakov, N. V. Modified Lagrangians in convex programming and their generalizations. Math. Program. Stud. 10 (1979), 86–97.

[141] Gol'shtein, E. G., and Tretyakov, N. V. Modified Lagrangian Functions. J. Wiley & Sons, New York, 1995.

[142] Gonsharski, A., Leonov, A., and Jagola, A. On a regularized algorithm for ill-posed problems with approximately given operators. J. Vycisl. Mat. i Mat. Fiz. 12 (1972), 1592–1594 (in Russian).

[143] Gonzaga, C., Polak, E., and Trahan, R. An improved algorithm for optimization problems with functional inequality constraints. IEEE Trans. Automat. Control 25 (1980), 49–54.

[144] Gould, N. An algorithm for large-scale quadratic programming. IMA J. Numer. Anal. 11 (1991), 299–334.

[145] Graettinger, T. J., and Krogh, B. H. The acceleration radius: A global performance measure for robotic manipulators. IEEE Journ. of Robotics and Automation 4 (1988), 60–69.

[146] Gramlich, G., Hettich, R., and Sachs, E. SQP-methods in semi-infinite programming. SIAM J. Optimization 5 (1995), 641–658.

[147] Gribik, P. R. A central cutting plane algorithm for semi-infinite programming problems. In Semi-infinite Programming (1979), R. Hettich, Ed., vol. 15 of Lecture Notes in Control and Information Science, Springer-Verlag, Berlin - Heidelberg - New York, pp. 66–82.

[148] Grisvard, P. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985.

[149] Griewank, A., and Toint, P. Numerical experiments with partially separable optimization problems. In Numerical Analysis (1984), D. Griffiths, Ed., vol. 1066 of Lecture Notes in Mathematics, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo.

[150] Groetsch, C. Comments on Morozov's discrepancy principle. In Improperly Posed Problems and their Numerical Treatment, Proc. Conf. Oberwolfach (1983), G. Hammerlin and K. Hoffmann, Eds., vol. ISNM 63, Birkhauser-Verlag, Basel - Boston - Stuttgart.

[151] Grossmann, C., and Kaplan, A. Strafmethoden und Modifizierte Lagrangefunktionen in der Nichtlinearen Optimierung. Teubner Verlag, Leipzig, 1979.

[152] Guddat, J., Vasques, F. G., and Jongen, H. T. Parametric Optimization: Singularities, Path-following and Jumps. B. G. Teubner, Stuttgart, 1990.

[153] Guler, O. On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control and Optim. 29 (1991), 403–419.

[154] Gustafson, S. A. A three-phase algorithm for semi-infinite programming problems. In Semi-infinite Programming and Applications (1983), A. V. Fiacco and K. O. Kortanek, Eds., vol. 215 of LNEMS, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 138–157.

[155] Gustafson, S. A., and Kortanek, K. O. Numerical solution of a class of semi-infinite programming problems. Nav. Res. Log. Quart. 20 (1973), 477–504.

[156] Gustafson, S. A., and Kortanek, K. O. Semi-infinite programming and applications. In Mathematical Programming: The State of the Art (1983), A. Bachem, M. Grotschel, and B. Korte, Eds., Springer-Verlag, Berlin - Heidelberg - New York, pp. 132–157.

[157] Gwinner, J. Convergence and error analysis for variational inequalities and unilateral boundary value problems. Habil. Thesis, TH Darmstadt, Preprint Nr. 1257 (1989).

[158] Gwinner, J. Finite-element convergence for contact problems in plane linear elastostatics. Quarterly of Applied Mathematics 50 (1992), 11–25.

[159] Gwinner, J. A discretization theory for monotone semicoercive problems and finite element convergence for p-harmonic Signorini problems. ZAMM 74 (1994), 417–427.

[160] Ha, C. D. A generalization of the proximal point algorithm. SIAM J. Control Optim. 28 (1990), 503–512.

[161] Hackbusch, W., and Will, T. A numerical method for a parabolic bang-bang problem. In Optimal Control of Partial Differential Equations (1984), K. H. Hoffmann and W. Krabs, Eds., vol. 68, ISNM, Birkhauser, Basel.

[162] Hadamard, J. Sur les problemes aux derivees partielles et leur signification physique. Bull. Univ. Princeton 13 (1902), 49–52.

[163] Hadamard, J. Le Probleme de Cauchy et les Equations aux Derivees Partielles Lineaires Hyperboliques. Hermann, Paris, 1932.

[164] Haemmerlin, G., and Hoffmann, K. Improperly Posed Problems and their Numerical Treatment. Proc. Conf. Oberwolfach, ISNM 63, Birkhauser-Verlag, Basel - Boston - Stuttgart, 1983.

[165] Hager, W. W., and Zhang, H. Self-adaptive inexact proximal point methods. Comput. Optim. Appl. 39 (2008), 161–181.

[166] Hahn, H. Elastizitatstheorie, Grundlagen der linearen Theorie und Anwendungen auf eindimensionale, ebene und raumliche Probleme. Teubner, Stuttgart, 1985.

[167] Han, S. A globally convergent method for nonlinear programming. J. Opt. Theory Appl. 22 (1977), 297–309.

[168] Haslinger, J. Finite element analysis for unilateral problems with obstacles on the boundary. Aplikace Matematiky 22 (1977), 180–188.

[169] Haslinger, J. Finite element analysis of the Signorini problem. Comm. Math. Univ. Carol. 20 (1979), 1–17.

[170] Haslinger, J., and Lovisek, J. Mixed variational formulation of unilateral problems. Comm. Math. Univ. Carol. 21 (1980), 231–246.

[171] Heinkenschloss, M., and Sachs, E. Numerical solution of a constrained control problem for a phase field model. Int. Ser. Numer. Math. 118 (1994), 171–187.

[172] Hestenes, M. Conjugate Direction Methods in Optimization. Springer-Verlag, Berlin - Heidelberg - New York, 1980.

[173] Hettich, R. A review of numerical methods for semi-infinite optimization. In Semi-infinite Programming and Applications (1983), A. Fiacco and K. Kortanek, Eds., vol. 215 of Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, Berlin - Heidelberg - New York, pp. 158–178.

[174] Hettich, R., and Gramlich, G. A note on an implementation of a method for quadratic semi-infinite programs. Math. Programming 46 (1990), 249–254.

[175] Hettich, R., and Jongen, H. T. Semi-infinite programming: Conditions of optimality and applications. Lecture Notes in Control and Information Sciences 7 (1978), 1–11.

[176] Hettich, R., Kaplan, A., and Tichatschke, R. Regularized penalty methods for ill-posed optimal control with elliptic equations, Parts I and II. Control and Cybernetics 26 (1997), 5–27, 29–43.

[177] Hettich, R., and Kortanek, K. O. Semi-infinite programming: Theory, methods and applications. SIAM Review 35 (1993), 380–429.

[178] Hettich, R., and Zencke, P. Numerische Methoden der Approximation und semi-infiniten Optimierung. Teubner Studienbucher Mathematik. Teubner, Stuttgart, 1982.

[179] Hiriart-Urruty, J.-B., and Lemarechal, C. Convex Analysis and Minimization Algorithms. Springer-Verlag, Berlin, 1993.

[180] Hlavacek, I. Dual finite element analysis for elliptic problems with obstacle on the boundary. Aplikace Matematiky 22 (1977), 244–255.

[181] Hlavacek, I. Convergence of dual finite element approximations for unilateral boundary value problems. Aplikace Matematiky 25 (1980), 375–386.

[182] Hlavacek, I., Haslinger, J., Necas, I., and Lovisek, J. Numerical Solution of Variational Inequalities. Springer-Verlag, Berlin - Heidelberg - New York, 1988.

[183] Hlavacek, I., and Lovisek, J. Finite element analysis of the Signorini problem in semi-coercive cases. Aplikace Matematiky 25 (1980), 273–285.

[184] Hoffmann, K.-H., and Sprekels, J. On the identification of parameters in general variational inequalities by asymptotic regularization. SIAM J. Math. Anal. 17 (1986), 1198–1217.

[185] Hofmann, B. Regularization for Applied Inverse and Ill-posed Problems. Teubner-Texte zur Mathematik, BSG B.G. Teubner, Leipzig, 1986.

[186] Hue, T., and Strodiot, J. Convergence analysis of a relaxed extragradient-proximal point algorithm applied to variational inequalities. Optimization 54 (2005), 191–213.

[187] Huebner, E. Proximal-point-like methods for monotone variational inequalities with set-valued operators. PhD thesis, University of Trier, 2007.

[188] Huebner, E., and Tichatschke, R. Relaxed proximal point algorithms for variational inequalities with multi-valued operators. Optimization Methods and Software 23 (2008), 847–877.

[189] Hut, M., and Tichatschke, R. A hybrid method for semi-infinite programming problems. In Methods of Operations Research (1989), U. Rieder, Ed., vol. 62, A. Hain, Ulm, pp. 79–90.

[190] Ibaraki, S., Fukushima, M., and Ibaraki, T. Primal-dual proximal point algorithm for linearly constrained convex programming problems. Comput. Opt. Appl. 1 (1992), 207–226.

[191] Ioffe, A. D. Second order conditions in non-linear nonsmooth problems of semi-infinite programming. In Semi-infinite Programming and Applications (1983), A. V. Fiacco and K. O. Kortanek, Eds., vol. 215 of LNEMS, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 262–280.

[192] Ioffe, A. D., and Tikhomirov, V. M. Theory of Extremal Problems. North Holland Publ. Co., Amsterdam, 1975.

[193] Iusem, A. On some properties of generalized proximal point methods for quadratic and linear programming. JOTA 85 (1995), 593–612.

[194] Iusem, A. On some properties of paramonotone operators. J. of Conv. Analysis 5 (1998), 269–278.

[195] Ivanov, V. Integral equations of the first kind and an approximate solution of the inverse potential problem. Soviet Math. Doklady 3 (1962), 210–212 (in Russian).

[196] Ivanov, V., Vasin, V., and Tanana, V. Theory of Linear Incorrect Problems and their Application. Nauka, Moscow, 1978 (in Russian).

[197] Jarre, F. The Method of Analytic Centers for Smooth Convex Programs. PhD thesis, Universitat Wurzburg, Grottenthaler Verlag, Bamberg, 1989.

[198] Jarre, F. Comparing two interior-point approaches for semi-infinite programs.

[199] John, F. Numerical solution of the heat equation for preceding time. Ann. Mat. Pura ed Appl. 40 (1955), 129–142.

[200] Jongen, H. T., Jonker, P., and Twilt, F. Critical sets in parametric optimization. Math. Programming 34 (1986), 333–353.

[201] Jongen, H. T., Wetterling, W., and Zwier, G. On sufficient conditions for local optimality in semi-infinite programming. Optimization 18 (1987), 165–178.

[202] Kantorovich, L. V., and Akilov, G. Functional Analysis. Pergamon Press, Oxford, 1982.

[203] Kaplan, A. Iterative processes for convex programming problems with inner regularization. Sibirskij Mat. Journal 20 (1979), 307–316 (in Russian).

[204] Kaplan, A. Algorithm for convex programming using a smoothing for exact penalty functions. Sibirskij Mat. Journal 23 (1982), 53–64 (in Russian).

[205] Kaplan, A. On the stability of solution methods for convex optimization problems and variational inequalities. Proc. of the Math. Inst. of Siberian Branch of Acad. Sci. USSR 10 (1988), 132–159.

[206] Kaplan, A. Convergence rate estimates of regularized methods for solving variational problems. Sov. J. Numer. Anal. Math. Modelling 4 (1989), 99–109.

[207] Kaplan, A., and Namm, R. On characterizing minimizing sequences in the Signorini problem. Soviet Math. Doklady 28 (1983), 711–714.

[208] Kaplan, A., and Tichatschke, R. A study of iterative processes for the solution of ill-posed convex variational problems. Soviet Math. Doklady 42 (1991), 747–751.

[209] Kaplan, A., and Tichatschke, R. Adaptive methods for solving ill-posed, convex semi-infinite programs. Soviet Math. Doklady 45 (1992), 119–123.

[210] Kaplan, A., and Tichatschke, R. A regularized penalty method for solving convex semi-infinite programs. Optimization 26 (1992), 215–228.

[211] Kaplan, A., and Tichatschke, R. Stable approximation schemes for ill-posed variational problems. Siberian Advances in Mathematics 2 (1992), 123–132.

[212] Kaplan, A., and Tichatschke, R. Variational inequalities and convex semi-infinite programming problems. Optimization 26 (1992), 187–214.

[213] Kaplan, A., and Tichatschke, R. Investigations about iterative processes for solving incorrect convex variational problems. Journal of Global Optimization 43 (1993), 243–255.

[214] Kaplan, A., and Tichatschke, R. Regularized penalty methods for semi-infinite programming problems. In Approximation and Optimization (1993), B. Brosowski, F. Deutsch, and J. Guddat, Eds., vol. 3, P. Lang Publ. House, Frankfurt, Berlin, New York, pp. 341–356.

[215] Kaplan, A., and Tichatschke, R. Stable Methods for Ill-Posed Variational Problems - Prox-Regularization of Elliptic Variational Inequalities and Semi-Infinite Optimization Problems. Akademie Verlag, Berlin, 1994.

[216] Kaplan, A., and Tichatschke, R. Multi-step proximal point regularization methods for solving convex variational problems. Optimization 33 (1995), 287–319.

[217] Kaplan, A., and Tichatschke, R. A note on the paper "Multi-step-prox regularization methods for solving convex variational problems". Optimization 37 (1996), 149–152.

[218] Kaplan, A., and Tichatschke, R. Path-following proximal approach for solving ill-posed convex semi-infinite programming problems. J. Opt. Theory Appl. 90 (1996), 113–137.

[219] Kaplan, A., and Tichatschke, R. Prox-regularization and solution of ill-posed elliptic variational inequalities. Applications of Mathematics 42 (1997), 111–145.

[220] Kaplan, A., and Tichatschke, R. Proximal point methods and non-convex optimization. J. of Global Optim. 13 (1998), 389–406.

[221] Kaplan, A., and Tichatschke, R. Regularized penalty methods for non-coercive parabolic optimal control problems. Control and Cybernetics 27 (1998), 5–27.

[222] Kaplan, A., and Tichatschke, R. Proximal interior point approach to convex programming problems. Optimization 45 (1999), 117–148.

[223] Kaplan, A., and Tichatschke, R. Auxiliary problem principle and the approximation of variational inequalities with non-symmetric multi-valued operators. CMS Conference Proc. 27 (2000), 185–209.

[224] Kaplan, A., and Tichatschke, R. Proximal point approach and approximation of variational inequalities. SIAM J. Control Optim. 39 (2000), 1136–1159.

[225] Kaplan, A., and Tichatschke, R. A general view on proximal point methods to variational inequalities in Hilbert spaces - iterative regularization and approximation. J. of Nonlinear and Convex Analysis 2 (2001), 305–332.

[226] Kaplan, A., and Tichatschke, R. Proximal interior point methods for convex semi-infinite programming. Optimization Methods & Software 15 (2001), 87–119.

[227] Kaplan, A., and Tichatschke, R. Extended auxiliary problem principle using Bregman distances. Optimization 53 (2004), 603–623.

[228] Kaplan, A., and Tichatschke, R. Interior proximal method for variational inequalities: Case of non-paramonotone operators. Journal of Set-valued Analysis 12 (2004), 357–382.

[229] Kaplan, A., and Tichatschke, R. On inexact generalized proximal methods with a weakened error tolerance criterion. Optimization 53 (2004), 3–17.

[230] Kaplan, A., and Tichatschke, R. Bregman functions and auxiliary problem principle. Optimization Methods and Software 23, 1 (2008), 95–107.

[231] Kaplan, A., and Tichatschke, R. Proximal point methods and elliptic regularization. Nonlinear Analysis 71 (2009), 4525–4543.

[232] Karmanov, V. Mathematical Programming. Fizmatgiz, Moscow, 1980 (in Russian).

[233] Kelley, C., and Sachs, E. Solution of optimal control problems by a pointwise projected Newton method. SIAM J. Control Optim. 33 (1995), 1731–1757.

[234] Kinderlehrer, D., and Stampacchia, G. An Introduction to Variational Inequalities and their Applications. Acad. Press, New York - London, 1980.

[235] Kirsten, H., and Tichatschke, R. About the efficiency of a method of feasible directions for solving variational inequalities. In Recent Advances and Historical Development of Vector Optimization (1987), J. Jahn and W. Krabs, Eds., vol. 294 of LNEMS, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 379–400.

[236] Kiwiel, K. Methods of Descent for Nondifferentiable Optimization. Lect. Notes in Mathematics, vol. 1133, Springer-Verlag, Berlin, 1985.

[237] Kiwiel, K. Proximity control in bundle methods for convex nondifferentiable minimization. Math. Program. 46 (1990), 105–122.

[238] Kiwiel, K. Proximal level bundle methods for convex nondifferentiable optimization, saddle point problems and variational inequalities. Math. Programming 69 B (1995), 89–109.

[239] Kiwiel, K. Restricted step and Levenberg-Marquardt techniques in proximal bundle methods for non-convex non-differentiable optimization. SIAM J. Optimization 6 (1996), 227–249.

[240] Kiwiel, K. Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35 (1997), 1142–1168.

[241] Kiwiel, K. A projection-proximal bundle method for convex nondifferentiable minimization. In Ill-posed Variational Problems and Regularization Techniques (1999), M. Thera and R. Tichatschke, Eds., vol. 477 of LNEMS, Springer-Verlag, Berlin - Heidelberg - New York - Tokyo, pp. 137–150.

[242] Knowles, G. Finite element approximation of parabolic time optimal control problems. SIAM J. Control Optim. 20 (1982), 414–427.

[243] Kojima, M. Strongly stable stationary solutions in nonlinear programs. In Analysis and Computing of Fixed Points (1980), S. M. Robinson, Ed., Academic Press, New York, pp. 93–138.

[244] Kojima, M., and Hirabayashi, R. Continuous deformation of nonlinear programs. Math. Programming Study 21 (1984), 150–198.

[245] Korneev, V. Finite Element Schemes with Higher Order of Exactness. Leningrad University Press, 1977 (in Russian).

[246] Kort, B., and Bertsekas, D. Combined primal-dual and penalty methods for convex programming. SIAM J. Control Optimization 14 (1976), 268–294.

[247] Kortanek, K., and No, H. A central cutting plane algorithm for convex semi-infinite programs. SIAM J. Optimization 3 (1993), 901–918.

[248] Krabs, W. Optimierung und Approximation, Teubner Studienbucher Mathematik. Teubner, Stuttgart, 1975.

[249] Kress, R. Linear Integral Equations. Springer-Verlag, Heidelberg, 1989.

[250] Kryanev, A. Solution of incorrectly posed problems by methods of successive approximations. Soviet Math. Doklady 14 (1973), 673–676 (in Russian).

[251] Kupfer, F., and Sachs, E. Reduced SQP-method for nonlinear heat conduction control problems. Int. Ser. Numer. Math. 111 (1993), 145–160.

[252] Kustova, V. A solution method for a contact problem with weakly coercive operator. Optimizacija (Novosibirsk) 36 (1985), 31–48 (in Russian).

[253] Ladyshenskaya, O., and Uralzewa, N. Linear and Quasilinear Equations of Elliptic Type. Nauka, Moscow, 1973.

[254] Lapin, A. Mixed hybrid finite element method for a variational inequality with a quasi-linear operator. Comput. Methods Appl. Math. 9, 4 (2009), 354–367.

[255] Lasiecka, I. Boundary control of parabolic systems, finite element approximation. Appl. Math. Opt. 6 (1980), 31–62.

[256] Lasiecka, I. Finite element approximations of compensator design for analytic generators with fully unbounded control and observations. SIAM J. Control Optim. 33 (1995), 67–88.

[257] Lavrentjev, M. On the Cauchy problem for the Laplace equation. Isvestija Akad. Nauk USSR, Ser. Matematika 20 (1956) (in Russian).

[258] Lavrentjev, M. Some Improperly Posed Problems in Mathematical Physics. Springer-Verlag, Berlin - Heidelberg - New York, 1967.

[259] Lavrentjev, M., Romanov, V., and Shishatski, S. Ill-Posed Problems of Mathematical Physics and Analysis. Amer. Math. Soc., Providence, 1986.

[260] Lemaire, B. Coupling optimization methods and variational convergence. ISNM 84 (1988), 163–179.

[261] Lemaire, B. The proximal algorithm. ISNM 87 (1989), 73–87.

[262] Lemarechal, C., and Sagastizabal, C. Variable metric bundle methods: from conceptual to implementable forms. Mathematical Programming 76 (1997), 393–410.

[263] Levitin, E., and Polyak, B. T. Minimization methods under constraints. J. Vycisl. Mat. i Mat. Fiz. 6 (1966), 787–823 (in Russian).

[264] Lewy, H., and Stampacchia, G. On the regularity of the solutions of a variational inequality. Comm. Pure Appl. Math. 22 (1969), 153–188.

[265] Lewy, H., and Stampacchia, G. On existence and smoothness of solutions of some noncoercive variational inequalities. Arch. Rat. Mech. Anal. 41 (1971), 241–253.

[266] Lin, C. J., Fang, S., and Wu, S. Y. An unconstrained convex programming approach to linear semi-infinite programming. SIAM J. Optim. 8 (1998), 443–456.

[267] Lions, J., and Stampacchia, G. Variational inequalities. Comm. Pure Appl. Math. 20 (1967), 493–519.

[268] Lions, J. L. Equations Differentielles Operationelles dans les Espaces de Hilbert. C.I.M.E. Varenna, Juillet, 1963.

[269] Lions, J. L. Controle Optimal des Systemes Gouvernes par des Equations aux Derivees Partielles. Dunod, Gauthier-Villars, Paris, 1968.

[270] Lions, J. L. Quelques Methodes de Resolution de Problemes Nonlineaires. Dunod, Paris, 1969.

[271] Lions, J. L. Quelques Methodes de Resolution des Problemes aux Limites non Lineaires. Dunod, Gauthier-Villars, Paris, 1969.

[272] Lions, J. L., and Magenes, E. Problemes aux limites non homogenes et applications. Vol. 1. Dunod, Paris, 1968.

[273] Lions, P.-L., and Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16 (1979), 964–979.

[274] Liskovec, O. Numerical solution of some incorrect problems via the quasi-solution method. Diff. Uravnenija 4 (1968), 735–742 (in Russian).

[275] Liskovec, O. Variational Methods for Solving Incorrect Problems. Nauka i Technika, Minsk, 1981 (in Russian).

[276] Liskovec, O. Theory and methods for the solution of incorrect problems. Itogi Nauki i Techniki, Mat. Anal. 20 (1982), 116–177 (in Russian).

[277] Lootsma, F. Boundary properties of penalty functions for constrained minimization. Philips Res. Rept., Suppl. 3 (1970).

[278] Lootsma, F. Numerical Methods for Nonlinear Optimization. Academic Press, London, 1972.

[279] Louis, A. Inverse und schlechtgestellte Probleme. Teubner Studienbücher Mathematik, Teubner, Stuttgart, 1989.

[280] Luksan, L., and Spedicato, E. Variable metric methods for unconstrained optimization and nonlinear least squares. J. Comput. Appl. Math. 124 (2000), 61–95.

[281] Luque, F. J. Asymptotic convergence analysis of the proximal point algorithm. SIAM J. Control Optim. 22 (1984), 277–293.

[282] Makler-Scheimberg, S., Nguyen, V. H., and Strodiot, J. J. Family of perturbation methods for variational inequalities. JOTA 89 (1996), 423–452.

[283] Mancino, O., and Stampacchia, G. Convex programming and variational inequalities. JOTA 9 (1972), 3–23.

[284] Mangasarian, O. Dual feasible direction algorithms. In Techniques of Optimization (1972), Academic Press, New York, pp. 67–88.

[285] Mangasarian, O. Nonlinear Programming. McGraw-Hill, New York, 1969.

[286] Marchuk, G., and Agoschkov, V. Introduction to Projectional-Difference Methods. Nauka, Moscow, 1981 (in Russian).

[287] Martinet, B. Regularisation d’inequations variationelles par approximations successives. RIRO 4 (1970), 154–159.

[288] Martinet, B. Algorithmes pour la resolution de problemes d’optimisation et de minimax. PhD thesis, University of Grenoble, 1972.

[289] Megiddo, N. Pathways to the optimal set in linear programming. In Progress in Mathematical Programming, Interior Point and Related Methods (1989), N. Megiddo, Ed., Springer, New York, pp. 131–158.

[290] Minty, G. J. Monotone (nonlinear) operators in Hilbert space. Duke Math. Journal 29 (1962), 341–346.

[291] Mitchell, A., and Wait, R. The Finite Element Method in Partial Differential Equations. John Wiley & Sons, Chichester – New York – Brisbane – Toronto, 1980.

[292] Moreau, J. J. Proximite et dualite dans un espace Hilbertien. Bull. Soc. Math. France 93 (1965), 273–299.

[293] Morosov, V. On the solution of functional equations via methods of regularization. Soviet Math. Doklady 7 (1966), 414–417 (in Russian).

[294] Morosov, V. On Regularizing Families of Operators. Moscow University Press, 1967 (in Russian).

[295] Morosov, V. Methods for Solving Incorrectly Posed Problems. Springer-Verlag, New York, 1984.

[296] Mosco, U. Convergence of convex sets and of solutions of variational inequalities. Advances in Mathematics 3 (1969), 510–585.

[297] Mouallif, K. Sur la convergence d’une methode associant penalisation et regularisation. Bull. Soc. Roy. Sc. de Liege 52 (1987), 175–180.

[298] Mouallif, K., and Thera, M. Finding a zero of the sum of two maximal monotone operators. JOTA 94 (1997), 425–448.

[299] Mouallif, K., and Tossings, P. Une methode de penalisation exponentielle associee a une regularisation proximale. Bull. Soc. Roy. Sc. de Liege 56 (1987), 181–192.

[300] Namm, R. V. Some algorithms for solving the Signorini problem. Optimizacija (Novosibirsk) 33 (1969), 63–78 (in Russian).

[301] Namm, R. V. On the rate of convergence of the finite element method in variational inequalities with weakly coercive operators. Report, Institute of Applied Mathematics, Chabarovsk 04 (1991), 1–13 (in Russian).

[302] Nash, S., and Sofer, A. Linear and Nonlinear Programming. McGraw-Hill, New York, 1996.

[303] Natterer, F. Regularisierung schlecht gestellter Probleme durch Projektionsverfahren. Numer. Math. 28 (1977), 329–341.

[304] Neittaanmaki, P., Sprekels, J., and Tiba, D. Optimization of Elliptic Systems. Springer, New York, 2006.

[305] Neittaanmaki, P., and Tiba, D. An embedding of domains approach in free boundary problems and optimal design. SIAM J. Control Optim. 33.

[306] Necas, J. Les Methodes Directes en Theorie des Equations Elliptiques. Masson, Paris, 1967.

[307] Necas, J., and Hlavacek, I. Mathematical Theory of Elastic and Elasto-plastic Bodies: An Introduction. Elsevier, Amsterdam, 1981.

[308] Necas, J., Jarusek, J., and Haslinger, J. On the solution of the variational inequality to the Signorini problem with small friction. Boll. Unione Math. Ital. 5 (1980), 796–811.

[309] Nitsche, J. Vorlesungen über Minimalflächen. Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen, 199. Springer, Berlin, 1975.

[310] Oden, J. Finite Elements of Nonlinear Continua. McGraw-Hill, New York, 1972.

[311] Olejnik, O. On second order linear equations with nonnegative characteristic form. Matem. Sbornik 69 (1966), 111–140 (in Russian).

[312] Opial, Z. Weak convergence of the successive approximations for nonexpansive mappings in Banach spaces. Bull. Amer. Math. Soc. 73 (1967), 591–597.

[313] Ortega, J. M., and Rheinboldt, W. Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York – London, 1970.

[314] Panier, E. R., and Tits, A. L. A globally convergent algorithm with adaptively refined discretization for semi-infinite optimization problems arising in engineering design. IEEE Trans. Automat. Control 39 (1989), 903–908.

[315] Panagiotopoulos, P. D. A nonlinear programming approach to the unilateral contact- and friction-boundary value problem in the theory of elasticity. Ing. Archiv 44 (1975), 421–432.

[316] Panagiotopoulos, P. D. Inequality Problems in Mechanics and Applications. Convex and Nonconvex Energy Functionals. Birkhauser, Boston - Basel - Stuttgart, 1985.

[317] Pang, J. S., and Chan, D. Iterative methods for variational and complementarity problems. Math. Programming 24 (1982), 284–313.

[318] Pascali, D., and Sburlan, S. Nonlinear Mappings of Monotone Type. Editura Academiei, Bucharest, 1978.

[319] Passty, G. B. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72 (1979), 383–390.

[320] Phillips, D. A technique for the numerical solution of certain integral equations of the first kind. J. Assoc. Comput. Mach. 9 (1962), 84–97.

[321] Pironneau, O., and Polak, E. Rate of convergence of a class of methods of feasible directions. SIAM J. Numer. Anal. 10 (1973), 161–174.

[322] Polak, E. Computational Methods in Optimization: A Unified Approach. Academic Press, New York, 1971.

[323] Polak, E. On the mathematical foundations of nondifferentiable optimization in engineering design. SIAM Rev. 29 (1987), 21–89.

[324] Polak, E. Optimization. Algorithms and Consistent Approximations, vol. 124 of Applied Mathematical Sciences. Springer-Verlag, New York, 1997.

[325] Polak, E., and He, L. Rate preserving discretization strategies for semi-infinite programming and optimal control. SIAM J. Control and Optimization 30 (1992), 548–572.

[326] Polak, E., and Mayne, D. Q. An algorithm for optimization problems with functional inequality constraints. IEEE Trans. Automat. Control 21 (1976), 184–193.

[327] Polak, E., and Tits, A. L. A recursive quadratic programming algorithm for semi-infinite programs. J. Appl. Math. Optimization 8 (1982), 325–349.

[328] Polyak, B. Iteration methods for the solution of certain incorrect problems. In Numerical Methods and Programs (University Press, Moscow, 1969), vol. 12, pp. 38–52.

[329] Polyak, B. Methods of conjugate gradients for extremal problems. J. Vycisl. Mat. i Mat. Fiz. 9 (1969), 807–821 (in Russian).

[330] Polyak, B. T. Introduction to Optimization. Optimization Software, Inc. Publ. Division, New York, 1987.

[331] Polyak, R. Modified interior distance functions and center methods. Research Report, IBM T.J. Watson Research Center RC 17042 (1991).

[332] Polyak, R., and Teboulle, M. Nonlinear rescaling and proximal-like methods in convex optimization. Math. Programming 76 (1997), 265–284.

[333] Powell, M. Some global convergence properties of a variable metric algorithm for minimization without exact line search. In SIAM-AMS Proceedings (SIAM Publ., Philadelphia, 1976), R. Cottle and C. Lemke, Eds., vol. IX.

[334] Powell, M. Karmarkar’s algorithm: A view from nonlinear programming. Bulletin of the Institute of Mathematics and Applications 26 (1990), 165–181.

[335] Powell, M., and Toint, P. L. On the estimation of sparse Hessian matrices. SIAM J. Numer. Anal. 16 (1979), 1060–1074.

[336] Pucci, C. Sui problemi di Cauchy non ”ben posti”. Rend. Atti Acad. Naz. Lincei 8 (1955), 473–477.

[337] Pshenicnyj, B. N. Algorithms for general mathematical programming problems. Kibernetika 5 (1970), 120–125 (in Russian).

[338] Pshenicnyj, B. N. Necessary Conditions for an Extremum. Marcel Dekker, New York, 1971.

[339] Pshenicnyj, B. N. Convex Analysis and Extremal Problems. Nauka, Moscow, 1980 (in Russian).

[340] Pshenicnyj, B. N. Methods of Linearization. Nauka, Moscow, 1983 (in Russian).

[341] Pshenicnyj, B. N., and Danilin, J. Numerische Methoden für Extremalaufgaben. Deutscher Verlag der Wissenschaften, Berlin, 1982.

[342] Rado, T. The problem of the least area and the problem of Plateau. Math. Z. 32 (1930), 763–796.

[343] Reemtsen, R. Discretization methods for the solution of semi-infinite programming problems. J. Opt. Theory Appl. 71 (1991), 85–103.

[344] Reemtsen, R., and Ruckmann, J.-J., Eds. Semi-infinite Programming. Kluwer Acad. Publ., Dordrecht, 1998.

[345] Renaud, A., and Cohen, G. An extension of the auxiliary problem principle to nonsymmetric auxiliary operators. ESAIM: Control, Optimization and Calculus of Variations 2 (1997), 281–306.

[346] Ritter, K. Global and superlinear convergence of a class of variable metric methods. Math. Program. Study 14 (1981).

[347] Rockafellar, R. T. Local boundedness of nonlinear, monotone operators. Michigan Math. J. 16 (1969), 397–407.

[348] Rockafellar, R. T. Convex Analysis. Princeton University Press, Princeton, 1970.

[349] Rockafellar, R. T. Monotone operators associated with saddle functions and minimax problems. In Nonlinear Functional Analysis, Part I (1970), F. E. Browder, Ed., vol. 15 of Symposia in Pure Math., Amer. Math. Soc., Providence, R. I., pp. 397–407.

[350] Rockafellar, R. T. On the maximality of sums of nonlinear monotone operators. Trans. Amer. Math. Soc. 149 (1970), 75–88.

[351] Rockafellar, R. T. Augmented Lagrange multiplier functions and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1 (1976), 97–116.

[352] Rockafellar, R. T. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14 (1976), 877–898.

[353] Rockafellar, R. T., and Wets, R. Variational Analysis. Springer-Verlag, Berlin, 1997.

[354] Romanov, V. Inverse Problems of Mathematical Physics. VNU Science Press, Utrecht, 1987.

[355] Roos, H.-G., Stynes, M., and Tobiska, L. Numerical Methods for Singularly Perturbed Differential Equations. Springer, Berlin, 1991.

[356] Rotin, S. Regularisierte Strafmethoden für inkorrekt gestellte Kontrollprobleme mit linearen Zustandsgleichungen. Forschungsbericht 98-02 (1998), pp. 39. Mathematik/Informatik, Universität Trier.

[357] Rotin, S. Convergence of the proximal-point method for ill-posed control problems with partial differential equations. PhD thesis, University of Trier, 1999.

[358] Rupp, T. Kontinuitätsmethoden zur Lösung einparametrischer semi-infiniter Optimierungsprobleme. PhD thesis, University of Trier, 1988.

[359] Salmon, G., Nguyen, V. H., and Strodiot, J. J. Coupling the auxiliary problem principle and the epiconvergence theory for solving general variational inequalities. JOTA 104 (2000), 629–657.

[360] Scarpini, F., and Vivaldi, M. A. Error estimates for the approximation of some unilateral problems. RAIRO Anal. Numer. 11 (1977), 197–208.

[361] Schaefer, P. On uniqueness, stability and pointwise estimates in the Cauchy problem for coupled elliptic equations. Quart. Appl. Math. 31 (1973), 321–328.

[362] Schattler, U. An interior-point method for solving semi-infinite programming problems. Annals of Operations Research 62 (1996), 277–301.

[363] Schittkowski, K. Computational Mathematical Programming. Springer-Verlag, Heidelberg, 1985.

[364] Schittkowski, K. QL: A Fortran code for convex quadratic programming – user’s guide, version 2.11. Tech. Rep., Dept. of Computer Science, Univ. Bayreuth (2005).

[365] Schmitt, H. Normschwächere Prox-Regularisierungen. PhD thesis, University of Trier, 1996.

[366] Schramm, H., and Zowe, J. A version of the bundle idea for minimizing a nonsmooth function: Conceptual idea, convergence analysis, numerical results. SIAM J. Optim. 2 (1992), 121–152.

[367] Schattler, U. An interior-point method for semi-infinite programming problems. PhD thesis, Universität Würzburg, 1992.

[368] Schwarz, H. R. Numerische Mathematik. B. G. Teubner, Stuttgart, 1988.

[369] Schwartz, L. Theorie des Distributions. Hermann, Paris, 1973.

[370] Schwartz, L. Analyse Hilbertienne. Hermann, Paris, 1979.

[371] Shor, N. Z. Minimization Methods for Nondifferentiable Functions. Springer-Verlag, Berlin, 1985.

[372] Shultz, G., Schnabel, R., and Byrd, R. A family of trust-region based algorithms for unconstrained minimization with strong global convergence properties. SIAM J. Numer. Anal. 22 (1985), 47–67.

[373] Skarin, V. On a penalty method for nonlinear programming problems. J. Vycisl. Mat. i Mat. Fiz. 13 (1973), 1186–1199 (in Russian).

[374] Solodov, M., and Svaiter, B. A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Analysis 7 (1999), 323–345.

[375] Solodov, M., and Svaiter, B. A hybrid projection-proximal point algorithm. Journal of Convex Analysis 6 (1999), 59–70.

[376] Solodov, M., and Svaiter, B. An inexact hybrid generalized proximal point algorithm and some new results on the theory of Bregman functions. Math. Oper. Res. 25 (2000), 214–230.

[377] Sonnevend, G. Applications of analytic centers for the numerical solution of semi-infinite, convex programs arising in control theory. Lect. Notes Contr. Inf. Sci. 143 (1990), 413–422.

[378] Sonnevend, G. A new class of a high order interior point method for the solution of convex semi-infinite problems. In Computational and Optimal Control (1994), R. Bulirsch and D. Kraft, Eds., Birkhauser, Basel, pp. 193–211.

[379] Spingarn, J. E. Submonotone mappings and the proximal point algorithm. Numer. Funct. Anal. and Optimiz. 4 (1981–1982), 123–150.

[380] Spingarn, J. E. Application of the method of partial inverses to convex programming: Decomposition. Math. Programming 32 (1985), 199–223.

[381] Stampacchia, G. Formes bilineaires coercitives sur les ensembles convexes. Comptes Rendus Acad. Sciences Paris 258 (1964), 4413–4416.

[382] Strang, G., and Fix, G. An Analysis of the Finite Element Method. Prentice-Hall, New York, 1973.

[383] Tanaka, Y., Fukushima, M., and Hasegawa, T. Implementable l-infinity penalty function method for semi-infinite optimization. Int. J. Syst. Sci. 18 (1987), 1563–1568.

[384] Tanaka, Y., Fukushima, M., and Ibaraki, T. A globally convergent SQP-method for semi-infinite optimization. J. Comp. Appl. Math. 23 (1988), 141–153.

[385] Teboulle, M. Entropic proximal mappings with application to nonlinear programming. Mathematics of Oper. Res. 17 (1992), 670–690.

[386] Teboulle, M. Convergence of proximal-like algorithms. SIAM J. Optim. 7 (1997), 1069–1083.

[387] Tichatschke, R., and Hofmann, B. On the quasi-solution of overdetermined ill-conditioned systems of linear equations and its realization by means of parametric programming. Optimization 11 (1980), 563–578.

[388] Tichatschke, R., Kaplan, A., Voetmann, T., and Boehm, M. Numerical treatment of an asset price model with non-stochastic uncertainty. Trabajos de Investigacion Operativa (TOP) 10 (2002), 1–50.

[389] Tichatschke, R., and Nebeling, V. A cutting-plane algorithm for quadratic semi-infinite programming problems. Optimization 19 (1988), 803–817.

[390] Tichatschke, R., and Schwartz, B. Methods of feasible directions for semi-infinite programming problems, Part I. Optimizacija (Novosibirsk) 43 (1988), 62–73.

[391] Tichatschke, R., and Schwartz, B. Methods of feasible directions for semi-infinite programming problems, Part II. Optimizacija (Novosibirsk) 44 (1988), 64–81.

[392] Tikhonov, A. N. Solution of incorrectly formulated problems and a regularization method. Soviet Math. Doklady 4 (1963), 1035–1038 (in Russian).

[393] Tikhonov, A. N. Improper problems of optimal planning and stable methods of their solution. Soviet Math. Doklady 6 (1965), 1264–1267 (in Russian).

[394] Tikhonov, A. N. Methods for the regularization of optimal control problems. Soviet Math. Doklady 6 (1965), 761–763 (in Russian).

[395] Tikhonov, A. N., and Arsenin, V. J. Methods for Solving Ill-posed Problems. J. Wiley & Sons, New York, 1977.

[396] Todd, M. Interior-point algorithms for semi-infinite programming. Math. Programming 65 (1994), 217–245.

[397] Toint, P. On sparse and symmetric matrix updating subject to a linear equation. Math. Computing 31 (1977), 954–961.

[398] Toint, P. Global convergence of a class of trust-region methods for non-convex minimization in Hilbert space. IMA J. on Numerical Anal. 8 (1988), 231–252.

[399] Tossings, P. Algorithme de Tikhonov perturbe et applications. Prepublication, Inst. de Mathematique, Univ. de Liege 91-002 (1991).

[400] Tseng, P. Further applications of a splitting algorithm to decomposition in variational inequalities and convex programming. Math. Programming 48 (1990), 249–263.

[401] Tseng, P. Applications of a splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29 (1991), 119–138.

[402] Tseng, P., and Bertsekas, D. On the convergence of the exponential multiplier method for convex programming. Math. Programming 60 (1993), 1–19.

[403] Tuncel, L., and Todd, M. Asymptotic behavior of interior point methods: A view from semi-infinite programming. Math. Oper. Res. 21 (1996), 354–381.

[404] Uryas’ev, S. Adaptive variable metric methods for nondifferentiable optimization problems. Lect. Notes Control Inf. Sci. 144 (1990), 432–441.

[405] Vainberg, M. Variational Method and Method of Monotone Operators in the Theory of Nonlinear Equations. J. Wiley & Sons, New York, 1973.

[406] Vainikko, G. Methods of the Solution of Linear Ill-Posed Problems in the Hilbert space. Tartu State University, 1982 (in Russian).

[407] Vasiljev, F. Application of nonsmooth penalty functions for regularizing unstable minimization problems. J. Vycisl. Mat. i Mat. Fiz. 27 (1987), 1444–1450 (in Russian).

[408] Vasiljev, F. P. Methods for Solving Extremal Problems. Nauka, Moscow, 1981 (in Russian).

[409] Vasicek, O. An equilibrium characterization of the term structure. Journal of Financial Economics 5 (1977), 177–188.

[410] Vlcek, J., and Luksan, L. Shifted limited-memory variable metric methods for large-scale unconstrained optimization. J. Comput. Appl. Math. 186 (2006), 365–390.

[411] Voetmann, T. Proximal-Punkt-Methoden bei schlecht gestellten semi-infiniten Problemen. Diploma thesis, University of Trier (1999).

[412] Voetmann, T. On regularization of nonconvex problems. Forschungsberichte Mathematik/Informatik, Trier 17 (2001).

[413] Voetmann, T. Numerical solution of variational inequalities using weak regularization with Bregman functions. University of Trier, private communication (2004).

[414] Vorovich, I., Alexandrov, V., and Babeshko, V. Non-classical Mixed Problems in Elasticity Theory. Nauka, Moscow, 1974 (in Russian).

[415] Wang, G., and Yang, X. Numerical modeling of dual variational inequalities of unilateral contact problems using the mixed finite element method. Int. J. Numer. Anal. Model. 6, 1 (2009), 161–176.

[416] Watson, G. Globally convergent methods for semi-infinite programming problems. BIT 21 (1981), 362–373.

[417] Young, D. Iterative Solution of Large Systems. Academic Press, London, 1971.

[418] Zeidler, E. Nonlinear Functional Analysis and its Applications. II/B. Springer-Verlag, New York, 1990.

[419] Zelikin, M., and Borisov, V. Regimes with increasingly more frequent switchings in optimal control problems. Proc. of Steklov Inst. of Math. 1 (1991), 95–186.

[420] Zelikin, M., and Borisov, V. Theory of Chattering Control with Applications to Astronautics, Robotics, Economics, and Engineering. Birkhauser, Boston, 1994.

[421] Zhu, D. L., and Marcotte, P. New classes of generalized monotonicity.JOTA 87 (1995), 457–471.

[422] Zhu, D. L., and Marcotte, P. Co-coercivity and its role in the convergence of iterative schemes for solving variational inequalities. SIAM J. Optimization 6 (1996), 714–726.

Index

algorithm
  splitting, 420, 440
approximation
  inner, 265
  Mosco, 47
Armijo search, 490
auxiliary problem principle, 419
barrier algorithm, 204
BFGS class, 499
bilinear form, 445
  coercive, 454
  symmetric, 445
bisection method, 487
block relaxation, 503
boundary
  Lipschitz, 446
boundary coercivity, 389
boundary-obstacle problem, 471
Bregman function, 388
Bregman-like function, 395
Broyden class, 499
bundle method, 93, 372
Cauchy Problem, 2
closure, 444
coercivity, 454
  boundary, 389
  zone, 389
cone
  normal, 462
  of feasible directions, 506
conjugate
  gradient method, 493, 504
  operator, 456
  space, 443
  vectors, 493
constraint qualification, 466
contact zone, 289
controlling parameter
  choice of, 131
convection-diffusion problem, 357
convergence, 444
  linear, 486
  quadratic, 486
  superlinear, 486
  to optimal value, 485
  to set, 485
  weak, 444
convergence sensing condition, 389
convexification, 82
coordinate descent method, 502
Davidon-Fletcher-Powell method, 499
derivative
  directional, 455
  Frechet, 449
  Gateaux, 449
  generalized, 447
direction search, 507
Dirichlet Problem, 351, 470
displacements
  admissible, 291
  virtual, 291
distance function, 402
domain
  effective, 445
dual method, 516
dual problem, 285, 377
duality pairing, 443
Dunn property, 421, 424, 464
Einstein’s summation convention, 290
elasticity coefficient, 290
elliptic regularization method, 339
ellipticity property, 290
embedding
  compact embedding, 448
  continuous embedding, 448
epigraph, 451
exchange method, 168
feasible direction, 506

field
  of displacements, 290
  of strains, 290
finite element
  discretization, 471
  space, 472
function
  barrier, 514
  Bregman, 388
  Bregman-type, 388
  DC, 94
  distance, 389, 402
  entropy-like, 402
  generalized distance, 406
  logarithmic-quadratic, 402
  Lyapunov, 404
  penalty, 514
functional
  concave, 451
  continuous, 445
    weakly, 445
  convex, 451
  Holder continuous, 445
  indicator, 16
  Lipschitz, 445
  M-convergence, 47
  Moreau-Yosida, 8
  proper convex, 451
  semicontinuous
    lower, 445, 451
    upper, 445
    weakly lower, 445
    weakly upper, 445
  strictly convex, 454
  strongly convex, 454
  trace, 448
  uniformly convex, 454
gap function, 370
Gauss-Seidel method, 503
Gelfand triple, 446
generalized distance, 406
generalized proximal point method, 340, 408
Goldstein search, 490
gradient method, 492
gradient projection method, 30, 503
gradient-type methods, 489
growth condition, 347
Hausdorff distance, 468
Hooke’s law, 289
inequality
  Cauchy-Schwarz, 443
  Friedrichs, 447
  Jensen, 452
  Korn, 303
  Poincare, 335
inner approximation
  set, 477
  space, 471
interior proximal method, 387
interpolant, 473
Korn’s inequality, 303
Krylow subspace, 501
Lagrange multiplier, 466
Lagrangian function, 466
  augmented, 89, 517
lemma
  Opial, 444
  Polyak, 486
linearization method, 509
Lyapunov function, 404
mapping
  convex, 452
  proximal point, 22
  proximal-point, 55
method
  Armijo search, 490
  augmented Lagrangian, 73
  bisection, 487
  block relaxation, 503
  Bregman-function-based, 390
  Broyden, 499
  bundle, 372
  conjugate gradient, 493, 504
  coordinate descent, 502
  Davidon-Fletcher-Powell, 499
  dual, 516
  elliptic regularization, 339
  exact PPR, 387, 404
  feasible directions, 507
  Fletcher-Reeves, 494
  Gauss-Seidel, 503
  generalized proximal point, 340, 408
  Goldstein search, 490

gradient, 492gradient projection, 30, 503gradient-type, 489implicit Euler, 77interior proximal, 387iterative regularization, 38Levenberg-Marquard, 78linearization, 79, 509multiplier, 516Newton, 489, 497penalty, 32, 511proximal auxiliary problem, 421proximal gradient, 60proximal point, 57proximal point regularization, 23Quasi-Newton, 498quasi-solution, 15regularized barrier, 68regularized multiplier, 89regularized penalty, 42, 60, 247regularized subgradient projection,

39residual, 15steepest descent, 489, 491Tikhonov regularization, 15trust-region, 90variable metric, 500

minimal surface problem, 351with obstacle, 351

minimizing sequence, 485model, 90Moreau-Yosida functional, 8Moreau-Yosida-Regularization, 56Moroeau-Yosida, 439Mosco convergence, 47Mosco scheme, 47MSR-method, 112multi-step regularization, 38, 112, 248

Newton method, 489, 497norm, 443

operator, 445seminorm, 443

normal solution, 20

obstacle problem, 470one-step regularization, 38, 101, 247operator

ε-enlargement, 367, 390co-coercive, 464

coercive, 464divergence, 285effective domain, 461firmly non-expansive, 55graph, 461inverse, 461locally bounded, 461locally hemi-bounded, 411, 461locally hemi-continuous, 461maximal monotone, 462monotone, 461multi-valued, 461non-expansive, 56normality, 462paramonotone, 388, 414, 463pseudo-monotone, 414, 463range, 461resolvent, 365single-valued, 461strongly monotone, 461weakly coercive, 464

operator equation of first kind, 15
OSR-method, 101
    modified, 109
outward normal, 449

penalty method, 32, 60, 247, 511
Poincaré inequality, 335
precondition, 497
problem
    boundary control, 244
    Cauchy, 2
    chattering, 266
    convection-diffusion, 357
    Dirichlet, 351, 470
    distributed control, 243
    dual, 377, 466
    Fuller, 279
    ill-posed, 2, 6
    inclusion, 419
    inverse, 4
    minimal surface, 351
    obstacle on the boundary, 283, 471
    Poisson, 270
    quadratic programming, 376, 505
    saddle point, 287, 466
    semi-infinite programming, 167
    Signorini, 288
    Signorini with friction, 328
    singularly perturbed, 357



    trust-region, 91
    two-body contact, 288
    variational, 453
    weakly well-posed, 6
    well-posed, 1, 6

proximal gradient methods, 60
proximal point method
    exact, 57
    inexact, 59

Quasi-Newton method, 498
quasi-solution, 15
quasi-solution method, 15

rank-1 update, 499
rank-2 update, 499
reduction set, 468
regularization
    multi-step, 38
    on subspace, 247, 297, 410
    one-step, 38
    weak, 410
    with weaker norm, 298, 307, 322, 334
relaxation, 366
relaxation parameter, 503
residual method, 15

saddle point, 466
scalar product, 443
semi-infinite programming, 167
set
    compact, 445
    dense, 445
    feasible, 453
    inner approximation, 477
    interior, 451
    M-convergence, 47
    of feasible directions, 458
    of well-posedness, 2, 15
    optimal, 453
    relative interior, 451
    solution set, 453
    weakly compact, 445

SIP
    T-well-posed, 7
    convex, 7, 167
    discretization approach, 168
    exchange approach, 168
    linear, 4
    reduction approach, 170
Slater condition, 466, 467
space
    Hilbert, 447
    inner approximation, 265, 471
    Sobolev, 447

stabilizer, 14
    weak, 14

steepest descent method, 491
step-length, 488
stress tensor, 290
subdifferential, 455, 462
subgradient, 455
subgradient method, 420
support, 446

Taylor formula, 450
theorem
    generalized Weierstrass, 445
    Kuhn-Tucker, 467
    mean value, 450
    Moreau-Rockafellar, 411, 456
    Riesz, 444

Tikhonov regularization, 15
trace, 283
triangulation
    quasi-uniform, 472
    uniform, 474

unilateral constraint, 287

variable metric method, 500
variational inequality, 459
    hemi-, 421, 437, 459

weak closure, 444
weak limit, 444
weak solution, 292

zone coercivity, 389