
Subgradient Projectors: Extensions, Theory, and Characterizations

Heinz H. Bauschke∗, Caifang Wang†, Xianfu Wang‡, and Jia Xu§

April 13, 2017

Abstract

Subgradient projectors play an important role in optimization and for solving convex feasibility problems. For every locally Lipschitz function, we can define a subgradient projector via generalized subgradients even if the function is not convex. The paper consists of three parts. In the first part, we study basic properties of subgradient projectors and characterize when a subgradient projector is a cutter, a local cutter, or a quasi-nonexpansive mapping. We present global and local convergence analyses of subgradient projectors. Many examples are provided to illustrate the theory. In the second part, we investigate the relationship between the subgradient projector of a prox-regular function and the subgradient projector of its Moreau envelope. We also characterize when a mapping is the subgradient projector of a convex function. In the third part, we focus on linearity properties of subgradient projectors. We show that, under appropriate conditions, a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator onto a subspace. In general, neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function.

2010 Mathematics Subject Classification: Primary 49J52; Secondary 49J53, 47H04, 47H05, 47H09.

Keywords: Approximately convex function, averaged mapping, cutter, essentially strictly differentiable function, fixed point, limiting subgradient, local cutter, local quasi-firmly nonexpansive mapping, local quasi-nonexpansive mapping, local Lipschitz function, linear cutter, linear firmly nonexpansive mapping, linear subgradient projection operator, Moreau envelope, projection, prox-bounded, proximal mapping, prox-regular function, quasi-firmly nonexpansive mapping, quasi-nonexpansive mapping, (C, ε)-firmly nonexpansive mapping, subdifferentiable function, subgradient projection operator.

1 Introduction

Studies of optimization problems and convex feasibility problems have led in recent years to the development of a theory of subgradient projectors. A subgradient projector is a projection onto a certain half-space.

∗Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: [email protected].
†Department of Mathematics, Shanghai Maritime University, Shanghai, China. E-mail: [email protected].
‡Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: [email protected].
§Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada. E-mail: [email protected].


Rather than finding projections onto level sets of the original functions, the iterative algorithms find projections onto half-spaces that contain the 0-level set of the function. Polyak developed the subgradient projector iteration for convex functions [46, 47, 48]; it was further developed by Censor, Combettes, Fukushima, Kiwiel, Yamada and others, and applied to many kinds of optimization problems [22, 21, 9, 24, 25, 29, 34, 35, 44, 19, 55]. In [12], we gave a systematic study of subgradient projectors of convex functions. Convexity is often too strong an assumption for the needs of applications. In a recent work [43], Pang studied finitely convergent algorithms for nonconvex inequality problems involving approximately convex functions. The subgradient projector used by Pang relies on the Clarke subdifferential instead of the Mordukhovich limiting subdifferential. To this day, however, there is no systematic theory of the subgradient projector for a possibly nonconvex function.

The goal of this paper is to develop the basic theory of subgradient projectors for possibly nonconvex functions on a finite-dimensional space, aimed ultimately at applications to diverse problems of nonconvex optimization. Nondifferentiable and nonconvex functions arise in many optimization problems. As far as nonconvex functions are concerned, the cutter theory (or T-class) developed by Cegielski [20], Bauschke, Borwein and Combettes [8], and Bauschke, Wang, Wang and Xu [13] furnishes a new approach to subgradient projectors that does not appeal to the existing theory of subgradient projectors for convex functions. Our study shows that subgradient projectors for nonconvex functions have many attractive analytical properties. Among the results presented here, we find that while cutters and quasi-nonexpansive mappings on $\mathbb{R}^n$ are global, cutters and quasi-nonexpansive mappings on a neighborhood are more useful for functions that are locally convex around the desired point, say a critical point or a feasible point. This paper not only includes some results from [54], but also many refinements and new advances.

Since definitions and proofs are much simpler in finite-dimensional spaces, and many technical complications do not even appear, we work in finite-dimensional spaces only. For the convenience of readers, our main results are presented in three parts. In the first part, we study extensions and theory of subgradient projectors. In the second part, we consider subgradient projectors of Moreau envelopes and conditions under which a mapping is the subgradient projector of a convex function. The third part is devoted to linear subgradient projectors.

The remainder of this paper is organized as follows.

Part I consists of Sections 2–6. Section 2 provides an extension of subgradient projectors from convex functions to possibly nonconvex functions. Section 3 is devoted to the calculus of subgradient projectors. Section 4 deals with whether one can recover a function from its subgradient projector, and with the fixed point closedness property of a subgradient projector. Conditions on functions under which their subgradient projectors are cutters or local cutters are presented in Section 5. Section 6 is devoted to the convergence analysis of subgradient projectors, using the theory of cutters, local cutters, quasi-nonexpansive mappings, and local quasi-nonexpansive mappings. Under appropriate assumptions, we show that subgradient projectors are (C, ε)-firmly nonexpansive, a very useful concept introduced by Hesse and Luke for studying local linear convergence of a variety of algorithms.

Part II consists of Sections 7–8. For prox-bounded and prox-regular functions, their Moreau envelopes are differentiable. Section 7 studies the subgradient projectors of Moreau envelopes of prox-bounded and prox-regular functions, and their connections to subgradient projectors of the original functions. We show that if $f$ is proper, lsc, prox-bounded, and prox-regular on $\mathbb{R}^n$, then $f$ is a difference of convex functions (Corollary 7.9); and that if $f$ is $C^2$, $\min f = 0$, and $\nabla f(x) \neq 0$ for every $x \in \mathbb{R}^n \setminus \operatorname{argmin} f$, then the subgradient projector $G_f$ of $f$ is a cutter if and only if the subgradient projector $G_{e_\lambda f}$ of the envelope $e_\lambda f$ is a cutter for every $\lambda > 0$ (Propositions 7.22 and 7.24). Section 8 characterizes when a mapping is actually a subgradient projector of a convex function.

Part III consists of Sections 9–11. It is interesting to ask when a subgradient projector is linear, and what special properties a linear subgradient projector possesses. To the best of our knowledge, this question has not been explored in the literature. Section 9 studies linear subgradient projectors and their distinguished features. In particular, we give a nonlinear cutter that is nonexpansive but not firmly nonexpansive; the example is much simpler than the one given by Cegielski [20]. In Section 10, using results from Section 9, we show that in general neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function. Finally, in Section 11, we completely characterize linear subgradient projectors on $\mathbb{R}^2$, and give explicit formulae for the corresponding functions.

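The notation that we employ is for the most part standard; however, a partial list is provided for the reader's convenience. Throughout this paper, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space with inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$, i.e., $(\forall x \in \mathbb{R}^n)$ $\|x\| := \sqrt{\langle x, x\rangle}$. The identity operator on $\mathbb{R}^n$ is $\operatorname{Id}$. For a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$, its fixed point set is denoted by $\operatorname{Fix} T := \{x \in \mathbb{R}^n \mid Tx = x\}$; its kernel is $\ker T := \{x \in \mathbb{R}^n \mid Tx = 0\}$; its range is $\operatorname{ran} T := \{y \in \mathbb{R}^n \mid y = Tx \text{ for some } x \in \mathbb{R}^n\}$.

For a function $f : \mathbb{R}^n \to (-\infty,+\infty]$, its $\alpha$-level set is denoted by $\operatorname{lev}_\alpha f := \{x \in \mathbb{R}^n \mid f(x) \le \alpha\}$; its effective domain is $\operatorname{dom} f := \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$. For a set-valued mapping $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the domain, range and fixed point set of $F$ are given by $\operatorname{dom} F := \{x \in \mathbb{R}^n \mid F(x) \neq \emptyset\}$, $\operatorname{ran} F := \bigcup_{x\in\mathbb{R}^n} F(x)$, and $\operatorname{Fix} F := \{x \in \mathbb{R}^n \mid x \in F(x)\}$, respectively. We use $B(x,\delta)$ for the closed ball centered at $x \in \mathbb{R}^n$ with radius $\delta > 0$. $\mathbb{R}_+$ denotes the set of nonnegative real numbers, and $\mathbb{N}$ denotes the set of nonnegative integers $\{0, 1, 2, \ldots\}$.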

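For a set $C \subseteq \mathbb{R}^n$, its distance function is

$$d_C : \mathbb{R}^n \to [0,+\infty) : x \mapsto \inf\{\|x - y\| \mid y \in C\},$$

and the projection operator onto $C$ is

$$P_C : \mathbb{R}^n \rightrightarrows C : x \mapsto \{p \in C \mid \|x - p\| = d_C(x)\}.$$

The indicator function $\iota_C : \mathbb{R}^n \to (-\infty,+\infty]$ of $C$ is defined by $\iota_C(x) := 0$ if $x \in C$ and $\iota_C(x) := +\infty$ if $x \notin C$. We write $\operatorname{int} C$ for the interior and $\operatorname{bdry}(C) := \overline{C} \setminus \operatorname{int} C$ for the boundary of $C$. For a subspace $L \subseteq \mathbb{R}^n$, its orthogonal complement is defined to be

$$L^\perp := \{y \in \mathbb{R}^n \mid \langle y, x\rangle = 0 \text{ for all } x \in L\}.$$

When $x, y \in \mathbb{R}^n$, the line segment between $x$ and $y$ is given by $[x, y] := \{(1-\lambda)x + \lambda y \mid 0 \le \lambda \le 1\}$.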


Part I

Extensions to possibly nonconvex functions and basic theory

2 An extension of subgradient projector via limiting subgradients

To introduce subgradient projectors for possibly nonconvex functions, we need the following generalized subgradients [51, 40, 39, 31].

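Definition 2.1 Consider a function $f : \mathbb{R}^n \to (-\infty,+\infty]$ and a point $\bar{x} \in \mathbb{R}^n$ with $f(\bar{x})$ finite. For a vector $v \in \mathbb{R}^n$, one says that

(i) $v$ is a regular (or Fréchet) subgradient of $f$ at $\bar{x}$, written $v \in \hat{\partial} f(\bar{x})$, if

$$f(x) \ge f(\bar{x}) + \langle v, x - \bar{x}\rangle + o(\|x - \bar{x}\|);$$

(ii) $v$ is a limiting (or Mordukhovich) subgradient of $f$ at $\bar{x}$, written $v \in \partial f(\bar{x})$, if there are sequences $x^\nu \to \bar{x}$ with $f(x^\nu) \to f(\bar{x})$ and $v^\nu \in \hat{\partial} f(x^\nu)$ with $v^\nu \to v$.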

A locally Lipschitz function $f$ is subdifferentially regular at $x$ with $f(x)$ finite if $\hat{\partial} f(x) = \partial f(x)$; see [51, Corollary 8.11], [39]. It is well known that when $f$ is locally Lipschitz, $\partial f$ is nonempty-valued everywhere; when $f$ is lower semicontinuous (lsc), the set of points at which $\partial f$ is nonempty-valued is at least dense in the domain of $f$ [51, Corollary 8.10]. Furthermore, $\partial f$ is the usual Fenchel subdifferential when $f$ is convex. All of these results can be found in [18, 17, 40, 51].

A function $f : \mathbb{R}^n \to \mathbb{R}$ is called subdifferentiable if $\partial f(x) \neq \emptyset$ for every $x \in \mathbb{R}^n$. While every locally Lipschitz function on $\mathbb{R}^n$ is subdifferentiable, a subdifferentiable function might not be locally Lipschitz, e.g.,

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x \le 0,\\ 1 - \sqrt{x} & \text{if } x > 0; \end{cases}$$

see also [51, page 359]. The key concept we shall study is the subgradient projection operator.

Definition 2.2 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s : \mathbb{R}^n \to \mathbb{R}^n$ be a selection of $\partial f$. The subgradient projector of $f$ is defined by

$$G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0 \text{ and } 0 \notin \partial f(x),\\ x & \text{otherwise.} \end{cases} \tag{1}$$

When it is not necessary to emphasize the selection $s$, we will write $G_f$. It is also convenient to introduce the set-valued mapping associated with $f$ by

$$\mathcal{G}_f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n : x \mapsto \{G_{f,s}(x) \mid s \text{ is a selection of } \partial f\} \tag{2}$$

with $G_{f,s}$ being given in (1).

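Although subgradient projectors have been well studied for convex functions [12, 21, 24, 20, 44, 46, 48, 55, 42], the extension to possibly nonconvex functions is new.

When $f$ is convex and $\inf_{\mathbb{R}^n} f \le 0$, $G_{f,s}$ reduces to

$$G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0,\\ x & \text{otherwise,} \end{cases}$$

where $s : \mathbb{R}^n \to \mathbb{R}^n$ is a selection of $\partial f$ with $s(x) \in \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_0 f$, $G_f$ reduces to

$$G_f : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|\nabla f(x)\|^2}\, \nabla f(x) & \text{if } f(x) > 0 \text{ and } \nabla f(x) \neq 0,\\ x & \text{otherwise.} \end{cases}$$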
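The differentiable case above can be tried numerically. The following is a minimal sketch, not code from the paper: the helper name `subgradient_projector` and the test function $f(x) = \|x\|^2 - 1$ (whose 0-level set is the closed unit ball) are assumptions made only for illustration.

```python
import math

def subgradient_projector(f, grad, x):
    """Evaluate G_f(x) = x - f(x)/||grad f(x)||^2 * grad f(x) when f(x) > 0
    and grad f(x) != 0; return x unchanged otherwise."""
    fx = f(x)
    g = grad(x)
    norm_sq = sum(gi * gi for gi in g)
    if fx <= 0 or norm_sq == 0.0:
        return list(x)
    step = fx / norm_sq
    return [xi - step * gi for xi, gi in zip(x, g)]

# Hypothetical test function (an assumption, not from the paper):
# f(x) = ||x||^2 - 1, so lev_0 f is the closed unit ball.
def f(x):
    return sum(xi * xi for xi in x) - 1.0

def grad_f(x):
    return [2.0 * xi for xi in x]

x = [3.0, 4.0]
for _ in range(6):
    x = subgradient_projector(f, grad_f, x)

print(math.hypot(x[0], x[1]))  # the norm approaches 1 from above
```

For this particular $f$ the iteration acts on the radius $r = \|x\|$ as $r \mapsto (r^2+1)/(2r)$, which is why a handful of steps already lands very close to the unit sphere; points inside the ball are left unchanged.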

The geometric interpretation and motivation of the subgradient projector come from the following:

Proposition 2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s$ be a selection of $\partial f$.

(i) Whenever $f(x) > 0$ and $0 \notin \partial f(x)$, we have

$$G_{f,s}(x) = P_{H_-(s(x),x)}(x),$$

where the half-space

$$H_-(s(x), x) := \{z \in \mathbb{R}^n \mid f(x) + \langle s(x), z - x\rangle \le 0\}. \tag{3}$$

(ii) The fixed point set of $G_{f,s}$ is

$$\operatorname{Fix} G_{f,s} = \{x \in \mathbb{R}^n \mid 0 \in \partial f(x)\} \cup \operatorname{lev}_0 f = \operatorname{Fix} \mathcal{G}_f.$$

If $f$ is locally Lipschitz, then $\operatorname{Fix} G_{f,s}$ is closed.

(iii) If $f$ is convex and $\inf_{\mathbb{R}^n} f \le 0$, then $\operatorname{Fix} G_{f,s} = \operatorname{lev}_0 f$.

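Proof. (i). According to [7] or [20, page 133], for the half-space

$$H_-(a, \beta) := \{z \in \mathbb{R}^n \mid \langle a, z\rangle \le \beta\},$$

where $a \in \mathbb{R}^n$, $a \neq 0$ and $\beta \in \mathbb{R}$, the metric projection is given by

$$P_{H_-(a,\beta)}\, x = \begin{cases} x - \dfrac{\langle a, x\rangle - \beta}{\|a\|^2}\, a & \text{if } \langle a, x\rangle > \beta,\\ x & \text{if } \langle a, x\rangle \le \beta. \end{cases} \tag{4}$$

Apply (4) with $a := s(x)$ and $\beta := \beta(x) = \langle s(x), x\rangle - f(x)$.

(ii). This follows from the definition of $G_f$. When $f$ is locally Lipschitz, $\partial f$ is upper semicontinuous [51, Proposition 8.7], so $\{x \in \mathbb{R}^n \mid 0 \in \partial f(x)\}$ is closed. Being a union of two closed sets, $\operatorname{Fix} G_{f,s}$ is closed.

(iii). When $f$ is convex, $0 \in \partial f(x)$ gives $f(x) = \min_{\mathbb{R}^n} f$, so $f(x) \le 0$ because $\inf_{\mathbb{R}^n} f \le 0$. Then $\{x \in \mathbb{R}^n \mid 0 \in \partial f(x)\} \subseteq \operatorname{lev}_0 f$. Thus (iii) follows from (ii). ∎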
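Proposition 2.3(i) can be checked on a single point. The sketch below is only an illustration under stated assumptions: the function $f(x) = x_1^2 + 2x_2^2 - 1$ and the helper names are hypothetical, and formula (4) is implemented directly.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_halfspace(a, beta, x):
    """Metric projection onto {z : <a, z> <= beta} with a != 0, as in (4)."""
    excess = dot(a, x) - beta
    if excess <= 0:
        return list(x)
    t = excess / dot(a, a)
    return [xi - t * ai for xi, ai in zip(x, a)]

# Hypothetical smooth test function (an assumption, not from the paper).
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2 - 1.0

def grad_f(x):
    return [2.0 * x[0], 4.0 * x[1]]

x = [2.0, 1.0]
fx, s = f(x), grad_f(x)

# Subgradient projector from (1) with s(x) = grad f(x).
G = [xi - fx / dot(s, s) * si for xi, si in zip(x, s)]

# Same point via (4) with a = s(x) and beta = <s(x), x> - f(x).
beta = dot(s, x) - fx
P = project_halfspace(s, beta, x)

print(G, P)
```

Both computations give the same point, and the linearization $f(x) + \langle s(x), z - x\rangle$ vanishes at that point, which is the boundary of the half-space in (3).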


Remark 2.4 (i) Proposition 2.3(i) uses the Euclidean distance. Following [36, 8, 11, 10], one may define Bregman subgradient projectors for lsc and subdifferentiable functions. This will be explored in future work.

(ii) Proposition 2.3(ii) shows that for subdifferentiable functions, a fixed point of $G_f$ is a point $x \in \mathbb{R}^n$ such that $0 \in \partial f(x)$ or $f(x) \le 0$. Proposition 2.3(iii) shows that for convex functions, a fixed point of $G_f$ is a point $x \in \mathbb{R}^n$ such that $f(x) \le 0$.

We give two simple examples to illustrate the difference between subgradient projectors of convex functions and those of nonconvex functions.

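Example 2.5 Consider $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \sqrt[k]{\|x\|} = \|x\|^{1/k}$ where $k > 0$. Then

(i) When $k \le 1$, $f$ is convex and $G_f = (1-k)\operatorname{Id}$ is firmly nonexpansive.

(ii) When $k > 1$, $f$ is not convex, and $G_f = (1-k)\operatorname{Id}$ is not monotone and need not be nonexpansive, e.g., $k = 3$.

Proof. For $x \neq 0$, we have

$$\nabla f(x) = \frac{1}{k}\,\|x\|^{1/k-1}\,\frac{x}{\|x\|},$$

so that $\|\nabla f(x)\|^2 = k^{-2}\|x\|^{2/k-2}$ and $\frac{f(x)}{\|\nabla f(x)\|^2}\nabla f(x) = kx$. The result follows from the definition of $G_f$. ∎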
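The identity $G_f = (1-k)\operatorname{Id}$ in Example 2.5 is easy to confirm numerically. This is a small sketch with a hypothetical helper `G_root`; the sample points are arbitrary.

```python
import math

def G_root(x, k):
    """Subgradient projector of f(x) = ||x||^(1/k), using the gradient
    formula from the proof of Example 2.5 for x != 0."""
    r = math.sqrt(sum(xi * xi for xi in x))
    if r == 0.0:
        return list(x)  # f(0) = 0, so the origin is a fixed point
    fx = r ** (1.0 / k)
    coef = (1.0 / k) * r ** (1.0 / k - 1.0) / r  # grad f(x) = coef * x
    g = [coef * xi for xi in x]
    nsq = sum(gi * gi for gi in g)
    return [xi - fx / nsq * gi for xi, gi in zip(x, g)]

x, y = [3.0, 4.0], [1.0, -2.0]
half = G_root(x, 0.5)  # convex case: close to 0.5 * x
neg = G_root(x, 3.0)   # nonconvex case: close to -2 * x

# For k = 3 the distance between images doubles, so G_f is not nonexpansive.
ratio = math.dist(G_root(x, 3.0), G_root(y, 3.0)) / math.dist(x, y)
print(half, neg, ratio)
```

With $k = 3$ the map is $-2\operatorname{Id}$, so the printed ratio is close to 2, in line with item (ii).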

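Let $\mathbb{B}$ denote the closed unit ball of $\mathbb{R}^n$. According to [51, Exercise 8.14], for a nonempty $C \subseteq \mathbb{R}^n$ the normal cone and regular normal cone mappings are respectively defined by $N_C := \partial \iota_C$ and $\hat{N}_C := \hat{\partial} \iota_C$. Recall

Fact 2.6 ([51, Example 8.53]) For $f := d_C$ in the case of a closed set $C \neq \emptyset$ in $\mathbb{R}^n$, one has at any point $\bar{x} \in C$ that

$$\partial f(\bar{x}) = N_C(\bar{x}) \cap \mathbb{B}, \qquad \hat{\partial} f(\bar{x}) = \hat{N}_C(\bar{x}) \cap \mathbb{B}.$$

On the other hand, for any $\bar{x} \notin C$, one has

$$\partial f(\bar{x}) = \frac{\bar{x} - P_C(\bar{x})}{d_C(\bar{x})}, \qquad \hat{\partial} f(\bar{x}) = \begin{cases} \left\{\dfrac{\bar{x} - \tilde{x}}{d_C(\bar{x})}\right\} & \text{if } P_C(\bar{x}) = \{\tilde{x}\},\\ \emptyset & \text{otherwise.} \end{cases}$$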

Example 2.7 (subgradient projectors of distance functions) Let $C \neq \emptyset$ be a closed set in $\mathbb{R}^n$. Then $\mathcal{G}_{d_C} = P_C$.

Proof. Let $s$ be a selection of $\partial d_C$. By Fact 2.6,

$$(\forall x \notin C) \quad s(x) = \frac{x - p(x)}{d_C(x)},$$

where $p(x) \in P_C(x)$. We show $G_{d_C,s} = p$.

When $x \in C$, we have $P_C(x) = \{x\}$ and $p(x) = x$. Because $d_C(x) = 0$ for $x \in C$, the definition of $G_{d_C,s}$ gives $G_{d_C,s}(x) = x$. Thus $G_{d_C,s}(x) = x = p(x)$ for $x \in C$.

When $x \notin C$, $d_C(x) > 0$ and $0 \notin \partial d_C(x)$ because every $x^* \in \partial d_C(x)$ has $\|x^*\| = 1$ by Fact 2.6. Then for $x \notin C$,

$$G_{d_C,s}(x) = x - d_C(x)\,\frac{x - p(x)}{d_C(x)} = p(x).$$

Altogether, $G_{d_C,s}(x) = p(x)$ for every $x \in \mathbb{R}^n$. ∎
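Example 2.7 does not require convexity of $C$. As a rough illustration, the sketch below takes a hypothetical three-point set (closed and nonconvex, chosen only for this example) and checks that one step of the subgradient projector of $d_C$ lands on a nearest point.

```python
import math

# Hypothetical finite, hence closed and nonconvex, set C (an assumption).
C = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

def nearest(x):
    """One element p(x) of the projection P_C(x)."""
    return min(C, key=lambda c: math.dist(x, c))

def G_dist(x):
    """Subgradient projector of d_C with the selection s(x) = (x - p(x))/d_C(x)."""
    p = nearest(x)
    d = math.dist(x, p)
    if d == 0.0:
        return tuple(x)  # x lies in C, so it is a fixed point
    s = [(xi - pi) / d for xi, pi in zip(x, p)]  # unit-norm subgradient (Fact 2.6)
    nsq = sum(si * si for si in s)
    return tuple(xi - d / nsq * si for xi, si in zip(x, s))

for x in [(3.0, 1.0), (0.5, 2.5), (4.0, 0.0)]:
    print(x, G_dist(x), nearest(x))
```

At points with several nearest points the choice made by `min` corresponds to one particular selection $s$, which matches the set-valued reading of the statement.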

When $C$ is nonempty, closed and convex, the projection mapping $P_C$ is single-valued, and $d_C$ is continuously differentiable on $\mathbb{R}^n \setminus C$. Example 2.7 implies:

Fact 2.8 ([11], [25]) Let $C \subseteq \mathbb{R}^n$ be nonempty, closed and convex. Then $G_{d_C} = P_C$.

What happens if we take the subgradient projector of a distance function to a set where the distance is taken with respect to another norm? The following example illustrates that using the Euclidean norm for $d_C$ in Example 2.7 is essential.

Example 2.9 Define

$$f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1| + |x_2|,$$

the distance function to $C := \{(0, 0)\}$ in the $\ell_1$-norm. When $x_1 > 0$, $x_2 > 0$ and $x_1 \neq x_2$, we have $G_f(x_1, x_2) = ((x_1 - x_2)/2, (x_2 - x_1)/2) \neq (0, 0) = P_C(x_1, x_2)$. Even if $s(x_1, x_2)$ is measured in $\|\cdot\|_\infty$, the dual norm of $\|\cdot\|_1$, we have $G_f(x_1, x_2) = (-x_2, -x_1) \neq (0, 0) = P_C(x_1, x_2)$.
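The two formulas in Example 2.9 can be reproduced in a few lines. This sketch assumes a point off the coordinate axes, where $f$ is differentiable with gradient $(\operatorname{sign} x_1, \operatorname{sign} x_2)$; the helper names are illustrative only.

```python
def G_l1(x, norm_sq):
    """Step x - f(x)/N(s)^2 * s for f(x) = |x1| + |x2|, where N is the norm
    used to measure s. Valid for points off the coordinate axes."""
    fx = abs(x[0]) + abs(x[1])
    s = [1.0 if xi > 0 else -1.0 for xi in x]  # gradient of f away from the axes
    step = fx / norm_sq(s)
    return [xi - step * si for xi, si in zip(x, s)]

def euclid_sq(s):
    return sum(si * si for si in s)

def sup_sq(s):
    return max(abs(si) for si in s) ** 2

x = [3.0, 1.0]
print(G_l1(x, euclid_sq))  # [1.0, -1.0], i.e. ((x1 - x2)/2, (x2 - x1)/2)
print(G_l1(x, sup_sq))     # [-1.0, -3.0], i.e. (-x2, -x1)
```

Neither result equals the projection $(0, 0)$ onto $C$, which is the point of the example.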

Remark 2.10 Example 2.7 might lead the reader to believe that $G_f$ is a monotone operator. This holds for any twice differentiable convex function $f : \mathbb{R} \to \mathbb{R}$; see [12, Proposition 8.2]. However, this fails for $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1|^p + |x_2|^p$ when $1 < p < 2$; see [12, Proposition 10.1(iii)].

The following example shows that the assumption that $f$ be subdifferentiable is important in Definition 2.2.

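Example 2.11 The function defined by

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1/|x| & \text{if } x \neq 0,\\ 0 & \text{if } x = 0, \end{cases}$$

has $G_f = 2\operatorname{Id}$ on $\mathbb{R}$, so that $\operatorname{Fix} G_f = \operatorname{Fix} 2\operatorname{Id} = \{0\}$. However, the function defined by

$$g : \mathbb{R} \to (-\infty,+\infty] : x \mapsto \begin{cases} 1/|x| & \text{if } x \neq 0,\\ +\infty & \text{if } x = 0, \end{cases}$$

has $G_g = 2\operatorname{Id}$ on $\mathbb{R} \setminus \{0\}$. Because $G_g$ is not defined at $x = 0$, we have $\operatorname{Fix} G_g = \emptyset$ but $\operatorname{Fix} 2\operatorname{Id} = \{0\}$.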

3 Calculus for subgradient projectors

In this section we obtain calculus results for the subgradient projectors defined in Section 2: representations of subgradient projectors for max functions, for compositions of functions with a linear operator, and for positive powers of nonnegative functions. Subdifferential calculus is the main tool for proving these results.

A mapping $\Phi : \mathbb{R}^n \to \mathbb{R}^k$ is called strictly differentiable at $\bar{x}$ if the Fréchet derivative $\Phi'(\bar{x})$ exists and

$$\lim_{\substack{x, y \to \bar{x}\\ y \neq x}} \frac{\|\Phi(x) - \Phi(y) - \Phi'(\bar{x})(x - y)\|}{\|x - y\|} = 0.$$

The following facts on subdifferentials are crucial for studying the calculus of subgradient projectors.

Fact 3.1 ([39, Theorem 6.5], [40, Theorem 1.110(ii)]) Assume that $F : \mathbb{R}^n \to \mathbb{R}^k$ is locally Lipschitz at $x \in \mathbb{R}^n$, and $g : \mathbb{R}^k \to \mathbb{R}$ is strictly differentiable at $F(x)$. Then for $f(x) = g(F(x))$, one has

$$\partial f(x) = \partial \langle g'(y), F\rangle(x) \quad \text{with } y = F(x).$$

For a matrix $A : \mathbb{R}^n \to \mathbb{R}^n$, let $A^\intercal$ denote its transpose.

Fact 3.2 ([39, Theorem 6.7(i)], [40, Proposition 1.112(i)], or [51, Exercise 10.7]) Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be strictly differentiable at $x$ with $F'(x)$ being invertible, and suppose that $f(x) = g(F(x))$ with $g : \mathbb{R}^n \to (-\infty,+\infty]$ being lsc around $y = F(x)$ and $f$ being finite at $x$. Then

$$\partial f(x) = \big(F'(x)\big)^\intercal \partial g(y) \quad \text{with } y = F(x).$$

Fact 3.3 ([39, Theorem 7.5(ii)], [40, Theorem 3.46(ii)]) Let $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz at $x$ and $J(x) := \{j \mid f_j(x) = \max\{f_1, f_2\}(x)\}$. Then

$$\partial \max\{f_1, f_2\}(x) \subseteq \operatorname{conv} \bigcup \{\partial f_j(x) \mid j \in J(x)\},$$

where equality holds and $\max\{f_1, f_2\}$ is subdifferentially regular at $x$ if the functions $f_j$ are subdifferentially regular at $x$ for $j \in J(x)$.

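Proposition 3.4 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable.

(i) If $k > 0$ then $\mathcal{G}_{kf} = \mathcal{G}_f$.

(ii) Let $\alpha \in \mathbb{R}$ and $s$ be a selection of $\partial f$. Define

$$G_{f,\alpha} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x),\\ x & \text{otherwise.} \end{cases}$$

Then $G_{f,\alpha} = G_{f-\alpha,s}$.

(iii) Let $\alpha > 0$ and $s$ be a selection of $\partial f$. Then

$$G_{f-\alpha,s}(x) = \begin{cases} G_{f,s}(x) + \dfrac{\alpha\, s(x)}{\|s(x)\|^2} & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x),\\ x & \text{otherwise.} \end{cases}$$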

Proof. (i). By Fact 3.1, $\partial(kf) = k\,\partial f$. Note that $kf(x) > 0$ if and only if $f(x) > 0$, and $0 \notin \partial(kf)(x)$ if and only if $0 \notin \partial f(x)$. When $kf(x) > 0$ and $0 \notin \partial(kf)(x)$, for $s(x) \in \partial f(x)$ we have $ks(x) \in \partial(kf)(x)$, so that

$$G_{kf,ks}(x) = x - \frac{kf(x)}{\|ks(x)\|^2}\, ks(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) = G_{f,s}(x).$$

When $kf(x) \le 0$ or $0 \in \partial(kf)(x)$, we have $f(x) \le 0$ or $0 \in \partial f(x)$, so $G_{kf,ks}(x) = x = G_{f,s}(x)$.

(ii). It suffices to note that $\partial(f - \alpha) = \partial f$.

(iii). When $f(x) > \alpha$ and $0 \notin \partial f(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$. Then

$$G_{f-\alpha,s}(x) = x - \frac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x) = G_{f,s}(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x).$$

When $f(x) \le \alpha$ or $0 \in \partial f(x)$, $G_{f-\alpha,s}(x) = x$ by the definition. ∎
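The three items of Proposition 3.4 can be sanity-checked on one point. In the sketch below the function $f(x) = x_1^2 + 2x_2^2$, the constants $k = 5$ and $\alpha = 1$, and the helper `projector` are all assumptions chosen for illustration; the selection is the gradient.

```python
def projector(f, grad, x, level=0.0):
    """G_{f,alpha}(x) from Proposition 3.4(ii) with alpha = level;
    level = 0 gives the subgradient projector G_f(x) of a smooth f."""
    fx, s = f(x), grad(x)
    nsq = sum(si * si for si in s)
    if fx <= level or nsq == 0.0:
        return list(x)
    step = (fx - level) / nsq
    return [xi - step * si for xi, si in zip(x, s)]

# Hypothetical data (assumptions, not from the paper).
def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    return [2.0 * x[0], 4.0 * x[1]]

k, alpha = 5.0, 1.0
x = [2.0, 1.0]

G_f = projector(f, grad_f, x)
G_kf = projector(lambda z: k * f(z), lambda z: [k * g for g in grad_f(z)], x)
G_shift = projector(lambda z: f(z) - alpha, grad_f, x)  # projector of f - alpha
G_level = projector(f, grad_f, x, level=alpha)          # G_{f,alpha}

s = grad_f(x)
nsq = sum(si * si for si in s)
rhs = [g + alpha * si / nsq for g, si in zip(G_f, s)]   # right side of (iii)
print(G_f, G_kf, G_shift, G_level, rhs)
```

Scaling by $k$ leaves the result unchanged, while lowering the function by $\alpha$ shortens the step by $\alpha\, s(x)/\|s(x)\|^2$.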

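Proposition 3.5 Assume that $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz and subdifferentially regular. For the maximum function $g := \max\{f_1, f_2\}$, one has

$$\mathcal{G}_g(x) = \begin{cases} \mathcal{G}_{f_1}(x) & \text{if } g(x) > \max\{f_2(x), 0\},\ 0 \notin \partial f_1(x),\\ \mathcal{G}_{f_2}(x) & \text{if } g(x) > \max\{f_1(x), 0\},\ 0 \notin \partial f_2(x),\\ V(x) & \text{if } g(x) = f_1(x) = f_2(x) > 0,\ 0 \notin \operatorname{conv}\big(\partial f_1(x) \cup \partial f_2(x)\big),\\ x & \text{if } g(x) \le 0, \text{ or } 0 \in \operatorname{conv}\big(\partial f_1(x) \cup \partial f_2(x)\big), \end{cases}$$

where

$$V(x) := \left\{ x - \frac{f_i(x)}{\|s(x)\|^2}\, s(x) \;\middle|\; s(x) \in \operatorname{conv}\big(\partial f_1(x) \cup \partial f_2(x)\big) \right\}.$$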

Proof. When $g(x) > 0$, we consider three cases: (i) $g(x) > f_2(x)$; (ii) $g(x) > f_1(x)$; (iii) $g(x) \le \min\{f_2(x), f_1(x)\}$, which means $g(x) = f_1(x) = f_2(x)$. Also note that

$$\partial g(x) = \operatorname{conv}\big(\partial f_1(x) \cup \partial f_2(x)\big)$$

when $f_1(x) = f_2(x)$ by Fact 3.3. ∎
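To see the set $V(x)$ of Proposition 3.5 at work, consider the simplest possible case. The sketch assumes the hypothetical linear functions $f_1(x) = x_1$ and $f_2(x) = x_2$ on $\mathbb{R}^2$, so that $g = \max\{x_1, x_2\}$; the parameter `t` picks one convex combination of the two gradients at a tie.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def step(x, value, s):
    """x - value/||s||^2 * s for a nonzero vector s."""
    nsq = dot(s, s)
    return [xi - value / nsq * si for xi, si in zip(x, s)]

# Hypothetical data: f1(x) = x1 and f2(x) = x2 with constant gradients.
g1, g2 = [1.0, 0.0], [0.0, 1.0]

def projector_of_max(x, t=0.5):
    """One element of the projector of g = max{f1, f2} at x.
    At a tie, s = t*g1 + (1-t)*g2 with 0 <= t <= 1 selects a point of V(x)."""
    v1, v2 = x[0], x[1]
    gx = max(v1, v2)
    if gx <= 0:
        return list(x)
    if v1 > v2:
        return step(x, v1, g1)
    if v2 > v1:
        return step(x, v2, g2)
    s = [t * a + (1.0 - t) * b for a, b in zip(g1, g2)]
    return step(x, gx, s)

print(projector_of_max([2.0, 1.0]))          # branch of f1
print(projector_of_max([1.0, 1.0], t=0.5))   # an element of V(x)
print(projector_of_max([1.0, 1.0], t=0.25))  # another element of V(x)
```

Here $0$ never belongs to the convex hull of the two gradients, so only the first three branches of the formula and the case $g(x) \le 0$ occur.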

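Proposition 3.6 Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is lsc and subdifferentiable, and that $g(x) := f(kx)$ with $0 \neq k \in \mathbb{R}$. Then

$$\mathcal{G}_g(x) = \frac{1}{k}\,\mathcal{G}_f(kx)$$

for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} \mathcal{G}_g = \frac{1}{k} \operatorname{Fix} \mathcal{G}_f$.

Proof. By Fact 3.2, $\partial g(x) = k\,\partial f(y)$ where $y = kx$, so $0 \notin \partial g(x)$ if and only if $0 \notin \partial f(y)$ with $y = kx$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(kx) > 0$ and $0 \notin \partial f(kx)$, therefore

$$G_{g,ks(k\cdot)}(x) = x - \frac{f(kx)}{\|ks(y)\|^2}\, ks(y) = x - \frac{1}{k}\,\frac{f(kx)}{\|s(y)\|^2}\, s(y) \tag{5}$$

$$= \frac{1}{k}\left(kx - \frac{f(kx)}{\|s(y)\|^2}\, s(y)\right) = \frac{1}{k}\, G_{f,s}(kx), \tag{6}$$

where $s(y) \in \partial f(y)$ with $y = kx$.

When $g(x) \le 0$ or $0 \in \partial g(x)$, we have $f(y) \le 0$ or $0 \in \partial f(y)$ with $y = kx$, thus

$$G_{g,ks(k\cdot)}(x) = x = \frac{1}{k}\, kx = \frac{1}{k}\, G_{f,s}(kx).$$

This establishes the result. ∎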


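Proposition 3.7 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary and $b \in \mathbb{R}^n$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. Then

$$\mathcal{G}_g(x) = A^\intercal\big(\mathcal{G}_f(Ax + b) - b\big) \tag{7}$$

for every $x \in \mathbb{R}^n$. Furthermore,

$$\operatorname{Fix} \mathcal{G}_g = A^\intercal(\operatorname{Fix} \mathcal{G}_f - b). \tag{8}$$

Proof. Let $s$ be a selection of $\partial f$. By Fact 3.2, $\partial g(x) = A^\intercal \partial f(y)$ where $y = Ax + b$. As $A$ is unitary, $\|A^\intercal s(y)\| = \|s(y)\|$ for every $s(y) \in \partial f(y)$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(Ax + b) > 0$ and $0 \notin \partial f(y)$ with $y = Ax + b$, therefore

$$G_{g,A^\intercal s(A\cdot+b)}(x) = x - \frac{f(Ax + b)}{\|A^\intercal s(y)\|^2}\, A^\intercal s(y) = A^\intercal\left(Ax + b - \frac{f(Ax + b)}{\|s(y)\|^2}\, s(y) - b\right) \tag{9}$$

$$= A^\intercal\big(G_{f,s}(Ax + b) - b\big). \tag{10}$$

When $g(x) \le 0$ or $0 \in \partial g(x)$, we have $f(y) \le 0$ or $0 \in \partial f(y)$ with $y = Ax + b$, thus

$$G_{g,A^\intercal s(A\cdot+b)}(x) = x = A^\intercal(Ax + b - b) = A^\intercal\big(G_{f,s}(Ax + b) - b\big).$$

Hence (7) holds. Finally, (8) follows from (7). ∎

Corollary 3.8 Let $a \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $g(x) := f(x - a)$. Then $\mathcal{G}_g(x) = \mathcal{G}_f(x - a) + a$ for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} \mathcal{G}_g = a + \operatorname{Fix} \mathcal{G}_f$.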
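Formula (7) lends itself to a quick numerical check. The sketch below assumes a hypothetical smooth $f$, a planar rotation $A$ by an arbitrary angle, and an arbitrary shift $b$; the gradient of $g$ is obtained by the chain rule, in line with Fact 3.2.

```python
import math

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def projector(f, grad, x):
    """Subgradient projector of a smooth function, formula (1) with s = grad f."""
    fx, s = f(x), grad(x)
    nsq = sum(si * si for si in s)
    if fx <= 0 or nsq == 0.0:
        return list(x)
    return [xi - fx / nsq * si for xi, si in zip(x, s)]

# Hypothetical data (assumptions, not from the paper).
def f(y):
    return y[0] ** 2 + 2.0 * y[1] ** 2 - 1.0

def grad_f(y):
    return [2.0 * y[0], 4.0 * y[1]]

theta = 0.7
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]  # a rotation, hence unitary
At = [list(col) for col in zip(*A)]       # transpose of A
b = [0.5, -1.0]

def affine(x):
    return [yi + bi for yi, bi in zip(matvec(A, x), b)]

def g(x):
    return f(affine(x))

def grad_g(x):
    return matvec(At, grad_f(affine(x)))  # chain rule

x = [2.0, 1.5]
lhs = projector(g, grad_g, x)
inner = projector(f, grad_f, affine(x))
rhs = matvec(At, [yi - bi for yi, bi in zip(inner, b)])
print(lhs, rhs)
```

Taking `A` as the identity and `b = -a` in the same script gives the translation rule of Corollary 3.8.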

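Theorem 3.9 Assume that $f : \mathbb{R}^n \to \mathbb{R}_+$ is locally Lipschitz, and $g := f^k$ with $k > 0$. Then

$$\mathcal{G}_g = \left(1 - \frac{1}{k}\right)\operatorname{Id} + \frac{1}{k}\,\mathcal{G}_f.$$

Proof. By Fact 3.1, $\partial g(x) = k f(x)^{k-1}\,\partial f(x)$ when $f(x) > 0$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$, therefore

$$(\operatorname{Id} - G_{g,kf^{k-1}s})(x) = \frac{f(x)^k}{\|k f(x)^{k-1} s(x)\|^2}\, k f(x)^{k-1} s(x) = \frac{1}{k}\,\frac{f(x)}{\|s(x)\|^2}\, s(x) \tag{11}$$

$$= \frac{1}{k}\,(\operatorname{Id} - G_{f,s})(x). \tag{12}$$

When $g(x) = 0$ or $0 \in \partial g(x)$, we have $f(x) = 0$ or $0 \in \partial f(x)$, thus

$$(\operatorname{Id} - G_{g,kf^{k-1}s})(x) = 0 = (\operatorname{Id} - G_{f,s})(x).$$

Therefore, $\operatorname{Id} - G_{g,kf^{k-1}s} = \frac{1}{k}(\operatorname{Id} - G_{f,s})$, which gives $G_{g,kf^{k-1}s} = \left(1 - \frac{1}{k}\right)\operatorname{Id} + \frac{1}{k}\, G_{f,s}$. ∎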
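Theorem 3.9 can be illustrated with a nonsmooth $f$. The sketch assumes the hypothetical function $f(x) = |x_1| + 2|x_2|$ with the sign-based selection of $\partial f$, and the exponent $k = 3$; for this $f$, $0 \in \partial f(x)$ only at the origin, so testing the selection for zero is adequate here.

```python
def sign(t):
    return (t > 0) - (t < 0)

def projector(value, sel, x):
    """x - value(x)/||sel(x)||^2 * sel(x) if value(x) > 0 and sel(x) != 0, else x."""
    v, s = value(x), sel(x)
    nsq = sum(si * si for si in s)
    if v <= 0 or nsq == 0.0:
        return list(x)
    return [xi - v / nsq * si for xi, si in zip(x, s)]

# Hypothetical nonnegative, locally Lipschitz function (an assumption).
def f(x):
    return abs(x[0]) + 2.0 * abs(x[1])

def sel_f(x):
    return [float(sign(x[0])), 2.0 * sign(x[1])]  # a selection of the subdifferential

k = 3.0

def g(x):
    return f(x) ** k

def sel_g(x):
    return [k * f(x) ** (k - 1.0) * si for si in sel_f(x)]  # k f^(k-1) s

x = [3.0, 1.0]
G_f = projector(f, sel_f, x)
G_g = projector(g, sel_g, x)
combo = [(1.0 - 1.0 / k) * xi + (1.0 / k) * gi for xi, gi in zip(x, G_f)]
print(G_f, G_g, combo)
```

Raising $f$ to the power $k > 1$ therefore relaxes the step: the new projector moves only the fraction $1/k$ of the way from $x$ to $G_f(x)$.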

Remark 3.10 While Theorem 3.9 says that the convex combination of $\operatorname{Id}$ and $G_{f,s}$ is a subgradient projector, the set of subgradient projectors is not a convex set; see Theorems 10.1 and 10.3 in Section 10. Note that if the operators $U_i : H \to H$, $i \in I := \{1, 2, \ldots, m\}$, are cutters (see [20] or Definition 5.1) with a common fixed point, and $w : H \to \Delta_m$ is an appropriate weight function, then the operator $U := \sum_{i\in I} w_i U_i$ is a cutter, cf. [20, Corollary 2.1.49].


Corollary 3.11 For $f := d_C^2$ in the case of a closed set $C \neq \emptyset$ in $\mathbb{R}^n$, one has

$$\mathcal{G}_f = \frac{\operatorname{Id} + P_C}{2}.$$

Proof. Combine Theorem 3.9 and Example 2.7. ∎

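Example 3.12 (penalty function) Assume that $g : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz. In optimization, for a direct constraint given by

$$C := \{x \in \mathbb{R}^n \mid g(x) \le 0\},$$

one can define penalty substitutes. Two popular penalty functions associated with $g$ are the linear penalty $\theta_1 \circ g(x) = (g(x))_+$ and the quadratic penalty $\theta_2 \circ g(x) = (g(x))_+^2$, where $t_+ := \max\{0, t\}$, cf. [51, page 4]. We have

$$(\forall x \in \mathbb{R}^n) \quad \mathcal{G}_{\theta_1\circ g}(x) = \mathcal{G}_g(x).$$

Because $\theta_2 \circ g = (\theta_1 \circ g)^2$, by Theorem 3.9 we obtain

$$\mathcal{G}_{\theta_2\circ g} = \frac{\operatorname{Id} + \mathcal{G}_{\theta_1\circ g}}{2}.$$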
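The penalty identities of Example 3.12 can be observed on a linear constraint. In this sketch the constraint function $g(x) = x_1 + x_2 - 1$ is a hypothetical choice, and the gradients of the two penalties are written out by hand for points with $g(x) \neq 0$.

```python
def projector(f, grad, x):
    """Subgradient projector (1) for a function with a computable gradient selection."""
    fx, s = f(x), grad(x)
    nsq = sum(si * si for si in s)
    if fx <= 0 or nsq == 0.0:
        return list(x)
    return [xi - fx / nsq * si for xi, si in zip(x, s)]

# Hypothetical constraint function g(x) = x1 + x2 - 1 (an assumption).
def g(x):
    return x[0] + x[1] - 1.0

def grad_g(x):
    return [1.0, 1.0]

def theta1(x):  # linear penalty max{0, g(x)}
    return max(0.0, g(x))

def grad_theta1(x):
    return grad_g(x) if g(x) > 0 else [0.0, 0.0]

def theta2(x):  # quadratic penalty max{0, g(x)}^2
    return max(0.0, g(x)) ** 2

def grad_theta2(x):
    return [2.0 * max(0.0, g(x)) * gi for gi in grad_g(x)]

x = [2.0, 2.0]
G_g = projector(g, grad_g, x)
G_1 = projector(theta1, grad_theta1, x)
G_2 = projector(theta2, grad_theta2, x)
print(G_g, G_1, G_2)  # G_2 is the midpoint of x and G_1
```

Feasible points, where $g(x) \le 0$, are fixed by all three mappings.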

The following is immediate from the definition of subgradient projectors.

Proposition 3.13 Let $f, g : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable such that $f \equiv g$ on an open set $O \subseteq \mathbb{R}^n$. Then $\mathcal{G}_f = \mathcal{G}_g$ on $O$.

Remark 3.14 For calculus of subgradient projectors of convex functions, see [12, 44].

4 Basic properties of subgradient projectors

In this section we show, under appropriate conditions, that the subgradient projector determines a function uniquely up to a positive scalar multiple, that the subgradient projector enjoys the fixed point closedness property, and that the subgradient projector is continuous if the function is strictly differentiable.

We start with some elementary properties of subgradient projectors.

Theorem 4.1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by (1). Then the following hold:

(i) We have

$$\|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|} \tag{13}$$

and

$$\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{1}{f(x)}\, s(x) \tag{14}$$

for every $x$ satisfying $f(x) > 0$ and $0 \notin \partial f(x)$. In particular, when $f$ is locally Lipschitz, one has

$$\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)} \in \partial(\ln f)(x); \tag{15}$$

when $f$ is continuously differentiable, one has

$$\frac{x - G_f(x)}{\|x - G_f(x)\|^2} = \nabla(\ln f)(x). \tag{16}$$

(ii) Set $g := \ln f$ when $f > 0$. Then whenever $f(x) > 0$ and $0 \notin \partial f(x)$ we have

$$G_{f,s}(x) = x - \frac{c(x)}{\|c(x)\|^2}, \tag{17}$$

where $c(x) = \frac{s(x)}{f(x)} \in \partial g(x)$. If $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_0 f$, then

$$G_f(x) = x - \frac{\nabla g(x)}{\|\nabla g(x)\|^2} \tag{18}$$

whenever $f(x) > 0$ and $\nabla f(x) \neq 0$.

Proof. (i). By the definition of $G_{f,s}$, when $f(x) > 0$ and $0 \notin \partial f(x)$, $x - G_{f,s}(x) = \frac{f(x)}{\|s(x)\|^2}\, s(x)$. Therefore,

$$\|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|}.$$

It follows that

$$x - G_{f,s}(x) = \frac{f(x)^2}{\|s(x)\|^2}\,\frac{s(x)}{f(x)} = \|x - G_{f,s}(x)\|^2\,\frac{s(x)}{f(x)}, \tag{19}$$

equivalently, $\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)}$. When $f$ is locally Lipschitz, (15) holds because Fact 3.1 gives $\partial(\ln f)(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. When $f$ is continuously differentiable, $s(x) = \nabla f(x)$, hence (16) follows from $\nabla(\ln f)(x) = \nabla f(x)/f(x)$ when $f(x) > 0$.

(ii). By Fact 3.1, we have $\partial g(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. Then $c(x) = \frac{s(x)}{f(x)}$ where $s(x) \in \partial f(x)$. Formula (17) follows since $\frac{1}{\|c(x)\|^2} = \frac{f(x)^2}{\|s(x)\|^2}$ when $f(x) > 0$ and $0 \notin \partial f(x)$.

When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_0 f$, the same holds for $g$, so (18) follows. ∎
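Formulas (13) and (18) say that the projector only sees $f$ through the gradient of $\ln f$. The following sketch checks both on one point, assuming a hypothetical positive smooth function $f(x) = e^{x_1} + x_2^2$ chosen purely for illustration.

```python
import math

# Hypothetical positive smooth function (an assumption, not from the paper).
def f(x):
    return math.exp(x[0]) + x[1] ** 2

def grad_f(x):
    return [math.exp(x[0]), 2.0 * x[1]]

def grad_log_f(x):
    fx = f(x)
    return [gi / fx for gi in grad_f(x)]  # gradient of ln f

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

x = [0.3, -1.2]
fx, s = f(x), grad_f(x)

# Direct formula (1) with s = grad f.
G = [xi - fx / norm(s) ** 2 * si for xi, si in zip(x, s)]

# Formula (18): the same point from the gradient of g = ln f.
c = grad_log_f(x)
G_log = [xi - ci / norm(c) ** 2 for xi, ci in zip(x, c)]

# Formula (13): the step length equals f(x)/||s(x)||.
step_len = norm([xi - gi for xi, gi in zip(x, G)])
print(G, G_log, step_len, fx / norm(s))
```

One consequence, visible in (18), is that replacing $f$ by a positive multiple of $f$ does not change the projector, which is consistent with Proposition 3.4(i).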

4.1 When is a mapping T a subgradient projector?

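Theorem 4.2 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a mapping. The following are equivalent:

(i) $T$ is the subgradient projector of a locally Lipschitz function.

(ii) There exists a locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ such that

$$\begin{cases} \dfrac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) & \text{whenever } f(x) > 0 \text{ and } 0 \notin \partial f(x),\\[2mm] Tx = x & \text{whenever } f(x) \le 0 \text{ or } 0 \in \partial f(x). \end{cases} \tag{20}$$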


Proof. (i)⇒(ii). Suppose that $T = G_{f,s}$ with $f$ being locally Lipschitz. Apply Theorem 4.1(i) to obtain (20).

(ii)⇒(i). Assume that (20) holds. By Fact 3.1, $\partial \ln f = \partial f / f$ when $f > 0$. When $f(x) > 0$ and $0 \notin \partial f(x)$, (20) gives

\[
\frac{1}{\|x - Tx\|} = \frac{\|s(x)\|}{f(x)}, \quad \text{i.e.,} \quad \|x - Tx\| = \frac{f(x)}{\|s(x)\|},
\]

where $s(x) \in \partial f(x)$. Then using (20) again we obtain $x - Tx = \|x - Tx\|^2\, s(x)/f(x)$, so that

\[
Tx = x - \|x - Tx\|^2\, \frac{s(x)}{f(x)} = x - \left(\frac{f(x)}{\|s(x)\|}\right)^2 \frac{s(x)}{f(x)} = x - \frac{f(x)}{\|s(x)\|^2}\, s(x),
\]

as required. ∎

Can the functions in Theorem 4.2(i) and (ii) be different? This is answered in the next subsection.

4.2 Recovering f from its subgradient projector G f

Can one determine the function $f$ if $G_f$ is known? To this end, we recall the concept of essentially strictly differentiable functions by Borwein and Moors [15, Section 4].

Definition 4.3 A locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ is called essentially strictly differentiable on an open set $O \subseteq \mathbb{R}^n$ if $f$ is strictly differentiable everywhere on $O$ except possibly on a Lebesgue null set.

This class of functions has been extensively studied by Borwein and Moors [15]. It includes finite-valued convex functions, Clarke regular locally Lipschitz functions, semismooth locally Lipschitz functions, $C^1$ functions and others [15, pages 323-328]. If a locally Lipschitz function is essentially strictly differentiable, then $\partial f$ is single-valued almost everywhere. Moreover, the Clarke subdifferential $\partial_c f$, which can be written as $\operatorname{conv} \partial f$ (the convex hull of $\partial f$) when $f$ is locally Lipschitz [40, Theorem 3.57], can be recovered from every densely defined selection $s \in \partial f$; see, e.g., [15]. We refer the reader to [23] and [51] for details on the Clarke subdifferential.

Fact 4.4 Let $f, g$ be locally Lipschitz on a polygonally connected and open subset $O$ of $\mathbb{R}^n$. If $\nabla f = \nabla g$ almost everywhere on $O$, then $h := f - g$ is constant on $O$.

Proof. We argue by contradiction. Rademacher's Theorem says that a locally Lipschitz function is differentiable almost everywhere; see, e.g., [28, page 81]. By the assumption, $h$ is locally Lipschitz, so $\nabla h = 0$ almost everywhere. Suppose that $x, y \in O$ and $h(x) \neq h(y)$. As $O$ is polygonally connected, there exists $z \in O$ such that either $[x, z] \subseteq O$ with $h(x) \neq h(z)$ or $[z, y] \subseteq O$ with $h(z) \neq h(y)$. Without loss of generality, assume $[z, y] \subseteq O$ and $h(z) \neq h(y)$. As $h$ is differentiable almost everywhere, by Fubini's Theorem [49, Theorem 6.2.2, page 110], we can choose $\tilde{z}$ near $z$ and $\tilde{y}$ near $y$ so that $h$ is differentiable with $\nabla h = 0$ almost everywhere on $[\tilde{z}, \tilde{y}] \subseteq O$, and $h(\tilde{z}) \neq h(\tilde{y})$. Then

\[
h(\tilde{y}) - h(\tilde{z}) = \int_0^1 \langle \nabla h(\tilde{z} + t(\tilde{y} - \tilde{z})), \tilde{y} - \tilde{z} \rangle \, dt = \int_0^1 0 \, dt = 0,
\]

which contradicts $h(\tilde{z}) \neq h(\tilde{y})$. ∎


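Theorem 4.5 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a subgradient projector. Suppose that there exist two essentially strictly differentiable functions $f, f_1 : \mathbb{R}^n \to \mathbb{R}$ such that $G_{f,s} = T = G_{f_1,s_1}$ with $s$ being a selection of $\partial f$ and $s_1$ being a selection of $\partial f_1$. Then on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$ there exists $k > 0$ such that $f = k f_1$.

Proof. Assume that there exist two essentially strictly differentiable and locally Lipschitz functions $f, f_1$ such that $T = G_{f,s} = G_{f_1,s_1}$. Since $T$ has full domain, we have $\operatorname{dom} f = \operatorname{dom} f_1 = \mathbb{R}^n$. By Theorem 4.2, we have

\[
\frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) \quad \text{whenever } x \in \mathbb{R}^n \setminus \operatorname{Fix} T,
\]

\[
\frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f_1)(x) \quad \text{whenever } x \in \mathbb{R}^n \setminus \operatorname{Fix} T.
\]

As $f, f_1$ are locally Lipschitz, both $\ln f$ and $\ln f_1$ are locally Lipschitz on $\mathbb{R}^n \setminus \operatorname{Fix} T$. Then

\[
\partial \ln f = \frac{1}{f}\, \partial f, \qquad \partial \ln f_1 = \frac{1}{f_1}\, \partial f_1
\]

by Fact 3.1 or [23, Theorem 2.3.9(ii)]. Because $f, f_1$ are essentially strictly differentiable and locally Lipschitz, $\partial f, \partial f_1$ are single-valued almost everywhere [15], thus

\[
\partial(\ln f_1)(x) = \frac{x - T(x)}{\|x - T(x)\|^2} = \partial(\ln f)(x) \quad \text{almost everywhere on } \mathbb{R}^n \setminus \operatorname{Fix} T.
\]

By Fact 4.4, on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$, there exists $c \in \mathbb{R}$ such that $\ln f - \ln f_1 = c$, which implies that $f = k f_1$ for $k = e^c > 0$. ∎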

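Example 4.6 Define

\[
f_1 : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
2x & \text{if } x > 0,\\
0 & \text{if } -1 \le x \le 0,\\
-3(x + 1) & \text{if } x < -1,
\end{cases}
\]

and

\[
f_2 : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
x & \text{if } x > 0,\\
0 & \text{if } -1 \le x \le 0,\\
-(x + 1) & \text{if } x < -1.
\end{cases}
\]

Then

\[
(\forall x \in \mathbb{R}) \quad G_{f_1}(x) = G_{f_2}(x) =
\begin{cases}
0 & \text{if } x \ge 0,\\
x & \text{if } -1 \le x \le 0,\\
-1 & \text{if } x < -1.
\end{cases}
\]

The set $\mathbb{R} \setminus [-1, 0]$ has two connected components, $(-\infty, -1)$ and $(0, +\infty)$. We have $f_1 = 3 f_2$ on $(-\infty, -1)$, and $f_1 = 2 f_2$ on $(0, +\infty)$.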
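Example 4.6 can be reproduced with a few lines of code. This is a sketch only: the helper `projector_1d` is a hypothetical name, and it relies on the fact that on the real line $x - \frac{f(x)}{s(x)^2} s(x)$ simplifies to $x - f(x)/s(x)$.

```python
def projector_1d(f, df, x):
    # One-dimensional subgradient projector: x - f(x)/s(x) when f(x) > 0 and s(x) != 0,
    # and x otherwise. Here df returns the selected subgradient s(x).
    fx = f(x)
    if fx <= 0.0:
        return x
    s = df(x)
    if s == 0.0:
        return x
    return x - fx / s

def f1(x):
    return 2.0 * x if x > 0 else (0.0 if x >= -1 else -3.0 * (x + 1.0))

def df1(x):
    return 2.0 if x > 0 else (0.0 if x >= -1 else -3.0)

def f2(x):
    return x if x > 0 else (0.0 if x >= -1 else -(x + 1.0))

def df2(x):
    return 1.0 if x > 0 else (0.0 if x >= -1 else -1.0)

xs = [-3.0, -1.5, -1.0, -0.5, 0.0, 0.7, 4.0]
g1 = [projector_1d(f1, df1, x) for x in xs]
g2 = [projector_1d(f2, df2, x) for x in xs]
ratio_left = f1(-3.0) / f2(-3.0)   # scaling factor on (-inf, -1)
ratio_right = f1(4.0) / f2(4.0)    # scaling factor on (0, +inf)
print(g1, g2, ratio_left, ratio_right)
```

The two projectors coincide at every sample point, while the scaling factor between $f_1$ and $f_2$ differs on the two components, as Theorem 4.5 allows.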

The following example shows that Theorem 4.5 fails if one removes the assumption of essential strict differentiability.

Example 4.7 In [16], Borwein, Moors and Wang showed that generically nonexpansive Lipschitz functions have their limiting subdifferentials identically equal to the unit ball; see also [53]. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a locally Lipschitz function such that $\partial f(x) = B$ for every $x \in \mathbb{R}^n$. Since $0 \in \partial f(x)$ for every $x \in \mathbb{R}^n$, in view of Definition 2.2 we have $G_f = \operatorname{Id}$. As such, generically nonexpansive Lipschitz functions have a subgradient projector equal to the identity mapping.


4.3 Fixed point closed property and continuity

Definition 4.8 We say that an operator $T : D \to \mathbb{R}^n$ is fixed point closed at $x \in D$ if for every sequence $x_k \to x$ with $x_k - Tx_k \to 0$ one has $x = Tx$. If this holds for every $x \in D$, we say that $T$ has the fixed point closed property on $D$.

In [20], Cegielski refers to the fixed point closed property of $T$ as $\operatorname{Id} - T$ being closed at 0.

Theorem 4.9 (fixed-point closed property) Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and let $G_{f,s}$ be given by Definition 2.2. Then $G_{f,s}$ is fixed-point closed at every $x \in \mathbb{R}^n$, i.e.,

\[
\|y - G_{f,s}(y)\| \to 0 \ \text{ and } \ y \to x \quad \Rightarrow \quad x = G_{f,s}(x). \tag{21}
\]

Proof. Assume that a sequence $(y_n)_{n \in \mathbb{N}}$ in $\mathbb{R}^n$ satisfies

\[
\|y_n - G_{f,s}(y_n)\| \to 0 \ \text{ and } \ y_n \to x. \tag{22}
\]

Consider three cases.

Case 1. There exist infinitely many $y_n$'s, say $(y_{n_k})_{k \in \mathbb{N}}$, such that $0 \in \partial f(y_{n_k})$. Since $\partial f$ is upper semicontinuous, taking the limit as $k \to \infty$ gives $0 \in \partial f(x)$. Hence $x = G_{f,s}(x)$.

Case 2. There exist infinitely many $y_n$'s, say $(y_{n_k})_{k \in \mathbb{N}}$, such that $f(y_{n_k}) \le 0$. Taking the limit as $k \to \infty$ and using the continuity of $f$ at $x$ gives

\[
f(x) = \lim_{k \to \infty} f(y_{n_k}) \le 0.
\]

Hence $x = G_{f,s}(x)$.

Case 3. There exists $N \in \mathbb{N}$ such that $f(y_n) > 0$ and $0 \notin \partial f(y_n)$ when $n > N$. Then by (13),

\[
f(y_n) = \|y_n - G_{f,s}(y_n)\| \, \|s(y_n)\|. \tag{23}
\]

As $f$ is locally Lipschitz around $x$, it is continuous at $x$ and $\partial f$ is locally bounded around $x$. Therefore,

\[
f(x) = \lim_{n \to \infty} f(y_n) = \lim_{n \to \infty} \big( \|y_n - G_{f,s}(y_n)\| \, \|s(y_n)\| \big) = 0
\]

since $\|y_n - G_{f,s}(y_n)\| \to 0$. Hence $x = G_{f,s}(x)$.

Altogether, $x \in \operatorname{Fix} G_{f,s}$. This establishes (21) because $(y_n)_{n \in \mathbb{N}}$ was an arbitrary sequence satisfying (22). ∎

The following result generalizes [12, Theorem 5.6].

Theorem 4.10 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz and essentially strictly differentiable, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. Then the following statements are equivalent:

(i) $G_{f,s}$ is continuous at $x$.

(ii) $f$ is strictly differentiable at $x$.

Consequently, $G_{f,s}$ is continuous on $\mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$ if and only if $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$.


Proof. (ii)⇒(i). Assume that $f$ is strictly differentiable at $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. Under this assumption, $s : \mathbb{R}^n \to \mathbb{R}^n$ is continuous at $x$ and $s(x) \neq 0$. The result follows from the definition of $G_{f,s}$:

\[
y \mapsto y - \frac{f(y)}{\|s(y)\|^2}\, s(y).
\]

(i)⇒(ii). Assume that $G_{f,s}$ is continuous at $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_{f,s}$. By (14),

\[
s(y) = f(y)\, \frac{y - G_{f,s}(y)}{\|y - G_{f,s}(y)\|^2},
\]

so $s$ is continuous at $x$. Because $s$ is a selection of $\partial f$ and $f$ is essentially strictly differentiable, we conclude that $f$ is strictly differentiable at $x$.

Note that $\operatorname{Fix} G_{f,s}$ is closed by Proposition 2.3(ii). The remaining result follows from the fact that on an open set on which a function is finite, the function is continuously differentiable if and only if the function is strictly differentiable; cf. [51, Corollary 9.19]. ∎

We illustrate Theorem 4.10 by two examples.

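Example 4.11 Define

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
|x| & \text{if } x \le 1,\\
2x - 1 & \text{if } x > 1.
\end{cases}
\]

Then

\[
G_{f,s}(x) =
\begin{cases}
0 & \text{if } x < 1,\\
1/2 & \text{if } x > 1,\\
1 - \dfrac{1}{s(x)} \text{ where } s(x) \in [1, 2] & \text{if } x = 1,
\end{cases}
\]

is discontinuous at $x = 1$, because $f$ is not differentiable at $x = 1$.

Proof. When $x < 0$, $G_{f,s}(x) = x - \frac{-x}{(-1)^2}(-1) = 0$. When $x = 0$, $f(0) = 0$, so $G_{f,s}(0) = 0$. When $0 < x < 1$, $G_{f,s}(x) = x - \frac{x}{1^2}(1) = 0$. When $x > 1$, $G_{f,s}(x) = x - \frac{2x - 1}{2^2} \cdot 2 = 1/2$. When $x = 1$, $\partial f(1) = [1, 2]$, so

\[
G_{f,s}(x) = x - \frac{1}{s(x)^2}\, s(x) = x - \frac{1}{s(x)},
\]

where $s(x) \in [1, 2]$. ∎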

The next example gives a function that is differentiable but not strictly differentiable at 0, and whose subgradient projector is not continuous at 0.

Example 4.12 Define

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
x^2 \sin \dfrac{1}{x} + x + 1 & \text{if } x \neq 0,\\
1 & \text{if } x = 0.
\end{cases}
\]

Then $f$ is differentiable everywhere, but not strictly differentiable at 0. The subgradient projector

\[
G_f(x) =
\begin{cases}
x - \dfrac{x^2 \sin(1/x) + x + 1}{2x \sin(1/x) - \cos(1/x) + 1} & \text{if } f(x) > 0 \text{ and } f'(x) \neq 0,\\
x & \text{otherwise},
\end{cases}
\]

is not continuous at 0.


Proof. At $x = 0$, $f(0) = 1$ and $f'(0) = 1$. The function $f$ is not strictly differentiable at 0 because $f'$ is not continuous at 0. Since $\lim_{x \to 0} G_f(x)$ does not exist, the subgradient projector is not continuous at 0. ∎

How about the continuity of $G_{f,s}$ on $\operatorname{Fix} G_{f,s}$? Since $G_{f,s} = \operatorname{Id}$ on $\operatorname{Fix} G_{f,s}$, it is always continuous at $x \in \operatorname{int}(\operatorname{Fix} G_{f,s})$. The following result deals with the case of $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$.

Theorem 4.13 Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz, let $G_{f,s}$ be given by Definition 2.2, and let $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$.

(i) Assume that f (x) > 0 and 0 ∈ ∂ f (x). Then G f ,s is discontinuous at x.

(ii) Assume that f (x) ≤ 0. Suppose that one of the following holds:

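(a) There exists $\alpha > 0$ such that

\[
(\forall y : f(y) > 0,\ 0 \notin \partial f(y)) \quad \alpha f(y) + \langle s(y), x - y \rangle \le 0. \tag{24}
\]

In particular, this is true when $f$ is convex.

(b)

\[
0 \notin \partial f(x). \tag{25}
\]

(c)

\[
\liminf_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} \|s(y)\| > 0. \tag{26}
\]

Then $G_{f,s}$ is continuous at $x$.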

Proof. (i). As $x \in \operatorname{bdry}(\operatorname{Fix} G_{f,s})$, there exists a sequence $(y_k)_{k \in \mathbb{N}}$ such that $y_k \to x$, $f(y_k) > 0$ and $0 \notin \partial f(y_k)$. Because $f$ is locally Lipschitz and $s(y_k) \in \partial f(y_k)$, the sequence $(s(y_k))_{k \in \mathbb{N}}$ is bounded. By taking a subsequence if necessary, we can assume that $\|s(y_k)\| \to l \in \mathbb{R}_+$. Taking the limit as $k \to \infty$ yields

\[
\|y_k - G_{f,s}(y_k)\| = \frac{f(y_k)}{\|s(y_k)\|} \to \frac{f(x)}{l},
\]

which is $+\infty$ if $l = 0$ or a positive number if $l > 0$. Because $G_{f,s}(x) = x$, this shows that $G_{f,s}$ is not continuous at $x$.

(ii). To show that $G_{f,s}$ is continuous at $x$, it suffices to show that

\[
\lim_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} = 0. \tag{27}
\]

Indeed, by Theorem 4.1, when $f(y) > 0$ and $0 \notin \partial f(y)$, we have

\[
\|y - G_{f,s}(y)\| = \frac{f(y)}{\|s(y)\|}.
\]

Then (27) gives that $\lim_{y \to x} G_{f,s}(y) = \lim_{y \to x} (G_{f,s}(y) - y) + \lim_{y \to x} y = x$ along such $y$. When $y \in \operatorname{Fix} G_{f,s}$, $G_{f,s}(y) = y$, so clearly $\lim_{y \to x} G_{f,s}(y) = x$. Hence $G_{f,s}$ is continuous at $x$.


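Now (24) gives

\[
f(y) \le \frac{\langle s(y), y - x \rangle}{\alpha} \le \frac{\|s(y)\| \, \|y - x\|}{\alpha},
\]

so that

\[
\frac{f(y)}{\|s(y)\|} \le \frac{\|y - x\|}{\alpha},
\]

which implies (27).

Next, we show that (25) implies (26). Note that (25) gives $d_{\partial f(x)}(0) > 0$ since $\partial f(x)$ is closed by [51, Theorem 8.6]. Because $f$ is locally Lipschitz, in view of [51, Proposition 8.7], we have that $\limsup_{y \to x} \partial f(y) \subseteq \partial f(x)$, hence $0 \notin \partial f(y)$ for $y$ sufficiently near $x$. Invoking [51, Corollary 4.7(b)], we obtain

\[
\liminf_{y \to x} d_{\partial f(y)}(0) \ge d_{\partial f(x)}(0),
\]

from which it follows that

\begin{align}
\liminf_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} \|s(y)\| &\ge \liminf_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} d_{\partial f(y)}(0) \tag{28}\\
&\ge \liminf_{y \to x} d_{\partial f(y)}(0) \ge d_{\partial f(x)}(0) > 0, \tag{29}
\end{align}

and this gives (26).

Finally, (26) gives (27) because $\lim_{y \to x} f(y) = 0$ and

\[
0 \le \liminf_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} \le \limsup_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} \frac{f(y)}{\|s(y)\|} \le \frac{\lim_{y \to x,\ f(y) > 0} f(y)}{\liminf_{\substack{y \to x \\ f(y) > 0,\ 0 \notin \partial f(y)}} \|s(y)\|} = 0.
\]

Here is an example showing the result of Theorem 4.13(i).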

Here is an example showing the result of Theorem 4.13(i).

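Example 4.14 (1). Define $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^3 + 1$. Then

\[
G_f(x) =
\begin{cases}
\dfrac{2x}{3} - \dfrac{1}{3x^2} & \text{if } x \neq 0 \text{ and } x > -1,\\
x & \text{if } x = 0 \text{ or } x \le -1,
\end{cases}
\]

has $\operatorname{Fix} G_f = (-\infty, -1] \cup \{0\}$, and $G_f$ is not continuous at $x = 0$.

(2). Define

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
x + 1 & \text{if } x \le 0,\\
1 & \text{if } x \ge 0.
\end{cases}
\]

Then

\[
G_f(x) =
\begin{cases}
-1 & \text{if } -1 < x < 0,\\
x & \text{if } x \le -1 \text{ or } x \ge 0,
\end{cases}
\]

has $\operatorname{Fix} G_f = (-\infty, -1] \cup [0, +\infty)$, and $G_f$ is not continuous at $x = 0$.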
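The discontinuity in Example 4.14(1) is easy to observe numerically. The sketch below evaluates the subgradient projector of $f(x) = x^3 + 1$ along a sequence tending to 0; the function name `G` and the sample sequence are illustrative choices.

```python
def G(x):
    # Subgradient projector of f(x) = x^3 + 1 on the real line.
    fx = x ** 3 + 1.0
    s = 3.0 * x ** 2
    if fx <= 0.0 or s == 0.0:
        return x          # fixed point: f(x) <= 0 or f'(x) = 0
    return x - fx / s     # equals 2x/3 - 1/(3x^2)

# Evaluate along x_k = 10^(-k) -> 0.
values = [G(10.0 ** (-k)) for k in range(1, 6)]
print(G(0.0), values)
```

The values diverge to $-\infty$ while $G_f(0) = 0$, which matches the conclusion of Theorem 4.13(i), since $f(0) > 0$ and $f'(0) = 0$.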


4.4 The family of subgradient projectors

Theorem 4.15 Let f : Rn → R be locally Lipschitz. Then the following are equivalent:

(i) $G_f$ is single-valued.

(ii) $f$ is strictly differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} G_f$.

Proof. (i)⇒(ii). By (14) in Theorem 4.1, when $x \in \mathbb{R}^n \setminus \operatorname{Fix} G_f$, we have

\[
s(x) = f(x)\, \frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2},
\]

where $s(x) \in \partial f(x)$. By the assumption, $G_f = T$ for an everywhere single-valued $T : \mathbb{R}^n \to \mathbb{R}^n$, so

\[
s(x) = f(x)\, \frac{x - Tx}{\|x - Tx\|^2}.
\]

It follows that $\partial f(x)$ is a singleton, so $f$ is strictly differentiable at $x$ by [51, Theorem 9.18]. Therefore, $f$ is strictly differentiable on $\mathbb{R}^n \setminus \operatorname{Fix} G_f$.

(ii)⇒(i). Clear. ∎

Theorem 4.16 Let $C \subseteq \mathbb{R}^n$ be a nonempty closed set. Then the following are equivalent:

(i) $G_{d_C}$ is single-valued.

(ii) $C$ is convex.

Proof. According to Fact 2.6, we have $\operatorname{Fix} G_{d_C} = C$.

(i)⇒(ii). By Theorem 4.15, $d_C$ is strictly differentiable on $\mathbb{R}^n \setminus C$. Fact 2.6 shows that $P_C(x)$ is single-valued for every $x \in \mathbb{R}^n \setminus C$. Hence, $C$ is convex; cf. [27, Theorem 12.7].

(ii)⇒(i). Apply Fact 2.8. ∎

5 When is the subgradient projector G f a cutter or local cutter?

In this section we provide conditions for a subgradient projector to be a cutter or local cutter, and an explicit nonconvex function with a cutter subgradient projector. Along the way, some calculus for cutter subgradient projectors is also developed.

5.1 Cutters, quasi-firmly nonexpansive mappings, and local cutters

Recall the following well-known algorithmic operators.

Definition 5.1 ([20, page 53]) Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that $T$ is a cutter if $\operatorname{Fix} T \neq \emptyset$ and

\[
(\forall x \in D)(\forall u \in \operatorname{Fix} T) \quad \langle x - Tx, u - Tx \rangle \le 0. \tag{30}
\]


Definition 5.2 ([20, page 56]) Let $D$ be a nonempty subset of $\mathbb{R}^n$ and $T : D \to \mathbb{R}^n$. We say that $T$ is quasi-firmly nonexpansive (quasi-fne) if $\operatorname{Fix} T \neq \emptyset$ and

\[
(\forall x \in D)(\forall u \in \operatorname{Fix} T) \quad \|Tx - u\|^2 + \|x - Tx\|^2 \le \|x - u\|^2.
\]

In [20, page 56], quasi-fne mappings are called strongly quasinonexpansive mappings.

The following fact says that a cutter is strongly Fejér monotone with respect to the set of its fixed points, and that cutters and quasi-fne mappings are the same; see [20, page 108].

Fact 5.3 ([20, Theorem 2.1.39, Lemma 2.1.36])

(i) A mapping T : D → Rn is a cutter if and only if T is quasi-fne.

(ii) Let T : D → Rn be a cutter. Then T is always continuous on Fix T.

(iii) Let T : D → Rn be a cutter. Then Fix T is closed and convex.

Definitions 5.1 and 5.2 require that $T$ satisfy the inequalities for all $x \in D$ and $u \in \operatorname{Fix} T$. In practice, the sets $D$ and $\operatorname{Fix} T$ might be too large to verify those inequalities. We now introduce local cutters and locally quasi-firmly nonexpansive mappings.

Definition 5.4 A mapping $T : D \to \mathbb{R}^n$ is a local cutter at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

\[
(\forall x \in B(\bar{x}, \delta) \cap D)(\forall u \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \quad \langle x - Tx, u - Tx \rangle \le 0. \tag{31}
\]

Definition 5.5 A mapping $T : D \to \mathbb{R}^n$ is locally quasi-firmly nonexpansive (locally quasi-fne) at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

\[
(\forall x \in B(\bar{x}, \delta) \cap D)(\forall u \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \quad \|Tx - u\|^2 + \|Tx - x\|^2 \le \|x - u\|^2.
\]

A localized version of Fact 5.3(i) comes next.

Proposition 5.6 A mapping $T : D \to \mathbb{R}^n$ is a local cutter at $\bar{x} \in \operatorname{Fix} T$ if and only if $T$ is locally quasi-fne at $\bar{x} \in \operatorname{Fix} T$.

Proof. This follows from $\|x - u\|^2 = \|Tx - u\|^2 + \|x - Tx\|^2 + 2 \langle x - Tx, Tx - u \rangle$. ∎

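Proposition 5.7 Assume that $T : \mathbb{R} \to \mathbb{R}$ and $\operatorname{Fix} T \neq \emptyset$. Then $T$ is a cutter on $\mathbb{R}$ if and only if

\[
(\forall x \in \mathbb{R}) \quad Tx \in [x, P_{\operatorname{Fix} T}\, x]. \tag{32}
\]

Proof. The sufficiency is clear. Conversely, when $x \in \operatorname{Fix} T$, (32) clearly holds. Assume $x \notin \operatorname{Fix} T$ and $c \in \operatorname{Fix} T$. Because $T$ maps $\mathbb{R}$ to $\mathbb{R}$ and $x \neq c$, there exists $\lambda \in \mathbb{R}$ such that

\[
Tx = (1 - \lambda) x + \lambda c.
\]

As $T$ is a cutter, we have

\[
(x - Tx)(c - Tx) = -\lambda (1 - \lambda)(x - c)^2 \le 0,
\]

which gives $0 \le \lambda \le 1$, so that $Tx \in [x, c]$. Since $c \in \operatorname{Fix} T$ was arbitrary, it follows that $Tx \in [x, P_{\operatorname{Fix} T}\, x]$. ∎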
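Criterion (32) lends itself to a sampled check on the real line. The sketch below is illustrative only: `T1` is a hypothetical half-step toward the interval $[-1, 1]$, `T2` is the subgradient projector of $x \mapsto \exp|x|$ (whose only fixed point is 0), and the sample grid is an arbitrary choice. A sampled check can refute the cutter property but cannot prove it.

```python
def in_segment(t, a, b, tol=1e-12):
    # True when t lies in the closed segment between a and b.
    return min(a, b) - tol <= t <= max(a, b) + tol

def satisfies_32(T, proj_fix, xs):
    # Sampled version of (32): T(x) in [x, P_{Fix T}(x)] for every sample x.
    return all(in_segment(T(x), x, proj_fix(x)) for x in xs)

def clamp(x):
    # Projection onto [-1, 1].
    return max(-1.0, min(1.0, x))

def T1(x):
    # Hypothetical relaxed projection onto [-1, 1]; Fix T1 = [-1, 1].
    return 0.5 * (x + clamp(x))

def T2(x):
    # Subgradient projector of exp|x|: x - 1 for x > 0, x + 1 for x < 0, and 0 at 0.
    if x > 0:
        return x - 1.0
    if x < 0:
        return x + 1.0
    return 0.0

xs = [-3.0, -1.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ok_T1 = satisfies_32(T1, clamp, xs)
ok_T2 = satisfies_32(T2, lambda x: 0.0, xs)
# Direct check of the cutter inequality (30) for T2 at x = 0.5, u = 0.
violation = (0.5 - T2(0.5)) * (0.0 - T2(0.5))
print(ok_T1, ok_T2, violation)
```

For `T2`, the point $x = 0.5$ is mapped to $-0.5$, which overshoots the fixed point 0, so both (32) and (30) fail there.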


Remark 5.8 Compare Proposition 5.7 to Corollary 8.4, which characterizes the subgradient projector of a convex function on $\mathbb{R}$.

For a nonempty convex set $C \subseteq \mathbb{R}^n$, the recession cone of $C$ is $\operatorname{rec} C := \{ x \in \mathbb{R}^n \mid x + C \subseteq C \}$. The negative polar of $K \subseteq \mathbb{R}^n$ is $K^- := \{ y \in \mathbb{R}^n \mid \langle y, x \rangle \le 0, \ \forall x \in K \}$.

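Proposition 5.9 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a cutter. Then

\[
\operatorname{ran}(\operatorname{Id} - T) \subseteq (\operatorname{rec}(\operatorname{Fix} T))^-.
\]

Consequently, when $\operatorname{Fix} T$ is a linear subspace, $\operatorname{ran}(\operatorname{Id} - T) \subseteq (\operatorname{Fix} T)^\perp$. In other words, $\operatorname{ran}(\operatorname{Id} - T) \subseteq (\ker(\operatorname{Id} - T))^\perp$.

Proof. Let $x - Tx \in \operatorname{ran}(\operatorname{Id} - T)$ and $v \in \operatorname{rec}(\operatorname{Fix} T)$. Then for every $k > 0$ and $u \in \operatorname{Fix} T$, we have $u + kv \in \operatorname{Fix} T$. The assumption of $T$ being a cutter implies

\[
\langle x - Tx, u + kv - Tx \rangle \le 0 \quad \Rightarrow \quad \langle x - Tx, u/k + v - Tx/k \rangle \le 0.
\]

When $k \to \infty$ this gives $\langle x - Tx, v \rangle \le 0$. Since $v \in \operatorname{rec}(\operatorname{Fix} T)$ was arbitrary, we have $x - Tx \in (\operatorname{rec}(\operatorname{Fix} T))^-$.

When $\operatorname{Fix} T$ is a linear subspace, $\operatorname{Fix} T = \operatorname{rec}(\operatorname{Fix} T)$ and $(\operatorname{rec}(\operatorname{Fix} T))^- = (\operatorname{rec}(\operatorname{Fix} T))^\perp$. ∎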

5.2 Characterizations of G f being a cutter or local cutter

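Our first result characterizes the functions $f$ whose subgradient projector $G_{f,s}$ is a cutter.

Lemma 5.10 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $f(x) > 0$ and $0 \notin \partial f(x)$. Then

\[
\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \frac{f(x)}{\|s(x)\|^2} \big( f(x) + \langle s(x), u - x \rangle \big).
\]

Proof. Let $f(x) > 0$, $0 \notin \partial f(x)$. The definition of $G_{f,s}$ gives

\begin{align*}
\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle
&= \left\langle \frac{f(x)\, s(x)}{\|s(x)\|^2},\ u - x + \frac{f(x)}{\|s(x)\|^2}\, s(x) \right\rangle \\
&= \frac{f(x)}{\|s(x)\|^2} \langle s(x), u - x \rangle + \frac{f^2(x)}{\|s(x)\|^2} \\
&= \frac{f(x)}{\|s(x)\|^2} \big( f(x) + \langle s(x), u - x \rangle \big).
\end{align*}

Theorem 5.11 (level sets of tangent planes including the target set) Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, let $G_{f,s}$ be given by Definition 2.2, and

\[
S := \{ u \in \mathbb{R}^n \mid f(u) \le 0 \text{ or } 0 \in \partial f(u) \}.
\]

Then the following hold: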


(i) $G_{f,s}$ is a cutter if and only if whenever $x \notin S$ and $u \in S$ one has

\[
f(x) + \langle s(x), u - x \rangle \le 0.
\]

(ii) Let $\bar{x} \in S$ and $\delta > 0$. $G_{f,s}$ is a cutter on $B(\bar{x}, \delta)$ if and only if for all $x \in B(\bar{x}, \delta) \setminus S$ and $u \in S \cap B(\bar{x}, \delta)$ one has

\[
f(x) + \langle s(x), u - x \rangle \le 0.
\]

Proof. (i). When $f(x) \le 0$ or $0 \in \partial f(x)$, $x = G_{f,s}(x)$, so (30) holds for $T = G_{f,s}$. Assume that $f(x) > 0$, $0 \notin \partial f(x)$ and $s(x) \in \partial f(x)$. By Lemma 5.10,

\[
\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle = \frac{f(x)}{\|s(x)\|^2} \big( f(x) + \langle s(x), u - x \rangle \big).
\]

Since $f(x) > 0$, we deduce that

\[
\langle x - G_{f,s}(x), u - G_{f,s}(x) \rangle \le 0 \quad \Leftrightarrow \quad f(x) + \langle s(x), u - x \rangle \le 0.
\]

Hence, the result follows from Definition 5.1.

(ii). Apply the same arguments as above with $x \in B(\bar{x}, \delta)$ and $u \in S \cap B(\bar{x}, \delta)$. ∎

One immediately obtains the following:

Fact 5.12 ([20, page 146]) Let $f : \mathbb{R}^n \to \mathbb{R}$ be convex, let $G_{f,s}$ be given by Definition 2.2, and suppose $\operatorname{lev}_0 f \neq \emptyset$. Then $G_{f,s}$ is a cutter. Consequently, $G_{f,s}$ is continuous at every $x \in \operatorname{lev}_0 f$.

Proof. As $\operatorname{lev}_0 f \neq \emptyset$, $\operatorname{Fix} G_{f,s} = \operatorname{lev}_0 f$. Assume that $f(x) > 0$. For $u \in \operatorname{Fix} G_{f,s}$, $f(u) \le 0$. By the convexity of $f$ we have

\[
f(x) + \langle s(x), u - x \rangle \le f(u) \le 0.
\]

Theorem 5.11 shows that $G_{f,s}$ is a cutter. The remaining result follows from Fact 5.3(ii). ∎

In Fact 5.12, lev0 f 6= ∅ is required, as the following example shows.

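Example 5.13 (1). Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $(\forall x \in \mathbb{R})\ f(x) := \exp |x|$. Then $\operatorname{lev}_0 f = \emptyset$ and

\[
G_f(x) =
\begin{cases}
x - 1 & \text{if } x > 0,\\
0 & \text{if } x = 0,\\
x + 1 & \text{if } x < 0.
\end{cases}
\]

In particular, this $G_f$ is discontinuous at $x = 0$ and not a cutter. Moreover, $G_f$ is not monotone.

(2). Consider

\[
f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \exp(\|x\|^2 / 2).
\]

We have $\operatorname{lev}_0 f = \emptyset$ and

\[
G_f(x) =
\begin{cases}
x - \dfrac{x}{\|x\|^2} & \text{if } x \neq 0,\\
0 & \text{if } x = 0.
\end{cases}
\]

In particular, $G_f$ is not continuous at 0, so not a cutter.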


Example 5.14 The nonconvex function

\[
f : \mathbb{R}^n \to \mathbb{R} : x \mapsto
\begin{cases}
\|x\|^2 & \text{if } \|x\| \le 1,\\
1 & \text{if } \|x\| > 1,
\end{cases}
\]

has $G_f$ being a cutter on a neighborhood of 0, but not a cutter on $\mathbb{R}^n$.

It is instructive to consider $d_C$ where $C \subseteq \mathbb{R}^n$ is closed and nonempty.

Proposition 5.15 Let $C \subseteq \mathbb{R}^n$ be closed and nonempty, and let $s$ be a selection of $\partial d_C$. Then $G_{d_C,s}$ is a cutter if and only if the set $C$ is convex.

Proof. By Fact 2.6, $0 \notin \partial d_C(x)$ whenever $x \notin C$, because $\|v\| = 1$ for every $v \in \partial d_C(x)$. This implies that $\operatorname{Fix} G_{d_C,s} = C$.

Assume that $G_{d_C,s}$ is a cutter. Then $\operatorname{Fix} G_{d_C,s} = C$ is convex by Fact 5.3(iii). Conversely, assume that $C$ is convex. Then $d_C$ is convex and, consequently, $G_{d_C,s}$ is a cutter by Fact 5.12. ∎

Theorem 5.16 Let $k \ge 1$, let $f : \mathbb{R}^n \to [0, +\infty)$ be locally Lipschitz, and let $G_{f,s}$ be given by Definition 2.2. Suppose that $G_{f,s}$ is a cutter, and $\operatorname{Fix} G_{f,s} \neq \emptyset$. If $g = f^k$, then $G_{g,\, k f^{k-1} s}$ is a cutter.

Proof. By Theorem 3.9, $G_{g,\, k f^{k-1} s} = (1 - 1/k) \operatorname{Id} + (1/k)\, G_{f,s}$. As $\operatorname{Id}$ and $G_{f,s}$ are both cutters and $\operatorname{Fix} G_{f,s} \cap \operatorname{Fix} \operatorname{Id} = \operatorname{Fix} G_{f,s} \neq \emptyset$, the convex combination $G_{g,\, k f^{k-1} s}$ is a cutter by [20, Corollary 2.1.49, page 62]. ∎

In Corollary 11.6, we give an example showing that even though $G_{f^2,\, 2fs}$ is a cutter, $G_{f,s}$ might not be a cutter; so the converse of Theorem 5.16 is not true.

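Theorem 5.17 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary, $b \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. If $G_{f,s}$ is a cutter, then $G_{g,\, A^\top s(A \cdot + b)}$ is a cutter.

Proof. Let $x \in \mathbb{R}^n$, $u \in \operatorname{Fix} G_{g,\, A^\top s(A \cdot + b)}$. Proposition 3.7 gives

\[
G_{g,\, A^\top s(A \cdot + b)}(x) = A^\top \big( G_{f,s}(Ax + b) - b \big), \qquad Au + b \in \operatorname{Fix} G_{f,s}.
\]

Since $A$ is unitary and $G_{f,s}$ is a cutter, we have

\begin{align}
\|x - G_{g,\, A^\top s(A \cdot + b)}(x)\|^2 &= \|x - A^\top \big( G_{f,s}(Ax + b) - b \big)\|^2 = \|Ax + b - G_{f,s}(Ax + b)\|^2 \tag{33}\\
&\le \|Ax + b - (Au + b)\|^2 - \|G_{f,s}(Ax + b) - (Au + b)\|^2 \tag{34}\\
&= \|x - u\|^2 - \|G_{f,s}(Ax + b) - Au - b\|^2 \tag{35}\\
&= \|x - u\|^2 - \|A^\top \big( G_{f,s}(Ax + b) - b \big) - u\|^2 \tag{36}\\
&= \|x - u\|^2 - \|G_{g,\, A^\top s(A \cdot + b)}(x) - u\|^2. \tag{37}
\end{align}

Hence $G_{g,\, A^\top s(A \cdot + b)}$ is a cutter by Fact 5.3(i). ∎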

Corollary 5.18 Let $B$ be an $n \times n$ symmetric matrix. Define

\[
f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \tfrac{1}{2}\, x^\top B x.
\]


Then

\[
G_f(x) =
\begin{cases}
x - \dfrac{x^\top B x}{2 \|Bx\|^2}\, Bx & \text{if } x^\top B x > 0 \text{ and } Bx \neq 0,\\
x & \text{otherwise}.
\end{cases} \tag{38}
\]

Moreover, the following are equivalent:

(i) $G_f$ is a cutter.

(ii) $B$ is positive semidefinite or negative semidefinite.

Proof. (38) follows from Definition 2.2.

Because $B$ is symmetric, there exists an orthogonal matrix $Q$ such that $Q^\top B Q = D$, where $D$ is an $n \times n$ diagonal matrix whose diagonal entries are the eigenvalues of $B$. Using $x = Qy$, Theorem 5.17 shows that $G_f$ is a cutter if and only if $G_g$ is a cutter, where $g : \mathbb{R}^n \to \mathbb{R} : y \mapsto f(Qy) = \tfrac{1}{2}\, y^\top D y$.

(i)⇒(ii). (i) implies that $G_g$ is a cutter. This means that

\[
(\forall y \in \mathbb{R}^n : y^\top D y > 0)(\forall u \in \mathbb{R}^n : u^\top D u \le 0) \quad y^\top D u \le \tfrac{1}{2}\, y^\top D y. \tag{39}
\]

We will show that all nonzero diagonal entries of $D$ have the same sign. Suppose to the contrary that there exist diagonal entries of $D$ such that $\lambda_i > 0$, $\lambda_j < 0$. Put $y_k = 0$, $u_k = 0$ for $k = 1, \ldots, n$, $k \neq i, j$. Then (39) reduces to the following: whenever $\lambda_i y_i^2 + \lambda_j y_j^2 > 0$ and

\[
\lambda_i u_i^2 + \lambda_j u_j^2 \le 0, \tag{40}
\]

we have

\[
\lambda_i y_i u_i + \lambda_j y_j u_j \le \tfrac{1}{2} (\lambda_i y_i^2 + \lambda_j y_j^2). \tag{41}
\]

Fix $(y_i, y_j)$ such that $\lambda_i y_i^2 + \lambda_j y_j^2 > 0$ and $y_j < 0$. When $u_i = 0$ and $u_j \to +\infty$, (40) is satisfied but (41) fails to hold. This contradicts that $G_g$ is a cutter. Hence all nonzero diagonal entries of $D$ must have the same sign, which implies that $B$ is positive semidefinite if the sign is positive, and negative semidefinite if the sign is negative.

(ii)⇒(i). When $B$ is positive semidefinite, $f$ is convex and we apply Fact 5.12. When $B$ is negative semidefinite, $G_g = \operatorname{Id}$ is a cutter. ∎
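The contradiction step in the proof can be made concrete. The sketch below uses an indefinite diagonal matrix $D = \operatorname{diag}(1, -1)$ and exhibits a pair $(y, u)$ that satisfies the hypotheses of (39) but violates its conclusion; the specific vectors and the helper names `quad` and `violates_39` are illustrative assumptions.

```python
def quad(D, y, u):
    # y^T D u for a diagonal matrix D given by the list of its diagonal entries.
    return sum(d * yi * ui for d, yi, ui in zip(D, y, u))

def violates_39(D, y, u):
    # (39) requires y^T D u <= (1/2) y^T D y whenever y^T D y > 0 and u^T D u <= 0.
    if quad(D, y, y) <= 0 or quad(D, u, u) > 0:
        return False  # the pair (y, u) is not covered by (39)
    return quad(D, y, u) > 0.5 * quad(D, y, y)

indefinite = [1.0, -1.0]   # lambda_i > 0 and lambda_j < 0
y = [2.0, -1.0]            # y^T D y = 3 > 0 and y_j < 0
u = [0.0, 5.0]             # u_i = 0, u_j large: u^T D u = -25 <= 0

semidefinite = [1.0, 2.0]  # positive definite, for comparison
print(violates_39(indefinite, y, u), violates_39(semidefinite, y, [0.0, 0.0]))
```

In the indefinite case $y^\top D u = 5$ exceeds $\tfrac12 y^\top D y = 1.5$, so (39) fails. In the positive definite case the only admissible $u$ is the origin, and no violation can occur.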

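Theorem 5.19 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by Definition 2.2. Assume that $k \in \mathbb{R} \setminus \{0\}$ and $g(x) = f(kx)$. If $G_{f,s}$ is a cutter, then $G_{g,\, ks(k \cdot)}$ is a cutter.

Proof. Proposition 3.6 gives $G_{g,\, ks(k \cdot)}(x) = \tfrac{1}{k}\, G_{f,s}(kx)$ and $\operatorname{Fix} G_{g,\, ks(k \cdot)} = \tfrac{1}{k} \operatorname{Fix} G_{f,s}$. Let $x \in \mathbb{R}^n$ and $u \in \operatorname{Fix} G_{g,\, ks(k \cdot)}$. We have

\begin{align}
\langle x - G_{g,\, ks(k \cdot)}(x), u - G_{g,\, ks(k \cdot)}(x) \rangle &= \left\langle x - \tfrac{1}{k}\, G_{f,s}(kx),\ u - \tfrac{1}{k}\, G_{f,s}(kx) \right\rangle \tag{42}\\
&= \tfrac{1}{k^2} \langle kx - G_{f,s}(kx), ku - G_{f,s}(kx) \rangle \le 0 \tag{43}
\end{align}

since $G_{f,s}$ is a cutter and $ku \in \operatorname{Fix} G_{f,s}$. Therefore, $G_{g,\, ks(k \cdot)}$ is a cutter. ∎

One might ask: If each function $f_i : \mathbb{R}^n \to \mathbb{R}$ has $G_{f_i}$ being a cutter, must the maximum $g := \max\{f_1, f_2\}$ have $G_g$ being a cutter? The answer is negative, as the following example shows.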


Example 5.20 Let $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ be defined by $f_1(x) := 1 + x$ and $f_2(x) := 1 - x$. Each $G_{f_i}$ is a cutter by Fact 5.12. The function $g(x) := \max\{f_1(x), f_2(x)\}$ has

\[
G_g(x) =
\begin{cases}
-1 & \text{if } x > 0,\\
0 & \text{if } x = 0,\\
1 & \text{if } x < 0,
\end{cases}
\]

which is not continuous at $x = 0$, so $G_g$ is not a cutter.

5.3 A nonconvex function whose G f is a cutter

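Example 5.21 If $f$ is not convex, $G_{f,s}$ need not be a cutter. Consider

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto 1 - \exp(-x^2).
\]

Then the subgradient projector of $f$ is

\[
G_{f,s}(x) =
\begin{cases}
x - \left( \dfrac{1}{2x \exp(-x^2)} - \dfrac{1}{2x} \right) & \text{if } x \neq 0,\\
0 & \text{if } x = 0,
\end{cases}
\]

and $\operatorname{Fix} G_f = \{0\}$. However, $G_f$ is not a cutter. Indeed, when $|x| > \sqrt{2}$ we have

\begin{align*}
f(x) + s(x)(0 - x) &= 1 - \exp(-x^2) + (2x \exp(-x^2))(0 - x) \\
&= \frac{\exp(x^2) - (1 + 2x^2)}{\exp(x^2)} \ge \frac{1 + x^2 + \frac{x^4}{2} - (1 + 2x^2)}{\exp(x^2)} \\
&= \frac{x^2 (x^2 - 2)}{2 \exp(x^2)} > 0.
\end{align*}

By Theorem 5.11, $G_f$ is not a cutter.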
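The sign of the quantity $f(x) + s(x)(0 - x)$ in Example 5.21 can be inspected directly. The sketch below evaluates it at a few points; the function name `tangent_value` and the sample points are illustrative.

```python
import math

def f(x):
    return 1.0 - math.exp(-x * x)

def df(x):
    return 2.0 * x * math.exp(-x * x)

def tangent_value(x, u=0.0):
    # Value at u of the tangent line to f at x: f(x) + f'(x) (u - x).
    # Theorem 5.11 requires this to be <= 0 for u in Fix G_f = {0}.
    return f(x) + df(x) * (u - x)

samples = [0.5, 1.5, 2.0, 3.0]
values = [tangent_value(x) for x in samples]
print(list(zip(samples, values)))
```

The value is negative at $x = 0.5$ but positive at the sample points beyond $\sqrt{2}$, so the cutter inequality of Theorem 5.11 fails there.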

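Example 5.22 Even though $f$ is not convex, $G_{f,s}$ may still be a cutter. Define

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
0 & \text{if } x \le 0,\\
x & \text{if } 0 \le x \le 20/7,\\
8(x - 2.5) & \text{if } 20/7 \le x \le 3,\\
2(x - 1) & \text{if } x > 3.
\end{cases}
\]

Then $f$ is not convex since $f'(x)$ is not monotone on $[20/7, +\infty)$. However, its subgradient projector

\[
G_{f,s}(x) =
\begin{cases}
x & \text{if } x \le 0,\\
0 & \text{if } 0 < x < 20/7,\\
\dfrac{20}{7} - \dfrac{20}{7} \dfrac{1}{s(x)} & \text{if } x = 20/7, \text{ where } s(x) \in [1, 8],\\
2.5 & \text{if } 20/7 < x < 3,\\
3 - \dfrac{4}{s(x)} & \text{if } x = 3, \text{ where } s(x) \in \{2, 8\},\\
1 & \text{if } x > 3,
\end{cases}
\]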


is a cutter. To see this, by Theorem 5.11, it suffices to consider zero level sets of tangent planes. Indeed, let $f(u) \le 0$, i.e., $u \le 0$. When $x_0 > 3$,

\[
f(x_0) + s(x_0)(u - x_0) = 2(u - 1) \le 0;
\]

when $x_0 = 3$,

\[
f(x_0) + s(x_0)(u - x_0) = 4 + s(3)(u - 3) \le 4 + 2(u - 3) \le 0,
\]

where $2 \le s(3) \le 8$; when $20/7 < x_0 < 3$,

\[
f(x_0) + s(x_0)(u - x_0) = 8(u - 2.5) \le 0;
\]

when $x_0 = 20/7$,

\[
f(x_0) + s(x_0)(u - x_0) = 20/7 + s(20/7)(u - 20/7) \le u \le 0,
\]

where $1 \le s(20/7) \le 8$; when $0 < x_0 < 20/7$,

\[
f(x_0) + s(x_0)(u - x_0) = u \le 0.
\]
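The case analysis of Example 5.22 can be cross-checked by sampling. This is a sketch under stated assumptions: the grids `xs` and `us` are arbitrary, and at the two kinks only the extreme slopes are tested, which suffices because the tested expression is affine in the slope.

```python
K = 20.0 / 7.0  # location of the first kink in Example 5.22

def f(x):
    if x <= 0.0:
        return 0.0
    if x <= K:
        return x
    if x <= 3.0:
        return 8.0 * (x - 2.5)
    return 2.0 * (x - 1.0)

def extreme_slopes(x):
    # Extreme subgradients of f at x > 0.
    if x < K:
        return [1.0]
    if x == K:
        return [1.0, 8.0]
    if x < 3.0:
        return [8.0]
    if x == 3.0:
        return [2.0, 8.0]
    return [2.0]

xs = [0.01 * i for i in range(1, 600)] + [K, 3.0]   # points with f(x) > 0
us = [-0.5 * j for j in range(0, 21)]               # points u <= 0, i.e. f(u) <= 0
worst = max(f(x) + s * (u - x)
            for x in xs for s in extreme_slopes(x) for u in us)
print(worst)
```

The largest sampled value of $f(x_0) + s(x_0)(u - x_0)$ is 0 up to rounding, which is consistent with the inequality of Theorem 5.11 on this grid.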

See Corollary 11.6(ii) for an example on $\mathbb{R}^2$. Note that even if $G_f$ is continuous, it does not mean that $G_f$ is a cutter; e.g., see Example 2.5(ii). In [20], Cegielski developed a systematic theory for cutters. The theory of cutters can be used to study the class of functions (Theorem 5.11) whose subgradient projectors are cutters.

One might also ask: If $f : \mathbb{R}^n \to \mathbb{R}$ has $G_{f,s}$ being a cutter, does $g := f + r$ have $G_{g,s}$ being a cutter for every $r \in \mathbb{R}$? In general, the answer is negative. When $f$ is convex and $\operatorname{lev}_0 f \neq \emptyset$, it follows from Fact 5.12 that $G_{f-r,s}$ is a cutter whenever $r > 0$. This might fail for $r < 0$, as the following example shows.

Example 5.23 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) := e^{|x|} - 1$. Then

\[
G_{f,s}(x) =
\begin{cases}
x - 1 + e^{-x} & \text{if } x > 0,\\
x + 1 - e^{x} & \text{if } x < 0,\\
0 & \text{if } x = 0,
\end{cases}
\]

is a cutter by Fact 5.12. However, for $g : \mathbb{R} \to \mathbb{R} : x \mapsto e^{|x|}$, we have $g = f + 1$, but $G_{g,s}$ is not a cutter by Example 5.13(1).

For a nonconvex function, although $G_{f,s}$ is a cutter, $G_{f-r,s}$ might not be a cutter even when $r > 0$.

Example 5.24 Let $f$ be given by Example 5.22, and $g := f - 20/7$. Then

\[
G_{g,s}(x)
\begin{cases}
= x & \text{if } x \le 20/7,\\
= 20/7 & \text{if } 20/7 < x < 3,\\
\in \{17/7,\ 20/7\} & \text{if } x = 3,\\
= 17/7 & \text{if } x > 3.
\end{cases}
\]

As shown in Example 5.22, $G_{f,s}$ is a cutter. However, $G_{g,s}$ is not a cutter, as can be seen from Proposition 5.7 or by direct calculation using Definition 5.1.
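The failure in Example 5.24 can be seen at a single point. The sketch below evaluates $G_{g,s}$ at $x = 4$ and tests both the interval criterion of Proposition 5.7 and inequality (30); the names `G_g` and `slope`, and the choice $x = 4$, are illustrative, and `slope` is only meant to be evaluated away from the kinks.

```python
K = 20.0 / 7.0

def f(x):
    # The piecewise linear function of Example 5.22.
    if x <= 0.0:
        return 0.0
    if x <= K:
        return x
    if x <= 3.0:
        return 8.0 * (x - 2.5)
    return 2.0 * (x - 1.0)

def slope(x):
    # Derivative of f away from the kinks 0, 20/7 and 3.
    if x < 0.0:
        return 0.0
    if x < K:
        return 1.0
    if x < 3.0:
        return 8.0
    return 2.0

def G_g(x):
    # Subgradient projector of g = f - 20/7 (evaluated away from the kinks).
    gx = f(x) - K
    if gx <= 0.0:
        return x
    return x - gx / slope(x)

x = 4.0
Tx = G_g(x)                     # equals 17/7
P = K                           # projection of x onto Fix G_g = (-inf, 20/7]
in_segment = min(x, P) <= Tx <= max(x, P)
cutter_product = (x - Tx) * (P - Tx)   # (30) with u = 20/7 requires this to be <= 0
print(Tx, in_segment, cutter_product)
```

The image $17/7$ lies to the left of $P_{\operatorname{Fix} G_g}(4) = 20/7$, so the step overshoots the fixed point set and (30) is violated with $u = 20/7$.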


6 Convergence analysis of subgradient projectors

In this section, we study the convergence of the sequence generated by the subgradient projector. When the function is convex, the convergence analysis is fairly well known; see, e.g., [47, Section 5.3], [46], [9], and [20]. For nonconvex functions, we demonstrate that the convergence results on cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings can be effectively used. It turns out that local cutters and local quasi-ne mappings are more appropriate for nonconvex functions.

In addition to cutters and local cutters (see Definitions 5.1 and 5.4), quasi-nonexpansive mappings and local quasi-nonexpansive mappings are also useful for the convergence analysis.

6.1 Quasi-nonexpansive mappings and local quasi-nonexpansive mappings

According to [7, page 59], and [20, page 47], we define:

Definition 6.1 Let D be a nonempty subset of Rn and T : D → Rn. We say that

(i) T is quasinonexpansive (quasi-ne) if

(∀x ∈ D)(∀y ∈ Fix T) ‖Tx− y‖ ≤ ‖x− y‖.

(ii) A mapping $T : D \to D$ is said to be asymptotically regular at $x \in D$ if $\|T^{k+1} x - T^k x\| \to 0$ as $k \to \infty$; it is said to be asymptotically regular on $D$ if it is so at every $x \in D$.

Definition 6.1(i) requires that $T$ satisfy the inequality for all $x \in D$ and $y \in \operatorname{Fix} T$. In practice, the sets $D$ and $\operatorname{Fix} T$ might be too large to verify it. We now introduce locally quasinonexpansive mappings.

Definition 6.2 A mapping $T : D \to \mathbb{R}^n$ is locally quasinonexpansive (locally quasi-ne) at $\bar{x} \in \operatorname{Fix} T$ if there exists $\delta > 0$ such that

\[
(\forall x \in B(\bar{x}, \delta) \cap D)(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \quad \|Tx - y\| \le \|x - y\|.
\]

The connection between quasi-ne mappings and quasi-fne mappings is given by the following fact.

Fact 6.3 ([9, Proposition 2.3(v)⇔(vi)], [20, Corollary 2.1.33]) Let $D$ be a nonempty subset of $\mathbb{R}^n$, and $T : D \to \mathbb{R}^n$ with $\operatorname{Fix} T \neq \emptyset$. Then the following are equivalent:

(i) $T$ is quasi-fne.

(ii) $2T - \operatorname{Id}$ is quasi-ne.

The following result says that quasi-nonexpansiveness, nonexpansiveness, and local quasi-nonexpansiveness are the same for linear mappings. Although the equivalence of quasi-nonexpansiveness and nonexpansiveness for linear mappings has been given in [7, Exercise 4.4], the equivalence to local quasi-nonexpansiveness is new.

Proposition 6.4 Let T : Rn → Rn be a linear operator. Then the following are equivalent:


(i) T is quasi-ne.

(ii) T is nonexpansive.

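(iii) There exist $\delta > 0$ and $\bar{x} \in \operatorname{Fix} T$ such that $T$ is quasi-ne on $B(\bar{x}, \delta)$.

Proof. (i)⇒(ii). Since $0 \in \operatorname{Fix} T$, we have $\|Tx\| \le \|x\|$ for every $x \in \mathbb{R}^n$. By linearity, $\|Tx - Ty\| = \|T(x - y)\| \le \|x - y\|$ for all $x, y \in \mathbb{R}^n$. Hence $T$ is nonexpansive.

(ii)⇒(i). Clear.

(ii)⇒(iii). Clear.

(iii)⇒(ii). The assumption means that there exist $\bar{x} \in \operatorname{Fix} T$ and $\delta > 0$ such that

\[
(\forall x \in B(\bar{x}, \delta))(\forall y \in B(\bar{x}, \delta) \cap \operatorname{Fix} T) \quad \|Tx - y\| \le \|x - y\|. \tag{44}
\]

Let $v \in B(0, \delta)$. Using $T\bar{x} = \bar{x}$, $y = \bar{x}$, and $T$ being linear, from (44) we obtain $\|Tv\| = \|T(\bar{x} + v) - T\bar{x}\| = \|T(\bar{x} + v) - \bar{x}\| \le \|(\bar{x} + v) - \bar{x}\| = \|v\|$. Since $v \in B(0, \delta)$ was arbitrary and $T$ is linear, we have $\|Tv\| \le \|v\|$ for every $v \in \mathbb{R}^n$. Hence $T$ is nonexpansive. ∎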

Remark 6.5 Fact 6.3 and Proposition 6.4 hold in Hilbert spaces. We formulate them only in Rn.

The following example illustrates that for nonlinear T, quasinonexpansiveness and nonexpansiveness are different.

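Example 6.6 Define

$$T : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} \dfrac{|x|}{2}\sin\dfrac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

Then T is quasi-ne but not nonexpansive.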

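Proof. T is quasi-ne because Fix T = {0} and

$$(\forall x \in \mathbb{R}) \quad |T(x) - 0| = \left| \frac{|x|}{2}\sin\frac{1}{x} \right| \leq \frac{|x|}{2} \leq |x|.$$

T is not nonexpansive because for x > 0, we have

$$T'(x) = \frac{1}{2}\sin\frac{1}{x} - \frac{1}{2x}\cos\frac{1}{x}$$

and |T′(1/(2nπ))| = nπ > 1 for every positive integer n. ∎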
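The two claims of Example 6.6 can be checked numerically. The following is a minimal sketch, not part of the original argument; the sample grid, the choice n = 5, and the step h are illustrative assumptions.

```python
import math


def T(x: float) -> float:
    """T(x) = (|x|/2) * sin(1/x) for x != 0, and T(0) = 0 (Example 6.6)."""
    return 0.0 if x == 0.0 else 0.5 * abs(x) * math.sin(1.0 / x)


# Quasi-nonexpansiveness with Fix T = {0}: |T(x) - 0| <= |x - 0| on a sample grid.
samples = [i / 1000.0 for i in range(-2000, 2001)]
quasi_ne_holds = all(abs(T(x)) <= abs(x) for x in samples)

# Nonexpansiveness fails: near 1/(2*n*pi) the slope has magnitude about n*pi.
n = 5  # illustrative choice
x = 1.0 / (2 * n * math.pi)
h = 1e-7  # illustrative finite-difference step
y = x + h
lipschitz_ratio = abs(T(x) - T(y)) / abs(x - y)

print(quasi_ne_holds, lipschitz_ratio)
```

The difference quotient comes out close to nπ, which exceeds 1, so no Lipschitz constant of 1 is possible near the origin.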

For analogous results on linear cutters, see Proposition 9.1 in Section 9.

Although we have developed calculus for G_f being cutters in Section 5, most results also hold for quasi-ne mappings. We single out the two most important ones.

Theorem 6.7 Let k ≥ 1 and f : R^n → [0,+∞) be locally Lipschitz, and let G_{f,s} be given by Definition 2.2. Suppose that G_{f,s} is quasi-ne and Fix G_{f,s} ≠ ∅. If g = f^k, then G_{g, k f^{k−1} s} is quasi-ne.

Proof. Apply Theorem 3.9 and [7, Exercise 4.11]. ∎

Theorem 6.8 Let A : R^n → R^n be unitary and b ∈ R^n, let f : R^n → R be lsc and subdifferentiable, and let G_{f,s} be given by Definition 2.2. Define g : R^n → R : x ↦ f(Ax + b). If G_{f,s} is quasi-ne, then G_{g, Aᵀ s(A·+b)} is quasi-ne.


Proof. Apply Proposition 3.7 and Definition 6.1. ∎

Corollary 6.9 Let k ≥ 1 and f : R^n → [0,+∞) be locally Lipschitz with G_{f,s} given by Definition 2.2. Suppose that g = f^2 and Fix G_{f,s} ≠ ∅. Then G_{f,s} is quasi-ne if and only if G_{g,2fs} is quasi-fne.

Proof. By Theorem 3.9, G_{g,2fs} = (G_{f,s} + Id)/2. The result then follows from Fact 6.3. ∎
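The averaging identity used in this proof can be illustrated on a small example. The sketch below assumes that Definition 2.2 has the usual form G_{f,s}(x) = x − f(x)/‖s(x)‖² s(x) when f(x) > 0 and s(x) ≠ 0, and G_{f,s}(x) = x otherwise; the function f, the selection s, and the test point are hypothetical choices.

```python
from typing import Callable, List

Vec = List[float]


def subgradient_projector(f: Callable[[Vec], float],
                          s: Callable[[Vec], Vec],
                          x: Vec) -> Vec:
    """Assumed form of G_{f,s}: step along s(x) when f(x) > 0 and s(x) != 0."""
    fx = f(x)
    sx = s(x)
    norm_sq = sum(c * c for c in sx)
    if fx > 0 and norm_sq > 0:
        return [xi - fx / norm_sq * si for xi, si in zip(x, sx)]
    return list(x)


def sign(t: float) -> float:
    return 1.0 if t >= 0 else -1.0


def f(x: Vec) -> float:
    # Hypothetical nonnegative, locally Lipschitz function on R^2.
    return abs(x[0]) + 2.0 * abs(x[1])


def s(x: Vec) -> Vec:
    # One selection of the subdifferential of f.
    return [sign(x[0]), 2.0 * sign(x[1])]


def g(x: Vec) -> float:
    return f(x) ** 2


def s_g(x: Vec) -> Vec:
    # The selection 2 f s of the subdifferential of g = f^2.
    return [2.0 * f(x) * c for c in s(x)]


x = [1.5, -0.75]
Gf = subgradient_projector(f, s, x)
Gg = subgradient_projector(g, s_g, x)
average = [(a + b) / 2.0 for a, b in zip(Gf, x)]
print(Gg, average)
```

At this point both lists agree, which is what G_{g,2fs} = (G_{f,s} + Id)/2 predicts.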

6.2 Convergence of cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings

Proposition 6.10 (convergence of iterates of a cutter) Let D ⊆ R^n be a nonempty closed convex set, and let T : D → D be an operator with a fixed point that has the fixed-point closed property on D. If T is a cutter, then for every x ∈ D, the sequence (T^k x)_{k∈N} converges to a point z ∈ Fix T.

Proof. Since T is a cutter, T is quasi-fne by Fact 5.3(i), and hence quasi-ne. Moreover, T is asymptotically regular by [20, Theorem 3.4.3]. The result now follows from [20, Theorem 3.5.2]. ∎

Proposition 6.11 (convergence of iterates of a locally quasi-fne mapping) Let D be a nonempty closed convex subset of R^n, and let T : D → D with Fix T ≠ ∅. Assume that

(i) There exist x̄ ∈ Fix T and δ > 0 such that T is locally quasi-fne (see Definition 5.5);

(ii) T has the fixed-point closed property.

Let x_0 ∈ D ∩ B(x̄, δ). Set (∀k ∈ N) x_{k+1} = T x_k.

Then (x_k)_{k∈N} converges to a point z ∈ B(x̄, δ) ∩ Fix T.

Proof. By assumption (i),

(45) (∀x ∈ B(x̄, δ) ∩ D) (∀y ∈ B(x̄, δ) ∩ Fix T) ‖Tx − y‖² + ‖Tx − x‖² ≤ ‖x − y‖².

With x_0 ∈ D ∩ B(x̄, δ), (45) gives

(∀y ∈ B(x̄, δ) ∩ Fix T) ‖Tx_0 − y‖² + ‖Tx_0 − x_0‖² ≤ ‖x_0 − y‖²,

so, taking y = x̄, ‖x_1 − x̄‖ ≤ ‖x_0 − x̄‖ ≤ δ. By induction, we have that

(46) (x_k)_{k∈N} is a sequence in B(x̄, δ).

Moreover, (45) gives

(∀k ∈ N)(∀y ∈ B(x̄, δ) ∩ Fix T) ‖Tx_k − y‖² + ‖Tx_k − x_k‖² ≤ ‖x_k − y‖²,

so (x_k)_{k∈N} is Fejér monotone with respect to C := B(x̄, δ) ∩ Fix T, and ‖x_{k+1} − x_k‖ → 0 as k → ∞.

Let x be a cluster point of (x_k)_{k∈N}, say x_{k_l} → x. Since Tx_{k_l} − x_{k_l} → 0 and T is fixed-point closed, we have Tx − x = 0. Moreover, ‖x − x̄‖ ≤ δ because of (46). Thus, x ∈ C. Applying [7, Theorem 5.5], we conclude that x_k → z ∈ C. ∎


Proposition 6.12 (convergence of iterates of a locally quasi-ne mapping) Let D be a nonempty closed convex subset of R^n, and let T : D → D with int(Fix T) ≠ ∅. Assume that

(i) There exist x̄ ∈ Fix T and δ > 0 such that T is locally quasi-ne (see Definition 6.2);

(ii) int(B(x̄, δ) ∩ Fix T) ≠ ∅;

(iii) T has the fixed-point closed property.

Let x_0 ∈ D ∩ B(x̄, δ). Set (∀k ∈ N) x_{k+1} = T x_k.

Then (x_k)_{k∈N} converges to a point z ∈ B(x̄, δ) ∩ Fix T.

Proof. By assumption (i), there exists δ > 0 such that

(47) (∀x ∈ B(x̄, δ) ∩ D) (∀y ∈ B(x̄, δ) ∩ Fix T) ‖Tx − y‖ ≤ ‖x − y‖.

With x_0 ∈ D ∩ B(x̄, δ), (47) gives

(∀y ∈ B(x̄, δ) ∩ Fix T) ‖Tx_0 − y‖ ≤ ‖x_0 − y‖,

so, taking y = x̄, ‖x_1 − x̄‖ ≤ ‖x_0 − x̄‖ ≤ δ. By induction, we have that

(48) (x_k)_{k∈N} is a sequence in B(x̄, δ).

Moreover, (47) gives

(∀k ∈ N)(∀y ∈ B(x̄, δ) ∩ Fix T) ‖Tx_k − y‖ ≤ ‖x_k − y‖,

so (x_k)_{k∈N} is Fejér monotone with respect to C := B(x̄, δ) ∩ Fix T. As int C ≠ ∅, we have that x_k → z ∈ R^n by [7, Proposition 5.10]. This implies that ‖Tx_k − x_k‖ = ‖x_{k+1} − x_k‖ → 0 and x_k → z as k → ∞.

Since T is fixed-point closed, we have Tz − z = 0. Moreover, ‖z − x̄‖ ≤ δ because of (48). Hence x_k → z ∈ B(x̄, δ) ∩ Fix T. ∎

6.3 Applications to subgradient projectors

Theorem 6.13 Let f : R^n → R be locally Lipschitz, let G_{f,s} be given by Definition 2.2, and suppose that

S := {x ∈ R^n | 0 ∈ ∂f(x)} ∪ lev_0 f ≠ ∅.

If the subgradient projector G_{f,s} is a cutter, then for every x ∈ R^n, the sequence (G_{f,s}^k x)_{k∈N} converges to a point z such that either 0 ∈ ∂f(z) or f(z) ≤ 0.

Proof. Combine Theorem 4.9 and Proposition 6.10. ∎

To proceed, it will be convenient to single out:

Lemma 6.14 Let f : R^n → R be lsc and subdifferentiable, and let G_{f,s} be given by Definition 2.2. When f(x) > 0, 0 ∉ ∂f(x), and y ∈ R^n, we have

$$\|G_{f,s}(x) - y\|^2 = \|x - y\|^2 + \frac{f(x)}{\|s(x)\|^2}\bigl(f(x) + 2\langle y - x, s(x)\rangle\bigr). \tag{49}$$

Proof. This follows from

$$\|G_{f,s}(x) - y\|^2 = \left\| x - y - \frac{f(x)}{\|s(x)\|^2}\, s(x) \right\|^2 = \|x - y\|^2 + \frac{f^2(x)}{\|s(x)\|^2} - 2\left\langle x - y, \frac{f(x)}{\|s(x)\|^2}\, s(x) \right\rangle.$$
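Identity (49) is easy to sanity-check numerically. The sketch below is only an illustration: it assumes the usual form of G_{f,s} from Definition 2.2 for points with f(x) > 0, and the smooth function f(x) = ‖x‖² − 1, its gradient as the selection s, and the points x and y are hypothetical choices.

```python
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def f(x: Vec) -> float:
    # Hypothetical smooth test function f(x) = ||x||^2 - 1.
    return dot(x, x) - 1.0


def s(x: Vec) -> Vec:
    # Gradient of f, used as the subgradient selection.
    return [2.0 * xi for xi in x]


def G(x: Vec) -> Vec:
    # Assumed form of G_{f,s} at a point with f(x) > 0 and s(x) != 0.
    fx, sx = f(x), s(x)
    return [xi - fx / dot(sx, sx) * si for xi, si in zip(x, sx)]


x = [2.0, 1.0]    # f(x) = 4 > 0
y = [0.3, -0.4]   # arbitrary comparison point

Gx = G(x)
lhs = dot([a - b for a, b in zip(Gx, y)], [a - b for a, b in zip(Gx, y)])

x_minus_y = [a - b for a, b in zip(x, y)]
y_minus_x = [-d for d in x_minus_y]
rhs = dot(x_minus_y, x_minus_y) + f(x) / dot(s(x), s(x)) * (f(x) + 2.0 * dot(y_minus_x, s(x)))

print(lhs, rhs)
```

Both sides evaluate to the same number here, as (49) requires.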

Theorem 6.15 Let f : R^n → R be locally Lipschitz, let G_{f,s} be given by Definition 2.2, and

S := {x ∈ R^n | 0 ∈ ∂f(x)} ∪ lev_0 f.

Then the following hold:

(i) G_{f,s} is quasi-ne if and only if

(50) (∀x ∉ S) (∀y ∈ S) f(x) + 2⟨y − x, s(x)⟩ ≤ 0.

(ii) Assume that int S ≠ ∅, and (50) holds. Then for every x ∈ R^n, the sequence (G_{f,s}^k x)_{k∈N} converges to a point z ∈ S.

Proof. (i). By Lemma 6.14, when x ∉ S and y ∈ S, we have

$$\|G_{f,s}(x) - y\|^2 = \|x - y\|^2 + \frac{f(x)}{\|s(x)\|^2}\bigl(f(x) + 2\langle y - x, s(x)\rangle\bigr). \tag{51}$$

In view of Definition 6.1, assumption (50) is equivalent to G_{f,s} being quasi-ne.

(ii). By (i), the sequence (G_{f,s}^k x)_{k∈N} is Fejér monotone with respect to S. Since int S ≠ ∅, by [7, Proposition 5.10], the sequence (G_{f,s}^k x)_{k∈N} converges to a point z ∈ R^n. Write x_k = G_{f,s}^k x. Then (∀k ∈ N) x_{k+1} = G_{f,s}(x_k). As f is locally Lipschitz at z and x_k → z, the sequence (s(x_k))_{k∈N} is bounded. Since

$$x_{k+1} - x_k = -\frac{f(x_k)}{\|s(x_k)\|^2}\, s(x_k)$$

and lim_{k→∞} x_k = z, we have

$$|f(z)| = \lim_{k\to\infty} |f(x_k)| = \lim_{k\to\infty} \|x_{k+1} - x_k\|\,\|s(x_k)\| = 0.$$

Hence z ∈ S. ∎
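Condition (50) and the resulting convergence can be observed on a one-dimensional example. This is a sketch under assumptions: f(x) = x² − 1 is a hypothetical test function (convex, so the statement is not surprising here), S = [−1, 1] for this f, and G_{f,s} is taken in the usual form of Definition 2.2.

```python
def f(x: float) -> float:
    # Hypothetical test function; lev_0 f = [-1, 1] and f'(0) = 0, so S = [-1, 1].
    return x * x - 1.0


def s(x: float) -> float:
    return 2.0 * x  # derivative of f, the only subgradient


def G(x: float) -> float:
    # Assumed form of the subgradient projector G_{f,s}.
    fx, sx = f(x), s(x)
    if fx > 0.0 and sx != 0.0:
        return x - fx / (sx * sx) * sx
    return x


# Sample points outside S and inside S.
outside = [1.0 + 0.05 * i for i in range(1, 81)]
outside += [-x for x in outside]
inside = [-1.0 + 0.05 * j for j in range(41)]

# Condition (50): f(x) + 2 <y - x, s(x)> <= 0 for x outside S and y in S.
condition_50_holds = all(
    f(x) + 2.0 * (y - x) * s(x) <= 0.0 for x in outside for y in inside
)

# Iterating G from x0 = 3 should approach a point of S (here the point 1).
x_limit = 3.0
for _ in range(50):
    x_limit = G(x_limit)

print(condition_50_holds, x_limit)
```

For this f the iteration coincides with Newton's method for x² = 1, so the iterates approach the boundary point 1 of S.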

Theorem 6.16 Let f : R^n → R be locally Lipschitz, let G_{f,s} be given by Definition 2.2, and suppose that

S := {x ∈ R^n | 0 ∈ ∂f(x)} ∪ lev_0 f ≠ ∅.

Assume that the subgradient projector G_{f,s} is locally quasi-fne at x̄ ∈ S, i.e., there exists δ > 0 such that

(52) (∀x ∈ B(x̄, δ) \ S) (∀y ∈ B(x̄, δ) ∩ S) f(x) + ⟨y − x, s(x)⟩ ≤ 0.

Then for every x_0 ∈ B(x̄, δ), the sequence (x_k)_{k∈N} defined by

(∀k ∈ N) x_{k+1} = G_{f,s}(x_k)

converges to a point z ∈ B(x̄, δ) ∩ S.


Proof. (52) guarantees that G_{f,s} is locally quasi-fne at x̄ ∈ S. Indeed, when x ∉ S and y ∈ S ∩ B(x̄, δ), using Lemma 6.14, (52) and (13), we have

\begin{align}
\|G_{f,s}(x) - y\|^2 &= \|x - y\|^2 + \frac{f(x)}{\|s(x)\|^2}\bigl(f(x) + 2\langle y - x, s(x)\rangle\bigr) \tag{53}\\
&= \|x - y\|^2 + \frac{f^2(x)}{\|s(x)\|^2}\,\frac{2\bigl(f(x) + \langle y - x, s(x)\rangle\bigr) - f(x)}{f(x)} \tag{54}\\
&\le \|x - y\|^2 + \|G_{f,s}(x) - x\|^2\left(\frac{-f(x)}{f(x)}\right) \tag{55}\\
&\le \|x - y\|^2 - \|G_{f,s}(x) - x\|^2. \tag{56}
\end{align}

In view of Theorem 4.9, it suffices to apply Proposition 6.11. ∎

Theorem 6.17 Let f : R^n → R be locally Lipschitz, let G_{f,s} be given by Definition 2.2, and

S := {x ∈ R^n | 0 ∈ ∂f(x)} ∪ lev_0 f

with int S ≠ ∅. Assume that there exist x̄ ∈ S and δ > 0 such that

(57) (∀x ∈ B(x̄, δ) \ S) (∀y ∈ B(x̄, δ) ∩ S) f(x) + 2⟨y − x, s(x)⟩ ≤ 0.

Assume further that int(B(x̄, δ) ∩ S) ≠ ∅. Then for every x_0 ∈ B(x̄, δ), the sequence (x_k)_{k∈N} defined by

(∀k ∈ N) x_{k+1} = G_{f,s}(x_k)

converges to a point z ∈ B(x̄, δ) ∩ S.

Proof. (57) guarantees that G_{f,s} is locally quasi-ne at x̄. Indeed, for every x ∈ B(x̄, δ) \ S and y ∈ B(x̄, δ) ∩ S, using Lemma 6.14 and (57) we have

\begin{align}
\|G_{f,s}(x) - y\|^2 &= \|x - y\|^2 + \frac{f(x)}{\|s(x)\|^2}\bigl(f(x) + 2\langle y - x, s(x)\rangle\bigr) \tag{58}\\
&\le \|x - y\|^2. \tag{59}
\end{align}

By Theorem 4.9, G_{f,s} has the fixed-point closed property. Therefore, Proposition 6.12 applies. ∎

Example 6.18 (1). Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

Because Fix G_{f,s} = {x ∈ R | |x| ≥ 1} is not convex, we have that G_{f,s} is not a cutter. However, f satisfies the assumptions of both Theorems 6.16 and 6.17 so that the local convergence theory applies.

(2). Define

$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 0 & \text{if } x \le 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } 1 \le x \le 2, \\ x - 1 & \text{if } x \ge 2. \end{cases}$$

As Fix G_{f,s} = (−∞, 0] ∪ [1, 2], G_{f,s} is not a cutter. However, both Theorems 6.16 and 6.17 apply.
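For part (1) of the example, the behaviour of the iteration can be seen directly. The sketch below assumes the usual form of G_{f,s} from Definition 2.2; the selection s (in particular its value at the kinks) and the starting points are hypothetical choices.

```python
def f(x: float) -> float:
    """f(x) = 1 - |x| if |x| <= 1, and 0 otherwise (Example 6.18(1))."""
    return max(1.0 - abs(x), 0.0)


def s(x: float) -> float:
    """One selection of the limiting subdifferential; the kink values are an assumed choice."""
    if abs(x) < 1.0:
        return -1.0 if x >= 0 else 1.0
    return 0.0


def G(x: float) -> float:
    # Assumed form of the subgradient projector G_{f,s}.
    fx, sx = f(x), s(x)
    if fx > 0.0 and sx != 0.0:
        return x - fx / (sx * sx) * sx
    return x


def iterate(x0: float, steps: int = 20) -> float:
    x = x0
    for _ in range(steps):
        x = G(x)
    return x


right = iterate(0.3)    # starts inside (0, 1)
left = iterate(-0.3)    # starts inside (-1, 0)
fixed = iterate(1.5)    # already in Fix G = {x : |x| >= 1}
print(right, left, fixed)
```

Starting points on either side of the origin are sent to different components of the nonconvex fixed point set, which is consistent with G_{f,s} failing to be a cutter while the iterates still converge locally.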


6.4 Finite convergence and (C, ε)-firmly nonexpansiveness

Finite termination algorithms for subgradient projectors of convex functions have been studied in [46, 29, 13]. Recently, in [43] Pang studied finitely convergent algorithms of subgradient projectors of locally Lipschitz functions defined in terms of the Clarke subdifferential. Naturally, one asks what his result implies about the subgradient projector defined by us. To this end, let us recall lower-C^k functions defined by Rockafellar and Wets [51, Definition 10.29], and approximately convex functions by Ngai, Luc, and Théra [41], respectively.

Definition 6.19 A function f : O → R, where O is an open subset in R^n, is said to be lower-C^k on O if on some neighborhood V of each x̄ ∈ O there is a representation f(x) := max_{t∈T} f_t(x) in which f_t is of class C^k on V and the index set T is compact such that f_t(x) and all its partial derivatives through order k depend continuously not just on x ∈ V but jointly on (t, x) ∈ T × V.

Definition 6.20 A function f : R^n → R is approximately convex at x̄ ∈ R^n if for every ε > 0 there exists δ > 0 such that

(∀x, y ∈ B(x̄, δ))(∀λ ∈ (0, 1)) f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) + ελ(1 − λ)‖x − y‖.

Fact 6.21 (See [2, Theorem 4.5], [26, Corollary 3]) Let f : R^n → R be locally Lipschitz at x̄. Then the following are equivalent:

(i) f is lower-C^1 around x̄.

(ii) f is approximately convex at x̄.

(iii) for every ε > 0 there exists δ > 0 such that

(∀x, y ∈ B(x̄, δ))(∀x* ∈ ∂_c f(x)) f(y) ≥ f(x) + ⟨x*, y − x⟩ − ε‖x − y‖.

Theorem 6.22 (finite convergence for accelerated subgradient projectors) Let f : R^n → R be locally Lipschitz, and let x̄ ∈ R^n satisfy

(i) f(x̄) = 0;

(ii) 0 ∉ ∂f(x̄);

(iii) f is lower-C^1 around x̄.

Suppose that the strictly decreasing sequence (ε_k)_{k∈N} converges to 0 at a sublinear rate. Then there exist δ > 0 and ε > 0 such that for every x_0 ∈ B(x̄, δ) and ε_0 < ε, the sequence (x_k)_{k∈N} defined by

$$(\forall k \in \mathbb{N}) \quad x_{k+1} = x_k - \frac{\varepsilon_k + f(x_k)}{\|s_k\|^2}\, s_k, \quad \text{where } s_k \in \partial f(x_k), \tag{60}$$

converges in finitely many iterations, i.e., f(x_k) ≤ 0 for some k ∈ N.

Proof. Since f is lower-C^1 around x̄, f is Clarke regular around x̄; see [51, Theorem 10.31]. Thus, the Clarke subdifferential and the limiting subdifferential of f are the same around x̄. Because ∂f is upper semicontinuous, when δ is sufficiently small, (ii) guarantees that 0 ∉ ∂f(x) for every x ∈ B(x̄, δ), which implies that (60) is well defined. By Fact 6.21, (iii) is equivalent to f being approximately convex at x̄. The result then follows from [43, Theorem 3]. ∎


Remark 6.23 See [2, 41, 52] for more characterizations of lower-C^1 functions and approximately convex functions.

Let ε ≥ 0 and C ⊆ R^n. In [30], Hesse and Luke studied (C, ε)-firmly nonexpansive mappings; see also [37].

Definition 6.24 Let C, D be nonempty subsets of R^n and T : D → R^n. T is called (C, ε)-firmly nonexpansive if

(∀x ∈ D)(∀y ∈ C) ‖Tx − Ty‖² + ‖(x − Tx) − (y − Ty)‖² ≤ (1 + ε)‖x − y‖².

Theorem 6.25 ((C, ε)-firmly nonexpansiveness of G_{f,s}) Let f : R^n → R be locally Lipschitz, let G_{f,s} be given by Definition 2.2, and

S := {x ∈ R^n | 0 ∈ ∂f(x)} ∪ lev_0 f.

Suppose that x̄ ∈ R^n satisfies

(i) f(x̄) = 0;

(ii) 0 ∉ ∂f(x̄);

(iii) f is lower-C^1 around x̄.

Then for every ε > 0 there exists δ > 0 such that on B(x̄, δ) the subgradient projector G_{f,s} is (S ∩ B(x̄, δ), ε)-firmly nonexpansive, in which ε = 1 + 8Lε/d_{∂f(x̄)}(0)² and L is the Lipschitz modulus of f around x̄.

Proof. Let α := d_{∂f(x̄)}(0)/2. Then α > 0 by (ii). For every ε > 0, we can find δ > 0 such that

(61) f(y) ≥ f(x) + ⟨x*, y − x⟩ − ε‖x − y‖, when x, y ∈ B(x̄, δ), x* ∈ ∂f(x).

This follows from (iii) and Fact 6.21.

• ‖s(x)‖ ≥ α whenever s(x) ∈ ∂f(x) and x ∈ B(x̄, δ). This is because ∂f(x̄) is compact, ∂f is upper semicontinuous, and (ii) holds.

• |f(x) − f(y)| ≤ L‖x − y‖ whenever x, y ∈ B(x̄, δ). This is possible because f is locally Lipschitz around x̄.

Since 0 ∉ ∂f(y) for y ∈ B(x̄, δ), we must have f(y) ≤ 0 if y ∈ S ∩ B(x̄, δ). Thus, (61) gives

(62) (∀x ∈ B(x̄, δ))(∀y ∈ S ∩ B(x̄, δ))(∀x* ∈ ∂f(x)) f(x) + ⟨x*, y − x⟩ ≤ ε‖x − y‖.

Put C := S ∩ B(x̄, δ). When f(x) > 0, 0 ∉ ∂f(x), and y ∈ S ∩ B(x̄, δ), using Lemma 6.14, (62), and (13), we have

\begin{align}
\|G_{f,s}(x) - y\|^2 &= \|x - y\|^2 + \frac{f(x)}{\|s(x)\|^2}\bigl(f(x) + 2\langle y - x, s(x)\rangle\bigr) \tag{63}\\
&= \|x - y\|^2 + \frac{f^2(x)}{\|s(x)\|^2}\,\frac{2\bigl(f(x) + \langle y - x, s(x)\rangle\bigr) - f(x)}{f(x)} \tag{64}\\
&\le \|x - y\|^2 + \frac{2 f(x)}{\|s(x)\|^2}\,\varepsilon\|y - x\| + \|G_{f,s}(x) - x\|^2\left(\frac{-f(x)}{f(x)}\right) \tag{65}\\
&\le \|x - y\|^2 + \frac{2(f(x) - f(y))}{\alpha^2}\,\varepsilon\|x - y\| - \|G_{f,s}(x) - x\|^2 \tag{66}\\
&\le \|x - y\|^2 + \frac{2L\|x - y\|}{\alpha^2}\,\varepsilon\|x - y\| - \|G_{f,s}(x) - x\|^2 \tag{67}\\
&= \|x - y\|^2 + \frac{2L\varepsilon}{\alpha^2}\|x - y\|^2 - \|G_{f,s}(x) - x\|^2 \tag{68}\\
&\le (1 + \varepsilon)\|x - y\|^2 - \|G_{f,s}(x) - x\|^2. \tag{69}
\end{align}

This completes the proof. ∎

Remark 6.26 Observe that both Theorems 6.22 and 6.25 aim at solving nonconvex inequality problems, e.g., finding a point x such that f(x) ≤ 0 with f(x) = −e^{−x²} + 1/2 and f satisfying the assumptions at x̄ = √(ln 2). However, they do not apply to f = d_C.
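Iteration (60) can be tried on the function mentioned in this remark. The following is a minimal sketch, not the authors' implementation: the schedule ε_k = ε_0/(k + 1), the starting point x_0 = 1, the value ε_0 = 0.05, and the iteration cap are assumptions chosen for illustration.

```python
import math


def f(x: float) -> float:
    """f(x) = -exp(-x^2) + 1/2, which vanishes at x = sqrt(ln 2)."""
    return -math.exp(-x * x) + 0.5


def df(x: float) -> float:
    """Derivative of f; f is smooth, so this is the only subgradient."""
    return 2.0 * x * math.exp(-x * x)


def accelerated_iteration(x0: float, eps0: float, max_iter: int = 100):
    """Run x_{k+1} = x_k - (eps_k + f(x_k)) / |s_k|^2 * s_k until f(x_k) <= 0."""
    x = x0
    for k in range(max_iter):
        if f(x) <= 0.0:
            return x, k
        eps_k = eps0 / (k + 1)  # assumed strictly decreasing, sublinear schedule
        g = df(x)
        x = x - (eps_k + f(x)) / (g * g) * g
    return x, max_iter


x_final, iterations = accelerated_iteration(x0=1.0, eps0=0.05)
print(x_final, iterations, f(x_final))
```

The extra term ε_k pushes each step slightly past the linearized zero of f, which is what allows the stopping test f(x_k) ≤ 0 to be met after finitely many steps in this run.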

This completes Part I. In Part II, we will study subgradient projectors of Moreau envelopes, and their connections to subgradient projectors of original functions.

Part II

Subgradient projectors of Moreau envelopes and characterizations

7 Subgradient projectors of Moreau envelopes

When f : R^n → (−∞,+∞] is lsc, ∂f(x) might be empty for some x ∈ R^n. However, e_λ f has much better properties when f is prox-bounded, see, e.g., Fact 7.5 below; and this is the motivation for us to study subgradient projectors of Moreau envelopes below. To do this, we need to study the relationship between G_{e_λ f} and G_{f,s}.

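Recall that for a proper, lsc function f : R^n → (−∞,+∞] and parameter value λ > 0, the Moreau envelope e_λ f and proximal mapping P_λ f are defined respectively by

$$e_\lambda f : \mathbb{R}^n \to (-\infty,+\infty] : x \mapsto \inf_{w}\Bigl\{ f(w) + \frac{1}{2\lambda}\|x - w\|^2 \Bigr\}, \quad \text{and}$$

$$P_\lambda f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n : x \mapsto \operatorname*{argmin}_{w}\Bigl\{ f(w) + \frac{1}{2\lambda}\|x - w\|^2 \Bigr\}.$$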
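To make the two definitions concrete, the sketch below compares a brute-force grid minimization with the well-known closed forms for f = |·| on R (soft thresholding for P_λ f and the Huber function for e_λ f). The grid bounds, grid size, and the test values x = 1.7 and λ = 0.5 are illustrative assumptions.

```python
import math


def f(w: float) -> float:
    return abs(w)


def moreau_bruteforce(x: float, lam: float, lo: float = -5.0, hi: float = 5.0, n: int = 20001):
    """Approximate e_lam f(x) and P_lam f(x) by minimizing over a uniform grid."""
    best_val, best_w = float("inf"), lo
    step = (hi - lo) / (n - 1)
    for i in range(n):
        w = lo + i * step
        val = f(w) + (x - w) ** 2 / (2.0 * lam)
        if val < best_val:
            best_val, best_w = val, w
    return best_val, best_w


def prox_abs(x: float, lam: float) -> float:
    """Closed-form proximal mapping of |.|: soft thresholding."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def envelope_abs(x: float, lam: float) -> float:
    """Closed-form Moreau envelope of |.|: the Huber function."""
    return x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0


x, lam = 1.7, 0.5
brute_val, brute_w = moreau_bruteforce(x, lam)
print(brute_val, envelope_abs(x, lam))
print(brute_w, prox_abs(x, lam))
```

The grid search is only meant to mirror the inf and argmin in the definitions; for functions with a known proximal mapping the closed form is the practical choice.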

When f is proper, lsc, and convex, we refer the reader to [7, Chapter 12] and [50] for the properties of e_λ f and P_λ f. When f is a proper and lsc function, not necessarily convex, in [45] Poliquin and Rockafellar coined the notions of prox-boundedness and prox-regularity of functions; see also [51, page 610].

Definition 7.1 (i) A function f : R^n → (−∞,+∞] is prox-bounded if there exists λ > 0 such that e_λ f(x) > −∞ for some x ∈ R^n. The supremum of the set of all such λ is the threshold λ_f of prox-boundedness for f.


(ii) A function f : R^n → (−∞,+∞] is prox-regular at x̄ for v̄ if f is finite and locally lsc at x̄ with v̄ ∈ ∂f(x̄), and there exist ε > 0 and ρ ≥ 0 such that

$$f(x') \ge f(x) + \langle v, x' - x\rangle - \frac{\rho}{2}\|x' - x\|^2 \quad \text{for all } \|x' - \bar{x}\| \le \varepsilon$$

when v ∈ ∂f(x), ‖v − v̄‖ < ε, ‖x − x̄‖ < ε, f(x) < f(x̄) + ε.

When this holds for all v̄ ∈ ∂f(x̄), f is said to be prox-regular at x̄.

We give a simple example to illustrate the concepts of prox-regularity and prox-boundedness of functions.

Example 7.2 (1). The function f : R → R : x ↦ −|x| is prox-bounded with λ_f = +∞. However, f is not prox-regular at x = 0.

(2). The function f : R → R : x ↦ x³ is prox-regular on R. However, f is not prox-bounded.

(3). The function f : R → R : x ↦ −x²/2 is prox-regular on R, and prox-bounded with λ_f = 1.

(4). The function f : R → R : x ↦ x³ − |x| is not prox-regular at x = 0, and not prox-bounded.
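The threshold λ_f = 1 in part (3) can be observed directly from the objective w ↦ f(w) + (x − w)²/(2λ). The sketch below is an illustration only; the closed form −x²/(2(1 − λ)) for 0 < λ < 1 follows from setting the derivative in w to zero, and the test values are arbitrary choices.

```python
def objective(w: float, x: float, lam: float) -> float:
    """f(w) + (x - w)^2 / (2 lam) for f(w) = -w^2 / 2 (Example 7.2(3))."""
    return -0.5 * w * w + (x - w) ** 2 / (2.0 * lam)


def envelope_closed_form(x: float, lam: float) -> float:
    """e_lam f(x) = -x^2 / (2 (1 - lam)), valid only for 0 < lam < 1."""
    return -x * x / (2.0 * (1.0 - lam))


# Below the threshold (lam = 0.5 < 1): the objective is strongly convex in w
# and its minimizer is w = x / (1 - lam).
x, lam = 1.0, 0.5
w_star = x / (1.0 - lam)
value_at_minimizer = objective(w_star, x, lam)
nearby_values = [objective(w_star + d, x, lam) for d in (-0.1, 0.1)]

# Above the threshold (lam = 2 > 1): the objective decreases without bound in w.
values_large_lambda = [objective(w, x, 2.0) for w in (10.0, 100.0, 1000.0)]

print(value_at_minimizer, envelope_closed_form(x, lam))
print(values_large_lambda)
```

For λ > 1 the quadratic penalty is too weak to compensate the concave term, so the infimum is −∞ at every x, which is why the threshold equals 1.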

In the sequel, we shall also need the following key concepts.

Definition 7.3 ([51, page 614]) Let O be a nonempty open subset of R^n. We say that f : O → R is C^{1+} if f is differentiable with ∇f Lipschitz continuous.

Set q : R^n → R : x ↦ ‖x‖²/2.

Definition 7.4 ([51, page 567]) A proper, lsc function f : R^n → (−∞,+∞] is µ-hypoconvex for some µ > 0 if f + µ^{−1} q is convex.

7.1 Fine properties of prox-regular functions

Two major facts about the Moreau envelopes of prox-bounded functions and prox-regular functions are:

Fact 7.5 ([51, Example 10.32]) Let f : R^n → (−∞,+∞] be proper, lsc, and prox-bounded with threshold λ_f. Then for every λ ∈ (0, λ_f), the function −e_λ f is lower-C^2, hence semidifferentiable, locally Lipschitz, Clarke regular, and

∂[−e_λ f](x) = λ^{−1}[conv P_λ f(x) − x],

∅ ≠ ∂[e_λ f](x) ⊆ λ^{−1}[x − P_λ f(x)].

Fact 7.6 ([5, Proposition 5.3], [51, Proposition 13.37]) Let f : R^n → (−∞,+∞] be lsc, proper, and prox-bounded with threshold λ_f. Suppose that f is prox-regular at x̄ for v̄ ∈ ∂f(x̄). Then for all λ ∈ (0, λ_f) there is a neighborhood U_λ of x̄ + λv̄ for which the following equivalent properties hold:

(i) e_λ f is C^{1+} on U_λ.

(ii) P_λ f is nonempty, single-valued, monotone and Lipschitz continuous on U_λ. Further,

∇e_λ f = (Id − P_λ f)/λ on U_λ.


Proposition 7.7 Let f : R^n → (−∞,+∞] be proper, lsc, and prox-bounded with threshold λ_f. Then for every λ ∈ (0, λ_f), one has dom P_λ f = R^n. Consequently, ran(Id + λ∂f) = R^n.

Proof. As 0 < λ < λ_f, we have dom P_λ f = R^n. To complete the proof, it suffices to apply [51, Example 10.2]: P_λ f ⊆ (Id + λ∂f)^{−1}. ∎

Proposition 7.8 (global prox-regularity implies hypoconvexity) Let f : R^n → (−∞,+∞] be proper, lsc, and prox-bounded with threshold λ_f. Suppose that f is prox-regular on R^n. Then for every λ ∈ (0, λ_f), the following hold:

(i) The function f + λ^{−1} q is convex.

(ii) P_λ f = (Id + λ∂f)^{−1} is single-valued and Lipschitz continuous on R^n.

(iii) ∇e_λ f = (Id − P_λ f)/λ.

Proof. When λ ∈ (0, λ_f), we have dom P_λ f = R^n. Since f is prox-regular, by Fact 7.6, for v̄ ∈ ∂f(x̄) there exists an open neighborhood U_λ of x̄ + λv̄ such that P_λ f is single-valued and locally Lipschitz. Proposition 7.7 implies that P_λ f is single-valued and locally Lipschitz on R^n. As P_λ f is always monotone, cf. [51, Proposition 12.19], P_λ f is maximally monotone by [51, Example 12.7]. Then (i) and (ii) follow from [51, Proposition 12.19]. To obtain (iii), one can apply Fact 7.6(ii). ∎

Proposition 7.8(i) immediately implies:

Corollary 7.9 Let f : R^n → (−∞,+∞] be proper, lsc, and prox-bounded. Suppose that f is prox-regular on R^n. Then the function f is a difference of two convex functions.

Characterizations of prox-regularity on an open subset are given by:

Fact 7.10 ([51, Theorem 10.33], [51, Proposition 13.33]) Let f : O → R, where O is a nonempty open set in R^n. The following are equivalent:

(i) The function f is lower-C^2 on O.

(ii) Relative to some neighborhood of each point of O, there is an expression f = g − ρ q in which g is a finite convex function, and ρ > 0.

(iii) f is prox-regular and locally Lipschitz on O.

Corollary 7.11 Let f : R^n → (−∞,+∞] and let O be a nonempty open subset of R^n. Suppose that f is prox-regular and locally Lipschitz on O. Then for every compact convex subset S of O, there exists ρ > 0 such that f + ρ q is convex on S.

Proof. Let x ∈ S. By Fact 7.10, there exist an open ball B(x, δ_x) ⊆ O and ρ_x such that f + ρ_x q is convex on B(x, δ_x). Select from the covering of S by the various balls B(x, δ_x) a finite covering, say B(x_i, δ_{x_i}) with i = 1, . . . , m. Let ρ := max{ρ_{x_1}, . . . , ρ_{x_m}}. As f + ρ q is convex on each B(x_i, δ_{x_i}), and S ⊆ ⋃ B(x_i, δ_{x_i}), we obtain that f + ρ q is convex on S. ∎


7.2 Relationship among (∂e_λ f)^{−1}(0), Fix P_λ f and (∂f)^{−1}(0)

Proposition 7.12 Let f : R^n → (−∞,+∞] be proper, lsc, and prox-bounded with threshold λ_f. Then for every λ ∈ (0, λ_f), the following hold:

(i) For every α ∈ R, the level set lev_α f ≠ ∅ if and only if lev_α(e_λ f) ≠ ∅. Moreover, lev_α(e_λ f) ⊇ lev_α f.

(ii) 0 ∈ ∂e_λ f(x) ⇒ x ∈ P_λ f(x) ⇒ 0 ∈ ∂f(x).

(iii) If, in addition, f is prox-regular at x̄ for v̄ ∈ ∂f(x̄), then on a neighborhood U_λ of x̄ + λv̄ one has

0 = ∇e_λ f(x) ⇔ x = P_λ f(x).

When f is prox-regular on R^n, one has

(∀x ∈ R^n) 0 = ∇e_λ f(x) ⇔ x = P_λ f(x) ⇔ 0 ∈ ∂f(x).

Proof.

(i). Since inf f = inf e_λ f and argmin f = argmin e_λ f by [51, Example 1.46], lev_α f ≠ ∅ if and only if lev_α(e_λ f) ≠ ∅ for every α ∈ R. The inclusion follows from e_λ f ≤ f.

(ii). By Fact 7.5, we have ∂e_λ f(x) ⊆ λ^{−1}[x − P_λ f(x)]. This gives the first implication. By [51, Example 10.2], P_λ f(x) ⊆ (Id + λ∂f)^{−1}(x) for all x ∈ R^n. The second implication follows.

(iii). By Fact 7.6 or [45, Theorem 4.4], the Moreau envelope e_λ f is C^{1+} on a neighborhood U_λ of x̄ + λv̄ with ∇e_λ f = λ^{−1}[Id − P_λ f] on U_λ. When f is prox-regular on R^n, one has ∇e_λ f = λ^{−1}[Id − P_λ f], and P_λ f = (Id + λ∂f)^{−1} is single-valued on R^n by Proposition 7.8. ∎

Fact 7.13 ([51, Proposition 12.19]) For a proper, lsc function f : R^n → (−∞,+∞], assume that f is µ-hypoconvex for some µ > 0. Then P_µ f = (Id + µ∂f)^{−1}, and for all λ ∈ (0, µ) the mapping P_λ f = (Id + λ∂f)^{−1} is Lipschitz continuous with constant µ/[µ − λ].

Under the assumption of f being µ-hypoconvex for some µ > 0, when λ > 0 is sufficiently small, e_λ f gives rise to a smooth regularization of f.

Proposition 7.14 For a proper, lsc function f : R^n → (−∞,+∞], assume that f is µ-hypoconvex for some µ > 0. Then for every λ ∈ (0, µ), the following hold:

(i) e_λ f is C^{1+} and ∇e_λ f = λ^{−1}(Id − P_λ f) on R^n.

(ii) ∇e_λ f(x) = 0 ⇔ 0 ∈ ∂f(x).

Proof. As f is µ-hypoconvex, f is prox-regular and prox-bounded. By Fact 7.13 and Fact 7.6,

∇e_λ f = λ^{−1}[Id − P_λ f] = λ^{−1}[Id − (Id + λ∂f)^{−1}].

Remark 7.15 Proposition 7.14(ii) can also be obtained from [33, Theorem 4.4], in which the authors study the Bregman envelope and proximal mapping of proper, lsc, and prox-bounded functions.


Proposition 7.16 For a proper, lsc function f : R^n → (−∞,+∞], assume that f := max{f_1, . . . , f_m} with f_i being C^2 and that f is prox-bounded below. Then for every λ > 0 sufficiently small, one has

0 = ∇e_λ f(x) ⇔ x = P_λ f(x) ⇔ 0 ∈ ∂f(x).

Proof. By [51, Proposition 13.33] or [45, Example 2.9], f is prox-regular everywhere on R^n. By Proposition 7.8, P_λ f = (Id + λ∂f)^{−1}. By Fact 7.6, we have ∇e_λ f = λ^{−1}[Id − P_λ f]. It remains to apply Proposition 7.12(iii), and P_λ f = (Id + λ∂f)^{−1} being single-valued. ∎

For a sequence {C_k}_{k∈N} of subsets of R^n, its limit and outer limit are denoted respectively by lim_{k→∞} C_k and lim sup_{k→∞} C_k; see [51, page 109].

Proposition 7.17 Let f : R^n → (−∞,+∞] be proper, lsc, and prox-bounded with threshold λ_f > 0. Assume that C ⊆ R^n is nonempty and closed. Then for every α ∈ R one has lim_{λ↓0} lev_α(e_λ f + ι_C) = lev_α(f + ι_C).

Proof. In view of [51, Theorem 7.4(d)], e_λ f + ι_C converges epigraphically to f + ι_C when λ ↓ 0. By [51, Proposition 7.7], for every α ∈ R there exists α_λ ↓ α such that lim_{λ↓0} lev_{α_λ}(e_λ f + ι_C) = lev_α(f + ι_C). Since

lev_α(f + ι_C) ⊆ lev_α(e_λ f + ι_C) ⊆ lev_{α_λ}(e_λ f + ι_C),

we obtain lim_{λ↓0} lev_α(e_λ f + ι_C) = lev_α(f + ι_C). ∎

7.3 The subgradient projector of eλ f

The following result extends [12, Proposition 3.1(viii)] and [11, Example 4.9(ii)] from convex functions to possibly nonconvex functions.

Theorem 7.18 (subgradient projector of Moreau envelopes of a prox-regular function) Suppose that f : R^n → (−∞,+∞] is proper, lsc, and prox-bounded with threshold λ_f, and that f is prox-regular. Then for every λ ∈ (0, λ_f), the subgradient projector of e_λ f is given by

$$G_{e_\lambda f} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{\lambda\, e_\lambda f(x)}{\|x - P_\lambda f(x)\|^2}\,(x - P_\lambda f(x)) & \text{if } e_\lambda f(x) > 0 \text{ and } x \neq P_\lambda f(x), \\ x & \text{otherwise,} \end{cases}$$

and Fix G_{e_λ f} = lev_0(e_λ f) ∪ {x ∈ R^n | x = P_λ f(x)}. When x = P_λ f(x), we have 0 ∈ ∂f(x). Moreover, lim_{λ↓0} lev_0(e_λ f) = lev_0 f.

Proof. Apply Propositions 7.12, 7.8, and 7.17 with C = R^n and α = 0. ∎
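The formula of Theorem 7.18 can be evaluated on a simple convex example, where prox-regularity and prox-boundedness hold automatically. The sketch below is an illustration under assumptions: f(x) = |x| − 1 on R, λ = 0.5, and the closed forms for P_λ f (soft thresholding) and e_λ f (Huber function minus 1) are the standard ones for this particular f.

```python
import math

LAM = 0.5  # assumed parameter value, any lam in (0, 2) behaves similarly here


def f(x: float) -> float:
    return abs(x) - 1.0


def prox(x: float, lam: float = LAM) -> float:
    """P_lam f for f = |.| - 1: soft thresholding (the constant does not matter)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def envelope(x: float, lam: float = LAM) -> float:
    """e_lam f for f = |.| - 1: Huber function shifted by -1."""
    huber = x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0
    return huber - 1.0


def G_envelope(x: float, lam: float = LAM) -> float:
    """Subgradient projector of e_lam f, following the formula of Theorem 7.18."""
    e, p = envelope(x, lam), prox(x, lam)
    if e > 0.0 and x != p:
        return x - lam * e / ((x - p) ** 2) * (x - p)
    return x


moved = G_envelope(3.0)   # e_lam f(3) > 0, so the point is moved
kept = G_envelope(0.5)    # e_lam f(0.5) <= 0, so the point is fixed
print(moved, envelope(moved), kept)
```

In this run the moved point lands at 1 + λ/2, on the boundary of lev_0(e_λ f) = [−1 − λ/2, 1 + λ/2], which contains lev_0 f = [−1, 1] as Proposition 7.12(i) predicts.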

The restriction of G_f to a subset D ⊆ R^n is denoted by G_f|_D and is the operator defined by

G_f|_D : D → R^n, G_f|_D(x) = G_f(x) for every x ∈ D.

Theorem 7.19 (functions being prox-regular at the critical point) Suppose that f : R^n → (−∞,+∞] is proper, lsc, and prox-bounded with threshold λ_f, and that f is prox-regular at x̄ for 0 ∈ ∂f(x̄). Then for every λ ∈ (0, λ_f), there exists a closed neighborhood U_λ of x̄ for which

$$(\forall x \in U_\lambda) \quad G_{e_\lambda f}|_{U_\lambda}(x) = \begin{cases} x - \lambda\,\dfrac{e_\lambda f(x)}{\|x - P_\lambda f(x)\|^2}\,(x - P_\lambda f(x)) & \text{if } e_\lambda f(x) > 0 \text{ and } x \neq P_\lambda f(x), \\ x & \text{otherwise,} \end{cases} \tag{70}$$

$$\operatorname{Fix} G_{e_\lambda f}|_{U_\lambda} = \bigl(\operatorname{lev}_0(e_\lambda f) \cup \{x \in \mathbb{R}^n \mid x = P_\lambda f(x)\}\bigr) \cap U_\lambda, \quad \text{and} \tag{71}$$

$$x = P_\lambda f(x) \;\Rightarrow\; 0 \in \partial f(x). \tag{72}$$

Moreover, lim sup_{λ↓0} (lev_0(e_λ f) ∩ U_λ) ⊆ lev_0 f.

Proof. Apply Fact 7.6 and Proposition 7.12(iii) to obtain (70)–(72). Since (lev_0(e_λ f) ∩ U_λ) ⊆ lev_0(e_λ f), and

$$\lim_{\lambda \downarrow 0} \operatorname{lev}_0(e_\lambda f) = \bigcap_{k \ge 1} \operatorname{lev}_0(e_{1/k} f)$$

by [51, Exercise 4.3(b)], it suffices to use Proposition 7.17 with C = R^n and α = 0. ∎

Theorem 7.20 (subgradient projector of Moreau envelopes of a hypoconvex function) Suppose that f : R^n → (−∞,+∞] is proper and lsc, and that f is µ-hypoconvex for some µ > 0. Then for every λ ∈ (0, µ), the subgradient projector of e_λ f is given by

$$G_{e_\lambda f} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{\lambda\, e_\lambda f(x)}{\|x - P_\lambda f(x)\|^2}\,(x - P_\lambda f(x)) & \text{if } e_\lambda f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise,} \end{cases}$$

Fix G_{e_λ f} = lev_0(e_λ f) ∪ {x ∈ R^n | 0 ∈ ∂f(x)}, and

{x ∈ R^n | 0 = ∇e_λ f(x)} = {x ∈ R^n | 0 ∈ ∂f(x)}.

Moreover, lim_{λ↓0} lev_0(e_λ f) = lev_0 f.

Proof. Apply Propositions 7.14 and 7.17 with C = R^n and α = 0. ∎

Theorems 7.18 and 7.20 imply that if one can solve x_λ ∈ Fix(G_{e_λ f}), then either 0 ∈ ∂f(x_λ) for some λ > 0 or the subsequential limits of (x_λ) will lie in lev_0 f when λ ↓ 0.

Remark 7.21 Moreau envelopes of nonconvex functions in infinite dimensional spaces have been intensively studied; see, e.g., [5, 6, 32, 3]. Thus, it is possible to have analogues of Theorems 7.18, 7.19, 7.20 in infinite dimensional spaces. However, this is beyond the scope of this paper.

Cutters are important for studying convergence of iterative methods; see, e.g., [9, 20, 13]. It is natural to ask whether G_{e_λ f} is a cutter in the case that G_f is a cutter. Although we cannot answer this in general, the following special case is true.

Proposition 7.22 Let f : R^n → (−∞,+∞] be proper, lsc, and prox-regular. Suppose that min f = 0, f is strictly differentiable at every x ∈ argmin f, and that 0 ∉ ∂f(x) for every x ∈ R^n \ argmin f. Then for every λ > 0 the following hold:

(i) Fix G_{e_λ f} = Fix G_f.

(ii) If G_{f,s} is a cutter for every selection s of ∂f, then G_{e_λ f} is a cutter.


Proof. As min f > −∞, the function f is prox-bounded with threshold r_f = +∞.

(i). Note that min f = min e_λ f and argmin f = argmin e_λ f. The assumption min f = 0 implies lev_0 f = lev_0 e_λ f = argmin f. Because f is prox-regular on R^n and r_f = +∞, for every λ > 0 we have ∇e_λ f = λ^{−1}(Id − Prox_λ f) and Prox_λ f = (Id + λ∂f)^{−1} being single-valued by Proposition 7.8. This gives

{x ∈ R^n | ∇e_λ f(x) = 0} = {x ∈ R^n | 0 ∈ ∂f(x)}.

Then

Fix G_{e_λ f} = lev_0 e_λ f ∪ {x ∈ R^n | ∇e_λ f(x) = 0} = lev_0 f ∪ {x ∈ R^n | 0 ∈ ∂f(x)} = Fix G_f.

(ii). Assume that e_λ f(x) > 0 and 0 ≠ ∇e_λ f(x). Since f is prox-regular, ∇e_λ f(x) = λ^{−1}(x − Prox_λ f(x)) ≠ 0. By the definition of Prox_λ f, we have

(73) 0 ≠ λ^{−1}(x − Prox_λ f(x)) ∈ ∂f(Prox_λ f(x)),

which implies that Prox_λ f(x) ∉ argmin f. Indeed, if Prox_λ f(x) ∈ argmin f, the assumption gives ∂f(Prox_λ f(x)) = {0}, which contradicts (73). Thus, f(Prox_λ f(x)) > 0. Because Prox_λ f(x) ∉ argmin f, the assumption also gives 0 ∉ ∂f(Prox_λ f(x)). These arguments, (i), Theorem 5.11, and G_{f,s} being a cutter altogether imply that

(74) f(Prox_λ f(x)) + ⟨λ^{−1}(x − Prox_λ f(x)), u − Prox_λ f(x)⟩ ≤ 0

if u ∈ Fix G_{f,s}, e_λ f(x) > 0 and ∇e_λ f(x) ≠ 0.

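Now we show that G_{e_λ f} is a cutter. Let u ∈ Fix G_{e_λ f}, e_λ f(x) > 0, and ∇e_λ f(x) ≠ 0. In view of (74) and (i), we calculate

\begin{align*}
& e_\lambda f(x) + \bigl\langle \lambda^{-1}(x - \operatorname{Prox}_\lambda f(x)),\, u - x \bigr\rangle \\
&= e_\lambda f(x) + \bigl\langle \lambda^{-1}(x - \operatorname{Prox}_\lambda f(x)),\, u - \operatorname{Prox}_\lambda f(x) \bigr\rangle + \bigl\langle \lambda^{-1}(x - \operatorname{Prox}_\lambda f(x)),\, \operatorname{Prox}_\lambda f(x) - x \bigr\rangle \\
&= f(\operatorname{Prox}_\lambda f(x)) + \frac{1}{2\lambda}\|x - \operatorname{Prox}_\lambda f(x)\|^2 + \bigl\langle \lambda^{-1}(x - \operatorname{Prox}_\lambda f(x)),\, u - \operatorname{Prox}_\lambda f(x) \bigr\rangle - \lambda^{-1}\|x - \operatorname{Prox}_\lambda f(x)\|^2 \\
&= f(\operatorname{Prox}_\lambda f(x)) + \bigl\langle \lambda^{-1}(x - \operatorname{Prox}_\lambda f(x)),\, u - \operatorname{Prox}_\lambda f(x) \bigr\rangle - \frac{1}{2\lambda}\|x - \operatorname{Prox}_\lambda f(x)\|^2 \\
&\le -\frac{1}{2\lambda}\|x - \operatorname{Prox}_\lambda f(x)\|^2 \le 0.
\end{align*}

Theorem 5.11(i) concludes the proof. ∎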

A local version of Proposition 7.22 comes as follows.

Proposition 7.23 Let f : R^n → (−∞,+∞] be proper, lsc, and prox-regular at x̄ for v̄ = 0, and let

S := {x ∈ R^n | 0 ∈ ∂f(x)} ∪ lev_0 f.

Suppose that min f = 0, and there exists δ > 0 such that

(i) For every selection s of ∂f, G_{f,s} is a cutter on B(x̄, δ), i.e.,

(∀x ∈ B(x̄, δ) \ S)(∀u ∈ S ∩ B(x̄, δ)) f(x) + ⟨s(x), u − x⟩ ≤ 0.

(ii) f is strictly differentiable at every u ∈ argmin f ∩ B(x̄, δ), and 0 ∉ ∂f(x) for every x ∈ (R^n \ argmin f) ∩ B(x̄, δ).

Then for every λ > 0 there is a neighborhood of x̄ on which G_{e_λ f} is a cutter.

Proof. Because min f > −∞, the function f is prox-bounded with threshold r_f = +∞. Since min f = min e_λ f and argmin f = argmin e_λ f, the assumption min f = 0 implies lev_0 f = lev_0 e_λ f = argmin f. Because f is prox-regular at x̄ for v̄ = 0, and r_f = +∞, by Fact 7.6 for every λ > 0 there exists δ > δ_1 > 0 such that on B(x̄, δ_1) the proximal mapping satisfies

(75) P_λ f is Lipschitz continuous, P_λ f(x̄) = x̄,

and

(76) ∇e_λ f = λ^{−1}(Id − Prox_λ f).

By (75) there exists δ_1 > δ_2 > 0 such that

(77) P_λ f(x) ∈ B(x̄, δ_1) when x ∈ B(x̄, δ_2).

Claim 1. For every u ∈ Fix G_{f,s} ∩ B(x̄, δ_2) we have

(78) f(Prox_λ f(x)) + ⟨λ^{−1}(x − Prox_λ f(x)), u − Prox_λ f(x)⟩ ≤ 0

if e_λ f(x) > 0, ∇e_λ f(x) ≠ 0 and x ∈ B(x̄, δ_2).

Indeed, let e_λ f(x) > 0, 0 ≠ ∇e_λ f(x), and x ∈ B(x̄, δ_2). In view of (76), ∇e_λ f(x) = λ^{−1}(x − Prox_λ f(x)) ≠ 0. By the definition of Prox_λ f or [45, Proposition 4.3(b)], we have

(79) 0 ≠ λ^{−1}(x − Prox_λ f(x)) ∈ ∂f(Prox_λ f(x)).

This implies that Prox_λ f(x) ∉ argmin f. Suppose to the contrary that Prox_λ f(x) ∈ argmin f. Then assumption (ii) and (77) give ∂f(Prox_λ f(x)) = {0}, which contradicts (79). Thus,

(80) f(Prox_λ f(x)) > 0.

Because Prox_λ f(x) ∉ argmin f and (77) holds, assumption (ii) also ensures

(81) 0 ∉ ∂f(Prox_λ f(x)).

Therefore, (78) follows from assumptions (i) and (ii).

Claim 2. Geλ f is a cutter on B(x, δ2).

To this end, let u ∈ Fix Geλ f ∩ B(x, δ2), x ∈ B(x, δ2), eλ f (x) > 0, and ∇eλ f (x) 6= 0. Thenu ∈ Fix G f ∩ B(x, δ2), f (Proxλ f (x)) > 0, 0 6∈ ∂ f (Proxλ f (x)) by (80), (81). Using (78) we calculate

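\[
\begin{aligned}
e_\lambda f(x) + \langle \lambda^{-1}(x - \operatorname{Prox}_{\lambda f}(x)),\, u - x\rangle
&= e_\lambda f(x) + \langle \lambda^{-1}(x - \operatorname{Prox}_{\lambda f}(x)),\, u - \operatorname{Prox}_{\lambda f}(x)\rangle + \langle \lambda^{-1}(x - \operatorname{Prox}_{\lambda f}(x)),\, \operatorname{Prox}_{\lambda f}(x) - x\rangle \\
&= f(\operatorname{Prox}_{\lambda f}(x)) + \frac{1}{2\lambda}\|x - \operatorname{Prox}_{\lambda f}(x)\|^2 + \langle \lambda^{-1}(x - \operatorname{Prox}_{\lambda f}(x)),\, u - \operatorname{Prox}_{\lambda f}(x)\rangle - \lambda^{-1}\|x - \operatorname{Prox}_{\lambda f}(x)\|^2 \\
&= f(\operatorname{Prox}_{\lambda f}(x)) + \langle \lambda^{-1}(x - \operatorname{Prox}_{\lambda f}(x)),\, u - \operatorname{Prox}_{\lambda f}(x)\rangle - \frac{1}{2\lambda}\|x - \operatorname{Prox}_{\lambda f}(x)\|^2 \\
&\le -\frac{1}{2\lambda}\|x - \operatorname{Prox}_{\lambda f}(x)\|^2 \le 0.
\end{aligned}
\]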

Hence, Geλ f is a cutter on B(x̄, δ2) by Theorem 5.11(ii). ∎
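The chain of (in)equalities in the proof can be checked numerically on a simple instance. The following sketch is only an illustration under our own assumptions: it takes f = |·| on R (so min f = 0 and Fix G f = {0}), uses the well-known soft-thresholding formula for Proxλ f , and the helper names `prox_abs`, `envelope_abs`, `cutter_lhs`, and `upper_bound` are hypothetical.

```python
def prox_abs(x, lam):
    """Proximal mapping of f = |.| with parameter lam (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def envelope_abs(x, lam):
    """Moreau envelope e_lam f(x) = f(p) + (x - p)^2 / (2 lam), with p = prox."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)


def cutter_lhs(x, u, lam):
    """e_lam f(x) + <grad e_lam f(x), u - x>, using grad e_lam f = (x - p) / lam."""
    p = prox_abs(x, lam)
    return envelope_abs(x, lam) + (x - p) / lam * (u - x)


def upper_bound(x, lam):
    """The bound -(1 / (2 lam)) * |x - p|^2 from the last line of the chain."""
    p = prox_abs(x, lam)
    return -((x - p) ** 2) / (2.0 * lam)


if __name__ == "__main__":
    lam, u = 0.7, 0.0  # u = 0 is the only fixed point of G_f for f = |.|
    for x in (-3.0, -0.4, 0.2, 1.5):
        print(x, cutter_lhs(x, u, lam), upper_bound(x, lam))
```

For this particular f the left-hand side coincides with the bound, so the final inequality of the chain is attained with equality.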

Is it possible that Geλ f is a cutter for every λ > 0 but G f is not a cutter? This is partially answered by the following result.

Proposition 7.24 Let f : Rn → R be C2 and prox-bounded below. If Geλ f is a cutter for all sufficiently small λ > 0, then G f is a cutter.

Proof. Let λ > 0 be sufficiently small. Proposition 7.8 yields that eλ f is C1+. Write

\[
S := \{x \in \mathbb{R}^n \mid \nabla f(x) = 0\} \cup \operatorname{lev}_0 f, \qquad
S_\lambda := \{x \in \mathbb{R}^n \mid \nabla e_\lambda f(x) = 0\} \cup \operatorname{lev}_0 e_\lambda f.
\]

Using that ∇eλ f (x) = 0 ⇔ 0 ∈ ∂ f (x) by Proposition 7.16 and that eλ f ≤ f , we have S ⊆ Sλ. Since Geλ f is a cutter, by Theorem 5.11 we obtain

\[
S_\lambda \subseteq \{u \in \mathbb{R}^n \mid e_\lambda f(x) + \langle \nabla e_\lambda f(x), u - x\rangle \le 0\}
\]

whenever eλ f (x) > 0 and ∇eλ f (x) ≠ 0. It follows that

\[
S \subseteq \{u \in \mathbb{R}^n \mid e_\lambda f(x) + \langle \nabla e_\lambda f(x), u - x\rangle \le 0\} \tag{82}
\]

whenever eλ f (x) > 0 and ∇eλ f (x) ≠ 0. By [3, Theorem 3.10] or [32, Theorem 5.1],

\[
\nabla f(x) = \limsup_{m\to\infty} \nabla e_{\lambda_m} f(x_m) \tag{83}
\]

in which xm → x, eλm f (xm) → f (x), and λm ↓ 0. Whenever f (x) > 0 and ∇ f (x) ≠ 0, (83) implies that for sufficiently large m, it holds that eλm f (xm) > 0 and ∇eλm f (xm) ≠ 0. Then by (82),

(∀u ∈ S) eλm f (xm) + ⟨∇eλm f (xm), u− xm⟩ ≤ 0.

Passing to the limit as m → ∞, we have

(∀u ∈ S) f (x) + ⟨∇ f (x), u− x⟩ ≤ 0.

Hence, G f is a cutter by using Theorem 5.11 again. ∎

7.4 The subgradient projector of dC when C is prox-regular at a point

In this subsection, instead of functions we shall consider sets which are prox-regular at some points. Recall that a set C ⊆ Rn is prox-regular at x̄ ∈ C for v ∈ NC(x̄) when ιC is prox-regular at x̄ for v; see [51, Exercise 13.31].

Example 7.25 Let C ⊆ Rn be closed and x̄ ∈ C. If C is prox-regular at x̄ for v = 0, then there exists a neighborhood U of x̄ on which

(i) PC is single-valued and Lipschitz;

(ii) PC = (Id+T)⁻¹ for some localization T of NC around (x̄, 0);

(iii) dC is strictly differentiable on U \ C with ∇dC = (Id−PC)/dC;

(iv) GdC = PC;

(v) G_{dC²} = (Id+PC)/2.

Proof. (i), (ii), and (iii) are given in [51, page 618].

To see (iv), let x ∈ U \ C. Since dC(x) > 0 and ∇dC(x) = (x − PC(x))/dC(x) ≠ 0, we have ‖∇dC(x)‖ = 1 and hence

\[
G_{d_C}(x) = x - \frac{d_C(x)}{\|\nabla d_C(x)\|^2}\,\nabla d_C(x) = x - (x - P_C(x)) = P_C(x).
\]

When x ∈ U ∩ C, GdC(x) = x = PC(x).

(v) follows from (iv) and Theorem 3.9. ∎
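Items (iv) and (v) can be illustrated numerically. The sketch below is an illustration under our own assumptions, not part of the original argument: C is taken to be the closed unit ball of R2 (convex, hence prox-regular), the gradient is approximated by central differences, and the helper names `proj_ball`, `dist_ball`, `num_grad`, and `subgrad_projector` are hypothetical.

```python
import math


def proj_ball(x):
    """Projection onto the closed unit ball C of R^2."""
    n = math.hypot(x[0], x[1])
    return x if n <= 1.0 else (x[0] / n, x[1] / n)


def dist_ball(x):
    """Distance d_C(x) to the closed unit ball."""
    return max(0.0, math.hypot(x[0], x[1]) - 1.0)


def num_grad(fun, x, h=1e-6):
    """Central-difference approximation of the gradient of fun at x."""
    gx = (fun((x[0] + h, x[1])) - fun((x[0] - h, x[1]))) / (2 * h)
    gy = (fun((x[0], x[1] + h)) - fun((x[0], x[1] - h))) / (2 * h)
    return (gx, gy)


def subgrad_projector(fun, x):
    """G_f(x) = x - f(x) grad f(x) / ||grad f(x)||^2 if f(x) > 0 and grad != 0, else x."""
    fx = fun(x)
    if fx <= 0.0:
        return x
    g = num_grad(fun, x)
    g2 = g[0] ** 2 + g[1] ** 2
    if g2 == 0.0:
        return x
    return (x[0] - fx * g[0] / g2, x[1] - fx * g[1] / g2)


if __name__ == "__main__":
    x = (3.0, 4.0)
    print(subgrad_projector(dist_ball, x), proj_ball(x))      # item (iv)
    print(subgrad_projector(lambda z: dist_ball(z) ** 2, x))  # item (v)
```

At x = (3, 4) the projection is (0.6, 0.8), so item (v) predicts the midpoint (1.8, 2.4) for the squared distance, up to the finite-difference error.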

Remark 7.26 Sets which satisfy the assumption on C in Example 7.25 include convex sets, strongly amenable sets, etc.; see, e.g., [51, page 442]. See also [1] for recent advances on prox-regular sets and uniformly prox-regular sets.

According to Example 7.25(iv), when C is prox-regular at x̄ for v = 0, we have GdC = PC on a neighborhood of x̄. What happens if, in addition, GdC is a cutter or quasi-nonexpansive on the neighborhood?

Proposition 7.27 Let C ⊆ Rn be closed and x̄ ∈ C, and let C be prox-regular at x̄ for v = 0. Suppose that there exists δ > 0 such that one of the following holds:

(i) PC is a cutter on B(x̄, δ), i.e.,

(84) (∀x ∈ B(x̄, δ))(∀u ∈ C ∩ B(x̄, δ)) ⟨x− PC(x), u− PC(x)⟩ ≤ 0.

(ii) PC is quasi-ne on B(x̄, δ), i.e.,

(85) (∀x ∈ B(x̄, δ))(∀u ∈ C ∩ B(x̄, δ)) ‖PC(x)− u‖ ≤ ‖x− u‖.

Then C ∩ B(x̄, δ) is convex.

Proof. By Example 7.25, there exists δ > 0 such that PC is single-valued and Lipschitz on the closed ball B(x̄, δ).

(i). Assume that (84) holds. On the one hand, (84) gives that

\[
C \cap B(\bar{x},\delta) \subseteq \bigcap_{x \in B(\bar{x},\delta)} \{u \in B(\bar{x},\delta) \mid \langle x - P_C(x), u - P_C(x)\rangle \le 0\}.
\]

On the other hand, let

\[
y \in \bigcap_{x \in B(\bar{x},\delta)} \{u \in B(\bar{x},\delta) \mid \langle x - P_C(x), u - P_C(x)\rangle \le 0\}.
\]

Then y ∈ B(x̄, δ) and ⟨x− PC(x), y− PC(x)⟩ ≤ 0 for every x ∈ B(x̄, δ). Taking x = y we have ⟨y− PC(y), y− PC(y)⟩ ≤ 0, which implies y = PC(y), so y ∈ C. Therefore, y ∈ B(x̄, δ) ∩ C. Hence

\[
C \cap B(\bar{x},\delta) = \bigcap_{x \in B(\bar{x},\delta)} \{u \in B(\bar{x},\delta) \mid \langle x - P_C(x), u - P_C(x)\rangle \le 0\},
\]

and consequently C ∩ B(x̄, δ) is a convex set, because each set in the intersection is the intersection of the ball B(x̄, δ) with a half space (or the ball itself when x = PC(x)).

(ii). Similar arguments as (i) show that

\[
C \cap B(\bar{x},\delta) = \bigcap_{x \in B(\bar{x},\delta)} \{u \in B(\bar{x},\delta) \mid \|P_C x - u\| \le \|x - u\|\}.
\]

To finish the proof, it suffices to observe that in the Euclidean space Rn, for every x, y ∈ Rn the set {u ∈ Rn | ‖y− u‖ ≤ ‖x− u‖} is a half space when x ≠ y, and the whole space Rn if x = y. ∎
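The contrapositive of Proposition 7.27 can be seen on a concrete prox-regular set whose local pieces are not convex. The sketch below is an illustration under our own assumptions: C is the unit circle in R2 with x̄ = (1, 0), compared against the closed unit disk, and the names `proj_circle`, `proj_disk`, and `cutter_lhs` are hypothetical.

```python
import math


def proj_circle(x):
    """Projection onto the unit circle (single-valued for x != 0)."""
    n = math.hypot(x[0], x[1])
    return (x[0] / n, x[1] / n)


def proj_disk(x):
    """Projection onto the closed unit disk."""
    n = math.hypot(x[0], x[1])
    return x if n <= 1.0 else (x[0] / n, x[1] / n)


def cutter_lhs(proj, x, u):
    """<x - P(x), u - P(x)>, which inequality (84) requires to be <= 0."""
    p = proj(x)
    return (x[0] - p[0]) * (u[0] - p[0]) + (x[1] - p[1]) * (u[1] - p[1])


if __name__ == "__main__":
    u = (math.cos(0.1), math.sin(0.1))  # a point of both sets close to (1, 0)
    # Circle: a point just inside violates (84), matching the nonconvexity
    # of the arc C ∩ B((1, 0), delta).
    print(cutter_lhs(proj_circle, (0.9, 0.0), u))
    # Disk: (84) holds from inside and from outside.
    print(cutter_lhs(proj_disk, (0.9, 0.0), u), cutter_lhs(proj_disk, (1.1, 0.0), u))
```

For the circle the value is 0.1(1 − cos 0.1) > 0, so PC is not a cutter on any ball around (1, 0), in agreement with the proposition.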

Proposition 7.28 Let C ⊆ Rn be closed and x̄ ∈ C. If there exists δ > 0 such that C ∩ B(x̄, δ) is convex, then there exists δ1 > 0 such that PC is a cutter on B(x̄, δ1). Consequently, PC is quasi-ne on B(x̄, δ1).

Proof. Observe that for p ∈ PC(x),

‖p− x̄‖ ≤ ‖p− x‖+ ‖x− x̄‖ ≤ 2‖x− x̄‖,

because ‖p− x‖ = dC(x) ≤ ‖x− x̄‖. Thus, we can choose 0 < δ1 < δ sufficiently small, e.g., δ1 < δ/2, such that

‖x− x̄‖ < δ1 implies (∀p ∈ PC(x)) ‖p− x̄‖ < δ.

This implies that whenever x ∈ B(x̄, δ1), we have PC(x) ⊆ C ∩ B(x̄, δ). Then

(86) (∀x ∈ B(x̄, δ1)) PC(x) = PC∩B(x̄,δ)(x).

Because C ∩ B(x̄, δ) is closed and convex, PC∩B(x̄,δ) is firmly nonexpansive on Rn. From (86), we have that PC is firmly nonexpansive on B(x̄, δ1), that is,

(∀x, y ∈ B(x̄, δ1)) ‖PC(x)− PC(y)‖² + ‖(Id−PC)(x)− (Id−PC)(y)‖² ≤ ‖x− y‖².

It follows that

(∀x ∈ B(x̄, δ1))(∀y ∈ C ∩ B(x̄, δ1)) ‖PC(x)− y‖² + ‖x− PC(x)‖² ≤ ‖x− y‖².

Hence PC is a cutter on B(x̄, δ1). ∎

8 Characterization of subgradient projectors of convex functions

Subgradient projectors of convex functions are quasi-fne, so algorithms developed in [20] or [7] can be applied; see also Theorem 6.13. Therefore, in practice, it is useful to have available some results on whether a mapping is a subgradient projector of a convex function. This is the goal of this section. The results in this section provide some checkable conditions for convergence of iterated subgradient projectors in Section 6.

The following result is of independent interest.


Proposition 8.1 Let C ⊆ Rn be closed and convex. Assume that the function f : Rn \ C → R satisfies

(i) f ≥ 0 on Rn \ C;

(ii) f is convex on every convex subset of Rn \ C;

(iii) Whenever x ∈ bdry(Rn \ C), one has

\[
\lim_{\substack{y \to x \\ y \in \mathbb{R}^n \setminus C}} f(y) = 0.
\]

That is, lim_{i→∞} f (yi) = 0 whenever (yi)i∈N is a sequence in Rn \ C converging to a boundary point x of Rn \ C.

Define

\[
g : \mathbb{R}^n \to \mathbb{R} : x \mapsto
\begin{cases}
f(x) & \text{if } x \notin C,\\
0 & \text{if } x \in C.
\end{cases}
\]

Then g is convex on Rn.

Proof. Let x, y ∈ Rn, 0 ≤ λ ≤ 1. We need to show

(87) g(λx + (1− λ)y) ≤ λg(x) + (1− λ)g(y).

We consider three cases.

(i). If [x, y] ⊆ Rn \ C, g = f is convex on [x, y] by the assumption.

(ii). If λx + (1− λ)y ∈ C, then

g(λx + (1− λ)y) = 0 ≤ λg(x) + (1− λ)g(y)

since g(x), g(y) ≥ 0.

(iii). λx + (1− λ)y ∉ C and [x, y] ∩ C ≠ ∅. In particular, x, y cannot both be in C. We consider two subcases.

Subcase 1. x ∈ C and y ∉ C. As y ∉ C, there exists z ∈ bdry(C) such that

λx + (1− λ)y ∈ [z, y] ⊆ X \ C and f (z) = 0.

Because

λx + (1− λ)y = αz + (1− α)y for some 0 ≤ α ≤ 1,

and f is convex on [z, y], we have

(88) f (λx + (1− λ)y) = f (αz + (1− α)y) ≤ α f (z) + (1− α) f (y) = (1− α) f (y).

Now z = βx + (1− β)y for some 0 ≤ β ≤ 1, and

λx + (1− λ)y = αz + (1− α)y = α(βx + (1− β)y) + (1− α)y = (αβ)x + (1− αβ)y

give λ = αβ. Therefore, by (88), 1− α ≤ 1− αβ, g(x) = 0 and g(y) = f (y) ≥ 0,

\begin{align}
g(\lambda x + (1-\lambda)y) &= f(\lambda x + (1-\lambda)y) \tag{89}\\
&\le (1-\alpha\beta) f(y) = (1-\lambda) g(y) + \lambda g(x), \tag{90}
\end{align}


which is (87).

Subcase 2. x ∉ C and y ∉ C. By the assumption, there exists z ∈ bdry(C) such that

λx + (1− λ)y ∈ [z, y] or λx + (1− λ)y ∈ [x, z],

say λx + (1− λ)y ∈ [z, y]. Then

λx + (1− λ)y = αz + (1− α)y for some 0 ≤ α ≤ 1.

As f is convex on [z, y], f (z) = 0,

(91) g(λx + (1− λ)y) = f (αz + (1− α)y) ≤ α f (z) + (1− α) f (y) = (1− α) f (y).

Now z = βx + (1− β)y for some 0 ≤ β ≤ 1, and

λx + (1− λ)y = αz + (1− α)y = α(βx + (1− β)y) + (1− α)y = (αβ)x + (1− αβ)y

give λ = αβ. Then by (91), using g(x) = f (x) ≥ 0, g(y) = f (y) ≥ 0, we obtain

\begin{align}
g(\lambda x + (1-\lambda)y) &\le (1-\alpha\beta) f(y) = (1-\lambda) f(y) \tag{92}\\
&\le (1-\lambda) f(y) + \lambda f(x) \tag{93}\\
&= (1-\lambda) g(y) + \lambda g(x), \tag{94}
\end{align}

which is (87).

Combining (i)–(iii), we conclude that g is convex on Rn. ∎

Theorem 8.2 Let T : Rn → Rn and let C := {x ∈ Rn | Tx = x} be closed and convex. Then T is a subgradient projector of a convex function f : Rn → R with lev0 f = C if and only if there exists g : Rn → [−∞,+∞) such that g : Rn \ C → R is locally Lipschitz, g(x) = −∞ for every x ∈ C, and

(i) for every x ∈ Rn \ C, (x − Tx)/‖x − Tx‖² ∈ ∂g(x);

(ii) the function defined by

\[
f(x) :=
\begin{cases}
\exp(g(x)) & \text{if } x \notin C,\\
0 & \text{if } x \in C,
\end{cases}
\]

is convex.

In this case, T = G f .

Proof. “⇒”: Assume that T is a subgradient projector, say T = G f1 with f1 : Rn → R being convex and lev0 f1 = C. Then f = max{0, f1} is convex and G f = G f1 . Put g = ln f and C = lev0 f . Since f is locally Lipschitz, g is locally Lipschitz on Rn \ C. Note that ∂g(x) = (∂ f (x))/ f (x) when f (x) > 0. Apply Theorem 4.1(i) to obtain (i).

“⇐”: Assume that (i), (ii) hold. When x ∉ C, (i) and (ii) give

\[
\|x - Tx\| = \frac{1}{\|c(x)\|}, \qquad \partial g(x) = \frac{\partial f(x)}{f(x)},
\]

where c(x) := (x − Tx)/‖x − Tx‖² ∈ ∂g(x). Using (i) again, we have

\[
Tx = x - \|x - Tx\|^2 c(x) = x - \frac{c(x)}{\|c(x)\|^2} = G_f(x) \tag{95}
\]

by Theorem 4.1(ii). Moreover, when x ∈ C, Tx = x = G f (x). Hence T = G f . ∎

For an n × n symmetric matrix A, by A ⪰ 0 we mean that A is positive semidefinite.

Theorem 8.3 Let T : Rn → Rn and C := {x ∈ Rn | Tx = x}. Suppose that C is closed and convex, and T is continuously differentiable on Rn \ C. Define

\[
T_1 : \mathbb{R}^n \setminus C \to \mathbb{R}^n : x \mapsto \frac{x - Tx}{\|x - Tx\|^2}.
\]

Then T is a subgradient projector of a convex function f : Rn → R with lev0 f = C that is differentiable on Rn \ C if and only if

(i) For every x ∈ Rn \ C, the matrix T1(x)(T1(x))∗ +∇T1(x) ⪰ 0;

(ii) There exists a function g : Rn → [−∞,+∞) such that

\[
(\forall x \in \mathbb{R}^n \setminus C)\ \ \nabla g(x) = T_1(x), \qquad
(\forall x \in \operatorname{bdry}(C))\ \lim_{\substack{y \to x \\ y \in \mathbb{R}^n \setminus C}} g(y) = -\infty, \qquad\text{and}\qquad
(\forall x \in C)\ \ g(x) = -\infty.
\]

Proof. “⇒”: Assume that T = G f with f being convex and lev0 f = C. Theorem 4.10 shows that f is continuously differentiable on Rn \ C. By Theorem 4.1(i), we can put g = ln f to obtain (ii). Moreover, as f = exp(g), thanks to (16) in Theorem 4.1, for every x ∉ C we have

\[
\begin{aligned}
\nabla f(x) &= e^{g(x)}\nabla g(x) = e^{g(x)}T_1(x),\\
\nabla^2 f(x) &= e^{g(x)}T_1(x)(T_1(x))^* + e^{g(x)}\nabla T_1(x) = e^{g(x)}\bigl(T_1(x)(T_1(x))^* + \nabla T_1(x)\bigr).
\end{aligned}
\]

Since f is convex, ∇² f (x) ⪰ 0, and this is equivalent to

T1(x)(T1(x))∗ +∇T1(x) ⪰ 0,

which is (i).

“⇐”: Assume that (i) and (ii) hold. Put f = exp(g). Then lev0 f = C, and for x ∈ Rn \ C,

\[
\begin{aligned}
\nabla f(x) &= e^{g(x)}\nabla g(x) = e^{g(x)}T_1(x),\\
\nabla^2 f(x) &= e^{g(x)}T_1(x)(T_1(x))^* + e^{g(x)}\nabla T_1(x) = e^{g(x)}\bigl(T_1(x)(T_1(x))^* + \nabla T_1(x)\bigr).
\end{aligned}
\]

(i) and (ii) imply that f is differentiable and convex on convex subsets of Rn \ C, and f ≡ 0 on C. By Proposition 8.1, f is convex on Rn.

Moreover, when x ≠ Tx we have

\begin{align}
G_f(x) &= x - \Bigl(\frac{f(x)}{\|\nabla f(x)\|}\Bigr)^2 \frac{\nabla f(x)}{f(x)} = x - \frac{T_1(x)}{\|T_1(x)\|^2} \tag{96}\\
&= x - \frac{\dfrac{x - Tx}{\|x - Tx\|^2}}{\Bigl(\dfrac{1}{\|x - Tx\|}\Bigr)^2} = x - (x - Tx) = Tx. \tag{97}
\end{align}


Corollary 8.4 Let T : R → R and C := {x ∈ R | Tx = x}. Suppose that C is a closed interval, and T is continuously differentiable on R \ C. Then T is a subgradient projector of a convex function f : R → R with lev0 f = C that is differentiable on R \ C if and only if

(i) T is monotonically increasing on convex subsets of R \ C;

(ii) The function

\[
g(x) = \int_a^x \frac{1}{s - Ts}\,ds
\]

satisfies lim_{x↓sup(C)} g(x) = −∞ for some a > sup(C), and lim_{x↑inf(C)} g(x) = −∞ for some a < inf(C).

Proof. Define n : R → R : x ↦ x − Tx. Then for every x ∉ C, T1(x) = 1/n(x). Theorem 8.3(i) is equivalent to

\[
\frac{1}{n^2(x)} - \frac{n'(x)}{n^2(x)} \ge 0.
\]

This is the same as n′(x) ≤ 1, which amounts to T′(x) ≥ 0. ∎
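To see how conditions (i) and (ii) of Corollary 8.4 recover a function from its subgradient projector, consider a minimal sketch under our own assumptions: we take T(x) = x/2, which is the subgradient projector of f (x) = x² with C = {0}, choose a = 1, and approximate the integral in (ii) by the midpoint rule. The function names `T` and `g` below are hypothetical helpers.

```python
import math


def T(x):
    """Candidate mapping; here T = G_f for f(x) = x^2, so Fix T = {0}."""
    return 0.5 * x


def g(x, a=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of 1/(s - T(s)) from a to x."""
    h = (x - a) / steps
    total = 0.0
    for k in range(steps):
        s = a + (k + 0.5) * h
        total += h / (s - T(s))
    return total


if __name__ == "__main__":
    # T is increasing, so (i) holds; g(x) = 2 ln(x) decreases without bound
    # as x decreases to 0 = sup(C), so (ii) holds on the right of C.
    for x in (0.5, 0.1, 0.01):
        print(x, g(x), 2.0 * math.log(x), math.exp(g(x)))
```

Here exp(g(x)) reproduces x², i.e., the convex function whose subgradient projector is T, consistent with the construction f = exp(g) in Theorem 8.3.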

Remark 8.5 Let f : Rn → R be continuously differentiable, lev0 f = Fix T, and G f = T. Can one use T to decide whether f is convex? The proof of Theorem 8.3 implies that ∇ f = f · T1 where

\[
T_1 : \mathbb{R}^n \setminus \operatorname{lev}_0 f \to \mathbb{R}^n : x \mapsto \frac{x - Tx}{\|x - Tx\|^2}.
\]

If

(98) f · T1 is monotone on convex subsets of Rn \ lev0 f ,

then f is convex on convex subsets of Rn \ lev0 f . Using Proposition 8.1, we conclude that max{0, f } is convex on Rn. When T is continuously differentiable, (98) is equivalent to

(99) (∀x ∈ Rn \ C) T1(x)(T1(x))∗ +∇T1(x) ⪰ 0.

On R, (99) is equivalent to

(100) T is monotonically increasing on convex subsets of R \ C.

Corollary 8.6 Let T : R → R and C := {x ∈ R | Tx = x}. Let C be a closed interval, and T be continuously differentiable on R \ C. Define

N : R → R : x ↦ x− Tx.

Suppose that

(i) N is nonexpansive;

(ii) The function

\[
g(x) = \int_a^x \frac{1}{s - Ts}\,ds
\]

satisfies lim_{x↓sup(C)} g(x) = −∞ for some a > sup(C), and lim_{x↑inf(C)} g(x) = −∞ for some a < inf(C).

Then T is a subgradient projector of a convex function f : R → R with lev0 f = C that is differentiable on R \ C. In particular, the assumption (i) holds when T is firmly nonexpansive.

Proof. It suffices to observe that T = Id−N. Since N is nonexpansive, T is monotone. Also note that T is firmly nonexpansive if and only if N is. ∎

We illustrate Corollary 8.4 with three examples. They demonstrate that both conditions (i) and(ii) in Corollary 8.4 are needed. More precisely, (i) is for the convexity of f ; (ii) is for lev0 f = C.

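Example 8.7 Define T : R → R by

\[
T(x) :=
\begin{cases}
x - \sqrt{x} + \sqrt{x}\,e^{-2\sqrt{x}} & \text{if } x > 0,\\
0 & \text{if } x \le 0.
\end{cases}
\]

Then T is a subgradient projector of the nonconvex function

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
e^{2\sqrt{x}} - 1 & \text{if } x > 0,\\
0 & \text{if } x \le 0.
\end{cases}
\]

In this case, T fails to be monotone, but T verifies condition (ii) of Corollary 8.4.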

Proof. When x > 0, f ′(x) = e^{2√x} x^{−1/2}, so that

\[
f''(x) = \frac{e^{2\sqrt{x}}\bigl(1 - 1/(2\sqrt{x})\bigr)}{x}.
\]

Since f ″(x) < 0 when x < 1/4, f is not convex on R. Now we show that

(i). T fails to be monotone. This is equivalent to verifying that for some x we have N′(x) > 1 where N(x) = x− Tx. Indeed,

\[
N'(x) = \bigl(\sqrt{x} - \sqrt{x}\,e^{-2\sqrt{x}}\bigr)' = \frac{1}{2}\,\frac{e^{2\sqrt{x}} - 1}{e^{2\sqrt{x}}\sqrt{x}} + e^{-2\sqrt{x}}.
\]

L’Hospital’s rule gives

\[
\lim_{x \to 0^+} \frac{e^{2\sqrt{x}} - 1}{e^{2\sqrt{x}}\sqrt{x}} = 2,
\]

so lim_{x→0+} N′(x) = 2. Therefore, T is not monotone.

(ii). T satisfies condition (ii) of Corollary 8.4. For x > 0,

\[
N(x) = \frac{e^{2\sqrt{x}} - 1}{e^{2\sqrt{x}}\,x^{-1/2}}.
\]

With a > 0, we have

\[
g(x) = \int_a^x \frac{1}{N(s)}\,ds = \int_a^x \frac{e^{2\sqrt{s}}\,s^{-1/2}}{e^{2\sqrt{s}} - 1}\,ds = \ln\bigl(e^{2\sqrt{x}} - 1\bigr) - \ln\bigl(e^{2\sqrt{a}} - 1\bigr).
\]

Clearly, lim_{x→0+} g(x) = −∞. Hence (ii) holds. ∎
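The computations of Example 8.7 on (0,+∞) can be spot-checked numerically. The following sketch is an illustration under our own assumptions: it only treats x > 0, uses the closed-form derivative f ′ from the proof, and the function names `f`, `f_prime`, `T`, `G`, and `N_prime` are hypothetical helpers.

```python
import math


def f(x):
    """f(x) = exp(2 sqrt(x)) - 1 for x > 0, and 0 otherwise."""
    return math.exp(2.0 * math.sqrt(x)) - 1.0 if x > 0 else 0.0


def f_prime(x):
    """Derivative of f for x > 0."""
    return math.exp(2.0 * math.sqrt(x)) / math.sqrt(x)


def T(x):
    """The mapping of Example 8.7 on x > 0."""
    r = math.sqrt(x)
    return x - r + r * math.exp(-2.0 * r)


def G(x):
    """One-dimensional subgradient projector x - f(x)/f'(x) for x > 0."""
    return x - f(x) / f_prime(x)


def N_prime(x):
    """Derivative of N(x) = x - T(x) for x > 0, as computed in the proof."""
    r = math.sqrt(x)
    e = math.exp(2.0 * r)
    return 0.5 * (e - 1.0) / (e * r) + math.exp(-2.0 * r)


if __name__ == "__main__":
    for x in (0.01, 0.5, 2.0):
        print(x, G(x), T(x))
    print(N_prime(1e-6))         # close to the limit 2, hence N' > 1 near 0
    print(T(0.0001), T(0.01))    # T decreases here, so T is not monotone
```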


Example 8.8 Define

\[
T : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
x - \dfrac{1}{2x} & \text{if } x \ne 0,\\
0 & \text{if } x = 0.
\end{cases}
\]

Then T = G f where f : R → R : x ↦ e^{x²}. However, lev0 f = ∅ but Fix(T) = {0}. In this case, in Corollary 8.4 condition (i) holds but condition (ii) fails.

Proof. We have

\[
N(x) = x - T(x) = \frac{1}{2x}
\]

and N′(x) = −1/(2x²). Therefore, T is monotone on (0,+∞) and (−∞, 0). This says that condition (i) of Corollary 8.4 holds. However, when a > 0, for x > 0 we have

\[
g(x) = \int_a^x \frac{1}{N(s)}\,ds = \int_a^x 2s\,ds = x^2 - a^2.
\]

Then lim_{x→0+} g(x) = −a², so condition (ii) of Corollary 8.4 fails. ∎

Example 8.9 Define T : R → R by

\[
x \mapsto
\begin{cases}
x - \sqrt{x} & \text{if } x > 0,\\
0 & \text{if } x = 0,\\
x - \sqrt{-x} & \text{if } x < 0.
\end{cases}
\]

Then T = G f where f is the nonconvex function

\[
f : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
e^{2\sqrt{x}} & \text{if } x \ge 0,\\
e^{-2\sqrt{-x}} & \text{if } x < 0.
\end{cases}
\]

However, lev0 f = ∅ but Fix T = {0}. In this case, both conditions (i) and (ii) in Corollary 8.4 fail.

Proof. The function f (x) = e^{2√x} is nonconvex on [0,+∞), see Example 8.7. G f = T follows by direct calculations.

Condition (i) of Corollary 8.4 fails: T is not monotonically increasing on [0,+∞) since T′(x) = 1 − 1/(2√x) < 0 when x > 0 is sufficiently near 0.

Condition (ii) of Corollary 8.4 fails. Indeed, N(x) = x− T(x) = √x when x ≥ 0. When a > 0, for x > 0 we have

\[
g(x) = \int_a^x \frac{1}{\sqrt{s}}\,ds = 2\sqrt{x} - 2\sqrt{a},
\]

so that lim_{x→0+} g(x) = −2√a. ∎

For further properties of subgradient projectors of convex functions, we refer the reader to [44, 54, 12].

This completes Part II. We will investigate conditions under which a subgradient projector is linear in Part III.


Part III

Linear subgradient projectors

9 Characterizations of G f ,s when G f ,s is linear

We shall see in this section that under appropriate conditions a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator onto a subspace (Theorems 9.6 and 9.11). For subgradient projectors of convex functions, see [12, 44, 9, 46, 47, 48]. We begin with

9.1 Linear cutters are precisely linear firmly nonexpansive mappings

Proposition 9.1 Let H be a Hilbert space, and T : H → H be a linear operator. Then the following are equivalent:

(i) T is a cutter, i.e., quasi-firmly nonexpansive.

(ii) T is firmly nonexpansive.

(iii) There exist δ > 0 and x̄ ∈ Fix T such that T is a cutter on B(x̄, δ), i.e., a local cutter.

Proof. (i)⇒(ii). Assume that T is a cutter. Then for every x ∈ X and u ∈ Fix T,

⟨x− Tx, u− Tx⟩ = ⟨Tx− x, Tx− u⟩ ≤ 0.

Put u = 0. We have

⟨Tx− x, Tx− 0⟩ ≤ 0 ⇒ ‖Tx‖² ≤ ⟨x, Tx⟩.

Hence T is firmly nonexpansive, see [7, Corollary 4.3].

(ii)⇒(i). Assume that T is firmly nonexpansive. Let u ∈ Fix T. Then Tu = u and

\begin{align}
\langle Tx - x, Tx - u\rangle &= \langle Tx - x, Tx - Tu\rangle \tag{101}\\
&= \langle Tx - Tu + Tu - x, Tx - Tu\rangle \tag{102}\\
&= \|Tx - Tu\|^2 + \langle Tu - x, Tx - Tu\rangle \tag{103}\\
&= \|Tx - Tu\|^2 - \langle x - u, Tx - Tu\rangle \le 0. \tag{104}
\end{align}

Hence T is a cutter.

(iii)⇒(ii). By the assumption

(∀x ∈ B(x̄, δ))(∀u ∈ B(x̄, δ) ∩ Fix T) ⟨x− Tx, u− Tx⟩ ≤ 0.

As T x̄ = x̄, and T is linear, for x = x̄ + v with ‖v‖ ≤ δ, we have

\[
\begin{aligned}
0 &\ge \langle x - Tx, \bar{x} - Tx\rangle = \langle Tx - x, Tx - \bar{x}\rangle \\
&= \langle T(x - \bar{x}) - (x - \bar{x}), T(x - \bar{x})\rangle = \|Tv\|^2 - \langle v, Tv\rangle.
\end{aligned}
\]

Since T is linear, we have ‖Tx‖² ≤ ⟨x, Tx⟩ for every x ∈ X, so T is firmly nonexpansive, see [7, Corollary 4.3].


Since (i)⇔(ii), and (i) implies (iii), the proof is done. ∎

The following example says that Proposition 9.1 fails if T is not linear.

Example 9.2 Define the continuous nonlinear mapping

\[
T : \mathbb{R} \to \mathbb{R} : x \mapsto
\begin{cases}
x/2 & \text{if } -2 \le x \le 2,\\
3 - x & \text{if } 2 \le x \le 3,\\
-(3 + x) & \text{if } -3 \le x \le -2,\\
0 & \text{otherwise.}
\end{cases}
\]

Then T is a cutter and nonexpansive, but not firmly nonexpansive as T is not monotone; cf. [7, Proposition 4.2(iv)].

Indeed, Fix T = {0}. This means that T is a cutter if and only if (T(x))² ≤ xT(x). When −2 ≤ x ≤ 2, we have

\[
(T(x))^2 = \frac{x^2}{4} \le x \cdot \frac{x}{2} = xT(x);
\]

when 2 ≤ x ≤ 3,

(T(x))² = (3− x)² = (3− x)(3− x) ≤ x(3− x);

when −3 ≤ x ≤ −2,

(T(x))² = [−(x + 3)]² = [−(x + 3)][−(x + 3)] ≤ x[−(3 + x)];

when |x| > 3,

(T(x))² = 0 = xT(x).

Hence T is a cutter. Clearly, T is nonexpansive. As T is not monotone, we conclude that T is not firmly nonexpansive.
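The three claims of Example 9.2 can be checked on a grid. The sketch below is only a numerical illustration; the helper names `cutter_holds`, `nonexpansive_holds`, and `firmly_nonexpansive_at` are hypothetical, and a small tolerance is used to absorb floating-point rounding.

```python
def T(x):
    """The piecewise-linear mapping of Example 9.2."""
    if -2.0 <= x <= 2.0:
        return x / 2.0
    if 2.0 < x <= 3.0:
        return 3.0 - x
    if -3.0 <= x < -2.0:
        return -(3.0 + x)
    return 0.0


def cutter_holds(points):
    """With Fix T = {0}, T is a cutter iff T(x)^2 <= x * T(x) for all x."""
    return all(T(x) ** 2 <= x * T(x) + 1e-12 for x in points)


def nonexpansive_holds(points):
    """Check |T(x) - T(y)| <= |x - y| on all pairs of sample points."""
    return all(abs(T(x) - T(y)) <= abs(x - y) + 1e-12
               for x in points for y in points)


def firmly_nonexpansive_at(x, y):
    """Check (T(x) - T(y))^2 <= (x - y) * (T(x) - T(y)) for one pair."""
    d = T(x) - T(y)
    return d * d <= (x - y) * d


if __name__ == "__main__":
    grid = [k / 20.0 for k in range(-80, 81)]
    print(cutter_holds(grid), nonexpansive_holds(grid))
    print(firmly_nonexpansive_at(2.0, 3.0))  # fails on the decreasing piece
```

The pair (2, 3) lies on the piece where T has slope −1, which is exactly where monotonicity, and hence firm nonexpansiveness, breaks down.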

Remark 9.3 Observe that Example 9.2 is much simpler than the example on R2 constructed by Cegielski [20, Example 2.2.8, page 68].

9.2 Subgradient projector of powers of a quadratic function

It is natural to investigate subgradient projectors of quadratic functions or their variants first. In the following result, we assume B ≠ 0 because B = 0 gives G f = Id with f ≡ 0.

Theorem 9.4 Let a > 0 and let B ≠ 0 be an n × n symmetric and positive semidefinite matrix. Consider the function

f : Rn → R : x ↦ (x∗Bx)^{1/(2a)}.

Then the following hold:

(i) lev0 f = {x ∈ Rn | Bx = 0}.

(ii) We have

\[
G_f(x) =
\begin{cases}
x - a\,\dfrac{x^*Bx}{\|Bx\|^2}\,Bx & \text{if } Bx \ne 0,\\
x & \text{if } Bx = 0.
\end{cases}
\]

(iii) G f is linear if and only if B = λPL where λ > 0 and L ⊆ X is a subspace. In this case

\[
\ker B = L^\perp, \qquad f(x) = \lambda^{1/(2a)}\bigl(d_{L^\perp}(x)\bigr)^{1/a}, \qquad\text{and}\qquad G_f = \operatorname{Id} - aP_L = (1-a)\operatorname{Id} + aP_{L^\perp}.
\]

(iv) Assume that G f is linear. Then G f is a cutter if and only if 0 < a ≤ 1.

Proof. (i). Since B is symmetric and positive semidefinite, there exists a matrix A such that B = A∗A; see, e.g., [38, page 558]. Then Ax = 0 ⇔ Bx = 0. The result follows because f (x) = ‖Ax‖^{1/a}.

(ii). The formula for G f follows from direct calculations.

(iii). “⇒”: Assume that G f is linear. The mapping

\[
x \mapsto T_1(x) := a^{-1}(x - G_f(x)) =
\begin{cases}
\dfrac{x^*Bx}{\|Bx\|^2}\,Bx & \text{if } Bx \ne 0,\\
0 & \text{if } Bx = 0,
\end{cases}
\]

is linear. Let λ1, λ2 > 0 be any two eigenvalues of B. We show that λ1 = λ2. Suppose that λ1 ≠ λ2. Take a unit-length eigenvector vi associated with λi. Note that ⟨v1, v2⟩ = 0, Bvi ≠ 0 and B(v1 + v2) = λ1v1 + λ2v2 ≠ 0. As T1 is linear, we have T1(v1 + v2) = T1v1 + T1v2. Now

\begin{align}
T_1(v_1 + v_2) &= \frac{(v_1 + v_2)^*B(v_1 + v_2)}{\|B(v_1 + v_2)\|^2}\,B(v_1 + v_2) \tag{105}\\
&= \frac{(v_1 + v_2)^*(\lambda_1 v_1 + \lambda_2 v_2)}{\|\lambda_1 v_1 + \lambda_2 v_2\|^2}\,(\lambda_1 v_1 + \lambda_2 v_2) \tag{106}\\
&= \frac{\lambda_1 + \lambda_2}{\lambda_1^2 + \lambda_2^2}\,(\lambda_1 v_1 + \lambda_2 v_2), \tag{107}\\
T_1 v_1 + T_1 v_2 &= \frac{v_1^*Bv_1}{\|Bv_1\|^2}\,Bv_1 + \frac{v_2^*Bv_2}{\|Bv_2\|^2}\,Bv_2 \tag{108}\\
&= \frac{\lambda_1\|v_1\|^2}{\|\lambda_1 v_1\|^2}\,\lambda_1 v_1 + \frac{\lambda_2\|v_2\|^2}{\|\lambda_2 v_2\|^2}\,\lambda_2 v_2 \tag{109}\\
&= v_1 + v_2. \tag{110}
\end{align}

As v1, v2 are linearly independent, the above gives λ1 = λ2, which contradicts λ1 ≠ λ2. Therefore, all positive eigenvalues of B have to be equal. Hence, we have

\[
B = \lambda U^* \begin{pmatrix} \operatorname{Id} & 0 \\ 0 & 0 \end{pmatrix} U
\]

where U is an orthogonal matrix, λ > 0, and Id is the m × m identity matrix with m = rank B. The matrix

\[
U^* \begin{pmatrix} \operatorname{Id} & 0 \\ 0 & 0 \end{pmatrix} U
\]

is idempotent and symmetric, so it is the matrix associated with an orthogonal projection onto a closed subspace, say PL; see [38, page 430, page 433]. Hence

B = λPL


which implies that Bx = 0 if and only if PLx = 0, i.e., ker B = L⊥. Then when PLx ≠ 0,

\[
T_1(x) = \frac{x^*Bx}{\|Bx\|^2}\,Bx = \frac{\lambda x^*P_Lx}{x^*\lambda P_L \lambda P_L x}\,\lambda P_L x = \frac{\lambda x^*P_Lx}{\lambda^2 x^*P_Lx}\,\lambda P_L x = P_L x;
\]

when PLx = 0, T1x = 0 = PLx. Hence T1 = PL. It follows that

G f = Id−aT1 = Id−aPL = (1− a) Id+a(Id−PL) = (1− a) Id+aPL⊥ .

We proceed to find the expression for f (x):

\begin{align}
f(x) &= (x^*Bx)^{1/(2a)} = (x^*\lambda P_Lx)^{1/(2a)} \tag{111}\\
&= \lambda^{1/(2a)}(x^*P_LP_Lx)^{1/(2a)} = \lambda^{1/(2a)}(\|P_Lx\|^2)^{1/(2a)} \tag{112}\\
&= \lambda^{1/(2a)}(\|x - P_{L^\perp}x\|^2)^{1/(2a)} = \lambda^{1/(2a)}(d_{L^\perp}(x)^2)^{1/(2a)} \tag{113}\\
&= \lambda^{1/(2a)}(d_{L^\perp}(x))^{1/a}. \tag{114}
\end{align}

“⇐”: Assume that B = λPL for λ > 0 and some subspace L ⊆ Rn. The assumption gives

f (x) = λ^{1/(2a)}(dL⊥(x))^{1/a}.

By Proposition 3.4(i), G f = G_{(dL⊥)^{1/a}}. By Theorem 3.9, G f = (1− a) Id+aG_{dL⊥}. By Fact 2.8,

G f = (1− a) Id+aPL⊥ .

Hence G f is linear.

(iv). “⇒”: Assume that G f is linear and a cutter. By Proposition 9.1, G f is firmly nonexpansive, and so is Id−G f . By (iii), Id−G f = aPL, so aPL has to be nonexpansive. Because B ≠ 0, we have L ≠ {0}. Take 0 ≠ x ∈ L. The nonexpansiveness requires

‖aPLx− aPL0‖ = ‖ax‖ ≤ ‖x‖

so that a ≤ 1.

“⇐”: Assume that 0 < a ≤ 1. Since x ↦ (x∗Bx)^{1/2} is convex, and the function [0,+∞) ∋ t ↦ t^{1/a} is convex and increasing when 0 < a ≤ 1, we have that x ↦ f (x) = ((x∗Bx)^{1/2})^{1/a} is convex. Then G f is a cutter by Fact 5.12. ∎

We illustrate Theorem 9.4(iv) with the following example.

Example 9.5 Let a > 1. Consider

f : Rn → R : x ↦ (x∗x)^{1/(2a)} = ‖x‖^{1/a}.

Then f is not convex, and G f (x) = (1− a)x for every x ∈ Rn. Although G f is linear, it is not a cutter since it is not monotone; see, e.g., Proposition 9.1.
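The closed form of G f in Theorem 9.4(ii), and its specializations in Theorem 9.4(iii) and Example 9.5, can be compared with a direct finite-difference evaluation of the subgradient projector. The sketch below is an illustration under our own assumptions: it works in R2 with B stored as nested tuples, and the names `quad`, `f`, `G_numeric`, and `G_formula` are hypothetical.

```python
def quad(B, x):
    """x* B x for a 2x2 matrix B given as nested tuples."""
    bx = (B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1])
    return x[0] * bx[0] + x[1] * bx[1]


def f(B, a, x):
    """f(x) = (x* B x)^(1/(2a))."""
    return quad(B, x) ** (1.0 / (2.0 * a))


def G_numeric(B, a, x, h=1e-6):
    """Subgradient projector of f with a central-difference gradient."""
    fx = f(B, a, x)
    if fx <= 0.0:
        return x
    gx = (f(B, a, (x[0] + h, x[1])) - f(B, a, (x[0] - h, x[1]))) / (2 * h)
    gy = (f(B, a, (x[0], x[1] + h)) - f(B, a, (x[0], x[1] - h))) / (2 * h)
    g2 = gx * gx + gy * gy
    return (x[0] - fx * gx / g2, x[1] - fx * gy / g2)


def G_formula(B, a, x):
    """Closed form x - a (x*Bx / ||Bx||^2) Bx from Theorem 9.4(ii)."""
    bx = (B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1])
    n2 = bx[0] ** 2 + bx[1] ** 2
    if n2 == 0.0:
        return x
    c = a * quad(B, x) / n2
    return (x[0] - c * bx[0], x[1] - c * bx[1])


if __name__ == "__main__":
    x = (2.0, 1.0)
    B = ((3.0, 0.0), (0.0, 0.0))   # B = 3 P_L with L = span{(1, 0)}
    print(G_numeric(B, 0.5, x), G_formula(B, 0.5, x))   # (Id - a P_L) x
    I2 = ((1.0, 0.0), (0.0, 1.0))
    print(G_numeric(I2, 2.0, x), G_formula(I2, 2.0, x)) # Example 9.5: (1 - a) x
```

With B = 3PL and a = 1/2 the result is ((1 − a)x1, x2) = (1, 1); with B = Id and a = 2 it is (1 − a)x = (−2, −1), the non-monotone case of Example 9.5.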


9.3 Symmetric and linear subgradient projectors

The following result completely characterizes symmetric and linear subgradient projectors.

Theorem 9.6 Assume that T : Rn → Rn is linear and symmetric. Then the following are equivalent:

(i) T is a subgradient projector of a convex function f : Rn → R with lev0 f ≠ ∅.

(ii) T = G f where f : Rn → R is given by

\[
f(x) = K(x^*P_Lx)^{1/(2\lambda)} = K\bigl(d_{L^\perp}(x)\bigr)^{1/\lambda} \tag{115}
\]

where 0 < λ ≤ 1, K > 0, and L ⊆ Rn is a subspace such that L⊥ = Fix T. In this case, G f = (1− λ) Id+λPL⊥ .

Proof. (i)⇒(ii). Assume that T = G f for some convex function f . Since T is linear and a cutter, T is firmly nonexpansive by Proposition 9.1. Then T1 = Id−T is firmly nonexpansive by [7, Proposition 4.2].

We consider two cases.

Case 1. int lev0 f ≠ ∅. We have T1 ≡ 0 on an open set B(x0, ε) ⊆ lev0 f , i.e.,

T1(x0 + b) = 0 for every ‖b‖ < ε.

As T1 is linear, T1(b) = T1(x0 + b)− T1(x0) = 0− 0 = 0 when ‖b‖ < ε, so T1 ≡ 0 on Rn. Thus, T = Id on Rn. Then T = G f with f ≡ 0. This means that (ii) holds with L = {0}, λ = 1 and K > 0.

Case 2. int lev0 f = ∅. Since lev0 f is a proper subspace, it is an intersection of a finite collection of hyperplanes [50, Corollary 1.4.1], so Rn \ lev0 f is a union of a finite collection of open half spaces. As T1 is continuous, we only need to consider

\[
T_1(x) = \frac{f(x)}{\|\nabla f(x)\|^2}\,\nabla f(x) \quad\text{when } f(x) > 0.
\]

Then

\[
\|T_1(x)\| = \frac{f(x)}{\|\nabla f(x)\|} \quad\text{and}\quad \frac{\nabla f(x)}{f(x)} = \frac{T_1(x)}{\|T_1(x)\|^2}.
\]

Since T is symmetric, T1 is symmetric, so there exists an orthogonal matrix Q such that Q∗T1Q = D where D is a diagonal matrix and Q∗ denotes the transpose of Q. Put g = ln f and x = Qy. When y ∉ Q∗(Fix T), we have

\[
(\nabla g)(Qy) = \frac{T_1Qy}{\|T_1Qy\|^2}.
\]

Multiplying both sides by Q∗ and using that Q∗ is an isometry (i.e., ‖Q∗z‖ = ‖z‖ for every z ∈ Rn) give

\[
Q^*(\nabla g)(Qy) = \frac{Q^*T_1Qy}{\|T_1Qy\|^2} = \frac{Dy}{\|Q^*T_1Qy\|^2} = \frac{Dy}{\|Dy\|^2}.
\]

If we put h = g ◦Q, then ∇h(y) = Q∗∇g(Qy) for every y ∈ Rn \ (Q∗ Fix T), so

\[
(\forall y \in \mathbb{R}^n \setminus (Q^* \operatorname{Fix} T)) \quad \nabla h(y) = \frac{Dy}{\|Dy\|^2}.
\]


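Moreover, Rn \ (Q∗ Fix T) is a finite union of open half spaces, because Q∗ Fix T is a proper subspace of Rn. Write

\[
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0\\
0 & \lambda_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}.
\]

When λ1 = · · · = λn = 0, this is covered in Case 1. We thus assume that T1 ≢ 0. As T1 is monotone, we can and do assume that λ1, . . . , λm > 0 and λm+1 = · · · = λn = 0. Then

\[
\nabla h(y) = \Bigl(\frac{\lambda_1 y_1}{\sum_{k=1}^m \lambda_k^2 y_k^2}, \ldots, \frac{\lambda_m y_m}{\sum_{k=1}^m \lambda_k^2 y_k^2}, 0, \ldots, 0\Bigr).
\]

Since h has continuous second order derivatives on the nonempty open set Rn \ (Q∗ Fix T), it must hold that

\[
\frac{\partial^2 h}{\partial y_i \partial y_j} = \frac{\partial^2 h}{\partial y_j \partial y_i},
\]

which gives

\[
\frac{-2\lambda_j \lambda_i^2\, y_i y_j}{\bigl(\sum_{k=1}^m \lambda_k^2 y_k^2\bigr)^2} = \frac{-2\lambda_i \lambda_j^2\, y_i y_j}{\bigl(\sum_{k=1}^m \lambda_k^2 y_k^2\bigr)^2} \tag{116}
\]

when 1 ≤ i, j ≤ m, i ≠ j. As int lev0 f = int Fix T = ∅, (116) holds on the nonempty open set Rn \ (Q∗ Fix T), so we have λi = λj. Because 1 ≤ i, j ≤ m were arbitrary, we obtain that λ1 = · · · = λm. Hence

\[
T_1 = Q \begin{pmatrix} \lambda \operatorname{Id}_m & 0 \\ 0 & 0 \end{pmatrix} Q^* = \lambda Q \begin{pmatrix} \operatorname{Id}_m & 0 \\ 0 & 0 \end{pmatrix} Q^* = \lambda P_L
\]

where L ⊆ Rn is a linear subspace; see [38, page 430]. More precisely, T1 is a positive multiple of an orthogonal projector with

(117) Fix T = ker T1 = L⊥.

Now T1 is firmly nonexpansive and T1∗ = T1; this implies that

\[
\frac{T_1 + T_1^*}{2} - T_1^*T_1 = T_1 - T_1^2 = Q \begin{pmatrix} (\lambda - \lambda^2)\operatorname{Id} & 0 \\ 0 & 0 \end{pmatrix} Q^*
\]

is positive semidefinite, so 0 ≤ λ ≤ 1. Because T1 ≠ 0 in this case, we obtain 0 < λ ≤ 1.

Therefore, when x ∉ Fix T,

\[
\nabla \ln f(x) = \frac{T_1x}{\|T_1x\|^2} = \frac{\lambda P_Lx}{\|\lambda P_Lx\|^2} = \frac{1}{\lambda}\,\frac{P_Lx}{\|P_Lx\|^2}.
\]

Note that PL∗ = PL, PL² = PL, and

\[
\nabla \ln \|P_Lx\| = \frac{1}{\|P_Lx\|}\,\nabla\|P_Lx\| = \frac{1}{\|P_Lx\|}\,\frac{P_L^*P_Lx}{\|P_Lx\|} = \frac{P_Lx}{\|P_Lx\|^2}.
\]

It follows that

\[
\nabla \ln f(x) = \frac{1}{\lambda}\,\nabla \ln \|P_Lx\| = \nabla \ln \|P_Lx\|^{1/\lambda}.
\]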


On each connected and open component of Rn \ Fix T, this is equivalent to

ln f (x) = ln ‖PLx‖^{1/λ} + c

for some constant c ∈ R. Taking exp of both sides gives

(118) f (x) = K‖PLx‖^{1/λ} = K(‖PLx‖²)^{1/(2λ)} = K(x∗PLx)^{1/(2λ)}

where K = exp(c) > 0. As PL = Id−PL⊥ , we obtain

f (x) = K‖x− PL⊥x‖^{1/λ} = K(dL⊥(x))^{1/λ}.

Moreover,

T = G f = Id−T1 = Id−λPL = Id−λ(Id−PL⊥) = (1− λ) Id+λPL⊥

where L⊥ = Fix T by (117). One can apply the same argument on each connected and open component of Rn \ Fix T; one might obtain different constants K in (118), but λ will be the same. Indeed, suppose that there exist 0 < λ, λ1 ≤ 1, λ ≠ λ1 such that

(1− λ) Id+λPFix T = (1− λ1) Id+λ1PFix T.

Then PFix T = Id so that Fix T = Rn, which contradicts that int Fix T = ∅. Using the same K > 0 for all connected and open components of Rn \ Fix T, one obtains (115).

(ii)⇒(i). Clear. ∎

Theorem 9.6 is proved under the assumption that the linear subgradient projector of a convex function is symmetric. We think that the assumption of symmetry is superfluous; cf. Theorem 11.2.

Conjecture 9.7 If f : Rn → R is convex and its subgradient projector G f ,s is linear, then G f ,s must be symmetric.

Note that when f is not convex, G f ,s can be nonsymmetric; see Corollary 11.6(ii).

9.4 Characterization of linear subgradient projectors

In subsection 9.3, we assume that the linear operator is symmetric. What happens if the linear operator is not symmetric? For this purpose we need the following result.

Proposition 9.8 Let M : Rn → Rn be linear, monotone and

\[
(\forall x \in \mathbb{R}^n \setminus \ker M) \quad \nabla h(x) = \frac{Mx}{\|Mx\|^2}
\]

where the function h : Rn \ ker M → R. If dim ran M ≠ 2, then M is symmetric.

Proof. If dim ran M = 0, then M = 0, so it is symmetric. Let us assume that dim ran M > 0 and dim ran M ≠ 2. Since h has continuous mixed second order derivatives at x whenever Mx ≠ 0, the Hessian matrix ∇²h(x) is symmetric. As

\[
\nabla^2 h(x) = \frac{\|Mx\|^2 M - Mx(\nabla\|Mx\|^2)^*}{\|Mx\|^4} = \frac{\|Mx\|^2 M - 2Mxx^*M^*M}{\|Mx\|^4},
\]


the symmetric property means that

‖Mx‖²M− 2Mxx∗M∗M = (‖Mx‖²M− 2Mxx∗M∗M)∗ = M∗‖Mx‖² − 2M∗Mxx∗M∗

whenever Mx ≠ 0. Put y = Mx. The above is simplified to

\[
\frac{M - M^*}{2} = \frac{yy^*}{\|y\|^2}\,M - M^*\frac{yy^*}{\|y\|^2}.
\]

Denote the projection operator onto the line spanned by y, span(y), by Py := yy∗/‖y‖². We have

\[
(\forall y \in \operatorname{ran} M \setminus \{0\}) \quad \frac{M - M^*}{2} = P_yM - M^*P_y. \tag{119}
\]

Since M is monotone, ran M = ran M∗; see, e.g., [14, Theorem 3.2]. Let {ei | i = 1, . . . , m} be an orthonormal basis of ran M. Then

\[
P_{\operatorname{ran} M} = \sum_{i=1}^m e_i e_i^*. \tag{120}
\]

Note that

\[
M^*P_{\operatorname{ran} M} = M^*P_{\operatorname{ran} M^*} = M^*\bigl(P_{\operatorname{ran} M^*} + P_{(\operatorname{ran} M^*)^\perp}\bigr) = M^* \tag{121}
\]

because M∗P(ran M∗)⊥ = 0. To see this, let y ∈ (ran M∗)⊥. For every z ∈ Rn,

⟨M∗y, z⟩ = ⟨y, Mz⟩ = 0

because Mz ∈ ran M = ran M∗. Because z ∈ Rn was arbitrary, we must have M∗y = 0. Since

\[
\frac{M - M^*}{2} = P_{e_i}M - M^*P_{e_i} \tag{122}
\]

by (119), summing up (122) from i = 1 to i = m, followed by using (120) and (121), we obtain

\[
\frac{m}{2}(M - M^*) = \Bigl(\sum_{i=1}^m P_{e_i}\Bigr)M - M^*\Bigl(\sum_{i=1}^m P_{e_i}\Bigr) = P_{\operatorname{ran} M}M - M^*P_{\operatorname{ran} M} = M - M^*,
\]

that is,

\[
\Bigl(\frac{m}{2} - 1\Bigr)(M - M^*) = 0.
\]

Hence M−M∗ = 0 because m ≠ 2, and so M is symmetric. ∎

The requirement $\dim \operatorname{ran} M \neq 2$ in Proposition 9.8 may seem bizarre. However, the following examples show that Proposition 9.8 fails when $\dim \operatorname{ran} M = 2$.

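Example 9.9 When $\dim \operatorname{ran} M = 2$, although $M\colon \mathbb{R}^2 \to \mathbb{R}^2$ is linear, monotone and

$$(\forall \mathbf{x} \in \mathbb{R}^2 \setminus \ker M)\quad \nabla h(\mathbf{x}) = \frac{M\mathbf{x}}{\|M\mathbf{x}\|^2},$$

one cannot guarantee that $M$ is symmetric.

To see this, let $\mathbf{x} = (x, y)^* \in \mathbb{R}^2$.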


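(1). Define

$$M := \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

Then $M$ is linear, monotone, $\dim \operatorname{ran} M = 2$ and

$$\nabla \arctan(y/x) = \frac{1}{x^2 + y^2}\begin{pmatrix} -y \\ x \end{pmatrix} = \frac{M\mathbf{x}}{\|M\mathbf{x}\|^2}$$

whenever $x \neq 0$. However, $M$ is not symmetric.

(2). Define

$$M := \begin{pmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix}.$$

Then $M$ is linear, firmly nonexpansive and

$$\nabla\Big(\frac{\ln(x^2 + y^2)}{2} + \arctan(y/x)\Big) = \frac{1}{x^2 + y^2}\begin{pmatrix} x - y \\ y + x \end{pmatrix} = \frac{M\mathbf{x}}{\|M\mathbf{x}\|^2}$$

whenever $x \neq 0$. However, $\dim \operatorname{ran} M = 2$ and $M$ is not symmetric.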
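The two gradient identities in Example 9.9 can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: the helper names (`grad`, `m_over_norm_sq`, `max_gap`), the sample points, and the finite-difference step are all illustrative choices, and the comparison is only approximate.

```python
import math

def grad(h, x, y, eps=1e-6):
    """Central finite-difference gradient of h at (x, y)."""
    gx = (h(x + eps, y) - h(x - eps, y)) / (2 * eps)
    gy = (h(x, y + eps) - h(x, y - eps)) / (2 * eps)
    return gx, gy

def m_over_norm_sq(M, x, y):
    """Return Mx / ||Mx||^2 for a 2x2 matrix M given as nested tuples."""
    u = M[0][0] * x + M[0][1] * y
    v = M[1][0] * x + M[1][1] * y
    n2 = u * u + v * v
    return u / n2, v / n2

# Example 9.9(1): skew matrix and h = arctan(y/x)
M1 = ((0.0, -1.0), (1.0, 0.0))
h1 = lambda x, y: math.atan(y / x)

# Example 9.9(2): firmly nonexpansive, nonsymmetric matrix
M2 = ((0.5, -0.5), (0.5, 0.5))
h2 = lambda x, y: 0.5 * math.log(x * x + y * y) + math.atan(y / x)

def max_gap(M, h, points):
    """Largest componentwise gap between grad h and Mx/||Mx||^2 over the points."""
    gap = 0.0
    for x, y in points:
        gx, gy = grad(h, x, y)
        tx, ty = m_over_norm_sq(M, x, y)
        gap = max(gap, abs(gx - tx), abs(gy - ty))
    return gap

# Illustrative sample points, all with x != 0
POINTS = [(1.0, 0.5), (-2.0, 1.5), (0.3, -0.7)]
```

Calling `max_gap(M1, h1, POINTS)` and `max_gap(M2, h2, POINTS)` should return values at the level of finite-difference error, which is consistent with both nonsymmetric matrices satisfying the gradient condition.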

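Conjecture 9.10 Let $M\colon \mathbb{R}^n \to \mathbb{R}^n$ be linear and monotone, and suppose that

$$(\forall x \in \mathbb{R}^n \setminus \ker M)\quad \nabla h(x) = \frac{Mx}{\|Mx\|^2},$$

where $h\colon \mathbb{R}^n \setminus \ker M \to \mathbb{R}$. If $\dim \operatorname{ran} M = 2$ and $\exp(h)$ is convex on convex subsets of $\mathbb{R}^n \setminus \ker M$, then $M$ is symmetric.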

Combining Theorem 9.6 and Proposition 9.8, we obtain the following characterization of linear subgradient projectors.

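Theorem 9.11 Assume that $T\colon \mathbb{R}^n \to \mathbb{R}^n$ is linear and $\dim \operatorname{ran}(\operatorname{Id} - T) \neq 2$. Then the following are equivalent:

(i) $T$ is a subgradient projector of a convex function $f\colon \mathbb{R}^n \to \mathbb{R}$ with $\operatorname{lev}_0 f \neq \varnothing$.

(ii) $T = G_f$ where $f\colon \mathbb{R}^n \to \mathbb{R}$ is given by

$$f(x) := K (x^* P_L x)^{1/(2\lambda)} = K \big(d_{L^\perp}(x)\big)^{1/\lambda},$$

where $0 < \lambda \leq 1$, $K > 0$, and $L \subseteq \mathbb{R}^n$ is a subspace such that $L^\perp = \operatorname{Fix} T$. In this case, $G_f = (1 - \lambda)\operatorname{Id} + \lambda P_{L^\perp}$.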

Proof. (i)⇒(ii). Assume that $T = G_f$ for some convex function $f\colon \mathbb{R}^n \to \mathbb{R}$. Then $T$ is a cutter by Fact 5.12. As $T$ is linear, in view of Proposition 9.1, $T$ is firmly nonexpansive, so $M := \operatorname{Id} - T$ is firmly nonexpansive, in particular, monotone. By Theorem 4.1(i),

$$\nabla h(x) = \frac{Mx}{\|Mx\|^2} \tag{123}$$

where $h(x) = \ln f(x)$, $f(x) > 0$. Since $\operatorname{Fix} T = \operatorname{lev}_0 f = \ker M$, (123) is equivalent to

$$\nabla h(x) = \frac{Mx}{\|Mx\|^2}$$

when $Mx \neq 0$. Proposition 9.8 shows that $M$ is symmetric, and so is $T = \operatorname{Id} - M$. It suffices to apply Theorem 9.6 to obtain (ii).

(ii)⇒(i). Clear. $\blacksquare$
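As a sanity check of the formula $G_f = (1-\lambda)\operatorname{Id} + \lambda P_{L^\perp}$, here is a worked instance in $\mathbb{R}^2$. The choice $L = \{0\} \times \mathbb{R}$ is ours, made only for illustration, and the computation assumes the usual expression $G_f(x) = x - f(x)\nabla f(x)/\|\nabla f(x)\|^2$ at points where $f(x) > 0$ and $f$ is differentiable.

```latex
% Illustrative choice (not from the text): L = \{0\}\times\mathbb{R}, so L^\perp = \mathbb{R}\times\{0\}
% and d_{L^\perp}(x) = |x_2|.
\begin{align*}
f(x_1,x_2) &= K\,|x_2|^{1/\lambda}, \qquad x_2 \neq 0,\\
\nabla f(x_1,x_2) &= \Big(0,\ \tfrac{K}{\lambda}\,|x_2|^{1/\lambda - 1}\operatorname{sgn}(x_2)\Big),\\
\frac{f(x)}{\|\nabla f(x)\|^{2}}\,\nabla f(x)
  &= \Big(0,\ \lambda\,|x_2|\operatorname{sgn}(x_2)\Big) = (0,\ \lambda x_2),\\
G_f(x) &= (x_1,\ (1-\lambda)x_2) = (1-\lambda)\,x + \lambda\,(x_1, 0)
        = \big((1-\lambda)\operatorname{Id} + \lambda P_{L^\perp}\big)x .
\end{align*}
```

The constant $K$ cancels, which matches the statement that $K > 0$ is arbitrary.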

10 Subgradient projectors of convex functions are not closed under convex combinations and compositions

A convex combination of cutters is a cutter; see [20, Corollary 2.1.49] or [7, Proposition 4.34]. Convex combinations of a finite family of cutters with a common fixed point are effectively used in simultaneous cutter methods; see [20, Section 5.8], [7, Corollary 5.18].

A question that naturally arises is whether the set of subgradient projectors of convex functions is convex. Theorem 9.6 allows us to show that the answer is negative. While Theorem 10.1 works only in $\mathbb{R}^2$, Theorem 10.3 works in $\mathbb{R}^n$ with $n \geq 2$.

Theorem 10.1 In $\mathbb{R}^2$, a convex combination of subgradient projectors of convex functions need not be a subgradient projector of a convex function.

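Proof. Let $L := \{0\} \times \mathbb{R} \subseteq \mathbb{R}^2$ and $M := \{x = (x_1, x_2) \in \mathbb{R}^2 \mid x_1 + x_2 = 0\}$. Both $L$ and $M$ are proper linear subspaces of $\mathbb{R}^2$. Define $f, g\colon \mathbb{R}^2 \to \mathbb{R}$ by

$$(\forall x \in \mathbb{R}^2)\quad f(x) := K_1 \big(d_{L^\perp}(x)\big)^{1/\lambda_1}, \qquad g(x) := K_2 \big(d_{M^\perp}(x)\big)^{1/\lambda_2}, \tag{124}$$

where $0 < \lambda_1 \neq \lambda_2 < 1$ and $K_1, K_2 > 0$. By Theorem 9.6, we have

$$G_f = (1 - \lambda_1)\operatorname{Id} + \lambda_1 P_{L^\perp}, \quad\text{and}\quad G_g = (1 - \lambda_2)\operatorname{Id} + \lambda_2 P_{M^\perp}. \tag{125}$$

Now consider $\lambda_3 G_f + (1 - \lambda_3) G_g$ where $0 < \lambda_3 < 1$. Then

$$\begin{aligned}
&\lambda_3 G_f + (1 - \lambda_3) G_g && (126)\\
&\quad= \lambda_3\big((1 - \lambda_1)\operatorname{Id} + \lambda_1 P_{L^\perp}\big) + (1 - \lambda_3)\big((1 - \lambda_2)\operatorname{Id} + \lambda_2 P_{M^\perp}\big) && (127)\\
&\quad= (1 - \lambda_2 + \lambda_2\lambda_3 - \lambda_1\lambda_3)\operatorname{Id} + \lambda_1\lambda_3 P_{L^\perp} + \lambda_2(1 - \lambda_3) P_{M^\perp}. && (128)
\end{aligned}$$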

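We show by contradiction that $\lambda_3 G_f + (1 - \lambda_3) G_g$ is not a subgradient projector of a convex function. Suppose that $\lambda_3 G_f + (1 - \lambda_3) G_g$ is a subgradient projector of a convex function. By Theorem 9.6, there are $0 < \lambda < 1$ and a subspace $S$ of $\mathbb{R}^2$ such that

$$\lambda_3 G_f + (1 - \lambda_3) G_g = (1 - \lambda)\operatorname{Id} + \lambda P_{S^\perp}. \tag{129}$$

Therefore, we have

$$(1 - \lambda_2 + \lambda_2\lambda_3 - \lambda_1\lambda_3)\operatorname{Id} + \lambda_1\lambda_3 P_{L^\perp} + \lambda_2(1 - \lambda_3) P_{M^\perp} = (1 - \lambda)\operatorname{Id} + \lambda P_{S^\perp}. \tag{130}$$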


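Naturally, the set of fixed points of the left-hand side is equal to the set of fixed points of the right-hand side. Thus we have

$$\begin{aligned}
&\operatorname{Fix}\big((1 - \lambda)\operatorname{Id} + \lambda P_{S^\perp}\big) && (131)\\
&\quad= \operatorname{Fix}\big((1 - \lambda_2 + \lambda_2\lambda_3 - \lambda_1\lambda_3)\operatorname{Id} + \lambda_1\lambda_3 P_{L^\perp} + \lambda_2(1 - \lambda_3) P_{M^\perp}\big). && (132)
\end{aligned}$$

By [7, Proposition 4.34], we have

$$\operatorname{Fix}\big((1 - \lambda_2 + \lambda_2\lambda_3 - \lambda_1\lambda_3)\operatorname{Id} + \lambda_1\lambda_3 P_{L^\perp} + \lambda_2(1 - \lambda_3) P_{M^\perp}\big) = L^\perp \cap M^\perp. \tag{133}$$

Also,

$$\operatorname{Fix}\big((1 - \lambda)\operatorname{Id} + \lambda P_{S^\perp}\big) = S^\perp. \tag{134}$$

Hence, using the definitions of $L$, $M$, and (131)-(134), it follows that

$$\begin{aligned}
\{(0, 0)\} &= L^\perp \cap M^\perp && (135)\\
&= \operatorname{Fix}\big((1 - \lambda_2 + \lambda_2\lambda_3 - \lambda_1\lambda_3)\operatorname{Id} + \lambda_1\lambda_3 P_{L^\perp} + \lambda_2(1 - \lambda_3) P_{M^\perp}\big) && (136)\\
&= \operatorname{Fix}\big((1 - \lambda)\operatorname{Id} + \lambda P_{S^\perp}\big) && (137)\\
&= S^\perp. && (138)
\end{aligned}$$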

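Therefore $S^\perp = \{(0, 0)\}$, which implies $S = \mathbb{R}^2$. In terms of matrices, since $L^\perp = \mathbb{R} \times \{0\}$ and $M^\perp = \operatorname{span}\{(1, 1)\}$, we have

$$P_{L^\perp} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad P_{M^\perp} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}, \quad\text{and}\quad P_{S^\perp} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \tag{139}$$

In particular, $P_{L^\perp}$ and $P_{S^\perp}$ are diagonal matrices, but $P_{M^\perp}$ is not. Since $\lambda_2(1 - \lambda_3) \neq 0$, the left-hand side of (130) has nonzero off-diagonal entries while the right-hand side is diagonal, so equation (130) cannot hold. Therefore, $\lambda_3 G_f + (1 - \lambda_3) G_g$ is not a subgradient projector of a convex function. $\blacksquare$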

Our next result needs averaged mappings.

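Definition 10.2 (See [4], [7, Definition 4.23]) Let $\lambda \in (0, 1)$. An operator $T\colon \mathbb{R}^n \to \mathbb{R}^n$ is $\lambda$-averaged if there exists a nonexpansive operator $N\colon \mathbb{R}^n \to \mathbb{R}^n$ such that $T = (1 - \lambda)\operatorname{Id} + \lambda N$.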

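Theorem 10.3 Let $n \geq 2$, $0 < \lambda_1 < 1$, $0 < \lambda_2 < 1$, $0 < \lambda < 1$. Suppose that $L, M$ are linear subspaces of $\mathbb{R}^n$ satisfying $L = M^\perp$, $M = L^\perp$, and that both $L$ and $M$ are proper linear subspaces of $\mathbb{R}^n$. Define

$$f\colon \mathbb{R}^n \to \mathbb{R}\colon x \mapsto \big(d_{L^\perp}(x)\big)^{1/\lambda_1}, \quad\text{and}\quad g\colon \mathbb{R}^n \to \mathbb{R}\colon x \mapsto \big(d_{M^\perp}(x)\big)^{1/\lambda_2}.$$

If $\dfrac{1 - \lambda}{\lambda} \neq \dfrac{\lambda_2}{\lambda_1}$, then $(1 - \lambda) G_f + \lambda G_g$ is not a subgradient projector of a convex function.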

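Proof. By Theorem 9.6, we have

$$G_f = (1 - \lambda_1)\operatorname{Id} + \lambda_1 P_{L^\perp}, \quad\text{and}\quad G_g = (1 - \lambda_2)\operatorname{Id} + \lambda_2 P_{M^\perp}. \tag{140}$$

Then

$$\begin{aligned}
&(1 - \lambda) G_f + \lambda G_g && (141)\\
&\quad= (1 - \lambda)\big((1 - \lambda_1)\operatorname{Id} + \lambda_1 P_{L^\perp}\big) + \lambda\big((1 - \lambda_2)\operatorname{Id} + \lambda_2 P_{M^\perp}\big) && (142)\\
&\quad= \big[(1 - \lambda)(1 - \lambda_1) + \lambda(1 - \lambda_2)\big]\operatorname{Id} + \lambda_1(1 - \lambda) P_{L^\perp} + \lambda\lambda_2 P_{M^\perp} && (143)\\
&\quad= \beta\operatorname{Id} + (1 - \beta)\big(\gamma P_{L^\perp} + (1 - \gamma) P_{M^\perp}\big), && (144)
\end{aligned}$$

where

$$\beta := (1 - \lambda)(1 - \lambda_1) + \lambda(1 - \lambda_2) \quad\text{and}\quad \gamma := \frac{\lambda_1(1 - \lambda)}{1 - (1 - \lambda)(1 - \lambda_1) - \lambda(1 - \lambda_2)}.$$

We observe that $0 < \beta < 1$ and $\gamma \neq \tfrac{1}{2}$. Indeed,

$$0 < \beta = (1 - \lambda)(1 - \lambda_1) + \lambda(1 - \lambda_2) < (1 - \lambda) + \lambda = 1. \tag{145}$$

Also,

$$\begin{aligned}
&\gamma = \tfrac{1}{2} && (146)\\
&\iff \frac{\lambda_1(1 - \lambda)}{1 - (1 - \lambda)(1 - \lambda_1) - \lambda(1 - \lambda_2)} = \frac{\lambda\lambda_2}{1 - (1 - \lambda)(1 - \lambda_1) - \lambda(1 - \lambda_2)} && (147)\\
&\iff \lambda_1(1 - \lambda) = \lambda\lambda_2 && (148)\\
&\iff \frac{1 - \lambda}{\lambda} = \frac{\lambda_2}{\lambda_1}. && (149)
\end{aligned}$$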

By assumption, $\frac{1 - \lambda}{\lambda} \neq \frac{\lambda_2}{\lambda_1}$, so $\gamma \neq \tfrac{1}{2}$. Since $G_f$, $G_g$ are linear and symmetric, so is $(1 - \lambda) G_f + \lambda G_g$. We show by contradiction that $(1 - \lambda) G_f + \lambda G_g$ is not a subgradient projector of a convex function. If $(1 - \lambda) G_f + \lambda G_g$ is a subgradient projector of a convex function, then by Theorem 9.6 we have

$$(1 - \lambda) G_f + \lambda G_g = (1 - \alpha)\operatorname{Id} + \alpha P_{S^\perp} \tag{150}$$

where $0 < \alpha < 1$ and $S$ is a subspace of $\mathbb{R}^n$. Note that $G_f$, $G_g$ are averaged mappings, and so is $(1 - \lambda) G_f + \lambda G_g$. Because

$$\operatorname{Fix} G_f = L^\perp, \quad\text{and}\quad \operatorname{Fix} G_g = M^\perp, \tag{151}$$

by [7, Proposition 4.34], we obtain

$$\operatorname{Fix}\big((1 - \lambda) G_f + \lambda G_g\big) = \operatorname{Fix} G_f \cap \operatorname{Fix} G_g = L^\perp \cap M^\perp = \{0\}. \tag{152}$$

Because $\operatorname{Fix}\big((1 - \alpha)\operatorname{Id} + \alpha P_{S^\perp}\big) = S^\perp$, using (150) and (152) we obtain $S^\perp = \{0\}$. Therefore, in view of equation (150), we have

$$(1 - \lambda) G_f + \lambda G_g = (1 - \alpha)\operatorname{Id}. \tag{153}$$

Combining (144) and (153) gives

$$\beta\operatorname{Id} + (1 - \beta)\big(\gamma P_{L^\perp} + (1 - \gamma) P_{M^\perp}\big) = (1 - \alpha)\operatorname{Id}. \tag{154}$$

We proceed to analyze $\alpha$, $\beta$. Take $x \in M \setminus \{0\}$, which is possible since $M \neq \{0\}$. Then $P_{M^\perp} x = 0$ and $P_{L^\perp} x = P_M x = x$. Equation (154) gives

$$\beta x + (1 - \beta)(\gamma x) = (1 - \alpha) x, \tag{155}$$

which implies

$$\beta + (1 - \beta)\gamma = 1 - \alpha. \tag{156}$$

Take $x \in L \setminus \{0\}$, which is possible since $L \neq \{0\}$. Then $P_{L^\perp} x = 0$ and $P_{M^\perp} x = P_L x = x$. Equation (154) gives

$$\beta x + (1 - \beta)(1 - \gamma) x = (1 - \alpha) x, \tag{157}$$

which implies

$$\beta + (1 - \beta)(1 - \gamma) = 1 - \alpha. \tag{158}$$

Subtracting equation (156) from equation (158), we have

$$(1 - \beta)(1 - 2\gamma) = 0, \tag{159}$$

which implies $\beta = 1$ or $\gamma = \tfrac{1}{2}$. This contradicts the choices of $\lambda$, $\lambda_1$, $\lambda_2$. $\blacksquare$
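The dichotomy in Theorem 10.3 is easy to see in the simplest setting. The sketch below is an illustration under our own assumptions, not the authors' construction: it fixes $n = 2$ with $L^\perp = \mathbb{R} \times \{0\}$ and $M^\perp = \{0\} \times \mathbb{R}$, so that both projectors are diagonal, and uses exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction as F

def combo_diagonal(lam, lam1, lam2):
    """Diagonal entries of (1-lam)*G_f + lam*G_g in R^2.

    Assumed setting: L^perp = R x {0} and M^perp = {0} x R, hence
      G_f = (1-lam1) Id + lam1 P_{L^perp} = diag(1, 1-lam1),
      G_g = (1-lam2) Id + lam2 P_{M^perp} = diag(1-lam2, 1).
    """
    on_L_perp = (1 - lam) * 1 + lam * (1 - lam2)
    on_M_perp = (1 - lam) * (1 - lam1) + lam * 1
    return on_L_perp, on_M_perp

def is_multiple_of_identity(lam, lam1, lam2):
    """True iff the convex combination is a scalar multiple of Id, as (153) would require."""
    d1, d2 = combo_diagonal(lam, lam1, lam2)
    return d1 == d2

def ratio_condition_holds(lam, lam1, lam2):
    """True iff (1-lam)/lam == lam2/lam1, the excluded case of Theorem 10.3."""
    return (1 - lam) / lam == lam2 / lam1
```

For instance, `is_multiple_of_identity(F(1, 2), F(1, 3), F(1, 2))` is false, matching the theorem, whereas the excluded parameters `F(1, 2), F(1, 3), F(1, 3)` make the combination equal to $\tfrac{5}{6}\operatorname{Id}$.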

If two nearest point projectors onto subspaces commute, then their composition is the projection onto the intersection of the subspaces; see [27, Lemma 9.2]. One referee asks whether there is an analogue when two linear subgradient projectors commute. The answer is negative. To this end, we need an auxiliary result.

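Lemma 10.4 Let $L, M \subseteq \mathbb{R}^n$ be two subspaces, and $\lambda_i \in [0, 1)$ with $i = 1, 2$. Then the following are equivalent:

(i) $\big(\lambda_1\operatorname{Id} + (1 - \lambda_1) P_{L^\perp}\big)\big(\lambda_2\operatorname{Id} + (1 - \lambda_2) P_{M^\perp}\big) = \big(\lambda_2\operatorname{Id} + (1 - \lambda_2) P_{M^\perp}\big)\big(\lambda_1\operatorname{Id} + (1 - \lambda_1) P_{L^\perp}\big)$.

(ii) $P_{L^\perp} P_{M^\perp} = P_{M^\perp} P_{L^\perp}$.

(iii) $P_L P_M = P_M P_L$.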

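Proof. (i)⇔(ii): This follows from

$$\begin{aligned}
&\big(\lambda_1\operatorname{Id} + (1 - \lambda_1) P_{L^\perp}\big)\big(\lambda_2\operatorname{Id} + (1 - \lambda_2) P_{M^\perp}\big) && (160)\\
&\quad= \lambda_1\lambda_2\operatorname{Id} + \lambda_1(1 - \lambda_2) P_{M^\perp} + (1 - \lambda_1)\lambda_2 P_{L^\perp} + (1 - \lambda_1)(1 - \lambda_2) P_{L^\perp} P_{M^\perp}, && (161)
\end{aligned}$$

and

$$\begin{aligned}
&\big(\lambda_2\operatorname{Id} + (1 - \lambda_2) P_{M^\perp}\big)\big(\lambda_1\operatorname{Id} + (1 - \lambda_1) P_{L^\perp}\big) && (162)\\
&\quad= \lambda_1\lambda_2\operatorname{Id} + \lambda_2(1 - \lambda_1) P_{L^\perp} + (1 - \lambda_2)\lambda_1 P_{M^\perp} + (1 - \lambda_1)(1 - \lambda_2) P_{M^\perp} P_{L^\perp}. && (163)
\end{aligned}$$

(ii)⇔(iii): Since $P_{L^\perp} = \operatorname{Id} - P_L$ and $P_{M^\perp} = \operatorname{Id} - P_M$, (ii) is equivalent to

$$(\operatorname{Id} - P_L)(\operatorname{Id} - P_M) = (\operatorname{Id} - P_M)(\operatorname{Id} - P_L),$$

which is (iii) after simplifications. $\blacksquare$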

Theorem 10.5 In $\mathbb{R}^2$, even though two linear subgradient projectors of convex functions commute, their composition need not be a subgradient projector of a convex function.

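Proof. Let $0 < \lambda_1 < \lambda_2 < 1$. Because

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = P_{\mathbb{R} \times \{0\}}, \quad\text{and}\quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = P_{\{0\} \times \mathbb{R}},$$

by Theorem 9.6, there exist two convex functions $f, g\colon \mathbb{R}^2 \to \mathbb{R}$ such that

$$G_f = \lambda_1 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + (1 - \lambda_1)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \lambda_1 \end{pmatrix}, \quad\text{and}$$

$$G_g = \lambda_2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + (1 - \lambda_2)\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \lambda_2 & 0 \\ 0 & 1 \end{pmatrix}.$$

These two subgradient projectors commute by Lemma 10.4 or by a direct calculation:

$$G_f G_g = G_g G_f = \begin{pmatrix} \lambda_2 & 0 \\ 0 & \lambda_1 \end{pmatrix}.$$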

We claim that $T := G_f G_g$ is not a subgradient projector of a convex function. We prove this by contradiction. Suppose that $T$ is a subgradient projector. Since $T$ is symmetric and linear, by Theorem 9.6, there exists $0 \leq \lambda \leq 1$ such that

$$T = \lambda\operatorname{Id} + (1 - \lambda) P \tag{164}$$

where $P$ is a projector onto a subspace of $\mathbb{R}^2$. We consider five cases.

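Case 1. $\lambda = 0$. This gives $P = T$. Because $P$ is a projector, its eigenvalues are 0 or 1. This is impossible, since $0 < \lambda_i < 1$ and $\lambda_1 \neq \lambda_2$.

Case 2. $\lambda = 1$. This gives $T = \operatorname{Id}$. This is impossible, since $\lambda_1 \neq \lambda_2$.

Case 1 and Case 2 imply that $0 < \lambda < 1$. This gives

$$P = \begin{pmatrix} \dfrac{\lambda_2 - \lambda}{1 - \lambda} & 0 \\ 0 & \dfrac{\lambda_1 - \lambda}{1 - \lambda} \end{pmatrix}.$$

Case 3. $\lambda > \lambda_1$. Then $\lambda_1 - \lambda < 0$. This is impossible, since the eigenvalues of $P$ have to be nonnegative.

Case 4. $\lambda = \lambda_1$. Since

$$\frac{\lambda_2 - \lambda}{1 - \lambda} > 0,$$

and $P$ has only the eigenvalues 0 or 1, we have

$$\frac{\lambda_2 - \lambda}{1 - \lambda} = 1.$$

It follows that $\lambda_2 = 1$, which is impossible.

Case 5. $0 < \lambda < \lambda_1$. Then

$$\frac{\lambda_1 - \lambda}{1 - \lambda} > 0, \qquad \frac{\lambda_2 - \lambda}{1 - \lambda} > 0.$$

Since $P$ has only the eigenvalues 0 or 1, we must have

$$\frac{\lambda_2 - \lambda}{1 - \lambda} = \frac{\lambda_1 - \lambda}{1 - \lambda} = 1,$$

from which $\lambda_1 = \lambda_2$. This is impossible.

Altogether, (164) does not hold. Using Theorem 9.6 again, we conclude that $T$ is not a subgradient projector of a convex function. $\blacksquare$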
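The five cases can be condensed into one eigenvalue observation: an operator of the form $\lambda\operatorname{Id} + (1 - \lambda)P$, with $0 \leq \lambda \leq 1$ and $P$ an orthogonal projector, has eigenvalues in $\{\lambda, 1\}$. The sketch below encodes this reformulation for diagonal $2 \times 2$ matrices; it is our own restatement rather than the authors' argument, and the function name and sample values are illustrative.

```python
from fractions import Fraction as F

def is_relaxed_projector(d1, d2):
    """Can diag(d1, d2) be written as lam*Id + (1-lam)*P with 0 <= lam <= 1
    and P an orthogonal projector?

    Such an operator has eigenvalues in {lam, 1}: lam on ker P and 1 on ran P.
    So either both eigenvalues coincide (P = 0 or lam = 1), or one equals 1.
    """
    in_unit_interval = 0 <= d1 <= 1 and 0 <= d2 <= 1
    return in_unit_interval and (d1 == d2 or d1 == 1 or d2 == 1)

# Illustrative values with 0 < lam1 < lam2 < 1
lam1, lam2 = F(1, 3), F(1, 2)
G_f = (1, lam1)        # diag(1, lam1)
G_g = (lam2, 1)        # diag(lam2, 1)
T = (lam2, lam1)       # G_f G_g = diag(lam2, lam1)
```

With these values, `is_relaxed_projector(*G_f)` and `is_relaxed_projector(*G_g)` hold, while `is_relaxed_projector(*T)` fails because $T$ has two distinct eigenvalues strictly below 1.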


11 A complete analysis of linear subgradient projectors on R2

In this section we turn our attention to linear operators on $\mathbb{R}^2$. One nice feature is that we are able not only to characterize when the linear operator is a subgradient projector but also to give explicit formulae for the corresponding functions.

Is every linear mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ a subgradient projector of an essentially strictly differentiable function (convex or nonconvex) on $\mathbb{R}^2$? The answer is no by Theorem 11.2 below. Theorem 11.2(iii) also shows that Theorem 9.11 fails if the assumption $\dim \operatorname{ran}(\operatorname{Id} - T) \neq 2$ is removed.

We start with a simple result about essentially strictly differentiable functions, see Definition 4.3.

Lemma 11.1 Let $O \subseteq \mathbb{R}^n$ be a nonempty open set and $f\colon O \to \mathbb{R}$ be an essentially strictly differentiable function. If there exists a continuous selection $s\colon O \to \mathbb{R}^n$ with $s(x) \in \partial f(x)$ for every $x \in O$, then $f$ is strictly differentiable on $O$. Consequently, $f$ is continuously differentiable on $O$.

Proof. By [15, Theorem 2.4, Corollary 4.2], $f$ has a minimal Clarke subdifferential $\partial_c f$, and $\partial_c f$ can be recovered by every dense selection of $\partial_c f$. Since $s(x) \in \partial f(x) \subseteq \partial_c f(x)$, and $s$ is continuous on $O$, we have $\partial_c f(x) = \partial f(x) = \{s(x)\}$ for every $x \in O$, which implies that $f$ is strictly differentiable at $x$; see, e.g., [51, page 362, Theorem 9.18] or [40, Theorem 3.54]. Hence $f$ is strictly differentiable on $O$. $\blacksquare$

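We consider the linear operator $T\colon \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$T := \begin{pmatrix} 1 - a & -b \\ -c & 1 - d \end{pmatrix} \tag{165}$$

where

$$a^2 + b^2 + c^2 + d^2 \neq 0 \quad (\text{i.e., } (a, b, c, d) \neq (0, 0, 0, 0)). \tag{166}$$

Note that when $a = b = c = d = 0$, we have $T = \operatorname{Id} = G_f$ with $f \equiv 0$.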

Theorem 11.2 Let $T$ be given by (165). Then $T$ is a subgradient projector of an essentially strictly differentiable function on $\mathbb{R}^2 \setminus \operatorname{Fix} T$ if and only if one of the following holds:

(i) $a = b = c = 0$, $d \neq 0$: $T = G_f$ where $f(x_1, x_2) := K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $a \neq 0$: $T = G_f$ where $f(x_1, x_2) := K|x_1|^{1/a}$ for some $K > 0$.

(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$: $T = G_f$ where $f(x_1, x_2) := K|ax_1 + cx_2|^{a/(a^2 + c^2)}$ for some $K > 0$.

(iii) $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$: $T = G_f$ where

$$f(x_1, x_2) := \begin{cases}
K (x_1^2 + x_2^2)^{\frac{a}{2(a^2 + c^2)}} \exp\Big(-\dfrac{c}{a^2 + c^2}\arctan\Big(\dfrac{x_1}{x_2}\Big)\Big) & \text{if } x_2 \neq 0,\\[2mm]
0 & \text{if } (x_1, x_2) = (0, 0),\\[1mm]
K |x_1|^{\frac{a}{a^2 + c^2}} \exp\Big(-\dfrac{|c|}{a^2 + c^2}\,\dfrac{\pi}{2}\Big) & \text{if } x_1 \neq 0,\ x_2 = 0,
\end{cases} \tag{167}$$

for some $K > 0$, and $f$ is lsc. In particular, when $c \neq 0$, $f$ is not convex.


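Proof. Observe that (166) implies that $\operatorname{Fix} T$ is a proper subspace of $\mathbb{R}^2$. Assume that $T$ is a subgradient projector. By Theorem 4.1 and Lemma 11.1, we can find a differentiable function $g\colon (\mathbb{R}^2 \setminus \operatorname{Fix} T) \to \mathbb{R}$ such that

$$\text{for every } x \in \mathbb{R}^2 \setminus \operatorname{Fix} T,\quad \frac{x - Tx}{\|x - Tx\|^2} = \nabla g(x).$$

Because

$$x - Tx = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{pmatrix},$$

we have

$$\frac{\partial g}{\partial x_1} = \frac{ax_1 + bx_2}{(ax_1 + bx_2)^2 + (cx_1 + dx_2)^2}, \qquad \frac{\partial g}{\partial x_2} = \frac{cx_1 + dx_2}{(ax_1 + bx_2)^2 + (cx_1 + dx_2)^2}.$$

Since

$$\frac{\partial^2}{\partial x_1 \partial x_2} g(x_1, x_2) = -\frac{(a^2 b - bc^2 + 2acd)x_1^2 + (b^3 + bd^2)x_2^2 + 2(ab^2 + ad^2)x_1 x_2}{\big((ax_1 + bx_2)^2 + (cx_1 + dx_2)^2\big)^2},$$

$$\frac{\partial^2}{\partial x_2 \partial x_1} g(x_1, x_2) = -\frac{(a^2 c + c^3)x_1^2 + (cd^2 - b^2 c + 2abd)x_2^2 + 2(c^2 d + a^2 d)x_1 x_2}{\big((ax_1 + bx_2)^2 + (cx_1 + dx_2)^2\big)^2},$$

on the nonempty open set $\mathbb{R}^2 \setminus \operatorname{Fix} T$, we have

$$\frac{\partial^2}{\partial x_1 \partial x_2} g(x_1, x_2) = \frac{\partial^2}{\partial x_2 \partial x_1} g(x_1, x_2).$$

This leads to

$$\begin{aligned}
a^2 b - bc^2 + 2acd &= a^2 c + c^3, && (1)\\
b^3 + bd^2 &= cd^2 - b^2 c + 2abd, && (2)\\
ab^2 + ad^2 &= c^2 d + a^2 d. && (3)
\end{aligned}$$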

Now multiplying (2) by $a$ and subtracting (3) multiplied by $b$ gives $(ad - bc)(ab + cd) = 0$. It suffices to consider two cases:

• Case $ad = bc$. (1) implies $(b - c)(a^2 + c^2) = 0$. Observe that $b - c \neq 0$ is impossible by (2). Then the following two subcases could happen.

i. $b = c = 0$. Then (3) ⇒ $ad(a - d) = 0$. This means

$$a = b = c = 0,\ d \neq 0, \tag{168}$$

or

$$b = c = d = 0,\ a \neq 0. \tag{169}$$

ii. $b = c \neq 0$, which implies $a \neq 0$, $d \neq 0$, and $ad = c^2$.

• Case $ab + cd = 0$. (1) implies $(b + c)(a^2 + c^2) = 0$. When $b = -c = 0$, (3) gives $ad(a - d) = 0$, which leads to (168), (169), or $a = d \neq 0$. It remains to consider the case $b = -c \neq 0$. Then (2) and (3) imply $a = d$. Moreover, we can and do assume $a^2 + c^2 \neq 0$ since $a = c = 0$ gives (168) by (2).
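For readers who want the elimination step spelled out, the factorization $(ad - bc)(ab + cd) = 0$ used above can be derived as follows. This is a routine expansion added for convenience; the labels (2) and (3) refer to the compatibility equations obtained from equating the mixed partial derivatives.

```latex
\begin{align*}
a\cdot(2):&\quad ab^{3} + abd^{2} = acd^{2} - ab^{2}c + 2a^{2}bd,\\
b\cdot(3):&\quad ab^{3} + abd^{2} = bc^{2}d + a^{2}bd,\\
a\cdot(2) - b\cdot(3):&\quad 0 = acd^{2} - ab^{2}c + a^{2}bd - bc^{2}d\\
&\quad\ \ = ad\,(ab + cd) - bc\,(ab + cd)\\
&\quad\ \ = (ad - bc)(ab + cd).
\end{align*}
```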

In summary, we only have the following three cases.

Case 1. $a = b = c = 0$, $d \neq 0$. Then we get

$$g(x_1, x_2) = \frac{\ln |x_2|}{d} + C_1, \quad\text{if } x_2 \neq 0.$$

Or $b = c = d = 0$, $a \neq 0$. Then we get

$$g(x_1, x_2) = \frac{\ln |x_1|}{a} + C_1, \quad\text{if } x_1 \neq 0.$$

Case 2. $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$. Then we get

$$g(x_1, x_2) = \frac{a}{a^2 + c^2}\ln |ax_1 + cx_2| + C_2, \quad\text{if } ax_1 + cx_2 \neq 0.$$

Case 3. $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$. Then we get

$$g(x_1, x_2) = \frac{a}{2(a^2 + c^2)}\ln(x_1^2 + x_2^2) - \frac{c}{a^2 + c^2}\arctan\Big(\frac{x_1}{x_2}\Big) + C_3, \quad\text{if } x_2 \neq 0.$$

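Since $g = \ln f$, we obtain $f = \exp(g)$ in Case 1 and Case 2. For Case 3, we obtain

$$f(x_1, x_2) = K (x_1^2 + x_2^2)^{\frac{a}{2(a^2 + c^2)}} \exp\Big(-\frac{c}{a^2 + c^2}\arctan\Big(\frac{x_1}{x_2}\Big)\Big) \quad\text{if } x_2 \neq 0$$

for some $K > 0$. However, when $c \neq 0$, $f$ is not continuous at $(\bar{x}_1, 0)$ with $\bar{x}_1 \neq 0$ since

$$\lim_{x_1 \to \bar{x}_1,\ x_2 \downarrow 0} \arctan\frac{x_1}{x_2} \neq \lim_{x_1 \to \bar{x}_1,\ x_2 \uparrow 0} \arctan\frac{x_1}{x_2},$$

the two one-sided limits being $\pm\frac{\pi}{2}$ with opposite signs.

The function given by (167) is lsc but not continuous at every $(x_1, 0)$ with $x_1 \neq 0$. Moreover, $f$ is not convex on $\mathbb{R}^2$ since a finite-valued convex function on a finite-dimensional space is continuous; see, e.g., [7, Corollary 8.31].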

It is interesting to ask for which selection $s \in \partial f$ we have $G_f = T$ on $\mathbb{R}^2$. On $\{(x_1, x_2) \in \mathbb{R}^2 \mid x_2 \neq 0\}$, one clearly chooses $s = \nabla f$. It remains to determine the subgradient of $f$ at $(\bar{x}_1, 0)$ with $\bar{x}_1 \neq 0$. Indeed, when $x_2 \neq 0$, $f(x_1, x_2) = \exp(g(x_1, x_2))$, so that $\nabla f(x_1, x_2) = f(x_1, x_2)\nabla g(x_1, x_2)$, i.e.,

$$\nabla f(x_1, x_2) = K (x_1^2 + x_2^2)^{\frac{a}{2(a^2 + c^2)}} \exp\Big(-\frac{c}{a^2 + c^2}\arctan\Big(\frac{x_1}{x_2}\Big)\Big)\,\frac{1}{a^2 + c^2}\,\frac{(ax_1 - cx_2,\ ax_2 + cx_1)}{x_1^2 + x_2^2}.$$

When $(x_1, x_2) \to (\bar{x}_1, 0)$ with $cx_1/x_2 > 0$, we have

$$f(x_1, x_2) \to K |\bar{x}_1|^{\frac{a}{a^2 + c^2}} \exp\Big(-\frac{|c|}{a^2 + c^2}\,\frac{\pi}{2}\Big) = f(\bar{x}_1, 0),$$

and

$$\nabla f(x_1, x_2) \to K |\bar{x}_1|^{\frac{a}{a^2 + c^2}} \exp\Big(-\frac{|c|}{a^2 + c^2}\,\frac{\pi}{2}\Big)\,\frac{1}{a^2 + c^2}\Big(\frac{a}{\bar{x}_1},\ \frac{c}{\bar{x}_1}\Big).$$

Therefore, by the definition of limiting subdifferentials (see Definition 2.1),

$$K |\bar{x}_1|^{\frac{a}{a^2 + c^2}} \exp\Big(-\frac{|c|}{a^2 + c^2}\,\frac{\pi}{2}\Big)\,\frac{1}{a^2 + c^2}\Big(\frac{a}{\bar{x}_1},\ \frac{c}{\bar{x}_1}\Big) \in \partial f(\bar{x}_1, 0). \tag{170}$$

Hence, we can choose $s(\bar{x}_1, 0)$ to be the limiting subgradient given by (170). $\blacksquare$

Remark 11.3 Note that $\partial f(x_1, 0)$ is not a singleton when $x_1 \neq 0$ and $c \neq 0$ in Theorem 11.2(iii). Thus, in Theorem 11.2(iii), we only have $T \in G_f$ when $c \neq 0$. In order to make $f$ continuous on $\mathbb{R}^2$, we need $c = 0$, in which case (167) reduces to

$$f(x_1, x_2) = K \|(x_1, x_2)\|^{1/a},$$

and $G_f = (1 - a)\operatorname{Id}$. Clearly, $f$ is not convex when $a > 1$. This has been discussed in Example 2.5.

Corollary 11.4 Let $T$ be given by (165). Suppose that one of the following holds:

(i) $|b| \neq |c|$.

(ii) $b = c = 0$, $a \neq 0$, $d \neq 0$, and $a \neq d$.

Then there exists no essentially strictly differentiable $f\colon \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.5 Let $T$ be given by (165). Suppose that $b = c = 0$, $0 < a < 1$, $0 < d < 1$, and $a \neq d$. Then $T$ is firmly nonexpansive, and there exists no essentially strictly differentiable $f\colon \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.6 (i) The skew linear mapping

$$T := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$$

is not firmly nonexpansive, so it is not a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_1$ given by

$$f_1(x, y) := \begin{cases}
(x^2 + y^2)^{1/4} \exp\big(-(1/2)\arctan(x/y)\big) & \text{if } y \neq 0,\\
0 & \text{if } (x, y) = (0, 0),\\
|x|^{1/2}\exp(-\pi/4) & \text{if } x \neq 0,\ y = 0.
\end{cases} \tag{171}$$

(ii) The linear mapping

$$T := \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}$$

is firmly nonexpansive and a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_2$ given by

$$f_2(x, y) := \begin{cases}
(x^2 + y^2)^{1/2} \exp\big(-\arctan(x/y)\big) & \text{if } y \neq 0,\\
0 & \text{if } (x, y) = (0, 0),\\
|x|\exp(-\pi/2) & \text{if } x \neq 0,\ y = 0.
\end{cases} \tag{172}$$

Figure 1: Plot of the function given by (171).

Figure 2: Plot of the function given by (172).

Note that $f_2 = f_1^2$ in Corollary 11.6 and $G_{f_2} = (\operatorname{Id} + G_{f_1})/2$.
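The relation between the two projectors can be checked numerically at a point with $y \neq 0$. The following is a rough sketch under stated assumptions: it uses the standard formula $G_f(x) = x - f(x)\nabla f(x)/\|\nabla f(x)\|^2$ with a central finite-difference gradient, and the function name `subgradient_projector`, the step size, and the test point are our own choices. At the sample point, the projector of the function in (171) sends $(x, y)$ to approximately $(y, -x)$, and that of the function in (172) to approximately $((x + y)/2, (y - x)/2)$.

```python
import math

def subgradient_projector(f, x, y, eps=1e-6):
    """G_f(x, y) = (x, y) - f * grad f / ||grad f||^2, gradient by central differences.

    Only meaningful where f > 0 and f is differentiable (here: y != 0).
    """
    gx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    gy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    scale = f(x, y) / (gx * gx + gy * gy)
    return x - scale * gx, y - scale * gy

def f1(x, y):
    """Branch y != 0 of (171)."""
    return (x * x + y * y) ** 0.25 * math.exp(-0.5 * math.atan(x / y))

def f2(x, y):
    """Branch y != 0 of (172); equals f1 squared."""
    return (x * x + y * y) ** 0.5 * math.exp(-math.atan(x / y))

# Illustrative test point with y != 0
x, y = 0.7, 1.3
p1 = subgradient_projector(f1, x, y)
p2 = subgradient_projector(f2, x, y)
# Average of the identity and the first projector, to compare with p2
avg = ((x + p1[0]) / 2, (y + p1[1]) / 2)
```

Up to finite-difference error, `p2` and `avg` agree, which is the identity $G_{f_2} = (\operatorname{Id} + G_{f_1})/2$ evaluated at one point.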

Remark 11.7 Corollary 11.5 and Corollary 11.6 together show that although the set of cutters and the set of subgradient projectors have a nonempty intersection, they are different because neither one contains the other.

By Theorem 4.5, there exists no continuous convex function $f$ such that $G_f = T$ in either case of Corollary 11.6. Corollary 11.6 says that $T = G_f$ being linear and firmly nonexpansive does not imply that $f$ is convex. A key point below is that if $T = G_f$ is linear and $f$ is convex on $\mathbb{R}^2$, then Theorem 11.2 implies that $T$ has to be firmly nonexpansive and symmetric.

Corollary 11.8 Let $T$ be given by (165). Then $T$ is a subgradient projector of a convex function if and only if one of the following holds:

(i) $a = b = c = 0$, $d \neq 0$, $0 < d \leq 1$: $T = G_f$ where $f(x_1, x_2) = K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $a \neq 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1, x_2) = K|x_1|^{1/a}$ for some $K > 0$.

(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$, $a \geq a^2 + c^2$: $T = G_f$ where $f(x_1, x_2) = K|ax_1 + cx_2|^{a/(a^2 + c^2)}$ for some $K > 0$.

(iii) $a = d$, $b = c = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1, x_2) = K(x_1^2 + x_2^2)^{\frac{1}{2a}}$ for some $K > 0$.
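The conditions of Corollary 11.8 translate directly into a small decision procedure. The sketch below is only an illustration of the statement as written: the function name `convex_case` is hypothetical, exact rational inputs are assumed so that the equality tests are meaningful, and the function reports which item applies rather than constructing $f$.

```python
from fractions import Fraction as F

def convex_case(a, b, c, d):
    """Return 'i', 'ii' or 'iii' if T = [[1-a, -b], [-c, 1-d]] satisfies that
    item of Corollary 11.8, and None otherwise. Inputs should be exact
    (int or Fraction), since the conditions involve equalities.
    """
    if a == b == c == d == 0:
        raise ValueError("(166) excludes a = b = c = d = 0, where T = Id")
    if b == 0 and c == 0:
        # Item (i): one of a, d vanishes, the other lies in (0, 1]
        if (a == 0 and 0 < d <= 1) or (d == 0 and 0 < a <= 1):
            return "i"
        # Item (iii): a = d in (0, 1]
        if a == d and 0 < a <= 1:
            return "iii"
        return None
    # Item (ii): symmetric, rank-one Id - T, with a >= a^2 + c^2
    if b == c and a != 0 and d != 0 and a * d == c * c and a >= a * a + c * c:
        return "ii"
    return None
```

For example, the commuting product from Theorem 10.5 with $\lambda_2 = 1/2$ and $\lambda_1 = 1/3$ corresponds to $a = 1/2$, $d = 2/3$, $b = c = 0$, for which `convex_case` returns `None`, in line with that theorem.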

Acknowledgments

The authors thank two anonymous referees for careful reading and constructive suggestions on the paper. HHB was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC) and by the Canada Research Chair Program. CW was partially supported by National Natural Science Foundation of China (11401372). XW was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC). JX was supported by NSERC grants of HHB and XW.


References

[1] S. Adly, F. Nacry, and L. Thibault, Preservation of prox-regularity of sets with applications to constrained optimization, SIAM J. Optim. 26 (2016), no. 1, 448–473.

[2] D. Aussel, A. Daniilidis, and L. Thibault, Subsmooth sets: functional characterizations and related concepts, Trans. Amer. Math. Soc. 357 (2005), no. 4, 1275–1301.

[3] M. Bacak, J.M. Borwein, A. Eberhard, and B.S. Mordukhovich, Infimal convolutions and Lipschitzian properties of subdifferentials for prox-regular functions in Hilbert spaces, J. Convex Anal. 17 (2010), no. 3-4, 737–763.

[4] J.B. Baillon, R.R. Bruck, and S. Reich, On the asymptotic behavior of nonexpansive mappings and semigroups in Banach spaces, Houston J. Math. 4 (1978), 1–9.

[5] F. Bernard and L. Thibault, Prox-regular functions in Hilbert spaces, J. Math. Anal. Appl. 303 (2005), no. 1, 1–14.

[6] F. Bernard and L. Thibault, Prox-regularity of functions and sets in Banach spaces, Set-Valued Anal. 12 (2004), no. 1-2, 25–47.

[7] H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.

[8] H.H. Bauschke, J.M. Borwein, and P.L. Combettes, Bregman monotone optimization algorithms, SIAM J. Control Optim. 42 (2003), 596–636.

[9] H.H. Bauschke and P.L. Combettes, A weak-to-strong convergence principle for Fejér-monotone methods in Hilbert spaces, Math. Oper. Res. 26 (2001), 248–264.

[10] H.H. Bauschke, P.L. Combettes, and D. Noll, Joint minimization with alternating Bregman proximity operators, Pac. J. Optim. 2 (2006), no. 3, 401–424.

[11] H.H. Bauschke, J. Chen, and X. Wang, A Bregman projection method for approximating fixed points of quasi-Bregman nonexpansive mappings, Appl. Anal. 94 (2015), 75–84.

[12] H.H. Bauschke, C. Wang, X. Wang, and J. Xu, On subgradient projectors, SIAM J. Optim. 25 (2015), no. 2, 1064–1082.

[13] H.H. Bauschke, C. Wang, X. Wang, and J. Xu, On the finite convergence of a projected cutter method, J. Optim. Theory Appl. 165 (2015), no. 3, 901–916.

[14] H.H. Bauschke, X. Wang, and L. Yao, Monotone linear relations: maximality and Fitzpatrick functions, J. Convex Anal. 16 (2009), 673–686.

[15] J.M. Borwein and W.B. Moors, Essentially smooth Lipschitz functions, J. Funct. Anal. 149 (1997), no. 2, 305–351.

[16] J.M. Borwein, W.B. Moors, and X. Wang, Generalized subdifferentials: a Baire categorical approach, Trans. Amer. Math. Soc. 353 (2001), no. 10, 3875–3893.

[17] J.M. Borwein and A.S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, Springer, New York, second edition, 2006.


[18] J.M. Borwein and Q.J. Zhu, Techniques of Variational Analysis, Springer-Verlag, New York, 2005.

[19] T.D. Capricelli, Algorithmes de Projections Convexes Generalisees et Applications en Imagerie Medicale, Ph.D. dissertation, University Pierre & Marie Curie (Paris 6), Paris, June 2008.

[20] A. Cegielski, Iterative Methods for Fixed Point Problems in Hilbert Spaces, Lecture Notes in Mathematics, 2057, Springer, Heidelberg, 2012.

[21] Y. Censor and A. Lent, Cyclic subgradient projections, Math. Programming 24 (1982), no. 1, 233–235.

[22] Y. Censor, W. Chen, and H. Pajoohesh, Finite convergence of a subgradient projections method with expanding controls, Appl. Math. Optim. 64 (2011), no. 2, 273–285.

[23] F.H. Clarke, Optimization and Nonsmooth Analysis, second edition, Classics in Applied Mathematics, 5. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990.

[24] P.L. Combettes, Convex set theoretic image recovery by extrapolated iterations of parallel subgradient projections, IEEE Trans. Image Process. 6 (1997), no. 4, 493–506.

[25] P.L. Combettes and J. Luo, An adaptive level set method for nondifferentiable constrained image recovery, IEEE Trans. Image Process. 11 (2002), no. 11, 1295–1304.

[26] A. Daniilidis and P. Georgiev, Approximate convexity and submonotonicity, J. Math. Anal. Appl. 291 (2004), no. 1, 292–301.

[27] F. Deutsch, Best Approximation in Inner Product Spaces, Springer, 2001.

[28] L.C. Evans and R.F. Gariepy, Measure Theory and Fine Properties of Functions, Studies in Advanced Mathematics, CRC Press, Boca Raton, FL, 1992.

[29] M. Fukushima, A finitely convergent algorithm for convex inequalities, IEEE Trans. Automat. Control 27 (1982), no. 5, 1126–1127.

[30] R. Hesse and D.R. Luke, Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems, SIAM J. Optim. 23 (2013), no. 4, 2397–2419.

[31] A.D. Ioffe, Approximate subdifferentials and applications, I: the finite-dimensional theory, Trans. Amer. Math. Soc. 281 (1984), no. 1, 389–416.

[32] A. Jourani, L. Thibault, and D. Zagrodny, Differential properties of the Moreau envelope, J. Funct. Anal. 266 (2014), no. 3, 1185–1237.

[33] C. Kan and W. Song, The Moreau envelope function and proximal mapping in the sense of the Bregman distance, Nonlinear Anal. 75 (2012), no. 3, 1385–1399.

[34] K.C. Kiwiel, The efficiency of subgradient projection methods for convex optimization. I. General level methods, SIAM J. Control Optim. 34 (1996), no. 2, 660–676.

[35] K.C. Kiwiel, The efficiency of subgradient projection methods for convex optimization. II. Implementations and extensions, SIAM J. Control Optim. 34 (1996), no. 2, 677–697.


[36] K.C. Kiwiel, A Bregman-projected subgradient method for convex constrained nondifferentiable minimization, Operations Research Proceedings 1996 (Braunschweig), 26–30, Springer, Berlin, 1997.

[37] D.R. Luke, N.H. Thao, and M.K. Tam, Quantitative convergence analysis of iterated expansive, set-valued mappings, http://arxiv.org/abs/1605.05725, May 18, 2016.

[38] C. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2000.

[39] B.S. Mordukhovich and Y. Shao, Nonsmooth sequential analysis in Asplund spaces, Trans. Amer. Math. Soc. 348 (1996), 1235–1280.

[40] B.S. Mordukhovich, Variational Analysis and Generalized Differentiation I, Springer-Verlag, 2006.

[41] H. Van Ngai, D.T. Luc, and M. Thera, Approximate convex functions, J. Nonlinear Convex Anal. 1 (2000), no. 2, 155–176.

[42] N. Ogura and I. Yamada, A deep outer approximating half space of the level set of certain quadratic functions, J. Nonlinear Convex Anal. 6 (2005), no. 1, 187–201.

[43] C.H. Jeffery Pang, Finitely convergent algorithm for nonconvex inequality problems, http://arxiv.org/abs/1405.7280, July 31, 2014.

[44] B. Pauwels, Subgradient projection operators, http://arxiv.org/abs/1403.7237v1, March 2014.

[45] R.A. Poliquin and R.T. Rockafellar, Prox-regular functions in variational analysis, Trans. Amer. Math. Soc. 348 (1996), no. 5, 1805–1838.

[46] B.T. Polyak, Minimization of unsmooth functionals, U.S.S.R. Computational Mathematics and Mathematical Physics 9 (1969), 14–29. (The original version appeared in Akademija Nauk SSSR. Žurnal Vyčislitel'noĭ Matematiki i Matematičeskoĭ Fiziki 9 (1969), 509–521.)

[47] B.T. Polyak, Introduction to Optimization, Optimization Software, 1987.

[48] B.T. Polyak, Random algorithms for solving convex inequalities, in Inherently Parallel Algorithms in Feasibility and Optimization and their Applications, D. Butnariu, Y. Censor, and S. Reich (editors), pages 409–422, Elsevier, 2001.

[49] L.F. Richardson, Measure and Integration: A Concise Introduction to Real Analysis, Wiley, 2009.

[50] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.

[51] R.T. Rockafellar and R. J-B Wets, Variational Analysis, Springer, corrected 3rd printing, 2009.

[52] J.E. Spingarn, Submonotone subdifferentials of Lipschitz functions, Trans. Amer. Math. Soc. 264 (1981), no. 1, 77–89.

[53] X. Wang, Subdifferentiability of real functions, Real Anal. Exchange 30 (2004/05), no. 1, 137–171.

[54] J. Xu, Subgradient Projectors: Theory, Extensions, and Algorithms, Ph.D. dissertation, University of British Columbia, Kelowna, April 2016.


[55] I. Yamada and N. Ogura, Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions, Numer. Funct. Anal. Optim. 25 (2004), no. 7-8, 593–617.
