
Technical Report WM-CS-2008-07
College of William & Mary
Department of Computer Science

Active Set Identification without Derivatives

Robert Michael Lewis and Virginia Torczon

September 2008

Submitted to SIAM Journal on Optimization

ACTIVE SET IDENTIFICATION WITHOUT DERIVATIVES

ROBERT MICHAEL LEWIS∗ AND VIRGINIA TORCZON†

Abstract. We consider active set identification for linearly constrained optimization problems in the absence of explicit information about the derivative of the objective function. We focus on generating set search methods, a class of derivative-free direct search methods. We begin by presenting some general results on active set identification that are not tied to any particular algorithm. These general results are sufficiently strong that, given a sequence of iterates converging to a Karush–Kuhn–Tucker point, they allow us to identify binding constraints for which there are nonzero multipliers. We then discuss why these general results, which are posed in terms of the direction of steepest descent, apply to generating set search, even though these methods do not have explicit recourse to derivatives. Nevertheless, there is a clearly identifiable subsequence of iterations at which we can reliably estimate the set of constraints that are binding at a solution. We discuss how active set estimation can be used to accelerate generating set search methods and illustrate the appreciable improvement that can result using several examples from the CUTEr test suite. We also introduce two algorithmic refinements for generating set search methods. The first expands the subsequence of iterations at which we can make inferences about stationarity. The second is a more flexible step acceptance criterion.

Key words. active sets, constrained optimization, linear constraints, direct search, generating set search, generalized pattern search, derivative-free methods.

    AMS subject classifications. 90C56, 90C30, 65K05

1. Introduction. We consider active constraint identification for generating set search (GSS) algorithms for linearly constrained minimization. For our discussion the optimization problem of interest will be posed as

    minimize  f(x)
    subject to  Ax ≤ b,        (1.1)

where f : R^n → R and A ∈ R^{m×n}.

GSS algorithms are a class of direct search methods and as such do not make explicit use of derivatives of the objective f [11]. Results on active constraint identification for smooth nonlinear programming, on the other hand, do rely explicitly on the gradient ∇f(x) of f (e.g., [5, 3, 2, 4, 7, 18]). Nevertheless, GSS algorithms compute information in the course of their operation that can be used to identify the set of constraints active at a Karush–Kuhn–Tucker (KKT) point of (1.1). Thus, while we assume for our theoretical results that f is continuously differentiable, for the application to GSS methods we do not assume that ∇f(x) (or an approximation thereto) is computationally available. The fact that active set estimation is possible in the absence of an explicit linear or quadratic model of the objective is the reason for the intentionally provocative title of this paper.

We first prove some general active set identification results. While these results are intended to reveal the active set identification properties of GSS methods, they are general and not tied to any particular class of algorithms. The first set of results, Theorems 3.2 and 3.3 and Corollaries 3.4 and 3.5, establish the active set identification properties of a sequence of points in terms of the projection of the direction of steepest descent onto a cone associated with constraints near the points in the sequence. In this first set of results we allow the sequence to be entirely interior to the feasible region in (1.1), and this leads to additional requirements of linear independence of active constraints and strict complementarity that, while standard, we wish to avoid. We do so by observing that an easily computed auxiliary sequence of points (that can include points on the boundary) possesses even stronger active set identification properties. In Theorem 3.6 we show that for this auxiliary sequence we have the stronger result that the projection of the direction of steepest descent onto the feasible region tends to zero. From results of Burke and Moré [3, 4] we then have the following. First, this auxiliary sequence allows us to identify the active constraints for which there are nonzero Lagrange multipliers (unique or not) without any assumption of linear independence of the active constraints. Second, under mild nondegeneracy assumptions this auxiliary sequence will find the face determined by the active constraints in a finite number of iterations.

∗Department of Mathematics, College of William & Mary, P.O. Box 8795, Williamsburg, Virginia 23187-8795; [email protected]. This work was supported by the National Science Foundation under Grant No. DMS-0713812.

†Department of Computer Science, College of William & Mary, P.O. Box 8795, Williamsburg, Virginia 23187-8795; [email protected].

    Any algorithm for which these general results hold will have active set identifi-cation properties at least as strong as the gradient projection method [5, 3, 2, 4]. Itis therefore perhaps surprising that these results apply to GSS methods, which donot have direct access to the gradient. However, even though GSS methods do nothave recourse to ∇f(x), there does exist an implicit relationship between an explic-itly known algorithmic parameter used to control step lengths and an appropriatelychosen measure of stationarity for (1.1) [13]. As the step-length control parametertends to zero, so does this measure of stationarity. This correlation means that GSSmethods generate at least one subsequence of iterates that satisfies the hypotheses ofour general active set identification results, and so GSS methods are able to identifyactive sets of constraints, even without direct access to the derivative of the objective.

We illustrate the practical benefits of incorporating active set estimation into GSS algorithms. We propose several ways in which one can use estimates of the active set to accelerate the search for a solution of (1.1) and present numerical tests that show these strategies can appreciably accelerate progress towards a solution. These tests show that generating set search can be competitive with a derivative-based quasi-Newton algorithm. As part of these algorithmic developments we also introduce two refinements for GSS methods. The first expands the subsequence of iterations at which we can make inferences about stationarity by introducing the notion of tangentially unsuccessful iterations, as discussed further in Section 4.4. The second is a more flexible step acceptance criterion that scales automatically with the magnitude of the objective, as discussed further in Section 4.3.

The general results on active set identification are given in Section 3. The relevant features of GSS methods are reviewed and expanded in Section 4. Section 5 discusses the active set identification properties of GSS methods and proposes acceleration schemes based on them. Section 6 contains some numerical tests of these ideas.

2. Notation. Let a_i^T be the ith row of the constraint matrix A. The set of constraints active at a point x is defined to be

    A(x) = { i | a_i^T x = b_i }.

Let C_i = { y | a_i^T y = b_i } denote the set of points where the ith constraint is binding. Denote the feasible region by Ω: Ω = { x | Ax ≤ b }. Given x ∈ Ω and ε ≥ 0, define

    I(x, ε) = { i | dist(x, C_i) = ( b_i − a_i^T x ) / ‖ a_i ‖ ≤ ε }.        (2.1)

The vectors a_i for i ∈ I(x, ε) are the outward-pointing normals to the constraints defining the boundary of Ω within distance ε of x. We call the set of such vectors the working set of constraints associated with x and ε. The ε-normal cone N(x, ε) and the ε-tangent cone T(x, ε) are defined to be

    N(x, ε) = { v | v = Σ_{i ∈ I(x,ε)} c_i a_i, c_i ≥ 0 } ∪ {0}   and   T(x, ε) = N◦(x, ε).

If ε = 0 these cones are the usual normal cone N(x) and tangent cone T(x) of Ω at x. If K ⊆ R^n is a cone and v ∈ R^n, then we denote the Euclidean norm projection of v onto K by [v]_K. Finally, ‖ · ‖ denotes the Euclidean norm.
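
As a concrete illustration of this notation (our sketch, not part of the original report), the following Python fragment computes the working set I(x, ε) of (2.1) and the projections onto the ε-normal and ε-tangent cones. It relies on two standard facts: projection onto the finitely generated cone N(x, ε) is a nonnegative least-squares problem, and the Moreau decomposition v = [v]_{N(x,ε)} + [v]_{T(x,ε)} then yields the projection onto the polar cone. The function and variable names are ours.

    import numpy as np
    from scipy.optimize import nnls

    def working_set(A, b, x, eps):
        """Indices i with dist(x, C_i) = (b_i - a_i^T x)/||a_i|| <= eps, as in (2.1)."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        dist = (b - A @ x) / np.linalg.norm(A, axis=1)
        return np.where(dist <= eps)[0]

    def project_onto_cones(A, b, x, eps, v):
        """Return ([v]_N, [v]_T) for the eps-normal and eps-tangent cones at x."""
        v = np.asarray(v, dtype=float)
        W = working_set(A, b, x, eps)
        if len(W) == 0:
            # Empty working set: N(x, eps) = {0}, so T(x, eps) is all of R^n.
            return np.zeros_like(v), v
        # [v]_N = A_W^T c*, where c* >= 0 minimizes ||A_W^T c - v||.
        c, _ = nnls(np.asarray(A, dtype=float)[W].T, v)
        v_N = np.asarray(A, dtype=float)[W].T @ c
        return v_N, v - v_N   # Moreau decomposition: [v]_T = v - [v]_N

    # Example: a single constraint x1 <= 1 near the point (0.9, 0).
    A = np.array([[1.0, 0.0]])
    b = np.array([1.0])
    vN, vT = project_onto_cones(A, b, np.array([0.9, 0.0]), eps=0.2, v=np.array([2.0, 1.0]))
    print(vN, vT)   # vN lies along the outward normal, vT is tangent to the constraint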

3. Results concerning active set identification. Suppose x⋆ is a KKT point of (1.1), and consider an algorithm that generates a sequence of feasible points {xk} ⊂ Ω, indexed by k ∈ K, such that xk → x⋆ as k → ∞. Let A(x⋆) denote the set of indices of the linear constraints active at x⋆. Then at x⋆ we have

    −∇f(x⋆) = Σ_{j ∈ A(x⋆)} λ_j a_j,   λ_j ≥ 0 for all j ∈ A(x⋆).

At this point we make no assumption that the Lagrange multipliers λ_j are unique.

We wish to show that for some sequence {x̂k}—either {xk} or one we can derive from {xk}—and a set of constraints W(x̂k) that we can construct knowing x̂k, we have the following:

1. For all k sufficiently large, we either identify the active set at x⋆ or at least those constraints in A(x⋆) with nonzero multipliers. That is, for all k sufficiently large we would prefer to have A(x⋆) = W(x̂k), and at the very least we want { i ∈ A(x⋆) | λ_i > 0 } ⊆ W(x̂k).

2. If x⋆ lies in a face of Ω, then we land on the face that contains x⋆ in a finite number of iterations. That is, for all k sufficiently large we have A(x̂k) = A(x⋆). In particular, if x⋆ is at a vertex of Ω, then we land on x⋆ in a finite number of iterations: x̂k = x⋆ for all k sufficiently large.

Moreover, we want these results to hold under assumptions on ∇f(x⋆) and A(x⋆) that are as mild as possible. Because the gradient projection algorithm possesses these properties [5, 3, 2, 4], we view these requirements as reasonable expectations for any combination of optimization algorithm and active set identification technique applied to (1.1) when f is differentiable.

Theorem 3.1 introduces the property of the iterates xk that is central to the active set identification properties of generating set search.

Theorem 3.1. Suppose that f is continuously differentiable on Ω, that xk → x⋆, and that there exist sequences εk and ηk with εk, ηk ≥ 0 and εk, ηk → 0 such that

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ηk.

Then x⋆ is a KKT point of (1.1).

Proof. Since xk → x⋆ and εk → 0 we know that once k ∈ K is sufficiently large we have I(xk, εk) ⊆ A(x⋆) [16, Proposition 1]. Henceforth restrict attention to such k. Since I(xk, εk) ⊆ A(x⋆) it follows that N(xk, εk) ⊆ N(x⋆, 0). Therefore T(x⋆, 0) ⊆ T(xk, εk) and so

    ‖ [−∇f(xk)]_{T(x⋆,0)} ‖ ≤ ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ηk.

Taking the limit as k → ∞ yields [−∇f(x⋆)]_{T(x⋆,0)} = 0, and the result follows.

Theorem 3.2 is a first step towards finding a set of constraints that characterize the KKT point x⋆.

Theorem 3.2. Suppose that f is continuously differentiable on Ω, that xk → x⋆, and that there exist sequences εk and ηk with εk, ηk ≥ 0 and ηk → 0 such that

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ηk.

Then for all k ∈ K sufficiently large, −∇f(x⋆) ∈ N(xk, εk).

Proof. Since there is only a finite number of possible distinct working sets I(xk, εk), we know that there exists k̄ ∈ K such that if k ≥ k̄ then the associated working set I(xk, εk) appears infinitely often among the working sets { I(xℓ, εℓ) | ℓ ≥ k̄ }. Now suppose k ∈ K and k ≥ k̄. Then

    lim_{ℓ→∞, I(xℓ,εℓ)=I(xk,εk)} ‖ [−∇f(xℓ)]_{T(xk,εk)} ‖ ≤ lim_{ℓ→∞, I(xℓ,εℓ)=I(xk,εk)} ηℓ,

whence ‖ [−∇f(x⋆)]_{T(xk,εk)} ‖ = 0. It follows that −∇f(x⋆) ∈ N(xk, εk).

The next theorem relates active constraints that are missing from I(xk, εk) to their associated Lagrange multipliers λ_j. Its significance will be made explicit in Corollaries 3.4 and 3.5. For i ∈ A(x⋆) let Π_i(·) denote the projection onto the orthogonal complement of the linear span of the other active constraints { a_j | j ∈ A(x⋆) − {i} }.

Theorem 3.3. Suppose that f is continuously differentiable on Ω, that xk → x⋆, and that there exist sequences εk and ηk with εk, ηk ≥ 0 and εk → 0 such that

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ηk.

Then for all k ∈ K sufficiently large, if i ∈ A(x⋆) but i ∉ I(xk, εk), then

    | λ_i | ‖ Π_i(a_i) ‖ ≤ ηk + ‖ ∇f(xk) − ∇f(x⋆) ‖.

Proof. Since xk → x⋆ and εk → 0 we know that once k ∈ K is sufficiently large we have I(xk, εk) ⊆ A(x⋆). Henceforth we restrict attention to such k.

Let N(A(x⋆) − {i}) be the cone generated by { a_j | j ∈ A(x⋆) − {i} } and let T(A(x⋆) − {i}) be the cone polar to N(A(x⋆) − {i}). Since i ∈ A(x⋆) but i ∉ I(xk, εk) we know that I(xk, εk) ⊆ A(x⋆) − {i}, so it follows that N(xk, εk) ⊆ N(A(x⋆) − {i}) and thus T(A(x⋆) − {i}) ⊆ T(xk, εk). This means that

    ‖ [−∇f(xk)]_{T(A(x⋆)−{i})} ‖ ≤ ‖ [−∇f(xk)]_{T(xk,εk)} ‖.        (3.1)

Meanwhile,

    Π_i( [−∇f(x⋆)]_{N(A(x⋆)−{i})} + [−∇f(x⋆)]_{T(A(x⋆)−{i})} ) = Π_i( Σ_{j ∈ A(x⋆)−{i}} λ_j a_j + λ_i a_i ).

Since Π_i annihilates both [−∇f(x⋆)]_{N(A(x⋆)−{i})} and Σ_{j ∈ A(x⋆)−{i}} λ_j a_j, each of which lies in the span of { a_j | j ∈ A(x⋆) − {i} }, it follows that

    Π_i( [−∇f(x⋆)]_{T(A(x⋆)−{i})} ) = λ_i Π_i(a_i),

whence

    | λ_i | ‖ Π_i(a_i) ‖ = ‖ Π_i( [−∇f(x⋆)]_{T(A(x⋆)−{i})} ) ‖ ≤ ‖ [−∇f(x⋆)]_{T(A(x⋆)−{i})} ‖.

The theorem then follows from (3.1):

    | λ_i | ‖ Π_i(a_i) ‖ ≤ ‖ [−∇f(x⋆)]_{T(xk,εk)} ‖
                        ≤ ‖ [−∇f(xk)]_{T(xk,εk)} ‖ + ‖ [−∇f(xk)]_{T(xk,εk)} − [−∇f(x⋆)]_{T(xk,εk)} ‖
                        ≤ ηk + ‖ ∇f(xk) − ∇f(x⋆) ‖.

In the last step we use the fact that projection onto a convex cone is nonexpansive.

Several useful facts follow from Theorem 3.3. The first concerns the nature of any constraints in A(x⋆) that we fail to capture in I(xk, εk), our estimate of A(x⋆). Suppose constraint i is active at x⋆, and that Π_i(a_i) ≠ 0. Then once k is sufficiently large, constraint i can fail to be in I(xk, εk) only if the associated multiplier λ_i is small. If the multipliers are valid shadow costs, then once k is sufficiently large the only constraints in A(x⋆) that can possibly be absent from I(xk, εk) are those that play only a small role in constraining the optimal objective value. This observation, and its consequences as k → ∞, are summarized in Corollary 3.4.

Corollary 3.4. Suppose that the conditions of Theorem 3.3 hold. In addition, suppose that ηk → 0 and i ∈ A(x⋆) is such that Π_i(a_i) ≠ 0. Then

1. If i ∉ I(xk, εk) and k ∈ K is sufficiently large, then

    | λ_i | ≤ ( ηk + ‖ ∇f(xk) − ∇f(x⋆) ‖ ) / ‖ Π_i(a_i) ‖.        (3.2)

2. If i ∉ I(xk, εk) for infinitely many k ∈ K, then λ_i = 0, where λ_i is the multiplier associated with a_i at x⋆.

3. For all k ∈ K sufficiently large we have

    { j ∈ A(x⋆) | λ_j > 0, Π_j(a_j) ≠ 0 } ⊆ I(xk, εk).

Proof. Since Π_i(a_i) ≠ 0, Part 1 follows from Theorem 3.3. To prove Part 2, let K′ ⊆ K be an infinite subsequence of iterations for which i ∉ I(xk, εk). Then taking the limit of (3.2) for k ∈ K′ as k → ∞ yields

    | λ_i | ≤ lim_{k∈K′, k→∞} ( ηk + ‖ ∇f(xk) − ∇f(x⋆) ‖ ) / ‖ Π_i(a_i) ‖ = 0.

Part 3 is the contrapositive of Part 2.

Note that the nonlinear program (1.1) is posed in terms of inequality constraints. We assume that any equality constraints are expressed as a pair of opposing inequalities: a_i^T x ≤ b_i and −a_i^T x ≤ −b_i. Let E denote the set of equality constraints:

    E = { i | a_i^T x = b_i for all x ∈ Ω }.

The set of constraints A(x⋆) − E is the set of "true" inequalities that are active at x⋆.

Corollary 3.5. Suppose that the conditions of Theorem 3.3 hold, that ηk → 0, that xk ∈ Ω for all k, and that for all i ∈ A(x⋆) − E we have Π_i(a_i) ≠ 0. Then

1. For all k ∈ K sufficiently large we have E ∪ { j ∈ A(x⋆) − E | λ_j > 0 } ⊆ I(xk, εk).

2. If strict complementarity holds at x⋆ for the constraints in A(x⋆) − E, then A(x⋆) = I(xk, εk) for all k sufficiently large.

Proof. Since by assumption xk ∈ Ω for all k, we are guaranteed that the equality constraints are satisfied at every iteration and thus E ⊆ I(xk, εk) for all k. Moreover, Π_i(a_i) ≠ 0 for all i ∈ A(x⋆) − E, so Part 3 of Corollary 3.4 holds for the constraints in A(x⋆) − E. Part 1 then follows. If strict complementarity holds for the constraints in A(x⋆) − E then E ∪ { j ∈ A(x⋆) − E | λ_j > 0 } = A(x⋆), so A(x⋆) ⊆ I(xk, εk). However, for k sufficiently large we know that I(xk, εk) ⊆ A(x⋆), so Part 2 follows.

Theorem 3.3 and Corollary 3.4 allow us to make useful inferences about the active set A(x⋆) from the sets I(xk, εk). Corollary 3.5 shows that under stronger assumptions we have I(xk, εk) = A(x⋆), which is the first of the goals outlined at the start of this section. However, the linear independence hypothesis of Corollary 3.5 is dissatisfying since it does not hold at such reasonable and benign points as the apex of a pyramid. In addition, we have not required any of the iterates xk to lie on the boundary of Ω, and we need iterates that lie on the boundary of Ω in order to achieve our second goal, that A(xk) = A(x⋆) after a finite number of iterations.

To address these issues we define an auxiliary sequence {x̂k} of points that can lie on the boundary of Ω. Suppose the set { x | Ax ≤ b and a_i^T x = b_i for i ∈ I(xk, εk) } is nonempty (if I(xk, εk) = ∅, then this set should be taken to mean all of R^n). In this situation define

    x̂k = argmin_x { ‖ x − xk ‖ | Ax ≤ b and a_i^T x = b_i for i ∈ I(xk, εk) }.        (3.3)

Note that when I(xk, εk) = ∅, x̂k = xk. We will show that {x̂k} has even stronger active set identification properties than {xk}.
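
A sketch of how x̂k might be computed in practice (our own naming and solver choice, not prescribed by the paper): the least-distance problem (3.3) is a small convex quadratic program, solved below with scipy's SLSQP method.

    import numpy as np
    from scipy.optimize import minimize

    def auxiliary_point(A, b, x_k, working):
        """Solve (3.3): the point closest to x_k that is feasible for Ax <= b and
        makes the working-set constraints hold with equality.  Returns None if the
        face is empty (the solver fails to find a feasible point)."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        x_k = np.asarray(x_k, dtype=float)
        if len(working) == 0:
            return x_k.copy()                       # I(x_k, eps_k) empty: x_hat_k = x_k
        cons = [{'type': 'ineq', 'fun': lambda x: b - A @ x},                  # Ax <= b
                {'type': 'eq',   'fun': lambda x: A[working] @ x - b[working]}]
        res = minimize(lambda x: 0.5 * np.sum((x - x_k) ** 2), x_k,
                       jac=lambda x: x - x_k, constraints=cons, method='SLSQP')
        return res.x if res.success else None

    # Example: x_k = (0.9, 0.95) near the vertex of x1 <= 1, x2 <= 1.
    A = np.array([[1.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    print(auxiliary_point(A, b, [0.9, 0.95], working=[0, 1]))   # approximately [1. 1.]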

Theorem 3.6. Suppose that f is continuously differentiable on Ω, that xk → x⋆, and that there exist sequences εk and ηk with εk, ηk ≥ 0 and εk, ηk → 0 such that

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ηk.

Then
1. x̂k is defined for all k ∈ K sufficiently large;
2. lim_{k→∞} ‖ x̂k − xk ‖ = 0; and
3. lim_{k→∞} [−∇f(x̂k)]_{T(x̂k,0)} = 0.

Proof. Let k ∈ K be sufficiently large that I(xk, εk) ⊆ A(x⋆). Then we know that

    x⋆ ∈ { x | Ax ≤ b and a_i^T x = b_i for all i ∈ I(xk, εk) }.

Since x⋆ is feasible for the optimization problem in (3.3), it follows that x̂k is defined. This proves Part 1. Moreover, since x⋆ is feasible for (3.3), then for all sufficiently large k we have ‖ x̂k − xk ‖ ≤ ‖ x⋆ − xk ‖. Since xk → x⋆ we conclude that ‖ x̂k − xk ‖ → 0, which is Part 2.

Finally, the definition of x̂k ensures that all the constraints in I(xk, εk) are active at x̂k. Thus N(xk, εk) ⊆ N(x̂k, 0) and so T(x̂k, 0) ⊆ T(xk, εk), whence

    ‖ [−∇f(x̂k)]_{T(x̂k,0)} ‖ ≤ ‖ [−∇f(x̂k)]_{T(xk,εk)} ‖
                              ≤ ‖ [−∇f(xk)]_{T(xk,εk)} ‖ + ‖ [−∇f(x̂k)]_{T(xk,εk)} − [−∇f(xk)]_{T(xk,εk)} ‖
                              ≤ ηk + ‖ ∇f(x̂k) − ∇f(xk) ‖.

By Part 2, we know that ‖ x̂k − xk ‖ → 0 as k → ∞. Taking the limit yields

    lim_{k→∞} [−∇f(x̂k)]_{T(x̂k,0)} = 0,

which is Part 3.

We next recall some results of Burke and Moré [3, 4], stated here in terms of the linearly constrained problem (1.1). Theorem 3.7 assumes a mild nondegeneracy condition and gives a necessary and sufficient condition for identifying the active set A(x⋆) from A(x̂k) in a finite number of iterations. It also describes a situation in which x⋆ will be attained in a finite number of iterations. Theorem 3.8 gives a necessary and sufficient condition for the identification of all constraints in A(x⋆) for which there is a strictly positive Lagrange multiplier.

Theorem 3.7. Let f : R^n → R be continuously differentiable on Ω, and assume that {x̂k} ⊂ Ω converges to a KKT point x⋆ for which −∇f(x⋆) is in the relative interior of N(x⋆).

1. [3, Corollary 3.6] Then A(x̂k) = A(x⋆) for all k sufficiently large if and only if { [−∇f(x̂k)]_{T(x̂k,0)} } converges to zero.

2. [3, Corollary 3.5] If N(x⋆) has a nonempty interior and { [−∇f(x̂k)]_{T(x̂k,0)} } converges to zero, then x̂k = x⋆ for all k sufficiently large.

The requirement that −∇f(x⋆) lie in the relative interior of N(x⋆) is equivalent to

    −∇f(x⋆) = Σ_{j ∈ A(x⋆)} λ_j a_j,   λ_j > 0 for all j ∈ A(x⋆).

(See [3, Lemma 3.2].)

Theorem 3.8. [4, Theorem 4.5] Let f : R^n → R be continuously differentiable on Ω, and assume that {x̂k} ⊂ Ω converges to a KKT point x⋆ at which we have

    −∇f(x⋆) = Σ_{i ∈ A(x⋆)} λ_i a_i,   λ_i ≥ 0.        (3.4)

Then lim_{k→∞} [−∇f(x̂k)]_{T(x̂k,0)} = 0 if and only if { i ∈ A(x⋆) | λ_i > 0 } ⊆ A(x̂k) for all k sufficiently large.

From Theorems 3.7 and 3.8 and Part 3 of Theorem 3.6 we obtain Corollary 3.9, which achieves the goals concerning active set identification set out at the beginning of this section. Corollary 3.9 says that while I(xk, εk) serves as a useful estimate of the constraints active at x⋆, the constraints in A(x̂k) are a sharper estimate.

Corollary 3.9. Suppose that xk → x⋆, ηk → 0, εk → 0, and

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ηk.

Then we have the following:

1. If −∇f(x⋆) is in the relative interior of N(x⋆), A(x̂k) = A(x⋆) for all k sufficiently large. If, in addition, N(x⋆) has a nonempty interior, then x̂k = x⋆ for all k sufficiently large.

2. Suppose

    −∇f(x⋆) = Σ_{i ∈ A(x⋆)} λ_i a_i,   λ_i ≥ 0.

Then { i ∈ A(x⋆) | λ_i > 0 } ⊆ A(x̂k) for all k sufficiently large.

4. Generating set search for linearly constrained minimization. We next review the features of GSS that are relevant to active set identification for (1.1) (see [13, 14] for further discussion). GSS algorithms for linearly constrained problems are feasible iterate methods. Associated with each xk is a value εk → 0, defined shortly, which is used to compute a working set of nearby constraints I(xk, εk) as defined by (2.1). The connection with the active set identification results of the previous section comes through the stationarity properties of GSS discussed in Section 4.7.

4.1. Partitioning the search directions. At the center of the convergence properties of GSS methods is the notion of a core set of search directions, denoted Gk, which is constructed at each iteration in a way that ensures the inclusion of at least one direction of descent along which a feasible step of sufficient length can be taken. In the linearly constrained case Gk must comprise a set of generators for the ε-tangent cone T(xk, εk). If Gk satisfies this requirement, then convergence results can be obtained under straightforward conditions detailed in [13]. The core search directions (the elements of Gk) conform to the boundary of Ω near xk, where "near" is determined by the value of εk. To simplify the discussion, we assume that all of the directions in Gk are normalized. For details on other options, see [13, section 2.3].

As a practical matter it is useful to allow additional search directions. For instance, in Section 5 we introduce additional search directions to effect the acceleration strategies based on active set estimation. The set of additional search directions is denoted by Hk. The total set of search directions Dk is thus Dk = Gk ∪ Hk.

In the event that T(xk, εk) contains a lineality space (as in Figure 4.2(a) and Figure 4.2(f)) we have some freedom in our choice of generators. In order to preserve the convergence properties of the algorithm we place Condition 4.1 on the choice of Gk. Condition 4.1 makes use of the quantity κ(G), which appears in [13, (2.1)] and is a generalization of that given in [11, (3.10)]. For any finite set of vectors G define

    κ(G) = inf_{v ∈ R^n, [v]_K ≠ 0}  max_{d ∈ G}  ( v^T d ) / ( ‖ [v]_K ‖ ‖ d ‖ ),   where K is the cone generated by G.        (4.1)

We place the following condition on the set of search directions Gk.

Condition 4.1. There exists a constant κmin > 0, independent of k, such that the following holds. For every k, Gk generates T(xk, εk); if Gk ≠ {0}, then κ(Gk) ≥ κmin.
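
The infimum in (4.1) is not something one needs to compute exactly; the sketch below (ours, purely illustrative) gives a rough Monte Carlo estimate of κ(G) by sampling directions v, projecting them onto K = cone(G) via nonnegative least squares, and taking the smallest sampled value of the inner maximum. The sampled value is an upper bound on κ(G) that approaches it as the sampling becomes dense; the function name and sample count are ours.

    import numpy as np
    from scipy.optimize import nnls

    def kappa_estimate(G, samples=5000, seed=0):
        """Rough Monte Carlo estimate of kappa(G) in (4.1); G has the generators as rows."""
        G = np.asarray(G, dtype=float)
        rng = np.random.default_rng(seed)
        norms = np.linalg.norm(G, axis=1)
        best = np.inf
        for _ in range(samples):
            v = rng.standard_normal(G.shape[1])
            c, _ = nnls(G.T, v)                 # projection of v onto K = cone(G)
            vK = G.T @ c
            nvK = np.linalg.norm(vK)
            if nvK < 1e-12:                     # [v]_K = 0: such v are excluded in (4.1)
                continue
            best = min(best, float(np.max(G @ v / (nvK * norms))))
        return best

    # For the coordinate directions +/- e_i in R^2 (which generate all of R^2),
    # kappa(G) = 1/sqrt(2) ~ 0.707; the sampled value is an upper bound close to it.
    print(kappa_estimate(np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])))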

4.2. Determining the lengths of steps. The step-length control parameter ∆k regulates the steps tried along the search directions in Dk and prevents them from becoming either too long or too short. The trial point associated with any d ∈ Dk is

    xk + ∆̃(d)d,

where ∆̃(d) is

    ∆̃(d) = max { ∆ ∈ [0, ∆k] | xk + ∆d ∈ Ω }.        (4.2)

This construction ensures that the full step (with respect to ∆k) is taken if the resulting trial point is feasible. Otherwise, the trial point is found by taking the longest feasible step possible from xk along d.
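
In code, ∆̃(d) amounts to a standard ratio test on the constraint slacks; a minimal sketch (names ours):

    import numpy as np

    def feasible_step(A, b, x, d, delta):
        """Compute Delta~(d) in (4.2): the longest step in [0, delta] along d that
        keeps x + Delta*d inside Omega = {x | Ax <= b}."""
        A = np.asarray(A, dtype=float)
        slack = np.asarray(b, dtype=float) - A @ x   # nonnegative when x is feasible
        rate = A @ d                                  # how fast each slack is consumed
        blocking = rate > 0
        if not np.any(blocking):
            return delta                              # no constraint limits the step
        return min(delta, float(np.min(slack[blocking] / rate[blocking])))

    # Example: from x = (0.5, 0.5) along d = (1, 0) with x1 <= 1, the step is capped at 0.5.
    A = np.array([[1.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    print(feasible_step(A, b, np.array([0.5, 0.5]), np.array([1.0, 0.0]), delta=2.0))  # 0.5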

With the directions in Gk normalized, the value of εk used to determine the current working set is derived from the current value of ∆k according to εk = min{εmax, ∆k} for some εmax > 0. Prudent choices of εmax and ∆0 mean that in practice it is typically the case that εk = ∆k.

4.3. Ascertaining success. A trial point is considered acceptable only if it satisfies the sufficient decrease condition

    f(xk + ∆̃(d)d) < f(xk) − ρk(∆k).        (4.3)

An appropriate sufficient decrease condition allows the possibility of taking exact steps to the boundary of the feasible region [16, 13]. Here we use

    ρk(∆) = αk ∆².        (4.4)

We require the sequence {αk} be bounded away from zero; i.e., there exists αmin > 0 such that αmin ≤ αk for all k. The choice of αk in the work reported here is

    αk = α max { |ftyp|, |f(xk)| },        (4.5)

where α > 0 is fixed (e.g., α = 10⁻²) and ftyp ≠ 0 is some fixed value that reflects the typical magnitude of the objective, given feasible inputs.

In earlier work [16, 13, 14], the decrease condition was independent of k, whereas here it is not. We introduce ρk(∆), as in (4.4), so that the step acceptance rule (4.3) scales automatically with the magnitude of the objective. In Section 4.7 we prove that this change does not adversely affect the key stationarity results from [13].
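
A minimal sketch of the acceptance test (4.3)–(4.5); the defaults for α and ftyp are illustrative choices of ours:

    def sufficient_decrease(f_trial, f_current, delta, alpha=1.0e-2, f_typ=1.0):
        """Step acceptance test (4.3)-(4.5): accept the trial point only if
        f(trial) < f(x_k) - alpha_k * delta_k**2, with alpha_k = alpha * max(|f_typ|, |f(x_k)|)."""
        alpha_k = alpha * max(abs(f_typ), abs(f_current))
        return f_trial < f_current - alpha_k * delta ** 2

    # Example: with f(x_k) = 10 and delta = 0.1, the trial value must beat
    # 10 - (0.01 * 10) * 0.01 = 9.999 to be accepted.
    print(sufficient_decrease(9.9985, 10.0, 0.1))   # True
    print(sufficient_decrease(9.9995, 10.0, 0.1))   # False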

To ensure asymptotic success, if there is no direction d ∈ Dk for which (4.3) is satisfied, then ∆k+1 ← θk∆k for some choice of θk satisfying 0 < θk ≤ θmax < 1. Otherwise, ∆k+1 ← φk∆k for some choice of φk satisfying φk ≥ 1. To keep things simple, in both the examples shown here and in our implementation, we use θk = 1/2 and φk = 1 for all k.

4.4. Tangentially unsuccessful iterations. We distinguish three types of iterations. This is a departure from previous work [13, 14] in which the iterations are designated as either successful, when a trial point satisfying (4.3) is found, or unsuccessful, when no trial point satisfying (4.3) is found. Let S and U denote the sets of successful and unsuccessful iterations, respectively; then the set of all iterations is S ∪ U with S ∩ U = ∅.

We want to track more closely a lack of decrease along the core search directions contained in the set Gk, so we introduce a new set UT, for tangentially unsuccessful iterations. A tangentially unsuccessful iteration is one at which we know that (4.3) is not satisfied for any d ∈ Gk. We call the iteration tangentially unsuccessful since no decrease is realized by any of the trial steps defined by the generators of the ε-tangent cone. Thus, any unsuccessful step is also tangentially unsuccessful, since at an unsuccessful step no direction in Dk = Gk ∪ Hk is found to satisfy (4.3). However, a tangentially unsuccessful step is not necessarily unsuccessful, since steps along the directions in Gk may be unsuccessful but a step along one of the extra search directions in Hk is successful. To summarize, we have U ⊆ UT but, possibly, S ∩ UT ≠ ∅.

4.5. Illustration. In Figure 4.2 we illustrate several iterations of one variant of linearly constrained GSS applied to a simple example. This example will be used later to give some insight into how GSS can be accelerated by active set identification. The objective in Figure 4.2 is a modification of the two-dimensional Broyden tridiagonal function [1, 17]. Its level sets are indicated in the background in Figure 4.2. We have introduced two linear inequality constraints, both of which are active at the KKT point, which is indicated by the star in Figure 4.2(a). Figure 4.1 provides a legend.

Fig. 4.1. Legend for Figures 4.2–5.3: markers distinguish the constraints, the KKT point, the ε-ball used to identify the working set, the current iterate xk, the constraints in the working set, trial points xk + ∆kdk along directions dk ∈ Gk and feasible and infeasible trial points along directions dk ∈ Hk, and the direction and feasible trial point carried over from iteration k − 1.

(a) k = 0. The working set contains one constraint. Move East; k ∈ S.
(b) k = 1. New working set. Change set of search directions. Move North; k ∈ S and k ∈ UT.
(c) k = 2. Same working set. Keep set of search directions. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(d) k = 3. Same working set. Keep set of search directions. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(e) k = 4. Same working set. Keep set of search directions. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(f) k = 5. New working set (with respect to k = 4). Change set of search directions. Move East; k ∈ S.

Fig. 4.2. A version of linearly constrained GSS applied to the modified Broyden tridiagonal function augmented with two linear inequality constraints. At each iteration, a constraint in the working set is indicated with a dashed line.

At the first iteration, illustrated in Figure 4.2(a), we choose G0 = {e1, −e1, −e2}, where ei, i ∈ {1, 2}, is a unit coordinate vector. The choice of e1 and −e1 is dictated by the single constraint in the working set. To ensure that G0 generates T(x0, ε0) we require a third vector; −e2 is a sensible choice (see [14, section 5]). We also elect to include the vector e2, which generates N(x0, ε0), since it allows the search to move toward the constraint contained in the working set. Thus, H0 = {e2}. A full step of length ∆0 along e2 is not feasible, so we reduce the length of the step so we stop at the boundary, leading to a step of the form ∆̃(e2)e2. The step to the East satisfies (4.3), so this trial point becomes the next iterate and we assign k = 0 to S.

At the next iteration, illustrated in Figure 4.2(b), the second constraint enters the working set of constraints. This gives us a new core set G1 containing only two vectors. However, none of the trial steps defined by the vectors in G1 improve upon the value of the objective at the current iterate. Again, H1 comprises a set of generators for the ε-normal cone. By considering the feasible steps along the directions in H1 we see the most improvement by taking the step to the North, so that is the step we accept. We assign k = 1 to both S and UT since, while the iteration was a success, no improvement was found along the directions defined by the vectors in G1.

At the next iteration, illustrated in Figure 4.2(c), the working set remains unchanged, so we do not change the set of search directions: G2 ← G1 and H2 ← H1. Once again, the full steps of length ∆2 along the vectors contained in the core set G2 are feasible but do not yield improvement. Moreover, there are no feasible steps along the two vectors in H2. Since no feasible decrease has been found, the iteration is unsuccessful; the current iterate is unchanged, k = 2 is assigned to the set U as well as to the set UT, and ∆3 ← (1/2)∆2. The same is true in the next two iterations, illustrated in Figures 4.2(d)–4.2(e).

At iteration k = 5, illustrated in Figure 4.2(f), the reductions in ∆k finally pay off. Now ε5 = ∆5 is sufficiently small that the second constraint is dropped from the working set. Then it is possible to take a feasible step of length ∆5 along the vector e1 ∈ G5. The iteration is successful and k = 5 is assigned to the set S.

4.6. The algorithm. Algorithm 4.1 restates Algorithm 5.1 from [13]. We assume that the search directions in Gk have been normalized, though this simplification is not essential to the results here. We use (4.3) as our new sufficient decrease criterion. We also now keep track of tangentially unsuccessful iterations.

4.7. Stationarity results. Next we examine the relevant convergence properties of Algorithm 4.1. The results in this section and their proofs are extensions of results in [11, 13]. They are key to active set identification for GSS.

The following theorem, similar to [13, Theorem 6.3], shows that for the subsequence of iterations k ∈ UT, Algorithm 4.1 bounds the size of the projection of −∇f(xk) onto T(xk, εk) as a function of the step-length control parameter ∆k. The two modifications introduced in Algorithm 4.1 mean that Theorem 4.2 differs from [13, Theorem 6.3] in two respects. First, we have replaced ρ(∆k) with a ρk(∆k) that does not satisfy the requirements on ρ(·) found in [13, Condition 4]. Second, the bound (4.8) is shown to hold for the subsequence k ∈ UT, not just the subsequence k ∈ U (recall that U ⊆ UT). We also prove a slightly more general result using a modulus of continuity for ∇f: given x ∈ Ω and r > 0,

    ω(x, r) = max { ‖ ∇f(y) − ∇f(x) ‖ | y ∈ Ω, ‖ y − x ‖ ≤ r }.        (4.7)

Theorem 4.2. Suppose that ∇f is continuous on Ω. Consider the iterates produced by Algorithm 4.1. If k ∈ UT and εk satisfies εk = ∆k, then

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ (1/κmin) ( ω(xk, ∆k) + αk∆k ),        (4.8)

where κmin is from Condition 4.1 and αk is from (4.5).

Proof. Clearly, we need only consider the case when [−∇f(xk)]_{T(xk,εk)} ≠ 0. Condition 4.1 guarantees a set Gk that generates T(xk, εk). Recall that Algorithm 4.1 requires ‖ d ‖ = 1 for all d ∈ Gk. Thus, by (4.1) with K = T(xk, εk) and v = −∇f(xk), there exists some d̂ ∈ Gk such that

    κmin ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ −∇f(xk)^T d̂.        (4.9)

The fact that k ∈ UT ensures us that f(xk + ∆̃(d)d) ≥ f(xk) − ρk(∆k) for all d ∈ Gk. By assumption εk = ∆k and Algorithm 4.1 requires ‖ d ‖ = 1 for all d ∈ Gk, so ‖ ∆kd ‖ = εk for all d ∈ Gk. From [13, Proposition 2.2] we know that if x ∈ Ω and v ∈ T(x, ε) satisfies ‖ v ‖ ≤ ε, then x + v ∈ Ω. Therefore xk + ∆kd ∈ Ω for all d ∈ Gk. This, together with (4.2), ensures

    f(xk + ∆kd) − f(xk) + ρk(∆k) ≥ 0 for all d ∈ Gk.        (4.10)


Step 0. Initialization. Let x0 ∈ Ω be the initial iterate. Let ∆tol > 0 be the tolerance used to test for convergence. Let ∆0 > ∆tol be the initial value of the step-length control parameter. Let εmax > ∆tol be the maximum distance used to identify nearby constraints (εmax = +∞ is permissible). Let α > 0. Let ρk(∆k) = α max{ |ftyp|, |f(xk)| } ∆k², where ftyp ≠ 0 is some value that reflects the typical magnitude of the objective for x ∈ Ω (use ftyp = 1 as a default). Let 0 < θmax < 1.

Step 1. Choose search directions. Let εk = min{εmax, ∆k}. Choose a set of search directions Dk = Gk ∪ Hk satisfying Condition 4.1. Normalize the core search directions in Gk so that ‖ d ‖ = 1 for all d ∈ Gk.

Step 2. Look for decrease. Consider trial steps of the form xk + ∆̃(d)d for d ∈ Dk, where ∆̃(d) is as defined in (4.2), until either finding a dk ∈ Dk that satisfies (4.3) (a successful iteration) or determining that (4.3) is not satisfied for any d ∈ Gk (an unsuccessful iteration).

Step 3. Successful Iteration. If there exists dk ∈ Dk such that (4.3) holds, then:
  – Set xk+1 = xk + ∆̃(dk)dk.
  – Set ∆k+1 = φk∆k with φk ≥ 1.
  – Set S = S ∪ {k}.
  – If during Step 2 it was determined that (4.3) is not satisfied for any d ∈ Gk, set UT = UT ∪ {k}.

Step 4. Unsuccessful Iteration. Otherwise,
  – Set xk+1 = xk.
  – Set ∆k+1 = θk∆k with

        0 < θk ≤ θmax < 1.        (4.6)

  – Set U = U ∪ {k}.
  – Set UT = UT ∪ {k}.
  If ∆k+1 < ∆tol, then terminate.

Step 5. Advance. Increment k by one and go to Step 1.

Algorithm 4.1: A linearly constrained GSS algorithm

Meanwhile, the mean value theorem says that for some σk ∈ (0, 1),

    f(xk + ∆kd) − f(xk) = ∆k ∇f(xk + σk∆kd)^T d for all d ∈ Gk.

Combining this with (4.10) yields

    0 ≤ ∆k ∇f(xk + σk∆kd)^T d + ρk(∆k) for all d ∈ Gk.

Dividing through by ∆k and subtracting ∇f(xk)^T d from both sides leads to

    −∇f(xk)^T d ≤ ( ∇f(xk + σk∆kd) − ∇f(xk) )^T d + ρk(∆k)/∆k for all d ∈ Gk.

Since 0 < σk < 1 and ‖ d ‖ = 1 for all d ∈ Gk, from (4.7) we obtain

    −∇f(xk)^T d ≤ ω(xk, ∆k) + ρk(∆k)/∆k for all d ∈ Gk.        (4.11)

Since (4.11) holds for all d ∈ Gk, (4.9) tells us that for some d̂ ∈ Gk,

    κmin ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ ω(xk, ∆k) + ρk(∆k)/∆k.

Using (4.4) and (4.5) finishes the proof:

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ (1/κmin) ( ω(xk, ∆k) + α max{ |ftyp|, |f(xk)| } ∆k ).

If ∇f is Lipschitz continuous with constant M on Ω then ω(x, r) ≤ Mr and (4.8) reduces to

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ (1/κmin) ( M + αk ) ∆k.        (4.12)

From (4.12) it is easy to see that the choice of αk in (4.5) yields a bound on the relative size of the projection of −∇f(xk) onto T(xk, εk):

    ‖ [−∇f(xk)]_{T(xk,εk)} ‖ ≤ (1/κmin) ( M + α max{ |ftyp|, |f(xk)| } ) ∆k.

Next we have the following result, which strengthens [11, Theorem 3.4] while also incorporating the ρk(·) we have introduced here as a replacement for the ρ(·) used in [11, 13]. Note the mild conditions placed on φk and ∆k.

Theorem 4.3. Consider the iterates produced by Algorithm 4.1. Suppose that for all k, 1 ≤ φk ≤ φmax and ∆k ≤ ∆max for some ∆max ≥ ∆0. Then either limk→∞ ∆k = 0 or limk→∞ f(xk) = −∞.

Proof. Suppose limk→∞ ∆k ≠ 0. Observe that there must be at least one successful iteration, since otherwise every iteration would be unsuccessful and the updating rule (4.6) would ensure that limk→∞ ∆k = 0. Let k̄ be the first successful iteration.

Let ∆⋆ = lim sup_{k→∞} ∆k. Since limk→∞ ∆k ≠ 0 we know that ∆⋆ > 0. Define

    Q  = { k | k ≥ k̄, ∆k ≥ ∆⋆/2 }   and   Q′ = { k | k ≥ k̄, ∆k ≥ ∆⋆/(2φmax) }.

We claim that there are infinitely many successful iterations in Q′.

To prove the claim, note that Q is infinite. If there are infinitely many successful iterations in Q then there is nothing to prove since Q ⊆ Q′. On the other hand, suppose there are infinitely many unsuccessful iterations in Q. Let k be any such iteration, and let m be such that iteration k − m − 1 was the last successful iteration preceding k (we know there must be such an iteration since k ≥ k̄). Since iteration k − m − 1 was successful and iterations k − m to k were unsuccessful, the update rules tell us that

    ∆k = θk−1 θk−2 ··· θk−m φk−m−1 ∆k−m−1,

so

    ∆k−m−1 = ( θk−1 θk−2 ··· θk−m φk−m−1 )⁻¹ ∆k.

From (4.6), the assumptions placed on φk and ∆k, and the fact that k ∈ Q we obtain

    ∆max ≥ ∆k−m−1 ≥ ( 1/(θmax)^m ) ∆⋆/(2φmax) ≥ ∆⋆/(2φmax).        (4.13)

Since k − m − 1 ≥ k̄, we conclude that k − m − 1 ∈ Q′. Moreover,

    (θmax)^m ≥ ∆⋆ / ( 2 φmax ∆max ).

Using the fact that log θmax < 0 because θmax < 1, we obtain

    m ≤ m̄ = log( ∆⋆ / (2 φmax ∆max) ) / log θmax.

Note that m̄ does not depend on k. Thus, for each of the infinitely many unsuccessful iterations k ∈ Q there is a successful iteration in Q′ that precedes k by no more than m̄ + 1 iterations, so there must be infinitely many successful iterations in Q′.

Next, let ρ⋆ = α | ftyp | ( ∆⋆/(2φmax) )². Then ρ⋆ > 0. If k ∈ Q′ is a successful iteration, it follows from the step acceptance rule (4.3), the definitions (4.4) and (4.5), and the definition of Q′ that

    f(xk+1) − f(xk) < −ρk(∆k) < −ρ⋆ < 0.

Meanwhile, for all other iterations (successful iterations not in Q′ and unsuccessful iterations) we have f(xk+1) ≤ f(xk). Since there are infinitely many successful iterations in Q′, it follows that limk→∞ f(xk) = −∞, and the theorem is proved.

Theorems 4.2 and 4.3 then yield the following.

Theorem 4.4. Let f : R^n → R be continuously differentiable on Ω, and let {xk} be the sequence of iterates produced by the generating set search method in Algorithm 4.1 with εk satisfying εk = ∆k. If {xk} is bounded, or if f is bounded below on Ω and ∇f is uniformly continuous on Ω, then

    lim_{k∈UT, k→∞} ‖ [−∇f(xk)]_{T(xk,εk)} ‖ = 0.

5. Active set identification properties of GSS. It is straightforward to specify Algorithm 4.1 in a way that ensures that the conclusions of Theorems 4.2–4.4 hold. For instance, the usual choice of φk = 1 for all k ensures that ∆k ≤ ∆0 for all k. In this case the choice εmax ≥ ∆0 ensures that εk = ∆k for all k since {∆k} is then monotonically decreasing.
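
For concreteness, here is a minimal sketch (ours, not the authors' implementation) of the main loop of Algorithm 4.1 under these choices: θk = 1/2, φk = 1, and εk = ∆k. The routine cone_generators is a hypothetical user-supplied function that must return unit generators of the ε-tangent cone T(xk, εk) and of the ε-normal cone N(xk, εk); constructing these generators for general linear constraints is the nontrivial step described in [14, section 5] and is not shown. Core directions are tried before the extra directions so that tangential failure is known before any step along a direction in Hk is accepted; the parameter defaults are illustrative.

    import numpy as np

    def gss_linear(f, A, b, x0, cone_generators, delta0=2.0, delta_tol=1e-5,
                   alpha=1e-2, f_typ=1.0, max_evals=1000):
        """Minimal sketch of Algorithm 4.1 with theta_k = 1/2, phi_k = 1, eps_k = Delta_k.
        cone_generators(x, eps) must return (G, H): lists of unit generators of the
        eps-tangent cone T(x, eps) and of the eps-normal cone N(x, eps)."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        x = np.array(x0, dtype=float)
        delta = float(delta0)
        fx = f(x); evals = 1
        S, U, UT = [], [], []        # successful / unsuccessful / tangentially unsuccessful
        k = 0
        while delta >= delta_tol and evals < max_evals:
            G, H = cone_generators(x, delta)             # eps_k = Delta_k
            accepted, core_failed = None, True
            for is_core, d in [(True, g) for g in G] + [(False, h) for h in H]:
                # Ratio test (4.2): longest feasible step in [0, delta] along d.
                slack, rate = b - A @ x, A @ d
                blocking = rate > 1e-15
                step = delta if not blocking.any() else min(
                    delta, float(np.min(slack[blocking] / rate[blocking])))
                f_trial = f(x + step * d); evals += 1
                # Sufficient decrease test (4.3)-(4.5).
                if f_trial < fx - alpha * max(abs(f_typ), abs(fx)) * delta ** 2:
                    if is_core:
                        core_failed = False
                    accepted = (x + step * d, f_trial)
                    break
            if accepted is not None:
                x, fx = accepted
                S.append(k)
                if core_failed:
                    UT.append(k)     # successful, but only via a direction in H_k
            else:
                U.append(k); UT.append(k)
                delta *= 0.5         # theta_k = 1/2 (phi_k = 1 leaves delta unchanged)
            k += 1
        return x, fx, (S, U, UT)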

Theorems 4.2–4.4 then allow us to conclude that the active set identification results of Section 3 hold for the subsequence K = UT of tangentially unsuccessful iterations produced by Algorithm 4.1. Thus, for problem (1.1) GSS has active set identification properties like those of gradient projection—even though GSS does not explicitly rely on derivatives of the objective.

When the sequence of tangentially unsuccessful iterates {xk}, k ∈ UT, converges to a KKT point x⋆, then Corollaries 3.4 and 3.5 allow us to make inferences about A(x⋆), the constraints binding at x⋆, from the working sets I(xk, εk) at the tangentially unsuccessful iterates xk. The even stronger results of Corollary 3.9 hold for the auxiliary sequence x̂k, k ∈ UT, defined in (3.3). In particular, Corollary 3.9 says that in a finite number of iterations x̂k will identify the face containing x⋆ under mild nondegeneracy assumptions.

We next turn to strategies for using the results of Section 3 to accelerate GSS algorithms. We begin by illustrating what can slow progress towards a solution if the search does not make better use of information about the working set I(xk, εk).

In Figure 4.2 the GSS algorithm makes slow and steady progress toward the KKT point. Figure 5.1 shows just how slow the progress might be by illustrating the next three iterations, magnified so that the process is easier to see. Notice the pattern that emerges: so long as the working set contains both the constraints that are active at the solution, no progress is made since there is no search direction that yields feasible improvement. But as soon as the working set drops back down to a single constraint, a feasible step to the East is possible—which is exactly what is needed to move the current iterate toward the solution at the vertex formed by the two constraints. This behavior is especially contrary, since the search fails to make progress at those steps where it actually has information that suggests where the solution lies.

(a) k = 6. New working set (w.r.t. k = 5). Change set of search directions. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(b) k = 7. Same working set. Keep set of search directions. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(c) k = 8. New working set (w.r.t. k = 7). Change set of search directions. Move East; k ∈ S.

Fig. 5.1. A continuation (at a finer scale) of the example given in Figure 4.2.

If k is sufficiently large, then at a tangentially unsuccessful step −∇f(xk) lies primarily in the direction of N(xk, εk). This suggests we include the generators of N(xk, εk) in the set of additional search directions Hk [14], just as illustrated in Figures 4.2–5.1. However, in the preceding example there are no feasible steps along the generators of N(xk, εk) once k > 1. In general, steps along generators of N(xk, εk) usually capture only one constraint in A(x⋆) at a time, as seen in Figure 4.2(b). Typically, active set strategies that change one constraint in the working set at a time are not as efficient as those that allow multiple constraints to change [9].

Our resolution is to make use of x̂k from (3.3). We now outline some active set strategies for GSS that make use of the sequence {x̂k} and its properties described in Corollary 3.9, as well as other information we can infer from the working set I(xk, εk).

5.1. Jumping onto the face identified by the working set. If at iteration k we find that (4.3) is not satisfied by any d ∈ Gk, then the step is tangentially unsuccessful. We then could try the point x̂k defined by (3.3). If this trial point existed and satisfied (4.3), we could accept it. If it was not acceptable, we still could track A(x̂k) as an estimate of A(x⋆). The rationale for the latter is that we may actually have A(x̂k) = A(x⋆) but might not have taken the right step into this face.

A more aggressive strategy, which we have found to be more effective in our tests, is to try the step x̂k for all iterations, tangentially unsuccessful or not. If this trial point exists and satisfies the acceptance criterion (4.3), we immediately accept it. This strategy is consistent with our stationarity results since it is equivalent to including the vector for this step in Hk. In this aggressive strategy we venture that if constraints are showing up in the working set, it may be because they are active at the solution. We can only guarantee that this is true asymptotically, and then only for the subsequence k ∈ UT. But in our tests we found that GSS had surprisingly few iterations that were tangentially unsuccessful (i.e., k ∈ UT) while also being successful (i.e., k ∈ S). In general, S ∩ UT = ∅. Moreover, to ascertain that an iteration is tangentially unsuccessful we must verify that (4.3) is not satisfied for any d ∈ Gk, which requires |Gk| evaluations of the objective. This can be expensive when evaluation of the objective is costly.

On the other hand, determining whether the projection onto a face will be a success requires only one evaluation of the objective. If the step is successful, the work for the iteration is done; otherwise, we fall back on the usual strategy—after the expense of this single speculative objective evaluation. We accept the risk knowing that this simple strategy can be fooled, but confident in the knowledge that asymptotically it will work consistently in our favor.

It is easy to see that the aggressive strategy might not succeed in the initial iterations if we start far from a solution. For instance, consider a problem with only lower and upper bounds on the variables, assume that x⋆ is the vertex where all the upper bounds are active and that x0 is a feasible point near the vertex where all the lower bounds are active. Choose ∆0 so that all of the lower bounds, but none of the upper bounds, are contained in the working set I(x0, ε0). Then at k = 0, the projection will produce the vertex defined by the lower bounds and most likely the objective function at that point will not satisfy (4.3). However, once the upper bounds begin to enter the working set, Corollary 3.9 tells us this strategy will succeed.
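
A sketch of the aggressive strategy (assuming the objective f, the constraint data, and the current working set are available; the helper name and solver choice are ours). It spends one speculative objective evaluation on x̂k from (3.3) and accepts the point only if it passes the sufficient decrease test (4.3):

    import numpy as np
    from scipy.optimize import minimize

    def try_speculative_step(f, A, b, x_k, f_k, working, delta, alpha=1e-2, f_typ=1.0):
        """Section 5.1 strategy: compute x_hat_k from (3.3) and accept it immediately
        if it satisfies (4.3).  Returns (x_new, f_new, accepted)."""
        A = np.asarray(A, dtype=float); b = np.asarray(b, dtype=float)
        x_k = np.asarray(x_k, dtype=float)
        if len(working) == 0:
            return x_k, f_k, False                     # x_hat_k = x_k: nothing new to try
        cons = [{'type': 'ineq', 'fun': lambda x: b - A @ x},
                {'type': 'eq',   'fun': lambda x: A[working] @ x - b[working]}]
        res = minimize(lambda x: 0.5 * np.sum((x - x_k) ** 2), x_k,
                       jac=lambda x: x - x_k, constraints=cons, method='SLSQP')
        if not res.success:
            return x_k, f_k, False                     # the face in (3.3) may be empty
        f_hat = f(res.x)                               # the single speculative evaluation
        if f_hat < f_k - alpha * max(abs(f_typ), abs(f_k)) * delta ** 2:
            return res.x, f_hat, True
        return x_k, f_k, False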

    5.2. Restricting the search to the face identified by the working set.

Corollary 3.9 tells us that a step to x̂k is a step into a face of Ω that contains x⋆ once k is sufficiently large. Once in such a face we have the option of giving priority to searching inside that face. Given an iteration k, let j be the last iteration preceding k that is known to be tangentially unsuccessful. If the current working set I(xk, εk) satisfies I(xk, εk) ⊆ I(xj, εj), then we first search in the face containing xk. If one of these trial steps satisfies (4.3), then we accept it and declare the iteration successful. However, if none of the steps restricted to the face satisfy (4.3), then we continue the search using the directions remaining in Gk and/or Hk, which may possibly move the search off of the current face.

Figure 5.2 illustrates this strategy. Rather than automatically attempting to evaluate f at all four of the trial points defined by ∆k and the set of directions Dk, we start by evaluating the objective at trial steps in the face identified by the single constraint in the working set. So long as one of these two trial points is successful, we do not consider the trial steps defined by the two directions normal to this constraint.

The reasoning here is that, in general, the performance of direct search methods suffers as the number of optimization variables increases. If we can, with reasonable confidence, restrict the search to a face, then the reduced dimension of the problem should make the search more efficient.

Again, this is a calculated risk. If the working set is a reasonable approximation of the active set, then searching first in the face for improvement makes sense. Furthermore, as long as the iterations remain successful, we do not risk premature convergence so long as once we cannot find a successful step in the face we return to a search in the full space.
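
One simple way to realize this restriction (a sketch under our own naming, not the authors' code) is to search first along directions that leave the working-set slacks unchanged, i.e., an orthonormal basis of the null space of the working-set constraint normals together with its negation; a point already on the face then stays on the face:

    import numpy as np
    from scipy.linalg import null_space

    def in_face_directions(A, working):
        """Directions d (and their negatives) with a_i^T d = 0 for every i in the
        working set, so that moving along d does not change the working-set slacks."""
        A = np.asarray(A, dtype=float)
        Z = null_space(A[working]) if len(working) > 0 else np.eye(A.shape[1])
        dirs = [Z[:, j] for j in range(Z.shape[1])]
        return dirs + [-d for d in dirs]

    # Example: with the single working constraint x1 <= 1 in R^2, the in-face
    # directions are +/- e2 (movement along the constraint boundary).
    A = np.array([[1.0, 0.0], [0.0, 1.0]])
    for d in in_face_directions(A, working=[0]):
        print(np.round(d, 6))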


Fig. 5.2. Giving priority to searching along directions inside a face believed to contain x⋆, with H used to denote the steps within the face defined by the constraint in the working set. We hold the third generator for T(xk, εk) in reserve in Gk in case it is needed for the search to progress, though that is not the case for this particular example.

Incidentally, Figure 5.2 illustrates why the working sets at tangentially successful iterations (i.e., k ∉ UT) are not guaranteed to coincide asymptotically with the set of constraints that are active at x⋆. This can also be seen in Figures 4.2–5.1 by considering the subsequence k ∉ UT = {0, 5, 8, . . .}, whereas if we consider the subsequence k ∈ UT = {1, 2, 3, 4, 6, 7, . . .} it is clear that the working sets for this subsequence of iterations correspond to A(x⋆).

5.3. Identifying vertex solutions. Another benefit of computing x̂k is that Corollary 3.9 says that if {xk} is converging to a KKT point x⋆ which is a vertex, then {xk} converges to x⋆ in a finite number of iterations. If we find a successful step to a vertex and this is followed by an unbroken sequence of unsuccessful iterations, then this suggests that we have arrived at a KKT point.

This is attractive since it gives us another mechanism for terminating the search. Various analytical results, of which Theorem 4.2 is an example, bound appropriately chosen measures of stationarity in terms of the step-length control parameter ∆k [6, 15, 13]. This makes the magnitude of ∆k a practical measure of stationarity; hence our use of the test θk∆k < ∆tol, for some ∆tol > 0, for termination in Step 4 in Algorithm 4.1. However, we only decrease ∆ at the end of an unsuccessful iteration. Furthermore, to determine that an iteration is unsuccessful requires verifying that (4.3) is not satisfied for any d ∈ Gk. Experience has shown that while GSS methods make good progress toward the solution, they can be slow and costly in verifying that a solution has been found—precisely because they lack explicit derivative information. In the case of vertex solutions the strategy outlined gives us another stopping rule, one that might require far fewer objective evaluations, as becomes clear when considering the example illustrated in Figure 5.3.
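
A possible realization of this stopping rule (the concrete test and the patience threshold are ours; the paper only suggests the idea): declare a probable vertex solution when the normals of the constraints active at the current iterate span R^n and the iterations since the step that landed there have all been unsuccessful.

    import numpy as np

    def probable_vertex_solution(A, b, x, n_consecutive_unsuccessful,
                                 patience=3, tol=1e-10):
        """Heuristic stopping test sketched in Section 5.3: x sits at a vertex of
        Omega (the normals of the constraints active at x span R^n) and the last
        `patience` iterations were all unsuccessful."""
        A = np.asarray(A, dtype=float)
        active = np.abs(np.asarray(b, dtype=float) - A @ x) <= tol
        at_vertex = active.any() and np.linalg.matrix_rank(A[active]) == A.shape[1]
        return at_vertex and n_consecutive_unsuccessful >= patience

    # Example: the vertex (1, 1) of x1 <= 1, x2 <= 1 after four unsuccessful iterations.
    A = np.array([[1.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])
    print(probable_vertex_solution(A, b, np.array([1.0, 1.0]), 4))   # True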


5.4. Illustration of the active set strategies. Returning to the example used in Figure 4.2, Figure 5.3 illustrates what happens if the strategies in Sections 5.1–5.3 are incorporated into Algorithm 4.1. We demonstrate in the next section that these same strategies can be successfully applied to far more difficult problems from the CUTEr test suite.

(a) k = 0. New working set. Try but reject the speculative step x̂k. Move East; k ∈ S.
(b) k = 1. New working set. Try and accept the speculative step x̂k; k ∈ S and k ∈ UT.
(c) k = 2. Same working set. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(d) k = 3. Same working set. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(e) k = 4. Same working set. No decrease; halve ∆k; k ∈ U and k ∈ UT.
(f) k = 5. Same working set. No decrease; halve ∆k; k ∈ U and k ∈ UT. Possibly stop.

Fig. 5.3. The effect of employing the strategies in Sections 5.1–5.3 on the example used in Figure 4.2. The speculative step x̂k defined in (3.3) is denoted by its own marker. For any k > 1, Corollary 3.9 supports the conclusion for this example that xk = x̂k = x⋆ since xk is at a vertex defined by the constraints in I(xk, εk), I(xk, εk) remains the same, and k ∈ UT.

6. Numerical illustration. We demonstrate the effect of using the active set strategies outlined in Section 5 on eight problems drawn from the CUTEr test suite [10]. Basic characteristics of these eight problems are given in Table 6.1. Seven of the eight problems must deal with degeneracy of the linear constraints in the working set at least once during the course of the search; to do so we used the basic strategy outlined in [14]. We included the problem bleachng, even though it only has bounds on the variables, because it is representative of the type of simulation-based problems to which direct search methods are typically applied [11]. In the case of bleachng, the evaluation of the objective involves the numerical integration of differential equations. The second derivatives of the objective are not available, computing the objective is relatively expensive, and the values obtained are only moderately accurate.

    problem    n    network eq.  linear eq.  linear ineq.  lower bounds  upper bounds  CUTEr classification
    avion2     49   0            15          0             49            49            OLR2-RN-49-15
    bleachng   17   0            0           0             9             8             SBR1-RN-17-0
    dallass    46   31           0           0             46            46            ONR2-MN-46-31
    dallasm    196  151          0           0             196           196           ONR2-MN-196-151
    himmelbi   100  0            0           12            100           100           OLR2-MN-100-12
    loadbal    31   11           0           20            31            11            OLR2-MN-31-31
    spanhyd    97   33           16          0             97            97            ONR2-RN-97-33
    water      31   10           0           0             31            31            ONR2-MN-31-10

    Table 6.1. Summary of problem characteristics

6.1. Starting values and tolerances for the results reported here. For all eight test problems, we started with the value of x0 specified by CUTEr, though we often had to project into the feasible region, as discussed in [14, section 8.1]. The objective values at the feasible points used to initiate the search are given in Table 6.2. The optimal feasible value of the objective function given in Table 6.2 was either extracted from the CUTEr file, when that information is included, or was defined as the best objective value found using snopt [8], when that information is not included in the CUTEr file. An exception was made for bleachng since the low solution reported in the CUTEr file is almost twice the value of the best solutions found by both our method and snopt; the value reported by snopt is treated here as optimal. It is worth noting that on bleachng, snopt terminates due to numerical difficulties, indicating that the current point cannot be improved upon; we suspect that at some point the derivatives are not sufficiently accurate due to the odessa ODE solver used to evaluate the objective.

    problem    f(x0), x0 ∈ Ω    f(x∗), x∗ ∈ Ω    max evals.
    avion2      9.46803e+07      9.46801e+07      1500
    bleachng    1.59238e+04      9.17676e+03       540
    dallass     1.24987e+07     -3.2393e+04       1410
    dallasm     6.19530e+07     -4.81981e+04      5910
    himmelbi   -7.84832e+02     -1.73557e+03      3030
    loadbal     1.54669e+00      4.52851e-01       960
    spanhyd     2.26888e+10      2.39738e+02      2940
    water       1.71709e+04      1.05494e+04       960

    Table 6.2. Summary of starting values

When both lower and upper bounds are given for all the variables in the problem, we scale the variables to lie between −1 and 1 as described in [14, section 8.4]. For linearly constrained GSS algorithms, an advantage of scaling is that it makes it easier to define the reasonable default value ∆0 = 2 since 2 is the longest possible feasible step along a unit coordinate direction. Thus we used ∆0 = 2 as our default, even when we had no prior knowledge of reasonable ranges for the variables and thus did not scale the problem.

We terminated either when the value of ∆k fell below ∆tol = 1.0e-05 or when the number of evaluations of the objective reached that which would be needed, at a minimum, for 30 iterations of a higher-order nonlinear programming method (such as snopt) using forward-difference estimates of the gradient; these limits are given in Table 6.2. The specific value chosen for ∆tol is an acknowledgment of the fact that there are no more than seven significant decimal digits in the specification of the problems drawn from the CUTEr test suite (in fact, the data for several of the problems used here are specified to only two or three decimal digits). We used a factor of 1/2 to reduce ∆k after an unsuccessful iteration; we left ∆k unchanged after a successful iteration, meaning that θk = 1/2 and φk = 1 for all k. Thus the sequence {∆k} was monotonically decreasing. This, together with the choices of ∆0 and ∆tol, meant that ∆k could be reduced at most eighteen times (since ⌈log₂(∆0/∆tol)⌉ = ⌈log₂(2 × 10⁵)⌉ = 18). The second stopping criterion imposes a fairly stringent limit on the number of evaluations of the objective. Our implementation employs a caching scheme, described in [14, section 8.5], to avoid reevaluating the objective at points for which the objective has already been computed. Each run reported here starts with the cache empty and only increments the number of evaluations of the objective once for evaluating the objective at any given point; there is no increment if in subsequent iterations the value of the objective at that point is found in the cache.

We set εmax = 25∆0 so that it played no role in the results reported here. We set α = 1.0e-04 and ftyp = 1. When the problem looked unconstrained locally (i.e., the working set was empty), coordinate search was used. Otherwise, the core set of search directions Gk consisted of generators for the ε-tangent cone T(xk, εk). A description of the construction of the set Gk in the presence of linear constraints can be found in [14, section 5]. The set Hk consisted of generators for the ε-normal cone N(xk, εk). The generators for both T(xk, εk) and N(xk, εk) were normalized. In addition, whenever I(xk, εk) was not empty, the set Hk contained the vector from xk to the point x̂k defined by (3.3).
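The general construction of Gk for arbitrary linear constraints is the one given in [14, section 5]. As intuition for what Gk contains, the following sketch computes generators of the ε-tangent cone for the special case of bound constraints only, where a bound within distance ε of xk is treated as ε-active and the coordinate direction pointing toward that bound is excluded; the function name and interface are illustrative assumptions.

    import numpy as np

    def tangent_cone_generators_for_bounds(x, lower, upper, eps):
        """Generators of T(x, eps) when the only constraints are lower <= x <= upper."""
        n = len(x)
        generators = []
        for i in range(n):
            if upper[i] - x[i] > eps:
                # Upper bound not eps-active: +e_i remains a generator.
                d = np.zeros(n)
                d[i] = 1.0
                generators.append(d)
            if x[i] - lower[i] > eps:
                # Lower bound not eps-active: -e_i remains a generator.
                d = np.zeros(n)
                d[i] = -1.0
                generators.append(d)
        return generators  # coordinate directions are already unit vectors

    # Example: with eps = 0.1 the first variable sits on its lower bound, so -e_1
    # is excluded while +e_1 and both directions +/- e_2 are retained.
    gens = tangent_cone_generators_for_bounds(
        np.array([0.0, 0.5]), np.array([0.0, 0.0]), np.array([1.0, 1.0]), 0.1)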

The tests were run on an Apple MacBook with a 2 GHz Intel Core 2 Duo processor and 1 GB of memory, running Mac OS X Version 10.4.10 and using Matlab 7.4.0 (R2007a). With the exception of bleachng, all the runs reported here took no more than one minute to execute, even with graphical and printed output to the screen during execution. The elapsed times for bleachng were around three minutes since invoking odessa appreciably increases the cost of an objective evaluation.

6.2. The results. Below we include a summary, along with illustrations, of the iteration history for each problem. We start with a summary of the solutions obtained, given in Tables 6.3–6.6. Under the assumption that computation of a single objective value was expensive, and thus only a limited number of function evaluations would be possible in practical settings, the active set strategies were designed with two goals in mind: to obtain better solutions (i.e., by obtaining as large as possible a fraction of optimal improvement) and to identify potential solutions more quickly (i.e., by driving ∆k below ∆tol more quickly). The value fsol was the best objective value found by our method given ∆tol and the limit on the number of objective evaluations allowed. The value f0 was the value at the initial (feasible) point at which the search commenced. The value fopt was the optimal value, as defined in Section 6.1. The fraction of optimal improvement was then defined as

    |f0 − fsol| / |f0 − fopt|.
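For example, for himmelbi with the active set strategies enabled, Tables 6.2 and 6.3 give f0 = -7.84832e+02, fopt = -1.73557e+03, and fsol = -1.73488e+03, so the fraction of optimal improvement is approximately 950.05/950.74 ≈ 0.9993, matching the corresponding entry in Table 6.3 (0.99928) up to the rounding of the tabulated values.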

Table 6.3 makes clear that the active set strategies, when employed, obtained fractions of optimal improvement that were at least comparable to, and generally better than, those obtained without employing the active set strategies.

The results in Table 6.4 make clear that the active set strategies may help identify solutions more quickly by driving ∆k below ∆tol sooner.


problem      active set?   objective value at termination   fraction of opt. improvement
avion2       no             9.46801e+07                      0.99983
avion2       yes            9.46801e+07                      0.99999
bleachng     no             9.17676e+03                      1.00000
bleachng     yes            9.17676e+03                      1.00000
dallass      no            -3.22905e+04                      0.99999
dallass      yes           -3.22406e+04                      0.99999
dallasm      no            -4.81525e+04                      0.99999
dallasm      yes           -4.81311e+04                      0.99999
himmelbi     no            -1.72387e+03                      0.98770
himmelbi     yes           -1.73488e+03                      0.99928
loadbal      no             4.52897e-01                      0.99996
loadbal      yes            4.52851e-01                      1.00000
spanhyd      no             2.39738e+02                      1.00000
spanhyd      yes            2.39739e+02                      1.00000
water        no             1.05992e+04                      0.99247
water        yes            1.05494e+04                      1.00000

Table 6.3. Summary of solutions.

In four of the instances, the run using the active set strategies terminated with ∆k < ∆tol, which occurred for only two of the runs that did not employ the active set strategies. Furthermore, even when the runs using the active set strategies terminated because the limit on the number of objective evaluations was reached, the value of ∆k obtained using the active set strategies was at least as low as, if not lower than, the value obtained when the active set strategies were not employed.

problem      active set?   reason terminated   ∆k at termination   reductions of ∆k   # of obj. evals.   cache hits
avion2       no            max evals.          1.220703e-04        14                 1500                882
avion2       yes           ∆k < ∆tol           7.629395e-06        18                  852                560
bleachng     no            ∆k < ∆tol           7.629395e-06        18                  368                195
bleachng     yes           ∆k < ∆tol           7.629395e-06        18                  329                115
dallass      no            max evals.          1.562500e-02         7                 1410                273
dallass      yes           max evals.          7.812500e-03         8                 1410                275
dallasm      no            max evals.          7.812500e-03         8                 5910                931
dallasm      yes           max evals.          7.812500e-03         8                 5910                701
himmelbi     no            max evals.          7.812500e-03         8                 3030               1288
himmelbi     yes           max evals.          7.812500e-03         8                 3030               1187
loadbal      no            max evals.          3.125000e-02         6                  960                466
loadbal      yes           max evals.          3.051758e-05        16                  960                357
spanhyd      no            ∆k < ∆tol           7.629395e-06        18                 1110               2023
spanhyd      yes           ∆k < ∆tol           7.629395e-06        18                 1121               1519
water        no            max evals.          7.812500e-03         8                  960                899
water        yes           ∆k < ∆tol           7.629395e-06        18                  868                585

Table 6.4. Summary of state at termination.

We now illustrate the iteration histories for each of the eight test problems. For each, we include two graphs, the legends for which are given in Figure 6.1. The first pair of graphs shows the progress made by the algorithm, with and without the active set strategies employed, along with the reduction in ∆k relative to the number of objective evaluations. In every case, at least one of the two active set steps, either a step to the face identified by the working set of constraints or a step within the face identified by the working set, was accepted during the course of the search.

Fig. 6.1. Legend for the remaining figures. Each plot shows the value of the objective (left axis) and the value of ∆ (right axis) against the number of objective evaluations, with markers indicating:
    f(xk), k ∈ S; accepted step x̂k to the face identified by I(xk, εk)
  H f(xk), k ∈ S; accepted step within the face identified by I(xk, 0)
  ◦ f(xk), k ∈ S; accepted regular search step
  X f(xk), k ∈ U
  △ magnitude of ∆k

Fig. 6.2. Progress on avion2. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

Fig. 6.3. Progress on bleachng. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

By way of comparison, we show how our results compare with those found using snopt. The runs from snopt were made using the finite-difference option so that the count of objective evaluations can be compared. As can be seen in Figures 6.10–6.13, in all cases our algorithm exhibits the good global behavior one would expect from a gradient-related method and is, in fact, competitive with snopt in this regard.

Fig. 6.4. Progress on dallass. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

Fig. 6.5. Progress on dallasm. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

7. Conclusions. We have presented some general results on active set identification and shown that, as a consequence, the generating set search class of direct search algorithms possesses active set identification properties that are as strong as those of the gradient projection method. This is the case even though generating set search does not have explicit recourse to derivatives.

Our general analytical results show that the working sets of constraints at the standard sequence of iterates produced by generating set search will, under relatively mild assumptions, identify those constraints active at a KKT point for which there exist nonzero Lagrange multipliers. The general results also show how a straightforward modification of generating set search will enjoy active set identification properties comparable to those of the gradient projection algorithm.

We have used these ideas on active set identification to modify our implementation of GSS for solving (1.1) and thus accelerate the progress of the search. This claim is substantiated with results obtained by applying the new variant of GSS for linearly constrained problems to a subset of difficult problems from the CUTEr test suite.

We plan to extend this approach to handle problems with nonlinear constraints for which derivatives can be obtained. Under appropriate constraint qualifications, similar active set results should hold for GSS methods, provided sufficiently accurate gradients are available to allow the linearization of the nonlinear constraints, as in the approaches considered in [16]. In addition, the active set identification results may also be useful in conjunction with the treatment of inequality constraints via nonnegative slacks, as in the augmented Lagrangian approaches described in [15, 12].

Fig. 6.6. Progress on himmelbi. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

Fig. 6.7. Progress on loadbal. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

Acknowledgments. The authors are deeply grateful to Philip Gill for making a copy of snopt available for use in the numerical tests reported in this paper.

REFERENCES

[1] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19 (1965), pp. 577–593.
[2] J. Burke, On the identification of active constraints II: The nonconvex case, SIAM J. Numer. Anal., 27 (1990), pp. 1081–1102.
[3] J. V. Burke and J. J. Moré, On the identification of active constraints, SIAM J. Numer. Anal., 25 (1988), pp. 1197–1211.
[4] J. V. Burke and J. J. Moré, Exposing constraints, SIAM J. Optim., 4 (1994), pp. 573–595.
[5] P. H. Calamai and J. J. Moré, Projected gradient methods for linearly constrained problems, Mathematical Programming, 39 (1987), pp. 93–116.
[6] E. D. Dolan, R. M. Lewis, and V. J. Torczon, On the local convergence properties of pattern search, SIAM J. Optim., 14 (2003), pp. 567–583.
[7] F. Facchinei, A. Fischer, and C. Kanzow, On the accurate identification of active constraints, SIAM J. Optim., 9 (1998), pp. 14–32.
[8] P. E. Gill, W. Murray, and M. A. Saunders, SNOPT: An SQP algorithm for large-scale constrained optimization, SIAM J. Optim., 12 (2002), pp. 979–1006. See also http://www.sbsi-sol-optimize.com/asp/sol_product_snopt.htm.
[9] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization, Academic Press, London, 1981.

[10] N. I. M. Gould, D. Orban, and P. L. Toint, CUTEr (and SifDec), a constrained and unconstrained testing environment, revisited, ACM Trans. Math. Software, 29 (2003), pp. 373–394. See also http://cuter.rl.ac.uk/cuter-www/.
[11] T. G. Kolda, R. M. Lewis, and V. Torczon, Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review, 45 (2003), pp. 385–482.
[12] T. G. Kolda, R. M. Lewis, and V. Torczon, A generating set direct search augmented Lagrangian algorithm for optimization with a combination of general and linear constraints, Tech. Rep. 2006-5315, Sandia National Laboratories, Albuquerque, NM 87185 and Livermore, CA 94550, August 2006.
[13] T. G. Kolda, R. M. Lewis, and V. Torczon, Stationarity results for generating set search for linearly constrained optimization, SIAM J. Optim., 17 (2006), pp. 943–968.
[14] R. M. Lewis, A. Shepherd, and V. Torczon, Implementing generating set search methods for linearly constrained minimization, SIAM J. Sci. Comput., 29 (2007), pp. 2507–2530.
[15] R. M. Lewis and V. Torczon, A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds, SIAM J. Optim., 12 (2002), pp. 1075–1089.
[16] S. Lucidi, M. Sciandrone, and P. Tseng, Objective-derivative-free methods for constrained optimization, Mathematical Programming, 92 (2002), pp. 37–59.
[17] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software, 7 (1981), pp. 17–41.
[18] C. Oberlin and S. J. Wright, Active set identification in nonlinear programming, SIAM J. Optim., 17 (2006), pp. 577–605.

Fig. 6.8. Progress on spanhyd. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.

Fig. 6.9. Progress on water. On the left, without the active set strategies enabled. On the right, with the active set strategies enabled.


             active      maximum violation at solution
problem      set?        lower bounds      upper bounds
avion2       no          0.00000e+00       0.00000e+00
avion2       yes         1.77636e-15       0.00000e+00
bleachng     no          0.00000e+00       0.00000e+00
bleachng     yes         0.00000e+00       0.00000e+00
dallass      no          0.00000e+00       0.00000e+00
dallass      yes         0.00000e+00       0.00000e+00
dallasm      no          3.55271e-14       0.00000e+00
dallasm      yes         1.77636e-14       0.00000e+00
himmelbi     no          3.41061e-13       0.00000e+00
himmelbi     yes         2.67164e-12       0.00000e+00
loadbal      no          3.27056e-15       0.00000e+00
loadbal      yes         1.67351e-14       0.00000e+00
spanhyd      no          1.42109e-14       0.00000e+00
spanhyd      yes         0.00000e+00       0.00000e+00
water        no          9.09495e-13       0.00000e+00
water        yes         5.68434e-13       0.00000e+00

Table 6.5. Summary of feasibility of simple bounds at termination.

             active      maximum violation at solution
problem      set?        linear lower bounds      linear upper bounds
avion2       no          3.63798e-12              4.54747e-13
avion2       yes         2.54659e-11              3.92220e-12
dallass      no          2.90323e-13              2.93099e-13
dallass      yes         1.56319e-13              9.76996e-14
dallasm      no          4.17028e-13              3.62377e-13
dallasm      yes         2.23821e-13              1.77636e-13
himmelbi     no          0.00000e+00              2.67164e-12
himmelbi     yes         1.70530e-13              5.57066e-12
loadbal      no          1.77636e-14              1.13687e-13
loadbal      yes         2.84217e-14              8.52651e-14
spanhyd      no          2.16716e-13              7.74492e-13
spanhyd      yes         2.52243e-13              5.45342e-13
water        no          2.27374e-13              2.27374e-13
water        yes         2.16005e-12              1.08002e-12

Table 6.6. Summary of feasibility of linear constraints at termination.

Fig. 6.10. Progress on avion2, on the left, and bleachng, on the right, with the active set strategies enabled (◦) vs. progress for snopt with finite-differencing.


Fig. 6.11. Progress on dallass, on the left, and dallasm, on the right, with the active set strategies enabled (◦) vs. progress for snopt with finite-differencing.

Fig. 6.12. Progress on himmelbi, on the left, and loadbal, on the right, with the active set strategies enabled (◦) vs. progress for snopt with finite-differencing.

Fig. 6.13. Progress on spanhyd, on the left, and water, on the right, with the active set strategies enabled (◦) vs. progress for snopt with finite-differencing.
