Optimization in Engineering Design
Georgia Institute of Technology, Systems Realization Laboratory

Primal Methods


Page 1: Primal Methods

Page 2: Primal Methods

• By a primal method of solution we mean a search method that works on the original problem directly by searching through the feasible region for the optimal solution.

– Methods that work on an approximation of the original problem are often referred to as “Transformation Methods”

• Each point in the process is feasible (theoretically) and the value of the objective function decreases monotonically.

• Given n variables and m constraints, primal methods can be devised that work in spaces of dimension n-m, n, m, or n+m. In other words, a large variety exists.

Page 3: Advantages of Primal Methods

Primal methods possess three significant advantages (Luenberger):

1) Since each point generated in the search process is feasible, if the process is terminated before reaching the solution, the terminating point is feasible. Thus, the final point is feasible and probably nearly optimal.

2) Often it can be guaranteed that if they generate a convergent sequence, then the limit point of that sequence must be at least a local constrained minimum.

3) Most primal methods do not rely on a special problem structure, such as convexity, and hence these methods are applicable to general nonlinear programming problems.

• Furthermore, their convergence rates are competitive with other methods, and particularly for linear constraints, they are often among the most efficient.

Page 4: Disadvantages of Primal Methods

Primal methods are not without disadvantages:

• They require a (Phase I) procedure to obtain an initial feasible point.

• They are all plagued, particularly for problems with nonlinear constraints, with computational difficulties arising from the necessity to remain within the feasible region as the method progresses.

• Some methods can fail to converge for problems with inequality constraints (!) unless elaborate precautions are taken.

Page 5: Some Typical Primal Algorithm Classes

The following classes of algorithms are typically noted under primal methods:

• Feasible direction methods, which search only along directions that maintain feasibility

– Zoutendijk’s Feasible Direction method

• Active set methods which partition inequality constraints into two groups of active and inactive constraints. Constraints treated as inactive are essentially ignored.

– Gradient projection methods which project the negative gradient of the objective onto the constraint surface.

– (Generalized) reduced gradient methods which partition the problem variables into basic and non-basic variables.

Page 6: Active Sets

Page 7: Dividing the Constraint Set

• Constrained optimization can be made much more efficient if you know which constraints are active and which are inactive.

• Mathematically, active constraints always hold as equalities at the current point (!)

• Only considering the active constraints leads to a family of constrained optimization algorithms that can be classified as “active set methods”

Page 8: Active Set Methods

• The idea underlying active set methods is to partition inequality constraints into two groups:
   – those that are active and
   – those that are inactive.

• The constraints treated as inactive are essentially ignored.

• Clearly, if the active constraints (for the solution) were known, then the original problem could be replaced by a corresponding problem having equality constraints only.

• Alternatively, suppose we guess an active set and solve the equality-constrained problem. If all constraints and optimality conditions are then satisfied, we have found the correct solution.

Page 9: Basic Active Set Method

• The idea behind active set methods is to define, at each step of the algorithm, a set of constraints, termed the working set, that is to be treated as the active set.

• Active set methods consist of two components:
   1) determine a current working set that is a subset of the active set,
   2) move on the surface defined by the working set (often referred to as the working surface) to an improved solution.

• The direction of movement is generally determined by first- or second-order approximations of the functions.

Page 10: Basic Active Set Algorithm

The basic active set algorithm is as follows (a code sketch appears after the examples below):

1. Start with a given working set and begin minimizing over the corresponding working surface.
2. If new constraint boundaries are encountered, they may be added to the working set, but no constraints are dropped from the working set.
3. Eventually, a point is obtained that minimizes the objective function with respect to the current working set of constraints.
4. For this point, optimality criteria are checked, and if it is deemed "optimal", the solution has been found.
5. Otherwise, one or more inactive constraints are dropped from the working set and the whole procedure is restarted with this new working set.

• Many variations are possible
• Specific examples:
   – Gradient Projection algorithm
   – (Generalized) Reduced Gradient algorithm
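To make the five steps concrete, here is a minimal NumPy sketch for the special case min 0.5||x - c||^2 s.t. Ax ≤ b, where each working-surface minimization reduces to a single linear (KKT) solve. The helper names, the example data, and the simple one-at-a-time add/drop rule are illustrative assumptions, not the textbook's method; a production solver would add step-length control and anti-cycling safeguards.

```python
import numpy as np

def solve_eq_qp(c, A_w, b_w):
    """Minimize 0.5*||x - c||^2 subject to A_w x = b_w via the KKT system.
    Returns the minimizer x and the Lagrange multipliers lam."""
    n = c.size
    m = A_w.shape[0]
    if m == 0:
        return c.copy(), np.empty(0)          # unconstrained minimizer is c
    K = np.block([[np.eye(n), A_w.T],
                  [A_w, np.zeros((m, m))]])   # KKT matrix for this objective
    sol = np.linalg.solve(K, np.concatenate([c, b_w]))
    return sol[:n], sol[n:]

def active_set_qp(c, A, b, tol=1e-9, max_iter=100):
    """Steps 1-5 above for min 0.5*||x - c||^2 s.t. A x <= b."""
    work = []                                  # current working set (indices)
    for _ in range(max_iter):
        x, lam = solve_eq_qp(c, A[work], b[work])
        viol = np.where(A @ x - b > tol)[0]    # step 2: boundaries encountered
        if viol.size:
            work.append(int(viol[0]))          # add one violated constraint
            continue
        if lam.size == 0 or lam.min() >= -tol: # step 4: multiplier sign test
            return x, work                     # deemed optimal
        work.pop(int(np.argmin(lam)))          # step 5: drop one and restart
    raise RuntimeError("did not converge")

# Example: project the point (2, 2) onto the box x1 <= 1, x2 <= 1.
A = np.eye(2); b = np.ones(2)
x, work = active_set_qp(np.array([2.0, 2.0]), A, b)
print(x, work)   # x ~ (1, 1) with both constraints in the working set
```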

Page 11: Some Problems with Active Set Methods

• Determining precisely which constraints are active (to within numerical accuracy) can cause some problems.

• Also, the calculation of the Lagrange multipliers may not be accurate when we are just slightly off the exact optimum.

• In practice, constraints are dropped from the working set using various criteria before an exact minimum on the working surface is found.

• For many algorithms, convergence cannot be guaranteed and jamming may occur in (very) rare cases.

• Active set methods with various refinements are often very effective.

Page 12: Feasible Direction Methods

Page 13: Basic Algorithm

Each iteration in a feasible direction method consists of 1) selecting a feasible descent direction and 2) performing a constrained line search along it.

Page 14: (Simplified) Zoutendijk Method

One of the earliest proposals for a feasible direction method uses a linear programming subproblem. Consider

   min f(x)
   subject to  ai^T x ≤ bi,  i = 1, ..., m

Given a feasible point xk, let I be the set of indices representing active constraints, that is, ai^T xk = bi for i ∈ I.

The direction vector dk is then chosen as the solution to the linear program

   minimize  ∇f(xk)^T d
   subject to  ai^T d ≤ 0,  i ∈ I
               -1 ≤ di ≤ 1,  i = 1, ..., n   (normalizing constraint)

where d = (d1, d2, ..., dn).

The constraints assure that vectors of the form xk + αd will be feasible for sufficiently small α > 0, and subject to these conditions, d is chosen to line up as closely as possible with the negative gradient of f. This results in the locally best direction in which to proceed.

The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.

Page 15: Feasible Descent Directions

Basic problem:

   Min f(x)
   subject to  gi(x) ≤ 0,  i = 1, ..., m

• Now think of a direction vector d that is both descending and feasible:

   – "descent direction" (= reducing f(x))
   – "feasible direction" (= reducing g(x) = increasing feasibility)

• If d reduces f(x), then the following holds: ∇f(x)^T d < 0
• If d increases feasibility of gi(x), then the following holds: ∇gi(x)^T d < 0

• Given that you know d, you now need to know how far to go along d:

   – xk+1 = xk + αk dk

Page 16: Finding the Direction Vector (an LP Problem)

• The following condition expresses the value of εk:

   εk = max {∇f(x)^T d , ∇gj(x)^T d for each j ∈ I}

   where I is the set of active constraints.

• Note that εk < 0 MUST hold if you want both a reduction in f(x) and an increase in feasibility (remember g(x) ≤ 0, thus lower g(x) is better).

• The best εk is the lowest-valued (most negative) εk, thus the problem now becomes:

   minimize  ε
   subject to  ∇f(x)^T d ≤ ε
               ∇gj(x)^T d ≤ ε  for each j ∈ I
               -1 ≤ di ≤ 1,  i = 1, ..., n

• This linear programming problem now has n+1 variables (the n elements of vector d plus the scalar ε); see the sketch below.
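As a sanity check of this LP, here is a small sketch using scipy.optimize.linprog (an assumed dependency; the function name and example gradients are illustrative). The LP variables are (d, ε), with ε left free:

```python
import numpy as np
from scipy.optimize import linprog

def direction_lp(grad_f, active_grads):
    """Solve: minimize eps  s.t.  grad_f^T d <= eps,  grad_gj^T d <= eps (j in I),
    -1 <= d_i <= 1.  Variables are (d, eps); eps < 0 signals a usable direction."""
    n = grad_f.size
    rows = np.vstack([grad_f] + list(active_grads))
    A_ub = np.hstack([rows, -np.ones((rows.shape[0], 1))])  # move eps to the left
    cost = np.append(np.zeros(n), 1.0)                      # objective: minimize eps
    res = linprog(cost, A_ub=A_ub, b_ub=np.zeros(rows.shape[0]),
                  bounds=[(-1.0, 1.0)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

# Example: grad f = (1, 1); one active constraint g with grad g = (-1, 0).
d, eps = direction_lp(np.array([1.0, 1.0]), [np.array([-1.0, 0.0])])
print(d, eps)   # eps ~ -0.5 < 0: d both descends and increases feasibility
```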

Page 17: Next Step: Constrained Line Search

• The idea behind feasible direction methods is to take steps through the feasible region of the form

   xk+1 = xk + αk dk

   where dk is a direction vector and αk is a nonnegative scalar.

• Given that we have dk, next we need to know how far to move along dk.

• The scalar αk is chosen to minimize the objective function f with the restriction that the point xk+1 and the line segment joining xk and xk+1 be feasible.

• IMPORTANT: Note that while moving along dk, we may encounter constraints that were inactive, but can now become active.

• Thus, we do need to do a constrained line search to find the maximum step size.

• Approach in textbook (sketched in code below):
   – Determine the maximum step size based on the bounds of the variables.
   – If all constraints are feasible at the variable bounds, take this maximum step size as the step size.
   – Otherwise, search along dk until you find the first constraint that causes infeasibility.
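A minimal sketch of such a search for the linear-constraint case, where the maximum feasible step has a closed form; nonlinear constraints would need the search described in the last bullet. Function names and the example box constraints are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def constrained_line_search(f, x, d, A, b):
    """Move from feasible x along d: cap the step at the first constraint
    boundary a_i^T x <= b_i that would be hit, then minimize f on [0, alpha_max]."""
    Ad = A @ d
    slack = b - A @ x                       # nonnegative at a feasible x
    blocking = Ad > 1e-12                   # constraints we are moving toward
    alpha_max = np.min(slack[blocking] / Ad[blocking]) if blocking.any() else 1e6
    res = minimize_scalar(lambda a: f(x + a * d), bounds=(0.0, alpha_max),
                          method="bounded")
    return x + res.x * d, res.x

# Example: f(x) = (x1-2)^2 + (x2-2)^2, feasible region x <= 1 (componentwise).
A = np.eye(2); b = np.ones(2)
x_new, alpha = constrained_line_search(lambda x: np.sum((x - 2.0) ** 2),
                                       np.zeros(2), np.array([1.0, 1.0]), A, b)
print(x_new, alpha)   # step is capped at the boundary: x_new ~ (1, 1)
```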

Page 18: Major Shortcomings

Two major shortcomings of feasible direction methods require modification of the methods in most cases:

1) For general problems, there may not exist any feasible direction. (example??) In such cases, either
   • relax the definition of feasibility or allow points to deviate, or
   • introduce the concept of moving along curves rather than straight lines.

2) Feasible direction methods can be subject to jamming, a.k.a. zigzagging; that is, the iterates do not converge to a constrained local minimum. In Zoutendijk's method, this can be caused by the fact that the subproblem for finding a feasible direction changes abruptly whenever another constraint becomes active.

Page 19: Gradient Projection Methods

Page 20: Basic Problem Formulation

• Gradient projection started from the nonlinear optimization problem with linear constraints:

   Min f(x)
   s.t.  ar^T x ≤ br   (inequality constraints)
         as^T x = bs   (equality constraints)

Page 21: Gradient Projection Methods

• The gradient projection method is motivated by the ordinary method of steepest descent for unconstrained problems.

• Fundamental Concept: The negative gradient of the objective function is projected on the working surface (subset of active constraints) in order to define the direction of movement.

• Major task is to calculate projection matrix (P) and subsequent feasible direction vector d.

Page 22: Feasible Direction Vector and Projection Matrix

At the feasible point x we seek a feasible direction d satisfying ∇f(x)^T d < 0, so that movement in the direction d will cause a decrease in the function f.

Initially, we consider directions satisfying ai^T d = 0, so that all working constraints remain active. This is the same as requiring that the direction vector d lie in the tangent subspace defined by the working set.

In gradient projection methods, the particular direction used is the projection of the negative gradient of the objective function on this working set subspace.

After some math, the following expression for this direction can be found:

   dk = –[I – Aq^T (Aq Aq^T)^(-1) Aq] gk = –Pk gk

where
   gk is the gradient of the objective function,
   Aq is a matrix composed of the rows of the (linear) working constraints, and
   Pk = [I – Aq^T (Aq Aq^T)^(-1) Aq] is the so-called projection matrix.
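A small NumPy sketch of this projection formula; the example gradient and constraint row are illustrative assumptions:

```python
import numpy as np

def projected_direction(g, Aq):
    """d = -[I - Aq^T (Aq Aq^T)^{-1} Aq] g : the negative gradient projected
    onto the null space of the working constraints Aq (assumed full row rank)."""
    n = g.size
    P = np.eye(n) - Aq.T @ np.linalg.solve(Aq @ Aq.T, Aq)   # projection matrix
    return -P @ g

# Example: objective gradient g at x, one working constraint x1 + x2 = const.
Aq = np.array([[1.0, 1.0]])
d = projected_direction(np.array([1.0, 3.0]), Aq)
print(d)        # [ 1., -1.]: a descent direction lying in the constraint plane
print(Aq @ d)   # ~0, so the working constraint stays active along d
```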

Page 23: Nonlinear Constraints

For the general case of

   min f(x)
   s.t.  h(x) = 0
         g(x) ≤ 0

the basic idea is that at a feasible point xk one determines the active constraints and projects the negative gradient of ƒ onto the subspace tangent to the surface determined by these constraints.

This vector (if nonzero) determines the direction for the next step.

However, this vector is in general not a feasible direction since the working surface may be curved. Therefore, it may not be possible to move along this projected negative gradient to obtain the next point.

Page 24: Overcoming Curvature Difficulties

What is typically done to overcome the problem of curvature and loss of feasibility is to search along a curve on the constraint surface, the direction of the search being defined by the projected negative gradient.

A new point is found as follows (see the sketch below):

• First, a move is made along the projected negative gradient to a point y.

• Then a move is made in the direction perpendicular to the tangent plane at the original point, to a nearby feasible point on the working surface.

• Once this point is found, the value of the objective function is determined.

• This is repeated with various y's until a feasible point is found that satisfies the descent criteria for improvement relative to the original point.
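One common way to realize the perpendicular restoration move is a Newton-like correction along the constraint normals. Here is a minimal sketch, with an assumed circle constraint as the working surface (the repeated trial-y descent test from the last bullet is omitted):

```python
import numpy as np

def restore_feasibility(y, h, jac, tol=1e-10, max_iter=20):
    """From a point y off the working surface h(x) = 0, step back to the
    surface with minimum-norm Newton corrections along the constraint normals."""
    x = y.copy()
    for _ in range(max_iter):
        hx = np.atleast_1d(h(x))
        if np.linalg.norm(hx) < tol:
            return x                                     # back on the surface
        J = np.atleast_2d(jac(x))                        # constraint Jacobian
        x = x - J.T @ np.linalg.solve(J @ J.T, hx)       # minimum-norm correction
    raise RuntimeError("restoration failed")

# Example: working surface is the unit circle h(x) = x1^2 + x2^2 - 1 = 0.
x = restore_feasibility(np.array([1.2, 0.9]),
                        lambda x: x @ x - 1.0, lambda x: 2.0 * x)
print(x, x @ x)   # x @ x ~ 1: the trial point is pulled back onto the circle
```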

Page 25: Difficulties and Complexities

The movement away from the feasible region and the subsequent return introduce difficulties that require a series of interpolations and nonlinear equation solves for their resolution, because:

1) you first have to get back into the feasible region, and
2) next, you have to find a point on the active set of constraints.

Thus, a satisfactory gradient projection method is quite complex.

Computation of the nonlinear projection matrix is also more time consuming than for linear constraints:

   Pk = [I – ∇h(xk)^T [∇h(xk) ∇h(xk)^T]^(-1) ∇h(xk)]

where ∇h(xk) is the matrix of first-order derivatives (Jacobian) of all active constraints with respect to the variables.

Nevertheless, the gradient projection method has been successfully implemented and found to be effective (your book says otherwise).

But all the extra features needed to maintain feasibility require skill.

Page 26: (Generalized) Reduced Gradient Method

Page 27: Reduced Gradient Method

• The reduced gradient method is closely related to the simplex method of linear programming because the variables are split into basic and non-basic groups.

• From a theoretical viewpoint, the method behaves very much like the gradient projection method.

• Like gradient projection method, it can be regarded as a steepest descent method applied on the surface defined by the active constraints.

• The reduced gradient method seems to perform better than gradient projection methods.

Page 28: Dependent and Independent Variables

Consider

   min f(x)
   s.t.  Ax = b,  x ≥ 0

Partition the variables into two groups, x = (y, z), where y has dimension m and z has dimension n-m. This partition is formed such that all variables in y are strictly positive.

Now the original problem can be expressed as:

   min f(y, z)
   s.t.  By + Cz = b,  y ≥ 0,  z ≥ 0   (with, of course, A = [B, C])

The key notion is that if z is specified (the independent variables), then y (the dependent variables) can be solved for uniquely. NOTE: y and z are thus coupled.

Because of this dependency, if we move z along the line z + Δz, then y will have to move along a corresponding line y + Δy.

Dependent variables y are also referred to as basic variables.
Independent variables z are also referred to as non-basic variables.

Page 29: The Reduced Gradient

The basic idea of the reduced gradient method is to consider, at each stage, the problem only in terms of the independent variables.

Since y can be obtained from z, the objective function f can be considered as a function of z only.

The gradient of f with respect to the independent variables z is found by evaluating the gradient of f(B^(-1)b – B^(-1)Cz, z). The resulting gradient

   r^T = ∇z f(y, z) – ∇y f(y, z) B^(-1)C

is called the reduced gradient.

Note the similarity with the reduced simplex and multiplex methods.

Note that in order to improve the solution, we want to step in the direction Δz = –r, that is, along the negative reduced gradient (see the check below).
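A tiny NumPy check of this formula on an assumed example, with one equality constraint and a quadratic objective:

```python
import numpy as np

def reduced_gradient(grad_y, grad_z, B, C):
    """r^T = grad_z f - grad_y f B^{-1} C  (B: basic columns, C: non-basic)."""
    return grad_z - np.linalg.solve(B.T, grad_y) @ C

# Example: f(x) = x1^2 + x2^2 + x3^2 with constraint x1 + x2 + x3 = 1,
# basic y = (x1,), non-basic z = (x2, x3), evaluated at x = (1, 0, 0).
B = np.array([[1.0]]); C = np.array([[1.0, 1.0]])
r = reduced_gradient(np.array([2.0]), np.array([0.0, 0.0]), B, C)
print(r)   # [-2., -2.]: increasing x2 or x3 (with x1 adjusting) lowers f
```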

Page 30: Generalized Reduced Gradient

The generalized reduced gradient method solves nonlinear programming problems in the standard form

   minimize f(x)
   subject to  h(x) = 0,  a ≤ x ≤ b

where h(x) is of dimension m.

The GRG algorithm works similarly to the linear-constraint case. However, it is also plagued by problems similar to those of gradient projection methods regarding maintaining feasibility. A well-known implementation is the GRG2 software.

The generalized reduced gradient is

   r^T = ∇z f(y, z) – ∇y f(y, z) [∇y h(y, z)]^(-1) ∇z h(y, z)

Page 31: Movement Basics

• The basic idea in GRG is to search with the z variables along the reduced gradient r for improvement of the objective function, and to use the y variables to maintain feasibility.

• If some zi is at its bound (see Eq. 5.62), then set the search direction for that variable to di = 0, depending on the sign of the reduced gradient r.

   – Reason: you do not want to violate a variable's bound. Thus, that variable is fixed by not allowing it to change (di = 0 means it now has no effect on f(x)).

• Search xk+1 = xk + αd with d = [dy, dz] (column vector).
• If the constraints are linear, then the new point is (automatically) feasible.

   – See the derivation on page 177; constraints and objective function are combined in the reduced gradient.

• If constraint(s) are nonlinear, you have to adjust y by some Δy to get back to feasibility (see the sketch below).

   – Different techniques exist, but basically it is equivalent to an unconstrained optimization problem that minimizes the constraint violation.
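A minimal sketch of that restoration step: hold z fixed and re-solve h(y, z) = 0 for y, here with scipy.optimize.fsolve and an assumed single constraint:

```python
import numpy as np
from scipy.optimize import fsolve

def restore_y(h, y_trial, z):
    """After moving z along the reduced gradient, hold z fixed and re-solve
    the equality constraints h(y, z) = 0 for the dependent variables y."""
    y_new, info, ier, msg = fsolve(lambda y: h(y, z), y_trial, full_output=True)
    if ier != 1:
        raise RuntimeError(f"restoration failed: {msg}")
    return y_new

# Example: one nonlinear constraint h(y, z) = y^2 + z - 2 = 0 (hypothetical).
y = restore_y(lambda y, z: y ** 2 + z - 2.0, np.array([1.0]), 1.0)
print(y)   # ~1.0, since 1.0^2 + 1.0 - 2 = 0
```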

Page 32: Changing Basis

• Picking a basis is sometimes poorly discussed in textbooks– Some literally only say “pick a set z and y”

• Your textbook provides a method based on Gaussian elimination (pages 180-181) that is performed at every iteration.

• Other ("recent") implementations favor changing the basis only when a basic variable reaches zero (or, equivalently, its upper or lower bound), since this saves recomputation of B^(-1).

• Thus, if a dependent variable (y) becomes zero, then this zero-valued dependent (basic) variable is declared independent and one of the strictly positive independent variables is made dependent.

• Either way, this is analogous to an LP pivot operation

Page 33: Reduced Gradient Algorithm (one implementation)

One implementation is as follows (sketched in code below):

1. Pick a set of independent (z) and dependent (y) variables.
2. Let Δzi = –ri if ri < 0 or zi > 0; otherwise let Δzi = 0.
3. If Δz = 0, then stop because the current point is the solution. Otherwise, find Δy = –B^(-1)CΔz.
4. Find α1, α2, α3 achieving, respectively:

      max {α : y + αΔy ≥ 0}
      max {α : z + αΔz ≥ 0}
      min {f(x + αΔx) : 0 ≤ α ≤ α1, 0 ≤ α ≤ α2}

   Let x = x + α3Δx.
5. If α3 < α1, return to step 1. Otherwise, declare the vanishing variable in the dependent set (y) independent and declare a strictly positive variable in the independent set (z) dependent (pivot operation). Update B and C.

Note that your book has a slightly different implementation on page 181!
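A compact NumPy sketch of one pass of this algorithm (steps 2-4) for an assumed quadratic example; the line search uses scipy.optimize.minimize_scalar in place of the exact α3 minimization, and the tolerances are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reduced_gradient_step(f, grad, x, basic, nonbasic, A):
    """One pass of steps 2-4 above for min f(x) s.t. Ax = b, x >= 0.
    'basic' and 'nonbasic' are index lists defining y and z (step 1)."""
    B, C = A[:, basic], A[:, nonbasic]
    g = grad(x)
    r = g[nonbasic] - np.linalg.solve(B.T, g[basic]) @ C       # reduced gradient
    dz = np.where((r < 0) | (x[nonbasic] > 1e-12), -r, 0.0)    # step 2
    if np.allclose(dz, 0.0):
        return x, True                                         # step 3: optimal
    dy = -np.linalg.solve(B, C @ dz)                           # keep Ax = b
    dx = np.zeros_like(x)
    dx[basic], dx[nonbasic] = dy, dz
    shrink = dx < -1e-12                                       # ratio test giving
    a_cap = np.min(-x[shrink] / dx[shrink]) if shrink.any() else 1e6  # alpha1, alpha2
    res = minimize_scalar(lambda a: f(x + a * dx), bounds=(0.0, a_cap),
                          method="bounded")                    # alpha3
    return x + res.x * dx, False

# Example: min (x1-1)^2 + (x2-1)^2 + x3^2  s.t.  x1 + x2 + x3 = 1, x >= 0,
# starting at the feasible point (1, 0, 0) with x1 basic.
A = np.array([[1.0, 1.0, 1.0]])
f = lambda x: (x[0] - 1) ** 2 + (x[1] - 1) ** 2 + x[2] ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] - 1), 2 * x[2]])
x = np.array([1.0, 0.0, 0.0])
for _ in range(25):
    x, optimal = reduced_gradient_step(f, grad, x, [0], [1, 2], A)
    if optimal:
        break
print(x)   # approaches the solution (0.5, 0.5, 0.0)
```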

Page 34: Comments

• GRG method can be quite complex.

• Also note that inequality constraints have to be converted to equalities first through slack and surplus variables.

• GRG2 is a very well known implementation.