
Stochastic Linear Optimization

Peter Kall

Sommersemester 1999, rev. Wintersemester 2004/05


References

[1] J.F. Benders. Partitioning procedures for solving mixed-variables programming problems. Numer. Math., 4:238–252, 1962.

[2] J.R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Series in Operations Research. Springer-Verlag, Berlin/Heidelberg, 1997.

[3] P. Kall and J. Mayer. Stochastic Linear Programming: Models, Theory, and Computation. Kluwer Academic Publ./Springer-Verlag, 2004. To appear.

[4] P. Kall and S.W. Wallace. Stochastic Programming. John Wiley & Sons, Chichester, 1994. Out of print; available for download from http://www.unizh.ch/ior/Pages/Deutsch/Mitglieder/Kall/bib/ka-wal-94.pdf.


[5] J. Mayer. Stochastic Linear Programming Algorithms: A Comparison Based on a Model Management System. Gordon and Breach Science Publishers, 1998. (Habilitationsschrift, Wirtschaftswiss. Fakultät, Universität Zürich, 1996.)

[6] A. Prékopa. Stochastic Programming. Kluwer Academic Publ., 1995.

Contents

1 Introduction
2 Stochastic Programs: General Formulation
   2.1 Deterministic Equivalents
3 Probabilistic Constraints and Integrated Chance Constraints: Properties
   3.1 Properties of Probabilistic Constraints
   3.2 Integrated Chance Constraints (W.K. Klein Haneveld and M.H. van der Vlerk, 2002)
4 Chance Constrained Programs: Computational Approaches
   4.1 Probabilistic Constraints
      4.1.1 Reduced Gradient Method
      4.1.2 Cutting Plane Methods
   4.2 ICC: The Discrete Case
5 Recourse Problems: Discrete Distributions
   5.1 The Two-Stage Case
      5.1.1 Dual Decomposition
   5.2 The Multi-Stage Case
      5.2.1 Nested Decomposition
   5.3 Regularized Decomposition
   5.4 Stochastic Decomposition
   5.5 Stochastic Quasi-Gradient Method
6 Recourse Problems: Properties and Approximations
   6.1 The Two-Stage Case: Properties
   6.2 Inequalities and Approximations
7 Simple Recourse Type Problems
   7.1 SRT Functions
   7.2 Approximation of Expected SRT Functions
   7.3 Multiple Simple Recourse


1 Introduction


Production problem (e.g. refinery):

From two raw materials, raw1 and raw2, we are to produce simultaneously two products, prod1 and prod2.

The unit costs of the raw materials, c = (craw1, craw2)ᵀ (yielding the production cost γ), the product demands, h = (hprod1, hprod2)ᵀ, the production capacity b (the max. total amount of raws to be processed), and the productivities π(raw i, prod j) (the output of product j per unit of raw i) are given in Table 1.1.


Raws        prod1   prod2     c     b
raw 1          2       3      2     1
raw 2          6       3      3     1
relation       ≥       ≥      =     ≤
h            180     162      γ   100

Table 1.1: Productivities π(raw i, prod j).


Objective:

γ = 2xraw1 + 3xraw2


Capacity constraint:

xraw1 + xraw2 ≤ 100


Demand for product prod 1:

2xraw1 + 6xraw2 ≥ 180


Demand for product prod 2:

3xraw1 + 3xraw2 ≥ 162,


Nonnegativity:

xraw1 ≥ 0

xraw2 ≥ 0.


In conclusion, we have to deal with the following LP:

(1.1)

min(2xraw1 + 3xraw2)

s.t. xraw1 + xraw2 ≤ 100,

2xraw1 + 6xraw2 ≥ 180,

3xraw1 + 3xraw2 ≥ 162,

xraw1 ≥ 0,

xraw2 ≥ 0.

Due to the simplicity of this example problem, we can give a graphical representation of the feasible production plans (Figure 1.1).


Figure 1.1: Deterministic LP: set of feasible production plans.


For γ(x) = 2xraw1 + 3xraw2, Figure 1.2 shows the unique optimum

(1.2) xraw1 = 36, xraw2 = 18, γ(x) = 126.

Figure 1.2: LP: feasible production plans and cost function for γ = 290.
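As a quick numerical cross-check of (1.2), the following is a minimal sketch (assuming SciPy is available; the variable names are ours) that solves the LP (1.1); since linprog only accepts ≤ constraints, the two demand rows are negated.

import numpy as np
from scipy.optimize import linprog

c = [2.0, 3.0]                      # unit costs of raw1, raw2
A_ub = [[ 1.0,  1.0],               #  x1 +  x2 <= 100  (capacity)
        [-2.0, -6.0],               # 2x1 + 6x2 >= 180  (demand prod1)
        [-3.0, -3.0]]               # 3x1 + 3x2 >= 162  (demand prod2)
b_ub = [100.0, -180.0, -162.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)               # approx (36, 18) with cost 126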


The production problem is properly described by (1.1) and solved by (1.2) provided the productivities, the unit costs, the demands and the capacity (Table 1.1) are fixed data, known to us prior to making our decision on the production plan. If, however, at least some of the data (productivities and demands, for instance) vary within certain limits (for our discussion, randomly), and we have to make our decision on the production plan before knowing the exact values of those data, then the LP (1.1) no longer reflects the real problem.

To be more specific, let us assume that


• our model describes the weekly production process of a refinery depending on two dealers of crude oil (raw1 and raw2, respectively), supplying one big company with gasoline (prod1) for its distribution system of gas stations and another one with fuel oil (prod2) for its heating and/or power plants;

• the productivities π(raw1, prod1) and π(raw2, prod2), i.e. the output of gas from raw1 and of fuel from raw2, change randomly, whereas the other productivities are deterministic;

• the weekly demands of the clients, hprod1 for gas and hprod2 for fuel, vary randomly;

• the weekly production plan (xraw1, xraw2) has to be fixed in advance and cannot be changed during the week,


• whereas the actual productivities are only observed (measured) during the production process itself, and

• the clients expect their actual demand to be satisfied during the corresponding week.

Assume that, with N(µ, σ) denoting the normal distribution with mean µ and variance σ², we have

(1.3)

hprod1 = 180 + ζ1, ζ1 ∼ N(0, 12),
hprod2 = 162 + ζ2, ζ2 ∼ N(0, 9),
π(raw1, prod1) = 2 + η1, η1 ∼ U[−0.8, 0.8],
π(raw2, prod2) = 3.4 − η2, η2 ∼ EXP(λ = 2.5),

with the rv's ζ1, ζ2, η1, η2 mutually independent.
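A minimal sketch (assuming NumPy; names are ours) of drawing samples of the random data (1.3); note that N(0, 12) here means standard deviation 12, and the exponential rate λ = 2.5 corresponds to scale 1/2.5 in NumPy.

import numpy as np

rng = np.random.default_rng(42)
n = 10_000
zeta1 = rng.normal(0.0, 12.0, n)        # zeta1 ~ N(0, 12)
zeta2 = rng.normal(0.0, 9.0, n)         # zeta2 ~ N(0, 9)
eta1  = rng.uniform(-0.8, 0.8, n)       # eta1 ~ U[-0.8, 0.8]
eta2  = rng.exponential(1 / 2.5, n)     # eta2 ~ EXP(lambda = 2.5)

h1, h2      = 180 + zeta1, 162 + zeta2  # random demands
alpha, beta = 2.0 + eta1, 3.4 - eta2    # random productivities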


Restricting the unbounded random variables ζ1, ζ2 and η2 to their respective 99% confidence intervals (the uniform η1 is bounded anyway), the realizations vary as follows:

(1.4)

ζ1 ∈ [−30.91, 30.91],

ζ2 ∈ [−23.18, 23.18],

η1 ∈ [−0.8, 0.8],

η2 ∈ [0.0, 1.84].


Hence, instead of the LP (1.1), with

hprod1 = 180 + ζ1, hprod2 = 162 + ζ2,

π(raw1, prod1) = 2 + η1, π(raw2, prod2) = 3.4 − η2,

we have the stochastic linear program:

(1.5)

min(2xraw1 + 3xraw2)

s.t. xraw1 + xraw2 ≤ 100,

(2 + η1)xraw1 + 6xraw2 ≥ 180 + ζ1,

3xraw1 + (3.4 − η2)xraw2 ≥ 162 + ζ2,

xraw1 ≥ 0,

xraw2 ≥ 0.


This SLP is not well-defined since it is not at all clear what “min” means before a realization (ζ1, ζ2, η1, η2) of the random vector (ζ1, ζ2, η1, η2) is known.

Geometrically, the consequence of our random parameter changes may be rather complex.

The effect of only the right-hand sides ζi varying over the intervals given in (1.4) corresponds to parallel translations of the facets of the feasible set related to the particular constraints, as shown in Figure 1.3.


Figure 1.3: LP: feasible set varying with demands.


Consider instead the effect of only the ηi changing their values within the intervals mentioned in (1.4).

The result: rotations of the related facets.

Some possible situations are shown in Figure 1.4, where the centers of rotation are indicated by small circles.


Figure 1.4: LP: feasible set varying with productivities.


Allowing for all the possible changes in the demands and in the productivities simultaneously yields a superposition of the two geometrical motions, i.e. of the translations and the rotations. It is easily seen that the variation of the feasible set may be substantial, depending on the actual realizations of the random data.

The same is also true for the so-called wait-and-see solutions, i.e. for those optimal solutions we should choose if we knew the realizations of the random parameters in advance.

In Figure 1.5 a few possible situations are indicated.


Figure 1.5: LP: varying productivities and demands; some wait-and-see solutions.


In addition to the deterministic solution

x = (xraw1, xraw2) = (36, 18), γ = 126,

production plans such as

(1.6)

y = (yraw1, yraw2) = (20, 30), γ = 130,

z = (zraw1, zraw2) = (50, 22), γ = 166,

v = (vraw1, vraw2) = (58, 6), γ = 134

may be wait-and-see solutions.

Unfortunately, wait-and-see solutions are not what we need. We have to decide on production plans under uncertainty, i.e. ahead of the realizations of the random demands and productivities!


A first possibility: look for a “safe” production program, i.e. one that is feasible for all possible realizations of the rv's. This is a fat solution and reflects total risk aversion. Rather expensive!

From Fig. 1.5 it is the intersection of the two rightmost constraints for prod1 and prod2, easily computed as

(1.7) x∗ = (x∗raw1, x∗raw2) = (48.018, 25.548), γ∗ = 172.681.


Another possibility: assume that the refinery has made the following arrangement with its clients.

The clients expect the refinery to satisfy their weekly demands. However, it may very well happen, depending on the production plan and the unforeseen clients' demands and/or the refinery's productivity, that the demands cannot be covered by the production.

This will cause “penalty” costs to the refinery: the amount of shortage has to be bought from the market, with these penalties supposed to be proportional to the respective shortage in products. Assume that per unit of undeliverable products they amount to

(1.8) qprod1 = 7, qprod2 = 12.


The costs due to shortage, or in general due to the violation of the constraints, are determined after the observation of the random data. They are denoted as recourse costs.

In a case (like ours) of repeated execution of the production program it is meaningful, on statistical grounds, to apply an expected value criterion.

More precisely, we may want to find a production plan that minimizes the sum of our original first-stage (i.e. production) costs and the expected recourse costs.

To formalize this approach, we abbreviate our notation. Instead of the four single random variables ζ1, ζ2, η1, η2, it seems convenient to use the random vector ξ = (ζ1, ζ2, η1, η2)ᵀ.


Further, we introduce for each of the two stochastic constraints in (1.5) a recourse variable yi(ξ), i = 1, 2, which simply measures the corresponding shortage in production if there is any; since shortage depends on the realizations of our random vector ξ, so does the corresponding recourse variable, i.e. the yi(ξ) are themselves random variables.

Following the approach sketched so far, we now replace the vague stochastic program (1.5) by the well-defined stochastic program with recourse, using

h1(ξ) := hprod1 = 180 + ζ1, h2(ξ) := hprod2 = 162 + ζ2,
α(ξ) := π(raw1, prod1) = 2 + η1, β(ξ) := π(raw2, prod2) = 3.4 − η2,

to get:


(1.9)

min{2xraw1 + 3xraw2 + Eξ[7y1(ξ) + 12y2(ξ)]}

s.t. xraw1 + xraw2 ≤ 100,

α(ξ)xraw1 + 6xraw2 + y1(ξ) ≥ h1(ξ),

3xraw1 + β(ξ)xraw2 + y2(ξ) ≥ h2(ξ),

xraw1 ≥ 0,

xraw2 ≥ 0,

y1(ξ) ≥ 0,

y2(ξ) ≥ 0.


In general, the stochastic constraints in (1.9) have to hold almost surely (a.s.), i.e. with probability 1. For ξ with a finite discrete distribution {(ξi, pi), i = 1, ..., r} (pi > 0 ∀i), problem (1.9) is just an LP with a dual decomposition structure:

(1.10)

min{2xraw1 + 3xraw2 + ∑_{i=1}^{r} pi[7y1(ξi) + 12y2(ξi)]}

s.t. xraw1 + xraw2 ≤ 100,

α(ξi)xraw1 + 6xraw2 + y1(ξi) ≥ h1(ξi) ∀i,

3xraw1 + β(ξi)xraw2 + y2(ξi) ≥ h2(ξi) ∀i,

xraw1 ≥ 0,

xraw2 ≥ 0,

y1(ξi) ≥ 0 ∀i,

y2(ξi) ≥ 0 ∀i.


Depending on the number r of realizations of ξ, this LP may become (very) large in scale, but its particular (dual decomposition) block structure is amenable to specially designed algorithms.

LPs with dual decomposition structure will later on be introduced in general, and a basic solution method for these problems will be described; a small computational sketch of (1.10) follows below.
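Here is a minimal sketch (assuming SciPy; the function name and the two-scenario data are ours for illustration, not the lecture's instance) of assembling and solving (1.10) for a given scenario list.

import numpy as np
from scipy.optimize import linprog

def solve_recourse_lp(scenarios):
    # scenarios: list of (p_i, alpha(xi_i), beta(xi_i), h1(xi_i), h2(xi_i))
    r = len(scenarios)
    n = 2 + 2 * r                          # x_raw1, x_raw2, then y1, y2 per block
    c = np.zeros(n); c[:2] = [2.0, 3.0]    # first-stage costs
    A_ub = [np.array([1.0, 1.0] + [0.0] * (2 * r))]; b_ub = [100.0]  # capacity
    for i, (p, a, bt, h1, h2) in enumerate(scenarios):
        c[2 + 2*i], c[3 + 2*i] = 7.0 * p, 12.0 * p   # expected penalties
        r1 = np.zeros(n); r1[[0, 1, 2 + 2*i]] = [a, 6.0, 1.0]
        r2 = np.zeros(n); r2[[0, 1, 3 + 2*i]] = [3.0, bt, 1.0]
        A_ub += [-r1, -r2]; b_ub += [-h1, -h2]       # flip >= rows to <=
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * n)
    return res.x[:2], res.fun

x, cost = solve_recourse_lp([(0.5, 2.0, 3.0, 210.0, 162.0),
                             (0.5, 2.0, 3.0, 150.0, 162.0)])
print(x, cost)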


To further analyze our refinery problem, assume first that only the demands hi(ξ), i = 1, 2, are random (productivities fixed!). This case was illustrated in Figure 1.3.

Even this small idealized problem, as an NLP, can present numerical difficulties. The reason: the evaluation of the expected value appearing in the objective requires

• multivariate numerical integration;

• implicit definition of the functions yi(ξ) (these functions yielding, for a fixed x, the optimal solutions of (1.9) for every possible realization ξ of ξ),

both of which are rather cumbersome tasks.


To avoid these difficulties, we approximate the normal distributions by discrete ones. For this purpose, we

• generate large samples ζi^µ, µ = 1, 2, ..., K, i = 1, 2, restricted to the 99% intervals; sample size K = 10 000;

• choose equidistant partitions of the 99% intervals into ri, i = 1, 2, subintervals Iiν (e.g. r1 = r2 = 15);

• calculate for Iiν, ν = 1, ..., ri, i = 1, 2, the mean ζ̄i^ν of the sample values ζi^µ ∈ Iiν as an estimate for Eξ[ζi | ζi ∈ Iiν];

• calculate for every subinterval Iiν the relative frequency piν of ζi^µ ∈ Iiν (i.e. piν = kiν/K, where kiν = #{ζi^µ | ζi^µ ∈ Iiν}); this yields an estimate for the probability Pξ({ζi ∈ Iiν}) (see the sketch below).
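A minimal sketch of this discretization (assuming NumPy; the function name and the quantile constant 2.576 for the 99% interval are ours):

import numpy as np

def discretize(sigma, r, K=10_000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = -2.576 * sigma, 2.576 * sigma         # 99% interval of N(0, sigma)
    s = rng.normal(0.0, sigma, 4 * K)
    s = s[(s >= lo) & (s <= hi)][:K]               # restrict sample to interval
    edges = np.linspace(lo, hi, r + 1)             # equidistant subintervals
    idx = np.clip(np.digitize(s, edges) - 1, 0, r - 1)
    means = np.array([s[idx == v].mean() for v in range(r)])  # E[zeta | I_v]
    probs = np.bincount(idx, minlength=r) / K                 # rel. frequencies
    return means, probs

means1, probs1 = discretize(sigma=12.0, r=15)      # discretized N(0, 12)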


Figure 1.6: Discrete distribution generated from N(0, 12), N(0, 9); (r1, r2) = (15, 15). (Histogram of relative frequencies in %, realizations ranging from −30 to 30.)


Figure 1.6 shows these discrete distributions for N(0, 12) and N(0, 9), with 15 realizations each.

The discrete distributions {(ζ̄i^ν, piν), ν = 1, ..., ri}, i = 1, 2, are then used as approximations for the given normal distributions.

Obviously, these discrete distributions with 15 realizations each are only rough approximations of the corresponding normal distributions. Therefore, when using them in computations instead of the underlying normal distributions, remarkable discretization errors have to be expected. This will become evident in the following numerical examples.

With these latter distributions, with 15 realizations each and hence 15² = 225 realizations for the joint distribution and 225 blocks in our decomposition problem, we get


as an optimal solution of the LP (1.10) (with γ(·) its total objective and γI(x) = 2xraw1 + 3xraw2)

(1.11) x = (x1, x2) = (38.539, 20.539), γ(x) = 140.747,

with corresponding first-stage costs of

γI(x) = 138.694

and the empirical reliability ρ(x) (the probability of x being feasible)

ρ(x) = 0.9541,

whereas our original LP solution x = (36, 18) would yield

γ(x) = 199.390 and ρ(x) = 0.3188,

the latter clearly overestimating the theoretical reliability of 0.25.
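The reliabilities are easy to reproduce approximately by simulation: with only the demands random, a plan is feasible iff both demand constraints hold. A small sketch (assuming NumPy; the function name is ours):

import numpy as np

def reliability(x1, x2, n=100_000, seed=1):
    rng = np.random.default_rng(seed)
    z1 = rng.normal(0.0, 12.0, n)                  # zeta1 ~ N(0, 12)
    z2 = rng.normal(0.0, 9.0, n)                   # zeta2 ~ N(0, 9)
    ok = (2*x1 + 6*x2 >= 180 + z1) & (3*x1 + 3*x2 >= 162 + z2)
    return ok.mean()

print(reliability(36, 18))     # close to the theoretical value 0.25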


Consider now the case hi(ξ), i = 1, 2, fixed at hi(E[ξ]), and α(ξ) and β(ξ) discretized with 15 and 18 realizations, respectively (i.e. 15 × 18 = 270 blocks in (1.10)).

Solving the resulting LP (1.10) yields

x = (37.566, 22.141), γ(x) = 144.179, γI(x) = 141.556,

whereas the original solution of LP (1.1) would yield

γ(x) = 204.561.

For the reliability, we now get

ρ(x) = 0.9497,

in contrast to ρ(x) = 0.2983 for the LP solution x.


Finally, for the most general case of α(ξ), β(ξ), h1(ξ) and h2(ξ) being discretized with 5-, 9-, 7- and 11-point distributions, respectively, we get a joint discrete distribution with 5 × 9 × 7 × 11 = 3465 realizations and hence that many blocks in the recourse problem (1.10); i.e. we have to solve an LP with 2 × 3465 + 1 = 6931 constraints! The solution amounts to

x = (37.754, 23.629), γ(x) = 150.446, γI(x) = 146.396,

with a reliability of

ρ(x) = 0.9452,

whereas the LP solution x = (36, 18) would yield

γ(x) = 232.492, ρ(x) = 0.2499.


So far, for solutions of recourse problems we could determine the empirical reliabilities afterwards.

Although reliability provides no indication of the size of constraint violation, and hence none of the related penalties, there are many real-life decision situations where reliability is considered to be the most important issue (e.g. medical or technical applications).

Suppose that only the demands are random and that for the management of our refinery it is absolutely necessary (in order to maintain a client base) to achieve a reliability of 95% with respect to satisfying the demands. In this case we may formulate the following stochastic program with joint probabilistic constraints:


min(2xraw1 + 3xraw2)

s.t. xraw1 + xraw2 ≤ 100,

xraw1 ≥ 0,

xraw2 ≥ 0,

P({ξ | 2xraw1 + 6xraw2 ≥ h1(ξ), 3xraw1 + 3xraw2 ≥ h2(ξ)}) ≥ 0.95.

This problem can be solved with appropriate methods, to be discussed later in this text, using the original normal distributions instead of their discrete approximations. The solution of the probabilistically constrained program is

z = (37.758, 21.698), γI(z) = 140.612.
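With independent normal demands, the joint chance constraint factorizes into a product of two one-dimensional normal probabilities, so the reliability of z can be checked directly; a sketch assuming SciPy:

from scipy.stats import norm

x1, x2 = 37.758, 21.698
p1 = norm.cdf(2*x1 + 6*x2 - 180, scale=12)   # P(2x1 + 6x2 >= h1(xi))
p2 = norm.cdf(3*x1 + 3*x2 - 162, scale=9)    # P(3x1 + 3x2 >= h2(xi))
print(p1 * p2)                               # approx 0.95 (by independence)
print(2*x1 + 3*x2)                           # first-stage cost, approx 140.612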


It seems worth mentioning that the first-stage costs γI(z) = 140.612 are only slightly increased as compared with γI(x) = 126 for our LP solution, if we keep in mind the substantial increase in reliability from ρ(x) = 0.25 to ρ(z) = 0.95!


2 Stochastic Programs: General Formulation


In the same way as random parameters in the LP (1.1)

min(2xraw1 + 3xraw2)

s.t. xraw1 + xraw2 ≤ 100,

2xraw1 + 6xraw2 ≥ 180,

3xraw1 + 3xraw2 ≥ 162,

xraw1 ≥ 0,

xraw2 ≥ 0,

led us to the stochastic (linear) program (1.5),


min(2xraw1 + 3xraw2)

s.t. xraw1 + xraw2 ≤ 100,

(2 + η1)xraw1 + 6xraw2 ≥ 180 + ζ1,

3xraw1 + (3.4 − η2)xraw2 ≥ 162 + ζ2,

xraw1 ≥ 0,

xraw2 ≥ 0,

random parameters in the general mathematical program (2.1)

(2.1)

min g0(x)
s.t. gi(x) ≤ 0, i = 1, ..., m,
x ∈ X ⊂ IR^n,


may lead to the stochastic program

(2.2)

“min” g0(x, ξ)
s.t. gi(x, ξ) ≤ 0, i = 1, ..., m,
x ∈ X ⊂ IR^n,

where ξ is a random vector varying over a set Ξ ⊂ IR^k. More precisely, we assume throughout that a family F of “events”, i.e. subsets of Ξ, and the probability distribution P on F are given. Hence for every event A ⊂ Ξ, i.e. A ∈ F, the probability P(A) is determined.

Furthermore, we assume that the functions gi(x, ·) : Ξ → IR ∀x, i are random variables themselves, and that the probability distribution P is independent of x.


However, problem (2.2) is not well defined, since the meaning of the objective, i.e.

“min” g0(x, ξ),

as well as of the constraints

gi(x, ξ) ≤ 0, i = 1, ..., m,

is not clear at all if we think of taking a decision on x before knowing the realization of ξ. Therefore a revision of the modelling process is necessary, leading to so-called deterministic equivalents for (2.2), which can be introduced in various ways, some of which we have already met in our simple example.


2.1 Deterministic Equivalents

Let us come back to deterministic equivalents for (2.2):

“min” g0(x, ξ)
s.t. gi(x, ξ) ≤ 0, i = 1, ..., m,
x ∈ X ⊂ IR^n.

In analogy to the particular stochastic linear program with recourse (1.9), for problem (2.2) we may proceed as follows. With

g+i(x, ξ) = 0 if gi(x, ξ) ≤ 0, and g+i(x, ξ) = gi(x, ξ) otherwise,

the ith constraint of (2.2) is violated if and only if g+i(x, ξ) > 0 for a given decision x and realization ξ of ξ.


Hence we could provide for each constraint a recourse or second-stage activity yi(ξ) that, after observing the realization ξ, is chosen such as to compensate its constraint's violation, if there is one, by satisfying gi(x, ξ) − yi(ξ) ≤ 0. This extra effort is assumed to cause an extra cost or penalty of qi per unit, i.e. our additional costs (called the recourse function) amount to

(2.3) Q(x, ξ) = min_y { ∑_{i=1}^{m} qi yi(ξ) | yi(ξ) ≥ g+i(x, ξ), i = 1, ..., m },

yielding a total cost (first-stage plus recourse cost) of

(2.4) f0(x, ξ) = g0(x, ξ) + Q(x, ξ).

Instead of (2.3), we might think of a more general linear recourse program with a recourse vector y(ξ) ∈ Y ⊂ IR^n, where


Y is some given polyhedral set, such as {y | y ≥ 0}, and we have an arbitrary fixed m × n matrix W (the recourse matrix) and a corresponding unit cost vector q ∈ IR^n, yielding for (2.4) the recourse function

(2.5) Q(x, ξ) = min_y { qᵀy | Wy ≥ g+(x, ξ), y ∈ Y },

where g+(x, ξ) = (g+1(x, ξ), ..., g+m(x, ξ))ᵀ.

If we think of a factory producing m products, gi(x, ξ) could be understood as the difference {demand} − {output} of product i. Then g+i(x, ξ) > 0 means that there is a shortage in product i, relative to the demand. Assuming that the factory is committed to cover the demands, problem (2.3) could be interpreted as buying the shortage of products at the market.
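For a given decision x and realization ξ, evaluating (2.5) is itself just a small LP; a minimal sketch (assuming SciPy; the function name is ours):

import numpy as np
from scipy.optimize import linprog

def recourse(q, W, g_plus):          # Q = min{ q'y | Wy >= g+(x, xi), y >= 0 }
    res = linprog(q, A_ub=-np.asarray(W), b_ub=-np.asarray(g_plus),
                  bounds=[(0, None)] * len(q))   # negate rows: linprog uses <=
    return res.fun

# With W = I and q = (7, 12) this reduces to the penalty model (2.3):
print(recourse([7.0, 12.0], np.eye(2), [3.5, 0.0]))   # 7 * 3.5 = 24.5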


Problem (2.5) instead could result from a second-stage or emergency production program, carried through with the factor input y and a technology represented by the matrix W. Choosing W = I, the m × m identity matrix, (2.3) turns out to be a special case of (2.5).

Finally, we could also think of a nonlinear recourse program to define the recourse function for (2.4), i.e. for f0(x, ξ) = g0(x, ξ) + Q(x, ξ); for instance, Q(x, ξ) could be chosen as

(2.6) Q(x, ξ) = min{ q(y) | Hi(y) ≥ g+i(x, ξ) ∀i; y ∈ Y ⊂ IR^n },

where q : IR^n → IR and Hi : IR^n → IR are supposed to be given.


In any case, if it is meaningful and acceptable to the decision maker to minimize the expected value of the total costs (i.e. first-stage and recourse costs), instead of problem (2.2) we could consider its deterministic equivalent, the (two-stage) stochastic program with recourse

(2.7) min_{x∈X} Eξ f0(x, ξ) = min_{x∈X} Eξ { g0(x, ξ) + Q(x, ξ) }.

The above two-stage problem can immediately be extended to the model of a multistage recourse program as follows:

Instead of the two decisions x and y, to be taken at stages 1 and 2, we now have K + 1 sequential decisions x0, x1, ..., xK (xτ ∈ IR^{nτ}), to be taken at the subsequent stages τ = 0, 1, ..., K.


The term “stages” can, but need not, be interpreted as “time periods”.

Assume for simplicity that the objective of (2.2) is deterministic, i.e. g0(x, ξ) ≡ g0(x).

At stage τ (τ ≥ 1) we know the realizations ξ1, ..., ξτ of the random vectors ξ1, ..., ξτ as well as the previous decisions x0, ..., xτ−1, and we have to decide on xτ such that the constraint(s) (with vector-valued constraint functions gτ)

gτ(x0, ..., xτ, ξ1, ..., ξτ) ≤ 0

are satisfied, which, as stated, at this stage can only be achieved by the proper choice of xτ, based on the knowledge of the previous decisions and realizations.


Hence, assuming a cost function qτ(xτ), at stage τ ≥ 1 we have a recourse function

Qτ(x0, x1, ..., xτ−1, ξ1, ..., ξτ) = min_{xτ} { qτ(xτ) | gτ(x0, ..., xτ, ξ1, ..., ξτ) ≤ 0 },

indicating that the optimal recourse action xτ at stage τ depends on the previous decisions and on the realizations observed until stage τ, i.e.

xτ = xτ(x0, ..., xτ−1, ξ1, ..., ξτ), τ ≥ 1.

Hence, taking into account the multiple stages, we get the total costs for the multistage problem as


(2.8) f0(x0, ξ1, ..., ξK) = g0(x0) + ∑_{τ=1}^{K} Qτ(x0, x1, ..., xτ−1, ξ1, ..., ξτ).

This yields the deterministic equivalent for the described dynamic decision problem, the multistage stochastic program with recourse,

(2.9) min_{x0∈X} [ g0(x0) + ∑_{τ=1}^{K} E_{ξ1,...,ξτ} Qτ(x0, x1, ..., xτ−1, ξ1, ..., ξτ) ],

obviously a straightforward generalization of our former (two-stage) stochastic program with recourse (2.7).


For the two-stage case, in view of their practical relevance it is worthwhile to describe briefly some variants of recourse problems in the stochastic linear programming setting. Assume that we are given the following stochastic linear program:

(2.10)

“min” cᵀx
s.t. Ax = b,
T(ξ)x = h(ξ),
x ≥ 0.

Comparing this with the general stochastic program (2.2), we see that the set X ⊂ IR^n is specified as

X = {x ∈ IR^n | Ax = b, x ≥ 0},

where the m0 × n matrix A and the vector b are assumed to be deterministic.


In contrast, the m1 × n matrix T(·) and the vector h(·) are allowed to depend on the random vector ξ, and therefore to have random entries themselves. In general, we assume that this dependence on ξ ∈ Ξ ⊂ IR^k is given as

(2.11)

T(ξ) = T^0 + ξ1 T^1 + ... + ξk T^k,
h(ξ) = h^0 + ξ1 h^1 + ... + ξk h^k,

with deterministic matrices T^0, ..., T^k and vectors h^0, ..., h^k. Observing that the stochastic constraints in (2.10) are equalities (instead of inequalities, as in the general problem formulation (2.2)), it seems meaningful to equate their deficiencies, which, using linear recourse and assuming that Y = {y ∈ IR^n | y ≥ 0}, according to (2.5) yields the stochastic linear program with fixed recourse


(2.12)

min_x Eξ { cᵀx + Q(x, ξ) }
s.t. Ax = b,
x ≥ 0,

where

Q(x, ξ) = min{ qᵀy | Wy = h(ξ) − T(ξ)x, y ≥ 0 }.

In particular, we speak of complete fixed recourse if the fixed m1 × n recourse matrix W satisfies

(2.13) {z | z = Wy, y ≥ 0} = IR^{m1}.

This implies that, whatever the first-stage decision x and the realization ξ of ξ are,


the second-stage program

Q(x, ξ) = min{ qᵀy | Wy = h(ξ) − T(ξ)x, y ≥ 0 }

will always be feasible. A special case of complete fixed recourse is simple recourse, where, with the identity matrix I of order m1,

(2.14) W = (I, −I).

Then the second-stage program reads as

Q(x, ξ) = min{ (q+)ᵀy+ + (q−)ᵀy− | y+ − y− = h(ξ) − T(ξ)x, y+, y− ≥ 0 },

i.e., for q+ + q− ≥ 0, the recourse variables y+ and y− can be chosen to measure (positively) the absolute deficiencies in the stochastic constraints.
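In the simple recourse case the second stage even separates componentwise and needs no LP solver: for q+ + q− ≥ 0 the optimal choice is y+ = (h(ξ) − T(ξ)x)+ and y− = (T(ξ)x − h(ξ))+. A sketch (assuming NumPy; the function name is ours):

import numpy as np

def simple_recourse(q_plus, q_minus, t):   # t = h(xi) - T(xi) x
    y_plus, y_minus = np.maximum(t, 0.0), np.maximum(-t, 0.0)
    return q_plus @ y_plus + q_minus @ y_minus

t = np.array([3.5, -2.0])                  # shortage in row 1, surplus in row 2
print(simple_recourse(np.array([7.0, 12.0]), np.array([0.0, 0.0]), t))  # 24.5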


Generally, all the above problems fit into the following form:

(2.15)

min Eξ f0(x, ξ)
s.t. Eξ fi(x, ξ) ≤ 0, i = 1, ..., s,
Eξ fi(x, ξ) = 0, i = s + 1, ..., m,
x ∈ X ⊂ IR^n,

where the fi are constructed from the objective and the constraints in (2.2) or (2.10), respectively. So far, f0 represented the total costs (see (2.4) or (2.8)), and f1, ..., fm could be used to describe the first-stage feasible set X. However, depending on the way the functions fi are derived from the problem functions gj in (2.2), this general formulation also includes other types of deterministic equivalents for the stochastic program (2.2).


To give a first example of how other deterministic equivalent problems for (2.2) may be generated, let us first choose some α ∈ [0, 1] and define a “payoff” function for all constraints as

ϕ(x, ξ) := 1 − α if gi(x, ξ) ≤ 0, i = 1, ..., m, and ϕ(x, ξ) := −α otherwise.

Consequently, for x infeasible at ξ we have an absolute loss of α, whereas for x feasible at ξ we have a return of 1 − α. It seems natural to aim for decisions on x that, at least in the mean, avoid an absolute loss. This is equivalent to the requirement

Eξ ϕ(x, ξ) = ∫_Ξ ϕ(x, ξ) dP ≥ 0.

Defining f0(x, ξ) = g0(x, ξ) and f1(x, ξ) := −ϕ(x, ξ), we get


(2.16)

f0(x, ξ) = g0(x, ξ),
f1(x, ξ) = α − 1 if gi(x, ξ) ≤ 0, i = 1, ..., m, and f1(x, ξ) = α otherwise,

implying

Eξ f1(x, ξ) = −Eξ ϕ(x, ξ) ≤ 0,

where, with the vector-valued function

g(x, ξ) = (g1(x, ξ), ..., gm(x, ξ))ᵀ,


Eξ f1(x, ξ) = ∫_Ξ f1(x, ξ) dP
= ∫_{g(x,ξ)≤0} (α − 1) dP + ∫_{g(x,ξ)≰0} α dP
= (α − 1) P({ξ | g(x, ξ) ≤ 0}) + α P({ξ | g(x, ξ) ≰ 0})
= α [P({ξ | g(x, ξ) ≤ 0}) + P({ξ | g(x, ξ) ≰ 0})] − P({ξ | g(x, ξ) ≤ 0})
= α − P({ξ | g(x, ξ) ≤ 0}),

since the two events partition Ξ. Then the constraint Eξ f1(x, ξ) ≤ 0 is equivalent to

P({ξ | g(x, ξ) ≤ 0}) ≥ α.


Hence, under these assumptions, (2.15) reads as

(2.17)

min_{x∈X} Eξ g0(x, ξ)
s.t. P({ξ | gi(x, ξ) ≤ 0, i = 1, ..., m}) ≥ α.

Problem (2.17) is called a probabilistically constrained or chance constrained program (a problem with joint probabilistic constraints).

If instead of (2.16) we define αi ∈ [0, 1], i = 1, ..., m, and analogous “payoffs” for every single constraint, resulting in

f0(x, ξ) = g0(x, ξ),
fi(x, ξ) = αi − 1 if gi(x, ξ) ≤ 0, and fi(x, ξ) = αi otherwise, i = 1, ..., m,


then we get from (2.15) the problem with single (or separate) probabilistic constraints:

(2.18)

min_{x∈X} Eξ g0(x, ξ)
s.t. P({ξ | gi(x, ξ) ≤ 0}) ≥ αi, i = 1, ..., m.

With the functions gi(x, ξ) linear in x and the set X convex polyhedral, we have the stochastic linear program

“min” cᵀ(ξ)x
s.t. Ax = b,
T(ξ)x ≥ h(ξ),
x ≥ 0;


then problems (2.17) and (2.18) become

(2.19)

min_{x∈X} Eξ cᵀ(ξ)x
s.t. P({ξ | T(ξ)x ≥ h(ξ)}) ≥ α,

and, with Ti(·) and hi(·) denoting the ith row and ith component of T(·) and h(·) respectively,

(2.20)

min_{x∈X} Eξ cᵀ(ξ)x
s.t. P({ξ | Ti(ξ)x ≥ hi(ξ)}) ≥ αi, i = 1, ..., m,

the stochastic linear programs with joint and with single chance constraints, respectively.


Obviously there are many other possibilities to generate types of deterministic equivalents for (2.2) by constructing the fi in different ways out of the objective and the constraints of (2.2).

Formally, all problems derived, i.e. all the above deterministic equivalents, are mathematical programs. The first question is whether, or under which assumptions, they have properties like convexity and/or smoothness, such that we have a reasonable chance to deal with them computationally using the toolkit of mathematical programming methods.


3 Probabilistic Constraints and Integrated Chance Constraints: Properties


3.1 Properties of Probabilistic Constraints

For chance constrained problems (CCP), the situation becomes rather difficult, in general. The constraint of the CCP (2.17) was

P({ξ | g(x, ξ) ≤ 0}) ≥ α,

where the gi were replaced by the vector-valued function g defined by g(x, ξ) := (g1(x, ξ), ..., gm(x, ξ))ᵀ. A point x is feasible iff the set

(3.1) S(x) = {ξ | g(x, ξ) ≤ 0}

has a probability measure P(S(x)) of at least α. In other words, if G ⊂ F is the collection of all events of F such that P(G) ≥ α ∀G ∈ G, then x is feasible iff we find at least one event G ∈ G such that g(x, ξ) ≤ 0 holds for all ξ ∈ G.


Formally, x is feasible according to (3.1) iff there is a G ∈ G with

(3.2) x ∈ ⋂_{ξ∈G} {x | g(x, ξ) ≤ 0}.

Hence the feasible set

B(α) = {x | P({ξ | g(x, ξ) ≤ 0}) ≥ α}

is the union of all those sets of vectors x feasible according to (3.2), and consequently may be rewritten as

(3.3) B(α) = ⋃_{G∈G} ⋂_{ξ∈G} {x | g(x, ξ) ≤ 0}.

Since a union of convex sets need not be convex, this representation demonstrates that in general we may not expect B(α) to be convex, even if the sets {x | g(x, ξ) ≤ 0} are convex ∀ξ ∈ Ξ.


Indeed, there are simple examples of nonconvex feasible sets.

Example 3.1 Assume that in our refinery problem (1.1) the demands are random with the following discrete joint distribution:

P(h1(ξ¹) = 160, h2(ξ¹) = 135) = 0.85,
P(h1(ξ²) = 150, h2(ξ²) = 195) = 0.08,
P(h1(ξ³) = 200, h2(ξ³) = 120) = 0.07.


Then the constraints

xraw1 + xraw2 ≤ 100,
xraw1 ≥ 0, xraw2 ≥ 0,
P({ξ | 2xraw1 + 6xraw2 ≥ h1(ξ), 3xraw1 + 3xraw2 ≥ h2(ξ)}) ≥ α

for any α ∈ (0.85, 0.92] require that


• either, due to

P(h1(ξ¹) = 160, h2(ξ¹) = 135) = 0.85 and P(h1(ξ²) = 150, h2(ξ²) = 195) = 0.08,

we satisfy the demands hi(ξ¹) and hi(ξ²), i = 1, 2 (enforcing a reliability of 93%), and hence choose a production program to cover the demand hA = (160, 195)ᵀ;


• or, due to

P(h1(ξ¹) = 160, h2(ξ¹) = 135) = 0.85 and P(h1(ξ³) = 200, h2(ξ³) = 120) = 0.07,

we satisfy the demands hi(ξ¹) and hi(ξ³), i = 1, 2 (enforcing a reliability of 92%), such that our production plan is designed to cope with the demand hB = (200, 135)ᵀ.

It follows that the feasible set for the above constraints is nonconvex, as shown in Figure 3.1 and checked numerically in the sketch below.
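The nonconvexity is easy to verify numerically: the two boundary plans below are feasible for α = 0.9, while their midpoint is not. A sketch (assuming NumPy; the particular plans are ours):

import numpy as np

scenarios = [((160, 135), 0.85), ((150, 195), 0.08), ((200, 120), 0.07)]

def rho(x):   # probability that plan x covers the random demand
    return sum(p for (h1, h2), p in scenarios
               if 2*x[0] + 6*x[1] >= h1 and 3*x[0] + 3*x[1] >= h2)

xA = np.array([57.5, 7.5])    # covers scenarios 1 and 2: rho = 0.93
xB = np.array([17.5, 27.5])   # covers scenarios 1 and 3: rho = 0.92
print(rho(xA), rho(xB), rho((xA + xB) / 2))   # 0.93, 0.92, 0.85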


Figure 3.1: Chance constraints: nonconvex feasible set.


As above, define S(x) := {ξ | g(x, ξ) ≤ 0}. If g(·, ·) is jointly convex in (x, ξ), then, with xi ∈ B(α), i = 1, 2, ξi ∈ S(xi) and λ ∈ [0, 1], for (x̄, ξ̄) = λ(x1, ξ1) + (1 − λ)(x2, ξ2) it follows that

g(x̄, ξ̄) ≤ λ g(x1, ξ1) + (1 − λ) g(x2, ξ2) ≤ 0,

i.e. ξ̄ = λξ1 + (1 − λ)ξ2 ∈ S(x̄), and hence (with the algebraic sum of sets defined by ρS1 + σS2 := {ξ = ρξ1 + σξ2 | ξ1 ∈ S1, ξ2 ∈ S2})

S(x̄) ⊃ λS(x1) + (1 − λ)S(x2),

implying

P(S(x̄)) ≥ P(λS(x1) + (1 − λ)S(x2)).


By our assumption on g (joint convexity), any set S(x) is convex. Now we conclude immediately that B(α) is convex ∀α ∈ [0, 1] if

P(λS1 + (1 − λ)S2) ≥ min[P(S1), P(S2)] ∀λ ∈ [0, 1]

for all convex sets Si ∈ F, i = 1, 2, i.e. if P is quasi-concave. Hence we have proved the following:

Proposition 3.1 If g(·, ·) is jointly convex in (x, ξ) and P is quasi-concave, then the feasible set B(α) = {x | P({ξ | g(x, ξ) ≤ 0}) ≥ α} is convex ∀α ∈ [0, 1].


Remark 3.1 The assumption of joint convexity of g(·, ·) is very strong. In general it is not even satisfied in the linear case (2.19), i.e. in

min_{x∈X} Eξ cᵀ(ξ)x
s.t. P({ξ | T(ξ)x ≥ h(ξ)}) ≥ α.

However, if here T(ξ) ≡ T (constant) and h(ξ) ≡ ξ, then joint convexity is satisfied. With Fξ the distribution function of ξ, the above constraint then reads as

P({ξ | Tx ≥ ξ}) = Fξ(Tx) ≥ α.

Therefore B(α) is convex ∀α ∈ [0, 1] in this particular case if Fξ is a quasi-concave function, i.e. if

Fξ(λξ¹ + (1 − λ)ξ²) ≥ min[Fξ(ξ¹), Fξ(ξ²)] ∀ξ¹, ξ² ∈ Ξ, ∀λ ∈ [0, 1].


It seems worthwhile to mention the following facts. If the probability measure P is quasi-concave, then the corresponding distribution function Fξ is quasi-concave. This follows from the definition of distribution functions, Fξ(ξⁱ) = P(Si) with Si = {ξ | ξ ≤ ξⁱ}, i = 1, 2, and since, for ξ̄ = λξ¹ + (1 − λ)ξ², λ ∈ [0, 1], we have S̄ = {ξ | ξ ≤ ξ̄} = λS1 + (1 − λ)S2 (see Figure 3.2). With P being quasi-concave, this yields

Fξ(ξ̄) = P(S̄) ≥ min[P(S1), P(S2)] = min[Fξ(ξ¹), Fξ(ξ²)].


Figure 3.2: Convex combination of Si = {ξ | ξ ≤ ξⁱ}, i = 1, 2; λ = 1/2.


On the other hand, Fξ being quasi-concave does not imply that the corresponding probability measure P is quasi-concave.

For instance, in IR¹ every monotone function is obviously quasi-concave, so that every distribution function of a random variable (always being monotonically increasing) is quasi-concave.

But not every probability measure P on IR is quasi-concave (see Figure 3.3).


Figure 3.3: P(A) = P(B) = 1/2, but P(C) = P((1/3)A + (2/3)B) = 0.


Hence we are left with the question of when a probability measure, or its distribution function, is quasi-concave. This question was answered first by Prékopa for the subclass of log-concave probability measures, i.e. measures satisfying

P(λS1 + (1 − λ)S2) ≥ P^λ(S1) · P^{1−λ}(S2)

for all convex Si ∈ F and λ ∈ [0, 1].

That the class of log-concave measures really is a subclass of the class of quasi-concave measures is easily seen as follows:


Lemma 3.1 If P is a log-concave measure on F, then P is quasi-concave.

Proof Let Si ∈ F, i = 1, 2, be convex sets such that P(Si) > 0, i = 1, 2 (otherwise there is nothing to prove, since P(S) ≥ 0 ∀S ∈ F). By assumption, for any λ ∈ (0, 1) we have

P(λS1 + (1 − λ)S2) ≥ P^λ(S1) · P^{1−λ}(S2).

By the monotonicity of the logarithm, it follows that

ln[P(λS1 + (1 − λ)S2)] ≥ λ ln[P(S1)] + (1 − λ) ln[P(S2)] ≥ min{ln[P(S1)], ln[P(S2)]},

and hence

P(λS1 + (1 − λ)S2) ≥ min[P(S1), P(S2)]. □


Necessary and sufficient conditions were derived first for the log-concave case (Prékopa), and later corresponding conditions for quasi-concave measures were found (Brascamp-Lieb, Rinott).

Proposition 3.2 Let P on Ξ = IR^k be of the continuous type, i.e. have a density f. Then the following statements hold:

• P is log-concave iff f is log-concave (i.e. iff the logarithm of f is a concave function);

• P is quasi-concave iff f^{−1/k} is convex.

The proof is omitted here, since it is too demanding in terms of measure theory.


Remark 3.2 Consider

(a) the k-dimensional uniform distribution on a convex body S ⊂ IR^k, given by the density

ϕU(x) := 1/µ(S) if x ∈ S, and ϕU(x) := 0 otherwise

(µ the Lebesgue measure in IR^k);

(b) the exponential distribution with density

ϕEXP(x) := 0 if x < 0, and ϕEXP(x) := λe^{−λx} if x ≥ 0

(λ > 0 a constant);

(c) the multivariate normal distribution in IR^k with density

ϕN(x) := γ e^{−(1/2)(x−m)ᵀΣ⁻¹(x−m)}

(γ > 0 a constant, m the expectation vector, Σ the covariance matrix).


(a) Then we get for the uniform distribution:

ϕ_U^{−1/k}(x) = [1/µ(S)]^{−1/k} = (µ(S))^{1/k} if x ∈ S, and ϕ_U^{−1/k}(x) = 0^{−1/k} = ∞ otherwise

(a convex function, being constant on the convex set S and +∞ outside);
Proposition 3.2 ⟹ the probability measure P_U is quasi-concave;

(b) for the exponential distribution:

ln[ϕ_EXP(x)] = ln 0 = −∞ if x < 0, and ln[ϕ_EXP(x)] = ln(λe^{−λx}) = ln λ − λx if x ≥ 0,

i.e. ϕ_EXP(x) is log-concave;
Proposition 3.2 ⟹ P_EXP is log-concave; Lemma 3.1 ⟹ P_EXP is also quasi-concave;


(c) for the normal distribution:

ln[ϕ_N(x)] = ln(γ e^{−(1/2)(x−m)^T Σ^{−1}(x−m)}) = ln γ − (1/2)(x−m)^T Σ^{−1}(x−m);

Σ, and hence Σ^{−1}, is positive definite ⟹ ϕ_N(x) is log-concave;
Proposition 3.2 ⟹ P_N is log-concave; Lemma 3.1 ⟹ P_N is also quasi-concave.

There are many other classes of widely used continuous-type probability measures which, according to Proposition 3.2, are either log-concave or at least quasi-concave. □


Since, for mathematical programs in general, we cannot assert the existence of solutions unless the feasible sets are known to be closed, the following statement is of interest:

Proposition 3.3 If g: IR^n × Ξ → IR^m is continuous, then the feasible set B(α) is closed.

Proof Consider any sequence {x^ν} such that x^ν → x̄ and x^ν ∈ B(α) ∀ν. We have to show that x̄ ∈ B(α). Define A(x) := {ξ | g(x, ξ) ≤ 0}, and let V_k ⊂ IR^n be the open ball with center x̄ and radius 1/k.

Then we show first that


(3.4) A(x̄) = ⋂_{k=1}^∞ cl ⋃_{x∈V_k} A(x).

Here the inclusion "⊂" is obvious, since x̄ ∈ V_k ∀k. Hence, we only have to show that

A(x̄) ⊃ ⋂_{k=1}^∞ cl ⋃_{x∈V_k} A(x).

Assume that ξ ∈ ⋂_{k=1}^∞ cl ⋃_{x∈V_k} A(x)

⟺ ∀k: ξ ∈ cl ⋃_{x∈V_k} A(x), i.e. ∀k ∃ξ^k ∈ ⋃_{x∈V_k} A(x) with ‖ξ^k − ξ‖ ≤ 1/k

⟺ ∀k ∃x^k ∈ V_k with ξ^k ∈ A(x^k) and ‖ξ^k − ξ‖ ≤ 1/k (and obviously ‖x^k − x̄‖ ≤ 1/k, since x^k ∈ V_k).


pro memoriam:

∀k ∃x^k ∈ V_k, ξ^k ∈ A(x^k): ‖ξ^k − ξ‖ ≤ 1/k, ‖x^k − x̄‖ ≤ 1/k

Hence (x^k, ξ^k) → (x̄, ξ). Now ξ^k ∈ A(x^k) ⟹ g(x^k, ξ^k) ≤ 0 ∀k, and therefore, by the continuity of g(·, ·), ξ ∈ A(x̄), which proves (3.4): A(x̄) = ⋂_{k=1}^∞ cl ⋃_{x∈V_k} A(x).

The sequence of sets

B_K := ⋂_{k=1}^K cl ⋃_{x∈V_k} A(x)

is monotonically decreasing to the set A(x̄).


pro memoriam:

B_K := ⋂_{k=1}^K cl ⋃_{x∈V_k} A(x) ↘ A(x̄) = ⋂_{k=1}^∞ cl ⋃_{x∈V_k} A(x)

Since x^ν → x̄, for every K there exists a ν_K such that x^{ν_K} ∈ V_K ⊂ V_{K−1} ⊂ ··· ⊂ V_1, implying that A(x^{ν_K}) ⊂ B_K, and hence P(B_K) ≥ P(A(x^{ν_K})) ≥ α ∀K. Hence, by the well-known continuity of probability measures on monotone sequences, we have P(A(x̄)) ≥ α, i.e. x̄ ∈ B(α). □

In conclusion, for stochastic programs with joint chance constraints the situation appears to be rather difficult in general. But, at least under certain additional assumptions, we may assert convexity and closedness of the feasible sets as well (Proposition 3.1, Remark 3.1 and Proposition 3.3).


For stochastic linear programs with single chance constraints, convexity statements have been derived without the joint convexity assumption on g_i(x, ξ) := h_i(ξ) − T_i(ξ)x, for special distributions and special intervals for the values of α_i. In particular, if T_i(ξ) ≡ T_i (constant), the situation becomes rather convenient: with F_i the distribution function of h_i(ξ), we have

P({ξ | T_i x ≥ h_i(ξ)}) = F_i(T_i x) ≥ α_i,

or equivalently

T_i x ≥ F_i^{−1}(α_i),

where F_i^{−1}(α_i) is the smallest real value η such that F_i(η) ≥ α_i.

Hence, in this special case any single chance constraint is just a linear constraint, and the only additional work is to compute F_i^{−1}(α_i).
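As a small illustration in code (a sketch under the assumption, purely hypothetical here, that h_i(ξ) ∼ N(µ_i, σ_i²)): the quantile F_i^{−1}(α_i) is a single library call, after which the chance constraint is handled as an ordinary linear constraint.

# Sketch: a single chance constraint P(T_i x >= h_i(xi)) >= alpha_i with a
# constant row T_i reduces to the linear constraint  T_i x >= F_i^{-1}(alpha_i).
# Assumption (for illustration only): h_i(xi) ~ N(mu_i, sigma_i^2).
from scipy.stats import norm

mu_i, sigma_i = 10.0, 2.0   # hypothetical parameters of h_i(xi)
alpha_i = 0.95              # required reliability level

# smallest eta with F_i(eta) >= alpha_i; for continuous F_i this is the quantile
rhs = norm.ppf(alpha_i, loc=mu_i, scale=sigma_i)
print(f"replace the chance constraint by the linear constraint  T_i x >= {rhs:.4f}")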


3.2 Integrated Chance Constraints

W.K. Klein Haneveld and M.H. van der Vlerk (2002).

Let us consider the SLP

(3.5) "min" c^T x
      s.t. T(ξ)x ≥ h(ξ),
           x ∈ X,

with X a given convex polyhedral set.

To get a well-defined problem, we have introduced either a joint chance constraint (with α ∈ [0, 1])

P({ξ | T(ξ)x ≥ h(ξ)}) ≥ α,

or else separate chance constraints (with α_i ∈ [0, 1])

P({ξ | T_i(ξ)x ≥ h_i(ξ)}) ≥ α_i, i = 1, …, m.


Let us rewrite (3.5) as

(3.6) "min" c^T x
      s.t. T(ξ)x − η(x, ξ) = h(ξ),
           x ∈ X,

then the joint chance constraint can be rewritten as

P({ξ | η(x, ξ) ≥ 0}) ≥ α,

or else as

P({ξ | max_i[−η_i(x, ξ)] > 0}) ≤ β := 1 − α,

whereas the separate chance constraints read as

P({ξ | −η_i(x, ξ) > 0}) ≤ β_i := 1 − α_i.


Whereas, in the first formulation, α and α_i, respectively, were the required minimal levels of (qualitative) reliability, in the reformulation β and β_i, respectively, may be interpreted as maximal levels of (qualitative) risk, or so-called risk parameters.

Hence, with ζ^− := max{0, −ζ} and

sign(ζ) = 1 if ζ > 0, and sign(ζ) = 0 otherwise,

separate chance constraints can be written as

P({ξ | −η_i(x, ξ) > 0}) = IE_ξ[sign(η_i^−(x, ξ))] ≤ β_i,

and a joint chance constraint is rewritten as

P({ξ | max_i[−η_i(x, ξ)] > 0}) = IE_ξ[sign(max_{1≤i≤m} η_i^−(x, ξ))] ≤ β.


Instead of the qualitative risk measures in chance constraints (CC),

(3.7) IE_ξ[sign(max_{1≤i≤m} η_i^−(x, ξ))] and IE_ξ[sign(η_i^−(x, ξ))],

in many applications, e.g. in finance, it appears to be more appropriate to consider quantitative risk measures.

In the present framework, a natural choice would be IE_ξ[η_i^−(x, ξ)], the average shortfall in the i-th constraint, and IE_ξ[max_{1≤i≤m} η_i^−(x, ξ)], the average largest shortfall over all constraints, respectively. Introducing risk aversion parameters β_i and β, respectively, this leads to integrated chance constraints (ICC)

(3.8) IE_ξ[η_i^−(x, ξ)] ≤ β_i and IE_ξ[max_{1≤i≤m} η_i^−(x, ξ)] ≤ β.


The name ICC refers to the fact that, due to partial integration, with F_{η_i}(τ; x) the distribution function of η_i(x, ξ),

IE_ξ[η_i^−(x, ξ)] = ∫_{−∞}^0 (−τ) dF_{η_i}(τ; x)
                  = [−τ F_{η_i}(τ; x)]_{τ=−∞}^0 + ∫_{−∞}^0 F_{η_i}(τ; x) dτ   (the boundary term being 0)
                  = ∫_{−∞}^0 P_ξ{η_i(x, ξ) ≤ τ} dτ.

Hence, ICC are related to the loss function ℓ_ICC(z) = z^−, whereas CC are based on ℓ_CC(z) = sign(z^−); and for z we may have either z := −η_i^−(x, ξ) or z := −max_i[η_i^−(x, ξ)].


The properties of CC and ICC are strongly related to the properties of their corresponding loss functions.

Whereas ℓ_CC(z) = sign(z^−) = 1 if z < 0 and 0 otherwise, i.e. ℓ_CC(z) ≡ 1 on (−∞, 0), non-convex and discontinuous on IR, ⟹ non-convexity of CC (with some exceptions, as discussed), we have:

Proposition 3.4 For ICC,

a) the loss function ℓ_ICC(z) = z^− is strictly decreasing on (−∞, 0) and convex, continuous on IR;

b) the expected losses IE_ξ(ℓ_ICC(z(x, ξ))), with z(x, ξ) := −η_i^−(x, ξ) and z(x, ξ) := −max_i[η_i^−(x, ξ)], respectively, are convex in x.


pro memoriam:

a) ℓ_ICC(z) = z^− monotonically decreasing, convex, continuous on IR

Proof: Statement a) is obvious from the shape of ℓ_ICC(z):

Figure 3.4: Loss function ℓ_ICC(z).


pro memoriam:

b) IE_ξ(ℓ_ICC(z(x, ξ))) convex in x, with z(x, ξ) := −η_i^−(x, ξ) or −max_i[η_i^−(x, ξ)]

For z(x, ξ) = −η_i^−(x, ξ) or −max_i[η_i^−(x, ξ)] we get

ℓ_ICC(z(x, ξ)) = z^−(x, ξ) = η_i^−(x, ξ) or max_i[η_i^−(x, ξ)], respectively,

with η(x, ξ) = T(ξ)x − h(ξ) ⟹ ∀ξ ∈ Ξ: ℓ_ICC(z(x, ξ)) is convex in x.

Hence, IE_ξ(ℓ_ICC(z(x, ξ))) = ∫_Ξ ℓ_ICC(z(x, ξ)) P_ξ(dξ) is convex in x. □


Remark 3.3 In the above proof we have used the following fact:

If ϕ: IR^n → IR is concave, and if f: IR → IR is monotonically decreasing and convex, then f∘ϕ: IR^n → IR is convex.

The proof is as follows:

ϕ(λx + (1 − λ)y) ≥ λϕ(x) + (1 − λ)ϕ(y)           (concavity)
f(ϕ(λx + (1 − λ)y)) ≤ f(λϕ(x) + (1 − λ)ϕ(y))      (monotonically decreasing)
                    ≤ λf(ϕ(x)) + (1 − λ)f(ϕ(y))   (convexity)

In Prop. 3.4 the corresponding functions are ϕ_ξ(x) := z(x, ξ) = −η_i^−(x, ξ) or −max_i[η_i^−(x, ξ)], and f(z) := ℓ_ICC(z) = z^−.


In (3.8) the following ICC's were introduced:

IE_ξ[η_i^−(x, ξ)] ≤ β_i and IE_ξ[max_{1≤i≤m} η_i^−(x, ξ)] ≤ β.

Due to Prop. 3.4 we get immediately

Lemma 3.2 The feasible sets described by the ICC as

C(β_i) := {x | IE_ξ[η_i^−(x, ξ)] ≤ β_i} and D(β) := {x | IE_ξ[max_{1≤i≤m} η_i^−(x, ξ)] ≤ β}

are closed and convex for an arbitrary distribution P_ξ, and increase continuously with β_i ≥ β̲ and β ≥ β̲, respectively, with β̲ such that C(·) and D(·) are non-empty on [β̲, ∞).


Proof: Due to Prop. 3.4,

IE_ξ[η_i^−(x, ξ)] and IE_ξ[max_{1≤i≤m} η_i^−(x, ξ)]

are convex in x ∈ IR^n. Hence, they are also continuous in x ∈ IR^n. Therefore,

C(β_i) := {x | IE_ξ[η_i^−(x, ξ)] ≤ β_i} and D(β) := {x | IE_ξ[max_{1≤i≤m} η_i^−(x, ξ)] ≤ β}

are convex and closed subsets of IR^n. The remaining part of the lemma can now be stated as follows: for F: IR^n → IR convex and θ̲ such that {x | F(x) ≤ θ̲} ≠ ∅, the level set A(θ) := {x | F(x) ≤ θ} increases continuously with θ ≥ θ̲.


For the continuity of a set-valued mapping A(·): IR → 2^{IR^n} we can use the convergence concepts of Kuratowski for sequences of sets:

For B_ν ⊂ IR^n, ν ∈ IN, with

lim inf_ν B_ν = {x | ∃{x^ν}: x^ν ∈ B_ν, ν ∈ IN, x^ν → x} and
lim sup_ν B_ν = {x | ∃{x^ν}: x^ν ∈ B_ν, ν ∈ IN, x an accumulation point of {x^ν}},

we have B = lim_ν B_ν if B = lim inf_ν B_ν = lim sup_ν B_ν.

For θ̲ ≤ θ_ν ↗ θ and A(θ_ν) = {x | F(x) ≤ θ_ν}, F convex, we always have lim sup_ν A(θ_ν) ⊇ lim inf_ν A(θ_ν). To show lim sup_ν A(θ_ν) ⊆ lim inf_ν A(θ_ν), let

x̄ = lim_κ x^{ν_κ}, x^{ν_κ} ∈ A(θ_{ν_κ}), {ν_κ ↗ ∞} ⊂ {ν ∈ IN}.


If, for some κ, ν_κ < µ < ν_{κ+1}, then for

x^µ := [(θ_{ν_{κ+1}} − θ_µ)/(θ_{ν_{κ+1}} − θ_{ν_κ})] · x^{ν_κ} + [(θ_µ − θ_{ν_κ})/(θ_{ν_{κ+1}} − θ_{ν_κ})] · x^{ν_{κ+1}}

it follows that

F(x^µ) ≤ [(θ_{ν_{κ+1}} − θ_µ)/(θ_{ν_{κ+1}} − θ_{ν_κ})] F(x^{ν_κ}) + [(θ_µ − θ_{ν_κ})/(θ_{ν_{κ+1}} − θ_{ν_κ})] F(x^{ν_{κ+1}})
       ≤ [(θ_{ν_{κ+1}} − θ_µ)/(θ_{ν_{κ+1}} − θ_{ν_κ})] θ_{ν_κ} + [(θ_µ − θ_{ν_κ})/(θ_{ν_{κ+1}} − θ_{ν_κ})] θ_{ν_{κ+1}} = θ_µ.

Hence, x^µ ∈ A(θ_µ) and ‖x^µ − x̄‖ ≤ max{‖x^{ν_κ} − x̄‖, ‖x^{ν_{κ+1}} − x̄‖}. Therefore, inserting the above elements x^µ in their natural order into the original subsequence {x^{ν_κ}}, we get a sequence {x^ρ ∈ A(θ_ρ) ∀ρ} such that x^ρ → x̄, and hence x̄ ∈ lim inf_ρ A(θ_ρ). □


Observe that the convexity and continuity of ICC stated in Lemma 3.2 hold for any distribution, in particular also for discrete ones, very much in contrast to the behaviour of CC.

Lemma 3.3 Assume that ξ is a discrete random vector with P_ξ(ξ = ξ^s) = p_s, s ∈ S, and let (T^s, h^s) = (T_{i·}(ξ^s), h_i(ξ^s)) for s ∈ S. Then, for β_i ≥ 0, the individual ICC's satisfy

C(β_i) = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β_i}.

If S is finite, then C(β_i) is a polyhedral set defined by 2^{|S|} − 1 linear constraints.


pro memoriam:

C(β_i) = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β_i}

Proof: With the discrete distribution of ξ it follows that

IE_ξ[η_i^−(x, ξ)] = Σ_{s∈S} p_s max{0, −η_i(x, ξ^s)}
                  = Σ_{s∈S} max{0, −p_s η_i(x, ξ^s)}
                  = Σ_{s∈S} (−p_s η_i(x, ξ^s))^+
                  = max_{K⊂S} Σ_{k∈K} (−p_k η_i(x, ξ^k)),

due to max_{K⊂S} Σ_{k∈K} α_k = Σ_{k∈S: α_k>0} α_k.


pro memoriam:

C(β_i) = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β_i}

Furthermore, for m_J ∈ IR, J ⊂ I, with I an arbitrary set, it holds that sup_{J⊂I} m_J ≤ M iff m_J ≤ M ∀J ⊂ I, implying

C(β_i) = {x ∈ IR^n | IE_ξ[η_i^−(x, ξ)] = max_{K⊂S} Σ_{k∈K} (−p_k η_i(x, ξ^k)) ≤ β_i}
       = ⋂_{K⊂S} {x ∈ IR^n | −Σ_{k∈K} p_k η_i(x, ξ^k) ≤ β_i}
       = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β_i}.


pro memoriam:

C(β_i) = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β_i}

If S is finite, there are 2^{|S|} − 1 non-empty subsets K ⊂ S, such that C(β_i) is described by the 2^{|S|} − 1 linear constraints

Σ_{k∈K} p_k(h^k − T^k x) ≤ β_i, ∀K ⊂ S, K ≠ ∅. □


For the joint ICC, according to Lemma 3.2, the feasible set

D(β) := {x | IE_ξ[max_{i∈I} η_i^−(x, ξ)] ≤ β}, β ≥ 0,

with η_i(x, ξ) := T_{i·}(ξ)x − h_i(ξ), i ∈ I := {1, …, m}, is convex and closed.

Lemma 3.4 Assume that ξ is a discrete random vector with P_ξ(ξ = ξ^s) = p_s, s ∈ S, S a finite set. Let (T_i^s, h_i^s) = (T_{i·}(ξ^s), h_i(ξ^s)) for i ∈ I and s ∈ S. Then, with I(K) := I × ··· × I (|K| factors) and ℓ = (ℓ_k)_{k∈K} ∈ I(K),

D(β) = ⋂_{K⊂S} ⋂_{ℓ∈I(K)} {x | Σ_{k∈K} −p_k(T_{ℓ_k}^k x − h_{ℓ_k}^k) ≤ β},

involving (m + 1)^{|S|} − 1 linear constraints.


Proof: We have

IE_ξ[max_{i∈I} η_i^−(x, ξ)] = Σ_{s∈S} p_s max_{i∈I} η_i^−(x, ξ^s)
                            = Σ_{k∈K} p_k max_{i∈I} η_i^−(x, ξ^k), with K := {k ∈ S | max_{i∈I} η_i^−(x, ξ^k) > 0},

⟹ IE_ξ[max_{i∈I} η_i^−(x, ξ)] ≤ β
⟺ Σ_{k∈K} p_k max_{i∈I} η_i^−(x, ξ^k) ≤ β
⟺ max_{K⊂S} Σ_{k∈K} p_k max_{i∈I} η_i^−(x, ξ^k) ≤ β.


With I(K) := I × ··· × I (|K| factors) this yields

IE_ξ[max_{i∈I} η_i^−(x, ξ)] ≤ β
⟺ max_{K⊂S} Σ_{k∈K} p_k max_{i∈I} η_i^−(x, ξ^k) ≤ β
⟺ max_{K⊂S} max_{ℓ∈I(K)} Σ_{k∈K} p_k η_{ℓ_k}^−(x, ξ^k) ≤ β

⟹ D(β) = {x | IE_ξ[max_{i∈I} η_i^−(x, ξ)] ≤ β} = ⋂_{K⊂S} ⋂_{ℓ∈I(K)} {x | Σ_{k∈K} −p_k(T_{ℓ_k}^k x − h_{ℓ_k}^k) ≤ β}.


For any K ⊂ S, the set I(K) := I × ··· × I (|K| factors) contains m^{|K|} elements. Hence, having (|S| choose |K|) possibilities to choose |K| elements out of S, the feasible set

D(β) = ⋂_{K⊂S} ⋂_{ℓ∈I(K)} {x | Σ_{k∈K} −p_k(T_{ℓ_k}^k x − h_{ℓ_k}^k) ≤ β}

involves

Σ_{|K|=1}^{|S|} (|S| choose |K|) m^{|K|} = (m + 1)^{|S|} − 1

constraints, by the binomial theorem. □


4 Chance Constrained Programs:

Computational Approaches


4.1 Probabilistic Constraints

4.1.1 Reduced Gradient Method

Consider the following problem type:

(4.1) min c^T x
      s.t. G(x) ≥ α,
           Dx = d,
           x ≥ 0,

where G(·) is real-valued, concave, and continuously differentiable. Furthermore, assume D to be an (m × n)-matrix with rank(D) = m. Let x be feasible in (4.1).


Consider a partition D := (B, N), with B regular (nonsingular), and the corresponding partitions of x := (y^T, z^T)^T, c := (f^T, g^T)^T, and of a descent direction w := (u^T, v^T)^T (i.e. c^T w < 0). Assume further that x is strictly non-degenerate w.r.t. B, i.e. that

(4.2) y_j > ε ∀j, with ε > 0 a small tolerance.

For the descent direction w, with some θ > 0, choose (u, v) according to

(4.3) max τ
      s.t. f^T u + g^T v ≤ −τ (< 0),
           ∇_y G(x)^T u + ∇_z G(x)^T v ≥ θτ, if G(x) ≤ α + ε,
           Bu + Nv = 0,
           v_j ≥ 0, if z_j ≤ ε,
           ‖v‖_∞ ≤ 1.


Restating this system: the constraint Bu + Nv = 0 yields u = −B^{−1}Nv, so that the first two constraints,

f^T u + g^T v ≤ −τ (< 0),
∇_y G(x)^T u + ∇_z G(x)^T v ≥ θτ, if G(x) ≤ α + ε,

become

r^T v := (g^T − f^T B^{−1}N) v ≤ −τ,
s^T v := (∇_z G(x)^T − ∇_y G(x)^T B^{−1}N) v ≥ θτ.


Thus, with u = −B^{−1}Nv, we get

(4.4) max τ
      s.t. r^T v ≤ −τ,
           s^T v ≥ θτ, if G(x) ≤ α + ε,
           v_j ≥ 0, if z_j ≤ ε,
           ‖v‖_∞ ≤ 1,

where

r^T = g^T − f^T B^{−1}N,
s^T = ∇_z G(x)^T − ∇_y G(x)^T B^{−1}N,

with a solution (τ*, u*^T, v*^T).
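A minimal sketch of the direction-finding LP (4.4) in Python (scipy minimizes, so max τ becomes min −τ); the data r, s, θ and the activity patterns below are hypothetical:

# Sketch of LP (4.4): variables (v, tau), maximize tau.
import numpy as np
from scipy.optimize import linprog

r = np.array([ 1.0, -2.0,  0.5])    # r^T = g^T - f^T B^{-1} N   (hypothetical)
s = np.array([-0.3,  1.0,  0.8])    # s^T = grad_z G^T - grad_y G^T B^{-1} N
theta = 1.0
z_active = [False, True, False]     # v_j >= 0 where z_j <= eps
G_active = True                     # G(x) <= alpha + eps at the current x

n = len(r)
c = np.zeros(n + 1); c[-1] = -1.0   # max tau  <=>  min -tau

A_ub = [np.append(r, 1.0)]          # r^T v + tau <= 0, i.e. r^T v <= -tau
b_ub = [0.0]
if G_active:
    A_ub.append(np.append(-s, theta))   # theta*tau - s^T v <= 0, i.e. s^T v >= theta*tau
    b_ub.append(0.0)

bounds = [(0.0 if act else -1.0, 1.0) for act in z_active]  # ||v||_inf <= 1
bounds.append((0.0, None))                                  # tau >= 0

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
v_star, tau_star = res.x[:-1], res.x[-1]
print(tau_star, v_star)             # tau* > 0 signals a usable descent direction

From v* one then recovers u* = −B^{−1}Nv*, hence the full direction w*.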


To solve (4.1), proceed as follows:

Case 1 (τ* = 0): Let ε := 0 and re-solve (4.4). If again τ* = 0, then the feasible solution x^T = (y^T, z^T) is optimal in (4.1). Otherwise go to Case 2 (Step 1 and Step 2), starting with the original ε > 0.

Case 2 (0 < τ* ≤ ε): Run the following cycle:
Step 1: Let ε := 0.5ε.
Step 2: Solve (4.4) again. If still τ* ≤ ε, go to Step 1; otherwise continue with Case 3.

Case 3 (τ* > ε): Accept w*^T = (u*^T, v*^T) as descent direction.


With B(α) the feasible set of (4.1) and the above feasible solution x, determine, within an ε-tolerance, by a line search the intersection {z | z = x + µw*, µ ≥ 0} ∩ B(α). If the result x̃ of this line search happens to satisfy the strict non-degeneracy (4.2) w.r.t. B again, the above cycle is repeated with x̃ and the same partition D = (B, N) as before. Otherwise, determine a new partition D = (B̃, Ñ) such that the non-degeneracy (4.2) holds for x̃ w.r.t. B̃, and then repeat the above cycle with x̃ and the new partition D = (B̃, Ñ).

This procedure, named the reduced gradient method, due to the representation of the gradients of the objective and of the nonlinear constraint function G as depending only on the non-basic variables of x = (y^T, z^T) w.r.t. the partition D = (B, N), was introduced by Abadie and Carpentier (1969).


For the stochastic program with one joint probabilistic constraint

(4.5) min c^T x
      w.r.t. P_ξ({ξ | Tx ≥ ξ}) ≥ α,
             Dx = d,
             x ≥ 0,

assuming that ξ ∼ N(µ, Σ), we know that

G(x) := P_ξ({ξ | Tx ≥ ξ}) = F_ξ(Tx)

is concave in x.

J. Mayer has adapted the above reduced gradient method, as PROCON, for this problem type. To compute ε-approximations of G(x) and ∇G(x), a stochastic method proposed by T. Szantai (1987) is used.


4.1.2 Cutting Plane Methods

For the NLP

(4.6) min_{x∈IR^n} {f(x) | g_i(x) ≤ 0, i = 1, …, m},

assume f, g_i to be convex and continuously differentiable, and let

B = {x | g_i(x) ≤ 0, i = 1, …, m}

be bounded and have an interior point x̄ ∈ int B (Slater point).

Then (4.6) is equivalent to

(4.7) min θ
      s.t. g_i(x) ≤ 0, i = 1, …, m,
           f(x) − θ ≤ 0.


Since the condition θ ≤ f(x) + γ, with some γ > 0, does not change the solution set of (4.7), we may consider instead the problem

min{ϕ(x, θ) ≡ θ | (x, θ) ∈ B̂ := {(x, θ) | x ∈ B, f(x) ≤ θ ≤ f(x) + γ}},

with some (x̄, θ̄) ∈ int B̂ existing as well.

Hence, we can restrict ourselves to NLP's of the type

(4.8) min{c^T x | x ∈ B},

with a bounded convex set B containing an interior point x̄.

Then there exists a convex polyhedron P with P ⊃ B, such that

min_{x∈P} c^T x ≤ min_{x∈B} c^T x.


Cutting Planes: A First Outer Linearization
(due to Kleibohm (1966), Veinott (1967))

S 1 Find an x̄ ∈ int B and a convex polyhedron P_0 ⊃ B; let k := 0.

S 2 Solve the LP min{c^T x | x ∈ P_k}, yielding the solution x^(k). If x^(k) ∈ B, stop; x^(k) solves (4.8). Else, determine z^(k) ∈ [x^(k), x̄] ∩ bd B (with [x^(k), x̄] the straight line segment between x^(k) and x̄, and bd B the boundary of B).

S 3 Determine a supporting hyperplane H_k of B at z^(k), i.e. find a^(k) ∈ IR^n and α_k = a^(k)T z^(k) such that

H_k := {x | a^(k)T x = α_k} and a^(k)T x^(k) > α_k ≥ a^(k)T x ∀x ∈ B.

Define P_{k+1} := P_k ∩ {x | a^(k)T x ≤ α_k}, let k := k + 1, and return to step S 2.
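A compact sketch of S 1–S 3 in Python, with an illustrative feasible set B (the unit ball, so that membership and boundary points are easy to evaluate), z^(k) found by bisection on [x̄, x^(k)], and the supporting hyperplane taken from a numerical gradient of G; all data are hypothetical:

# Sketch of the outer linearization S1-S3 for min{c^T x | G(x) <= 0},
# with G convex; P0 is a box, boundary points are found by bisection.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -1.0])
G = lambda x: np.dot(x, x) - 1.0           # B = unit ball (illustrative only)
x_int = np.zeros(2)                        # interior point: G(x_int) < 0
cuts_A, cuts_b = [], []                    # accumulated cuts a^T x <= alpha
bounds = [(-2.0, 2.0)] * 2                 # P0: a box containing B

for k in range(50):
    A = np.array(cuts_A) if cuts_A else None
    b = np.array(cuts_b) if cuts_b else None
    xk = linprog(c, A_ub=A, b_ub=b, bounds=bounds).x    # S2: solve over P_k
    if G(xk) <= 1e-8:                      # x^(k) in B: optimal, stop
        break
    lo, hi = 0.0, 1.0                      # bisection on [x_int, xk] for G = 0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if G(x_int + mid * (xk - x_int)) <= 0 else (lo, mid)
    zk = x_int + lo * (xk - x_int)         # approximate boundary point z^(k)
    h = 1e-7                               # numerical gradient of G at z^(k)
    a = np.array([(G(zk + h*e) - G(zk - h*e)) / (2*h) for e in np.eye(2)])
    cuts_A.append(a); cuts_b.append(a @ zk)   # S3: supporting-hyperplane cut

print(xk, c @ xk)   # approaches the minimizer of c^T x over the unit ball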


In general we may not expect the iterates z^(k) ∈ B or x^(k) ∉ B to converge. However, the following can be proved easily.

Proposition 4.1 Under the above assumptions, the accumulation points of {x^(k)} as well as of {z^(k)} solve (4.8). Furthermore, the objective values {c^T x^(k)} and {c^T z^(k)} converge to min{c^T x | x ∈ B}. Finally, in every iteration we have an error estimate with respect to the true optimal value δ of (4.8), namely

Δ_k = min_{l=1,…,k} c^T z^(l) − c^T x^(k).

Remark 4.1 Observe that, due to P_{k+1} ⊂ P_k ∀k, it follows that c^T x^(k+1) ≥ c^T x^(k), whereas the sequence {c^T z^(k)} need not be monotone. However, since z^(k) ∈ B ∀k, we have c^T z^(k) ≥ δ ∀k, whereas c^T x^(k) ≤ δ as long as x^(k) ∉ B.


Obviously, the above error estimate yields an additional stopping criterion in step S 2, namely Δ_k < ε with a predetermined tolerance ε > 0.

As to the supporting hyperplane H_k: for the feasible set

B = {x | g_i(x) ≤ 0, i = 1, …, m} = {x | G(x) ≤ 0}

with G(x) := max_{1≤i≤m} g_i(x), we determine in S 2 the (unique) boundary point z^(k) ∈ [x^(k), x̄] ∩ {x | G(x) = 0}, which, due to G(x̄) < 0 and G(x^(k)) > 0, results from solving

λ_k := max{λ | G((1 − λ)x̄ + λx^(k)) ≤ 0}

as z^(k) = (1 − λ_k)x̄ + λ_k x^(k).


Afterwards, we define the hyperplane H_k := {x | a^(k)T x = α_k} with a^(k) ∈ ∂G(z^(k)), which may be chosen e.g. as

a^(k) = ∇g_j(z^(k)) for any j with g_j(z^(k)) = G(z^(k)),

and then let α_k := a^(k)T z^(k). It follows that a^(k)T x ≤ α_k ∀x ∈ B, whereas a^(k)T x^(k) > α_k.

Hence, with the inequality a^(k)T x ≤ α_k added in step S 3, all feasible points of B are maintained, and the outer approximate x^(k) is cut off (see Fig. 4.1).


Figure 4.1: Three cycles of Veinott's cutting plane method.


With F_ξ(x) := P_ξ({ξ | Tx ≥ ξ}), this method can be applied to solve

(4.9) min c^T x
      w.r.t. F_ξ(x) ≥ α,
             Ax ≥ b,
             x ≥ 0,

with ξ ∼ N(µ, Σ) to assert the convexity of (4.9).

Also in this approach, a major issue is the efficient evaluation of F_ξ(·) and ∇F_ξ(·) wherever needed in steps S 2 and S 3 of the algorithm. The above-mentioned approach of Szantai, based on fast Monte Carlo simulation, can be applied again.


We mention two further cutting plane methods, designed for the following general NLP

(4.10) min c^T x
       w.r.t. F(x) ≥ α,
              Ax ≥ b,

under the following assumptions:

• F concave, continuously differentiable (as satisfied by F_ξ for ξ ∼ N(µ, Σ));

• B_lin := {x | Ax ≥ b} bounded, and hence B = B_lin ∩ {x | F(x) ≥ α} bounded;

• ∃x_S ∈ B_lin being a Slater point for the nonlinear constraint, i.e. F(x_S) > α.


Cutting Planes: Outer Linearization, Moving Slater Points

S 1 Let y^(1) := x_S, B_1 := B_lin, and k := 1.

S 2 Solve the LP min{c^T x | x ∈ B_k}, yielding a solution x^(k).

S 3 If F(x^(k)) > α − ε (for some predefined tolerance ε > 0), then stop; else add a feasibility cut according to the next step.

S 4 Determine z^(k) ∈ [y^(k), x^(k)] ∩ {x | F(x) = α};
    B_{k+1} := B_k ∩ {x | ∇F(z^(k))^T (x − z^(k)) ≥ 0};
    y^(k+1) := y^(k) + (1/(k+1))·(z^(k) − y^(k)); k := k + 1; return to step S 2.

Under the above assumptions, the statements of Prop. 4.1 hold true for this algorithm as well.


Remark 4.2 Whereas in the previous method the interior point x̄ was kept fixed throughout the procedure, in this variant the interior point of the set {x | F(x) ≥ α} (originally y^(1) = x_S) is changed in each cycle (see Fig. 4.2).

For any convex set D, with some y ∈ int D and any z ∈ bd D, it follows that λz + (1 − λ)y ∈ int D ∀λ ∈ (0, 1). We conclude that in step S 4, with y^(k) interior to {x | F(x) ≥ α} and z^(k) on its boundary, we get y^(k+1) ∈ {x | F(x) > α}, and hence again an interior point.

Moreover, these changes of the interior (Slater) points may improve the convergence rate of the algorithm. □

The above method was proposed by Zoutendijk (1966), specialized by Szantai (1988) for chance constrained programs, and finally implemented by J. Mayer as PCSPIOR.


Figure 4.2: Outer linearization with moving Slater points.


With the above assumptions,

– F concave, continuously differentiable,
– B_lin bounded,
– ∃x_S ∈ B_lin such that F(x_S) > α,

but the last one modified to

• ∃x_S ∈ int B_lin such that F(x_S) > α, ∃U ∈ IR: c^T x ≤ U ∀x ∈ B, and ‖c‖ = 1,

we finally mention a central cutting plane method, originally proposed by Elzinga–Moore (1975), and later adapted and implemented for chance constrained programs by J. Mayer (1998) as PROBALL.


A Central Cutting Plane Method

S 1 Let y^(1) := x_S, k := 1, and
    P_1 := {(x^T, η)^T | a^(i)T x − ‖a^(i)‖ η ≥ b_i ∀i, c^T x + η ≤ U}.

S 2 Solve the LP max{η | (x^T, η)^T ∈ P_k}, yielding (x^(k)T, η^(k))^T as a solution.

S 3 If η^(k) < ε (ε > 0 a prescribed tolerance), then stop; otherwise
    – if F(x^(k)) < α, then go to step S 4 to add a feasibility cut;
    – else go to step S 5 to add a central (objective) cut.


S 4 Determine z^(k) ∈ [y^(k), x^(k)] ∩ {x | F(x) = α} and let

    P_{k+1} := P_k ∩ {(x^T, η)^T | ∇F(z^(k))^T (x − z^(k)) − ‖∇F(z^(k))‖ η ≥ 0},

    y^(k+1) := y^(k), k := k + 1, and go to step S 2.

S 5 Replace the last objective cut by c^T x + η ≤ c^T x^(k), yielding P_{k+1}.
    If F(x^(k)) > α, then set y^(k+1) := x^(k); else let y^(k+1) := y^(k).
    With k := k + 1 go to step S 2.

An outer cut (feasibility cut) according to step S 4 and two central cuts (objective cuts) according to step S 5 are illustrated in Figs. 4.3 and 4.4, respectively.


Figure 4.3: Central cutting plane method: outer (feasibility) cut at z^1 on {x | F(x) = α}, cutting off x^1; y^2 = y^1.


Figure 4.4: Central cutting plane method: Objective cuts.


Remark 4.3 The basic ideas of this algorithm are related to the Hesse normal form of a linear equation:

The equation d^T x = ρ is in normal form if ‖d‖ = 1. Then σ = d^T y − ρ yields with |σ| the Euclidean distance of y to the hyperplane {x | d^T x = ρ}, with σ > 0 if and only if d^T y > ρ.

Hence, for a^T x = b with a ≠ 0, the normal form (a^T/‖a‖)x = b/‖a‖ yields the distance (a^T y − b)/‖a‖ of any y ∈ {y | a^T y ≥ b} to the hyperplane {x | a^T x = b}; thus a ball of radius η around y stays within the halfspace iff η ≤ (a^T y − b)/‖a‖.

Hence, solving an LP like max{η | d^(i)T x − ‖d^(i)‖η ≥ ρ_i, i ∈ I}, as in step S 2, yields the center x and the radius η of the largest ball inscribed in the polyhedron {x | d^(i)T x ≥ ρ_i, i ∈ I}, as pointed out in Nemhauser–Widhelm (1971). □


Therefore, with

J_k := {j ≤ k | iteration j generates a feasibility cut},
I_k := {1, …, k} \ J_k, and U_k := min{U, min_{i∈I_k} c^T x^(i)},

in the k-th cycle we get the center x^(k) and the radius η^(k) of the largest hypersphere contained in P_k as defined by

(4.11) a^(i)T x ≥ b_i, i = 1, …, m,
       c^T x ≤ U_k,
       ∇F(z^(j))^T x ≥ ∇F(z^(j))^T z^(j), j ∈ J_k,

and, depending on x^(k) ∉ B or x^(k) ∈ B, we add a feasibility cut or else a central cut, respectively.


Proposition 4.2 Under the above assumptions, for the central cutting plane method it holds that lim_{k→∞} η^(k) = 0.

If U > min_{x∈B} c^T x, then every convergent subsequence of {x^(k) | k ∈ I_k} converges to a solution of (4.10).

For the proof and for details on the convergence behaviour of this algorithm we refer to Elzinga–Moore (1975).


4.2 ICC: The Discrete Case

The problem with separate ICC's was stated as

min c^T x
s.t. IE_ξ[η_i^−(x, ξ)] ≤ β_i,
     x ∈ X,

where η_i(x, ξ) = T_{i·}(ξ)x − h_i(ξ). For a finite discrete distribution {(ξ^s, p_s); s ∈ S}, with y_s = η^−(x, ξ^s) = (T^s x − h^s)^−, we get the LP

min_{x∈X} c^T x
s.t. T^s x + y_s ≥ h^s, s ∈ S,
     Σ_{s∈S} p_s y_s ≤ β,
     y_s ≥ 0, s ∈ S.


Hence, for |S| not too large, we may simply solve this LP, which has |S| + 1 constraints and |S| additional variables.
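A sketch of this LP with scipy, for one constraint i, hypothetical scenario data, and X = {x ≥ 0} for simplicity:

# Sketch: the separate-ICC LP  min c^T x  s.t.  T^s x + y_s >= h^s,
# sum_s p_s y_s <= beta, y >= 0, with X = {x >= 0}  (hypothetical data).
import numpy as np
from scipy.optimize import linprog

c    = np.array([1.0, 2.0])
T    = np.array([[1.0, 1.0],       # T^s = i-th row T_i.(xi^s), one per scenario
                 [2.0, 0.5],
                 [0.5, 2.0]])
h    = np.array([2.0, 3.0, 2.5])
p    = np.array([0.3, 0.4, 0.3])
beta = 0.1                         # risk aversion parameter
n, S = len(c), len(p)

obj = np.concatenate([c, np.zeros(S)])     # variables (x, y_1, ..., y_S)
A_ub = np.zeros((S + 1, n + S))
A_ub[:S, :n] = -T                          # -T^s x - y_s <= -h^s
A_ub[:S, n:] = -np.eye(S)
A_ub[S, n:]  = p                           # sum_s p_s y_s <= beta
b_ub = np.concatenate([-h, [beta]])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub)   # default bounds: all variables >= 0
x_opt, y_opt = res.x[:n], res.x[n:]
print(x_opt, p @ y_opt)                    # expected shortfall <= beta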

From Lemma 3.3 we know that

C(β) = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β}.

Hence, for K = S it follows that

C_0 := {x ∈ IR^n | Σ_{s∈S} p_s(h^s − T^s x) ≤ β} ⊇ C(β),

i.e. the constraint

T̄x ≥ h̄ − β, with T̄ := Σ_{s∈S} p_s T^s and h̄ := Σ_{s∈S} p_s h^s,

is necessarily satisfied. Assumption: min_{x∈X∩C_0} c^T x is bounded.


Algorithm for a separate ICC

S 0 Let C_0 := {x ∈ IR^n | T̄x ≥ h̄ − β} and t = 0; d^0 := −T̄, e^0 := −(h̄ − β).

S 1 Solve the LP_t

    min_{x∈X} c^T x
    s.t. x ∈ C_t := {x ∈ IR^n | d^i x ≤ e^i, i = 0, …, t},

    yielding x^t. If LP_t is infeasible, stop; the problem is infeasible.

S 2 With K_t := {s ∈ S | η^−(x^t, ξ^s) > 0} compute

    IE_ξ[η^−(x^t, ξ)] = Σ_{k∈K_t} p_k η^−(x^t, ξ^k) = Σ_{k∈K_t} p_k(h^k − T^k x^t).


S 3 If IE_ξ[η^−(x^t, ξ)] ≤ β, stop; x^t is optimal. Otherwise, define the feasibility cut d^{t+1} x ≤ e^{t+1}, with

    d^{t+1} = −Σ_{k∈K_t} p_k T^k, e^{t+1} = β − Σ_{k∈K_t} p_k h^k,

    and set C_{t+1} := C_t ∩ {x ∈ IR^n | d^{t+1} x ≤ e^{t+1}}. With t := t + 1 go to step S 1.

Proposition 4.3 The bounded SLP with a separate ICC and ξ finitely distributed,

min_{x∈X} {c^T x | IE_ξ[η^−(x, ξ)] ≤ β},

is solved in finitely many steps by the above algorithm.


Proof: In step S 3 the cut d^{t+1} x ≤ e^{t+1}, with

d^{t+1} = −Σ_{k∈K_t} p_k T^k, e^{t+1} = β − Σ_{k∈K_t} p_k h^k,

i.e. one of the cuts in the representation of C(β), namely Σ_{k∈K_t} p_k(h^k − T^k x) ≤ β, is added to the cuts d^i x ≤ e^i, i ≤ t, and has to hold for all future iterates x^{t+τ}, τ > 0, such that Σ_{k∈K_t} p_k(h^k − T^k x^{t+τ}) > β, τ > 0, cannot happen. Hence, since S has only finitely many subsets, after finitely many steps we get x^t ∈ C(β) and

C_t ⊃ C(β) = ⋂_{K⊂S} {x ∈ IR^n | Σ_{k∈K} p_k(h^k − T^k x) ≤ β},

such that for x^t ∈ argmin_{x∈X∩C_t} c^T x it follows that x^t is optimal on X ∩ C(β). □


5 Recourse Problems:

Discrete Distributions


5.1 The Two-Stage Case

We consider first the two-stage SLP with complete fixed recourse:

(5.1) min c^T x + IE_ξ Q(x; T(ξ), h(ξ))
      s.t. Ax = b,
           x ≥ 0,

where

(5.2) Q(x; T(ξ), h(ξ)) := min q^T y
      s.t. Wy = h(ξ) − T(ξ)x,
           y ≥ 0.


Definition 5.1 The recourse (or second-stage) problem

(5.2) Q(x; T(ξ), h(ξ)) := min{q^T y | Wy = h(ξ) − T(ξ)x, y ≥ 0}

is of complete fixed recourse if the (m × n)-matrix W is constant and satisfies

{z | z = Wy, y ≥ 0} = IR^m and {u | W^T u ≤ q} ≠ ∅.

Furthermore, we assume that ξ is distributed in IR^r according to {(ξ^j, p_j), j = 1, …, S}, yielding for (T, h) the realizations

T^j = Σ_{i=0}^r T_i ξ_i^j, h^j = Σ_{i=0}^r h_i ξ_i^j (with the convention ξ_0^j := 1), where T_i ∈ IR^{m×n}, h_i ∈ IR^m are fixed.


Then problem (5.1) reads as

(5.3) min c^T x + Σ_{j=1}^S p_j q^T y^j
      s.t. Ax = b,
           T^1 x + Wy^1 = h^1,
           T^2 x + Wy^2 = h^2,
           ⋮
           T^S x + Wy^S = h^S,
           x ≥ 0, y^j ≥ 0, j = 1, …, S,

with the special data structure:


Figure 5.1: Dual decomposition structure (A on top, the coupling blocks T^1, …, T^S in the first column, and one recourse block W per realization on the diagonal).

If, for instance, we had r = 5 with 5 independent realizations per component, we would have S = 5^5 = 3′125 blocks in Fig. 5.1.

Hence, for real problems, applying the straight simplex method does not seem to be a promising approach to solve (5.3).


5.1.1 Dual Decomposition

One possibility to take advantage of the data structure in Fig. 5.1 is dual decomposition, due to Benders [1].

For simplicity, we present this method for S = 1 in (5.3), i.e. for the problem

(5.4) min{c^T x + q^T y}
      s.t. Ax = b,
           Tx + Wy = h,
           x ≥ 0, y ≥ 0.


The extension of the method to S > 1 realizations is then immediate, although several variants and tricks may be involved.

Assumption: The LP (5.4) is solvable and the first-stage feasible set {x | Ax = b, x ≥ 0} is bounded. The solvability of (5.4) implies that

{(x, y) | Ax = b, Tx + Wy = h, x ≥ 0, y ≥ 0} ≠ ∅

and that

c^T ξ + q^T η ≥ 0 ∀(ξ, η) ∈ {(ξ, η) | Aξ = 0, Tξ + Wη = 0, ξ ≥ 0, η ≥ 0},

and therefore in particular, for ξ = 0, that

q^T η ≥ 0 ∀η ∈ {η | Wη = 0, η ≥ 0},

such that the recourse function

f(x) := min{q^T y | Wy = h − Tx, y ≥ 0}

is finite whenever the recourse constraints are feasible.


Otherwise, we define the recourse function as f(x) = +∞ if {y | Wy = h − Tx, y ≥ 0} = ∅. Then we have

Proposition 5.1 The recourse function f(x), defined on the bounded set B := {x | Ax = b, x ≥ 0, ∃y ≥ 0: Tx + Wy = h} ≠ ∅, is piecewise linear, convex, and bounded below.

Proof: By our assumptions, with B_1 = {x | Ax = b, x ≥ 0} it follows that B := B_1 ∩ {x | ∃y ≥ 0: Wy = h − Tx} ≠ ∅ is bounded. Since {x | ∃y ≥ 0: Wy = h − Tx} is the projection of the convex polyhedral set {(x, y) | Tx + Wy = h, y ≥ 0} in (x, y)-space onto the x-space, it is convex polyhedral. Hence B, as the intersection of a convex polyhedron with a convex polyhedral set, is a convex polyhedron, and for x ∈ B it holds:


f(x) = q_{B_W}^T B_W^{−1}(h − Tx) if B_W^{−1}(h − Tx) ≥ 0,

where B_W is an appropriate optimal basis out of W and q_{B_W} the corresponding subvector of q. Hence, f(x) is piecewise linear and bounded below on B. Finally, with x^1, x^2 ∈ B such that f(x^i), i = 1, 2, is finite, and with corresponding recourse solutions y^1, y^2 satisfying

f(x^i) = q^T y^i and Wy^i = h − Tx^i, y^i ≥ 0, i = 1, 2,

for arbitrary λ ∈ (0, 1) and x̂ = λx^1 + (1 − λ)x^2 it follows that

λy^1 + (1 − λ)y^2 ∈ {y | Wy = h − Tx̂, y ≥ 0}

⟹ f(x̂) = min{q^T y | Wy = h − Tx̂, y ≥ 0} ≤ q^T(λy^1 + (1 − λ)y^2) = λf(x^1) + (1 − λ)f(x^2)

⟹ the convexity of f(x) on its effective domain dom f = B. □


Obviously, with the recourse function f(x), the LP (5.4) can be rewritten equivalently as an NLP:

min{c^T x + f(x)}
s.t. Ax = b,
     x ≥ 0,

or else as

(5.5) min{c^T x + θ}
      s.t. Ax = b,
           θ − f(x) ≥ 0,
           x ≥ 0.


However, this may not yet help a lot, since we do not know the (convex polyhedral) recourse function f(x) explicitly. In other terms: f(x) being bounded below, piecewise linear and convex on B implies the existence of finitely many linear functions ϕ_ν(x), ν = 1, …, L, such that, on dom f = B, it holds that f(x) = max_{ν∈{1,…,L}} ϕ_ν(x).

Hence, to reduce the feasible set of (5.5) to dom f = B, it may be necessary to add some further linear constraints ψ_1(x) ≥ 0, …, ψ_K(x) ≥ 0 (observe that the polyhedron B is defined by finitely many linear constraints) to achieve feasibility of the recourse problem, such that instead of (5.5) we get an equivalent LP:


(5.6) min{c^T x + θ}
      s.t. Ax = b,
           θ − ϕ_ν(x) ≥ 0, ν = 1, …, L,
           ψ_µ(x) ≥ 0, µ = 1, …, K,
           x ≥ 0.

Also in this case, we do not know in advance the linear constraints needed for the exact coincidence of (5.6) with the original LP (5.4). Therefore, the idea of the following procedure is to generate successively those additional constraints needed to approximate (and finally to hit) the solution of the original LP (5.4).


The Dual Decomposition Algorithm

S 1 Initialization
Find a lower bound θ_0 for

min{q^T y | Ax = b, Tx + Wy = h, x ≥ 0, y ≥ 0}

and solve the LP

min{c^T x + θ | Ax = b, x ≥ 0, θ ≥ θ_0},

yielding the solution (x̂, θ̂). Define

B_0 = {(x, θ) | Ax = b, x ≥ 0, θ ∈ IR}; B_1 = IR^n × {θ | θ ≥ θ_0}.

S 2 Evaluate the recourse function
For f(x̂) = min{q^T y | Wy = h − Tx̂, y ≥ 0}, solve the dual LP

f(x̂) = max{(h − Tx̂)^T u | W^T u ≤ q}.

If f(x̂) = +∞, then go to step S 3, else to S 4.


S 3 The feasibility cut
x̂ is infeasible for (5.4). In this case there exists an unbounded growth direction û (revealed by the simplex algorithm) such that W^T û ≤ 0 and (h − Tx̂)^T û > 0, whereas for any feasible x of (5.4) there exists some y ≥ 0 such that Wy = h − Tx.

Multiplying this equation by û yields the inequality

û^T(h − Tx) = û^T Wy ≤ 0,

which has to hold for any feasible x but is violated by x̂. Therefore we redefine B_1 := B_1 ∩ {(x, θ) | û^T(h − Tx) ≤ 0}, such that the infeasible x̂ is cut off, and go on to step S 5.


S 4 The optimality cut
f(x̂) finite ⟹ there exists a dual optimal feasible basic solution û of the recourse problem determined in step S 2, such that

f(x̂) = (h − Tx̂)^T û,

whereas for an arbitrary x we have

f(x) = sup{(h − Tx)^T u | W^T u ≤ q} ≥ (h − Tx)^T û.

Therefore, the inequality θ − f(x) ≥ 0 in (5.5) implies

θ ≥ û^T(h − Tx).

If this holds for (x̂, θ̂), stop; x := x̂ is optimal. Otherwise redefine B_1 := B_1 ∩ {(x, θ) | θ ≥ û^T(h − Tx)}, thus cutting off the non-optimal (x̂, θ̂), and go on to step S 5.


S 5 Solve the updated LP

min{c^T x + θ | (x, θ) ∈ B_0 ∩ B_1},

called the master program, yielding the optimal solution (x̃, θ̃). With (x̂, θ̂) := (x̃, θ̃) return to step S 2.
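A runnable sketch of S 1–S 5 for a tiny instance of (5.4); all data are hypothetical. Here W = (1, −1) yields complete recourse ({Wy | y ≥ 0} = IR), so only optimality cuts occur, and θ_0 = 0 is a valid lower bound since q ≥ 0:

# Sketch of Benders' dual decomposition for (5.4) with hypothetical data.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 3.0]); q = np.array([1.0, 2.0])
A_eq = np.array([[1.0, 1.0]]); b_eq = np.array([1.0])    # Ax = b, x >= 0
T = np.array([[1.0, 2.0]]); h = np.array([1.5]); W = np.array([[1.0, -1.0]])

cuts_u = []                      # dual vectors of the optimality cuts
for it in range(50):
    # master: min c^T x + theta over B0 and the cuts theta >= u^T(h - Tx)
    obj = np.append(c, 1.0)
    A_ub = np.array([np.append(-(u @ T), -1.0) for u in cuts_u]) if cuts_u else None
    b_ub = np.array([-(u @ h) for u in cuts_u]) if cuts_u else None
    m = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                A_eq=np.hstack([A_eq, [[0.0]]]), b_eq=b_eq,
                bounds=[(0, None), (0, None), (0.0, None)])  # theta >= theta_0 = 0
    x, theta = m.x[:-1], m.x[-1]
    # recourse: evaluate f(x) via the dual  max{(h - Tx)^T u | W^T u <= q}
    d = linprog(-(h - T @ x), A_ub=W.T, b_ub=q, bounds=[(None, None)])
    u, f = d.x, (h - T @ x) @ d.x
    if theta >= f - 1e-9:        # optimality cut already satisfied: stop
        break
    cuts_u.append(u)             # add the optimality cut for the next master

print(x, c @ x + f)              # optimal first-stage x and total cost

On this instance the method stops after two cycles with x = (1, 0).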

Proposition 5.2 Given the above assumptions, the dual decomposition algorithm yields an optimal first-stage solution x̂ of (5.4) after finitely many cycles.


pro memoriam:

Prop. 5.1: f(x) is piecewise linear, convex, and bounded below on B = {x | Ax = b, x ≥ 0, ∃y ≥ 0: Tx + Wy = h} ≠ ∅.

Proof: According to Prop. 5.1, the lower bound θ_0 of min{q^T y | Ax = b, Tx + Wy = h, x ≥ 0, y ≥ 0} as required in S 1 exists (e.g., by weak duality, θ_0 := b^T w + h^T u for any (w, u) ∈ {(w, u) | A^T w + T^T u ≤ 0, W^T u ≤ q}).

Due to the solvability of (5.4), the dual constraints W^T u ≤ q are feasible, and they are independent of x. Hence the dual representation of f(x̂) in S 2 is always feasible, implying that f(x̂) is either finite or equal to +∞, the latter indicating primal infeasibility.


If f(x̂) = max{(h − Tx̂)^T u | W^T u ≤ q} = +∞, then

∃û: W^T û ≤ 0 and (h − Tx̂)^T û > 0.

We may assume that û is one of the finitely many generating elements of the cone {u | W^T u ≤ 0}, as we can get them in the simplex algorithm.

Since the cone {u | W^T u ≤ 0} is finitely generated, we shall add at most finitely many constraints of the type û^T(h − Tx) ≤ 0 before we have finite recourse in all further cycles.

If f(x̂) is finite, we may assume that f(x̂) = (h − Tx̂)^T û with û an optimal dual feasible basic solution, of which there are only finitely many; hence at most finitely many constraints θ ≥ û^T(h − Tx) are added. Thus, after finitely many cycles, with the solution (x̂, θ̂) in S 5, we must get in the subsequent step S 4 that θ̂ ≥ û^T(h − Tx̂) = f(x̂).


pro memoriam:

(5.5) min{c^T x + θ | Ax = b, θ − f(x) ≥ 0, x ≥ 0}

Due to the facts that

a) the feasible set of (5.5) is contained in the feasible set B_0 ∩ B_1 of the last master program in step S 5, solved by (x̂, θ̂), and that

b) this solution (x̂, θ̂) is obviously feasible for (5.5),

it follows for any solution (x̃, θ̃) of (5.5) that

c^T x̃ + θ̃ = c^T x̃ + f(x̃)
           ≥ c^T x̂ + θ̂   (due to a))
           ≥ c^T x̃ + θ̃   (due to b)).

Hence, x̂ is a first-stage solution of (5.4). □


Observe that, whenever we have x̂ ∈ dom f with the stopping criterion not satisfied, we have to add in S 4 a linear constraint of the type θ ≥ φ(x) := γ + g^T x, where γ = û^T h and g = −T^T û, with g ∈ ∂f(x̂), the subdifferential of f at x̂.

Hence φ(x) is a linear lower bound of f on dom f such that φ(x̂) = f(x̂).

This is illustrated in Fig. 5.2.


Figure 5.2: Dual decomposition: Optimality cuts (linear supports Φ(x) of f(x) at iterates x^1, x^2, x^3).


Let us now consider, instead of (5.4), the two-stage SLP with S > 1 realizations, given as

(5.7) min c^T x + Σ_{j=1}^S p_j q^T y^j
      s.t. Ax = b,
           T^j x + Wy^j = h^j, j = 1, …, S,
           x ≥ 0, y^j ≥ 0, j = 1, …, S.


This is equivalent to the NLP

(5.8) min{c^T x + Σ_{j=1}^S p_j θ_j}
      s.t. Ax = b,
           θ_j − f_j(x) ≥ 0, j = 1, …, S,
           x ≥ 0,

with the recourse functions

f_j(x) = min{q^T y^j | Wy^j = h^j − T^j x, y^j ≥ 0}, j = 1, …, S.

Then we can modify the above dual decomposition algorithm as follows:


Dual Decomposition – Multicut Version

S 1 Initialization
Find, for j = 1, …, S, lower bounds θ̲_j for

min{q^T y^j | Ax = b, T^j x + Wy^j = h^j, x ≥ 0, y^j ≥ 0}

and, with p = (p_1, …, p_S)^T, θ = (θ_1, …, θ_S)^T and θ̲ = (θ̲_1, …, θ̲_S)^T, solve the LP

min{c^T x + p^T θ | Ax = b, x ≥ 0, θ ≥ θ̲},

yielding the solution (x̂, θ̂). Define

B_0 = {(x, θ) | Ax = b, x ≥ 0, θ ∈ IR^S} and B_1 = IR^n × {θ | θ ≥ θ̲}.


S 2 Evaluate the recourse functions
To get f_j(x̂) = min{q^T y^j | Wy^j = h^j − T^j x̂, y^j ≥ 0}, solve the dual LP's

f_j(x̂) = max{(h^j − T^j x̂)^T u^j | W^T u^j ≤ q}, j = 1, …, S.

If f_j(x̂) = +∞ for at least one j, then go to step S 3, else to S 4.

S 3 Feasibility cuts
We have f_j(x̂) = +∞ for j ∈ J ≠ ∅, implying that x̂ is infeasible for (5.8). In this case again there exist unbounded growth directions û^j, j ∈ J, such that W^T û^j ≤ 0 and (h^j − T^j x̂)^T û^j > 0 ∀j ∈ J, whereas for any feasible x of (5.8) there exist y^j ≥ 0 such that Wy^j = h^j − T^j x. Hence it follows that

û^jT(h^j − T^j x) = û^jT Wy^j ≤ 0.


S 3 Feasibility cuts (contd.)
Thus, the inequalities û^jT(h^j − T^j x) ≤ 0, j ∈ J, have to hold for any feasible x, but are violated by x̂. Redefine

B_1 := B_1 ∩ {(x, θ) | û^jT(h^j − T^j x) ≤ 0, j ∈ J},

such that the infeasible x̂ is cut off, and go on to step S 5.

S 4 Optimality cuts
Since f_j(x̂) is finite for all j = 1, …, S, for the recourse problems there exist dual optimal feasible basic solutions û^j, determined in step S 2, such that

f_j(x̂) = (h^j − T^j x̂)^T û^j,

whereas for an arbitrary x we have:


S 4 Optimality cuts (contd.)

\[
f_j(x) = \sup\{(h^j - T^j x)^T u^j \mid W^T u^j \le q\} \ge (h^j - T^j x)^T\hat u^j.
\]

Therefore, the constraints $\theta_j - f_j(x) \ge 0$ in (5.8) imply
\[
\theta_j \ge \hat u^{jT}(h^j - T^j x).
\]

If $f_j(\hat x) \le \hat\theta_j\ \forall j$, stop; $\hat x$ is an optimal first stage solution. If $f_j(\hat x) > \hat\theta_j$ for $j \in J \neq \emptyset$, cut off $(\hat x, \hat\theta)$ by redefining
$\mathcal B_1 := \mathcal B_1 \cap \{(x,\theta) \mid \theta_j \ge \hat u^{jT}(h^j - T^j x) \text{ for } j \in J\}$, and go on to S 5.

S 5 Solve the updated master program

\[
\min\{c^T x + p^T\theta \mid (x,\theta) \in \mathcal B_0 \cap \mathcal B_1\},
\]
yielding $(\tilde x, \tilde\theta)$. With $(\hat x, \hat\theta) := (\tilde x, \tilde\theta)$ return to step S 2.


This multicut version of the dual decomposition method for solving the two-stage SLP (5.7) or its equivalent NLP (5.8) is due to Birge (see Birge–Louveaux [2]). Similarly to Prop. 5.2, the multicut method can also be shown to yield an optimal first stage solution after finitely many cycles.
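To make the cycle concrete, here is a minimal numerical sketch of the multicut method, not the method's reference implementation: it assumes relatively complete recourse (so step S 3 never fires and only optimality cuts arise), uses a crude uniform lower bound for the $\theta_j$, and takes SciPy's linprog as LP solver; all names and the data layout are our own.

```python
import numpy as np
from scipy.optimize import linprog

def multicut_lshaped(c, A, b, W, q, scenarios, theta_lb=-1e6, tol=1e-7, max_iter=100):
    """Multicut dual decomposition sketch for (5.7)/(5.8).

    scenarios: list of (p_j, T_j, h_j).  Relatively complete recourse is
    assumed (every f_j(x) finite), so no feasibility cuts are needed.
    """
    n, S = len(c), len(scenarios)
    cuts = [[] for _ in range(S)]          # scenario j: pairs (g, r) with g@x + theta_j >= r

    for _ in range(max_iter):
        # S 1 / S 5: master  min c@x + sum_j p_j theta_j  over B0 ∩ B1
        obj = np.concatenate([c, [p for p, _, _ in scenarios]])
        A_eq = np.hstack([A, np.zeros((A.shape[0], S))])
        A_ub, b_ub = [], []
        for j, cj in enumerate(cuts):
            for g, r in cj:                # g@x + theta_j >= r  <=>  -g@x - theta_j <= -r
                row = np.zeros(n + S)
                row[:n], row[n + j] = -g, -1.0
                A_ub.append(row)
                b_ub.append(-r)
        res = linprog(obj, A_ub=np.array(A_ub) if A_ub else None,
                      b_ub=np.array(b_ub) if b_ub else None, A_eq=A_eq, b_eq=b,
                      bounds=[(0, None)] * n + [(theta_lb, None)] * S, method="highs")
        x_hat, th_hat = res.x[:n], res.x[n:]

        # S 2: evaluate f_j(x_hat) via the dual  max (h_j - T_j x_hat)@u  s.t. W^T u <= q
        optimal = True
        for j, (p, T, h) in enumerate(scenarios):
            rhs = h - T @ x_hat
            dual = linprog(-rhs, A_ub=W.T, b_ub=q,
                           bounds=[(None, None)] * W.shape[0], method="highs")
            u, f_j = dual.x, -dual.fun
            if f_j > th_hat[j] + tol:      # S 4: add optimality cut theta_j >= u@(h_j - T_j x)
                cuts[j].append((T.T @ u, u @ h))
                optimal = False
        if optimal:                        # f_j(x_hat) <= theta_j for all j
            return x_hat
    return x_hat
```

Feasibility cuts along the lines of step S 3 (obtained from dual rays of infeasible subproblems) would extend the same loop to the general case.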

Instead of introducing $S$ variables $\theta_j$ as in the multicut version, we may also get along with just one additional variable $\theta$: instead of (5.8) we deal, again equivalently to the SLP (5.7), with the NLP

(5.9)
\[
\begin{aligned}
\min\;& c^T x + \theta \\
\text{s.t. } & Ax = b, \\
& \theta - \sum_{j=1}^{S} p_j f_j(x) \ge 0, \\
& x \ge 0.
\end{aligned}
\]


In step S 3 we add feasibility cuts to $\mathcal B_1$ as long as we find $f_j(\hat x) = +\infty$ for at least one $j$. In step S 4, where all recourse function values are finite with $f_j(\hat x) = (h^j - T^j\hat x)^T\hat u^j$, we

either add the optimality cut
\[
\theta \ge \sum_{j=1}^{S} p_j\,\hat u^{jT}(h^j - T^j x)
\]
to $\mathcal B_1$ if $\hat\theta < \sum_{j=1}^{S} p_j\,\hat u^{jT}(h^j - T^j\hat x)$, and then go on to solve the master program in step S 5;

or else, if $\hat\theta \ge \sum_{j=1}^{S} p_j\,\hat u^{jT}(h^j - T^j\hat x)$, we stop with $\hat x$ as an optimal first stage solution.

This L-shaped method was introduced by Van Slyke–Wets and is described in detail in Birge–Louveaux [2] as well.


5.2 The Multi-Stage Case

In (2.9) (Sec. 2), the general multi-stage SLP with fixed recourse was introduced. In applications, there often arises a specialized data structure as follows:

(5.10)
\[
\begin{aligned}
\min\;& \Big\{c_1^T x_1 + \mathbb E_\zeta\Big[\sum_{t=2}^{T} c_t^T x_t(\zeta_t)\Big]\Big\} \\
\text{s.t. } & W_1 x_1 = b_1, \\
& T_2(\zeta_2)\,x_1 + W_2\, x_2(\zeta_2) = b_2(\zeta_2) \quad\text{a.s.}, \\
& T_3(\zeta_3)\,x_2(\zeta_2) + W_3\, x_3(\zeta_3) = b_3(\zeta_3) \quad\text{a.s.}, \\
& \qquad\vdots \\
& T_T(\zeta_T)\,x_{T-1}(\zeta_{T-1}) + W_T\, x_T(\zeta_T) = b_T(\zeta_T) \quad\text{a.s.}, \\
& x_1 \ge 0,\; x_t(\zeta_t) \ge 0 \ \text{a.s.} \ \forall\, t \ge 2.
\end{aligned}
\]


In analogy to (2.9) (Sec. 2), the matrices $W_1, W_2, \dots, W_T$ as well as the vectors $b_1$ and $c_1,\dots,c_T$ are assumed to be deterministic; $\xi_2,\dots,\xi_T$, and therefore also $\zeta_t = (\xi_2,\dots,\xi_t)$, $t = 2,\dots,T$, are random vectors with given distributions. Furthermore, since in stage $t$ with $2 \le t \le T$
\[
T_t(\zeta_t)\,x_{t-1}(\zeta_{t-1}) + W_t\, x_t(\zeta_t) = b_t(\zeta_t) \quad\text{a.s.},
\]
obviously for any particular realization $\zeta_t = (\zeta_{t-1}, \xi_t)$ the decision $x_{t-1}(\cdot)$ is the one determined for the corresponding subpath up to stage $t-1$ and hence equals $x_{t-1}(\zeta_{t-1})$.

If in particular the random vector $\xi := (\xi_2,\dots,\xi_T)$ has a finite discrete distribution $\{\xi^s,\ P_\xi(\xi = \xi^s) = q_s;\ s \in \mathcal S := \{1,\dots,S\}\}$, the process $\{\zeta_t;\ t = 2,\dots,T\}$ can be represented on a scenario tree:


Node $n = 1$ in stage 1 corresponds to the assumed deterministic state at the beginning of the process; in stage 2 we have the nodes $n = 2,\dots,K_2$, corresponding to one of the different subpaths $\zeta_2^{s(n)}$ contained in the scenarios $\xi^1,\dots,\xi^S$, endowed with the probability
\[
p_n = \sum_{s\in\mathcal S} \big\{q_s \,\big|\, \zeta_2^s = \zeta_2^{s(n)}\big\};
\]
in stage 3 there are the nodes $n = K_2+1,\dots,K_3$, corresponding to one of the different subpaths $\zeta_3^{s(n)}$ contained in $\{\xi^s;\ s \in \mathcal S\}$, with the probabilities
\[
p_n = \sum_{s\in\mathcal S} \big\{q_s \,\big|\, \zeta_3^s = \zeta_3^{s(n)}\big\};
\]
and so on.

As an example of a scenario tree, see the four-stage case in Fig. 5.3 with 10 scenarios.



Figure 5.3: Four-stage scenario tree


By construction, any node $n$ in some stage $t_n \ge 2$ has exactly one predecessor (node) $h_n$ in stage $t_n - 1$, whereas each node $n$ in stage $t_n < T$ has a nonempty finite set $C(n)$ of successors (nodes in stage $t_n + 1$), also called the children of $n$.

For any node $n$ in stage $t_n \ge 2$ (i.e. $K_{t_n-1} < n \le K_{t_n}$) we shall use the shorthand $T_n, x_n, b_n$ instead of $T_{t_n}(\zeta_{t_n}^{s(n)}), x_{t_n}(\zeta_{t_n}^{s(n)}), b_{t_n}(\zeta_{t_n}^{s(n)})$, respectively.
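For the bookkeeping on the tree, a small data structure suffices. The following sketch is our own illustration (hypothetical names, not from the text): it stores for each node its stage $t_n$, predecessor $h_n$ and probability $p_n$, and recovers the children sets $C(n)$ from the predecessor links.

```python
from dataclasses import dataclass

@dataclass
class TreeNode:
    stage: int          # t_n
    pred: int | None    # h_n; None for the root node n = 1
    prob: float         # p_n
    # the node data T_n, b_n would be attached here as well

def children(nodes: dict[int, TreeNode]) -> dict[int, list[int]]:
    """C(n): the successor sets, recovered from the predecessor links h_n."""
    C = {n: [] for n in nodes}
    for n, node in nodes.items():
        if node.pred is not None:
            C[node.pred].append(n)
    return C

# tiny three-stage example: root 1; stage-2 nodes 2, 3; stage-3 nodes 4, 5, 6
nodes = {1: TreeNode(1, None, 1.0),
         2: TreeNode(2, 1, 0.4), 3: TreeNode(2, 1, 0.6),
         4: TreeNode(3, 2, 0.4), 5: TreeNode(3, 3, 0.3), 6: TreeNode(3, 3, 0.3)}
assert children(nodes)[3] == [5, 6]
```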

Now we can rewrite problem (5.10) as the following optimization problem on the corresponding scenario tree:


(5.11)
\[
\begin{aligned}
\min\;& \Big\{c_1^T x_1 + \sum_{n=2}^{K_2} p_n c_2^T x_n + \sum_{n=K_2+1}^{K_3} p_n c_3^T x_n + \dots + \sum_{n=K_{T-1}+1}^{K_T} p_n c_T^T x_n\Big\} \\
\text{s.t. } & W_1 x_1 = b_1, \\
& T_n x_1 + W_2 x_n = b_n, \quad 1 < n \le K_2, \\
& T_n x_{h_n} + W_3 x_n = b_n, \quad K_2 < n \le K_3, \\
& \qquad\vdots \\
& T_n x_{h_n} + W_T x_n = b_n, \quad K_{T-1} < n \le K_T, \\
& x_n \ge 0, \quad n = 1,\dots,K_T.
\end{aligned}
\]
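Problem (5.11) is an ordinary block-structured LP, so for moderate tree sizes it can simply be assembled and handed to a generic LP solver. The following dense sketch is our own helper, under an assumed data layout (node tuples $(t_n, h_n, p_n, T_n, b_n)$, with $T_n$ absent at the root, and stage data $W_t$, $c_t$):

```python
import numpy as np

def extensive_form(nodes, W, c):
    """Assemble objective and equality system of (5.11) as one dense LP.

    Assumed layout (ours, not from the text):
    nodes: dict n -> (t_n, h_n, p_n, T_n, b_n); root has h_n = None, p_n = 1.
    W[t], c[t]: stage recourse matrix and cost vector for stage t.
    """
    order = sorted(nodes)                               # n = 1, ..., K_T
    dim = {n: W[nodes[n][0]].shape[1] for n in order}   # size of x_n
    off, total = {}, 0
    for n in order:
        off[n] = total
        total += dim[n]
    obj = np.zeros(total)
    blocks, rhs = [], []
    for n in order:
        t, h, p, T, b_n = nodes[n]
        obj[off[n]:off[n] + dim[n]] = p * c[t]          # p_n c_t^T x_n
        row = np.zeros((W[t].shape[0], total))
        row[:, off[n]:off[n] + dim[n]] = W[t]           # W_t x_n
        if h is not None:
            row[:, off[h]:off[h] + dim[h]] = T          # + T_n x_{h_n}
        blocks.append(row)
        rhs.append(b_n)                                 # b_n (b_1 at the root)
    return obj, np.vstack(blocks), np.concatenate(rhs)
```

The returned triple can be passed to, e.g., scipy.optimize.linprog with bounds (0, None) to solve (5.11) directly.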


This problem corresponds to the following sequence of programs: for node $n = 1$

(5.12)
\[
\begin{aligned}
F_1 = \min\;& c_1^T x_1 + \sum_{n=2}^{K_2} p_n F_n(x_1) \\
\text{s.t. } & W_1 x_1 = b_1, \quad x_1 \ge 0;
\end{aligned}
\]

then for each node in stage 2, i.e. for n = 2, · · · , K2,

(5.13)
\[
\begin{aligned}
F_n(x_1) = \min\;& c_2^T x_n + \sum_{m\in C(n)} \frac{p_m}{p_n}\, F_m(x_n) \\
\text{s.t. } & W_2 x_n = b_n - T_n x_1, \quad x_n \ge 0;
\end{aligned}
\]


and in general for any node $n$ in stage $t_n \in \{3,\dots,T-1\}$,

(5.14)
\[
\begin{aligned}
F_n(x_{h_n}) = \min\;& c_{t_n}^T x_n + \sum_{m\in C(n)} \frac{p_m}{p_n}\, F_m(x_n) \\
\text{s.t. } & W_{t_n} x_n = b_n - T_n x_{h_n}, \quad x_n \ge 0;
\end{aligned}
\]

finally, for nodes $n$ in stage $t_n = T$, i.e. $n \in \{K_{T-1}+1,\dots,K_T\}$,

(5.15)
\[
\begin{aligned}
F_n(x_{h_n}) = \min\;& c_T^T x_n \\
\text{s.t. } & W_T x_n = b_n - T_n x_{h_n}, \quad x_n \ge 0.
\end{aligned}
\]


For $n$ with $t_n = T$ it is obvious from (5.15) that $F_n(x_{h_n})$ is piecewise linear and convex in $x_{h_n}$ for all $n \in \{K_{T-1}+1,\dots,K_T\}$. Then, going backwards through stages $T-1, T-2, \dots, 2$, it follows immediately from (5.14) that the additive terms $\sum_{m\in C(n)} \frac{p_m}{p_n} F_m(x_n)$ are piecewise linear and convex in $x_n$, implying that the functions $F_n(x_{h_n})$ are piecewise linear and convex as well, such that by (5.13) also $\sum_{n=2}^{K_2} p_n F_n(x_1)$ in (5.12) is piecewise linear and convex in $x_1$.


5.2.1 Nested Decomposition

Consider now problem (5.14) for some $n$ with $t_n \ge 2$, and assume that we have for the additive term $\sum_{m\in C(n)} \frac{p_m}{p_n} F_m(x_n)$, due to its piecewise linearity, an upper bound $\theta_n$, characterized by some additional linear constraints


\[
d_{nk}^T x_n + \theta_n \ge \delta_{nk}, \quad k = 1,\dots,s_n,
\]
and assume, in case, some further linear constraints
\[
a_{nj}^T x_n \ge \alpha_{nj}, \quad j = 1,\dots,r_n,
\]
necessary to induce the feasibility of one of the LPs for some node $m \in C(n)$, such that (5.14) now reads as

(5.16)
\[
\begin{aligned}
F_n(x_{h_n}) = \min\;& c_{t_n}^T x_n + \theta_n \\
\text{s.t. } & W_{t_n} x_n = b_n - T_n x_{h_n}, \\
& a_{nj}^T x_n \ge \alpha_{nj}, \quad j = 1,\dots,r_n, \\
& d_{nk}^T x_n + \theta_n \ge \delta_{nk}, \quad k = 1,\dots,s_n, \\
& x_n \ge 0.
\end{aligned}
\]


The infeasibility of (5.16) at some predetermined $x_{h_n}$ is equivalent to the infeasibility of the constraints
\[
\{W_{t_n} x_n = b_n - T_n x_{h_n};\; a_{nj}^T x_n \ge \alpha_{nj},\ j = 1,\dots,r_n;\; x_n \ge 0\},
\]
since for any $x_n$ feasible to them there trivially exists a feasible $\theta_n$ as well. Hence, infeasibility of (5.16) at $x_{h_n}$ implies

(5.17)
\[
\begin{aligned}
\max\;& (b_n - T_n x_{h_n})^T u_n + \sum_{j=1}^{r_n} \alpha_{nj} v_j \;\longrightarrow\; +\infty \\
\text{s.t. } & W_{t_n}^T u_n + \sum_{j=1}^{r_n} a_{nj} v_j \le c_{t_n}, \\
& v_j \ge 0 \quad \forall j,
\end{aligned}
\]


and therefore the existence of $\hat u_n$ and $\hat v_j \ge 0$ satisfying
\[
W_{t_n}^T\hat u_n + \sum_{j=1}^{r_n} a_{nj}\hat v_j \le 0
\quad\text{and}\quad
(b_n - T_n x_{h_n})^T\hat u_n + \sum_{j=1}^{r_n} \alpha_{nj}\hat v_j > 0,
\]
such that, analogously to Benders' algorithm, for the feasibility of (5.16) the feasibility cut
\[
a_{h_n}^T x_{h_n} \ge \alpha_{h_n}
\]
with $a_{h_n} := T_n^T\hat u_n$ and $\alpha_{h_n} := b_n^T\hat u_n + \sum_{j=1}^{r_n} \alpha_{nj}\hat v_j$ has to be added to the LP for node $h_n$ at stage $t_n - 1$.

Assume now that we have a solution $(\hat x_n, \hat\theta_n)$ of (5.16). Then for any one of the successor problems $m \in C(n)$ in stage $t_m = t_n + 1$ we have


(5.18)
\[
\begin{aligned}
F_m(\hat x_n) = \min\;& c_{t_m}^T x_m + \theta_m \\
\text{s.t. } & W_{t_m} x_m = b_m - T_m\hat x_n, \\
& a_{mj}^T x_m \ge \alpha_{mj}, \quad j = 1,\dots,r_m, \\
& d_{mk}^T x_m + \theta_m \ge \delta_{mk}, \quad k = 1,\dots,s_m, \\
& x_m \ge 0.
\end{aligned}
\]

Assume that the corresponding dual LPs


(5.19)
\[
\begin{aligned}
\max\;& (b_m - T_m\hat x_n)^T u_m + \sum_{j=1}^{r_m} \alpha_{mj} v_{mj} + \sum_{k=1}^{s_m} \delta_{mk} w_{mk} \\
\text{s.t. } & W_{t_m}^T u_m + \sum_{j=1}^{r_m} a_{mj} v_{mj} + \sum_{k=1}^{s_m} d_{mk} w_{mk} \le c_{t_m}, \\
& \sum_{k=1}^{s_m} w_{mk} \le 1, \\
& v_{mj}, w_{mk} \ge 0 \quad \forall j, k,
\end{aligned}
\]
have solutions $(\hat u_m, \hat v_{mj}, \hat w_{mk})$ such that for all $m \in C(n)$
\[
F_m(\hat x_n) = (b_m - T_m\hat x_n)^T\hat u_m + \sum_{j=1}^{r_m} \alpha_{mj}\hat v_{mj} + \sum_{k=1}^{s_m} \delta_{mk}\hat w_{mk}.
\]


If it turns out that $\hat\theta_n < \sum_{m\in C(n)} \frac{p_m}{p_n} F_m(\hat x_n)$, then $\hat\theta_n$ fails to be an upper bound for this expression. To enforce the upper bound property of $\theta_n$, for $\hat x_n$ at least, as in the L-shaped method we have to add the constraint
\[
\theta_n \ge \sum_{m\in C(n)} \frac{p_m}{p_n}\Big[(b_m - T_m x_n)^T\hat u_m + \sum_{j=1}^{r_m}\alpha_{mj}\hat v_{mj} + \sum_{k=1}^{s_m}\delta_{mk}\hat w_{mk}\Big]
\]
to (5.16), or equivalently
\[
d_{nk}^T x_n + \theta_n \ge \delta_{nk}, \quad k = s_n + 1,
\]
with
\[
d_{nk} = \sum_{m\in C(n)} \frac{p_m}{p_n}\, T_m^T\hat u_m
\quad\text{and}\quad
\delta_{nk} = \sum_{m\in C(n)} \frac{p_m}{p_n}\Big[b_m^T\hat u_m + \sum_{j=1}^{r_m}\alpha_{mj}\hat v_{mj} + \sum_{k=1}^{s_m}\delta_{mk}\hat w_{mk}\Big].
\]

We shall refer, according to (5.16), to the particular LP assigned to node $n$ as

(LP$_n$)
\[
\begin{aligned}
\min\;& c_{t_n}^T x_n + \theta_n \\
\text{s.t. } & W_{t_n} x_n = b_n - T_n x_{h_n}, \\
& a_{nj}^T x_n \ge \alpha_{nj}, \quad j = 1,\dots,r_n, \\
& d_{nk}^T x_n + \theta_n \ge \delta_{nk}, \quad k = 1,\dots,s_n, \\
& x_n \ge 0,
\end{aligned}
\]

and to its dual as


(DLP$_n$)
\[
\begin{aligned}
\max\;& (b_n - T_n x_{h_n})^T u_n + \sum_{j=1}^{r_n} \alpha_{nj} v_{nj} + \sum_{k=1}^{s_n} \delta_{nk} w_{nk} \\
\text{s.t. } & W_{t_n}^T u_n + \sum_{j=1}^{r_n} a_{nj} v_{nj} + \sum_{k=1}^{s_n} d_{nk} w_{nk} \le c_{t_n}, \\
& \sum_{k=1}^{s_n} w_{nk} \le 1, \\
& v_{nj}, w_{nk} \ge 0 \quad \forall j, k,
\end{aligned}
\]
where for $n = 1$ we have $x_{h_1} \equiv 0$; and for every $n$ with $t_n = T$ there are no feasibility cuts and therefore no variables $v_{nj}$; furthermore $\theta_n \equiv 0$, hence there are no variables $w_{nk}$ in (DLP$_n$).

Based on the above considerations, we can now formulate the


Nested Decomposition Algorithm

S 0 Initialization

Let $r_n = 0$, $s_n = 0$, $\theta_n = 0\ \forall n \in \{1,\dots,K_T\}$ and set $n := 1$. (Observe that for $n \in \{K_{T-1}+1,\dots,K_T\}$ we will have $r_n = s_n = 0$ as well as $\theta_n \equiv 0$ throughout the procedure.)

S 1 Solve (LPn).

S 2 If the problem (LP$_n$) is infeasible and $n = 1$, then stop (problem (5.11) is infeasible);

else, if $n > 1$, then let $r_{h_n} := r_{h_n} + 1$ and determine from (DLP$_n$) a direction $\hat u_n$ and $\hat v_{nj} \ge 0$ (with $\hat w_{nk} = 0$) such that
\[
W_{t_n}^T\hat u_n + \sum_{j=1}^{r_n} a_{nj}\hat v_j \le 0
\quad\text{and}\quad
(b_n - T_n x_{h_n})^T\hat u_n + \sum_{j=1}^{r_n} \alpha_{nj}\hat v_j > 0
\]
(with $r_n \equiv 0$ if $t_n = T$).


Add to (LP$_{h_n}$) the feasibility cut
\[
a_{h_n}^T x_{h_n} \ge \alpha_{h_n}
\]
with $a_{h_n} := T_n^T\hat u_n$ and $\alpha_{h_n} := b_n^T\hat u_n + \sum_{j=1}^{r_n} \alpha_{nj}\hat v_j$. Let $n := h_n$ and return to S 1.

Otherwise, let $(\hat x_n, \hat\theta_n)$ be a solution of (LP$_n$), and determine a solution $(\hat u_n, \hat v_{nj}, \hat w_{nk})$ of (DLP$_n$).

If $n < K_T$, then let $n := n + 1$ and go back to S 1; else, i.e. if $n = K_T$, let $\tilde n := 0$ and go on to S 3 with $n := K_{T-2} + 1$ and $t := T - 1$.


S 3 Let $k := s_n + 1$ and compute
\[
d_{nk} = \sum_{m\in C(n)} \frac{p_m}{p_n}\, T_m^T\hat u_m
\quad\text{and}\quad
\delta_{nk} = \sum_{m\in C(n)} \frac{p_m}{p_n}\Big[b_m^T\hat u_m + \sum_{j=1}^{r_m}\alpha_{mj}\hat v_{mj} + \sum_{k=1}^{s_m}\delta_{mk}\hat w_{mk}\Big]
\]
(with $r_m = s_m = 0$ if $t_m = T$).

If $s_n = 0$, then drop the constraint $\theta_n = 0$; in this case, as well as if $s_n > 0$ and $d_{nk}^T\hat x_n + \hat\theta_n < \delta_{nk}$, let $s_n := s_n + 1$, add the optimality cut
\[
d_{nk}^T x_n + \theta_n \ge \delta_{nk}
\]
to problem (LP$_n$), and if $\tilde n = 0$, then set $\tilde n := n$.


If $n < K_t$, let $n := n + 1$ and repeat S 3; else, if $n = K_t$ and $\tilde n > 0$, let $n := \tilde n$ and go back to S 1. Otherwise, go on to S 4.

S 4 If $t > 1$, then let $n := K_{t-2} + 1$ and $t := t - 1$ (observe that $K_1 = 1$ and $K_0 = 0$), and go back to S 3; else, i.e. if $t = 1$, $\hat x_1$ is an optimal first stage solution; stop.

Assuming the solvability of (5.11) as well as finite upper bounds for all feasible $x_n$, $n = 1,\dots,K_T$ (or, instead, the feasibility of the dual problems (DLP$_n$) for $n = 1,\dots,K_T$), the nested decomposition algorithm converges finitely to an optimal first stage solution.

For a proof and more details see Birge–Louveaux [2].


5.3 Regularized Decomposition

In reduced notation, the k-th multicut master problem is

(5.20)
\[
\min\Big\{c^T x + \sum_{j=1}^{S} p_j\theta_j \;\Big|\; (x,\theta_1,\dots,\theta_S) \in \mathcal D_k\Big\},
\]

with $\mathcal D_k$ the feasible set of the set $\mathcal G_k$ of constraints required in this master program. Hence, instead of minimizing
\[
\Phi(x) = c^T x + \sum_{j=1}^{S} p_j f_j(x),
\]
we minimize, with respect to $x$,
\[
\hat\Phi_k(x) = c^T x + \min_\theta\Big\{\sum_{j=1}^{S} p_j\theta_j \;\Big|\; (x,\theta_1,\dots,\theta_S) \in \mathcal D_k\Big\}.
\]


$\hat\Phi_k(x)$ is a piecewise linear function supporting $\Phi$ from below.

In general, in the early cycles of the algorithm, this support function $\hat\Phi_k$ is likely not to represent the true function $\Phi$ well in some neighborhood of the last iterate $x^{(k)}$. This may imply that, even for an $x^{(k)}$ close to the overall optimum of (5.8), solving (5.20) yields an $x^{(k+1)}$ far away from the optimal point.

Hence, it is no surprise that we often observe an "erratic jumping around" of the subsequent iterates without substantial progress in the objective, even when starting from an overall feasible iterate $x^{(k)}$ close to the solution of the original NLP (5.8).

This may be improved substantially by regularizing the master program with an additive quadratic term, to avoid too big steps away from an overall feasible approximate $z^{(k)}$ within one iteration.


Hence, with some control parameter $\rho > 0$, we deal with the master

(5.21)
\[
\min\Big\{\frac{1}{2\rho}\,\|x - z^{(k)}\|^2 + c^T x + \sum_{j=1}^{S} p_j\theta_j \;\Big|\; (x,\theta_1,\dots,\theta_S) \in \mathcal D_k\Big\}
\]

to find the next trial point $x^{(k)}$, for which we have to decide, by criteria to be mentioned later, either to accept it as the next approximate or to continue with the current approximate $z^{(k)}$.

We restrict ourselves just to a sketch of the modified algorithm.

For simplicity, degeneracy in the constraints $\mathcal G_k$ of (5.20) is excluded by assumption, such that every vertex of the feasible set $\mathcal D_k \subset \mathbb R^{n+S}$ is determined by exactly $n + S$ active constraints (including $Ax = b$ and active nonnegativity conditions, i.e. $x_i = 0$ in case).


Regularized Decomposition Algorithm QDECOM

S 1 Determine a first approximate $z^{(1)}$, overall feasible for (5.8); let $k := 1$, and define $\mathcal D_1$ as the feasible set determined by the constraint set
\[
\mathcal G_1 := \{Ax = b\} \cup \{\text{all optimality cuts at } z^{(1)}\}.
\]

S 2 Solve (5.21) for $x^{(k)}$ as first stage trial point and $\theta^{(k)} = (\theta_1^{(k)},\dots,\theta_S^{(k)})^T$ as recourse approximates.
If $\Phi(z^{(k)}) = \hat\Phi(x^{(k)})$ $\big(= c^T x^{(k)} + \sum_{j=1}^{S} p_j\theta_j^{(k)}\big)$, then stop; $z^{(k)}$ is an optimal first stage solution for (5.8). Otherwise continue.


S 3 Delete from the constraint set $\mathcal G_k$ of (5.21) constraints being inactive at $(x^{(k)}, \theta^{(k)})$, such that no more than $n + S$ constraints are left.

S 4 If $x^{(k)}$ satisfies all first stage constraints (i.e. in particular $x^{(k)} \ge 0$), then go to step S 5; otherwise add to $\mathcal G_k$ no more than $S$ violated first stage constraints (nonnegativity conditions $x_i \ge 0$), yielding $\mathcal G_{k+1}$; let $z^{(k+1)} := z^{(k)}$, $k := k + 1$, and go to step S 2.

S 5 Determine $f_j(x^{(k)})$, $j = 1,\dots,S$. If $f_j(x^{(k)}) = +\infty$, then add a feasibility cut to $\mathcal G_k$; else, if $f_j(x^{(k)}) > \theta_j^{(k)}$, then add an optimality cut to $\mathcal G_k$.


S 6 If $f_j(x^{(k)}) = +\infty$ for at least one $j$, then let $z^{(k+1)} := z^{(k)}$ and go to step S 8; otherwise go to step S 7.

S 7 If $\Phi(x^{(k)}) = \hat\Phi(x^{(k)})$, or else if $\Phi(x^{(k)}) \le \mu\,\Phi(z^{(k)}) + (1 - \mu)\,\hat\Phi(x^{(k)})$ for some parameter $\mu \in (0, 1)$ and exactly $n + S$ constraints were active at $(x^{(k)}, \theta^{(k)})$, then let $z^{(k+1)} := x^{(k)}$; otherwise, let $z^{(k+1)} := z^{(k)}$.

S 8 Let $\mathcal G_{k+1}$ be the constraint set resulting from $\mathcal G_k$ after deleting and adding constraints due to steps S 3 and S 5, respectively. With $\mathcal D_{k+1}$ the corresponding feasible set and $k := k + 1$ return to step S 2.
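Since (5.21) is a convex QP over the current cut collection, each iteration's master is easy to state directly. A minimal sketch (our own names; cvxpy as QP solver; the optimality cuts of $\mathcal G_k$ assumed given as pairs $(g, r)$ encoding $\theta_j \ge r - g^T x$):

```python
import numpy as np
import cvxpy as cp

def regularized_master(c, A, b, p, cuts, z, rho):
    """Solve (5.21): min (1/(2 rho))||x - z||^2 + c@x + p@theta over D_k.

    cuts[j] is a list of pairs (g, r) encoding theta_j >= r - g@x
    (a data layout assumed here for illustration).
    """
    n, S = len(c), len(p)
    x, theta = cp.Variable(n), cp.Variable(S)
    cons = [A @ x == b, x >= 0]                    # first stage constraints
    for j, cj in enumerate(cuts):
        for g, r in cj:                            # optimality cut for scenario j
            cons.append(theta[j] >= r - g @ x)
    obj = cp.sum_squares(x - z) / (2 * rho) + c @ x + p @ theta
    cp.Problem(cp.Minimize(obj), cons).solve()
    return x.value, theta.value
```

For small $\rho$ the proximal term dominates and the trial point stays near $z^{(k)}$; as $\rho$ grows, the step approaches the unregularized multicut master solution.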


The parameters $\rho > 0$ and $\mu \in (0, 1)$ are chosen adaptively between fixed bounds in order to improve the progress of the algorithm.

Obviously, during this algorithm all approximates $z^{(k)}$ are overall feasible, since the change $z^{(k+1)} := x^{(k)}$ only takes place in step S 7,

– either if $\Phi(x^{(k)}) = \hat\Phi(x^{(k)})$, which means that the piecewise linear support $\hat\Phi$ of $\Phi$ coincides with $\Phi$ at $x^{(k)}$, as well as obviously at $z^{(k)}$, such that, since $(x^{(k)}, \theta^{(k)})$ minimizes (5.21), we have the inequality $\Phi(x^{(k)}) \le \Phi(z^{(k)})$, implying $\Phi(x^{(k)}) < +\infty$ and hence the overall feasibility of $x^{(k)}$, while continuing with the unchanged approximate $z^{(k)}$ would block the procedure;


– or if $(x^{(k)}, \theta^{(k)})$ is a vertex of $\mathcal D_k$ (corresponding to $\hat\Phi$ having a kink at $x^{(k)}$) and the decrease of $\Phi$ from $z^{(k)}$ to $x^{(k)}$,
\[
\Phi(x^{(k)}) - \Phi(z^{(k)}) \le (1 - \mu)\big(\hat\Phi(x^{(k)}) - \Phi(z^{(k)})\big) = (1 - \mu)\big(\hat\Phi(x^{(k)}) - \hat\Phi(z^{(k)})\big) < 0,
\]
is substantial with respect to the corresponding decrease of $\hat\Phi$ and implies, due to $\Phi(x^{(k)}) - \Phi(z^{(k)}) < 0$ and therefore $\Phi(x^{(k)}) < +\infty$, again the overall feasibility of $x^{(k)}$.

As an example see the modified Fig. 5.2, where we assume $\rho$ to be sufficiently large:


(Figure: the piecewise linear model $\hat\Phi(x)$ of $\Phi(x)$ with the trial points $x_1, \dots, x_5$.)

Here, starting from $z^{(1)} = x_1$ with the related optimality cut, we find $x_2$ according to the feasibility cut being active there.


Then we add a new optimality cut at $x_2$ due to step S 5, but keep $z^{(2)} := z^{(1)}$ since $\Phi(x_2) > \Phi(z^{(1)})$.


Hence we next get the trial point $x_3$, which, depending on the choice of $\mu$, could be a candidate for the next approximate $z^{(3)}$.


Adding at $z^{(3)} = x_3$ an optimality cut, we might get the trial point $x_4$ and might keep $z^{(4)} = z^{(3)} = x_3$, due to $\mu$.


Finally, with a further optimality cut at $x_4$, we might end up at the minimal point $z^{(5)} = x_5$.


5.4 Stochastic Decomposition

Again, we consider the problem

(5.22)
\[
\begin{aligned}
\min\;& c^T x + \mathcal Q(x) \\
\text{s.t. } & Ax = b,\; x \ge 0,
\end{aligned}
\]
with
\[
\mathcal Q(x) = \int_\Xi Q(x,\xi)\,P(d\xi),
\]
but now with an arbitrary distribution $P$, and

(5.23)
\[
Q(x,\xi) = \min\{q^T y \mid Wy = \xi - Tx,\; y \ge 0\} = \max\{(\xi - Tx)^T u \mid W^T u \le q\}.
\]


Assumptions:

A1 The dual feasible set in (5.23) is nonempty and compact.

A2 $X = \{x \mid Ax = b,\; x \ge 0\}$ and $\Xi \subset \mathbb R^r$ are nonempty and compact.

A3 $\forall x \in X$: $Q(x,\xi) \ge 0$ a.s.

The basic ideas of Stochastic Decomposition (Higle and Sen, 1991) can be stated as follows:

S 1 $\xi^0 := \bar\xi$, $V_0 := \emptyset$, $x^1 \in \arg\min\{c^T x + Q(x,\xi^0) \mid x \in X\}$; $k := 1$.

S 2 Sample $\xi^k$ from $P$ (such that $\xi^1, \xi^2, \dots, \xi^k$ are independent).

S 3 Solve (5.23) for $\xi^k$ and $x^k$, with the dual vertex $u^k$ as solution; $V_k := V_{k-1} \cup \{u^k\}$.


S 4 Introduce cuts:

a) With
\[
u^{kj} \in \arg\max\{(\xi^j - Tx^k)^T u \mid u \in V_k\}, \quad j = 1,\dots,k,
\]
determine the $k$-th optimality cut as
\[
g_k^{kT} x + \alpha_k^k := \frac1k \sum_{j=1}^{k} u^{kjT}(\xi^j - Tx) \le \tau.
\]

b) Modify the former cuts ($j < k$) to
\[
g_j^k := \frac{k-1}{k}\, g_j^{k-1}, \quad \alpha_j^k := \frac{k-1}{k}\, \alpha_j^{k-1}; \quad j < k.
\]

S 5 Solve the $k$-th master program
\[
\min\{c^T x + \tau \mid x \in X,\; g_j^{kT} x + \alpha_j^k \le \tau,\; 1 \le j \le k\},
\]
yielding a solution $x^{k+1}$. With $k := k + 1$ return to S 2.


Higle and Sen proved for this basic method:

Theorem 5.1 There exists a subsequence $\{x^{k_n}\}_{n=1}^\infty$ of $\{x^k\}_{k=1}^\infty$ such that each accumulation point of $\{x^{k_n}\}_{n=1}^\infty$ is an optimal solution of (5.22) with probability 1.

Obviously, there are two difficulties related with this method: the identification of the subsequence mentioned in the theorem, and the destabilizing effect of elements not belonging to this subsequence. Therefore, Higle and Sen have introduced the concept of an incumbent solution with a "sufficiently low" estimated value of the objective. Then the Stochastic Decomposition Algorithm (with $\bar x^k$ the incumbent solutions) reads as:


S1 $\xi^0 := \bar\xi$, $V_0 := \emptyset$, $x^1 \in \arg\min\{c^T x + Q(x,\xi^0) \mid x \in X\}$, $\bar x^0 := x^1$, $i_0 := 0$, $r \in (0, 1)$, $k := 1$.

S2 Sample $\xi^k$ from $P$ ($\xi^1, \xi^2, \dots, \xi^k$ independent).

S3 Solve (5.23) for $\xi^k$ at $x^k$ and for $\xi^k$ at $\bar x^{k-1}$, yielding the dual vertices $u^k$ and $\bar u^k$; $V_k := V_{k-1} \cup \{u^k, \bar u^k\}$.


S4 a) With $u^{kj} \in \arg\max\{(\xi^j - Tx^k)^T u \mid u \in V_k\}$, $j = 1,\dots,k$, determine the $k$-th optimality cut
\[
g_k^{kT} x + \alpha_k^k := \frac1k \sum_{j=1}^{k} u^{kjT}(\xi^j - Tx) \le \tau.
\]

b) With $\bar u^j \in \arg\max\{(\xi^j - T\bar x^{k-1})^T u \mid u \in V_k\}$, $j = 1,\dots,k$, determine / update the cut indexed by $i_{k-1}$ as
\[
g_{i_{k-1}}^{kT} x + \alpha_{i_{k-1}}^k := \frac1k \sum_{j=1}^{k} \bar u^{jT}(\xi^j - Tx) \le \tau.
\]

c) Update the remaining cuts according to
\[
g_j^k := \frac{k-1}{k}\, g_j^{k-1}, \quad \alpha_j^k := \frac{k-1}{k}\, \alpha_j^{k-1}, \quad j \notin \{i_{k-1}, k\}.
\]


S5 Test the incumbent: For $k > 1$, let
\[
\tau_l^k := \min\{\tau \mid g_j^{lT} x^k + \alpha_j^l \le \tau,\; 1 \le j \le l\},
\qquad
\bar\tau_l^k := \min\{\tau \mid g_j^{lT}\bar x^{k-1} + \alpha_j^l \le \tau,\; 1 \le j \le l\},
\]
for $l = k, k-1$; if
\[
c^T x^k + \tau_k^k - (c^T\bar x^{k-1} + \bar\tau_k^k)
< r\,\big\{c^T x^k + \tau_{k-1}^k - (c^T\bar x^{k-1} + \bar\tau_{k-1}^k)\big\},
\]
then $\bar x^k := x^k$, $i_k := k$; else $\bar x^k := \bar x^{k-1}$, $i_k := i_{k-1}$.

S6 Solve the master program
\[
\min\{c^T x + \tau \mid x \in X,\; g_j^{kT} x + \alpha_j^k \le \tau,\; 1 \le j \le k\} \;\Longrightarrow\; x^{k+1};
\]
with $k := k + 1$ return to S2.


For this modified method, called SDECOM, they could show:

Theorem 5.2 For the incumbent solutions $\{\bar x^k\}_{k=1}^\infty$ defined in the SD algorithm, let $\{k_n\}_{n=1}^\infty$ be the subsequence of iterations at which the incumbent solutions are changed. For
\[
\Delta_k = c^T x^k + \tau_{k-1}^k - (c^T\bar x^{k-1} + \bar\tau_{k-1}^k),
\]
the index set
\[
N^* = \Big\{n \;\Big|\; \Delta_{k_n} \ge \frac1n \sum_{m=1}^{n} \Delta_{k_m}\Big\}
\]
is infinite, and each accumulation point of $\{\bar x^{k_n}\}_{n\in N^*}$ is optimal in (5.22) a.s.


Stopping criteria may be based on the following theorem:

Theorem 5.3 Let $\{k_n\}_{n\in N}$ be the sequence of iterations at which the incumbent changes, $x^*$ a solution of (5.22), and
\[
F_k(\bar x^k) = c^T\bar x^k + \min\{\tau \mid g_j^{kT}\bar x^k + \alpha_j^k \le \tau,\; j = 1,\dots,k\}.
\]
If $N$ is finite, then
\[
\frac1m \sum_{k=1}^{m} F_k(\bar x^k) \longrightarrow c^T x^* + \mathcal Q(x^*) \quad\text{a.s.};
\]
otherwise
\[
\frac1m \sum_{n=1}^{m} F_{k_n}(\bar x^{k_n}) \longrightarrow c^T x^* + \mathcal Q(x^*) \quad\text{a.s.}
\]


5.5 Stochastic Quasi-Gradient Method

Consider problems of the form

(5.24)
\[
\min_{x\in X} \Big[f(x) + \int_\Xi Q(x,\xi)\,P_\xi(d\xi)\Big].
\]

This formulation includes SLPs with recourse, using
\[
X = \{x \mid Ax = b,\; x \ge 0\}, \qquad f(x) = c^T x,
\]
\[
Q(x,\xi) = \min\{(q(\xi))^T y \mid Wy = h(\xi) - T(\xi)x,\; y \ge 0\}.
\]

To describe the so-called stochastic quasi-gradient method (SQG), we simplify the notation to
\[
F(x,\xi) := f(x) + Q(x,\xi)
\]


and consider therefore the problem

(5.25)
\[
\min_{x\in X}\; \mathbb E_\xi F(x,\xi),
\]
assuming that

(5.26) $\mathbb E_\xi F(x,\xi)$ is finite and convex in $x$,

(5.27) $X$ is convex and compact.

For SLPs with recourse, assumptions (5.26), (5.27) are satisfied if, e.g.,

• relatively complete recourse is given, the recourse function $Q(x,\xi)$ is finite a.s. $\forall x$, and the components of $\xi$ are in $L^2$;

• $X = \{x \mid Ax = b,\; x \ge 0\}$ is bounded.


Starting with a feasible point $x^0 \in X$, an iteration can be defined as

(5.28)
\[
x^{\nu+1} = \Pi_X(x^\nu - \rho_\nu v^\nu),
\]
where the search direction $v^\nu$ is a random vector, $\rho_\nu \ge 0$ is a step length, and $\Pi_X$ is the projection onto $X$, i.e. for $y \in \mathbb R^n$, with the Euclidean norm $\|\cdot\|$,

(5.29)
\[
\Pi_X(y) = \arg\min_{x\in X} \|y - x\|.
\]

Due to (5.26), $\varphi(x) := \mathbb E_\xi F(x,\xi)$ is convex in $x$. If $\varphi$ were differentiable at an arbitrary $z$, with the gradient $g := \nabla\varphi(z) = \nabla_z \mathbb E_\xi F(z,\xi)$, then $-g$ would be the steepest descent direction of $\varphi$ at $z$, and we could use $-g$ as search direction to get the well known method of steepest descent.


In view of the known difficulty in evaluating $\varphi(x) = \mathbb E_\xi F(x,\xi)$, and even more in computing $\nabla\varphi(z) = \nabla_z \mathbb E_\xi F(z,\xi)$, this approach does not seem manageable.

Due to the convexity of $\varphi$, for any $z$ there is a subgradient $g \in \partial\varphi(z)$ such that (see Fig. 5.4)
\[
(x - z)^T g \le \varphi(x) - \varphi(z) \quad \forall x,
\]
and the set $\partial\varphi(z)$ of all subgradients of $\varphi$ at $z$ is the subdifferential of $\varphi$ at $z$. Obviously, for $g = 0$ it follows that $\varphi(z) = \min_x \varphi(x)$. If however $g \neq 0$, then for $\lambda > 0$ we get
\[
\varphi(z + \lambda g) \ge \varphi(z) + g^T(\lambda g) = \varphi(z) + \lambda\|g\|^2 > \varphi(z).
\]


Figure 5.4: Nondifferentiable convex function: Subgradients.


Hence, any $g \in \partial\varphi(z)$, $g \neq 0$, is a direction of (strict) ascent of $\varphi$ at $z$. However, the reverse statement is not true, i.e. $-g$ with $g \neq 0$ is not necessarily a (strict) descent direction. Consider e.g.
\[
\psi(u, v) := |u| + |v|.
\]
Then for $z = (0, 3)^T$ we have $g = (1, 1)^T \in \partial\psi(z)$. Furthermore, for $0 < \lambda < 3$ and $z - \lambda g = (-\lambda, 3 - \lambda)^T$, we get
\[
\psi(z - \lambda g) = 3 = \psi(z),
\]
i.e. in this case $-g$ is not a (strict) descent direction for $\psi$ at $z$.

In general, however, for an arbitrary convex function $\varphi$ with $g \in \partial\varphi(z)$, $g \neq 0$, $z \notin \arg\min\varphi$ and $\tilde x \in \arg\min\varphi$, for $\rho > 0$ we have


\[
\|(z - \rho g) - \tilde x\|^2 = \|(z - \tilde x) - \rho g\|^2
= \|z - \tilde x\|^2 + \rho^2\|g\|^2 - 2\rho\, g^T(z - \tilde x)
\le \|z - \tilde x\|^2 + \rho^2\|g\|^2 - 2\rho\,[\varphi(z) - \varphi(\tilde x)].
\]

Since $\varphi(z) - \varphi(\tilde x) > 0$ by assumption, we can choose a step length $\rho = \bar\rho > 0$ such that
\[
\bar\rho^2\|g\|^2 - 2\bar\rho\,[\varphi(z) - \varphi(\tilde x)] < 0;
\]
therefore, $z - \bar\rho g$ is closer to $\tilde x \in \arg\min\varphi$ than $z$ (see Fig. 5.5 for the above example $\psi(u, v) := |u| + |v|$). This fact motivates the subgradient methods for the minimization of convex nondifferentiable functions.


Figure 5.5: Reducing the distance to $\arg\min\psi$ using $g \in \partial\psi$


Obviously, to get convergence statements on the procedure (5.28), we need further assumptions on the search direction $v^\nu$ and the step length $\rho_\nu$.

Hence, let $v^\nu$ be a stochastic quasi-gradient, which means that

(5.30)
\[
\mathbb E(v^\nu \mid x^0, \dots, x^\nu) \in \partial_x\, \mathbb E_\xi F(x^\nu, \xi) + b^\nu,
\]
where $\partial_x$ denotes the subdifferential w.r.t. $x$.

Here, the dependence of $v^\nu$ on $\xi$ may be replaced by a dependence of $v^\nu$ on an observation $\xi^\nu$ or on a sample $\{\xi^{\nu 1}, \dots, \xi^{\nu N_\nu}\}$ of $\xi$, and obviously $v^\nu$ depends also on $x^\nu$.

Given the choice of $\rho_\nu$, due to (5.28) the next iterate $x^{\nu+1}$ depends on $x^\nu$. Hence, $x^{\nu+1}$ is stochastic as well. Therefore the sequences $(x^0, x^1, \dots, x^\nu)$ are random $\forall \nu \ge 1$.


Obviously (5.30) is not a restrictive assumption. It only requires the conditional expectation of $v^\nu$, given the history $(x^0, \dots, x^\nu)$, to be the sum of a subgradient $g^\nu \in \partial_x\, \mathbb E_\xi F(x^\nu, \xi)$ and some $b^\nu$.

Due to the convexity assumption (5.26) it follows that

(5.31)
\[
\mathbb E_\xi F(x^*, \xi) - \mathbb E_\xi F(x^\nu, \xi) \ge g^{\nu T}(x^* - x^\nu)
\]
for any solution $x^*$ of (5.25) and any $g^\nu \in \partial_x\, \mathbb E_\xi F(x^\nu, \xi)$, such that due to (5.30)

(5.32)
\[
0 \ge \mathbb E_\xi F(x^*, \xi) - \mathbb E_\xi F(x^\nu, \xi) \ge \mathbb E(v^\nu \mid x^0, \dots, x^\nu)^T(x^* - x^\nu) + \gamma_\nu,
\]
with

(5.33)
\[
\gamma_\nu = -b^{\nu T}(x^* - x^\nu).
\]


Intuitively, expecting that $\{x^\nu\} \to \bar x$ and that the $v^\nu$ are uniformly bounded, i.e. $\|v^\nu\| \le \alpha\ \forall\nu$, we might require that $\|b^\nu\| \to 0$ as $\nu \to \infty$, such that $\gamma_\nu \to 0$ as $\nu \to \infty$.

Choosing in particular stochastic subgradients as search directions, i.e.

(5.34)
\[
v^\nu \in \partial_x F(x^\nu, \xi^\nu),
\]
or, more generally,

(5.35)
\[
v^\nu = \frac{1}{N_\nu} \sum_{\mu=1}^{N_\nu} w^\mu, \quad w^\mu \in \partial_x F(x^\nu, \xi^{\nu\mu}),
\]
where the $\xi^\nu$ or $\xi^{\nu\mu}$ are independent samples, we get $b^\nu = 0$, $\gamma_\nu = 0\ \forall\nu$, provided that integration and differentiation may be interchanged.


Finally, we assume that

(5.36)
\[
\rho_\nu \ge 0, \quad \sum_{\nu=0}^{\infty} \rho_\nu = \infty, \quad \sum_{\nu=0}^{\infty} \mathbb E_\xi\big(\rho_\nu|\gamma_\nu| + \rho_\nu^2\|v^\nu\|^2\big) < \infty.
\]

For the special cases (5.34) or (5.35) with uniformly bounded $v^\nu$, this assumption on step lengths reduces to

(5.37)
\[
\rho_\nu \ge 0, \quad \sum_{\nu=0}^{\infty} \rho_\nu = \infty, \quad \sum_{\nu=0}^{\infty} \rho_\nu^2 < \infty.
\]

It can be shown [Ermoliev, 1983] that under the assumptions (5.26), (5.27), (5.30) and (5.36) (or else (5.26), (5.27), (5.34) or (5.35), and (5.37)) the iteration (5.28) converges a.s. to a solution of (5.25).
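As an illustration of (5.28) with step lengths satisfying (5.37), consider the toy problem (entirely our own choice of data) of minimizing $\mathbb E\,\|x - \xi\|_1$ over a box, where the projection (5.29) is a componentwise clamp and (5.34) yields a sign vector as stochastic subgradient:

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = np.zeros(2), np.full(2, 4.0)         # X = [0,4]^2, so Pi_X = clip (assumption)

def v(x, xi):
    """A stochastic subgradient of F(x, xi) = ||x - xi||_1, choice (5.34)."""
    return np.sign(x - xi)                    # sign(0) = 0 is a valid subgradient

x = np.array([4.0, 0.0])                      # feasible start x^0
for nu in range(1, 20001):
    xi = rng.normal([2.0, 1.0], 0.5)          # independent sample xi^nu
    rho = 1.0 / nu                            # step lengths satisfying (5.37)
    x = np.clip(x - rho * v(x, xi), lo, hi)   # iteration (5.28) with projection (5.29)

print(np.round(x, 2))                         # approaches the componentwise median (2, 1)
```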


6 Recourse Problems: Properties and Approximations


6.1 The Two–Stage Case: Properties

The two-stage SLP (with recourse) was:
\[
\min_{x\in X}\{c^T x + \mathbb E_\xi\, Q(x,\xi)\},
\]
with the recourse problem
\[
Q(x,\xi) = \min_{y\ge 0}\{q(\xi)^T y \mid Wy = h(\xi) - T(\xi)x\}.
\]
Given the known solvability assumptions, i.e. complete recourse (incl. dual feasibility of the recourse problem), and, in addition, $q(\xi) \equiv q$, the recourse function $Q(x,\xi)$ is obviously convex in $x$ for all $\xi$.


Proposition 6.1 If $g_0(\cdot,\xi)$ and $Q(\cdot,\xi)$ are convex in $x\ \forall \xi \in \Xi$, and if $X$ is a convex set, then the stochastic program with recourse
\[
\min_{x\in X} \mathbb E_\xi\{g_0(x,\xi) + Q(x,\xi)\}
\]
is a convex problem.

Proof: By assumption, $\forall x, y \in X$ and $\lambda \in (0, 1)$ we have
\[
Q(\lambda x + (1-\lambda)y, \xi) \le \lambda\, Q(x,\xi) + (1-\lambda)\, Q(y,\xi) \quad\text{a.s.}
\]
Hence
\[
\mathbb E_\xi\{Q(\lambda x + (1-\lambda)y, \xi)\} \le \lambda\,\mathbb E_\xi\{Q(x,\xi)\} + (1-\lambda)\,\mathbb E_\xi\{Q(y,\xi)\},
\]
as well as the analogous result for $\mathbb E_\xi\{g_0(x,\xi)\}$. $\square$


Proposition 6.2 If $\nabla_x Q(x,\xi)$ exists a.s. and the assumptions of Lebesgue's bounded convergence theorem hold for the difference quotients of $Q$ (with respect to $x_i$), then $\nabla\, \mathbb E_\xi Q(x,\xi)$ exists as well, and
\[
\nabla\, \mathbb E_\xi Q(x,\xi) = \int_\Xi \nabla_x Q(x,\xi)\,P(d\xi).
\]

Alternative: Let $\dfrac{\partial Q(x,\xi)}{\partial x_j}$ exist a.s., i.e.
\[
\frac{Q(x + he_j, \xi) - Q(x,\xi)}{h} = \frac{\partial Q(x,\xi)}{\partial x_j} + \frac{\rho_j(x,\xi;h)}{h}
\]
with $\dfrac{\rho_j(x,\xi;h)}{h} \longrightarrow 0$ as $h \to 0$.


If $\mathbb E_\xi \dfrac{\partial Q(x,\xi)}{\partial x_j}$ exists, and if $\dfrac1h \displaystyle\int \rho_j(x,\xi;h)\,P(d\xi) \longrightarrow 0$ as $h \to 0$, then $\mathcal Q(x) = \displaystyle\int Q(x,\xi)\,P(d\xi)$ is partially differentiable at $x$ and
\[
\frac{\partial\mathcal Q(x)}{\partial x_j} = \int \frac{\partial Q(x,\xi)}{\partial x_j}\,P(d\xi).
\]
(Kall and Wallace [4]) $\square$


6.2 Inequalities and Approximations

Assume that
\[
\Xi = \times_{i=1}^{k}\, [a_{i0}, a_{i1}] \supset \operatorname{supp} P, \qquad Q(x,\cdot): \Xi \to \mathbb R \text{ convex } \forall x \in X,
\]
and further that $\displaystyle\int_\Xi Q(x,\xi)\,P(d\xi)$ exists $\forall x \in X$. With $\bar\xi := \mathbb E\,\xi$ we have Jensen's inequality:


Proposition 6.3
\[
Q(x, \bar\xi) \le \int_\Xi Q(x,\xi)\,P(d\xi).
\]

Proof: If $P$ is finite discrete, we have from convexity that
\[
Q(x, \bar\xi) = Q\Big(x, \sum_{i=1}^{K} p_i \xi^i\Big) \le \sum_{i=1}^{K} p_i\, Q(x, \xi^i).
\]
Otherwise, construct a sequence of simple functions $\xi^\nu$ approximating $\int \xi\,P(d\xi)$ and $\int Q(x,\xi)\,P(d\xi)$ in the proper way known from measure theory. $\square$


To get an upper bound, consider first the one-point distribution $P_\xi(\{\xi\}) = 1$ for any fixed $\xi \in \Xi$. Then the components $\xi_i$ are independent with $P_{\xi_i}(\{\xi_i\}) = 1$ and $\mathbb E_\xi(\xi) = \xi$. Introducing the independent random variables $\eta_i(\xi)$ with the two-point distributions at $a_{i0}$ and $a_{i1}$ s.t.


\[
p(a_{i0};\xi) := \frac{a_{i1} - \xi_i}{a_{i1} - a_{i0}}, \qquad
p(a_{i1};\xi) := -\,\frac{a_{i0} - \xi_i}{a_{i1} - a_{i0}}, \qquad i = 1,\dots,K,
\]
we get the distribution on the vertices $a^\nu$ of $\Xi$ (where $\nu = (\nu_1,\dots,\nu_K)^T$ and $\nu_i \in \{0,1\}$, such that $a_i^\nu = a_{i\nu_i}$) as
\[
p(a^\nu;\xi) = \frac{\prod_{i=1}^{K} (-1)^{\bar\nu_i}(a_{i\bar\nu_i} - \xi_i)}{\prod_{i=1}^{K} (a_{i1} - a_{i0})}
\]
with $\bar\nu_i = 1 - \nu_i$. Observing that
\[
\mathbb E_{\eta_i(\xi)}(\eta_i) = \xi_i \;\Longrightarrow\; \mathbb E_{\eta(\xi)}(\eta) = \sum_\nu a^\nu\, p(a^\nu;\xi) = \xi,
\]
we have from Jensen's inequality:


Lemma 6.1 For $\xi \in \Xi$ we have
\[
Q(x,\xi) \le \int_\Xi Q(x,\eta(\xi))\,P_{\eta(\xi)}(d\eta) = \sum_\nu Q(x, a^\nu)\,p(a^\nu;\xi).
\]
($Q(x,\cdot)$ linear on $\Xi$ $\Longrightarrow$ equality.) This yields the generalized Edmundson–Madansky (E–M) inequality:

Proposition 6.4 For $p^0(a^\nu) = \displaystyle\int_\Xi p(a^\nu;\xi)\,P(d\xi)$ we have
\[
\int_\Xi Q(x,\xi)\,P(d\xi) \le \sum_\nu Q(x, a^\nu)\,p^0(a^\nu)
\]
with the discrete measure $p^0(\cdot)$ on $\operatorname{extr}(\Xi)$. ($Q(x,\cdot)$ linear on $\Xi$ $\Longrightarrow$ equality.)
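In one dimension both bounds are immediate: Jensen evaluates $Q$ at the mean, E–M puts the weights $p(a_0;\bar\xi)$, $p(a_1;\bar\xi)$ on the two interval endpoints. A small numerical sketch with toy data of our own (any convex $Q$ and any discrete distribution on the interval will do):

```python
import numpy as np

def jensen_em_bounds(Q, xi, p, a0, a1):
    """Jensen lower and Edmundson-Madansky upper bound in one dimension.
    (xi, p): discrete distribution supported on [a0, a1]; Q convex."""
    mean = xi @ p
    lower = Q(mean)                              # Jensen (Prop. 6.3)
    w1 = (mean - a0) / (a1 - a0)                 # E-M weight on the endpoint a1
    upper = (1 - w1) * Q(a0) + w1 * Q(a1)        # E-M (Prop. 6.4)
    exact = p @ np.vectorize(Q)(xi)
    return lower, exact, upper

xi = np.array([0.0, 1.0, 3.0, 4.0])              # toy data (assumption)
p = np.array([0.1, 0.4, 0.3, 0.2])
Q = lambda t: abs(t - 2.0)                       # convex recourse-like cost
print(jensen_em_bounds(Q, xi, p, 0.0, 4.0))      # lower <= exact <= upper
```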


Partition: $\mathcal S = \{\Xi_1, \dots, \Xi_L\}$. Hence with

$p_l = P(\Xi_l)$ ($> 0$ by assumption),
$\bar\xi^{\,l} = \mathbb E(\xi \mid \xi \in \Xi_l)$,
$\{a^{l\nu}\}$ the vertices of $\Xi_l$,
$p_l^0(a^{l\nu})$ the conditional E–M probabilities on $\operatorname{extr}(\Xi_l)$,

we get
\[
Q(x, \bar\xi^{\,l}) \le \frac{1}{p_l}\int_{\Xi_l} Q(x,\xi)\,P(d\xi),
\qquad
\frac{1}{p_l}\int_{\Xi_l} Q(x,\xi)\,P(d\xi) \le \sum_\nu Q(x, a^{l\nu})\,p_l^0(a^{l\nu}).
\]


Result:
\[
L_{\mathcal S}(x) = \sum_{l=1}^{L} p_l\, Q(x, \bar\xi^{\,l})
\]
as lower bound and
\[
U_{\mathcal S}(x) = \sum_{l=1}^{L} p_l \sum_\nu Q(x, a^{l\nu})\,p_l^0(a^{l\nu})
\]
as upper bound for the expected recourse $\displaystyle\int_\Xi Q(x,\xi)\,P(d\xi)$.



Cycle:

For $f(x)$ the first stage objective, e.g. $f(x) = c^T x$, solve
\[
\min_{x\in X}\{f(x) + L_{\mathcal S}(x)\},
\]
yielding $x_{\mathcal S}$. Then
\[
f(x_{\mathcal S}) + L_{\mathcal S}(x_{\mathcal S})
\le \min_{x\in X}\Big\{f(x) + \int_\Xi Q(x,\xi)\,P(d\xi)\Big\}
\le f(x_{\mathcal S}) + U_{\mathcal S}(x_{\mathcal S}).
\]
If $x_{\mathcal S}$ is $\varepsilon$-optimal, then stop; else refine the partition, $\mathcal S_1 := \mathcal S \to \mathcal S_2$, and repeat.


Stability statements proved by Robinson–Wets and Kall assert approximation of the optimal value and of the solution set of
\[
\min_{x\in X}\Big\{f(x) + \int_\Xi Q(x,\xi)\,P(d\xi)\Big\}.
\]

Various, partly heuristic, refinement strategies exist, e.g. by Frauendorfer and Frauendorfer–Kall; particular decomposition techniques are due to Strazicky and Ruszczynski.

The method is implemented as DAPPROX (using QDECOM); it is efficient for

– either $\dim(\Xi)$ low (e.g. $\dim(\Xi) \le 10$),

– or simple recourse, where the method simplifies to SRAPPROX according to the following observations.


For the special case of Simple Recourse, i.e. for
\[
q^T = (q^{+T}, q^{-T}), \quad W = (I, -I), \quad T(\xi) \equiv T, \quad h(\xi) \equiv \xi, \quad
\operatorname{supp} P \subset \Xi := \times_{i=1}^{K}\,(\alpha_{i0}, \alpha_{i1}],
\]
we know (Wets 1966) that $\mathcal Q(x)$ is a separable function in the components of $\chi := Tx$, i.e. $\mathcal Q(x) = \sum_{i=1}^{m} \mathcal Q_i(\chi_i)$ with

(6.1)
\[
\mathcal Q_i(\chi_i) =
\begin{cases}
q_i^+\bar\xi_i - q_i^+\chi_i & \text{if } \chi_i \le \alpha_{i0}, \\[4pt]
q_i^+\bar\xi_i - q_i^+\chi_i - q_i \displaystyle\int_{\xi_i\le\chi_i} (\xi_i - \chi_i)\,P(d\xi) & \text{if } \alpha_{i0} < \chi_i < \alpha_{i1}, \\[4pt]
-q_i^-\bar\xi_i + q_i^-\chi_i & \text{if } \chi_i \ge \alpha_{i1},
\end{cases}
\]


where $q_i = q_i^+ + q_i^-$ ($\ge 0$). Applying Jensen, we get for the one-point distribution at $\bar\xi_i$ from (6.1) that
\[
\mathcal Q_i(\chi_i; \bar\xi_i) =
\begin{cases}
q_i^+\bar\xi_i - q_i^+\chi_i & \text{if } \chi_i \le \bar\xi_i, \\
-q_i^-\bar\xi_i + q_i^-\chi_i & \text{if } \chi_i \ge \bar\xi_i,
\end{cases}
\]
implying (see Fig. 6.1)

Proposition 6.5 For $\chi_i \le \alpha_{i0}$ and for $\chi_i \ge \alpha_{i1}$ the Jensen bound $\mathcal Q_i(\chi_i; \bar\xi_i)$ coincides with $\mathcal Q_i(\chi_i)$.


Figure 6.1: Jensen bound $\mathcal Q_i(\chi_i; \bar\xi_i)$
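A quick numerical check of Proposition 6.5 with a discrete one-dimensional distribution (toy data of our own): the Jensen bound built from the one-point distribution at $\bar\xi_i$ coincides with $\mathcal Q_i(\chi_i)$ outside the support interval.

```python
import numpy as np

qp, qm = 2.0, 1.0                        # q_i^+, q_i^- (toy values, assumption)
xi = np.array([1.0, 2.0, 4.0])           # support in (alpha_i0, alpha_i1] = (0, 4]
p = np.array([0.5, 0.3, 0.2])
mean = xi @ p

def Q_exact(chi):                        # E[ qp (xi-chi)^+ + qm (xi-chi)^- ]
    return p @ (qp * np.maximum(xi - chi, 0) + qm * np.maximum(chi - xi, 0))

def Q_jensen(chi):                       # one-point distribution at the mean
    return qp * max(mean - chi, 0) + qm * max(chi - mean, 0)

for chi in (0.0, 2.5, 5.0):              # below, inside, above the support
    print(chi, Q_jensen(chi), Q_exact(chi))
# Prop. 6.5: the two values coincide for chi <= 1.0 and for chi >= 4.0
```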


Partitioning $(\alpha_{i0}, \alpha_{i1}]$ at $\chi_i$, it follows from Prop. 6.5 that with the conditional expectations $\bar\xi_i^{\,1} \in (\alpha_{i0}, \chi_i]$ and $\bar\xi_i^{\,2} \in (\chi_i, \alpha_{i1}]$ the Jensen bound $\mathcal Q_i(\chi_i; \bar\xi_i^{\,1}, \bar\xi_i^{\,2})$ coincides at $\chi_i$ with $\mathcal Q_i(\chi_i)$ (see Fig. 6.2). Hence in this case we do not have the burden of computing the E–M upper bound.


Figure 6.2: Jensen bound: Partitioning at $\chi_i$


With the modifications resulting from these special properties of simple recourse problems, an analogue of DAPPROX was implemented as SRAPPROX (Kall–Stoyan 1982, Kall–Wallace [4]), for which, due to the separability mentioned above, $\dim\Xi$ does not have the same critical impact as for DAPPROX.


7 Simple Recourse Type Problems


7.1 SRT Functions

Definition 7.1 For a real variable $z$, a random variable $\xi$ with distribution $P_\xi$ and realizations $\xi$, and real constants $\alpha, \beta, \gamma$ with $\alpha + \beta \ge 0$, the function $\varphi(\cdot,\cdot)$ given by
\[
\varphi(z,\xi) := \alpha\,[\xi - z]^+ + \beta\,[\xi - z]^- - \gamma
\]
is called a simple recourse type (SRT) function. Then
\[
\Phi(z) := \mathbb E_\xi\, \varphi(z,\xi) = \int_{-\infty}^{\infty} \big(\alpha\,[\xi - z]^+ + \beta\,[\xi - z]^-\big)\,P_\xi(d\xi) - \gamma
\]
is the expected SRT function.

From Definition 7.1 it follows immediately:


Proposition 7.6 Let $\varphi(\cdot,\cdot)$ be an SRT function and $\Phi(\cdot)$ the corresponding expected SRT function. Then

• $\varphi(z,\cdot)$ is convex in $\xi$ for any fixed $z \in \mathbb R$;

• $\varphi(\cdot,\xi)$ is convex in $z$ for any fixed $\xi \in \mathbb R$;

• $\Phi(\cdot)$ is convex in $z$.

In particular, the SRT functions
\[
Q_i(\chi_i, \xi_i) = q_i^+[\xi_i - \chi_i]^+ + q_i^-[\xi_i - \chi_i]^-
\]
are
– convex in $\xi_i$ for any fixed $\chi_i \in \mathbb R$;
– convex in $\chi_i$ for any $\xi_i \in \mathbb R$;
hence the components $\mathcal Q_i(\chi_i) = \mathbb E_\xi\, Q_i(\chi_i, \xi_i)$ of the simple recourse objective are convex in $\chi_i$.


7.2 Approximation of Expected SRT Functions

Let $\varphi(\cdot,\cdot)$ be an SRT function according to Definition 7.1, and assume $F_\xi(\cdot)$ to be the distribution function of $\xi$, inducing the measure $P_\xi$, such that
\[
\mu := \mathbb E_\xi\, \xi = \int_{-\infty}^{\infty} \xi\, dF_\xi(\xi)
\]
exists.

Proposition 7.7 For all $z \in \mathbb R$ we have
\[
\varphi(z, \mu) = \varphi(z, \mathbb E_\xi\, \xi) \le \mathbb E_\xi\, \varphi(z,\xi) = \int_{-\infty}^{\infty} \varphi(z,\xi)\, dF_\xi(\xi) = \Phi(z).
\]

Proof: By Proposition 7.6, $\varphi(z,\cdot)$ is convex in $\xi$; hence the statement follows from Jensen's inequality. $\square$


Proposition 7.8 For
\[
\Phi(z) := \mathbb E_\xi\, \varphi(z,\xi)
= \int_{-\infty}^{\infty} \big(\alpha\,[\xi - z]^+ + \beta\,[\xi - z]^-\big)\, dF_\xi(\xi) - \gamma
= \Big\{\alpha \int_{z}^{\infty} [\xi - z]\, dF_\xi(\xi) + \beta \int_{-\infty}^{z} [z - \xi]\, dF_\xi(\xi)\Big\} - \gamma
\]
we have: $\Phi(z) - \varphi(z,\mu) = \Phi(z) - [\alpha\,(\mu - z) - \gamma] \longrightarrow 0$ as $z \to -\infty$, and $\Phi(z) - \varphi(z,\mu) = \Phi(z) - [\beta\,(z - \mu) - \gamma] \longrightarrow 0$ as $z \to +\infty$.

In particular it follows:
\[
P_\xi(\xi < a) = 0 \;\Longrightarrow\; \Phi(z) = \alpha\,(\mu - z) - \gamma = \varphi(z,\mu) \ \text{ for } z \le a;
\]
\[
P_\xi(\xi > b) = 0 \;\Longrightarrow\; \Phi(z) = \beta\,(z - \mu) - \gamma = \varphi(z,\mu) \ \text{ for } z \ge b.
\]


Proof: Follows immediately from the assumed integrability of $\xi$. $\square$

By Propositions 7.7 and 7.8 we have
\[
\varphi(z,\mu) \le \Phi(z) \quad \forall z
\]
and, furthermore,
\[
a := \inf \operatorname{supp} P_\xi > -\infty \;\Longrightarrow\; \Phi(z) = \varphi(z,\mu)\ \forall z \le a,
\qquad
b := \sup \operatorname{supp} P_\xi < +\infty \;\Longrightarrow\; \Phi(z) = \varphi(z,\mu)\ \forall z \ge b.
\]

For any interval $I = \{\xi \mid a < \xi \le b\} \subset \operatorname{supp} P_\xi$ with $P_\xi(I) > 0$ and at least one of the bounds $a, b$ finite, it was shown by Pfanzagl [1974] that Jensen's inequality holds as well for the corresponding conditional expectations.


Proposition 7.9 With $\mu|_I = \mathbb E_\xi\{\xi \mid \xi \in I\}$ and $\Phi|_I(z) = \mathbb E_\xi\{\varphi(z,\xi) \mid \xi \in I\}$, for all $z \in \mathbb R$ we have
\[
\varphi(z, \mu|_I) \le \Phi|_I(z) = \frac{1}{P_\xi(I)} \int_a^b \varphi(z,\xi)\, dF_\xi(\xi).
\]

As shown in Kall–Stoyan [1982], in analogy to Proposition 7.8 it also follows:

Proposition 7.10 For any finite bound $a$ and/or $b$ of $I = (a, b]$ we have
\[
\Phi|_I(z) = \varphi(z, \mu|_I) \quad \text{for } z \le a \ \text{ and for } z \ge b.
\]


Assume that $J := \operatorname{supp} P_\xi$ is an interval and that we have a finite partition $\{I_\nu \mid I_1 = [a_0, a_1],\; I_\nu = (a_{\nu-1}, a_\nu],\; \nu = 2,\dots,N\}$ of $J$, such that $\forall\nu$ we have $p_\nu = P_\xi(I_\nu) > 0$ and $\sum_{\nu=1}^{N} p_\nu = 1$ (where $a_{\nu-1} < a_\nu\ \forall \nu \le N$ and possibly $a_0 = -\infty$ and/or $a_N = +\infty$). We shall make use of the trivial fact that for any $P_\xi$-integrable function $\psi(\xi)$ we have

(7.2)
\[
\mathbb E_\xi\, \psi(\xi) = \sum_{\nu=1}^{N} p_\nu\, \mathbb E_\xi\{\psi(\xi) \mid \xi \in I_\nu\}.
\]

Then we get immediately


Proposition 7.11 Given a partition $\{I_\nu;\ \nu = 1,\dots,N\}$ with the above assumptions, there follow

a) the relations
\[
\varphi(z,\mu) = \varphi\Big(z, \sum_{\nu=1}^{N} p_\nu\, \mu|_{I_\nu}\Big)
\le \sum_{\nu=1}^{N} p_\nu\, \varphi(z, \mu|_{I_\nu})
\le \sum_{\nu=1}^{N} p_\nu\, \Phi|_{I_\nu}(z) = \Phi(z);
\]

b) for every fixed (finite) $a_\kappa$ out of $\{a_0,\dots,a_N\}$, the equalities
\[
\Phi|_{I_\nu}(a_\kappa) = \varphi(a_\kappa, \mu|_{I_\nu}) \quad \text{for } \nu = 1,\dots,N,
\]
\[
\Phi(a_\kappa) = \sum_{\nu=1}^{N} p_\nu\, \Phi|_{I_\nu}(a_\kappa) = \sum_{\nu=1}^{N} p_\nu\, \varphi(a_\kappa, \mu|_{I_\nu}).
\]


Proof: The above relations are consequences of previously mentioned facts:

a) The two equations reflect (7.2), the first inequality follows from the convexity of $\varphi(z,\cdot)$, and the second inequality applies Proposition 7.9.

b) The first equation applies Proposition 7.10, the second one repeats (7.2) again. $\square$

Taking the probabilities $p_\nu$ associated with the partition intervals $I_\nu$ into account yields an improved global error estimate:


Proposition 7.12 Given the interval partition $\{I_\nu;\ \nu = 1,\dots,N\}$ of $\operatorname{supp} P_\xi$ and $z \in I_\kappa$ ($a_{\kappa-1}, a_\kappa$ finite), the global error estimate $\Delta(z)$ satisfies
\[
0 \le \Delta(z) = \Phi(z) - \sum_{\nu=1}^{N} p_\nu\, \varphi(z, \mu|_{I_\nu})
\le \frac12\, p_\kappa\, (\alpha + \beta)\, \frac{a_\kappa - a_{\kappa-1}}{2}
\]
for $z \in \operatorname{int} I_\kappa$, whereas for $z \in \{a_0, a_1, \dots, a_N\}$ we have $\Delta(z) = 0$.
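Propositions 7.11 and 7.12 can be checked numerically; the sketch below (our own toy setup, with a Monte-Carlo sample as a stand-in for $P_\xi$) compares $\Phi(z)$ with $\sum_\nu p_\nu\,\varphi(z, \mu|_{I_\nu})$ and with the stated bound:

```python
import numpy as np

alpha, beta, gamma = 2.0, 1.0, 0.0               # SRT constants (toy values)

def Phi_hat(z, xi):      # Monte-Carlo stand-in for the expected SRT function
    return np.mean(alpha * np.maximum(xi - z, 0) + beta * np.maximum(z - xi, 0)) - gamma

def phi(z, xi):          # the SRT function of Definition 7.1
    return alpha * max(xi - z, 0) + beta * max(z - xi, 0) - gamma

rng = np.random.default_rng(1)
xi = rng.uniform(0.0, 8.0, 200_000)              # sample standing in for P_xi
edges = np.linspace(0.0, 8.0, 5)                 # partition of supp P_xi into 4 intervals

z = 3.0                                          # z lies in I_2 = (2, 4]
approx = 0.0
for a, b in zip(edges[:-1], edges[1:]):          # sum_nu p_nu phi(z, mu | I_nu)
    mask = (xi > a) & (xi <= b)
    approx += mask.mean() * phi(z, xi[mask].mean())

kappa = int(np.searchsorted(edges, z))           # interval containing z
p_kappa = ((xi > edges[kappa - 1]) & (xi <= edges[kappa])).mean()
bound = 0.5 * p_kappa * (alpha + beta) * (edges[kappa] - edges[kappa - 1]) / 2
print(Phi_hat(z, xi) - approx, bound)            # 0 <= error <= bound (Prop. 7.12)
```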


7.3 Multiple Simple Recourse

Instead of the SRT functions
\[
\varphi(z,\xi) := \alpha\,[\xi - z]^+ + \beta\,[\xi - z]^- - \gamma,
\]
with constant marginal costs for shortage and surplus, respectively, Klein Haneveld and, more recently, van der Vlerk discussed the sometimes more appropriate case of piecewise linear, increasing marginal costs for these deficiencies:


Definition 7.2 For real constants $\{\alpha_k, \beta_k, u_k, l_k;\ k = 0,\dots,K-1\}$ and $\gamma$, such that
\[
\alpha_k \ge 0,\ \beta_k \ge 0 \ \text{ for } k = 0,\dots,K-1 \ \text{ with } \sum_{k=0}^{K-1} (\alpha_k + \beta_k) > 0,
\]
\[
u_0 = 0 < u_1 < \dots < u_{K-1}, \qquad l_0 = 0 < l_1 < \dots < l_{K-1},
\]
the function $\psi(\cdot,\cdot)$ given by
\[
\psi(z,\xi) := \sum_{k=0}^{K-1} \big\{\alpha_k\,[\xi - z - u_k]^+ + \beta_k\,[\xi - z + l_k]^-\big\} - \gamma
\]
is a multiple simple recourse type (MSRT) function. The expected MSRT function is then
\[
\Psi(z) = \mathbb E_\xi\, \psi(z,\xi) = \int_{-\infty}^{\infty} \sum_{k=0}^{K-1} \big\{\alpha_k\,[\xi - z - u_k]^+ + \beta_k\,[\xi - z + l_k]^-\big\}\, dF_\xi(\xi) - \gamma.
\]


In comparison, SRT and MSRT functions look as follows:

(Figure: an SRT cost function, piecewise linear with a single kink, next to an MSRT cost function with several kinks reflecting increasing marginal costs.)

The most interesting result is that we can deal with MSRT problems by applying methods developed for SRT problems:


Proposition 7.13 The multiple simple recourse problem with the expected MSRT function

(7.3)
\[
\Psi(z) = \sum_{k=0}^{K-1} \alpha_k \int (\xi - z)^+\, dF_\xi(\xi + u_k)
+ \sum_{k=0}^{K-1} \beta_k \int (\xi - z)^-\, dF_\xi(\xi - l_k)
\]
is equivalent to the simple recourse problem with the expected SRT function


(7.4)
\[
\Psi(z) = \Big(\sum_{k=0}^{K-1} \alpha_k\Big) \int (\xi - z)^+\, dG(\xi)
+ \Big(\sum_{k=0}^{K-1} \beta_k\Big) \int (\xi - z)^-\, dG(\xi) - C
\]
using the distribution function

(7.5)
\[
G(\xi) = \frac{\displaystyle\sum_{k=0}^{K-1} \alpha_k F_\xi(\xi + u_k) + \sum_{k=0}^{K-1} \beta_k F_\xi(\xi - l_k)}{\displaystyle\sum_{k=0}^{K-1} (\alpha_k + \beta_k)}
\]


and the constant

(7.6)
\[
C = \frac{\Big(\displaystyle\sum_{k=0}^{K-1} \alpha_k\Big) \displaystyle\sum_{k=0}^{K-1} \beta_k l_k
+ \Big(\displaystyle\sum_{k=0}^{K-1} \beta_k\Big) \displaystyle\sum_{k=0}^{K-1} \alpha_k u_k}
{\displaystyle\sum_{k=0}^{K-1} (\alpha_k + \beta_k)}.
\]
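The transformation (7.5)/(7.6) is directly computable. A short sketch with our own toy data, taking $F_\xi$ to be a normal distribution function, forms $G$ and $C$ and checks numerically that $G$ is indeed a distribution function:

```python
import numpy as np
from scipy.stats import norm

# MSRT data (toy choice, assumption): K = 2 pieces for shortage and surplus
alpha, u = np.array([1.0, 0.5]), np.array([0.0, 1.0])
beta, l = np.array([2.0, 1.0]), np.array([0.0, 1.5])

F = lambda t: norm.cdf(t, loc=2.0, scale=1.0)      # distribution function of xi

den = alpha.sum() + beta.sum()

def G(t):
    """The distribution function (7.5) of the equivalent SRT problem."""
    return (np.sum(alpha * F(t + u)) + np.sum(beta * F(t - l))) / den

C = (alpha.sum() * np.sum(beta * l) + beta.sum() * np.sum(alpha * u)) / den   # (7.6)

# G is a proper distribution function: monotone, G(-inf) = 0, G(+inf) = 1
ts = np.linspace(-5, 10, 1001)
vals = np.array([G(t) for t in ts])
assert np.all(np.diff(vals) >= -1e-12) and vals[0] < 1e-3 and vals[-1] > 1 - 1e-3
print(C, G(2.0))
```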