
Final report for technical cooperation between Georgia Institute of Technology and ONS - Operador Nacional do Sistema Elétrico

Alexander Shapiro and Wajdi Tekaya
School of Industrial and Systems Engineering,
Georgia Institute of Technology,
Atlanta, Georgia 30332-0205, USA

Joari Paulo da Costa and Murilo Pereira Soares
ONS - Operador Nacional do Sistema Elétrico
Rua da Quitanda, 196, Centro
Rio de Janeiro, RJ, 20091-005, Brasil

December 4, 2012


Contents

1 Introduction
2 Mathematical formulation and modeling
  2.1 Time series models for the inflows
3 Stochastic Dual Dynamic Programming method
  3.1 Generic description of the SDDP algorithm
    3.1.1 Backward step of the SDDP algorithm
    3.1.2 Forward step of the SDDP algorithm
  3.2 Stopping criteria and validation of the optimality gap
  3.3 Generic description of the risk neutral SDDP
4 Risk averse approach
  4.1 Risk averse SDDP method
    4.1.1 Backward step of the SDDP algorithm
    4.1.2 Forward step
    4.1.3 Dynamic choice of the weights
  4.2 Time consistency
  4.3 Generic description of the risk averse SDDP
5 Worst-case-expectation approach
  5.1 Static case
  5.2 Multistage case
  5.3 The SDDP approach to the worst-case-expectation problem
    5.3.1 Backward step of the SDDP algorithm
    5.3.2 Forward step of the SDDP algorithm
    5.3.3 Sampling from the uncertainty set
    5.3.4 Discussion of the SDDP algorithm
  5.4 Generic description of the worst-case-expectation SDDP
6 Computational aspects
  6.1 Cutting plane elimination procedure
  6.2 Risk neutral approach
    6.2.1 Typical bounds behaviour & gap based stopping criteria
    6.2.2 Policy value stopping criterion
    6.2.3 Variability of SAA problems
    6.2.4 Sensitivity to initial conditions
  6.3 Risk averse approach
  6.4 Worst-case-expectation approach
    6.4.1 Solution strategies
    6.4.2 Worst-case expectation approach
    6.4.3 Worst-case expectation approach vs. Risk averse approach
7 Conclusion


1 Introduction

The Brazilian interconnected power system is a large scale system planned and constructed considering the integrated utilization of the generation and transmission resources of all agents and the use of inter-regional energy interchanges, in order to achieve cost reduction and reliability in power supply. The power generation facilities as of December 2010 are composed of more than 200 power plants with installed capacity greater than 30 MW, owned by 108 public and private companies, called "Agents". 57 Agents own 141 hydro power plants located in 14 large basins; 69 of these plants have large reservoirs (monthly regulation or above), 68 are run-of-river plants, and four are pumping stations. Considering only the National Interconnected System (SIN), without captive self-producers, the installed capacity is projected to reach 171.1 GW in 2020, an increment of 61.6 GW over the 2010 installed capacity of 109.6 GW. Hydropower accounts for 56% of the expansion (34.8 GW), while biomass and wind power account for 25% of the expansion (15.4 GW).

The main transmission grid is operated and expanded in order to achieve safety of supply and system optimization. The inter-regional and inter-basin transmission links allow interchanges of large blocks of energy between areas, making it possible to take advantage of the hydrological diversity between river basins. The main transmission grid has 100.7 × 10³ km of lines above 230 kV, owned by 66 Agents, and is planned to reach 142.2 × 10³ km in 2020, an increment of 41.2%, mainly with the interconnection of the projects in the Amazonian region.

The purpose of hydrothermal system operation planning is to define an operation strategy which, for each stage of the planning period, given the system state at the beginning of the stage, produces generation targets for each plant. Hydrothermal operation planning can be seen as a decision problem under uncertainty because of unknown variables such as future inflows, demand, fuel costs and equipment availability. The existence of large multi-year regulating reservoirs makes the operation planning a multistage optimization problem; in the Brazilian case it is usual to consider a planning horizon of 5 years on a monthly basis. The existence of multiple interconnected hydro plants and transmission constraints characterizes the problem as large scale.

In summary, the Brazilian hydro power operation planning problem is a multistage (60 stages), large scale (more than 200 power plants, of which 141 are hydro plants), nonlinear and nonseparable stochastic optimization problem. This setting far exceeds current computer capacity to solve it with adequate accuracy in a reasonable time frame. The standard approach is to resort to a chain of models considering long, mid and short term planning horizons, in order to be able to tackle the problem in a reasonable computational time. For the long-term problem, it is usual to consider an approximate representation of the system, the so-called aggregate system model, which aggregates all hydro power plants belonging to a homogeneous hydrological region into a single equivalent energy reservoir, and to solve the resulting much smaller problem. The major components of the aggregate system model are the equivalent energy reservoir model and the total energy inflow (controllable and uncontrollable). The energy storage capacity of the equivalent energy reservoir can be estimated as the energy that can be produced by the depletion of the reservoirs of a system, provided a simplified operating rule that approximates the actual depletion policy.

For the Brazilian interconnected power system it is usual to consider four equivalent energy reservoirs, one in each of the four interconnected main regions: SE, S, N and NE. The resulting policy obtained with the aggregate representation can be further refined, so as to provide decisions for each of the hydro and thermal power plants. This can be done by solving the mid-term problem, considering a planning horizon of up to a few months and an individual representation of the hydro plants, with boundary conditions (the expected cost-to-go functions) given by the solution of the long-term problem. This is the approach nowadays used for solving the long and mid term hydrothermal power planning in the Brazilian interconnected power system.

2 Mathematical formulation and modeling

Mathematical algorithms compose the core of the Energy Operation Planning Support System. The objective is to compute an operation strategy which controls the operation costs, over a planning period of time, in a reasonably optimal way. This leads to the formulation of large scale multistage stochastic programming problems of the form

$$\min_{\substack{A_1x_1=b_1 \\ x_1\ge 0}} \; c_1^T x_1 + \mathbb{E}\left[ \min_{\substack{B_2x_1+A_2x_2=b_2 \\ x_2\ge 0}} c_2^T x_2 + \mathbb{E}\left[ \cdots + \mathbb{E}\left[ \min_{\substack{B_Tx_{T-1}+A_Tx_T=b_T \\ x_T\ge 0}} c_T^T x_T \right] \right] \right]. \tag{2.1}$$

Components of the vectors $c_t$, $b_t$ and matrices $A_t$, $B_t$ are modeled as random variables forming the stochastic data process $\xi_t = (c_t, A_t, B_t, b_t)$, $t = 2, \ldots, T$, with $\xi_1 = (c_1, A_1, b_1)$ being deterministic (not random). By $\xi_{[t]} = (\xi_1, \ldots, \xi_t)$ we denote the history of the data process up to time $t$.

In order to set the hydrothermal operation planning problem within this framework one can proceed as follows. Considering the aggregate representation of the hydro plants, the energy conservation equation for each equivalent energy reservoir $n$ can be written as

$$SE_{t,n} = SE_{t-1,n} + CE_{t,n} - GH_{t,n} - SP_{t,n}. \tag{2.2}$$

That is, the stored energy (SE) at the end of each stage (start of the next stage) is equal to the initial stored energy plus the controllable energy inflow (CE), minus the total hydro generated energy (GH) and losses (SP) due to spillage, evaporation, etc.

At each stage, the net subsystem load $L$, given by the remaining load after discounting the uncontrollable energy inflow from the total load, has to be met by the total hydro generation, the sum of all thermal generation belonging to subsystem $n$ (given by the set $NT_n$), and the net interconnection energy flow (NF) into the subsystem. In other words, the energy balance equation for subsystem $n$ is

$$GH_{t,n} + \sum_{j \in NT_n} GT_{t,j} + NF_{t,n} = L_{t,n}. \tag{2.3}$$

Note that this equation is always feasible (i.e., the problem has relatively complete recourse) due to the inclusion of a dummy thermal plant with generation capacity equal to the demand and an operation cost that reflects the social cost of not meeting the energy demand (deficit cost).

The constraints $B_t x_{t-1} + A_t x_t = b_t$ are obtained by writing
$$x_t = [SE;\, GH;\, GT;\, SP;\, NF], \quad b_t = [CE;\, L], \quad c_t = [0;\, 0;\, CT;\, 0;\, 0],$$
$$A_t = \begin{pmatrix} I & I & 0 & I & 0 \\ 0 & I & \Delta & 0 & I \end{pmatrix}, \qquad B_t = \begin{pmatrix} -I & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

where $\Delta = \{\delta_{n,j}\}$ with $\delta_{n,j} = 1$ for all $j \in NT_n$ and zero otherwise, $I$ and $0$ are identity and null matrices, respectively, of appropriate dimensions, and the components of CT are the unit operation cost of each thermal plant and the penalty for failure in load supply. Note that hydroelectric generation costs are assumed to be zero. Physical constraints on the variables, like limits on the capacity of the equivalent reservoir, hydro and thermal generation, transmission capacity and so on, are taken into account with constraints on $x_t$.
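To make the construction concrete, here is a small NumPy sketch that assembles $A_t$, $B_t$ and $b_t$ for a toy four-subsystem example; the dimensions, the plant-to-subsystem assignment and the numerical values are made up for illustration and are not the operational data.

```python
import numpy as np

# Toy dimensions (illustrative only): n subsystems, m thermal plants.
n, m = 4, 6
# Hypothetical assignment of thermal plants to subsystems (the sets NT_n).
NT = {0: [0, 1], 1: [2], 2: [3, 4], 3: [5]}

# Delta[i, j] = 1 if thermal plant j belongs to subsystem i, zero otherwise.
Delta = np.zeros((n, m))
for i, plants in NT.items():
    Delta[i, plants] = 1.0

I, O, Onm = np.eye(n), np.zeros((n, n)), np.zeros((n, m))

# x_t = [SE; GH; GT; SP; NF]; the first block row encodes the energy
# conservation equation (2.2), the second the energy balance equation (2.3).
A_t = np.block([[I, I, Onm, I, O],
                [O, I, Delta, O, I]])
B_t = np.block([[-I, O, Onm, O, O],
                [O, O, Onm, O, O]])

CE, L = np.ones(n), 2.0 * np.ones(n)   # made-up inflow and load values
b_t = np.concatenate([CE, L])          # b_t = [CE; L]
```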


2.1 Time series models for the inflows

The main variability in the considered applications comes from the uncertainty of the inflows. Therefore it is important to develop a realistic forecasting model for the inflows. In doing so, two major considerations should be taken into account. First, the forecasting model, with specified distributions of the involved errors, should give a reasonable representation of the variability of the inflows; this can be tested against available historical data. Another, not less important, consideration is that the forecasting model should be amenable to the subsequent numerical procedures. In particular, it should be possible to generate scenarios by sampling from the error distributions, and convexity should be preserved in the constructed optimization problems. One possible approach is described below.

The historical data is composed of 79 years of observations of the natural monthly energy inflow (from year 1931 to 2009) for each of the four systems. Let $X_t$, $t = 1, \ldots, 948$, denote the time series of monthly inflows for one of the regions. It can be verified that $X_t$ is highly skewed to the right for each of the four systems. This observation motivates considering $Y_t = \log(X_t)$ for the analysis. After taking the logarithm, the distributions become more symmetric. It can also be seen that the inflows of the N, NE and SE systems have a clear seasonal behavior, while for the S system this is not obvious.

Let $\mu_t = \mu_{t+12}$ be the monthly averages of $Y_t$ and $Z_t = Y_t - \mu_t$ be the corresponding residuals. A high value at lag 1 of the partial autocorrelation of the $Z_t$ time series, and insignificant values at larger lags, suggest the first order autoregressive AR(1) time series model for $Z_t$:

$$Z_t = \alpha + \phi Z_{t-1} + \varepsilon_t. \tag{2.4}$$

And indeed the AR(1) model gave a reasonably good fit to the data. This suggests the following model for the time series $Y_t$:

$$Y_t - \mu_t = \phi(Y_{t-1} - \mu_{t-1}) + \varepsilon_t, \tag{2.5}$$

where $\varepsilon_t$ is an i.i.d. sequence having normal distribution $N(0, \sigma^2)$. For the original time series $X_t$ this gives

$$X_t = e^{\varepsilon_t} e^{\mu_t - \phi\mu_{t-1}} X_{t-1}^{\phi}. \tag{2.6}$$

Unfortunately this model is not linear in $X_t$ and would result in a nonlinear multistage program. Therefore we proceed by using the following first order approximation of the function $y = x^\phi$ at $e^{\mu_{t-1}}$:
$$x^\phi \approx (e^{\mu_{t-1}})^\phi + \phi (e^{\mu_{t-1}})^{\phi-1}(x - e^{\mu_{t-1}}),$$
which leads to the following approximation of the model (2.6):
$$X_t = e^{\varepsilon_t}\left[ e^{\mu_t} + \phi e^{\mu_t - \mu_{t-1}}(X_{t-1} - e^{\mu_{t-1}}) \right]. \tag{2.7}$$

We allow, further, the constant $\phi$ to depend on the month, and hence consider the following time series model
$$X_t = e^{\varepsilon_t}\left[ e^{\mu_t} + \gamma_t e^{\mu_t - \mu_{t-1}}(X_{t-1} - e^{\mu_{t-1}}) \right] = (1-\gamma_t)e^{\varepsilon_t}e^{\mu_t} + \gamma_t e^{\varepsilon_t}e^{\mu_t - \mu_{t-1}} X_{t-1}, \tag{2.8}$$

with $\gamma_t = \gamma_{t+12}$. Note that model (2.8) guarantees that the values $X_t$ stay nonnegative for any realizations of the errors $\varepsilon_t$, provided that $\gamma_t \in [0, 1]$.

The parameters of model (2.8) are estimated directly from the data.


• Denote $R_t = \frac{X_t - e^{\mu_t}}{e^{\mu_t}}$. If the error term $\varepsilon_t$ is set to zero, i.e., the multiplicative error term $e^{\varepsilon_t}$ is set to one, (2.8) can be written as
$$R_t = \gamma_t R_{t-1}. \tag{2.9}$$
For each month, we perform a least squares fit to the $R_t$ sequence to obtain the monthly values of $\gamma_t$, assuming that $\gamma_t = \gamma_{t+12}$.

• The errors $\varepsilon_t$ are modelled as a component of a multivariate normal distribution $N(0, \Sigma_t)$, where $\Sigma_t$ is the sample covariance matrix of
$$\log\left( \frac{X_t}{e^{\mu_t} + \gamma_t e^{\mu_t - \mu_{t-1}}(X_{t-1} - e^{\mu_{t-1}})} \right)$$
on a monthly basis, i.e., $\Sigma_{t+12} = \Sigma_t$. In that way correlations between different regions are taken into account.

3 Stochastic Dual Dynamic Programming method

In this section we discuss an approach to the numerical solution of the multistage problem (2.1). The main tool is the Stochastic Dual Dynamic Programming (SDDP) algorithm and its extensions. There are three levels of approximation in solving the problem. The first level is modelling. The inflows are viewed as seasonal time series and modelled as a periodic auto-regressive (PAR(p)) process. Any such modelling involves inaccuracies: autoregressive parameters have to be estimated, error distributions are not precise, etc. We will refer to an optimization problem based on a current time series model as the "true" problem.

We will make the basic assumption that the data process $\xi_t$ is stagewise independent, i.e., the random vector $\xi_t$ is independent of $(\xi_1, \ldots, \xi_{t-1})$, $t = 2, \ldots, T$. This assumption is instrumental for the implementation of the SDDP algorithm. The data process $\xi_t$, $t = 1, \ldots, T$, of the "true" model has continuous distributions. Since the corresponding expectations (multidimensional integrals) cannot be computed in closed form, one needs to make a discretization of the data process $\xi_t$. That is, a sample $\xi_t^1, \ldots, \xi_t^{N_t}$, of size $N_t$, from the distribution of the random vector $\xi_t$, $t = 2, \ldots, T$, is generated. These samples generate a scenario tree with the total number of scenarios $N = \prod_{t=2}^{T} N_t$, each with equal probability $1/N$. Consequently the true problem is approximated by the so-called Sample Average Approximation (SAA) problem associated with this scenario tree. This corresponds to the second level of approximation in the current system. Note that with such a sampling approach the stagewise independence of the data process is preserved in the constructed SAA problem.

Even with a moderate number of scenarios per stage, say each $N_t = 100$, the total number of scenarios $N$ quickly becomes astronomically large as the number of stages increases. Therefore the constructed SAA problem can be solved only approximately. The SDDP method suggests a computationally tractable approach to solving the SAA, and hence the "true", problems, and can be viewed as the third level of approximation in the current system.

Note that the autoregressive models of the form (2.8) obviously do not satisfy the condition of stagewise independence. Nevertheless it is possible to make a transformation of the original problem (2.1) in such a way as to accommodate this condition. Let us observe that the inflows appear only in the right hand sides of the constraints of problem (2.1). So suppose that only the vectors $b_t$ of problem (2.1) are random, with each component $b_{it}$ satisfying the periodic autoregressive model of the form (2.8). That is,

$$b_t = \Phi_t b_{t-1} + h_t, \tag{3.1}$$


where $\Phi_t$ is a diagonal matrix with diagonal elements $\phi_{ii,t} = \gamma_{it} e^{\varepsilon_{it}} e^{\mu_{it} - \mu_{i,t-1}}$, and $h_{it} = (1 - \gamma_{it}) e^{\varepsilon_{it}} e^{\mu_{it}}$. Then the feasibility equations of problem (2.1) can be written as

$$\Phi_t b_{t-1} - b_t + h_t = 0, \quad B_t x_{t-1} + A_t x_t - b_t = 0, \quad x_t \ge 0, \quad t = 2, \ldots, T. \tag{3.2}$$

Therefore by replacing $x_t$ with $(x_t, b_t)$, and considering the corresponding data process $\xi_t$ formed by the random elements of $\Phi_t$ and $h_t$, we transform the problem to the stagewise independent case. The obtained problem is still linear with respect to the decision variables $x_t$ and $b_t$. Of course, the cost of this transformation is that the state is enlarged from $x_t$ to $(x_t, b_t)$.
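A minimal sketch of this reformulation, assuming (as above) that only the right hand sides are random: given realizations of $\Phi_t$ and $h_t$, it builds the constraint blocks of (3.2) for the augmented state $(x_t, b_t)$. The function name and block layout are illustrative.

```python
import numpy as np

def augmented_stage_constraints(A, B, Phi, h):
    """Constraint blocks of (3.2) for the augmented decision vector (x_t, b_t):
        row 1:  Phi b_{t-1} - b_t + h = 0
        row 2:  B x_{t-1} + A x_t - b_t = 0
    expressed as  A_aug (x_t, b_t) + B_aug (x_{t-1}, b_{t-1}) = rhs."""
    m, nx = A.shape                       # m constraints, nx decision variables
    A_aug = np.block([[np.zeros((m, nx)), -np.eye(m)],
                      [A,                 -np.eye(m)]])
    B_aug = np.block([[np.zeros((m, nx)), Phi],
                      [B,                 np.zeros((m, m))]])
    rhs = np.concatenate([-h, np.zeros(m)])
    return A_aug, B_aug, rhs
```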

3.1 Generic description of the SDDP algorithm

We give now a general description of the SDDP algorithm applied to the SAA problem. Suppose that $N_t$ points, $t = 2, \ldots, T$, are generated at every stage of the process. Let $\xi_t^j = (c_{tj}, A_{tj}, B_{tj}, b_{tj})$, $j = 1, \ldots, N_t$, $t = 2, \ldots, T$, be the generated points. As it was already mentioned, the total number of scenarios of the SAA problem is $N = \prod_{t=2}^{T} N_t$ and can be very large.

3.1.1 Backward step of the SDDP algorithm

Let $\bar{x}_t$, $t = 1, \ldots, T-1$, be trial points (we can, and eventually will, use more than one trial point at every stage of the backward step; the extension is straightforward), and let $\mathcal{Q}_t(x_{t-1})$ be the (expected value) cost-to-go functions of the dynamic programming equations associated with the multistage problem (2.1). Note that because of the stagewise independence assumption, the cost-to-go functions $\mathcal{Q}_t(x_{t-1})$ do not depend on the data process. Furthermore, let $\mathfrak{Q}_t(\cdot)$ be a current approximation of $\mathcal{Q}_t(\cdot)$ given by the maximum of a collection of cutting planes
$$\mathfrak{Q}_t(x_{t-1}) = \max_{k \in I_t}\left\{ \alpha_{tk} + \beta_{tk}^T x_{t-1} \right\}, \quad t = 1, \ldots, T-1. \tag{3.3}$$

At stage $t = T$ we solve the $N_T$ problems
$$\min_{x_T \in \mathbb{R}^{n_T}} c_{Tj}^T x_T \;\; \text{s.t.} \;\; B_{Tj}\bar{x}_{T-1} + A_{Tj}x_T = b_{Tj}, \; x_T \ge 0, \quad j = 1, \ldots, N_T. \tag{3.4}$$

Recall that $Q_{Tj}(\bar{x}_{T-1})$ is equal to the optimal value of problem (3.4) and that subgradients of $Q_{Tj}(\cdot)$ at $\bar{x}_{T-1}$ are given by $-B_{Tj}^T \pi_{Tj}$, where $\pi_{Tj}$ is a solution of the dual of (3.4). Therefore for the cost-to-go function $\mathcal{Q}_T(x_{T-1})$ we can compute its value and a subgradient at the point $\bar{x}_{T-1}$ by averaging the optimal values of (3.4) and the corresponding subgradients. Consequently we can construct a supporting plane to $\mathcal{Q}_T(\cdot)$ and add it to the collection of supporting planes of $\mathfrak{Q}_T(\cdot)$. Note that if we have several trial points at stage $T-1$, then this procedure should be repeated for each trial point and we add each constructed supporting plane.

Now going one stage back, let us recall that $Q_{T-1,j}(\bar{x}_{T-2})$ is equal to the optimal value of the problem
$$\min_{x_{T-1} \in \mathbb{R}^{n_{T-1}}} c_{T-1,j}^T x_{T-1} + \mathcal{Q}_T(x_{T-1}) \;\; \text{s.t.} \;\; B_{T-1,j}\bar{x}_{T-2} + A_{T-1,j}x_{T-1} = b_{T-1,j}, \; x_{T-1} \ge 0. \tag{3.5}$$

However, the function $\mathcal{Q}_T(\cdot)$ is not available. Therefore we replace it by $\mathfrak{Q}_T(\cdot)$ and hence consider the problem
$$\min_{x_{T-1} \in \mathbb{R}^{n_{T-1}}} c_{T-1,j}^T x_{T-1} + \mathfrak{Q}_T(x_{T-1}) \;\; \text{s.t.} \;\; B_{T-1,j}\bar{x}_{T-2} + A_{T-1,j}x_{T-1} = b_{T-1,j}, \; x_{T-1} \ge 0. \tag{3.6}$$


Recall that $\mathfrak{Q}_T(\cdot)$ is given by the maximum of affine functions (see (3.3)). Therefore we can write problem (3.6) in the form
$$\begin{array}{ll}
\min\limits_{x_{T-1} \in \mathbb{R}^{n_{T-1}},\, \theta \in \mathbb{R}} & c_{T-1,j}^T x_{T-1} + \theta \\
\text{s.t.} & B_{T-1,j}\bar{x}_{T-2} + A_{T-1,j}x_{T-1} = b_{T-1,j}, \; x_{T-1} \ge 0, \\
& \theta \ge \alpha_{Tk} + \beta_{Tk}^T x_{T-1}, \; k \in I_T.
\end{array} \tag{3.7}$$

Denote by $\tilde{Q}_{T-1,j}(\bar{x}_{T-2})$ the optimal value of problem (3.7). Problem (3.7) is a linear programming problem; its dual is also a linear programming problem, in the (vector) variable $\pi_{T-1,j}$ corresponding to the constraint $B_{T-1,j}\bar{x}_{T-2} + A_{T-1,j}x_{T-1} - b_{T-1,j} = 0$, and variables $\lambda_k$ corresponding to the constraints $\alpha_{Tk} + \beta_{Tk}^T x_{T-1} - \theta \le 0$, $k \in I_T$. Let $\pi_{T-1,j}$ be an optimal solution of the dual of problem (3.7) corresponding to the first constraint, and let
$$\tilde{Q}_{T-1}(\bar{x}_{T-2}) := \frac{1}{N_{T-1}}\sum_{j=1}^{N_{T-1}} \tilde{Q}_{T-1,j}(\bar{x}_{T-2})$$
and
$$g_{T-1} := -\frac{1}{N_{T-1}}\sum_{j=1}^{N_{T-1}} B_{T-1,j}^T \pi_{T-1,j}.$$
Consequently we add the corresponding affine function to the collection of cutting planes of $\mathfrak{Q}_{T-1}(\cdot)$, and so on going backward in $t$.
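For concreteness, the following sketch computes one cut along the lines just described, solving each stage subproblem of the form (3.7) with SciPy's LP solver and averaging optimal values and dual-based subgradients. The data layout (per-realization tuples and $(\alpha, \beta)$ cut pairs) is an assumption made for the example.

```python
import numpy as np
from scipy.optimize import linprog

def stage_cut(x_trial, realizations, cuts):
    """Build one cutting plane for the expected cost-to-go at the trial point
    x_trial. realizations is a list of (c, A, B, b) tuples for the stage;
    cuts is a list of (alpha, beta) pairs describing the current approximation
    of the next-stage cost-to-go (empty at the last stage)."""
    vals, grads = [], []
    for c, A, B, b in realizations:
        n = A.shape[1]
        # Variables (x_t, theta):  min c^T x_t + theta
        # s.t. A x_t = b - B x_trial,  theta >= alpha_k + beta_k^T x_t for all k.
        obj = np.concatenate([c, [1.0]])
        A_eq = np.hstack([A, np.zeros((A.shape[0], 1))])
        b_eq = b - B @ x_trial
        if cuts:
            A_ub = np.array([np.concatenate([beta, [-1.0]]) for _, beta in cuts])
            b_ub = np.array([-alpha for alpha, _ in cuts])
            theta_bnd = (None, None)
        else:
            A_ub, b_ub = None, None
            theta_bnd = (0.0, 0.0)        # no future cost at the last stage
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * n + [theta_bnd], method="highs")
        pi = res.eqlin.marginals          # duals of the equality constraints
        vals.append(res.fun)
        grads.append(-B.T @ pi)           # subgradient with respect to x_trial
    q, g = np.mean(vals), np.mean(grads, axis=0)
    # The cut reads Q(x) >= q + g^T (x - x_trial), i.e. alpha = q - g^T x_trial.
    return q - g @ x_trial, g
```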

3.1.2 Forward step of the SDDP algorithm

The computed approximations $\mathfrak{Q}_2(\cdot), \ldots, \mathfrak{Q}_T(\cdot)$ (with $\mathfrak{Q}_{T+1}(\cdot) \equiv 0$ by definition) and a feasible first stage solution $\bar{x}_1$ can be used for constructing an implementable policy as follows. For a realization $\xi_t = (c_t, A_t, B_t, b_t)$, $t = 2, \ldots, T$, of the data process, decisions $\bar{x}_t$, $t = 1, \ldots, T$, are computed recursively going forward, with $\bar{x}_1$ being the chosen feasible solution of the first stage problem and $\bar{x}_t$ being an optimal solution of
$$\min_{x_t} c_t^T x_t + \mathfrak{Q}_{t+1}(x_t) \;\; \text{s.t.} \;\; A_t x_t = b_t - B_t\bar{x}_{t-1}, \; x_t \ge 0, \tag{3.8}$$
for $t = 2, \ldots, T$. These optimal solutions can be used as trial decisions in the backward step of the algorithm. Note that $\bar{x}_t$ is a function of $\bar{x}_{t-1}$ and $\xi_t$, i.e., $\bar{x}_t$ is a function of $\xi_{[t]} = (\xi_1, \ldots, \xi_t)$, for $t = 2, \ldots, T$. That is, the policy $\bar{x}_t = \bar{x}_t(\xi_{[t]})$ is nonanticipative and by construction satisfies the feasibility constraints for every realization of the data process. Thus this policy is implementable and feasible. If we restrict the data process to the generated sample, i.e., we consider only realizations $\xi_2, \ldots, \xi_T$ of the data process drawn from scenarios of the SAA problem, then $\bar{x}_t = \bar{x}_t(\xi_{[t]})$ becomes an implementable and feasible policy for the corresponding SAA problem. On the other hand, if we draw samples from the true (original) distribution, it becomes an implementable and feasible policy for the true problem.

Since the policy $\bar{x}_t = \bar{x}_t(\xi_{[t]})$ is feasible, the expectation
$$\mathbb{E}\left[ \sum_{t=1}^{T} c_t^T \bar{x}_t(\xi_{[t]}) \right] \tag{3.9}$$

gives an upper bound for the optimal value of the corresponding multistage problem. That is, if we take this expectation over the true probability distribution of the random data process, then the above expectation (3.9) gives an upper bound for the optimal value of the true problem. On the other hand, if we restrict the data process to scenarios of the SAA problem, each with equal probability $1/N$, then the expectation (3.9) gives an upper bound for the optimal value of the SAA problem, conditional on the sample used in construction of the SAA problem. Of course, if the constructed policy is optimal, then the expectation (3.9) is equal to the optimal value of the corresponding problem.

The forward step of the SDDP algorithm consists in generating $M$ random realizations (scenarios) of the data process and computing the respective policy values
$$\vartheta_j := \sum_{t=1}^{T} c_{tj}^T \bar{x}_{tj}, \quad j = 1, \ldots, M. \tag{3.10}$$
That is, $\vartheta_j$ is the value of the corresponding policy for the realization $\xi_1, \xi_2^j, \ldots, \xi_T^j$ of the data process. As such, $\vartheta_j$ is an unbiased estimate of the expected value of that policy, i.e., $\mathbb{E}[\vartheta_j] = \mathbb{E}\left[\sum_{t=1}^{T} c_t^T \bar{x}_t(\xi_{[t]})\right]$.

The forward step has two functions. First, some (or all) of the computed solutions $\bar{x}_{tj}$ can be used as trial points in the next iteration of the backward step of the algorithm. Second, these solutions can be employed for constructing a statistical upper bound for the optimal value of the corresponding multistage program (true or SAA, depending on from which distribution the sample scenarios were generated).

Let
$$\bar{\vartheta}_M := \frac{1}{M}\sum_{j=1}^{M}\vartheta_j \quad \text{and} \quad \sigma_M^2 := \frac{1}{M-1}\sum_{j=1}^{M}\left(\vartheta_j - \bar{\vartheta}_M\right)^2 \tag{3.11}$$

be the respective sample mean and sample variance of the computed values $\vartheta_j$. Since $\vartheta_j$ is an unbiased estimate of the expected value of the constructed policy, $\bar{\vartheta}_M$ is also an unbiased estimate of the expected value of that policy. By invoking the Central Limit Theorem we can say that $\bar{\vartheta}_M$ has approximately normal distribution, provided that $M$ is reasonably large. This leads to the following (approximate) $(1-\alpha)$-confidence upper bound for the value of that policy:
$$u_{\alpha,M} := \bar{\vartheta}_M + z_\alpha \frac{\sigma_M}{\sqrt{M}}. \tag{3.12}$$
Here $1-\alpha \in (0,1)$ is a chosen confidence level and $z_\alpha = \Phi^{-1}(1-\alpha)$, where $\Phi(\cdot)$ is the cdf of the standard normal distribution. That is, with probability approximately $1-\alpha$ the expected value of the constructed policy is less than the upper bound $u_{\alpha,M}$. Since the expected value (3.9) of the constructed policy is bigger than or equal to the optimal value of the considered multistage problem, $u_{\alpha,M}$ also gives an upper bound for the optimal value of the multistage problem with confidence at least $1-\alpha$. Note that the upper bound $u_{\alpha,M}$ can be used for the SAA or the true problem, depending on from which distribution the sampled scenarios were generated.
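In practice (3.11)–(3.12) amount to a few lines; a small sketch:

```python
import numpy as np
from scipy.stats import norm

def upper_bound(theta, alpha=0.05):
    """Statistical (1 - alpha)-confidence upper bound (3.12) computed from the
    simulated policy values theta_1, ..., theta_M of (3.10)."""
    theta = np.asarray(theta, dtype=float)
    M = len(theta)
    z_alpha = norm.ppf(1 - alpha)          # z_alpha = Phi^{-1}(1 - alpha)
    return theta.mean() + z_alpha * theta.std(ddof=1) / np.sqrt(M)
```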

3.2 Stopping criteria and validation of the optimality gap

Consider the optimal value, denoted Opt, of the first stage problem computed at a backward step of the algorithm. That is, Opt is the optimal value of the problem
$$\min_{x_1} c_1^T x_1 + \mathfrak{Q}_2(x_1) \;\; \text{s.t.} \;\; A_1 x_1 = b_1, \; x_1 \ge 0. \tag{3.13}$$

Note that the function $\mathfrak{Q}_2(x_1)$, and hence Opt, can change from one iteration of the algorithm to another. If no cuts are discarded during consecutive iterations of the algorithm, then Opt is monotonically increasing (or rather monotonically nondecreasing) with every new iteration of the algorithm. Since $\mathfrak{Q}_t(\cdot)$ is the maximum of cutting planes of the cost-to-go function $\mathcal{Q}_t(\cdot)$, we have that
$$\mathcal{Q}_t(\cdot) \ge \mathfrak{Q}_t(\cdot), \quad t = 2, \ldots, T. \tag{3.14}$$

Therefore Opt gives a lower bound for the optimal value of the considered SAA problem. This lower bound is deterministic (i.e., it is not based on sampling) when applied to the corresponding SAA problem. As far as the true problem is concerned, recall that the optimal value of the true problem is greater than or equal to the expectation of the optimal value of the SAA problem. Therefore on average Opt is also a lower bound for the optimal value of the true problem.

The current procedure for stopping the iterations (stopping criterion) is when the lower end
$$l_{\alpha,M} := \bar{\vartheta}_M - z_\alpha \frac{\sigma_M}{\sqrt{M}} \tag{3.15}$$
of the $100(1-\alpha)\%$ confidence interval (constructed by the forward step procedure at a given iteration of the algorithm) becomes smaller than the corresponding lower bound Opt. This stopping criterion depends on the number of scenarios used in the forward step and on the chosen confidence level $100(1-\alpha)\%$. Reducing the number of scenarios increases the standard deviation of the corresponding estimate, and hence makes the lower end of the confidence interval smaller. Increasing the confidence level makes the confidence interval larger, i.e., decreases its lower end. This indicates that for a sufficiently large confidence level the algorithm could be stopped at any iteration; this stopping criterion does not give any reasonable guarantee of the quality of the obtained solution and could result in premature stopping of the iteration procedure.

Consider the upper end $u_{\alpha,M}$ of the confidence interval (defined in (3.12)). As it was argued in Section 3.1.2, $u_{\alpha,M}$ gives an upper bound for the value of the considered policy, and hence for the optimal value of the corresponding multistage problem, with confidence of about $100(1-\alpha)\%$. Therefore
$$\Delta_{\alpha,M} := u_{\alpha,M} - \text{Opt} \tag{3.16}$$

gives an estimate of the optimality gap of the considered policy with confidence $100(1-\alpha)\%$. Several remarks are now in order. The values $u_{\alpha,M}$ and Opt, and hence $\Delta_{\alpha,M}$, can change from one iteration of the algorithm to the next. The lower bound Opt, applied to the constructed SAA problem, is deterministic. On the other hand, the upper bound $u_{\alpha,M}$ is a function of the generated scenarios and thus is stochastic even for the considered (fixed) SAA problem. This upper bound may vary for different sets of random samples. Therefore even if the lower bound Opt is monotonically increasing, the gap estimate $\Delta_{\alpha,M}$ does not have to decrease with every new iteration.

In some implementations the sample size $M$ of the forward step procedure can be small; it can even be $M = 1$. In that case the variability of the average estimate $\bar{\vartheta}_M$ can be large and the corresponding upper bound cannot be reliably estimated by $u_{\alpha,M}$. Yet it makes sense to average $\bar{\vartheta}_M$ over the last $K$ iterations, especially after the lower bound Opt starts to stabilize.

It should be made clear that it is unrealistic to expect that large SAA problems can be solved to true optimality in a reasonable computational time. In other words, for large problems the estimate $\Delta_{\alpha,M}$ of the optimality gap cannot be made very small, say less than 1%, in a reasonable computational time. Therefore it makes sense to have alternative stopping criteria based on stabilization of the lower bound. Stabilization of the first stage optimal solutions can also be taken into account.

3.3 Generic description of the risk neutral SDDP

An algorithmic description of the SDDP method for the risk neutral approach is presented in Algorithm 1.


Algorithm 1 Risk neutral SDDP algorithm

Require: initial lower approximations {Q_t^0}, t = 2, ..., T+1, and accuracy ε > 0

1:  Initialize: i ← 0, z̄ ← +∞ (upper bound), z ← −∞ (lower bound)
2:  while z̄ − z > ε do
3:     Sample M scenarios {{c_tk, A_tk, B_tk, b_tk}, t = 2, ..., T}, k = 1, ..., M
4:     (Forward step)
5:     for k = 1 → M do
6:        for t = 1 → T do
7:           x_t^k ← argmin { c_tk^T x_t + Q_{t+1}^i(x_t) : B_tk x_{t−1}^k + A_tk x_t = b_tk, x_t ≥ 0 }
8:        end for
9:        ϑ_k ← Σ_{t=1}^T c_tk^T x_t^k
10:    end for
11:    (Upper bound update)
12:    z̄ ← ϑ̄_M + z_α σ_M / √M, with ϑ̄_M and σ_M computed from ϑ_1, ..., ϑ_M as in (3.11)
13:    (Backward step)
14:    for k = 1 → M do
15:       for t = T → 2 do
16:          for j = 1 → N_t do
17:             [Q_tj(x_{t−1}^k), π_tj^k] ← min { c_tj^T x_t + Q_{t+1}^i(x_t) : B_tj x_{t−1}^k + A_tj x_t = b_tj, x_t ≥ 0 }  (optimal value and dual solution)
18:          end for
19:          Q̄_t(x_{t−1}^k) ← (1/N_t) Σ_{j=1}^{N_t} Q_tj(x_{t−1}^k);  g_t^k ← −(1/N_t) Σ_{j=1}^{N_t} B_tj^T π_tj^k
20:          Q_t^{i+1}(·) ← max{ Q_t^i(·), Q̄_t(x_{t−1}^k) + (g_t^k)^T (· − x_{t−1}^k) }  (add the new cut)
21:       end for
22:    end for
23:    (Lower bound update)
24:    z ← min { c_1^T x_1 + Q_2^{i+1}(x_1) : A_1 x_1 = b_1, x_1 ≥ 0 }
25:    i ← i + 1
26: end while

4 Risk averse approach

In formulation (2.1) the expected value $\mathbb{E}\left[\sum_{t=1}^{T} c_t^T x_t\right]$ of the total cost is minimized subject to the feasibility constraints. That is, the total cost is optimized (minimized) on average. Since the costs are functions of the random data process, they are random and hence subject to random perturbations. For a particular realization of the data process these costs could be much bigger than their average (i.e., expectation) values. We will refer to formulation (2.1) as risk neutral, as opposed to the risk averse approaches which we discuss below.

The goal of a risk averse approach is to avoid large values of the costs for some possible realizations of the data process at every stage of the considered time horizon. One such approach would be to maintain the constraints $c_t^T x_t \le \theta_t$, $t = 1, \ldots, T$, for chosen upper levels $\theta_t$ and all possible realizations of the data process. However, trying to enforce these upper limits under any circumstances could be unrealistic and infeasible. One may try to relax these constraints by enforcing them with a high (close to one) probability. However, introducing such so-called chance constraints can still result in infeasibility, and moreover is very difficult to handle numerically. So we consider here penalization approaches. That is, at every stage the cost is penalized for exceeding a specified upper limit.

In a simple form this leads to a risk averse formulation where the costs $c_t^T x_t$ are penalized¹ by $\kappa_t[c_t^T x_t - \theta_t]_+$, with $\theta_t$ and $\kappa_t \ge 0$, $t = 2, \ldots, T$, being chosen constants. That is, the costs $c_t^T x_t$ are replaced by the functions $f_t(x_t) = c_t^T x_t + \kappa_t[c_t^T x_t - \theta_t]_+$ in the objective of problem (2.1). This leads to the following nested formulation of a multistage problem
$$\min_{\substack{A_1x_1=b_1 \\ x_1\ge 0}} f_1(x_1) + \mathbb{E}\left[ \min_{\substack{B_2x_1+A_2x_2=b_2 \\ x_2\ge 0}} f_2(x_2) + \mathbb{E}\left[ \cdots + \mathbb{E}\left[ \min_{\substack{B_Tx_{T-1}+A_Tx_T=b_T \\ x_T\ge 0}} f_T(x_T) \right] \right] \right]. \tag{4.1}$$

The additional penalty terms represent the penalty for exceeding the upper limits $\theta_t$. An immediate question is how to choose the constants $\theta_t$ and $\kappa_t$. Note that in this approach the upper limits $\theta_t$ are fixed (chosen a priori) and not adapted to the current realization of the random process. Observe that the optimal solutions of the corresponding risk averse problem are unchanged if the penalty term at the $t$-th stage is changed to $\theta_t + \kappa_t[c_t^T x_t - \theta_t]_+$ by adding the constant $\theta_t$. Now if we adapt the upper limits $\theta_t$ to a realization of the data process, by taking them to be $(1-\alpha_t)$-quantiles of $c_t^T x_t$ conditional on the observed history $\xi_{[t-1]} = (\xi_1, \ldots, \xi_{t-1})$ of the data process, we end up with penalty terms given by $\text{AV@R}_{\alpha_t}$ with $\alpha_t = 1/\kappa_t$. Recall that the Average Value-at-Risk of a random variable² $Z$ is defined as
$$\text{AV@R}_\alpha[Z] := \inf_{u \in \mathbb{R}}\left\{ u + \alpha^{-1}\mathbb{E}[Z - u]_+ \right\}. \tag{4.2}$$

The minimum on the right hand side of (4.2) is attained at $u = \text{V@R}_\alpha(Z)$, where $\text{V@R}_\alpha(Z)$ is the (say left side) $(1-\alpha)$-quantile of the distribution of $Z$. That is,
$$\text{V@R}_\alpha(Z) := \inf\{t : F(t) \ge 1-\alpha\},$$
where $F(\cdot)$ is the cumulative distribution function (cdf) of the random variable $Z$. Therefore
$$\text{AV@R}_\alpha[Z] = \text{V@R}_\alpha(Z) + \alpha^{-1}\mathbb{E}\left[Z - \text{V@R}_\alpha(Z)\right]_+. \tag{4.3}$$
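For a sample $Z_1, \ldots, Z_N$ with equal probabilities, (4.3) gives a direct way of computing AV@R; a small sketch:

```python
import numpy as np

def avar(z, alpha):
    """Sample Average Value-at-Risk per (4.3): the (1 - alpha)-quantile (V@R)
    plus the scaled expected excess over it. Assumes equally likely samples."""
    z = np.asarray(z, dtype=float)
    var = np.quantile(z, 1 - alpha)        # V@R_alpha, the (1 - alpha)-quantile
    return var + np.mean(np.maximum(z - var, 0.0)) / alpha
```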

This leads to the following nested risk averse formulation of the corresponding multistage problem
$$\min_{\substack{A_1x_1=b_1 \\ x_1\ge 0}} c_1^T x_1 + \rho_{2|\xi_1}\left[ \min_{\substack{B_2x_1+A_2x_2=b_2 \\ x_2\ge 0}} c_2^T x_2 + \cdots + \rho_{T|\xi_{[T-1]}}\left[ \min_{\substack{B_Tx_{T-1}+A_Tx_T=b_T \\ x_T\ge 0}} c_T^T x_T \right] \right]. \tag{4.4}$$

Here $\xi_2, \ldots, \xi_T$ is the random process (formed from the random elements of the data $c_t, A_t, B_t, b_t$), $\mathbb{E}\left[Z|\xi_{[t-1]}\right]$ denotes the conditional expectation of $Z$ given $\xi_{[t-1]}$, $\text{AV@R}_{\alpha_t}\left[Z|\xi_{[t-1]}\right]$ is the conditional analogue of $\text{AV@R}_{\alpha_t}[Z]$ given $\xi_{[t-1]}$, and
$$\rho_{t|\xi_{[t-1]}}[Z] := (1-\lambda_t)\mathbb{E}\left[Z|\xi_{[t-1]}\right] + \lambda_t\text{AV@R}_{\alpha_t}\left[Z|\xi_{[t-1]}\right], \tag{4.5}$$
with $\lambda_t \in [0,1]$ and $\alpha_t \in (0,1)$ being chosen parameters. In formulation (4.4) the penalty terms $\alpha_t^{-1}\left[c_t^T x_t - \text{V@R}_\alpha(c_t^T x_t)\right]_+$ are conditional, i.e., are adapted to the random process by the optimization procedure. Therefore we refer to the risk averse formulation (4.4) as adaptive.

¹Recall that $[a]_+$ denotes $\max\{a, 0\}$, i.e., $[a]_+ = a$ if $a \ge 0$ and $[a]_+ = 0$ if $a < 0$.
²In some publications the Average Value-at-Risk is called the Conditional Value-at-Risk and denoted $\text{CV@R}_\alpha$. Since we deal here with conditional $\text{AV@R}_\alpha$, it would be awkward to call it conditional $\text{CV@R}_\alpha$.


It is also possible to give the following interpretation of the adaptive risk averse formulation (4.4). It is clear from the definition (4.3) that $\text{AV@R}_\alpha[Z] \ge \text{V@R}_\alpha(Z)$. Therefore $\rho_{t|\xi_{[t-1]}}[Z] \ge \varrho_{t|\xi_{[t-1]}}[Z]$, where
$$\varrho_{t|\xi_{[t-1]}}[Z] = (1-\lambda_t)\mathbb{E}\left[Z|\xi_{[t-1]}\right] + \lambda_t\text{V@R}_{\alpha_t}\left[Z|\xi_{[t-1]}\right]. \tag{4.6}$$
If we replaced $\rho_{t|\xi_{[t-1]}}[Z]$ in the risk averse formulation (4.4) by $\varrho_{t|\xi_{[t-1]}}[Z]$, we would be minimizing a weighted average of means and $(1-\alpha)$-quantiles, which would be a natural way of dealing with the involved risk. Unfortunately such a formulation leads to a nonconvex and computationally intractable problem. This is one of the main reasons for using $\text{AV@R}_\alpha$ instead of $\text{V@R}_\alpha$ in the corresponding risk averse formulations.

4.1 Risk averse SDDP method

With relatively simple additional effort the SDDP algorithm can be applied to risk averse problems of the form (4.4). Assuming that the stagewise independence condition holds, the corresponding dynamic programming equations are
$$Q_t(x_{t-1}, \xi_t) = \inf_{x_t \in \mathbb{R}^{n_t}}\left\{ c_t^T x_t + \mathcal{Q}_{t+1}(x_t) : B_t x_{t-1} + A_t x_t = b_t, \; x_t \ge 0 \right\}, \tag{4.7}$$
with
$$\mathcal{Q}_{t+1}(x_t) := \rho_{t+1}\left[ Q_{t+1}(x_t, \xi_{t+1}) \right], \tag{4.8}$$
where
$$\rho_t[Z] = (1-\lambda_t)\mathbb{E}[Z] + \lambda_t\text{AV@R}_{\alpha_t}[Z]. \tag{4.9}$$

At the first stage the problem
$$\min_{x_1 \in \mathbb{R}^{n_1}} c_1^T x_1 + \mathcal{Q}_2(x_1) \;\; \text{s.t.} \;\; A_1 x_1 = b_1, \; x_1 \ge 0, \tag{4.10}$$
should be solved. Note that because of the stagewise independence, the cost-to-go functions $\mathcal{Q}_{t+1}(x_t)$ and the mean-risk measures $\rho_{t+1}$ in (4.8) do not depend on the data process. Note also that since the considered risk measures are convex and monotone, the cost-to-go functions $\mathcal{Q}_{t+1}(x_t)$ are convex.

4.1.1 Backward step of the SDDP algorithm

The corresponding SAA problem is obtained by replacing the expectations with their sample average estimates. The cost-to-go functions of the SAA problem are
$$\mathcal{Q}_{t+1}(x_t) = (1-\lambda_t)\widehat{Q}_{t+1}(x_t) + \lambda_t\left( Q_{t+1,\iota}(x_t) + \frac{1}{\alpha_t N_t}\sum_{j=1}^{N_t}\left[ Q_{t+1,j}(x_t) - Q_{t+1,\iota}(x_t) \right]_+ \right), \tag{4.11}$$
where $\widehat{Q}_{t+1}(x_t) := N_t^{-1}\sum_{j=1}^{N_t} Q_{t+1,j}(x_t)$ is the sample average and $Q_{t+1,\iota}(x_t)$ is the $(1-\alpha_t)$ sample quantile of $Q_{t+1,1}(x_t), \ldots, Q_{t+1,N_t}(x_t)$. That is, the numbers $Q_{t+1,j}(x_t)$, $j = 1, \ldots, N_t$, are arranged in increasing order $Q_{t+1,\pi(1)}(x_t) \le \cdots \le Q_{t+1,\pi(N_t)}(x_t)$, and $\iota = \pi(j)$ where $j$ is the smallest integer such that $j \ge (1-\alpha_t)N_t$. Note that if $(1-\alpha_t)N_t$ is not an integer, then $\iota$ remains the same for small perturbations of $x_t$.

In order to apply the backward step of the SDDP algorithm we need to know how to compute the respective cuts of the cost-to-go functions. This means that we need to know how to compute a subgradient of $\mathcal{Q}_{t+1}(\cdot)$ at a given point $x_t$. For the mean-risk measure $\rho_t$ (defined in (4.9)) such a subgradient is given by
$$g_{t+1} = (1-\lambda_t)\bar{\gamma}_{t+1} + \lambda_t\left( \gamma_{t+1,\iota} + \frac{1}{\alpha_t N_t}\sum_{j=1}^{N_t}\zeta_{t+1,j} \right), \tag{4.12}$$
where $\gamma_{t+1,j}$ is a subgradient of $Q_{t+1,j}(\cdot)$ at $x_t$, $\bar{\gamma}_{t+1} := N_t^{-1}\sum_{j=1}^{N_t}\gamma_{t+1,j}$, and
$$\zeta_{t+1,j} = \begin{cases} 0 & \text{if } Q_{t+1,j}(x_t) - Q_{t+1,\iota}(x_t) < 0, \\ \gamma_{t+1,j} - \gamma_{t+1,\iota} & \text{if } Q_{t+1,j}(x_t) - Q_{t+1,\iota}(x_t) \ge 0. \end{cases} \tag{4.13}$$

In the backward step of the SDDP algorithm the above formulas are applied to the piecewise linear lower approximations $\mathfrak{Q}_{t+1}(\cdot)$ exactly in the same way as in the risk neutral case.
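A sketch of how (4.11)–(4.13) combine per-realization stage values and subgradients into a risk-adjusted cut, assuming $N_t$ equally likely realizations; the function name and array layout are illustrative.

```python
import numpy as np

def risk_averse_cut(vals, grads, alpha, lam):
    """Combine per-scenario stage values (shape (N,)) and subgradients
    (shape (N, dim)) into the risk-adjusted value and subgradient of
    (4.11)-(4.13), for N equally likely scenarios."""
    vals, grads = np.asarray(vals, dtype=float), np.asarray(grads, dtype=float)
    N = len(vals)
    order = np.argsort(vals)
    iota = order[int(np.ceil((1 - alpha) * N)) - 1]    # (1-alpha) sample quantile
    excess = vals - vals[iota] >= 0                     # scenarios at/above quantile
    avar_val = vals[iota] + np.sum(np.maximum(vals - vals[iota], 0.0)) / (alpha * N)
    avar_grad = grads[iota] + np.sum(grads[excess] - grads[iota], axis=0) / (alpha * N)
    val = (1 - lam) * vals.mean() + lam * avar_val
    grad = (1 - lam) * grads.mean(axis=0) + lam * avar_grad
    return val, grad
```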

4.1.2 Forward step

The constructed lower approximations $\mathfrak{Q}_t(\cdot)$ of the cost-to-go functions define a feasible policy and hence can be used in the forward step procedure in the same way as discussed in Section 3.1.2. That is, for a given scenario (sample path), starting with a feasible first stage solution $\bar{x}_1$, decisions $\bar{x}_t$, $t = 2, \ldots, T$, are computed recursively going forward, with $\bar{x}_t$ being an optimal solution of
$$\min_{x_t} c_t^T x_t + \mathfrak{Q}_{t+1}(x_t) \;\; \text{s.t.} \;\; A_t x_t = b_t - B_t\bar{x}_{t-1}, \; x_t \ge 0, \tag{4.14}$$
for $t = 2, \ldots, T$. These optimal solutions can be used as trial decisions in the backward step of the algorithm.

Unfortunately there is no easy way to evaluate the risk-adjusted cost
$$c_1^T x_1 + \rho_{2|\xi_1}\left[ c_2^T x_2(\xi_{[2]}) + \cdots + \rho_{T|\xi_{[T-1]}}\left( c_T^T x_T(\xi_{[T]}) \right) \right] \tag{4.15}$$
of the obtained policy, and hence to construct an upper bound for the optimal value of the corresponding risk-averse problem (4.4). Therefore a stopping criterion based on stabilization of the lower bound was used in the numerical experiments. Of course, the expected value (3.9) of the constructed policy can be estimated in the same way as in the risk neutral case by the averaging procedure.

4.1.3 Dynamic choice of the weights

One of the questions related to the above risk averse approach is how to choose the weights $\lambda_t$ for the risk measures $\rho_t$ defined in (4.9). So far we considered the case where the weights $\lambda_t$, responsible for a compromise between optimizing on average and controlling the risk, are constants, i.e., are chosen a priori. Let us consider now a situation where $\lambda_t = \lambda_t(\xi_{[t-1]})$ are functions of the data process. That is, in the nested formulation (4.4) we define the corresponding conditional risk measures as
$$\rho_{t|\xi_{[t-1]}}[Z] := \left(1-\lambda_t(\xi_{[t-1]})\right)\mathbb{E}\left[Z|\xi_{[t-1]}\right] + \lambda_t(\xi_{[t-1]})\text{AV@R}_{\alpha_t}\left[Z|\xi_{[t-1]}\right]. \tag{4.16}$$
Note again that the difference between definitions (4.5) and (4.16) is that in (4.16) the weights $\lambda_t(\xi_{[t-1]})$ are functions of the data process.

The dynamic programming equations (4.7)–(4.8) can be written in a similar way, but now the cost-to-go functions
$$Q_t(x_{t-1}, \xi_{[t]}) = \inf_{x_t \in \mathbb{R}^{n_t}}\left\{ c_t^T x_t + \mathcal{Q}_{t+1}(x_t, \xi_{[t]}) : B_t x_{t-1} + A_t x_t = b_t, \; x_t \ge 0 \right\} \tag{4.17}$$
and
$$\mathcal{Q}_{t+1}(x_t, \xi_{[t]}) = \rho_{t+1|\xi_{[t]}}\left[ Q_{t+1}(x_t, \xi_{[t+1]}) \right] \tag{4.18}$$
are functions of the data process even in the case of stagewise independence.

In order to simplify this, suppose now that the weights $\lambda_t(\xi_{t-1})$ are functions of the last observed data vector $\xi_{t-1}$, rather than of the whole history $\xi_{[t-1]}$ of the data process. Also suppose that the stagewise independence condition holds. In that case equation (4.18) takes the form
$$\mathcal{Q}_{t+1}(x_t, \xi_t) = \left(1-\lambda_{t+1}(\xi_t)\right)\mathbb{E}\left[Q_{t+1}(x_t, \xi_{t+1})\right] + \lambda_{t+1}(\xi_t)\text{AV@R}_{\alpha_{t+1}}\left[Q_{t+1}(x_t, \xi_{t+1})\right], \tag{4.19}$$
with the cost-to-go functions $Q_t(x_{t-1}, \xi_t)$ and $\mathcal{Q}_{t+1}(x_t, \xi_t)$ depending only on $\xi_t$ rather than on the whole history $\xi_{[t]}$.

Suppose further that each random vector $\xi_t$ can take a finite number of possible values, i.e., the data vectors are discretized. Then the backward step of the SDDP algorithm can be applied by making a piecewise linear approximation of $\mathcal{Q}_{t+1}(\cdot, \xi_t)$ for every possible value of $\xi_t$. This, however, can be computationally very expensive even with a moderate number of discretizations per stage.

4.2 Time consistency

A solution of the considered multistage problem (risk neutral or risk averse) is a sequence of decisions, called a decision rule or a policy:
$$\text{decision}\,(x_1) \rightsquigarrow \text{observation}\,(\xi_2) \rightsquigarrow \text{decision}\,(x_2) \rightsquigarrow \cdots \rightsquigarrow \text{observation}\,(\xi_T) \rightsquigarrow \text{decision}\,(x_T). \tag{4.20}$$
That is, for every possible realization of the data process $\xi_1, \ldots, \xi_T$, and at every time stage $t = 1, \ldots, T$, the user is given the decision $x_t = x_t(\xi_{[t]})$. The decision at time $t$ is a function of the observed history of the data process up to time $t$ and should not depend on future observations. This is the basic requirement of so-called nonanticipativity of the decision rule. A policy is said to be feasible if it satisfies the corresponding feasibility constraints for all possible realizations of the data process.

The concept of nonanticipativity does not say anything about optimality of the considered policy; it only enforces the rule that our decisions can be based on the observed past and should not depend on the unknown future. As far as optimality is concerned, with every policy we associate a cost and optimize (say minimize) this cost over all feasible policies. For example, in the (risk neutral) case of the multistage problem (2.1) we minimize the expected total cost $\mathbb{E}\left[\sum_{t=1}^{T} c_t^T x_t(\xi_{[t]})\right]$.

In order to formalize this we need to construct a model where the possible realizations (called scenarios) of the data process are specified. The number of scenarios can be finite or infinite. On top of that, in a probabilistic model we have to specify the probability distribution of the considered scenarios. In the framework of the considered model, at every time stage $t$, conditional on the observed history $\xi_{[t-1]}$, we know which scenarios can and which cannot happen in the future. So it is natural to require that optimality of our decisions at time $t$ conditional on $\xi_{[t-1]}$ should not depend on scenarios which cannot happen in the future. In a slightly different form this can be formulated as: the optimal policies obtained when solving the original problem at time $t \in \{1, \ldots, T-1\}$ remain optimal for all subsequent problems. This is the basic requirement of time (or dynamic) consistency.

It should be clearly understood that this concept of time consistency requires a specification of optimality at every time stage $t$ conditional on $\xi_{[t-1]}$. In some applications this comes in a natural way. Consider the risk neutral formulation (2.1) of the multistage program. The expectation operator has the following decomposition property
$$\mathbb{E}[\cdot] = \mathbb{E}_{|\xi_1}\left[ \mathbb{E}_{|\xi_{[2]}}\left[ \cdots \mathbb{E}_{|\xi_{[T-1]}}[\cdot] \right] \right], \tag{4.21}$$


where $\mathbb{E}_{|\xi_{[t]}}[\cdot] = \mathbb{E}[\cdot|\xi_{[t]}]$ is the conditional expectation (since $\xi_1$ is deterministic, $\mathbb{E}_{|\xi_1}[\cdot] = \mathbb{E}[\cdot]$). This is the basis for showing that the nested formulation (2.1), which more accurately should be written as
$$\min_{\substack{A_1x_1=b_1 \\ x_1\ge 0}} c_1^T x_1 + \mathbb{E}_{|\xi_1}\left[ \min_{\substack{B_2x_1+A_2x_2=b_2 \\ x_2\ge 0}} c_2^T x_2 + \mathbb{E}_{|\xi_{[2]}}\left[ \cdots + \mathbb{E}_{|\xi_{[T-1]}}\left[ \min_{\substack{B_Tx_{T-1}+A_Tx_T=b_T \\ x_T\ge 0}} c_T^T x_T \right] \right] \right], \tag{4.22}$$
is equivalent to minimization of the expectation $\mathbb{E}\left[\sum_{t=1}^{T} c_t^T x_t(\xi_{[t]})\right]$ subject to the feasibility constraints. The nested formulation specifies that at stage $t$, conditionally on $\xi_{[t-1]}$, we minimize the conditional expectation of the remaining cost.

With the nested formulation (4.4) of the adapted risk averse problem the situation is considerably more involved. This formulation, with naturally defined conditional optimality criteria at every time stage $t$, is time consistent. However, the risk measures of the form (4.9) do not have a decomposition property similar to (4.21), even if all $\lambda_t$ and $\alpha_t$ are constants (i.e., do not depend on $t$) and the process is stagewise independent. The risk-adjusted cost (4.15) is given in a nested form. This cost can be written as $\varrho\left[\sum_{t=1}^{T} c_t^T x_t(\xi_{[t]})\right]$, where $\varrho$ is the composite risk measure
$$\varrho[\cdot] := \rho_{2|\xi_1}\left[ \rho_{3|\xi_{[2]}}\left[ \cdots \rho_{T|\xi_{[T-1]}}[\cdot] \right] \right]. \tag{4.23}$$
Unfortunately the composite risk measure $\varrho$ is complicated and cannot be written in a closed form. This is one of the criticisms of the nested risk averse formulation (4.4): it may not be easy to give an intuitive explanation of the total objective function $\varrho[\cdot]$. Our argument is that we try to control the risk of high costs at every stage of the process, and the nested formulation is tailored for this purpose. On the other hand, some other proposals for risk averse multistage programming are not time consistent, in the sense that there is no simple and natural way to write the required conditional optimality criteria. Of course, the notions of "simple" and "natural" could be somewhat subjective. Therefore the time consistency concept can be controversial.

4.3 Generic description of the risk averse SDDP

An algorithmic description of the risk averse SDDP algorithm with M trial points per iteration is presented in Algorithm 2.

5 Worst-case-expectation approach

There are basically two popular approaches to optimization under uncertainty. One approach models uncertain parameters as random variables with a specified probability distribution, and optimization is applied to the expected value of the objective function. This is the risk neutral approach discussed in the previous sections. An alternative approach is to hedge against a worst possible situation by optimizing a worst possible value of the objective function. This is the approach of robust optimization, which became popular in recent years, not least due to the computational tractability of the obtained optimization problems. Both approaches have advantages and disadvantages and can be applied in different situations. The robust approach could be too conservative, especially in cases where the uncertain parameters have a large range of variability. On the other hand, the stochastic programming approach makes sense if averaging of the objective function can be reasonably justified by applying the Law of Large Numbers to repeated operations.


Algorithm 2 Risk averse SDDP algorithm

Require: initial lower approximations {Q_t^0}, t = 2, ..., T+1, parameters {α_t, λ_t} ⊂ [0, 1], t = 2, ..., T+1, and i_max (max. iterations)

1:  Initialize: i ← 0, z ← −∞ (lower bound)
2:  while i < i_max do
3:     Sample M scenarios {{c_tk, A_tk, B_tk, b_tk}, t = 2, ..., T}, k = 1, ..., M
4:     (Forward step)
5:     for k = 1 → M do
6:        for t = 1 → T do
7:           x_t^k ← argmin { c_tk^T x_t + Q_{t+1}^i(x_t) : B_tk x_{t−1}^k + A_tk x_t = b_tk, x_t ≥ 0 }
8:        end for
9:     end for
10:    (Backward step)
11:    for k = 1 → M do
12:       for t = T → 2 do
13:          for j = 1 → N_t do
14:             [Q_tj(x_{t−1}^k), π_tj^k] ← min { c_tj^T x_t + Q_{t+1}^i(x_t) : B_tj x_{t−1}^k + A_tj x_t = b_tj, x_t ≥ 0 }  (optimal value and dual solution)
15:          end for
16:          ι ← index in {1, ..., N_t} of the (1 − α_t) sample quantile of {Q_tj(x_{t−1}^k)}, j = 1, ..., N_t
17:          Q̄_t(x_{t−1}^k) ← (1/N_t) Σ_{j=1}^{N_t} Q_tj(x_{t−1}^k);  g_tj^k ← −B_tj^T π_tj^k;  ḡ_t^k ← (1/N_t) Σ_{j=1}^{N_t} g_tj^k
18:          Q̃_t(x_{t−1}^k) ← (1 − λ_t) Q̄_t(x_{t−1}^k) + λ_t ( Q_tι(x_{t−1}^k) + (1/(α_t N_t)) Σ_{j=1}^{N_t} [Q_tj(x_{t−1}^k) − Q_tι(x_{t−1}^k)]_+ )
19:          g̃_t^k ← (1 − λ_t) ḡ_t^k + λ_t ( g_tι^k + (1/(α_t N_t)) Σ_{j=1}^{N_t} (g_tj^k − g_tι^k) · 1{Q_tj(x_{t−1}^k) ≥ Q_tι(x_{t−1}^k)} )
20:          Q_t^{i+1}(·) ← max{ Q_t^i(·), Q̃_t(x_{t−1}^k) + (g̃_t^k)^T (· − x_{t−1}^k) }  (add the new cut)
21:       end for
22:    end for
23:    (Lower bound update)
24:    z ← min { c_1^T x_1 + Q_2^{i+1}(x_1) : A_1 x_1 = b_1, x_1 ≥ 0 }
25:    i ← i + 1
26: end while

There are also situations where the involved uncertain parameters can be naturally divided into two groups: for one group the robust approach makes sense, while for the other the stochastic programming approach is more appropriate. We discuss now how the robust and stochastic programming approaches can be combined. Applied to the Brazilian operation planning of hydro plants, the combined approach models, as before, the inflows as random variables, and considers the demands as uncertain, subject to relatively small perturbations contained in specified uncertainty sets.

5.1 Static case

Consider a problem where an objective function $F(x, \xi)$, depending on a vector $\xi \in \mathbb{R}^d$ of uncertain parameters, is optimized (say minimized) subject to feasibility constraints $x \in X \subset \mathbb{R}^n$. In particular, in two stage modeling $F(x, \xi)$ can be the optimal value of the second stage problem. Suppose further that the vector $\xi$ can be partitioned as $\xi = (\xi^1, \xi^2)$, with $\xi^1$ varying in a specified uncertainty set $\Xi^1 \subset \mathbb{R}^{d_1}$, and $\xi^2 \in \mathbb{R}^{d_2}$ modeled as a random vector with a specified probability distribution.


This leads to the following robust-stochastic programming formulation of the corresponding optimization problem
$$\min_{x \in X}\left\{ f(x) := \sup_{\xi^1 \in \Xi^1}\mathbb{E}\left[ F(x, \xi^1, \xi^2) \right] \right\}. \tag{5.1}$$
For given $x$ and $\xi^1$, the expectation in (5.1) is taken with respect to the probability distribution of the random vector $\xi^2$. In this formulation one optimizes the worst-case average of the objective.

Suppose that the function $F(x, \xi)$ is convex in $x$. It follows then that the function $f(x)$ is convex. In order to proceed with cutting plane type algorithms for solving problem (5.1) we need a procedure for computing subgradients of $f(x)$. It is possible to show that under mild regularity conditions the subdifferential (i.e., the set of subgradients) of $f(x)$ is given by
$$\partial f(x) = \text{conv}\left\{ \bigcup_{\xi^1 \in \Xi^1(x)}\mathbb{E}\left[ \partial F(x, \xi^1, \xi^2) \right] \right\}, \tag{5.2}$$
where $\text{conv}(A)$ denotes the convex hull of the set $A$, all subdifferentials are taken with respect to $x$, and
$$\Xi^1(x) := \arg\max_{\xi^1 \in \Xi^1}\mathbb{E}\left[ F(x, \xi^1, \xi^2) \right]. \tag{5.3}$$

In particular, if the probability distribution of $\xi^2$ has finite support $\{\xi_1^2, \ldots, \xi_K^2\}$ with respective probabilities $p_1, \ldots, p_K$, then
$$\mathbb{E}\left[F(x, \xi^1, \xi^2)\right] = \sum_{k=1}^{K} p_k F(x, \xi^1, \xi_k^2) \quad \text{and} \quad \partial\,\mathbb{E}\left[F(x, \xi^1, \xi^2)\right] = \sum_{k=1}^{K} p_k\,\partial F(x, \xi^1, \xi_k^2). \tag{5.4}$$

In order to apply these formulas in a numerical, cutting plane type algorithm, we need a procedure for computing the maximum of $\mathbb{E}\left[F(x, \xi^1, \xi^2)\right]$ over $\xi^1 \in \Xi^1$. In general this could be an intractable problem. If the set $\Xi^1$ is finite, of not too large cardinality, then of course this can be done in a straightforward way. We will discuss this further in the next section.
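For a finite uncertainty set and a finite support of $\xi^2$, evaluating $f(x)$ and a subgradient via (5.2)–(5.4) reduces to an enumeration; a sketch, where F and dF are user-supplied callables (assumptions of the example) returning $F(x, \xi^1, \xi^2)$ and a subgradient in $x$:

```python
import numpy as np

def worst_case_expectation(x, Xi1, scenarios, probs, F, dF):
    """Evaluate f(x) = max over xi1 in Xi1 of E[F(x, xi1, xi2)] together with
    one subgradient, per (5.2)-(5.4), for a finite set Xi1 and a finite
    support {scenarios, probs} of xi2."""
    best_val, best_grad = -np.inf, None
    for xi1 in Xi1:
        val = sum(p * F(x, xi1, xi2) for p, xi2 in zip(probs, scenarios))
        if val > best_val:
            grad = sum(p * dF(x, xi1, xi2) for p, xi2 in zip(probs, scenarios))
            best_val, best_grad = val, grad
    return best_val, best_grad
```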

5.2 Multistage case

In this section we discuss an extension of the above approach to a multistage setting. We consider linear multistage problems. Let $\xi_1, \ldots, \xi_T$ be the uncertainty process underlying the corresponding multistage problem, with $\xi_1$ being known (deterministic). Suppose that the data vectors $\xi_t \in \mathbb{R}^{d_t}$, $t = 2, \ldots, T$, are decomposed into two parts. That is, $\xi_t = (\xi_t^1, \xi_t^2)$, with $(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1$ and $\xi_2^2, \ldots, \xi_T^2$ being a random process with a specified probability distribution. We refer to $\xi_2^1, \ldots, \xi_T^1$ as the uncertain parameters and to $\xi_2^2, \ldots, \xi_T^2$ as the random parameters of the model.

Consider the following worst-case-expectation formulation of the linear multistage program
$$\begin{array}{ll}
\min\limits_{x_1, x_2(\cdot), \ldots, x_T(\cdot)} & \sup\limits_{(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1}\mathbb{E}\left\{ c_1^T x_1 + c_2^T x_2(\xi_{[2]}) + \cdots + c_T^T x_T(\xi_{[T]}) \right\} \\
\text{s.t.} & A_1 x_1 = b_1, \; x_1 \ge 0, \\
& B_t x_{t-1}(\xi_{[t-1]}) + A_t x_t(\xi_{[t]}) = b_t, \; x_t(\xi_{[t]}) \ge 0, \; t = 2, \ldots, T,
\end{array} \tag{5.5}$$

where $\xi_1 = (c_1, A_1, b_1)$, $\xi_t = (c_t, A_t, B_t, b_t)$, $t = 2, \ldots, T$, $\xi_{[t]} = (\xi_1, \ldots, \xi_t)$ denotes the history of the process up to time $t$, and the expectation is taken with respect to the probability distribution of the random process $\xi_2^2, \ldots, \xi_T^2$. The optimization is performed over functions (called policies or decision rules) $x_t(\xi_{[t]})$, $t = 1, \ldots, T$, of the history of the data process (this is emphasized in the notation $x_t(\cdot)$), and the feasibility constraints should be satisfied for all $(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1$ and almost every (a.e.) realization of the random process. For the two stage case of $T = 2$, problem (5.5) becomes of the form (5.1), with the function $F(x, \xi^1, \xi^2)$ given by the optimal value of the corresponding second stage problem.

We have that
$$\mathbb{E}\left\{ c_1^T x_1 + c_2^T x_2(\xi_{[2]}) + c_3^T x_3(\xi_{[3]}) + \cdots + c_T^T x_T(\xi_{[T]}) \right\} = c_1^T x_1 + \mathbb{E}\left\{ c_2^T x_2(\xi_{[2]}) + \mathbb{E}_{|\xi^2_{[2]}}\left[ c_3^T x_3(\xi_{[3]}) + \cdots + \mathbb{E}_{|\xi^2_{[T-1]}}\left[ c_T^T x_T(\xi_{[T]}) \right] \right] \right\},$$
where $\mathbb{E}_{|\xi^2_{[t]}}$ denotes the conditional expectation with respect to $\xi^2_{[t]}$. Thus

$$\begin{array}{l}
\sup\limits_{(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1}\mathbb{E}\left\{ c_1^T x_1 + c_2^T x_2(\xi_{[2]}) + c_3^T x_3(\xi_{[3]}) + \cdots + c_T^T x_T(\xi_{[T]}) \right\} \;\le\; c_1^T x_1 \\
\quad + \sup\limits_{(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1}\mathbb{E}\left\{ c_2^T x_2(\xi_{[2]}) + \sup\limits_{(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1}\mathbb{E}_{|\xi^2_{[2]}}\left\{ c_3^T x_3(\xi_{[3]}) + \cdots + \sup\limits_{(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1}\mathbb{E}_{|\xi^2_{[T-1]}}\left[ c_T^T x_T(\xi_{[T]}) \right] \right\} \right\}.
\end{array} \tag{5.6}$$

This leads to the following nested formulation of the worst-case-expectation multistage problem
$$\min_{\substack{A_1x_1=b_1 \\ x_1\ge 0}} c_1^T x_1 + \rho_{2|\xi_1}\left[ \min_{\substack{B_2x_1+A_2x_2=b_2 \\ x_2\ge 0}} c_2^T x_2 + \cdots + \rho_{T|\xi_{[T-1]}}\left[ \min_{\substack{B_Tx_{T-1}+A_Tx_T=b_T \\ x_T\ge 0}} c_T^T x_T \right] \right], \tag{5.7}$$

where
$$\rho_{t|\xi_{[t-1]}}[\,\cdot\,] = \sup_{(\xi_2^1, \ldots, \xi_T^1) \in \Xi^1}\mathbb{E}_{|\xi^2_{[t-1]}}[\,\cdot\,], \quad t = 2, \ldots, T. \tag{5.8}$$
Note that the inequality in (5.6) can be strict, and the nested worst-case-expectation multistage problem (5.7) is not necessarily equivalent to problem (5.5).

Suppose that the data process $\xi_2, \ldots, \xi_T$ is stagewise independent. That is, the random vector $\xi_t^2$ is independent of $(\xi_2^2, \ldots, \xi_{t-1}^2)$, $t = 3, \ldots, T$, and the uncertainty set $\Xi^1 = \Xi_2^1 \times \cdots \times \Xi_T^1$ is given by a direct product of individual uncertainty sets. If the uncertain parameters have reasonably small variations, one can construct the respective uncertainty sets by introducing small variations of the parameters around their nominal values. In a sense this approach can be considered as a study of sensitivity of the computed solutions to small variations of the uncertain parameters.

Note the similarity between the risk averse formulation (4.4) and the nested formulation (5.7). Similarly to (4.7)–(4.10), it is possible to write the following dynamic programming equations for the nested problem (5.7):
$$Q_t(x_{t-1}, \xi_t^1, \xi_t^2) = \inf_{B_t x_{t-1} + A_t x_t = b_t,\; x_t \ge 0}\left\{ c_t^T x_t + \mathcal{Q}_{t+1}(x_t) \right\}, \tag{5.9}$$
$t = 2, \ldots, T$, with $\mathcal{Q}_{T+1}(\cdot) \equiv 0$ and
$$\mathcal{Q}_{t+1}(x_t) = \sup_{\xi_{t+1}^1 \in \Xi_{t+1}^1}\mathbb{E}\left\{ Q_{t+1}(x_t, \xi_{t+1}^1, \xi_{t+1}^2) \right\}, \tag{5.10}$$
where the expectation is taken with respect to $\xi_{t+1}^2$. Note that under the above assumptions of stagewise independence the cost-to-go functions $\mathcal{Q}_{t+1}(x_t)$ do not depend on the data process. Note also that the functions $Q_t(x_{t-1}, \xi_t^1, \xi_t^2)$ are convex in $x_{t-1}$, and hence the functions $\mathcal{Q}_t(x_{t-1})$ are convex.

There are two basic questions about this model: how to choose the uncertainty sets $\Xi_i^1$, $i = 2, \ldots, T$, and how to solve the obtained problem. These two questions are related: on one hand, the choice of $\Xi_i^1$ should make practical sense; on the other hand, the corresponding problem (5.7) should be amenable to a numerical solution.


5.3 The SDDP approach to the worst-case-expectation problem

In this section we discuss application of the Stochastic Dual Dynamic Programming (SDDP) method to the nested formulation (5.7). We assume the stagewise independence condition, hence the dynamic equations take the form (5.9)–(5.10). In order to apply the SDDP algorithm we need to compute subgradients of the cost-to-go functions $\mathcal{Q}_{t+1}(x_t)$. It is possible to show that a subgradient of $\mathcal{Q}_t(x_{t-1})$ is given by
$$\nabla\mathcal{Q}_t(x_{t-1}) = \mathbb{E}\left\{ \nabla Q_t(x_{t-1}, \bar{\xi}_t^1, \xi_t^2) \right\} \tag{5.11}$$
for some $\bar{\xi}_t^1 \in \Xi_t^1(x_{t-1})$. Here
$$\Xi_t^1(x_{t-1}) := \arg\max_{\xi_t^1 \in \Xi_t^1}\mathbb{E}\left\{ Q_t(x_{t-1}, \xi_t^1, \xi_t^2) \right\},$$
and as before all subgradients are taken with respect to $x_{t-1}$ and the expectation is taken with respect to the distribution of $\xi_t^2$.

The first step of a numerical solution is to discretize the data process. The random process $\xi_t^2$, $t = 2, \ldots, T$, is discretized by generating a random sample from the probability distribution of each random vector; this is the approach of the Sample Average Approximation (SAA) method. Discretization of the uncertainty sets $\Xi_t^1$ is a more subtle issue.

5.3.1 Backward step of the SDDP algorithm

Suppose for the sake of simplicity that only the right hand sides $b_t$, $t = 2, \ldots, T$, are uncertain. That is, $\xi_t = b_t$ and $\xi_t^1 = b_t^1$, $\xi_t^2 = b_t^2$, $b_t = (b_t^1, b_t^2)$. Let $b_{tj}^2$, $j = 1, \ldots, N$, be the respective random sample, $t = 2, \ldots, T$. Let $\bar{x}_t$, $t = 1, \ldots, T-1$, be trial points (we can use more than one trial point at every stage of the backward step; the extension is straightforward). Let $\mathcal{Q}_t(\cdot)$ be the cost-to-go functions of the dynamic programming equations associated with the considered multistage problem, and $\mathfrak{Q}_t(\cdot)$ be a current approximation of $\mathcal{Q}_t(\cdot)$ given by the maximum of a collection of cutting planes
$$\mathfrak{Q}_t(x_{t-1}) = \max_{k \in I_t}\left\{ \alpha_{tk} + \beta_{tk}^T x_{t-1} \right\}, \quad t = 1, \ldots, T-1. \tag{5.12}$$

For a given $b_T^1$ and $b_{Tj} = (b_T^1, b_{Tj}^2)$, at stage $t = T$ we solve the $N$ problems
$$\min_{x_T \in \mathbb{R}^{n_T}} c_T^T x_T \;\; \text{s.t.} \;\; B_T\bar{x}_{T-1} + A_T x_T = b_{Tj}, \; x_T \ge 0, \quad j = 1, \ldots, N. \tag{5.13}$$

The optimal value of problem (5.13) is $Q_{Tj}(\bar{x}_{T-1}, b_T^1)$. We should compute the value of the cost-to-go function $\mathcal{Q}_T(x_{T-1})$ at the trial point $\bar{x}_{T-1}$. We have
$$\mathcal{Q}_T(\bar{x}_{T-1}) = \sup_{b_T^1 \in \Xi_T^1} q_T(\bar{x}_{T-1}, b_T^1), \tag{5.14}$$
where
$$q_T(\bar{x}_{T-1}, b_T^1) = N^{-1}\sum_{j=1}^{N} Q_{Tj}(\bar{x}_{T-1}, b_T^1). \tag{5.15}$$

Now going one stage back, $Q_{T-1,j}(\bar{x}_{T-2}, b_{T-1}^1)$ is equal to the optimal value of the problem
$$\min_{x_{T-1} \in \mathbb{R}^{n_{T-1}}} c_{T-1}^T x_{T-1} + \mathcal{Q}_T(x_{T-1}) \;\; \text{s.t.} \;\; B_{T-1}\bar{x}_{T-2} + A_{T-1}x_{T-1} = b_{T-1,j}, \; x_{T-1} \ge 0. \tag{5.16}$$


However, the function \mathcal{Q}_T(\cdot) is not available. Therefore we replace it by \mathfrak{Q}_T(\cdot) and hence consider the problem

\min_{x_{T-1} \in \mathbb{R}^{n_{T-1}}} c_{T-1}^T x_{T-1} + \mathfrak{Q}_T(x_{T-1})  s.t.  B_{T-1} \bar{x}_{T-2} + A_{T-1} x_{T-1} = b_{T-1,j},  x_{T-1} \ge 0.    (5.17)

Recall that \mathfrak{Q}_T(\cdot) is given by the maximum of affine functions (see (5.12)). Therefore we can write problem (5.17) in the form

\min_{x_{T-1} \in \mathbb{R}^{n_{T-1}},\ \theta \in \mathbb{R}} c_{T-1}^T x_{T-1} + \theta
s.t.  B_{T-1} \bar{x}_{T-2} + A_{T-1} x_{T-1} = b_{T-1,j},  x_{T-1} \ge 0,
      \theta \ge \alpha_{Tk} + \beta_{Tk}^T x_{T-1},  k \in I_T.    (5.18)

Then we have to solve the problem

\max_{b^1_{T-1} \in \Xi^1_{T-1}} q_{T-1}(\bar{x}_{T-2}, b^1_{T-1}),    (5.19)

where

q_{T-1}(\bar{x}_{T-2}, b^1_{T-1}) = N^{-1} \sum_{j=1}^N Q_{T-1,j}(\bar{x}_{T-2}, b^1_{T-1}),

and so on.

Let us consider the following approach. Suppose that we can sample from the sets \Xi^1_t. For example, if the sets \Xi^1_t are finite, possibly of large cardinality, we can sample an element of \Xi^1_t with equal probability. Consider the cost-to-go function \mathcal{Q}_T(x_{T-1}). Sample L points b^1_{T\ell}, \ell = 1, ..., L, from \Xi^1_T. The number L can be small, even L = 1. For \ell = 1, ..., L, compute the subgradient (at \bar{x}_{T-1})

\gamma_{T\ell} = N^{-1} \sum_{j=1}^N \nabla Q_{Tj}(\bar{x}_{T-1}, b^1_{T\ell}).    (5.20)

Note that by (5.11) the subgradient \nabla Q_{Tj}(\bar{x}_{T-1}, b^1_{T\ell}) can be computed by solving the dual of problem (5.13) for b_{Tj} = (b^1_{T\ell}, b^2_{Tj}). Add the corresponding cutting planes

q_T(\bar{x}_{T-1}, b^1_{T\ell}) + \gamma_{T\ell}^T (x_{T-1} - \bar{x}_{T-1}),  \ell = 1, ..., L,

to the collection of cutting planes of \mathfrak{Q}_T(\cdot). By (5.14) we have that \mathcal{Q}_T(x_{T-1}) \ge q_T(x_{T-1}, b^1_T) for any b^1_T \in \Xi^1_T, and hence

\mathcal{Q}_T(x_{T-1}) \ge q_T(\bar{x}_{T-1}, b^1_{T\ell}) + \gamma_{T\ell}^T (x_{T-1} - \bar{x}_{T-1}),    (5.21)

i.e., q_T(\bar{x}_{T-1}, b^1_{T\ell}) + \gamma_{T\ell}^T (x_{T-1} - \bar{x}_{T-1}) is indeed a cutting plane for \mathcal{Q}_T(\cdot). And so on for stages t = T-1, ..., 2.
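To illustrate the above computation, the following minimal sketch (ours; it assumes b_T = (b^1_T; b^2_T) is a simple stacking of the two components, uses hypothetical data names, and relies on scipy's HiGHS solver for the equality duals) computes the averaged subgradient (5.20) and the corresponding cutting plane for one sampled b^1_{T\ell}:

import numpy as np
from scipy.optimize import linprog

def stage_T_cut(c_T, A_T, B_T, b1, b2_sample, x_trial):
    # Solve the N subproblems (5.13), written with B_T x_trial moved to the
    # right hand side, collecting optimal values and subgradients w.r.t.
    # x_{T-1} obtained from the equality-constraint duals.
    values, grads = [], []
    for b2 in b2_sample:
        rhs = np.concatenate([b1, b2]) - B_T @ x_trial
        res = linprog(c_T, A_eq=A_T, b_eq=rhs,
                      bounds=[(0, None)] * len(c_T), method="highs")
        values.append(res.fun)                        # Q_Tj(x_trial, b1)
        grads.append(-B_T.T @ res.eqlin.marginals)    # subgradient of Q_Tj
    q = np.mean(values)                               # q_T(x_trial, b1), cf. (5.15)
    gamma = np.mean(grads, axis=0)                    # gamma_Tl, cf. (5.20)
    # Cutting plane q + gamma^T (x - x_trial), returned as (intercept, slope).
    return q - gamma @ x_trial, gamma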

5.3.2 Forward step of the SDDP algorithm

The forward step of the algorithm is done in the standard way. The computed approximations \mathfrak{Q}_2(\cdot), ..., \mathfrak{Q}_T(\cdot) (with \mathfrak{Q}_{T+1}(\cdot) \equiv 0 by definition) and a feasible first stage solution \bar{x}_1 can be used for constructing an implementable policy as follows. For a realization \xi_t = (c_t, A_t, B_t, b_t), t = 2, ..., T, of the data process, decisions \bar{x}_t, t = 1, ..., T, are computed recursively going forward, with \bar{x}_1 being the chosen feasible solution of the first stage problem and \bar{x}_t being an optimal solution of

\min_{x_t} c_t^T x_t + \mathfrak{Q}_{t+1}(x_t)  s.t.  A_t x_t = b_t - B_t \bar{x}_{t-1},  x_t \ge 0,    (5.22)

for t = 2, ..., T. These optimal solutions can be used as trial decisions in the backward step of the algorithm.
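For illustration, one stage problem (5.22) can be solved as a linear program by introducing an epigraph variable theta for the cutting-plane model, as in (5.18). The sketch below is ours, with hypothetical data names; alphas and betas hold the cut coefficients of \mathfrak{Q}_{t+1}, and at least one cut is assumed (otherwise theta is unbounded below):

import numpy as np
from scipy.optimize import linprog

def forward_stage(c, A, b, B, x_prev, alphas, betas):
    # min c^T x + theta  s.t.  A x = b - B x_prev, x >= 0,
    #                          theta >= alpha_k + beta_k^T x for all cuts k.
    n = len(c)
    obj = np.concatenate([c, [1.0]])                   # variables (x, theta)
    A_eq = np.hstack([A, np.zeros((A.shape[0], 1))])
    b_eq = b - B @ x_prev
    A_ub = np.hstack([np.asarray(betas), -np.ones((len(alphas), 1))])
    b_ub = -np.asarray(alphas)                         # beta_k^T x - theta <= -alpha_k
    bounds = [(0, None)] * n + [(None, None)]          # x >= 0, theta free
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:n]                                   # trial decision x_t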

The forward step of the standard SDDP algorithm has two functions: (i) to generate trial points, and (ii) to estimate the value of the constructed policy and hence to provide an upper bound for the optimal value of the considered problem. Unfortunately, the second function of the forward step cannot be reproduced in the present case.

5.3.3 Sampling from the uncertainty set

Following the general methodology of robust optimization, suppose that the uncertainty set \Xi^1_t, t = 2, ..., T, is an ellipsoid centered at a point \bar{\xi}. That is,

\Xi^1_t := \{\xi : (\xi - \bar{\xi})^T A (\xi - \bar{\xi}) \le r\},    (5.23)

where A is a positive definite matrix and r > 0. Consider the set of points of \Xi^1_t which are not dominated by other points of \Xi^1_t,

D := \left\{ \xi \in \Xi^1_t : \text{there does not exist } \xi' \in \Xi^1_t \text{ such that } \xi' \ne \xi \text{ and } \xi \le \xi' \right\}.

In the applications we have in mind it makes sense to restrict our sampling to the set D \subset \Xi^1_t, i.e., in fact we consider the uncertainty set given by the boundary points of the ellipsoid \Xi^1_t which are "larger" than the reference value \bar{\xi}. The set D can be represented in the following form:

D = \bigcup_{a \ge 0,\ \|a\| = 1} \arg\max \left\{ a^T \xi : (\xi - \bar{\xi})^T A (\xi - \bar{\xi}) \le r \right\}.

The maximizer of a^T \xi subject to (\xi - \bar{\xi})^T A (\xi - \bar{\xi}) \le r is obtained by writing the optimality condition

a - \lambda A (\xi - \bar{\xi}) = 0,

with \lambda > 0. Hence such a maximizer is given by \xi^* = \lambda^{-1} A^{-1} a + \bar{\xi}.

We can generate the required sample \xi_1, \xi_2, ..., as follows. Generate Z_i \sim N(0, I), where I is the identity matrix of an appropriate dimension. Take \xi_i = c A^{-1} |Z_i| + \bar{\xi}, where the absolute value |Z_i| of the vector Z_i is taken componentwise, and c is the calibration constant such that

c^2 (A^{-1}|Z_i|)^T A (A^{-1}|Z_i|) = r,

i.e., c = \sqrt{r / (|Z_i|^T A^{-1} |Z_i|)}.
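A minimal sketch of this sampling scheme (our illustration; rng is a NumPy random generator):

import numpy as np

def sample_dominating_boundary(xi_bar, A, r, rng):
    # Draw Z ~ N(0, I), take componentwise absolute values, and scale
    # A^{-1}|Z| so that the point lies on the boundary of the ellipsoid (5.23).
    z = np.abs(rng.standard_normal(len(xi_bar)))   # |Z|, componentwise
    w = np.linalg.solve(A, z)                      # A^{-1}|Z|
    c = np.sqrt(r / (z @ w))                       # note z @ w = |Z|^T A^{-1} |Z|
    return xi_bar + c * w

# Example usage: xi = sample_dominating_boundary(xi_bar, A, r, np.random.default_rng(0))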

5.3.4 Discussion of the SDDP algorithm

The numerical approach outlined above proceeds in several steps. First, the sequence \xi^2_2, ..., \xi^2_T of random parameters is discretized by generating a random sample of size N from each random vector \xi^2_t, t = 2, ..., T. Consequently, the expectation on the right hand side of (5.10) is approximated by the finite sum (5.15). Once the sample is constructed, we solve the obtained SAA problem; this sample is not changed in the backward steps of the SDDP algorithm.

Still we have the problem of computing the maximum of the corresponding cost-to-go functions with respect to the uncertain parameters. This forces us to do additional sampling from the uncertainty sets in the backward steps of the algorithm. As the number of iterations increases, the collection of generated points fills the uncertainty sets, and in the limit we reconstruct the corresponding maxima. However, the convergence can be very slow in high dimensional cases: the number of points needed to approximate, say, a unit ball in \mathbb{R}^d in a uniform way grows exponentially with the dimension d. Therefore this approach could be reasonable when the dimensions d^1_t are small. Fortunately, in the considered applications we have d^1_t = 4, t = 2, ..., T, and moreover, since we are sampling from the boundary of the corresponding ellipsoids, this effectively reduces the dimension to 3.

5.4 Generic description of the worst-case-expectation SDDP

An algorithmic description of the worst-case-expectation SDDP algorithm with one trial point per iteration is presented in Algorithm 3.

Algorithm 3 Worst-case-expectation SDDP algorithm
Require: {\mathfrak{Q}^0_t}_{t=2,...,T+1} (initial lower approximations) and i_max (maximum number of iterations)
 1: Initialize: i ← 0, z ← −∞ (lower bound)
 2: while i < i_max do
 3:   Sample one scenario {c_t, A_t, B_t, (b^1_t; b^2_t)}_{2 ≤ t ≤ T}
 4:   (Forward step)
 5:   for t = 1 → T do
 6:     \bar{x}_t ← arg min_{x_t ∈ R^{n_t}} { c_t^T x_t + \mathfrak{Q}^i_{t+1}(x_t) : B_t \bar{x}_{t-1} + A_t x_t = (b^1_t; b^2_t), x_t ≥ 0 }
 7:   end for
 8:   (Backward step)
 9:   for t = T → 2 do
10:     for j = 1 → N_t do
11:       [Q_{tj}(\bar{x}_{t-1}, b^1_t), π_{tj}] ← min_{x_t ∈ R^{n_t}} { c_t^T x_t + \mathfrak{Q}^i_{t+1}(x_t) : B_t \bar{x}_{t-1} + A_t x_t = (b^1_t; b^2_{tj}), x_t ≥ 0 }
12:     end for
13:     \bar{Q}_t(\bar{x}_{t-1}, b^1_t) := (1/N_t) \sum_{j=1}^{N_t} Q_{tj}(\bar{x}_{t-1}, b^1_t);   g_t := −(1/N_t) \sum_{j=1}^{N_t} π_{tj} B_t
14:     \mathfrak{Q}^{i+1}_t ← max{ \mathfrak{Q}^i_t, \bar{Q}_t(\bar{x}_{t-1}, b^1_t) + g_t(x_{t-1} − \bar{x}_{t-1}) }   (add the new cutting plane)
15:   end for
16:   (Lower bound update)
17:   z ← min_{x_1 ∈ R^{n_1}} { c_1^T x_1 + \mathfrak{Q}^{i+1}_2(x_1) : A_1 x_1 = b_1, x_1 ≥ 0 }
18:   i ← i + 1
19: end while


6 Computational aspects

6.1 Cutting plane elimination procedure

Typically, a significant number of the cutting planes added by the SDDP method in the backward step become at some point unnecessary for the description of the cost-to-go function approximations and could be eliminated. In this section we present a subroutine that identifies these redundant cutting planes. This subroutine allows a significant speed up of any SDDP type algorithm while preserving the statistical properties of the constructed policy.

First, we present the problem setting. At each stage, the cost-to-go function of the dynamic programming equations is approximated by \mathfrak{Q}(\cdot), given by the maximum of a collection of cutting planes in the following manner:

\mathfrak{Q}(x) = \max_{k \in I} \left\{ \alpha_k + \beta_k^T x \right\}    (6.24)

for x \in \Gamma, where \Gamma is a compact set.

An example of a redundant cutting plane is illustrated in Figure 1, where we assume that all the hyperplanes define half-spaces in the nonnegative orthant and \Gamma = [0, 4]. The cutting plane drawn with a continuous line is redundant since it can never be active in describing \mathfrak{Q}(\cdot) over \Gamma; thus, it can be safely discarded. Empirical evidence shows that the SDDP method tends to generate a significant number of such cutting planes.

Figure 1: Illustration of a redundant cutting plane

Checking whether, say, \alpha_1 + \beta_1^T x is redundant is equivalent to checking the feasibility of the following linear system:

\theta \le \alpha_1 + \beta_1^T x,
\theta \ge \alpha_k + \beta_k^T x,  \forall k \in I \setminus \{1\},
x \in \Gamma,    (6.25)

where (\theta, x) are variables and (\alpha_k, \beta_k)_{k \in I} are known data. If problem (6.25) is infeasible, then \alpha_1 + \beta_1^T x is redundant (as illustrated in Figure 1) and could be removed. Otherwise, the constraint must be maintained to preserve the statistical properties of the constructed policy.

The redundant cutting plane elimination subroutine is described in Algorithm 4.


Algorithm 4 Redundant cutting planes elimination subroutine
Require: \mathfrak{Q}(x) = \max_{k \in I} \{ \alpha_k + \beta_k^T x \}
 1: for j ∈ I do
 2:   Check feasibility of the polyhedron P = { (θ, x) : θ ≤ α_j + β_j^T x;  θ ≥ α_k + β_k^T x, ∀ k ∈ I \ {j};  x ∈ Γ }
 3:   if P = ∅ then
 4:     Discard the cutting plane α_j + β_j^T x from the collection
 5:   end if
 6: end for
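As an illustration, the feasibility check in step 2 can be cast as a linear program with a zero objective. The sketch below is ours; it assumes Γ is a box [lb, ub] and uses scipy's HiGHS solver, whose status code 2 signals infeasibility:

import numpy as np
from scipy.optimize import linprog

def is_redundant(j, alphas, betas, lb, ub):
    # Feasibility of (6.25): theta <= alpha_j + beta_j^T x,
    # theta >= alpha_k + beta_k^T x for k != j, and x in [lb, ub].
    alphas, betas = np.asarray(alphas), np.asarray(betas)
    n = betas.shape[1]
    rows = [np.concatenate([-betas[j], [1.0]])]        # theta - beta_j^T x <= alpha_j
    rhs = [alphas[j]]
    for k in range(len(alphas)):
        if k != j:
            rows.append(np.concatenate([betas[k], [-1.0]]))   # alpha_k + beta_k^T x <= theta
            rhs.append(-alphas[k])
    bounds = list(zip(lb, ub)) + [(None, None)]        # x in the box, theta free
    res = linprog(np.zeros(n + 1), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=bounds, method="highs")
    return res.status == 2                             # infeasible => cut j is redundant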

Now, we discuss the use of the redundant cutting plane elimination subroutine described in Algorithm 4 within the general framework of SDDP type methods.

First, we investigate the question of how frequently this subroutine should be used. We run 3000 iterations of the SDDP method with 1 trial point per iteration on the risk neutral case. We run several experiments with different constant cycle lengths (i.e. a cycle length of 50 means that we run the subroutine every 50 iterations). The symbol ∞ denotes the case where we do not use the subroutine.

Cycle length   SDDP run time   Subroutine run time   Total CPU time
               (dd:hh:mm)      (dd:hh:mm)            (dd:hh:mm)
50             00:09:59        00:04:10              00:14:09
100            00:10:18        00:02:06              00:12:24
200            00:11:21        00:01:06              00:12:27
400            00:13:19        00:00:43              00:14:02
∞              01:04:54        -                     01:04:54

Table 1: CPU time summary for risk neutral SDDP

Over the 3000 iterations, a speed up factor of at least 2 is recorded with a cycle length of 100 or 200 when compared to the experiment without the subroutine. It is clear that the use of the subroutine significantly improves the method's performance. Clearly, a tradeoff between the time spent removing redundant cutting planes and the time spent performing SDDP iterations has to be made. On the one hand, with a cycle length of 50, the lowest SDDP run time is obtained; however, a significant amount of time is spent running the subroutine. On the other hand, with a cycle length of 400, the lowest time spent on the subroutine is achieved; nevertheless, a longer SDDP run time is recorded. Furthermore, subroutine runs become more expensive as the number of cutting planes increases. A better strategy consists in changing the cycle length throughout the experiment as a function of how costly it is to run the subroutine compared to performing further SDDP iterations.

This subroutine also provides a measure of the cutting plane efficiency at each stage. Figure 2 plots the percentage of redundant cutting planes relative to the total number of cuts added per stage for the worst-case-expectation approach with u = 3%. Performing 3000 iterations with 1 trial point per iteration generates 3000 cutting planes at each stage. It can be seen that the proportion of redundant cutting planes is higher for the earlier stages, with more than 60% in the first 70 stages. This can be explained by the continuous refinement of the cost-to-go function approximations for the first stages. In addition, the lower error accumulation in the cost-to-go function approximation for later stages explains their lower proportion of redundant cutting planes.

Figure 2: Proportion of redundant cutting planes at each stage

6.2 Risk neutral approach

In this section, we present some of the computational results for the risk neutral SDDP.

6.2.1 Typical bounds behaviour & gap based stopping criteria

A typical behaviour of the bounds is illustrated in Figure 3, along with a moving average of the gap (using the 300 past observations). After 3000 iterations, the obtained policy has a relatively high gap (around 30%). Running the algorithm further shows a very slow improvement of the gap.

Figure 3: Risk neutral SDDP bounds & gap moving average

Table 2 shows, at some key iterations, the CPU time needed to run the risk neutral SDDP with redundant cutting plane elimination every 400 iterations.


Table 2: Total CPU time
Iteration   dd:hh:mm
3000        00:15:33
5000        01:13:24
7000        02:16:53

Running the algorithm up to 3000 iterations is done in a reasonable time and gives a policy of reasonable quality.

6.2.2 Policy value stopping criterion

Assume that we have 2 policies and we would like to know whether they are significantly different. One possible approach is to use a t-test, proceeding as follows. We generate a set of sample scenarios which is considered as a reference sample. For each scenario of the reference sample, we compute the value of the 2 policies. We then perform a paired sample t-test on the obtained values. The null hypothesis is that the means of two normally distributed populations are equal. The test rejects the null hypothesis at the 5% significance level if the p-value is less than 5%.
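A minimal sketch of this test (our illustration; costs_a and costs_b are hypothetical arrays holding the two policies' total costs on the same reference scenarios):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
costs_a = rng.normal(27.8, 1.5, size=1000)    # placeholder values for policy A
costs_b = rng.normal(27.9, 1.5, size=1000)    # placeholder values for policy B
t_stat, p_value = stats.ttest_rel(costs_a, costs_b)   # paired sample t-test
reject = p_value < 0.05    # reject the equal-means hypothesis at the 5% level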

This approach can be used in the SDDP context as follows. First, we generate a reference sample of scenarios. Assume that we would like to know whether the policy changes significantly after iteration 3000. Then, every specified number of iterations, we evaluate the policy and compare it to the one computed at iteration 3000. We performed this experiment with a reference scenario sample of size 1000, evaluating the policy every 100 iterations. Figure 4 gives the evolution of the p-value as a function of the iteration number. The dashed line represents the 5% significance level.

Figure 4: p-value evolution with reference iteration 3000

Starting from iteration 4300, the null hypothesis (i.e. that the means of the two normally distributed populations are equal) is rejected at the 5% significance level. Thus, one could argue that it would be premature to stop the algorithm at iteration 3000.


6.2.3 Variability of SAA problems

In this section we discuss the variability of the bounds on the optimal values of the SAA problems. Recall that an SAA problem is based on a randomly generated sample, and as such is subject to random perturbations, and in itself is an approximation of the "true" problem. In this experiment we generate twenty SAA problems, each having 1 × 100 × ... × 100 = 100^119 scenarios. Then we run 3000 iterations of the SDDP algorithm for the risk neutral approach. At the last iteration, we perform a forward step to evaluate the obtained policy with 2000 scenarios and compute the average policy value.

Table 3 shows the 95% confidence interval for the lower bound and the average policy value at iteration 3000 over the sample of 20 SAA problems. Each of the policy value observations was computed using 2000 scenarios. The last 2 columns of the table show the range divided by the average (where the range is the difference between the maximum and minimum observation) and the standard deviation divided by the average. This problem has relatively low variability (approx. 4%) for both the lower bound and the average policy value.

                 95% C.I. left   Average   95% C.I. right   range/average   sdev./average
                 (×10^9)         (×10^9)   (×10^9)
Lower bound      22.290          22.695    23.100           15.92%          4.07%
Average policy   27.333          27.836    28.339           17.05%          4.12%

Table 3: SAA variability for risk neutral SDDP

Similar results were obtained for the risk averse case.

6.2.4 Sensitivity to initial conditions

The purpose of this experiment is to investigate the impact of changing the initial conditions on the distribution of the individual stage costs. Table 4 shows the initial conditions used in all of the experiments above, and Table 5 shows the maximum storage value for each system.

                        SE          S           NE          N
Reservoir level         119,428.8   11,535.09   29,548.19   6,649.39
% of maximum capacity   59.5%       58.8%       57.03%      52.17%

Table 4: Initial conditions

                  SE          S          NE         N
Maximum storage   200,717.6   19,617.2   51,806.1   12,744.9

Table 5: Maximum stored volume

We consider the following 2 initial levels: 25% and 75% of the maximum storage capacity in each system. Figure 5 shows the individual stage costs for the risk neutral approach in two cases: all the reservoirs start at 25% or at 75% of the maximum capacity.

When we start with 25% of the maximum capacity in all reservoirs, high costs occur in the first stages, reflecting the recourse to expensive thermal generation to satisfy the demand.


Figure 5: Sensitivity to initial conditions (Risk neutral)

Similarly, when we start with 75% of the maximum capacity in all reservoirs, low costs occur in the first stages. Interestingly, in both cases the costs become identical starting from the 60th stage.

Similar results were observed for the risk averse approach.

6.3 Risk averse approach

In this section, we present some computational results for the risk averse approach.

Figure 6 plots the total cost over the 120 stages as a function of λ for α ∈ {0.05, 0.1}. In the 120 stage case, a reasonably good performance is achieved by setting α = 0.05 and λ = 0.15. This setting allows a reduction of 23.61% and 11.06% in the 99% and 95% quantiles, respectively, with a loss of 10.39% on average when compared to the risk neutral case.

Figure 6: Total 120 stage cost for risk averse approach


Figure 7 compares the average and the 99% quantile of the individual stage costs for the risk neutral and the risk averse approach (with α = 0.05 and λ = 0.15). The risk averse approach allows a significant reduction in the 99% quantile with a moderate loss on average when compared with the risk neutral method.

Figure 7: Individual 120 stage costs for risk averse approach

Figure 8 compares the average individual stage deficits for the risk neutral and the risk averse approach (with α = 0.05 and λ = 0.15). The risk averse approach allows a considerable reduction in deficit when compared to the risk neutral method.

Figure 8: Individual 120 stage deficits for risk averse approach


6.4 Worst-case-expectation approach

In this section, we present computational results for the worst-case-expectation approach. In particular, we compare the worst-case-expectation approach to the risk averse method.

Consider the uncertainty sets (ellipsoids) defined in equation (5.23) of section 5.3.3 with matrix A := \Sigma_t^{-1}, where \Sigma_t denotes the demand correlation matrix of the 4 systems, estimated using historical demand data (60 observations). Set the corresponding parameter r = r_t as

r_t := \left( \|\xi^1_t\|_2 \times u \right)^2,

where the uncertainty parameter u > 0 represents the percentage increment of the demand, and \xi^1_t denotes the demand forecast at stage t = 2, ..., T.
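For illustration, a minimal sketch (ours; Sigma_t and xi1_t are hypothetical placeholders for the estimated correlation matrix and the stage-t demand forecast) of the uncertainty set parameters just defined:

import numpy as np

def ellipsoid_parameters(Sigma_t, xi1_t, u):
    # A := Sigma_t^{-1} and r_t := (||xi1_t||_2 * u)^2, as defined above.
    A = np.linalg.inv(Sigma_t)
    r = (np.linalg.norm(xi1_t) * u) ** 2
    return A, r

# Example usage: A, r = ellipsoid_parameters(Sigma_t, xi1_t, u=0.01)   # u = 1%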

6.4.1 Solution strategies

The first question that arises in the worst-case-expectation approach concerns the number of points that should be sampled from the uncertainty set at each iteration and stage. To answer this question, three possibilities were investigated. The first one, which may seem natural, is to sample L demand values from the uncertainty set of stage t and add all calculated cuts to stage t − 1. This approach requires the backward computation to be repeated L times at each iteration and, as L cuts are added at each iteration instead of one, the CPU time is expected to grow considerably even for small increases of L. On the other hand, the uncertainty set is expected to be better filled for higher values of L. The second approach is to sample L demand values from the uncertainty set but, instead of adding all cuts to stage t − 1, to add only the cut related to the worst expected value. This approach may reconstruct the maxima as well as the first one, but requires less CPU time. In the following experiments the uncertainty parameter is u = 1% and L = 3, and the described studies are called "Worst-case (u = 1%, L = 3) all cuts" and "Worst-case (u = 1%, L = 3) worst cut", respectively. The third approach was carried out with L = 1, that is, sampling one value from the uncertainty set, and is called "Worst-case (u = 1%)". This approach has no additional burden compared to the risk neutral approach and may be preferred over the others, but it is unclear whether the uncertainty set is going to be well described at the end of the computation. The main objective of the tests performed in this section is to address this issue.

The CPU times for the three strategies and the risk neutral case are shown in Table 6; it is clear that as the value of L increases the worst-case approach becomes too time consuming. On the other hand, with L = 1 no additional time was required to solve the problem. The next step is to compare the solutions obtained by each sampling approach.

Table 6: Total CPU time – Solution strategies
Case Study                      dd:hh:mm:ss
Risk Neutral                    01:06:57:40
u = 1.00% and L = 1             01:06:50:30
u = 1.00% and L = 3 worst cut   03:20:32:11
u = 1.00% and L = 3 all cuts    11:12:49:19

The total expected cost over the 120 stages and its 95% confidence interval are shown in Table 7. The obtained values are very similar, which indicates that the L = 1 strategy may give results as good as the other approaches.


Table 7: Total expected values (×10^9) – 120 stages
Case Study                      95% CI lower limit   Policy value mean   95% CI upper limit
u = 1.00% and L = 1             27.270               27.864              28.458
u = 1.00% and L = 3 worst cut   27.264               27.849              28.434
u = 1.00% and L = 3 all cuts    27.274               27.868              28.461

To ensure that the "Worst-case (u = 1%)" approach gives results similar to the other approaches, all variables related to the problem were compared, such as the decision variables, the marginal cost (dual variable) and the load curtailment frequencies, an index calculated after the problem solution to assess the security of the system's operation. This comparison shows that there was no significant difference between the solutions of the evaluated approaches. The average, 5% and 95% quantile values of the South-East system stored volumes and marginal costs are shown in Figure 9 for a simulation with a 1% increase of the forecast demand.

Figure 9: Average, 5% and 95% quantile values of stored volumes and marginal costs – South-East system

The next section comprises the numerical evaluation of the worst-case-expectation methodology itself. As "Worst-case (u = 1%)" showed good results, the L = 1 approach is used in the remaining studies of this work.

6.4.2 Worst-case expectation approach

As discussed in Section 6.4.1, the approach with L = 1 gives sufficiently accurate results. Therefore, in order to assess the performance of the worst-case-expectation approach with respect to the uncertainty parameter u, three case studies were run: risk neutral SDDP and worst-case-expectation SDDP with uncertainty parameter u = 1% and u = 3%.

In the left graph of Figure 10 we can see the average discounted total cost over the 120 stages as a function of the demand increase for both the risk neutral and the worst-case-expectation approaches. As the demand increases, the average policy value increases for all approaches. However, the rate of increase for the worst-case-expectation approaches is lower than for the risk neutral approach. The gain achieved by the worst-case-expectation approach relative to the risk neutral one is shown on the right side of Figure 10.


Figure 10: 120 stage policy values for risk neutral and worst-case-expectation (u = 1% and u = 3%)

In Figure 11 we can see the average stage costs for the forward simulation with a 1% increase of the demand, and in Figure 12 the same average costs together with the 95% and 99% quantiles for the forward simulation. We notice that the average costs of the cheaper stages increased while the average costs of the more expensive stages were reduced. The same behaviour is observed for the 95% and 99% quantiles, whose peak values were reduced.

Figure 11: Average individual stage costs for risk neutral and worst-case-expectation (u = 1% andu = 3%)

The annual load curtailment frequencies for each depth are shown for the five years of the planning horizon in Figure 13. This simulation was done with a 1% increase of the demand. The worst-case-expectation approach was able to reduce the deficit frequencies to less than half their values in most years and deficit depths.

The total CPU time used by each case study is compared to that of the "Risk Neutral" SDDP approach in Table 8.


Figure 12: Average, 95% and 99% quantiles of individual stage costs for risk neutral and worst-case-expectation (u = 1% and u = 3%)

Figure 13: Annual deficit risk for each depth of load curtailment for the South-East system

As seen there, and consistently with Table 6, an increment of the uncertainty parameter from u = 1% to u = 3% resulted in no significant increase in CPU time.

6.4.3 Worst-case expectation approach vs. Risk averse approach

Table 8: Total CPU time
Case Study            dd:hh:mm:ss
Risk Neutral          01:06:57:40
u = 1.00% and L = 1   01:06:50:30
u = 3.00% and L = 1   01:07:02:59

In this section we compare the risk averse approach with the worst-case-expectation method. We consider the mean-AV@R risk averse approach with parameters λ = 0.15 and α = 0.05. As seen in section 6.3, this setting resulted in a reasonably good performance for the risk averse approach with a moderate loss of average costs. For the worst-case-expectation method we used u = 3%. Both studies considered 3000 iterations of the SDDP algorithm and 2000 randomly generated scenarios to evaluate the policy.

Figure 14 plots the average, 95% and 99% quantiles of the total cost as a function of the demand increase for the worst-case-expectation and the risk averse approach. When the forecast demand data is used (i.e. demand increase = 0%), the worst-case-expectation approach has a lower average and very similar 95% and 99% quantiles when compared with the risk averse method. The worst-case-expectation approach has a consistently and considerably lower average policy value as the demand increases. Furthermore, it outperforms the risk averse method with lower 95% and 99% quantiles when the demand increase is greater than 2%.

Figure 14: 120 stage discounted cost as a function of the demand increase

Figure 15 shows the average individual stage costs with 0% and 1% demand increase for the worst-case-expectation and the risk averse approach. In most of the first 100 stages, the worst-case-expectation approach has a lower average value when compared to the risk averse method; however, in the final stages, higher costs occur for the former method. The worst-case-expectation approach yields smoother average individual costs across stages than the risk averse approach. An increase of 1% in the demand process shifts the average costs up in an approximately similar manner for both methods.

Figure 15: Average individual stage costs with 0% and 1% demand increase

Figure 16 plots the 99% quantile of the individual stage costs with 0% and 1% demand increase for the worst-case-expectation and the risk averse approach. The worst-case-expectation approach has a higher 99% quantile value most of the time with 0% and 1% demand increase. In some sense, a better performance at this level is expected for the risk averse method, especially under a low demand increase.

Figure 16: 99% quantile individual stage costs with 0% and 1% demand increase

Figure 17 plots the average and the 99% quantile of the individual stage costs with 3% demand increase for the worst-case-expectation and the risk averse approach. The worst-case-expectation approach has lower average individual stage costs than the risk averse method for most of the stages. Furthermore, the latter method exhibits significantly higher 99% quantile peaks. The worst-case-expectation approach is thus expected to be less sensitive than the risk averse method to relatively high demand perturbations.

Figure 17: Average and 99% quantile of individual stage costs with 3% demand increase

7 Conclusion

This report summarizes the results that were developed throughout the technical cooperation between Georgia Institute of Technology and ONS - Operador Nacional do Sistema Eletrico.

Section 2 discussed the problem formulation and modelling related issues. A time series model with a multiplicative error term that complies with the requirements of the SDDP algorithm was suggested. In section 3, the risk neutral approach was discussed. The algorithm was presented and the stopping criteria were investigated.

The risk averse method was analyzed in section 4. The purpose of the risk averse method is to allow a trade-off between a loss on average and immunity against unexpected extreme scenarios. The algorithm steps were outlined, and the dynamic choice of weights and time consistency aspects were discussed. The risk averse approach constructs a policy that is less sensitive to unexpected extreme scenarios at the price of a reasonable loss on average when compared to the risk neutral method.

In section 5, the worst-case-expectation approach was presented. This method tackles multistage stochastic programming problems where the data process can be naturally separated into two components: one that can be modelled as a random process with a specified probability distribution, and another that can be treated from a robust point of view. The basic ideas were discussed in the static and multistage settings, and a time consistent formulation with the corresponding dynamic programming equations was presented. The worst-case-expectation approach constructs a policy that is less sensitive to unexpected demand increases with a reasonable loss on average when compared to the risk neutral method.

Finally, in section 6, we discussed numerical experiments with all the above mentioned methods applied to the operation planning of the Brazilian hydro plants. A comparison between the worst-case-expectation method and the risk averse approach was presented.
