
Stochastic Dynamic Linear Programming: A Sequential Sampling-based Multistage Stochastic Programming Algorithm

Harsha Gangammanavar∗1 and Suvrajeet Sen†2

1Department of Engineering Management, Information, and Systems, Southern Methodist University, Dallas, TX
2Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA

Abstract

Multistage stochastic programming deals with operational and planning problems that involve a sequence of decisions over time while responding to realizations that are uncertain. Algorithms designed to address multistage stochastic linear programming (MSLP) problems often rely upon scenario trees to represent the underlying stochastic process. When this process exhibits stagewise independence, sampling-based techniques, particularly the stochastic dual dynamic programming (SDDP) algorithm, have received wide acceptance. However, these sampling-based methods still operate with a deterministic representation of the problem which uses the so-called sample average approximation. In this work we present a sequential sampling approach for MSLP problems while allowing the decision process to assimilate new sampling information in a recursive fashion. We refer to this method as the stochastic dynamic linear programming (SDLP) algorithm. Since we use sequential sampling, the algorithm does not necessitate an a priori representation of uncertainty, either through a scenario tree or sample average approximation, both of which require the knowledge/estimation of the underlying distribution. In this regard, SDLP is a distribution-free approach to address MSLP problems. We employ regularization at non-terminal stages and a piecewise-affine solution discovery scheme to identify the incumbent solutions used in the regularization term embedded in the algorithm. The SDLP algorithm is shown to provide asymptotically convergent value functions and an optimal policy, with probability one.

1 Introduction

Many practical applications require sequences of decisions to be made under evolving and oftentimes uncertain conditions. Multistage stochastic programming (SP) is one of the common approaches used to guide decision making in such stochastic optimization problems. Multistage SP has been successfully applied in a variety of fields ranging from traditional production systems [26], hydroelectric reservoir scheduling [23, 25] and financial planning models [4, 21], to emerging applications in electricity grids with renewable generation [29] and revenue management [38], among others.

The earliest applications of multistage SP used models which are formulated as multistage stochastic linear programs (MSLP) with recourse, and solved using multistage extensions of the L-shaped method [39] such as the Nested Benders Decomposition (NBD) method [2], the scenario decomposition method [24], and the progressive hedging algorithm [31]. A common feature across

∗[email protected]
†[email protected]


all these algorithms is the use of a deterministic approximation of uncertainty through scenario trees (i.e., precedence relations) built using scenario generation techniques (e.g., [8]). When the underlying stochastic process becomes complicated, their deterministic approximation may result in large unwieldy scenario trees, and one may have to resort to scenario reduction methods (e.g., [9]) to achieve computational viability. For models which allow stagewise independent data, the stochastic dual dynamic programming (SDDP) algorithm was proposed in [25]. The multistage extensions of the L-shaped method as well as SDDP and its variants intend to solve a base model with a finite sample space and known probability distribution (a scenario tree or a sample average approximation).

In problems with continuous sample space, an approach which does not rely on probabilistic information is desirable. We refer to such methods as being distribution-free. For MSLPs, this was achieved in the first inexact bundle method proposed in [34], called multistage stochastic decomposition (MSD). This algorithm is a dynamic extension of the regularized version of the two-stage stochastic decomposition (2-SD) algorithm [15]. It accommodates very general stochastic processes, possibly with correlations, lags, etc., through a nodal formulation which requires a "layout" of a scenario tree, and some mechanism which provides transitions between nodes. A standard scenario tree formulation is a special version of such a mechanism. When the stochastic process exhibits interstage independence, a time-staged formulation (as opposed to a nodal scenario-tree formulation) is more convenient. In this regard, the main contributions of this work are as follows:

1. We will harness the advantages offered by both the interstage independence assumption (like SDDP) as well as a sequential sampling design (like MSD) to build a computationally efficient MSLP algorithm. The algorithm is designed for a state variable formulation of MSLP. In contrast to SDDP, our algorithm completes the forward and backward pass computations along a single sample path that is generated independently of previously observed sample paths.

2. This paper serves as a companion to our earlier work [13] by providing the theoretical corroboration of the empirical evidence presented there. In [13], a distribution-free approach was used for controlling distributed storage devices in power systems with significant renewable resources. Computational experiments conducted on large-scale instances showed that such an approach provides solutions which are statistically indistinguishable from solutions obtained using SDDP, while significantly reducing the computational effort.

3. The proposed algorithm, as well as MSD, employs quadratic regularizing terms at all non-terminal stages using "incumbent" decisions¹ that are maintained for all sample paths discovered during the algorithm. Maintaining and updating these incumbent solutions becomes cumbersome as the number of sample paths increases. In order to address this critical issue we develop a piecewise-affine policy that can be used to identify incumbent solutions for out-of-sample scenarios (new sample paths) generated sequentially within the algorithm. This policy is based on optimal bases of the approximate stage problems that are solved during the course of the algorithm.

We will refer to our sequential sampling-based approach for MSLP with interstage independence as the stochastic dynamic linear programming (SDLP) algorithm. Unlike SDDP, which recovers the cost-to-go value functions in finitely many steps, optimality of our algorithm is proved via complementarity and primal-dual relationships which are fundamental to mathematical programming. Moreover, SDLP approximations are not necessarily finitely convergent. This distinguishes the mathematical underpinnings of SDLP from SDDP.

¹In SP algorithms, an "incumbent decision" is one for which the predicted objective value is currently the best. When predictions change, the incumbent decision must be updated.


The remainder of the paper is organized as follows. In §2 we will present the MSLP formulation used in this paper. We will present a brief overview of the deterministic decomposition-based MSLP methods, particularly SDDP, in §3, and a detailed description of the SDLP algorithm in §4. Our presentation will have a particular emphasis on the differences in the approximations employed in deterministic and stochastic decomposition methods. We will discuss the convergence analysis of SDLP in §5.

2 Formulation

We consider a system where sequential decisions are made at discrete decision epochs denoted by the set $\mathcal{T} := \{0, \ldots, T\}$. Here $T < \infty$, and hence we have a finite horizon sequential decision model with $T+1$ stages. We will use $[t]$ to denote the history of the stochastic process $\{v_t\}_{t=0}^{T}$ until (and including) stage $t$, i.e., $v_{[t]} = (v_0, v_1, \ldots, v_t)$. Likewise, we will use $(t)$ to denote the process starting from stage $t$ until the end of the horizon (stage $T$), i.e., $v_{(t)} = (v_t, v_{t+1}, \ldots, v_T)$. We will use $t+$ and $t-$ to denote the succeeding and preceding stages of stage $t$, respectively.

Ordinarily in SP, MSLP models are formulated without state variables, focusing only on decisions in each stage. In many applications, however, it is advisable to use a state variable formulation which is common in the dynamic programming community. In this regard, we use a state variable $s_t := (x_t, \omega_t) \in \mathcal{S}_t$ to describe the system at stage $t$. This state variable comprises two components: $x_t \in \mathcal{X}_t$ is the endogenous state of the system, and $\omega_t \in \Omega_t$ captures the exogenous information revealed in the interval $(t-1, t]$. The exogenous state evolution is driven by a stochastic process over which the decision maker cannot exert any control. For example, the exogenous state variable may represent a weather phenomenon like wind speed, or a market phenomenon like the price of gasoline. The evolution of the endogenous state, on the other hand, can be controlled by an algorithm through decisions $u_t$, and is captured by the stochastic linear dynamics:

$$x_{t+} = \mathcal{D}_{t+}(x_t, \omega_{t+}, u_t) = a_{t+} + A_{t+} x_t + B_{t+} u_t, \qquad (1)$$

where $\omega_{t+}$ denotes the future of the data process, which in our case is modeled as a linear system defined by the tuple $(a_{t+}, A_{t+}, B_{t+})$.
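As a concrete illustration of the dynamics in (1), the following minimal Python sketch propagates the endogenous state one stage forward. All names and dimensions are illustrative assumptions, not part of the paper's formulation; in an application, the tuple $(a_{t+}, A_{t+}, B_{t+})$ would be determined by the sampled exogenous observation $\omega_{t+}$.

```python
import numpy as np

def next_state(a_next, A_next, B_next, x_t, u_t):
    # Affine dynamics of (1): x_{t+} = a_{t+} + A_{t+} x_t + B_{t+} u_t.
    return a_next + A_next @ x_t + B_next @ u_t

# Toy usage with made-up dimensions (3 states, 2 decisions).
rng = np.random.default_rng(0)
a, A, B = rng.normal(size=3), rng.normal(size=(3, 3)), rng.normal(size=(3, 2))
x1 = next_state(a, A, B, x_t=np.ones(3), u_t=np.ones(2))
```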

To characterize the exogenous process $\{\omega_t\}$, we will use $(\Omega, \mathcal{F}, P)$ to denote the filtered probability space. Here, $\Omega = \Omega_1 \times \ldots \times \Omega_T$ denotes the set of outcomes, and $\omega_t$ denotes an observation of the random variable $\tilde{\omega}_t$. The $\sigma$-algebras $\mathcal{F}_t \subseteq \mathcal{F}$ represent the data available to the decision maker at time $t$, which satisfy $\mathcal{F}_t \subseteq \mathcal{F}_{t'}$ for $t < t'$.

With these notations, the state-variable representation of the time-staged MSLP can be written in the following recursive form for all $t \in \mathcal{T}$:

$$h_t(s_t) = c_t^\top x_t + \min \; d_t^\top u_t + E[h_{t+}(s_{t+})] \qquad (2)$$
$$\text{s.t. } u_t \in \mathcal{U}_t(s_t) := \{u_t \mid D_t u_t \leq b_t - C_t x_t, \; u_t \geq 0\},$$

where $x_{t+} = \mathcal{D}_{t+}(x_t, \omega_{t+}, u_t)$. In our finite horizon framework we assume that the terminal cost $h_{T+}(s_{T+})$ is known for all $s_{T+}$ (or negligible enough to be set to 0). Further, the initial state $s_0 = (x_0, \omega_0)$ is also assumed to be given, and hence the stage-0 (henceforth known as the root stage) problem has deterministic input.

In general, MSLP problems are PSPACE-hard [10] and require effort that is exponential in the horizon $T$ for provably tight approximations with high probability. To keep our presentation consistent with our algorithmic goals, we make the following assumptions:

(A1) The set of root stage decisions $\mathcal{U}_0$ is compact.

(A2) The complete-recourse assumption is satisfied at all non-root decision epochs, that is, the feasible set $\mathcal{U}_t(s_t)$ is non-empty for all endogenous trajectories $x_t$ satisfying (1).

(A3) The constraint matrices $D_t$ are fixed and have full row rank.

(A4) Zero provides a lower bound on all cost-to-go functions.

(A5) The stochastic process for exogenous information is stagewise independent, and its support is finite.

These assumptions provide a special structure, and are fairly standard in the SP literature ([28, 34]). The fixed recourse assumption (A3) implies that the recourse matrix $D_t$ does not depend on exogenous information, and our finite support assumption (A5) on exogenous information ensures that $\mathcal{F}_t$ is finite. We note that the algorithms presented here can be extended, after some refinement, to settings where some of the above assumptions can be relaxed. For instance, certain extensions to Markovian stochastic processes can be envisioned. However, a detailed treatment of these extensions is beyond the scope of this paper.

3 MSLP Algorithms

The basic difficulty of solving SP problems is associated with the multidimensional integral for computing the expectation in (2). The most direct approach involves incorporating simulation to estimate the expected recourse function as:

$$E[h_{t+}(s_{t+})] \approx H_t^N(u_t) := \frac{1}{N} \sum_{n=1}^{N} h_{t+}(s_{t+}^n), \qquad (3)$$

where $s_{t+}^n$ has components $x_{t+}^n = a_{t+} + A_{t+} x_t + B_{t+} u_t$ and $\omega_{t+}^n$. Doing so results in the so-called sample average approximation (SAA) problem. In this case, we can view the support of $\tilde{\omega}_{t+}$ to comprise a simulated sample $\Omega_{t+}^N := \{\omega_{t+}^1, \omega_{t+}^2, \ldots, \omega_{t+}^N\}$, where each observation vector $\omega_{t+}^n$ has the same probability $p(\omega_{t+}^n) = 1/N$ for all $n = 1, \ldots, N$. Since the recourse function in (2) involves the expectation operator, it is worth noting that the estimate in (3) is an unbiased estimator, and under certain conditions (e.g., when the sample is independent and identically distributed) a consistent estimator of the expected recourse function.
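The following sketch illustrates how an estimate of the form (3) could be assembled. It is a hedged example, not the paper's implementation: `stage_value` and `dynamics` are hypothetical callables standing in for $h_{t+}$ and the dynamics (1).

```python
import numpy as np

def saa_recourse_estimate(stage_value, dynamics, x_t, u_t, omegas):
    """Sample average approximation (3): average the stage value over N
    equally weighted observations, each with probability 1/N."""
    values = []
    for omega in omegas:                      # omegas stands in for Omega^N_{t+}
        x_next = dynamics(omega, x_t, u_t)    # x^n_{t+} from the dynamics (1)
        values.append(stage_value(x_next, omega))
    return np.mean(values)
```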

Once the SAA problem is set up, a deterministic algorithm can be employed to solve it. The SAA problem can be reformulated as a single large linear program (the deterministic equivalent form [3]), and off-the-shelf optimization software can be used to solve the problem. However, as the sample size increases (as mandated by SAA theory to achieve high quality solutions [37]), or the number of stages increases, such an approach will be computationally burdensome. The deterministic decomposition-based cutting plane methods, also known as outer-linearization methods, provide a means to partially overcome the aforementioned burden.

The deterministic decomposition-based (DD) methods can be traced to Kelley [20] for smooth convex optimization problems, Benders decomposition for ideas of decomposition/partitioning in mixed integer programs (MIPs) [1], and Van Slyke and Wets [39] for 2-SLPs. While the exact motivation for these methods arose in different contexts, we now see them as being very closely related to the outer-linearization perspective. These ideas have become the mainstay for both 2-SLPs and stochastic MIPs.

DD-based algorithms originally developed for 2-SLP have been extended to successive stages of dynamic linear programs. One of the early successes was reported in [2], where the classical two-stage Benders decomposition algorithm was extended to multiple stages. This procedure has subsequently come to be known as the Nested Benders Decomposition (NBD) algorithm. The starting point of this algorithm is the scenario tree representation of underlying uncertainty where all possible outcomes and their interdependence are represented as nodes on a tree. Naturally, this implies that the NBD algorithm can be classified under the multistage DD-based methods.


[Figure 1 shows a taxonomy of multistage stochastic linear programming algorithms. Two-stage methods: Benders Decomposition [1], the L-shaped Method [39], Regularized Benders Decomposition [33], Stochastic Approximation [30], Stochastic Quasi-Gradient [11], and Stochastic Decomposition [15, 16]. Multistage methods: Nested Benders Decomposition [2] and the sampling-based methods, namely Stochastic Dual Dynamic Programming [19, 25, 36], Abridged Nested Benders [7], Cutting-plane and Partial Sampling [6], Multistage Stochastic Decomposition [34], and Stochastic Dynamic Linear Programming (current work).]

Figure 1: Multistage Stochastic Linear Programming Algorithms

3.1 Stochastic Dual Dynamic Programming

It is well known that the number of nodes in the scenario tree increases exponentially with the number of stages, and therefore, the need to visit all the nodes in the scenario tree significantly increases the computational requirements of the NBD algorithm. Pereira and Pinto [25] provided a sampling-based approach to address this issue in the stochastic dual dynamic programming (SDDP) algorithm.

Like the MSLP algorithms mentioned earlier, SDDP creates an outer approximation of the stage value function using subgradient information. SDDP performs its iteration in a forward pass and a backward pass, a feature common to most multistage SP algorithms. However, it avoids the intractability of scenario trees by assuming that the stochastic process is stagewise independent. While the algorithm traverses forward using sampling, the approximations are created on the backward pass from deterministic Benders-type cuts. The interstage independence assumption allows these cuts to be shared across different states within a stage. Cut sharing under special stagewise dependency as presented in [19], algorithmic enhancements proposed in [22], and the inclusion of risk measures [27] have contributed to the success of SDDP. The abridged nested decomposition algorithm in [7] and the cutting plane and partial sampling algorithm proposed in [6] are other sampling-based methods which are similar in flavor to SDDP. A complete analytical treatment of this class of algorithms is provided in [28]. The analysis of the SDDP algorithm applied to the multistage SAA problem is provided in [36].

The main steps of SDDP are presented in Algorithm 1. As in the case of NBD, each iteration of SDDP begins by solving an optimization problem for the root stage. Then a finite number of Monte Carlo simulations are carried out to identify forward sample paths $\{\omega_{(0)}^n\}_{n=1}^N$ for the iteration. Along each one of these sample paths, the forward recursion involves identifying candidate solutions $u_t^{kn}$ by solving an optimization problem of the form:

$$\min \{ f_t^{k-1}(u_t) \mid u_t \in \mathcal{U}_t(s_t^{kn}) \}, \qquad (4)$$

and propagating the state according to the dynamics in (1) as $s_{t+}^{kn} = \mathcal{D}_{t+}(s_t^{kn}, \omega_{t+}^{kn}, u_t^{kn})$. These two steps are undertaken in an alternating manner for all stages until the end of the horizon. In the above stage optimization problem, $f_t^{k-1}(u_t)$ denotes the current approximation of the cost-to-go function in (2). At the end of the forward recursion we have a set of candidate solutions $\{u_t^{kn}\}$ at each non-terminal stage $t$, one for each simulated forward recursion sample path.

In the work of Pereira and Pinto [25] the backward pass proceeds as in the case of NBD (see Steps 12-22 in Algorithm 1). At a non-terminal stage $t$ and for each element of the candidate solution set $\{u_t^{kn}\}$, backward pass states are computed using the linear dynamics in (1) for all possible outcomes in $\Omega_{t+}$. With each of these backward pass states as input, an optimization problem is solved in stage $t+$ and the optimal dual solution is used to compute a lower bounding affine function. Since this procedure requires subproblems to be solved for all the nodes along all


the sample paths simulated in the forward recursion, this approach is ideal for narrow trees (few possible realizations per stage). However, the computational issues resurface when the number of outcomes per stage increases. Donohue and Birge proposed the abridged NBD algorithm to address this issue in [7], where the forward recursion proceeds only along a subset of candidate states (termed branching states), while solving all the nodes only along the trajectory of branching states in the backward pass. Subsequently, it was proposed in [22] and [28] that sampling procedures can be adopted in the backward pass as well. We make the following observations regarding the original SDDP procedure and its variants (see the list following Algorithm 1):

Algorithm 1 Stochastic Dual Dynamic Programming

1: Initialization: Iteration count $k \leftarrow 0$.
2: Forward recursion: Decision simulation along simulated sample paths.
3: Solve the root-stage optimization problem (4) to identify $u_0^k$.
4: Sample a set of $N$ paths $\{\omega_{(0)}^{kn}\}_{n=1}^N$.
5: for $t = 1, \ldots, T-1$ do
6:   for $n = 1, \ldots, N$ do
7:     Set up the candidate state $s_t^{kn} = \mathcal{D}_t(x_{t-}^{kn}, \omega_t^{kn}, u_{t-}^{kn})$.
8:     Solve the stage optimization problem (4) with $s_t^{kn}$ as input, and obtain the optimal primal candidate solution $u_t^{kn}$.
9:   end for
10: end for
11: Backward pass: Update cost-to-go function approximations.
12: for $t = T-1, \ldots, 0$ do
13:   for $n = 1, \ldots, N$ do
14:     for $\omega_{t+} \in \Omega_{t+}$ do
15:       Set up $s_{t+} = (x_{t+}, \omega_{t+})$, where $x_{t+} = \mathcal{D}_{t+}(x_t^{kn}, \omega_{t+}, u_t^{kn})$, as input.
16:       Solve the subproblem with $s_{t+}$ as input:
          $$\min \{ f_{t+}^k(s_{t+}, u_{t+}) \mid u_{t+} \in \mathcal{U}_{t+}(s_{t+}) \}, \qquad (5)$$
          and obtain the optimal dual solution $\pi_{t+}(\omega_{t+})$.
17:       Compute the lower bounding affine function $\ell_{t+}(x_{t+}) := \alpha_{t+}^{kn}(\omega_{t+}) + (\beta_{t+}^{kn}(\omega_{t+}))^\top x_{t+}$, where
          $$\alpha_{t+}^{kn}(\omega_{t+}) = b_{t+}^\top \pi_{t+}(\omega_{t+}); \quad \beta_{t+}^{kn}(\omega_{t+}) = c_{t+} - C_{t+}^\top \pi_{t+}(\omega_{t+}). \qquad (6)$$
18:       Update the set of coefficients as:
          $$\mathcal{J}_{t+}^k(\omega_{t+}) = \mathcal{J}_{t+}^{k-1}(\omega_{t+}) \cup \{(\alpha_{t+}^{kn}(\omega_{t+}), \beta_{t+}^{kn}(\omega_{t+}))\}.$$
19:     end for
20:   end for
21:   Obtain the updated stage cost-to-go function approximation using
      $$h_{t+}^k(s_{t+}) = \max_{j \in \mathcal{J}_{t+}^k(\omega_{t+})} \alpha_{t+}^j + (\beta_{t+}^j)^\top x_{t+} \qquad (7)$$
      to obtain $f_t^k(s_t, u_t) = c_t^\top x_t + d_t^\top u_t + \sum_{\omega_{t+} \in \Omega_{t+}} p(\omega_{t+}) h_{t+}^k(s_{t+})$.
22: end for
23: Increment iteration count: $k \leftarrow k + 1$, and go to Line 2.

1. Each collection of affine functions $\mathcal{J}_t^{kn}(\omega)$ is associated with a unique candidate solution $u_{t-}^{kn}$ at stage $(t-1)$. The cost-to-go function approximation in (7) includes a piecewise linear approximation in which the pointwise maximum is defined over the collections of affine functions generated across all the sample paths, i.e., $\mathcal{J}_{t+}^k(\omega) = \cup_{n=1}^N \mathcal{J}_{t+}^{kn}(\omega)$. In addition, if the uncertainty is confined to the state dynamics, then the cuts can be shared across the outcomes $\omega \in \Omega_{t+}$. This sharing of cuts is possible due to the stagewise independence of exogenous information, and was first proposed in [19].

2. An SAA of the problem in (2) can be constructed by replacing the true distribution of $\tilde{\omega}_t$ by the empirical distribution based on a random sample $\{\omega_t^1, \omega_t^2, \ldots, \omega_t^N\}$ for all $t \in \mathcal{T} \setminus \{0\}$. These random samples are generated independently to ensure that the stagewise independence assumption is respected. An SAA-based SDDP algorithm was analyzed in [36].

3. The forward recursion sampling must ensure that each of the $|\Omega_1| \times |\Omega_2| \times \ldots \times |\Omega_T|$ possible sample paths is visited infinitely many times w.p.1. If sampling is employed in the cut generation procedure (as in [6, 28]), it must be performed independently of the forward recursion sampling and must ensure that each element of $\Omega_t$ is sampled infinitely many times w.p.1 at all stages.
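For concreteness, the cut computation of Step 17 (equation (6)) amounts to a couple of matrix-vector products once the stage dual solution is available. The sketch below assumes dense NumPy data and is only illustrative of the formula, not of any production SDDP code:

```python
import numpy as np

def sddp_cut(b_next, c_next, C_next, pi_next):
    """Benders-type cut coefficients of (6): alpha = b^T pi and
    beta = c - C^T pi, so that l(x) = alpha + beta^T x lower bounds
    the stage value function at the sampled state."""
    alpha = b_next @ pi_next
    beta = c_next - C_next.T @ pi_next
    return alpha, beta
```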

In contrast to the SDDP algorithm, where a fixed sample is used at each stage, our SDLP algorithm will generate approximations that are based on sample average functions constructed using a sample whose size increases with the iterations. The sequential nature of introducing new observations into the sample requires additional care within the algorithm design, particularly in the backward pass when approximations are generated (Steps 12-22). We will present these details in the next section.

4 Stochastic Dynamic Linear Programming

Like other multistage SP algorithms, an iteration of SDLP involves two principal steps: forward and backward recursion. We will present the details of these in iteration $k$ of the algorithm. Note that we make the same assumptions as the SDDP algorithm.

4.1 Forward Recursion

The forward recursion begins by solving the following quadratic regularized optimization problem:

$$\min_{u_0 \in \mathcal{U}_0} \; f_0^{k-1}(u_0) + \frac{\sigma}{2} \| u_0 - \hat{u}_0^{k-1} \|^2. \qquad (8)$$

Here, the proximal parameter $\sigma > 0$ is assumed to be given. We will denote the optimal solution of the above problem as $u_0^k$ and refer to it as the candidate solution. The incumbent solution $\hat{u}_0^{k-1}$ used in the proximal term is similar to that used in the regularized L-shaped [33] and 2-SD [16] algorithms. This is followed by simulating a sample path $\omega_{(0)}^k$ which is generated independently of previously observed sample paths. The remainder of the forward recursion computations are carried out only along this simulated sample path in two passes: a prediction pass and an optimization pass.
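A minimal sketch of the regularized root-stage problem (8) is given below, assuming the cutting-plane model is stored as arrays of intercepts and slopes and that $\mathcal{U}_0$ is simply the nonnegative orthant (an assumption made here purely for illustration). It uses the cvxpy modeling library; none of this is the authors' code.

```python
import cvxpy as cp
import numpy as np

def regularized_root_step(alphas, betas, u_inc, sigma):
    """Solve (8): minimize the piecewise-affine model f(u) = max_j alpha_j
    + beta_j^T u plus the proximal term (sigma/2)||u - u_inc||^2, where
    u_inc is the incumbent solution. alphas has shape (J,), betas (J, n)."""
    u = cp.Variable(len(u_inc), nonneg=True)      # stand-in for u in U_0
    model = cp.max(alphas + betas @ u)            # f^{k-1}_0(u), max of affine pieces
    prox = (sigma / 2) * cp.sum_squares(u - u_inc)
    cp.Problem(cp.Minimize(model + prox)).solve()
    return u.value                                # candidate solution u^k_0
```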

Prediction Pass

At all non-terminal stages we use a regularized stage optimization problem which is centered around the incumbent solution. The goal of the prediction pass is to make sure that the incumbent solutions, and the corresponding incumbent states, satisfy the underlying model dynamics in (1). Given the initial state $x_0$, the prediction pass starts by using the root stage incumbent solution $\hat{u}_0^k$ and computing the incumbent state for stage 1 as: $\hat{x}_1^k = \mathcal{D}_1(x_0, \omega_1^k, \hat{u}_0^k)$. At the subsequent stage, let $\mathcal{M}_t : \mathcal{S}_t \to \mathcal{U}_t$ be a vector valued function which takes the state vector $s_t$ as an input and maps it on to a solution $u_t \in \mathcal{U}_t(s_t)$. This mapping will be used to identify the incumbent solutions at these stages, i.e., $\hat{u}_t^k = \mathcal{M}_t(\hat{s}_t^k)$. We postpone the discussion of this mapping to §4.3.2. We proceed by computing the incumbent state using (1) and identifying the incumbent solution using the mapping for the remainder of the horizon. At the end of the prediction pass, we have incumbent state $\{\hat{x}_t^k\}$ and solution $\{\hat{u}_t^k\}$ trajectories that satisfy the state dynamics in (1) over the entire horizon.

Optimization Pass

After completing the prediction pass, the optimization pass is carried out to simulate candidate solutions along the current sample path $\omega_{(0)}^k$ for all $t \in \mathcal{T} \setminus \{T\}$:

$$u_t^k \in \operatorname{argmin} \left\{ f_t^{k-1}(s_t^k, u_t) + \frac{\sigma_t^k}{2} \| u_t - \hat{u}_t^k \|^2 \;\middle|\; u_t \in \mathcal{U}_t(s_t^k) \right\}. \qquad (9)$$

Here $f_t^{k-1}$ is the current approximation of the cost-to-go function, and the proximal parameter $\sigma_t^k > 0$ is assumed to be given. Structurally, $f_t^{k-1}$ is a piecewise affine and convex function, and is similar to the approximations used in the SDDP algorithm. However, each individual piece is a minorant² generated using certain sample mean functions. The candidate decision for a particular stage is used to set up the subsequent endogenous state $x_{t+}^k = \mathcal{D}_{t+}(x_t^k, \omega_{t+}^k, u_t^k)$, and thus the input state $s_{t+}^k$. We will refer to the decision problem in (9) as Timestaged Decision Simulation at stage $t$ (TDS$_t$). This completes the optimization pass, and hence the forward recursion, for the current iteration. At the end of the forward recursion, we have the incumbent trajectory $\{\hat{x}_t^k\}$ and the candidate trajectory $\{x_t^k\}$ which will be used for updates during the backward recursion.

4.2 Backward Recursion

The primary goal of the backward recursion procedure is to update the cost-to-go approximations $f_t^{k-1}$ at all non-terminal stages. As the name suggests, these calculations are carried out backwards in time, starting from the terminal stage to the root stage, along the same sample path that was observed during the forward recursion. These calculations are carried out for both the candidate as well as incumbent trajectories.

In both the DD and SD-based approaches, the value function is approximated by the pointwise maximum of affine functions. However, the principal difference between these approaches lies in how the expected value function is approximated. In DD-based methods, it is the true expected value function which requires the knowledge of the probability distribution, or an SAA with a fixed sample (as in (3)). On the other hand, the SD-based methods create successive approximations $f_t^k$ (for $t < T$) that provide a lower bound on a sample mean using only $k$ observations in iteration $k$, and therefore satisfy:

$$f_t^k(u_t) - d_t^\top u_t \leq H_{t+}^k(u_t) := \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) h_{t+}(x_{t+}, \omega_{t+}), \qquad (10)$$

where $x_{t+}$ is the endogenous state obtained from (1) with $(x_t, \omega_{t+}, u_t)$ as input, for all $\omega_{t+} \in \Omega_{t+}^k$ and $u_t \in \mathcal{U}_t(s_t)$. The quantity $p^k(\omega_{t+})$ in (10) measures the relative frequency of an observation, which is defined as the number of times $\omega_{t+}$ is observed ($\kappa^k(\omega_{t+})$) over the number of iterations ($k$). This quantity approximates the unconditional probability of exogenous information at stage $t+$, and is updated as follows. Given the current sample path $\omega_{(0)}^k$, a collection of observations

²Since the approximations generated in sequential sampling-based methods are based on statistical estimates which are updated iteratively, we use the term minorant to refer to the lower bounding affine functions. This usage follows its introduction in [35] and is intended to distinguish them from the more traditional cuts in DD-based methods.


at a non-root stage $t$ ($t > 0$) is updated to include the latest observation $\omega_t^k$ as: $\Omega_t^k = \Omega_t^{k-1} \cup \{\omega_t^k\}$. The observation count is also updated as: $\kappa^k(\omega_t) = \kappa^{k-1}(\omega_t) + \mathbb{1}\{\omega_t = \omega_t^k\}$, for all $\omega_t \in \Omega_t^k$. Using these, the observation frequency for $\omega_t \in \Omega_t^k$ is given by: $p^k(\omega_t) = \kappa^k(\omega_t)/k$. Notice the superscript $k$ (iteration count) that is used in our notation of the SAA function $H_{t+}^k$, the collection of observations $\Omega_{t+}^k$, and the observation frequency $p^k$. This is intended to convey the sequential nature of SDLP.
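The bookkeeping just described is simple to maintain; the sketch below (illustrative only, with observations assumed hashable, e.g., tuples) tracks the set $\Omega_t^k$, the counts $\kappa^k$, and the frequencies $p^k(\omega) = \kappa^k(\omega)/k$ for one non-root stage.

```python
from collections import Counter

class StageObservations:
    """Sequential bookkeeping for one non-root stage of SDLP."""
    def __init__(self):
        self.counts = Counter()   # kappa^k(omega); its keys form Omega^k_t
        self.k = 0                # iteration count

    def update(self, omega):
        # Omega^k_t = Omega^{k-1}_t U {omega^k_t}; bump the count of omega^k_t.
        self.k += 1
        self.counts[omega] += 1

    def frequency(self, omega):
        # p^k(omega) = kappa^k(omega) / k.
        return self.counts[omega] / self.k
```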

4.2.1 Terminal Stage Approximation

At the terminal stage, recall that $E[h_{T+}(s_{T+})] = 0$, and the value function $h_T$ is the value of a deterministic linear program for a given state input $s_T$. The sample mean $H_T^k(u_{T-1}) = \sum_{\omega_T \in \Omega_T^k} p^k(\omega_T) h_T(s_T)$ provides an unbiased estimate of $E[h_T(s_T)]$. Hence, the value function at the penultimate stage ($t = T-1$) can be approximated using a procedure similar to the one employed in the 2-SD algorithm.

In this procedure, a subproblem corresponding to the current observation $\omega_T^k$ is set up and solved. This subproblem uses $s_T^k = (x_T^k, \omega_T^k)$ as input, where $x_T^k = \mathcal{D}_T(x_{T-1}, \omega_T^k, u_{T-1}^k)$. Let the optimal dual solution obtained be denoted as $\pi_T^k(\omega_T^k)$, which is added to the collection of previously discovered dual vertices: $\Pi_T^k = \Pi_T^{k-1} \cup \{\pi_T^k(\omega_T^k)\}$. For other observations in $\Omega_T^k$, i.e., $\omega_T \in \Omega_T^k$ and $\omega_T \neq \omega_T^k$, we identify the best dual vertex in $\Pi_T^k$ using the argmax operation as follows:

$$\pi_T^k(\omega_T) \in \operatorname{argmax} \{ \pi_T^\top (b_T - C_T x_T) \mid \pi_T \in \Pi_T^k \}. \qquad (11)$$

Using the dual vertices $\{\pi_T^k(\omega_T)\}_{\omega_T \in \Omega_T^k}$, we compute the lower bounding affine function $\ell_T^k(x_T) := \alpha_T^k(\omega_T) + (\beta_T^k(\omega_T))^\top x_T$, where

$$\alpha_T^k(\omega_T) = b_T^\top \pi_T^k(\omega_T); \quad \beta_T^k(\omega_T) = c_T - C_T^\top \pi_T^k(\omega_T). \qquad (12)$$

The set of affine functions thus obtained ($\mathcal{J}_T^k = \cup_{j=1}^k \{\ell_T^j(x_T)\}$) provides the piecewise affine lower bounding function to the value function $h_T(s_T)$, given by:

$$h_T^k(s_T) = \max_{j \in \mathcal{J}_T^k(\omega_T)} \ell_T^j(x_T) := \alpha_T^j(\omega_T) + \beta_T^j(\omega_T)^\top x_T. \qquad (13)$$

The above function is an outer linearization of the terminal value function.
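The argmax operation (11) and the coefficient computation (12) reduce to a search over the stored dual vertices followed by matrix-vector products. The following fragment sketches both under the assumption that the dual vertices are stored as NumPy arrays; it is illustrative, not the authors' implementation.

```python
import numpy as np

def argmax_dual_vertex(dual_vertices, b_T, C_T, x_T):
    """Argmax procedure (11): among the stored dual vertices Pi^k_T,
    select the one maximizing pi^T (b_T - C_T x_T) for this state."""
    rhs = b_T - C_T @ x_T
    return max(dual_vertices, key=lambda pi: pi @ rhs)

def terminal_cut(pi, b_T, c_T, C_T):
    """Coefficients (12) of the lower bounding affine function at stage T."""
    return b_T @ pi, c_T - C_T.T @ pi   # (alpha, beta)
```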

4.2.2 Non-terminal Stage Approximation

When updating the approximations at a non-terminal stage $t$, we have access to the minorants at stage $t+$ (recall that the value functions are being updated recursively backwards from the terminal stage). Using these we can define:

$$H_t^k(s_t) := c_t^\top x_t + \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) h_{t+}^k(\mathcal{D}_{t+}(x_t, \omega_{t+}, u_t), \omega_{t+}). \qquad (14)$$

The above expression represents a sample mean computed over the current observations $\Omega_{t+}^k$ at stage $t+$ at an arbitrary input state $s_t$. Since we use lower bounding approximations $h_{t+}^k$ in building this sample mean, this sample estimate is biased. The stage approximation is updated using a lower bound to the above sample mean function, and hence, is biased as well.

In order to compute this lower bound, notice that we can obtain the subgradient, i.e., $\beta_{t+}^k(\omega_{t+}) \in \partial h_{t+}^k(\mathcal{D}_{t+}(x_t^k, \omega_{t+}, u_t^k), \omega_{t+})$, using the collection of affine functions $\mathcal{J}_{t+}^k(\omega)$ for all


Algorithm 2 Stochastic Dynamic Linear Programming

1: Initialization:
2: Choose a proximal parameter $\sigma \in [\sigma_{\min}, \sigma_{\max}]$ with $0 < \sigma_{\min} < \sigma_{\max}$.
3: Set observations $\Omega_t^0 = \emptyset$; a trivial affine function $\ell_t^0 \equiv 0$ in the set $\mathcal{J}_t^0$ for all $t \in \mathcal{T}$; iteration counter $k \leftarrow 1$.
4: Forward recursion: Decision simulation along a simulated sample path.
5: Solve the root stage optimization problem of the form (8) to identify $u_0^k$.
6: Simulate a sample path $\omega_{(0)}^k$.
7: Prediction pass:
8: for $t = 1, \ldots, T-1$ do
9:   Set up the incumbent state $\hat{s}_t^k = \mathcal{D}_t(\hat{x}_{t-}^k, \omega_t^k, \hat{u}_{t-}^k)$.
10:  Identify an incumbent solution $\hat{u}_t^k = \mathcal{M}_t^k(\hat{s}_t^k)$.
11: end for
12: Optimization pass:
13: for $t = 1, \ldots, T$ do
14:  Set up the candidate state $s_t^k = \mathcal{D}_t(x_{t-}^k, \omega_t^k, u_{t-}^k)$.
15:  Solve the stage optimization problem (9) using $s_t^k$ as input, and obtain the candidate primal solution $u_t^k$.
16: end for
17: Backward recursion: Update value function approximations.
18: for $t = T, \ldots, 1$ do
19:  Set up the stagewise-dual approximation (16).
20:  Solve the dual approximation using the candidate and the incumbent states, and compute the coefficients for the affine functions using (17).
21:  Obtain the updated value function approximation as in (19).
22: end for
23: Increment the iteration count $k \leftarrow k + 1$, and go to Line 4.

observations $\omega_{t+} \in \Omega_{t+}$ (see §4.3.1 for details). Let $\alpha_{t+}^k(\omega_{t+})$ be the corresponding intercept term. Using these, a valid lower bound to the sample mean function in (14) can be written as:

$$H_t^k(s_t) \geq c_t^\top x_t + \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) \left[ \alpha_{t+}^k(\omega_{t+}) + \beta_{t+}^k(\omega_{t+})^\top \mathcal{D}_{t+}(x_t, \omega_{t+}, u_t) \right]. \qquad (15)$$

Substituting the state dynamics equation in (1), and dualizing the linear program on the right-hand side of the above inequality, we obtain:

$$H_t^k(s_t) \geq \bar{\alpha}_{t+}^k + (\bar{\beta}_{t+}^k)^\top x_t + \max \{ \pi_t^\top [b_t - C_t x_t] \mid D_t^\top \pi_t \leq \rho_{t+}^k, \; \pi_t \leq 0 \}, \qquad (16)$$

where

$$\bar{\beta}_{t+}^k = \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) A_{t+}^\top \beta_{t+}^k(\omega_{t+}), \qquad \rho_{t+}^k = d_t + \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) B_{t+}^\top \beta_{t+}^k(\omega_{t+}),$$
$$\text{and} \quad \bar{\alpha}_{t+}^k = \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) \left[ \alpha_{t+}^k(\omega_{t+}) + \beta_{t+}^k(\omega_{t+})^\top a_{t+} \right].$$

We will refer to the linear program on the right-hand side of the inequality in (16) as the stagewise-dual approximation at stage $t$, and denote it as (SDA$_t^k$). Let $\pi_t^k(\omega_t^k)$ denote the optimal dual solution with least two-norm obtained by solving the above problem with $s_t^k$ as input. Using


this we obtain a lower bounding affine function $\ell_t^k(x_t) = \alpha_t^k(\omega_t^k) + \beta_t^k(\omega_t^k)^\top x_t$ with the following coefficients:

$$\alpha_t^k(\omega_t^k) = (\pi_t^k(\omega_t^k))^\top b_t + \bar{\alpha}_{t+}^k; \quad \beta_t^k(\omega_t^k) = c_t - C_t^\top \pi_t^k(\omega_t^k) + \bar{\beta}_{t+}^k. \qquad (17)$$

Similar calculations using $\hat{\pi}_t^k(\omega_t^k)$, an optimal solution with least two-norm to the dual in (16) with $\hat{s}_t^k$ as input, yield an incumbent affine function $\hat{\ell}_t^k(x_t)$. As before, these functions are included in a collection of affine functions to obtain the updated set $\mathcal{J}_t^k(\omega_t^k)$.

While it is true that the latest affine functions satisfy $H_t^k(s_t) \geq \ell_t^k(x_t)$, the same does not hold for affine functions generated at earlier iterations. Hence, it is possible that there exists a $j \in \mathcal{J}_t^k(\omega_t)$ such that the affine function $\ell_t^j(x_t)$ may not lower bound the current sample mean $H_t^k(s_t)$. In keeping with the updates of 2-SD [15], the old minorants need to be updated as the sample mean estimate changes during the iterations. Under assumption (A4), this is achieved by scaling down the previously generated affine functions. In the two-stage case, 2-SD minorants are updated by multiplying the coefficients by $(k-1)/k$. In the multistage case, the minorants are updated³ as follows:

$$h_t^k(s_t) = \max \left\{ \left\{ \left( \frac{k-1}{k} \right)^{T-t} \ell_t^j(x_t) \right\}_{j \in \mathcal{J}_t^{k-1}(\omega_t)}, \; \ell_t^k(x_t), \; \hat{\ell}_t^k(x_t) \right\}. \qquad (18)$$

Notice that both the candidate and incumbent affine functions generated in previous iterations are treated similarly while scaling down.
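Operationally, the update (18) only rescales stored coefficient pairs and appends the two new affine functions. A hedged sketch (plain Python over NumPy coefficient pairs, not the authors' code) is:

```python
def update_minorants(old_cuts, new_cuts, k, T, t):
    """Minorant update (18): scale every previously generated affine
    function (alpha, beta) by ((k-1)/k)^(T-t) and append the candidate
    and incumbent affine functions generated in iteration k."""
    scale = ((k - 1) / k) ** (T - t)
    return [(scale * a, scale * b) for (a, b) in old_cuts] + list(new_cuts)

def minorant_value(cuts, x):
    """Evaluate h^k_t as the pointwise maximum of the stored pieces."""
    return max(a + b @ x for (a, b) in cuts)
```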

We use these updated minorants to obtain the stage objective function as follows:

$$f_t^k(s_t, u_t) = c_t^\top x_t + d_t^\top u_t + \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) h_{t+}^k(\mathcal{D}_{t+}(x_t, \omega_{t+}, u_t), \omega_{t+}). \qquad (19)$$

This completes the backward pass for iteration $k$. The sequentially ordered steps of the SDLP algorithm are presented in Algorithm 2.

4.2.3 Comparison of DD and SD-based approximations

The complete recourse assumption ensures that the dual feasible set is non-empty, and the optimal dual solution $\pi_T$ is an extreme point of $\{\pi_T \mid D_T^\top \pi_T \leq d_T, \; \pi_T \leq 0\}$. There are finitely many of these extreme points, and hence, the coefficients for the terminal stage computed using (6) for DD-based algorithms or (12) for SD-based methods take finitely many values.

In DD-based multistage algorithms the coefficients belong to a finite set at stage $t+$, and therefore, there exists an iteration $k'$ such that the set of coefficients $\mathcal{J}_{t+}^k(\omega) = \mathcal{J}_{t+}^{k'}(\omega)$ for all $\omega \in \Omega_{t+}$ and $k > k'$. Consequently, the dual feasible region of the problem solved in the backward pass has the following form:

$$\Pi_t^{k,DD} = \left\{ (\pi_t, \theta_{t+}) \;\middle|\; \begin{array}{l} D_t^\top \pi_t \leq d_t + \sum_{\omega \in \Omega_{t+}} \sum_{j \in \mathcal{J}_{t+}^k(\omega)} \theta_{t+}^j(\omega) \beta_{t+}^j(\omega), \\ \sum_{j \in \mathcal{J}_{t+}^k(\omega)} \theta_{t+}^j(\omega) = p(\omega) \ \forall \omega \in \Omega_{t+}, \; \pi_t \leq 0 \end{array} \right\}. \qquad (20)$$

Notice that this dual feasible region does not change for iterations $k > k'$. Since there are a finite number of extreme points of $\Pi_t^{k,DD}$, the coefficients computed using these extreme point solutions result in at most a finite number of distinct values at stage $t$.

In SDLP, notice that the update of the old affine functions in (18) at stage $t+$ can be viewed as a convex combination of the coefficient vector $(\alpha_{t+}^j, \beta_{t+}^j)$ and a zero vector. Due to these

³The exponent $(T-t)$ results from the fact that minorants in the future $T-t$ stages are also updated in a similar manner. Theorem 3 will provide a more formal argument.


updates, the dual feasible region depends on the updated coefficients (particularly $\beta_{t+}^k(\omega)$) as well as the frequencies $p^k(\omega)$:

$$\Pi_t^{k,SD} = \left\{ \pi_t \;\middle|\; D_t^\top \pi_t \leq d_t + \sum_{\omega_{t+} \in \Omega_{t+}^k} p^k(\omega_{t+}) B_{t+}^\top \beta_{t+}^k(\omega_{t+}), \; \pi_t \leq 0 \right\}. \qquad (21)$$

This implies that the dual solutions used to compute the coefficients no longer belong to a finite set. However, following assumption (A2) the dual feasible set in (16) is bounded, and therefore, the coefficients computed in (17) for a non-terminal stage are only guaranteed to be in a compact set. Proceeding backwards, we can conclude that this is the case for the coefficients at all non-terminal stages. These observations are summarized in the following lemma.

Lemma 1. Suppose the algorithm runs for infinitely many iterations, and assumptions (A1), (A2) and (A4) hold. For all $k \geq 1$,

1. The coefficients of cuts generated within DD-based methods in (6), and the coefficients of minorants generated for the terminal stage within SD-based methods in (12), belong to finite sets.

2. The coefficients of minorants generated within SD-based methods for the non-terminal stages in (17) belong to compact sets.

As a consequence of part 1 of the above lemma, a finite number of cuts are generated during the course of DD-based algorithms. This is possible because these algorithms utilize the knowledge of transition probabilities in computing cut coefficients. Additionally, these cuts provide lower bounds to the true value function and are not required to be updated over the course of the algorithm. The subgradients computed in the SD-based methods are stochastic in nature. Therefore, only the affine functions generated in the current iteration act as lower bounds, not to the true value function, but to the current sample mean approximation. The previous affine functions have to be updated using the scheme described in (18). This scheme ensures that the minorant $h_t^k$, obtained after computing the current affine function and updating all previous affine functions, provides a lower bound to the sample mean function $H_t^k$ at all non-terminal stages and throughout the execution of the algorithm. This property is formalized in the following theorem.

Theorem 2. Suppose assumptions (A1)-(A5) hold. The sampled minorants computed in (13) for the terminal stage and (18) for non-terminal stages satisfy: $h_t^k \leq H_t^k$, for all $t \in \mathcal{T} \setminus \{T\}$ and $k \geq 1$.

Proof. In this proof we will use $m = \omega_{t+}^k$, the observation encountered at stage $t+$ in iteration $k$, and $n$ to index the set $\Omega_{t+}$. Following this notation, we will denote $x_{t+} = \mathcal{D}_{t+}(x_t, \omega_{t+}, u_t)$ as $x_{t+}^n$ and $s_{t+} = (x_{t+}, \omega_{t+})$ as $s_{t+}^n$. Consider the stage sample mean problem in (14):

$$H_t^k(s_t) - c_t^\top x_t = \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{n \in \Omega_{t+}^k} p^k(n) h_{t+}^k(s_{t+}^n)$$
$$= \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{n \in \Omega_{t+}^j} p^k(n) h_{t+}^k(s_{t+}^n) + \sum_{n \in \Omega_{t+}^k \setminus \Omega_{t+}^j} p^k(n) h_{t+}^k(s_{t+}^n).$$

We distribute the summation over observations encountered in the first $j < k$ iterations (i.e., $\Omega_{t+}^j$) and those encountered after iteration $j$. Since $h_{t+}^k \geq 0$, we have

$$H_t^k(s_t) - c_t^\top x_t \geq \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{n \in \Omega_{t+}^j} p^k(n) h_{t+}^k(s_{t+}^n)$$
$$= \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{n \in \Omega_{t+}^j} \frac{\kappa^j(n) + \kappa^{j \to k}(n)}{k} \cdot h_{t+}^k(s_{t+}^n).$$


For observations in $\Omega_{t+}^j$, we distribute the computation of their relative frequency by setting $\kappa^k(n) = \kappa^j(n) + \kappa^{j \to k}(n)$, where $\kappa^{j \to k}(n)$ is the number of times observation $n$ was encountered after iteration $j$. Once again invoking $h_{t+}^k \geq 0$ we obtain:

$$H_t^k(s_t) - c_t^\top x_t \geq \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{n \in \Omega_{t+}^j} \frac{j}{k} \times \frac{\kappa^j(n)}{j} \cdot h_{t+}^k(s_{t+}^n).$$

Recall that the minorants at stage $t+$ are updated in (18) by adding the new affine function into the collection while multiplying the previously generated affine functions by a factor of $(\frac{j}{k})^{T-t-1} < 1$. By replacing the current minorant $h_{t+}^k$ by the scaled version of the one available in iteration $j$, we have:

$$H_t^k(s_t) \geq c_t^\top x_t + \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \frac{j}{k} \sum_{n \in \Omega_{t+}^j} p^j(n) \left[ \left( \frac{j}{k} \right)^{T-t-1} h_{t+}^j(s_{t+}^n) \right]$$
$$\geq \left( \frac{j}{k} \right)^{T-t} \left[ c_t^\top x_t + \min_{u_t \in \mathcal{U}_t(s_t)} d_t^\top u_t + \sum_{n \in \Omega_{t+}^j} p^j(n) h_{t+}^j(s_{t+}^n) \right].$$

The second inequality follows from our assumption that the stagewise costs are non-negative. Notice that the scaling factor used when $t+ = T$ reduces to one. In this case, the future cost corresponds to the terminal stage, and the affine functions satisfy $\ell_T^j(x_T) \leq h_T(s_T)$ for all $j \in \mathcal{J}_T^k(\omega_T)$. Therefore, $h_T^k(s_T) \leq h_T(s_T)$. At other stages, the affine function generated in iteration $j$, viz. $\ell_t^j(x_t)$, provides a lower bound to the sample mean in the same iteration, $H_t^j(s_t)$. This leads us to conclude that

$$H_t^k(s_t) \geq \left( \frac{j}{k} \right)^{T-t} H_t^j(s_t) \geq \left( \frac{j}{k} \right)^{T-t} \ell_t^j(x_t).$$

Applying the same arguments for all $j < k$, and using the definition of the minorant in (18), completes the proof.

Additionally, it is worth noticing the relationship between minorants across consecutive iterations as well as their asymptotic behavior, which is stated in the following theorem.

Theorem 3. The minorants satisfy the following:

1. At consecutive iterations we have, for all $k > 0$, $s_t \in \mathcal{S}_t$, $t \in \mathcal{T} \setminus \{0\}$:

$$H_t^k(s_t) \geq h_t^k(s_t) \geq \left( \frac{k-1}{k} \right)^{T-t} h_t^{k-1}(s_t).$$

2. Under assumptions (A2), (A3) and (A5), the sequence of functions $\{h_t^k\}_k$ is uniformly equicontinuous, and uniformly convergent at all non-root stages.

Proof. See Appendix A.

In contrast to these results, the approximations created in the DD-based methods (see (7)) improve monotonically. To clarify what we mean, note that the probability distribution is explicitly used (as constants) in computing the SDDP cuts. Consequently, the approximations satisfy $H_t^N(s_t) \geq h_t^k(s_t) \geq h_t^{k-1}(s_t)$ for all $s_t$, without any need for updates.

Remark: Since the SDLP algorithm works with data discovered through sequential sampling, it does not rely on any a priori knowledge of the exogenous probability distribution. This feature makes the algorithm suitable to work with external simulators or statistical models which are able to better capture the nature of exogenous uncertainty. In each iteration, the algorithm will prompt the simulator to provide a new sample path. This feature is particularly appealing when an a priori representation of uncertainty using scenario trees is either cumbersome or inadequate due to computational and/or timeliness constraints. Such optimization problems are commonly encountered in the operations of power systems with significant renewable penetration. Due to the intermittent nature of renewable resources such as wind and solar, a scenario tree representation may be difficult (even impossible) to create within the timeliness constraints. State-of-the-art numerical weather prediction and other time series models are known to be more accurate descriptors of this uncertainty. Therefore, optimization algorithms which use sample paths simulated from such models yield more reliable plans and cost estimates [13, 14].

4.3 Subgradient and Incumbent Selection

In this section we address two important components of the SDLP algorithm: the argmax procedure to identify the subgradient of an SDLP approximation at a non-root stage, which is used during the backward recursion, and the selection of an incumbent solution for the proximal term used during timestaged decision simulation.

4.3.1 Subgradient Selection

During the backward recursion, we build a lower bound to the sample mean function $H_{t-}^k$ using the best lower bounding affine functions from the collection $\mathcal{J}_t^k$ for all $\omega_t \in \Omega_t^k$. This procedure is accomplished differently based on whether or not the observation belongs to the current sample path $\omega_{(0)}^k$. We will utilize the collection of dual vertices $\Pi_t^k$ identified during the course of the algorithm for this purpose. Note that an element $\pi_t \in \Pi_t^k$ depends on the sample mean function used in the iteration in which the dual solution was generated (denoted as $i(\pi_t)$). This dependence is explicitly captured by the coefficients $(\bar{\alpha}_{t+}^{i(\pi_t)}, \bar{\beta}_{t+}^{i(\pi_t)})$, and the term $\rho_{t+}^{i(\pi_t)}$ that defines the feasible set associated with $\pi_t$ (see (16)).

For observation $\omega_t^k$: This observation is encountered at stage $t$ along the current sample path. Consequently, in the current backward pass we built and solved SDA$_t^k$ to optimality using $s_t^k$ as input. Using the optimal dual solution thus obtained, we compute the coefficients in (17) for the supporting hyperplane $\ell_t^k(x_t)$ to SDA$_t^k$ at the candidate state. Similar calculations with $\hat{s}_t^k$ as input yield the supporting hyperplane $\hat{\ell}_t^k(x_t)$ to SDA$_t^k$ at the incumbent state $\hat{s}_t^k$.

For observations $\omega_t \in \Omega_t^k \setminus \{\omega_t^k\}$: These are the observations not included in the current sample path, and therefore, no backward pass optimization is carried out for these observations. Instead, we use an argmax procedure to identify the subgradient approximations. These subgradients correspond to the best lower bounding affine function of SDA$_t^k$ for these observations. In order to accomplish this, we maintain a set of dual solutions $\Pi_t^k$ obtained by solving SDA$_t^j$ in iterations $j \leq k$, as in the case of 2-SD. For each $\omega_t \in \Omega_t^k \setminus \{\omega_t^k\}$, we set up $s_t = (x_t, \omega_t)$, where $x_t$ is computed with $(x_{t-1}^k, \omega_t, u_{t-1}^k)$ as input in (1), and identify a dual solution:

$$\pi_t^k(\omega_t) \in \operatorname{argmax} \left\{ \left( \frac{i(\pi_t)}{k} \right)^{T-t} \pi_t^\top (b_t - C_t x_t) \;\middle|\; \pi_t \in \Pi_t^k \right\}.$$

Notice that the set of dual vertices $\Pi_t^k$ changes with the iteration, which may lead to computational difficulties. We address this issue by using the constancy of the basis index sets that generate these dual vertices. Further discussion of this issue is provided in §4.3.2. Using the dual solution obtained by the above procedure, we can compute the coefficients:

$$\alpha_t^k(\omega_t) = \left( \frac{i(\pi_t^k(\omega_t))}{k} \right)^{T-t} \left[ \pi_t^k(\omega_t)^\top b_t + \bar{\alpha}_{t+}^{i(\pi_t^k(\omega_t))} \right],$$
$$\beta_t^k(\omega_t) = \left( \frac{i(\pi_t^k(\omega_t))}{k} \right)^{T-t} \left[ -C_t^\top \pi_t^k(\omega_t) + \bar{\beta}_{t+}^{i(\pi_t^k(\omega_t))} \right].$$


In essence, the above procedure identifies a dual solution $\pi_t^k(\omega_t)$ which was obtained using SDA$_t^{i(\pi_t^k(\omega_t))}$, and scales it appropriately to provide the best lower bounding approximation to the current SDA$_t^k$.
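A sketch of this scaled argmax step is given below; `iteration_of[j]` plays the role of $i(\pi)$ for the $j$-th stored dual, and all containers are illustrative assumptions rather than the paper's data structures.

```python
def scaled_argmax_dual(duals, iteration_of, k, T, t, b_t, C_t, x_t):
    """Scaled argmax of Section 4.3.1: each stored dual pi was generated
    against an iteration-i(pi) sample mean, so its objective value
    pi^T (b_t - C_t x_t) is weighted by (i(pi)/k)^(T-t) before comparison."""
    rhs = b_t - C_t @ x_t
    j_best = max(range(len(duals)),
                 key=lambda j: (iteration_of[j] / k) ** (T - t) * (duals[j] @ rhs))
    return duals[j_best], iteration_of[j_best]
```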

4.3.2 Incumbent Selection

The procedure described here to identify an incumbent solution at all non-root, non-terminal stages is motivated by the optimal basis propagation policy presented in [5]. This identification, performed during the prediction pass, relies on the basis of the stagewise-dual approximation (SDA$_t^k$) that appears on the right-hand side of (16). To facilitate the discussion here, we restate SDA$_t^k$ below:

$$\max \; \pi_t^\top [b_t - C_t x_t] \quad \text{subject to} \quad D_t^\top \pi_t \leq \rho_t^k, \; \pi_t \leq 0, \qquad (22)$$

where $\rho_t^k$ is defined in the expressions following (16). In each iteration, the above linear program is solved to optimality along the iteration sample path and potentially a new basis is discovered. Let $B_t^k$ denote the index set whose elements are the rows which are active in (22). Denote by $D_{t, B_t^k}$ the submatrix of $D_t$ formed by the columns indexed by $B_t^k$ (the basis matrix). From standard linear programming results we have that a feasible point is an extreme point of the feasible set if and only if there exists an index set that satisfies $D_{t, B_t^k}^\top \pi_t^k = \rho_{t, B_t^k}^k$. This index set is added to the collection of previously discovered index sets: $\mathcal{B}_t^k = \mathcal{B}_t^{k-1} \cup \{B_t^k\}$. We use this collection of index sets to construct dual solutions of the linear program in (22). Assumption (A2) ensures that the optimal set of the dual linear program is non-empty, which implies that there exists an index set $B_t^j \in \mathcal{B}_t^k$ such that for any arbitrary input state $s_t$ we can write:

$$u_{t,i}^j = \left[ D_{t, B_t^j}^{-1} (b_t - C_t x_t) \right]_i, \; i \in B_t^j; \qquad u_{t,i}^j = 0, \; i \notin B_t^j. \qquad (23)$$

This operation can be written as $u_t = R_{B_t^j}(b_t - C_t x_t)$, where $R_{B_t^j}$ is an $m_t \times n_t$ matrix with rows $[R_{B_t^j}]_i = [(D_{t, B_t^j})^{-1}]_i$ for $i \in B_t^j$ and $[R_{B_t^j}]_i = 0$ (a zero vector of length $m_t$) for $i \notin B_t^j$. Note that, if $u_t^j$ satisfies the dual constraints of (22), then it is a suboptimal basic feasible solution to the dual problem (and if the complementarity conditions are also satisfied, then it is optimal). We will use $\mathcal{U}_t^k(s_t)$ to denote the set of dual basic feasible solutions generated using (23) for all index sets in $\mathcal{B}_t^k$. Using these index sets we define the mapping used for incumbent selection at non-root stages as follows:

$$\mathcal{M}_t^k(s_t) = \operatorname{argmin} \{ f_t^{k-1}(s_t, u_t^j) \mid u_t^j \in \mathcal{U}_t^k(s_t) \} \quad \forall t \in \mathcal{T} \setminus \{0\}. \qquad (24)$$

Notice that the dual LP of (22) has cost coefficients which vary over iterations, akin to 2-SD with random cost coefficients in the second stage [12]. The steps involved in our procedure to select incumbent solutions, particularly the computation of dual solutions in (23) and establishing their feasibility, can be implemented in a computationally efficient manner using a sparsity preserving representation of dual solutions. We refer the reader to [12] for a detailed discussion of this representation and its implementation.
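The reconstruction (23) of a basic solution from a stored basis index set is a single linear solve; the sketch below assumes dense NumPy matrices and a square, invertible basis submatrix, and it omits the feasibility check against the remaining constraints, which the text requires before a reconstructed solution enters $\mathcal{U}_t^k(s_t)$.

```python
import numpy as np

def basic_solution_from_basis(D_t, basis, b_t, C_t, x_t):
    """Reconstruction (23): u_B = D_{t,B}^{-1} (b_t - C_t x_t) on the basis
    indices B, zero elsewhere. Feasibility/complementarity must still be
    checked to decide whether the result is usable (or optimal)."""
    u = np.zeros(D_t.shape[1])
    cols = list(basis)
    u[cols] = np.linalg.solve(D_t[:, cols], b_t - C_t @ x_t)
    return u
```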

At the root stage it suffices to maintain a single incumbent solution. This incumbent solution is updated based on the predicted objective value reduction at the root stage:

$$f_0^k(u_0^k, s_0) - f_0^k(\hat{u}_0^{k-1}, s_0) \leq q \left[ f_0^{k-1}(u_0^k, s_0) - f_0^{k-1}(\hat{u}_0^{k-1}, s_0) \right], \qquad (25)$$

where $q \in (0, 1)$ is a given parameter. If the above inequality is satisfied, then the incumbent solution $\hat{u}_0^{k-1}$ is updated to the corresponding candidate solution, that is, $\hat{u}_0^k = u_0^k$. On the other hand, if the inequality is not satisfied, then the current incumbent solution is retained ($\hat{u}_0^k = \hat{u}_0^{k-1}$). This update rule is similar to the incumbent updates carried out in non-smooth optimization methods, including the regularized 2-SD [18] and MSD [34] algorithms.
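The incumbent test (25) compares the realized improvement under the new approximation against a $q$-fraction of the improvement predicted by the old one. A minimal sketch, with `f_k` and `f_km1` as hypothetical callables evaluating $f_0^k$ and $f_0^{k-1}$:

```python
def update_root_incumbent(f_k, f_km1, u_cand, u_inc, s0, q):
    """Incumbent update (25): accept the candidate u_cand if it achieves at
    least a q-fraction (q in (0,1)) of the predicted objective reduction."""
    lhs = f_k(u_cand, s0) - f_k(u_inc, s0)          # realized reduction
    rhs = f_km1(u_cand, s0) - f_km1(u_inc, s0)      # predicted reduction
    return u_cand if lhs <= q * rhs else u_inc
```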


5 Convergence Analysis

In this section we will present the convergence results for SDLP. We will begin by discussing the behavior of the sequence of states and decisions generated by the SDLP algorithm, and then proceed to show the convergence of the value function estimates and the solutions.

State and decision accumulation points

We next characterize how the states and the corresponding decisions evolve over the horizon of the problem. Under assumption (A5), we have a finite number of possible sample paths over the horizon. We will use $\mathcal{P}_t$ to denote the set of all sample paths until stage $t$. We focus on the evolution of states and decisions along these sample paths. However, instead of using the sample path $\omega_{[t]}$ to index the evolution, we use the state trajectory until stage $t$, i.e., $s_{[t]}$, which includes the trajectories of both the exogenous and endogenous states.

Theorem 4. Suppose assumptions (A1)-(A5) hold. Let $\{\hat{u}_0^k\} \subseteq \mathcal{U}_0$ denote any infinite sequence of root stage incumbent solutions. There exists a subsequence $\mathcal{K}_0$ of iterations such that $\{\hat{u}_0^k\}_{\mathcal{K}_0}$ has an accumulation point. In subsequent stages, for all possible paths $\omega_{[t]} \in \mathcal{P}_t$ there exists a subsequence of iterations indexed by $\mathcal{K}_t(s_{[t]})$ such that the sequence $\{\hat{u}_t^k(s_{[t]})\}_{k \in \mathcal{K}_t(s_{[t]})}$ has an accumulation point.

Proof. For the root node, the feasible set $\mathcal{U}_0$ is compact by (A1), hence there exists a subsequence of iterations indexed by $\mathcal{K}_0$ such that $\{\hat{u}_0^k\}_{k \in \mathcal{K}_0} \to u_0^*$. Following (A5), there exists an infinite subsequence $\mathcal{K}_1(s_{[1]}) \subseteq \mathcal{K}_0$ over which the algorithm selects sample path $\omega_{[1]} \in \mathcal{P}_1$. Since $\{\hat{u}_0^k\}_{k \in \mathcal{K}_0}$ converges and $x_0$ is fixed, the sequence of endogenous states $\{\hat{x}_1^k\}_{k \in \mathcal{K}_1(s_{[1]})}$ converges to $x_1^*(s_{[1]})$.

Consider the optimization problem on the right-hand side of (16) for $t = 1$ in its dual form:

$$\min \; \{ (\rho_1^k)^\top u_1 \mid D_1 u_1 \leq b_1 - C_1 x_1, \; u_1 \geq 0 \}.$$

Recall that the feasible set of the above problem is denoted $\mathcal{U}_1(s_1)$. Let $D(u_1, s_1) := \operatorname{argmin} \{ \| u_1 - u \|^2, \; u \in \mathcal{U}_1(s_1) \}$. A slight variant of Hoffman's lemma (see Lemma 9 in the appendix) leads us to conclude for any $s_1, s_1^* \in \operatorname{dom}\, \mathcal{U}_1$ and any $u_1 \in \mathcal{U}_1(s_1)$ that $D(u_1, s_1^*) \leq \gamma \| (b_1 - C_1 x_1) - (b_1 - C_1 x_1^*) \|$. Here, $\gamma > 0$ is the Lipschitz constant of the mapping $D(\cdot)$, which depends only on the recourse matrix $D_1$.

In other words, the feasible set $\mathcal{U}_1(\cdot)$ is Lipschitz continuous in the above sense. It follows that it is possible to choose an extreme point $u_1(s_1) \in \mathcal{U}_1(s_1)$ such that $u_1(s_1)$ is continuous on $\operatorname{dom}\, \mathcal{U}_1$. Moreover, the polyhedral set $\mathcal{U}_1$ has a finite number of extreme points. Therefore, the procedure outlined in §4.3.2 to select the incumbent, $\mathcal{M}_1 : s_1 \to u_1$, is a continuous piecewise linear mapping. For a given sample path $\omega_{[1]}$, since the input state sequence converges, the right-hand side of the above inequality can be bounded from above by a positive constant $\varepsilon(s_{[1]})$ which depends on the sample path through the transfer matrix $C_1$. This leads us to conclude that the infinite subsequence of solutions obtained from this mapping, $\{\hat{u}_1^k\}_{\mathcal{K}_1(s_{[1]})}$, has an accumulation point, and consequently establishes the existence of a subsequence of $\mathcal{K}(s_{[1]})$ over which the next input endogenous state converges at $t > 1$. For a sample path $\omega_{[t]} \in \mathcal{P}_t$, once again assumption (A5) guarantees that there exists an infinite subsequence $\mathcal{K}_t(s_{[t]}) \subseteq \mathcal{K}_{t-}(s_{[t-]})$ over which sample path $\omega_{[t]}$ is encountered. Here $\omega_{[t]} = (\omega_{[t-]}, \omega_t)$, i.e., sample path $\omega_{[t]}$ shares the same observations with $\omega_{[t-]}$ until stage $t-$. Over this subsequence, the convergence of the endogenous states $\{\hat{x}_t^k = \mathcal{D}_t(\hat{x}_{t-}^k, \omega_t, \hat{u}_{t-}^k)\}_{\mathcal{K}_t(s_{[t]})} \to x_t^*(s_{[t]})$ ensures the convergence of the incumbent input states $\{\hat{s}_t^k\}_{\mathcal{K}_t(s_{[t]})}$. Using the same arguments as before for stage $t$ and proceeding recursively through the rest of the stages, we validate the hypothesis.


Convergence of Value Function Estimates

Since our algorithm uses sequential sampling, path-wise forward and backward updates, estimated probabilities, and sampled minorants, we use a benchmark to verify the optimality of value functions and the solutions obtained from them. In order to build such a benchmark, let $\mathcal{P}^k_{(t)} \subseteq \Omega^k_{t+1} \times \ldots \times \Omega^k_T$ denote the set of all possible scenarios from stage $t+1$ to the end of the horizon which traverse through observations encountered by the algorithm in the first $k$ iterations. Note that $\mathcal{P}^k_{(t)}$ represents the set of possible paths in the future, and should not be confused with $\mathcal{P}^k_t$, which represents the set of traversed paths. Stagewise independence allows us to compute the probability estimate of a sample path $\omega^j_{(t)} \in \mathcal{P}^k_{(t)}$ as the product of the frequencies associated with the observations along that sample path, i.e., $p^k(\omega^j_{(t)}) = p^k(\omega^j_{t+1}) \times \ldots \times p^k(\omega^j_T)$. Let $x^j_{(t)}$ and $u^j_{(t)}$ denote the endogenous state and the decision vector, respectively, associated with sample path $\omega^j_{(t)}$. For a given input $s_t$, the following is an extensive formulation of the cost-to-go function:
$$\bar{H}^k_t(s_t) = c_t^\top x_t + \min\ d_t^\top u_t + \sum_{j \in \mathcal{P}^k_{(t)}} p^k(\omega^j_{(t)}) \big[(c_{(t)})^\top x^j_{(t)} + (d_{(t)})^\top u^j_{(t)}\big] \qquad (26)$$
$$\text{s.t.}\ u_t \in U_t(u_0, s_t),\quad \{u^j_{t'} \in U_{t'}(u_0, s^j_{t'})\}_{t' > t}\ \text{and non-anticipative},$$
$$\{x^j_{t'+1} = \mathcal{D}_{t'+1}(x^j_{t'}, \omega^j_{t'+1}, u^j_{t'})\}_{t' \ge t}.$$
In the above formulation, the dynamics and non-anticipativity are satisfied starting at stage $t$, and are relative to the input $s_t$. This sample mean function $\bar{H}^k_t$ represents the value associated with the input $s_t$ for the remainder of the horizon with respect to the current observations $\{\Omega^k_i\}_{i=t+1}^{T}$. In order to simplify notation, the dependence of the sample mean function on the set $\mathcal{P}^k_{(t)}$ is conveyed through the index $k$ in $\bar{H}^k_t(s_t)$, as opposed to the more complete $\bar{H}^k_t(s_t \mid \mathcal{P}^k_{(t)})$.
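The product-of-frequencies computation for $p^k(\omega^j_{(t)})$ is simple to realize in code. The following is a minimal sketch (Python; the per-stage observation history and its layout are hypothetical illustrations, not the paper's implementation): per-stage empirical frequencies are built from the observations recorded in the first $k$ iterations, and a path estimate is the product of its per-stage frequencies.

```python
from collections import Counter
from itertools import product

# Hypothetical observation history: history[t] holds the outcome observed at
# stage t in each of the first k iterations (stagewise independent sampling).
history = {
    1: ["lo", "hi", "lo", "lo", "hi"],
    2: ["a", "a", "b", "a", "b"],
}
k = 5

# Per-stage empirical frequencies: p^k(omega_t) = (observation count) / k.
freq = {t: {obs: cnt / k for obs, cnt in Counter(obs_list).items()}
        for t, obs_list in history.items()}

def path_probability(path):
    """p^k(omega) = prod_t p^k(omega_t), valid under stagewise independence."""
    p = 1.0
    for t, obs in enumerate(path, start=1):
        p *= freq[t][obs]
    return p

# Enumerate all future paths assembled from observed outcomes (the analogue
# of the set P^k_(0) here) and confirm that the estimates sum to one.
paths = list(product(freq[1], freq[2]))
print({p: round(path_probability(p), 3) for p in paths})
print("total:", sum(path_probability(p) for p in paths))  # -> 1.0
```

Because each per-stage frequency table sums to one, the path estimates over all such paths also sum to one, consistent with $\bar{H}^k_t$ being a proper sample mean.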

During the forward pass, decisions $u_t$ are simulated using the approximation $f^{k-1}_t$ in (9) along the observations dictated by sampling, and during the backward pass the approximations are updated using subgradients observed along the same sample path. Next we relate the objective function values encountered during the forward and backward passes. To do so, we define $u_t(s_t)$ to be the optimal solution obtained using (9) during the forward pass with input $s_t$. The forward pass objective function value $F^{k-1}_t$ associated with this decision is therefore given by:
$$F^{k-1}_t(s_t) := c_t^\top x_t + d_t^\top u_t(s_t) + \sum_{\omega_{t+1} \in \Omega^{k-1}_{t+1}} p^{k-1}(\omega_{t+1})\, h^{k-1}_{t+1}(s^k_{t+1}(\omega_{t+1})).$$
Here $s^k_{t+1}(\omega_{t+1}) = \mathcal{D}_{t+1}(x_t, \omega_{t+1}, u_t(s_t))$. In order to study the asymptotic behavior of our algorithm, we will investigate how the functions $\bar{H}^k_t$, $F^k_t$ and $h^k_t$ relate in value at limiting states. It is worthwhile to note that the sample mean approximation in (15), the extensive formulation in (26), and the forward pass objective value $F^{k-1}_t$ defined above are defined only for non-terminal stages, since $\bar{H}^k_T(s_T) = H^k_T(s_T) = F^k_T(s_T) = h_T(s_T)$ at the terminal stage for all $k$.

Lemma 5. Suppose Assumptions (A1)-(A5) hold.

1. The sequence of functions $\{F^k_t\}_k$ is uniformly equicontinuous and uniformly convergent for all $t$.

2. The sequence of functions $\{\bar{H}^k_t\}_k$ converges uniformly, with probability one, to the expectation function $h_t$ in (2) for all $t > 0$.

Proof. See parts (b) and (c) of Lemma 4.3 in [34].

This lemma shows that $\bar{H}^k_t$ converges uniformly to $h_t$ asymptotically (w.p.1), and is therefore a convenient benchmark for assessing optimality of the SDLP algorithm. We first show the convergence of the approximations generated during the course of the algorithm to the true value function in the following theorem.


Theorem 6. Suppose Assumptions (A1)-(A3) hold, and at any non-terminal stage $t$ there exists a subsequence of iterations $\mathcal{K}_t(s_{[t]})$ such that if $k_1, k_2 \in \mathcal{K}_t(s_{[t]})$ then $|k_1 - k_2| < \infty$, and $\{s^k_t\}_{k \in \mathcal{K}_t(s_{[t]})} \to s^*_t(s_{[t]})$ with probability one. Then:
$$\lim_{k \in \mathcal{K}_t} F^{k-1}_t(s^k_t) = \lim_{k \in \mathcal{K}_t} h^k_t(s^k_t) = \lim_{k \in \mathcal{K}_t} H^k_t(s^k_t) = \lim_{k \in \mathcal{K}_t} \bar{H}^k_t(s^k_t) \quad \text{(w.p.1)}. \qquad (27)$$

Proof. For the terminal stage ($t = T$), the continuity of the linear programming value function implies that $\lim_{k \in \mathcal{K}_T(s_{[T]})} h_T(s^k_T) = h_T(s^*_T(s_{[T]}))$. Since $F^k_T$, $H^k_T$ and $\bar{H}^k_T$ are all equivalent to $h_T$, the above relation trivially holds. Consequently we have $\lim_{k \in \mathcal{K}_T(s_{[T]})} \hat{\ell}^k_T(x^k_T) = h_T(s^*_T(s_{[T]}))$ and $\lim_{k \in \mathcal{K}_T(s_{[T]})} \partial \hat{\ell}^k_T(x^k_T) \in \partial h_T(s^*_T(s_{[T]}))$.

For a non-terminal stage, let $k - \tau$ and $k$ be two successive iterations of the subsequence $\mathcal{K}_t(s_{[t]})$. The forward pass objective function $F^{k-1}_t(s^k_t)$ and the backward pass sample mean function $H^{k-1}_t(s^k_t)$ differ only in the proximal term, and hence $H^{k-1}_t(s^k_t) \le F^{k-1}_t(s^k_t)$ for all $s_t \in \mathcal{S}_t$. In the following we will use $m = \omega^k_{t+1}$, and $n$ as an index for the set $\Omega^k_{t+1}$. The forward pass objective function value at the current input state can be written as:
$$F^{k-1}_t(s^k_t) = c_t^\top x^k_t + d_t^\top u_t(s^k_t) + \sum_{n \in \Omega^{k-1}_{t+1}} p^{k-1}(n)\, h^{k-1}_{t+1}(s^{kn}_{t+1}).$$
The optimality of $u_t(s^k_t)$ ensures that the objective function value associated with $u_t(s^k_t)$ is lower than that of any other feasible solution. If we specifically consider the solution of the dual of (16), denoted $\bar{u}_t(s^k_t)$, we have
$$F^{k-1}_t(s^k_t) \le c_t^\top x^k_t + d_t^\top \bar{u}_t(s^k_t) + \sum_{n \in \Omega^{k-1}_{t+1}} p^{k-1}(n)\, h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}).$$
By adding and subtracting the current approximation of the future cost, i.e., $\sum_{n \in \Omega^k_{t+1}} p^k(n)\, h^k_{t+1}(\bar{s}^{kn}_{t+1}) = \sum_{n \in \Omega^{k-1}_{t+1} \setminus \{m\}} p^k(n)\, h^k_{t+1}(\bar{s}^{kn}_{t+1}) + p^k(m)\, h^k_{t+1}(\bar{s}^{km}_{t+1})$, we obtain
$$F^{k-1}_t(s^k_t) \le c_t^\top x^k_t + d_t^\top \bar{u}_t(s^k_t) + \sum_{n \in \Omega^k_{t+1}} p^k(n)\, h^k_{t+1}(\bar{s}^{kn}_{t+1}) + \sum_{n \in \Omega^{k-1}_{t+1}} p^{k-1}(n)\, h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}) - \left[\sum_{n \in \Omega^{k-1}_{t+1}} \left(\frac{k-1}{k}\right) p^{k-1}(n)\, h^k_{t+1}(\bar{s}^{kn}_{t+1}) + \frac{1}{k}\, h^k_{t+1}(\bar{s}^{km}_{t+1})\right].$$
Here the bracketed term equals the added sum, because under sequential sampling the frequency estimates satisfy $p^k(n) = \frac{k-1}{k}\, p^{k-1}(n) + \frac{1}{k}\,\mathbb{1}\{n = m\}$. From the definition of the backward pass sample mean approximation in (16) and the fact that $h_t(s_t) \ge 0$, we have
$$F^{k-1}_t(s^k_t) \le H^k_t(s^k_t) + \sum_{n \in \Omega^{k-1}_{t+1}} p^{k-1}(n) \big[h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}) - h^k_{t+1}(s^{kn}_{t+1})\big].$$
Let us focus on the terms within the summation on the right-hand side of the above inequality, i.e., $\Delta^k_n = h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}) - h^k_{t+1}(s^{kn}_{t+1})$. Then
$$\Delta^k_n = h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}) - h^{k-1}_{t+1}(s^{kn}_{t+1}) + h^{k-1}_{t+1}(s^{kn}_{t+1}) - h^k_{t+1}(s^{kn}_{t+1}) \le |h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}) - h^{k-1}_{t+1}(s^{kn}_{t+1})| + |h^{k-1}_{t+1}(s^{kn}_{t+1}) - h^k_{t+1}(s^{kn}_{t+1})|.$$
Since the sequence $\{h^k_t\}$ converges uniformly (Theorem 3), for every $\varepsilon > 0$ there exists $K^1_n(\varepsilon) < \infty$ such that for $k > K^1_n(\varepsilon)$, $|h^k_{t+1} - h^{k-1}_{t+1}| < \varepsilon/2$. Further, since $\{s^k_t\}_{k \in \mathcal{K}_t(s_{[t]})} \to s^*_t(s_{[t]})$, there exists $K^2_n(\delta) \in \mathcal{K}(s_{[t]})$ such that $\|\bar{s}^k_t - s^k_t\| < \delta$ for all $k > K^2_n(\delta)$. Since the minorants are equicontinuous, there exists $K^3_n(\varepsilon) > K^2_n(\delta)$ such that $|h^{k-1}_{t+1}(\bar{s}^{kn}_{t+1}) - h^{k-1}_{t+1}(s^{kn}_{t+1})| < \varepsilon/2$. Therefore, for iterations $k > \max_{n \in \Omega^{k-1}_{t+1}} \{K^1_n(\varepsilon), K^3_n(\varepsilon)\}$, we have
$$F^{k-1}_t(s^k_t) \le H^k_t(s^k_t) + \sum_{n \in \Omega^{k-1}_{t+1}} p^{k-1}(n)\,\varepsilon = H^k_t(s^k_t) + \varepsilon.$$
We can conclude that $\lim_{k \in \mathcal{K}(s_{[t]})} F^{k-1}_t(s^k_t) - H^k_t(s^k_t) \le \varepsilon$. To show the inequality in the other direction, we use the fact that $H^k_t(s^k_t) \le F^k_t(s^k_t)$ and the uniform convergence of the sequence $\{F^k_t\}$. This gives us $\lim_{k \in \mathcal{K}(s_{[t]})} H^k_t(s^k_t) - F^{k-1}_t(s^k_t) \le \varepsilon$. Since the inequalities hold in both directions for an arbitrary $\varepsilon > 0$, we have
$$\lim_{k \in \mathcal{K}_t(s_{[t]})} |F^{k-1}_t(s^k_t) - H^k_t(s^k_t)| = 0 \quad \text{(w.p.1)}. \qquad (28)$$
Since we choose the dual $\pi^k_t$ of the sample mean function in (16) with the least two-norm, we are guaranteed that $\pi^k_t \to \pi^*_t$, which in turn ensures that the coefficients satisfy $(\alpha^k_t, \beta^k_t) \to (\alpha^*_t, \beta^*_t)$. Using strong duality of linear programs, we have the subgradient property: $h^k_t(s^k_t) = \ell^k_t(x^k_t)$. The complementary solution to $\pi^k_t$ is primal feasible and non-anticipative, and hence the optimality of $\bar{H}^k_t(s^k_t)$ yields
$$\lim_{k \in \mathcal{K}_t} \bar{H}^k_t(s^k_t) \le \lim_{k \in \mathcal{K}_t} h^k_t(s^k_t) \le \lim_{k \in \mathcal{K}_t} H^k_t(s^k_t) \quad \text{(w.p.1)},$$
where the last inequality is due to Theorem 2. Moreover, the forward pass objective function value satisfies $\lim_{k \in \mathcal{K}_t} F^{k-1}_t(s^k_t) \le \lim_{k \in \mathcal{K}_t} \bar{H}^k_t(s^k_t)$ (w.p.1). Therefore we have
$$\lim_{k \in \mathcal{K}_t} F^{k-1}_t(s^k_t) \le \lim_{k \in \mathcal{K}_t} \bar{H}^k_t(s^k_t) \le \lim_{k \in \mathcal{K}_t} h^k_t(s^k_t) \le \lim_{k \in \mathcal{K}_t} H^k_t(s^k_t) \quad \text{(w.p.1)}. \qquad (29)$$
Using (28) in the above relation concludes our proof. □
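The add-and-subtract step above hinges on the frequency update $p^k(n) = \frac{k-1}{k}\, p^{k-1}(n) + \frac{1}{k}\,\mathbb{1}\{n = m\}$, where $m$ is the observation drawn at iteration $k$. A standalone numerical check of this identity (Python; the observation stream is a hypothetical illustration):

```python
from collections import Counter

# Verify: with one new observation m at iteration k, the empirical frequency
# satisfies p^k(n) = ((k-1)/k) * p^{k-1}(n) + (1/k) * 1{n == m}.
obs = ["a", "b", "a", "c", "a", "b"]  # hypothetical observation stream
for k in range(2, len(obs) + 1):
    prev, cur, m = Counter(obs[:k - 1]), Counter(obs[:k]), obs[k - 1]
    for n in cur:
        lhs = cur[n] / k
        rhs = (k - 1) / k * (prev[n] / (k - 1)) + (1 / k) * (n == m)
        assert abs(lhs - rhs) < 1e-12
print("frequency update identity verified")
```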

Convergence of Solutions

At any stage the number of observations is finite (A5), and so is the number of unique observable paths. In order to satisfy the non-anticipativity requirement, we need an incumbent solution for every path. Let $\mathcal{P}_t$ denote the set of all possible paths. We will next show the convergence of these incumbent solutions, corresponding to each $\omega_{[t]} \in \mathcal{P}_t$, at all non-terminal stages.

Lemma 7. Suppose assumptions (A1)-(A5) hold, and $\sigma \ge 1$. Then there exists $u^*_0 \in U_0(s_0)$ such that the sequence of root-node incumbent decisions generated by the algorithm satisfies $\hat{u}^k_0 \to u^*_0$. Moreover, in every subsequent stage there exists $u^*_t(s_{[t]}) \in U_t(s^*_t(s_{[t]}))$ which satisfies the dynamics in (1), and the sequence of solutions generated by the algorithm satisfies $\{\hat{u}^k_t(s_{[t]})\}_{\mathcal{K}_t(s_{[t]})} \to u^*_t(s_{[t]})$ for all paths $\omega_{[t]} \in \mathcal{P}_t$.

Proof. See Appendix A.

Theorem 8. Suppose Assumptions (A1)-(A5) hold and $\sigma > 1$. Then the SDLP algorithm produces a sequence of incumbent solutions at the root stage with $\hat{u}^k_0 \to u^*_0$, and $u^*_0$ is optimal w.p.1.

Proof. For all sample paths $\omega_{[t]} \in \mathcal{P}_t$, Theorem 4 shows that $x^k_t \to x^*_t(s_{[t]})$, and (29) in the proof of Theorem 6 implies that
$$\lim_{k \to \infty} \hat{\ell}^k_t(x^k_t) = h_t(s^*_t(s_{[t]})), \quad \text{and} \quad \lim_{k \to \infty} \partial \hat{\ell}^k_t(x^k_t) \subset \partial h_t(s^*_t(s_{[t]})) \quad \text{w.p.1}.$$
Using the same arguments while proceeding backwards to the root stage, we conclude that the limit point $u^*_0$ associated with the root-stage incumbents yields an optimal solution, with probability one. □


6 Conclusions

The SDLP algorithm extends the regularized 2-SD algorithm [16] to the MSLP setting where the underlying stochastic process exhibits stagewise independence. The algorithm addresses the state-variable formulation of MSLP problems by employing sequential sampling. In this sense, it is a counterpart to the MSD algorithm of [34], which was designed for the case where the underlying uncertainty has a scenario tree structure. The algorithm incorporates several additional advantages granted by the stagewise independence property. We conclude here by noting the salient features of our SDLP algorithm:

1. The algorithm uses a single sample path both for simulating decisions during the forward recursion and for updating approximations during the backward recursion. In any iteration, compared to SDDP, which requires solving subproblems corresponding to all outcomes at all stages and for all sample paths simulated during the forward pass, SDLP uses two subproblem solves at each stage (see the structural sketch following this list). This significantly reduces the computational burden of solving MSLP problems.

2. The method uses quadratic regularization terms at all non-terminal stages, which alleviates the need to retain all the minorants generated. This allows us to retain finite-sized subproblems at all stages, further improving the computational advantage.

3. The affine mapping described in §4.3.2 is the first to provide an optimal policy for MSLPs. This mapping overcomes the need to store incumbent solutions that, either explicitly or implicitly, depend on the entire history of state evolution, and it can be used with other regularized MSLP algorithms. Our convergence results show that the optimality of the accumulation points of a subsequence of incumbent solutions is preserved even when such a mapping is employed.

4. SDLP incorporates sampling within the optimization step and thereby optimizes an SAA with increasing sample size. This feature enables SDLP to solve MSLP problems to greater accuracy by incorporating additional observations at any stage, without having to re-discover the structural information of an instance to build/update the approximations. This adaptive nature allows the algorithm to be terminated upon attaining a desired level of accuracy, which opens the avenue to design statistical optimality rules for the multistage setting akin to those developed for 2-SLP [18, 35].
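To make the first item concrete, the following structural sketch (Python; every stage solver is a hypothetical stub rather than the authors' implementation) shows the shape of one SDLP iteration: a single freshly sampled path drives the forward simulation, and the backward pass updates minorants only along that same path, so each stage incurs two subproblem solves per iteration.

```python
import random

def solve_regularized_stage(t, state, incumbent, minorants, sigma=1.0):
    """Stub for the forward-pass regularized stage LP; a real solver would
    minimize the piecewise-linear approximation plus (sigma/2)*||u - incumbent||^2."""
    return incumbent

def solve_stage_dual(t, state):
    """Stub for the backward-pass dual solve; returns minorant coefficients."""
    return (0.0, 0.0)

def dynamics(t, state, decision, outcome):
    """Stub for the state dynamics s_{t+1} = D_{t+1}(x_t, omega_{t+1}, u_t)."""
    return state + outcome

def sdlp_iteration(T, outcomes_per_stage, incumbents, minorants, s0=0.0):
    # Forward pass: simulate decisions along ONE freshly sampled path.
    path, states, decisions = [], [s0], []
    for t in range(T):
        u_t = solve_regularized_stage(t, states[-1], incumbents[t], minorants[t])
        omega = random.choice(outcomes_per_stage[t])  # sequential sampling
        path.append(omega)
        decisions.append(u_t)
        states.append(dynamics(t, states[-1], u_t, omega))
    # Backward pass: update minorants along the SAME path only.
    for t in reversed(range(T)):
        minorants[t].append(solve_stage_dual(t, states[t]))
    return path, decisions

minorants = {t: [] for t in range(3)}
incumbents = {t: 0.0 for t in range(3)}
print(sdlp_iteration(3, {t: [-1.0, 1.0] for t in range(3)}, incumbents, minorants))
```

The contrast with SDDP is visible in the loop structure: there is no enumeration over all outcomes at every stage during the forward pass, and no sweep over a bundle of sampled paths during the backward pass.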

The computational advantages of SDLP were demonstrated in our companion paper [13]. In that paper, we applied the SDLP algorithm to an MSLP model for distributed storage control in the presence of renewable generation uncertainty. The computational results compare our algorithm with SDDP applied to an SAA of the original model. The sample paths used to set up the SAA, and those used within the SDLP algorithm, were simulated using an autoregressive moving-average time series model. The results indicate that SDLP provides solutions which are not only reliable but also statistically indistinguishable from those of SDDP, while significantly improving computation times. These results provide the first evidence of the computational benefits of a sequential sampling approach in a multistage setting.

A Proofs of Convergence Analysis Results

This appendix contains proofs of the convergence analysis results discussed in §5.

Proof. [Theorem 3] The first part follows from Theorem 2 and the definition of the minorants in (18). To show the second part, we begin by noting that the coefficients of the minorants are generated using the dual extreme points of a bounded set, as discussed at the beginning of this section. Therefore, the sequence of coefficients generated will converge. Next, we restrict our attention to a compact subset $\mathcal{S}(s_{[t]}) \subset \mathcal{S}_t$ which contains the converging subsequence of decisions for a given sample path $\omega_{[t]}$. Since the Cauchy condition holds over this compact subset, the sequence $\{h^k_t\}_k$ converges, for all $s_t$, to a limit which we may call $h_t(s_t)$. Thus the sequence converges on $\mathcal{S}(s_{[t]})$. Let $\varepsilon > 0$ be given, and choose $K$ such that $|h^{k_1}_t - h^{k_2}_t| \le \varepsilon$ holds for $k_1, k_2 > K$. Fix $k_1$, and let $k_2 \to \infty$ in the above relation. Since $h^{k_2}_t \to h_t(s_t)$ as $k_2 \to \infty$, this gives $|h^{k_1}_t(s_t) - h_t(s_t)| \le \varepsilon$ for every $k_1 \ge K$ and every $s_t \in \mathcal{S}(s_{[t]})$. Repeating the same argument on the remaining finitely many (due to (A5)) sample paths shows that $\{h^k_t\}_k$ is uniformly convergent, which also implies their equicontinuity (see Theorem 7.24 in [32]). □

Notes on the proof of Theorem 4

We present a variant of Hoffman's lemma that is integral to the proof of Theorem 4.

Lemma 9. Let $U(x, \rho)$ be the set of optimal primal solutions of problem (22). Then there exists a positive constant $\chi$, depending only on $C$ and $D$, such that for any $(x, \rho), (x', \rho') \in \mathrm{dom}\, U$ and any $u \in U(x, \rho)$,
$$\mathrm{dist}(u, U(x', \rho')) \le \chi \|x - x'\|. \qquad (30)$$

Proof. The linear program can be written in an equivalent form:
$$\min_{\eta \in \mathbb{R}} \ \eta \quad \text{subject to} \quad Du \le b - Cx, \quad \rho^\top u - \eta \le 0. \qquad (31)$$
Denote by $E(x, \rho) := \{(u, \eta) \mid Du \le b - Cx,\ \rho^\top u - \eta \le 0\}$ the set of feasible points of (31). Let $(x, \rho), (x', \rho') \in \mathrm{dom}\, U$ and consider a point $(u, \eta) \in E(x, \rho)$. Note that for any $a \in \mathbb{R}^n$ we have $\|a\| = \sup_{\|z\|_* \le 1} z^\top a$, where $\|\cdot\|_*$ is the dual of the norm $\|\cdot\|$. Using this we have
$$\mathrm{dist}((u, \eta), E(x', \rho')) = \inf_{(u', \eta') \in E(x', \rho')} \|(u, \eta) - (u', \eta')\| \qquad (32a)$$
$$= \inf_{\substack{Du' \le b - Cx' \\ (\rho')^\top u' - \eta' \le 0}} \ \sup_{\|(z_0, z_1)\|_* \le 1} z_0^\top (u - u') + z_1 (\eta - \eta') \qquad (32b)$$
$$= \sup_{\|(z_0, z_1)\|_* \le 1} \ \inf_{\substack{Du' \le b - Cx' \\ (\rho')^\top u' - \eta' \le 0}} z_0^\top (u - u') + z_1 (\eta - \eta'). \qquad (32c)$$
The interchange between the minimum and maximum operators follows from Theorem 7.11 in [37]. By using the change of variables $\Delta u = (u - u')$ and $\Delta \eta = (\eta - \eta')$, we can rewrite the inner term as:
$$\inf_{\substack{D \Delta u \ge Du - (b - Cx') \\ (\rho')^\top \Delta u - \Delta \eta \ge (\rho')^\top u - \eta}} z_0^\top \Delta u + z_1 \Delta \eta. \qquad (33)$$
The dual of the above problem is given by:
$$\sup_{\substack{\lambda \ge 0,\ \mu \ge 0 \\ D^\top \lambda + \mu \rho' = z_0 \\ \mu = -z_1}} \lambda^\top [Du - (b - Cx')] + \mu [(\rho')^\top u - \eta]. \qquad (34)$$
It follows from $\|(z_0, z_1)\|_* \le 1$ that
$$\mathrm{dist}((u, \eta), E(x', \rho')) = \sup_{\substack{\lambda \ge 0,\ 0 \le \mu \le 1 \\ \|D^\top \lambda + \mu \rho'\|_* \le 1}} \lambda^\top [Du - (b - Cx')] + \mu [(\rho')^\top u - \eta]. \qquad (35)$$

Using arguments similar to those in Lemma 1(ii), we can establish that the cost coefficients of (22) generated within SD-based methods for non-terminal stages belong to compact sets for all $k \ge 1$. Therefore, we can assume without loss of generality that $\|\rho'\|_* \le M$. We obtain a relaxation of the above dual problem by replacing the constraint $\|D^\top \lambda + \mu \rho'\|_* \le 1$ with the constraint $\|D^\top \lambda\|_* \le 1 + M$. Let $(\bar{\lambda}, \bar{\mu})$ be an optimal solution of the relaxed dual problem. We can assume without loss of generality that $\|\cdot\|$ is the $\ell_1$ norm, and hence its dual is the $\ell_\infty$ norm. For such a choice of a polyhedral norm, the feasible set of the relaxed dual problem is polyhedral. Therefore, $\bar{\lambda}$ is an extreme point of the set $\{\lambda \mid \|D^\top \lambda\|_* \le 1 + M\}$. This implies that $\|\bar{\lambda}\|_*$ can be bounded by a constant $\chi_1$ which depends only on $D$.

Since $(u, \eta) \in E(x, \rho)$, and hence $Du - (b - Cx) \le 0$ and $\rho^\top u - \eta \le 0$, we have
$$\bar{\lambda}^\top [Du - (b - Cx')] = \bar{\lambda}^\top [Du - (b - Cx)] + \bar{\lambda}^\top C (x - x') \le \bar{\lambda}^\top C (x - x') \le \|\bar{\lambda}\|_* \|C\| \|x - x'\|.$$
Further, notice that $\bar{\mu} [\rho^\top u - \eta] \le 0$ and $\|C\| \le \chi_2$. This leads us to conclude that
$$\mathrm{dist}((u, \eta), E(x', \rho')) \le \|\bar{\lambda}\|_* \|C\| \|x - x'\| \le \chi_1 \chi_2 \|x - x'\|. \qquad (36)$$
This implies that our hypothesis in (30) is true. □
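The Lipschitz estimate (30) can also be probed numerically. The sketch below (Python with scipy assumed available; the matrices $D$, $C$ and vector $b$ form a small hypothetical instance) measures the $\ell_1$ distance from a fixed point to the perturbed feasible set $\{u \ge 0 \mid Du \le b - Cx'\}$, the feasible-set analogue used in the proof of Theorem 4, and reports the ratio $\mathrm{dist}/\|x - x'\|_1$, which remains bounded as the lemma predicts.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical stage data (not from the paper): U(x) = {u >= 0 : D u <= b - C x}.
D = np.array([[1.0, 2.0], [3.0, 1.0]])
C = np.array([[1.0], [0.5]])
b = np.array([10.0, 12.0])

def l1_dist_to_set(u, x):
    """l1-distance from u to U(x): min sum(s) s.t. -s <= u' - u <= s, u' in U(x)."""
    n = u.size
    c = np.concatenate([np.zeros(n), np.ones(n)])  # variables: [u', s]
    A_ub = np.block([
        [D, np.zeros((D.shape[0], n))],   # D u' <= b - C x
        [np.eye(n), -np.eye(n)],          # u' - s <= u
        [-np.eye(n), -np.eye(n)],         # -u' - s <= -u
    ])
    b_ub = np.concatenate([b - C @ x, u, -u])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.fun if res.success else np.inf

x = np.array([1.0])
u = np.array([2.0, 2.0])  # a point in U(x): D @ u = [6, 8] <= b - C @ x = [9, 11.5]
for dx in [0.5, 1.0, 2.0, 4.0, 6.0]:
    d = l1_dist_to_set(u, x + dx)
    print(f"||x - x'||_1 = {dx:3.1f}  dist = {d:.4f}  ratio = {d / dx:.4f}")
```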

Proof. [Lemma 7] The proof for the root stage follows that of the regularized master program in 2-SD (Theorem 4.4, [17]) and the root node of the MSD algorithm [34]. Here we present the main parts of the proof and refer the reader to these earlier works for a detailed exposition. If the incumbent solution $\hat{u}^k_0$ changes infinitely many times, then the optimality condition for the regularized approximation (equation 2.6 in [17]) and our choice of $\sigma \ge 1$ imply that for any candidate solution $u^k_0$ the following holds:
$$f^{k-1}_0(s_0, u^k_0) - f^{k-1}_0(s_0, \hat{u}^{k-1}_0) \le -\|u^k_0 - \hat{u}^{k-1}_0\|^2 \le 0. \qquad (37)$$
Note that the above condition holds at the iterations when the incumbent is updated by assigning the candidate solution as the new incumbent solution, i.e., $\hat{u}^k_0 = u^k_0$. Let $k_1, k_2, \ldots, k_m \in \mathcal{K}_0$ denote the set of $m$ successive iterations at which the incumbent solution is updated, starting with an incumbent $\hat{u}^{k_0}_0$. Note that, for any $k_\ell \in \mathcal{K}_0$, $\hat{u}^{k_\ell - 1}_0 = \hat{u}^{k_{\ell-1}}_0$. Using (37) over these $m$ updates, we have
$$\Delta_m := \frac{1}{m} \sum_{\ell=1}^{m} \big[f^{k_\ell - 1}_0(s_0, u^{k_\ell}_0) - f^{k_\ell - 1}_0(s_0, \hat{u}^{k_{\ell-1}}_0)\big]$$
$$= \frac{1}{m} \big[f^{k_m - 1}_0(s_0, u^{k_m}_0) - f^{k_1 - 1}_0(s_0, \hat{u}^{k_0}_0)\big] + \frac{1}{m} \sum_{\ell=1}^{m-1} \big[f^{k_\ell - 1}_0(s_0, u^{k_\ell}_0) - f^{k_{\ell+1} - 1}_0(s_0, u^{k_\ell}_0)\big]$$
$$\le -\frac{1}{m} \sum_{\ell=1}^{m} \|u^{k_\ell}_0 - \hat{u}^{k_{\ell-1}}_0\|^2.$$
The boundedness of the functions $f^k_0$ implies that the first term in the second equality above approaches zero as $m \to \infty$, and their uniform convergence (Lemma 5) implies that the summation converges to zero. Hence $\Delta_m \to 0$, and $\lim_{m \to \infty} \frac{1}{m} \sum_{\ell=1}^{m} \|u^{k_\ell}_0 - \hat{u}^{k_{\ell-1}}_0\|^2 = 0$. Therefore the sequence of root-node incumbent solutions converges to $u^*_0 \in U_0$.

At non-root stages, the incumbent selection is based on the bases of (16). Since there is a finite collection $\mathcal{B}_t$ of index sets, there exists an iteration count $K_t$ large enough such that $\mathcal{B}^{k'}_t = \mathcal{B}_t$ for all $k' \ge K_t$. Let us consider $k > \max_t K_t$, when all the index sets for all non-root, non-terminal stages have been discovered. In these iterations, the procedure in §4.3.2 results in an incumbent solution such that:
$$\hat{u}^k_t(s^k_t) = \mathcal{M}^k_t(s^k_t) \in \operatorname*{argmin}_{u_t \in U_t(s^k_t)} \Big\{ d_t^\top u_t + \sum_{\omega_{t+1} \in \Omega^{k-1}_{t+1}} p^{k-1}(\omega_{t+1})\, h^{k-1}_{t+1}(s_{t+1}(\omega_{t+1})) \Big\}.$$
Consequently, the value associated with $\hat{u}^k_t(s^k_t)$ is $H^{k-1}_t(s^k_t)$. Recall from the proof of Theorem 6 that this value satisfies $H^{k-1}_t(s^k_t) \le F^{k-1}_t(s^k_t)$, which can be restated using (19) as:
$$f^{k-1}_t(s^k_t, \hat{u}^k_t(s^k_t)) - f^{k-1}_t(s^k_t, u_t(s^k_t)) \le 0,$$
where $u_t(s^k_t)$ is the solution obtained by optimizing the regularized problem used during the forward pass. The quadratic programming optimality conditions of this regularized problem allow us to write the analogue of (37) as:
$$f^{k-1}_t(s^k_t, u_t(s^k_t)) - f^{k-1}_t(s^k_t, \hat{u}^k_t(s^k_t)) \le -\|u_t(s^k_t) - \hat{u}^k_t(s^k_t)\|^2 \le 0.$$
The above two inequalities together imply that $\|u_t(s^k_t) - \hat{u}^k_t(s^k_t)\|^2 = 0$. For a sample path $\omega_{[t]} \in \mathcal{P}_t$, if we restrict our attention to those iterations $K < k \in \mathcal{K}_t(s_{[t]})$, then the result of Theorem 4 shows the existence of an accumulation point $u^*_t(s_{[t]})$ of the sequence $\{\hat{u}^k_t(s^k_t)\}_{k \in \mathcal{K}_t(s_{[t]})}$. Extending the argument to all sample paths in $\mathcal{P}_t$ completes the proof. □

References

[1] J. F. Benders. Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik, 4(1):238–252, 1962.

[2] J. R. Birge. Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research, 33(5):989–1007, 1985.

[3] J. R. Birge and F. Louveaux. Introduction to Stochastic Programming. Springer Series in Operations Research and Financial Engineering. Springer, 2011.

[4] D. R. Cariño, D. H. Myers, and W. T. Ziemba. Concepts, technical issues, and uses of the Russell-Yasuda Kasai financial planning model. Operations Research, 46(4):450–462, 1998.

[5] M. S. Casey and S. Sen. The scenario generation algorithm for multistage stochastic linear programming. Mathematics of Operations Research, 30(3):615–631, 2005.

[6] Z. L. Chen and W. B. Powell. Convergent cutting-plane and partial-sampling algorithm for multistage stochastic linear programs with recourse. Journal of Optimization Theory and Applications, 102(3):497–524, 1999.

[7] C. Donohue and J. R. Birge. The abridged nested decomposition method for multistage stochastic linear programs with relatively complete recourse. Algorithmic Operations Research, 1(1), 2006.

[8] J. Dupačová, G. Consigli, and S. W. Wallace. Scenarios for multistage stochastic programs. Annals of Operations Research, 100(1-4):25–53, 2000.

[9] J. Dupačová, N. Gröwe-Kuska, and W. Römisch. Scenario reduction in stochastic programming. Mathematical Programming, 95(3):493–511, 2003.

[10] M. Dyer and L. Stougie. Computational complexity of stochastic programming problems. Mathematical Programming, 106(3):423–432, 2006.

[11] Y. M. Ermol'ev. Stochastic quasigradient methods and their application to system optimization. Stochastics, 9, 1983.

[12] H. Gangammanavar, Y. Liu, and S. Sen. Stochastic decomposition for two-stage stochastic linear programs with random cost coefficients. 2019. To appear in INFORMS Journal on Computing.


[13] H. Gangammanavar and S. Sen. Two-scale stochastic optimization for controlling distributed storage devices. IEEE Transactions on Smart Grid, 9(4):2691–2702, July 2018.

[14] H. Gangammanavar, S. Sen, and V. M. Zavala. Stochastic optimization of sub-hourly economic dispatch with wind energy. IEEE Transactions on Power Systems, 31(2):949–959, March 2016.

[15] J. L. Higle and S. Sen. Stochastic decomposition: An algorithm for two-stage linear programs with recourse. Mathematics of Operations Research, 16(3):650–669, 1991.

[16] J. L. Higle and S. Sen. Finite master programs in regularized stochastic decomposition. Mathematical Programming, 67(1-3):143–168, 1994.

[17] J. L. Higle and S. Sen. Stochastic Decomposition: A Statistical Method for Large Scale Stochastic Linear Programming. Kluwer Academic Publishers, Boston, MA, 1996.

[18] J. L. Higle and S. Sen. Statistical approximations for stochastic linear programming problems. Annals of Operations Research, 85(0):173–193, 1999.

[19] G. Infanger and D. P. Morton. Cut sharing for multistage stochastic linear programs with interstage dependency. Mathematical Programming, 75(2):241–256, 1996.

[20] J. E. Kelley, Jr. The cutting-plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics, 8(4):703–712, 1960.

[21] M. I. Kusy and W. T. Ziemba. A bank asset and liability management model. Operations Research, 34(3):356–376, 1986.

[22] K. Linowsky and A. B. Philpott. On the convergence of sampling-based decomposition algorithms for multistage stochastic programs. Journal of Optimization Theory and Applications, 125(2):349–366, 2005.

[23] D. P. Morton. An enhanced decomposition algorithm for multistage stochastic hydroelectric scheduling. Annals of Operations Research, 64(1):211–235, 1996.

[24] J. M. Mulvey and A. Ruszczyński. A new scenario decomposition method for large-scale stochastic optimization. Operations Research, 43(3):477–490, 1995.

[25] M. V. F. Pereira and L. M. V. G. Pinto. Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1-3):359–375, 1991.

[26] R. J. Peters, K. Boskma, and H. E. Kupper. Stochastic programming in production planning: a case with none-simple recourse. Statistica Neerlandica, 31(3):113–126, 1977.

[27] A. B. Philpott and V. L. de Matos. Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. European Journal of Operational Research, 218(2):470–483, 2012.

[28] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008.

[29] W. B. Powell, A. George, H. Simão, W. Scott, A. Lamont, and J. Stewart. SMART: A stochastic multiscale model for the analysis of energy resources, technology, and policy. INFORMS Journal on Computing, 24(4):665–682, 2012.

[30] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22(3):400–407, 1951.


[31] R. T. Rockafellar and R. J. B. Wets. Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research, 16(1):119–147, February 1991.

[32] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, New York, third edition, 1976. International Series in Pure and Applied Mathematics.

[33] A. Ruszczyński. A regularized decomposition method for minimizing a sum of polyhedral functions. Mathematical Programming, 35(3):309–333, 1986.

[34] S. Sen and Z. Zhou. Multistage stochastic decomposition: A bridge between stochastic programming and approximate dynamic programming. SIAM Journal on Optimization, 24(1):127–153, 2014.

[35] S. Sen and Y. Liu. Mitigating uncertainty via compromise decisions in two-stage stochastic linear programming: Variance reduction. Operations Research, 64(6):1422–1437, 2016.

[36] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63–72, 2011.

[37] A. Shapiro, D. Dentcheva, and A. Ruszczyński. Lectures on Stochastic Programming: Modeling and Theory, Second Edition. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014.

[38] H. Topaloglu. Using Lagrangian relaxation to compute capacity-dependent bid prices in network revenue management. Operations Research, 57(3):637–649, 2009.

[39] R. M. Van Slyke and R. J. B. Wets. L-shaped linear programs with applications to optimal control and stochastic programming. SIAM Journal on Applied Mathematics, 17(4):638–663, 1969.
