
Asymptotics and Fast Simulation for Tail Probabilities of Maximum of Sums of Few Random Variables

S. JUNEJA

Tata Institute of Fundamental Research, Mumbai

R. L. KARANDIKAR

Indian Statistical Institute, Delhi

and

P. SHAHABUDDIN

Columbia University, New York

We derive tail asymptotics for the probability that the maximum of sums of a few random variables exceeds an increasing threshold, when the random variables may be light as well as heavy tailed. These probabilities arise in many applications, including PERT networks, where our interest may be in measuring the probability of large project delays. We also develop provably asymptotically optimal importance sampling techniques to efficiently estimate these probabilities. In the light-tailed settings we show that an appropriate mixture of exponentially twisted distributions efficiently estimates these probabilities. As is well known, exponential twisting based methods are not applicable in the heavy-tailed settings. To remedy this, we develop techniques that rely on “asymptotic hazard rate twisting” and prove their effectiveness in both light and heavy-tailed settings. We show that in many cases the latter may have implementation advantages over exponential twisting based methods in the light-tailed settings. However, our experiments suggest that when easily implementable, the exponential twisting based methods significantly outperform asymptotic hazard rate twisting based methods.

Categories and Subject Descriptors: G.3 [Probability and Statistics]: Probabilistic Algorithms (including Monte Carlo); I.6.1 [Simulation and Modeling]: Simulation Theory

General Terms: Algorithms, Performance, Theory

Additional Key Words and Phrases: PERT networks, importance sampling, rare event simulation, tail asymptotics

Authors’ addresses: S. Juneja, School of Technology and Computer Science, Tata Institute of Fundamental Research, Colaba, Mumbai, India - 400005; email: [email protected]; R. L. Karandikar, Indian Statistical Institute, 7, S J S Sansanwal Marg, New Delhi, India - 110016; email: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].

© 2007 ACM 1049-3301/2007/04-ART7 $5.00 DOI 10.1145/1225275.1225278 http://doi.acm.org/10.1145/1225275.1225278

ACM Transactions on Modeling and Computer Simulation, Vol. 17, No. 2, Article 7, Publication date: April 2007.


ACM Reference Format: Juneja, S., Karandikar, R. L., and Shahabuddin, P. 2007. Asymptotics and fast simulation for tail probabilities of maximum of sums of few random variables. ACM Trans. Model. Comput. Simul. 17, 2, Article 7 (April 2007), 35 pages. DOI = 10.1145/1225275.1225278 http://doi.acm.org/10.1145/1225275.1225278

1. INTRODUCTION

Consider independent random variables $(X_1, \ldots, X_n)$ and let $(P_j : j \le \tau)$ denote a collection of $\tau$ subsets of $\{1, 2, \ldots, n\}$ such that $\cup_{j=1}^{\tau} P_j = \{1, 2, \ldots, n\}$. Set

$$T = \max_{j \le \tau} \sum_{i \in P_j} X_i.$$

In this article, we develop asymptotics and importance sampling techniques for $P(T > u)$ as $u \to \infty$ for $n$ fixed. When $\tau = 1$, $P(T > u)$ reduces to $P(S_n > u)$, where $S_n = \sum_{i=1}^{n} X_i$. Also note that the sets $(P_j : j \le \tau)$ need not be disjoint, so the associated sums may be dependent.
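As a concrete illustration of the definition of $T$, the following minimal Python sketch evaluates $T$ for a small network and estimates $P(T > u)$ by naive Monte Carlo. The two overlapping subsets, the Exp(1) durations, and the helper names `project_duration` and `naive_tail_estimate` are illustrative assumptions and do not come from the article.

```python
import random

def project_duration(x, paths):
    """Return T = max over subsets P_j of the sum of x[i] for i in P_j."""
    return max(sum(x[i] for i in path) for path in paths)

def naive_tail_estimate(paths, samplers, u, n_samples, rng):
    """Naive Monte Carlo estimate of P(T > u): fraction of samples with T > u."""
    hits = 0
    for _ in range(n_samples):
        x = [draw(rng) for draw in samplers]
        if project_duration(x, paths) > u:
            hits += 1
    return hits / n_samples

# Hypothetical network: n = 3 activities (0-indexed), tau = 2 overlapping subsets,
# so the two sums share activity 0 and are dependent.
paths = [(0, 1), (0, 2)]
samplers = [lambda r: r.expovariate(1.0)] * 3  # assumed Exp(1) durations

rng = random.Random(0)
estimate = naive_tail_estimate(paths, samplers, u=3.0, n_samples=20000, rng=rng)
```

The naive estimator is adequate here only because the threshold is small; for large $u$ almost no sample would fall in $\{T > u\}$.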

One application of such probabilities arises in stochastic PERT networks (Project Evaluation and Review Technique; see Elmaghraby [1977] and Adalakha and Kulkarni [1989]). These networks consist of many tasks or activities with stochastic durations that need to be executed under specified precedence constraints. The random variables $(X_1, \ldots, X_n)$ may be used to model these task durations and the subsets $(P_j : j \le \tau)$ may be chosen to reflect the precedence constraints so that $T$ denotes the overall duration of the project. Our interest then is in efficient estimation of the probability of large delay, $P(T > u)$, in such networks. These probabilities are of enormous interest in project management as the costs associated with large delays can be prohibitive.

Many other applications fit our framework. For example, consider the performance evaluation problem faced by a composite web service provider. As the name suggests, composite service providers provide a number of services or activities. As in a typical project, these activities may be performed under prespecified precedence constraints. A simple example is the case where a service provider is asked to provide the addresses of persons A and B and the shortest path connecting the two addresses. Another example may be a request to determine the credit rating of a customer, where the service provider may need to contact many agencies to establish the customer’s credit. Composite service providers in the Internet settings typically provide guarantees to customers against large delays (e.g., delay is less than 5 minutes 95% of the time). Thus, accurate estimation of large delay probabilities is particularly important to them. Our framework may also be used to model grid computing networks where the overall computation consists of many sub-computational processes that may be executed in parallel on different processors but that need to satisfy certain prespecified precedence constraints. Similarly, we may also model transportation schedules in this framework (e.g., for trains, planes or buses), where precedence constraints are induced by the requirement that a vehicle at a particular location may take off only after the arrival of certain other vehicles.



Note that when $\tau = n$ and each $P_j = \{1, 2, \ldots, j\}$, then $T = \max_{k \le n} S_k$, so that $P(T > u)$ corresponds to a probability of considerable interest in the applied probability literature.

In our analysis we allow activity durations (i.e., the $X_i$’s) to have both light and heavy-tailed distributions (a random variable is said to have a light-tailed distribution if its tail distribution function decays at least at an exponential rate; when the tail distribution function decays at a subexponential rate, under mild regularity conditions, the random variable is said to be subexponentially distributed or heavy tailed; the precise definition is in Section 2.2). Typically, light-tailed distributions may be used to model activity duration in project management or transportation settings (see Adalakha and Kulkarni [1989]). In the web-services and grid computing settings, heavy-tailed distributions may often be used (see, e.g., Crovella et al. [1998] and Leland et al. [1994]). We further assume that the durations of all the activities are mutually independent, that is, the random variables $(X_1, \ldots, X_n)$ are mutually independent.

When the underlying random variables are light tailed, the importance sampling probability measure obtained by appropriate exponential twisting of the underlying distributions has been found to be useful for efficient simulation of many performance measures (see survey articles, e.g., Heidelberger [1995] and Juneja and Shahabuddin [2006]; this technique is reviewed in Section 2). In this article, we use and generalize some of these ideas to develop exponential twisting based asymptotically optimal importance sampling techniques to efficiently estimate $P(S_n > u)$ and $P(T > u)$ when the $(X_i : i \le n)$ are light tailed (asymptotic optimality is a standard criterion for measuring the effectiveness of an importance sampling distribution and is reviewed in Section 2).

As is well known, an essential requirement to obtain an exponentially twisted distribution from the original distribution of a random variable (rv; rvs when plural) is that it be light tailed (more specifically, its moment generating function should exist in a neighborhood of the origin). When the distribution of the random variable does not satisfy this property (roughly speaking, this is true when the rv is subexponentially distributed), exponential twisting is not feasible. Juneja and Shahabuddin [2002] develop hazard rate twisting to efficiently estimate $P(S_n > u)$ when the $(X_i : i \le n)$ are independent and identically distributed (iid) and have a subexponential distribution (we review this technique in Section 3.3). In this article, we introduce ‘asymptotic hazard rate twisting’ in light and heavy-tailed settings. This generalizes the hazard rate twisting proposed for heavy-tailed rvs. In many cases plain hazard rate twisting or exponential twisting (in light-tailed settings) may be difficult to implement; however, it may still be feasible to implement asymptotic hazard rate twisting.

The specific contributions of this article include:

(1) We develop a logarithmic tail asymptotic for $P(T > u)$ when all the underlying activity durations $X_i$ are light tailed, $n$ is fixed and $u \to \infty$. En route, we develop logarithmic asymptotics for the probability $P(S_n > u)$ when the $X_i$’s are light-tailed random variables. Here again, $n$ is fixed and $u \to \infty$. While there exists an enormous literature on logarithmic asymptotics for the large deviations probability $P(S_n/n > u)$ as $n \to \infty$ with $u$ fixed (see, for example, Dembo and Zeitouni [1998] and Petrov [1975]), to the best of our knowledge, little is documented for the case that we consider (in Jureckova [1981], this problem is considered when $(X_i : i \le n)$ are iid).

(2) We develop an exact asymptotic for $P(T > u)$ when one or more $X_i$ are subexponentially distributed.

(3) We develop exponential twisting based asymptotically optimal importance sampling techniques to efficiently estimate $P(S_n > u)$ when $(X_i : i \le n)$ are light tailed. We note through a simple example that straightforward application of exponential twisting may not effectively estimate $P(T > u)$. We then show that an importance sampling distribution that is a convex combination of many appropriately exponentially twisted distributions asymptotically optimally estimates this probability.

(4) We develop asymptotically optimal importance sampling techniques to estimate $P(T > u)$ (optimal as $u \to \infty$) using asymptotic hazard rate twisting in light and heavy-tailed settings.

We restrict our analysis to small networks. Thus, in our analysis of $P(T > u)$, $n$ and hence $\tau$ remain fixed while $u \to \infty$. This is relevant as in many applications the number of activities may be small (e.g., for composite web services) or an aggregate level analysis may be conducted so that many activities are grouped under a single head, resulting in a network consisting of a few aggregated activities. From a technical viewpoint, it may be straightforward to generalize exponential-twisting-based methodologies to large $n$ in light-tailed settings. The analysis in Sadowsky and Bucklew [1990] becomes useful here as they consider the development of importance sampling techniques for $P(S_n/n > a)$ for $(X_i : i \le n)$ iid and light tailed, fixed $a > E X_i$ and $n \to \infty$. However, when subexponential random variables are involved, effective importance sampling techniques for estimating $P(S_n/n > a)$ as $n \to \infty$ do not exist. Also, as we verify through experiments, the hazard rate twisting techniques discussed here may not be effective for large $n$. Therefore, to maintain focus, in this article we restrict our analysis to the fixed $n$ case.

In Section 2, we develop the mathematical background needed to analyze $P(T > u)$. In particular, we review the importance sampling technique and the classification of distributions as light and heavy tailed. In Section 3, we develop logarithmic asymptotics and exponential-twisting-based importance sampling techniques for $P(T > u)$ when all activities are light tailed. In Section 4, we introduce asymptotic hazard rate twisting and we again consider the problem of efficient estimation of $P(T > u)$ when all activities are light tailed, albeit this time we use asymptotic hazard rate twisting. In Section 5, we develop exact asymptotics and asymptotic hazard rate twisting based importance sampling techniques for $P(T > u)$ when some of the activities are heavy tailed. Some practical implementation considerations related to the proposed importance sampling techniques are discussed in Section 6. Numerical experiments verifying the efficacy of the proposed fast simulation techniques are given in Section 7. Finally, in Section 8 we provide a brief conclusion and discuss some areas for further research. All proofs are relegated to the appendix.



2. MATHEMATICAL FRAMEWORK AND BACKGROUND

In this section, we first discuss the importance sampling approach in the context of estimating $P(T > u)$ and $P(S_n > u)$. We then classify the distributions as light and heavy tailed and illustrate the definitions through examples.

Let $(\Omega, \mathcal{F}, P)$ denote the underlying probability space on which the mutually independent rvs $(X_1, \ldots, X_n)$ are defined. It suffices to let $\mathcal{F}$ correspond to the $\sigma$-algebra generated by $(X_1, \ldots, X_n)$.

2.1 Importance Sampling

Recall that naive estimation of $P(T > u)$ involves generating many independent samples of $(X_i : i \le n)$, and hence of $I(T > u)$, via simulation and then taking their average (here $I(\cdot)$ denotes the indicator function). As is well known, the number of samples needed to get a fixed degree of relative accuracy (ratio of the standard deviation to the mean of the estimate) is inversely proportional to $P(T > u)$ and hence becomes prohibitively large for large values of $u$. This motivates the need for importance sampling, which we now briefly review. Let $F_i$ denote the distribution function of $X_i$. To keep the notation simple, we assume that each $X_i$ has a probability density function (pdf) given by $f_i$. Let $P^*$ denote another probability measure on the space $(\Omega, \mathcal{F})$ under which each $X_i$ has a distribution function $F_i^*$ and a pdf $f_i^*$, such that $P$ is absolutely continuous with respect to $P^*$ (i.e., $f_i^*(x) > 0$ if $f_i(x) > 0$ almost everywhere), and the $X_i$’s are mutually independent under $P^*$ as well. Importance sampling under the measure $P^*$ involves generating samples of $(X_1, X_2, \ldots, X_n)$ using $P^*$. Then, an unbiased estimator of $P(T > u)$ equals $I(T > u) L$, where

$$L = \frac{f_1(X_1)}{f_1^*(X_1)} \, \frac{f_2(X_2)}{f_2^*(X_2)} \cdots \frac{f_n(X_n)}{f_n^*(X_n)} \quad \text{a.s.}$$

is referred to as the likelihood ratio. The average of many independent samples of $I(T > u) L$ provides an unbiased estimator of $P(T > u)$. For any probability $\tilde{P}$, let $E_{\tilde{P}}$ denote the associated expectation operator. Note that $E_{P^*}[L^2 I(T > u)] \ge (E_{P^*}[L I(T > u)])^2 = P(T > u)^2$ and hence, since $\log P(T > u) < 0$,

$$\limsup_{u \to \infty} \frac{\log E_{P^*}[L^2 I(T > u)]}{\log P(T > u)} \le 2.$$

Such a $P^*$ is said to asymptotically optimally estimate $P(T > u)$ iff

$$\lim_{u \to \infty} \frac{\log E_{P^*}[L^2 I(T > u)]}{\log P(T > u)} = 2. \tag{1}$$

The discussion above holds for $P(S_n > u)$ as well, with $T$ replaced by $S_n$.
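The estimator $I(T > u) L$ can be sketched in a few lines of Python. The setting below (two iid Exp(1) durations, $T = \max(X_1, X_2)$, and a sampling measure that draws each $X_i$ from an exponential distribution with the smaller rate `rate_star`) is an assumption chosen only because the densities and the exact answer are available in closed form; the function names are hypothetical.

```python
import math
import random

def importance_sampling_estimate(sample_star, likelihood_ratio, event, n_samples, rng):
    """Average of I(event) * L over samples drawn under the measure P*."""
    total = 0.0
    for _ in range(n_samples):
        x = sample_star(rng)
        if event(x):
            total += likelihood_ratio(x)
    return total / n_samples

# Toy setting (assumed for illustration): X1, X2 iid Exp(1) under P,
# and iid Exp(rate_star) under P*, with rate_star < 1 so large values are common.
u = 12.0
rate_star = 1.0 / u

def sample_star(rng):
    return [rng.expovariate(rate_star) for _ in range(2)]

def likelihood_ratio(x):
    # Product over i of f_i(x_i) / f_i^*(x_i) for the two exponential densities.
    return math.prod(math.exp(-xi) / (rate_star * math.exp(-rate_star * xi)) for xi in x)

estimate = importance_sampling_estimate(
    sample_star, likelihood_ratio, lambda x: max(x) > u, 200000, random.Random(1))
exact = 1.0 - (1.0 - math.exp(-u)) ** 2   # closed form for the maximum of two Exp(1) rvs
```

Here every component is sampled from the same heavier distribution; this is merely one valid choice of $P^*$, not one claimed to be asymptotically optimal.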

2.2 Classification of Distributions Based on Tail Behavior

2.2.1 Light-Tailed and Superexponential Distributions. Consider a rv $X$ with distribution function $F$ and pdf $f$. Let $\bar{F}$ denote its tail distribution function, so that $\bar{F}(x) = 1 - F(x)$. Let $\Lambda$ denote its hazard function, that is, $\Lambda(x) = -\log \bar{F}(x)$. Since $\bar{F}$ is nonincreasing and $\bar{F}(x) \to 0$ as $x \to \infty$, it follows that $\Lambda$ is nondecreasing and $\Lambda(x) \to \infty$ as $x \to \infty$. We say that $X$ has a light-tailed distribution if there exist a positive extended real number $\lambda \in (0, \infty]$ and an $\alpha \ge 1$ such that

$$\frac{\Lambda(x)}{x^{\alpha}} \to \lambda \quad \text{as } x \to \infty.$$

We say that $X$ is superexponentially distributed if the above holds with $\alpha > 1$. In this case, we say that its tail distribution decays at a superexponential rate. To keep the exposition simple, we do not consider distributions where the above limit does not exist.

Two functions $f$ and $g$ are said to be asymptotically similar if $\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1$. This is denoted by $f(x) \sim g(x)$. The following examples illustrate some light-tailed distributions:

Example 2.1. Suppose that $X$ is Gamma$(n, \lambda)$ distributed. Its pdf is

$$f(x) = \frac{\lambda^n x^{n-1} \exp[-\lambda x]}{(n-1)!}$$

for $x > 0$ and $f(x) = 0$ otherwise, where $n$ is a positive integer and $\lambda > 0$. Its tail df equals

$$\exp[-\lambda x] \sum_{m=0}^{n-1} \frac{(\lambda x)^m}{m!}.$$

Its hazard function is asymptotically similar to $\lambda x$. Hence, it is light tailed but not superexponentially distributed.

Example 2.2. Suppose that $X$ has a Weibull$(\alpha, \lambda)$ distribution, that is, $P(X > x) = \exp(-(\lambda x)^{\alpha})$, $x > 0$, where $\lambda$ is called the scale parameter and $\alpha$ is called the shape parameter. Then, for $\alpha \ge 1$, it is light tailed and for $\alpha > 1$ it is superexponentially distributed. Here the hazard function is $\Lambda(x) = (\lambda x)^{\alpha}$.

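Example 2.3. Suppose that $X$ has a Normal distribution with mean $\mu$ and variance $\sigma^2$ (let $N(\mu, \sigma^2)$ denote a rv with this distribution). Its tail df $\bar{F}$ evaluated at $x$ equals the tail df of a $N(0, 1)$ rv evaluated at $(x - \mu)/\sigma$. Thus, the following inequalities follow (see, for example, Feller [1970], VII.1, Lemma 2, pg. 175):

$$\frac{1}{\sqrt{2\pi}} \, \frac{\sigma}{x - \mu} \left(1 - \left[\frac{\sigma}{x - \mu}\right]^2\right) \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] < \bar{F}(x) < \frac{1}{\sqrt{2\pi}} \, \frac{\sigma}{x - \mu} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$

for $x > \mu$. Although the hazard function $\Lambda$ of $N(\mu, \sigma^2)$ does not have a simple explicit representation, it follows that for any $\varepsilon > 0$ and $x$ sufficiently large:

$$\frac{(x - \mu)^2}{2\sigma^2} + \log\left(\frac{x - \mu}{\sigma}\right) + \log(\sqrt{2\pi}) \le \Lambda(x) \le \frac{(x - \mu)^2}{2\sigma^2} + \log\left(\frac{x - \mu}{\sigma}\right) + \log(\sqrt{2\pi}) + \varepsilon.$$

In particular, $\Lambda(x) \sim \frac{(x - \mu)^2}{2\sigma^2} \sim \frac{x^2}{2\sigma^2}$ and $X$ is superexponentially distributed.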
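The bounds in Example 2.3 are easy to check numerically. The sketch below evaluates the Normal hazard function through the complementary error function `math.erfc` and compares it with the lower bound and with $x^2/(2\sigma^2)$; the function names and the sample points are illustrative choices, not part of the article.

```python
import math

def normal_hazard_function(x, mu=0.0, sigma=1.0):
    """Lambda(x) = -log P(N(mu, sigma^2) > x), via the complementary error function."""
    z = (x - mu) / sigma
    return -math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def hazard_lower_bound(x, mu=0.0, sigma=1.0):
    """Lower bound on Lambda(x) from Example 2.3 (requires x > mu)."""
    z = (x - mu) / sigma
    return 0.5 * z * z + math.log(z) + 0.5 * math.log(2.0 * math.pi)

for x in (5.0, 10.0, 20.0):
    ratio = normal_hazard_function(x) / (x * x / 2.0)
    gap = normal_hazard_function(x) - hazard_lower_bound(x)
    print(f"x={x:5.1f}  Lambda(x)/(x^2/2)={ratio:.4f}  gap to lower bound={gap:.5f}")
```

The gap plays the role of $\varepsilon$ and shrinks as $x$ grows, while the ratio approaches 1 only slowly because of the logarithmic correction term.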



2.2.2 Heavy-Tailed or Subexponential Distributions. A rv $X$ is said to be subexponentially distributed if it is nonnegative and if

$$\frac{P(X_1 + X_2 > u)}{2 P(X_1 > u)} \to 1$$

as $u \to \infty$, where $X_1$ and $X_2$ are independent and have the same distribution as $X$ (see, for example, Embrechts et al. [1997]). Pakes [2004] generalizes this definition to remove the non-negativity restriction. However, we consider only non-negative subexponential random variables in this article as this considerably simplifies some of our analysis.

It can be shown that if $X$ is subexponentially distributed then its moment generating function evaluated at any positive value equals $\infty$. Furthermore, if the hazard function $\Lambda$ of $X$ is sublinear, that is, $\Lambda(x)/x \to 0$ as $x \to \infty$, then under certain regularity conditions $X$ is subexponentially distributed (see Pitman [1980]). For instance, if for all $x$ sufficiently large, $\Lambda(x) = x/\log(x)$, then $X$ is subexponentially distributed.

Consider the following well-known families of subexponential distributions (see, e.g., Embrechts et al. [1997]):

Example 2.4. Suppose that $X$ has a Weibull$(\alpha, \lambda)$ distribution, that is, $P(X > x) = \exp(-(\lambda x)^{\alpha})$, $x > 0$, with shape parameter $\alpha < 1$. Here the hazard function $\Lambda(x) = (\lambda x)^{\alpha}$ is sublinear.

Example 2.5. Suppose that $X = \exp[N(\mu, \sigma^2)]$, that is, it has a Lognormal distribution. Then, its pdf is

$$f(x) = \frac{1}{x \sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right) \quad \text{for } x > 0.$$

Note that its tail df evaluated at $x$ equals the tail df of $N(\mu, \sigma^2)$ evaluated at $\log(x)$. The same therefore is true for the hazard function $\Lambda$. Then, for any $\varepsilon > 0$ and $x$ sufficiently large:

$$\frac{(\log(x) - \mu)^2}{2\sigma^2} + \log\left(\frac{\log(x) - \mu}{\sigma}\right) + \log(\sqrt{2\pi}) \le \Lambda(x),$$

and

$$\Lambda(x) \le \frac{(\log(x) - \mu)^2}{2\sigma^2} + \log\left(\frac{\log(x) - \mu}{\sigma}\right) + \log(\sqrt{2\pi}) + \varepsilon.$$

In particular, $\Lambda(x) \sim \frac{(\log(x) - \mu)^2}{2\sigma^2}$. It is well known that $X$ is subexponentially distributed (see, e.g., Embrechts et al. [1997]).

Example 2.6. Suppose that $X$ has a Pareto$(\alpha, \lambda)$ distribution, that is,

$$P(X > x) = \frac{1}{(1 + \lambda x)^{\alpha}} \quad \text{for } x \ge 0.$$

Then,

$$\Lambda(x) = \alpha \log(1 + \lambda x).$$



In the literature there are different notions of heavy-tailed random variables. The most general classification considers a random variable $X$ to be heavy tailed if

$$\exp[\varepsilon x] \, P(X \ge x) \to \infty$$

as $x \to \infty$ for all $\varepsilon > 0$. Under this definition, subexponentially distributed random variables are a subclass of heavy-tailed random variables. Certain classifications include random variables with Pareto-type polynomially decaying tails as heavy, while excluding the random variables with lighter Weibull-type tails. Our analysis in Section 5 focuses on subexponential distributions, which include Pareto as well as Weibull distributions (with shape parameter less than 1). In this article, we refer to subexponential distributions as heavy tailed.

In Sections 3 and 4, we allow the light-tailed random variables to take values in $\mathbb{R}$ (to accommodate Normally distributed random variables commonly used in PERT analysis). In Section 5, our focus is on subexponential random variables. As indicated earlier, there we restrict all random variables to be nonnegative for ease of analysis.

3. LIGHT-TAILED ANALYSIS USING EXPONENTIAL TWISTING

In this section, we consider the case where $(X_i : i \le n)$ are light tailed. We first review the importance sampling technique of exponential twisting. We then consider $P(S_n > u)$ as this provides insights for analyzing the more general probability $P(T > u)$. Specifically, we develop the logarithmic tail asymptotics for $P(S_n > u)$ and show that an appropriate exponential twisting based importance sampling distribution asymptotically optimally estimates $P(S_n > u)$. Through an example we illustrate that simple exponential twisting may not work to asymptotically optimally estimate $P(T > u)$. We then propose a convex combination of exponentially twisted distributions to asymptotically optimally estimate $P(T > u)$. (See, e.g., Sadowsky and Bucklew [1990] for an application of a similar idea in the large deviations settings involving multi-dimensional light-tailed random variables.)

3.1 Logarithmic Tail Asymptotics and Simulation for P(Sn > u)

Let $H_i(\theta) = \log \int_{-\infty}^{\infty} \exp(\theta x) f_i(x) \, dx$ denote the log-moment generating function of $X_i$. For $\theta$ such that $H_i(\theta) < \infty$, let $f_i^{\theta}$ denote the pdf obtained by exponentially twisting $f_i$ by $\theta$, that is,

$$f_i^{\theta}(x) = \exp[\theta x - H_i(\theta)] \, f_i(x).$$

Let $P_{\theta}$ denote the probability measure under which each $X_i$ has a pdf $f_i^{\theta}$ and $(X_i : i \le n)$ are mutually independent. Thus, if $P_{\theta}$ is used to estimate $P(S_n > u)$, the corresponding likelihood ratio $L_{\theta}$ equals

$$L_{\theta} = \exp\Big[-\theta S_n + \sum_{i \le n} H_i(\theta)\Big] \quad \text{a.s.} \tag{2}$$

Let $\lambda_i(\cdot)$ and $\Lambda_i(\cdot)$ denote the hazard rate and hazard function of $X_i$, respectively. Then, $f_i(x) = \lambda_i(x) \exp[-\Lambda_i(x)]$. As is well known, $\lambda_i(x) = \Lambda_i'(x) = f_i(x)/\bar{F}_i(x)$, where $\Lambda_i'(x)$ denotes the derivative of $\Lambda_i(x)$.



Note that if $\frac{\Lambda_i(x)}{\Lambda_j(x)} \to \infty$ as $x \to \infty$, then $X_i$ has a lighter tail compared to $X_j$, as

$$\frac{P(X_i > x)}{P(X_j > x)} = \exp(-\Lambda_i(x) + \Lambda_j(x)) \to 0.$$

The following assumption is needed to prove Theorem 3.2:

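ASSUMPTION 3.1. There exist an $\alpha \ge 1$, an integer $k$ with $1 \le k \le n$, and constants $\tilde{\lambda}_i$, $0 < \tilde{\lambda}_i < \infty$, $1 \le i \le k$, such that for all $i \le k$:

$$\lambda_i(x) \sim \tilde{\lambda}_i \alpha x^{\alpha - 1}. \tag{3}$$

(Note that this implies that

$$\Lambda_i(x) \sim \tilde{\lambda}_i x^{\alpha}; \tag{4}$$

see [Feller 1971, VIII.9, Theorem 1(b), pg. 281].) Furthermore, if $k < n$, then $\Lambda_i(x)/x^{\alpha} \to \infty$ as $x \to \infty$, for $k < i \le n$.

Thus, each of the rvs $(X_i : i > k)$ has a lighter tail distribution compared to the rvs $(X_i : i \le k)$.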

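Let

$$\lambda^* = \frac{1}{\Big(\sum_{i \le k} 1/\tilde{\lambda}_i^{\frac{1}{\alpha - 1}}\Big)^{\alpha - 1}} \tag{5}$$

for $\alpha > 1$ and

$$\lambda^* = \min_{i \le k} \tilde{\lambda}_i$$

for $\alpha = 1$. Recall that $E_{\tilde{P}}$ denotes the expectation operator associated with probability $\tilde{P}$.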
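The constant $\lambda^*$ is straightforward to compute. The helper below is a minimal sketch of (5) and of the $\alpha = 1$ case; the name `lambda_star` and the convention of passing only the $k$ constants $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_k$ are assumptions made for illustration.

```python
import math

def lambda_star(lam_tilde, alpha):
    """Return lambda^* from (5), given the constants of the k heaviest-tailed rvs."""
    if alpha < 1:
        raise ValueError("alpha must be at least 1")
    if alpha == 1:
        return min(lam_tilde)
    power = 1.0 / (alpha - 1.0)
    # 1 / (sum_i lam_i^(-1/(alpha-1)))^(alpha-1)
    return 1.0 / sum(lam ** (-power) for lam in lam_tilde) ** (alpha - 1.0)

# Two iid N(0, sigma^2) rvs: alpha = 2 and lam_tilde_i = 1 / (2 sigma^2).
sigma = 1.5
print(lambda_star([1.0 / (2.0 * sigma ** 2)] * 2, 2.0), 1.0 / (4.0 * sigma ** 2))
```

For $k$ identical constants $\tilde{\lambda}$ the formula reduces to $\tilde{\lambda}/k^{\alpha - 1}$, which corresponds to each of the $k$ variables contributing $u/k$ to the sum.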

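THEOREM 3.2. Under Assumption 3.1,

$$\lim_{u \to \infty} \frac{\log P(S_n > u)}{u^{\alpha}} = -\lambda^*, \tag{6}$$

and

$$\lim_{u \to \infty} \frac{\log E_{P_{\theta_u}} L_{\theta_u}^2 I(S_n > u)}{u^{\alpha}} = -2\lambda^*, \tag{7}$$

where, for $\alpha > 1$, $\theta_u \sim \alpha \lambda^* u^{\alpha - 1}$, and for $\alpha = 1$, $\theta_u \sim \min_{i \le k} \tilde{\lambda}_i$ such that $H_i(\theta_u) = o(u)$ for each $i \le n$.

From Theorem 3.2, it follows that $P_{\theta_u}$ asymptotically optimally estimates $P(S_n > u)$.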
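A minimal sketch of this estimator for iid $N(0, \sigma^2)$ summands follows. In that case $\alpha = 2$, $\tilde{\lambda}_i = 1/(2\sigma^2)$ and $\lambda^* = 1/(2n\sigma^2)$, so the sketch uses $\theta_u = u/(n\sigma^2)$, and the twisted distribution of each $X_i$ is $N(\theta_u \sigma^2, \sigma^2)$. The function name, the parameter values and the sample size are assumptions for illustration; the Normal case is chosen because the exact probability is available for comparison.

```python
import math
import random

def twisted_sum_estimate(n, sigma, u, n_samples, rng):
    """Estimate P(S_n > u) for iid N(0, sigma^2) summands by exponential twisting."""
    theta = u / (n * sigma ** 2)                    # theta_u = alpha * lambda^* * u^(alpha-1), alpha = 2
    log_mgf_total = n * 0.5 * (theta * sigma) ** 2  # sum over i of H_i(theta)
    twisted_mean = theta * sigma ** 2               # N(0, s^2) twisted by theta is N(theta * s^2, s^2)
    total = 0.0
    for _ in range(n_samples):
        s = sum(rng.gauss(twisted_mean, sigma) for _ in range(n))
        if s > u:
            total += math.exp(-theta * s + log_mgf_total)  # likelihood ratio (2)
    return total / n_samples

n, sigma, u = 3, 1.0, 8.0
estimate = twisted_sum_estimate(n, sigma, u, 50000, random.Random(2))
exact = 0.5 * math.erfc(u / (sigma * math.sqrt(2.0 * n)))  # S_n is N(0, n sigma^2)
```

Under the twisted measure the mean of $S_n$ equals $u$, so roughly half of the samples land in the rare set.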

Remark 3.3. For $\alpha > 1$, the choice $\theta_u \sim \alpha \lambda^* u^{\alpha - 1}$ is motivated as follows. On the event $\{S_n > u\}$, the right-hand side of (2) is bounded above by $\exp[-\theta u + \sum_{i \le n} H_i(\theta)]$ for $\theta > 0$. The value of $\theta$ that minimizes this upper bound is the solution to the equation

$$\sum_{i \le n} H_i'(\theta) = u, \quad \theta > 0, \tag{8}$$

and this minimizer is asymptotically similar to $\alpha \lambda^* u^{\alpha - 1}$. The analysis supporting this is omitted for brevity. When $\alpha = 1$, typically, $\theta_u$ can be easily selected so that the conditions in Theorem 3.2 hold. For instance, suppose that $X_1$ is exponentially distributed with rate $\eta$, $X_2$ has a Gamma distribution with rate $\eta$ and shape parameter $\alpha$, while all other $(X_i : i \ge 3)$ have lighter tails, for example, they are exponentially distributed with rate $\mu > \eta$. Then, we may set $\theta_u = \eta(1 - \frac{1}{u^{\beta}})$, for any $\beta > 0$. For this $\theta_u$, $H_1(\theta_u) = \beta \log u$, $H_2(\theta_u) = \beta \alpha \log u$, while $H_i(\theta_u) = \log \frac{\mu}{\mu - \eta(1 - 1/u^{\beta})}$ for $i \ge 3$. Hence, all $H_i(\theta_u)$ are $o(u)$.
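The claims about $H_i(\theta_u)$ in the $\alpha = 1$ example can be checked with the closed-form log-moment generating function of the Gamma distribution, $\text{shape} \cdot \log(\text{rate}/(\text{rate} - \theta))$ for $\theta < \text{rate}$. The sketch below uses illustrative parameter values and a hypothetical helper name; it prints $H_i(\theta_u)/u$, which should tend to zero.

```python
import math

def log_mgf_gamma(theta, rate, shape=1.0):
    """Log-moment generating function of Gamma(shape, rate); shape = 1 is exponential."""
    if theta >= rate:
        return math.inf
    return shape * math.log(rate / (rate - theta))

eta, mu, beta, shape = 1.0, 2.0, 0.5, 3.0   # illustrative values with mu > eta
for u in (1e2, 1e4, 1e6):
    theta_u = eta * (1.0 - u ** (-beta))
    h1 = log_mgf_gamma(theta_u, eta)          # X_1: exponential with rate eta
    h2 = log_mgf_gamma(theta_u, eta, shape)   # X_2: Gamma with rate eta
    h3 = log_mgf_gamma(theta_u, mu)           # X_i, i >= 3: exponential with rate mu
    print(u, h1 / u, h2 / u, h3 / u)
```

The first two terms grow like $\log u$ and the third stays bounded, so each is $o(u)$ as required by Theorem 3.2.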

Remark 3.4. The proof of Theorem 3.2 does not require the existence of the pdf $f_i$, or equivalently, the hazard rate $\lambda_i$, for each $i$. It holds even if relation (4) is assumed to hold in lieu of (3) in Assumption 3.1. However, since (3) is useful in simplifying later analysis, we retain Assumption 3.1 in its present form.

3.2 Logarithmic Tail Asymptotics for P(T > u)

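The logarithmic tail asymptotics for $P(T > u)$ follow easily from Theorem 3.2. Some notation is needed for this purpose.

Recall from Assumption 3.1 that, roughly speaking, $(X_i : i \le k)$ have tails of a similar magnitude that are much heavier than the tails of $(X_i : i > k)$. Let $\tilde{P}_j = P_j \cap \{1, 2, \ldots, k\}$. For each $j \le \tau$ such that $\tilde{P}_j$ is nonempty, when $\alpha > 1$, let

$$\lambda_j^* = \frac{1}{\Big(\sum_{i \in \tilde{P}_j} 1/\tilde{\lambda}_i^{\frac{1}{\alpha - 1}}\Big)^{\alpha - 1}}. \tag{9}$$

Let $\lambda_j^* = \min_{i \in \tilde{P}_j} \tilde{\lambda}_i$ when $\alpha = 1$. Set $\lambda_j^*$ to $\infty$ if $\tilde{P}_j$ is empty. Let $L_j = \sum_{i \in P_j} X_i$.

THEOREM 3.5. Under Assumption 3.1,

$$\lim_{u \to \infty} \frac{\log P(T > u)}{u^{\alpha}} = -\min_{j \le \tau} \lambda_j^*. \tag{10}$$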

Remark 3.6. Thus, in the PERT network terminology, the tail decay rate of $T$ is governed by the tail decay rate of the most likely ‘path’ $P_j$, $j \le \tau$, in the network. Note that when $\alpha = 1$, $\min_{j \le \tau} \lambda_j^*$ equals $\min_{i \le k} \tilde{\lambda}_i$. Thus, when the tail distribution function of a rv in the network with the heaviest tail decays at an exponential rate, the decay rate of the network delay $T$ is also exponential with the same rate.
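The decay rate in (10) can be computed path by path. The sketch below is one way to do so; the dictionary `lam_tilde` (mapping the indices $i \le k$ to $\tilde{\lambda}_i$), the four-activity network and the function names are hypothetical and serve only to illustrate (9) and (10).

```python
import math

def path_rate(path, lam_tilde, alpha):
    """lambda_j^* from (9); lam_tilde maps the indices i <= k to their constants."""
    heavy = [lam_tilde[i] for i in path if i in lam_tilde]  # P_j intersected with {1, ..., k}
    if not heavy:
        return math.inf
    if alpha == 1:
        return min(heavy)
    power = 1.0 / (alpha - 1.0)
    return 1.0 / sum(lam ** (-power) for lam in heavy) ** (alpha - 1.0)

def network_decay_rate(paths, lam_tilde, alpha):
    """min over j of lambda_j^*, the limit of -log P(T > u) / u^alpha in (10)."""
    return min(path_rate(path, lam_tilde, alpha) for path in paths)

# Hypothetical network: activities 1..4, of which 1..3 satisfy (3) with alpha = 2,
# while activity 4 is assumed to have a lighter tail.
lam_tilde = {1: 0.5, 2: 0.5, 3: 2.0}
paths = [(1, 2), (1, 3), (4,)]
rates = [path_rate(p, lam_tilde, 2.0) for p in paths]
print(rates, network_decay_rate(paths, lam_tilde, 2.0))
```

In this example the path $(1, 2)$ attains the minimum, so it is the most likely path in the sense of Remark 3.6.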

3.3 Asymptotically Optimal Simulation for P(T > u)

In Example 3.7, we illustrate that under exponential twisting the change of measure that asymptotically optimally estimates $P(S_n > u)$ need not asymptotically optimally estimate $P(T > u)$.

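Example 3.7. Suppose that $n = 2$ and that $X_1$ and $X_2$ are iid with $N(0, \sigma^2)$ distribution. Then,

$$\lim_{u \to \infty} \frac{\log P(X_i > u)}{u^2} = -\frac{1}{2\sigma^2}$$

for $i = 1, 2$. In particular, Assumption 3.1 holds with $k = 2$, $\alpha = 2$, $\tilde{\lambda}_1 = \tilde{\lambda}_2 = \frac{1}{2\sigma^2}$. Also, $\lambda^* = \frac{1}{4\sigma^2}$. Further suppose that $T = \max(X_1, X_2)$. It follows from Theorem 3.2 that

$$\lim_{u \to \infty} \frac{\log P(S_2 > u)}{u^2} = -\frac{1}{4\sigma^2},$$

and

$$\lim_{u \to \infty} \frac{\log E_{P_{\theta_u}} L_{\theta_u}^2 I(S_2 > u)}{u^2} = -\frac{1}{2\sigma^2},$$

where $\theta_u = \frac{u}{2\sigma^2}$. (Recall the definition of $P_{\theta}$ from Section 3.1.) Again, since $X_1$ and $X_2$ are iid,

$$P(T > u) = 1 - P(X_1 \le u) P(X_2 \le u) = 2 P(X_1 > u) - P(X_1 > u)^2 \sim 2 P(X_1 > u),$$

and it follows that

$$\lim_{u \to \infty} \frac{\log P(T > u)}{u^2} = -\frac{1}{2\sigma^2}.$$

We now argue that

$$\liminf_{u \to \infty} \frac{\log E_{P_{\theta_u}} L_{\theta_u}^2 I(T > u)}{u^2} > -\frac{1}{\sigma^2}. \tag{11}$$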

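Hence, while $P_{\theta_u}$ asymptotically optimally estimates $P(S_2 > u)$, it does not do so for $P(T > u)$: asymptotic optimality would require the limit in (11) to equal $-\frac{1}{\sigma^2}$, twice the decay rate of $\log P(T > u)$. To see (11), note that $H_i(\theta) = \theta^2 \sigma^2/2$ for $i = 1, 2$. Hence, $H_i(\theta_u) = \frac{u^2}{8\sigma^2}$ for $i = 1, 2$. Specializing (2) to our example, we get:

$$L_{\theta_u} = \exp\left[-\frac{u}{2\sigma^2}(X_1 + X_2) + \frac{u^2}{4\sigma^2}\right]$$

almost surely. In particular, it follows that $E_{P_{\theta_u}} L_{\theta_u}^2 I(T > u) = E_P L_{\theta_u} I(T > u)$ is bounded from below by

$$\exp\left[-\frac{1}{2\sigma^2} u (u + \sqrt{u} + c) + \frac{u^2}{4\sigma^2}\right] P(X_1 \in (u, u + \sqrt{u})) \, P(0 \le X_2 \le c) \tag{12}$$

for a positive constant $c$. Since $X_1$ has a $N(0, \sigma^2)$ distribution, it is easily seen that $\frac{P(X_1 \in (u, u + \sqrt{u}))}{P(X_1 > u)} \to 1$ as $u \to \infty$. Hence,

$$\lim_{u \to \infty} \frac{\log P(X_1 \in (u, u + \sqrt{u}))}{u^2} = -\frac{1}{2\sigma^2}. \tag{13}$$

Taking the logarithm of (12), dividing by $u^2$ and taking the lim inf as $u \to \infty$, noting (13), the result can be seen to equal $-\frac{1}{2\sigma^2} + \frac{1}{4\sigma^2} - \frac{1}{2\sigma^2} = -\frac{3}{4\sigma^2}$, which exceeds $-\frac{1}{\sigma^2}$, and hence (11) follows.

Similarly, it is easily argued that just by exponentially twisting $X_1$ by any parameter and not $X_2$, or vice versa, asymptotic optimality cannot be achieved in this example. This is easily seen by considering the contribution to the second moment of the set where the untwisted component takes values greater than $u$.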

In Example 3.7, we illustrated that the exponential-twisting-based change of measure that asymptotically optimally estimates $P(S_n > u)$ may not be as effective in estimating $P(T > u)$. A brief intuitive explanation is as follows: It is well known that a change of measure that sufficiently emphasizes the most likely paths to the rare event provides an effective importance sampling estimator for the rare event probability (see, e.g., Juneja and Shahabuddin [2006]). However, since the most likely way the event $\{S_n > u\}$ occurs may differ from the way the event $\{T > u\}$ occurs, an importance sampling estimator that is effective in estimating $P(S_n > u)$ may not be as effective in estimating $P(T > u)$. For instance, in Example 3.7, for large $u$, a likely way for $\{S_2 > u\}$ to occur can be roughly seen to correspond to both $X_1$ and $X_2$ taking values close to $u/2$, while a likely way for $\{T > u\}$ to occur involves $X_1$ exceeding $u$ and $X_2$ taking a usual small value (relative to $u$), or vice versa. It can be seen in Example 3.7 that under $P_{\theta_u}$ each $X_i$ has a $N(u/2, \sigma^2)$ distribution. This explains its lack of effectiveness in estimating $P(T > u)$ in that example.

We now propose an asymptotically optimal change of measure to estimate $P(T > u)$ that involves taking a convex combination of $\tau$ ($\tau$ is defined in the Introduction) exponentially twisted changes of measure. It can be seen that this change of measure assigns significant probability to events $\{L_j > u\}$ for $j \le \tau$ and $\tilde{P}_j$ nonempty (recall that $L_j = \sum_{i \in P_j} X_i$), and hence to the event

$$\{T > u\} = \bigcup_{j \le \tau} \{L_j > u\}.$$

When $\tilde{P}_j$ is nonempty, let $P_{\theta_{uj}}$, for $j \le \tau$, denote a change of measure under which the distribution of each $X_i$ for $i \in P_j$ is obtained by exponentially twisting the original distribution by $\theta_{uj}$, where $\theta_{uj} \sim \alpha \lambda_j^* u^{\alpha - 1}$ if $\alpha > 1$. When $\alpha = 1$, select $\theta_{uj}$ such that $\theta_{uj} \sim \lambda_j^*$ and $H_i(\theta_{uj}) = o(u)$ for each $i \in P_j$. Set $\theta_{uj} = 0$ if $\tilde{P}_j$ is empty. From Theorem 3.2, it follows that $P_{\theta_{uj}}$ is an asymptotically optimal importance sampling distribution to estimate $P(L_j > u)$ when $\tilde{P}_j$ is nonempty.

Now consider a probability measure $P^*$ for $(X_i : i \le n)$ such that for any $A \in \mathcal{F}$:

$$P^*(A) = \sum_{j \le \tau} p_j P_{\theta_{uj}}(A),$$

where $(p_j > 0 : j \le \tau)$ and $\sum_{j \le \tau} p_j = 1$. Note that the likelihood ratio $L$ of $P$ with respect to $P^*$ equals

$$L = \frac{1}{\sum_{j \le \tau} p_j \exp\Big(\theta_{uj} \sum_{i \in P_j} X_i - \sum_{i \in P_j} H_i(\theta_{uj})\Big)} \quad \text{a.s.} \tag{14}$$


THEOREM 3.8. Under Assumption 3.1,

$$\lim_{u \to \infty} \frac{\log E_{P^*}[L^2 I(T > u)]}{u^{\alpha}} = -2 \min_{j \le \tau} \lambda_j^*. \tag{15}$$

From this theorem and (10), it follows that $P^*$ asymptotically optimally estimates $P(T > u)$ for any arbitrary $(p_j > 0 : j \le \tau)$ such that $\sum_{j \le \tau} p_j = 1$. Later in Section 6, we discuss a heuristic for arriving at a good choice for $(p_j : j \le \tau)$.
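The mixture change of measure can be sketched for the setting of Example 3.7, where $T = \max(X_1, X_2)$ with iid $N(0, \sigma^2)$ components, so that $P_1 = \{1\}$, $P_2 = \{2\}$, $\lambda_j^* = 1/(2\sigma^2)$ and $\theta_{uj} = u/\sigma^2$. The equal weights, the function name and the sample size below are assumptions made for illustration only.

```python
import math
import random

def mixture_twist_estimate(sigma, u, n_samples, rng, p=(0.5, 0.5)):
    """Estimate P(max(X1, X2) > u) for iid N(0, sigma^2) with a two-component mixture."""
    theta = u / sigma ** 2                 # theta_uj = alpha * lambda_j^* * u^(alpha-1), alpha = 2
    log_mgf = 0.5 * (theta * sigma) ** 2   # H_i(theta) for N(0, sigma^2)
    total = 0.0
    for _ in range(n_samples):
        j = 0 if rng.random() < p[0] else 1          # choose which path to twist
        x = [rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)]
        x[j] += theta * sigma ** 2                   # twisted X_j is N(theta * sigma^2, sigma^2)
        if max(x) > u:
            denom = sum(p[m] * math.exp(theta * x[m] - log_mgf) for m in range(2))
            total += 1.0 / denom                     # likelihood ratio (14)
    return total / n_samples

sigma, u = 1.0, 4.0
estimate = mixture_twist_estimate(sigma, u, 50000, random.Random(5))
q = 0.5 * math.erfc(u / (sigma * math.sqrt(2.0)))    # P(X_1 > u)
exact = q * (2.0 - q)                                # 1 - (1 - q)^2
```

On the event $\{T > u\}$ at least one term in the denominator of (14) is large, which keeps the likelihood ratio uniformly small there; a single twist applied to both components does not have this property.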



4. LIGHT-TAILED ANALYSIS USING HAZARD RATE TWISTING

As mentioned in the Introduction, for a subexponential distribution, exponential twisting is no longer feasible as the moment generating function evaluated at any positive value equals $\infty$. To remedy this, Juneja and Shahabuddin [2002] introduced hazard rate twisting to efficiently estimate $P(S_n > u)$ (recall that $S_n = \sum_{i \le n} X_i$) where $(X_i : 1 \le i \le n)$ are mutually independent, and are identically and subexponentially distributed. In this and the next section we show that hazard rate twisting can be implemented to efficiently estimate $P(T > u)$ (and hence also $P(S_n > u)$) even when all the underlying rvs are allowed to have different distributions and may even be light tailed. A potential drawback of this approach is that both for light and heavy-tailed rvs, this twisting may be difficult to implement as it may be difficult to sample from the hazard rate twisted distribution. Fortunately, in many such cases it may be feasible to twist with a function that is asymptotically similar to the hazard function, such that it is easy to sample from the ‘asymptotically hazard rate twisted’ distribution and the resultant distribution is asymptotically optimal in estimating the rare event probabilities that we consider.

In this section, we first review the hazard rate twisting technique and introduce the more general asymptotic hazard rate twisting technique when all $(X_i : i \le n)$ are light tailed. We also prove that this technique asymptotically optimally estimates $P(T > u)$ in the light-tailed settings. In Section 5, we develop the idea of asymptotic hazard rate twisting when one or more $(X_i : i \le n)$ are subexponentially distributed.

4.1 Hazard Rate Twisting

As mentioned in Section 3.1, the pdf $f_i(x)$ may be re-expressed as $\lambda_i(x)\exp[-\Lambda_i(x)]$, where $\lambda_i(\cdot)$ and $\Lambda_i(\cdot)$ denote the hazard rate and the hazard function of $X_i$, respectively. Then the pdf obtained by hazard rate twisting of the original distribution by an amount $\theta < 1$ is given by $h_i^{\theta}$, where

$$h_i^{\theta}(x) = \lambda_i(x)(1-\theta)\exp[-(1-\theta)\Lambda_i(x)].$$

Note that $h_i^{\theta}$ has corresponding hazard rate $\lambda_i(x)(1-\theta)$ and hazard function $\Lambda_i(x)(1-\theta)$ at $x$. For instance, the pdf obtained by hazard rate twisting of the Weibull($\alpha, \lambda$) distribution in Examples 2.2 and 2.4 by an amount $\theta$ is Weibull($\alpha, \lambda(1-\theta)^{1/\alpha}$). Similarly, hazard rate twisting of the Pareto($\alpha, \lambda$) distribution in Example 2.6 by $\theta$ results in the Pareto($\alpha(1-\theta), \lambda$) distribution. The pdf obtained by hazard rate twisting of $N(\mu, \sigma^2)$ has a more complex form. Note that $\lambda_i(x) = f_i(x)/\bar F_i(x)$. Hence, $h_i^{\theta}(x)$ may alternatively be expressed as $f_i(x)(1-\theta)/\bar F_i(x)^{\theta}$. In particular, the pdf obtained by hazard rate twisting $N(\mu, \sigma^2)$ by amount $\theta$ has the form

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{1-\theta} \frac{(1-\theta)\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]}{\left(\int_x^{\infty}\exp\left[-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2\right]dy\right)^{\theta}}. \tag{16}$$

To see that hazard rate twisting results in a significantly different distribution compared to exponential twisting, note that exponentially twisting a Weibull($\alpha, \lambda$) distribution is feasible only when $\alpha \ge 1$, and for $\alpha > 1$ the resultant distribution is no longer Weibull. As mentioned earlier, exponentially twisting a Pareto distribution is not feasible. Furthermore, it is easy to check that $N(\mu + \theta\sigma^2, \sigma^2)$ is the distribution obtained when $N(\mu, \sigma^2)$ is exponentially twisted by an amount $\theta$.
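The closed forms quoted above for the Weibull and Pareto cases can be checked by simulation. The following is a minimal sketch, not the authors' implementation: it assumes the parametrizations $\bar F(x) = \exp[-(\lambda x)^{\alpha}]$ for Weibull($\alpha, \lambda$) and $\bar F(x) = (1+\lambda x)^{-\alpha}$ for Pareto($\alpha, \lambda$), and it samples from the twisted law by inverse transform, using the fact that $(1-\theta)\Lambda(X)$ is a unit exponential under the twisted pdf. The function names are hypothetical.

```python
import math
import random

def sample_hazard_twisted(inv_hazard_fn, theta, rng):
    """Draw X from the hazard rate twisted pdf by inverse transform.

    Under the twisted law the hazard function is (1 - theta) * Lambda(x),
    so (1 - theta) * Lambda(X) ~ Exp(1), i.e. X = Lambda^{-1}(E / (1 - theta)).
    """
    e = rng.expovariate(1.0)
    return inv_hazard_fn(e / (1.0 - theta))

def weibull_inv_hazard(alpha, lam):
    # Assumed form: Lambda(x) = (lam * x)**alpha, so Lambda^{-1}(y) = y**(1/alpha) / lam
    return lambda y: y ** (1.0 / alpha) / lam

def pareto_inv_hazard(alpha, lam):
    # Assumed form: Lambda(x) = alpha * log(1 + lam * x)
    return lambda y: math.expm1(y / alpha) / lam

def empirical_tail(samples, x):
    return sum(s > x for s in samples) / len(samples)

rng = random.Random(0)
theta, n = 0.5, 100_000
w = [sample_hazard_twisted(weibull_inv_hazard(0.5, 2.0), theta, rng) for _ in range(n)]
p = [sample_hazard_twisted(pareto_inv_hazard(2.0, 1.0), theta, rng) for _ in range(n)]

# Twisted Weibull(0.5, 2) should be Weibull(0.5, 2 * (1 - theta)**(1/0.5)).
x_w = 2.0
weibull_tail_exact = math.exp(-((2.0 * (1 - theta) ** (1 / 0.5)) * x_w) ** 0.5)
weibull_tail_mc = empirical_tail(w, x_w)

# Twisted Pareto(2, 1) should be Pareto(2 * (1 - theta), 1).
x_p = 3.0
pareto_tail_exact = (1.0 + 1.0 * x_p) ** (-(2.0 * (1 - theta)))
pareto_tail_mc = empirical_tail(p, x_p)

print(weibull_tail_mc, weibull_tail_exact)
print(pareto_tail_mc, pareto_tail_exact)
```

The inverse transform route works whenever $\Lambda^{-1}$ is available in closed form, which is the situation in which plain hazard rate twisting is easy to implement.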

Juneja and Shahabuddin [2002] show that hazard rate twisting with $\theta_u = 1 - n/\Lambda_i(u)$ asymptotically optimally estimates $P(S_n>u)$ when the $X_i$'s are iid and subexponentially distributed (asymptotically as $u\to\infty$, for a fixed $n$). This can easily be seen to be true more generally for $\theta_u = 1 - 1/\Theta(\Lambda_i(u))$. Here, we refer to any non-negative function $a(u)$ as $\Theta(b(u))$, where $b(u)$ is another non-negative function, if there exist constants $0 < K_1 < K_2$ such that $K_1 b(u) \le a(u) \le K_2 b(u)$ for all sufficiently large $u$.

4.2 Conditions on Asymptotic Hazard Functions

Whenever hazard rate twisting results in a distribution from which it is easy to sample (e.g., the Weibull and Pareto distributions discussed at the end of Section 4.1), it may be practical to conduct hazard rate twisting to estimate $P(T>u)$. However, when this is not the case, more general methods may be needed. For instance, it may be seen from (16) that hazard rate twisting a Normal distribution does not result in another Normal distribution. In fact it results in a distribution from which generating random variates may be difficult. The same can be seen to be true for the Lognormal distribution. Fortunately, as noted in Example 2.3, the hazard function $\Lambda(x)$ of a Normal distribution is asymptotically similar to $(x-\mu)^2/(2\sigma^2)$. If this function is used to twist the Normal distribution, then we get another Normal distribution. Similarly, from Example 2.5, it can be seen that if we twist a Lognormal distribution with $(\log x-\mu)^2/(2\sigma^2)$ (asymptotically similar to its hazard function), we again get a Lognormal distribution. We now describe some conditions imposed on the asymptotic hazard functions that, with a suitably selected twisting parameter, ensure that the associated importance sampling asymptotically optimally estimates $P(T>u)$.

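Consider non-negative 'asymptotic hazard' functions $\tilde\Lambda_i(\cdot) : \mathbb{R} \to \mathbb{R}_+$, $i \le k$, such that

$$\tilde\Lambda_i(x) \sim \Lambda_i(x). \tag{17}$$

Further suppose that the $\tilde\Lambda_i(x)$ satisfy the condition

$$\tilde\Lambda_i(x) \le \Lambda_i(x) + a_i \max(\log \Lambda_i(x), 1) \tag{18}$$

for some $a_i \ge 0$, and for all $x$ and $i \le k$.

Clearly, the conditions (17) and (18) are satisfied by $\tilde\Lambda_i(x) = \Lambda_i(x)$.

Let $\Psi_i(\theta) = \log \int_{-\infty}^{\infty} \exp[\theta\tilde\Lambda_i(x)] f_i(x)\,dx$. Note that if $\tilde\Lambda_i$ is used as a proxy for $\Lambda_i$ in hazard rate twisting of $X_i$ by amount $\theta$, the resultant pdf equals

$$\exp[\theta\tilde\Lambda_i(x) - \Psi_i(\theta)]\, f_i(x).$$

Also note that $\exp[\Psi_i(\theta)] = \frac{1}{1-\theta}$, that is, $\Psi_i(\theta) = -\log(1-\theta)$, if $\tilde\Lambda_i = \Lambda_i$.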
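As a numerical illustration of the log normalizing term just defined, the sketch below evaluates it by the trapezoid rule for a Normal rv twisted with the asymptotic hazard function $(x-\mu)^2/(2\sigma^2)$ and compares the result with $-0.5\ln(1-\theta)$, the value quoted for the Normal case in Section 7. The function name, integration range, and step count are assumptions made for this sketch only.

```python
import math

def log_normalizer_normal(theta, mu, sigma, half_width=60.0, steps=20_000):
    """Trapezoid-rule value of log of the integral of
    exp[theta * (x - mu)^2 / (2 sigma^2)] * f(x), with f the N(mu, sigma^2) pdf."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.0
    for k in range(steps + 1):
        x = a + k * h
        z2 = ((x - mu) / sigma) ** 2
        # twisting factor times pdf collapses to exp[-(1 - theta) * z2 / 2] / (sqrt(2 pi) sigma)
        val = math.exp(-(1.0 - theta) * z2 / 2.0) / (math.sqrt(2.0 * math.pi) * sigma)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * val
    return math.log(total * h)

for theta in (0.0, 0.5, 0.9):
    numeric = log_normalizer_normal(theta, mu=1.0, sigma=0.1)
    closed_form = -0.5 * math.log(1.0 - theta)
    print(theta, numeric, closed_form)
```

The integrand is itself an unnormalized $N(\mu, \sigma^2/(1-\theta))$ density, which is why the twisted distribution stays Normal.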



4.3 Asymptotically Optimal Simulation

Consider an importance sampling distribution $P^*$ under which all $X_i$'s remain mutually independent. Each $X_i$, for $i \le k$, has a pdf given by

$$f_i^*(x) = \exp[\theta_u\tilde\Lambda_i(x) - \Psi_i(\theta_u)]\, f_i(x)$$

and $\theta_u = 1 - 1/\Theta(u^{\alpha})$. Each $X_i$ for $i > k$ has the original distribution.

Note that the likelihood ratio of $P$ with respect to $P^*$ equals

$$L = \exp\left[-\theta_u \sum_{i=1}^{k} \tilde\Lambda_i(X_i) + \sum_{i=1}^{k} \Psi_i(\theta_u)\right] \quad \text{a.s.} \tag{19}$$

THEOREM 4.1. Under Assumption 3.1, (17) and (18),

$$\lim_{u\to\infty} \frac{\log E_{P^*}\left[L^2 I(T>u)\right]}{u^{\alpha}} = -2\min_{j\le\tau}\lambda_j^*. \tag{20}$$

In view of Theorem 3.5, this establishes that $P^*$ estimates $P(T>u)$ asymptotically optimally. Also note that Theorem 4.1 implies that

$$\lim_{u\to\infty} \frac{\log E_{P^*}\left[L^2 I(S_n>u)\right]}{u^{\alpha}} = -2\lambda^*. \tag{21}$$

Remark 4.2. In light of Example 3.7, this result may appear surprising, as that example illustrates that the exponential twisting based importance sampling distribution that asymptotically optimally estimates $P(S_n>u)$ may not be as efficient in estimating $P(T>u)$. The intuitive explanation for the effectiveness of asymptotic hazard rate twisting for both probabilities is that under this twisting, the probability of a rv taking large values is much larger than that under naive simulation and is typically more spread out compared to exponential twisting. For example, it can be seen that if the asymptotic hazard function $\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2$ is used to twist an $N(\mu, \sigma^2)$ rv and $\theta_u$ is set to $1 - c/u^2$ for a constant $c$, then the resulting distribution is $N(\mu, u^2\sigma^2/c)$. It is easy to see that $P(N(\mu, u^2\sigma^2/c) > u/2)$ and $P(N(\mu, u^2\sigma^2/c) > u)$ both converge to positive constants as $u \to \infty$.
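The convergence claimed at the end of Remark 4.2 can be seen numerically. The sketch below assumes illustrative values $\mu = 1$, $\sigma = 0.5$, $c = 2$ (not taken from the paper) and uses the standard identity $P(N(0,1) > z) = \frac{1}{2}\mathrm{erfc}(z/\sqrt{2})$; the limits are $P(N(0,1) > \sqrt{c}/\sigma)$ and $P(N(0,1) > \sqrt{c}/(2\sigma))$.

```python
import math

def normal_tail(x, mean, var):
    """P(N(mean, var) > x) via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2.0 * var))

def twisted_tail(u, frac, mu, sigma, c):
    """P(X > frac * u) when X ~ N(mu, u^2 sigma^2 / c), the law in Remark 4.2."""
    return normal_tail(frac * u, mu, u * u * sigma * sigma / c)

mu, sigma, c = 1.0, 0.5, 2.0  # illustrative values only
limit_full = normal_tail(math.sqrt(c) / sigma, 0.0, 1.0)        # limit of P(X > u)
limit_half = normal_tail(0.5 * math.sqrt(c) / sigma, 0.0, 1.0)  # limit of P(X > u/2)

for u in (10.0, 1e2, 1e4):
    print(u, twisted_tail(u, 1.0, mu, sigma, c), twisted_tail(u, 0.5, mu, sigma, c))
print("limits:", limit_full, limit_half)
```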

Remark 4.3. Empirically we observed that the above hazard rate twisting distribution gives large variance reduction compared to naive simulation only when the probability of interest is extremely small. A significant improvement over it is achieved if, in a manner analogous to that described in Section 3.3, we use a convex combination of $\tau$ probability measures, each using asymptotic hazard rate twisting geared to a particular path as described above (again, the original measure may be used for paths that do not contain any activity $X_i$, $i \le k$). The proof of asymptotic optimality then follows exactly as the proof of Theorem 3.8 (with Theorem 4.1 replacing Theorem 3.2 in the proof) and is therefore omitted. In our experiments we report the results using this mixture of probability measures.

5. HEAVY-TAILED ANALYSIS

We now develop an exact tail asymptotic and asymptotic hazard rate twisting based simulation methods to estimate $P(T>u)$ for the case where the heaviest-tailed rv in the network has a subexponential distribution.

5.1 Exact Asymptotic

Theorem 5.2 develops an exact asymptotic for $P(T>u)$. The following assumption is needed to prove it.

ASSUMPTION 5.1. The random variables $(X_i : i \le n)$ are non-negative. Furthermore, $X_1$ is subexponentially distributed and there exist a $k$, $1 \le k \le n$, and positive constants $(\gamma_i \le 1 : i \le k)$ such that for $i \le k$,

$$\lim_{u\to\infty} \frac{P(X_i > u)}{P(X_1 > u)} = \gamma_i$$

and

$$\lim_{u\to\infty} \frac{P(X_i > u)}{P(X_1 > u)} = 0$$

for $i > k$ (if $k < n$).

Under Assumption 5.1, it is well known that

$$P\left(\sum_{i\le n} X_i > u\right) \sim P\left(\max_{i\le n} X_i > u\right) \sim \sum_{i\le n} P(X_i > u) \tag{22}$$

(see, e.g., Sigman [1999]).

THEOREM 5.2. Under Assumption 5.1,

$$P(T>u) \sim \left(\sum_{i=1}^{k} \gamma_i\right) P(X_1 > u). \tag{23}$$

Example 5.3. Consider the case where all $X_i$'s have a Weibull distribution, that is, $P(X_i > x) = \exp(-(\eta_i x)^{\alpha_i})$, $x > 0$. Suppose that $\alpha_1 < 1$ so that $X_1$ is subexponentially distributed (see, e.g., Embrechts et al. [1997]). Since it has the heaviest tail, it follows that $\alpha_i \ge \alpha_1$, and if $\alpha_i = \alpha_1$ then $\eta_i \ge \eta_1$. In this setting, note that

$$\lim_{u\to\infty} \frac{P(X_i > u)}{P(X_1 > u)}$$

equals either one or zero. Hence, $k$ corresponds to the number of rvs with distribution identical to that of $X_1$, and $P(T>u) \sim kP(X_1 > u) = k\exp(-(\eta_1 u)^{\alpha_1})$.

Similarly, if for each $i$, $X_i$ has a Lognormal($\mu_i, \sigma_i^2$) distribution, that is, its pdf is

$$f_i(x) = \frac{1}{x\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(\log x - \mu_i)^2}{2\sigma_i^2}\right) \quad \text{for } x > 0,$$

then again $\sigma_i^2 \le \sigma_1^2$, and if $\sigma_i^2 = \sigma_1^2$, then $\mu_i \le \mu_1$. Here again, $k$ denotes the number of rvs with distribution identical to that of $X_1$, and

$$P(T>u) \sim kP(X_1 > u) \sim \frac{k\sigma_1}{\sqrt{2\pi}\,\log u} \exp\left(-\frac{(\log u - \mu_1)^2}{2\sigma_1^2}\right)$$

(an exact asymptotic of $P(X_1 > u)$ may be inferred from Example 2.3).



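Finally, suppose that all $X_i$ have a Pareto($\alpha_i, \eta_i$) distribution, that is,

$$P(X_i > x) = \frac{1}{(1 + \eta_i x)^{\alpha_i}} \quad \text{for } x > 0.$$

Now $X_i$, for $i \le k$, are the rvs with $\alpha_i = \alpha_1$, while $\alpha_i > \alpha_1$ for $i > k$. Hence,

$$P(T>u) \sim \sum_{i=1}^{k} \frac{1}{(1 + \eta_i u)^{\alpha_1}} \sim \frac{1}{u^{\alpha_1}} \sum_{i=1}^{k} \frac{1}{\eta_i^{\alpha_1}}.$$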
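The last asymptotic equivalence in the Pareto case is elementary and can be checked directly. The sketch below uses hypothetical parameter values ($\alpha_1 = 2$ and three scale parameters $\eta_i$) and only verifies that the ratio of the sum of the $k$ heaviest tails to the pure power law tends to one; it does not simulate $T$ itself.

```python
def pareto_tail(x, alpha, eta):
    """P(X > x) for the Pareto(alpha, eta) form used in Example 5.3."""
    return (1.0 + eta * x) ** (-alpha)

def sum_of_heaviest_tails(u, alpha1, etas):
    return sum(pareto_tail(u, alpha1, eta) for eta in etas)

def power_law_asymptotic(u, alpha1, etas):
    return u ** (-alpha1) * sum(eta ** (-alpha1) for eta in etas)

alpha1 = 2.0
etas = [1.0, 0.5, 2.0]  # hypothetical scale parameters of the k = 3 heaviest rvs

for u in (1e1, 1e3, 1e6):
    ratio = sum_of_heaviest_tails(u, alpha1, etas) / power_law_asymptotic(u, alpha1, etas)
    print(u, ratio)
```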

5.2 Conditions on Asymptotic Hazard Functions

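We now describe the conditions imposed on the non-negative asymptotic hazard functions $\tilde\Lambda_i(\cdot) : \mathbb{R} \to \mathbb{R}_+$, $i \le n$, that can be used to asymptotically optimally estimate $P(T>u)$ via appropriate importance sampling. As in the light-tailed settings, we assume that

$$\tilde\Lambda_1(x) \sim \Lambda_1(x), \tag{24}$$

and for all $i$, there exists $a_i \ge 0$ such that

$$\tilde\Lambda_i(x) \le \Lambda_i(x) + a_i \max(\log \Lambda_i(x), 1). \tag{25}$$

Rather than selecting $\tilde\Lambda_i(u) \sim \Lambda_i(u)$ for $i \ge 2$, as in the light-tailed settings, we allow greater flexibility in the choice of $\tilde\Lambda_i$ by considering those that satisfy

$$\liminf_{u\to\infty} \frac{\tilde\Lambda_i(u)}{\tilde\Lambda_1(u)} \ge 1 \tag{26}$$

for $i \ge 2$. It is easy to see that Assumption 5.1 ensures that the hazard functions themselves satisfy this condition. To see this, note that $P(X_i > u)/P(X_1 > u) = \exp[-(\Lambda_i(u) - \Lambda_1(u))]$ has a limit in $[0, 1]$ under Assumption 5.1, so that

$$\lim_{u\to\infty} \Lambda_1(u)\left(\frac{\Lambda_i(u)}{\Lambda_1(u)} - 1\right) \ge 0.$$

Therefore,

$$\liminf_{u\to\infty} \frac{\Lambda_i(u)}{\Lambda_1(u)} \ge 1.$$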

To ease the analysis, we further restrict $\tilde\Lambda_1$ to be eventually a nondecreasing function and to satisfy the following property: For every $\varepsilon > 0$, there exists a $u_\varepsilon$ such that

$$\sum_{i=1}^{n} \tilde\Lambda_1(x_i) \ge \tilde\Lambda_1\left(\sum_{i=1}^{n} x_i\right) - \varepsilon \tag{27}$$

for all $(x_1, \ldots, x_n) \ge 0$ with $\sum_{i=1}^{n} x_i \ge u_\varepsilon$. This property is shown to be true in Juneja and Shahabuddin [2002] for any non-negative function $\Lambda(\cdot)$ that is eventually everywhere differentiable and whose derivative is eventually decreasing to zero. It is satisfied by the hazard functions of commonly encountered subexponential distributions such as the Pareto, the Lognormal, and the Weibull distribution with shape parameter less than 1.



As mentioned in Section 4, for a Lognormal($\mu_i, \sigma_i^2$) rv, the hazard-rate-twisted distribution is not Lognormal and may be difficult to sample from. However, $\Lambda_i(x) \sim (\log x - \mu_i)^2/(2\sigma_i^2)$. Hence, we can set $\tilde\Lambda_i(x) = (\log x - \mu_i)^2/(2\sigma_i^2)$. This function satisfies the restrictions imposed on $\tilde\Lambda_i(\cdot)$ in (25) and the resultant twisted distribution can be seen to be Lognormal.

5.3 Asymptotically Optimal Simulation

Let $P^*$ denote the probability measure under which the $X_i$'s remain mutually independent and each $X_i$ has the distribution

$$dF_i^*(x) = \exp[\theta_u\tilde\Lambda_i(x) - \Psi_i(\theta_u)]\, dF_i(x),$$

where $\theta_u = 1 - 1/\Theta(\tilde\Lambda_1(u))$. In Juneja and Shahabuddin [2002], this value of $\theta_u$ is shown to work well for estimating $P(S_n>u)$ when all rvs are iid and subexponentially distributed and one uses plain hazard rate twisting. The fact that $P(T>u)$ has the same exact asymptotic as $P(S_n>u)$ suggests that a similar $\theta_u$ may work in this setting as well.

Theorem 5.2 implies that

$$\lim_{u\to\infty} \frac{\log P(T>u)}{\Lambda_1(u)} = -1. \tag{28}$$

Theorem 5.4 states the main result of this section. Together with (28), it implies asymptotic optimality of $P^*$.

THEOREM 5.4. Under Assumption 5.1, (24), (25) and (27),

$$\lim_{u\to\infty} \frac{\log E_{P^*}\left[L^2 I(T>u)\right]}{\Lambda_1(u)} = -2. \tag{29}$$

Remark 5.5. Theorem 5.4 can be seen to hold even if the light-tailed $X_i$'s retain their original distribution under $P^*$. The proof then follows along the lines of Theorem 3.2, where all rvs with $\alpha_i > \alpha$ retain their original distribution under the asymptotically optimal change of measure. However, we do not explicitly prove this to avoid repetition.

Remark 5.6. As we discussed in Remark 4.3 for the light-tailed settings, significant performance improvement over $P^*$ may be achieved if, in a manner analogous to that described in Section 3.3, we use a convex combination of $\tau$ probability measures, each using asymptotic hazard rate twisting geared to a particular path. The proof of asymptotic optimality then follows exactly as the proof of Theorem 3.8. As in the light-tailed settings, in our experiments we report the results using this mixture of probability measures.

6. PARAMETER SELECTION FOR IMPORTANCE SAMPLING

In the previous sections we gave order of magnitude specifications of optimal $\theta_u$ for asymptotically optimal estimation of $P(S_n>u)$ and $P(T>u)$. Similarly, we proved that any choice of $(p_j : j \le \tau)$ where each $p_j > 0$ and $\sum_{j\le\tau} p_j = 1$ can be used to asymptotically optimally estimate $P(T>u)$ with appropriate probability measures corresponding to each set $\mathcal{P}_j$. Empirically, we observe that when superexponential rvs are involved and exponential twisting is conducted in estimating $P(S_n>u)$, setting $\theta_u$ as a solution to (8) performs very well.

Empirically we also observe that when asymptotic hazard rate twisting or hazard rate twisting is applied, setting $\theta_u$ to an arbitrary $1 - 1/\Theta(u^{\alpha})$ function in the light-tailed settings (Theorem 4.1) or $1 - 1/\Theta(\Lambda_1(u))$ function in the heavy-tailed settings (Theorem 5.4) may achieve variance reduction only for extremely large values of $u$. To achieve an increase in simulation efficiency for practical ranges of $u$, say when the rare event probabilities are of the order of $10^{-1}$ to $10^{-6}$, we need good heuristics to select $\theta_u$. Similarly, though setting each $p_j = 1/\tau$ gives reliable results, the results are improved substantially (about 2-3 times in our experiments) by selecting them cleverly, as outlined in this section.

6.1 Selecting θu

We specialize a heuristic from Huang and Shahabuddin [2004] that considers a more general set-up. We focus on arriving at a good $\theta_u$ for estimating $P(S_n>u)$. This then extends to finding good $(\theta_{u_j} : j \le \tau)$ for estimating $P(T>u)$ by setting each $\theta_{u_j}$ to the value that is good for estimating the corresponding $P(L_j > u)$.

For estimating $P(S_n>u)$, the second moment obtained by asymptotic hazard rate twisting by amount $\theta$ is given by (see (19))

$$E_{P^*}\left(L^2 I(S_n>u)\right) = \exp\left(2\sum_{i\le n} \Psi_i(\theta)\right) E_{P^*}\left[\exp\left(-2\theta\sum_{i\le n} \tilde\Lambda_i(X_i)\right) I(S_n>u)\right], \tag{30}$$

both under light and heavy-tailed settings. Here, for light-tailed settings, we have assumed that $k$ defined in Assumption 3.1 equals $n$. When $k < n$, the impact of $(X_i : i > k)$ on the rare event probability $P(S_n>u)$ is negligible, so heuristically we can proceed with finding the best twisting parameter by ignoring these rvs.

We now follow two steps. In the first step, for a given $\theta \ge 0$, we find a deterministic upper bound on $E_{P^*}(L^2 I(S_n>u))$. This is done by minimizing $\sum_{i\le n} \tilde\Lambda_i(x_i)$ over all $(x_i : i \le n)$ satisfying $\sum_{i\le n} x_i \ge u$. In the case where a random variable $X_i$ is non-negative we add the constraint $x_i \ge 0$. Let $g(u)$ denote the resulting minimum value. This yields the bound

$$E_{P^*}\left(L^2 I(S_n>u)\right) \le \exp\left(2\sum_{i\le n} \Psi_i(\theta) - 2\theta g(u)\right). \tag{31}$$

Often, $g(u)$ may not be available in closed form. However, a simple approximation may be found that is close to it for all sufficiently large values of $u$ (this is illustrated through an example in Case 1 in Section 7). Then, we replace $g(u)$ above by its approximation. In the second step we find the $\theta = \theta_u \ge 0$ that minimizes the above bound. In all the examples reported in Section 7, the $\theta_{u_j}$ so obtained (by focussing on $P(L_j > u)$) have a form that is consistent with the requirements in the corresponding theorem statements, thus assuring asymptotic optimality. For small $u$, the $\theta_{u_j}$ so obtained may be negative. Then, we set $\theta_{u_j} = 0$, corresponding to naive simulation.
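The second step of the heuristic can be sketched in a few lines. The code below is an illustration under a stated assumption: each of the $n$ twisted rvs has log normalizing term $-\log(1-\theta)$ (plain hazard rate twisting), so the logarithm of the bound (31) is $-2n\log(1-\theta) - 2\theta g(u)$, whose stationary point is $\theta = 1 - n/g(u)$. The function names are hypothetical, and the grid search is included only as a cross-check of the closed form.

```python
import math

def log_bound(theta, n, g):
    """Log of the bound (31), assuming every log normalizing term is -log(1 - theta)."""
    return -2.0 * n * math.log(1.0 - theta) - 2.0 * theta * g

def theta_closed_form(n, g):
    """Stationary point 1 - n/g of log_bound, truncated at 0 (naive simulation)."""
    return max(0.0, 1.0 - n / g)

def theta_grid_search(n, g, points=100_000):
    """Brute-force minimizer of log_bound over a uniform grid on [0, 1)."""
    best_theta, best_val = 0.0, log_bound(0.0, n, g)
    for k in range(1, points):
        theta = k / points
        val = log_bound(theta, n, g)
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta

# Example: a path with n = 4 terms and g(u) = 2 ln(1 + u), evaluated at u = 100.
n_terms = 4
g_u = 2.0 * math.log(1.0 + 100.0)
print(theta_closed_form(n_terms, g_u), theta_grid_search(n_terms, g_u))
```

When $g(u) < n$ the stationary point is negative and both routines return 0, matching the truncation rule described above.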



6.2 Selection of pj’s

Here, we use the ideas reported in Juneja and Shahabuddin [2006]. We illustrate them in the framework of Section 3.3. Again, first an upper bound on $E_{P^*}(L^2 I(T>u))$ is developed, where $L$ is given by (14). Then, we find the vector $(p_j : j \le \tau)$ that minimizes this bound. Note that for $\theta_{u_j} > 0$,

$$L\, I(L_j > u) \le \frac{K_j}{p_j}\, I(L_j > u),$$

where $K_j = \exp\left(-\theta_{u_j} u + \sum_{i\in\mathcal{P}_j} H_i(\theta_{u_j})\right)$. Therefore,

$$L\, I(T>u) \le \left(\max_{j\le\tau} \frac{K_j}{p_j}\right) I(T>u),$$

and

$$E_{P^*}\left(L^2 I(T>u)\right) \le \left(\max_{j\le\tau} \frac{K_j}{p_j}\right)^2.$$

Minimizing the above upper bound subject to $\sum_{j\le\tau} p_j = 1$ and $p_j \ge 0$ for $j \le \tau$ yields the weights

$$p_j^* = \frac{K_j}{\sum_{i\le\tau} K_i}.$$

The asymptotic hazard rate twisting case, both in light and heavy-tailed settings, is similar; in this case we get the same expression for $p_j^*$ except that now $K_j = \exp\left(-\theta_{u_j} g(u) + \sum_{i\in\mathcal{P}_j} \Psi_i(\theta_{u_j})\right)$. Here, we make use of the bound in (31) evaluated at $\theta = \theta_{u_j}$. This methodology is used to determine the $p_j$'s in the experiments in Section 7.
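The weight selection rule is easy to state in code. The sketch below uses made-up values of $K_j$ (they are not from the experiments) to show that weights proportional to $K_j$ equalize the ratios $K_j/p_j$, so the bound $(\max_j K_j/p_j)^2$ becomes $(\sum_j K_j)^2$, which is no larger than the bound under uniform weights.

```python
def mixture_weights(K):
    """Weights proportional to K_j; these minimize max_j K_j / p_j on the simplex."""
    total = sum(K)
    return [k / total for k in K]

def second_moment_bound(K, p):
    """The upper bound (max_j K_j / p_j)^2 on the second moment."""
    return max(k / pj for k, pj in zip(K, p)) ** 2

K = [2e-3, 5e-4, 1e-3]            # hypothetical per-path constants K_j
p_star = mixture_weights(K)
p_uniform = [1.0 / len(K)] * len(K)

print(p_star)
print(second_moment_bound(K, p_star), second_moment_bound(K, p_uniform))
```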

7. NUMERICAL RESULTS

We consider an example with $n = 15$ and $\tau = 10$. The $(L_j : j \le 10)$, respectively, equal

$L_1 = X_1 + X_4 + X_{11} + X_{15}$,
$L_2 = X_1 + X_4 + X_{12}$,
$L_3 = X_2 + X_5 + X_{11} + X_{15}$,
$L_4 = X_2 + X_5 + X_{12}$,
$L_5 = X_2 + X_6 + X_{13}$,
$L_6 = X_2 + X_7 + X_{14}$,
$L_7 = X_3 + X_8 + X_{11} + X_{15}$,
$L_8 = X_3 + X_8 + X_{12}$,
$L_9 = X_3 + X_9 + X_{15}$,
and $L_{10} = X_3 + X_{10} + X_{14}$.

We test the performance of the proposed techniques for five sets of distributions of $(X_i : i \le 15)$ described below. For $j \le 10$, let $n_j$ denote the number of terms in $\mathcal{P}_j$.



Case 1. Subexponential-Tailed with Hazard Rate Twisting. In this case $X_1, \ldots, X_8$ have the Pareto(2, 1) distribution and $X_9, \ldots, X_{15}$ have the Weibull(0.5, 2) distribution. Note that in this case all $X_i$'s are heavy tailed. We set $\tilde\Lambda_i(x) = \Lambda_1(x) = 2\ln(1+x)$ for $i = 1, \ldots, 8$. Also, $\tilde\Lambda_i(x) = \Lambda_i(x) = (2x)^{0.5}$ for $i = 9, \ldots, 15$. It follows that $\Psi_i(\theta) = -\ln(1-\theta)$ for all $i = 1, \ldots, 15$. We apply the procedure outlined in Section 6 to determine good $(\theta_{u_j} : j \le 10)$. As an illustration, consider the procedure to determine $\theta_{u_1}$. Here, we first need to minimize

$$2\ln(1+x_1) + 2\ln(1+x_4) + \sqrt{2x_{11}} + \sqrt{2x_{15}},$$

subject to $x_1 + x_4 + x_{11} + x_{15} \ge u$ and $x_i \ge 0$ for $i \in \{1, 4, 11, 15\}$. Note that $2\ln(1+x)$ is dominated by $\sqrt{2x}$ for all sufficiently large $x$. Due to the concavity of $\ln(1+x)$ it is easy to see that the minimum occurs around $(x_1, x_4, x_{11}, x_{15}) = (u, 0, 0, 0)$ or $(0, u, 0, 0)$ (note that the solution also gives an indication of the most likely way $\{L_1 > u\}$ occurs: $X_1$ exceeds $u$ or $X_4$ exceeds $u$). The minimum value is $g(u) = 2\ln(1+u)$.

In the second step, we need to minimize

$$\frac{1}{(1-\theta)^8}\, e^{-2\theta g(u)}$$

for $\theta \ge 0$. For $u$ sufficiently large, this yields

$$\theta \equiv \theta_{u_1} = 1 - \frac{4}{g(u)}.$$

Since $g(u) = \Lambda_1(u)$, $\theta_{u_1}$ is of the form consistent with the requirement in Theorem 5.4. In general, since all $\mathcal{P}_j$ contain at least one Pareto distributed rv, it can be seen that

$$\theta_{u_j} = 1 - \frac{n_j}{\tilde\Lambda_1(u)}.$$

Case 2. Exponential-Tailed with Hazard Rate Twisting. In this case $X_1, \ldots, X_8$ have an exponential distribution with rate 0.5 and $X_9, \ldots, X_{15}$ have an exponential distribution with rate 1. In this case $\alpha = 1$, $\tilde\lambda_i = 0.5$ for $i = 1, \ldots, 8$, and $\tilde\lambda_i = 1$ otherwise. Also, $\Psi_i(\theta) = -\ln(1-\theta)$ for all $i = 1, \ldots, 15$. The procedure in Section 6 yields

$$\theta_{u_j} = 1 - \frac{n_j}{\lambda_j^* u}$$

(recall that $\lambda_j^* = \min_{i\in\mathcal{P}_j} \tilde\lambda_i$).

Case 3. Superexponential-Tailed with Hazard Rate Twisting. In this case, $X_1, \ldots, X_8$ have the Weibull(2, 1) distribution and $X_9, \ldots, X_{15}$ have the Weibull(2, 2) distribution. Here $\alpha = 2$, $\tilde\lambda_i = 1$ for $i = 1, \ldots, 8$, and $\tilde\lambda_i = 4$ otherwise. Also, $\Psi_i(\theta) = -\ln(1-\theta)$ for all $i = 1, \ldots, 15$. The procedure in Section 6 yields

$$\theta_{u_j} = 1 - \frac{n_j}{\lambda_j^* u^{\alpha}},$$

where, recall that $\lambda_j^*$ is given by (9).



Case 4. Superexponential-Tailed with Asymptotic Hazard Rate Twisting. In this case $X_1, \ldots, X_8$ have the $N(1, 0.02)$ distribution and $X_9, \ldots, X_{15}$ have the $N(1, 0.01)$ distribution. In this case $\alpha = 2$, and $\tilde\lambda_i = 1/(2\sigma_i^2)$ for $i = 1, \ldots, 15$ (where $\sigma_i^2$ denotes the variance of $X_i$). We twist each component $i$ using the asymptotic hazard function $\tilde\Lambda_i(x) = (x - \mu_i)^2/(2\sigma_i^2)$. Hence $\Psi_i(\theta) = -0.5\ln(1-\theta)$ for all $i = 1, \ldots, 15$. The procedure in Section 6 yields

$$\theta_{u_j} = 1 - \frac{n_j}{2\lambda_j^* (u - a_j)^{\alpha}}, \tag{32}$$

where $a_j$ is the mean of $L_j$. As an illustration, consider $j = 2$. Here, $n_2 = 3$, $a_2 = \mu_1 + \mu_4 + \mu_{12}$ and $\lambda_2^* = \frac{1}{2(\sigma_1^2 + \sigma_4^2 + \sigma_{12}^2)}$. Then,

$$\theta_{u_2} = 1 - \frac{3(\sigma_1^2 + \sigma_4^2 + \sigma_{12}^2)}{(u - (\mu_1 + \mu_4 + \mu_{12}))^2}.$$

Under asymptotic hazard rate twisting the distribution of each $X_i$, for $i = 1, 4$ and $12$, becomes $N(\mu_i, \sigma_i^2/(1 - \theta_{u_2}))$. In particular,

$$(X_1 + X_4 + X_{12}) \sim N\left(\mu_1 + \mu_4 + \mu_{12},\ \frac{(u - (\mu_1 + \mu_4 + \mu_{12}))^2}{3}\right)$$

under the new measure. Thus, the mean is unchanged but the variance is increased (so that the probability of exceeding $u$ is no longer rare).

Case 5. Superexponential-Tailed with Exponential Twisting. Same as Case 4, but now we apply the exponential-twisting based algorithm of Section 3.3 to each path. For each of the ten paths, we find the $\theta_{u_j}$ as the solution to (8). Note that $H_i(\theta) = \mu_i\theta + \sigma_i^2\theta^2/2$ for all $i = 1, \ldots, 15$ (where $\mu_i$ and $\sigma_i^2$ denote the mean and variance of $X_i$). As an illustration, consider $j = 2$. Here, the solution to (8) yields

$$\theta_{u_2} = \frac{u - (\mu_1 + \mu_4 + \mu_{12})}{\sigma_1^2 + \sigma_4^2 + \sigma_{12}^2}.$$

Applying exponential twisting by amount $\theta_{u_2}$ to each $X_i$ in $\mathcal{P}_2$ makes $(X_1 + X_4 + X_{12}) \sim N(u, \sigma_1^2 + \sigma_4^2 + \sigma_{12}^2)$, that is, under the new measure, the mean duration of that path is changed to $u$ but the variance is unchanged.
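To make the contrast between Cases 4 and 5 concrete, the sketch below computes, for path $j = 2$ with the parameters stated above (means 1, variances 0.02, 0.02, 0.01), the law of $X_1 + X_4 + X_{12}$ under each change of measure and the resulting probability that the path exceeds $u$. The helper names are hypothetical and $u = 4.9$ is chosen only as an example.

```python
import math

def normal_tail(x, mean, var):
    """P(N(mean, var) > x)."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2.0 * var))

def path_law_asymptotic_hrt(u, mus, variances):
    """Mean and variance of the path sum when each X_i ~ N(mu_i, sigma_i^2 / (1 - theta))."""
    a, s2, n = sum(mus), sum(variances), len(mus)
    theta = 1.0 - n * s2 / (u - a) ** 2
    return a, s2 / (1.0 - theta)

def path_law_exp_twist(u, mus, variances):
    """Mean and variance of the path sum when each X_i ~ N(mu_i + theta * sigma_i^2, sigma_i^2)."""
    a, s2 = sum(mus), sum(variances)
    theta = (u - a) / s2
    mean = sum(m + theta * v for m, v in zip(mus, variances))
    return mean, s2

u = 4.9
mus = [1.0, 1.0, 1.0]            # means of X_1, X_4, X_12
variances = [0.02, 0.02, 0.01]   # variances of X_1, X_4, X_12

m_hrt, v_hrt = path_law_asymptotic_hrt(u, mus, variances)
m_exp, v_exp = path_law_exp_twist(u, mus, variances)

p_original = normal_tail(u, sum(mus), sum(variances))
p_hrt = normal_tail(u, m_hrt, v_hrt)   # equals P(N(0,1) > sqrt(3)) for a three-term path
p_exp = normal_tail(u, m_exp, v_exp)   # equals 1/2, since the twisted mean is u

print(p_original, p_hrt, p_exp)
```

Under either measure the event is no longer rare, but exponential twisting centers the path at $u$ while asymptotic hazard rate twisting only inflates the variance.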

For each case, we simulate for different values of $u$, and estimate the probability, the 99% relative error (i.e., the 99% confidence interval half-width divided by the quantity to be estimated), the variance reduction factor (VRF) and the efficiency increase factor (EIF). The VRF is the ratio of the variance of the naive simulation estimator to that of the importance sampling estimator; the variance of the naive simulation estimator is estimated by substituting the accurate importance sampling estimate of $P(T>u)$ in $P(T>u)(1 - P(T>u))$. The EIF is the ratio of the expected run-length of naive simulation to the expected run-length of the importance sampling simulation, for achieving the same relative error. We estimate it by running naive simulation for the same number of replications as the importance sampling simulation, computing the ratio of the two CPU times, and then multiplying it by the VRF. We use MATLAB for the computation. Since MATLAB makes extensive use of matrix methods, the CPU time is usually not proportional to the number of replications of the simulation. Hence, the EIF numbers (but not the VRF numbers) may change substantially if we change the number of replications.

Table I. Estimates of Probabilities, 99% Relative Errors, Variance Reduction Factors (VRF), and Efficiency Increase Factors (EIF) for the Different Cases and for Various u's. The Paired Quantities in Brackets are (VRF, EIF). All Estimates Use 100,000 Replications.

| Case 1 (Subexp): u | Estimate | (VRF, EIF) | Case 2 (Exp): u | Estimate | (VRF, EIF) | Case 3 (Superexp): u | Estimate | (VRF, EIF) |
|---|---|---|---|---|---|---|---|---|
| 10 | 2.41 × 10^-1 ± 1.1% | (1.7, 0.7) | 10 | 3.62 × 10^-1 ± .85% | (1.6, .55) | 4 | 1.36 × 10^-1 ± 1.2% | (2.7, 1.2) |
| 20 | 4.87 × 10^-2 ± 2.1% | (3.0, 1.3) | 15 | 5.66 × 10^-2 ± 1.7% | (3.9, 1.2) | 5 | 7.69 × 10^-2 ± 2.3% | (17, 7.7) |
| 50 | 4.18 × 10^-3 ± 4.6% | (7.4, 3.1) | 20 | 6.80 × 10^-3 ± 2.5% | (15, 5.2) | 5.5 | 1.21 × 10^-3 ± 2.9% | (65, 32) |
| 100 | 8.39 × 10^-4 ± 7.1% | (16, 6.5) | 25 | 7.14 × 10^-4 ± 3.4% | (80, 28) | 6 | 1.57 × 10^-4 ± 3.4% | (356, 155) |
| 400 | 4.70 × 10^-5 ± 11% | (109, 45) | 30 | 7.39 × 10^-5 ± 4.2% | (498, 142) | 6.5 | 1.61 × 10^-5 ± 4.0% | (2510, 990) |
| 1600 | 3.07 × 10^-6 ± 20% | (519, 216) | 35 | 7.00 × 10^-6 ± 5.0% | (3708, 1117) | 7.0 | 1.44 × 10^-6 ± 4.6% | (21444, 9760) |

Table I gives the results for the first three cases. Note that different sets of u's have been selected for the three cases, so that the estimated probabilities lie in similar ranges. The results are as expected; both the VRF and EIF become larger as u increases.
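The relationship between a reported estimate, its 99% relative error, and the VRF can be sketched as follows. This is a reading of the definitions in the text, not the authors' MATLAB code: it assumes the confidence interval half-width is the normal quantile 2.576 times the standard error of the importance sampling estimator, and the function name is hypothetical.

```python
Z_99 = 2.576  # approximate two-sided 99% standard normal quantile (assumed CI construction)

def vrf_from_relative_error(p_hat, rel_err, reps):
    """VRF implied by an estimate p_hat, its 99% relative half-width, and the replication count."""
    std_err = rel_err * p_hat / Z_99      # standard error of the importance sampling estimator
    var_is = std_err ** 2 * reps          # per-replication variance under importance sampling
    var_naive = p_hat * (1.0 - p_hat)     # per-replication variance of the naive indicator
    return var_naive / var_is

# First row of Case 1 in Table I: estimate 2.41e-1 with relative error 1.1% from 100,000 replications.
print(vrf_from_relative_error(2.41e-1, 0.011, 100_000))
```

For this row the sketch gives a value close to the tabulated VRF of 1.7; rows whose relative error is reported to only two significant digits agree less closely.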

Table II gives the results for Case 4 and Case 5, which apply different simulation algorithms to the same problem. Note that as u increases, exponential twisting far exceeds asymptotic hazard rate twisting in terms of efficiency. Hence for the case of superexponential $X_i$'s, whenever exponential twisting is easily implementable, as in the Normal distribution case, we recommend using the exponential-twisting based algorithm of Section 3.3. In cases where exponential twisting is hard to implement (e.g., Case 3), asymptotic hazard rate twisting may be a viable alternative.

7.1 Effect of Problem Size

In this section, through simple examples we test the efficacy of the proposed importance sampling techniques as n increases. We consider two extreme cases for different values of n:

(1) τ = 1. In PERT network terminology, this corresponds to all the activities in series.

(2) τ = n. In PERT network terminology, this corresponds to all the activities in parallel.

We refer to the first as the series case and to the second as the parallel case. We observe that the variance reduction obtained by asymptotic hazard rate twisting deteriorates significantly with n in the series case. The deterioration is insignificant in the parallel case. In contrast, exponential twisting, whenever applicable, is robust in both these dimensions. Recall that in the parallel case, both for exponential twisting and asymptotic hazard rate twisting, we use a mixture of appropriate probability distributions.

Table II. Estimates of Probabilities, 99% Relative Errors, Variance Reduction Factors (VRF), and Efficiency Increase Factors (EIF) for Normally Distributed Activity Durations, Using Two Different Methods. The Paired Quantities in Brackets are (VRF, EIF). All Estimates Use 100,000 Replications.

| u | Case 4 (Asymp. HRT): Estimate | (VRF, EIF) | Case 5 (Exp. Twist.): Estimate | (VRF, EIF) |
|---|---|---|---|---|
| 4.3 | 2.57 × 10^-1 ± 1.1% | (1.5, 1.3) | 2.61 × 10^-1 ± .71% | (3.8, 1.7) |
| 4.5 | 5.61 × 10^-2 ± 2.7% | (1.6, 1.4) | 5.61 × 10^-2 ± 0.97% | (12, 5.6) |
| 4.7 | 6.31 × 10^-3 ± 5.0% | (4.2, 3.3) | 6.20 × 10^-3 ± 1.17% | (78, 35) |
| 4.9 | 3.60 × 10^-4 ± 9.3% | (21, 18) | 3.57 × 10^-4 ± 1.3% | (1030, 471) |
| 5.0 | 7.24 × 10^-5 ± 11% | (67, 60) | 6.65 × 10^-5 ± 1.4% | (4960, 2350) |
| 5.2 | 1.60 × 10^-6 ± 19% | (1137, 890) | 1.43 × 10^-6 ± 1.6% | (187610, 89550) |

Table III. Estimates of Variance Reduction Factors (VRF) Using Exponential Twisting and Asymptotic Hazard Rate Twisting, for Series and Parallel Cases. The Activity Durations are N(0, 1), and the u is Selected so that P(T > u) = 0.0001. (The VRF in the series columns were analytically calculated; the parallel columns use 100,000 replications.)

| n | Series VRF-exp | Series VRF-hazard | Parallel VRF-exp | Parallel VRF-hazard |
|---|---|---|---|---|
| 1 | 2388 | 317 | 2388 | 317 |
| 5 | 2388 | 28 | 2085 | 296 |
| 10 | 2388 | 5 | 2108 | 228 |
| 50 | 2388 | 1 | 1980 | 190 |
| 100 | 2388 | 1 | 1801 | 187 |

Specifically, we first consider the case where all the $X_i$ have the $N(0, 1)$ distribution in the series and the parallel settings. We vary n, but at the same time vary u so that $P(T>u)$ remains fixed at $10^{-4}$. Results with both asymptotic hazard rate twisting and exponential twisting are presented in Table III. In the series case, it is easy to compute the second moment in each setting analytically, so no simulations were conducted.
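For the series case with exponential twisting, the analytical second moment can be sketched as follows. The derivation is ours and rests on stated assumptions: each $X_i \sim N(0,1)$ is twisted by $\theta = u/n$, and $u = z\sqrt{n}$ with $P(N(0,1) > z) = 10^{-4}$. Completing the square then gives $E_{P^*}[L^2 I(S_n>u)] = e^{z^2} P(N(0,1) > 2z)$, which does not depend on $n$.

```python
import math
from statistics import NormalDist

def upper_tail(x):
    """P(N(0,1) > x), computed with erfc to keep precision far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def series_exp_twist_vrf(p_target):
    """VRF of exponential twisting for a sum of n iid N(0,1) rvs; independent of n."""
    z = NormalDist().inv_cdf(1.0 - p_target)            # u = z * sqrt(n)
    second_moment = math.exp(z * z) * upper_tail(2.0 * z)
    var_is = second_moment - p_target ** 2
    var_naive = p_target * (1.0 - p_target)
    return var_naive / var_is

print(series_exp_twist_vrf(1e-4))
```

The value is in the neighbourhood of the constant series entry for exponential twisting in Table III, with the small gap attributable to rounding of u.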

We also do the same for the case where all the $X_i$'s have the Pareto(2, 1) distribution. Results are presented in Table IV. Here, in each setting u was determined through trial-and-error so that $P(T>u)$ is approximately $10^{-4}$.

Table IV. Estimates of Variance Reduction Factors (VRF) Using Asymptotic Hazard Rate Twisting, for Series and Parallel Cases. The Activity Durations are Pareto(2, 1), and the u is Selected so that P(T > u) = 0.0001. All Estimates Use 100,000 Replications.

| n | Series VRF-Hazard | Parallel VRF-Hazard |
|---|---|---|
| 1 | 875 | 875 |
| 5 | 68 | 673 |
| 10 | 14 | 633 |
| 50 | 1 | 575 |
| 100 | 1 | 540 |

8. CONCLUSION AND FUTURE WORK

In this article, we developed tail asymptotics and fast simulation techniques for estimating the probability that the maximum of sums of a few random variables exceeds a large threshold. We considered both light and heavy-tailed random variables and introduced asymptotic hazard rate twisting based importance sampling in these settings. This may have implementation advantages over hazard rate twisting or exponential twisting, although the latter, whenever easy to implement, is empirically seen to perform much better. We also noted that straightforward application of exponential twisting and asymptotic hazard rate twisting may not be very effective in estimating $P(T>u)$ when the associated PERT network has many paths. This may be remedied by using a probability measure that is a mixture of probability measures, so that for each path there is a probability measure in this mixture that is tailored to it.

Our work focussed on a small number of random variables. As mentioned in the introduction, one useful generalization corresponds to extending these results to large networks when the random variables involved are light tailed. Then the tail asymptotics may be developed with the help of large deviations theory. In addition, it may be straightforward to extend the exponential twisting based importance sampling methods discussed here to large networks.

Consider a resource allocation problem where the distribution time of eachactivity is a function of the resources allocated to it. Further suppose that theobjective function corresponds to the probability of large delays. In this setting,the tail asymptotics developed here may serve as a surrogate objective functionand facilitate determining the asymptotic solution to this resource allocationproblem. The authors explore this in future research.

APPENDIX: PROOFS

PROOFS FOR SECTION 3: LIGHT TAILS, EXPONENTIAL TWISTING

We first prove some lemmas useful in proving Theorem 3.2.

LEMMA 8.1. Under Assumption 3.1,

$$\liminf_{u\to\infty} \frac{\log P(S_n>u)}{u^{\alpha}} \ge -\lambda^*. \tag{33}$$



PROOF. First consider the case where $\alpha = 1$. Since all $X_i$'s are real valued random variables, there exist constants $(b_i : 2 \le i \le n)$ such that for each such $i$, $P(X_i \ge b_i) > 0$. Thus,

$$P(S_n>u) \ge P\left(X_1 \ge u - \sum_{i=2}^{n} b_i\right) \prod_{i=2}^{n} P(X_i \ge b_i).$$

From this, it easily follows that

$$\liminf_{u\to\infty} \frac{\log P(S_n>u)}{u} \ge -\tilde\lambda_1.$$

Similarly, the above inequality holds with the right-hand side equal to $-\tilde\lambda_i$ for $2 \le i \le k$ and hence for $-\lambda^* = -\min_{i\le k} \tilde\lambda_i$.

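Now consider $\alpha > 1$. Suppose that $(\beta_i \ge 0 : i \le k)$ are such that $\sum_{i\le k} \beta_i = 1$. Again,

$$P(S_n>u) \ge \left[\prod_{i\le k} P\left(X_i \ge \beta_i\left(u - \sum_{j>k} b_j\right)\right)\right] \left[\prod_{j>k} P(X_j \ge b_j)\right]$$

(with the usual convention that if $k = n$, then $\sum_{j>k} b_j = 0$ and $\prod_{j>k} P(X_j \ge b_j) = 1$). Let $\bar b = \sum_{j>k} b_j$. Thus

$$\frac{\log P(S_n>u)}{u^{\alpha}} \ge \sum_{i\le k} \frac{\log P(X_i \ge \beta_i(u - \bar b))}{u^{\alpha}} + \sum_{j>k} \frac{\log P(X_j \ge b_j)}{u^{\alpha}}.$$

Taking lim inf as $u \to \infty$, from Assumption 3.1, it follows that

$$\liminf_{u\to\infty} \frac{\log P(S_n>u)}{u^{\alpha}} \ge -\sum_{i\le k} \beta_i^{\alpha}\tilde\lambda_i.$$

Now by setting

$$\beta_i = \frac{1/\tilde\lambda_i^{1/(\alpha-1)}}{\sum_{j\le k} 1/\tilde\lambda_j^{1/(\alpha-1)}}$$

the required relation (33) follows.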

To prove Theorem 3.2 for the case $\alpha > 1$, we need to develop an asymptotic upper bound for the log-moment generating function in this setting. To this end, we begin with a simple observation regarding the log-moment generating function. Let $H$ be the log-moment generating function corresponding to a distribution function $F$. Using $\exp[\theta s] = 1 + \theta\int_0^s \exp[\theta x]\,dx$, we get

$$\int_0^{\infty} \exp[\theta s]\,dF(s) = \bar F(0) + \theta\int_0^{\infty}\int_0^{s} \exp[\theta x]\,dx\,dF(s) = \bar F(0) + \theta\int_0^{\infty} \exp[\theta x]\bar F(x)\,dx.$$

Since, for $\theta > 0$, $\int_{-\infty}^{0} \exp[\theta s]\,dF(s) \le F(0)$, it follows that

$$\exp[H(\theta)] \le 1 + \theta\int_0^{\infty} \exp[\theta x]\bar F(x)\,dx \quad \forall\, \theta > 0. \tag{34}$$

Let $\log^+(\theta)$ denote $\max(\log\theta, 1)$.



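LEMMA 8.2. Suppose that the tail distribution function $\bar F(\cdot)$ of a random variable satisfies the asymptotic relation

$$\limsup_{u\to\infty} \frac{\log \bar F(u)}{u^{\alpha}} \le -\lambda, \quad 0 < \lambda < \infty, \tag{35}$$

for $\alpha > 1$. Then for every sufficiently small $\varepsilon$ there exist a constant $C_\varepsilon$ and constants $C_1$ and $C_2$ that are independent of $\theta$ such that

$$H(\theta) \le \left(\frac{(\theta/\alpha)^{\alpha/(\alpha-1)}}{(\lambda-\varepsilon)^{1/(\alpha-1)}}\right)(\alpha-1) + C_\varepsilon + C_1\log^+(\theta) + C_2\log^+(\lambda-\varepsilon). \tag{36}$$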

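PROOF. Given $\varepsilon > 0$, it follows from (35) that there exists $K_\varepsilon$ such that

$$\bar F(x) \le K_\varepsilon \exp[-(\lambda-\varepsilon)x^{\alpha}]$$

for all $x \ge 0$. Using (34) and Lemma 8.3 given below, it follows that

$$H(\theta) \le 1 + \log^+(\theta) + \log^+\left(\int_0^{\infty} \exp[\theta x]\bar F(x)\,dx\right) \le 1 + \log^+(\theta) + \log^+\left(K_\varepsilon\int_0^{\infty} \exp[\theta x - (\lambda-\varepsilon)x^{\alpha}]\,dx\right).$$

This in turn is upper bounded by

$$1 + \log^+(\theta) + \log^+ K_\varepsilon + \left(\frac{(\theta/\alpha)^{\alpha/(\alpha-1)}}{(\lambda-\varepsilon)^{1/(\alpha-1)}}\right)(\alpha-1) + K_1\log^+(\theta) + K_2\log^+(\lambda-\varepsilon).$$

The result follows from this.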

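LEMMA 8.3. For $\alpha > 1$ and $\gamma > 0$, there exist constants $K_1$ and $K_2$ independent of $\theta > 0$ such that

$$\log\left(\int_0^{\infty} \exp[\theta x - \gamma x^{\alpha}]\,dx\right) \le \left(\frac{(\theta/\alpha)^{\alpha/(\alpha-1)}}{\gamma^{1/(\alpha-1)}}\right)(\alpha-1) + K_1\log^+(\theta) + K_2\log^+(\gamma).$$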

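PROOF. Consider the integral

$$\int_0^\infty \exp[\theta x - \gamma x^\alpha]\,dx. \tag{37}$$

Let $\tilde x = \left(\frac{\theta}{\gamma\alpha}\right)^{1/(\alpha-1)}$ so that $\theta = \gamma\alpha\tilde x^{\alpha-1}$. From Taylor's expansion, it follows that

$$x^\alpha = \tilde x^\alpha + (x-\tilde x)\,\alpha\tilde x^{\alpha-1} + \frac{(x-\tilde x)^2}{2}\,\alpha(\alpha-1)\,\xi_x^{\alpha-2},$$

where $\xi_x$ lies between $\tilde x$ and x. Thus, (37) may be re-expressed as

$$\exp[\theta\tilde x - \gamma\tilde x^\alpha] \int_0^\infty \exp\left[-\frac{(x-\tilde x)^2}{2}\,\gamma\alpha(\alpha-1)\,\xi_x^{\alpha-2}\right]dx. \tag{38}$$

Note that

$$\theta\tilde x - \gamma\tilde x^\alpha = \left(\frac{(\theta/\alpha)^{\alpha/(\alpha-1)}}{\gamma^{1/(\alpha-1)}}\right)(\alpha-1).$$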
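The identity just noted can be verified in two steps from the definition of $\tilde x$ alone. The following worked computation is a filled-in intermediate step, not part of the original proof text:

```latex
\begin{aligned}
\theta\tilde{x} - \gamma\tilde{x}^{\alpha}
  &= \gamma\alpha\tilde{x}^{\alpha-1}\cdot\tilde{x} - \gamma\tilde{x}^{\alpha}
   = \gamma(\alpha-1)\,\tilde{x}^{\alpha}, \\
\gamma\tilde{x}^{\alpha}
  &= \gamma\left(\frac{\theta}{\gamma\alpha}\right)^{\alpha/(\alpha-1)}
   = \frac{(\theta/\alpha)^{\alpha/(\alpha-1)}}{\gamma^{\alpha/(\alpha-1)-1}}
   = \frac{(\theta/\alpha)^{\alpha/(\alpha-1)}}{\gamma^{1/(\alpha-1)}}.
\end{aligned}
```

The last equality uses $\alpha/(\alpha-1) - 1 = 1/(\alpha-1)$.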



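Thus, to complete the proof, it remains to prove that

$$\log\left(\int_0^\infty \exp\left[-\frac{(x-\tilde x)^2}{2}\,\gamma\alpha(\alpha-1)\,\xi_x^{\alpha-2}\right]dx\right) \le K_1 \log^+(\theta) + K_2 \log^+(\gamma) \tag{39}$$

for constants $K_1$ and $K_2$ independent of θ. Now,

$$\int_0^\infty \exp\left[-\frac{(x-\tilde x)^2}{2}\,\gamma\alpha(\alpha-1)\,\xi_x^{\alpha-2}\right]dx \le 2\tilde x + \int_{2\tilde x}^\infty \exp\left[-\frac{(x-\tilde x)^2}{2}\,\gamma\alpha(\alpha-1)\,\xi_x^{\alpha-2}\right]dx. \tag{40}$$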

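Consider the case α ≥ 2, so that $\xi_x^{\alpha-2} \ge \tilde x^{\alpha-2}$ for $x \ge \tilde x$. Hence, the integral in the right-hand side of (40) is bounded by

$$\int_{-\infty}^\infty \exp\left[-\frac{(x-\tilde x)^2}{2}\,\gamma\alpha(\alpha-1)\,\tilde x^{\alpha-2}\right]dx.$$

This integral equals $\sqrt{2\pi}\big/\sqrt{\gamma\alpha(\alpha-1)\tilde x^{\alpha-2}}$ and hence (39) is true.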

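Now consider the case where α < 2. Here, $\xi_x^{\alpha-2} \ge x^{\alpha-2}$ for $x \ge \tilde x$ and $(x-\tilde x)^2 \ge (x/2)^2$ for $x \ge 2\tilde x$, and hence the integral in the right-hand side of (40) is bounded by

$$\int_0^\infty \exp\left[-\frac{x^\alpha}{8}\,\gamma\alpha(\alpha-1)\right]dx = \gamma^{-1/\alpha} \int_0^\infty \exp\left[-\frac{y^\alpha}{8}\,\alpha(\alpha-1)\right]dy.$$

Thus, (39) is true.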
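The content of Lemma 8.3 is that the logarithm of the integral (37) differs from the leading term $(\alpha-1)(\theta/\alpha)^{\alpha/(\alpha-1)}/\gamma^{1/(\alpha-1)}$ by at most a logarithmic correction. A minimal numerical sketch follows, assuming the illustrative values α = 2 and γ = 1, for which the integral is a shifted Gaussian integral and the gap should approach log √π.

```python
import math

def log_integral(theta, gamma, alpha, upper, n=200_000):
    """log of integral_0^upper exp[theta*x - gamma*x**alpha] dx (trapezoidal rule).

    The peak value of the exponent is factored out to avoid overflow.
    """
    x_peak = (theta / (gamma * alpha)) ** (1.0 / (alpha - 1.0))
    peak = theta * x_peak - gamma * x_peak ** alpha
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        x = k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * math.exp(theta * x - gamma * x ** alpha - peak)
    return peak + math.log(total * h)

def leading_term(theta, gamma, alpha):
    """(alpha - 1) * (theta/alpha)^(alpha/(alpha-1)) / gamma^(1/(alpha-1))."""
    return ((alpha - 1.0) * (theta / alpha) ** (alpha / (alpha - 1.0))
            / gamma ** (1.0 / (alpha - 1.0)))

ALPHA, GAMMA = 2.0, 1.0  # illustrative choice, not taken from the paper
gaps = []
for theta in (5.0, 20.0, 80.0):
    x_peak = theta / (GAMMA * ALPHA)
    gap = (log_integral(theta, GAMMA, ALPHA, upper=x_peak + 12.0)
           - leading_term(theta, GAMMA, ALPHA))
    gaps.append(gap)
    print(theta, gap)
```

In this special case the gap stays bounded as θ grows, which is stronger than the $K_1 \log^+(\theta)$ allowance in the lemma; the logarithmic term is needed for general α.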

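PROOF OF THEOREM 3.2. First consider α > 1. Fix ε > 0. Recall that $\log^+ x$ denotes max(log x, 1). From Lemma 8.2 and the fact that $\theta_u \sim \alpha\lambda^* u^{\alpha-1}$, it follows that for i ≤ k and u sufficiently large,

$$H_i(\theta_u) \le (\lambda^*)^{\alpha/(\alpha-1)}\, u^\alpha\, \frac{1}{(\tilde\lambda_i-\varepsilon)^{1/(\alpha-1)}}\,(\alpha-1) + o(u^\alpha) + C_\varepsilon + C_1 \log^+(\alpha\lambda^* u^{\alpha-1}) + C_2 \log^+(\tilde\lambda_i-\varepsilon), \tag{41}$$

and for i > k, for any M < ∞,

$$H_i(\theta_u) \le (\lambda^*)^{\alpha/(\alpha-1)}\, u^\alpha\, \frac{1}{(M-\varepsilon)^{1/(\alpha-1)}}\,(\alpha-1) + o(u^\alpha) + C_\varepsilon + C_1 \log^+(\alpha\lambda^* u^{\alpha-1}) + C_2 \log^+(M-\varepsilon). \tag{42}$$

It is easy to select $C_\varepsilon$, $C_1$ and $C_2$ so that they are independent of i and u. Recall that for u sufficiently large so that $\theta_u > 0$,

$$L_{\theta_u} = \exp\Big[-\theta_u S_n + \sum_{i \le n} H_i(\theta_u)\Big] \le \exp\Big[-\theta_u u + \sum_{i \le n} H_i(\theta_u)\Big] \quad \text{a.s. on the set } \{S_n > u\}.$$

It then follows that

$$P(S_n > u) = E_{P_{\theta_u}}\big[L_{\theta_u} I(S_n > u)\big] \le \exp\Big[-\theta_u u + \sum_{i \le n} H_i(\theta_u)\Big]. \tag{43}$$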
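To make the change of measure behind (43) concrete, here is a small simulation sketch. It assumes i.i.d. standard normal summands, an illustrative choice not taken from the paper: then $H_i(\theta) = \theta^2/2$, the twisted law is $N(\theta, 1)$, and with α = 2, $\tilde\lambda_i = 1/2$ the choice θ = u/n agrees with $\theta_u \sim \alpha\lambda^* u^{\alpha-1}$. The function and variable names are hypothetical.

```python
import math
import random

def estimate_tail(n, u, num_samples, seed=0):
    """Importance-sampling estimate of P(S_n > u), S_n a sum of n iid N(0, 1).

    Under the twisted measure each X_i is N(theta, 1), and the likelihood
    ratio is L = exp(-theta * S_n + n * H(theta)) with H(theta) = theta**2 / 2.
    """
    rng = random.Random(seed)
    theta = u / n                      # solves n * H'(theta) = u
    log_mgf_sum = n * theta ** 2 / 2.0
    total = 0.0
    for _ in range(num_samples):
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))
        if s > u:
            total += math.exp(-theta * s + log_mgf_sum)
    return total / num_samples

N_VARS, U = 3, 12.0
theta_u = U / N_VARS
exact = 0.5 * math.erfc(U / math.sqrt(2.0 * N_VARS))             # S_n ~ N(0, n)
chernoff = math.exp(-theta_u * U + N_VARS * theta_u ** 2 / 2.0)  # right side of (43)
estimate = estimate_tail(N_VARS, U, num_samples=50_000)
print(exact, estimate, chernoff)
```

The exact tail sits below the bound from (43), and the estimator recovers a probability of order 1e-12 from a modest number of samples, which naive simulation could not do.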

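Using the estimates in (41) and (42), recalling that $\theta_u \sim \alpha\lambda^* u^{\alpha-1}$, taking logarithms, dividing by $u^\alpha$ and taking the lim sup, we get that $\limsup_{u\to\infty} \log P(S_n > u)/u^\alpha$ is less than or equal to

$$-\alpha\lambda^* + \sum_{i \le k} (\lambda^*)^{\alpha/(\alpha-1)}\, \frac{1}{(\tilde\lambda_i-\varepsilon)^{1/(\alpha-1)}}\,(\alpha-1) + \sum_{i > k} (\lambda^*)^{\alpha/(\alpha-1)}\, \frac{1}{(M-\varepsilon)^{1/(\alpha-1)}}\,(\alpha-1).$$

This holds for all sufficiently small ε > 0 and M < ∞. Thus, taking the limit as M → ∞ and then the limit as ε → 0, it follows that

$$\limsup_{u\to\infty} \frac{\log P(S_n > u)}{u^\alpha} \le -\alpha\lambda^* + \sum_{i \le k} (\lambda^*)^{\alpha/(\alpha-1)}\, \frac{1}{\tilde\lambda_i^{1/(\alpha-1)}}\,(\alpha-1).$$

The right-hand side above equals $-\lambda^*$ and (6) follows from this and Lemma 8.1. Similarly, by essentially repeating the analysis from (43) onwards, $E_{P_{\theta_u}}[L_{\theta_u}^2 I(S_n > u)]$ may be bounded from above to establish (7).

Now consider α = 1. Without loss of generality, let $\lambda_1 = \min_{i \le k} \tilde\lambda_i$. Again,

$$P(S_n > u) = E_{P_{\theta_u}}\big[L_{\theta_u} I(S_n > u)\big] \le \exp\Big[-\theta_u u + \sum_{i \le n} H_i(\theta_u)\Big],$$

and

$$E_{P_{\theta_u}}\big[L_{\theta_u}^2 I(S_n > u)\big] \le \exp\Big[-2\theta_u u + 2\sum_{i \le n} H_i(\theta_u)\Big].$$

Since $\theta_u \sim \lambda^*$ and each $H_i(\theta_u) = o(u)$, the result follows.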
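For the α > 1 case, the step asserting that the limiting bound equals $-\lambda^*$ can be spelled out as follows. This worked computation assumes $\lambda^* = \big(\sum_{i \le k} \tilde\lambda_i^{-1/(\alpha-1)}\big)^{-(\alpha-1)}$, which is consistent with the bound on $\lambda^*$ used in the proof of Lemma 8.5; the definition itself is given with Theorem 3.2, outside this section.

```latex
\begin{aligned}
\sum_{i\le k}\tilde\lambda_i^{-1/(\alpha-1)} &= (\lambda^*)^{-1/(\alpha-1)}, \\
-\alpha\lambda^* + (\alpha-1)\,(\lambda^*)^{\alpha/(\alpha-1)}\sum_{i\le k}\tilde\lambda_i^{-1/(\alpha-1)}
  &= -\alpha\lambda^* + (\alpha-1)\,(\lambda^*)^{\alpha/(\alpha-1)-1/(\alpha-1)} \\
  &= -\alpha\lambda^* + (\alpha-1)\,\lambda^* \;=\; -\lambda^*.
\end{aligned}
```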

PROOF OF THEOREM 3.5. From Theorem 3.2, it follows that when $\tilde{\mathcal P}_j$ is nonempty, then

$$\lim_{u\to\infty} \frac{\log P(L_j > u)}{u^\alpha} = -\lambda_j^*.$$

Otherwise, if $\tilde{\mathcal P}_j$ is empty, then

$$P(L_j > u) \le P\Big(\bigcup_{i \in \mathcal P_j} \{X_i > u/|L_j|\}\Big) \le \sum_{i \in \mathcal P_j} P(X_i > u/|L_j|).$$

Since, due to Assumption 3.1, $\Lambda_i(u/|L_j|)/u^\alpha \to \infty$, it follows that

$$\lim_{u\to\infty} \frac{1}{u^\alpha} \log P(L_j > u) = -\infty. \tag{44}$$

The proof now follows from the inequalities

$$\max_{j \le \tau} P(L_j > u) \le P(T > u) = P\Big(\bigcup_{j \le \tau} \{L_j > u\}\Big) \le \sum_{j \le \tau} P(L_j > u) \le \tau \max_{j \le \tau} P(L_j > u).$$

Taking logarithms in the above inequalities, dividing by $u^\alpha$, and taking limits as u → ∞, the result follows.
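The chain of inequalities that closes this proof holds sample by sample, so it can be observed directly on empirical frequencies. The sketch below uses a hypothetical network of 4 activities and 2 paths with Exp(1) durations; the network, the threshold and all names are assumptions made only for illustration.

```python
import random

# Hypothetical network: 4 activities, 2 paths given as activity index sets.
PATHS = [(0, 1), (0, 2, 3)]

def sample_path_lengths(rng):
    """Draw independent Exp(1) activity durations; return each path's length."""
    x = [rng.expovariate(1.0) for _ in range(4)]
    return [sum(x[i] for i in path) for path in PATHS]

def tail_frequencies(u, num_samples=20_000, seed=1):
    """Empirical frequencies of {L_j > u} for each path j and of {T > u}."""
    rng = random.Random(seed)
    path_hits = [0] * len(PATHS)
    t_hits = 0
    for _ in range(num_samples):
        lengths = sample_path_lengths(rng)
        for j, length in enumerate(lengths):
            path_hits[j] += length > u
        t_hits += max(lengths) > u   # T is the longest path
    return [h / num_samples for h in path_hits], t_hits / num_samples

path_freq, t_freq = tail_frequencies(u=5.0)
tau = len(PATHS)
print(max(path_freq), t_freq, sum(path_freq), tau * max(path_freq))
```

Because the factor τ is fixed, it disappears after taking logarithms and dividing by $u^\alpha$, which is why the tail of T inherits the decay rate of the slowest-decaying path.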



PROOF OF THEOREM 3.8. Note again that

$$E_{P^*}[L^2 I(T > u)] \le \sum_{j \in K} E_{P^*}[L^2 I(L_j > u)].$$

Thus, it suffices to show that

$$\limsup_{u\to\infty} \frac{\log E_{P^*}[L^2 I(L_j > u)]}{u^\alpha} \le -2\lambda_j^*, \tag{45}$$

for all j ≤ τ (recall that $\lambda_j^* = \infty$ if $\tilde{\mathcal P}_j$ is empty). To see this, first consider j such that $\tilde{\mathcal P}_j$ is nonempty. Again consider the expression for L given in (14). This may be upper bounded by

$$\frac{1}{p_j} \exp\Big[-\theta_{u j} \sum_{i \in \mathcal P_j} X_i + \sum_{i \in \mathcal P_j} H_i(\theta_{u j})\Big]. \tag{46}$$

Thus, (45) follows from the arguments used in the proof of Theorem 3.2 specialized to activities in $\mathcal P_j$.

Now consider the case j such that $\tilde{\mathcal P}_j$ is empty. Note that $E_{P^*}[L^2 I(L_j > u)] = E_P[L\, I(L_j > u)]$ and that L is uniformly upper bounded by $1/p_j$ along the sample paths in the set $\{L_j > u\}$. Thus,

$$E_P[L\, I(L_j > u)] \le \frac{1}{p_j} P(L_j > u).$$

The result then follows from (44).

PROOFS FOR SECTION 4: LIGHT TAILS, HAZARD RATE TWISTING

The following lemma is useful in proving Theorem 4.1.

LEMMA 8.4. If $\tilde\Lambda_i$ satisfies (18), then

$$\exp[�_i(\theta)] \le p\Big(\frac{1}{1-\theta}\Big)$$

for 0 < θ < 1, where p(x) is a polynomial in x.

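PROOF. We need to show that

$$\int_{-\infty}^\infty \exp[\theta\tilde\Lambda_i(x)]\,\lambda_i(x) \exp(-\Lambda_i(x))\,dx \tag{47}$$

is upper bounded by a polynomial term in $\frac{1}{1-\theta}$. From (18), we can upper bound this term by

$$\int_{-\infty}^\infty \lambda_i(x) \exp[a_i \max(\log \Lambda_i(x), 1) - (1-\theta)\Lambda_i(x)]\,dx.$$

Now simply set $y = (1-\theta)\Lambda_i(x)$, then $dy = (1-\theta)\lambda_i(x)\,dx$. Note that $\Lambda_i(-\infty) = 0$ and $\Lambda_i(\infty) = \infty$. Thus, the above integral is upper bounded by

$$\frac{1}{(1-\theta)^{a_i+1}} \int_0^\infty \max(\exp[1], y)^{a_i} \exp[-y]\,dy,$$

establishing the result.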
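In the simplest special case, where the twisting function coincides with the hazard function itself, the same substitution $y = (1-\theta)\Lambda_i(x)$ shows that the integral (47) equals exactly $1/(1-\theta)$, a degree-one polynomial in $1/(1-\theta)$. The sketch below checks this numerically for an assumed Weibull-type example with hazard function $x^2$ on x ≥ 0; the example and the function name are illustrative choices, not taken from the paper.

```python
import math

def normalizer(theta, upper=12.0, n=200_000):
    """Trapezoidal approximation of integral (47) when the twisting function
    equals the hazard function itself.

    Assumed example: Lambda(x) = x**2 on x >= 0, so the hazard rate is
    lambda(x) = 2*x and the density is 2*x*exp(-x**2).
    """
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        x = k * h
        w = 0.5 if k in (0, n) else 1.0
        # exp(theta*Lambda) * lambda * exp(-Lambda), with the exponents merged
        total += w * 2.0 * x * math.exp(-(1.0 - theta) * x * x)
    return total * h

results = {theta: normalizer(theta) for theta in (0.2, 0.5, 0.9)}
for theta, value in results.items():
    print(theta, value, 1.0 / (1.0 - theta))
```

When the twisting function is only asymptotically close to the hazard function, condition (18) controls the discrepancy and the bound degrades from $1/(1-\theta)$ to a higher-degree polynomial, as in the lemma.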



PROOF OF THEOREM 4.1. We first establish (21). In view of Lemma 8.1, it suffices to show that

$$\limsup_{u\to\infty} \frac{\log E_{P^*}[L^2 I(S_n > u)]}{u^\alpha} \le -2\lambda^*.$$

Some notation is useful in displaying intermediate results used to prove (21). For M > 0, let

$$A_M = \{S_n > u\} \cap \bigcap_{i > k} \{X_i \le u/M\},$$

and

$$B_M = \bigcup_{i > k} \{X_i > u/M\}.$$

Note that $\{S_n > u\} \subset A_M \cup B_M$. Equation (21) follows easily from Lemmas 8.5 and 8.6 stated below.

To complete the proof of Theorem 4.1 note that

$$E_{P^*}[L^2 I(T > u)] \le \sum_{j \le \tau} E_{P^*}\big[L^2 I(L_j > u)\big]. \tag{48}$$

Thus, it suffices to show that

$$\limsup_{u\to\infty} \frac{\log E_{P^*}\big[L^2 I(L_j > u)\big]}{u^\alpha} \le -2\lambda_j^*.$$

To see this, first consider j such that $\tilde{\mathcal P}_j$ is nonempty. Consider again the expression for L given in (19). This may be bounded from above by

$$\exp\Big[-\theta_u \sum_{i \in \tilde{\mathcal P}_j} \tilde\Lambda_i(X_i) + \sum_{i \in \tilde{\mathcal P}_j} �_i(\theta_u) + o(u^\alpha)\Big] \quad \text{a.s.}, \tag{49}$$

where we use the fact that $\sum_{i \le k,\, i \notin \tilde{\mathcal P}_j} �_i(\theta_u)$ is $\exp[o(u^\alpha)]$ (see Lemma 8.4). The proof follows as in the proof of (21), with the caveat that the additional $\exp[o(u^\alpha)]$ in the likelihood ratio in no way affects that proof.

Now consider the case j such that $\tilde{\mathcal P}_j$ is empty. Note that $E_{P^*}[L^2 I(L_j > u)] = E_P[L\, I(L_j > u)]$. Again, note that L is uniformly upper bounded by an $\exp[o(u^\alpha)]$ term on the set $\{L_j > u\}$. Thus,

$$E_P[L\, I(L_j > u)] \le \exp[o(u^\alpha)]\, P(L_j > u).$$

The result then follows from (44).

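LEMMA 8.5. Under Assumption 3.1, (17) and (18),

$$\lim_{M\to\infty} \limsup_{u\to\infty} \frac{\log E_{P^*}[L^2 I(A_M)]}{u^\alpha} \le -2\lambda^*.$$

LEMMA 8.6. Under Assumption 3.1, (17) and (18),

$$\lim_{M\to\infty} \limsup_{u\to\infty} \frac{\log E_{P^*}[L^2 I(B_M)]}{u^\alpha} \le -2\lambda^*.$$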



PROOF OF LEMMA 8.5. Due to Lemma 8.4, $(1-\theta)\,�_i(\theta) \to 0$ as θ ↑ 1, and hence,

$$\frac{�_i(\theta_u)}{u^\alpha} \to 0 \tag{50}$$

as u → ∞. In the following discussion we assume that u is sufficiently large so that $0 < \theta_u < 1$. Recall display (19) and note that $A_M \subset \big\{\sum_{i=1}^k X_i > u\big(1 - \frac{n-k}{M}\big)\big\}$.

In view of these, it suffices to show that for any positive ε < 1, there exists a constant $\tilde C_\varepsilon$ such that,

$$\exp\Big[-\theta_u \sum_{i=1}^k \tilde\Lambda_i(X_i)\Big]\, I\Big(\sum_{i=1}^k X_i > u\Big(1 - \frac{n-k}{M}\Big)\Big) \le \exp\Big[-\lambda^*(1-\varepsilon)\,\theta_u u^\alpha \Big(1 - \frac{n-k}{M}\Big)^\alpha + \tilde C_\varepsilon\Big] \tag{51}$$

as such. The result then follows by squaring the above inequality, taking expectations on both sides with respect to $P^*$ (note that the right-hand side is deterministic), dividing the logarithm of the resulting value by $u^\alpha$, taking the limit as u → ∞, and letting ε → 0 and M → ∞ in the right-hand side.

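To see (51), note that since $\tilde\Lambda_i$ is non-negative and $\tilde\Lambda_i(x) \sim \tilde\lambda_i x^\alpha$, it follows that for x ≥ 0, for any positive ε < 1, there exists a positive constant $C_\varepsilon$ such that

$$\tilde\Lambda_i(x) \ge \tilde\lambda_i x^\alpha (1-\varepsilon) - C_\varepsilon.$$

Using this, and letting $\mathcal I$ denote the random set of rvs amongst $(X_1, \ldots, X_k)$ that are positive, we can upper bound the left-hand side of (51) with

$$\exp\Big[-\theta_u(1-\varepsilon) \sum_{i \in \mathcal I} \tilde\lambda_i X_i^\alpha + k C_\varepsilon\Big]\, I\Big(\sum_{i=1}^k X_i > u\Big(1 - \frac{n-k}{M}\Big)\Big).$$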

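Using Jensen's inequality, it is easy to infer that

$$\sum_{i \in \mathcal I} \tilde\lambda_i x_i^\alpha \ge \frac{1}{\Big(\sum_{i \in \mathcal I} 1/\tilde\lambda_i^{1/(\alpha-1)}\Big)^{\alpha-1}} \Big(\sum_{i \in \mathcal I} x_i\Big)^\alpha.$$

To see this, consider a rv that takes value $\tilde\lambda_j^{1/(\alpha-1)} x_j$ with probability $\big(1/\tilde\lambda_j^{1/(\alpha-1)}\big) \big/ \sum_{i \in \mathcal I} 1/\tilde\lambda_i^{1/(\alpha-1)}$ for $j \in \mathcal I$, and use the fact that the α-th moment of a non-negative rv is greater than or equal to the first moment raised to the power α for α ≥ 1. Note that

$$\frac{1}{\Big(\sum_{i \in \mathcal I} 1/\tilde\lambda_i^{1/(\alpha-1)}\Big)^{\alpha-1}} \ge \lambda^*.$$

Thus, the left-hand side of (51) may be upper bounded by

$$\exp\Big[-\theta_u(1-\varepsilon)\lambda^* \Big(\sum_{i=1}^k X_i\Big)^\alpha + k C_\varepsilon\Big]\, I\Big(\sum_{i=1}^k X_i > u\Big(1 - \frac{n-k}{M}\Big)\Big),$$

and (51) follows.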
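The Jensen step used in this proof is easy to probe numerically. The sketch below draws random positive coefficients and non-negative points (all values, the exponent α = 1.5 and the function names are illustrative assumptions) and also checks the equality case, which occurs when each $x_i$ is proportional to $\tilde\lambda_i^{-1/(\alpha-1)}$.

```python
import random

def lower_bound(lams, xs, alpha):
    """(sum_i x_i)^alpha / (sum_i lam_i^(-1/(alpha-1)))^(alpha-1)."""
    weight_sum = sum(lam ** (-1.0 / (alpha - 1.0)) for lam in lams)
    return sum(xs) ** alpha / weight_sum ** (alpha - 1.0)

def weighted_power_sum(lams, xs, alpha):
    """sum_i lam_i * x_i^alpha."""
    return sum(lam * x ** alpha for lam, x in zip(lams, xs))

rng = random.Random(7)
ALPHA = 1.5
worst_ratio = float("inf")
for _ in range(10_000):
    lams = [rng.uniform(0.1, 5.0) for _ in range(4)]
    xs = [rng.uniform(0.0, 10.0) for _ in range(4)]
    ratio = weighted_power_sum(lams, xs, ALPHA) / lower_bound(lams, xs, ALPHA)
    worst_ratio = min(worst_ratio, ratio)

# Equality case: x_i proportional to lam_i^(-1/(alpha-1)).
lams_eq = [0.5, 1.0, 2.0]
xs_eq = [lam ** (-1.0 / (ALPHA - 1.0)) for lam in lams_eq]
equality_ratio = (weighted_power_sum(lams_eq, xs_eq, ALPHA)
                  / lower_bound(lams_eq, xs_eq, ALPHA))
print(worst_ratio, equality_ratio)
```

The equality case corresponds to the most likely way for the sum to become large, which is why the constant in the bound matches $\lambda^*$ when the index set is the full set {1, ..., k}.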



PROOF OF LEMMA 8.6. Note that $E_{P^*}[L^2 I(B_M)] = E_P[L\, I(B_M)]$. From (50), and the fact that $\tilde\Lambda(x) \ge 0$ for all x in (19), it follows that $L \le \exp[o(u^\alpha)]$ as such.

Thus, $E_P[L\, I(B_M)] \le \exp[o(u^\alpha)]\, P(B_M)$. It thus suffices to show that

$$\lim_{M\to\infty} \limsup_{u\to\infty} \frac{\log P(B_M)}{u^\alpha} \le -2\lambda^*.$$

Note that

$$P\Big(\bigcup_{i > k} \{X_i > u/M\}\Big) \le \sum_{i > k} P(X_i > u/M)$$

and $P(X_i > u/M) = \exp(-\Lambda_i(u/M))$. Hence, it suffices to show that for i > k

$$\lim_{M\to\infty} \liminf_{u\to\infty} \frac{\Lambda_i(u/M)}{u^\alpha} \ge 2\lambda^*.$$

From Assumption 3.1, for any M, $\Lambda_i(u/M)/(u/M)^\alpha$ increases to infinity. Hence, $\liminf_{u\to\infty} \Lambda_i(u/M)/u^\alpha \ge 2\lambda^*$ and the result follows.

PROOFS FOR SECTION 5: HEAVY TAILS

PROOF OF THEOREM 5.2. Note that,

$$P(T > u) \ge P\Big(\bigcup_{i \le n} \{X_i > u\}\Big) \ge \sum_{i \le n} P(X_i > u) - \sum_{i, j,\, i \ne j} P(X_i > u)\,P(X_j > u).$$

Hence, dividing both sides by $P(X_1 > u)$, it is easily seen that

$$\liminf_{u\to\infty} \frac{P(T > u)}{P(X_1 > u)} \ge \sum_{i=1}^k \gamma_i.$$

Also note that,

$$P(T > u) \le P(S_n > u),$$

so from (22), it follows that

$$\limsup_{u\to\infty} \frac{P(T > u)}{P(X_1 > u)} \le \sum_{i=1}^k \gamma_i.$$
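The lower bound in this proof is a Bonferroni-type inequality, and for independent variables both sides can be computed exactly. The sketch below assumes Pareto-type tails $(1+x)^{-a}$ with hypothetical tail indices; the first two variables share the heaviest tail, so their ratios to $P(X_1 > u)$ tend to 1 while the third tends to 0, and the ratio of the union probability to $P(X_1 > u)$ should approach 2.

```python
import math

def pareto_tail(x, shape):
    """P(X > x) = (1 + x)^(-shape) for x >= 0 (an assumed heavy-tailed example)."""
    return (1.0 + x) ** (-shape)

def union_probability(tails):
    """Exact P(at least one X_i > u) for independent X_i."""
    return 1.0 - math.prod(1.0 - p for p in tails)

def bonferroni_lower_bound(tails):
    """sum_i p_i minus the sum over ordered pairs (i, j), i != j, of p_i * p_j."""
    total = sum(tails)
    return total - (total * total - sum(p * p for p in tails))

SHAPES = [1.5, 1.5, 2.5]  # hypothetical tail indices
rows = []
for u in (10.0, 100.0, 1000.0):
    tails = [pareto_tail(u, a) for a in SHAPES]
    rows.append((u, bonferroni_lower_bound(tails), union_probability(tails), tails[0]))
    print(rows[-1])
```

The pairwise correction is of smaller order than the single-variable tails, so it vanishes after dividing by $P(X_1 > u)$ and letting u grow.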

PROOF OF THEOREM 5.4. Note that the likelihood ratio of P with respect to $P^*$ equals

$$L = \exp\Big[-\theta_u \sum_{i=1}^n \tilde\Lambda_i(X_i) + \sum_{i=1}^n �_i(\theta_u)\Big]. \tag{52}$$

In view of (26) and the fact that each $\tilde\Lambda_i$ is non-negative, it follows that there exists a constant $D_\varepsilon > 0$ such that

$$(1-\varepsilon)\tilde\Lambda_1(x) - D_\varepsilon \le \tilde\Lambda_i(x),$$

for all i and x. Using this, for u sufficiently large, L may be upper bounded by

$$\exp\Big[n D_\varepsilon - \theta_u(1-\varepsilon) \sum_{i=1}^n \tilde\Lambda_1(X_i) + \sum_{i=1}^n �_i(\theta_u)\Big].$$

In particular, for $u \ge u_\varepsilon$, using (27), noting that $\tilde\Lambda_1(S_n) \ge \tilde\Lambda_1(u)$ when $S_n > u$ for u sufficiently large, it follows that

$$L^2 I(T > u) \le L^2 I(S_n > u) \le \exp\Big[2n D_\varepsilon - 2\theta_u(1-\varepsilon)\tilde\Lambda_1(u) + 2\theta_u\varepsilon(1-\varepsilon) + 2\sum_{i=1}^n �_i(\theta_u)\Big].$$

In particular, the logarithm of the right-hand side upper bounds $\log E_{P^*}[L^2 I(T > u)]$. The result now follows by noting that due to (25) and Lemma 8.4, $�_i(\theta_u)$ is bounded from above by $\log p[(\tilde\Lambda_1(u))]$ (since $\tilde\Lambda_1(u) = 1/(1-\theta_u)$) and so due to (24)

$$\frac{�_i(\theta_u)}{\Lambda_1(u)} \to 0$$

for each i, $\theta_u < 1$, and that ε is arbitrary.
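A small simulation can illustrate the estimator analysed in this proof. The sketch below makes several assumptions that are not in the text: the summands are i.i.d. with Pareto-type tail $(1+x)^{-2}$, the twisting function is taken equal to the hazard function itself (so the normalizing term for each variable is exactly $\log(1/(1-\theta))$), and only the sum $S_n$ with n = 2 is estimated rather than a network maximum. The twisting parameter follows the relation quoted in the proof, that the hazard function at u equals $1/(1-\theta_u)$.

```python
import math
import random

SHAPE = 2.0  # assumed Pareto-type tail: P(X > x) = (1 + x)**(-SHAPE), x >= 0

def hazard(x):
    """Hazard function Lambda(x) = -log P(X > x)."""
    return SHAPE * math.log1p(x)

def estimate_sum_tail(n, u, num_samples, seed=0):
    """Hazard-rate-twisting estimate of P(X_1 + ... + X_n > u).

    Twisted tail: exp(-(1 - theta) * Lambda(x)); per-variable likelihood
    ratio: exp(-theta * Lambda(x)) / (1 - theta).
    """
    rng = random.Random(seed)
    theta = 1.0 - 1.0 / hazard(u)          # so that Lambda(u) = 1 / (1 - theta)
    twisted_shape = SHAPE * (1.0 - theta)
    log_norm = -n * math.log(1.0 - theta)
    total = 0.0
    for _ in range(num_samples):
        # Inverse-transform sampling from the twisted (heavier) Pareto tail.
        xs = [(1.0 - rng.random()) ** (-1.0 / twisted_shape) - 1.0 for _ in range(n)]
        if sum(xs) > u:
            total += math.exp(-theta * sum(hazard(x) for x in xs) + log_norm)
    return total / num_samples

N_VARS, U = 2, 1000.0
single_tail = (1.0 + U) ** (-SHAPE)
max_tail = 1.0 - (1.0 - single_tail) ** N_VARS   # exact P(max_i X_i > u), a lower bound
estimate = estimate_sum_tail(N_VARS, U, num_samples=200_000)
print(max_tail, estimate)
```

For subexponential summands the tail of the sum is asymptotically that of the maximum, so the estimate should sit close to `max_tail`; the twisted law makes the rare event frequent while keeping the likelihood ratio small on it.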


Received February 2004; revised December 2005 and August 2006; accepted October 2006

