
Monte-Carlo Methods for the Estimation of Rare Event Probabilities

    Kevin Leder

    December 2, 2011


    Outline

    1 Introduction

    2 Importance Sampling

    3 Splitting Method

    4 Jackson Network


    Introduction

    Estimation of Small Probabilities via Monte Carlo

Why try to estimate the probability of rare events? Aren't they just 0?

Phrased in terms of probabilities and random variables: there is a random variable Z and a set A such that P(Z \in A) \approx 0.

It is useful to embed the rare event into a sequence of rare events and study asymptotic properties, i.e., consider the events \{Z_n \in A\}.

We are interested in the setting where the probabilities decay exponentially, i.e. there exists a \gamma > 0 such that

\lim_{n \to \infty} \frac{1}{n} \log P(Z_n \in A) = -\gamma < 0.


Estimating Rare Event Probabilities via Standard Monte Carlo

Suppose we are interested in estimating P(Z_n \in A) for some fixed n.

For a large integer k, draw i.i.d. copies (Z_n^1, \ldots, Z_n^k), then form the estimator

\hat p_{n,k} = \frac{1}{k} \sum_{j=1}^k 1_A(Z_n^j),

which is unbiased and consistent.

Consider, though, the relative error of the estimator:

\mathrm{RE}(\hat p_{n,k}) = \frac{\mathrm{sd}(\hat p_{n,k})}{E[\hat p_{n,k}]} = \sqrt{\frac{1 - P(Z_n \in A)}{k\, P(Z_n \in A)}} \approx \frac{1}{\sqrt{k\, P(Z_n \in A)}}.

The number of replications k has to grow like 1/P(Z_n \in A) to keep the relative error bounded.
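A minimal sketch of the standard estimator (my own illustration, not from the slides), for the toy event \{Z_n/n \ge a\} with standard normal increments; all names and parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_mc(n=50, a=0.5, k=100_000):
    """Standard Monte Carlo estimate of P(Z_n/n >= a) for N(0,1) increments."""
    Zn = rng.standard_normal((k, n)).sum(axis=1)   # k i.i.d. copies of Z_n
    hits = (Zn / n >= a).astype(float)             # 1_A(Z_n^j)
    p_hat = hits.mean()                            # unbiased, consistent
    # sample relative error, sd(p_hat)/E[p_hat] ~ 1/sqrt(k * P(Z_n/n >= a))
    re = np.sqrt((1 - p_hat) / (k * p_hat)) if p_hat > 0 else np.inf
    return p_hat, re

print(naive_mc())   # p is about 2e-4 here, so roughly 20 hits in 100k draws
```

Here P(Z_n/n \ge a) = P(N(0,1) \ge a\sqrt{n}) \approx 2 \times 10^{-4}; pushing a up to 0.8 drops the probability below 10^{-8}, and the same k typically returns zero hits, which is exactly the failure mode described above.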


    Two Solutions

Importance sampling: simulate the system under alternative dynamics so that the event of interest is no longer rare. Keep track of the likelihood ratio so that you can renormalize the final answer to create an unbiased estimator.

Particle based methods: simulate many correlated copies of the system under the original dynamics; these methods can be viewed as a type of branching random walk.

Importance Sampling

For estimating p_n = P(Z_n \in A), first construct a new sampling measure Q, then form the estimator by averaging independent replications of

\hat p_n = \frac{dP}{dQ}(Z_n)\, 1_A(Z_n),

where Z_n is sampled according to the measure Q.

Judge the performance of \hat p_n via its variance (or, equivalently, its 2nd moment):

E_Q[\hat p_n^2] = E[\hat p_n],

where the right-hand expectation is under P.

In order to control the relative error we would like strong efficiency:

\sup_n \frac{E_Q[\hat p_n^2]}{p_n^2} < \infty.


    Importance Sampling for Random Walks

Consider estimating p_n(A) = P(Z_n/n \in A), where Z_n = X_1 + \cdots + X_n and \{X_i\} is an i.i.d. sequence of d-dimensional random vectors that satisfy

\psi(\theta) = \log E[e^{\langle \theta, X_1 \rangle}] < \infty

for \theta in a neighborhood of the origin. Assume that E[X_1] \notin A.

Create a sampling measure by an exponential tilt of each increment X_i: for each \theta \in R^d define a sampling measure

Q_\theta(X_1 \in dx_1, \ldots, X_n \in dx_n) = \frac{e^{\langle \theta, x_1 \rangle}}{e^{\psi(\theta)}} P(X_1 \in dx_1) \cdots \frac{e^{\langle \theta, x_n \rangle}}{e^{\psi(\theta)}} P(X_n \in dx_n).

For this change of measure, denote our estimator of p_n(A) by \hat p_n(A, \theta); the 2nd moment of a single replication of the estimator is

E_{Q_\theta}[\hat p_n(A, \theta)^2] = E\left[1_{\{Z_n/n \in A\}}\, e^{n(\psi(\theta) - \langle \theta, Z_n/n \rangle)}\right].

Study the asymptotic properties of the 2nd moment via large deviations theory.
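A sketch of the tilted estimator in the simplest case (my illustration: standard normal increments, so \psi(\theta) = \theta^2/2, the tilted increment law is N(\theta, 1), and for A = [a, \infty) the tilt \theta = a solves \psi'(\theta) = a):

```python
import numpy as np

rng = np.random.default_rng(1)

def tilted_is(n=50, a=0.5, k=100_000):
    """Exponentially tilted IS estimate of P(Z_n/n >= a), N(0,1) increments."""
    theta, psi = a, a * a / 2.0
    X = rng.normal(loc=theta, scale=1.0, size=(k, n))  # increments under Q_theta
    Zn = X.sum(axis=1)
    # likelihood ratio dP/dQ = exp(-theta * Z_n + n * psi(theta))
    lr = np.exp(-theta * Zn + n * psi)
    est = lr * (Zn / n >= a)                           # one replication each
    p_hat = est.mean()
    re = est.std(ddof=1) / (np.sqrt(k) * p_hat)
    return p_hat, re

print(tilted_is())
```

With the same k as the naive sketch earlier, the relative error here is orders of magnitude smaller, and in this example it stays controlled as a (and hence the rarity of the event) grows.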


    Large Deviations Principle

A sequence of random variables \{Z_n\} taking values in a Polish space \mathcal{X} satisfies a large deviations principle (LDP) with rate function I : \mathcal{X} \to [0, \infty] if

1 I has compact level sets

2 For every Borel set A \subset \mathcal{X},

-\inf_{x \in A^\circ} I(x) \le \liminf_{n \to \infty} \frac{1}{n} \log P(Z_n \in A) \le \limsup_{n \to \infty} \frac{1}{n} \log P(Z_n \in A) \le -\inf_{x \in \bar A} I(x).

A useful alternative formulation: for any bounded and continuous f : \mathcal{X} \to R the following holds:

\lim_{n \to \infty} \frac{1}{n} \log E\left[e^{-n f(Z_n)}\right] = -\inf_{x \in \mathcal{X}} [f(x) + I(x)].


    Large Deviations for Random Walks

Suppose that Z_n = X_1 + \cdots + X_n for an i.i.d. sequence \{X_i\} of d-dimensional vectors that satisfy

\psi(\theta) = \log E[e^{\langle \theta, X_1 \rangle}] < \infty

for \theta in a neighborhood of the origin.

Then Z_n/n satisfies an LDP with rate function (Cramér's theorem)

I(\beta) = \sup_{\theta \in R^d} [\langle \theta, \beta \rangle - \psi(\theta)].

In the 1-d setting, if a > E[X_1] then Cramér's theorem gives that

P(Z_n > na) = e^{-n(I(a) + o(1))},

where \inf_{x \ge a} I(x) = I(a) = a\theta_a - \psi(\theta_a) and \theta_a solves \psi'(\theta_a) = a.
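For concreteness, a standard worked instance (not on the slide): for Gaussian increments X_i \sim N(\mu, \sigma^2),

\psi(\theta) = \mu\theta + \tfrac{1}{2}\sigma^2\theta^2, \qquad \theta_a = \frac{a - \mu}{\sigma^2}, \qquad I(a) = a\theta_a - \psi(\theta_a) = \frac{(a - \mu)^2}{2\sigma^2},

so that P(Z_n > na) = e^{-n\left((a - \mu)^2/(2\sigma^2) + o(1)\right)} for a > \mu.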


    Using Large Deviations for Importance Sampling

Using Cramér's theorem and the alternative formulation of the LDP, we can approximate the 2nd moment of the IS estimator for large n:

\frac{1}{n} \log E\left[1_{\{Z_n/n \in A\}}\, e^{n(\psi(\theta) - \langle \theta, Z_n/n \rangle)}\right] \approx -\inf_{x \in A} \left[I(x) - \psi(\theta) + \langle \theta, x \rangle\right].

The goal of importance sampling is to minimize variance, which gives the following max-min problem:

\sup_{\theta \in R^d} \inf_{x \in A} \left[I(x) - \psi(\theta) + \langle \theta, x \rangle\right].

If A is convex then

\sup_{\theta \in R^d} \inf_{x \in A} \left[I(x) - \psi(\theta) + \langle \theta, x \rangle\right] = \inf_{x \in A} \sup_{\theta \in R^d} \left[I(x) - \psi(\theta) + \langle \theta, x \rangle\right] = 2 \inf_{x \in A} I(x).

Which \theta \in R^d to use? Let x^* = \arg\inf_{x \in A} I(x); then we use the change of measure defined by the tilt \theta_{x^*}, which is the solution of \nabla\psi(\theta) = x^*.

If A is convex, this is a logarithmically efficient estimator.
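In practice \theta_{x^*} rarely has a closed form and is found numerically from \nabla\psi(\theta) = x^*. A small sketch (my example: centered exponential increments X = E - 1 with E \sim \mathrm{Exp}(1), for which \psi(\theta) = -\theta - \log(1 - \theta) on \theta < 1):

```python
import numpy as np
from scipy.optimize import brentq

# psi and psi' for centered exponential increments X = E - 1, E ~ Exp(1)
psi = lambda th: -th - np.log1p(-th)          # valid for th < 1
dpsi = lambda th: -1.0 + 1.0 / (1.0 - th)     # psi'(theta)

def tilt_for(x_star):
    """Solve psi'(theta) = x_star to get the tilt theta_{x*}."""
    return brentq(lambda th: dpsi(th) - x_star, -50.0, 1.0 - 1e-12)

theta_star = tilt_for(0.5)                    # closed form: x*/(1 + x*) = 1/3
I_x = theta_star * 0.5 - psi(theta_star)      # rate I(x*) via Legendre transform
print(theta_star, I_x)
```

The bracketing interval just needs to straddle the root; for these increments \psi'(\theta) = x^* has the closed-form solution \theta = x^*/(1 + x^*), which the solver reproduces.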


    Importance of Convexity

Glasserman and Wang (97) consider the problem of estimating P(|Z_n| > 1.5n), where the increments are X_i = A_i - B_i with A_i \sim N(1.5, 1) and B_i \sim \mathrm{Exp}(1).

If the set were convex then we would find x^* = \arg\inf_{x : |x| > 1.5} I(x) = 1.5, then use the change of measure based on \theta_{1.5}, i.e.,

\frac{dP}{dQ}(x_1) = e^{-\theta_{1.5} x_1 + \psi(\theta_{1.5})}.

However, by pretending the target set is convex we end up with a terrible estimator:

\limsup_{n \to \infty} e^{2n I(1.5)}\, E[\hat p_n(A, \theta_{1.5})^2] = \infty.
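A runnable sketch of this flawed scheme (my construction from the slide's setup; tilting X = A - B by \theta factorizes, so that under Q_\theta we have A \sim N(1.5 + \theta, 1) and B \sim \mathrm{Exp}(1 + \theta)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Increments X = A - B, A ~ N(1.5, 1), B ~ Exp(1):
# psi(theta) = 1.5*theta + theta^2/2 - log(1 + theta), theta > -1
psi = lambda th: 1.5 * th + th**2 / 2 - np.log1p(th)
theta = (np.sqrt(5) - 1) / 2            # solves psi'(theta) = 1.5

def convex_tilt_is(n=60, k=50_000):
    """IS for P(|Z_n| > 1.5 n) using the single tilt theta_{1.5} (flawed)."""
    A = rng.normal(1.5 + theta, 1.0, size=(k, n))
    B = rng.exponential(1.0 / (1.0 + theta), size=(k, n))  # rate 1 + theta
    Zn = (A - B).sum(axis=1)
    lr = np.exp(-theta * Zn + n * psi(theta))   # dP/dQ
    est = lr * (np.abs(Zn) > 1.5 * n)
    return est.mean(), est.std(ddof=1) / (np.sqrt(k) * est.mean())

print(convex_tilt_is())
```

Runs of this sketch typically report a deceptively small empirical relative error: paths ending near -1.5n are almost never sampled under Q_\theta, and it is exactly those unseen rogue paths, which carry enormous likelihood ratios, that make the true second moment blow up.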


    What went wrong?

[Figure: sample paths under the sampling measure: typical paths versus rare rogue paths.]


    A procedure for non-convex A

Dupuis and Wang showed that for non-convex A, logarithmic efficiency requires state-dependent changes of measure.

Suppose that A = A_1 \cup \cdots \cup A_m, where the A_j are closed convex sets. Define y_j = \arg\inf_{y \in A_j} I(y) and \theta_j := \theta_{y_j}. Then a logarithmically efficient change of measure is given by using the transition kernel

Q(X_i \in dx_i \mid Z_{i-1} = z) = \sum_{j=1}^m r_i^j(z)\, e^{\langle \theta_j, x_i \rangle - \psi(\theta_j)}\, P(X_i \in dx_i).

The state-dependent mixture probabilities are described as follows:

r_i^j(z) = \frac{w_i^j(z)}{\sum_{k=1}^m w_i^k(z)},

where

w_i^k(z) = \exp\left[ n\langle \theta_k, z/n - y_k \rangle + (n - i)\psi(\theta_k) \right].
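A sketch of this mixture scheme in one dimension (my illustration: N(0,1) increments, A = (-\infty, -a] \cup [a, \infty), so y_1 = a, y_2 = -a, \theta_j = y_j, \psi(\theta) = \theta^2/2, with the mixture weights as reconstructed in the display above):

```python
import numpy as np

rng = np.random.default_rng(3)

def dw_mixture_is(n=40, a=1.0, k=20_000):
    """State-dependent mixture IS for P(|Z_n|/n >= a), N(0,1) increments."""
    thetas = np.array([a, -a])           # tilts theta_1, theta_2
    ys = np.array([a, -a])               # minimizers y_1, y_2
    psis = thetas**2 / 2                 # psi(theta_j)
    est = np.empty(k)
    for rep in range(k):
        z, log_lr = 0.0, 0.0             # running sum Z_{i-1} and log dP/dQ
        for i in range(1, n + 1):
            logw = thetas * (z - n * ys) + (n - i) * psis
            r = np.exp(logw - logw.max()); r /= r.sum()   # mixture probs r_i^j(z)
            j = rng.choice(2, p=r)                        # pick a component
            x = rng.normal(thetas[j], 1.0)                # tilted increment
            # one-step dQ/dP(x) = sum_j r_j exp(theta_j x - psi_j)
            log_lr -= np.log(np.sum(r * np.exp(thetas * x - psis)))
            z += x
        est[rep] = np.exp(log_lr) * (abs(z) / n >= a)
    p_hat = est.mean()
    return p_hat, est.std(ddof=1) / (np.sqrt(k) * p_hat)

print(dw_mixture_is())
```

Unlike the single-tilt scheme above, the mixture covers both components of A: whichever way a path drifts, one mixture component dominates the one-step density ratio and the likelihood ratio stays controlled.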


    Importance Sampling in Finance

Many option pricing problems can be viewed as rare-event calculations. The option only has value on a small set of the sample space, so its expected value is dominated by values on a rare set.

Glasserman et al. (99) looked at using importance sampling as a computational tool for pricing a variety of path-dependent options. In the setting of a concave payoff function, they present a logarithmically efficient procedure for pricing options.

Guasoni and Robertson extended the framework of the Glasserman paper to a continuous-time setting and establish that the optimal change of measure in continuous time can be found by solving an Euler-Lagrange equation. They assume that the payoff function is concave.

Several works by Glasserman have considered the use of importance sampling for estimating value at risk and conditional value at risk.

Dupuis and Wang show that under very weak conditions on the payoff functional, adaptive importance sampling can be used to evaluate option prices with logarithmic efficiency.

Particle Based Methods

    Splitting Method

Will focus on a specific particle method called the splitting method, first developed in Villen-Altamirano and Villen-Altamirano (94), who called it RESTART.

Dean and Dupuis (08) presented a procedure for the construction of efficient and stable splitting schemes. Will follow their notation.

Model problem: X^n is a sequence of stochastic processes on a domain D \subset R^d, with two disjoint sets A and B; define the sequence of stopping times \tau_n = \min\{i : X^n(i) \in A \cup B\}.

Goal: estimate the probabilities

p_n(x) = P\left(X^n(\tau_n) \in B \mid X^n(0) = x\right).

Assume that there is a non-negative measurable function L such that

-\lim_{n \to \infty} \frac{1}{n} \log p_n(x) = \inf\left\{ \int_0^t L(\phi(s), \dot\phi(s))\, ds : \phi(0) = x,\ \phi(t) \in B,\ \phi(s) \in A^c \text{ for all } s \le t \right\},

the infimum being over absolutely continuous paths \phi and times t > 0.


    The Splitting Algorithm

Consider a collection of nested sets B = C_n(0) \subset C_n(1) \subset \cdots \subset C_n(M_n).

1 Initiate the simulation procedure with a single particle starting from a position x \in C_n(k) for some k \ge 1. Let w_1 = 1 be the initial weight associated to the particle.

2 Evolve the initial particle according to the original transition kernel until it either hits A (dies) or hits level C_n(k-1). If it hits C_n(k-1), it is replaced by r identical particles (r > 1). The weight of each descendant particle is the weight of the parent particle times 1/r.

3 The procedure from step 2 is replicated for each descendant particle, carrying over the value of the weights at each level for the surviving particles.

4 Steps 2 and 3 are repeated until all particles have either died or reached level C_n(0) = B.
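A self-contained sketch of steps 1-4 on a toy instance (my construction, not Dean and Dupuis's implementation): a \pm 1 random walk with downward drift, A = \{0\}, B = \{n\}, one level per integer height, and splitting at first passage of each new height, so a successful lineage splits n - 2 times and the estimator is N_n / r^{n-2}:

```python
import numpy as np

rng = np.random.default_rng(4)

def splitting_run(n=12, x0=1, p_up=1/3, r=2):
    """One splitting run estimating P(walk hits n before 0 | start at x0).

    A particle splits into r copies the first time it exceeds its previous
    best height; copies inherit that height, so each level splits once per
    lineage. Exact answer here (gambler's ruin): 1/4095, about 2.44e-4.
    """
    stack = [(x0, x0)]            # (position, best height credited so far)
    hits = 0
    while stack:
        x, best = stack.pop()
        while True:
            x += 1 if rng.random() < p_up else -1
            if x == 0:            # particle dies in A
                break
            if x == n:            # particle reaches B
                hits += 1
                break
            if x > best:          # first crossing of a new level: split
                best = x
                stack.extend([(x, best)] * (r - 1))
    return hits / r ** (n - 2)    # each split multiplies the weight by 1/r

runs = np.array([splitting_run() for _ in range(400)])
print(runs.mean(), runs.std(ddof=1) / (np.sqrt(len(runs)) * runs.mean()))
```

Here r = 2 roughly matches the reciprocal of the per-level crossing probability (about 1/2 for p_up = 1/3), which keeps the particle population near critical, the stability condition discussed in the analysis slide below.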


Splitting Method

[Figure: nested level sets C_n(2) \supset C_n(1) \supset C_n(0) = B, with a particle trajectory started at x splitting at each level on its way from A toward B.]


    The Splitting Estimator

Consider a collection of nested sets B = C_n(0) \subset C_n(1) \subset \cdots \subset C_n(M_n) (note: we will want M_n = c\,n for some c > 0).

The nested sets are based on level sets of an importance function U. Specifically, define L_z = \{y \in D : U(y) \le z\}; then

C_n(j) = L_{(j-1)/n}.

An important function is the level function

\ell_n(y) = \min\{j \ge 0 : y \in C_n(j)\}.

The estimator for p_n(x) is

R_n(x) = N_n(x) / r^{\ell_n(x)},

where N_n(x) is the number of particles that made it to B.


    Analysis of Splitting Estimators

For numerical stability we want E[N_n(x)] \approx r^{\ell_n(x)} p_n(x) to grow subexponentially, i.e. r^{\ell_n(x)} p_n(x) = \exp(o(n)).

For a logarithmically optimal 2nd moment we require that r^{\ell_n(x)} = p_n(x)^{-1} \exp(o(n)).

Suppose we have a function W(x) such that

p_n(x) = \exp\left(-n W(x) + o(n)\right);

then it suffices to establish that

\ell_n(x) \log r - n W(x) = o(n).

It is easy to see that \ell_n(x) \approx n\,U(x); therefore we choose our importance function as U(x) = W(x)/\log(r).

See Dean and Dupuis for details.


Performance Comparison: Overflow in Jackson Networks


    Open Jackson Networks

Consider a network of d stations. Customers arrive to the network with arrival rate vector \lambda = (\lambda_1, \ldots, \lambda_d)^T, and the service rates of the d stations are encoded by \mu = (\mu_1, \ldots, \mu_d)^T.

A job that leaves station i joins station j with probability P_{i,j} and leaves the system with probability

P_{i,0} = 1 - \sum_{j=1}^d P_{i,j};

P is called the routing matrix.

We are interested in stable open Jackson networks, that is:

i) for every i, either \lambda_i > 0 or \lambda_{j_1} P_{j_1 j_2} \cdots P_{j_k i} > 0 for some j_1, \ldots, j_k;

ii) for every i, either P_{i0} > 0 or P_{i j_1} P_{j_1 j_2} \cdots P_{j_k 0} > 0 for some j_1, \ldots, j_k;

iii) the network is stable (i.e. a stationary distribution exists).


    Basic Properties of Jackson Networks

Assume without loss of generality that \sum_{j=1}^d (\lambda_j + \mu_j) = 1.

Under the stability assumption, the system of traffic equations

\bar\lambda_i = \lambda_i + \sum_{j=1}^d \bar\lambda_j P_{ji}, \qquad i = 1, 2, \ldots, d,

has a unique solution \bar\lambda^T = \lambda^T (I - P)^{-1}.

The traffic intensity at station i in equilibrium is given by \rho_i = \bar\lambda_i / \mu_i \in (0, 1).

Define \rho^* = \max_{1 \le i \le d} \rho_i, and then set \ell = |\{i : \rho_i = \rho^*\}|.

Study the system through the embedded discrete-time Markov chain Q = \{Q(k) : k \ge 0\}, where Q(k) = (Q_1(k), \ldots, Q_d(k)) and Q_i(k) represents the number of customers at station i immediately after the kth transition.
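A quick sketch of these computations (all rates hypothetical, chosen so that \sum_j (\lambda_j + \mu_j) = 1):

```python
import numpy as np

# 3-station example: external arrival rates, service rates, routing matrix
lam = np.array([0.10, 0.05, 0.00])
mu  = np.array([0.30, 0.25, 0.30])
P   = np.array([[0.0, 0.6, 0.2],
                [0.0, 0.0, 0.8],
                [0.1, 0.0, 0.0]])   # row sums < 1; leftover mass exits

# traffic equations: lam_bar = lam + P^T lam_bar  =>  (I - P^T) lam_bar = lam
lam_bar = np.linalg.solve(np.eye(3) - P.T, lam)
rho = lam_bar / mu                  # traffic intensities; stability: rho_i < 1
rho_star = rho.max()                # bottleneck intensity rho*
ell = int((rho == rho_star).sum())  # number of bottleneck stations
print(lam_bar, rho, rho_star, ell)
```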


    Overflow Probabilities in Jackson Networks

Consider a subset of stations encoded by a vector v \in \{0,1\}^d, and denote the total population in this subset by N_v(x) = \langle x, v \rangle.

Will be interested in the following probability:

p_n^V = P\{\text{the total population in the stations encoded by } v \text{ reaches } n \text{ before returning to } 0 \text{, starting from } 0\}.

Can also define p_n^V via the stopping times

T_{\{x\}} := \inf\{k \ge 1 : Q(k) = x\}, \qquad T_n^V := \inf\{k \ge 1 : N_v(Q(k)) \ge n\}.

If we define P_x(\cdot) := P(\cdot \mid Q(0) = x), then

p_n^V = P_0\left(T_n^V < T_{\{0\}}\right),

or more generally

p_n^V(x) = P_x\left(T_n^V < T_{\{0\}}\right).


    Dynamics of Q

The queue length process is just a state-dependent random walk:

Q(k+1) = Q(k) + \pi(Q(k), Y(k+1)),

where \pi is a reflection function that prevents the queue-length process from taking negative values.

The noise term Y(k) represents the outcome of the next transition and has the following pmf:

P(Y(k) = w) = \begin{cases} \lambda_i & w = e_i \quad (\text{arrival at station } i), \\ \mu_i P_{ij} & w = e_j - e_i \quad (\text{departure at station } i \text{ goes to station } j), \\ \mu_i P_{i0} & w = -e_i \quad (\text{departure at station } i \text{ leaves the system}). \end{cases}
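A minimal sketch of one transition of the embedded chain under this description (my construction; lam, mu, P as in the earlier sketch, normalized so the event probabilities sum to 1):

```python
import numpy as np

rng = np.random.default_rng(5)

def step(q, lam, mu, P):
    """One transition: Q(k+1) = Q(k) + pi(Q(k), Y(k+1)).

    Y = e_i w.p. lam_i (arrival), Y = e_j - e_i w.p. mu_i P_ij (routed
    departure), Y = -e_i w.p. mu_i P_i0 (departure leaves the system).
    The reflection pi turns service events at empty stations into null moves.
    """
    q, d, u = q.copy(), len(lam), rng.random()
    for i in range(d):                   # arrival at station i
        u -= lam[i]
        if u < 0:
            q[i] += 1
            return q
    for i in range(d):                   # service completion at station i
        u -= mu[i]
        if u < 0:
            if q[i] > 0:                 # if empty: reflection, null move
                q[i] -= 1
                dest = rng.choice(d + 1, p=np.append(P[i], 1 - P[i].sum()))
                if dest < d:
                    q[dest] += 1         # routed to station dest; else exits
            return q
    return q                             # unreachable when rates sum to 1

# q1 = step(np.zeros(3, dtype=int), lam, mu, P)   # with the earlier lam, mu, P
```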


Logarithmic Asymptotics of Overflow Probabilities in Jackson Networks I

Large deviations theory dictates the existence of a function W^V with

p_n^V(x/n) = \exp\left(-n W^V(x/n) + o(n)\right).

By looking at Q/n we have the following via a formal Taylor expansion:

1 = \frac{1}{p_n^V(x/n)} E\left[p_n^V\left(x/n + \tfrac{1}{n}\pi(x/n, Y(1))\right)\right]
\approx E\exp\left\{-n W^V\left(x/n + \tfrac{1}{n}\pi(x/n, Y(1))\right) + n W^V(x/n)\right\}
= E\exp\left\{-\nabla W^V(x/n)^T \pi(x/n, Y(1)) + o(1)\right\}
= \exp\left(H(x/n, -\nabla W^V(x/n)) + o(1)\right),

where H(x, \theta) = \log E\exp\left(\theta^T \pi(x, Y(k))\right).


Logarithmic Asymptotics of Overflow Probabilities in Jackson Networks II

In order to characterize the logarithmic asymptotics of p_n^V we need to find a function W^V that satisfies

H(x/n, -\nabla W^V(x/n)) = 0,

or, for an asymptotic logarithmic upper bound, a W^V that satisfies

H(x/n, -\nabla W^V(x/n)) \le 0.

A function that satisfies this condition is

W^V(x/n) = \langle \gamma, x/n \rangle - \log \rho_V,

where \gamma_i = \log \rho_i and \rho_V = \max\{\rho_i : v_i = 1\}.

We build our splitting scheme out of this function, i.e. the importance function is given by U(x/n) = W^V(x/n)/\log(r).
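A small sketch of this importance function and the induced level count \ell_n(x) \approx n\,U(x/n) (my illustration, reusing the hypothetical intensities from the earlier sketch; v marks the tracked stations):

```python
import numpy as np

def importance_and_level(x, n, rho, v, r=2):
    """U(x/n) = W^V(x/n)/log r and the level function l_n(x) ~ n U(x/n)."""
    gamma = np.log(rho)                   # gamma_i = log rho_i
    rho_V = rho[v.astype(bool)].max()     # bottleneck intensity among v
    U = (gamma @ (x / n) - np.log(rho_V)) / np.log(r)
    return U, n * U

rho = np.array([0.372, 0.468, 0.386])     # intensities from the earlier sketch
v = np.array([1, 1, 0])                   # track stations 1 and 2
print(importance_and_level(x=np.zeros(3), n=40, rho=rho, v=v))
```

Starting from an empty network, roughly n\,U(0) = n(-\log \rho_V)/\log r levels separate the start from the overflow set, consistent with M_n = c\,n on the earlier splitting slide.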


Logarithmically Efficient Estimation of Overflow Probabilities

Dean and Dupuis established that if we use the importance function U, then the splitting estimator for p_n^V(x) is logarithmically efficient, and the number of particles created grows subexponentially in n.

Similarly, Dupuis and Wang (09) established that using subsolutions to the PDE from the previous slide, one can construct logarithmically efficient IS estimators for overflow probabilities in Jackson networks.

How do we then evaluate the relative merits of the two algorithms? This requires refined knowledge of performance characteristics, not just at the logarithmic scale.


Asymptotics of Overflow Probabilities in Jackson Networks

The stationary distribution of a Jackson network is product-form:

\pi(m_1, \ldots, m_d) = \prod_{j=1}^d P(Q_j(\infty) = m_j) = \prod_{j=1}^d (1 - \rho_j)\rho_j^{m_j}, \qquad m_j \ge 0,\ j = 1, \ldots, d.

One can use this result and a time-reversal argument to show that if x is in a compact set then there exist k_0 and k_1 such that

\limsup_{n \to \infty} \frac{p_n^V(x)}{e^{n\gamma_V} n^{\ell_V - 1}} \le k_1 \qquad \text{and} \qquad \liminf_{n \to \infty} \frac{p_n^V(x)}{e^{n\gamma_V} n^{\ell_V - 1}} \ge k_0,

where \gamma_V = \log \rho_V, in which \rho_V = \max\{\rho_i : v_i = 1\}, and \ell_V = \sum_i I\{\rho_i = \rho_V,\ v_i = 1\}. See Blanchet (11) or Blanchet, Leder, Shi (11).


Computational Effort for a Single Run of the Splitting Algorithm

In Blanchet, Leder, and Shi (11) we looked at the computational effort necessary to use a well designed splitting algorithm.

Define C = -\log \rho_V / \log r; then rewrite the importance function and level function as

U(x/n) = C\left(1 - \frac{\langle \gamma, x \rangle}{n \log \rho_V}\right), \qquad \ell_n(x) = C\left(n - \frac{\langle \gamma, x \rangle}{\log \rho_V}\right).

Consider the total number of particles that make it to the overflow set; one can see that

E[N_n(x)] = r^{\ell_n(x)} p_n^V(x) \le c\, e^{n\gamma_V} n^{\ell_V - 1} r^{\ell_n(x)}.

Notice that e^{\gamma_V} = e^{\log \rho_V} = e^{-C \log r} = r^{-C}, so that

E[N_n(x)] \le c\, n^{\ell_V - 1} r^{\ell_n(x) - Cn},

and if we assume that x/n \to 0 then we have that E[N_n(x)] \le c\, n^{\ell_V - 1}.


    Refined Performance of Splitting

From the previous slide we saw that the number of particles to survive is of order n^{\ell_V - 1}; the actual computational effort is on the order of n^{\ell_V + 1}, as established in Blanchet, Leder and Shi (11).

The computational effort required to achieve a fixed level of relative error is given by

C_n\, \frac{E[R_n(x)^2]}{p_n^V(x)^2},

where C_n is the computational cost per replication of the estimator, i.e. roughly n^{\ell_V + 1}.

In Blanchet, Leder, Shi (11) we establish that

E[R_n(x)^2] = p_n^V(x)^2\, O\left(n^{\ell_V}\right).

Thus the computational cost of a well designed splitting algorithm is O\left(n^{2\ell_V + 1}\right).


    Importance Sampling for Tandem Jackson Network

Dupuis, Sezer, and Wang considered estimating total population overflow in a d-node tandem network using a sampling measure defined by

\frac{Q(Y(k) = z \mid Q(k-1) = x)}{P(Y(k) = z \mid Q(k-1) = x)} = \sum_{j=0}^d r_j(x/n) \exp\left(\langle \theta^j, z \rangle - H(x/n, \theta^j)\right),

where

r_j(x/n) = \frac{w_j(x/n)}{\sum_{k=0}^d w_k(x/n)}, \qquad w_j(x/n) = \exp\left(n\langle \theta^j, x/n \rangle + n\gamma + jn\delta\right),

with \delta > 0 a mollification parameter of the scheme, and

(\theta^j)_i = \begin{cases} \gamma & 1 \le i \le d - j, \\ 0 & \text{otherwise}, \end{cases}

where \gamma = \log \rho is the tilt associated with the bottleneck intensity \rho.

Dupuis, Sezer and Wang established that this estimator is logarithmically efficient.

Call the associated estimator \hat p_n.


    Refined Performance of Importance Sampling

In Blanchet, Glynn and Leder (11) we performed a refined analysis of this estimator to compare it with splitting and other methods.

We know that the cost of the algorithm is roughly

\frac{E[\hat p_n^2]}{e^{2n\gamma} n^{2\ell - 2}},

since p_n^2 is of order e^{2n\gamma} n^{2\ell - 2}, where \ell is the number of bottleneck stations.

By direct analysis of the likelihood ratio on the event of interest, we are able to establish that

E[\hat p_n^2] = O\left(e^{2n\gamma} n^{2d}\right).

Thus the computational complexity of this algorithm is O\left(n^{2(d - \ell + 1)}\right).


Comparing Performance on Estimating Overflow Probabilities in Tandem Networks

The computational cost of the splitting algorithm is O(n^{2\ell + 1}).

The computational cost of the importance sampling algorithm is O(n^{2(d - \ell + 1)}), where \ell is the number of bottleneck stations.

Thus we prefer importance sampling if more than half the stations are bottlenecks, and splitting otherwise.

Conjecture: this property holds for all Jackson networks, not just the tandem network topology.