
Introduction to Rare Event Simulation
Brown University: Summer School on Rare Event Simulation

Jose Blanchet

Columbia University. Department of Statistics and Department of IEOR.

Goals / questions to address in this lecture

Motivate the need for rare event simulation by introducing several problems.

Why do we care about rare event simulation?

How does one measure the efficiency of rare event simulation algorithms?

Describe basic rare event simulation techniques.


Why are Rare Events Difficult to Assess?

Typically there are no closed forms.

But crudely implemented simulation might not be good.

[Figure: dart-board illustration; the red target region has probability on the order of 1/10.]


Why are Rare Events Difficult to Assess?

Relative mean squared error (MSE) per trial = stdev / mean:

\[
\frac{\sqrt{P(\mathrm{RED})\,(1 - P(\mathrm{RED}))}}{P(\mathrm{RED})} \approx \frac{1}{\sqrt{P(\mathrm{RED})}}.
\]

Moral of the story: naive Monte Carlo needs

\[
N = O\!\left(\frac{1}{P(\text{Rare Event})}\right)
\]

replications to obtain a given relative accuracy.

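A minimal sketch (not from the slides) of this scaling in Python: naive Monte Carlo estimation of P(X > c) for X ~ N(0, 1), where the relative error per trial grows like 1/sqrt(P(A)) as c increases.

```python
# Naive Monte Carlo for P(X > c), X ~ N(0, 1): the relative error per
# trial, sqrt(p(1-p))/p, blows up like 1/sqrt(p) as the event gets rarer.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.standard_normal(n)
for c in (1.0, 2.0, 3.0):                 # larger c => rarer event
    p_hat = (x > c).mean()
    rel_err = np.sqrt(p_hat * (1 - p_hat)) / p_hat
    print(f"c={c}: p_hat={p_hat:.2e}, relative error per trial={rel_err:.1f}")
```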

Introduction to Rare Event Simulation

Rare event simulation = techniques for efficient estimation of rare-event probabilities AND conditional expectations given such rare events.

Arises in settings such as:

1. Finance and insurance,
2. Queueing networks (operations and communications),
3. Environmental and physical sciences,
4. Statistical inference and combinatorics,
5. Stochastic optimization, ...


Insurance Reserve Setup

Vi's i.i.d. (independent and identically distributed) claim sizes.

τi's i.i.d. inter-arrival times.

An = time of the n-th arrival.

Constant premium p.

Reserve process:

\[
R(t) = b + p\,t - \sum_{j=1}^{N(t)} V_j,
\]

where N(t) = number of arrivals up to time t.

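A minimal simulation sketch of this reserve process evaluated at arrival times; the exponential claim and inter-arrival distributions and all parameter values below are illustrative assumptions, not from the slides.

```python
# Simulate the reserve at arrival times, R(A_n) = b + sum_{i<=n} (p tau_i - V_i).
import numpy as np

rng = np.random.default_rng(1)

def reserve_at_arrivals(b=10.0, p=1.2, claim_mean=1.0, arrival_rate=1.0, n=200):
    """Return R(A_1), ..., R(A_n), the reserve evaluated at the arrival times."""
    tau = rng.exponential(1.0 / arrival_rate, size=n)  # inter-arrival times
    V = rng.exponential(claim_mean, size=n)            # claim sizes (assumed Exp)
    return b + np.cumsum(p * tau - V)                  # b + Y_1 + ... + Y_n

path = reserve_at_arrivals()
print("ruined within 200 claims:", (path < 0).any())
```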

Insurance Reserve Plot

[Figure: sample path of the risk reserve R(t) against t, starting at level b, increasing at rate p between arrivals, and jumping down by the claim sizes at the arrival times A1, A2, A3.]

Insurance Reserve and Random Walks

Evaluating the reserve at arrival times we get a random walk: suppose Y1, Y2, ... are i.i.d. and

\[
S(n) = b + Y_1 + \dots + Y_n.
\]

Then R(An) = S(n) is the reserve at arrival times, with Yn = p τn − Vn.

[Figure: the reserve path R(t) sampled at the arrival times A1, A2, A3, viewed as a random walk.]

Question: What is the chance that inf{R(t) : t ≥ 0} < 0? And how does this happen?


How Does Ruin Occur with Light-Tailed Increments?

Yi's are Gaussian with EYi = 1 and unit variance.

Random walk conditioned on ruin.

Light tails: exponential, Gamma, Gaussian, mixtures of these, etc.

Picture generated with Siegmund's (1976) algorithm.

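A minimal sketch of the exponential-tilting idea behind Siegmund's (1976) algorithm, assuming (as above) Gaussian increments with mean 1 and unit variance: the nontrivial root of H(θ) = θ + θ²/2 = 0 is θ* = −2, so under the tilted measure the walk has drift −1 and ruin happens almost surely.

```python
# Ruin probability of the random walk b + Y_1 + ... + Y_n with Y ~ N(1, 1):
# the root of H(theta) = theta + theta^2/2 = 0 is theta* = -2, so the tilted
# increments are N(-1, 1) and the tilted walk is ruined almost surely; the
# likelihood ratio at the ruin time is exp(-theta* U_tau) = exp(2 U_tau).
import numpy as np

rng = np.random.default_rng(2)

def ruin_prob_siegmund(b=5.0, n_rep=10_000):
    theta_star = -2.0
    est = np.empty(n_rep)
    for k in range(n_rep):
        u = 0.0                              # U_n = Y_1 + ... + Y_n
        while u >= -b:                       # run the tilted walk until ruin
            u += rng.normal(-1.0, 1.0)       # tilted increment N(1 + theta*, 1)
        est[k] = np.exp(-theta_star * u)     # likelihood ratio at the ruin time
    return est.mean(), est.std() / np.sqrt(n_rep)

p_hat, se = ruin_prob_siegmund()
print(f"P(ruin from b=5) ~ {p_hat:.3e} +/- {se:.1e}")
```

Since U_tau < −b at ruin, each estimate is at most exp(−2b), which is why this estimator has bounded relative error.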

Back to Heavy-tailed Insurance

[Figure: two panels, "Ruin with Pareto Claims" (Reserve vs. Time) and "Ruin with Power-law Type Increments" (Random Walk Value vs. Time), both over roughly 200 time steps.]

Yi's are t-distributed, EYi = 1, Var(Yi) = 1, 4 degrees of freedom.

Heavy-tailed random walk conditioned on ruin.

Picture generated with Blanchet and Glynn's (2009) algorithm.


A Two-node Tandem Network

Poisson arrivals, exponential service times, and Markovian routing...


Example: A Simple Stochastic Network

Questions of interest:

How would the system perform IF the TOTAL POPULATION reaches n inside a busy period?

How likely is this event under different parameters / designs?

Naive Benchmark: Crude Monte Carlo

Say λ = .1, µ1 = .5, µ2 = .4, p = .1.

Overflow probability ≈ 10^-7.

Each replication in crude Monte Carlo ≈ 10^-3 secs.

Time to estimate with 10% precision ≈ 10^7 × 10^-3 = 10^4 secs ≈ 2.8 hours.

What if you want to do sensitivity analysis for a wide range of parameters?

What about different network designs?

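The cost arithmetic above, as a one-line sketch in Python (numbers from the slide):

```python
# Back-of-envelope cost of crude Monte Carlo for the overflow example:
# with p ~ 1e-7, the slide's N = 1/p replications at ~1e-3 s each.
p = 1e-7
sec_per_rep = 1e-3
n_reps = 1 / p                      # ~1e7 replications
total_sec = n_reps * sec_per_rep    # ~1e4 seconds
print(f"{n_reps:.0e} replications -> {total_sec:.0f} s ~ {total_sec / 3600:.1f} h")
```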

Rare Events in Networks

Question: How is the TOTAL CONTENT OF THE SYSTEM most likely to reach a given level in an operation cycle?

Say λ = .2, µ1 = .3, µ2 = .5... How can the total content reach 50?

Naive Monte Carlo takes ≈ 115 days.

Each picture below took ≈ .01 seconds.


Rare Events in Networks

λ = .1, µ1 = .3, µ2 = .5

[Figure: conditional overflow path in the (X1, X2) plane.]

Rare Events in Networks

λ = .1, µ1 = .5, µ2 = .2

[Figure: conditional overflow path in the (X1, X2) plane.]

Rare Events in Networks

λ = .1, µ1 = .4, µ2 = .4

[Figure: conditional overflow path in the (X1, X2) plane.]

Several Notions of Efficiency

Assume the estimator Z is unbiased: E(Z) = P(A), and we consider the regime P(A) → 0.

Logarithmic efficiency (weak efficiency or asymptotic optimality):

\[
\lim_{P(A) \to 0} \frac{\log E(Z^2)}{2 \log P(A)} = 1.
\]

Bounded relative error:

\[
\frac{E(Z^2)}{P(A)^2} = O(1) \ \text{as} \ P(A) \to 0.
\]


Efficiency of Naive Monte Carlo

If Z = I(A), then E(Z²) = E(Z) = P(A). Therefore

\[
\lim_{P(A) \to 0} \frac{\log P(A)}{2 \log P(A)} = \frac{1}{2}.
\]

How to design efficient rare event simulation estimators? Importance sampling and other variance reduction techniques...


Graphical Interpretation of Importance Sampling

Importance sampling (I.S.): sample from the important region and correct via the likelihood ratio.

RED AREA ≈ PROPORTION OF DARTS IN RED AREA × 1/9.

[Figure: two unit-square dart boards, one with darts spread uniformly and one with darts concentrated on a subregion of area 1/9 covering the red set.]
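A minimal sketch of the dart picture; the geometry below (a small red disk inside the unit square, covered by a box of area 1/9) is an illustrative assumption, not from the slides.

```python
# The "red" set is taken to be a small disk in the unit square. Throwing
# darts uniformly into a covering box of area 1/9 and rescaling by that
# area recovers the red area with far fewer wasted darts.
import numpy as np

rng = np.random.default_rng(3)
center, radius = np.array([0.5, 0.5]), 0.05       # hypothetical red disk
true_area = np.pi * radius**2

n = 100_000
x = rng.uniform(1 / 3, 2 / 3, size=(n, 2))        # uniform on a box of area 1/9
in_red = ((x - center) ** 2).sum(axis=1) < radius**2
print("estimate:", in_red.mean() * (1 / 9), " true:", true_area)
```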

Importance Sampling

Goal: Estimate P(A) > 0.

Choose P̃ and simulate ω from it.

The importance sampling (I.S.) estimator per trial is

\[
Z = L(\omega)\, I(\omega \in A),
\]

where L(ω) is the likelihood ratio, i.e. L(ω) = dP(ω)/dP̃(ω).

NOTE: P̃(·) is called a change of measure.

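A minimal sketch of this estimator in Python: estimating a Gaussian tail probability P(X > c) with the shifted proposal N(c, 1), a choice assumed here for illustration, not prescribed by the slide.

```python
# Estimate P(X > c) for X ~ N(0, 1) by sampling from the shifted proposal
# P~ = N(c, 1) and weighting by the likelihood ratio
# L(x) = dP/dP~ (x) = exp(c^2/2 - c x).
import math
import numpy as np

rng = np.random.default_rng(4)
c, n = 5.0, 100_000
x = rng.normal(c, 1.0, size=n)            # draw from the proposal N(c, 1)
L = np.exp(c * c / 2 - c * x)             # likelihood ratio dP/dP~
z = L * (x > c)                           # estimator per trial: L(w) I(w in A)
exact = math.erfc(c / math.sqrt(2)) / 2   # P(N(0,1) > c)
print(f"IS: {z.mean():.3e}  exact: {exact:.3e}  rel.err/trial: {z.std() / z.mean():.2f}")
```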

Importance Sampling

Suppose we choose P̃(·) = P(·|A). Then

\[
L(\omega)\, I(\omega \in A) = \frac{dP(\omega)}{I(\omega \in A)\, dP(\omega)/P(A)}\, I(\omega \in A) = P(A).
\]

The estimator has zero variance, but requires knowledge of P(A).

Lesson: Try choosing P̃(·) close to P(·|A)!


Large Deviations of the Empirical Mean

Suppose X1, X2, ... are i.i.d. and H(θ) = log E exp(θX) < ∞ for θ in a neighborhood of the origin.

Assume that E(Xi) = 0.

Suppose that for each a > 0, there is θa such that H′(θa) = a.

Consider the problem of estimating via simulation

\[
P(S(n) > na),
\]

where S(n) = X1 + ... + Xn.


Exponential Tilting

Suppose that Xi has density f(·) and define

\[
f_\theta(x) = \exp(\theta x - H(\theta))\, f(x).
\]

Note that

\[
\begin{aligned}
\log E_\theta(\exp(\eta X)) &= \log \int f_\theta(x) \exp(\eta x)\, dx \\
&= \log \int f(x) \exp((\eta + \theta)x - H(\theta))\, dx \\
&= H(\eta + \theta) - H(\theta).
\end{aligned}
\]

Therefore E_θ(X) = H′(θ) and Var_θ(X) = H″(θ).

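A minimal numerical check of these identities (the Exp(1) example is assumed for its closed form, not taken from the slides): the tilted density of Exp(1) is again exponential.

```python
# For X ~ Exp(1): H(theta) = -log(1 - theta) for theta < 1, and the tilted
# density f_theta is Exp(1 - theta), so E_theta(X) = H'(theta) = 1/(1-theta)
# and Var_theta(X) = H''(theta) = 1/(1-theta)^2.
import numpy as np

rng = np.random.default_rng(5)
theta = 0.5
x = rng.exponential(1.0 / (1.0 - theta), size=100_000)  # sample from f_theta
print("sample mean:", x.mean(), " H'(theta):", 1 / (1 - theta))
print("sample var :", x.var(), " H''(theta):", 1 / (1 - theta) ** 2)
```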

Exponential Tilting in Action

Cramér's theorem (strong version, sketch):

\[
\begin{aligned}
P(S_n > na) &= \int f(x_1) \cdots f(x_n)\, I(s_n > na)\, dx_{1:n} \\
&= \int_{\{s_n > na\}} f_{\theta_a}(x_1) \cdots f_{\theta_a}(x_n)\, \exp(-\theta_a s_n + n H(\theta_a))\, dx_{1:n} \\
&= \exp(-n[a\theta_a - H(\theta_a)])\, E_{\theta_a}[\exp(-\theta_a (S_n - na))\, I(S_n > na)] \\
&= \exp(-n I(a))\, E_{\theta_a}[\exp(-\theta_a (S_n - na))\, I(S_n > na)].
\end{aligned}
\]

Use the CLT and approximate (S_n − na) by N(0, n H″(θ_a)) to obtain

\[
E_{\theta_a}[\exp(-\theta_a (S_n - na))\, I(S_n > na)] \approx \frac{1}{n^{1/2}\, (2\pi H''(\theta_a))^{1/2}\, \theta_a}.
\]


Why is Exponential Tilting Useful?

Theorem. Under the assumptions imposed,

\[
P(X_1 \in dx_1 \mid S_n > na) \to P_{\theta_a}(X_1 \in dx_1).
\]

In simple words, exponential tilting approximates the zero-variance (optimal!) change of measure!

Proof.

\[
P(X_1 \in dx_1 \mid S_n > na) = \frac{P(S_{n-1} > na - x_1)}{P(S_n > na)}\, f(x_1)\, dx_1.
\]

Now apply Cramér's theorem with a ←− (na − x1)/(n − 1) and n ←− n − 1 in the numerator and simplify, observing that

\[
\theta\!\left(\frac{na - x_1}{n - 1}\right) = \theta\!\left(a + \frac{a - x_1}{n - 1}\right) \approx \theta(a) + \frac{1}{H''(\theta_a)} \cdot \frac{a - x_1}{n - 1}.
\]

First Asymptotically Optimal Rare Event Simulation Estimator

Recall the identity used in Cramér's theorem:

\[
\begin{aligned}
P(S_n > na) &= \int f(x_1) \cdots f(x_n)\, I(s_n > na)\, dx_{1:n} \\
&= \int_{\{s_n > na\}} f_{\theta_a}(x_1) \cdots f_{\theta_a}(x_n)\, \exp(-\theta_a s_n + n H(\theta_a))\, dx_{1:n} \\
&= E_{\theta_a}[\exp(-\theta_a S_n + n H(\theta_a))\, I(S_n > na)].
\end{aligned}
\]

IMPORTANCE SAMPLING ESTIMATOR:

\[
Z = \exp(-\theta_a S_n + n H(\theta_a))\, I(S_n > na),
\]

with X1, ..., Xn simulated using the density f_{θ_a}(·).

Let us now verify that Z is asymptotically efficient!

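A minimal sketch of this estimator for standard Gaussian increments (a case assumed here because the tilt and the target probability are both explicit):

```python
# Z = exp(-theta_a * S_n + n H(theta_a)) I(S_n > na) with X_i ~ N(0, 1),
# so H(theta) = theta^2 / 2 and theta_a = a; under the tilt, S_n ~ N(na, n),
# and the target P(S_n > na) = P(N(0, n) > na) is known exactly.
import math
import numpy as np

rng = np.random.default_rng(6)
n, a, n_rep = 100, 0.5, 100_000
theta_a, H_a = a, a * a / 2

S = rng.normal(n * a, math.sqrt(n), size=n_rep)         # S_n under f_theta_a
Z = np.exp(-theta_a * S + n * H_a) * (S > n * a)
exact = math.erfc(a * math.sqrt(n) / math.sqrt(2)) / 2  # P(N(0, n) > na)
print(f"IS: {Z.mean():.3e}  exact: {exact:.3e}  rel.err/trial: {Z.std() / Z.mean():.2f}")
```

Here the target probability is about 3e-7, yet a modest number of tilted replications already gives a small relative error, in line with the efficiency computation on the next slide.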

First Asymptotically Optimal Rare Event Simulation Estimator

IMPORTANCE SAMPLING ESTIMATOR:

\[
Z = \exp(-\theta_a S_n + n H(\theta_a))\, I(S_n > na),
\]

with X1, ..., Xn simulated using the density f_{θ_a}(·). Therefore,

\[
\begin{aligned}
E_{\theta_a}(Z^2) &= E_{\theta_a}[\exp(-2\theta_a S_n + 2n H(\theta_a))\, I(S_n > na)] \\
&= \exp(-2n I(a))\, E_{\theta_a}[\exp(-2\theta_a (S_n - na))\, I(S_n > na)].
\end{aligned}
\]

Thus

\[
\frac{\log E_{\theta_a}(Z^2)}{2 \log P(S_n > na)} = \frac{2n I(a)}{2n I(a)} + o(1) \to 1.
\]


How to Simulate under Exponential Tilting

Key is to identify H_θ(η) = H(η + θ) − H(θ).

Example 1: Gaussian. If X ∼ N(µ, 1), then

\[
H(\theta) = \theta^2/2 + \mu\theta, \qquad H_\theta(\eta) = \eta^2/2 + (\theta + \mu)\eta.
\]

Thus, under f_θ, X ∼ N(µ + θ, 1).

Example 2: If X ∼ Poisson(λ), then

\[
H(\theta) = \lambda(\exp(\theta) - 1), \qquad H_\theta(\eta) = \lambda \exp(\theta) \cdot (\exp(\eta) - 1).
\]

Thus, under f_θ, X ∼ Poisson(λ exp(θ)).

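A minimal sketch of the two recipes above in code: in both families, simulating under f_θ is just simulating from the same family with shifted parameters.

```python
# Tilted samplers for the two slide examples: N(mu, 1) tilts to
# N(mu + theta, 1), and Poisson(lam) tilts to Poisson(lam * exp(theta)).
import numpy as np

rng = np.random.default_rng(7)

def tilted_gaussian(mu, theta, size):
    return rng.normal(mu + theta, 1.0, size)        # X ~ N(mu + theta, 1) under f_theta

def tilted_poisson(lam, theta, size):
    return rng.poisson(lam * np.exp(theta), size)   # X ~ Poisson(lam * e^theta) under f_theta

print(tilted_gaussian(0.0, 1.5, 5))
print(tilted_poisson(2.0, 0.5, 5))
```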

Conclusions

1. Rare event simulation is a powerful numerical tool applicable in a wide range of examples.
2. Importance sampling can be used to construct efficient estimators.
3. The best importance sampler is the conditional distribution given the event of interest.
4. Efficient algorithms come basically in two flavors: logarithmically efficient and strongly efficient.
5. Exponential tilting can be used to approximate the conditional distribution given the event of interest.

