
Introduction to Rare Event Simulation
Brown University: Summer School on Rare Event Simulation

Jose Blanchet

Columbia University. Department of Statistics and Department of IEOR.

Goals / questions to address in this lecture

Motivate the need for rare event simulation by introducing several problems.

Why do we care about rare event simulation?

How does one measure the efficiency of rare event simulation algorithms?

Describe basic rare event simulation techniques.


Why are Rare Events Difficult to Assess?

Typically there are no closed forms.

But crudely implemented simulation might not be good.

[Figure: dart-board illustration; the red target region has probability on the order of 1/10.]


Why are Rare Events Difficult to Assess?

Relative mean squared error (MSE) per trial = stdev / mean:

\[
\frac{\sqrt{P(\mathrm{RED})\,(1 - P(\mathrm{RED}))}}{P(\mathrm{RED})} \approx \frac{1}{\sqrt{P(\mathrm{RED})}}.
\]

Moral of the story: naive Monte Carlo needs

\[
N = O\!\left(\frac{1}{P(\text{Rare Event})}\right)
\]

replications to obtain a given relative accuracy.

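A minimal sketch (not from the slides) of this scaling in Python: naive Monte Carlo estimation of P(X > c) for X ~ N(0, 1), where the relative error per trial grows like 1/sqrt(P(A)) as c increases.

```python
# Naive Monte Carlo for P(X > c), X ~ N(0, 1): the relative error per
# trial, sqrt(p(1-p))/p, blows up like 1/sqrt(p) as the event gets rarer.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.standard_normal(n)
for c in (1.0, 2.0, 3.0):                 # larger c => rarer event
    p_hat = (x > c).mean()
    rel_err = np.sqrt(p_hat * (1 - p_hat)) / p_hat
    print(f"c={c}: p_hat={p_hat:.2e}, relative error per trial={rel_err:.1f}")
```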

Introduction to Rare Event Simulation

Rare event simulation = techniques for efficient estimation of rare-event probabilities AND conditional expectations given such rare events.

Arises in settings such as:

1. Finance and insurance,
2. Queueing networks (operations and communications),
3. Environmental and physical sciences,
4. Statistical inference and combinatorics,
5. Stochastic optimization, ...


Insurance Reserve Setup

Vi's i.i.d. (independent and identically distributed) claim sizes.

τi's i.i.d. inter-arrival times.

An = time of the n-th arrival.

Constant premium p.

Reserve process:

\[
R(t) = b + p\,t - \sum_{j=1}^{N(t)} V_j,
\]

where N(t) = number of arrivals up to time t.

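A minimal simulation sketch of this reserve process evaluated at arrival times; the exponential claim and inter-arrival distributions and all parameter values below are illustrative assumptions, not from the slides.

```python
# Simulate the reserve at arrival times, R(A_n) = b + sum_{i<=n} (p tau_i - V_i).
import numpy as np

rng = np.random.default_rng(1)

def reserve_at_arrivals(b=10.0, p=1.2, claim_mean=1.0, arrival_rate=1.0, n=200):
    """Return R(A_1), ..., R(A_n), the reserve evaluated at the arrival times."""
    tau = rng.exponential(1.0 / arrival_rate, size=n)  # inter-arrival times
    V = rng.exponential(claim_mean, size=n)            # claim sizes (assumed Exp)
    return b + np.cumsum(p * tau - V)                  # b + Y_1 + ... + Y_n

path = reserve_at_arrivals()
print("ruined within 200 claims:", (path < 0).any())
```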

Insurance Reserve Plot

[Figure: sample path of the risk reserve R(t) against t, starting at level b, increasing at rate p between arrivals, and jumping down by the claim sizes at the arrival times A1, A2, A3.]

Insurance Reserve and Random Walks

Evaluating the reserve at arrival times we get a random walk: suppose Y1, Y2, ... are i.i.d. and

\[
S(n) = b + Y_1 + \dots + Y_n.
\]

Then R(An) = S(n) is the reserve at arrival times, with Yn = p τn − Vn.

[Figure: the reserve path R(t) sampled at the arrival times A1, A2, A3, viewed as a random walk.]

Question: What is the chance that inf{R(t) : t ≥ 0} < 0? And how does this happen?


How Does Ruin Occur with Light-Tailed Increments?

Yi's are Gaussian with EYi = 1 and unit variance.

Random walk conditioned on ruin.

Light tails: exponential, Gamma, Gaussian, mixtures of these, etc.

Picture generated with Siegmund's (1976) algorithm.

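A minimal sketch of the exponential-tilting idea behind Siegmund's (1976) algorithm, assuming (as above) Gaussian increments with mean 1 and unit variance: the nontrivial root of H(θ) = θ + θ²/2 = 0 is θ* = −2, so under the tilted measure the walk has drift −1 and ruin happens almost surely.

```python
# Ruin probability of the random walk b + Y_1 + ... + Y_n with Y ~ N(1, 1):
# the root of H(theta) = theta + theta^2/2 = 0 is theta* = -2, so the tilted
# increments are N(-1, 1) and the tilted walk is ruined almost surely; the
# likelihood ratio at the ruin time is exp(-theta* U_tau) = exp(2 U_tau).
import numpy as np

rng = np.random.default_rng(2)

def ruin_prob_siegmund(b=5.0, n_rep=10_000):
    theta_star = -2.0
    est = np.empty(n_rep)
    for k in range(n_rep):
        u = 0.0                              # U_n = Y_1 + ... + Y_n
        while u >= -b:                       # run the tilted walk until ruin
            u += rng.normal(-1.0, 1.0)       # tilted increment N(1 + theta*, 1)
        est[k] = np.exp(-theta_star * u)     # likelihood ratio at the ruin time
    return est.mean(), est.std() / np.sqrt(n_rep)

p_hat, se = ruin_prob_siegmund()
print(f"P(ruin from b=5) ~ {p_hat:.3e} +/- {se:.1e}")
```

Since U_tau < −b at ruin, each estimate is at most exp(−2b), which is why this estimator has bounded relative error.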

Back to Heavy-tailed Insurance

[Figure: two panels, "Ruin with Pareto Claims" (Reserve vs. Time) and "Ruin with Power-law Type Increments" (Random Walk Value vs. Time), both over roughly 200 time steps.]

Yi's are t-distributed, EYi = 1, Var(Yi) = 1, 4 degrees of freedom.

Heavy-tailed random walk conditioned on ruin.

Picture generated with Blanchet and Glynn's (2009) algorithm.


A Two-node Tandem Network

Poisson arrivals, exponential service times, and Markovian routing...


Example: A Simple Stochastic Network

Questions of interest:

How would the system perform IF the TOTAL POPULATION reaches n inside a busy period?

How likely is this event under different parameters / designs?

Naive Benchmark: Crude Monte Carlo

Say λ = .1, µ1 = .5, µ2 = .4, p = .1.

Overflow probability ≈ 10^-7.

Each replication in crude Monte Carlo ≈ 10^-3 secs.

Time to estimate with 10% precision ≈ 10^7 × 10^-3 = 10^4 secs ≈ 2.8 hours.

What if you want to do sensitivity analysis for a wide range of parameters?

What about different network designs?

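The cost arithmetic above, as a one-line sketch in Python (numbers from the slide):

```python
# Back-of-envelope cost of crude Monte Carlo for the overflow example:
# with p ~ 1e-7, the slide's N = 1/p replications at ~1e-3 s each.
p = 1e-7
sec_per_rep = 1e-3
n_reps = 1 / p                      # ~1e7 replications
total_sec = n_reps * sec_per_rep    # ~1e4 seconds
print(f"{n_reps:.0e} replications -> {total_sec:.0f} s ~ {total_sec / 3600:.1f} h")
```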

Rare Events in Networks

Question: How is the TOTAL CONTENT OF THE SYSTEM most likely to reach a given level in an operation cycle?

Say λ = .2, µ1 = .3, µ2 = .5... How can the total content reach 50?

Naive Monte Carlo takes ≈ 115 days.

Each picture below took ≈ .01 seconds.


Rare Events in Networks

λ = .1, µ1 = .3, µ2 = .5

[Figure: conditional overflow path in the (X1, X2) plane.]

Rare Events in Networks

λ = .1, µ1 = .5, µ2 = .2

[Figure: conditional overflow path in the (X1, X2) plane.]

Rare Events in Networks

λ = .1, µ1 = .4, µ2 = .4

[Figure: conditional overflow path in the (X1, X2) plane.]

Several Notions of Efficiency

Assume the estimator Z is unbiased: E(Z) = P(A), and we consider the regime P(A) → 0.

Logarithmic efficiency (weak efficiency or asymptotic optimality):

\[
\lim_{P(A) \to 0} \frac{\log E(Z^2)}{2 \log P(A)} = 1.
\]

Bounded relative error:

\[
\frac{E(Z^2)}{P(A)^2} = O(1) \ \text{as} \ P(A) \to 0.
\]


Efficiency of Naive Monte Carlo

If Z = I(A), then E(Z²) = E(Z) = P(A). Therefore

\[
\lim_{P(A) \to 0} \frac{\log P(A)}{2 \log P(A)} = \frac{1}{2}.
\]

How to design efficient rare event simulation estimators? Importance sampling and other variance reduction techniques...


Graphical Interpretation of Importance Sampling

Importance sampling (I.S.): sample from the important region and correct via the likelihood ratio.

RED AREA ≈ PROPORTION OF DARTS IN RED AREA × 1/9.

[Figure: two unit-square dart boards, one with darts spread uniformly and one with darts concentrated on a subregion of area 1/9 covering the red set.]
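A minimal sketch of the dart picture; the geometry below (a small red disk inside the unit square, covered by a box of area 1/9) is an illustrative assumption, not from the slides.

```python
# The "red" set is taken to be a small disk in the unit square. Throwing
# darts uniformly into a covering box of area 1/9 and rescaling by that
# area recovers the red area with far fewer wasted darts.
import numpy as np

rng = np.random.default_rng(3)
center, radius = np.array([0.5, 0.5]), 0.05       # hypothetical red disk
true_area = np.pi * radius**2

n = 100_000
x = rng.uniform(1 / 3, 2 / 3, size=(n, 2))        # uniform on a box of area 1/9
in_red = ((x - center) ** 2).sum(axis=1) < radius**2
print("estimate:", in_red.mean() * (1 / 9), " true:", true_area)
```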

Importance Sampling

Goal: Estimate P(A) > 0.

Choose P̃ and simulate ω from it.

The importance sampling (I.S.) estimator per trial is

\[
Z = L(\omega)\, I(\omega \in A),
\]

where L(ω) is the likelihood ratio, i.e. L(ω) = dP(ω)/dP̃(ω).

NOTE: P̃(·) is called a change of measure.

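A minimal sketch of this estimator in Python: estimating a Gaussian tail probability P(X > c) with the shifted proposal N(c, 1), a choice assumed here for illustration, not prescribed by the slide.

```python
# Estimate P(X > c) for X ~ N(0, 1) by sampling from the shifted proposal
# P~ = N(c, 1) and weighting by the likelihood ratio
# L(x) = dP/dP~ (x) = exp(c^2/2 - c x).
import math
import numpy as np

rng = np.random.default_rng(4)
c, n = 5.0, 100_000
x = rng.normal(c, 1.0, size=n)            # draw from the proposal N(c, 1)
L = np.exp(c * c / 2 - c * x)             # likelihood ratio dP/dP~
z = L * (x > c)                           # estimator per trial: L(w) I(w in A)
exact = math.erfc(c / math.sqrt(2)) / 2   # P(N(0,1) > c)
print(f"IS: {z.mean():.3e}  exact: {exact:.3e}  rel.err/trial: {z.std() / z.mean():.2f}")
```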

Importance Sampling

Suppose we choose P̃(·) = P(·|A). Then

\[
L(\omega)\, I(\omega \in A) = \frac{dP(\omega)}{I(\omega \in A)\, dP(\omega)/P(A)}\, I(\omega \in A) = P(A).
\]

The estimator has zero variance, but requires knowledge of P(A).

Lesson: Try choosing P̃(·) close to P(·|A)!


Large Deviations of the Empirical Mean

Suppose X1, X2, ... are i.i.d. and H(θ) = log E exp(θX) < ∞ for θ in a neighborhood of the origin.

Assume that E(Xi) = 0.

Suppose that for each a > 0, there is θa such that H′(θa) = a.

Consider the problem of estimating via simulation

\[
P(S(n) > na),
\]

where S(n) = X1 + ... + Xn.


Exponential Tilting

Suppose that Xi has density f(·) and define

\[
f_\theta(x) = \exp(\theta x - H(\theta))\, f(x).
\]

Note that

\[
\begin{aligned}
\log E_\theta(\exp(\eta X)) &= \log \int f_\theta(x) \exp(\eta x)\, dx \\
&= \log \int f(x) \exp((\eta + \theta)x - H(\theta))\, dx \\
&= H(\eta + \theta) - H(\theta).
\end{aligned}
\]

Therefore E_θ(X) = H′(θ) and Var_θ(X) = H″(θ).

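A minimal numerical check of these identities (the Exp(1) example is assumed for its closed form, not taken from the slides): the tilted density of Exp(1) is again exponential.

```python
# For X ~ Exp(1): H(theta) = -log(1 - theta) for theta < 1, and the tilted
# density f_theta is Exp(1 - theta), so E_theta(X) = H'(theta) = 1/(1-theta)
# and Var_theta(X) = H''(theta) = 1/(1-theta)^2.
import numpy as np

rng = np.random.default_rng(5)
theta = 0.5
x = rng.exponential(1.0 / (1.0 - theta), size=100_000)  # sample from f_theta
print("sample mean:", x.mean(), " H'(theta):", 1 / (1 - theta))
print("sample var :", x.var(), " H''(theta):", 1 / (1 - theta) ** 2)
```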

Exponential Tilting in Action

Cramér's theorem (strong version, sketch):

\[
\begin{aligned}
P(S_n > na) &= \int f(x_1) \cdots f(x_n)\, I(s_n > na)\, dx_{1:n} \\
&= \int_{\{s_n > na\}} f_{\theta_a}(x_1) \cdots f_{\theta_a}(x_n)\, \exp(-\theta_a s_n + n H(\theta_a))\, dx_{1:n} \\
&= \exp(-n[a\theta_a - H(\theta_a)])\, E_{\theta_a}[\exp(-\theta_a (S_n - na))\, I(S_n > na)] \\
&= \exp(-n I(a))\, E_{\theta_a}[\exp(-\theta_a (S_n - na))\, I(S_n > na)].
\end{aligned}
\]

Use the CLT and approximate (S_n − na) by N(0, n H″(θ_a)) to obtain

\[
E_{\theta_a}[\exp(-\theta_a (S_n - na))\, I(S_n > na)] \approx \frac{1}{n^{1/2}\, (2\pi H''(\theta_a))^{1/2}\, \theta_a}.
\]


Why is Exponential Tilting Useful?

Theorem. Under the assumptions imposed,

\[
P(X_1 \in dx_1 \mid S_n > na) \to P_{\theta_a}(X_1 \in dx_1).
\]

In simple words, exponential tilting approximates the zero-variance (optimal!) change of measure!

Proof.

\[
P(X_1 \in dx_1 \mid S_n > na) = \frac{P(S_{n-1} > na - x_1)}{P(S_n > na)}\, f(x_1)\, dx_1.
\]

Now apply Cramér's theorem with a ←− (na − x1)/(n − 1) and n ←− n − 1 in the numerator and simplify, observing that

\[
\theta\!\left(\frac{na - x_1}{n - 1}\right) = \theta\!\left(a + \frac{a - x_1}{n - 1}\right) \approx \theta(a) + \frac{1}{H''(\theta_a)} \cdot \frac{a - x_1}{n - 1}.
\]

First Asymptotically Optimal Rare Event Simulation Estimator

Recall the identity used in Cramér's theorem:

\[
\begin{aligned}
P(S_n > na) &= \int f(x_1) \cdots f(x_n)\, I(s_n > na)\, dx_{1:n} \\
&= \int_{\{s_n > na\}} f_{\theta_a}(x_1) \cdots f_{\theta_a}(x_n)\, \exp(-\theta_a s_n + n H(\theta_a))\, dx_{1:n} \\
&= E_{\theta_a}[\exp(-\theta_a S_n + n H(\theta_a))\, I(S_n > na)].
\end{aligned}
\]

IMPORTANCE SAMPLING ESTIMATOR:

\[
Z = \exp(-\theta_a S_n + n H(\theta_a))\, I(S_n > na),
\]

with X1, ..., Xn simulated using the density f_{θ_a}(·).

Let us now verify that Z is asymptotically efficient!

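A minimal sketch of this estimator for standard Gaussian increments (a case assumed here because the tilt and the target probability are both explicit):

```python
# Z = exp(-theta_a * S_n + n H(theta_a)) I(S_n > na) with X_i ~ N(0, 1),
# so H(theta) = theta^2 / 2 and theta_a = a; under the tilt, S_n ~ N(na, n),
# and the target P(S_n > na) = P(N(0, n) > na) is known exactly.
import math
import numpy as np

rng = np.random.default_rng(6)
n, a, n_rep = 100, 0.5, 100_000
theta_a, H_a = a, a * a / 2

S = rng.normal(n * a, math.sqrt(n), size=n_rep)         # S_n under f_theta_a
Z = np.exp(-theta_a * S + n * H_a) * (S > n * a)
exact = math.erfc(a * math.sqrt(n) / math.sqrt(2)) / 2  # P(N(0, n) > na)
print(f"IS: {Z.mean():.3e}  exact: {exact:.3e}  rel.err/trial: {Z.std() / Z.mean():.2f}")
```

Here the target probability is about 3e-7, yet a modest number of tilted replications already gives a small relative error, in line with the efficiency computation on the next slide.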

First Asymptotically Optimal Rare Event Simulation Estimator

IMPORTANCE SAMPLING ESTIMATOR:

\[
Z = \exp(-\theta_a S_n + n H(\theta_a))\, I(S_n > na),
\]

with X1, ..., Xn simulated using the density f_{θ_a}(·). Therefore,

\[
\begin{aligned}
E_{\theta_a}(Z^2) &= E_{\theta_a}[\exp(-2\theta_a S_n + 2n H(\theta_a))\, I(S_n > na)] \\
&= \exp(-2n I(a))\, E_{\theta_a}[\exp(-2\theta_a (S_n - na))\, I(S_n > na)].
\end{aligned}
\]

Thus

\[
\frac{\log E_{\theta_a}(Z^2)}{2 \log P(S_n > na)} = \frac{2n I(a)}{2n I(a)} + o(1) \to 1.
\]


How to Simulate under Exponential Tilting

Key is to identify H_θ(η) = H(η + θ) − H(θ).

Example 1: Gaussian. If X ∼ N(µ, 1), then

\[
H(\theta) = \theta^2/2 + \mu\theta, \qquad H_\theta(\eta) = \eta^2/2 + (\theta + \mu)\eta.
\]

Thus, under f_θ, X ∼ N(µ + θ, 1).

Example 2: If X ∼ Poisson(λ), then

\[
H(\theta) = \lambda(\exp(\theta) - 1), \qquad H_\theta(\eta) = \lambda \exp(\theta) \cdot (\exp(\eta) - 1).
\]

Thus, under f_θ, X ∼ Poisson(λ exp(θ)).

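A minimal sketch of the two recipes above in code: in both families, simulating under f_θ is just simulating from the same family with shifted parameters.

```python
# Tilted samplers for the two slide examples: N(mu, 1) tilts to
# N(mu + theta, 1), and Poisson(lam) tilts to Poisson(lam * exp(theta)).
import numpy as np

rng = np.random.default_rng(7)

def tilted_gaussian(mu, theta, size):
    return rng.normal(mu + theta, 1.0, size)        # X ~ N(mu + theta, 1) under f_theta

def tilted_poisson(lam, theta, size):
    return rng.poisson(lam * np.exp(theta), size)   # X ~ Poisson(lam * e^theta) under f_theta

print(tilted_gaussian(0.0, 1.5, 5))
print(tilted_poisson(2.0, 0.5, 5))
```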

Conclusions

1. Rare event simulation is a powerful numerical tool applicable in a wide range of examples.
2. Importance sampling can be used to construct efficient estimators.
3. The best importance sampler is the conditional distribution given the event of interest.
4. Efficient algorithms come basically in two flavors: logarithmically efficient and strongly efficient.
5. Exponential tilting can be used to approximate the conditional distribution given the event of interest.

