
Semi-Parametric Importance Sampling for Rare-Event Probability Estimation

Z. I. Botev and P. L’Ecuyer

IMACS Seminar 2011

Borovets, Bulgaria


Outline

Formulation of the importance sampling problem.

Background material on currently suggested adaptive importance sampling methods.

The proposed methodology.

Numerical example taken from recent work by Asmussen, Blanchet, Juneja, and Rojas-Nandayapa: tail probabilities of sums of correlated lognormals.

Conclusions.


Problem formulation

The problem is to estimate high-dimensional integrals of the form

ℓ = ∫ f(x) H(x) dx = E_f H(X),

where H : ℝ^d → ℝ and X is a d-dimensional random variable with pdf f.

For discrete counting or combinatorial problems we simply replace the integral by a sum.

How do we estimate such integrals via Monte Carlo?
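As a baseline, the crude Monte Carlo answer is simply to average H over draws from f. A minimal sketch follows; the sampler sample_f and the function H are hypothetical placeholders for the problem-specific ingredients.

    import numpy as np

    def crude_monte_carlo(sample_f, H, m, rng=None):
        """Crude Monte Carlo: draw X_1, ..., X_m iid from f and average H(X_k).
        Returns the estimate of ell and its estimated standard error."""
        rng = np.random.default_rng() if rng is None else rng
        values = np.array([H(sample_f(rng)) for _ in range(m)])
        return values.mean(), values.std(ddof=1) / np.sqrt(m)

When H is the indicator of a rare event, the relative error of this estimator grows without bound as ℓ → 0, which is what motivates the importance sampling discussed next.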


Existing methods

Importance sampling: Let g be an importance sampling density such that g(x) = 0 ⇒ H(x) f(x) = 0 for all x.

Generate X_1, …, X_m iid ∼ g; then an unbiased estimator of ℓ is

ℓ̂ = (1/m) ∑_{k=1}^m Z_k,   with   Z_k = H(X_k) f(X_k) / g(X_k).

The minimum-variance importance sampling density is

π(x) = H(x) f(x) / ℓ    (H(x) > 0),

which depends on ℓ and is therefore not useful.
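A minimal sketch of this estimator; sample_g, pdf_g, pdf_f, and H are hypothetical placeholders, and any proposal g satisfying the support condition above can be plugged in.

    import numpy as np

    def importance_sampling(sample_g, pdf_g, pdf_f, H, m, rng=None):
        """Unbiased IS estimator: average of Z_k = H(X_k) f(X_k) / g(X_k), X_k iid ~ g.
        Requires g(x) = 0  =>  H(x) f(x) = 0, as stated above."""
        rng = np.random.default_rng() if rng is None else rng
        z = np.empty(m)
        for k in range(m):
            x = sample_g(rng)
            z[k] = H(x) * pdf_f(x) / pdf_g(x)
        return z.mean(), z.std(ddof=1) / np.sqrt(m)  # estimate and its standard error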


Existing importance sampling ideas

Existing methods for selecting the density g assume that g is part of a parametric family {g(·; η)}.

The objective is to select the parameter η so that g is as “close” to the optimal density π as possible.

The closeness between π and g is measured by the φ-divergence

∫ π(x) φ( g(x; η) / π(x) ) dx,    (1)

where φ : ℝ₊ → ℝ is twice continuously differentiable, φ(1) = 0, and φ″(x) > 0 for all x > 0.


Variance Minimization method

Variance Minimization (VM) method:

The method is equivalent to minimizing the φ-divergence with φ(z) = 1/z.

The minimization of the resulting φ-divergence,

argmin_η ∫ π^2(x) / g(x; η) dx,

is highly nonlinear. Moreover, the φ-divergence has to be estimated, so we face a noisy nonlinear optimization problem.
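As a quick check of the equivalence (using π = H f / ℓ from the previous slide, with H ≥ 0), substituting φ(z) = 1/z into (1) gives

\[
\int \pi(x)\,\frac{\pi(x)}{g(x;\eta)}\,dx
  = \int \frac{\pi^2(x)}{g(x;\eta)}\,dx
  = \frac{1}{\ell^2}\int \frac{H^2(x)\,f^2(x)}{g(x;\eta)}\,dx
  = \frac{\mathbb{E}_g Z^2}{\ell^2},
\]

so minimizing this φ-divergence over η is exactly minimizing the second moment, and hence the variance, of the importance sampling estimator.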


Cross Entropy method

Cross Entropy method:

The method is equivalent to minimizing the φ-divergence with φ(z) = −ln(z).

The minimization of the Kullback-Leibler distance,

argmin_η ∫ π(x) ln( π(x) / g(x; η) ) dx,

is similar to likelihood maximization, and we thus frequently have analytical solutions to the optimization problem. The Kullback-Leibler distance still has to be estimated, however.
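To illustrate the kind of closed-form update this yields, here is a sketch of one cross-entropy iteration for a multivariate normal family g(·; η) = N(η, Σ₀) with the mean as the only parameter. H, pdf_f, Sigma0, and the starting value of eta are hypothetical placeholders; the weighted-mean formula is the standard closed-form solution for the mean of a normal family.

    import numpy as np
    from scipy.stats import multivariate_normal

    def ce_mean_update(eta, Sigma0, H, pdf_f, m, rng):
        """One cross-entropy iteration for g(.; eta) = N(eta, Sigma0):
        draw X_k iid ~ g(.; eta), weight each draw by w_k = H(X_k) f(X_k) / g(X_k; eta),
        and return the weighted sample mean (the closed-form KL minimizer for the mean)."""
        g = multivariate_normal(mean=eta, cov=Sigma0)
        x = g.rvs(size=m, random_state=rng)                     # X_1, ..., X_m iid ~ g(.; eta)
        w = np.array([H(xi) * pdf_f(xi) for xi in x]) / g.pdf(x)
        return (w[:, None] * x).sum(axis=0) / w.sum()           # updated mean parameter

In rare-event settings the threshold inside H is usually raised gradually over iterations, so that the weights are not all zero in the early stages.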


Shortcomings

Both the Cross-Entropy and Variance Minimization methods suffer from the following drawbacks:

Their performance is limited by how well a simple and rigid parametric density can approximate the optimal π. Ideally we would like a flexible non-parametric model for the importance sampling density, but this is often impossible.

The VM method always requires non-linear, non-convex optimization, and the Cross-Entropy method is only as simple as likelihood maximization for the chosen family can be.


MCMC methods for estimation

The Bayesian community has alternatives to parametric importance sampling that use MCMC:

Chib’s method

Bridge sampling

Path sampling

Equi-Energy sampling

Gelfand-Dey Method

The most popular and efficient is Chib’s method and its variants. However, even Chib’s approach suffers from the following drawbacks.


MCMC problems

The estimators are biased, because the chain almost never starts in stationarity.

It is difficult to compute empirical and asymptotic error estimates, because MCMC does not generate iid samples.

Chib’s estimator relies on the output of multiple different chains.

Is there a way to draw on the strengths of both MCMC and importance sampling?


Markov chain importance sampling

Similar to the cross-entropy and variance minimization methods, the Markov chain importance sampling (MCIS) method consists of two stages:

Markov Chain (MC) stage, in which we construct an ergodic estimator π̂ of the minimum-variance importance sampling density π.

Importance Sampling (IS) stage, in which we use π̂ as an importance sampling density to estimate ℓ.

There are many different ways of constructing a model-free estimator of π (e.g., standard kernel density estimation, which has poor convergence). Here we only explore one way, related to the Gibbs sampler.


MCIS with Gibbs

Suppose that we have used an MCMC sampler to generate the population

X_1, …, X_n approximately ∼ π(x),    X_i = (X_{i,1}, …, X_{i,d}).

We can construct the nonparametric estimator of π using the Gibbs transition kernel:

π̂(y) = (1/n) ∑_{i=1}^n κ(y | X_i),

κ(y | X) := ∏_{j=1}^d π(y_j | y_1, …, y_{j−1}, X_{j+1}, …, X_d).
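In code, π̂ is just a mixture of n Gibbs sweeps. The sketch below assumes the full conditionals of π are available through a problem-specific callable conditional_pdf(j, value, given), where given is a d-vector whose j-th entry is ignored; this is a hypothetical interface, and the lognormal example later in the talk supplies such conditionals.

    import numpy as np

    def gibbs_kernel_density(y, chain, conditional_pdf):
        """Evaluate pi_hat(y) = (1/n) * sum_i kappa(y | X_i), where
        kappa(y | x) = prod_j pi(y_j | y_1, ..., y_{j-1}, x_{j+1}, ..., x_d)
        is the density of one systematic Gibbs sweep started from x."""
        n, d = chain.shape
        total = 0.0
        for x in chain:                                   # mixture over the n chain states
            cond = np.array(x, dtype=float)
            kernel = 1.0
            for j in range(d):
                kernel *= conditional_pdf(j, y[j], cond)  # pi(y_j | y_1..y_{j-1}, X_{j+1}..X_d)
                cond[j] = y[j]                            # later factors condition on y_1..y_j
            total += kernel
        return total / n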


MCIS vs importance sampling

The MCIS estimator is

ℓ̂ = (1/m) ∑_{k=1}^m |H(Y_k)| f(Y_k) / π̂(Y_k),

where Y_1, …, Y_m are iid draws from π̂. Advantages and disadvantages of the MCIS approach:

P( lim_{n→∞} π̂(x) = π(x) ) = 1 for all x.

No need to solve a φ-divergence optimization problem.

If n is large, then evaluation of π̂ may be costly.

The MCIS estimator, unlike Chib’s, is unbiased, and iid sampling allows for standard estimation error bands.
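A sketch of the IS stage, reusing gibbs_kernel_density from the previous sketch; it additionally assumes a callable conditional_sample(j, given) that draws coordinate j of π given the other coordinates of given (again a hypothetical interface). Sampling Y ∼ π̂ amounts to picking a chain state uniformly at random and applying one systematic Gibbs sweep to it.

    import numpy as np

    def mcis_estimate(chain, m, H, f, conditional_sample, conditional_pdf, rng=None):
        """IS stage of MCIS: draw Y_1, ..., Y_m iid from pi_hat and average
        |H(Y_k)| f(Y_k) / pi_hat(Y_k).  Returns the estimate and its standard error."""
        rng = np.random.default_rng() if rng is None else rng
        n, d = chain.shape
        z = np.empty(m)
        for k in range(m):
            y = chain[rng.integers(n)].astype(float)  # pick a chain state X_i uniformly
            for j in range(d):                        # one systematic Gibbs sweep: Y ~ kappa(. | X_i)
                y[j] = conditional_sample(j, y)
            z[k] = abs(H(y)) * f(y) / gibbs_kernel_density(y, chain, conditional_pdf)
        return z.mean(), z.std(ddof=1) / np.sqrt(m)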


Sums of correlated lognormals

Consider the estimation of the rare-event probability

ℓ = P(e^{X_1} + ⋯ + e^{X_d} > γ) = ∫ f(x) I{S(x) > γ} dx,

where:

X = (X_1, …, X_d) and X ∼ N(µ, Σ).

S(x) = e^{x_1} + ⋯ + e^{x_d}.

f is the density of N(µ, Σ) with associated precision matrix Λ = Σ^{−1}.

We use the notation Λ = (Λ_{i,j}) and Σ = (Σ_{i,j}).
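In the notation of the generic sketches above, H and f for this example can be wired up as follows; this is a small helper assuming scipy is available, with mu, Sigma, and gamma supplied by the user.

    import numpy as np
    from scipy.stats import multivariate_normal

    def make_lognormal_sum_problem(mu, Sigma, gamma):
        """Return (H, f) for ell = P(exp(X_1) + ... + exp(X_d) > gamma), X ~ N(mu, Sigma):
        H is the indicator I{S(x) > gamma} and f is the N(mu, Sigma) density."""
        dist = multivariate_normal(mean=mu, cov=Sigma)
        H = lambda x: float(np.exp(x).sum() > gamma)
        return H, dist.pdf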


Conditional densities needed for π

We need the conditional densities of the optimal importance sampling pdf π(y) = f(y) I{S(y) > γ} / ℓ:

π(y_i | y_{−i}) ∝ f(y_i | y_{−i})                                          if ∑_{j≠i} e^{y_j} > γ,

π(y_i | y_{−i}) ∝ f(y_i | y_{−i}) I{ y_i > ln( γ − ∑_{j≠i} e^{y_j} ) }     if ∑_{j≠i} e^{y_j} < γ,

where, by standard properties of the multivariate normal density, f(y_i | y_{−i}) is a normal density with mean

µ_i + Λ_{i,i}^{−1} ∑_{j≠i} Λ_{i,j} (µ_j − y_j)    and variance    Λ_{i,i}^{−1}.
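These two cases translate directly into the conditional_sample / conditional_pdf callables used in the MCIS sketches above. A sketch, assuming scipy and numpy-array inputs; for very extreme truncation points a more careful truncated-normal sampler may be needed.

    import numpy as np
    from scipy.stats import norm, truncnorm

    def make_gibbs_conditionals(mu, Lambda, gamma, rng=None):
        """Full conditionals of pi for the lognormal-sum example: a normal with
        mean mu_i + (1/Lambda_ii) * sum_{j != i} Lambda_ij (mu_j - y_j) and variance
        1/Lambda_ii, left-truncated at ln(gamma - sum_{j != i} exp(y_j)) whenever
        that sum falls below gamma (no truncation is needed otherwise)."""
        rng = np.random.default_rng() if rng is None else rng

        def _mean_sd(i, y):
            sd = 1.0 / np.sqrt(Lambda[i, i])
            mean = mu[i] + (np.delete(Lambda[i], i) @ np.delete(mu - y, i)) / Lambda[i, i]
            return mean, sd

        def conditional_sample(i, y):
            mean, sd = _mean_sd(i, y)
            rest = np.delete(np.exp(y), i).sum()
            if rest > gamma:                               # indicator already satisfied
                return rng.normal(mean, sd)
            a = (np.log(gamma - rest) - mean) / sd         # left truncation, standard units
            return truncnorm.rvs(a, np.inf, loc=mean, scale=sd, random_state=rng)

        def conditional_pdf(i, value, y):
            mean, sd = _mean_sd(i, y)
            rest = np.delete(np.exp(y), i).sum()
            if rest > gamma:
                return norm.pdf(value, loc=mean, scale=sd)
            a = (np.log(gamma - rest) - mean) / sd
            return truncnorm.pdf(value, a, np.inf, loc=mean, scale=sd)

        return conditional_sample, conditional_pdf

With these callables, mcis_estimate(chain, m, H, f, conditional_sample, conditional_pdf) reproduces the estimator above once a chain X_1, …, X_n targeting π has been generated, for example by the splitting step mentioned in the numerical setup.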


Numerical setup

We compare with the importance sampling vanishing relative error estimator (ISVE) and the cross-entropy vanishing relative error estimator (CEVE) of Asmussen, Blanchet, Juneja, and Rojas-Nandayapa, “Efficient simulation of tail probabilities of sums of correlated lognormals”, Ann. Oper. Res. (2009).

Both of these estimators decompose the probability ℓ(γ) into two parts:

ℓ(γ) = P(max_i e^{X_i} > γ) + P(S(X) > γ, max_i e^{X_i} ≤ γ).

The first (dominant) term and the second (residual) term are estimated by two different importance sampling estimators that ensure the strong efficiency of the sum.


Numerical setup I

The first term is asymptotically dominant in the sense that

lim_{γ→∞} P(S(X) > γ, max_i e^{X_i} ≤ γ) / P(max_i e^{X_i} > γ) = 0.

We compute ℓ for various values of the common correlation coefficient

ρ = Σ_{i,j} / √(Σ_{i,i} Σ_{j,j}),   i ≠ j.

d = 10, µ_i = i − 10, σ_i^2 = i (i = 1, …, d), γ = 5 × 10^5.

We used the MCIS estimator with m = 5 × 10^5 and a Markov chain sample of size n = 80 obtained using splitting.


Numerical results I

ρ       MCIS est. of ℓ         rel. error % (MCIS)   rel. error % (CEVE)   rel. error % (ISVE)
0       1.795092 × 10^−5       0.0092                0.0063                0.0069
0.4     1.8077 × 10^−5         0.093                 0.17                  0.23
0.7     1.9014 × 10^−5         0.04                  0.65                  2.85
0.9     2.0735 × 10^−5         0.068                 0.63                  2.80
0.93    2.0997 × 10^−5         0.17                  3.5                   4.59
0.95    2.1412 × 10^−5         0.11                  6                     8.09
0.99    2.1882 × 10^−5         0.29                  15                    4.35


Numerical setup II

Both the ISVE and CEVE estimators are strongly efficient, so we expect that these estimators will eventually outperform the MCIS estimator.

This intuition is confirmed in the next table, describing 14 cases with increasing values of the threshold γ.

We use the same algorithmic and problem parameters, except that ρ = 0.9 in all cases and the threshold parameter depends on the case number c according to the formula

γ = 5 × 10^{c+3},   c = 1, …, 14.


Numerical results II

case c   ℓ estimate (ISVE)    rel. error % (MCIS)   rel. error % (ISVE)   efficiency (MCIS)   efficiency (ISVE)
1        3.9865 × 10^−4       0.049                 1.6                   25                  1500
2        2.0802 × 10^−5       0.067                 4.3                   75                  10000
3        6.4385 × 10^−7       0.077                 5.5                   64                  22000
4        1.2039 × 10^−8       0.041                 3.0                   21                  5000
5        1.3468 × 10^−10      0.034                 6.1                   12                  20000
6        8.9791 × 10^−13      0.036                 7.6                   17                  30000
7        3.5899 × 10^−15      0.043                 0.014                 27                  0.11
8        8.5302 × 10^−18      0.079                 0.029                 81                  0.50
9        1.2082 × 10^−20      0.025                 0.024                 7                   0.32
10       1.0148 × 10^−23      0.042                 0.00064               28                  0.00021
11       5.0428 × 10^−27      0.042                 0.00030               27                  4.6 × 10^−5
12       1.4898 × 10^−30      0.024                 0.00014               7                   1.0 × 10^−5
13       2.5961 × 10^−34      0.023                 3.87 × 10^−15         8.2                 7.7 × 10^−27
14       2.6754 × 10^−38      0.012                 4.22 × 10^−13         2.6                 9.1 × 10^−23


L2 properties

Assume that:

X_1, …, X_n are iid draws from κ_{t−1}(x | θ), where κ_{t−1} is the (t − 1)-step transition density.

κ is the transition density of a systematic Gibbs sampler.

∫∫ κ^2(y | x) π(x) / π(y) dy dx < ∞. Warning: this can be difficult to verify.

All states of the chain can be accessed using a single step of the chain (irreducibility condition).

Then we have the following result for ℓ̂ = f(Y) H(Y) / π̂(Y).


L2 theoretical properties

We have the following bound on the Neyman χ^2 distance:

E[ (ℓ̂ − ℓ)^2 / (ℓ ℓ̂) ]
  = ( −1 + ∫ κ_t^2(y | θ) / π(y) dy )                              [χ^2 component]
    + (1/n) ∫ ( E κ^2(y | X) − κ_t^2(y | θ) ) / π(y) dy            [variance component]
  ≤ V(θ) e^{−rt} + O(1/n),

where V is a positive Lyapunov function and r > 0 is a constant (the geometric rate of convergence of the Gibbs sampler).

Note that the above is NOT the relative error, because the denominator has ℓ̂ instead of ℓ.


Conclusions

We have presented a new method for integration that combines MCMC with importance sampling.

We argue that the method is preferable to methods based purely on MCMC or importance sampling.

The method dispenses with the traditional parametric modeling used in the design of importance sampling densities.

Unlike MCMC-based methods, the proposed method provides unbiased estimators, and estimation of the relative error and confidence bands is straightforward.


Some interesting links

Can we use large deviations theory to sample exactly from the minimum variance density, instead of using MCMC?

The idea is related to empirical likelihood and Rao-Blackwellization in classical statistics. Essentially, we construct an estimator based on empirical likelihood arguments.

Thank you for your attention.
