Semi-Parametric Importance Sampling for Rare-Event Probability Estimation
Z. I. Botev and P. L'Ecuyer
IMACS Seminar 2011
Borovets, Bulgaria
Semi-Parametric Importance Sampling for Rare-event probability Estimation – p.1/24
Outline
- Formulation of the importance sampling problem.
- Background on currently suggested adaptive importance sampling methods.
- The proposed methodology.
- A numerical example taken from recent work by Asmussen, Blanchet, Juneja, and Rojas-Nandayapa: tail probabilities of sums of correlated lognormals.
- Conclusions.
Problem formulation
The problem is to estimate high-dimensional integrals of the form

    ℓ = ∫ f(x) H(x) dx = E_f H(X),

where H : R^d → R and X is a d-dimensional random vector with pdf f.
For discrete counting or combinatorial problems we simply replace the integral by a sum.
How do we estimate such integrals via Monte Carlo?
Existing methods
Importance sampling: let g be an importance sampling density such that g(x) = 0 ⇒ H(x) f(x) = 0 for all x.
Generate X_1, ..., X_m iid from g; then an unbiased estimator of ℓ is

    ℓ̂ = (1/m) ∑_{k=1}^m Z_k,   with   Z_k = H(X_k) f(X_k) / g(X_k).

The minimum-variance importance sampling density is

    π(x) = H(x) f(x) / ℓ,   (H(x) > 0),

which depends on the unknown ℓ and is therefore not directly usable.
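As a concrete toy illustration of this estimator (our own example, not from the slides): estimate ℓ = P(X > 3) for X ∼ N(0, 1), i.e. H(x) = I{x > 3}, using the shifted normal g = N(3, 1) as importance sampling density. The problem, shift, and sample size are all illustrative assumptions.

```python
import math
import random

rng = random.Random(42)
m = 100_000

# Toy problem: ell = P(X > 3), X ~ N(0,1), so H(x) = I{x > 3} and f = N(0,1) pdf.
# IS density g = N(3, 1) puts most of its mass on the rare event.
# Likelihood ratio: f(x)/g(x) = exp(-x^2/2) / exp(-(x-3)^2/2) = exp(4.5 - 3x).
total = 0.0
for _ in range(m):
    x = rng.gauss(3.0, 1.0)                # X_k ~ g
    if x > 3.0:                            # H(X_k) = 1
        total += math.exp(4.5 - 3.0 * x)   # Z_k = H(X_k) f(X_k) / g(X_k)
ell_hat = total / m

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # 1 - Phi(3), about 1.35e-3
print(ell_hat, exact)
```

With this g, the estimator achieves a relative error of well under one percent at m = 10^5, whereas crude Monte Carlo would average only about 135 hits in the same budget.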
Existing importance sampling ideas
Existing methods for selecting the density g assume that g belongs to a parametric family {g(·; η)}.
The objective is to select the parameter η so that g is as "close" to the optimal density π as possible.
The closeness between π and g is measured by the φ-divergence

    ∫ π(x) φ( g(x; η) / π(x) ) dx,    (1)

where φ : R_+ → R is twice continuously differentiable, with φ(1) = 0 and φ″(x) > 0 for all x > 0.
Variance Minimization method

Variance Minimization (VM) method:
- Equivalent to minimizing the φ-divergence with φ(z) = 1/z.
- The resulting minimization problem

      argmin_η ∫ π²(x) / g(x; η) dx

  is highly nonlinear (a drawback).
- The φ-divergence has to be estimated, so we face a noisy nonlinear optimization problem (another drawback).
Cross-Entropy method

Cross-Entropy (CE) method:
- Equivalent to minimizing the φ-divergence with φ(z) = −ln(z).
- The minimization of the Kullback-Leibler distance

      argmin_η −∫ π(x) ln( g(x; η) / π(x) ) dx

  is similar to likelihood maximization, so we frequently have analytical solutions to the optimization problem (an advantage).
- The Kullback-Leibler distance still has to be estimated, however (a drawback).
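To illustrate the "analytical solutions" point, here is a hedged sketch of multilevel CE (our own toy construction, not the authors' code) for the family g(·; η) = N(η, 1) and H(x) = I{x > 3}: for this family the CE update for η reduces to a likelihood-ratio-weighted mean of the elite samples. The level schedule, elite fraction, and iteration count are illustrative choices.

```python
import math
import random

rng = random.Random(0)
target = 3.0        # estimate ell = P(X > 3), X ~ N(0,1)
N, frac = 2000, 0.1 # samples per CE iteration, elite fraction

eta = 0.0           # start from the nominal density f = N(0,1)
for _ in range(6):
    xs = sorted(rng.gauss(eta, 1.0) for _ in range(N))
    level = min(target, xs[int((1 - frac) * N)])  # multilevel CE threshold
    elite = [x for x in xs if x >= level]         # nonempty by construction
    # Closed-form CE update for N(eta, 1): likelihood-ratio-weighted mean of
    # the elite samples, with w(x) = f(x)/g(x; eta) = exp(eta^2/2 - eta*x).
    wsum = wxsum = 0.0
    for x in elite:
        w = math.exp(0.5 * eta * eta - eta * x)
        wsum += w
        wxsum += w * x
    eta = wxsum / wsum

# Final IS run with the tuned density g = N(eta, 1)
m = 100_000
total = 0.0
for _ in range(m):
    x = rng.gauss(eta, 1.0)
    if x > target:
        total += math.exp(0.5 * eta * eta - eta * x)
ell_hat = total / m
print(eta, ell_hat)
```

The tuned η settles near E[X | X > 3] ≈ 3.28, and no nonlinear solver is needed at any step, which is exactly the appeal of CE over VM.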
Shortcomings
Both the Cross-Entropy and Variance Minimization methods suffer from the following drawbacks:
- Their performance is limited by how well a simple, rigid parametric density can approximate the optimal π. Ideally we would like a flexible nonparametric model for the importance sampling density, but this is often impossible.
- The VM method always requires nonlinear, non-convex optimization, and the Cross-Entropy method is, at best, only as simple as likelihood maximization.
MCMC methods for estimation
The Bayesian community has alternatives to parametric importance sampling that use MCMC:
- Chib's method
- Bridge sampling
- Path sampling
- Equi-energy sampling
- Gelfand-Dey method
The most popular and efficient is Chib's method and its variants. However, even Chib's approach suffers from the following drawbacks.
MCMC problems
- The estimators are biased, because the chain almost never starts in stationarity.
- Empirical and asymptotic error estimates are difficult to compute, because MCMC does not generate iid samples.
- Chib's estimator relies on the output of multiple different chains.
Is there a way to draw on the strengths of both MCMC and importance sampling?
Markov chain importance sampling
Like the Cross-Entropy and Variance Minimization methods, the MCIS method consists of two stages:
- a Markov chain (MC) stage, in which we construct an ergodic estimator π̂ of the minimum-variance importance sampling density π;
- an importance sampling (IS) stage, in which we use π̂ as an importance sampling density to estimate ℓ.
There are many different ways of constructing a model-free estimator of π, e.g., standard kernel density estimation (which has poor convergence). Here we explore only one, related to the Gibbs sampler.
MCIS with Gibbs
Suppose that we have used an MCMC sampler to generate the population

    X_1, ..., X_n approx∼ π(x),   X_i = (X_{i,1}, ..., X_{i,d}).

We can construct a nonparametric estimator of π using the Gibbs transition kernel:

    π̂(y) = (1/n) ∑_{i=1}^n κ(y | X_i),

    κ(y | x) := ∏_{j=1}^d π(y_j | y_1, ..., y_{j−1}, x_{j+1}, ..., x_d).
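To make the kernel-mixture construction concrete, here is a minimal sketch (our own toy, not from the slides): the target π is a bivariate normal with correlation 0.5, so the systematic-Gibbs kernel is κ(y | x) = π(y_1 | x_2) π(y_2 | y_1) with known normal conditionals, and exact draws stand in for the MCMC output.

```python
import math
import random

rng = random.Random(3)
RHO = 0.5   # toy target pi: bivariate normal, zero mean, unit variances, corr RHO
n = 4000

def phi(z, mean, var):
    """Univariate normal pdf."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Population X_1..X_n drawn exactly from pi (a stand-in for the MCMC output)
pop = []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    pop.append((z1, RHO * z1 + math.sqrt(1 - RHO * RHO) * z2))

def kappa(y1, y2, x):
    # Systematic-Gibbs transition kernel for this pi:
    #   kappa(y | x) = pi(y1 | x2) * pi(y2 | y1),
    # with pi(y1 | x2) = N(RHO*x2, 1-RHO^2) and pi(y2 | y1) = N(RHO*y1, 1-RHO^2).
    v = 1 - RHO * RHO
    return phi(y1, RHO * x[1], v) * phi(y2, RHO * y1, v)

def pi_hat(y1, y2):
    """Mixture-of-kernels estimator: pi_hat(y) = (1/n) sum_i kappa(y | X_i)."""
    return sum(kappa(y1, y2, x) for x in pop) / n

exact_at_origin = 1.0 / (2 * math.pi * math.sqrt(1 - RHO * RHO))
print(pi_hat(0.0, 0.0), exact_at_origin)
```

Sampling from π̂ is equally simple: pick an index i uniformly, then run one Gibbs sweep started from X_i. Because the Gibbs kernel leaves π invariant, the mixture is an unbiased pointwise estimator of π(y) here.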
MCIS vs importance sampling
The MCIS estimator is

    ℓ̂ = (1/m) ∑_{k=1}^m |H(Y_k)| f(Y_k) / π̂(Y_k),

where Y_1, ..., Y_m iid∼ π̂. Advantages and disadvantages of the MCIS approach:
- P( lim_{n→∞} π̂(x) = π(x) ) = 1 for all x.
- No need to solve a φ-divergence optimization problem.
- If n is large, then evaluating π̂ may be costly.
- The MCIS estimator, unlike Chib's, is unbiased, and iid sampling allows for standard estimation error bands.
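A self-contained toy run of the full two-stage MCIS recipe (our own construction; the problem, parameters, and helper functions are illustrative assumptions, not the authors' code): estimate ℓ = P(X_1 + X_2 > γ) for X ∼ N(0, I), where the exact answer 1 − Φ(γ/√2) is available for checking.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(11)
N01 = NormalDist()
GAMMA = 2.0   # target: ell = P(X1 + X2 > GAMMA), X ~ N(0, I)

def rtrunc(lower):
    """Draw Z ~ N(0,1) conditioned on Z > lower (inverse-CDF method)."""
    c = N01.cdf(lower)
    u = c + (1.0 - c) * rng.random()
    return N01.inv_cdf(min(u, 1.0 - 1e-12))

def dtrunc(z, lower):
    """Density of N(0,1) truncated to (lower, inf), evaluated at z."""
    return N01.pdf(z) / (1.0 - N01.cdf(lower)) if z > lower else 0.0

# MC stage: systematic Gibbs sampler for pi(x) ∝ phi(x1) phi(x2) I{x1+x2 > GAMMA}
x1 = x2 = GAMMA                 # start inside the rare-event region
chain = []
for t in range(400):
    x1 = rtrunc(GAMMA - x2)     # pi(x1 | x2): truncated standard normal
    x2 = rtrunc(GAMMA - x1)     # pi(x2 | x1)
    if t >= 300:                # discard burn-in
        chain.append((x1, x2))
n = len(chain)

def pi_hat(y1, y2):
    # (1/n) sum_i kappa(y | X_i), with kappa(y | x) = pi(y1 | x2) pi(y2 | y1)
    return sum(dtrunc(y1, GAMMA - xb) * dtrunc(y2, GAMMA - y1)
               for (_, xb) in chain) / n

# IS stage: draw Y ~ pi_hat, average f(Y) I{S(Y) > GAMMA} / pi_hat(Y)
m = 2000
total = 0.0
for _ in range(m):
    _, xb = chain[rng.randrange(n)]   # pick a mixture component uniformly
    y1 = rtrunc(GAMMA - xb)
    y2 = rtrunc(GAMMA - y1)           # guarantees y1 + y2 > GAMMA
    total += N01.pdf(y1) * N01.pdf(y2) / pi_hat(y1, y2)
ell_hat = total / m

exact = 1.0 - N01.cdf(GAMMA / math.sqrt(2.0))
print(ell_hat, exact)
```

The weights f(Y)/π̂(Y) concentrate near ℓ because π̂ is close to the optimal π, which is the source of the variance reduction; the cost is the n kernel evaluations per π̂ call noted above.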
Sums of correlated lognormals
Consider the estimation of the rare-event probability

    ℓ = P(e^{X_1} + ··· + e^{X_d} > γ) = ∫ f(x) I{S(x) > γ} dx,

where:
- X = (X_1, ..., X_d) and X ∼ N(µ, Σ);
- S(x) = e^{x_1} + ··· + e^{x_d};
- f is the density of N(µ, Σ), with associated precision matrix Λ = Σ^{−1};
- we use the notation Λ = (Λ_{i,j}) and Σ = (Σ_{i,j}).
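Before any variance reduction, a crude Monte Carlo sketch of this quantity (our own toy parameters: d = 2, correlation 0.5, γ = 10, not the slides' setup) also exhibits the deterministic sandwich P(max_i e^{X_i} > γ) ≤ ℓ ≤ P(max_i e^{X_i} > γ/2) that motivates the max-term decomposition used later.

```python
import math
import random

rng = random.Random(5)
RHO, GAMMA = 0.5, 10.0   # illustrative parameters, not the slides' setup
m = 50_000

hits_sum = hits_max = hits_half = 0
for _ in range(m):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1 = z1
    x2 = RHO * z1 + math.sqrt(1 - RHO * RHO) * z2   # corr(X1, X2) = RHO
    t1, t2 = math.exp(x1), math.exp(x2)
    if t1 + t2 > GAMMA:            # S(X) > gamma
        hits_sum += 1
    if max(t1, t2) > GAMMA:        # dominant (max) term
        hits_max += 1
    if max(t1, t2) > GAMMA / 2:    # upper-bound event (valid for d = 2)
        hits_half += 1

ell_crude = hits_sum / m
print(hits_max / m, ell_crude, hits_half / m)
```

For the slides' γ = 5 × 10^5, ℓ is on the order of 10^−5, so crude Monte Carlo needs on the order of 1/ℓ draws per hit; this is exactly where the importance sampling machinery is required.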
Conditional densities needed for π
We need the conditional densities of the optimal importance sampling pdf π(y) = f(y) I{S(y) > γ}/ℓ:

    π(y_i | y_{−i}) ∝ f(y_i | y_{−i})                                        if ∑_{j≠i} e^{y_j} > γ,
    π(y_i | y_{−i}) ∝ f(y_i | y_{−i}) I{ y_i > ln( γ − ∑_{j≠i} e^{y_j} ) }   if ∑_{j≠i} e^{y_j} < γ,

where, by standard properties of the multivariate normal density, f(y_i | y_{−i}) is a normal density with mean

    µ_i + Λ_{i,i}^{−1} ∑_{j≠i} Λ_{i,j} (µ_j − y_j)   and variance   Λ_{i,i}^{−1}.
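The two-case conditional is straightforward to turn into a Gibbs update. A hedged sketch (a hypothetical d = 3 instance with our own µ, Λ, and γ; truncated-normal draws via the inverse CDF):

```python
import math
import random
from statistics import NormalDist

rng = random.Random(9)

# Hypothetical small instance (illustration only): d = 3, mu = 0,
# tridiagonal precision matrix Lam = Sigma^{-1}, threshold gamma.
mu = [0.0, 0.0, 0.0]
Lam = [[ 2.0, -0.5,  0.0],
       [-0.5,  2.0, -0.5],
       [ 0.0, -0.5,  2.0]]
gamma = 30.0

def gibbs_step(y, i):
    """Update coordinate i in place from pi(y_i | y_-i) as on the slide."""
    d = len(y)
    # f(y_i | y_-i) is normal with mean
    #   mu_i + Lam_ii^{-1} sum_{j != i} Lam_ij (mu_j - y_j)
    # and variance Lam_ii^{-1}.
    m = mu[i] + sum(Lam[i][j] * (mu[j] - y[j])
                    for j in range(d) if j != i) / Lam[i][i]
    sd = math.sqrt(1.0 / Lam[i][i])
    rest = sum(math.exp(y[j]) for j in range(d) if j != i)
    if rest > gamma:
        y[i] = rng.gauss(m, sd)          # constraint already satisfied
    else:
        # truncate to y_i > ln(gamma - rest), sampled via the inverse CDF
        nd = NormalDist(m, sd)
        c = nd.cdf(math.log(gamma - rest))
        u = c + (1.0 - c) * rng.random()
        y[i] = nd.inv_cdf(min(u, 1.0 - 1e-12))

# Systematic sweeps; every visited state stays inside {S(y) > gamma}
y = [math.log(gamma)] * 3
states = []
for _ in range(200):
    for i in range(3):
        gibbs_step(y, i)
    states.append(list(y))
```

By construction each update preserves the constraint S(y) > γ: either the other coordinates already exceed γ, or the truncation forces e^{y_i} > γ − rest.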
Numerical setup
We compare with the importance sampling vanishing relative error estimator (ISVE) and the cross-entropy vanishing relative error estimator (CEVE) of Asmussen, Blanchet, Juneja, and Rojas-Nandayapa, "Efficient simulation of tail probabilities of sums of correlated lognormals", Ann. Oper. Res. (2009).
Both of these estimators decompose the probability ℓ(γ) into two parts:

    ℓ(γ) = P( max_i e^{X_i} > γ ) + P( S(X) > γ, max_i e^{X_i} ≤ γ ).

The first (dominant) term and the second (residual) term are estimated by two different importance sampling estimators that ensure the strong efficiency of the sum.
Numerical setup I
The first term is asymptotically dominant in the sense that

    lim_{γ→∞} P( S(X) > γ, max_i e^{X_i} ≤ γ ) / P( max_i e^{X_i} > γ ) = 0.

We compute ℓ for various values of the common correlation coefficient

    ρ = Σ_{i,j} / √(Σ_{i,i} Σ_{j,j}),   i ≠ j.

Parameters: d = 10, µ_i = i − 10, σ_i² = i (i = 1, ..., d), γ = 5 × 10^5.
We used the MCIS estimator with m = 5 × 10^5 and a Markov chain sample of size n = 80 obtained using splitting.
Numerical results I
    ρ      MCIS est. ℓ̂        relative error %
                              MCIS     CEVE     ISVE
    0      1.795092 × 10^−5   0.0092   0.0063   0.0069
    0.4    1.8077 × 10^−5     0.093    0.17     0.23
    0.7    1.9014 × 10^−5     0.04     0.65     2.85
    0.9    2.0735 × 10^−5     0.068    0.63     2.80
    0.93   2.0997 × 10^−5     0.17     3.5      4.59
    0.95   2.1412 × 10^−5     0.11     6        8.09
    0.99   2.1882 × 10^−5     0.29     15       4.35
Numerical setup II
Both the ISVE and CEVE estimators are strongly efficient, so we expect that these estimators will eventually outperform the MCIS estimator.
This intuition is confirmed in the next table, which describes 14 cases with increasing values of the threshold γ.
We use the same algorithmic and problem parameters, except that ρ = 0.9 in all cases and the threshold depends on the case number c according to the formula

    γ = 5 × 10^{c+3},   c = 1, ..., 14.
Numerical results II

    case c   ISVE estimate     relative error %         efficiency
                               MCIS    ISVE             MCIS   ISVE
    1        3.9865 × 10^−4    0.049   1.6              25     1500
    2        2.0802 × 10^−5    0.067   4.3              75     10000
    3        6.4385 × 10^−7    0.077   5.5              64     22000
    4        1.2039 × 10^−8    0.041   3.0              21     5000
    5        1.3468 × 10^−10   0.034   6.1              12     20000
    6        8.9791 × 10^−13   0.036   7.6              17     30000
    7        3.5899 × 10^−15   0.043   0.014            27     0.11
    8        8.5302 × 10^−18   0.079   0.029            81     0.50
    9        1.2082 × 10^−20   0.025   0.024            7      0.32
    10       1.0148 × 10^−23   0.042   0.00064          28     0.00021
    11       5.0428 × 10^−27   0.042   0.00030          27     4.6 × 10^−5
    12       1.4898 × 10^−30   0.024   0.00014          7      1.0 × 10^−5
    13       2.5961 × 10^−34   0.023   3.87 × 10^−15    8.2    7.7 × 10^−27
    14       2.6754 × 10^−38   0.012   4.22 × 10^−13    2.6    9.1 × 10^−23
L2 properties
Assume that:
- X_1, ..., X_n iid∼ κ_{t−1}(x | θ), where κ_{t−1} is the (t−1)-step transition density;
- κ is the transition density of the systematic Gibbs sampler;
- ∫∫ κ²(y | x) π(x)/π(y) dy dx < ∞ (warning: this condition can be difficult to verify);
- all states of the chain can be accessed in a single step of the chain (irreducibility condition).
Then we have the following result for ℓ̂ = f(Y) H(Y) / π̂(Y).
L2 theoretical properties
We have the following bound on the Neyman χ² distance:

    E[ (ℓ̂ − ℓ)² / (ℓ ℓ̂) ]
        = ( −1 + ∫ κ_t²(y | θ) / π(y) dy )                        [χ² component]
          + (1/n) ∫ ( E κ²(y | X) − κ_t²(y | θ) ) / π(y) dy       [variance component]
        ≤ V(θ) e^{−rt} + O(1/n),

where V is a positive Liapunov function and r > 0 is a constant (the geometric rate of convergence of the Gibbs sampler).
Note that the above is NOT the relative error, because the denominator contains ℓ̂ instead of ℓ.
Conclusions
- We have presented a new method for integration that combines MCMC with importance sampling.
- We argue that the method is preferable to methods based purely on MCMC or purely on importance sampling.
- The method dispenses with the traditional parametric modeling used in the design of importance sampling densities.
- Unlike MCMC-based methods, the proposed method provides unbiased estimators, and estimation of relative error and confidence bands is straightforward.
Some interesting links
- Can we use large deviations theory to sample exactly from the minimum-variance density, instead of using MCMC?
- The idea is related to empirical likelihood and Rao-Blackwellization in classical statistics: essentially, we construct an estimator based on empirical likelihood arguments.

Thank you for your attention.