Bayesian Dynamic Linear Modelling for
Complex Computer Models
Fei Liu, Liang Zhang, Mike West
Abstract
Computer models may have functional outputs. Without loss of generality, we assume
that a single computer run generates a function of time. For complex computer
models, Bayarri et al. (2002) treat time as an additional computer-model input
parameter, and use the Gaussian Response Surface Approximation method (GaSP)
with a Kronecker product correlation matrix in the augmented space. However,
this approach is applicable only when there are few time points. In this paper,
we consider the Bayesian Dynamic Linear Model (West and Harrison, 1997) as an
alternative approach when there are many time points. Our method also allows
forecasting into the future.
Keywords: Computer model; Bayesian Dynamic Linear Model; Gaussian stochas-
tic process; Bayesian analysis; Forward filtering and backward sampling; MCMC.
1 Introduction
Computer models can be represented as deterministic functions of their associated pa-
rameters. There are generally two types of parameters: (a) calibration parameters u, which
are associated only with the computer code; they may be uncertain physical properties. (b)
unknown parameters x, which are associated with both the computer model and the field exper-
iments; they are characteristics of the real experiments. For simplicity, we
use x to represent (x, u). As a result, we can represent the computer model as a function
of x, y(x). On the other hand, exercising the code is very time consuming for complex
computer models. Consequently, the function y(x) is evaluated only at selected locations
(x_i, i = 1, . . . , n).
In this paper, we focus on computer models with functional outputs. We assume
that the computer model outputs are functions of time t, t = 1, . . . , T. We represent such
a computer model output as y(x, t). This type of computer model has been studied both
in Bayarri et al. (2002) and Bayarri et al. (2006). The SAVE model in Bayarri et al. (2002)
uses the Gaussian Response Surface Approximation method (GaSP) on the augmented space
of (x, t) by assuming a separable correlation in the space of x and t. They assume that the
computer model outputs are realizations from a Gaussian stochastic process defined on the
(x, t) space, i.e.,

\[ y(\cdot,\cdot) \sim \mathrm{GP}\left(\mu,\; \frac{1}{\lambda_M}\,\mathrm{Corr}((\cdot,\cdot),(\cdot,\cdot))\right) \]

where \(\mathrm{Corr}(y(x,t), y(x',t')) = \exp\left(-\sum_i \beta_i |x_i - x'_i|^{\alpha_i}\right)\exp\left(-\beta^{(t)} |t - t'|^{\alpha^{(t)}}\right)\). We
use y(x) to represent the functional output of a single computer run whose input is x,
y(x) = (y(x, t_1), . . . , y(x, t_T))'. The likelihood in SAVE is represented as

\[
\begin{pmatrix} y(x_1) \\ \vdots \\ y(x_n) \end{pmatrix}
\sim N\left(\mu \mathbf{1},\; \frac{1}{\lambda_M}\,\Sigma_1 \otimes \Sigma_2\right) \tag{1}
\]

where \((\Sigma_1)_{k,l} = \exp\left(-\sum_i \beta_i |x_{ki} - x_{li}|^{\alpha_i}\right)\) and \((\Sigma_2)_{k,l} = \exp\left(-\beta^{(t)} |t_k - t_l|^{\alpha^{(t)}}\right)\).
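The Kronecker structure in equation (1) is what makes the T × T factor the bottleneck: the identity \((\Sigma_1 \otimes \Sigma_2)^{-1} = \Sigma_1^{-1} \otimes \Sigma_2^{-1}\) means the cost is driven by inverting each factor separately, and Σ2 grows with T. A minimal numpy sketch of this identity, with toy sizes and illustrative correlation parameters (not values from the paper):

```python
import numpy as np

def pow_exp_corr(pts, beta, alpha):
    """Power-exponential correlation matrix exp(-beta |d|^alpha) (helper name is ours)."""
    d = np.abs(pts[:, None] - pts[None, :])
    return np.exp(-beta * d ** alpha)

n, T = 5, 50                                     # toy sizes: n runs, T time points
x = np.linspace(0.0, 1.0, n)
t = np.arange(1.0, T + 1)
Sigma1 = pow_exp_corr(x, beta=1.6, alpha=2.0)    # n x n, over inputs
Sigma2 = pow_exp_corr(t, beta=0.5, alpha=1.0)    # T x T, over time

# Kronecker identity: (Sigma1 kron Sigma2)^{-1} = Sigma1^{-1} kron Sigma2^{-1},
# so inversion cost is dominated by the T x T factor when T is large.
full_inv = np.linalg.inv(np.kron(Sigma1, Sigma2))
kron_inv = np.kron(np.linalg.inv(Sigma1), np.linalg.inv(Sigma2))
max_err = np.abs(full_inv - kron_inv).max()
```

Even with the identity, the T × T inverse is required, which motivates the DLM alternative developed below.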
To implement the SAVE model, one needs to invert the matrices Σ1 and Σ2, where Σ1 is
an n by n matrix and Σ2 is T by T. In the context of complex computer models, inverting
Σ1 is feasible because n is generally small. However, the dimension of Σ2 may be too
large to invert. Bayarri et al. (2006) use a basis expansion method (SAVE2), i.e.,

\[ y(x, \cdot) = \sum_{i=1}^{I} w_i(x)\, \phi_i(\cdot) \]

where \(\{\phi_i(\cdot)\}\) is a basis library (a wavelet basis in their application). They then
model the coefficients as independent spatial processes, \(w_i(\cdot) \sim \mathrm{GP}\left(\mu_i, \frac{1}{\lambda_M^i}\mathrm{Corr}_i(\cdot,\cdot)\right)\).
SAVE2 can give predictions with confidence bounds for the computer model output at
any value of x by spatial interpolation. However, it can only handle computer models
with a fixed time grid t = 1, . . . , T. Some applications of the computer model may require
forecasting into the future, weather forecasting models for instance. In this paper, we
discuss modelling the computer model code by Dynamic Linear Models (DLM), so as to capture
the temporal structure in the data.

The paper is organized as follows. We first introduce our DLM model and make
connections with the SAVE model in section 2. In section 3, we give the likelihood and
specify the prior distributions for the unknown parameters associated with the DLM model.
Section 4 discusses the MCMC method used to obtain draws from the posterior distributions of the
unknown quantities, and also gives spatial interpolation for the computer model at arbitrary
locations in the x space. The method is applied to an example data set in section 5.
2 The DLM for the computer model outputs
For a single computer model run at x, we use the time-varying autoregressive model (TVAR)
(West and Harrison, 1997) to model its temporal structure,

\[ y(x, t) = \sum_{j=1}^{p} \phi_{t,j}\, y(x, t-j) + \varepsilon_t(x) \tag{2} \]
The computer model runs are correlated by assuming a Gaussian stochastic process for
the innovations \(\varepsilon_t(x)\) in equation (2), i.e.,

\[ \varepsilon_t(\cdot) \sim \mathrm{GP}\left(0,\; v_t\,\mathrm{Corr}^{(t)}(\cdot,\cdot)\right) \tag{3} \]

where we assume that \(\mathrm{Corr}^{(t)}(\cdot,\cdot) = \mathrm{Corr}(\cdot,\cdot)\) is the same for all t. We use the
separable power exponential function for the innovation correlation, i.e.,

\[ \mathrm{Corr}(x, x') = \exp\left(-\sum_i \beta_i\, |x_i - x'_i|^{\alpha_i}\right) \]
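As a concrete illustration of equations (2) and (3), the sketch below simulates a toy TVAR(p) ensemble whose innovations are correlated across runs through a power exponential correlation. All sizes, coefficient paths, and parameter values are illustrative choices of ours, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def pow_exp_corr(x, beta=1.6, alpha=2.0):
    """Power-exponential correlation over the inputs (illustrative parameters)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-beta * d ** alpha)

n, T, p = 4, 200, 2                       # toy sizes: n runs, T times, lag p
x = np.linspace(0.0, 1.0, n)
Sigma1 = pow_exp_corr(x)
L = np.linalg.cholesky(Sigma1)            # for correlated innovation draws

# TVAR(p) with slowly varying coefficients; phi[t] = (phi_{t,1}, phi_{t,2})
phi = np.column_stack([0.5 + 0.2 * np.sin(2 * np.pi * np.arange(T) / T),
                       -0.3 * np.ones(T)])
v = 0.1 * np.ones(T)                      # innovation variances v_t
y = np.zeros((T, n))
for t in range(p, T):
    eps = np.sqrt(v[t]) * (L @ rng.standard_normal(n))   # eps_t(.) ~ GP(0, v_t Corr)
    y[t] = sum(phi[t, j] * y[t - 1 - j] for j in range(p)) + eps
```

Each column of `y` is one simulated run; at every time the n innovations share the spatial correlation Σ1, which is exactly the coupling mechanism of equation (3).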
The model in equation (2) can be connected with the SAVE model given in equation (1)
in an approximate sense. Consider the likelihood for the SAVE model in equation (1). Let
\(y_t = (y(x_1, t), \ldots, y(x_n, t))'\). We represent the likelihood in equation (1) as the product of
conditional likelihoods,

\[ L(y_T, y_{T-1}, \ldots, y_1 \mid \Theta) = \left(\prod_{i=T}^{p+1} L(y_i \mid y_{i-1}, \ldots, y_1, \Theta)\right) L(y_p, y_{p-1}, \ldots, y_1 \mid \Theta) \tag{4} \]
Next, at any time t, we approximate the conditional likelihood as

\[ L(y_t \mid y_{t-1}, \ldots, y_1, \Theta) \approx L(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta) \tag{5} \]

Let \(\rho(k, l) = \exp(-\beta^{(t)} |k - l|^{\alpha^{(t)}})\), \(\rho_{t,t-1:t-p} = (\rho(t, t-1), \ldots, \rho(t, t-p))'\), and
\((\tilde\Sigma_2)_{k,l} = \rho(k, l)\) for \(k, l = t-1, \ldots, t-p\). The conditional likelihoods in equation (5) are multivariate
normals with mean vectors
\[
E(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)
= \left(\rho_{t,t-1:t-p} \otimes \Sigma_1\right)' \left(\tilde\Sigma_2 \otimes \Sigma_1\right)^{-1}
\begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}
= \left(\rho_{t,t-1:t-p}'\, \tilde\Sigma_2^{-1}\right) \otimes I_{n\times n}
\begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}
\]

This implies the autoregressive term in equation (2),

\[
y^M(x, t) = \rho_{t,t-1:t-p}'\, \tilde\Sigma_2^{-1}
\begin{pmatrix} y^M(x, t-1) \\ \vdots \\ y^M(x, t-p) \end{pmatrix}
\]
We assume that \(\mathrm{Corr}^{(t)}(\cdot,\cdot) = \mathrm{Corr}(\cdot,\cdot)\) in equation (3) because the covariance matrix
of the conditional likelihood \(L(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)\) is time-independent. To see this, we
represent \(\mathrm{Cov}(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)\) as

\[
\begin{aligned}
\mathrm{Cov}(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)
&= \frac{1}{\lambda_M}\left(\Sigma_1 - \left(\rho_{t,t-1:t-p} \otimes \Sigma_1\right)' \left(\tilde\Sigma_2 \otimes \Sigma_1\right)^{-1} \left(\rho_{t,t-1:t-p} \otimes \Sigma_1\right)\right) \\
&= \frac{1}{\lambda_M}\left(1 - \rho_{t,t-1:t-p}'\, \tilde\Sigma_2^{-1}\, \rho_{t,t-1:t-p}\right)\Sigma_1
\end{aligned}
\]
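The collapse of the Kronecker conditional covariance to a scalar multiple of Σ1 is easy to check numerically. A small sketch with toy sizes and illustrative correlation parameters of ours:

```python
import numpy as np

def rho(k, l, beta=0.5, alpha=1.0):
    """Power-exponential correlation in time (illustrative parameters)."""
    return np.exp(-beta * abs(k - l) ** alpha)

n, p, t = 3, 2, 10
x = np.linspace(0.0, 1.0, n)
Sigma1 = np.exp(-1.6 * np.abs(x[:, None] - x[None, :]) ** 2)    # n x n input correlation

lags = [t - 1, t - 2]                                            # t-1, ..., t-p for p = 2
rho_vec = np.array([rho(t, l) for l in lags])                    # rho_{t, t-1:t-p}
Sigma2t = np.array([[rho(k, l) for l in lags] for k in lags])    # p x p, Sigma_2 tilde

# Full Kronecker form of the conditional covariance (up to the 1/lambda_M factor)
cross = np.kron(rho_vec.reshape(-1, 1), Sigma1)                  # (p n) x n
full = Sigma1 - cross.T @ np.linalg.inv(np.kron(Sigma2t, Sigma1)) @ cross

# Scalar form: (1 - rho' Sigma2t^{-1} rho) Sigma1
scalar = 1.0 - rho_vec @ np.linalg.solve(Sigma2t, rho_vec)
max_err = np.abs(full - scalar * Sigma1).max()
```

The two forms agree to machine precision; since the scalar factor does not depend on Σ1, the conditional covariance is a fixed multiple of the input correlation at every t, as the derivation asserts.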
Finally, recognizing that the functional outputs of computer models are usually tempo-
rally inhomogeneous, we adapt our model to such inhomogeneity by allowing time-varying
autoregressive coefficients and time-varying variances of the innovations in equation (2).
3 Likelihood and the Prior Distributions
3.1 The Multivariate DLM representation
We can represent the likelihood in matrix form, i.e.,

\[
\begin{pmatrix} y(x_1, t) \\ y(x_2, t) \\ \vdots \\ y(x_n, t) \end{pmatrix}
=
\begin{pmatrix}
y(x_1, t-1) & y(x_1, t-2) & \cdots & y(x_1, t-p) \\
y(x_2, t-1) & y(x_2, t-2) & \cdots & y(x_2, t-p) \\
\vdots & \vdots & \ddots & \vdots \\
y(x_n, t-1) & y(x_n, t-2) & \cdots & y(x_n, t-p)
\end{pmatrix}
\begin{pmatrix} \phi_{t,1} \\ \phi_{t,2} \\ \vdots \\ \phi_{t,p} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_t(x_1) \\ \varepsilon_t(x_2) \\ \vdots \\ \varepsilon_t(x_n) \end{pmatrix}
\tag{6}
\]
We model the TVAR coefficients \(\Phi_t = (\phi_{t,1}, \phi_{t,2}, \ldots, \phi_{t,p})'\) as

\[ \Phi_t = \Phi_{t-1} + w_t \]

where \(w_t \sim N(0, W_t)\). Let \(G_t\) be the identity matrix of size p, \(V_t = v_t \Sigma_1\), and

\[
F_t' =
\begin{pmatrix}
y(x_1, t-1) & y(x_1, t-2) & \cdots & y(x_1, t-p) \\
y(x_2, t-1) & y(x_2, t-2) & \cdots & y(x_2, t-p) \\
\vdots & \vdots & \ddots & \vdots \\
y(x_n, t-1) & y(x_n, t-2) & \cdots & y(x_n, t-p)
\end{pmatrix}
\]

We can then represent the likelihood as a multivariate DLM (West and Harrison, 1997),
\(\{F_t, G_t, V_t, W_t\}_{t=1}^{T}\).
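In this DLM, \(F_t'\) is simply the n × p matrix of lagged outputs. A small helper sketch (the function name and toy data are ours) that builds it from a T × n array of run outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, p = 4, 30, 3
y = rng.standard_normal((T, n))          # stand-in for the n computer runs over time

def regression_matrix(y, t, p):
    """F_t' of the multivariate DLM: row i holds (y(x_i, t-1), ..., y(x_i, t-p))."""
    return np.column_stack([y[t - j] for j in range(1, p + 1)])   # n x p

Ft_prime = regression_matrix(y, t=10, p=p)
```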
3.2 The Prior distributions

Let \(D_t\) be the data up to time t. We sequentially specify the prior distributions for \(W_t\) and
\(v_t\) by two discount factors \(\delta_1, \delta_2\),

\[ v_t^{-1} \mid D_{t-1} \sim G(\delta_1 n_{t-1}/2,\; \delta_1 d_{t-1}/2) \]

For \(W_t\), we assume

\[ W_t \mid D_{t-1} = (1 - \delta_2)\, C_{t-1}/\delta_2 \]

where \(C_{t-1} = \mathrm{Cov}(\Phi_{t-1} \mid D_{t-1})\) will be specified recursively in Appendix A. The values
\((n_0, d_0, C_0)\) are prespecified.

Finally, for the spatial parameters \(\alpha = \{\alpha_i\}\) and \(\beta = \{\beta_i\}\), we use the Jeffreys-rule
prior \(\pi(\alpha, \beta)\) discussed in Berger et al. (2001) and Paulo (2005),

\[ \pi(\alpha, \beta) \propto |I(\alpha, \beta)|^{1/2} \propto \sqrt{\left|\mathrm{tr}\left(\left(\Sigma_1^{-1}\dot\Sigma_1\right)^2\right)\right|} \]

where \(I(\alpha, \beta)\) is the Fisher information matrix and \(\dot\Sigma_1 = \partial\Sigma_1/\partial(\alpha, \beta)\).
4 MCMC method for the Multivariate DLM

We use Markov chain Monte Carlo (MCMC) to draw samples from the posterior
distribution \(\pi(\{v_1, \ldots, v_T\}, \{\Phi_1, \ldots, \Phi_T\}, \{\alpha, \beta\} \mid D_T)\). We first give the algorithm as
follows. At the i'th iteration,

1. Sample \((\{\alpha^{(i)}, \beta^{(i)}\} \mid D_T, \{v_1^{(i-1)}, \ldots, v_T^{(i-1)}\}, \{\Phi_1^{(i-1)}, \ldots, \Phi_T^{(i-1)}\})\) by the Metropolis-Hastings
algorithm.

2. Sample \((\{v_1^{(i)}, \ldots, v_T^{(i)}\}, \{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\})\) as:

2.1 Sample \((\{v_1^{(i)}, \ldots, v_T^{(i)}\} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\})\). This will be discussed in section 4.1.

2.2 Sample \((\{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\} \mid D_T, \{v_1^{(i)}, \ldots, v_T^{(i)}\}, \{\alpha^{(i)}, \beta^{(i)}\})\) as in section 4.2.
4.1 Sampling the variances

We give the algorithm to update the variances \((\{v_1^{(i)}, \ldots, v_T^{(i)}\} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\})\):

1. Run the forward filter with \(\{v_1, \ldots, v_T\}\) unknown, as discussed in Appendix B.

2. Sample \(\left((v_T^{-1})^{(i)} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\}\right) \sim G(n_T/2,\; d_T/2)\).

3. Sample \(v_t,\; t = T-1, \ldots, 1\) recursively as

\[ v_t^{-1} = \delta_1\, v_{t+1}^{-1} + G\left((1-\delta_1)\, n_t/2,\; d_t/2\right) \]
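A sketch of this backward pass in numpy. The \((n_t, d_t)\) sequences below are constant stand-ins for the forward-filter output, and note that numpy's gamma sampler is parameterized by shape and scale, so a rate of \(d/2\) becomes a scale of \(2/d\):

```python
import numpy as np

rng = np.random.default_rng(2)
T, delta1 = 100, 0.95
# Stand-ins for the (n_t, d_t) sequences produced by the forward filter
n_seq = np.full(T, 10.0)
d_seq = np.full(T, 5.0)

# Backward sampling of the precisions v_t^{-1}
prec = np.empty(T)
prec[T - 1] = rng.gamma(n_seq[T - 1] / 2.0, 2.0 / d_seq[T - 1])   # G(n_T/2, d_T/2)
for t in range(T - 2, -1, -1):
    prec[t] = delta1 * prec[t + 1] + rng.gamma((1 - delta1) * n_seq[t] / 2.0,
                                               2.0 / d_seq[t])
v = 1.0 / prec
```

The discount construction guarantees every sampled precision is positive, so the variances \(v_t\) are well defined at every t.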
4.2 Sampling the TVAR coefficients

Below is the algorithm to draw from \(\pi(\{\Phi_1, \ldots, \Phi_T\} \mid D_T, \{v_1, \ldots, v_T\}, \{\alpha, \beta\})\):

1. Run the forward filter conditional on \(\{v_1, \ldots, v_T\}\), as discussed in Appendix A.

2. Sample \((\Phi_T \mid D_T, \{v_1, \ldots, v_T\}) \sim \mathrm{MVN}(m_T, C_T)\).

3. Sample \(\Phi_t,\; t = T-1, \ldots, 1\) recursively from

\[ (\Phi_t \mid D_T, \Phi_{t+1}, \{v_1, \ldots, v_T\}) \sim \mathrm{MVN}\left((1-\delta_2)\, m_t + \delta_2\, \Phi_{t+1},\; (1-\delta_2)\, C_t\right) \]
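The backward sampler for the coefficients can be sketched as follows; the filtered moments \((m_t, C_t)\) are stubbed out with constant stand-ins rather than computed from data. Under the random-walk evolution with discount \(\delta_2\), the conditional mean and covariance take the simple mixture forms used below:

```python
import numpy as np

rng = np.random.default_rng(3)
T, p, delta2 = 50, 2, 0.99
# Stand-ins for the filtered moments (m_t, C_t) from the forward pass
m = np.zeros((T, p))
C = np.stack([0.01 * np.eye(p)] * T)

Phi = np.empty((T, p))
Phi[T - 1] = rng.multivariate_normal(m[T - 1], C[T - 1])          # Phi_T ~ MVN(m_T, C_T)
for t in range(T - 2, -1, -1):
    mean = (1 - delta2) * m[t] + delta2 * Phi[t + 1]              # shrink toward Phi_{t+1}
    Phi[t] = rng.multivariate_normal(mean, (1 - delta2) * C[t])
```

With \(\delta_2\) close to 1 the conditional covariance \((1-\delta_2) C_t\) is small, so the sampled coefficient paths are smooth, matching the near-random-walk evolution of \(\Phi_t\).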
4.3 Spatial interpolation

We predict the output of the computer model at a new input value by spatial interpolation.
Suppose x is the new (unexercised) input value. Let \(e_t(x_i) = y_t(x_i) - \sum_j y_{t-j}(x_i)\, \phi_{t,j}\) and
\(\rho_x(x, x_{1:n}) = (\mathrm{Corr}(x, x_1), \ldots, \mathrm{Corr}(x, x_n))'\). We have

\[ \left(y_t(x) \mid \{y_{t-1}(x), \ldots, y_{t-p}(x)\}, \mathrm{Data}, \{v_1, \ldots, v_T\}, \{\alpha, \beta\}\right) \sim N\left(\mu_t(x),\; \sigma_t^2(x)\right) \]

where

\[
\mu_t(x) = \sum_j y_{t-j}(x)\, \phi_{t,j} + \rho_x(x, x_{1:n})'\, \Sigma_1^{-1}
\begin{pmatrix} e_t(x_1) \\ e_t(x_2) \\ \vdots \\ e_t(x_n) \end{pmatrix}
\]

and

\[ \sigma_t^2(x) = v_t \left(1 - \rho_x(x, x_{1:n})'\, \Sigma_1^{-1}\, \rho_x(x, x_{1:n})\right) \]

As all computer model emulators do, the DLM modelling approach returns the computer
model output exactly when we make predictions at the exercised computer input values. In
other words, if \(x \in \{x_1, \ldots, x_n\}\), we have \(\mu_t(x) = y_t(x)\) and \(\sigma_t^2(x) = 0\).
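The interpolation at a single time t can be sketched with toy quantities (coefficients, variance, and design inputs are illustrative stand-ins of ours). The interpolating property is visible directly: at a design point the predictive mean reproduces the run and the variance collapses to zero:

```python
import numpy as np

rng = np.random.default_rng(4)

def corr(a, b, beta=1.6, alpha=2.0):
    return np.exp(-beta * np.abs(a - b) ** alpha)

n, p = 4, 2
xs = np.linspace(0.0, 1.0, n)                     # design inputs x_1, ..., x_n
Sigma1 = corr(xs[:, None], xs[None, :])
phi_t = np.array([0.6, -0.2])                     # TVAR coefficients at time t (illustrative)
v_t = 0.1

y_lag = rng.standard_normal((p, n))               # y_{t-1}, ..., y_{t-p} at the design points
y_t = phi_t @ y_lag + np.sqrt(v_t) * np.linalg.cholesky(Sigma1) @ rng.standard_normal(n)
e_t = y_t - phi_t @ y_lag                         # residuals e_t(x_i) at the design points

def interpolate(x_new, y_lag_new):
    """Predictive mean and variance of y_t at an input x_new."""
    rho = corr(x_new, xs)                         # correlations to the design points
    w = np.linalg.solve(Sigma1, rho)              # Sigma1^{-1} rho
    mu = phi_t @ y_lag_new + w @ e_t
    var = v_t * (1.0 - rho @ w)
    return mu, var

# At a design point the emulator reproduces the run exactly
mu0, var0 = interpolate(xs[0], y_lag[:, 0])
```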
5 An example

5.1 The data

Figure 1 gives an example of the functional outputs of computer models. Each time series
is associated with the x value located to the left of the series. The x values are the
computer model inputs. The data at x = 0.5 (in red) is obtained from a real
physical experiment and is observed at T = 3000 time points. We use \(y(0.5) =
(y_t(0.5),\; t = 1, \ldots, T)\) to represent it. Given \(y(0.5)\) and its TVAR(20) fit \(\{\phi_{t,j}, v_t\}\), we simulate
the data for x = 0.25, . . . , 0.75 by fixing α = 2, β = 1.6. The details are discussed in
Appendix C.
Figure 1: The simulated computer model data at various input values
5.2 MCMC Results

As described in section 4, we can sample \(\{v_1^{(i)}, \ldots, v_T^{(i)}\}\) and \(\{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\}\) exactly,
conditional on \(\{\alpha^{(i)}, \beta^{(i)}\}\). This implies that we do not need to update them in every iteration.
In particular, we update \(\{v_1^{(i)}, \ldots, v_T^{(i)}\}\) and \(\{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\}\) after every 200 iterations of sampling
\(\{\alpha^{(i)}, \beta^{(i)}\}\) by the Metropolis-Hastings algorithm. We fix \(\{\alpha_i\}\) at 2 for the example data
set. For the other unknowns, starting the MCMC from the "true" parameter values, we obtained
N = 2000 samples, of which the first 1000 are treated as burn-in and are
discarded in all posterior inferences. Figure 2 gives the trace plot, prior distribution (up
to a normalizing constant), posterior distribution, and autocorrelation function for β. For the
purpose of comparing the prior and the posterior distribution of β, we
highlight with a red line the prior distribution on the interval (1, 2), within which the posterior
draws are concentrated.
Suppose \(\{\phi_{t,j}^{(i)}\}\) is the i'th MCMC draw of the TVAR coefficients \(\{\phi_{t,j}\}\), where \(i = 1, \ldots, N\),
\(t = 1, \ldots, T\), and \(j = 1, \ldots, 20\). We calculate the posterior mean \(\hat\phi_{t,j}\) of \(\phi_{t,j}\) by

\[ \hat\phi_{t,j} = \frac{1}{N} \sum_i \phi_{t,j}^{(i)} \]

The point-wise posterior means of the TVAR coefficients are shown in the left panel
of figure 3. The right panel shows \(\{\hat v_t,\; t = 1, \ldots, T\}\), the point-wise posterior means of \(\{v_t\}\).
5.3 Spatial interpolation

One direct application of the multivariate DLM, as discussed in section 4.3, is to obtain
predictions for the computer model at inputs other than the design points. In figure 4, we
give our prediction for the dynamic computer model outputs at input value x = 0.5. We
also compare the true outputs and our prediction over the time intervals
Figure 2: Upper-left: trace plot of the MCMC samples for β; Upper-right: autocorrelation functions of the MCMC samples for β; Lower-left: posterior distribution of β; Lower-right: prior density of β.
(1100, 1300) and (2700, 2900), where the data exhibit interesting features.
5.4 Wave and modular decomposition

We can decompose the process \(\{y_t\}\) as
Figure 3: Left: posterior means of the TVAR coefficients {φt,j}; Right: posterior means of the time-varying variances {vt}.
Figure 4: Posterior predictive curve (green), true computer model output (red), and 90% point-wise predictive intervals for spatial interpolation at input value x = 0.5.
\[ y_t = \sum_{l=1}^{c} z_{t,l} + \sum_{l=1}^{r} x_{t,l} \]
where the latent processes \(\{z_{t,l}\}\) are TVARs with lag 1 and the \(\{x_{t,l}\}\) are stochastically time-
varying damped harmonic components, each of which is associated with a modulus (damp-
ing parameter) \(a_{t,l}\) and a wavelength (period) \(\lambda_{t,l}\) (West and Harrison, 1997). Such a
decomposition can help in understanding the physical meaning of the computer model outputs.
In Figure 5, we show the decomposition of the posterior mean of the process \(\{y_t(0.5)\}\). In
Figure 6, we show the moduli and the wavelengths of the first 5 components as functions
of t.
Figure 5: The true computer model output data {yt(0.5)} (bottom), posterior mean of {yt(0.5)} (second from bottom), and decomposition of the posterior mean (the remaining curves are the first to third components, from bottom to top).
A Forward filtering with known variances

We briefly review the forward filtering algorithm with known variances for the multivariate
DLM. For more details, refer to Chapter 16 of West and Harrison (1997).

With \((m_0, C_0)\):
Figure 6: Left: wave decompositions; Right: modular decompositions
(a). Posterior at t − 1: \((\Phi_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})\).

(b). Prior at t: \((\Phi_t \mid D_{t-1}) \sim N(a_t, R_t)\), with

\[ a_t = m_{t-1}, \qquad R_t = C_{t-1}/\delta_2 \]

(c). One-step forecast: \((y_t \mid D_{t-1}) \sim N(f_t, Q_t)\), with

\[ f_t = F_t'\, a_t = F_t'\, m_{t-1}, \qquad Q_t = F_t'\, C_{t-1}\, F_t/\delta_2 + v_t \Sigma_1 \]

(d). Posterior at t: \((\Phi_t \mid D_t) \sim N(m_t, C_t)\), with

\[ m_t = a_t + A_t e_t, \qquad C_t = R_t - A_t Q_t A_t' \]

where

\[ A_t = R_t F_t Q_t^{-1}, \qquad e_t = y_t - f_t \]
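The recursions (a)-(d) translate into a compact numpy loop. In the sketch below Σ1 is taken as the identity and the data are synthetic stand-ins; the update formulas themselves follow the steps above:

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, p, delta2 = 3, 40, 2, 0.99
Sigma1 = np.eye(n)                        # identity cross-run correlation for the sketch
y = rng.standard_normal((T, n))           # stand-in observations
v = 0.1 * np.ones(T)                      # known variances v_t

m = np.zeros(p)                           # m_0
C = np.eye(p)                             # C_0
for t in range(p, T):
    F = np.column_stack([y[t - j] for j in range(1, p + 1)]).T   # p x n, so F' is n x p
    a, R = m, C / delta2                  # (b) prior at t
    f = F.T @ a                           # (c) one-step forecast mean
    Q = F.T @ R @ F + v[t] * Sigma1       #     forecast covariance
    A = R @ F @ np.linalg.inv(Q)          # adaptive gain A_t
    e = y[t] - f                          # forecast error e_t
    m = a + A @ e                         # (d) posterior mean m_t
    C = R - A @ Q @ A.T                   #     posterior covariance C_t
```

The discount \(\delta_2\) replaces an explicit \(W_t\): inflating \(C_{t-1}\) by \(1/\delta_2\) at each step plays the role of the evolution noise.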
B Forward filtering with unknown variances

We now describe the forward filtering algorithm with unknown variances for the multivariate
DLM.

With \((m_0, C_0, s_0, n_0)\):

(a). Posterior at t − 1: \((\Phi_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})\).

(b). Prior at t: \((\Phi_t \mid D_{t-1}) \sim N(a_t, R_t)\), with

\[ a_t = m_{t-1}, \qquad R_t = C_{t-1}/\delta_2 \]

(c). One-step forecast: \((y_t \mid D_{t-1}) \sim N(f_t, Q_t)\), with

\[ f_t = F_t'\, a_t = F_t'\, m_{t-1}, \qquad Q_t = F_t'\, C_{t-1}\, F_t/\delta_2 + s_{t-1} \Sigma_1 \]

(d). Posterior at t: \((\Phi_t \mid D_t) \sim T_{n_t}(m_t, C_t)\) and \((v_t^{-1} \mid D_t) \sim G(n_t/2, d_t/2)\), with

\[ A_t = R_t F_t Q_t^{-1} = C_{t-1} F_t Q_t^{-1}/\delta_2 \]

where \(m_t = m_{t-1} + A_t e_t\), \(e_t = y_t - F_t'\, m_{t-1}\), \(C_t = \frac{s_t}{s_{t-1}}\left(\frac{C_{t-1}}{\delta_2} - A_t Q_t A_t'\right)\), and

\[ n_t = \delta_1 n_{t-1} + n, \qquad d_t = \delta_1 d_{t-1} + s_{t-1}\, e_t'\, Q_t^{-1}\, e_t \tag{7} \]
Now, we derive the relationship in equation (7). At time t, the prior for \(v_t^{-1}\) is

\[ (v_t^{-1} \mid D_{t-1}) \sim G(\delta_1 n_{t-1}/2,\; \delta_1 d_{t-1}/2) \]

The likelihood is

\[ (e_t \mid D_{t-1}, v_t^{-1}) \sim N(0, Q_t) \]

Therefore, the posterior distribution for \(v_t^{-1}\) is

\[ \pi(v_t^{-1} \mid D_t) \propto \frac{1}{|v_t Q_t|^{1/2}} \exp\left(-\frac{s_{t-1}}{2 v_t}\, e_t'\, Q_t^{-1}\, e_t\right) (v_t^{-1})^{\delta_1 n_{t-1}/2 - 1} \exp\left(-\delta_1 d_{t-1}\, v_t^{-1}/2\right) \]

This implies that

\[ (v_t^{-1} \mid D_t) \sim G\left((n + \delta_1 n_{t-1})/2,\; (\delta_1 d_{t-1} + s_{t-1}\, e_t'\, Q_t^{-1}\, e_t)/2\right) \]

In other words,

\[ n_t = \delta_1 n_{t-1} + n, \qquad d_t = \delta_1 d_{t-1} + s_{t-1}\, e_t'\, Q_t^{-1}\, e_t, \qquad s_t = d_t/n_t \]
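A single filtering step with unknown variance, including the equation (7) updates, can be sketched with synthetic stand-ins for the state, data, and the previous \((n_{t-1}, d_{t-1})\):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, delta1, delta2 = 3, 2, 0.95, 0.99
Sigma1 = np.eye(n)                               # identity correlation for the sketch

# Stand-in state and hyperparameters carried in from time t-1
m, C = np.zeros(p), np.eye(p)
n_prev, d_prev = 10.0, 5.0
s_prev = d_prev / n_prev                         # s_{t-1} = d_{t-1} / n_{t-1}
F = rng.standard_normal((p, n))                  # stand-in F_t
y_t = rng.standard_normal(n)                     # stand-in observation

R = C / delta2                                   # prior covariance R_t
Q = F.T @ R @ F + s_prev * Sigma1                # one-step forecast covariance
A = R @ F @ np.linalg.inv(Q)                     # gain A_t
e = y_t - F.T @ m                                # forecast error e_t
n_t = delta1 * n_prev + n                        # equation (7), degrees of freedom
d_t = delta1 * d_prev + s_prev * e @ np.linalg.solve(Q, e)
s_t = d_t / n_t
m_new = m + A @ e                                # posterior mean m_t
C_new = (s_t / s_prev) * (R - A @ Q @ A.T)       # rescaled posterior covariance C_t
```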
C Data simulation

Suppose we have functional data \((y_t(x),\; t = 1, \ldots, T)\) at input x. Given \((\alpha, \beta, \{\Phi_t\}, \{v_t\})\), we simu-
late the data as

\[
\begin{pmatrix} y_t(x_1) \\ y_t(x_2) \\ \vdots \\ y_t(x_n) \end{pmatrix}
\,\Big|\, \{y_{t-1}(x_i)\}, \{\Phi_t\}, y_t(x)
\sim \mathrm{MVN}\left(\mu_t = \begin{pmatrix} \mu_t(x_1) \\ \mu_t(x_2) \\ \vdots \\ \mu_t(x_n) \end{pmatrix},\; v_t\, \Sigma\right)
\]

with

\[ \mu_t(x_i) = \sum_j \phi_{t,j}\, y_{t-j}(x_i) + \left(\rho^{(x)}(x; x_{1:n})\right)_i \left(y_t(x) - \sum_j \phi_{t,j}\, y_{t-j}(x)\right) \]

and

\[ \Sigma = \Sigma_1 - \rho^{(x)}(x; x_{1:n})\, \rho^{(x)}(x; x_{1:n})' \]

\(\Sigma_1\) is the n × n matrix with (i, j) element \((\Sigma_1)_{i,j} = \mathrm{Corr}(x_i, x_j)\), and \(\rho^{(x)}(x; x_{1:n})\) is the
n by 1 vector with i'th element \(\left(\rho^{(x)}(x; x_{1:n})\right)_i = \mathrm{Corr}(x, x_i)\).
References

Bayarri, M., Berger, J., Garcia-Donato, G., Liu, F., Palomo, J., Paulo, R., Sacks, J., Walsh,
D., Cafeo, J., and Parthasarathy, R. (2006). Computer model validation with functional
outputs. NISS Technical Report.

Bayarri, M., Berger, J., Higdon, D., Kennedy, M., Kottas, A., Paulo, R., Sacks, J., Cafeo, J.,
Cavendish, J., Lin, C., and Tu, J. (2002). A framework for validation of computer models.
In D. Pace and S. S., eds., Proceedings of the Workshop on Foundations for V&V
in the 21st Century. Society for Modeling and Simulation International.

Berger, J., Oliveira, V. D., and Sanso, B. (2001). Objective Bayesian analysis of spatially
correlated data. Journal of the American Statistical Association, 1361–1374.

Paulo, R. (2005). Default priors for Gaussian processes. Annals of Statistics, 556–582.

West, M. and Harrison, P. (1997). Bayesian Forecasting and Dynamic Models. Springer,
New York.