Bayesian Dynamic Linear Modelling for
Complex Computer Models
Fei Liu, Liang Zhang, Mike West
Abstract
Computer models may have functional outputs. Without loss of generality, we assume
that a single computer run generates a function of time. For complex computer
models, Bayarri et al. (2002) treat time as an additional computer-model input
parameter, and use the Gaussian Response Surface Approximation method (GaSP)
with a Kronecker product correlation matrix in the augmented space. However,
this approach is applicable only when there are few time points. In this paper,
we consider the Bayesian Dynamic Linear Model (West and Harrison, 1997) as an
alternative approach when there are many time points. Our method also allows
forecasting into the future.
Keywords: Computer model; Bayesian Dynamic Linear Model; Gaussian stochas-
tic process; Bayesian analysis; Forward filtering and backward sampling; MCMC.
1 Introduction
Computer models can be represented as deterministic functions of their associated pa-
rameters. There are generally two types of parameters: (a) calibration parameters u, which
are associated only with the computer code; they may be uncertain physical properties. (b)
unknown parameters x, which are associated with both the computer model and the field exper-
iments; they are characteristics of the real experiments. For simplicity, we
use x to represent (x, u). As a result, we can represent the computer model as a function
of x, y(x). On the other hand, exercising the code is very time consuming for complex
computer models. Consequently, the function y(x) is evaluated only at selected locations
(x_i, i = 1, . . . , n).
In this paper, we focus on computer models with functional outputs. We assume
that the computer model outputs are functions of time t, t = 1, . . . , T. We represent such
a computer model output as y(x, t). This type of computer model has been studied both
in Bayarri et al. (2002) and Bayarri et al. (2006). The SAVE model in Bayarri et al. (2002)
uses the Gaussian Response Surface Approximation method (GaSP) on the augmented space
of (x, t) by assuming a separable correlation in the space of x and t. They assume that the
computer model outputs are realizations from a Gaussian stochastic process defined on the
(x, t) space, i.e.,

\[ y(\cdot,\cdot) \sim \mathrm{GP}\left(\mu,\; \frac{1}{\lambda_M}\,\mathrm{Corr}((\cdot,\cdot),(\cdot,\cdot))\right) \]

where \(\mathrm{Corr}(y(x,t), y(x',t')) = \exp\left(-\sum_i \beta_i |x_i - x'_i|^{\alpha_i}\right)\exp\left(-\beta^{(t)} |t - t'|^{\alpha^{(t)}}\right)\). We
use y(x) to represent the functional output of a single computer run whose input is x,
y(x) = (y(x, t_1), . . . , y(x, t_T))'. The likelihood in SAVE is represented as

\[
\begin{pmatrix} y(x_1) \\ \vdots \\ y(x_n) \end{pmatrix}
\sim N\left(\mu \mathbf{1},\; \frac{1}{\lambda_M}\,\Sigma_1 \otimes \Sigma_2\right) \tag{1}
\]

where \((\Sigma_1)_{k,l} = \exp\left(-\sum_i \beta_i |x_{ki} - x_{li}|^{\alpha_i}\right)\) and \((\Sigma_2)_{k,l} = \exp\left(-\beta^{(t)} |t_k - t_l|^{\alpha^{(t)}}\right)\).
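The Kronecker structure in equation (1) is what makes the T × T factor the bottleneck: the identity \((\Sigma_1 \otimes \Sigma_2)^{-1} = \Sigma_1^{-1} \otimes \Sigma_2^{-1}\) means the cost is driven by inverting each factor separately, and Σ2 grows with T. A minimal numpy sketch of this identity, with toy sizes and illustrative correlation parameters (not values from the paper):

```python
import numpy as np

def pow_exp_corr(pts, beta, alpha):
    """Power-exponential correlation matrix exp(-beta |d|^alpha) (helper name is ours)."""
    d = np.abs(pts[:, None] - pts[None, :])
    return np.exp(-beta * d ** alpha)

n, T = 5, 50                                     # toy sizes: n runs, T time points
x = np.linspace(0.0, 1.0, n)
t = np.arange(1.0, T + 1)
Sigma1 = pow_exp_corr(x, beta=1.6, alpha=2.0)    # n x n, over inputs
Sigma2 = pow_exp_corr(t, beta=0.5, alpha=1.0)    # T x T, over time

# Kronecker identity: (Sigma1 kron Sigma2)^{-1} = Sigma1^{-1} kron Sigma2^{-1},
# so inversion cost is dominated by the T x T factor when T is large.
full_inv = np.linalg.inv(np.kron(Sigma1, Sigma2))
kron_inv = np.kron(np.linalg.inv(Sigma1), np.linalg.inv(Sigma2))
max_err = np.abs(full_inv - kron_inv).max()
```

Even with the identity, the T × T inverse is required, which motivates the DLM alternative developed below.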
To implement the SAVE model, one needs to invert the matrices Σ1 and Σ2, where Σ1 is
an n by n matrix and Σ2 is T by T. In the context of complex computer models, inverting
Σ1 is feasible because n is generally small. However, the dimension of Σ2 may be too
large to invert. Bayarri et al. (2006) use a basis expansion method (SAVE2), i.e.,

\[ y(x, \cdot) = \sum_{i=1}^{I} w_i(x)\, \phi_i(\cdot) \]

where \(\{\phi_i(\cdot)\}\) is a basis library (a wavelet basis in their application). They then
model the coefficients as independent spatial processes, \(w_i(\cdot) \sim \mathrm{GP}\left(\mu_i, \frac{1}{\lambda_M^i}\mathrm{Corr}_i(\cdot,\cdot)\right)\).
SAVE2 can give predictions with confidence bounds for the computer model output at
any value of x by spatial interpolation. However, it can only handle computer models
with a fixed time grid t = 1, . . . , T. Some applications of the computer model may require
forecasting into the future, weather forecasting models for instance. In this paper, we
discuss modelling the computer model code by Dynamic Linear Models (DLM), so as to capture
the temporal structure in the data.

The paper is organized as follows. We first introduce our DLM model and make
connections with the SAVE model in section 2. In section 3, we give the likelihood and
specify the prior distributions for the unknown parameters associated with the DLM model.
Section 4 discusses the MCMC method used to obtain draws from the posterior distributions of the
unknown quantities, and also gives spatial interpolation for the computer model at arbitrary
locations in the x space. The method is applied to an example data set in section 5.
2 The DLM for the computer model outputs
For a single computer model run at x, we use the time-varying autoregressive model (TVAR)
(West and Harrison, 1997) to model its temporal structure,

\[ y(x, t) = \sum_{j=1}^{p} \phi_{t,j}\, y(x, t-j) + \varepsilon_t(x) \tag{2} \]
The computer model runs are correlated by assuming a Gaussian stochastic process for
the innovations \(\varepsilon_t(x)\) in equation (2), i.e.,

\[ \varepsilon_t(\cdot) \sim \mathrm{GP}\left(0,\; v_t\,\mathrm{Corr}^{(t)}(\cdot,\cdot)\right) \tag{3} \]

where we assume that \(\mathrm{Corr}^{(t)}(\cdot,\cdot) = \mathrm{Corr}(\cdot,\cdot)\) is the same for all t. We use the
separable power exponential function for the innovation correlation, i.e.,

\[ \mathrm{Corr}(x, x') = \exp\left(-\sum_i \beta_i\, |x_i - x'_i|^{\alpha_i}\right) \]
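As a concrete illustration of equations (2) and (3), the sketch below simulates a toy TVAR(p) ensemble whose innovations are correlated across runs through a power exponential correlation. All sizes, coefficient paths, and parameter values are illustrative choices of ours, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def pow_exp_corr(x, beta=1.6, alpha=2.0):
    """Power-exponential correlation over the inputs (illustrative parameters)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-beta * d ** alpha)

n, T, p = 4, 200, 2                       # toy sizes: n runs, T times, lag p
x = np.linspace(0.0, 1.0, n)
Sigma1 = pow_exp_corr(x)
L = np.linalg.cholesky(Sigma1)            # for correlated innovation draws

# TVAR(p) with slowly varying coefficients; phi[t] = (phi_{t,1}, phi_{t,2})
phi = np.column_stack([0.5 + 0.2 * np.sin(2 * np.pi * np.arange(T) / T),
                       -0.3 * np.ones(T)])
v = 0.1 * np.ones(T)                      # innovation variances v_t
y = np.zeros((T, n))
for t in range(p, T):
    eps = np.sqrt(v[t]) * (L @ rng.standard_normal(n))   # eps_t(.) ~ GP(0, v_t Corr)
    y[t] = sum(phi[t, j] * y[t - 1 - j] for j in range(p)) + eps
```

Each column of `y` is one simulated run; at every time the n innovations share the spatial correlation Σ1, which is exactly the coupling mechanism of equation (3).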
The model in equation (2) can be connected with the SAVE model given in equation (1)
in an approximate sense. Consider the likelihood for the SAVE model in equation (1). Let
\(y_t = (y(x_1, t), \ldots, y(x_n, t))'\). We represent the likelihood in equation (1) as the product of
conditional likelihoods,

\[ L(y_T, y_{T-1}, \ldots, y_1 \mid \Theta) = \left(\prod_{i=T}^{p+1} L(y_i \mid y_{i-1}, \ldots, y_1, \Theta)\right) L(y_p, y_{p-1}, \ldots, y_1 \mid \Theta) \tag{4} \]
Next, at any time t, we approximate the conditional likelihood as

\[ L(y_t \mid y_{t-1}, \ldots, y_1, \Theta) \approx L(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta) \tag{5} \]

Let \(\rho(k, l) = \exp(-\beta^{(t)} |k - l|^{\alpha^{(t)}})\), \(\rho_{t,t-1:t-p} = (\rho(t, t-1), \ldots, \rho(t, t-p))'\), and
\((\tilde\Sigma_2)_{k,l} = \rho(k, l)\) for \(k, l = t-1, \ldots, t-p\). The conditional likelihoods in equation (5) are multivariate
normals with mean vectors
\[
E(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)
= \left(\rho_{t,t-1:t-p} \otimes \Sigma_1\right)' \left(\tilde\Sigma_2 \otimes \Sigma_1\right)^{-1}
\begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}
= \left(\rho_{t,t-1:t-p}'\, \tilde\Sigma_2^{-1}\right) \otimes I_{n\times n}
\begin{pmatrix} y_{t-1} \\ \vdots \\ y_{t-p} \end{pmatrix}
\]

This implies the autoregressive term in equation (2),

\[
y^M(x, t) = \rho_{t,t-1:t-p}'\, \tilde\Sigma_2^{-1}
\begin{pmatrix} y^M(x, t-1) \\ \vdots \\ y^M(x, t-p) \end{pmatrix}
\]
We assume that \(\mathrm{Corr}^{(t)}(\cdot,\cdot) = \mathrm{Corr}(\cdot,\cdot)\) in equation (3) because the covariance matrix
of the conditional likelihood \(L(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)\) is time-independent. To see this, we
represent \(\mathrm{Cov}(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)\) as

\[
\begin{aligned}
\mathrm{Cov}(y_t \mid y_{t-1}, \ldots, y_{t-p}, \Theta)
&= \frac{1}{\lambda_M}\left(\Sigma_1 - \left(\rho_{t,t-1:t-p} \otimes \Sigma_1\right)' \left(\tilde\Sigma_2 \otimes \Sigma_1\right)^{-1} \left(\rho_{t,t-1:t-p} \otimes \Sigma_1\right)\right) \\
&= \frac{1}{\lambda_M}\left(1 - \rho_{t,t-1:t-p}'\, \tilde\Sigma_2^{-1}\, \rho_{t,t-1:t-p}\right)\Sigma_1
\end{aligned}
\]
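The collapse of the Kronecker conditional covariance to a scalar multiple of Σ1 is easy to check numerically. A small sketch with toy sizes and illustrative correlation parameters of ours:

```python
import numpy as np

def rho(k, l, beta=0.5, alpha=1.0):
    """Power-exponential correlation in time (illustrative parameters)."""
    return np.exp(-beta * abs(k - l) ** alpha)

n, p, t = 3, 2, 10
x = np.linspace(0.0, 1.0, n)
Sigma1 = np.exp(-1.6 * np.abs(x[:, None] - x[None, :]) ** 2)    # n x n input correlation

lags = [t - 1, t - 2]                                            # t-1, ..., t-p for p = 2
rho_vec = np.array([rho(t, l) for l in lags])                    # rho_{t, t-1:t-p}
Sigma2t = np.array([[rho(k, l) for l in lags] for k in lags])    # p x p, Sigma_2 tilde

# Full Kronecker form of the conditional covariance (up to the 1/lambda_M factor)
cross = np.kron(rho_vec.reshape(-1, 1), Sigma1)                  # (p n) x n
full = Sigma1 - cross.T @ np.linalg.inv(np.kron(Sigma2t, Sigma1)) @ cross

# Scalar form: (1 - rho' Sigma2t^{-1} rho) Sigma1
scalar = 1.0 - rho_vec @ np.linalg.solve(Sigma2t, rho_vec)
max_err = np.abs(full - scalar * Sigma1).max()
```

The two forms agree to machine precision; since the scalar factor does not depend on Σ1, the conditional covariance is a fixed multiple of the input correlation at every t, as the derivation asserts.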
Finally, recognizing that the functional outputs of computer models are usually tempo-
rally inhomogeneous, we adapt our model to such inhomogeneity by allowing time-varying
autoregressive coefficients and time-varying variances of the innovations in equation (2).
3 Likelihood and the Prior Distributions
3.1 The Multivariate DLM representation
We can represent the likelihood in matrix form, i.e.,

\[
\begin{pmatrix} y(x_1, t) \\ y(x_2, t) \\ \vdots \\ y(x_n, t) \end{pmatrix}
=
\begin{pmatrix}
y(x_1, t-1) & y(x_1, t-2) & \cdots & y(x_1, t-p) \\
y(x_2, t-1) & y(x_2, t-2) & \cdots & y(x_2, t-p) \\
\vdots & \vdots & \ddots & \vdots \\
y(x_n, t-1) & y(x_n, t-2) & \cdots & y(x_n, t-p)
\end{pmatrix}
\begin{pmatrix} \phi_{t,1} \\ \phi_{t,2} \\ \vdots \\ \phi_{t,p} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_t(x_1) \\ \varepsilon_t(x_2) \\ \vdots \\ \varepsilon_t(x_n) \end{pmatrix}
\tag{6}
\]
We model the TVAR coefficients \(\Phi_t = (\phi_{t,1}, \phi_{t,2}, \ldots, \phi_{t,p})'\) as

\[ \Phi_t = \Phi_{t-1} + w_t \]

where \(w_t \sim N(0, W_t)\). Let \(G_t\) be the identity matrix of size p, \(V_t = v_t \Sigma_1\), and

\[
F_t' =
\begin{pmatrix}
y(x_1, t-1) & y(x_1, t-2) & \cdots & y(x_1, t-p) \\
y(x_2, t-1) & y(x_2, t-2) & \cdots & y(x_2, t-p) \\
\vdots & \vdots & \ddots & \vdots \\
y(x_n, t-1) & y(x_n, t-2) & \cdots & y(x_n, t-p)
\end{pmatrix}
\]

We can then represent the likelihood as a multivariate DLM (West and Harrison, 1997),
\(\{F_t, G_t, V_t, W_t\}_{t=1}^{T}\).
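In this DLM, \(F_t'\) is simply the n × p matrix of lagged outputs. A small helper sketch (the function name and toy data are ours) that builds it from a T × n array of run outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, p = 4, 30, 3
y = rng.standard_normal((T, n))          # stand-in for the n computer runs over time

def regression_matrix(y, t, p):
    """F_t' of the multivariate DLM: row i holds (y(x_i, t-1), ..., y(x_i, t-p))."""
    return np.column_stack([y[t - j] for j in range(1, p + 1)])   # n x p

Ft_prime = regression_matrix(y, t=10, p=p)
```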
3.2 The Prior distributions

Let \(D_t\) be the data up to time t. We sequentially specify the prior distributions for \(W_t\) and
\(v_t\) by two discount factors \(\delta_1, \delta_2\),

\[ v_t^{-1} \mid D_{t-1} \sim G(\delta_1 n_{t-1}/2,\; \delta_1 d_{t-1}/2) \]

For \(W_t\), we assume

\[ W_t \mid D_{t-1} = (1 - \delta_2)\, C_{t-1}/\delta_2 \]

where \(C_{t-1} = \mathrm{Cov}(\Phi_{t-1} \mid D_{t-1})\) will be specified recursively in Appendix A. The values
\((n_0, d_0, C_0)\) are prespecified.

Finally, for the spatial parameters \(\alpha = \{\alpha_i\}\) and \(\beta = \{\beta_i\}\), we use the Jeffreys-rule
prior \(\pi(\alpha, \beta)\) discussed in Berger et al. (2001) and Paulo (2005),

\[ \pi(\alpha, \beta) \propto |I(\alpha, \beta)|^{1/2} \propto \sqrt{\left|\mathrm{tr}\left(\left(\Sigma_1^{-1}\dot\Sigma_1\right)^2\right)\right|} \]

where \(I(\alpha, \beta)\) is the Fisher information matrix and \(\dot\Sigma_1 = \partial\Sigma_1/\partial(\alpha, \beta)\).
4 MCMC method for the Multivariate DLM

We use Markov chain Monte Carlo (MCMC) to draw samples from the posterior
distribution \(\pi(\{v_1, \ldots, v_T\}, \{\Phi_1, \ldots, \Phi_T\}, \{\alpha, \beta\} \mid D_T)\). We first give the algorithm as
follows. At the i'th iteration,

1. Sample \((\{\alpha^{(i)}, \beta^{(i)}\} \mid D_T, \{v_1^{(i-1)}, \ldots, v_T^{(i-1)}\}, \{\Phi_1^{(i-1)}, \ldots, \Phi_T^{(i-1)}\})\) by the Metropolis-Hastings
algorithm.

2. Sample \((\{v_1^{(i)}, \ldots, v_T^{(i)}\}, \{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\})\) as:

2.1 Sample \((\{v_1^{(i)}, \ldots, v_T^{(i)}\} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\})\). This will be discussed in section 4.1.

2.2 Sample \((\{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\} \mid D_T, \{v_1^{(i)}, \ldots, v_T^{(i)}\}, \{\alpha^{(i)}, \beta^{(i)}\})\) as in section 4.2.
4.1 Sampling the variances

We give the algorithm to update the variances \((\{v_1^{(i)}, \ldots, v_T^{(i)}\} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\})\):

1. Run the forward filter with \(\{v_1, \ldots, v_T\}\) unknown, as discussed in Appendix B.

2. Sample \(\left((v_T^{-1})^{(i)} \mid D_T, \{\alpha^{(i)}, \beta^{(i)}\}\right) \sim G(n_T/2,\; d_T/2)\).

3. Sample \(v_t,\; t = T-1, \ldots, 1\) recursively as

\[ v_t^{-1} = \delta_1\, v_{t+1}^{-1} + G\left((1-\delta_1)\, n_t/2,\; d_t/2\right) \]
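A sketch of this backward pass in numpy. The \((n_t, d_t)\) sequences below are constant stand-ins for the forward-filter output, and note that numpy's gamma sampler is parameterized by shape and scale, so a rate of \(d/2\) becomes a scale of \(2/d\):

```python
import numpy as np

rng = np.random.default_rng(2)
T, delta1 = 100, 0.95
# Stand-ins for the (n_t, d_t) sequences produced by the forward filter
n_seq = np.full(T, 10.0)
d_seq = np.full(T, 5.0)

# Backward sampling of the precisions v_t^{-1}
prec = np.empty(T)
prec[T - 1] = rng.gamma(n_seq[T - 1] / 2.0, 2.0 / d_seq[T - 1])   # G(n_T/2, d_T/2)
for t in range(T - 2, -1, -1):
    prec[t] = delta1 * prec[t + 1] + rng.gamma((1 - delta1) * n_seq[t] / 2.0,
                                               2.0 / d_seq[t])
v = 1.0 / prec
```

The discount construction guarantees every sampled precision is positive, so the variances \(v_t\) are well defined at every t.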
4.2 Sampling the TVAR coefficients

Below is the algorithm to draw from \(\pi(\{\Phi_1, \ldots, \Phi_T\} \mid D_T, \{v_1, \ldots, v_T\}, \{\alpha, \beta\})\):

1. Run the forward filter conditional on \(\{v_1, \ldots, v_T\}\), as discussed in Appendix A.

2. Sample \((\Phi_T \mid D_T, \{v_1, \ldots, v_T\}) \sim \mathrm{MVN}(m_T, C_T)\).

3. Sample \(\Phi_t,\; t = T-1, \ldots, 1\) recursively from

\[ (\Phi_t \mid D_T, \Phi_{t+1}, \{v_1, \ldots, v_T\}) \sim \mathrm{MVN}\left((1-\delta_2)\, m_t + \delta_2\, \Phi_{t+1},\; (1-\delta_2)\, C_t\right) \]
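The backward sampler for the coefficients can be sketched as follows; the filtered moments \((m_t, C_t)\) are stubbed out with constant stand-ins rather than computed from data. Under the random-walk evolution with discount \(\delta_2\), the conditional mean and covariance take the simple mixture forms used below:

```python
import numpy as np

rng = np.random.default_rng(3)
T, p, delta2 = 50, 2, 0.99
# Stand-ins for the filtered moments (m_t, C_t) from the forward pass
m = np.zeros((T, p))
C = np.stack([0.01 * np.eye(p)] * T)

Phi = np.empty((T, p))
Phi[T - 1] = rng.multivariate_normal(m[T - 1], C[T - 1])          # Phi_T ~ MVN(m_T, C_T)
for t in range(T - 2, -1, -1):
    mean = (1 - delta2) * m[t] + delta2 * Phi[t + 1]              # shrink toward Phi_{t+1}
    Phi[t] = rng.multivariate_normal(mean, (1 - delta2) * C[t])
```

With \(\delta_2\) close to 1 the conditional covariance \((1-\delta_2) C_t\) is small, so the sampled coefficient paths are smooth, matching the near-random-walk evolution of \(\Phi_t\).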
4.3 Spatial interpolation

We predict the output of the computer model at a new input value by spatial interpolation.
Suppose x is the new (unexercised) input value. Let \(e_t(x_i) = y_t(x_i) - \sum_j y_{t-j}(x_i)\, \phi_{t,j}\) and
\(\rho_x(x, x_{1:n}) = (\mathrm{Corr}(x, x_1), \ldots, \mathrm{Corr}(x, x_n))'\). We have

\[ \left(y_t(x) \mid \{y_{t-1}(x), \ldots, y_{t-p}(x)\}, \mathrm{Data}, \{v_1, \ldots, v_T\}, \{\alpha, \beta\}\right) \sim N\left(\mu_t(x),\; \sigma_t^2(x)\right) \]

where

\[
\mu_t(x) = \sum_j y_{t-j}(x)\, \phi_{t,j} + \rho_x(x, x_{1:n})'\, \Sigma_1^{-1}
\begin{pmatrix} e_t(x_1) \\ e_t(x_2) \\ \vdots \\ e_t(x_n) \end{pmatrix}
\]

and

\[ \sigma_t^2(x) = v_t \left(1 - \rho_x(x, x_{1:n})'\, \Sigma_1^{-1}\, \rho_x(x, x_{1:n})\right) \]

As all computer model emulators do, the DLM modelling approach returns the computer
model output exactly when we make predictions at the exercised computer input values. In
other words, if \(x \in \{x_1, \ldots, x_n\}\), we have \(\mu_t(x) = y_t(x)\) and \(\sigma_t^2(x) = 0\).
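The interpolation at a single time t can be sketched with toy quantities (coefficients, variance, and design inputs are illustrative stand-ins of ours). The interpolating property is visible directly: at a design point the predictive mean reproduces the run and the variance collapses to zero:

```python
import numpy as np

rng = np.random.default_rng(4)

def corr(a, b, beta=1.6, alpha=2.0):
    return np.exp(-beta * np.abs(a - b) ** alpha)

n, p = 4, 2
xs = np.linspace(0.0, 1.0, n)                     # design inputs x_1, ..., x_n
Sigma1 = corr(xs[:, None], xs[None, :])
phi_t = np.array([0.6, -0.2])                     # TVAR coefficients at time t (illustrative)
v_t = 0.1

y_lag = rng.standard_normal((p, n))               # y_{t-1}, ..., y_{t-p} at the design points
y_t = phi_t @ y_lag + np.sqrt(v_t) * np.linalg.cholesky(Sigma1) @ rng.standard_normal(n)
e_t = y_t - phi_t @ y_lag                         # residuals e_t(x_i) at the design points

def interpolate(x_new, y_lag_new):
    """Predictive mean and variance of y_t at an input x_new."""
    rho = corr(x_new, xs)                         # correlations to the design points
    w = np.linalg.solve(Sigma1, rho)              # Sigma1^{-1} rho
    mu = phi_t @ y_lag_new + w @ e_t
    var = v_t * (1.0 - rho @ w)
    return mu, var

# At a design point the emulator reproduces the run exactly
mu0, var0 = interpolate(xs[0], y_lag[:, 0])
```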
5 An example

5.1 The data

Figure 1 gives an example of the functional outputs of computer models. Each time series
is associated with the x value located to the left of the series. The x values are the
computer model inputs. The data at x = 0.5 (in red) is obtained from a real
physical experiment and is observed at T = 3000 time points. We use \(y(0.5) =
(y_t(0.5),\; t = 1, \ldots, T)\) to represent it. Given \(y(0.5)\) and its TVAR(20) fit \(\{\phi_{t,j}, v_t\}\), we simulate
the data for x = 0.25, . . . , 0.75 by fixing α = 2, β = 1.6. The details are discussed in
Appendix C.
Figure 1: The simulated computer model data at various input values
5.2 MCMC Results

As described in section 4, we can sample \(\{v_1^{(i)}, \ldots, v_T^{(i)}\}\) and \(\{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\}\) exactly,
conditional on \(\{\alpha^{(i)}, \beta^{(i)}\}\). This implies that we do not need to update them in every iteration.
In particular, we update \(\{v_1^{(i)}, \ldots, v_T^{(i)}\}\) and \(\{\Phi_1^{(i)}, \ldots, \Phi_T^{(i)}\}\) after every 200 iterations of sampling
\(\{\alpha^{(i)}, \beta^{(i)}\}\) by the Metropolis-Hastings algorithm. We fix \(\{\alpha_i\}\) at 2 for the example data
set. For the other unknowns, starting the MCMC from the "true" parameter values, we obtained
N = 2000 samples, of which the first 1000 are treated as burn-in and are
discarded in all posterior inferences. Figure 2 gives the trace plot, prior distribution (up
to a normalizing constant), posterior distribution, and autocorrelation function for β. For the
purpose of comparing the prior and the posterior distribution of β, we
highlight with a red line the prior distribution on the interval (1, 2), within which the posterior
draws are concentrated.
Suppose \(\{\phi_{t,j}^{(i)}\}\) is the i'th MCMC draw of the TVAR coefficients \(\{\phi_{t,j}\}\), where \(i = 1, \ldots, N\),
\(t = 1, \ldots, T\), and \(j = 1, \ldots, 20\). We calculate the posterior mean \(\hat\phi_{t,j}\) of \(\phi_{t,j}\) by

\[ \hat\phi_{t,j} = \frac{1}{N} \sum_i \phi_{t,j}^{(i)} \]

The point-wise posterior means of the TVAR coefficients are shown in the left panel
of figure 3. The right panel shows \(\{\hat v_t,\; t = 1, \ldots, T\}\), the point-wise posterior means of \(\{v_t\}\).
5.3 Spatial interpolation

One direct application of the multivariate DLM, as discussed in section 4.3, is to obtain
predictions for the computer model at inputs other than the design points. In figure 4, we
give our prediction for the dynamic computer model outputs at input value x = 0.5. We
also compare the true outputs and our prediction over the time intervals
Figure 2: Upper-left: trace plot of the MCMC samples for β; Upper-right: autocorrelation functions of the MCMC samples for β; Lower-left: posterior distribution of β; Lower-right: prior density of β.
(1100, 1300) and (2700, 2900), where the data exhibit interesting features.
5.4 Wave and modular decomposition

We can decompose the process \(\{y_t\}\) as
Figure 3: Left: posterior means of the TVAR coefficients {φt,j}; Right: posterior means of the time-varying variances {vt}.
Figure 4: Posterior predictive curve (green), true computer model output (red), and 90% point-wise predictive intervals for spatial interpolation at input value x = 0.5.
\[ y_t = \sum_{l=1}^{c} z_{t,l} + \sum_{l=1}^{r} x_{t,l} \]
where the latent processes \(\{z_{t,l}\}\) are TVARs with lag 1 and the \(\{x_{t,l}\}\) are stochastically time-
varying damped harmonic components, each of which is associated with a modulus (damp-
ing parameter) \(a_{t,l}\) and a wavelength (period) \(\lambda_{t,l}\) (West and Harrison, 1997). Such a
decomposition can help in understanding the physical meaning of the computer model outputs.
In Figure 5, we show the decomposition of the posterior mean of the process \(\{y_t(0.5)\}\). In
Figure 6, we show the moduli and the wavelengths of the first 5 components as functions
of t.
Figure 5: The true computer model output data {yt(0.5)} (bottom), posterior mean of {yt(0.5)} (second from bottom), and decomposition of the posterior mean (the remaining curves are the first to third components, from bottom to top).
A Forward filtering with known variances

We briefly review the forward filtering algorithm with known variances for the multivariate
DLM. For more details, refer to Chapter 16 of West and Harrison (1997).

With \((m_0, C_0)\):
Figure 6: Left: wave decompositions; Right: modular decompositions
(a). Posterior at t − 1: \((\Phi_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})\).

(b). Prior at t: \((\Phi_t \mid D_{t-1}) \sim N(a_t, R_t)\), with

\[ a_t = m_{t-1}, \qquad R_t = C_{t-1}/\delta_2 \]

(c). One-step forecast: \((y_t \mid D_{t-1}) \sim N(f_t, Q_t)\), with

\[ f_t = F_t'\, a_t = F_t'\, m_{t-1}, \qquad Q_t = F_t'\, C_{t-1}\, F_t/\delta_2 + v_t \Sigma_1 \]

(d). Posterior at t: \((\Phi_t \mid D_t) \sim N(m_t, C_t)\), with

\[ m_t = a_t + A_t e_t, \qquad C_t = R_t - A_t Q_t A_t' \]

where

\[ A_t = R_t F_t Q_t^{-1}, \qquad e_t = y_t - f_t \]
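The recursions (a)-(d) translate into a compact numpy loop. In the sketch below Σ1 is taken as the identity and the data are synthetic stand-ins; the update formulas themselves follow the steps above:

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, p, delta2 = 3, 40, 2, 0.99
Sigma1 = np.eye(n)                        # identity cross-run correlation for the sketch
y = rng.standard_normal((T, n))           # stand-in observations
v = 0.1 * np.ones(T)                      # known variances v_t

m = np.zeros(p)                           # m_0
C = np.eye(p)                             # C_0
for t in range(p, T):
    F = np.column_stack([y[t - j] for j in range(1, p + 1)]).T   # p x n, so F' is n x p
    a, R = m, C / delta2                  # (b) prior at t
    f = F.T @ a                           # (c) one-step forecast mean
    Q = F.T @ R @ F + v[t] * Sigma1       #     forecast covariance
    A = R @ F @ np.linalg.inv(Q)          # adaptive gain A_t
    e = y[t] - f                          # forecast error e_t
    m = a + A @ e                         # (d) posterior mean m_t
    C = R - A @ Q @ A.T                   #     posterior covariance C_t
```

The discount \(\delta_2\) replaces an explicit \(W_t\): inflating \(C_{t-1}\) by \(1/\delta_2\) at each step plays the role of the evolution noise.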
B Forward filtering with unknown variances

We now describe the forward filtering algorithm with unknown variances for the multivariate
DLM.

With \((m_0, C_0, s_0, n_0)\):

(a). Posterior at t − 1: \((\Phi_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})\).

(b). Prior at t: \((\Phi_t \mid D_{t-1}) \sim N(a_t, R_t)\), with

\[ a_t = m_{t-1}, \qquad R_t = C_{t-1}/\delta_2 \]

(c). One-step forecast: \((y_t \mid D_{t-1}) \sim N(f_t, Q_t)\), with

\[ f_t = F_t'\, a_t = F_t'\, m_{t-1}, \qquad Q_t = F_t'\, C_{t-1}\, F_t/\delta_2 + s_{t-1} \Sigma_1 \]

(d). Posterior at t: \((\Phi_t \mid D_t) \sim T_{n_t}(m_t, C_t)\) and \((v_t^{-1} \mid D_t) \sim G(n_t/2, d_t/2)\), with

\[ A_t = R_t F_t Q_t^{-1} = C_{t-1} F_t Q_t^{-1}/\delta_2 \]

where \(m_t = m_{t-1} + A_t e_t\), \(e_t = y_t - F_t'\, m_{t-1}\), \(C_t = \frac{s_t}{s_{t-1}}\left(\frac{C_{t-1}}{\delta_2} - A_t Q_t A_t'\right)\), and

\[ n_t = \delta_1 n_{t-1} + n, \qquad d_t = \delta_1 d_{t-1} + s_{t-1}\, e_t'\, Q_t^{-1}\, e_t \tag{7} \]
Now, we derive the relationship in equation (7). At time t, the prior for \(v_t^{-1}\) is

\[ (v_t^{-1} \mid D_{t-1}) \sim G(\delta_1 n_{t-1}/2,\; \delta_1 d_{t-1}/2) \]

The likelihood is

\[ (e_t \mid D_{t-1}, v_t^{-1}) \sim N(0, Q_t) \]

Therefore, the posterior distribution for \(v_t^{-1}\) is

\[ \pi(v_t^{-1} \mid D_t) \propto \frac{1}{|v_t Q_t|^{1/2}} \exp\left(-\frac{s_{t-1}}{2 v_t}\, e_t'\, Q_t^{-1}\, e_t\right) (v_t^{-1})^{\delta_1 n_{t-1}/2 - 1} \exp\left(-\delta_1 d_{t-1}\, v_t^{-1}/2\right) \]

This implies that

\[ (v_t^{-1} \mid D_t) \sim G\left((n + \delta_1 n_{t-1})/2,\; (\delta_1 d_{t-1} + s_{t-1}\, e_t'\, Q_t^{-1}\, e_t)/2\right) \]

In other words,

\[ n_t = \delta_1 n_{t-1} + n, \qquad d_t = \delta_1 d_{t-1} + s_{t-1}\, e_t'\, Q_t^{-1}\, e_t, \qquad s_t = d_t/n_t \]
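A single filtering step with unknown variance, including the equation (7) updates, can be sketched with synthetic stand-ins for the state, data, and the previous \((n_{t-1}, d_{t-1})\):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, delta1, delta2 = 3, 2, 0.95, 0.99
Sigma1 = np.eye(n)                               # identity correlation for the sketch

# Stand-in state and hyperparameters carried in from time t-1
m, C = np.zeros(p), np.eye(p)
n_prev, d_prev = 10.0, 5.0
s_prev = d_prev / n_prev                         # s_{t-1} = d_{t-1} / n_{t-1}
F = rng.standard_normal((p, n))                  # stand-in F_t
y_t = rng.standard_normal(n)                     # stand-in observation

R = C / delta2                                   # prior covariance R_t
Q = F.T @ R @ F + s_prev * Sigma1                # one-step forecast covariance
A = R @ F @ np.linalg.inv(Q)                     # gain A_t
e = y_t - F.T @ m                                # forecast error e_t
n_t = delta1 * n_prev + n                        # equation (7), degrees of freedom
d_t = delta1 * d_prev + s_prev * e @ np.linalg.solve(Q, e)
s_t = d_t / n_t
m_new = m + A @ e                                # posterior mean m_t
C_new = (s_t / s_prev) * (R - A @ Q @ A.T)       # rescaled posterior covariance C_t
```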
C Data simulation

Suppose we have functional data \((y_t(x),\; t = 1, \ldots, T)\) at input x. Given \((\alpha, \beta, \{\Phi_t\}, \{v_t\})\), we simu-
late the data as

\[
\begin{pmatrix} y_t(x_1) \\ y_t(x_2) \\ \vdots \\ y_t(x_n) \end{pmatrix}
\,\Big|\, \{y_{t-1}(x_i)\}, \{\Phi_t\}, y_t(x)
\sim \mathrm{MVN}\left(\mu_t = \begin{pmatrix} \mu_t(x_1) \\ \mu_t(x_2) \\ \vdots \\ \mu_t(x_n) \end{pmatrix},\; v_t\, \Sigma\right)
\]

with

\[ \mu_t(x_i) = \sum_j \phi_{t,j}\, y_{t-j}(x_i) + \left(\rho^{(x)}(x; x_{1:n})\right)_i \left(y_t(x) - \sum_j \phi_{t,j}\, y_{t-j}(x)\right) \]

and

\[ \Sigma = \Sigma_1 - \rho^{(x)}(x; x_{1:n})\, \rho^{(x)}(x; x_{1:n})' \]

\(\Sigma_1\) is the n × n matrix with (i, j) element \((\Sigma_1)_{i,j} = \mathrm{Corr}(x_i, x_j)\), and \(\rho^{(x)}(x; x_{1:n})\) is the
n by 1 vector with i'th element \(\left(\rho^{(x)}(x; x_{1:n})\right)_i = \mathrm{Corr}(x, x_i)\).
References

Bayarri, M., Berger, J., Garcia-Donato, G., Liu, F., Palomo, J., Paulo, R., Sacks, J., Walsh,
D., Cafeo, J., and Parthasarathy, R. (2006). Computer model validation with functional
outputs. NISS Technical Report.

Bayarri, M., Berger, J., Higdon, D., Kennedy, M., Kottas, A., Paulo, R., Sacks, J., Cafeo, J.,
Cavendish, J., Lin, C., and Tu, J. (2002). A framework for validation of computer models.
In D. Pace and S. S., eds., Proceedings of the Workshop on Foundations for V&V
in the 21st Century. Society for Modeling and Simulation International.

Berger, J., Oliveira, V. D., and Sanso, B. (2001). Objective Bayesian analysis of spatially
correlated data. Journal of the American Statistical Association, 1361–1374.

Paulo, R. (2005). Default priors for Gaussian processes. Annals of Statistics, 556–582.

West, M. and Harrison, P. (1997). Bayesian Forecasting and Dynamic Models. Springer,
New York.