
Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis Hastings algorithm

T. Cui,¹ C. Fox,² and M. J. O'Sullivan¹

Received 23 December 2010; revised 5 September 2011; accepted 8 September 2011; published 20 October 2011.

[1] The aim of this research is to estimate the parameters of a large-scale numerical model of a geothermal reservoir using Markov chain Monte Carlo (MCMC) sampling, within the framework of Bayesian inference. All feasible parameters that are consistent with the measured data are summarized by the posterior distribution, and hence parameter estimation and uncertainty quantification are both given by calculating expected values of statistics of interest over the posterior distribution. It appears to be computationally infeasible to use the standard Metropolis-Hastings algorithm (MH) to sample this high-dimensional, computationally expensive posterior distribution. To improve the sampling efficiency, a new adaptive delayed acceptance MH algorithm (ADAMH) is implemented to adaptively build a stochastic model of the error introduced by the use of a reduced-order model. This use of adaptivity differs from existing adaptive MCMC algorithms that tune the proposal distributions of MH, though ADAMH also implements that technique. For the 3-D geothermal reservoir model we present here, ADAMH greatly improves the computational efficiency of the MCMC sampling, and promising results for parameter estimation and uncertainty quantification are obtained. This algorithm could offer significant improvement in computational efficiency when implementing sample-based inference in other large-scale inverse problems.

Citation: Cui, T., C. Fox, and M. J. O'Sullivan (2011), Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis Hastings algorithm, Water Resour. Res., 47, W10521, doi:10.1029/2010WR010352.

1. Introduction

[2] In setting up numerical models for understanding subsurface fluid transport, e.g., groundwater modeling, geothermal reservoir modeling and petroleum reservoir modeling, the estimation of spatially distributed parameters and the quantification of the associated uncertainties are important topics for investigation [Zimmerman et al., 1998; Carrera et al., 2005; Liu and Gupta, 2007; Hendricks-Franssen et al., 2009]. In this process, direct and indirect measurements of the subsurface properties of the reservoir provide limited but necessary knowledge for the assessment of parameters. This study aims at inferring parameters from state data (such as temperatures and pressures) that are indirectly related to the parameters through a nonlinear forward model. Since hard data (localized direct measurements of the parameters) are not available for the reservoir we present here, geostatistical approaches for interpreting hard data, such as those of Deutsch and Journel [1998], are not discussed.

[3] Because the parameters of interest, such as porosity and permeability, are usually spatially distributed, highly heterogeneous and anisotropic, and state data are usually sparse, conditioning the parameters on state data is an ill-posed inverse problem [Hadamard, 1902; Yeh, 1986; Carrera et al., 2005]. Ill-posedness means that there exists a range of feasible parameters that are consistent with the measured data, and hence a range of possible model predictions. Thus, the assessment of parameters and model predictions requires summarizing information over this range of feasible parameters. Numerous approaches have been developed in the hydrology and petroleum research communities to summarize the range of feasible parameters and assess the associated uncertainties. They include the gradient-based optimization algorithms of Carrera and Neuman [1986], Yeh [1986], Finsterle [1993], Gomez-Hernandez et al. [1997], Galarza et al. [1999], Doherty [2005], Hernandez et al. [2006], and Awotunde and Horne [2011]; the stochastic global optimization algorithms developed by Duan et al. [1992], Yapo et al. [1998], Karpouzos et al. [2001], and Vrugt et al. [2003]; pilot points methods [RamaRao et al., 1995; Cooley, 2000]; the probability perturbation method [Caers and Hoffman, 2006]; and the Bayesian filtering technique of Lehikoinen et al. [2010]. While these methods have been successfully implemented for calibrating hydrology models, in many cases they also suffer from difficulties such as inadequately exploring the parameter space [Duan et al., 1992; Bates and Campbell, 2001; Liu and Gupta, 2007; Keating et al., 2010]. In this study, we implement sample-based inference within a Bayesian framework. Instead of giving a point estimate of the model parameters, sample-based inference characterizes the posterior distribution over parameters by using Markov chain Monte Carlo (MCMC) sampling techniques and Monte Carlo integration. Traditionally, this approach has been considered computationally prohibitive for large-scale hydrology models. This paper introduces a new algorithm, based on the latest advances in Bayesian methodology and sampling algorithms, that allows us to apply MCMC techniques to the title problem at a scale that has previously been infeasible.

¹Department of Engineering Science, University of Auckland, Auckland, New Zealand.

²Department of Physics, University of Otago, Dunedin, New Zealand.

Copyright 2011 by the American Geophysical Union. 0043-1397/11/2010WR010352

W10521 1 of 26

WATER RESOURCES RESEARCH, VOL. 47, W10521, doi:10.1029/2010WR010352, 2011

http://dx.doi.org/10.1029/2010WR010352

[4] In a Bayesian framework, the posterior distribution quantifies the relative probability of a set of parameters being correct, given the various uncertainties such as measurement noise and modeling error. Probabilistic modeling of the various uncertainties is crucial in formulating the posterior distribution [Woodbury and Ulrych, 2000; Fox et al., 2009; Kaipio and Somersalo, 2004], and various schemes for hydrology models have been developed, such as the Bayesian hierarchical methods of Kavetski et al. [2006], Kuczera et al. [2006], Thyer et al. [2009], and Renard et al. [2010] and the autoregressive models used by Bates and Campbell [2001] and Schoups and Vrugt [2010]. Given the posterior distribution, robust model predictions and uncertainty quantification are then calculated as expectations of desired quantities over the posterior distribution. Because of the high dimensionality of the parameter space and the nonlinearity of the posterior distribution for complex subsurface flow problems, the most efficient way of computing expectations of summary statistics is by Monte Carlo integration, using samples distributed according to the posterior distribution. The best technology currently available for drawing such samples is MCMC sampling. This approach has been applied in several hydrology and petroleum studies, including conditioning a permeability field on pressure data [Oliver et al., 1997], calibrating conceptual rainfall-runoff models [Kuczera and Parent, 1998; Bates and Campbell, 2001; Renard et al., 2010; Schoups et al., 2010], estimating the permeability distribution for a two-phase flow model [Efendiev et al., 2005], and assessing parameter uncertainty and data worth for a single-phase aquifer [Fu and Gomez-Hernandez, 2009].

[5] For large-scale subsurface modeling problems, MCMC sampling becomes a significant computational challenge, not only because a computationally intensive numerical model must be evaluated in each iteration of the MCMC sampling, but also because of the difficulty of exploring the high-dimensional space of the spatially distributed parameters. Several investigations on improving the efficiency of traversing the parameter space have been carried out. For example, Smith and Marshall [2008] compared several recent tuning-free MCMC algorithms, namely the adaptive Metropolis algorithm (AM) of Haario et al. [2001], the delayed rejection AM algorithm (DRAM) of Haario et al. [2006], and the differential evolution Monte Carlo algorithm (DEMC) developed by Ter Braak [2006], for a hydrology model of a watershed. Vrugt et al. [2008] applied the differential evolution adaptive Metropolis algorithm (DREAM) [Vrugt et al., 2009] to analyze a conceptual rainfall-runoff model, and Liu et al. [2010] applied the adaptive direction sampling (ADS) of Gilks et al. [1994] and the multiple-try Metropolis (MTM) of Liu et al. [2000] to the analysis of a semianalytic DNAPL dissolution and dissolved-phase transport model. Fu and Gomez-Hernandez [2009] compared several block updating proposal distributions of the Metropolis-Hastings algorithm for an aquifer model, and an iterative spatial resampling transition kernel was proposed by Mariethoz et al. [2010].

[6] However, it is still computationally prohibitive to apply general purpose MCMC algorithms to computationally expensive models that may require hours to simulate. We introduce an adaptive delayed acceptance Metropolis-Hastings algorithm (ADAMH) to reduce the computing time per iteration of the MCMC sampling. With this algorithm, we are able to sample problems that are at the limit of present computational power. ADAMH enhances the computational efficiency of the Metropolis-Hastings algorithm (MH) [Metropolis et al., 1953; Hastings, 1970] by employing a reduced-order model and state-of-the-art techniques in adaptive MCMC sampling [Haario et al., 2001; Atchade and Rosenthal, 2005; Andrieu and Moulines, 2006; Roberts and Rosenthal, 2007, 2009]. Other MCMC algorithms that aim at enhancing the statistical efficiency (e.g., AM and DREAM) can also be implemented within the framework of ADAMH.

[7] We apply ADAMH to estimating the heterogeneous and anisotropic permeability distribution and boundary conditions of a 3-D steady state model of a convection-dominated, two-phase (water and steam) geothermal reservoir. The name and geological features of this reservoir are not given here for confidentiality reasons, but the numerical model and the available data are presented. Temperature logs (state data) measured in exploitation wells are used to infer the permeabilities and boundary conditions. The temperature data span the range from about 30°C to about 330°C, with the deepest measurement point being about 2200 m below sea level. The integrated finite volume simulator TOUGH2 [Pruess, 1991] is used to simulate the multiphase nonisothermal flows in the reservoir. There are 10,046 parameters in our model, and each model simulation requires about 30 to 50 min of CPU time, which makes it virtually impossible to apply the standard MH algorithm.

[8] One of the main ingredients of ADAMH is the use of a computationally fast reduced-order model that approximates the behavior of the accurate numerical model. Reduced-order models have been used by various researchers to perform fast MCMC sampling. For example, Higdon et al. [2003] developed a Metropolis coupled MCMC scheme that simultaneously runs chains for models that have different levels of grid resolution. Information from the faster running coarse formulations speeds up the mixing of the finest scale chain, from which samples are taken. Lieberman et al. [2010] employed a projection-based reduced-order model to construct an approximate posterior density on a reduced parameter space, and then drew samples from the approximate posterior distribution directly. ADAMH aims at sampling directly from the exact posterior distribution, and uses the reduced-order model in a different way, within the framework of the delayed acceptance scheme of Christen and Fox [2005]. In the first step of ADAMH, an approximate posterior distribution (also referred to as an "approximation") based on a reduced-order model is used to precompute the acceptance probability of proposals as in MH; the exact posterior density is then evaluated in a second step only for the proposals accepted in the first step. The second step uses a modified acceptance probability that ensures the ergodicity of the Markov chain.
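The two-step screening can be sketched in Python for its simplest case: a state-independent approximation and a symmetric proposal (the surrogate-transition special case). ADAMH's state-dependent approximation needs the general second-step ratio of Christen and Fox [2005], which this sketch does not cover. The function name, the cached exact log-density argument, and the callables are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def delayed_acceptance_step(x, log_post_x, log_post, log_post_approx, propose, rng):
    """One delayed acceptance MH step (hypothetical helper).

    Assumes a state-independent approximation and a symmetric proposal.
    x               current state
    log_post_x      cached exact log-posterior at x (avoids re-running the fine model)
    log_post        exact log-posterior (expensive, e.g. fine model)
    log_post_approx approximate log-posterior (cheap, e.g. reduced-order model)
    propose         symmetric proposal, called as propose(x, rng)
    Returns (next state, its exact log-posterior, 1 if the fine model ran else 0).
    """
    y = propose(x, rng)
    # Step 1: screen the proposal using only the approximation.
    delta_approx = log_post_approx(y) - log_post_approx(x)
    if rng.random() >= math.exp(min(0.0, delta_approx)):
        return x, log_post_x, 0  # rejected cheaply; fine model not run
    # Step 2: evaluate the exact posterior only for proposals that passed step 1.
    log_post_y = log_post(y)
    # Dividing out the approximation ratio keeps the exact posterior invariant.
    log_alpha2 = min(0.0, (log_post_y - log_post_x) - delta_approx)
    if rng.random() < math.exp(log_alpha2):
        return y, log_post_y, 1
    return x, log_post_x, 1
```

Caching the exact log-posterior of the current state means each iteration costs at most one fine-model run, and proposals rejected in the first step cost only reduced-order model runs.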

W10521 CUI ET AL.: ADAPTIVE DELAYED ACCEPTANCE METROPOLIS-HASTINGS ALGORITHM W10521


[9] This two-step MCMC scheme is very similar to the "surrogate transition" method [e.g., Liu, 2001, section 9.4.3], which was also implemented by Efendiev et al. [2005] for a 2-D multiphase subsurface flow problem. However, the "surrogate transition" method only allows the use of an approximate posterior distribution that does not depend on the current state. One of the most important features of the delayed acceptance scheme is that it can deal with a more general form of approximation that includes both the state-dependent and state-independent cases. We find that the use of a state-dependent approximation is essential in this application.

[10] We construct the state-dependent reduced-order model by coarsening the grid structure of the fine-scale numerical model (referred to as the fine model) and adding a local correction term within ADAMH (see section 3). Since the size of the convective plume of the geothermal reservoir in which we are interested is characterized by the resolution of the grid structure, a coarse model with a reasonably high resolution is chosen (see section 5). The permeability distribution is parametrized by a pixel-based random field representation [Cressie, 1993; Rue and Held, 2005] that has the same resolution as the coarse grid. This parametrization is used by both the fine model and the coarse model, and hence permeability upscaling is not required. The boundary conditions (only the mass input at the bottom of the model needs to be estimated) are parametrized by a radial basis function with control points that have prespecified locations. The mass input of the coarse model is upscaled simply by summing the mass input values of the corresponding cells in the fine model.
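Since both grids share the permeability parametrization, only the mass input needs upscaling, and the summation rule can be sketched as below. The cell identifiers and the `fine_to_coarse` mapping are hypothetical stand-ins for whatever the grid-coarsening step produces; the values are in whatever units the simulator uses for mass input.

```python
from collections import defaultdict


def upscale_mass_input(fine_mass_input, fine_to_coarse):
    """Sum fine-cell mass inputs into their parent coarse cells.

    fine_mass_input: dict mapping fine cell id -> mass input
    fine_to_coarse:  dict mapping fine cell id -> coarse cell id (hypothetical)
    Raises KeyError if a fine cell has no coarse parent.
    """
    coarse = defaultdict(float)
    for cell, rate in fine_mass_input.items():
        coarse[fine_to_coarse[cell]] += rate
    return dict(coarse)
```

Summation conserves the total mass input across the two grids, which is the property the coarse model needs for its bottom boundary condition.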

[11] Since there usually exists a non-negligible discrepancy between the reduced-order model and the accurate model, an approximate posterior distribution that does not include the statistics of this model reduction error would result in biased estimates. In the delayed acceptance scheme, this shows up as a very low acceptance rate in the second step and a poorly mixing Markov chain. ADAMH implements the enhanced error model (EEM) of Kaipio and Somersalo [2007] to estimate the model reduction error online, and allows for the construction of an approximate posterior distribution that adapts to the model reduction error.

[12] ADAMH greatly enhances sampling efficiency when it is used to sample the posterior distribution for the 3-D geothermal reservoir model inversion presented here. It achieves a speed-up factor of about 7.7 compared to the standard MH, and we are able to run 11,200 iterations in about 40 days. The sampling results show good agreement between the estimated temperature profiles and the measured data. We expect that ADAMH will produce significant improvement in computational efficiency when applied to sample-based inference in other large-scale inverse problems.

[13] This paper is organized as follows. Section 2 discusses the Bayesian inferential framework for inverse problems, the efficiency of MCMC sampling, and issues in constructing approximate posterior distributions based on reduced-order models. Section 3 addresses the framework of ADAMH and several adaptive approximations. Section 4 provides the details of the adaptive proposal distribution used by ADAMH. In section 5, we present the 3-D geothermal reservoir model and its reduced-order form, as well as the parametrization of the unknowns and the prior distribution. Section 6 summarizes the computational results and posterior analysis. Section 7 offers some conclusions and discussion.

2. Bayesian Inverse Problems

2.1. Posterior Formulation

[14] In a Bayesian framework, the unknown parameters $x$ are considered as random variables, and a posterior distribution $\pi(x \mid d)$ over parameters $x$ conditioned on measurements $d$ can be constructed using the probabilistic model for the various uncertainties associated with the inverse problem. One commonly used stochastic relationship is

$$ d = F(x) + e, \tag{1} $$

where $F: x \to d$ is a deterministic forward model that often involves solving a large system of PDEs, and $e$ is a noise vector that represents all the uncertainties associated with the inverse problem, such as measurement noise and model bias. Model bias is the discrepancy between the forward model and the underlying physical system. It may arise from various sources: numerical error in the computer implementation of the forward model, spatial discretization of the unknown parameters, simplifications and inappropriate assumptions in the conceptual model, etc. For the steady state problem we present in section 5, the noise term is assumed to follow a zero-mean multivariate Gaussian, i.e., $e \sim N(0, \Sigma_e)$, as discussed in the work of Higdon et al. [2003]. Since the field measurements are reasonably sparse (see section 5), we further assume that the covariance matrix $\Sigma_e$ is diagonal and has the form $\Sigma_e = \sigma_e^2 I$ (as discussed in the early reference Carrera and Neuman [1986]), where $I$ is an identity matrix.
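With $\Sigma_e = \sigma_e^2 I$, the logarithm of the likelihood that follows from (1) (written out as (3) in paragraph [16]) is a scaled sum of squared residuals. A minimal sketch in log space, which avoids underflow when residuals are large; `forward` and `log_prior` are hypothetical callables standing in for a simulator run and the prior of section 5.

```python
def log_likelihood(d, model_output, sigma_e):
    """Log of the Gaussian likelihood up to an additive constant:
    -||d - F(x)||^2 / (2 sigma_e^2), assuming Sigma_e = sigma_e^2 I."""
    if len(d) != len(model_output):
        raise ValueError("data and model output must have the same length")
    sq = sum((di - fi) ** 2 for di, fi in zip(d, model_output))
    return -sq / (2.0 * sigma_e ** 2)


def log_posterior(x, d, forward, log_prior, sigma_e):
    """Unnormalized log-posterior: log-likelihood plus log-prior.

    forward and log_prior are hypothetical callables (simulator, prior)."""
    return log_likelihood(d, forward(x), sigma_e) + log_prior(x)
```

Passing a reduced-order model as `forward` gives the approximate posterior (7) of section 2.3 with the same code.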

[15] It is worth mentioning that inappropriate modeling of the noise term may cause biased estimation of model parameters and underestimation of the uncertainty [Neuman, 2003; Thyer et al., 2009]. This also introduces difficulties into MCMC sampling and other parameter estimation techniques. In the hydrology literature, various approaches have been proposed to deal with inadequate models and quantify the associated uncertainties. They include the generalized likelihood uncertainty estimation approach developed by Beven and Binley [1992] and Beven and Freer [2001], and the Bayesian model averaging methods of Neuman [2003], Duan et al. [2007], and Marshall et al. [2007], which use a set of conceptual models with different model structures and parameterizations to assess the uncertainties in parameters and predictions associated with model variation. Bayesian hierarchical methods presented in the work of Kavetski et al. [2006], Kuczera et al. [2006], Thyer et al. [2009], and Renard et al. [2010] explicitly represent each source of uncertainty affecting the calibration and prediction by hierarchical posterior distributions. These methods require a reasonably large amount of data and informative priors for each source of uncertainty. Autoregressive models are also employed by Bates and Campbell [2001] and Schoups and Vrugt [2010] to analyze conceptual rainfall-runoff models that have a small number of parameters using time series data. Because we only have one conceptual model available in this study, and deal with steady state temperature measurements, we make the simplifying assumptions about the stochastic noise $e$ stated above. However, the posterior analysis of sampling results in section 6 shows that these simplifying assumptions are reasonable.

[16] Following Bayes' theorem, the unnormalized posterior distribution

$$ \pi(x \mid d) \propto \pi(d \mid x)\, \pi(x) \tag{2} $$

is given as a product of the likelihood function $\pi(d \mid x)$ and the prior distribution $\pi(x)$. From (1) and the Gaussian assumption about the noise $e$, we can deduce that the likelihood function has the form

$$ \pi(d \mid x) \propto \exp\left( -\frac{1}{2\sigma_e^2} \left\| d - F(x) \right\|^2 \right), \tag{3} $$

where $\|\cdot\|$ is the Euclidean norm.

[17] The prior distribution $\pi(x)$ quantifies the relative probability for a given set of parameters $x$ in the absence of field measurements [Jaynes, 1968; Woodbury and Ulrych, 2000]. Formulating the prior distribution for a subsurface modeling problem usually consists of (1) choosing an appropriate representation of the unknown parameters, which, as discussed in the work of Hurn et al. [2003], is an integral part of prior modeling, since expressing certain types of knowledge is simpler in some representations than in others, and solutions that cannot be represented are excluded; and (2) deriving the spatial statistics for the chosen parametrization from expert knowledge of allowable parameter values, previous measurements, modeling of the processes that produce the unknowns, or a combination of these [Ulrych et al., 2001; Ulrych and Sacchi, 2005]. The details of the parameterization and the form of the prior distribution for our geothermal reservoir models are presented in section 5.

2.2. Efficiency of MCMC Sampling

[18] To solve the inverse problem effectively, the posterior distribution (2) is sampled by an MCMC method, and summary statistics are then estimated from the samples. MCMC algorithms draw samples from the posterior distribution by generating a sequence, or "chain," of solutions that has the ergodic property, i.e., that allows expectations over the posterior distribution to be replaced by averages over the chain. Informally, we think of an ergodic chain as one that spends time in each region of parameter space in proportion to the posterior probability of that region. Almost all implementations of MCMC sampling employ MH dynamics, which were originally developed for applications in computational chemistry [Metropolis et al., 1953]. This method was generalized by Hastings [1970] to general purpose proposal distributions (also referred to as moves), and later extended to allow transitions in parameter spaces with a variable number of dimensions [Green, 1995]. Green's "reversible jump" MCMC is derived from MH, and is a Hastings algorithm in general parameter spaces (see Green [1995] for details). Green and Mira [2001] and Waagepetersen and Sorensen [2001] provide discussion and examples of reversible jump MCMC. Even though we do not always have variable dimension models, we prefer the reversible jump formulation, as it simplifies the calculation of acceptance probabilities for the subspace moves that we use for the geothermal reservoir model. An example is the set of moves for updating the weights that control the distribution of hot water injection at the base of the model (see section 5.3), which have special constraints.

[19] In the reversible jump formulation, for a given state $x$, we generate a vector of random variables $\nu$ from a known density $g(\nu)$, and then a new proposal $x'$ is generated by a deterministic function $x' = \psi(x, \nu)$. The associated reverse transition from $x'$ to $x$ is made with the aid of a new random vector $\nu'$ drawn from another known density $g'(\cdot)$ and a deterministic function $x = \psi'(x', \nu')$. The notations $\psi$ and $\psi'$ indicate that different functions could be used in each direction, though these functions are often the same in practice [Green and Mira, 2001].

[20] The reversible jump formulation requires that the transformation from $(x, \nu)$ to $(x', \nu')$ be injective and differentiable with a continuous derivative. The proposal can then be accepted or rejected according to an acceptance rule that ensures the target distribution is invariant. At step $n$, given state $x_n = x$, one step of reversible jump can be written as follows.

[21] 1. Generate a vector of random variables $\nu$ from a density $g(\nu)$. Then a proposal $x'$ is generated by a deterministic function $x' = \psi(x, \nu)$.

[22] 2. With probability

$$ \alpha(x, x') = 1 \wedge \frac{\pi(x' \mid d)\, g'(\nu')}{\pi(x \mid d)\, g(\nu)}\, J, \tag{4} $$

set $x_{n+1} = x'$; otherwise set $x_{n+1} = x$. Here $\nu' \sim g'(\cdot)$ is the vector of random variables associated with the reverse transition $x = \psi'(x', \nu')$.
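Evaluated in log space, the acceptance probability (4) is a sum of log terms. The sketch below uses a hypothetical one-dimensional multiplicative move $x' = x e^{\nu}$, $\nu' = -\nu$, with $g = g'$ a zero-mean Gaussian, chosen only because its Jacobian is nontrivial: $J = |\partial(x', \nu')/\partial(x, \nu)| = e^{\nu}$. It is not one of the moves used for the reservoir model, and the function names are illustrative.

```python
import math
import random


def log_acceptance(log_post_new, log_post_old, log_g_reverse, log_g_forward, log_jacobian=0.0):
    """Log of (4): min(0, log pi(x'|d) - log pi(x|d) + log g'(nu') - log g(nu) + log J)."""
    return min(0.0, log_post_new - log_post_old + log_g_reverse - log_g_forward + log_jacobian)


def log_normal_density(v, s):
    """Log density of N(0, s^2) evaluated at v."""
    return -0.5 * (v / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))


def multiplicative_move_step(x, log_post, s, rng):
    """One step with the hypothetical move x' = x * exp(nu), nu' = -nu (requires x > 0)."""
    nu = rng.gauss(0.0, s)
    x_new = x * math.exp(nu)
    log_a = log_acceptance(
        log_post(x_new), log_post(x),
        log_g_reverse=log_normal_density(-nu, s),
        log_g_forward=log_normal_density(nu, s),
        log_jacobian=nu,  # log |det d(x', nu')/d(x, nu)| = log(exp(nu)) = nu
    )
    return x_new if rng.random() < math.exp(log_a) else x
```

Dropping the Jacobian term here would make the chain target a different distribution, which is why (4) carries the factor $J$ even when $g$ and $g'$ cancel.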

[23] The factor $J$ in (4) denotes the absolute value of the determinant of the Jacobian of the transformation from $(x, \nu)$ to $(x', \nu')$, i.e., $J = \left| \partial(x', \nu') / \partial(x, \nu) \right|$. The symbol $\wedge$ denotes the minimum of two factors. The probability of generating proposal $x'$ from the current state $x$ is determined by the density $g(\nu)$. As shown in the work of Green and Mira [2001], the pair of proposal distributions for the forward transition from $x$ to $x'$ and the reverse transition from $x'$ to $x$ can be written as

$$ q(x, x') = g(\nu), \tag{5} $$

$$ q(x', x) = g'(\nu')\, J. \tag{6} $$

In this form, the reversible jump algorithm is equivalent to MH.

[24] MH generates a highly correlated random sequence of parameters (or solutions), $x_1, x_2, \ldots, x_N$, having the Markov property, with a limiting distribution equal to the desired posterior distribution, i.e., $x_i \sim \pi(x \mid d)$ for large $i$. Hence $x_i$ and $x_{i+1}$ will be very similar; however, for large lags, i.e., $j \gg i$, the parameters $x_i$ and $x_j$ may be viewed as independent samples from the posterior distribution. Because our goal is to estimate the expected values of statistics of interest, the performance of MCMC sampling is characterized by the number of statistically independent samples that can be drawn. The statistical efficiency of MH is quantified by the number of iterations required to generate a statistically independent sample, which is calculated by estimating the integrated autocorrelation time of some statistics of interest over the chain [Goodman and Sokal, 1989; Roberts, 1996]. The method and MATLAB code provided by Wolff [2004] are used to estimate the integrated autocorrelation time here.
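The paper relies on Wolff's [2004] automatic windowing method. As a simpler stand-in (explicitly not Wolff's algorithm), the sketch below uses a Sokal-style self-consistent window: accumulate $\tau = 1 + 2\sum_{t=1}^{M} \rho(t)$ and stop at the first lag $M \ge c\,\tau$. The window constant $c = 5$ is an assumed, commonly used choice.

```python
def integrated_autocorrelation_time(chain, c=5.0):
    """Estimate tau = 1 + 2 * sum_{t=1}^{M} rho(t) for a scalar chain,
    stopping at the first lag M with M >= c * tau (self-consistent window)."""
    n = len(chain)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(chain) / n
    dev = [v - mean for v in chain]
    c0 = sum(v * v for v in dev) / n  # lag-0 autocovariance
    if c0 == 0.0:
        raise ValueError("chain has zero variance")
    tau = 1.0
    for t in range(1, n):
        ct = sum(dev[i] * dev[i + t] for i in range(n - t)) / n
        tau += 2.0 * ct / c0  # add 2 * rho(t)
        if t >= c * tau:  # window is long enough relative to tau
            break
    return tau
```

A chain of length $N$ then contains roughly $N / \tau$ statistically independent samples of the chosen statistic.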

[25] The statistical efficiency of MH is determined by the form of the moves used to generate a proposal [Amit and Grenander, 1991; Roberts and Sahu, 1997]. For a nonlinear inverse problem such as parameter estimation for a geothermal reservoir model, the posterior distribution usually has a high-dimensional parameter space with complex structure, and hence designing moves that traverse the parameter space efficiently requires exploiting the structure of the inverse problem. In some hydrology problems, Oliver et al. [1997] used a component-wise updating scheme, in which only one component of the parameters is modified in each iteration. With this method, the number of iterations needed for a full sweep is proportional to the dimensionality of the parameter space, which makes it infeasible for high-dimensional problems. Fu and Gomez-Hernandez [2009] employed a block updating scheme that modifies a subgroup of parameters each time, which appears to be more efficient than component-wise updating. A more comprehensive iterative spatial resampling scheme (ISR) was developed by Mariethoz et al. [2010], and shows superior performance for their 2-D aquifer case study. ISR uses a Gaussian process model as the prior distribution, and a new proposal is generated by resampling the Gaussian process model conditioned on a subset of randomly selected points. Since ISR is a prior-driven method, its efficiency relies on the accuracy of the prior distribution. For geothermal reservoir models, we usually do not have a highly informative prior distribution, and the use of ISR needs further assessment.
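For contrast with component-wise updating, a block move that perturbs a randomly chosen subset of components can be sketched as follows; `block_size=1` recovers the component-wise scheme. Choosing the block uniformly at random is an assumption made for illustration; the block schemes in the cited work may select spatially contiguous or otherwise structured blocks. Because the block is chosen independently of the state and the Gaussian increments are symmetric, the proposal is symmetric and the plain Metropolis ratio applies.

```python
import random


def block_random_walk(x, block_size, step, rng):
    """Hypothetical block move: add N(0, step^2) noise to a random subset of
    block_size components of x, leaving the others unchanged."""
    if not 1 <= block_size <= len(x):
        raise ValueError("block_size must be between 1 and len(x)")
    y = list(x)  # copy so the current state is not modified
    for i in rng.sample(range(len(x)), block_size):
        y[i] += step * rng.gauss(0.0, 1.0)
    return y
```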

[26] Once a move has been formulated, tuning the variables that control the scale of the move is crucial for achieving statistical efficiency. As shown in the work of Gelman et al. [1996], Roberts et al. [1997], and Roberts and Rosenthal [2001], for a $d$-dimensional multivariate Gaussian random walk move, i.e., $\nu \sim N(0, \sigma^2 \Sigma)$ and $x' = x + \nu$, where $\Sigma$ is the covariance matrix and $\sigma$ is the scale variable, the optimal choice of the scale variable is $\sigma \approx 2.38/\sqrt{d}$ under several theoretical assumptions. They also showed that an acceptance rate of about 0.23 gives the optimal statistical efficiency for a high-dimensional target distribution. Traditionally, the modeler has to adjust these scale variables by trial and error, which is very time consuming for high-dimensional problems. Recent advances in adaptive MCMC sampling, including AM [Haario et al., 2001], DRAM [Haario et al., 2006], the adaptive Metropolis within Gibbs algorithm (AMWG) of Roberts and Rosenthal [2009], and the differential evolution Markov chain algorithm with snooker update [Ter Braak and Vrugt, 2008], automate the tuning process by using the sampling history. In section 4, adaptive MCMC sampling techniques [Haario et al., 2001; Roberts and Rosenthal, 2009] and a block updating scheme [Chib and Greenberg, 1995] are combined to design effective moves for sampling the posterior distribution of our problem.
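One way to automate this tuning, sketched below, starts from the theoretical scale $2.38/\sqrt{d}$ and nudges $\log\sigma$ toward a target acceptance rate of 0.234 with a Robbins-Monro update whose step size decays as $n^{-0.6}$ (the gain and decay exponent are assumptions). It illustrates diminishing adaptation in general; it is not AM or AMWG itself, nor the proposal of section 4.

```python
import math


def initial_scale(d):
    """Theoretical random walk scale 2.38/sqrt(d) for a d-dimensional Gaussian target."""
    return 2.38 / math.sqrt(d)


def adapt_log_scale(log_scale, accepted, n, target=0.234, gain=1.0, decay=0.6):
    """Robbins-Monro update of log(sigma) after iteration n (n = 0, 1, ...).

    Acceptance raises the scale and rejection lowers it; the shrinking step
    size (diminishing adaptation) lets the scale settle where the acceptance
    rate is close to `target`.
    """
    step = gain / (n + 1) ** decay
    return log_scale + step * ((1.0 if accepted else 0.0) - target)
```

In a sampler, `sigma = math.exp(log_scale)` multiplies the random walk increment, and `adapt_log_scale` is called once per iteration with that iteration's accept/reject outcome.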

[27] All the algorithms we have discussed so far focus on generating proposals in MH, which determines the statistical efficiency of traversing the parameter space. Apart from the design of the moves, computational difficulty arises in MCMC sampling mainly because MH requires sequential evaluation of the posterior density at each iteration, and many thousands or millions of iterations are necessary to give sufficiently accurate estimates. Hence, we have to consider the computational efficiency of MH in practice, i.e., the computing time for generating a statistically independent sample is used instead of the number of iterations. To improve computational efficiency, it is necessary not only to design efficient moves that adequately explore the parameter space, but also to reduce the computing time per iteration. In this research, the reduction of the computing time per iteration is achieved by implementing ADAMH, which employs a computationally fast reduced-order model that adapts to the model reduction error.
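As a rough accounting (our simplification, not the paper's timing model), the computing time per statistically independent sample is the integrated autocorrelation time multiplied by the mean time per iteration. For a delayed acceptance sampler that runs the reduced-order model for every proposal and the fine model only for proposals that pass the first step, the mean time per iteration is about $t_{coarse} + a_1 t_{fine}$, where $a_1$ is the first-step acceptance rate. The numbers below are hypothetical and are not results from the paper.

```python
def time_per_iteration_mh(t_fine):
    """Standard MH: one fine-model run per iteration."""
    return t_fine


def time_per_iteration_da(t_fine, t_coarse, first_step_accept_rate):
    """Delayed acceptance: the coarse model always runs; the fine model runs
    only for proposals accepted in the first step."""
    return t_coarse + first_step_accept_rate * t_fine


def time_per_independent_sample(iact, time_per_iter):
    """Computing time to generate one statistically independent sample."""
    return iact * time_per_iter


# Hypothetical numbers (minutes of CPU time), for illustration only.
mh = time_per_independent_sample(iact=100.0, time_per_iter=time_per_iteration_mh(40.0))
da = time_per_independent_sample(iact=120.0, time_per_iter=time_per_iteration_da(40.0, 1.0, 0.2))
print(f"speed-up: {mh / da:.2f}")
```

The example shows why a somewhat larger autocorrelation time can still pay off: a cheap first step cuts the dominant fine-model cost per iteration.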

2.3. Approximate Posterior

[28] Solving an inverse problem usually requires a large number of evaluations of the posterior density. Computational difficulties arise because evaluating the posterior density (2) involves solving the computationally demanding forward model $F$. The amount of computation can be reduced by introducing a simplified version of the forward model, called a reduced-order model. Typical examples of the use of reduced-order models in inverse problems include the following: (1) the approach of Christen and Fox [2005], who constructed a state-dependent linearization for an electrical network problem; (2) enhanced error models, which have been used by Arridge et al. [2006] and Kolehmainen et al. [2011] for the analysis of optical diffusion tomography; (3) Lehikoinen et al. [2010], who applied the nonstationary extension of the enhanced error model [Huttunen and Kaipio, 2007] to imaging unsaturated flow in heterogeneous soil; (4) Lipponen et al. [2011], who used a projection-based reduced-order Navier-Stokes model to reconstruct nonstationary flows from electrical impedance tomography data; (5) Lieberman et al. [2010], who applied a greedy algorithm to construct a projection-based reduced-order model for sampling a groundwater problem; and (6) in hydrology and petroleum, coarse models and upscaling techniques [Begg et al., 1989; Renard and De Marsily, 1997; Durlofsky, 1998; Christie and Blunt, 2001], which are widely used to build reduced-order models. Mondal et al. [2010] applied the multiscale finite element method and model upscaling techniques to build a reduced-order model for a 2-D subsurface flow problem. They employed the nonadaptive surrogate transition and reversible jump MCMC together to perform sampling. This approach is similar to ours, but we employ adaptive MCMC, which tunes the algorithm automatically, and we estimate the model reduction error adaptively to enhance the sampling efficiency.

[29] We use $F^*(x)$ and $F^*_y(x)$ to denote the state-independent and state-dependent reduced-order models, respectively. One possible construction of the approximate posterior distribution is to directly replace $F$ by $F^*$ in the likelihood function (3), which yields

$$ \pi^*(x \mid d) \propto \exp\left( -\frac{1}{2\sigma_e^2} \left\| d - F^*(x) \right\|^2 \right) \pi(x). \tag{7} $$

The amount of computation is reduced by simulating only a reduced-order model in the approximate posterior distribution. However, if we consider the probabilistic model (1) in the following form

$$ \begin{aligned} d &= F(x) + e \\ &= F^*(x) + \left[ F(x) - F^*(x) \right] + e \\ &= F^*(x) + A(x) + e, \end{aligned} \tag{8} $$

then there exists a model reduction error $A(x)$ between the fine model and the coarse model. When this model reduction error is more significant than the noise level in the likelihood function, the inference will be dominated by the model reduction error. Ignoring this model reduction error results in biased estimates, as discussed in the work of Arridge et al. [2006] and Kaipio and Somersalo [2007].

[30] Kaipio and Somersalo [2007] introduced an enhanced error model (EEM) technique, which treats the model reduction error $A$ as independent of the model parameter $x$ and assumes that it is multivariate normally distributed, i.e., $A \sim N(\mu_A, \Sigma_A)$. This gives the approximate posterior distribution

$$\pi^*(x \mid d) \propto \exp\left(-\frac{1}{2}\left\| L_A\left[d - F^*(x) - \mu_A\right] \right\|^2\right)\pi(x), \qquad (9)$$

where $L_A^T L_A = (\Sigma_A + \Sigma_e)^{-1}$. In (7) and (9), $F^*(x)$ and $F^*_y(x)$ can be used interchangeably. To construct the EEM, Kaipio and Somersalo [2007] first draw a set of random samples from the prior distribution $\pi(x)$ and then empirically estimate $\mu_A$ and $\Sigma_A$ from the differences between the fine model outputs and the coarse model outputs. In the optical diffusion tomography examples of Arridge et al. [2006], the marginal distribution of the approximate posterior distribution (9) is very close to the exact marginal posterior distribution, while the marginal distribution of the approximate posterior (7) assigns almost zero probability to the mode of the exact marginal posterior distribution. This suggests that probabilistic modeling of the model reduction error is essential for accurately formulating an approximate posterior distribution based on reduced-order models.
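To make the a priori EEM construction concrete, the sketch below estimates $\mu_A$ and $\Sigma_A$ from prior draws and then evaluates the unnormalized log of (9). It is a minimal illustration, not the authors' implementation: the 2-D toy models `fine_model` and `coarse_model`, the matrix `M`, the standard normal prior, and the noise covariance are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical toy models: the "fine" model adds a quadratic term that the
# "coarse" (reduced-order) model omits, so A(x) = F(x) - F*(x) = 0.1 * x**2.
M = np.array([[1.0, 0.5], [0.2, 1.5]])


def fine_model(x):
    return M @ x + 0.1 * x**2


def coarse_model(x):
    return M @ x


def estimate_eem(n_samples, rng):
    """Estimate (mu_A, Sigma_A) from prior draws, as in the a priori EEM."""
    xs = rng.standard_normal((n_samples, 2))  # assumed prior: x ~ N(0, I)
    errors = np.array([fine_model(x) - coarse_model(x) for x in xs])
    return errors.mean(axis=0), np.cov(errors, rowvar=False)


def log_eem_posterior(x, d, mu_A, Sigma_A, Sigma_e):
    """Unnormalized log of (9) under the assumed standard normal prior."""
    r = d - coarse_model(x) - mu_A
    # ||L_A r||^2 = r^T (Sigma_A + Sigma_e)^{-1} r; solve instead of inverting.
    misfit = r @ np.linalg.solve(Sigma_A + Sigma_e, r)
    return -0.5 * misfit - 0.5 * (x @ x)


rng = np.random.default_rng(0)
mu_A, Sigma_A = estimate_eem(20000, rng)
x_true = np.array([0.3, -0.7])
d = fine_model(x_true)
print(mu_A, log_eem_posterior(x_true, d, mu_A, Sigma_A, 0.01 * np.eye(2)))
```

In this toy example the error depends on $x$, so the prior-averaged $\mu_A$ (about 0.1 per component) differs from the actual error at `x_true`; that kind of mismatch is what building the EEM over the posterior, as described next, is meant to reduce.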

[31] The EEM is integrated into ADAMH (see section 3) as the approximate posterior distribution used in the first step, thus speeding up the computation. However, ADAMH uses adaptive MCMC techniques to construct the EEM over the posterior distribution adaptively from the sample history, rather than from the prior distribution as in the work of Kaipio and Somersalo [2007]. Constructing the EEM adaptively over the posterior has two advantages over forming it a priori: (1) it avoids all the precomputation needed to estimate the EEM, and (2) it provides a more accurate approximation to the exact posterior distribution, which is consistent with the field data.

3. ADAMH Algorithm

3.1. ADAMH

[32] ADAMH implements the basic structure of the delayed acceptance scheme of Christen and Fox [2005]. Adaptation is used not only for online construction of the enhanced error model, but also to tune the proposal distributions, as in traditional adaptive MCMC algorithms [Haario et al., 2001, 2006; Atchade and Rosenthal, 2005; Andrieu and Moulines, 2006; Roberts and Rosenthal, 2009].

[33] Suppose we have a computationally expensive posterior distribution $\pi(\cdot \mid d)$ and a computationally fast, adaptive, local approximation $\pi^*_{x,n}(\cdot \mid d)$. Here the subscript $n$ is the adaptation index at step $n$, and the notation is general enough to include both state-independent and state-dependent approximations. Consider an adaptive reversible jump move with random variable $\xi$ drawn from an adaptive probability distribution $g_n(\xi)$, and the forward transition defined by a deterministic function $x' = \psi(x, \xi)$. The associated reverse transition from $x'$ to $x$ is made with the aid of a random vector $\xi'$ drawn from a known adaptive density $g'_n(\xi')$ and a deterministic function $x = \psi'(x', \xi')$. Suppose further that the mapping from $(x, \xi)$ to $(x', \xi')$ is injective, differentiable, and has a continuous derivative. As shown by Green and Mira [2001], this defines a pair of proposal distributions in the MH algorithm:

$$q_n(x, x') = g_n(\xi), \qquad (10)$$

$$q_n(x', x) = g'_n(\xi')\, J, \qquad (11)$$

where $J = \left| \partial(x', \xi') / \partial(x, \xi) \right|$ is the absolute value of the Jacobian determinant of this mapping.

[34] We first accept or reject the move using the approximation $\pi^*_{x,n}(x' \mid d)$. This defines a modified proposal distribution $q^*_n(x, x')$, whose proposals are then accepted or rejected in the second step by the standard MH rule applied to $q^*_n(x, x')$. Thus, the algorithm is guaranteed to converge with correct ergodic properties, since the standard results for the adaptive MH algorithm hold. We gain computational efficiency because we avoid calculating $\pi(x' \mid d)$ when proposals are rejected by $\pi^*_{x,n}(x' \mid d)$. The delayed acceptance scheme necessarily reduces statistical efficiency [Christen and Fox, 2005]; however, there is an overall gain in computational efficiency when the modified proposal has a high acceptance rate in the second step.

[35] At step $n$, given $x_n = x$ and an approximate target distribution $\pi^*_{x,n}(\cdot \mid d)$, one iteration of ADAMH is as follows.

[36] 1. Generate a random variable $\xi$ from a density $g_n(\xi)$. Then a proposal $x'$ is generated by the deterministic function $x' = \psi(x, \xi)$.

[37] 2. With probability

$$\alpha(x, x') = 1 \wedge \frac{\pi^*_{x,n}(x' \mid d)\, q_n(x', x)}{\pi^*_{x,n}(x \mid d)\, q_n(x, x')}, \qquad (12)$$

accept $x'$ to be used as a proposal for the second step. Otherwise reject $x'$ by setting $x_{n+1} = x$, and go to step 4. Here $q_n(\cdot, \cdot)$ is defined by equations (10) and (11).

[38] 3. With probability

$$\beta_n(x, x') = 1 \wedge \frac{\pi(x' \mid d)\, q^*_n(x', x)}{\pi(x \mid d)\, q^*_n(x, x')}, \qquad (13)$$

accept $x'$ by setting $x_{n+1} = x'$. Otherwise reject $x'$ by setting $x_{n+1} = x$. Here $q^*_n(\cdot, \cdot)$ is the modified proposal distribution defined by

$$q^*_n(x, x') = q_n(x, x')\,\alpha(x, x') + \left[1 - r_n(x)\right]\delta_x(x'), \qquad (14)$$

$$q^*_n(x', x) = q_n(x', x)\,\alpha(x', x) + \left[1 - r_n(x')\right]\delta_{x'}(x). \qquad (15)$$



[39] 4. Update the adaptive approximation to $\pi^*_{x,n+1}(\cdot \mid d)$.

[40] 5. Update the adaptive probability density to $q_{n+1}(\cdot)$.

[41] In equation (14), the term $[1 - r_n(x)]\,\delta_x(x')$ denotes the probability that the proposal is rejected by the approximation $\pi^*_{x,n}(x' \mid d)$, where

$$r_n(x) = \int g_n(\xi)\, \alpha\big(x, \psi(x, \xi)\big)\, d\xi. \qquad (16)$$

Note that (as is also the case in the standard Metropolis-Hastings algorithm) there is never a need to calculate $r_n(x)$. When a proposal is rejected by the approximation, it can be considered as a proposal $x' = x$ in step 3 of ADAMH. The same argument holds for the reverse transition. Thus, equation (13) can be simplified to

$$\beta_n(x, x') = 1 \wedge \frac{\pi(x' \mid d)\, g'_n(\xi')\, \alpha(x', x)}{\pi(x \mid d)\, g_n(\xi)\, \alpha(x, x')}\, J. \qquad (17)$$
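The two-stage accept/reject logic of steps 1-3, with the simplified second-stage probability (17), can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' code: the approximation is held fixed (no adaptation, so steps 4 and 5 are omitted), the state is a scalar, and the proposal is a symmetric Gaussian random walk, so that $g_n = g'_n$ and $J = 1$. The function and argument names are hypothetical.

```python
import math
import random


def delayed_acceptance_mh(log_target, log_approx, x0, step, n_iter, seed=0):
    """Delayed-acceptance MH with a fixed approximation and a symmetric proposal.

    Returns the chain and the number of exact-target (fine model) evaluations.
    """
    rng = random.Random(seed)
    x, lp_x, la_x = x0, log_target(x0), log_approx(x0)
    chain, n_exact = [], 1
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)                 # step 1: propose
        la_y = log_approx(y)
        log_alpha_fwd = min(0.0, la_y - la_x)        # alpha(x, y), eq. (12)
        if math.log(1.0 - rng.random()) < log_alpha_fwd:
            # Step 2 accepted: only now is the expensive target evaluated.
            lp_y = log_target(y)
            n_exact += 1
            log_alpha_rev = min(0.0, la_x - la_y)    # alpha(y, x)
            # Step 3: eq. (17) with g_n = g'_n and J = 1.
            log_beta = min(0.0, lp_y - lp_x + log_alpha_rev - log_alpha_fwd)
            if math.log(1.0 - rng.random()) < log_beta:
                x, lp_x, la_x = y, lp_y, la_y
        chain.append(x)
    return chain, n_exact


chain, n_exact = delayed_acceptance_mh(
    log_target=lambda x: -0.5 * x * x,           # exact target: N(0, 1)
    log_approx=lambda x: -0.5 * x * x / 1.2,     # cheap, slightly too wide
    x0=0.0, step=2.0, n_iter=100000)
print(sum(chain) / len(chain), n_exact)
```

The exact target is evaluated only for proposals that survive the first stage, which is where the saving quantified in section 3.2 comes from; the chain still targets the exact density because the second stage corrects for the approximation.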

[42] In calculating the acceptance probability (17) in step 3, both the forward probability $\alpha(x, x')$ and the reverse probability $\alpha(x', x)$, i.e., the probabilities of accepting proposals with the approximation, must be evaluated. Evaluating the reverse probability incurs no extra computing time for state-independent approximations. For state-dependent approximations, it requires computing the reverse approximation $\pi^*_{x',n}(x \mid d)$. However, computing the state-dependent approximation introduced in this section incurs no additional cost. We provide justifications and simplified formulas for implementing ADAMH in Appendix A.

[43] Roberts and Rosenthal [2007] established a standard route for proving the ergodicity of adaptive MCMC algorithms. We directly applied the theorems and corollaries of Roberts and Rosenthal [2007] to establish regularity conditions on the form of the adaptive approximation that are sufficient to ensure the ergodicity of ADAMH (for details, see Cui et al. [2011]). These regularity conditions are satisfied by the two adaptive approximations discussed in this section, but the technical details are omitted here for brevity. The adaptive proposal distributions (steps 1 and 5 of ADAMH) used for sampling the geothermal reservoir model are discussed in section 4.

[44] We compare two ways of formulating an approximate posterior distribution using the EEM. First, a state-independent EEM over the posterior is constructed. Since both the fine model $F(x_n)$ and the coarse model $F^*(x_n)$ are evaluated for each state of the chain in ADAMH (a proposal rejected in the first step never has its fine model output computed, but it also never enters the Markov chain), an empirical estimate of the EEM can be calculated iteratively from the sample history. The adaptive form of the approximate posterior at step $n$ of ADAMH is

$$\pi^*_n(x \mid d) \propto \exp\left(-\frac{1}{2}\left\| L_{A,n}\left[d - F^*(x) - \mu_{A,n}\right] \right\|^2\right)\pi(x), \qquad (18)$$

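where $L_{A,n}^T L_{A,n} = (\Sigma_{A,n} + \Sigma_e)^{-1}$. The mean $\mu_{A,n}$ and the covariance $\Sigma_{A,n}$ are updated to $\mu_{A,n+1}$ and $\Sigma_{A,n+1}$ in step 4 of ADAMH by

$$\mu_{A,n+1} = \frac{1}{n+1}\left[n\,\mu_{A,n} + A_{n+1}\right], \qquad (19)$$

$$\Sigma_{A,n+1} = \frac{n-1}{n}\,\Sigma_{A,n} + \frac{1}{n}\left[A_{n+1}A_{n+1}^T + n\,\mu_{A,n}\mu_{A,n}^T - (n+1)\,\mu_{A,n+1}\mu_{A,n+1}^T\right], \qquad (20)$$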

where the model reduction error is $A_{n+1} = F(x') - F^*(x')$ if the state $x_{n+1} = x'$ is accepted, and remains $A_{n+1} = F(x) - F^*(x)$ if the proposal is rejected and $x_{n+1} = x$.
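The recursions (19) and (20) avoid storing the sample history. Below is a minimal NumPy sketch of one update, written for the usual unbiased ($1/(n-1)$) sample covariance of the $n$ model-reduction-error vectors seen so far; the function name is hypothetical, and the usage lines check the recursion against NumPy's batch estimates on random data.

```python
import numpy as np


def update_mean_cov(n, mean, cov, a_new):
    """Fold the (n+1)-th vector a_new into a running mean and covariance.

    `mean` and `cov` summarize the first n >= 2 vectors, with `cov` using the
    unbiased 1/(n - 1) normalization; the returned pair summarizes n + 1 vectors.
    """
    mean_new = (n * mean + a_new) / (n + 1)
    cov_new = ((n - 1) * cov
               + n * np.outer(mean, mean)
               + np.outer(a_new, a_new)
               - (n + 1) * np.outer(mean_new, mean_new)) / n
    return mean_new, cov_new


# Check against the batch estimates on random data.
rng = np.random.default_rng(1)
errors = rng.standard_normal((50, 3))
mean, cov = errors[:2].mean(axis=0), np.cov(errors[:2], rowvar=False)
for n in range(2, len(errors)):
    mean, cov = update_mean_cov(n, mean, cov, errors[n])
print(np.allclose(mean, errors.mean(axis=0)),
      np.allclose(cov, np.cov(errors, rowvar=False)))  # True True
```

The same update, applied to the permeabilities of one block, also covers the block mean and covariance updates of the adaptive block updating scheme in section 4.1.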

[45] Second, a state-dependent EEM over the posterior with a local correction is constructed. The local correction is given by the difference between the fine model output and the coarse model output at a given state $x$, i.e., $F(x) - F^*(x)$. Then, for a proposal $x'$, the reduced-order model has the form

$$F^*_x(x') = F^*(x') + \left[F(x) - F^*(x)\right]. \qquad (21)$$

The model reduction error of this state-dependent approximate posterior is also estimated using the EEM, which yields

$$\pi^*_{x,n}(x' \mid d) \propto \exp\left(-\frac{1}{2}\left\| L_{A,n}\left[d - F^*_x(x')\right] \right\|^2\right)\pi(x'), \qquad (22)$$

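where $L_{A,n}^T L_{A,n} = (\Sigma_{A,n} + \Sigma_e)^{-1}$. The covariance $\Sigma_{A,n}$ is updated to $\Sigma_{A,n+1}$ in step 4 of ADAMH by

$$\Sigma_{A,n+1} = \frac{1}{n}\left[(n-1)\,\Sigma_{A,n} + A_{x_n}(x_{n+1})\,A_{x_n}(x_{n+1})^T\right], \qquad (23)$$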

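where $A_{x_n}(x_{n+1}) = F(x_{n+1}) - F^*_{x_n}(x_{n+1})$. Since ADAMH satisfies diminishing adaptation [Roberts and Rosenthal, 2007], the algorithm converges to a well-defined "best" algorithm with a fixed proposal distribution and enhanced error model, i.e., one with a fixed transition kernel $K(x, x')$. That limiting algorithm (as $n \to \infty$) satisfies detailed balance, $\pi(x \mid d)\,K(x, x') = \pi(x' \mid d)\,K(x', x)$, and hence we can determine the limiting mean of the model reduction error of the locally corrected coarse model, $\lim_{n\to\infty}\mu_{A,n}$, as follows:

$$\begin{aligned}
\lim_{n\to\infty}\mu_{A,n} &= E_\pi\left[\int A_x(x')\,K(x, x')\,dx'\right] \\
&= \iint \left[F(x') - F^*(x')\right]\pi(x \mid d)\,K(x, x')\,dx'\,dx - \iint \left[F(x) - F^*(x)\right]\pi(x \mid d)\,K(x, x')\,dx'\,dx \\
&= \iint \left[F(x') - F^*(x')\right]\pi(x' \mid d)\,K(x', x)\,dx'\,dx - \iint \left[F(x) - F^*(x)\right]\pi(x \mid d)\,K(x, x')\,dx\,dx' \\
&= 0.
\end{aligned} \qquad (24)$$

The third line follows from detailed balance, and its two double integrals are equal once the variables of integration are relabeled. Our preliminary numerical tests also suggested that the magnitude of $\mu_{A,n}$ is negligible compared to the model outputs $F^*(x)$ and $F(x)$, and hence it can be set to zero in practice.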
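The local correction (21) reuses the fine and coarse outputs already cached at the current state, so each first-step evaluation costs only one coarse solve. The sketch below shows (21), the zero-mean covariance update (23), and the unnormalized log of (22). It is illustrative only: the function names are hypothetical, `log_prior` is passed in as a number, and a toy linear model with a constant offset is used to show that the correction is exact when the fine and coarse models differ by a constant.

```python
import numpy as np


def local_correction(coarse_at_prop, fine_at_cur, coarse_at_cur):
    """F*_x(x') = F*(x') + [F(x) - F*(x)], eq. (21), from cached outputs."""
    return coarse_at_prop + (fine_at_cur - coarse_at_cur)


def update_zero_mean_cov(n, cov, a_new):
    """Eq. (23): covariance of A with its mean fixed at zero, as justified by (24)."""
    return ((n - 1) * cov + np.outer(a_new, a_new)) / n


def log_local_posterior(d, corrected, cov_A, cov_e, log_prior):
    """Unnormalized log of (22) given F*_x(x') and log pi(x')."""
    r = d - corrected
    return -0.5 * (r @ np.linalg.solve(cov_A + cov_e, r)) + log_prior


# Toy check: if F and F* differ only by a constant, the correction is exact.
M = np.array([[2.0, 0.0], [1.0, 1.0]])
offset = np.array([0.3, -0.2])


def fine(x):
    return M @ x + offset


def coarse(x):
    return M @ x


x_cur, x_prop = np.array([0.5, 1.0]), np.array([-1.0, 2.0])
corrected = local_correction(coarse(x_prop), fine(x_cur), coarse(x_cur))
print(np.allclose(corrected, fine(x_prop)))  # True
```

In practice the fine and coarse models differ by more than a constant, and the residual error that the correction misses is what $\Sigma_{A,n}$ in (22) accounts for.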

3.2. Performance

[46] ADAMH improves the computational efficiency by reducing the average computing time per iteration, and hence the computing time per independent sample of ADAMH is lower than that of the standard MH. Suppose that, for the same proposal distributions, the numbers of iterations needed to generate a statistically independent sample are $\tau^*$ for ADAMH and $\tau$ for the standard MH. Let $t^*$ and $t$ denote the computing times of the approximate target density and the exact target density, respectively. Suppose further that both algorithms have generated $N$ independent samples, so that $2N\tau^*$ and $2N\tau$ iterations are required for ADAMH and the standard MH, respectively. The total computing time is then about $2N\tau^*(t^* + t\hat{\alpha})$ for ADAMH and about $2N\tau t$ for the standard MH, where $\hat{\alpha}$ is the average acceptance rate in the first step. Thus, the speed-up factor of ADAMH is

$$\frac{\tau}{\tau^*}\left(\hat{\alpha} + \frac{t^*}{t}\right)^{-1}. \qquad (25)$$

Here $\tau/\tau^* < 1$ is a measure of the decrease in the statistical efficiency of ADAMH. ADAMH is always statistically less efficient than the standard MH [Christen and Fox, 2005], because it requires more iterations to generate a statistically independent sample. This follows directly from the condition $\beta_n(x, x') \le 1$. This suggests that there are two important issues in implementing ADAMH: (1) designing efficient moves in the first step, which optimize the statistical efficiency in the sense of running a standard MH; and (2) maintaining the statistical efficiency of ADAMH, which requires a "good" approximate posterior distribution whose transition probability from $x$ to $x'$ is close to that of the standard MH, i.e., $\beta_n(x, x') \approx 1$, and hence $\tau/\tau^* \approx 1$.
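The bookkeeping behind (25) can be checked numerically. In this sketch the function names are hypothetical, and the inputs are illustrative values, not results from the paper: $\tau = 8$, $\tau^* = 10$ (so $\tau/\tau^* = 0.8$), $\hat{\alpha} = 0.2$, and $t^*/t = 1/30$, which is of the order of the coarse-to-fine run-time ratio reported in section 5.2.1.

```python
def total_time_mh(n_indep, tau, t_fine):
    """Standard MH: 2*N*tau iterations, each with one exact evaluation."""
    return 2 * n_indep * tau * t_fine


def total_time_adamh(n_indep, tau_star, t_coarse, t_fine, alpha_hat):
    """ADAMH: every iteration evaluates the approximation; only the fraction
    alpha_hat accepted in the first step also evaluates the exact target."""
    return 2 * n_indep * tau_star * (t_coarse + alpha_hat * t_fine)


def speedup(tau_ratio, alpha_hat, cost_ratio):
    """Eq. (25): (tau / tau*) * (alpha_hat + t*/t) ** -1."""
    return tau_ratio / (alpha_hat + cost_ratio)


# Illustrative inputs (not from the paper): tau = 8, tau* = 10, t = 30, t* = 1.
ratio = total_time_mh(100, 8.0, 30.0) / total_time_adamh(100, 10.0, 1.0, 30.0, 0.2)
print(round(ratio, 3), round(speedup(0.8, 0.2, 1.0 / 30.0), 3))  # 3.429 3.429
```

With $\tau/\tau^* = 1$ the factor is bounded above by $t/t^*$, so the attainable gain is capped by how much cheaper the reduced-order model is, and it shrinks as the first-step acceptance rate $\hat{\alpha}$ grows.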

[47] Note that in (2) there is a trade-off between the numerical accuracy and the computing speed of the approximate target distribution. For a reduced-order model that is fast enough, the model reduction error can have a significant impact on the accuracy of the approximate posterior distribution. In numerical experiments based on a simple model of the feedzone of a geothermal well, calibrated with well discharge test data [Cui et al., 2011], the approximate posterior (7) achieved only about a 17% acceptance rate in the second step of ADAMH, and the resulting Markov chain mixed poorly (the integrated autocorrelation time could not be estimated to benchmark efficiency). This is also confirmed by the study of Efendiev et al. [2005], in which a nonadaptive surrogate transition method was implemented to sample a 2-D multiphase flow (oil and water) problem. They tested approximate posterior distributions with various noise levels without considering the model reduction error, and obtained a second step acceptance rate of only about 23% for their best approximation.

[48] The accuracy of the approximate posterior distribution can be improved significantly by using the EEM, at only a small increase in computational cost. In this research, the adaptive local approximate posterior (22) is employed. The second step acceptance rate increased from about 30% using approximation (7) (this cannot be estimated precisely because the resulting chain mixes poorly) to about 74%. Therefore, applying the adaptive EEM with the local (state-dependent) correction in (22) is critical to making the MCMC feasible, since it allows sampling from the exact posterior distribution while using a significantly cheaper computational model, without a significant reduction in statistical efficiency.

4. Adaptive Proposal Distributions

4.1. Adaptive Block Updating

[49] As mentioned in section 2.1, the statistical efficiency of MH is determined by the proposal distributions (moves) that generate proposals. Designing efficient proposal distributions for geophysical inverse problems is difficult because the spatially distributed, highly heterogeneous parameters have very high dimensionality. Various forms of proposal distributions are discussed by Chib and Greenberg [1995], of which component-wise updating, block updating, and full updating are most commonly used in sampling inverse problems. Component-wise updating was implemented by Oliver et al. [1997] for conditioning permeabilities on pressure data. This scheme suffers from computational difficulties for high-dimensional problems, because the number of forward model evaluations per sweep of updates is proportional to the number of parameters. Fu and Gomez-Hernandez [2009] employed the block updating scheme to sample a 2-D hydrology modeling problem, randomly selecting and perturbing a block of parameters each time. They also assessed the sampling performance of various block sizes, and suggested that an 8 × 8 block size appears to be most efficient for their case study. This block updating scheme is also adopted in this paper to generate proposals for updating the permeabilities of the geothermal reservoir model.

[50] Another difficulty in designing proposal distributions for problems with many parameters is that tuning the variables of the proposal distributions is usually very time consuming, in terms of both computing time and human input, because many short trial runs have to be made. Adaptive MCMC sampling techniques [Haario et al., 2001; Atchade and Rosenthal, 2005; Andrieu and Moulines, 2006; Haario et al., 2006; Roberts and Rosenthal, 2007, 2009] offer great flexibility in the automatic tuning of these variables. Our preliminary numerical experiments (not shown) showed that the proposal distribution $(2.38/\sqrt{d})\,N(0, \Sigma)$ (where $\Sigma$ is the adaptively estimated covariance matrix of the model parameters) used by AM can be suboptimal when the target distribution is non-Gaussian. However, the covariance matrix constructed by AM provides valuable information about the parameter space, and could potentially be used to improve the statistical efficiency of sampling.

[51] One attempt to implement this idea is the DRAM of Haario et al. [2006], which combines AM with the delayed rejection (DR) algorithm [Tierney and Mira, 1999]. DRAM employs a second step proposal distribution that has the same form as that of AM but uses a smaller scale. The philosophy of DRAM is that if the first step proposal is rejected, which suggests that the proposal is too far from the current state because the scale is too large, a smaller scale should be used in the second step to draw another proposal closer to the current state. In the examples given by Haario et al. [2006], statistical efficiency can be dramatically improved by DRAM, particularly during burn-in. The performance of DR has been discussed by Green and Mira [2001]. They suggest that DR gives a larger improvement in statistical efficiency when the first step proposal is suboptimal. For an optimal proposal distribution, DR does not improve the statistical efficiency.

[52] Rather than using two proposal distributions with different scales, we adopt a different approach, introduced in AMWG [Roberts and Rosenthal, 2009], which adaptively adjusts a single scale in the proposal distribution to match a certain optimal acceptance rate. The original AMWG is a component-wise updating scheme, and the optimal acceptance rate of about 0.44 [Roberts et al., 1997; Roberts and Rosenthal, 2001] is used as the target acceptance rate. The major drawback of this approach is that the component-wise updating scheme cannot utilize the covariance matrix. We design an adaptive block updating scheme (ABU) that combines the adaptively estimated covariance matrix of AM with the AMWG approach for adaptive adjustment of the scale. We present ABU below within the framework of ADAMH.

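[53] In step 1 of ADAMH, suppose the $D$-dimensional permeabilities $k = [k_1, \ldots, k_D]$ are divided into $L$ blocks $\mathcal{I}_1, \ldots, \mathcal{I}_j, \ldots, \mathcal{I}_L \subset \{1, 2, \ldots, D\}$, and each block $\mathcal{I}_j$ is associated with a scale variable $\sigma_j$. Let $D_j$ be the number of elements of group $\mathcal{I}_j$, and $\bar{\mathcal{I}}_j = \{1, 2, \ldots, D\} \setminus \mathcal{I}_j$. At step $n$, given $k_n = k$, for a given group $j$, the proposal $k'$ is generated by drawing a $D_j$-dimensional random variable $\xi$ from the probability distribution

$$g_{j,n}(\xi) = \begin{cases} N\left(0,\; 0.01\, I_j\right), & n \le 2D_j, \\[4pt] N\left(0,\; \dfrac{\sigma_j^2}{\max_i [\Sigma_{j,n}]_{ii}}\left(\Sigma_{j,n} + 0.05\, I_j\right)\right), & n > 2D_j, \end{cases} \qquad (26)$$

and $k'$ is then calculated by the deterministic map $k' = \psi(k, \xi)$:

$$\begin{cases} k'_{\mathcal{I}_j} = k_{\mathcal{I}_j} + \xi, \\ k'_{\bar{\mathcal{I}}_j} = k_{\bar{\mathcal{I}}_j}. \end{cases} \qquad (27)$$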

Note that we use the same deterministic map and the same probability density in the forward and reverse transitions, i.e., $g_n(\cdot) = g'_n(\cdot)$ and $\psi(\cdot, \cdot) = \psi'(\cdot, \cdot)$. This move has Jacobian determinant $J = 1$ in the reversible jump rule. Here, $I_j$ is a $D_j \times D_j$ identity matrix, and $\Sigma_{j,n}$ is the empirically estimated covariance matrix of the block $\mathcal{I}_j$. The terms $[\Sigma_{j,n}]_{ii}$ are the diagonal entries of the covariance matrix $\Sigma_{j,n}$, and the factor $1/\max_i [\Sigma_{j,n}]_{ii}$ scales the covariance matrix to a fixed magnitude. This prevents the adaptively adjusted scale variable $\sigma_j$ from interacting with the scale of the covariance matrix $\Sigma_{j,n}$; otherwise the algorithm would explore this undetermined scale, leading to a loss of efficiency. Then, steps 2 and 3 of ADAMH are used to update $k_n$ to $k_{n+1}$.
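A sketch of one ABU proposal, (26)-(27), for a single block. The names are hypothetical: `block` is an integer index array standing in for $\mathcal{I}_j$, `block_cov` for $\Sigma_{j,n}$, and `scale` for $\sigma_j$. Because the Gaussian increment is symmetric and the map is a translation, no Jacobian or proposal-density correction is needed in the acceptance ratio for this move.

```python
import numpy as np


def abu_propose(k, block, n, scale, block_cov, rng):
    """Perturb only the entries of k indexed by `block`, following (26)-(27)."""
    d_j = len(block)
    if n <= 2 * d_j:
        # Start-up phase: fixed isotropic proposal until enough samples exist.
        cov = 0.01 * np.eye(d_j)
    else:
        # Normalize the empirical covariance by its largest diagonal entry so
        # that only `scale` (sigma_j) controls the overall step size.
        norm = np.max(np.diag(block_cov))
        cov = scale**2 / norm * (block_cov + 0.05 * np.eye(d_j))
    k_new = k.copy()
    k_new[block] = k[block] + rng.multivariate_normal(np.zeros(d_j), cov)
    return k_new


rng = np.random.default_rng(3)
k = np.zeros(6)
block = np.array([1, 3])
k_prop = abu_propose(k, block, n=10, scale=0.5,
                     block_cov=np.array([[4.0, 1.0], [1.0, 2.0]]), rng=rng)
print(k_prop)
```

Only indices 1 and 3 move; the remaining log-permeabilities are copied unchanged, as in the second line of (27).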

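[54] In step 5 of ADAMH, we first update the empirically estimated mean $\mu_{j,n}$ and covariance matrix $\Sigma_{j,n}$ to $\mu_{j,n+1}$ and $\Sigma_{j,n+1}$ by

$$\mu_{j,n+1} = \frac{1}{n+1}\left[n\,\mu_{j,n} + k_{j,n+1}\right], \qquad (28)$$

$$\Sigma_{j,n+1} = \frac{1}{n}\left[(n-1)\,\Sigma_{j,n} + k_{j,n+1}k_{j,n+1}^T + n\,\mu_{j,n}\mu_{j,n}^T - (n+1)\,\mu_{j,n+1}\mu_{j,n+1}^T\right], \qquad (29)$$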

where $k_{j,n+1} = \{k_{i,n+1}\}_{i \in \mathcal{I}_j}$. Then, the scale variable $\sigma_j$ is updated as follows. For a prespecified batch size $N_b$, if $n \bmod N_b = 0$, calculate the overall acceptance rate $\hat{\alpha}_j$ over the past $N_b$ updates, and then adjust $\sigma_j$ toward the target acceptance rate 0.1:

$$\sigma_j = \begin{cases} \sigma_j \exp\left[-\left(0.01 \wedge (n/N_b)^{-1/2}\right)\right], & \hat{\alpha}_j \le 0.1, \\[4pt] \sigma_j \exp\left[0.01 \wedge (n/N_b)^{-1/2}\right], & \hat{\alpha}_j > 0.1. \end{cases} \qquad (30)$$

This treatment of $\sigma_j$ directly follows Roberts and Rosenthal [2009]. The theoretical results of Roberts et al. [1997] and Roberts and Rosenthal [2001] suggest that a target acceptance rate of 0.23 is optimal for adjusting $\sigma_j$, under certain assumptions. They also suggest that an acceptance rate in the range [0.1, 0.6] gives reasonably good statistical efficiency. Preliminary numerical tests (not shown) revealed that a target acceptance rate of around 0.1 produces the best statistical efficiency for a geothermal model with synthetic data, and hence it is adopted here.
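The batch-wise scale update (30) amounts to a step of size $\delta = 0.01 \wedge (n/N_b)^{-1/2}$ on $\log \sigma_j$. In this sketch the sign convention (shrink the scale when the batch acceptance rate is at or below the 0.1 target, enlarge it otherwise) is assumed from the AMWG scheme of Roberts and Rosenthal [2009], which the text says this update follows; the function name is hypothetical.

```python
import math


def adapt_scale(scale, batch_accept_rate, n, batch_size, target=0.1):
    """AMWG-style update of a proposal scale, applied when n % batch_size == 0.

    The adjustment to log(scale) is min(0.01, (n / batch_size) ** -0.5): it is
    capped at 0.01 and decays like (n / batch_size) ** -0.5 once more than
    10,000 batches have elapsed (diminishing adaptation).
    """
    delta = min(0.01, (n / batch_size) ** -0.5)
    if batch_accept_rate <= target:
        return scale * math.exp(-delta)   # too few acceptances: smaller steps
    return scale * math.exp(delta)        # too many acceptances: larger steps


print(adapt_scale(1.0, 0.05, 500, 50), adapt_scale(1.0, 0.30, 500, 50))
```

Adjusting $\log \sigma_j$ rather than $\sigma_j$ keeps the scale positive and makes increases and decreases symmetric on a multiplicative scale.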

4.2. Adaptive Move With Special Constraints

[55] For the geothermal reservoir model introduced in section 5, the distribution of mass input is controlled by an $M$-dimensional vector $w = [w_1, \ldots, w_M]$. The constraints on $w$ are (1) the vector is non-negative, i.e., $w_i \ge 0$ for $i = 1, \ldots, M$, and (2) its components sum to a fixed value $C$, i.e., $\sum_{i=1}^{M} w_i = C$. To satisfy these constraints, the following reversible jump move is used to generate the proposal. In step 1 of ADAMH, given the current state $w_n = w$, the proposal $w'$ is determined as follows.

[56] 1. Select a component $i$ from $1, \ldots, M$ uniformly at random.

[57] 2. Draw a random variable $\xi \sim \mathrm{Uniform}(1/\lambda, \lambda)$, where $\lambda > 1$.

[58] 3. Then, the candidate $w'$ is given by the map

$$w' = \psi(w, \xi) = \left[w + w_i(\xi - 1)\,e_i\right]\frac{C}{C + (\xi - 1)\,w_i}, \qquad (31)$$

where $e_i$ is the $i$th standard basis vector in the $M$-dimensional Cartesian coordinate system. The Jacobian determinant in the reversible jump rule is

$$J = \left(\frac{C}{C + (\xi - 1)\,w_i}\right)^{M+1}\frac{1}{\xi}. \qquad (32)$$

Detailed derivations are presented in Appendix B. Then, steps 2 and 3 of ADAMH are used to update $w_n$ to $w_{n+1}$. In step 5 of ADAMH, the scale variable $\lambda$ is updated adaptively according to the same rule as in ABU. In this reversible jump formulation, the same probability density and the same deterministic map are used in the forward and reverse transitions.
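The simplex move (31) and its Jacobian (32) are easy to sanity-check in code. The sketch below (hypothetical names, plain Python lists) applies the move, confirms that the weights still sum to $C$, and confirms that the reverse move with $1/\xi$ restores the original weights with the reciprocal Jacobian, which is what makes the forward and reverse transitions consistent.

```python
import math
import random


def simplex_move(w, i, xi):
    """Map (31): multiply w_i by xi, then rescale so the sum stays at C."""
    C = sum(w)
    shrink = C / (C + (xi - 1.0) * w[i])
    w_new = [shrink * wk for wk in w]
    w_new[i] *= xi
    return w_new


def log_jacobian(w, i, xi):
    """log of (32): (M + 1) * log(C / (C + (xi - 1) w_i)) - log(xi)."""
    C, M = sum(w), len(w)
    return (M + 1) * math.log(C / (C + (xi - 1.0) * w[i])) - math.log(xi)


def propose(w, lam, rng):
    """Steps 1-3: pick i uniformly, draw xi ~ Uniform(1/lam, lam), apply (31)."""
    i = rng.randrange(len(w))
    xi = rng.uniform(1.0 / lam, lam)
    return simplex_move(w, i, xi), i, xi


rng = random.Random(4)
w = [0.2, 0.3, 0.5]
w_prop, i, xi = propose(w, 2.0, rng)
w_back = simplex_move(w_prop, i, 1.0 / xi)
print(sum(w_prop), w_back)
```

In the second-stage acceptance probability (17), this move would contribute the factor `math.exp(log_jacobian(w, i, xi))`; the uniform densities of $\xi$ and $1/\xi$ cancel because both lie in $(1/\lambda, \lambda)$.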

[59] The above reversible jump move can be compared with that of Haslett et al. [2006, p. 413], who also walk on a simplex but do not show explicitly why their approach is reversible. This move preserves constraint (1) by scaling a weight $w_i$ by a positive factor at every iteration, and this scaling perturbs the shape of the distribution. Constraint (2) is satisfied by then scaling all the weights simultaneously back to the fixed total value $C$, which preserves the shape of the perturbed distribution and does not violate constraint (1). Note that each update changes the values of all the weights.

5. Geothermal Reservoir Modeling

5.1. Numerical Simulation

[60] We use the TOUGH2 computer package [Pruess, 1991] to simulate the multiphase nonisothermal flow in a geothermal reservoir, which is governed by the mass balance equation

$$\frac{d}{dt}\int_\Omega M^m\,dV = \int_{\partial\Omega} Q^m \cdot n\,d\Gamma + \int_\Omega q^m\,dV, \qquad (33)$$

and the energy balance equation

$$\frac{d}{dt}\int_\Omega M^e\,dV = \int_{\partial\Omega} Q^e \cdot n\,d\Gamma + \int_\Omega q^e\,dV. \qquad (34)$$

Here, $\Omega$ is the control volume (m³), $\partial\Omega$ is its boundary (m²), $n$ is the unit normal vector to $\partial\Omega$, pointing into $\Omega$ so that $Q \cdot n$ is an inflow (the sign convention consistent with (33)–(34)), $M^m$ represents the amount of mass per unit volume (kg m⁻³), $M^e$ is the amount of energy per unit volume (J m⁻³), $Q^m$ is the mass flux (kg m⁻² s⁻¹), $Q^e$ is the energy flux (J m⁻² s⁻¹), $q^m$ is the mass sink/source per unit volume (kg m⁻³ s⁻¹), and $q^e$ is the energy sink/source per unit volume (J m⁻³ s⁻¹).

[61] The amounts of mass and energy in the control volume are calculated from

$$M^m = \phi\,(\rho_l S_l + \rho_v S_v), \qquad (35)$$

$$M^e = (1 - \phi)\,\rho_r u_r T + \phi\,(\rho_l u_l S_l + \rho_v u_v S_v), \qquad (36)$$

where $\phi$ represents porosity (dimensionless), $S_l$ is liquid saturation (dimensionless), $S_v$ is vapor saturation (dimensionless), $\rho_l$ is the density of liquid (kg m⁻³), $\rho_v$ is the density of vapor (kg m⁻³), $\rho_r$ is the density of rock (kg m⁻³), and $u_l$ and $u_v$ represent the specific internal energies (J kg⁻¹) of liquid and vapor, respectively. $u_r$ is the specific heat of the rock (J kg⁻¹ K⁻¹), and $T$ is the temperature (K). The mass fluxes of liquid and vapor are modeled by the multiphase version of Darcy's law

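$$Q^m_l = -\frac{k\,k_{rl}}{\nu_l}\left(\nabla p - \rho_l\, g\right), \qquad (37)$$

$$Q^m_v = -\frac{k\,k_{rv}}{\nu_v}\left(\nabla p - \rho_v\, g\right), \qquad (38)$$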

where $k$ is the permeability tensor (m²), $p$ represents pressure (kg m⁻¹ s⁻², i.e., Pa), $\nabla$ is the gradient operator, $\nu_l$ is the kinematic viscosity of liquid (m² s⁻¹), $\nu_v$ is the kinematic viscosity of vapor (m² s⁻¹), $k_{rl}$ and $k_{rv}$ represent the relative permeabilities (dimensionless), and $g$ is the gravitational acceleration (m s⁻²). Here, the van Genuchten-Mualem relative permeability model [Van Genuchten, 1980] is used to model the relative permeabilities. The total mass flux and energy flux are given by

$$Q^m = Q^m_l + Q^m_v, \qquad (39)$$

$$Q^e = h_l Q^m_l + h_v Q^m_v - K\nabla T, \qquad (40)$$

where $h_l$ and $h_v$ are the specific enthalpies (J kg⁻¹) of liquid and vapor, respectively, and $K$ is the thermal conductivity (J s⁻¹ m⁻¹ K⁻¹).
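As a small worked check of (37) and (40), the sketch below evaluates the vertical components in one dimension with $z$ pointing upward, so the gravity vector is $(0, 0, -g)$ and (37) becomes $Q_z = -(k\,k_{rl}/\nu_l)(\partial p/\partial z + \rho_l g)$. The function names and numerical values are illustrative assumptions, not reservoir data; the point of the check is that a hydrostatic pressure gradient produces no flow.

```python
G = 9.81  # gravitational acceleration magnitude (m s^-2)


def darcy_mass_flux_z(k, k_r, nu, dp_dz, rho):
    """Vertical phase mass flux (kg m^-2 s^-1) from (37)/(38), z upward."""
    # grad p - rho * g with g = (0, 0, -G) gives dp/dz + rho * G in z.
    return -(k * k_r / nu) * (dp_dz + rho * G)


def energy_flux_z(h_l, q_l, h_v, q_v, K, dT_dz):
    """Vertical energy flux (W m^-2) from (40): advection plus conduction."""
    return h_l * q_l + h_v * q_v - K * dT_dz


# Illustrative single-phase liquid values (assumptions, not model data).
k, k_rl, nu_l, rho_l = 1e-14, 1.0, 1.5e-7, 800.0
hydrostatic = -rho_l * G
q_static = darcy_mass_flux_z(k, k_rl, nu_l, hydrostatic, rho_l)
q_up = darcy_mass_flux_z(k, k_rl, nu_l, hydrostatic - 1000.0, rho_l)
print(q_static, q_up, energy_flux_z(1.2e6, q_up, 2.8e6, 0.0, 2.5, -0.1))
```

A pressure gradient 1000 Pa m⁻¹ steeper than hydrostatic drives an upward liquid flux, and the conductive term adds heat upward when temperature increases with depth ($\partial T/\partial z < 0$).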

[62] In the system of equations (33)–(40), pressure $p$ and temperature $T$ (or vapor saturation $S_v$ for two-phase flow) are spatially distributed quantities that represent the state of the system. Permeability $k$, relative permeabilities $k_{rl}$ and $k_{rv}$, and porosity $\phi$ are the parameters of interest. The remaining quantities, such as the viscosities $\nu_l$ and $\nu_v$, the specific internal energies $u_l$, $u_v$, and $u_r$, the densities $\rho_l$, $\rho_v$, and $\rho_r$, and the specific enthalpies $h_l$ and $h_v$, are calculated using steam table equations given by the International Formulation Committee [1967].

[63] TOUGH2 implements the integrated finite difference method to spatially discretize the mass and energy balance equations (33)–(34). To guarantee the stability of simulations, a fully implicit scheme with adaptive time stepping is used for numerical integration in time, and upstream weighting is used for calculating flows between adjacent blocks. Within each time step, the Newton-Raphson method is used to solve the resulting system of nonlinear difference equations. TOUGH2 uses a preconditioned conjugate gradient sparse matrix solver to solve the linear equations at each Newton-Raphson iteration [e.g., Pruess, 1991].

5.2. A Steady State Model

5.2.1. Model Description

[64] In geothermal reservoir modeling, the first step in setting up a numerical model to solve the mass and energy balance equations (33)–(34) is the estimation of the permeability structure and boundary conditions from the pre-exploitation (steady state) temperature measurements [Grant et al., 1982; O'Sullivan, 1985; O'Sullivan and McKibbin, 1989]. The temperature measurements indicate the movement of hot fluid, which is directly influenced by the permeability structure and the mass input at the bottom of the model. The model is simulated over a long period of model time to ensure that a stable steady state is obtained. For steady state models, the porosities do not affect the solution of equations (33)–(40), and the relative permeabilities have only a minor impact on the solution. Thus, the permeability distribution and the deep hot upflows at the bottom of the reservoir are the parameters of major concern in our analysis. The temperature measurements are presented in section 6.2. Manual calibration results and trial runs of MCMC sampling suggest that $\sigma_e = 7.5$°C should be used for the likelihood function discussed in section 2.1.

[65] On the basis of geological data, such as the geological structure, resistivity data, and magnetotelluric surveys, a conceptual model that reflects the size and depth of the reservoir was set up by one of the authors (M. J. O'Sullivan, unpublished report, 2006). Temperature logs are measured from the wells (black lines) drilled into the reservoir. To include all the hot region of the reservoir and a large surrounding warm and cold area, it was decided to use a model volume of 12.0 km × 14.4 km extending down to 3050 m below sea level, which is below the deepest well. Relatively large blocks were used in the outer part of the model, and the blocks were progressively refined near the wells to achieve a well-by-well allocation to the blocks; see Figure 1. For the fine model, the largest blocks are 1200 m × 1200 m, and the smallest blocks are 150 m × 150 m. The model is divided into 26 layers. Thin layers are used in the top half of the model to allow for modeling complex horizontal flows and circulation, and relatively thick layers are used at the bottom of the model because vertical flows from depth do not have a very complex structure. The thickness of the top layer of the model is varied to match the ground surface elevation.

[66] The 3-D structure of the model with 26,005 blocks is shown in Figure 2. The black lines in the middle of the grid are the tracks of the wells drilled into the reservoir. The resolution of this model is satisfactory for reproducing the basic convective structure of the system, as shown by the sampling results in section 6. Further refinement of the grid structure would be necessary to reproduce the "fine" detail of the fluid transport in the system; however, a more refined model would exceed the current computational limits. ADAMH also uses a coarse grid with 3,335 blocks, which is constructed by combining adjacent blocks in the x, y, and z directions of the fine grid (see Figure 3). A coarser grid is not used here because further coarsening would produce a model that cannot reproduce the convective plume in the reservoir. Each simulation of the fine model takes about 30–50 min of CPU time on a DELL T3400 workstation, and the computing time for the coarse model is about 1–1.5 min. The computing time for these models is sensitive to the input parameters.

5.2.2. Model Parameters

[67] Previously, the rock structure of the model was preassigned based on geological data, with each rock type covering a range of grid blocks. From manual calibration of the model, we found that the permeability may not correspond well with the rock type and varies considerably with location. To capture this variability, we assign different permeabilities to each block of the coarse model and map the permeabilities from the coarse model to the corresponding blocks in the fine model. Hence, we have $N_k = 3{,}335$ three-dimensional permeability tensors to calibrate. Since TOUGH2 only handles the diagonal terms of the permeability tensor, and we estimate the permeabilities on a base 10 logarithmic scale, these permeabilities are represented here by the vectors $k^{(x)}$, $k^{(y)}$, and $k^{(z)}$, where each $k^{(\theta)} = [\log_{10}(k^{(\theta)}_1), \ldots, \log_{10}(k^{(\theta)}_{N_k})]$, $\theta \in \{x, y, z\}$, is an $N_k$-dimensional vector. Hence, we have a 10,005-dimensional permeability vector

$$K = \left\{k^{(x)}, k^{(y)}, k^{(z)}\right\} \qquad (41)$$

in our calibration problem.

[68] The top of the model is assumed to be "open," which allows the model to have a direct connection with the atmosphere. Atmospheric pressure and temperature are used as the boundary conditions at the top of the model. The model covers a sufficiently large area that the flows through the sides of the boundary are negligible in the natural state, and hence the sides of the model are treated as no-flow boundaries.

[69] At the base of the model, a distribution of very hot water is injected to represent the upflow from depth, which also has to be calibrated. A radial basis function (RBF) with kernel function

$$C(s, s'; \hat{r}) = \exp\left[-\left(\frac{\|s - s'\|}{\hat{r}}\right)^2\right] \qquad (42)$$

Figure 1. Top view of the model and well tracks for the wells drilled into the reservoir (black lines).

Figure 2. The 3-D grid used for natural state modeling.

Figure 3. The coarse grid used for speeding up computation.



is used to represent the distribution of the injected hot water, where $s = (x, y)$ denotes the location of the control points. Because the distribution of the injected hot water is unknown, we use a set of evenly spaced control points with fixed locations to parametrize the distribution. These control points span a sufficiently large area to cover all the possible locations of injected hot water, as indicated by the location and orientation of known faults in the system (M. J. O'Sullivan, unpublished report, 2006). The hyperparameter $\hat{r}$ is fixed at 300 m, a value adjusted empirically so that each control point has a sufficiently large coverage area and the overall distribution does not have any spikes. The set of $M = 41$ control points $S = \{s_i\}_{i=1}^{M}$ and one realization of the mass input distribution are shown in Figure 4. By assigning weights $w = \{w_i\}_{i=1}^{M}$ to the control points, the distribution of the water injection (in kg m⁻² s⁻¹) at a given point $s'$ is

$$f(s' \mid w, S, \hat{r}) = \sum_{i=1}^{M} w_i\, C(s_i, s'; \hat{r}). \qquad (43)$$

The weighting vector $w$ is unknown, and hence should be included in the calibration process.

[70] The total amount of mass input at the base of the model is given by integrating equation (43), which gives

$$\int f(s' \mid w, S, \hat{r})\,ds' = \left(\sum_{i=1}^{M} w_i\right)\int C(s_i, s'; \hat{r})\,ds' = \left(\sum_{i=1}^{M} w_i\right)\hat{r}^2\pi. \qquad (44)$$

As suggested by surface flow data, the total amount of mass input should be fixed at 105 kg s⁻¹.
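The closed form in (44) follows from $\int_{\mathbb{R}^2} \exp(-\|s - s'\|^2/\hat{r}^2)\,ds' = \pi\hat{r}^2$, which the sketch below confirms numerically. Combined with the fixed total of 105 kg s⁻¹, (44) pins the weight sum at $105/(\pi\hat{r}^2) \approx 3.71 \times 10^{-4}$ kg m⁻² s⁻¹ for $\hat{r} = 300$ m, which is presumably the fixed value $C$ used by the constrained move of section 4.2. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(s, s_prime, r_hat):
    """Kernel (42): exp[-(||s - s'|| / r_hat)^2]; broadcasts over leading axes."""
    return np.exp(-(np.linalg.norm(s - s_prime, axis=-1) / r_hat) ** 2)


def mass_input(s_prime, weights, centers, r_hat):
    """Injection rate f(s' | w, S, r_hat) of (43), in kg m^-2 s^-1."""
    return np.sum(weights * rbf_kernel(centers, s_prime, r_hat))


def weight_sum_for_total(total_mass, r_hat):
    """Invert (44): sum(w) = total_mass / (pi * r_hat^2)."""
    return total_mass / (np.pi * r_hat**2)


# Numerically integrate one kernel centred at the origin on a fine grid.
r_hat, h = 300.0, 10.0
grid = np.arange(-2000.0, 2000.0, h)
xx, yy = np.meshgrid(grid, grid)
points = np.stack([xx.ravel(), yy.ravel()], axis=-1)
integral = rbf_kernel(points, np.zeros(2), r_hat).sum() * h**2
print(integral / (np.pi * r_hat**2), weight_sum_for_total(105.0, r_hat))
```

The printed ratio is close to 1, confirming the $\pi\hat{r}^2$ factor. Because every kernel integrates to the same value, only the sum of the weights, not their spatial arrangement, controls the total mass input.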

[71] The enthalpy of the injected hot water is taken to be 1490 kJ kg⁻¹, which corresponds to a temperature of about 324°C. The temperature of 324°C was chosen so that, after some mixing with large-scale convective flows, the deep measured temperatures of about 300°C could be reproduced by the model (M. J. O'Sullivan, unpublished report, 2006). In addition, a background heat flux of 80 mW m⁻² is specified at the base of the model, which is typical of values measured outside the hot zone of geothermal systems.

[72] In a fully Bayesian treatment, a hierarchical posterior would be used to model the hyperparameter $\hat{r}$ of the mass input distribution, the standard deviation $\sigma_e$ in the likelihood function, the temperature of the mass input, and the hyperparameter used in the prior distribution (in section 5.3). MCMC sampling could then be used to either estimate or marginalize these parameters, as in the work of Hurn et al. [2003] and Christensen et al. [2006]. However, correlation between these parameters and the other model parameters could make this process computationally infeasible, because the resulting hierarchical posterior is usually difficult to sample from and hence requires a large number of MCMC iterations. Since a fully Bayesian treatment of these parameters is still limited by the available computational power, in this research we instead use empirical estimates of these quantities deduced from either expert judgment or previous trial runs of MCMC. The limited computational power is allocated to exploring the model parameters.

5.3. Prior Distribution and Constraints

[73] The unknown model parameters are the 10,005-dimensional permeability vector $K = \{k^{(x)}, k^{(y)}, k^{(z)}\}$ and the 41-dimensional weighting vector $w$ that controls the distribution of water injection (see section 5.2.2).

[74] We make two weak prior assumptions about the permeability distribution.

First, we constrain the permeabilities by setting bounds on the allowable values, with different bounds for the inner and outer regions of the reservoir. These bounds are listed in Table 1, and the inner and outer regions of the reservoir are indicated by the white and gray shaded areas in Figure 5. These bounds provide componentwise constraints on the vector of permeabilities but do not take account of any spatial structure.

Second, within the constrained values we assume that adjacent blocks have similar permeabilities. We therefore assert a first-order Gaussian Markov random field (GMRF) model [Besag, 1974, 1986; Cressie, 1993; Rue and Held, 2005] for the logarithm of the permeabilities. Let $i \sim j$ denote that blocks $i$ and $j$ are "neighbors," i.e., the blocks are adjacent to each other and have a connection in the TOUGH2 model. The GMRF model for the permeabilities $k^{(\cdot)}$ in each of the $x$, $y$, and $z$ directions, $\cdot \in \{x, y, z\}$, has the form

$$ \pi\big(k^{(\cdot)}\big) \propto \exp\Big\{-\lambda \sum_{i \sim j} \omega_{ij} \big[\log_{10} k^{(\cdot)}_i - \log_{10} k^{(\cdot)}_j\big]^2\Big\}, \tag{45} $$

Figure 4. Location of control points of the RBF model for mass input distribution (red circles) and one realization of the distribution of hot water injection at the base of the model.

Table 1. Prior Bounds of Permeabilitiesᵃ

| | Inner Minimum | Inner Maximum | Outer Minimum | Outer Maximum |
|---|---|---|---|---|
| $k^{(x)}$ | 0.1 | 1000 | 0.1 | 100 |
| $k^{(y)}$ | 0.1 | 1000 | 0.1 | 100 |
| $k^{(z)}$ | 0.01 | 200 | 0.01 | 40 |

ᵃUnit: millidarcy. The inner and outer regions are shown in Figure 5.



where $\omega_{ij}$ is the inverse of the Euclidean distance between the two adjacent block centers, and $\lambda$ is a hyperparameter that controls the smoothing. The sum $\sum_{i \sim j}$ is over all pairs of neighboring blocks.

[75] There are no point measurements or other direct surveys (e.g., seismic data) of the permeabilities available for this reservoir. We therefore assume that the permeabilities in the $x$, $y$, and $z$ directions are uncorrelated, and hence the overall prior distribution of the permeabilities is

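$$ \pi(K) = \pi\big(k^{(x)}\big)\,\pi\big(k^{(y)}\big)\,\pi\big(k^{(z)}\big) \propto \exp\Big(-\lambda \sum_{\cdot \in \{x,y,z\}} \big(\log_{10} k^{(\cdot)}\big)^{T} P \,\big(\log_{10} k^{(\cdot)}\big)\Big), \tag{46} $$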

for permeabilities that satisfy the bound constraints, and is zero otherwise. Here $\log_{10}$ is applied componentwise, and $P$ is the sparse precision matrix of the GMRF model. Its entries are $P_{ij} = -\omega_{ij}$ for $i \sim j$, $P_{ii} = \sum_{j \sim i} \omega_{ij}$, and $P_{ij} = 0$ otherwise, so that the quadratic form in (46) reproduces the sum in (45). We take the hyperparameter $\lambda = 0.5$, a value suggested by trial runs of MCMC. With this setting, the posterior distribution is dominated by the likelihood function, and hence the model parameters are primarily determined by the data. The prior distribution exerts only a mild pressure toward smooth permeabilities.
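For the quadratic form in (46) to reproduce the pairwise sum in (45), the precision matrix has to be a weighted graph Laplacian. That means $-\omega_{ij}$ off the diagonal for neighbors and the sum of incident weights on the diagonal. The sketch below assembles such a matrix for a hypothetical four-block column with made-up block-centre coordinates. It then checks three properties:

1. The quadratic form matches (45).
2. Constant vectors lie in the null space of $P$.
3. Multiplying every permeability by the same factor leaves the prior unchanged, which is why the prior is improper (section 7).

The value 0.5 for the smoothing hyperparameter follows the text. Everything else is illustrative.

```python
import numpy as np

def gmrf_precision(centres, edges):
    """Weighted graph Laplacian P with z^T P z = sum_{i~j} omega_ij (z_i - z_j)^2.

    centres: (n, 3) block-centre coordinates; edges: neighbour pairs (i, j), each listed once.
    omega_ij is the inverse Euclidean distance between block centres, as in eq. (45).
    A dense array keeps the sketch short; a real model would use a sparse matrix.
    """
    n = len(centres)
    P = np.zeros((n, n))
    for i, j in edges:
        omega = 1.0 / np.linalg.norm(centres[i] - centres[j])
        P[i, j] -= omega
        P[j, i] -= omega
        P[i, i] += omega
        P[j, j] += omega
    return P

def log_prior(k, P, lam):
    """Unnormalized log prior for one direction: -lam * (log10 k)^T P (log10 k)."""
    z = np.log10(k)
    return -lam * z @ P @ z

# Hypothetical column of four blocks with uneven spacing (metres).
centres = np.array([[0, 0, 0], [0, 0, 100], [0, 0, 250], [0, 0, 300]], dtype=float)
edges = [(0, 1), (1, 2), (2, 3)]
P = gmrf_precision(centres, edges)

k = np.array([10.0, 50.0, 20.0, 200.0])          # permeabilities in millidarcy
z = np.log10(k)
pairwise = sum((z[i] - z[j]) ** 2 / np.linalg.norm(centres[i] - centres[j]) for i, j in edges)
qf_ok = np.isclose(z @ P @ z, pairwise)           # quadratic form reproduces eq. (45)
null_ok = np.allclose(P @ np.ones(4), 0.0)        # constants lie in the null space of P
shift_ok = np.isclose(log_prior(k, P, 0.5), log_prior(10.0 * k, P, 0.5))  # prior is improper
print(qf_ok, null_ok, shift_ok)
```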

[76] For the weighting vector $w$ that controls the distribution of water injection, we use a noninformative prior. This is because there are no available observations of the amount of water injected at specific locations at the base of the reservoir. However, the total amount of injected water, 105 kg s⁻¹, can be deduced from the surface flow data. From equation (44), this constrains the sum of the weighting vector to $\sum_{i=1}^{M} w_i = 3.7136 \times 10^4$. We also require all weights $w_i$ to be non-negative, to ensure that the distribution of mass injection (43) is non-negative.
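Combining (44) with the 105 kg s⁻¹ total gives $\sum_i w_i = 105 / (\pi \hat r^2)$. The two numbers quoted in the text therefore determine the kernel factor $\pi \hat r^2$. The short sketch below back-calculates that factor and builds one feasible weight vector. The uniform starting weights are an illustrative choice, not the initial state used in the study.

```python
import math

TOTAL_MASS = 105.0        # kg/s, deduced from surface flow data
WEIGHT_SUM = 3.7136e4     # constraint on sum(w_i) quoted in the text
M = 41                    # number of RBF control points (dimension of w)

kernel_factor = TOTAL_MASS / WEIGHT_SUM        # pi * r_hat**2 implied by eq. (44)
r_hat = math.sqrt(kernel_factor / math.pi)     # implied r_hat, in the model's length units

w0 = [WEIGHT_SUM / M] * M                      # uniform, non-negative, sums to WEIGHT_SUM
print(kernel_factor, r_hat, sum(w0))
```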

6. Sampling Results

6.1. MCMC Sampling

[77] The parameters to be estimated are the permeabilities in the $x$, $y$, and $z$ directions, $K = \{k^{(x)}, k^{(y)}, k^{(z)}\}$, and the weights $w$ that control the distribution of water injection. This gives a 10,046-dimensional parameter

$$ x = \{K, w\}. \tag{47} $$

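Letting $F$ denote the fine model, we sample from the following unnormalized posterior distribution:

$$ \pi(x \mid d) \propto \underbrace{L(d \mid x)}_{\text{likelihood function}} \times \underbrace{\pi(x)}_{\text{prior distribution}} \tag{48} $$

$$ = \exp\Big(-\frac{1}{2\sigma_e^2}\,\big\|d - F(x)\big\|^2\Big) \tag{49} $$

$$ \times \exp\Big(-\lambda \sum_{\cdot \in \{x,y,z\}} \big(\log_{10} k^{(\cdot)}\big)^{T} P \,\big(\log_{10} k^{(\cdot)}\big)\Big). \tag{50} $$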

Note that the prior distribution over the weighting parameter $w$ is uniform over allowable values, and hence does not appear explicitly in the above formula. The state-dependent EEM over the posterior (section 3) is used in ADAMH to improve the sampling efficiency. At step $n$, the approximate posterior is given by

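$$ \pi^*_{x,n}(x' \mid d) \propto \underbrace{L^*_{x,n}(d \mid x')}_{\text{approximate likelihood}} \times \underbrace{\pi(x')}_{\text{prior distribution}} \tag{51} $$

$$ = \exp\Big(-\tfrac{1}{2}\,\big\|L_{A,n}\big[d - F^*(x') - F(x) + F^*(x)\big]\big\|^2\Big) \tag{52} $$

$$ \times\, \pi(x'), \tag{53} $$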

where $F^*(\cdot)$ is the coarse model. In this approximate posterior, (52) is the approximation to the exact likelihood function (49), and the prior distribution (53) is identical to (50). This approximation has the property that the approximate likelihood equals the exact likelihood at the current state, i.e., $L^*_{x,n}(d \mid x) = L(d \mid x)$. Thus the approximate posterior and the exact posterior have the same density at the current state, i.e., $\pi^*_{x,n}(x \mid d) = \pi(x \mid d)$.
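The local correction in (52) shifts the coarse prediction $F^*(x')$ by the fine-minus-coarse discrepancy $F(x) - F^*(x)$ cached at the current state. The sketch below makes one simplifying assumption so that the comparison with (49) is exact: the weighting $L_{A,n}$ is replaced by $I / \sigma_e$. In the paper, $L_{A,n}$ is part of the enhanced error model of section 3. The fine and coarse models are hypothetical toy functions. The code checks the stated property that the approximate and exact likelihoods agree at $x' = x$, and differ elsewhere.

```python
import numpy as np

SIGMA_E = 7.5  # noise standard deviation in the exact likelihood (deg C)

def fine(x):      # hypothetical expensive model F
    return np.array([np.sin(x[0]) + x[1] ** 2, x[0] * x[1]])

def coarse(x):    # hypothetical cheap model F*, biased relative to F
    return np.array([x[0] + x[1] ** 2, 0.9 * x[0] * x[1]])

def log_lik(d, x):
    """log L(d | x) from eq. (49), up to an additive constant."""
    r = d - fine(x)
    return -0.5 * r @ r / SIGMA_E**2

def log_lik_approx(d, x_prop, F_cur, Fs_cur):
    """log L*_{x,n}(d | x') with the local correction F*(x') + [F(x) - F*(x)].

    F_cur and Fs_cur are the fine and coarse outputs cached at the current state x,
    so only the coarse model is run at x'. L_{A,n} is replaced by I / SIGMA_E here.
    """
    r = d - coarse(x_prop) - F_cur + Fs_cur
    return -0.5 * r @ r / SIGMA_E**2

d = np.array([1.0, 0.5])
x = np.array([0.3, 0.8])
F_x, Fs_x = fine(x), coarse(x)
same_at_x = np.isclose(log_lik_approx(d, x, F_x, Fs_x), log_lik(d, x))
x_new = x + np.array([0.5, -0.2])
differs_elsewhere = not np.isclose(log_lik_approx(d, x_new, F_x, Fs_x), log_lik(d, x_new))
print(same_at_x, differs_elsewhere)
```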

[78] The model is divided into 68 blocks for updating the permeabilities $K$ by the ABU introduced in section 4.1. The weights $w$ controlling the distribution of mass input are adjusted separately using the adaptive reversible jump move of section 4.2. Hence each sweep of updates of the parameter $x$ requires 69 iterations.

[79] Suppose the current state is $x = \{K, w\}$. When permeabilities are proposed by ABU, the resulting proposed parameter is $x' = \{K', w\}$. The acceptance probability in step 2 of ADAMH becomes

$$ \alpha(x, x') = 1 \wedge \underbrace{\frac{L^*_{x,n}(d \mid x')}{L(d \mid x)}}_{\text{ratio of likelihood}} \times \underbrace{\frac{\pi(x')}{\pi(x)}}_{\text{ratio of prior}}. \tag{54} $$

Figure 5. The first-order neighborhood GMRF graph corresponding to the prior (45); each term in the sum $\sum_{i \sim j}$ corresponds to an edge in the graph. The gray and white regions indicate two different types of bounds on the permeabilities.



Because the probability densities $g_n(\cdot)$ in ABU are symmetric and the determinant of the Jacobian is 1, the ratio of the proposal distributions $q_n(x, x')$ and $q_n(x', x)$ equals 1, and hence is not included in (54). Using the simplified formula (A4), the acceptance probability in step 3 becomes

$$ \beta(x, x') = 1 \wedge \frac{L(d \mid x')\,\pi(x') \wedge L^*_{x',n}(d \mid x)\,\pi(x)}{L^*_{x,n}(d \mid x')\,\pi(x') \wedge L(d \mid x)\,\pi(x)}. \tag{55} $$

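Note that in this formulation the reverse evaluation of the approximate likelihood, $L^*_{x',n}(d \mid x)$, does not incur additional cost. It needs only $F^*(x)$, which has already been computed for the current state $x$, together with $F^*(x')$ and $F(x')$, which are computed when evaluating (54) and the exact likelihood $L(d \mid x')$.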
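The two stages in (54) and (55) can be sketched in log space, where $\log(a \wedge b) = \min(\log a, \log b)$. The function and the callables passed to it are hypothetical stand-ins for the exact and approximate posteriors. The second-stage probability (55) follows from whatever first-stage probability is actually used, so the chain targets the exact posterior even when the approximation is poor. The toy run checks this on a one-dimensional Gaussian target with a deliberately biased surrogate.

```python
import math
import random

def da_step(x, x_prop, log_post, log_post_approx, rng):
    """One two-stage delayed-acceptance step with a symmetric proposal, eqs. (54)-(55).

    log_post(y)           -> log pi(y | d), exact posterior (runs the fine model)
    log_post_approx(c, y) -> log pi*_{c,n}(y | d), approximation centred at state c
    Returns (next_state, fine_model_called).
    """
    lp_x = log_post(x)                       # cached from the previous step in practice
    lq_fwd = log_post_approx(x, x_prop)      # pi*_{x,n}(x' | d): coarse model only
    # Stage 1, eq. (54): screen x' using the approximation.
    if rng.random() >= math.exp(min(0.0, lq_fwd - lp_x)):
        return x, False
    # Stage 2, eq. (55): correct with the exact posterior.
    lp_prop = log_post(x_prop)               # fine model at x'
    lq_rev = log_post_approx(x_prop, x)      # pi*_{x',n}(x | d): reuses cached outputs
    num = min(lp_prop, lq_rev)
    den = min(lq_fwd, lp_x)
    if rng.random() < math.exp(min(0.0, num - den)):
        return x_prop, True
    return x, True

# Toy check: exact target N(0, 1); deliberately biased, state-independent approximation.
rng = random.Random(42)
log_post = lambda y: -0.5 * y * y
log_post_approx = lambda c, y: -0.5 * ((y - 0.5) / 1.2) ** 2
x, chain, n_fine = 0.0, [], 0
for _ in range(50_000):
    x_prop = x + rng.gauss(0.0, 1.0)         # symmetric random-walk proposal
    x, used_fine = da_step(x, x_prop, log_post, log_post_approx, rng)
    n_fine += used_fine
    chain.append(x)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(round(mean, 2), round(var, 2), n_fine)
```

The counter `n_fine` shows how many proposals reached the expensive second stage, which is where the computational saving comes from.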

[80] When we propose $w$ using the reversible jump move, the resulting proposed parameter is $x' = \{K, w'\}$. Because the prior over $w$ is noninformative and $K$ remains unchanged, it is not necessary to evaluate the prior $\pi(x)$. Suppose the reversible jump selects the $i$th element of $w$, $w_i$, and a random variable $\xi$ is used in the transition from $w$ to $w'$. We use the following acceptance probability in step 2 of ADAMH:

$$ \alpha(x, x') = 1 \wedge \underbrace{\frac{L^*_{x,n}(d \mid x')}{L(d \mid x)}}_{\text{ratio of likelihood}} \times J, \tag{56} $$

where the determinant of the Jacobian, as given in section 4.2, is

$$ J = \left|\frac{\partial(w', \xi')}{\partial(w, \xi)}\right| = \left(\frac{C}{C + (\xi - 1) w_i}\right)^{M+1} \frac{1}{\xi}. \tag{57} $$

Because symmetric and identical densities $g'_n(\xi')$ and $g_n(\xi)$ are used in the reversible jump move, the ratio of $g'_n(\xi')$ and $g_n(\xi)$ cancels, and hence is not included in the formula. Using the simplified formula (A4), the acceptance probability in step 3 becomes

$$ \beta(x, x') = 1 \wedge \frac{L(d \mid x')\,J \wedge L^*_{x',n}(d \mid x)}{L^*_{x,n}(d \mid x')\,J \wedge L(d \mid x)}. \tag{58} $$

The reverse evaluation of the approximate likelihood $L^*_{x',n}(d \mid x)$ in (58) also does not incur additional cost.
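A minimal sketch of the weight update assembled from (56)–(58) follows. It draws $\xi \sim \text{Uniform}(1/\eta, \eta)$, applies the map (B1) of Appendix B to component $i$, takes $J$ from (57), and performs the two-stage test with the rule as written in the text. The likelihood callables, the value $\eta = 1.5$, and the three-component toy problem are hypothetical. The toy run only checks that the moves preserve the total $C$ and keep the weights non-negative.

```python
import math
import random

def rj_propose(w, i, xi):
    """Map (B1): multiply w_i by xi, then rescale so the total C is unchanged."""
    C = sum(w)
    denom = C + (xi - 1.0) * w[i]
    w_new = [wj * C / denom for wj in w]
    w_new[i] = w[i] * xi * C / denom
    return w_new

def log_jacobian(w, i, xi):
    """log J with J from eq. (57)."""
    C, M = sum(w), len(w)
    return (M + 1) * math.log(C / (C + (xi - 1.0) * w[i])) - math.log(xi)

def rj_da_step(w, log_lik, log_lik_approx, eta, rng):
    """Two-stage reversible-jump update of w, eqs. (56) and (58).

    log_lik(w) -> log L(d | x); log_lik_approx(c, w) -> log L*_{c,n}(d | x), centred at c.
    The prior on w is flat and K is unchanged, so only likelihoods and J appear.
    """
    i = rng.randrange(len(w))
    xi = rng.uniform(1.0 / eta, eta)
    w_prop = rj_propose(w, i, xi)
    log_J = log_jacobian(w, i, xi)
    ll_x = log_lik(w)
    ll_fwd = log_lik_approx(w, w_prop)
    # Stage 1, eq. (56)
    if rng.random() >= math.exp(min(0.0, ll_fwd - ll_x + log_J)):
        return w
    # Stage 2, eq. (58)
    num = min(log_lik(w_prop) + log_J, log_lik_approx(w_prop, w))
    den = min(ll_fwd + log_J, ll_x)
    return w_prop if rng.random() < math.exp(min(0.0, num - den)) else w

rng = random.Random(0)
target = [0.2, 0.5, 0.3]                      # made-up "true" normalized weights

def log_lik(w):                               # hypothetical likelihood on normalized weights
    C = sum(w)
    return -sum((wi / C - ti) ** 2 for wi, ti in zip(w, target)) / 0.01

def log_lik_approx(c, w):                     # hypothetical cheap surrogate: offset copy
    return log_lik(w) + 0.1

w = [1.0, 1.0, 1.0]
for _ in range(1000):
    w = rj_da_step(w, log_lik, log_lik_approx, eta=1.5, rng=rng)
print([round(v, 3) for v in w], round(sum(w), 6))
```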

[81] Since the fine model is computationally very demanding, we first simulate the chain for about 200 sweeps of updates using only the coarse model, to provide a suitable starting state for ADAMH. We then start ADAMH to sample from the exact posterior distribution (48). ADAMH achieves an acceptance rate of about 74% in the second step, and sampled the posterior distribution for about 11,200 iterations in about 40 days. The computing time of the coarse model is about 3% of that of the fine model, and the first-step acceptance rate in ADAMH is set to 0.1. ADAMH therefore achieves a speed-up factor of about 7.7 compared to standard MH, because each proposal costs about 0.03 + 0.1 = 0.13 fine-model evaluations instead of 1. Note that this is an upper bound on the speed-up, because the loss of statistical efficiency relative to standard MH cannot be measured. However, the 74% acceptance rate in the second step indicates that the statistical efficiency of ADAMH should be similar to that of standard MH.
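The speed-up figure is a per-proposal cost ratio. Standard MH runs the fine model for every proposal. Delayed acceptance always runs the coarse model but runs the fine model only for the fraction of proposals that pass the first stage. A small helper (name hypothetical) reproduces the number quoted above from the two stated inputs:

```python
def da_speedup(coarse_cost, first_stage_accept, fine_cost=1.0):
    """Upper bound on the per-proposal speed-up of delayed acceptance over standard MH.

    Costs are in units of one fine-model evaluation. Any change in statistical
    efficiency (IACT) is ignored, which is why this is only an upper bound.
    """
    cost_mh = fine_cost
    cost_da = coarse_cost + first_stage_accept * fine_cost
    return cost_mh / cost_da

speedup = da_speedup(coarse_cost=0.03, first_stage_accept=0.1)
print(round(speedup, 1))  # 7.7
```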

[82] We use the log of the likelihood function to benchmark the efficiency of the algorithm, because it directly reflects the goodness of fit. The trace plot of the log likelihood function is shown in Figure 6. We observed that after 30 to 50 sweeps of updates, the sample outputs fit the observed data, and the value of the log likelihood function shows no average trend. Thus, we discard the first 40 sweeps of updates as burn-in, and use the remaining samples to estimate the integrated autocorrelation time (IACT) of the log likelihood function.

Roughly speaking, the IACT measures how many correlated samples from the chain carry the same variance-reducing power as one independent sample. In the convention of Wolff [2004] used here, that number is twice the IACT. Many techniques have been developed for estimating the IACT from samples generated by Monte Carlo simulations (see Roberts [1996] for a useful summary). We used the approach and code provided by Wolff [2004], a state-of-the-art approach from the physics literature, to produce the estimates and error bars in Figure 6. The horizontal axis in Figures 6c and 6d shows the various window sizes used to estimate the IACT, and the vertical line indicates the optimal window size for estimating the IACT.

The estimated IACT is given by the vertical axis of Figure 6d, and is 2.80 at the optimal window size. The statistical errors [see Wolff, 2004] of these estimates are summarized by vertical error bars. Twice the IACT, i.e., 5.6, then gives the number of sweeps required to produce independence between samples [Wolff, 2004]. This is only a rough estimate, because the chain has not been run long enough to give tight error bars on the estimate of the IACT.

Figure 6. Trace plot of the log likelihood function and the integrated autocorrelation time. (a) Trace plot of the log likelihood function; the vertical line represents the chosen number of burn-in steps. (b) Trace plot of the log likelihood function after discarding burn-in steps. (c) Normalized autocorrelation plot of the log likelihood function. (d) Estimated integrated autocorrelation time with statistical error [Wolff, 2004].
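Wolff's [2004] estimator is more elaborate than what is shown here; it also propagates statistical errors. The sketch below is a simpler windowed estimator that uses the same convention, $\tau_{\text{int}} = 1/2 + \sum_{t=1}^{W} \rho(t)$. It chooses the window $W$ by Sokal's self-consistent rule, the smallest $W$ with $W \ge c\,\tau_{\text{int}}(W)$. The constant $c = 6$ is an assumed default, not a value from the paper. The estimator is checked on a synthetic AR(1) series with coefficient $\phi$, for which the exact value is $(1 + \phi) / (2(1 - \phi))$.

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation rho(t), t = 0..n-1, via a zero-padded FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    nfft = 1 << (2 * n - 1).bit_length()       # padding avoids circular wrap-around
    spec = np.fft.rfft(xc, nfft)
    acov = np.fft.irfft(spec * np.conj(spec), nfft)[:n] / n
    return acov / acov[0]

def iact(x, c=6.0):
    """Wolff-convention IACT, tau = 1/2 + sum_{t=1}^{W} rho(t), with Sokal's window:
    the smallest W such that W >= c * tau(W). Returns (tau, W)."""
    rho = autocorrelation(x)
    tau = 0.5
    for w in range(1, rho.size):
        tau += rho[w]
        if w >= c * tau:
            return tau, w
    return tau, rho.size - 1

# Synthetic AR(1) check: exact tau = (1 + phi) / (2 * (1 - phi)) = 1.5 for phi = 0.5.
rng = np.random.default_rng(0)
phi, n = 0.5, 200_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1.0 - phi**2)            # start in the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]
tau, window = iact(x)
print(round(tau, 2), window, "->", round(2 * tau, 1), "samples per independent sample")
```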

6.2. Computing Results

[83] The mean and standard deviation of the temperature profiles are estimated as sample averages over model realizations. We compare these estimates with the measured data in Figures 7 and 8. The solid black lines are the estimated mean temperatures, dashed black lines are the 95% credible interval, and measured data are shown as red crosses. The gray and green lines represent coarse and fine model outputs of various realizations, respectively.

Figure 7. Comparison of estimated temperatures and measured data. The solid black lines are the estimated mean temperatures, dashed black lines are the 95% credible interval. Measured data are shown as red crosses, and the corresponding estimated mean model outputs are given as circles. The gray and green lines represent coarse and fine model outputs for each sweep of update, respectively.

[84] Most of the estimated model temperatures agree very well with the measured data; the root mean square error (RMSE) is summarized in Table 2. The assumption that the residuals follow a Gaussian distribution with standard deviation $\sigma_e = 7.5$ °C is supported by the normal quantile plot and the cumulative density plot of the residuals (see Figure 9). The normal quantile plot shows that the residuals are symmetric and roughly follow the Gaussian distribution in the interval [−10, 10] °C, where most of them are located. Outside this range, the residuals are slightly skewed: light tailed below −10 °C and heavy tailed above 10 °C. The cumulative density plot suggests the same. Overall, these plots show that the normality assumption is reasonable.
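A normal quantile plot pairs the sorted residuals with quantiles of the assumed noise distribution. Points near the line empirical = theoretical support the Gaussian model, and bends in the tails reveal light or heavy tails. The sketch below computes those pairs with Python's `statistics.NormalDist`. It uses the plotting positions $(k - 0.5)/n$, which is an assumed, common choice. The residuals are synthetic draws, not the study's data.

```python
import random
from statistics import NormalDist

def normal_qq(residuals, sigma):
    """Return (theoretical, empirical) quantile pairs against a N(0, sigma^2) reference."""
    r = sorted(residuals)
    n = len(r)
    ref = NormalDist(0.0, sigma)
    theo = [ref.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]
    return theo, r

def rmse(residuals):
    """Root mean square of the residuals."""
    return (sum(e * e for e in residuals) / len(residuals)) ** 0.5

rng = random.Random(3)
res = [rng.gauss(0.0, 7.5) for _ in range(500)]   # synthetic residuals with sigma_e = 7.5 deg C
theo, emp = normal_qq(res, sigma=7.5)
print(round(rmse(res), 2), round(theo[0], 2), round(emp[0], 2))
```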

[85] Interestingly, the log likelihood achieves its maximum value about 80 iterations after burn-in, that is, after 120 iterations in all. Each iteration is a sweep of 69 proposal steps. A sweep requires close to 5 evaluations of the fine-scale model, since the acceptance rate at the first step of the delayed acceptance is about 0.1 and the acceptance rate at the second step is 74%. Hence, this best fit to data over the chain occurred after about 600 fine-scale simulations, out of about 850 fine-scale evaluations in all. Even though our priority here is evaluating sample-based averages, ADAMH appears to perform reasonably well as an algorithm for finding parameters that give a "best" fit to data.

[86] Figure 7 shows that the coarse model and fine model produce significantly different temperature profiles. The differences between the coarse model outputs and fine model outputs are summarized by boxplots in Figure 10. These differences are structured, and the fine model is hotter than the coarse model on average. The mean model reduction errors at the measurement positions span the range [−33.64, 54.98] °C. These errors are therefore much larger than the zero-mean Gaussian noise with standard deviation $\sigma_e = 7.5$ °C used in the likelihood function. Hence, the statistical modeling of the model reduction error in sections 2.3 and 3 is essential for the efficient use of the coarse model in this research.
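Because the fine-minus-coarse differences are biased and structured, ADAMH builds a stochastic model of them from the sampling history (sections 2.3 and 3). One way to accumulate such statistics online, without storing every difference vector, is Welford's running mean and covariance update. This is shown below as an illustrative accumulator; it is not necessarily the update rule used in section 3. The difference vectors are synthetic, with made-up biases and spreads at three hypothetical measurement positions.

```python
import numpy as np

class RunningMoments:
    """Online mean and covariance of vectors (Welford's algorithm)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self._m2 = np.zeros((dim, dim))   # running sum of outer products of deviations

    def update(self, e):
        self.n += 1
        delta = e - self.mean             # deviation from the old mean
        self.mean += delta / self.n
        self._m2 += np.outer(delta, e - self.mean)  # uses the updated mean

    @property
    def cov(self):
        """Unbiased sample covariance (zero until two samples are seen)."""
        return self._m2 / (self.n - 1) if self.n > 1 else np.zeros_like(self._m2)

rng = np.random.default_rng(7)
bias = np.array([5.0, -3.0, 20.0])        # synthetic: fine model hotter at some positions
samples = bias + rng.normal(0.0, [2.0, 4.0, 8.0], size=(5000, 3))
moments = RunningMoments(3)
for e in samples:                         # e plays the role of F(x) - F*(x) at each fine run
    moments.update(e)
print(np.round(moments.mean, 2), np.round(np.sqrt(np.diag(moments.cov)), 2))
```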

[87] We also notice structural errors in some of the temperature profiles, especially for the deeper temperature measurements, e.g., between −1100 and −1700 m in WB and between −1000 and −1500 m in WF (see Figure 8). These measurements, taken at reasonably deep positions, have high temperatures above 320 °C. This suggests that the prespecified temperature of the deep mass input is not high enough. In the future we could adjust this temperature by introducing more variables into the MCMC or by using other "expert" judgments.

[88] The temperature distributions are shown in Figure 11. Figure 11a shows the mean temperature distribution, and Figure 11b shows the standard deviation of the temperature distribution. Figures 11c and 11d present the temperature distributions of two realizations from the Markov chain, one from the middle and another from the end. From these plots, we can observe that there is one hot plume in the model, and that the top and the bottom of this hot plume have larger variations in temperature than the rest of the model.

Figure 8. Comparison of estimated temperatures and measured data for WB and WF. The solid black lines are the estimated mean temperatures, dashed black lines are the 95% credible interval. Measured data are shown as red crosses, and the corresponding estimated mean model outputs are given as circles. The green lines represent fine model outputs for each sweep of update.

Table 2. Summary of the Root Mean Square Error of the Sample Outputs Compared to Measured Data

| RMSE of the Mean Estimate | Minimum RMSE | Maximum RMSE |
|---|---|---|
| 7.22 | 7.11 | 8.66 |

[89] Similarly, Figures 12–15 present the estimated means, standard deviations, and two realizations of the distribution of mass input and of the permeability distributions in the $x$, $y$, and $z$ directions. In each of these figures, (a) is the mean, (b) is the standard deviation, and (c) and (d) show two realizations.

[90] The mean distribution of the mass input shows that most of the hot water is injected at the bottom of the fine region of the model. This agrees with the opinion of geologists that the major source of deep hot water is at the intersection of two geological faults in this area. The standard deviation and the two realizations from this distribution suggest that the variation among the samples is very small.

[91] The reasonably large variations in the reconstructed permeability distributions suggest that there are ambiguities in the permeability distributions, which may be caused by the sparsely measured data. This effect is more significant in the region close to the boundary of the reservoir, where no measured data are available. We can either quantify these ambiguities more accurately by running the chain for a very long time, or remove them by imposing stronger prior assumptions or using different parameterizations of the permeability distributions.

7. Discussion

[92] We have presented an adaptive delayed acceptance Metropolis-Hastings algorithm (ADAMH) for exploring the computationally expensive posterior distribution of a large-scale geothermal reservoir model. This algorithm enhances the computational efficiency of the standard Metropolis-Hastings algorithm in two ways: by employing approximate posterior distributions based on reduced-order models, and by using state-of-the-art techniques in adaptive MCMC sampling. The approximations are formulated using the enhanced error model (EEM) of Kaipio and Somersalo [2007], and constructed from the sampling history. The use of adaptivity eliminates the computational burden of constructing the EEM and provides a more accurate approximation.

[93] This algorithm demonstrates efficiency in sampling the posterior distribution of the large 3-D geothermal reservoir model, a task that was not possible with existing MCMC methods. ADAMH could also offer significant improvement in computational efficiency when implementing sample-based inference in other large-scale inverse problems. By using adaptive approximations and adaptive proposal distributions, ADAMH becomes a fully automated algorithm and can be implemented for other problems without modifying the existing code.

[94] One reviewer queried the convergence of the algorithm, as ADAMH did not require a large number of iterations to reach the support of the posterior distribution in our problem. In contrast, MCMC algorithms in some applications require a large number of iterations to get through the burn-in period; e.g., example 3 of Haario et al. [2006] used 500,000 iterations. The convergence speed of an MCMC algorithm depends on the nature of the posterior distribution, which is problem specific, as well as on the particular algorithm and proposals used. Example 3 of Haario et al. [2006] has a chaotic model and time series data, and the resulting posterior distribution has many local modes with low probability mass (H. Haario, personal communication, 2011). Hence a random walk MCMC requires a large number of iterations to fully explore that posterior.

This difficulty can also be described by observing that "close" in the sense of the parameterization does not correspond to "close" in the sense of fit to data. Choosing a representation in which these two notions of closeness are equivalent is part of the "art" of MCMC, particularly if random walk proposals are envisaged. Alternatively, including moves that transfer the state between modes is effective, though this can be difficult in practice.

Figure 9. (a) Normal quantile plot of the residuals. (b) Cumulative density plot of the residuals; the solid line is the estimated cumulative density function, and the dashed line indicates the Gaussian distribution used in the assumption.

[95] Insights into the operation of sampling algorithms may also be found by considering relatively simple target distributions, such as a Gaussian distribution in high dimensions. This distribution does not suffer from local modes and has the advantage that conditional densities are available.

Consider the simplest case of a Gaussian in 10,000 dimensions with identity covariance. A standard component-wise Gibbs sampler will easily sample the distribution, but will take 10,000 iterations to do so. In contrast, a block-update Gibbs sampler that updates all components will take just one iteration, independent of dimension. In this case the two algorithms are essentially the same, and importantly the same number of random numbers is used in each case to draw a single sample. For this reason we tend to think of sampling algorithms in terms of the number of random numbers required rather than the number of iterations taken. On this view, a more efficient algorithm needs fewer random numbers to traverse the state space.

When the covariance matrix is not diagonal and components are correlated, the Gibbs sampler becomes less efficient. A block update scheme can still take one iteration, as long as the conditional structure is correctly accounted for. For this reason, the ADAMH algorithm used in this paper is built on a block-update proposal. However, in high-dimensional cases, correctly using the conditional structure can be a demanding task. An interesting recent development is the Krylov-space sampling algorithms that diagonalize the covariance within the algorithm. They draw samples from a Gaussian distribution with computational work proportional to the number of distinct eigenvalues of the covariance matrix, independent of dimension [see, e.g., Simpson et al., 2008; Fox, 2008], though we do not use those methods here.

The block updates used in this work were the result of many exploratory computational experiments. These experiments aimed to understand which blocks of variables are highly correlated and which are close to independent, so as to achieve a useful partitioning. More generally, if the state can be partitioned into $n$ blocks so that the covariance between blocks is zero, or small, then the sampling problem can be partitioned into $n$ smaller tasks, independent of dimension. Again, this is an approach used in the block update proposals we use here.

Figure 10. Boxplot of the differences between fine model outputs and coarse model outputs that are produced by realizations of the Markov chain.
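The effect of correlation on component-wise Gibbs sampling can be seen on a bivariate Gaussian with unit variances and correlation $\rho$. Each coordinate of the Gibbs chain is then an AR(1) process with lag-one autocorrelation $\rho^2$, so mixing degrades as $|\rho| \to 1$. An exact block draw through the Cholesky factor of the covariance is independent at every step and uses the same two random numbers per sample. This is a toy illustration; $\rho = 0.95$ is an arbitrary choice.

```python
import numpy as np

def gibbs(rho, n, rng):
    """Component-wise Gibbs sampler for N(0, [[1, rho], [rho, 1]])."""
    s = np.sqrt(1.0 - rho**2)
    x = y = 0.0
    out = np.empty((n, 2))
    for t in range(n):
        x = rho * y + s * rng.standard_normal()   # draw x | y
        y = rho * x + s * rng.standard_normal()   # draw y | x
        out[t] = x, y
    return out

def block(rho, n, rng):
    """Exact block update: z ~ N(0, I), sample = L z with L the Cholesky factor."""
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    return rng.standard_normal((n, 2)) @ L.T

def lag1(a):
    """Lag-one sample autocorrelation."""
    a = a - a.mean()
    return (a[:-1] @ a[1:]) / (a @ a)

rng = np.random.default_rng(0)
rho, n = 0.95, 100_000
r_gibbs = lag1(gibbs(rho, n, rng)[:, 0])   # close to rho**2 = 0.9025
r_block = lag1(block(rho, n, rng)[:, 0])   # close to 0
print(round(r_gibbs, 3), round(r_block, 3))
```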

[96] In the more realistic case of non-Gaussian target distributions where conditional distributions are not available, one of the few options is the random walk MH algorithm that we employ here. A severe problem then occurs with block updates that act on a block of many (nearly) independent components, since the acceptance rate decreases exponentially with the size of the block (see example 30.3 of MacKay [2003] and its solution). This property is equivalent to the geometric observation that the volume of a sphere in high dimensions is overwhelmingly concentrated at the shell.

The nonintuitive nature of geometry in high dimensions is one of the major difficulties in designing effective algorithms for high-dimensional problems, whether stochastic or deterministic. It is commonly the reason that algorithms that look promising in low-dimensional settings perform poorly in high-dimensional settings. There are many ways to get lost in 10,000 dimensions. For this reason we use block updates over highly correlated variables that are effectively constrained to a relatively low-dimensional subspace. This is a case where the ill-conditioning of the inverse problem can actually be used to improve algorithmic efficiency. Adapting to the correlation structure, as in ADAMH, significantly reduces the human effort required to build an efficient algorithm in this setting.

Figure 11. Distributions of model temperatures (°C). (a) The mean of the realizations, (b) the standard deviation of the realizations, (c) one realization from the Markov chain, and (d) another realization from the Markov chain.
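The collapse of the acceptance rate with block size is easy to reproduce. Consider a random-walk proposal with a fixed step size $s$ on a $d$-dimensional standard Gaussian. A standard large-$d$ approximation gives a stationary acceptance rate of roughly $2\Phi(-s\sqrt{d}/2)$, where $\Phi$ is the standard normal CDF. The Monte Carlo sketch below estimates the rate directly by drawing current states from the target. The step size 0.5 and the dimensions tried are arbitrary illustrative choices.

```python
import numpy as np

def rwm_acceptance(d, step, n, rng):
    """Average Metropolis acceptance probability at stationarity for N(0, I_d)."""
    x = rng.standard_normal((n, d))                 # current states drawn from the target
    y = x + step * rng.standard_normal((n, d))      # random-walk proposals
    log_ratio = 0.5 * (np.sum(x**2, axis=1) - np.sum(y**2, axis=1))
    return np.mean(np.minimum(1.0, np.exp(log_ratio)))

rng = np.random.default_rng(5)
dims = (1, 10, 100)
rates = [rwm_acceptance(d, step=0.5, n=20_000, rng=rng) for d in dims]
print({d: round(r, 3) for d, r in zip(dims, rates)})
```

Keeping the acceptance rate usable as $d$ grows requires shrinking the step roughly like $1/\sqrt{d}$. This is one motivation for updating correlated blocks that are effectively low-dimensional.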

[97] One reason this sampling strategy works efficiently in our study is that the effective number of dimensions of the spatially distributed parameters is much lower than the dimension of the parameter space (see the discussion in the work of Flath et al. [2011]). Further, the amount of data in our problem is reasonably small, and the model has more parameters than data. An underdetermined system is usually easier to sample from than an overdetermined one. These features mean that we need only a reasonably small number of steps to reach the support of the posterior distribution.

[98] Both the iteration count and the number of random numbers required, discussed in the previous paragraphs, are proxies for the time required to compute. Further, the output of an MCMC algorithm is a sequence of correlated samples. The statistical efficiency of the MCMC therefore needs to be measured in terms of the integrated autocorrelation time (IACT), as in section 6.1, since this measures the power of the MCMC to reduce variance in estimates evaluated over the chain. For this reason we use the CPU time per IACT as a quantitative measure of algorithmic performance, and seek to minimize this quantity when tuning an algorithm for optimal performance.
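As a back-of-envelope use of this metric, the figures reported in section 6.1 can be combined: about 11,200 iterations in about 40 days, 69 proposal steps per sweep, and an IACT of 2.80 sweeps in Wolff's convention. The sketch treats the 11,200 iterations as individual proposal steps, which gives about 162 sweeps. That reading is our assumption, not a figure stated in section 6.1. Elapsed time is used as a proxy for CPU time.

```python
def time_per_independent_sample(total_time, n_sweeps, tau_int):
    """Computing time per effectively independent sample.

    tau_int follows Wolff's convention (tau = 1/2 + sum of autocorrelations),
    so 2 * tau_int sweeps are needed per independent sample.
    """
    time_per_sweep = total_time / n_sweeps
    return 2.0 * tau_int * time_per_sweep

hours_total = 40.0 * 24.0      # about 40 days of computing
sweeps = 11_200 / 69           # assumption: 11,200 iterations are single proposal steps
hours = time_per_independent_sample(hours_total, sweeps, tau_int=2.80)
print(round(hours, 1), "hours per independent sample (rough)")
```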

[99] A reviewer also queried the lack of computed diagnostic statistics, such as the multichain R statistic [Gelman and Rubin, 1992] and the single-chain statistics [Geweke, 1992; Raftery and Lewis, 1992]. These diagnostic statistics provide consistency checks on the outputs of the sampler, but do not show formal convergence of MCMC algorithms. In applications where it is possible to draw a large number of samples, these statistics may be estimated accurately and are then useful for debugging the code and checking the consistency of the MCMC simulation. In the present application, however, the computational cost of the forward model constrains the number of samples that can be drawn. We have to be realistic about what information can be extracted from the posterior with limited computing power.

For the present study we used several chains with random starting points (not shown). We checked that each chain reached the support of the posterior by checking that the data were fit reasonably well. This simple check indicates that the chains are mixing and not getting stuck in local modes. From the single longer chain shown here, we found that the samples are consistent, as the model outputs of these samples fit the data (see Figure 7). The lack of trend in the trace of the log likelihood in Figure 6 then shows that the chain is consistent with having the desired ergodic properties.

Figure 12. Distributions of mass input at the bottom of the model (kg s⁻¹). (a) The mean of the realizations, (b) the standard deviation of the realizations, (c) one realization from the Markov chain, and (d) another realization from the Markov chain.

Figure 13. The permeability distribution in the $x$ direction, on a base 10 logarithmic scale. (a) The mean of the realizations, (b) the standard deviation of the realizations, (c) one realization from the Markov chain, and (d) another realization from the Markov chain.

Figure 14. The permeability distribution in the $y$ direction, on a base 10 logarithmic scale. (a) The mean of the realizations, (b) the standard deviation of the realizations, (c) one realization from the Markov chain, and (d) another realization from the Markov chain.

[100] A reviewer also asked whether our prior distribution equals the posterior distribution. The Gaussian Markov random field (GMRF) prior we used in section 5.3 is quite noninformative, and is significantly different from the posterior. A marked difference is that the GMRF prior (45) is improper, since it only enforces smoothness through the differences of log permeabilities in adjacent elements of the model. In particular, the prior density is invariant when every log permeability is shifted by the same constant $\hat k$, i.e., when $\log_{10} K$ is perturbed to $\log_{10} K + \hat k$. Yet these two permeability fields will produce significantly different model outputs for a large perturbation $\hat k$, and hence significantly different posterior density values. Further, samples from the prior distribution will be smooth throughout the volume. They will not show the structured hot water injection or temperature plume that is evident in samples from the posterior distribution.

[101] Future research is necessary to enhance the sampling efficiency of ADAMH. This may include the following:

(1) Constructing computationally faster and more accurate reduced-order models for multiphase systems. For example, the local approximation of the Hessian matrix provided by the adjoint state approach [Sun and Yeh, 1990; Galarza et al., 1999; Chu et al., 1995; Awotunde and Horne, 2011] is a potential candidate, but further investigation is necessary to implement this approach for multiphase nonisothermal flows.

(2) Investigating strategies for designing proposal distributions that traverse the parameter space more efficiently.

Appendix A: Simplifications of the Acceptance Probabilities Used by ADAMH

[102] The numerator of the right-hand side of equation (17) can be expressed as

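$$ \begin{aligned} \pi(x' \mid d)\, g'_n(\xi')\, \alpha(x', x)\, J &= \pi(x' \mid d)\, g'_n(\xi')\, J \left[1 \wedge \frac{\pi^*_{x',n}(x \mid d)}{\pi^*_{x',n}(x' \mid d)}\, \frac{g_n(\xi)}{g'_n(\xi')}\, \frac{1}{J}\right] \\ &= \left[\pi(x' \mid d)\, g'_n(\xi')\, J \wedge \pi^*_{x',n}(x \mid d)\, g_n(\xi)\, \frac{\pi(x' \mid d)}{\pi^*_{x',n}(x' \mid d)}\right]. \end{aligned} \tag{A1} $$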

Similarly, the denominator of the right-hand side of equation (17) has the form

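$$ \begin{aligned} \pi(x \mid d)\, g_n(\xi)\, \alpha(x, x') &= \pi(x \mid d)\, g_n(\xi) \left[1 \wedge \frac{\pi^*_{x,n}(x' \mid d)}{\pi^*_{x,n}(x \mid d)}\, \frac{g'_n(\xi')}{g_n(\xi)}\, J\right] \\ &= \left[\pi(x \mid d)\, g_n(\xi) \wedge \pi^*_{x,n}(x' \mid d)\, g'_n(\xi')\, J\, \frac{\pi(x \mid d)}{\pi^*_{x,n}(x \mid d)}\right]. \end{aligned} \tag{A2} $$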

Figure 15. The permeability distribution in the $z$ direction, on a base 10 logarithmic scale. (a) The mean of the realizations, (b) the standard deviation of the realizations, (c) one realization from the Markov chain, and (d) another realization from the Markov chain.

[103] For a state-dependent approximation, we assume that the exact posterior and the approximate posterior have the same density at the current state $x$, i.e., $\pi^*_{x,n}(x \mid d) = \pi(x \mid d)$, which is often the case in practice. Hence the first-step acceptance probability (12) becomes

$$ \alpha(x, x') = 1 \wedge \frac{\pi^*_{x,n}(x' \mid d)}{\pi(x \mid d)}\, \frac{q_n(x', x)}{q_n(x, x')}. \tag{A3} $$

To further simplify the second-step acceptance probability (17), we can reduce equations (A1) and (A2) using $\pi^*_{x,n}(x \mid d) = \pi(x \mid d)$, and substitute these back into equation (17) to give

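$$ \beta(x, x') = 1 \wedge \frac{\pi(x' \mid d)\, g'_n(\xi')\, J \wedge \pi^*_{x',n}(x \mid d)\, g_n(\xi)}{\pi^*_{x,n}(x' \mid d)\, g'_n(\xi')\, J \wedge \pi(x \mid d)\, g_n(\xi)}. \tag{A4} $$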

Note that this shows that evaluation of the reverse approximation $\pi^*_{x',n}(x \mid d)$ is required. For the approximate posterior based on linearization used by Christen and Fox [2005], this reverse evaluation incurs additional computation. When the coarse model with local correction is used, it incurs no additional cost, because the coarse model output has already been evaluated for state $x$ in the chain. The cost of evaluating the local correction term is negligible compared to the cost of evaluating the coarse and fine models.

[104] For a state-independent approximation, we usually do not have the property $\pi^*_{x,n}(x \mid d) = \pi(x \mid d)$; otherwise the approximation would equal the exact target distribution. However, equation (17) can still be reduced as follows. Because the approximation is no longer state dependent, $\pi^*_{x,n}(\cdot \mid d)$ can be written as $\pi^*_n(\cdot \mid d)$. Then equation (A1) can be written in the form

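$$ \left[\pi(x' \mid d)\, g'_n(\xi')\, J \wedge \pi^*_n(x \mid d)\, g_n(\xi)\, \frac{\pi(x' \mid d)}{\pi^*_n(x' \mid d)}\right] = \left[\pi^*_n(x' \mid d)\, g'_n(\xi')\, J \wedge \pi^*_n(x \mid d)\, g_n(\xi)\right] \frac{\pi(x' \mid d)}{\pi^*_n(x' \mid d)}. \tag{A5} $$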

Similarly, equation (A2) has the form

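$$ \left[\pi(x \mid d)\, g_n(\xi) \wedge \pi^*_n(x' \mid d)\, g'_n(\xi')\, J\, \frac{\pi(x \mid d)}{\pi^*_n(x \mid d)}\right] = \left[\pi^*_n(x \mid d)\, g_n(\xi) \wedge \pi^*_n(x' \mid d)\, g'_n(\xi')\, J\right] \frac{\pi(x \mid d)}{\pi^*_n(x \mid d)}. \tag{A6} $$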

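Then, substituting equations (A5) and (A6) back into equation (17), the identical bracketed terms cancel, giving

$$ \beta(x, x') = 1 \wedge \frac{\pi(x' \mid d)\, \pi^*_n(x \mid d)}{\pi(x \mid d)\, \pi^*_n(x' \mid d)}. \tag{A7} $$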

This formula is identical to the surrogate transition method [Liu, 2001] and the reversible jump two-step method [Mondal et al., 2010]. Note that in this formulation the approximate density $\pi^*_n(x \mid d)$ is evaluated for each state $x$ in the chain, and hence the reverse evaluation of $\alpha(x', x)$ in equation (17) does not incur any additional cost.
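The reduction to (A7) can be checked exactly on a small finite state space. The check uses a state-independent approximation $\pi^*_n$, a first stage $1 \wedge \pi^*_n(x') q(x', x) / [\pi^*_n(x) q(x, x')]$, and the second stage (A7). The resulting transition matrix should then satisfy detailed balance with respect to the exact target. The target, approximation, and (non-symmetric) proposal matrix below are arbitrary random toy choices.

```python
import numpy as np

def surrogate_transition_matrix(pi, pi_approx, Q):
    """Transition matrix of two-stage MH with a state-independent approximation.

    Stage 1: a = min(1, pi_approx[y] Q[y, x] / (pi_approx[x] Q[x, y])).
    Stage 2, eq. (A7): b = min(1, pi[y] pi_approx[x] / (pi[x] pi_approx[y])).
    """
    n = len(pi)
    T = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x == y or Q[x, y] == 0.0:
                continue
            a = min(1.0, pi_approx[y] * Q[y, x] / (pi_approx[x] * Q[x, y]))
            b = min(1.0, pi[y] * pi_approx[x] / (pi[x] * pi_approx[y]))
            T[x, y] = Q[x, y] * a * b
        T[x, x] = 1.0 - T[x].sum()   # rejected moves and self-proposals stay at x
    return T

rng = np.random.default_rng(11)
n = 6
pi = rng.uniform(0.1, 1.0, n)
pi /= pi.sum()                                   # exact target
pi_approx = rng.uniform(0.1, 1.0, n)
pi_approx /= pi_approx.sum()                     # crude approximation
Q = rng.uniform(0.0, 1.0, (n, n))
Q /= Q.sum(axis=1, keepdims=True)                # row-stochastic, non-symmetric proposal
T = surrogate_transition_matrix(pi, pi_approx, Q)
flux = pi[:, None] * T
balanced = np.allclose(flux, flux.T)             # detailed balance: pi_x T_xy = pi_y T_yx
stationary = np.allclose(pi @ T, pi)             # hence pi is stationary
print(balanced, stationary)
```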

Appendix B: Derivation of the Reversible Jump Rule in Section 4

[105] In section 4, suppose that we have the current state $w$, a given component $i$, where $i \in \{1, \ldots, M\}$, and a random variable $\xi \sim \text{Uniform}(1/\eta, \eta)$, where $\eta > 1$. The map $w' = \psi(w, \xi)$ is given by

$$ w' = \frac{w + w_i(\xi - 1)\, e_i}{C + (\xi - 1)\, w_i}\, C, \tag{B1} $$

where $e_i$ is the $i$th standard basis vector in the $M$-dimensional Cartesian coordinate system, and $C = \sum_{j=1}^{M} w_j$ is the sum of the components of $w$. In equation (B1), we first perturb the $i$th component of $w$, $w_i$, by a multiplicative factor $\nu$. The denominator $C + (\nu - 1)w_i$ is then the sum of the perturbed vector, so scaling the perturbed vector by $C/[C + (\nu - 1)w_i]$ makes the components of the resulting $w'$ sum to $C$. Because $w_i$ is only ever scaled by a factor $\nu > 0$ and the rescaling factor is positive, the nonnegativity constraint in section 5.3 holds. Similarly, we have the inverse map $\psi(w', \nu')$:

$$w = \frac{w' + w'_i(\nu' - 1)e_i}{C + (\nu' - 1)w'_i}\, C. \qquad \text{(B2)}$$
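Here is a minimal sketch of the proposal (B1) and the inverse map (B2). It assumes the state is a Python list of nonnegative weights. The multiplicative factor is drawn from a uniform distribution whose upper bound is set to a hypothetical `lam = 2.0`. The sketch anticipates the result derived below: the inverse uses the reciprocal of the factor. `perturb` and `propose` are illustrative names, not taken from the paper.

```python
import random

def perturb(w, i, nu):
    """Map (B1): multiply component i of w by nu, then rescale the whole
    vector by C / (C + (nu - 1) w_i) so that its sum C is unchanged."""
    c = sum(w)
    denom = c + (nu - 1.0) * w[i]
    return [(nu * wj if j == i else wj) * c / denom for j, wj in enumerate(w)]

def propose(w, lam, rng):
    """Draw a component i and a factor nu ~ Uniform(1/lam, lam), lam > 1
    (assumed interface), then apply (B1). Returns the new state and the draws."""
    i = rng.randrange(len(w))
    nu = rng.uniform(1.0 / lam, lam)
    return perturb(w, i, nu), i, nu

rng = random.Random(0)
w = [0.2, 0.5, 1.3, 0.0, 2.0]
w_new, i, nu = propose(w, 2.0, rng)
w_back = perturb(w_new, i, 1.0 / nu)   # inverse map (B2) with nu' = 1/nu
```

Because `perturb` recomputes C from its input and the map preserves the sum, applying it with the reciprocal factor returns the original weights. Reusing the same factor does not.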

[106] On component $i$, the above relations can be expressed as

$$w'_i = \frac{w_i \nu C}{C + (\nu - 1)w_i}, \qquad \text{(B3)}$$

$$w_i = \frac{w'_i \nu' C}{C + (\nu' - 1)w'_i}. \qquad \text{(B4)}$$

By substituting equation (B3) into equation (B4) and collecting terms, we have

$$\nu\nu' C = C + w_i(\nu - 1) + w_i\nu(\nu' - 1). \qquad \text{(B5)}$$

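The right-hand side of equation (B5) simplifies to $C + w_i(\nu\nu' - 1)$, so equation (B5) is equivalent to $(\nu\nu' - 1)(C - w_i) = 0$. For $w_i \neq C$, it therefore holds only if $\nu\nu' = 1$.

[107] We now verify this relationship using equations (B1) and (B2). By substituting equation (B1) into equation (B2) and collecting terms, we have

$$w = \frac{wC + w_i(\nu - 1)Ce_i + w_i\nu(\nu' - 1)Ce_i}{C + w_i(\nu - 1) + w_i\nu(\nu' - 1)}. \qquad \text{(B6)}$$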

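Equation (B6) also requires that $\nu\nu' = 1$.

[108] Since $\nu \sim \mathrm{Uniform}(1/\lambda, \lambda)$, the map $\psi(w, \nu)$ is inverted by $\nu' = 1/\nu$, which also lies in the range $[1/\lambda, \lambda]$. This is necessary to make the MCMC reversible. Let $-i = \{1, \ldots, M\} \setminus \{i\}$. With the variables ordered as $(w_{-i}, w_i, \nu)$, the Jacobian matrix is block upper triangular. The absolute value of its determinant is therefore the product of the absolute values of its diagonal entries:

$$\left|\frac{\partial(w', \nu')}{\partial(w, \nu)}\right| = \left|\frac{\partial(w'_{-i}, w'_i, \nu')}{\partial(w_{-i}, w_i, \nu)}\right| = \left|\det\begin{pmatrix} \dfrac{C}{C + (\nu - 1)w_i} I_{M-1} & \dfrac{\partial w'_{-i}}{\partial w_i} & \dfrac{\partial w'_{-i}}{\partial \nu} \\ 0 & \dfrac{\nu C^2}{[C + (\nu - 1)w_i]^2} & \dfrac{\partial w'_i}{\partial \nu} \\ 0 & 0 & -\dfrac{1}{\nu^2} \end{pmatrix}\right| = \left(\frac{C}{C + (\nu - 1)w_i}\right)^{M+1} \frac{1}{\nu}, \qquad \text{(B7)}$$

where $I_{M-1}$ is the $(M-1) \times (M-1)$ identity matrix.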
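The closed form (B7) can be checked numerically. Like the derivation, the sketch below holds the sum C fixed while differentiating. It builds the full $(M+1) \times (M+1)$ Jacobian of $(w, \nu) \mapsto (w', \nu')$ by central finite differences and compares the absolute value of its determinant with (B7). The helper names, the test weights and the step size `h` are illustrative choices, not from the paper.

```python
# Numerical check of the Jacobian determinant (B7). As in the derivation,
# the sum C is treated as a fixed constant while differentiating.

def forward(z, i, c):
    """(w_1..w_M, nu) -> (w'_1..w'_M, nu') for component i and fixed sum c."""
    *w, nu = z
    denom = c + (nu - 1.0) * w[i]
    w_new = [(nu * wj if j == i else wj) * c / denom for j, wj in enumerate(w)]
    return w_new + [1.0 / nu]   # nu' = 1/nu

def numerical_jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; entry [r][k] approximates d f_r / d z_k."""
    n = len(z)
    cols = []
    for k in range(n):
        zp, zm = list(z), list(z)
        zp[k] += h
        zm[k] -= h
        fp, fm = f(zp), f(zm)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(fp, fm)])
    return [[cols[k][r] for k in range(n)] for r in range(n)]

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    result = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            result = -result
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for col in range(k, n):
                a[r][col] -= factor * a[k][col]
    return result

def jacobian_closed_form(w, i, nu, c):
    """Right-hand side of (B7)."""
    return (c / (c + (nu - 1.0) * w[i])) ** (len(w) + 1) / nu

w = [0.4, 1.1, 0.7, 1.8]
c = sum(w)
i, nu = 1, 1.6
numeric = abs(det(numerical_jacobian(lambda z: forward(z, i, c), w + [nu])))
closed = jacobian_closed_form(w, i, nu, c)
```

The full determinant is computed without using the triangular structure, so agreement with (B7) checks both the structure and the diagonal entries.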



[109] Acknowledgments. The authors would like to thank Heikki Haario for valuable insights into his work and for helpful discussions, as well as Jasper Vrugt and three anonymous reviewers for their helpful suggestions. Financial support for one of the authors (T. Cui) from the New Zealand Institute of Mathematics and its Applications (NZIMA) is gratefully acknowledged.

References

Amit, Y., and U. Grenander (1991), Comparing sweep strategies for stochastic relaxation, J. Multivariate Anal., 37, 197–222.

Andrieu, C., and E. Moulines (2006), On the ergodicity properties of some adaptive Markov chain Monte Carlo algorithms, Ann. Appl. Probab., 16(1), 462–505.

Arridge, S. R., J. P. Kaipio, V. Kolehmainen, M. Schweiger, E. Somersalo, T. Tarvainen, and M. Vauhkonen (2006), Approximation errors and model reduction with an application in optical diffusion tomography, Inverse Probl., 22, 175–195.

Atchade, Y. F., and J. S. Rosenthal (2005), On adaptive Markov chain Monte Carlo algorithms, Bernoulli, 11, 815–828.

Awotunde, A. A., and R. N. Horne (2011), A wavelet approach to adjoint state sensitivity computation for steady state differential equations, Water Resour. Res., 47, W03502, doi:10.1029/2010WR009165.

Bates, B. C., and E. P. Campbell (2001), A Markov chain Monte Carlo scheme for parameter estimation and inference in conceptual rainfall-runoff modeling, Water Resour. Res., 37(4), 937–947, doi:10.1029/2000WR900363.

Begg, S. H., R. R. Carter, and P. Dranfield (1989), Assigning effective values to simulator gridblock parameters for heterogeneous reservoirs, SPE Reservoir Eng., 4(4), 455–463.

Besag, J. E. (1974), Spatial interaction and the statistical analysis of lattice systems, J. R. Stat. Soc. Ser. B, 36(2), 192–236.

Besag, J. E. (1986), On the statistical analysis of dirty pictures, J. R. Stat. Soc. Ser. B, 48, 259–302.

Beven, K. J., and A. M. Binley (1992), The future of distributed hydrological models: Model calibration and uncertainty prediction, Hydrol. Processes, 6(3), 279–298, doi:10.1002/hyp.3360060305.

Beven, K. J., and J. Freer (2001), Equifinality, data assimilation, and uncertainty estimation in mechanistic modeling of complex environmental systems using the GLUE methodology, J. Hydrol., 249, 11–29.

Caers, J., and T. Hoffman (2006), The probability perturbation method: A new look at Bayesian inverse modeling, Math. Geol., 38(1), 81–100, doi:10.1007/s11004-005-9005-9.

Carrera, J., and S. P. Neuman (1986), Estimation of aquifer parameters under transient and steady state conditions: 1. Maximum likelihood method incorporating prior information, Water Resour. Res., 22(2), 199–210, doi:10.1029/WR022i002p00199.

Carrera, J., A. Alcolea, A. Medina, J. Hidalgo, and L. J. Slooten (2005), Inverse problem in hydrogeology, Hydrogeol. J., 13(1), 206–222, doi:10.1007/s10040-004-0404-7.

Chib, S., and E. Greenberg (1995), Understanding the Metropolis-Hastings algorithm, Am. Stat., 49(4), 327–335.

Christen, J. A., and C. Fox (2005), MCMC using an approximation, J. Comput. Graph. Stat., 14(4), 795–810.

Christensen, O. F., G. O. Roberts, and M. Skold (2006), Robust Markov chain Monte Carlo methods for spatial generalized linear mixed models, J. Comput. Graph. Stat., 15(1), 1–17.

Christie, M. A., and M. J. Blunt (2001), Tenth SPE comparative solution project: A comparison of upscaling techniques, SPE Reservoir Eng. Eval., 4, 308–317.

Chu, L., A. C. Reynolds, and D. S. Oliver (1995), Computation of sensitivity coefficients for conditioning the permeability field to well-test pressure data, In Situ, 19(2), 179–223.

Cooley, R. L. (2000), An analysis of the pilot point methodology for automated calibration of an ensemble of conditionally simulated transmissivity fields, Water Resour. Res., 36(4), 1159–1163, doi:10.1029/2000WR900008.

Cressie, N. A. C. (1993), Statistics for Spatial Data, Wiley, New York.

Cui, T., C. Fox, and M. J. O’Sullivan (2011), Adaptive error modelling in MCMC sampling for large scale inverse problem, Tech. Rep. 686, Fac. of Eng., Univ. of Auckland, Auckland, New Zealand.

Deutsch, C. V., and A. G. Journel (1998), GSLIB, Geostatistical Software Library User’s Guide, Oxford Univ. Press, New York.

Doherty, J. (2005), PEST: Software for Model-Independent Parameter Estimation, Watermark Numer. Comput., Brisbane, Australia.

Duan, Q., S. Sorooshian, and V. Gupta (1992), Effective and efficient global optimization for conceptual rainfall-runoff models, Water Resour. Res., 28(4), 1015–1031, doi:10.1029/91WR02985.

Duan, Q., N. K. Ajami, X. Gao, and S. Sorooshian (2007), Multi-model ensemble hydrologic prediction using Bayesian model averaging, Adv. Water Resour., 30(5), 1371–1386, doi:10.1016/j.advwatres.2006.11.014.

Durlofsky, L. J. (1998), Coarse scale models of two phase flow in heterogeneous reservoirs: Volume averaged equations and their relationship to existing upscaling techniques, Comput. Geosci., 2, 73–92.

Efendiev, Y., A. Datta-Gupta, V. Ginting, X. Ma, and B. Mallick (2005), An efficient two-stage Markov chain Monte Carlo method for dynamic data integration, Water Resour. Res., 41, W12423, doi:10.1029/2004WR003764.

Finsterle, S. (1993), ITOUGH2 User’s Guide, Lawrence Berkeley Natl. Lab., Berkeley, Calif.

Flath, H. P., V. Akcelik, L. C. Wilcox, J. Hill, B. van Bloemen Waanders, and O. Ghattas (2011), Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput., 33, 407–432, doi:10.1137/090780717.

Fox, C. (2008), A conjugate sampler for normal distributions, with a few computed examples, Tech. Rep. 2008-1, pp. 1172–4234, Electron. Group, Univ. of Otago, Dunedin, New Zealand.

Fox, C., S. Tan, and G. K. Nicholls (2009), Inverse Problems, Lecture Notes of ELEC 404, Univ. of Otago, Dunedin, New Zealand.

Fu, J. L., and J. J. Gomez-Hernandez (2009), Uncertainty assessment and data worth in groundwater flow and mass transport modeling using a blocking Markov chain Monte Carlo method, J. Hydrol., 364, 328–341, doi:10.1016/j.jhydrol.2008.11.014.

Galarza, G. A., J. Carrera, and A. Medina (1999), Computational techniques for optimization of problems involving non-linear transient simulations, Int. J. Numer. Methods Eng., 45(3), 319–334.

Gelman, A., and D. B. Rubin (1992), Inference from iterative simulation using multiple sequences (with discussion), Stat. Sci., 7, 457–511.

Gelman, A., G. O. Roberts, and W. R. Gilks (1996), Efficient Metropolis jumping rules, in Bayesian Statistics, vol. 5, edited by J. O. Berger, J. M. Bernardo, A. P. Dawid, D. V. Lindley, and A. F. M. Smith, pp. 599–608, Oxford Univ. Press, Oxford, U.K.

Geweke, J. (1992), Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, in Bayesian Statistics 4, edited by J. M. Bernardo et al., pp. 169–193, Oxford Univ. Press, New York.

Gilks, W. R., G. O. Roberts, and E. I. George (1994), Adaptive direction sampling, J. R. Stat. Soc. Ser. D, 43(1), 179–189.

Gomez-Hernandez, J. J., A. Sahuquillo, and J. E. Capilla (1997), Stochastic simulation of transmissivity fields conditional to both transmissivity and piezometric data – I. Theory, J. Hydrol., 203, 162–174, doi:10.1016/S0022-1694(97)00098-X.

Goodman, J., and A. Sokal (1989), Multigrid Monte Carlo method: Conceptual foundations, Phys. Rev. D, 40(6), 2035–2071.

Grant, M. A., I. G. Donaldson, and P. F. Bixley (1982), Geothermal Reservoir Engineering, Acad. Press, New York.

Green, P. J. (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, 82, 711–732.

Green, P. J., and A. Mira (2001), Delayed rejection in reversible jump Metropolis-Hastings, Biometrika, 88, 1035–1053.

Haario, H., E. Saksman, and J. Tamminen (2001), An adaptive Metropolis algorithm, Bernoulli, 7, 223–242.

Haario, H., M. Laine, A. Mira, and E. Saksman (2006), DRAM: Efficient adaptive MCMC, Stat. Comput., 16, 339–354.

Hadamard, J. (1902), Sur les problemes aux derivees partielles et leur signification physique (On the problems of partial derivatives and their physical significance), Princeton Univ. Bull., 13, 49–52.

Haslett, J., M. Whiley, S. Bhattacharya, M. Salter-Townshend, S. P. Wilson, J. R. M. Allen, B. Huntley, and F. J. G. Mitchell (2006), Bayesian palaeoclimate reconstruction (with discussion), J. R. Stat. Soc. Ser. A, 169(3), 395–438.

Hastings, W. (1970), Monte Carlo sampling using Markov chains and their applications, Biometrika, 57, 97–109.

Hendricks-Franssen, H. J., A. Alcolea, M. Riva, M. Bakr, N. van der Wiel, F. Stauffer, and A. Guadagnini (2009), A comparison of seven methods for the inverse modelling of groundwater flow. Application to the characterisation of well catchments, Adv. Water Resour., 32(6), 851–872, doi:10.1016/j.advwatres.2009.02.011.

Hernandez, A. F., S. P. Neuman, A. Guadagnini, and J. Carrera (2006), Inverse stochastic moment analysis of steady state flow in randomly heterogeneous media, Water Resour. Res., 42, W05425, doi:10.1029/2005WR004449.



Higdon, D., H. Lee, and C. Holloman (2003), Markov chain Monte Carlo-based approaches for inference in computationally intensive inverse problems, in Bayesian Statistics 7, edited by J. M. Bernardo et al., pp. 181–197, Oxford Univ. Press, New York.

Hurn, M. A., O. Husby, and H. Rue (2003), Advances in Bayesian image analysis, in Highly Structured Stochastic Systems, pp. 302–322, Oxford Univ. Press, New York.

Huttunen, J. M. J., and J. P. Kaipio (2007), Approximation error analysis in nonlinear state estimation with an application to state-space identification, Inverse Probl., 23(5), 2141, doi:10.1088/0266-5611/23/5/019.

International Formulation Committee (1967), A Formulation of the Thermodynamic Properties of Ordinary Water Substance, IFC Secretariat, Düsseldorf, Germany.

Jaynes, E. T. (1968), Prior probability, IEEE Trans. Syst. Sci. Cybernetics, 4(3), 227–241.

Kaipio, J., and E. Somersalo (2004), Statistical and Computational Inverse Problems, Springer, New York.

Kaipio, J. P., and E. Somersalo (2007), Statistical inverse problems: Discretization, model reduction and inverse crimes, J. Comput. Appl. Math., 198(2), 493–504.

Karpouzos, D. K., F. Delay, K. L. Katsifarakis, and G. De Marsily (2001), A multipopulation genetic algorithm to solve the inverse problem in hydrogeology, Water Resour. Res., 37(9), 2291–2302, doi:10.1029/2000WR900411.

Kavetski, D., G. Kuczera, and S. W. Franks (2006), Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory, Water Resour. Res., 42, W03407, doi:10.1029/2005WR004368.

Keating, E., J. Doherty, J. A. Vrugt, and Q. Kang (2010), Optimization and uncertainty assessment of strongly non-linear groundwater models with high parameter dimensionality, Water Resour. Res., 46, W10517, doi:10.1029/2009WR008584.

Kolehmainen, V., T. Tarvainen, S. R. Arridge, and J. P. Kaipio (2011), Marginalization of uninteresting distributed parameters in inverse problems, Int. J. Uncertainty Quant., 1, 1–17.

Kuczera, G., and E. Parent (1998), Monte Carlo assessment of parameter uncertainty in conceptual catchment models: The Metropolis algorithm, J. Hydrol., 211, 69–85, doi:10.1016/S0022-1694(98)00198-X.

Kuczera, G., D. Kavetski, S. Franks, and M. Thyer (2006), Towards a Bayesian total error analysis of conceptual rainfall-runoff models: Characterising model error using storm-dependent parameters, J. Hydrol., 331, 161–177, doi:10.1016/j.jhydrol.2006.05.010.

Lehikoinen, A., M. J. Huttunen, S. Finsterle, M. B. Kowalsky, and J. P. Kaipio (2010), Dynamic inversion for hydrological process monitoring with electrical resistance tomography under model uncertainties, Water Resour. Res., 46, W04513, doi:10.1029/2009WR008470.

Lieberman, C., K. Willcox, and O. Ghattas (2010), Parameter and state model reduction for large-scale statistical inverse problems, SIAM J. Sci. Comput., 32(5), 2523–2542.

Lipponen, A., A. Seppanen, and J. P. Kaipio (2010), Reduced-order estimation of nonstationary flows with electrical impedance tomography, Inverse Probl., 26(7), 074010, doi:10.1088/0266-5611/26/7/074010.

Liu, J. S. (2001), Monte Carlo Strategies in Scientific Computing, Springer, New York.

Liu, J. S., F. M. Liang, and W. H. Wong (2000), The use of multiple-try method and local optimization in Metropolis sampling, J. Am. Stat. Assoc., 95(449), 121–134.

Liu, X., M. A. Cardiff, and P. K. Kitanidis (2010), Parameter estimation in nonlinear environmental problems, Stochastic Environ. Res. Risk Assess., 24(7), 1003–1022, doi:10.1007/s00477-010-0395-y.

Liu, Y., and H. V. Gupta (2007), Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework, Water Resour. Res., 43, W07401, doi:10.1029/2006WR005756.

MacKay, D. J. C. (2003), Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press, New York.

Mariethoz, G., P. Renard, and J. Caers (2010), Bayesian inverse problem and optimization with iterative spatial resampling, Water Resour. Res., 46, W11530, doi:10.1029/2010WR009274.

Marshall, L., D. Nott, and A. Sharma (2007), Towards dynamic catchment modelling: A Bayesian hierarchical mixtures of experts framework, Hydrol. Processes, 21(7), 847–861, doi:10.1002/hyp.6294.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953), Equation of state calculations by fast computing machines, J. Chem. Phys., 21, 1087–1092.

Mondal, A., Y. Efendiev, B. Mallick, and A. Datta-Gupta (2010), Bayesian uncertainty quantification for flows in heterogeneous porous media using reversible jump Markov chain Monte Carlo methods, Adv. Water Resour., 33(3), 241–256.

Neuman, S. P. (2003), Maximum likelihood Bayesian averaging of uncertain model predictions, Stochastic Environ. Res. Risk Assess., 17, 291–305.

Oliver, D. S., L. B. Cunha, and A. C. Reynolds (1997), Markov chain Monte Carlo methods for conditioning a permeability field to pressure data, Math. Geol., 29(1), 61–91, doi:10.1007/BF02769620.

O’Sullivan, M. J. (1985), Geothermal reservoir simulation, Int. J. Energy Res., 9(3), 319–332.

O’Sullivan, M. J., and R. McKibbin (1989), Geothermal Reservoir Engineering, Univ. of Auckland, Auckland, New Zealand.

Pruess, K. (1991), TOUGH2—A General-Purpose Numerical Simulator for Multiphase Fluid and Heat Flow, Lawrence Berkeley Natl. Lab., Berkeley, Calif.

Raftery, A. E., and S. M. Lewis (1992), One long run with diagnostics: Implementation strategies for Markov chain Monte Carlo, Stat. Sci., 7, 493–497.

RamaRao, B., A. LaVenue, G. De Marsily, and M. Marietta (1995), Pilot point methodology for automated calibration of an ensemble of conditionally simulated transmissivity fields: 1. Theory and computational experiments, Water Resour. Res., 31(3), 475–493, doi:10.1029/94WR02258.

Renard, B., D. Kavetski, G. Kuczera, M. Thyer, and S. W. Franks (2010), Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors, Water Resour. Res., 46, W05521, doi:10.1029/2009WR008328.

Renard, P., and G. De Marsily (1997), Calculating equivalent permeability: A review, Adv. Water Resour., 20, 253–278.

Roberts, G. O. (1996), Markov chain concepts related to sampling algorithms, in Markov Chain Monte Carlo in Practice, edited by W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, pp. 45–57, Chapman and Hall, New York.

Roberts, G. O., and J. S. Rosenthal (2001), Optimal scaling for various Metropolis-Hastings algorithms, Stat. Sci., 16, 351–367.

Roberts, G. O., and J. S. Rosenthal (2007), Coupling and ergodicity of adaptive MCMC, J. Appl. Prob., 44, 458–475.

Roberts, G. O., and J. S. Rosenthal (2009), Examples of adaptive MCMC, J. Comput. Graph. Stat., 18(2), 349–367.

Roberts, G. O., and S. K. Sahu (1997), Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler, J. R. Stat. Soc. Ser. B, 59(2), 291–317.

Roberts, G. O., A. Gelman, and W. R. Gilks (1997), Weak convergence and optimal scaling of random walk Metropolis algorithms, Ann. Appl. Prob., 7, 110–120.

Rue, H., and L. Held (2005), Gaussian Markov Random Fields: Theory and Applications, Chapman and Hall, New York.

Schoups, G., and J. A. Vrugt (2010), A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic and non-Gaussian errors, Water Resour. Res., 46, W10531, doi:10.1029/2009WR008933.

Schoups, G., J. A. Vrugt, F. Fenicia, and N. C. van de Giesen (2010), Corruption of accuracy and efficiency of Markov chain Monte Carlo simulation by inaccurate numerical implementation of conceptual hydrologic models, Water Resour. Res., 46, W10530, doi:10.1029/2009WR008648.

Simpson, D. P., I. W. Turner, and A. N. Pettitt (2008), Fast sampling from a Gaussian Markov random field using Krylov subspace approaches, QUT ePrints ID 14376, http://eprints.qut.edu.au/14376/, Queensland Univ. of Technology, Brisbane, Australia.

Smith, T. J., and L. A. Marshall (2008), Bayesian methods in hydrologic modeling: A study of recent advancements in Markov chain Monte Carlo techniques, Water Resour. Res., 44, W00B05, doi:10.1029/2007WR006705.

Sun, N.-Z., and W. W.-G. Yeh (1990), Coupled inverse problems in groundwater modeling: 1. Sensitivity analysis and parameter identification, Water Resour. Res., 26(10), 2507–2525, doi:10.1029/WR026i010p02507.

Ter Braak, C. J. F. (2006), A Markov chain Monte Carlo version of the genetic algorithm differential evolution: Easy Bayesian computing for real parameter spaces, Stat. Comput., 16(3), 239–249, doi:10.1007/s11222-006-8769-1.

Ter Braak, C. J. F., and J. A. Vrugt (2008), Differential evolution Markov chain with snooker updater and fewer chains, Stat. Comput., 18(4), 435–446, doi:10.1007/s11222-008-9104-9.

Thyer, M., B. Renard, D. Kavetski, G. Kuczera, S. Franks, and S. Srikanthan (2009), Critical evaluation of parameter consistency and predictive uncertainty in hydrological modelling: A case study using Bayesian total error analysis, Water Resour. Res., 45, W00B14, doi:10.1029/2008WR006825.

Tierney, L., and A. Mira (1999), Some adaptive Monte Carlo methods for Bayesian inference, Stat. Med., 18, 2507–2515.

Ulrych, T. J., and M. D. Sacchi (2005), Information-Based Inversion and Processing with Applications, Handbook of Geophysical Exploration, Seismic Exploration, vol. 36, Elsevier, New York.

Ulrych, T. J., M. D. Sacchi, and A. D. Woodbury (2001), A Bayes tour of inversion: A tutorial, Geophysics, 66(1), 55–69.

Van Genuchten, M. T. (1980), A closed-form equation for predicting the hydraulic conductivity of unsaturated soils, Soil Sci. Soc. Am. J., 44, 892–898.

Vrugt, J. A., H. V. Gupta, W. Bouten, and S. Sorooshian (2003), A shuffled complex evolution Metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters, Water Resour. Res., 39(8), 1201, doi:10.1029/2002WR001642.

Vrugt, J. A., C. J. F. ter Braak, M. P. Clark, J. M. Hyman, and B. A. Robinson (2008), Treatment of input uncertainty in hydrologic modeling: Doing hydrology backward with Markov chain Monte Carlo simulation, Water Resour. Res., 44, W00B09, doi:10.1029/2007WR006720.

Vrugt, J. A., C. J. F. ter Braak, C. G. H. Diks, D. Higdon, B. A. Robinson, and J. M. Hyman (2009), Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, Int. J. Nonlin. Sci. Numer. Simul., 10(3), 273–290.

Waagepetersen, R., and D. Sorensen (2001), A tutorial on reversible jump MCMC with a view toward applications in QTL-mapping, Int. Stat. Rev., 69(1), 49–61.

Wolff, U. (2004), Monte Carlo errors with less errors, Comput. Phys. Commun., 156(2), 143–153, doi:10.1016/S0010-4655(03)00467-3.

Woodbury, A. D., and T. J. Ulrych (2000), A full-Bayesian approach to the groundwater inverse problem for steady-state flow, Water Resour. Res., 36(8), 2081–2093, doi:10.1029/2000WR900086.

Yapo, P. O., H. V. Gupta, and S. Sorooshian (1998), Multi-objective global optimization for hydrologic models, J. Hydrol., 204, 83–97, doi:10.1016/S0022-1694(97)00107-8.

Yeh, W. W.-G. (1986), Review of parameter identification procedures in groundwater hydrology: The inverse problem, Water Resour. Res., 22(2), 95–108, doi:10.1029/WR022i002p00095.

Zimmerman, D. A., et al. (1998), A comparison of seven geostatistically based inverse approaches to estimate transmissivities for modeling advective transport by groundwater flow, Water Resour. Res., 34(6), 1373–1413, doi:10.1029/98WR00003.

T. Cui and M. J. O’Sullivan, Department of Engineering Science, University of Auckland, Auckland 1141, New Zealand.

C. Fox, Department of Physics, University of Otago, Dunedin 9016, New Zealand.

